From 8dae82dede05e818786ab9276271e37a6be92d96 Mon Sep 17 00:00:00 2001 From: JakubPietrakIntel Date: Wed, 7 Dec 2022 15:49:18 +0100 Subject: [PATCH] Squashed commit of the following: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b Author: Jakub Pietrak Date: Thu Dec 1 13:32:03 2022 +0100 rm print commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760 Author: Jakub Pietrak Date: Thu Dec 1 11:35:02 2022 +0100 pytorch_sparse.matmul to torch.sparse.matmul commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b Merge: 4562359b9f 3a858ba8e3 Author: Jakub Pietrak Date: Mon Nov 28 14:09:42 2022 +0100 Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36 commit 4562359b9fb3de301690334a892d44911eda45c8 Merge: deba083400 b5616cd5f4 Author: Jakub Pietrak Date: Mon Nov 28 12:22:11 2022 +0000 Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36 commit deba0834008ad95af7e3a6603223a0f8a5555967 Merge: 0e1a8522bb a97d0508cb Author: Jakub Pietrak Date: Mon Nov 28 12:19:25 2022 +0000 Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36 commit 0e1a8522bb695387816a29bbfcf182962429b3ab Merge: 059a238619 75bfbc35ca Author: Jakub Pietrak Date: Mon Nov 28 12:16:35 2022 +0000 Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36 commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8 Author: Michael Voznesensky Date: Mon Nov 28 05:12:37 2022 +0000 Add simple assert to detect fake tensors on modules (#89723) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723 Approved by: https://github.com/ezyang commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a Author: Edward Z. Yang Date: Sat Nov 26 13:52:28 2022 -0800 Beef up AOTAutograd logging with aot_id and input descriptions (#89710) A few things in this PR, that I found useful while debugging some recent issues: - We now allocate an aot_id to each aot_function/aot_module invocation, and print it whenever we report error messages and graph output logging. Check the comment for why this sort of thing is useful, and also why it's different from nth_graph. This number is now incorporated into aot_graph_name - I noticed that nth_graph only gets incremented when backwards is compiled. Because backwards is compiled lazily, this means that multiple forward graphs would have gotten the same ID! I change nth_graph to always increment to avoid confusion here. - I added a simple describe_input function, which makes use of num_params_buffers to tell the user if the input index they're looking at is a param/buffer or an input. With the help of https://github.com/pytorch/pytorch/pull/89709 we could give even more detailed information about inputs (we could also easily give detailed information about parameters if we stored a mapping of index to parameter name, but I didn't need this when debugging so I'll let someone else add it if they need it.) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710 Approved by: https://github.com/bdhirsh commit 5f8848f32901e35cead64d520885f718679c2bbe Author: Edward Z. Yang Date: Thu Nov 24 15:26:55 2022 -0500 Don't suppress log messages for dynamo CI config (#89653) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653 Approved by: https://github.com/albanD, https://github.com/kit1980 commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238 Author: Edward Z. 
Yang Date: Sun Nov 27 19:27:45 2022 -0500 Add single process version of dynamo distributed hf_Bert tests (#89721) It's a lot easier to debug problems in the Dynamo optimization pass if you aren't actually triggering a multiprocessing run. Keep these tests around. I think the other tests can probably get this treatment too, leaving this to future work. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721 Approved by: https://github.com/voznesenskym commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e Author: Edward Z. Yang Date: Sat Nov 26 11:25:24 2022 -0800 Add debug asserts to AOTAutograd for input consistency with compilation (#89702) Fixes https://github.com/pytorch/torchdynamo/issues/1927 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702 Approved by: https://github.com/bdhirsh commit 1f95f24d3003a35568a00b5e5e18439846089b0f Author: Edward Z. Yang Date: Sat Nov 26 11:25:24 2022 -0800 Factor input deduplication into a separate function (#89701) It turns out that instead of having a giant blobby aot_dispatch_autograd function, we can factor it into a series of wrapper functions, each of which successively guarantees more invariants on the inner compilation function until the final inner function is quite trivial. How exactly you have to wrap the input user functions and the output compiled functions can be expressed concisely in Haskell, so I've included the Haskell formulation in code comments. This PR shows how to do this for input deduplication. Dealing with the rest of the view handling is left to future work. This PR should also be a slight performance improvement as deduplicating is skipped entirely when there are no duplicate inputs. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701 Approved by: https://github.com/bdhirsh commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c Author: Edward Z. Yang Date: Sat Nov 26 14:28:56 2022 -0500 Implement guard_source on RandomValueSource (#89711) I audited the pattern matches on the enum and it didn't look like this one should apply there. Sorry, no test, I know this matters on symbolic-shapes branch but I haven't had time to extract out a minimal reproducer. Take my word for it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711 Approved by: https://github.com/jansel commit 1da633f98a5da000083c0c47d9e192b2689f867b Author: Edward Z. Yang Date: Thu Nov 24 13:57:17 2022 +0000 Access named parameters/buffers/etc via getattr rather than index (#89625) I'm not sure why this never caused problems before. The error manifests as `TypeError: 'MyModule' object is not subscriptable` Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625 Approved by: https://github.com/albanD commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0 Author: Horace He Date: Thu Nov 24 02:17:37 2022 +0000 Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171 Approved by: https://github.com/ngimel commit b709078dc673cbd5025a1df3eae7f5c60acc2698 Author: Taylor Robie Date: Sat Nov 26 10:33:21 2022 -0800 [Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926) There are various Tensors created in the backward pass which do not correspond to parameters. 
We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.) Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926 Approved by: https://github.com/chaekit commit 143d2881a844934c95c4ada63b38179d97e65af3 Author: Taylor Robie Date: Sat Nov 26 10:33:19 2022 -0800 [Profiler] Memory profiler part 10: Mark optimizer state (#88925) This is also a fairly simple pass, since we're simply collecting values from the python tracer. Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925 Approved by: https://github.com/chaekit commit ae725d501e33ed6f823997bea03d99cdc8dae5ff Author: Taylor Robie Date: Sat Nov 26 10:33:18 2022 -0800 [Profiler] Memory profiler part 9: Mark activations (#88924) This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass. Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924 Approved by: https://github.com/chaekit commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da Author: Yuxin Wu Date: Sun Nov 27 05:55:24 2022 +0000 Let SyncBatchNorm fallback to BN if not using distributed training (#89706) Fixes #63662 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706 Approved by: https://github.com/soumith commit 39449ea61d9a6644731687219282f610cbf7cf54 Author: PyTorch MergeBot Date: Sun Nov 27 02:59:04 2022 +0000 [vision hash update] update the pinned vision hash (#89692) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692 Approved by: https://github.com/pytorchbot commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3 Author: Taylor Robie Date: Sat Nov 26 10:33:16 2022 -0800 [Profiler] E2E expecttests for category assignment (#88653) Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.) The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler. Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653 Approved by: https://github.com/chaekit commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971 Author: Taylor Robie Date: Sat Nov 26 10:33:14 2022 -0800 [Profiler] Memory profiler part 8: Mark parameters. (#87568) Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. 
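As a rough illustration of that primary collection path (a minimal sketch, not the profiler's actual tracer code; the variable names are made up for the example):

```python
import torch

# Sketch: gather candidate parameter Tensors from the same two sources the
# python tracer records -- nn.Module parameters and optimizer param_groups.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

param_ptrs = {p.data_ptr() for p in model.parameters()}
for group in opt.param_groups:
    param_ptrs.update(p.data_ptr() for p in group["params"])

print(len(param_ptrs))  # 2: the Linear weight and bias are seen by both sources
```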
As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates. Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.) Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568 Approved by: https://github.com/chaekit commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489 Author: Taylor Robie Date: Sat Nov 26 10:33:13 2022 -0800 [Profiler] Memory profiler part 7: Mark inputs (#87567) It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training since Autograd lets us differentiate between them however I still want the tool to do something reasonable in inference. Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements. Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567 Approved by: https://github.com/chaekit commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705 Author: Taylor Robie Date: Sat Nov 26 10:33:11 2022 -0800 [Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566) Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories. We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method relies requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined these two approaches provide very high coverage. Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node. Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.) Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566 Approved by: https://github.com/chaekit commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d Author: Taylor Robie Date: Sat Nov 26 10:33:09 2022 -0800 [Profiler] Memory profiler part 5: Data flow graph (#87006) The semantic meaning of a Tensor is tightly coupled to its lineage. 
The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs. It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node. Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006 Approved by: https://github.com/chaekit commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66 Author: Taylor Robie Date: Sat Nov 26 10:33:08 2022 -0800 [Profiler] Memory profiler part 4: Select top level torch ops (#86880) In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is. Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880 Approved by: https://github.com/chaekit commit bb77accb4c996e3aab9ae4b665fb8464400c8194 Author: Jiong Gong Date: Sat Nov 26 14:06:44 2022 +0000 [Inductor] Record cpp kernel in PyTorch Profiler (#89367) Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367 Approved by: https://github.com/jansel commit 36018a6ee63f140b95ad644d09920798b0c624f8 Author: Edward Z. Yang Date: Fri Nov 25 13:48:35 2022 -0800 Don't suppress exceptions from backends (#89656) Taken from voz's https://github.com/pytorch/pytorch/pull/89392 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656 Approved by: https://github.com/voznesenskym commit 3e20d023b1f442ebe59e76604395cd8d4abed52a Author: Natalia Gimelshein Date: Sat Nov 26 03:08:23 2022 +0000 put descriptive kernel names behind config (#89697) Per title, generated kernel names are often long and confusing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697 Approved by: https://github.com/Chillee commit 591dfffa38848de54b7f5f4e49260847024c9281 Author: jlukehubbard <58089207+jlukehubbard@users.noreply.github.com> Date: Fri Nov 25 21:31:53 2022 +0000 update docstring for torch.linalg.lstsq (#89383) Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA. Fixes #85021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383 Approved by: https://github.com/lezcano commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2 Author: Edward Z. Yang Date: Fri Nov 25 03:31:20 2022 +0000 Simplify aot_module_simplified by removing top_args/top_kwargs (#89666) This makes good on Chillee's CR comment at https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222 which was never done in the original PR. There is no logic change, just unpack the args/kwargs at the top level and remove the inner function indirection. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666 Approved by: https://github.com/voznesenskym commit 6168f22fae66da5703e087bcd10076921ca157e7 Author: Edward Z. Yang Date: Fri Nov 25 03:31:19 2022 +0000 Don't support kwargs at runtime in aot_module_simplified (#89664) The preexisting logic here added in https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs was non-empty, then the inner compiled function supports kwargs. Naively, this would leave you to expect that there is some sort of correlation between top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but kwargs is the RUNTIME kwargs that are to be passed to the compiled function. But (1) we don't support this (the function to be compiled only takes a list of tensors) and (2) even if we did support it, conditioning on whether or not you had passed AOTAutograd configuration kwargs to support kwargs at runtime is bonkers. So delete it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664 Approved by: https://github.com/voznesenskym commit b04dda4291f1d30b064572e4521e82fa2573af77 Author: Edward Z. Yang Date: Fri Nov 25 03:31:19 2022 +0000 Delay verify correctness wrapping to call site. (#89662) There is only one call site for compiler_fn, so we can safely delay wrapping verify correctness to here. This will help later when we change the backend compiler calling convention to pass fake tensors (but I need to pass real tensors here.) This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392 but with less changes to the substantive logic. I only moved the relevant inner implementation; there are no changes otherwise. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662 Approved by: https://github.com/voznesenskym commit 61a3fe4b6409965223273c1098f9a77ff071efe1 Author: Natalia Gimelshein Date: Fri Nov 25 19:42:38 2022 +0000 make inductor correctly propagate nans for maximum and minimum (#89612) Partially fixes https://github.com/pytorch/torchdynamo/issues/594 Also, small cleanup for `where` codegen Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612 Approved by: https://github.com/soumith, https://github.com/jansel commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f Author: Ikko Ashimine Date: Fri Nov 25 19:26:18 2022 +0000 Fix typo in segment_reduction_op_gpu.cu (#89647) menber -> member Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647 Approved by: https://github.com/kit1980 commit 2c0bd85c755043d696452ddab354f3ff6775738b Author: kshitij12345 Date: Fri Nov 25 14:53:57 2022 +0000 complex: register c10::complex with py::cast (#89680) Fixes #77134 TODO: * [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?) 
```c++
namespace py = pybind11;

int main() {
  py::scoped_interpreter guard{}; // start the interpreter

  auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
  assert(
      (c10::complex<double>(1.0, 2.0) ==
       py::cast<c10::complex<double>>(casted_cdouble)));

  auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
  assert(
      (c10::complex<float>(1.0, 2.0) ==
       py::cast<c10::complex<float>>(casted_cfloat)));

  auto casted_chalf = py::cast(c10::complex<c10::Half>(1.0, 2.0));
  assert(
      (c10::complex<c10::Half>(1.0, 2.0) ==
       py::cast<c10::complex<c10::Half>>(casted_chalf)));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680 Approved by: https://github.com/ezyang
commit a97d0508cb5259951bc48300fb914cebdf322bb9 Merge: 849be586e6 abb446af8c Author: Jakub Pietrak Date: Fri Nov 25 15:24:54 2022 +0100 Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36
commit 849be586e649421ba58182feb9067a4ac65479e3 Merge: 059a238619 75bfbc35ca Author: Jakub Pietrak Date: Fri Nov 25 14:25:40 2022 +0100 Merge branch 'gh/mingfeima/85/head' into pyg-36
commit abb446af8c65a49bbc3767e14605a73d244c176b Author: Alvaro Gaona Date: Fri Nov 25 11:09:28 2022 +0000 Implement old windows in Python (#87082) Relates to #85366 - Bartlett, Blackman, Hamming, Hann. - Except Kaiser which will be in a different PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082 Approved by: https://github.com/mruberry, https://github.com/lezcano
commit 059a238619b122f922c569c618919a277420e483 Merge: 26ba2e9751 95ea47ef0c Author: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com> Date: Fri Nov 25 10:00:53 2022 +0100 Merge branch 'pytorch:master' into jpietrak/pyg-36
commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e Author: Jason Ansel Date: Fri Nov 25 04:28:36 2022 +0000 torchdynamo to torch._dynamo in aot_autograd.py (#89385) Test Plan: Run torchbench models Differential Revision: D41429573 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385 Approved by: https://github.com/soumith, https://github.com/malfet
commit 69043247819042db18ac9526c2d747fa61fe8880 Author: Edward Z. Yang Date: Thu Nov 24 12:00:13 2022 -0800 Remove fake_tensor_propagation (#89646) You always have to run dynamo with fake tensors. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646 Approved by: https://github.com/soumith
commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991 Author: Edward Z. Yang Date: Thu Nov 24 12:00:12 2022 -0800 xfail maml test, instead of running it without fake tensor prop (#89645) A previous version of this patch graph breaks when torch.tensor fails, but that causes
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
```
to start failing. Probably another latent bug that needs investigating. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645 Approved by: https://github.com/albanD
commit a048913e2530442360c36a48420079ca9ebca149 Author: PyTorch MergeBot Date: Fri Nov 25 03:03:41 2022 +0000 [vision hash update] update the pinned vision hash (#89667) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667 Approved by: https://github.com/pytorchbot commit 3b3ebcd031b68762938806f541d7247a1521bb11 Author: XiaobingSuper Date: Thu Nov 24 02:33:01 2022 -0500 TorchDynamo: weight prepack for single conv (#89209) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209 Approved by: https://github.com/jgong5, https://github.com/jansel commit 0c4f3db7bf24e94125c6802718a1105ee548c953 Author: XiaobingSuper Date: Thu Nov 24 02:32:59 2022 -0500 TorchDynamo: weight prepack for mkl linear (#89109) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109 Approved by: https://github.com/jgong5, https://github.com/jansel commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e Author: XiaobingSuper Date: Thu Nov 24 02:32:55 2022 -0500 TorchDynamo: weight prepack for onednn convolution external call (#88988) This PR is about enabled weight prepack using the MKLDNN tensor: 1. enable fake tensor mode for MKLDNN tensor input. 2. make convolution fusion kernel support MKLDNN tensor input. 3. do the weight prepack at FX fusion step. For better performance, we always use channels_last for CPU convolution path. because we test that the channels_last path can get a better performance than block input path, and also avoid the activation's layout conversion(plain to block, block to plain), currently, there only need plain to plain format conversion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988 Approved by: https://github.com/jgong5, https://github.com/jansel commit 0884fdaba0280e3f3ad2abc34c0940587f744886 Author: Edward Z. Yang Date: Thu Nov 24 14:31:00 2022 -0500 Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652) This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf. Testing to see if this fixes gmixer_24_224 mixer_b16_224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652 Approved by: https://github.com/eellison commit 4a16f8cdb26be3561742e86f184e59f65418fe63 Author: Edward Z. Yang Date: Thu Nov 24 09:00:09 2022 -0800 Reenable fake_tensor_propagation on test_cudnn_rnn (#89644) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644 Approved by: https://github.com/anjali411 commit fc7dcb684aa38da5b1534fc701657ee63af8909c Author: Edward Z. Yang Date: Thu Nov 24 09:00:09 2022 -0800 Run optimizer tests with fake tensors (#89643) This is a slight regression: RAdam and Adagrad don't appear to trace at all under fake tensors. But I think this is a more accurate reflection of the current state of affairs. Along the way fix some problems on the fake tensor path. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643 Approved by: https://github.com/anjali411 commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa Author: Edward Z. Yang Date: Thu Nov 24 09:00:08 2022 -0800 Force test_rng_state to run with fake tensor prop (#89641) I'm not really sure what desertfire's intended follow up was on https://github.com/pytorch/pytorch/pull/87490 because when I remove the unsupported() call, dynamo tests pass. But the change here is conservative and I think strictly better than the current situation. The idea is to force fake tensor pop on for the test, and then just observe that we are doing a graph break. Clearly, export doesn't work, so I manually xfail it. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641 Approved by: https://github.com/anjali411 commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1 Author: Edward Z. Yang Date: Thu Nov 24 09:00:08 2022 -0800 Easy: These tests work with fake_tensor_propagation on (#89640) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640 Approved by: https://github.com/anjali411, https://github.com/albanD commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3 Author: Edward Z. Yang Date: Thu Nov 24 08:11:48 2022 -0800 Support unspecialized integers with dynamic shapes (#89639) Previously, we hackily wrapped unspecialized integers into tensors and treated them as tensor inputs. Sometimes, downstream operations would not be able to deal with the tensor input. Now, we wrap them into SymInt, so more correct overload selection occurs. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639 Approved by: https://github.com/anjali411 commit 0c96841a20f0ae9380ef26657914276a42c9c9d7 Author: Edward Z. Yang Date: Thu Nov 24 08:11:47 2022 -0800 Cond capture with fake tensors actually works; don't raise in this case (#89638) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638 Approved by: https://github.com/anjali411 commit d3c012f409a4e4d5a11070a90b5578da82778030 Author: kshitij12345 Date: Thu Nov 24 21:41:20 2022 +0000 [test_nn] split pruning tests from test_nn (#89590) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590 Approved by: https://github.com/albanD commit 83666f167dcf023d301f16fad82b9afb374ad836 Author: Aleksandar Samardžić Date: Thu Nov 24 14:44:12 2022 +0000 Added vectorized CPU code for uint8_t datatype. (#89284) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284 Approved by: https://github.com/lezcano, https://github.com/peterbell10 commit 9497552771ca59c68509398ab3094e590a3047c5 Author: Howard Huang Date: Thu Nov 24 19:41:17 2022 +0000 Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521) Summary: Fixes https://github.com/pytorch/pytorch/issues/88568 `_all_gather_base` is deprecated. So replacing its usage with `all_gather_into_tensor` Test Plan: CI Differential Revision: D41479983 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521 Approved by: https://github.com/wz337 commit 94a88b53ed37854379813abf9641d1637fe2688b Author: Edward Z. Yang Date: Thu Nov 24 08:11:46 2022 -0800 Remove fake_tensors_available (#89637) As we are one repo now, they are always available. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637 Approved by: https://github.com/anjali411 commit 1c8b0779de76d0c76d34835047106ab37b41790b Author: Emilio Castillo Date: Thu Nov 24 18:25:26 2022 +0000 Fix segfault when swapping custom allocator (#89613) Just screwed it before merging ... Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613 Approved by: https://github.com/albanD commit fd279fe85b8f5a8e74c615436f0b180621b6ef52 Author: Edward Z. Yang Date: Thu Nov 24 09:23:05 2022 -0500 Make pytest work again on test/dynamo (#89631) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631 Approved by: https://github.com/anjali411
commit c3e85d879cdbd3973754760c6767c75276b1dca8 Author: albanD Date: Thu Nov 24 17:11:42 2022 +0000 Mention discrepency between original impl and our impl of RAdam (#89575) Fixes https://github.com/pytorch/pytorch/issues/88836 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575 Approved by: https://github.com/mruberry
commit 860bae49e4925868a0221ec4345d08407280bac7 Author: Edward Z. Yang Date: Wed Nov 23 08:04:31 2022 -0800 Suppress guards on as_strided call only. (#89569) See comment in meta_utils.py for the whole story. This doesn't have a substantive impact yet, but will in the next PR on the stack. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569 Approved by: https://github.com/albanD
commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9 Author: mfkasim1 Date: Thu Nov 24 11:11:51 2022 +0000 Added log1p for complex in c10 (#89214) One PR towards #89205. The content is mostly from PR #38465, but slightly changed the expression to make it faster. Here are some benchmarking code:
```c++
// main.cc
template <typename T>
inline std::complex<T> log1p_v0(const std::complex<T> &z) {
  // this PR
  T x = z.real();
  T y = z.imag();
  T theta = std::atan2(y, x + T(1));
  T r = x * (x + T(2)) + y * y;
  return {T(0.5) * std::log1p(r), theta};
}

template <typename T>
inline std::complex<T> log1p_v1(const std::complex<T> &z) {
  // PR #38465
  T x = z.real();
  T y = z.imag();
  std::complex<T> p1 = z + T(1);
  T r = std::abs(p1);
  T a = std::arg(p1);
  T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
  return {std::log1p(rm1), a};
}

template <typename T>
inline std::complex<T> log1p_v2(const std::complex<T> &z) {
  // naive, but numerically inaccurate
  return std::log(T(1) + z);
}

int main() {
  int n = 1000000;
  std::complex<double> res(0.0, 0.0);
  std::complex<double> input(0.5, 2.0);

  auto start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v0(input);
  }
  auto end = std::chrono::system_clock::now();
  auto elapsed = end - start;
  std::cout << "time for v0: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v1(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v1: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v2(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v2: " << elapsed.count() << '\n';

  std::cout << res << '\n';
}
```
Compiling the script with command `g++ main.cc` produces the following results:
```
time for v0: 237812271
time for v1: 414524941
time for v2: 360585994
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214 Approved by: https://github.com/lezcano
commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337 Author: Jiewen Tan Date: Thu Nov 24 10:57:01 2022 +0000 [LTC] Refine MetricsArena::Reset (#89608) Summary: After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b. This is paired with pytorch/xla#4217.
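For reference, a small Python sanity check of the stable complex log1p formula used in the benchmarked `log1p_v0` above (illustrative only; not part of the PR):

```python
import cmath
import math

def log1p_complex(z: complex) -> complex:
    # log1p(z) = 0.5 * log1p(x*(x+2) + y*y) + i * atan2(y, x + 1)
    x, y = z.real, z.imag
    return complex(0.5 * math.log1p(x * (x + 2) + y * y), math.atan2(y, x + 1))

z = 0.5 + 2.0j
# away from cancellation-prone inputs the stable form matches the naive log(1 + z)
assert abs(log1p_complex(z) - cmath.log(1 + z)) < 1e-12
```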
Test Plan: PJRT_DEVICE=CPU python xla/test/test_metrics.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608 Approved by: https://github.com/JackCaoG commit a8629a1c18fd13300ce69c1d6042004038885cf0 Author: Jithun Nair Date: Thu Nov 24 10:53:20 2022 +0000 Upgrade nightly wheels to ROCm5.3 (#89101) Dependent on PR https://github.com/pytorch/builder/pull/1193 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101 Approved by: https://github.com/kit1980 commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188 Author: Ivan Yashchuk Date: Thu Nov 24 09:37:10 2022 +0000 Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132) I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing C++ code `1 - tensor`. Test: ``` python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132 Approved by: https://github.com/mruberry, https://github.com/jjsjann123 commit b515c1d96082214e81cc57ce2a1de9164b50206f Author: Hao Guan <10684225+hguandl@users.noreply.github.com> Date: Thu Nov 24 08:14:24 2022 +0000 [QAT] Check the value of numel to avoid segfault (#81547) Fixes #78123 Segmentation fault RuntimeError: numel is out of the bound of input tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547 Approved by: https://github.com/kit1980 commit 22a1b5e243e852e1c423c697e51975d1545d2a1b Author: Vasiliy Kuznetsov Date: Wed Nov 23 13:01:15 2022 -0800 quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431) Summary: This PR deprecates the `compute_dtype` field on observers, and replaces it with the `is_dynamic` field on observers. This is better aligned with the reference model spec. Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431 Approved by: https://github.com/jerryzh168 commit e4ccec6ecab9b48e804d58f60135f0950fca864f Author: Yanbo Liang Date: Thu Nov 24 05:28:58 2022 +0000 [Dynamo] Fix bug of using customized torch.autograd.Function (#89397) Fixes https://github.com/pytorch/torchdynamo/issues/1899 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397 Approved by: https://github.com/jansel commit 903ae4570e401e5c4e42dc4a44cae37f805044a4 Author: Michael Lazos Date: Thu Nov 24 04:15:34 2022 +0000 Disable optimizer tracing, enable for tests only (#89500) Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500 Approved by: https://github.com/anijain2305 commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba Author: albanD Date: Thu Nov 24 03:39:55 2022 +0000 Expose to python the backward AD view_func (#89586) This will be useful for other systems (AOTAutograd) that want to replay autograd views. 
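Referring back to the `fx.replace_pattern` change above, a minimal sketch of how that API rewrites a traced graph (the actual pattern in that PR differs; this toy pattern is an assumption for illustration):

```python
import torch
import torch.fx as fx

def pattern(x):
    # toy stand-in for the materialized "1 - tensor" produced by tracing C++ code
    return torch.empty_like(x).fill_(1) - x

def replacement(x):
    return 1 - x

def f(x):
    return (torch.empty_like(x).fill_(1) - x) * 2

gm = fx.symbolic_trace(f)
fx.replace_pattern(gm, pattern, replacement)  # rewrites matching subgraphs in place
print(gm.code)
```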
FYI @bdhirsh Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586 Approved by: https://github.com/soulitzer commit 4cb6bbbe27162c7b0835879131991d2155329718 Author: Nikita Karetnikov Date: Thu Nov 24 01:02:28 2022 +0100 Symintify `embedding` (#89327) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327 Approved by: https://github.com/ezyang commit 9c867eae1a7fffb6f893717073150cff04a923a4 Author: Wu, Chunyuan Date: Wed Nov 23 20:10:41 2022 +0000 nnc: fix Store if value is fp32 while buf is bf16 (#86788) Fixes https://github.com/pytorch/pytorch/issues/86533. For the below graph: ```bash [DUMP kernel.cpp:1690] TensorExprKernel graph: [DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)): [DUMP kernel.cpp:1690] %1 : int = prim::Constant[value=0]() [DUMP kernel.cpp:1690] %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29 [DUMP kernel.cpp:1690] %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19 [DUMP kernel.cpp:1690] return (%3) ``` **Loop stmt before the fix:** The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16. ```bash [DEBUG llvm_codegen.cpp:489] After HalfRewriter { [DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8); [DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) { [DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f; [DEBUG llvm_codegen.cpp:489] } [DEBUG llvm_codegen.cpp:489] } ``` **Loop stmt after the fix:** ```bash [DEBUG llvm_codegen.cpp:489] After HalfRewriter { [DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8)); [DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) { [DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f); [DEBUG llvm_codegen.cpp:489] } [DEBUG llvm_codegen.cpp:489] } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788 Approved by: https://github.com/EikanWang, https://github.com/kit1980 commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac Author: Zhijing Li (Accelerator Enablement) Date: Thu Nov 24 02:18:32 2022 +0000 Symintified layer_norm (#89466) Summary: As titled. Test Plan: ``` buck2 run mode/opt scripts/wwei6:test_executorch ``` Differential Revision: D41451390 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466 Approved by: https://github.com/frank-wei, https://github.com/ezyang commit fdb2dd113d3aec0acb2a473de6be49940ab6a115 Author: Alexander Grund Date: Thu Nov 24 01:52:11 2022 +0000 Install missing VSX headers (POWER) (#85547) E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h` which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non MSVC compilers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547 Approved by: https://github.com/kit1980 commit e922bd4e523b0a30f6607f6497ac458571e00131 Author: Wei-Sheng Chin Date: Thu Nov 24 01:30:09 2022 +0000 [ONNX] Move two headers from .h to .cc (#86852) As title. Header dependency should be as small as possible. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852 Approved by: https://github.com/titaiwangms, https://github.com/BowenBao commit 23fe2ff910fd1577281a2210d1184aff705191b8 Author: Shunting Zhang Date: Thu Nov 24 01:28:10 2022 +0000 verify the number of outputs of xla graph (#89536) This PR add tests to verify the behavior of number of outputs returns by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and enable training for dynamo/torchxla integration eventually. Send this PR separately so Jack could help verify if the behavior is expected and play with it. List some code snippets here since their behavior is not straightforward at a first glance: ``` def forward(self, a, b, c): """ The XLA graph will only return the first 2 items """ return a + b, a + c, b ``` ``` def forward(self, a, b, c): """ Inplace update on b cause it to be returned in XLA graph """ b.zero_() return a + b, a + c, b ``` ``` def forward(self, a, b, c): """ Even if we return b twice, the XLA graph only return b once. """ b.zero_() return a + b, a + c, b, b ``` Here are what observed by the added tests: 1. XLA does not return outputs that are also inputs -- if the tensor is not inplace updated. At first glance people may feel curious why should we consider this kind of 'non-realistic' corner case. But this kind of graphs indeed shows up in AOTAutograd. The main reason is AOTAutograd lift all model parameters/buffers as graph input and may return some of them. Check ***test_direct_return*** 2. if a tensor is inplace updated, XLA will still return it as graph output even if it's also an input. The only difference compared to item 1 is, the inplace updating on the tensor cause it being returned. This happens for BatchNorm2d since the running_mean/variance tensors will be inplace updated during training. 
Check ***test_direct_return_with_inplace_update*** Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536 Approved by: https://github.com/jansel commit 0bde5149819e9854bca1363aa6c9f52f7db2496e Author: Nikita Shulga Date: Thu Nov 24 00:57:17 2022 +0000 Add `c10::` namespace in front of `optional` (#89605) Prep change for moving the codebase to C++17 standard Was part of https://github.com/pytorch/pytorch/pull/85969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605 Approved by: https://github.com/weiwangmeta, https://github.com/kit1980 commit e19a7165fd1a9a35fcac42706c20e658776c10ab Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Thu Nov 24 00:34:26 2022 +0000 [nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905) Fixes #65909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905 Approved by: https://github.com/albanD, https://github.com/kit1980 commit a00bd6f686d7a485f7bea5f971b7e793118842b8 Author: clee2000 <44682903+clee2000@users.noreply.github.com> Date: Wed Nov 23 23:48:32 2022 +0000 Don't run auto request review on forked PRs (#89583) tested on https://github.com/pytorch/pytorch/pull/89581 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583 Approved by: https://github.com/albanD, https://github.com/malfet commit 0a1a53083e331b3648ad4cb6f750d130e3530731 Author: Nikita Karetnikov Date: Wed Nov 23 20:42:55 2022 +0000 [primTorch] Enable regex error testing for some refs (#87765) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765 Approved by: https://github.com/mruberry commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b Author: Nikita Shulga Date: Wed Nov 23 23:23:24 2022 +0000 Update default cmake to 3.18 (#89570) Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh ` Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570 Approved by: https://github.com/atalman commit 8695f0cced016d43298b43a4baf30315061fdacd Author: Jane Xu Date: Wed Nov 23 23:23:17 2022 +0000 Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697) Using the same repro from the issue (but with BatchNorm2D) Rectifies native_batch_norm schema by splitting the schema into 2: 1. one will have NON-optional alias-able running_mean and running_var inputs 2. the other will just not have those parameters at all (no_stats variation) **Calling for name suggestions!** I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit` CI should pass. Because of bc/fc reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so, we should make `native_batch_norm_legit` the official batch_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697 Approved by: https://github.com/albanD commit a00efe55c3790789b967facf10c3f426faa98155 Author: Everton Constantino Date: Wed Nov 23 22:46:29 2022 +0000 Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722) `JIT_LOG` checks if logging was enabled for that particular file and when it isn't it doesn't output anything. Since the test checks for the size of `test_stream` it fails. 
I believe forcing the file to have logging enabled to see if the stream is being correctly set during test makes no sense so this patches just forcibly outputs and checks if it worked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722 Approved by: https://github.com/davidberard98 commit b8d3afd88665de5f01f696333d0ff291bd94a57b Author: Huy Do Date: Wed Nov 23 22:39:36 2022 +0000 Skip upload test stats for test reports from rerun disabled tests workflow (#89548) I have found the reason why uploading tests stats fails for rerun disabled workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699. The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping). This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages. This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats. I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled test is only few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but will balloon up to a much bigger XML file after extracting from a dozen to a few hundred MB (text). The size of the zipped file is not a big immediate problem [3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check. The script can now finish when running locally: * `upload_test_stats` finishes around 3+ minutes ``` time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master ... Writing 8925 documents to S3 Done! Writing 1760 documents to S3 Done! Writing 1675249 documents to S3 Done! python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954 1 185.69s user 12.89s system 75% cpu 4:22.82 total ``` * `check_disabled_tests` finishes within 3 minutes ``` time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch ... python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 1 154.19s user 4.17s system 97% cpu 2:42.50 total ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548 Approved by: https://github.com/clee2000 commit f18f0c70ab10c400947e71be30794e04dcc22acf Author: Elias Ellison Date: Wed Nov 23 19:02:51 2022 +0000 Dont clone unmutated args in triton autotuning (#89519) Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great. 
Edit: i think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519 Approved by: https://github.com/ngimel, https://github.com/jansel commit ac19c5be82febc2140d4601c98daf45646a399ab Author: Peter Bell Date: Tue Nov 22 22:26:21 2022 +0000 FFT: disable dimension wrapping for scalar tensors (#89234) Fixes #88985 By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1` for scalar tensors which leads to an invalid dimension being used to index into `tensor.sizes()` as in the code sample from the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234 Approved by: https://github.com/mruberry commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e Author: Pearu Peterson Date: Wed Nov 23 12:05:37 2022 +0200 Sparse CSC/BSR/BSC serialization and pickle support (#89553) Fixes https://github.com/pytorch/pytorch/issues/89497 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553 Approved by: https://github.com/cpuhrsch commit a8d6b82167ef417e21c807cb29d7eabea15014da Author: Elias Ellison Date: Wed Nov 23 16:47:43 2022 +0000 Fix norm decomp when dtype is passed in (#89508) Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508 Approved by: https://github.com/anijain2305 commit 72110d783344c4121730b032ca0d269896604dcf Author: Elias Ellison Date: Wed Nov 23 17:03:09 2022 +0000 Fix Upsample Decomp Striding For Small Channels (#89528) Fix for https://github.com/pytorch/torchdynamo/issues/623. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528 Approved by: https://github.com/ngimel, https://github.com/anijain2305 commit b7483be06afe8d4242adeb559cfbe6e0e89419d0 Author: Jerry Zhang Date: Wed Nov 23 11:03:45 2022 -0800 [quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547) Summary: no functionality changes Test Plan: NA Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547 Approved by: https://github.com/vkuzo commit a188f05e8c1788d393c072868421991dfcb55b02 Author: Natalia Gimelshein Date: Wed Nov 23 20:18:54 2022 +0000 Reland #89031 Added conv constraint that infers layouts (#89530) Relands #89031 Per title. We now set strides from fx graph only for convolutions and mm, which is a hack, but bmm in some cases caused extra copy, and there is no obvious way to fix that, we should rethink the strides anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530 Approved by: https://github.com/Chillee commit e800d27b10137727c68cb71bccabe3a93cf38e9e Author: William Wen Date: Wed Nov 23 20:11:39 2022 +0000 [dashboard] Add graphs for all summary metrics, add additional testing flags (#89580) Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580 Approved by: https://github.com/davidberard98 commit 953f39578a7019c4c34bc1dbd6cb0facb554af79 Author: Charlie West-Taylor Date: Wed Nov 23 19:51:50 2022 +0000 Mark IPU device as not supports_as_strided (#89130) Currently causes issues in calls to `.to`. 
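For context on the FFT scalar-tensor change above, a tiny repro of the situation being guarded against (illustrative; the exact error type and message are assumptions):

```python
import torch

t = torch.tensor(1.0)        # 0-d tensor: there is no dimension to transform over
try:
    torch.fft.fft(t, dim=0)  # an explicit dim on a scalar tensor should be rejected
except (IndexError, RuntimeError) as e:
    print(type(e).__name__, e)
```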
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130 Approved by: https://github.com/albanD commit 37e46a503502cdeda791cf684522ef83b5655328 Author: Yanbo Liang Date: Wed Nov 23 19:44:46 2022 +0000 [Dynamo] Fix several bugs & code refactor in RangeVariable (#89322) Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py ``` E TypeError: 'list' object cannot be interpreted as an integer E E from user code: E File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward E idx = torch.LongTensor(range(y.size(0))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322 Approved by: https://github.com/jansel commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8 Author: Xilun Wu <12968408+XilunWu@users.noreply.github.com> Date: Wed Nov 23 19:43:28 2022 +0000 Thread PG: add allreduce to threaded pg (#89043) Summary: Goal Add `all_reduce` collective to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a). Code Motion Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup). What's Next Add a DDP test utilizing the new allreduce op. Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`. Test Plan: cd fbcode/caffe2 buck2 test mode/dev //caffe2/test/distributed:multi_threaded Differential Revision: D41046606 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043 Approved by: https://github.com/wanchaol commit 27db806888c36b029f51197a40e5196cc10792db Author: Charlie West-Taylor Date: Wed Nov 23 19:41:07 2022 +0000 Handle Tensor.__deepcopy__ via clone(), on IPU (#89129) Currently it falls through to a call to `storage()`, which the IPU doesn't support. I've made the minimal change here for ease of merging (this'd help us if it was in for 1.13.1), however... **QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? in other words, could the condition for using `clone` not be this? ```python self.is_sparse or self.device.type in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"] or not torch._C._has_storage(self) or (type(self) is not Tensor and self.data_ptr() == 0) ``` If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call. The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129 Approved by: https://github.com/albanD commit fa7a963f6536dd05c381fbf23270f4f009f9f113 Author: Sergii Dymchenko Date: Wed Nov 23 19:39:47 2022 +0000 Remove BaseException TODO (#89540) After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540 Approved by: https://github.com/H-Huang commit 9eed6b7f9aa4f5fc65075de3189acc9add221660 Author: Yanbo Liang Date: Wed Nov 23 19:39:43 2022 +0000 [Dynamo] Several fixes on TensorVariable & TorchVariable (#89486) This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884), it would fix 30+ model tests. 
* Support ```tensor.type()```. * Support ```tensor.get_device()```. * Support ```torch.nn.functional._Reduction.get_enum```. * Support ```torch._utils._get_device_index()```. * Fallback ```tensor.data_ptr()```. * ```FakeTensor``` always returns 0 * For no fake tensor propagation, we ```clone``` the input tensor, which makes no sense to track the original ```data_ptr```. And I don't think this is a very popular API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486 Approved by: https://github.com/jansel commit f03e6672fb6a694d6f03980e3f34d8181c7cc663 Author: Iris Date: Wed Nov 23 19:39:01 2022 +0000 [Checkpoint][2D] Minor update for dedup_tensors.py (#89542) Rename variables for better readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542 Approved by: https://github.com/H-Huang commit 74703eb50299b26082bc2a357770739a68460199 Author: Iris Date: Wed Nov 23 19:36:01 2022 +0000 [Checkpoint] Add a logger to dedup_tensors (#89503) Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503 Approved by: https://github.com/fduwjj commit 57353c9608263df98156a73aaa6ed35a2a2306ad Author: Brian Hirsh Date: Wed Nov 23 08:29:08 2022 -0800 first draft of input mutation handling for aot autograd (#88817) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817 Approved by: https://github.com/ezyang, https://github.com/wconstab commit 902e4e3926a9333178510f032580e4acd56c40da Author: PyTorch MergeBot Date: Wed Nov 23 19:05:13 2022 +0000 Revert "Fix the kineto daemon build condition (#89174)" This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd. Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta due to For some reason this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on reverting is a necessary evil. 
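Circling back to the Dynamo `TensorVariable` fixes listed above, a small smoke test of the newly handled Tensor methods (a sketch; the expected values assume a CPU float tensor and the eager backend):

```python
import torch
import torch._dynamo as dynamo

@dynamo.optimize("eager")
def f(x):
    # tensor.get_device() and tensor.type() are among the newly supported calls
    return x.get_device(), x.type()

print(f(torch.randn(2)))  # expected: (-1, 'torch.FloatTensor') on CPU
```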
commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a Author: Bin Bao Date: Wed Nov 23 02:00:44 2022 +0000 [inductor] Update CI model tests (#89499) Summary: 1) Add model inference test 2) Switch model training test to use AMP Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499 Approved by: https://github.com/bertmaher commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6 Author: Jerry Zhang Date: Tue Nov 22 20:29:26 2022 -0800 [quant][be] Remove unused util code (#89272) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272 Approved by: https://github.com/andrewor14 commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009 Author: Jerry Zhang Date: Tue Nov 22 20:29:26 2022 -0800 [quant][be] Refactor the error checking code for quantize_per_channel op (#89271) Summary: at Test Plan: make sure it compiles Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271 Approved by: https://github.com/andrewor14 commit 71c0e84914b74bc30178292e02f67bc47c0bee21 Author: Catherine Lee Date: Wed Nov 23 18:27:37 2022 +0000 Gate leak check and reruns on schedule (#89504) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89504 Approved by: https://github.com/huydhn commit c9d4390d1328f7d57070106bae035bd77a76452b Author: Emilio Castillo Date: Wed Nov 23 17:54:33 2022 +0000 Add Pluggable CUDA allocator backend (#86786) Fixes #43144 This uses the Backend system added by [82682](https://github.com/pytorch/pytorch/pull/82682) to change allocators dynamically during the code execution. This will allow us to use RMM, use CUDA managed memory for some portions of the code that do not fit in GPU memory. Write static memory allocators to reduce fragmentation while training models and improve interoperability with external DL compilers/libraries. For example, we could have the following allocator in c++ ```c++ extern "C" { void* my_malloc(ssize_t size, int device, cudaStream_t stream) { void *ptr; std::cout<<"alloc "<< size< Date: Wed Nov 23 17:27:40 2022 +0000 [test_nn] split parametrization test from test_nn (#89552) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89552 Approved by: https://github.com/albanD commit 347a7d97a5e4855e0648fdcc194e28d3019276b6 Author: albanD Date: Wed Nov 23 16:51:42 2022 +0000 Deprecate decorating classes with torch.no_grad and similar (#89522) Fixes https://github.com/pytorch/pytorch/issues/89450 I would have completely removed it but I don't think this is particularly urgent and there are some use of it in the wild: https://github.com/search?q=%2Ftorch%5C.no_grad%5C%28%5C%29%5Cnclass%2F&type=code So we might as well take one release to do it. 
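For context on the class-decoration pattern deprecated by #89522, a minimal sketch of what is being removed and what remains supported (the module and shapes here are made up for illustration):

```python
import torch

# Deprecated by #89522: decorating an entire class. Only the construction of
# the instance ends up running under no_grad, which is rarely what users expect.
#
# @torch.no_grad()
# class Evaluator:
#     ...

# Still supported: decorating functions/methods, or using the context manager.
@torch.no_grad()
def evaluate(model, x):
    return model(x)

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)
with torch.no_grad():
    out = model(x)
print(out.requires_grad)  # False
```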
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89522 Approved by: https://github.com/lezcano, https://github.com/soulitzer, https://github.com/janeyx99 commit 2de38a0714da1ddc3625e6e794e1c3ef869c841a Author: Nikita Shulga Date: Wed Nov 23 16:33:13 2022 +0000 Add `torch._dynamo` to docs (#89510) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89510 Approved by: https://github.com/msaroufim commit de0dee30d021b4546709dd7b785daba335f44942 Author: fduwjj Date: Wed Nov 23 05:29:53 2022 +0000 [PT-D][3/N] Sync TP API change to Pytorch (#89535) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89535 Approved by: https://github.com/wanchaol commit 795473ff5e861f21c6b5e0611177fcfa9e1c4e0c Author: Yukio Siraichi Date: Wed Nov 23 15:56:54 2022 +0000 Call `symint::sizes()` instead of `sizes()` on convolution error messages. (#89549) This PR fixes convolution when using `torchdynamo` with dynamic shapes. **Problem:** there are some `tensor.sizes()` calls in a few error messages. As a result, an uninformative error message was being displayed. ```python @torch._dynamo.optimize("eager") def foo(inp, w): return F.conv2d(inp, w) inp = torch.rand((1, 1, 32, 32)) w = torch.rand((1, 2, 3, 3)) foo(inp, w) ``` ----- **Before this PR:** ```python Traceback (most recent call last): File "torch/_dynamo/utils.py", line 1076, in run_node return node.target(*args, **kwargs) File "torch/_subclasses/fake_tensor.py", line 867, in __torch_dispatch__ op_impl_out = op_impl(self, func, *args, **kwargs) File "torch/_subclasses/fake_tensor.py", line 445, in conv conv_backend = torch._C._select_conv_backend(**kwargs) RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides ``` **After this PR:** ```python Traceback (most recent call last): File "torch/_dynamo/utils.py", line 1076, in run_node return node.target(*args, **kwargs) File "torch/_subclasses/fake_tensor.py", line 867, in __torch_dispatch__ op_impl_out = op_impl(self, func, *args, **kwargs) File "torch/_subclasses/fake_tensor.py", line 445, in conv conv_backend = torch._C._select_conv_backend(**kwargs) RuntimeError: Given groups=1, weight of size [1, s1, s2, s2], expected input[1, 1, s0, s0] to have s1 channels, but got 1 channels instead ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89549 Approved by: https://github.com/ezyang commit 3a858ba8e3b6f398f3b981d258e8309d1c93ba39 Merge: 685d432634 724c74d85a Author: mingfeima Date: Wed Nov 23 21:20:11 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. 
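For readers unfamiliar with the CSR path described above, a minimal sketch of the sparse-times-dense aggregation being optimized, using only public ops (the internal `_spmm_reduce` interface is not called here, the sizes are made up, and exact CSR op coverage may vary by version):

```python
import torch

crow = torch.tensor([0, 2, 3, 4])   # CSR row pointers (EdgeIndex in CSR form)
col  = torch.tensor([1, 2, 0, 1])   # column indices
val  = torch.ones(4)                # edge weights
adj  = torch.sparse_csr_tensor(crow, col, val, size=(3, 3))

feat = torch.randn(3, 8)            # node features
out  = torch.mm(adj, feat)          # "sum" reduction over each node's neighbors
```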
Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 724c74d85ac47dcbe8975e07bd8d82cb6ec1d3d3 Author: mingfeima Date: Wed Nov 23 21:20:11 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 39772a6a01c1579497f68f916e7abb56aaee1c1e Author: Jerry Zhang Date: Tue Nov 22 20:29:25 2022 -0800 [quant] Add support for quantize_per_channel in the reference flow with decomposed tensor (#89270) Summary: att, after this PR we can produce quantize_per_channel and dequantize_per_channel ops (typically used for quantizing weights) in the reference flow using decomposed tensor Test Plan: python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_per_channel_quant Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89270 Approved by: https://github.com/vkuzo commit c651944f9226661ad41fa201c61300030c1c2e18 Author: Kshiteej K Date: Wed Nov 23 08:39:45 2022 +0000 [test_nn] split hooks test from test_nn (#89201) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89201 Approved by: https://github.com/albanD commit dd140fc351e322303229ad2a5713b7ee51d35673 Author: Kshiteej K Date: Wed Nov 23 08:30:51 2022 +0000 [test_nn] move init tests from test_nn (#89202) Ref: https://github.com/pytorch/pytorch/issues/63085 Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89202 Approved by: https://github.com/albanD commit 685d432634a7e01aa6f58cff7aeaf3f894b1e4f3 Merge: 2c3d1877fb 89de8ac645 Author: mingfeima Date: Wed Nov 23 16:28:38 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. 
https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 89de8ac64544dd298ac0a4e648f2e166a5a6f0c0 Merge: c0dbc6488f 7594e043b8 Author: mingfeima Date: Wed Nov 23 16:28:38 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". 
`spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 7594e043b85b6e2c0cf4b2f257ac9606313cec90 Author: Alexander Grund Date: Wed Nov 23 06:50:05 2022 +0000 Fix Use-after-Free in qembeddingbag_byte_prepack_out (#84750) When FBGEMM is not used (either manually disabled or on platforms such as POWER where it isn't supported at all) the fallback code requests a `data_ptr` on a `Tensor` object returned by `to(ScalarType::Float)` in the same line. This object will be destroyed at the end of the line leading to a dangling pointer. On some platforms this manifests in wrong results being returned as the memory gets overwritten. On other platforms anything may happen due to this being undefined behavior, although most likely it will just crash or continue to return semi-random results which may even happen to be correct (when the memory is not reused yet) Fix this by binding the temporary object (or initial object) to a const value reference which extents its lifetime and getting the `data_ptr` from that. 
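The fix itself is on the C++ side (binding the temporary to a const reference), but the same lifetime hazard is easy to illustrate from Python with `data_ptr()` on a temporary; this is a hedged sketch, not the patched code:

```python
import torch

x = torch.arange(4, dtype=torch.int32)

# Hazardous pattern, analogous to the bug described above: the tensor returned
# by .to() is a temporary, so the integer address may point at freed memory
# once that temporary is garbage-collected.
dangling = x.to(torch.float32).data_ptr()

# Safe pattern: keep the converted tensor alive for as long as the pointer is used.
x_float = x.to(torch.float32)
ptr = x_float.data_ptr()
```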
Fixes #84748 This bug was introduced by a seemingly unrelated change in #64081 hence ccing @d1jang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84750 Approved by: https://github.com/kimishpatel commit 07dd2fe6c32948e5ca0a2871e5eb31602a9684cf Author: Nikita Karetnikov Date: Wed Nov 23 00:49:43 2022 +0100 Symintify `select` (#89326) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89326 Approved by: https://github.com/ezyang commit 29742786f38d4873576c73917e8509908132dae2 Author: Jerry Zhang Date: Mon Nov 21 14:19:04 2022 -0800 [quant] Add dequantize_per_channel in quantized_decomposed op library (#89269) Summary: att Test Plan: python test/test_quantization.py -k test_decomposed_dequantize_per_channel Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89269 Approved by: https://github.com/vkuzo commit 52669534438db3d680def4c70cb03b7e27566d7e Author: Edward Z. Yang Date: Tue Nov 22 07:47:48 2022 -0800 Add crossref debug mode for functionalization, catches stride errors (#89498) The idea is to add a custom handler to Functionalize key in Python dispatcher that runs the functionalized version along side a non functionalized version, and checks that their outputs agree in the end. (Technically, for metadata mutation we should also check the inputs, but for now we're relying on those functions returning self.) I turned this on for test_functionalize.py (new TestCrossRefFunctionalize) and found a bunch of failures that look legit. This probably doesn't interact that nicely if you're also tracing at the same time, probably need more special logic for that (directly, just disabling tracing for when we create the nested fake tensor mode, but IDK if there's a more principled way to organize this.) There are some misc fixups which I can split if people really want. - xfail_inherited_tests moved to test common_utils - Bindings for _dispatch_tls_set_dispatch_key_included, _dispatch_tls_is_dispatch_key_included and _functionalization_reapply_views_tls - Type stubs for _enable_functionalization, _disable_functionalization - all_known_overloads utility to let you iterate over all OpOverloads in all namespaces. Iterator support on all torch._ops objects to let you iterate over their members. - suspend_functionalization lets you temporarily disable functionalization mode in a context - check_metadata_matches for easily comparing outputs of functions and see if they match (TODO: there are a few copies of this logic, consolidate!) - _fmt for easily printing the metadata of a tensor without its data - _uncache_dispatch for removing a particular dispatch key from the cache, so that we force it to regenerate - check_significant_strides new kwarg only_cuda to let you also do stride test even when inputs are not CUDA - Functionalize in torch._C.DispatchKey Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89498 Approved by: https://github.com/malfet commit fe990c8db92abce3d22b24c61958c844bb4834f0 Author: Nikita Shulga Date: Wed Nov 23 03:31:17 2022 +0000 [BE] Add more `ssh` instructions (#89516) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89516 Approved by: https://github.com/huydhn commit 5b51ca6808191e9f3dcea1d43fa731488cc688bb Author: Alexander Grund Date: Wed Nov 23 03:07:22 2022 +0000 Update CUDA compiler matrix (#86360) Switch GCC/Clang max versions to be exclusive as the `include/crt/host_config.h` checks the major version only for the upper bound. 
This allows to be less restrictive and match the checks in the aforementioned header. Also update the versions using that header in the CUDA SDKs. Follow up to #82860 I noticed this as PyTorch 1.12.1 with CUDA 11.3.1 and GCC 10.3 was failing in the `test_cpp_extensions*` tests. Example for CUDA 11.3.1 from the SDK header: ``` // Error out ... // Error out ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86360 Approved by: https://github.com/ezyang commit 504570d577366f309bc7fc63fa7909f9d372d722 Author: Sergii Dymchenko Date: Wed Nov 23 02:59:25 2022 +0000 Delete unused variable assignment in _refs/__init__.py (#89538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89538 Approved by: https://github.com/huydhn commit ed32511974daafa256457784820c42f75d83d300 Author: Edward Z. Yang Date: Tue Nov 22 12:02:59 2022 -0800 Don't use explain() for --explain; instead read it off the counters (#89518) Fixes huggingface problem where example_inputs is not actually the args. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89518 Approved by: https://github.com/albanD commit f5d18574a33d2b9421f724a023676281e2076fce Author: Shen Li Date: Tue Nov 22 22:32:49 2022 +0000 Allow Module forward-pre and forward hooks to take kwargs (#89389) closes #35643 This PR is mostly borrowed from #82042. Thanks @Padarn for implementing the first version and debugging into the errors. Based on the discussion in #82042 this PR adds a with_kwargs argument to register_forward_pre_hook and register_forward_hook methods. When the arg is set to true, the provided hook must accept kwargs args. Under the hook, this PR adds a `_forward_pre_hooks_with_kwargs` and a `_forward_hook_with_kwargs` set to keep track of which hooks accept kwargs. Differential Revision: [D41431111](https://our.internmc.facebook.com/intern/diff/D41431111) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89389 Approved by: https://github.com/soulitzer commit 4935b597ac0c93e023637ed4755db84398ccc41b Author: Thomas <37830237+thomaslin2020@users.noreply.github.com> Date: Wed Nov 23 02:18:03 2022 +0000 Added implementation and tests for MPS Hardswish (#87952) Fixes issue #86807 by adding MPS backend support for aten::hardswish. Registered mps hardswish functions in native_functions.yaml, and added the code implementation to Activations.mm. 
Added functions: - hardswish_mps - hardswish_mps_ - hardswish_backward_mps - hardswish_out_mps Added test in test/test_mps.py and tested code using the command `python3 test/test_mps.py -k test_hardswish` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87952 Approved by: https://github.com/kulinseth, https://github.com/kit1980 commit 1cfd3858ac54fe3883534309081631a0a892ba3f Author: Animesh Jain Date: Wed Nov 23 00:48:00 2022 +0000 [inductor] Use dense masks for indirect indexing (#89524) Fixes https://github.com/pytorch/torchdynamo/issues/1654 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89524 Approved by: https://github.com/jansel commit 26322544b87071dabee2f47242584fd3c8a9fbd7 Author: Will Constable Date: Tue Nov 22 19:24:00 2022 +0000 Add limited FSDP correctness to torchdynamo benchmark (#89469) - Does not do recursive wrapping - Only supports accuracy bench - Mainly useful for sweeping over models for correctness, in part to evaluate whether dynamo support for FSDP is breaking anywhere Pull Request resolved: https://github.com/pytorch/pytorch/pull/89469 Approved by: https://github.com/davidberard98, https://github.com/aazzolini commit 7f4b4d282702265f8e1da337d3027df7a3ba17d9 Author: Nikita Shulga Date: Wed Nov 23 00:07:59 2022 +0000 [Inductor] Limit g++12 installation to Linux (#89472) According to https://anaconda.org/conda-forge/gxx/ its only available on Linux Pull Request resolved: https://github.com/pytorch/pytorch/pull/89472 Approved by: https://github.com/soumith, https://github.com/jgong5 commit b50699f24733e53779112b56eafe39f2cc369521 Author: Will Constable Date: Tue Nov 22 19:24:00 2022 +0000 Fix inductor fallback_random for dropout/rand_like (#89515) - Avoid fx graph rewrite that replaces certain ops with ones using triton random - Keep track of replacement ops using triton random, so it is possible to not disable all replacements when using fallback_random Pull Request resolved: https://github.com/pytorch/pytorch/pull/89515 Approved by: https://github.com/ngimel commit 8bf8e4d71e8fb125a3bbf6cc951e661e453598bb Author: William Wen Date: Tue Nov 22 23:42:09 2022 +0000 [dashboard] Add metric graphs back to dashboard (#89531) Title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89531 Approved by: https://github.com/davidberard98 commit ce856cee7eeea9a6eb5ed30fa512b38b3d8f3edf Author: Kshiteej K Date: Tue Nov 22 22:55:41 2022 +0000 [test_nn] fix missing class attributes for NNTestCase (#89200) Missed setting these class variable 😓 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89200 Approved by: https://github.com/albanD commit 391b593ca262432ccba1939f7448275cfd4f62e6 Author: Jerry Zhang Date: Mon Nov 21 14:19:03 2022 -0800 [quant] Add quantize_per_channel in quantized_decomposed op library (#89268) Summary: att Test Plan: python test/test_quantization.py -k test_decomposed_quantize_per_channel Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89268 Approved by: https://github.com/vkuzo commit 5bba783d2170ddca8f4dd781d287dedb69de312a Author: Animesh Jain Date: Tue Nov 22 22:25:30 2022 +0000 [dashboard] Remove aot_cudagraphs and nvprims_nvfuser (#89514) Helps speeding up Dashboard runs We will bring these back when the backends are ready to be tested on full model suite. 
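Returning to the kwargs-aware hooks added in #89389 above, a minimal sketch of the described behavior (the module and the `scale` kwarg are made up; the hook signatures follow the commit's description rather than being quoted from the PR):

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def forward(self, x, scale=1.0):
        return x * scale

def pre_hook(module, args, kwargs):
    # With with_kwargs=True the pre-hook also receives (and may rewrite)
    # the keyword arguments of the forward call.
    kwargs["scale"] = 2.0
    return args, kwargs

def post_hook(module, args, kwargs, output):
    return output + 1.0

m = Scale()
m.register_forward_pre_hook(pre_hook, with_kwargs=True)
m.register_forward_hook(post_hook, with_kwargs=True)
print(m(torch.ones(2)))  # tensor([3., 3.])
```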
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89514 Approved by: https://github.com/SherlockNoMad commit ea920a11156cb7a037feb45285c0ce6520b3801c Author: Manuel Candales Date: Tue Nov 22 22:15:54 2022 +0000 [Vulkan][TCC] Add tests for quantize_per_tensor and dequantize (#89496) Summary: Add tests for quantize per tensor and dequantize Test Plan: On Mac ``` cd ~/fbsource buck1 run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 ``` On Android ``` cd ~/fbsource buck1 build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test adb shell "/data/local/tmp/vulkan_quantized_api_test" ``` Reviewed By: salilsdesai Differential Revision: D41047097 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89496 Approved by: https://github.com/digantdesai commit 74e62a1fefb7100689169dc12fd70095de54079d Author: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com> Date: Tue Nov 22 22:15:38 2022 +0000 [ROCm] Optimize layer norm backward kernel for ROCm (#87635) We observed that the native PyTorch LayerNormBackwardKernelImplInternal has suboptimal performance for certain input sizes on AMD GPUs especially when `fs` (=`config_m` in our benchmark script) is large and `bs` (=`config_n` in our benchmark script) is small (commonly seen in [the CvT model](https://arxiv.org/abs/2103.15808)) in the benchmark script of [PR #68238](https://github.com/pytorch/pytorch/pull/68238#issue-1051621716) on AMD GPUs. This PR is to replace `GammaBetaBackwardCUDAKernel` with the Apex layernorm backward kernel with some ROCm-specific parameter tuning when `fs` (=`config_m`) is larger than 512 on AMD GPUs. There are a few PRs for LayerNorm kernel: - https://github.com/pytorch/pytorch/pull/26201 - https://github.com/pytorch/pytorch/pull/27634 - https://github.com/pytorch/pytorch/pull/68238 Therefore, we have tested and compared the kernel before and at this PR with the input shapes in the last two PRs along with those commonly used in the CvT model on AMD MI100. 
--- **Current** M | N | fwd (half) | fwdbwd (half) | fwd (float) | fwdbwd (float) -- | -- | -- | -- | -- | -- 50432 | 384 | 0.387256 | 1.372758 | 0.378975 | 1.47892 50176 | 384 | 0.38231 | 1.362416 | 0.378084 | 1.473886 200704 | 192 | 0.997859 | 4.315875 | 0.989306 | 4.560827 802816 | 64 | 3.671828 | 16.68013 | 3.613515 | 16.827946 200 | 256 | 0.066503 | 0.332096 | 0.071422 | 0.325349 1000 | 256 | 0.071848 | 0.333355 | 0.073038 | 0.334753 6000 | 256 | 0.086334 | 0.345139 | 0.086834 | 0.347429 6272 | 256 | 0.088601 | 0.347906 | 0.087855 | 0.351245 200 | 512 | 0.071626 | 0.329726 | 0.073798 | 0.326878 1000 | 512 | 0.073975 | 0.330226 | 0.074166 | 0.332751 6000 | 512 | 0.099617 | 0.362367 | 0.100095 | 0.378313 6272 | 512 | 0.100378 | 0.358066 | 0.099857 | 0.395982 200 | 1024 | 0.072954 | 0.326382 | 0.073899 | 0.333007 1000 | 1024 | 0.0743 | 0.325532 | 0.071126 | 0.330991 6000 | 1024 | 0.127025 | 0.390084 | 0.128692 | 0.471504 6272 | 1024 | 0.130704 | 0.403536 | 0.135244 | 0.487133 200 | 1536 | 0.070331 | 0.339169 | 0.070086 | 0.331015 1000 | 1536 | 0.075085 | 0.330042 | 0.076295 | 0.328778 6000 | 1536 | 0.148889 | 0.44949 | 0.155781 | 0.659987 6272 | 1536 | 0.154939 | 0.478871 | 0.17673 | 0.716025 200 | 2048 | 0.070269 | 0.335585 | 0.072804 | 0.334655 1000 | 2048 | 0.080094 | 0.326991 | 0.080426 | 0.32685 6000 | 2048 | 0.187888 | 0.623023 | 0.245762 | 0.981635 6272 | 2048 | 0.195431 | 0.65244 | 0.262574 | 1.008141 200 | 3072 | 0.068205 | 0.339428 | 0.073068 | 0.344034 1000 | 3072 | 0.087554 | 0.328899 | 0.09218 | 0.346433 6000 | 3072 | 0.240352 | 0.905058 | 0.368135 | 1.280462 6272 | 3072 | 0.26179 | 0.959387 | 0.387782 | 1.476524 128 | 2097152 | 5.905976 | 22.724793 | 10.287974 | 30.242092 256 | 1048576 | 4.561596 | 19.554308 | 10.223171 | 29.42371 512 | 524288 | 4.146751 | 22.7247 | 11.404285 | 39.175902 1024 | 262144 | 5.193135 | 23.403325 | 11.334512 | 38.947192 2048 | 131072 | 4.992907 | 23.377801 | 11.400286 | 40.889191 4096 | 65536 | 5.429488 | 24.275701 | 11.196778 | 41.4751 8192 | 32768 | 5.35758 | 21.360312 | 10.535418 | 42.875646 16384 | 16384 | 5.44947 | 20.852605 | 10.357685 | 34.603408 32768 | 8192 | 4.688925 | 17.379392 | 9.635596 | 31.188271 --------- **At this PR** M | N | fwd (half) | fwdbwd (half) | fwd (float) | fwdbwd (float) -- | -- | -- | -- | -- | -- 50432 | 384 | 0.38797 | 0.93103 | 0.37966 | 1.15283 50176 | 384 | 0.3874 | 0.96417 | 0.38462 | 1.18595 200704 | 192 | 1.00002 | 2.40876 | 0.99224 | 2.55579 802816 | 64 | 3.67348 | 7.98658 | 3.61871 | 7.72404 200 | 256 | 0.07292 | 0.35119 | 0.07195 | 0.32602 1000 | 256 | 0.07354 | 0.33325 | 0.07237 | 0.33742 6000 | 256 | 0.08819 | 0.33283 | 0.08453 | 0.3279 6272 | 256 | 0.0886 | 0.33446 | 0.08774 | 0.33426 200 | 512 | 0.0701 | 0.33505 | 0.07072 | 0.33018 1000 | 512 | 0.07042 | 0.33442 | 0.074 | 0.33206 6000 | 512 | 0.09931 | 0.34956 | 0.09895 | 0.3572 6272 | 512 | 0.10103 | 0.32976 | 0.10041 | 0.36635 200 | 1024 | 0.07144 | 0.33579 | 0.07209 | 0.33216 1000 | 1024 | 0.0736 | 0.32803 | 0.07286 | 0.32936 6000 | 1024 | 0.12584 | 0.38916 | 0.12852 | 0.48273 6272 | 1024 | 0.13053 | 0.38804 | 0.13464 | 0.49545 200 | 1536 | 0.07159 | 0.3396 | 0.07062 | 0.33545 1000 | 1536 | 0.07443 | 0.33239 | 0.07366 | 0.33204 6000 | 1536 | 0.14959 | 0.45043 | 0.15826 | 0.69119 6272 | 1536 | 0.1542 | 0.47644 | 0.18249 | 0.72208 200 | 2048 | 0.07258 | 0.33982 | 0.07412 | 0.33859 1000 | 2048 | 0.0793 | 0.32816 | 0.07864 | 0.32583 6000 | 2048 | 0.18973 | 0.571 | 0.25506 | 0.91796 6272 | 2048 | 0.19719 | 0.64208 | 0.26445 | 0.95055 200 | 3072 | 
0.07092 | 0.33867 | 0.07104 | 0.34695 1000 | 3072 | 0.08727 | 0.33144 | 0.09144 | 0.36633 6000 | 3072 | 0.24683 | 0.87275 | 0.37761 | 1.3289 6272 | 3072 | 0.26437 | 0.91178 | 0.38496 | 1.53694 128 | 2097152 | 6.27936 | 23.69425 | 10.40004 | 30.13699 256 | 1048576 | 4.5404 | 19.47675 | 10.28494 | 29.36936 512 | 524288 | 4.13951 | 18.78771 | 10.09557 | 32.67083 1024 | 262144 | 4.47576 | 18.00411 | 9.56488 | 31.47117 2048 | 131072 | 4.28026 | 16.95619 | 9.40297 | 30.82845 4096 | 65536 | 4.2653 | 16.5018 | 9.03315 | 30.08392 8192 | 32768 | 4.25613 | 16.13583 | 8.9258 | 30.75296 16384 | 16384 | 4.20256 | 16.38207 | 9.52587 | 31.31113 32768 | 8192 | 4.20231 | 16.19452 | 9.31478 | 31.03514 --------- **Performance Improvement (%)**
M | N | fwdbwd, torch.float16 | fwdbwd, torch.float32
-- | -- | -- | --
50432 | 384 | 32.178 | 22.049
50176 | 384 | 29.231 | 19.536
200704 | 192 | 44.188 | 43.962
802816 | 64 | 52.119 | 54.100
200 | 256 | -5.750 | -0.206
1000 | 256 | 0.031 | -0.797
6000 | 256 | 3.566 | 5.621
6272 | 256 | 3.865 | 4.836
200 | 512 | -1.615 | -1.010
1000 | 512 | -1.270 | 0.208
6000 | 512 | 3.534 | 5.581
6272 | 512 | 7.905 | 7.483
200 | 1024 | -2.883 | 0.254
1000 | 1024 | -0.767 | 0.493
6000 | 1024 | 0.237 | -2.381
6272 | 1024 | 3.840 | -1.707
200 | 1536 | -0.127 | -1.340
1000 | 1536 | -0.711 | -0.992
6000 | 1536 | -0.209 | -4.728
6272 | 1536 | 0.508 | -0.846
200 | 2048 | -1.262 | -1.176
1000 | 2048 | -0.358 | 0.312
6000 | 2048 | 8.350 | 6.487
6272 | 2048 | 1.588 | 5.713
200 | 3072 | 0.223 | -0.848
1000 | 3072 | -0.773 | -5.743
6000 | 3072 | 3.570 | -3.783
6272 | 3072 | 4.962 | -4.092
128 | 2097152 | -4.266 | 0.348
256 | 1048576 | 0.397 | 0.185
512 | 524288 | 17.325 | 16.605
1024 | 262144 | 23.070 | 19.195
2048 | 131072 | 27.469 | 24.605
4096 | 65536 | 32.023 | 27.465
8192 | 32768 | 24.459 | 28.274
16384 | 16384 | 21.439 | 9.514
32768 | 8192 | 6.818 | 0.491
---------

**Benchmark script of this PR**

```
from distutils.command.config import config
import torch
from torch.nn import LayerNorm
import timeit

number_runs = 1000 # TODO: Modify this to save time!

def test_forward(layer_norm_cuda, input_cuda):
    layer_norm_cuda(input_cuda); torch.cuda.synchronize()

def test_backward(out_cuda, layer_norm_grad_cuda, create_graph):
    out_cuda.backward(layer_norm_grad_cuda, retain_graph=True, create_graph=create_graph); torch.cuda.synchronize()

def test_fwdbwd(input_cuda, layer_norm_cuda, gO):
    input_cuda.grad = None
    layer_norm_cuda.zero_grad(set_to_none=True)
    out = layer_norm_cuda(input_cuda)
    out.backward(gO)
    torch.cuda.synchronize()

def benchmark(config_m, config_n):
    print("M | N | fwd (half) | fwdbwd (half) | fwd (float) | fwdbwd (float)")
    if len(config_m) != len(config_n):
        print("Please make sure the lengths of config_m and config_m are the same.")
    for i in range(len(config_m)):
        normalized_shape = config_n[i]
        results = [config_m[i], config_n[i]]
        for dtype in (torch.half, torch.float):
            if dtype == torch.half:
                layer_norm_cuda = LayerNorm(normalized_shape).half().cuda()
            else:
                layer_norm_cuda = LayerNorm(normalized_shape).cuda()
            input_cuda = torch.randn(config_m[i], config_n[i], device='cuda', dtype=dtype, requires_grad=True)
            result_fwd = timeit.timeit(lambda: test_forward(layer_norm_cuda, input_cuda), number=number_runs)
            results.append(result_fwd / number_runs * 1000)
            gO = torch.rand_like(input_cuda)
            result_fwdbwd = timeit.timeit(lambda: test_fwdbwd(input_cuda, layer_norm_cuda, gO), number=number_runs)
            results.append(result_fwdbwd / number_runs * 1000)
        print('{:09d}|{:09d}|{:9.5f}|{:9.5f}|{:9.5f}|{:9.5f}'.format(results[0], results[1], results[2], results[3], results[4], results[5]))
    print("Times are in microseconds (us).")

config_m_cvt = [50432, 50176, 200704, 802816]
config_n_cvt = [384, 384, 192, 64]
config_m_68238 = [200, 1000, 6000, 6272, 200, 1000, 6000, 6272, 200, 1000, 6000, 6272, 200, 1000, 6000, 6272, 200, 1000, 6000, 6272, 200, 1000, 6000, 6272]
config_n_68238 = [256,256,256,256,512,512,512,512,1024,1024,1024,1024,1536,1536,1536,1536,2048,2048,2048,2048,3072,3072,3072,3072]
config_m_27634 = [128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
config_n_27634 = [2097152, 1048576, 524288, 262144, 131072, 65536, 32768, 16384, 8192]
config_m = config_m_cvt + config_m_68238 + config_m_27634
config_n = config_n_cvt + config_n_68238 + config_n_27634
benchmark(config_m, config_n)
```

CC: @jeffdaily Pull Request resolved: https://github.com/pytorch/pytorch/pull/87635 Approved by: https://github.com/jataylo, https://github.com/jeffdaily, https://github.com/ezyang commit 00b7d8ef237f4f0fc3d247e016d504095b415d1f Author: Catherine Lee Date: Tue Nov 22 21:52:50 2022 +0000 Shard windows periodic job more (#89455) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89455 Approved by: https://github.com/huydhn commit 77d7f2c65945438e0292b270998cea07c0d9d3d8 Author: William Wen Date: Tue Nov 22 21:17:36 2022 +0000 [dashboard] Add commit date & fix date related issues (#89517) Add commit date to build summary of dashboard. Make the date of the run reflective of when the run started, not when the run ended. Use PST (UTC -8) to determine day, rather than GMT (UTC +0).
Test comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1324176119 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89517 Approved by: https://github.com/anijain2305 commit 177baf366ad16b868ab19a8776ae0e636f9d1951 Author: Alexander Grund Date: Tue Nov 22 20:29:07 2022 +0000 Fix vectorized trigonometric functions for VSX (#86453) Replace the remaining hand-written code in vec256_float_vsx.h by calls to Sleef functions similar to what was done in #59382 & #82646 after #41541 This fixes wrong results for e.g. `sin(1e20)`. Fixes #85978 To fix #85978 I only needed to do the sin/cos functions to make the test pass but to not encounter the same issue again and again (see the previous PRs and issues) I checked the whole file for similar functions where a Sleef function could be used and changed those too. In the diff I've noticed the faulty whitespace so to make this complete I fixed that too, so it should now be done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86453 Approved by: https://github.com/malfet commit ac3004757ef64b1ed1ff884a39d2a34cdfb5f772 Author: Alexander Grund Date: Tue Nov 22 20:27:27 2022 +0000 Relax tolerance for test_out_addbmm_cpu_float32 (#86365) The test may fail due to slightly different values caused by different order of matrizes in SGEMM: > Mismatched elements: 1 / 50 (2.0%) > Greatest absolute difference: 1.430511474609375e-05 at index (4, 5) (up to 1e-05 allowed) > Greatest relative difference: 4.65393206065873e-06 at index (4, 5) (up to 1.3e-06 allowed) Observed on POWER (ppc64le) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86365 Approved by: https://github.com/mruberry, https://github.com/kit1980 commit d053d513432bea75ae783529bf9f639f977a47d2 Author: Alexander Grund Date: Tue Nov 22 20:25:38 2022 +0000 (Further) limit world size in test_fsdp_pure_fp16 (#86280) Test still fails when run on 5 A100 GPUs, although it works with 5 V100s. Using 4 GPUs seems to be fine. Followup to #85957 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86280 Approved by: https://github.com/awgu, https://github.com/kit1980 commit c2ce79f06eb4a8cec2f9cfbdf3a1a4021a0a4cfa Author: Li-Huai (Allan) Lin Date: Tue Nov 22 19:33:21 2022 +0000 Fix dev-discuss link in the maintainer docs (#89493) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89493 Approved by: https://github.com/H-Huang commit ef8b91fec73884f3043da8f541176ab7b4c57364 Author: Fuzzkatt Date: Tue Nov 22 19:05:56 2022 +0000 enable previously failing UCC distributed_test.py tests (#89023) Enables previously failing UCC distributed_test.py tests that are now fixed due to either ProcessGroupUCC barrier blocking fix (https://github.com/pytorch/pytorch/pull/86961) or UCC-side timeout error handling fix: (https://github.com/openucx/ucc/pull/679/files). Bump upstream UCC version to build UCC with timeout error handling fix merged in. 
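As a quick way to see the class of problem fixed by the VSX change in #86453 above, the vectorized result can be compared against libm on the same float32 input; this is a hedged sketch and does not assert a particular reference value:

```python
import math
import torch

x = torch.full((8,), 1e20, dtype=torch.float32)

# Before the fix the hand-written VSX path returned wrong values for large
# arguments; with the Sleef-backed path the two printed values should agree
# closely.
print(torch.sin(x)[0].item())
print(math.sin(float(x[0])))
```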
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89023 Approved by: https://github.com/kwen2501, https://github.com/malfet commit f281f435a8c60cf5781688bee3e4ff258c52344f Author: Animesh Jain Date: Tue Nov 22 18:42:13 2022 +0000 Fix benchmarks - xla tensor test (#89509) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89509 Approved by: https://github.com/ngimel, https://github.com/shunting314 commit 7c0bb61291d62c449b78ce4930c27cbbd8ffac92 Author: mantaionut Date: Tue Nov 22 18:37:14 2022 +0000 Force numpy prod to use 64 bit integers on Windows in some tests (#88089) This fixes some prod and masked.prod tests on Windows. np.prod uses int32 on Windows so it overflows. On Linux it uses by default int64. Fixes #77305 Fixes #77320 Fixes #77334 Fixes #77335 Fixes #77336 Fixes #77337 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88089 Approved by: https://github.com/mruberry commit f4898daaeeea130a009330674726b5492982af85 Author: PratsBhatt Date: Tue Nov 22 18:00:01 2022 +0000 Add cached conda env file for Buck CI workflow (#89422) Fixes - T137631262 Caching conda dependencies for build workflows. Conda dependencies have been gathered from the workflow https://github.com/pytorch/pytorch/blob/master/.github/workflows/_buck-build-test.yml The pull request updates the action from `conda-incubator/setup-miniconda@v2` to `pytorch/test-infra/.github/actions/setup-miniconda@main` as it supports caching. Test Plan: Running the `ciflow/periodic` which runs the ci builds `buck-build-test` workflow. Expected output is to have all the conda dependencies cached. Screenshot 2022-11-22 at 15 44 20 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89422 Approved by: https://github.com/huydhn commit 9c0bf9387c1e39efda268a1fb300e8f87b7ef0e6 Author: anjali411 Date: Tue Nov 22 13:33:55 2022 +0000 Meta impl for linalg_cholesky and linalg_cholesky_ex (#89430) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89430 Approved by: https://github.com/ezyang commit c4e08387c1542eca67dc6e40661a50006bc879ff Author: Jerry Zhang Date: Mon Nov 21 14:19:02 2022 -0800 [quant][fx] Support producing reference quantized patterns for dynamic quantization (#89248) Summary: split the is_decomposed logic for `_replace_observer_with_quantize_dequantize_node` in a separate function and added support for dynamic quantization in the decomposed version of this function. 
In case of dynamic quantization, we'll produce the following reference quantized pattern in decomposed mode: ``` x -> choose_qparams -> quantize_per_tensor -> dequantize_per_tensor -> linear ``` Test Plan: python test/test_quantization.py -k test__convert_to_reference_decomposed_fx_dynamic_quant Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89248 Approved by: https://github.com/vkuzo commit 2823fc5e4c73a36ae1859889d34f4cc0d4145ae5 Author: Bin Bao Date: Tue Nov 22 00:30:12 2022 +0000 [inductor] generate nan in the cpp backend (#89289) Summary: Fixes https://github.com/pytorch/torchdynamo/issues/1797 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89289 Approved by: https://github.com/ngimel, https://github.com/jansel, https://github.com/jgong5 commit 5797f74924d1f19cbb10e689a0f8112665fc07d9 Author: Howard Huang Date: Mon Nov 21 11:05:39 2022 -0800 [19/N] Add monitored_barrier custom op with CPU implementation (#89318) Differential Revision: [D41415324](https://our.internmc.facebook.com/intern/diff/D41415324) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89318 Approved by: https://github.com/kwen2501 commit be22b5d39f37aa501d07fa3ff3b9448826d48eca Author: Howard Huang Date: Mon Nov 21 11:05:38 2022 -0800 [18/N] Add allgather_coalesced custom op with CPU/CUDA implementations (#89317) Differential Revision: [D41415321](https://our.internmc.facebook.com/intern/diff/D41415321) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89317 Approved by: https://github.com/kwen2501 commit 2c3d1877fbb10736f142fcb85952890a69ce3047 Merge: 744f52223a c0dbc6488f Author: mingfeima Date: Tue Nov 22 21:41:47 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit c0dbc6488f3baa5d413ce36d2e93e7b7db21806a Author: mingfeima Date: Tue Nov 22 21:41:47 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit d9cbe7764e1af938d7edc23ffa873703d960df6c Author: Edward Z. Yang Date: Tue Nov 22 05:02:45 2022 -0800 Make aten.copy preserve strides (hf_Longformer) (#89464) Fixes https://github.com/pytorch/torchdynamo/issues/1888 Signed-off-by: Edward Z. Yang Differential Revision: [D41460986](https://our.internmc.facebook.com/intern/diff/D41460986) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89464 Approved by: https://github.com/bdhirsh commit 2d94fd3b198a31f28df10b7d9b3fcd526a82f24a Author: Manuel Candales Date: Tue Nov 22 11:05:58 2022 +0000 [Vulkan][TCC] Fix quantized shaders (#89456) Summary: Fix rounding issue in quantized shaders Test Plan: On Mac ``` cd ~/fbsource buck1 run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 ``` On Android ``` cd ~/fbsource buck1 build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_quantized_api_test_binAndroid\#android-arm64 --show-output adb push buck-out/gen/xplat/caffe2/pt_vulkan_quantized_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_quantized_api_test adb shell "/data/local/tmp/vulkan_quantized_api_test" ``` Reviewed By: salilsdesai Differential Revision: D41047095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89456 Approved by: https://github.com/kirklandsign, https://github.com/digantdesai commit 0f7dca17332152fdd28270eb95398efbe8212ca2 Author: Aleksandar Samardžić Date: Mon Nov 21 04:22:00 2022 +0000 Vectorized CPU code implementing right shift operator. 
(#88990) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88990 Approved by: https://github.com/lezcano, https://github.com/peterbell10 commit 1d6a188d08829b1aee28eb1e6255d5bf43a77f16 Author: lezcano Date: Sat Nov 19 01:00:03 2022 +0000 Reland Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761) (#84624) Reland https://github.com/pytorch/pytorch/pull/81761 Differential Revision: [D39332292](https://our.internmc.facebook.com/intern/diff/D39332292) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84624 Approved by: https://github.com/kit1980 commit 6b085d5cadffb10591c450623f93a21dd3dd786d Author: Iris Date: Tue Nov 22 07:49:06 2022 +0000 [Checkpoint][2D][2/N] Add traverse for distributed checkpoint to core distributed (#89398) This PR moves traverse and its test to torch.distributed.checkpoint. This is a pre-req for enabling 2D checkpoint. This is used when flatten nested dict and flatten sharded tensors. Docstring and comments will be added in the following PRs. Test: ``` python3 test/distributed/_tensor/parallel/test_2d_parallel.py ``` and CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/89398 Approved by: https://github.com/wanchaol commit 7b0650d5cf4897089f32c011504d2b2d185cc60a Author: Mike Iovine Date: Tue Nov 22 06:26:10 2022 +0000 Back out "[static-runtime] change the backend for permute_copy" (#89463) Summary: This permute copy change seems to be causing huge regressions on machines without AVX512. Revert to mitigate. This shouldn't be problematic since the improvement from changing it was super small anyways. Differential Revision: D41450088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89463 Approved by: https://github.com/hlu1 commit f2cf1b0f5e98094cf7a97439ebdf3679ceee04b0 Author: Nikita Shulga Date: Tue Nov 22 05:48:43 2022 +0000 Revert submodule updates introduced by #89157 (#89449) Reverts updates that were introduced by https://github.com/pytorch/pytorch/pull/89157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89449 Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/clee2000 commit 40cf214f2d18b3b8af5354ddc5dad8156ea32520 Author: Wang, Eikan Date: Mon Nov 21 03:31:51 2022 +0000 Support masked_fill to address the GPT2 performance issue (#89274) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89274 Approved by: https://github.com/jgong5, https://github.com/jansel commit e545caa50f3cd893ca0419543e57af08a7de85b5 Author: Shunting Zhang Date: Tue Nov 22 03:57:01 2022 +0000 dynamo/torchxla integration: trace on xla rather than eager (#88904) In #87741 we added the inference support for dynamo/torchxla integration. Later on in #88449 we attempt to add the training support. That attempt is not smooth because - we try 2 things together 1. let dynamo trace the model on xla rather than eager 2. enable training - It turns out neither of these two tasks are trivial enough. Furthermore, item 2 (enable training) depends on item 1 (tracing on xla). We enable training via AOTAutograd. AOTAutograd lift all model parameters/buffers as graph inputs. Without item 1 being done, we would need copy all graph inputs (including model parameters/buffers) from eager device to xla devices. That hurts performance a lot. Have a cache to map eager parameter to XLA parameter does not solve the problem since the update on either will not sync automatically to the other. They will easily go out of sync. 
This PR lets dynamo trace the model on XLA rather than eager. This is a preparation step to enabling training. Also, tracing on XLA makes the data movement more efficient. We see a 1.5x geomean speedup compared to the previous 1.38x.
```
+-------------------------+--------------------+-------------------------+
| Model                   | XLA (trace once)   | XLA (trace everytime)   |
+=========================+====================+=========================+
| resnet18                | 1.38               | 1.008                   |
+-------------------------+--------------------+-------------------------+
| resnet50                | 1.227              | 0.998                   |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d         | 1.544              | 1.008                   |
+-------------------------+--------------------+-------------------------+
| alexnet                 | 1.085              | 1.045                   |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2            | 2.028              | 1.013                   |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0              | 1.516              | 0.995                   |
+-------------------------+--------------------+-------------------------+
| squeezenet1_1           | 0.868              | 1.01                    |
+-------------------------+--------------------+-------------------------+
| vgg16                   | 1.099              | 1.008                   |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch            | 3.26               | 1.027                   |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer | 2.182              | 1.015                   |
+-------------------------+--------------------+-------------------------+
| geomean                 | 1.50389            | 1.01261                 |
+-------------------------+--------------------+-------------------------+
```
Example command
```
GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --only resnet18 --backend=torchxla_trace_once
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88904 Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/jansel commit 1dae59ba168fe3c4c11c102f935101c3e4f3b105 Author: Iris Date: Tue Nov 22 03:52:32 2022 +0000 [Checkpoint][2D][1/N] Add dedup_tensors for distributed checkpoint to core distributed (#89399) This PR moves dedup_tensors and its test to torch.distributed.checkpoint. This is a pre-req for enabling 2D checkpoint. This removes duplicated shards in a list of SavePlan. It is used when saving DT with replicated placement. Docstring and comments will be added in the following PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89399 Approved by: https://github.com/wanchaol commit ce342ed2d3a4a0dd8151abe80bfe0bb06a7b0ae9 Author: Huy Do Date: Tue Nov 22 03:39:15 2022 +0000 Fix retrying logic for successful unittest tests under --rerun-disabled-tests mode (#89454) When looking into Rockset data for a disabled unittest, for example `testAdd`, I see that it's re-run only 3 times instead of 50+ times as expected under rerun-disabled-tests mode
```
[
  {
    "name": "testAdd",
    "classname": "TestLazyReuseIr",
    "filename": "lazy/test_reuse_ir.py",
    "flaky": false,
    "num_green": 3,
    "num_red": 0
  }
]
```
It turns out that I made a mistake mixing `RERUN_DISABLED_TESTS` and `report_only` into `(RERUN_DISABLED_TESTS or report_only) and num_retries_left < MAX_NUM_RETRIES` in https://github.com/pytorch/pytorch/pull/88646. The retrying logic for successful tests under rerun-disabled-tests mode is never executed because num_retries_left would be equal to MAX_NUM_RETRIES (not smaller) if the very first run succeeds.
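To make the failure mode concrete, here is a hedged walk-through of the quoted condition (variable names mirror the text above; the snippet is illustrative, not the test-runner code):
```python
# On the very first successful run, num_retries_left still equals MAX_NUM_RETRIES,
# so the combined check below is False and no rerun is ever scheduled.
RERUN_DISABLED_TESTS, report_only = True, False
MAX_NUM_RETRIES = 50
num_retries_left = MAX_NUM_RETRIES

retry = (RERUN_DISABLED_TESTS or report_only) and num_retries_left < MAX_NUM_RETRIES
print(retry)  # False
```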
Thus, the sample test `testAdd` finishes right away (1 success count) * `report_only` and `RERUN_DISABLED_TESTS` are 2 different things and shouldn't be mixed together. RERUN_DISABLED_TESTS has the higher priority. * We also don't want to retry skipped tests under rerun-disabled-tests mode because they are only skipped due to `check_if_enable` check `Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run` * CI https://github.com/pytorch/pytorch/actions/runs/3518228784 generates https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3518228784/1/artifact/test-reports-test-default-4-4-linux.4xlarge.nvidia.gpu_9627285587.zip in which `testAdd` is correctly called multiple times and `TestLazyReuseIr` is skipped correctly * Locally ``` $ python test/run_test.py --verbose -i lazy/test_reuse_ir Ignoring disabled issues: [] Selected tests: lazy/test_reuse_ir Prioritized test from test file changes. reordering tests for PR: prioritized: [] the rest: ['lazy/test_reuse_ir'] Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /Users/huydo/Storage/mine/pytorch/test/.pytorch-slow-tests.json Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /Users/huydo/Storage/mine/pytorch/test/.pytorch-disabled-tests.json parallel (file granularity) tests: lazy/test_reuse_ir serial (file granularity) tests: Ignoring disabled issues: [] Ignoring disabled issues: [] Running lazy/test_reuse_ir ... [2022-11-21 13:21:07.165877] Executing ['/Users/huydo/miniconda3/envs/py3.9/bin/python', '-bb', 'lazy/test_reuse_ir.py', '-v', '--import-slow-tests', '--import-disabled-tests', '--rerun-disabled-tests'] ... [2022-11-21 13:21:07.166279] Expand the folded group to see the log file of lazy/test_reuse_ir Running tests... ---------------------------------------------------------------------- Test results will be stored in test-reports/python-unittest/lazy.test_reuse_ir testAdd (__main__.TestLazyReuseIr) ... ok (1.215s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 50 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 49 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 48 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 47 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 46 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 45 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 44 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 43 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 42 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 41 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 40 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 39 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 38 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 37 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 36 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... 
testAdd succeeded - num_retries_left: 35 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 34 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 33 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 32 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 31 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 30 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 29 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 28 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 27 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 26 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 25 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 24 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 23 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 22 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 21 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 20 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 19 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 18 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 17 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 16 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 15 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 14 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 13 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 12 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 11 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 10 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 9 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 8 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 7 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 6 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 5 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 4 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 3 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 2 ok (0.001s) testAdd (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 1 ok (0.001s) testAddSub (__main__.TestLazyReuseIr) ... testAdd succeeded - num_retries_left: 0 skip: Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run (0.001s) testAddSubFallback (__main__.TestLazyReuseIr) ... 
skip: Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run (0.001s) testBatchNorm (__main__.TestLazyReuseIr) ... skip: Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run (0.001s) ---------------------------------------------------------------------- Ran 54 tests in 1.264s OK (skipped=3) ``` Here is the sample rockset query ``` WITH added_row_number AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY name, classname, filename ORDER BY _event_time DESC) AS row_number FROM commons.rerun_disabled_tests ) SELECT name, classname, filename, flaky, num_green, num_red FROM added_row_number WHERE row_number = 1 AND name = 'testAdd' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89454 Approved by: https://github.com/clee2000 commit 338f61904421bef1b46c9d614470b523c0696654 Author: PyTorch MergeBot Date: Tue Nov 22 03:38:53 2022 +0000 [vision hash update] update the pinned vision hash (#89471) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89471 Approved by: https://github.com/pytorchbot commit 00b9473ad68da319a1dc3f655cc1a97490ae9669 Author: fduwjj Date: Tue Nov 22 03:05:50 2022 +0000 [PT-D][Tensor Parallelism][2/N] Sync TP API change to PT prod (#89467) This is part of TP Beta Release efforts. ref: https://github.com/pytorch/tau/issues/576 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89467 Approved by: https://github.com/wanchaol commit 82713a1cc4589f084ecbcb591d1f9b12570cac43 Author: Animesh Jain Date: Tue Nov 22 02:23:21 2022 +0000 [inductor][compilation time] Fallback when kernel size for avg/max pool is large (#89448) This fixes compilation time for yolov3 from 400 seconds to 48 seconds. yolov3 has a 13x13 max_pool2d kernel, which was creating really large Triton code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89448 Approved by: https://github.com/ngimel commit 496c8ae760bf646d7a45aad0c2e0320a67b66fd2 Author: maxren Date: Mon Nov 21 10:58:05 2022 -0800 [xnnpack][lite-int] Handle Constant Data (#89445) Handling constant data for xnnpack delegation. This allows us to handle new modules like such: ``` class Module(torch.nn.Module): def __init__(self): super().__init__() self._constant = torch.ones(4, 4, 4) def forward(self, x): return x + self._constant ``` this is the precursor work to handling convolution, as we need to serialize constant data(weights) Differential Revision: [D41050349](https://our.internmc.facebook.com/intern/diff/D41050349/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89445 Approved by: https://github.com/digantdesai commit 120d200620159597f416f9142f1d5708182ca047 Author: Animesh Jain Date: Tue Nov 22 02:20:45 2022 +0000 Revert "Added conv constraint that infers layouts (#89031)" (#89451) This reverts commit 716f70f19a4b63268da2a753afdbe9b385a831ab. Fixes performance regression and compilation latency increase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89451 Approved by: https://github.com/soumith, https://github.com/jansel commit 06dffb3319a38bf909939f64320e0fde88679b94 Author: Edward Z. 
Yang Date: Mon Nov 21 17:54:25 2022 -0500 dont clone symints, dont clobber symint proxies (#88230) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88230 Approved by: https://github.com/albanD commit 58a74f34f981de2c24b8f57c37687421c87a782a Author: Howard Huang Date: Mon Nov 21 11:05:38 2022 -0800 [17/N] Add _reduce_scatter_base custom op with CPU/CUDA implementation (#88903) Differential Revision: [D41415325](https://our.internmc.facebook.com/intern/diff/D41415325) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88903 Approved by: https://github.com/kwen2501 commit 7174572b1ef4cff545e4ca8fc77c135e58fcbefb Author: Will Constable Date: Mon Nov 21 21:37:32 2022 +0000 Add torchvis support to dist bench (#89324) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89324 Approved by: https://github.com/davidberard98, https://github.com/albanD commit 57ed94804e8195f227c7a75899a319cc0a3b833a Author: Edward Z. Yang Date: Mon Nov 21 16:04:46 2022 -0500 Bind DispatchKey.Functionalonalize in pybind11 (#89452) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89452 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit b189a7444da8b17c535e7d04c9ab705289ec53e1 Author: Khushi Date: Tue Nov 22 00:15:30 2022 +0000 [fix] tril & tril : out of bound check (#89384) Fixes #83326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89384 Approved by: https://github.com/ngimel commit dbc354b262f7f5e49aa781785cfce6299fdc2aa8 Author: Huy Do Date: Tue Nov 22 00:13:38 2022 +0000 Mitigate flaky test_ops_fwd_gradients on macOS (#89410) This has been flaky on macOS for a while ([hud](https://hud.pytorch.org/failure/RuntimeError%3A%20test_ops_fwd_gradients%20failed)) and I can reproduce this locally. The issue was raised by https://github.com/pytorch/pytorch/issues/66033 and it seems to point to macos itself https://github.com/graphia-app/graphia/issues/33. So switching to single thread when running `test_ops_fwd_gradients` on macOS as a mitigation for the flaky tests. `pytest test_ops_fwd_gradients.py -k test_fn_fwgrad_bwgrad -vv --flake-finder` to run all `test_fn_fwgrad_bwgrad` tests 50 times to make sure they all pass (no flaky anymore) https://hud.pytorch.org/tests shows that `test_ops_fwd_gradients` on macOS takes about 15m to finish or 8 minute if using 2 shards like in the test. There is no obvious difference in the test duration: ``` 2022-11-21T21:34:18.6078080Z Running test_ops_fwd_gradients ... [2022-11-21 21:34:18.600663] 2022-11-21T21:34:21.6805770Z Executing ['/Users/runner/work/_temp/conda_environment_3517515737/bin/python', '-bb', 'test_ops_fwd_gradients.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=0', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2022-11-21 21:34:21.680156] 2022-11-21T21:34:21.6806380Z Ignoring disabled issues: [] 2022-11-21T21:34:21.6815250Z Executing ['/Users/runner/work/_temp/conda_environment_3517515737/bin/python', '-bb', 'test_ops_fwd_gradients.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=1', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2022-11-21 21:34:21.681174] 2022-11-21T21:34:21.6815830Z Ignoring disabled issues: [] ..... 2022-11-21T21:40:42.2422700Z =============================== warnings summary =============================== ..... 
2022-11-21T21:40:42.2424670Z - generated xml file: /Users/runner/work/pytorch/pytorch/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-47b619449ea7db1f.xml - 2022-11-21T21:40:42.2424850Z = 831 passed, 596 skipped, 5 deselected, 17 xfailed, 1 warning in 374.54s (0:06:14) = ..... 2022-11-21T21:42:00.1923310Z =============================== warnings summary =============================== ..... 2022-11-21T21:42:00.1925370Z - generated xml file: /Users/runner/work/pytorch/pytorch/test/test-reports/python-pytest/test_ops_fwd_gradients/test_ops_fwd_gradients-d24ee6419a602a6e.xml - 2022-11-21T21:42:00.1925540Z = 828 passed, 603 skipped, 7 deselected, 20 xfailed, 1 warning in 452.94s (0:07:32) = .... 2022-11-21T21:42:09.9035670Z FINISHED PRINTING LOG FILE of test_ops_fwd_gradients (/Users/runner/work/pytorch/pytorch/test/test-reports/test_ops_fwd_gradients_ha_3rfhb) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89410 Approved by: https://github.com/soulitzer commit ea50549ce62aeeccfe27035a0a975e83b9c2c987 Author: Edward Z. Yang Date: Mon Nov 21 18:12:21 2022 -0500 Suppress guards when creating fake tensors (#89349) When we create fake tensors, we may call operators that introduce guards, to accurately reconstruct views. But these guards are spurious: if a user is able to present a tensor that "looks the same", they have implicitly fulfilled the contract that the view is creatable. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89349 Approved by: https://github.com/voznesenskym commit fa4980cd5e7581b5195ed4d02d63bf73497549d0 Author: William Wen Date: Mon Nov 21 22:56:13 2022 +0000 Add commit hash to dynamo dashboard (#89462) Title - also fix a small bug with dashboard outputs. Sample: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1322732698 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89462 Approved by: https://github.com/anijain2305 commit 186192bb26a71ec9b0131a6c49fdf19e76d208d7 Author: Yanbo Liang Date: Mon Nov 21 22:43:58 2022 +0000 [Dynamo] Fix bugs when calling tensor.data and tensor.layout (#89257) Fix bugs in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884). * Legacy code still use ```tensor.data```, I think we can use ```tensor.detach``` to rewrite, not sure if there is anything I didn't anticipate. * Support ```tensor.layout```. The root cause of these issues are: dynamo wraps unimplemented ```tensor.x``` call into ```GetAttrVariable(TensorVariable, x)```, but this op was not inserted into FX graph. Hence, during the fake tensor propagation, it throws ```KeyError: 'example_value` ```. For these two popular attributes, Dynamo should support them anyway. However, if dynamo should support ___all___ ```tensor.x``` call and not fallback to ```GetAttrVariable```, I think it's debatable. If I turn off fake tensor propagation, it works well even not including this fix. So I'm curious if we should improve the fake propagation to cover similar cases. 
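A small, hedged illustration of the two attribute accesses this commit is about (the `.data` -> `.detach()` rewrite is the suggestion in the text, not a claim that every legacy call site should change):
```python
import torch

x = torch.randn(2, 3, requires_grad=True)
legacy = x.data         # legacy alias that dynamo previously wrapped as GetAttrVariable
preferred = x.detach()  # same storage, detached from autograd; the suggested rewrite
assert x.layout == torch.strided  # .layout is the other attribute now supported
```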
cc @mlazos @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire @jansel @eellison ``` Traceback (most recent call last): File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 404, in _compile out_code = transform_code_object(code, transform) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object transformations(instructions, code_options) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 392, in transform tracer.run() File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 1523, in run super().run() File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 389, in run and self.step() File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 359, in step getattr(self, inst.opname)(inst) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 193, in wrapper return inner_fn(self, inst) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 865, in CALL_FUNCTION_KW self.call_function(fn, args, kwargs) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 301, in call_function self.push(fn.call_function(self, args, kwargs)) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/variables/torch.py", line 407, in call_function tensor_variable = wrap_fx_proxy( File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/variables/builder.py", line 636, in wrap_fx_proxy return wrap_fx_proxy_cls( File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/variables/builder.py", line 676, in wrap_fx_proxy_cls example_value = get_fake_value(proxy.node, tx) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 1024, in get_fake_value args, kwargs = torch.fx.node.map_arg((node.args, node.kwargs), visit) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 613, in map_arg return map_aggregate(a, lambda x: fn(x) if isinstance(x, Node) else x) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 621, in map_aggregate t = tuple(map_aggregate(elem, fn) for elem in a) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 621, in t = tuple(map_aggregate(elem, fn) for elem in a) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 627, in map_aggregate return immutable_dict((k, map_aggregate(v, fn)) for k, v in a.items()) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 627, in return immutable_dict((k, map_aggregate(v, fn)) for k, v in a.items()) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 631, in map_aggregate return fn(a) File "/scratch/ybliang/work/repos/pytorch/torch/fx/node.py", line 613, in return map_aggregate(a, lambda x: fn(x) if isinstance(x, Node) else x) File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 1022, in visit return n.meta["example_value"] KeyError: 'example_value\n\nfrom user code:\n File "./generated/test_BayesWatch_pytorch_prunes.py", line 108, in forward\n return torch.zeros([x.size()[0], self.channels, x.size()[2] // self.spatial, x.size()[3] // self.spatial], dtype=x.dtype, layout=x.layout, device=x.device)\n\nSet torch._dynamo.config.verbose=True for more information\n\n\nYou can suppress this exception and fall back to eager by setting:\n 
torch._dynamo.config.suppress_errors = True\n' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89257 Approved by: https://github.com/jansel commit 821ba6b51beb1844f264fd57e1eccecb446e4870 Author: Wanchao Liang Date: Mon Nov 21 19:19:29 2022 +0000 [4/n] Thread PG: add reduce_scatter to threaded pg (#89442) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89442 Approved by: https://github.com/yhcharles, https://github.com/fduwjj commit 3e99d4db7671430901bb6292073f368ce1443e05 Author: Wanchao Liang Date: Mon Nov 21 19:19:29 2022 +0000 [3/n] Thread PG: add scatter to threaded pg (#89441) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89441 Approved by: https://github.com/XilunWu, https://github.com/yhcharles, https://github.com/fduwjj commit 3876f94c3d0eb329686d0699da2bab00849099b6 Author: Wanchao Liang Date: Mon Nov 21 19:19:29 2022 +0000 [2/n] Thread PG: add test for broadcast (#89440) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89440 Approved by: https://github.com/XilunWu, https://github.com/yhcharles, https://github.com/fduwjj commit deae450899eb048754f046999a18fbda8c9b2d68 Author: Wanchao Liang Date: Mon Nov 21 19:19:29 2022 +0000 [1/n] Thread PG: add test for allgather (#89439) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89439 Approved by: https://github.com/XilunWu, https://github.com/yhcharles, https://github.com/fduwjj commit 047e542a1a71448083d812783380b855e023eb14 Author: Mengwei Liu Date: Mon Nov 21 21:08:13 2022 +0000 [tools] expose selective build library (#89351) Change the base module and visibility of `tools:gen_oplist_lib` so that it can be reused. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89351 Approved by: https://github.com/cccclai commit c068fa900f1352240a04123a74d4d1f83b295222 Author: Peter Bell Date: Sun Nov 20 23:36:41 2022 +0000 [inductor] Misc division lowering fixes (#88603) 1. `aten.div.Tensor_mode` should allow broadcasting 2. `div` can use `ELEMENTWISE_TYPE_PROMOTION_KIND.INT_TO_FLOAT` 3. `prims.div` on integers should be truncating division 4. Add lowering for `true_divide` which is aliased to `div` 5. register lowering for inplace version of `div_mode` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88603 Approved by: https://github.com/ngimel commit 1267dcf2971b181d7379928f3452ce622add91e9 Author: Peter Bell Date: Sun Nov 20 23:19:24 2022 +0000 [inductor] Fix nan handling for aten.sign (#88937) ATen gives `sign(nan) == 0` but inductor's cuda codegen would give `sign(nan) == 1`. 
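For illustration (one way to obtain the described semantics, not necessarily the codegen change in the PR): a comparison-based sign yields 0 for NaN because NaN compares false in both directions, matching the ATen behavior quoted above.
```python
import torch

def comparison_sign(x: torch.Tensor) -> torch.Tensor:
    # (nan > 0) and (nan < 0) are both False, so NaN maps to 0 - 0 = 0.
    return (x > 0).to(x.dtype) - (x < 0).to(x.dtype)

print(comparison_sign(torch.tensor([float("nan"), -2.0, 3.0])))  # tensor([ 0., -1.,  1.])
```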
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88937 Approved by: https://github.com/ngimel commit 3d247a8bcd6f07ffae8c7144ac08ba9fdeeb2025 Author: Keval Morabia Date: Mon Nov 21 20:40:04 2022 +0000 Fix unconvertible_ops as per #89261 (#89299) Fixes #89261 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89299 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 1d9e1fca97a2a01ea75b0938e38feee1d5288ebd Author: Driss Guessous Date: Mon Nov 21 20:02:09 2022 +0000 Update sdp dispatch logic to enable fused backward (#89154) Reorganizes how the sdp dispatch logic is down in order to enable backwards for fused kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154 Approved by: https://github.com/cpuhrsch commit cf9476554fce9a9c909eebd7439f4b3f4d208f6c Author: Taylor Robie Date: Mon Nov 21 09:23:16 2022 -0800 update kineto pinned commit (#89435) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89435 Approved by: https://github.com/malfet commit e4d9dbd7d236e86fac0055feb7dd8f64516d375e Author: Xu Zhao Date: Mon Nov 21 17:25:28 2022 +0000 Port torchdynamo's torchbench script to userbenchmark (#89239) Summary: This Diff ports the torchbench.py script from torchdynamo to torchbench to support the development of internal models. Currently, only works with the `--only` option, and can only test one model at a time. Note that the noisy logs are from upstream model code, not the benchmark code. In the internal environment, `torch._dynamo.config.base_dir` is not writable, so we add an option to specify the output directory. Test Plan: ``` $ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only ads_dhen_5x --part over --output-directory /tmp/tb-test/ cuda eval ads_dhen_5x 1/ 1 +0 frames 2s 1 graphs 1 graph calls 412/ 411 = 100% ops 100% time ``` ``` $ buck2 run mode/opt //caffe2/benchmarks/dynamo:torchbench -- --performance --only cmf_10x --part over --output-directory /tmp/tb-test/ cuda eval cmf_10x 1/ 1 +0 frames 1s 1 graphs 1 graph calls 306/ 305 = 100% ops 100% time ``` Reviewed By: jansel Differential Revision: D41294311 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89239 Approved by: https://github.com/jansel commit 9d209e78348ee5c3e1ead700d240fb476b3bc4de Author: PyTorch MergeBot Date: Mon Nov 21 16:48:26 2022 +0000 Revert "[ao] making _is_activation_post_process private (#87520)" This reverts commit 45c62a337756ff9db97cd64d2d42d9e65dda0a85. Reverted https://github.com/pytorch/pytorch/pull/87520 on behalf of https://github.com/bigfootjon due to Diff reverted internally commit f3db03612f9c6fb8717e1e13a9295da3c9c05193 Author: PyTorch MergeBot Date: Mon Nov 21 16:38:20 2022 +0000 Revert "[ao] maintain BC for is_activation_post_process (#89260)" This reverts commit c5fafb4e1694f141d8a1a31142cce4049d9057ed. Reverted https://github.com/pytorch/pytorch/pull/89260 on behalf of https://github.com/DanilBaibak due to breaking internal builds commit 6796979ee1063890fd04bbf21f298f669129df8f Author: Jiong Gong Date: Mon Nov 21 14:20:33 2022 +0000 [Inductor] Limit the number of compile threads to the available cpu cores (#89377) `config.compile_threads` gets the number of compile threads via `min(32,os.cpu_count())` while `os.cpu_count()` is the total number of cpu cores in the system, not the available ones. 
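The distinction can be seen directly from the standard library (illustrative; actual values depend on how the process was launched, e.g. under `taskset` or `numactl`):
```python
import os

print(os.cpu_count())                     # all cores in the system
if hasattr(os, "sched_getaffinity"):      # not available on every platform
    print(len(os.sched_getaffinity(0)))   # cores this process is allowed to use
```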
This would cause compile thread contention when the available cpu cores are less than `min(32,os.cpu_count())`, e.g., available cpu cores are limited with numactl or taskset, making the compilation very slow. This PR tries to use `len(os.sched_getaffinity(0))` if `os.sched_getaffinity` is available which returns the available number of cpu cores. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89377 Approved by: https://github.com/soumith commit c2cf0bde1f4e9bed642648f299db0f6d5ecb5996 Author: lezcano Date: Mon Nov 21 10:54:32 2022 +0000 Move the OpInfo same-storage error to the autograd test (#88306) This check was previously located at the `non_contiguous` test (quite and odd location). Even more, at https://github.com/pytorch/pytorch/pull/86378#discussion_r993658395, Kshiteej found that this assert was not doing anything really. We move it to the autograd test and make it a proper `self.assert`. We also disallow returning 1-tuples from sample_input functions, as they were breaking this assert. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88306 Approved by: https://github.com/mruberry commit a80e5e78137fb8adea6e7d638be483f866fe26e8 Author: yanbing-j Date: Mon Nov 21 09:52:34 2022 +0000 Update ideep for future performance improvement (#87966) **Summary** The update includes API changes and optimzations to reduce framework overhead, which will benefit all mkldnn (onednn) ops in JIT mode and inductor CPU backend, etc. These benefits will be seen after switching to new ideep API by future PRs. **Test plan** For correctness, all UTs that call mkldnn ops, including test_ops.py, test_mkldnn*.py, test_quantization.py, etc. For performance, TorchBench has been run and no regression is found. Results are shown below. - Intel (R) Xeon (R) IceLake with 40 cores - Use multi-instance - Using tcmalloc & Intel OMP ![image](https://user-images.githubusercontent.com/12522207/201631004-bb77468d-953b-4757-a001-94d44615b5f6.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87966 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper commit 31708a731076b7feed3051b81d309a9babb4efc0 Author: XiaobingSuper Date: Sun Nov 20 20:46:03 2022 -0500 TorchDynamo: enable conv+silu fusion (#89278) This PR will improve the tf_efficientnet_b0 performance by fusing conv+silu. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89278 Approved by: https://github.com/jgong5, https://github.com/jansel commit bc716383a6a3063b35cedfe8d163c61a4ff8f301 Author: Wang, Eikan Date: Mon Nov 21 03:31:50 2022 +0000 Redefine the simdlen semantic (#89263) This PR is targeting to automatically enable vectorization optimization for TorchInductor. It refined the semantics of `config.cpp.simdlen`. Originally, `None` means to disable vectorization while a specific value means the number of elements to be vectorized once time. But it depends on the data. Regarding 256bit SVE/SIMD ISA for ARM and X86, the `simdlen` should be 16 for Float while 32 for BFloat. Hence, this PR defined the `simdlen` as the bit width. The detailed semantics are as follows. - **_simdlen = None_**: Automatically determine the SIMD bit width. Detect HW information and pick the proper vectorization ISA. Specific for X86, the priority of AVX512 is higher than AVX2. - **_simdlen <=1_**: Explicitly disable SIMD - **_simdlen > 1_**: Explicitly specify the SIMD bit width. It equals the disabled semantic if the bit width does not match the ISA width. 
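A hedged sketch of the bit-width semantics described above (the function and the supported widths are illustrative, not inductor internals):
```python
def pick_vector_bit_width(simdlen, supported=(512, 256)):
    if simdlen is None:
        return max(supported)   # auto: prefer the widest ISA, e.g. AVX512 over AVX2
    if simdlen <= 1:
        return None             # explicitly disable vectorization
    return simdlen if simdlen in supported else None  # mismatched width degrades to disabled

# The lane count then follows from the element size, e.g. 512 // 32 == 16 fp32 lanes.
```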
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89263 Approved by: https://github.com/jgong5, https://github.com/jansel commit 79770d3636626b2130e58d5acdf1d6a56953329d Author: XiaobingSuper Date: Sun Nov 20 20:46:02 2022 -0500 TorchDynamo: enable conv+relu6 fusion (#89265) This PR enables conv+relu6 fusion, which improves MobileNet's performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89265 Approved by: https://github.com/jgong5, https://github.com/jansel commit 744f52223a41a6de972728286c2db196a45e3df6 Merge: 4fec37d580 73a6cdb92f Author: mingfeima Date: Mon Nov 21 15:33:31 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing; the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex`, which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be chosen from: "sum", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records the output indices of the non-zero elements if the reduce type is "max" or "min"; this is only useful for training, so for inference it will not be calculated. Benchmark on GCN for ogbn-products on a single Xeon socket: the workload is improved by `4.3x` with this patch. The performance benefit for training will be bigger: the original backward impl for `sum|mean` is sequential, and the original backward impl for `max|min` is not fused.
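A readable reference for the "sum" case of the kernel described above (a hedged Python equivalent over a CSR tensor; the actual operator is a fused C++ implementation):
```python
import torch

def spmm_sum_reference(csr: torch.Tensor, dense: torch.Tensor) -> torch.Tensor:
    crow, col, val = csr.crow_indices(), csr.col_indices(), csr.values()
    out = dense.new_zeros(csr.shape[0], dense.shape[1])
    for row in range(csr.shape[0]):
        for nz in range(crow[row], crow[row + 1]):
            out[row] += val[nz] * dense[col[nz]]   # gather -> apply -> scatter(sum)
    return out

a = torch.tensor([[0., 2., 0.], [1., 0., 3.]]).to_sparse_csr()
b = torch.randn(3, 4)
torch.testing.assert_close(spmm_sum_reference(a, b), a.to_dense() @ b)
```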
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 73a6cdb92f2ab305eec4ea400d2ad956b6a52365 Author: mingfeima Date: Mon Nov 21 15:33:31 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 4fec37d5800163dd8bff11210cbf4424700aeb7c Merge: 2f59c69ac7 27e02c0176 Author: mingfeima Date: Mon Nov 21 14:16:39 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit 27e02c0176a5de350e629e858d19abff62649c2d Merge: f93ba52d25 e0251de42f Author: mingfeima Date: Mon Nov 21 14:16:39 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300 **GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes: * COO: the hotspot is `scatter_reduce` * CSR: the hotspot is `spmm_reduce` The reduce type can be choose from: "max", "mean", "max", "min". `spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce` which has dual outputs: * `out` - the actual output * `arg_out` - records output indices in the non zero elements if the reduce type is "max" or "min", this is only useful for training. So for inference, it will not be calculated. Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch. Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused. 
``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ torch_sparse::spmm_sum 97.09% 56.086s 97.09% 56.088s 6.232s 9 aten::linear 0.00% 85.000us 1.38% 795.485ms 88.387ms 9 aten::matmul 0.00% 57.000us 1.38% 795.260ms 88.362ms 9 aten::mm 1.38% 795.201ms 1.38% 795.203ms 88.356ms 9 aten::relu 0.00% 50.000us 0.76% 440.434ms 73.406ms 6 aten::clamp_min 0.76% 440.384ms 0.76% 440.384ms 73.397ms 6 aten::add_ 0.57% 327.801ms 0.57% 327.801ms 36.422ms 9 aten::log_softmax 0.00% 23.000us 0.10% 55.503ms 18.501ms 3 ``` ``` ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls ----------------------------- ------------ ------------ ------------ ------------ ------------ ------------ aten::spmm_sum 87.35% 11.826s 87.36% 11.827s 1.314s 9 aten::linear 0.00% 92.000us 5.87% 794.451ms 88.272ms 9 aten::matmul 0.00% 62.000us 5.87% 794.208ms 88.245ms 9 aten::mm 5.87% 794.143ms 5.87% 794.146ms 88.238ms 9 aten::relu 0.00% 53.000us 3.35% 452.977ms 75.496ms 6 aten::clamp_min 3.35% 452.924ms 3.35% 452.924ms 75.487ms 6 aten::add_ 2.58% 348.663ms 2.58% 348.663ms 38.740ms 9 aten::argmax 0.42% 57.473ms 0.42% 57.475ms 14.369ms 4 aten::log_softmax 0.00% 22.000us 0.39% 52.605ms 17.535ms 3 ``` [ghstack-poisoned] commit e0251de42f56c8de0bd9b2783bfa2ae67e4813c5 Author: Shen Li Date: Sun Nov 20 22:54:45 2022 +0000 [Easy] Use prepend arg to register forward hooks in quantize.py (#89391) Differential Revision: [D41431110](https://our.internmc.facebook.com/intern/diff/D41431110) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89391 Approved by: https://github.com/awgu commit 1db5ce095fb0e721c92304bceca7798456929e73 Author: PyTorch MergeBot Date: Mon Nov 21 03:08:31 2022 +0000 [vision hash update] update the pinned vision hash (#89287) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89287 Approved by: https://github.com/pytorchbot commit 51e961dd7bb9abaf999e6028208b2778a57c32b2 Author: Natalia Gimelshein Date: Mon Nov 21 00:58:03 2022 +0000 use std/libdevice erf in inductor (#89388) By itself, libdevice version of erf has the same perf as our decomposition, but in real workloads it leads to better fusion groups (due to fewer ops in the fused kernel). Bonus: a few fp64 test skips removed, because our decomposition wasn't accurate enough for fp64, but libdevice version is. 
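To illustrate the accuracy point (using the Abramowitz & Stegun 7.1.26 polynomial as a stand-in; inductor's actual decomposition may differ):
```python
import torch

def erf_poly(x):
    # Max absolute error of this approximation is ~1.5e-7: fine at fp32
    # tolerances, but clearly above what fp64 tests expect.
    a1, a2, a3, a4, a5, p = 0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429, 0.3275911
    s, x = torch.sign(x), x.abs()
    t = 1.0 / (1.0 + p * x)
    poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t
    return s * (1.0 - poly * torch.exp(-x * x))

x = torch.linspace(-3, 3, 101, dtype=torch.float64)
print((erf_poly(x) - torch.erf(x)).abs().max())  # on the order of 1e-7
```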
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89388 Approved by: https://github.com/jansel commit 1856fa5df7fda9950da26eff2ef885e845bf6b6c Author: Huy Do Date: Sun Nov 20 23:36:47 2022 +0000 Temporary increase ASAN shard 5 to 4xlarge (#89387) ASAN shard 5 also see OOM now https://hud.pytorch.org/pytorch/pytorch/commit/7b0d577c226fae78f377b26feab4122c4203ad59, may be we should increase all 5 of them to 4xlarge until https://github.com/pytorch/pytorch/issues/88309 is resolved Pull Request resolved: https://github.com/pytorch/pytorch/pull/89387 Approved by: https://github.com/kit1980 commit e1d58b1928a9427f05e3f44ab9b8119000bced09 Author: PyTorch MergeBot Date: Sun Nov 20 22:14:38 2022 +0000 Revert "Update sdp dispatch logic to enable fused backward (#89154)" This reverts commit 2e72ec79823111e8dd8c5e82c5d1b56197cd52d3. Reverted https://github.com/pytorch/pytorch/pull/89154 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but the new test_sdp_math_gradcheck test breaks periodic slow gradcheck, i.e. https://hud.pytorch.org/pytorch/pytorch/commit/419ef2cdcfe84442de5232739284c6a51a18632f commit c09929659ce8ba2f1b7b2f6e50084ccbf854d44b Author: Edward Z. Yang Date: Sun Nov 20 09:13:30 2022 -0500 Also include MKL_THREAD_LIB in link libraries for caffe2::mkl (#89378) Actually fixes https://github.com/pytorch/audio/issues/2784 for real; in my previous testing I didn't check if I could import torchaudio; now torchaudio successfully imports. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89378 Approved by: https://github.com/soumith commit 7b0d577c226fae78f377b26feab4122c4203ad59 Author: Edward Z. Yang Date: Sat Nov 19 22:31:24 2022 -0500 Set INTERFACE_LINK_DIRECTORIES on caffe2::mkl (#89359) This ensures that subsequent link commands involving mkl libraries know where to find the libraries if they are in a non-standard location (which is the case if you installed mkl via conda, which is what our standard instructions recommend.) This is kind of a hack, because the MKL libraries are not actually guaranteed to be in $MKL_ROOT/lib (they are for the conda install though). The real fix is to properly use the MKL targets from FindMKL.cmake but thats its own can of fish. See https://github.com/pytorch/pytorch/issues/73008 This fixes https://github.com/pytorch/audio/issues/2784 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89359 Approved by: https://github.com/soumith commit dbeacf11820e336e803bb719b7aaaf2125ae4d9c Author: Edward Z. Yang Date: Sat Nov 19 19:44:18 2022 -0500 Fix cat striding in PrimTorch (#89332) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89332 Approved by: https://github.com/ngimel commit caf3d5319f15e47363fe36856326f5e4ab3303e1 Author: Sherlock Huang Date: Sat Nov 19 23:10:34 2022 +0000 Symintify numel(), infer_size, prims.elementwise_meta (#88956) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88956 Approved by: https://github.com/ezyang commit 7c811efab70a3546f997e37178c93d1de24e0444 Author: Edward Z. Yang Date: Sat Nov 19 12:52:39 2022 -0500 Add support for dynamic kwarg to torch._dynamo.optimize (#89290) This is an easier way to enable dynamic shapes for a region. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89290 Approved by: https://github.com/soumith, https://github.com/jansel, https://github.com/voznesenskym commit 8ad39536d741d9fc8c5d33f1344d23bd56f1c050 Author: PyTorch MergeBot Date: Sat Nov 19 21:47:55 2022 +0000 Revert "Symintify numel(), infer_size, prims.elementwise_meta (#88956)" This reverts commit ce2f8700bafcf44850402a39188ec121ba8b5486. Reverted https://github.com/pytorch/pytorch/pull/88956 on behalf of https://github.com/ezyang due to somehow breaks torch.numel commit 8ac58bc2e3449760bef7f36f600a40c96d5bc5ba Author: kvathupo Date: Sat Nov 19 21:40:07 2022 +0000 Add nullptr_t overload to c10::intrusive_ptr (#89196) __What?__ Fixes #82413 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89196 Approved by: https://github.com/ezyang commit 5582001bd5e5c66dcab8859ecb84cbaa42524fd4 Author: Edward Z. Yang Date: Sat Nov 19 12:51:53 2022 -0500 Reland 2 "Towards unifying symbolic and non symbolic fake tensor (#89038) (#89143)" (#89346) This reverts commit 8e4c9828f4c990f439179912159086aaed790493. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89346 Approved by: https://github.com/wconstab commit 6afe341276f9ffa660446c5fa15b68558791869a Author: fduwjj Date: Sat Nov 19 18:01:25 2022 +0000 [PT-D][1/N] Sync TP Beta change to prod (#89242) This is part of TP Beta Release efforts. ref: https://github.com/pytorch/tau/issues/576 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89242 Approved by: https://github.com/wanchaol commit 6b8c1b19b513ec3d82d588961f8a2b4a86e08f99 Author: Michael Voznesensky Date: Sat Nov 19 17:49:39 2022 +0000 RM expectedFailure UnspecReproTests.test_batch_norm_act_unspec (#89340) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89340 Approved by: https://github.com/bertmaher commit 6daf60be5abe4184121bc41e69e336015a268d6a Author: AllenTiTaiWang Date: Sat Nov 19 02:56:14 2022 +0000 [ONNX] Add setType from user into InferredType and Reliable in ConstantValueMap (#88622) `setType` API is not respected in current exporter because the graph-level shape type inference simply overrides every NOT ONNX Op shape we had from node-level shape type inference. To address this issue, this PR (1) makes custom Op with `setType` **reliable** in ConstantValueMap to secure its shape/type information in pass: _C._jit_pass_onnx. (2) If an invalid Op with shape/type in pass: _C._jit_pass_onnx_graph_shape_type_inference(graph-level), we recognize it as reliable. 1. In #62856, The refactor in onnx.cpp made regression on custom Op, as that was the step we should update custom Op shape/type information into ConstantValueMap for remaining Ops. 2. Add another condition besides IsValidONNXNode for custom Op setType in shape_type_inference.cpp. If all the node output has shape (not all dynamic), we say it's custom set type. 3. ~However, this PR won't solve the [issue](https://github.com/pytorch/pytorch/issues/87738#issuecomment-1292831219) that in the node-level shape type inference, exporter invokes the warning in terms of the unknow custom Op, since we process its symbolic_fn after this warning, but it would have shape/type if setType is used correctly. And that will be left for another issue to solve. #84661~ Add `no_type_warning` in UpdateReliable() and it only warns if non ONNX node with no given type appears. Fixes #81693 Fixes #87738 NOTE: not confident of this not breaking anything. Please share your thoughts if there is a robust test on your mind. 
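A hypothetical user-side symbolic (the domain, op name, and shape propagation here are illustrative assumptions) showing the `setType` call that this change makes the exporter respect instead of overwriting during graph-level shape inference:
```python
import torch
from torch.onnx import register_custom_op_symbolic

def custom_gelu_symbolic(g, x):
    out = g.op("my_domain::CustomGelu", x)
    # Tell ONNX shape inference the output's type/shape explicitly; previously
    # this user-provided information was discarded for non-ONNX ops.
    out.setType(x.type())
    return out

# Assumes a matching my_domain::custom_gelu custom op is registered with ATen.
register_custom_op_symbolic("my_domain::custom_gelu", custom_gelu_symbolic, opset_version=9)
```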
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88622 Approved by: https://github.com/BowenBao commit 940959ebbfa54204b3cd45f918c5ee65b5efc3d0 Author: Jerry Zhang Date: Fri Nov 18 22:46:47 2022 -0800 [quant][fix] Add quant_min/quant_max for default dynamic quantization observer (#89267) Summary: This is needed for choose qparams, but previously it is not configurable, and in the reference quantization flow with decomposed Tensor, we are making this explicit Test Plan: tested in future PR Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89267 Approved by: https://github.com/vkuzo commit 808bdbab89e875abbbe9652bde675b4402eed532 Author: Michael Voznesensky Date: Sat Nov 19 07:16:29 2022 +0000 Fix try/except flow where DataDependentOutputException is getting wrapped in a RuntimeError (#89314) Repro fixed ``` def fn(a): return a.repeat_interleave(14, dim=0).repeat_interleave(14, dim=1) x = torch.ones(14, 14).to(dtype=torch.int64) opt_fn = torch._dynamo.optimize("eager")(fn) opt_fn(x) ``` Fixes [#1886](https://github.com/pytorch/torchdynamo/issues/1886) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89314 Approved by: https://github.com/anijain2305, https://github.com/eellison commit 419ef2cdcfe84442de5232739284c6a51a18632f Author: Horace He Date: Fri Nov 18 21:39:11 2022 +0000 Added utility to count memory reads/written in Inductor (#89203) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89203 Approved by: https://github.com/jansel, https://github.com/ngimel commit 7a2930b357a4e62bb0bab53bb0d23c607b6ede38 Author: kshitij12345 Date: Sat Nov 19 04:09:29 2022 +0000 add jvp test with non-contig inputs (#89131) Ref: https://github.com/pytorch/functorch/issues/1029 We update `test_jvp` to do contiguous and non-contiguous testing in a single test. Prev time for `test_jvp` : ~28s New time for `test_jvp`: ~45s Pull Request resolved: https://github.com/pytorch/pytorch/pull/89131 Approved by: https://github.com/zou3519 commit 631baecbcd821124296cfe40e5c297cf2def410c Author: Michael Voznesensky Date: Sat Nov 19 03:35:07 2022 +0000 Add --explain flag to bench (#89316) TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 time python benchmarks/dynamo/torchbench.py --accuracy --explain --backend aot_eager --train --only BERT_pytorch Dynamo produced 76 graphs with 75 graph break and 198 ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/89316 Approved by: https://github.com/ezyang commit e6996ea172b01fa6c6586379ccb26746c32e2b2c Author: Yuxin Wu Date: Sat Nov 19 02:24:18 2022 +0000 Don't redefine __STDC_FORMAT_MACROS (#89310) Similar to https://github.com/pytorch/pytorch/pull/39608 and https://github.com/pytorch/pytorch/pull/6676 This causes a compile error in our internal build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89310 Approved by: https://github.com/kit1980 commit 8c0515dbff04f03cae584e10100715e05f7cb32e Author: Nikolay Korovaiko Date: Sat Nov 19 02:18:03 2022 +0000 cast C++ py-bound SymNode to SymInt correctly (#89295) Unfortunately, it's a bit hard to test purely on the Pytorch core side, but it passes the XLA tests which are currently disabled. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89295 Approved by: https://github.com/ezyang commit 2e72ec79823111e8dd8c5e82c5d1b56197cd52d3 Author: Driss Guessous Date: Sat Nov 19 02:06:24 2022 +0000 Update sdp dispatch logic to enable fused backward (#89154) Reorganizes how the sdp dispatch logic is done in order to enable backward for fused kernels Pull Request resolved: https://github.com/pytorch/pytorch/pull/89154 Approved by: https://github.com/cpuhrsch commit 85a87e635c677e1c6d992bb9ea21c634e8b1d58f Author: Michael Lazos Date: Sat Nov 19 01:47:45 2022 +0000 [dynamo] mutable local caching to make dynamo faster at tracing mutation (#89170) Make mutation faster to speed up tracing optimizers, helps with https://github.com/pytorch/torchdynamo/issues/1803 `replace_all` no longer iterates over the entire variable tracker data structure every time a mutation is performed. Each variable tracker internally keeps a set of contained mutable variable trackers, to provide a hint to `replace_all`. This is populated with a call to `apply` from `__post_init__` in the base `VariableTracker` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89170 Approved by: https://github.com/jansel commit ea58955dda6452307ce43a5beef0a466b49f1bef Author: Nikita Shulga Date: Sat Nov 19 01:13:08 2022 +0000 Move bazel to c++17 (#89297) Splitting out various smaller pieces from https://github.com/pytorch/pytorch/pull/85969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89297 Approved by: https://github.com/huydhn commit cad5772c2c2e2c719664765119172610eed9c590 Author: Animesh Jain Date: Sat Nov 19 00:22:43 2022 +0000 [dashboard][huggingface] skip accuracy checks for really large models… (#89273) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89273 Approved by: https://github.com/desertfire commit ee907375fa085fbc61bd087f7d459fd62fd1ae4f Author: Howard Huang Date: Sat Nov 19 00:21:11 2022 +0000 [small] Update error message (#89294) Summary: `RuntimeError: Invalid function argument. Expected parameter "tensor_list" to be of type List[torch.Tensor].` to `RuntimeError: Invalid function argument. Expected parameter "input_tensor_list" to be of type List[torch.Tensor].` Test Plan: sandcastle Differential Revision: D41405238 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89294 Approved by: https://github.com/awgu commit c3938bb97ab2bf0942bee2a97d30051733e839ca Author: zhxchen17 Date: Sat Nov 19 00:19:47 2022 +0000 [functorch] introduce an experimental map() op. (#88767) Summary: We want to introduce an experimental control flow op: map() to export some models as FX graphs correctly. Some clarification of the basic requirements we have in mind: 1. This op can nest cond() and other control flow primitives internally. 2. We don't necessarily need loop carried dependencies for the models we've seen. 3. This map() op can handle dynamically shaped tensors as input and return dynamically shaped output based on input shapes. 4. We should be able to pass through additional arguments to the loop body as extra arguments. In this diff we introduce a new control flow op `map()` which has the following semantics: ``` def map(f: Callable, xs: Tensor, *args): return torch.stack([f(x, *args) for x in xs]) ``` Test Plan: pytest functorch/test_control_flow.py CI Differential Revision: D41165796 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88767 Approved by: https://github.com/zou3519 commit 94b5c807fdb1fdf62bc2ab5f0161936f564b140c Author: Edward Z.
Yang Date: Fri Nov 18 13:14:40 2022 -0800 Detach fake tensors into val, so they aren't affected by metadata mutation (#89140) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89140 Approved by: https://github.com/bdhirsh commit 885f8a56d445796100f3ab6f806633890662021a Author: Nikita Shulga Date: Fri Nov 18 23:44:57 2022 +0000 [BE] Print backtraces from coredumps (#89309) By simply invoking `gdb python core -ex "bt" -ex "q"` Test plan: See: [linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)](https://github.com/pytorch/pytorch/actions/runs/3500498821/jobs/5863369649#step:14:39) Not sure why multiprocessing tests SEGFAULT, but they do Pull Request resolved: https://github.com/pytorch/pytorch/pull/89309 Approved by: https://github.com/clee2000, https://github.com/huydhn commit 0e1fcc8aa8790e54a85efdc81b959f46f089e3d3 Author: Tran Le Date: Fri Nov 18 23:19:14 2022 +0000 [FX] Add type annotation to `getitem` node before `split_module` (#88510) Summary: Some nodes lost the type annotation during `split_module`, causing the submodels to be un-scriptable. This is because compiler always infer Tensor type, which is wrong for non-Tensor types. We attempt to infer type annotation for `getitem` node to improve scriptability. Test Plan: ``` buck2 test //caffe2/test:fx_experimental ``` Differential Revision: D41037819 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88510 Approved by: https://github.com/xush6528 commit ecfb4e064ccedb42fd73d99f24cb749e05e28801 Author: Wei Wang Date: Fri Nov 18 23:05:50 2022 +0000 [Inductor CI] Use string format for cuda-arch-list input to prevent 8.0/9.0/10.0 etc from being interpreted as 8/9/10 (#89279) Currently or in future whenever we change the cuda-arch-list to num.0, github action or some agent would pass just num to TORCH_CUDA_ARCH_LIST This num is not regex matched during cuda arch analysis phase. (here: https://github.com/pytorch/pytorch/blob/c5fafb4e1694f141d8a1a31142cce4049d9057ed/cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake#L229) Example failure: https://github.com/weiwangmeta/pytorch/actions/runs/3495656108/jobs/5852735299 Unknown CUDA Architecture Name 8 in CUDA_SELECT_NVCC_ARCH_FLAGS This change reminds us to use e.g. '8.0', '9.0', '10.0' etc instead of 8.0, 9.0, 10.0 as GHA or some other agent may erroneously truncate it to pure numbers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89279 Approved by: https://github.com/desertfire, https://github.com/atalman commit 7551136b81251fef0505a935ab614a44dd355479 Author: Bryce Long Date: Fri Nov 18 22:36:05 2022 +0000 Add NVTX markers that dump additional information for nvprim_nvfuser Dynamo graphs (#88259) dump information on graphs that NVFuser JIT compiles: - the markers show the list of ops, args, and inputs that make up the graph also dumps information on FX nodes that are not touched by NVFuser: - the markers show the op, name, and arg list of the node Pull Request resolved: https://github.com/pytorch/pytorch/pull/88259 Approved by: https://github.com/IvanYashchuk, https://github.com/jjsjann123, https://github.com/mruberry commit 35d5fc52f01f0314ab1bf1555ea27d6fedbb7d98 Author: Taylor Robie Date: Thu Nov 17 13:33:39 2022 -0800 [Profiler] Don't raise SOFT_ASSERT in debug builds. (#89240) Enough people are hitting this issue that we need to turn off hard failures until the fire rate is zero in steady state. (via scuba logging.) 
Differential Revision: [D41382914](https://our.internmc.facebook.com/intern/diff/D41382914/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89240 Approved by: https://github.com/aaronenyeshi commit bfffc8d8efc3247853d706148146a5fd62d5ef08 Author: Andrew Gu Date: Thu Nov 17 23:06:09 2022 +0000 [DDP][Docs] Add warning that `no_sync()` should include forward (#89244) The issue where the user only includes `loss.backward()` inside `no_sync()` but not the forward pass has arisen several times now. I think adding an explicit warning in the docs is worthwhile. Rendered doc: Screen Shot 2022-11-17 at 9 21 32 PM Pull Request resolved: https://github.com/pytorch/pytorch/pull/89244 Approved by: https://github.com/zhaojuanmao commit 304b5de1b01213b18947ffcb6f5782f89fcd0b2e Author: David Berard Date: Thu Nov 17 09:58:29 2022 -0800 Re-enable test_hf_bert_fsdp (#89223) It looks like this failure was actually caused by https://github.com/pytorch/pytorch/pull/88629, see the revert message on that PR. It probably just looked like a flaky test on CI because of how quickly the PR was reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89223 Approved by: https://github.com/voznesenskym commit ba605c3b0439fd5dfe062f42e60b990c88c061d4 Author: Edward Z. Yang Date: Fri Nov 18 06:59:21 2022 -0800 Don't trace when we track_tensor_tree (#89139) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89139 Approved by: https://github.com/bdhirsh commit e04dc35a6a1d1447f6e067db5f29f88adff91acf Author: Edward Z. Yang Date: Fri Nov 18 06:59:20 2022 -0800 Symintify obeys_layout_contract (#89138) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89138 Approved by: https://github.com/bdhirsh commit 837ca8f344380f2356b01662f215ff561b09401f Author: Zain Rizvi Date: Fri Nov 18 19:36:09 2022 +0000 Remove --retry-all-errors from environment with old curl (#89298) The version of curl on the `ubuntu-latest` box doesn't support the `--retry-all-errors` param and is breaking periodic builds Example: https://github.com/pytorch/pytorch/actions/runs/3495466804/jobs/5852265880 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89298 Approved by: https://github.com/huydhn commit ee2ce3fef6d6bd073eb31303808618db88cec2e1 Author: Huy Do Date: Fri Nov 18 18:55:33 2022 +0000 Set make max load when building libtorch (#89237) The nccl build is still OOM sometimes when using `$(MAKE)`: ``` virtual memory exhausted: Cannot allocate memory Makefile:73: recipe for target '/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o' failed make[5]: *** [/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o] Error 1 make[5]: Leaving directory '/var/lib/jenkins/workspace/third_party/nccl/nccl/src/collectives/device' ``` * https://github.com/pytorch/pytorch/actions/runs/3476485191/jobs/5811758058 * https://github.com/pytorch/pytorch/actions/runs/3422228421/jobs/5702153639 So trying to set the same limit here as when building with ninja Pull Request resolved: https://github.com/pytorch/pytorch/pull/89237 Approved by: https://github.com/malfet commit 7ec8a4d2a26f717d0a4073e6005f9edfdd7ab641 Author: vfdev-5 Date: Fri Nov 18 18:46:50 2022 +0000 Vectorized horizontal flip implementation (#88989) When we benchmarked image processing transforms in torchvision : tensor vs pillow we saw that horizontal flip on uint8 data `(3, X, X)` is 2-3x slower. 
Due to the fact that output's first stride is negative, implementation does a simple data copy using [`basic_loop`](https://github.com/pytorch/pytorch/blob/8371bb8a3dddbead709bc1e9d26715818a34fa8a/aten/src/ATen/native/cpu/Loops.h#L286). In this PR, a vectorized path is added for horizontal flip op for dtypes: uint8, int, float32, long and double and there is a speed-up that reduces the gap between PIL and tensor ops
```
CPU capability usage: AVX2
[------------------------------------------------------ Horizontal flip ------------------------------------------------------]
                                               |  torch (1.14.0a0+git2ed1d29) PR  |  Pillow (9.3.0)  |  torch (1.14.0.dev20221116+cu116) nightly
1 threads: ---------------------------------------------------------------------------------------------------------------------
      channels=3, size=256, dtype=torch.int64 | 101.307 (+-0.904) | | 111.364 (+-0.328)
      channels=3, size=520, dtype=torch.int64 | 462.369 (+-2.184) | | 505.602 (+-0.541)
      channels=3, size=712, dtype=torch.int64 | 1855.441 (+-6.528) | | 1828.370 (+-8.600)
      channels=1, size=256, dtype=torch.int32 | 22.282 (+-0.130) | 44.218 (+-0.936) | 34.651 (+-0.162)
      channels=1, size=520, dtype=torch.int32 | 72.180 (+-0.076) | 166.639 (+-1.180) | 118.820 (+-0.210)
      channels=1, size=712, dtype=torch.int32 | 129.621 (+-0.649) | 307.140 (+-2.221) | 216.104 (+-0.793)
      channels=3, size=256, dtype=torch.uint8 | 51.685 (+-0.200) | 44.171 (+-0.818) | 361.611 (+-0.276)
      channels=3, size=520, dtype=torch.uint8 | 223.320 (+-0.726) | 166.607 (+-2.256) | 1462.012 (+-4.917)
      channels=3, size=712, dtype=torch.uint8 | 423.298 (+-1.156) | 307.067 (+-1.999) | 2738.481 (+-1.715)
      channels=1, size=256, dtype=torch.float32 | 22.281 (+-0.056) | 44.149 (+-0.808) | 35.316 (+-0.028)
      channels=1, size=520, dtype=torch.float32 | 72.268 (+-0.106) | 166.631 (+-1.212) | 119.504 (+-0.340)
      channels=1, size=712, dtype=torch.float32 | 129.777 (+-0.632) | 307.078 (+-1.909) | 216.987 (+-0.185)
      channels=1, size=256, dtype=torch.float16 | 32.789 (+-0.081) | | 34.044 (+-0.039)
      channels=1, size=520, dtype=torch.float16 | 112.693 (+-0.478) | | 117.445 (+-0.125)
      channels=1, size=712, dtype=torch.float16 | 203.644 (+-0.791) | | 213.283 (+-0.397)
      channels=3, size=256, dtype=torch.float64 | 102.058 (+-0.333) | | 108.404 (+-0.346)
      channels=3, size=520, dtype=torch.float64 | 473.139 (+-1.327) | | 503.265 (+-0.365)
      channels=3, size=712, dtype=torch.float64 | 1854.489 (+-9.513) | | 1844.345 (+-1.371)
      channels=1, size=256, dtype=torch.int16 | 11.927 (+-0.056) | | 33.993 (+-0.037)
      channels=1, size=520, dtype=torch.int16 | 39.724 (+-0.148) | | 117.577 (+-0.153)
      channels=1, size=712, dtype=torch.int16 | 68.264 (+-0.133) | | 213.118 (+-0.157)
Times are in microseconds (us).
```
```
CPU capability usage: AVX512
[------------------------------------------------------ Horizontal flip ------------------------------------------------------]
                                               |  torch (1.14.0a0+git2ed1d29) PR  |  Pillow (9.3.0)  |  torch (1.14.0.dev20221118+cu116) nightly
1 threads: ---------------------------------------------------------------------------------------------------------------------
      channels=3, size=256, dtype=torch.int64 | 131.244 (+-1.954) | | 135.649 (+-4.066)
      channels=3, size=520, dtype=torch.int64 | 522.032 (+-4.660) | | 539.822 (+-10.420)
      channels=3, size=712, dtype=torch.int64 | 1041.111 (+-53.575) | | 1322.411 (+-80.017)
      channels=1, size=256, dtype=torch.int32 | 10.108 (+-0.414) | 49.164 (+-1.000) | 34.606 (+-0.865)
      channels=1, size=520, dtype=torch.int32 | 93.218 (+-1.417) | 191.985 (+-5.047) | 133.664 (+-5.372)
      channels=1, size=712, dtype=torch.int32 | 167.919 (+-2.854) | 353.574 (+-6.568) | 246.162 (+-5.753)
      channels=3, size=256, dtype=torch.uint8 | 34.710 (+-0.541) | 49.005 (+-0.923) | 136.603 (+-2.339)
      channels=3, size=520, dtype=torch.uint8 | 154.873 (+-3.049) | 191.729 (+-4.997) | 534.329 (+-10.754)
      channels=3, size=712, dtype=torch.uint8 | 290.319 (+-4.819) | 351.619 (+-6.978) | 997.119 (+-33.086)
      channels=1, size=256, dtype=torch.float32 | 10.345 (+-0.338) | 49.105 (+-0.942) | 35.478 (+-0.733)
      channels=1, size=520, dtype=torch.float32 | 81.131 (+-5.281) | 191.697 (+-4.555) | 133.554 (+-4.193)
      channels=1, size=712, dtype=torch.float32 | 169.581 (+-3.476) | 352.995 (+-10.792) | 251.089 (+-7.485)
      channels=1, size=256, dtype=torch.float16 | 35.259 (+-0.612) | | 35.154 (+-0.924)
      channels=1, size=520, dtype=torch.float16 | 132.407 (+-1.980) | | 131.850 (+-5.611)
      channels=1, size=712, dtype=torch.float16 | 240.192 (+-5.479) | | 239.555 (+-7.273)
      channels=3, size=256, dtype=torch.float64 | 129.649 (+-2.349) | | 130.429 (+-6.240)
      channels=3, size=520, dtype=torch.float64 | 548.534 (+-5.179) | | 622.568 (+-25.720)
      channels=3, size=712, dtype=torch.float64 | 1208.091 (+-77.095) | | 1679.204 (+-316.292)
      channels=1, size=256, dtype=torch.int16 | 7.801 (+-0.115) | | 34.517 (+-0.482)
      channels=1, size=520, dtype=torch.int16 | 36.010 (+-0.855) | | 131.001 (+-1.686)
      channels=1, size=712, dtype=torch.int16 | 87.395 (+-1.355) | | 237.731 (+-4.181)
Times are in microseconds (us).
```
[Source](https://gist.github.com/vfdev-5/c0421f54c8aed655b042dd1ce4cb621e) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88989 Approved by: https://github.com/lezcano, https://github.com/datumbox, https://github.com/peterbell10, https://github.com/ngimel commit 81a4aeabdf9d550ceda52a5060f19568de61b265 Author: Yanbo Liang Date: Fri Nov 18 18:43:15 2022 +0000 [Dynamo] Support Tensor.nelement & torch.cuda.is_available (#89164) Fix several errors in [7k github models](https://github.com/pytorch/torchdynamo/issues/1198). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89164 Approved by: https://github.com/soumith commit 8a419cbffb939ef00ce723bbdf5bf1b8c62a7d74 Author: Horace He Date: Fri Nov 18 10:56:03 2022 +0000 Added partial decomposition of conv_backward and grad_bias computation (#89128) `convolution_backward` often just kicks off the `sum` as a separate kernel. Splitting it off in a decomp allows us to fuse it into other ops: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Convolution.cpp#L2150 Improves `convnext_base` from 373 img/s => 383 img/s Not sure what other models use convolution with bias haha.
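For illustration, a minimal sketch (assumed NCHW layout, hypothetical helper name) of the grad_bias reduction that the decomposition splits out so it can be fused with neighboring ops:
```python
import torch

def grad_bias_from_grad_output(grad_output: torch.Tensor) -> torch.Tensor:
    # For an NCHW grad_output, the bias gradient is just a sum over every
    # dimension except the output-channel dimension.
    return grad_output.sum(dim=(0, 2, 3))

g = torch.randn(8, 16, 32, 32)  # (N, C_out, H, W)
assert grad_bias_from_grad_output(g).shape == (16,)
```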
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89128 Approved by: https://github.com/ezyang commit 38ccd08f9b79bc2102050833948f5112aed2dfc4 Author: Jerry Zhang Date: Fri Nov 18 00:15:45 2022 -0800 [quant][fx][be] Refactor replace observer with q/dq op code (#89247) Summary: This is a refactor to prepare for future extensions, no functionality changes Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89247 Approved by: https://github.com/vkuzo, https://github.com/andrewor14 commit c219b55b5f8d5718d382735628e9eb8a46caee9f Author: zhxchen17 Date: Thu Nov 17 21:35:51 2022 -0800 Use standard __func__ macro in symbolic shape. (#89264) Summary: I saw the following issue only on Windows build in PR #88767: ``` RuntimeError: AttributeError: 'SymNode' object has no attribute 'torch::impl::PythonSymNodeImpl::ge' ``` It's only on Windows because we get the attributes of SymNode in C++ with `__FUNCTION__` macro, which is not in C++ standard, therefore has platform specific behavior. In this case, MSVC will include a function's namespace and class name, which is not intended here. Instead we should use `__func__`. see: https://en.cppreference.com/w/cpp/language/function#Function_definition godbolt example to show the difference: https://godbolt.org/z/PGfvecxPx Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89264 Approved by: https://github.com/ezyang commit 12a97444c3f5b640be54f3307895cd0e0c18085a Author: Richard Howell Date: Fri Nov 18 16:30:53 2022 +0000 [xplat] remove -weak_framework (#89233) Summary: The `-weak_framework` flag is no longer necessary, Buck will weakly link frameworks depending on the `target_sdk_version` of the binary being linked. 
Test Plan: Compare IG load commands before and after change with P553208168 ``` load command difference in Instagram.app/Frameworks/InstagramXplatFramework.framework/InstagramXplatFramework --- /tmp/tmpvd97s2v0 2022-11-16 12:13:54.082910598 -0800 +++ /tmp/tmpj20r_4ca 2022-11-16 12:13:54.082910598 -0800 @@ -9,7 +9,7 @@ /System/Library/Frameworks/CoreHaptics.framework/CoreHaptics (compatibility version 1.0.0, current version 1.0.0, weak) /System/Library/Frameworks/CoreImage.framework/CoreImage (compatibility version 1.0.0, current version 5.0.0) /System/Library/Frameworks/CoreLocation.framework/CoreLocation (compatibility version 1.0.0, current version 2780.0.17) - /System/Library/Frameworks/CoreML.framework/CoreML (compatibility version 1.0.0, current version 1.0.0, weak) + /System/Library/Frameworks/CoreML.framework/CoreML (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/CoreMedia.framework/CoreMedia (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/CoreServices.framework/CoreServices (compatibility version 1.0.0, current version 1226.0.0) /System/Library/Frameworks/CoreTelephony.framework/CoreTelephony (compatibility version 1.0.0, current version 0.0.0) @@ -33,9 +33,9 @@ /System/Library/Frameworks/Security.framework/Security (compatibility version 1.0.0, current version 60420.40.34) /System/Library/Frameworks/SystemConfiguration.framework/SystemConfiguration (compatibility version 1.0.0, current version 1241.40.2) /System/Library/Frameworks/UIKit.framework/UIKit (compatibility version 1.0.0, current version 6109.1.108) - /System/Library/Frameworks/UserNotifications.framework/UserNotifications (compatibility version 1.0.0, current version 1.0.0, weak) + /System/Library/Frameworks/UserNotifications.framework/UserNotifications (compatibility version 1.0.0, current version 1.0.0) /System/Library/Frameworks/VideoToolbox.framework/VideoToolbox (compatibility version 1.0.0, current version 1.0.0) - /System/Library/Frameworks/WebKit.framework/WebKit (compatibility version 1.0.0, current version 614.2.9, weak) + /System/Library/Frameworks/WebKit.framework/WebKit (compatibility version 1.0.0, current version 614.2.9) /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0) /usr/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.8) /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.32.0) ``` Both these changes are correct, WebKit is available from 8.0, UserNotifications from 10.0 and CoreML from 11.0. Instagram has a deployment target of 12.4. Reviewed By: ebgraham Differential Revision: D41348639 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89233 Approved by: https://github.com/malfet commit 19e66fcec235fe46a23186a59446bcfe70ad4f6d Author: andrewor14 Date: Thu Nov 17 12:47:33 2022 -0800 [Quant] Allow setting fixed qparams for inner LSTM ops (#88456) Summary: In both eager and FX graph mode quantization, `torch.ao.nn.quantizable.LSTM` is used as an observed custom module, which is responsible for inserting its own observers. By default, the user specifies a single QConfig for the custom module (either through QConfigMapping or by setting the "qconfig" attribute"), and all inner ops will [inherit this QConfig](https://github.com/pytorch/pytorch/blob/dc00bb51b8d370bf3891f0edb2c6e0c2914e329a/torch/ao/nn/quantizable/modules/rnn.py#L366-L378) and use the same observer/fake_quantize constructors. 
Today, users who wish to override this behavior must extend `torch.ao.nn.quantizable.LSTM` and write a lot of custom code to manually assign the QConfigs to the inner ops. This commit alleviates this burden on the user by providing a helper function to assign QConfigs with custom observers. An example use case of this is providing a reference implementation for a backend kernel that hardcodes qparams for efficiency. Example usage: ``` import torch from torch.ao.quantization import get_default_qconfig_mapping from torch.ao.quantization.fx.custom_config import ( PrepareCustomConfig, ConvertCustomConfig, ) class MyModel(torch.nn.Module): ... class UserLSTM(torch.ao.nn.quantizable.LSTM): @classmethod def from_float(cls, other): assert isinstance(other, cls._FLOAT_MODULE) linear_output_obs_ctr = FixedQParamsObserver.with_args( scale=2 ** -11, zero_point=2 ** 15, dtype=torch.qint32) sigmoid_obs_ctr = FixedQParamsObserver.with_args( scale=2 ** -16, zero_point=0, dtype=torch.qint32) tanh_obs_ctr = FixedQParamsObserver.with_args( scale=2 ** -15, zero_point=2 ** 15, dtype=torch.qint32) cell_state_obs_ctr = FixedQParamsObserver.with_args( scale=2 ** -11, zero_point=0, dtype=torch.qint32) hidden_state_obs_ctr = FixedQParamsObserver.with_args( scale=2 ** -7, zero_point=2 ** 7, dtype=torch.quint8) return torch.ao.quantization.utils._get_lstm_with_individually_observed_parts( float_lstm=other, linear_output_obs_ctr=linear_output_obs_ctr, sigmoid_obs_ctr=sigmoid_obs_ctr, tanh_obs_ctr=tanh_obs_ctr, cell_state_obs_ctr=cell_state_obs_ctr, hidden_state_obs_ctr=hidden_state_obs_ctr, ) qconfig_mapping = get_default_qconfig_mapping() example_inputs = (torch.rand(5, 3, 50), torch.rand(1, 3, 50), torch.randn(1, 3, 50)) prepare_custom_config = PrepareCustomConfig() \ .set_float_to_observed_mapping(torch.nn.LSTM, UserLSTM) convert_custom_config = ConvertCustomConfig() \ .set_observed_to_quantized_mapping(UserLSTM, torch.ao.nn.quantized.LSTM) model = MyModel() model = prepare_fx(model, qconfig_mapping, example_inputs, prepare_custom_config=prepare_custom_config) model(*example_inputs) # calibrate model = convert_fx(model, convert_custom_config=convert_custom_config) model(*example_inputs) ``` Test Plan: python test/test_quantization.py TestQuantizeFx.test_static_lstm_with_custom_fixed_qparams Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88456 Approved by: https://github.com/jerryzh168, https://github.com/vkuzo commit 19fcb80551854431e7e05c422690751037a18488 Author: Bin Bao Date: Fri Nov 18 16:15:55 2022 +0000 [inductor] Skip DALLE2_pytorch in torchbench (#89288) Summary: DALLE2_pytorch fails in eager as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89288 Approved by: https://github.com/Krovatkin commit 1f7c0ff6e799e7bde94975f7a5bbec39a69ab8f6 Author: Bin Bao Date: Fri Nov 18 13:41:51 2022 +0000 [inductor] Temporarily disable functorch_dp_cifar10 test in TorchBench (#89281) Summary: The failure wasn't caught because of a land race. Skip the test for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89281 Approved by: https://github.com/Krovatkin commit 55e55d95ea9a6f64bba50cdc9e243808cb534202 Author: Howard Huang Date: Fri Nov 18 15:27:15 2022 +0000 Update torch.distributed.DistBackendError type (#89235) Summary: Update torch.distributed.DistBackendError type based on https://fb.workplace.com/groups/pyreqa/posts/5753993921357059 Test Plan: Pyre tests should pass? 
let sandcastle run Reviewed By: markkm Differential Revision: D41384130 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89235 Approved by: https://github.com/awgu commit 154e58c03285f3d399b8818dd17e973d486efefa Author: lezcano Date: Fri Nov 18 11:25:36 2022 +0000 Add most in-place references/decompositions (#88117) We add most in-place references in a generic way. We also implement a wrapper to implement the annoying interface that `nn.functional` nonlinearities have. We fix along the way a couple decompositions for some non-linearities by extending the arguments that the references have. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88117 Approved by: https://github.com/mruberry commit 6741443c7ceae0201fd76b5e6fc59ebd8cd6876a Author: lezcano Date: Fri Nov 18 10:35:46 2022 +0000 Simplify maybe_resize_out (#88116) The previous behaviour would call `resize_` on 0-sized elements even when their size was correct. This would make some test fail, as resize_ may be an in-place operation and it's not supported by some subsystems Pull Request resolved: https://github.com/pytorch/pytorch/pull/88116 Approved by: https://github.com/mruberry commit ce0e22a81a2383c7c951310c9c0aa7638748687b Author: lezcano Date: Fri Nov 18 10:35:45 2022 +0000 Fix names of some reference functions (#88115) The `__name__` field of some binary reference functions was wrong. We fix this to be consistent with unary reference functions. In the future, we should probably make the binary reference wrapper return a wrapper itself to avoid all those calls to `partial`. This change helps performing some homogeneous treatment of functions by their name. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88115 Approved by: https://github.com/mruberry commit 2e358cc98fab728aad8775de28596d589358b3b2 Author: Jacob Hayes Date: Fri Nov 18 14:09:21 2022 +0000 Add platform markers for linux only extra_install_requires (#88826) Fixes #88049 https://github.com/pytorch/pytorch/pull/85097 added new extra dependencies on `nvidia-*`. They are linux (GPU) only packages, but were not marked as such, causing issues installing pytorch 1.13 via Poetry (and possibly other tools that follow PyPI's metadata API) on non-Linux systems. This "fixes" the issue by adding the `; platform_system = 'Linux'` marker on these dependencies, but the main problem of different metadata for different wheels is a [somewhat larger issue](https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269). https://github.com/pytorch/pytorch/pull/85097 used `;` as a delimiter for splitting the different deps, but that is the delimiter used in markers, so I changed to split on `|`. 
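For context, a minimal sketch of what such a marker looks like (illustrative package and project names, not the actual PyTorch setup.py):
```python
from setuptools import setup

setup(
    name="example-package",  # placeholder project name
    version="0.0.1",
    install_requires=[
        # PEP 508 environment marker: only installed when the target platform
        # is Linux, so macOS/Windows installs simply skip the dependency.
        "nvidia-cuda-runtime-cu11; platform_system == 'Linux'",
    ],
)
```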
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88826 Approved by: https://github.com/neersighted, https://github.com/lalmei, https://github.com/malfet commit 5654fed23e7728eca717b23c97c1fca8c176112a Author: Nikita Shulga Date: Fri Nov 18 10:51:07 2022 +0000 Export c10/[macros|util] headers to be used by internal inductor builds (#89249) Summary: Fixes package boundary violation that existed in previous implementation Test Plan: CI Differential Revision: D41391862 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89249 Approved by: https://github.com/izaitsevfb commit 4c6724985d8b85c5719078a25255dbd7369c25e5 Author: Iris Date: Fri Nov 18 09:49:36 2022 +0000 [PT-D][Checkpoint] Update import and update docstring for distributed checkpoint (#89256) Update test import and docstring as we have moved distributed checkpointing from torch.distributed._shard.checkpoint to torch.distributed.checkpoint (https://github.com/pytorch/pytorch/pull/88698). Test: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/89256 Approved by: https://github.com/fduwjj commit 2dcacc6b999a44e13a0dbb679ac17d767b05d898 Author: Jiewen Tan Date: Fri Nov 18 09:28:46 2022 +0000 [LTC] Upstream short_metrics (#89186) Summary: This pull request upstreams pytorch/xla#4148. Test Plan: xla CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89186 Approved by: https://github.com/JackCaoG commit c5fafb4e1694f141d8a1a31142cce4049d9057ed Author: HDCharles Date: Thu Nov 17 19:20:22 2022 -0800 [ao] maintain BC for is_activation_post_process (#89260) Summary: tests are failing due to code packaged with trained models calling now defunct function names (is_activation_post_process). this diff maintains BC temporarily until the cached code can be refreshed Test Plan: no functional change Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89260 Approved by: https://github.com/jerryzh168 commit 30c3e5afb0c0ad22c1084a2064ebdc09f7808ecc Author: Michael Lazos Date: Fri Nov 18 07:46:35 2022 +0000 Disable tracing `zero_grad()` (#88731) Tracing through zero grad is slow, and doesn't provide any benefits. Helps https://github.com/pytorch/torchdynamo/issues/1803 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88731 Approved by: https://github.com/anijain2305 commit afdc48f843afab531a4315a1ca1a43f5f303c5b7 Author: Huy Do Date: Fri Nov 18 07:39:16 2022 +0000 Gate CUDA-only inductor tests by HAS_CUDA (#89251) This is to prevent these tests from running on platform where CUDA doesn't exist such as macos. 
And they are quite flaky https://hud.pytorch.org/failure/test_linear_permute_fusion_cpu, failing the CI from time to time Pull Request resolved: https://github.com/pytorch/pytorch/pull/89251 Approved by: https://github.com/soumith, https://github.com/desertfire commit 6a964c16e5125f485372418d129c3eabdec7e881 Author: kshitij12345 Date: Fri Nov 18 07:31:10 2022 +0000 [flaky] relax tolerance conv1d_vs_scipy (#89193) Fixes https://github.com/pytorch/pytorch/issues/89087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89193 Approved by: https://github.com/kit1980 commit fc1c0cd3ef5af94e2b6cb262252cf97b61e5d3cb Author: PumeTu Date: Fri Nov 18 07:24:33 2022 +0000 Add support trace on MPS backend (#87910) Fixes [#87221](https://github.com/pytorch/pytorch/issues/87221) `trace` is now supported on MPS Pull Request resolved: https://github.com/pytorch/pytorch/pull/87910 Approved by: https://github.com/kulinseth, https://github.com/malfet commit 7beb1518896482596a0d35ec404338d430933250 Author: maxren Date: Thu Nov 17 14:31:43 2022 -0800 [xnnpack][executorch] remove unordered_set from xnn_compiler (#89231) Removing unordered_set from xnncompiler for executorch. While some STL libraries are unavoidable, and I think it should be ok for the delegate to pull in these libraries, unordered_set wasn't really needed, and we should be serializing the number of external ids anyway. After this, the backend classes should be good to hg copy into executorch Differential Revision: [D41227391](https://our.internmc.facebook.com/intern/diff/D41227391/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89231 Approved by: https://github.com/salilsdesai, https://github.com/cccclai commit ab75982d3a8d76052dbaf1eb37c5b9b729ac0dd8 Author: Zain Rizvi Date: Fri Nov 18 07:03:22 2022 +0000 Always retry curl downloads (#89157) Modify our curl commands so that they always retry downloads. By default, curl only retries what it considers to be "transient" errors, based on the server's response. However, curl's estimate of what's transient is very conservative. By adding the --retry-all-errors parameter we'll always retry curl commands. In particular, I'm hoping this mitigates errors where curl fails with the below error ([logs](https://github.com/pytorch/pytorch/actions/runs/3468758110/jobs/5794939941)) `curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to ossci-linux.s3.amazonaws.com:443` Some of the modified downloads didn't even have retries, so I added them in. More details: https://everything.curl.dev/usingcurl/downloads/retry Pull Request resolved: https://github.com/pytorch/pytorch/pull/89157 Approved by: https://github.com/kit1980, https://github.com/malfet commit 3bc78295c265df62983fcbcadb4a87ef7d0fbf2d Author: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Date: Fri Nov 18 05:08:45 2022 +0000 Fix consistency of histc on CPU and CUDA (#87832) Fixes #87657 The main reason why `histc` returns slightly different outputs is the difference in how the bin position is calculated. The CPU calculates it as https://github.com/pytorch/pytorch/blob/449778a939f2adc8867c5035b08be4e2d88339d8/aten/src/ATen/native/cpu/HistogramKernel.cpp#L168-L170, which is basically `(i - a) / (b - a) * N`, while the CUDA code https://github.com/pytorch/pytorch/blob/449778a939f2adc8867c5035b08be4e2d88339d8/aten/src/ATen/native/cuda/SummaryOps.cu#L41 uses `(i - a) * N / (b - a)`. For some cases like in #87657 the order of arithmetic operations matters due to the floating point round-off.
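A tiny numeric sketch of the difference (values picked arbitrarily; whether the two orderings actually diverge depends on the inputs):
```python
import torch

a = torch.tensor(0.1, dtype=torch.float32)    # histogram min (example value)
b = torch.tensor(0.9, dtype=torch.float32)    # histogram max (example value)
N = torch.tensor(100.0, dtype=torch.float32)  # number of bins (example value)
i = torch.tensor(0.3, dtype=torch.float32)    # sample value (example value)

cpu_style = (i - a) / (b - a) * N   # divide first, then scale (CPU kernel order)
cuda_style = (i - a) * N / (b - a)  # scale first, then divide (CUDA kernel order)
# The two results can differ in the last ULP, which is enough to flip the
# integer bin index when the value sits right on a bin boundary.
print(cpu_style.item(), cuda_style.item())
```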
________________ Not sure where would be the most appropriate place to put the unit test. Hope `test_reductions::test_histc` will do. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87832 Approved by: https://github.com/soumith commit f1fb586bc64b96264f4409421d758e9336f19eef Author: Sherlock Huang Date: Thu Nov 17 18:50:33 2022 +0000 Symintify repeat_interleave.self_int (#89111) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89111 Approved by: https://github.com/ezyang commit ba5e39e106caaf4e013fbfc4890d3df13e66d6c9 Author: Sherlock Huang Date: Thu Nov 17 18:10:40 2022 +0000 Fix tol for test_nvfuser_correctness__softmax_backward_data_cuda (#89178) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89178 Approved by: https://github.com/kit1980 commit 6f609dd0e03e11395cc637a34abd68472e5a1e12 Author: Yoni Chechik Date: Fri Nov 18 04:29:00 2022 +0000 docs: conv2d `padding` attribute- add `int` option (#85004) `padding: int` already exists but isn't mentioned in the genereted docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/85004 Approved by: https://github.com/albanD, https://github.com/kit1980 commit 6f4f69f54d181b34373e07dcb415f6c2af61868f Author: Jacob Szwejbka Date: Fri Nov 18 04:13:03 2022 +0000 [Executorch] [Quantization] New pattern for dynamic dequant (#89236) Summary: The op exposed should be qparams, and then we have concerns about prims not being supported so make q and dq ops that take in tensors Test Plan: unit test Differential Revision: D41382580 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89236 Approved by: https://github.com/jerryzh168 commit f4efc5e821259aee1b64ee32f992ea3458dcd546 Author: Jerry Zhang Date: Thu Nov 17 16:45:47 2022 -0800 [quant][be] Move some helper functions to the top level to reduce function length (#89246) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/89246 Approved by: https://github.com/vkuzo commit 6ed14c7dcfb261e84016407d8025bf3e27999730 Author: PyTorch MergeBot Date: Fri Nov 18 03:45:53 2022 +0000 [vision hash update] update the pinned vision hash (#89102) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89102 Approved by: https://github.com/pytorchbot commit 3c2676de3d35fd22f79c46eaa770d03f1418c480 Author: Jiewen Tan Date: Fri Nov 18 03:37:14 2022 +0000 [LTC] Restore GetPythonFrames (#89122) Summary: pytorch/pytorch@936e930 delete the registration of GetPythonFramesFunction. Restore that and add a test case to prevent regression. Test Plan: python test/lazy/test_debug_util.py Fixes pytorch/xla#4206. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89122 Approved by: https://github.com/JackCaoG commit 65bcd1f88099dfeefccb6c6b7a0918e3a7ded606 Author: John Detloff Date: Fri Nov 18 03:17:35 2022 +0000 Add previously deleted circleci readme back to repo (#85598) This readme was deleted here: https://github.com/pytorch/pytorch/pull/73224 I chatted with the author, who doesn't remember exactly why it was deleted but suspects it was due either to out of date contents or because of the upcoming migration to github actions. 
With that said, we have references to this readme through our circleci directory, and since we do still have a lot of circleci workflows I feel this readme still adds a lot of value. (I recently did some CI tasks that required me to dig this readme up in order to solve a problem). I recommend we restore this file with a warning that its contents may be out of date, until our CircleCI workflows are entirely migrated to Github Actions Pull Request resolved: https://github.com/pytorch/pytorch/pull/85598 Approved by: https://github.com/clee2000, https://github.com/malfet commit 92f9214a311a6b94dff9e38836d5b0849a539647 Author: mikey dagitses Date: Thu Nov 17 16:20:45 2022 -0500 add -Wnarrowing as error to cmake builds (#89207) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89207 Approved by: https://github.com/wconstab, https://github.com/malfet commit fd0efb01a7a3a5b487d3d23c2c53a936620ba28a Author: Raman kumar Date: Fri Nov 18 02:53:39 2022 +0000 [MPS] Support for median with dim (#88807) **Aim**: Add support for aten::median for MPS backend (Fixes #87220) This is fresh clean PR from the previous [PR](https://github.com/pytorch/pytorch/pull/88554) - Implementing the new median function in aten/src/ATen/native/mps/operations/ReduceOps.mm - Adding it to aten/src/ATen/native/native_functions.yaml - Adding it to existing test_median median of entire input tensor on MPS `torch.median(mps_inputTensor)` median of along a dim `torch.median(mps_inputTensor, dim=[int], keepdim=[Bool])` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88807 Approved by: https://github.com/kulinseth commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd Author: Dmytro Dzhulgakov Date: Fri Nov 18 02:42:45 2022 +0000 Fix the kineto daemon build condition (#89174) If we're not building the lite interpreter we shouldn't be disabling Kineto. This eliminates a step from https://github.com/facebookincubator/dynolog/blob/main/docs/pytorch_profiler.md Pull Request resolved: https://github.com/pytorch/pytorch/pull/89174 Approved by: https://github.com/kimishpatel, https://github.com/malfet commit b652fbc57a331df5aa28b0bcd07f9e72db2fdbae Author: David Boetius Date: Fri Nov 18 01:57:38 2022 +0000 Fix torch.nn.functional.gelu docstring formatting (#89061) The docstring of `torch.nn.functional.gelu` is formatted incorrectly, so that part of the math isn't rendered and there are extra blocks when there shouldn't: https://pytorch.org/docs/stable/generated/torch.nn.functional.gelu.html I didn't build the docs, so I am not 100% sure that I got the formatting right, but I am confident. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89061 Approved by: https://github.com/bdhirsh, https://github.com/kit1980 commit 177621a0b28b931d9be6976c2c38cb57af7949d9 Author: Huy Do Date: Fri Nov 18 00:11:42 2022 +0000 Use pytest-flakefinder to rerun tests multiple times (#89106) Per title. The way re-run is handled in https://github.com/pytorch/pytorch/pull/88646 only applies to unittest. * https://github.com/pytorch/pytorch/actions/runs/3484930558 * https://github.com/pytorch/pytorch/actions/runs/3484930319 Manually download the test report artifacts and verify that that pytest test_ops is called multiple times. 
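For context, a rough sketch of what the flake-finder rerun does (assumes the pytest-flakefinder plugin is installed; the test path is illustrative): the plugin duplicates each collected test so it runs several times within a single pytest session.
```python
import pytest

# Equivalent to: pytest --flake-finder --flake-runs=3 test/test_ops.py
exit_code = pytest.main(["--flake-finder", "--flake-runs=3", "test/test_ops.py"])
print("pytest exit code:", exit_code)
```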
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89106 Approved by: https://github.com/clee2000 commit 57e05e822d0f53db04d2ee2216906f6fc01b4a4f Author: Dmitry Tomshin Date: Fri Nov 18 00:10:48 2022 +0000 Issue 68576 prefetch factor (#88972) Fixes #68576 This PR allows set the `prefetch_factor=None` making it really optional according to the documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/88972 Approved by: https://github.com/kit1980 commit 2b3ac879a7d68aca8a7608e97a7cfc713dbf5c6c Author: Sean Ross-Ross Date: Thu Nov 17 23:36:15 2022 +0000 feat: adding view_copy_batch_rule and opinfo for view_copy (#88150) to add view_copy to vmap dispatch and adding opinfo part of https://github.com/pytorch/functorch/issues/825 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88150 Approved by: https://github.com/kshitij12345, https://github.com/zou3519 commit 31b10e7d4083acd0eb689ae3873c13b8711770be Author: Bin Bao Date: Thu Nov 17 19:43:37 2022 +0000 Enable inductor CI for TorchBench (#87465) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87465 Approved by: https://github.com/malfet commit 3d8a853a87515a5e29e384396ff8769f4ee2f946 Author: erjia Date: Thu Nov 17 23:06:41 2022 +0000 [DataPipe] Add container template for _Fork and _Demux (#89216) - This would remove the hard-coded check within `_ChildDataPipe`. - Add `get_length_by_instance` to parent class to make sure there is a chance that child DataPipe can have different lengths - Prevent Error when `__del__` executed when the object has already been removed Pull Request resolved: https://github.com/pytorch/pytorch/pull/89216 Approved by: https://github.com/NivekT commit e2229a89b0618b58011a69a28e3d23cf7096e547 Author: keineahnung2345 Date: Thu Nov 17 22:28:20 2022 +0000 Fix typo in aten/src/README.md (#89175) remove redundant "have to" Pull Request resolved: https://github.com/pytorch/pytorch/pull/89175 Approved by: https://github.com/kit1980 commit a695fcf20103bb08ae660788d128cd924e6ec05b Author: Charlie Yan Date: Thu Nov 17 19:05:44 2022 +0000 Add tests for replicate multiple modules (#89099) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89099 Approved by: https://github.com/zhaojuanmao commit 767f6aa49fe20a2766b9843d01e3b7f7793df6a3 Author: Nikita Shulga Date: Thu Nov 17 22:05:27 2022 +0000 [JIT][Security] Do not blindly eval input string (#89189) Introduce `_eval_no_call` method, that evaluates statement only if it does not contain any calls(done by examining the bytecode), thus preventing command injection exploit Added simple unit test to check for that `torch.jit.annotations.get_signature` would not result in calling random code. Although, this code path exists for Python-2 compatibility, and perhaps should be simply removed. Fixes https://github.com/pytorch/pytorch/issues/88868 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89189 Approved by: https://github.com/suo commit fbbf3687453aed1b732eee6f6e9050258ce29561 Author: Huy Do Date: Thu Nov 17 21:33:59 2022 +0000 Fix distributed test paths when running periodic multigpu job (#89225) Some distributed tests are moved to a new location after https://github.com/pytorch/pytorch/pull/88698. 
This is currently failing periodic multigpu job: * https://github.com/pytorch/pytorch/actions/runs/3484486207/jobs/5829301159 * https://github.com/pytorch/pytorch/actions/runs/3484486207/jobs/5829301093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89225 Approved by: https://github.com/clee2000 commit f057a45fafcd5869d8f6f7e687fad1d36749b9d0 Author: mikey dagitses Date: Thu Nov 17 06:09:55 2022 -0500 reland "support running test_mobile_profiler with buck1/buck2 and OSS (#89001)" (#89091) We modify this to no longer use std::experimental::filesystem::path and use our own custom type instead. This reverts commit c53a5ac6cca7e2e7d7c47b1a816c7eaa2e7a7704. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89091 Approved by: https://github.com/r-barnes, https://github.com/malfet commit e856a4d66bead8997a83f8714547c09fcbcdc263 Author: Xiao Wang <24860335+xwang233@users.noreply.github.com> Date: Thu Nov 17 20:10:52 2022 +0000 Add an env var to skip cudnn version compatibility check (#89184) skip the check by setting `PYTORCH_SKIP_CUDNN_COMPATIBILITY_CHECK=1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89184 Approved by: https://github.com/ngimel commit 04169c5b6e53e89e339f02b61287154034ee9fca Author: Tugsbayasgalan (Tugsuu) Manlaibaatar Date: Mon Nov 14 23:26:15 2022 -0800 Rewrite assert statement with torch._assert under config (#88246) This diff rewrites assert statement in python with torch._assert under config. The resulting graph looks something like: ``` SOURCE CODE: def f(x): assert x[0] == 3 return x.cos() CAPTURED GRAPH: graph(): %arg0 : [#users=2] = placeholder[target=arg0] %getitem : [#users=1] = call_function[target=operator.getitem](args = (%arg0, 0), kwargs = {}) %eq : [#users=1] = call_function[target=operator.eq](args = (%getitem, 3), kwargs = {}) %_assert : [#users=0] = call_function[target=torch._assert](args = (%eq, "assertion_error"), kwargs = {}) %cos : [#users=1] = call_method[target=cos](args = (%arg0,), kwargs = {}) return cos ``` Note that this introduces side-effect as it could error out while executing graph, but the assertion can eliminated via DCE if we choose to ignore it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88246 Approved by: https://github.com/jansel commit af448e84eb2978062dc6ca4d3d538ed46b58f3d6 Author: William Wen Date: Thu Nov 17 19:20:49 2022 +0000 Fix bug in dynamo dashboard summary stats diff (#89226) Fixes issue where a suite may not be present in one of the logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89226 Approved by: https://github.com/anijain2305 commit 706f791a1912af62e5a605bf93e246b457506627 Author: PyTorch MergeBot Date: Thu Nov 17 18:27:08 2022 +0000 Revert "Support masked_fill (#88736)" This reverts commit 2b131b1d43b10a2a005f3f042f920a62501e4e2d. Reverted https://github.com/pytorch/pytorch/pull/88736 on behalf of https://github.com/kit1980 due to Inductor tests are failing with AttributeError: module 'torch._inductor.codecache' has no attribute 'valid_vec_isa_list' commit 8e4c9828f4c990f439179912159086aaed790493 Author: PyTorch MergeBot Date: Thu Nov 17 17:02:36 2022 +0000 Revert "Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143)" This reverts commit e686b8c3ba93cb7caa314c78bf84dbd2d7df9683. 
Reverted https://github.com/pytorch/pytorch/pull/89143 on behalf of https://github.com/ZainRizvi due to This seems to be causing the test_make_fx_symbolic_exhaustive_rad2deg_cpu_float32 and test_make_fx_symbolic_exhaustive_inplace_rad2deg_cpu_float32 test to fail across multiple jobs commit cd81a700ecfb84a039257896af7b8398435b089e Author: Jiong Gong Date: Thu Nov 17 16:43:16 2022 +0000 Fix buffer overflow from AddressSanitizer checks due to inaccurate bfloat16 representation of large integer (#89210) Fixes #88939 The root cause of the issue is that BF16 cannot accurately represent big integer values. In the test case below, `539` as one of the corner pixel index is wrongly represented as `540` (from https://github.com/jgong5/pytorch/blob/fc60a1865eafc985217eccc0251f82014041e6a7/aten/src/ATen/native/UpSample.h#L271) and then the access out of the range with this index. Thanks to @malfet for the investigation and initial fix. I also reported an issue https://github.com/pytorch/pytorch/issues/89212 to track the issue of inaccurate integer representation of bf16 that need to be addressed in other places of PyTorch. ```python import torch def test(): arg_1 = torch.rand([1, 10, 540, 540], dtype=torch.bfloat16).clone() res = torch.nn.functional.interpolate(arg_1,2,mode='bilinear',align_corners=True) test() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89210 Approved by: https://github.com/malfet commit 2b131b1d43b10a2a005f3f042f920a62501e4e2d Author: Wang, Eikan Date: Thu Nov 17 03:33:32 2022 +0000 Support masked_fill (#88736) Support `masked_fill` to address the GPT2 performance issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88736 Approved by: https://github.com/jansel, https://github.com/jgong5 commit e686b8c3ba93cb7caa314c78bf84dbd2d7df9683 Author: Edward Z. Yang Date: Wed Nov 16 21:31:02 2022 -0800 Reland "Towards unifying symbolic and non symbolic fake tensor (#89038)" (#89143) This reverts commit cf6003f0469ae1440d4a8585860c2c5f4c738707. Differential Revision: [D41363992](https://our.internmc.facebook.com/intern/diff/D41363992) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89143 Approved by: https://github.com/albanD commit bdc9911575277848ccac56b344dd624aa97fb87d Author: Will Constable Date: Wed Nov 16 23:31:57 2022 +0000 Fix typo in dist_util.py (#89167) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89167 Approved by: https://github.com/davidberard98 commit 3beccbc29939f7a34346ed1a3646f6464086eeb4 Author: ecao Date: Thu Nov 17 08:15:49 2022 +0000 Add BFloat16 support and optimization for mish, hardtanh backward, and silu on CPU (#82460) * add BFloat16 support for mish and hardtanh backward on CPU. * optimize the performance for silu - optimize the performance for silu: bfloat16 single socket (28 cores): ``` before: 1x128x1024 forward 0.090 s backward 0.218 s 10x128x1024 forward 0.146 s backward 0.314 s after: 1x128x1024 forward 0.064 s backward 0.100 s 10x128x1024 forward 0.085 s backward 0.133 s ``` single core: ``` before: 1x128x1024 forward 0.300 s backward 0.606 s 10x128x1024 forward 2.825 s backward 5.834 s after: 1x128x1024 forward 0.156 s backward 0.239 s 10x128x1024 forward 1.447 s backward 2.165 s ``` - Add BFloat16 support for mish and backward of hardtanh on CPU. 
single socket (20 cores):

op | shape | fp32 / s | fp32 / s | bf16 / s | bf16 / s
-- | -- | -- | -- | -- | --
  |   | forward | backward | forward | backward
silu | [10, 128, 10, 10] | 4.41E-05 | 7.67E-05 | 5.32E-05 | 9.38E-05
  | [10, 128, 80, 80] | 0.0008 | 0.001788 | 0.00067 | 0.001031
mish | [10, 128, 10, 10] | 0.000356 | 0.000427 | 0.000367 | 0.000436
  | [10, 128, 80, 80] | 0.004527 | 0.005807 | 0.004757 | 0.005393
hardtanh | [10, 128, 10, 10] | / | 3.97E-05 | / | 4.45E-05
  | [10, 128, 80, 80] | / | 0.001748 | / | 0.000645

single core:

op | shape | fp32 / s | fp32 / s | bf16 / s | bf16 / s
-- | -- | -- | -- | -- | --
  |   | forward | backward | forward | backward
silu | [10, 128, 10, 10] | 1.17E-04 | 1.91E-04 | 1.35E-04 | 2.23E-04
  | [10, 128, 80, 80] | 0.007434 | 0.013141 | 0.008464 | 0.013044
mish | [10, 128, 10, 10] | 0.00103 | 0.00122 | 0.00106 | 0.001227
  | [10, 128, 80, 80] | 0.065629 | 0.078418 | 0.067779 | 0.077214
hardtanh | [10, 128, 10, 10] | / | 1.18E-04 | / | 9.30E-05
  | [10, 128, 80, 80] | / | 0.010773 | / | 0.005834

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82460 Approved by: https://github.com/mingfeima, https://github.com/malfet commit 37c85cf5f2215da13d5836de46f44af72ed079ba Author: Mark Saroufim Date: Thu Nov 17 07:24:55 2022 +0000 Add warning if tensor cores are not used (#88844) Fixes https://github.com/pytorch/torchdynamo/issues/1839 Should I do this for all backends or just inductor? On a V100 I got from AWS
```python
from torch._dynamo import optimize
import torch

def fn(x, y):
    a = torch.cos(x)
    b = torch.sin(y)
    return a + b

new_fn = optimize("inductor")(fn)
a = new_fn(torch.Tensor(1),torch.Tensor(1))
print(a)
```
```
(sourcetorch) ubuntu@ip-172-31-31-152:~/test$ python test.py
/home/ubuntu/pytorch/torch/_dynamo/eval_frame.py:318: UserWarning: Tensor cores are available but not enabled. Consider setting torch.backends.cuda.matmul.allow_tf32 == True in your python script for speedups
warnings.warn(
tensor([1.3717])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88844 Approved by: https://github.com/ngimel, https://github.com/mlazos, https://github.com/anijain2305 commit b72f5b9ae3f7d1de74d9d2d40236fd09d606be0e Author: Yanbo Liang Date: Thu Nov 17 06:57:42 2022 +0000 [Dynamo] Support typing.Mapping & Support function as argument (#88963) These missing features come from https://github.com/pytorch/benchmark/pull/1302, where we'd like to enable E2E hf_bert dynamo train/eval. The dependent [HuggingFace accelerate library](https://huggingface.co/docs/accelerate/index) requires these improvements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88963 Approved by: https://github.com/jansel commit 126e44173d0dd4d942d8e20c73442048a46cfc24 Author: AllenTiTaiWang Date: Thu Nov 17 03:27:18 2022 +0000 [ONNX] Add onnx-script into ONNX docs (#89078) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89078 Approved by: https://github.com/BowenBao commit 74610a1cedbab64e813f3b49535cd8691a3ec5c7 Author: Animesh Jain Date: Thu Nov 17 06:14:21 2022 +0000 [dynamo][benchmarks] HF - Fix seq len and batch sizes (#89165) Fixes many models in https://github.com/pytorch/torchdynamo/issues/1842 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89165 Approved by: https://github.com/ngimel commit a41f70603aededc414da58523361773dbf13bde2 Author: Andrew M.
James Date: Thu Nov 17 02:01:13 2022 +0000 Round out rad2deg sparse support (#88442) - Add sparse coo dispatch - Modify backward to work with sparse compressed layouts - Enable sparse_compressed autograd testing - Correct layout support attributes on OpInfo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442 Approved by: https://github.com/cpuhrsch commit 70fb673e51decdd8bf4e55244d910a8e5680d12f Author: Rachel030219 <13704467+Rachel030219@users.noreply.github.com> Date: Thu Nov 17 05:55:25 2022 +0000 Use software approach to catch overflow ( `c10/utils/safe_numerics.h` ) on ARM devices (#89042) Fixes #89040 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89042 Approved by: https://github.com/malfet commit 54fca6a9da77b56b1a82373c814e61378b5d04c2 Author: Aaron Gokaslan Date: Thu Nov 17 05:01:08 2022 +0000 Fix: prefer .is_none() over .is(py::none()) for pybind11 in caffe2 (#88199) Follow up to #88051 . I noticed that I missed a few spots in the caffe2 folder. Prefer `.is_none()` over `.is(py::none())` as `.is_none()` is more efficient since it avoid reference counting increments and decrements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88199 Approved by: https://github.com/albanD, https://github.com/kit1980 commit 4e1d19c5a577b947a3dc84d9eec4a186ad3cd52f Author: PyTorch MergeBot Date: Thu Nov 17 04:58:53 2022 +0000 Revert "Redefine the simdlen semantic: (#88482)" This reverts commit fce6d6b3dcc879720bc45143426b86232106818a. Reverted https://github.com/pytorch/pytorch/pull/88482 on behalf of https://github.com/kit1980 due to Broke multiple tests in several trunk workflows, for example https://github.com/pytorch/pytorch/actions/runs/3485086792/jobs/5830429554 commit 81a8fdc40d7c504f99d5796a5b187551493685e4 Author: Lukas Hoenig Date: Thu Nov 17 04:54:23 2022 +0000 [MPS] Add binary operations dtype precedence test case (#87545) See https://github.com/pytorch/pytorch/pull/84742 and https://github.com/pytorch/pytorch/pull/78319. The test case tests that - for the binary operations (add, sub, mul, div), - for all data types (dtypes), - for a range of representative values and their combinations, - for various shapes and ways of creating the test tensors, the contents and dtype of the result tensor is identical for the MPS and CPU backends. It adds about 15-18s runtime to `test_mps.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87545 Approved by: https://github.com/kit1980 commit 44c9185f91699b74c7953eb912f37fb24991958d Author: ecao Date: Thu Nov 17 04:47:45 2022 +0000 Fix empty input issue of convolution for channels last memory format (#86521) Fixes empty input convolution issue : when input is empty e.g. shape of (0, 3, 3, 4) and weight is channels last format, at::_unsafe_view will raise "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead." Pull Request resolved: https://github.com/pytorch/pytorch/pull/86521 Approved by: https://github.com/jgong5, https://github.com/malfet commit 637e764ec5d879a5cce0f63f747db3967b708517 Author: maxren Date: Wed Nov 16 10:46:30 2022 -0800 [xnnpack][executorch] Pass xnnexecutor pointer to compileModel() (#89090) Here we pass XNNExecutor* to compile model so that XNNExecutor can be allocated by runtime. 
This signature change is for executorch:
```
XNNExecutor compileModel(void* buffer) --> void compileModel(void* buffer, XNNExecutor* executor)
```
The intended use case for allocating the executor and compiling the serialized flatbuffer:
```
XNNExecutor* executor = runtime_allocator->allocateList(1);
XNNCompiler::compileModel(processed.buffer, executor);
```
Differential Revision: [D41208387](https://our.internmc.facebook.com/intern/diff/D41208387/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89090 Approved by: https://github.com/digantdesai commit 24b9890f0343a156a5785be859610316ecf8274e Author: Colin Taylor Date: Thu Nov 17 04:26:10 2022 +0000 [torchrec] [composable] update ShardedEmbeddingBagCollection to be use registered EBCs with shardedTensors as registered modules (#758) (#88026) Summary: X-link: https://github.com/pytorch/torchrec/pull/758 This PR fixes a bug in FSDP/DDP, where ShardedTensors are not supported even if passed in as params to ignore. This is important for composability because TorchRec named_parameters() will return FQNs of ShardedTensors (as defined in goals). It defines the device of a ShardedTensor to be None when local_tensor() does not exist on a rank, and updates ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit Differential Revision: D40458625 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88026 Approved by: https://github.com/wanchaol, https://github.com/rohan-varma commit 1cd6ebe0958ab8eff2b7ba715d9544f067dfe59e Author: Kazuaki Ishizaki Date: Thu Nov 17 04:18:10 2022 +0000 Fix typos in messages under torch (#89049) This PR fixes typos of messages in `.py` files under the torch directory. Only in `torch/onnx/symbolic_opset16.py`, fix a typo in a comment to make the operator name correct. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89049 Approved by: https://github.com/lezcano commit d1f48f05cef9e2b3b01c64a21a6e2abc3ddab323 Author: maxren Date: Wed Nov 16 10:46:28 2022 -0800 [xnnpack][Bug Fix] Pass serialized model by reference (#89089) Two changes: - Remove XNNCompiler dependence on std::string by passing void*. - Grab ser_model by reference: this bug was causing data pointers given to xnn_runtime to be freed because ser_model was on the stack. Differential Revision: [D41208380](https://our.internmc.facebook.com/intern/diff/D41208380/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89089 Approved by: https://github.com/digantdesai commit 366f1b2c2f6b273fcba5f071bf2297a963051894 Author: maxren Date: Wed Nov 16 10:46:27 2022 -0800 [xnnpack][lite-int] Freeze/Inline module to remove reference to self (#88863) We need to inline the graph before converting from torchscript to the xnnpack flatbuffer. Remove graph dependence on self. This will later help us work with constant data.
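For a rough illustration of the freeze/inline step in general TorchScript terms (a minimal sketch with a hypothetical `TinyConv` module, not the internal XNNPACK delegate path):
```python
import torch
import torch.nn as nn

class TinyConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

# Scripting keeps references to `self` and its attributes; freezing inlines
# submodules and parameters as constants, so the resulting graph no longer
# depends on the module instance.
scripted = torch.jit.script(TinyConv().eval())
frozen = torch.jit.freeze(scripted)
print(frozen.graph)
```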
Differential Revision: [D41049858](https://our.internmc.facebook.com/intern/diff/D41049858/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88863 Approved by: https://github.com/digantdesai commit 1adb7b9b845603a834f452da0e99790779740d83 Author: Jerry Zhang Date: Tue Nov 15 16:01:29 2022 -0800 [nn][utils] Preserve requires_grad from original weight and bias in fuse conv/linear bn weights (#89100) Summary: As titled; previously we just called nn.Parameter, which has requires_grad=True by default. After this PR we preserve the original requires_grad. Test Plan: python test/test_nn.py TestFusionUtils Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41343694](https://our.internmc.facebook.com/intern/diff/D41343694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89100 Approved by: https://github.com/ngimel commit a5f04e9a915104692ae67ccd79768e8147cc0d2d Author: Kazuaki Ishizaki Date: Thu Nov 17 03:36:59 2022 +0000 Fix typos in .md and .rst files (#88962) This PR fixes the typo `Github` in `.md` and `.rst` files: `Github` -> `GitHub`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88962 Approved by: https://github.com/kit1980 commit 573eaf12258df8e87434ffa19a42b04fb873c6dc Author: Huy Do Date: Thu Nov 17 03:36:56 2022 +0000 Analyze and upload disabled tests rerun to S3 (#89083) Analyze and upload disabled tests rerun to S3. Note that this only picks up `test-reports` from `rerun_disable_tests` workflows. Running the script manually (`python -m tools.stats.check_disabled_tests --workflow-run-id 3473068035 --workflow-run-attempt 1 --repo pytorch/pytorch`) shows the files successfully uploaded to s3://ossci-raw-job-status/rerun_disabled_tests/3473068035/1. Rockset collection created: https://console.rockset.com/collections/details/commons.rerun_disabled_tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/89083 Approved by: https://github.com/clee2000 commit fce6d6b3dcc879720bc45143426b86232106818a Author: Wang, Eikan Date: Wed Nov 16 23:58:11 2022 +0000 Redefine the simdlen semantic: (#88482) This PR targets automatically enabling vectorization optimization for TorchInductor. It refined the semantics of `config.cpp.simdlen`. Originally, `None` meant disabling vectorization, while a specific value meant the number of elements to be vectorized at a time. But that depends on the data type: for a 256-bit SVE/SIMD ISA on ARM and X86, the `simdlen` should be 16 for Float but 32 for BFloat16. Hence, this PR defines `simdlen` as the bit width. The detailed semantics are as follows.
- **_simdlen = None_**: Automatically determine the SIMD bit width. Detect HW information and pick the proper vectorization ISA. Specifically for X86, the priority of AVX512 is higher than AVX2.
- **_simdlen <= 1_**: Explicitly disable SIMD.
- **_simdlen > 1_**: Explicitly specify the SIMD bit width. It equals the disabled semantic if the bit width does not match the ISA width.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88482 Approved by: https://github.com/jgong5, https://github.com/jansel commit c3acb9c8859fb5cfa1959ee49849f07942c40ccc Author: AllenTiTaiWang Date: Wed Nov 16 19:50:02 2022 +0000 [ONNX] Add Internal Utils: onnx_proto_utils.py for onnx/onnx-script/onnx_proto (#88376) Added `onnx_proto_utils.py` for onnx/onnx-script related process.
The idea is like jit_utils.py, and to simplify what we have in `torch/onnx/utils.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88376 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit f3af5ba48effeb7785df2049348d83467c5fb986 Author: Charlie Yan Date: Tue Nov 15 23:33:05 2022 +0000 [WIP] Composable API: `replicate` and `DistributedState` (#87649) This PR adds the first version of the `replicate()` composable API. For this prototype version, I try to reuse as much code from existing `DistributedDataParallel` as possible, and iterate on it in later changes. The basic idea of this prototype is: - create a `ReplicateState` object. It internally uses a `ParameterList` module to hold all parameters of modules marked by `replicate()` API. - create an internal `_ddp` object, which reuses existing `DistributedDataParallel` implementation, and wraps the `ParameterList` object - install pre-forward and after-forward hooks on the root module, which calls methods of `_ddp` to run initialization and forward Pull Request resolved: https://github.com/pytorch/pytorch/pull/87649 Approved by: https://github.com/zhaojuanmao commit f73d9a79fe8d52be27c3c28cd93ce690bdc4f9b7 Author: Riley Dulin Date: Thu Nov 17 02:43:33 2022 +0000 [torch][fx] Fix PassManager to not use a class variable mutable list (#89108) Summary: I found a confusing bug in the PassManager that only happens when you instantiate one multiple times: it will use old passes and constraints! This occurs because the class-level declarations initialize it to an empty list, but the problem is that class initializers only run once, and are creating class variables. This means the same empty list was being reused every time, except after the first time it isn't empty. The empty list has to be created in `__init__` newly each time or else it'll be shared. Note that this is the same type of bug as using an empty list as a default parameter, where it'll reuse the same list pointer and not make it empty each time. The better way to do this is with either: * An immutable default parameter like an empty tuple, that you create a new list from: `self.passes = list(passes)` * Use None and then create the empty list inside `__init__` I chose the latter as it's less likely to cause a behavior change due to the changed default. Note that for immutable values like `False` and `1` this doesn't apply as you can't mutate that value for everyone. Test Plan: Added a test to ensure that the pass state is not saved. Without my change, this test would fail as it would run all of the `2 * x` passes first, then all of the `3 * x` passes. 
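For illustration, a minimal sketch of the class-level mutable list pitfall described above (hypothetical `BuggyPassManager`/`FixedPassManager` classes, not the actual torch.fx implementation):
```python
class BuggyPassManager:
    passes = []  # class attribute: one shared list for every instance

    def add_pass(self, p):
        self.passes.append(p)  # mutates the shared class-level list


class FixedPassManager:
    def __init__(self, passes=None):
        # fresh list per instance; avoids the shared-mutable-default trap
        self.passes = list(passes) if passes is not None else []


a, b = BuggyPassManager(), BuggyPassManager()
a.add_pass("2x")
print(b.passes)  # ['2x'] -- state leaked into a brand-new instance

c, d = FixedPassManager(), FixedPassManager()
c.passes.append("3x")
print(d.passes)  # [] -- each instance owns its own list
```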
Differential Revision: D41327056 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89108 Approved by: https://github.com/angelayi commit ac0a6f381de06b58aa583daf7771c410c69709fd Author: Wanchao Liang Date: Wed Nov 16 22:28:36 2022 +0000 [dtensor] disable op db tests for now (#89162) context: https://github.com/pytorch/pytorch/issues/89160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89162 Approved by: https://github.com/fduwjj commit 2f59c69ac7cf027e14012dfeba6b65506787682d Merge: 473970e8b4 f93ba52d25 Author: mingfeima Date: Thu Nov 17 10:16:48 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned] commit f93ba52d252cc158cf98e40fc5dc20a114903821 Merge: d6f3ee98ff f5e2cb5249 Author: mingfeima Date: Thu Nov 17 10:16:48 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" cc jgong5 @XiaobingSuper sanchitintel ashokei jingxu10 [ghstack-poisoned] commit 30d9fb9157b59db27cd2c0c6e6b0b6221efda571 Author: Animesh Jain Date: Thu Nov 17 02:03:45 2022 +0000 [dynamo][reland] API Support for nn.Module (#89113) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/89113 Approved by: https://github.com/ezyang commit f5e2cb52496ab51edaa25ac35908b6832e23dadb Author: William Wen Date: Thu Nov 17 02:02:26 2022 +0000 Add comprehensive minifier tests (#88022) Adds tests for https://github.com/pytorch/torchdynamo/issues/1241. To run: `pytest test/dynamo/test_minifier.py`. Actually runs minifier launcher script and repro scripts, rather than just checking for existence of the minifier launcher script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88022 Approved by: https://github.com/mlazos, https://github.com/anijain2305 commit 473970e8b46b164bf684561c9ad41549b55c53d8 Merge: 07cea67d12 d6f3ee98ff Author: mingfeima Date: Thu Nov 17 10:01:49 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit d6f3ee98ffbf9a9338ba80d8668177bf248a8f7f Author: mingfeima Date: Thu Nov 17 10:01:49 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 088f2fa567fcf74aa746886e3e90fd3e6c58fa61 Author: Kazuaki Ishizaki Date: Thu Nov 17 01:55:03 2022 +0000 Fix typos in messages under test (#89121) This PR fixes typos of messages in `.cpp` and `.py` files under test directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89121 Approved by: https://github.com/mruberry, https://github.com/kit1980 commit 716f70f19a4b63268da2a753afdbe9b385a831ab Author: Horace He Date: Wed Nov 16 19:58:30 2022 +0000 Added conv constraint that infers layouts (#89031) The core problem that we often have with contiguous/channels-last layouts and convolutions is that Inductor often doesn't do a great job of "preserving" the eager-mode layouts. So, for example, we'll often have something like ``` a: channels-last b = foo(a) c = convolution(a) ``` In eager-mode, `a` would stay channels-last, and we would avoid two transpose copies (one into NHWC and one back into NCHW) within the convolution kernel. However, Inductor currently sometimes loses the "correct" layout of `b` (not in this simple example, but others). Then, not only will we do a transpose within `foo`, but we'll then immediately transpose it back to do the convolution (and then again once the convolution is done). 
This is particularly egregious in `convnext_base`, where there's a lot of mixing of non-channels-last tensors and channels-last tensors. The solution in this PR is to constrain the inputs to `aten.convolution`/`aten.convolution_backward` to match the layouts from eager-mode. This ensures that we'll never do extra transposes *within* `aten.convolution`, which are particularly bad (since Inductor can't fuse them). Pull Request resolved: https://github.com/pytorch/pytorch/pull/89031 Approved by: https://github.com/ngimel, https://github.com/jansel commit 251fdda77b8f60667e016c89f65f798ea5f3eaea Author: Huy Do Date: Thu Nov 17 01:45:48 2022 +0000 Add pytest-flakefinder as a test dependency (#89103) This is used to re-run tests multiple times to determine their flakiness status. The way re-run is handled in https://github.com/pytorch/pytorch/pull/88646 only applies to unittest. Per their documentation, `pytest-repeat` doesn't seem to work with `unittest.TestCase`, so trying https://github.com/dropbox/pytest-flakefinder instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89103 Approved by: https://github.com/clee2000 commit 0d87a4fec89fc78e568224935897ec585a6368a6 Author: keineahnung2345 Date: Thu Nov 17 01:09:55 2022 +0000 Fix typo in Dispatcher.h (#89045) Fix typo in Dispatcher.h: hamespace -> namespace Pull Request resolved: https://github.com/pytorch/pytorch/pull/89045 Approved by: https://github.com/bdhirsh, https://github.com/kit1980 commit 80b6761863407a8cf1ca780fcf97d135743f7812 Author: John Detloff Date: Thu Nov 17 01:06:12 2022 +0000 Update README.md (#85534) Our Jenkins builds are gone, so this badge is broken and should be removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85534 Approved by: https://github.com/ngimel, https://github.com/kit1980 commit 3af5cf4de16e4e9256be6439a3539e3e52e3a879 Author: R Max Espinoza Date: Thu Nov 17 01:03:31 2022 +0000 doc(typo): memroy -> memory (#89126) Minor typo in comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89126 Approved by: https://github.com/kit1980 commit cfd552547f106f4a7841976dad8b795b82d161c8 Author: Charlie West-Taylor Date: Thu Nov 17 00:59:12 2022 +0000 Use the Python frame safely in _pythonCallstack (#88993) Currently, the result of `PyEval_GetFrame()` is piped straight to `Py_INCREF`. However, `PyEval_GetFrame` [may return null](https://docs.python.org/3/c-api/reflection.html#c.PyEval_GetFrame), which seems to be the case sometimes when calling `_pythonCallstack` from another thread. This is handled in the subsequent `while (nullptr != frame)` block, but `Py_INCREF`, called before it, [doesn't handle this case](https://docs.python.org/3/c-api/refcounting.html#c.Py_INCREF), so the program segfaults. The safe form of `Py_INCREF` is `Py_XINCREF`, so use that instead ([docs](https://docs.python.org/3/c-api/refcounting.html#c.Py_XINCREF)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88993 Approved by: https://github.com/albanD commit 8506b305df531f7567a430854cbe7fcfa539416a Author: Nikolay Korovaiko Date: Thu Nov 17 00:38:44 2022 +0000 handle scatter(Scalar) overload in inductor (#88894) Relanding https://github.com/pytorch/pytorch/pull/88210 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88894 Approved by: https://github.com/desertfire commit 0c835e25bbde7869101023ebfaab9b7ec01ece25 Author: atalman Date: Thu Nov 17 00:30:12 2022 +0000 Fix nightly build binary errors (#89153) This issue is pretty much self-explanatory: two typos in the binary workflow generation script caused workflows to be generated with invalid parameters: 1. .generated-linux-binary-libtorch-pre-cxx11-master.yml 2. .generated-macos-arm64-binary-wheel-nightly.yml Pull Request resolved: https://github.com/pytorch/pytorch/pull/89153 Approved by: https://github.com/malfet commit 98379a3949ed4b4f4a76bd9fed2806f82b6c0aa0 Author: AllenTiTaiWang Date: Wed Nov 16 19:50:02 2022 +0000 [ONNX] Add onnx-script test cases (#86907) The test cases for #86906 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86907 Approved by: https://github.com/BowenBao commit f920bfaf2a6bfb4bc7966f8417309d94164ff86f Author: Will Constable Date: Wed Nov 16 18:40:41 2022 +0000 Use torchrun for dynamo/distributed.py (#89149) Mainly wanted to confirm torchrun works fine with dynamo/ddp, but it is also a better system than manually launching processes. Partially addresses issue #1779. New run commands:
- single process: `python benchmarks/dynamo/distributed.py [args]`
- multi-GPU (e.g. 2 GPUs on one host): `torchrun --nproc_per_node 2 benchmarks/dynamo/distributed.py [args]`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89149 Approved by: https://github.com/aazzolini commit 8ba62bdff5441b65938ad27e944aa91e4f7eb61a Author: Fuzzkatt Date: Wed Nov 16 22:50:11 2022 +0000 add test_c10d_spawn_ucc.py (#86508) Initial PR to create the UCC equivalent of https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_spawn_gloo.py and https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_spawn_nccl.py. Currently only added common ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86508 Approved by: https://github.com/kwen2501 commit ec61951f0771e70de12e6e46bd131ace98486238 Author: Mikayla Gawarecki Date: Wed Nov 16 19:17:08 2022 +0000 Fix inaccuracy in nt constructor documentation + broken rendering (#89152) Rendering was broken and the docstring seemed to be inaccurate. ![Screen Shot 2022-11-16 at 2 16 28 PM](https://user-images.githubusercontent.com/35276741/202273588-a2da5b7b-1a6d-46bb-a74e-c0de9a0fd064.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89152 Approved by: https://github.com/cpuhrsch commit 5848704ef8feba9fff3ec4f8ce7d1d3189ec5af8 Author: Mikayla Gawarecki Date: Wed Nov 16 19:00:49 2022 +0000 Removed unecessary check in `select_nested` (#89150) Implementation in #88585 should work for all dimensions.
Removed unnecessary check that constrained select to dims 0 and 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89150 Approved by: https://github.com/cpuhrsch commit ee1d375bf98f6e4c69b2d6f3aa1c702cb652d2f2 Author: Andrew Gu Date: Wed Nov 16 18:36:24 2022 +0000 [FSDP] Add fast path for `NO_SHARD` `clip_grad_norm_()` (#89137) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89137 Approved by: https://github.com/rohan-varma commit e70f446a16f25b7f344d256c8fa0b78769920d00 Author: Yanbo Liang Date: Wed Nov 16 21:59:31 2022 +0000 [Dynamo] Fix bug in NamedTupleVariable (#89110) Fixes https://github.com/pytorch/torchdynamo/issues/1866 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89110 Approved by: https://github.com/jansel commit 640af8d70a3adc7727661c15260d42fe931e9de4 Author: William Wen Date: Wed Nov 16 21:54:24 2022 +0000 More dynamo dashboard improvements (#89155) A number of dashboard improvements: - Add accuracy failures to warnings section - Add regression detection to all metrics (speedup, compile time, peak memory), not just accuracy - Add testing flag to update-dashboard to prevent image/comment uploads - Add section for comparing summary statistics (passrate, speedup) between 2 most recent reports - Show names of reports for summary stats diff and regression detection sections - Remove metric graphs from the comment (they can still be found in the generated text file) Sample comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1317565972 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89155 Approved by: https://github.com/anijain2305 commit 305b9b1f0e5802437a7ed8169e0ff3fb5c06d4ec Author: Nikolay Korovaiko Date: Wed Nov 16 21:54:20 2022 +0000 Fix XLASymNode.str() no str() attribute error (#89093) This fixes https://github.com/pytorch/xla/issues/4199 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89093 Approved by: https://github.com/ezyang commit 4908a12542798a3e8641faae6b74f068fdfc6778 Author: Edward Z. Yang Date: Wed Nov 16 11:59:40 2022 -0500 Reland "SymIntify convolution backend calculation (#89069)"" (#89142) This reverts commit 90db86be108184a6c86c73e1b01012352c72e66b. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89142 Approved by: https://github.com/albanD, https://github.com/malfet commit 45c62a337756ff9db97cd64d2d42d9e65dda0a85 Author: HDCharles Date: Wed Nov 16 10:07:14 2022 -0800 [ao] making _is_activation_post_process private (#87520) Summary: same function in observer and quantize, consolidated to a single function. Note the definitions were slightly different, I've changed the definition to be maximally inclusive so that the name of the function is more accurate Test Plan: python test/test_public_bindings.py python test/test_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709276](https://our.internmc.facebook.com/intern/diff/D40709276) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87520 Approved by: https://github.com/jcaip commit aee96bbf5a34b7d9b12b8d03fa1904e595c6a329 Author: Iris Date: Wed Nov 16 21:06:35 2022 +0000 [PT-D][Checkpointing] Move distributed checkpointing from torch.distributed._shard.checkpoint to torch.distributed.checkpoint (#88698) Context in RFC: https://github.com/pytorch/pytorch/issues/86620 .rst file will be finalized in subsequent PRs. 
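A hedged sketch of what the namespace move looks like from user code, assuming the existing public helpers (e.g. `save_state_dict`, `FileSystemWriter`) keep their names after the move:
```python
# Old, private location (pre-move):
# from torch.distributed._shard.checkpoint import save_state_dict, FileSystemWriter

# New location after this PR (helper names assumed unchanged by the move):
from torch.distributed.checkpoint import save_state_dict, FileSystemWriter

# Typical call shape, commented out because it needs an initialized process group:
# save_state_dict(state_dict=model.state_dict(),
#                 storage_writer=FileSystemWriter("/tmp/ckpt"))
```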
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88698 Approved by: https://github.com/wanchaol commit 6b521bbf3589d763f9ad348ee24e54be12c44356 Author: soulitzer Date: Wed Nov 16 11:22:58 2022 -0500 Prevent module full_backward_hook from erroring in double backward (#88357) Also clarifies documentation to say "execute if and only if gradients wrt outputs are computed" (previously, "execute every time gradients wrt inputs are computed") See https://docs.google.com/document/d/1tFZKYdsSzRBJ7Di7SWt8X8fSg-E3eiUPwomMF10UyhM/edit for more details regarding the question: 'should module full_backward_hooks be called every time the gradients wrt module inputs are called, or should module full_backward_hooks only be called when the "backward for the module" have been computed?' Fixes https://github.com/pytorch/pytorch/issues/88312 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88357 Approved by: https://github.com/albanD commit 0581331963cb3dc18fa59a800661c800ebff92c2 Author: BowenBao Date: Mon Nov 14 13:31:23 2022 -0800 [ONNX] Document ONNX diagnostics (#88371) Reference pages: - Landing page: https://docs-preview.pytorch.org/88371/onnx_diagnostics.html - Individual rule: https://docs-preview.pytorch.org/88371/generated/onnx_diagnostics_rules/POE0004%3Aoperator-supported-in-newer-opset-version.html An initial PR to setup the document generation for ONNX diagnostics. * Add document page for ONNX diagnostics. * Add document generation for diagnostics rules from `rules.yaml`. * Add dependency on `myst-parser` for markdown to rst parsing. More content to be added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88371 Approved by: https://github.com/abock, https://github.com/justinchuby, https://github.com/malfet, https://github.com/kit1980 commit 848e7240a11c9fd82298bc5b5ae14534e1307627 Author: Yanbo Liang Date: Wed Nov 16 19:08:49 2022 +0000 [Dynamo] Add a dummy profiler to avoid activating real profiler (#88930) See context at https://github.com/pytorch/torchdynamo/issues/1721#issuecomment-1312396059 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88930 Approved by: https://github.com/jansel commit 61801799a0a6a2fe0b577450c1fdd55af6063664 Author: andrewor14 Date: Tue Nov 15 13:27:57 2022 -0800 [Quant][bc-breaking] Remove overwrite_output_observer (#88620) Summary: When the BackendConfig was first introduced, `overwrite_output_observer` and `overwrite_output_fake_quantize` were added to ensure fixed qparams ops like `torch.nn.Sigmoid` and `torch.nn.Tanh` used the correct observers and fake quantizes. However, this is hacky because the BackendConfig should not set the observer constructors themselves, but should instead specify only requirements on the observers. Later, https://github.com/pytorch/pytorch/pull/80184 added the correct observers to `get_default_qconfig_mapping` along with validation logic that throws an error if incorrect observers were specified. With this change, we no longer need to overwrite the observers from the BackendConfig, since we expect the user to pass in the correct observers for these ops. This commit removes these overwrite observer settings in the BackendConfig. Instead, we represent the observer constraints for fixed qparams ops through the existing DTypeWithConstraints mechanism. Note that, however, to be consistent with other DTypeWithConstraints checks, we no longer throw an error if an incorrect observer is specified, but simply ignore the offending QConfig and log a warning instead. 
This is the BC-breaking part of the change. BC-breaking notes:
```
from torch.ao.quantization.qconfig import default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

model = ModelWithFixedQParamsOps()
qconfig_mapping = QConfigMapping().set_global(default_qconfig)
example_inputs = ...
prepare_fx(model, qconfig_mapping, example_inputs)
```
Before this commit, running the above leads to an exception because the wrong observers are used for fixed qparams ops. After this commit, the above will only encounter a warning, and the fixed qparams ops will not be quantized. In both cases, switching to `get_default_qconfig_mapping` will cause the fixed qparams ops to be quantized. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88620 Approved by: https://github.com/jerryzh168 commit a6ef2c7634e2a77fe698d5335d29e10ca24cdf2b Author: Huy Do Date: Wed Nov 16 18:25:38 2022 +0000 Support test-config filter logic for rocm (#89046) The logic used by `mem_leak_check` https://github.com/pytorch/pytorch/pull/88373 is currently not applied to rocm, i.e. https://hud.pytorch.org/pytorch/pytorch/commit/06486cd0087200e08ebb8a9518e064251c7c5309, because its workflows don't have the test-config filtering logic yet (linux, mac, and windows all have it already). In other words, rocm tests always run with mem leak check disabled at the moment. We want that, but we also want to run the tests with mem leak check enabled periodically, once per day. This PR closes that gap. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89046 Approved by: https://github.com/clee2000 commit 7b0adc290a744de42e875822a1be4fa2b8d96147 Author: Peter Bell Date: Wed Nov 16 12:40:27 2022 +0000 Run tests from test/inductor in inductor CI job (#88957) CUDA inductor tests are currently not run in CI because the only jobs that have triton installed don't actually run these tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88957 Approved by: https://github.com/ngimel, https://github.com/seemethere commit 58ebf92cf06bd68ca7aba0e29526e9004d53f08d Author: lezcano Date: Wed Nov 16 14:09:59 2022 +0000 Add bfloat16 support to torch.prod to align with torch.cumprod (#87205) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/87205 Approved by: https://github.com/mruberry commit 33209153035ef60f84014983186f9eefde7dab72 Author: lezcano Date: Wed Nov 16 14:09:59 2022 +0000 Fix decomp for embedding_backward and simplify the decomposition of embedding_dense and embedding_dense_backward (#87204) See the title Pull Request resolved: https://github.com/pytorch/pytorch/pull/87204 Approved by: https://github.com/Chillee commit e1ecf53d8480899b5b41c295e52eafb7347f0141 Author: lezcano Date: Wed Nov 16 14:09:58 2022 +0000 Simplify linspace decomp and increase its tolerance (#87203) This is an interesting one. Since this is an operation that's intrinsically defined on the reals, we should always perform the ops in that dtype and just cast to the desired dtype at the end. This simplifies the decomposition. Now, I started looking at this one when I started seeing failures on a test that's added in a later PR. What's going on here is that, by doing an upcast to a higher dtype and then casting down to integers, sometimes there's an off-by-one error.
I think this is fine, as the decomposition is more accurate than the original function, which goes in line with the whole PrimTorch effort. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87203 Approved by: https://github.com/mruberry commit d2d22d89d92bf7d6bb02417dab04027d7fcc80d3 Author: bmedishe Date: Wed Nov 16 17:42:26 2022 +0000 test_unary_ufuncs few tests enabled on rocm which are passing (#89007) This PR enables tests which are currently skipped on rocm, but passing, from the test package test_unary_ufuncs.py::TestUnaryUfuncsCUDA:
test_file | test_name | test_class
-- | -- | --
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_float16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_float32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_int16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_int32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_2_cuda_int64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_float16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_float32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_int16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_int32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_polygamma_polygamma_n_4_cuda_int64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_large_tan_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_bfloat16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_float16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_float32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_int16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_int32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_int64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_int8 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_atan_cuda_uint8 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_float16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_float32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_int16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_int32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_int64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_int8 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_2_cuda_uint8 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_float16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_float32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_float64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_int16 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_int32 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_int64 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_int8 | (__main__.TestUnaryUfuncsCUDA)
test_unary_ufuncs | test_reference_numerics_small_polygamma_polygamma_n_4_cuda_uint8 | (__main__.TestUnaryUfuncsCUDA)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89007 Approved by: https://github.com/mruberry commit 7f55db4fb0fb12ed593c7f23de01bfb9330b7dd5 Author: Jacob Szwejbka Date: Wed Nov 16 16:59:36 2022 +0000 add quantize_decomposed_dynamic to op lib (#88855) Summary: Needed for dynamic quant reference pattern graphs. Test Plan: added unittest Differential Revision: D41205030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88855 Approved by: https://github.com/jerryzh168 commit cf6003f0469ae1440d4a8585860c2c5f4c738707 Author: PyTorch MergeBot Date: Wed Nov 16 16:52:47 2022 +0000 Revert "Towards unifying symbolic and non symbolic fake tensor (#89038)" This reverts commit 37d54239c7ea88fd9c98dcac3fcc9b98a6f9e9d1. Reverted https://github.com/pytorch/pytorch/pull/89038 on behalf of https://github.com/ezyang due to executorch segfaults commit fe276ea0f9b4cce9c7d32157f831897fbbd1c85a Author: Kirtesh Patil Date: Wed Nov 16 16:40:24 2022 +0000 [UCC] Add pre & post processing for CPU collectives (#89030) Summary: The CPU block in `collective_post` was missing pre & post processing. The reduce-scatter implementation expects use of the pre-processing callback to flatten the input tensors; the missing invocation meant garbage values were being passed. Test Plan: Tested the reduce-scatter collective using PARAM Reviewed By: eastzone Differential Revision: D41291592 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89030 Approved by: https://github.com/kingchc, https://github.com/kwen2501 commit 90db86be108184a6c86c73e1b01012352c72e66b Author: PyTorch MergeBot Date: Wed Nov 16 16:36:27 2022 +0000 Revert "SymIntify convolution backend calculation (#89069)" This reverts commit 09ed8b67e24cfe29f3fa7b5dd28eaa7749229f12. Reverted https://github.com/pytorch/pytorch/pull/89069 on behalf of https://github.com/DanilBaibak due to breaking internal builds commit cf4b4b1b060fd48d4103acb4d0422e88c7e3b69e Author: Angel Avila Date: Wed Nov 16 16:30:56 2022 +0000 Fix python types in pybind function signatures (#89115) Fixes #88958 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89115 Approved by: https://github.com/ezyang commit abe41aee776e7ab39c34f28a88f03a03dc6f1479 Author: AllenTiTaiWang Date: Wed Nov 16 06:30:03 2022 +0000 [ONNX] Support custom Op with onnx-script local function (#86906) Extend `register_custom_op` to support onnx-script local functions. The FunctionProto from onnx-script is represented as a custom op and inserted into the ModelProto for op execution. NOTE: I did experiments on the >2GB case with a simple model with large initializers:
```python
import torch

class Net(torch.nn.Module):
    def __init__(self, B, C):
        super().__init__()
        self.layer_norm = torch.nn.LayerNorm((B, C), eps=1e-3)

    def forward(self, x):
        return self.layer_norm(x)

N, B, C = 3, 25000, 25000
model = Net(B, C)
x = torch.randn(N, B, C)
torch.onnx.export(model, x, "large_model.onnx", opset_version=12)
```
It turns out we won't get model_bytes > 2GB after the `_export_onnx` pybind C++ function, as we split initializers into external files in that function and serialize before returning the model bytes; protobuf is not allowed to be larger than 2GB under any circumstances. The test cases can be found in the next PR #86907.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86906 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 9fe36a02146c57ed8165bb8914708437043899ab Author: mindest Date: Wed Nov 16 15:08:41 2022 +0000 [ONNX] Extra support for bernoulli export (#88655) * add opset 15 support for `bernoulli`. * add extra export options for different `bernoulli` cases: `x.bernoulli(p)` where `p` is a tensor or float. Fixes #88299 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88655 Approved by: https://github.com/BowenBao commit 37d54239c7ea88fd9c98dcac3fcc9b98a6f9e9d1 Author: Edward Z. Yang Date: Wed Nov 16 05:58:02 2022 -0800 Towards unifying symbolic and non symbolic fake tensor (#89038) Fake tensor behaves pretty differently depending on if you have symbolic shapes or not. This leads to bugs; for example, we weren't getting correct convolution_backward strides because we bypassed the correct stride logic in fake tensor on symbolic shapes. This PR attempts to unify the two codepaths. I don't manage to unify everything, but I get most of it. The algorithm is delicate and I'm still hosing down test failures. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89038 Approved by: https://github.com/anjali411 commit 09ed8b67e24cfe29f3fa7b5dd28eaa7749229f12 Author: Edward Z. Yang Date: Tue Nov 15 10:10:28 2022 -0800 SymIntify convolution backend calculation (#89069) We will need this to implement a convolution meta function that is SymInt aware. I use templates so that regular convolution code is not affected by the change. No tests for symbolic ints directly; that will come in a subsequent PR which also needs to refactor fake tensors. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89069 Approved by: https://github.com/SherlockNoMad commit 5e0c01330c76c003e55aec29bfb3e83926ee933a Author: Edward Z. Yang Date: Tue Nov 15 10:10:27 2022 -0800 SymIntArrayRef type caster (#89074) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89074 Approved by: https://github.com/SherlockNoMad commit 57af0c82454c199ab7a734c3d12df93c93f50812 Author: Nikita Karetnikov Date: Wed Nov 16 11:25:35 2022 +0100 Bug fix: make sure `copy_impl` doesn't read out of bounds (#88544) Fixes #88543. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88544 Approved by: https://github.com/lezcano commit dc40d3f93f849e467b2b56595a01f28e84ac7fa2 Author: anjali411 Date: Tue Nov 15 19:24:31 2022 +0000 Add meta impl for grid_sampler_2d_backward (#88745) TODO: add an OpInfo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88745 Approved by: https://github.com/ezyang commit 52701227737489392e59fe57ded40226bf0811f6 Author: Jiawen Liu Date: Wed Nov 16 10:37:26 2022 +0000 [Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#89118) Summary: Build fx-based linear/matmul/bmm + permute/transpose vertical fusion in Inductor For an internal Ads model: **1.15x -> 1.36x speedup** Test Plan: CI Reviewed By: bertmaher, jansel, jianyuh Differential Revision: D41071665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89118 Approved by: https://github.com/jianyuh commit 9d28775c1d28ab7c1dd93479a58bdafb9b626341 Author: PyTorch MergeBot Date: Wed Nov 16 09:45:49 2022 +0000 Revert "Rewrite assert statement with torch._assert under config (#88246)" This reverts commit 62ba15e10e875ce088dff26e872605ee70c8c04a. 
Reverted https://github.com/pytorch/pytorch/pull/88246 on behalf of https://github.com/DanilBaibak due to breaking internal builds commit 9d2f5a278414aeaa6f3277c5b15aee4938601fa6 Author: Animesh Jain Date: Wed Nov 16 08:51:30 2022 +0000 [dynamo] Support if cond on NNModuleVariable (#89095) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89095 Approved by: https://github.com/yanboliang, https://github.com/mlazos commit f20b3f2e5734b23a9e0a898196ddf77aa90323b8 Author: Wanchao Liang Date: Tue Nov 15 22:51:33 2022 +0000 [dtensor] PART 8: move tensor parallel api and tests to core distributed (#88180) This PR moves tensor/parallel folder and tests to torch.distributed. part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88180 Approved by: https://github.com/aazzolini commit 0230e52b541358cec075b9b9f3e6286d3964848f Author: Wanchao Liang Date: Tue Nov 15 22:51:33 2022 +0000 [dtensor] PART 7: move remaining DTensor tests to core distributed (#88179) This PR moves remaining tests, i.e. tensor_ops, op db tests to core distributed part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88179 Approved by: https://github.com/aazzolini commit 550a019fb85647f0bc7fe8ee231dc158b4f30d7c Author: Wanchao Liang Date: Tue Nov 15 22:51:32 2022 +0000 [dtensor] PART 6: move DTensor op tests to core distributed (#88551) This PR moves DTensor op tests to core distributed, including prop_rule, pointwise op, matrix op tests, etc. part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88551 Approved by: https://github.com/aazzolini commit 527c5bdb4574f12f5071b0466ce981ce1c129d75 Author: Wanchao Liang Date: Tue Nov 15 22:51:31 2022 +0000 [dtensor] PART 5: move DTensor basic tests to core distributed (#88178) This PR moves DTensor basic tests to torch.distributed, including dtensor, device_mesh tests part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88178 Approved by: https://github.com/fduwjj commit 1b88476320a99680a6e01f8f4afed5c5196cf39d Author: Wanchao Liang Date: Tue Nov 15 08:04:38 2022 +0000 [dtensor] PART 4: move remaining DTensor ops to core distributed (#88550) This PR moves the view related DTensor ops to core distributed, tests will be add in follow up PRs part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88550 Approved by: https://github.com/fduwjj commit 2dcf0978a249ae136c39e396200e5ed51407471d Author: Wanchao Liang Date: Tue Nov 15 08:04:38 2022 +0000 [dtensor] PART 3: move most DTensor ops to core distributed (#88177) This PR moves most DTensor ops to torch.distributed._tensor. We will add all tests in the following PRs. part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88177 Approved by: https://github.com/fduwjj commit 4b945967de2ae9a3c6df579a1541b822de46110c Author: Wanchao Liang Date: Tue Nov 15 08:04:38 2022 +0000 [dtensor] PART 2: move DTensor abstraction and APIs to core distributed (#88176) This PR moves the core DTensor abstraction and high level APIs to torch.distributed._tensor folder, which includes the following: 1. DTensor class 2. high level APIs (distribute_tensor/module) 3. dispatching logic 4. 
redistribute logic. Part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88176 Approved by: https://github.com/fduwjj commit 370fc5cb421f54fc9513237390e09cca0e06e01b Author: Wanchao Liang Date: Tue Nov 15 08:04:37 2022 +0000 [dtensor] PART 1: move DeviceMesh and placement to core distributed (#88549) This PR creates the `torch.distributed._tensor` package and moves DeviceMesh and PlacementTypes to it. Part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88549 Approved by: https://github.com/fduwjj commit 59ba15f37407294eed3ecdb9986b02c5c2d52a70 Author: Huy Do Date: Wed Nov 16 07:44:41 2022 +0000 Upload CSV test reports from inductor (#89112) Inductor test report artifacts are now on HUD, but their files are in CSV format instead of the default XML files from pytest or unittest that we expect, so this PR uploads both suffixes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89112 Approved by: https://github.com/desertfire commit 7e66d1d6cdb4e8d854a8da160daeb910783f069d Author: Jiawen Liu Date: Wed Nov 16 06:27:13 2022 +0000 [Inductor] Support Shape Padding for aten.mm in Inductor (#89086) Summary: Support shape padding for aten.mm in Inductor (originally from [#88709](https://github.com/pytorch/pytorch/pull/88709)) Differential Revision: D41315078 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89086 Approved by: https://github.com/jianyuh commit e2f0648750f2d0d0ac648728ce4c514db178cfa1 Author: Kenichi Maehashi Date: Wed Nov 16 05:07:51 2022 +0000 Add an option to include actual license terms to the output (#85624) When building products using PyTorch, it is often required to display license terms for all dependencies. The feature itself has been implemented in #81500, but it seems there are no options to enable it. This PR implements the option. cc/ @mattip @rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/85624 Approved by: https://github.com/rgommers, https://github.com/seemethere commit 8ebbd5a89a66bf84d7358f4d353ec2708d6c5429 Author: Johannes Pitz Date: Wed Nov 16 04:38:30 2022 +0000 Easier to understand event_dim computation (#81396) Fixes #81254 Only easier to understand, not a real fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81396 Approved by: https://github.com/fritzo, https://github.com/kit1980 commit ce2f8700bafcf44850402a39188ec121ba8b5486 Author: Sherlock Huang Date: Tue Nov 15 21:02:44 2022 +0000 Symintify numel(), infer_size, prims.elementwise_meta (#88956) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88956 Approved by: https://github.com/ezyang commit b291c1213ae18e89a5c616913f14b4bb8eda12a8 Author: Driss Guessous Date: Wed Nov 16 03:07:54 2022 +0000 Create native function for determining which implementation of SDP to call (#89029) Creates a callable native function that can determine which implementation of scaled dot product attention will get called. This allows us to re-order the runtime dispatch of SDP to enable autograd. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89029 Approved by: https://github.com/cpuhrsch commit 397f10067200d9b77acb92952b4ea3741738c28b Author: Andrew Gu Date: Tue Nov 15 19:19:47 2022 +0000 [FSDP] Test `named_parameters()` in forward (`use_orig_params=True`) (#89066) This adds a unit test following the FSDP change in https://github.com/pytorch/pytorch/pull/88781.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89066 Approved by: https://github.com/fegin commit 46ba0150cbfb8d86c378f0f3ce2d816e530a933b Author: Huy Do Date: Wed Nov 16 02:39:22 2022 +0000 Increase slow grad check timeout (#89079) Now that periodic jobs are run under `mem_leak_check` mode with parallelization turned off, it's very easy for `linux-bionic-cuda11.6-py3-gcc7-slow-gradcheck / test` to time out because one of the shards is very close to the 4h mark. * https://hud.pytorch.org/pytorch/pytorch/commit/2452e3f99a072760fc46d3f9025aaa37ca7ea2ab * https://hud.pytorch.org/pytorch/pytorch/commit/35e668b5ced25e735b6e523d557ed7fd60267914 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89079 Approved by: https://github.com/clee2000 commit 9f0b2c73f36b0f5276f84cdaaef4d54a60df61f5 Author: PyTorch MergeBot Date: Wed Nov 16 01:13:00 2022 +0000 Revert "[Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#88859)" This reverts commit d60abe4b9521e235c0e9beb00cda0d6c5673f4e0. Reverted https://github.com/pytorch/pytorch/pull/88859 on behalf of https://github.com/kit1980 due to Broke Mac OS testing, which were clearly shown in CI commit d96dd8ff09a9e35f8cce6745c3e015eb0082eb1b Author: Edward Z. Yang Date: Tue Nov 15 08:05:31 2022 -0800 Add int64_t, SymInt overloads for all binary operators in C++ (#89063) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89063 Approved by: https://github.com/SherlockNoMad commit 431642111f74a22ebb5edc98e32b1449b4b3e46b Author: Edward Z. Yang Date: Tue Nov 15 06:41:53 2022 -0800 Move ConvParams methods directly on struct (#89062) This reduces boilerplate. Also, I plan to add a template parameter to ConvParams; without moving the methods onto the struct, I would have to manually template every method. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89062 Approved by: https://github.com/SherlockNoMad commit 49f0be0762e8cac48ccf3b19d1c662be6b271581 Author: Edward Z. Yang Date: Tue Nov 15 06:32:36 2022 -0800 Hide ConvParams struct from ConvUtils.h (#89059) It isn't actually used outside of Convolution.cpp, so no reason to publish it. I intend to turn this into a template, so moving it with the method definitions is very convenient. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/89059 Approved by: https://github.com/SherlockNoMad commit 19cacecf34cf46f1c7ca3920979dcd6fd7709a61 Author: Salil Desai Date: Wed Nov 16 00:56:12 2022 +0000 Fix and Re-enable test_quantize_fx_lite_script_module.py (#88897) Summary: After D35984526 (https://github.com/pytorch/pytorch/commit/416899d1a9fcb9dbc8bb66ed796b86360f573903), `torch.ao.quantization.quantize_fx.prepare_fx` requires passing in `example_args`. This diff fixes the calls to `prepare_fx` in this test by adding in `example_args` as necessary.
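A minimal sketch of that calling convention (hypothetical `SmallConv` module; it mirrors the `prepare_fx(model, qconfig_mapping, example_inputs)` form shown in the BC-breaking notes above):
```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx

class SmallConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

model = SmallConv().eval()
example_inputs = (torch.randn(1, 3, 8, 8),)
# prepare_fx now requires example inputs so FX tracing and shape propagation can run.
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
print(prepared)
```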
Test Plan: ``` buck test caffe2/test:fx_quantization_lite ``` ``` ✓ ListingSuccess: caffe2/test:fx_quantization_lite : 3 tests discovered (39.689) ✓ Pass: caffe2/test:fx_quantization_lite - test_conv2d (mobile.test_quantize_fx_lite_script_module.TestLiteFuseFx) (44.451) ✓ Pass: caffe2/test:fx_quantization_lite - test_embedding (mobile.test_quantize_fx_lite_script_module.TestLiteFuseFx) (45.462) ✓ Pass: caffe2/test:fx_quantization_lite - test_submodule (mobile.test_quantize_fx_lite_script_module.TestLiteFuseFx) (45.933) Summary Pass: 3 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3096224827259146 ``` Differential Revision: D41227335 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88897 Approved by: https://github.com/dagitses commit 3bc327993f7182f3305b0aae854a26c83458c5a6 Author: Richard Zou Date: Tue Nov 15 08:12:03 2022 -0800 PyDispatcher integration with functorch (#88785) This PR teaches PyDispatcher and PyOperator about functorch transforms. It is important that PyDispatcher/PyOperator dispatch with functorch transforms, because this is our plan for higher-order operators (operators that accept functions as arguments). Examples of these include: - functorch transforms over the existing cond operator (control flow) - autograd.Function support for functorch (which I am working towards), - AOTDispatcher (should be a higher order operator) Concretely, the problem with teaching PyDispatcher/PyOperator about functorch is that the stack-based dispatching logic (DynamicLayerStack) is hidden inside the fallbacks for two dispatch keys (DynamicLayer{Front, Back}). PyDispatcher doesn't know about C++ boxed fallbacks, our plan on record for that is that we need to reimplement all of them in Python (but can call helper functions in C++ to make our lives easier). Instead of exposing all of what DynamicLayer{Front, Back} do to python, this PR takes the approach of re-implementing part of the stack-based dispatching in Python. The motivation is that this is more sane and follows what the "ideal" implementation of functorch would have been: - each transform should be a "mode" - there should be no TLS dispatch key set hackery. functorch needs to do this hackery today to re-use VariableType implementations. This PR: - exposes the DynamicLayerStack to Python - The DynamicLayerStack is a stack of Interpreters. These get exposed to Python as well. - Interpreters can run operations (Interpreter.process) or lower them to the next interpreter in the stack (Interpreter.lower) - To use a PyOperator with functorch transforms, a developer needs to register a rule for each transform (vmap, grad, jvp, ...). - The PyOperator API is NOT user-facing. Things like autograd.Function support for functorch will end up going through the autograd.Function API. Question for reviewers: - Does this design make sense? - I'm trying to split up the "functorch support for autograd.Function" work into logical pieces. Would it be better if I didn't? (the full thing is a bit long - 1000-2000 LOC). Test Plan: - new tests that construct PyOperator and compose them with functorch transforms Pull Request resolved: https://github.com/pytorch/pytorch/pull/88785 Approved by: https://github.com/samdow, https://github.com/soulitzer commit 2268a3215cdadbbbd561100a6368704ba9ef5f0d Author: Richard Zou Date: Mon Nov 14 11:00:15 2022 -0800 [functorch] add switch to enable autograd.Function (#88784) This is mostly a debug or "if you know what you're doing" switch for now. 
It is not public API. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/88784 Approved by: https://github.com/samdow, https://github.com/soulitzer commit 0ce22574b1aee4688e6ef56f66d6dfb31ae33b04 Author: PyTorch MergeBot Date: Wed Nov 16 00:45:41 2022 +0000 Revert "Enable correct supported activities for kineto on rocm (#88207)" This reverts commit 35093fc1ab9749e6b763acead007e56b54c6375b. Reverted https://github.com/pytorch/pytorch/pull/88207 on behalf of https://github.com/kit1980 due to Broke test_kineto on trunk / win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu) commit a13433940c4e8d7cc54d4fa5b3a9c0ff28fc0e8b Author: Shunting Zhang Date: Wed Nov 16 00:29:08 2022 +0000 allow loading model from a path in torchbench (#89028) Sometimes it's really convenient to run simple models through the torchbench.py script rather than those from pytorch/benchmark. This PR adds the ability to run any model from a specified path by overloading the --only argument. This PR is split out from #88904. Here is the usage: specify the path and class name of the model in a format like: --only=path:,class: Due to the fact that dynamo changes the current working directory, the path should be an absolute path. The class should have a method get_example_inputs to return the inputs for the model. An example looks like:
```
class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

    def get_example_inputs(self):
        return (torch.randn(2, 10),)
```
Test command:
```
WARNING:common:torch.cuda.is_available() == False, using CPU
cpu eval LinearModel 0.824x p=0.00
```
Content of model_collection.py:
```
from torch import nn
import torch

class LinearModel(nn.Module):
    """
    AotAutogradStrategy.compile_fn ignores graphs with at most 1 call node.
    Make sure this model calls 2 linear layers to avoid being skipped.
    """
    def __init__(self, nlayer=2):
        super().__init__()
        layers = []
        for _ in range(nlayer):
            layers.append(nn.Linear(10, 10))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

    def get_example_inputs(self):
        return (torch.randn(2, 10),)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89028 Approved by: https://github.com/jansel commit 60ffeb986648420810098cba6ac0ad1cee06bd95 Author: Michael Lazos Date: Wed Nov 16 00:08:34 2022 +0000 Don't iterate over graph when adding graph input (#89084) Helps with https://github.com/pytorch/torchdynamo/issues/1803 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89084 Approved by: https://github.com/jansel commit ee05f47bddfb97b4b292808543d928b3526fc0ca Author: Charlie Yan Date: Tue Nov 15 18:03:53 2022 +0000 Rebase and re-land thread PG (#88795) The previous PR (https://github.com/pytorch/pytorch/pull/88627) has been reverted due to a failed check. After rebasing and rerunning, all checks passed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88795 Approved by: https://github.com/huydhn, https://github.com/wanchaol commit 35093fc1ab9749e6b763acead007e56b54c6375b Author: Michael Wootton Date: Tue Nov 15 21:40:43 2022 +0000 Enable correct supported activities for kineto on rocm (#88207) A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time, so operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times.
This caused incorrect (and often negative) cuda times, in e.g. table(). Pull Request resolved: https://github.com/pytorch/pytorch/pull/88207 Approved by: https://github.com/malfet, https://github.com/jeffdaily commit d0130cd21ee419fcb33a9ceefa3583aac1e736e1 Author: Bin Bao Date: Mon Nov 14 14:47:15 2022 +0000 Enable test_ops for inductor (#88994) Summary: skip several unsupported test cases Pull Request resolved: https://github.com/pytorch/pytorch/pull/88994 Approved by: https://github.com/Krovatkin commit 67af734adeebf448c54bbc294e115244c5c32f35 Author: mikey dagitses Date: Tue Nov 15 21:33:38 2022 +0000 skip test that is broken in head (#88759) Test Plan: Rely on CI. Differential Revision: D41156351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88759 Approved by: https://github.com/zou3519 commit 175b7e1cde0eaaef0465aa9c760842e5ea07e104 Author: Catherine Lee Date: Tue Nov 15 21:27:14 2022 +0000 print xpass (#89020) Print unexpected success as XPASS. I will submit a PR to test-infra so that the log classifier can find these Ex: https://github.com/pytorch/pytorch/actions/runs/3466368885/jobs/5790424173 ``` test_import_hipify (__main__.TestHipify) ... ok (0.000s) test_check_onnx_broadcast (__main__.TestONNXUtils) ... ok (0.000s) test_prepare_onnx_paddings (__main__.TestONNXUtils) ... ok (0.000s) test_load_standalone (__main__.TestStandaloneCPPJIT) ... ok (16.512s) ====================================================================== XPASS [4.072s]: test_smoke (__main__.TestCollectEnv) ---------------------------------------------------------------------- ---------------------------------------------------------------------- Ran 31 tests in 24.594s FAILED (skipped=7, unexpected successes=1) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89020 Approved by: https://github.com/huydhn, https://github.com/seemethere commit 8dc3353b0b1c12f64ba790c7be85cfbc99448cb4 Author: Nikita Vedeneev Date: Tue Nov 15 21:16:15 2022 +0000 add `to(dtype)` support for all sparse compressed formats (#89055) Fixes [#88419](https://github.com/pytorch/pytorch/issues/88419) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89055 Approved by: https://github.com/cpuhrsch commit da2afcb1e0006354f78d5e56d2933382d7af9ebf Author: Nikita Shulga Date: Tue Nov 15 21:05:59 2022 +0000 Add test for out-of-bounds Tensor access on GPU (#39211) Since CUDA context can not recover safely from on-device assert, use `torch.multiprocessing.spawn` to execute a method in another context and verify that it raises unrecoverable error. As those types of tests are pretty slow (6 seconds on powerful linux box with one GPU) run it only in the slow shard. 
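A minimal sketch (not the PR's actual test) of the pattern described above: run the out-of-bounds access in a child process via `torch.multiprocessing.spawn` so the device-side assert cannot poison the parent's CUDA context. Assumes a CUDA device is available.
```python
# Minimal sketch of the pattern described above (not the PR's test): run an
# out-of-bounds CUDA access in a child process so the device-side assert
# cannot corrupt the parent's CUDA context. Assumes a CUDA device.
import torch
import torch.multiprocessing as mp

def _oob_access(rank):
    x = torch.zeros(4, device="cuda")
    # Index 10 is out of bounds for a 4-element tensor -> device-side assert.
    y = x[torch.tensor([10], device="cuda")]
    torch.cuda.synchronize()  # force the failure to surface in this process
    print(y)

if __name__ == "__main__":
    try:
        mp.spawn(_oob_access, nprocs=1)
    except Exception as e:
        print(f"child process failed as expected: {e}")
```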
Closes https://github.com/pytorch/pytorch/issues/38944 Pull Request resolved: https://github.com/pytorch/pytorch/pull/39211 Approved by: https://github.com/ezyang commit d47b94fa8e17ad805f1283943dd2b1bc46b309b8 Author: Fabio Rocha Date: Mon Nov 14 10:47:34 2022 +0000 [inductor] Added bucketize to decomp table (#88348) These are the benchmark results vs eager ``` [--------------------------- bucketize ----------------------------] | eager | decomp 32 threads: -------------------------------------------------------- ((16384, 1024), (16,)), (True, True) | 600 | 464 ((16384, 1024), (16,)), (True, False) | 542 | 464 ((16384, 1024), (16,)), (False, True) | 780 | 731 ((16384, 1024), (16,)), (False, False) | 777 | 731 ((16384, 1024), (64,)), (True, True) | 624 | 515 ((16384, 1024), (64,)), (True, False) | 603 | 515 ((16384, 1024), (64,)), (False, True) | 789 | 718 ((16384, 1024), (64,)), (False, False) | 786 | 718 ((16384, 1024), (256,)), (True, True) | 878 | 820 ((16384, 1024), (256,)), (True, False) | 891 | 830 ((16384, 1024), (256,)), (False, True) | 897 | 900 ((16384, 1024), (256,)), (False, False) | 900 | 900 ((16384, 1024), (1024,)), (True, True) | 2000 | 1890 ((16384, 1024), (1024,)), (True, False) | 1950 | 1892 ((16384, 1024), (1024,)), (False, True) | 1990 | 1962 ((16384, 1024), (1024,)), (False, False) | 1990 | 2060 ((16384, 1024), (4096,)), (True, True) | 3405 | 3155 ((16384, 1024), (4096,)), (True, False) | 3244 | 3154 ((16384, 1024), (4096,)), (False, True) | 3282 | 3219 ((16384, 1024), (4096,)), (False, False) | 3278 | 3220 ((16384, 1024), (16384,)), (True, True) | 4626 | 4672 ((16384, 1024), (16384,)), (True, False) | 4629 | 4671 ((16384, 1024), (16384,)), (False, True) | 4662 | 4829 ((16384, 1024), (16384,)), (False, False) | 4665 | 4824 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88348 Approved by: https://github.com/ngimel commit 9262d18e1bc1f31479677cbd2c121770f3f36522 Author: Fabio Rocha Date: Mon Nov 14 10:47:32 2022 +0000 [inductor] Introduce CSEVariable type and use it to track if Triton variables are scalar (#88347) This fixes https://github.com/pytorch/torchdynamo/issues/1515 To fix it, we need to keep track of whether a Triton variable is a scalar (so we can not use a mask when doing indirect loads through them). This requires a way of annotating variable names generated by CSE with properties. So now CSE will use CSEVariable class to keep track of variables and let backends subclass it so they can annotate them with whatever information they want. TritonCSEVariable is such a subclass that track the `is_scalar` property. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88347 Approved by: https://github.com/jgong5, https://github.com/ngimel commit edd2dea859613a9792cfd08a77cf6ae56a531644 Author: Colin Taylor Date: Tue Nov 15 20:46:00 2022 +0000 [torch] [analytics] add dynamo to analytics (#88915) Summary: as title. Differential Revision: D41237602 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88915 Approved by: https://github.com/jansel commit 3e2ba60ac0598c6d85ea83a25fd15df855b9f2f9 Author: Colin Taylor Date: Tue Nov 15 20:36:13 2022 +0000 [torch] [analytics] add pytorch event logger callsites to torch.save and torch.load (#89003) Summary: as title. 
Differential Revision: D41239419 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89003 Approved by: https://github.com/ezyang, https://github.com/dzhulgakov commit d8466964b348b6172317f70b8e52de02402bad54 Author: Nikita Shulga Date: Tue Nov 15 20:35:48 2022 +0000 Add range check to multi margin loss target (#89008) Fixes https://github.com/pytorch/pytorch/issues/88724 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89008 Approved by: https://github.com/ngimel commit 18c1f2f82eee51bf0e0061dc08d5416b6a7fe0cf Author: Colin Taylor Date: Tue Nov 15 20:35:34 2022 +0000 [torch] [analytics] add pytorch event logger callsites to transformers and encoder/decoders (#88896) Differential Revision: D41227275 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88896 Approved by: https://github.com/mikekgfb commit ff6d2a6d1b8245563c8122849144dddaa276483a Author: Driss Guessous Date: Tue Nov 15 20:22:54 2022 +0000 Add mem efficient backward (#88856) - Use gradcheck to test correctness. The kernel is not implemented for fp64 so run checks with bumped tolerances in fp32 - I also made updates based off of Xformer main branch and flash-attention cutlass branch. - This will enable the fused backward to be called for scaled dot product attention Pull Request resolved: https://github.com/pytorch/pytorch/pull/88856 Approved by: https://github.com/cpuhrsch commit d60abe4b9521e235c0e9beb00cda0d6c5673f4e0 Author: Jiawen Liu Date: Tue Nov 15 19:34:38 2022 +0000 [Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#88859) Summary: Build fx-based linear/matmul/bmm + permute/transpose vertical fusion in Inductor For an internal Ads model: **1.15x -> 1.36x speedup** Differential Revision: D41071665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88859 Approved by: https://github.com/jianyuh, https://github.com/jansel commit f5df68509097c65263ccf100e5df6b1057e9a2fa Author: Xiao Wang <24860335+xwang233@users.noreply.github.com> Date: Tue Nov 15 19:25:53 2022 +0000 Enable channels_last_3d on SyncBatchNorm (#88401) This PR enabled the use of fast channels_last kernels on SyncBatchNorm with channels_last_3d memory format. With a small benchmark script here https://github.com/pytorch/pytorch/issues/88021#issuecomment-1299059859, on V100, I got master: ``` DDP channels_last=False, run_forward_backward, time: 0.8945400714874268 sec DDP channels_last=True, run_forward_backward, time: 1.4736433029174805 sec ``` This PR: ``` DDP channels_last=False, run_forward_backward, time: 0.8927242755889893 sec DDP channels_last=True, run_forward_backward, time: 0.48697471618652344 sec ``` This PR is a follow-up of https://github.com/pytorch/pytorch/pull/46906 Close https://github.com/pytorch/pytorch/issues/88021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88401 Approved by: https://github.com/ngimel commit 8023c9dc6420bce8e37ad4e4e363cb7bed7f70de Author: Taylor Robie Date: Fri Nov 11 16:30:05 2022 -0800 [Profiler] Memory profiler part 3: Schema parsing and mutable arguments (#86854) The appropriate annotation for a block of memory is a function of time: an input can be mutated in-place to become an activation, a clever kernel might steal the memory of a detached input (such as a mask) to use as output memory, etc. We could pessimistically assume that all ops mutate all of their inputs, however inspection of schema allows us to significantly narrow that assumption with minimal effort. 
Checking schemas also allows us to distinguish between dispatcher ops (which have load-bearing semantics) and user annotations with reasonably high precision. Differential Revision: [D40220390](https://our.internmc.facebook.com/intern/diff/D40220390/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86854 Approved by: https://github.com/chaekit commit 2439bc1e9bab3721bb9f1c4853baf03b610c89da Author: Taylor Robie Date: Fri Nov 11 16:30:03 2022 -0800 [Profiler] Memory profiler part 2: Config validation (#86853) Memory profiling requires `record_shapes`, `profile_memory`, and `with_stack`. This PR just adds a skeleton endpoint with a good error message if certain flags are missing. Differential Revision: [D39920801](https://our.internmc.facebook.com/intern/diff/D39920801/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86853 Approved by: https://github.com/chaekit commit 279dcce702a56f5b3ce5e864fa4db2f882e01084 Author: mikey dagitses Date: Tue Nov 15 19:08:31 2022 +0000 disable test that fails in fbcode (#88786) Summary: caffe2/test:torch_cuda - test_advanced_indexing_assignment_lazy (test_view_ops.TestViewOpsLAZY) RuntimeError: TorchScript backend not yet supported in FBCODE/OVRSOURCE builds File "/usr/local/fbcode/platform010/lib/python3.8/unittest/suite.py", line 163, in _handleClassSetUp setUpClass() File "/re_cwd/fbcode/buck-out/opt/gen/caffe2/test/torch_cuda#binary,link-tree/torch/testing/_internal/common_device_type.py", line 506, in setUpClass torch._lazy.ts_backend.init() File "/re_cwd/fbcode/buck-out/opt/gen/caffe2/test/torch_cuda#binary,link-tree/torch/_lazy/ts_backend.py", line 6, in init torch._C._lazy_ts_backend._init() Test Plan: Rely on CI. Differential Revision: D41170545 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88786 Approved by: https://github.com/zou3519 commit 1db0f735e8fe14245e98e875c15ecf95ed2142ce Author: Taylor Robie Date: Fri Nov 11 16:30:01 2022 -0800 [Profiler] Account for caching when assigning IDs (#88917) The python tracer caches information about module and optimizer state. That means that for subsequent calls, the presence of a Tensor in these fields does not imply that the Tensor is still live; just that it was live during the first call. (I should perhaps rename the fields to something like `stale_parameters` to convey this.) Unless we discard subsequent calls, ID assignment gets tripped up when it sees a Tensor that was already released. Differential Revision: [D41226827](https://our.internmc.facebook.com/intern/diff/D41226827/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88917 Approved by: https://github.com/chaekit commit ee4412381ea3577fbf32858f35f8b76bdc548b49 Author: Zain Rizvi Date: Tue Nov 15 17:55:29 2022 +0000 Allow ROCm runners to have 2 or more gpus (#89011) [This run](https://github.com/pytorch/pytorch/actions/runs/3432340660/jobs/5721731207) failed claiming that it couldn't detect GPUs on the runner. Inspecting the rocminfo output (higher up in logs) shows that it in fact had three GPUs, but the workflow is currently set up to expect either 2 or 4 gpus. The workflow files currently have no way of specifying whether it'll get a 2 GPU or a 4 GPU machine, so really 2 is all any test can expect to get. [This old PR](https://github.com/pytorch/pytorch/pull/72142/files) shows that historically ROCm runners only had 4 gpus, then later the logic was extended to expect 2 GPU runners as well.
It's not clear how the ROCm runner ended up with 3 gpus instead of 2 or 4 (something for ROCm folks to look into) but there doesn't seem to be a good reason for ROCm workflows to fail if 3 (or 5) gpus ever show up on a machine. This PR makes the workflows resilient to ROCm having these alternate GPU counts Also filed https://github.com/pytorch/pytorch/issues/89012 against the ROCm team to explore why the runner only had 3 gpus Pull Request resolved: https://github.com/pytorch/pytorch/pull/89011 Approved by: https://github.com/huydhn commit 2819df9a19480feba72f9c613be25e56d4f05142 Author: Pruthvi Madugundu Date: Tue Nov 15 17:49:00 2022 +0000 [ROCm] Enable python ref executor UTs for ROCm (#88981) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88981 Approved by: https://github.com/mruberry commit 62ba15e10e875ce088dff26e872605ee70c8c04a Author: Tugsbayasgalan (Tugsuu) Manlaibaatar Date: Mon Nov 14 23:26:15 2022 -0800 Rewrite assert statement with torch._assert under config (#88246) This diff rewrites assert statement in python with torch._assert under config. The resulting graph looks something like: ``` SOURCE CODE: def f(x): assert x[0] == 3 return x.cos() CAPTURED GRAPH: graph(): %arg0 : [#users=2] = placeholder[target=arg0] %getitem : [#users=1] = call_function[target=operator.getitem](args = (%arg0, 0), kwargs = {}) %eq : [#users=1] = call_function[target=operator.eq](args = (%getitem, 3), kwargs = {}) %_assert : [#users=0] = call_function[target=torch._assert](args = (%eq, "assertion_error"), kwargs = {}) %cos : [#users=1] = call_method[target=cos](args = (%arg0,), kwargs = {}) return cos ``` Note that this introduces side-effect as it could error out while executing graph, but the assertion can eliminated via DCE if we choose to ignore it. 
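For reference, a small runnable illustration (not taken from the PR) of what the rewrite maps to at the Python level: `torch._assert` is the traceable stand-in for the plain `assert` statement shown in the captured graph above.
```python
# Illustration (not from the PR): the rewrite above corresponds to replacing
#   assert x[0] == 3
# with
#   torch._assert(x[0] == 3, "assertion_error")
import torch

def f(x):
    torch._assert(x[0] == 3, "assertion_error")
    return x.cos()

print(f(torch.tensor([3.0, 1.0, 2.0])))   # passes the assert
# f(torch.tensor([0.0]))                  # would raise AssertionError
```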
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88246 Approved by: https://github.com/jansel commit b815f1fc502387311a7b4da8c2f52ead56cbfff5 Author: anjali411 Date: Tue Nov 15 13:05:30 2022 +0000 Symintify view_as_complex and view_as_real (#89052) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #89052 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89052 Approved by: https://github.com/ezyang commit b9029fc4497a9453e76892c9cf56144add89faf7 Author: HDCharles Date: Fri Nov 11 08:55:40 2022 -0800 [ao] quant_type.py fixing public v private (#87519) Summary: made _get_quant_type_to_str private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709282](https://our.internmc.facebook.com/intern/diff/D40709282) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87519 Approved by: https://github.com/jcaip commit 5faa2792fa3c46f2124d1d1c5f7b6a3865d47d7b Author: Sherlock Huang Date: Tue Nov 15 01:06:23 2022 +0000 Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang commit 63e16216d8830b6340816c873b035e1a31ad4636 Author: Masaki Kozuki Date: Tue Nov 15 13:21:39 2022 +0000 [c10d] Implement `__instancecheck__` for `c10d::ReduceOp` (#88275) Summary: - Customize the metaclass of `torch.distributed.distributed_c10d.ReduceOp` for the sake of custom `__instancecheck__` - Add `copy.copy`, `copy.deepcopy`, and `pickle` support with tests Rel: - #81272 - #84243 - #87191 - #87303 - #87555 Ref: - https://github.com/pybind/pybind11/issues/2696 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88275 Approved by: https://github.com/wanchaol commit 2452e3f99a072760fc46d3f9025aaa37ca7ea2ab Author: Chen Lai Date: Mon Nov 14 20:16:45 2022 -0800 Update xnnpack graph schema to use xnode and xvalue (#89036) There are different nodes definition like [Node in autograd](https://www.internalfb.com/code/fbsource/fbcode/caffe2/torch/csrc/autograd/function.h?lines=108-609&reveal=108-609) and onnxnodes and etc. Understand namespace can be used where nodes from definition are used together, however it's still better to slightly differentiate the name. Differential Revision: [D41002324](https://our.internmc.facebook.com/intern/diff/D41002324/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89036 Approved by: https://github.com/mcr229 commit 8c46a5de3a2e72c5ffbb714fa4e2d44fc2e59951 Author: Chen Lai Date: Mon Nov 14 20:16:43 2022 -0800 Add debug handle to xnnpack schema (#89033) As title, add three things to the schema 1. debug handle for each node 2. file identifier, so we can sanity check we are getting the xnnpack schema flatbuffers file, instead of other random binary 3. extension, so the dumped binary will end up with its own extension like `myschema.xnnpack` (maybe can have a better name) instead of the default extension `.bin` Differential Revision: [D40906970](https://our.internmc.facebook.com/intern/diff/D40906970/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89033 Approved by: https://github.com/mcr229 commit 50c18217a3849c56a0fe5bdb923bd67fa70da31c Author: PyTorch MergeBot Date: Tue Nov 15 09:37:09 2022 +0000 Revert "Add mem efficient backward (#88856)" This reverts commit 35e668b5ced25e735b6e523d557ed7fd60267914. 
Reverted https://github.com/pytorch/pytorch/pull/88856 on behalf of https://github.com/DanilBaibak due to breaking internal builds commit 5314af5383e56376cd62da22ae07681656667e71 Author: Wenzhe Xue Date: Tue Nov 15 07:29:52 2022 +0000 Set correct size of `attr::output_layouts` when the graph has multiple outputs in JIT oneDNN fuser (#88496) Bug: Previously, `initOutputLayouts()` was called after creating a graph and before merging other nodes. It is a vector with one element. So when a graph contains multiple outputs, e.g. using AOTAutograd compile in my case, layout_propagation pass try to access out of range elements in the vector. Then it comes to the second bug in `useOpaqueLayout()`, the out of range checks the index with the updated output size instead of the size of the vector. Then used `[]` to access the element, which is out of range. Fixes the above two issues: 1. check the offset is within range with the size of `attr::output_layouts` vector instead of another variable. This check catches the error now. 2. change the place to initial `attr::output_layouts` after node merging. The graph may change with node merging. Thus we moved the initialization in layout_propagation with the complete graph. Added test time: `Ran 1 test in 0.383s` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88496 Approved by: https://github.com/jgong5, https://github.com/sanchitintel commit 60e59c075561068c7d1fe9e9fc40a2df3cd2d2d7 Author: peterjc123 Date: Tue Nov 15 06:36:24 2022 +0000 Fix get_default_qat_qconfig for PT 1.13 (#88876) See https://github.com/pytorch/pytorch/pull/84329/files#r1019916766 for more context Pull Request resolved: https://github.com/pytorch/pytorch/pull/88876 Approved by: https://github.com/jgong5, https://github.com/vkuzo commit 5ed90c40f874359aca13f7f50e6d115524937d02 Author: Natalia Gimelshein Date: Tue Nov 15 06:16:13 2022 +0000 enable index_put test (#89019) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/89019 Approved by: https://github.com/desertfire commit 68fd8f37063f0011f1c0589e8f38f7606e3f6748 Author: Iris Date: Tue Nov 15 06:13:15 2022 +0000 [BE] [c10d][send] Improve error message on dist.send() with destination rank as itself (#89004) This improves error msg on dist.send() and add corresponding test in test_c10d_common.py(https://github.com/pytorch/pytorch/blob/master/test/distributed/test_c10d_common.py). Context in issue#83912: https://github.com/pytorch/pytorch/issues/83912 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89004 Approved by: https://github.com/H-Huang commit 21dd311077d00ff5c3f930295ddc8cf915a262d7 Author: Huy Do Date: Tue Nov 15 05:08:26 2022 +0000 Add a mode to rerun all disabled tests (without running anything else) (#88646) Rerun all disabled test to gather their latest result so that we can close disabled tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped `` The logic is roughly as follows, the test runs multiple times (n=50) * If the disabled test passes, and it's flaky, do nothing because it's still flaky. In the test report, we'll see the test passes with the following skipped message: ``` ``` * If the disabled test passes every single time, and it is not flaky anymore, mark it so that it can be closed later. We will see the test runs and passes, i.e. ``` ``` * If the disabled test fails after all retries, this is also expected. 
So only report this but don't fail the job (because we don't care about red signals here), we'll see the test is skipped (without the `flaky` field), i.e. ``` ``` This runs at the same schedule as `mem_leak_check` (daily). The change to update test stats, and (potentially) grouping on HUD will come in separated PRs. * pull https://github.com/pytorch/pytorch/actions/runs/3447434434 * trunk https://github.com/pytorch/pytorch/actions/runs/3447434928 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646 Approved by: https://github.com/clee2000 commit 73d71ae3d62607f2e480af37c470375ea405eb1c Author: Elias Ellison Date: Tue Nov 15 00:21:52 2022 +0000 [WIP] Unwrap View in Reinterpret View (#89016) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89016 Approved by: https://github.com/ngimel commit dd6beca854be6cc0619d0b0693bc2fc558636217 Author: Everton Constantino Date: Tue Nov 15 04:10:49 2022 +0000 Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils test. (#83693) Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils.cpp:ClipGradNorm as this is the proper way to compare equality between floating point values. This avoids `test_api` ClipGradNorm failing for WoA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83693 Approved by: https://github.com/ngimel, https://github.com/kit1980 commit ce8a45c282c68abbf37f7af99d4bd7cb53fa020d Author: PyTorch MergeBot Date: Tue Nov 15 03:32:00 2022 +0000 [vision hash update] update the pinned vision hash (#89026) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89026 Approved by: https://github.com/pytorchbot commit 55b88cde0ab0e5457422777971af845842b2689b Author: Jiawen Liu Date: Tue Nov 15 03:10:36 2022 +0000 [Inductor] Build Shape Padding in Inductor (#88709) Summary: Build shape padding for matmul/bmm/addmm in Inductor Differential Revision: D41071282 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88709 Approved by: https://github.com/bertmaher, https://github.com/Chillee commit cbdb683dc843f2d50617ad962d5e57501e5154d4 Author: Edward Z. Yang Date: Mon Nov 14 16:51:32 2022 -0500 Add test that bias gradient is properly tested in same_two_models (#88995) See https://github.com/pytorch/pytorch/pull/88629#issuecomment-1313850324 for why this got broken. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88995 Approved by: https://github.com/albanD commit 45d2daaf855d4e79f6e09c4d3f85743b955446e6 Author: William Wen Date: Tue Nov 15 02:32:55 2022 +0000 Fix lookup file update in dashboard (#89024) Lookup file should be updated before graphs are generated. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89024 Approved by: https://github.com/mlazos, https://github.com/anijain2305 commit 1f88b208acab2cf974849c9161d24f08486f592c Author: Michael Gschwind Date: Tue Nov 15 01:25:17 2022 +0000 Fix cuda/cpu check on NoneType (Unit test) (#88970) Summary: Fix cuda/cpu check on NoneType (unit test) Test Plan: sabdcastle/ github CI/CD Differential Revision: D41208798 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88970 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch commit 35e668b5ced25e735b6e523d557ed7fd60267914 Author: Driss Guessous Date: Tue Nov 15 01:10:35 2022 +0000 Add mem efficient backward (#88856) - Use gradcheck to test correctness. The kernel is not implemented for fp64 so run checks with bumped tolerances in fp32 - I also made updates based off of Xformer main branch and flash-attention cutlass branch. - This will enable the fused backward to be called for scaled dot product attention Pull Request resolved: https://github.com/pytorch/pytorch/pull/88856 Approved by: https://github.com/cpuhrsch commit f3462833bdd1324d32ad9a78b5f142fb4d75f57c Author: Zain Rizvi Date: Tue Nov 15 01:01:37 2022 +0000 Use same retry logic as macos binary builds (#89014) Occasionally the command to download sccache via curl fails with network errors (example below). The default curl retry option only retries errors that are considered "transient", but but the set of actual transient commands is greater than what curl considers to be transient. This PR modifies the retry logic for downloading sccache to match what's in https://github.com/pytorch/pytorch/blob/master/.github/templates/macos_binary_build_workflow.yml.j2#L79-L89, using the retry action to ensure we both retry all transient errors, and including a longer retry delay to give the transient issue time to resolve itself. Example failure from [this run](https://github.com/pytorch/pytorch/actions/runs/3422664884/jobs/5700595220): ``` Run sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:06 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:07 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:08 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:10 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:11 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:12 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:13 --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:14 --:--:-- 0 curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to s3.amazonaws.com:443 Error: Process completed with exit code 35. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89014 Approved by: https://github.com/huydhn commit 7a37bbed15321fa121f628053ee3c93d516700f5 Author: XiaobingSuper Date: Mon Nov 14 07:40:32 2022 -0500 Take input striding for conv fusion op based on eager output (#88864) As https://github.com/pytorch/pytorch/pull/88706, we also change the input stride check using eager output. 
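As a small aside (not the PR's code), the input "striding" being checked can be observed directly from Python: the same logical conv input carries different strides depending on memory format, which is what a stride check against the eager output has to account for.
```python
# Not the PR's code: the same logical tensor can carry different strides
# depending on memory format, and eager conv may produce either layout.
import torch

x = torch.randn(8, 3, 32, 32)
print(x.stride())           # contiguous NCHW strides: (3072, 1024, 32, 1)

xcl = x.to(memory_format=torch.channels_last)
print(xcl.stride())         # channels-last strides:   (3072, 1, 96, 3)
print(torch.equal(x, xcl))  # True: same values, different physical layout
```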
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88864 Approved by: https://github.com/jgong5, https://github.com/jansel commit 0544a32ba35acd8648692a662197e3497654858e Author: Jongsoo Park Date: Tue Nov 15 00:48:49 2022 +0000 [inductor] fix could not find as_strided with config.triton.mm=triton (#88946) Summary: ReinterpretView doesn't seem to be handled properly with matrix multiply Triton kernels Reviewed By: bertmaher Differential Revision: D40836677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88946 Approved by: https://github.com/jansel commit 92c78f37afca6c1ff6c40be7c7ed44b162b287b4 Author: wswartworth Date: Mon Nov 14 23:58:46 2022 +0000 improving torch.linalg.lstsq documentation formatting (#89013) Fixes #80441 The highlighting in the documentation for torch.linalg.lstsq was incorrect due to a newline that sphinx doesn't parse correctly. Instead of writing the tensors directly, I used randn to generate the tensors. This seems to be more consistent with how other documentation is written. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89013 Approved by: https://github.com/lezcano commit 8df64abc6d8cd1de7017096159a93bb9c7c02bc1 Author: Edward Z. Yang Date: Mon Nov 14 10:49:20 2022 -0500 Fix some naughty uses of reshape/flatten (#88999) Mutating after reshape/flatten is bad! And it turns out the corresponding view operations are guaranteed to work too. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88999 Approved by: https://github.com/albanD commit c53a5ac6cca7e2e7d7c47b1a816c7eaa2e7a7704 Author: PyTorch MergeBot Date: Mon Nov 14 23:36:17 2022 +0000 Revert "support running test_mobile_profiler with buck1/buck2 and OSS (#89001)" This reverts commit 3b33a2794e07b5216aa473da67755af3aa6e6433. Reverted https://github.com/pytorch/pytorch/pull/89001 on behalf of https://github.com/kit1980 due to Broke trunk / macos-12-py3-x86-64-lite-interpreter / build commit 3c3bd55bea3424cbfc0c319dcead9c1e5c55646d Author: Kazuaki Ishizaki Date: Mon Nov 14 23:24:31 2022 +0000 [testing] fix a key in parse_namespace() (#88969) This PR fixes an incorrect key name of `mappings` dict in `parse_namespace()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88969 Approved by: https://github.com/kit1980 commit 911a1349dd5d93b9de62d82f439b09eae9aedb92 Author: Yanbo Liang Date: Mon Nov 14 22:45:50 2022 +0000 [Dynamo] Fix torch.is_tensor and torch.overrides.is_tensor_like (#88704) Fixes error from 7k github models: https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_arashwan_matrixnet.py Error: ``` AssertionError: torch.* op returned non-Tensor bool call_function from user code: File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_arashwan_matrixnet.py", line 749, in scatter return scatter_map(inputs) File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_arashwan_matrixnet.py", line 741, in scatter_map assert not torch.is_tensor(obj), 'Tensors not supported in scatter.' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88704 Approved by: https://github.com/jansel commit 3b33a2794e07b5216aa473da67755af3aa6e6433 Author: mikey dagitses Date: Mon Nov 14 22:11:29 2022 +0000 support running test_mobile_profiler with buck1/buck2 and OSS (#89001) Summary: Internally we are switching to a new version of buck, but we also must keep this working in OSS. Test Plan: Rely on CI. 
Differential Revision: D41270673 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89001 Approved by: https://github.com/r-barnes, https://github.com/osalpekar, https://github.com/malfet commit 074278f393e1a31b7ee058479cd5906ae830f5ed Author: Nikita Shulga Date: Mon Nov 14 21:54:46 2022 +0000 [CI] Push `latest` and hash+CUDAver tags (#88971) For nightly docker build to simulate the behavior of `push_nightly_docker_ghcr.yml` Tested in https://github.com/pytorch/pytorch/actions/runs/3465221336/jobs/5787694933 Fixes https://github.com/pytorch/pytorch/issues/88833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88971 Approved by: https://github.com/seemethere commit b2082833c6082cbb25caf48bdb8f58c490b2c8a7 Author: PyTorch MergeBot Date: Mon Nov 14 21:21:09 2022 +0000 Revert "woof (#89010)" This reverts commit 4570bd6030c97577d2fa994857d0a022ef7563a4. Reverted https://github.com/pytorch/pytorch/pull/89010 on behalf of https://github.com/ezyang due to whoops this actually landed commit 4570bd6030c97577d2fa994857d0a022ef7563a4 Author: Edward Z. Yang Date: Mon Nov 14 14:34:01 2022 -0500 woof (#89010) Signed-off-by: Edward Z. Yang Differential Revision: [D41276175](https://our.internmc.facebook.com/intern/diff/D41276175) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89010 Approved by: https://github.com/bigfootjon commit f80992217dd2ae5ca0af5e280388cba6078ef57b Author: anjali411 Date: Mon Nov 14 14:43:15 2022 +0000 Remove skip (#88979) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88979 Approved by: https://github.com/voznesenskym commit 540b42a1a883bb56235cdbf0bbbf103041c4dd8c Author: Jerry Zhang Date: Mon Nov 14 19:27:46 2022 +0000 [quant][executorch] Support quant fusion for cat in quant in executorch stack (#88960) Summary: * added cat in executorch backend config * added quant fusion for "dq - cat - q" pattern Test Plan: buck run executorch/exir/tests:quant_fusion_pass -- "executorch.exir.tests.test_quant_fusion_pass.TestQuantFusionPass.test_cat" Reviewed By: qihqi Differential Revision: D41111054 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88960 Approved by: https://github.com/JacobSzwejbka commit e0c194f10b20a5ab2ad8d2075bec81ca57320268 Author: Kazuaki Ishizaki Date: Mon Nov 14 19:06:38 2022 +0000 Fix typos in messages under torch (#88961) This PR fixes typos of messages and parms in c++ source and head files under `torch` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961 Approved by: https://github.com/albanD commit 3d79ced8cfb2ddd250f9a31dad9b990c120e6dab Author: Peter Bell Date: Sat Nov 12 14:20:41 2022 +0000 wrap_pybind_function: support member function pointers (#88932) This updates `wrap_pybind_function` to use `invoke` and adds the `invoke_traits` object which is analogous to `function_traits` but for member functions it includes the class as an explicit argument. To test this is working properly, I've also applied it to the `CUDAGraph` binding code. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88932 Approved by: https://github.com/albanD commit 36d87465fb9b34914e6db50638c0f5bf04e3d7d9 Author: William Wen Date: Mon Nov 14 18:43:50 2022 +0000 Fix long comment error on dashboard (#89002) Fix dashboard comment failure due to the following trace: ``` Traceback (most recent call last): File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1180, in DashboardUpdater(args).update() File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1119, in update self.comment_on_gh(comment) File "/scratch/anijain/dashboard/work/pytorch/benchmarks/dynamo/runner.py", line 1096, in comment_on_gh subprocess.check_call( File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 368, in check_call retcode = call(*popenargs, **kwargs) File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 349, in call with Popen(*popenargs, **kwargs) as p: File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 951, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "/scratch/anijain/dashboard/env/lib/python3.9/subprocess.py", line 1821, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) OSError: [Errno 7] Argument list too long: '/data/home/anijain/miniconda/bin/gh' srun: error: a100-st-p4d24xlarge-27: task 0: Exited with exit code 1 ``` That is, we were trying to execute a gh command in the OS that was too long. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89002 Approved by: https://github.com/davidberard98 commit cdb798faefa2520b37938311bcef1c175581a0ff Author: Sean Ross-Ross Date: Mon Nov 14 18:39:45 2022 +0000 _get_nested_attr should return a value in the general case (#88822) Fixes https://github.com/pytorch/functorch/issues/1053 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88822 Approved by: https://github.com/zou3519 commit f1a5044de0639180f667d212800aa43f34026b3c Author: Khushi Agrawal Date: Mon Nov 14 18:18:45 2022 +0000 [primTorch] _refs & opinfo alpha_dropout (#87989) Add _refs and OpInfo for `nn.functional.alpha_dropout` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87989 Approved by: https://github.com/mruberry commit b0c86caa1d46a16195682e2afe5456f97265aa53 Author: Ivan Yashchuk Date: Mon Nov 14 17:49:30 2022 +0000 Remove cpu path from lobpcg's basis helper (#88984) Fixes https://github.com/pytorch/pytorch/issues/88650 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88984 Approved by: https://github.com/lezcano commit 06f1b52705ee360e5ac89e0f1f32f69ffde72b9a Author: Natalia Gimelshein Date: Mon Nov 14 17:37:24 2022 +0000 don't use prims.unsqueeze in group_norm (#88927) inductor doesn't have prims.squeeze lowering, so this breaks it. Longer term, `squeeze` with multiple dimensions is not a prim, nvfuser implements it with a loop, inductor uses `_squeeze_multiple` helper which turns it into a loop. Prim should accept only a single dimension. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88927 Approved by: https://github.com/eellison commit c8f3d1c13460bbaa85b7f423bfb7f414e825c757 Author: Peter Bell Date: Mon Nov 14 12:36:44 2022 +0000 Run test_torchinductor_opinfo CPU tests if triton not installed (#88934) These test are not run currently because normal CI workers don't have triton installed. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88934 Approved by: https://github.com/ngimel commit ec4eadac5baebcf094836108a25ef3af63d39f5d Author: Brian Hirsh Date: Fri Nov 11 14:13:01 2022 -0800 reland "Do not use unsafe restriding for subclasses (#87610)" (#88343) This reverts commit 5b75b19f51837e162cc0e5e5757dfd9bef437c67. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88343 Approved by: https://github.com/ezyang commit 9943d46aab4465b887039aa1a9b5d9ebc0a01a35 Author: XiaobingSuper Date: Sun Nov 13 22:09:58 2022 -0500 TorchDynamo: skip convolution fusion when convolution's padding is string (#88794) Currently, the fusion convolution doesn't support the case when padding is a string, we will support it at the next step. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88794 Approved by: https://github.com/jansel, https://github.com/jgong5 commit 15ef0660c553ebb50ad639f563062cab01e5e6dc Author: XiaobingSuper Date: Sun Nov 13 22:09:56 2022 -0500 Fake Tensor For (ConvFusion) Propagation (#88414) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88414 Approved by: https://github.com/jgong5, https://github.com/jansel commit 5e6cefd258dfdb4ddf2956c0b5631d84e97027e5 Author: PyTorch MergeBot Date: Mon Nov 14 12:02:43 2022 +0000 Revert "Run test_torchinductor_opinfo CPU tests if triton not installed (#88934)" This reverts commit 8371bb8a3dddbead709bc1e9d26715818a34fa8a. Reverted https://github.com/pytorch/pytorch/pull/88934 on behalf of https://github.com/peterbell10 due to Inductor tests failing on master commit 8371bb8a3dddbead709bc1e9d26715818a34fa8a Author: Peter Bell Date: Sun Nov 13 22:33:13 2022 +0000 Run test_torchinductor_opinfo CPU tests if triton not installed (#88934) These test are not run currently because normal CI workers don't have triton installed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88934 Approved by: https://github.com/ngimel commit 072920c281bb4d9ca899c6c781a8374ab42a9a3f Author: XiaobingSuper Date: Sun Nov 13 22:09:54 2022 -0500 TorchDynamo: Add convolution binary+unary fusion for cpu in inference mode (#88412) This PR is about enabling the fusion of **conv+binary+relu**, which will improve the vision model's performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88412 Approved by: https://github.com/jgong5, https://github.com/jansel commit cb4842c9495a68d2a1d4a3ee3ffc9eab30dce28c Author: PyTorch MergeBot Date: Mon Nov 14 10:29:24 2022 +0000 [xla hash update] update the pinned xla hash (#88982) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88982 Approved by: https://github.com/pytorchbot commit 03296844aa0cb560401584545ba1412e52c87b37 Author: Kazuaki Ishizaki Date: Mon Nov 14 09:50:50 2022 +0000 Fix typos in messages under aten (#88964) This PR fixes typos of messages and parms in c++ source files under `aten` directory. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88964 Approved by: https://github.com/lezcano commit 4ad7b17fabd2a2b6873bc369bd223223ff1e628b Author: XiaobingSuper Date: Sun Nov 13 22:09:53 2022 -0500 TorchDynamo: Add convolution binary(inplace) fusion for cpu in inference mode (#88403) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88403 Approved by: https://github.com/jgong5, https://github.com/jansel commit 06486cd0087200e08ebb8a9518e064251c7c5309 Author: iLeGend <824040212@qq.com> Date: Mon Nov 14 03:39:43 2022 +0000 fix typo: AT_MKLDNN_EBABLED => AT_MKLDNN_ENABLED (#88952) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88952 Approved by: https://github.com/XiaobingSuper commit eea506aee12371a1fbde271c99fb30a8537d1db7 Author: PyTorch MergeBot Date: Mon Nov 14 01:58:47 2022 +0000 Revert "Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761)" This reverts commit 9eabcc370f4c3a04be85cb1f878038f10716bdc3. Reverted https://github.com/pytorch/pytorch/pull/88761 on behalf of https://github.com/suo due to much broken https://hud.pytorch.org/pytorch/pytorch/commit/9eabcc370f4c3a04be85cb1f878038f10716bdc3 commit 48dc24ddceb5d048ceb38f00f6d4ec0cfc3e71d0 Author: Aaron Gokaslan Date: Sun Nov 13 22:05:41 2022 +0000 Fix: [ATen] Add some missing moves (#88514) Related to #88512 , but for ATen. This should reduce a number of copies and inefficient atomic smart pointer increments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88514 Approved by: https://github.com/jgong5, https://github.com/ezyang commit 9eabcc370f4c3a04be85cb1f878038f10716bdc3 Author: Sherlock Huang Date: Sun Nov 13 06:06:24 2022 +0000 Symintify decomps for split and upsample_bilinear; Fix decomp for _softmax_backward_data and native_dropout_backward (#88761) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88761 Approved by: https://github.com/ezyang commit 76af71444a43962ee3e1cef987ac2028f2b8f44d Author: Nikita Karetnikov Date: Sat Nov 12 20:06:12 2022 +0100 [primTorch] Add ref for `complex` (#88562) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88562 Approved by: https://github.com/ezyang commit 8f7e519f12d165c06ea3e20b994c2d3c5c44af2c Author: Jason Ansel Date: Sun Nov 13 19:42:42 2022 +0000 Skip dynamo benchmark tests under TSAN (#88895) Summary: Fixes T137546804 Test Plan: ``` buck2 test mode/opt-tsan //caffe2/benchmarks/dynamo:test buck2 test mode/opt //caffe2/benchmarks/dynamo:test ``` Differential Revision: D41226384 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88895 Approved by: https://github.com/anijain2305 commit 52be0c42abfcf566e730d927b6a3e90e4380017b Author: anjali411 Date: Sun Nov 13 15:56:16 2022 +0000 meta function for max_pool2d_with_indices_backward (#88743) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88743 Approved by: https://github.com/lezcano, https://github.com/ezyang commit 98bcb4acb651378d7eaae7532d52f08939464c06 Author: PyTorch MergeBot Date: Sun Nov 13 16:21:12 2022 +0000 Revert "[reland][dynamo] Better support for nn.Module (#88959)" This reverts commit e950afc3958c9bae5d61cbc99bc088309141df6d. 
Reverted https://github.com/pytorch/pytorch/pull/88959 on behalf of https://github.com/malfet due to Broke `test_accuracy_issue1` commit 897d029a738c831448c0984bc0ab91544ca04545 Author: Animesh Jain Date: Sun Nov 13 16:20:45 2022 +0000 [reland][dynamo] fixes dict changed during runtime error (#88877) Reland https://github.com/pytorch/pytorch/pull/87526 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88877 Approved by: https://github.com/ezyang commit 4284862db6e7c14494f27ef681036d909a5e8b67 Author: Andrew Gu Date: Sat Nov 12 19:26:28 2022 +0000 [Dynamo][FSDP] Migrate to `ModuleWrapPolicy` (#88453) Hello @wconstab! As you saw, `transformer_auto_wrap_policy()` is a misnomer and actually works for any module classes. The PR before this one tries to add a class `ModuleWrapPolicy` that takes in the `module_classes` in its constructor and works just like `transformer_auto_wrap_policy()` without requiring the `functools.partial()`. I hope you do not mind if we update the dynamo benchmarks util file with this migration. The PR before this one might require some back and forth within FSDP devs, so I apologize for any consequent updates to this PR, which in itself is an easy change. I will request review once we know the previous PR is good for land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88453 Approved by: https://github.com/wconstab commit bca75fd2d36de72c2682b47d62eab01f6f897b75 Author: Chen Lai Date: Sat Nov 12 21:41:31 2022 -0800 Move xnnpack taget to fb code base (#88909) 1. Move the source file list to the `build_variables.bzl`, as it's the source of truth for both internal buck build and oss build 2. Move target definitions to `fb` internal folder 3. Some changes are triggered from auto format. Differential Revision: [D40906961](https://our.internmc.facebook.com/intern/diff/D40906961/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40906961/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/88909 Approved by: https://github.com/mcr229 commit 2b12bfce8800cfcc54222e913955914994bb4daf Author: Animesh Jain Date: Sun Nov 13 09:53:38 2022 +0000 [dynamo] Skip frame when graph break in a loop (#88857) This fixes excessing recompilation issue in tacotron2 but has few caveats - https://github.com/pytorch/torchdynamo/issues/330 For tacotron2, the repro is something like this ~~~ def inner(x): return torch.sin(x) def fn(x): for _ in range(100): inner(x) torch._dynamo.graph_break() return x ~~~ The problem here is that Dynamo has guards on the TUPLE_ITERATOR_LEN whenever a graph break happens. Therefore, we keep on recompiling. This PR checks if there is a backedge (helps with while loop) in presence of a graph break. If there is, Dynamo skips processing this frame. Therefore, Dynamo gets called when inner is called, and we compile only once. Note that, if there was no graph break, we will unroll the original loop, and see one graph with 100 sin operations (just as before, so no changes there). The caveat is - We are skipping the frame, so if we have something like this ~~~ def fn(x): for _ in range(100): torch._dynamo.graph_break() return x ~~~ Dynamo will skip processing this frame, and might miss on the optimization. Completely open for suggestions. Happy to re-implement if there is a better way to handle this. 
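A runnable sketch of the first repro above (behavior as described in this commit message; the "eager" backend and the exact recompile counts are assumptions that may vary across versions):
```python
# Runnable sketch of the repro above; the described effect (fn's frame is
# skipped, inner is compiled once) is taken from the commit message.
import torch
import torch._dynamo as dynamo

def inner(x):
    return torch.sin(x)

def fn(x):
    for _ in range(100):
        inner(x)
        dynamo.graph_break()
    return x

opt_fn = dynamo.optimize("eager")(fn)
opt_fn(torch.randn(4))
# With this change, fn's frame (graph break inside a loop with a backedge) is
# skipped, and inner is compiled once instead of recompiling on every break.
```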
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88857 Approved by: https://github.com/jansel, https://github.com/yanboliang commit e950afc3958c9bae5d61cbc99bc088309141df6d Author: Animesh Jain Date: Sun Nov 13 08:19:45 2022 +0000 [reland][dynamo] Better support for nn.Module (#88959) Relanding https://github.com/pytorch/pytorch/pull/88629 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88959 Approved by: https://github.com/msaroufim commit 06ce1338bced2d2cb933a383157b335f65a35e71 Author: Michael Voznesensky Date: Sun Nov 13 04:50:21 2022 +0000 [dynamo] Port all pytorch/dynamo and test/dynamo pieces over from symbolic-shapes branch (#88768) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88768 Approved by: https://github.com/jansel, https://github.com/ezyang commit 4f2639e56ad5b26d2f5383dcc14e0f91c250d355 Author: Andrew Gu Date: Sat Nov 12 20:27:00 2022 +0000 [FSDP] Fix `FSDP.clip_grad_norm_()` for `NO_SHARD` (#88955) This PR fixes `FSDP.clip_grad_norm_()` for `NO_SHARD`, which previously "double-counted" each gradient `world_size`-many times. This does not address any discrepancies between `FULL_SHARD` and DDP. (Note that the unit tests do show parity between `FULL_SHARD` and DDP when using `FSDP.clip_grad_norm_()` and `nn.utils.clip_grad_norm_()` respectively on one iteration.) The added unit test code path tests mixing nested FSDP instances with both `FULL_SHARD` and `NO_SHARD` to ensure that the `local_sharded_norm` and `local_nonsharded_norm` computations are interoperating correctly. I want to test non-FSDP root instance in the future, but this is BC breaking since we need to make `clip_grad_norm_()` a static method, which would require a different method call syntax (`FSDP.clip_grad_norm_(root_module, ...)` vs. `root_module.clip_grad_norm_(...)`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/88955 Approved by: https://github.com/zhaojuanmao commit 46796fe5e9b74602d45927304773fdcda1c3215a Author: Edward Z. Yang Date: Sat Nov 12 06:19:02 2022 -0800 Fix XLA symbolic shapes binding (#88928) Obsoletes https://github.com/pytorch/pytorch/pull/88772 Mostly revolves around NOT assuming that the inside is a SymNode, but instead duck-typed to be a SymNode. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88928 Approved by: https://github.com/SherlockNoMad commit 2aca97cc9ae7081f00ebc7d58367c443cd4528cf Author: Aleksandar Samardžić Date: Sun Nov 13 00:31:11 2022 +0000 Vectorized CPU code implementing left shift operator. (#88607) This PR adds vectorized implementation for CPU version of left shift operator. All of the tests run by `pytest test/test_ops.py -vk left_shift` pass. Here are some additional details:
Benchmarking script (written by Philip, with small tweaks by Mario) comparing left shifts with multiplications - on par now
```python
import torch
from torch import Tensor
from torch.utils.benchmark import Timer, Compare
from itertools import product
from functools import partial

def _num_value_bits(dtype):
    if dtype == torch.uint8:
        return 8
    else:  # torch.int32
        return 31

def _max_value(dtype):
    if dtype == torch.uint8:
        return 255
    else:  # torch.int32
        return 2147483647

def bitshift(image, dtype):
    num_value_bits_input = _num_value_bits(image.dtype)
    num_value_bits_output = _num_value_bits(dtype)
    return image.to(dtype).bitwise_left_shift_(num_value_bits_output - num_value_bits_input)

def mul(image, dtype):
    input_max = float(_max_value(image.dtype))
    output_max = float(_max_value(dtype))
    factor = int((output_max + 1) // (input_max + 1))
    image = image.to(dtype)
    return image * factor

size = 256
image = torch.randint(0, 256, (3, size, size), dtype=torch.uint8)
dtype = torch.int32

def gen_inputs():
    devices = ("cpu",)
    fns = (mul, bitshift)
    threads = (1,)
    for device, fn, threads in product(devices, fns, threads):
        yield f"Bitshift {device} {image.dtype}", str(tuple(image.shape)), threads, fn, image, dtype

def benchmark(label, sub_label, threads, f, *args, **kwargs):
    return Timer("f(*args, **kwargs)",
                 globals=locals(),
                 label=label,
                 description=f.__name__,
                 sub_label=sub_label,
                 num_threads=threads).blocked_autorange()

results = []
for args in gen_inputs():
    results.append(benchmark(*args))

compare = Compare(results)
compare.trim_significant_figures()
compare.print()
```
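As a side note (not from the PR), the in-place `bitwise_left_shift_` used in the script above and the `<<` operator timed in the scripts below compute the same element-wise shift:
```python
# Side note (not from the PR): the << operator and torch.bitwise_left_shift
# compute the same element-wise left shift that this PR vectorizes on CPU.
import torch

x = torch.tensor([1, 2, 3], dtype=torch.int32)
y = torch.tensor([4, 4, 4], dtype=torch.int32)
print(x << y)                          # tensor([16, 32, 48], dtype=torch.int32)
print(torch.bitwise_left_shift(x, y))  # same result
```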
Test script exercising a large number of combinations of left shift operands that I've used for further testing (validates results through comparing with results generated by NumPy)
```python
import numpy as np
import torch

def _create_inputs(dtype):
    info = torch.iinfo(dtype)
    if dtype == torch.int8 or dtype == torch.int16:
        ntests = info.max + 1
        x = torch.arange(info.max + 1, dtype=dtype, device="cpu", requires_grad=False)
    else:
        ntests = 100000
        x = torch.randint(info.max + 1 if dtype != torch.int64 else info.max, (ntests,), dtype=dtype, device="cpu", requires_grad=False)
    y = torch.tensor(range(info.bits), dtype=dtype, device="cpu", requires_grad=False)
    xy = torch.cartesian_prod(x, y)
    return (xy[:, 0], xy[:, 1])

torch.manual_seed(0)
for dtype in (torch.int8, torch.int16, torch.int32, torch.int64):
    (x, y) = _create_inputs(dtype)
    z = x << y
    xnp = x.numpy()
    ynp = y.numpy()
    znp = z.numpy()
    assert((znp == (xnp << ynp)).all())
```
Benchmarking script running the left shift operator on tensors of different length (and varying number of bits to shift)
```python
import torch
import pickle
import itertools
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(0)

lengths = [1024, 4096, 16384, 65536]
rhss = [1, 2, 7, 8, 15, 16, 31, 32, 63, 64]
benchmark_name = "lshift"
label = ""
dtypes = [torch.int8, torch.int16, torch.int32, torch.int64]
results = []

def _make_args(dtype, length, rhs):
    info = torch.iinfo(dtype)
    imax = info.max
    return (torch.randint(info.max, (length,), dtype=dtype, device="cpu", requires_grad=False),
            rhs * torch.ones((length,), dtype=dtype, device="cpu", requires_grad=False))

for dtype, length, rhs in itertools.product(dtypes, lengths, rhss):
    x, y = _make_args(dtype, length, rhs)
    timer = Timer("x << y",
                  globals=globals(),
                  label=benchmark_name,
                  description=label,
                  sub_label=f"dtype={dtype},length={length}",
                  num_threads=1)
    results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open("{}.pickle".format(label), "wb") as f:
    pickle.dump(results, f)
```
Results of running above benchmarking script - results manually merged for runs of viable/strict (labeled "master" in the table below) and my branch (labeled "mybranch" in the table below)
```
[------------------- lshift -------------------------------]
                                      |  master  |  mybranch
1 threads: --------------------------------------------------
      dtype=torch.int8,length=1024    |     3    |      3
      dtype=torch.int8,length=4096    |     5    |      3
      dtype=torch.int8,length=16384   |    14    |      5
      dtype=torch.int8,length=65536   |    51    |     15
      dtype=torch.int16,length=1024   |     3    |      3
      dtype=torch.int16,length=4096   |     4    |      3
      dtype=torch.int16,length=16384  |    11    |      5
      dtype=torch.int16,length=65536  |    39    |     13
      dtype=torch.int32,length=1024   |     3    |      2
      dtype=torch.int32,length=4096   |     4    |      3
      dtype=torch.int32,length=16384  |    10    |      4
      dtype=torch.int32,length=65536  |    35    |     12
      dtype=torch.int64,length=1024   |     3    |      3
      dtype=torch.int64,length=4096   |     4    |      3
      dtype=torch.int64,length=16384  |    11    |      6
      dtype=torch.int64,length=65536  |    36    |     20

Times are in microseconds (us).
```
All of the testing/benchmarking was conducted on qpu3, which supports AVX2 only. For basic validation of the AVX-512 update of the left shift implementation for 8-bit operands (the only one that is non-trivial in the AVX-512 case), [Compiler Explorer](https://godbolt.org/) was used, with GCC trunk and the -mavx512f -mavx512bw flags added. Here are further details:
C program used for basic validation of AVX-512 vectorized version for 8-bit operands
```
// Required headers
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <immintrin.h>

static void print_m512i_int8(const __m512i* x) {
    int8_t val[64];
    memcpy(val, x, sizeof(val));
    for (int i = 0; i < 64; ++i) {
        if (i > 0) printf(", ");
        printf("%d", (int)val[i]);
    }
    printf("\n");
}

int main() {
    __m512i a = _mm512_set_epi8(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);
    __m512i b = _mm512_set_epi8(7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6,
                                5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
                                3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2,
                                1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0);

    // ------- Copied code from vec512_int.h
    // Mask used to set upper 8 bits of each 16-bit value to 0, and keep
    // lower 8 bits.
    __m512i mask = _mm512_set_epi16(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                                    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff);

    // Convert 8-bit operands from lower lanes to 16-bit values, and
    // perform vectorized shift. Make sure that upper 8 bits of 16-bit
    // results are all 0.
    __m256i a_lo_8 = _mm512_extracti64x4_epi64(a, 0);
    __m256i b_lo_8 = _mm512_extracti64x4_epi64(b, 0);
    __m512i a_lo_16 = _mm512_cvtepi8_epi16(a_lo_8);
    __m512i b_lo_16 = _mm512_cvtepi8_epi16(b_lo_8);
    __m512i c_lo_16 = _mm512_and_si512(_mm512_sllv_epi16(a_lo_16, b_lo_16), mask);

    // Convert 8-bit operands from upper lanes to 16-bit values, and
    // perform vectorized shift. Make sure that upper 8 bits of 16-bit
    // results are all 0.
    __m256i a_hi_8 = _mm512_extracti64x4_epi64(a, 1);
    __m256i b_hi_8 = _mm512_extracti64x4_epi64(b, 1);
    __m512i a_hi_16 = _mm512_cvtepi8_epi16(a_hi_8);
    __m512i b_hi_16 = _mm512_cvtepi8_epi16(b_hi_8);
    __m512i c_hi_16 = _mm512_and_si512(_mm512_sllv_epi16(a_hi_16, b_hi_16), mask);

    // Cast 16-bit results back into 8-bit values and merge them
    // together (using unsigned saturation with higher 8 bits set to 0
    // above ensures that results are correct). Values are merged per
    // lanes, so this is not yet the final result.
    __m512i c_perm = _mm512_packus_epi16(c_lo_16, c_hi_16);

    // Permute values so that final result is produced.
    __m512i idx = _mm512_set_epi64(7, 5, 3, 1, 6, 4, 2, 0);
    __m512i c = _mm512_permutexvar_epi64(idx, c_perm);
    // ------- End copied

    print_m512i_int8(&c);
    // Expected output: 1(x8), 2(x8), 4(x8), 8(x8), 16(x8), 32(x8), 64(x8), 128(x8), -128(x8)

    return 0;
}
```
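A quick Python-side cross-check of the same 8-bit pattern (hypothetical snippet, not part of the PR; the wrap to -128 matches the expected output noted in the C snippet above):
```python
# Not part of the PR: sanity check of the 8-bit left shift pattern from Python.
# Shifting int8 ones by 0..7 gives 1, 2, 4, ..., 64 and wraps to -128 for the
# shift by 7, matching the expected output of the C validation program.
import torch

a = torch.ones(8, dtype=torch.int8)
b = torch.arange(8, dtype=torch.int8)
print(a << b)  # tensor([1, 2, 4, 8, 16, 32, 64, -128], dtype=torch.int8)
```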
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88607 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/peterbell10 commit df1df9d10a7a2f4d7b1327fa85d0bb5fb6e9b693 Author: Howard Huang Date: Fri Nov 11 11:44:00 2022 -0800 [16/N] Add _allgather_base custom op with CPU/CUDA implementation (#88889) Differential Revision: [D41227739](https://our.internmc.facebook.com/intern/diff/D41227739) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88889 Approved by: https://github.com/kwen2501 commit 3765621356c645ead1d712c5b7e4d57d6803cc81 Author: ydwu4 Date: Sat Nov 12 20:00:51 2022 +0000 torchdynamo support self.modules() for nn_module (#88695) This PR allows models to call self.modules() during dynamo tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88695 Approved by: https://github.com/voznesenskym commit 27dc03e09b6b1948e416a9fd78e6ca2b0a0bb1c7 Author: soulitzer Date: Fri Nov 11 11:51:22 2022 -0500 Turn internal assert when saved tensor is detached inplace into torch check (#88860) Fixes https://github.com/pytorch/pytorch/issues/88809 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88860 Approved by: https://github.com/albanD commit 4270bb37dacf7e3b2b784fa4ff4002ee6bf87e56 Author: Nikita Karetnikov Date: Sat Nov 12 00:41:57 2022 +0100 [primTorch] Improve `narrow` and `narrow_copy`: refs, tests, docs (#87045) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87045 Approved by: https://github.com/mruberry commit 6e5f736d86be09bd86a5da276ce2f5dcbe0bfc09 Author: Howard Huang Date: Fri Nov 11 08:21:48 2022 -0800 [15/N] Add allreduce_coalesced custom op with CPU/CUDA implementations (#88846) Differential Revision: [D41227740](https://our.internmc.facebook.com/intern/diff/D41227740) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88846 Approved by: https://github.com/kwen2501 commit ae2c668cc044d841853e2672d96bfe0afb38a89c Author: PyTorch MergeBot Date: Sat Nov 12 07:52:53 2022 +0000 Revert "[dynamo][api] Better support of torch.nn.Module (#88629)" This reverts commit c83348597b195f2da1cca0e8318c878b104bce5d. Reverted https://github.com/pytorch/pytorch/pull/88629 on behalf of https://github.com/anijain2305 due to job failing on master https://github.com/pytorch/pytorch/actions/runs/3449914495/jobs/5758267231 commit 6b775c42dd2d40992611fb5636e787560663902c Author: Jerry Zhang Date: Sat Nov 12 07:52:44 2022 +0000 [quant][executorch] Support quant fusion for reshape in quant in executorch stack (#88858) Summary: This diff added support for fusing "dq - reshape - q" to a reshape op, the op is needed in wakeword model Test Plan: buck test executorch/exir/tests:quant_fusion_pass Reviewed By: qihqi, JacobSzwejbka Differential Revision: D41111069 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88858 Approved by: https://github.com/JacobSzwejbka commit 34641c4384328ad9a3d2dc928de5b60a239427ee Author: PyTorch MergeBot Date: Sat Nov 12 05:16:41 2022 +0000 Revert "Add comprehensive minifier tests (#88022)" This reverts commit 5ff600aa6e40c6b4d426594bbb1f446f005b7fb3. Reverted https://github.com/pytorch/pytorch/pull/88022 on behalf of https://github.com/wconstab due to Seems to be causing CI failures relating to minifier test and some /tmp/ path not existing commit c83348597b195f2da1cca0e8318c878b104bce5d Author: Animesh Jain Date: Sat Nov 12 04:45:17 2022 +0000 [dynamo][api] Better support of torch.nn.Module (#88629) This is an API change, so please review carefully. 
With this PR, torchdynamo returns an `OptimizedModule` class object, a subclass of `torch.nn.Module`, when asked to optimize a `nn.Module` object. Most of the methods are redirected to the original `nn.Module`, which is installed as `_mod` in the `OptimizedModule`. This is helpful for many cases:
```
mod = MockModule()
opt_mod = torch._dynamo.optimize()(mod)
print(opt_mod)  # Works

opt_mod = opt_mod.to(device="cuda")
print(opt_mod)  # Works
opt_mod(input)  # Triggers recompile if necessary; earlier we were shedding the TorchDynamo wrapper
opt_mod.parameters()  # Refers to the original module
```
Topics unclear to me:
* I have overridden many methods to raise NotImplementedError. A careful review of those will be good.
* hooks
* For the optimized forward, should we call torchdynamo optimization on `__call__` or `forward`?
* What else to test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88629 Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/msaroufim

commit d01bf1d1f11ab1fb9ae21a007138e2c4ecc31b63 Author: Andrew Gu Date: Sat Nov 12 01:05:46 2022 +0000

[FSDP] Introduce `ModuleWrapPolicy` for simplicity (#88450)

**BC Breaking Change**

This renames `unwrapped_params` to `nonwrapped_numel`. I prefer `nonwrapped` over `unwrapped` because "unwrap" suggests that some wrapping has been undone. I prefer `numel` over `params` because that is the unit of measurement; I think we should keep "params" to refer to `nn.Parameter`s themselves. This only breaks anything that passes `unwrapped_params` as a keyword argument, but I did not see anything that did that (except the one internal benchmark file, but that does not actually depend on our `pytorch` code).

In a follow-up, I want to rename `min_num_params` to `min_nonwrapped_numel` in `size_based_auto_wrap_policy`, which is also BC breaking. Again, this is to differentiate between "params" being `nn.Parameter`s and "numel" being the unit for `param.numel()`.

**Overview**

This PR introduces `ModuleWrapPolicy` as a lightweight layer over the existing `transformer_auto_wrap_policy`. The most common auto wrapping paradigm is:
```
module_classes: Set[Type[nn.Module]] = ...
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls=module_classes,
)
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy, ...)
```
Now, users can instead write:
```
auto_wrap_policy = ModuleWrapPolicy(module_classes)
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy, ...)
```
This hides the unused arguments expected from the callable (`recurse` and `unwrapped_params`/`nonwrapped_numel`).

`ModuleWrapPolicy` inherits from an abstract base class `FSDPPolicy` that expects a `policy` property. This decouples the construction of such `FSDPPolicy` classes from their actual `policy`, which must abide by the `_recursive_wrap` interface. Any existing auto wrap policy can be rewritten as a class that inherits from `FSDPPolicy`, so this approach is fully backward compatible from a functionality perspective.

I call this base class `FSDPPolicy` to generalize over the cases where we may not want to actually perform any nested wrapping. In reality, the policy is meant for constructing `FlatParameter`s, which just happened to be induced by a nested wrapping before. Given this, I am changing the constructor argument in `fully_shard()` to simply `policy` instead of `auto_wrap_policy`.

This PR migrates usages of `transformer_auto_wrap_policy` within our unit test suite to `ModuleWrapPolicy` as much as possible.
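As a rough illustration of the layering described above (a sketch under assumed names, not the PR's actual implementation), such a policy class can simply wrap the existing callable and expose it as a property:
```python
import functools
from abc import ABC, abstractmethod
from typing import Iterable, Type

import torch.nn as nn
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class FSDPPolicySketch(ABC):
    """Sketch of an abstract base exposing a `policy` callable that follows
    the _recursive_wrap-style interface."""

    @property
    @abstractmethod
    def policy(self):
        ...


class ModuleWrapPolicySketch(FSDPPolicySketch):
    """Sketch: wrap every module whose type is in `module_classes`."""

    def __init__(self, module_classes: Iterable[Type[nn.Module]]):
        self._policy = functools.partial(
            transformer_auto_wrap_policy,
            transformer_layer_cls=set(module_classes),
        )

    @property
    def policy(self):
        return self._policy
```
Constructing the policy object up front hides `recurse` and the numel argument from user code, which is the simplification the PR is after.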
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88450 Approved by: https://github.com/zhaojuanmao commit b2b0a0d3baf6258fbf728572719937810fd890ce Author: PyTorch MergeBot Date: Sat Nov 12 03:21:06 2022 +0000 [vision hash update] update the pinned vision hash (#88920) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88920 Approved by: https://github.com/pytorchbot commit ae4074669ecbf2a6d8bf99db745d29dce98d0c10 Author: Chien-Chin Huang Date: Thu Nov 10 21:19:22 2022 +0000 [FSDP][state_dict][6/N] Remove most FSDP module dependency from _optim_utils (#88638) **What** This PR removes most `FullyShardedDataParallel` dependencies from `optim_utils`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88638 Approved by: https://github.com/awgu commit 4108367123c1b44289b5c731c3bb7022976b816d Author: Bin Bao Date: Fri Nov 11 20:41:36 2022 +0000 Exclude poolformer_m36 from the inductor model test (#88908) Summary: The root cause is still to be investigated. Issue tracked at https://github.com/pytorch/torchdynamo/issues/1856 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88908 Approved by: https://github.com/malfet commit 1e2327baf7a2d9c63bef08e5f996ef983e199429 Author: mikey dagitses Date: Sat Nov 12 02:23:48 2022 +0000 fix fx tests (#88886) Summary: Some source files are missing and TPX couldn't handle the default test names. Test Plan: Rely on CI. Differential Revision: D41218564 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88886 Approved by: https://github.com/zou3519 commit 66736ff425d7163df0eed48e9944c8539e92b577 Author: Edward Z. Yang Date: Fri Nov 11 09:33:41 2022 -0500 Fix bug in OptionalTensorList (#88887) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88887 Approved by: https://github.com/anjali411 commit 2b166532f7ac280232daf6c44620e96e258867cf Author: Edward Z. Yang Date: Fri Nov 11 09:00:55 2022 -0500 Remove incorrect assert about hermetic state. (#88885) I'm not sure why I thought this assert was valid in the first place, and there's no comment about it. The assert is tantamount to saying, "no tensor objects should become dead via SafePyObject when hermetic mode is on." But suppose we run a Python GC while we're inside hermetic mode. This could result in us disposing non-hermetic tensors, which would hit decref. So the assert seems invalid. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88885 Approved by: https://github.com/anjali411, https://github.com/malfet commit 2cd05a2818bacbc2e252052b6b71085e4de16b0d Author: Jiaxu Zhu Date: Sat Nov 12 01:20:52 2022 +0000 Support torch.qint32 in Convert (#88871) Enable the `torch.qint32` when creating `quantize_per_tensor` function call in `convert_fx` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88871 Approved by: https://github.com/jerryzh168 commit a3f3ec8fac98151f31373ba3bcfe2d601584a840 Author: Will Constable Date: Fri Nov 11 21:22:49 2022 +0000 [FSDP+dynamo]: forward treats parameter-views as params (#88781) Dynamo+AotAutograd needs a way to wrap all tensors (whether inputs or params/buffers) in FakeTensor wrappers, and FSDP's mangling of parameters hides them from this wrapping. 
This PR unblocks running hf_Bert and hf_T5 with FSDP under dynamo, whether using recursive wrapping around transformer layers or only applying FSDP around the whole model. Perf/memory validation and possibly optimization is the next step.

`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager --fsdp_wrap`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager --fsdp_wrap`

The problem: Dynamo (actually aot_autograd) trips up with FSDP because it must wrap all input tensors in FakeTensor wrappers, and it only knows to wrap graph inputs or named_(parameters, buffers). FSDP's pre_forward hook sets views (which are not nn.Parameters) into the flatparam as attrs on the module with the same name as the original param, but they will not show up in named_parameters.
- in use_orig_params mode, FSDP still de-registers params during the pre-forward hook, then re-registers them post-forward
- during forward (between the hooks), the params are setattr'd on the module as regular view tensors, not nn.Parameters
- note: use_orig_params is the recommended way to use FSDP, and use_orig_params=False is being deprecated, so I only consider use_orig_params=True for this enablement

The solution:
- adding them to named_buffers is not possible because it interferes with how FSDP's `_apply` works
- since they are not actual nn.Parameters, register_parameter will complain about registering them
- simply setting `module._parameters[name] = view` seems to be a viable workaround, despite being hacky, and FSDP code does modify _parameters directly already.

Note: Manual checkpointing still isn't working with FSDP+dynamo, so that will have to be addressed in a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88781 Approved by: https://github.com/ezyang, https://github.com/awgu

commit 5ff600aa6e40c6b4d426594bbb1f446f005b7fb3 Author: William Wen Date: Sat Nov 12 00:22:25 2022 +0000

Add comprehensive minifier tests (#88022)

Adds tests for https://github.com/pytorch/torchdynamo/issues/1241. To run: `pytest test/dynamo/test_minifier.py`. Actually runs the minifier launcher script and repro scripts, rather than just checking for the existence of the minifier launcher script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88022 Approved by: https://github.com/mlazos, https://github.com/anijain2305

commit 37c5b42fa6597ebf7dbfb6db4ada2c7803950555 Author: Horace He Date: Fri Nov 11 19:17:47 2022 +0000

Fix matmul decomp to use reshape instead of contiguous().view() (#88832)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832 Approved by: https://github.com/bertmaher, https://github.com/ngimel

commit 7c3adddd6c3fe1bda4a9e5bfb9f992a802329551 Author: Richard Zou Date: Wed Nov 9 12:20:16 2022 -0800

[functorch] delete some unused files (#88763)

Some post-merge cleanup.
- packaging/ was for building standalone windows binaries
- our flake8 config got superseded by PyTorch's.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88763 Approved by: https://github.com/samdow

commit a7fa423f48af8af220e9286a6b4c374d533f77e0 Author: Peter Bell Date: Fri Nov 11 14:41:35 2022 +0000

copy_: Short-circuit when self and src view the same data (#88884)

This comes up if you use inplace operators on a slice, e.g.
```python import torch a = torch.rand(1000000, device="cuda") a[::2] *= 2 ``` The last line looks as if it should be fully inplace, but is actually equivalent to: ```python tmp = a[::2] tmp *= 2 a[::2] = tmp ``` Which results in `mul_` and `copy_` being called. With this PR, the redundant copy becomes a no-op and the above example is 2x faster. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88884 Approved by: https://github.com/ngimel commit 6fe47b682fe1ba2dd2c7da02ff1bb06f8670e3a7 Author: Yanbo Liang Date: Fri Nov 11 22:31:32 2022 +0000 [Dynamo] Fix str(Guard.obj_weakref) bug to re-ennable support overriding __getattr__ (#88564) See my inline comments! Pull Request resolved: https://github.com/pytorch/pytorch/pull/88564 Approved by: https://github.com/ezyang, https://github.com/anijain2305 commit be8d88f8d0c6825b1b19354ffbaa4466aae0d3b8 Author: Kevin Tse Date: Thu Nov 10 18:33:09 2022 -0500 [DataLoader] Removing DataLoader2 related code (#88848) Removing these lines of code as `DataLoader2` has been added to [TorchData](https://github.com/pytorch/data). I'm importing this to confirm it will not impact internal codes. Differential Revision: [D41201578](https://our.internmc.facebook.com/intern/diff/D41201578) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88848 Approved by: https://github.com/ejguan commit f39cad50b765b6fd2f4927a4d1552fff5928c61e Author: Nikita Shulga Date: Fri Nov 11 22:07:34 2022 +0000 Make InductorCPU usable in internally (#88870) Test Plan: `buck2 test mode/opt //caffe2/test:test_inductor -- --exact 'caffe2/test:test_inductor - test_dtype_mismatch_issue_cuda (caffe2.test.inductor.test_torchinductor.CudaTests)'` Differential Revision: D41206109 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88870 Approved by: https://github.com/izaitsevfb commit fbc1878265374a159639993269d40a6e08503278 Author: BowenBao Date: Tue Nov 8 10:22:32 2022 -0800 [ONNX] Pretty print diagnostic logging (#88261) Adds pretty print diagnostic logging. For example ```python import io import torch from torch.onnx._internal import diagnostics class CustomAdd(torch.autograd.Function): @staticmethod def forward(ctx, x, y): return x + y @staticmethod def symbolic(g, x, y): return g.op("custom::CustomAdd", x, y) class M(torch.nn.Module): def forward(self, x): return CustomAdd.apply(x, x) torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) ``` By default, observe minimum summary of diagnostics ``` ========= Diagnostic Run torch.onnx.export version 1.14.0a0+git90a69c5 ========= verbose: False, log level: Level.ERROR ======================= 0 NONE 0 NOTE 3 WARNING 0 ERROR ======================== 3 WARNING were not printed due to the log level. ``` Adjusting the `verbose` and `level` argument. ```python diagnostics.engine.pretty_print(verbose=True, level=diagnostics.levels.WARNING) ``` Prints full log. ``` =============================== 1 Diagnostic Run =============================== ========= Diagnostic Run torch.onnx.export version 1.14.0a0+git90a69c5 ========= verbose: True, log level: Level.WARNING ======================= 0 NONE 0 NOTE 3 WARNING 0 ERROR ======================== WARNING: node-missing-onnx-shape-inference ========================================== The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. 
--------------------------- Stack: Python call stack --------------------------- frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151 frame: n, utils._params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/_patch_torch.py:82 frame: <@beartype(torch.onnx._patch_torch._graph_op) at 0x7f62184b6710>:78 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: return function(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_deprecation.py:30 frame: return g.op("custom::CustomAdd", x, y) test_pretty_print.py:14 frame: return symbolic_fn(g, *args) /home/bowbao/pytorch_dev/torch/onnx/utils.py:1716 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: graph = _C._jit_pass_onnx(graph, operator_export_type) /home/bowbao/pytorch_dev/torch/onnx/utils.py:663 frame: <@beartype(torch.onnx.utils._optimize_graph) at 0x7f62180e05f0>:85 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: module=module, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1123 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519 frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22 ---------------------------- Stack: C++ call stack ----------------------------- frame: () frame: ( + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map, std::allocator >, c10::IValue, std::less, std::allocator > >, std::allocator, std::allocator > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0xabd026 (0x7f625b599026 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: () WARNING: node-missing-onnx-shape-inference ========================================== The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. 
--------------------------- Stack: Python call stack --------------------------- frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151 frame: graph, params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/utils.py:688 frame: <@beartype(torch.onnx.utils._optimize_graph) at 0x7f62180e05f0>:85 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: module=module, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1123 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519 frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22 ---------------------------- Stack: C++ call stack ----------------------------- frame: () frame: ( + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map, std::allocator >, c10::IValue, std::less, std::allocator > >, std::allocator, std::allocator > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0x87d6d1 (0x7f625b3596d1 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::ONNXShapeTypeInference(std::shared_ptr&, std::map, std::allocator >, c10::IValue, std::less, std::allocator > >, std::allocator, std::allocator > const, c10::IValue> > > const&, int) + 0x33 (0x7f625b359cf3 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0xabdbae (0x7f625b599bae in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: () WARNING: node-missing-onnx-shape-inference ========================================== The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. 
--------------------------- Stack: Python call stack --------------------------- frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151 frame: graph, params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/utils.py:1179 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519 frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347 frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81 frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22 ---------------------------- Stack: C++ call stack ----------------------------- frame: () frame: ( + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map, std::allocator >, c10::IValue, std::less, std::allocator > >, std::allocator, std::allocator > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0x87d6d1 (0x7f625b3596d1 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: (torch::jit::ONNXShapeTypeInference(std::shared_ptr&, std::map, std::allocator >, c10::IValue, std::less, std::allocator > >, std::allocator, std::allocator > const, c10::IValue> > > const&, int) + 0x33 (0x7f625b359cf3 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0xabdbae (0x7f625b599bae in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: ( + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so)) frame: () ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88261 Approved by: https://github.com/abock, https://github.com/justinchuby commit ea0ec9d71ca5428bedfcaf74990c109af8cb9a64 Author: efiks <5167930+efiks@users.noreply.github.com> Date: Fri Nov 11 21:58:23 2022 +0000 [tourch] BatchBoxCox - fix numerical issue in vectorized code (#88875) Summary: Usage of fast math in BatchBoxCox kernel provided different math results between dev and optimized versions which cause few internal test to fail. 
For now disabling the compiler optimized version and relying on ATEN vectors.

Differential Revision: D41211784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88875 Approved by: https://github.com/hyuen

commit dfb4b73e45896851d734e34a9902fd8b151797fe Author: Richard Barnes Date: Fri Nov 11 21:51:10 2022 +0000

Fix unused variable 'options' warning in RNN.cpp (#88753)

Fixes
```
/home/rbarnes/pytorch/aten/src/ATen/native/cudnn/RNN.cpp:73:17: warning: unused variable 'options' [-Wunused-variable]
  TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory);
                ^
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88753 Approved by: https://github.com/soumith

commit 7aa144ac54808419f7a702ef0c5a4445dba4c587 Author: Chien-Chin Huang Date: Thu Nov 10 21:19:21 2022 +0000

[FSDP][state_dict][5/N] Remove the FSDP module dependency from _state_dict_utils (#88637)

**What**
This PR completely removes the `FullyShardedDataParallel` dependency from `_state_dict_utils` -- `_state_dict_utils` now depends only on `_FSDPState` and all the utils modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88637 Approved by: https://github.com/awgu

commit 575e02df5357ef6216b2d2db2424d10432679df2 Author: Nikita Shulga Date: Fri Nov 11 21:19:26 2022 +0000

Fix CUDNN_PATH handling on Windows (#88898)

Fixes https://github.com/pytorch/pytorch/issues/88873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88898 Approved by: https://github.com/kit1980

commit f74946324e794d2332251d0497dc8ff4f831caa9 Author: kshitij12345 Date: Fri Nov 11 21:11:12 2022 +0000

[fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)

Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark (Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |    106.9
      (4, 4)        |    114.4   |    109.2
      (128, 128)    |    135.2   |    128.3
      (1024, 1024)  |   1431.9   |   1431.3

Times are in microseconds (us).
```
Benchmark Script
```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))
sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn': save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)
```
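For context on what the change enables, a minimal usage sketch (assuming a plain Python attribute set on a tensor round-trips through `torch.save`/`torch.load`; the attribute name here is illustrative):
```python
import io
import torch

t = torch.randn(4, 4)
t.my_note = "calibration batch 7"  # plain Python attribute on a Tensor

buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
loaded = torch.load(buf)

print(getattr(loaded, "my_note", None))  # expected: "calibration batch 7"
```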
NOTE : **BC-Breaking** : After this PR, all tensors (also regular tensors) will be serialised using `_rebuild_from_type_v2`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616 Approved by: https://github.com/albanD, https://github.com/kurtamohler commit ba4d5aae06bde7c0ad045e54b7ad86f4542efb86 Author: PyTorch MergeBot Date: Fri Nov 11 19:13:05 2022 +0000 Revert "rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218)" This reverts commit 7f28be10e5e71efda37800384fa897785499bed1. Reverted https://github.com/pytorch/pytorch/pull/88218 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41211901 commit 4e5d7afe84c01ed730f0f43395d7fa0542e81f3a Author: PyTorch MergeBot Date: Fri Nov 11 19:08:30 2022 +0000 Revert "add DisableTorchFunction that matches DisableTorchDispatch (#88219)" This reverts commit c0ecce15b5a54ff0185f9976e6bfb6f3a7de698d. Reverted https://github.com/pytorch/pytorch/pull/88219 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41211901 commit 9d7d21f5691979f728f42a709e1a47ab3e905342 Author: BowenBao Date: Tue Nov 8 10:22:31 2022 -0800 [ONNX] Add stack info to diagnostics (#87258) ~~Investigating strange bug releasing 'graph' right when returning from `_C._jit_pass_onnx`.~~ ~~Can be repro-ed locally via `test_cpp_diagnose`, with changes in this PR.~~ Resolved by https://github.com/pytorch/pytorch/pull/87829. This PR adds methods to record stack backtrace information to diagnostics. * #87830 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87258 Approved by: https://github.com/abock commit 3d1c5c89ed27ff16601aecf7834a6bd06f578c45 Author: Chien-Chin Huang Date: Thu Nov 10 21:19:21 2022 +0000 [FSDP][state_dict][4/N] Move the core logic of summon full parameters to _unshard_params_utils.py (#88636) **What** `_summon_full_parameters` is required for state_dict. To enable composable FSDP state_dict, `_summon_full_params` must be accessible without FullyShardedDataParall. This PR move the core logic of `_summon_full_params` to `_unshard_params_utils`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88636 Approved by: https://github.com/awgu commit 5f0783bd6d27a0a239263b943d626c533b8b9a90 Author: Thiago Crepaldi Date: Fri Nov 11 17:43:46 2022 +0000 Fix ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops (#88504) Follow-up for #87735 Once again, because BUILD_CAFFE2=0 is not tested for ONNX exporter, one scenario slipped through. A use case where the model can be exported without aten fallback when operator_export_type=ONNX_ATEN_FALLBACK and BUILD_CAFFE2=0 A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not executed on CI again Fixes #87313 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88504 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 8ff2e34ca6905404aba35a432acf667ee6a13c6e Author: Elias Ellison Date: Fri Nov 11 04:25:11 2022 +0000 Take input striding for conv forward based on eager output (#88706) From discussion with @Chillee and @ngimel we'll likely need further fixes to ensure that we hit channels last kernels but this is still worth landing in its own right. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88706 Approved by: https://github.com/ngimel commit adfbd831cf59111c3d3a4a50ba6372bba94b63d1 Author: PyTorch MergeBot Date: Fri Nov 11 17:03:25 2022 +0000 Revert "[Autograd] Use in-place input accumulation fast path for dense Tensors. 
(#88339)" This reverts commit 8f66ae413f8c9d7f2418d7f0b9f69d409c455b46. Reverted https://github.com/pytorch/pytorch/pull/88339 on behalf of https://github.com/mehtanirav due to Internal test failures commit 89a326ff7ea56a1d735d26800b07a10e35c2dff4 Author: Kurt Mohler Date: Fri Nov 11 16:57:05 2022 +0000 Explicitly check filelike arg of `torch.save` (#88867) Fixes #88793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88867 Approved by: https://github.com/ezyang commit a6832b08a3f6c1b425a075fe204a1f21361f33d9 Author: Elias Ellison Date: Tue Nov 8 19:23:21 2022 +0000 Regularize bernouilli_ with bernouilli decomp (#88349) Fix for https://github.com/pytorch/torchdynamo/issues/1796. Just like the other [bernouilli decomp](https://github.com/pytorch/pytorch/blob/master/torch/_inductor/decomposition.py#L302) we need to pass `dtype=float32` to avoid `"check_uniform_bounds" not implemented` errors. Are we planning on enabling `TEST_WITH_TORCHINDUCTOR` ? Do I need to change anything with the tests ? Pull Request resolved: https://github.com/pytorch/pytorch/pull/88349 Approved by: https://github.com/desertfire commit 1e8f95ace16cb617d71f8f8254c1d5bafd9f586c Author: Nikita Karetnikov Date: Fri Nov 11 13:51:18 2022 +0100 Symintify `broadcast_to` (#88776) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88776 Approved by: https://github.com/ezyang commit d615d1228932eaa5e026f5399e099f2036d2379b Author: anjali411 Date: Fri Nov 11 15:24:28 2022 +0000 Add meta impl for topk (#88694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694 Approved by: https://github.com/ezyang commit 3c7f96665e784a793d2d1a120ea8fe370b3f6d81 Author: Chien-Chin Huang Date: Thu Nov 10 19:54:56 2022 +0000 [FSDP][state_dict][3/N] Change how state_dict utils access attributes in _FSDPState (#88635) **What This PR Does** _state_dict_utils currently accesses the FSDP states through module. To enable composable FSDP state_dict, these accesses need to go through _FSDPState. module is still required for most APIs as state_dict has to access per-module information. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88635 Approved by: https://github.com/awgu commit b92acee8f83c7852194d6979362aea0c240709da Author: soulitzer Date: Thu Nov 10 19:08:42 2022 -0500 Add context manager to allow mutation on saved tensors (#79056) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79056 Approved by: https://github.com/albanD commit 91b71cdbe4f31006fad91f9dd460123677a7c625 Author: Bin Bao Date: Wed Nov 9 20:39:50 2022 +0000 [dynamo] Add torch.device to is_safe_constant (#88766) Test Plan: ``` PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py -k test_advancedindex_mixed_cpu_devices_cuda ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88766 Approved by: https://github.com/jansel commit 324ac93a43a93f671bb34b835926b22d13442735 Author: Chien-Chin Huang Date: Tue Nov 8 00:16:14 2022 +0000 [FSDP][state_dict][2/N] Move state_dict related enums/dataclasses/states to state_dict_utils.py, api.py and init_state_dict() (#88481) **Motivation**: Several Enums, Dataclasses and states defined in fully_sharded_data_paralle.py should be moved to a place where the composable FSDP can access. This PR does the move. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88481 Approved by: https://github.com/rohan-varma, https://github.com/awgu commit ee91c328da5739ce03b3127cd7c542ce505212b8 Author: Michael Gschwind Date: Fri Nov 11 12:19:31 2022 +0000 Fix cuda/cpu check on NoneType (#88854) Summary: Fix cuda/cpu check on NoneType Test Plan: sabdcastle/ github CI/CD Differential Revision: D41203955 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88854 Approved by: https://github.com/drisspg, https://github.com/ngimel commit d15a6b0c975b9e1e90ed4e951071e5269c10ac5b Author: kshitij12345 Date: Fri Nov 11 08:51:26 2022 +0000 Error on ZeroTensor serialization (#88803) Follow-up : https://github.com/pytorch/pytorch/pull/88182#issuecomment-1308628415 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88803 Approved by: https://github.com/anjali411 commit b843f4db0a26aae6536e6b971f73bcc5af21c90a Author: AllenTiTaiWang Date: Wed Nov 9 17:41:10 2022 +0000 [ONNX] Add test case for onnx::Max scalar type (#88751) Referenced by minimum cases Pull Request resolved: https://github.com/pytorch/pytorch/pull/88751 Approved by: https://github.com/wschin, https://github.com/BowenBao commit 396c3b1d88d7624938a2bb0b287f2a19f1e89bb4 Author: Eddie Yan Date: Fri Nov 11 05:23:48 2022 +0000 Use `atomicAdd` for `bfloat16` in Ampere and above (#84981) WIP to fix extremely slow `scatter_add` issue vs. fp16. The current changes seem to improve performance, but it still appears to lag behind the fp16 equivalent. CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/84981 Approved by: https://github.com/ngimel commit a6d72f44a4e8b6e9d2e878f30fd8b1d3e1197f0e Author: AllenTiTaiWang Date: Wed Nov 9 17:27:22 2022 +0000 [ONNX] Add onnx::Max into standard Op for scalar type alignment (#88750) Easy fix for onnx::Max ScalarType Pull Request resolved: https://github.com/pytorch/pytorch/pull/88750 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 0de8f047c1cc950c59b0448b9b78dafc0202c43f Author: PyTorch MergeBot Date: Fri Nov 11 04:19:08 2022 +0000 Revert "[dynamo] fixes dict changed during runtime error (#87526)" This reverts commit cf04b36ce8f531730210b03eaa347977a1c2d75c. Reverted https://github.com/pytorch/pytorch/pull/87526 on behalf of https://github.com/anijain2305 due to error reported commit 310335de48ab9d8bcd33b98f3f71ef88ae4bd45c Author: Jane Xu Date: Fri Nov 11 04:02:44 2022 +0000 Update lr_scheduler.pyi to match lr_scheduler.py (#88818) Following #88503, we should also update the pyi file Pull Request resolved: https://github.com/pytorch/pytorch/pull/88818 Approved by: https://github.com/soulitzer commit 86b7aa26f0bb8878d925a625af45d16d4bb2f2af Author: Wei-Sheng Chin Date: Fri Nov 11 03:49:27 2022 +0000 Fix FakeTensorProp on Module with Parameters or Buffers (#88700) In `FakeTensorMode.__torch_dispatch__`, the output is now always computed by meta kernels in ```python try: with in_kernel_invocation_manager(self): r = func(*args, **kwargs) # <----- "r" can be a real tensor. except NotImplementedError as not_implemented_error: if not self.allow_fallback_kernels: raise not_implemented_error return run_fallback_kernel(self, func, args, kwargs, not_implemented_error) return self.wrap_meta_outputs_with_default_device_logic(r, func, args, kwargs) ``` For example, I observed a CPU tensor is generated when executing `aten.addmm` when running `FakeTensorProp`. 
Therefore, I'd like to allow `FakeTensorMode` to wrap real tensor as `FakeTensor` during the computation. Does this PR look a good direction to fix this problem? If yes, I can go ahead and add some tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88700 Approved by: https://github.com/eellison, https://github.com/ezyang commit c4fc5d372f3db37380fe213b5726403cb1330d5d Author: Chien-Chin Huang Date: Mon Nov 7 23:46:29 2022 +0000 [FSDP][state_dict][1/N] Moving state_dict logic to pre_state_dict_hook (#87900) This is one step toward the ultimate goal: remove the overwritten state_dict in FSDP. All the logic should be either in `pre_state_dict_hook` or `post_state_dict_hook`. Since current `nn.Module` does not support `pre_state_dict_hook`, this PR mimic `pre_state_dict_hook` by calling the pre hook inside post the hook, effectively ditching all the work done by `nn.Module.state_dict`. Once `pre_state_dict_hook` is supported by `nn.Module`, these pre hook calls can be moved out from the post hooks and be registered to `nn.Module.pre_state_dict_hook`. The major issue of this temporary solution is that `post_state_dict_hook` is called from the leaf node to the root node. This makes the `module._lazy_init()` invalid as FSDP assumes `_lazy_init()` to be called from the root. As a result, `FSDP.state_dict` currently contains only one logic -- calling `module._lazy_init()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87900 Approved by: https://github.com/rohan-varma commit 9d09968bbe05fc6d7d7c3d8b1acfbe1b1b1413a8 Author: Emil Lynegaard Date: Fri Nov 11 03:34:54 2022 +0000 Disable check for dropout in MultiheadAttention fast_path (#88831) Since we already enforce eval mode for the fast_path, we do not need to also check for a falsy dropout value, as a model trained with dropout will have a non-zero dropout during eval mode, even though it won't be applied. Fixes #88806 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88831 Approved by: https://github.com/drisspg commit 3082378701605884ff07f7ba7984864340b19b34 Author: PyTorch MergeBot Date: Fri Nov 11 03:33:55 2022 +0000 [vision hash update] update the pinned vision hash (#88853) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88853 Approved by: https://github.com/pytorchbot commit 495e7b1c729e64693e794ea22640b4552816f0ef Author: Sherlock Huang Date: Thu Nov 10 21:22:29 2022 +0000 Ref for aten.full; symint changes in prim (#88762) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88762 Approved by: https://github.com/ezyang commit 3fbf748f2109de408bd47efb1a43e3897d7a775c Author: Michael Voznesensky Date: Fri Nov 11 02:30:29 2022 +0000 Assert we have triton before scheduling on triton (#88849) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88849 Approved by: https://github.com/wconstab, https://github.com/ngimel, https://github.com/jansel commit fc9e36dd426d4747bb7c71ee93bcbaa700bda01d Author: anjali411 Date: Thu Nov 10 22:41:47 2022 +0000 Add meta support for scalar_tensor and argmax (#88590) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88590 Approved by: https://github.com/albanD commit c961e45ee559a61bfb4f1e8a548e574ef89d3102 Author: Nikolay Korovaiko Date: Thu Nov 10 12:21:50 2022 -0800 handle zero dims in reductions (#88280) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88280 Approved by: https://github.com/ngimel commit 534ae6ae4790aec1b148b7e878ae60828ae45ac0 Author: Ryan Spring Date: Fri Nov 11 01:08:16 2022 +0000 [primTorch] Implement group norm reference (#87054) Add group norm reference Split from #81191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87054 Approved by: https://github.com/mruberry commit 072834d56dada58f99216ce398fb57cce57968a9 Author: HDCharles Date: Tue Nov 8 07:59:12 2022 -0800 [ao] qconfig_mapping.py fixing public v private (#87518) Summary: made _GLOBAL_DICT_KEY, _OBJECT_TYPE_DICT_KEY, _MODULE_NAME_REGEX_DICT_KEY, _MODULE_NAME_DICT_KEY, _MODULE_NAME_OBJECT_TYPE_ORDER_DICT_KEY private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709278](https://our.internmc.facebook.com/intern/diff/D40709278) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87518 Approved by: https://github.com/jcaip commit f9221bf53b376d1284e2356b716c2cd47fcd65f2 Author: Ian Graves Date: Fri Nov 11 00:19:20 2022 +0000 [pytorch] Enable memory map file support for Android, Apple, and CXX (#88545) Summary: See title. Left Windows out so it still compiles. Test Plan: Add a `#fail` below [this line](https://fburl.com/code/p0mlhlw4) and build for various platforms and confirm it fails which proves the `#ifdef` was hit. ``` buck2 build xplat/langtech/tuna/cli:tuclixAndroid buck2 build xplat/langtech/tuna/cli:tuclix ``` CI/CD for the rest. Differential Revision: D41054824 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88545 Approved by: https://github.com/qihqi commit 8441443132106fd673a81cd8f6728b332d16f837 Author: PyTorch MergeBot Date: Thu Nov 10 23:56:49 2022 +0000 Revert "Add nondeterministic error for `scatter` (#88244)" This reverts commit e940a2f8e2a3aa9d98291e73b3d40fcffb6182c8. 
Reverted https://github.com/pytorch/pytorch/pull/88244 on behalf of https://github.com/mehtanirav due to Internal test failures commit 62ef15e320f4a0aaa2f39296e9299f56926fb7c9 Author: Nikita Shulga Date: Thu Nov 10 23:52:27 2022 +0000 [MPS] Fix `test_embedding_dense_backward` (#88847) By copying randomly initialized weights distribution from MPS `nn.Embedding` to `cpu` Test plan: `python test_mps.py -k test_embedding_dense_backward --repeat 150` Fixes https://github.com/pytorch/pytorch/issues/88679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88847 Approved by: https://github.com/seemethere commit b30222e0c481f29fe0785dde518c590ac392e9a2 Author: Yanbo Liang Date: Thu Nov 10 23:47:21 2022 +0000 [Dynamo] Add complete support for Tensor.is_contiguous (#88407) Fixes https://github.com/pytorch/torchdynamo/issues/1783 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88407 Approved by: https://github.com/jansel commit ae01615d7558d02383efe673ec0b92e2abe40db5 Author: Dmytro Dzhulgakov Date: Thu Nov 10 23:44:49 2022 +0000 Fix cupti search path in CMake (#88657) Minor fix for when cuda is installed via conda. In this case the libraries are in `lib` and not `lib64`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88657 Approved by: https://github.com/kit1980, https://github.com/malfet commit d9ad08ce8a07a3d17df397051b32591f4446edfa Author: Sherlock Huang Date: Thu Nov 10 20:35:52 2022 +0000 Symbolic shape: sym_floor , sym_sqrt, sym_int (#88760) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88760 Approved by: https://github.com/ezyang commit cc04cf50bfb6110e4c1c5889ad7da626dafac384 Author: Yanbo Liang Date: Thu Nov 10 23:37:29 2022 +0000 [Inductor] Fix lowmem_dropout() missing 1 required positional argument: 'p' (#88716) Fixes error from 7k github models: https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_GuYuc_WS_DAN_PyTorch.py Error: ``` TypeError: lowmem_dropout() missing 1 required positional argument: 'p' While executing %lowmem_dropout : [#users=1] = call_function[target=torch._inductor.overrides.lowmem_dropout](args = (%avg_pool2d_9,), kwargs = {training: False}) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88716 Approved by: https://github.com/ngimel, https://github.com/jansel, https://github.com/desertfire commit 500fd65531e77deb7784d3ac4f78c5cbe21efe41 Author: BowenBao Date: Tue Nov 8 10:22:31 2022 -0800 [ONNX] Create common ExportTestCase base class (#88145) Refactor out a common base class `ExportTestCase`, for common things in `setUp`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88145 Approved by: https://github.com/justinchuby, https://github.com/abock, https://github.com/AllenTiTaiWang commit 20ae19aa1dd307f9bdde0754c327ffb69eef13c0 Author: BowenBao Date: Tue Nov 8 10:22:31 2022 -0800 [ONNX] Improve diagnostic message formatting (#87830) * Reflect required arguments in method signature for each diagnostic rule. Previous design accepts arbitrary sized tuple which is hard to use and prone to error. ![image](https://user-images.githubusercontent.com/9376104/200381982-d1e905f0-a159-4ef5-8d2e-070524e8f5bf.png) * Removed `DiagnosticTool` to keep things compact. * Removed specifying supported rule set for tool(context) and checking if rule of reported diagnostic falls inside the set, to keep things compact. * Initial overview markdown file. * Change `full_description` definition. Now `text` field should not be empty. 
And its markdown should be stored in `markdown` field. * Change `message_default_template` to allow only named fields (excluding numeric fields). `field_name` provides clarity on what argument is expected. * Added `diagnose` api to `torch.onnx._internal.diagnostics`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87830 Approved by: https://github.com/abock commit a6610faa93ac008c088bcbe26bdbb56de8275cf1 Author: HDCharles Date: Tue Nov 8 07:59:11 2022 -0800 [ao] qconfig_mapping_utils.py fixing public v private (#87517) Summary: made _get_object_type_qconfig, _get_module_name_regex_qconfig, _get_module_name_qconfig, _maybe_adjust_qconfig_for_module_type_or_name, _get_flattened_qconfig_dict _update_qconfig_for_qat private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709279](https://our.internmc.facebook.com/intern/diff/D40709279) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87517 Approved by: https://github.com/jcaip commit c1553880de95845c5a194247c683872949d66cd6 Author: Michael Lazos Date: Thu Nov 10 21:38:04 2022 +0000 Have kernel names include fused ops (#88624) - Propagates origin fx nodes through inlining during lowering - Concatenates op names into kernel name - Adds config to cap the number of ops in the kernel name so they don't get too long Caveats: - The ordering in the name may not match the order that the ops are executed in the kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/88624 Approved by: https://github.com/anijain2305, https://github.com/jansel commit ad2eba802c04394875af0f00b985f7f338423f1e Author: HDCharles Date: Tue Nov 8 07:59:11 2022 -0800 [ao] fuser_method_mappings.py fixing public v private (#87516) Summary: made _get_valid_patterns, _DEFAULT_PATTERN_TO_FUSER_METHOD, _reverse3, _reverse2, _reverse_sequential_wrapper2, _DEFAULT_OP_LIST_TO_FUSER_METHOD, _sequential_wrapper2 private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709281](https://our.internmc.facebook.com/intern/diff/D40709281) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87516 Approved by: https://github.com/jcaip commit 37b468ac777ba548a2808010fd2f1b146b779fe0 Author: maxren Date: Wed Nov 9 15:33:57 2022 -0800 [xnnpack][lite-int][on-device] rebuild serialized modules at runtime (#88780) This is the on-device runtime work. We modify the compile and execute from our hacky solution from before to what will actually be running at runtime. First we rebuild our graph from the serialized flatbuffer string. We also introduce a runtime wrapper that inherits CustomClassHolder that allows us to forward along the built xnngraph runtime to our execute function Once the subgraph object has been rebuilt by our we pass it along to the runtime wrapper for us to forward along to execute At execute we prep the input/outputs and invoke the runtime using our runtime wrapper. Finally we forward those results to our execution Differential Revision: [D39413031](https://our.internmc.facebook.com/intern/diff/D39413031/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39413031/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88780 Approved by: https://github.com/digantdesai commit de38c8769835ab0efa055baaf7605be37e410417 Author: Catherine Lee Date: Thu Nov 10 21:32:41 2022 +0000 Use run_test in MPS (#88829) Run mps through run_test to get disable test infra, create xml files (which can then be used for flakiness detection), and reruns Also added the workflow steps for uploading the xml files Pull Request resolved: https://github.com/pytorch/pytorch/pull/88829 Approved by: https://github.com/malfet, https://github.com/huydhn commit 1ae772a663f772171f0c5d6d7d311792f331206a Author: Bert Maher Date: Thu Nov 10 06:56:26 2022 -0800 [inductor] Remove import check for fast_flush (#88812) https://github.com/pytorch/pytorch/pull/88557/ has a guard to make sure that triton's `do_bench` includes the `fast_flush` argument. Since we've updated Triton to a sufficiently recent revision, we can remove that guard. Differential Revision: [D41185280](https://our.internmc.facebook.com/intern/diff/D41185280/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88812 Approved by: https://github.com/soumith commit 3a4e8736ad66db2089cbcb3a24cf779aab3a7564 Author: maxren Date: Wed Nov 9 15:33:00 2022 -0800 [xnnpack][on-device] compiler --> executor object (#88779) This is purely to abstract away the subgraph rebuild from the flatbuffer object. CompileModel return an executor object which we can use to setup inputs and run forward with. We Include ATen/utils for torch_check, this will be changed when moving to executorch Differential Revision: [D40733163](https://our.internmc.facebook.com/intern/diff/D40733163/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88779 Approved by: https://github.com/digantdesai commit 394b998de2228a4b4730c52b50975a2ecf756049 Author: Mark Saroufim Date: Thu Nov 10 21:04:35 2022 +0000 sub setup.py install -> develop (#88507) If someone is building the project from source they're likely a contributor for which develop will be much more useful. For people that want to try the latest and greatest they can leverage the nightlies Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/88507 Approved by: https://github.com/malfet commit d5e1e2f0fcd4e0602295bfaf80b8aeb80c86a70d Author: maxren Date: Wed Nov 9 15:31:44 2022 -0800 [xnnpack][on-device] executor class (#88778) Executor object used to wrap our xnn_runtime object. The ideal flow of this object looks as such: ``` executor.set_inputs(vector inputs, vector outputs) executor.forward() ``` This will likely be returned by our delegate compile and given over to execute in order to run inference using the xnn runtime ``` ``` These Aten functions are included in order to use at::Tensor when setting the inputs, this will change when used for Executorch because we will be switching from at::Tensor to whatever tensor abstraction is used for ET. Seems like they have the same call for `.data_ptr()`, so realistically all logic here will be the same. ATen/Utils is used for TORCH_CHECK. We will switch to ET_CHECK_MESSAGE for executorch. 
Differential Revision: [D40733121](https://our.internmc.facebook.com/intern/diff/D40733121/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88778 Approved by: https://github.com/digantdesai commit 29550e2c1df4cf3ef949e8f1ef973fd5e103a2d3 Author: PyTorch MergeBot Date: Thu Nov 10 20:56:30 2022 +0000 Revert "[Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#88566)" This reverts commit 48b58930cbfa725ac25a9303d496c76bf983574d. Reverted https://github.com/pytorch/pytorch/pull/88566 on behalf of https://github.com/huydhn due to This change breaks trunk https://hud.pytorch.org/pytorch/pytorch/commit/48b58930cbfa725ac25a9303d496c76bf983574d commit 90cf14ddf691bfae2d5c793376c68921b7111fde Author: erjia Date: Thu Nov 10 19:54:19 2022 +0000 [DataPipe] Deprecating drop_empty_batches from Filter and other functional APIs (#88693) - Deprecating based on https://github.com/pytorch/data/issues/163 Corresponding PRs from TorchData: https://github.com/pytorch/data/pull/890 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88693 Approved by: https://github.com/NivekT commit 98ecd06580b667441a45bfe7a67bc95ddb8a9353 Author: Felix Divo <4403130+felixdivo@users.noreply.github.com> Date: Thu Nov 10 19:29:29 2022 +0000 Bring Unfold/Fold param doc order in line with code (#88819) Now the first parameter (if used as a positional argument) is the first that is listed in the docs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88819 Approved by: https://github.com/ngimel commit 1d54ce9d5d4e44416a55ad002b8dc9b984ecc906 Author: Howard Huang Date: Thu Nov 10 06:31:46 2022 -0800 [14/N] Refactor _new_process_group_helper() to remove repeated code (#88351) Changes: - refactor parts of `_new_process_group_helper()` to remove repeated code Differential Revision: [D41188274](https://our.internmc.facebook.com/intern/diff/D41188274) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88351 Approved by: https://github.com/kwen2501 commit 4bcf2c53e521f5c61615b0adb84312513ad583f2 Author: William Wen Date: Thu Nov 10 19:22:09 2022 +0000 Add warnings & regressions info text (#88837) Add text about what warnings and accuracy regressions dropdowns mean. Sample: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1310770285 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88837 Approved by: https://github.com/anijain2305 commit 3b8245ab12d54723b6e7bcceb176235f13f0348b Author: Jiewen Tan Date: Thu Nov 10 18:34:19 2022 +0000 [LTC] Make ComputePostOrder accept const T pointers (#88773) Summary: Since `c10::ArrayRef` now support `c10::ArrayRef`, let's restore `ComputePostOrder` to accept `const Node*` again, which is more suitable for the context of the given helpers. Test Plan: CI. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88773 Approved by: https://github.com/JackCaoG commit 48b58930cbfa725ac25a9303d496c76bf983574d Author: Jiawen Liu Date: Thu Nov 10 18:32:25 2022 +0000 [Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#88566) Summary: Build fx-based linear/matmul/bmm + permute/transpose vertical fusion in Inductor For an internal Ads model: 1.15x -> 1.36x speedup Differential Revision: D41071665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88566 Approved by: https://github.com/jansel, https://github.com/jianyuh commit d157fca59c3f28b532f5e845c48df0e2bedbfa39 Author: PyTorch MergeBot Date: Thu Nov 10 18:19:51 2022 +0000 Revert "Symintify `broadcast_to` (#88776)" This reverts commit 3a09d9a129406a05ca7e82c1438f9aa83019f48d. Reverted https://github.com/pytorch/pytorch/pull/88776 on behalf of https://github.com/malfet due to Broke functorch/test_aotdispatch on M1, see https://hud.pytorch.org/pytorch/pytorch/commit/3a09d9a129406a05ca7e82c1438f9aa83019f48d commit 6bf2776ac1d16692778f052ba6796d3308ea97c6 Author: Andrew Gu Date: Thu Nov 10 15:17:51 2022 +0000 [FSDP][Perf] Do not call `pad` in no-padding case (#88769) - Calling `F.pad()` issues a pad kernel from the CPU even if there is no padding needed, which can incur some non-negligible overhead. This PR removes that unnecessary call for the no-padding case. - This PR also does not zero the newly-allocated sharded gradient tensor before the reduce-scatter if `use_orig_params=True` because there is no need. The reduce-scatter will fill the tensor anyway, and we do not care about the values in the padding. For `use_orig_params=False`, the padding is exposed to the user, so we preserve the existing semantics of zeroing it. I left a to-do to follow-up since we may optimize that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88769 Approved by: https://github.com/zhaojuanmao commit d3178465eed4895fa12430943db37d00dd2c483b Author: Bert Maher Date: Thu Nov 10 18:17:20 2022 +0000 [dynamo] `VariableTracker.call_method` requires a name (#88311) Summary: as title Test Plan: Before: N2743445, After: N2748186. Note there's a new error, but at least we got past the easy one. Differential Revision: D40938415 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88311 Approved by: https://github.com/brad-mengchi commit 1e4079a4762f515406c7f4654e7a4340914898ef Author: Bert Maher Date: Thu Nov 10 04:42:37 2022 +0000 [nnc] Disable opaque pointers mode in LLVM backend to allow getPointerElementType (#88798) As of LLVM 15 typed pointers are going away: https://llvm.org/docs/OpaquePointers.html. Thus `getPointerElementType` is no longer legal, since pointers are all opaque. I don't totally remember why we use it so prolifically, or whether there's an easy change to get rid of it, or whether we'd need a significant refactor to carry around `Type`s alongside `Value`s. But in any case, NNC is deprecated (see: TorchInductor) and will hopefully be gone before LLVM 16 is a thing. For now, we can apply the hack of turning off opaque pointer mode on the LLVMContext. 
Differential Revision: [D41176215](https://our.internmc.facebook.com/intern/diff/D41176215) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88798 Approved by: https://github.com/desertfire commit 656d0de6c50c373c7da2960ae6e9ca07b262384f Author: Panagiotis Antoniadis Date: Thu Nov 10 18:11:29 2022 +0000 Change TORCH_INTERNAL_ASSERT to TORCH_CHECK and add a nice error message (#88804) Fixes #87672 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88804 Approved by: https://github.com/ezyang commit 79b049af5ecbd8619acb4196f8c59228832ec99b Author: Huy Do Date: Thu Nov 10 17:48:16 2022 +0000 Switch to setup-nvidia action (#88757) Use the new [setup-nvidia](https://github.com/pytorch/test-infra/blob/main/.github/actions/setup-nvidia/action.yml) action from test-infra. The new action is created so that it can be shared across different PyTorch repos. For examples: * [pytorch/pytorch](https://github.com/pytorch/pytorch/blob/master/.github/scripts/install_nvidia_utils_linux.sh) (fixed by this PR) * [pytorch/tau](https://github.com/pytorch/tau/blob/main/.github/workflows/install_nvidia_utils_linux.sh) (fixed by https://github.com/pytorch/tau/pull/595) * [pytorch/torchsnapshot](https://github.com/pytorch/torchsnapshot/blob/main/.github/scripts/install_nvidia_utils_linux.sh) (fixed by https://github.com/pytorch/torchsnapshot/pull/130) * [torch/multiply](https://github.com/pytorch/multipy/blob/main/.github/scripts/install_nvidia_utils_linux.sh) (fixed by https://github.com/pytorch/multipy/pull/264) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88757 Approved by: https://github.com/seemethere, https://github.com/atalman commit f98edfcc48c903d0d22a0105b0fafe4ca58121e6 Author: Nikita Shulga Date: Thu Nov 10 17:42:20 2022 +0000 Make TorchElastic timer importable on Windows (#88522) Also, add `torch.distributed` to test imports, so that we would not regress in the future Fixes https://github.com/pytorch/pytorch/issues/85427 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88522 Approved by: https://github.com/d4l3k commit 4b898a7304246275b250b159dd0ac8e68a6df95d Author: Nikita Karetnikov Date: Thu Nov 10 01:07:50 2022 +0100 Symintify `adaptive_avg_pool3d` (#88783) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88783 Approved by: https://github.com/ezyang commit 3a09d9a129406a05ca7e82c1438f9aa83019f48d Author: Nikita Karetnikov Date: Thu Nov 10 11:48:31 2022 +0100 Symintify `broadcast_to` (#88776) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88776 Approved by: https://github.com/ezyang commit c0ecce15b5a54ff0185f9976e6bfb6f3a7de698d Author: samdow Date: Mon Nov 7 15:43:39 2022 -0500 add DisableTorchFunction that matches DisableTorchDispatch (#88219) Closes #87990. This implements a new disable guard that matches DisableTorchDispatch (disables all subclasses and modes) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88219 Approved by: https://github.com/ezyang commit 7f28be10e5e71efda37800384fa897785499bed1 Author: samdow Date: Tue Nov 1 18:35:38 2022 -0400 rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218) First half of #87990. 
This doesn't change any of the behavior and is just a rename Pull Request resolved: https://github.com/pytorch/pytorch/pull/88218 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit 3e43ff279428e5d07932968fbd7792200fa15a4d Author: XiaobingSuper Date: Thu Nov 10 01:30:03 2022 -0500 torchdynamo: add convolution add(relu) inplace fusion kernel (#88048) This PR is about add convolution add(relu) inplace fusion kernel which works for **other.add_(conv)**. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88048 Approved by: https://github.com/jgong5, https://github.com/jansel commit e6561291b89ecfbe35990decfcf16db47419d429 Author: Philip Meier Date: Thu Nov 10 13:44:45 2022 +0000 add hack to allow hybrid compressed sparse comparison in assertEqual (#88749) Hybrid sparse CSR tensors can currently not be compared to strided ones since `.to_dense` does not work: ```py import torch from torch.testing._internal.common_utils import TestCase assertEqual = TestCase().assertEqual actual = torch.sparse_csr_tensor([0, 2, 4], [0, 1, 0, 1], [[1, 11], [2, 12] ,[3, 13] ,[4, 14]]) expected = torch.stack([actual[0].to_dense(), actual[1].to_dense()]) assertEqual(actual, expected) ``` ``` main.py:4: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at ../aten/src/ATen/SparseCsrTensorImpl.cpp:54.) actual = torch.sparse_csr_tensor([0, 2, 4], [0, 1, 0, 1], [[1, 11], [2, 12] ,[3, 13] ,[4, 14]]) Traceback (most recent call last): File "/home/philip/git/pytorch/torch/torch/testing/_comparison.py", line 1098, in assert_equal pair.compare() File "/home/philip/git/pytorch/torch/torch/testing/_comparison.py", line 619, in compare actual, expected = self._equalize_attributes(actual, expected) File "/home/philip/git/pytorch/torch/torch/testing/_comparison.py", line 706, in _equalize_attributes actual = actual.to_dense() if actual.layout != torch.strided else actual RuntimeError: sparse_compressed_to_dense: Hybrid tensors are not supported The above exception was the direct cause of the following exception: Traceback (most recent call last): File "main.py", line 10, in assertEqual(actual, expected) File "/home/philip/git/pytorch/torch/torch/testing/_internal/common_utils.py", line 2503, in assertEqual msg=(lambda generated_msg: f"{generated_msg}\n{msg}") if isinstance(msg, str) and self.longMessage else msg, File "/home/philip/git/pytorch/torch/torch/testing/_comparison.py", line 1112, in assert_equal ) from error RuntimeError: Comparing TensorOrArrayPair( id=(), actual=tensor(crow_indices=tensor([0, 2, 4]), col_indices=tensor([0, 1, 0, 1]), values=tensor([[ 1, 11], [ 2, 12], [ 3, 13], [ 4, 14]]), size=(2, 2, 2), nnz=4, layout=torch.sparse_csr), expected=tensor([[[ 1, 11], [ 2, 12]], [[ 3, 13], [ 4, 14]]]), rtol=0.0, atol=0.0, equal_nan=True, check_device=False, check_dtype=True, check_layout=False, check_stride=False, check_is_coalesced=False, ) resulted in the unexpected exception above. If you are a user and see this message during normal operation please file an issue at https://github.com/pytorch/pytorch/issues. If you are a developer and working on the comparison functions, please except the previous error and raise an expressive `ErrorMeta` instead. ``` This adds a temporary hack to `TestCase.assertEqual` to enable this. 
Basically, we are going through the individual CSR subtensors, call `.to_dense()` on them, and stack everything back together. I opted to not do this in the common machinery, since that way users are not affected by this (undocumented) hack. I also added an xfailed test that will trigger as soon as the behavior is supported natively so we don't forget to remove the hack when it is no longer needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88749 Approved by: https://github.com/mruberry, https://github.com/pearu commit 7c353eb39559f2c8897a0580700dd0a6f943d34f Author: Li-Huai (Allan) Lin Date: Thu Nov 10 09:40:05 2022 +0000 [MPS] Fix softplus (#88555) 1. Fixes #87780 2. Fixes mps graph cache issue 3. Adds proper tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/88555 Approved by: https://github.com/kulinseth commit 7ad87f63e248b629d435a199cb61f4ed1f3dfcab Author: Grigory Sizov Date: Thu Nov 10 08:12:56 2022 +0000 Support src_mask and src_key_padding_mask for Better Transformer (#88488) Fixes T135842750 (follow-up for #87377) At present, having both `src_key_padding_mask` and `src_mask` at the same time is not supported on the fastpath in Transformer and Multi-Head Attention. This PR enables using both masks on the fastpath on CPU and GPU: if both masks are passed, we merge them into a 4D mask in Python and change mask type to 2 before passing downstream. Downstream processing in native code is not changed, as it already supports 4D mask. Indeed, it is done depending on the device: - on CUDA, by `SoftMax.cu::masked_softmax_cuda`. When mask type is 2, it calls either `dispatch_softmax_forward` -> `softmax_warp_forward` or `at::softmax` (depending on the input size). In both cases 4D mask is supported. - on CPU, by `SoftMax.cpp::masked_softmax_cpp`. It calls `hosted_softmax` which supports 4D mask. - Extended `test_mask_check_fastpath` to check that fast path is indeed taken in Transformer when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path_mock` to check that fast path is taken in MHA when two masks are passed - Added `test_multihead_self_attn_two_masks_fast_path` to check that fast and slow paths give the same result when two masks are passed in MHA - `test_masked_softmax_mask_types` now covers mask type 2 - `test_transformerencoderlayer_fast_path` (CPU smoke test) is expanded to the case of both masks provided simultaneously - `test_masked_softmax_devices_parity` checks that mask type 2 is accepted by CPU and CUDA paths Pull Request resolved: https://github.com/pytorch/pytorch/pull/88488 Approved by: https://github.com/mikekgfb commit dcefea2706fb35ece5e49fc138d952a2acd15824 Author: efiks <5167930+efiks@users.noreply.github.com> Date: Thu Nov 10 06:11:05 2022 +0000 [caffe2][tourch] Optimize BatchBoxCox (#87585) Differential Revision: D40215424 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87585 Approved by: https://github.com/hyuen commit e87c79ca0cbab476a7d09853b5830b615a62f679 Author: PyTorch MergeBot Date: Thu Nov 10 03:04:57 2022 +0000 [vision hash update] update the pinned vision hash (#88742) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
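To make the mask-merging idea from #88488 above concrete, here is a rough, hedged sketch of combining a float `src_mask` with a boolean `src_key_padding_mask` into a single 4D additive mask via broadcasting; the sizes, dtype conventions, and variable names are illustrative assumptions and may differ from the actual PR:
```python
import torch

N, S, H = 2, 4, 3                                   # batch, sequence length, heads (made up)
src_mask = torch.zeros(S, S)                        # additive attention mask, 0 = attend
src_key_padding_mask = torch.tensor([[False, False, True,  True],
                                     [False, False, False, True]])  # True = padding position

# Turn the padding mask into an additive float mask, then broadcast both
# into a single (N, H, S, S) mask, the 4D layout the fastpath softmax consumes.
pad = torch.zeros(N, S).masked_fill(src_key_padding_mask, float("-inf"))
merged = src_mask.view(1, 1, S, S) + pad.view(N, 1, 1, S)
merged = merged.expand(N, H, S, S)
print(merged.shape)  # torch.Size([2, 3, 4, 4])
```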
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88742 Approved by: https://github.com/pytorchbot commit cf04b36ce8f531730210b03eaa347977a1c2d75c Author: Animesh Jain Date: Thu Nov 10 01:57:17 2022 +0000 [dynamo] fixes dict changed during runtime error (#87526) Fixes https://github.com/pytorch/torchdynamo/issues/1744 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87526 Approved by: https://github.com/ezyang commit 0b8889c724f52dd767564fcd51e8a0ee5e99b45f Author: William Wen Date: Thu Nov 10 01:48:04 2022 +0000 Do not flag models in dashboard due to NaN values (#88792) Title. Tested by running `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-4 --training --visualize_logs` on a copy of a recent set of logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88792 Approved by: https://github.com/anijain2305 commit 6e3555edea3ec2f453d6dc2ddcba9c6313d5ced5 Author: William Wen Date: Thu Nov 10 01:45:52 2022 +0000 Add absolute latency to dashboard (#88790) Add absolute latency to dashboard, as requested by https://github.com/pytorch/torchdynamo/issues/1833#issuecomment-1302742914 Tested by setting `run.sh` to ``` rm -rf ../test-dynamo-runner-logs-7/ mkdir ../test-dynamo-runner-logs-7/ python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=../test-dynamo-runner-logs-7//inductor_torchbench_float32_training_cuda_performance.csv --training --inductor --no-skip --dashboard --only mobilenet_v2 --cold_start_latency python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcuda --output=../test-dynamo-runner-logs-7//inductor_torchbench_float32_training_cuda_accuracy.csv --training --inductor --no-skip --dashboard --only mobilenet_v2 ``` and running `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-7/ --dashboard-archive-path /data/home/williamwen/dynamo-runner-logs-copy --training --run --compilers inductor --flag-compilers inductor --suites torchbench --update-dashboard` (need to comment out the `generate_commands` line and change the github issue ID from 681 to something else). Sample comment: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1309645562 NOTE: this change breaks processing old logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88790 Approved by: https://github.com/anijain2305 commit 2381548071d01e1a3f22793a5e0bff1ad0f58a69 Author: Elias Ellison Date: Wed Nov 9 14:21:21 2022 -0800 add stride constraints to fallbacks (#88534) Add stride/contiguity constraints to fallbacks so that inputs will be in the right stride permutation for the fallback kernel. Improves perf of coat_lite_mini from 1.48415536054865 -> 2.010956856330101. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88534 Approved by: https://github.com/ngimel commit fb5c6ae61f1f622ec388ae9fa00e7683ce1729ce Author: Eddie Yan Date: Thu Nov 10 00:49:07 2022 +0000 [cuDNN][cuDNN V8 API] Match V7 API behavior for `channels_last` stride coercion for cuDNN (#88699) For ConvNeXt failure in https://github.com/pytorch/torchdynamo/issues/1833 cuDNN V7 has some stride "fixing" code to coerce cuDNN to use channels-last in cases when allowed by size 1 strides that was omitted in V8, which seems to seems to lead to performance regressions. This PR patches in the same fix for V8. 
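The size-1 stride ambiguity that motivates the cuDNN V8 stride coercion in #88699 above can be seen from Python; a small hedged illustration with arbitrary shapes:
```python
import torch

x = torch.randn(8, 1, 28, 28)   # size-1 channel dimension
print(x.stride())                                           # (784, 784, 28, 1)
print(x.is_contiguous())                                    # True
print(x.is_contiguous(memory_format=torch.channels_last))   # also True: size-1 dims make the layout ambiguous
# Both layouts are reported as contiguous, so the backend has to pick which
# stride order it presents to cuDNN; that choice is what the coercion handles.
```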
CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/88699 Approved by: https://github.com/ngimel commit 59115e6139a475ec21d642e6f99798b8c37bcf4d Author: mikey dagitses Date: Thu Nov 10 00:27:59 2022 +0000 disable test that times out in fbcode (#88758) Test Plan: Rely on CI. Differential Revision: D41162966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88758 Approved by: https://github.com/zou3519 commit 16bd363863cceb907118557289af70882ea68985 Author: William Wen Date: Thu Nov 10 00:26:58 2022 +0000 Fix dynamo dashboard passrate denominator (#88777) Before the dashboard improvements, the passrate table looked like this: ~~~ +------------------------+------------+-------------+-------------+ | Compiler | torchbench | huggingface | timm_models | +------------------------+------------+-------------+-------------+ | eager | 98%, 54/55 | 100%, 43/43 | 100%, 61/61 | | aot_eager | 95%, 52/55 | 100%, 43/43 | 97%, 59/61 | | aot_cudagraphs | 75%, 41/55 | 49%, 21/43 | 38%, 23/61 | | nvprims_nvfuser | 71%, 39/55 | 16%, 7/43 | 48%, 29/61 | | inductor | 87%, 48/55 | 93%, 40/43 | 95%, 58/61 | | inductor_no_cudagraphs | 93%, 51/55 | 93%, 40/43 | 95%, 58/61 | +------------------------+------------+-------------+-------------+ ~~~ After the change, the table looked like: ~~~ +------------------------+------------+-------------+-------------+ | Compiler | torchbench | huggingface | timm_models | +------------------------+------------+-------------+-------------+ | eager | 82%, 53/65 | 84%, 43/51 | 82%, 61/74 | | aot_eager | 83%, 54/65 | 84%, 43/51 | 82%, 61/74 | | aot_cudagraphs | 69%, 45/65 | 65%, 33/51 | 38%, 28/74 | | nvprims_nvfuser | 48%, 31/65 | 78%, 40/51 | 26%, 19/74 | | inductor | 75%, 49/65 | 82%, 42/51 | 81%, 60/74 | | inductor_no_cudagraphs | 82%, 53/65 | 82%, 42/51 | 82%, 61/74 | +------------------------+------------+-------------+-------------+ ~~~ There is no actual regression, but the passrate is lower since the denominator is wrong. Check fix by running locally (e.g. `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-5 --training --visualize_logs`) and comparing passrate table output to previously correct one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88777 Approved by: https://github.com/anijain2305 commit 4f18739bf05bffff85609f90e0c319d8110c5616 Author: Nikita Shulga Date: Thu Nov 10 00:06:31 2022 +0000 Fix Docker image generation (#88741) Pass install channel when building nightly images Pass `TRITON_VERSION` argument to install triton for nightly images Fix `generate_pytorch_version.py` to work with unannotated tags and avoid failures like the following: ``` % git checkout nightly % ./.github/scripts/generate_pytorch_version.py fatal: No annotated tags can describe '93f15b1b54ca5fb4a7ca9c21a813b4b86ebaeafa'. However, there were unannotated tags: try --tags. 
Traceback (most recent call last): File "/Users/nshulga/git/pytorch/pytorch-release/./.github/scripts/generate_pytorch_version.py", line 120, in main() File "/Users/nshulga/git/pytorch/pytorch-release/./.github/scripts/generate_pytorch_version.py", line 115, in main print(version_obj.get_release_version()) File "/Users/nshulga/git/pytorch/pytorch-release/./.github/scripts/generate_pytorch_version.py", line 75, in get_release_version if not get_tag(): File "/Users/nshulga/git/pytorch/pytorch-release/./.github/scripts/generate_pytorch_version.py", line 37, in get_tag dirty_tag = subprocess.check_output( File "/Users/nshulga/miniforge3/lib/python3.9/subprocess.py", line 424, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, File "/Users/nshulga/miniforge3/lib/python3.9/subprocess.py", line 528, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['git', 'describe']' returned non-zero exit status 128. ``` After the change nightly is reported as(due to autolabelling issue, should be fixed by ttps://github.com/pytorch/test-infra/pull/1047 ): ``` % ./.github/scripts/generate_pytorch_version.py ciflow/inductor/26921+cpu ``` Even for tagged release commits version generation was wrong: ``` % git checkout release/1.13 % ./.github/scripts/generate_pytorch_version.py ciflow/periodic/79617-4848-g7c98e70d44+cpu ``` After the fix, it is as expected: ``` % ./.github/scripts/generate_pytorch_version.py 1.13.0+cpu ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88741 Approved by: https://github.com/dagitses, https://github.com/msaroufim commit 7006ac6ee509c00da54a0c90c38685a7adb61779 Author: Akshit Khurana Date: Tue Nov 8 10:29:39 2022 -0800 [Dynamo] Fix Tensor.T trace (#88642) Summary: Tensor.T considered T as a GetAttr and didn't progate "example_value" Via https://pytorch.org/docs/stable/tensors.html#torch.Tensor.T > If n is the number of dimensions in x, x.T is equivalent to > x.permute(n-1, n-2, ..., 0). Fixes pytorch/torchdynamo#1476 Test Plan: pytest test/dynamo/test_functions.py::FunctionTests::test_T Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D41130306](https://our.internmc.facebook.com/intern/diff/D41130306) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88642 Approved by: https://github.com/tugsbayasgalan, https://github.com/yanboliang, https://github.com/jansel commit c7fc7104594f19e263a525aa572f97e65b08c386 Author: PyTorch MergeBot Date: Wed Nov 9 22:38:41 2022 +0000 Revert "[3/n] Thread PG: add threaded PG implementation (#88627)" This reverts commit 6dd081846e3ae6192b375d658d4b4f3d6bd9df6e. Reverted https://github.com/pytorch/pytorch/pull/88627 on behalf of https://github.com/huydhn due to This breaks one macos m1 test https://hud.pytorch.org/pytorch/pytorch/commit/6dd081846e3ae6192b375d658d4b4f3d6bd9df6e in trunk. 
PR also fails with the same issue so I think trymerge code has a bug here letting this one merged commit 6fe4ccc7cbd5e953e5888947229945f7590e3bfe Author: HDCharles Date: Tue Nov 8 07:59:10 2022 -0800 [ao] qconfig.py fix public v private (#87515) Summary: made is_reuse_input_qconfig, _activation_is_memoryless, _partial_wrapper_equals, _obs_or_fq_ctr_equals, _add_module_to_qconfig_obs_ctr, _assert_valid_qconfig private Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40709280](https://our.internmc.facebook.com/intern/diff/D40709280) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87515 Approved by: https://github.com/jcaip commit 3a3500fa082482f8131a22196566f89da3de4162 Author: Howard Huang Date: Wed Nov 9 06:47:53 2022 -0800 [13/N] Update gather with CPU/CUDA implementations (#86409) Differential Revision: [D40181612](https://our.internmc.facebook.com/intern/diff/D40181612) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86409 Approved by: https://github.com/kwen2501 commit 1af9b38a907cdd8a21f4e0a363af3f136fa4062a Author: anjali411 Date: Wed Nov 9 14:48:20 2022 +0000 Symintify embedding_sparse_backward (#88746) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88746 Approved by: https://github.com/ezyang commit b7aa22d6db889a9ae31aabae80abc3e99ebc37ee Author: Zhengxu Chen Date: Wed Nov 9 21:39:46 2022 +0000 [fx] Fix GraphModule.print_readable() (#88730) Summary: `__nested_code()` seems removed. Test Plan: CI Differential Revision: D41149662 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88730 Approved by: https://github.com/SherlockNoMad commit 6dd081846e3ae6192b375d658d4b4f3d6bd9df6e Author: Charlie Yan Date: Wed Nov 9 20:51:11 2022 +0000 [3/n] Thread PG: add threaded PG implementation (#88627) Summary: After the previous 2 diffs, finally we can add the threaded ProcessGroup implementation. Test Plan: TBD Reviewed By: XilunWu Differential Revision: D40992593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88627 Approved by: https://github.com/XilunWu, https://github.com/H-Huang commit 93d3bd626ed9bb99ded7a4e269f7a1fa486ac5d3 Author: PyTorch MergeBot Date: Wed Nov 9 20:48:32 2022 +0000 Revert "[primTorch] Improve `narrow` and `narrow_copy`: refs, tests, docs (#87045)" This reverts commit aa8279bcb8687e025a666e18828a436eb7ef7b45. Reverted https://github.com/pytorch/pytorch/pull/87045 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41161182 commit 8523c45717b21a205ddc74ec0fa0d97e7c201388 Author: Charlie Yan Date: Wed Nov 9 20:29:34 2022 +0000 Delete stub file to enable mypy check (#4649) (#88701) Summary: X-link: https://github.com/facebookresearch/detectron2/pull/4649 Context in https://fburl.com/4irjskbe This change deletes distributed.pyi, so that lintrunner will run mypy on distributed.py for typing check. 
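A quick, hedged illustration of the API touched by the `GraphModule.print_readable()` fix (#88730) above; the traced module and its shapes are made-up assumptions:
```python
import torch
from torch import fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.lin(x))

gm = fx.symbolic_trace(M())
gm.print_readable()   # prints (and returns) the generated forward() source per module
```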
Test Plan: CI Differential Revision: D41028360 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88701 Approved by: https://github.com/zhaojuanmao commit 133e61af7a8f4b098daf7d34f848e3c2a6cb4ae4 Author: Sherlock Huang Date: Wed Nov 9 04:51:04 2022 +0000 OpOverload is_view (#88722) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88722 Approved by: https://github.com/ezyang commit 55df18e3da859024efd190d8b3145d25915adc5a Author: Howard Huang Date: Wed Nov 9 06:47:53 2022 -0800 [12/N] Update scatter with CPU/CUDA implementations (#86408) Differential Revision: [D40181613](https://our.internmc.facebook.com/intern/diff/D40181613) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86408 Approved by: https://github.com/kwen2501 commit 3a1bdfee67170103f621671ebd1b64d06863539d Author: mikey dagitses Date: Wed Nov 9 18:20:04 2022 +0000 skip environment collection test in fbcode (#88744) Summary: This runs pip, which we don't have in the fbcode environment. Test Plan: Rely on CI. Differential Revision: D41156589 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88744 Approved by: https://github.com/zou3519 commit de53d4143a3a6bb08eddd845ad7f824112283792 Author: Jason Ansel Date: Wed Nov 9 18:13:06 2022 +0000 Fix TorchInductor benchmarking in fbcode (#88689) Summary: Makes the C++ TorchInductor benchmarking work in fbcode plus some minor fixed to enable that. Test Plan: Test added Differential Revision: D41045910 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88689 Approved by: https://github.com/soumith commit c4a3aa8fe7306aa490959df35a5933187b170d56 Author: ssjia Date: Tue Nov 8 13:44:43 2022 -0800 [vulkan] Add option for buffer representations in vTensor (#87622) This diff adds the option to use a Buffer to store data for a `vTensor` by passing `StorageType::BUFFER` to the constructor of `vTensor`. To enable this change, the construction of `vTensor` and `vTensorStorage` had to be slightly refactored to properly support strides. To summarize the changes: * `vTensorStorage` now contains no Tensor metadata (such as tensor sizes, strides, and `TensorOptions`) - it now only contains the image extents (if texture storage is used) and the buffer length. Tensor metadata is now managed by `vTensor`. The reason for this is to allow multiple `vTensor` objects to point to the same `vTensorStorage` but with different metadata which may be a useful feature now that Buffer storage is enabled. * `vTensor` will now compute the strides upon construction based on the requested sizes and memory layout if Buffer storage is requested. Previously, strides were faked by setting them all to 0 as strides do not apply to image textures (this behavior is preserved for texture storage). Differential Revision: [D40604163](https://our.internmc.facebook.com/intern/diff/D40604163/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87622 Approved by: https://github.com/digantdesai commit d81797e845123b6f682b0fe1c4c6e6b905059c65 Author: Edward Z. Yang Date: Wed Nov 9 08:24:44 2022 -0500 Meta function for aten.sort and aten.scatter* (#88705) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88705 Approved by: https://github.com/ezyang commit 100b55637b28cf826a52613abd62d7d49825a0ac Author: Will Constable Date: Wed Nov 9 16:41:04 2022 +0000 Mark dynamo torchbench dlrm as unsupported (#88712) - DLRM requires special configuration of embedding layers which are sparse and not compatible with DDP. 
- I could mark the embedding params as ignored in DDP to make the benchmark pass, but this isn't a representative benchmark. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88712 Approved by: https://github.com/ezyang commit eb9b1560195a89df6a14ded05b3e76d97346a1f2 Author: kshitij12345 Date: Wed Nov 9 17:15:12 2022 +0000 [fix] MathBits: serialization (#88182) Fixes #81690 TODO: * [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++) * [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python) * [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this) * [x] Add Comments * [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.) Notes: Quant Tensor don't support complex dtypes and for float they segfault with `_neg_view` : https://github.com/pytorch/pytorch/issues/88484 Sparse Tensor: ```python >>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse() >>> a.conj().is_conj() False >>> a._neg_view() Traceback (most recent call last): File "", line 1, in NotImplementedError: Cannot access storage of SparseTensorImpl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182 Approved by: https://github.com/ezyang, https://github.com/anjali411 commit 525fe53aa41b2d4c3f411f8d1e92b55d95b7b0a6 Author: Nikita Shulga Date: Wed Nov 9 16:13:56 2022 +0000 [BE] Delete push_nightly_docker_ghcr (#88748) As it seems to be duplicating the functionality of `docker-release.yml` and have not produced a valid build in last 16 days, according to https://github.com/pytorch/pytorch/actions/workflows/push_nightly_docker_ghcr.yml Pull Request resolved: https://github.com/pytorch/pytorch/pull/88748 Approved by: https://github.com/seemethere commit f11f0e4a033ab09c637870ce0fad6ac68ec81eb0 Author: Bin Bao Date: Wed Nov 9 13:05:32 2022 +0000 [inductor] Handle nested tuple/list output in fallback kernel (#88495) Summary: Currently fallback kernel in inductor assumes its output is either a tensor or a tuple/list of tensors. This PR makes it handle more generic output data structure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88495 Approved by: https://github.com/jansel commit 3150c9dc6f296d941ece9aa4f4c189f36393ef8f Author: mikey dagitses Date: Tue Nov 8 10:06:03 2022 -0500 extract out the clean workspace test to its own file (#88682) Summary: This test relies on what the root workspace is before any other code is run. However, some of the test cases change it. If the order the tests are run is randomized, then the test can fail if run after one of them. Having it on its own ensures that it always sees a pristine state. Test Plan: Verified locally and confirmed in internal and external CI. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/88682 Approved by: https://github.com/r-barnes, https://github.com/malfet commit c19bae9f8457ac9b8774369a0f0a7ea31c90c3e9 Author: Edward Z. Yang Date: Wed Nov 9 08:13:03 2022 -0500 Add SherlockNoMad to symbolic-shapes reviewer list (#88739) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88739 Approved by: https://github.com/anjali411 commit 44de7cdbc463d73b967d1157041b402c3106239d Author: Edward Z. 
Yang Date: Mon Nov 7 13:34:38 2022 -0500 Add voznesenskym to symbolic-shapes group, move wconstab to listener (#88593) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88593 Approved by: https://github.com/anjali411 commit c86cc68d23521f8d6956e49fcd214d314f98da35 Author: Edward Z. Yang Date: Tue Nov 8 13:33:51 2022 -0500 Mark diag.out composite (#88670) It's implementation just redispatches, it works for more than CPU/CUDA. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88670 Approved by: https://github.com/anjali411 commit 69b2352236bc07798dd3e57844c1e7f0aa262b42 Author: Ivan Yashchuk Date: Wed Nov 9 12:56:55 2022 +0000 Add min cut partitioner for AOT+nvFuser (#88204) Here we mark most of `torch.ops.nvprims` as something that can be recomputed in the backward passes (and hopefully fused). TODO: - [x] Add a test after https://github.com/pytorch/pytorch/pull/88186 is merged Pull Request resolved: https://github.com/pytorch/pytorch/pull/88204 Approved by: https://github.com/jjsjann123, https://github.com/jansel commit ff7c5b0df80f5b72cd3dbb3a372d481f989a6ef3 Author: Sean Ross-Ross Date: Tue Nov 8 12:35:40 2022 -0600 Changing as_strided_scatter to deterministic inputs (#85583) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85583 Approved by: https://github.com/mruberry commit fca6ed02b91e0d685cc9b8b504bac5d356d31876 Author: blzheng Date: Wed Nov 9 10:40:23 2022 +0000 [Inductor] fix c++ compile error with masked float value init (#88298) Fixes #88201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88298 Approved by: https://github.com/jgong5, https://github.com/jansel commit 652af5ec15b81c39ec7413519d0ce9938d87bcf1 Author: Fabio Rocha Date: Tue Nov 8 19:25:30 2022 +0000 upsample_*.vec ops are now CompositeImplicit (#85638) It was previously CompositeExplicit but it was not really necessary. See discussion in https://github.com/pytorch/pytorch/issues/85405 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638 Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel commit aa8279bcb8687e025a666e18828a436eb7ef7b45 Author: Nikita Karetnikov Date: Wed Nov 9 00:53:37 2022 +0100 [primTorch] Improve `narrow` and `narrow_copy`: refs, tests, docs (#87045) Fixes #87019. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87045 Approved by: https://github.com/mruberry commit f6192b75c66cf5ac4591170106d8f58e7848bd07 Author: Xia, Weiwen Date: Wed Nov 9 08:08:11 2022 +0000 [Quant] Support lowering of channel shuffle in FX (#83731) Support lowering of channel shuffle in FX by adding its module and functional op to `is_copy_node` list in `torch/ao/quantization/fx/_lower_to_native_backend.py` UTs added to test - correctness of quantized `ChannelShuffle` module. - FX lowering of `ChannelShuffle` module and functional `channel_shuffle`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83731 Approved by: https://github.com/jerryzh168 commit ab9a19a95b628132bf0ad6474f245b4e596b9d74 Author: Nikita Shulga Date: Wed Nov 9 06:55:22 2022 +0000 [BE] Move `setup-ssh` step ahead of clone PyTorch (#88715) It allows one to SSH faster rather than having to wait for repo clone to finish. I.e. 
right now one usually have to wait for a few minutes fore PyTorch clone is finished, but with this change you can SSH ahead of time (thanks to `setup-ssh` being a composite action Pull Request resolved: https://github.com/pytorch/pytorch/pull/88715 Approved by: https://github.com/clee2000, https://github.com/izaitsevfb commit a7420d2ccb62d005f2e1853cfef8d25eb7748a90 Author: Eddie Yan Date: Wed Nov 9 01:49:50 2022 +0000 Hopper (`sm90`) support (#87736) Essentially a followup of #87436 CC @xwang233 @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/87736 Approved by: https://github.com/xwang233, https://github.com/malfet commit 19d7941e37cd4727ccb874ada7a310dc679ebaab Author: Wei-Sheng Chin Date: Wed Nov 9 01:31:42 2022 +0000 Fix Python-bound function signature (torch._C.Graph.addInput) (#88528) In pytorch/torch/_C/__init__.pyi, Graph.addInput has signature ```python def addInput(self, name: str) -> Value: ... ``` which doesn't match the corresponding function ```cpp Value* addInput(const std::string& name = "") { return block_->addInput(name); } ``` in python_ir.cpp. This PR aligns the bound function on both C++ and Python sides. Without this PR, mypy will compain whenever a change contains some calls to `addInput`; for example, ![image](https://user-images.githubusercontent.com/3524474/200092086-429b8d63-9321-4d03-b0d6-f4c9bd361756.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88528 Approved by: https://github.com/davidberard98 commit f0e6cea2ed2de9e5da9af8be4a243b9aae5aec06 Author: Edward Z. Yang Date: Tue Nov 8 13:47:27 2022 -0500 Meta registrations for inplace operators (#88678) Also, handle non-default alpha correctly. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88678 Approved by: https://github.com/SherlockNoMad, https://github.com/albanD commit a880ddc164203b6f49971f5af44cdb7d9b059f06 Author: Edward Z. Yang Date: Tue Nov 8 13:47:26 2022 -0500 Meta implementation for unsqueeze_ (#88675) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88675 Approved by: https://github.com/SherlockNoMad commit 1dab35ca1bea35ca7d069281490b709851fbcf95 Author: Edward Z. Yang Date: Tue Nov 8 13:47:26 2022 -0500 Meta implementation for bernoulli (#88676) For some reason bernoulli uses legacy memory format, see linked issue. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88676 Approved by: https://github.com/SherlockNoMad commit 6be426ca1aa857af7a148271ae4599f108b17a69 Author: Nikita Shulga Date: Wed Nov 9 01:04:29 2022 +0000 Update gloo submodule (#88530) Also, add an explicit cudart dependency to `torch_cuda` if Kineto is used with GPU support (it used to be somehow inherited from a wrong `gloo` setup) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88530 Approved by: https://github.com/osalpekar commit 08b2a251e122ff9ee3e5dc1af5513ab6cbd99db4 Author: Zhengxu Chen Date: Wed Nov 9 01:02:07 2022 +0000 [export] Preserve meta["val"] on placeholders in dynamo.export(). (#88651) Summary: Today when we transform the captured graph in the last step in export(aten_graph=True), we construct a new graph which doesn't have the all the metadata to be preserved, for example, node.meta["val"]. meta["val"] is important for writing passes and analysis on the graph later in the pipeline, we may want to preserve that on placeholder nodes. 
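A hedged sketch of what preserving `meta["val"]` on placeholders (#88651 above) enables for downstream passes, assuming the `torch._dynamo.export(..., aten_graph=True)` entry point referenced in the test plan; the traced function is an illustrative assumption:
```python
import torch
import torch._dynamo as dynamo

def f(x):
    return x.sin() + 1

gm, guards = dynamo.export(f, torch.randn(3), aten_graph=True)
for node in gm.graph.nodes:
    if node.op == "placeholder":
        # With the change, placeholder nodes carry a meta["val"] (a fake tensor)
        # that passes can query for shape/dtype without re-running the model.
        print(node.name, node.meta.get("val"))
```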
Test Plan: test_export.py:test_export_meta_val Differential Revision: D41110864 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88651 Approved by: https://github.com/tugsbayasgalan, https://github.com/jansel commit 5f876bfdc512a376c12ee15cb58037937d73cf38 Author: Bin Bao Date: Tue Nov 8 15:31:15 2022 +0000 Reduce the number of shards inductor uses for model tests (#88610) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88610 Approved by: https://github.com/huydhn commit 9f58e027a9b802f541ea1d9ad750be833db2c39c Author: Antoni Viros i Martin Date: Wed Nov 9 00:19:36 2022 +0000 Add implementation for irregular dimension selection for nested tensors. (#88585) Summary: This diff modifies the implementation of the select operator so slices of the irregular dimension can be selected (e.g. nt[:,0,:]). Test Plan: Added new unit tests to test that the new functions work as intended (see them in diff). To test, `buck test mode/dev-nosan //caffe2/test:nested` Differential Revision: D41083993 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88585 Approved by: https://github.com/cpuhrsch commit 87238e64914246f9f04ecec013fd6a78d87517b1 Author: Samantha Andow Date: Wed Nov 9 00:09:20 2022 +0000 [nn] add remove_duplicate flag to named_parameters (#759) (#88090) Summary: X-link: https://github.com/pytorch/torchrec/pull/759 Since the remove_duplicate flag was added to named_buffers in D39493161 (https://github.com/pytorch/pytorch/commit/c12f829cce29eb6971094a9bbb0f8971aed86f5c), this adds the same flag to named_parameters Test Plan: python test/test_nn.py -k test_buffers_and_named_buffers OSS Tests Differential Revision: D40801899 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88090 Approved by: https://github.com/albanD commit cef13ebea0ba604540d1cb16e13fbd2c36040f59 Author: Taylor Robie Date: Mon Nov 7 15:48:35 2022 -0800 [Profiler] Memory profiler part 1: Gradient identification (#86802) There are multiple ways to indentify that a Tensor is a gradient. (A subset of which also give additional context.) So to start off I've made a utility to handle that determination. Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86802 Approved by: https://github.com/chaekit commit c0e6b4329fe2dd35bb0bf162f4203ad7e0162554 Author: Michael Suo Date: Mon Nov 7 22:23:01 2022 -0800 [dynamo] only error out on nested fx trace if dynamo is optimizing (#88640) I think this is the final resolution to issue caused by https://github.com/pytorch/pytorch/pull/87797. The nvfuser issue that PR tripped up was because, even though we're correctly disabling torchdynamo via a `DisableContext`, the nested fx trace check was still firing. This PR properly narrows it to only fire if we're not disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88640 Approved by: https://github.com/yf225 commit a02ea655b5a5fbd615003d675c3e1765820298d2 Author: Mikayla Gawarecki Date: Tue Nov 8 19:14:48 2022 +0000 Slight fix in error message for check_for_seq_len_1_nested_tensor (#88690) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88690 Approved by: https://github.com/cpuhrsch commit 6e6f929b2c5308b8de2c82884b8fa70bd4778842 Author: Taylor Robie Date: Mon Nov 7 11:24:27 2022 -0800 [Profiler] Restructure inputs and capture TensorLists. (#87825) This PR unifies and rationalizes some of the input representation in Result. 
The current approach of storing separate types in separate vectors is tedious for two types (Tensors and scalars), but would be even more annoying with the addition of TensorLists. A similar disconnection exists with sizes and strides which the user is also expected to zip with tensor_metadata. I simplified things by moving inputs to a variant and moving sizes and strides into TensorMetadata. This also forced collection of sizes and strides in python tracer which helps to bring it in line with op profiling. Collection of TensorLists is fairly straightforward; `InputOutputEncoder` already has a spot for them (I actually collected them in the original TorchTidy prototype) so it was just a matter of plumbing things through. Differential Revision: [D40734451](https://our.internmc.facebook.com/intern/diff/D40734451/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87825 Approved by: https://github.com/slgong-fb, https://github.com/chaekit commit e132c45fd033842a677ce125a6d2657a500901a2 Author: Taylor Robie Date: Mon Nov 7 11:24:25 2022 -0800 [Profiler] Handle ABA for TensorImpl* when assigning IDs (#87133) Part of the current ID assingment algorithm groups any Storages which are associated with the same TensorImpl*. This isn't sound (which I knew but deferred until it actually became a problem) because pointers can be reused by different objects. (ABA problem) ABA is easy to handle for Storage because we see allocations and frees, but ~TensorImpl is very hot and cannot tolerate profiling code without significant increases in overhead. This PR narrows the conditions under which ID assignment will join on TensorImpl*. Two storages which are associated with the same TensorImpl* are grouped IFF they were live at the same time. (Note that this still allows storages with disjoint lifetimes to be joined transitively through a third storage which overlaps with both.) The need for this PR arose in memory profiling. The Python argument parser creates short lived Tensors for (some) scalar arguments which triggers this issue. (Which is stochastic and platform dependent since optimizations like reusing recently freed allocations is implementation defined.) Spurious connections can lead to confusing and long range interactions when building up the memory profile, so it makes sense to harden ID assignment to avoid any issues. 
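The ABA hazard described for #87133 above concerns `TensorImpl*` reuse inside the profiler, which is not directly visible from Python; as a loose, hedged analogy only, address reuse of freed allocations can be observed like this (the sizes are arbitrary and the equality is allocator-dependent, so it may not reproduce):
```python
import torch

a = torch.empty(1024)
addr_a = a.data_ptr()
del a                       # frees the allocation
b = torch.empty(1024)       # same size: the allocator often hands back the same block
print(addr_a == b.data_ptr())   # frequently True: an address alone cannot identify an object (ABA)
```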
Differential Revision: [D40445121](https://our.internmc.facebook.com/intern/diff/D40445121/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87133 Approved by: https://github.com/slgong-fb, https://github.com/chaekit commit 078c25df13b1c24da994245a9879fe4b6c23ce23 Author: Nikita Shulga Date: Tue Nov 8 21:10:07 2022 +0000 [MPS][BE] Code cleanup (#88529) Various code cleanup in MPS operations: - Per @kulinseth suggestion move `mpsSupportsCumsum` to `MPSDevice.h` and rename it to `is_macos_13_or_newer()` - Move Ventura MPSGraph new operators to `MPSGraphVenturaOps.h` header - Use `LookupAs` and `CreateCachedGraphAs` to make code more compact - Formatting Pull Request resolved: https://github.com/pytorch/pytorch/pull/88529 Approved by: https://github.com/kulinseth commit 1d82eba98b22fb987c2085ea1f85a78f8d9b6f28 Author: Sherlock Huang Date: Tue Nov 8 06:57:30 2022 +0000 PatternMatcher supports matching list-typed args (#88656) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88656 Approved by: https://github.com/jerryzh168 commit 8e2627d42fda299749b2d1e4b3899009824928c5 Author: Peter Bell Date: Mon Nov 7 22:00:15 2022 +0000 [inductor] Fix aten.fmod lowering (#88602) Currently the lowering for aten.fmod promotes integral types to float and calls `tl.libdevice.fmod` whereas the ATen behavior is to use the modulo operator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88602 Approved by: https://github.com/jansel commit f556d73574ca55f39c61d2519f27b2d35dbed77b Author: Mengwei Liu Date: Tue Nov 8 19:53:11 2022 +0000 [torch] Implement aten::native_batch_norm.out for CPU (#88604) Summary: Implement `native_batch_norm.out` for CPU. Reuses the main logic for `native_batch_norm` but extract out the Tensor creation logic for outputs. There are 3 outputs: `output`, `save_mean` and `save_var`. `batch_norm_cpu` calls `batch_norm_cpu_update_stats_template` to get `save_mean` and `save_var`, and then calls into `batch_norm_cpu_transform_input_template` which initializes `output`. In the implementation of `batch_norm_cpu_out`, I did the following: * Let `batch_norm_cpu_transform_input_template` to take another argument `output`, ask the call sites to pass in a output Tensor. * Overload `batch_norm_cpu_update_stats_template` to take `save_mean` and `save_var`, ask the call sites to pass in those Tensors. * In `batch_norm_cpu_out`, pass `output`, `save_mean` and `save_var` all the way to our new `batch_norm_cpu_transform_input_template` and `batch_norm_cpu_update_stats_template`. * In `batch_norm_cpu`, prepare for these outputs and call `batch_norm_cpu_out`. Test Plan: Enable unit tests for `native_batch_norm.out`. Differential Revision: D40992036 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88604 Approved by: https://github.com/iseeyuan, https://github.com/jjsjann123 commit 3e30a9ea1cfdb8db87aedc76ae1d3edd5dc8ace5 Author: Eddie Yan Date: Tue Nov 8 19:44:23 2022 +0000 Fix `CUDA_MAX_THREADS_PER_SM` for `sm_87` (#88644) CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/88644 Approved by: https://github.com/ngimel commit 6bb7f4f29f0a36ce410ce53d824f531eaf74c76e Author: Edward Z. Yang Date: Tue Nov 8 06:12:03 2022 -0800 Minor error message improvements on meta functions (#88677) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88677 Approved by: https://github.com/SherlockNoMad commit d98a884b33ebf4ad6b34a19ee72499c7beb06893 Author: PyTorch MergeBot Date: Tue Nov 8 19:04:25 2022 +0000 Revert "[cuDNN] (re-open) Enable cuDNN Frontend v8 API by Default (#87669)" This reverts commit 3c6bddc3f6347ce7d1ed33aee94cdaa953cbc387. Reverted https://github.com/pytorch/pytorch/pull/87669 on behalf of https://github.com/eqy due to investigating convnext benchmark regressions commit 5eecfcf5f3d118a9e2d502dfb8689018c9591662 Author: Nikita Shulga Date: Tue Nov 8 18:52:56 2022 +0000 Run libtorch trunk build on linux.4xlarge (#88683) Add optional `runner` input to `_linux-build.yml` Move `libtorch-linux-bionic-cuda11_6-py3_7-gcc7-build` to `linux.4xlarge` as it occasionally OOMS on 2xlarge one Pull Request resolved: https://github.com/pytorch/pytorch/pull/88683 Approved by: https://github.com/atalman, https://github.com/weiwangmeta commit eaf4fe3d2b7096579b05b52d543756f74d0e91e7 Author: zyq8709 Date: Tue Nov 8 18:46:56 2022 +0000 Most recently used cache management for TorchDynamo (#88076) Modify the lookup procedure for TorchDynamo caches to keep the head of the single linked list as the most recently used cache entry, which may potentially improve probability for cache hitting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88076 Approved by: https://github.com/jansel commit 1b5373fc830f9dc58e98d26645fba91d96cc13da Author: Edward Z. Yang Date: Tue Nov 8 05:33:18 2022 -0800 Mark as_strided_ as supporting SymInt in C++ (#88674) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88674 Approved by: https://github.com/anjali411 commit dba887766b8b3924d6e39a65c88d8e554f76c861 Author: PyTorch MergeBot Date: Tue Nov 8 18:37:48 2022 +0000 Revert "torchdynamo support modules() for nn_module (#88023)" This reverts commit 96104c7b7e908634a473792b6b2e9279d79d23d8. Reverted https://github.com/pytorch/pytorch/pull/88023 on behalf of https://github.com/ydwu4 due to [Internal breakages] https://www.internalfb.com/intern/sandcastle/job/9007200067589062/ commit 860e354d1c3276bc445071b19e45357d129ed535 Author: Edward Z. Yang Date: Tue Nov 8 10:23:53 2022 -0800 Support diag_embed.out decomposition (#88671) This is a little tricky: there is a diag_embed.out, but its not bound in Python because it's autogenerated, see https://github.com/pytorch/pytorch/issues/88598 So I can't "just" add the out variant to the ref, as this makes it inconsistent with the torch API. To workaround this, I mark the ref as supporting out, but not the original function. This is useful to do, because it means that diag_embed.out now supports symbolic shapes. However, this cannot be easily tested because I can't mark the out variant as being supported in the normal OpInfo test. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88671 Approved by: https://github.com/mruberry commit 3f6a560184d90b19e298477775d05c5996e6abbc Author: Edward Z. Yang Date: Tue Nov 8 05:28:04 2022 -0800 Correctly test that dtype/device match in generated .out kernels for composites (#88672) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88672 Approved by: https://github.com/anjali411 commit 245144a6361ec3b89012a63a4956646718a4d080 Author: Edward Z. Yang Date: Tue Nov 8 05:30:06 2022 -0800 Propagate layout and pin memory in randint to inner constructor (#88673) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88673 Approved by: https://github.com/anjali411 commit 96104c7b7e908634a473792b6b2e9279d79d23d8 Author: Yidi Wu Date: Tue Nov 8 18:22:03 2022 +0000 torchdynamo support modules() for nn_module (#88023) Differential Revision: D40820879 This diff allows models to call self.modules() during dynamo tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88023 Approved by: https://github.com/tugsbayasgalan, https://github.com/voznesenskym, https://github.com/jansel commit ee28b865ee9c87cce4db0011987baf8d125cc857 Author: Kurt Mohler Date: Tue Nov 8 18:11:01 2022 +0000 Deprecate TypedStorage, its derived classes, and all of their public methods (#85303) Part of #85302 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303 Approved by: https://github.com/ezyang commit 53ca5ad347451f3087dedc8df5c1a34663812a6b Author: Natalia Gimelshein Date: Tue Nov 8 17:06:28 2022 +0000 enable scalar reduction with dim=-1 (#88628) Tested with all samples for `sum`, but also fixes all samples errors on other reductions (amin, amax, any, all etc) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88628 Approved by: https://github.com/desertfire commit 89c5819626b5b4edd2f000c6baf2ef56fa93458f Author: Will Constable Date: Tue Nov 8 02:22:01 2022 +0000 Dynamo DDP accuracy bench uses find_unused_parameters (#88645) - find_unused_parameters adds a slight overhead, but is required in cases where users do not manually specify parameters to ignore which will not receive grads. In some models, some parameters do not receive grads, and this causes DDP to throw an exception as it waits for a grad for each parameter Pull Request resolved: https://github.com/pytorch/pytorch/pull/88645 Approved by: https://github.com/soumith commit fcc28834765fc4dbff85f8b3992f8a72fc739694 Author: albanD Date: Mon Nov 7 10:24:20 2022 -0500 Clean up SymFloat binding to cover all functions (#88370) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88370 Approved by: https://github.com/ezyang commit 6abaa5946dc21d7836d5d46b6acc84f61f38f970 Author: albanD Date: Mon Nov 7 10:23:18 2022 -0500 Fix categorization of sym_int method (#88369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88369 Approved by: https://github.com/ezyang, https://github.com/bdhirsh, https://github.com/anjali411 commit bc66ddb5cb276f3ef0be4d73819f1b172e0872d1 Author: Howard Huang Date: Mon Nov 7 15:44:31 2022 -0800 Add torch.distributed.DistBackendError exception type, thrown from C10D_NCCL_CHECK (#88134) Currently all of the distributed errors are thrown from the `TORCH_CHECK` macro which throws a generic `RuntimeError`. This change introduced a new error type `DistBackendError` which derives from `RuntimeError` to signify there was an error with the backend communication library. This allows for better error handling and analysis at higher levels in the stack. 
Motivation: https://docs.google.com/document/d/1j6VPOkC6znscliFuiDWMuMV1_fH4Abgdq7TCHMcXai4/edit#heading=h.a9rc38misyx8 Changes: - introduce new error type - Update `C10D_NCCL_CHECK` Sample script to demonstrate new error type ```python import torch import torch.distributed as dist if __name__ == "__main__": dist.init_process_group("nccl") dist.broadcast(torch.tensor([1, 2, 3]).cuda(), 0) ``` Differential Revision: [D40998803](https://our.internmc.facebook.com/intern/diff/D40998803) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88134 Approved by: https://github.com/rohan-varma commit 1a7c4b0de71de290a1b35cd96fb2ca6e7d24b131 Author: lezcano Date: Mon Nov 7 16:32:25 2022 +0000 Create _make_alias to preserve the name of a function when creating an alias (#88114) Before, we would inherit the name of the aliased function, which was very confusing, and disallowed some homogeneous treatment of references, as we do later in this stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/88114 Approved by: https://github.com/mruberry commit af09270e10bbd063f9cdede03ba0ef27f0607304 Author: jjsjann123 Date: Tue Nov 8 12:06:35 2022 +0000 nvprims bookend non compute (#88457) Cherry-pickeding: https://github.com/csarofeen/pytorch/pull/2099 1. enabling bookend non-compute-ops pass on nvfuser 2. fixing bookend op check on intermediate tensor as partition inputs 3. python tests added for: `getitem` special handling bookend_non_compute removal 4. patching dfs by excluding dfs within partition to avoid going over recursion limitation Pull Request resolved: https://github.com/pytorch/pytorch/pull/88457 Approved by: https://github.com/SherlockNoMad commit 8cb5c5543e92628782deb00dda78380076b89e66 Author: Huy Do Date: Tue Nov 8 08:32:45 2022 +0000 Revive static_runtime_benchmark build and test (#87660) This build uses the wrong BUILD_ENVIRONMENT `pytorch-linux-focal-py3`, thus it hasn't been run for a long time (forgotten). The name was probably the old name of the build environment we used in the past. The convention today doesn't have the `pytorch-` prefix. There is a TODO for this: > TODO: this condition is never (BUILD_ENVIRONMENT doesn't start with pytorch-), need to fix this. This is done as part of [T131829540](https://www.internalfb.com/intern/tasks/?t=131829540), where we want `static_runtime_benchmark` build and test jobs to run in OSS CI to avoid breaking internal * I also fix some compiler warning errors `-Werror=sign-compare`, `-Werror,-Wunused-const-variable`, and gcc7 compatibility issue along the way because this hasn't been run for a long time. * Reviving this test also reveals a small bug in `PrepackWeights` test in `test_static_runtime.cc` added recently in https://github.com/pytorch/pytorch/pull/85289. The test refers to an internal ops and should only be run internally. This has been fixed by https://github.com/pytorch/pytorch/pull/87799 (To be merged) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87660 Approved by: https://github.com/malfet commit 02c1a304fa801942258a15a7e50abaa92aca2ddf Author: Michael Suo Date: Tue Nov 8 06:29:11 2022 +0000 [ci] increase timeout time of ios test app build (#88611) We were timing out; 5 minutes seems a bit short. 
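Complementing the sample script for #88134 above, a hedged sketch of what catching the new exception type could look like on the user side, assuming it is exposed as `torch.distributed.DistBackendError` (per the PR it derives from `RuntimeError`, so existing handlers keep working):
```python
import torch
import torch.distributed as dist

try:
    dist.init_process_group("nccl")
    dist.broadcast(torch.tensor([1, 2, 3]).cuda(), src=0)
except dist.DistBackendError as err:
    # Backend/communication-library failure, as opposed to a generic RuntimeError.
    print(f"NCCL backend error: {err}")
```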
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88611 Approved by: https://github.com/clee2000, https://github.com/huydhn, https://github.com/ZainRizvi commit 8f66ae413f8c9d7f2418d7f0b9f69d409c455b46 Author: Taylor Robie Date: Mon Nov 7 16:07:13 2022 -0800 [Autograd] Use in-place input accumulation fast path for dense Tensors. (#88339) There is a fast path in InputBuffer to steal memory when use count is zero, however it is only used for sparse Tensors. According to Natalia, this is just because it wasn't obvious that there would be a benefit for dense Tensors so there was no reason to live dangerously. However I've noticed large Tensors in internal models which would benefit from this optimization as well. Differential Revision: [D40946601](https://our.internmc.facebook.com/intern/diff/D40946601/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88339 Approved by: https://github.com/ngimel commit ffb6e68962a6c376ffb658752877e939d14c2f6d Author: Charlie Yan Date: Tue Nov 8 05:12:18 2022 +0000 Add missing args to DDP constructor in distributed.pyi (#88209) Summary: As title. And remove all unnecessary `pyre-fixme` for the unknown arg in call-site. Test Plan: CI Differential Revision: D40874013 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88209 Approved by: https://github.com/zhaojuanmao commit ced71e8e82b8c1a035716c671da51f16b49f4eb5 Author: biubiuX <4338192+biubiuX@users.noreply.github.com> Date: Tue Nov 8 04:49:45 2022 +0000 [Pytorch] add an option to disable TORCH_WARN and TORCH_WARN_ONCE log (#87188) Summary: Add an option to disable TORCH_WARN, some op could trigger spammy TOCH_WARN log which is not desired under certain scenario. Test Plan: Tested with -pt.disable_warn = 1 and -pt.disable_warn = 0 verified TORCH_WARN and TORCH_WARN_ONCE are properly handled tested with -pt.strip_error_messages = 1, -pt.disable_warn = 0 verified strip error message is respected when warn is printed Differential Revision: D40321550 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87188 Approved by: https://github.com/kurtamohler, https://github.com/ezyang commit ed97e0aa2918e687309ee9a146c8294aefb237d2 Author: PyTorch MergeBot Date: Tue Nov 8 03:29:52 2022 +0000 [vision hash update] update the pinned vision hash (#88465) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88465 Approved by: https://github.com/pytorchbot commit 9f11ce7f67612d1c11f1a6a9b264779b27062e82 Author: BoringCrypto Date: Tue Nov 8 03:26:44 2022 +0000 Setting pickle_module isn't working (#88570) When setting the pickle_module it currently always gets overwritten by the pickle module. This should only happen when the pickle_module isn't specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88570 Approved by: https://github.com/kit1980 commit 825f4e602b766545a4ee6dfd971056e24c7dbbe8 Author: Edward Z. Yang Date: Mon Nov 7 11:13:07 2022 -0800 Add support for symbolic shapes to sparse tensor (#88573) Along the way, I undid making sparse/dense dim symint (they're dimensions, so they should be static.) Also symintify set_indices_and_values_unsafe There is a little bit of a nontrivial infra change here: previously, we didn't populate the strides field on sparse tensors. 
It is now populated with "empty" strides, and this meant that sparse tensors were falsely reporting they were non-overlapping dense/contiguous. I added in a hack to work around this case. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88573 Approved by: https://github.com/anjali411 commit c29502dd2fa38c79ada620fbde2f61d58df6e219 Author: Jiewen Tan Date: Tue Nov 8 02:22:02 2022 +0000 [LTC] Remove view (#88445) Summary: This pull request removes the last view op, the original view. Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.TestView* Pull Request resolved: https://github.com/pytorch/pytorch/pull/88445 Approved by: https://github.com/JackCaoG, https://github.com/antoniojkim, https://github.com/Krovatkin commit f2000842a864ed4c2287aa3a821ab8a9224ad52b Author: Nikita Shulga Date: Tue Nov 8 01:46:25 2022 +0000 Do not use double for single-prec upsample (#88277) I'm not sure what would be the best behaviour here, but it feels a bit strange to perform parts of `float32` computations as `float64` and then downcast them back to `float32`. Use `at::opmath_type` rather than `at::acc_type` as no accumulation is used in the op. I don't know much about double vs single precision scalar perf on x86 CPU, but before the change: ``` python -c "import timeit;import torch;x=torch.arange(100, dtype=torch.float32).reshape(1, 1, 10, 10); print(timeit.Timer(stmt='torch.nn.functional.interpolate(x, scale_factor=2.0, mode=\"bilinear\", align_corners=False)', globals={'x':x, 'torch':torch}).timeit())" 11.337517574429512 ``` After the change: ``` $ python -c "import timeit;import torch;x=torch.arange(100, dtype=torch.float32).reshape(1, 1, 10, 10); print(timeit.Timer(stmt='torch.nn.functional.interpolate(x, scale_factor=2.0, mode=\"bilinear\", align_corners=False)', globals={'x':x, 'torch':torch}).timeit())" 10.513805857859552 ``` I.e. roughly 7% perf improvement (measured on Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz) NOTE: - `aten::acc_type` yields `double` - `aten::opmath_type` returns `float`. Fixes https://github.com/pytorch/pytorch/issues/87968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88277 Approved by: https://github.com/mingfeima, https://github.com/ngimel, https://github.com/jgong5 commit 4ea2310f1e4410b439430e42450e176463a960c2 Author: Kazuaki Ishizaki Date: Tue Nov 8 01:33:36 2022 +0000 Fix typos used in documents under torch directory (#88483) This PR fixes typos, in comments of Python files, that are found via the search box at https://pytorch.org/docs/master/search.html. This is a follow-up of #88300. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88483 Approved by: https://github.com/kit1980 commit d25be63c05889250212249e3cd87e48d12c4f9c1 Author: Huy Do Date: Tue Nov 8 01:17:35 2022 +0000 [Reland] Use sudo when reset NVIDIA devices (#88605) I accidentally deleted my remote branch, so I need to create a new PR for this fix (instead of updating the reverted PR https://github.com/pytorch/pytorch/pull/88531) TIL, sudo echo doesn't do what I think it does; the correct syntax should be `echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset`, granting sudo permission to the latter tee command.
Due diligence: actually log in to `i-07e62045d15df3629` and make sure that the command works Pull Request resolved: https://github.com/pytorch/pytorch/pull/88605 Approved by: https://github.com/ZainRizvi commit c77368d41615835e5124affe79f88feed93e8855 Author: Antoni Viros i Martin Date: Tue Nov 8 00:03:14 2022 +0000 Implement a constructor for nested_tensor that is similar to torch.tensor() (#88213) Summary: This diff merges both previous implementations of constructors for nested tensors, the one from lists of tensors and the one with arbitrary python lists, and implements it in pytorch core so no extensions are needed to construct NT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88213 Approved by: https://github.com/cpuhrsch commit 72a7351993c953500bd8cdb1fb7a9e33aaa7ef9d Author: Huy Do Date: Mon Nov 7 23:53:17 2022 +0000 Pin linux ninja dep to 1.10.2 (#88548) The latest version 1.11.1 breaks PyTorch CI. A bunch of tests are failing now in master https://hud.pytorch.org/pytorch/pytorch/commit/d1ee0730410ac910760c0a21156e574093a0d15a. Curiously, the latest commit https://hud.pytorch.org/pytorch/pytorch/commit/81042d3a53335259c60e5aa8c9b9614c3d87b05f looks green, but it's good to pin this dependency anyway. https://github.com/pytorch/pytorch/blob/master/.circleci/docker/requirements-ci.txt#L95-L97 has a curious note about ninja and why it's not part of the docker container (need to revisit this later on). This is one more reason to justify the effort of consolidating all pip and conda dependencies to get rid of this family of issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88548 Approved by: https://github.com/clee2000 commit fdf286510828e149b896235db07d48ab51cd1121 Author: Huy Do Date: Mon Nov 7 23:49:19 2022 +0000 Use test/test-reports for inductor (#88533) So that the test reports can be picked up automatically by the CI and uploaded to S3. Later on, this will allow querying these test reports from our Rockset DB. For example https://github.com/pytorch/pytorch/actions/runs/3382363153/jobs/5617382531 `Upload test statistics` shows: ``` + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test No tests in reports found in test ``` https://hud.pytorch.org/pytorch/pytorch/commit/678d038001b0bd61501739ea97989d28f758343e inductor artifacts are also empty zip at the moment Pull Request resolved: https://github.com/pytorch/pytorch/pull/88533 Approved by: https://github.com/desertfire commit eb3f975c6e29104014fa9bbffe12ab32709672d9 Author: Peter Bell Date: Sun Nov 6 23:38:12 2022 +0000 Fix segfault in has_torch_function (#88559) Fixes #83908 `PySequence_Fast` may return `NULL` to indicate an error was raised, in which case `sequence_has_torch_function` will dereference a null pointer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88559 Approved by: https://github.com/ezyang, https://github.com/Skylion007, https://github.com/hameerabbasi commit 4796e23bbbdcbfa9110338af3c445ca366bd0b2b Author: Huy Do Date: Mon Nov 7 23:05:11 2022 +0000 Fix pull docs build running with a schedule and increase cpp doc timeout to 4h (#88589) * After https://github.com/pytorch/pytorch/pull/88373, pull workflow can now be triggered with a schedule. This changes the assumption in the doc build workflow where the schedule event is used to determine if the docs should be pushed * I'll create a follow-up issue to see if it's possible to improve the performance of the cpp doc build job.
At the moment, it uses a linux.12xlarge runner and still couldn't finish the job after 3h Pull Request resolved: https://github.com/pytorch/pytorch/pull/88589 Approved by: https://github.com/seemethere, https://github.com/ZainRizvi commit d453b3c4d4b1cc5a0c626221a1f389dfa862ca5e Author: lezcano Date: Mon Nov 7 19:40:25 2022 +0000 Add a note on the stability of linalg functions. (#88313) This was long-due, as it keeps comming up in issues. Fixes https://github.com/pytorch/pytorch/issues/85950 Fixes https://github.com/pytorch/pytorch/issues/59720 Fixes https://github.com/pytorch/pytorch/issues/59782 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88313 Approved by: https://github.com/soumith, https://github.com/mruberry commit b00c43b310e7544ed74daa84a9638fddbe190304 Author: PyTorch MergeBot Date: Mon Nov 7 22:29:56 2022 +0000 Revert "fallback for scatter_(scalar) (#88210)" This reverts commit 896fa8c5c9b0191c9621e04ab5e20057614d48ad. Reverted https://github.com/pytorch/pytorch/pull/88210 on behalf of https://github.com/suo due to this broke inductor tests, see: https://hud.pytorch.org/pytorch/pytorch/commit/896fa8c5c9b0191c9621e04ab5e20057614d48ad commit 0e67b2f7dd13db1fea421d860ede65a653738dfe Author: William Wen Date: Mon Nov 7 22:24:44 2022 +0000 Dynamo Dashboard Improvements (#88516) Implement various features in https://github.com/pytorch/torchdynamo/issues/1644: - Upload nightly run logs to /fsx before parsing - for backing up parsing failures. - Flag models with (1) < 0.95x speedup, (2) > 2min compile time, (3) < 0.9x compression ratio - Flag models that were passing yesterday but failed today. - Other small bug fixes. See https://github.com/pytorch/torchdynamo/issues/1831 for sample outputs. Also tested by running run.sh: ```bash rm -rf ../test-dynamo-runner-logs-3/ mkdir ../test-dynamo-runner-logs-3/ python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=../test-dynamo-runner-logs-3//inductor_torchbench_float32_training_cuda_performance.csv --training --inductor --no-skip --dashboard --only mobilenet_v2 --cold_start_latency python benchmarks/dynamo/torchbench.py --accuracy --float32 -dcuda --output=../test-dynamo-runner-logs-3//inductor_torchbench_float32_training_cuda_accuracy.csv --training --inductor --no-skip --dashboard --only mobilenet_v2 ``` with the command `python benchmarks/dynamo/runner.py --output-dir ../test-dynamo-runner-logs-3/ --dashboard-archive-path /data/home/williamwen/dynamo-runner-logs-copy --training --run --compilers inductor --flag-compilers inductor --suites torchbench --update-dashboard` (need to comment out the `generate_commands` line and change the github issue ID from 681 to something else). Pull Request resolved: https://github.com/pytorch/pytorch/pull/88516 Approved by: https://github.com/anijain2305 commit b14e06503a67ed72c2a84462d34e7494f3ead5b1 Author: Aaron Gokaslan Date: Mon Nov 7 22:17:10 2022 +0000 (fix): Add some missing std::moves to C10 (#88512) I saw some missed optimization opportunities in C10 using std::move and thought I would submit a PR to fix them. There are particularly a lot of them dealing with the symbolic operators which are used in quite a few places including in loops. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88512 Approved by: https://github.com/ezyang commit d8506ff42b3d0dd8d25ab989967daffba13268cd Author: lezcano Date: Mon Nov 7 19:21:24 2022 +0000 Generalize gesvdjBatched to run whith full_matrices==false (#88502) As brought up in https://github.com/pytorch/pytorch/issues/86234#issuecomment-1268296036, our heuristic for which SVD backend to choose was not great in some cases. The case in which there could be some improvements is when we have a large batch of very small non-square matrices. This PR, adapts the calling code to gesvdj by creating two temporary square buffers to allow to call gesvdjBatched, and then copies back the result into the output buffers. We then modify the heuristic that chooses between gesvdj and gesvdjBatched. Fixes https://github.com/pytorch/pytorch/issues/86234 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88502 Approved by: https://github.com/IvanYashchuk, https://github.com/nikitaved, https://github.com/mruberry, https://github.com/xwang233 commit 9dadf8fcc21413fe12ea2c81d970f4877a9235a3 Author: Vitaly Fedyunin Date: Mon Nov 7 10:30:55 2022 -0500 [DataPipes] Add group support to the sharding_filter (#88424) Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88424 Approved by: https://github.com/ejguan commit 23a3eb37cfa52fcbfb766bd733cfa60b28b83f42 Author: Edward Z. Yang Date: Mon Nov 7 08:51:15 2022 -0800 SymIntify _copy functionalization kernels (and _copy_out too) (#88572) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88572 Approved by: https://github.com/anjali411, https://github.com/bdhirsh commit 896fa8c5c9b0191c9621e04ab5e20057614d48ad Author: Nikolay Korovaiko Date: Mon Nov 7 21:25:55 2022 +0000 fallback for scatter_(scalar) (#88210) `scatter_reduce_` overloads can only accept `Tensor src`. `scatter_`, on the other hand, can accept `Number src`. 
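For context, a minimal sketch of the signature difference described above (the example tensors are made up, not taken from the PR):
```python
import torch

x = torch.zeros(3, 5)
index = torch.tensor([[0, 1, 2]])

# scatter_ has an overload that accepts a plain Python number as src ...
x.scatter_(0, index, 1.0)

# ... while scatter_reduce_ only takes a Tensor src, so a scalar must be
# materialized as a tensor first.
x.scatter_reduce_(0, index, torch.ones_like(index, dtype=x.dtype), reduce="sum")
```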
Switching a fallback from `scatter_reduce_` to `scatter_` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88210 Approved by: https://github.com/desertfire commit 0a69c50a46d50ae265e2d1d826d0b4b69d4351fd Author: Jane Xu Date: Mon Nov 7 21:15:07 2022 +0000 Publicly expose _LRScheduler to LRScheduler (#88503) Fixes #61232 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88503 Approved by: https://github.com/soulitzer commit 05b9e8ec00274ffb8dc94b974d1335d5986f9620 Author: Huy Do Date: Mon Nov 7 21:04:02 2022 +0000 Upload test stats for inductor workflow (#88535) We miss this new workflow, so none of its test stats are uploaded to rockset Pull Request resolved: https://github.com/pytorch/pytorch/pull/88535 Approved by: https://github.com/desertfire commit a37524085df7685820f9c15c39d95da077d49be7 Author: Yu Guo Date: Fri Nov 4 16:51:35 2022 -0700 [torchdynamo] support torch.autograd._profiler_enabled (#88378) fix https://github.com/pytorch/torchdynamo/issues/1826 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88378 Approved by: https://github.com/voznesenskym commit 95d57b54e024c4d0442c0c76cb37b1b3ac06db26 Author: Sherlock Huang Date: Fri Nov 4 17:10:21 2022 +0000 Handle pin_memory in refs.randn (#88473) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88473 Approved by: https://github.com/mruberry commit bf49dada1e3b94621823f0d9017081683f107ece Author: Michael Suo Date: Mon Nov 7 08:57:51 2022 -0800 [nvfuser] skip extremal tests on rocm (#88587) Summary: These are failing in rocm so disable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88587 Approved by: https://github.com/ZainRizvi, https://github.com/huydhn commit 7bf9db81c5b19fb1fb5c2056e03f183a85ebfc5c Author: PyTorch MergeBot Date: Mon Nov 7 19:59:42 2022 +0000 Revert "Use sudo when reset NVIDIA devices (#88531)" This reverts commit 505486ce9321bc22d2156a1a9b97fe474a05b53b. Reverted https://github.com/pytorch/pytorch/pull/88531 on behalf of https://github.com/huydhn due to Wrong sudo echo usage, should use tee instead commit 78a0ca29d939fc3017c3281730ba19ece5162f5c Author: PyTorch MergeBot Date: Mon Nov 7 18:51:16 2022 +0000 Revert "[fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)" This reverts commit 54b6188cc6dee45b775d688223b847dc8ea85bff. 
Reverted https://github.com/pytorch/pytorch/pull/81616 on behalf of https://github.com/mehtanirav due to Internal publishing is broken commit 91a403984255418142abcf0966f2aa02ff4ae5ef Author: Angela Yi Date: Mon Nov 7 18:42:41 2022 +0000 [exir][fx] PassManager error handling (#88520) Summary: * Added an error message for when the result is not a PassResult * Modified the error handling to capture exceptions that happen in the check() function * consolidated inplace_wrapper and pass_result_wrapper Test Plan: CI Differential Revision: D40950135 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88520 Approved by: https://github.com/SherlockNoMad commit bd1ffc6501376c6a00dec67d2dd8482470a140b5 Author: Yanbo Liang Date: Mon Nov 7 18:03:31 2022 +0000 [Dynamo] Fix bug: GradMode doesn't carry grad state correctly after graph break (#88537) Fixes https://github.com/pytorch/torchdynamo/issues/1446 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88537 Approved by: https://github.com/jansel commit 6663ae5537f3c61030ba4d425bd57a097c51430a Author: Rodrigo Kumpera Date: Mon Nov 7 17:56:40 2022 +0000 [2/n] Thread PG: add class _World to distributed_c10d.py (#781) (#88471) Summary: X-link: https://github.com/pytorch/torchrec/pull/781 Move a bunch of globals to instance methods and replace all use to them. We move all PG related globals under World and use a singleton instance under _world. This creates an undocumented extension point to inject full control of how how c10d state behaves. One simple hack is to change _world to an implementation that uses a threadlocal and enable per-thread PGs. It almost get DDP working and the PG is missing an implementation of all_reduce. This enables notebook usage of PTD, which is a big deal for learning it: https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68 This change ensures BC by keeping the global variables around and have the default _World wrap it. I have relinked this diff to a new github PR, so that I can update it. The original PR is > Pull Request resolved: https://github.com/pytorch/pytorch/pull/86348 Differential Revision: D40236769 Pulled By: yhcharles Pull Request resolved: https://github.com/pytorch/pytorch/pull/88471 Approved by: https://github.com/gnadathur, https://github.com/rohan-varma commit fc8f2f66fecea51c80357c424ab6b336b744ca80 Author: Zain Rizvi Date: Mon Nov 7 17:38:42 2022 +0000 Clarify rules for which commit is used in CI (#88425) The old information was out of date. Updating it as per @janeyx99's feedback Pull Request resolved: https://github.com/pytorch/pytorch/pull/88425 Approved by: https://github.com/malfet commit c407a7b20330afb957944ad26633a388220a4e43 Author: Huy Do Date: Mon Nov 7 17:26:28 2022 +0000 Upgrade Linux NVIDIA driver to the latest prod version (#88517) The driver (515.76) is downloaded from https://www.nvidia.com/en-us/drivers/unix. This should help address the issue with A10G GPU on G5 runners according to NVIDIA. This is to address https://github.com/pytorch/pytorch/issues/88352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88517 Approved by: https://github.com/ZainRizvi commit 505486ce9321bc22d2156a1a9b97fe474a05b53b Author: Huy Do Date: Mon Nov 7 17:19:02 2022 +0000 Use sudo when reset NVIDIA devices (#88531) Per title, I should have known, i.e. 
https://ossci-raw-job-status.s3.amazonaws.com/log/9307292415 ``` 2022-11-04T23:52:18.2921665Z + echo 1 2022-11-04T23:52:18.2921862Z Reseting 0000:00:1e.0 (enabled state: 0) 2022-11-04T23:52:18.2922186Z .github/scripts/install_nvidia_utils_linux.sh: line 77: /sys/bus/pci/devices/0000:00:1e.0/reset: Permission denied ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88531 Approved by: https://github.com/ZainRizvi commit cec4bd99b05a0beb548a821c5efc8a02833ba2c3 Author: Nikolay Korovaiko Date: Mon Nov 7 17:02:08 2022 +0000 allow XLA folks update the pin (#88527) This is one of the files the XLA team needs to update occasionally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88527 Approved by: https://github.com/wconstab commit a16ced03c93dcbc5b08d0f9a36f8feab583f129a Author: Brian Hirsh Date: Fri Nov 4 14:20:19 2022 -0700 reland "fix as_strided_scatter_backward (#87646)" (#88342) This reverts commit 71fb763e5452881cb3be8fefa9419b785d0a61e2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88342 Approved by: https://github.com/zou3519 commit dd43903fa99b8549225ec63c2e81ef4693436be0 Author: Mike Iovine Date: Mon Nov 7 14:36:39 2022 +0000 [Static Runtime] Fix tensor_split sections overload (#88113) Summary: D40798763 broke this op. Unfortunately, it wasn't caught at land time due to the recent OSS Static Runtime test problems. The problem is C++ overload resolution. After D40798763, the int that we were passing to `at::native::tensor_split` was getting implicitly converted to `IntArrayRef`. Fix this by converting the int to a `SymInt` and calling the correct overload. Test Plan: ``` buck2 test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Tensor_Split --run-disabled ``` Differential Revision: D40862394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88113 Approved by: https://github.com/hlu1 commit 7076a6481d9f6d3ed40af1eac285fe5046a87531 Author: PyTorch MergeBot Date: Mon Nov 7 10:22:44 2022 +0000 [xla hash update] update the pinned xla hash (#88070) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88070 Approved by: https://github.com/pytorchbot commit ad27d762a7457c6a7f5b0c4c6778935c282df71b Author: Wang, Eikan Date: Fri Nov 4 05:28:18 2022 +0000 Support sign for HF models like ElectraForQuestionAnswering (#88160) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88160 Approved by: https://github.com/jansel commit a9d37ce8f50a3111cc9eaf4f633decd092b9d726 Author: Wang, Eikan Date: Fri Nov 4 05:28:17 2022 +0000 Support reduction vectorization (#87356) This PR is to optimize the reduction implementation by `at::vec`. The main idea is the same as the aten implementation. - Step1: Parallelize and vectorize the reduction implementation - Step2: Invoke `at::vec::vec_reduce_all` to reduce the vector generated at step 1 to a single scalar - Step3: Handle the tail elements For the implementation, we create two kernels - `CppVecKernel` and `CppKernel`. The code block generation is as follows, step by step.
- Gen the non-reduction loop - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1008-L1010) - Gen the reduction initialization both for vectorization and non-vectorization kernel - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1015) - Gen the reduction loop for the vectorization kernel - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1021-L1023) - Gen the code to reduce the vector to scalar - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1033) - Gen the reduction loop for the non-vectorization kernel - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1042) - Do some post-reduction things like store reduction value - [Code](https://github.com/pytorch/pytorch/blob/gh/EikanWang/9/head/torch/_inductor/codegen/cpp.py#L1049) ```python for loop in CppVecKernel.NoneReductionLoop: CppVecKernel.ReductionPrefix for loop in CppVecKernel.ReductionLoop CppVecKernel.Loads CppVecKernel.Compute CppVecKernel.Stores CppVecKernel.ReductionSuffix for loop in CppKernel.ReductionLoop CppKernel.Loads CppKernel.Compute CppKernel.Stores CppKernel.ReductionSuffix ``` The code snippet for maximum reduction exemplifies the idea. More detailed comments are inlined. ```C++ { // Declare reduction for at::vec::Vectorized since it is not built-in data type. float tmp4 = 0; // tmp4_vec is used to vectorize the sum reduction for tmp4 auto tmp4_vec = at::vec::Vectorized(tmp4); float tmp6 = 0; // tmp6_vec is used to vectorize the sum reduction for tmp6 auto tmp6_vec = at::vec::Vectorized(tmp6); { // Parallelize the vectorized reduction for(long i0=0; i0<192; i0+=1) { auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); auto tmp2 = tmp0 - tmp1; auto tmp3 = tmp2.abs(); auto tmp5 = tmp2 * tmp2; tmp4_vec += tmp3; tmp6_vec += tmp5; } // Reduce the tmp4_vec as a scalar and store at tmp4 tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); // Reduce the tmp6_vec as a scalar and store at tmp6 tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); // Handle the tail elements that could not be vectorized by aten. 
for(long i0=1536; i0<1536; i0+=1) { auto tmp0 = in_ptr0[i0]; auto tmp1 = in_ptr1[i0]; auto tmp2 = tmp0 - tmp1; auto tmp3 = std::abs(tmp2); auto tmp5 = tmp2 * tmp2; tmp4 += tmp3; tmp6 += tmp5; } } out_ptr0[0] = tmp4; out_ptr1[0] = tmp6; } ``` Performance (measured by operatorbench; the baseline for the speedup ratio is aten operator performance):

Softmax (1,16,384,384,dim=3) | Speedup ratio (simdlen=None) | Speedup ratio (simdlen=8) + this PR
-- | -- | --
24c | 0.37410838067524177 | 0.9036240100351164
4c | 0.24655829520907663 | 1.0255329993674518
1c | 0.21595768114988007 | 1.000587368005134

HW Configuration: SKU: SKX Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz MemTotal: 196708148 kB MemFree: 89318532 kB MemBandwidth: 112195.1MB/S Pull Request resolved: https://github.com/pytorch/pytorch/pull/87356 Approved by: https://github.com/jgong5, https://github.com/jansel commit 6541e51ffd84b044cfde81bb2ea241a75a87952d Author: Wang, Eikan Date: Fri Nov 4 05:28:15 2022 +0000 Explicit vectorization support for TorchInductor (#87068) In this PR, we replace OMP SIMD with `aten::vec` to optimize TorchInductor vectorization performance. Take `res=torch.exp(torch.add(x, y))` as the example. The generated code is as follows if `config.cpp.simdlen` is 8. ```C++ extern "C" void kernel(const float* __restrict__ in_ptr0, const float* __restrict__ in_ptr1, float* __restrict__ out_ptr0, const long ks0, const long ks1) { { for(long i0=0; i0<((ks0*ks1) / 8); ++i0) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); auto tmp2 = tmp0 + tmp1; auto tmp3 = tmp2.exp(); tmp3.store(out_ptr0 + 8*i0); } for(long i0=8*(((ks0*ks1) / 8)); i0<(ks0*ks1); ++i0) { auto tmp0 = in_ptr0[i0]; auto tmp1 = in_ptr1[i0]; auto tmp2 = tmp0 + tmp1; auto tmp3 = std::exp(tmp2); out_ptr0[i0] = tmp3; } } } ``` Date: Mon Nov 7 05:48:22 2022 +0000 use faster cache flush in triton benchmarking (#88557) Speeds up autotuning a little bit more (about 90s -> 75s for coat_lite_mini) @bertmaher, I've put in a workaround so that internal doesn't break, but it can be removed once triton is updated internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88557 Approved by: https://github.com/anijain2305 commit eda247ee6ce2f8bc29d86ec94f3863f929a2ea6e Author: YJ Shi Date: Mon Nov 7 01:33:57 2022 +0000 [Dynamo] fix torchdynamo's TVM meta schedule backend (#88249) Note that the previous `optimize_torch` functionality of pytorch does not work with the default pytorch release (CXX11 ABI off), as TVM by default needs the CXX11 ABI for builds. Source: [1](https://discuss.tvm.apache.org/t/can-someone-please-give-me-the-steps-to-use-pt-tvmdsoop/12525), [2](https://discuss.pytorch.org/t/undefined-symbol-when-import-lltm-cpp-extension/32627). It would be easier for users to tune with meta schedule instead of finding a CXX11-compatible pytorch, turning on the `pt-tvmdsoop` flag in TVM and rebuilding it. This could be revisited once the `pt-tvmdsoop` flag is updated and turned on by default in TVM. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88249 Approved by: https://github.com/jansel commit 791d9ee2533d394dc26cff64de74df72d45835e4 Author: Peter Bell Date: Thu Nov 3 16:22:50 2022 +0000 [inductor] Add lowering for as_strided_scatter (#88379) Ref pytorch/torchdynamo#327 The use of as_strided does require in-memory manipulations, however this lowering allows those memory ops to be fused with any preceding calculations. e.g.
``` def f(a, b): return torch.as_strided_scatter( a * 8 + 10, b * 2 - 4, size=(a.numel() // 2,), stride=(2,)) ``` Before this compiles to two kernels and a call to `aten.as_strided_scatter` and with this PR it compiles to just two kernels and no additional operator calls. In theory I think this could be a decomposition, but in practice I saw the `output_view.copy_(src)` being optimized out in some cases when this was implemented as a decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88379 Approved by: https://github.com/jansel commit 81042d3a53335259c60e5aa8c9b9614c3d87b05f Author: PyTorch MergeBot Date: Sun Nov 6 02:29:53 2022 +0000 Revert "Reenable optimizer overlap tests (#88439)" This reverts commit da452bcadbc6f34989c6b3b0db6075a272aa9891. Reverted https://github.com/pytorch/pytorch/pull/88439 on behalf of https://github.com/huydhn due to This change breaks trunk due to a land race missing reason parameter to sandcastle_skip_if https://hud.pytorch.org/pytorch/pytorch/commit/da452bcadbc6f34989c6b3b0db6075a272aa9891 commit bbaa0637df93292eb372b355f01756437aed3ce9 Author: Nikita Karetnikov Date: Fri Nov 4 11:50:18 2022 +0100 Add error inputs to `gaussian_nll_loss` `OpInfo` (#88486) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88486 Approved by: https://github.com/lezcano commit 404f254e205a5aef6a21138d8db17f2ac9d031ae Author: Rohan Varma Date: Sat Nov 5 08:31:02 2022 +0000 Upstream apply_optim_in_backward from TorchRec (#87397) (#88539) Summary: Upstreaming this as part of sharing common APIs. This is just a plain move, any changes needed to support DDP / FSDP will come in follow up diffs. Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D40564646 fbshipit-source-id: 619c434e02196812f8d4db1e40d07290e08b18f9 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88539 Approved by: https://github.com/awgu commit da452bcadbc6f34989c6b3b0db6075a272aa9891 Author: Rohan Varma Date: Thu Nov 3 18:33:14 2022 +0000 Reenable optimizer overlap tests (#88439) Closes https://github.com/pytorch/pytorch/issues/73259. Not sure the root cause but CI seems fine with these tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88439 Approved by: https://github.com/awgu commit d1ee0730410ac910760c0a21156e574093a0d15a Author: Edward Z. Yang Date: Wed Nov 2 16:39:49 2022 -0400 Handle case when candidate is empty (#88359) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88359 Approved by: https://github.com/wconstab commit 46730aec35ee047b92b288e0366da0f7e993e5ae Author: Sherlock Huang Date: Fri Nov 4 23:11:17 2022 +0000 [Reland] Fix primTorch compute_elementwise_output_strides (#88525) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88525 Approved by: https://github.com/desertfire commit 0e3031f7e76fbd84e62650642dc334c11cc3c511 Author: Edward Z. Yang Date: Fri Nov 4 12:31:51 2022 -0700 Functionalize and compute joint simultaneously. (#88063) This also comes with some bug fixes that were uncovered from doing this: - Forward device calls to inner tensor in FunctionalTensorWrapper - Make legacyExtractDispatchKey exclude Functionalize, so that it can get at the real device type key. This is noncontroversial. - Stop stripping dense from key set. The reason for this is FunctionalWrapperTensor may be used in contexts where people query if it is dense or not. If it doesn't report this correctly (from the dispatch key), it will cause errors. 
This caused some torchbench models to fail when I did one-pass tracing. - Save and restore reapply views TLS correctly Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88063 Approved by: https://github.com/bdhirsh commit 957a9b63c5c2953da3a1d1fc86c20703c96b2fa6 Author: Sherlock Huang Date: Fri Nov 4 05:01:27 2022 +0000 fx.replace_pattern accepts pattern/replacement as GraphModule (#88479) Symbolic tracer is no longer the default tracer to produce fx graph. SubgraphRewriter should thus accept a raw GraphModule, rather than use symbolic tracer by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88479 Approved by: https://github.com/jerryzh168 commit 4bb5c2c2051371bfed09f9ec46416f3dba550c14 Author: Will Constable Date: Fri Nov 4 22:05:21 2022 +0000 Add docstring to DDPOptimizer (#88521) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88521 Approved by: https://github.com/aazzolini commit 1f32c3c087503151e87e235e78ebd92fe5090d79 Author: Will Constable Date: Fri Nov 4 21:00:01 2022 +0000 Add single-process DDP accuracy support to dynamo benchmark suite (#88511) - does not intend to support multi-process, as that is more complex and we have torchbench scripts for that - currently only works in accuracy mode as this was the main goal, but could be extended for measuring single-gpu perf impact of graph breaks Run with `python benchmarks/dynamo/torchbench.py --inductor --training --accuracy --only hf_Bert --ddp` Example output ``` cuda train hf_Bert [2022-11-04 18:52:08,304] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to complex input striding PASS ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88511 Approved by: https://github.com/davidberard98, https://github.com/aazzolini commit 3fd0729bb663d039204cbcea0726e028541a25ad Author: Will Constable Date: Fri Nov 4 16:27:48 2022 +0000 DDPOptimizer replace debug=True/False with using torchdynamo logger (#88480) Example output: ``` 2022-11-04 05:09:29,525] torch._dynamo.optimizations.distributed: [INFO] DDPOptimizer bucket assignments ┌─────────┬────────────┬───────────────────┐ │ Index │ Size (b) │ Param Names │ ├─────────┼────────────┼───────────────────┤ │ 0 │ 100120020 │ self_net_6_weight │ ├─────────┼────────────┼───────────────────┤ │ │ │ self_net_6_bias │ ├─────────┼────────────┼───────────────────┤ │ │ │ self_net_4_weight │ ├─────────┼────────────┼───────────────────┤ │ │ │ self_net_4_bias │ ├─────────┼────────────┼───────────────────┤ │ 1 │ 100020000 │ self_net_2_weight │ ├─────────┼────────────┼───────────────────┤ │ │ │ self_net_2_bias │ ├─────────┼────────────┼───────────────────┤ │ 2 │ 220000 │ self_net_0_weight │ ├─────────┼────────────┼───────────────────┤ │ │ │ self_net_0_bias │ └─────────┴────────────┴───────────────────┘ [2022-11-04 05:09:29,527] torch._dynamo.optimizations.distributed: [DEBUG] ---orig graph--- graph(): %inputs : torch.Tensor [#users=1] = placeholder[target=inputs] %self_net_0 : [#users=1] = call_module[target=self_net_0](args = (%inputs,), kwargs = {}) %self_net_1 : [#users=1] = call_module[target=self_net_1](args = (%self_net_0,), kwargs = {}) %self_net_2 : [#users=1] = call_module[target=self_net_2](args = (%self_net_1,), kwargs = {}) %self_net_3 : [#users=1] = call_module[target=self_net_3](args = (%self_net_2,), kwargs = {}) %self_net_4 : [#users=1] = call_module[target=self_net_4](args = (%self_net_3,), kwargs = {}) %self_net_5 : [#users=1] = call_module[target=self_net_5](args = (%self_net_4,), 
kwargs = {}) %self_net_6 : [#users=1] = call_module[target=self_net_6](args = (%self_net_5,), kwargs = {}) %self_net_7 : [#users=1] = call_module[target=self_net_7](args = (%self_net_6,), kwargs = {}) return (self_net_7,) ---split graph--- graph(): %inputs : torch.Tensor [#users=1] = placeholder[target=inputs] %submod_0 : [#users=1] = call_module[target=submod_0](args = (%inputs,), kwargs = {}) %submod_1 : [#users=1] = call_module[target=submod_1](args = (%submod_0,), kwargs = {}) %submod_2 : [#users=1] = call_module[target=submod_2](args = (%submod_1,), kwargs = {}) return (submod_2,) ---submod_0 graph--- graph(): %inputs : [#users=1] = placeholder[target=inputs] %self_net_0 : [#users=1] = call_module[target=self_net_0](args = (%inputs,), kwargs = {}) %self_net_1 : [#users=1] = call_module[target=self_net_1](args = (%self_net_0,), kwargs = {}) return self_net_1 ---submod_1 graph--- graph(): %self_net_1 : [#users=1] = placeholder[target=self_net_1] %self_net_2 : [#users=1] = call_module[target=self_net_2](args = (%self_net_1,), kwargs = {}) %self_net_3 : [#users=1] = call_module[target=self_net_3](args = (%self_net_2,), kwargs = {}) return self_net_3 ---submod_2 graph--- graph(): %self_net_3 : [#users=1] = placeholder[target=self_net_3] %self_net_4 : [#users=1] = call_module[target=self_net_4](args = (%self_net_3,), kwargs = {}) %self_net_5 : [#users=1] = call_module[target=self_net_5](args = (%self_net_4,), kwargs = {}) %self_net_6 : [#users=1] = call_module[target=self_net_6](args = (%self_net_5,), kwargs = {}) %self_net_7 : [#users=1] = call_module[target=self_net_7](args = (%self_net_6,), kwargs = {}) return self_net_7 --------------- ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88480 Approved by: https://github.com/anj-s, https://github.com/davidberard98 commit 52375a0fd2a5d16109c1ed4d25e1210d0df382a5 Author: jjsjann123 Date: Sat Nov 5 02:22:27 2022 +0000 nvprims native batch norm patch (#88455) Cherry-picking: https://github.com/csarofeen/pytorch/pull/2104 - [x] Added explicit cast on inputs to nvprims.native_batch_norm. This avoids the explicit cast, which gives us issue on fusion definition. - [x] add python repro with dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88455 Approved by: https://github.com/mruberry, https://github.com/IvanYashchuk commit b1116a51173f474d55798b82faeee92deef4f9a8 Author: Yanbo Liang Date: Sat Nov 5 00:17:15 2022 +0000 [Dynamo] Improve BuiltinVariable log when incorrect arg count happens (#88409) Fixes https://github.com/pytorch/torchdynamo/issues/1832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88409 Approved by: https://github.com/mlazos commit 5220d07d2ca3dd094b1d7aa7de242184291d342f Author: Michael Lazos Date: Fri Nov 4 23:26:44 2022 +0000 Fix minifier accuracy msg (#88515) Fixes https://github.com/pytorch/torchdynamo/issues/1809 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88515 Approved by: https://github.com/yanboliang, https://github.com/williamwen42 commit dde9affeaafff957a6d5bf98e33e4119b14cd2d5 Author: Mergen Nachin Date: Fri Nov 4 13:03:00 2022 -0700 Populate self.export in InstructionTranslatorBase (#88508) Summary: This is a followup to https://github.com/pytorch/pytorch/pull/88354/files#diff-622913fdb49db90d6f3a8ab225b4badb7996023e6498e9f7c6d03fe9f32d0986R836 Reference to self.export got added to InstructionTranslatorBase (i.e. STORE_ATTR) but self.export is populated only for InstructionTranslators. 
Here's an example failure ``` File "/scratch/williamwen/work/pytorch/torch/_dynamo/symbolic_convert.py", line 322, in step getattr(self, inst.opname)(inst) File "/scratch/williamwen/work/pytorch/torch/_dynamo/symbolic_convert.py", line 844, in STORE_ATTR not self.export AttributeError: 'InliningInstructionTranslator' object has no attribute 'export' ``` Let's populate with the base class with export flag. Test Plan: python test/dynamo/test_export_mutations.py python test/dynamo/test_export.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/88508 Approved by: https://github.com/tugsbayasgalan commit afdc2283ef09cecd2476725d02a770c4c297a3ce Author: Digant Desai Date: Fri Nov 4 23:01:45 2022 +0000 [QNNPACK] Add unaligned attributes where asan fails (#88276) Summary: Bypass "Runtime error: store to misaligned address [...] for type 'uint16_t' (aka 'unsigned short'), which requires 2 byte alignment" Test Plan: One of the failing tests, now passes `buck test fbsource//arvr/mode/platform010/dev-asan fbsource//arvr/libraries/eye/engine:sys_test_eyetrackingenginevisioninterface` Reviewed By: kimishpatel, salilsdesai Differential Revision: D40918376 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88276 Approved by: https://github.com/manuelcandales commit 7560a7b27c431f194dd6e06d24f7b49757ea562d Author: andrewor14 Date: Fri Nov 4 09:01:23 2022 -0700 [Quant] Respect non_leaf_module_list for activation modules (#88498) Summary: This commit fixes the bug where `non_leaf_module_list` was not respected for activation modules like `torch.nn.Sigmoid` and `torch.nn.Tanh`. Today, these modules default to `default_fixed_qparams_range_0to1_fake_quant`, and there is no way to configure them to use any other activation_post_process (e.g. FixedQParamsObserver) (see this [mapping](https://github.com/pytorch/pytorch/blob/dc00bb51b8d370bf3891f0edb2c6e0c2914e329a/torch/ao/quantization/quantization_mappings.py#L188-L193)). `non_leaf_module_list` is a "list of non-leaf modules we want to add observer" (see prepare docstring). If the user explicitly specified to insert observers for these modules, we should respect that instead of continuing to use the default. Test Plan: python test/test_quantization.py TestQuantizeEagerPTQStatic.test_activations_in_non_leaf_module_list Reviewers: vkuzo, jerryzh168 Subscribers: vkuzo, jerryzh168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88498 Approved by: https://github.com/jerryzh168 commit 5af3feefab99a23df393962f664eee1e33619803 Author: Jane Xu Date: Fri Nov 4 21:48:26 2022 +0000 [BE] Update native_functions.yaml README; we do not support Tensor! (#88513) Just a doc update to minimize confusion Pull Request resolved: https://github.com/pytorch/pytorch/pull/88513 Approved by: https://github.com/bdhirsh commit 678d038001b0bd61501739ea97989d28f758343e Author: Will Constable Date: Fri Nov 4 16:27:48 2022 +0000 Support DDP ignored parameters in DDPOptimizer (#88460) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88460 Approved by: https://github.com/aazzolini commit ff6770a9a1db4bb19db24c88bfe7a666722b45d2 Author: Andrew M. James Date: Thu Nov 3 13:55:54 2022 -0500 enable backward for log1p (sparse layouts) (#88155) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155 Approved by: https://github.com/cpuhrsch commit 6938dd0b2cdb80d503a5d84c7e0cb7969ea47d93 Author: Andrew M. 
James Date: Thu Nov 3 13:55:54 2022 -0500 Support sparse inputs to deg2rad (#88156) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156 Approved by: https://github.com/cpuhrsch commit 1964d8c34fd4afc4c8fd9f749350c9f7d98861f3 Author: Andrew M. James Date: Thu Nov 3 13:55:53 2022 -0500 Enable sparse_csr autograd testing for relu (#88154) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88154 Approved by: https://github.com/cpuhrsch commit f03302ba49318b5d6eea55b509fd448be39070f9 Author: Andrew M. James Date: Thu Nov 3 13:55:53 2022 -0500 Add sparse layout support for torch.frac (#88153) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88153 Approved by: https://github.com/cpuhrsch commit d632d94cc7bc1d60ae90b68c31c920f2828c341c Author: Catherine Lee Date: Fri Nov 4 20:47:42 2022 +0000 Disable mem leak check (#88373) tbh at this point it might be easier to make a new workflow and copy the relevant jobs... Changes: * Disable cuda mem leak check except for on scheduled workflows * Make pull and trunk run on a schedule which will run the memory leak check * Periodic will always run the memory leak check -> periodic does not have parallelization anymore * Concurrency check changed to be slightly more generous Pull Request resolved: https://github.com/pytorch/pytorch/pull/88373 Approved by: https://github.com/ZainRizvi, https://github.com/huydhn commit 093e22083613dd4b92c1ced20201edf713484a23 Author: Huy Do Date: Fri Nov 4 20:35:11 2022 +0000 Re-enable inductor models tests as periodical jobs (#88509) Run every 4 hour same as periodic, but offset by an hour. This should give us some signals instead of completely disabling these jobs on master after https://github.com/pytorch/pytorch/pull/88374 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88509 Approved by: https://github.com/malfet commit 3e6579b8f66b62462b98066bc6d98ed8046d38da Author: Jane Xu Date: Fri Nov 4 20:34:23 2022 +0000 Don't print fatal:... in generate_torch_version.py (#88335) During build, users commonly see a message like ``` fatal: no tag exactly matches 'd8b4f33324b1eb6c1103874764116fb68e0d0af4' ``` which is usually ignored when builds succeed, but has confused users when build fails (due to a different issue). This PR removes the red herring, since this usually prints for local development when tags are not found. We catch the exception anyway and handle it under the hood, so we don't need to print it and confuse the user. Test plan: Note that builds on trunk current have this line, cmd-F 'fatal: no tag exactly matches' in https://github.com/pytorch/pytorch/actions/runs/3379162092/jobs/5610355820. Then check in the PR build to see that the line no longer appears. I also tagged my commit locally and printed what tag would be--this code and the old code printed the same results for what tag would be. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88335 Approved by: https://github.com/seemethere commit 955cbe610bc3fe6913f2041d5215e1bf23a8dbd0 Author: Bin Bao Date: Fri Nov 4 18:00:28 2022 +0000 [inductor] Handle the case where kwargs contains tensor (#88417) Summary: Fix https://github.com/pytorch/torchdynamo/issues/1805; currently inductor does not allow any tensor in kwargs. 
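As a loose illustration only (this is not the repro from the issue), a minimal sketch of the general pattern, i.e. an op inside the compiled region receiving a tensor through keyword arguments:
```python
import torch
import torch._dynamo as dynamo

def f(x, w):
    # A tensor (w) is passed to an op via a keyword argument.
    return torch.nn.functional.linear(x, weight=w)

opt_f = dynamo.optimize("inductor")(f)
out = opt_f(torch.randn(4, 8), torch.randn(16, 8))
```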
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88417 Approved by: https://github.com/ngimel commit e940a2f8e2a3aa9d98291e73b3d40fcffb6182c8 Author: Kurt Mohler Date: Fri Nov 4 20:23:56 2022 +0000 Add nondeterministic error for `scatter` (#88244) Fixes #88096 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88244 Approved by: https://github.com/ezyang, https://github.com/mruberry commit 6575174dcb67ebfa5300d0ff2941189543187a3f Author: Mor Tzur Date: Fri Nov 4 20:18:08 2022 +0000 [fx2ait] fixes for AITSplitter (#87805) Summary: propagate lower settings to AITSplitter settings. Reviewed By: yinghai, qxy11 Differential Revision: D40568216 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87805 Approved by: https://github.com/yinghai commit 7b419e8513a024e172eae767e24ec1b849976b13 Author: jjsjann123 Date: Wed Nov 2 01:14:05 2022 -0700 [NVFuser] Upstream push 1026 (#87779) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Codegen changes include: * codegen improvement: i. allow non-root trivial reductions, allow empty/no-op fusion ii. fixes vectorization checks and size calculation iii. bank conflict handle improvement iv. enables transpose scheduler * misc: i. CI tests failure fixes ii. cpp tests file clean up iii. trivial forwarding supports added in codegen runtime iv. added factory methods support in codegen Commits that's in this PR from the devel branch: ``` 7117a7e37ebec372d9e802fdfb8abb7786960f4a patching nvfuser conv cudnn test numerics mismatch (#2048) 65af1a4e7013f070df1ba33701f2d524de79d096 Inserting sync for redundant parallel types is already done at the (#2023) 6ac74d181689c8f135f60bfc1ec139d88941c98c Fix sync map (#2047) f5bca333355e2c0033523f3402de5b8aac602c00 Bank conflict checker improvements (#2032) d2ca7e3fd203537946be3f7b435303c60fa7f51e Minor update on cp.async code generation. (#1901) d36cf61f5570c9c992a748126287c4e7432228e0 Test file cleanup (#2040) 0b8e83f49c2ea9f04a4aad5061c1e7f4268474c6 Allow non-root trivial reductions (#2037) a2dfe40b27cd3f5c04207596f0a1818fbd5e5439 Fix vectorize size calculation (#2035) e040676a317fe34ea5875276270c7be88f6eaa56 Use withPredicate to replace setPredicate to maintain Exprs immutable (#2025) 197221b847ad5eb347d7ec1cf2706733aacbf97c removing ci workflow (#2034) 40e2703d00795526e7855860aa00b9ab7160755f Reduction rand like patch (#2031) bc772661cbdb3b711d8e9854ae9b8b7052e3e4a3 Add utility for checking bank conflict of shared memory (#2029) ddd1cf7695f3fb172a0e4bcb8e4004573617a037 Add back FusionReductionWithTrivialReduction_CUDA (#2030) fbd97e5ef15fa0f7573800e6fbb5743463fd9e57 Revert "Cleanup trivial reduction workarounds (#2006)" (#2024) bca20c1dfb8aa8d881fc7973e7579ce82bc6a894 Cleanup trivial reduction workarounds (#2006) e4b65850eee1d70084105bb6e1f290651adde23e Trivial forwarding (#1995) 1a0e355b5027ed0df501989194ee8f2be3fdd37a Fix contiguity analysis of predicates to match updated contiguity. 
(#1991) a4effa6a5f7066647519dc56e854f4c8a2efd2a7 Enable output allocation cache (#2010) 35440b7953ed8da164a5fb28f87d7fd760ac5e00 Patching bn inference (#2016) 0f9f0b4060dc8ca18dc65779cfd7e0776b6b38e8 Add matmul benchmark (#2007) 45045cd05ea268f510587321dbcc8d7c2977cdab Enable tests previously disabled due to an aliasing bug (#2005) 967aa77d2c8e360c7c01587522eec1c1d377c87e Contiguous indexing for View operations (#1990) a43cb20f48943595894e345865bc1eabf58a5b48 Make inlining even more modular (#2004) dc458358c0ac91dfaf4e6655a9b3fc206fc0c897 Test util cleanup (#2003) 3ca21ebe4d213f0070ffdfa4ae5d7f6cb0b8e870 More strict validation (#2000) a7a7d573310c4707a9f381831d3114210461af01 Fix build problem (#1999) fc235b064e27921fa9d6dbb9dc7055e5bae1c222 Just fixes comments (#1998) 482386c0509fee6edb2964c5ae72074791f3e43a cleanup (#1997) 4cbe0db6558a82c3097d281eec9c85ad2ea0893a Improve divisible split detection (#1970) 42ccc52bdc18bab0330f4b93ed1399164e2980c9 Minor build fix. (#1996) fcf8c091f72d46f3055975a35afd06263324ede6 Cleanup of lower_utils.cpp: Isolate out GpuLower usage (#1989) 15f2f6dba8cbf408ec93c344767c1862c30f7ecc Move ConcretizedBroadcastDomains to shared_ptr in GpuLower. (#1988) 8f1c7f52679a3ad6acfd419d28a2f4be4a7d89e2 Minor cleanup lower_unroll.cpp (#1994) 1d9858c80319ca7f0037db7de5f04e47f540d76c Minor cleanup (#1992) f262d9cab59f41c669f53799c6d4a6b9fc4267eb Add support for uniform RNG (#1986) eb1dad10c73f855eb1ecb20a8b1f7b6edb0c9ea3 Remove non-const functions, remove GpuLower instance on build, pass in ca_map. (#1987) 634820c5e3586c0fe44132c51179b3155be18072 Add support for some empty fusion (#1981) eabe8d844ad765ee4973faa4821d451ef71b83c3 Segment self mapping fusions (#1954) e96aacfd9cf9b3c6d08f120282762489bdf540c8 Enable Transpose operation (#1882) 425dce2777420248e9f08893765b5402644f4161 Add a null scheduler that helps segmenting away no-op schedules (#1835) 306d4a68f127dd1b854b749855e48ba23444ba60 Fix canScheduleCompileTime check of transpose scheduler (#1969) b1bd32cc1b2ae7bbd44701477bddbcfa6642a9be Minor fix (#1967) bd93578143c1763c1e00ba613a017f8130a6b989 Enable transpose scheduler (#1927) b7a206e93b4ac823c791c87f12859cf7af264a4c Move scheduler vectorize utilities into their own file (#1959) d9420e4ca090489bf210e68e9912bb059b895baf View scheduling (#1928) c668e13aea0cf21d40f95b48e0163b812712cdf2 Upstream push ci fixes (#1965) c40202bb40ce955955bb97b12762ef3b6b612997 Fix dump effective bandwidth (#1962) 93505bcbb90a7849bd67090fe5708d867e8909e4 WAR on index mapping when exact and permissive maps differ (#1960) 45e95fd1d3c773ee9b2a21d79624c279d269da9f Allow splitting inner-most ID to create virtual innermost ID in transpose scheduler (#1930) a3ecb339442131f87842eb56955e4f17c544e99f Improve the comments at the beginning of index_compute.h (#1946) f7bc3417cc2923a635042cc6cc361b2f344248d6 Remove unused variables (#1955) df3393adbb5cb0309d091f358cfa98706bd4d313 Some cleanup (#1957) 7d1d7c8724ab5a226fad0f5a80feeac04975a496 TVDomainGuard factory (#1953) 357ba224c0fb41ed3e4e8594d95599c973f4a0ca Fill allocation with nan on tests (#1956) 8eafc54685d406f5ac527bcbacc475fda4492d7a Fix detection of unmappable root domains (#1952) 90a51f282601ba8ebd4c84b9334efd7762a234bc Some indexing cleanups, Add eye support (#1940) ddc01e4e16428aec92f9c84d698f959b6436a971 Exclude unsupported data types (#1951) 992e17c0688fe690c51b50e81a75803621b7e6aa test the groups the same order as they are merged (#1949) 208262b75d1fed0597a0329d61d57bc8bcd7ff14 Move detection of self mapping IDs to IterDomainGraph from 
(#1941) ac4de38c6ee53b366e85fdfe408c3642d32b57df Merge pull request #1945 from csarofeen/master_merge_0828 631094891a96f715d8c9925fb73d41013ca7f2e3 Add full, full_like, zeros, zeros_like, ones, ones_like (#1943) aab10bce4541204c46b91ff0f0ed9878aec1bfc4 Merge remote-tracking branch 'upstream/viable/strict' into HEAD 4c254c063bb55887b45677e3812357556a7aa80d Fix arange when step is negative (#1942) 89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D40869846](https://our.internmc.facebook.com/intern/diff/D40869846) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87779 Approved by: https://github.com/davidberard98 commit 15e54293efb7d3ec58ded51e2854971d16b2fa66 Author: Li-Huai (Allan) Lin Date: Fri Nov 4 19:43:56 2022 +0000 [MPS] Fix embedding backward with scalar index (#82809) Previously the embedding backward always expands `-1` dim to indices, resulting in the following error when the indices is a scalar: ``` error: Rank of data array must equal number of outer dimensions in indices array + rank of slice to update, 2 != 1 + 0 -:8:10: note: see current operation: %5 = "mps.scatter_nd"(%0, %arg1, %4) {batch_dims = 0 : ui32, mode = 0 : i32} : (tensor<10x5xf16>, ``` Now makes it conditional. Reproducer: ```python def repro(): w = torch.tensor([[-2.6465, 2.5859, 0.4688, 1.7949, 3.2676], [-3.1641, 8.9375, 5.7578, -2.9453, -6.5469], [ 2.0469, 1.3516, -8.7344, 6.0000, 1.3906], [ 6.5781, 7.8438, 6.9766, 3.2891, -5.1172], [-7.9414, 7.7344, 4.1875, 2.8574, 2.9531], [-0.4844, -5.6328, -6.8359, -4.5156, 3.7891], [ 4.9375, 6.6094, 6.7031, 0.6719, -6.4219], [ 7.0469, 8.2031, 4.4453, 1.7129, -2.4688], [ 1.2207, -3.3750, -2.4531, 7.4062, -6.0469], [-8.9688, 2.2656, 2.4160, -1.0176, 8.4531]], dtype=torch.float32, requires_grad=True) x = torch.tensor(5) out = torch.nn.functional.embedding(x, w) out.sum().backward() w_mps = w.detach().clone().to("mps").requires_grad_() x_mps = x.to("mps") out = torch.nn.functional.embedding(x_mps, w_mps) out.sum().backward() # error ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82809 Approved by: https://github.com/malfet commit 5b767d404e9d9e80c7f900bb28f9ccde1d76bdaa Author: Codrin Popa Date: Fri Nov 4 19:31:16 2022 +0000 Modified roundup_power2_divisions to specify the number of divisions for each power of two interval (#87290) Summary: Improved roundup_power2_divisions knob so it allows better control of rouding in the PyTorch CUDA Caching Allocator. This new version allows setting the number of divisions per power of two interval starting from 1MB and ending at 64GB and above. An example use case is when rouding is desirable for small allocations but there are also very large allocations which are persistent, thus would not benefit from rounding and take up extra space. Test Plan: Tested locally Differential Revision: D40103909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87290 Approved by: https://github.com/zdevito commit b78b8727ff39fd47e3d465e3e6e6e6cf5e578c62 Author: ssjia Date: Thu Nov 3 09:33:25 2022 -0700 [vulkan] enable prepacking for Batchnorm op (#88433) Adds a `BatchNormPackedContext` so that the `batchnorm` op can use prepacking. Differential Revision: [D40721546](https://our.internmc.facebook.com/intern/diff/D40721546/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88433 Approved by: https://github.com/manuelcandales commit 53eac1d48222becc46d0654648648fbf172a1214 Author: Edward Z. 
Yang Date: Fri Nov 4 06:18:25 2022 -0700 Revert "Revert "Put Python Dispatcher cache in dict, clear it on new registrations. (#88329)"" (#88489) The bug was that I was accidentally caching at the wrong key name, so we were never actually hitting the cache. I've renamed the resolved key to final_key to avoid shadowing in this way. This reverts commit 410ce96a23a3496a45478e0b25ffac53aa3c116f. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88489 Approved by: https://github.com/albanD commit 79abea5683254897fb49dc30d747914de474192c Author: jjsjann123 Date: Fri Nov 4 19:17:07 2022 +0000 nvprim python runtime dtype correctness patch (#88452) Cherry-picking: https://github.com/csarofeen/pytorch/pull/2133 - [x] casts FusionDefinition output to original dtype recorded in the GraphModule - [x] add a python repro with dynamo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88452 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry commit 8c1c6759b28c73cff623c7fef71e0eca00087414 Author: PyTorch MergeBot Date: Fri Nov 4 19:12:35 2022 +0000 Revert "remove assert_allclose from torch.testing (#87974)" This reverts commit 5669e10d37fa3cca21cf82c843ae4c4e79da1b89. Reverted https://github.com/pytorch/pytorch/pull/87974 on behalf of https://github.com/mehtanirav due to Internal breakages from method removal commit bda688c186658b6b018ca88ec592d17eafcb4b2b Author: Edward Z. Yang Date: Fri Nov 4 12:39:03 2022 -0400 Fix typo in clones (#88501) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88501 Approved by: https://github.com/wconstab commit 633f0d620dcfc7681739e39b018dd13cc4f0090d Author: Shiyan Deng Date: Fri Nov 4 17:35:12 2022 +0000 [torch package] Treat builtins as default extern module (#88385) Summary: When using torch deploy, if we do fx transformation and then try to pickle/unpickle a fx GraphModule, it's possible that the GraphModule's code depends on `builtins` but we didn't add it to extern module. Reviewed By: PaliC Differential Revision: D40958730 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88385 Approved by: https://github.com/PaliC commit ead36e5a907c9fbcd837835e52ce448d428f228e Author: John Detloff Date: Fri Nov 4 17:31:17 2022 +0000 Add dep on Accelerate framework to torch podspecs (#88422) A dep on Accelerate was added in https://github.com/pytorch/pytorch/pull/80449 We need to declare this dep in our podspec, otherwise users will have to add the Accelerate framework to their projects manually. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88422 Approved by: https://github.com/kimishpatel, https://github.com/malfet commit dc00bb51b8d370bf3891f0edb2c6e0c2914e329a Author: Manuel Candales Date: Fri Nov 4 12:07:12 2022 +0000 [Vulkan][TCC] Add tests for conv2d prepack context (#88316) Summary: Implement Vulkan tests for the create/run context functions in Convolution.cpp, their transposed versions and their backwards compatible versions: - create_conv2d_context - run_conv2d_context - create_tconv2d_context - run_tconv2d_context - conv2d_clamp_prepack - conv2d_clamp_run Test Plan: On Mac ``` cd ~/fbsource buck run -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 ``` On Android ``` cd ~/fbsource buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test adb shell "/data/local/tmp/vulkan_api_test" ``` Reviewed By: salilsdesai Differential Revision: D40935343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88316 Approved by: https://github.com/salilsdesai commit a171b0636a058d0cd059d39f39e37d5cc1d38df1 Author: Wonjoo Lee Date: Fri Nov 4 08:23:54 2022 +0000 Add use_lazy_shape flag to GenLazyIr class (#88444) Add use_lazy_shape flag to GenLazyIr class to allow XLA to use its custom shape class. The default value is kept to use lazy shape, so this PR does not introduce any new behaviors. PyTorch/XLA companion PR: https://github.com/pytorch/xla/pull/4111 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88444 Approved by: https://github.com/alanwaketan, https://github.com/wconstab commit b3206268ace6ebcb5d716ed6673876e62ef484f2 Author: XiaobingSuper Date: Thu Nov 3 00:46:02 2022 -0400 TorchDynamo: enable convolution and batchnorm folding for inference path (#87435) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87435 Approved by: https://github.com/jgong5, https://github.com/jansel commit fbd08fb358b643386edd4dd28b9c747aab4ba8c1 Author: Pruthvi Madugundu Date: Fri Nov 4 04:43:05 2022 +0000 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet commit 70b00b13830c8adbaa2db8f61d475c2458b707c4 Author: Will Constable Date: Thu Nov 3 22:55:24 2022 +0000 Add hf_bert + DDP multigpu test (#88435) Spot-checks an e2e model working with ddp. 
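For readers unfamiliar with the wrapping pattern that the hf_bert + DDP spot-check exercises, here is a minimal, single-process sketch of DDP usage; the model, backend, and port are illustrative assumptions, not the actual test configuration:
```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process world for illustration only; the real test runs on multiple GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(16, 16)
ddp_model = DDP(model)                       # gradients are synchronized across ranks
loss = ddp_model(torch.randn(4, 16)).sum()
loss.backward()
dist.destroy_process_group()
```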
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88435 Approved by: https://github.com/davidberard98 commit 71f793d31265578e2df673cf838ec456bc501d77 Author: XiaobingSuper Date: Thu Nov 3 00:46:01 2022 -0400 TorchDynamo: Add linear binary fusion for cpu in BF16 inference mode (#87066) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87066 Approved by: https://github.com/jgong5, https://github.com/jansel commit 7d95b1e3444a2fdae7c1a5ebb24072167b923c0a Author: Elias Ellison Date: Thu Nov 3 23:10:28 2022 +0000 Run all fallback kernels with FakeTensor (#88248) This improves the memory compression of resnet18 from .84 -> .94 on inductor no-cudagraphs. It does mean that any extern kernel which incorrectly computes strides will be a hard error at runtime, but that's an issue we are going to have to face with dynamic shapes anyway. CC @ezyang, @SherlockNoMad Pull Request resolved: https://github.com/pytorch/pytorch/pull/88248 Approved by: https://github.com/ezyang commit e4efea4f14fd26c1ec83ab25d0197c3e3d40c7a4 Author: XiaobingSuper Date: Thu Nov 3 00:45:59 2022 -0400 TorchDynamo: Add linear unary fusion for cpu in BF16 inference mode (#87065) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87065 Approved by: https://github.com/jgong5, https://github.com/jansel commit 657f2e12f0e212b2f4afd89ab2c824c409dcc951 Author: Nikita Shulga Date: Fri Nov 4 01:22:41 2022 +0000 [MPS] Add native `cumsum` implementation (#88319) Using https://developer.apple.com/documentation/metalperformanceshadersgraph/mpsgraph/4057333-cumulativesumwithtensor?language=objc Fall back to CPU if running on older MacOS versions In `unary_op` add output tensor dims/dtype to the graph key (as even in default op we check output graph type) Also, upcast int16 to int32 as MPS cumsum op on Ventura returns incorrect results for Int16 type (and it makes total sense for int8, as chances for overflow are very high) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88319 Approved by: https://github.com/kulinseth commit 52173188efb3a8b3e5053357c66fd5bde45dc929 Author: XiaobingSuper Date: Thu Nov 3 00:45:54 2022 -0400 TorchDynamo: Add convolution binary fusion for cpu in inference mode (#87064) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87064 Approved by: https://github.com/jgong5, https://github.com/jansel commit 2ce2fc133d5f06e2d563176a96bc0cc8fa207670 Author: Elias Ellison Date: Thu Nov 3 18:58:07 2022 +0000 Disable Current Modes when printing Tensor (#88344) Fix for https://github.com/pytorch/pytorch/issues/88087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88344 Approved by: https://github.com/ezyang, https://github.com/samdow commit e804c7229490474230a15df8a6eb5f1712828df6 Author: Jiewen Tan Date: Fri Nov 4 00:06:07 2022 +0000 [LTC] Update merge_rules.yaml (#88291) Summary: Some of the LTC code-gen infra has been moved from codegen/ to torchgen/. Update the merge_rules.yaml to reflect that. Test Plan: New GH PRs... 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88291 Approved by: https://github.com/malfet commit a84d68cdfd3b2d7e9f43221ac0ecc646db63a1d4 Author: Andrew Gu Date: Thu Nov 3 16:26:56 2022 +0000 [FSDP][Docs] Reword `sharding_strategy` docs and other minor doc changes (#88431) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88431 Approved by: https://github.com/mrshenli commit ff23e07b2eabc95ed2d08d6aebbaa242425bd8df Author: Andrew Gu Date: Thu Nov 3 16:26:46 2022 +0000 [FSDP][Docs] Simplify CPU offload docs (#88430) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88430 Approved by: https://github.com/mrshenli commit 4de50b25215b71517831b9766c4655d56ef7946e Author: Chien-Chin Huang Date: Thu Nov 3 19:30:05 2022 +0000 [FSDP] Allow to use TorchDispatch with FSDP (#88014) Add `_no_dispatch_record_stream` to disable TorchDispatch before calling `record_stream()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88014 Approved by: https://github.com/awgu commit 31ebd3cc2fb4a9025d3a17b90400ea83125dc17c Author: Huy Do Date: Thu Nov 3 23:15:39 2022 +0000 Reset NVIDIA devices stuck in failed mode (#88459) Try to reset the NVIDIA devices if they get stuck in failed mode per comment in https://github.com/pytorch/pytorch/issues/88388 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88459 Approved by: https://github.com/malfet commit ab8f3333ff02d7a6260e616f87ab4f8ed3e1db4b Author: Andrew Gu Date: Thu Nov 3 16:26:36 2022 +0000 [FSDP][Docs] Simplify `mixed_precision` ctor docs (#88429) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88429 Approved by: https://github.com/mrshenli commit 36582574f3bef05f0822bbd6982062342cfcdab8 Author: Animesh Jain Date: Thu Nov 3 22:56:05 2022 +0000 [dynamo] Skip mutation detection for inference mode (#88406) Skip the mutation detection for inference_mode, and raise a warning. This helps one internal model Related to https://github.com/pytorch/torchdynamo/issues/1768 @ezyang What do you think about this? The issue that Dynamo mutation detector uses version counter to detect mutation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88406 Approved by: https://github.com/ezyang commit 410ce96a23a3496a45478e0b25ffac53aa3c116f Author: PyTorch MergeBot Date: Thu Nov 3 21:57:19 2022 +0000 Revert "Put Python Dispatcher cache in dict, clear it on new registrations. (#88329)" This reverts commit 86c7cd287caeb23c227d97d283e58bc123294746. Reverted https://github.com/pytorch/pytorch/pull/88329 on behalf of https://github.com/clee2000 due to test_decomp takes an extra 2 hours in some jobs, windows takes so long it times out commit 9946041a3edbfa3a9db1c38aa0436f0d6f1a29db Author: samdow Date: Thu Nov 3 21:50:52 2022 +0000 [functorch] make hessian docs actually use hessian function (#88451) I was going through the hessian docs to find an example and noticed that these docs don't actually use the hessian function.... Pull Request resolved: https://github.com/pytorch/pytorch/pull/88451 Approved by: https://github.com/zou3519, https://github.com/Skylion007 commit ce961b34430b52d0591fea4e485ffcb4633c4e90 Author: Elias Ellison Date: Thu Nov 3 18:22:44 2022 +0000 Dont hold onto references of saved tensors in backward (#88247) This improves memory compression of resnet18 on inductor non-cudagraphs from .78 -> .0.84. 
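The memory-compression figures quoted here and for #88248 compare peak GPU memory between runs; a hedged sketch of how such a ratio could be measured (the `run_eager`/`run_compiled` callables are assumptions for illustration, not part of these PRs):
```python
import torch

def peak_mem_mb(fn):
    """Peak CUDA memory (MiB) used by one call to fn (illustrative helper)."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# compression_ratio = peak_mem_mb(run_eager) / peak_mem_mb(run_compiled)
# (run_eager / run_compiled are hypothetical callables executing the same workload)
```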
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88247 Approved by: https://github.com/ezyang commit 65de9a2b8119e765abeb893e37ab49ea3276e41c Author: Sam Tsai Date: Thu Nov 3 20:32:54 2022 +0000 Fix fuse_func method overwrite (#87791) (#88193) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87791 Fixing the interface so that the fuse_func is honored and not replaced but the default fuse_known_method. Test Plan: Wait for sandcastle Reviewed By: jerryzh168 Differential Revision: D40722395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88193 Approved by: https://github.com/jerryzh168 commit 433746300dd8f5362329bbb208a61584febbba11 Author: Po-Wei Chou Date: Thu Nov 3 20:20:49 2022 +0000 [pytorch] Expose EmbeddingPackedParamsBase::unpack to Python (#88362) Summary: User can't call `.unpack()` when they have a quantized Embedding layer because `&EmbeddingPackedParamsBase::unpack` was never exposed to Python through pybind. This diff fixes that. Test Plan: CI Reviewed By: jerryzh168 Differential Revision: D40606585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88362 Approved by: https://github.com/jerryzh168 commit 23a6e1532142a2858d5e5445b5bcd2e468e80a66 Author: Justin Chu Date: Thu Nov 3 20:18:33 2022 +0000 [ONNX] Remove the INT64_MAX magic numbers (#88341) Remove the magic numbers in symbolic opsets and use a INT64_MAX global instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88341 Approved by: https://github.com/BowenBao commit 6d7eee04b8943dd371465e4f909eba8474ce0292 Author: Andrew Gu Date: Thu Nov 3 16:26:26 2022 +0000 [FSDP] Default to `BACKWARD_PRE` (#88428) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88428 Approved by: https://github.com/mrshenli commit c28022d96c12210328d7bab6add026699cf8e9ee Author: Author Name Date: Thu Nov 3 20:08:16 2022 +0000 [profiler] Add an option initialize kineto profiler on start up (#87226) (#88020) Summary: Overall this patch enables initializing the kineto profiling library on start-up. This is guarded by an env variable that is described a bit more later. The kineto profiler is otherwise initialized lazily when pytorch profiler is invoked. We are enabling on-demand profiling capability for pytorch. As users run large distributed training flows this will enable one to capture a pytorch profiler/GPU trace remotely, from outside the process. The kineto library and a monitoring daemon - dynolog- interact to achieve this. Dynolog will be open sourced by end of October, and has been dogfooded on Meta AI Research cluster. https://github.com/facebookincubator/dynolog Kineto library registers itself with the dynolog daemon running on the host over inter process communication ``` | kineto | --> (ipcfabric) --> | dynolog | * register() * poll for on-demand tracing configs() ``` This feature is currently enabled by setting the env variable `KINETO_USE_DAEMON`. However, it only works if we initialize kineto, else the thread to talk to dynolog is not spun up. Related PRs in kineto include https://github.com/pytorch/kineto/pull/637 https://github.com/pytorch/kineto/pull/653 Build pytorch from source (need to set USE_LITE_INTERPRETER_PROFILER=OFF) Run a simple linear model [example](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html). 
``` export KINETO_CONFIG=/private/home/bcoutinho//libkineto.conf export KINETO_USE_DAEMON=1 python3 /private/home/bcoutinho/linear_model.py ``` Output ``` INFO:2022-10-18 09:01:12 4169946:4169946 init.cpp:98] Registering daemon config loader cuda:0 ``` We can trigger a trace using the dynolog client tool ``` response length = 147 response = {"activityProfilersBusy":0,"activityProfilersTriggered":[4116844],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[4116844]} Matched 1 processes Trace output files will be written to: /tmp/gpu_trace_test_4116844.json ``` ``` python3 ../../linear_model.py cuda:0 99 1425.056884765625 10099 8.817168235778809 ``` Currently the environment should guard users from picking this change up unless intended. The libkineto_init does setup CUPTI APIs and spins up a thread to read on-demand configurations. This should not be problematic, we can provide a more granular init in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87226 Reviewed By: chaekit Differential Revision: D40558184 Pulled By: briancoutinho fbshipit-source-id: afea7502b1d72201c00994c87fde63a35783f4d5 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/88020 Approved by: https://github.com/chaekit commit 826b4a9c2dd856c11ca7d73bc0d617758bff6a5a Author: Max Ren Date: Thu Nov 3 20:05:53 2022 +0000 [coreml] delegate multiple outputs (#88345) Summary: https://www.internalfb.com/code/fbsource/[c0e4da0b5c7fff3b4e31e4611033c30cabdc6aef]/fbcode/caffe2/torch/csrc/jit/backends/backend_detail.cpp?lines=268-276 seems like the torchscript addition of `$unpack, = self.__backend.execute( ... ` the comma after unpack forces the result of execute to have only one item. So for this fix now when the size of the outputs > 1, execute returns a List List of outputs (basically put the outputs in another list before putting it into the list we return) ``` [[output1, output2, output3, ...]] ``` instead of ``` [output1, output2, output3, ...] ``` Do we want to fix this in backend_detail? Or should we make the change in our delegate to accomadate the torchscript? Proposing this q here. Requesting cccclai, kimishpatel for approval here Test Plan: unblocked models for chengxiangyin and models in pytorch playground all passing unit tests Reviewed By: kimishpatel, cccclai Differential Revision: D40328684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88345 Approved by: https://github.com/jmdetloff, https://github.com/Skylion007 commit 9533fe9031cc82c2f833ef066f8dd2d5d2d1eebf Author: Kimish Patel Date: Wed Nov 2 08:55:14 2022 -0700 [pytorch][vulkan] Add bias storage type to template (#88324) To enable buffer based use for bias as well, this diff adds storage type for bias to template Differential Revision: [D40689003](https://our.internmc.facebook.com/intern/diff/D40689003/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88324 Approved by: https://github.com/jmdetloff commit 893f8e3790df47b165025c9e6b5b37b85bdfd501 Author: Kimish Patel Date: Wed Nov 2 08:55:08 2022 -0700 [PyTorch][Vulkan] Add template based codegen for shader generation (#88323) We would like to be able to parameterize kernels such that a parameterized algorithm can be implemented via templates. We can then profile performance of a kernel with different parameter values. This enables us to determine what parameters may work the best for a given kernel or a given device. 
In this diff one such kernel added in 1x1 conv which parameters across size of the tile being produced by each invocation. Few other options for parameters can be: - One can imagine dtype can also be a parameter such that we can do compute in fp16 or int8/int16. - Register blocking for input channels Differential Revision: [D40280336](https://our.internmc.facebook.com/intern/diff/D40280336/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40280336/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/88323 Approved by: https://github.com/jmdetloff commit 60925fcb7e0662e5dc925b0cd5f79615e336cb4b Author: Elias Ellison Date: Thu Nov 3 03:37:23 2022 +0000 Dont clone inputs if using fake tensor (#88208) Not sure that this will really reduce memory use but it is an extraneous copy in our stack right now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88208 Approved by: https://github.com/anijain2305 commit 192e806c265f8c90ffb34ab3787be5c153e84972 Author: Kimish Patel Date: Wed Nov 2 08:55:02 2022 -0700 [Pytorch][vulkan] Generate shader with parameters (#88322) Parametsr such as tile size and weight type and format is embedded within the shader code. This is used to generate ShaderInfo. For now we will maintain both ShaderSrc and ShaderInfo so as to transition from VK_KERNEL to VK_SHADER incremental. Otherwise we will have to switch multiple of them at the same time. Differential Revision: [D40280338](https://our.internmc.facebook.com/intern/diff/D40280338/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88322 Approved by: https://github.com/jmdetloff, https://github.com/mcr229 commit fe3a226d74008f7ce846198530c75e2df232934f Author: kshitij12345 Date: Thu Nov 3 19:28:33 2022 +0000 [minor] use set_default_dtype instead of try and finally (#88295) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88295 Approved by: https://github.com/mruberry commit f8b73340c85ca29fb47bf5056246f6edd1ec261e Author: Animesh Jain Date: Thu Nov 3 19:07:03 2022 +0000 [dashboard] Replace aot_nvfuser with nvprims_nvfuser (#88437) @IvanYashchuk @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/88437 Approved by: https://github.com/soumith commit 2bda2baad787923b064c747e619e62a6af969940 Author: Yanbo Liang Date: Thu Nov 3 18:03:36 2022 +0000 [Dynamo][Easy] Fix config.suppress_errors error log (#88402) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/88402 Approved by: https://github.com/williamwen42 commit 4d62ee1b36f895d9a2987f02ae9c34c6424e0faf Author: Michael Lazos Date: Thu Nov 3 17:59:05 2022 +0000 Verbose exc printing fix (#88387) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/88387 Approved by: https://github.com/tugsbayasgalan commit 0a274c4b6c916363ce3e3f75b315ac66156f8ce6 Author: Justin Chu Date: Thu Nov 3 17:41:48 2022 +0000 [ONNX] Default runtime type checking to raising errors (#86555) Default runtime type checking to raise by changing the default value to `GLOBALS.runtime_type_check_state` into ERRORS Pull Request resolved: https://github.com/pytorch/pytorch/pull/86555 Approved by: https://github.com/BowenBao commit d70bc222d8581bc4256119d51c9344472f71fe95 Author: XiaobingSuper Date: Sun Oct 30 22:23:51 2022 -0400 add parameters check for mkldnn_transpose (#85318) This PR is about add parameters check for mkldnn_transpose, fixed 
https://github.com/pytorch/pytorch/issues/85216. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85318 Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/leslie-fang-intel commit c1dd13fb2fb623986508a4cf9f1fe4cc1656a52f Author: Animesh Jain Date: Thu Nov 3 17:05:50 2022 +0000 [dynamo] Support compare op for userfunctionvariable (#88372) Helps reduce graph breaks for one of the training models Pull Request resolved: https://github.com/pytorch/pytorch/pull/88372 Approved by: https://github.com/jansel commit 2c46d5725e3b89d8f83ed2ba940225fa57a7156f Author: Mergen Nachin Date: Wed Nov 2 14:13:20 2022 -0700 Disallow module attribute mutation (#88354) Summary: See https://github.com/pytorch/torchdynamo/issues/1475 Not allowing any new mutations happen inside forward() function during export. Test Plan: Run `python test/dynamo/test_export.py` and make sure it passes Added new unit tests (3 positive tests and 4 negative tests) Here's what the actual error looks like ``` File "/home/mnachin/local/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 322, in step getattr(self, inst.opname)(inst) File "/home/mnachin/local/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 835, in STORE_ATTR assert not self.export, f"Mutating module attribute {inst.argval} during export." AssertionError: Mutating module attribute a during export. from user code: File "/data/users/mnachin/pytorch/test/dynamo/test_export_mutations.py", line 25, in forward self.a = self.a.to(torch.float64) Set torch._dynamo.config.verbose=True for more information ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88354 Approved by: https://github.com/tugsbayasgalan, https://github.com/jansel commit 2b117c843628e8f73d8fbb471eb045cf6805fdc3 Author: PyTorch MergeBot Date: Thu Nov 3 16:53:01 2022 +0000 Revert "Fix primTorch compute_elementwise_output_strides (#88175)" This reverts commit 1c8a0656d65412b83d3c00f2fc66ab958e991de8. Reverted https://github.com/pytorch/pytorch/pull/88175 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks cuda 11.6 in trunk. As the PR signal was green, this is probably a landrace commit 0f6304ef1ebb089e03b251bb90f886ec1bfd6194 Author: Nikolay Korovaiko Date: Thu Nov 3 16:52:37 2022 +0000 disable the out variants in test_cumprod test for inductor (#88328) `out=` variants aren't supported by autograd and it's not a must fix, so disabling the test (https://github.com/pytorch/torchdynamo/issues/1798) for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88328 Approved by: https://github.com/desertfire commit 529ba076c6ac898d3d236ffc9f018d74cf888a18 Author: Nikolay Korovaiko Date: Thu Nov 3 16:21:15 2022 +0000 add an exclude for test_constructor for inductor (#88143) This test (https://github.com/pytorch/torchdynamo/issues/1800) fails since none of the c-tor ops support `pin_memory=True`. Natalia suggests it's not a priority to fix. 
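For context, the kind of factory call the excluded constructor test exercises is simply a tensor constructor with `pin_memory=True`; a hedged illustration (pinned host memory requires a CUDA-capable build):
```python
import torch

# Eager mode handles this; the inductor lowerings at the time did not.
t = torch.ones(4, pin_memory=True)
print(t.is_pinned())  # True on a CUDA-enabled build
```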
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88143 Approved by: https://github.com/desertfire commit 002dad35f4cef9ee468e0b67e8765355be3e0689 Author: Nikolay Korovaiko Date: Thu Nov 3 16:20:14 2022 +0000 better error message for out= ops (#88367) In cases where a tensor kwarg is actually "out=", the following error message would look nicer than this : ``` Traceback (most recent call last): File "/fsx/users/binbao/pytorch/torch/_inductor/graph.py", line 241, in call_function out = lowerings[target](*args, **kwargs) File "/fsx/users/binbao/pytorch/torch/_inductor/lowering.py", line 168, in wrapped assert not any(isinstance(x, TensorBox) for x in kwargs.values()) AssertionError ``` https://github.com/pytorch/torchdynamo/issues/1798 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88367 Approved by: https://github.com/desertfire commit b4fcfe77b22257072234f5e0d76baeb6a7404427 Author: Natalia Gimelshein Date: Thu Nov 3 15:58:18 2022 +0000 reduce the number of autotuning iterations, don't autotune simple til… (#88386) …ed copies Partially fixes https://github.com/pytorch/torchdynamo/issues/1807, reduces compile time for me from 360 s to 90s. Kernels with multiple outputs sometimes autotune to unexpected configs, so I'm limiting the heuristic to relatively safe application. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88386 Approved by: https://github.com/jansel commit 5e6ceebccbafa6febf8c3fa8abc058f311319015 Author: Christian Puhrsch Date: Thu Nov 3 15:15:57 2022 +0000 Add support for neg to NestedTensor (#88131) Partially fixes #86889 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88131 Approved by: https://github.com/drisspg commit 35be73df094f02dd26562cf665a6158e80bc4045 Author: Andrew Gu Date: Wed Nov 2 18:06:05 2022 +0000 [FSDP()][Easy] Make `fully_shard()` only `FULL_SHARD` (#88260) We can have a separate API for each of the other sharding strategies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88260 Approved by: https://github.com/mrshenli commit fc743ec0595a03dd755bf44fd36d70f02e97dd25 Author: Andrew Gu Date: Wed Nov 2 18:06:05 2022 +0000 [FSDP()] Have `fully_shard()` abide by `@contract`! (#88235) We are making some progress on composability :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88235 Approved by: https://github.com/mrshenli commit 63cd5d7e2743bbbe86cc333adc6bc834228daef3 Author: Bin Bao Date: Wed Nov 2 15:10:37 2022 +0000 Add a shortcut in Makefile for updating triton (#88318) Summary: Local triton installation needs to be updated after we migrate to a newer version of triton, e.g. https://github.com/pytorch/pytorch/pull/88242. The Makefile shortcut makes that easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88318 Approved by: https://github.com/ezyang commit f884e817d448228cb8b0685f774ede1d8207ff72 Author: Edward Z. Yang Date: Wed Nov 2 19:08:07 2022 -0700 Make Python op registration work with torchdeploy/multipy (#87162) See strategy at PythonOpRegistrationTrampoline.cpp for the big picture. Along the way, I made OperatorHandle support == and hashing, and slightly changed the low level python_dispatch impl API to disallow empty strings for dispatch key, which had the knock on effect of requiring us to explicitly make sure we pass in CompositeImplicitAutograd if we would have passed in "" (I didn't apply this to the rest of the file because I'm lazy.) 
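As a hedged illustration of what "pass in CompositeImplicitAutograd instead of an empty string" looks like at the Python registration level (the library and op names below are made up for the example, not taken from this PR):
```python
import torch
from torch.library import Library

my_lib = Library("my_example_ops", "DEF")            # hypothetical namespace
my_lib.define("double(Tensor x) -> Tensor")

def double_impl(x):
    return x * 2

# An explicit dispatch key where an empty string might previously have been passed.
my_lib.impl("double", double_impl, "CompositeImplicitAutograd")

print(torch.ops.my_example_ops.double(torch.ones(3)))
```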
Test strategy is we delete the logic for preventing Python op registrations in torch from being skipped in a torchdeploy context and show CI still works. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162 Approved by: https://github.com/anjali411, https://github.com/bdhirsh commit 2f296cfdbb8063297a37cd54ba1ccf44022faa70 Author: Edward Z. Yang Date: Wed Nov 2 20:44:18 2022 -0700 Add a reshape_copy operator. (#88314) The semantics is "as if" you did a reshape, but it always copied even if the input was directly view'able. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88314 Approved by: https://github.com/albanD commit 86c7cd287caeb23c227d97d283e58bc123294746 Author: Edward Z. Yang Date: Wed Nov 2 20:44:17 2022 -0700 Put Python Dispatcher cache in dict, clear it on new registrations. (#88329) The motivation is that I am going to add the ability to temporarily install entries to the python dispatcher, and to do that, I need an easier way to clear the cache. Putting the cache in a dict centralizes cache clearing in one place. I then add some easy cache clearing. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88329 Approved by: https://github.com/albanD commit 97d3b200ca49b9434dd9e5de979c9d23a866a38e Author: Edward Z. Yang Date: Wed Nov 2 18:55:33 2022 -0700 Unconditionally enable python dispatcher in AOTAutograd (#88365) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88365 Approved by: https://github.com/Chillee commit a689502275529f78ba4a88c2e62ab897a96a040a Author: Andrew Gu Date: Wed Nov 2 20:34:41 2022 +0000 [FSDP] Do not include empty state in `_flatten_optim_state_dict()` (#88353) https://github.com/pytorch/pytorch/blob/983c0e7f3101f1543bed6c4ec1539a4d590a94c0/torch/optim/adam.py#L163 The above line requires that a candidate optimizer state dict being loaded via `load_state_dict()` has non-empty state for its 0th parameter (via `state_values[0]`). This PR changes FSDP to only include non-empty mappings in the state returned by `_flatten_optim_state_dict()`, which is the subroutine for both `shard_full_optim_state_dict()` and `flatten_sharded_optim_state_dict()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88353 Approved by: https://github.com/fegin commit 95a9721a15cb7be77b221ba5778d456880eaad20 Author: Andrew Gu Date: Wed Nov 2 18:06:04 2022 +0000 [FSDP()][Easy] Rename `_State` to `_FSDPState` (#88234) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88234 Approved by: https://github.com/mrshenli commit 0520131ed6ee7965e7b53b291974b589220cdf3a Author: Andrew Gu Date: Wed Nov 2 18:06:04 2022 +0000 [FSDP()] Rename to `fully_shard()` and move to `_composable/` (#88233) After internal discussion, we are currently preferring `fully_shard()` as the name of the composable FSDP API. - `FullyShardedDataParallel` (FSDP) has existing brand value, so the chosen name should try to preserve that. We think this takes precedence over the fact that composable FSDP may encompass than just the ZeRO-3 approach of _fully sharding_. - Given the refactoring efforts, it would also not be challenging to create a new frontend API like `hybrid_shard()` that calls into the same underlying initialization and runtime except for a different `ShardingStrategy`. In other words, we do not have to coalesce all sharding strategies under `fully_shard()`. 
- The other composable APIs are verbs (`replicate()`, `checkpoint()`), so the chosen name should be a verb. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88233 Approved by: https://github.com/mrshenli commit 54b6188cc6dee45b775d688223b847dc8ea85bff Author: Kshiteej K Date: Thu Nov 3 09:57:47 2022 +0000 [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616) Fixes: https://github.com/pytorch/pytorch/issues/72129 TODO: * [x] Fix for Parameter Benchmark (Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                   |  After PR  |  Before PR
1 threads: ----------------------------------
      ()           |    111.7   |    106.9
      (4, 4)       |    114.4   |    109.2
      (128, 128)   |    135.2   |    128.3
      (1024, 1024) |   1431.9   |   1431.3

Times are in microseconds (us).
```
Benchmark Script
```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))
sizes = [1, 64, 1024, 10000]

results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn': save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)
```
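And, assuming the behavior this PR adds, a minimal round-trip illustrating the python attribute that now survives serialization (the attribute name is arbitrary, chosen for the example):
```python
import io
import torch

t = torch.randn(2, 2)
t.note = "saved with the tensor"   # plain python attribute on a Tensor

buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
loaded = torch.load(buf)
print(loaded.note)                  # "saved with the tensor" once this PR is in place
```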
NOTE : **BC-Breaking** : After this PR, all tensors (also regular tensors) will be serialised using `_rebuild_from_type_v2`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616 Approved by: https://github.com/albanD, https://github.com/kurtamohler commit 1c8a0656d65412b83d3c00f2fc66ab958e991de8 Author: Sherlock Huang Date: Thu Nov 3 06:02:37 2022 +0000 Fix primTorch compute_elementwise_output_strides (#88175) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88175 Approved by: https://github.com/ngimel commit 0efd4e92b5b8e29b083a91093c803e62c3507cf7 Author: Wonjoo Lee Date: Thu Nov 3 06:19:40 2022 +0000 Make GenLazyNativeFuncDefinition generator to be customizable in lazy codegen (#87823) As part of the ongoing LTC migration effort, PyTorch/XLA is updating its codegen to use `xla::Shape` instead of `torch::lazy::Shape`. To achieve this, this PR updates the codegen to make the `GenLazyNativeFuncDefinition` generator customizable. The existing `GenLazyNativeFuncDefinition` is kept by using the initial default values, so this change should not introduce any new behaviors to the existing codegen in PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87823 Approved by: https://github.com/alanwaketan, https://github.com/wconstab commit a8f40b39ce4f9fa9ffd90400b7d10ea4051d623a Author: Thiago Crepaldi Date: Thu Nov 3 03:01:33 2022 +0000 Update all ONNX symbolics with new JitScalarType API (#87245) Fixes https://github.com/pytorch/pytorch/issues/84365 and more This PR addresses not only the issue above, but the entire family of issues related to `torch._C.Value.type()` parsing when `scalarType()` or `dtype()` is not available. This issue exists before `JitScalarType` was introduced, but the new implementation refactored the bug in because the new api `from_name` and `from_dtype` requires parsing `torch._C.Value.type()` to get proper inputs, which is exactly the root cause for this family of bugs. Therefore `from_name` and `from_dtype` must be called when the implementor knows the `name` and `dtype` without parsing a `torch._C.Value`. To handle the corner cases hidden within `torch._C.Value`, a new `from_value` API was introduced and it should be used in favor of the former ones for most cases. The new API is safer and doesn't require type parsing from user, triggering JIT asserts in the core of pytorch. Although CI is passing for all tests, please review carefully all symbolics/helpers refactoring to make sure the meaning/intetion of the old call are not changed in the new call Pull Request resolved: https://github.com/pytorch/pytorch/pull/87245 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit b013825c7d104dca2c6c11cd985453d8520577f7 Author: PyTorch MergeBot Date: Thu Nov 3 02:57:24 2022 +0000 [vision hash update] update the pinned vision hash (#88382) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88382 Approved by: https://github.com/pytorchbot commit 5fb9c113aee76dd0465a6ee7067eeb018929b922 Author: Aaron Gokaslan Date: Thu Nov 3 02:53:26 2022 +0000 Update pybind11 to v2.10.1 (#88332) I am one of the maintainers of pybind11, and a frequent PyTorch user. We added quite a lot of bugfixes and performance improvements in 2.10.1 (see the changelog for full details) and I wanted to upstream them to PyTorch. 
Our releases is tested throughout Google's codebase including on their global builds of PyTorch so there should be no surprises. The main new feature is optin in Eigen Tensor to Numpy casters. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88332 Approved by: https://github.com/soumith commit e59d307e2f1d3be0395838acbd03085f2285c0eb Author: Richard Barnes Date: Thu Nov 3 02:48:41 2022 +0000 Improve perf by avoiding implicit string creation in c10_cuda_check_implementation (#88350) Test Plan: Sandcastle Differential Revision: D40949947 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88350 Approved by: https://github.com/Skylion007, https://github.com/soumith commit a0fb234b4523e06d3e4bd1f06fb421bcd09c8939 Author: Jerry Zhang Date: Wed Nov 2 15:42:08 2022 -0700 [codegen] using TORCH_LIBRARY_FRAGMENT for some namespaces (#88229) Summary: Sometimes we want to extend an existing custom namespace library, instead of creating a new one, but we don't have a namespace config right now, so we hardcode some custom libraries defined in pytorch today, i.e. quantized and quantized_decomposed Test Plan: ci Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/88229 Approved by: https://github.com/ezyang commit 7b8cc063acab584e6fe0f0a82ff246fab6691205 Author: Huy Do Date: Thu Nov 3 02:15:07 2022 +0000 Not run inductor test in trunk (#88374) Trying to not run in inductor tests in trunk at the moment because of CUDA issue with G5 runner: * CUDA GPU not found https://github.com/pytorch/pytorch/actions/runs/3379516207/jobs/5611539300 * NVIDIA driver installation fails https://github.com/pytorch/pytorch/actions/runs/3379922198/jobs/5612458360 * Docker fails to start https://github.com/pytorch/pytorch/actions/runs/3381276196/jobs/5615513348 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88374 Approved by: https://github.com/desertfire commit d979caa87c7810ea68845b86696d883452da9b8f Author: Mikayla Gawarecki Date: Wed Nov 2 20:28:39 2022 +0000 Added add/mul for nested dense [B, *, D], [B, 1, D] case (CUDA-only) (#88289) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88289 Approved by: https://github.com/cpuhrsch commit 4c20c0509d5cf8d4dea83cc330056044a6277b1b Author: soulitzer Date: Wed Nov 2 13:52:15 2022 -0400 Split out forward AD tests from test_ops_gradients and reenable slow gradcheck CI (#88216) Fixes: https://github.com/pytorch/pytorch/issues/88010 This PR does a couple things to stop slow gradcheck from timing out: - Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?) - Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck - because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion.) - Updates references to test_ops_gradients and TestGradients - Test files that are skipped for slow gradcheck CI are now centrally located in in run_tests.py, this reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. 
for test_mps Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216 Approved by: https://github.com/albanD commit a8561c4571fe668d35e24c8f61bd296e23db807c Author: PyTorch MergeBot Date: Wed Nov 2 23:33:15 2022 +0000 Revert "[inductor] Handle the case where kwargs contains tensor (#88215)" This reverts commit 983c0e7f3101f1543bed6c4ec1539a4d590a94c0. Reverted https://github.com/pytorch/pytorch/pull/88215 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think it breaks trunk https://github.com/pytorch/pytorch/actions/runs/3380662072/jobs/5613987333 with a failure in test_torchinductor_opinfo.py commit 7354368fd5a8dec5c9fc26dddf5f7da37f1d2499 Author: Jiewen Tan Date: Wed Nov 2 23:31:26 2022 +0000 [LTC] Remove non-native view ops (#88031) Summary: LTC somehow implements a bunch of non-native view ops during the transition to functionalization. Let's remove them now that functionalization is final. Test Plan: CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88031 Approved by: https://github.com/JackCaoG, https://github.com/antoniojkim commit 72f3688029d0bfdd5f2926c8efeb9451135ae6da Author: Kimish Patel Date: Wed Nov 2 08:54:54 2022 -0700 [Pytorch][Vulkan] Update spv generation script to embed shader parameters (#88321) This diffs adds shader parameters such as tile size, weight storage type and format to the generated spv.cpp file. This is used in ShaderInfo struct that ops such as convolution will use to determine, the workgroup size and how to pack weights. Differential Revision: [D40280337](https://our.internmc.facebook.com/intern/diff/D40280337/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88321 Approved by: https://github.com/jmdetloff, https://github.com/mcr229 commit 6c858e37271472b2255e3358be97fd135a9fbe59 Author: Andrew Gu Date: Wed Nov 2 11:38:11 2022 +0000 [FSDP][Easy] Remove unneeded `TrainingState` transition (#88232) Follow-up from previous PR in the stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/88232 Approved by: https://github.com/mrshenli commit 73de44fc561a202aba9d849fb8ada5adad030077 Author: Andrew Gu Date: Wed Nov 2 11:38:10 2022 +0000 [FSDP] Rename `unflat_param_name` -> `fqn` for consistency (#88123) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88123 Approved by: https://github.com/mrshenli commit f35d5145a1cc34c6e6dc3680e408344806aefbac Author: Andrew Gu Date: Wed Nov 2 11:38:10 2022 +0000 [FSDP] Simplify `_get_buffer_names()` (#88122) This is a follow-up from a previous PR in this stack. The PR simplifies the `_get_buffer_names()` implementation. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88122 Approved by: https://github.com/mrshenli commit 572a3d2d6efd5493df8cb43c7da98bcf0bf20129 Author: Andrew Gu Date: Wed Nov 2 11:38:10 2022 +0000 [FSDP] Remove unneeded `torch.no_grad()` context when offloading to CPU (#88121) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88121 Approved by: https://github.com/mrshenli commit c87f0501ab847ea900aff61be54bb67c3a27a4fe Author: Andrew Gu Date: Wed Nov 2 11:38:09 2022 +0000 [FSDP][Docs] Add note mentioning rate limiter for backward prefetch (#88120) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88120 Approved by: https://github.com/mrshenli commit 32d22edc676c176b8b247f66134b3b8913724818 Author: Andrew Gu Date: Wed Nov 2 11:38:09 2022 +0000 [FSDP()][27/N] Add forward hook registration (#88040) This PR adds the forward hook registration to composable FSDP and adds a unit test for the runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88040 Approved by: https://github.com/zhaojuanmao, https://github.com/rohan-varma commit 6fd416650ab2b5bd9be046b9ad8cccaf016e6538 Author: Christian Puhrsch Date: Wed Nov 2 23:24:33 2022 +0000 Add _foreach_addc(div/mul)(_).Tensor (#88157) Support passing value scalars as a flat 1D Tensor. Currently we can only pass either an individual scalar or a ScalarList. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88157 Approved by: https://github.com/ngimel, https://github.com/albanD commit 91a51fe9f487082e3f71055d3af41df2fb1bf88b Author: Henry Cheng <39224097+jazzysoggy@users.noreply.github.com> Date: Wed Nov 2 23:07:45 2022 +0000 [ONNX] Produce comprehensive assertion errors for quantized outputs (#87242) Fixes #83038 Currently _compare_ort_pytorch_outputs does not produce clearer error messages for differences in the zero point or scale of the two outputs. It also does not produce a clear error message for whether both are quantized. This pull request adds assertions to output whether the scales and zero points have differences, and whether each individual output is quantized. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87242 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit ca2dc8b4e7ee36c3d305642490f482505ff6ad37 Author: Charlie Yan Date: Wed Nov 2 23:02:08 2022 +0000 [1/n] Thread PG: fix pyre error of class ProcessGroup (#88281) Summary: Fix the typing stub of `ProcessGroup` in "torch/distributed/__init__.py", so that it won't confuse pyre, and we can remove a lot of pyre suppression comments. Test Plan: pyre check Differential Revision: D40921667 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88281 Approved by: https://github.com/wanchaol commit d1ba4c3a6d7a5007665419c57988b06d5b87e96e Author: Jiong Gong Date: Wed Nov 2 22:57:07 2022 +0000 Update Reviewers for CPU-related Modules (#87591) This PR updates the reviewers responsible for CPU related modules: "IDEEP", "oneDNN graph", "CPU ATen backend", "CPU frontend" and "Autocast". It also adds "NNC" and adds the corresponding reviewers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87591 Approved by: https://github.com/malfet commit b325c3fc25937f5fb9ba2fb1d3768cbfbefea6c6 Author: jjsjann123 Date: Wed Nov 2 22:47:30 2022 +0000 [nvFuser] patches profiling on scalar arguments for std/var (#88165) Fixes #86531 Added profiling on scalar values for aten::std & aten::var. 
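A hedged sketch of the kind of scripted code where the scalar argument to `std`/`var` comes into play (not the actual reproducer from #86531):
```python
import torch

@torch.jit.script
def scaled_std(x: torch.Tensor, unbiased: bool) -> torch.Tensor:
    # `unbiased` is the scalar argument whose value profiling now records for aten::std
    return x.std(unbiased) * 2.0

print(scaled_std(torch.randn(8), True))
```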
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88165 Approved by: https://github.com/kevinstephano commit bf7c996dcb2fac229abba7ba2e0bdb379ceb2ff2 Author: PyTorch MergeBot Date: Wed Nov 2 22:35:14 2022 +0000 Revert "torchdynamo support modules() for nn_module (#88023)" This reverts commit eb91e8a534f94127a6d744543f2080a44bca9e57. Reverted https://github.com/pytorch/pytorch/pull/88023 on behalf of https://github.com/mehtanirav due to [Internal breakages](https://www.internalfb.com/intern/sandcastle/job/13510799692855066/insights) commit 7dfa75546c998f384ed5210ba9ac87c591cb36a4 Author: Huy Do Date: Wed Nov 2 21:59:54 2022 +0000 Print only the driver version from the first GPU (#88364) For example, distributed test has more than one of them: ``` nvidia-smi --query-gpu=driver_version --format=csv,noheader 515.57 515.57 ``` while `--id=0` correctly prints: ``` nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 515.57 ``` This is to avoid re-install the same driver as in https://github.com/pytorch/pytorch/actions/runs/3380662072/jobs/5613981088 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88364 Approved by: https://github.com/seemethere, https://github.com/ZainRizvi commit 943b20e7ae290d8e71f877eb700f197a9df56cbe Author: Christian Puhrsch Date: Wed Nov 2 21:51:40 2022 +0000 Use tensor cores for NT bmm (#86856) Copy of internal diff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86856 Approved by: https://github.com/drisspg commit 1c0d47cb17806fae3f368061f594997d87d7fd8d Author: Scott Wolchok Date: Mon Oct 31 16:17:19 2022 -0700 [PyTorch] Make c10::irange(x) generate the same assembly as for loop (#86841) `c10::irange(n)` generated an extra `sar` and `andn` instruction compared to a traditional `for` loop. now it doesn't. Differential Revision: [D40321009](https://our.internmc.facebook.com/intern/diff/D40321009/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86841 Approved by: https://github.com/r-barnes, https://github.com/malfet commit ef4ce6d4c6ce1bd5ec26d7f6f71f1c053da46945 Author: Richard Zou Date: Wed Nov 2 10:25:49 2022 -0700 Add [[noreturn]] attribute to operator() in DispatchKeyExtractor.h (#88333) Originally D40537408. Submitting this through the diff train workflow to get it merged faster. Test Plan: - Build PyTorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/88333 Approved by: https://github.com/ezyang commit 983c0e7f3101f1543bed6c4ec1539a4d590a94c0 Author: Bin Bao Date: Wed Nov 2 01:23:57 2022 +0000 [inductor] Handle the case where kwargs contains tensor (#88215) Summary: Fix https://github.com/pytorch/torchdynamo/issues/1805; currently inductor does not allow any tensor in kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88215 Approved by: https://github.com/ngimel commit 98f09c9ab3d0796ffd68c06daa408f0400835173 Author: albanD Date: Wed Nov 2 19:41:09 2022 +0000 [WIP] Add symnode magic method testing (#88119) There are failures that need to be addressed before landing: - Some issue with handling of booleans. - Most functions return wrong result when mixing int/float Pull Request resolved: https://github.com/pytorch/pytorch/pull/88119 Approved by: https://github.com/ezyang commit 99c07735e457a2961f2319b4ba19f0d04eb47967 Author: PyTorch MergeBot Date: Wed Nov 2 18:43:36 2022 +0000 Revert "Add support for neg to NestedTensor (#88131)" This reverts commit 6a75a0d1a197e378ebbf1f73f5ab93ce79cb873a. 
Reverted https://github.com/pytorch/pytorch/pull/88131 on behalf of https://github.com/mehtanirav due to [Internal breakages](https://www.internalfb.com/intern/sandcastle/job/13510799692239080/insights) commit 0fa23663ccd5350469c95615ddb7d2fd2a88abe3 Author: PyTorch MergeBot Date: Wed Nov 2 18:13:37 2022 +0000 Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)" This reverts commit 1e2c4a6e0e60dda763b53f00f25ee5c1f1e5233d. Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev commit 4a84d69f5098d04131d94f15cad92a46ea70b198 Author: Zachary DeVito Date: Tue Nov 1 11:35:23 2022 -0700 [functorch.dims] Fix corner cases with permute (#88226) Previously the permute function was extended to behave like the `order` function for first-class dimensions. However, unlike `permute`, `order` doesn't have a keyword argment `dims`, and there is no way to add it in a way that makes both permute an order to continue to have the same behavior. So this change just removes the extra functionality of permute, which wasn't documented anyway. Fixes #88187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88226 Approved by: https://github.com/zou3519 commit 84a302e53401f74c88495928154637af49e06fb2 Author: soulitzer Date: Wed Nov 2 11:03:04 2022 -0400 Remove wrong internal assert in handle_view_on_rebase (#88243) Fixes: https://github.com/pytorch/pytorch/issues/88205 The `CreationMeta::NO_GRAD_MODE` path in handle_view_on_rebase wrongly assumes that the tensor would be a leaf, because tensors created in no_grad are always leaf tensors. However, due to creation_meta propagation, a view of a view created in no_grad also has `CreationMeta::NO_GRAD_MODE`, but DOES have grad_fn. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88243 Approved by: https://github.com/albanD commit 30dc6cee3aaa3fd30883f2953beaa3374ad0aab2 Author: Andrew Gu Date: Wed Nov 2 11:38:09 2022 +0000 [FSDP()][26/N] Move `_lazy_init()` into `_fsdp_root_pre_forward()` (#87941) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87941 Approved by: https://github.com/mrshenli commit 1e2c4a6e0e60dda763b53f00f25ee5c1f1e5233d Author: Pruthvi Madugundu Date: Wed Nov 2 17:41:57 2022 +0000 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet commit b18d0f1dc9757be4ca58059ece28ac4e60bf6f0c Author: Huy Do Date: Wed Nov 2 17:39:04 2022 +0000 Add more debug information when installing NVIDIA driver (#88168) This calls `lspci`, `lsmod`, and `modinfo nvidia` before and after the installation to gather more data about the "No GPU available" transient issue on G5 runner, i.e. https://hud.pytorch.org/pytorch/pytorch/commit/59fe272c1e698989228af5ad197bdd2985e4e9b9 This also handles `nvidia-smi` call and tries to re-install the driver if the first call fails, i.e. 
`No devices were found` https://hud.pytorch.org/pytorch/pytorch/commit/8ea19c802e38c061e79176360c1ecaa81ce2088a Pull Request resolved: https://github.com/pytorch/pytorch/pull/88168 Approved by: https://github.com/clee2000, https://github.com/malfet commit 923a5e96850014c84e76244874f39d9cdd186a0b Author: Michael Suo Date: Tue Nov 1 14:44:17 2022 -0700 [dynamo] Error when user nests FX with dynamo (#87797) Today, this doesn't work and dynamo errors out in a very non-obvious way (see: https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a). Here, we detect the error early and exit with a nicer msg. Also add a config option to just no-op dynamo (which need to unblock internal enablement). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87797 Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel commit c503398828170549801660eba80fe04f07c5bd42 Author: Huy Do Date: Wed Nov 2 17:27:30 2022 +0000 Ignore macos usage log upload artifact failure (#88288) I'm not quite sure why GitHub starts to get flaky when we are trying to upload usage_log.txt to it (500 Internal server error). But we can live without it, so let's just ignore this for now, and follow up on this latter. The failures all come from M1 runner, so it seems to point to a connectivity issue between AWS and GitHub: * https://github.com/pytorch/pytorch/actions/runs/3373976793/jobs/5599310905 * https://github.com/pytorch/pytorch/actions/runs/3372858660/jobs/5597033598 * https://github.com/pytorch/pytorch/actions/runs/3371548201/jobs/5594274444 * https://github.com/pytorch/pytorch/actions/runs/3370877990/jobs/5592709210 * https://github.com/pytorch/pytorch/actions/runs/3370609384/jobs/5592008430 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88288 Approved by: https://github.com/clee2000 commit 5b882a34c42534a2a994da2e3504abae0a730126 Author: Huy Do Date: Wed Nov 2 17:21:59 2022 +0000 Consolidate macos pip dependencies (#88071) After conda, consolidating all macos pip dependencies to cache every dependencies that macos CI needs. Two small issues are found along the way in `_mac-test-mps` workflow: * It didn't have `Install macOS homebrew dependencies` to install libomp like the regular `_mac-test` workflow * It didn't install `scipy`, thus silently skipping some `signal.windows` tests Both are fixed in this PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/88071 Approved by: https://github.com/malfet commit f132c171ac542c8abe8f6bf54befd9f2e14ad9b6 Author: Andrew Gu Date: Wed Nov 2 11:38:08 2022 +0000 [FSDP()][25/N] Add `_post_forward_reshard()` (#87940) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87940 Approved by: https://github.com/mrshenli commit 5b75b19f51837e162cc0e5e5757dfd9bef437c67 Author: PyTorch MergeBot Date: Wed Nov 2 16:59:00 2022 +0000 Revert "Do not use unsafe restriding for subclasses (#87610)" This reverts commit 73379acaf3865379aed0a1bab1320616772152f3. 
Reverted https://github.com/pytorch/pytorch/pull/87610 on behalf of https://github.com/mehtanirav due to [Internal breakages](https://www.internalfb.com/intern/sandcastle/job/36028797828925790/insights) commit c00c34fb6939384c53cd9125de8e158f9276ee36 Author: Sherlock Huang Date: Wed Nov 2 01:32:09 2022 +0000 Fix meta for aten.upsample_bilinear2d.vec (#88158) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88158 Approved by: https://github.com/ngimel commit 71fb763e5452881cb3be8fefa9419b785d0a61e2 Author: PyTorch MergeBot Date: Wed Nov 2 16:54:36 2022 +0000 Revert "fix as_strided_scatter_backward (#87646)" This reverts commit f9d7985851f49c3b44383dae50cd77632e7e2245. Reverted https://github.com/pytorch/pytorch/pull/87646 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I think this one or one of the PR in the stack break bionic-cuda11.7 on trunk https://hud.pytorch.org/pytorch/pytorch/commit/70782981f06a042796d4604df2ec1491f4f5b194 commit bf2819a836b2dac0448305be9447df0846b846b9 Author: Andrew Gu Date: Wed Nov 2 11:38:07 2022 +0000 [FSDP()][24/N] Refactor `_lazy_init()` (#87939) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87939 Approved by: https://github.com/zhaojuanmao commit bd5b4e6504bf487c313d0b85100242898ad85c8d Author: Rohan Varma Date: Wed Nov 2 16:31:16 2022 +0000 [Easy] Unused var in functional_adam (#88292) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/88292 Approved by: https://github.com/awgu commit 7382c88df2889bf58ef62fe52ed3e1361e384811 Author: Nikita Shulga Date: Wed Nov 2 16:27:40 2022 +0000 [BE][MPS] Do not use malloc/free in 2022 (#88307) Use `std::vector` to store tensor shapes and automatically free them when array goes out of scope Pull Request resolved: https://github.com/pytorch/pytorch/pull/88307 Approved by: https://github.com/kulinseth commit 4e6f5f22fd7585eb629cd884f10b4a016f6c8266 Author: Nikita Shulga Date: Wed Nov 2 16:26:11 2022 +0000 Run asan's shard 4 on `linux.4xlarge` (#88310) In attempt to mitigate OOMs, see https://github.com/pytorch/pytorch/issues/88309 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88310 Approved by: https://github.com/albanD commit 3d90788a58badb454d15601868a396853ce94ddb Author: AllenTiTaiWang Date: Tue Nov 1 17:20:37 2022 +0000 [ONNX] Add 0d-tensor test case in runtime check (#87212) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87212 Approved by: https://github.com/BowenBao commit 2aed6707100dc685b83c4b9575d9eb07f1c6fa3e Author: Thiago Crepaldi Date: Wed Nov 2 15:54:40 2022 +0000 Fix ONNX operator_export_type on the new registry (#87735) Fixes #87313 Our ONNX pipelines do not run with BUILD_CAFFE2=0, so tests for operator_export_type ONNX_ATEN and ONNX_ATEN_FALLBACK will not be fully tested, allowing regressions to happen again. We need to run the same set of tests for both BUILD_CAFFE2=0 and 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87735 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit b2679dc61cf792e922ba56b1ccb75982e6c20553 Author: Edward Z. Yang Date: Wed Nov 2 10:43:35 2022 -0400 Remove Krovatkin from dynamic shapes auto request review (#88315) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88315 Approved by: https://github.com/soumith commit dcbcf5b90e56dfb30d4f87d607f3f4b361f52077 Author: Digant Desai Date: Tue Nov 1 21:39:12 2022 -0700 [profiler] Expose experimental performance events to python (#87905) Reports total counts (includes time spent in all children), self counts can be calculated manully. Differential Revision: [D40282770](https://our.internmc.facebook.com/intern/diff/D40282770/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87905 Approved by: https://github.com/SS-JIA commit 47a542dc0601243e51231cd0d0a28a7ef0c89b2b Author: Digant Desai Date: Tue Nov 1 21:39:11 2022 -0700 Nested profiling support for Linux-perf Profiler (#87904) Add a stack of start counter values, and attribute each disable to the last enable Differential Revision: [D40539212](https://our.internmc.facebook.com/intern/diff/D40539212/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87904 Approved by: https://github.com/SS-JIA commit ebdaeaaa8c0a0e3089d7d16fa9d79a2f3185eba4 Author: Digant Desai Date: Tue Nov 1 21:39:09 2022 -0700 [edge profiler] Add e2e test for profiler event and chrometrace (#87877) * Runs an existing model and checks an aten op if it gets perf events generated in the chrometrace * Doesn't check for exact values since that's harder to do in a hardware independent way Differential Revision: [D40474957](https://our.internmc.facebook.com/intern/diff/D40474957/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87877 Approved by: https://github.com/SS-JIA commit 03346296dbfd1033cb0898983eebcd4c0af32afb Author: Digant Desai Date: Tue Nov 1 21:39:07 2022 -0700 [edge profiler] Add support for performance events counting (#87876) * Add support in lite_predictor benchmark binary to select event lists * Uses Linux perf through Kineto profiler Differential Revision: [D39837216](https://our.internmc.facebook.com/intern/diff/D39837216/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39837216/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87876 Approved by: https://github.com/SS-JIA commit bc1e9a07a3644f300e3d27b377a152c330ca6dd9 Author: Digant Desai Date: Tue Nov 1 21:39:05 2022 -0700 [profiler] Add Performance events support in Kineto profiler (#87874) * Wiring to allow user to pass event names to profiler and reflect the count to the chrometrace * If not used, the runtime and size overhead should be neglegible * For now, primary user will be KinetoEdgeCPUProfiler but the impl does not assume that * Not exposed to python yet Differential Revision: [D40238032](https://our.internmc.facebook.com/intern/diff/D40238032/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40238032/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87874 Approved by: https://github.com/SS-JIA commit 70782981f06a042796d4604df2ec1491f4f5b194 Author: Brian Hirsh Date: Tue Nov 1 20:06:45 2022 -0700 aot_dispatch test fix: always use functionalization in symbolic tests (#87647) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87647 Approved by: https://github.com/ezyang, https://github.com/Chillee commit f9d7985851f49c3b44383dae50cd77632e7e2245 Author: Brian Hirsh Date: Tue Nov 1 20:06:44 2022 -0700 fix as_strided_scatter_backward (#87646) as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory. Fixes https://github.com/pytorch/pytorch/issues/88105 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87646 Approved by: https://github.com/albanD commit b5a925ff2eac429d27ea55c0046dd68f24e409e2 Author: Brian Hirsh Date: Tue Nov 1 20:06:44 2022 -0700 propagate .meta info when replacing subgraphs in fx (#87255) Fixes https://github.com/pytorch/torchdynamo/issues/1708 Our FX subgraph partitioner works by taking all of the original output nodes from a subgraph, and replacing it with a new `call_module` node in the graph. If the original subgraph outputs had fake tensors and other metadata stored in their `.meta` attribute though, then this information was getting lost when we spliced in the subgraph. Losing metadata on an FX graph also seems like an easy trap to fall into, so I'm wondering if there are any better guardrails that we can add. I ended up fixing in this PR by adding an optional kwarg to propagate meta info directly in the `fx.Node.replace_all_uses_with`, just because propagating metadata seems like a pretty core thing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87255 Approved by: https://github.com/wconstab, https://github.com/SherlockNoMad commit 5669e10d37fa3cca21cf82c843ae4c4e79da1b89 Author: Philip Meier Date: Wed Nov 2 11:25:06 2022 +0100 remove assert_allclose from torch.testing (#87974) See #87969 or #86586 for the reasoning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87974 Approved by: https://github.com/mruberry commit b9c617838ab34d97cea4e773e34db4e2bd3a2526 Author: Philip Meier Date: Wed Nov 2 11:25:06 2022 +0100 remove make_non_contiguous from torch.testing (#87973) See #87969 or #86586 for the reasoning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87973 Approved by: https://github.com/mruberry commit 8893c6cd074682755d5f9e4219b86a0c7f13e76c Author: Philip Meier Date: Wed Nov 2 11:25:05 2022 +0100 remove deprecated dtype getters from torch.testing (#87972) See #87969 or #86586 for the reasoning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87972 Approved by: https://github.com/mruberry commit a360be50b50dfd6ecedaa835106b3fd45d571412 Author: Philip Meier Date: Wed Nov 2 11:25:05 2022 +0100 remove deprecated device getter from torch.testing (#87971) See #87969 or #86586 for the reasoning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87971 Approved by: https://github.com/mruberry commit 554cdc9a63f9d8471061265bc49f5fbf0a220364 Author: Philip Meier Date: Wed Nov 2 11:25:05 2022 +0100 remove deprecated rand and randn from torch.testing (#87970) See #87969 or #86586 for the reasoning. 
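For anyone whose test suite is hit by these removals, a minimal migration sketch (illustrative only, not part of the commits above): the deprecated `torch.testing.assert_allclose` maps onto the supported `torch.testing.assert_close`.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0 + 1e-7])

# Before (deprecated, removed by the stack above):
# torch.testing.assert_allclose(a, b)

# After: assert_close is the supported replacement; it raises an
# AssertionError with a detailed diff if the tensors do not match.
torch.testing.assert_close(a, b)                         # default tolerances
torch.testing.assert_close(a, b, rtol=1e-5, atol=1e-8)   # explicit tolerances
```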
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87970 Approved by: https://github.com/mruberry commit bc73affdade9f8315c8e5cc62211a67562877f8b Author: Philip Meier Date: Wed Nov 2 11:25:04 2022 +0100 prepare removal of deprecated functionality in torch.testing (#87969) _Redo of #86586 with all BC breaking changes granularly placed into separate commits._ --- Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac33be0c8ad9956e3fc15e78ddb6cb95, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969 Approved by: https://github.com/mruberry commit 0fc7de398636f4b53e6c3fde38b4e48a5ff5b37d Author: Digant Desai Date: Tue Nov 1 21:39:03 2022 -0700 [profiler] Add Linux Perf support (#87866) * Add support to use Linux kernel perf subsystem via the profiler. * For now the perf configurability is quite limited to just event names. Threading etc. to come later. * Given we want to support variety of different cpu types, number of events list (in addition to the standard set of events) is also limited. * Rather than failing with unsupported feature for non-Linux platforms, it returns zeros for all the event counts. * For now, max event counts is capped at 4, time multiplexing is not allowed. * Threadpool recreate hack is restricted to mobile only - need to add better support for threading in general Differential Revision: [D40238033](https://our.internmc.facebook.com/intern/diff/D40238033/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40238033/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87866 Approved by: https://github.com/SS-JIA commit d6b58d6924057ba38faa49d68cead72989641962 Author: Andrew Gu Date: Tue Nov 1 22:47:12 2022 +0000 [FSDP()][23/N] Refactor handle attr initialization (#87938) **`_init_param_attributes()` -> `init_flat_param_attributes()`** We move `_init_param_attributes()` to `FlatParamHandle.init_flat_param_attributes()` (as already marked as to-do during previous refactoring). **`_reset_lazy_init()`** We no longer delete `_local_shard` from each `FlatParameter` in `_reset_lazy_init()`. **Analysis** Thus, the two semantic differences are that we remove the initial `if hasattr(p, "_local_shard")` early return in `_init_param_attributes()` and the `delattr(p, "_local_shard")` in `_reset_lazy_init()`. This is safe because - If we never call `_reset_lazy_init()`, then `init_flat_param_attributes()` is only called once. There is no opportunity for an early return. - If we call `_reset_lazy_init()`, then `init_flat_param_attributes()` will be called again in the next `_lazy_init()`. However, since we removed the early return, all of the attributes initialized in `init_flat_param_attributes()` simply get re-initialized and override any existing attributes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87938 Approved by: https://github.com/mrshenli commit d172dcf3164bd45d12c67da32d8196987beff997 Author: Andrew Gu Date: Tue Nov 1 22:47:12 2022 +0000 [FSDP()][21/N] Refactor and fix `_cast_buffers()` (#87935) This PR refactors and fixes `_cast_buffers()`. 
**Before** Buffers were not correctly cast back to their original dtypes for submodules when using buffer mixed precision. - `_cast_buffers(recurse=False)` incorrectly casts all buffers, including those in submodules. This is because of this outer loop over `self.modules()`: https://github.com/pytorch/pytorch/blob/c40033be162db0f94d37e7ccbd2a89d67f8b8e47/torch/distributed/fsdp/fully_sharded_data_parallel.py#L700 - There was a unit test that checked that buffers were cast as expected (`test_mixed_precision_e2e_full_shard()`). The unit test _coincidentally_ passed because all modules shared the same buffer name `"buffer"`. In `_cast_buffers()`, the `dict` mapping buffer name to original dtype is populated lazily (during `_lazy_init()`). However, the keys are unprefixed: https://github.com/pytorch/pytorch/blob/c40033be162db0f94d37e7ccbd2a89d67f8b8e47/torch/distributed/fsdp/fully_sharded_data_parallel.py#L712-L717 - Thus, even though (1) `_cast_buffers(recurse=False)` was only called on the root and (2) `self._buffer_name_to_orig_dtype` had unprefixed names as keys, the unit test still passed because (1) `_cast_buffers()` still looped over all buffers despite `recurse=False` and (2) all submodules' buffers were named `"buffer"` and had the same original and low-precision dtypes and hence were cast correctly. If we change each submodule to have its own distinct buffer name, then the unit test fails. This PR makes such a change to showcase the progression granted by this PR. **After** This PR separates `_cast_buffers()` into three methods: `_get_buffers_and_dtypes_for_computation()`, `_get_buffers_and_dtypes_for_checkpoint()`, and `_cast_buffers_to_dtype_and_device()`. This is to separate the different use cases (casting for computation and casting for checkpointing) and the corresponding code paths. Plus, the signature for `_cast_buffers_to_dtype_and_device()` makes it clear exactly what buffers are being cast and to what dtype. Both `_get_...()` functions assume that they are called on the root only for now. This coincides with the construction of `_buffer_name_to_orig_dtype` in the FSDP constructor, which loops over all submodules. (This means that for non-root modules, their `_buffer_name_to_orig_dtype` is populated but not used.) The `dict`'s keys are clean since the buffer cast to original dtype happens in a `summon_full_params()` context, which cleans the names. **Follow-Ups** - We can try to move `_get_buffers_and_dtypes_for_checkpoint()` into `_state_dict_utils.py` in a follow-up. - We may want to move to per-module buffer casting (i.e. do not have the root module cast for all submodules). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87935 Approved by: https://github.com/mrshenli commit b0b1e78e2ddccc07f94762bdfe33770c75d12db1 Author: Andrew Gu Date: Tue Nov 1 22:47:11 2022 +0000 [FSDP] Rename `dtype` to `buffer_name_to_dtype` (#87934) This PR is easy and only a rename. `dtype` does not convey that it is actually a `Dict[str, torch.dtype]` (when not `None`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87934 Approved by: https://github.com/mrshenli commit d14fc0bc36601b4e72b09d50c2cfdcfdb61be4ad Author: Andrew Gu Date: Tue Nov 1 22:47:11 2022 +0000 [FSDP] Remove `device` arg from `_cast_buffers()` (#87933) This PR is easy. The `device` argument in `_cast_buffers()` is never used. 
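Referring back to the `_cast_buffers()` fix (#87935) above, a small self-contained sketch (illustrative only, not FSDP code) of why keying a dict on unprefixed buffer names hides the bug whenever every submodule names its buffer `"buffer"`:

```python
import torch

class Sub(torch.nn.Module):
    def __init__(self, dtype):
        super().__init__()
        self.register_buffer("buffer", torch.zeros(1, dtype=dtype))

class Root(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Sub(torch.float32)
        self.b = Sub(torch.float64)

root = Root()
# Prefixed names are unique per submodule...
print([name for name, _ in root.named_buffers()])  # ['a.buffer', 'b.buffer']
# ...but a dict keyed on unprefixed names silently collapses them,
# so only one "original dtype" survives and the others are lost.
unprefixed = {name.split(".")[-1]: buf.dtype for name, buf in root.named_buffers()}
print(unprefixed)  # {'buffer': torch.float64}
```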
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87933 Approved by: https://github.com/mrshenli commit 19c7df89fbdf2d8a28ba67d10cfdfe7540bb0c55 Author: Andrew Gu Date: Tue Nov 1 22:47:11 2022 +0000 [FSDP()][20/N][Easy] Move functions in file (#87932) This PR is easy. I just wanted to group functions in the file according to the same logical order. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87932 Approved by: https://github.com/mrshenli commit 4635f56da170dbf25759b2128e57a387d01cb41c Author: Andrew Gu Date: Tue Nov 1 22:47:10 2022 +0000 [FSDP()][18/N] Refactor `pre_forward_unshard()` (#87931) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87931 Approved by: https://github.com/mrshenli commit 0a752688bda6cb217852068e15da5133a9d7e5b6 Author: Andrew Gu Date: Tue Nov 1 22:47:10 2022 +0000 [FSDP()][17/N] Refactor `_fsdp_root_pre_forward()` (#87930) This PR moves `_fsdp_root_pre_forward()` to `_runtime_utils.py`. Note: This PR includes a (temporary) fix for `NO_SHARD` + `CPUOffload(offload_params=True)`, where we set `non_blocking=False` when copying the gradient from device to host. It is only included in this PR since the test was **flaky** (but not consistently failing) on this PR , so I needed to fix to unblock land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87930 Approved by: https://github.com/mrshenli commit 39d9d2ed706dd9737ee49828b63c8975f31a87fe Author: lezcano Date: Tue Nov 1 16:31:11 2022 +0000 Implement reference for lerp (#87424) We follow the vectorised CPU implementation for numerical accuracy Pull Request resolved: https://github.com/pytorch/pytorch/pull/87424 Approved by: https://github.com/ezyang commit 6b5d7fccc6140fc15a7e882e9c8de21477e31459 Author: Ivan Yashchuk Date: Wed Nov 2 11:11:28 2022 +0000 Add a basic test for "nvprims_nvfuser" Dynamo backend (#88186) Ref. https://github.com/pytorch/pytorch/pull/87797#issuecomment-1297635210 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88186 Approved by: https://github.com/ezyang commit 9ebb8d52320feb5c634ddde767d0404b94443443 Author: Ivan Yashchuk Date: Wed Nov 2 10:05:12 2022 +0000 Add ops.broadcast for nvFuser (#88080) Having nvFuser's `broadcast` available alongside `broadcast_in_dim` would allow easier experimentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88080 Approved by: https://github.com/jjsjann123, https://github.com/kevinstephano, https://github.com/mruberry commit 2ddefbdc3c4fab70b4c2898a0a25e403610741fc Author: Kazuaki Ishizaki Date: Wed Nov 2 09:38:13 2022 +0000 Fix typos used in documents under torch directory (#88300) This PR fixes typos, in comments of Python files, that are found from a search box at https://pytorch.org/docs/master/search.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/88300 Approved by: https://github.com/lezcano commit 4a8382b58eeca9eed09c7c3b801b81befc2f75ce Author: Ivan Yashchuk Date: Wed Nov 2 09:29:20 2022 +0000 Update caching of tensor arguments for nvFuser's fusion creation (#87860) Previously nvFuser's fusion definition was cached based on concrete shape and strides of tensor inputs for simplicity and correctness. This PR changes Python's cache to check the number of dimensions, size-1 dimensions, and contiguity information based on given strides and shapes. 
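As a rough illustration of the caching strategy described for #87860 (the helper below and its exact contiguity rule are assumptions for exposition, not the actual implementation): the key records rank, size-1 dimensions, and per-dimension contiguity derived from shapes and strides, so inputs that differ only in concrete extent can reuse one fusion definition.

```python
from typing import Sequence, Tuple

def fusion_cache_key(shapes: Sequence[Tuple[int, ...]],
                     strides: Sequence[Tuple[int, ...]]) -> Tuple:
    """Illustrative only: build a cache key from ndim, size-1 dims and
    per-dimension contiguity instead of concrete shapes/strides."""
    key = []
    for shape, stride in zip(shapes, strides):
        size_one_dims = tuple(i for i, s in enumerate(shape) if s == 1)
        # A dimension is "contiguous" with the next one if its stride
        # equals the next dimension's stride times that dimension's size.
        contiguity = tuple(
            stride[i] == stride[i + 1] * shape[i + 1]
            for i in range(len(shape) - 1)
        )
        key.append((len(shape), size_one_dims, contiguity))
    return tuple(key)

# Two inputs with different concrete sizes but the same layout structure
# map to the same key, so a cached fusion definition could be reused.
print(fusion_cache_key([(2, 3, 4)], [(12, 4, 1)]) ==
      fusion_cache_key([(5, 6, 7)], [(42, 7, 1)]))  # True
```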
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87860 Approved by: https://github.com/kevinstephano, https://github.com/jjsjann123, https://github.com/ngimel commit ccf6b558a4c58d1ae92689b2a5064916b42eff05 Author: Yanbo Liang Date: Wed Nov 2 06:58:02 2022 +0000 [Dynamo] UserFunctionVariable supports type & ABCMeta as arguments (#88257) Fixes https://github.com/pytorch/torchdynamo/issues/1785 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88257 Approved by: https://github.com/ezyang commit e763b7abebd7e3e9376a59b5f728916e0ca084a8 Author: Kshiteej K Date: Wed Nov 2 06:37:33 2022 +0000 [complex] conv_transpose3d : complex support (#87967) Reference: https://github.com/pytorch/pytorch/issues/71108 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87967 Approved by: https://github.com/anjali411 commit 7674af9ce7a3f5b210f16d0e935a89c76440434c Author: PyTorch MergeBot Date: Wed Nov 2 05:22:38 2022 +0000 [vision hash update] update the pinned vision hash (#88162) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88162 Approved by: https://github.com/pytorchbot commit 4ab5d79b286007eb126ca0002cdaed2305c05cc1 Author: Fabio Rocha Date: Tue Nov 1 19:29:17 2022 +0000 [inductor] Updated some triton.libdevice calls (#88242) triton master now does not require `d` or `f` suffix to some libdevice function calls - it dispatches to right library call based on argument type. triton pin updated to https://github.com/openai/triton/commit/f16138d447bccc54641a9c48ffedbd449a1a40a7 Also removed some xfails for some unrelated tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88242 Approved by: https://github.com/ngimel commit a51da28551e9f13a7afca5bbc829a8d9abced44e Author: Will Constable Date: Wed Nov 2 03:52:17 2022 +0000 Support multi-gpu CI for inductor-distributed (#87996) This test by itself isn't the end goal, but it is a minimal test that exercises multi-gpu and the focus of the PR is the infra behind enabling that. I'll follow up with more tests using actual models etc. and @malfet @desertfire for awareness/feedback on the infra side Pull Request resolved: https://github.com/pytorch/pytorch/pull/87996 Approved by: https://github.com/aazzolini commit 95fc0bcaaddc2d24e8759f24dbefa789d04e9e42 Author: Edward Z. Yang Date: Mon Oct 31 13:33:52 2022 -0700 Disable torchdynamo in backwards compiler harder (#88132) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88132 Approved by: https://github.com/bertmaher, https://github.com/malfet commit 3c6bddc3f6347ce7d1ed33aee94cdaa953cbc387 Author: eqy Date: Wed Nov 2 01:36:37 2022 +0000 [cuDNN] (re-open) Enable cuDNN Frontend v8 API by Default (#87669) Has a small tweak to a test that was breaking on A10 (CC @malfet). CC @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87669 Approved by: https://github.com/ngimel commit dfa94757557da460b677dbcc7edcb19f0e7122d7 Author: Peter Bell Date: Tue Nov 1 18:03:06 2022 +0000 Check SM version before calling flash attention with BFloat16 (#86600) The flash attention code path requires sm80 or newer to run on BFloat16, so any OpInfo tests running with BFloat16 would fail with the error: ``` RuntimeError: Expected q_dtype == at::kHalf || (is_sm8x && q_dtype == at::kBFloat16) to be true, but got false. 
``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86600 Approved by: https://github.com/ngimel commit bc9caafc7898a6df534f566f599f8f5a78d207d1 Author: Peter Bell Date: Tue Nov 1 17:50:07 2022 +0000 record_function: update to use custom_class API (#76420) Re-submit of gh-72302 This still has a small performance hit, but it much smaller. On my machine I see `_record_fucntion_exit._RecordFunction` takes 1.05 us compared to the `Tensor` overload taking 0.79 us. In an overall comparison, I see a 0.7 us slowdown from 6.0 us to 6.7 us for this timeit benchmark ```python import torch def foo(): with torch.profiler.record_function("foo"): return torch.eye(3) %timeit foo() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/76420 Approved by: https://github.com/robieta commit 0131a66ab6c8454c9ac7517641f63095b090e8cb Author: Kazuaki Ishizaki Date: Tue Nov 1 22:58:22 2022 +0000 Fix typos under torch directory (#88172) This PR fixes typos in '.md' files under torch directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/88172 Approved by: https://github.com/malfet commit 72958b9665a59c1fc53c2254c675530dcc2886dd Author: Yanbo Liang Date: Tue Nov 1 22:45:11 2022 +0000 [Dynamo] Update Dynamo benchmarks running commands (#87844) Fixes https://github.com/pytorch/torchdynamo/issues/1761 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87844 Approved by: https://github.com/jansel commit a56beb2a822a07b8a933ef71f20813715a58e030 Author: jjsjann123 Date: Tue Nov 1 22:43:51 2022 +0000 [nvfuser] merge rule update (#88228) adding Kevin to NVFuser reviewer Pull Request resolved: https://github.com/pytorch/pytorch/pull/88228 Approved by: https://github.com/soumith commit fb1586fbcb23b3427c55b7f1c9bd554f4a1aa05d Author: Shiyan Deng Date: Tue Nov 1 22:42:04 2022 +0000 Make a copy of the submodule inputs (#87899) Summary: There might be inplace ops in the model that would change the saved inputs. To avoid that, we save a deepcopy version. Test Plan: CI Differential Revision: D40771290 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87899 Approved by: https://github.com/houseroad commit 73492645cfcee7f2b3b6f6803cca1baca814a901 Author: Charlie Yan Date: Tue Nov 1 17:46:00 2022 +0000 Copy DDP code to be reused in composable API (#87836) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87836 Approved by: https://github.com/mrshenli commit b2dfd2026034c8ca13d9ddef7fd990a3f2054a1e Author: Andrew M. James Date: Mon Oct 31 18:29:05 2022 -0500 Remove BSC conversion skip from TestSparseCompressed.test_consistency (#88152) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88152 Approved by: https://github.com/cpuhrsch commit d044b4cc58f1e088e19c0f731a18630607887389 Author: Andrew M. 
James Date: Mon Oct 31 18:29:02 2022 -0500 Update torch.abs and torch.positive opinfos to reflect sparse support (#88151) cc @nikitaved @pearu @cpuhrsch @bhosmer Pull Request resolved: https://github.com/pytorch/pytorch/pull/88151 Approved by: https://github.com/cpuhrsch commit ffd54def8fa52653d3a68bc00f0583e8d16d6acb Author: Nikita Shulga Date: Tue Nov 1 22:17:12 2022 +0000 [GHF] Remove CC line from commit message (#88252) This line is added by autoCCBot, but is not really meaningful as commit message Test Plan: ``` >>> from trymerge import GitHubPR, RE_PR_CC_LINE >>> import re >>> pr=GitHubPR("pytorch", "pytorch", 87809) >>> re.sub(RE_PR_CC_LINE, "", pr.get_body()) 'Fixes #ISSUE_NUMBER\r\n\n\n' >>> pr=GitHubPR("pytorch", "pytorch", 87913) >>> re.sub(RE_PR_CC_LINE, "", pr.get_body()) 'Parallel compilation warms the Threadpool when we call `torch._dynamo.optimize()`. In current benchmarks, we were setting up the TRITON_CACHE_DIR much later. Because of this parallel compilation artifacts were not used and compilation latency improvements were not visible in dashboard. This PR just prepones the setup of TRITON_CACHE_DIR.\n\n' >>> pr=GitHubPR("pytorch", "pytorch", 85692) >>> re.sub(RE_PR_CC_LINE, "", pr.get_body()) 'This PR sets CUDA_MODULE_LOADING if it\'s not set by the user. By default, it sets it to "LAZY".\r\n\r\nIt was tested using the following commands:\r\n```\r\npython -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"\r\n```\r\nwhich shows a memory usage of: 287,047,680 bytes\r\n\r\nvs\r\n\r\n```\r\nCUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"\r\n```\r\nwhich shows 666,632,192 bytes. 
\r\n\r\nC++ implementation is needed for the libtorch users (otherwise it could have been a pure python functionality).\r\n\r\n' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88252 Approved by: https://github.com/xuzhao9, https://github.com/izaitsevfb commit ba643b4ddf3ef0e6ada3fdfd885ed18a71ed8e44 Author: Sean Ross-Ross Date: Tue Nov 1 21:42:51 2022 +0000 feature: adding batch support for narrow_copy operator (#88130) Implement batch support https://github.com/pytorch/functorch/issues/825 for narrow copy narrow_copy was already added as an opinfo cc @zou3519 @Chillee @samdow @soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/88130 Approved by: https://github.com/kshitij12345, https://github.com/zou3519 commit c40033be162db0f94d37e7ccbd2a89d67f8b8e47 Author: Manuel Candales Date: Tue Nov 1 21:01:31 2022 +0000 [Vulkan][TCC] Implement tests for cat_batch, cat_width and normalize_dim (#87633) Summary: Implement Vulkan tests for these untested functions in Concat.cpp: - cat_batch - cat_width - normalize_dim Test Plan: ```cd ~/fbsource buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 ``` Differential Revision: D40605571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87633 Approved by: https://github.com/salilsdesai, https://github.com/kirklandsign, https://github.com/SS-JIA commit e6ea0a4a4b4ae840ff441a2a7331030dee942766 Author: Elias Ellison Date: Tue Nov 1 08:56:06 2022 -0700 Don't Require contiguous For Extern Kernels (#87650) cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87650 Approved by: https://github.com/desertfire commit 8ef9bda1bf7df84483c593f55e704657887120d6 Author: Kevin Stephano Date: Tue Nov 1 19:02:40 2022 +0000 Fix nvFuser Fusion Definition printing of Squeeze and Permute (#88041) NM cc @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88041 Approved by: https://github.com/IvanYashchuk, https://github.com/jjsjann123, https://github.com/mruberry commit 68f9f256a3ccae692204d42600e618fb2112b8cb Author: Jerry Zhang Date: Fri Oct 28 11:29:04 2022 -0700 [reland][fx][subgraph_rewriter] Change match_filter to be a List in replace_pattern_with_filters (#87998) Summary: att, this is experimental api so not marking it as bc-breaking. The match will be accepted only if all the filters in the list passes. Changing the filter arg to be list also allows us to pass in empty list that means no filter, which makes user code cleaner. Test Plan: python test/test_fx.py -k test_replace_pattern_with_filters Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D40810943](https://our.internmc.facebook.com/intern/diff/D40810943) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87998 Approved by: https://github.com/SherlockNoMad commit 2c7de4a14425759fdfdca7d0c5091ceafe564695 Author: Tugsbayasgalan Manlaibaatar Date: Mon Oct 31 10:58:36 2022 -0700 Add meta implementation for aten.max.dim (#88005) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88005 Approved by: https://github.com/Chillee, https://github.com/bdhirsh commit 97b3eeac90d6e3fdc093117422e031b8630528a1 Author: jjsjann123 Date: Tue Nov 1 18:07:17 2022 +0000 remove assert on tensor inputs to FusionGroup (#88018) Fixes #86530 #86227 #85872 All issues seem to be duplicate of each other. 
Removes the false positive assert Fixes come from @kevinstephano Pull Request resolved: https://github.com/pytorch/pytorch/pull/88018 Approved by: https://github.com/kevinstephano, https://github.com/soumith commit e1c123d29a40ae1f3eae312a118e22769b1db870 Author: Nikita Shulga Date: Tue Nov 1 17:59:35 2022 +0000 Add UBSAN to ASAN (#88055) Add undefined behavior sanitizer to `USE_ASAN` option. Added `torch._C._crash_if_vptr_ubsan()` that only fails if vptr belongs to a wrong class after typecast Deleted all ubsan supressions, but disabled `ProtoTest::Basic` as it fails above-mentioned vptr check. Fixes https://github.com/pytorch/pytorch/issues/88042 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88055 Approved by: https://github.com/ezyang commit 81f74eed75d24b63b5af8e818d74667647702dbf Author: Howard Huang Date: Mon Oct 31 16:33:12 2022 -0700 [11/N] Update all_to_all with CPU/CUDA implementations (#86407) * #83916 [7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations Pull Request resolved: https://github.com/pytorch/pytorch/pull/86407 Approved by: https://github.com/kwen2501 commit 90fa25705c0da7b50b88302626988267210186ba Author: Ivan Yashchuk Date: Tue Nov 1 17:46:52 2022 +0000 Rename 'nvfuser' to 'ts_nvfuser' indicating TorchScript usage (#88188) cc @kevinstephano @jjsjann123 @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88188 Approved by: https://github.com/soumith, https://github.com/jansel commit bed8102741bdd62936cc743e83751e2fb91a5a3f Author: Howard Huang Date: Mon Oct 31 16:33:11 2022 -0700 [10/N] Update barrier with CPU/CUDA implementations (#86368) - Updates for the barrier collective - NOTE: current change will not achieve dispatching of barrier since there is no tensor to read from https://github.com/pytorch/pytorch/issues/86225 cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @kwen2501 @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/86368 Approved by: https://github.com/kwen2501 commit 1f34067e9d83aa11f2f4d0ecd08cb0e0ed94dbd0 Author: Andrew Gu Date: Tue Nov 1 13:36:04 2022 +0000 [FSDP()][16/N] Refactor post-forward/pre-backward (#87929) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87929 Approved by: https://github.com/mrshenli commit 5a53f024e4a3a7958e97546a03fe224788a91df5 Author: Andrew Gu Date: Tue Nov 1 13:36:03 2022 +0000 [FSDP()][15/N] Refactor `_init_streams()` (#87928) This PR is easy. I think I move `_init_streams()` again in a later PR though :/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/87928 Approved by: https://github.com/mrshenli commit 90c5f856b2bd0b8c1776baac959219e9487ba4b1 Author: Andrew Gu Date: Tue Nov 1 13:36:03 2022 +0000 [FSDP()][14/N] Refactor pre-forward/post-backward (#87927) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87927 Approved by: https://github.com/mrshenli commit eb91e8a534f94127a6d744543f2080a44bca9e57 Author: Yidi Wu Date: Tue Nov 1 17:10:45 2022 +0000 torchdynamo support modules() for nn_module (#88023) Differential Revision: D40820879 This diff allows models to call self.modules() during dynamo tracing. 
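A toy example of the newly supported pattern (illustrative only; `torch._dynamo.optimize("eager")` is just one way to trigger tracing and is an assumption here, not taken from the diff):

```python
import torch
import torch._dynamo as dynamo

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        # Iterating over self.modules() inside forward() is the kind of
        # call this change allows dynamo to trace through.
        n_modules = sum(1 for _ in self.modules())
        return self.linear(x) * n_modules

compiled = dynamo.optimize("eager")(Counter())
print(compiled(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```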
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88023 Approved by: https://github.com/tugsbayasgalan, https://github.com/voznesenskym, https://github.com/jansel commit de1f641f11d3a486032cd2a63ac958ec23d2c92b Author: Sherlock Huang Date: Tue Nov 1 02:02:20 2022 +0000 Fix meta function for aten.addmm (#88068) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88068 Approved by: https://github.com/albanD commit fdc419786df8a6d86edf8c82b07bf4e9b8b551d0 Author: Thiago Crepaldi Date: Tue Nov 1 16:43:58 2022 +0000 Add unit test for torch_geometric library (#85937) Fixes #65138 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85937 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 5c3666cb813f6ffa9e11580552c35435716703de Author: Han Qi (qihqi) Date: Tue Nov 1 16:11:30 2022 +0000 [codev] Make backport work with flatbuffer models (#88127) Summary: By adding flatbuffer as dependency of backport. Differential Revision: D40865452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88127 Approved by: https://github.com/cccclai commit bb7e6254e4387e66beb938fb5b756d0f5c28d2a1 Author: Edward Z. Yang Date: Mon Oct 31 14:42:19 2022 -0700 Add ability to freeze storages inside functionalization (#88141) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88141 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit 61f955dd83c0a6e12aca2a0a7c7bf267bcdd1bc5 Author: Edward Z. Yang Date: Mon Oct 31 14:42:15 2022 -0700 Inline Alias into FunctionalStorageImpl (#88140) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88140 Approved by: https://github.com/bdhirsh commit 73c9911fc001991809a6c90e2d61f71fc69ffde6 Author: Natalia Gimelshein Date: Tue Nov 1 15:47:43 2022 +0000 always realize output regardless of the number of reads (#88046) This improves hf_Bert 1.139x->1.21x, currently lowmem dropout doesn't work for nn.Dropout module, and before this change we were recomputing all the dropout masks in a very inefficient kernel. This change pushes dropout masks to be saved in the dropout kernels where they are first computed. 
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88046 Approved by: https://github.com/Chillee commit c368c0faf08528fb73a2b74f905946268a3224a3 Author: Sherlock Huang Date: Tue Nov 1 02:02:19 2022 +0000 Fix meta for aten.fill, constant_pad_nd, _adaptive_avg_pool2d (#88069) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88069 Approved by: https://github.com/ngimel, https://github.com/malfet commit 82a9de16d4e1cb6da3626b26d6cbaaaa9258721c Author: Will Constable Date: Tue Nov 1 15:35:44 2022 +0000 Change dynamo/distributed tests to use cuda/nccl (#88133) - FSDP tests require nccl - also run in inductor shard and skip inductor in distributed shard - inductor shard has newer GPU and supports triton/inductor, but only runs on trunk - distributed shard runs on PR, but inductor shard only runs on trunk/opt-in Pull Request resolved: https://github.com/pytorch/pytorch/pull/88133 Approved by: https://github.com/davidberard98 commit 44f8efd5c1cd5c7641cb875e615ae480a730b9fa Author: Yanli Zhao Date: Tue Nov 1 15:27:40 2022 +0000 [BE]fix DDP when the number of output features is zero (#87793) Fixes #87280 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87793 Approved by: https://github.com/rohan-varma commit 20d849b98237f83ab4fcc5439d8e8c8f8fd71c8c Author: Howard Huang Date: Mon Oct 31 16:33:11 2022 -0700 [9/N] [Dispatchable Collectives] Update reduce_scatter with CPU / CUDA implementations (#86166) - Updates for the reduce_scatter collective https://github.com/pytorch/pytorch/issues/86225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86166 Approved by: https://github.com/kwen2501 commit 1e5d33b6dfc0ed477fec57aca63427d038751207 Author: Edward Z. Yang Date: Mon Oct 31 17:48:35 2022 -0700 Reenable assert sanity testing with ADInplaceOrView reenable (#88102) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88102 Approved by: https://github.com/albanD commit bdb14238ec66640a9523a479fac60eda26a3b552 Author: AllenTiTaiWang Date: Mon Oct 31 23:44:23 2022 +0000 [Reland][ONNX] Move all torch.onnx.export related tests to test/onnx (#87292) Moving torch.onnx.export related tests to test/onnx integrates ONNX tests to the same CI machine, so the testing environment can be better managed. Fixes https://github.com/pytorch/pytorch/issues/87320 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87292 Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao, https://github.com/kit1980, https://github.com/malfet commit 62988e4fe642561e82ac95114214cdd10273a936 Author: Charlie Yan Date: Tue Nov 1 13:51:06 2022 +0000 Update _distributed_c10d.pyi (#88088) Summary: `_distributed_c10d.pyi` is out of sync with the C++ binding. This change updates it. Test Plan: TBD Differential Revision: D40840836 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88088 Approved by: https://github.com/wanchaol commit b1750d0440e0bcc94de2295a8f24a2cd0cdcd886 Author: Andrew Gu Date: Tue Nov 1 01:16:26 2022 +0000 [FSDP()][13/N] Refactor unshard/reshard/grads (#87926) This PR is not too complicated. We just move unshard/reshard/grads out to `_runtime_utils.py` and make them take `state: _State` instead of `self`. 
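The shape of that refactor, in a minimal sketch (names and fields below are assumptions, not FSDP's actual API): helpers become free functions over an explicit state object, so the same logic can serve both the wrapper class and the composable API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class _State:
    sharded_params: List[str] = field(default_factory=list)
    unsharded: bool = False

def _unshard(state: _State) -> None:
    # Formerly a method closing over `self`; now a free function over `state`.
    state.unsharded = True

def _reshard(state: _State) -> None:
    state.unsharded = False

state = _State(sharded_params=["weight", "bias"])
_unshard(state)
assert state.unsharded
_reshard(state)
```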
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87926 Approved by: https://github.com/mrshenli commit 8039317c07874b62647457bac7bf5df499f41501 Author: Andrew Gu Date: Mon Oct 31 20:54:53 2022 +0000 [FSDP()][12/N] Easy cleanup (#87925) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87925 Approved by: https://github.com/mrshenli commit c1e28731b382ba5ea742cfc5a35bda6e4bcb35fc Author: Andrew Gu Date: Mon Oct 31 20:54:52 2022 +0000 [FSDP()][10/N][11/N] Introduce composable (ctor only) (#87924) This PR introduces the composable FSDP API (with constructor semantics only) along with some further constructor refactoring. A notable contribution here is `_get_submodule_to_states()`, which performs auto wrapping without actually wrapping. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87924 Approved by: https://github.com/mrshenli commit 78170701a348619745dc289bdf591384929a414a Author: Andrew Gu Date: Mon Oct 31 20:54:52 2022 +0000 [FSDP()][9/N] Refactor ctor (continued) (#87923) This PR makes a second pass over the constructor. The logic has been grouped into `_init_<...>` functions based on intent (e.g. `_init_prefetching_state()` or `_init_runtime_state()`). This makes the initialization code for composable FSDP much cleaner than having to re-write the same sequences of lower-level helper calls. This PR also moves `_ExecOrderData` into its own file `_exec_order_utils.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87923 Approved by: https://github.com/mrshenli commit 23fe6c8ca15ec2cf6ea74f93aa91cae343ea534f Author: Mike Iovine Date: Tue Nov 1 09:58:26 2022 +0000 [Static Runtime] Fix ReplaceWithMaybeCopy test in OSS (#88099) Summary: `ReplaceWithMaybeCopy` is guarded by `FBCODE_CAFFE` in `OptimizeGraph`. Run the pass manually to ensure it does the replacement. Test Plan: Existing tests Differential Revision: D40858743 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88099 Approved by: https://github.com/huydhn commit 7c6fe21a386213617d77b98be28729e6e32b29a0 Author: Huy Do Date: Tue Nov 1 05:58:42 2022 +0000 Fix monitoring script for macos (#88159) The monitoring script is currently failing with AccessDenied when trying to access uss memory on mac because [psutil.memory_full_info](https://psutil.readthedocs.io/en/latest/index.html?highlight=memory_full_info) requires higher user privileges Example failures: * https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3363066309/1/artifact/usage-log-test-default-2-2-macos-12_9208104847.zip * https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3363066309/1/artifact/usage-log-test-default-2-2-macos-m1-12_9207913759.zip I could also make this script run with sudo, effectively granting this permission. But I'm not entirely sure that we need uss memory for mac, so gracefully handling the error looks nicer Pull Request resolved: https://github.com/pytorch/pytorch/pull/88159 Approved by: https://github.com/clee2000 commit 323c646ca9e0f8eb452ed446b305382afcc7e270 Author: Kevin Stephano Date: Tue Nov 1 05:05:15 2022 +0000 Cleaned up the nvFuser Python Frontend Batch Norm printing (#88057) * Removed `define_null_tensor` usage in favor of using optional arguments for binding. * Re-ordered the non-State arguments for easier printing. 
* Added a printing function to include booleans `training` and `channels_last` * Fixed `define_tensor` to print `is_cpu` cc @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88057 Approved by: https://github.com/IvanYashchuk, https://github.com/jjsjann123, https://github.com/mruberry commit a6acbad5c33a60109ba8373da8aa61a728ae4b20 Author: Nikita Shulga Date: Tue Nov 1 03:59:51 2022 +0000 [BE] Use default constructor in `LoggerVoidify` (#88054) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88054 Approved by: https://github.com/kit1980 commit 560786ac206278a48121feef2c4c55d71bdb9a77 Author: Driss Guessous Date: Tue Nov 1 03:14:24 2022 +0000 call contiguous on BMM inputs for NT on CUDA (#88108) Fixes #87713 BMM for cpu supports non-contiguous nested tensor inputs, while BMM for Cuda does not support currently non-contiguous inputs. The derivative for BMM: ``` - name: bmm(Tensor self, Tensor mat2) -> Tensor self: grad.bmm(mat2.transpose(1, 2).conj()) mat2: self.transpose(1, 2).conj().bmm(grad) result: self_t.bmm(mat2_p) + self_p.bmm(mat2_t) ``` When calling backward it was impossible for this function to succeed since the inputs were always discontiguous, regardless of the user input. This adds contiguous calls to BMM_cuda implementation for nested tensors. This was not caught by tests because grad_check is currently only done on CPU in test_nestedtensors. This PR updates the autograd test to also be run on GPU. As a result I found one more issue with the backward for to_padded_tensor erroring instead of calling the generic version. cc @cpuhrsch @jbschlosser @bhosmer @mikaylagawarecki Pull Request resolved: https://github.com/pytorch/pytorch/pull/88108 Approved by: https://github.com/cpuhrsch commit 0eea05b11e628eb4fd35a5664b3dd3812ab61461 Author: Ivan Yashchuk Date: Tue Nov 1 03:09:34 2022 +0000 Remove "prims_nvfuser" backend for TorchDynamo (#88083) Removing "prims_nvfuser" backend according to the discussion in https://github.com/pytorch/torchdynamo/pull/1281#discussion_r979468355. cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88083 Approved by: https://github.com/ezyang commit a8aaee77bed496c3f0c80410c62c4aac5bff4296 Author: Sahan Paliskara Date: Mon Oct 31 12:40:30 2022 -0700 [torch::deploy] add gpu unit tests to CI (#88107) Adds `torch::deploy`'s GPU tests to core CI to make sure core changes don't break them. 
Overall, deploy tests take 11 min, so it shouldn't be much of a burden :) https://github.com/pytorch/pytorch/actions/runs/3364231795/jobs/5578861939 Differential Revision: [D40861442](https://our.internmc.facebook.com/intern/diff/D40861442) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88107 Approved by: https://github.com/d4l3k, https://github.com/anirbanr-fb-r2p commit 6a75a0d1a197e378ebbf1f73f5ab93ce79cb873a Author: Christian Puhrsch Date: Tue Nov 1 02:37:42 2022 +0000 Add support for neg to NestedTensor (#88131) Partially fixes #86889 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88131 Approved by: https://github.com/drisspg commit 708c050af9878f46a79088ab92e22dd0589fdcd4 Author: yanbing-j Date: Tue Nov 1 02:06:30 2022 +0000 Add labeler with cpu, mkldnn, amp, NNC and quantization paths to start (#87690) This PR is to dd labeler with `module: cpu`, `module: mkldnn`, `module: amp (automated mixed precision)`, `NNC` and `oncall: quantization' paths to start. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87690 Approved by: https://github.com/ezyang, https://github.com/malfet commit 3aa7a528553cca8d473805caa0e52fe63fb11f52 Author: maxren Date: Mon Oct 31 10:18:53 2022 -0700 [xnnpack][lite-int][4/n] introduce serialization to delegate (#87908) We introduced the serializer we created in the previous diff to our XNNGraph builder, the purpose of this is to serialize parts of the graph as we build this. At the end, we are able to finish and serialize the xnngraph into a std::string for use when we forward this along to on-device runtime. The next diff will rebuild the xnngraph from the serialization we introduce here, so testing the serialization of the graph will be done in the next diff Differential Revision: [D39335580](https://our.internmc.facebook.com/intern/diff/D39335580/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39335580/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87908 Approved by: https://github.com/digantdesai commit 8287c1d96494805728b06e3c7b80d07b55897352 Author: maxren Date: Mon Oct 31 10:18:51 2022 -0700 [xnnpack][lite-int][3/n] flatbuffer serializer class (#87907) Creating a serializer class that allows us to serialize the xnnpack graph creation arguments. This essentially abstracts away the flatbuffer api manipulation and serialization that we deal with. As a result we can call ``` XNNSerializer::serializeAddNode() XNNSerializer::serializeTensorValue() XNNSerializer::finishAndSerialize ``` to serialize the graph Differential Revision: [D39196312](https://our.internmc.facebook.com/intern/diff/D39196312/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39196312/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87907 Approved by: https://github.com/digantdesai commit 7bf819b181f3d4407e06b25d2f8fdd2230c44891 Author: maxren Date: Mon Oct 31 10:18:49 2022 -0700 [xnnpack][lite-int][2/n] flatbuffer xnn_value schema (#87906) Serializer schema for xnnpack graphs. Differential Revision: [D39003170](https://our.internmc.facebook.com/intern/diff/D39003170/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87906 Approved by: https://github.com/digantdesai commit 905d532d39526cf612857a4a802702af1805a71c Author: maxren Date: Mon Oct 31 10:18:46 2022 -0700 [xnnpack][lite-int][1/n] flatbuffer buck rules (#87826) Writing a placeholder schema.fbs file for now to set up the buck gen rules. The generated schema file will be used in the xnnpack namespace and be reserved for serialization/deserialization of our xnnpack lowered graph. Steps Accomplished: - Buck rules to compile flatbuffer schema - added header file to preprocess - everything compiles correctly Differential Revision: [D38999169](https://our.internmc.facebook.com/intern/diff/D38999169/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38999169/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87826 Approved by: https://github.com/digantdesai commit aa1f9a1bd79da47d17f28a50f467a628728e68ac Author: maxren Date: Mon Oct 31 10:18:45 2022 -0700 [xnnpack][lite-int][graph-build] torchscript -> xnnpack graph (#87824) At this point we perform the conversion from TorchScript IR to the XNNPack graph. Currently we only support converting Add Nodes and fp32 tensor values. As a caveat, we are not building this at runtime. So for testing we just run the xnn graph once ahead of time with sample inputs and forward it to execute. This is only for testing, and will be changed in a later diff. This will allow us to check that graph creation is sound. Differential Revision: [D39838851](https://our.internmc.facebook.com/intern/diff/D39838851/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87824 Approved by: https://github.com/digantdesai, https://github.com/salilsdesai commit d596b048e5b7fba24e2dfb33413462c646950d93 Author: Edward Z. Yang Date: Mon Oct 31 09:25:34 2022 -0400 Also skip large models for normal --accuracy runs (#88086) Signed-off-by: Edward Z. Yang cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88086 Approved by: https://github.com/albanD commit afd00673b6dedbdb811cfb1a9078deee1cb53f38 Author: Driss Guessous Date: Tue Nov 1 00:00:35 2022 +0000 Change Nested Tensor logging copy (#88104) Change the copy of how we log NestedTensor usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88104 Approved by: https://github.com/mikaylagawarecki commit c0761a835b88eb2ed2186a9aaac73d471c2fb843 Author: PyTorch MergeBot Date: Mon Oct 31 23:49:37 2022 +0000 Revert "[dynamo] Error when user nests FX with dynamo (#87797)" This reverts commit 1da5aeb97b73664ff0fe2f4bb48379655cede969.
Reverted https://github.com/pytorch/pytorch/pull/87797 on behalf of https://github.com/ezyang due to breaks nvfuser stack, needs more investigation commit caaf37a1116cf4ce0f372bbd9241f8a827dc33b7 Author: Nikita Shulga Date: Mon Oct 31 23:38:03 2022 +0000 Fix `PyTorchStreamWriter` exception handling (#88128) Avoid a double exception in the destructor when attempting to serialize to a python object that does not have a `write` method. Use the `Finalizer` class in `PyTorchStreamWriter::writeEndOfFile()` to always set the `finalized_` property even if an exception occurs (as there isn't much one can do at this point). Add an explicit check for the attribute to `_open_zipfile_writer_buffer` and add unit tests. Modernize the code a bit by using the Python-3 `super()` method Fixes https://github.com/pytorch/pytorch/issues/87997 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88128 Approved by: https://github.com/albanD commit ea8a5b09a9e1e08017c245799891496bfd40c7f6 Author: John Detloff Date: Mon Oct 31 23:36:00 2022 +0000 [IOS] Update Cocoapods for 1.13 release (#88075) Update the podspecs for libtorch and libtorch-lite to v 1.13 to prepare for the 1.13 pod release. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88075 Approved by: https://github.com/manuelcandales, https://github.com/salilsdesai, https://github.com/malfet commit bc03aa6013e101222c9652d04a2b08e48f626dfb Author: Masaki Kozuki Date: Mon Oct 31 22:45:23 2022 +0000 Store `autocast_gpu_dtype` in `custom_fwd` and `custom_bwd` for BFloat16 autocast (#88029) As per #87979, `custom_bwd` seems to forcefully use `torch.float16` for `torch.autograd.Function.backward` regardless of the `dtype` used in the forward. Changes: - store the `dtype` in `args[0]` - update tests to confirm the dtype of intermediate result tensors that are outputs of autocast compatible `torch` functions cc @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/88029 Approved by: https://github.com/ngimel commit f2b247f0d891f8ff5bcaa5276a51324f692e104c Author: Edward Z. Yang Date: Mon Oct 31 16:42:58 2022 -0400 Remove stale comment (#88135) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88135 Approved by: https://github.com/albanD commit 139afc50ecafcde5fb085e2cca78fed55e6b5aad Author: Christian Puhrsch Date: Mon Oct 31 21:31:54 2022 +0000 Fix links to tutorial in torch masked docs (#88129) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88129 Approved by: https://github.com/jisaacso commit 9fed04ba337680b7e6f2a6cde0b7bbc256a41eae Author: Catherine Lee Date: Mon Oct 31 21:12:52 2022 +0000 fix for auto labeler (#88100) followed https://lightrun.com/answers/actions-labeler-how-to-only-add-label-not-remove-when-pr-is-opened side note: should we move this logic to test-infra to be with the release notes labeler? Pull Request resolved: https://github.com/pytorch/pytorch/pull/88100 Approved by: https://github.com/huydhn commit ba26bc0fc266ddb58ec199349d2c93c7a905dfd0 Author: Radek Bartoň Date: Mon Oct 31 21:11:16 2022 +0000 Fix random "C1041: cannot open program database" errors when compiling on Windows (#88084) Adds `/FS` option to `CMAKE_CXX_FLAGS` and `CMAKE_CUDA_FLAGS`.
So far I've encountered this kind of errors: ``` C:\Users\MyUser\AppData\Local\Temp\tmpxft_00004728_00000000-7_cuda.cudafe1.cpp: fatal error C1041: cannot open program database 'C:\Projects\pytorch\build\third_party\gloo\gloo\CMakeFiles\gloo_cuda.dir\vc140.pdb'; if multiple CL.EXE write to the same .PDB file, please use /FS ``` when building with VS 2022. cc @peterjc123 @mszhanyi @skyline75489 @nbcsm Related issues: - https://github.com/pytorch/pytorch/issues/87691 - https://github.com/pytorch/pytorch/issues/39989 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88084 Approved by: https://github.com/ezyang commit 73379acaf3865379aed0a1bab1320616772152f3 Author: Brian Hirsh Date: Mon Oct 31 09:47:26 2022 -0700 Do not use unsafe restriding for subclasses (#87610) This helps convert some accuracy errors into runtime errors, which makes it easier to debug. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87610 Approved by: https://github.com/albanD commit 6fe41e76a928ae00ad7e7dfe1036461f7b0b301f Author: Christian Puhrsch Date: Mon Oct 31 20:10:05 2022 +0000 Create separate files for NT Unary, Binary and Matmul ops (#88091) Improves code organization and code share. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88091 Approved by: https://github.com/drisspg commit 1a9edc8136e0667ce59ae5ffbbd4930110be4ff1 Author: Sean Ross-Ross Date: Mon Oct 31 10:11:14 2022 -0500 Changing from sample_inputs to reference_inputs in test_compare_cpu (#86462) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86462 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 4c78c7c82af08872f901076c5daaf8148f03b096 Author: Grigory Sizov Date: Mon Oct 31 19:59:35 2022 +0000 Enable `src_mask` in fast path of `TransformerEncoderLayer ` (#87377) Fixes https://github.com/pytorch/pytorch/issues/81129#issuecomment-1179435674 Passing a 2D attention mask `src_mask` into the fast path of `TransformerEncoderLayer` in CPU was causing an error and so was disabled in https://github.com/pytorch/pytorch/pull/81277. This PR unrolls this fix, enabling `src_mask` on the fast path: - Either attention mask `src_mask` of shape `(L, L)` or padding mask `src_key_padding_mask` of shape `(B, L)` are now allowed on the CPU fast path. If softmax is applied along the last dimension (as in multi-head attention), these masks are processed without expanding them to 4D. Instead, when iterating through the input, `Softmax.cpp::host_softmax` converts the index to match the mask dimensions, depending on the type. - If softmax is applied along the dimension other than the last, `Softmax.cpp::masked_softmax_cpu` expands masks to 4D, converting them to `mask_type=2`. 
Theoretically one could also add special optimized cases for `dim=0, 1, 2` and process them without mask expansion, but I don't know how often that is used - `test_transformerencoderlayer_fast_path` is extended to cover both attention mask and padding mask - `test_masked_softmax_mask_types_0_1` is added to ensure results from CPU softmax with attention and padding masks match the explicit slow calculation - `test_masked_softmax_devices_parity` is added to ensure results from masked softmax on CPU and CUDA match I had to replace `float` with `torch.get_default_dtype()` in a couple of tests for the following reason: - `test_nn.py` [sets the default type to `torch.double`](https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L24-L26) - If I execute `test_nn.py` and `test_transformers.py` in one `pytest` run, this default still holds for transformer tests - Some tests in `test_transformers.py` which were previously following the slow path now switched to the fast path, and hard-coded `float` started clashing with the default `double` Let me know if there is a better way around it - or maybe I'm not supposed to run tests with `pytest` like this Pull Request resolved: https://github.com/pytorch/pytorch/pull/87377 Approved by: https://github.com/mikekgfb, https://github.com/weiwangmeta, https://github.com/malfet commit e9599724fa25c3c2149f301f704fe90df6b591b0 Author: PyTorch MergeBot Date: Mon Oct 31 19:55:58 2022 +0000 Revert "[ONNX] Move all torch.onnx.export related tests to test/onnx (#87292)" This reverts commit e3e84830aade59722d819bc5fa01922239494790. Reverted https://github.com/pytorch/pytorch/pull/87292 on behalf of https://github.com/weiwangmeta due to breaking internal test relating to quantization eager tests, see test/quantization/eager/test_quantize_eager_ptq.py test_lower_graph_linear and test_lower_graph_conv2d commit e9cabef6631395c3dbb8d3d82b94e108e6b87db3 Author: KevinYuk Date: Mon Oct 31 19:46:01 2022 +0000 enable xpu group norm channels last support (#87680) XPU would support the channels-last format for the group norm operator; however, PyTorch converts all input tensors to contiguous format, including channels-last tensors. We need PyTorch to pass this memory format hint down to us. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87680 Approved by: https://github.com/albanD commit 7d2f1cd2115ec333767aef8087c8ea3ba6e90ea5 Author: Kazuaki Ishizaki Date: Mon Oct 31 19:31:56 2022 +0000 Fix typos under docs directory (#88033) This PR fixes typos in `.rst` and `.Doxyfile` files under the docs directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/88033 Approved by: https://github.com/soulitzer commit c7ac3334308adebd8037f2af7b66972f42311ab5 Author: Sherlock Huang Date: Mon Oct 31 16:38:23 2022 +0000 Fix args for meta__fused_moving_avg_obs_fq_helper (#88058) Fixes https://github.com/pytorch/torchdynamo/issues/1802 There are a few problems: 1. torch.fused_moving_avg_obs_fake_quant doesn't have an OpInfo test 2. self.empty_like() is not a valid call; it should be torch.empty_like(self) 3. the python meta function has some unexplained behavior for arguments with a default value of bool type? In particular, problem 3 is the most concerning one. **UPDATE: This is expected behavior, see discussion below for explanation.** Without setting the default value for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensor.
``` meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant' ``` I can fix this by adding the default values to these two args. However, I observe something strange when examining the actual values in the meta function. ``` print("per_row_fake_quant", per_row_fake_quant) print("symmetric_quant", symmetric_quant) ``` When the default values are False, the printed value correctly reflects the arg value populated from the call site. When the default values are True, the printed value is ALWAYS True, regardless of the populated value from the call site. When the default values are None, the printed value is `None` when the call site sets the value to 'False', and 'True' when the call site sets the value to 'True'. I also verified that this bug affects other meta functions with default args.... My speculation is that this is something about pybind value packing when calling from the c++ dispatcher into the python meta function, and default value parsing for python meta functions (and other python dispatch functions)? I tried to find the c++ call stack, but gdb is missing symbols and the C++ stacktrace is not working properly... I'd appreciate anyone who can point me to the source file for pybind value packing. cc @ezyang cc @bdhirsh. I know you had a fix in the symbolic shape branch... cc @yanboliang who reported this bug Pull Request resolved: https://github.com/pytorch/pytorch/pull/88058 Approved by: https://github.com/bdhirsh, https://github.com/yanboliang commit 3eb379052dc898a4e380045ca8fcd4f8bc75a524 Author: Peter Bell Date: Mon Oct 31 14:22:55 2022 +0000 unfold_backward: Remove stride >= size kernel in favour of copy_ (#88061) unfold_backward has a dedicated kernel for `stride >= size` which uses temporary tensors created by `at::arange` to perform the mapping from unfolded to folded. This instead uses `unfold` to view the output, and does a direct copy from the gradient into the view (see the sketch below). In benchmarks I see either no difference or a marginal speed benefit from this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88061 Approved by: https://github.com/albanD commit ceddcf5434c41ba6e96d36f9e727bde0ee191220 Author: Peter Bell Date: Mon Oct 31 14:22:54 2022 +0000 istft: Use unfold_backward instead of col2im (#88060) `unfold_backward` implements the same operation as `col2im` but without support for 2d kernels or dilation. However, `istft` doesn't use any of those features and `unfold_backward` actually has a faster `TensorIterator` based implementation so we should use it here instead. In the example from #87353 I see a 2x speedup on both CPU and CUDA.
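A rough, self-contained sketch of the view-and-copy idea behind the `unfold_backward` change above (an illustration only, not the actual ATen kernel): when `step >= size` the unfolded windows are disjoint, so the incoming gradient can be copied straight through an `unfold` view of a zero tensor.
```python
import torch

x = torch.arange(10.0, requires_grad=True)
size, step = 3, 4                      # step >= size -> non-overlapping windows
windows = x.unfold(0, size, step)      # view of shape (2, 3)
grad_out = torch.ones_like(windows)    # pretend upstream gradient

# "unfold_backward" via a view + copy_: no arange-based index mapping needed.
grad_in = torch.zeros_like(x)
grad_in.unfold(0, size, step).copy_(grad_out)

# Cross-check against autograd.
windows.backward(grad_out)
assert torch.equal(grad_in, x.grad)
```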
On a wider variety of sizes and inputs I still see speedups across the board, especially on CPU since `col2im` isn't parallelized but `unfold_backward` is:

| device | shape | hop_length | Master (us) | This PR (us) | Speedup |
|--------|-----------------|------------|-------------|--------------|---------|
| CUDA | (1, 129, 33) | 256 | 147 | 136 | 1.08 |
| | | 128 | 153 | 128 | 1.20 |
| | (100, 129, 20) | 256 | 181 | 147 | 1.23 |
| | | 128 | 171 | 137 | 1.25 |
| | (1000, 129, 10) | 256 | 681 | 443 | 1.55 |
| | | 128 | 632 | 446 | 1.42 |
| CPU | (1, 129, 33) | 256 | 106 | 104 | 1.02 |
| | | 128 | 103 | 81 | 1.27 |
| | (100, 129, 20) | 256 | 2400 | 399 | 6.02 |
| | | 128 | 2150 | 313 | 6.87 |
| | (1000, 129, 10) | 256 | 13800 | 3740 | 3.69 |
| | | 128 | 12700 | 2110 | 6.02 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88060 Approved by: https://github.com/albanD commit ff9449464484b4ca48bd7c68d8adfd31e97a4263 Author: Edward Z. Yang Date: Mon Oct 31 06:48:39 2022 -0700 Revert "Revert "Unify meta tensor and fake tensor converter conversion (#87943)"" (#88045) This reverts commit bc64999b8382796199178cf480adf51512b5f139. Check torch/_subclasses/meta_utils.py for "This is very tricky" for the bugfix explanation. cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88045 Approved by: https://github.com/kit1980, https://github.com/Chillee commit 2e1199d171359b7fbf4d3a6f2b9fcafeaf27e39e Author: Jerry Zhang Date: Fri Oct 28 16:42:29 2022 -0700 [quant][fx] Fix a typo in utils.py (#88024) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/88024 Approved by: https://github.com/HDCharles, https://github.com/z-a-f commit 0a4ca9d08340fdba60d1ed73a52cdeebe5ac1b7e Author: Sherlock Huang Date: Mon Oct 31 04:12:36 2022 +0000 Fix meta for aten.angle and aten.index_copy (#88066) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88066 Approved by: https://github.com/albanD commit a3f8495b848fafe3ed792eed0cbd6b0db09586aa Author: Khushi Date: Mon Oct 31 17:08:52 2022 +0000 [primTorch fix] use _maybe_convert_to_dtype (#85163) Fixes #84561 - [x] fix lint tests cc: @Lezcano!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85163 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 2702aaffc01f8ae66a4341be81778a56d203951a Author: Catherine Lee Date: Mon Oct 31 16:52:56 2022 +0000 remove old label check functionality (#88007) no longer needed as we have check_labels.py to check if the pr has labels and it blocks merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/88007 Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi commit 83f31ffdfe617d67bcf90312c7e57804de6cb87e Author: Catherine Lee Date: Mon Oct 31 16:52:28 2022 +0000 Move check labels to separate workflow (#87999) * moves check labels to separate workflow that is triggered on the usual pull_request triggers as well as labeled and unlabeled * deletes comments when label is added Fixes https://github.com/pytorch/test-infra/issues/978 and https://github.com/pytorch/pytorch/issues/87865 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87999 Approved by: https://github.com/huydhn commit 5723fd503c22388654b66cf8e8634354b0867adb Author: Sherlock Huang Date: Mon Oct 31 03:24:48 2022 +0000 Fix meta function for aten.flip and aten.rot90 (#88065) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88065 Approved by: https://github.com/mruberry commit 9308cefbdfbe74f6a7be60d0a117e12a71198d0e Author: Andrew Gu Date: Mon Oct 31 01:43:05 2022 +0000 [FSDP()][8/N] Refactor limiter's `_FreeEventQueue` (#87922) This PR is easy. It just moves `_FreeEventQueue` into its own file `_limiter_utils.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87922 Approved by: https://github.com/rohan-varma, https://github.com/mrshenli commit d89cf2fdc9d73d0f3920ab31437a24a520628b03 Author: Andrew Gu Date: Mon Oct 31 01:43:05 2022 +0000 [FSDP()][7/N] Refactor most of ctor (#87921) The goal of this PR is to make one pass over the FSDP constructor and refactor each helper method call to not be `self.<...>`. Subsequent PRs will make further passes over the FSDP constructor. This PR looks like a lot of lines of code change, but it is only reorganization. Methods are moved to `_init_utils.py` and `_common_utils.py`. This also marks the beginning of moving methods from `_utils.py` to `_common_utils.py` -- they will be coalesced eventually. I am only using `_common_utils.py` as a staging ground to include the methods that have been affected by the refactoring. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87921 Approved by: https://github.com/mrshenli commit 9d9267c6f76a9d801b6d9c69fddc61e20f0f4b48 Author: Andrew Gu Date: Sun Oct 30 15:26:12 2022 +0000 [FSDP()][3/N] Refactor public APIs (#87917) - This PR defines a new `api.py` meant to hold the public API for FSDP (minus `FullyShardedDataParallel` itself). This is needed because several of the `_<...>_utils.py` files rely on the public API, and we cannot import from `torch.distributed.fsdp.fully_sharded_data_parallel` without a circular import. Calling the file `api.py` follows the convention used by `ShardedTensor`. - This PR cleans up the wording in the `BackwardPrefetch`, `ShardingStrategy`, `MixedPrecision`, and `CPUOffload` docstrings. - This PR adds the aforementioned classes to `fsdp.rst` to have them rendered in public docs. - To abide by the public bindings contract (`test_public_bindings.py`), the aforementioned classes are removed from `fully_sharded_data_parallel.py`'s `__all__`. 
This is technically BC breaking if someone uses `from torch.distributed.fsdp.fully_sharded_data_parallel import *`; however, that does not happen in any of our own external or internal code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87917 Approved by: https://github.com/mrshenli commit 59fe272c1e698989228af5ad197bdd2985e4e9b9 Author: Aaron Gokaslan Date: Mon Oct 31 16:41:24 2022 +0000 Fix: prefer .is_none() over .is(py::none()) for pybind11 (#88051) Fixes minor perf regression I saw in #85688 and replaced throughout the code base. `obj == Py_None` is directly equivalent to is_none(). Constructing a temporary py::none() object needlessly incref/decref the refcount of py::none, this method avoids that and therefore is more efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88051 Approved by: https://github.com/albanD commit 75dbe3790938c30716463604ccfa68c0f9f6a7f5 Author: vasiliy Date: Fri Oct 28 12:15:49 2022 -0700 make autocast cache global instead of thread-local (#86492) Summary: There is a memory leak because `torch.clear_autocast_cache()` clears the autocast cache from the main thread, but autograd can write to this cache from a background thread, so whatever autograd writes will leak. With some offline discussion we decided that a global cache is a practical way to deal with this, and the performance impact of the lock should be negligible. Test Plan: I don't have a local repro of the original issue, need to look into how to get that. A toy example (https://gist.github.com/vkuzo/0d6318fe7f7cb1c505e370cd5c1a643b) does cache clearing as expected on forward and backward pass. local testing: ``` python test/test_cuda.py -k autocast python test/test_autocast.py ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86492 Approved by: https://github.com/ezyang commit 34f523b22158ca4a4a7974ec867084bab98bde83 Author: Andrew Gu Date: Sat Oct 29 21:14:50 2022 +0000 [FSDP] Enable `use_orig_params=True` test (#88034) I accidentally committed the `use_orig_params` PR with this test disabled. This PR simply re-enables it. It passes locally, so if CI is green, then this is an easy land. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88034 Approved by: https://github.com/H-Huang commit df1cc0ef473893ffde87513b5be69cc3e2306561 Author: Salil Desai Date: Sun Oct 30 20:30:57 2022 -0700 [Vulkan] Add Vulkan Rewrite to Transfer Inputs and Outputs to Vulkan and CPU Backends Respectively (#87432) With this change, we don't have to manually invoke transferring input and output backends when we run vulkan models. Graph rewrite code based off of: - https://github.com/pytorch/pytorch/commit/32efff45ba77f2bb4b1e709613b99070f119745a#diff-a473bddb458dc24225866a45092d6eca064eddd256245d93020e48e216eee4d5R160-R179 Differential Revision: [D39519168](https://our.internmc.facebook.com/intern/diff/D39519168/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39519168/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87432 Approved by: https://github.com/mcr229, https://github.com/digantdesai commit bc6862515164a31d3a62e46a49977d54a618323c Author: Salil Desai Date: Sun Oct 30 20:30:55 2022 -0700 [Vulkan] Add support for Optimization Blocklist to Vulkan Rewrite (#87431) Optimization Blocklist will be used in a future diff (D40315730) to make the rewrite to transfer input/output backends optional Differential Revision: [D40315729](https://our.internmc.facebook.com/intern/diff/D40315729/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87431 Approved by: https://github.com/mcr229, https://github.com/digantdesai commit f717986f93f5e167a530867061cfa40d49c14316 Author: Edward Z. Yang Date: Mon Oct 31 09:20:49 2022 -0400 .gitignore log files (#88085) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88085 Approved by: https://github.com/albanD commit 8ea19c802e38c061e79176360c1ecaa81ce2088a Author: Edward Z. Yang Date: Sat Oct 29 21:43:09 2022 -0400 Make IValue::unsafeToTensorImpl a little less unsafe. (#88043) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88043 Approved by: https://github.com/anjali411, https://github.com/albanD commit e238752e20ae637c88e8534482f83a5074a82d43 Author: Edward Z. Yang Date: Sat Oct 29 17:25:42 2022 -0700 Simplify magic method definition code. (#88017) It turns out sym_float (and the hypothetical sym_int) can be defined in the same way as conventional magic methods. Do so. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88017 Approved by: https://github.com/albanD commit 2a47b107801569f7b21994d199d7b2fc6f8a25e7 Author: Edward Z. Yang Date: Sat Oct 29 08:45:32 2022 -0700 Get the magic method try reverse protocol correct (#88030) Signed-off-by: Edward Z. Yang cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88030 Approved by: https://github.com/anjali411, https://github.com/albanD commit 12dd877395a47d4de382b06fda9623da37782226 Author: Horace He Date: Sat Oct 29 00:59:57 2022 +0000 Fix all references to torchdynamo from the merge (#87731) cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @jansel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87731 Approved by: https://github.com/yanboliang, https://github.com/ezyang, https://github.com/anijain2305, https://github.com/jansel commit 496acb6602644fee4db7c19df700f6224ce07f84 Author: Edward Z. Yang Date: Sun Oct 30 13:24:50 2022 -0400 Add fake tensor files to ciflow/inductor (#88052) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88052 Approved by: https://github.com/anijain2305 commit 6735bf21c70a5d0873036bc252e8a6873cb35291 Author: Kshiteej K Date: Mon Oct 31 04:42:45 2022 +0000 [test_nn] split convolution tests from test_nn (#87474) Ref #63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87474 Approved by: https://github.com/albanD commit 46ce92713dff83182f36b9f4d2a112f9e568825f Author: Jing Xu Date: Mon Oct 31 04:40:52 2022 +0000 fix github bug issue 87552 (#88059) Fixes #87552 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88059 Approved by: https://github.com/jgong5, https://github.com/ngimel commit e24ce484ed52f6441db159ef0479ff06c72f2efd Author: Driss Guessous Date: Mon Oct 31 04:06:31 2022 +0000 Use scaled_dot_product_attention within attention.cpp (#87312) Use the private _scaled_dot_product_attention to support _native_multiheaded_attention. _SDP provides access to fused kernels when certain conditions are meant enabling a speed up for MHA. cc @cpuhrsch @jbschlosser @bhosmer @mikaylagawarecki Pull Request resolved: https://github.com/pytorch/pytorch/pull/87312 Approved by: https://github.com/cpuhrsch commit d13f1e6ab4d20451f7e2acd87571ffa7fece0c32 Author: Fuzzkatt Date: Mon Oct 31 03:56:55 2022 +0000 Add sequence number support for UCC (#85047) Add sequence number support for UCC, mostly following format of ProcressGroupNCCL. Pass new test: `test_all_gather_object_subgroup` Add skips for gather tests: `test_gather_object` and `test_gather_object_subgroup` cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/85047 Approved by: https://github.com/kwen2501 commit 9642a7c2f6f4b21c44bbc5709b9af396df4053dc Author: HAOCHENYE <21724054@zju.edu.cn> Date: Mon Oct 31 03:00:30 2022 +0000 [ONNX] Fix get wrong summary of the docstring in `torch.onnx._deprecation.deprecated` (#87194) The summary of the deprecated function could be multi-line. Therefore the code below: https://github.com/pytorch/pytorch/blob/9ac2a06acf75538a35751f785d5f509d6127d6cd/torch/onnx/_deprecation.py#L45 should be adjusted to ```python summary_and_body = docstring.split("\n\n", 1) ``` Otherwise, the multi-line summary will be separated wrongly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87194 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit d67b2edec34ef27956de0b2ebb5d7e50dbba9de3 Author: Animesh Jain Date: Mon Oct 31 02:30:29 2022 +0000 [dynamo][dashboard] minor fixes for a clean Dashboard (#88056) * better check for cold start latency * sort on inductor column for better readability. cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88056 Approved by: https://github.com/ngimel commit 9109ecf9142064367566cf540fd2803a09318652 Author: Mengchi Zhang Date: Sun Oct 30 18:22:17 2022 +0000 Even "nvcc not found" should be commented out (#87959) Summary: Even "nvcc not found" should be commented out in minifier_launcher.py, cause there could be a case that PyTorch/minifier can find cuda path but nvcc is not explicitly included in env variable like PATH. 
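For illustration only, one way a launcher script could probe for `nvcc` without assuming it is on `PATH` (a hedged sketch; `nvcc_available` is a hypothetical helper, not the actual `minifier_launcher.py` code):
```python
import shutil
import subprocess

def nvcc_available() -> bool:
    """Return True only if an nvcc binary can be located and executed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return False
    try:
        subprocess.run([nvcc, "--version"], check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True

# A minifier script could then skip (or comment out) its nvcc-dependent
# environment dump when nvcc_available() is False instead of failing outright.
```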
Differential Revision: D40790023 cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87959 Approved by: https://github.com/anijain2305, https://github.com/jianyuh commit 1b575782a0c307aae264714e3244afcf50bb365c Author: Animesh Jain Date: Sun Oct 30 17:10:17 2022 +0000 [dynamo][benchmarks] use fresh inductor cache and raise batch size wherever possible (#88044) cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/88044 Approved by: https://github.com/ngimel commit e7b854fae9ff8116eaf4aeb24e04cac550bed362 Author: Nikita Shulga Date: Sun Oct 30 04:31:45 2022 +0000 [BE] Do not package caffe2 in wheel (#87986) If PyTorch is built without caffe2 integration, do not package the unusable .py files/headers. The same is true about functorch - don't package it unless building with `functorch` (although, I wonder if we should remove this option at some point in the future) Followup after https://github.com/pytorch/builder/pull/1181 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87986 Approved by: https://github.com/seemethere commit 65e771959962156f434fad9b4fbe0c719813ab63 Author: PyTorch MergeBot Date: Sun Oct 30 03:02:55 2022 +0000 [vision hash update] update the pinned vision hash (#87948) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87948 Approved by: https://github.com/pytorchbot commit 621158cd7f3e1321e77d3312c39c258ad1f68d28 Author: Nikita Shulga Date: Sun Oct 30 01:04:55 2022 +0000 [BE] Do not assign string literal to `char *` (#87949) Not sure what I was thinking when writing something like: ``` auto foo = std::getenv("BAR"); if (!foo) { foo = "baz"; } ``` as `std::getenv` returns `char *` (i.e. a mutable string), but string literals are immutable (i.e. `const char *`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87949 Approved by: https://github.com/kit1980 commit 59001d05b406bb00d5838f04ca972180e1a4946e Author: Yanbo Liang Date: Sat Oct 29 20:36:20 2022 +0000 [Inductor] Enable Inductor unspec inputs test for different dtypes (#87809) Fixes #ISSUE_NUMBER cc @jansel @mlazos @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87809 Approved by: https://github.com/ngimel commit bc64999b8382796199178cf480adf51512b5f139 Author: PyTorch MergeBot Date: Sat Oct 29 18:39:28 2022 +0000 Revert "Unify meta tensor and fake tensor converter conversion (#87943)" This reverts commit baa715e790921e6498861e59556035de1a481cc5. Reverted https://github.com/pytorch/pytorch/pull/87943 on behalf of https://github.com/kit1980 due to Broke several inductor tests commit e4a8661ab84022c1bff622c6d2f6e679180b1df5 Author: Shunting Zhang Date: Sat Oct 29 17:52:26 2022 +0000 torchdynamo and xla integration (#87741) - torchdynamo and torchxla use different strategies to be a sound graph capture technique.
The former relies on guards; the latter relies on retracing - the guard system has quite low overhead, but torchxla tracing overhead is quite high. The main idea is to leverage the guard system in torchdynamo to avoid retracing in torchxla so that - we can integrate torchdynamo with XLA - we reduce or even completely avoid the tracing overhead of torchxla We found that different frameworks do not generate numerically identical results for the SAME model with the SAME input. By default, torchdynamo uses eager as baseline so the model will run with PyTorch. It would be tricky to compare a model running on XLA with this baseline: it's hard to check correctness. To make the comparison easier, we add a flag `--use-xla-baseline`. When it's enabled, the baseline will be run on XLA. We add 2 new dynamo backends, torchxla_trivial and torchxla_trace_once, to control the optimization targets. torchxla_trivial simply moves inputs/model parameters to XLA and runs the model on XLA. There is tracing overhead for each run. We should expect that result to be mostly neutral compared to the XLA baseline. torchxla_trace_once only traces once during AOT compiling time. Here are the steps: 1. dynamo captures guards and the subgraph 2. the torchxla_trace_once backend traces the graph with torchxla, lowers the graph, and records a hash of the graph for later lookup 3. at inference time, the hash is used directly to look up the optimized graph and run it. We cannot handle LTC/torchxla fallback right now. If an op is missing an LTC kernel, we raise an exception, which results in a dynamo fallback (or trying another compiler). People have brainstormed the idea of graph breaking and stitching the subgraphs together. But maybe it's easier to add those missing LTC kernels for those models. The models we tested are those not causing LTC fallback. We run the tests on **GPU**. We see **1.38x** geomean speedup for torchxla_trace_once, and torchxla_trivial is mostly neutral as expected.
```
| Model | XLA (trace once) | XLA (trace everytime) |
+=========================+====================+=========================+
| resnet18 | 1.346 | 1.045 |
+-------------------------+--------------------+-------------------------+
| resnet50 | 1.153 | 1.007 |
+-------------------------+--------------------+-------------------------+
| resnext50_32x4d | 1.381 | 1.039 |
+-------------------------+--------------------+-------------------------+
| alexnet | 1.045 | 1.018 |
+-------------------------+--------------------+-------------------------+
| mobilenet_v2 | 1.562 | 1.021 |
+-------------------------+--------------------+-------------------------+
| mnasnet1_0 | 1.303 | 1.069 |
+-------------------------+--------------------+-------------------------+
| squeezenet1_1 | 1.278 | 1.025 |
+-------------------------+--------------------+-------------------------+
| vgg16 | 1.076 | 1.008 |
+-------------------------+--------------------+-------------------------+
| BERT_pytorch | 2.224 | 0.978 |
+-------------------------+--------------------+-------------------------+
| timm_vision_transformer | 1.81 | 1.025 |
+-------------------------+--------------------+-------------------------+
| geomean | 1.38101 | 1.02324 |
+-------------------------+--------------------+-------------------------+
```
The speedup is similar to what we see from previous work for LTC's TorchScript backend (we saw a 1.40x geomean speedup there): https://docs.google.com/presentation/d/1G09X8v41u_cLKLtSdf7v6R8G19-iZTPcW_VAdOnvYBI/edit#slide=id.g11bf989cb6b_1_5
- Use AOT autograd to enable training
- Share results on XLA devices
- Do more extensive tests on torchbench models
Example command: ``` GPU_NUM_DEVICES=1 python benchmarks/dynamo/torchbench.py --randomize-input --performance --use-xla-baseline --only resnet18 --backend=torchxla_trace_once ``` Thanks to @JackCaoG from the torchxla team for helping debug various perf issues and merging the torchxla PR! That was critical for getting the results above.
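For reference, a sketch of how the two new backends would be selected from user code (assumptions: a working torch_xla install, the backend names registered by this PR, and the `torch._dynamo.optimize` entry point of that era; exact spellings may differ between versions):
```python
import torch
import torch._dynamo as dynamo
import torchvision

model = torchvision.models.resnet18().eval()
example = torch.randn(1, 3, 224, 224)

def run(x):
    return model(x)

# torchxla_trivial: move the model/inputs to XLA and retrace on every call.
run_trivial = dynamo.optimize("torchxla_trivial")(run)

# torchxla_trace_once: trace and lower once at compile time, then look the
# compiled graph up by its hash on later calls (no per-call tracing overhead).
run_trace_once = dynamo.optimize("torchxla_trace_once")(run)

with torch.no_grad():
    out = run_trace_once(example)
```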
torchxla side PR: https://github.com/pytorch/xla/pull/4119 topic: not user facing cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @jansel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87741 Approved by: https://github.com/wconstab commit 6cd25eb6de41ac05affa069e0d607ae8cdd54d6b Author: Richard Barnes Date: Sat Oct 29 17:48:23 2022 +0000 Use TORCH_CHECK instead of inappropriate CUDA_KERNEL_ASSERT (#87714) `CUDA_KERNEL_ASSERT` should only be used inside kernels; switch these bad usages to `TORCH_CHECK` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87714 Approved by: https://github.com/ezyang commit 384b84d6a601e6e7b9dab1f68e3498ba6d84e950 Author: Huy Do Date: Sat Oct 29 17:40:07 2022 +0000 [BE] Upload GHA artifacts to S3 (#87827) This is exclusively used by macOS, ROCM (and any other future workflows) that don't have direct access to S3 to upload their artifacts Running the script locally with the personal GITHUB_TOKEN: ``` python3 -m tools.stats.upload_artifacts --workflow-run-id 3342375847 --workflow-run-attempt 1 --repo pytorch/pytorch Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb Downloading sccache-stats-macos-12-py3-arm64-runattempt1-9155493770 Downloading sccache-stats-macos-12-py3-lite-interpreter-x86-64-runattempt1-9155493303 Downloading sccache-stats-macos-12-py3-x86-64-runattempt1-9155493627 Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-arm64-runattempt1-9155493770 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-arm64-9155493770 Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-lite-interpreter-x86-64-runattempt1-9155493303 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-lite-interpreter-x86-64-9155493303 Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/sccache-stats-macos-12-py3-x86-64-runattempt1-9155493627 to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/sccache-stats-macos-12-py3-x86-64-9155493627 Downloading test-jsons-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip Downloading test-jsons-runattempt1-test-default-1-2-macos-12_9155944815.zip Downloading test-jsons-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip Downloading test-jsons-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip Downloading test-jsons-runattempt1-test-default-2-2-macos-12_9155944892.zip Downloading test-jsons-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-linux.rocm.gpu_9155913429.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-macos-12_9155944815.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-1-2-macos-m1-12_9155888061.zip Upload 
/private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-linux.rocm.gpu_9155913500.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-macos-12_9155944892.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-jsons-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-jsons-test-default-2-2-macos-m1-12_9155888182.zip Downloading test-reports-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip Downloading test-reports-runattempt1-test-default-1-2-macos-12_9155944815.zip Downloading test-reports-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip Downloading test-reports-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip Downloading test-reports-runattempt1-test-default-2-2-macos-12_9155944892.zip Downloading test-reports-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-linux.rocm.gpu_9155913429.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-macos-12_9155944815.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-1-2-macos-m1-12_9155888061.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-linux.rocm.gpu_9155913500.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-macos-12_9155944892.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/test-reports-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/test-reports-test-default-2-2-macos-m1-12_9155888182.zip Downloading usage-log-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip Downloading usage-log-runattempt1-test-default-1-2-macos-12_9155944815.zip Downloading usage-log-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip Downloading usage-log-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip Downloading usage-log-runattempt1-test-default-2-2-macos-12_9155944892.zip Downloading usage-log-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-linux.rocm.gpu_9155913429.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-linux.rocm.gpu_9155913429.zip 
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-macos-12_9155944815.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-macos-12_9155944815.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-1-2-macos-m1-12_9155888061.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-1-2-macos-m1-12_9155888061.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-linux.rocm.gpu_9155913500.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-linux.rocm.gpu_9155913500.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-macos-12_9155944892.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-macos-12_9155944892.zip Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpxl6d7kcb/usage-log-runattempt1-test-default-2-2-macos-m1-12_9155888182.zip to s3://gha-artifacts/pytorch/pytorch/3342375847/1/artifact/usage-log-test-default-2-2-macos-m1-12_9155888182.zip ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87827 Approved by: https://github.com/clee2000 commit d9b6e41da9a24ad35b043cd79b581508c8c6304b Author: Shen Li Date: Sat Oct 29 04:07:56 2022 +0000 Add composable activation checkpointing (#87664) This is a composable activation checkpointing API. Unlike functional activation checkpointing APIs, this one does not require changing model source code. Unlike ``nn.Module`` wrapper activation checkpointing APIs, this one does not modify model structure or fully-qualified names either. Under the hood, it registers activation checkpointing logic as pre- and post-forward hooks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87664 Approved by: https://github.com/zhaojuanmao commit 19171a21ee8a9cc1a811ac46d3abd975f0b6fc3b Author: Sergey Lebedev Date: Sat Oct 29 16:33:18 2022 +0000 Make barrier blocking in UCC (#86961) Currently CUDA UCC barrier is nonblocking with respect to CPU and there is no flag to change it. To make UCC PG barrier behaviour consistent with NCCL PG in this PR barrier has changed to be always blocking. cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/86961 Approved by: https://github.com/kwen2501 commit baa715e790921e6498861e59556035de1a481cc5 Author: Edward Z. Yang Date: Fri Oct 28 13:28:39 2022 -0700 Unify meta tensor and fake tensor converter conversion (#87943) Meta tensor does a lot of work to make sure tensors "look" similar to the original parts; e.g., if the original was a non-leaf, meta converter ensures the meta tensor is a non-leaf too. Fake tensor destroyed some of these properties when it wraps it in a FakeTensor. This patch pushes the FakeTensor constructor into the meta converter itself, so that we first create a fake tensor, and then we do various convertibility bits to it to make it look right. The two tricky bits: - We need to have no_dispatch enabled when we allocate the initial meta tensor, or fake tensor gets mad at us for making a meta fake tensor. 
This necessitates the double-callback structure of the callback arguments: the meta construction happens *inside* the function so it is covered by no_dispatch - I can't store tensors for the storages anymore, as that will result in a leak. But we have untyped storage now, so I just store untyped storages instead. Signed-off-by: Edward Z. Yang cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87943 Approved by: https://github.com/eellison, https://github.com/albanD commit 4210cebc166dd355a315034b2a5aecdffacf5f91 Author: AllenTiTaiWang Date: Fri Oct 28 19:54:52 2022 +0000 [ONNX] Add internal node kind parsing (#87638) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87638 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit cb05a4da3916469ec511c042b95e447ca395e8d7 Author: AllenTiTaiWang Date: Fri Oct 28 19:31:23 2022 +0000 [ONNX] Parametrized Avgpool2D test to have all test combinations (#87893) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87893 Approved by: https://github.com/BowenBao commit f2ae459311607b341779590e2e985c6b7c895f1d Author: AllenTiTaiWang Date: Fri Oct 28 19:31:23 2022 +0000 [ONNX] Disable ONNX ceil_mode and count_include_pad to aligntorch ceil_mode results in corner case (#87892) ONNX and PyTorch has different equation on pooling and different strategy on ceil_mode, which leads to discrepancy on corner case (#71549 ). Specifically, PyTorch avereage pooling is not following [the equation on documentation](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html), it allows sliding window to go off-bound instead, if they start within the left padding or the input (in NOTE section). More details can be found in #57178. This PR changes avgpool in opset 10 and 11 back the way as opset 9, which it stops using ceil_mode and count_include_pad in onnx::AveragePool A comprehensive test for all combinations of parameters can be found in the next PR. #87893 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87892 Approved by: https://github.com/BowenBao commit c810489dd9549da64ef3610e150c9589f7217759 Author: Huy Do Date: Sat Oct 29 08:43:45 2022 +0000 Cleanup macos common conda installation (#87816) The conda dependencies have all been installed for `_mac-test` in https://github.com/pytorch/pytorch/pull/87541. I missed the same step for `_mac-build` and `_mac-test-mps` workflows, so both are also updated here. Note that arm64 is cross-compiled from x86, so the env file needs to be set explicitly in that case After this one, I have a WIP PR to consolidate macos pip dependencies next Pull Request resolved: https://github.com/pytorch/pytorch/pull/87816 Approved by: https://github.com/ZainRizvi commit 53fea905477e64960002def848a7d897d8ae52a4 Author: Huy Do Date: Sat Oct 29 08:34:13 2022 +0000 Store usage log on GitHub when S3 is not available (#87947) It turns out that we haven't uploaded the usage log to GitHub when S3 is not available (macos, rocm), for example, https://github.com/pytorch/pytorch/actions/runs/3325822440#artifacts only includes test-report, test-json, sccache stats, and build artifacts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87947 Approved by: https://github.com/clee2000 commit d3c01c722d95d9b386fa47078563687d2bffbdad Author: Edward Z. 
Yang Date: Fri Oct 28 17:20:10 2022 -0400 Fix pybind11 problems with c10::SymInt unregistered (#88011) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/88011 Approved by: https://github.com/weiwangmeta, https://github.com/albanD commit e667c0065657c29f42e592a4dcd810801cb83457 Author: Andrew Gu Date: Fri Oct 28 18:15:57 2022 +0000 [FSDP()][2/N] Refactor training state (#87916) This PR actually has meaningful changes. We stratify `TrainingState` into two levels: one is per FSDP instance and one is per `FlatParamHandle`/`FlatParameter`. - At the FSDP instance level, we only care about `IDLE`, FSDP computation (i.e. `FORWARD_BACKWARD`), or `SUMMON_FULL_PARAMS`. These dynamically modify behavior (e.g. `summon_full_params()` forces full precision). - At the `FlatParamHandle` level, we care about the training state for invariants and debugging. Hence, we keep `IDLE`, `FORWARD`, `BACKWARD_PRE`, `BACKWARD_POST`, and `SUMMON_FULL_PARAMS`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87916 Approved by: https://github.com/mrshenli commit cbc9faebfe962286ec8dd9cf8a5854613693f78a Author: Andrew Gu Date: Fri Oct 28 04:17:33 2022 +0000 [FSDP()][1/N] Start refactoring FSDP root pre-forward (#87915) Welcome! This PR starts the refactoring journey. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87915 Approved by: https://github.com/mrshenli commit edd6cf9996ce93ce11efe818a1e1f31a08920018 Author: PyTorch MergeBot Date: Sat Oct 29 06:48:12 2022 +0000 Revert "[ONNX] Deprecate operators.py (#87798)" This reverts commit 88eff1072290177221e7a09d792f7f135b4c83ca. Reverted https://github.com/pytorch/pytorch/pull/87798 on behalf of https://github.com/weiwangmeta due to breaking internal builds see D40797126 commit e3e84830aade59722d819bc5fa01922239494790 Author: AllenTiTaiWang Date: Fri Oct 28 19:54:52 2022 +0000 [ONNX] Move all torch.onnx.export related tests to test/onnx (#87292) Moving torch.onnx.export related tests to test/onnx integrates ONNX tests to the same CI machine, so the testing environment can be better managed. Fixes https://github.com/pytorch/pytorch/issues/87320 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87292 Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao, https://github.com/kit1980 commit 1dad051b05f896a5958e33423ccd3baa10ad1072 Author: Loren Arthur Date: Sat Oct 29 04:52:01 2022 +0000 Move workspace related functions to separate file (#87651) Move workspace related functions to separate file Test Plan: Existing tests Differential Revision: D40657708 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87651 Approved by: https://github.com/malfet commit 0cf572ff6c7522fa89ad4816bed3c5667e7106ee Author: Iris Zhang Date: Sat Oct 29 04:38:34 2022 +0000 [C10D][BE] Add exception handlers to c10d collectives function (#87643) (#87988) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87643 1. Add a decorator function exception_handlers to c10d collectives. 2. Update test(torch/distributed/distributed_c10d.py) to include mp tests for exception_handler. ``` python3 test/distributed/test_c10d_error_logger.py ``` Test Plan: Test in OSS. 
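A minimal sketch of the decorator idea described in point 1 (illustrative only; the real `exception_handlers` wrapper in `distributed_c10d.py` will differ in what it logs and how it is applied):
```python
import functools
import logging

logger = logging.getLogger(__name__)

def exception_handler(collective):
    """Log and re-raise any error escaping a collective call."""
    @functools.wraps(collective)
    def wrapper(*args, **kwargs):
        try:
            return collective(*args, **kwargs)
        except Exception:
            logger.exception("c10d collective %s failed", collective.__name__)
            raise
    return wrapper

# Usage sketch (hypothetical): wrap an existing collective.
# all_reduce = exception_handler(torch.distributed.all_reduce)
```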
Reviewed By: H-Huang Differential Revision: D40281632 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87988 Approved by: https://github.com/H-Huang commit 20e16c013fafe0f76d565434632b744300af05ea Author: Tovly Deutsch Date: Sat Oct 29 04:20:56 2022 +0000 Allow caffe2 to build with fbcode/mode/mac (#87293) Summary: The Mac contbuild builds under the `fbcode/mode/mac` which caffe2 fails to build under. This is due to that build mode enforcing protobuf v3. The caffe2 targets already account for this issue under `arvr` build modes by swapping out protobuf dependencies. They don't account for the same issue under `fbcode/mode/mac`. This diff fixes that by checking for `is_fbcode_mac` in these situations (in addition to `arvr`). Test Plan: ``` buck build --flagfile fbsource//fbcode/mode/mac fbsource//xplat/caffe2/... ``` Reviewed By: kimishpatel Differential Revision: D39552724 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87293 Approved by: https://github.com/kimishpatel commit 98354130096041224e9764a2f976d2d015d958ee Author: Elias Ellison Date: Fri Oct 28 12:33:37 2022 -0700 Fake Tensor For (Conv) Propagation (#87641) Resubmitting https://github.com/pytorch/pytorch/pull/87302 so it can be ghstack'd with the pr below. Incorrect strides in any meta impl would lead to runtime assertion errors for fallback kernels, so start by just enabling it for conv. Replaces https://github.com/pytorch/pytorch/pull/87588. cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87641 Approved by: https://github.com/jansel commit 14d5f139d205f924eb7ddd3e61215971bd194855 Author: Kazuaki Ishizaki Date: Sat Oct 29 01:26:15 2022 +0000 Fix typos under benchmarks, test, and tools directories (#87975) This PR fixes typos in `.md` files under benchmarks, test, and tools directories Pull Request resolved: https://github.com/pytorch/pytorch/pull/87975 Approved by: https://github.com/kit1980 commit 18f3db2963f3d0ac6b5eca0543cd51bbcd8e0428 Author: Richard Zou Date: Sat Oct 29 01:21:55 2022 +0000 Fix functorch tests (#87914) Test Plan: - Run tests Differential Revision: D40777145 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87914 Approved by: https://github.com/Chillee, https://github.com/osalpekar commit af0c339f00094c4c2f3c260b55e04e0e3654776a Author: Sergii Dymchenko Date: Sat Oct 29 00:23:47 2022 +0000 Disable slow-gradcheck tests (#88008) Disable because slow-gradcheck tests take > 4 hrs and time out. Will need to figure out if and how to re-enable later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88008 Approved by: https://github.com/seemethere, https://github.com/huydhn commit 785054d3a9d0cf3d528511c42d81a9f09e36f1c6 Author: Nikita Shulga Date: Fri Oct 28 23:59:47 2022 +0000 [CI] Report build errors in Windows build step (#88001) Should make failures like https://github.com/pytorch/pytorch/actions/runs/3346715682/jobs/5543900889 much more debuggable P.S. I don't know how to write batch, just hope its going to work Pull Request resolved: https://github.com/pytorch/pytorch/pull/88001 Approved by: https://github.com/seemethere commit 1eba3f220e04e347d0fd869b2118ddb7a49308d5 Author: Daniil Kutz Date: Fri Oct 28 23:51:53 2022 +0000 Fix bugs found by static analysis (#85705) These PR fixes a number of bugs found by Svace static analyzer: 1. 
DEREF_AFTER_FREE at qnnpack_utils.h: Pointer '&convolution->zero_buffer' is dereferenced at qnnpack_utils.h:258 after the referenced memory was deallocated at operator-delete.c:25 by passing as 1st parameter to function 'pytorch_qnnp_delete_operator' at qnnpack_utils.h:251.
2. DEREF_AFTER_NULL at impl.cpp: After having been compared to NULL value at impl.cpp:1892, pointer 'schema' is passed as 2nd parameter in call to function 'c10::operator<<' at impl.cpp:1921, where it is dereferenced at function_schema_inl.h:13.
3. DEREF_OF_NULL at stmt.h: After having been compared to NULL value at stmt.h:744, pointer 'body->_M_ptr' is passed in call to function 'torch::jit::tensorexpr::malformed_input::malformed_input' at stmt.h:745, where it is dereferenced at exceptions.h:67.
4. DEREF_OF_NULL at loopnest.h: Pointer 'f->ptr' that can have only NULL value (checked at loopnest.cpp:1482), is passed in call to function 'torch::jit::tensorexpr::malformed_input::malformed_input' at loopnest.cpp:1483, where it is dereferenced at exceptions.h:67. This is the same error as 3: forwarding a nullptr to malformed_input().
5. TAINTED_INT.LOOP in python_arg_parser: Integer value 'this->size' obtained from untrusted source at python_arg_parser.cpp:118 without checking its bounds is used as a loop bound at python_arg_parser.cpp:698 by calling function 'torch::FunctionParameter::set_default_str' at python_arg_parser.cpp:133.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85705 Approved by: https://github.com/kit1980 commit 376acf7625d741d031e1d2a147f8d68626e21d82 Author: BowenBao Date: Fri Oct 28 23:51:42 2022 +0000 Add 'share_from_this' to 'torch::jit::Graph' (#87343) Avoid passing a raw pointer of 'torch::jit::Graph' to python. Otherwise, it will corrupt the `internals::registered_instance` of pybind11, caching a holder for python w.r.t. the raw pointer of 'torch::jit::Graph', while not increasing the use count of the existing shared_ptr. The behavior afterwards is random and probably undefined. Most of the time it works, if the holder is deallocated in time on the python side and the cache is then cleared from `internals::registered_instance`; things are back to normal. Otherwise, it fails with either a segfault or a runtime error with the message "Unable to cast from non-held to held instance". One such scenario is normally and correctly returning a shared_ptr of that 'torch::jit::Graph' to python. Pybind finds the holder via the cache. Due to this, the shared_ptr use_count will not increase. If there is no other use on the C++ side, the graph will be freed, while python still has access via the holder created previously. @t-vi had a great analysis and solution to this exact problem at #51833, which I wish I had seen before debugging this issue... ~~I'm building the PR based on the original commit. @t-vi please let me know if you'd prefer otherwise.~~ Sending the PR separately due to CLA issues. Need to check in CI if adding `enable_shared_from_this` breaks other stuff. Fixes #51833, and CI issues in #87258, #86182. cc @malfet, @kit1980 for changes on JIT IR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87343 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/malfet commit ecf277abeca4d0b09c2587ca6d9a37be602e889a Author: Jerry Zhang Date: Thu Oct 27 10:49:55 2022 -0700 [quant][improvement] Check the fixedqparam op qconfig based on backend_config (#87425) Summary: Previously we hardcoded the supported observers for fixedqparam ops, this PR changes that to take the information from BackendConfig, this allows users to customize the support for fixed qparam ops Test Plan: python test/test_quantization.py TestQuantizeFx.test_change_backend_config_for_fixed_qparam_ops Reviewers: Subscribers: Tasks: Tags: unlinked from diff since it's too hard to land Pull Request resolved: https://github.com/pytorch/pytorch/pull/87425 Approved by: https://github.com/andrewor14 commit c3c817c972b50066bec6ea14176b931039d8fbd6 Author: Eli Uriegas <1700823+seemethere@users.noreply.github.com> Date: Fri Oct 28 15:12:31 2022 -0700 Revert "ci: Switch merge / revert flow to our own infra" (#88016) commit a2ffc3be971aec9245e4beee0a65ecc73e71f870 Author: Andrew Gu Date: Fri Oct 28 02:02:25 2022 +0000 [AC] Add trailing "." to `_CHECKPOINT_PREFIX` like FSDP (#87951) This is for consistency with FSDP. - `_FSDP_WRAPPED_MODULE` and `_CHECKPOINT_WRAPPED_MODULE` are exactly the wrapped module variable name, meaning you can call `getattr(module, _FSDP_WRAPPED_MODULE)` or `getattr(module, _CHECKPOINT_WRAPPED_MODULE)`. - `_FSDP_PREFIX` and `_CHECKPOINT_PREFIX` include the trailing `"."` and are only used for FQNs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87951 Approved by: https://github.com/zhaojuanmao commit 4faf086e5f2c7743b45bcefa7f951f8faaa0e94d Author: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com> Date: Fri Oct 28 22:05:11 2022 +0000 Update build scripts for ninja and ROCm5.3 install (#87505) cc @jeffdaily @sunway513 @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/87505 Approved by: https://github.com/seemethere commit 349ad23ffbcee88f2f0d590da1f8cf577d3a7627 Author: Eli Uriegas <1700823+seemethere@users.noreply.github.com> Date: Fri Oct 28 14:37:55 2022 -0700 ci: Switch merge / revert flow to our own infra (#88009) commit 9691ba2dbd8c1f6967d0d97a3679104368b329ed Author: Michael Lazos Date: Fri Oct 28 21:33:53 2022 +0000 Remove excess exception logging for minifier, cleanup backend failure exception format (#87537) Fixes https://github.com/pytorch/torchdynamo/issues/1376 Ensures exceptions are printed only in one place, once. implements some of the ideas from https://github.com/pytorch/torchdynamo/issues/1754 - Attaches a field to the exception which indicates that it's minified, a usage message is printed if this field is present cc @jansel @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87537 Approved by: https://github.com/anijain2305 commit 1c37119a1f735cac0cda0064dca7c69b658216aa Author: Andrew Gu Date: Fri Oct 28 02:02:25 2022 +0000 [FSDP] New fix for composing with other module wrappers (#87950) We change `.module` to pass through `ActivationWrapper` directly to the inner wrapped module. This should fix the state dict issues. Given the invariant that `.module` always returns the inner wrapped module, FSDP always registers the `FlatParameter` on the inner wrapped module, regardless of if there is an intermediate `ActivationWrapper` or not. 
This avoids casing on whether `ActivationWrapper` is added before or after FSDP construction. This PR removes the added unit test in `test_fsdp_misc.py` for changing the wrapped module because I would rather not complicated `_lazy_init()` logic just to support that kind of adversarial behavior. The user should not be swapping out the wrapped module arbitrarily or deleting the `FlatParameter`. I mainly had those tests to make sure that all branches of the code I added was correct. Differential Revision: [D40799961](https://our.internmc.facebook.com/intern/diff/D40799961) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87950 Approved by: https://github.com/zhaojuanmao commit c2c269c10aa7469b023894c9d5428316a4d36221 Author: Edward Z. Yang Date: Fri Oct 28 07:07:44 2022 -0700 Convert MetaConverter's tensor memo into a weak value dictionary. (#87911) This is in preparation for unifying fake tensor converter and meta converter's memo tables. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87911 Approved by: https://github.com/eellison commit e72962a34dc3d6d8e52f1d7b76e982e05885fdaa Author: Edward Z. Yang Date: Fri Oct 28 07:07:44 2022 -0700 Force people to call from_meta_and_device directly (#87903) It was pretty hard to tell at call site if I was doing device meta convert or not. This gets rid of the "dual" API and forces people to call the method manually for the device case. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87903 Approved by: https://github.com/eellison, https://github.com/albanD commit ab8fbd26f8a5c7d5fdb8527536dbf2aa613ce722 Author: Andrey Talman Date: Fri Oct 28 19:55:31 2022 +0000 Advance nightly docker to 11.6 (#87858) Fixes following: https://github.com/pytorch/pytorch/actions/runs/3242695506/jobs/5316334351 crash in Docker builds introduced by: #82682 The PR seems to introduce some changes not compatible with cuda 11.3 which is used by our Docker builds This is a reland of original pr: https://github.com/pytorch/pytorch/pull/86941 (Created this new PR to start fresh) Which was reverted because conda install, installed wrong version of pytorch. It installed pytorch for cuda 11.3 still rather then 11.6 This should be fixed now with Release 1.13 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87858 Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/izaitsevfb commit c5cb6ec06619a2fc9874b967f11d13663c5d32c1 Author: Eddie Yan Date: Fri Oct 28 19:33:42 2022 +0000 Allow 64bit indexing for channels-last upsample2d on CUDA (#87901) CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/87901 Approved by: https://github.com/ngimel commit fb64f7b804911fd74322132c209f86047825f04a Author: Taylor Robie Date: Wed Oct 26 16:56:51 2022 -0700 [Profiler][Trivial] Move ID assignment code to `data_flow.cpp` (#87670) ID assignment has become a very complex facet of the profiler. The existing code has grown organically as I've discovered various refinements and has become very difficult to understand or reason about. (With more complexity coming in https://github.com/pytorch/pytorch/pull/87133) I want to take a step back and add some structure and additional comments to the ID assignment algorithm. Before I do, however, it's time to move it out of `collection.cpp` to a dedicated data flow file. 
Differential Revision: [D40666360](https://our.internmc.facebook.com/intern/diff/D40666360/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40666360/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87670 Approved by: https://github.com/slgong-fb commit 8d395ec6bc95e7a24311000ce65c992c6a568f34 Author: Taylor Robie Date: Wed Oct 26 16:56:50 2022 -0700 [Profiler][Trivial] Add hashing struct for pairs and tuples. (#87668) There is a fairly simple and commonly used hash_combine in c10/util; however in order to use it in a map we need to wrap it in a hashing struct. By defining template functions we also get recursive unpacking for free. (A later PR will want to hash a `tuple, tuple>`) Differential Revision: [D40666359](https://our.internmc.facebook.com/intern/diff/D40666359/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87668 Approved by: https://github.com/slgong-fb commit d13b6781d8b7353919ee06378636773f762b880e Author: PyTorch MergeBot Date: Fri Oct 28 17:55:19 2022 +0000 Revert "[fx][subgraph_rewriter] Change match_filter to be a List in replace_pattern_with_filters (#87257)" This reverts commit 58650835bb91d927623e6bff5cc4844fbcad6368. Reverted https://github.com/pytorch/pytorch/pull/87257 on behalf of https://github.com/weiwangmeta due to breaking internal builds/BC-breaking change commit fc21b9db23377569423f20a749a170375a11966d Author: Elias Ellison Date: Thu Oct 27 17:12:36 2022 -0700 Use Eager Code To Determine Conv Layout (#87305) The logic for determining the conv backend, and therefore the output striding, is very complex. It depends on build settings, input striding/contiguity, sizes, etc. Eventually we should port that logic to the meta impl for dynamic shapes but that will require a lot more work and keeping the implementations in sync. See https://github.com/pytorch/torchdynamo/issues/1701 This is a prerequisite to removing the inductor conv stride propagation and to more general fake tensor propagation for inductor. In that PR, the meta impls for cpu conv give incorrect striding which led to test failures (https://github.com/pytorch/pytorch/pull/87083). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87305 Approved by: https://github.com/ezyang commit 1bc0e923bb006ee9e43996dfde49df89ea11b979 Author: Natalia Gimelshein Date: Fri Oct 28 16:09:25 2022 +0000 add special case for power of 0.5 (#87912) Workaround for https://github.com/pytorch/torchdynamo/issues/1775, and calling sqrt is better in any case, but `libdevice.pow` still for some reason doesn't work if both arguments are scalars cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @mreso, can you please check if that takes you further with diffusers cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87912 Approved by: https://github.com/desertfire commit 35c611d30f6024fc6fc94b437372ab4ee1b3544d Author: Driss Guessous Date: Fri Oct 28 15:51:10 2022 +0000 Add mem efficient backend flag (#87946) Add in a torch.backends.cuda flag and update the context manager to pick between the three implementations of scaled_dot_product_attention.
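A sketch of how the flag could be exercised from user code (assumptions: a CUDA build where the `torch.backends.cuda.sdp_kernel` context manager with `enable_flash` / `enable_math` / `enable_mem_efficient` switches and a public `scaled_dot_product_attention` entry point are available; the names changed in later releases):
```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) on a CUDA device, half precision.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Force the memory-efficient kernel by disabling the other two backends.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (2, 8, 128, 64)
```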
cc @cpuhrsch @jbschlosser @bhosmer @mikaylagawarecki Pull Request resolved: https://github.com/pytorch/pytorch/pull/87946 Approved by: https://github.com/cpuhrsch commit 89fd451934ac4065bd0064ba9d92e8b8b3827619 Author: Shen Li Date: Fri Oct 28 14:45:38 2022 +0000 Fix codeowner errors (#87954) Error message: "Unknown owner: make sure @mingzhe09088 exists and has write access to the repository." Pull Request resolved: https://github.com/pytorch/pytorch/pull/87954 Approved by: https://github.com/wangkuiyi commit 8a9aca7b8d7fba320b4f2a8c2f18a25f572c46b6 Author: albanD Date: Fri Oct 28 13:40:11 2022 +0000 Reland 2 Many symintifications (#87604) (#87980) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980 Approved by: https://github.com/ezyang commit ce3e0e9856e32fae61df282f0b97b0e2e1eadf9d Author: Shen Li Date: Fri Oct 28 02:04:36 2022 +0000 Add state to distributed composable API (#87838) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87838 Approved by: https://github.com/yhcharles commit b192e7e415c50cf7af5c70f35a8c20c38985d06d Author: Christian Puhrsch Date: Fri Oct 28 11:26:17 2022 +0000 Support non-contiguous NestedTensors for elementwise ops (#87888) Enables benchmarking of math path of sdp kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87888 Approved by: https://github.com/drisspg commit f150e70ca2a5d7efdfb55e3115ccd750b39acc39 Author: leslie-fang-intel Date: Fri Oct 28 10:30:30 2022 +0000 add the function specialization for promote with ITensorListRef (#87756) Fixes [#87684](https://github.com/pytorch/pytorch/issues/87684) It's due to a new tensor list type is introduced as `ITensorListRef`. We need the function specialization for `prioritize` and `cached_cast` for this new tensor list type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87756 Approved by: https://github.com/jgong5, https://github.com/ezyang commit 166b5d3e7c5c230c455dcbcc05c84dd6bc03721b Author: PyTorch MergeBot Date: Fri Oct 28 06:11:42 2022 +0000 Revert "[EZ] Fix simple bug in torchdynamo (#87821)" This reverts commit ce7fcab9bdf61a34bc56b7cd45a882e4ad6ba175. Reverted https://github.com/pytorch/pytorch/pull/87821 on behalf of https://github.com/kit1980 due to Broke many dynamo tests https://github.com/pytorch/pytorch/actions/runs/3341984303/jobs/5534381456 commit 78b406932f0e4afd82b672f959b8cb9ce1e79f9d Author: Charlie Yan Date: Fri Oct 28 04:05:01 2022 +0000 Add me to reviewers of composable API changes (#87891) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87891 Approved by: https://github.com/mrshenli commit 1da5aeb97b73664ff0fe2f4bb48379655cede969 Author: Michael Suo Date: Thu Oct 27 15:01:21 2022 -0700 [dynamo] Error when user nests FX with dynamo (#87797) Today, this doesn't work and dynamo errors out in a very non-obvious way (see: https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a). Here, we detect the error early and exit with a nicer msg. Also add a config option to just no-op dynamo (which need to unblock internal enablement). 
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87797 Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel commit 07f7c4615bc858a8822c05aa310310446fc78836 Author: Xia, Weiwen Date: Fri Oct 28 04:58:54 2022 +0000 [MKLDNN] Replace pooling algorithm `pooling_avg` with `pooling_avg_exclude_padding` for future oneDNN upgrades (#87851) **Description** Replace pooling algorithm `pooling_avg` with `pooling_avg_exclude_padding` in implementation of mkldnn pooling. It's only a change of names, not algorithm. The former is an alias of the latter and it will be removed in future oneDNN library upgrades. This change has no effect on functionality or performance. **Validation** Covered by UT. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87851 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper commit 23b79e6f48c4350b9a2ed7680a13d22e5d8066b6 Author: rboca Date: Fri Oct 28 04:56:37 2022 +0000 Update CMakeLists.txt (#87030) Fix Caffe2_CPU_INCLUDE with Caffe2_GPU_INCLUDE. The expanding parent scope should be with the same variable name. The compilation in certain build configurations is corrected with this fix. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87030 Approved by: https://github.com/kit1980 commit daff5d35567615bb80f19e59474d8af7af84daf2 Author: Kazuaki Ishizaki Date: Fri Oct 28 04:53:33 2022 +0000 Fix typos under caffe2 directory (#87840) This PR fixes typos in `.md` files under caffe2 directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87840 Approved by: https://github.com/kit1980 commit e8a97a3721f86eacbbf5e1160be07cc27544b9aa Author: Sherlock Huang Date: Fri Oct 28 00:01:07 2022 +0000 FakeTensorMode and Prims.add/sub/mul/div support scalar only inputs (#87759) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87759 Approved by: https://github.com/ngimel, https://github.com/mruberry, https://github.com/eellison commit d47ffecbe4ab1f177fbebc7d8f42d8b84f29f996 Author: Michael Suo Date: Thu Oct 27 12:37:59 2022 -0700 [dynamo] relax fake tensor restriction with `assume_constant_result` (#87895) This works now because of https://github.com/pytorch/pytorch/pull/87091, so don't error out anymore. cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87895 Approved by: https://github.com/tugsbayasgalan, https://github.com/voznesenskym commit 2e48b478e06b38a7468832d980d214441855547e Author: Jithun Nair Date: Fri Oct 28 03:50:43 2022 +0000 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. 
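(Illustration, referring back to the `assume_constant_result` relaxation in #87895 above: a hedged usage sketch; the decorator's exact import location is an assumption here.)

```python
import torch
import torch._dynamo as dynamo

@dynamo.assume_constant_result  # assumed entry point for the decorator
def get_scale():
    # hypothetical helper whose return value dynamo may bake in as a constant
    return 4

def fn(x):
    return x * get_scale()

opt_fn = dynamo.optimize("eager")(fn)
opt_fn(torch.ones(3))
```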
cc @jeffdaily @sunway513 @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet, https://github.com/pruthvistony commit 9c793b366faf49c78effb0c78d26c48f7664bc92 Author: sanchitintel Date: Fri Oct 28 03:42:19 2022 +0000 Move incorrectly placed closing curly brace of `extern "C"` block (#87853) When `__SYCL_DEVICE_ONLY__` is defined, while building PyTorch, the output of the preprocessing step would not have the closing curly brace of the `extern "C"` block, as it has been incorrectly placed. Compilers don't seem to report an error or a warning for a missing closing brace of an `extern "C"` block. If `c10/macros/Macros.h` would be included in a C++ file, and after the preprocessing stage, if the preprocessed source file would have some templated code after `extern "C" {`, then, after compilation, linking might fail with the error `templates must have c++ linkage`). eg. https://stackoverflow.com/questions/61717819/template-with-c-linkage-error-when-using-template-keyword-in-main-cpp/61717908#61717908 (its answer also has a small snippet of code to reproduce such an issue). one-liner bug fix that rectifies the placement of closing curly brace (`}`), so that the `extern "C"` block ends properly when `__SYCL_DEVICE_ONLY__` is defined. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87853 Approved by: https://github.com/jgong5, https://github.com/kit1980, https://github.com/malfet commit 13de4d2137b7417d118a84c01ea88c21393e0a5d Author: Sherlock Huang Date: Thu Oct 27 22:20:36 2022 +0000 Meta OpInfo Test for stride correctness (#87849) Failing test logs here https://gist.github.com/SherlockNoMad/a7e132f3cb4152900f8a6d7df358c59e Pull Request resolved: https://github.com/pytorch/pytorch/pull/87849 Approved by: https://github.com/eellison commit 8b4d95759c7d0e6b7d4c3a3facaaa18ffe4cbd54 Author: PyTorch MergeBot Date: Fri Oct 28 03:00:09 2022 +0000 Revert "Many symintifications (#87604)" This reverts commit 777e6a2c5100f3274cff1bcf7e47ccbe1a651927. Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds commit 2cb7c3f865ac8305f0af2806082b3bc8ec29a640 Author: Animesh Jain Date: Fri Oct 28 02:41:12 2022 +0000 [dynamo][benchmarks] Prepone Cold start setup (#87913) Parallel compilation warms the Threadpool when we call `torch._dynamo.optimize()`. In current benchmarks, we were setting up the TRITON_CACHE_DIR much later. Because of this parallel compilation artifacts were not used and compilation latency improvements were not visible in dashboard. This PR just prepones the setup of TRITON_CACHE_DIR. cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87913 Approved by: https://github.com/wconstab commit 641d8e0e699a981b1272df66848ab87e118f5eca Author: PyTorch MergeBot Date: Fri Oct 28 02:20:24 2022 +0000 Revert "Enable mypy check for distributed.py, and fix type errors (#87543)" This reverts commit 2cc624cd4318414905d2475432aee13db9031cc6. 
Reverted https://github.com/pytorch/pytorch/pull/87543 on behalf of https://github.com/weiwangmeta due to breaking internal builds commit f9679184116f1d29c483c2b2a4c3a9d730be4694 Author: Andrew Gu Date: Thu Oct 27 16:59:49 2022 +0000 [AC] Return `None` from `apply_activation_checkpointing()` (#87871) `_recursive_wrap()` returns `Tuple[nn.Module, int]`, where the `nn.Module` is the in-place modified module and the `int` is the numel wrapped. In that sense, the return value is not meant to be publicly used. The `apply_activation_checkpointing()` docs already suggest that the function returns `None`, so this PR simply follows that. **Test Plan** CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/87871 Approved by: https://github.com/zhaojuanmao commit 81c4049f4d2f4e94818ae52c04c870805713c59e Author: Mike Iovine Date: Fri Oct 28 01:28:34 2022 +0000 [Static Runtime] Move PrepackWeights to internal-only graph passes (#87799) Summary: The pass introduces an `fb::` operator and thus cannot be used in OSS. The test failure was not exposed because the Static Runtime tests have been disabled in OSS for a while. The Dev Infra folks encountered this failure when re-enabling the tests. Test Plan: Existing tests Differential Revision: D40724547 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87799 Approved by: https://github.com/huydhn commit ce7fcab9bdf61a34bc56b7cd45a882e4ad6ba175 Author: Tugsbayasgalan Manlaibaatar Date: Thu Oct 27 04:04:26 2022 +0000 [EZ] Fix simple bug in torchdynamo (#87821) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87821 Approved by: https://github.com/voznesenskym, https://github.com/jansel commit fd27246c16d8a80e7de0ccc86d014f9759611b0f Author: lezcano Date: Thu Oct 27 21:46:25 2022 +0000 Fix decomposition for std (#87181) The previous implementation was lacking a few features and incurred on a pretty large error cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87181 Approved by: https://github.com/ngimel, https://github.com/peterbell10 commit f21d0b310cecbd68ae345e4b677a702892c57292 Author: lezcano Date: Thu Oct 27 21:46:25 2022 +0000 Add decomposition for diagonal_scatter (#87282) cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87282 Approved by: https://github.com/mruberry commit 9225f261769170fda1136ed19238f7c74cddb2bb Author: Andrew Gu Date: Thu Oct 27 20:13:27 2022 +0000 [FSDP] Fix wrapped module changing after ctor (#87837) Recently, I retired `FlattenParamsWrapper`, which meant that FSDP registers its `FlatParameter` on the wrapped module instead of the `FlattenParamsWrapper` instance. This is only relevant for `use_orig_params=False`. If the user changes an FSDP instance's wrapped module after the FSDP constructor, then the `FlatParameter` is no longer registered on the wrapped module. This can cause issues for full state dict, which checks if the `FlatParameter` is currently registered as an early return condition for `rank0_only=True`. The solution in this PR is to re-establish the wrapped module in `_lazy_init()`, de-registering from the old wrapped module and re-registering to the new wrapped module, where the assumption is that the user should not modify the module structure upon `_lazy_init()`. 
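(Illustration for the #87837 fix described above: a minimal sketch of the de-register/re-register idea. The parameter name and helper are placeholders that mirror the mechanism, not FSDP's actual private functions.)

```python
import torch.nn as nn

def move_registration(old_wrapped: nn.Module, new_wrapped: nn.Module, name: str = "_flat_param"):
    # "_flat_param" is a placeholder name for the registered FlatParameter.
    param = old_wrapped._parameters.pop(name, None)
    if param is not None:
        # Dynamic registration goes straight through the _parameters dict,
        # mirroring how the FlatParameter was registered in the first place.
        new_wrapped._parameters[name] = param
```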
The direct access to the private attribute `_parameters` from `nn.Module` is not ideal, but we already rely on it for the dynamic `FlatParameter` registration. The tradeoff is whether we want an additional `nn.Module` wrapper (`FlattenParamsWrapper`) and use `delattr` plus a singleton list to do the dynamic registration or we want to access `_parameters`. If this becomes a problem, we can work with Core team on a solution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87837 Approved by: https://github.com/zhaojuanmao commit 7a3afe61d2230e8620718c326223ecc9e276fde3 Author: Richard Barnes Date: Fri Oct 28 00:41:04 2022 +0000 Check all CUDA API calls for errors in caffe2/ (#81816) Test Plan: Sandcastle Differential Revision: D35194868 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81816 Approved by: https://github.com/ezyang commit 3ece9fb45df90dec72251104ec29b85cb062e6b7 Author: Richard Barnes Date: Fri Oct 28 00:40:47 2022 +0000 Check all CUDA API calls for errors in torch/ (#81560) Summary: Original commit changeset: 0bb770d2cdb2 Original Phabricator Diff: D35194935 (https://github.com/pytorch/pytorch/commit/79e5b053b690852b21d881357904bc5a4438d95b) Differential Revision: D35291874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81560 Approved by: https://github.com/ezyang commit 4e3a0ff92ed2e5873d77d38bca50647b1ad2f4a8 Author: Bin Bao Date: Thu Oct 27 16:26:42 2022 +0000 Update how inductor cpu tests are skipped on fbcode (#87867) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87867 Approved by: https://github.com/anijain2305 commit 6cc4ae3d2d64b10d7104c4a0cc4083a644ef8e54 Author: PyTorch MergeBot Date: Thu Oct 27 23:55:59 2022 +0000 Revert "[Inductor] Enable Inductor unspec inputs test for different dtypes (#87809)" This reverts commit 369755f8ce1b043c88efbc50ee09c0258dec5162. Reverted https://github.com/pytorch/pytorch/pull/87809 on behalf of https://github.com/kit1980 due to Broke trunk / cuda11.6-py3.10-gcc7-sm86 / test (default, 4, 4, linux.g5.4xlarge.nvidia.gpu), same error on pull. commit cda0d5a57b9126c6d244fdd5b02198f05c742615 Author: PyTorch MergeBot Date: Thu Oct 27 21:16:58 2022 +0000 Revert "[dynamo] Error when user nests FX with dynamo (#87797)" This reverts commit a485528a7e4551461d57db3deb8b40c2acea08d2. 
Reverted https://github.com/pytorch/pytorch/pull/87797 on behalf of https://github.com/kit1980 due to Broke linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge), same error on pull commit 6ad3543a1b000a369d811e0af195209f62f32fbc Author: soulitzer Date: Wed Oct 26 14:34:58 2022 -0400 BE: Improve test_will_engine_execute_node unittest (#87806) Adds the test from https://github.com/pytorch/pytorch/pull/86672 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87806 Approved by: https://github.com/albanD commit 0f7df16c71215bc7bd7835fc5933ac3343b8a627 Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Thu Oct 27 21:03:42 2022 +0000 [doc] Add out-kwarg documentation to torch.where (#87870) Fixes #87862 cc: @lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/87870 Approved by: https://github.com/lezcano commit 46b16977d97fd3b241a641c8020d0bc073a218d0 Author: Alvaro Gaona Date: Thu Oct 27 21:00:59 2022 +0000 Reimplement Kaiser window (#87330) Relates to #85366 - For reference follow #87082. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87330 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 369755f8ce1b043c88efbc50ee09c0258dec5162 Author: Yanbo Liang Date: Thu Oct 27 20:58:46 2022 +0000 [Inductor] Enable Inductor unspec inputs test for different dtypes (#87809) Fixes #ISSUE_NUMBER cc @jansel @mlazos @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87809 Approved by: https://github.com/ngimel commit 1ff52225f185e11faa421528815aaa43e79e0722 Author: Edward Z. Yang Date: Thu Oct 27 13:49:11 2022 -0700 Unify SymIntNode and SymFloatNode into SymNode (#87817) This refactor was prompted by challenges handling mixed int/float operations in C++. A previous version of this patch added overloads for each permutation of int/float and was unwieldy https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different approach. The general outline of the patch is to combine the C++ types SymIntNode and SymFloatNode into a single type, SymNode. This is type erased; we no longer know statically at C++ if we have an int/float and have to test it with the is_int()/is_float() virtual methods. This has a number of knock on effects. - We no longer have C++ classes to bind to Python. Instead, we take an entirely new approach to our Python API, where we have a SymInt/SymFloat class defined entirely in Python, which hold a SymNode (which corresponds to the C++ SymNode). However, SymNode is not pybind11-bound; instead, it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode when it goes into C++. This implies a userland rename. In principle, it is also possible for the canonical implementation of SymNode to be written in C++, and then bound to Python with pybind11 (we have this code, although it is commented out.) However, I did not implement this as we currently have no C++ implementations of SymNode. Because we do return SymInt/SymFloat from C++ bindings, the C++ binding code needs to know how to find these classes. Currently, this is done just by manually importing torch and getting the attributes. - Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now takes SymInt/SymFloat, rather than SymNode, bringing it in line with how __torch_dispatch__ works. 
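(Illustration of the layering described above for #87817: SymInt as a thin Python wrapper around a type-erased node. The classes below are a purely didactic sketch, not the actual torch types.)

```python
class ToySymNode:
    """Stand-in for the type-erased node: plain named methods, no magic methods."""

    def __init__(self, value):
        self._value = value

    def is_int(self):
        return isinstance(self._value, int)

    def is_float(self):
        return isinstance(self._value, float)

    def add(self, other):
        return ToySymNode(self._value + other._value)


class ToySymInt:
    """User-facing wrapper; only this layer exposes magic methods."""

    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        return ToySymInt(self.node.add(other.node))

    def __int__(self):
        return int(self.node._value)
```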
Some miscellaneous improvements: - SymInt now has a constructor that takes SymNode. Note that this constructor is ambiguous if you pass in a subclass of SymNode, so an explicit downcast is necessary. This means toSymFloat/toSymInt are no more. This is a mild optimization as it means rvalue reference works automatically. - We uniformly use the caster for c10::SymInt/SymFloat, rather than going the long way via the SymIntNode/SymFloatNode. - Removed some unnecessary toSymInt/toSymFloat calls in normalize_* functions, pretty sure this doesn't do anything. - guard_int is now a free function, since to guard on an int you cannot assume the method exists. A function can handle both int and SymInt inputs. - We clean up the magic method definition code for SymInt/SymFloat/SymNode. ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets plain methods; this is to help avoid confusion between the two types. Signed-off-by: Edward Z. Yang cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817 Approved by: https://github.com/albanD, https://github.com/anjali411 commit 2205f56f462fb9cbb1c068acc1cf29aca27aef0a Author: Jiewen Tan Date: Thu Oct 27 20:39:30 2022 +0000 [LTC] Remove lazy::View (#87822) Summary: This is the first part to remove the whole view and aliasing infrastructure in LTC, which is deprecated in favor of functionalization. It mainly removes things that use lazy::View. Test Plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/87822 Approved by: https://github.com/JackCaoG, https://github.com/antoniojkim, https://github.com/wconstab commit 83b381d34db05d01ccde1c3da755b3dca5504ee7 Author: Animesh Jain Date: Thu Oct 27 19:49:29 2022 +0000 [dynamo] add inductor runs w/o cudagraphs (#87847) as title cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Pull Request resolved: https://github.com/pytorch/pytorch/pull/87847 Approved by: https://github.com/jansel commit d2d0be9a76bcdaf5f26eb88dd505ccf2ac6d7e40 Author: samdow Date: Thu Oct 27 17:10:04 2022 +0000 fix typo in per sample grad test (#87790) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87790 Approved by: https://github.com/zou3519 commit b8b1d7be24a29d9b20b25c0dd5273a499af07097 Author: Akshit Khurana Date: Wed Oct 26 15:44:00 2022 -0700 [dynamo] Add ao.nn to skipfiles inline allowlist (#87820) Summary: Allow torch.ao.nn module to be inlined Test Plan: Tested manually for https://github.com/pytorch/torchdynamo/issues/1737 Reviewers: Subscribers: Tasks: Tags: cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx Differential Revision: [D40768679](https://our.internmc.facebook.com/intern/diff/D40768679) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87820 Approved by: https://github.com/jansel commit a485528a7e4551461d57db3deb8b40c2acea08d2 Author: Michael Suo Date: Wed Oct 26 10:49:38 2022 -0700 [dynamo] Error when user nests FX with dynamo (#87797) Today, this doesn't work and dynamo errors out in a very non-obvious way (see: https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a). Here, we detect the error early and exit with a nicer msg. 
Also add a config option to just no-op dynamo (which need to unblock internal enablement). cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87797 Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel commit f1b78224cab093112173cd34bef0938fe2cb927e Author: Natalia Gimelshein Date: Thu Oct 27 15:53:11 2022 +0000 Fix type promotion for 2 wrapped scalar args (#87845) Fixes #76801 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87845 Approved by: https://github.com/SherlockNoMad, https://github.com/mruberry commit 03d6af4db3974fcbf1ce7d3b3be46c1134c72e6e Author: Brian Hirsh Date: Wed Oct 26 14:11:22 2022 -0700 add nesting to TORCH_SHOW_DISPATCH_TRACE (#87751) Added indents to `TORCH_SHOW_DISPATCH_TRACE` so that you more easily see the call tree from the dispatcher. Definitely slower, but it's all guarded under the `DEBUG` build. Example output: I know we have the PyDispatcher now, but I still found this helpful for debugging ``` [call] op=[aten::ones], key=[BackendSelect] [redispatch] op=[aten::ones], key=[CPU] [call] op=[aten::empty.memory_format], key=[BackendSelect] [redispatch] op=[aten::empty.memory_format], key=[CPU] [call] op=[aten::fill_.Scalar], key=[CPU] [call] op=[aten::clone], key=[AutogradCPU] [redispatch] op=[aten::clone], key=[CPU] [call] op=[aten::empty_strided], key=[BackendSelect] [redispatch] op=[aten::empty_strided], key=[CPU] [call] op=[aten::copy_], key=[CPU] [call] op=[aten::view], key=[PythonTLSSnapshot] [redispatchBoxed] op=[aten::view], key=[AutogradCPU] [redispatch] op=[aten::view], key=[ADInplaceOrView] [redispatch] op=[aten::view], key=[Functionalize] [call] op=[aten::view], key=[PythonTLSSnapshot] [redispatchBoxed] op=[aten::view], key=[Meta] [call] op=[aten::view], key=[PythonTLSSnapshot] [redispatchBoxed] op=[aten::view], key=[Python] [callBoxed] op=[aten::view], key=[CPU] [call] op=[aten::clone], key=[PythonTLSSnapshot] [redispatchBoxed] op=[aten::clone], key=[AutogradCPU] [redispatch] op=[aten::clone], key=[Functionalize] [callBoxed] op=[aten::clone], key=[PythonTLSSnapshot] [redispatchBoxed] op=[aten::clone], key=[Python] [callBoxed] op=[aten::clone], key=[CPU] [call] op=[aten::empty_strided], key=[BackendSelect] [redispatch] op=[aten::empty_strided], key=[CPU] [call] op=[aten::copy_], key=[CPU] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87751 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit 23ff47ccc53cda92ffe2482f22a4321f721eace0 Author: Brian Hirsh Date: Wed Oct 26 14:11:22 2022 -0700 functionalization: fix detach() (#87750) `.detach()` worked in basic cases previously, but didn't properly preserve view relationships between the base and the output. This wasn't heavily tested, because autograd doesn't normally encounter `FunctionalTensorWrapper` directly, but could become more common if we fuse functionalization and autograd into a single tracing pass. 
This will also be a bug fix for LTC (and XLA when they use functionalization) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87750 Approved by: https://github.com/ezyang commit e2bbc0a134369c56f1be437e4548a2204a83b46e Author: Nikita Shulga Date: Thu Oct 27 15:38:48 2022 +0000 [BE] Move remaining workflows off Xenial (#87834) Both BE and prerequisite for moving our CI/CD to C++17 compiler (gcc-5.4 is not fully C++17 compliant) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87834 Approved by: https://github.com/weiwangmeta, https://github.com/kit1980, https://github.com/huydhn commit 1e1b04512879d6166dc1f5adff482723e2d0da9e Author: jpvillam Date: Thu Oct 27 15:11:28 2022 +0000 [ROCM] Enable Sparse Pickle Test (#82729) Missed stream context for serialization Missing ROCm stream context on memory operations for serialization Ran the sparse pickle test Pull Request resolved: https://github.com/pytorch/pytorch/pull/82729 Approved by: https://github.com/ngimel commit aaba0bd30641c56db1dc0550b81fbc458db46276 Author: Mike Iovine Date: Thu Oct 27 12:29:51 2022 +0000 [JIT] Fix torch.jit.script for functions with many decorators (#87804) Summary: Python's function parsing from the `ast` module records the line number of the function definition, not the first decorator. So this diff fixes crashes like this: ``` IndexError: vector::_M_range_check: __n (which is 10) >= this->size() (which is 8) ``` Test Plan: New unit test Differential Revision: D40726352 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87804 Approved by: https://github.com/tugsbayasgalan, https://github.com/davidberard98 commit 1780e0ef7fe49f0b1e2723bb88d926bac231eee1 Author: kshitij12345 Date: Thu Oct 27 10:46:53 2022 +0000 [complex] conv_transpose2d (#81805) Reference: https://github.com/pytorch/pytorch/issues/71108 Fixes : #86414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81805 Approved by: https://github.com/anjali411 commit c36db82e12a80e31a50e28aeda2801d18a952959 Author: XiaobingSuper Date: Wed Oct 26 01:46:46 2022 -0400 TorchDynamo: Add convolution unary fusion for cpu in inference mode (#87063) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87063 Approved by: https://github.com/jgong5, https://github.com/jansel commit b16b5fb802028ab96e4e15a09d6c1d94304c4f83 Author: Taylor Robie Date: Wed Oct 26 16:56:47 2022 -0700 [Profiler] Hold weak reference to prevent TensorImpl address reuse during profiling. (#87244) A recurring problem with assigning Tensor IDs is that we want to preserve identity when storage changes but we don't observe TensorImpl destruction so identity assignment is not robust to the ABA problem with respect to TensorImpl*. ~TensorImpl is far too hot to instrument; even adding a call to a no-op function in a different compilation unit increases overhead by tens of percent. (OSS builds do not have any sort of LTO.) Fortunately there is a solution. A PyTorch Tensor is a `c10::intrusive_ptr`, which in turn holds a storage. (Which is a `c10::intrusive_ptr`) `c10::intrusive_ptr` has a `c10::weak_intrusive_ptr` class for taking non-owning references to the underlying object. The implementation involves both a strong refcount and weak refcount in `c10::intrusive_ptr`. If the strong refcount of an intrusive_ptr goes to zero and there are no weak references then everything is deleted. 
However if there is a weak reference then the intrusive_ptr calls `release_resources()` but not delete. This has the effect of freeing the underlying resources (ensuring that program semantics are unchanged) but leaves behind an empty shell of an `intrusive_ptr` that the `weak_intrusive_ptr`s use to check status. And herein lies the solution: as long as we hold a weak reference to a TensorImpl we will block deletion and prevent the `TensorImpl*` from being reused. This PR uses a `c10::weak_intrusive_ptr` to store the address of profiled TensorImpls and then converts it to a raw pointer (or rather, a `TensorImplAddress`) during post processing when we no longer care about blocking address reuse. Differential Revision: [D40492848](https://our.internmc.facebook.com/intern/diff/D40492848/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87244 Approved by: https://github.com/slgong-fb, https://github.com/albanD commit 4b2390517263592fb6972e4b128777bc038ee4aa Author: Mengwei Liu Date: Thu Oct 27 06:04:22 2022 +0000 [torch] Add torch cpp cpu target for torch/csrc/api/src files (#87327) Summary: Duplicating fbcode target `fbcode//caffe2:torch-cpp-cpu` target in xplat. In D40460749 our user wants to use `torch::kNearest` enum which is defined in `torch/csrc/api/src/enum.cpp`. Adding this target to support it. Test Plan: Rely on CI Differential Revision: D40532087 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87327 Approved by: https://github.com/ezyang commit bf113e38fad30fb1eec1f94563f419518ae3178c Author: Richard Barnes Date: Thu Oct 27 05:15:16 2022 +0000 use nv_diag_suppress (#87712) Fixes: ``` /dev/shm/rbarnes/tempfs/pytorch/aten/src/ATen/native/cuda/UnaryFractionKernels.cu(125): warning #20236-D: pragma "diag_suppress" is deprecated, use "nv_diag_suppress" instead /dev/shm/rbarnes/tempfs/pytorch/aten/src/ATen/native/cuda/UnaryFractionKernels.cu(125): warning #20236-D: pragma "diag_suppress" is deprecated, use "nv_diag_suppress" instead /dev/shm/rbarnes/tempfs/pytorch/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu(73): warning #20236-D: pragma "diag_suppress" is deprecated, use "nv_diag_suppress" instead /dev/shm/rbarnes/tempfs/pytorch/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu(73): warning #20236-D: pragma "diag_suppress" is deprecated, use "nv_diag_suppress" instead ``` cc @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/87712 Approved by: https://github.com/soumith commit 107f92a6830f61b88a7eb55934610f491623dc9b Author: Andrew Gu Date: Thu Oct 27 00:03:15 2022 +0000 [FSDP] ufmt FSDP test (#87812) This applies `ufmt` to all of the FSDP test files in the `test/distributed/fsdp/` directory. **Test Plan** CI **Notes** For VSCode users, - Install `ufmt`: https://pypi.org/project/ufmt/ - Install VSCode `ufmt` extension: https://marketplace.visualstudio.com/items?itemName=omnilib.ufmt - Include in `settings.json`: ``` { "[python]": { "editor.defaultFormatter": "omnilib.ufmt", "editor.formatOnSave": true, }, } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87812 Approved by: https://github.com/rohan-varma commit e3cf81e0a73e7aec282f41469353d955a2fef143 Author: Andrew Gu Date: Thu Oct 27 00:03:14 2022 +0000 [FSDP] ufmt /fsdp (#87811) This applies `ufmt` to all of the FSDP files in the `torch/distributed/fsdp/` directory. 
**Test Plan** CI **Notes** For VSCode users, - Install `ufmt`: https://pypi.org/project/ufmt/ - Install VSCode `ufmt` extension: https://marketplace.visualstudio.com/items?itemName=omnilib.ufmt - Include in `settings.json`: ``` { "[python]": { "editor.defaultFormatter": "omnilib.ufmt", "editor.formatOnSave": true, }, } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87811 Approved by: https://github.com/rohan-varma, https://github.com/fegin commit 49ce3ed14cab4aca39ed42d6dbbc1759667a28fe Author: PyTorch MergeBot Date: Thu Oct 27 04:23:43 2022 +0000 [vision hash update] update the pinned vision hash (#87831) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87831 Approved by: https://github.com/pytorchbot commit 21bef8e944c90cdf98c2ead4369410db252944e1 Author: Horace He Date: Wed Oct 26 16:37:10 2022 +0000 fix sym_storage conversion and some cleanup (#87718) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87718 Approved by: https://github.com/ezyang commit 58650835bb91d927623e6bff5cc4844fbcad6368 Author: Jerry Zhang Date: Wed Oct 26 14:43:42 2022 -0700 [fx][subgraph_rewriter] Change match_filter to be a List in replace_pattern_with_filters (#87257) Summary: att, this is experimental api so not marking it as bc-breaking. The match will be accepted only if all the filters in the list passes. Changing the filter arg to be list also allows us to pass in empty list that means no filter, which makes user code cleaner. Test Plan: python test/test_fx.py -k test_replace_pattern_with_filters Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87257 Approved by: https://github.com/SherlockNoMad commit 195a13f48ce10bb80aeb792993cd33747e1de755 Author: Jerry Zhang Date: Wed Oct 26 14:43:42 2022 -0700 [quant][be] Remove unused function `quantize_node` (#87153) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87153 Approved by: https://github.com/andrewor14 commit 30ea8f5c207d3f136cece6c5ca503c18f47b5007 Author: Nikita Shulga Date: Thu Oct 27 01:24:01 2022 +0000 Limit ROCM option to Linux only (#87833) As it's not available on neither Windows nor MacOS cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/87833 Approved by: https://github.com/kit1980 commit 0e3b5ea026cc45d3008ac2b1d02a27f65c4d957d Author: Jerry Zhang Date: Wed Oct 26 14:43:41 2022 -0700 [quant][fx] Add _convert_to_reference_decomposed (#87094) Summary: _convert_to_reference_decomposed is a private convert function in fx graph mode quantization flow to convert a calibrated/trained model to a reference quantized model with decomposed quantized tensor representations. 
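(Illustration for #87094 above: a hedged end-to-end sketch of the FX flow this private function slots into -- prepare, calibrate, then convert to the decomposed reference form. The exact import path of the private helper is an assumption.)

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import (
    prepare_fx,
    _convert_to_reference_decomposed_fx,  # private helper; path assumed
)

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

example_inputs = (torch.randn(1, 4),)
model = M().eval()
prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
prepared(*example_inputs)  # calibration pass
reference = _convert_to_reference_decomposed_fx(prepared)
```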
Test Plan: python test/test_quantization.py TestQuantizeFx.test__convert_to_reference_decomposed_fx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87094 Approved by: https://github.com/andrewor14 commit a12d3d6b49cb4c9fdc325b0952ac748f55ae72a2 Author: Digant Desai Date: Thu Oct 27 00:59:40 2022 +0000 [profiler] Standard performance event names for the profiler (#87538) Summary: The goal is to create a hardware/backend independent event abstraction on which a standard set of tooling can be developed. Test Plan: CI Reviewed By: kimishpatel Differential Revision: D40238034 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87538 Approved by: https://github.com/salilsdesai, https://github.com/kirklandsign commit 2cc624cd4318414905d2475432aee13db9031cc6 Author: Charlie Yan Date: Wed Oct 26 19:37:52 2022 +0000 Enable mypy check for distributed.py, and fix type errors (#87543) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87543 Approved by: https://github.com/fduwjj commit 5dbd80a605c5c0f12a57de464e9f89f55e6f8f97 Author: Valentin Andrei Date: Thu Oct 27 00:18:16 2022 +0000 [pytorch] Layer norm backward speed gain with warp shuffles (#87814) Summary: Improved native layer norm backward performance. Rewrote `GammaBetaBackwardCUDAKernel` to use shared memory only for the reduction step, but not for loading `mean` and `rstd`. The previous implementation used only `threadIdx.x = 0` to load `mean` and `rstd` into shared memory, and then all threads would access the values in order to do loop unrolling. This approached increased register usage and decreased occupancy, without much benefit from using shared memory (this is because the values were already cached in L1). The new implementation is simpler and register usage is smaller, thus occupancy is better. Added another implementation called `GammaBetaBackwardCUDAKernel_32x32` which is only for shapes dividing exactly to a (32 x 32) block. This permits using warp shuffles for speeding up loading `mean` and `rstd` as well as for the final reduction stage. The effective bandwidth of this implementation is equal to STREAM Triad. Observed that we can get additional benefit if we lower the threshold for calling `GammaBetaBackwardSimpleCUDAKernel` (simple col-wise reduction implementation) from `512` to `128`. Test Plan: Wrote a simple CUDA app that calls the previous implementation of `GammaBetaBackwardCUDAKernel` and the current one, using FP32 values and compares the results. The epsilon value we used for FP comparison is 0.00001 for the weight and 0.0001 for the bias. Ran the benchmark for various sizes A100 GPU and got the results below. Almost all sizes show good speedup. ``` Size (32, 32); Mismatches: dg = 0 db = 0 out of 32. reference = 0.0073 (ms); optimized = 0.0071 (ms); bw_opt = 1.14 GB/s; speedup = 2.68% Size (64, 32); Mismatches: dg = 0 db = 0 out of 32. reference = 0.0107 (ms); optimized = 0.0107 (ms); bw_opt = 1.50 GB/s; speedup = 0.22% Size (256, 128); Mismatches: dg = 0 db = 0 out of 128. reference = 0.0323 (ms); optimized = 0.0075 (ms); bw_opt = 32.89 GB/s; speedup = 330.16% Size (512, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0103 (ms); optimized = 0.0089 (ms); bw_opt = 440.54 GB/s; speedup = 15.82% Size (1024, 2048); Mismatches: dg = 0 db = 0 out of 2048. reference = 0.0197 (ms); optimized = 0.0136 (ms); bw_opt = 1151.44 GB/s; speedup = 44.91% Size (2048, 2048); Mismatches: dg = 0 db = 0 out of 2048. 
reference = 0.0416 (ms); optimized = 0.0283 (ms); bw_opt = 1105.31 GB/s; speedup = 47.01% Size (4096, 16384); Mismatches: dg = 0 db = 0 out of 16384. reference = 0.4420 (ms); optimized = 0.3915 (ms); bw_opt = 1277.58 GB/s; speedup = 12.90% Size (70000, 64); Mismatches: dg = 0 db = 0 out of 64. reference = 0.5908 (ms); optimized = 0.6850 (ms); bw_opt = 49.49 GB/s; speedup = -13.75% Size (131072, 512); Mismatches: dg = 0 db = 0 out of 512. reference = 1.1961 (ms); optimized = 0.9234 (ms); bw_opt = 542.54 GB/s; speedup = 29.53% Size (1000, 520); Mismatches: dg = 0 db = 0 out of 520. reference = 0.0132 (ms); optimized = 0.0113 (ms); bw_opt = 343.83 GB/s; speedup = 16.88% Size (4005, 4005); Mismatches: dg = 0 db = 0 out of 4005. reference = 0.1441 (ms); optimized = 0.1054 (ms); bw_opt = 1134.36 GB/s; speedup = 36.71% Size (10000, 1000); Mismatches: dg = 0 db = 0 out of 1000. reference = 0.1293 (ms); optimized = 0.1248 (ms); bw_opt = 597.71 GB/s; speedup = 3.63% Size (1024, 10000); Mismatches: dg = 0 db = 0 out of 10000. reference = 0.0738 (ms); optimized = 0.0735 (ms); bw_opt = 1039.40 GB/s; speedup = 0.45% Size (8192, 4096); Mismatches: dg = 0 db = 0 out of 4096. reference = 0.2673 (ms); optimized = 0.2223 (ms); bw_opt = 1125.01 GB/s; speedup = 20.25% Size (10000, 10000); Mismatches: dg = 0 db = 0 out of 10000. reference = 0.7331 (ms); optimized = 0.8940 (ms); bw_opt = 833.54 GB/s; speedup = -18.00% Size (3072, 10000); Mismatches: dg = 0 db = 0 out of 10000. reference = 0.2087 (ms); optimized = 0.2364 (ms); bw_opt = 968.64 GB/s; speedup = -11.71% Size (6144, 10000); Mismatches: dg = 0 db = 0 out of 10000. reference = 0.4197 (ms); optimized = 0.5118 (ms); bw_opt = 894.63 GB/s; speedup = -18.00% Size (1024, 20000); Mismatches: dg = 0 db = 0 out of 20000. reference = 0.1480 (ms); optimized = 0.1297 (ms); bw_opt = 1177.68 GB/s; speedup = 14.12% Size (1024, 20000); Mismatches: dg = 0 db = 0 out of 20000. reference = 0.1483 (ms); optimized = 0.1278 (ms); bw_opt = 1195.26 GB/s; speedup = 16.04% Size (512, 1536); Mismatches: dg = 0 db = 0 out of 1536. reference = 0.0104 (ms); optimized = 0.0091 (ms); bw_opt = 646.72 GB/s; speedup = 14.44% Size (512, 6144); Mismatches: dg = 0 db = 0 out of 6144. reference = 0.0219 (ms); optimized = 0.0156 (ms); bw_opt = 1506.30 GB/s; speedup = 40.52% Size (512, 10240); Mismatches: dg = 0 db = 0 out of 10240. reference = 0.0424 (ms); optimized = 0.0370 (ms); bw_opt = 1057.84 GB/s; speedup = 14.63% Size (1000, 1000); Mismatches: dg = 0 db = 0 out of 1000. reference = 0.0139 (ms); optimized = 0.0119 (ms); bw_opt = 627.51 GB/s; speedup = 16.83% Size (2000, 2000); Mismatches: dg = 0 db = 0 out of 2000. reference = 0.0421 (ms); optimized = 0.0412 (ms); bw_opt = 724.10 GB/s; speedup = 2.20% Size (10240, 10240); Mismatches: dg = 0 db = 0 out of 10240. reference = 0.7210 (ms); optimized = 0.6098 (ms); bw_opt = 1281.40 GB/s; speedup = 18.24% Size (384, 128); Mismatches: dg = 0 db = 0 out of 128. reference = 0.0449 (ms); optimized = 0.0089 (ms); bw_opt = 41.50 GB/s; speedup = 403.48% Size (2048, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0208 (ms); optimized = 0.0169 (ms); bw_opt = 925.70 GB/s; speedup = 23.13% Size (267, 513); Mismatches: dg = 0 db = 0 out of 513. reference = 0.0342 (ms); optimized = 0.0090 (ms); bw_opt = 114.18 GB/s; speedup = 280.64% Size (67, 123479); Mismatches: dg = 0 db = 0 out of 123479. reference = 0.0562 (ms); optimized = 0.0552 (ms); bw_opt = 1133.46 GB/s; speedup = 1.81% Size (1024, 123479); Mismatches: dg = 0 db = 0 out of 123479. 
reference = 0.8573 (ms); optimized = 0.9245 (ms); bw_opt = 1020.02 GB/s; speedup = -7.27% Size (2048, 66679); Mismatches: dg = 0 db = 0 out of 66679. reference = 0.8778 (ms); optimized = 0.8590 (ms); bw_opt = 1185.05 GB/s; speedup = 2.19% Size (200, 256); Mismatches: dg = 0 db = 0 out of 256. reference = 0.0215 (ms); optimized = 0.0066 (ms); bw_opt = 58.49 GB/s; speedup = 226.81% Size (1000, 256); Mismatches: dg = 0 db = 0 out of 256. reference = 0.0109 (ms); optimized = 0.0092 (ms); bw_opt = 208.27 GB/s; speedup = 18.65% Size (6000, 256); Mismatches: dg = 0 db = 0 out of 256. reference = 0.0394 (ms); optimized = 0.0301 (ms); bw_opt = 381.90 GB/s; speedup = 30.98% Size (6272, 256); Mismatches: dg = 0 db = 0 out of 256. reference = 0.0403 (ms); optimized = 0.0300 (ms); bw_opt = 400.48 GB/s; speedup = 34.34% Size (200, 512); Mismatches: dg = 0 db = 0 out of 512. reference = 0.0218 (ms); optimized = 0.0066 (ms); bw_opt = 116.33 GB/s; speedup = 229.96% Size (1000, 512); Mismatches: dg = 0 db = 0 out of 512. reference = 0.0110 (ms); optimized = 0.0094 (ms); bw_opt = 407.29 GB/s; speedup = 17.26% Size (6000, 512); Mismatches: dg = 0 db = 0 out of 512. reference = 0.0535 (ms); optimized = 0.0594 (ms); bw_opt = 386.05 GB/s; speedup = -9.95% Size (6272, 512); Mismatches: dg = 0 db = 0 out of 512. reference = 0.0573 (ms); optimized = 0.0387 (ms); bw_opt = 619.62 GB/s; speedup = 48.06% Size (200, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0221 (ms); optimized = 0.0069 (ms); bw_opt = 222.78 GB/s; speedup = 220.76% Size (1000, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0113 (ms); optimized = 0.0097 (ms); bw_opt = 787.79 GB/s; speedup = 16.46% Size (6000, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0723 (ms); optimized = 0.0715 (ms); bw_opt = 640.95 GB/s; speedup = 1.10% Size (6272, 1024); Mismatches: dg = 0 db = 0 out of 1024. reference = 0.0751 (ms); optimized = 0.0572 (ms); bw_opt = 837.57 GB/s; speedup = 31.30% Size (200, 1536); Mismatches: dg = 0 db = 0 out of 1536. reference = 0.0232 (ms); optimized = 0.0071 (ms); bw_opt = 323.97 GB/s; speedup = 226.51% Size (1000, 1536); Mismatches: dg = 0 db = 0 out of 1536. reference = 0.0125 (ms); optimized = 0.0114 (ms); bw_opt = 1005.84 GB/s; speedup = 9.62% Size (6000, 1536); Mismatches: dg = 0 db = 0 out of 1536. reference = 0.0807 (ms); optimized = 0.0830 (ms); bw_opt = 828.02 GB/s; speedup = -2.76% Size (6272, 1536); Mismatches: dg = 0 db = 0 out of 1536. reference = 0.0836 (ms); optimized = 0.0695 (ms); bw_opt = 1033.62 GB/s; speedup = 20.27% Size (200, 2048); Mismatches: dg = 0 db = 0 out of 2048. reference = 0.0224 (ms); optimized = 0.0075 (ms); bw_opt = 408.58 GB/s; speedup = 198.10% Size (1000, 2048); Mismatches: dg = 0 db = 0 out of 2048. reference = 0.0165 (ms); optimized = 0.0135 (ms); bw_opt = 1132.42 GB/s; speedup = 22.26% Size (6000, 2048); Mismatches: dg = 0 db = 0 out of 2048. reference = 0.0993 (ms); optimized = 0.0989 (ms); bw_opt = 926.35 GB/s; speedup = 0.41% Size (6272, 2048); Mismatches: dg = 0 db = 0 out of 2048. reference = 0.1033 (ms); optimized = 0.0826 (ms); bw_opt = 1159.55 GB/s; speedup = 25.09% Size (200, 3072); Mismatches: dg = 0 db = 0 out of 3072. reference = 0.0230 (ms); optimized = 0.0076 (ms); bw_opt = 605.09 GB/s; speedup = 202.51% Size (1000, 3072); Mismatches: dg = 0 db = 0 out of 3072. reference = 0.0207 (ms); optimized = 0.0213 (ms); bw_opt = 1076.45 GB/s; speedup = -2.69% Size (6000, 3072); Mismatches: dg = 0 db = 0 out of 3072. 
reference = 0.1198 (ms); optimized = 0.1274 (ms); bw_opt = 1078.58 GB/s; speedup = -5.95%
Size (6272, 3072); Mismatches: dg = 0 db = 0 out of 3072.
reference = 0.1293 (ms); optimized = 0.1189 (ms); bw_opt = 1207.95 GB/s; speedup = 8.76%
Average speedup = 52.88%
```
For additional numerical validation, the following script was used:
```
def run_model_on_device(fs, X, gO, device_string, numeric_type):
    ln = torch.nn.LayerNorm((fs,), device=device_string, dtype=numeric_type)
    ln.reset_parameters()
    X.grad = None
    ln.zero_grad(set_to_none=True)
    out = ln(X)
    out.backward(gO)
    return (ln.weight.grad, ln.bias.grad)

def run_correctness_test(eps_weight, eps_bias):
    dtype = torch.float
    for fs in (512, 1024, 2048, 4096, 8192, 10000, 500, 1000, 2001, 4005, 8117):
        for bs in (512, 1024, 2048, 4096, 525, 1033, 2064, 3000):
            mean_adjustment = torch.randn(fs, device="cpu", dtype=torch.float)
            X = mean_adjustment * torch.randn(
                bs, fs, device="cpu", dtype=torch.float, requires_grad=True
            )
            X = X.detach().requires_grad_()
            gO = torch.rand_like(X)
            X_gpu = X.to("cuda")
            X_gpu = X_gpu.detach().requires_grad_()
            gO_gpu = gO.to("cuda")
            gO_gpu = gO_gpu.detach().requires_grad_()
            grad_cpu_ref = run_model_on_device(fs, X, gO, "cpu", dtype)
            grad_gpu = run_model_on_device(fs, X_gpu, gO_gpu, "cuda", dtype)
            weight_grad_gpu_target = grad_gpu[0].detach().to("cpu")
            bias_grad_gpu_target = grad_gpu[1].detach().to("cpu")
            weight_delta = torch.abs(grad_cpu_ref[0] - weight_grad_gpu_target)
            weight_mismatches = (weight_delta >= eps_weight).nonzero()
            weight_mismatch_pct = len(weight_mismatches) / len(weight_delta) * 100
            bias_delta = torch.abs(grad_cpu_ref[1] - bias_grad_gpu_target)
            bias_mismatches = (bias_delta >= eps_bias).nonzero()
            bias_mismatch_pct = len(bias_mismatches) / len(bias_delta) * 100
            print(
                "Size ({} x {}) mismatch percentage: weight {:3.2f} bias {:3.2f}".format(
                    fs, bs, weight_mismatch_pct, bias_mismatch_pct
                )
            )
```
The `NVFuserTest.FusionMagicSchedulerLayerNormBackward_CUDA` test also does additional numerical validation and it passes.
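(For reference on #87814 above: the reduction these kernels compute corresponds to the eager-mode formulation below -- a sketch of the math only, not of the CUDA kernel.)

```python
import torch

def gamma_beta_backward(dY, X, mean, rstd):
    # dgamma[j] = sum_i dY[i, j] * (X[i, j] - mean[i]) * rstd[i]
    # dbeta[j]  = sum_i dY[i, j]
    x_hat = (X - mean[:, None]) * rstd[:, None]
    return (dY * x_hat).sum(dim=0), dY.sum(dim=0)
```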
Differential Revision: D40730981 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87814 Approved by: https://github.com/weiwangmeta commit 449778a939f2adc8867c5035b08be4e2d88339d8 Author: Kazuaki Ishizaki Date: Thu Oct 27 00:01:10 2022 +0000 Fix typos under .github directory (#87828) This PR fixes typos in `.md` files under .github directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87828 Approved by: https://github.com/clee2000 commit 2c66889f90aa9ef2dbe44d0a39878591002e990b Author: wchen61 <183351030@qq.com> Date: Wed Oct 26 23:44:13 2022 +0000 Synchronize before change cuda stream (#82050) (#82056) Summary: Fixes https://github.com/pytorch/pytorch/issues/82050 Need synchronize before change cuda stream Pull Request resolved: https://github.com/pytorch/pytorch/pull/82056 Approved by: https://github.com/ngimel commit 59b9d29260ac59c608d534175dba65e372201955 Author: Nikita Karetnikov Date: Wed Oct 26 07:36:02 2022 +0200 [primTorch] Check `error_regex` in `test_python_ref_errors` (#86987) cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/86987 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 5ee5f5ac1b6c300cb604d33e1501a78107b9bd58 Author: Nikita Shulga Date: Wed Oct 26 23:16:29 2022 +0000 [BE] Don't build CUDA-10.2 docker images (#87819) As CUDA-10.2 should not longer be used in CI/CD Test Plan: ` grep cuda10.2 .github -R|grep -v mock` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87819 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi commit 3208c2f6bd1218398a18a3df91575cdda6e65e24 Author: Driss Guessous Date: Wed Oct 26 22:42:39 2022 +0000 Add logging for nested tensor usage tracking (#87632) Add logging message so that we can track nested tensor adoption. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87632 Approved by: https://github.com/cpuhrsch commit 536474e82394e617335e97806032c39d24387730 Author: Jiewen Tan Date: Wed Oct 26 22:41:19 2022 +0000 [LTC] Remove tensor.storage_ (#87645) Summary: Since LTC now supports functionalization, we don't need to fake a storage to support is_alias_of anymore. Let's remove it. 
Test Plan: ./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf Pull Request resolved: https://github.com/pytorch/pytorch/pull/87645 Approved by: https://github.com/JackCaoG, https://github.com/bdhirsh commit 5edbc926834327d471da505aca902180d30ff991 Author: Catherine Lee Date: Wed Oct 26 22:10:10 2022 +0000 print stderr for ghstack rebase (#87795) current output tends to be empty on failure, which makes it hard to debug Pull Request resolved: https://github.com/pytorch/pytorch/pull/87795 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi commit 91c95ff7c57260647d12d4e4e4c8de82bce12fa2 Author: Will Constable Date: Wed Oct 26 04:34:41 2022 +0000 Enable graph_split_inductor test as it runs now (#87762) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87762 Approved by: https://github.com/davidberard98 commit 53c640a5283c82cdd37cd29e7975627d02d094ec Author: Nikita Shulga Date: Wed Oct 26 21:51:13 2022 +0000 [CI] Delete `nnpack` installation from conda (#87813) Not sure why it was there to begin with and I really hope none of our CI depend on the package that was last updated 5 years ago, see https://anaconda.org/killeent/nnpack Pull Request resolved: https://github.com/pytorch/pytorch/pull/87813 Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/ZainRizvi commit 1522946882fee9e4d8c20e143a58d7074cc2efd4 Author: Cameron Voisey Date: Wed Oct 26 21:34:13 2022 +0000 Simplify installation instruction in contributing file (#87460) Simplification of one of the installation instructions in CONTRIBUTING.md that I found tricky to parse at first. Also adds a link to the "Make no-op build fast" section to make it easier to navigate to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87460 Approved by: https://github.com/ngimel commit adb76ef510eb645af71292f23f5d2d560d92e936 Author: soulitzer Date: Wed Oct 26 13:34:34 2022 -0400 Expose API for backward execution order (#87507) In this PR: - graph_task stores graph roots on construction so that we can later traverse through the graph - before the nodes are returned, they needed to be converted from raw_ptr to shared_ptr, and this should be OK because the graph is guaranteed to be alive Pull Request resolved: https://github.com/pytorch/pytorch/pull/87507 Approved by: https://github.com/albanD commit 926827b89cc3eda268df2a54be6d96a150eb506c Author: PyTorch MergeBot Date: Wed Oct 26 21:01:09 2022 +0000 Revert "Disable linux-bionic-py3_7-clang8-xla-test (#87737)" This reverts commit 21f7e7d040c646b4ce7f4a4e973da97660462bdc. Reverted https://github.com/pytorch/pytorch/pull/87737 on behalf of https://github.com/kit1980 due to Re-enable XLA tests after https://github.com/pytorch/pytorch/pull/87818 commit 71933d381b7c021dfa1818e05539a1910fe95296 Author: Zafar Date: Wed Oct 26 20:55:10 2022 +0000 [ao] Fixing tests for block pruning shapes (#87326) The current unittests were only checking the tensors whose shapes were already multiples of the block size. That caused some hidden bugs to creep in. Specifically, for the shapes that would require padding for the mask/data, the sparsifier would try to apply shape-mismatching tensors onto each other. This caused segfaults as well as silent failures. This makes minor adjustments to the code to make sure the masks and data shapes are aligned, as well as fixing the tests to catch this. 
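(Illustration for #87326 above: a hedged sketch of aligning a mask to the block size by padding, so block-wise mask/data operations never see mismatched shapes. The helper name and block size are hypothetical.)

```python
import math
import torch
import torch.nn.functional as F

def pad_mask_to_block(mask: torch.Tensor, block: int = 4) -> torch.Tensor:
    rows, cols = mask.shape
    pad_rows = math.ceil(rows / block) * block - rows
    pad_cols = math.ceil(cols / block) * block - cols
    # F.pad on a 2-D tensor takes (left, right, top, bottom)
    return F.pad(mask, (0, pad_cols, 0, pad_rows), value=0)

# e.g. a (5, 7) mask becomes (8, 8) and now tiles cleanly into 4x4 blocks
padded = pad_mask_to_block(torch.ones(5, 7), block=4)
```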
Test Plan: ```python python test/test_ao_sparsity.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87326 Approved by: https://github.com/jcaip commit 1168f427909df47c4c2afa3e9ecc3f4eef5c7af8 Author: Sergii Dymchenko Date: Wed Oct 26 20:54:25 2022 +0000 Update XLA hash (#87818) This is a re-creation of https://github.com/pytorch/pytorch/pull/87808 so we don't have to wait. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87818 Approved by: https://github.com/clee2000 commit bbcd4b2f2f2cabbef7c2bcec494795d32f830cdb Author: Bin Bao Date: Wed Oct 26 13:59:07 2022 +0000 Clean up CPU test in test_torchinductor.py for fbcode (#87783) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87783 Approved by: https://github.com/bertmaher commit 88eff1072290177221e7a09d792f7f135b4c83ca Author: Justin Chu Date: Wed Oct 26 20:42:06 2022 +0000 [ONNX] Deprecate operators.py (#87798) Deprecate `torch.onnx.operators` because it's only for backwards compatibility Pull Request resolved: https://github.com/pytorch/pytorch/pull/87798 Approved by: https://github.com/BowenBao commit b21fe312c0f4cbc17e957010f22b2a8eaa0825e9 Author: Sherlock Huang Date: Wed Oct 26 17:38:05 2022 +0000 Fix meta for index_add and index_put (#87775) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87775 Approved by: https://github.com/ezyang, https://github.com/ngimel commit 8016fd9eb10f5a933debb323149f9c0e5634cc9b Author: Huy Do Date: Wed Oct 26 20:08:29 2022 +0000 Set check-latest to false when setup python and pip cache in CI (#87621) I missed the fine print in https://github.com/actions/setup-python/blob/main/README.md#caching-packages-dependencies when setting up the cache using setup-python GHA > Restored cache will not be used if the requirements.txt file is not updated for a long time and a newer version of the dependency is available which can lead to an increase in total build time. The latter part is important because it implies that even with the cache, pip will still try to check if a newer version exists and that part can be flaky, i.e. https://github.com/pytorch/pytorch/actions/runs/3313764038/jobs/5472180293 This undesired behavior can be turned off by setting the advance option `check-latest` to false https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#check-latest-version. Per my understanding, this should tell pip install in these workflows to use the local cached copy of the package avoiding the need to query pypi every single time. `check-latest` was added quite recently https://github.com/actions/setup-python/pull/406, so `actionlint-1.6.15` fails to recognize it. Thus, this PR also upgrades `actionlint` to the latest 1.6.21 to pass the linter check. Here is an example error from 1.6.15 from https://github.com/pytorch/pytorch/actions/runs/3315388073/jobs/5475918454: ``` >>> Lint for .github/workflows/lint.yml: Error (ACTIONLINT) [action] input "check-latest" is not defined in action "actions/setup-python@v4". 
available inputs are "architecture", "cache", "cache-dependency-path", "python-version", "python-version-file", "token" 25 | with: 26 | python-version: 3.8 27 | architecture: x64 >>> 28 | check-latest: false 29 | cache: pip 30 | cache-dependency-path: | 31 | **/.github/requirements-gha-cache.txt ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87621 Approved by: https://github.com/ZainRizvi commit 5f4329134e30b8ce86db05388ebe55f3ab3a7099 Author: PyTorch MergeBot Date: Wed Oct 26 19:40:51 2022 +0000 Revert "Set check-latest to false when setup python and pip cache in CI (#87621)" This reverts commit 4080b1db284fd531654bcb2984a7fe0ff3b310cd. Reverted https://github.com/pytorch/pytorch/pull/87621 on behalf of https://github.com/huydhn due to Somehow setup-python treats Python 3.10 as Python 3.1 in pr-label.yml. I missed this signal because this is only run at push commit 38dd4cbdf1dc982492a0cc94a54eb2f71c31d8fe Author: jpvillam Date: Wed Oct 26 19:39:21 2022 +0000 ROCm enable sparse_sampled_addmm (#86401) Enables: test_comprehensive_sparse_sampled_addmm_cuda_complex128 test_comprehensive_sparse_sampled_addmm_cuda_complex64 test_comprehensive_sparse_sampled_addmm_cuda_float32 test_comprehensive_sparse_sampled_addmm_cuda_float64 test_dispatch_meta_sparse_sampled_addmm_cuda_complex128 test_dispatch_meta_sparse_sampled_addmm_cuda_complex64 test_dispatch_meta_sparse_sampled_addmm_cuda_float32 test_dispatch_meta_sparse_sampled_addmm_cuda_float64 test_meta_sparse_sampled_addmm_cuda_complex128 test_meta_sparse_sampled_addmm_cuda_complex64 test_meta_sparse_sampled_addmm_cuda_float32 test_meta_sparse_sampled_addmm_cuda_float64 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86401 Approved by: https://github.com/ngimel commit 123b103bf101682e670c96ab505b6eb8475e8657 Author: Will Constable Date: Wed Oct 26 04:34:41 2022 +0000 Add dynamo_optimize_ddp arg to dist bench (#87768) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87768 Approved by: https://github.com/davidberard98 commit aa66c6e01e16fe7012f0d27246b8159eb85e89aa Author: Will Constable Date: Wed Oct 26 04:34:38 2022 +0000 Fix missing weight init and clean up helper (#87760) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87760 Approved by: https://github.com/davidberard98 commit 58dc95b321631f40d2f18915f7cb6a68bdbd8607 Author: Kazuaki Ishizaki Date: Wed Oct 26 19:29:05 2022 +0000 Fix typos under aten directory (#87754) This PR fixes typos in `.md` files under aten directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87754 Approved by: https://github.com/kit1980 commit 4080b1db284fd531654bcb2984a7fe0ff3b310cd Author: Huy Do Date: Wed Oct 26 19:23:55 2022 +0000 Set check-latest to false when setup python and pip cache in CI (#87621) I missed the fine print in https://github.com/actions/setup-python/blob/main/README.md#caching-packages-dependencies when setting up the cache using setup-python GHA > Restored cache will not be used if the requirements.txt file is not updated for a long time and a newer version of the dependency is available which can lead to an increase in total build time. The latter part is important because it implies that even with the cache, pip will still try to check if a newer version exists and that part can be flaky, i.e. 
https://github.com/pytorch/pytorch/actions/runs/3313764038/jobs/5472180293 This undesired behavior can be turned off by setting the advance option `check-latest` to false https://github.com/actions/setup-python/blob/main/docs/advanced-usage.md#check-latest-version. Per my understanding, this should tell pip install in these workflows to use the local cached copy of the package avoiding the need to query pypi every single time. `check-latest` was added quite recently https://github.com/actions/setup-python/pull/406, so `actionlint-1.6.15` fails to recognize it. Thus, this PR also upgrades `actionlint` to the latest 1.6.21 to pass the linter check. Here is an example error from 1.6.15 from https://github.com/pytorch/pytorch/actions/runs/3315388073/jobs/5475918454: ``` >>> Lint for .github/workflows/lint.yml: Error (ACTIONLINT) [action] input "check-latest" is not defined in action "actions/setup-python@v4". available inputs are "architecture", "cache", "cache-dependency-path", "python-version", "python-version-file", "token" 25 | with: 26 | python-version: 3.8 27 | architecture: x64 >>> 28 | check-latest: false 29 | cache: pip 30 | cache-dependency-path: | 31 | **/.github/requirements-gha-cache.txt ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87621 Approved by: https://github.com/ZainRizvi commit 2c1efe7472079fbeeb1ee9415db83851d8276c93 Author: Bin Bao Date: Wed Oct 26 16:13:20 2022 +0000 Enable some PyTorch core tests with inductor (#87490) Summary: 1) Graph break on torch.random.set_rng_state since it blocks running inductor core tests; 2) Add several inductor-specific skips; 3) Enable several core tests for inductor CI; cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87490 Approved by: https://github.com/eellison commit f7a04f310b76438448df758a3c9c2bf91b704a11 Author: HDCharles Date: Tue Oct 25 22:15:46 2022 -0700 [ao][ns] Replacing List[QConfigMapping] in PNP (#86922) Summary: Added QConfigMultiMapping which is essentially a List[QConfigMapping] with set methods and dedicated handling to avoid unwanted matches and improve UX. note: the from __future__ import annotations line caused weird errors when the QConfigMultiMapping class was put in _numeric_suite_fx.py so it was moved. Test Plan: python test/test_quantization.py TestFxNumericSuiteNShadows Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86922 Approved by: https://github.com/vkuzo commit 9639cb83ebd147d1a8ef7fa17855be6b69b040e6 Author: PyTorch MergeBot Date: Wed Oct 26 18:51:36 2022 +0000 Revert "[pytorch] Layer norm backward speed gain with warp shuffles (#87445)" This reverts commit b6f28334bc3276a56d79dea6cb7ed99411556348. Reverted https://github.com/pytorch/pytorch/pull/87445 on behalf of https://github.com/weiwangmeta due to breaking internal builds due to MS compiler commit 585d71513de98f02659835b08785de845bc6d348 Author: Ethan Pronovost Date: Wed Oct 26 18:50:48 2022 +0000 Add type annotations to distribution.py (#87577) As title. 
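Since #87577 only adds annotations, behavior is unchanged; the sketch below is a hypothetical call site of the now-annotated `Distribution` API, included only to show where the hints help static checkers.
```python
# A usage sketch (not from the diff): Distribution methods exercised on a small
# Normal instance; with annotations, rsample/log_prob are known to return Tensors.
import torch
from torch.distributions import Normal

d = Normal(loc=torch.zeros(3), scale=torch.ones(3))
x = d.rsample()        # Tensor, shape (3,)
logp = d.log_prob(x)   # Tensor, shape (3,)
print(x.shape, logp.shape)
```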
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87577 Approved by: https://github.com/kit1980 commit 16e35bd179f101b8e0d266550e039bbbad513892 Author: arnaudstiegler Date: Wed Oct 26 17:45:46 2022 +0000 Adding expm1 to MPS (#87147) Fixes #86744 - Implementing the new `expm1_out_mps` function in `aten/src/ATen/native/mps/operations/UnaryOps.mm` - Adding it to `aten/src/ATen/native/native_functions.yaml` - Adding it to existing `test.test_mps.TestNLLLoss.test_unary_ops` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87147 Approved by: https://github.com/kulinseth commit 493ff6ac5bf66ead6fd53af5881ad7ae1795c5e8 Author: Sergii Dymchenko Date: Wed Oct 26 17:43:35 2022 +0000 Install py for pytest-sugar (#87803) linux-focal-py3.7-clang10-onnx / test is failng, the issue is https://github.com/Teemu/pytest-sugar/issues/241 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87803 Approved by: https://github.com/seemethere, https://github.com/huydhn commit e2e428b03cdc9a0d206c72af31bca6e3c98d48b3 Author: albanD Date: Wed Oct 26 10:26:44 2022 -0400 Remove custom Ceil in favor of sympy.ceiling (#87294) [Alban]: the other changes that used to be in this PR (neg and fix for true div) are moved to other places where they already exist. Namely neg is already in master and true div will be in the next PR on the stack where all other functions are fixed at the same time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87294 Approved by: https://github.com/ezyang commit 777e6a2c5100f3274cff1bcf7e47ccbe1a651927 Author: albanD Date: Wed Oct 26 10:26:44 2022 -0400 Many symintifications (#87604) Adds expand_inplace conv conv_double_backward convolution adaptive_avg_pool2d_symint _embedding_bag_backward_symint cudnn_grid_sampler cuda 32 bit indexing nll_loss / nll_loss_2d tensor split pooling same mode cudnn_is_acceptable storage nbytes Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604 Approved by: https://github.com/ezyang commit ae4fbac819992a76af87c8d800fecf3ace707b54 Author: Ivan Yashchuk Date: Wed Oct 26 17:00:02 2022 +0000 Enable nvprims.transpose fusions for nvFuser (#86967) This PR allows transposes to be fused with other operations. If a fusion group is formed only from operations that just manipulate metadata in PyTorch (transpose, view, etc.) then this group is not sent to nvFuser. On top of that if we have converted to `nvprims` but then decided to not form a fusion group we modify the graph use `prim.impl_aten` attribute instead of calling `prim(*args, **kwargs)` that has a higher overhead. cc @kevinstephano @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86967 Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad commit ac0c13f665aa14c99837779580da74f01d9b96ab Author: PyTorch MergeBot Date: Wed Oct 26 16:43:13 2022 +0000 Revert "[ROCm] Use -rpath-link to fix libtinfo conflict (#83552)" This reverts commit a10446c4d826ae5505fa129ea9800d3924b25364. 
Reverted https://github.com/pytorch/pytorch/pull/83552 on behalf of https://github.com/kit1980 due to Broke ios/macos builds https://github.com/pytorch/pytorch/actions/runs/3329991911/jobs/5507911292 commit 701b3dd77380bb0f7e696c9511b8ee765488687d Author: Rohan Varma Date: Wed Oct 26 16:20:46 2022 +0000 optim utils all_gather_into_tensor (#87769) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87769 Approved by: https://github.com/awgu commit 642b63e1e788b26b270fc7f20865460b012bde1f Author: Richard Zou Date: Tue Oct 25 06:58:11 2022 -0700 Add test that `import torch` doesn't modify global logging state (#87629) Fixes https://github.com/pytorch/pytorch/issues/87626 Also adds the same test for `import functorch`. Users have complained at us when we do modify the global logging state, which has happened in the past. Test Plan: - tested locally; I added `logging.basicConfig` to `torch/__init__.py` and checked that the test got triggered Pull Request resolved: https://github.com/pytorch/pytorch/pull/87629 Approved by: https://github.com/albanD commit 422f946b8c6e7dba89f9277ac12f847713545856 Author: Chien-Chin Huang Date: Tue Oct 25 22:59:58 2022 +0000 [FSDP][BE] Improve the assert message of sharded load_state_dict (#87486) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87486 Approved by: https://github.com/awgu commit c2ef5c4f7ee894e12e44af6d6aa2c4972cf71025 Author: Pruthvi Madugundu Date: Wed Oct 26 15:34:38 2022 +0000 [ROCm] Move ROCm CI build to python 3.8 version (#86677) Currently it is python 3.7 want to upgrade to python 3.8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86677 Approved by: https://github.com/malfet commit 775fef51b76bd8fb323d75b9dd532446e3598d25 Author: Antoni Viros i Martin Date: Wed Oct 26 14:48:27 2022 +0000 Implement copy_, fill_, and ones_like for Nested Tensors backends (#87728) Summary: This diff implements copy_ in order to allow pinned memory transfers for nested tensors, as well as fill_ and ones_like, to test whether nested tensors can be created with other factory functions. Test Plan: Pass all CI and sandcastle jobs. Reviewed By: mikekgfb Differential Revision: D40689594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87728 Approved by: https://github.com/cpuhrsch commit a10446c4d826ae5505fa129ea9800d3924b25364 Author: Jithun Nair Date: Wed Oct 26 14:40:29 2022 +0000 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet commit ed7a8ab4361e47b9d64d9680561f350565ca3a7b Author: Mike Iovine Date: Wed Oct 26 14:34:29 2022 +0000 [Static Runtime] Make canEnableStaticRuntime examine sub-blocks (#87396) Summary: Someone was running into problems where 1) Static Runtime enablement would fail 2) We would try to fall back to the JIT interpreter *after trying to create `StaticModule`* 3) The fallback fails because Static Runtime mangled the graph. We don't want to prevent Static Runtime from mutating its input due to memory concerns. The intent of `canEnableStaticRuntime` is to catch issues in the module before Static Runtime messes with it. With this diff, `StaticModule` instantiation can be avoided by querying `canEnableStaticRuntime` and the issue is fixed. 
Test Plan: New unit test Differential Revision: D40564452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87396 Approved by: https://github.com/tenpercent commit 72f446b9bc394d4b39ce5038a4087990b5ac7852 Author: Ivan Yashchuk Date: Wed Oct 26 14:18:46 2022 +0000 Remove getitem special handling in the partitioner (#87073) This special handling of getitem unnecessary splits fusions at functions with tuple outputs. Example script: ```py import torch from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner from torch._prims.nvfuser_executor import NvfuserPrimOperatorSupport from torch.fx.experimental.proxy_tensor import make_fx def func(x): xx = torch.ops.nvprims.add(x, 1) var, mean = torch.ops.nvprims.var_mean(x, correction=0) var_cos = torch.ops.nvprims.cos(var) mean_sin = torch.ops.nvprims.sin(mean) return torch.ops.nvprims.add(var_cos, mean_sin) a = torch.randn(5, 3, 3, device="cuda") gm = make_fx(func)(a) gm.graph.print_tabular() supported_ops = NvfuserPrimOperatorSupport() partitioner = CapabilityBasedPartitioner( gm, supported_ops, allows_single_node_partition=False ) partitions = partitioner.propose_partitions() print(partitions) partitioned_graph = partitioner.fuse_partitions(partitions) partitioned_graph.graph.print_tabular() ``` Output on master: ```py opcode name target args kwargs ------------- --------- --------------------------- ---------------- ----------------- placeholder x_1 x_1 () {} call_function add nvprims.add.default (x_1, 1) {} call_function var_mean nvprims.var_mean.main (x_1, [0, 1, 2]) {'correction': 0} call_function getitem (var_mean, 0) {} call_function getitem_1 (var_mean, 1) {} call_function cos nvprims.cos.default (getitem,) {} call_function sin nvprims.sin.default (getitem_1,) {} call_function add_1 nvprims.add.default (cos, sin) {} output output output (add_1,) {} [{cos, sin, add_1}, {var_mean, add, getitem, getitem_1}] opcode name target args kwargs ------------- --------- --------------------------- ---------------------- -------- placeholder x_1 x_1 () {} call_module fused_1 fused_1 (x_1,) {} call_function getitem_2 (fused_1, 0) {} call_function getitem_3 (fused_1, 1) {} call_module fused_0 fused_0 (getitem_2, getitem_3) {} output output output (fused_0,) {} ``` Output with this PR: ``` [{var_mean, add_1, cos, sin, add, getitem_1, getitem}] opcode name target args kwargs ----------- ------- -------- ---------- -------- placeholder x_1 x_1 () {} call_module fused_0 fused_0 (x_1,) {} output output output (fused_0,) {} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87073 Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad commit 59aacc40ca2248a18af385cd30831ee785bbb684 Author: Natalia Gimelshein Date: Wed Oct 26 06:33:43 2022 +0000 Couple fixes for argmax/argmin (#87758) Removes a wrong assert, makes min number of warps = 2 (1 for some reason generates invalid code, https://github.com/openai/triton/issues/802). 
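A minimal repro-style sketch for the argmax/argmin lowering above, not the original failing case: the shapes are made up, and it assumes a CUDA device plus the `torch._dynamo` entry point with the inductor backend.
```python
# Hypothetical shapes; requires CUDA for the Triton-generated reduction kernels.
import torch
import torch._dynamo as dynamo

@dynamo.optimize("inductor")
def reduce_idx(x):
    return x.argmax(dim=-1), x.argmin(dim=-1)

if torch.cuda.is_available():
    idx_max, idx_min = reduce_idx(torch.randn(64, 128, device="cuda"))
    print(idx_max.shape, idx_min.shape)  # torch.Size([64]) torch.Size([64])
```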
Hopefully fixes https://github.com/pytorch/torchdynamo/issues/1743, cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @mreso Pull Request resolved: https://github.com/pytorch/pytorch/pull/87758 Approved by: https://github.com/Chillee, https://github.com/soumith commit 0294787bd6286c8672f4659bd7d7ddca3c3a14c3 Author: Charlie Yan Date: Wed Oct 26 00:32:13 2022 +0000 Format distributed.py (#87667) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87667 Approved by: https://github.com/zhaojuanmao commit a24635208bce5030cb1d9fdd2f66d3b6abd9dbef Author: Yanbo Liang Date: Wed Oct 26 05:40:25 2022 +0000 [Inductor] update triton commit pin (#87732) Fixes https://github.com/pytorch/torchdynamo/issues/1746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87732 Approved by: https://github.com/ngimel commit 02797db24f137961305c2a9a670bb3667059ba15 Author: PyTorch MergeBot Date: Wed Oct 26 05:09:39 2022 +0000 [vision hash update] update the pinned vision hash (#87744) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87744 Approved by: https://github.com/pytorchbot commit 0d13ffbbae0ae12e72ed8856ccdd822bf840344c Author: Zachary DeVito Date: Tue Oct 25 19:47:30 2022 +0000 [inductor] Fix finalization issues when using multiprocessing (#87725) If python was launched with 'spawn' it will not use the standard shutdown methods that concurrent.futures requires. So we register a shutdown with the method it does uses. Without this, shutdown hangs since the workers will not exit. cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87725 Approved by: https://github.com/wconstab commit 8a6a126182bfa21af9868d17478a099a2b18c6d3 Author: Chien-Chin Huang Date: Tue Oct 25 22:59:57 2022 +0000 [FSDP][BE] Split state_dict related hooks to a separate file to reduce development conflicts (#87421) This PR does following two things to improve the code quality. 1. Split state_dict related hooks to a separate file to reduce development conflicts. 2. Remove unused APIs. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87421 Approved by: https://github.com/rohan-varma commit 82c8365c16a63bed7a9f6937b49f5b45a83cc32b Author: Nikita Shulga Date: Wed Oct 26 03:31:54 2022 +0000 [BE] Delete `TH_DISALLOW_COPY_AND_ASSIGN` (#87743) Replace it with `AT_DISALLOW_COPY_AND_ASSIGN` and delete the header that contained this define Pull Request resolved: https://github.com/pytorch/pytorch/pull/87743 Approved by: https://github.com/atalman, https://github.com/ngimel commit 354549e0337a18f99b21aae7a48d4af1e54e0f97 Author: Nikita Shulga Date: Wed Oct 26 03:30:45 2022 +0000 [MPS] Use `bandPartWithTensor:numLowerTensor:...` (#87752) To make it uniform with the rest of usage of this op throughout MPS codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/87752 Approved by: https://github.com/kulinseth commit de65f156ed6595f0748ff03d27928ddeee3695af Author: Shen Li Date: Tue Oct 25 22:30:54 2022 +0000 Add distributed composable API contract (#87580) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87580 Approved by: https://github.com/yhcharles commit 9c2555f018c4a4b64500dce3c37b4dcdc48d0c0b Author: Huy Do Date: Wed Oct 26 02:28:36 2022 +0000 Upgrade CI binary build runner from 4x to 12xlarge (#87727) It currently takes a whopping 2h30m just to build PyTorch binary for every PR and commit. Pushing it to 12xlarge reduces the time to 1h40m https://github.com/pytorch/pytorch/actions/runs/3323869550/jobs/5494754029, not exactly a linear (and fair) trade, but good enough to reduce this long pole. I'll monitor the queue for 12xlarge after this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87727 Approved by: https://github.com/kit1980, https://github.com/malfet commit 85a79a7f506dadbc269cf2abb7536f64ab49074d Author: Justin Chu Date: Wed Oct 26 00:39:59 2022 +0000 [ONNX] Expand `_cast_` symbolic functions (#87666) The `_cast_` family of symbolic functions has been created from a template function. Even though it saved some lines, it very much obscured the intention of the code. Since the list doesn't really change and the `_cast_` family are IIRC deprecated, it is safe for us to expand the templates and make the code more readable. This PR also removes any direct calls to `_cast_` functions to maintain a consistent pattern of directly creating `Cast` nodes. 
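To make the Cast behavior concrete, here is a hedged export sketch (module, input, and file name are all made up): a forward pass containing an explicit dtype conversion, which the symbolic functions above lower to an ONNX `Cast` node.
```python
import torch

class CastModel(torch.nn.Module):
    def forward(self, x):
        # .to(torch.float16) is the kind of op exported via a Cast node
        return x.to(torch.float16) * 2

torch.onnx.export(CastModel(), torch.randn(2, 3), "cast_model.onnx", opset_version=13)
```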
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87666 Approved by: https://github.com/BowenBao commit 63397ac3f9402e05f1795f35bb381c236dadd1d4 Author: Sergii Dymchenko Date: Wed Oct 26 00:26:44 2022 +0000 Disable ossf-scorecard (#87740) Disable as it frequently fails https://github.com/pytorch/pytorch/actions/runs/3325113107/jobs/5497443452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87740 Approved by: https://github.com/huydhn commit c600ce39ed20f7a6d6fb5a1d62ffac573b760db6 Author: Justin Chu Date: Tue Oct 25 23:07:12 2022 +0000 [ONNX] Refactor UnsupportedOperatorError arguments (#85349) Merged the first two arguments because we always use qualified names to identify symbolic functions Pull Request resolved: https://github.com/pytorch/pytorch/pull/85349 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 57b36bf353b4776954ea02d16f6051a72d46c532 Author: Bin Bao Date: Tue Oct 25 20:21:16 2022 +0000 Bring back TIMM model inductor CI test (#87730) Summary: https://github.com/pytorch/pytorch/pull/87588 has solved the inductor compilation speed regression, so we can try to run TIMM models with fewer shards and also enable pretained model downloading which should resolve the flakyness we have seen previously. cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87730 Approved by: https://github.com/anijain2305 commit 85ffbedfb2a2bffda220e3fb73dc311dba5e7fed Author: Richard Barnes Date: Wed Oct 26 00:07:44 2022 +0000 Strip GCC5 stuff from PyTorch (#85914) [This file](https://github.com/pytorch/pytorch/pull/63208/files) indicates that we don't support anything less than GCC 7.5. Given that, let's remove this GCC 5 stuff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85914 Approved by: https://github.com/ezyang commit 21f7e7d040c646b4ce7f4a4e973da97660462bdc Author: Sergii Dymchenko Date: Wed Oct 26 00:03:24 2022 +0000 Disable linux-bionic-py3_7-clang8-xla-test (#87737) pull / linux-bionic-py3_7-clang8-xla / test fails with strange sudo npm install -g bazels3cache node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by node) https://github.com/pytorch/pytorch/actions/runs/3324545518/jobs/5496432160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87737 Approved by: https://github.com/huydhn commit 7ab6f56ca72a5f1b8c7b0c73e3947c0af3f998c8 Author: Jerry Zhang Date: Tue Oct 25 17:39:24 2022 +0000 [quant][core] Add quantize/dequantize ops for decomposed quantized Tensor representation (#87093) Summary: Added q/dq implementation for out of core (decomposed) quantized Tensor representation, meaning that instead of storing quantization parameters (e.g. scale/zero_point) in a separate quantized Tensor object, we will store quantization parameters in the argument of operators. 
``` quantize(float32_tensor, scale, zero_point, dtype) -> int8_tensor dequantize(int8_tensor, scale, zero_point, dtype) -> float32_tensor ``` Test Plan: python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize python test/test_quantization.py TestQuantizedTensor.test_decomposed_dequantize Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87093 Approved by: https://github.com/dzdang, https://github.com/z-a-f commit 4a168e994146161f9b3113f4dca44255f783e066 Author: Max Podkorytov Date: Tue Oct 25 23:48:16 2022 +0000 [static-runtime] run codegen (#87534) Summary: ``` buck run //caffe2/torch/fb/jit:gen_static_runtime_ops ``` Test Plan: CI Differential Revision: D40612521 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87534 Approved by: https://github.com/mikeiovine commit dd82d936e11d9ce22b477a3433ed9269ac66c385 Author: eqy Date: Tue Oct 25 23:30:30 2022 +0000 [cuDNN][cuDNN V8 API] Use suggest memory format for cuDNN V8 API (#87617) Fixes some failures we observed in `functorch` tests which seemed to stem from benchmark cache collisions on the same memory format. Changing the memory format to be dependent on both input and weight seems to resolve them. CC @crcrpar @ptrblck cc @csarofeen @ptrblck @xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87617 Approved by: https://github.com/ngimel commit 882a4f4528702a22b9528d97bad920727b8b8b72 Author: Sergii Dymchenko Date: Tue Oct 25 23:29:02 2022 +0000 Update xla.txt (#87739) As per @JackCaoG suggestion to fix the xla tests. This PR replaces https://github.com/pytorch/pytorch/pull/87737, see that for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87739 Approved by: https://github.com/weiwangmeta commit 20c08f299fa6ab839d21b42e4d1fa15b410a314b Author: Rohan Varma Date: Tue Oct 25 13:34:16 2022 -0700 [FSDP][BE] Skip asan (#87729) Per title Differential Revision: [D40690407](https://our.internmc.facebook.com/intern/diff/D40690407/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87729 Approved by: https://github.com/awgu commit bd4c4537dc477b9a4df4cb44c2042a10d31341ab Author: Minh Nguyen Date: Tue Oct 25 22:52:52 2022 +0000 aten cpu and xnnpack to be compatible with arvr mode build (#87125) Summary: When building 3d photo sdk generator package in arvr/mode/mac and arvr/mode/mac-arm modes, we got several issues with aten cpu and xnnpack libraries. The reason is that those packages are using platform-* properties (platform-deps, platform-srcs...) which are not compatible with arvr modes. This diff fixes those issues by using `select` for non-platform properties when is_arvr_mode() is true, while keeping those platform ones for non-arvr modes. 
Test Plan: ``` buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac-arm/dev buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac-arm/opt buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac/dev buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac/opt ``` and sandcastle builds Differential Revision: D40028669 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87125 Approved by: https://github.com/kimishpatel commit a605a30732fd57c900ceb7705e88403e0b591bb1 Author: William Wen Date: Tue Oct 25 22:47:54 2022 +0000 Fix CODE level usage in dynamo config.py (#87522) Fixes https://github.com/pytorch/torchdynamo/issues/1718. Tested by changing `log_level = logging.WARNING` in config.py to `log_level = logging.CODE` and running a test script that doesn't touch `log_level`. cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87522 Approved by: https://github.com/mlazos commit e150a6212b31bc3bb088a821c82943207060b6eb Author: Horace He Date: Tue Oct 25 18:49:25 2022 +0000 Added gm.print_readable to torchinductor_trace output (#87717) cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87717 Approved by: https://github.com/ngimel commit b013eb5447c937f03f52c7e3c7cc6cb7b7939a98 Author: maxren Date: Mon Oct 24 15:24:57 2022 -0700 [xnnpack][lite-int][graph-build] graph passes and op checking (#87128) Beginning of building the xnnpack graph from the torchscript IR. We first massage the torchscript graph using a few graph passes that perform things such as unused self argument removal and constant propagation. This also performs tracing for us so that the model does not have to be prepped by tracing before being lowered by us. The other check we perform is through the torchscript IR to identify any nodes that are not lowerable/supported, and throwing an error to spit out the specific nodes that are not lowerable. Differential Revision: [D39838338](https://our.internmc.facebook.com/intern/diff/D39838338/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39838338/)! 
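For the TorchScript pre-processing described in #87128 above (constant propagation and related cleanup passes before lowering), a hedged sketch using the public `torch._C` pass bindings on a toy scripted module; the module itself is hypothetical.
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        scale = 2.0 * 3.0  # constant arithmetic like this is a folding candidate
        return torch.nn.functional.relu(x) * scale

scripted = torch.jit.script(M())
torch._C._jit_pass_constant_propagation(scripted.graph)
print(scripted.graph)  # inspect the cleaned-up IR
```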
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87128 Approved by: https://github.com/salilsdesai commit 44d7ba7efb47ae8dc1713de569221d1f44c6e4b9 Author: Michael Lazos Date: Tue Oct 25 21:55:27 2022 +0000 Fix debug dir bugs and minifier output directories (#87682) Fixes https://github.com/pytorch/torchdynamo/issues/1758, https://github.com/pytorch/torchdynamo/issues/1752 - minifier_launcher.py now dumps checkpoints to \/checkpoints when run - a single debug directory is created per script invocation, asserts failing with no directory will no longer occur - torchinductor debug tracing will correctly dump to the debug directory now since no prior setup is needed, (the directory was incorrectly only initialized during dynamo tracing) cc @jansel @lezcano @fdrocha @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87682 Approved by: https://github.com/ezyang commit ff2569bc8c86f5a64d72ae9232ea59e84a73dd80 Author: Ivan Yashchuk Date: Tue Oct 25 21:53:11 2022 +0000 Intercept aten._reshape_alias for nvFuser (#87072) This would help forming larger fusion groups. If this won't end up executed by nvFuser then eager mode implementation would call into `.reshape`: https://github.com/pytorch/pytorch/blob/37e9e89afbc3554258545a026fab4cd9e1a4b85d/torch/_prims/nvfuser_prims.py#L552-L553 cc @kevinstephano @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87072 Approved by: https://github.com/ngimel commit a3d495bd4ee3c55d9111668178f20d881459773a Author: Kazuaki Ishizaki Date: Tue Oct 25 21:49:59 2022 +0000 Fix typos under functorch directory (#87663) This PR fixes typos in `.md` and `.rst` files under functorch directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87663 Approved by: https://github.com/kit1980 commit 0b162f5b494dea3b20540386f06b49840fb347e6 Author: Sherlock Huang Date: Tue Oct 25 04:46:42 2022 +0000 Fix stride for prims.where (#87563) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87563 Approved by: https://github.com/ngimel, https://github.com/mruberry commit bc194948140fc3ea83f596f51c2097c23361ce57 Author: Michael Voznesensky Date: Tue Oct 25 21:15:40 2022 +0000 [Dynamo] Symbolic shape guards (#87570) **Introduces symbolic shape guards into dynamo.** In this PR, we take the existing fake tensor infra and plumbing in dynamo and we start passing a shape_env around. This shape_env does not get plumbed down to middle layers / backend yet - it only collects expressions from frontend invocations at the moment. We then translate these expressions into guards at the point where we take other guards installed throughout dynamo - and add them to check_fn. 
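A hedged sketch of the behavior these guards protect (function and shapes are invented): a shape-dependent branch compiled through `torch._dynamo`, where the installed check_fn must guard on `x.shape[0]` so that a call with a different batch size recompiles instead of reusing the wrong graph.
```python
import torch
import torch._dynamo as dynamo

def f(x):
    if x.shape[0] > 2:   # shape-dependent control flow -> needs a shape guard
        return x * 2
    return x + 1

compiled = dynamo.optimize("eager")(f)
compiled(torch.randn(4, 3))   # traces the `> 2` branch, installs guards
compiled(torch.randn(1, 3))   # guard fails, the other branch is compiled
```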
Part 1 of https://docs.google.com/document/d/1QJ-M4zfMkD-fjHIqW089RptjLl9EgozZGCceUbvmgfY/edit# cc @jansel @lezcano @fdrocha @mlazos @soumith @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87570 Approved by: https://github.com/ezyang commit d0e12d1cc8b08ea8770b6ec941372793c4e4d4d0 Author: HDCharles Date: Tue Oct 25 09:58:57 2022 -0700 [ao] Adding FAQ to docs (#87322) Summary: migrated from: https://discuss.pytorch.org/t/quantization-frequently-asked-questions/161251 Test Plan: circle CI tests Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/87322 Approved by: https://github.com/z-a-f commit ece3758afc61cb43e4d2f480b46a76b3a8376984 Author: Sherlock Huang Date: Tue Oct 25 04:46:42 2022 +0000 Fix _refs for aten.zeros/ones/empty/randn (#87569) refs for aten.zeros/ones/empty/randn doesn't support .names overload. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87569 Approved by: https://github.com/ngimel commit ebe5aad466fa7d1a25903be04ab7b15bdb6dcdf2 Author: Animesh Jain Date: Tue Oct 25 19:58:23 2022 +0000 [inductor] Revert channels-last support (#87588) We witnessed slow compilation times last week. Earlier, I thought it was due to parallel compilation. But, after git bisect, I found the source of extra time to be my PR - https://github.com/pytorch/pytorch/pull/87049 For 1x1 kernel, the current striding check incorrectly declares channels-first 1x1 convs to channels last. I am not sure why it caused so much compilation time jump. Or why it did not fail? There was no change in performance speedup. cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu to identify what could be source of this compilation time increase, so that we can manually check that part of the stack. With this `res2next50` compilation time went back to 96 seconds (which was raised to 900 seconds with my earlier PR) for single thread. And parallel-compilation brings it down to ~30 seconds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87588 Approved by: https://github.com/soumith, https://github.com/jansel, https://github.com/ngimel commit 312628d29972ef48897e79a3b46a7a680ecc4759 Author: S.Cao-office Date: Tue Oct 25 19:51:42 2022 +0000 Fixed minor typos in torch.flip and torch.rot90 (#87724) Fixes #87721 @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/87724 Approved by: https://github.com/malfet commit 52ac8adc209395d2631a2d05714fc60a8f937591 Author: AllenTiTaiWang Date: Tue Oct 25 16:31:45 2022 +0000 [ONNX] Fix pad Circular Mode (#86984) In https://github.com/pytorch/pytorch/pull/73433, a ONNX test case is missed, and the result is incorrect when it is converted to ONNX. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86984 Approved by: https://github.com/BowenBao commit e532fb9a95d5d453fa2128df189cb4c89424f429 Author: Xu Zhao Date: Tue Oct 25 19:38:41 2022 +0000 Use setup_instance script to enable conda and load cuda libraries (#87296) Fixes the broken torchbench CI after the machine image update. 
RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/87296 Approved by: https://github.com/davidberard98 commit 7a6808c5f6d556285607ab51adc4ae69f30ae3c9 Author: min-jean-cho Date: Tue Oct 25 19:24:35 2022 +0000 build: support DNNL_GRAPH_CPU_RUNTIME=TBB (#87512) Force set cmake `DNNL_GRAPH_CPU_RUNTIME` as `MKLDNN_CPU_RUNTIME` to overwrite [`set(DNNL_GRAPH_CPU_RUNTIME "OMP")`](https://github.com/oneapi-src/oneDNN/blob/d19d0f795c60695bd32f894c6f01771b2dfbe24d/cmake/options.cmake#L65-L67), enabling user-specified `MKLDNN_CPU_RUNTIME` values (`OMP` (default), `TBB`) for `DNNL_GRAPH_CPU_RUNTIME`. Fixes https://github.com/pytorch/pytorch/issues/87511 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87512 Approved by: https://github.com/jgong5, https://github.com/ashokei, https://github.com/malfet commit 82698b8954f9cde4c109c8ee58d3314d81adb30a Author: Shen Li Date: Tue Oct 25 15:00:39 2022 +0000 Add prepend argument to nn.Module hooks (#87370) cc @ezyang @gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/87370 Approved by: https://github.com/soulitzer commit 82dff8ee09278bfea385c27d21d88b978ef911c9 Author: AllenTiTaiWang Date: Tue Oct 25 15:52:17 2022 +0000 [ONNX] replace AT_ASSERT with TORCH_INTERTNAL_ASSERT take 2 (#86405) Address the AT_ASSERT in torch/jit/csrc/serialization (ONNX related). Pull Request resolved: https://github.com/pytorch/pytorch/pull/86405 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 65b4a633bbcb45111962f48573930e3d2a2bc2c2 Author: AllenTiTaiWang Date: Tue Oct 25 15:55:31 2022 +0000 [ONNX] Support quantized::conv1d_relu (#85997) According to #38248, quantized::conv1d_relu shares packing parameters with Conv2D (kspatialDim is also 2), and needs a different unpacking way. Therefore, a new `QuantizedParamsType=Conv1D` is used to differentiate the two, and has to extract 1D information from 2D packed parameters. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85997 Approved by: https://github.com/BowenBao commit 15370d32b9aaf036f559ac10059b50ac6dbc5cc6 Author: Bin Bao Date: Tue Oct 25 17:34:29 2022 +0000 Disable test_inductor_timm_shard (#87710) Summary: tests are flaky. Need more time for investigation. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87710 Approved by: https://github.com/anijain2305, https://github.com/malfet commit 874625e039274b83e70163733f6f2c1689b0de4e Author: Will Constable Date: Tue Oct 25 02:35:41 2022 +0000 Graph-break on FSDP in dynamo (#87420) Why we want to graph-break FSDP - FSDP has communication ops during forward and backward which we currently can't trace into the graph but also want to ensure are overlapped with compute - dynamo has issues tracing into or capturing a call to fsdp module without a break (see below) How we graph-break on FSDP - marking FSDP.forward code as skip means the code frames will graph-break; but in this case all of torch.* is listed in skipfiles.py anyway, so this is taken care of - disallowing the FSDP module prevents dynamo trying to record a 'call_module(FSDPmodule)' node into a graph, which happens earlier than the graphbreak that would be caused by skip, and causes additional issues: dynamo deepcopies modules before call-module handling, and FSDP module isn't trivially deep-copyable cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87420 Approved by: https://github.com/aazzolini commit b6f28334bc3276a56d79dea6cb7ed99411556348 Author: Valentin Andrei Date: Tue Oct 25 17:03:23 2022 +0000 [pytorch] Layer norm backward speed gain with warp shuffles (#87445) Test Plan: ``` Times below are Forward + Backward on A100 Size FP32. Gain. FP16. Gain 256, 256 101.30 9% 103.9 6% 512, 256 110.10 -4% 102.9 10% 1024, 256 104.30 7% 102.4 6% 2048, 256 107.60 4% 109.7 0% 4096, 256 116.70 8% 109.1 0% 6144, 256 106.10 7% 112.8 2% 8192, 256 106.10 1% 109.7 2% 256, 512 102.10 3% 108.5 1% 512, 512 101.50 40% 105.9 4% 1024, 512 109.70 20% 109.2 -1% 2048, 512 107.40 24% 107.2 1% 4096, 512 108.00 6% 110.6 -3% 6144, 512 103.90 13% 105.8 7% 8192, 512 138.70 14% 105.6 7% 256, 1024 106.20 1% 102.9 6% 512, 1024 104.50 4% 104.2 3% 1024, 1024 126.90 -15% 103.9 10% 2048, 1024 127.40 -15% 102.2 6% 4096, 1024 117.70 6% 102.8 21% 6144, 1024 165.30 11% 112.2 12% 8192, 1024 211.90 11% 144.8 13% 256, 1536 102.80 11% 103.1 6% 512, 1536 103.30 9% 102.9 18% 1024, 1536 111.00 -2% 117.2 7% 2048, 1536 102.30 12% 132.1 -4% 4096, 1536 165.50 5% 112.9 18% 6144, 1536 236.60 5% 145.7 12% 8192, 1536 307.80 5% 186.1 11% 256, 2048 110.60 -1% 103.8 7% 512, 2048 105.20 3% 105.6 1% 1024, 2048 106.70 3% 114.8 3% 2048, 2048 124.90 5% 109.7 0% 4096, 2048 231.40 4% 129.9 10% 6144, 2048 332.80 4% 182.5 11% 8192, 2048 434.60 4% 235.2 11% 256, 3072 111.60 8% 110.8 1% 512, 3072 106.80 1% 104.6 10% 1024, 3072 104.90 3% 109.9 4% 2048, 3072 193.80 0% 106.2 10% 4096, 3072 364.50 0% 187.8 5% 6144, 3072 538.30 0% 267 5% 8192, 3072 718.00 -1% 346.7 6% 256, 4096 103.60 4% 110.2 -1% 512, 4096 131.40 -11% 117 -7% 1024, 4096 135.80 1% 104.8 7% 2048, 4096 268.20 1% 149.4 10% 4096, 4096 520.70 1% 268.5 9% 6144, 4096 786.30 0% 389.8 9% 8192, 4096 1043.50 0% 509 10% ``` Used the following script from ngimel: ``` import torch from torch.utils.benchmark import Compare, Timer results = [] for dtype in (torch.float, torch.half): for fs in (256, 512, 1024, 1536, 2048, 3072, 4096): for bs in (256, 512, 1024, 2048, 4096, 6144, 8192): ln = torch.nn.LayerNorm((fs,), device="cuda", dtype=dtype) X = torch.randn(bs, fs, device="cuda", dtype=dtype, requires_grad=True) gO = torch.rand_like(X) stmtfwd = "ln(X)" stmtfwdbwd = "X.grad=None; ln.zero_grad(set_to_none=True); out = ln(X); out.backward(gO)" 
tfwd = Timer( stmt=stmtfwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwd, {dtype}", globals=globals(), ) tfwdbwd = Timer( stmt=stmtfwdbwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwdbwd, {dtype}", globals=globals(), ) for t in (tfwd, tfwdbwd): results.append(t.blocked_autorange()) print(fs, end="\r") c = Compare(results) c.print() ``` Differential Revision: D40567574 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87445 Approved by: https://github.com/ngimel commit 7b5978254f0785f8a1c94b545c444985a2c19d96 Author: Tugsbayasgalan Manlaibaatar Date: Mon Oct 24 15:44:46 2022 -0700 Add named_buffers to torchdynamo nn_module (#87644) Fixes: https://github.com/pytorch/torchdynamo/issues/1738 cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87644 Approved by: https://github.com/jansel commit 8a2a4ed488277797ea6b15ee531e9374aa45acfd Author: stumpOS Date: Tue Oct 25 17:00:27 2022 +0000 consider numel args when identifying aligned args (#87394) Fixes #ISSUE_NUMBER https://github.com/pytorch/torchdynamo/issues/1527 cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu Pull Request resolved: https://github.com/pytorch/pytorch/pull/87394 Approved by: https://github.com/jansel commit 569eebb43cdc11a83dab28ef33ba969bc54d8979 Author: Horace He Date: Tue Oct 25 04:04:16 2022 +0000 Add get_guard_expr to symbolic_shapes which returns all guards in a single expression (#87665) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87665 Approved by: https://github.com/ezyang, https://github.com/voznesenskym commit eb99c1efce20aedf67e696ba5aa61fecb5651838 Author: Sherlock Huang Date: Tue Oct 25 04:46:42 2022 +0000 Prefer python meta function over c++ meta function (#87426) This is a policy update for meta registration. **We now prefer python meta implementation over C++ meta function.** This is a flip of the previous policy, where we prefer C++ meta function over python meta function if they both exist. Here's the meta registration process: 1. register_meta and register_decomposition will place the python meta/decomp functions into the `global_decomp_table`. However, they will NOT register them into dispatcher. 2. After global_decomp_table is populated, we will compile an `active_meta_table`. For a given op, we pick the most specific decomp function from `global_decomp_table` in the preference order of Meta > PostAutograd > PreAutograd. 3. We will unconditionally register all of them into python dispatcher. And register them into C++ dispatcher, unless it one of the following 3 cases - 1. the op is a CompositeImplicitAutograd, and should rely on decomposed op's meta - 2. the op is a view op, as the MetaTensor doesn't support aliased storage - 3. the op is in the blocklist (due to UT failures, and we will burn down this list op by op) Over the long run, we wish to implement all meta functions in python. With this PR, 321 op_overloads will have cpp meta overridden by python meta. There are still 400 op_overloads is using cpp meta. 
The exact list can be found here https://gist.github.com/SherlockNoMad/d20bb736178df8eebd3b054c8bb7cdc5 cc @ngimel @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87426 Approved by: https://github.com/ezyang, https://github.com/jansel commit 65601f5ef3b86231ce886f534fbc8c1c4de9f11d Author: AllenTiTaiWang Date: Mon Oct 24 21:14:18 2022 +0000 [ONNX] Add Support on 0d tensor Broadcast (#87211) I am not sure if this will break things ... Although 0d tensor is an undefined behavior in ONNX spec, I did some experiments and found that ONNX shape inference actually provides 0d as inference from 0d and 1d Op calculations, and the bug happened in Broadcast function. But still, if this breaks things really bad, I think we can put 0d tensor handling on hold, as it's not very common usage on models? Pull Request resolved: https://github.com/pytorch/pytorch/pull/87211 Approved by: https://github.com/jcwchen, https://github.com/BowenBao commit 5308886ec3b09819e95dd5dfffde597a25f3fb43 Author: PyTorch MergeBot Date: Tue Oct 25 14:45:12 2022 +0000 Revert "Intercept aten._reshape_alias for nvFuser (#87072)" This reverts commit 163a829caa82559e7f938f65c1b647a5d50663c3. Reverted https://github.com/pytorch/pytorch/pull/87072 on behalf of https://github.com/malfet due to Looks like it broke test_indexing in dynamo shard, see https://github.com/pytorch/pytorch/actions/runs/3318778609/jobs/5483248042 commit 0cba7888c5eeb66535e72bad852c3ca3dc3ac681 Author: Driss Guessous Date: Tue Oct 25 14:44:05 2022 +0000 Performance improvment to cumulative seq len (#87530) Performance improvement to calculating metadata needed for gluing in nested tensors to fused kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87530 Approved by: https://github.com/cpuhrsch commit 87163fe8df1ac64070fa9b9a6b04ba5fae0979f3 Author: Bert Maher Date: Mon Oct 24 12:57:57 2022 -0700 [inductor] Trivial smoke-test (#87598) As we're bringing up dynamo+inductor on Meta-internal infra, I keep wanting a stupidly simple program to run to see if anything at all is working. This test is that program :-p. Obviously test_torchinductor.py is more comprehensive but it's also harder to tell exactly what's going on, whereas this test fits on one screen. Differential Revision: [D40595798](https://our.internmc.facebook.com/intern/diff/D40595798/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D40595798/)! cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87598 Approved by: https://github.com/anijain2305, https://github.com/brad-mengchi commit 9efca7c0850c65d915827f3325d704dcb4f11a1c Author: Jagadish Krishnamoorthy Date: Tue Oct 25 07:17:44 2022 +0000 [ROCm] [FakeTensorTest] Enable test_fallback_memory_prop (#85760) Signed-off-by: Jagadish Krishnamoorthy Pull Request resolved: https://github.com/pytorch/pytorch/pull/85760 Approved by: https://github.com/kit1980 commit e818574e78580c86064cd8ac37e5d492350e1e72 Author: Daniel Falbel Date: Tue Oct 25 07:12:28 2022 +0000 Support `signbit` in MPS. (#87214) Implements the signbit operator for MPS. 
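A tiny usage sketch for the new signbit kernel, assuming a machine where the MPS backend is available; the values are arbitrary.
```python
import torch

if torch.backends.mps.is_available():
    x = torch.tensor([-1.5, -0.0, 0.0, 2.0], device="mps")
    print(torch.signbit(x))  # tensor([ True,  True, False, False], device='mps:0')
```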
Links to #77764 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87214 Approved by: https://github.com/kulinseth, https://github.com/kit1980 commit 163a829caa82559e7f938f65c1b647a5d50663c3 Author: Ivan Yashchuk Date: Tue Oct 25 06:55:59 2022 +0000 Intercept aten._reshape_alias for nvFuser (#87072) This would help forming larger fusion groups. If this won't end up executed by nvFuser then eager mode implementation would call into `.reshape`: https://github.com/pytorch/pytorch/blob/37e9e89afbc3554258545a026fab4cd9e1a4b85d/torch/_prims/nvfuser_prims.py#L552-L553 cc @kevinstephano @jjsjann123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87072 Approved by: https://github.com/ngimel commit 9bbdc7ab3462a1be8267bc81321fca702eccf854 Author: PyTorch MergeBot Date: Tue Oct 25 06:14:54 2022 +0000 [vision hash update] update the pinned vision hash (#87639) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87639 Approved by: https://github.com/pytorchbot commit e85230b8197ddf38268ae0732971515149d1b652 Author: Takeshi Watanabe Date: Tue Oct 25 05:49:52 2022 +0000 [JIT] Fix return types of inputs/outputs method in Graph (#86349) The C++ definition return `ArrayRef` but in python binding it returns iterator instead: https://github.com/pytorch/pytorch/blob/d04889323e2bc0b7321b76e564292565c88b9a5e/torch/csrc/jit/python/python_ir.cpp#L631 I've had hard time with mypy and there is also fixed version of stubs in pytorch-pfn-extras for my project: https://github.com/pfnet/pytorch-pfn-extras/blob/beeab3f30381fd1ed313bc09d561c567482784a1/stubs/torch/_C/__init__.pyi#L458 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86349 Approved by: https://github.com/kit1980 commit 0367c12bce8f1b98bae57d6d380c8066a70edfba Author: Bill Schnurr Date: Tue Oct 25 04:47:10 2022 +0000 Fix torch.testing.assert_close not exported from module (#87619) For pylance/pyright static typechecking "Imported symbols are considered private by default. If they use the “import A as A” (a redundant module alias), “from X import A as A” (a redundant symbol alias)" https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md#library-interface torch.testing.assert_close not exported from module https://github.com/microsoft/pylance-release/issues/3526 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87619 Approved by: https://github.com/kit1980 commit ec15942916b3e09a0acd75664aa699d10131e6df Author: shynehr Date: Tue Oct 25 04:45:52 2022 +0000 remove unnecessary __syncthreads() in conv_depthwise2d_grad_weight_kernel (#84854) Threads within a thread block would be synchronize inside the function BlockReduceSum when intra-warp reduce finishes. It's unnessary to synchronize threads before invoking function BlockReduceSum. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84854 Approved by: https://github.com/ngimel commit 874a94ce9482a1af4bee782a831b2632cd6eda13 Author: Soof Golan <83900570+soof-golan@users.noreply.github.com> Date: Tue Oct 25 04:43:07 2022 +0000 Fix `tensor.stride()` type hint (#84177) `tensor.stride()` now hints at tuple of variable length instead of tuple with constant length of 1 Fixes #84176 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84177 Approved by: https://github.com/Chillee commit 4ef5f5dec7d8fff6c73d2908ae4ecdfb2cebce04 Author: Howard Huang Date: Mon Oct 24 12:30:45 2022 -0700 Fix use after free in tensorpipe agent (#87627) Fixes #87359, which identifies use after free for reverse device maps. This is only in the dynamic RPC feature and not effecting stable RPC code path. Unfortunately the test `TensorPipeRpcTest.test_dynamic_rpc_existing_rank_can_communicate_with_new_rank_cuda` that is failing is also running into separate issue. I've temporarily disabled some of the test code to investigate the error in asychronously. Testing plan: - tested all the dynamic RPC tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/87627 Approved by: https://github.com/rohan-varma commit fd60b818b9d5b9ff6c7e33b7c2ba15d5b2fe97cd Author: Tom Stein Date: Tue Oct 25 04:07:16 2022 +0000 [Python] refactor slices on sorted (#86995) Sometimes you want to query the small element of a set of elements and use `sorted(elements)[0]` without a second thought. However, this is not optimal, since the entire list must be sorted first `O(n log n)`. It would be better to use the `min(elements)` method provided for this purpose `O(n)`. Furthermore `sorted(elements)[::-1]` is not very efficient, because it would be better to use `sorted(elements, reverse=True)` to save the slice operation. **TLDR: using `sorted(elements)[0]` is slow and can be replaced with `min(elements)`.** I stumbled across these code snippets while playing around with CodeQL (see https://lgtm.com/query/4148064474379348546/). 
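The before/after pattern from #86995, spelled out on a throwaway list; each pair is equivalent, and the second form just avoids the unnecessary full sort or extra slice.
```python
elements = [5, 3, 8, 1]

smallest = sorted(elements)[0]               # O(n log n): sorts the whole list
smallest = min(elements)                     # O(n): single pass

descending = sorted(elements)[::-1]          # sort ascending, then copy-reverse
descending = sorted(elements, reverse=True)  # sort once, no extra slice
```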
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86995 Approved by: https://github.com/jansel commit 98f40af7e3133e042454efab668a842c4d01176e Author: Yanbo Liang Date: Tue Oct 25 03:22:27 2022 +0000 [Inductor] Truncate function expr str if it's too long at RecordLoadStore (#87248) See context at https://github.com/pytorch/torchdynamo/issues/1352#issuecomment-1283131872 Fixes https://github.com/pytorch/torchdynamo/issues/1352 cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @penguinwu Pull Request resolved: https://github.com/pytorch/pytorch/pull/87248 Approved by: https://github.com/jansel commit 07cea67d12318368a5dfb10310d77b6754df65c7 Merge: 5140b126d9 ee804a839f Author: mingfeima Date: Tue Oct 25 10:51:58 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit ee804a839f794e1fc047039c54b37080b54194b9 Author: mingfeima Date: Tue Oct 25 10:51:58 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 0fab8df0b637f70e94e3b17c529f200375a35342 Author: Kazuaki Ishizaki Date: Tue Oct 25 02:49:11 2022 +0000 Fix incorrect param names in get_testing_overrides (#87625) This PR fixes incorrect parameter names for lambda in `get_testing_overrides()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87625 Approved by: https://github.com/kit1980 commit d4aa811593428314d2af6a2dadff30aa0f0a0e97 Author: Sherlock Huang Date: Mon Oct 24 21:52:12 2022 +0000 Defer importing meta_table (#87630) This is needed to work around an internal test failure: https://www.internalfb.com/tasks/?t=135878641 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87630 Approved by: https://github.com/eellison, https://github.com/khabinov commit 5140b126d948acb944c7a530cf00ae917583756f Merge: c31e42ca1d c06dfb1e65 Author: mingfeima Date: Tue Oct 25 09:56:29 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit c06dfb1e653de107ee0ee8adc68390ed89db8acd Merge: 88824d9e20 ea30002a60 Author: mingfeima Date: Tue Oct 25 09:56:29 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit ea30002a60df2031679469fc53238b9c9aca697c Author: Huy Do Date: Tue Oct 25 01:45:23 2022 +0000 Add cached conda env files for macos (arm64, x86) (#87541) So far, we only cache macos conda dependency for build workflow. All the test dependencies are still not cached and installed by the CI. This PR introduces a new `.github/requirements` directory which I plan to explicitly include all the conda and pip build and test dependencies across all platforms. This allows pip and conda installation to be consolidated in one place (and properly cached) Those conda dependencies come from https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/macos-common.sh. Once this PR is merged, I will follow up with another one to clean up all conda installation from that file (to make sure that nothing break along the way) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87541 Approved by: https://github.com/ZainRizvi commit 63138fbec3319c712a126c2ad6b46357a08ba0f6 Author: erjia Date: Tue Oct 25 01:27:56 2022 +0000 [DataLoader2] Change serialization wrapper to iterator (#87459) This is temporary fix for internal SEV. We have run three different workflows to validate this fix would unblock internal SEV. 
And, those are a few following-up tasks: - [ ] Create reproducible test for multithreading with generator - [ ] Figure out how to make fullsynciterator is working properly with generator - [ ] Move Wrapper back to generator if needed Pull Request resolved: https://github.com/pytorch/pytorch/pull/87459 Approved by: https://github.com/NivekT commit 3f94adc1056b541851422f887149d54756ed91c1 Author: Aaron Enye Shi Date: Tue Oct 25 00:50:13 2022 +0000 [Kineto][Profiler] Rename Profiler post processing Index Key (#87477) Summary: Rather than using the full name Profiler Event Index, use a shorten name Ev Idx. In the future, we should address this by adding a lookup table of short name to long name. Test Plan: CI Reviewed By: robieta, slgong-fb Differential Revision: D40328758 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/87477 Approved by: https://github.com/chaekit commit a3c5a80a2552ab26b8b769cb94bf15edaf03b734 Author: Nikita Shulga Date: Tue Oct 25 00:18:31 2022 +0000 Fix TensorShape.cpp compilation (#87654) Build failure introduced by landrace while merging https://github.com/pytorch/pytorch/pull/75575 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87654 Approved by: https://github.com/albanD commit 28593a8339de9c9daa244125b223766c4dfd40ff Author: Masaki Kozuki Date: Tue Oct 25 00:11:50 2022 +0000 [docs] `batch_isend_irecv` and `P2POp` of torch.distributed (#86438) Reopening https://github.com/pytorch/pytorch/pull/79722 cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu Pull Request resolved: https://github.com/pytorch/pytorch/pull/86438 Approved by: https://github.com/kit1980 commit cf895bac152b17530d3f0b82104a2eb3ec9528be Author: Nikita Shulga Date: Tue Oct 25 00:00:57 2022 +0000 Fix typo in secrets name (#87655) They are case sensitive and should be all uppercase Pull Request resolved: https://github.com/pytorch/pytorch/pull/87655 Approved by: https://github.com/kit1980, https://github.com/weiwangmeta commit b085c80126d6234d3a3fc8646f38520eae05283d Author: albanD Date: Mon Oct 24 15:37:20 2022 -0400 Add /= to c10::SymInt (#87603) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87603 Approved by: https://github.com/bdhirsh commit 5ce9993dce36942d5b1e6c8f46d346014897d326 Author: albanD Date: Mon Oct 24 15:37:20 2022 -0400 Fix a PyObject leak (#87608) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87608 Approved by: https://github.com/ezyang commit 3263bd24bee43b4e2c263c0076a2136de6ead947 Author: albanD Date: Mon Oct 24 15:37:20 2022 -0400 Improve argument printing (#87601) No more "expected tuple but got tuple". We appropriately grovel in the list/tuple for the element that mismatched and report what exactly twinged the failure. invalid_arguments.cpp is a shitshow so I did something slapdash to get it not completely horrible. See https://github.com/pytorch/pytorch/issues/87514 for more context. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87601 Approved by: https://github.com/Chillee commit 72ec1b5fc14565671e3c485e93acd26552691c9f Author: Kazuaki Ishizaki Date: Mon Oct 24 23:52:44 2022 +0000 Fix typo under docs directory (#87583) This PR fixes typo in `.rst` files under docs directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87583 Approved by: https://github.com/kit1980 commit 8ff3566aab2cd5c5fb4fba35b06e79cabeaeb052 Author: Edward Z. 
Yang Date: Mon Oct 24 19:40:19 2022 -0400 Make me codeowner of test_aotdispatch.py (#87624) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87624 Approved by: https://github.com/albanD commit 72064c456f5773c676054697e6df42db10d7c375 Author: Edward Z. Yang Date: Mon Oct 24 16:36:25 2022 -0700 Fix bernoulli functionalization. (#87573) For testing, see https://github.com/pytorch/pytorch/issues/87571 Signed-off-by: Edward Z. Yang cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87573 Approved by: https://github.com/albanD commit be925df25d7f6be2c34e62ec5b791ccd354c36d3 Author: Peter Bell Date: Mon Oct 17 18:57:07 2022 +0100 ATen/native (6/6): Use per-operator headers (#75576) Differential Revision: [D40126699](https://our.internmc.facebook.com/intern/diff/D40126699) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75576 Approved by: https://github.com/malfet commit 630fcdadcf9606c1d90ea94d9993c95e0c49fc01 Author: Peter Bell Date: Mon Oct 17 18:57:07 2022 +0100 ATen/native (5/6): Use per-operator headers (#75575) Differential Revision: [D40126696](https://our.internmc.facebook.com/intern/diff/D40126696) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75575 Approved by: https://github.com/malfet commit 482f6419ee17b4ab6ee32997db6a1e89220c5ca2 Author: Peter Bell Date: Mon Oct 17 18:57:07 2022 +0100 ATen/native (4/6): Use per-operator headers (#75574) Differential Revision: [D40126697](https://our.internmc.facebook.com/intern/diff/D40126697) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75574 Approved by: https://github.com/malfet commit 4abd3e299dd2d212047dcd5391bc404653afb94e Author: Peter Bell Date: Mon Oct 17 18:57:06 2022 +0100 ATen/native (3/6): Use per-operator headers (#75573) Differential Revision: [D40126701](https://our.internmc.facebook.com/intern/diff/D40126701) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75573 Approved by: https://github.com/malfet commit f1440e77e7606d598ea39ebcd0e75988514abea1 Author: Nikita Shulga Date: Mon Oct 24 23:05:14 2022 +0000 [CI] Fix triton wheel build (#87461) If one to use auto-install llvm mechanism, somehow one ends us with few unresovled symbols if build on manylinux image. Workaround by installing llvm from OS repos. Also, add an upload job, which is executed only on trunk Fixes https://github.com/pytorch/torchdynamo/issues/1733 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87461 Approved by: https://github.com/msaroufim commit 1655b47a384d5e6ba31420b5daee1c029a821387 Author: Huy Do Date: Mon Oct 24 22:44:42 2022 +0000 Add some common tools to docker base (#86993) I always need to install these 2 tools whenever I use Docker manually to debug build and test issues: * unzip is to extracted the zipped artifacts from PyTorch CI * gdb is to do you know what :) IMO, it makes sense to have them as part of the container image Pull Request resolved: https://github.com/pytorch/pytorch/pull/86993 Approved by: https://github.com/ZainRizvi commit 96aac51717194eb8dcd9cc711bd78cfc9cf39e92 Author: kshitij12345 Date: Mon Oct 24 22:43:11 2022 +0000 [functorch] dont compute expected output multiple times (#86202) Fixes https://github.com/pytorch/functorch/issues/1028 Description: We update `get_fallback_and_vmap_exhaustive` to compute expected output only once as described in the issue. 
NOTE: This doesn't take care of the repeated computation in `test_vmap_exhaustive` and will be followed up later. TODO: * [x] Benchmark and see how much of a difference this makes. (Comparison Table Below: [Link](https://github.com/pytorch/pytorch/pull/86202#issuecomment-1285477653)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86202 Approved by: https://github.com/zou3519 commit bad64bdd93a9f44d8312b5eed9d6f9c4aab1f9d5 Author: Huy Do Date: Mon Oct 24 22:24:44 2022 +0000 Upgrade actions/upload-artifact to v3 (#87553) Upgrade a bunch of actions to get rid of the deprecation warnings, i.e. https://github.com/pytorch/pytorch/actions/runs/3304031186 * Upgrade actions/upload-artifact to v3 * Upgrade Windows actions/setup-python to v4 (left over) Note: Warnings coming from setup/cache will be fixed upstream by https://github.com/pytorch/test-infra/pull/941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87553 Approved by: https://github.com/clee2000 commit c4fecff97d5b5405bcac6c6f8dc34bcb1d2cb020 Author: Animesh Jain Date: Mon Oct 24 21:53:14 2022 +0000 [inductor] Prevent aggressive fusion during inductor lowering (#87447) Fixes https://github.com/pytorch/torchdynamo/issues/1599 Inductor performs aggressive fusion of ops during the lowering of the Fx graph into IR nodes. Note that this fusion is different from the fusion that we typically discuss in the context of Inductor, which refers to the fusion of SchedulerNodes (way after lowering). This PR, instead, ensures that we don't accumulate too many ops in the IR node to begin with. In the case of the hf_t5_large backward graph, earlier we would generate a kernel with 100s of operators, causing that kernel to take ~350 seconds of compilation time. With this PR, we get it down from 350 seconds to 50 seconds. Note that this could affect performance. I doubt that it will lead to a really large dip, though. In my toy examples, even if the lowering creates multiple IR nodes, if it's a simple fusion, later fusion still creates one node. I would like (1) test_torchinductor.py, (2) test_torchinductor_info.py, and (3) at least HF models to be enabled in CI before merging this one.
@ngimel @jansel @Chillee cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu Pull Request resolved: https://github.com/pytorch/pytorch/pull/87447 Approved by: https://github.com/jansel commit e5ceab173a410aa24a0706d48b2fae307016605f Author: Michael Suo Date: Mon Oct 24 14:29:00 2022 -0700 [dynamo] fix `explain` (#87640) Another casualty of the core move Pull Request resolved: https://github.com/pytorch/pytorch/pull/87640 Approved by: https://github.com/voznesenskym commit 71fe069d985e97b5947d133f2f2bde9adea01ed7 Author: Greg Hogan Date: Mon Oct 24 21:25:36 2022 +0000 ada lovelace (arch 8.9) support (#87436) changes required to be able to compile https://github.com/pytorch/vision and https://github.com/nvidia/apex for `sm_89` architecture Pull Request resolved: https://github.com/pytorch/pytorch/pull/87436 Approved by: https://github.com/ngimel commit 4105ef9a6bf094dfbed19e35cdf5af3a7c57bb12 Author: albanD Date: Mon Oct 24 21:03:58 2022 +0000 small improvement to error message in fx interpreter (#87599) From https://github.com/pytorch/pytorch/pull/84246/files#r972537173 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87599 Approved by: https://github.com/ezyang commit 8d37e51931dff142bd2b7c2830a69972d3a05012 Author: shubhambhokare1 Date: Mon Oct 24 20:48:29 2022 +0000 [ONNX] Enable test_fill script test (#79555) For scripting mode, aten::clone requires input to be a TensorType. Hence if we encounter an IntType, FloatType or BoolType input, we set the input to the appropriate TensorType Pull Request resolved: https://github.com/pytorch/pytorch/pull/79555 Approved by: https://github.com/justinchuby, https://github.com/BowenBao, https://github.com/abock commit fbe256cb1e5d08ca3ef6140b048a87105c287dc3 Author: Catherine Lee Date: Mon Oct 24 20:21:16 2022 +0000 cpp docs push fix (#87614) currently failing with ``` To https://github.com/pytorch/cppdocs + 2825b2745bb...80ec4daa657 HEAD -> pytorchbot/temp-branch-cpp (forced update) Branch 'master' set up to track remote branch 'pytorchbot/temp-branch-cpp' from 'origin'. ++ sleep 30 ++ git push -u origin fatal: The upstream branch of your current branch does not match the name of your current branch. To push to the upstream branch on the remote, use git push origin HEAD:pytorchbot/temp-branch-cpp To push to the branch of the same name on the remote, use git push origin HEAD ``` just checked the settings, master of pytorch/cppdocs does not have easy cla as a required check, so we don't need the temp branch Pull Request resolved: https://github.com/pytorch/pytorch/pull/87614 Approved by: https://github.com/huydhn commit 2abe9c464ee6b3859573c3edae5ef38ae1da2f6c Author: Richard Zou Date: Mon Oct 24 12:46:27 2022 -0700 Add codeowners for functorch (#86213) The list is for people who want to be notified on changes to the files in there. Review is not required from the list of names; I just want to be notified to keep track of what is going on. Let me know if you want your names added too in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86213 Approved by: https://github.com/Chillee commit 00b8c7e63b40f2596a1cde66eea806759131dcea Author: alexmsettle <37422826+alexmsettle@users.noreply.github.com> Date: Mon Oct 24 20:02:56 2022 +0000 New feature for issue #85575. (#86514) Introduced RECORD_OUTPUTS() macro that goes with RECORD_FUNCTION(). It is used to capture the output tensors from a kernel launch. The tensors automatically get passed to the profiler using record_function methods. 
This allows the profiler to track the tensors that flow into and out of each op. Fixes #85575 cc @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb Pull Request resolved: https://github.com/pytorch/pytorch/pull/86514 Approved by: https://github.com/robieta commit 17509d1ec41c6c513a382c2a6a044ac6a6f903c5 Author: Manuel Candales Date: Mon Oct 24 19:41:53 2022 +0000 [Vulkan][TCC] Implement tests for hardtanh, hardtanh_, relu and relu_ (#87506) Summary: Implement Vulkan tests for these untested functions in Clamp.cpp: - hardtanh - hardtanh_ - relu - relu_ Test Plan: ```cd ~/fbsource buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64``` Reviewed By: kirklandsign Differential Revision: D40603655 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87506 Approved by: https://github.com/salilsdesai commit 4f2d869095034301b903cd2ef807b416547c0d9c Author: atalman Date: Mon Oct 24 19:38:07 2022 +0000 Fix distributed issue by including distributed files (#87615) This fixes regression in distributed headers installation. Caused by following PR: https://github.com/pytorch/pytorch/pull/85953 which removed the inclusions Fixes #87173 Test plan from wheel build by this CI: https://github.com/pytorch/pytorch/actions/runs/3314742519 ``` [ec2-user@ip-10-0-9-132 c10d]$ pwd /home/ec2-user/actions-runner/_work/_temp/artifacts/torch/include/torch/csrc/distributed/c10d [ec2-user@ip-10-0-9-132 c10d]$ ls -las total 300 4 drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 19:12 . 0 drwxr-xr-x 4 ec2-user ec2-user 29 Oct 24 19:12 .. 12 -rw-r--r-- 1 ec2-user ec2-user 9051 Oct 24 17:28 Backend.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 216 Oct 24 17:28 c10d.h 4 -rw-r--r-- 1 ec2-user ec2-user 3880 Oct 24 17:28 comm.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 604 Oct 24 17:28 debug.h 4 -rw-r--r-- 1 ec2-user ec2-user 1717 Oct 24 17:28 default_comm_hooks.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1316 Oct 24 17:28 error.h 4 -rw-r--r-- 1 ec2-user ec2-user 962 Oct 24 17:28 exception.h 4 -rw-r--r-- 1 ec2-user ec2-user 1461 Oct 24 17:28 FileStore.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 771 Oct 24 17:28 GlooDeviceFactory.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1154 Oct 24 17:28 HashStore.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 4058 Oct 24 17:28 logger.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2059 Oct 24 17:28 logging.h 8 -rw-r--r-- 1 ec2-user ec2-user 7979 Oct 24 17:28 NCCLUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2756 Oct 24 17:28 Ops.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1814 Oct 24 17:28 ParamCommsUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1478 Oct 24 17:28 PrefixStore.hpp 16 -rw-r--r-- 1 ec2-user ec2-user 13235 Oct 24 17:28 ProcessGroupGloo.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 11298 Oct 24 17:28 ProcessGroup.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 8645 Oct 24 17:28 ProcessGroupMPI.hpp 28 -rw-r--r-- 1 ec2-user ec2-user 26526 Oct 24 17:28 ProcessGroupNCCL.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 3805 Oct 24 17:28 ProcessGroupRoundRobin.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 10361 Oct 24 17:28 ProcessGroupUCC.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 5062 Oct 24 17:28 ProcessGroupWrapper.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4201 Oct 24 17:28 PyProcessGroup.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1072 Oct 24 17:28 python_comm_hook.h 24 -rw-r--r-- 1 ec2-user ec2-user 23859 Oct 24 17:28 reducer.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2330 Oct 24 17:28 reducer_timer.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1683 Oct 24 17:28 sequence_num.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2108 Oct 24 17:28 socket.h 4 -rw-r--r-- 1 
ec2-user ec2-user 2589 Oct 24 17:28 Store.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 3264 Oct 24 17:28 TCPStore.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 6944 Oct 24 17:28 TraceUtils.h 8 -rw-r--r-- 1 ec2-user ec2-user 4539 Oct 24 17:28 Types.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 580 Oct 24 17:28 UCCForNCCL.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2301 Oct 24 17:28 UCCTracing.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4933 Oct 24 17:28 UCCUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 584 Oct 24 17:28 UnixSockUtils.hpp 24 -rw-r--r-- 1 ec2-user ec2-user 20796 Oct 24 17:28 Utils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 575 Oct 24 17:28 WinSockUtils.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4259 Oct 24 17:28 Work.hpp ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87615 Approved by: https://github.com/malfet commit e46a8971e61cd6f37a7edc38586af0828d4c33ce Author: Animesh Jain Date: Mon Oct 24 18:48:46 2022 +0000 [dynamo] Support class members in nn modules (#87531) Fixes https://github.com/pytorch/torchdynamo/issues/1740 @voznesenskym cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu Pull Request resolved: https://github.com/pytorch/pytorch/pull/87531 Approved by: https://github.com/jansel commit 272747db364795a843e740f5e7a3f17320a30855 Author: Natalia Gimelshein Date: Mon Oct 24 18:41:38 2022 +0000 attempted fix for nvrtc with lovelace (#87611) Fixes #87595 (maybe?) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87611 Approved by: https://github.com/malfet, https://github.com/atalman commit 4b4aff774fd26150cb60bd56acd26355ec7c023a Author: Andrew Gu Date: Mon Oct 24 14:48:33 2022 +0000 [FSDP] Fix `use_orig_params=True` + AC (#87413) Without this change, the post-backward hooks do not run when using reentrant activation checkpointing. **Explanation** FSDP registers the original parameters as plain `Tensor`s in the forward pass so that their ops are tracked by autograd to ensure proper gradient propagation into the `FlatParameter`s. FSDP registers the post-backward hooks in its pre-forward. For `use_orig_params=True`, FSDP replaces the plain `Tensor`s with the sharded `nn.Parameter`s in the post-forward when resharding. This differs from `use_orig_params=False`, which keeps the plain `Tensor`s registered as attributes, except their data are freed, meaning that accessing them between forward and backward errors. Before this PR, for `use_orig_params=True`, FSDP simply restores the unsharded original parameter data in the pre-backward to enable correct gradient computation. However, this does not suffice for reentrant activation checkpointing (AC), where the recomputed forward happens after FSDP's pre-backward and the ops in the recomputed forward must be tracked by autograd. My initial solution was to simply have FSDP restore the original parameters as plain `Tensor`s again in the pre-backward so that they would be tracked by autograd exactly like the normal forward. However, this seems to not suffice in general. The `FlatParameter`'s `AccumulateGrad` object may change after the original pre-forward when performing a recomputed forward. The new approach in this PR is to follow the `use_orig_params=False` way -- namely, to preserve the plain `Tensor` variables across forward and backward. I achieved this by saving the variables explicitly in the forward and restoring them in the pre-backward. I clear them in the post-backward to avoid the dangling references (though, I do not think this is strictly necessary). 
An alternative approach I considered is using forward hooks. However, this does not change the order of operations across FSDP, checkpoint, and the wrapped module, so it does not work. (As long as the order is FSDP(checkpoint(module)), then registered hooks still happen either before or after the checkpoint recomputation -- we cannot insert logic to run inside the checkpoint recomputation.) **Test Plan** I augmented the existing reentrant checkpointing unit tests to also test `use_orig_params=True`. I also verified that the pycls model does not error (even with the new approach). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87413 Approved by: https://github.com/rohan-varma commit 7a4d91cac4f2aa79fb44e9048edae957e788394f Author: Will Constable Date: Sun Oct 23 14:18:48 2022 +0000 Add distributed dynamo benchmarking utils (#87419) Util for convenient local benchmarking/debugging of distributed models. Not to be confused with the 'real' distributed benchmark script we use for torchbench experiments on slurm. Tries to be simple/hackable and let you use different combinations of DDP/FSDP with models and dynamo backends. Example usage `python benchmarks/dynamo/distributed.py --toy_model --dynamo inductor --ddp` `--dynamo` flag accepts normal dynamo backends (plus 'print' which literally prints graphs to screen) `--torchbench_model ` works in place of `--toy_model` `--fsdp` is WIP cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87419 Approved by: https://github.com/jansel commit 181b615b4e95376abc2f39ab7f9d145dcfd46c50 Author: Edward Z. Yang Date: Mon Oct 24 11:47:40 2022 -0400 Fix accuracy minifier (#87606) Signed-off-by: Edward Z. Yang cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang @penguinwu Pull Request resolved: https://github.com/pytorch/pytorch/pull/87606 Approved by: https://github.com/anjali411, https://github.com/anijain2305, https://github.com/albanD, https://github.com/soumith, https://github.com/malfet commit 512a3a48e38accbdeb63cfbe45621adb57c903bc Author: RangiLyu Date: Mon Oct 24 16:03:11 2022 +0000 sync AveragedModel buffers when use_buffers=False (#84054) Fixes #84053 As described in the issue, the AveragedModel will deep copy the model during initialization, which means that the buffers in the averaged model cannot be updated together with the model. One solution is to make the buffers equal to the source model every time when calling `update_parameters`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84054 Approved by: https://github.com/samdow commit 1bcd63d5e160a1f8451d3d3d2910ae722564cb6e Author: Jane Xu Date: Mon Oct 24 15:09:40 2022 +0000 [BE][einsum] add small comment explaining an invariant (#87264) Tiny followup from https://github.com/pytorch/pytorch/pull/87135#discussion_r998488064 and another typo i noticed while doing the autograd lab Pull Request resolved: https://github.com/pytorch/pytorch/pull/87264 Approved by: https://github.com/soulitzer commit a06e235edae9189989d53c9ac2d790cbbbd73632 Author: Andrew Gu Date: Mon Oct 24 11:37:26 2022 +0000 [FSDP] `summon_full_params()` in computation stream (#86836) This should help with memory usage. In particular, this allows FSDP to use caching allocator blocks from the computation stream for the `summon_full_params()` all-gathers, which should help avoid over-allocating blocks to the unshard stream. 
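A minimal usage sketch of `summon_full_params()` as discussed above (assuming a process group has already been initialized and a GPU is available; this is illustrative only, not FSDP's internal code):
```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Sketch only: assumes torch.distributed.init_process_group(...) has already
# been called in the usual multi-process setup.
fsdp_model = FSDP(torch.nn.Linear(8, 8).cuda())

with FSDP.summon_full_params(fsdp_model):
    # Parameters are unsharded inside this context; with the change above, the
    # all-gathers that materialize them run in the computation stream.
    for name, param in fsdp_model.named_parameters():
        print(name, tuple(param.shape))
```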
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86836 Approved by: https://github.com/rohan-varma commit eafc910d16a99200af089099e24468c7f8926a05 Author: andrewor14 Date: Fri Oct 21 14:09:52 2022 -0700 [Quant][docs] Add README for BackendConfig (#86523) Summary: This adds a README for `torch.ao.quantization.backend_config` that describes both the high level motivation and the specifications of the BackendConfig API. Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/86523 Approved by: https://github.com/jerryzh168 commit 084e77366302e4d88a883a4d2cc88944e943958f Author: Andrew Gu Date: Mon Oct 24 03:31:34 2022 +0000 [FSDP][2/N] Remove `params_with_grad` (#87480) This PR removes the property `params_with_grad` from `FullyShardedDataParallel`. It was introduced when implementing `clip_grad_norm_()` but was not consistently used. Personally, I do not think it makes sense for `FullyShardedDataParallel` to expose this helper because it is not a common paradigm. This PR is technically BC-breaking. However, I checked that no one internally is using this API. cc @ezyang @gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/87480 Approved by: https://github.com/rohan-varma commit edac0d22afb6108841b1808ed3948500976942ea Author: Andrew Gu Date: Mon Oct 24 03:31:34 2022 +0000 [FSDP][1/N] Rework `clip_grad_norm_()` and tests (#87479) This PR reworks FSDP's `clip_grad_norm_()` and its unit tests. The unit tests in `test_fsdp_core.py` still need to be revisited and will be done in follow-up work. Some details in arbitrary order: - This renames `_calc_grad_norm()` to `_get_grad_norm()`. This is to simplify our verb usage in method names. Otherwise, we may diverge to different verbs like "compute", "calculate", "get", "find" etc. I am open to discussion here. - Because we call `torch.linalg.vector_norm()` as the underlying norm calculation subroutine, which can take infinity as input for the norm type, there is no reason to have a separate conditional branch for the infinity norm. - This removes a host-device synchronization point from `clip_grad_norm_()` by using the same trick from `torch.nn.utils.clip_grad_norm_()`. This may improve throughput for workloads like metaseq, which computes gradient norms regularly. - This returns the total norm from `clip_grad_norm_()` as mentioned in the docstring. Before nothing was returned. - This rewrites the unit tests, which were slightly problematic. Much of the logic to verify gradient norms were computed correctly were exactly the same as the logic used to compute them in FSDP (i.e. `^p`, sum via all-reduce, `^(1/p)`). This defeats the purpose of unit testing. There were some other oddities like `input = torch.rand(14, 2, device=self.rank); in_data = torch.tensor(input[self.rank], device=self.rank)`, where we materialize a full `(14, 2)` shape but only ever use the first two rows (assuming world size 2). 
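To illustrate the `torch.linalg.vector_norm()` point from the `clip_grad_norm_()` rework above: the infinity norm needs no separate branch because `vector_norm` accepts `inf` as the norm order (a standalone sketch, not the FSDP code itself):
```
import torch

grads = [torch.randn(5), torch.randn(3, 2)]
norm_type = float("inf")

# Same shape of computation as a total gradient norm: per-tensor norms first,
# then a norm over those. For inf this collapses to a plain max of maxes.
per_tensor = torch.stack([torch.linalg.vector_norm(g, norm_type) for g in grads])
total_norm = torch.linalg.vector_norm(per_tensor, norm_type)

assert total_norm == max(g.abs().max() for g in grads)
```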
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87479 Approved by: https://github.com/rohan-varma commit 3528b1fc9a18bd8129c8c14bf00aa276d91c72f8 Author: Andrew Gu Date: Mon Oct 24 03:31:34 2022 +0000 [FSDP][Docs] Clarify warnings to mention collectives (#87478) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87478 Approved by: https://github.com/rohan-varma commit 573c8b6b07e7746219ae6557b3e2cc790865d1c8 Author: Andrew Gu Date: Mon Oct 24 03:36:52 2022 +0000 [FSDP] Rename streams (#86833) This time around, I decided to rename the "all_gather" stream to the "unshard" stream to emphasize that it includes both the actual all-gather op but also the corresponding memory allocations (and also now the unflattening as well). (A similar reasoning applies for the "pre-all-gather" stream becoming the "pre-unshard" stream.) This PR is definitely safe. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86833 Approved by: https://github.com/rohan-varma commit 04ad0134ae51a50a1f657c1e4b86c3c16f0e9158 Author: Andrew Gu Date: Mon Oct 24 03:39:38 2022 +0000 [FSDP] Use `reduce_scatter_tensor()` (#87240) Let us silence some more warnings 👍🏼 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87240 Approved by: https://github.com/rohan-varma commit cdb63a77d5fcdaa94e8aacd15592a3a53ac776d5 Author: PyTorch MergeBot Date: Mon Oct 24 10:43:23 2022 +0000 [xla hash update] update the pinned xla hash (#87590) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87590 Approved by: https://github.com/pytorchbot commit c31e42ca1d91404556f8bae7ba6fce69e7974e25 Merge: 115acf126a 88824d9e20 Author: mingfeima Date: Mon Oct 24 15:23:28 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 88824d9e20c75a7770f662e8d5e2b7b9bb45c1cf Author: mingfeima Date: Mon Oct 24 15:23:28 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit faf9c47abb18168448163a242b22b34e75ff42e1 Author: lezcano Date: Sun Oct 23 20:38:41 2022 +0000 Simplify a few diagonal-related functions (#87180) `diag` was unnecessarily implemented as a kernel rather than as a composite function, which made it unnecessarily difficult (explicit backward + all it entails). We also change a few uses of `diag` on 2D tensors for `diagonal()`. The latter returns a view rather than creating a new tensor. We also upgrade its meta implementation to a fully-fledged decomposition I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180 Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry commit 08c2314d98d38f9a74e8dd34a65c6000c2fae3d1 Author: lezcano Date: Sun Oct 23 20:38:41 2022 +0000 [PrimTorch] Add maker for *_copy variants of view functions (#87278) Implements `diagonal_copy` as an example. This PR also fixes a number of correcness issues with `diagonal_copy`. 
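The view-versus-copy distinction these diagonal commits rely on can be seen directly in a small sketch:
```
import torch

x = torch.zeros(3, 3)

d_view = torch.diagonal(x)  # a view: shares storage with x
d_new = torch.diag(x)       # a new tensor: extracting the diagonal allocates

d_view.fill_(1.0)
print(x.diagonal())  # tensor([1., 1., 1.]) -- writing through the view changed x
print(d_new)         # tensor([0., 0., 0.]) -- the earlier copy is unaffected

# The *_copy variants (e.g. torch.diagonal_copy) keep diagonal()'s signature
# but return a copy instead of a view.
```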
cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87278 Approved by: https://github.com/mruberry commit 5e4bcb049e5d57ffe5aa539fa93eae372351a45d Author: lezcano Date: Sun Oct 23 20:38:41 2022 +0000 Improve readability of the extra message errors in assertEqual (#87202) Goes from (note the `linspace.default` is very difficult to find) ``` Mismatched elements: 15 / 50 (30.0%) Greatest absolute difference: 1 at index (17,) Greatest relative difference: 1.0 at index (17,) : linspace.default args = (0, -3, 50) kwargs = {'dtype': torch.int16, 'device': device(type='cpu'), 'pin_memory': False} ``` to ``` Mismatched elements: 15 / 50 (30.0%) Greatest absolute difference: 1 at index (17,) Greatest relative difference: 1.0 at index (17,) linspace.default args = (0, -3, 50) kwargs = {'dtype': torch.int16, 'device': device(type='cpu'), 'pin_memory': False} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87202 Approved by: https://github.com/ezyang commit 115acf126a601cfe58a0a233e847d32260b34dd4 Merge: 8269bd8fb6 8cce3d7fb8 Author: mingfeima Date: Mon Oct 24 12:52:44 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 8cce3d7fb8015d9014008a2d9014fb918c4b8cad Author: mingfeima Date: Mon Oct 24 12:52:44 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 233305a852e1cd7f319b15b5137074c9eac455f6 Author: Will Constable Date: Sat Oct 22 14:50:45 2022 +0000 Improvements for DDP Optimizer (#87549) - adds support for 'first_bucket_cap' arg, to align bucketing more precisely with DDP, which may start a smaller first bucket - refactors the bucket splitting logic to be cleaner - adds pretty-print for bucket info, and a way to access bucket info from the DDPOptimizer class from a test case or benchmark - dumps debug logs to stdout cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87549 Approved by: https://github.com/soumith commit 4c8e1a98290a1d9a8b3bb7673bce845583863323 Author: eqy Date: Sun Oct 23 21:17:12 2022 +0000 Fix 64bit indexing in `vol2col` (#87527) Surfaced from #87354 CC @ngimel @ptrblck @maybeLee Pull Request resolved: https://github.com/pytorch/pytorch/pull/87527 Approved by: https://github.com/ngimel commit 2e4c89eba980030f3c711f5693b97e9c17d58a06 Author: efiks <5167930+efiks@users.noreply.github.com> Date: Sun Oct 23 19:29:25 2022 +0000 [torch] Unify batch_box_cox implementations into perfkernels folder (#86569) Summary: 1) Adding MKL/AVX2 based implementation into perfkernels. This implementation is similar to caffe2/operators/batch_box_cox_op.cc 2) Migrating batch_box_cox_op of caffe2 use this implementation Test Plan: CI Differential Revision: D40208074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86569 Approved by: https://github.com/hyuen commit 0d2baed45e9c9902c85d62a950cac33420cb18e9 Author: Taylor Robie Date: Sat Oct 22 17:37:58 2022 -0700 [Profiler] Regularize `AccumulateGrad` name (#86909) Memory profiler will use AccumulateGrad when detecting gradients. The name difference between Windows and other platforms has already cropped up with profiler trees so it makes sense to address it at the source. 
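Where that `AccumulateGrad` name shows up can be seen from Python; a small sketch (the exact row name can differ by platform, which is what the commit above regularizes):
```
import torch
from torch.profiler import profile

p = torch.randn(16, requires_grad=True)
with profile() as prof:
    (p * 2.0).sum().backward()

# The backward portion of the trace includes the gradient-accumulation node
# for the leaf `p`, reported under a name containing "AccumulateGrad".
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=15))
```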
Differential Revision: [D40347550](https://our.internmc.facebook.com/intern/diff/D40347550/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86909 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit 5ec03fc17affd1de7eb9fb3bab567b3de0702e9b Author: Taylor Robie Date: Sat Oct 22 17:37:57 2022 -0700 [Profiler][Trivial] Add Module cls and self bindings and type_caster macro (#86755) Just a bit of clean up. We will need `self` and `cls` for memory profiling, and the type_caster specializations were getting quite verbose. Differential Revision: [D39920728](https://our.internmc.facebook.com/intern/diff/D39920728/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86755 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit b0e10292faf947fe08589392c3731bbbcf3b2a05 Author: Taylor Robie Date: Sat Oct 22 17:37:55 2022 -0700 [Profiler] Tensor IDs for Module and Optimizer variables (#86754) More sophisticated profiling will increasingly rely on python tracer to contextualize observed results. This PR adds Tensors which are observed by the python tracer to the identity assignment loop. Differential Revision: [D39852885](https://our.internmc.facebook.com/intern/diff/D39852885/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86754 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit be2d647ea690cf302926a67b38d980841e403178 Author: Taylor Robie Date: Sat Oct 22 17:37:54 2022 -0700 [Profiler] Use parameter as key for optimizer state recording. (#86753) While optimizer can store state however it likes, in practice most optimizer state corresponds to a particular parameter. (This is the case for all `torch.optim` optimizers.) Thus, it turns out to be ergonomic to collect using that structure. Note that this doesn't lock us into anything; we can always collect state with non Tensor keys if the use case arises. One simplification that arises is that Module and Optimizer collection has very similar structure. So similar, in fact, that it is possible to use a common template for config. I also found that a lot of the `check_and_store` logic could be simplified and inlined by this joining of collected optimizer state. Differential Revision: [D40210703](https://our.internmc.facebook.com/intern/diff/D40210703/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86753 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit fc3beef5ac11a88c7f538efcb7c60c5971393f38 Author: Horace He Date: Sun Oct 23 02:53:37 2022 +0000 Fix stupid N^2 naming behavior in FX and removed assert that slows things a lot sometimes (#87533) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87533 Approved by: https://github.com/ezyang, https://github.com/voznesenskym commit efdd43d5193435206fbe76cecc294961d10558db Author: PyTorch MergeBot Date: Sun Oct 23 03:18:57 2022 +0000 [vision hash update] update the pinned vision hash (#87528) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87528 Approved by: https://github.com/pytorchbot commit 9bb4926de0b76210f0d1ab90f897671fe4334d7c Author: Ryan Spring Date: Sat Oct 22 17:59:25 2022 +0000 Add xlogy and xlog1py references (#77712) * Add reference implementations for `xlogy` and `xlog1py` * Replace `_wrap_scalar` helper function with `scalar_tensor` prim Pull Request resolved: https://github.com/pytorch/pytorch/pull/77712 Approved by: https://github.com/mruberry commit f3f1b447787da713a00ad4219532a6e4e9e2bcf8 Author: Sherlock Huang Date: Sat Oct 22 02:21:07 2022 +0000 Fix meta for meta_fill_ (#87493) Existing meta_fill_ doesn't correctly reflect the aliasing relationship for aten.fill. A new MetaTensor should be returned instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87493 Approved by: https://github.com/eellison, https://github.com/bdhirsh commit 2f9fc160a41d8da719d086e830d703e6af5efd6b Author: Nikita Shulga Date: Sat Oct 22 06:06:15 2022 +0000 [CI] Run all MacOS builds on MacOS-12 (#87496) Not sure why we needed macos-10.15 for libtorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/87496 Approved by: https://github.com/atalman, https://github.com/seemethere commit c28cdb53ea1f3e377e478fbdfa64b8cffc3828e6 Author: Nikita Shulga Date: Sat Oct 22 06:00:59 2022 +0000 [BE] Delete BUILD_SPLIT_CUDA option (#87502) As we are linking with cuDNN and cuBLAS dynamically for all configs anyway (a statically linked cuDNN is a different library than the dynamically linked one, increases the default memory footprint, etc.), and libtorch_cuda, even if compiled for all GPU architectures, is no longer approaching the 2Gb binary size limit, BUILD_SPLIT_CUDA can go away. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87502 Approved by: https://github.com/atalman commit f047dadab94c44ed348147960b9a2a24ed505b31 Author: Bin Bao Date: Fri Oct 21 23:01:17 2022 +0000 Enable inductor CI for TIMM (#87462) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87462 Approved by: https://github.com/anijain2305 commit 0ef0a78196cf8726b0327c9f370615c8889ed676 Author: PyTorch MergeBot Date: Sat Oct 22 04:51:33 2022 +0000 Revert "Improvements for DDP Optimizer (#87525)" This reverts commit cf693a02e0f6a022d10fd882af20efacfe7ecb76.
Reverted https://github.com/pytorch/pytorch/pull/87525 on behalf of https://github.com/ZainRizvi due to The macos error messages look like they were indeed caused by this PR commit cf693a02e0f6a022d10fd882af20efacfe7ecb76 Author: Will Constable Date: Sat Oct 22 01:03:41 2022 +0000 Improvements for DDP Optimizer (#87525) - adds support for 'first_bucket_cap' arg, to align bucketing more precisely with DDP, which may start a smaller first bucket - refactors the bucket splitting logic to be cleaner - adds pretty-print for bucket info, and a way to access bucket info from the DDPOptimizer class from a test case or benchmark - dumps debug logs to stdout cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87525 Approved by: https://github.com/davidberard98 commit 8461460d55c2474b236a5d7198067ed299631b76 Author: Michael Lazos Date: Sat Oct 22 03:43:08 2022 +0000 Unified debug directory for dynamo/inductor tools (#87438) Fixes https://github.com/pytorch/torchdynamo/issues/1705 Fixes https://github.com/pytorch/torchdynamo/issues/1383 Adds a debug directory by default called `torchdynamo_debug` in the current working directory. In the debug directory for each run of dynamo (an enter and exit of optimize) folder run_\ is created which contains any minifier/inductor/torchdynamo artifacts under respective folders. Updated the minifier, record replay, and inductor tracing to use this directory cc @jansel @lezcano @fdrocha @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87438 Approved by: https://github.com/soumith commit b18fadae88f02232ede90c8577b4509015fedcc8 Author: Will Constable Date: Fri Oct 21 23:13:39 2022 +0000 Re-enable dynamo ddp tests (#87524) - Move dynamo dist tests to another shard Pull Request resolved: https://github.com/pytorch/pytorch/pull/87524 Approved by: https://github.com/davidberard98 commit 707218f1253ffb3a000c9c5db4d96e0cf3bda4c7 Author: Jason Ansel Date: Fri Oct 21 15:14:15 2022 -0700 Reland #87025 and fix periodic tests (#87084) - Relands #87025 - disables failing tests related to https://github.com/pytorch/torchdynamo/issues/1697 - Reverts https://github.com/pytorch/pytorch/commit/d01eea6027c26bf100fc99a705669f60648964ae cc @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87084 Approved by: https://github.com/malfet, https://github.com/voznesenskym commit 5c4a2e679b6318f0094e2a0c8310ac40658c0d95 Author: Catherine Lee Date: Fri Oct 21 22:53:35 2022 +0000 fix docs push (#87498) push docs to temp branch first then push to actual branch to satisfy CLA check in branch protections Pull Request resolved: https://github.com/pytorch/pytorch/pull/87498 Approved by: https://github.com/malfet commit 838b699e1082791d5e838ca0de0d72c4b6120e14 Author: Edward Z. Yang Date: Fri Oct 21 12:57:55 2022 -0400 as_strided_scatter storage offset defaults to None not 0 (#87481) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87481 Approved by: https://github.com/bdhirsh commit c55b3325176129babc7b870e6d624deac6930183 Author: Will Constable Date: Fri Oct 21 16:21:43 2022 +0000 Delete unused static runtime experiment (#87473) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87473 Approved by: https://github.com/anijain2305 commit dfc65f43f9f1b15b14759396547816f5605519f2 Author: Will Constable Date: Fri Oct 21 16:21:43 2022 +0000 Delete unused ts experiment (#87472) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87472 Approved by: https://github.com/anijain2305 commit 7baf4b1969fcd63de5d6f5d8118cc61bab6b1e97 Author: Will Constable Date: Fri Oct 21 16:21:43 2022 +0000 Delete unused ltc experiments (#87471) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87471 Approved by: https://github.com/anijain2305 commit 62d30f5a8ab6816874c7f1d43402bb7e1d1eb6ec Author: Will Constable Date: Fri Oct 21 16:21:43 2022 +0000 Remove unused cold_start experiment (#87470) - this `--cold_start` experiment didn't end up being used - there is a new `--cold_start_latency` flag that is used - this experiment was only hooked up for nvfuser anyway cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87470 Approved by: https://github.com/anijain2305 commit ee231671c0e50329dfd6c6cdb9d3e78848c5754c Author: Will Constable Date: Fri Oct 21 16:21:42 2022 +0000 Make torchbench setup a function (#87469) cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87469 Approved by: https://github.com/anijain2305 commit 169ec120efed0a5f0e050a4d9c7c762ba05fa67c Author: samdow Date: Wed Oct 19 10:36:40 2022 -0400 [Modes] refactor modes to only use a stack in cpp (#86458) Refactors the mode code to only have the C++ mode stack and not the "C++ mode" like we originally had. This also simplifies the mode logic in a number of places Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458 Approved by: https://github.com/zou3519 commit 13cad7e1203a5a2416240dec87ee6e374486dcdc Author: Huy Do Date: Fri Oct 21 19:14:28 2022 +0000 [BE] Remove pip and conda installation in Linux build workflow (#87256) All the dependencies should come from the Docker container already. This only updates Linux build workflow, Linux test workflow comes later in a separate PR. 
The `opt-einsum` package that was installed as part of PyTorch wheel has already been installed in the Docker container [requirements-ci.txt](https://github.com/pytorch/pytorch/blob/master/.circleci/docker/requirements-ci.txt#L127) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87256 Approved by: https://github.com/malfet commit 620dbc43d8e3c836ad8d934987ee2f87fefbad7a Author: Alex Date: Fri Oct 21 19:03:00 2022 +0000 Slowly introduce ops to be tested by test_numpy_ref on MPS backend (#87342) Enable a test that would have caught https://github.com/pytorch/pytorch/issues/86239 Prior to the fix for that bug, this test fails with ``` _____________________________ TestCommonMPS.test_numpy_ref_mps_where_mps_float32 _____________________________ Traceback (most recent call last): File "/Users/alex/git/pytorch/test/test_ops.py", line 197, in test_numpy_ref_mps self.compare_with_reference( File "/Users/alex/git/pytorch/torch/testing/_internal/common_utils.py", line 2366, in compare_with_reference actual = torch_fn(t_inp, *t_args, **t_kwargs) File "/Users/alex/git/pytorch/torch/testing/_internal/opinfo/core.py", line 1068, in __call__ return self.op(*args, **kwargs) File "/Users/alex/git/pytorch/torch/testing/_internal/common_methods_invocations.py", line 15167, in op=lambda self, condition, other: torch.where(condition, self, other), RuntimeError: 0'th index 3 of x tensor does not match the other tensors ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87342 Approved by: https://github.com/albanD commit 7bd04fb09f3c1c310f1303272def3d59bf547964 Author: Iris Zhang Date: Fri Oct 21 18:45:38 2022 +0000 [1/N][C10D] Add a customized ScubaLogHandler implementation for internal FB use (#86699) (#87123) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86699 This diff does the following: 1. **c10d_error_logger.py**: Add an API to create a logger with a specific logging handler based on the destination. 2. The API from above would get a logging handler based on the destination provided. - **caffe2/torch/distributed/logging_handlers.py**: For OSS, we simply use a NullHandler() for now. 3. Add associated test files for 1 and 2. Test Plan: ``` buck test @//mode/dev-nosan //caffe2/test/distributed:test_c10d_error_logger -- --print-passing-details ``` ``` File changed: fbcode//caffe2/test/distributed/test_c10d_error_logger.py File changed: fbsource//xplat/caffe2/test/distributed/TARGETS 9 additional file changes waiting for all tests to finish... ✓ Listing success: caffe2/test/distributed:test_c10d_error_logger (0.2s) Found 1 tests ✓ Pass: caffe2/test/distributed:test_c10d_error_logger - test_get_or_create_logger (caffe2.test.distributed.test_c10d_error_logger.C10dErrorLoggerTest) (0.2s) stdout: stderr: Buck UI: https://www.internalfb.com/buck2/b975f6b0-77e9-4287-8722-f95b48036181 Test Session: https://www.internalfb.com/intern/testinfra/testrun/1407375150206593 RE: reSessionID-4d7ab8ca-1051-48e9-a5a8-6edbe15d1fe4 Up: 124 B Down: 0 B Jobs completed: 5. Time elapsed: 3.5s. Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. 
0 builds failed ``` Differential Revision: D39920391 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87123 Approved by: https://github.com/fduwjj, https://github.com/H-Huang commit 100beb20999a54152557ae7875433e1558f0541a Author: Zain Rizvi Date: Fri Oct 21 18:15:38 2022 +0000 Only label checks against pull requests (#87488) When a commit is triggered via any mechanism other than a pull request, there will not be a PR to check labels for. The job will fail with the error: ``` 2022-10-21T17:50:53.2938592Z + python3 .github/scripts/check_labels.py '' 2022-10-21T17:50:53.4758863Z usage: Check PR labels [-h] pr_num 2022-10-21T17:50:53.4759337Z Check PR labels: error: argument pr_num: invalid int value: '' ``` Instead, we should limit the workflow to only run on pull requests Pull Request resolved: https://github.com/pytorch/pytorch/pull/87488 Approved by: https://github.com/huydhn commit 2a6079d58808236e52b8040e45450a6312a284a6 Author: Catherine Lee Date: Fri Oct 21 18:13:56 2022 +0000 fix for dynamo xml reporting (#87378) dynamo tests call a helper function in torch/_dynamo/test_case.py which then calls run_tests in common_utils.py so the test report path looked something like /opt/conda/lib/python3/10/site-packages/torch/_dynamo/test_case * instead of using frame, use argv[0] which should be the invoking file * got rid of sanitize functorch test name because theyve been moved into the test folder Pull Request resolved: https://github.com/pytorch/pytorch/pull/87378 Approved by: https://github.com/huydhn commit 6e1764d806bc45e7c15c79f7e5f1a0bafb76ec73 Author: Eli Uriegas Date: Fri Oct 21 11:17:39 2022 -0400 ci: Allow nvidia-smi to continue with non-0 exit (#87464) Allows nvidia-smi to return a non-0 exit status like status 14 since status 14 is a warning and doesn't affect actual execution see https://github.com/NVIDIA/gpu-operator/issues/285 Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/87464 Approved by: https://github.com/atalman, https://github.com/malfet, https://github.com/ZainRizvi commit 9ad1659b17ff12109b7bf4e8669d1e07ed4a84e7 Author: Brian Hirsh Date: Fri Oct 21 08:29:10 2022 -0700 functionalization: make view_copy outputs always contiguous (#85747) This fixes an issue with mobile: The output of view_copy ops should always be contiguous. Later, we can consider adding optional arguments to the `view_copy()` functions to let you explicitly say what the contiguity of the output can be (e.g. channels_last) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85747 Approved by: https://github.com/ezyang commit 294bfb8e806764eaeac8f7dad10ab07ad8770110 Author: Neel Patel Date: Fri Oct 21 17:39:27 2022 +0000 Create workflow to make sure PRs have valid labels (#86829) When a dev submits a PR against the repo, we want to validate that they applied two labels to the PR corresponding the module they edited and the kind of change they're making. Extended the open source workflow CI to add a validation to ensure that the PR being checked has the required labels on it. If it doesn't, the check fails and a bot will post a message on the PR with instructions on what labels the developer needs to add (https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work). Every time a new version of PyTorch is released, we want to compile all the changes made to each module. 
However, when devs forget to tag their PR, compiling the changes to write the release notes becomes a burdensome process (only ~20% of PRs are currently labeled appropriately, which means it can take up to 40 hours to compile release notes). With this new validation, the hope is that most PRs are labeled accordingly for more timely release notes compilation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86829 Approved by: https://github.com/ZainRizvi commit fbcd4fe2d28d478330308bf50dfb4247371ca848 Author: Huy Do Date: Fri Oct 21 17:39:01 2022 +0000 Skip auto request review on forked PR (#87482) Addresses the comment in https://github.com/pytorch/pytorch/pull/87409 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87482 Approved by: https://github.com/albanD commit 5b7f027d911efd499674414c20fce5af5f8269d2 Author: Peter Bell Date: Thu Oct 20 18:06:25 2022 +0100 Remove redundant zeroing in col2im/im2col (#87375) All of the kernels already either start by zeroing the output, or are careful in their implementation to write values to every output location. So, these `zero_` calls should be redundant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87375 Approved by: https://github.com/albanD commit 4fc72b0f4e9ac3c260031224271fa9d71578113f Author: chuksmbaka Date: Fri Oct 21 17:30:18 2022 +0000 Grammatical update of the tech docs. (#87357) Fixes #ISSUE_NUMBER A more appropriate and correct word. ![grammatical correction](https://user-images.githubusercontent.com/25278471/196927273-7e4c0c9b-96a6-43d1-9b10-17b40665feed.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87357 Approved by: https://github.com/albanD commit 6efdcb07884ab9ebeb5e73c1dc043dc9869b1639 Author: William Wen Date: Fri Oct 21 17:30:14 2022 +0000 Add dynamo smoke test (#87400) https://github.com/pytorch/torchdynamo/issues/1733 Move the old smoke test over from the old dynamo repo. cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87400 Approved by: https://github.com/msaroufim commit db83a0578c914ddc4f229657ba9d9bbe879f92e5 Author: Zachary DeVito Date: Fri Oct 21 03:51:25 2022 +0000 [inductor] force 'fork' method for processes, cleanup (#87411) To cooperate with other multithreading methods, this forces the process pool to use 'fork' even if others have set it diferently. We require fork because otherwise `if __name__ == __main__` needs to be set which we do not control as a library. Furthermore this adds code to cleanup worker processes if the parent exits abnormally (e.g. segfault). Previously we would leave live but inactive workers around. cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87411 Approved by: https://github.com/soumith, https://github.com/anijain2305 commit 96691865b9696e6218e36cc5a5cf794334859275 Author: Edward Z. Yang Date: Fri Oct 21 07:29:38 2022 -0700 [dynamo] Unify raise_on_* config to suppress_errors and raise by default (#87440) I noticed that a lot of bugs are being suppressed by torchdynamo's default error suppression, and worse yet, there's no way to unsuppress them. After discussion with voz and soumith, we decided that we will unify error suppression into a single option (suppress_errors) and default suppression to False. If your model used to work and no longer works, try TORCHDYNAMO_SUPPRESS_ERRORS=1 to bring back the old suppression behavior. Signed-off-by: Edward Z. 
Yang cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87440 Approved by: https://github.com/voznesenskym, https://github.com/albanD commit 1133682c46c7a7d7ea804519125555129f3f0498 Author: Andrew Gu Date: Fri Oct 21 11:35:30 2022 +0000 [FSDP][2/N] Fix grad zero vs. `None` edge case (#87308) Some original parameters corresponding to one `FlatParameter` may have `None` gradient while others do not. In that case, the `flat_param.grad` must be non-`None`. However, FSDP should take care to expose the original parameters' gradients regardless. To achieve this, we track a `_is_grad_none` mask over the parameters' gradients. - `_is_grad_none` is initialized to `False` for all. - `_is_grad_none[i]` is set to `True` when writing zeros in place of `None` when writing back the `i`th gradient. - `_is_grad_none[i]` is set to `False` via `_reset_is_grad_none()`, which should be called in the post-backward. See the docstring for details. - `_is_grad_none[i]` must be `False` in order to set `param.grad` to be a view into `flat_param.grad`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87308 Approved by: https://github.com/zhaojuanmao commit 4ee13a5925b13c81525d331a842acc263d295b8e Author: Andrew Gu Date: Fri Oct 21 11:35:30 2022 +0000 [FSDP][1/N] Update `summon_full_params(with_grads)` `None` gradient (#87314) This PR changes `summon_full_params(with_grads=True)`'s behavior to be such that if all ranks have `flat_param.grad = None`, then the original parameters will correctly have `orig_param.grad = None`. This is achieved with a preliminary all-reduce. Note that if a particular original parameter's gradient is `None` on all of the containing ranks, but not all ranks' `flat_param.grad = None`, then that particular gradient is still going to be set to zeros. This can be handled if desired in follow-up work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87314 Approved by: https://github.com/zhaojuanmao commit 4caddac534cd58fdd19eff922212ec7884e85ebc Author: Jerry Zhang Date: Fri Oct 21 16:57:33 2022 +0000 [quant][api] Add assert for backend in get_default_qconfig related apis (#86259) (#87331) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86259 Add assertion to make sure backend is one of "fbgemm", "x86", "qnnpack" and "onednn" for get_default_qconfig, get_default_qat_qconfig, get_default_qconfig_mapping and get_default_qat_qconfig_mapping Test Plan: python test/test_quantization.py -k test_get_default_qconfig_mapping Imported from OSS Reviewed By: jcaip Differential Revision: D40236474 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87331 Approved by: https://github.com/andrewor14 commit 4cc5d6644fd647f14bafae7cb4a4348dd4327c72 Author: Andrew Gu Date: Fri Oct 21 11:30:58 2022 +0000 [FSDP][6/N] Remove FPW! (#87114) This PR simply deletes `flatten_params_wrapper.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87114 Approved by: https://github.com/zhaojuanmao commit f8dd27420ba9945589a4e1dea4f657d3ee68c46f Author: Andrew Gu Date: Fri Oct 21 11:30:58 2022 +0000 [FSDP][5/N] Update `FlatParamHandle` after FPW deprecation (#87113) This PR resolves a TODO left in `FlatParamHandle` that was conditional on deprecating `FlattenParamsWrapper`. We simply pass in the process group into the `FlatParamHandle` constructor instead of later in `shard()`. 
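A quick sketch of the backend check added in #87331 above ("fbgemm", "x86", "qnnpack", and "onednn" are the accepted strings per that commit):
```
from torch.ao.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")  # any of: fbgemm, x86, qnnpack, onednn
# get_default_qconfig("tensorrt")        # would now trip the new assertion
print(qconfig)
```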
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87113 Approved by: https://github.com/zhaojuanmao commit 214d51756ab8fad49639be9c20120e6e4384778b Author: Andrew Gu Date: Fri Oct 21 11:30:57 2022 +0000 [FSDP][4/N] Rework FPW test to not use FPW (#87112) Testing coverage is pretty much preserved except that we do not test on CPU, which is not a tangible loss for FSDP anyway. I renamed a few tests slightly, and I moved some helpers to be immediately below the corresponding test method. This makes it a bit easier to read. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87112 Approved by: https://github.com/zhaojuanmao commit 277e37f945da4d3e119223cd0ca8593101f593d5 Author: Andrew Gu Date: Fri Oct 21 11:30:57 2022 +0000 [FSDP][3/N] Register `flat_param` to wrapped module (#87086) This PR registers each `FlatParameter` to the wrapped module, eliminating `FlattenParamsWrapper` usage completely from FSDP. Registering each `FlatParameter` to the wrapped module is preferred over registering to the `FullyShardedDataParallel` instance for both functional-like and non-recursive wrapping. It simplifies the `FlatParameter` naming to be a function of the number of `FlatParameter`s per wrapped module instead of the number of `FlatParameter`s per FSDP instance. For now, we assume 1 `FlatParameter` per wrapped module, so we can simply use a single name `FLAT_PARAM = _flat_param`. From an implementation perspective, we raise some methods from `FlattenParamsWrapper` directly up to `FullyShardedDataParallel`. There will need to be further refactoring for functional-like and non-recursive wrapping. For example, the property `self._has_params -> bool` may need to change to a method `self._has_params(wrapped_module) -> bool`. Such changes are out of scope for this PR and will be done in follow-ups. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87086 Approved by: https://github.com/zhaojuanmao commit 9f8ef8eaff1970ccb87ba7a0c25787588c9c39ad Author: Andrew Gu Date: Fri Oct 21 11:30:56 2022 +0000 [FSDP][2/N] Remove `_fsdp_wrapped_module.flat_param` (#86122) This removes **direct** usages of `_fsdp_wrapped_module.flat_param` with `_handles[0].flat_param`. The preferred way to access the `flat_param` will be through the handle. We may converge to only storing `self._handles` and no longer `self.params` in the future. Right now, `self.params` is always exactly `[handle.flat_param for handle in self._handles]`. cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86122 Approved by: https://github.com/zhaojuanmao commit ce0c6e828ed2338df75017fa434fcb2744502024 Author: Brian Hirsh Date: Fri Oct 21 06:21:41 2022 -0700 Reland "add an API for external backends to register custom device names (#86992)" (#87453) Re-land of https://github.com/pytorch/pytorch/pull/86992 This reverts commit a895af92506f206889610251624590798d0deabd. 
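For the re-landed custom-device-name API above, a minimal sketch (assuming the entry point is the `rename_privateuse1_backend` helper under `torch.utils`):
```
import torch

# An out-of-tree backend that dispatches through the PrivateUse1 key can choose
# the device-type string its users see instead of the default "privateuseone".
torch.utils.rename_privateuse1_backend("my_accelerator")
```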
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87453 Approved by: https://github.com/ezyang, https://github.com/albanD commit 70c46d32e25b7e8b5c0e457d78292c8eb9634d5a Author: jyx-su <108294040+jyx-su@users.noreply.github.com> Date: Fri Oct 21 16:28:29 2022 +0000 Fix input dimension issue in RNN, LSTM, GRU error message (#87442) Fixes #86576 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87442 Approved by: https://github.com/albanD commit 0c1dec375fce6fb5f75f72ea88391eeada118805 Author: PyTorch MergeBot Date: Fri Oct 21 16:03:00 2022 +0000 Revert "Back out "Revert D40198461: [pytorch][PR] Backport currently dont work with some models if:" (#87124)" This reverts commit a42fbfa0cb467b582799a5132561c82a3d33b1b7. Reverted https://github.com/pytorch/pytorch/pull/87124 on behalf of https://github.com/ZainRizvi due to This is causing periodic jobs to fail commit d73d4aa7de953a3794593ac9e6d6b3a1ce514c3c Author: Edward Z. Yang Date: Fri Oct 21 05:54:15 2022 -0700 Audit for error prone isinstance int/float and add lint (#87345) We recently fixed a bug on symbolic-shapes branch where an isinstance(x, int) test failed when passed a SymIntNode. To prevent this, I've added a lint for all the codepaths where we may pass SymInt/SymFloat directly to reject direct isinstance int/float tests, and instead use one of the aliases. The lint rule explains the options. I then go and fix all of them. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87345 Approved by: https://github.com/bdhirsh, https://github.com/albanD commit 1285542f9b54972089655f91146e277c004762a2 Author: Peter Bell Date: Fri Oct 21 13:29:31 2022 +0100 OpInfo: Add test that sample_inputs_func returns a generator (#84567) This also includes a small list exception for single element lists since none of the memory usage or performance implications of lists apply there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84567 Approved by: https://github.com/lezcano, https://github.com/mruberry commit aa8248cc9a80fc7fc2e5981b8238271d9642eb40 Author: Masaki Kozuki Date: Fri Oct 21 15:05:36 2022 +0000 Reenable `isinstance` with `torch.distributed.ReduceOp` (#87303) tentatively marking as draft as I haven't gotten a comprehensive list of side effects... Ref: https://stackoverflow.com/questions/40244413/python-static-class-attribute-of-the-class-itself Rel: https://github.com/pytorch/pytorch/issues/87191 cc @kwen2501 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87303 Approved by: https://github.com/wanchaol commit d37dc6f69874ffac21390f5e78bf79c43631eb92 Author: Antonio Kim Date: Fri Oct 21 14:28:14 2022 +0000 Make LazyGraphExecutor extensible (#87218) Add `LazyGraphExecutor` to backend interface so that its is extensible by a vendor backend. I've made some preliminary methods virtual. Not sure if we want to make all methods in `LazyGraphExecutor` virtual. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87218 Approved by: https://github.com/wconstab, https://github.com/alanwaketan commit d80a5f9a963fdfb583ca21a1dc70c1355983da39 Author: Kazuaki Ishizaki Date: Fri Oct 21 14:22:20 2022 +0000 Fix typo under torch directory (#87274) This PR fixes typo in .md files under torch directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/87274 Approved by: https://github.com/albanD commit ae62cf7c02c009ab1123a3bfa08ca9a5e4255e4a Author: Nikita Shulga Date: Fri Oct 21 14:10:05 2022 +0000 [MPS] Revamp copy_to_mps_ implementation (#86956) Tensor's view in linear storage is represented by the following parameters: `.shape`, `.stride()` and `.storage_offset()`. Only tensors that are representable as 1d-views can be copied from host to device (and vice versa) using single [`copy(from:sourceOffset:to:destinationOffset:size:)`](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc) call. Modify `copy_to_mps_` function to do the following steps: - Cast `src` tensor to dst data type if needed - Expand `src` tensor to `dst` tensor shape - Clone `src` tensor if it is not stride contiguous (i.e. can not be represented by `src.view(src.numel())`) - Create an empty tensor if `dst` is not stride-contiguous or if its strides are different then potentially cloned `src` strides - Do 1d copy for `src` to (potentiall temp) `dst` - Finally do re-striding/copy on MPS if needed Add test to cover cases where stide-contiguous permuted tensor is copied to MPS, non-stride-contiguous tensor is copied to MPS and if permuted CPU tensor is copied to differently permuted MPS tensor Fixes https://github.com/pytorch/pytorch/issues/86954 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86956 Approved by: https://github.com/kulinseth commit 435e78e5237d9fb3e433fff6ce028569db937264 Author: Michael Voznesensky Date: Fri Oct 21 07:55:23 2022 +0000 [dynamo] [easy] RM spurious `)` (#87439) Fixes #ISSUE_NUMBER cc @jansel @lezcano @fdrocha @mlazos @soumith @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87439 Approved by: https://github.com/msaroufim, https://github.com/soumith commit ab901b48178d6f927f90009d71d7784a5d5627f2 Author: Sherlock Huang Date: Fri Oct 21 00:46:34 2022 +0000 Python binding for dispatcher getAllOpNames (#87422) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87422 Approved by: https://github.com/bdhirsh commit 7caeac17183a9aee0ccce4a3470925c6fe7e5007 Author: Soumith Chintala Date: Fri Oct 21 06:36:13 2022 +0000 [inductor] Fix channels_last conv2d propagation when CuDNN is not found (#87266) Fixes https://github.com/pytorch/torchdynamo/issues/1701 cc @jansel @lezcano @fdrocha @mlazos @voznesenskym @yanboliang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87266 Approved by: https://github.com/anijain2305, https://github.com/jansel, https://github.com/voznesenskym commit 6b59d9b566001cd7036ac06497372eae6238cdd4 Author: Antonio Kim Date: Fri Oct 21 05:12:23 2022 +0000 Fix registration hooks (#87369) There is a bug in the implementation of the registration hooks introduced in https://github.com/pytorch/pytorch/pull/86148 whereby if the hook returns a tensor, then the short circuiting logic: ``` value = hook(self, name, value) or value ``` Raises an exception ``` RuntimeError: Boolean value of Tensor with more than one value is ambiguous ``` Fixing the logic so that it only checks to see if the value is `None` before 
overriding Fixes #85837 CC: @albanD @jbschlosser Pull Request resolved: https://github.com/pytorch/pytorch/pull/87369 Approved by: https://github.com/albanD commit 8269bd8fb656e43250719091ac302b0eee289f22 Merge: 4bc2a0dcda c79309051a Author: mingfeima Date: Fri Oct 21 11:38:00 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit c79309051a85a75e087421bd087194a94a43acc6 Merge: 5df5c3e33e 6faa6c68e8 Author: mingfeima Date: Fri Oct 21 11:38:00 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit ff43288d31ea7f3de69f4907e2a36455c742d9c9 Author: Soumith Chintala Date: Fri Oct 21 03:14:28 2022 +0000 [AOT][CUDAGraphs] torchdynamo -> torch._dynamo (#87243) Fixes lingering issues from the torchdynamo -> torch._dynamo migration Pull Request resolved: https://github.com/pytorch/pytorch/pull/87243 Approved by: https://github.com/suo, https://github.com/voznesenskym, https://github.com/jansel commit 13ab819356e5a7b7deab1c486fdf36ba0906ebda Author: Richard Zou Date: Thu Oct 20 15:40:03 2022 -0700 [functorch] fix AOTAutograd tutorial (#87415) It was raising asserts previously Pull Request resolved: https://github.com/pytorch/pytorch/pull/87415 Approved by: https://github.com/Chillee commit b1cf377cceb44cb8f567d8ccd59b1d085b13ac50 Author: Bin Bao Date: Thu Oct 20 22:37:07 2022 +0000 Enable inductor CI for huggingface (#86792) Summary: Unit tests will be enabled after fixed in trunck. TorchBench and TIMM need more setup and are coming later. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86792 Approved by: https://github.com/jansel, https://github.com/huydhn commit 9ba632253a4e40749aa0589618c19dac1d0b7839 Author: Yanbo Liang Date: Fri Oct 21 01:24:00 2022 +0000 [Inductor] Convert 0d CPU tensor to scalar during triton codegen (#87329) This is a follow up to address [this](https://github.com/pytorch/torchdynamo/pull/1284#pullrequestreview-1130319129). We revised to use the codegen approach to handle 0d CPU tensor, which will not support cudagraph any more. cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87329 Approved by: https://github.com/ngimel commit 961ebca2255f902477e9ea7060b8f28781e3c0cd Author: Nikita Shulga Date: Fri Oct 21 01:09:50 2022 +0000 Add `weights_only` option to `torch.load` (#86812) This addresses the security issue in default Python's `unpickler` that allows arbitrary code execution while unpickling. Restrict classes allowed to be unpicked to in `None`, `int`, `bool`, `str`, `float`, `list`, `tuple`, `dict`/`OrderedDict` as well as `torch.Size`, `torch.nn.Param` as well as `torch.Tensor` and `torch.Storage` variants. Defaults `weights_only` is set to `False`, but allows global override to safe only load via `TORCH_FORCE_WEIGHTS_ONLY_LOAD` environment variable. To some extent, addresses https://github.com/pytorch/pytorch/issues/52596 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86812 Approved by: https://github.com/ezyang commit e3d73bbb07c1dd992a8a209b399c733b64bb8de8 Author: Jason Ansel Date: Thu Oct 20 17:35:49 2022 -0700 Remove jansel/voz from dynamo CODEOWNERS (#87430) Now that CC bot is working on PRs this is no longer needed. 
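As a usage note for the `weights_only` option added in #86812 above, the sketch below shows the intended safe-load path. It is illustrative only: the file name is made up, and the exact set of allowed classes is whatever the unpickler whitelist in that PR defines.

```python
import torch

sd = {"w": torch.randn(3, 3)}            # e.g. a model state_dict
torch.save(sd, "checkpoint.pt")          # hypothetical file name

# Safe path: restricts unpickling to basic containers plus Tensor/Storage-style types.
safe_sd = torch.load("checkpoint.pt", weights_only=True)

# Default path (weights_only=False) keeps the old behaviour, which can execute
# arbitrary code embedded in a malicious pickle -- exactly the risk the flag avoids.
```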
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87430 Approved by: https://github.com/voznesenskym commit bd1e95ce306956a748915a217c3ae9012469b0fa Author: Chien-Chin Huang Date: Thu Oct 20 12:30:09 2022 -0700 Improve the performance of validate_non_overlapping_shards_metadata (#85639) `validate_non_overlapping_shards_metadata()` uses a quadratic algorithm to verify overlapping. However, in some cases (only one dimension is sharded), an O(n log n) algorithm can easily be implemented. This PR changes the implementation of `validate_non_overlapping_shards_metadata()`. Differential Revision: [D39681725](https://our.internmc.facebook.com/intern/diff/D39681725/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85639 Approved by: https://github.com/wanchaol commit a42fbfa0cb467b582799a5132561c82a3d33b1b7 Author: Han Qi (qihqi) Date: Thu Oct 20 23:02:10 2022 +0000 Back out "Revert D40198461: [pytorch][PR] Backport currently dont work with some models if:" (#87124) Summary: reland after fixing windows build failure for OVR. Notable change: ```#if defined(FBCODE_CAFFE2) or defined(FB_XPLAT_BUILD) ``` changed to ```#if defined(FBCODE_CAFFE2) || defined(FB_XPLAT_BUILD) ``` Apparently `-DFB_XPLAT_BUILD` wasn't getting picked up on Windows if using `or` to connect the defines. Original commit changeset: 7a31fc4b455f Original Phabricator Diff: D40198461 Test Plan: waitforsandcastle Reviewed By: davidberard98, cccclai Differential Revision: D40290932 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87124 Approved by: https://github.com/gmagogsfm commit f38a88c4dd8ce006b9934d0d2f121fb93564479b Author: PyTorch MergeBot Date: Thu Oct 20 22:01:51 2022 +0000 Revert "[dynamo] use optimizers correctly in benchmarking (#87311)" This reverts commit 703c19008df4700b6a522b0ae5c4b6d5ffc0906f. Reverted https://github.com/pytorch/pytorch/pull/87311 on behalf of https://github.com/anijain2305 due to Bin (desertfire) is trying to get torchbench models in CI, and this PR prevents that. I will bring this back after models are in CI. commit a91abedf0d78c2582987a5a46472e84cb105d196 Author: Yanbo Liang Date: Thu Oct 20 21:59:12 2022 +0000 [Inductor] TorchInductor tracing fx_graph.py should import overrides (#87271) Running the generated script would fail if there are ops like ```philox_rand_like``` and ```philox_rand_like```.
cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87271 Approved by: https://github.com/jansel commit 1801b57cf6aeed7e4a859227ba2a080a16611fae Author: Catherine Lee Date: Thu Oct 20 21:50:20 2022 +0000 set ci in mps (#87325) dunno if installing xml runner like this is a good idea Pull Request resolved: https://github.com/pytorch/pytorch/pull/87325 Approved by: https://github.com/huydhn, https://github.com/malfet commit f7da9db9c174917f8f77b43c92f879cb7c29484d Author: Sherlock Huang Date: Wed Oct 19 20:13:16 2022 +0000 Unify decomp registries into global_decomposition_table (#86857) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86857 Approved by: https://github.com/ezyang commit 7e83f65ad502992a8d75c91eea2cf3de69bb0b7a Author: Svetlana Karslioglu Date: Thu Oct 20 21:02:09 2022 +0000 Add General Project Policies (#87385) Add General Project Policies to the Governance page Pull Request resolved: https://github.com/pytorch/pytorch/pull/87385 Approved by: https://github.com/orionr commit 17202b363780a06ae07e5cecceffaae6418ad6f8 Author: George Qi Date: Thu Oct 20 20:20:12 2022 +0000 [maskedtensor] fix docs formatting (#87387) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87387 Approved by: https://github.com/cpuhrsch commit bc8cf332447793bbcac7d0493e1b98acabfdb748 Author: samdow Date: Thu Oct 20 13:45:20 2022 -0400 add deprecation warning to nn stateless functional_call (#87367) Same as the release version but just for master Pull Request resolved: https://github.com/pytorch/pytorch/pull/87367 Approved by: https://github.com/albanD, https://github.com/atalman commit 9b88dcf248e717ca6c3f8c5e11f600825547a561 Author: Catherine Lee Date: Thu Oct 20 19:40:59 2022 +0000 [ci] handle libomp upgrade on github (#87382) like #86979, idk if this is a good idea but it seems to fix the problem Pull Request resolved: https://github.com/pytorch/pytorch/pull/87382 Approved by: https://github.com/seemethere commit 0826863962ef58c3b26c15c6745ba3049a05df06 Author: Richard Zou Date: Thu Oct 20 11:11:18 2022 -0700 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) Previously we claimed that "forward-mode AD coverage is not that good". We've since improved it so I clarified the statement in our docs and downgraded the warning to a note. Test Plan: - view docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/87383 Approved by: https://github.com/samdow commit 2fd008ed43c53a75d9a8d857546416ba2c45645d Author: Michael Voznesensky Date: Thu Oct 20 18:14:40 2022 +0000 [dynamo] Add support for invoking nn sequential (#87156) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87156 Approved by: https://github.com/jansel commit 68e946b0c37fc97e1de7320af4202464bd1880c9 Author: Horace He Date: Thu Oct 20 00:48:08 2022 +0000 Fixed tune_layout to not do anything for non-2d convolutions (#87328) cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87328 Approved by: https://github.com/ngimel commit b805e1abefd10efabff019e9bb5e3d7d8ba85660 Author: Richard Zou Date: Thu Oct 13 12:44:46 2022 -0700 [functorch] Fix torch.cat batching rule (#86932) The bug was discovered in https://github.com/pytorch/pytorch/pull/86842. torch.cat has an edge case where it ignores all tensors of shape [0]. So if any of the BatchedTensors have logical shape [0] but physical shape [B, 0], then we coerce them to shape [0] by slicing them. Why don't we just ignore those Tensors? 
We need to propagate requires_grad-ness somehow (e.g. if the BatchedTensor wraps a Tensor of shape [B, 0] that requires grad, then the output must require grad). Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/86932 Approved by: https://github.com/Chillee commit c16b7b41f76233ba930ce7dce6d31f1d362f7e86 Author: Taylor Robie Date: Wed Oct 19 20:53:38 2022 -0700 [Profiler][Trivial] Small style and safety fixes (#86752) I noticed a couple abbreviations in the new optimizer capture code that are worth expanding. I also made the RawTensorMetadata a bit safer. Differential Revision: [D40210702](https://our.internmc.facebook.com/intern/diff/D40210702/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86752 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit 1e4a274248959a08bd60529d313355dc837e36fe Author: Zachary DeVito Date: Thu Oct 20 00:03:00 2022 +0000 [dynamo] avoid popen.communicate() (#87335) It seems like when popen.communicate() is used it waits for all the descendants of popen to close the stdin/stderr. However, if we have worker processes running in the child, and the child segfaults, those processes will stay alive until someone waitpid's the child. Since those children have open handles to the stdin/stderr pipe, communicate never returns. This change just writes the output to temp files and directly calls wait() on the child, which returns as soon as it dies. cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87335 Approved by: https://github.com/anijain2305, https://github.com/voznesenskym commit 75a5a46aa005e1c56f5a7935003cb480f33f9257 Author: Zain Rizvi Date: Thu Oct 20 17:16:45 2022 +0000 Retry sccache downloads (#87306) This is meant to mitigate network flakiness like the one seen on [this build](https://github.com/pytorch/pytorch/actions/runs/3283124693/jobs/5407443872), which results in s3 refusing a connection and sccache failing to download. Adding the retry at the workflow level instead of the curl level since, as per the job, it doesn't seem like the curl command was retried at all. It's possible that the specific HTTP code returned during "Connection refused" isn't one of the ones that get retried, or the retries don't show on the console and a longer delay between retries was needed. Using the job level retry with a generous retry delay solves for both possibilities.
Sample error log: ``` Run sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache sudo chmod +x /usr/local/bin/sccache echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" echo "SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${GITHUB_ENV}" shell: /bin/bash -e {0} env: AWS_ACCESS_KEY_ID: *** AWS_SECRET_ACCESS_KEY: *** BUILD_ENVIRONMENT: macos-12-py3-x86-64 DEVELOPER_DIR: /Applications/Xcode_13.3.1.app/Contents/Developer CONDA_ENV: /Users/runner/work/_temp/conda_environment_3283124693 CONDA_RUN: conda run -p /Users/runner/work/_temp/conda_environment_3283124693 --no-capture-output CONDA_BUILD: conda run -p /Users/runner/work/_temp/conda_environment_3283124693 conda-build CONDA_INSTALL: conda install -p /Users/runner/work/_temp/conda_environment_3283124693 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 curl: (7) Failed to connect to s3.amazonaws.com port 443 after 86 ms: Connection refused Error: Process completed with exit code 7. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87306 Approved by: https://github.com/seemethere commit 4b757f4633494d7bbc55973f36f14aeca96387fa Author: Rui Zhu Date: Thu Oct 20 16:01:54 2022 +0000 Assert if padding mask type is unexpected (#86353) (#87106) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86353 Fix the issue described in https://github.com/pytorch/pytorch/issues/86120 Test Plan: buck test mode/opt caffe2/test:test_transformers -- test_train_with_long_type_pad Differential Revision: D40129968 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87106 Approved by: https://github.com/malfet commit 38543d8da0ddce0734ce1ecebb7013382508e142 Author: efiks <5167930+efiks@users.noreply.github.com> Date: Thu Oct 20 15:10:44 2022 +0000 [torch] Add fmsub to vectorization primitives (#86568) Summary: Add fmsub, which is similar to fmadd Test Plan: CI Differential Revision: D40215267 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86568 Approved by: https://github.com/ajtulloch, https://github.com/malfet commit a895af92506f206889610251624590798d0deabd Author: PyTorch MergeBot Date: Thu Oct 20 14:51:08 2022 +0000 Revert "add an API for external backends to register custom device names (#86992)" This reverts commit fb6826bfd82660aa905459f894c81d97d143dd2c.
Reverted https://github.com/pytorch/pytorch/pull/86992 on behalf of https://github.com/jeanschmidt due to breaking internal builds - D40534212 - arstudio-windows-tests-landcastle-0 commit 9199f9188c6150bebd73968b1539fdd1a12d1c98 Author: albanD Date: Wed Oct 19 18:33:17 2022 -0400 Add inplace function testing to test_proxy_tensor (#87324) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87324 Approved by: https://github.com/ezyang commit 254b681dc69c1d6e36864684e40ce850cb364b64 Author: albanD Date: Wed Oct 19 18:33:17 2022 -0400 Convert torch.Size() argument to sym size in test_proxy_tensor (#87304) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87304 Approved by: https://github.com/ezyang commit 9bd6ea5d76dfb20c90eeb6ee9328ba6b66014645 Author: albanD Date: Wed Oct 19 18:01:24 2022 -0400 Add meta inplace testing (#87291) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87291 Approved by: https://github.com/ezyang commit 2e08ac8696fee6e8e8ce876934b95dda1f491357 Author: albanD Date: Wed Oct 19 18:01:24 2022 -0400 Add randint OpInfo (#87231) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87231 Approved by: https://github.com/ezyang commit 8b704eddcd4cd646e7d084869e6bf20d5a7ebf40 Author: Bert Maher Date: Thu Oct 20 14:15:47 2022 +0000 Update the pinned triton hash (#87300) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87300 Approved by: https://github.com/jansel commit c4cf701889864dceba779569ae642cf95932538a Author: PyTorch MergeBot Date: Thu Oct 20 13:44:14 2022 +0000 Revert "[complex] conv_transpose2d (#81805)" This reverts commit 528dd05108cdac6726748c34e385b5c3136256df. Reverted https://github.com/pytorch/pytorch/pull/81805 on behalf of https://github.com/jeanschmidt due to Breaking internal builds - D40534110 - android-java-tests-0 commit 05ad7bd7433cb65d92802cb5c64fcab2c278f073 Author: PyTorch MergeBot Date: Thu Oct 20 13:17:11 2022 +0000 Revert "Advance nightly docker to 11.6 (#86941)" This reverts commit c5de535bc0b785abbacfebddf660af4cd3b2a6a1. Reverted https://github.com/pytorch/pytorch/pull/86941 on behalf of https://github.com/atalman due to Workflow is passing but installs CUDA 11.3 PyTorch rather then 11.6 commit 1b8af28fe883a58dcb1ae048ab60ad17162dcdb8 Author: Nikita Karetnikov Date: Thu Oct 20 11:02:06 2022 +0200 [primTorch] Add refs for `softmax`, `softmin`, `log_softmax` (#84956) cc @ezyang @mruberry @ngimel @Lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/84956 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 703c19008df4700b6a522b0ae5c4b6d5ffc0906f Author: Animesh Jain Date: Thu Oct 20 05:46:25 2022 +0000 [dynamo] use optimizers correctly in benchmarking (#87311) We were not setting optimizers correctly * This hid the issue that we see here - https://github.com/pytorch/torchdynamo/issues/1687 * This has also revealed that we are activating profilers for every dynamo optimized model call. 
This could affect speedup cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87311 Approved by: https://github.com/mlazos, https://github.com/yanboliang commit 8349bf1cd1d5df7be73b194940bcf96209159f40 Author: Horace He Date: Wed Oct 19 21:55:58 2022 +0000 Added special printing to FloorDiv so it's printed out with // insead of as a name (#87263) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87263 Approved by: https://github.com/ezyang commit b90db4a78f8d760377a81a5a64d03ab4b67599de Author: erjia Date: Thu Oct 20 05:05:53 2022 +0000 [DataPipe] Fix type checking to accept both Iter and Map DataPipe (#87285) Fixes https://github.com/pytorch/data/issues/841 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87285 Approved by: https://github.com/NivekT commit d94e33f041f37fafe333e491d1c07c8c285a2f58 Author: Antoni Viros i Martin Date: Thu Oct 20 03:46:48 2022 +0000 Add support for .to() for NestedTensor backends (#87146) Summary: This commit adds support for moving NestedTensors from CPU to GPU and back. The implementation includes requires implementing empty_like(), which is based on PR#83140. Test Plan: Added a new unit test based on the unit test for the main .to() implementation. All unit tests must pass, as well as every sandcastle job. Differential Revision: D40437585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87146 Approved by: https://github.com/drisspg commit 472bdb3aa84678b2faa4afe1cb5757f55e14ed9a Author: PyTorch MergeBot Date: Thu Oct 20 03:45:16 2022 +0000 [vision hash update] update the pinned vision hash (#87339) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87339 Approved by: https://github.com/pytorchbot commit c18eead2df44346df989088b18fe4e4a57c2d64e Author: soulitzer Date: Wed Oct 19 18:07:29 2022 -0400 Update saved variable hooks to no longer trigger on wrapped numbers (#87316) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87316 Approved by: https://github.com/ezyang, https://github.com/albanD commit 0cae309069858c1b4e92c1b5c345e28245eef1d3 Author: andrewor14 Date: Wed Oct 19 15:16:13 2022 -0700 [Quant] Add get_symmetric_qnnpack_qconfig_mapping (#87002) Summary: Today, in order to get XNNPACK quantized ops to work, the user must write some code that refers to private data structures (`_FIXED_QPARAMS_OP_TO_OBSERVER`) to create a QConfigMapping that is compatible with the symmetric constraints in the QNNPACK BackendConfig. This is because `get_default_qconfig("qnnpack")` produces a QConfig that does not satisfy these constraints, and the default QConfigMapping for QNNPACK uses this Qconfig. Instead, we simply put this code into a helper function to make it easier for the user to run XNNPACK quantized ops. In the future, once there is feature parity between the set of ops supported by QNNPACK and XNNPACK, we should revisit whether to simply change `get_default_qconfig("qnnpack")` to return an XNNPACK-compatible QConfig. 
Test Plan: python test/test_quantization.py TestQuantizeFx.test_symmetric_qnnpack_qconfig_mapping Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/87002 Approved by: https://github.com/vkuzo commit e6bc8f415b5bd5b576123ef004021130751b3894 Author: Huy Do Date: Thu Oct 20 02:13:11 2022 +0000 [BE] Move conda cmake installation to Docker (#87309) This is parts of the effort to consolidate pip and conda installation in the CI to improve our CI reliability. This moves conda cmake installation to Docker in those use cases that require it: * Ubuntu bionic and focal On the other hand: * XLA doesn't seem to need conda cmake anymore (Build and test successfully) * Centos is not in used anywhere in the CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/87309 Approved by: https://github.com/ZainRizvi, https://github.com/malfet commit 0d2c2110f178da19aaf89259a2034c9c0653fcee Author: Zachary DeVito Date: Wed Oct 19 14:16:54 2022 -0700 [allocator] Introduce the abstract class CUDACachingAllocator (#87251) This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87251 Approved by: https://github.com/wconstab commit 888e15408e861fcaa6b2bfaa2130cb96e90ffa24 Author: Huy Do Date: Thu Oct 20 01:04:42 2022 +0000 Fix wrong lintrunner version (#87295) The syntax is invalid for pip. I missed this a while back: ``` Run pip install -r .github/requirements-gha-cache.txt ERROR: Invalid requirement: 'lintrunner=0.9.2' (from line 11 of .github/requirements-gha-cache.txt) Hint: = is not a valid operator. Did you mean == ? ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87295 Approved by: https://github.com/ZainRizvi commit bd757b364c92b778533dde51a723f5b6278517e0 Author: Horace He Date: Wed Oct 19 03:19:22 2022 +0000 Ensure that symbolic variables incorporate fresh constraints before they're used (#87254) cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87254 Approved by: https://github.com/jansel commit bcde75427e89df07a5744e64ca9271d1c53e8a7e Author: Sahan Paliskara Date: Wed Oct 19 13:36:52 2022 -0700 run torch::deploy test using pip install (#86507) This PR runs the unit tests for [multipy](https://github.com/pytorch/multipy) in pytorch core such that we are able to make sure changes in core do not break multipy as adding `_prims` did. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86507 Approved by: https://github.com/anirbanr-fb-r2p, https://github.com/d4l3k commit 07bd053a7ef92263db8d612f4fc7c28e06ade45c Author: Rohan Varma Date: Tue Oct 18 10:56:04 2022 -0700 [rpc] Wrap exception creation with try/catch (#87224) Sometimes, we cannot recreate the exception with only string (for example if it is a custom exception type). Ideal situation would be to carry over all details on how to recreate the remote end's exception and throw that on client, but for now, we raise a RuntimeError with the original error msg when we cannot reconstruct. 
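The try/catch around exception creation described in #87224 above amounts to the following pattern; this is a rough Python sketch with invented names, not the actual RPC internals.

```python
def rebuild_remote_exception(exc_type, msg):
    # Custom exception types may not be constructible from a single string,
    # so re-creating the remote error can itself throw.
    try:
        return exc_type(msg)
    except Exception:
        # Fall back to a RuntimeError that still carries the original message.
        return RuntimeError(f"{exc_type.__name__}: {msg}")
```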
Created from CodeHub with https://fburl.com/edit-in-codehub Differential Revision: [D40353274](https://our.internmc.facebook.com/intern/diff/D40353274/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87224 Approved by: https://github.com/fduwjj commit c97ffcff464fad4aa12a86b99a75d491071cd575 Author: Edward Z. Yang Date: Wed Oct 19 15:39:12 2022 -0400 [discussion] fix for aot autograd outputs that dont require grad (#86838) Fixes https://github.com/pytorch/functorch/issues/1052 I got here after some discussion with Alban. Today, if you aot_function() trace a program where some of its inputs have `requires_grad=True`, but some outputs are expected to have `requires_grad=False`, we will incorrectly set all outputs to have `requires_grad=True`. A simple solution is to use autograd.function's API for marking outputs as non-differentiable, based on what we witnessed when we traced the forward. This will make the `autograd.Function` that we return **wrong**, if you created it using inputs that required grad, and tried to re-use it with inputs that have different `requires_grad` field. But as long as we're hiding behind dynamo, which should guard on requires_grad, then we'll re-run `aot_function()` and get out a new compiled function that does the right thing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86838 Approved by: https://github.com/ezyang commit c9b618447d7c948003f26c3b49c28cdc193bd3f0 Author: Michael Lazos Date: Wed Oct 19 22:44:01 2022 +0000 Fix line numbers bug (#87247) Fixes https://github.com/pytorch/torchdynamo/issues/1462 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87247 Approved by: https://github.com/anijain2305, https://github.com/jansel commit c8889f4e109866610bd1981f03deee8f102b5ce6 Author: Nikita Shulga Date: Wed Oct 19 22:15:28 2022 +0000 `cuda._is_in_bad_fork`->`_C._cuda_isInBadFork` (#87317) Former is always available, while later is only available if PyTorch compiled with CUDA And if it does, then ``` $ python -c "import torch;print(torch._C._cuda_isInBadFork == torch.cuda._is_in_bad_fork)" True ``` Fixes https://github.com/pytorch/torchdynamo/issues/1709 ( at least the symptom) cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87317 Approved by: https://github.com/voznesenskym, https://github.com/albanD, https://github.com/soumith, https://github.com/jansel commit 56b150ac63653f982c2b4aaa61336e5f6ecd1e4c Author: Yanbo Liang Date: Wed Oct 19 22:13:07 2022 +0000 [Dynamo] Support optimizing over any Tensor with requires_grad = True (#87141) Fixes https://github.com/pytorch/torchdynamo/issues/1604 Re-submit for https://github.com/pytorch/torchdynamo/pull/1646 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87141 Approved by: https://github.com/jansel commit 12b2f70a89494b2ad374aaa16b8fdbf16da66e57 Author: albanD Date: Wed Oct 19 11:27:42 2022 -0400 Symintify pad ops (#87046) Following comments below, we need to add support for `std::negate`/`std::min`/`std::max`/`operator-` for SymInt Pull Request resolved: https://github.com/pytorch/pytorch/pull/87046 Approved by: https://github.com/ezyang commit c5de535bc0b785abbacfebddf660af4cd3b2a6a1 Author: atalman Date: Wed Oct 19 21:26:53 2022 +0000 Advance nightly docker to 11.6 (#86941) Fixes following: https://github.com/pytorch/pytorch/actions/runs/3242695506/jobs/5316334351 crash in Docker builds introduced by: #82682 The PR seems to introduce some changes not compatible with cuda 11.3 which is used by our Docker builds 
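The requires_grad fix in #86838 above leans on `autograd.Function`'s existing ability to mark individual outputs as non-differentiable. A self-contained sketch of that mechanism (not the aot_autograd code itself) looks like this:

```python
import torch

class Double(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x * 2
        flag = torch.zeros(1)               # an output that should never require grad
        ctx.mark_non_differentiable(flag)   # keeps flag.requires_grad == False
        return y, flag

    @staticmethod
    def backward(ctx, grad_y, grad_flag):
        return grad_y * 2

y, flag = Double.apply(torch.randn(3, requires_grad=True))
assert y.requires_grad and not flag.requires_grad
```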
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86941 Approved by: https://github.com/malfet commit 6eeeb8817229e7df054db38337cd944b6e2daaad Author: Peter Bell Date: Wed Oct 19 17:00:52 2022 +0100 OpInfo: Sample input cleanup (4/n) (#86324) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86324 Approved by: https://github.com/mruberry commit c141f28b648ee3c6cb0a7286f0aa100297417e74 Author: albanD Date: Wed Oct 19 20:56:37 2022 +0000 Fix compilation warning and spurious print (#87297) Fixes compilation warning, make this warning an error and remove a random print. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297 Approved by: https://github.com/malfet commit 4a533f12157ffb5c05c142490e4ceaa311981b38 Author: Nikita Shulga Date: Wed Oct 19 20:51:32 2022 +0000 Tweak several test serialization to store models state_dict (#87143) Namely, change: - `test_meta_serialization` - `test_serialization_2gb_file` - `test_pathlike_serialization` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87143 Approved by: https://github.com/ezyang commit cf2be34ff5d854a5afcdc4e88aa468aaeb5d47db Author: George Qi Date: Wed Oct 19 18:27:21 2022 +0000 [maskedtensor] add docs (#84887) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84887 Approved by: https://github.com/cpuhrsch commit cd2161352675c2a02d7c651374511fcef2ef83e7 Author: PyTorch MergeBot Date: Wed Oct 19 20:36:55 2022 +0000 Revert "[primTorch] Add refs for `softmax`, `softmin`, `log_softmax` (#84956)" This reverts commit c09ca93e4733fdf0183433114dda2fc30a846700. Reverted https://github.com/pytorch/pytorch/pull/84956 on behalf of https://github.com/ZainRizvi due to This is causing the MPS test test_output_match_log_softmax_with_dtype_cpu_float32 (__main__.TestConsistencyCPU) to fail commit c08c7997503fbe8472a957f712322d5fb5fa11bf Author: Chien-Chin Huang Date: Wed Oct 19 09:05:48 2022 -0700 [FSDP] Add set_state_dict_type API to setup state_dict_type without using context manager (#86243) FSDP.state_dict_type is a context manager. However, users may want to decide what state_dict is going to used during initialization. `set_state_dict_type` allows users to do so. Differential Revision: [D40083670](https://our.internmc.facebook.com/intern/diff/D40083670/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86243 Approved by: https://github.com/rohan-varma commit f3cc588d09f62471a46d555368f1627932e1812f Author: PyTorch MergeBot Date: Wed Oct 19 18:57:24 2022 +0000 Revert "Dynamo FX graph stack traceback fix (#87136)" This reverts commit 89e6078bc3d83b61e03511304ec42743b84df42e. Reverted https://github.com/pytorch/pytorch/pull/87136 on behalf of https://github.com/clee2000 due to causing a lot of tests to fail on master even though pr is green commit c09ca93e4733fdf0183433114dda2fc30a846700 Author: Nikita Karetnikov Date: Wed Oct 19 05:08:27 2022 +0200 [primTorch] Add refs for `softmax`, `softmin`, `log_softmax` (#84956) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84956 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 00c91f4446d91b04f2313632a4d45addcb9e6950 Author: Zachary DeVito Date: Tue Oct 18 17:27:21 2022 -0700 [allocator] disable tests that don't work for cudaMallocAsyncAllocator (#87250) Two tests were failing locally for me and don't appear to be run in our CI. Disabling them so we can otherwise refactor the allocators. 
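To make the `set_state_dict_type` addition in #86243 above concrete, here is a minimal sketch; it assumes an initialized process group, uses a toy module, and omits the optional config arguments the API also accepts.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

model = FSDP(torch.nn.Linear(8, 8).cuda())   # assumes dist.init_process_group() already ran

# Before: the choice had to be wrapped around every state_dict() call:
#   with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
#       sd = model.state_dict()
# Now it can be configured once, e.g. at initialization time:
FSDP.set_state_dict_type(model, StateDictType.FULL_STATE_DICT)
sd = model.state_dict()
```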
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87250 Approved by: https://github.com/wconstab commit 15ca68526cf012dc10a1c450b30dba23643588d3 Author: Richard Zou Date: Tue Oct 18 12:53:23 2022 -0700 [functorch] Get rid of defunct functorch/setup.py (#87235) We initially left it there for BC concerns. - It has been more than a month since then, - I have migrated folks who used the previous install command (pip install ...pytorch.git@subdir=functorch) off of it so it's time to get rid of it Test Plan: - code reading Pull Request resolved: https://github.com/pytorch/pytorch/pull/87235 Approved by: https://github.com/Chillee commit ac80da2293179ac69dc346b6d15d9f7f7ba154f7 Author: Richard Zou Date: Tue Oct 18 12:51:18 2022 -0700 [functorch] add test for torch.manual_seed inside grad transform (#87233) I can see this behavior regressing really easily, so adding a test for it. Test Plan: - run test Pull Request resolved: https://github.com/pytorch/pytorch/pull/87233 Approved by: https://github.com/Chillee commit f56ce8dbad728ca59a29b7dd089f5a705a40f70d Author: Zachary DeVito Date: Tue Oct 18 13:24:52 2022 -0700 [allocator] Move getFreeMutex (#87237) It isn't used at all the allocators and this change makes that more clear. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87237 Approved by: https://github.com/wconstab commit 89e6078bc3d83b61e03511304ec42743b84df42e Author: William Wen Date: Wed Oct 19 17:15:43 2022 +0000 Dynamo FX graph stack traceback fix (#87136) Migration from https://github.com/pytorch/torchdynamo/pull/1655. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87136 Approved by: https://github.com/voznesenskym commit 40d0fa53149c6d50a199585f7498db9ec93f98ef Author: atalman Date: Wed Oct 19 17:09:37 2022 +0000 Reenable aot tests on windows for cuda 11.7 and up (#87193) Reenable aot tests on windows for cuda 11.7 and up Issue: https://github.com/pytorch/pytorch/issues/69460 seems to be mitigated in CUDA 11.7 hence re-enable this test cc @peterjc123 @mszhanyi @skyline75489 @nbcsm Pull Request resolved: https://github.com/pytorch/pytorch/pull/87193 Approved by: https://github.com/malfet commit 86a581928a4f5065a79771a7a2d87c6999c452e9 Author: Huy Do Date: Wed Oct 19 17:01:09 2022 +0000 Pin ios conda dependencies (#87229) I also pin blas to 1.0 instead of the newer 2.116 available elsewhere (https://anaconda.org/conda-forge/blas) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87229 Approved by: https://github.com/izaitsevfb, https://github.com/ZainRizvi, https://github.com/malfet commit a79e034d89d3d112fcb8d16f7a6862934a44955d Author: Nikita Shulga Date: Wed Oct 19 17:00:10 2022 +0000 [MPS] Do not dispatch empty job in `bitwise_not` (#87286) Follows the pattern from https://github.com/pytorch/pytorch/pull/85285 and returns before computing dispatching an empty metal kernel for bitwise not operation. Fixes crash when invoked with empty MPS tensor on AMD GPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/87286 Approved by: https://github.com/kulinseth commit 6775c3e19d74f01841584dbb1d71fc84fe991455 Author: Natalia Gimelshein Date: Wed Oct 19 16:55:27 2022 +0000 fix 0d cpu tensor handling when it's the first arg (#87273) Fixes https://github.com/pytorch/torchdynamo/issues/1681 When at least one of the pw args is on cuda, set device to cuda. We assume that cases of true device mismatch have been already weeded out during tracing, and what we have is 0d cpu tensor + cuda tensor interop. 
Also fix 0d tensor test that previously wasn't compiling with dynamo. cc @jansel @lezcano @fdrocha Pull Request resolved: https://github.com/pytorch/pytorch/pull/87273 Approved by: https://github.com/soumith, https://github.com/voznesenskym commit fb6826bfd82660aa905459f894c81d97d143dd2c Author: Brian Hirsh Date: Tue Oct 18 16:13:27 2022 -0700 add an API for external backends to register custom device names (#86992) This API adds some improvements to external backends who are building C++ backends out of tree using the `PrivateUse1` dispatch key. The docs and linked examples go over the API in more detail, but you should be able to use it like: ``` > torch.register_privateuse1_backend("foo")` > a = torch.ones(2, device="foo") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86992 Approved by: https://github.com/albanD commit cc64863d71dd31e74c42b190a6a2dfd5de0305e6 Author: William Wen Date: Wed Oct 19 16:39:12 2022 +0000 Clean Inductor complication cache during dynamo dashboard run (#87246) Implement improvement from https://github.com/pytorch/torchdynamo/issues/1644. Tested by running `python benchmarks/dynamo/runner.py --print_run_commands --training` and inspecting the generated `run.sh` file for the `--cold_start_latency` flag, e.g. ``` python benchmarks/dynamo/torchbench.py --performance --float32 -dcuda --output=benchmark_logs/inductor_torchbench_float32_training_cuda_performance.csv --training --inductor --no-skip --dashboard -x fambench_xlmr -x detectron2_fasterrcnn_r_50_c4 -x detectron2_fasterrcnn_r_50_dc5 -x detectron2_maskrcnn_r_101_fpn -x detectron2_maskrcnn_r_50_fpn -x detectron2_fasterrcnn_r_50_fpn -x detectron2_maskrcnn -x detectron2_fasterrcnn_r_101_dc5 -x opacus_cifar10 -x detectron2_maskrcnn_r_101_c4 -x pyhpc_turbulent_kinetic_energy -x maml -x detectron2_fasterrcnn_r_101_fpn -x pyhpc_equation_of_state -x detectron2_fasterrcnn_r_101_c4 -x pyhpc_isoneutral_mixing --cold_start_latency ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87246 Approved by: https://github.com/anijain2305, https://github.com/jansel commit b3071e2eb61fbbad9b36d7022855111efc6c37f4 Author: Brian Hirsh Date: Tue Oct 18 18:29:15 2022 -0700 functionalization: skip meta reference compute for aot autograd (#87108) The context is that historically, XLA/LTC tensors haven't had accurate stride information, and functionalization would run "reference" meta kernels for view ops on the side to properly compute strides. This is more complicated in symint tracing world - we have a `FunctionalTensorWrapper()` that wraps the underlying tensor and has its own set of sizes/strides metadata, but we never create proxy objects for the sizes/strides of the wrapper. In symint tracing world with aot autograd, we're guaranteed that our underlying strides are accurate anyway, since aot autograd uses fake tensors to perform tracing. We encountered a few bugs with symint's from the `FunctionalTensorWrapper` making their way into `__torch_dispatch__`. To side-step that area of bugs completely (and marginally improve perf), this PR disables the meta tensor tracing for non XLA/LTC use cases. 
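The "0d cpu tensor handling" fix (#87273) a few commits above targets the interop pattern sketched here; eager mode already accepts it, and the point of the fix is that the inductor lowering picks the CUDA device instead of tripping over the CPU scalar when it is the first argument.

```python
import torch

alpha = torch.tensor(0.5)                  # 0-d tensor that lives on CPU
x = torch.rand(1024, device="cuda")

# Mixing a 0-d CPU tensor with a CUDA tensor in a pointwise op is legal;
# the compiled kernel must run on CUDA even though the first arg is on CPU.
y = alpha * x
```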
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87108 Approved by: https://github.com/ezyang, https://github.com/wconstab commit 4801397b6ee2a82098b059b40294039d9d350eaa Author: Brian Hirsh Date: Tue Oct 18 18:29:15 2022 -0700 ban .sizes() and .strides() calls in derivatives.yaml (#86611) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86611 Approved by: https://github.com/wconstab, https://github.com/albanD commit 182ee8799675dee562e8589f4420184c267a8db0 Author: anjali411 Date: Wed Oct 19 12:28:02 2022 +0000 symintify nll loss fns (#86915) (#87095) This reverts commit bbd7b38d5580c44ffb4404d431e07bc2316e59d5. Reland https://github.com/pytorch/pytorch/pull/86915 with a fix for python arg parser handing for SymInt and SymIntList. This was uncovered because we are calling directly into python bindings code through test_autocast.py (`torch._C._nn.nll_loss`) without providing a value for the optional symint arg (`ignore_index`). The arg parser constructs the SymInt and SymIntList using the recorded "default_int" or "default_int_list" (schema string parsing) in case a value is not received for an optional argument. Since we weren't handling the symint case properly, the default_int just had a garbage value which was later being used to construct SymInt. Follow up issue for other unhandled parameter types: https://github.com/pytorch/pytorch/issues/87283 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87095 Approved by: https://github.com/ezyang, https://github.com/albanD commit c6187ea326e6bfb2054e271c8fed23f14ab53615 Author: leizhenyuan Date: Wed Oct 19 13:24:48 2022 +0000 add support for pin memory on xpu device (#86545) add support for pin memory on xpu device Pull Request resolved: https://github.com/pytorch/pytorch/pull/86545 Approved by: https://github.com/ezyang commit 528dd05108cdac6726748c34e385b5c3136256df Author: kshitij12345 Date: Wed Oct 19 09:12:27 2022 +0000 [complex] conv_transpose2d (#81805) Reference: https://github.com/pytorch/pytorch/issues/71108 Fixes : #86414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81805 Approved by: https://github.com/anjali411 commit 232fbd90ff6d93362120d955befeeb297179ddad Author: XiaobingSuper Date: Sun Oct 16 22:54:57 2022 -0400 [TorchDynamo]: fused bias for cpu convolution path (#87050) For aten.convolution CPU path, the bias always can be fused, so this PR adds a device check: if inputs' device is CPU, we will fuse it for a good performance. 
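The CPU convolution bias fusion in #87050 above applies to the ordinary case sketched below; whether the bias is actually folded depends on the inductor lowering, this only shows the pattern the device check targets.

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3, bias=True)
x = torch.rand(1, 3, 32, 32)   # CPU input: the bias add can always be fused into the conv
y = conv(x)
```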
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87050 Approved by: https://github.com/jgong5, https://github.com/jansel commit 5e23074f0d8538ba00645f08a48cc12bf5ae3a8e Author: Horace He Date: Wed Oct 19 02:07:13 2022 +0000 Fixed FakeTensor not calling CompositeImplicitAutograd decomps sometimes (#87252) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87252 Approved by: https://github.com/ezyang, https://github.com/bdhirsh commit b5bdc34541a407390b7f9bd3dcc97b1d7b982c7f Author: Jason Ansel Date: Wed Oct 19 06:32:42 2022 +0000 [inductor] Sympy compability fix (#87249) Test Plan: github tests Reviewed By: yf225, voznesenskym Differential Revision: D40495411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87249 Approved by: https://github.com/ngimel, https://github.com/voznesenskym commit 6faa6c68e8b76fb68f3a2b2783685102d0e87c00 Author: Chiao Date: Wed Oct 19 05:11:29 2022 +0000 fsdp lazy_init typo (#87184) Minor typo, changed with -> without Pull Request resolved: https://github.com/pytorch/pytorch/pull/87184 Approved by: https://github.com/awgu commit 2418ddb1ecf609b6e302257bfc10c62db1dc147e Author: Horace He Date: Wed Oct 19 01:24:38 2022 +0000 Unified symbolic shape variables between Inductor and AOTDispatcher (#87161) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87161 Approved by: https://github.com/jansel commit 48df4b7a1ddcb0a60d97e24a22cf3b3e6ad9d378 Author: PyTorch MergeBot Date: Wed Oct 19 04:12:52 2022 +0000 [vision hash update] update the pinned vision hash (#87100) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87100 Approved by: https://github.com/pytorchbot commit dfe3fc028c7c0e9a40701dcd5d6c72c20e35b690 Author: Nikita Shulga Date: Wed Oct 19 03:35:16 2022 +0000 [CI] Add triton wheels build workflow (#87234) Also, add `torchtriton` and `jinja2` as extra `dynamo` dependency to PyTorch wheels, Version packages as first 10 characters of pinned repo hash and make `torch[dynamo]` wheel depend on the exact version it was build against. TODO: Automate uploading to nightly wheels storage Pull Request resolved: https://github.com/pytorch/pytorch/pull/87234 Approved by: https://github.com/msaroufim commit c413a32135b745d29e555069d7cd8f6e6527b59f Author: David Berard Date: Mon Oct 17 08:40:21 2022 -0700 Release note script: match topics with spaces or underscores (#87011) e.g. 
match "new features" in the category as "new_features" Pull Request resolved: https://github.com/pytorch/pytorch/pull/87011 Approved by: https://github.com/albanD, https://github.com/soulitzer commit c471c29fdccc3fe48a78083c638a4a88559488b4 Author: Driss Guessous Date: Wed Oct 19 02:16:29 2022 +0000 Update sdp guards for performance (#87241) Makes the contiguous check for the nt input more strict/correct as well as makes some performance improvements to the checks Pull Request resolved: https://github.com/pytorch/pytorch/pull/87241 Approved by: https://github.com/cpuhrsch commit 6d0d7afe8d5ed7a701d634729dc7be9d0ef4a4b2 Author: Nikita Shulga Date: Wed Oct 19 02:11:54 2022 +0000 [GHA][BE] Delete unused macros from `common.yml.j2` (#87253) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87253 Approved by: https://github.com/huydhn commit 31e731e5aeffcdf22b4a20f7b9f716694151fe0a Author: Michael Suo Date: Tue Oct 18 14:58:23 2022 -0700 [dynamo] fix logging (#87239) Currently, setting `torch._dynamo.config.log_level` doesn't do anything, as the module name has changed during the move. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87239 Approved by: https://github.com/jansel, https://github.com/soumith, https://github.com/mlazos commit 7ff1ca4e33df951653c116621bbade88941cb2bd Author: Tongzhou Wang Date: Wed Oct 19 00:25:02 2022 +0000 Add type annotation to get_worker_info (#87017) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87017 Approved by: https://github.com/ejguan, https://github.com/NivekT commit 4dc579838be94c343cc8542c7a80b9a9a8c15b51 Author: Yidi Wu Date: Wed Oct 19 00:12:59 2022 +0000 Allow fx.Graph.owning_module to be used as attribute. (#86822) Summary: The current behavior of owning_module setter is difficult to understand: it changes the owning_module to None if owners is not 0 but increments the owners count. If the owning_module is None, the owners count should be 0 as none of them is accessible. On the other hand, if the owners count increases, the owning_module should be a collection (e.g. a list). This diff changes owning_module to be a normal attribute. The semantic is that graph can have **at most one** owning module and can be assigned to new module. The alternative is to use a list to represent the owning_modules of a graph but it breaks backward compatibility and the exact use cases of having multiple owning_modules are not clear. Test Plan: Test with CI. 
Differential Revision: D40200624 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86822 Approved by: https://github.com/tugsbayasgalan commit 3eb742938578b752ca03d0f9962158dcb0edd343 Author: Seonglyong Gong Date: Wed Oct 19 00:00:10 2022 +0000 [Profiler][trivial] Add profiler options to trace metadata (#87102) Summary: Add profiler options (`profile_memory`, `record_shapes`, `with_stack`, `with_modules`, and `with_flops`) to trace metadata Test Plan: CI tests Differential Revision: D40373514 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87102 Approved by: https://github.com/aaronenyeshi commit f6c6048b1086f291ac9934ee1927270eba5a6519 Author: Christian Puhrsch Date: Tue Oct 18 23:11:47 2022 +0000 Use CUTLASS GEMM for NT bmm (#85894) Copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg commit 80790ecee4f04de4bf1675fec8a2593d7a2b32c0 Author: Jane Xu Date: Tue Oct 18 23:01:28 2022 +0000 [einsum] Call view instead of sum to remediate MPS regression (#87135) Fixes #87010. It turns out that squeeze is much faster than sum, and view is faster than squeeze, so we should default to that whenever possible. Benchmarking results show that, on MPS, we would be going from the following code taking **29.89ms instead of the current 1466ms, almost a 50x speedup**. ``` q = torch.rand(16, 4096, 40, device='mps', dtype=torch.float) k = torch.rand(16, 4096, 40, device='mps', dtype=torch.float) torch.einsum('b i d, b j d -> b i j', q, k).max().item() ``` And a regular einsum will now take **.506ms instead of 2.76ms.** ``` q = torch.rand(16, 4096, 40, device='mps', dtype=torch.float) k = torch.rand(16, 4096, 40, device='mps', dtype=torch.float) torch.einsum('b i d, b j d -> b i j', q, k) ``` Special thanks to @soulitzer for helping me experiment + figure out how to squash the remaining 5x regression due to squeeze being slower than view!! Pull Request resolved: https://github.com/pytorch/pytorch/pull/87135 Approved by: https://github.com/soulitzer, https://github.com/malfet, https://github.com/albanD commit c4a03e4da19c643b8321a4a0ba0863259498ca7b Author: Jane Xu Date: Tue Oct 18 22:58:44 2022 +0000 [einsum] keep the promise that we contract left to right (#87199) We promise that if path is not defined, we would go left to right. The previous code did not keep that promise as we push'd combined ops to the back of the list. For most use cases this is fine (einsum with 3 or fewer inputs), but we should do what we say. Test plan: Added a print statement to print the sizes of ops we're contracting to see if the order is fixed. Code run: ``` import torch a = torch.rand(1) b = torch.rand(2) c = torch.rand(3) d = torch.rand(4) torch.einsum('a,b,c,d->abcd', a,b,c,d) ``` BEFORE--it does a+b, then c+d, then a+b+c+d, which...is right, but it's not the order specified by the user. ``` /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 1, 1]and b: [1, 2, 1, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) return _VF.einsum(equation, operands) # type: ignore[attr-defined] /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 3, 1]and b: [1, 1, 1, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) 
return _VF.einsum(equation, operands) # type: ignore[attr-defined] /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 1, 1]and b: [1, 1, 3, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) return _VF.einsum(equation, operands) # type: ignore[attr-defined] ``` WITH THIS CHANGE--it actually goes left to right: a+b, a+b+c, a+b+c+d ``` /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 1, 1]and b: [1, 2, 1, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) return _VF.einsum(equation, operands) # type: ignore[attr-defined] /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 1, 1]and b: [1, 1, 3, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) return _VF.einsum(equation, operands) # type: ignore[attr-defined] /Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 3, 1]and b: [1, 1, 1, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.) return _VF.einsum(equation, operands) # type: ignore[attr-defined] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87199 Approved by: https://github.com/soulitzer commit d06d569e90f3ca3e721b679be285385e5bd3eea9 Author: Driss Guessous Date: Tue Oct 18 21:38:43 2022 +0000 Update the sdp benchmark to work with nested tensors (#87215) Update the sdp benchmark to work with nested tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/87215 Approved by: https://github.com/cpuhrsch commit e8c4adf3c3b8e479d240c3160d85fde68808e92c Author: Christian Puhrsch Date: Tue Oct 18 21:07:57 2022 +0000 Add torch.sparse overview section (#85265) The goal of this section is to provide a general overview of how PyTorch handles sparsity for readers who are already familiar with sparse matrices and their operators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85265 Approved by: https://github.com/jisaacso commit 31edccf6c7080ccfbdce93613ba2deadaaf3b0b0 Author: PyTorch MergeBot Date: Tue Oct 18 21:03:23 2022 +0000 Revert "Temporarily disable ios jobs (#87186)" This reverts commit d29dc2b72a6cb5fb24ff3eacd816e08bd16298dc. Reverted https://github.com/pytorch/pytorch/pull/87186 on behalf of https://github.com/huydhn due to Official conda channel is back and conda-forge has been reverted commit 223ad9bc9e7a0af5bf37587933f81da43cf84868 Author: Catherine Lee Date: Tue Oct 18 20:57:55 2022 +0000 [ci] remove circleci mac jobs (#87225) mac jobs are run on every pr after approval, so these are redundant ios jobs can stay until the end of the year because they are on periodic and not run on every pr Pull Request resolved: https://github.com/pytorch/pytorch/pull/87225 Approved by: https://github.com/malfet, https://github.com/ZainRizvi, https://github.com/janeyx99 commit 9a786202b704b9488fdc9e5163ff6af88510d56f Author: Catherine Lee Date: Tue Oct 18 20:57:27 2022 +0000 [ci] fix log printing (#87223) idk how i missed this example https://github.com/pytorch/pytorch/actions/runs/3275717751/jobs/5391093040 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87223 Approved by: https://github.com/malfet, https://github.com/kit1980, https://github.com/janeyx99 commit afa508607827aa3397165f4d6cde0180369cc3ba Author: PyTorch MergeBot Date: Tue Oct 18 20:54:06 2022 +0000 Revert "Install blas from conda-forge (#87150)" This reverts commit f02f0e3ad1565e3da1e78efaa994e80c7577fd0c. 
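Since the torch.sparse overview added in #85265 above is aimed at readers already comfortable with sparse matrices, here is a quick COO refresher using only the public API (a generic example, not text from the new docs section):

```python
import torch

indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])        # (row, col) coordinates of the nonzeros
values = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(indices, values, (2, 3))

print(s.to_dense())
# tensor([[0., 0., 3.],
#         [4., 0., 5.]])
```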
Reverted https://github.com/pytorch/pytorch/pull/87150 on behalf of https://github.com/huydhn due to Conda issue has been resolved upstream https://github.com/pytorch/pytorch/issues/87148 commit e7cefff05830fa1209daec4bc004e0ba1c1277b2 Author: Aaron Enye Shi Date: Tue Oct 18 20:47:09 2022 +0000 [Kineto][Profiler] Guard event metadata python thread via verbose flag (#87096) Summary: For Python Tracing enabled trace files, this field "python thread": 0 is repeated for every python_function event. This bloats the trace json size for large number of events or deep call stacks. Instead make this metadata guarded by the verbose flag. Test Plan: CI Reviewed By: robieta, slgong-fb Differential Revision: D40325815 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/87096 Approved by: https://github.com/slgong-fb, https://github.com/robieta commit c54bcea7934f59896ee8973ca814b8ea8597989e Author: Will Feng (DPER) Date: Tue Oct 18 20:26:30 2022 +0000 Improve complex_memory_overlap check for Inductor CUDA graph (#87177) Point fix for https://github.com/pytorch/torchdynamo/issues/1620 to unblock internal models. Supersedes https://github.com/pytorch/pytorch/pull/87058. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87177 Approved by: https://github.com/ezyang commit ef1844a151218046a7f7266e0015264f2b0bc7b4 Author: Nikita Shulga Date: Tue Oct 18 20:05:45 2022 +0000 [CI] Move sm86 tests from periodic to trunk (#87228) This adds Ampere GPU testing to trunk CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/87228 Approved by: https://github.com/jansel, https://github.com/huydhn commit 1dbc8ad3b74f774d8571eed95559714260f0b6de Author: Kurt Mohler Date: Tue Oct 18 20:02:42 2022 +0000 Add `Warning` class and refactor C++ warnings to use it (#84101) Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros Part of #72948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101 Approved by: https://github.com/albanD commit db6590925593e7af9b373680d6e6e76d1b7a359c Author: Andrew M. James Date: Tue Oct 18 19:55:18 2022 +0000 [Docs] Update mm family ops and F.linear to note limited sparse support. (#86220) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86220 Approved by: https://github.com/cpuhrsch commit a73ca6f58c1487abab013805922c47437e50eecf Author: PyTorch MergeBot Date: Tue Oct 18 19:34:02 2022 +0000 Revert "Improve readability of the extra message errors in assertEqual (#87202)" This reverts commit 56c28ee32a78eb6f32a533d8fd64278cb9063016. Reverted https://github.com/pytorch/pytorch/pull/87202 on behalf of https://github.com/malfet due to broke test_testing, see https://hud.pytorch.org/pytorch/pytorch/commit/56c28ee32a78eb6f32a533d8fd64278cb9063016 commit e4285f09b9993d4a17b755c74b68bed69f7473d0 Author: Fabio Rocha Date: Tue Oct 18 09:43:59 2022 +0000 [inductor] new way to compile f64 libdevice calls (#87189) Porting over [torchdynamo/#1633](https://github.com/pytorch/torchdynamo/pull/1633) `torch/_inductor/codegen/triton.py` now defines `libdevice_` variants of some functions. You can request dispatch to those for float64 dtypes when using `register_pointwise` by setting `use_libdevice_for_f64=True`. Other minor changes: - In triton, sigmoid now codegens tl.sigmoid - silu now comes from decomp, not lowering - Some test skips no longer necessary, removed or made xfails Switching to `tl.sigmoid` has exactly same performance. Moving `silu` to decomp does not change anything, same triton code is generated. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87189 Approved by: https://github.com/ngimel commit c56be31d2ec838f29c46d8b585b31b5e47f478e8 Author: Jiang, Yanbing Date: Tue Oct 18 19:07:58 2022 +0000 Upgrade oneDNN to v2.7 (#87061) This PR is to upgrade oneDNN to v2.7. **Performance Optimizations** - Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). - Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data. Please go to https://github.com/oneapi-src/oneDNN/releases/tag/v2.7 for more detailed changes. **Functionality** - Updated ITT API to 3.22.5 - Fixed correctness issue in fp32 convolution implementation for cases with large spatial size (https://github.com/pytorch/pytorch/issues/84488) Use TorchBench test in ICX with 40 cores Intel OpenMP & tcmalloc were preloaded ![image](https://user-images.githubusercontent.com/61222868/196121957-656faebc-9f4a-49f0-9ef0-0784416c3a47.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87061 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper, https://github.com/weiwangmeta commit 2485498294c213daa6092cf384a85ac0890d7fa7 Author: Andrew Gu Date: Tue Oct 18 15:37:01 2022 +0000 [FSDP] Use `all_gather_into_tensor()` (#87077) Let us silence some warnings 👍🏼 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87077 Approved by: https://github.com/rohan-varma commit 56c28ee32a78eb6f32a533d8fd64278cb9063016 Author: lezcano Date: Tue Oct 18 15:05:33 2022 +0000 Improve readability of the extra message errors in assertEqual (#87202) Goes from (note the `linspace.default` is very difficult to find) ``` Mismatched elements: 15 / 50 (30.0%) Greatest absolute difference: 1 at index (17,) Greatest relative difference: 1.0 at index (17,) : linspace.default args = (0, -3, 50) kwargs = {'dtype': torch.int16, 'device': device(type='cpu'), 'pin_memory': False} ``` to ``` Mismatched elements: 15 / 50 (30.0%) Greatest absolute difference: 1 at index (17,) Greatest relative difference: 1.0 at index (17,) linspace.default args = (0, -3, 50) kwargs = {'dtype': torch.int16, 'device': device(type='cpu'), 'pin_memory': False} ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87202 Approved by: https://github.com/ezyang commit 48f02312232d71f8e5cabfcc85b70f8330953057 Author: lezcano Date: Tue Oct 18 09:36:29 2022 +0000 Fix Scalar(bool) handling in toIValue (#87179) At the moment, they were casted to `int64`, which breaks quite a few casting rules for example in `ops.aten`. Quite a vintage bug, circa 2020. With this fix, the following code prints `torch.bool`, rather than `torch.int64`. ```python import torch msk = torch.tensor([False]) b = torch.tensor([False]) print(torch.ops.aten.where.ScalarSelf(msk, True, b).dtype) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87179 Approved by: https://github.com/albanD commit 4540330f97313096793f0bd7115ac84adb616a4c Author: PyTorch MergeBot Date: Tue Oct 18 18:29:15 2022 +0000 Revert "Use conda-forge in mac mps test (#87155)" This reverts commit 74138a8daa93ec4cb08e4dd31c2773ec0c751d94. 
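The FSDP change in #87077 above moves to the non-deprecated `all_gather_into_tensor` collective; in user code the call looks roughly like this (sketch only, assumes an initialized NCCL process group and one GPU per rank):

```python
import torch
import torch.distributed as dist

# assumes dist.init_process_group("nccl") has already run on every rank
world_size = dist.get_world_size()
local = torch.ones(4, device="cuda") * dist.get_rank()
gathered = torch.empty(world_size * 4, device="cuda")

dist.all_gather_into_tensor(gathered, local)   # replaces the deprecated _all_gather_base
```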
Reverted https://github.com/pytorch/pytorch/pull/87155 on behalf of https://github.com/huydhn due to Conda issue has been resolved upstream https://github.com/pytorch/pytorch/issues/87148 commit adc7ee09dce01e3e49985e76f07055af98262d03 Author: Horace He Date: Tue Oct 18 02:43:48 2022 +0000 Added upsample_nearest3d/1d lowering to inductor (#87158) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87158 Approved by: https://github.com/ngimel commit d7801a60424d1fa2823af1b19a21ad61070f5ff0 Author: Michael Voznesensky Date: Tue Oct 18 18:24:13 2022 +0000 Add voznesenskym to CODEOWNERS (#87227) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87227 Approved by: https://github.com/jansel commit 88b76ae9ea89dda5847133f7414073f44bba4535 Author: Sherlock Huang Date: Mon Oct 17 22:53:50 2022 +0000 Store type(module) in the module stack (#87149) - As requested by quantization team, it prefer storing type(module) in the module stack. - Consequently, as module stack gets verbose, we skip printing module stack in the gm.print_readable() Pull Request resolved: https://github.com/pytorch/pytorch/pull/87149 Approved by: https://github.com/jerryzh168, https://github.com/jansel commit d01eea6027c26bf100fc99a705669f60648964ae Author: Nikita Shulga Date: Tue Oct 18 17:19:52 2022 +0000 Do not run triton tests on sm86 (#87198) As its broken right now and nobody care to fix it, see this test run for example: https://hud.pytorch.org/pytorch/pytorch/commit/d36c284d1446cb250178f8e89fff9b342ee1a5a9 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87198 Approved by: https://github.com/soumith, https://github.com/albanD commit 2b03a941f7a3b2539731ef26ce2462b883e296e9 Author: Michael Voznesensky Date: Tue Oct 18 16:54:40 2022 +0000 [dynamo] graph capture for calls to arbitrary self. methods on nn module (#87040) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87040 Approved by: https://github.com/jansel commit 09a967d6c9e464a49909df7ff1459e00ab8aac09 Author: hxu296 Date: Tue Oct 18 16:50:39 2022 +0000 Make nested TreeSpec printing nicer (#46538) (#86546) 1. Made TreeSpec into a dataclass. 2. In `__repr__`, recursively transformed TreeSpec into dictionaries and then pretty-printed it. Fixes #46538. Hi, @ezyang. this PR is for the TreeSpec `__repr__` refactor we discussed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86546 Approved by: https://github.com/ezyang commit 440f734169c1337dc84323adb1e88e11d7a72059 Author: Animesh Jain Date: Tue Oct 18 15:53:53 2022 +0000 [inductor] Minifier fixes (#87062) Fixes https://github.com/pytorch/torchdynamo/issues/1690 This fixes the error seen in the minifiers. But does not repro the original issue that prompted the above issue. Fx minifiers work at the level of Fx-graphs, and the original issue lies outside of the Fx graph and is only visible on the second iteration. Therefore, the original issue escapes the abstraction of our existing Fx-based minifiers. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87062 Approved by: https://github.com/eellison commit c30cfb07abb930ae2227692a20dbb5e4b9632db7 Author: Animesh Jain Date: Tue Oct 18 15:53:40 2022 +0000 [dynamo][dashboard] Run 2 iterations for the correctness runs (#87104) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104 Approved by: https://github.com/soumith commit d29dc2b72a6cb5fb24ff3eacd816e08bd16298dc Author: Huy Do Date: Tue Oct 18 15:27:27 2022 +0000 Temporarily disable ios jobs (#87186) While investigating segfault issue: * https://app.circleci.com/pipelines/github/pytorch/pytorch/584349/workflows/6c68b0ce-023e-4f62-83bf-e77962daf8ad/jobs/17180595 * https://github.com/pytorch/pytorch/actions/runs/3269860268/jobs/5377851127 This might be related to the use of conda-forge in https://github.com/pytorch/pytorch/issues/87148, i.e. conda-forge pulls in different version of some dependencies and breaks thing. If that's the case, we could not revert conda-forge change yet because the checksum issue hasn't been fixed upstream yet (Test PR https://github.com/pytorch/pytorch/pull/87185) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87186 Approved by: https://github.com/ZainRizvi, https://github.com/malfet commit ecd25df3131bea694e9b34fe4a76f8ca411a8f05 Author: Christian Puhrsch Date: Tue Oct 18 15:24:18 2022 +0000 Add prototype warning to MaskedTensor constructor (#87107) When a user constructs a MaskedTensor we should signal its development status to set expecations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87107 Approved by: https://github.com/bhosmer commit 240bba7ac85b6163c7c75a168019cd0b6d1c6aa0 Author: anjali411 Date: Tue Oct 18 12:16:05 2022 +0000 add sym_int (#86916) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86916 Approved by: https://github.com/ezyang commit 157310c85ddcf1047377adecd1b905994436d613 Author: Soumith Chintala Date: Tue Oct 18 14:08:01 2022 +0000 [inductor][triton] if device is a torch.device, then make cuda_properties index it correctly (#87174) Without this, I was running into obvious `KeyError`s that were assuming that the device was an integer when running `examples/imagenet`. ```python (pytorch) soumith@bluebox:~/code/examples/imagenet$ python main.py --gpu 0 /home/soumith/dataset/imagenet /home/soumith/code/vision/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: warn(f"Failed to load image Python extension: {e}") /home/soumith/code/examples/imagenet/main.py:100: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism. warnings.warn('You have chosen a specific GPU. 
This will completely ' Use GPU: 0 for training => creating model 'resnet18' make_fallback(aten.unfold): a decomposition exists, we should switch to it make_fallback(aten.unfold_backward): a decomposition exists, we should switch to it Traceback (most recent call last): File "/home/soumith/code/pytorch/torch/_inductor/graph.py", line 254, in call_function return lowerings[target](*args, **kwargs) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 202, in wrapped return decomp_fn(*args, **kwargs) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 2994, in var_ diffs = square(sub(x, mean(x, axis, keepdim=True))) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 202, in wrapped return decomp_fn(*args, **kwargs) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 2983, in mean sum_result = sum_(x, axis, keepdim) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 202, in wrapped return decomp_fn(*args, **kwargs) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 3211, in sum_ return fn(x, axis, keepdims, dtype=dtype) File "/home/soumith/code/pytorch/torch/_inductor/lowering.py", line 2953, in inner result = Reduction.create( File "/home/soumith/code/pytorch/torch/_inductor/ir.py", line 714, in create hint, split = cls.num_splits( File "/home/soumith/code/pytorch/torch/_inductor/ir.py", line 454, in num_splits num_sm = get_device_properties(device).multi_processor_count File "/home/soumith/code/pytorch/torch/_inductor/cuda_properties.py", line 43, in get_device_properties return _properties()[_device(device)] KeyError: device(type='cuda', index=0) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87174 Approved by: https://github.com/yf225 commit dbccccb7a2f724fc57e42bd1f347212f12984a67 Author: Nikita Shulga Date: Tue Oct 18 13:53:30 2022 +0000 [BE] Get rid of deprecation warnings in workflows (take 3) (#87152) - Per [deprecation announcement](https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/) replace `echo "::set-output name="` with echo to `${GITHUB_OUTPUT}` as shown in following [example](https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs#example-defining-outputs-for-a-job) - Update `actions/setup-python` from `v2` to `v4` to get rid of deprecated node version warning - Update `actions/checkout-python` from `v2` to `v3` (and `silent-checkout` branch as well) - Update `retry` action to https://github.com/nick-fields/retry/commit/3e91a01664abd3c5cd539100d10d33b9c5b68482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87152 Approved by: https://github.com/kit1980, https://github.com/izaitsevfb commit 9ac2a06acf75538a35751f785d5f509d6127d6cd Author: Peter Bell Date: Mon Oct 17 20:59:19 2022 +0100 istft: require complex input (#86628) Real dtype input to `torch.istft` has been deprecated since PyTorch 1.8, so it is more than passed its due date to be removed. BC-breaking message: `torch.istft` no longer supports input in the form of real tensors with shape `(..., 2)` to mimic complex tensors. Instead, convert inputs to a complex tensor first before calling `torch.istft`. 
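For code that still carries the legacy real-valued `(..., 2)` representation, a minimal migration sketch (not taken from the PR; shapes are illustrative) is to convert it to a complex tensor with `torch.view_as_complex` before inverting:

```python
import torch

x = torch.randn(1, 4000)
spec = torch.stft(x, n_fft=400, return_complex=True)   # complex STFT

# Legacy code may still hold the old real-valued (..., 2) layout:
spec_real = torch.view_as_real(spec)

# Convert back to a complex tensor first; real-valued input is no longer accepted.
y = torch.istft(torch.view_as_complex(spec_real), n_fft=400, length=x.shape[-1])
```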
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86628 Approved by: https://github.com/mruberry commit b886cd15f5d2979e50790aa7420b6bd94fd7b89d Author: Nikita Karetnikov Date: Mon Oct 17 22:55:35 2022 +0200 [primTorch] Add a ref for NumPy-style `T` (#86850) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86850 Approved by: https://github.com/lezcano, https://github.com/mruberry commit f2ec9fbd03b131fe4f80ad77305271912a687246 Author: Nikita Vedeneev Date: Tue Oct 18 09:07:35 2022 +0000 `torch.ormqr`: backward support (#86800) Seems good to have, especially when neither `a` nor `tau` requires grads and/or they are pretty small in number. Fixes https://github.com/pytorch/pytorch/issues/86267 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86800 Approved by: https://github.com/lezcano commit 841995d53b7ea51e8dae64e0d3d4f4d888406d8b Author: Nikita Karetnikov Date: Mon Oct 17 21:43:28 2022 +0200 [primTorch] Add refs for data conversion ops (#86561) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86561 Approved by: https://github.com/lezcano, https://github.com/mruberry, https://github.com/zou3519 commit 731b4bf0f119315495e3847e065afca282778ee6 Author: PyTorch MergeBot Date: Tue Oct 18 08:14:15 2022 +0000 Revert "Check all CUDA API calls in aten/src/ATen/test for errors (#74919) (#83556)" This reverts commit a7ed398cf6bca767d93c6d81f3ecf4198e1b52e0. Reverted https://github.com/pytorch/pytorch/pull/83556 on behalf of https://github.com/huydhn due to Sorry for revert your PR, but I think it breaks cuda tests https://hud.pytorch.org/pytorch/pytorch/commit/a7ed398cf6bca767d93c6d81f3ecf4198e1b52e0. This should not have been force merged commit 8b0cc9c752477238cacfa171abf5061bc08bed28 Author: Jason Ansel Date: Tue Oct 18 06:06:31 2022 +0000 [inductor] Fix copysign issue in old msvc build (#87117) Should fix https://github.com/pytorch/pytorch/pull/87028#issuecomment-1281066036 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87117 Approved by: https://github.com/DanilBaibak commit 11915b3196d092137f12052f3fb3d723e95ce729 Author: PyTorch MergeBot Date: Tue Oct 18 05:32:45 2022 +0000 Revert "[BE] Get rid of deprecation warnings in workflows (#87152)" This reverts commit 9da032ecee8b0c7a5ce822bb4425af9208dc2fa1. Reverted https://github.com/pytorch/pytorch/pull/87152 on behalf of https://github.com/malfet due to Regresses is_pr_labelled workflow again commit d36c284d1446cb250178f8e89fff9b342ee1a5a9 Author: Zachary DeVito Date: Mon Oct 17 17:59:56 2022 +0000 [triton] allow cuda properties to be queried from workers (#87101) Fixes https://github.com/pytorch/pytorch/pull/87048 by saving the needed properties before fork. Actually attempting to get CUDA to load in the workers is probably not desired: cuda initialization takes O(seconds). Having multiple processes using the same device will slow things down. This just moves the needed properties from the main trainer process to the workers. 
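A hedged sketch of the pattern described in #87101 (function and variable names here are illustrative, not the actual torch/_inductor code): read the device properties once in the parent process and hand plain values to the workers, so the child processes never initialize CUDA:

```python
from concurrent.futures import ProcessPoolExecutor

import torch

def compile_worker(multi_processor_count: int) -> str:
    # The worker only consumes the pre-fetched value; it never touches
    # torch.cuda, so no CUDA context is created in the child process.
    return f"tuning for {multi_processor_count} SMs"

if __name__ == "__main__":
    # CUDA is initialized here, once, in the parent (requires a CUDA device).
    props = torch.cuda.get_device_properties(0)
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(pool.submit(compile_worker, props.multi_processor_count).result())
```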
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87101 Approved by: https://github.com/soumith commit 9da032ecee8b0c7a5ce822bb4425af9208dc2fa1 Author: Nikita Shulga Date: Tue Oct 18 04:34:58 2022 +0000 [BE] Get rid of deprecation warnings in workflows (#87152) - Per [deprecation announcement](https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/) replace `echo "::set-output name="` with echo to `${GITHUB_OUTPUT}` as shown in following [example](https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs#example-defining-outputs-for-a-job) - Update `actions/setup-python` from `v2` to `v4` to get rid of deprecated node version warning - Update `actions/checkout-python` from `v2` to `v3` (and `silent-checkout` branch as well) - Update `retry` action to https://github.com/nick-fields/retry/commit/3e91a01664abd3c5cd539100d10d33b9c5b68482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87152 Approved by: https://github.com/kit1980, https://github.com/izaitsevfb commit 66658e1da7fb3590d3d760d2b26793fb49ab28a5 Author: PyTorch MergeBot Date: Tue Oct 18 04:14:01 2022 +0000 Revert "[BE] Get rid of deprecation warnings in workflows (#87152)" This reverts commit acaf484f0a38f6a7becf342bb3492e1de09f64e1. Reverted https://github.com/pytorch/pytorch/pull/87152 on behalf of https://github.com/malfet due to Regresses is_pr_labelled workflow commit 8ca7820e4531e61b3d381d5eddf43c4969ba0c7d Author: Yanbo Liang Date: Tue Oct 18 03:46:01 2022 +0000 [Inductor] Lift the maximum depth of the Python interpreter stack to adapt large/deep models (#87130) Partly fixes https://github.com/pytorch/torchdynamo/issues/1693 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87130 Approved by: https://github.com/jansel commit acaf484f0a38f6a7becf342bb3492e1de09f64e1 Author: Nikita Shulga Date: Tue Oct 18 03:38:24 2022 +0000 [BE] Get rid of deprecation warnings in workflows (#87152) - Per [deprecation announcement](https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/) replace `echo "::set-output name="` with echo to `${GITHUB_OUTPUT}` as shown in following [example](https://docs.github.com/en/actions/using-jobs/defining-outputs-for-jobs#example-defining-outputs-for-a-job) - Update `actions/setup-python` from `v2` to `v4` to get rid of deprecated node version warning - Update `actions/checkout-python` from `v2` to `v3` (and `silent-checkout` branch as well) - Update `retry` action to https://github.com/nick-fields/retry/commit/3e91a01664abd3c5cd539100d10d33b9c5b68482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87152 Approved by: https://github.com/kit1980, https://github.com/izaitsevfb commit 5fb687182dba781d9c95388d19f4784b98cb8b20 Author: Driss Guessous Date: Tue Oct 18 02:00:04 2022 +0000 Enable sdp_forward for NestedTensors (#86720) This PR implements a sdp_forward for NestedTensors. This impl will call into flash and mem_efficient_attention when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86720 Approved by: https://github.com/cpuhrsch commit 74138a8daa93ec4cb08e4dd31c2773ec0c751d94 Author: Huy Do Date: Tue Oct 18 01:14:07 2022 +0000 Use conda-forge in mac mps test (#87155) https://github.com/pytorch/pytorch/pull/87150 works, most of the jobs are ok now. However, I miss one last piece in MPS test workflow https://github.com/pytorch/pytorch/actions/runs/3269594289/jobs/5377469209. 
So this fixes the missing piece to use conda-forge Pull Request resolved: https://github.com/pytorch/pytorch/pull/87155 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi commit 9d1a8edc0e609387f30848ddcae569a238052d66 Author: ssjia Date: Fri Oct 14 15:10:28 2022 -0700 [vulkan] Use 2D texture types for convolution weights and biases (#86972) Differential Revision: [D40385500](https://our.internmc.facebook.com/intern/diff/D40385500/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86972 Approved by: https://github.com/salilsdesai, https://github.com/kirklandsign commit 5b588036aa0152d83d58f1b52038137043da0768 Author: ssjia Date: Fri Oct 14 15:10:25 2022 -0700 [vulkan] Enable 2D texture types (#86971) Adds the ability to use 2D GPU textures to represent tensors. The `StorageType` enum can be used to represent other representation modes in the future, such as buffer representations, etc. Differential Revision: [D40363112](https://our.internmc.facebook.com/intern/diff/D40363112/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86971 Approved by: https://github.com/kirklandsign commit a7ed398cf6bca767d93c6d81f3ecf4198e1b52e0 Author: Richard Barnes Date: Tue Oct 18 00:35:44 2022 +0000 Check all CUDA API calls in aten/src/ATen/test for errors (#74919) (#83556) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74919 Test Plan: Sandcastle Differential Revision: D35194596 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83556 Approved by: https://github.com/malfet commit f02f0e3ad1565e3da1e78efaa994e80c7577fd0c Author: Huy Do Date: Tue Oct 18 00:11:37 2022 +0000 Install blas from conda-forge (#87150) Mitigate https://github.com/pytorch/pytorch/issues/87148 On AWS (m1, linux) * Run `conda install blas:openblas`, it should fail with `ChecksumMismatchError`: ```
download saved to: /tmp/debug/pkgs/blas-1.0-openblas.conda expected sha256: c85b5d0a336b5be0f415c71fd7fe2eca59e09f42221bfa684aafef5510ba5487 actual sha256: 5dc5483db0d9785b19e021cee418a8ee03e0ff0e5ebd0b75af4927746604e187 ``` * Run ` conda install -c conda-forge blas:openblas` works Pull Request resolved: https://github.com/pytorch/pytorch/pull/87150 Approved by: https://github.com/kit1980 commit 9db7270ee7f18a9baf8c0b9b87ace5d8c655bb53 Author: albanD Date: Mon Oct 17 22:56:49 2022 +0000 Small update to Module note (#87142) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87142 Approved by: https://github.com/cpuhrsch commit fb614b1871d83c3063907c77f81177fc01bea19f Author: Nirav Mehta Date: Mon Oct 17 22:15:47 2022 +0000 Enable UBSAN mode for test_jit (#85735) Run `test_jit` executable with UBSAN flag in order to catch errors that might cause internal breakage Pull Request resolved: https://github.com/pytorch/pytorch/pull/85735 Approved by: https://github.com/dagitses commit 18cc00d3993f2e84c83274ff1ada6430291aa3bd Author: Catherine Lee Date: Mon Oct 17 22:10:21 2022 +0000 [ci] put more logs in a folded group (#86138) fixes: request to not print the entire log file, but the last couple of lines since they are probably the most relevant all but last 300 lines of failing tests get put into a folded group example https://github.com/pytorch/pytorch/actions/runs/3177200444/jobs/5177703202 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86138 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/lezcano commit e3b84f6c9d5f5aeb6356948bdaaf419ad906226a Author: Catherine Lee Date: Mon Oct 17 22:09:56 2022 +0000 remove dynamo hash updates (#87092) remove workflow for updating dynamo hash as it got moved into this repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/87092 Approved by: https://github.com/huydhn commit 4fd98dfe69287914fd29b38fbccaf7ac4d7261ee Author: David Berard Date: Mon Oct 17 10:29:41 2022 -0700 Don't only apply DDP optimizer on forward frames (#87097) Previously a check would only apply DDP optimizer on frames named "forward". But on hf_T5_large, a graph break causes some frames like: ``` ``` So instead, apply DDP optimizer on all frames. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87097 Approved by: https://github.com/wconstab commit 09d720919ec975860bea3dd42ac13f4921c7d245 Author: Justin Chu Date: Tue Oct 11 23:40:14 2022 +0000 Add venv to gitignore (#86702) `venv` is the common directory for creating virtual environments. Adding it to gitignore to support development that does not use anaconda to manage envs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86702 Approved by: https://github.com/kit1980 commit 0cb273b5d9e4a31574357df2f2290322088c7802 Author: Kevin Tse Date: Mon Oct 17 11:24:05 2022 -0400 [DataPipe] Fixing interface generation in setup.py (#87081) Based on the artifact generated on this [page](https://hud.pytorch.org/pr/87081), I downloaded [[s3] linux-focal-py3.7-clang7-asan/artifacts.zip](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3266430083/linux-focal-py3.7-clang7-asan/artifacts.zip) (1.14 GB) and unpacked it. `torch.utils.data.datapipes.datapipe.pyi` does exist. I believe this means the file should be part of the distribution. I also did `wheel unpack ***.whl` to confirm the existence of the file. 
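An equivalent check can be scripted (a small helper sketch, not part of the PR; the script name and argument are hypothetical) by listing the wheel contents directly:

```python
import sys
import zipfile

# Usage: python check_wheel.py path/to/torch-*.whl
names = zipfile.ZipFile(sys.argv[1]).namelist()
print(any(n.endswith("torch/utils/data/datapipes/datapipe.pyi") for n in names))
```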
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87081 Approved by: https://github.com/ejguan commit f5ee2d88406b45c6730f3b34bb54979836374c40 Author: Michael Suo Date: Mon Oct 17 21:27:21 2022 +0000 [ci] fix bot comment (#87127) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87127 Approved by: https://github.com/clee2000 commit f552eee42765e7de01c7df7bd794c70fd094874d Author: Andrew Gu Date: Mon Oct 17 21:17:07 2022 +0000 [Docs] Remove outdated comment for sparse all-reduce (#87018) https://github.com/pytorch/pytorch/pull/23917 switched to using allgatherv instead of allgather for gloo sparse all-reduce. This PR removes a comment saying to use allgatherv if available since that has already been done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87018 Approved by: https://github.com/H-Huang commit d023e8393396acd871629e91516170d72ced10e0 Author: Catherine Lee Date: Mon Oct 17 21:03:42 2022 +0000 handle libomp update on circleci (#86979) libomp got an update and now its keg only reverts https://github.com/pytorch/pytorch/pull/86940 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86979 Approved by: https://github.com/huydhn, https://github.com/malfet commit 5acf6e0e80fb3c029fe62ff665bc5279ec00a70c Author: Huy Do Date: Mon Oct 17 20:57:55 2022 +0000 Use 12xlarge for nightly cpp doc generation job (#86859) The job starts to run out of memory a lot recently https://hud.pytorch.org/failure/Process%20completed%20with%20exit%20code%20137. Probably more and more docs are added, so this ups the runner for cpp doc nightly from 4xlarge to the next tier of 12xlarge. This also choose the smaller runner of 2xlarge for python and functorch docs (may be linux.large is good enough for them?) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86859 Approved by: https://github.com/malfet commit 4814270708cb6141c1fb6202f883c084c71290b4 Author: Michael Suo Date: Mon Oct 17 20:14:43 2022 +0000 [dynamo] Introduce `get_real_value` API to TensorVariable (#87091) Right now, example_value is doing two jobs: - We use it to propagate metadata (e.g. return type, shapes, etc.) throughout the graph - We use it to satisfy queries for the actual value (e.g. torch.cond, `assume_constant_result`) This is further complicated by the fact that we have two modes, one where `example_value` is a fake tensor, and one where it is a real tensor (this is the `fake_tensor_propagation` config flag). This leads to scenarios where we don't support every combination of job + mode, e.g. if `fake_tensor_propagation=False`, `assume_constant_result` is broken. This is made worse by the fact that "fake tensor mode" is the default and is required if you want dynamic shapes to work. So, this PR introduces a `get_real_value` API that just runs the graph up to `node` in order to get a concrete value. This API is orthogonal to `example_value`, so it doesn't care about `fake_tensor_propagation`. When `fake_tensor_propagation=True`: `example_value` is a fake tensor, you must use the `get_real_value` API to get a concrete value. This will be the only configuration in the future. When `fake_tensor_propagation=False`: `example_value` and `get_real_value` will produce the same value. This is redundant but we will be removing this config soon. To support this, I introduce a cache for computed real values, to memoize the work involved if we're asking for real values a lot. 
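The idea of running the graph up to a node with real inputs, memoizing the concrete values, can be sketched with the public FX interpreter (a hedged illustration only; `CachingInterpreter` is a made-up name and this is not the dynamo implementation):

```python
import torch
import torch.fx as fx

class CachingInterpreter(fx.Interpreter):
    """Run the graph with real inputs, memoizing each node's concrete value."""
    def __init__(self, gm):
        super().__init__(gm)
        self.real_values = {}

    def run_node(self, n):
        if n not in self.real_values:
            self.real_values[n] = super().run_node(n)
        return self.real_values[n]

def f(x):
    return torch.relu(x) + 1

gm = fx.symbolic_trace(f)
interp = CachingInterpreter(gm)
interp.run(torch.randn(3))
relu_node = next(n for n in gm.graph.nodes if n.target is torch.relu)
print(interp.real_values[relu_node])  # concrete value computed at that node
```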
I attached this state to `OutputGraph` because it seems to be what historically managed `example_value` lifetimes, but idk. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87091 Approved by: https://github.com/wconstab commit e85dbcc9b075961ab082975348c5cf1d99b7da76 Author: Jan Margeta Date: Mon Oct 17 20:01:07 2022 +0000 [docs] Fix ScalarTensor __repr__ in Extending PyTorch example (#86330) This PR fixes the __repr__ of the `ScalarTensor` class in the Extending PyTorch example to correspond with the class name instead of `DiagonalTensor`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86330 Approved by: https://github.com/bdhirsh commit b8007742c287d792f2e89bbb7af5f87f6afdd2e8 Author: Michael Voznesensky Date: Mon Oct 17 19:55:39 2022 +0000 [Dynamo] More robust pyop support, module properties as args (#87020) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87020 Approved by: https://github.com/jansel commit 1167949b2df0e9ff228aac0e1b82403c05021546 Author: Thiago Crepaldi Date: Mon Oct 17 19:45:33 2022 +0000 [ONNX] Ignore print(Tensor) during tracing (#86223) Fixes #73619 Fixes https://github.com/microsoft/onnxruntime/issues/11812 This PR adds new symbolics: `aten::_conj`, `aten::conj_physical`, `aten::resolve_conj`, and `aten::resolve_neg` While the last two are always NO-OP by definition (do not change nodes), the first raises an exception as they are not supported by ONNX yet Pull Request resolved: https://github.com/pytorch/pytorch/pull/86223 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 31931515bc927675136b1637fb9782b9b5ff3174 Author: Ivan Yashchuk Date: Mon Oct 17 18:46:28 2022 +0000 Workarounds for cudnn_batch_norm with TorchRefsNvfuserCapabilityMode (#86796) This PR adds workarounds to support AOT Autograd's graphs containing `aten.cudnn_batch_norm` and `aten.cudnn_batch_norm_backward` with `TorchRefsNvfuserCapabilityMode`. The problem with the decomposition of `aten.cudnn_batch_norm` is that it uses a `new_empty` call that is not supported by nvFuser and we are conservative with lowering functions to nvprims by default. The problem with the decomposition of `aten.cudnn_batch_norm_backward` is described here https://github.com/pytorch/pytorch/pull/86115#issue-1394883782, but changing the decomposition directly in that PR makes many tests fail. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86796 Approved by: https://github.com/mruberry commit 33343def0b4a0ec58f0557edba017748f789c8d6 Author: holimion Date: Mon Oct 17 18:27:46 2022 +0000 add XLA backend into tensor type strings (#86881) add XLA backend into tensor type strings Pull Request resolved: https://github.com/pytorch/pytorch/pull/86881 Approved by: https://github.com/bdhirsh commit 317eeb81c3e7ab21ca4359819c4a89122ce574f5 Author: PyTorch MergeBot Date: Mon Oct 17 18:26:59 2022 +0000 Revert "OpInfo: Sample input cleanup (4/n) (#86324)" This reverts commit 2a6d37d23d163a35c0b62c4319a6c2f049a27833. 
Reverted https://github.com/pytorch/pytorch/pull/86324 on behalf of https://github.com/peterbell10 due to Caused tolerance issues in periodic test commit 8f85831fdf473be541b12b843329d9b3f124c6d6 Author: JackCaoG <59073027+JackCaoG@users.noreply.github.com> Date: Mon Oct 17 18:17:01 2022 +0000 Give more clear error message when gscope is non-empty (#87005) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87005 Approved by: https://github.com/alanwaketan, https://github.com/Krovatkin commit c01c7a5e2cd1074409f31b1338524d440db8b460 Author: Kevin Tse Date: Mon Oct 17 15:19:29 2022 +0000 [DataPipe] Fix missing functional name for FileLister (#86497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86497 Approved by: https://github.com/ejguan commit c27a5171b80b126917ef4435e232055415fcf617 Author: Huy Do Date: Mon Oct 17 17:39:19 2022 +0000 Update action lint with missing new runners from scale-config (#87009) Using runner label like `linux.12xlarge` results in linter failure from actionlint, i.e. https://github.com/pytorch/pytorch/actions/runs/3253740221/jobs/5341281952 ``` Error (ACTIONLINT) [runner-label] label "linux.12xlarge" is unknown. available labels are "windows- latest", "windows-2022", "windows-2019", "windows-2016", "ubuntu- latest", "ubuntu-22.04", "ubuntu-20.04", "ubuntu-[18](https://github.com/pytorch/pytorch/actions/runs/3253740221/jobs/5341281952#step:7:19).04", "macos-latest", "macos-12", "macos-12.0", "macos-11", "macos-11.0", "macos-10.15", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows", "linux.[20](https://github.com/pytorch/pytorch/actions/runs/3253740221/jobs/5341281952#step:7:21)_04.4x", "linux.20_04.16x", "linux.large", "linux.2xlarge", "linux.4xlarge", "linux.4xlarge.nvidia.gpu", "linux.8xlarge.nvidia.gpu", "linux.16xlarge.nvidia.gpu", "windows.4xlarge", "windows.8xlarge.nvidia.gpu", "bm-runner", "linux.rocm.gpu", "macos-m1- 12", "macos-12-xl", "macos-12", "macos12.3-m1". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file 47 | # an OOM issue when running the job, so this upgrades the runner from 4xlarge 48 | # to the next available tier of 12xlarge. So much memory just to generate cpp 49 | # doc >>> 50 | runner: linux.12xlarge 51 | # Nightly cpp docs take about 150m to finish, and the number is stable 52 | timeout-minutes: 180 53 | - docs_type: python ``` `linux.12xlarge` is a valid runner label from https://github.com/pytorch/test-infra/blob/main/.github/scale-config.yml. This also adds `linux.24xlarge` and `linux.g5.4xlarge.nvidia.gpu`, which are also not added yet Pull Request resolved: https://github.com/pytorch/pytorch/pull/87009 Approved by: https://github.com/ZainRizvi commit 1704256b107500c1ebc2e803b55e31e11104e618 Author: Natalia Gimelshein Date: Mon Oct 17 17:08:44 2022 +0000 Enables `where` to have cpu scalar args (#87022) This is for decompositions only, no attempt made to have good performance for this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87022 Approved by: https://github.com/ezyang, https://github.com/eellison, https://github.com/mruberry commit f3969bd8b50fc20e12a3c6a69a5788786b0d904c Author: samdow Date: Mon Oct 17 09:36:22 2022 -0400 [functorch] Fix cross to match unbatched behavior (#86926) Fixes #83936 #83907 In #83936, I noticed that after I wrote cross, it's silently incorrect because I misunderstood what the fix to linalg was going to be. 
This fixes functorch to not be silently incorrect with `linalg.cross`. Since it's a silent correctness issue that I missed, I'm hoping to cherry pick it too Pull Request resolved: https://github.com/pytorch/pytorch/pull/86926 Approved by: https://github.com/zou3519 commit e271e823c7d1b231175ab4a0145d4ef2f7b7519c Author: Sherlock Huang Date: Fri Oct 14 16:11:15 2022 +0000 Avoid calling logging.basicConfig (#86959) Fixes https://github.com/pytorch/pytorch/issues/85952 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86959 Approved by: https://github.com/xwang233, https://github.com/davidberard98 commit 6351220573c8d86972f5188dc5a570686fa3f8ed Author: anjali411 Date: Mon Oct 17 12:30:34 2022 +0000 Add meta support for _adaptive_avg_pool2d_backward (#86359) (#87074) This reverts commit 3edf79dc03193c98b665d62231fe69a10dfab1fa. Reland of https://github.com/pytorch/pytorch/pull/86359 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87074 Approved by: https://github.com/ezyang commit 66715767ffcd986d18b1658ee13f63dbdf5eb898 Author: PyTorch MergeBot Date: Mon Oct 17 16:02:49 2022 +0000 Revert "[Dynamo] More robust pyop support, module properties as args (#87020)" This reverts commit 3c320a5613c26aa3568c330ae1c34a03dadf2b5c. Reverted https://github.com/pytorch/pytorch/pull/87020 on behalf of https://github.com/ZainRizvi due to This appears to have caused two periodic tests to fail commit 8617f5f48183b84fc7335a2754fc1ffa9666a0dc Author: Natalia Gimelshein Date: Mon Oct 17 15:59:05 2022 +0000 fix cudagraphify for inplace parameter change (#87060) Fixes https://github.com/pytorch/torchdynamo/issues/1687 cc @albanD, @chillee, I don't know what I'm doing. According to previous discussions, calling `detach()` on inputs can cause bugs if inputs are later inplace-resized (cc @ezyang) https://github.com/pytorch/pytorch/pull/85301/files#diff-8678402e01603e588fcf175a61de9ed578d885b1cc082e028021856190223fb7L433, but should we weed out these patterns before they are sent to cudagraphify? Pull Request resolved: https://github.com/pytorch/pytorch/pull/87060 Approved by: https://github.com/jansel, https://github.com/albanD commit 2c6167c4bb5165e5844b541275cee35687dc9783 Author: PyTorch MergeBot Date: Mon Oct 17 15:44:14 2022 +0000 Revert "[inductor] Use decomps for unfold (#87025)" This reverts commit 5099883f059a9b15592b8ba3b7bf83145163b966. Reverted https://github.com/pytorch/pytorch/pull/87025 on behalf of https://github.com/ZainRizvi due to Breaks periodic tests commit 2b558138cf0a0296b27814d129023d1b1a503f29 Author: Animesh Jain Date: Mon Oct 17 15:43:53 2022 +0000 [inductor] Set correct strides in fallback example run (#87049) Fixes #ISSUE_NUMBER Helps in resolving many issues seen in https://github.com/pytorch/torchdynamo/issues/1675 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87049 Approved by: https://github.com/jansel commit 4e5357faf5fd65e56b14bec6bdd33e915f909bde Author: Peter Bell Date: Mon Oct 17 13:25:09 2022 +0100 ATen/native (2/6): Use per-operator headers (#75572) Differential Revision: [D40126702](https://our.internmc.facebook.com/intern/diff/D40126702) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75572 Approved by: https://github.com/DanilBaibak, https://github.com/malfet commit b40f4434ac3512a21dcec91467df1b179898503f Author: albanD Date: Sun Oct 16 22:16:16 2022 -0400 conv backward impl (#87047) ~~Waiting for test run to see if this backward is actually exercised. If not, I will add test before merging.~~ Test updated. 
Ready to go now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87047 Approved by: https://github.com/ezyang commit 1463013c85e2c89adaad76612637ef951ffc7e94 Author: albanD Date: Sun Oct 16 22:16:16 2022 -0400 autograd clone_obey_contract() symint support (#87044) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87044 Approved by: https://github.com/ezyang commit 86c2e44cb68a646e368a8e52915fd2a835842dc7 Author: albanD Date: Sun Oct 16 22:16:15 2022 -0400 meta funcs for avg_pool2d and avg_pool2d_backward (#87043) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87043 Approved by: https://github.com/ezyang commit c21dcffc005aeb061ac869d3ff712daf89d11ea4 Author: albanD Date: Sun Oct 16 22:16:14 2022 -0400 Very limited pow support (#87042) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/87042 Approved by: https://github.com/ezyang commit 37e9e89afbc3554258545a026fab4cd9e1a4b85d Author: PyTorch MergeBot Date: Mon Oct 17 10:55:42 2022 +0000 [xla hash update] update the pinned xla hash (#87067) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87067 Approved by: https://github.com/pytorchbot commit 91b3cd0b5a7d297a82ca0f9068ea7f9ac1963ced Author: Nikita Karetnikov Date: Sun Oct 16 23:22:01 2022 +0200 [primTorch] Add a ref for `narrow_copy` (#86748) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86748 Approved by: https://github.com/mruberry commit 847ded6db325af268527e7096e31085fcb845495 Author: Ryan Spring Date: Mon Oct 17 06:20:31 2022 +0000 [primTorch] Implement NLL loss reference (#81128) Add Reference: - nll_loss Depends on: - expand https://github.com/pytorch/pytorch/pull/79820 - advance indexing Pull Request resolved: https://github.com/pytorch/pytorch/pull/81128 Approved by: https://github.com/mruberry commit 78e2289005738df1faefb6c2309495b8b8d367bb Author: Jiong Gong Date: Mon Oct 17 06:05:30 2022 +0000 [TorchInductor] enable inplace buffers by default (#87037) This PR enables the inplace_buffers configuration by default after fixing issue: https://github.com/pytorch/torchdynamo/issues/1670. UT is added to cover the fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87037 Approved by: https://github.com/jansel commit 1b43883fd61a5e3525ea213262bfcb3aedc941d3 Author: Emilio Castillo Date: Mon Oct 17 04:32:08 2022 +0000 Make `AdamW`, `NAdam` & `RAdam` differentiable (#86183) Blocked by #86096 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86183 Approved by: https://github.com/albanD commit 364a9973cab8e7458abd27e3926168978fe5428e Author: PyTorch MergeBot Date: Mon Oct 17 03:17:00 2022 +0000 [vision hash update] update the pinned vision hash (#87021) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87021 Approved by: https://github.com/pytorchbot commit 3a4c0900c737fe73f900f0d21fc21d972f9bbd2e Author: albanD Date: Mon Oct 17 02:09:40 2022 +0000 Reland 3 of Merge more symbolic meta kernels and symint changes from branch (#86795) Take 3 Contains: - symintification of split* - floor support on SymFloat - pad_backward, gather, scatter meta Pull Request resolved: https://github.com/pytorch/pytorch/pull/86795 Approved by: https://github.com/z-a-f commit 0379af681b4b20475589189251aafbb2e6bb91ca Author: Jason Ansel Date: Sun Oct 16 15:10:07 2022 -0700 [inductor] Disable parallel compile (#87048) https://github.com/pytorch/pytorch/pull/87032 seems to have an issue that breaks our benchmark script, it might have to do with the benchmark script also using subprocess. Before this PR: ``` $ ./benchmarks/dynamo/torchbench.py --performance --inductor --raise --training --float16 ... Traceback (most recent call last): File "/home/jansel/conda/envs/pytorch/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 239, in _worker_compile kernel = TritonCodeCache.load(source_code) File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 234, in load mod = PyCodeCache.load(source_code) File "/home/jansel/pytorch/torch/_inductor/codecache.py", line 212, in load exec(code, mod.__dict__, mod.__dict__) File "/tmp/torchinductor_jansel/ij/cij7smji4sw2a56i4yz45bjkrosd2sb2raqnxzsxxpg4kwzuo2ta.py", line 5, in from torch._inductor.triton_ops.autotune import reduction File "/home/jansel/pytorch/torch/_inductor/triton_ops/__init__.py", line 3, in if has_triton(): File "/home/jansel/pytorch/torch/_inductor/utils.py", line 38, in has_triton return triton is not None and torch.cuda.get_device_capability() >= (7, 0) File "/home/jansel/pytorch/torch/cuda/__init__.py", line 368, in get_device_capability prop = get_device_properties(device) File "/home/jansel/pytorch/torch/cuda/__init__.py", line 382, in get_device_properties _lazy_init() # will define _get_device_properties File "/home/jansel/pytorch/torch/cuda/__init__.py", line 228, in _lazy_init raise RuntimeError( RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method ``` cc @zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/87048 Approved by: https://github.com/soumith commit 3007efda08f2fd61f9c48a810f6931a560f9ca62 Author: Peter Bell Date: Sun Oct 16 20:23:08 2022 +0100 stft: Require return_complex to be passed explicitly for real input (#86724) This behavior has been deprecated since PyTorch 1.8 but this step of the deprecation cycle was put on hold in #50102 waiting for JIT upgraders functionality which doesn't seem to have panned out. I'd say there has been more than enough of a deprecation period, so we should just continue. BC-breaking message: `torch.stft` takes an optional `return_complex` parameter that indicates whether the output should be a floating point tensor or a complex tensor. `return_complex` previously defaulted to `False` for real input tensors. This PR removes the default and makes `return_complex` a required argument for real inputs. However, complex inputs will continue to default to `return_complex=True`. 
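A short illustration of the new requirement (not from the PR itself): real input must now pass `return_complex` explicitly, while complex input keeps its default:

```python
import torch

x = torch.randn(4000)                                  # real signal
spec = torch.stft(x, n_fft=400, return_complex=True)   # flag is now mandatory here
print(spec.dtype)                                       # torch.complex64

# torch.stft(x, n_fft=400)  # raises: return_complex must be passed for real input
```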
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86724 Approved by: https://github.com/mruberry, https://github.com/albanD commit 2b7236a0e1c0d2339165103b2cd42e25debee99d Author: Zachary DeVito Date: Sun Oct 16 05:17:20 2022 +0000 [torchdynamo] Use ProcessPoolExecutor for triton compiles (#87032) This patch significantly improves the parallel compilation performance for compiling triton kernels by using ProcessPoolExecutor to create a persistent pool of compilation workers. Previously os.fork overhead and GIL contention limited the achieved parallelism. This patch replaces the worker threads with a pool of processes to do the raw compilation, and does serial work on the main thread for everything else. This other work couldn't be parallelized anyway since it is mostly in Python. In cold start situations, the time to get the worker threads started can be a significant portion of the time. This patch starts the workers earlier so they are ready to perform compilation (see code comments) when dynamo gets to that point. Just tested this on one example benchmark (tf_efficientnet_b0), but the results are significant, almost eliminating the difference between a warm and cold compilation. ``` 39.613s - warm 41.290s - cold, this patch 2m53.197s - cold, single threaded: 1m7.092s - cold, old setup n = 8 (its best config) ``` (cold compilation is done after running `rm -rf /tmp/torchinductor_$USER`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/87032 Approved by: https://github.com/soumith, https://github.com/jansel commit 945d333ae485673d7a603ca71822c9a39ca4775a Author: Jason Ansel Date: Sun Oct 16 09:20:50 2022 -0700 Migrate dynamo CI test shards to torch._dynamo (#87039) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87039 Approved by: https://github.com/voznesenskym commit 30f6f6903c7e68d2105d5b8dfe8841a788bab051 Author: Jason Ansel Date: Sun Oct 16 10:16:04 2022 -0700 [inductor] Move size asserts to C++, fix bug (#87028) Inductor internally models any `size=1` dimension as having `stride=0` to simplify indexing formulas (sympy will remove these terms from the expression). This caused a bug in our generated stride assert in detectron2_maskrcnn_r_50_fpn, where we asserted the wrong stride of a size==1 dimension. This fixes that bug, and moves size/stride assert logic to C++ which should be a small perf gain.
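The `size=1` / `stride=0` modeling mentioned in #87028 can be seen from eager PyTorch as well (a small illustration, not the inductor code): the stride recorded for a size-1 dimension never affects indexing, so any value is equivalent:

```python
import torch

x = torch.randn(4, 1, 6)
print(x.stride())                        # (6, 6, 1) for a contiguous tensor

# Re-expressing the same storage with stride 0 on the size-1 dimension
# indexes identical elements, which is why that stride is "free".
y = x.as_strided(x.size(), (6, 0, 1))
print(torch.equal(x, y))                 # True
```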
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87028 Approved by: https://github.com/anijain2305 commit d45e99acf5fed7d0ea0ffcff36231c63ea3a8db5 Author: Jason Ansel Date: Sun Oct 16 08:09:32 2022 -0700 [dynamo] Put printing graph breaks behind a config option (#87026) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87026 Approved by: https://github.com/soumith, https://github.com/voznesenskym commit 2a6d37d23d163a35c0b62c4319a6c2f049a27833 Author: Peter Bell Date: Fri Oct 14 17:06:42 2022 +0100 OpInfo: Sample input cleanup (4/n) (#86324) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86324 Approved by: https://github.com/mruberry commit 5099883f059a9b15592b8ba3b7bf83145163b966 Author: Jason Ansel Date: Sat Oct 15 21:08:48 2022 -0700 [inductor] Use decomps for unfold (#87025) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87025 Approved by: https://github.com/soumith commit 8a8cd092c8537b226f5c38ed88bc07e181b0946c Author: Edward Z. Yang Date: Sun Oct 16 06:13:18 2022 +0000 Add labeler with dynamo/inductor paths to start (#87024) The other missing ingredient is getting CC bot to work on labels on PRs Pull Request resolved: https://github.com/pytorch/pytorch/pull/87024 Approved by: https://github.com/soumith, https://github.com/jansel commit a0c2a7f2eda788a48f1d243940297f1467faf138 Author: Jason Ansel Date: Sat Oct 15 17:50:58 2022 -0700 Add triton to CI (#86988) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86988 Approved by: https://github.com/malfet, https://github.com/voznesenskym, https://github.com/soumith commit 3c320a5613c26aa3568c330ae1c34a03dadf2b5c Author: Michael Voznesensky Date: Sun Oct 16 02:15:10 2022 +0000 [Dynamo] More robust pyop support, module properties as args (#87020) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/87020 Approved by: https://github.com/jansel commit 5d6e8315630d4e62e5e015c2e4c816be04f1f94e Author: Peter Bell Date: Fri Oct 14 17:06:42 2022 +0100 OpInfo: Sample input cleanup (3/n) (#86380) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86380 Approved by: https://github.com/mruberry commit 054a2fd6c2fc361796663eed4772368c287d6c83 Author: Jason Ansel Date: Sat Oct 15 08:35:32 2022 -0700 Sync changes from `pytorch/torchdynamo` (#87013) This updates to: https://github.com/pytorch/torchdynamo/commit/6380959be21851bfda99424392cc08fda29d073d Generated with: https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013 Approved by: https://github.com/voznesenskym commit 2c1bc216b8893e59e986d843dcf0e152e1938ac1 Author: Horace He Date: Sat Oct 15 04:10:47 2022 +0000 Fixed partitioner issue with getitem and made metadata a storage more consistent (#87012) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87012 Approved by: https://github.com/ngimel commit 91c7015426f6b57b87bd92a63e9c08d9fd46a020 Author: Jane Xu Date: Sat Oct 15 06:23:48 2022 +0000 [einsum] Fix opt_einsum defaults to be more reasonable (#86985) Fixes the confusing situation mentioned here https://github.com/pytorch/pytorch/issues/85224#issuecomment-1278628262 by - setting better OG defaults - changing warnings to errors now that we have better defaults Test plan: - Ran einsum tests locally + CI - Uninstalled opt-einsum and ran through setting - `enabled` to False (doesn't throw error) - `strategy` to anything that's not None (errors) - `strategy` to None (noops) - Installed opt-einsum 
and ran through setting - `enabled` to False (doesn't throw error) - `enabled` to True (doesn't throw error, no ops + defaults to 'auto') - `strategy` to random string (errors) - `strategy` to None (noops, still is 'auto') - `strategy` to 'greedy' (is set to 'greedy') Pull Request resolved: https://github.com/pytorch/pytorch/pull/86985 Approved by: https://github.com/soulitzer commit 7980ed95bd708d6e9baf64c95b1aa83df8891b59 Author: tangleintel Date: Sat Oct 15 05:33:07 2022 +0000 Support unpacking python dictionary in torch.jit.trace() (#81623) Say, if you have a model and its forward method defined as follows: **`def forward(self, key1=value1, key2=value2, key3=value3)`** And you have a dataset and each data point in the dataset is a python dict as follows: **`data = {key1:value1, key3:value3, key2:value2}`** The problem is that if you want to trace the model using the dict data by the giving dataset, you need unpack the dictionary and reorder its value manually and make up a tuple as **`data_tuple = (value1, value2, value3)`** as the **`example_inputs`** parameter of **`torch.jit.trace()`**. This marshalling process is not user friendly. Say, if you have a model and its forward method defined as follows: **`def forward(self, key1=None, key2=None, key3=None)`** -> The default value is **None** And you have a dataset and each data point in the dataset is a python dict as follows: **`data = {key1:value1, key3:value3}`** -> Only **part of** the required value by forward was given, the rest use the default value. The problem is that if you want to trace the model using the dict data by the giving dataset, it's not feasible at all. Cause neither you can pass a tuple like **`T1 = (value1, value3)`** nor **`T2 = (value1, None, value3)`**. T1 will mismatch value3 with key2 and T2 include **None** type which will be blocked by tracer's type checking. (Of course you can pass **`T3 = (value1,)`** to make the trace function finish without exception, but the traced model you get probably is not what you expect cause the different input may result in different traced result.). These problems come from the HuggingFace's PT model, especially in text-classification tasks with datasets such as [MRPC,](https://paperswithcode.com/dataset/mrpc) [MNLI](https://paperswithcode.com/dataset/multinli) etc. To address these two issues, we propose to support a new type, that is, python dict as example_inputs parameter for torch.jit.trace(). We can base on the runtime type information of the example_inputs object to determine if we fall back to the original tuple path or go into the new dictionary path. Both problem 1 and problem 2 can be solved by utilizing the "**`**`**" operator. 1. If we use dict as example_inputs to trace the model, then we have to pass a dictionary to the traced model too. (Cause probably we will change the order of debug name of the input parameter in torchscript IR, thus we can't assume the traced model's input parameters order are the same with the original model.). We need highlight this too in the document to mitigate this problem. For example: ``` example_inputs_dict = next(iter(dataloader)) jit_model = model.eval() jit_model = torch.jit.trace(jit_model, example_inputs_dict, strict=False) # Now the IR will be graph(%self : __torch__.module.___torch_mangle_n.Mymodule, %key1 : type1, %key3 : type3, %key2 : type2) jit_model = torch.jit.freeze(jit_model) jit_model(**example_inputs_dict) example_inputs_tuple = (value1, value3, value2) jit_model(*example_inputs_tuple) ``` 1. 
This PR will make some UT introduced in [39601](https://github.com/pytorch/pytorch/pull/39601) fail, which I think should be classified as unpacking a tuple containing a single dictionary element in our solution. 4. I think there is ambiguity since currently we only specify passing a tuple or a single Tensor as our example_inputs parameter in **torch.jit.trace()**'s documentation, but it seems we can still passing a dictionary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81623 Approved by: https://github.com/davidberard98 commit bdefa260b2831977b4a458d9daef2b710330c78c Author: Rohan Varma Date: Fri Oct 14 20:45:25 2022 +0000 [RFC] Separate CPU offload activation to its own wrapper (#85459) Passing in `offload_to_cpu=True` to checkpoint_wrapper is a bit confusing, because this causes the activation checkpoint args to be ignored and we do CPU offloading. This isn't ideal from API design perspective, so proposing to make `offload_wrapper` its own concept. Now, offload to CPU + checkpoint can be composed together, such as ``` apply_ac_wrapper(model, checkpoint_wrapper, check_fn=lambda mod: isinstance(mod, TransformerLayer)) model = offload_wrapper(model) ``` Will polish / add tests if this proposal sounds good. Differential Revision: [D39719854](https://our.internmc.facebook.com/intern/diff/D39719854/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85459 Approved by: https://github.com/awgu commit 100113b87747cc36a42621d0c94e8c72ddcead80 Author: Jerry Zhang Date: Thu Oct 13 17:02:33 2022 -0700 [quant][docs] Formatting fixes for fx graph mode quantization README (#86914) Summary: att Test Plan: No code changes involved Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86914 Approved by: https://github.com/vkuzo commit f6f1aefb8fc1664fec5825615e3353c68c41724b Author: PyTorch MergeBot Date: Sat Oct 15 03:25:03 2022 +0000 [vision hash update] update the pinned vision hash (#86758) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86758 Approved by: https://github.com/pytorchbot commit 46aaae98c5b95f33afd98c62d642808652594dd6 Author: XiaobingSuper Date: Fri Oct 14 05:33:06 2022 -0400 torchdynamo: add linear pointwise(binary) fusion kernel (#86583) Support binary fusion of Linear with: - add - sub - mul - div Pull Request resolved: https://github.com/pytorch/pytorch/pull/86583 Approved by: https://github.com/jgong5, https://github.com/jansel commit 5210fab64d4322438ebfd8ec9c1170d5effab0a3 Author: XiaobingSuper Date: Fri Oct 14 05:33:05 2022 -0400 torchdynamo: add convolution pointwise(binary) fusion kernel (#86582) Support binary fusion of Convolution with: - add - sub - mul - div Pull Request resolved: https://github.com/pytorch/pytorch/pull/86582 Approved by: https://github.com/jgong5, https://github.com/jansel commit 9a7a49b254086038cc16af44ae2d51bb2084ae0d Author: XiaobingSuper Date: Fri Oct 14 05:33:04 2022 -0400 torchdynamo: add convolution pointwise(unary) fusion kernel (#86581) Support unary fusion of Convolution with: - relu - sigmoid - tanh - hardswish - leaky_relu - hardtanh - gelu Pull Request resolved: https://github.com/pytorch/pytorch/pull/86581 Approved by: https://github.com/jgong5, https://github.com/jansel commit d5a7e6db38f4e77a91dd0568d2c21039c5c3032e Author: Peter Bell Date: Sat Oct 15 00:23:24 2022 +0100 ATen/native (1/6): Use per-operator headers (#75571) Differential Revision: [D40126698](https://our.internmc.facebook.com/intern/diff/D40126698) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75571 Approved by: https://github.com/malfet commit 4584d06e760eeb810f4d69ce14fc927ac3d96b17 Author: edward-io Date: Sat Oct 15 00:25:23 2022 +0000 [data] add autocompletion to datapipes (#86960) In REPLs (e.g. jupyter notebook) autocomplete now works: image even with custom data pipes: image Unfortunately I wasn't able to figure out how to get autocomplete to work for non-REPLs (e.g. 
VSCode) - may need to generate fake pyi stubs, which 1) won't work for custom datapipes and 2) is a larger project to tackle :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86960 Approved by: https://github.com/NivekT commit 3924aa75b111fc5832647dd4cae87c62ed8a2863 Author: Nikita Shulga Date: Sat Oct 15 00:20:42 2022 +0000 [BE] Extend linter to detect DOS newlines (#86973) Fix DOS newlines in `onednn/decompose_silu.[cpp|h]` introduced by https://github.com/pytorch/pytorch/pull/85591 as well as one in `.github/PULL_REQUEST_TEMPLATE.md` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86973 Approved by: https://github.com/huydhn, https://github.com/izaitsevfb commit b8aa1767cdca37def5d21cfa8aaf4a23e8ed3905 Author: Jerry Zhang Date: Thu Oct 13 17:02:32 2022 -0700 [quant][be] Remove unused helper functions in convert.py (#86913) Summary: att Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86913 Approved by: https://github.com/vkuzo commit 761ca20dd8d3bfda1694aa85eac7ee11f2ff68aa Author: Jerry Zhang Date: Thu Oct 13 17:02:31 2022 -0700 [quant][be] Rename qconfig_map to node_name_to_qconfig (#86861) Summary: att, with the introduction of QConfigMapping, this name is now very confusing, so renamed it to something clearer Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86861 Approved by: https://github.com/vkuzo commit 8f71e8de7ef33e0cc3c92d976aa0eedae92fa1aa Author: Jason Ansel Date: Fri Oct 14 11:05:28 2022 -0700 Sync changes from pytorch/torchdynamo, enable tests (#86950) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86950 Approved by: https://github.com/Chillee commit 78ef40973c1d6b97a9002be323a5f46ed83b58ee Author: Will Constable Date: Fri Oct 14 22:34:33 2022 +0000 Set -Werror=braced-scalar-init (#86911) - `vector({0})` would give you the vector(size, ...) ctor and produce an empty vector of T, along with the scalar-init warning - `vector({T(0)})` would give you the vector of a single T(0) as you might have intended, and bypasses the warning/error - the warning can easily be missed but can have serious consequences, so make it an error Pull Request resolved: https://github.com/pytorch/pytorch/pull/86911 Approved by: https://github.com/albanD commit 155b88580694a92c0f8304442d685139616e52e3 Author: maxren Date: Fri Oct 14 14:00:21 2022 -0700 [xnnpack][lite-int] preprocess (#86980) Split up original preprocess diff: This diff introduces the skeleton structure of the delegate APIs. first introducing the method compile spec error handling. For now it just outputs an empty tensor object upon execute. But just proves that delegate apis is working and a new xnnpack delegate backend has been added. Differential Revision: [D38562918](https://our.internmc.facebook.com/intern/diff/D38562918/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38562918/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86980 Approved by: https://github.com/salilsdesai, https://github.com/cccclai commit 7c73b456211efe5d9f0d0a65f9a509b26d24f1aa Author: shubhambhokare1 Date: Fri Oct 14 21:58:01 2022 +0000 [onnx] Add support for autograd function inlining in ONNX_ATEN_FALLBACK mode (#85736) Solution to #85027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85736 Approved by: https://github.com/BowenBao commit d29c8c0ffa68f11790fc2e9fd78778bb8e9bc281 Author: Catherine Lee Date: Fri Oct 14 21:44:13 2022 +0000 enable optim tests on dynamo to test flaky bot (#86976) will link the issue that disabled them if this gets approved Pull Request resolved: https://github.com/pytorch/pytorch/pull/86976 Approved by: https://github.com/albanD commit 1a7409c77199403153f1260e2281bae2f76745f6 Author: maxren Date: Fri Oct 14 10:37:42 2022 -0700 [CoreML][ios_crash] Use special throw macro when encountering CoreML API errors (#86938) Error messages from TORCH_CHECK are stripped during production builds via -DSTRIP_ERROR_MESSAGES. This diff introduces a new macro COREML_CHECK which will always preserve the error message. This macro is used when encountering errors produced by CoreML API calls so that we can heve enough context to debug. Differential Revision: [D40351013](https://our.internmc.facebook.com/intern/diff/D40351013/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86938 Approved by: https://github.com/salilsdesai commit 34c86adec49322ab6586a65b9817ef282d44d55e Author: Brian Hirsh Date: Fri Oct 14 10:01:32 2022 -0700 symintify all of derivatives.yaml (#86610) Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing. * with the exception of `split()`, which is tougher to land because it requires internal changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610 Approved by: https://github.com/albanD commit d7bbb61f6b0c1a120e603e6313114457c4909835 Author: Brian Hirsh Date: Fri Oct 14 09:06:22 2022 -0700 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86609) This PR shouldn't matter too much, but I figured I'd land it instead of deleting. `PySymInt.min/max` are technically broken today, and this fixes them - but it doesn't matter (yet) because nobody is calling `min()` / `max()` on symints from python (they all happen using `std::min/max` in C++, which desugar to lt / gt calls). Pull Request resolved: https://github.com/pytorch/pytorch/pull/86609 Approved by: https://github.com/albanD commit 1bb609ad47902353018948f4cd04a0aee9542e43 Author: Sean Ross-Ross Date: Wed Oct 12 14:22:47 2022 -0500 Added new test test_compare_cpu that checks if cpu and gpu results are consistent (#85011) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85011 Approved by: https://github.com/lezcano, https://github.com/mruberry commit e027740e7745bb0843d31337be3a17b805f4f712 Author: Lukas Mührke <46906556+LukasM937@users.noreply.github.com> Date: Fri Oct 14 19:59:33 2022 +0000 Chore: Add 'mps' to the docs of tensor_attributes (#86585) Since PyTorch supports 'mps' (Apple metal) devices it should be reflected in the documentation. 
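As a minimal illustration of the device string that #86585 documents, here is a sketch assuming a machine with Metal support (guarded so it falls back gracefully elsewhere):

```python
import torch

# 'mps' is an ordinary torch.device type, alongside 'cpu' and 'cuda'.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4, 4, device=device)
print(x.device)  # device(type='mps', index=0) on an Apple-silicon Mac
```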
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86585 Approved by: https://github.com/albanD commit fc3afc840784106b173c87c95b1ee96a4018bb3d Author: Ivan Yashchuk Date: Fri Oct 14 19:49:39 2022 +0000 Remove empty_like+fill from AOT Autograd graphs for nvFuser (#86908) AOT Autograd records C++ code `1 - tensor` as a sequence of empty_like, fill, and sub (see https://github.com/pytorch/pytorch/issues/86612). Both empty_like and fill are not supported yet. This PR is a workaround for enabling fusions of `silu_backward`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86908 Approved by: https://github.com/ngimel commit 56a744bf47edd1adb423593955b786a2ede8bd4f Author: Justin Chu Date: Fri Oct 14 19:44:44 2022 +0000 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) In https://github.com/pytorch/pytorch/issues/86325, it was reported that ScriptedModule do not have a training attribute and will fail export because we don't expect them as input. Also - Parameterized the test_util_funs test Thanks @borisfom for the suggestion! Fixes #86325 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86745 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 527ebedbff717074189a4c499ad5a62712442300 Author: Andrew M. James Date: Fri Oct 14 09:25:46 2022 -0500 Sparse support for ReLU (#86749) ReLU support for all sparse layouts, including backward. Fixes #85208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86749 Approved by: https://github.com/cpuhrsch, https://github.com/nikitaved commit ef045695e0b622968d7c15f86a60ccc4f3b0a1ed Author: Sherlock Huang Date: Fri Oct 14 15:51:26 2022 +0000 Fix decomp for huber_loss_backward (#86955) Fixes https://github.com/pytorch/pytorch/issues/86846 aten.huber_loss_backward calls aten.huber_loss_backward.out in its CompositeExplicitAutograd kernel. The decomp was mistaken registered for both aten.huber_loss_backward.default and aten.huber_loss_backward.out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86955 Approved by: https://github.com/Chillee commit 7da018b2f80c04038f797dbb76168416de8e2529 Author: Richard Zou Date: Fri Oct 14 11:39:45 2022 -0700 [functorch] fix fbcode tests (#86936) Differential Revision: [D40358418](https://our.internmc.facebook.com/intern/diff/D40358418) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86936 Approved by: https://github.com/samdow commit f17b3e9b7adaa849b2065fdcb5efb1b444f4725a Author: Peter Bell Date: Fri Oct 14 16:27:15 2022 +0100 Vectorize tensor lerp kernel (#84845) Fixes #86964 In a simple timeit benchmark I see 1.7x speedup for complex64, from 6.7 us to 3.9 us; and a 3.2x speedup for float32, from 6.2 us to 1.9 us. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84845 Approved by: https://github.com/lezcano, https://github.com/malfet commit 13cff2ee8ea1d7aea2ad201cbd77ebe2b9a29d25 Author: Nikita Shulga Date: Fri Oct 14 17:35:18 2022 +0000 [MPS] Copy from CPU always add storageOffset (#86958) Because why wouldn't it? 
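A repro-style sketch of the scenario #86958 fixes: copying a CPU view with a non-zero storage offset to the 'mps' device should preserve the viewed values (assumes an MPS-capable machine; the slice is only illustrative):

```python
import torch

base = torch.arange(10, dtype=torch.float32)
view = base[4:]                    # a view with storage_offset() == 4
assert view.storage_offset() == 4

if torch.backends.mps.is_available():
    on_mps = view.to("mps")
    # Before the fix, the CPU->MPS copy could start from the beginning of the
    # underlying buffer, ignoring the view's storage offset.
    assert torch.equal(on_mps.cpu(), view)
```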
Fixes https://github.com/pytorch/pytorch/issues/86052 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86958 Approved by: https://github.com/kulinseth commit 1ece1ab6c2c5488b8475c70681aebddbdb9579ba Author: Catherine Lee Date: Fri Oct 14 17:31:31 2022 +0000 [ci] print rerun stacktraces for pytest (#86831) example: https://github.com/pytorch/pytorch/actions/runs/3238428826/jobs/5306808276 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86831 Approved by: https://github.com/huydhn commit d393a463ff5140b9257c0650137e03db0a78de58 Author: Huy Do Date: Fri Oct 14 17:26:49 2022 +0000 Fix functorch test selection logic (#86944) I realize that `run_test.py` doesn't take into account functorch test selection logic at the moment, for example `python test/run_test.py --functorch -i functorch/test_ops --verbose` stills run all functorch tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/86944 Approved by: https://github.com/clee2000, https://github.com/malfet commit bbd7b38d5580c44ffb4404d431e07bc2316e59d5 Author: PyTorch MergeBot Date: Fri Oct 14 17:22:55 2022 +0000 Revert "symintify nll loss fns (#86915)" This reverts commit 0ece7c86d829e2515e8b7d5df13cf0279b70c0e9. Reverted https://github.com/pytorch/pytorch/pull/86915 on behalf of https://github.com/anjali411 due to test_autocast_nn_fp32 fails commit 0ece7c86d829e2515e8b7d5df13cf0279b70c0e9 Author: anjali411 Date: Fri Oct 14 14:21:10 2022 +0000 symintify nll loss fns (#86915) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86915 Approved by: https://github.com/albanD commit a86278b08c12ba8db203bee22c56958a3c245b3e Author: Chien-Chin Huang Date: Thu Oct 13 09:42:14 2022 -0700 [FSDP] Consolidate FSDP state_dict offload_to_cpu settings (#86211) Consolidate FSDP state_dict offload_to_cpu settings. All state_dict_types now have offload_to_cpu options. Differential Revision: [D40065969](https://our.internmc.facebook.com/intern/diff/D40065969/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86211 Approved by: https://github.com/rohan-varma commit c9a8d309bda59164554b38deff18ac8bf824af34 Author: Catherine Lee Date: Fri Oct 14 16:04:04 2022 +0000 add super setup to test to enable disabling in test_dims.py (#86953) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86953 Approved by: https://github.com/huydhn commit 8eb579e362581d2ab2c440b4aad8b39fde4a9920 Author: PyTorch MergeBot Date: Fri Oct 14 14:56:59 2022 +0000 Revert "[Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512)" This reverts commit 157a3d2a7cd25779258f3e3dcef14633f1930103. Reverted https://github.com/pytorch/pytorch/pull/85512 on behalf of https://github.com/DanilBaibak due to Due to files were deleted, the internal build failed. Please re-submit via codev. commit 4460e40db4300b2b0d5dbfaedee0d82a19c444b9 Author: Nikita Karetnikov Date: Thu Oct 13 20:50:20 2022 +0200 [primTorch] Add a ref for `addcmul` (#86731) Based on: https://github.com/pytorch/pytorch/pull/79827 https://github.com/pytorch/pytorch/pull/72949 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86731 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 746500d58d90bb8d7833596d2c38e0f8142859d8 Author: PyTorch MergeBot Date: Fri Oct 14 14:25:51 2022 +0000 Revert "[cuDNN] Enable cuDNN Frontend v8 API by Default (#84948)" This reverts commit 427e0a6b4ebc691f1fa98662d04d5c431a75107f. 
Reverted https://github.com/pytorch/pytorch/pull/84948 on behalf of https://github.com/malfet due to Broke SM86 sanity commit 2cfc4cb36748a250f5252f1844f570c0cb806b8f Author: Ivan Yashchuk Date: Fri Oct 14 12:15:28 2022 +0000 Add optional recomputable_ops argument for the min cut partitioner (#86686) `min_cut_rematerialization_partition` has a default set of hard-coded operations that are allowed to be recomputed in the backward pass. This PR adds customization ability to this function allowing users to control the behavior by passing `recomputable_ops` instead of relying on the default setting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86686 Approved by: https://github.com/Chillee commit fd8068478469077753f34873c50656d3a44e01e1 Author: Ivan Yashchuk Date: Fri Oct 14 12:08:02 2022 +0000 Add nvFuser support for torch.Tensor.view (#84634) This is an alternative to https://github.com/pytorch/pytorch/pull/83739. While PrimTorch has `view` as a reference, we would like to use nvFuser's implementation for `view` for now. Later we might transition to PrimTorch's `torch._refs.view`. See `test_nvprims_view` for examples of things that are now sent to nvFuser. Note that nvFuser's `view` is a copy-like operation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84634 Approved by: https://github.com/kevinstephano, https://github.com/mruberry commit b48deedb77003261fb0331048ab00e19fba901ee Author: Alvaro Gaona Date: Fri Oct 14 11:33:32 2022 +0000 Set up new module torch.signal.windows (#85599) Resolves #85366 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85599 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 056cfb0464bd137b0c4848a02ad0b6f283c25320 Author: PyTorch MergeBot Date: Fri Oct 14 05:40:18 2022 +0000 Revert "[ONNX] Update training state logic to support ScriptedModule (#86745)" This reverts commit 960b98128e475b15b66119f325232039799852cd. Reverted https://github.com/pytorch/pytorch/pull/86745 on behalf of https://github.com/janeyx99 due to https://hud.pytorch.org/pytorch/pytorch/commit/960b98128e475b15b66119f325232039799852cd broke onnx tests on trunk commit 157a3d2a7cd25779258f3e3dcef14633f1930103 Author: Taylor Robie Date: Thu Oct 13 07:49:03 2022 -0700 [Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512) The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of profiler, and it is in maintaince mode pending deprecation. As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortuantely BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`. One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure. (Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.) Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512 Approved by: https://github.com/aaronenyeshi commit 35fb0077495247bcda218a136c3a70f3022de7d2 Author: Taylor Robie Date: Thu Oct 13 07:49:00 2022 -0700 [Profiler][Minor] Separate standalone profilers from the main PyTorch profiler. (#85511) There are a number of instrumentation utils which have been added to the profiler toolkit. 
They are generally small and self contained, often wrapping vendor APIs. (NVTX, ITT) They don't really interact with the much more expansive machinery of the PyTorch profiler beyond registration / unregistration, minor util sharing, and reusing the profiler base class. Just as in the case of stubs, it makes sense to group them in a dedicated subfolder. Differential Revision: [D39108649](https://our.internmc.facebook.com/intern/diff/D39108649/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39108649/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/85511 Approved by: https://github.com/albanD commit b8f14b7877cf1107f4572fbacf5aabba83aec641 Author: Taylor Robie Date: Thu Oct 13 07:48:58 2022 -0700 [Profiler][Minor] Group and consolidate stub APIs (#85510) There is a concept in profiler of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However for the most part we don't really interact with them when developing the PyTorch profiler. Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top level `torch/csrc/profiler` directory. Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510 Approved by: https://github.com/aaronenyeshi commit bc4ca4c2c4085e1ea2c212718d4470d057ec7c3f Author: Chien-Chin Huang Date: Thu Oct 13 10:56:26 2022 -0700 [FSDP] Fix load_sharded_state_dict FQN mismatches for shared parameters (#86524) `_sharded_pre_load_state_dict_hook()` should calls `_param_fqns()` to ensure shared parameters names are also included. Differential Revision: [D40201304](https://our.internmc.facebook.com/intern/diff/D40201304/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86524 Approved by: https://github.com/rohan-varma commit 960b98128e475b15b66119f325232039799852cd Author: Justin Chu Date: Fri Oct 14 01:31:40 2022 +0000 [ONNX] Update training state logic to support ScriptedModule (#86745) In https://github.com/pytorch/pytorch/issues/86325, it was reported that ScriptedModule do not have a training attribute and will fail export because we don't expect them as input. Also - Parameterized the test_util_funs test Thanks @borisfom for the suggestion! Fixes #86325 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86745 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit f451e824f39516f503c2bdfd785d254b447b9557 Author: PyTorch MergeBot Date: Fri Oct 14 01:26:45 2022 +0000 Revert " C10D extension to enable per-thread PG (#86348)" This reverts commit 97abc21f2bda38e73de2a86da7f43c8126930681. Reverted https://github.com/pytorch/pytorch/pull/86348 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks macos tests https://hud.pytorch.org/pytorch/pytorch/commit/97abc21f2bda38e73de2a86da7f43c8126930681 commit c16c4a37abca2f1e2bb2918307e19bfa40e9500f Author: Huy Do Date: Fri Oct 14 00:47:16 2022 +0000 Remove functorch copy of conftest.py (#86927) Now that its tests have been moved to PyTorch test. 
This was a left over from https://github.com/pytorch/pytorch/pull/86623 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86927 Approved by: https://github.com/clee2000 commit b3b9786fdd1dbbadb4b75190646b5d0bb5c89771 Author: Horace He Date: Thu Oct 13 20:19:16 2022 +0000 Unified symbolic shape variables between AOTAutograd and Inductor (#86659) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86659 Approved by: https://github.com/wconstab commit c7c09722ad5ee25c5891f863e5bbd1575ad77970 Author: Jason Ansel Date: Thu Oct 13 23:18:06 2022 +0000 Move TorchDynamo into PyTorch core (#86461) Context: https://github.com/pytorch/torchdynamo/issues/1588 This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core. - `torchdynamo` becomes `torch._dynamo` - `torchinductor` becomes `torch._inductor` This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461 Approved by: https://github.com/voznesenskym commit 97abc21f2bda38e73de2a86da7f43c8126930681 Author: Rodrigo Kumpera Date: Thu Oct 13 22:23:28 2022 +0000 C10D extension to enable per-thread PG (#86348) Move a bunch of globals to instance methods and replace all use to them. We move all PG related globals under World and use a singleton instance under _world. This creates an undocumented extension point to inject full control of how how c10d state behaves. One simple hack is to change _world to an implementation that uses a threadlocal and enable per-thread PGs. It almost get DDP working and the PG is missing an implementation of all_reduce. This enables notebook usage of PTD, which is a big deal for learning it: https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68 This change ensures BC by keeping the global variables around and have the default _World wrap it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86348 Approved by: https://github.com/rohan-varma commit 66979fbfaa2af227a6834157fa6f532979b2d23b Author: Peter Bell Date: Thu Oct 13 17:42:11 2022 +0000 Improve complex lerp performance (#84844) The complex lerp kernel uses `std::abs(z) < 0.5` which involves computing a sqrt. Instead compare the square against 0.25 has much lower latency and so performs much better overall. In a simple timeit benchmark I see more than 10x speedup on CPU for a 4096 element complex lerp, from 84 us to 6.7 us. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84844 Approved by: https://github.com/ngimel commit ae45dab57e22e3d04516e7dd81ef8dbefd51bfe3 Author: Catherine Lee Date: Thu Oct 13 21:27:52 2022 +0000 disable failing circleci test jobs (#86940) should revert later when fixed Pull Request resolved: https://github.com/pytorch/pytorch/pull/86940 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi commit 974ad8fa6cc63b89234beb5ebff54c2d42711932 Author: sanchitintel Date: Thu Oct 13 20:36:59 2022 +0000 Add BFloat16 dtype support for oneDNN Graph JIT fuser (#85591) Intel Xeon Cooper Lake platform & beyond support the `AVX512_BF16` ISA, which is essentially native BFloat16 support. oneDNN Graph delivers high inference performance with BFloat16 on such machines. 
While oneDNN Graph can still be used with BFloat16 on older machines that lack the `avx512_bf16` ISA but support the `avx512bw`, `avx512vl` & `avx512dq` ISAs, the BF16 performance on these older machines will be significantly poorer (probably even poorer than Float32), as they lack native BF16 support. Currently, [AMP support for eager mode & JIT mode is divergent in PyTorch](https://github.com/pytorch/pytorch/issues/75956). So, for using oneDNN Graph with BFloat16, eager-mode AMP should be leveraged by turning off AMP for JIT mode, using `torch._C._jit_set_autocast_mode(False)` in python code, so as to avoid conflicts. Please use the following environment variable to view JIT logs - `PYTORCH_JIT_LOG_LEVEL=">>graph_helper:>>graph_fuser:>>kernel:>>interface"`

1. This PR does NOT change the `oneDNN` commit or the `ideep` files. While the `ideep` commit is being updated, only files pertaining to oneDNN Graph are being updated. oneDNN Graph is being upgraded to version 0.5.2 (alpha patch release 2). To put things into perspective, `ideep` is a git submodule of PyTorch. `oneDNN Graph` is a git submodule of `ideep` (`ideep/mkl-dnn`), and oneDNN is a git submodule of oneDNN Graph (`ideep/mkl-dnn/third_party/oneDNN`).
2. Unit-tests are being updated. We now use the [existing dtypes decorator](https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_device_type.py#L123-L131).
3. Suggestions made by @eellison in the [FP32 PR](https://github.com/pytorch/pytorch/pull/68111#pullrequestreview-896719477) are being incorporated/addressed:

| Action-item | Status |
| :--- | ---: |
| checkInputCompatibility follow up | Fixed |
| the mayConvertScalarInputToTensor logic we can consider | Added type promotion code |
| fix up fixConvOptionalBias | The current approach seems correct |
| Use opinfo tests | using dtypes decorator. Will use `OpInfo` in a subsequent PR, if that'd be possible. Should we create a list of ops from opDB that are supported by oneDNN Graph, and add it to `common_methods_invocations.py`? |
| inferDevice torch_check call | not necessary now, perhaps, as only CPU is supported, for now? We'd add it by the beta release of oneDNN Graph, though, so that by then, users might be able to use other fusers with oneDNN Graph (NNC/TensorExpr are already compatible with the oneDNN Graph fuser). We can still add it, if you'd insist. |
| not checking shapes of input mkldnn tensor to llga guard | Those checks should not be present because oneDNN Graph may use blocked or channels-last layout, so those strides would be different. They're only skipped if an LLGA subgraph's output is input to another LLGA subgraph, which enables LLGA to choose an optimal layout between them. |
| fix test failures with respect to unsupported inputs | We'll address them with the upcoming release of oneDNN Graph beta version |

4. More PyTorch ops are being mapped to oneDNN Graph:

```python
example_input = torch.rand(1, 3, 224, 224)
torch.jit.enable_onednn_fusion(True)
torch._C._jit_set_autocast_mode(False)
with torch.no_grad(), torch.cpu.amp.autocast():
    model = torch.jit.trace(model, (example_input))
    model = torch.jit.freeze(model)
    model(example_input)
    model(example_input)
    model(example_input)
```

**URL:** https://github.com/sanchitintel/benchmark/tree/onednn_graph_benchmark (instructions present at URL). 
**Batch-size(s):** TorchBench-default for each model
**Baseline:** PyTorch JIT OFI FP32
**Machine:** Intel(R) Xeon(R) Platinum 8371HC (Cooper Lake)
**Sockets used**: 1
**Number of cores on one socket**: 26
Intel OpenMP & tcmalloc were preloaded

| name | latency of PyTorch JIT OFI FP32 (s) | Latency of oneDNN Graph BF16 (s) | % change |
| :--- | ---: | ---: | ---: |
| test_eval[alexnet-cpu-jit] | 1.063851 | 0.509820 | -52.1% |
| test_eval[mnasnet1_0-cpu-jit] | 0.218435 | 0.107100 | -51.0% |
| test_eval[mobilenet_v2-cpu-jit] | 0.114467 | 0.058359 | -49.0% |
| test_eval[mobilenet_v3_large-cpu-jit] | 0.233873 | 0.117614 | -49.7% |
| test_eval[resnet18-cpu-jit] | 0.160584 | 0.075854 | -52.8% |
| test_eval[resnet50-cpu-jit] | 1.652846 | 0.713373 | -56.8% |
| test_eval[resnext50_32x4d-cpu-jit] | 0.471174 | 0.209431 | -55.6% |
| test_eval[shufflenet_v2_x1_0-cpu-jit] | 0.310306 | 0.167090 | -46.2% |
| test_eval[squeezenet1_1-cpu-jit] | 0.161247 | 0.045684 | -71.7% |
| test_eval[timm_efficientnet-cpu-jit] | 1.643772 | 0.800099 | -51.3% |
| test_eval[timm_regnet-cpu-jit] | 5.732272 | 2.333417 | -59.3% |
| test_eval[timm_resnest-cpu-jit] | 1.366464 | 0.715252 | -47.7% |
| test_eval[timm_vision_transformer-cpu-jit] | 0.508521 | 0.271598 | -46.6% |
| test_eval[timm_vovnet-cpu-jit] | 2.756692 | 1.125033 | -59.2% |
| test_eval[vgg16-cpu-jit] | 0.711533 | 0.312344 | -56.1% |

| name | latency of PyTorch JIT OFI FP32 (s) | Latency of oneDNN Graph BF16 (s) | % change |
| :--- | ---: | ---: | ---: |
| test_eval[alexnet-cpu-jit] | 0.062871 | 0.034198 | -45.6% |
| test_eval[mnasnet1_0-cpu-jit] | 0.022490 | 0.008172 | -63.7% |
| test_eval[mobilenet_v2-cpu-jit] | 0.012730 | 0.005866 | -53.9% |
| test_eval[mobilenet_v3_large-cpu-jit] | 0.025948 | 0.010346 | -60.1% |
| test_eval[resnet18-cpu-jit] | 0.011194 | 0.005726 | -48.9% |
| test_eval[resnet50-cpu-jit] | 0.124662 | 0.045599 | -63.4% |
| test_eval[resnext50_32x4d-cpu-jit] | 0.034737 | 0.015214 | -56.2% |
| test_eval[shufflenet_v2_x1_0-cpu-jit] | 0.028820 | 0.012517 | -56.6% |
| test_eval[squeezenet1_1-cpu-jit] | 0.012557 | 0.003876 | -69.1% |
| test_eval[timm_efficientnet-cpu-jit] | 0.203177 | 0.051879 | -74.5% |
| test_eval[timm_regnet-cpu-jit] | 0.452050 | 0.151113 | -66.6% |
| test_eval[timm_resnest-cpu-jit] | 0.117072 | 0.052848 | -54.9% |
| test_eval[timm_vision_transformer-cpu-jit] | 0.046048 | 0.023275 | -49.5% |
| test_eval[timm_vovnet-cpu-jit] | 0.213187 | 0.077482 | -63.7% |
| test_eval[vgg16-cpu-jit] | 0.044726 | 0.021998 | -50.8% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85591 Approved by: https://github.com/jgong5, https://github.com/frank-wei, https://github.com/chunyuan-w commit 14dd5db2f50ceb8fb3e7ab565e348e2eb616791a Author: Rodrigo Kumpera Date: Thu Oct 13 20:28:44 2022 +0000 [fsdp] Fix test for 2d parallel integration to trigger the load hooks. (#86272) nit: replaced empty array bool test with explicit test for its length. 
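To make the style nit for #86272 concrete, a tiny sketch (the variable name is made up):

```python
loaded_params = []  # hypothetical list populated by the load hooks under test

# Truthiness is ambiguous for array-like objects (bool() on a multi-element
# tensor or ndarray raises), so prefer an explicit length check:
if len(loaded_params) == 0:
    raise AssertionError("expected the load hooks to populate parameters")
```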
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86272 Approved by: https://github.com/awgu commit 18f58e2df1f5997c93f213c94a60eb72a63a05e4 Author: Jerry Zhang Date: Thu Oct 13 10:13:11 2022 -0700 [quant][be] Rename node_name_to_target_dtype to node_name_to_target_dtype_info (#86860) Summary: att, renaming to improve readability Test Plan: no functionality changes Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86860 Approved by: https://github.com/jcaip commit 158a071034a45ead778107beceedd6b696ff5234 Author: inisis Date: Thu Oct 13 20:12:52 2022 +0000 add _freeze for embedding op (#86769) Fixes #86663 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86769 Approved by: https://github.com/albanD commit e737f2d81c8f83e5020d3383b320e024bb908a47 Author: Nikolay Korovaiko Date: Thu Oct 13 19:35:31 2022 +0000 set the correct size of aten tensor in presence of mkldnn padding (#86767) This fixes https://github.com/pytorch/pytorch/issues/86556 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86767 Approved by: https://github.com/eellison commit 860ad04990addc6f6ba130c7d252cb23689ddceb Author: BowenBao Date: Mon Oct 10 16:47:18 2022 -0700 [ONNX] Fix FindCommonAncestor in function_extraction (#86650) One line fix to get absolute value of `diff` before looping over. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86650 Approved by: https://github.com/AllenTiTaiWang, https://github.com/abock commit af1dcef79c1a91ae03faacbdeb2f9127013f7889 Author: BowenBao Date: Wed Oct 12 14:59:02 2022 -0700 [ONNX] Fix triu/tril export with diagonal input (#86843) Investigation with @thiagocrepaldi discovered this bug with triu/tril export when `diagonal` is passed in as input. Previously assumption was made that `diagonal` is always provided a constant value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86843 Approved by: https://github.com/thiagocrepaldi, https://github.com/abock commit dbdfb8dd8b7e2ffe427e4acd045249b89236af9b Author: Ivan Yashchuk Date: Thu Oct 13 18:08:58 2022 +0000 Skip test_nvfuser_extremal_values for native_batch_norm (#86897) New tests were introduced with https://github.com/pytorch/pytorch/commit/68a6113248ac25841b524d59f9dc0f298b389ba2. This PR explicitly skips the problematic tests. Fixes https://github.com/pytorch/pytorch/issues/86176 Fixes https://github.com/pytorch/pytorch/issues/86177 Fixes https://github.com/pytorch/pytorch/issues/86178 Fixes https://github.com/pytorch/pytorch/issues/86179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86897 Approved by: https://github.com/soulitzer commit 2ce6150d23928773c35274aec369eb0a5ecd6fa4 Author: BowenBao Date: Wed Oct 12 10:35:22 2022 -0700 [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) Fix the source of metadata for copied constant. Since the constant is being implicitly casted, it makes more sense to assign code location and etc with the user node. This issue was discovered in https://github.com/pytorch/pytorch/issues/86627. This PR also adds unit test coverage for scope information of nodes when they are altered by CSE and related passes. 
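A sketch of the one-line fix described for #86650 above, with made-up names; the point is simply that the scope-depth difference must be non-negative before it is used as a loop bound:

```python
depth_a, depth_b = 2, 5        # hypothetical scope depths

diff = abs(depth_a - depth_b)  # previously the signed difference was used,
for _ in range(diff):          # so this loop never ran for one ordering
    pass                       # walk the deeper scope up one level
```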
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86716 Approved by: https://github.com/thiagocrepaldi, https://github.com/malfet commit 4839f73f329b38819e6f69a8662d61dc36558e52 Author: Sheil Kumar Date: Thu Oct 13 17:54:28 2022 +0000 Fix incorrect tensor storage check (#86845) Fix incorrect tensor storage check This change contains an incorrect check for storage: https://github.com/pytorch/pytorch/pull/86557 **self.storage is not None** should have been: **not torch._C._has_storage(self)** These fixes were run through the DirectML test suite, and confirm the check is now working correctly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86845 Approved by: https://github.com/martinb35, https://github.com/bdhirsh commit afc996386552c13e3910164507408295ab77689a Author: Frankie Robertson Date: Thu Oct 13 17:42:28 2022 +0000 Fix path to nested_tensor in example (#86891) This appears to be a typo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86891 Approved by: https://github.com/H-Huang commit 54ee95c8ecaf19ef6464daf0e5d967c781011101 Author: Kshiteej K Date: Thu Oct 13 17:36:37 2022 +0000 [nn] module: full_backward_pre_hook (#86700) Fixes https://github.com/pytorch/pytorch/issues/42824 * [x] Test * [x] Doc Pull Request resolved: https://github.com/pytorch/pytorch/pull/86700 Approved by: https://github.com/soulitzer commit 7dcfbedce071c62d0ac40ca86c844b5cd4b4d9ef Author: mikael10j Date: Thu Oct 13 17:31:33 2022 +0000 Fix LinearLR scheduler start_factor (#86695) Fixes #86454 The `start_factor` must be comprised in ]0;1] instead of [0;1] to avoid division by 0. This PR changes the lower limit checking of the parameter. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86695 Approved by: https://github.com/albanD commit 6ee94b572ac248b11a90558b7358c242ad9b56fa Author: samdow Date: Thu Oct 13 17:26:54 2022 +0000 [functorch] Add shard to run functorch tests with asan (#82164) This adds asan testing for functorch. It was running really long (>4hrs) with test ops, so we decided that those tests are probably redundant and skipped those. This brings this test's time down to ~30 min Pull Request resolved: https://github.com/pytorch/pytorch/pull/82164 Approved by: https://github.com/zou3519, https://github.com/malfet, https://github.com/huydhn commit 427e0a6b4ebc691f1fa98662d04d5c431a75107f Author: Eddie Yan Date: Thu Oct 13 17:26:36 2022 +0000 [cuDNN] Enable cuDNN Frontend v8 API by Default (#84948) Opening this PR for testing for now to check CI status. 🤞 CC @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/84948 Approved by: https://github.com/ngimel commit b0d80f4355ac75a19400c3bd278db104841ffbba Author: BowenBao Date: Mon Oct 10 17:23:55 2022 -0700 [ONNX] Clarify phrasing of skipScriptTest/skipTraceTest decorators (#86216) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86216 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock commit 0ee09996086db18d6f449d1a6743dd6f33d94153 Author: BowenBao Date: Tue Oct 11 00:15:59 2022 +0000 [ONNX] Renable assert diagnostic test (#85999) Fix to properly clear 'background_context' of export diagnostic 'engine' in `clear`. 
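For the module-level backward pre-hook added in #86700 above, a usage sketch, assuming the API landed as `register_full_backward_pre_hook`, mirroring the existing `register_full_backward_hook`:

```python
import torch

def log_grad_output(module, grad_output):
    # Runs before gradients w.r.t. the module's inputs are computed;
    # returning None leaves grad_output unchanged.
    print(f"{module.__class__.__name__}: grad_output norm = {grad_output[0].norm():.4f}")

linear = torch.nn.Linear(4, 2)
handle = linear.register_full_backward_pre_hook(log_grad_output)

linear(torch.randn(3, 4)).sum().backward()
handle.remove()
```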
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85999 Approved by: https://github.com/abock commit cff333bdb55b98d6c2464db684cf0f1a0f769987 Author: Tugsbayasgalan Manlaibaatar Date: Wed Oct 12 17:24:38 2022 -0700 Enable max.unary_out (#86855) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86855 Approved by: https://github.com/jerryzh168, https://github.com/bdhirsh commit 25811663af2f7ddf6623b28807697268eb2167ab Author: Colin Taylor Date: Thu Oct 13 16:48:24 2022 +0000 [FSDP] restricts meta model check to non ignored modules in FSDP (#86766) Summary: as title Test Plan: see test plan D40287799 Differential Revision: D40287890 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86766 Approved by: https://github.com/awgu commit ab6955067875c9a84c98de0e76a53ea46502a89c Author: Mikayla Gawarecki Date: Wed Oct 12 22:31:13 2022 +0000 Add nested squeeze.dim and unsqueeze (#86813) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86813 Approved by: https://github.com/drisspg commit e531cf7b2e55a6aa0eee711b260b3bb8cd56067e Author: HDCharles Date: Wed Oct 12 20:48:36 2022 -0700 [ao] fixing public v private for fx.backend_config_utils.py (#86037) Summary: just added a missing function to __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86037 Approved by: https://github.com/jerryzh168 commit d169f950da4e175db9fe0444d55a0f49f8ec9fcc Author: PyTorch MergeBot Date: Thu Oct 13 15:28:09 2022 +0000 Revert "Use CUTLASS GEMM for NT bmm [OSS-only] (#85894)" This reverts commit ef58a132f223d5abf2bd3f8bee380aca6c29d17f. Reverted https://github.com/pytorch/pytorch/pull/85894 on behalf of https://github.com/DanilBaibak due to Break internal build commit b97ae59e29ff78829632bd4ae24edd5ecc9cf5ea Author: Will Constable Date: Thu Oct 13 15:10:46 2022 +0000 Change legacy wrap_dim to work with symint == (#86842) - previously, sizes == vector({0}) failed to hit SymInt::operator==, causing a the loop to bail out too early and make an invalid call to downstream maybe_wrap_dim helper Pull Request resolved: https://github.com/pytorch/pytorch/pull/86842 Approved by: https://github.com/Chillee, https://github.com/malfet, https://github.com/albanD commit 3d9fd060f47fa623d241f4a8c2da6ea7ab6dfb72 Author: Richard Zou Date: Wed Oct 12 13:00:40 2022 -0700 [functorch] Add more details to the functorch install page (#86823) Added some details about: - `pip uninstall functorch` being helpful if there are problems - `pip install functorch` still working for BC reasons. 
Test Plan: - wait for docs preview Pull Request resolved: https://github.com/pytorch/pytorch/pull/86823 Approved by: https://github.com/samdow commit cbc01c4344238efb40151b0968536296d0f24331 Author: Peter Bell Date: Wed Oct 12 23:25:43 2022 +0100 OpInfo: Sample input cleanup (2/n) (#86379) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86379 Approved by: https://github.com/mruberry commit 2efc56d9d7faf21f5c90ea0523a7fa8ea76e1b1b Author: Peter Bell Date: Wed Oct 12 23:25:43 2022 +0100 OpInfo: Sample input cleanup (1/n) (#86231) This rewrites various sample and error input functions to: - use the convention of `make_arg = functools.partial(make_tensor, ...)` - use the new natural syntax for `SampleInput` construction - yield instead of returning a lists, to reduce memory consumption Pull Request resolved: https://github.com/pytorch/pytorch/pull/86231 Approved by: https://github.com/mruberry commit 45274c56a4547d9e3562ee40b0c515622ff80745 Author: BowenBao Date: Mon Oct 10 17:23:55 2022 -0700 [ONNX] Partially re-enable RoiAlign and RoiPool unit tests (#86169) This PR depends on https://github.com/pytorch/vision/pull/6685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86169 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock commit e17732b234e05eccad7e7e2d7fbd6c26f9bdca87 Author: Brian Hirsh Date: Wed Oct 12 13:55:14 2022 -0700 [test] add cross-ref tests for python meta kernels (#86228) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86228 Approved by: https://github.com/albanD commit 0feccda7d74a23509e1b1edd0c5c76d5f67fa813 Author: Brian Hirsh Date: Wed Oct 12 13:55:13 2022 -0700 fix aliasing bug in pixel shuffle/unshuffle (#86608) Fixes https://github.com/pytorch/pytorch/issues/82235 cc @albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write Pull Request resolved: https://github.com/pytorch/pytorch/pull/86608 Approved by: https://github.com/albanD commit 337605054359a63083edcc7dcd8d887ce32947ed Author: Brian Hirsh Date: Wed Oct 12 13:55:13 2022 -0700 fix type promotion for group_norm composite C++ kernel (#86607) python decomp for `native_group_norm` is correct in more cases than the C++ composite. Updating the tests to fail properly in this case was more annoying than just fixing the C++ decomp, so I fixed it here. When the input tensor had a dtype with less precision than float32, the C++ decomp would unconditionally set the mean/variance to float32, which was wrong. 
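To make the conventions from #86231 above concrete, a hedged sketch of a sample-input function written in that style (the op and shapes are made up; `SampleInput` and `make_tensor` are PyTorch's internal test helpers):

```python
import functools

from torch.testing import make_tensor
# Internal helper; this was its import path in 2022-era PyTorch.
from torch.testing._internal.common_methods_invocations import SampleInput

def sample_inputs_myop(op_info, device, dtype, requires_grad, **kwargs):
    make_arg = functools.partial(
        make_tensor, device=device, dtype=dtype, requires_grad=requires_grad
    )
    # Yield samples lazily instead of returning a list, using the "natural"
    # SampleInput construction.
    yield SampleInput(make_arg(4, 4))
    yield SampleInput(make_arg(4, 4), make_arg(4))  # hypothetical second operand
```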
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86607 Approved by: https://github.com/albanD commit 6907db3f9578fc8cc477c175d982c6dcac69332d Author: Brian Hirsh Date: Wed Oct 12 13:55:13 2022 -0700 fix aliasing for primtorch view meta kernels (#86285) Fixes https://github.com/pytorch/pytorch/issues/86284 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86285 Approved by: https://github.com/albanD, https://github.com/mruberry commit 77e68b16cc1d320852742274bb8a15d1aa7f4915 Author: Michael Andreas Dagitses Date: Thu Oct 13 06:14:21 2022 -0700 suggest rebasing through @pytorchbot if PR is stale (#86898) Summary: Test Plan: Testing on GitHub with `stale_pr_days` set to zero. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86898 Approved by: https://github.com/malfet commit 8fffb79771f367e767cc85c31e5e0daed9f6eb7c Author: Richard Zou Date: Wed Oct 12 11:38:56 2022 -0700 Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) This PR adds vmap support for slogdet -- slogdet just decomposes into linalg.slogdet. This fixes a regression from functorch 0.2.1 (slogdet had a batching rule then, and doesn't anymore). We didn't catch the regression because it seems like slogdet doesn't have an OpInfo (I'm not sure if it had one before). Test Plan: - new one-off test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86815 Approved by: https://github.com/samdow commit 77d94ac5ab0c15bdfb2dfe6df6ab8ad87f67edef Author: Syed Tousif Ahmed Date: Thu Oct 13 14:03:01 2022 +0000 Sets CUDA_MODULE_LOADING to LAZY when not set by the user (#85692) This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY". It was tested using the following commands: ``` python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)" ``` which shows a memory usage of: 287,047,680 bytes vs ``` CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)" ``` which shows 666,632,192 bytes. C++ implementation is needed for the libtorch users (otherwise it could have been a pure python functionality). cc: @ptrblck @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692 Approved by: https://github.com/malfet commit 30a8a87c80dbfd7df81927a5acd190fac2240e04 Author: Tugsbayasgalan Manlaibaatar Date: Wed Oct 12 18:26:35 2022 -0700 Fix autogen for _ctc_loss.Tensor (#86871) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86871 Approved by: https://github.com/larryliu0820 commit dc6ce1485ec576df6f8d9f9e9717628802995cf4 Author: Salil Desai Date: Tue Oct 11 22:49:15 2022 -0700 Use Variable Size Indices in Sparse Qlinear Code (#85247) Final changes to enable sparse weight packing with variable size indices pack_block_sparse.cc is deleted because all functions in it have a template added, so they are moved to pack_block_sparse.h Differential Revision: [D39025651](https://our.internmc.facebook.com/intern/diff/D39025651/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39025651/)! 
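A small usage sketch for the slogdet batching rule restored in #86815 above, using the 2022-era `functorch` namespace (`torch.func.vmap` in later releases):

```python
import torch
from functorch import vmap

batch = torch.randn(8, 3, 3)

# Batched slogdet via vmap; per the PR, slogdet decomposes into
# torch.linalg.slogdet under the hood.
sign, logabsdet = vmap(torch.linalg.slogdet)(batch)

ref_sign, ref_logabsdet = torch.linalg.slogdet(batch)  # natively batched reference
assert torch.allclose(logabsdet, ref_logabsdet)
```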
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85247 Approved by: https://github.com/digantdesai commit d3afd49c85947a178dc7e2f15f97206387d2a279 Author: Salil Desai Date: Tue Oct 11 22:49:13 2022 -0700 Enable 16bit and 8bit Row/Col Indices in Qnnpack Fully Connected Sparse Op (#85246) This diff enables using the 16bit and 8bit kernels added in the previous diff. (This change used to be in D38954842 v11 but was moved into its own diff) Differential Revision: [D39403164](https://our.internmc.facebook.com/intern/diff/D39403164/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85246 Approved by: https://github.com/kimishpatel commit 6c6e06619f30fb3a12bc05738f6b7f39425618c9 Author: Salil Desai Date: Tue Oct 11 22:49:11 2022 -0700 Add 16bit and 8bit row/col indices q8gemm sparse kernels (#85245) TLDR: see D39003528 to see the actual changes in this diff more clearly, which will make reviewing easier ___ The 32bit versions were changed to be created with a macros which are also used to create 16bit and 8bit versions This diff shows that almost all of the lines in the .s files were modified, but most changes are just adding spaces to the front and ;/ to the end so they can be contained in the macro. To generate these changes, I first wrote the macros without the spaces and ;/, and then I ran a script (see the python file in D39003528) to get the final version. To review this diff more easily, if you want to see the code changes before I ran the script, which makes it much easier to see which lines were changed, see D39003528. Each version of this diff is synched with the same number version of that diff (so if I change this diff I will mirror the changes to the same version on that diff) Differential Revision: [D39003527](https://our.internmc.facebook.com/intern/diff/D39003527/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85245 Approved by: https://github.com/kimishpatel commit 6c6a32c2233f5d0820a265574734ab5706beeeee Author: Salil Desai Date: Tue Oct 11 22:49:09 2022 -0700 Enable Running Variable Size Row/Col Indices q8gemm Sparse Kernels in QNNPACK (#85244) For aarch32 and aarch64, the 16bit and 8bit versions of the kernels are left empty. I will be adding them in a future diff (D39003527) to avoid having this diff be too cluttered. Differential Revision: [D38954842](https://our.internmc.facebook.com/intern/diff/D38954842/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85244 Approved by: https://github.com/kimishpatel commit 4c0e1dc9808bc2c68ceaceae72e41308dccf8c5d Author: Salil Desai Date: Tue Oct 11 22:49:08 2022 -0700 Update Qnnpack Fully Connected Sparse Op to Store Variable Size Indices (#85243) Only uint32_t is supported for now, but uint16_t and uint8_t support will be added in future diffs. 
Differential Revision: [D38828545](https://our.internmc.facebook.com/intern/diff/D38828545/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85243 Approved by: https://github.com/kimishpatel commit 1a87c25fe19f117fd413ec469d7c21ac6ff44a62 Author: Nikita Shulga Date: Thu Oct 13 04:25:41 2022 +0000 Add functorch shard to sm86-periodic workflow (#86820) After https://github.com/pytorch/pytorch/pull/86799 was landed there shouldn't be a need to increase tolerances Pull Request resolved: https://github.com/pytorch/pytorch/pull/86820 Approved by: https://github.com/zou3519 commit cb4867a71a5944baaf6655bd765652cf37864443 Author: Emilio Castillo Date: Thu Oct 13 04:06:13 2022 +0000 Make `ASGD` & `RProp` differentiable (#86258) Blocked by #86183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86258 Approved by: https://github.com/albanD commit 5224906749c85f1e2f6d7ec37a02bd29bcdebef3 Author: Huy Do Date: Thu Oct 13 03:31:28 2022 +0000 Spread distributed backends among all distributed shards (#86837) So that they can be run in parallel without stepping on each other toe Pull Request resolved: https://github.com/pytorch/pytorch/pull/86837 Approved by: https://github.com/clee2000 commit 48c648d75df4a2d02ede71f34c11b7f48c80da0e Author: Peter Bell Date: Tue Oct 11 03:24:07 2022 +0100 Fix typo TORCH_ONLY_METHOD_OPERATORS -> TORCH_ASSERT_ONLY_... (#86661) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86661 Approved by: https://github.com/malfet commit 67fbd940bae60e4392fb72eb495d51f6e0261260 Author: HDCharles Date: Wed Oct 12 10:04:06 2022 -0700 [ao] fixing public v private for fx.quantization_types (#86036) Summary: this file doesn't actually exist anymore so its just a case of removing the exception for it Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86036 Approved by: https://github.com/jerryzh168 commit b00cdb5b3416d908898c30d5f070085f7765f916 Author: HDCharles Date: Wed Oct 12 10:04:05 2022 -0700 [ao] fixing public v private for quantization_patterns.py (#86034) Summary: no significant changes, just addded __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86034 Approved by: https://github.com/jerryzh168 commit 77d29bcee200f04bece4a86283acfb8e1ec830ad Author: Khushi Agrawal Date: Thu Oct 13 01:18:30 2022 +0000 [primTorch] special: ndtr, ndtri, log_ndtr, erfcx (#86077) - Adds prims and _refs for `erfcx` and `ndtri`. - Adds _refs for `ndtr`, and `log_ndtr`. 
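As a quick sanity check on what the new references in #86077 compute: `ndtr` is the standard normal CDF and `ndtri` its inverse, so they can be cross-checked against the erf identity ndtr(x) = 0.5 * (1 + erf(x / sqrt(2))):

```python
import math
import torch

x = torch.linspace(-3, 3, 7, dtype=torch.float64)
phi = 0.5 * (1 + torch.erf(x / math.sqrt(2)))
assert torch.allclose(torch.special.ndtr(x), phi)

# ndtri inverts ndtr (up to floating-point error).
p = torch.tensor([0.1, 0.5, 0.9], dtype=torch.float64)
assert torch.allclose(torch.special.ndtr(torch.special.ndtri(p)), p)
```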
cc @kshitij12345 @lezcano @mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/86077 Approved by: https://github.com/mruberry commit ea586c0579a1fce55dbba4be7c88e9e04e709cef Author: Michael Voznesensky Date: Thu Oct 13 00:54:17 2022 +0000 Fix up cond a bit to make it work w/ fake tensor (#86727) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86727 Approved by: https://github.com/zou3519 commit 2a75152537c364eafecc9046d3e82bfc934cd056 Author: Mikayla Gawarecki Date: Wed Oct 12 21:21:10 2022 +0000 [easy] Add nested tanh (#86826) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86826 Approved by: https://github.com/cpuhrsch commit b79bac0e4ddb3a0b956b8bd0b33ab88daaa64de4 Author: CaoE Date: Thu Oct 13 00:42:45 2022 +0000 Make the data types of output and input consistenst for batchnorm (#84410) The model TTS will crash due to the issue:: when input of BN is not contiguous and the data type of input is different with that of parameters, BN will raise error `RuntimeError: !needs_dynamic_casting::check(iter) INTERNAL ASSERT FAILED at "xxx/pytorch/aten/src/ATen/native/cpu/Loops.h":311, please report a bug to PyTorch`. Make the data types of output and input consistenst for batchnorm to fix the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84410 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet commit c2f29e75cd9e8edb5bb2bb4163a4e26dd8f7d9f4 Author: Catherine Lee Date: Thu Oct 13 00:42:40 2022 +0000 [flakybot] add dynamo as platform (#86701) corresponding pr in test-infra https://github.com/pytorch/test-infra/pull/874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86701 Approved by: https://github.com/huydhn commit 9470059766dcb3f1e67f9d015aec8b57239ea421 Author: Zain Rizvi Date: Thu Oct 13 00:38:45 2022 +0000 Allow viable/strict promotion even if periodic or docker-release-builds jobs are failing (#86827) Allow `viable/strict` promotion even if `periodic` or `docker-release-builds` jobs are failing **Why?** Those jobs only run occasionally and for all we know the current viable/strict commit may already include the errors that the above cron based workflows may have later detected. Blocking the viable/strict upgrade because of these scheduled jobs doesn't really offer any value, it just leads to people getting older PRs when they try to fork off of viable/strict without guaranteeing an improvement in test quality Though frankly, the current situation is worse than that. Assume the branch history looks like A -> B A is the current `viable/strict` commit B is a commit that failed some `periodic` test, so `viable/strict` wasn't upgraded to B Now lets say there's a commit C that gets merged. C neither contains a fix for the failing periodic build, nor does a scheduled periodic workflow run against C. The branch becomes A -> B -> C In the above scenario, today we will promote `viable/strict` to C since there was no failing workflow there!!! Even though it didn't actually fix what was broken with B! In short, avoiding the upgrade to B really doesn't make any sense today and we shouldn't do it. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86827 Approved by: https://github.com/janeyx99 commit 66cab5245fbae639d7bc528d22eafe97c03bb935 Author: albanD Date: Wed Oct 12 11:24:51 2022 -0400 Reland 2 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86797) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86797 Approved by: https://github.com/bdhirsh commit 894c4218dd9a656d03b21a6f560b51ac432ae529 Author: Eli Uriegas Date: Wed Oct 12 13:36:56 2022 -0700 ci: Just use regular checkout (#86824) checkout-pytorch seems to have issues and is purpose made for our PR testing and appears to conflict with what we're trying to do for binary builds. For builds like https://github.com/pytorch/pytorch/actions/runs/3207520052/jobs/5242479607 there is a confusion over where the reference is pulled and I believe it is root caused by the checkout logic in checkout-pytorch. So with that in mind I suggest we just use the upstream checkout action for this job Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/86824 Approved by: https://github.com/atalman commit aacb9f3ac63d9a31d064c76ff3d328037355b28e Author: Emilio Castillo Date: Wed Oct 12 23:16:29 2022 +0000 Make `Adadelta`,`Adagrad` & `Adamax` differentiable (#86096) Continuing the differentiable optimizers support Pull Request resolved: https://github.com/pytorch/pytorch/pull/86096 Approved by: https://github.com/janeyx99 commit e552cf105058e6d7ea367d31ff3d3c0a31ea0bbd Author: Shawn Zhong Date: Wed Oct 12 22:31:48 2022 +0000 [DOC] Use type hints to show annotation in the docs (#79086) Fixes #44964 Use type hints in the code to show type annotations in the parameters section of the docs. For the parameters already documented in the docstring, but lack the type annotation, the type hints from the code are used: | [Before](https://pytorch.org/docs/master/generated/torch.nn.AdaptiveMaxPool1d.html) | [After](https://docs-preview.pytorch.org/79086/generated/torch.nn.AdaptiveMaxPool1d.html) | | --- | --- | | image | image | | [Before](https://pytorch.org/docs/master/generated/torch.nn.Linear.html) | [After](https://docs-preview.pytorch.org/79086/generated/torch.nn.Linear.html) | | --- | --- | | image | image | Ref: - PR https://github.com/pytorch/pytorch/pull/49294 removed type annotations from signatures in HTML docs. 
- Sphinx version was bumped to 5.0.0 in PR #70309 - Duplicated (closed) issues: #78311 and #77501 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79086 Approved by: https://github.com/malfet commit a77f2a95a77cc2c4af9c1fa4144dfe97bab2f3ed Author: Mikayla Gawarecki Date: Wed Oct 12 18:20:35 2022 +0000 Improve NestedTensor documentation (#85186) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85186 Approved by: https://github.com/cpuhrsch commit be81f3d8d4c6e974fa1644ace20bc1e75e168c90 Author: Huy Do Date: Wed Oct 12 21:17:25 2022 +0000 Revert distributed test parallelization (#86756) Revert an old commit and resolve some conflicts Fixes https://github.com/pytorch/pytorch/issues/86418 Fixes https://github.com/pytorch/pytorch/issues/86419 Fixes https://github.com/pytorch/pytorch/issues/86415 Fixes https://github.com/pytorch/pytorch/issues/86420 Fixes https://github.com/pytorch/pytorch/issues/86416 Fixes https://github.com/pytorch/pytorch/issues/86392 Fixes https://github.com/pytorch/pytorch/issues/86391 Fixes https://github.com/pytorch/pytorch/issues/86397 Fixes https://github.com/pytorch/pytorch/issues/86390 Fixes https://github.com/pytorch/pytorch/issues/86398 Fixes https://github.com/pytorch/pytorch/issues/86396 Fixes https://github.com/pytorch/pytorch/issues/86395 Fixes https://github.com/pytorch/pytorch/issues/86393 Fixes https://github.com/pytorch/pytorch/issues/86394 Fixes https://github.com/pytorch/pytorch/issues/86440 Fixes https://github.com/pytorch/pytorch/issues/86442 Fixes https://github.com/pytorch/pytorch/issues/86439 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86756 Approved by: https://github.com/mrshenli commit 09a676f639b422baf947768c47116b944470e411 Author: Antonio Kim Date: Wed Oct 12 20:57:19 2022 +0000 Add hooks for register_buffer/module/parameter (#86148) As described in the issue, this PR adds hooks to be run when `register_parameter`, `register_buffer` and `register_module` are called. Fixes #85837 cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are Pull Request resolved: https://github.com/pytorch/pytorch/pull/86148 Approved by: https://github.com/albanD commit c08cbfccd9e2a9b2e6006773d04aafa74977684f Author: Zain Rizvi Date: Wed Oct 12 20:43:42 2022 +0000 Let retried jobs advance viable/strict (#86821) Today, even if we retry a failed workflow it succeeds on the retry, viable/strict doesn't advance forward. Success on retry is proof that the error wasn't with the current commit and that we should in fact promote viable/strict. This PR points to an updated rockset query which will only look at the success status of the most recent job in each workflow Here's the query edited: Original query: https://console.rockset.com/lambdas/details/commons.commit_jobs_batch_query/versions/15aba20837ae9d75?tab=sql Updated query: https://console.rockset.com/lambdas/details/commons.commit_jobs_batch_query/versions/8003fdfd18b64696?tab=sql Testing: Tested the old and new query against commits known to have succeeded on retry Pull Request resolved: https://github.com/pytorch/pytorch/pull/86821 Approved by: https://github.com/huydhn, https://github.com/malfet commit 3b26680222998778f48e0a1939bdafab6db53c7c Author: vfdev Date: Wed Oct 12 20:33:14 2022 +0000 Update _torch_docs / ldexp (#86721) Fixes a typo on ldexp docstring. 
https://pytorch.org/docs/master/generated/torch.ldexp.html?highlight=ldexp#torch.ldexp image https://livesphinx.herokuapp.com/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/86721 Approved by: https://github.com/samdow commit 363b108e39c714521836b9855062022c98d6dba8 Author: Jerry Zhang Date: Tue Oct 11 17:23:55 2022 -0700 [quant][fx] Fix weight_dtype and bias_dtype backend_config checks (#86719) Summary: This PR adds checks for the existence of "weight_dtype" and "bias_dtype" in the node_name_to_dtype dictionary before accessing it, the corner case is hit when we check the compatibility of qconfig and backend_config for weight and bias that appears before activation (e.g. torch.addmm) Test Plan: python test/test_quantization.py -k test_backend_config_check_for_weight_and_bias Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86719 Approved by: https://github.com/andrewor14 commit d6bfbdf50c48ebc3a909a47c416ba0a73ee6174d Author: HDCharles Date: Wed Oct 12 10:04:05 2022 -0700 [ao] fixing public v private for fx.pattern_utils.py (#86033) Summary: added __all__, one issue with QuantizeHandler is that since its defined as 'Any' it can't be set as a public module although it should be, i've set it to private here but when the circular dependency gets fixed, it will probably be removed. Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86033 Approved by: https://github.com/jerryzh168 commit bf0116d1f0c5ec58308a0af4e8f4212a78db649f Author: HDCharles Date: Wed Oct 12 10:04:04 2022 -0700 [ao] fixing public v private for fx.graph_module.py (#86032) Summary: no significant changes, just added __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86032 Approved by: https://github.com/jerryzh168 commit 25476f2e4b8bc1ddc0d6b6a7d71d7626fa5eb76e Author: HDCharles Date: Wed Oct 12 10:04:04 2022 -0700 [ao] fixing public v private for quantization_types (#86031) Summary: the main problem with this was that the different objects defined simply as 'Any' should theoretically be public but making them public either A) results in an error about the module being 'typing' rather than whatever module it should be or B) you set the module manually, thereby changing the module for the original 'Any' class. note: QuantizeHandler has a similar issue where its simply defined as 'Any' Pattern was defined in multiple places which was causing issues so i just moved it to a single place given the note at the top of quantization_types.py indicating these definitions should be moved to utils at some point anyway. Finally i changed any references to these objects to point at the correct locations. Note: i didn't see any fb internal references to NodePattern or QuantizerCls that would cause issues. 
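The pattern behind these "[ao] fixing public v private" commits is to declare each module's intended public surface with `__all__` so that `test/test_public_bindings.py` can enforce it; a generic sketch with made-up names:

```python
# Made-up module contents, only to illustrate the __all__ convention.
__all__ = [
    "get_default_patterns",
    "PatternHandler",
]

class PatternHandler:
    pass

def get_default_patterns():
    return {}

def _merge_pattern_dicts(a, b):  # leading underscore: private helper,
    return {**a, **b}            # intentionally left out of __all__
```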
Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86031 Approved by: https://github.com/jerryzh168 commit ef58a132f223d5abf2bd3f8bee380aca6c29d17f Author: Christian Puhrsch Date: Wed Oct 12 20:03:25 2022 +0000 Use CUTLASS GEMM for NT bmm [OSS-only] (#85894) OSS-only copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg commit 73c43ce2e2a074f4a1e688d4f8b2ebacd9256476 Author: Peter Bell Date: Mon Oct 10 15:58:26 2022 +0100 Display unexpected exceptions raised from test_dtypes (#86599) Currently `test_dtypes` swallows all exceptions which can make debugging failures more tricky. This changes the test to save the exceptions and print only the unexpected ones at the end e.g. ``` AssertionError: The supported dtypes for nn.functional._scaled_dot_product_attention on device type cuda are incorrect! The following dtypes did not work in backward but are listed by the OpInfo: {torch.bfloat16}. Unexpected failures raised the following errors: torch.bfloat16 - CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling [...] ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86599 Approved by: https://github.com/mruberry commit 6be9d9a630993f0a64c16d82d9605b8e4a5ad603 Author: Amadeusz Skrzypczak Date: Wed Oct 12 19:37:13 2022 +0000 Add AutocastHPU support (#84927) New dispatch key and necessary functions are added to PyTorch. Backend implementation will be added in the external library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84927 Approved by: https://github.com/bdhirsh commit 553eaaba7c173cbd4507f2929d39d4b61c246bf6 Author: Richard Zou Date: Wed Oct 12 19:27:17 2022 +0000 Disable tf32 in functorch transform tests (#86799) This PR applies a large hammer and disables TF32 in specific functorch transform tests. TF32 isn't precise enough to test correctness. We could have applied a smaller hammer by disabling TF32 per-OpInfo, but that doesn't seem to have too much additional benefit (e.g. if a convolution batching rule is correct on fp32 then I would expect it to be correct under TF32 modulo precision issues because the actual sequence of PyTorch operators we invoke has not changed, only the backend did). Test Plan: - I tested this locally on a machine with A100 GPUs. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86799 Approved by: https://github.com/malfet commit d56017a14f34b5130fa70c0cba010e3d2506deb0 Author: Nikita Karetnikov Date: Wed Oct 12 11:20:04 2022 +0200 [primTorch] Add ref for `triplet_margin_loss`, improve `triplet_margin_with_distance_loss` (#85614) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85614 Approved by: https://github.com/lezcano, https://github.com/mruberry commit ce56ee11fdf3843d507425bfa401e6ad5f4ee492 Author: Daniel Dale Date: Wed Oct 12 18:37:50 2022 +0000 Extend torch.cuda.is_available() to attempt an NVML-based CUDA availability assessment when explicitly requested by the user (#85951) Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024) First of all, thanks for your invaluable contributions to PyTorch everyone! 
Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage. I'm confident there are many current and future such use cases which could benefit from leveraging a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet https://github.com/pytorch/pytorch/commit/81da50a972fc402a6dd880fe392af0f0051cb6de !). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environmental variable based configuration option is more convenient and the cost to including the option is quite low. As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates new non-default NVML-based CUDA behavior with an environmental variable (PYTORCH_NVML_BASED_CUDA_CHK) that allows a user/framework to invoke non-default, NVML-based `is_available()` assessments if desired. Thanks again for your work everyone! @ngimel @malfet @awaelchli Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951 Approved by: https://github.com/ngimel commit cd7c86eaa46874993affc48d31f826625762c461 Author: Ivan Yashchuk Date: Wed Oct 12 18:21:58 2022 +0000 Add prims.clone (#86705) This simple PR adds `clone` as a primitive. Current implementation of `clone` is not supported with nvFuser executor because of `empty_like` + `copy_to`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86705 Approved by: https://github.com/mruberry commit 3356d0385fd0f3f0f6ce2d8c681a40fd110c7848 Author: Howard Huang Date: Wed Oct 12 15:40:07 2022 +0000 [BE] Store helper functions C++ for python API parity (#82136) Add helper functions for `store.set()`, `store.compare_set()` to accept string arguments instead of vector and refactored some usages internally Pull Request resolved: https://github.com/pytorch/pytorch/pull/82136 Approved by: https://github.com/rohan-varma commit cc7ea93c2cf4275faaae29db9006d8f6067b1c5a Author: BowenBao Date: Mon Oct 10 17:23:54 2022 -0700 [ONNX] Support device().type() string comparison with constant (#86168) Fixes #86168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86168 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock commit 58542eb25618eb6784567e7497f4764ab04d70ad Author: HDCharles Date: Tue Oct 11 17:40:37 2022 -0700 [ao] fixing public v private for backend_config.native.py (#86030) Summary: no significant changes, just added some things to __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86030 Approved by: https://github.com/jerryzh168 commit 409efebab8718a7bfc714ab3787e5a8689289697 Author: Vladimír Aubrecht Date: Wed Oct 12 15:44:28 2022 +0000 Added define to fix issue with compatibility with latest Windows SDK (#85408) Fixes #83820. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85408 Approved by: https://github.com/ezyang commit f24d174fffaf6efbc0c95ed561ab839ca496f6a7 Author: Sheil Kumar Date: Wed Oct 12 15:26:29 2022 +0000 Allow PrivateUse1 backends to not have Storage (#86557) Allow PrivateUse1 backends to not have Storage To unblock the DirectML backend, this change would be needed for 1.13 as well. 
The DirectML backend creates tensors using the open registration pattern documented here: https://pytorch.org/tutorials/advanced/extend_dispatcher.html (registration example: https://github.com/bdhirsh/pytorch_open_registration_example) However, DirectML tensors are opaque, and do not have Storage. The DirectML Tensor Impl derives from OpaqueTensorImpl, which does not have a storage. Because of this various places in the code fail that expect storage to be present. We had made various changes in-tree to accommodate this: a. def __deepcopy__(self, memo): https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/torch/_tensor.py#L119 or self.device.type in ["lazy", "xla", "mps", "ort", "meta", "hpu", 'dml'] b. def _reduce_ex_internal(self, proto): https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/torch/_tensor.py#L275 if self.device.type in ["xla", "ort", "hpu", "dml"]: c. TensorIteratorBase::build has an unsupported list for tensors without storage. https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/aten/src/ATen/TensorIterator.cpp#L1497 Using the PrivateUse1 backend, similar exemptions need to be made in order to relax requirements on Storage so that the DirectML backend tensors can work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86557 Approved by: https://github.com/bdhirsh, https://github.com/martinb35 commit 61a5898675d2b18bea1009305ce1b1f7042b7d64 Author: Philip Meier Date: Wed Oct 12 13:03:46 2022 +0000 use cff standard for citation information (#86200) GH picks up on our `CITATION` file in the root of the repository.
![Screenshot from 2022-10-04 11-34-54](https://user-images.githubusercontent.com/6849766/193811617-b71ef606-a043-498b-bb2d-14b6c05e79e7.png) However, [the preferred way](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files) is use a `CITATION.cff` file instead since GH supports the [citation file format (CFF) standard](https://github.com/citation-file-format/citation-file-format). With this PR, the prompt changes to ![Screenshot from 2022-10-04 13-48-21](https://user-images.githubusercontent.com/6849766/193812010-026bfad7-7c4e-4b59-a90a-1d3ad47303d0.png) with the following auto-generated bibtex entry: ```bibtex @inproceedings{Paszke_PyTorch_An_Imperative_2019, author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith}, booktitle = {Advances in Neural Information Processing Systems 32}, pages = {8024--8035}, publisher = {Curran Associates, Inc.}, title = {{PyTorch: An Imperative Style, High-Performance Deep Learning Library}}, url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}, year = {2019} } ``` Comparing with what we currently have the only significant difference is that the editors are no longer listed although the metadata is there. This is an issue with GH's automatic conversion and might be fixed in the future. Plus, the cite key was changed from `NEURIPS2019_9015` to `Paszke_PyTorch_An_Imperative_2019`, but this has no effect on the rendered result. Do we also want to adopt the CFF standard? Pull Request resolved: https://github.com/pytorch/pytorch/pull/86200 Approved by: https://github.com/dagitses commit 493ded249ecaba1d76459901600d2dc7439a9f43 Author: Fabio Rocha Date: Wed Oct 12 09:33:06 2022 +0000 [primTorch] decomposition for bucketize (#86366) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86366 Approved by: https://github.com/mruberry commit f903f1ab343fea72177f29fc8d453febcaad8905 Author: jjsjann123 Date: Wed Oct 12 07:50:46 2022 +0000 Patching getitem in partitioner (#86713) 1. rejecting getitem operator in backends fusion query getitem is merged in a special post partition pass, backends that takes getitem shouldn't affect the logic 2. added test for failing cases Fixes #86698 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86713 Approved by: https://github.com/SherlockNoMad commit 2344135179642df5d383d2e91880600f774cbdef Author: Khushi Date: Wed Oct 12 07:00:40 2022 +0000 [primTorch] special: entr, expit (#86592) Add _refs for `entr` & `expit`. cc @mruberry @kshitij12345! 
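As a reference for what these refs mirror, a minimal sketch (mine, not part of the PR) using the existing eager ops in `torch.special`:

```python
import torch

x = torch.tensor([0.25, 0.5, 0.75])
print(torch.special.entr(x))   # elementwise -x * log(x)
print(torch.special.expit(x))  # logistic sigmoid, 1 / (1 + exp(-x))
```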
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86592 Approved by: https://github.com/mruberry commit a47f93b6c97a39bb8934fc531145d8cdac5cf8f6 Author: Sherlock Huang Date: Wed Oct 12 02:26:02 2022 +0000 Add type and shape annotation for gm.print_readable() (#86562) For ``` def f(a, b): dim0 = a.shape[0] + b.shape[0] dim1 = a.shape[1] + b.shape[1] d = a.new_empty(dim0, dim1) return d fx_g = make_fx(f, tracing_mode="symbolic")(torch.randn(5, 3), torch.randn(4, 3)) fx_g.print_readable() ``` Tracing with 'real' and 'fake' mode yields ``` class f(torch.nn.Module): def forward(self, a_1: Tensor[5, 3], b_1: Tensor[4, 3]): new_empty: Tensor[9, 6] = torch.ops.aten.new_empty.default(a_1, [9, 6], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False); a_1 = None return new_empty ``` Tracing with 'symbolic' mode yields ``` def forward(self, a_1: Tensor[t0.size(0), t0.size(1)], b_1: Tensor[t1.size(0), t0.size(1)]): sym_size: Symint(t0.size(0)) = torch.ops.aten.sym_size(a_1, 0) sym_size_1: Symint(t1.size(0)) = torch.ops.aten.sym_size(b_1, 0) add: Symint(t0.size(0) + t1.size(0)) = sym_size + sym_size_1; sym_size = sym_size_1 = None sym_size_2: Symint(t0.size(1)) = torch.ops.aten.sym_size(a_1, 1) sym_size_3: Symint(t0.size(1)) = torch.ops.aten.sym_size(b_1, 1); b_1 = None add_1: Symint(2*t0.size(1)) = sym_size_2 + sym_size_3; sym_size_2 = sym_size_3 = None new_empty: Tensor[t0.size(0) + t1.size(0), 2*t0.size(1)] = torch.ops.aten.new_empty.default(a_1, [add, add_1], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False); a_1 = add = add_1 = None return new_empty ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86562 Approved by: https://github.com/Chillee commit e0d6898cbd9e7af8ecb1e911e4a8c29e79a78921 Author: PyTorch MergeBot Date: Wed Oct 12 04:12:43 2022 +0000 Revert "Backport currently dont work with some models if: (#86510)" This reverts commit 4bfb7341819b3bfcaf65ddc136f25d23983740a7. Reverted https://github.com/pytorch/pytorch/pull/86510 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 25725fd62448165b91647304c26d676db22b6955 Author: Eddie Yan Date: Wed Oct 12 03:44:21 2022 +0000 (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) Rebased version of @mcarilli 's cudaMallocAsync #65365 for continued testing Pull Request resolved: https://github.com/pytorch/pytorch/pull/82682 Approved by: https://github.com/ngimel commit a216f4700cbd3d126b4677bcf30f2082da0163ea Author: Nikita Shulga Date: Wed Oct 12 01:45:21 2022 +0000 Add testing on A10G GPU to periodic workflow (#85524) This enables testing on lots of modern CUDA features on sm_86 capable GPU While migrating to that platform, discovered that `functorch` tests for `nn.functional.conv.transpose3d` produce garbage on sm_80+ as well as 2 `nvfuser` tests unexpectedly pass and one unexpectedly fails. 
TODO: - Investigate unexpected success for `test_vmapvjp_linalg_householder_product_cuda_float32` and add `functorch` shard Pull Request resolved: https://github.com/pytorch/pytorch/pull/85524 Approved by: https://github.com/ngimel commit c4f0b93f8653505584bbd71162f82d4e7633da0c Author: Elias Ellison Date: Tue Oct 11 01:24:48 2022 +0000 Disable autocast in aot autograd (#86515) Fix for https://github.com/pytorch/torchdynamo/issues/1368 From comment: > When we invoke a Composite Implicit autograd operator that has an autocast rule, such as Einsum, autocast is disabled during its invocation. When we trace out the operators in an implicit op, re-applying on autocast rules on those operators might yield divergence from what was executed at runtime. This pass checks for divergence. If divergence is found, we will disable autocast. We would like to avoid disabling autocast if possible because accessing TLS is slow. Concretely, the problem found was when invoked `sum` in `einsum`: As seen by the following divergence: ``` >>> with torch.cuda.amp.autocast(enabled=True): ... print(torch.ops.aten.sum.dim_IntList(torch.rand([2, 2, 2], device="cuda", dtype=torch.half), [1, 2]).dtype) ... torch.float32 >>> print(torch.ops.aten.sum.dim_IntList(torch.rand([2, 2, 2], device="cuda", dtype=torch.half), [1, 2]).dtype) torch.float16 ``` Edit: we've decided to accept the overhead of universally disabling autocast instead Pull Request resolved: https://github.com/pytorch/pytorch/pull/86515 Approved by: https://github.com/bdhirsh, https://github.com/Chillee commit d598290baab45b52b9b78d3083ac215f4251943c Author: Christian Puhrsch Date: Wed Oct 12 01:27:57 2022 +0000 Basic SDP benchmark harness (#86729) Basic benchmark for reference and discussion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86729 Approved by: https://github.com/drisspg commit 4bfb7341819b3bfcaf65ddc136f25d23983740a7 Author: Han Qi (qihqi) Date: Wed Oct 12 00:39:25 2022 +0000 Backport currently dont work with some models if: (#86510) Backport currently dont work with some models if: * model is originally exported with interface call enabled (backport would disable it) * model is flatbuffer (flatbuffer support is soft enabled via link time registry), so we manually trigger it Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86510 Approved by: https://github.com/cccclai commit ce48df9e938ac208cf018545517344c6a6debab2 Author: Bin Bao Date: Tue Oct 11 20:31:12 2022 +0000 Re-enable torchdynamo unit tests (#86658) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86658 Approved by: https://github.com/jansel commit 692b525b71658caf45fd8d70dd3f285b6eb6b821 Author: Nikita Shulga Date: Wed Oct 12 00:32:53 2022 +0000 [MPS] Extend unary ops to int64 (#86615) Most of them are already supported for `int64` except for: - rounding operations (`floor`, `ceil` and `round`), which are no-ops for integral types anyway - sign operation, when it can be emulated by clamping it tensor to [-1, 1] range Test new types by test MPS Fixes https://github.com/pytorch/pytorch/issues/86319 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86615 Approved by: https://github.com/DenisVieriu97, https://github.com/huydhn commit f912b5854466754b49aad5f9fc3f3470093dd192 Author: PyTorch MergeBot Date: Tue Oct 11 23:53:12 2022 +0000 Revert "Enable max.unary_out (#85926)" This reverts commit 16a0fa1204edb118800261a26281e624988eb239. 
Reverted https://github.com/pytorch/pytorch/pull/85926 on behalf of https://github.com/osalpekar due to The internal diff for this commit shows a number of pytorch quantization test failures. Here is a sample output: AssertionError: Tensor-likes are not close! Mismatched elements: 319 / 320 (99.7%). Greatest absolute difference: 0.056652069091796875 at index (0, 0, 4, 5) (up to 1e-05 allowed). Link to the diff: [D40232598](https://www.internalfb.com/diff/D40232598). Link to the Sandcastle job that is failing: https://www.internalfb.com/intern/sandcastle/job/18014399302908587/ commit 2aa981ab74df71c8d019f12032ce75910601b52c Author: PyTorch MergeBot Date: Tue Oct 11 23:39:50 2022 +0000 Revert "Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488)" This reverts commit 978b46d7c96627e3b3553ad70ad21cb161d05f90. Reverted https://github.com/pytorch/pytorch/pull/86488 on behalf of https://github.com/osalpekar due to Broke executorch builds internally with the following message: RuntimeError: Missing out variant for functional op: aten::split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] . Make sure you have loaded your custom_ops_generated_lib commit 9eb4f9dd175b3d73b1c9b7c1d00dad406db60e5e Author: Nikita Shulga Date: Tue Oct 11 19:49:23 2022 +0000 Tweak test tolerances to be compatible with A10G (#86538) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86538 Approved by: https://github.com/ngimel commit 7fa601b1a738382a8730f9f011b0a5c39247af6a Author: Nikita Shulga Date: Tue Oct 11 23:27:30 2022 +0000 Skip chalf.mean in test_reductions_large_half_tensors (#86747) As `mean_reduce` is not implemented for complex half Fixes https://github.com/pytorch/pytorch/issues/86743 and unblock A10G testing Pull Request resolved: https://github.com/pytorch/pytorch/pull/86747 Approved by: https://github.com/ngimel commit 811b8e012b3ddcb84adb2e483089758e84b6a995 Author: PyTorch MergeBot Date: Tue Oct 11 23:12:40 2022 +0000 Revert "min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643)" This reverts commit 86f914e9966e91b3d3e7c1504f5b1f00a9498d88. Reverted https://github.com/pytorch/pytorch/pull/86643 on behalf of https://github.com/osalpekar due to Need to revert this to cleanly revert https://github.com/pytorch/pytorch/pull/86488. This should be safe to re-land later commit f1fdb6efbd09dad3c308b0447682f1f14d2c325e Author: Jason Ansel Date: Tue Oct 11 23:01:21 2022 +0000 Manual changes for moving dynamo to core (#86621) This is the subset of the changes in #86461 not auto-generated by `copy_to_core.sh`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86621 Approved by: https://github.com/albanD commit 09364f4298e45142bf3a3a6a316447d24abb2fdf Author: Nikita Shulga Date: Tue Oct 11 22:39:58 2022 +0000 Compile C10 with `Wshadow` (#86666) This should prevent further regressions like https://github.com/pytorch/pytorch/pull/86646 Update `fmt` to `7.1.0` to fix variable shadowing in that library Pull Request resolved: https://github.com/pytorch/pytorch/pull/86666 Approved by: https://github.com/seemethere commit 0337f0ad473ffc298a30e603050d2df9d0073428 Author: Zain Rizvi Date: Tue Oct 11 21:56:01 2022 +0000 Add error checking to flaky test bot platform parser (#86632) If an invalid platform is specified when disabling a test with flaky test bot, the CI crashes, skipping all tests that come after it. This turns it into a console message instead. 
Not erroring out here since it'll affect random PRs. Actual error message should go into the bot that parses the original issue so that it can respond on that issue directly Pull Request resolved: https://github.com/pytorch/pytorch/pull/86632 Approved by: https://github.com/huydhn commit 42bd275233259d6a4b4d071c14355d4ec45b3ec2 Author: Partho Date: Tue Oct 11 21:41:48 2022 +0000 [doc] LR scheduler example fix (#86629) Fixes issue #86208 As suggested in the issue, updated the LR scheduler example to use a regular nn.Module like the other examples on the same page. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86629 Approved by: https://github.com/soulitzer commit 32152ce328230de27c9e3d3c1cfdc97c9ad1738a Author: jimku9 Date: Tue Oct 11 21:21:53 2022 +0000 Add original sources/references to Wishart.py in distributions (#86543) @fritzo As discussed, add original sources/references to Wishart.py in distributions and corrected typos in the error messages. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86543 Approved by: https://github.com/fritzo commit 50af1ace5e4b029cef48b851dfc9e9ebbc1bb2b5 Author: Sherlock Huang Date: Tue Oct 11 17:56:59 2022 +0000 Mark aten ops as canonical (#86215) This is the first batch of canonical aten ops. 87 in total. More to come in the future PRs. native_dropout abs add.Tensor add.Scalar arange.start_step bitwise_not bmm cat clamp constant_pad_nd convolution convolution_backward div.Tensor div.Scalar embedding_dense_backward erf exp expand fill.Scalar grid_sampler_2d native_group_norm native_group_norm_backward native_layer_norm native_layer_norm_backward log _log_softmax max.dim amax mean.dim min.dim amin mm mul.Tensor mul.Scalar native_batch_norm permute scalar_tensor reciprocal neg repeat relu gelu rsqrt sigmoid slice.Tensor slice_scatter _softmax squeeze.dim sum.dim_IntList sqrt tanh unsqueeze var.dim where.self clone sub.Tensor sub.Scalar addmm _to_copy view scatter_add bitwise_and.Tensor bitwise_or.Tensor eq.Scalar ge.Scalar le.Scalar gt.Scalar lt.Scalar index_select nonzero gather maximum minimum pow.Tensor_Scalar hardtanh leaky_relu _adaptive_avg_pool2d _adaptive_avg_pool2d_backward avg_pool2d avg_pool2d_backward max_pool2d_with_indices max_pool2d_with_indices_backward upsample_bilinear2d.vec upsample_bilinear2d_backward.vec upsample_nearest2d.vec upsample_nearest2d_backward.vec col2im Pull Request resolved: https://github.com/pytorch/pytorch/pull/86215 Approved by: https://github.com/suo, https://github.com/anjali411 commit 8db30255c36fc7a93d8d5285415d7ab96911e1df Author: Jeff Daily Date: Tue Oct 11 20:55:58 2022 +0000 [ROCm] set nvfuser default to disabled, keep CI (#86369) Bug fix. nvfuser is functional for ROCm on gfx906, but some tests are failing for other gfx targets. Disable nvfuser until all features are verified. Users may still opt-in by setting the known env var PYTORCH_JIT_ENABLE_NVFUSER=1. This PR sets this env var for the github actions workflow for ROCm since all current CI hosts are gfx906. 
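A minimal sketch of the opt-in mentioned above (my own example, assuming the variable only needs to be present in the environment before TorchScript models are run; `export PYTORCH_JIT_ENABLE_NVFUSER=1` in the shell is equivalent):

```python
import os

# Opt back into nvfuser on ROCm builds where it now defaults to off.
# Set it before importing torch so the setting is picked up (assumption).
os.environ["PYTORCH_JIT_ENABLE_NVFUSER"] = "1"

import torch
```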
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86369 Approved by: https://github.com/huydhn commit 5ffe24fca43b05216d55fa73a7d2629248315a30 Author: Stephen Jia Date: Tue Oct 11 20:16:56 2022 +0000 [vulkan][ez] fix always printing out a warning when retrieving the global context (#86697) Summary: D40151818 (https://github.com/pytorch/pytorch/commit/82ed5ca3401e965067fd03a6bac57978f884f715) replaces the `TORCH_CHECK` with a `TORCH_WARN` but since it does not check if the context is valid the message gets printed every time. This diff fixes that. Test Plan: Referring to [Pytorch Vulkan Testing Procedures](https://fb.quip.com/fZALAc9zhlcU) On Mac: 1. `vulkan_api_test` on Mac 2. model comparison binary on Mac On Android: 1. `vulkan_api_test` on Android 2. benchmark binary on Android Reviewed By: salilsdesai Differential Revision: D40266820 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86697 Approved by: https://github.com/kirklandsign commit f32aeeae00015ed484f8bfea2e24018de0dae277 Author: Han Qi (qihqi) Date: Tue Oct 11 20:07:58 2022 +0000 Set interface_call to true be default (#86668) Summary: ASR models need it Test Plan: existing unit tests Reviewed By: cccclai Differential Revision: D40251788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86668 Approved by: https://github.com/cccclai commit 7f02f2ac0cc8e9db2137d299769b456afe27fa45 Author: Huy Do Date: Tue Oct 11 19:34:44 2022 +0000 [Experimentation] Add TSAN build and test (#85313) Some parts of the PR are adopted from the previously abandoned https://github.com/pytorch/pytorch/pull/36694. This PR is the first part to setup TSAN jobs in the CI. The data race warnings from TSAN will need to be reviewed later in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85313 Approved by: https://github.com/osalpekar commit 92562046e9d6ef32b14e17b2b06433cfe7990912 Author: 胡玮文 Date: Tue Oct 11 19:03:43 2022 +0000 Optimize __dlpack_device__ performance (#86665) This can be critical when processing a large number of tensors ```bash python -m timeit --setup 'import torch; t = torch.empty(1000, device="cuda")' 't.__dlpack_device__()' ``` based on 1.12.1: before: 100000 loops, best of 5: 2.32 usec per loop after: 500000 loops, best of 5: 844 nsec per loop Pull Request resolved: https://github.com/pytorch/pytorch/pull/86665 Approved by: https://github.com/SunDoge, https://github.com/soulitzer commit c12f829cce29eb6971094a9bbb0f8971aed86f5c Author: Jerry Zhang Date: Tue Oct 11 18:49:09 2022 +0000 [nn] Add remove_duplicate flag to named_buffers (#674) (#85903) Summary: X-link: https://github.com/pytorch/torchrec/pull/674 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84984 this is to allow named_buffers to return the same buffer objects with different names multiple times, needed by internal use cases ghstack-source-id: 168589597 Test Plan: python test/test_nn.py -k test_buffers_and_named_buffers Imported from OSS Reviewed By: albanD Differential Revision: D39493161 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85903 Approved by: https://github.com/albanD commit 693250ac859fbb15ea4b4426d9cefaf97e151eb7 Author: David Date: Tue Oct 11 18:05:53 2022 +0000 Docs: fx.Node docs incorrectly state that the self argument is included in args for module calls (#86685) It seems like the [torch.fx.Node docs](https://pytorch.org/docs/stable/fx.html#torch.fx.Node) are incorrect regarding the inclusion of the self argument for module call nodes. 
While the docs state that self (the module) is included in `args`, it is in fact not, as demonstrated by this code: ```python import torch from torch import fx, nn class Net(nn.Module): def __init__(self): super().__init__() self.submod = nn.Linear(10, 10) def forward(self, x): x = x.flatten() return self.submod(x) graph_module = fx.symbolic_trace(Net()) print(graph_module.graph) # doesn't show self for the submodule call submod_node = list(graph_module.graph.nodes)[2] print(submod_node.op) # call_module print(submod_node.args) # (flatten,) => would need to have len 2 if self was included flatten_node = list(graph_module.graph.nodes)[1] print(flatten_node.op) # call_method print(flatten_node.args) # (x,) => here self is included (and docs are correct) ``` Since [torch.fx.Interpreter also uses `args` as if self was is not included](https://github.com/pytorch/pytorch/blob/2fe580859012d2d24a54e452195ccbc7f3191036/torch/fx/interpreter.py#L288), I assume the docs are incorrect. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86685 Approved by: https://github.com/soulitzer commit 160118d72a5c8e425ba30495e672d33ee1c94b50 Author: Fang Wang Date: Tue Oct 11 17:52:18 2022 +0000 Add test case for matrix multiply-add with large inputs (#85550) Summary: - Added test case for addmm, baddbmm and linear with large inputs - Testing with torch types: float32, float16, bfloat16 Test Plan: Run unit tests with: `buck2 run mode/opt //caffe2/test:linalg_re_cuda` ``` ... test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_1000_10000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_1000_1000_1000_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_float16 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_2_100_100_100_cpu_float32 (test_linalg_re_cuda.TestLinalgReCudaCPU) ... skipped 'Only runs on cuda' test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_10000_10000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... 
ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_1_10000_1000_10000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_1000_1000_1000_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_bfloat16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_float16 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok test_addmm_baddbmm_large_input_2_100_100_100_cuda_float32 (test_linalg_re_cuda.TestLinalgReCudaCUDA) ... ok ---------------------------------------------------------------------- Ran 24 tests in 63.224s OK (skipped=12) ``` Differential Revision: D39718256 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85550 Approved by: https://github.com/IvanYashchuk, https://github.com/malfet commit 212fa874ce0bb8c2d70be2bcd87188c072d1082d Author: vfdev Date: Tue Oct 11 17:52:16 2022 +0000 Fix torch histogramdd docstring (#86593) Fixed torch histogramdd docsting with missing common_args Pull Request resolved: https://github.com/pytorch/pytorch/pull/86593 Approved by: https://github.com/soulitzer commit f26292d91e1bf358d4f4902688433248c931ca68 Author: Jane Xu Date: Tue Oct 11 17:42:51 2022 +0000 [BE] Fix python docs typos up till torch.chunk (#86642) Was doing the Views lab linked https://github.com/pytorch/pytorch/wiki/Tensor-and-Operator-Basics and noticed a few typos, which led to this PR. Test plan: verified in preview Pull Request resolved: https://github.com/pytorch/pytorch/pull/86642 Approved by: https://github.com/soulitzer commit 86f914e9966e91b3d3e7c1504f5b1f00a9498d88 Author: albanD Date: Tue Oct 11 10:35:18 2022 -0400 min/max support for SymInt/Floats, finish as_strided/scatter/squeeze() backward symint support (#86643) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86643 Approved by: https://github.com/anjali411 commit 6923dc3b590e51773ee9e0a536b0863963b91232 Author: Jane Xu Date: Tue Oct 11 17:23:36 2022 +0000 Add module: decompositions as an owner to test_decomp.py (#86703) so flaky tests can be attributed to @SherlockNoMad too 😛 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86703 Approved by: https://github.com/albanD commit 109f4d445382df93ed3afc592fd64719c0b86c01 Author: Richard Zou Date: Tue Oct 11 07:28:20 2022 -0700 Move functorch tests from functorch/test/* to test/functorch/* (#86623) This is the first step described in https://github.com/pytorch/pytorch/issues/86618 . test/functorch/* is the final location for these tests. Test Plan: - Check that the functorch shards in CI are still running tests. 
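Side note on the `torch.histogramdd` docstring fix (#86593) a few commits up: a quick usage sketch (mine, not from that PR):

```python
import torch

points = torch.randn(100, 3)                    # 100 points in 3-D
hist, bin_edges = torch.histogramdd(points, bins=[4, 4, 4])
print(hist.shape)      # torch.Size([4, 4, 4])
print(len(bin_edges))  # one edge tensor per dimension -> 3
```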
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86623 Approved by: https://github.com/huydhn commit 51ea4418621e0236e7ec1ebf3606317ad1430548 Author: Ivan Yashchuk Date: Tue Oct 11 16:39:57 2022 +0000 Upcast to fp32 in test_addmm_block ref_half_bfloat16 (#86682) Fixes https://github.com/pytorch/pytorch/issues/86681 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86682 Approved by: https://github.com/nikitaved commit 3edf79dc03193c98b665d62231fe69a10dfab1fa Author: PyTorch MergeBot Date: Tue Oct 11 16:33:41 2022 +0000 Revert "Add meta support for _adaptive_avg_pool2d_backward (#86359)" This reverts commit a56a8c0fc0251bb4cd24b366a290db2e4beea747. Reverted https://github.com/pytorch/pytorch/pull/86359 on behalf of https://github.com/clee2000 due to causing unexpected success for functorch on master but PR is green (landrace?) https://github.com/pytorch/pytorch/actions/runs/3227306657/jobs/5282180524 https://hud.pytorch.org/pytorch/pytorch/commit/a56a8c0fc0251bb4cd24b366a290db2e4beea747 commit 97de281176d2476aec8cabfae9981c86f6179531 Author: Nicolas Hug Date: Thu Oct 6 11:32:29 2022 +0000 Improve interpolate() speed for channels_last CPU images and masks (#86361) This PR improves the speed of `interpolate()`: - on CPU - on images and masks (`num_channels < 4`, `channels_last=True`) - for the following modes: linear (antialias=False), nearest (int and float), and nearest-exact (int and float) - for both upsampling and downsampling The actual speed-up ranges from 1.1X to 110X, but this depends on various factors like number of threads and of course input_size/output_size. In a typical torchvision ImageNet training job (where num_threads=1 because of DataLoader multi-processing), the following speed-ups should be expected (I ran much more benchmarks than this one, see below for more details): ``` (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=1 1.0X 1.0ms vs 1.0ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 2.1X 1.0ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=1 7X 0.8ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=1 14X 0.852ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=1 9X 0.828ms vs 0.087ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 15X 0.922ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.897ms vs 0.087ms ``` An immediate follow-up to this PR would be to do the same changes for the 3D kernels. Thanks a ton @fmassa for the help! Results:
``` ---------------------------------------------------------------------------------------------------- (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=1 0.9X 0.9ms vs 1.1ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=1 1.6X 0.9ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=1 1.7X 1.0ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=1 8X 0.806ms vs 0.097ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=1 15X 0.848ms vs 0.056ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=1 10X 0.828ms vs 0.084ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=1 16X 0.914ms vs 0.057ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.900ms vs 0.086ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=2 1.6X 1.1ms vs 0.7ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=2 1.6X 0.6ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=2 1.7X 0.6ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=2 1.7X 0.5ms vs 0.3ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=2 9X 0.800ms vs 0.088ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=2 11X 0.459ms vs 0.043ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=2 7X 0.424ms vs 0.064ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=2 12X 0.503ms vs 0.043ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=2 8X 0.461ms vs 0.059ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=12 3X 1.1ms vs 0.3ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=12 1.6X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=12 1.5X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=12 5X 0.8ms vs 0.2ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=12 10X 0.445ms vs 0.047ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=12 7X 0.432ms vs 0.062ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=12 10X 0.478ms vs 0.046ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=12 7X 0.470ms vs 0.063ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=32 3X 1.1ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=32 1.8X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=32 1.4X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=32 11X 0.815ms vs 0.074ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.045ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=32 7X 0.436ms vs 0.061ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=32 10X 0.478ms vs 0.046ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.061ms ---------------------------------------------------------------------------------------------------- (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=1 0.9X 
0.9ms vs 1.1ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=1 1.5X 0.9ms vs 0.6ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=1 1.6X 1.0ms vs 0.6ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=1 8X 0.808ms vs 0.099ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=1 15X 0.848ms vs 0.058ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=1 9X 0.820ms vs 0.087ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=1 16X 0.909ms vs 0.059ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.898ms vs 0.088ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=2 1.4X 0.9ms vs 0.7ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=2 1.5X 0.5ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=2 1.5X 0.5ms vs 0.4ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=2 1.8X 0.5ms vs 0.3ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=2 9X 0.799ms vs 0.090ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=2 10X 0.459ms vs 0.045ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=2 7X 0.427ms vs 0.059ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=2 11X 0.501ms vs 0.044ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=2 8X 0.460ms vs 0.060ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=12 2.9X 1.0ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=12 1.2X 0.2ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=12 1.1X 0.2ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=12 12X 0.809ms vs 0.068ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=12 11X 0.438ms vs 0.041ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=12 8X 0.432ms vs 0.055ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=12 12X 0.480ms vs 0.041ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=12 8X 0.464ms vs 0.056ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=32 3X 1.1ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=32 1.3X 0.3ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=32 1.4X 0.3ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=32 11X 0.813ms vs 0.075ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.046ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=32 7X 0.433ms vs 0.061ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=32 10X 0.478ms vs 0.046ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.062ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=1 0.9X 4.5ms vs 5.2ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=1 1.5X 4.2ms 
vs 2.8ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=1 1.8X 4.1ms vs 2.3ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=1 1.6X 4.5ms vs 2.8ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=1 1.9X 4.4ms vs 2.3ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=1 9X 3.8ms vs 0.4ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=1 17X 4.0ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=1 11X 3.9ms vs 0.4ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=1 19X 4.4ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=1 12X 4.3ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=2 1.5X 4.5ms vs 3.1ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=2 1.4X 2.3ms vs 1.6ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=2 1.7X 2.1ms vs 1.2ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=2 1.6X 2.5ms vs 1.6ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=2 1.8X 2.2ms vs 1.2ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=2 15X 3.8ms vs 0.3ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=2 15X 2.2ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=2 7X 2.0ms vs 0.3ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=2 16X 2.4ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=2 8X 2.2ms vs 0.3ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=12 8X 5.2ms vs 0.7ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=12 1.3X 0.6ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=12 1.7X 0.4ms vs 0.2ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=12 1.4X 0.6ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=12 1.8X 0.4ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=12 36X 3.9ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=12 10X 0.526ms vs 0.051ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=12 7X 0.514ms vs 0.069ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=12 11X 0.569ms vs 0.052ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=12 8X 0.557ms vs 0.070ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=32 9X 4.5ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=32 0.5X 0.2ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=32 1.0X 0.5ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=32 44X 3.864ms vs 0.087ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=32 10X 0.527ms vs 0.053ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=32 7X 0.516ms vs 0.070ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=32 10X 0.567ms vs 0.055ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=32 8X 0.558ms vs 0.072ms ---------------------------------------------------------------------------------------------------- (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=1 1.0X 1.9ms vs 1.9ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=1 2.0X 1.8ms vs 0.9ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=1 1.7X 1.8ms vs 1.0ms (1, 3, 256, 256) -> (320, 320) nearest-exact 
float32 num_threads=1 2.1X 1.9ms vs 0.9ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=1 1.9X 1.9ms vs 1.0ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=1 9X 1.6ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=1 16X 1.7ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=1 10X 1.7ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=1 17X 1.9ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=1 11X 1.8ms vs 0.2ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=2 1.7X 1.9ms vs 1.1ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=2 2.0X 1.0ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=2 1.7X 0.9ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=2 2.3X 1.1ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=2 1.8X 1.0ms vs 0.5ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=2 8X 1.6ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=2 14X 0.931ms vs 0.067ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=2 7X 0.9ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=2 15X 1.016ms vs 0.069ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=2 9X 0.9ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=12 8X 1.9ms vs 0.3ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=12 1.7X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=12 1.9X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=12 20X 1.630ms vs 0.081ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=12 10X 0.457ms vs 0.044ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=12 7X 0.439ms vs 0.060ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=12 11X 0.485ms vs 0.045ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=12 8X 0.474ms vs 0.061ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=32 8X 1.9ms vs 0.3ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=32 2.0X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=32 1.4X 0.2ms vs 0.2ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=32 21X 1.628ms vs 0.078ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=32 9X 0.453ms vs 0.048ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=32 7X 0.445ms vs 0.063ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=32 11X 0.535ms vs 0.048ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=32 8X 0.502ms vs 0.063ms ---------------------------------------------------------------------------------------------------- (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=1 1.0X 13.8ms vs 14.0ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=1 1.8X 13.1ms vs 7.4ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=1 1.8X 11.1ms vs 6.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=1 1.9X 13.9ms vs 7.4ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=1 1.9X 
11.8ms vs 6.1ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=1 10X 10.2ms vs 1.1ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=1 19X 10.8ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=1 11X 10.4ms vs 0.9ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=1 20X 11.6ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=1 12X 11.4ms vs 0.9ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=2 1.8X 13.7ms vs 7.7ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=2 2.6X 7.3ms vs 2.8ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=2 1.8X 5.6ms vs 3.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=2 1.9X 7.9ms vs 4.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=2 1.9X 6.0ms vs 3.1ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=2 18X 10.1ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=2 19X 5.8ms vs 0.3ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=2 10X 5.3ms vs 0.5ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=2 20X 6.3ms vs 0.3ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=2 11X 5.7ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=12 8X 13.8ms vs 1.6ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=12 2.9X 1.5ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=12 1.7X 1.0ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=12 1.5X 1.5ms vs 1.0ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=12 1.8X 1.0ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=12 80X 10.1ms vs 0.1ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=12 13X 0.928ms vs 0.072ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=12 8X 0.9ms vs 0.1ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=12 13X 1.001ms vs 0.074ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=12 9X 1.0ms vs 0.1ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=32 18X 14.0ms vs 0.8ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=32 1.9X 1.0ms vs 0.6ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=32 2.9X 0.7ms vs 0.2ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=32 1.7X 0.9ms vs 0.6ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=32 1.8X 0.4ms vs 0.2ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=32 111X 10.254ms vs 0.092ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=32 14X 0.784ms vs 0.056ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=32 7X 0.551ms vs 0.075ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=32 11X 0.607ms vs 0.057ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=32 8X 0.596ms vs 0.076ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=1 1.0X 0.084ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=1 1.0X 0.077ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=1 1.0X 0.076ms vs 0.076ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=1 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=1 1.0X 0.081ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=1 1.0X 0.071ms vs 0.071ms (1, 1, 224, 224) 
-> (64, 64) nearest float32 num_threads=1 1.0X 0.074ms vs 0.074ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=1 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=1 1.0X 0.080ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=1 0.9X 0.078ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=2 1.0X 0.083ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=2 1.0X 0.076ms vs 0.077ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=2 1.0X 0.075ms vs 0.074ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=2 1.0X 0.082ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=2 1.0X 0.080ms vs 0.083ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=2 1.0X 0.070ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=2 1.0X 0.073ms vs 0.075ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=2 1.0X 0.071ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=2 1.0X 0.079ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=2 1.0X 0.077ms vs 0.079ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=12 1.0X 0.083ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=12 1.0X 0.080ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=12 1.0X 0.077ms vs 0.075ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=12 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=12 1.0X 0.083ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=12 1.0X 0.071ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=12 1.0X 0.076ms vs 0.074ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=12 1.0X 0.073ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=12 1.0X 0.080ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=12 1.0X 0.080ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=32 1.0X 0.084ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=32 1.0X 0.078ms vs 0.077ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=32 1.0X 0.076ms vs 0.076ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=32 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=32 1.0X 0.081ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=32 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=32 1.0X 0.074ms vs 0.075ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=32 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=32 1.0X 0.077ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=32 1.0X 0.076ms vs 0.079ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=1 1.0X 0.3ms vs 0.3ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=1 1.8X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=1 1.6X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=1 2.0X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=1 1.7X 0.3ms vs 0.2ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=1 6X 0.265ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=1 10X 0.280ms vs 0.028ms 
(1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=1 7X 0.273ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=1 11X 0.303ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=1 8X 0.297ms vs 0.038ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=2 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=2 1.8X 0.163ms vs 0.093ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=2 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=2 1.9X 0.180ms vs 0.096ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=2 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=2 6X 0.264ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=2 10X 0.278ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=2 7X 0.270ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=2 11X 0.298ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=2 8X 0.293ms vs 0.037ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=12 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=12 1.7X 0.158ms vs 0.095ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=12 1.7X 0.170ms vs 0.100ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=12 6X 0.269ms vs 0.043ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=12 11X 0.291ms vs 0.027ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=12 8X 0.281ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=12 11X 0.305ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=12 8X 0.306ms vs 0.038ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=32 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=32 1.6X 0.160ms vs 0.098ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=32 1.7X 0.171ms vs 0.099ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=32 6X 0.269ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=32 10X 0.282ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=32 7X 0.276ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=32 11X 0.305ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=32 8X 0.299ms vs 0.038ms ---------------------------------------------------------------------------------------------------- (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=1 1.0X 1.2ms vs 1.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=1 2.0X 1.2ms vs 0.6ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=1 1.7X 1.1ms vs 0.7ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=1 2.1X 1.2ms vs 0.6ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=1 1.9X 1.2ms vs 0.7ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=1 8X 1.1ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=1 15X 1.109ms vs 0.073ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=1 10X 1.1ms 
vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=1 16X 1.192ms vs 0.074ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=1 11X 1.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=2 1.7X 1.2ms vs 0.7ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=2 2.0X 0.6ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=2 1.7X 0.6ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=2 2.2X 0.7ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=2 1.8X 0.6ms vs 0.3ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=2 9X 1.0ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=2 11X 0.598ms vs 0.052ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=2 8X 0.556ms vs 0.072ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=2 12X 0.649ms vs 0.053ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=2 8X 0.598ms vs 0.073ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=12 5X 1.2ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=12 1.3X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=12 1.4X 0.2ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=12 9X 1.0ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=12 12X 0.572ms vs 0.048ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=12 8X 0.560ms vs 0.068ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=12 13X 0.617ms vs 0.049ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=12 9X 0.604ms vs 0.068ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=32 5X 1.2ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=32 13X 1.042ms vs 0.081ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=32 12X 0.586ms vs 0.050ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=32 8X 0.562ms vs 0.069ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=32 12X 0.621ms vs 0.051ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=32 9X 0.609ms vs 0.070ms ---------------------------------------------------------------------------------------------------- (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=1 1.0X 1.0ms vs 1.0ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 2.1X 1.0ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=1 7X 0.8ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=1 14X 0.852ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=1 9X 0.828ms vs 0.087ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 15X 0.922ms vs 0.061ms (1, 1, 
600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.897ms vs 0.087ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=2 1.6X 0.9ms vs 0.6ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=2 1.9X 0.5ms vs 0.2ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=2 2.1X 0.5ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=2 1.8X 0.5ms vs 0.3ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=2 10X 0.808ms vs 0.084ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=2 10X 0.462ms vs 0.046ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=2 7X 0.429ms vs 0.062ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=2 12X 0.504ms vs 0.044ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=2 7X 0.461ms vs 0.063ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=12 4X 1.0ms vs 0.2ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=12 1.7X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=12 1.9X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=12 12X 0.820ms vs 0.067ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=12 11X 0.438ms vs 0.041ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=12 8X 0.431ms vs 0.056ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=12 12X 0.482ms vs 0.041ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=12 8X 0.467ms vs 0.056ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=32 4X 1.0ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=32 1.7X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=32 1.8X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=32 12X 0.824ms vs 0.070ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.044ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=32 7X 0.438ms vs 0.059ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=32 11X 0.479ms vs 0.045ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.059ms ---------------------------------------------------------------------------------------------------- (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=1 1.0X 4.7ms vs 4.7ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=1 2.0X 4.4ms vs 2.2ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=1 1.8X 4.3ms vs 2.5ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=1 2.1X 4.7ms vs 2.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=1 1.9X 4.6ms vs 2.5ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=1 9X 4.0ms vs 0.4ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=1 17X 4.2ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=1 11X 4.1ms vs 0.4ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=1 19X 4.6ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=1 12X 4.5ms vs 0.4ms (1, 3, 800, 800) -> (500, 
500) linear float32 num_threads=2 1.7X 4.7ms vs 2.7ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=2 2.1X 2.4ms vs 1.1ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=2 1.8X 2.2ms vs 1.3ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=2 2.3X 2.6ms vs 1.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=2 1.9X 2.3ms vs 1.3ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=2 15X 4.0ms vs 0.3ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=2 16X 2.3ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=2 9X 2.1ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=2 17X 2.5ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=2 10X 2.3ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=12 10X 4.7ms vs 0.5ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=12 1.9X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=12 1.7X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=12 1.9X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=12 1.8X 0.4ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=12 41X 3.969ms vs 0.096ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=12 11X 0.545ms vs 0.051ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=12 8X 0.532ms vs 0.070ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=12 11X 0.590ms vs 0.052ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=12 8X 0.578ms vs 0.071ms (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=32 17X 4.7ms vs 0.3ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=32 1.8X 0.2ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=32 2.0X 0.3ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=32 1.9X 0.2ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=32 45X 4.028ms vs 0.090ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=32 10X 0.549ms vs 0.053ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=32 7X 0.536ms vs 0.072ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=32 11X 0.592ms vs 0.055ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=32 8X 0.581ms vs 0.074ms ```
Code:
I used this file which is adapted from https://github.com/pytorch/pytorch/blob/master/benchmarks/operator_benchmark/pt/interpolate_test.py

```py
import operator_benchmark as op_bench
import torch

"""Microbenchmarks for interpolate operator."""


class InterpolateBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, input_size, output_size, channels_last=False, mode='linear', dtype=torch.float):
        input_image = torch.randint(0, 256, size=input_size, dtype=dtype, device='cpu', requires_grad=self.auto_set())
        if channels_last:
            if input_image.ndim == 4:
                input_image = input_image.contiguous(memory_format=torch.channels_last)
            elif input_image.ndim == 5:
                input_image = input_image.contiguous(memory_format=torch.channels_last_3d)
            else:
                raise ValueError(
                    f"Can not set channels_last to the input of {input_image.ndim} dims"
                )

        align_corners = None if "nearest" in mode else False

        if mode == "linear":
            mode = {
                3: 'linear',
                4: 'bilinear',
                5: 'trilinear',
            }[input_image.ndim]

        self.inputs = {
            "input_image": input_image,
            "output_size": output_size,
            "mode": mode,
            "align_corners": align_corners,
        }

        self.set_module_name("interpolate")

    def forward(self, input_image, output_size, mode, align_corners):
        return torch.nn.functional.interpolate(input_image, size=output_size, mode=mode, align_corners=align_corners)


def make_config():
    sizes = (
        ((224, 224), (64, 64)),
        ((224, 224), (128, 128)),
        ((600, 400), (224, 224)),
        ((320, 320), (256, 256)),
        ((800, 800), (500, 500)),
    )

    attrs = []
    for (HW1, HW2) in sizes:
        attrs.append([(1, 3, *HW1), HW2])  # 3 channels
        attrs.append([(1, 1, *HW1), HW2])  # 1 channel
        attrs.append([(1, 3, *HW2), HW1])  # 3 channels
        attrs.append([(1, 1, *HW2), HW1])  # 1 channel

    config = op_bench.config_list(
        attr_names=["input_size", "output_size"],
        attrs=attrs,
        cross_product_configs={
            'channels_last': [True],
            'mode': ["linear", "nearest", "nearest-exact"],
            'dtype': [torch.float, torch.uint8]
        },
        tags=["short"],
    )

    def get_mode(l):
        for d in l:
            if "mode" in d:
                return d["mode"]

    def get_dtype(l):
        for d in l:
            if "dtype" in d:
                return d["dtype"]

    config = [l for l in config if not(get_mode(l) == "linear" and get_dtype(l) == torch.uint8)]
    return config


config = make_config()
op_bench.generate_pt_test(config, InterpolateBenchmark)


if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```

with

```
for num_threads in 1 2 12 32; do echo "num_threads=$num_threads" && python -m pt.my_interpolate_test --iterations 1000 --omp_num_threads $num_threads ; done > $out_file
```

and this very ugly helper

```py
import re

with open("main") as f:
    main = f.readlines()
with open("new") as f:
    new = f.readlines()

out = []
for main_line, new_line in zip(main, new):
    if main_line.startswith("num_threads="):
        num_threads = int(main_line.split("=")[-1])
    if main_line.startswith("# Input"):
        deets = f"{main_line.strip()}, {num_threads=}"
    if main_line.startswith("Forward"):
        main_time = float(main_line.split()[-1])
        new_time = float(new_line.split()[-1])
        ratio = main_time / new_time
        fmt = ".1f" if ratio < 3 else ".0f"
        improv = f"{ratio:{fmt}}X"
        time_fmt = ",.3f" if new_time < 100 else ",.1f"
        deets = deets.strip().replace("# Input: ", "")
        deets = deets.replace(": ", "=")
        deets = deets.replace("input_size=", "")
        deets = deets.replace(", output_size=", " -> ")
        deets = deets.replace("dtype=torch.", "")
        deets = deets.replace("mode=", "")
        deets = deets.replace("channels_last=True, ", "")
        split = deets.split(",")
        size = ','.join(split[:-3])
        mode, dtype, threads = split[-3:]
        deets = f"{size:<30} {mode:<15} {dtype:<10} {threads:<15}"
        l = f"{deets} {improv:<5} {main_time / 1000:{time_fmt}}ms vs {new_time / 1000:{time_fmt}}ms"
        out.append(l)


def key(s):
    num_threads = (int(re.findall(r"num_threads=(\d+)", s)[0]),)
    input_shape, output_shape = re.findall("\(.*?\)", s)
    input_shape = input_shape[1:-1]  # remove parenthesis
    input_HW = tuple(int(x) for x in input_shape.split(",")[-2:])
    input_C = (-int(input_shape.split(",")[1]),)
    output_HW = tuple(int(x) for x in output_shape[1:-1].split(","))
    is_downsample = (output_HW[0] < input_HW[0],)

    if "linear" in s:
        mode = "linear"
    elif "nearest-exact" in s:
        mode = "nearest-exact"
    else:
        assert "nearest" in s
        mode = "nearest"
    mode = (mode,)

    return is_downsample + input_HW + output_HW + num_threads + input_C + mode


for i, l in enumerate(sorted(out, key=key)):
    if i % 10 == 0 and i % 40 != 0:
        print()
    if i % 40 == 0:
        print("-" * 100)
    print(l)
```
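As a quick sanity check outside the operator_benchmark harness, a minimal standalone timing of the same channels_last interpolate call could look like the sketch below (a hypothetical helper, not part of this PR; the shape and mode are taken from one of the rows above):

```py
import time
import torch
import torch.nn.functional as F

def time_interpolate(shape=(1, 3, 500, 500), size=(800, 800), mode="nearest", iters=100):
    # channels_last is the memory format whose upsampling path this PR speeds up
    x = torch.rand(shape).contiguous(memory_format=torch.channels_last)
    F.interpolate(x, size=size, mode=mode)  # warm-up call
    start = time.perf_counter()
    for _ in range(iters):
        F.interpolate(x, size=size, mode=mode)
    return (time.perf_counter() - start) / iters

print(f"{time_interpolate() * 1e3:.3f} ms per call")
```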
Closes https://github.com/pytorch/pytorch/issues/83840 When this is merged we should be able to remove some hack in vision as well https://github.com/pytorch/vision/pull/6661 (CC @vfdev-5 @datumbox ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86361 Approved by: https://github.com/vfdev-5, https://github.com/datumbox, https://github.com/fmassa commit a4ee6956ff074f82c1306d6555d900b48a4b3de0 Author: Nikita Shulga Date: Tue Oct 11 16:11:47 2022 +0000 Pin numpy version during MPS tests (#86691) numpy-1.23.1 for some reason can not be loaded on M1 Fixes https://github.com/pytorch/pytorch/issues/86688 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86691 Approved by: https://github.com/DanilBaibak, https://github.com/atalman, https://github.com/seemethere commit 352d9264822b8064b0c0792bc00492e69e569a37 Author: eqy Date: Tue Oct 11 16:03:49 2022 +0000 [CUBLAS][CUDA GRAPHS] (re-re-re-re-open of #83461) Explicitly set the workspace for cuBLAS handles (#86645) re-opening (again) in hopes of working around failed/stuck CLA check CC @ptrblck @ngimel @huydhn Pull Request resolved: https://github.com/pytorch/pytorch/pull/86645 Approved by: https://github.com/zdevito commit 937d677d9f588ba9cddcac64ecfcad7ace9e8a58 Author: Richard Zou Date: Mon Oct 10 08:08:51 2022 -0700 Add version selector back to functorch docs (#86602) I accidentally deleted it in https://github.com/pytorch/pytorch/pull/85856/ . This brings the version selector back. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86602 Approved by: https://github.com/samdow commit a56a8c0fc0251bb4cd24b366a290db2e4beea747 Author: anjali411 Date: Mon Oct 10 20:28:32 2022 +0000 Add meta support for _adaptive_avg_pool2d_backward (#86359) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86359 Approved by: https://github.com/ezyang, https://github.com/albanD commit 03d8ab4decdd9a7391ea6c026d0b095708288ca7 Author: Ivan Yashchuk Date: Tue Oct 11 13:03:20 2022 +0000 Skip forward AD tests for torch.native_batch_norm (#86206) `test_forward_mode_AD` has problems with `torch.native_batch_norm` when computing Jacobian using finite-differences. Weirdly this test unexpectedly passed on periodic CI. Let's skip this test instead of xfailing. Fixes https://github.com/pytorch/pytorch/issues/86175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86206 Approved by: https://github.com/soulitzer commit 6ab07febcea936e75bc95d3ebdbb087b2033ba11 Author: Andrew Gu Date: Tue Oct 11 01:37:54 2022 +0000 [FSDP][Easy] Rename `_prefixed_param_names` -> `_fqns` for consistency (#86653) This renames `_prefixed_param_names` to `_fqns` to help converge on the terminology. 
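For context, "FQN" here means the fully qualified, dotted parameter name as produced by `named_parameters()`; a minimal illustration (not part of the original commit):

```py
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.ReLU())
# Each parameter's FQN is its dotted path in the module tree, e.g. "0.weight" here;
# inside a submodule named "encoder" it would be prefixed as "encoder.0.weight".
print([name for name, _ in model.named_parameters()])  # ['0.weight', '0.bias']
```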
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86653 Approved by: https://github.com/rohan-varma commit 2fe580859012d2d24a54e452195ccbc7f3191036 Author: albanD Date: Mon Oct 10 20:19:30 2022 -0400 Symintify NLL loss, copy and squeeze (#86606) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86606 Approved by: https://github.com/anjali411 commit be8627827e0e9ee3769335641aacf0193f66e476 Author: albanD Date: Mon Oct 10 20:19:30 2022 -0400 More symintification of get/set item (#86605) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86605 Approved by: https://github.com/anjali411 commit f84144225242c476a674886af1470220b915fe51 Author: albanD Date: Mon Oct 10 18:13:59 2022 -0400 symintify autograd view chaining (#86604) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86604 Approved by: https://github.com/anjali411 commit 49c9b0a1541f596bca2671ea52fa646dc560ebb7 Author: albanD Date: Mon Oct 10 18:13:59 2022 -0400 symintify einsum (#86603) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86603 Approved by: https://github.com/anjali411 commit 3a2cfbb813e19c1648b23079f704829f9997425d Author: PyTorch MergeBot Date: Tue Oct 11 10:17:27 2022 +0000 Revert "Improve interpolate() speed for channels_last images and masks (#86361)" This reverts commit 93b2d991581db86074dd8011fdc903bd554466b1. Reverted https://github.com/pytorch/pytorch/pull/86361 on behalf of https://github.com/DanilBaibak due to Break the internal import process commit 17074389dec0ee3e2a949fb75bb51cde471d17fe Author: Jianyu Huang Date: Tue Oct 11 06:12:17 2022 +0000 index op with int32 support (#86318) Differential Revision: D40089960 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86318 Approved by: https://github.com/malfet commit 88a8a900b90fc7ff2f0e67c4e716520ac7fac75f Author: kshitij12345 Date: Tue Oct 11 05:40:12 2022 +0000 fix: half reduction with multiple sub-iterators (#85596) Fixes #74438 TODO: * [x] Add test Pull Request resolved: https://github.com/pytorch/pytorch/pull/85596 Approved by: https://github.com/ngimel commit 55479fe80ee8df9750c2a4d1022943d04c3e46d6 Author: Louis Feng Date: Tue Oct 11 04:38:26 2022 +0000 Enable capturing of comm collective parameters (#98) (#85368) Summary: X-link: https://github.com/facebookresearch/torch_ucc/pull/98 Add tensor input, output, and other metadata for PyTorch comms. Test Plan: P517138779 Reviewed By: Pavani-Panakanti Differential Revision: D38357077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85368 Approved by: https://github.com/H-Huang commit ad2b04c39c41949d8869de743736bcaeec2dfa0d Author: PyTorch MergeBot Date: Tue Oct 11 03:28:58 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#86651) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86651 Approved by: https://github.com/pytorchbot commit bd381121b9e1e32b1a4acef1504c6e843560e24e Author: PyTorch MergeBot Date: Tue Oct 11 03:24:30 2022 +0000 [vision hash update] update the pinned vision hash (#86652) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86652 Approved by: https://github.com/pytorchbot commit deb414a43fea7fab883858daf13890d9367b68ec Author: PyTorch MergeBot Date: Tue Oct 11 02:50:47 2022 +0000 Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)" This reverts commit fb9b96593c784b86b3d913ef8799ee120c203207. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel commit 577070ff961ed10224b6a8294cbebbea2da77d7f Author: Jianyu Huang Date: Tue Oct 11 02:15:51 2022 +0000 update fbgemm commit ID in PyTorch (#86577) Summary: Update after https://github.com/pytorch/FBGEMM/pull/1388 . Previous issue: D40216348 Test Plan: CI Differential Revision: D40219252 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86577 Approved by: https://github.com/malfet commit d8b971ed259b9ea37f8b4fb360b4aeea6a54a938 Author: Will Constable Date: Tue Oct 11 01:42:26 2022 +0000 Fixes for partitioner with symbolic shapes (#86425) - supports saving symint (and symfloat..) values between fw/bwd, using sketchy logic that probably needs to be improved but seems to work so far - sets a correct weight=1 for sym nodes for cost purposes - lets user functions return symints/floats (but if the same symfloat is saved for backward, that gets duplicated annoyingly) - makes partitioning decisions based on observed trace-time sizes without guarding! (this is sketchy, but it isn't clear that it will lead to bad partitioning choices either) - improves infra for tracking symint-family of types: is_sym_node() and _py_sym_types Pull Request resolved: https://github.com/pytorch/pytorch/pull/86425 Approved by: https://github.com/ezyang commit 16f65f178a6a51e9d25fa6ee73e21325c9b348cd Author: Driss Guessous Date: Tue Oct 11 01:21:37 2022 +0000 Nested tensor forward only chunk operations (#85645) Taking over this pr: https://github.com/pytorch/pytorch/pull/83736 Adding support for chunk without autograd support Pull Request resolved: https://github.com/pytorch/pytorch/pull/85645 Approved by: https://github.com/cpuhrsch commit 4fc0d5341cc58617376c33a2a2c47c2439f4e222 Author: Alan Lin Date: Tue Oct 11 01:21:16 2022 +0000 [PyTorch][Fix] Improve numerical stability of HistogramObserver (#86522) Summary: As titled, HistogramObserver may fail in a certain scenario. Specifically, we originally compute `hist_bin_width` as `(self.max_val - self.min_val) / (self.bins * upsample_rate)`. It's possible that the numerator part is close the the FP32 threshold (1.4e-45) and conducting the division will cause overflow. Bring some redundent computations to avoid such scenario. Test Plan: https://pxl.cl/2ggD4 (https://github.com/pytorch/pytorch/commit/04490e90ea59229355b2771893719fe8896e80f0) Differential Revision: D40149594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86522 Approved by: https://github.com/jerryzh168 commit 8a47a49d5ee19590876b2afcf12b8950c72d81ba Author: Jerry Zhang Date: Mon Oct 10 13:58:09 2022 -0700 [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) since the default qengine is the last element of the engine in supported_engines list, adding x86 qengine in the end of the list changes the default quantized engine as well. this PR will be a short term fix to revert the changes. 
We have an issue here to track the proper fix: https://github.com/pytorch/pytorch/issues/86404 Motivation: a meta internal team found that the inference failed in onednn prepacking with error: "could not create a primitive descriptor for a reorder primitive." in a COPPER_LAKE machine, we are working with intel to repro and fix the problem. in the mean time, we'll revert the changes of default option back to fbgemm Pull Request resolved: https://github.com/pytorch/pytorch/pull/86631 Approved by: https://github.com/vkuzo commit 224ae0da107ee426a2e19dab3eee52b6252f842f Author: Nikita Shulga Date: Mon Oct 10 23:52:28 2022 +0000 [BE] Fix variable shadowing in CUDACachingAllocator.cpp (#86646) Test Plan: CI Differential Revision: D40245365 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86646 Approved by: https://github.com/seemethere commit 2cb330ab15f6a40458272a421f8731069f3e2043 Author: jjsjann123 Date: Mon Oct 10 23:48:52 2022 +0000 Acyclic partition patch (#86511) Fixes #86159 and #86108 Refactored graph partition to check for cyclic dependency on each partition merge, instead of relying on a pre-baked dependency map. The previous implementation suffers from not updating dependency on existing partition. When a fusion happens, the updated dependency map needs to be propagated to all nodes in the graph, so each node in a partition shares an identical dependency set. Previous implementation suffers from the not identifying cyclic dependency in issue #86159. Updated implementation does a cyclic check on partitioned graph before attempting a merge of two partitions. - [x] python repro added with cyclic dependency after partition `TestFXGraphPasses.forward12` - [x] fix dependency map with updated implementation using cyclic check Pull Request resolved: https://github.com/pytorch/pytorch/pull/86511 Approved by: https://github.com/SherlockNoMad commit dd6dd03ff27a1a0e89bad83b6bcb0794116812d9 Author: jjsjann123 Date: Mon Oct 10 23:31:21 2022 +0000 Enable output allocation cache (#86100) Cherry-picked from devel branch: https://github.com/csarofeen/pytorch/pull/2010 turns on accidentally disabled output allocation cache [#2002](https://github.com/csarofeen/pytorch/issues/2002) Updated check for safety regarding allocation cache by iterating all IterDomain on outputs and enables cache re-use only when no extent value is a consumer of fusion inputs (output sizes is not dependent on scalar inputs). Pull Request resolved: https://github.com/pytorch/pytorch/pull/86100 Approved by: https://github.com/csarofeen commit 82ed5ca3401e965067fd03a6bac57978f884f715 Author: Akshit Khurana Date: Mon Oct 10 22:32:44 2022 +0000 [Vulkan] Don't crash immediately if Vulkan context could not be retrieved (#86485) Test Plan: Internal AIBench test Reviewed By: SS-JIA Differential Revision: D40151818 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86485 Approved by: https://github.com/kimishpatel commit b409d1f65b8c1e2607e250526d215d6a2ae8ef01 Author: Elias Ellison Date: Fri Oct 7 19:18:41 2022 +0000 Turn on Data Dependent Throwing (#86480) This was already enabled in TorchDynamo, but was staged to make sure things don't break. Also makes backward single threaded for tests to fix a memory leak. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86480 Approved by: https://github.com/bdhirsh commit ce7751188afb42263ebda159d6ee7a343a833cc1 Author: Andrew Gu Date: Mon Oct 10 18:05:12 2022 +0000 [DDP] Add `PackedSequence` support when `device_ids` is specified (#86614) Before this PR, if a user runs DDP with `device_ids` specified and with a `PackedSequence` input, then the execution will error with something like:
```
raise ValueError(
ValueError: batch_sizes should always be on CPU. Instances of PackedSequence should never be created manually.
They should be instantiated by functions like pack_sequence and pack_padded_sequences in nn.utils.rnn.
https://pytorch.org/docs/stable/nn.html...
```
This is because the DDP forward calls `_to_kwargs()`, which calls `_recursive_to()`, which moves the inputs to GPU. However, `_is_namedtuple(packed_sequence)` returns `True`, leading to the branch `return [type(obj)(*args) for args in zip(*map(to_map, obj))]`, which tries to construct a `PackedSequence` directly via `type(obj)(*args)`, leading to the error. Repro for `_is_namedtuple(packed_sequence)` returning `True`:
```
import random
import torch
import torch.nn.utils.rnn as rnn_utils
from torch.nn.parallel.scatter_gather import _is_namedtuple

def _ordered_sequence(tensor_type):
    seqs = [tensor_type(random.randint(1, 256)) for _ in range(32)]
    seqs = [s.random_(-128, 128) for s in seqs]
    ordered = sorted(seqs, key=len, reverse=True)
    return ordered

def _padded_sequence(tensor_type):
    ordered = _ordered_sequence(tensor_type)
    lengths = [len(i) for i in ordered]
    padded_tensor = rnn_utils.pad_sequence(ordered)
    return padded_tensor, lengths

padded, lengths = _padded_sequence(torch.Tensor)
packed = rnn_utils.pack_padded_sequence(
    padded, lengths, enforce_sorted=False)
print(type(packed), packed.data.device)
print(_is_namedtuple(packed))
```
Test Plan:
```
python test/distributed/test_c10d_nccl.py -k test_ddp_packed_sequence
```
Without the fix, the added unit test fails with the expected error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86614 Approved by: https://github.com/rohan-varma commit b7b5bd47ae3d5e82ef98e34c406e68b8dc12e448 Author: Nikita Shulga Date: Mon Oct 10 20:36:22 2022 +0000 [MPS] Implement `frac` operator (#86625) Implemented as a combination of self and trunc. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86625 Approved by: https://github.com/kulinseth, https://github.com/albanD commit 885122b7dc3367c1feaf01410ad625acccf816dc Author: David Reiss Date: Mon Oct 10 17:41:31 2022 +0000 Move PadNd from ATen/native to ATen (#82379) Summary: This header is being included from both aten/native and torch/csrc, but some of our build configurations don't allow direct dependencies from torch/csrc to aten/native, so put the header in aten where it's always accessible. Resolves https://github.com/pytorch/pytorch/issues/81198 Test Plan: CI.
``` ./scripts/build_android.sh env ANDROID_ABI="x86_64" ANDROID_NDK=".../ndk-bundle" CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache USE_VULKAN=0 ./scripts/build_android.sh echo '#include ' > test.cpp g++ -E -I $PWD/build_android/install/include/ -I $PWD/build_android/install/include/torch/csrc/api/include test.cpp >/dev/null ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82379 Approved by: https://github.com/ezyang, https://github.com/malfet commit e2a4dfa468330c0587849bea4896ff5fffb33010 Author: anjali411 Date: Sun Oct 9 16:01:31 2022 +0000 Add correct __all__ for torch.distributed and torch.cuda submodules (#85702) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85702 Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/rohan-varma commit d93b1b9c4ed6a30e5982ebfa15807d0c497cb837 Author: Rohan Varma Date: Mon Oct 10 18:42:35 2022 +0000 Address feedback from previous PR (#86622) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86622 Approved by: https://github.com/albanD commit d792d75091416b74b55bd281874a21c4f960cc73 Author: Jerry Zhang Date: Mon Oct 10 05:55:54 2022 +0000 [quant][fix] Fix the call to get_executorch_backend_config (#86338) Summary: previously the call failed because there was an infinite loop in _get_share_qparams_ops_configs Test Plan: python test/test_quantization.py -k test_get_executorch_backend_config Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86338 Approved by: https://github.com/andrewor14 commit 2288a1c8065dd1d43410089719028087bd40e997 Author: Sean Ross-Ross Date: Fri Oct 7 13:46:43 2022 -0500 Added new option any_common_cpu_cuda_one to OpDTypes (#86286) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86286 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 8f2dda5bf2451765cebeb111a97be40f70c95989 Author: Alex Date: Mon Oct 10 17:42:13 2022 +0000 [CI] Build MacOS M1 binaries without distributed support (#86451) Partial fix for #86448 which causes the broken code to be exercised in CI. If this demonstrates the break, I'm not sure whether there should be a fix forward of https://github.com/pytorch/pytorch/pull/85781 or a revert Pull Request resolved: https://github.com/pytorch/pytorch/pull/86451 Approved by: https://github.com/malfet commit dcc3ae98b7278d9d85be853bfcd070b2a081003f Author: Driss Guessous Date: Mon Oct 10 17:37:19 2022 +0000 [NestedTensor] Add a contiguous checks to get_buffer (#86496) Many NestedTensor ops are implemented using a connivence function named get_buffer. This returns a dense, contiguous tensor that is a view of the underlying storage of the NestedTensor. This function allows NestedTensor ops to piggy back off of the implementations for dense tensor under certain scenarios. This PR adds a TORCH_CHECK() to get buffer to insure that the calling NT is in fact contiguous. It also adds an "unsafe" version for a few ops that are designed to handle contiguity. 
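To make the checked-versus-unsafe accessor pattern described above concrete, here is a hedged, purely illustrative Python stand-in (the real check is a C++ `TORCH_CHECK` inside `get_buffer`; the class and helper names below are invented for the sketch):

```py
from dataclasses import dataclass
from typing import List

@dataclass
class ToyNestedTensor:
    buffer: List[float]          # flat storage backing all constituent tensors
    is_contiguous: bool = True   # whether that storage matches the logical layout

def get_buffer(nt: ToyNestedTensor) -> List[float]:
    # Mirrors the new check: only a contiguous NestedTensor's storage may be
    # reinterpreted as a single dense tensor.
    if not nt.is_contiguous:
        raise RuntimeError("get_buffer() expects a contiguous NestedTensor")
    return _get_buffer_unsafe(nt)

def _get_buffer_unsafe(nt: ToyNestedTensor) -> List[float]:
    # "Unsafe" variant for the few ops written to handle non-contiguous layouts themselves.
    return nt.buffer

print(get_buffer(ToyNestedTensor(buffer=[1.0, 2.0, 3.0])))
```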
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86496 Approved by: https://github.com/albanD, https://github.com/cpuhrsch commit ad449b338feaa1520d2724726bcbd613a0e15b55 Author: Howard Huang Date: Fri Oct 7 09:04:13 2022 -0700 [8/N] [Dispatchable Collectives] Update allgather with CPU / CUDA implementations  (#84423) - Updates for the allgather collective https://github.com/pytorch/pytorch/issues/86225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84423 Approved by: https://github.com/kwen2501 commit 9eb771583ce56fba9c78a80681cac47ee07b3f49 Author: anjali411 Date: Mon Oct 10 13:16:01 2022 +0000 symintify rand and randint functions and meta suport for randint (#86358) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86358 Approved by: https://github.com/ezyang, https://github.com/albanD commit 67358ee124e6e826b63a854f7bc5b341e7734406 Author: David Date: Mon Oct 10 16:57:52 2022 +0000 MaxPool: correct pooling description (#86559) In the documentation of `nn.MaxPool2d` and `nn.MaxPool3d`, the argument description of `padding` incorrectly states that zero padding is applied. The remainder of the documentation correctly states that negative infinity padding is applied. The documentation of `padding` in `nn.MaxPool1d`, `nn.functional.max_pool1d/2d/3d` is correct. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86559 Approved by: https://github.com/albanD commit 16a0fa1204edb118800261a26281e624988eb239 Author: Tugsbayasgalan Manlaibaatar Date: Fri Oct 7 13:37:02 2022 -0700 Enable max.unary_out (#85926) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85926 Approved by: https://github.com/bdhirsh commit e18d466f35d9dd5c4fd38328e67cada1504abb8b Author: Kshiteej K Date: Mon Oct 10 16:29:52 2022 +0000 [test_nn] split lazy_modules from test_nn (#86526) Ref: #63085 NOTE: We don't need an accompanying XLA PR as these tests run only on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86526 Approved by: https://github.com/albanD commit 8a1fc5d2f843501ed1ce1bf90b20eaa709c8aae2 Author: Howard Huang Date: Fri Oct 7 09:04:13 2022 -0700 [7/N] [Dispatchable Collectives] Update reduce with CPU / CUDA implementations (#83916) - Updates for the reduce collective https://github.com/pytorch/pytorch/issues/86225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83916 Approved by: https://github.com/kwen2501 commit 978b46d7c96627e3b3553ad70ad21cb161d05f90 Author: albanD Date: Mon Oct 10 08:44:51 2022 -0400 Reland 2 of Merge more symbolic meta kernels and symint changes from branch (#86334) (#86488) symintify split_with_sizes, dropout, fused_fake_obs_quant. meta for padding_2d ops add meta_bernoulli_ meta kernel for at::gather get pytorch_struct to pass: meta for scatter_add, fix backward symintify split ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/86488 Approved by: https://github.com/ezyang commit 55663b7f8174db5d71d611403078ebcec4075b1a Author: albanD Date: Mon Oct 10 08:44:51 2022 -0400 Reland 3 of Symintify getitem and add the required helper functions (#86207) (#86487) Note that this might not cover every use of the function (we know it doesn't) But this is enough to get few models passing. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86487 Approved by: https://github.com/ezyang commit 4a5fdc56ec692fe5e39b8f5d2da6be16434c5a02 Author: Brian Hirsh Date: Fri Oct 7 12:05:29 2022 -0700 fix some composite compliance ops for functionalization (#86470) Confirmed that this fixes https://github.com/pytorch/pytorch/issues/86384 cc @tugsbayasgalan Functionalization should be included in the "isSubclass" checks that we run, for composite operators that have a different path for composite compliance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86470 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit 5102f0cffcd249254245fdb3eb74abcd2151f9ac Author: Andrew Gu Date: Fri Oct 7 13:17:18 2022 +0000 [FSDP][1/N] Retire `FlattenParamsWrapper` (#86117) This deprecates `FlattenParamsWrapper`'s usage for "unflattening" the original parameters. After this PR, FPW only serves to register and de-register its `FlatParameter` for the parent `FullyShardedDataParallel` instance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86117 Approved by: https://github.com/zhaojuanmao commit bf7c46facf442950a191ed4053a2b7ef6c39b35a Author: PyTorch MergeBot Date: Mon Oct 10 10:47:38 2022 +0000 [xla hash update] update the pinned xla hash (#86099) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86099 Approved by: https://github.com/pytorchbot commit 5844f00bbf3d434757b59487f5edeaaf51d292f5 Author: Andrew Gu Date: Sat Oct 8 00:15:34 2022 +0000 [FSDP] Add `low_prec` prefix to param and reduce dtype varnames (#86512) This PR renames `param_dtype` and `reduce_dtype` in `HandleConfig` to `low_prec_param_dtype` and `low_prec_reduce_dtype` to emphasize that they are meant to be of the low precision (if not `None`). (In my mind, mixed precision refers to the paradigm of using both full and low precision together during training. "Reduced" and "low precision" mean the same thing, but I prefer the term "low precision" in the code since it is shorter. A particular dtype can be a low precision dtype or a full precision dtype.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86512 Approved by: https://github.com/zhaojuanmao commit cc5de7f1ac01eece0a5bf6b94987d1ac9cacb2af Author: Andrew Gu Date: Sat Oct 8 13:54:36 2022 +0000 [FSDP] Remove `utils.py` (moved to `_utils.py`) (#86528) I messed up my git with an earlier PR, where I did not actually remove `utils.py` when moving it to `_utils.py`. This removes `utils.py`, which is now outdated and unused. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86528 Approved by: https://github.com/H-Huang commit c6b7c33885eeff9dc125f87c7134772d59d0ba21 Author: chunyuan Date: Mon Oct 10 05:47:11 2022 +0000 torchdynamo: add linear eltwise fusion kernel (#85622) Support fusion of linear with: - relu - sigmoid - tanh - hardswish - leaky_relu - hardtanh - gelu Pull Request resolved: https://github.com/pytorch/pytorch/pull/85622 Approved by: https://github.com/EikanWang, https://github.com/jansel commit ec2d22ece066bdb91b43394178f9c94a324f881f Author: PyTorch MergeBot Date: Mon Oct 10 03:26:25 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#86567) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). 
Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86567 Approved by: https://github.com/pytorchbot commit 753536b7a50df31d0529ba040f1e07cde3cca56d Author: Peter Bell Date: Sun Oct 9 13:49:24 2022 +0100 BlasKernel: Improve gemm's inner dot product when a is transposed (#80977) `gemm_transab_` accumulates the sum in the output, despite the inner loop being over a single output element. This changes it to accumulate in a register, which also avoids early truncation for bfloat16. I've also factored out a generic `sum` function that can be shared with `gemm_transa_` to handle unrolling and multiple accumulators. I have benchmarked addmm for bfloat16 with shapes (320,600) X (600,320) and for both layouts I see a significant speedup.

| layout  | Before (ms) | After (ms) |
|---------|-------------|------------|
| transa  | 71.5        | 31         |
| transab | 249         | 35         |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80977 Approved by: https://github.com/ngimel commit a45fead623e3f9a11bacbf5d49b252c3e867167d Author: Peter Bell Date: Sun Oct 9 19:40:48 2022 +0100 mkl: Use per-operator headers (#75570) Differential Revision: [D40126703](https://our.internmc.facebook.com/intern/diff/D40126703) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75570 Approved by: https://github.com/malfet commit c89d286af633a802226c34ccbdd5c7c4be10dcfb Author: anjali411 Date: Sun Oct 9 13:35:57 2022 +0000 symintify unbind_backward and tensor_split (#86357) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86357 Approved by: https://github.com/albanD commit a6c0442cce252742e6e71270640908b5c1b91961 Author: anjali411 Date: Sun Oct 9 12:29:07 2022 +0000 Add __all__ to torch.{autograd, fx, cuda} submodules (#85343) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85343 Approved by: https://github.com/albanD commit 6aec0d3ddbdfaed1baaaddc20b2c25597de12291 Author: Nikita Shulga Date: Sun Oct 9 14:20:46 2022 +0000 [BE] Remove remaining cuda-11.3 builds (#86540) `linux-bionic-cuda11_3-py3_7-clang9-build` is redundant as it is covered by `linux-jammy-cuda11.6-cudnn8-py3.8-clang12` And migrate no-per-operator header build (which mimics internal behavior) from `linux-xenial-cuda11.3` to `linux-bionic-cuda11.7` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86540 Approved by: https://github.com/weiwangmeta, https://github.com/atalman commit 7134b9bc7b1d25b453ec5c53b1ec70cb206228a1 Author: Peter Bell Date: Tue Oct 4 23:48:53 2022 +0100 Quantized: Use per-operator headers (#75569) Differential Revision: [D40126700](https://our.internmc.facebook.com/intern/diff/D40126700) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75569 Approved by: https://github.com/malfet commit 67434c70df5df353944f6ba876d9dd06b669bacd Author: Nikita Shulga Date: Sun Oct 9 06:47:36 2022 +0000 [MPS] Fix printTensor() for MPS (#86534) MPS does not support the double type, so tensors need to be cast to CPU first before they can be cast to double.
Also, do a little bit of BE, by initializing values and marking unused range variables with C10_UNUSED Fixes https://github.com/pytorch/pytorch/issues/86410 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86534 Approved by: https://github.com/weiwangmeta commit 9998f9100bfc620bd28af272af2e16b34b8c8bcf Author: PyTorch MergeBot Date: Sun Oct 9 03:30:05 2022 +0000 [vision hash update] update the pinned vision hash (#86490) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86490 Approved by: https://github.com/pytorchbot commit 92ac84c98a19310885f3d818aba56b981940d615 Author: PyTorch MergeBot Date: Sun Oct 9 03:28:35 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#86489) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86489 Approved by: https://github.com/pytorchbot commit 492d1be5d2bba36dd6710caa411cdf332ca9e5c8 Author: Peter Bell Date: Tue Oct 4 23:48:52 2022 +0100 QuantizedCPU: Use per-operator headers (#71217) Differential Revision: [D33949895](https://our.internmc.facebook.com/intern/diff/D33949895) Pull Request resolved: https://github.com/pytorch/pytorch/pull/71217 Approved by: https://github.com/malfet commit 4bfe2a24505049fa4fe43d24c2e3a5f5d99d9f00 Author: Peter Bell Date: Tue Oct 4 23:48:52 2022 +0100 cuDNN/miopen: Use per-operator headers (#71216) Differential Revision: [D33949898](https://our.internmc.facebook.com/intern/diff/D33949898) Pull Request resolved: https://github.com/pytorch/pytorch/pull/71216 Approved by: https://github.com/malfet commit 33f0e98a492acb55cca192ea8f4bb5bf24f28a4b Author: Edward Z. Yang Date: Sat Oct 8 07:17:37 2022 +0000 Re-land*4 "SymIntify cat and narrow" (#86468) This re-lands https://github.com/pytorch/pytorch/pull/86289 but with more wrappers. Contains implicit inclusion of in internal usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86468 Approved by: https://github.com/albanD commit 4bc2a0dcda79ab8589b469fa31919a8141361f42 Merge: 475022cd5d 5df5c3e33e Author: mingfeima Date: Sat Oct 8 15:11:29 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 5df5c3e33eff4016b365e21753168aecaead166c Merge: fd840676b0 8ea2ed0fc7 Author: mingfeima Date: Sat Oct 8 15:11:29 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 8ea2ed0fc728b964b8abfc768c37c3eb8b315dd5 Author: PyTorch MergeBot Date: Sat Oct 8 05:14:39 2022 +0000 Revert "Re-enable torchdynamo tests (#86297)" This reverts commit e61028813007518bd6be0e6482a8742b84c30da7. Reverted https://github.com/pytorch/pytorch/pull/86297 on behalf of https://github.com/malfet due to Reverting to return trunk back to green, dynamo shard2 started failing shortly after the merge commit d3f7c34cb3d7ac43115bd3ccd9cbdbf3e5654498 Author: Elias Ellison Date: Fri Oct 7 18:01:13 2022 +0000 Enable aten-aten decomps (#85921) Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. 
There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921 Approved by: https://github.com/ezyang commit af9c6bc851cfc8fba9e4c71830b783cb34d92a05 Author: Andrew Gu Date: Sat Oct 8 00:15:23 2022 +0000 [FSDP] Add `keep_low_precision_grads` support when CPU offloading (#86495) When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86495 Approved by: https://github.com/rohan-varma commit 7ec12a559cadbb82a1bd6546908897afedd453af Author: PyTorch MergeBot Date: Sat Oct 8 01:59:54 2022 +0000 Revert "Enable aten-aten decomps (#85921)" This reverts commit 62e4f51efdf98a3a91d29efa55e5665d5398b464. Reverted https://github.com/pytorch/pytorch/pull/85921 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. I think it breaks a dynamo test in trunk https://hud.pytorch.org/pytorch/pytorch/commit/62e4f51efdf98a3a91d29efa55e5665d5398b464 commit b0ceb8ea1c765963a6210d02686dbffd48e96bc8 Author: ssjia Date: Fri Oct 7 11:39:04 2022 -0700 [vulkan] Add buffer to buffer copies (#86424) Differential Revision: [D40112702](https://our.internmc.facebook.com/intern/diff/D40112702/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86424 Approved by: https://github.com/kimishpatel commit 511d81cd2abff1922ea22e9acf4a7b6fb5e84dbd Author: ssjia Date: Fri Oct 7 11:39:03 2022 -0700 [vulkan] Clean up convolution code (#86423) Differential Revision: [D39553863](https://our.internmc.facebook.com/intern/diff/D39553863/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86423 Approved by: https://github.com/kimishpatel commit b645c237bc88c441792d83f19575d0fd3284dcb4 Author: Cody Ohlsen Date: Sat Oct 8 01:25:03 2022 +0000 make g2p ~30% faster on mobile by suppressing a log (#85907) Summary: using the tool from D39559248 i was able to make g2p faster on mobile by taking a look at profiles on stella frames. It turned out that the pytorch interpreter code does some logging that ends up being a pretty big bottleneck. Differential Revision: D39901455 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85907 Approved by: https://github.com/dzdang commit bac26155e7e5bca949e986fa36ab37e042b8ad53 Author: David Berard Date: Sat Oct 8 00:38:11 2022 +0000 [JIT] Allow freezing modules that contain mutable interfaces (#86039) This PR allows freezing modules like the one below: ```python @torch.jit.interface class ModuleInterface(torch.nn.Module): def forward(self, inp: torch.Tensor) -> torch.Tensor: pass class ImplementsInterface(torch.nn.Module): def __init__(self): super(ImplementsInterface, self).__init__() self.sum = torch.zeros((2, 2)) def forward(self, inp: torch.Tensor) -> torch.Tensor: self.sum += inp.relu() # this makes the interface-implementing module mutable return self.sum class WrapperModule(torch.nn.Module): impl: ModuleInterface def __init__(self): super().__init__() self.impl = ImplementsInterface() def forward(self, x: torch.Tensor) -> torch.Tensor: return self.impl.forward(x) ``` Previously during freezing, we handle interfaces as shown below: 1. we inline interfaces in any preserved method graphs 2. during `cleanupFrozenModule`, we try to simplify the module data structure (<- this part is unrelated to freezing so far). 
During this step, if we found that a interface type was mutable, we'd error out; because of the possibility of a module that _swaps out the value of an interface-typed attribute at runtime_. Below is an example of a module that swaps out the value of an interface-typed attribute at runtime: ```python class MyBadModule(torch.nn.Module): impl: MyInterface option1: IfaceImpl1 option2: IfaceImpl2 .... def forward(self, x): if x > 0: self.impl = self.option1 else: self.impl = self.option2 .... ``` ^ this type of situation cannot be supported by freezing (or at least would be difficult to do correctly) because it greatly complicates the details of handling types and simplifying the module data structure. But we can still support the first example without _too_ much work: 1. inline the interface code as before 2. check to see if we have any setattrs on interface types; if so, error out 3. otherwise, replace the type of the interface types with the concrete type implementation 4. continue simplifying the module data structure as if we never had any interfaces. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86039 Approved by: https://github.com/eellison commit 04490e90ea59229355b2771893719fe8896e80f0 Author: Hongxia Yang Date: Sat Oct 8 00:06:05 2022 +0000 better error message fix (#86422) Summary: A user had a problem with fx-scripting and the error message can be improved. Error was shown as: RuntimeError: Keys for dictionaries used as an argument cannot contain a Node. Got key: {k} which is obvious not quite helpful. Test Plan: Test in a notebook: {F778667593} Reviewed By: xunnanxu, SherlockNoMad Differential Revision: D40157518 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86422 Approved by: https://github.com/SherlockNoMad commit 3a02873183e81ed0af76ab46b01c3829b8dc1d35 Author: zaf Date: Fri Oct 7 14:05:13 2022 -0700 [quant][ao_migration] nn.intrinsic.quantized migration to ao (#86172) All quantization-related modules are being migrated to `torch.ao`. This migrates the `nn.intrinsic.quantized`. Please, see the [tracker](https://github.com/pytorch/pytorch/issues/81667) for the timeline. ``` python test/test_quantization.py -- TestAOMigrationNNIntrinsic ``` Internal: ``` buck2 test @mode/dev-nosan //caffe2/test:quantization -- TestAOMigrationNNIntrinsic ``` Differential Revision: [D39425515](https://our.internmc.facebook.com/intern/diff/D39425515/) Differential Revision: [D39425515](https://our.internmc.facebook.com/intern/diff/D39425515) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86172 Approved by: https://github.com/jerryzh168 commit 91b1bae1df1e72e17d2ab296845c214bc39422a0 Author: Zachary DeVito Date: Fri Oct 7 13:21:48 2022 -0700 Caching allocator tracing (#86241) We currently can take snapshots of the state of the allocated cuda memory, but we do not have a way to correlate these snapshots with the actions the allocator that were taken between snapshots. This PR adds a simple fixed-sized buffer that records the major actions that the allocator takes (ALLOC, FREE, SEGMENT_ALLOC, SEGMENT_FREE, OOM, SNAPSHOT) and includes these with the snapshot information. Capturing period snapshots with a big enough trace buffer makes it possible to see how the allocator state changes over time. We plan to use this functionality to guide how settings in the allocator can be adjusted and eventually have a more robust overall algorithm. 
As a component of this functionality, we also add the ability to get a callback when the allocator will throw an OOM, primarily so that snapshots can be taken immediately to see why the program ran out of memory (most programs have some C++ state that would free tensors before the OutOfMemory exception can be caught). This PR also updates the _memory_viz.py script to pretty-print the trace information and provide a better textual summary of snapshots distinguishing between internal and external fragmentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86241 Approved by: https://github.com/ngimel commit 8a3a54e012488ddb4e372f559bbeed2b41e7eb1c Author: Sherlock Huang Date: Fri Oct 7 18:07:54 2022 +0000 Fix index_select decomp (#86469) For decomposing index_select with 0-dim tensor, we cannot write `x.unsqueeze(0)[index].squeeze(0).clone()` , as tensor[index] will trigger index.item() if index is a 0-dim tensor, and .item() cannot be symbolically traced with FakeTensor. We use `torch.ops.aten.index(x.unsqueeze(0), [index]).squeeze(0).clone()` as a workaround. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86469 Approved by: https://github.com/ngimel commit a079dad7cfdcc6982ec704a924b1432ff01b3a09 Author: albanD Date: Fri Oct 7 22:47:46 2022 +0000 Skip dynamo for all optim test as they are all flaky otherwise (#86482) Fixes https://github.com/pytorch/pytorch/issues/86433 Fixes https://github.com/pytorch/pytorch/issues/86435 Fixes https://github.com/pytorch/pytorch/issues/86432 Fixes https://github.com/pytorch/pytorch/issues/86389 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86482 Approved by: https://github.com/ezyang commit ba3fde6aa08c97be2616bcc9f372781166ed7342 Author: soulitzer Date: Fri Oct 7 14:29:33 2022 -0400 Add multi-grad hooks (#86260) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86260 Approved by: https://github.com/albanD commit 97e56c176d6091a91ac2afe284fc8cb406780ddd Author: albanD Date: Fri Oct 7 21:09:37 2022 +0000 Try to fix shutdown test in edge cases (#86464) Fixes https://github.com/pytorch/pytorch/issues/85259 See the issue for debugging details. tl;dr: when a worker thread is actually used, make sure it is initialized before exiting. Yes, it is very unlikely it will take >10s to initialize but it is what seems to happen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86464 Approved by: https://github.com/soulitzer, https://github.com/ezyang commit 62e4f51efdf98a3a91d29efa55e5665d5398b464 Author: Elias Ellison Date: Fri Oct 7 18:01:13 2022 +0000 Enable aten-aten decomps (#85921) Invokes aten-aten decomps with re-entrant FakeMode. These decomps are being used in other places, so it's good to unify the path static fake tensor takes / get additional testing etc. There is also an instance where we return different devices with cpu/cuda which this fixes ([batch_norm](https://github.com/pytorch/pytorch/blob/master/torch/_decomp/decompositions.py#L1374)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85921 Approved by: https://github.com/ezyang commit a95889ba7c1ecd8cb0f90507a6152cb035bcefd1 Author: Andrew Gu Date: Fri Oct 7 13:17:17 2022 +0000 [FSDP] Add initial `summon_full_params(with_grads=True)` (#85738) This adds `summon_full_params(with_grads=True)` for `use_orig_params=True` and `offload_to_cpu=False`. Filling in the `use_orig_params=False` case requires some already-planned refactoring, and the `offload_to_cpu=True` case needs some additional work as well. 
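A rough usage sketch for the supported case (illustrative only; it assumes a model already wrapped with `use_orig_params=True` and no CPU offloading, per the limitations above):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def dump_unsharded_grads(fsdp_model: FSDP) -> None:
    # Gather the full (unsharded) parameters together with their gradients so
    # gradient values can be inspected after a backward pass.
    with FSDP.summon_full_params(fsdp_model, with_grads=True):
        for name, param in fsdp_model.named_parameters():
            if param.grad is not None:
                print(name, param.grad.norm().item())
```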
Adding this is helpful for debugging `use_orig_params=True` to make sure gradients are being updated correctly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85738 Approved by: https://github.com/rohan-varma commit 82229d1e33dc8b34d0c6b35aaa23e51644cd4c74 Author: kshitij12345 Date: Fri Oct 7 19:24:59 2022 +0000 [optim] fix: empty grad support for SparseAdam (#86459) Fixes #82486 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86459 Approved by: https://github.com/albanD commit 66d480d314236a8cd8df4a28ed8867d48b6fa448 Author: PyTorch MergeBot Date: Fri Oct 7 18:55:01 2022 +0000 Revert "Disable mac m1 jobs (#86463)" This reverts commit ac632b437489b4c0c2714d5ad37517bb60e09750. Reverted https://github.com/pytorch/pytorch/pull/86463 on behalf of https://github.com/huydhn due to Queue is decreasing, re-enable the jobs commit ac632b437489b4c0c2714d5ad37517bb60e09750 Author: Huy Do Date: Fri Oct 7 18:28:47 2022 +0000 Disable mac m1 jobs (#86463) There is a queue and some runners are not accessible. This is to mitigate the Sev https://github.com/pytorch/pytorch/issues/86466 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86463 Approved by: https://github.com/clee2000 commit ac74976a566ff83f64017776351a9b3ce4402896 Author: HDCharles Date: Thu Oct 6 14:48:38 2022 -0700 [ao] fixing public v private for fuser_method_mappings.py (#86029) Summary: no significant changes, just added __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86029 Approved by: https://github.com/jerryzh168 commit be682befbc836a07d5d070bb569450429526a64b Author: Andrew Gu Date: Fri Oct 7 13:17:16 2022 +0000 [FSDP] Add `use_orig_params` (#84911) **Overview** This PR adds the option to use the original parameters via `use_orig_params=True` in the FSDP constructor. - This exposes the original parameters rather than the `FlatParameter`s from `named_parameters()`, which means that the optimizer runs on the original parameters. Hence, users may assign original parameters from the same `FlatParameter` to different parameter groups. - This enables decoupling the original parameter variables from their storage without changing the variables themselves, which is critical for our upcoming execution-order-based non-recursive wrapping policy. For more detailed design explanation, refer to the Quip shared internally. **Follow-Ups** See 85831 (removing link to avoid spamming the issue whenever I update this PR). `test_fsdp_use_orig_params.py` adds ~4 min 46 seconds to the TTS on the AWS cluster. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84911 Approved by: https://github.com/rohan-varma commit b43ae1c4116487ea6a195d533b5d5622075dec9d Author: Chengqi Deng Date: Fri Oct 7 17:59:26 2022 +0000 Add reference counter in FileStore (#85601) Fixes #67566. This diff added a reference counter in the FileStore object. The underlying file would be removed only if the reference counter became 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85601 Approved by: https://github.com/H-Huang commit efccb6401c6451389c9005a43c29fd055fb89452 Author: zaf Date: Thu Oct 6 13:33:20 2022 -0700 [quant][ao_migration] nn.intrinsic.qat migration to ao (#86171) All quantization-related modules are being migrated to `torch.ao`. This migrates the `nn.intrinsic.qat`. Please, see the [tracker](https://github.com/pytorch/pytorch/issues/81667) for the timeline. 
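For downstream code the migration is essentially an import-path change; a hedged sketch (the class is just one example from this namespace, and the old path is expected to keep working as an alias during the transition):

```python
# Pre-migration location (kept for backwards compatibility during the transition):
from torch.nn.intrinsic.qat import ConvBnReLU2d as _LegacyConvBnReLU2d
# Post-migration location under torch.ao:
from torch.ao.nn.intrinsic.qat import ConvBnReLU2d
```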
``` python test/test_quantization.py TestAOMigrationNNIntrinsic ``` Differential Revision: [D39419993](https://our.internmc.facebook.com/intern/diff/D39419993/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39419993/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/86171 Approved by: https://github.com/jerryzh168 commit e61028813007518bd6be0e6482a8742b84c30da7 Author: Yanbo Liang Date: Fri Oct 7 17:16:40 2022 +0000 Re-enable torchdynamo tests (#86297) We temporarily skipped torchdynamo tests due to many failures, now we fix the problems and re-enable tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86297 Approved by: https://github.com/anijain2305 commit e8d3b7201c9f8223380e2eb66e2213ae3be08869 Author: HDCharles Date: Thu Oct 6 14:48:37 2022 -0700 [ao] fixing public v private for fuse_modules.py (#86028) Summary: no significant changes, just added __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86028 Approved by: https://github.com/jerryzh168 commit d29912cc06241fb8e2ad11629271a92780a759c2 Author: HDCharles Date: Thu Oct 6 14:48:36 2022 -0700 [ao] fixing public v private for torch/ao/quantization (#86027) Summary: no significant changes, just needed to add __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86027 Approved by: https://github.com/jerryzh168 commit 65b408074f4ecc99faf5720ea5b3570a483ec9f4 Author: PyTorch MergeBot Date: Fri Oct 7 16:29:27 2022 +0000 Revert "Relandx3 "SymIntify cat and narrow" (#86289)" This reverts commit a00f8489df5586178d7b5f83928bf8049ce32f24. Reverted https://github.com/pytorch/pytorch/pull/86289 on behalf of https://github.com/malfet due to @seemether unlanded the rest of the stack and it will fail intern import anyway commit 5b69b87d5abbb272fb48be5a5a4dc17f8399c124 Author: PyTorch MergeBot Date: Fri Oct 7 16:10:30 2022 +0000 Revert "Symintify getitem and add the required helper functions (#86207)" This reverts commit fd5085c445c3f1a4c90e55154cf26fe30f52a0ab. Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/seemethere due to Fails internal tests, see: https://www.internalfb.com/intern/sandcastle/job/22517998926071860/insights commit 75df4b5e3daa2a177f35bd0e43629c814238b639 Author: PyTorch MergeBot Date: Fri Oct 7 16:03:30 2022 +0000 Revert "Merge more symbolic meta kernels and symint changes from branch (#86334)" This reverts commit 08e3999fa494238f8f62346a140da36bd43864e7. Reverted https://github.com/pytorch/pytorch/pull/86334 on behalf of https://github.com/seemethere due to Trying to revert https://github.com/pytorch/pytorch/pull/86207, this PR causes merge conflicts with the initial revert so will have to revert this as well commit b3fdb02fb25508d9c61d70b594f8a7fac3b2a365 Author: Check Deng Date: Fri Oct 7 15:55:55 2022 +0000 Fix memory leak in _LRScheduler.step() (#85602) Fixes #85410 This diff removed the cyclic references in `_LRScheduler.step()`. 
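For background, the general shape of the problem and the fix (a simplified sketch of the pattern only, not the code in this diff): storing a strong reference to a bound method ties the two objects' lifetimes together, while a `weakref` breaks the resulting cycle.

```python
import weakref

class Optimizer:
    def step(self):
        ...

class Scheduler:
    def __init__(self, optimizer):
        # A strong reference such as `self._opt_step = optimizer.step` keeps the
        # optimizer alive through the scheduler; combined with a back-reference
        # from the optimizer side this forms a cycle. A weak reference does not.
        self._opt_ref = weakref.ref(optimizer)

    def step(self):
        opt = self._opt_ref()
        if opt is not None:
            opt.step()
```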
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602 Approved by: https://github.com/albanD commit 0e639ff45c616946fb3e5e3f06b9486d88ce86ca Author: PyTorch MergeBot Date: Fri Oct 7 14:55:44 2022 +0000 Revert "Cleanup PT-D imports (#85781)" This reverts commit 9a170b24f64d7cfdd887ff122c241ac6ff85f4c6. Reverted https://github.com/pytorch/pytorch/pull/85781 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 9b2ea41f481bac297c8e1e88c431c03127a35759 Author: Nikita Vedeneev Date: Fri Oct 7 14:50:48 2022 +0000 COO intersection primitives : fusing value selection with value intersection. (#86269) As per title. This one fuses 3 kernels into 1 with about 20-10% performance improvement. This kernel is also useful for union-like operations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86269 Approved by: https://github.com/amjames, https://github.com/cpuhrsch commit e125baf90b53a97992ef392a06d6321618b14113 Author: Richard Zou Date: Thu Oct 6 12:45:23 2022 -0700 [autocast] Clean up registrations using new macros (#86403) This PR cleans up m.impl(...) calls to use the new KERNEL / KERNEL_CPU macros. That saves us the trouble of writing out the signatures. Test Plan: - code reading - wait for tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86403 Approved by: https://github.com/ezyang commit 9b74267eb6de5076e7eb2e92bc34eef771384c1e Author: Richard Zou Date: Thu Oct 6 12:45:19 2022 -0700 [autocast] Make it easier to register rules (#86402) On the way to resolving https://github.com/pytorch/pytorch/issues/86294 Previously, there were three macros used to register autocast rules: - KERNEL - KERNEL_DIFFERENT_REDISPATCH_SIGNATURE - KERNEL_CPU This PR makes the KERNEL and KERNEL_CPU macros less redundant for users. KERNEL_DIFFERENT_REDISPATCH_SIGNATURE is weird and only used three times, so I didn't change them. Concretely, KERNEL(OP, OP_NAME, SIGNATURE, POLICY) is redundant: - op/op_name are similar, and the signature can be decltype'd. PR changes it so that instead, one uses either: - KERNEL(OP, POLICY) - KERNEL2(OP, OVERLOAD, POLICY) depending on whether the operator name has an overload. This PR also gives the same treatment to the KERNEL_CPU macro, which is used for registering autocast cpu rules: it splits KERNEL_CPU into KERNEL_CPU(OP, POLICY) AND KERNEL_CPU2(OP, OVERLOAD, POLICY). I will do some more cleanup of things that are implemented via `m.impl(...)` in a follow-up PR so that I don't get confused when I need to rebase. Test Plan: - wait for tests (how good are our autocast tests?) - code reading Pull Request resolved: https://github.com/pytorch/pytorch/pull/86402 Approved by: https://github.com/ezyang commit 55f5e0de8dbd15d3732017796bddfa10fc76d033 Author: Supraj Bachawala Date: Fri Oct 7 14:13:15 2022 +0000 remove unused arg from `impl_func_cum_ops` (#86364) Fixes #86224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86364 Approved by: https://github.com/bdhirsh commit a00f8489df5586178d7b5f83928bf8049ce32f24 Author: Edward Z. Yang Date: Wed Oct 5 11:32:48 2022 -0700 Relandx3 "SymIntify cat and narrow" (#86289) This reverts commit fc94a2115b31dfe7a0d8f28eb4f5ed532c4f0792. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86289 Approved by: https://github.com/wconstab commit cc9183eb4c05f2dbb002279698cc21c4781e9492 Author: Howard Huang Date: Fri Oct 7 12:59:09 2022 +0000 Update distributed.rst backend collective support chart (#86406) NCCL `scatter` was added by Wanchao in https://github.com/pytorch/pytorch/pull/70029 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86406 Approved by: https://github.com/wanchaol commit b74ca31bf6d3f1d16849a1e893164450a917e447 Author: kshitij12345 Date: Fri Oct 7 12:12:03 2022 +0000 [fix] sum_to_size: MathBits test - don't reuse same input tensor (#86378) Fixes #85409 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86378 Approved by: https://github.com/anjali411 commit facbddb9ff494f0c0c9a06ea823bc7cd3f203352 Author: Salil Desai Date: Fri Oct 7 11:58:41 2022 +0000 Override Quantized Backend to use Fbgemm in Qlinear Packed Params Test (#86236) Summary: After D39934051, we must explicitly ```override_quantized_engine('fbgemm')``` for this test to work Test Plan: ``` buck test //caffe2/test:ao -- TestQlinearPackedParams ``` Before: ``` Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499663624574 ✓ ListingSuccess: caffe2/test:ao : 72 tests discovered (32.830) ✓ Pass: caffe2/test:ao - test_qlinear_packed_params_qnnpack (ao.sparsity.test_qlinear_packed_params.TestQlinearPackedParams) (25.085) ✗ Fail: caffe2/test:ao - test_qlinear_packed_params (ao.sparsity.test_qlinear_packed_params.TestQlinearPackedParams) (26.706) Test output: > RuntimeError: Didn't find engine for operation ao::sparse::qlinear_prepack X86 ``` After: ``` Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7599824485968786 ✓ ListingSuccess: caffe2/test:ao : 72 tests discovered (31.082) ✓ Pass: caffe2/test:ao - test_qlinear_packed_params_fbgemm (ao.sparsity.test_qlinear_packed_params.TestQlinearPackedParams) (100.409) ✓ Pass: caffe2/test:ao - test_qlinear_packed_params_qnnpack (ao.sparsity.test_qlinear_packed_params.TestQlinearPackedParams) (100.544) Summary Pass: 2 ListingSuccess: 1 ``` Differential Revision: D40078176 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86236 Approved by: https://github.com/jmdetloff, https://github.com/z-a-f commit dbea07b6aa208f4dfdc8c0876fc2469bffa74fbe Author: Seonglyong Gong Date: Fri Oct 7 09:58:50 2022 +0000 [Profiler] record gradient from nnModule (#86355) Summary: - catch .grad tensor info - update data type and `check_and_store`, etc - update unit test case Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39711295 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86355 Approved by: https://github.com/chaekit commit 28a0b3fb18da6ff96b6d4edb252b15e7f3e331a9 Author: lezcano Date: Thu Oct 6 22:50:05 2022 +0000 Fix col2im and im2col decompositions (#86426) I threw in some tests for good measure. 
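As background for readers unfamiliar with these ops (illustrative only, using the public `unfold`/`fold` wrappers rather than the decompositions touched here): `im2col` extracts sliding patches into columns, and `col2im` scatters them back, summing overlaps.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
cols = F.unfold(x, kernel_size=3)                    # im2col: (1, 3*3*3, 36)
y = F.fold(cols, output_size=(8, 8), kernel_size=3)  # col2im: overlaps are summed
# Dividing by the per-pixel patch multiplicity recovers the original tensor.
counts = F.fold(F.unfold(torch.ones_like(x), kernel_size=3), output_size=(8, 8), kernel_size=3)
assert torch.allclose(y / counts, x)
```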
Fixes https://github.com/pytorch/pytorch/issues/86332 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86426 Approved by: https://github.com/ngimel commit 93b2d991581db86074dd8011fdc903bd554466b1 Author: Nicolas Hug Date: Thu Oct 6 11:32:29 2022 +0000 Improve interpolate() speed for channels_last images and masks (#86361) This PR improves the speed of `interpolate()`: - on images and masks (`num_channels < 4`, `channels_last=True`) - for the following modes: linear (antialias=False), nearest (int and float), and nearest-exact (int and float) - for both upsampling and downsampling The actual speed-up ranges from 1.1X to 110X, but this depends on various factors like number of threads and of course input_size/output_size. In a typical torchvision ImageNet training job (where num_threads=1 because of DataLoader multi-processing), the following speed-ups should be expected (I ran much more benchmarks than this one, see below for more details): ``` (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=1 1.0X 1.0ms vs 1.0ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 2.1X 1.0ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=1 7X 0.8ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=1 14X 0.852ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=1 9X 0.828ms vs 0.087ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 15X 0.922ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.897ms vs 0.087ms ``` An immediate follow-up to this PR would be to do the same changes for the 3D kernels. Thanks a ton @fmassa for the help! Results:
``` ---------------------------------------------------------------------------------------------------- (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=1 0.9X 0.9ms vs 1.1ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=1 1.6X 0.9ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=1 1.7X 1.0ms vs 0.5ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=1 8X 0.806ms vs 0.097ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=1 15X 0.848ms vs 0.056ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=1 10X 0.828ms vs 0.084ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=1 16X 0.914ms vs 0.057ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.900ms vs 0.086ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=2 1.6X 1.1ms vs 0.7ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=2 1.6X 0.6ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=2 1.7X 0.6ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=2 1.7X 0.5ms vs 0.3ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=2 9X 0.800ms vs 0.088ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=2 11X 0.459ms vs 0.043ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=2 7X 0.424ms vs 0.064ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=2 12X 0.503ms vs 0.043ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=2 8X 0.461ms vs 0.059ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=12 3X 1.1ms vs 0.3ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=12 1.6X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=12 1.5X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=12 5X 0.8ms vs 0.2ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=12 10X 0.445ms vs 0.047ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=12 7X 0.432ms vs 0.062ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=12 10X 0.478ms vs 0.046ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=12 7X 0.470ms vs 0.063ms (1, 3, 64, 64) -> (224, 224) linear float32 num_threads=32 3X 1.1ms vs 0.4ms (1, 3, 64, 64) -> (224, 224) nearest float32 num_threads=32 1.8X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 64, 64) -> (224, 224) nearest-exact float32 num_threads=32 1.4X 0.3ms vs 0.2ms (1, 3, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 1, 64, 64) -> (224, 224) linear float32 num_threads=32 11X 0.815ms vs 0.074ms (1, 1, 64, 64) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.045ms (1, 1, 64, 64) -> (224, 224) nearest uint8 num_threads=32 7X 0.436ms vs 0.061ms (1, 1, 64, 64) -> (224, 224) nearest-exact float32 num_threads=32 10X 0.478ms vs 0.046ms (1, 1, 64, 64) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.061ms ---------------------------------------------------------------------------------------------------- (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=1 0.9X 
0.9ms vs 1.1ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=1 1.5X 0.9ms vs 0.6ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=1 1.6X 1.0ms vs 0.6ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=1 8X 0.808ms vs 0.099ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=1 15X 0.848ms vs 0.058ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=1 9X 0.820ms vs 0.087ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=1 16X 0.909ms vs 0.059ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.898ms vs 0.088ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=2 1.4X 0.9ms vs 0.7ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=2 1.5X 0.5ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=2 1.5X 0.5ms vs 0.4ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=2 1.8X 0.5ms vs 0.3ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=2 9X 0.799ms vs 0.090ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=2 10X 0.459ms vs 0.045ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=2 7X 0.427ms vs 0.059ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=2 11X 0.501ms vs 0.044ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=2 8X 0.460ms vs 0.060ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=12 2.9X 1.0ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=12 1.2X 0.2ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=12 1.1X 0.2ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=12 12X 0.809ms vs 0.068ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=12 11X 0.438ms vs 0.041ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=12 8X 0.432ms vs 0.055ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=12 12X 0.480ms vs 0.041ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=12 8X 0.464ms vs 0.056ms (1, 3, 128, 128) -> (224, 224) linear float32 num_threads=32 3X 1.1ms vs 0.3ms (1, 3, 128, 128) -> (224, 224) nearest float32 num_threads=32 1.3X 0.3ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 128, 128) -> (224, 224) nearest-exact float32 num_threads=32 1.4X 0.3ms vs 0.2ms (1, 3, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 128, 128) -> (224, 224) linear float32 num_threads=32 11X 0.813ms vs 0.075ms (1, 1, 128, 128) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.046ms (1, 1, 128, 128) -> (224, 224) nearest uint8 num_threads=32 7X 0.433ms vs 0.061ms (1, 1, 128, 128) -> (224, 224) nearest-exact float32 num_threads=32 10X 0.478ms vs 0.046ms (1, 1, 128, 128) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.062ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=1 0.9X 4.5ms vs 5.2ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=1 1.5X 4.2ms 
vs 2.8ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=1 1.8X 4.1ms vs 2.3ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=1 1.6X 4.5ms vs 2.8ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=1 1.9X 4.4ms vs 2.3ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=1 9X 3.8ms vs 0.4ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=1 17X 4.0ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=1 11X 3.9ms vs 0.4ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=1 19X 4.4ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=1 12X 4.3ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=2 1.5X 4.5ms vs 3.1ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=2 1.4X 2.3ms vs 1.6ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=2 1.7X 2.1ms vs 1.2ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=2 1.6X 2.5ms vs 1.6ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=2 1.8X 2.2ms vs 1.2ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=2 15X 3.8ms vs 0.3ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=2 15X 2.2ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=2 7X 2.0ms vs 0.3ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=2 16X 2.4ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=2 8X 2.2ms vs 0.3ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=12 8X 5.2ms vs 0.7ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=12 1.3X 0.6ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=12 1.7X 0.4ms vs 0.2ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=12 1.4X 0.6ms vs 0.4ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=12 1.8X 0.4ms vs 0.2ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=12 36X 3.9ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=12 10X 0.526ms vs 0.051ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=12 7X 0.514ms vs 0.069ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=12 11X 0.569ms vs 0.052ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=12 8X 0.557ms vs 0.070ms (1, 3, 224, 224) -> (600, 400) linear float32 num_threads=32 9X 4.5ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest float32 num_threads=32 0.5X 0.2ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (600, 400) nearest-exact float32 num_threads=32 1.0X 0.5ms vs 0.5ms (1, 3, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (600, 400) linear float32 num_threads=32 44X 3.864ms vs 0.087ms (1, 1, 224, 224) -> (600, 400) nearest float32 num_threads=32 10X 0.527ms vs 0.053ms (1, 1, 224, 224) -> (600, 400) nearest uint8 num_threads=32 7X 0.516ms vs 0.070ms (1, 1, 224, 224) -> (600, 400) nearest-exact float32 num_threads=32 10X 0.567ms vs 0.055ms (1, 1, 224, 224) -> (600, 400) nearest-exact uint8 num_threads=32 8X 0.558ms vs 0.072ms ---------------------------------------------------------------------------------------------------- (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=1 1.0X 1.9ms vs 1.9ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=1 2.0X 1.8ms vs 0.9ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=1 1.7X 1.8ms vs 1.0ms (1, 3, 256, 256) -> (320, 320) nearest-exact 
float32 num_threads=1 2.1X 1.9ms vs 0.9ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=1 1.9X 1.9ms vs 1.0ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=1 9X 1.6ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=1 16X 1.7ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=1 10X 1.7ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=1 17X 1.9ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=1 11X 1.8ms vs 0.2ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=2 1.7X 1.9ms vs 1.1ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=2 2.0X 1.0ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=2 1.7X 0.9ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=2 2.3X 1.1ms vs 0.5ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=2 1.8X 1.0ms vs 0.5ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=2 8X 1.6ms vs 0.2ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=2 14X 0.931ms vs 0.067ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=2 7X 0.9ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=2 15X 1.016ms vs 0.069ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=2 9X 0.9ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=12 8X 1.9ms vs 0.3ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=12 1.7X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=12 1.9X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=12 20X 1.630ms vs 0.081ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=12 10X 0.457ms vs 0.044ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=12 7X 0.439ms vs 0.060ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=12 11X 0.485ms vs 0.045ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=12 8X 0.474ms vs 0.061ms (1, 3, 256, 256) -> (320, 320) linear float32 num_threads=32 8X 1.9ms vs 0.3ms (1, 3, 256, 256) -> (320, 320) nearest float32 num_threads=32 2.0X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 3, 256, 256) -> (320, 320) nearest-exact float32 num_threads=32 1.4X 0.2ms vs 0.2ms (1, 3, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 1, 256, 256) -> (320, 320) linear float32 num_threads=32 21X 1.628ms vs 0.078ms (1, 1, 256, 256) -> (320, 320) nearest float32 num_threads=32 9X 0.453ms vs 0.048ms (1, 1, 256, 256) -> (320, 320) nearest uint8 num_threads=32 7X 0.445ms vs 0.063ms (1, 1, 256, 256) -> (320, 320) nearest-exact float32 num_threads=32 11X 0.535ms vs 0.048ms (1, 1, 256, 256) -> (320, 320) nearest-exact uint8 num_threads=32 8X 0.502ms vs 0.063ms ---------------------------------------------------------------------------------------------------- (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=1 1.0X 13.8ms vs 14.0ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=1 1.8X 13.1ms vs 7.4ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=1 1.8X 11.1ms vs 6.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=1 1.9X 13.9ms vs 7.4ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=1 1.9X 
11.8ms vs 6.1ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=1 10X 10.2ms vs 1.1ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=1 19X 10.8ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=1 11X 10.4ms vs 0.9ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=1 20X 11.6ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=1 12X 11.4ms vs 0.9ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=2 1.8X 13.7ms vs 7.7ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=2 2.6X 7.3ms vs 2.8ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=2 1.8X 5.6ms vs 3.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=2 1.9X 7.9ms vs 4.1ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=2 1.9X 6.0ms vs 3.1ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=2 18X 10.1ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=2 19X 5.8ms vs 0.3ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=2 10X 5.3ms vs 0.5ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=2 20X 6.3ms vs 0.3ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=2 11X 5.7ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=12 8X 13.8ms vs 1.6ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=12 2.9X 1.5ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=12 1.7X 1.0ms vs 0.5ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=12 1.5X 1.5ms vs 1.0ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=12 1.8X 1.0ms vs 0.6ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=12 80X 10.1ms vs 0.1ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=12 13X 0.928ms vs 0.072ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=12 8X 0.9ms vs 0.1ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=12 13X 1.001ms vs 0.074ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=12 9X 1.0ms vs 0.1ms (1, 3, 500, 500) -> (800, 800) linear float32 num_threads=32 18X 14.0ms vs 0.8ms (1, 3, 500, 500) -> (800, 800) nearest float32 num_threads=32 1.9X 1.0ms vs 0.6ms (1, 3, 500, 500) -> (800, 800) nearest uint8 num_threads=32 2.9X 0.7ms vs 0.2ms (1, 3, 500, 500) -> (800, 800) nearest-exact float32 num_threads=32 1.7X 0.9ms vs 0.6ms (1, 3, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=32 1.8X 0.4ms vs 0.2ms (1, 1, 500, 500) -> (800, 800) linear float32 num_threads=32 111X 10.254ms vs 0.092ms (1, 1, 500, 500) -> (800, 800) nearest float32 num_threads=32 14X 0.784ms vs 0.056ms (1, 1, 500, 500) -> (800, 800) nearest uint8 num_threads=32 7X 0.551ms vs 0.075ms (1, 1, 500, 500) -> (800, 800) nearest-exact float32 num_threads=32 11X 0.607ms vs 0.057ms (1, 1, 500, 500) -> (800, 800) nearest-exact uint8 num_threads=32 8X 0.596ms vs 0.076ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=1 1.0X 0.084ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=1 1.0X 0.077ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=1 1.0X 0.076ms vs 0.076ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=1 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=1 1.0X 0.081ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=1 1.0X 0.071ms vs 0.071ms (1, 1, 224, 224) 
-> (64, 64) nearest float32 num_threads=1 1.0X 0.074ms vs 0.074ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=1 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=1 1.0X 0.080ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=1 0.9X 0.078ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=2 1.0X 0.083ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=2 1.0X 0.076ms vs 0.077ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=2 1.0X 0.075ms vs 0.074ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=2 1.0X 0.082ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=2 1.0X 0.080ms vs 0.083ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=2 1.0X 0.070ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=2 1.0X 0.073ms vs 0.075ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=2 1.0X 0.071ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=2 1.0X 0.079ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=2 1.0X 0.077ms vs 0.079ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=12 1.0X 0.083ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=12 1.0X 0.080ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=12 1.0X 0.077ms vs 0.075ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=12 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=12 1.0X 0.083ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=12 1.0X 0.071ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=12 1.0X 0.076ms vs 0.074ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=12 1.0X 0.073ms vs 0.071ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=12 1.0X 0.080ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=12 1.0X 0.080ms vs 0.078ms (1, 3, 224, 224) -> (64, 64) linear float32 num_threads=32 1.0X 0.084ms vs 0.084ms (1, 3, 224, 224) -> (64, 64) nearest float32 num_threads=32 1.0X 0.078ms vs 0.077ms (1, 3, 224, 224) -> (64, 64) nearest uint8 num_threads=32 1.0X 0.076ms vs 0.076ms (1, 3, 224, 224) -> (64, 64) nearest-exact float32 num_threads=32 1.0X 0.083ms vs 0.083ms (1, 3, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=32 1.0X 0.081ms vs 0.082ms (1, 1, 224, 224) -> (64, 64) linear float32 num_threads=32 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest float32 num_threads=32 1.0X 0.074ms vs 0.075ms (1, 1, 224, 224) -> (64, 64) nearest uint8 num_threads=32 1.0X 0.072ms vs 0.072ms (1, 1, 224, 224) -> (64, 64) nearest-exact float32 num_threads=32 1.0X 0.077ms vs 0.080ms (1, 1, 224, 224) -> (64, 64) nearest-exact uint8 num_threads=32 1.0X 0.076ms vs 0.079ms ---------------------------------------------------------------------------------------------------- (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=1 1.0X 0.3ms vs 0.3ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=1 1.8X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=1 1.6X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=1 2.0X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=1 1.7X 0.3ms vs 0.2ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=1 6X 0.265ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=1 10X 0.280ms vs 0.028ms 
(1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=1 7X 0.273ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=1 11X 0.303ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=1 8X 0.297ms vs 0.038ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=2 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=2 1.8X 0.163ms vs 0.093ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=2 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=2 1.9X 0.180ms vs 0.096ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=2 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=2 6X 0.264ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=2 10X 0.278ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=2 7X 0.270ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=2 11X 0.298ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=2 8X 0.293ms vs 0.037ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=12 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=12 1.7X 0.158ms vs 0.095ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=12 1.7X 0.170ms vs 0.100ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=12 6X 0.269ms vs 0.043ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=12 11X 0.291ms vs 0.027ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=12 8X 0.281ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=12 11X 0.305ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=12 8X 0.306ms vs 0.038ms (1, 3, 224, 224) -> (128, 128) linear float32 num_threads=32 1.5X 0.3ms vs 0.2ms (1, 3, 224, 224) -> (128, 128) nearest float32 num_threads=32 1.6X 0.160ms vs 0.098ms (1, 3, 224, 224) -> (128, 128) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 224, 224) -> (128, 128) nearest-exact float32 num_threads=32 1.7X 0.171ms vs 0.099ms (1, 3, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 224, 224) -> (128, 128) linear float32 num_threads=32 6X 0.269ms vs 0.044ms (1, 1, 224, 224) -> (128, 128) nearest float32 num_threads=32 10X 0.282ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest uint8 num_threads=32 7X 0.276ms vs 0.037ms (1, 1, 224, 224) -> (128, 128) nearest-exact float32 num_threads=32 11X 0.305ms vs 0.028ms (1, 1, 224, 224) -> (128, 128) nearest-exact uint8 num_threads=32 8X 0.299ms vs 0.038ms ---------------------------------------------------------------------------------------------------- (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=1 1.0X 1.2ms vs 1.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=1 2.0X 1.2ms vs 0.6ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=1 1.7X 1.1ms vs 0.7ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=1 2.1X 1.2ms vs 0.6ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=1 1.9X 1.2ms vs 0.7ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=1 8X 1.1ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=1 15X 1.109ms vs 0.073ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=1 10X 1.1ms 
vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=1 16X 1.192ms vs 0.074ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=1 11X 1.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=2 1.7X 1.2ms vs 0.7ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=2 2.0X 0.6ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=2 1.7X 0.6ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=2 2.2X 0.7ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=2 1.8X 0.6ms vs 0.3ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=2 9X 1.0ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=2 11X 0.598ms vs 0.052ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=2 8X 0.556ms vs 0.072ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=2 12X 0.649ms vs 0.053ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=2 8X 0.598ms vs 0.073ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=12 5X 1.2ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=12 1.3X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=12 1.4X 0.2ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=12 9X 1.0ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=12 12X 0.572ms vs 0.048ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=12 8X 0.560ms vs 0.068ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=12 13X 0.617ms vs 0.049ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=12 9X 0.604ms vs 0.068ms (1, 3, 320, 320) -> (256, 256) linear float32 num_threads=32 5X 1.2ms vs 0.3ms (1, 3, 320, 320) -> (256, 256) nearest float32 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact float32 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 3, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=32 1.4X 0.2ms vs 0.1ms (1, 1, 320, 320) -> (256, 256) linear float32 num_threads=32 13X 1.042ms vs 0.081ms (1, 1, 320, 320) -> (256, 256) nearest float32 num_threads=32 12X 0.586ms vs 0.050ms (1, 1, 320, 320) -> (256, 256) nearest uint8 num_threads=32 8X 0.562ms vs 0.069ms (1, 1, 320, 320) -> (256, 256) nearest-exact float32 num_threads=32 12X 0.621ms vs 0.051ms (1, 1, 320, 320) -> (256, 256) nearest-exact uint8 num_threads=32 9X 0.609ms vs 0.070ms ---------------------------------------------------------------------------------------------------- (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=1 1.0X 1.0ms vs 1.0ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=1 1.9X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=1 1.7X 0.9ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 2.1X 1.0ms vs 0.5ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 1.8X 0.9ms vs 0.5ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=1 7X 0.8ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=1 14X 0.852ms vs 0.061ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=1 9X 0.828ms vs 0.087ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=1 15X 0.922ms vs 0.061ms (1, 1, 
600, 400) -> (224, 224) nearest-exact uint8 num_threads=1 10X 0.897ms vs 0.087ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=2 1.6X 0.9ms vs 0.6ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=2 1.9X 0.5ms vs 0.2ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=2 1.7X 0.4ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=2 2.1X 0.5ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=2 1.8X 0.5ms vs 0.3ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=2 10X 0.808ms vs 0.084ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=2 10X 0.462ms vs 0.046ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=2 7X 0.429ms vs 0.062ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=2 12X 0.504ms vs 0.044ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=2 7X 0.461ms vs 0.063ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=12 4X 1.0ms vs 0.2ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=12 1.7X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=12 1.5X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=12 1.9X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=12 1.6X 0.2ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=12 12X 0.820ms vs 0.067ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=12 11X 0.438ms vs 0.041ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=12 8X 0.431ms vs 0.056ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=12 12X 0.482ms vs 0.041ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=12 8X 0.467ms vs 0.056ms (1, 3, 600, 400) -> (224, 224) linear float32 num_threads=32 4X 1.0ms vs 0.3ms (1, 3, 600, 400) -> (224, 224) nearest float32 num_threads=32 1.7X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest uint8 num_threads=32 1.5X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact float32 num_threads=32 1.8X 0.2ms vs 0.1ms (1, 3, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 600, 400) -> (224, 224) linear float32 num_threads=32 12X 0.824ms vs 0.070ms (1, 1, 600, 400) -> (224, 224) nearest float32 num_threads=32 10X 0.443ms vs 0.044ms (1, 1, 600, 400) -> (224, 224) nearest uint8 num_threads=32 7X 0.438ms vs 0.059ms (1, 1, 600, 400) -> (224, 224) nearest-exact float32 num_threads=32 11X 0.479ms vs 0.045ms (1, 1, 600, 400) -> (224, 224) nearest-exact uint8 num_threads=32 8X 0.470ms vs 0.059ms ---------------------------------------------------------------------------------------------------- (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=1 1.0X 4.7ms vs 4.7ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=1 2.0X 4.4ms vs 2.2ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=1 1.8X 4.3ms vs 2.5ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=1 2.1X 4.7ms vs 2.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=1 1.9X 4.6ms vs 2.5ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=1 9X 4.0ms vs 0.4ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=1 17X 4.2ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=1 11X 4.1ms vs 0.4ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=1 19X 4.6ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=1 12X 4.5ms vs 0.4ms (1, 3, 800, 800) -> (500, 
500) linear float32 num_threads=2 1.7X 4.7ms vs 2.7ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=2 2.1X 2.4ms vs 1.1ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=2 1.8X 2.2ms vs 1.3ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=2 2.3X 2.6ms vs 1.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=2 1.9X 2.3ms vs 1.3ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=2 15X 4.0ms vs 0.3ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=2 16X 2.3ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=2 9X 2.1ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=2 17X 2.5ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=2 10X 2.3ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=12 10X 4.7ms vs 0.5ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=12 1.9X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=12 1.7X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=12 1.9X 0.4ms vs 0.2ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=12 1.8X 0.4ms vs 0.2ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=12 41X 3.969ms vs 0.096ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=12 11X 0.545ms vs 0.051ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=12 8X 0.532ms vs 0.070ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=12 11X 0.590ms vs 0.052ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=12 8X 0.578ms vs 0.071ms (1, 3, 800, 800) -> (500, 500) linear float32 num_threads=32 17X 4.7ms vs 0.3ms (1, 3, 800, 800) -> (500, 500) nearest float32 num_threads=32 1.8X 0.2ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest uint8 num_threads=32 2.0X 0.3ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact float32 num_threads=32 1.9X 0.2ms vs 0.1ms (1, 3, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=32 1.6X 0.2ms vs 0.1ms (1, 1, 800, 800) -> (500, 500) linear float32 num_threads=32 45X 4.028ms vs 0.090ms (1, 1, 800, 800) -> (500, 500) nearest float32 num_threads=32 10X 0.549ms vs 0.053ms (1, 1, 800, 800) -> (500, 500) nearest uint8 num_threads=32 7X 0.536ms vs 0.072ms (1, 1, 800, 800) -> (500, 500) nearest-exact float32 num_threads=32 11X 0.592ms vs 0.055ms (1, 1, 800, 800) -> (500, 500) nearest-exact uint8 num_threads=32 8X 0.581ms vs 0.074ms ```
Code:
I used this file which is adapted from https://github.com/pytorch/pytorch/blob/master/benchmarks/operator_benchmark/pt/interpolate_test.py ```py import operator_benchmark as op_bench import torch """Microbenchmarks for interpolate operator.""" class InterpolateBenchmark(op_bench.TorchBenchmarkBase): def init(self, input_size, output_size, channels_last=False, mode='linear', dtype=torch.float): input_image = torch.randint(0, 256, size=input_size, dtype=dtype, device='cpu', requires_grad=self.auto_set()) if channels_last: if input_image.ndim == 4: input_image = input_image.contiguous(memory_format=torch.channels_last) elif input_image.ndim == 5: input_image = input_image.contiguous(memory_format=torch.channels_last_3d) else: raise ValueError( f"Can not set channels_last to the input of {input_image.ndim} dims" ) align_corners = None if "nearest" in mode else False if mode == "linear": mode = { 3: 'linear', 4: 'bilinear', 5: 'trilinear', }[input_image.ndim] self.inputs = { "input_image": input_image, "output_size": output_size, "mode": mode, "align_corners": align_corners, } self.set_module_name("interpolate") def forward(self, input_image, output_size, mode, align_corners): return torch.nn.functional.interpolate(input_image, size=output_size, mode=mode, align_corners=align_corners) def make_config(): sizes = ( ((224, 224), (64, 64)), ((224, 224), (128, 128)), ((600, 400), (224, 224)), ((320, 320), (256, 256)), ((800, 800), (500, 500)), ) attrs = [] for (HW1, HW2) in sizes: attrs.append([(1, 3, *HW1), HW2]) # 3 channels attrs.append([(1, 1, *HW1), HW2]) # 1 channel attrs.append([(1, 3, *HW2), HW1]) # 3 channels attrs.append([(1, 1, *HW2), HW1]) # 1 channel config = op_bench.config_list( attr_names=["input_size", "output_size"], attrs=attrs, cross_product_configs={ 'channels_last': [True], 'mode': ["linear", "nearest", "nearest-exact"], 'dtype': [torch.float, torch.uint8] }, tags=["short"], ) def get_mode(l): for d in l: if "mode" in d: return d["mode"] def get_dtype(l): for d in l: if "dtype" in d: return d["dtype"] config = [l for l in config if not(get_mode(l) == "linear" and get_dtype(l) == torch.uint8)] return config config = make_config() op_bench.generate_pt_test(config, InterpolateBenchmark) if __name__ == "__main__": op_bench.benchmark_runner.main() ``` with ``` for num_threads in 1 2 12 32; do echo "num_threads=$num_threads" && python -m pt.my_interpolate_test --iterations 1000 --omp_num_threads $num_threads ; done > $out_file ``` and this very ugly helper ```py import re with open("main") as f: main = f.readlines() with open("new") as f: new = f.readlines() out = [] for main_line, new_line in zip(main, new): if main_line.startswith("num_threads="): num_threads = int(main_line.split("=")[-1]) if main_line.startswith("# Input"): deets = f"{main_line.strip()}, {num_threads=}" if main_line.startswith("Forward"): main_time = float(main_line.split()[-1]) new_time = float(new_line.split()[-1]) ratio = main_time / new_time fmt = ".1f" if ratio < 3 else ".0f" improv = f"{ratio:{fmt}}X" time_fmt = ",.3f" if new_time < 100 else ",.1f" deets = deets.strip().replace("# Input: ", "") deets = deets.replace(": ", "=") deets = deets.replace("input_size=", "") deets = deets.replace(", output_size=", " -> ") deets = deets.replace("dtype=torch.", "") deets = deets.replace("mode=", "") deets = deets.replace("channels_last=True, ", "") split = deets.split(",") size = ','.join(split[:-3]) mode, dtype, threads = split[-3:] deets = f"{size:<30} {mode:<15} {dtype:<10} {threads:<15}" l = f"{deets} {improv:<5} 
{main_time / 1000:{time_fmt}}ms vs {new_time / 1000:{time_fmt}}ms" out.append(l) def key(s): num_threads = (int(re.findall(r"num_threads=(\d+)", s)[0]),) input_shape, output_shape = re.findall("\(.*?\)", s) input_shape = input_shape[1:-1] # remove parenthesis input_HW = tuple(int(x) for x in input_shape.split(",")[-2:]) input_C = (-int(input_shape.split(",")[1]),) output_HW = tuple(int(x) for x in output_shape[1:-1].split(",")) is_downsample = (output_HW[0] < input_HW[0],) if "linear" in s: mode = "linear" elif "nearest-exact" in s: mode = "nearest-exact" else: assert "nearest" in s mode = "nearest" mode = (mode,) return is_downsample + input_HW + output_HW + num_threads + input_C + mode for i, l in enumerate(sorted(out, key=key)): if i % 10 == 0 and i % 40 != 0: print() if i % 40 == 0: print("-" * 100) print(l) ```
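For reference, the configuration this PR targets looks like the following (an illustrative call only; shapes, modes, and dtypes mirror the benchmark settings above):

```python
import torch
import torch.nn.functional as F

# Channels-last image with fewer than 4 channels: the case sped up by this PR.
img = torch.rand(1, 3, 600, 400).contiguous(memory_format=torch.channels_last)
out = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)

# uint8 masks through nearest / nearest-exact also hit the improved path.
mask = torch.randint(0, 256, (1, 1, 600, 400), dtype=torch.uint8)
mask = mask.contiguous(memory_format=torch.channels_last)
mask_out = F.interpolate(mask, size=(224, 224), mode="nearest-exact")
```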
Closes https://github.com/pytorch/pytorch/issues/83840 When this is merged we should be able to remove some hack in vision as well https://github.com/pytorch/vision/pull/6661 (CC @vfdev-5 @datumbox ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86361 Approved by: https://github.com/vfdev-5, https://github.com/datumbox, https://github.com/fmassa commit 70c6a988d6b5cabed84c686316b6bbeb235cc05c Author: Wang, Eikan Date: Fri Oct 7 03:54:33 2022 +0000 Fix the performance issue that the for-loop before ExternalCall could not be parallelized. (#85056) Currently, NNC only parallelizes the loop statement of the graph outputs. This logic can bypass some loop statements that could be parallelized. Take the example below and suppose the output of `ExternalCall` is also the output of the NNC fusion group. The current [parallel logic](https://github.com/pytorch/pytorch/pull/85056/files#diff-9a11174c26e4b57ab73e819520122bc314467c72962f3a5b79e7400ea3c4bbe5L781-L785) only tries to parallelize the `ExternalCall` and bypasses `stmt1` and `stmt2`.
```c++
stmt1: For:
stmt2: For:
stmt3: ExternalCall
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85056 Approved by: https://github.com/frank-wei, https://github.com/bertmaher commit 2110c8944379bc3268c74e8d9f76c6fb3c896dfe Author: PyTorch MergeBot Date: Fri Oct 7 05:20:36 2022 +0000 Revert "Revert "Revert "SymIntify cat and narrow (#86191)"" (#86289)" This reverts commit e778fbf5197638d6196c5d5acf6f9588a1e83368. Reverted https://github.com/pytorch/pytorch/pull/86289 on behalf of https://github.com/seemethere due to Fails internal tests see: https://www.internalfb.com/intern/sandcastle/job/27021598552487548/ commit 6c604c9262307ffcaf1d7dd68bfa5f6b44513d06 Author: eqy Date: Fri Oct 7 05:13:37 2022 +0000 [CuDNN v8 API][Quantization] fix alignment function in quantized cuDNN V8 path (#86253) This bug was in the native cuDNN V8 API integration and was fixed a while ago, but the change was never ported here. Previously the returned alignment could be twice the actual alignment of the data if the alignment was smaller than 16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86253 Approved by: https://github.com/dzdang commit 455b873919d928a073eb2d60e07d1c5b2de2d6c6 Author: Sherlock Huang Date: Fri Oct 7 02:26:50 2022 +0000 Introduce a match filter for SubgraphRewriter (#86430) This PR introduces an interface for a user-defined function that filters the matches in SubgraphRewriter. The function has the following signature: `callable(match: InternalMatch, original_graph: Graph, pattern_graph: Graph) -> bool`. This filter is applied after SubgraphMatcher returns the matches, and before replacement takes place (an illustrative filter is sketched below, after this block of commits). Pull Request resolved: https://github.com/pytorch/pytorch/pull/86430 Approved by: https://github.com/jerryzh168 commit b5fd845fdf90121d91e8f4cf66a2de761707d22f Author: PyTorch MergeBot Date: Fri Oct 7 04:44:19 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#86399) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash.
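A hypothetical match filter with the signature described above (illustrative only, not taken from the PR; `no_mul_filter` is an invented name):

```py
# Hypothetical SubgraphRewriter match filter: accept a match only if the matched
# subgraph contains no call to torch.mul. match.nodes_map maps pattern-graph nodes
# to the corresponding nodes in original_graph.
import torch

def no_mul_filter(match, original_graph, pattern_graph) -> bool:
    return all(
        not (n.op == "call_function" and n.target is torch.mul)
        for n in match.nodes_map.values()
    )
```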
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86399 Approved by: https://github.com/pytorchbot commit 10aead9adc20bd45b7692e97a64cb76f114c8e16 Author: Nikita Shulga Date: Fri Oct 7 04:39:28 2022 +0000 [MPS] Cache multinomial_with_replacement graph (#86437) Reuse existing RandomCachedGraph to keep RNG state as part of the graph Add `CreateCachedGraphAs` convenience wrapper Addresses https://github.com/pytorch/pytorch/pull/86342#pullrequestreview-1132197848 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86437 Approved by: https://github.com/kulinseth commit 9ceadcadb21beb8e346109d804db35f0c213d8e0 Author: Elias Ellison Date: Fri Oct 7 00:18:44 2022 +0000 Fix unfold backward decomp aliasing for 0 dim input (#86428) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86428 Approved by: https://github.com/ngimel, https://github.com/ezyang commit b14f1d7bb855834ec5f2d3996746e048ba835d69 Author: Kevin Stephano Date: Fri Oct 7 03:55:13 2022 +0000 Add Skip List for Aten Ops that are fused in nvFuser. (#86101) This Skip List (tuple) is added under the nvprims context manager. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86101 Approved by: https://github.com/jjsjann123, https://github.com/mruberry commit c5a4844085ea4db27912f5174be2585aebf7079a Author: Driss Guessous Date: Fri Oct 7 03:52:46 2022 +0000 Xformer SDP forward/backward kernel (#86157) Include xformer kernel code and make header updates to successfully build. Need to update the kernel calling code and dispatch system to clean this up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86157 Approved by: https://github.com/cpuhrsch commit ca39e3679ff834d67da20abaa3b313664c935d8a Author: PyTorch MergeBot Date: Fri Oct 7 03:19:28 2022 +0000 [vision hash update] update the pinned vision hash (#86173) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86173 Approved by: https://github.com/pytorchbot commit 2fec853c8796adbf1b6b13fc095b032c5b9ef7b9 Author: Sherlock Huang Date: Thu Oct 6 23:18:04 2022 +0000 Fix SubgraphMatcher for case of no anchor found (#86421) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86421 Approved by: https://github.com/jerryzh168 commit b73f0e98d5eed44729aeb5925912b7038ce7c59a Author: Michael Voznesensky Date: Fri Oct 7 01:46:51 2022 +0000 Fix cond tests after CI was disabled for a bit (#86321) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86321 Approved by: https://github.com/zou3519 commit ca69ddb4f7b1e1449756177889c454edb8bc091f Author: Alex Date: Fri Oct 7 01:38:57 2022 +0000 Fix broadcasting to implicit leading dimensions in `torch.where` on MPS (#86240) Fixes #86239 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86240 Approved by: https://github.com/kulinseth commit 0e30da3f2f3620caa91ada734d9cd3b91d4ee606 Author: Zafar Date: Fri Oct 7 00:58:38 2022 +0000 [refactor] Renaming ao.sparsity to ao.pruning (#84867) `Sparsity` as a term doesn't reflect the tools that are developed by the AO. The `torch/ao/sparsity` also has utilities for structured pruning, which internally we always referred to as just "pruning". To avoid any confusion, we renamed `Sparsity` to `Prune`. We will not be introducing the backwards compatibility, as so far this toolset was kept under silent development. 
This change will reflect the changes in the documentation as well. **TODO:** - [ ] Change the tutorials - [ ] Confirm no bc-breakages - [ ] Reflect the changes in the trackers and RFC docs Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84867 Approved by: https://github.com/supriyar commit 9a170b24f64d7cfdd887ff122c241ac6ff85f4c6 Author: Dennis van der Staay Date: Fri Oct 7 00:29:32 2022 +0000 Cleanup PT-D imports (#85781) Summary: The flow logic around torch.dist imports results in large number of pyre errors (100's); would be preferable to just raise on importing as opposed to silently fail. Con: Some percentage (MacOS?) of users may have notebooks that imports PT-D, although would think small, since any attempt to call parts of the library would just fail... TODO: assuming ok, will remove the 10's-100's of unused pyre ignores no longer required. Test Plan: existing unit tests Differential Revision: D39842273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85781 Approved by: https://github.com/mrshenli commit a2419638373071c74692c9fe5996c69ef509f581 Author: Masaki Kozuki Date: Fri Oct 7 00:10:25 2022 +0000 [nll_loss] Avoid unnecessary type casts (#86086) follow-up #85395 `AT_DISPATCH_NLL_LOSS_INDEX_TYPES` should not be removed in favor of #59765 and there's a testcase https://github.com/pytorch/pytorch/blob/99ca25e6eb8299f31824bdbaf62f16f8a8db458d/test/test_nn.py#L16832 Besides the dispatcher, I wanted to sanity check `int64_t ignore_index` because `int64_t` can be inappropriate considering that `target` can be `Byte`. However, given that the default value is -100 as in https://github.com/pytorch/pytorch/blob/0a75c42f36c0e50a22c06fa65478df53d7d420c4/aten/src/ATen/native/native_functions.yaml#L9949 it's not easy to add a check while keeping the backward compatibility. Thus I decided to not add a check. cc @lezcano @t-vi Pull Request resolved: https://github.com/pytorch/pytorch/pull/86086 Approved by: https://github.com/lezcano commit 2232db7fc12301a2226d1921948917d5b23b6888 Author: Nikita Shulga Date: Fri Oct 7 00:08:42 2022 +0000 Replacement is irrelevant for 1-sample multinomial (#86342) So use fast path, both on CPU and on MPS Also, remove some spurious copy-n-paste checks from MPS codepath CUDA already has this optimization, see https://github.com/pytorch/pytorch/blob/dc9c507d24d0c833cb09105177326f1f6bbe99c4/aten/src/ATen/native/cuda/MultinomialKernel.cu#L355-L356 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86342 Approved by: https://github.com/ngimel commit 5a8b07de75acc2b03fadee4fa12384cc5e779a0f Author: Peter Bell Date: Mon Aug 29 21:21:17 2022 +0100 Declare public dependencies on libshm (#82694) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82694 Approved by: https://github.com/malfet commit 08e3999fa494238f8f62346a140da36bd43864e7 Author: Brian Hirsh Date: Thu Oct 6 13:25:05 2022 -0700 Merge more symbolic meta kernels and symint changes from branch (#86334) symintify split_with_sizes, dropout, fused_fake_obs_quant. 
meta for padding_2d ops add meta_bernoulli_ meta kernel for at::gather get pytorch_struct to pass: meta for scatter_add, fix backward symintify split ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/86334 Approved by: https://github.com/ezyang commit 3af0eafea69f200bd83c5e0c06f5af5fb4a90c28 Author: atalman Date: Thu Oct 6 23:26:58 2022 +0000 Release 1.13: Bump nightly version 1.13->1.14 (#86296) Release 1.13: Bump nightly version 1.13->1.14 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86296 Approved by: https://github.com/seemethere, https://github.com/malfet commit 5ed75ec1d7131c1aa65c94669d24dffbcb5d8769 Author: Tongzhou Wang Date: Thu Oct 6 23:11:22 2022 +0000 Fix SparseAdam consuming iterator (#86210) Fixes https://github.com/pytorch/pytorch/issues/86209 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86210 Approved by: https://github.com/cpuhrsch commit f0977c4658c6f8c10e3342cf9a0249d5d23a3505 Author: Rohan Varma Date: Thu Oct 6 19:21:55 2022 +0000 [FSDP] Doc to explain running submodules (#86343) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86343 Approved by: https://github.com/awgu commit 3db8ddcac10239dc44aeeab16ab66c82444f358d Author: Rohan Varma Date: Thu Oct 6 19:14:02 2022 +0000 [FSDP] Fix clip_grad_norm for CPU offload (#86337) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86337 Approved by: https://github.com/awgu commit adfd8f382331adbf9cbfa14039ef3b61f2f4e10c Author: Rohan Varma Date: Thu Oct 6 19:14:02 2022 +0000 [FSDP] assert to runtime error (#86336) Prefer raising an error over `assert` which should mostly to indicate a developer bug, but user can cause this error path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86336 Approved by: https://github.com/awgu commit 7a411952fbb82cec38da936a7d863da49726699f Author: Rohan Varma Date: Thu Oct 6 19:14:01 2022 +0000 CheckpointSequential support non-reentrant (#86331) Closes https://github.com/pytorch/pytorch/issues/86328 Adds `use_reentrant` argument to `checkpoint_sequential`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86331 Approved by: https://github.com/zhaojuanmao, https://github.com/albanD commit 3037f3d710184b56d949087b39438649d314bac0 Author: David Date: Thu Oct 6 22:38:50 2022 +0000 Docs: fix typo (#86273) Typo in torch.fx.Interpreter.fetch_attr docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/86273 Approved by: https://github.com/kit1980 commit 233d6f195aa404766448c33d18b8e7ca5e66de51 Author: PyTorch MergeBot Date: Thu Oct 6 22:02:02 2022 +0000 Revert "Fix memory leak in _LRScheduler.step() (#85602)" This reverts commit eb32330d6b3709dc8910eb298d8802fbca57b05c. 
Reverted https://github.com/pytorch/pytorch/pull/85602 on behalf of https://github.com/albanD due to newly added test is flaky commit bf746798841bd42c9b849716f9eeefde3271e93d Author: atalman Date: Thu Oct 6 21:55:33 2022 +0000 Fix for binary upload step, use bash shell rather than the default sh (#86382) This fixes the issue during upload:
```
Run # reference ends with an RC suffix
  if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then
    echo "UPLOAD_CHANNEL=test" >> "$GITHUB_ENV"
  fi
  shell: sh -e {0}
/__w/_temp/f045f5d8-ddb.sh: 2: [[: not found
```
Test failure: https://github.com/pytorch/pytorch/actions/runs/3199561387/jobs/5225448559 Test success: https://github.com/pytorch/pytorch/actions/runs/3199573560/jobs/5225480345 The error started when we switched to continuumio/miniconda3:4.12.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86382 Approved by: https://github.com/weiwangmeta commit facf210f9a6e98bdeb2ec343b8f16c5bb047c4ce Author: HDCharles Date: Thu Oct 6 11:21:24 2022 -0700 [ao] fixing public v private for qconfig.py (#86026) Summary: no changes, just removed the exception for this file; someone had already fixed the actual file Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86026 Approved by: https://github.com/jerryzh168 commit 7c5e07f87ba0443ada94bf849ed3cceb9a2f31a2 Author: Jay Chae Date: Thu Oct 6 21:36:15 2022 +0000 [kineto] guard global observer init against Edge profiler (#86347) Summary: It looks like Sandcastle CI didn't cover any of the concrete mobile CI (cc: kimishpatel, I'd assume we have a ton of mobile tests in GitHub?). This is failing on Oculus with a similar failure as on Mac (not sure if this is an ARM thing). Either way, on-demand tracing should not be enabled on these platforms, so disable them completely. In the future, we should have a runtime check on this for even safer guarding. Test Plan: Set up Hollywood via P536072492; crash on mutex,
likely SIOF ``` FORTIFY: pthread_mutex_lock called on a destroyed mutex (0x5d7e298b08) *** Aborted at 1665017107 (Unix time, try 'date -d 1665017107') *** *** Signal 6 (SIGABRT) (0xeca) received by PID 3786 (pthread TID 0x785bd1eed0) (linux TID 3786) (maybe from PID 3786, UID 0) (code: -1), stack trace: *** (error retrieving stack trace) ``` Redacted in the top but the test passes without the crash P536101962 Differential Revision: D40129840 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86347 Approved by: https://github.com/aaronenyeshi commit bc919ac7963be9f113b4e3c8b668905404301f8f Author: Jiaxu Zhu Date: Thu Oct 6 20:05:56 2022 +0000 [torch.ao.quantization] include torch.qint32 for static quant (#86345) Summary: include `torch.qint32` to `activation_is_statically_quantized` and `get_quant_type` so that fakequantize with `dtype=torch.qint32` won't be skipped Test Plan: updated `test_custom_module_class` Differential Revision: D40128178 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86345 Approved by: https://github.com/jerryzh168 commit 08780229df8f860dbef3fa82ffd1072b124b29c5 Author: lezcano Date: Thu Oct 6 15:55:21 2022 +0000 Two small improvements to references (#86371) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/86371 Approved by: https://github.com/mruberry commit 795906f207bb95245d47501645876b5d165aee3e Author: Huy Do Date: Thu Oct 6 18:53:59 2022 +0000 Add total GPU memory utilization (#86250) Although we already have per process GPU memory usage, I'm curious to see what is the number for `gpu_utilization.memory` per https://docs.nvidia.com/deploy/nvml-api/structnvmlUtilization__t.html. Also fixing a tiny typo issue that has been bugging me for a while `total_gpu_utilizaiton` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86250 Approved by: https://github.com/ZainRizvi commit 1059d3b52d9984f32f87432f9f87eaf7164d7f88 Author: Zain Rizvi Date: Thu Oct 6 18:47:07 2022 +0000 Make mergebot message clearer when starting a new merge (#86311) Modifying how the merge started message appears to make it more readable. Also removing some deprecated v1 land checks messages Old: image New: image Pull Request resolved: https://github.com/pytorch/pytorch/pull/86311 Approved by: https://github.com/malfet, https://github.com/huydhn commit 6b295cd0460825ff29ab151208de137d76bf8364 Author: Pearu Peterson Date: Thu Oct 6 13:11:58 2022 +0300 Enable autograd on Linear with sparse COO weight (#86302) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86302 Approved by: https://github.com/cpuhrsch commit 8f2c2167d42d067adf4fe7e13f04de0d0b6d87aa Author: Pearu Peterson Date: Thu Oct 6 13:11:57 2022 +0300 Support autograd on sparse_mm in full. (#86301) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86301 Approved by: https://github.com/cpuhrsch commit 88b882cd1c93e8fe9b4f2bf0c542c700a8ba69a6 Author: Pearu Peterson Date: Thu Oct 6 13:11:57 2022 +0300 Support sum on a sparse COO tensor. (#86300) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86300 Approved by: https://github.com/cpuhrsch commit f104490d635747e4164e954d36954ea3a01731a5 Author: Pearu Peterson Date: Thu Oct 6 13:11:56 2022 +0300 Support autograd on Linear with sparse compressed weight. 
(#86137) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86137 Approved by: https://github.com/cpuhrsch commit fc21cc82fcdb07d604dd9ae161acc05b93097c1b Author: Pearu Peterson Date: Thu Oct 6 13:11:56 2022 +0300 Enable sparse_dim() and dense_dim() methods for Strided tensors (#86203) The reason for enabling sparse/dense_dim() for strided tensors is to have more meaningful error messages: For instance, compare ``` NotImplementedError: Could not run 'aten::sparse_dim' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::sparse_dim' is only available for these backends: [SparseCPU, SparseCUDA, SparseMeta, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher]. ``` [master] vs ``` RuntimeError: addmm: matrices expected, got 0D tensor ``` [this PR] where the latter message gives a hint of which function is to blame for dealing with unexpected inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86203 Approved by: https://github.com/cpuhrsch commit bed1ece9c54f10580ee870ae1d73edcb9279727f Author: PyTorch MergeBot Date: Thu Oct 6 17:34:29 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#86306) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86306 Approved by: https://github.com/pytorchbot commit eb32330d6b3709dc8910eb298d8802fbca57b05c Author: Chengqi Deng Date: Thu Oct 6 17:07:34 2022 +0000 Fix memory leak in _LRScheduler.step() (#85602) Fixes #85410 This diff removed the cyclic references in `_LRScheduler.step()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85602 Approved by: https://github.com/albanD commit b8b564c90872316fe84fc781631afed5e78e069a Author: Huy Do Date: Thu Oct 6 16:47:45 2022 +0000 Ensure the minimum NVIDIA driver version to be 515.57 for CUDA 11.7 (#86344) This does 2 things: * Ensure that `nvidia-driver-latest-dkms` package is removed if it's installed. This allows the installation to go forward without the below error when using the standard installation script from S3: ``` (Answer: Abort installation) ERROR: The installation was canceled due to the availability or presence of an alternate driver installation. Please see /var/log/nvidia-installer.log for more details. ``` * Not skipping the installation if a driver different than `515.57` exists to avoid any unexpected behavior when using a different driver version. 
This partly addresses the recent issue in https://github.com/pytorch/pytorch/issues/85778 in which `510.60.02` is there instead (not sure from where) and fails CUDA 11.7 test Pull Request resolved: https://github.com/pytorch/pytorch/pull/86344 Approved by: https://github.com/atalman, https://github.com/malfet commit 0c148a4b5f1d30daf7401b9c1131d290274e0cd3 Author: Christian Puhrsch Date: Thu Oct 6 16:28:05 2022 +0000 Remove extra bracket, update header definition (#86317) Summary: Fix compilation error Test Plan: Unit test Reviewed By: malfet, mikaylagawarecki Differential Revision: D40108369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86317 Approved by: https://github.com/malfet commit fb9b96593c784b86b3d913ef8799ee120c203207 Author: Peter Bell Date: Mon Aug 29 21:21:16 2022 +0100 Use FindCUDAToolkit to find cuda dependencies (#82695) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet commit fa799132d82c3c48253aaf7d3ee3a8c5e007350d Author: Nikita Shulga Date: Thu Oct 6 15:38:57 2022 +0000 [MPS] Better error message for `slow_conv2d_forward` (#86303) Error `Could not run 'aten::_slow_conv2d_forward' with arguments from the 'MPS' backend.` is very misleading as usually this method is only invoked if input is on CPU but weights are on MPS device. Raise a more user friendly error in this case Add test to `test_invalid_conv2d` to check for those conditions. Fixes https://github.com/pytorch/pytorch/issues/77931 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86303 Approved by: https://github.com/kulinseth commit 4d7728890b134b712c16ace20e6660f1f840db43 Author: Edward Z. Yang Date: Wed Oct 5 21:42:02 2022 -0700 Inline asIntArrayRef (#86350) I was benchmarking and this is worth maybe 5% on at::empty, but it's basically free so we should do it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86350 Approved by: https://github.com/albanD commit cebf08afb24dec0720935b9a9bd64ecf05b472d5 Author: andrewor14 Date: Wed Oct 5 15:30:59 2022 -0700 [Quant] Remove weight from DTypeConfig for non-weighted ops (#86335) Summary: Weight dtypes should be specified only for weighted ops like conv and linear. This commit removes weight dtypes from the DTypeConfigs used in binary ops and fixed qparams ops. Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/86335 Approved by: https://github.com/vkuzo commit cdbffa7f665dd144ada92c11d223aeb8b5c3887a Author: Antoni Viros i Martin Date: Thu Oct 6 13:10:25 2022 +0000 🦊 [AI Accelerators] Consolidate native_layer_norm for nested tensor (#86295) Summary: In order to make the layer normalization implementation for nested tensors public, it needs to be generalized to accept a normalized_shape argument instead of assuming it to be the last dimension of the nested_tensor. This commit does that, as well as adding extra unit tests to ensure the implementation is correct. 
Test Plan: All unit tests designed to test different ways of using the function work: `buck test //caffe2/test:nested -- test_layer_norm` Differential Revision: D40105207 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86295 Approved by: https://github.com/drisspg commit 85c3b745f6fc94b757f30f518108ee64ffd292a5 Author: John Detloff Date: Thu Oct 6 10:08:54 2022 +0000 Conditionally build the TestApp benchmark based on lite interpreter (#86314) The TestApp benchmark was recently re-added, however it seems it only builds when pytorch is built with the lite interpreter. This diff adds a macro to compile out the benchmark when pytorch is built as full jit. This should fix our full jit simulator nightly builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86314 Approved by: https://github.com/malfet commit 936e93058b2781d6cee2da59cccba051726dd46f Author: Sahan Paliskara Date: Wed Oct 5 21:06:01 2022 -0700 Delete torch::deploy from pytorch core (#85953) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet commit 27c3fb03864597909a7288e82e3e6699131e7509 Author: Seonglyong Gong Date: Thu Oct 6 06:32:25 2022 +0000 [Profiler] trace verbose=false by default (#86263) Summary: - Added config option to remove 'Call stack' field from trace file (#84982) - Change default value to `false` Test Plan: - `experimental_config=_ExperimentalConfig(verbose=true),` will add 'Call stack' field back in the trace file. - CI tests Differential Revision: D40092377 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86263 Approved by: https://github.com/aaronenyeshi commit a117fde86febc2b1c27e7a0e809ae22d46e33849 Author: Seonglyong Gong Date: Thu Oct 6 06:18:56 2022 +0000 [Profiler] Apply TensorMetadata for Optimizer and nnModule (#86047) Summary: - Use `TensorMetadat` struct in saving tensor info from Optimizer and nnModule. Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39682205 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86047 Approved by: https://github.com/chaekit, https://github.com/robieta commit fd5085c445c3f1a4c90e55154cf26fe30f52a0ab Author: albanD Date: Thu Oct 6 04:46:19 2022 +0000 Symintify getitem and add the required helper functions (#86207) Note that this might not cover every use of the function (we know it doesn't) But this is enough to get few models passing. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207 Approved by: https://github.com/ezyang, https://github.com/Chillee, https://github.com/bdhirsh commit 0a75c42f36c0e50a22c06fa65478df53d7d420c4 Author: Edward Yang Date: Thu Oct 6 04:11:05 2022 +0000 Workaround MSVC ICE due to constexpr char* template argument (#86288) Test Plan: Lease a Windows sandcastle https://www.internalfb.com/intern/wiki/Windows_Platform_Engineering/Leasable_VM_-_User_Guide/ and run: ``` buck build arvr/mode/win/opt //xplat/caffe2:_C_impl ``` Differential Revision: D40109191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86288 Approved by: https://github.com/albanD, https://github.com/malfet commit 45f03d69486e45b67bfcab9e60a2c24aa5f1ea8d Author: Edward Z. Yang Date: Wed Oct 5 14:44:34 2022 -0700 Add at::symint:: namespace for ease of templated functions (#86329) Our prevailing strategy for symbolic shapes in C++ is to only write the SymInt version of the code, and pay a slight performance tax from not knowing if it is symbolic or not. However, there are some fastpath functions where this tax is unacceptable, and we want to specialize for the int case. Sometimes, it is easy to template the function; but when the function involves Tensors, it is not, because the functions you may want to call are not templated, e.g., t.view vs t.view_symint This PR adds an at::symint:: namespace which contains templated functions for all functions in PyTorch which you can use in this way. To show this works, I refactored sum_to to stop incorrectly reinterpret casting and instead use a template. Instead of t.sizes(), we call at::symint::sizes(t), and so forth. The template functions are SFINAE'd using a template argument that is not otherwise used. As such, deduction is impossible. Typically, deduction is hard anyway, because many of the constructors are ambiguous (this is why we split foo and foo_symint in the first place). So you must pass a template argument to these functions. These functions are codegened into Functions.h so they are subject to per-operator headers. This matters most for methods, which likely didn't include the per-operator header, so you will have to add an include in that case. We never generate method variants for these. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86329 Approved by: https://github.com/bdhirsh, https://github.com/voznesenskym commit ea21a982f25120235d91e3be5a371a26855c112c Author: Edward Z. Yang Date: Tue Oct 4 23:08:51 2022 -0400 Reduce warning suppression by just disabling pytest warnings plugin (#86255) Fixes https://github.com/pytorch/pytorch/issues/85626 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86255 Approved by: https://github.com/lezcano, https://github.com/albanD commit adf5919720c02dcf8c1ff32c890dd1c4e54d6fe7 Author: Edward Z. Yang Date: Mon Oct 3 13:56:53 2022 -0700 Add option to record C++ backtraces in _record_memory_history (#86145) I used this to debug https://github.com/pytorch/pytorch/issues/86136 so it is useful. The implementation is not so fast so it is not enabled by default. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86145 Approved by: https://github.com/albanD, https://github.com/zdevito commit 97d6b5bbf89172ad94143f4ce4a9b9a3a4d7b744 Author: Edward Z. Yang Date: Mon Oct 3 13:56:53 2022 -0700 Refactor _cuda_recordMemoryHistory to use pybind11 (#86139) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86139 Approved by: https://github.com/albanD commit d04889323e2bc0b7321b76e564292565c88b9a5e Author: Elias Ellison Date: Wed Oct 5 21:25:25 2022 +0000 Add Context Manager for Disabling Multithreading in Backwards, use in aot autograd (#86245) We were running into a few issues with running multithreaded backwards in aot_autograd: such as https://github.com/pytorch/pytorch/issues/86136, and `FakeTensorMode` getting into a weird state as a result of not executing functions completely sequentially. The multithreaded backwards is lost in translation when we trace out the backwards anyway, and adds a lot of additional complexity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86245 Approved by: https://github.com/albanD, https://github.com/yf225 commit 237316aa1da372e894a3bd4ad5ad7e831e3b7636 Author: Vasiliy Kuznetsov Date: Tue Oct 4 16:59:15 2022 -0700 PNP: early FX numeric suite tool to quantize each layer N times (#80521) Summary: This PR is an early prototype of a tool to quantize each layer of a model N times, with N qconfigs each. We follow the design agreed upon in https://fburl.com/gdoc/e1gaq3ih . Current API: ``` m = M().eval() example_input = (torch.randn(2, 2),) qconfig_mappings = [ QConfigMapping().set_global(torch.quantization.default_qconfig), QConfigMapping().set_global(torch.quantization.default_dynamic_qconfig), ] backend_config = get_native_backend_config() msp = prepare_n_shadows_model( m, example_input, qconfig_mappings, backend_config) for _ in range(2): msp(*example_input) msq = convert_n_shadows_model(msp) msq(*example_input) results = extract_results_n_shadows_model(msq) print_comparisons_n_shadows_model(results) // example output subgraph_idx ref_node_name best_idx 1 2 -------------- --------------- ---------- ------- ------- subgraph_0 fc1 2 42.0834 42.6279 subgraph_1 fc2 2 43.7259 50.0593 ``` Test plan: ``` python test/test_quantization.py -k test_n_shadows ``` Differential Revision: [D37650332](https://our.internmc.facebook.com/intern/diff/D37650332) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80521 Approved by: https://github.com/jerryzh168, https://github.com/andrewor14 commit b233d83471147bf578c7ae79df2ee8bc30c10ca2 Author: Yu Guo Date: Thu Oct 6 01:08:59 2022 +0000 make torch.histc ignore NaNs on CPU (#85870) Summary: cuda torch.histc already ignores NaNs Test Plan: unittest added Differential Revision: D39911272 fix https://github.com/pytorch/pytorch/issues/85853 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85870 Approved by: https://github.com/ngimel commit ddec1eea05e8c2efc772536cf94d578950c37f5e Author: Mike Iovine Date: Thu Oct 6 01:07:40 2022 +0000 [Static Runtime] Block linalg_svdvals codegen & run codegen script (#85983) Summary: The test is causing issues: ``` terminate called after throwing an instance of 'std::runtime_error' what(): The following operation failed in the TorchScript interpreter. Traceback of TorchScript (most recent call last): graph(%A: Tensor, %driver: str?): %bias: None = prim::Constant() %ret = aten::linalg_svdvals(%A, %driver) ~~~~ <--- HERE %cloned = aten::clone(%ret, %bias) return (%cloned) RuntimeError: torch.linalg.svd: keyword argument `driver=` is only supported on CUDA inputs with cuSOLVER backend. ``` Just block the op and re-run the codegen script to remove everything and update the generated ops. 
Test Plan: Existing tests Differential Revision: D39973860 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85983 Approved by: https://github.com/xuzhao9, https://github.com/tenpercent commit bebd1622490becd09de97003bd22761e973d3edd Author: Charlie Yan Date: Thu Oct 6 00:48:54 2022 +0000 Fix doc of DDP (#86244) (#86256) [ghstack-poisoned] Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86256 Approved by: https://github.com/rohan-varma commit 020f2b2c0b697a9bbc5422b2c4428c4f6604f11b Author: Brian Hirsh Date: Wed Oct 5 11:25:31 2022 -0700 add myself for dynamic shapes PR review (#86292) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86292 Approved by: https://github.com/albanD commit dc9c507d24d0c833cb09105177326f1f6bbe99c4 Author: Natalia Gimelshein Date: Wed Oct 5 23:59:16 2022 +0000 add nominal support for int32 indices in index/index_put ops (#86309) Currently index_select/index_add decompositions decompose to `index` or `index_put` ops. The problem with this is that `index_select` and `index_add` accept int32 indices while `index` doesn't. That leads to error in meta func for those decompositions. This PR adds non-performant support for int32 indices to `index` operations, to allow decompositions go through. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86309 Approved by: https://github.com/lezcano commit e8b0bea677b44206f663788e3a9d6a85b3779ed2 Author: Edward Z. Yang Date: Wed Oct 5 12:46:41 2022 -0700 Rename fromIntArrayRef to fromIntArrayRefSlow, audit call sites (#86235) Some of them are known non-negative, I've revised them accordingly. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86235 Approved by: https://github.com/albanD commit 168ba066e3944a1bd897fe25f29a6754e31ca186 Author: PyTorch MergeBot Date: Wed Oct 5 22:42:56 2022 +0000 Revert "Symintify getitem and add the required helper functions (#86207)" This reverts commit 17addb307ee9a4d12ad6918e90358a9a47a4f12b. Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/malfet due to Broke lint, by double-registering `meta_index_put`, but no CI was run during the outage commit be4e43c7d05eb67923d84162a7a4203173db3206 Author: Rohan Varma Date: Wed Oct 5 22:30:02 2022 +0000 Remove DataParallel remnants from DDP doc (#86221) As @aazzolini pointed out, the docstring is incorrect and probably vestige from DP / single process multi device mode in DDP. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/86221 Approved by: https://github.com/aazzolini commit 9e1a43122046536d5c1fedc6b1e6d912ca6afb51 Author: Sherlock Huang Date: Wed Oct 5 18:27:34 2022 +0000 Mark ctc_loss with dynamic_output_shape (#86293) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86293 Approved by: https://github.com/eellison commit 0e5a27fb8d7df8541251f1ebfc4373c1358c1bab Author: Edward Z. Yang Date: Wed Oct 5 15:10:18 2022 -0400 Fix horribly double truncation bug in Scalar (#86304) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86304 Approved by: https://github.com/albanD commit 73777d8a2bed1c8878a1858ab8241f5acf0d022b Author: HDCharles Date: Tue Oct 4 13:04:45 2022 -0700 [ao] fixing public v private for quantization_mappings.py (#86025) Summary: no significant changes, just added __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86025 Approved by: https://github.com/jerryzh168 commit 28a5cd94802c33a29c2d4435f0fb79152711819b Author: HDCharles Date: Tue Oct 4 13:04:44 2022 -0700 [ao] fixing public v private for quantize_jit.py (#86024) Summary: just needed to add __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86024 Approved by: https://github.com/jerryzh168 commit 17addb307ee9a4d12ad6918e90358a9a47a4f12b Author: albanD Date: Wed Oct 5 21:19:00 2022 +0000 Symintify getitem and add the required helper functions (#86207) Note that this might not cover every use of the function (we know it doesn't) But this is enough to get few models passing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207 Approved by: https://github.com/ezyang commit b8895df8db23213a0db50fe833930dd1f4e4b5a5 Author: albanD Date: Wed Oct 5 21:08:40 2022 +0000 Fix black binary again for debug python (#86275) The `--no-binary` flag was not ported when moving from black only to ufmt. This adds it back. This is to work around the fact that black binary hard crashes when running with debug python and it needs to be compiled from source. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86275 Approved by: https://github.com/bdhirsh, https://github.com/malfet commit e778fbf5197638d6196c5d5acf6f9588a1e83368 Author: Edward Z. Yang Date: Wed Oct 5 11:32:48 2022 -0700 Revert "Revert "SymIntify cat and narrow (#86191)"" (#86289) This reverts commit fc94a2115b31dfe7a0d8f28eb4f5ed532c4f0792. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86289 Approved by: https://github.com/wconstab commit 089a64e99e2d2b937c72e25c0fa6e4f673b8a1a1 Author: Min Si Date: Wed Oct 5 20:02:02 2022 +0000 Install c10d headers with absolute path (#86257) https://github.com/pytorch/pytorch/pull/85780 updated all c10d headers in pytorch to use absolute path following the other distributed components. However, the headers were still copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch`, thus external extentions still have to reference the c10d headers as ``, making the usage inconsistent (the only exception was c10d/exception.h, which was copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`). This patch fixes the installation step to copy all c10d headers to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`, thus external extensions can consistently reference c10d headers with the absolute path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86257 Approved by: https://github.com/kumpera commit b67e022833df912068406dbd1da2345e6693c7db Author: lezcano Date: Wed Oct 5 12:06:28 2022 +0000 Fix ref / decomposition index_add (#86266) The decomposition of `index_add` was using `slice(None)`, when it should use just `None`. The reference for index_add was also wrong, as `x[idx] += t` does not use atomic add, so it does not work when several `idx`s point to the same location. This PR adds extra reference inputs to help test for this. 
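To make the accumulation pitfall described above concrete, a small worked example (not from the PR): with repeated indices, `x[idx] += t` drops updates, while `index_add` accumulates them.

```py
# Worked example: duplicate indices under advanced-indexing assignment vs index_add.
import torch

x = torch.zeros(3)
idx = torch.tensor([0, 0, 2])
t = torch.ones(3)

y = x.clone()
y[idx] += t                   # no accumulation: index 0 receives only one of its two updates
z = x.index_add(0, idx, t)    # accumulates: index 0 receives both updates

print(y)  # tensor([1., 0., 1.])
print(z)  # tensor([2., 0., 1.])
```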
Fixes https://github.com/pytorch/torchdynamo/issues/1356 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86266 Approved by: https://github.com/ngimel commit 14db44ad72f0110b484c0c8aaf520e110cc91f53 Author: HDCharles Date: Tue Oct 4 13:04:43 2022 -0700 [ao] fixing public v private for quantize.py (#86023) Summary: just needed to add __all__ Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86023 Approved by: https://github.com/jerryzh168 commit c21caff8765f00ed5a2e1ed448ad5e6329c87b8d Author: HDCharles Date: Tue Oct 4 13:04:43 2022 -0700 [ao] correctly set public v private for fake_quantize.py (#86022) Summary: biggest issue was that the constructors for the fake_quantize classes use custom partials that live in the observer module and so the module for these needed to be set correctly in the constructor class method Test Plan: python test/test_public_bindings.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/86022 Approved by: https://github.com/jerryzh168 commit 3b1ec7511e6d616fbe2e9f8721ff9be6c55d3d42 Author: Edward Z. Yang Date: Tue Oct 4 14:56:28 2022 -0700 Optimize is_symbolic test and some refactor (#86230) Our SymInt rep can be represented more efficiently as just a greater than test, but the compiler doesn't seem to figure it out. Help it out. There is also some refactoring to simplify the code and add more debugging. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86230 Approved by: https://github.com/albanD commit 8c6d352bcfa3a8d2f7322d3577117b2d432cd002 Author: Bin Chen Date: Wed Oct 5 18:23:53 2022 +0000 Log a new "timer expired" event to Scuba in file_based_local_timer (#85861) Summary: The "kill worker process" event was logged to Scuba only when the worker process was really reaped. We want to add a new event "timer expired", no matter the worker process will be reaped or not. This will help collect data before we enable the JustKnob to kill the worker process on timeout. Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test ``` ``` Test Session: https://www.internalfb.com/intern/testinfra/testrun/7318349508929624 RE: reSessionID-ea464c43-54e7-44f2-942b-14ea8aa98c74 Up: 10.5 KiB Down: 1.1 MiB Jobs completed: 100. Time elapsed: 3206.9s. Cache hits: 91%. Commands: 11 (cached: 10, remote: 1, local: 0) Tests finished: Pass 55. Fail 0. Fatal 0. Skip 0. 0 builds failed ``` -------- ``` buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test/fb:local_agent_fb_internal_test ``` ``` Test Session: https://www.internalfb.com/intern/testinfra/testrun/6473924579130483 RE: reSessionID-231a47b7-a43d-4c0f-9f73-64713ffcbbd3 Up: 5.7 MiB Down: 1.9 GiB Jobs completed: 182156. Time elapsed: 282.4s. Cache hits: 99%. Commands: 72112 (cached: 72107, remote: 1, local: 4) Tests finished: Pass 2. Fail 0. Fatal 0. Skip 0. 0 builds failed ``` Differential Revision: D39903376 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85861 Approved by: https://github.com/d4l3k commit fc94a2115b31dfe7a0d8f28eb4f5ed532c4f0792 Author: PyTorch MergeBot Date: Wed Oct 5 17:19:55 2022 +0000 Revert "SymIntify cat and narrow (#86191)" This reverts commit 63d8d4f6ec5c973ad7b8669cd39ee9b550e5f55b. 
Reverted https://github.com/pytorch/pytorch/pull/86191 on behalf of https://github.com/seemethere due to Fails internal tests, see [D40106464](https://www.internalfb.com/diff/D40106464) commit 3ec71fce79f4e568c48796da4b18a3e6f2c6fc29 Author: Peter Bell Date: Tue Oct 4 21:12:22 2022 +0100 Improve make_tensor performance for float and complex types (#85473) For floating types, `make_tensor` calls `rand` and then does a linear interpolation from `low` to `high`. This instead calls `uniform_(low, high)` to cut out the interpolation step. For complex types, `make_tensor` does the `rand` + interpolation step twice and calls `torch.complex(real, imag)` at the end. This instead uses `view_as_real` and `uniform_(low, high)` to fuse it all into one operation. My benchmarks show significant speedups in all cases for float32 and complex64.

| Device | dtype     | Size  | Master (us) | This PR (us) | Speedup |
|--------|-----------|-------|-------------|--------------|---------|
| CPU    | float32   | 8     | 19.4        | 6.34         | 3.1     |
|        |           | 4096  | 36.8        | 21.3         | 1.7     |
|        |           | 2**24 | 167,000     | 80,500       | 2.1     |
|        | complex32 | 8     | 37.0        | 7.57         | 4.9     |
|        |           | 4096  | 73.1        | 37.6         | 1.9     |
|        |           | 2**24 | 409,000     | 161,000      | 2.5     |
| CUDA   | float32   | 8     | 40.4        | 11.7         | 3.5     |
|        |           | 4096  | 38.7        | 11.7         | 3.3     |
|        |           | 2**24 | 2,300       | 238          | 9.7     |
|        | complex32 | 8     | 78.7        | 14           | 5.6     |
|        |           | 4096  | 82.7        | 13.8         | 6.0     |
|        |           | 2**24 | 5,520       | 489          | 11.3    |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85473 Approved by: https://github.com/mruberry commit 7f607e8cb5c933fda87149e64e3a74f125d8adaf Author: PyTorch MergeBot Date: Wed Oct 5 17:02:33 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85774) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85774 Approved by: https://github.com/pytorchbot, https://github.com/malfet commit 97d2e1df5565b7f3a5358178b8f3a2a039c7f976 Author: Nikita Shulga Date: Wed Oct 5 09:09:17 2022 -0700 [MPS] Fix GELU for `torch.half` (#86218) Also, make sure it raises catchable errors if invoked with integral types. Otherwise, it used to fail with the following fatal error when invoked for `torch.half`, and with similar signatures if invoked for integral types:
```
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/4883e71d-37bd-11ed-b0ef-b25c5e9b9057/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<2xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
```
Modified `test_gelu_simple` to check both fwd and backward gradients for gelu commit 63d8d4f6ec5c973ad7b8669cd39ee9b550e5f55b Author: Will Constable Date: Wed Oct 5 14:46:55 2022 +0000 SymIntify cat and narrow (#86191) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86191 Approved by: https://github.com/ezyang commit 0e03dc5f1e00a9e021ec8f6e98d0c7df7af78d03 Author: Horace He Date: Wed Oct 5 11:08:05 2022 +0000 Remove softmax from recomputable ops (#86268) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86268 Approved by: https://github.com/ezyang commit c609768896ead0b4bb439a0b03e58360a5c00023 Author: lezcano Date: Wed Oct 5 10:13:17 2022 +0000 Add refs for torch.unfold and a decomposition for its backward.
(#85629) It's not clear to me what's the difference between `unfold` and `unfold_copy`, as this latter one is codegen'd I also took this chance to clean the implementation of unfold and its reference Pull Request resolved: https://github.com/pytorch/pytorch/pull/85629 Approved by: https://github.com/mruberry commit 67eb2d5952741f2024c826d008ed35b8a1cc56d9 Author: Andrew Gu Date: Tue Oct 4 20:37:51 2022 +0000 [FSDP] Dequeue one instead of flush (#86165) For the rate limiter, I initially implemented the approach of only dequeueing a single event, but there was concern about blocking the CPU _every_ iteration. The landed approach instead blocks every `_max_num_inflight_all_gathers` iterations and flushes the entire queue. However, upon further analysis, the approach of dequeueing a single event should be more performant with the same memory usage -- as the name suggests, both have `_max_num_inflight_all_gathers` concurrently inflight all-gathers. The cost of blocking the CPU thread is not important compared to the duration the CPU thread is actually blocked. This PR's approach reduces the latter quantity. **Fast Communication; Slow Computation** Screen Shot 2022-10-04 at 4 15 13 PM **Slow Communication; Fast Computation** Screen Shot 2022-10-04 at 4 34 15 PM **T5-11B** 2 nodes / 16 40 GB A100s with EFA and batch size 6: - [Old] 5.81 s / batch; 24 and 20 CUDA malloc retries on local rank 0s; 35.234 GB peak active; 38.806 GB peak reserved - [New] 5.10 s / batch; 25 and 29 CUDA malloc retries on local rank 0s; 35.234 GB peak active; 38.868 GB peak reserved 4 nodes / 32 40 GB A100s with EFA and batch size 7: - [Old] 5.21 s / batch; 0, 0, 0, 0 CUDA malloc retries on local rank 0s; 33.695 GB peak active; 38.494 GB peak reserved - [New] 4.93 s / batch; 1, 0, 0, 0 CUDA malloc retries on local rank 0s; 33.678 GB peak active; 38.792 GB peak reserved The new version changes the fragmentation in the allocator. It is possible that by blocking the CPU thread more in the old approach, the initial blocks used to serve the all-gather stream allocations are different compared to the new approach. Even though the number of CUDA malloc retries increases slightly, the net result is a speedup with the new approach. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86165 Approved by: https://github.com/zhaojuanmao commit 1c5ca724f42edcb669afa491ab67bcd3bcc9a70e Author: Thytu Date: Wed Oct 5 11:13:29 2022 +0000 PixelShuffle check that output is not null before applying kernel (#85155) (#86262) * Checks that output tensor is not null before applying kernel in `pixel_shuffle` op * Checks that output tensor is not null before applying kernel in `pixel_unshuffle` op * Add test case testing `pixel_shuffle` with shapes producing empty output * Add test case testing `pixel_unshuffle` with shapes producing empty output Fixes #85155 FYI @lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/86262 Approved by: https://github.com/lezcano commit 9d6109c4b049df29fe6c4fc8288670f4104c3249 Author: Philip Meier Date: Wed Oct 5 10:33:26 2022 +0000 improve annotations (#86105) In `torchvision` we started to use tensor subclasses. 
With the current annotations, this minimal example throws three errors when checking with `mypy`: ```py from typing import Type, TypeVar, Any, Optional, Union import torch T = TypeVar("T", bound="TensorSubclass") class TensorSubclass(torch.Tensor): def __new__( cls: Type[T], data: Any, *, dtype: Optional[torch.dtype] = None, device: Optional[Union[torch.device, str, int]] = None, ) -> T: return torch.as_tensor(data, dtype=dtype, device=device).as_subclass(cls) ``` ``` main.py:16:16: error: Incompatible return value type (got "Tensor", expected "T") [return-value] main.py:16:58: error: Argument "device" to "as_tensor" has incompatible type "Union[device, str, int, None]"; expected "Optional[device]" [arg-type] main.py:16:78: error: Argument 1 to "as_subclass" of "_TensorBase" has incompatible type "Type[T]"; expected "Tensor" [arg-type] ``` I'll explain inline why the old annotations are wrong. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86105 Approved by: https://github.com/albanD commit 736adc08084814c13d57324f74adad091e304eb2 Author: Zachary DeVito Date: Tue Oct 4 21:50:27 2022 -0700 Memory snapshots from C++ (#86190) Sometimes the driving process want to save memory snapshots but isn't Python. Add a simple API to turn it on without python stack traces. It still saves to the same format for the vizualization and summary scripts, using the C++ Pickler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86190 Approved by: https://github.com/ezyang commit a348975e00081334ac96d855932d2753a62f1e77 Author: Jane Xu Date: Wed Oct 5 06:33:25 2022 +0000 Add opteinsum backend to give users control (#86219) This achieves the same things as https://github.com/pytorch/pytorch/pull/85908 but using backends instead of kwargs (which breaks torchscript unfortunately). This also does mean we let go of numpy compatibility BUT the wins here are that users can control what opt einsum they wanna do! The backend allows for..well you should just read the docs: ``` .. attribute:: torch.backends.opteinsum.enabled A :class:`bool` that controls whether opt_einsum is enabled (on by default). If so, torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html) to calculate an optimal path of contraction for faster performance. .. attribute:: torch.backends.opteinsum.strategy A :class:`str` that specifies which strategies to try when `torch.backends.opteinsum.enabled` is True. By default, torch.einsum will try the "auto" strategy, but the "greedy" and "optimal" strategies are also supported. Note that the "optimal" strategy is factorial on the number of inputs as it tries all possible paths. See more details in opt_einsum's docs (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html). ``` In trying (and failing) to land 85908, I discovered that jit script does NOT actually pull from python's version of einsum (because it cannot support variadic args nor kwargs). Thus I learned that jitted einsum does not subscribe to the new opt_einsum path calculation. Overall, this is fine since jit script is getting deprecated, but where is the best place to document this? 
- added tests to CI - locally tested that trying to set the strategy to something invalid will error properly - locally tested that tests will pass even if you don't have opt-einsum - locally tested that setting the strategy when opt-einsum is not there will also error properly Pull Request resolved: https://github.com/pytorch/pytorch/pull/86219 Approved by: https://github.com/soulitzer, https://github.com/malfet commit db13049b8844f3e7a15ebd902572639c19b21fc1 Author: Zachary DeVito Date: Tue Oct 4 20:03:44 2022 -0700 [allocator tracing] missing GIL acquire (#86254) Bug where the context destructor needs to hold the GIL to free the context. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86254 Approved by: https://github.com/ezyang commit d07b85393abd79d07ecfca7378b8f3c7342650a2 Author: Edward Z. Yang Date: Tue Oct 4 19:07:32 2022 -0700 SymInt fixes from symbolic-shapes branch (#86242) symintify a few inplace meta functions symintify resize_(), nbytes(), functionalization input mutations meta funcs for avg_pool2d_backward Pull Request resolved: https://github.com/pytorch/pytorch/pull/86242 Approved by: https://github.com/Chillee commit ac25c210e5452d360fcc8cf5ea96c85756e3e370 Author: David Berard Date: Tue Oct 4 23:42:20 2022 +0000 [jit][easy] remove deprecated escape sequence (#85987) Not sure why but this started throwing a lot of warnings while I was adding tests to test_freezing.py, so I'm removing the deprecated escape sequences to get rid of the warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85987 Approved by: https://github.com/eellison commit 2355b6256b9fcafc6e6f01301650538898ca7b8e Author: Nikita Shulga Date: Wed Oct 5 01:02:31 2022 +0000 Remove `std::cout` from `multinomial_out_mps` (#86246) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86246 Approved by: https://github.com/xuzhao9, https://github.com/seemethere commit 4f95f7ae9b664fec153ba2069f44b311238649d7 Author: Richard Zou Date: Tue Oct 4 10:48:59 2022 -0700 Remove unnecessary header (#86212) This appears to fix an internal build failure Pull Request resolved: https://github.com/pytorch/pytorch/pull/86212 Approved by: https://github.com/samdow commit 6d7235e3d391e10b24821ed97bd397fca19b8120 Author: albanD Date: Wed Oct 5 00:15:11 2022 +0000 enable cpu meta testing (#86226) Just add the relevant skips for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86226 Approved by: https://github.com/ezyang commit 1432b9978b9e3838a7940700fb54f89b63fc72e5 Author: lezcano Date: Tue Oct 4 21:14:45 2022 +0000 Add ref for cumsum (#86229) As noted in the comment, this decomposition may not be as efficient as specific implementations of it in different backends. Added here to then benchmark it. Note that this is needed by TorchInductor https://github.com/pytorch/torchdynamo/issues/883 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86229 Approved by: https://github.com/ngimel commit b317736c3990a4d42fe5be7de3944c1a2a6c2667 Author: Peter Bell Date: Tue Oct 4 21:14:00 2022 +0100 Fix default correction value in std/var decompositions (#85839) `torch.std` and `torch.var` default to the unbiased estimator, i.e. `correction=1`. This only works as is because the default on this overload is not exercised by the tests. 
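A quick numerical check of what that default means (illustrative only, not from the PR; assumes the `correction=` keyword overload available in recent releases): with the default `correction=1` the variance divides by N-1, with `correction=0` it divides by N.

```py
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
n = x.numel()
sq = (x - x.mean()).pow(2).sum()

# default (unbiased, correction=1): divides by n - 1
print(torch.var(x), sq / (n - 1))          # both ~1.6667
# correction=0 (biased): divides by n
print(torch.var(x, correction=0), sq / n)  # both 1.25
```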
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85839 Approved by: https://github.com/ezyang commit adb12438c127d1ded1c0ba027eb8621ca709e427 Author: Zafar Date: Tue Oct 4 22:44:13 2022 +0000 [AO] Cubic sparsity level scheduler (#85232) The scheduler updates the levels of sparsity based on https://arxiv.org/abs/1710.01878. The update rule is defined as: $$ \begin{aligned} s_t &= s_f + (s_i - s_f)\left( 1 - \frac{t - t_0}{n\Delta t} \right)^3 \\ \mbox{for} ~ t &\in \left\\{ t_0, t_0+\Delta t, \dots, t_0 + n\Delta t \right\\} \end{aligned} $$ There is a minor difference compared to the original paper. By providing `initially_zero` argument, one can set the level of sparsity before step $t_0$: If `False`, the sparsity level before $t_0$ is set to $s_i$, otherwise 0. ``` python test/test_ao_sparsity.py -- TestCubicScheduler ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85232 Approved by: https://github.com/junesg, https://github.com/jerryzh168 commit 248796987e45e565daa2515dffeb8c187650bf72 Author: Andrew Gu Date: Tue Oct 4 12:53:41 2022 +0000 [FSDP] Expose internal prefetch limits (#86198) This PR refactors the prefetching implementation to enable a module to prefetch more than one all-gather. - The motivation is for backward prefetching, but forward prefetching is included in the change as well. - The prefetching limit is a _limit_. In some edge cases (e.g. dynamic graph or first/last module), the limit may not be reached. - The prefetching limit is kept as internal in this PR -- it is set as local variables `backward_prefetch_limit` and `forward_prefetch_limit` in the `FullyShardedDataParallel` constructor and passed to the `_ExecOrderData()` constructor. - This PR additionally includes some clean up for forward prefetching but does not change any semantics assuming static graph. If we increase the `backward_prefetch_limit` to `2`, then a typical pattern may be that the first module in the pre-backward prefetches 2, but every next module only prefetches 1 since its first target was already prefetched by the previous. If we did not do this behavior, then with more modules, the prefetching would run further and further ahead. **`_handles_prefetched`** - This is used to avoid multiple modules prefetching the same handles keys. - `_handles_prefetched[handles_key]` is set to `True` when the prefetch for `handles_key` happens from the CPU thread (`_prefetch_handles()`). - `_handles_prefetched[handles_key]` is set to `False` when any handle in `handles_key` is resharded (`_reshard()`). - `_handles_prefetched` is cleared at the end of the backward (`_wait_for_post_backward()`). **`_needs_pre_backward_unshard`** - This is used to determine if a handles key should be backward prefetched at all. - `_needs_pre_backward_unshard[handles_key]` is set to `False` in the post-forward (`_register_pre_backward_hooks()`). - `_needs_pre_backward_unshard[handles_key]` is set to `True` in the post-forward if the forward outputs include tensors that require gradient (`_register_pre_backward_hook()`). - `_needs_pre_backward_unshard[handles_key]` is set to `False` in the pre-backward hook, after unsharding (`_pre_backward_hook()`). **`_needs_pre_forward_unshard`** - This is used to determine if a handles key should be forward prefetched at all. - `_needs_pre_forward_unshard[handles_key]` is set to `True` in the root's pre-forward (`_fsdp_root_pre_forward()`). - `_needs_pre_forward_unshard[handles_key]` is set to `False` in the pre-forward unshard (`_pre_forward_unshard()`). 
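As a standalone sketch of the cubic sparsity schedule quoted above, the update rule can be written as a small helper. The function below is hypothetical and only illustrates the formula; the PR itself implements this as a scheduler class in `torch.ao` with the `initially_zero` option described.

```python
def cubic_sparsity_level(t, t0, dt, n, s_i, s_f, initially_zero=False):
    """Sparsity s_t per the cubic rule quoted above (arXiv:1710.01878)."""
    if t < t0:
        # Before the schedule starts: either 0 or the initial sparsity level.
        return 0.0 if initially_zero else s_i
    if t >= t0 + n * dt:
        return s_f
    frac = (t - t0) / (n * dt)
    return s_f + (s_i - s_f) * (1.0 - frac) ** 3

# Ramps from s_i at t0 to s_f at t0 + n*dt.
print([round(cubic_sparsity_level(t, t0=0, dt=1, n=4, s_i=0.0, s_f=0.8), 3)
       for t in range(6)])
```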
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86198 Approved by: https://github.com/zhaojuanmao commit f20e4eab7b63a04f39e06ce1e535c45a85ae1672 Author: Jing Xu Date: Tue Oct 4 21:57:05 2022 +0000 Fix ITT unit-tests if PyTorch is compiled with `USE_ITT=OFF` (#86199) Fixes https://github.com/pytorch/pytorch/pull/84848#discussion_r986329680 @malfet @slgong-fb Pull Request resolved: https://github.com/pytorch/pytorch/pull/86199 Approved by: https://github.com/malfet commit d39e9c1e9087069fa774b0e3eb47e04750edca88 Author: Howard Huang Date: Mon Oct 3 16:45:22 2022 -0700 [6/N] [Dispatchable Collectives] Update recv with CPU / CUDA implementations (#83876) * - Updates for the recv collective https://github.com/pytorch/pytorch/issues/86225 Differential Revision: [D40044552](https://our.internmc.facebook.com/intern/diff/D40044552) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83876 Approved by: https://github.com/kwen2501 commit d447eff146118f42ef4146161a37aba7fc3ac069 Author: Jay Chae Date: Tue Oct 4 20:02:41 2022 +0000 [kineto] make ProfilerKineto the only option (#84714) Differential Revision: D39356665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84714 Approved by: https://github.com/slgong-fb, https://github.com/aaronenyeshi commit d724a9193560216234df13d2f55f71845daa72ba Author: Nirav Mehta Date: Tue Oct 4 19:43:54 2022 +0000 Adding Wunused-local-typedef build flag (#86154) In the past, we have seen PRs causing internal breakages caused by `-Wunused-local-typedef` flag which than had to be fixed. For example: [#79978](https://github.com/pytorch/pytorch/pull/79978) As part of this change, we want to catch this error in the PR Checks itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86154 Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/osalpekar commit 8da704cdb7f68bfa09516e7be17f004b98c48eb3 Author: Nikita Shulga Date: Tue Oct 4 19:01:48 2022 +0000 [MPS] Remove incorrect asserts from `Copy.mm` (#86184) Those asserts simply do not work for views. I.e. they are erroneously triggered for in `copy_to_mps_` when running something like `python -c "import torch;x=torch.empty(10,device='mps');y=torch.tensor([10]);print(x.shape);x[2]=y[0]"` And in `copy_from_mps_` when running the same script, but with order of devices inverted: `python -c "import torch;x=torch.empty(10);y=torch.tensor([10], device="mps");print(x.shape);x[2]=y[0]"` If this was supposed to be a boundary check, than it should have validated, that `storage_offset() + nbytes() <= storage.nbytes()`, but this check is already done by the upper layer, isn't it? Fixes https://github.com/pytorch/pytorch/issues/86153 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86184 Approved by: https://github.com/kulinseth commit 9da5646cdb378c37e222e176478eaabca585579d Author: Elias Ellison Date: Tue Oct 4 16:15:56 2022 +0000 Add device logic handling for functions which allow scalar inputs as tensors (#86149) Some functions allow scalars as tensor inputs. Add handling for them in device logic. Fix for https://github.com/pytorch/torchdynamo/issues/1445 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86149 Approved by: https://github.com/ezyang, https://github.com/bdhirsh commit d6b030856be34532e7bfeadf342c69dd9762fb13 Author: Khushi Date: Tue Oct 4 18:21:45 2022 +0000 [primTorch] special: j0, j1, spherical_j0 (#86049) Adds prims and refs for special functions (bessel_j0, bessel_j1, spherical_bessel_j0). Thanks! 
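For reference, the special functions that the prims/refs commit above targets can be exercised through their public ops (assuming a PyTorch build that ships them in `torch.special`, which is the case in releases from around this time onward):

```python
import torch

x = torch.linspace(0.1, 10.0, steps=5)

# Bessel functions of the first kind (orders 0 and 1) and the spherical
# Bessel function of order 0.
print(torch.special.bessel_j0(x))
print(torch.special.bessel_j1(x))
print(torch.special.spherical_bessel_j0(x))
```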
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86049 Approved by: https://github.com/mruberry commit 8bce2f3d22c454ed8000245d5f21c16ea9ac4b0d Author: Richard Zou Date: Mon Oct 3 13:31:11 2022 -0700 [easy] Add spaces to vmap over as_strided error message (#86150) Lack of spaces made it harder to read Pull Request resolved: https://github.com/pytorch/pytorch/pull/86150 Approved by: https://github.com/samdow commit e1859c0707a5624583f77476e1feed94e45f342a Author: Elias Ellison Date: Mon Oct 3 20:19:40 2022 +0000 delete special fake tensor new handling (#86144) Delete the special-cased handling of `new` in FakeTensor. Ever since the dispatch keys were updated to reflect the FakeTensor's device, the special cased handling was not needed. Fixes https://github.com/pytorch/torchdynamo/issues/1448 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86144 Approved by: https://github.com/ezyang commit 3f2e7d5c9a5569e4c2d4857d01697fc2bfbfe4fa Author: Howard Huang Date: Mon Oct 3 16:45:22 2022 -0700 [5/N] [Dispatchable Collectives] Update send with CPU / CUDA implementations (#83859) Differential Revision: [D40044550](https://our.internmc.facebook.com/intern/diff/D40044550) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83859 Approved by: https://github.com/kwen2501 commit a75edfa97c9d985a337fbea7b9c0f4061153aaf0 Author: Jing Xu Date: Tue Oct 4 08:20:13 2022 +0000 Move ITT testing to its own test case (#86174) Fixes https://github.com/pytorch/pytorch/pull/84848#discussion_r986329680 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86174 Approved by: https://github.com/malfet commit b95e0fcc2c40f08f43cc69edbb0168ab17facbda Author: Horace He Date: Tue Oct 4 04:25:19 2022 +0000 Forward fix land race (unexpected successes) (#86186) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86186 Approved by: https://github.com/ezyang commit 79dd621f76d8a1f9d780b0940c21665736b0b1d9 Author: Edward Z. Yang Date: Mon Oct 3 17:25:45 2022 -0700 Symbolic shapes mega merge PR (Oct 3) (#86160) - TensorGeometry supports symint - check_size supports symint - functorch batch rule improved symint - Some operator support for symint in LTC - More supported operations on SymInt and SymFloat - More symint support in backwards formulas This merge includes code contributions from bdhirsh and anjali411. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86160 Approved by: https://github.com/Chillee commit de75274883d15bfd0b70d5ebd1d3d03a6e4540a0 Author: Horace He Date: Mon Oct 3 22:40:49 2022 +0000 Symintified factory functions (#86067) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86067 Approved by: https://github.com/ezyang commit 82d9592f1baaf943b81bca13a51d655139f050aa Author: Horace He Date: Mon Oct 3 22:40:49 2022 +0000 Batch of symintifications to allow more models to pass in inference (#86104) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86104 Approved by: https://github.com/ezyang commit a4ff07f19754187d8c8aa722bab422a52152ba9c Author: Richard Zou Date: Mon Oct 3 13:31:11 2022 -0700 Stop modifying the global logger on `import functorch` (#86147) Fixes https://github.com/pytorch/pytorch/issues/85952 `logging.basicConfig` modifies the global logger which affects other programs. importing a package should generally be side-effect free so this PR gets rid of that call. 
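The side-effect-free logging pattern that the functorch fix above moves toward looks roughly like this. This is a generic library-side sketch, not functorch's actual code:

```python
import logging

# Library code: attach a module-level logger with a NullHandler and never call
# logging.basicConfig(); global configuration is left to the application.
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())

def do_work() -> None:
    logger.debug("doing work")  # only visible if the application configured logging

if __name__ == "__main__":
    # The application opts in to a global config; merely importing the library
    # no longer changes logging behavior for everyone else.
    logging.basicConfig(level=logging.DEBUG)
    do_work()
```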
Test Plan: - tested locally Pull Request resolved: https://github.com/pytorch/pytorch/pull/86147 Approved by: https://github.com/ezyang commit fe190078aa78dad94297aece4d8322e5f4262558 Author: soulitzer Date: Mon Oct 3 16:18:09 2022 -0400 Require bias to be contiguous for depthwise3x3_winograd backend (#85711) Fixes https://github.com/pytorch/pytorch/issues/85694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85711 Approved by: https://github.com/malfet, https://github.com/albanD commit bc1d884061dfb7bec0e1a442567ce9638959ad96 Author: George Qi Date: Mon Oct 3 19:04:52 2022 +0000 [maskedtensor] use masked_softmax for forward/backward instead of regular softmax (#85845) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85845 Approved by: https://github.com/cpuhrsch commit 0db9419e282c9cacaf9cd7bee3633d5d68219895 Author: Eli Uriegas Date: Mon Oct 3 11:51:14 2022 -0700 .github: Improve sanity check for generated files (#86143) Makes it so that generated files from .gitattributes do not affect the pr-sanity-check Tested using: (https://github.com/pytorch/pytorch/pull/86143) ``` ❯ BASE=d401732baadf2df666f242cd32db5df3b09dbec6 HEAD=eaf9aa24acf6a1fc68243935f4b33188a59bfdd2 bash .github/scripts/pr-sanity-check.sh INFO: Checking aginst the following stats + git diff --stat 6d06be89fe2b9ca30c3d97475dd192fc7e3f7357 eaf9aa24acf6a1fc68243935f4b33188a59bfdd2 + sed '$d' INFO: Showing non-generated files: + cat /tmp/tmp.mQeK24emtZ .github/scripts/test_trymerge.py | 2 +- .github/scripts/trymerge.py | 14 + INFO: PR SIZE is 16 ``` Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/86143 Approved by: https://github.com/malfet, https://github.com/albanD commit 5ca0f9e1d4ee411b52785407b79e26a6dddfb391 Author: Nikita Shulga Date: Mon Oct 3 22:50:04 2022 +0000 [GHF] Make EasyCLA unskippable (#86161) And make small update to the revert test to use mocked rules rather than latest ones Pull Request resolved: https://github.com/pytorch/pytorch/pull/86161 Approved by: https://github.com/zpao, https://github.com/weiwangmeta commit f3d7ab5438ff8740b4dd0403525a3c1400786e8f Author: Edward Z. Yang Date: Mon Oct 3 16:43:22 2022 -0400 Unconditionally register Python decomps to Meta key in Python Dispatcher (#85750) This makes them available for Python Dispatcher to service them when symbolic shapes are involved. This is needed because under certain conditions, functionalization will directly call the Meta kernel for a function in order to produce a properly sized output wrapper tensor for a view operation. This direct call bypasses the normal decomposition table mechanism. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85750 Approved by: https://github.com/wconstab commit 06ddb1c07e3426d5d9c719c63f949359773e9c42 Author: Huy Do Date: Mon Oct 3 22:18:06 2022 +0000 Revert "Disable XLA test (#86123)" (#86151) And also remove torch_patches/.torch_pin to mitigate the sev https://github.com/pytorch/pytorch/issues/86093 until XLA fixes the weird logic in https://github.com/pytorch/xla/blob/master/scripts/apply_patches.sh#L17-L18. 
Ticket cut to XLA https://github.com/pytorch/xla/issues/4068 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86151 Approved by: https://github.com/kit1980 commit cda815dc232b23f433a20782acda2f5e161ed99b Author: Paul O’Shannessy Date: Mon Oct 3 22:13:54 2022 +0000 Switch to checking EasyCLA on merge (#86127) This is part of the work required to switch over to the new PyTorch Foundation CLA (#85559). Pull Request resolved: https://github.com/pytorch/pytorch/pull/86127 Approved by: https://github.com/malfet commit dfde7cf3e211b9a0456fc4a14df89b80d40f1816 Author: originates <105183376+originates@users.noreply.github.com> Date: Mon Oct 3 22:09:59 2022 +0000 ANTIALIAS updated to Resampling.LANCZOS in torch/utils/tensorboard/summary.py (#85679) **Line 492: ANTIALIAS updated to Resampling.LANCZOS** Removes the following Depreciation Warning: `DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). ` `Use Resampling.LANCZOS instead.` --- ``` try: ANTIALIAS = Image.Resampling.LANCZOS except AttributeError: ANTIALIAS = Image.ANTIALIAS image = image.resize((scaled_width, scaled_height), ANTIALIAS) ``` Now Resampling.LANCZOS will be used unless it gives an AttributeError exception in which case it will revert back to using Image.ANTIALIAS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85679 Approved by: https://github.com/albanD commit 2494c318c40d344025adf0ad4322471401dee24d Author: Mikayla Gawarecki Date: Mon Oct 3 18:13:17 2022 +0000 [easy] fix nested view call taking in more than one -1 (#86134) https://github.com/pytorch/pytorch/pull/85691 (allowing only one -1 in nested view/reshape) broke this. Was not caught by CI but internal tests are broken Pull Request resolved: https://github.com/pytorch/pytorch/pull/86134 Approved by: https://github.com/cpuhrsch commit 6a842e33c6b847cfedc68315b06b0645d51d9a28 Author: Kulin Seth Date: Mon Oct 3 21:05:30 2022 +0000 MPS: Add multinomial op (#80760) Add multinomial with replacement Pull Request resolved: https://github.com/pytorch/pytorch/pull/80760 Approved by: https://github.com/razarmehr, https://github.com/malfet commit 37013bb443c4cef95675300f371ff0263ed303ca Author: Horace He Date: Mon Oct 3 16:59:03 2022 +0000 Added _unsafe_view decomp (#86103) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86103 Approved by: https://github.com/ezyang commit 40a8cc28e78292dac55ac77fc5dc7fbda9428698 Author: Abhishek Pathak Date: Mon Oct 3 20:38:03 2022 +0000 [MPS] Cast dot inputs to int32 when needed (#86140) Fixes https://github.com/pytorch/pytorch/issues/85758 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86140 Approved by: https://github.com/kulinseth, https://github.com/malfet commit 954660a3083e5f3dcf014ae475b53fc181281be0 Author: Edward Z. Yang Date: Mon Oct 3 09:29:49 2022 -0700 Correctly error if you pass in tensors where size arguments expected (#86126) This also makes symintlist track intlist exception handling, which eellison fixed. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86126 Approved by: https://github.com/eellison commit 2aa9e0750acfabe99c59869b232102ab3cc62ae5 Author: Edward Z. Yang Date: Mon Oct 3 08:36:09 2022 -0700 Symintified all functions, not including factory functions (#86078) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86078 Approved by: https://github.com/Chillee, https://github.com/albanD commit cb87983cb8f4a26928f9852d96de63da6d4f363c Author: Edward Z. Yang Date: Mon Oct 3 08:36:09 2022 -0700 Decay integer-only (Optional)SymIntArrayRef to IntList in IValue (#86094) We have logic that says if you ask for a SymIntList from an IValue, but the IValue is actually an IntList, we will still give it to you in that case (check ivalue_to_arg in aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h). However, we also need the *inverse* version of this logic, which says that if you construct an IValue from a SymIntArrayRef, and it is actually integer only, we need to store it as an IntList, so that toIntList on the IValue will work. The way this works is a bit twisty, but our basic strategy is to disable construction of IValue from list container types that contain SymInt directly, and then directly implement variants of these constructors by hand, which iterate over the elements of the list and test if there are any SymInts or not to decide what type to construct the underlying List. These variants have to be templated, otherwise we will run afoul ambiguous overloads. I only did the overloads that actually occurred in practice; you may need to add more if you SymIntify more stuff. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86094 Approved by: https://github.com/anjali411, https://github.com/albanD commit 146db41eb95e3430f088c3045326616d9eec1874 Author: Edward Z. Yang Date: Mon Oct 3 07:22:21 2022 -0700 Preserve/strip OptionalSymIntArrayRef when finding real schema (#86114) Missed this one because I forgot you also have to update it. Thankfully the new Metal CI caught it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86114 Approved by: https://github.com/anjali411 commit 1da74929d95e70bfd0e6e031f6b21a5b05513a63 Author: ruki Date: Mon Oct 3 20:00:53 2022 +0000 Fix compile error for vs2022 #79358 (#85958) Fixes #79358 - #79358 - https://github.com/xmake-io/xmake-repo/pull/1503#issuecomment-1263104439 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85958 Approved by: https://github.com/ngimel commit 36634d78da398787e54e4737c55e4b0a20894cb2 Author: Justin Chu Date: Mon Oct 3 17:14:22 2022 +0000 [ONNX] Remove registration in __init__ (#86130) Remove unused import in `__init__.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86130 Approved by: https://github.com/BowenBao commit e01d616ba9f6d39726a2710d4431afe637074de4 Author: Richard Zou Date: Mon Oct 3 09:25:39 2022 -0700 Re-introduce the functorch docs build (#85838) (#86125) We deleted it when merging functorch into pytorch. This PR makes a new functorch docs build. The docs are relatively simple: - cd into `functorch/docs` and run `make html` to build the docs. - docs should get pushed to the pytorch/functorch repo's gh-pages branch. The long term plan is: - one day, the functorch APIs will just be torch.* APIs, at which point we can fold all of the functorch docs into the regular PyTorch docs - When that happens, the functorch examples and tutorials (that are on the functorch docs site) can be moved to the pytorch examples and pytorch tutorials. 
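The "decay" rule from the IValue/SymIntArrayRef commit above can be illustrated with a small Python analogue. The names here are hypothetical and the real logic lives in the C++ IValue constructors; this only shows the decision being described.

```python
from typing import List, NamedTuple, Tuple, Union

class FakeSymInt(NamedTuple):
    """Hypothetical stand-in for a symbolic integer (not torch.SymInt)."""
    name: str

def pack_sizes(values: List[Union[int, FakeSymInt]]) -> Tuple[str, list]:
    # If nothing in the list is symbolic, store it as a plain int list so that
    # "give me an IntList" accessors keep working; otherwise keep the symbolic form.
    if all(isinstance(v, int) for v in values):
        return "IntList", [int(v) for v in values]
    return "SymIntList", list(values)

print(pack_sizes([2, 3, 4]))                    # ('IntList', [2, 3, 4])
print(pack_sizes([2, FakeSymInt("s0"), 4])[0])  # 'SymIntList'
```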
Test Plan: - check docs preview - watch this PR after it goes in Differential Revision: [D40026222](https://our.internmc.facebook.com/intern/diff/D40026222) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86125 Approved by: https://github.com/atalman, https://github.com/malfet commit 75c0e3a471c19b883feca15fd4ecfabedf746691 Author: Ramin Azarmehr Date: Mon Oct 3 18:40:16 2022 +0000 [MPS] Improve memory usage and performance utilizing garbage collector and adaptive commit (#86119) - Improve memory usage and performance utilizing garbage collector and adaptive commit - Enable low watermark limit to detect memory pressure. - Enable garbage collection and adaptive commit strategies when under memory pressure. - More efficient resource management by splitting large heaps (instead of reusing oversized buffers for smaller allocation requests) - Introduce Extra Large heaps to improve performance by avoiding numerous costly allocation of smaller heaps - Fix purgeability when releasing the Metal heaps - Fix the race condition when deferring the heap's size update Fixes #79283 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86119 Approved by: https://github.com/kulinseth, https://github.com/malfet commit 8860e489949ca02926539ed11d01f307952d6017 Author: Abhishek Pathak Date: Mon Oct 3 18:12:48 2022 +0000 [MPS] Handle compatible inputs to where (#85946) Inputs with different number of dimensions but compatible shapes were being rejected e.g. x.shape = [10,1,10] y.shape = [10,10] cond.shape = [10,10,1] Pull Request resolved: https://github.com/pytorch/pytorch/pull/85946 Approved by: https://github.com/malfet commit 2f692236fe8cbaeda641ec8e837f3b6da8b4d754 Author: Nikita Shulga Date: Mon Oct 3 17:41:43 2022 +0000 [GHF] Add commit statuses to checkruns conclusions (#86129) Needed to surface CircleCI/EasyCLA checks to the `mergebot` rules Pull Request resolved: https://github.com/pytorch/pytorch/pull/86129 Approved by: https://github.com/huydhn commit cd6477617c24f3f9ce2d35a1d956d0f7d68110d9 Author: Driss Guessous Date: Mon Oct 3 17:36:36 2022 +0000 Custom sdp implementations dense (#85984) - This code creates the runtime dispatch system for choosing a performant fused SDP kernel. The only choice of fused kernel is flash_attention. It also creates python flags and a context manager that can be used to turn off and on behavior for dispatch. - This also adds support for flash_attention with dense tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85984 Approved by: https://github.com/cpuhrsch commit 8d9472d7d402983c696836630c1034d56dfb3d87 Author: vfdev Date: Mon Oct 3 17:35:44 2022 +0000 [skip-ci] Fixed bad link in build_ci_governance.rst (#85933) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85933 Approved by: https://github.com/albanD commit 85d520d448dd9fcaccd324029c3f4e4462913133 Author: Masaki Kozuki Date: Mon Oct 3 17:32:07 2022 +0000 [docs] Add `torch.channels_last_3d (#85888) As per title, updating https://pytorch.org/docs/master/tensor_attributes.html#torch-memory-format. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85888 Approved by: https://github.com/ngimel commit 2067b768fc8ffa181d6e9dc9d62e1696e9cf4ef8 Author: Chien-Chin Huang Date: Fri Sep 30 15:20:20 2022 -0700 [FSDP] Delay moving tensor to CPU until necessary for optim_state_dict() (#85761) Optimizer state_dict currently move tensors to CPU() immediately after allgather(). However, for sharded optimizer state_dict, this moving is duplicated. 
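The shapes from the MPS `where` commit above are broadcast-compatible despite having different numbers of dimensions; the snippet below shows the expected result shape (it runs on CPU too, and on an MPS device it exercises the fixed path):

```python
import torch

# Shapes taken from the commit message: different ranks but broadcastable.
cond = torch.rand(10, 10, 1) > 0.5
x = torch.randn(10, 1, 10)
y = torch.randn(10, 10)

out = torch.where(cond, x, y)
print(out.shape)  # torch.Size([10, 10, 10])
```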
We should wait until all the sharding are done. This PR may slightly reduce the performance of full optimizer state_dict as it has to allocate more memory than w/o this PR. But the benchmark shows the memory allocation is pretty light. Differential Revision: [D39855912](https://our.internmc.facebook.com/intern/diff/D39855912/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39855912/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/85761 Approved by: https://github.com/rohan-varma commit e23cede0aa8986c103c87a61c3f97a4203218a0f Author: PyTorch MergeBot Date: Mon Oct 3 17:22:31 2022 +0000 Revert "Require bias to be contiguous for depthwise3x3_winograd backend (#85711)" This reverts commit 9a126702ce5a73d3409be8bb7cd04a9fbd7d162a. Reverted https://github.com/pytorch/pytorch/pull/85711 on behalf of https://github.com/huydhn due to This breaks functorch/test_vmap with some unexpected successes https://hud.pytorch.org/pytorch/pytorch/commit/9a126702ce5a73d3409be8bb7cd04a9fbd7d162a commit c670bad72ff2af7eb75dfa3a924754c7fd5a2370 Author: Jesus Magana Date: Mon Oct 3 17:22:04 2022 +0000 Update dist.scatter() documentation (#86069) Update documentation for dist. scatter Fixes #84566 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86069 Approved by: https://github.com/rohan-varma, https://github.com/H-Huang commit 2403d0c25829f6d74b8246dcdc6fce8f4aff1106 Author: Cuiqing Li Date: Mon Oct 3 17:20:58 2022 +0000 implementation of qmul using xnnpack (#86040) Summary: implementation of qmul using xnnpack Test Plan: buck run caffe2/test:quantization -- quantization.core.test_quantized_op.TestQNNPackOps Reviewed By: digantdesai, kirklandsign Differential Revision: D39701867 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86040 Approved by: https://github.com/digantdesai commit 7941b042a73266e786d7367ad26a5bc4760b4fe1 Author: Catherine Lee Date: Mon Oct 3 16:59:39 2022 +0000 parallelize at file granularity (#85770) part two of https://github.com/pytorch/pytorch/pull/84961 tests files in parallel at the test file granularity * 2 procs at a time * number of tests ran changed by <200, possibly due to adding more tests on master between the base commit and head commit of the PR * may cause flakiness, but I haven't seen it in my small sample size of this PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/85770 Approved by: https://github.com/huydhn commit d401732baadf2df666f242cd32db5df3b09dbec6 Author: Codrin Popa Date: Mon Oct 3 16:56:22 2022 +0000 Added roundup_bypass_threshold_mb knobs to the PyTorch Caching Allocator (#85940) Summary: Added an additional roundup knob( ``roundup_bypass_threshold_mb``) to bypass rounding the requested allocation size, for allocation requests larger than the threshold value (in MB). This can help reduce the memory footprint when making large allocations that are expected to be persistent or have a large lifetime. 
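For context on the `dist.scatter()` documentation update above, a minimal usage sketch. It assumes a process group has already been initialized with `world_size` ranks; only the source rank passes `scatter_list`.

```python
import torch
import torch.distributed as dist

def scatter_example(rank: int, world_size: int) -> torch.Tensor:
    """Each rank i ends up with a tensor filled with the value i."""
    out = torch.empty(4)
    if rank == 0:
        chunks = [torch.full((4,), float(i)) for i in range(world_size)]
        dist.scatter(out, scatter_list=chunks, src=0)
    else:
        dist.scatter(out, src=0)  # non-source ranks only receive
    return out
```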
Differential Revision: D39868104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85940 Approved by: https://github.com/zdevito commit bc993e39cc3c2c37e58a88ae3071b6f5e73ef8fc Author: Horace He Date: Mon Oct 3 07:11:53 2022 +0000 Unwrap SymInt => Proxy when being returned from the wrapped function make_fx traces (#86098) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86098 Approved by: https://github.com/ezyang commit 470f8fb9e55f2083f1053ed75e27768cdaf2747b Author: Richard Zou Date: Fri Sep 30 09:30:22 2022 -0700 Fix functorch/test/test_control_flow (#85981) The tests weren't being run in PyTorch CI. On deeper investigation, it looks like the test file doesn't work under the unittest test runner (it works under pytest though). This PR enables running these tests under unittest and also marks things that now fail as expected failure. We should fix these at some point. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85981 Approved by: https://github.com/samdow, https://github.com/voznesenskym commit a262ccea58946cd9efb5e7d4a38032b40996a237 Author: Richard Zou Date: Fri Sep 30 13:19:46 2022 -0700 Change torch.autograd.graph.disable_saved_tensors_hooks to be public API (#85994) Also addresses some comments from the review in https://github.com/pytorch/pytorch/pull/85971 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85994 Approved by: https://github.com/albanD, https://github.com/soulitzer commit 6d06be89fe2b9ca30c3d97475dd192fc7e3f7357 Author: Huy Do Date: Mon Oct 3 16:19:12 2022 +0000 Disable XLA test (#86123) This is related to https://github.com/pytorch/pytorch/issues/86093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86123 Approved by: https://github.com/ZainRizvi commit 5fa840103b4eac16a3fc87bb26ebf701fbd1666c Author: PyTorch MergeBot Date: Mon Oct 3 16:08:18 2022 +0000 Revert "Re-introduce the functorch docs build (#85838)" This reverts commit 0449cf0c9e469f052bb9316b13260d126d6f01d4. Reverted https://github.com/pytorch/pytorch/pull/85838 on behalf of https://github.com/atalman due to Break internal build commit 9a126702ce5a73d3409be8bb7cd04a9fbd7d162a Author: soulitzer Date: Fri Sep 30 23:50:00 2022 -0400 Require bias to be contiguous for depthwise3x3_winograd backend (#85711) Fixes https://github.com/pytorch/pytorch/issues/85694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85711 Approved by: https://github.com/malfet, https://github.com/albanD commit d253d6ec0c1c086b9d3be98b421d224ff20b734e Author: Nikita Shulga Date: Mon Oct 3 15:04:33 2022 +0000 [Metal][BE] Fix signed/unsigned compare (#86068) To enable Metal builds in OSS Guard `[self dealloc]` call in `MPSImageWrapper.mm` with `#if !__has_feature(objc_arc)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/86068 Approved by: https://github.com/ezyang commit 4a528bc16fa52ea94384861bea84cfb61d9a645c Author: Vasiliy Kuznetsov Date: Fri Sep 30 16:41:34 2022 -0700 remove vkuzo from CODEOWNERS for AO (#86038) Summary: I was added to various places in https://github.com/pytorch/pytorch/pull/79505, this is too noisy to be useful so taking myself off. Always happy to help when folks tag me manually. 
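A small sketch of the now-public `disable_saved_tensors_hooks` API mentioned above, assuming the behavior its docs describe: installing saved-tensor hooks inside the region raises with the supplied message, while ordinary autograd is unaffected.

```python
import torch

x = torch.randn(3, requires_grad=True)

with torch.autograd.graph.disable_saved_tensors_hooks("saved-tensor hooks are disallowed here"):
    y = (x * x).sum()          # normal autograd still works
    try:
        with torch.autograd.graph.saved_tensors_hooks(lambda t: t, lambda t: t):
            pass
    except RuntimeError as err:
        print("blocked:", err)  # raised with the message given above

y.backward()
print(x.grad)
```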
Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/86038 Approved by: https://github.com/HDCharles commit 68a6113248ac25841b524d59f9dc0f298b389ba2 Author: Ivan Yashchuk Date: Mon Oct 3 15:03:08 2022 +0000 Add nvFuser support for torch.native_batch_norm (#85562) This PR adds nvFuser's implementation for batch_norm as there's no reference yet (https://github.com/pytorch/pytorch/pull/81191) and no in-place copy support (https://github.com/pytorch/pytorch/pull/84545). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85562 Approved by: https://github.com/kevinstephano, https://github.com/ngimel commit d28a882319d92bae17827101787adf838a05df0a Author: Justin Chu Date: Mon Oct 3 14:34:27 2022 +0000 [ONNX] Remove excessive deprecation messages (#86065) The deprecation messages in SymbolicContext will be emitted every time it is initialized. Since we already emit deprecation messages at registration time, the deprecation decorator can be removed in `__init__` to reduce noise. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86065 Approved by: https://github.com/BowenBao commit 6cd9c447daa083478c4272674f0e80ff4e0c6a5a Author: Edward Z. Yang Date: Sun Oct 2 21:15:03 2022 -0700 Make test_api compile on DEBUG mode with some compiler versions (#86092) The symbol seems to conflict under some compiler versions, giving an error like "relocation refers to global symbol which is defined in a discarded section". Simple enough to put it in an anonymous namespace, so why not. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86092 Approved by: https://github.com/Chillee commit 368e8e7520f95bec7a82653beccd9779522f854d Author: Edward Z. Yang Date: Sun Oct 2 21:14:59 2022 -0700 Skip, don't xfail, nondeterministic as_strided_scatter test (#86091) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86091 Approved by: https://github.com/Chillee commit 1f157099fa359a4e504cd19c7fb5019858a5d36c Author: Edward Z. Yang Date: Sun Oct 2 18:00:15 2022 -0700 Teach remove_symint to handle OptionalSymIntArrayRef (#86088) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86088 Approved by: https://github.com/Chillee, https://github.com/anjali411 commit bd32f9a833c911a7acaddcafba48806c1b94f6d0 Author: Edward Z. Yang Date: Sun Oct 2 17:12:40 2022 -0700 Correct ownership of OptionalSymIntArrayRef in backwards (#86087) Also add some cheap but cheerful sanity checks to help detect similar situations in the future. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86087 Approved by: https://github.com/albanD commit 6fd5d6397a58860ba60178a22574f9701eab061a Author: vfdev Date: Mon Oct 3 10:57:08 2022 +0000 [Docs] Updated torchvision people (#85931) cc @datumbox @pmeier Pull Request resolved: https://github.com/pytorch/pytorch/pull/85931 Approved by: https://github.com/fmassa, https://github.com/datumbox commit 5322f00151cace0aaec2701d1ab75c648b86d592 Author: John Detloff Date: Mon Oct 3 06:43:06 2022 +0000 Re-add benchmarking files to ios TestApp (#85539) Fixes #76033 The benchmarking code in the iOS TestApp was removed a while back as dead code: https://github.com/pytorch/pytorch/pull/64849 I believe this was done in error - as this leaves our TestApp empty, nothing occurs when it runs. And we still have a tutorial up demonstrating how to use the benchmarking feature of the TestApp. 
This diff restores the files that were deleted, with some minor tweaks for compatibility with changes that have happened since they were deleted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85539 Approved by: https://github.com/kimishpatel commit 2b5625a726372f7a6e1fcfd60e687e57b329a7f6 Author: Rohan Varma Date: Mon Oct 3 06:15:20 2022 +0000 Update hierarchical_model_averager.py (#85648) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85648 Approved by: https://github.com/wayi1, https://github.com/H-Huang commit 6a1e3f2f3720fd92514b385f8177edf669082961 Author: Nikita Shulga Date: Mon Oct 3 05:51:22 2022 +0000 Update fbgemm submodule (#86054) Reland of https://github.com/pytorch/pytorch/commit/481def752cc001ff8ac7e3b723ece11aa1110c77 Fixes https://github.com/pytorch/pytorch/issues/85956 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86054 Approved by: https://github.com/xuzhao9 commit acd2f21ea130ad74bc68ed938044dfb20ff4c205 Author: Taylor Robie Date: Sun Oct 2 16:07:52 2022 -0700 [Profiler] Update python binding type annotations (#85722) The annotations for `torch._C._profiler` have gotten a bit stale. This PR simply brings them up to date. There is one small quality of life change that alters behavior: instead of returning device type and index separately we return a `torch.device` object. Differential Revision: [D39852803](https://our.internmc.facebook.com/intern/diff/D39852803/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85722 Approved by: https://github.com/chaekit commit 5ed338a55b8d320a851d9461edddb92a6d8b8b90 Author: Taylor Robie Date: Sun Oct 2 16:07:51 2022 -0700 [Profiler] Add dtype to `_TensorMetadata` (#85721) `Inputs.dtypes_` stringifies the dtypes; however this loses information which is hard to recover and useful for analysis. So this PR adds full `torch.dtype` info for Tensors. Differential Revision: [D39852802](https://our.internmc.facebook.com/intern/diff/D39852802/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85721 Approved by: https://github.com/chaekit commit ba95984588f100d781c1700218c2f7cd77cf380a Author: Taylor Robie Date: Sun Oct 2 16:07:49 2022 -0700 [Profiler] Make `name` a property. (#85720) This is just a quality of life change. `.name` is 30% fewer characters than `.name()`. I should have done this from the start. Differential Revision: [D39788873](https://our.internmc.facebook.com/intern/diff/D39788873/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85720 Approved by: https://github.com/chaekit commit dcac4dd58edefb6951a60266e53d8767dc9be002 Author: Jianyu Huang Date: Mon Oct 3 03:29:08 2022 +0000 Add int32_t range check in packed_accessor32 in PyTorch TensorBase (#86085) Summary: As ajtulloch suggested, we can make tensor.packed_accessor32<...>() raise an exception if tensor.numel() > std::numeric_limits::max(). Trade-off: run-time check overhead (one-time) when doing `packed_accessor32` accessor. Differential Revision: D39996275 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86085 Approved by: https://github.com/ngimel commit aabf3e234b532d76b05cb76d837638905e68bb77 Author: Edward Z. Yang Date: Sun Oct 2 17:12:37 2022 -0700 Allow functionalize_aten_op to work with non-SymInt signature. (#86080) This is done similarly to how we did CPU fallback template. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86080 Approved by: https://github.com/wconstab commit 21e00d5accd3ddd8a138c2e4a805a7a38bfc8847 Author: Edward Z. Yang Date: Sun Oct 2 13:24:47 2022 -0700 Fix type of as_float_unchecked (#86075) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86075 Approved by: https://github.com/wconstab commit 8753703b6804796007b5974ad2bca6e14a7a61c1 Author: Edward Z. Yang Date: Sun Oct 2 12:50:17 2022 -0700 Fix some bugs in SymFloat IValue and toPyObject handling (#86072) - Test for symbolic cases first before non-symbolic, as symbolic ints/floats advertise as being ints/floats - Add missing case for toPyObject Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86072 Approved by: https://github.com/wconstab commit a66506b136766fb75c818283e48697166d1e7cbe Author: Edward Z. Yang Date: Sun Oct 2 16:08:03 2022 -0400 Revert "Revert "Build and run Metal tests in CI (#86062)"" (#86073) This reverts commit 195184e69cda79678590c759719b1dc1d7ef6d09. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86073 Approved by: https://github.com/malfet commit 07ce0b435b5c4197836b9f08342e566a46c55961 Author: lezcano Date: Sun Oct 2 21:59:42 2022 +0000 Remove backward for im2col and col2im (#85542) `im2col` is a linear map, and `col2im` is its adjoint. As such, the adjoint to `col2im` is `im2col` (the adjoint of the adjoint is the original function. There's no point having explicit derivatives in ATen for these functions, so this PR deletes all these. Furthermore, along the way, we fix an error for the derivative of im2col for non-batched inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85542 Approved by: https://github.com/soulitzer, https://github.com/ngimel commit 99ca25e6eb8299f31824bdbaf62f16f8a8db458d Author: Michael Fisher <86859628+MFisherBE@users.noreply.github.com> Date: Sun Oct 2 22:55:34 2022 +0000 Misspelling Correction PR common_methods_invocations.py (#86081) Noticed a misspelling while looking at Issue #85712. This fix just fixes the mispelling on line #3107. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86081 Approved by: https://github.com/ngimel commit e6dd2965af330d4aaad49de4551ee87df3007ee8 Author: Horace He Date: Sun Oct 2 17:42:36 2022 +0000 A bunch of coverage improvements (re for models in inference snext50, BERT_pytorch, mobilenet_v3_large, pytorch_CycleGAN_and_pix2pix, dcgan, resnet18, mnasnet1_0) (#86050) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86050 Approved by: https://github.com/ezyang commit b8bf60445938e988c020478ebf0c98ec19d24416 Author: Horace He Date: Sun Oct 2 16:50:09 2022 +0000 Ported linear to symints (#86021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86021 Approved by: https://github.com/ezyang commit b9b24c31fda46d8403a28403898f129127e3f35e Author: Nikita Shulga Date: Sun Oct 2 20:13:05 2022 +0000 [MPS] Fix non-contig to contig tensor copy (#86056) This handles a rare case when MPS tensor is constructed from non-contiguous CPU tensor. Fixes https://github.com/pytorch/pytorch/issues/85967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86056 Approved by: https://github.com/janeyx99 commit 007e12a3e956d5a4362415664a252d9090ab57ac Author: Peter Bell Date: Sun Oct 2 11:29:07 2022 +0100 OpInfo: Extend natural syntax to allow adding metadata (#85890) Splitting into a seperate PR in case of bike shedding. 
We can't use the normal fluent syntax `SampleInput(x).name("foo")` because `.name` is already how the metadata is accessed. So instead, this adds a single function where you pass keyword arguments to fill in the metadata, e.g. ``` SampleInput(x).with_metadata( name="foo", output_process_fn_grad=out_fn) ``` An alternative closer to the normal fluent style would be to adding a prefix to the property's name, e.g. ``` (SampleInput(x) .with_name("foo") .with_output_process_fn_grad(out_fn)) ``` However, I have a slight preference for the `with_metadata` style because you don't need to add extra parenthesis to break lines. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85890 Approved by: https://github.com/mruberry commit ed5f95048e1151944786ab6437fca63df0800051 Author: Peter Bell Date: Sun Oct 2 11:29:07 2022 +0100 OpInfo: Add natural syntax for SampleInput creation (#85723) Most SampleInput objects currently have no additional metadata, meaning they have a 1:1 mapping with a normal function call. This adds var arg forms of the `SampleInput` constructor such that you can just call the `SampleInput` constructor as you would call the operator. So, for example ```python SampleInput(make_arg(shape), args=(2, 3), kwargs=dict(alpha=4)) ``` becomes ```python SampleInput(make_arg(shape), 2, 3, alpha=4) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85723 Approved by: https://github.com/mruberry commit 195184e69cda79678590c759719b1dc1d7ef6d09 Author: PyTorch MergeBot Date: Sun Oct 2 19:08:30 2022 +0000 Revert "Build and run Metal tests in CI (#86062)" This reverts commit f88bf8de2cba75377baf469b3dd3f8bc415ee7d2. Reverted https://github.com/pytorch/pytorch/pull/86062 on behalf of https://github.com/huydhn due to Breaking trunk https://hud.pytorch.org/pytorch/pytorch/commit/f88bf8de2cba75377baf469b3dd3f8bc415ee7d2 commit 36380897553063c7b433f738671ff23c5ad58ced Author: Edward Z. Yang Date: Sun Oct 2 06:43:50 2022 -0700 Ported reshape to symints and added a shim for BC (#85998) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85998 Approved by: https://github.com/ezyang commit f88bf8de2cba75377baf469b3dd3f8bc415ee7d2 Author: Edward Z. Yang Date: Sun Oct 2 11:33:20 2022 -0400 Build and run Metal tests in CI (#86062) Fixes https://github.com/pytorch/pytorch/issues/84172 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86062 Approved by: https://github.com/kimishpatel, https://github.com/malfet commit cd5ac15d5d6273ccefc6d84d79a6daf6d612ab1d Author: Edward Z. Yang Date: Sat Oct 1 22:47:08 2022 -0400 Fix internal/external desync for Metal hotfix (#86061) For some reason, the fbcode to GitHub sync landed the wrong version of the PR. This corrects the synchronization problem, and actually makes the Metal backend work. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86061 Approved by: https://github.com/malfet commit b26eafec079a18bc331f569a7e35497129feed71 Author: Kulin Seth Date: Sun Oct 2 15:27:52 2022 +0000 [MPS] Specialized memory pool for scalar values. 
(#85817) - Add buffer usage and debug verbosity flags to MPSAllocator - Add high_watermark_ration to limit the memory allocation Pull Request resolved: https://github.com/pytorch/pytorch/pull/85817 Approved by: https://github.com/razarmehr commit 481def752cc001ff8ac7e3b723ece11aa1110c77 Author: Nikita Shulga Date: Sun Oct 2 15:05:34 2022 +0000 Update fbgemm submodule (#86054) Fixes https://github.com/pytorch/pytorch/issues/85956 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86054 Approved by: https://github.com/xuzhao9 commit f183a989a21473cae84bc23e5e3cbbf8a087b8c0 Author: Elias Ellison Date: Thu Sep 29 21:20:53 2022 +0000 Fix fake tensor kernel nesting (#85920) If you e.g. printed within a decomp which would call `in_kernel_invocation_manager`, on the exit from the manager it would unilaterally remove meta from the tls / set the tensor to return its real device. We should just restore what the existing state was. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85920 Approved by: https://github.com/ezyang, https://github.com/bdhirsh, https://github.com/huydhn commit 365498f673681a09ee67b54493a664ea646b036a Author: Edward Z. Yang Date: Sat Oct 1 10:08:53 2022 -0700 Add rmod support to SymIntNode (#86053) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86053 Approved by: https://github.com/wconstab commit c857b3e73ec707a08d44bd2d01ab03e61ee44380 Author: Edward Z. Yang Date: Sat Oct 1 06:53:57 2022 -0700 More fixes for LTC symint interlock. (#86043) Now, we also avoid translating SymInt to valueT if you haven't asked for a SymInt implementation. This makes embedding_dense_backward work without changes to LTC. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86043 Approved by: https://github.com/wconstab commit 0060d871df2710a98211db3683bd48b1b648e9e0 Author: Edward Z. Yang Date: Sat Oct 1 06:53:57 2022 -0700 Add a bunch of extra functionality to SymFloat (#86046) - SymInt to SymFloat conversion - All the basic arithmetic operators on c10::SymFloat Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86046 Approved by: https://github.com/wconstab commit 833edeb020084377dcff27e8ef8b0a2af115fb27 Author: Will Constable Date: Sun Oct 2 00:00:46 2022 +0000 Register py metas to py dispatcher so they are used by functionalization (#86057) - this ensures python metas are always used during symbolic tracing/functionalization without overshadowing c++ metas during eager runtime Pull Request resolved: https://github.com/pytorch/pytorch/pull/86057 Approved by: https://github.com/ezyang commit b562987c28b37009d2d95d9506b67e3c16fab83e Author: PyTorch MergeBot Date: Sat Oct 1 19:30:21 2022 +0000 Revert "Fix fake tensor kernel nesting (#85920)" This reverts commit c2d9ea7f4b54c7d4332bc457fd76238c61f129de. 
Reverted https://github.com/pytorch/pytorch/pull/85920 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but I suspect that it causes a flaky memory leak issue in TestFakeTensorCUDA.test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32 commit fe89cd6c57477dc265895f946ff89d5cae047d0f Author: Nikita Shulga Date: Sat Oct 1 17:21:31 2022 +0000 [BE] Use reusable workflows from test-infra (#86035) Instead of local copies, use workflows checked into test-infra by https://github.com/pytorch/test-infra/pull/783 Thought about deleting the actions later, but if I understand how GHA merges work, older PRs merged onto this changes should not cause any problems as it will immediately reference actions from test-infra Pull Request resolved: https://github.com/pytorch/pytorch/pull/86035 Approved by: https://github.com/kit1980 commit 92c2295ab4b5ccdedcc32227c1125a4daf9e2759 Author: Edward Z. Yang Date: Sat Oct 1 06:53:57 2022 -0700 Remove dead ts_native_functions.yaml entries (#86045) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86045 Approved by: https://github.com/Chillee commit 2f703c5956f3c861c80d5ac736ff2aeba6dfb476 Author: Edward Z. Yang Date: Sat Oct 1 06:53:56 2022 -0700 SymInt-ify TypeAndSize (#86044) Commit originally by anjali411, with bugfix from Edward. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86044 Approved by: https://github.com/Chillee commit 07800c9c815abc1b478f0292e376d7c27e94b053 Author: Edward Z. Yang Date: Sat Oct 1 06:53:56 2022 -0700 Miscellaneous fixes from symbolic-shapes branch (#86042) - Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is expected - Binding for symintlistOptional in python arg parser - Teach translate to convert from IntArrayRef to ArrayRef - Don't query _symint function for meta info in LTC unless LTC is code generating a symint function Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042 Approved by: https://github.com/Chillee commit d9273e8b6b42dec1cd5b52779075912bee854130 Author: kshitij12345 Date: Sat Oct 1 06:32:19 2022 +0000 [functorch] refactor: get_exhaustive_batched_inputs (#85965) `get_exhaustive_batched_inputs_batch_norm_is_training` and `get_exhaustive_batched_inputs` are same except for a couple of lines. We move the above functionality into `generate_vmap_inputs` (which is now only function to create batched inputs) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85965 Approved by: https://github.com/zou3519 commit a5a2f576a768f01b14d2742e8fd7a478a2ab01d3 Author: PyTorch MergeBot Date: Sat Oct 1 02:49:06 2022 +0000 [vision hash update] update the pinned vision hash (#85776) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85776 Approved by: https://github.com/pytorchbot commit 05d1128106e50075b0fd7d667680214ace34306c Author: Ke Wen Date: Sat Oct 1 00:59:39 2022 +0000 [c10d] Start deprecating *_multigpu APIs (#85961) - For most users, training is on one GPU per process, so these APIs are rarely used - They added one more API dimension - They can be expressed in a composed manner - They are not abstracted; they are specific to GPU - They caused backend APIs and implementations to have nested `std::vector`s, which are hard to read or maintain Pull Request resolved: https://github.com/pytorch/pytorch/pull/85961 Approved by: https://github.com/XilunWu, https://github.com/H-Huang commit 463283e016ffa7d8a0da35a1d28c8b8ab0db2ea7 Author: Ke Wen Date: Sat Oct 1 00:55:27 2022 +0000 [c10d] Start deprecating *_coalesced APIs (#85959) - We consider that general users need not use the `*_coalesced` APIs unless there is an extreme concern about performance. - We are investigating using a context manager named `coalescing_manager` which wraps multiple individual collectives to compose the coalescing hint, rather than giving each collective a *_coalesced variant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85959 Approved by: https://github.com/XilunWu, https://github.com/H-Huang commit bf667c63e7c76cb7bfb6ef8cb8d844d6c301937b Author: Ramin Azarmehr Date: Sat Oct 1 00:33:23 2022 +0000 Fix the error with constant_pad_nd for 4D+ padding (#85991) - We warn the user and fall back to the default implementation for 4D+ constant padding Fixes #84535 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85991 Approved by: https://github.com/kulinseth commit be29ca97169e2621acf67e87020f461da3032129 Author: Chien-Chin Huang Date: Fri Sep 30 11:15:15 2022 -0700 [FSDP] Ignore buffers that are non-persistent. (#85740) A buffer can be registered as non-persistent. A non-persistent buffer won't be in the state_dict. Differential Revision: [D39858689](https://our.internmc.facebook.com/intern/diff/D39858689/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85740 Approved by: https://github.com/awgu, https://github.com/rohan-varma commit db4c6fe54fd043bb249657be4054252ca5f78b36 Author: PyTorch MergeBot Date: Fri Sep 30 23:54:49 2022 +0000 Revert "[maskedtensor] use masked_softmax for forward/backward instead of regular softmax (#85845)" This reverts commit a4d10342e98b0abb3286a3780617afe108328ac7.
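The non-persistent buffer behavior that the FSDP change above now respects is standard `nn.Module` behavior and can be seen in isolation:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("running_stat", torch.zeros(4))               # saved in state_dict
        self.register_buffer("scratch", torch.zeros(4), persistent=False)  # skipped by state_dict

print(list(M().state_dict().keys()))  # ['running_stat'] -- 'scratch' is ignored
```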
Reverted https://github.com/pytorch/pytorch/pull/85845 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks CUDA test_softmax_cuda (main.TestBasicsCUDA) commit 9bf9db57be42a9d2ba77e3042578ac439848aec1 Author: Horace He Date: Fri Sep 30 20:19:39 2022 +0000 Refactored recomputable ops a bit and added a bunch more ops (#85993) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85993 Approved by: https://github.com/ngimel commit e09a84a184e1687f4ddc7f3fc875eaaf5b9ec74f Author: Horace He Date: Fri Sep 30 20:18:43 2022 +0000 Removed debug output that doesn't work with faketensors (#85992) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85992 Approved by: https://github.com/ngimel commit 4b86a9359ae1cc0dd9e9b0480eee72850c7565b6 Author: Xia, Weiwen Date: Fri Sep 30 23:44:45 2022 +0000 [Quant] Make x86 backend default when querying qconfig (#85461) This PR is a follow-up of #84329 [[Quant] Add unified x86 quant backend](https://github.com/pytorch/pytorch/pull/84329) It makes the `x86` backend the default when querying `qconfig`. Users get x86's qconfig/qconfig_mappings if the backend is not specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85461 Approved by: https://github.com/jgong5, https://github.com/vkuzo commit fd553c46f401bdce1c74b3251495a72940729d5e Author: jjsjann123 Date: Fri Sep 30 23:19:25 2022 +0000 nvprim op support runtime checks on dtype compatibility on prims.convert_element_type (#85566) I'm seeing an issue where we lower `_to_copy` into `nvprims.convert_element_type`. In cases where we are casting to a dtype that's not supported by nvfuser, this raises a runtime error. I added a quick check in the lowering part where each op can peek at the fx.node and make a runtime decision on whether the given op should be lowered to nvprim. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85566 Approved by: https://github.com/IvanYashchuk, https://github.com/ngimel commit 01292cc9e498b74960a5e4de68dfd577f4cb14de Author: Nikita Shulga Date: Fri Sep 30 23:13:42 2022 +0000 [BE] Get rid of `std::result_of` in `c10` (#85977) As it is deprecated and to be removed in C++20 Fixes https://github.com/pytorch/pytorch/issues/85962 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85977 Approved by: https://github.com/kit1980 commit c2d9ea7f4b54c7d4332bc457fd76238c61f129de Author: Elias Ellison Date: Thu Sep 29 21:20:53 2022 +0000 Fix fake tensor kernel nesting (#85920) If you e.g. printed within a decomp which would call `in_kernel_invocation_manager`, on the exit from the manager it would unilaterally remove meta from the tls / set the tensor to return its real device. We should just restore what the existing state was.
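The shape of that fix is the usual save-and-restore pattern for nested context managers; a generic sketch (not the actual FakeTensor code):

```python
from contextlib import contextmanager

@contextmanager
def in_kernel_invocation(state: dict):
    # Save the previous flag and restore it on exit, instead of unconditionally
    # clearing it -- nested uses (e.g. a print inside a decomp) then behave.
    prev = state.get("in_kernel", False)
    state["in_kernel"] = True
    try:
        yield
    finally:
        state["in_kernel"] = prev

state = {"in_kernel": False}
with in_kernel_invocation(state):
    with in_kernel_invocation(state):
        pass
    assert state["in_kernel"] is True   # still inside the outer invocation
assert state["in_kernel"] is False
```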
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85920 Approved by: https://github.com/ezyang, https://github.com/bdhirsh commit 28061d50e6ea29a3400044f28a2c374ec8f4da17 Author: soulitzer Date: Fri Sep 30 15:40:25 2022 -0400 Lazily load decompositions for jvp (#85989) Reduces time it takes to run `python -c "import torch"` by ~10% See https://github.com/pytorch/pytorch/issues/85513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85989 Approved by: https://github.com/albanD, https://github.com/zou3519 commit 334686bde752a8b34d02aac069cf3f910f7d8b70 Author: Ramin Azarmehr Date: Fri Sep 30 22:57:57 2022 +0000 Fix the dimension of padding to match the input's dimension (#85990) Fixes #85143 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85990 Approved by: https://github.com/malfet, https://github.com/kulinseth commit 24fc680ee4228225c01fb6699210056ca2603a3f Author: andrewor14 Date: Fri Sep 30 11:14:30 2022 -0700 [Quant] Enable XNNPACK ops in QNNPACK BackendConfig (#85863) **Summary:** This commit enforces the following constraints on the QNNPACK BackendConfig: - `quant_min_lower_bound` = -127 for qint8 weight - `quant_max_upper_bound` = 127 for qint8 weight - `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight These constraints will enable users to use this BackendConfig with faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of https://github.com/pytorch/pytorch/pull/74396. Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric. Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well. **Test Plan:** python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config **Reviewers:** jerryzh168, vkuzo **Subscribers:** jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85863 Approved by: https://github.com/jerryzh168 commit d9421f81584145d17452864151d61aa694e601d5 Author: Fuzzkatt Date: Fri Sep 30 22:51:56 2022 +0000 added fix for WorkUCC (#84368) Added new constructor for WorkUCC to take in optional inputTensors argument for to enable record_shapes=True for profiling purposes. Tested at https://github.com/pytorch/pytorch/pull/84323 which manually merges in https://github.com/pytorch/pytorch/pull/83285. 
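The import-time saving from the "lazily load decompositions" change above comes from the standard lazy-initialization pattern; a generic sketch with hypothetical names, not the functorch code itself:

```python
_DECOMP_TABLE = None  # built on first use rather than at import time

def _build_decomposition_table():
    # Stand-in for the expensive registration work that used to run on import.
    return {"aten::example_op": lambda *args: args}

def get_decomposition_table():
    global _DECOMP_TABLE
    if _DECOMP_TABLE is None:
        _DECOMP_TABLE = _build_decomposition_table()
    return _DECOMP_TABLE

# Importing this module is cheap; the first call to get_decomposition_table()
# pays the one-time cost.
```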
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84368 Approved by: https://github.com/kingchc, https://github.com/kwen2501 commit a4cc63991ad351f1e98c4bac8955e34a0cb7b1a6 Author: Ramin Azarmehr Date: Fri Sep 30 22:40:50 2022 +0000 [MPS] Enable caching for random ops with Philox engine (#85833) Also fix a type cast issue in Bernoulli (Fixes #85611) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85833 Approved by: https://github.com/kulinseth, https://github.com/malfet commit 071f875046202b87213865dfc180abdf8368f116 Author: Digant Desai Date: Fri Sep 30 22:02:44 2022 +0000 [quant] Fix per channel weight observer (#85883) Summary: `per_channel_weight_observer_range_neg_127_to_127` now correctly uses `PerChannelMinMaxObserver` instead of `MinMaxObserver` Test Plan: Adds a new test `quantization.core.test_top_level_apis` to instantiate and run `forward()` on all `default` observers Differential Revision: D39916482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85883 Approved by: https://github.com/salilsdesai commit 6a5550fca4144b11f89f1db4e32205e8dc295cbd Author: Kshiteej K Date: Fri Sep 30 21:45:37 2022 +0000 [test_nn] split embedding tests from test_nn (#85892) Ref https://github.com/pytorch/pytorch/issues/63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85892 Approved by: https://github.com/albanD commit 2037b7cb609b5621e82e5fe09bc806ce463e90b6 Author: Edward Z. Yang Date: Fri Sep 30 10:01:35 2022 -0700 Make FunctionalTensorWrapper correctly handle symbolic shapes (#85975) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85975 Approved by: https://github.com/bdhirsh, https://github.com/albanD commit 3b6588ab7451d32516115f558ea08e0dec6c6d53 Author: Edward Z. Yang Date: Fri Sep 30 10:01:35 2022 -0700 Consistent compute numel/contiguous strategy with SymInts (#85858) Previously, our handling for contiguity was inconsistent in the following ways: - is_strides_like 2d/3d and is_non_overlapping_and_dense were always computed based on sizes_and_strides_, even if you had symbolic ints - Furthermore, even if you set custom policy for strides, these quantities were not overridable by subclasses - Furthermore, we didn't even store these fields on ExtraMeta - We duplicate implementations of compute_contiguous (plain, channels last, channels last 3d) - We inconsistently called refresh_numel()/refresh_contiguous(), versus recomputing it ourselves This refactor establishes a consistent strategy for all of the boolean fields, and for numel computation. After this refactor: - All layout boolean fields are interposable via strides policy and can be overridden from Python; you will never access a garbage field - All layout boolean fields are on ExtraMeta - You can always call refresh_numel/contiguous, no matter if your Tensor is contiguous or not - The numel/layout boolean fields are always populated consistently with the sizes strides fields (either on Tensor or ExtraMeta), even if you have custom policy - There is only one implementation of the actual computation logic Signed-off-by: Edward Z. Yang Differential Revision: [D39907696](https://our.internmc.facebook.com/intern/diff/D39907696) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85858 Approved by: https://github.com/albanD commit 84a06d71936e61ceeee2abb9c9cb7bf5ee6440dd Author: Edward Z.
Yang Date: Fri Sep 30 09:55:45 2022 -0700 Enable convolution_backward with bias and symints (#85970) Originally by Krovatkin from https://github.com/pytorch/pytorch/pull/85816 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85970 Approved by: https://github.com/albanD commit a4d10342e98b0abb3286a3780617afe108328ac7 Author: George Qi Date: Fri Sep 30 18:18:14 2022 +0000 [maskedtensor] use masked_softmax for forward/backward instead of regular softmax (#85845) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85845 Approved by: https://github.com/cpuhrsch commit 1c97084685f19435759f785d33fde7ea3a61afa7 Author: Nikita Shulga Date: Fri Sep 30 20:58:56 2022 +0000 [BE] Generate names of known device from array (#85982) Rather than hardcoding list of device names, generate it from list of known types. Performance is not important at the error codepath, as it will not be evaluated during normal codepath. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85982 Approved by: https://github.com/kit1980 commit 71eb04403ca46e19a3efcde454cedbc2f990dc12 Author: PyTorch MergeBot Date: Fri Sep 30 20:53:41 2022 +0000 Revert "[CUBLAS][CUDA GRAPHS] (re-re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85447)" This reverts commit b04b2fa9aa52cacbdc9aaaf477d55b0af845ce81. Reverted https://github.com/pytorch/pytorch/pull/85447 on behalf of https://github.com/seemethere due to Caused a CUDA memory leak, detected by our performance benchmark suite commit 401a358817b6657fc412b05fee6395f7e82a9226 Author: Catherine Lee Date: Fri Sep 30 20:44:12 2022 +0000 [ci] two procs for parallelization (#85985) hitting ooms on linux cuda so use 2 procs instead of 3 https://github.com/pytorch/pytorch/issues/85939 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85985 Approved by: https://github.com/huydhn commit e73e3e352312be7d4b293bed65da021a2fc81ab6 Author: Richard Zou Date: Fri Sep 30 09:30:18 2022 -0700 [functorch] test no warning on `import functorch` (#85980) Copied from https://github.com/pytorch/pytorch/blob/24adadd4dbcd90b5aba1d4a45847e4ffa83bd6cc/test/test_testing.py#L1808 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85980 Approved by: https://github.com/samdow commit eed5f0464c305974ae6a9cb8e6c685eb40c4477e Author: Richard Zou Date: Fri Sep 30 08:30:16 2022 -0700 [functorch] fix whirlwind tour ipynb (#85974) It was missing an "import torch" Pull Request resolved: https://github.com/pytorch/pytorch/pull/85974 Approved by: https://github.com/samdow commit 4c04fa9587fb534fa7c9848e06141bb862a56bb4 Author: Masaki Kozuki Date: Fri Sep 30 20:32:05 2022 +0000 Remove `optim_mt` from `test/test_optim.py` (#83549) As per title, this updates `test_optim.py` so that `foreach` optimizers are constructed using the `foreach` keyword argument of `torch.optim` optimizers. Also, this makes some cosmetic changes to remove `torch.autograd.Variable`, `.data` calls, and `torch._six`. 
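To illustrate the switch described in #83549 above: rather than importing the private `_multi_tensor` optimizer classes, tests (and users) can request the multi-tensor implementation through the keyword argument. A small sketch, assuming a build where `torch.optim` optimizers accept the `foreach` flag:

```python
import torch

model = torch.nn.Linear(4, 2)
# foreach=True selects the multi-tensor implementation, replacing the old
# torch.optim._multi_tensor entry points that the test previously constructed directly.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```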
Related: https://github.com/pytorch/pytorch/pull/81705#discussion_r939440776 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83549 Approved by: https://github.com/ngimel commit 94da90e41f171975cc455dcf42e80918d06d978b Author: albanD Date: Fri Sep 30 20:07:05 2022 +0000 LU solve/unpack fix to prevent bad memory usage on CPU (#85922) Fixes https://github.com/pytorch/pytorch/issues/77898 Fixes https://github.com/pytorch/pytorch/issues/85026 There is a minor perf impact but: - For lu_solve, the actual compute is going to be more expensive than this O(n) check (ones pass over the other matrices is O(n^2) in any case) - For lu_unpack, the check inside the kernel should be almost free. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85922 Approved by: https://github.com/ngimel, https://github.com/nikitaved commit 7238ca4c2e865acff66170909e701cccacee928a Author: Richard Zou Date: Fri Sep 30 07:52:30 2022 -0700 Disallow saved tensor hooks in functorch transforms (#85972) Technically they may only be a problem with the grad transform. Though the branch cut is soon, this is the more conservative change, it also lets us disable checkpointing for functorch (which definitely doesn't work with all transforms) and not a lot of people use saved tensor hooks with functorch (I discovered this while testing). Test Plan: - new tests Differential Revision: [D39970934](https://our.internmc.facebook.com/intern/diff/D39970934) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85972 Approved by: https://github.com/samdow commit 7c72bc48d88d55ff687f8adfaec41b2c5d7c659f Author: Richard Zou Date: Fri Sep 30 07:52:26 2022 -0700 Add mechanism to disable the "saved tensors hooks" feature (#85971) The rationale for this is that functorch doesn't work with saved variable hooks at the moment or checkpointing and we need some way to disable it. Concretely: - there's a context manager that does the disabling - this feature is disabled on a thread-local basis - one can set an error message or use the default error message that says the feature has been disabled Since it is thread local I needed to update ATen/ThreadLocalState. To make things nicer, this PR refactors all the "saved tensors hooks" related TLS things into a single struct. Test Plan: - new test Differential Revision: [D39970936](https://our.internmc.facebook.com/intern/diff/D39970936) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85971 Approved by: https://github.com/albanD, https://github.com/soulitzer commit 69b927701a6369d90e273edca812bb9546aca67f Author: Justin Chu Date: Fri Sep 30 19:35:34 2022 +0000 [ONNX] Update user documentation (#85819) - Remove mentions of `SymbolicContext` in the doc - Comment out the PythonOp example so that it is not shown to users - Updated code blocks and wording - Changed to recommend using `pip` for installing onnx. 
Now adds a deprecation message to the docs (demo only): ![image](https://user-images.githubusercontent.com/11205048/193327649-f789b369-6b59-49e0-8bba-34a6785eb128.png) Fixes #85608 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85819 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 1fae890a07f180c64825faa214916dc0cbd6cb58 Author: samdow Date: Fri Sep 30 11:26:27 2022 -0400 fix grad silent correctness issue from view fn followed by an inplace fn (#85374) From https://github.com/pytorch/functorch/issues/1007, which was an issue where we would wrap aliases of unwrapped tensors and miss the inplace error message where we should have gotten it. Instead of keeping aliases unwrapped like I had originally wanted, this simplifies it slightly such that: (1) All tensors that were previously wrapped are still wrapped. This is occasionally important because of the 1-1 relationship between a tensor and autograd meta. By keeping the same number of wrapper tensors as before, we'll never have autograd try to write multiple autograd metas to the same tensor when it wouldn't before (2) The tensors that either were unwrapped tensors or aliases of unwrapped tensors now get a flag on them (now called `alias_of_unwrapped`). This way, they are still wrapper tensors (and don't have to potentially break autograd) but we can identify that they should be treated like an unwrapped tensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/85374 Approved by: https://github.com/zou3519 commit 8d99d6127ef49db8286f6fb5dc7ae7e634c92a22 Author: Antonio Kim Date: Fri Sep 30 19:25:38 2022 +0000 Add torch_lazy_all_numbers_special_scalars flag (#85902) This is to allow even non-zero and non-one scalars to appear as constants in the graph. The assumption being that none of them will change. The flag is set to `false` by default to preserve the original behaviour. CC: @wconstab @JackCaoG @ke1337 @vaibhavc-cerebras @glebk-cerebras Pull Request resolved: https://github.com/pytorch/pytorch/pull/85902 Approved by: https://github.com/wconstab commit be327ec08f320e256d444693dde65fe55831bc46 Author: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com> Date: Fri Sep 30 18:51:43 2022 +0000 [MPS] Fix base shape size for view ops in case of multiple slices (#85934) Fixes https://github.com/pytorch/pytorch/issues/84364, https://github.com/pytorch/pytorch/issues/85592 Fixes a bug for view ops where the base shape would be incorrectly determined. E.g. for the following tensor `torch.tensor([0.5, 0.5], device="mps")[1][None]`, we could consider the base shape of the parent tensor as 1, while the actual base shape is 2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85934 Approved by: https://github.com/kulinseth commit 8669f6d42691c2124414cc97d0061ea6a0143007 Author: Justin Chu Date: Fri Sep 30 18:33:00 2022 +0000 [ONNX] Fix layer_norm return type (#85979) When aten fallback is true, `_layer_norm_returns_normalized_input_mean_rstd` can return a single value. - Removed `_layer_norm_returns_normalized_input_mean_rstd` and have layer_norm call native_layer_norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85979 Approved by: https://github.com/BowenBao commit 7ddf167ba5db277e02f983a6bde2bc3f5fbe1caa Author: Prashant Kumar Date: Fri Sep 30 18:30:06 2022 +0000 Move the asserts in shape functions upsample_nearest_2d op. (#85801) The assert checks are moved to the top and the function now returns out.
This is needed by the downstream torch-mlir project to correctly determine the output type. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85801 Approved by: https://github.com/eellison commit b60ad2e5292db92e4b055abae78e692f5b8326f5 Author: George Qi Date: Thu Sep 29 23:51:05 2022 +0000 [maskedtensor] negative testing (#85938) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85938 Approved by: https://github.com/cpuhrsch commit 0a7d8b40b6f956a14b7ea02e04f596e914414c47 Author: Feisi Fu Date: Thu Sep 29 06:45:03 2022 +0000 Create a quantized in-place version CUDA ReLU function, relu_quantized_cuda_. (#85670) Summary: This and #85669 are to allow the relu function to run on a quantized tensor on cuda. That is, torch.relu(qa) for a quantized tensor qa on cuda. Test Plan: python test/test_quantization.py Previous PR that has been reverted: #85502. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85670 Approved by: https://github.com/dzdang, https://github.com/z-a-f commit eb650abc2c38f9f2f0e45f13877a4a57b8825cca Author: Pedro Nacht <15221358+pnacht@users.noreply.github.com> Date: Fri Sep 30 16:53:16 2022 +0000 Add OpenSSF Scorecard Action (#85412) Closes #85159 As per the linked issue, this PR adds the OpenSSF Scorecards GitHub Action, which automatically checks the repo's supply-chain security processes and reports results to the repo's Security dashboard. This current version of the workflow has the `id-token : write` permission. This is necessary in order to publish results to a public REST API the OpenSSF makes available for consumers to check participating projects' results. Naturally, if you'd rather not publish these results, I can modify the workflow to remove this behavior. The Action has an associated optional badge which can be added to the repo's README. However, given how PyTorch avoids badges, I have naturally not included it. (Let me know if you want it!) @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85412 Approved by: https://github.com/malfet, https://github.com/huydhn commit 7e5105dd113dd6b4a920a3952088e3563ede1375 Author: Catherine Lee Date: Fri Sep 30 16:51:28 2022 +0000 [ci] expand log file if failed (#85927) As in the title, expand the logs if the test file failed ex https://github.com/pytorch/pytorch/actions/runs/3155045945/jobs/5133566508 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85927 Approved by: https://github.com/huydhn, https://github.com/janeyx99 commit 9ba1630bd729a35e903a8c411e3e5341de5ba165 Author: Alexander Grund Date: Fri Sep 30 16:45:41 2022 +0000 Limit world size in test_fsdp_pure_fp16 (#85957) When using more than 5 GPUs for this test, the difference between the reference output tensor and the FSDP output tensor becomes too large, likely due to the usual floating point inaccuracies, especially as FP16 is used. So set the world size (i.e. the number of GPUs) to a maximum of 5. Fixes #78975 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85957 Approved by: https://github.com/awgu commit 3a13c8493a06973f671604b17dd9ef8836eec52c Author: Rohan Varma Date: Fri Sep 30 16:28:17 2022 +0000 [1.13] Mention optim_input future BC breakage (#85963) We should remove this arg when the release after 1.13 rolls around; enhance the warning to indicate it will be gone. We can do this as FSDP is still beta and can be BC breaking until we stabilize the API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85963 Approved by: https://github.com/awgu commit d003757a841b1f8691904257128d91fd699c6c54 Author: Will Constable Date: Fri Sep 30 16:10:31 2022 +0000 Clone symint on set_sizes_and_strides (#85878) From the perspective of having valid sympy expressions for any given size/stride property, we can have tensors inherit SymInts from each other (in cases where the size expression is unchanged, which is a common case). But we also use SymInts to let us build graph traces of our programs, and we need to be able to trace from a SymInt back to the tensor that it originated from in order to trace correct graphs. This change ensures each tensor starts with fresh SymInts. - note: our policy has already been to use PySymIntNode objects to store pointers to proxy-tracer objects for use during tracing - before making this change (to clone symints), sometimes we'd attempt to store more than one proxy-tracer object on the same symint and the last-stored one would clobber all the earlier ones. This would result in tracing the wrong graph in some cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85878 Approved by: https://github.com/ezyang commit 24adadd4dbcd90b5aba1d4a45847e4ffa83bd6cc Author: PyTorch MergeBot Date: Fri Sep 30 14:32:49 2022 +0000 Revert "Disallow saved tensor hooks in functorch transforms (#85829)" This reverts commit d8277d9075396a3188490c322648605927384ba5. Reverted https://github.com/pytorch/pytorch/pull/85829 on behalf of https://github.com/atalman due to Reverting since failed build-fisp-diff-linux_platform010-opt commit 801818f9e6bb8684a1c41dc6ef3c74ad62feeb4d Author: PyTorch MergeBot Date: Fri Sep 30 14:31:09 2022 +0000 Revert "Add mechanism to disable the "saved tensors hooks" feature (#85553)" This reverts commit 5aa183d2bc7372b4deb4e4b2f31017be9f13264c. Reverted https://github.com/pytorch/pytorch/pull/85553 on behalf of https://github.com/atalman due to Reverting since failed build-fisp-diff-linux_platform010-opt commit b13b10a8fab83c9c260e16a8cfb4d99140e9352b Author: erjia Date: Fri Sep 30 13:30:18 2022 +0000 Extend collate function that can register collate functions to handle specific types (#85748) As per request from Vision team, adding `collate` function with an extra argument of `collate_fn_map` to dispatch custom collate functions for non-collection objects and specific objects. If the type of batch element is not present in`collate_fn_map`, it will go through all keys in the insertion order to check if the type is a subclass of the key. If so, it will invoke the corresponding collate functions. And, `default_collate` will utilize the `collate` function with a few by default collate function for `int`, `float`, `str` and `numpy object`. Benefit: - Domain teams can register their own `collate` function to handle their specific type of objects - Easier for users to extend from the `collate` function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85748 Approved by: https://github.com/NivekT, https://github.com/pmeier commit b00a5359f750a75e3722327144a5ce2170f6e28a Author: Ivan Yashchuk Date: Fri Sep 30 12:01:45 2022 +0000 Add a way to skip lowering to nvprims (#85811) This PR adds `skip_ops` argument to `TorchRefsNvfuserCapabilityMode` and `NvfuserPrimsMode` which is an iterable of function names to be skipped in the translation to nvprims process. 
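Stepping back to the extensible `collate` mechanism of #85748 above, here is a self-contained sketch of the type-dispatch idea it describes — look up the element type in a map, then fall back to subclass checks in insertion order. This is illustrative only, not the actual `torch.utils.data` implementation:

```python
import torch


def my_collate(batch, collate_fn_map):
    elem_type = type(batch[0])
    if elem_type in collate_fn_map:                  # exact type match first
        return collate_fn_map[elem_type](batch)
    for registered, fn in collate_fn_map.items():    # then subclass match, in insertion order
        if isinstance(batch[0], registered):
            return fn(batch)
    return batch                                     # fall through: leave the batch untouched


collate_fn_map = {
    torch.Tensor: torch.stack,
    int: lambda b: torch.tensor(b),
    str: list,
}

print(my_collate([1, 2, 3], collate_fn_map))                               # tensor([1, 2, 3])
print(my_collate([torch.zeros(2), torch.ones(2)], collate_fn_map).shape)  # torch.Size([2, 2])
```

Domain libraries would register additional entries in such a map for their own element types, which is the benefit the commit message calls out.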
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85811 Approved by: https://github.com/mruberry, https://github.com/jjsjann123 commit 787028cadb7fe83986111ffb7ddb058a68b763c0 Author: lezcano Date: Thu Sep 29 18:19:57 2022 +0000 Implement col2im decomposition and fix im2col and add a few preconditions (#85541) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541 Approved by: https://github.com/jansel commit 1f38abb5d2d3b9458b395bb31b684aeef14ca99f Author: Ke Wen Date: Fri Sep 30 09:17:49 2022 +0000 Adopt ncclRemoteError (#85887) `ncclRemoteError` was added in NCCL 2.13 to indicate a network error or a remote process exiting prematurely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85887 Approved by: https://github.com/wanchaol commit 8f4edf1e1dc9419f0bab66a67c8f149d7b53fc25 Author: BowenBao Date: Thu Sep 29 20:18:37 2022 -0700 [ONNX] Initial version of diagnostics infrastructure. (#85107) This PR introduces a general Python diagnostics infrastructure powered by SARIF, and the exporter diagnostics module that builds on top of it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85107 Approved by: https://github.com/abock, https://github.com/justinchuby commit dab1c7c379d9597d8424aa55da888cfabe9ade3b Author: Nikita Shulga Date: Fri Sep 30 06:33:42 2022 +0000 Update trunk CUDA-10.2 to CUDA-11.7 (#85943) As CUDA-10.2 is finally disabled Pull Request resolved: https://github.com/pytorch/pytorch/pull/85943 Approved by: https://github.com/huydhn, https://github.com/atalman commit ade1c19612ae84654f76aa9e5c709de6d9654d72 Author: Ke Wen Date: Fri Sep 30 05:48:16 2022 +0000 Add reduce_scatter_tensor in place of _reduce_scatter_base (#85867) This is a twin PR similar to the one for `all_gather_into_tensor` (#85686). The philosophy for renaming `_reduce_scatter_base` instead of merging it is described in #85686. Cc @rohan-varma @H-Huang @crcrpar @ptrblck @mrshenli Pull Request resolved: https://github.com/pytorch/pytorch/pull/85867 Approved by: https://github.com/crcrpar, https://github.com/H-Huang commit 33401ee81f91d213a4c24ec0b4de266701179b48 Author: BowenBao Date: Thu Sep 29 13:42:34 2022 -0700 [ONNX] Rename 'sarif_om' to 'sarif' (#85918) 'sarif_om' was the module name in the original repository https://github.com/microsoft/sarif-python-om. But since we have moved along with various extensions, it wouldn't hurt to rename the module for clarity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85918 Approved by: https://github.com/abock, https://github.com/thiagocrepaldi, https://github.com/justinchuby commit 6bb0a36d0ecc886bf31ae917f244f039501a779a Author: BowenBao Date: Thu Sep 29 13:42:29 2022 -0700 [ONNX] Add type annotation for SARIF attributes (#85898) Separated from #85651 to highlight the type annotation changes. It should support all type annotations needed by SARIF, except for the dictionary types described verbally like the following example. For now it is only annotated as `Any`. To enable it, we will need to extend `jschema_to_python` tool to allow passing in type hints. ```json "messageStrings": { "description": "A set of name/value pairs with arbitrary names. Each value is a multiformatMessageString object, which holds message strings in plain text and (optionally) Markdown format. 
The strings can include placeholders, which can be used to construct a message in combination with an arbitrary number of additional string arguments.", "type": "object", "additionalProperties": { "$ref": "#/definitions/multiformatMessageString" } }, ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85898 Approved by: https://github.com/justinchuby, https://github.com/abock, https://github.com/thiagocrepaldi commit e9b254a025b493df7a5f16a4f3f4641f07adf44b Author: BowenBao Date: Thu Sep 29 13:42:28 2022 -0700 [ONNX] Migrate SARIF from attr to dataclasses (#85651) Move to dataclasses since PyTorch does not depend on `attr`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85651 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock, https://github.com/thiagocrepaldi commit 91667d1d218937eb85ac1db8e22b8ab94213be9f Author: BowenBao Date: Thu Sep 29 13:42:28 2022 -0700 [ONNX] Introduce SARIF (#85428) That's the parent issue tracking this and more follow-up tasks, so it will be kept open after this. This PR introduces the Python classes for the SARIF object model, along with the script for generation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85428 Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock, https://github.com/thiagocrepaldi commit 1ad0048b64d0e709482d387419947c9142b94b04 Author: Min Si Date: Fri Sep 30 05:13:48 2022 +0000 Refactor distributed to use absolute header path (#85780) Headers under torch/csrc/distributed may be referenced with a relative path, e.g., "". However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the "hipified" header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components in PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about the Meta internal complication. **How to test**: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera, https://github.com/huydhn commit 0b0ce72b250f4f65ceee6909ad8743e5174c3579 Author: Taylor Robie Date: Wed Sep 28 15:42:59 2022 -0700 [Profiler] Extend ID assignment to allocations and frees (#85719) This is necessary for memory profiling because we need to know how to interpret an allocation. However, there is a slight wrinkle: we don't know if an allocation is for a Tensor's StorageImpl until we see it used in a later call. (We could record outputs, however we're not willing to incur the overhead.) So we instead treat all allocations as relevant and then filter out some later. Otherwise the change to the ID assignment algorithm is minimal.
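For a sense of what the attr-to-dataclasses move in #85651 above amounts to, here is a tiny sketch of a SARIF-like record expressed with the standard library only. The field names are illustrative, not the generated SARIF classes:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    text: str
    markdown: Optional[str] = None  # optional Markdown rendering of the same message


@dataclass
class Result:
    rule_id: str
    message: Message
    related: List[str] = field(default_factory=list)


r = Result(rule_id="POE001", message=Message(text="example diagnostic"))
print(r)  # Result(rule_id='POE001', message=Message(text='example diagnostic', ...), related=[])
```

Using `dataclasses` keeps the generated object model dependency-free, which is the stated motivation for dropping `attr`.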
Differential Revision: [D39788870](https://our.internmc.facebook.com/intern/diff/D39788870/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85719 Approved by: https://github.com/chaekit commit 95681929e4c379c504d8a7761f8104118a5a16db Author: Edward Yang Date: Fri Sep 30 03:19:09 2022 +0000 Hotfix for S298125 (#85814) Summary: Crash error is:
```
Mismatch in kernel C++ signatures
  operator: aten::cat
  no debug info
  kernel 1: FN2at6TensorEN3c108ArrayRefIS0_EExE
    dispatch key: Metal
    registered at buck-out/gen/a1f97bbb/fbobjc/Libraries/FBPyTorchCore/torch_core_ig_ops_metal/aten/src/ATen/native/metal/ops/MetalConcat.mm:205
  kernel 2: FN2at6TensorERKN3c108IListRefIS0_EExE
    dispatch key: CPU
    registered at buck-out/gen/a1f97bbb/fbobjc/Libraries/FBPyTorchCore/torch_core_ig_ops_aten/RegisterCPU.cpp:29749
Exception raised from registerKernel at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:130 (most recent call first):
```
We fix it by changing the Metal kernel to take an IListRef instead of an ArrayRef. Test Plan: Build igios per https://www.internalfb.com/intern/wiki/IOS_On_Demand/iOS_On_Demand_Use_Guide/ and show it doesn't crash Differential Revision: D39888394 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85814 Approved by: https://github.com/SS-JIA commit a50d8864fc6a7821134a76927ff292575e5ecc85 Author: PyTorch MergeBot Date: Fri Sep 30 02:04:29 2022 +0000 Revert "Refactor distributed to use absolute header path (#85780)" This reverts commit 668082718aefce95ecc1b1c312ea6f127b2c662e. Reverted https://github.com/pytorch/pytorch/pull/85780 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it breaks the build due to a missing file commit 668082718aefce95ecc1b1c312ea6f127b2c662e Author: Min Si Date: Fri Sep 30 00:27:24 2022 +0000 Refactor distributed to use absolute header path (#85780) Headers under torch/csrc/distributed may be referenced with a relative path, e.g., "". However, relative paths cannot be gracefully handled by the Meta internal build when the NCCL PG is hipified to support AMD/RCCL, because the "hipified" header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components in PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute. See D39835774 for more details about the Meta internal complication. **How to test**: commit 9e5d199 removes -I./torch/csrc/distributed in compile options. Thus use it to verify we don't miss any relative path use of torch/csrc/distributed headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780 Approved by: https://github.com/kumpera commit 81b366a9dd519bc3d93d307bff81c279c9510e1b Author: Abhishek Pathak Date: Fri Sep 30 00:24:16 2022 +0000 [MPS] Handle scalar input for scatter and gather (#85842) Issue noticed in test consistency - "Indexing dim 0 is out of bounds of tensor" Pull Request resolved: https://github.com/pytorch/pytorch/pull/85842 Approved by: https://github.com/kulinseth commit 62a4fd7907f0f2c667f05aa9a4d1eec7190a6c83 Author: Abhishek Pathak Date: Fri Sep 30 00:19:14 2022 +0000 [MPS] Handle output shape for empty input in binary ops (#85836) Output of input shape [0,1,2] should be [0,1,2], not [0] i.e.
delay returning from empty input condition to resize/reshape the output accordingly Pull Request resolved: https://github.com/pytorch/pytorch/pull/85836 Approved by: https://github.com/DenisVieriu97, https://github.com/kulinseth commit ae93a4dc43a7103b1856b83456b247dd3395fe47 Author: Richard Barnes Date: Fri Sep 30 00:09:02 2022 +0000 Make launch check exit code depend on results (#85886) It is possible for code to land that doesn't check kernel launches for success (#85885) fixes such an issue. I think setting the return code of the linter is the correct way of handling this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85886 Approved by: https://github.com/ezyang commit dde43d083b2ece2f659d6ef45e18d4d24b55a1b2 Author: Andrey Date: Thu Sep 29 23:44:57 2022 +0000 [c10d] Reorder macros so they are defined before getting used (#85850) Summary: Move preprocessor macros all the way up, so they are defined before being used. Test Plan: existing tests Reviewed By: wanchaol Pull Request resolved: https://github.com/pytorch/pytorch/pull/85850 Approved by: https://github.com/wanchaol commit 9009393f46515c4dbc5ae5d9054e8c2df48ee5c5 Author: Justin Chu Date: Thu Sep 29 23:26:54 2022 +0000 [ONNX] Remove protocol dataclass (#85916) Remove the `_WithOp` protocol because it is not used and causes the dataclass `GraphContext` to not be able to init in some python versions. Reference to issue of dataclasses Inheriting from Protocol https://github.com/python/cpython/issues/89244 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85916 Approved by: https://github.com/BowenBao, https://github.com/abock, https://github.com/thiagocrepaldi commit 6a14fcb9223d27d4be1d19d24e535707f76a3e01 Author: Denis Vieriu Date: Thu Sep 29 23:23:00 2022 +0000 [MPS] Add support for aten::masked_select on mps (#119) (#85818) Reuse the `index.Tensor_out` implementation since it's already expanding the bool/byte indices to long tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85818 Approved by: https://github.com/kulinseth commit 85258ec17eb12b57943cb9b3157d2696f2097fbe Author: George Qi Date: Thu Sep 29 20:14:55 2022 +0000 Add mask_type=2 to masked_softmax for when mask.size() == input.size() (#85915) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85915 Approved by: https://github.com/cpuhrsch commit 6004c65af8fe9c5bbd12811dfb42f9e369b9ebce Author: Kevin Stephano Date: Thu Sep 29 23:06:15 2022 +0000 Fix rand_like nvprim meta function. (#85882) Really minor fix necessary to work with TorchDynamo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85882 Approved by: https://github.com/mruberry, https://github.com/jjsjann123 commit 103a21f4809f586b3c07aa37aa59e8b234dd2880 Author: vfdev Date: Thu Sep 29 22:43:07 2022 +0000 Update _torch_docs.py (#85924) Typo fix Pull Request resolved: https://github.com/pytorch/pytorch/pull/85924 Approved by: https://github.com/kit1980 commit c036fb3e7d50a4d239218e404f1d304669c035c3 Author: Yu Guo Date: Thu Sep 29 10:43:07 2022 -0700 assert lambda >= 0 in poisson distribution cuda kernel (#85906) fix https://github.com/pytorch/pytorch/issues/85731 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85906 Approved by: https://github.com/ngimel commit bc57306bdd6a041e64d77e8bc8fdb470e6ff0815 Author: Kazuaki Ishizaki Date: Thu Sep 29 21:41:59 2022 +0000 Fix typo under docs directory and RELEASE.md (#85896) This PR fixes typo in rst files under docs directory and `RELEASE.md`. 
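To make the reuse in #85818 above concrete: `masked_select` is equivalent to integer indexing once the boolean mask is converted to long indices, which is why the existing `index.Tensor_out` path could be shared. A small CPU-only sketch of that equivalence:

```python
import torch

x = torch.arange(6.0).reshape(2, 3)
mask = x > 2

a = torch.masked_select(x, mask)      # 1-D tensor of the selected elements
b = x[mask.nonzero(as_tuple=True)]    # same result via long (integer) indices
print(torch.equal(a, b))              # True
```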
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85896 Approved by: https://github.com/kit1980 commit 11224f34b8d1eedd9806168532c1ad5f2adb1508 Author: Yu Guo Date: Thu Sep 29 21:20:38 2022 +0000 assert weights being 1-d tensor in bincount (#85881) Summary: as title, fix https://github.com/pytorch/pytorch/issues/85777 Test Plan: unittest added Differential Revision: D39913476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85881 Approved by: https://github.com/mruberry, https://github.com/ngimel commit 6db3539e700ce7a81be356700f0803b2002bc63c Author: PyTorch MergeBot Date: Thu Sep 29 20:06:52 2022 +0000 Revert "Improve make_tensor performance for float and complex types (#85473)" This reverts commit a76995e584b880910f0724be98eb21773e8ed6e9. Reverted https://github.com/pytorch/pytorch/pull/85473 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to cause a bunch of flaky tests in pull and periodic commit 50000f3cdcc4f0c4e29ec20b52fd54723092b95a Author: Richard Zou Date: Thu Sep 29 14:46:23 2022 -0400 Align functorch docs with PyTorch's (#85856) This PR: - changes the header/footer to be the same as PyTorch docs - removes the functorch logo (we don't need it anymore, functorch has been adopted into PyTorch) - adjusts the functorch docs to make it clear that the page is functorch documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85856 Approved by: https://github.com/svekars, https://github.com/samdow commit 19c7a6b54b9b29991cfdc5607361c2f30f5a6248 Author: Richard Zou Date: Thu Sep 29 14:46:23 2022 -0400 [functorch] Update notebooks for latest release (#85855) This PR: - dedups our colab notebooks with the regular functorch notebooks. The colab notebooks were versions of the regular notebooks that had install instructions. Now that functorch is easier to install, we do not need those anymore. - fixes the colab links Test Plan: - build docs locally and tested them Pull Request resolved: https://github.com/pytorch/pytorch/pull/85855 Approved by: https://github.com/samdow commit 48b3582e28141d0f7ebc27dcbca5ead1825df76f Author: Richard Zou Date: Thu Sep 29 14:46:22 2022 -0400 [functorch] Update install instructions in docs (#85854) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85854 Approved by: https://github.com/samdow commit 06868004b7cf10241a68be74d35a536572e650bc Author: John Detloff Date: Thu Sep 29 19:49:11 2022 +0000 Remove codesigning from ios circleci workflows (#85630) This PR is a follow-up to https://github.com/pytorch/pytorch/pull/85597 which removes codesigning from our github action workflows. This is an analogous change to our circleci workflows. Since we only run TestApp on the simulator, we don't need to have this codesigning logic.
(And more pressingly, these dev cert is expiring at the end of the month and we don't have a replacement) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85630 Approved by: https://github.com/atalman, https://github.com/malfet commit a9183c0f9ecc9be47fcb7abf1b23204d26821aa8 Author: Tugsbayasgalan (Tugsuu) Manlaibaatar Date: Thu Sep 29 19:16:17 2022 +0000 Fix bug in PythonFallBack (#85795) Summary: Previously PythonCallBack fails to find interpreter to dispatch to when it encounters an op with OptionalTensorList parameter, this diff fixes that Test Plan: CI Differential Revision: D39881382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85795 Approved by: https://github.com/ezyang, https://github.com/bdhirsh commit fe87ae692f813934d1a74d000fd1e3b546c27ae2 Author: Alexander Grund Date: Thu Sep 29 18:36:33 2022 +0000 Fix `check_compiler_ok_for_platform` on non-English locales (#85891) The function checks the output of e.g. `c++ -v` for "gcc version". But on another locale than English it might be "gcc-Version" which makes the check fail. This causes the function to wrongly return false on systems where `c++` is a hardlink to `g++` and the current locale returns another output format. Fix this by setting `LC_ALL=C`. I found this as `test_utils.py` was failing in `test_cpp_compiler_is_ok` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85891 Approved by: https://github.com/ezyang commit 0449cf0c9e469f052bb9316b13260d126d6f01d4 Author: Richard Zou Date: Wed Sep 28 14:53:51 2022 -0700 Re-introduce the functorch docs build (#85838) We deleted it when merging functorch into pytorch. This PR makes a new functorch docs build. The docs are relatively simple: - cd into `functorch/docs` and run `make html` to build the docs. - docs should get pushed to the pytorch/functorch repo's gh-pages branch. The long term plan is: - one day, the functorch APIs will just be torch.* APIs, at which point we can fold all of the functorch docs into the regular PyTorch docs - When that happens, the functorch examples and tutorials (that are on the functorch docs site) can be moved to the pytorch examples and pytorch tutorials. Test Plan: - check docs preview - watch this PR after it goes in Pull Request resolved: https://github.com/pytorch/pytorch/pull/85838 Approved by: https://github.com/malfet commit 941d7a31f65d4c6da3b94178be254f7ac20d482e Author: Saliya Ekanayake Date: Thu Sep 29 17:28:58 2022 +0000 Pass group ranks and options to third party distributed backends (#73164) Fixes #73163 PyTorch's [_new_process_group_helper()](https://github.com/pytorch/pytorch/blob/9f541aa3aca768e7fbfa4a9d648b554f22b261f7/torch/distributed/distributed_c10d.py#L633) does not pass group's participating ranks to the backend. This PR adds the above capability. Also, refactors some variables for better clarity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73164 Approved by: https://github.com/kumpera commit e15a48def7f1e7b58710bcc4a3d18624948c5fbc Author: nikitaved Date: Thu Sep 29 17:12:04 2022 +0000 (bsr/csr) x dense mm (#85551) As per title. This implementation is not the most optimal and could be improved albeit with native kernels (i.e. block matching need not be materialized). Compared to existing kernels it offers: - Half float support (In fact, any dtype that supports `matmul` will work). - Arbitrary block sizes. 
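A minimal sketch of the blocked-sparse-times-dense product that #85551 above enables. This assumes a build containing that kernel and the `to_sparse_bsr` conversion, so treat it as illustrative rather than authoritative:

```python
import torch

dense = torch.randn(8, 8)
bsr = dense.to_sparse_bsr(blocksize=(2, 2))   # arbitrary block size, per the PR description
rhs = torch.randn(8, 4)

out = torch.matmul(bsr, rhs)                  # (bsr) x (dense) -> dense result
print(out.shape)                              # torch.Size([8, 4])
```

The PR notes that half precision (and any other dtype that supports `matmul`) should work as well, since the implementation defers to `matmul` on the materialized blocks.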
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85551 Approved by: https://github.com/amjames, https://github.com/cpuhrsch commit ef0baba23f65062cfda6dd25fa67d02dc1a06fea Author: Masaki Kozuki Date: Thu Sep 29 17:02:04 2022 +0000 Use `int64_t` for nll_loss with cuda inputs (#85395) Related #85005 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85395 Approved by: https://github.com/t-vi, https://github.com/lezcano commit 5f26df0345bef35de9cbf585ca0c1af1cd91b9c8 Author: Masaki Kozuki Date: Thu Sep 29 16:58:59 2022 +0000 resubmit: "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)" (#85739) Embarrassingly move the pow implementations around [ATen/native/cuda/PowKernel.cu#L21-L66](https://github.com/pytorch/pytorch/blob/849b08f14b2a741d0b90bb7bfce0ebb3d07d1981/aten/src/ATen/native/cuda/PowKernel.cu#L21-L66) to a new header file and let FusedAdam use them to tame MSVC, hopefully. cc @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85739 Approved by: https://github.com/ngimel commit 44eefb1376b5a05568f047afde4193e951293625 Author: Kimish Patel Date: Tue Sep 27 18:06:32 2022 -0700 Update debug flag for vulkan (#85715) DEBUG is too generic name and when building some other target it seems to conflict with it, so defining VULKAN_DEBUG Differential Revision: [D39449772](https://our.internmc.facebook.com/intern/diff/D39449772/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39449772/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/85715 Approved by: https://github.com/SS-JIA commit ad3bea58daa0de46e7f2a5f2b3a397f9b3aba5fb Author: Kimish Patel Date: Tue Sep 27 18:06:29 2022 -0700 Add vulkan qualifier to the kernel name (#85714) This helps with any post processing as well as distinguishing the kernel name appearing in the chrome trace Differential Revision: [D39473299](https://our.internmc.facebook.com/intern/diff/D39473299/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85714 Approved by: https://github.com/SS-JIA commit e0af12a0765738e1367b3288be413d3f3737522e Author: Kimish Patel Date: Tue Sep 27 18:06:27 2022 -0700 [Pytorch][benchmark vulkan] Fix vulkan profiling (#85713) This diff: - adds interface to enable/disable profiling - Fixes profiling bug where ticks measured by timestamp queries are not accounting for timestampPeriod. Differential Revision: [D39449769](https://our.internmc.facebook.com/intern/diff/D39449769/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39449769/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/85713 Approved by: https://github.com/SS-JIA commit e0170c7cded06d4e91a2ee8a0512b80ed4b66e6e Author: Richard Zou Date: Wed Sep 28 09:59:18 2022 -0700 Remove torch/extension.h dependency in torch/csrc/functorch/init.cpp (#85659) This file doesn't depend on APIs there. Required adding some namespacing to symbols. 
Test Plan: - build & test Pull Request resolved: https://github.com/pytorch/pytorch/pull/85659 Approved by: https://github.com/Chillee commit 8fb470e81ad29d82204235194b809b8072942c70 Author: kshitij12345 Date: Thu Sep 29 15:40:09 2022 +0000 [fix] max_pool1d: shape check (#85594) Fixes #76587 Before PR: ```python import torch max_pool = torch.nn.MaxPool1d(3) t = torch.rand([17, 0, 50], dtype=torch.float32) # note requires_grad is False max_pool(t) # Worked and returned tensor of shape [17, 0, 48]. ``` After PR ```python import torch max_pool = torch.nn.MaxPool1d(3) t = torch.rand([17, 0, 50], dtype=torch.float32) # note requires_grad is False max_pool(t) # Errors with `max_pool1d: Expected 2D or 3D (batch mode) tensor with optional 0 dim batch size for input, but got: [17, 0, 48]` ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85594 Approved by: https://github.com/mruberry commit cab6ffa0f7a12ce50d50831910922014392ee173 Author: jjsjann123 Date: Thu Sep 29 15:22:45 2022 +0000 catches failure on nvprim speculative lowering (#85580) Fixes #85517 Added a try/catch exception during tracing `get_isolated_graphmodule` inside `_is_func_unsupported_nvfuser`. Stops speculative lowering to nvprim when query errors out. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85580 Approved by: https://github.com/mruberry, https://github.com/IvanYashchuk commit a807f1987a60324075ab0690af3b5f7cf9ecf319 Author: atalman Date: Thu Sep 29 15:04:24 2022 +0000 Stop cuda-10.2 binary builds (#85873) Deprecate cuda 10.2 nightly Pull Request resolved: https://github.com/pytorch/pytorch/pull/85873 Approved by: https://github.com/malfet commit 3cdf621fe5c8f8378fda209b8a143443d33b2086 Author: Jane Xu Date: Thu Sep 29 14:28:55 2022 +0000 Add opt-einsum to CI (#85574) Depends on https://github.com/pytorch/pytorch/pull/84890. This PR adds opt_einsum to CI, enabling path optimization for the multi-input case. It also updates the installation sites to install torch with einsum, but those are mostly to make sure it would work on the user's end (as opt-einsum would have already been installed in the docker or in prior set up steps). This PR also updates the windows build_pytorch.bat script to use the same bdist_wheel and install commands as on Linux, replacing the `setup.py install` that'll become deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85574 Approved by: https://github.com/huydhn, https://github.com/soulitzer commit b25a1ce22d965a852da5979e7d6af9fe91518451 Author: Eric Zhang Date: Thu Sep 29 14:17:05 2022 +0000 Release GIL when doing shared memory copies on Tensors (#85389) See discussion here for context: https://pytorch.slack.com/archives/GEEQ2K4MD/p1663672716533319?thread_ts=1662155536.133099&cid=GEEQ2K4MD, opening a PR as suggested by @albanD Currently PyTorch holds the GIL when copying Tensors into shared memory. For certain workloads it would be nice to be able to copy different tensors into shared memory in parallel, but with the GIL being held the copies can't truly run in parallel. 
Here's a short example of this:
```
import torch
import time
from multiprocessing.pool import ThreadPool

tensors = []
for i in range(64):
    for j in range(8):
        t = torch.ones(128, 480, 640).type(torch.uint8) * i
        tensors.append(t)
print("Done generating input tensors")

with ThreadPool(processes=8) as pool:
    futures = []
    before = time.time()
    for t in tensors:
        future = pool.apply_async(t.share_memory_)
        futures.append(future)
    for f in futures:
        f.get()
    elapsed = time.time() - before
    print("ELAPSED TIME", elapsed)
```
With this diff, I get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 3.561321258544922
~$
```
Previously, I would get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 16.305657386779785
~$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85389 Approved by: https://github.com/albanD commit 6fae62b35f0e4a0d93de6966dc1d9517e9b6ddff Author: PyTorch MergeBot Date: Thu Sep 29 13:51:05 2022 +0000 Revert "C10D extension to enable per-thread PG (#84153)" This reverts commit 5cbffbbac9a59098637f821e8b6e10f609de30ff. Reverted https://github.com/pytorch/pytorch/pull/84153 on behalf of https://github.com/kumpera due to broke internal stuff commit 976e2a350273b352eac7cbf7d11dcdeacfcba34d Author: Jithun Nair Date: Thu Sep 29 13:31:41 2022 +0000 Separate magma installation for ROCm into its own file (#85567) This aligns it with the builder repo scripts structure: https://github.com/pytorch/builder/blob/main/common/install_rocm_magma.sh https://github.com/pytorch/builder/blob/main/common/install_rocm.sh Pull Request resolved: https://github.com/pytorch/pytorch/pull/85567 Approved by: https://github.com/jeffdaily, https://github.com/huydhn commit 9fb72ca4941edb37c7529b3750617dc4bb6b4fc1 Author: lezcano Date: Wed Sep 28 12:16:55 2022 +0000 Treat layout / pin_memory consistently across creation refs (#85333) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85333 Approved by: https://github.com/mruberry, https://github.com/ngimel commit a76995e584b880910f0724be98eb21773e8ed6e9 Author: Peter Bell Date: Wed Sep 28 17:38:25 2022 +0100 Improve make_tensor performance for float and complex types (#85473) For floating types, `make_tensor` calls `rand` and then does a linear interpolation from `low` to `high`. This instead calls `uniform_(low, high)` to cut out the interpolation step. For complex types, `make_tensor` does the `rand` + interpolation step twice and calls `torch.complex(real, imag)` at the end. This instead uses `view_as_real` and `uniform_(low, high)` to fuse it all into one operation. My benchmarks show significant speedups in all cases for float32 and complex64.

| Device | dtype | Size | Master (us) | This PR (us) | Speedup |
|--------|-----------|-------|-------------|--------------|---------|
| CPU | float32 | 8 | 19.4 | 6.34 | 3.1 |
| | | 4096 | 36.8 | 21.3 | 1.7 |
| | | 2**24 | 167,000 | 80,500 | 2.1 |
| | complex32 | 8 | 37.0 | 7.57 | 4.9 |
| | | 4096 | 73.1 | 37.6 | 1.9 |
| | | 2**24 | 409,000 | 161,000 | 2.5 |
| CUDA | float32 | 8 | 40.4 | 11.7 | 3.5 |
| | | 4096 | 38.7 | 11.7 | 3.3 |
| | | 2**24 | 2,300 | 238 | 9.7 |
| | complex32 | 8 | 78.7 | 14 | 5.6 |
| | | 4096 | 82.7 | 13.8 | 6.0 |
| | | 2**24 | 5,520 | 489 | 11.3 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85473 Approved by: https://github.com/mruberry commit ad87365e54e7b20b49ac23ee325f1da732655808 Author: Peizhao Zhang Date: Thu Sep 29 07:58:54 2022 +0000 [qat]A more stable conv_bn fusion for qat training.
(#85744) Summary: A more stable conv_bn fusion for qat training: * Existing implementation may cause QAT training loss become NaN. This could happen when the fused conv for qat (torch/nn/intrinsic/qat/modules/conv_fused.py) is used and is independent of if fake_quant is enabled. * This is caused by the unscaling for the conv output (`conv_orig = conv / scale_factor` where `scale_factor = bn.weight / running_std`) when there is 0 in `bn.weight`. * This implementation follows the [white paper](https://arxiv.org/pdf/1806.08342.pdf) better and fixed the issue by scaling `running_std / std_Y` instead and compute the fused output accordingly (see comments in conv_fused.py for more details): * It comes at the cost of running conv twice (one to update bn statistics and one to compute fake quant for fused weights). * It does not need to use conv bias for back prop. * It uses the bn statistics computed with the current input batch, while the existing code uses the statistics without the current batch. * The implementation could be enabled by setting the flag `_enable_slow_path_for_better_numerical_stability` to True after the model is prepared for QAT. * Unit test * Added test case for zero `bn.weight`. * Added test case for conv to has bias. Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_quantize_eager_qat Differential Revision: D29506778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85744 Approved by: https://github.com/vkuzo commit 3cfc61b84659cea435411a546eca6a891584247f Author: Seonglyong Gong Date: Thu Sep 29 07:28:33 2022 +0000 [Profiler][trivial] Optimizer states (part 4 of Record Optimizer) (#85840) Summary: - add states into OptInfo and update unit testcase Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39406540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85840 Approved by: https://github.com/robieta commit 475022cd5d1fa3708c8d8728c6ae6eb34c053696 Merge: 5704c73b56 fd840676b0 Author: mingfeima Date: Thu Sep 29 14:41:56 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit fd840676b063e182ce55109470e7de389c9d9bfa Merge: 6b416bf681 1c0f0b33a0 Author: mingfeima Date: Thu Sep 29 14:41:56 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 72b32f164415e9f2e86767b2373c939f8f343d1b Author: Wanchao Liang Date: Wed Sep 28 17:43:03 2022 +0000 [c10d] move ncclgetlasterror directive definition upfront (#85825) Move the directive definition of ncclGetLastError() upfront so that C++ preprocessor does not treat this as a empty string Pull Request resolved: https://github.com/pytorch/pytorch/pull/85825 Approved by: https://github.com/H-Huang, https://github.com/kwen2501 commit dc63948dc9dbe4c224a78a7c14f406893f6fd381 Author: Justin Chu Date: Thu Sep 29 04:24:04 2022 +0000 [ONNX] Update behavior for `register_custom_op_symbolic` (#85636) Update `register_custom_op_symbolic`'s behavior to _only register the symbolic function at a single version_. This is more aligned with the semantics of the API signature. As a result of this change, opset 7 and opset 8 implementations are now seen as fallback when the opset_version >= 9. Previously any ops internally registered to opset < 9 are not discoverable by an export version target >= 9. Updated the test to reflect this change. 
The implication of this change is that users will need to register a symbolic function to the exact version when they want to override an existing symbolic. They are not impacted if (1) an implementation does not exist for the op, or (2) they are already registering to the exact version for export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85636 Approved by: https://github.com/BowenBao commit 3c9e8cd8df5f5739ed20830a0bfffd966a5c11db Author: Feisi Fu Date: Wed Sep 28 21:10:40 2022 +0000 Create a quantized non-in-place version CUDA ReLU function, (#85669) Summary: this and #85670 are to allow the relu function to run on a quantized tensor on cuda. That is torch.relu(qa) for a quantized tensor qa on cuda. Test Plan: python test/test_quantization.py Previous PR that has been reverted: #85502. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85669 Approved by: https://github.com/dzdang commit 7628603aeeeb8ed160c2479f75175bb3ea028a42 Author: Seonglyong Gong Date: Thu Sep 29 03:58:34 2022 +0000 [Profiler] bug fix: python object reference counting (#85847) Summary: Wrong reference counting of Python objects caused intermittent, corner-case-only segfaults. - before: increment once, decrement in a loop. - after: increment and decrement in different but consistent loops. Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39902973 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85847 Approved by: https://github.com/robieta, https://github.com/aaronenyeshi commit edb99df2e086fd22068f877c526b9424771bef0f Author: BowenBao Date: Tue Sep 27 16:42:07 2022 -0700 [ONNX] Fix reduce node shape inference (#85765) Fix logic in `ProcessReduceNode`. Previously a scalar was assigned for the output shape of reduce nodes when the `axes` attribute was not provided, regardless of the value of the `keepdims_i` attribute. Hence it was incorrectly assuming all output axes should be folded. Since the input rank is known, this fix populates axes to be `[0, 1, ..., input_rank - 1]` if axes is not provided. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85765 Approved by: https://github.com/abock commit 7e4684009c67ae6ce337e7c7727dc605b637af35 Author: soulitzer Date: Wed Sep 28 19:21:10 2022 -0400 Improve codegen for jvp decomposition (#84894) Fixes: https://github.com/pytorch/pytorch/issues/84888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84894 Approved by: https://github.com/albanD commit f23f362c5d10c47389eb0a1a93f45788c5abef8b Author: Taylor Robie Date: Wed Sep 28 15:42:58 2022 -0700 [Profiler] Use strong typedef for Tensor ID (#85718) I want to add Tensor ID to allocations (for allocs which are `StorageImpl`s). To keep things safe and organized I need to pull the ID type into a standalone entity, which makes it an ideal time to convert to a strong typedef. Differential Revision: [D39788872](https://our.internmc.facebook.com/intern/diff/D39788872/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85718 Approved by: https://github.com/chaekit commit 282d8dfa68cc5098db050bef3991a62fbec4825e Author: Taylor Robie Date: Wed Sep 28 15:42:56 2022 -0700 [Profiler] Fix traversal utility (#85717) `eventTreeDFS` traverses in the wrong order (right to left). Moreover, we will need more complex traversal (e.g. early stopping) for the memory profiler. Thus, I made a simple general `_traverse` method and added `functools.partial` specializations for DFS and BFS.
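The traversal refactor described just above (#85717) can be pictured with a generic walker whose ordering is fixed by `functools.partial`: a left-to-right DFS pushes children onto the front of the frontier, while BFS appends them to the back. A minimal stand-alone sketch (not the profiler's actual data structures):

```python
import functools
from collections import deque


def traverse(roots, get_children, extend_fn):
    """Generic tree walk; extend_fn decides where children are queued."""
    frontier = deque(roots)
    while frontier:
        node = frontier.popleft()
        yield node
        extend_fn(frontier, get_children(node))


# DFS (left to right): children go to the FRONT, preserving their order.
traverse_dfs = functools.partial(
    traverse, extend_fn=lambda q, kids: q.extendleft(reversed(list(kids))))
# BFS: children go to the BACK.
traverse_bfs = functools.partial(
    traverse, extend_fn=lambda q, kids: q.extend(kids))

tree = {"a": ["b", "e"], "b": ["c", "d"], "c": [], "d": [], "e": []}
print(list(traverse_dfs(["a"], tree.get)))  # ['a', 'b', 'c', 'd', 'e']
print(list(traverse_bfs(["a"], tree.get)))  # ['a', 'b', 'e', 'c', 'd']
```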
Differential Revision: [D39788871](https://our.internmc.facebook.com/intern/diff/D39788871/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85717 Approved by: https://github.com/chaekit commit dfdfaec3fc599c2bb7a8ffaf1215e0284a2f4aa8 Author: Taylor Robie Date: Wed Sep 28 15:42:54 2022 -0700 [Profiler] Don't assign in AppendOnlyList::emplace_back (#85716) It turns out that we're invoking the copy assign operator in AppendOnlyList. While copy elision is expected to mostly hide any costs, it does present issues for types with deleted copy assign operators. (It also seems to produce slightly worse assembly: https://godbolt.org/z/o4Gvz1fKs) Calling new at the correct position seems to be a better way to go about this. (At least from looking at other high performance containers like SmallVector.) Differential Revision: [D39852804](https://our.internmc.facebook.com/intern/diff/D39852804/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85716 Approved by: https://github.com/chaekit commit bd65adf4e9e59ac7de1d7f7d329a5df4237dcc5f Author: soulitzer Date: Wed Sep 28 19:21:10 2022 -0400 Properly fix log_sigmoid vmapjvp and remove hack (#84892) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84892 Approved by: https://github.com/albanD, https://github.com/zou3519 commit cca909645f9c3528469381e4236a468667258a1e Author: CaoE Date: Thu Sep 29 01:16:16 2022 +0000 Add bfloat16 support for lerp on CPU (#84327) Add bfloat16 support for lerp on CPU. Single core:

op | shape | fp32 forward/ms | bf16 forward/s | fp32 backward/s | bf16 backward/s
-- | -- | -- | -- | -- | --
lerp (tensor) | [10, 128, 10, 124] | 0.005489 | 0.000613 | 0.006658 | 0.003385
  | [10, 128, 20, 124] | 0.011057 | 0.001204 | 0.016032 | 0.007869
  | [10, 128, 30, 124] | 0.016691 | 0.001954 | 0.025549 | 0.012823
lerp (scalar) | [10, 128, 10, 124] | 0.001096 | 0.000507 | 0.002024 | 0.001479
  | [10, 128, 20, 124] | 0.00247 | 0.000997 | 0.005468 | 0.002907
  | [10, 128, 30, 124] | 0.004178 | 0.001513 | 0.009775 | 0.004859

Single socket (28 cores):

op | shape | fp32 forward/s | bf16 forward/s | fp32 backward/s | bf16 backward/s
-- | -- | -- | -- | -- | --
lerp (tensor) | [10, 128, 10, 124] | 0.000236 | 3.93E-05 | 0.000494 | 0.000235
  | [10, 128, 20, 124] | 0.000525 | 7.39E-05 | 0.002485 | 0.000638
  | [10, 128, 30, 124] | 0.000801 | 0.000121 | 0.004235 | 0.001529
lerp (scalar) | [10, 128, 10, 124] | 5.90E-05 | 3.32E-05 | 0.000129 | 0.000116
  | [10, 128, 20, 124] | 0.000155 | 5.87E-05 | 0.000368 | 0.000206
  | [10, 128, 30, 124] | 0.000324 | 9.04E-05 | 0.001322 | 0.000313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84327 Approved by: https://github.com/frank-wei commit 7cdd39b39316173487c0c4cfbb60aef0cb645757 Author: Justin Chu Date: Thu Sep 29 00:52:21 2022 +0000 [ONNX] Update `unconvertible_ops` (#85595) Update `unconvertible_ops` to create a list of unconvertible ops using the updated registry. - Use fewer passes in the jit process instead to avoid errors during conversion in the ONNX fallback mode - Actually check the registry to find implemented ops - Fix type hints for `_create_jit_graph` and `_jit_pass_onnx_remove_inplace_ops_for_onnx` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85595 Approved by: https://github.com/BowenBao commit ada6e5b53a55b5acfd48c503c94296d871296bb7 Author: Edward Z.
Yang Date: Wed Sep 28 17:28:26 2022 -0400 Implement duck shaping on SymInts (#85808) Duck shaping says that when two input tensors have the same size, we assume they are symbolically related. This follows the same optimization done by inductor. This optimization is not done completely because we don't currently install guards corresponding to the duck shape relationships we created, but overall the guard propagation for dynamic shape tracing is incomplete at the moment. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85808 Approved by: https://github.com/albanD commit 3a3e2002d88ca2491170065c47cc50ce435fb92f Author: Xia, Weiwen Date: Thu Sep 29 00:44:40 2022 +0000 [Quant] Add unified x86 quant backend (#84329) Implement unified quantization backend 'X86' for x86 platforms. It combines the advantages of FBGEMM and ONEDNN. It selects kernels during weight prepacking and hide the details from end users. It will be the default backend in place of FBGEMM. For details, please refer to this RFC: [[RFC] Unified quantization backend for x86 CPU platforms](https://github.com/pytorch/pytorch/issues/83888) **Correctness** Covered by UT **Accuracy** By running torchvision models on imagenet, no accuracy difference is found between FBGEMM and the unified X86 backend: [torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx](https://github.com/pytorch/pytorch/files/9598114/torchvision_accuracy_comparison_fbgemm_vs_x86.xlsx) **Performance** Depends on https://github.com/pytorch/pytorch/pull/84470 which improves performance. For early PoC results, please refer to https://github.com/pytorch/pytorch/files/9399202/unified_qengine_poc_performance_bechmark.xlsx With the two PRs combined, we collected some data on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz Method: Run multi-instances with 4 cores per instance on whole socket. Using JeMalloc and Intel OMP. 
Models/throughput | fbgemm | x86 | improvement
-- | -- | -- | --
wide_resnet101_2 | 173.5675 | 241.815 | 39.32%
resnext101_32x8d | 174.365 | 339.8175 | 94.89%
resnet50 | 573.155 | 1174.14 | 104.86%
vgg19_bn | 260.335 | 337.92 | 29.80%
vgg19 | 257.935 | 333.265 | 29.21%
inception_v3 | 601.1175 | 1309.33 | 117.82%
densenet161 | 296.645 | 435.5625 | 46.83%
mnasnet1_0 | 1216.7 | 4057.515 | 233.49%
squeezenet1_0 | 1220.085 | 5153.3875 | 322.38%
alexnet | 2294.91 | 2624.6375 | 14.37%
fbnetc_100 | 976.2825 | 3110.1825 | 218.57%
shufflenet_v2_x0_5 | 1555.76 | 3026.125 | 94.51%
spnasnet_100 | 1059.065 | 3502.0975 | 230.68%
pytorch-unet | 192.76 | 246.77 | 28.02%
acgan | 257.32 | 333.7325 | 29.70%
cgan | 7790.6925 | 7803.1025 | 0.16%
sgan | 257.565 | 338.8875 | 31.57%
se_resnet50 | 492.3725 | 916.5175 | 86.14%
vggm | 300.2875 | 316.2075 | 5.30%

Environment:
- PyTorch version: 1.13.0a0+gitcdd625b
- Is debug build: False
- CUDA used to build PyTorch: None
- ROCM used to build PyTorch: N/A
- OS: Ubuntu 20.04.3 LTS (x86_64)
- GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
- Clang version: Could not collect
- CMake version: version 3.22.5
- Libc version: glibc-2.31
- Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
- Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31
- Is CUDA available: False
- CUDA runtime version: No CUDA
- GPU models and configuration: No CUDA
- Nvidia driver version: No CUDA
- cuDNN version: No CUDA
- HIP runtime version: N/A
- MIOpen runtime version: N/A
- Is XNNPACK available: True

Versions of relevant libraries:
- [pip3] intel-extension-for-pytorch==1.13.0+cpu
- [pip3] numpy==1.23.3
- [pip3] pytorch-widedeep==0.3.7
- [pip3] torch==1.13.0a0+git48b423b
- [pip3] torchvision==0.14.0a0+ebb68f3
- [conda] blas 1.0 mkl
- [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi
- [conda] mkl 2021.4.0 h06a4308_640
- [conda] mkl-include 2022.1.0 pypi_0 pypi
- [conda] mkl-service 2.4.0 py39h7f8727e_0
- [conda] mkl-static 2022.1.0 pypi_0 pypi
- [conda] mkl_fft 1.3.1 py39hd3c417c_0
- [conda] mkl_random 1.2.2 py39h51133e4_0
- [conda] numpy 1.23.3 pypi_0 pypi
- [conda] numpy-base 1.22.3 py39hf524024_0
- [conda] torch 1.13.0a0+git48b423b pypi_0 pypi
- [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84329 Approved by: https://github.com/jerryzh168 commit d542aab5c1bc544f9dc0eb5632bfe4432223d890 Author: zaf Date: Wed Sep 28 14:25:10 2022 -0700 [quant][ao_migration] nn.intrinsic migration to ao (#84842) All quantization-related modules are being migrated to `torch.ao`. This migrates the `nn.intrinsic.modules`. Please, see the [tracker](https://github.com/pytorch/pytorch/issues/81667) for the timeline. Differential Revision: [D39419733](https://our.internmc.facebook.com/intern/diff/D39419733/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39419733/)!
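Relating to the unified x86 quantization backend introduced in #84329 above, a small hedged sketch of opting into it; which engines are present depends on how PyTorch was built, and 'fbgemm' remains a valid choice.

```python
import torch

# Inspect which quantized engines this build provides, e.g. ['none', 'fbgemm', 'x86', ...]
print(torch.backends.quantized.supported_engines)

# Select the unified x86 backend when it is available.
if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"
```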
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84842 Approved by: https://github.com/jerryzh168 commit 6a2b12dd656ed8c347968bebb4c3552582454019 Author: Elias Ellison Date: Wed Sep 28 20:20:56 2022 +0000 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471 Approved by: https://github.com/ezyang commit a67621a6ca19b0c1423a7b136bfd90d8f04182fb Author: Richard Zou Date: Wed Sep 28 11:31:47 2022 -0700 Update functorch README to reflect move into PyTorch (#85832) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85832 Approved by: https://github.com/Chillee commit 498591467b3c651aa929ad1d0858d6004da8b908 Author: Richard Zou Date: Wed Sep 28 11:20:11 2022 -0700 Excise functorch/version.txt (#85830) functorch no longer needs separate versioning. Also, we'll delete functorch/setup.py soon (in a couple of weeks). We've been leaving it around for BC reasons. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85830 Approved by: https://github.com/Chillee commit d8277d9075396a3188490c322648605927384ba5 Author: Richard Zou Date: Wed Sep 28 11:13:41 2022 -0700 Disallow saved tensor hooks in functorch transforms (#85829) Technically they may only be a problem with the grad transform. Though the branch cut is soon, this is the more conservative change, it also lets us disable checkpointing for functorch (which definitely doesn't work with all transforms) and not a lot of people use saved tensor hooks with functorch (I discovered this while testing). Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85829 Approved by: https://github.com/samdow commit 5aa183d2bc7372b4deb4e4b2f31017be9f13264c Author: Richard Zou Date: Wed Sep 28 07:24:35 2022 -0700 Add mechanism to disable the "saved tensors hooks" feature (#85553) The rationale for this is that functorch doesn't work with saved variable hooks at the moment or checkpointing and we need some way to disable it. Concretely: - there's a context manager that does the disabling - this feature is disabled on a thread-local basis - one can set an error message or use the default error message that says the feature has been disabled Since it is thread local I needed to update ATen/ThreadLocalState. To make things nicer, this PR refactors all the "saved tensors hooks" related TLS things into a single struct. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/85553 Approved by: https://github.com/soulitzer commit 85d8441fbabcc9e45648dbaa2c7c964ae32b1bb7 Author: Justin Chu Date: Wed Sep 28 19:52:43 2022 +0000 [ONNX] Deprecate setter functions for global variables (#85165) `_set_opset_version` and `_set_operator_export_type` are previously deprecated. This PR decorates them with the deprecation decorator, so warnings are emitted. 
- Remove usage of `_set_opset_version` and `_set_operator_export_type` in favor of setting the globals vars directly in torch.onnx internal - Update `GLOBALS.operator_export_type`'s default to not be None to tighten types - Remove usage of `_set_onnx_shape_inference` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85165 Approved by: https://github.com/BowenBao, https://github.com/AllenTiTaiWang commit 5deeb09d4e3001adfd3d04139b4a330915069ea7 Author: Justin Chu Date: Wed Sep 28 19:52:43 2022 +0000 [ONNX] Annotate all g as GraphContext (#85491) - Use g.opset to test export opset version - Annotate all `g` as GraphContext Pull Request resolved: https://github.com/pytorch/pytorch/pull/85491 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit c42a408baa65b37b459d538b440313f67c7c1cb7 Author: Justin Chu Date: Wed Sep 28 19:52:42 2022 +0000 [ONNX] Create decorator to handle symbolic context (#84776) - Create decorator to handle old style custom symbolics that require context - Deprecate `torch.onnx.SymbolicContext` in favor of `GraphContext`. Added deprecation message - Remove README reference of SymbolicContext Pull Request resolved: https://github.com/pytorch/pytorch/pull/84776 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 723193ec16897e8389a2ff7cf916a4a7e1ec564a Author: Eddie Yan Date: Wed Sep 28 22:30:42 2022 +0000 [cuDNN][cuDNN v8 API] Fix 3d convolution_add_relu in V8 (#85055) Fix for issue uncovered in #84948 CC @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/85055 Approved by: https://github.com/ngimel commit 01add6e2884347409a27efa16dcbf5b355ec4bd5 Author: Mikayla Gawarecki Date: Wed Sep 28 20:30:37 2022 +0000 Allow only one -1 in nested view/reshape (#85691) Behavior before this PR: 1. `-1` allowed for implicit batch dimension 2. multiple `-1`s allowed for pre-existing dimensions 3. for new dimensions, `-1` is not allowed it is worth noting that for the most part 3 is basically unreachable because assuming a nested tensor has at least 1 ragged dimension, you would expect at least one -1 to be in the proposed shape for the pre-existing dimensions Behavior after this PR: 1. batch dimension **must be specified** 2. **only one** `-1` allowed for pre-existing dimensions **this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension** 3. unchanged Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691 Approved by: https://github.com/cpuhrsch commit 1418a663b1d833159ce4bef4fad84bb983e454c9 Author: atalman Date: Wed Sep 28 22:27:52 2022 +0000 Fix upload condition pypi-cudnn build (#85799) Fix upload condition pypi-cudnn build We excute this in sh and looks like the condition with "==" is not getting triggered. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85799 Approved by: https://github.com/DanilBaibak, https://github.com/jeanschmidt, https://github.com/seemethere commit 3d2316670f5968b6238f06bed247ec0ecbff444b Author: Justin Chu Date: Wed Sep 28 19:52:42 2022 +0000 [ONNX] Create GraphContext and load `g.op` method to the class (#84728) This PR create the `GraphContext` class and relays all graph methods to _C.Graph as well as implements the `g.op` method. The GraphContext object is passed into the symbolic functions in place of _C.Graph for compatibility with existing symbolic functions. 
This way (1) we can type annotate all `g` args because the method is defined and (2) we can use additional context information in symbolic functions. (3) no more monkey patching on `_C.Graph` Also - Fix return type of `_jit_pass_fixup_onnx_controlflow_node` - Create `torchscript.py` to house torch.Graph related functions - Change `GraphContext.op` to create nodes in the Block instead of the Graph - Create `add_op_with_blocks` to handle scenarios where we need to directly manipulate sub-blocks. Update loop and if symbolic functions to use this function. Should we put all the context inside `SymbolicContext` and make it an attribute in the `GraphContext` class? This way we only define two attributes `GraphContext.graph` and `GraphContext.context`. Currently all context attributes are directly defined in the class. Keep GraphContext flatand note that it will change in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84728 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 75db0225ad4bea02b84505bf96c1abceea97c99a Author: Elias Ellison Date: Wed Sep 28 19:45:49 2022 +0000 Handle fake tensor in intlist (#85759) Previously, we were swallowing up the Fake Tensor Exception and throwing `TypeError`, which led to https://github.com/pytorch/torchdynamo/issues/1066. Now, we are propagating back the `DataDependentOutputException`. If this approach is accepted, I can go ahead and do doublelist, symintlist, afterward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85759 Approved by: https://github.com/ezyang commit 913f5784d74bb69eff12b1cf9ac8c3d222750411 Author: Brian Hirsh Date: Tue Sep 27 12:03:54 2022 -0700 move functionalize out of experimental namespace (#85742) Did a very quick sanity check - it looks like functorch docs don't get the nice preview link that pytofch-bot gives for normal pytorch docs, so I built locally and scanned `html/generated/functorch.functionalize.html` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85742 Approved by: https://github.com/zou3519 commit 796da4df4d264aea8b3879dbda3f154271e94634 Author: Animesh Jain Date: Wed Sep 28 20:52:45 2022 +0000 Return contiguous tensor from softmax decomposition (#85788) Fixes https://github.com/pytorch/torchdynamo/issues/1135 Softmax decomp's output stride does not match with aten softmax output stride. Not sure if its desirable. Opening a PR for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85788 Approved by: https://github.com/ngimel, https://github.com/ezyang commit 8bb69a007f940ca5712ffc69922fa8f94cf27bd7 Author: Catherine Lee Date: Wed Sep 28 20:44:49 2022 +0000 reenable circleci mac jobs (#85824) undo https://github.com/pytorch/pytorch/pull/84438 and see if its green now Pull Request resolved: https://github.com/pytorch/pytorch/pull/85824 Approved by: https://github.com/huydhn, https://github.com/malfet commit 879ae45230d98e50250e9f8e704d78b5973bf227 Author: atalman Date: Wed Sep 28 20:34:13 2022 +0000 Increase timeout and retry count conda upload (#85802) Increase timeout and retry count conda upload. We are keep seeing conda upload failures even with 2 min timeout. 
Hence increasing timeout to 5min and retry to 5 times Pull Request resolved: https://github.com/pytorch/pytorch/pull/85802 Approved by: https://github.com/datumbox commit afaee00feca07c565f6b080e021b3422bfc1e8d4 Author: Mikayla Gawarecki Date: Wed Sep 28 17:38:29 2022 +0000 Add python `nested_tensor` and `as_nested_tensor` constructors in `torch.nested` (#85593) Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ). Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc. Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593 Approved by: https://github.com/drisspg, https://github.com/cpuhrsch commit a876432aea11fd2c6e11b8397f4f89a30cd1e8ba Author: soulitzer Date: Wed Sep 28 13:00:50 2022 -0400 Expose torch._will_engine_execute_node (#84773) Addresses: https://github.com/pytorch/pytorch/issues/83617 This PR a way to query the TLS graph task's exec_info which is a map mapping the Node to a bool indicating whether it will be executed in the current backward pass (as determined by the inputs= argument for .grad of .backward). - this works with both custom Function nodes and normal codegened nodes - to be able to verify whether the pyobject passed is an actual node, we now store pointers to PyTypeObjects into a set on registration. - error out when .backward without inputs= to avoid silently returning True Alternatives: - not sure if it is possible to bind to Python from a raw pointer to Node. At least we wouldn't be able to use existing logic, and the Python object should only hold a weak reference to the Node. - other solutions to the motivating issue seem to require more extensive modification to the engine See the issue linked for an example of usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/84773 Approved by: https://github.com/albanD commit 8dd45424eaf4ab39c8723efdec91a269c7eb9448 Author: Nikita Karetnikov Date: Wed Sep 28 17:23:42 2022 +0000 [primTorch] Add ref for `huber_loss` and error inputs (#85041) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85041 Approved by: https://github.com/lezcano, https://github.com/mruberry commit 0b93afb112d48bb6d89a1e183a90b403560e84e4 Author: Elias Ellison Date: Wed Sep 28 07:55:11 2022 -0700 add amp tests (#85434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85434 Approved by: https://github.com/ngimel commit 29c78266c046c7f83e7d84fc764af47e62ae9542 Author: Peter Bell Date: Tue Sep 27 23:54:50 2022 +0100 test_decomp.py: Skip tests for embedding_backward bf16 (#84554) `embedding_backward`'s decomposition is less accurate for bf16. Currently bfloat16 is skipped in both forward and backward, but the forward decomposition matches 1-1 with the ATen implementation so this re-enables the test for the forwards decomposition. 
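Tying back to the `torch.nested` constructors added in #85593 above, a brief hedged illustration (shapes are arbitrary):

```python
import torch

a, b = torch.randn(2, 5), torch.randn(3, 5)

# Always copies the inputs, analogous to torch.tensor.
nt = torch.nested.nested_tensor([a, b])

# Analogous to torch.as_tensor: avoids copies / preserves autograd history where possible.
nt2 = torch.nested.as_nested_tensor([a, b])
print(nt.is_nested, nt2.is_nested)
```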
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84554 Approved by: https://github.com/albanD commit c2e9b9ec4a51e49a094f4ea413cba3f0567f82c2 Author: Peter Bell Date: Tue Sep 27 18:37:23 2022 +0100 TestModule: Don't assume sample inputs version counter is zero (#85734) The intention of this assert is to check the input tensor's version counter has increased, indicating it was mutated by `m_inplace`. However, the cloning step to create `input_arg_clone` restarts the version counter to zero, so this test may fail if the sample input was ever mutated during its creation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85734 Approved by: https://github.com/albanD commit 5b476e68afd0fd8e14494f3d209bd6b63f4d422f Author: Edward Z. Yang Date: Wed Sep 28 10:13:21 2022 -0700 Slightly beefed up dynamic shapes tests for storage_offset (#85806) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85806 Approved by: https://github.com/albanD commit d776693701bef4283858961a4c597edd73d1fc6d Author: Seonglyong Gong Date: Wed Sep 28 19:18:12 2022 +0000 [Profiler] Optimizer param_groups (part 3 of Record Optimizer) (#85784) Summary: - use TensorMetadata struct - check_and_store util as overloading - param_groups - clean up unit test cases Test Plan: buck run mode/opt //caffe2/test:profiler Reviewed By: chaekit Differential Revision: D39406072 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85784 Approved by: https://github.com/aaronenyeshi, https://github.com/robieta commit 2f8cfb74af5123323e64b1c0fddfdd63ab0b3205 Author: Animesh Jain Date: Wed Sep 28 18:35:51 2022 +0000 Fix gelu repr (#85790) Fixes https://github.com/pytorch/torchdynamo/issues/1378 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85790 Approved by: https://github.com/ezyang commit ff71f457889a0fe7de3b921d5ae2341e0ab7dc69 Author: Andrew Gu Date: Wed Sep 28 16:19:07 2022 +0000 [FSDP] Add `FSDPExtensions` for TP support (#85039) This adds `FSDPExtensions` to enable TP + FSDP composability. To be agnostic to both `ShardedTensor` and `DistributedTensor`, the design relies on customizable hooks. Some notes: - I preferred the `_ext` prefix (short for "extension") over `_param_extension` simply because it is shorter. It should not matter much because it is purely internal facing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85039 Approved by: https://github.com/kumpera, https://github.com/fegin commit a4bd89b267e81dc2a23ed767e1efc30fcfb7c665 Author: Horace He Date: Wed Sep 28 17:24:11 2022 +0000 Revert "Revert "Symintified mmm/addmm derivative formulas (#85794)"" (#85820) This reverts commit 823dc33b00b811c28a3924a6f0a31ba6afee7272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85820 Approved by: https://github.com/huydhn commit 39130ccf7353bb7eecd34e51657f7e39fb70a353 Author: Horace He Date: Wed Sep 28 08:58:18 2022 +0000 Registered _like metas (#85793) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85793 Approved by: https://github.com/ezyang commit fc8ba3a92d44dc113f979d201d48f51c2887ded4 Author: PyTorch MergeBot Date: Wed Sep 28 17:22:53 2022 +0000 Revert "Allow only one -1 in nested view/reshape (#85691)" This reverts commit 4c4e5f6106b69960833d7766799fd4f246aa7cd7. 
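The version-counter behaviour behind the TestModule fix in #85734 above can be seen directly; this small sketch uses the private `_version` attribute purely for illustration.

```python
import torch

x = torch.randn(3)
print(x._version)   # 0 on a fresh tensor
x.add_(1)
print(x._version)   # incremented by the in-place op
y = x.clone()
print(y._version)   # clone starts a fresh counter at 0, regardless of x's history
```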
Reverted https://github.com/pytorch/pytorch/pull/85691 on behalf of https://github.com/atalman due to Causes github first merge conflict commit b44a4a8b51774fd9bfdaa929db342cbbc28fe252 Author: PyTorch MergeBot Date: Wed Sep 28 17:18:29 2022 +0000 Revert "Registered _like metas (#85793)" This reverts commit a4e75ccf85bd580ae5cccd471cfe8aee60dc1aa2. Reverted https://github.com/pytorch/pytorch/pull/85793 on behalf of https://github.com/huydhn due to Sorry, reverting as this breaks an aot_autograd mac test on functorch. https://github.com/pytorch/pytorch/pull/85794 was reverted before but it was at the top of the stack so the revert still fail https://hud.pytorch.org/pytorch/pytorch/commit/823dc33b00b811c28a3924a6f0a31ba6afee7272 commit 4c6dc6a1a479dcb9dc3ca9b08c480fdcefd26113 Author: Nikita Shulga Date: Wed Sep 28 17:12:25 2022 +0000 [BE] Do not use VLA (#85800) [Variable Length Array](https://en.wikipedia.org/wiki/Variable-length_array) is part of C99 standard, but has never been adopted to C++ Also, warning if they are used (surprisingly those warnings can not be turned into errors. Remove code duplication in `OperationUtils.mm` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85800 Approved by: https://github.com/kulinseth, https://github.com/jeanschmidt commit 424aad7f826db3a51a5416229be8ce7a965879b4 Author: David Berard Date: Tue Sep 27 18:28:00 2022 -0700 [JIT] support freezing modules that don't have a forward method (#85779) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85779 Approved by: https://github.com/eellison commit a0b1693996f408c112b7a628860b7a754aaa4f77 Author: PyTorch MergeBot Date: Wed Sep 28 17:04:53 2022 +0000 Revert "Update `amax/amin/norm/count_nonzero` signatures with `int[*]? dim` (#83300)" This reverts commit 1c0f0b33a0e013d6ec162cf488ff7643c4ffa33e. Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/jeffdaily due to The commit breaks nvfuser tests commit 224b689cf19febc23fffb77beb97c0a0560f9585 Author: Edward Z. Yang Date: Wed Sep 28 07:15:28 2022 -0700 Handling for getitem with boolean in meta, and other improvements (#85807) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85807 Approved by: https://github.com/albanD commit b6885f7d4ab1f71c40d5f1e40773acbad038d355 Author: Edward Z. Yang Date: Wed Sep 28 10:20:29 2022 -0400 Don't make parameters have symbolic shapes (#85809) Parameters won't change size across iterations of the training loop, so this is a cost-free optimization that avoids us having to do symbolic math over parameters. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85809 Approved by: https://github.com/albanD commit e63d3a8aa6c52f29fa329df321cd51fef7e8a7c9 Author: Edward Z. Yang Date: Wed Sep 28 10:28:14 2022 -0400 Augment errors raised in fx.Interpreter with Node info (#85810) We have been using this extra error context in the symbolic-shapes branch and it is quite useful. Contributing it upstream here. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85810 Approved by: https://github.com/albanD commit b04b2fa9aa52cacbdc9aaaf477d55b0af845ce81 Author: Eddie Yan Date: Wed Sep 28 16:04:58 2022 +0000 [CUBLAS][CUDA GRAPHS] (re-re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85447) Now includes @dagitses 's optimizations and fixes for teardown CC @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85447 Approved by: https://github.com/malfet commit 823dc33b00b811c28a3924a6f0a31ba6afee7272 Author: PyTorch MergeBot Date: Wed Sep 28 16:02:05 2022 +0000 Revert "Symintified mmm/addmm derivative formulas (#85794)" This reverts commit 230edd2515367fcb44cea7c40106ff9f6a712a2a. Reverted https://github.com/pytorch/pytorch/pull/85794 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks an aot_autograd mac test on functorch https://hud.pytorch.org/pytorch/pytorch/commit/230edd2515367fcb44cea7c40106ff9f6a712a2a commit 5709c67f1f93c47729621fe3a6ec3247286dd03b Author: Andres Lugo-Reyes Date: Wed Sep 28 15:48:24 2022 +0000 [ROCm] Retry loop implemented to avoid transient memory leak errors (#82607) Added a retry loop to memory leak checker to avoid rare case in which ROCM reports a false positive memory leak. Original issue observed as part of this ticket: https://github.com/pytorch/pytorch/issues/62533
- Applied changes and built
- python test/test_cuda.py
- Ensure all tests pass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82607 Approved by: https://github.com/malfet commit b2311192e6c4745aac3fdd774ac9d56a36b396d4 Author: Weiyi Zheng Date: Wed Sep 28 15:26:03 2022 +0000 [NN module] speed up _load_from_state_dict (#85743) Fixes #61398 The original implementation is very slow when the state_dict.keys() is long. This PR only passes relevant keys to the child module. existing test passes: `pytest test/test_nn.py -k state_dict` I couldn't figure out a good way to write a new test for this new behavior. Had a new snippet, but it will be flaky if integrated into the main CI because it's a timing based check. But I can verify that the test took 30s to run, after this PR it only takes 0.5s.

```python
def test_load_state_dict_large(self):
    import copy
    import time
    base_module = nn.Linear(1, 1)
    model = base_module
    for level in range(4):
        model = nn.Sequential(*[copy.deepcopy(model) for _ in range(10)])
    state_dict = model.state_dict()
    self.assertEqual(len(state_dict), 20000)
    st = time.time()
    model.load_state_dict(state_dict, strict=True)
    strict_load_time = time.time() - st
    self.assertLess(strict_load_time, 10)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85743 Approved by: https://github.com/albanD commit cef8dfc8ba849be649f752a14a5f11cdbe1e17fc Author: Jane Xu Date: Wed Sep 28 14:59:22 2022 +0000 [BE] small typo+lint fixes for einsum/sumproduct_pair (#85709) Easy review!
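Returning to the fx.Interpreter change in #85810 above, a minimal example of running a traced graph through the interpreter whose per-node errors now carry Node details; the graph here is deliberately trivial.

```python
import torch
import torch.fx as fx

def f(x):
    return torch.relu(x) + 1

gm = fx.symbolic_trace(f)
# Exceptions raised while executing a node are now augmented with that node's info.
out = fx.Interpreter(gm).run(torch.randn(4))
print(out)
```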
This PR fixes some typos + lints + clarifies some instructions Pull Request resolved: https://github.com/pytorch/pytorch/pull/85709 Approved by: https://github.com/soulitzer commit 230edd2515367fcb44cea7c40106ff9f6a712a2a Author: Horace He Date: Wed Sep 28 08:58:18 2022 +0000 Symintified mmm/addmm derivative formulas (#85794) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85794 Approved by: https://github.com/ezyang commit a4e75ccf85bd580ae5cccd471cfe8aee60dc1aa2 Author: Horace He Date: Wed Sep 28 08:58:18 2022 +0000 Registered _like metas (#85793) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85793 Approved by: https://github.com/ezyang commit 35fe4abdc74d88c0d4768f3cd7aedcfd2e817a3d Author: Horace He Date: Wed Sep 28 08:58:18 2022 +0000 Added symbolic shape testing for AOTAutograd (#85789) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85789 Approved by: https://github.com/ezyang commit 0b251d985df51d16c71d9c28b11b800fd38bf4bd Author: Jeff Daily Date: Wed Sep 28 14:05:02 2022 +0000 skip test TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32 (#85767) This test was marked as expected failure, but this test is flaky for ROCm but only because ROCm sometimes gets expected success. The test was only marked expected failure due to non-determinism that was already well-known. See the nearby comments. https://github.com/pytorch/pytorch/blob/a4c94f0739158d2f7fd27f2be59b77f33027e1c7/torch/testing/_internal/common_methods_invocations.py#L11410-L11421 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85767 Approved by: https://github.com/clee2000 commit 06e0583fb0f62e27a52fb87f3dce3156cd2d0073 Author: Howard Huang Date: Tue Sep 27 14:27:17 2022 -0700 [4/N] [Dispatchable Collectives] Update all_reduce_ with CPU / CUDA implementations (#83810) * Update the all_reduce op to dispatch to cpu and cuda implementations. Right now they both perform the same logic so this is essentially a no-op. * Update test to validate that a separate device implementation is not supported. In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively. 
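A hedged, single-process sketch of the all_reduce path discussed above for #83810; in real use each rank runs this with its own rank and world_size, and the op dispatches to the CPU or CUDA implementation based on the tensor's device.

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

t = torch.ones(4)    # a CPU tensor routes to the CPU implementation
dist.all_reduce(t)   # defaults to SUM across ranks
print(t)

dist.destroy_process_group()
```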
Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83810 Approved by: https://github.com/kwen2501 commit 0e256c255089649de7913e3707c38c6aefc59def Author: Horace He Date: Wed Sep 28 06:22:51 2022 +0000 removed compile cache and static argnums (#85783) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85783 Approved by: https://github.com/wconstab commit 614d6f19e3d30cac0d77059e738d1f25d75eb408 Author: Richard Barnes Date: Wed Sep 28 04:53:19 2022 +0000 Fix Use obj1.is(obj2) warnings (#85688) Fixes: ``` ^ /dev/shm/rbarnes/tempfs/pytorch/torch/csrc/autograd/python_variable.cpp:2603:11: warning: 'operator==' is deprecated: Use obj1.is(obj2) instead [-Wdeprecated-declarations] if (out == Py_None) { ^ /dev/shm/rbarnes/tempfs/pytorch/cmake/../third_party/pybind11/include/pybind11/detail/../pytypes.h:276:5: note: 'operator==' has been explicitly marked deprecated here PYBIND11_DEPRECATED("Use obj1.is(obj2) instead") ^ /dev/shm/rbarnes/tempfs/pytorch/cmake/../third_party/pybind11/include/pybind11/detail/common.h:136:43: note: expanded from macro 'PYBIND11_DEPRECATED' ^ /dev/shm/rbarnes/tempfs/pytorch/torch/csrc/autograd/python_variable.cpp:2627:11: warning: 'operator==' is deprecated: Use obj1.is(obj2) instead [-Wdeprecated-declarations] if (out == Py_None) { ^ /dev/shm/rbarnes/tempfs/pytorch/cmake/../third_party/pybind11/include/pybind11/detail/../pytypes.h:276:5: note: 'operator==' has been explicitly marked deprecated here PYBIND11_DEPRECATED("Use obj1.is(obj2) instead") ^ /dev/shm/rbarnes/tempfs/pytorch/cmake/../third_party/pybind11/include/pybind11/detail/common.h:136:43: note: expanded from macro 'PYBIND11_DEPRECATED' ^ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85688 Approved by: https://github.com/albanD, https://github.com/ezyang commit 793488cda262a338205314ccba90e549e4682f82 Author: Edward Z. Yang Date: Tue Sep 27 15:01:01 2022 -0700 Revert "Revert "Symintifying slice ops (#85196)"" (#85746) This reverts commit 3a171dfb0c08956d55f341039cf35e3a18269c34. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85746 Approved by: https://github.com/albanD commit 049be5ac107f50c1ed7ccfea2f1fcdbdf6be0f88 Author: Edward Z. Yang Date: Tue Sep 27 15:11:21 2022 -0700 Remove some dead code from fake tensor (#85764) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85764 Approved by: https://github.com/wconstab commit 795028a3cec2603a750bdc02ab2b93329f43e883 Author: Thomas Viehmann Date: Wed Sep 28 03:50:42 2022 +0000 Make Python reference for permute accept varargs (#85460) Fixes #85452 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85460 Approved by: https://github.com/jjsjann123, https://github.com/mruberry, https://github.com/ngimel commit ccac8d13d5988de7302551a5df460072eb918683 Author: Howard Huang Date: Tue Sep 27 14:27:17 2022 -0700 [3/N] [Dispatchable Collectives] Update broadcast_ with CPU and CUDA implementations (#83735) * Update the broadcast op to dispatch to cpu and cuda implementations. Right now they both perform the same logic so this is essentially a no-op. * Add test to validate that a separate device implementation is not supported. In the future we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. 
The CPU and CUDA implementations will be updated to have process group select its CPU and CUDA backends respectively. Differential Revision: [D38876771](https://our.internmc.facebook.com/intern/diff/D38876771) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83735 Approved by: https://github.com/kwen2501 commit 01f946d766e3ae58b2371306587659483d5e1b39 Author: Horace He Date: Tue Sep 27 22:57:29 2022 +0000 Rename test file from test_pythonkey to test_aotdispatch (#85769) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85769 Approved by: https://github.com/ezyang commit fc2e7ebaacbca3e8b851d6d6ceef96a616709f93 Author: Horace He Date: Tue Sep 27 22:53:52 2022 +0000 Added floordiv simplification rule needed for models (#85768) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85768 Approved by: https://github.com/ezyang commit a8be59545dd2acd48da2a8f6e99d45ec348d95c4 Author: Paul Saab Date: Wed Sep 28 03:00:30 2022 +0000 [aarch64] Use the correct binary when cross building //caffe2/torch/csrc/deploy:remove_dt_needed (#85632) Test Plan: CI Differential Revision: D39807135 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85632 Approved by: https://github.com/ajtulloch commit 3276b51243fcc0fea4f780216d1f9a5886805d2b Author: Ke Wen Date: Wed Sep 28 02:56:48 2022 +0000 Add environment parse function that supports default value (#85563) We use "-2" to represent an unset environment variable. Now adding a util function to attach default value if environment variable is unset. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85563 Approved by: https://github.com/rohan-varma, https://github.com/H-Huang commit f80ef73d1c0da4938a264a1ac1c903c78ee3fc6a Author: Seonglyong Gong Date: Wed Sep 28 02:48:07 2022 +0000 [Profiler] tracking Optimizer (part 2 of Record Optimizer) (#84920) Summary: Part 2 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063) - hooking from optimizer step - PyOptCall Type - declare data type for collection - python binding - simple unit test case Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39402667 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84920 Approved by: https://github.com/robieta commit 1c0f0b33a0e013d6ec162cf488ff7643c4ffa33e Author: Kurt Mohler Date: Wed Sep 28 01:56:37 2022 +0000 Update `amax/amin/norm/count_nonzero` signatures with `int[*]? dim` (#83300) Changes `dim` arg to use `int[*]?` type for the following functions in `native_funcitons.yaml`: * `amax` * `amin` * `norm` * `frobenius_norm` * `native_norm` * `count_nonzero` Part of #29137 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300 Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth commit 80b88862239554f895a369f88e197ecb0fa53281 Author: Jing Xu Date: Wed Sep 28 01:39:58 2022 +0000 add itt unit test and docstrings (#84848) Add unit tests and docstrings corresponding to PR https://github.com/pytorch/pytorch/pull/63289 UT: 1. `test_profiler_emit_itt` in `test/test_autograd.py`. This test is merely intended to catch if emit_itt breaks on construction. 2. Test `torch.profiler.itt` functions in `test/test_itt.py` 3. Only testing that emit_itt runs when `record_shapes` option is enabled in `test/test_profiler.py`. Docstring: 1. add ITT related info into `docs/source/bottleneck.rst` 4. add `torch.profiler.itt` functions to `docs/source/profiler.rst` 5. 
add docstring to `torch.profiler.itt` functions in `torch/profiler/itt.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84848 Approved by: https://github.com/malfet commit 572dd862c4461e87731f8eabc800b4cfb52cb647 Author: PyTorch MergeBot Date: Wed Sep 28 01:36:43 2022 +0000 Revert "Update `amax/amin/norm/count_nonzero` signatures with `int[*]? dim` (#83300)" This reverts commit 8c7c7ed3221aeeefb63ef2b7a221a5d8b274cda5. Reverted https://github.com/pytorch/pytorch/pull/83300 on behalf of https://github.com/huydhn due to The commit pin breaks XLA test somehow commit 1c1f3a99dcc5c2fdcbdc0ed011167de41efc9497 Author: Chien-Chin Huang Date: Tue Sep 27 11:26:12 2022 -0700 [FSDP] Handle the state_dict on CPU cases (#85640) state_dict may not be on GPUs. We need to move it to the compute_device in order to gather the ShardedTensor. Differential Revision: [D39681730](https://our.internmc.facebook.com/intern/diff/D39681730/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85640 Approved by: https://github.com/rohan-varma, https://github.com/awgu commit ce4f187f15d846b310511809c8afa8bd925d250b Author: Denis Vieriu Date: Wed Sep 28 00:47:52 2022 +0000 [MPS] Add tensor::index_put op (#85672) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85672 Approved by: https://github.com/malfet commit 9858f043508432a8c79691b93faf21d99d5cbf99 Author: Jerry Zhang Date: Tue Sep 27 12:55:02 2022 -0700 [quant][docs] Add types for scale and zero_point tensor for torch.fake_quantize_per_channel_affine docs (#85733) Summary: Fixes: https://github.com/pytorch/pytorch/issues/85525 Test Plan: visual inspection for the docs page Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/85733 Approved by: https://github.com/z-a-f commit 7ff6a00a9a59ee53ca71dfa11697fb7822fd3c0c Author: Kulin Seth Date: Wed Sep 28 00:43:11 2022 +0000 [MPS] Handle 1D weight in linear layer (#85752) Fixes https://github.com/pytorch/pytorch/issues/84591 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85752 Approved by: https://github.com/malfet commit 4ca125a9e1dd1bed2606ce44f137d59905580db5 Author: andrewor14 Date: Tue Sep 27 13:37:00 2022 -0700 [Quant][fx] Add quant and scale ranges to BackendConfig (#85200) **Summary:** This commit adds the following constraints to BackendConfig: quant_min_lower_bound quant_max_upper_bound scale_min_lower_bound scale_max_upper_bound This is motivated by QNNPACK constraints on qint8 weight values and the min scale value. Actually enforcing these constraints in the QNNPACK BackendConfig will follow in a future commit. Today, users can also specify the above constraints through QConfigs, and these settings may not necessarily match the ones specified in the BackendConfig. 
In this case, we will handle the discrepancy as follows: (1) Require QConfig quant ranges to fall within the backend's (2) Require QConfig min scale value (eps) >= backend's (3) Require QConfig to specify quant range if the backend specified one (4) Require QConfig to specify min scale value (eps) if the backend specified one Public API changes: * Previous API, still supported after this commit: ``` dtype_config = DTypeConfig( input_dtype=torch.quint8, output_dtype=torch.quint8, weight_dtype=torch.qint8, bias_dtype=torch.float, ) ``` * New API: ``` dtype_config = DTypeConfig( input_dtype=DTypeWithConstraints( dtype=torch.quint8, quant_min_lower_bound=0, quant_max_upper_bound=127, scale_min_lower_bound=2 ** -12, ), output_dtype=DTypeWithConstraints( dtype=torch.quint8, quant_min_lower_bound=0, quant_max_upper_bound=127, scale_min_lower_bound=2 ** -12, ), weight_dtype=DTypeWithConstraints( dtype=torch.qint8, quant_min_lower_bound=-128, quant_max_upper_bound=127, scale_min_lower_bound=2 ** -12, ), bias_dtype=torch.float, ) ``` * Additionally, the following `DTypeConfig` attributes have new types with helper getters: ``` dtype_config.input_dtype dtype_config.output_dtype dtype_config.weight_dtype dtype_config.get_input_dtype() dtype_config.get_output_dtype() dtype_config.get_weight_dtype() ``` Note that scale_max is currently not used because there is no existing mechanism to enforce this on the observer. In the future, we can validate this as well if there is a use case. **Test Plan:** python test/test_quantization.py TestBackendConfig.test_dtype_with_constraints python test/test_quantization.py TestQuantizeFx.test_backend_config_scale_min python test/test_quantization.py TestQuantizeFx.test_backend_config_quantization_range **Reviewers:** jerryzh168, vkuzo **Subscribers:** jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85200 Approved by: https://github.com/jerryzh168 commit 24a268143da49911d7ab44afb59865dcdd0f3456 Author: Edward Z. Yang Date: Tue Sep 27 15:11:20 2022 -0700 Directly access has_symbolic_sizes_strides, avoid expensive test (#85754) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85754 Approved by: https://github.com/albanD commit 8c7c7ed3221aeeefb63ef2b7a221a5d8b274cda5 Author: Kurt Mohler Date: Tue Sep 27 23:50:01 2022 +0000 Update `amax/amin/norm/count_nonzero` signatures with `int[*]? dim` (#83300) Changes `dim` arg to use `int[*]?` type for the following functions in `native_funcitons.yaml`: * `amax` * `amin` * `norm` * `frobenius_norm` * `native_norm` * `count_nonzero` Part of #29137 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83300 Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/kulinseth commit f1f6cb07e2d486ea1408a7b554dd6f715e400d13 Author: Xia, Weiwen Date: Tue Sep 27 23:40:02 2022 +0000 [UT] Fix random failure of test_qconv_transpose1d by skip using hypothesis (#85463) TestQuantizedConv.test_qconv_transpose1d fails randomly due to hypothesis (according to @jerryzh168). This PR fixes it by rewriting the test case without hypothesis. Use fixed parameters and `itertools.product` to generate test cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85463 Approved by: https://github.com/jerryzh168 commit ea50e7f262e826e9f0eef1623e8e8656911adef9 Author: mikey dagitses Date: Tue Sep 27 23:31:51 2022 +0000 fix ovrsource pytorch build from D39769513 (#85708) Test Plan: Tested locally, verifying with CI. 
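In the spirit of the hypothesis-free rewrite described for #85463 above, a small sketch of sweeping fixed parameter sets with `itertools.product`; the parameter names and values are illustrative only, not the actual test's.

```python
import itertools
import torch

batches, channels, lengths = (1, 2), (2, 4), (4, 8)
for n, c, length in itertools.product(batches, channels, lengths):
    x = torch.randn(n, c, length)
    # ... exercise the quantized conv_transpose1d case for this combination ...
```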
Reviewed By: h-friederich Differential Revision: D39851831 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85708 Approved by: https://github.com/zou3519 commit 7934596b700b34cac507cac4f2b9d106e36efa02 Author: James Zeng Date: Tue Sep 27 23:27:40 2022 +0000 [ucc] Remove internal tracing (#85730) Summary: Remove internal tracing since this was not upstreamed yet. Test Plan: All PyTorch test should pass. Differential Revision: D39853937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85730 Approved by: https://github.com/kwen2501 commit f98109795f9b214286c44af4936b0032a6992df1 Author: Edward Z. Yang Date: Tue Sep 27 14:01:51 2022 -0700 Stop using restore() in ProxyTorchDispatchMode (#85756) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85756 Approved by: https://github.com/samdow commit 0a64e73d1259d8dcfbb1c22f65175e841380c878 Author: Kannav Date: Tue Sep 27 22:58:06 2022 +0000 52189: refractor unreachable Runtime Error (#85725) Fixes #52189 Since the PR #52228 had gone cold. I made the requested changes and fixed the linting error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85725 Approved by: https://github.com/lezcano commit 5bfcf1f01aaf84add54addf7f39afe986602baa9 Author: Andrew M. James Date: Mon Sep 26 17:06:46 2022 -0500 [Docs] Update sparse Maintainers (#85126) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85126 Approved by: https://github.com/cpuhrsch commit 775a22c7c664d6cfca60af287b94c0f70245696e Author: Ke Wen Date: Tue Sep 27 22:50:19 2022 +0000 Add all_gather_into_tensor in place of _all_gather_base (#85686) - This PR renames `_all_gather_base` to `all_gather_into_tensor` so that it is clearer in meaning. - The `all_gather_into_tensor` API differs from the `all_gather` API in the output it accepts -- a single, large tensor instead of a list of tensors. - This PR also adds deprecation warning to `_all_gather_base`. `_all_gather_base` was implemented in https://github.com/pytorch/pytorch/pull/33924 to avoid unnecessary flattening. There was previous effort (#82639) to merge `_all_gather_base` with the existing `all_gather` API by detecting the parameter type passed in for the output. There are, however, two "blockers" that make the merge difficult: (i) The merge leads to backward compatibility break. We would need to change the parameter name `tensor_list` in `all_gather` to a general name `output` that can cover both tensor and tensor list. (ii) Recently, the `all_gather` API has added uneven tensor support, utilizing the tensor boundaries implied by the list. We are, however, not sure to add such support to the `_all_gather_base` function, because that would require users to pass in additional tensor boundary information. In view of the above, we decided to productize `_all_gather_base` as a separate function, but with a clearer name. Added tests: - `test_all_gather_into_cat_tensor_cuda` -- output form as with `torch.cat`. For example: ``` >>> tensor_in tensor([1, 2], device='cuda:0') # Rank 0 tensor([3, 4], device='cuda:1') # Rank 1 >>> tensor_out tensor([1, 2, 3, 4], device='cuda:0') # Rank 0 tensor([1, 2, 3, 4], device='cuda:1') # Rank 1 ``` - `test_all_gather_into_stack_tensor_cuda` -- output form as with `torch.stack`. For example: ``` >>> tensor_out2 tensor([[1, 2], [3, 4]], device='cuda:0') # Rank 0 tensor([[1, 2], [3, 4]], device='cuda:1') # Rank 1 ``` The output form is determined by the shape of the output tensor passed by the user, no flag used. 
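A hedged usage sketch of `all_gather_into_tensor` as described above, shown with a single gloo rank for brevity; backend support for this collective may vary (NCCL is the primary target).

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

t_in = torch.tensor([1, 2])
# The output shape decides the layout: world_size * numel gives the torch.cat-style result.
t_out = torch.empty(dist.get_world_size() * t_in.numel(), dtype=t_in.dtype)
dist.all_gather_into_tensor(t_out, t_in)
print(t_out)

dist.destroy_process_group()
```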
Cc @rohan-varma @mrshenli @crcrpar @ptrblck @H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85686 Approved by: https://github.com/rohan-varma, https://github.com/crcrpar commit 34cee3e82ba77b537ded37478a3852de5ea96bd5 Author: Will Constable Date: Tue Sep 27 22:48:11 2022 +0000 Auto tag milad for symbolic-shapes PRs (#85751) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85751 Approved by: https://github.com/ezyang commit 16543f6878348da33ef18b7d0d16dde096213fd6 Author: Seonglyong Gong Date: Tue Sep 27 22:41:21 2022 +0000 Revisit python tracing OD flow (#85326) Summary: - add `set_withstack()`, overriding `ClientInterface`'s no-op funtion. - revert `start()` and #ifdef Differential Revision: D39647074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85326 Approved by: https://github.com/chaekit commit fc99705f22ae6c4165cca705c79f784bb7c7831a Author: Richard Zou Date: Mon Sep 26 14:48:54 2022 -0700 Add functorch M1 shard (#85565) functorch should have a test wherever regular PyTorch gets tested. This PR adds an M1 shard to test functorch. Test Plan: - wait for CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/85565 Approved by: https://github.com/malfet, https://github.com/huydhn commit 5cbffbbac9a59098637f821e8b6e10f609de30ff Author: Rodrigo Kumpera Date: Tue Sep 27 21:42:24 2022 +0000 C10D extension to enable per-thread PG (#84153) Move a bunch of globals to instance methods and replace all use to them. We move all PG related globals under World and use a singleton instance under _world. This creates an undocumented extension point to inject full control of how how c10d state behaves. One simple hack is to change _world to an implementation that uses a threadlocal and enable per-thread PGs. It almost get DDP working and the PG is missing an implementation of all_reduce. This enables notebook usage of PTD, which is a big deal for learning it: https://gist.github.com/kumpera/32cb051fa26b8cad8bdf671f968dcd68 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84153 Approved by: https://github.com/rohan-varma commit b6bee3c491d3a9a99920fb67203d2bdb8390ac59 Author: atalman Date: Tue Sep 27 21:34:36 2022 +0000 Upload to different path for pypi cudnn (#85753) We want to upload pypi cudnn builds to a different download folder something like cu117_pypi_cudnn Pull Request resolved: https://github.com/pytorch/pytorch/pull/85753 Approved by: https://github.com/seemethere, https://github.com/kit1980 commit 6cfe555f4fe54c8df5a07af39a335d0f4d914d95 Author: Thiago Crepaldi Date: Tue Sep 27 21:26:32 2022 +0000 [ONNX] Apply Common Subexpression Elimination pass to ONNX export (#85665) Exporting graphs with Autocast may fail due to a limitation on JIT tracer. By disabling Autocast cache, tracer works, but there can be performance hit when there is reuse of weights in convolution, for example By applying CSE, such performance loss can be reverted. ps: As a comment at #84092 mentioned, disabling Autocast cache is an acceptable workaround and used throughout PyTorch code. 
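The Autocast weight-cache knob referenced above can be toggled per autocast context; a minimal sketch, shown outside of ONNX export for simplicity.

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
x = torch.randn(1, 3, 16, 16)

# Disabling the cache avoids the tracer limitation at the cost of re-casting weights;
# the exporter's CSE pass can then fold the repeated casts back together.
with torch.autocast("cpu", dtype=torch.bfloat16, cache_enabled=False):
    y = conv(x)
print(y.dtype)
```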
Fixes #84092 ```python graph(%0 : Float(requires_grad=0, device=cpu)): %3 : Scalar = aten::ScalarImplicit(%0), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %13 : int = prim::Constant[value=3](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %14 : int = prim::Constant[value=4](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %15 : int[] = prim::ListConstruct(%13, %14), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %16 : NoneType = prim::Constant(), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %17 : NoneType = prim::Constant(), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %18 : Device = prim::Constant[value="cpu"](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %19 : bool = prim::Constant[value=0](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %20 : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::full(%15, %3, %16, %17, %18, %19), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 return (%20) AFTER graph(%0 : Float(requires_grad=0, device=cpu)): %3 : Scalar = aten::ScalarImplicit(%0), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %13 : int = prim::Constant[value=3](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %14 : int = prim::Constant[value=4](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %15 : int[] = prim::ListConstruct(%13, %14), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %16 : NoneType = prim::Constant(), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: %18 : Device = prim::Constant[value="cpu"](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %19 : bool = prim::Constant[value=0](), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 %20 : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu) = aten::full(%15, %3, %16, %16, %18, %19), scope: test_onnx_opset.TestONNXOpset.test_full..MyModule:: # /home/thiagofc/dev/github/pytorch/test/onnx/test_onnx_opset.py:347:0 return (%20) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85665 Approved by: https://github.com/ngimel, https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit c719ec9c11f9f7f57ebcfe611e0717c24ecb3b9b Author: Kevin Tse Date: Tue Sep 27 13:49:14 2022 -0400 [DataPipe] Fix MapDataPipe spawn lambda test (#85668) The test in its original form fails and I believe it is because the expected result is incorrect, unless we expect different behaviors between `IterDataPipe` and `MapDataPipe` in multiprocessing. 
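For context on the `IterDataPipe` / `MapDataPipe` distinction mentioned above, a tiny hedged example using a module-level function (which pickles under multiprocessing spawn, unlike a lambda):

```python
from torch.utils.data.datapipes.iter import IterableWrapper
from torch.utils.data.datapipes.map import SequenceWrapper

def _double(x):
    return x * 2

iter_dp = IterableWrapper(range(4)).map(_double)        # iterator-style pipe
map_dp = SequenceWrapper(list(range(4))).map(_double)   # index-style pipe
print(list(iter_dp), map_dp[3])
```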
Differential Revision: [D39832182](https://our.internmc.facebook.com/intern/diff/D39832182) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85668 Approved by: https://github.com/ejguan commit 64a526d4af6ea69d84791bf6c4c3e2695f3828ce Author: Kevin Tse Date: Tue Sep 27 13:49:14 2022 -0400 [DataLoader] Replacing `traverse` function with `traverse_datapipes` (#85667) This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead. While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised. ``` /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13. warnings.warn(msg, FutureWarning) ``` A few things we'd like to do: 1. Deprecate the key word arg `only_datapipe` 2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future 3. Do not raise a warning when users are using the function correctly This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`): - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True` - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative. If this looks right, I will send a follow up PR in `TorchData`. Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85667 Approved by: https://github.com/ejguan commit 8a926b31878f8deb6aee051b1438e68c43fcd31b Author: Andrew M. James Date: Tue Sep 27 12:04:59 2022 -0500 Enable CSC @ CSC addmm (#85379) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85379 Approved by: https://github.com/pearu, https://github.com/cpuhrsch commit bb5001ce3d9084279e1e83971976cd0535d21d73 Author: Andrew M. 
James Date: Tue Sep 27 12:04:58 2022 -0500 Enable dense x bsc mm/addmm (#85308) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85308 Approved by: https://github.com/pearu commit e59120ab51039fe2e7642b0cf34902f8fb236091 Author: Nikita Shulga Date: Tue Sep 27 19:43:12 2022 +0000 C++20 compatible changes (#85703) `std::hash::result_type` is deprecated since C++17 and removed in c++20, so use `c10::invoke_result_t` to define it Fixes https://github.com/pytorch/pytorch/issues/85603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85703 Approved by: https://github.com/ezyang commit e746fff8ba2338c56cc88fef6e5d131b5590db8a Author: Abhishek Pathak Date: Tue Sep 27 19:08:22 2022 +0000 [MPS] Enable adaptive avg pool 2d with larger output size (#85726) * Handle adpative pool 2d forward and backward when ouptut size is larger than input size * Disallow larger output size if not a multiple of input size Fixes: https://github.com/pytorch/pytorch/issues/80732 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85726 Approved by: https://github.com/malfet commit c8776dca6a503eaa92a8d7b2427ebce7a19df398 Author: Srikumar Sastry Date: Tue Sep 27 18:43:39 2022 +0000 Remove extra `with` in value error exception statement (#84713) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84713 Approved by: https://github.com/ngimel commit a1066110559701ceb7acc090d3972b9ed6f4231d Author: samdow Date: Tue Sep 27 11:40:29 2022 -0400 [Modes] fix handle_torch_funcion logic (#85707) Fixes #85696. I didn't totally get what was happening in handle_torch_function and so was trying to recreate the original logic instead of follow what the C++ is doing. This fixes that Pull Request resolved: https://github.com/pytorch/pytorch/pull/85707 Approved by: https://github.com/ezyang commit f4251525dece37071d846901107deef3978c55ad Author: Omkar Salpekar Date: Tue Sep 27 18:11:18 2022 +0000 Adding Wunused-lambda-capture to Clang build flags (#85655) Add `-Wunused-lambda-capture` to clang build flags to better align internal and OSS build systems. This flag is not supported in gcc so only adding for clang builds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85655 Approved by: https://github.com/huydhn commit d51f6de9b8794aa1d5af6e2e4ea0ccf1a2d69f95 Author: Jesse Cai Date: Tue Sep 27 13:40:01 2022 +0000 [quant][core][feature] Implement index_put for quantized CUDA tensors (#85685) Summary: - Add new cuda test for quantized index_put - Add determinsitc test for CPU and CUDA quantized index_put - Add in QuantizedCUDA implementation for index_put - wrote new `index_put_kernel_quantized_cuda` - CUDA index_put determinstic implemented in `index_put_with_sort_kernel_quantized` I think quantize_val is not CUDA compatible, because of the reliance on std::numeric_limits. Might be something useful to add in the future? Test Plan: ``` python test/test_quantization.py -k test_qtensor_index_put ``` Reviewers: Subscribers: Tasks: Tags: quant Pull Request resolved: https://github.com/pytorch/pytorch/pull/85685 Approved by: https://github.com/dzdang commit 3a171dfb0c08956d55f341039cf35e3a18269c34 Author: PyTorch MergeBot Date: Tue Sep 27 18:01:27 2022 +0000 Revert "Symintifying slice ops (#85196)" This reverts commit 4c01c51266afae57c6d6952c84fff2802d9b2bb9. 
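Tying back to the MPS adaptive-average-pool change in #85726 above, a guarded sketch of the now-supported case where the output size is larger than, and an integer multiple of, the input size; it only runs on an MPS-enabled build.

```python
import torch
import torch.nn.functional as F

if torch.backends.mps.is_available():
    x = torch.randn(1, 3, 4, 4, device="mps")
    y = F.adaptive_avg_pool2d(x, output_size=(8, 8))  # larger than input, but a multiple of it
    print(y.shape)
```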
Reverted https://github.com/pytorch/pytorch/pull/85196 on behalf of https://github.com/atalman due to Break internal build Exutorch commit 9f1468ae6c8e6e3938a3f1cfb9378e11af2fd0cd Author: Peter Jung Date: Tue Sep 27 17:41:56 2022 +0000 CyclicLR memory leak fix (#85462) Hi, we noticed in our team that by using CyclicLR, there is a problem with memory clearance on GPU (probably it will be the case without the GPU as well, but that was our use case) After initializing CyclicLR, GPU memory is not cleared even after the model, optimizer and scheduler are out of scope (e.g. reference count is zero). This is because `__init__` method inside `CyclicLR` creates reference to its own methods and it will not get removed until `gc.collect()` is called manually. This is a problem if people want to test multiple models in one run of a script, after testing the first model, second one will fail on `CUDA out of memory error` because the first one is not cleared from the memory. I propose a simple fix by using `weakref`, similarly as in `_LRScheduler` base class, but if you have any comments I am happy to change it. Here is the code to reproduce the bug: ``` import torch import weakref from transformers import DetrForObjectDetection class X: def __init__(self, optimizer): self.optimizer = optimizer self.func = self.dummy def dummy(self, x): return 1. def test(): model = DetrForObjectDetection.from_pretrained('facebook/detr-resnet-50') model.to('cuda') optimizer = torch.optim.Adam(model.parameters()) x = X(optimizer) test() print(f'{torch.cuda.memory_reserved()}, {torch.cuda.memory_allocated()}') # Should print (, 0), but with cyclic reference, it will print (, ). ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85462 Approved by: https://github.com/albanD commit 4c4e5f6106b69960833d7766799fd4f246aa7cd7 Author: Mikayla Gawarecki Date: Tue Sep 27 05:29:57 2022 +0000 Allow only one -1 in nested view/reshape (#85691) Behavior before this PR: 1. `-1` allowed for implicit batch dimension 2. multiple `-1`s allowed for pre-existing dimensions 3. for new dimensions, `-1` is not allowed it is worth noting that for the most part 3 is basically unreachable because assuming a nested tensor has at least 1 ragged dimension, you would expect at least one -1 to be in the proposed shape for the pre-existing dimensions Behavior after this PR: 1. batch dimension **must be specified** 2. **only one** `-1` allowed for pre-existing dimensions **this effectively means that we only allow reshaping/viewing of nt with ONE ragged dimension** 3. unchanged Pull Request resolved: https://github.com/pytorch/pytorch/pull/85691 Approved by: https://github.com/cpuhrsch commit 7167996346c5e5299559c8501821d2ab7ef770d3 Author: PyTorch MergeBot Date: Tue Sep 27 16:59:35 2022 +0000 Revert "resubmit: [mta] APEX style Fused Adam (#81705) (#85507)" This reverts commit 4615d1bcfa0915a992e7445086ba559ca7441607. Reverted https://github.com/pytorch/pytorch/pull/85507 on behalf of https://github.com/atalman due to Break internal windows builds commit f8e71ca3384370fa42e4ad386ff567c5d28c6506 Author: li-yi-dong <73142299+li-yi-dong@users.noreply.github.com> Date: Tue Sep 27 16:55:39 2022 +0000 Designate divice to generate_square_subsequent_mask (#85609) When the model is on GPU, generating the mask on defalut device(cpu) is quite time consuming. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85609 Approved by: https://github.com/albanD commit aaef5d8f2cb46e3a8cc81244c69c2140fb0bbd1b Author: Andrew M. 
James Date: Tue Sep 27 09:12:02 2022 -0500 sparse mm/addmm enable dense x csc, csc x dense and simplify layout check logic. (#85307) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85307 Approved by: https://github.com/pearu, https://github.com/cpuhrsch commit b656ba0b1105aa672c1c7be6138fcdca7ad924c8 Author: Peter Bell Date: Tue Sep 27 14:58:09 2022 +0100 Use hexfloat for threshold OpInfo tests (#85676) 0.123 isn't exactly representable as a floating point value, and so the threshold will move marginally depending on the data type where the computation is performed. This leads to a rare flake in tests comparing against a reference implementation. Instead, this chooses a threshold which is exactly representable as a bfloat16 value and thus has the same value for all data types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85676 Approved by: https://github.com/ngimel commit fdef5078977d5aa5d3b77acd324d36212c7159a5 Author: Peter Bell Date: Tue Sep 27 14:58:09 2022 +0100 Simplify noncontiguous_like (#85518) This removes the special casing for zero-dim tensors and also uses indexing instead of manual stride manipulations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85518 Approved by: https://github.com/albanD commit 101f10d7cacbb29ad62a519a030858a35fbba6d4 Author: S. Song <41357537+shmsong@users.noreply.github.com> Date: Tue Sep 27 15:53:01 2022 +0000 Cherry pick sorting patch (#85620) Fixes https://github.com/csarofeen/pytorch/issues/1947 Cherry-picked patch for torchbench issues where fusion segmenter asserts in nvfuser: 1. test the groups comes with the same order as they are merged. 2. Fix detection of un-mappable root domains: ComputeAtRootDomainMap flags domains that should not be mapped due to reductions. Previously, checking if a domain potentially causes an invalid mapping is only done with one domain in each group of domains that are found to be mappable so far. That's not actually sufficient as the unmappable domain set is created just once with no root mapping information. The fix is to check all consumer domains of a producer tensor. A small other fix is also done to address a different problem discovered after the first fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85620 Approved by: https://github.com/csarofeen, https://github.com/davidberard98 commit 1367f2409f11aaf3d56ad81b0c0cc79120e9d124 Author: Nikita Shulga Date: Tue Sep 27 15:44:53 2022 +0000 [MPS] Fix placeholder behavior for transposed view (#85689) Looks like the expectation in that code were that `.clone` will return contiguous tensor, so explicitly specify memory format Fixes https://github.com/pytorch/pytorch/issues/85675 and https://github.com/pytorch/pytorch/issues/85224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85689 Approved by: https://github.com/kulinseth commit 15c52ffc4f9a02f7078033677d44ccd760107952 Author: soulitzer Date: Mon Sep 26 18:24:57 2022 -0400 Disallow auto_element_wise for in-place and fix some in-place gradients (#85634) Fixes https://github.com/pytorch/pytorch/issues/85535 Also fixes the backward and forward gradients of `nn.functional.threshold`. The issue was that in-place gradients weren't tested because the in-place variants were not properly registered to the OpInfo. Perhaps an alternative to this to make auto_element_wise smart enough to actually handle the in-places cases (we have 4 cases total now where we manually copy_ after doing auto_element_wise), but that requires a few more changes. 
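A minimal, hedged way to exercise the in-place gradient path described above (user-level gradcheck only; the real change lives in the derivative and OpInfo definitions, not in user code):

```python
import torch
from torch.autograd import gradcheck

# Illustrative only: a quick way to exercise the in-place threshold backward discussed
# above; the actual fix is in the derivative/OpInfo registration, not in user code.
x = torch.randn(8, dtype=torch.double, requires_grad=True)

def fn(inp):
    # clone first so the in-place op does not write into a leaf that requires grad
    return torch.nn.functional.threshold(inp.clone(), threshold=0.0, value=-1.0, inplace=True)

print(gradcheck(fn, (x,)))  # prints True once the in-place backward formula is correct
```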
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85634 Approved by: https://github.com/albanD commit 01dbbeeeb5ab7ede28e333982e98713282a0e4b8 Author: Sherlock Huang Date: Tue Sep 27 04:15:56 2022 +0000 Expose cpp_backtrace to python binding (#84896) We can now get cpp stack trace by calling torch.utils.get_cpp_backtrace() Sample output when calling from a torch_dispatch stack: ``` frame #23: torch::handle_torch_function_no_python_arg_parser(c10::ArrayRef, _object*, _object*, char const*, _object*, char const*, torch::TorchFunctionName) (0x7f69330bab90 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/utils/python_arg_parser.cpp:323) frame #24: (0x7f6932a09e79 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/python_variable.cpp:2252) frame #25: (0x7f69261aee33 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:56) frame #26: (0x7f69261afef9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:19) frame #27: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #28: (0x7f6926fae9b9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/boxing.h:227) frame #29: at::Tensor c10::Dispatcher::redispatch(c10::TypedOperatorHandle const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821f5 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:106) frame #30: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #31: (0x7f692ae4f8be in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1361) frame #32: (0x7f692ae4f9b1 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1362) frame #33: (0x7f692aef77e9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #34: (0x7f6926fae7d8 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:50) frame #35: at::Tensor c10::Dispatcher::redispatch(c10::TypedOperatorHandle const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821c9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:97) frame #36: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #37: (0x7f6929ec654a in /fsx/users/bahuang/repos/pytorch_fsx/build/aten/src/ATen/RedispatchFunctions.h:10697) frame #38: (0x7f6929d9edae in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2837) frame #39: (0x7f6929d9f043 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2838) frame #40: (0x7f6929e7d2f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #41: (0x7f6929eb1344 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:478) frame #42: (0x7f6929ea7b99 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:490) frame #43: (0x7f6929e7d370 in 
/fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:563) frame #44: (0x7f6929e7d43a in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:239) frame #45: (0x7f6929e7d48c in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:364) frame #46: (0x7f6929e7d50a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:554) frame #47: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #48: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #49: c10::Dispatcher::redispatchBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f692603890a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:652) frame #50: (0x7f69260387f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:388) frame #51: (0x7f69261af0ef in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:96) frame #52: (0x7f69261aff2b in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:25) frame #53: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #54: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #55: c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const (0x7f6925fd6ab2 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:628) frame #56: (0x7f6925fd6690 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:376) frame #57: (0x7f692bf5b525 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:380) frame #58: (0x7f692bf59fac in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/runtime/register_c10_ops.cpp:15) frame #59: (0x7f692bf5af41 in /usr/include/c++/7/bits/std_function.h:316) frame #60: std::function >&)>::operator()(std::vector >&) const (0x7f6932ab9a0f in /usr/include/c++/7/bits/std_function.h:706) frame #61: (0x7f6932aad541 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/stack.h:41) frame #62: (0x7f6932ab3102 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1206 (discriminator 1)) frame #63: (0x7f6932ab3943 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1272) frame #64: (0x7f6932a46120 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/init.cpp:1767) frame #65: (0x7f6932a997be in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1441) frame #66: (0x7f6932a8a985 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1410) frame #67: (0x7f6932a66e1e in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:249) frame #68: (0x7f6932a66ec2 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:224) frame #69: (0x7f6932473111 in 
/fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:929) frame #104: __libc_start_main (0x7f693485dc87 in /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:310) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84896 Approved by: https://github.com/ezyang commit 54e03cdda9fca7fcd8b29e40812213b6ebc8c091 Author: Rodrigo Kumpera Date: Tue Sep 27 14:45:56 2022 +0000 Don't use a fixed name to avoid race conditions. (#84952) Fixes #84886 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84952 Approved by: https://github.com/rohan-varma commit 0183c1e3362c53bedea88932f08bedb78a5822d7 Author: anjali411 Date: Tue Sep 27 12:30:22 2022 +0000 Add __all__ to torch.utils submodules (#85331) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85331 Approved by: https://github.com/albanD commit f64857189d514a662dac09ca6421e1e95e87e843 Author: Andrew M. James Date: Mon Sep 26 16:59:54 2022 -0500 resize_as_sparse support all compressed layouts (#85378) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85378 Approved by: https://github.com/pearu, https://github.com/cpuhrsch commit 45be74cc63aad66f496475a9513e3cc36cace5b6 Author: Wang, Eikan Date: Mon Sep 26 07:50:32 2022 +0000 Optimize to if the datatyep of the source tensor is as same as the dest datatype (#85140) The AMP inserts `_autocast_to_reduced_precision` and `_autocast_to_full_precision` automatically. The aten implementation provides a fast path to bypass the conversion if the tensor data type has been the reduced/full precision. But NNC always does the conversion which could bring >5% E2E performance regression. This PR is to address the performance issue like aten. We will not pull `_autocast_to_reduced_precision` and `_autocast_to_full_precision` into NNC fusion group and fallback to aten to trigger its fast path if the tensor data type has been the reduced/full precision. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85140 Approved by: https://github.com/frank-wei commit 83261ff9a8bcf47ae343be40d171bfdce3bd613c Author: Wang, Eikan Date: Mon Sep 26 07:50:30 2022 +0000 Use high precision accmulate buffer for bf16 accmulation (#84402) Accumulation operation is not friendly to BFloat16 because its mantissa part is only 7bits while the operand could not impact the final result if it is very small. Take `a += b` as an example, `a` will become bigger with running the computation. And then, the variance between `a` and `b` also is being huge, the `b` would not impact `a`. Hence, the best practice is to use FP32 to do accumulation and then convert back to BF16 as long as the accumulation is finished. This PR also follows the best practice. We extend the `ReduceOp` by adding `accumulation` buffer and recording the result buffer and `Reducer`'s operand. Because we need to replace the original `ReduceOp` with a new `ReduceOp` to use `accumulation` buffer for reduction. 
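As a rough numerical illustration of the accumulation problem (plain user-level Python with made-up values, not the NNC `ReduceOp` change itself):

```python
import torch

# Summing 4096 copies of 0.01 should give ~40.96.  A running bf16 accumulator stalls
# once the total is large relative to the addend, while an fp32 buffer that is
# converted to bf16 only at the end keeps the result accurate.
vals = [0.01] * 4096

acc_bf16 = torch.tensor(0.0, dtype=torch.bfloat16)
acc_fp32 = torch.tensor(0.0, dtype=torch.float32)
for v in vals:
    acc_bf16 = acc_bf16 + torch.tensor(v, dtype=torch.bfloat16)
    acc_fp32 = acc_fp32 + v  # accumulate in fp32

print(acc_bf16.item())                     # far below 40.96: small addends get rounded away
print(acc_fp32.to(torch.bfloat16).item())  # close to 40.96, rounded to bf16 only once
```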
- Extend `ReduceOp` by adding `accumulation` buffer and recording the result buffer and `Reducer`'s operand - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-0f4be13525117d5c49c69bd18e92eb15dda36b5a59b7a10c7e1114f5cac10afbR225-R229) - Replace the original `ReduceOp` with a new `ReduceOp` to use `accumulation` buffer for reduction - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-fac6725328dc01e235944c7afc9f29c804488973c02c25ecd93d562884d959b3R26-R36) - Cast the accumulation buffer from FP32 to BF16 and write back to the result buffer - [PR change](https://github.com/pytorch/pytorch/pull/84402/files#diff-fac6725328dc01e235944c7afc9f29c804488973c02c25ecd93d562884d959b3R62-R67) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84402 Approved by: https://github.com/frank-wei commit cf5699f2fc3e322efe2bb2446949dc30f1d36b4b Author: ssjia Date: Mon Sep 19 11:52:32 2022 -0700 [vulkan] Rewrite prepacking functions using aten functions + some code cleanup (#84973) Rewrites the convolution prepacking function using aten ops, removing a large amount of redundant code. Adds detailed comments describing the transformations that are performed. Also cleans up some unneeded code. Differential Revision: [D39486489](https://our.internmc.facebook.com/intern/diff/D39486489/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84973 Approved by: https://github.com/salilsdesai commit b360d66391f03a0d5dc2c9a7aff496324b75aa2f Author: PyTorch MergeBot Date: Tue Sep 27 02:55:59 2022 +0000 Revert "Add environment parse function that supports default value (#85563)" This reverts commit 784f4ba1ce16996d497fae2fb107425b3bbeb71b. Reverted https://github.com/pytorch/pytorch/pull/85563 on behalf of https://github.com/huydhn due to Fail test_DistributedDataParallel (main.TestDistBackendWithSpawn) commit e1e056ac447290ed113926f916ab768c0c81641b Author: PyTorch MergeBot Date: Tue Sep 27 02:38:36 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85683) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85683 Approved by: https://github.com/pytorchbot commit 8125d2e188ed9e7be4cb3f76d2ed5c6260316ff8 Author: PyTorch MergeBot Date: Tue Sep 27 02:33:55 2022 +0000 [vision hash update] update the pinned vision hash (#85684) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85684 Approved by: https://github.com/pytorchbot commit 7a5449f148dba51ef51979cf371226b158a5b73b Author: Abhishek Pathak Date: Tue Sep 27 01:54:42 2022 +0000 [MPS] Clamp op - fix shape issues (#114) (#85673) * Handle shape mismatch * Handle case where 1 occurs in input shape; fix fill_new_shapes * Move clamp ops to allowlist Pull Request resolved: https://github.com/pytorch/pytorch/pull/85673 Approved by: https://github.com/malfet commit cce6d8d6419bcaeb8e65c809b16901810194c221 Author: Richard Barnes Date: Tue Sep 27 01:38:32 2022 +0000 Fix warning in kineto_shim.h (#85653) Fixes: ``` In file included from /dev/shm/rbarnes/tempfs/pytorch/torch/csrc/profiler/kineto_shim.cpp:1: /dev/shm/rbarnes/tempfs/pytorch/torch/csrc/profiler/kineto_shim.h:111:8: warning: private field 'saved_' is not used [-Wunused-private-field] bool saved_ = false; // Kineto's save is destructive ^ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85653 Approved by: https://github.com/ezyang commit 18d8c548f4458c03ae6be60903db3ccf1c0d8d3f Author: samdow Date: Mon Sep 26 16:42:07 2022 -0400 [Modes] remove enable and rewrite mode stack (squashed) (#84774) Based on @ezyang's suggestion, mode stack now has "one true mode" which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch is just to take the top mode in the stack, reenable itself (if we aren't at the end of the mode stack), and run the top mode's torch_{dispatch|function} This maintains that in the middle of a mode's torch dispatch, the mode itself will not be active. It changes the function the user has to call to see what the current mode is (no longer queries the C++, it's python only) but allows the user to also see the entire mode stack easily Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like ```python def f(mode): with mode.restore(): # user needs to understand this restore thing? ... with Mode() as m: pass f(m) ``` Many users were getting error from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write ```python def f(mode): with mode: ... f(Mode()) ``` ** Technical Details ** With the old mode stack, we basically had a linked list so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a python list that we're pushing to and popping from. There's only one mode that's ever active at the C++ level and it runs the next mode in the Python list. 
The modes don't have state on them anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit a0be0ca16144fababbbb94f58b1fd88e41f29162 Author: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com> Date: Tue Sep 27 01:01:16 2022 +0000 [MPS] Fix test consistency error 'mlir module expected element type ui8 but received si8' (#85666) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85666 Approved by: https://github.com/kulinseth commit b8d2ab3dd5869e0af3aa9490636acf887d664be7 Author: Ramin Azarmehr Date: Tue Sep 27 01:00:53 2022 +0000 [MPS] Fix memory leaks that cause the buffers not to be released and cause OOM (#85661) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85661 Approved by: https://github.com/kulinseth commit 755b39ba6668d2ce8c0e3ba4ee032ff0dfc827b7 Author: Wenguang Mao Date: Tue Sep 27 00:56:57 2022 +0000 [LRD] Allowing using dedicated iteration counter for learning rate (#85195) Summary: So that we could manipulate the iteration counter for lrarning rate separately (for learning rate decay or learning rate re-warming up etc), without affecting other techniques relying on iterations (such as EMA) Test Plan: Unit tests: ``` ✓ Pass: caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - test_global_norm_based_gradient_clipping (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - test_lr_injection (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475) ✓ Pass: caffe2/caffe2/python:optimizer_test - main (46.475) Summary Pass: 5 Skip: 1 ↻ caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) ListingSuccess: 1 ``` Reviewed By: liangming168 Differential Revision: D38747417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85195 Approved by: https://github.com/liangming168, https://github.com/eellison commit 784f4ba1ce16996d497fae2fb107425b3bbeb71b Author: Ke Wen Date: Tue Sep 27 00:34:50 2022 +0000 Add environment parse function that supports default value (#85563) We use "-2" to represent an unset environment variable. Now adding a util function to attach default value if environment variable is unset. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85563 Approved by: https://github.com/rohan-varma, https://github.com/H-Huang commit 686555b663077b40f28bc88adc049e64035046b4 Author: George Qi Date: Mon Sep 26 21:03:05 2022 +0000 [maskedtensor] port torch/_masked into torch/masked (#85515) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85515 Approved by: https://github.com/cpuhrsch commit 90261945b71d2ac2a24bd59cbaf823a84ef3b8d2 Author: Elias Ellison Date: Mon Sep 26 20:41:18 2022 +0000 Copy over non parameter grad (#85658) Wow, ugh silly mistake. Fix for https://github.com/pytorch/torchdynamo/issues/1291 not even sure how all the tests passed before this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85658 Approved by: https://github.com/Chillee, https://github.com/anijain2305 commit 4a2d2e5e40835a7931577fc800a02574e2eb44e6 Author: Brian Hirsh Date: Mon Sep 26 11:43:12 2022 -0700 Change API type `Tensor[]` for structured kernels. 
(#73350) Partially fixes: #66328 This PR: - adds support for `ITensorList` to the dispatcher for: - computing the dispatch key - boxing and unboxing `ITensorList` - modified the codegen for structured kernels: - codegen APIs use `ITensorList` instead of `ArrayRef` **Changes summary:** - Signature changes due to the different APIs: - dispatcher API (e.g. `BatchingRegistrations.cpp`) - C++ API (e.g. `TensorShape.cpp`) - Miscelaneous functions used by codegen'd functions (e.g. `FunctionalTensorWrapper.*`) - Dispatcher changes for handling `ITensorList` correctly (e.g. `DispatchKeyExtractor.h`) - Signature changes of `at::cat` due to the need of `const` inside `TensorBody.h` - Forward declarations of `ITensorList` (e.g. `MethodOperators.h`) - Codegen changes, special casing structured kernels (e.g. `gen.py`) **Short description of structured kernels special casing:** I introduced, mainly, 5 types of changes to the codegen for generating code depending on whether the kernel is structured or not: 1. Added a `structured_type_override` flag to the `argument_type` function definition of the affected APIs (mainly the dispatcher and C++ APIs). - `api/cpp.py`, `api/dispatcher.py`, `api/native.py` 2. Added a `structured_type_override` member to the signature classes (e.g. `CppSignature`), since `FunctionSchema` doesn't really know whether the function is structured or not - `api/types.py` 3. Added a `part_of_structured_group` to `NativeFunction` class, which is just a convenient function to forward to `structured_type_override` wherever needed - `model.py` 4. Appropriately changed the rest of the codegen, whenever it used either the signature classes or the `arguments` function directly 5. Added a check for `const ITensorList&` type wherever there was a check for `TensorList` Pull Request resolved: https://github.com/pytorch/pytorch/pull/73350 Approved by: https://github.com/bdhirsh commit 1a2734e015a695bbd4ea4de93bf6aaaa3202eed8 Author: Huy Do Date: Mon Sep 26 21:39:00 2022 +0000 Fix broken periodic workflow after removing ios secret (#85664) Broken after https://github.com/pytorch/pytorch/pull/85597, i.e. https://github.com/pytorch/pytorch/actions/runs/3130970082 ``` The workflow is not valid. .github/workflows/periodic.yml (Line: 189, Col: 26): Invalid secret, IOS_CERT_KEY_2022 is not defined in the referenced workflow. .github/workflows/periodic.yml (Line: 190, Col: 24): Invalid secret, IOS_CERT_SECRET is not defined in the referenced workflow. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85664 Approved by: https://github.com/clee2000, https://github.com/kit1980, https://github.com/janeyx99, https://github.com/ZainRizvi, https://github.com/malfet commit e4471032dae4d68e82358e715777f2f385d7ff09 Author: Huy Do Date: Mon Sep 26 21:35:00 2022 +0000 Enforce non-virtual-dtor everywhere (#85586) This can finally be removed because NVIDIA has merged my PR on https://github.com/NVIDIA/cudnn-frontend/pull/33 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85586 Approved by: https://github.com/seemethere, https://github.com/ZainRizvi commit f325c29b05fdf350322dcb26e2cd6dff3fb06f1e Author: Mike Iovine Date: Mon Sep 26 21:30:16 2022 +0000 [fx] Make NormalizeArgs preserve node type (#85637) Summary: Make `NormalizeArgs` preserve node types when transforming the graph. This bug is preventing me from scripting a graph that goes through the fx2trt `acc_tracer`. 
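A minimal sketch of the expectation, assuming the experimental pass lives at `torch.fx.experimental.normalize` (illustrative only; the new unit test is the authoritative check):

```python
import torch
import torch.fx as fx
# Import path is an assumption about where the experimental normalization pass lives.
from torch.fx.experimental.normalize import NormalizeArgs

class M(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

traced = fx.symbolic_trace(M())
normalized = NormalizeArgs(traced).transform()

for before, after in zip(traced.graph.nodes, normalized.graph.nodes):
    # node type annotations should survive the args-to-kwargs rewrite
    print(before.op, before.type, "->", after.type)
```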
Test Plan: New unit test Reviewed By: ipiszy Differential Revision: D39753021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85637 Approved by: https://github.com/Chillee commit 5547c6aa4e587578eb62d3b102520bc1eebba419 Author: Sherlock Huang Date: Mon Sep 26 17:57:01 2022 +0000 Match kwargs in SubgrpahMatcher (#85617) Pattern node and target node must have identical kwargs now... Use envvar `LOGLEVEL=INFO` to turn on the logging message for easier debugging... Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #85617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85617 Approved by: https://github.com/jerryzh168, https://github.com/davidberard98 commit e38b3424c3f0555c1c255130064dad60c5046c4f Author: Richard Zou Date: Mon Sep 26 08:22:13 2022 -0700 Clean up the functorch test skip mechanism; add a new decorator (#85564) This PR: - adds a `decorate` thing that can be added to skip/xfail lists. This lets people provide their own decorator (e.g. unittest.skipIf blah) - does some refactoring of the skip/xfail list mechanism to make it more sane Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85564 Approved by: https://github.com/samdow commit 6a04df3ac85714b33dcb2af20d0eeea96d131d75 Author: cpuhrsch Date: Mon Sep 26 20:49:19 2022 +0000 Get flash_attn to compile for CUDA 11.6 linux nightly build (#84941) This PR only attempts to get this code to compile for all archs so that we can dispatch to it in https://github.com/pytorch/pytorch/pull/84653 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84941 Approved by: https://github.com/drisspg, https://github.com/malfet commit 15435325eb6c561f8a126b5c4e48ac57c064a6fa Author: Daniel Dale Date: Mon Sep 26 20:17:52 2022 +0000 Configure PyTorch Testing ArgumentParser Instance To Avoid Unnecessary Conflicts with System Args (#85616) Fixes #85615 Currently, internal test discovery instantiates an `ArgumentParser` and adds numerous arguments to the internal parser: https://github.com/pytorch/pytorch/blob/f0570354dda37c03c63377ada1ec889cf82ae9f6/torch/testing/_internal/common_utils.py#L491-L500 ... In this context, `argparse` will load [system args](https://github.com/python/cpython/blob/b494f5935c92951e75597bfe1c8b1f3112fec270/Lib/argparse.py#L1826-L1829) from any external scripts invoking PyTorch testing (e.g. `vscode`). The default behavior of `argparse` is to [allow abbreviations](https://github.com/python/cpython/blob/b494f5935c92951e75597bfe1c8b1f3112fec270/Lib/argparse.py#L2243-L2251) of arguments, but when an `ArgumentParser` instance has many arguments and may be invoked in the context of potentially conflicting system args, the `ArgumentParser` should reduce the potential for conflicts by being instantiated with `allow_abbrev` set to `False`. With the current default configuration, some abbreviations of the `ArgumentParser` long options conflict with system args used by `vscode` to invoke PyTorch test execution: ```bash python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \ ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/visualstudio_py_testlauncher.py \ --us=./test --up=test_cuda.py --uvInt=2 -ttest_cuda.TestCuda.test_memory_allocation \ --testFile=./test/test_cuda.py >>>PYTHON-EXEC-OUTPUT ... 
visualstudio_py_testlauncher.py: error: argument --use-pytest: ignored explicit argument './test' ``` The full relevant stack: ``` pytorch/test/jit/test_cuda.py, line 11, in \n from torch.testing._internal.jit_utils import JitTestCase\n'\ pytorch/torch/testing/_internal/jit_utils.py, line 18, in \n from torch.testing._internal.common_utils import IS_WINDOWS, \\\n' pytorch/torch/testing/_internal/common_utils.py, line 518, in \n args, remaining = parser.parse_known_args()\n' argparse.py, line 1853, in parse_known_args\n namespace, args = self._parse_known_args(args, namespace)\n' argparse.py, line 2062, in _parse_known_args\n start_index = consume_optional(start_index)\n' argparse.py, line 1983, in consume_optional\n msg = _(\'ignored explicit argument %r\')\n' ``` The `argparse` [condition](https://github.com/python/cpython/blob/b494f5935c92951e75597bfe1c8b1f3112fec270/Lib/argparse.py#L2250) that generates the error in this case: ```python print(option_string) --use-pytest print(option_prefix) --us option_string.startswith(option_prefix) True ``` It'd be nice if `vscode` didn't use two-letter options :facepalm: but PyTorch testing shouldn't depend on such good behavior by invoking wrappers IMHO. I haven't seen any current dependency on the abbreviated internal PyTorch `ArgumentParser` options so this change should only extend the usability of the (always improving!) PyTorch testing modules. This simple PR avoids these conflicting options by instantiating the `ArgumentParser` with `allow_abbrev=False` Thanks to everyone in the community for their continued contributions to this incredibly valuable framework. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85616 Approved by: https://github.com/clee2000 commit d5ce2bbed26a175e7bf69480759c2cfe73f42a75 Author: Fabio Rocha Date: Mon Sep 26 09:03:23 2022 +0000 [primTorch] decompositions for upsample_bicubic2d (#85403) FYI, this decomposition seems to be significantly slower than the lowering in torchinductor: ``` ------------------------------------- upsample_bicubic2d -------------------------------------] | lowering | Inductor | Eager 32 threads: ------------------------------------------------------------------------------------ (torch.Size([16, 4, 128, 256]),), ((512, 1024), True) | 1.8 | 3.880 | 1.4 (torch.Size([16, 4, 128, 256]),), ((512, 1024), False) | 1.9 | 3.887 | 1.4 ``` This seems related to the fact that in the lowering we can use int32s as the indices and in the decomp we can only use int64s (see https://github.com/pytorch/torchdynamo/issues/1293). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85403 Approved by: https://github.com/ngimel commit 70cce9f8d1a099af8b017d7263897d3ca2fb9fe6 Author: PyTorch MergeBot Date: Mon Sep 26 20:07:13 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85225) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85225 Approved by: https://github.com/pytorchbot, https://github.com/voznesenskym commit 89896b8778b76685c4fc40d1f8ce337d36105c02 Author: Bobby Impollonia Date: Mon Sep 26 19:13:00 2022 +0000 Fix typo in comment (#85635) This comment should talk about an object "leak", not an object "lead" Pull Request resolved: https://github.com/pytorch/pytorch/pull/85635 Approved by: https://github.com/kit1980 commit 291b080e8c54cdb19dd823728cf30fffece10f4d Author: Aaron Bockover Date: Mon Sep 26 18:47:06 2022 +0000 CODEOWNERS: [ONNX] remove @shubhambhokare1; add @abock (#85476) Add me to field notifications for the ONNX team, replacing @shubhambhokare1. cc @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85476 Approved by: https://github.com/kit1980, https://github.com/AllenTiTaiWang commit a8ca0d4849111fd6c83e5ae90517f83bf1ceb6b3 Author: Vasiliy Kuznetsov Date: Fri Sep 23 10:59:21 2022 -0600 fix segmentation fault in QTensor.choose_qparams_optimized (#85552) Summary: Fixes segmentation fault in `QTensor.choose_qparams_optimized`, this guards against the user passing in a value of `numel` which does not make sense. Fixes https://github.com/pytorch/pytorch/issues/85212 Test plan: Probably not worth it to add a test for this, so testing manually. ``` import torch input = torch.full((64,), 1, dtype=torch.float32, requires_grad=False) numel = 1250999896764 n_bins = 0 ratio = 0 bit_width = 0 torch.choose_qparams_optimized(input, numel, n_bins, ratio, bit_width) // RuntimeError is thrown ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85552 Approved by: https://github.com/jerryzh168 commit bcc544e9d7b257c8b287151194d87565e95eb893 Author: Elias Ellison Date: Mon Sep 26 07:03:44 2022 -0700 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang commit 0f561f0bd21a1f3214cc4050b3e4a1cc739981e9 Author: Bin Chen Date: Mon Sep 26 16:05:17 2022 +0000 Log Watchdog events to scuba (#85391) Summary: This diff logs some events of FileTimerServer to a scuba table. The events include "server started", "server stopped", "set timer", "clear timer" and "kill worker process". Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test ``` ``` Test Session: https://www.internalfb.com/intern/testinfra/testrun/1407375146936031 RE: reSessionID-2224cf79-6a28-4762-ab7c-9875adb244dc 3.4 KiB▲, 0.0 B▼ Jobs completed: 57. Time elapsed: 3084.4s. Tests finished: Pass 55. Fail 0. Fatal 0. Skip 0. 0 builds failed ``` Differential Revision: D39665560 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85391 Approved by: https://github.com/d4l3k commit 60d98821c51be03acb827e0b6a81276eb3e3b1e1 Author: Richard Zou Date: Fri Sep 23 11:24:32 2022 -0700 Remove unnecessary skips in test_dispatch.py (#85557) The functorch dangling impls have been fixed, I hope CI passes Pull Request resolved: https://github.com/pytorch/pytorch/pull/85557 Approved by: https://github.com/ezyang commit b0eeffdf6f94b359270a1a6991da57994ba1d689 Author: Richard Zou Date: Fri Sep 23 10:55:51 2022 -0700 Fix printing regular tensors inside functorch transforms (#85556) Fixes https://github.com/pytorch/functorch/issues/1026 We need to disable functorch's stack-based dispatching mechanism inside the tensor printing. 
Otherwise, all operations that clean up the data of the Tensor for printing dispatch through the entire functorch stack and causes problems. Disabling stack-based dispatching and printing a functorch wrapped tensor is not a problem; we're still able to get the attributes on the wrapped tensor that we want. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/85556 Approved by: https://github.com/samdow commit 30fccd03a69dc0249037b84593a6f1116a44e297 Author: milesial Date: Mon Sep 26 15:15:57 2022 +0000 Make profiler table column widths changeable via arguments (#85203) Maximum widths for the name and shapes columns of profiler results tables are no longer hardcoded. If None is passed, it will use the maximum width of the data, without cropping. Fixes #70595 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85203 Approved by: https://github.com/ezyang commit b32020e937a03b54427761a5a5009f064c7e5bac Author: saltyJeff Date: Mon Sep 26 15:13:24 2022 +0000 make vulkan codegen windows-compatible (#85241) Using `:` to join together paths works on *nix only. This process uses cmake's `list(APPEND ...)` to make vulkan codegen work on windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85241 Approved by: https://github.com/ezyang commit ef95baf2eca65bb796a351900025b24939502e0a Author: Yukio Siraichi Date: Mon Sep 26 14:44:37 2022 +0000 Add `IListRefTag::Materialized` to `IListRefIterator` destructor. (#85467) Fixes #85404 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85467 Approved by: https://github.com/ezyang commit 9c036aa112b0a8fd9afb824d1fda058e2b66ba1d Author: Edward Z. Yang Date: Sun Sep 25 12:17:01 2022 -0700 Add SymInt to Scalar (#84958) This is by no means comprehensive, but adds initial support for SymInt as a Scalar. Things that don't work yet but need to: - for some reason `torch.add(tensor, sym_int)` got matched to the `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor` schema - `x + sym_int` failed bc we tried to turn `x` into a sym int: ``` "__radd__", [](c10::SymIntNode a, py::object b) -> c10::SymIntNode { auto snb = toSymIntNode(a, b); return a->add(snb); }) ``` - Many more things I'm sure Pull Request resolved: https://github.com/pytorch/pytorch/pull/84958 Approved by: https://github.com/ezyang commit 33404436aaf90ec4d0a39db0f0b9f7622db3d404 Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Sun Sep 25 22:23:21 2022 +0000 [doc] Add pin_memory and layout to new_{zeros, ones, full} (#85605) Fixes #84986 Besides adding `pin_memory` and `layout`, I have also updated the signature to reflect keyword only arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85605 Approved by: https://github.com/lezcano, https://github.com/SherlockNoMad commit 6e50f8e39540b3aeec7b32109d2755786e7d9a2d Author: Daniel Dale Date: Sun Sep 25 21:54:05 2022 +0000 Allow External Scripts (e.g. vscode) To Discover and Execute unittest Tests (#85584) Fixes #85578 Currently, many test modules customize test loading and discovery via the [load_tests protocol](https://docs.python.org/3/library/unittest.html#load-tests-protocol). The salient custom behavior included (introduced with https://github.com/pytorch/pytorch/pull/13250) is to verify that the script discovering or executing the test is the same script in which the test is defined. I believe this unnecessarily precludes the use of external tools to discover and execute tests (e.g. 
the vscode testing extension is widely used and IMHO quite convenient). This simple PR retains the current restriction by default while offering users the option to disable the aforementioned check if desired by setting an environmental variable. For example: 1. Setup a test env: ```bash ./tools/nightly.py checkout -b some_test_branch conda activate pytorch-deps conda install -c pytorch-nightly numpy expecttest mypy pytest hypothesis astunparse ninja pyyaml cmake cffi typing_extensions future six requests dataclasses -y ``` 2. The default test collection behavior discovers 5 matching tests (only tests within `test/jit/test_cuda.py` because it doesn't alter the default `load_test` behavior: ```bash python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \ ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \ ./test test_cuda.py | grep test_cuda | wc -l 5 ``` 3. Set the new env variable (in vscode, you would put it in the .env file) ```bash export PYTORCH_DISABLE_RUNNING_SCRIPT_CHK=1 ``` 4. All of the desired tests are now discovered and can be executed successfully! ```bash python ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/get_output_via_markers.py \ ~/.vscode-server/extensions/ms-python.python-2022.14.0/pythonFiles/testing_tools/unittest_discovery.py \ ./test test_cuda.py | grep test_cuda | wc -l 175 ``` ![image](https://user-images.githubusercontent.com/7462936/192068508-a292caaf-a1d2-4115-a557-02ac5da80b60.png) A potentially relevant note, the previous behavior of the custom `load_tests` flattened all the `TestSuite`s in each test module: https://github.com/pytorch/pytorch/blob/4c01c51266afae57c6d6952c84fff2802d9b2bb9/torch/testing/_internal/common_utils.py#L3260-L3262 I haven't been able to find any code that depends upon this behavior but I think retaining the `TestSuite` structure is preferable from a user perspective and likely safe (`TestSuite`s [can be executed](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=test%20runner%20to-,allow%20it%20to%20be%20run,-as%20any%20other) just like `TestCase`s and this is the structure [recommended](https://docs.python.org/3/library/unittest.html#load-tests-protocol:~:text=provides%20a%20mechanism%20for%20this%3A%20the%20test%20suite) by the standard python documentation). If necessary, I can change this PR to continue flattening each test module's `TestSuite`s. Since I expect external tools using the `unittest` `discover` API will usually assume discovered `TestSuite`s to retain their structure (e.g. like [vscode](https://github.com/microsoft/vscode-python/blob/192c3eabd8a065492f237196b052145364e68cb4/pythonFiles/visualstudio_py_testlauncher.py#L336-L349)) retaining the `testsuite` flattening behavior would likely require customization of those external tools for PyTorch though. Thanks to everyone in the community for the continued contributions to this incredibly valuable framework! 
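A hypothetical sketch of that gating (names and the exact check are illustrative, not the real `common_utils.py` implementation):

```python
import os
import sys
import unittest

def load_tests(loader, tests, pattern):
    # Skip the "must be run from the defining script" restriction when the
    # environment variable is set, so external runners (e.g. vscode) can discover tests.
    if os.getenv("PYTORCH_DISABLE_RUNNING_SCRIPT_CHK", "0") != "1":
        running = os.path.basename(sys.argv[0])
        defining = os.path.basename(__file__)
        assert running == defining, (
            f"{defining} should be run directly, not via {running}; set "
            "PYTORCH_DISABLE_RUNNING_SCRIPT_CHK=1 to allow external test runners"
        )
    suite = unittest.TestSuite()
    for group in tests:
        suite.addTest(group)  # keep each discovered sub-suite intact rather than flattening
    return suite
```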
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85584 Approved by: https://github.com/huydhn commit f0570354dda37c03c63377ada1ec889cf82ae9f6 Author: Abhishek Pathak Date: Sun Sep 25 19:03:58 2022 +0000 [MPS] Fix memory error in var (#85571) * Fix memory corruption + wrong handling of negative dims * Use vector for shape Pull Request resolved: https://github.com/pytorch/pytorch/pull/85571 Approved by: https://github.com/malfet commit e29f2483a6edde29b3e175df759def8a6701c7fc Author: John Detloff Date: Sun Sep 25 18:36:29 2022 +0000 Remove codesigning from github actions ios build workflow (#85597) Codesigning isn't necessary for simulator builds, and we're not running any device tests. More importantly, our dev certificate is expiring at the end of the month, and we don't have a replacement. As a result, we need to remove our job dependencies on it. This commit removes our references to it from our github CI, but a follow up PR will be needed to remove it from CircleCI workflows. Co-authored-by: Nikita Shulga Pull Request resolved: https://github.com/pytorch/pytorch/pull/85597 Approved by: https://github.com/malfet commit 1a0e1db763aa1152ac2126c895763a7c5ccb47fc Author: Taylor Robie Date: Thu Sep 22 15:00:17 2022 -0700 [Profiler] Compute unique IDs for Tensors (#85162) This PR is largely based on https://github.com/pytorch/pytorch/pull/80266, with one major difference. #80266 assigned each unique {TensorImpl, StorageImpl} pair a unique ID, whereas this PR seeks to cluster the implicit graph formed by the pairs into disjoint groups and assign an ID to each disjoint group. Differential Revision: [D39563859](https://our.internmc.facebook.com/intern/diff/D39563859/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85162 Approved by: https://github.com/chaekit commit e1f9125e61feaef81fd60dc97d02acb536a178be Author: wakananai Date: Sun Sep 25 17:10:45 2022 +0000 [doc] add argument default values in rot90 (#85610) Add argument default values in rot90. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85610 Approved by: https://github.com/lezcano commit 0d86dfccf8607eac845cfa12b3fefd217892efa4 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun Sep 25 16:23:21 2022 +0000 Bump protobuf from 3.20.1 to 3.20.2 in /.circleci/docker (#85572) Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.20.1 to 3.20.2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85572 Approved by: https://github.com/malfet commit bd5efbb7eefb9e8bbf997742fafe5e77a5d0991f Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Sun Sep 25 10:47:59 2022 +0000 [doc] add pin_memory argument to rand (#85221) Similar to #85123 cc - @mruberry @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85221 Approved by: https://github.com/mruberry commit a8add2b92f336a99d1d25648bb4512a9ab5157b5 Author: Sherlock Huang Date: Sat Sep 24 17:57:32 2022 +0000 Support matching Args for SubgraphMatcher (#85456) Subgraph matcher now handles the matching of non-Node arguments. Here are the 4 cases - pn is Node, gn is Node: this go through the regular _match_node() function - pn is Noed, gn is not a Node: this is a match if only pn is a placeholder op - pn is not Node, gn is Node: this is a no match case - pn is not a Node, gn is not a Node: this will go through the argument comparison. With this change ``` def target(x): return foo(x, 3) def pattern(x, y): return foo(x, y) ``` is a match Pull Request resolved: https://github.com/pytorch/pytorch/pull/85456 Approved by: https://github.com/jerryzh168 commit db40fbdee03920944219588464d38774ca0b3d05 Author: Howard Huang Date: Sat Sep 24 18:00:28 2022 +0000 Add deprecation warning to ProcessGroupRoundRobin (#85158) Trying to add any deprecation messages we anticipate we need before 1.13 branch cut. Add deprecation message to process group round robin. ```python import torch.distributed as dist if __name__ == "__main__": pg = dist._round_robin_process_groups( [ dist.ProcessGroupGloo(dist.TCPStore("localhost", 29500, 1, True), 0, 1) ] ) ``` gives message ``` W0916 16:19:38.367360 68031 ProcessGroupRoundRobin.cpp:12] Warning: ProcessGroupRoundRobin is deprecated and scheduled to be removed after this current release (1.13). Please file an issue on https://github.com/pytorch/pytorch/issues if there are any concerns or issues with this deprecation. (function ProcessGroupRoundRobin) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85158 Approved by: https://github.com/rohan-varma commit 41be45f0f4c6db2755be907db4f4a1665fe312e0 Author: PyTorch MergeBot Date: Sat Sep 24 12:35:21 2022 +0000 Revert "Create a quantized version ReLU function for CUDA (#85502)" This reverts commit 93a53ff4d92c883d87cc7aee35af719039b481a8. Reverted https://github.com/pytorch/pytorch/pull/85502 on behalf of https://github.com/janeyx99 due to Sorry, reverting as 10.2 builds on trunk broke due to this change, see https://hud.pytorch.org/pytorch/pytorch/commit/93a53ff4d92c883d87cc7aee35af719039b481a8 commit a531a604a093528721f970d922cd8e72ed9f0f8f Author: Wang, Eikan Date: Fri Sep 23 06:04:13 2022 +0000 Support BF16ImmPtr (#84041) - To support BF16 Immediate value by converting it to uint16. The behavior is as same as BF16 tensor - Enable BF16 test cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84041 Approved by: https://github.com/ZolotukhinM commit ffaff8896a2716cca5a29315124b1b63c475e80f Author: Fabio Rocha Date: Thu Sep 22 16:46:37 2022 +0000 Removed None arg check in test/test_decomp.py (#85402) Not sure why this check was necessary? Tests seem to run fine without it. 
There were definitely tests this was skipping before that it shouldn't, e.g., pretty much all of the tests for `torch.nn.functional.interpolate` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85402 Approved by: https://github.com/ezyang commit d3be4245bb416a676c4faf53ebfa3bf55ba32bbc Author: Wang, Eikan Date: Fri Sep 23 06:05:09 2022 +0000 Fix the issue that cat result would be incorrect for channels-last (#85076) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85076 Approved by: https://github.com/frank-wei commit 2efea21c52c0a9f0818f33d4520d699cca90cea3 Author: Sunita Nadampalli Date: Sat Sep 24 08:05:27 2022 +0000 [mkldnn_matmul] enable mkldnn matmul for aarch64 bf16 devices (#83671) (#85546) this PR enables mkldnn matmul for aarch64 bf16 devices for both bf16 as well as fp32 input. This PR is dependent on cpuinfo commit update PR: https://github.com/pytorch/pytorch/pull/83620 Issue: https://github.com/pytorch/pytorch/issues/83594 This is a reland of https://github.com/pytorch/pytorch/pull/83671 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85546 Approved by: https://github.com/kit1980 commit 93a53ff4d92c883d87cc7aee35af719039b481a8 Author: Feisi Fu Date: Sat Sep 24 05:59:13 2022 +0000 Create a quantized version ReLU function for CUDA (#85502) Summary: this is to allow the relu function to run on a quantized tensor on cuda. That is torch.relu(qa) for a quantized tensor qa on cuda. Test Plan: python test/test_quantization.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/85502 Approved by: https://github.com/dzdang commit e7e1cd945fe218fde228cedbdb1509f1750f70ea Author: Jane Xu Date: Sat Sep 24 03:47:33 2022 +0000 Add path optimize kwarg to einsum (#84890) - [x] add c++ support for an optimize path - [x] add python opt_einsum path passthrough - [x] add opt_einsum to OSS requirements, but a soft one - [x] show benchmark results here Additional things I've explored + their conclusions: - **Delaying the summing over dimensions** => added! - The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`). - **Caching contract_path based on equation and tensor sizes** => dropped :( - The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation. - adding opt_einsum package to OSS CI - adding it to internal CI - potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input. - Added more tests to CI **TL;DRs** - **torch.einsum with opt_einsum is a definite win for the production case**. - **torch.einsum with opt_einsum installed is consistently fast, but has an overhead** of needing to find the path. If the path is already found/optimal, it will be slightly slower. - The einsum overhead decreases for bigger dimensions. 
- **torch.einsum without opt_einsum installed is comparable to before this commit**, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end. - For many of the randomly generated cases, the dimensions were too similar and small, so an optimal order wasn't much better than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge). - **torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case**. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input. - **torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case**. This is because numpy doesn't take advantage of GPUs. The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) from GPU. Sorry it's flipped 😛 . Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context): (image) Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191) (image) Open below to see old + not super relevant benchmarking results:
Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later): image Benchmark results with the code on this PR (on my x86 mac): For the CPU internal use case -- ![image](https://user-images.githubusercontent.com/31798555/190801376-6f591b00-cebd-4ca7-bb23-ae8f17f1634e.png) For the general use case -- It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum. ![image](https://user-images.githubusercontent.com/31798555/190811730-fbb6797d-af59-4f5a-92da-ba4103372014.png)
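To make the opt_einsum integration concrete, a minimal usage sketch (assuming the optional `opt_einsum` package is installed so `torch.einsum` can pick a contraction order; the multi-operand equation is illustrative, not one of the benchmark cases above):

```python
import torch

# Three-operand contraction where the contraction order matters
# (the "bxy,xyi,xyj->bij" pattern mentioned above).
b, x, y, i, j = 8, 64, 64, 32, 32
A = torch.randn(b, x, y)
B = torch.randn(x, y, i)
C = torch.randn(x, y, j)

# With opt_einsum installed, torch.einsum computes an optimized
# contraction path before dispatching to matmul/bmm kernels;
# without it, operands are contracted left to right.
out = torch.einsum("bxy,xyi,xyj->bij", A, B, C)
print(out.shape)  # torch.Size([8, 32, 32])
```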
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890 Approved by: https://github.com/albanD, https://github.com/soulitzer commit e78e00f4d98c4376e298902db8aae7e7057e86df Author: PyTorch MergeBot Date: Sat Sep 24 02:31:59 2022 +0000 [vision hash update] update the pinned vision hash (#85581) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85581 Approved by: https://github.com/pytorchbot commit 2b6d2cad29fc1652f80199d647306b9c7c841ca9 Author: Sergii Dymchenko Date: Sat Sep 24 01:39:19 2022 +0000 Remove @saketh-are from CODEOWNERS (#85521) saketh-are no longer has write access to the repository. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85521 Approved by: https://github.com/huydhn commit 4d3acf12034132c422606d175ca535359123023c Author: Huy Do Date: Sat Sep 24 01:17:04 2022 +0000 Enable pytest-shard for functorch (#85321) This extends https://github.com/pytorch/pytorch/pull/84961 to support functorch tests with pytest-shard. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85321 Approved by: https://github.com/samdow, https://github.com/clee2000 commit 70b27e91c7160bdf016511fc67940f5b89f5a30f Author: Sourav Mandal Date: Sat Sep 24 01:02:40 2022 +0000 [pytorch] Skip linalg tests that fail on Meta infra (#85577) Summary: test_inverse_errors_large and test_linalg_solve_triangular fail for dtype=float64 when invoked on GPUs on Meta internal testing infra. Skip in Meta internal testing. Test Plan: (observe tests skipped on Meta internal infra) Reviewed By: mikekgfb Differential Revision: D39785331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85577 Approved by: https://github.com/malfet commit a554a546b382949b4ba8518d2a594956a6ed3fbf Author: Ke Wen Date: Sat Sep 24 01:02:35 2022 +0000 Update PyTorch Distributed CODEOWNERS (#85560) Add Ke Wen to PyTorch Distributed modules Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85560 Approved by: https://github.com/H-Huang commit 7d8ee38a5c6b9d852b97605cbfdded5183b6524a Author: Mike Iovine Date: Sat Sep 24 01:01:34 2022 +0000 [Static Runtime] Fix prim::If tuple corner case (#85446) Summary: We currently assume that a tuple output implies that the prim::If node returns multiple unpacked outputs, but this is not guaranteed to be the case. 
Add some logic to return the wrapped tuple if necessary Test Plan: New unit test Differential Revision: D39712050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85446 Approved by: https://github.com/tenpercent commit 18685b7fe1e33d9102526b515df855fec0e2c445 Author: supriyar Date: Thu Sep 22 13:31:19 2022 -0700 Update PT maintainers list for AO (#85125) Summary: Update the list based on recommendation in https://github.com/pytorch/pytorch/blob/master/docs/source/community/build_ci_governance.rst Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D39745619](https://our.internmc.facebook.com/intern/diff/D39745619) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85125 Approved by: https://github.com/gchanan commit a38e43e936c725cd5dd59617b5806c31f13eab0c Author: Alex Beloi Date: Fri Sep 23 23:36:57 2022 +0000 [perf][1/5] Replace IValue::toString()->string() with IValue::toStringRef() (#85437) Summary: `IValue::toString()` creates a `new c10::intrusive_ptr` (like `std::shared_ptr`) and `->string()` immediately accesses it, creating an atomic reference increment/decrement. We can skip both of these operations by calling `IValue::toStringRef()`. Test Plan: CI Reviewed By: jaybean-dev Differential Revision: D39605242 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85437 Approved by: https://github.com/jfix71 commit ea81138bd6b688554ad307e370bebeffd264f1b7 Author: dzdang Date: Thu Sep 22 21:27:32 2022 -0400 [quant][improvement][better-engineering] Refactored get_supported_device_types into common_quantization.py (#79607) Summary: Both test_quantized_tensor.py and test_quantize_fx.py had the same get_supported_device_types function defined. This PR refactors it into the common_quantization.py file for common usage Test Plan: ``` python test/test_quantization.py ``` Differential Revision: [D37173692](https://our.internmc.facebook.com/intern/diff/D37173692) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79607 Approved by: https://github.com/jerryzh168 commit 12ae3bea437e760d4fede3f1c50c2c81af3f687c Author: nikitaved Date: Fri Sep 23 23:31:17 2022 +0000 Faster mul(sparse, sparse) with broadcasting in dense dims. (#85336) This is a combo PR of https://github.com/pytorch/pytorch/pull/84929 and ~https://github.com/pytorch/pytorch/pull/83428~. Preliminary benchmarks (square matrices of shape (n, n)).
Script
```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
Gather results
```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "master"
]

device = 'cuda'
timers = []
for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
CUDA ``` [------------------------------------------------- coo.mul -------------------------------------------------] | PR: mul, device: cuda | master: mul, device: cuda 24 threads: ------------------------------------------------------------------------------------------------- n=10000, nnz=100, coalesce=((True, True)) | 95 | 91 n=10000, nnz=100, coalesce=((True, False)) | 87 | 242 n=10000, nnz=100, coalesce=((False, True)) | 87 | 226 n=10000, nnz=100, coalesce=((False, False)) | 130 | 371 n=100000, nnz=1000, coalesce=((True, True)) | 100 | 521 n=100000, nnz=1000, coalesce=((True, False)) | 90 | 649 n=100000, nnz=1000, coalesce=((False, True)) | 100 | 659 n=100000, nnz=1000, coalesce=((False, False)) | 200 | 781 n=1000000, nnz=10000, coalesce=((True, True)) | 100 | 4861 n=1000000, nnz=10000, coalesce=((True, False)) | 100 | 5012 n=1000000, nnz=10000, coalesce=((False, True)) | 98 | 5010 n=1000000, nnz=10000, coalesce=((False, False)) | 384 | 5174 n=10, nnz=100, coalesce=((True, True)) | 100 | 79 n=10, nnz=100, coalesce=((True, False)) | 100 | 221 n=10, nnz=100, coalesce=((False, True)) | 100 | 221 n=10, nnz=100, coalesce=((False, False)) | 100 | 350 n=10, nnz=1000, coalesce=((True, True)) | 100 | 100 n=10, nnz=1000, coalesce=((True, False)) | 100 | 240 n=10, nnz=1000, coalesce=((False, True)) | 100 | 254 n=10, nnz=1000, coalesce=((False, False)) | 100 | 392 n=10, nnz=10000, coalesce=((True, True)) | 100 | 110 n=10, nnz=10000, coalesce=((True, False)) | 110 | 286 n=10, nnz=10000, coalesce=((False, True)) | 110 | 286 n=10, nnz=10000, coalesce=((False, False)) | 271 | 455 n=100, nnz=1000, coalesce=((True, True)) | 110 | 851 n=100, nnz=1000, coalesce=((True, False)) | 110 | 1000 n=100, nnz=1000, coalesce=((False, True)) | 110 | 990 n=100, nnz=1000, coalesce=((False, False)) | 140 | 1124 n=100, nnz=10000, coalesce=((True, True)) | 110 | 5137 n=100, nnz=10000, coalesce=((True, False)) | 110 | 5391 n=100, nnz=10000, coalesce=((False, True)) | 100 | 5405 n=100, nnz=10000, coalesce=((False, False)) | 249 | 5539 n=1000, nnz=10000, coalesce=((True, True)) | 100 | 8598 n=1000, nnz=10000, coalesce=((True, False)) | 100 | 8800 n=1000, nnz=10000, coalesce=((False, True)) | 100 | 8782 n=1000, nnz=10000, coalesce=((False, False)) | 255 | 8956 n=1000, nnz=100000, coalesce=((True, True)) | 120 | 84500 n=1000, nnz=100000, coalesce=((True, False)) | 200 | 88560 n=1000, nnz=100000, coalesce=((False, True)) | 160 | 89000 n=1000, nnz=100000, coalesce=((False, False)) | 373 | 89000 n=1000, nnz=1000000, coalesce=((True, True)) | 312 | 606400 n=1000, nnz=1000000, coalesce=((True, False)) | 1340 | 609200 n=1000, nnz=1000000, coalesce=((False, True)) | 1340 | 609100 n=1000, nnz=1000000, coalesce=((False, False)) | 4408 | 611400 Times are in microseconds (us). ```
CPU ``` [------------------------------------------------ coo.mul ------------------------------------------------] | PR: mul, device: cpu | master: mul, device: cpu 24 threads: ----------------------------------------------------------------------------------------------- n=10000, nnz=100, coalesce=((True, True)) | 8 | 8 n=10000, nnz=100, coalesce=((True, False)) | 32 | 34 n=10000, nnz=100, coalesce=((False, True)) | 32 | 34 n=10000, nnz=100, coalesce=((False, False)) | 41 | 56 n=100000, nnz=1000, coalesce=((True, True)) | 24 | 24 n=100000, nnz=1000, coalesce=((True, False)) | 90 | 100 n=100000, nnz=1000, coalesce=((False, True)) | 87 | 100 n=100000, nnz=1000, coalesce=((False, False)) | 231 | 255 n=1000000, nnz=10000, coalesce=((True, True)) | 190 | 200 n=1000000, nnz=10000, coalesce=((True, False)) | 908 | 2023 n=1000000, nnz=10000, coalesce=((False, True)) | 800 | 2036 n=1000000, nnz=10000, coalesce=((False, False)) | 3684 | 3989 n=10, nnz=100, coalesce=((True, True)) | 8 | 7 n=10, nnz=100, coalesce=((True, False)) | 34 | 30 n=10, nnz=100, coalesce=((False, True)) | 33 | 30 n=10, nnz=100, coalesce=((False, False)) | 44 | 50 n=10, nnz=1000, coalesce=((True, True)) | 8 | 7 n=10, nnz=1000, coalesce=((True, False)) | 100 | 100 n=10, nnz=1000, coalesce=((False, True)) | 130 | 100 n=10, nnz=1000, coalesce=((False, False)) | 746 | 210 n=10, nnz=10000, coalesce=((True, True)) | 8 | 7 n=10, nnz=10000, coalesce=((True, False)) | 1000 | 1500 n=10, nnz=10000, coalesce=((False, True)) | 1000 | 1510 n=10, nnz=10000, coalesce=((False, False)) | 3063 | 2457 n=100, nnz=1000, coalesce=((True, True)) | 25 | 25 n=100, nnz=1000, coalesce=((True, False)) | 180 | 130 n=100, nnz=1000, coalesce=((False, True)) | 200 | 130 n=100, nnz=1000, coalesce=((False, False)) | 271 | 255 n=100, nnz=10000, coalesce=((True, True)) | 100 | 100 n=100, nnz=10000, coalesce=((True, False)) | 2444 | 2290 n=100, nnz=10000, coalesce=((False, True)) | 2455 | 2357 n=100, nnz=10000, coalesce=((False, False)) | 5316 | 3783 n=1000, nnz=10000, coalesce=((True, True)) | 204 | 211 n=1000, nnz=10000, coalesce=((True, False)) | 2457 | 2480 n=1000, nnz=10000, coalesce=((False, True)) | 2448 | 2539 n=1000, nnz=10000, coalesce=((False, False)) | 3665 | 4801 n=1000, nnz=100000, coalesce=((True, True)) | 2293 | 2374 n=1000, nnz=100000, coalesce=((True, False)) | 9000 | 24620 n=1000, nnz=100000, coalesce=((False, True)) | 8000 | 25080 n=1000, nnz=100000, coalesce=((False, False)) | 26500 | 47650 n=1000, nnz=1000000, coalesce=((True, True)) | 10000 | 13000 n=1000, nnz=1000000, coalesce=((True, False)) | 80000 | 362200 n=1000, nnz=1000000, coalesce=((False, True)) | 78050 | 392600 n=1000, nnz=1000000, coalesce=((False, False)) | 312100 | 766900 Times are in microseconds (us). ```
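For reference, a minimal sketch of the operation being benchmarked above (hypothetical small sizes; the PR optimizes the elementwise `x * y` path for sparse COO inputs, including uncoalesced ones):

```python
import torch

torch.manual_seed(13)
n, nnz = 1000, 100

def gen_sparse(coalesce):
    idx = torch.randint(0, n, (2, nnz))
    val = torch.randn(nnz)
    t = torch.sparse_coo_tensor(idx, val, size=(n, n))
    return t.coalesce() if coalesce else t

x = gen_sparse(coalesce=True)
y = gen_sparse(coalesce=False)  # uncoalesced input, the slow path before this PR

z = x * y  # elementwise sparse * sparse multiply
print(z.is_sparse, z._nnz())
```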
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85336 Approved by: https://github.com/cpuhrsch commit 40d3e55b7d20d03d2da5c94d8f6c12a8c64fdbfa Author: Huy Do Date: Fri Sep 23 23:29:12 2022 +0000 Temporary fix to skip NVIDIA driver installation from RHEL repo (#85569) This is a temporary fix until torchrec and FBGEMM are updated to use PyTorch NVIDIA installation script instead of using the latest driver from RHEL repo. It might take a day or so to finish updating the 2 repos, so I want to have this in place to avoid any issue with NVIDIA driver till then. The driver from RHEL repo `515.65.01` is even newer than what we are using in PyTorch CI `515.57`. So everything should just work with both of them Pull Request resolved: https://github.com/pytorch/pytorch/pull/85569 Approved by: https://github.com/clee2000 commit 4befe45084ace174e7f24a4d2db5ec372f633375 Author: Renfei Chen Date: Fri Sep 23 23:21:54 2022 +0000 [FX] Add one option to maintain the FX graph execution order after splitting_module (#85188) Summary: {F770932209} Given the original execution order and the node dependency relationship (note that the same dependency order could generate multiple execution order, which refers to “Topological Order”), after reunion, we could find the new execution order of the new GraphModule is different from the original one which is not what we want. For example, let’s assume that NewLeaf_1 is EmbeddingLookup (Calling EmbeddingLookup is awaitable, we will keep executing the following nodes rather than waiting for the result until we have to know the lookup result), NewLeaf_4 is the node where we HAVE to get the lookup result to interact with the NewLeaf_3. So NewLeaf_1 will launch a lookup kernel and all2all communication stream to distribute the result to all ranks. In the meantime, we want to keep executing NewLeaf_2 and NewLeaf_3 to avoid meaningless waiting. However, given the new execution order, we have to wait for the lookup kernel and all2all communication to be finished since the next node NewLeaf_4 needs the result, until then we can execute NewLeaf_2, etc. It cannot leverage the advantage of parallel computation and communication stream and will hurt the QPS a lot. So while constructing the GraphModule, we have to change from the topological order to the original order Test Plan: Unit test Not sure how to add tests in FX as there's no TARGETS, so I added in the TorchRec folder Differential Revision: D39567314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85188 Approved by: https://github.com/SherlockNoMad commit 4f5f2c1a9e09ff09947da240ac3209e9dd13a5a3 Author: George Gensure Date: Fri Sep 23 23:07:01 2022 +0000 Add torch.nested to ovrsource (#85384) Summary: Prevent a build break for ovrsource dependency. Stacked changes will help to prevent further regressions in this target. Test Plan: Build //arvr/projects/codec_avatar/pylab/examples/demos/pica:pica. Without this change, it will fail on linking torch with an undefined symbol. With it, the build will proceed. Differential Revision: D39669887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85384 Approved by: https://github.com/kit1980 commit 4c01c51266afae57c6d6952c84fff2802d9b2bb9 Author: Edward Z. Yang Date: Fri Sep 23 12:22:13 2022 -0700 Symintifying slice ops (#85196) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85196 Approved by: https://github.com/ezyang commit 604487f239b5d9312106de3e336ea227ea946993 Author: Edward Z. 
Yang Date: Fri Sep 23 12:20:12 2022 -0700 OpInfo for Slice (#85554) This is based on wconstab tests from #84680 Technically, slice is covered by the __getitem__ opinfo, but it is easier to debug/test on a more narrow internal function that only uses this functionality and not other advanced indexing stuff. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85554 Approved by: https://github.com/mruberry, https://github.com/wconstab commit bc6dc8d271d6cf4d0ae381077f59fc7bb7cf024d Author: Kshiteej K Date: Fri Sep 23 21:40:07 2022 +0000 [fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330) Ref: #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85330 Approved by: https://github.com/zou3519 commit 2e81710366279af67f3e05fe00011a99f962e61f Author: andrewor14 Date: Fri Sep 23 06:54:18 2022 -0700 [Quant] Add initial Executorch BackendConfig (#85527) Summary: This commit adds the initial BackendConfig for backends PyTorch lowers to through the Executorch stack. This initial version is only intended to cover the following set of ops: quantized::linear_dynamic, quantized::add, quantized::batch_norm2d, quantized::conv2d.new, quantized::linear, quantized::conv2d_relu.new, aten::relu_, aten::_adaptive_avg_pool2d, aten::_reshape_alias_copy, aten::squeeze.dim, aten::permute For now, the `BackendPatternConfig` for each of these ops is the same as the ones for the corresponding ops in the FBGEMM `BackendConfig`, though this may change in the future. Reviewers: jerryzh168, vkuzo Subscribers: jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85527 Approved by: https://github.com/jerryzh168 commit a8074a1a0ba6ec4765e22a26ea37e7e4ac5f3f99 Author: Rohan Varma Date: Thu Sep 22 11:57:41 2022 -0700 [Checkpoint] rename apply_ac_wrapper (#85449) Per title Differential Revision: [D39714855](https://our.internmc.facebook.com/intern/diff/D39714855/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85449 Approved by: https://github.com/awgu commit cc64f64670544f41f4307a44592fc0e4699c3747 Author: Rohan Varma Date: Thu Sep 22 11:57:41 2022 -0700 [Docs] Minor fix to apply_ac doc (#85448) Per title Created from CodeHub with https://fburl.com/edit-in-codehub Differential Revision: [D39714530](https://our.internmc.facebook.com/intern/diff/D39714530/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85448 Approved by: https://github.com/awgu commit a4c94f0739158d2f7fd27f2be59b77f33027e1c7 Author: Sean Ross-Ross Date: Thu Sep 22 14:27:43 2022 -0500 Fix cuda issue with sparse.sampled_addmm (#85194) fixes https://github.com/pytorch/pytorch/issues/85169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85194 Approved by: https://github.com/amjames, https://github.com/nikitaved commit 49e10c15981b21fc04a10a74ea506b5cbcaf7074 Author: Catherine Lee Date: Fri Sep 23 20:45:20 2022 +0000 [ci] test_ops in parallel, ci tests log to file (#85528) part one of splitting up https://github.com/pytorch/pytorch/pull/84961 into (probably 2) parts contains * logging to file * testing test_ops in parallel Pull Request resolved: https://github.com/pytorch/pytorch/pull/85528 Approved by: https://github.com/huydhn commit 0e582fbfcc8cab66c0265d3fe326e3dc505855d1 Author: jjsjann123 Date: Wed Sep 21 15:03:10 2022 -0700 [NVFuser] Upstream push 0907 (#84626) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Codegen changes include: - codegen improvement: i. 
improved view support on pointwise and transpose scheduler ii. grouped grid welford added for better outer-norm grid persistence in normalization - misc: i. new composite ops added: variance_mean , arange, ii. fixes misaligned address for transpose scheduler iii. refactor on separation of compilation API from execution API to prepare us for async compilation iv. double type support on expression evaluator v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN Commits that's in this PR from the devel branch: ``` 89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939) b2fd01ea9346712c6d6f623ca6addbc4888d008e arange support (#1933) 56c00fd3922dad7dfc57351ad7d780f0f2f8e4ed Double support on all expression evaluators (#1937) 371f28223e57fe3f6b5e50a0a45177e6a5c0785c Improve trivial reduction merge support (#1931) 1d0c26790e5647920b40d419d26815bbe310b3a6 Test `rand` in a fusion with zero tensor input (#1932) 0dab160fb2177d178eef3148c6a529e0855009e9 Fix softmax bwd sizes. (#1890) ef98f360f6d3e3e1cc662ecb65202d88150f128d Fix a bug (#1936) 63132a0c56508c550084b07fb76a3df865102d00 Propagate permissive mapping information into indexing pass (#1929) b4ac2c88d78078ee4d8b21c4fc51645b5710a282 Map IterationDomains through view operations. (#1919) c0a187a7619d7cf9dc920294e15461791e8d6d4d do not use deprecated functions (#1935) 88de85e758c5e4afb7b6e746573c0d9a53b4cea7 Upstream cherry pick fixes 0811 (#1934) b247dcf7c57dc6ac3f7a799b0a6beb7770536a74 Separate kernel compilation API from kernel execution API (#1914) b34e3b93ee1a8030730c14af3995dd95665af07d Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924) 14a53e6707f43bf760494c238a46386d69830822 Nullary RNGOp (#1892) 3c3c89e638f5172cafb0761f22bacd1fd695eec3 Misc fixes/tuning for transpose scheduler (#1912) 20cf109c8b44d48f61977e35bae94368985144ac Grouped grid welford (#1921) 6cf7eb024c9e53c358cbe56597e117bad56efefd Transpose scheduler small dim sizes better support (#1910) 9341ea9a5bf42f9b14ccad0c94edbc79fc5bb552 Disabled ViewPersistentShmoo sizes that results in NAN (#1922) 057237f66deeea816bb943d802a97c1b7e4414ab Fix CUDA driver error: misaligned address for transpose scheduler (#1918) 3fb3d80339e4f794767a53eb8fdd61e64cf404a2 Add variance_mean function using Welford (#1907) 98febf6aa3b8c6fe4fdfb2864cda9e5d30089262 Remove DisableOption::UnrollWithRng (#1913) ee8ef33a5591b534cf587d347af11e48ba7a15d4 Minor fix for the debug interface of using PTX directly (#1917) 6e8f953351f9dabfd1f991d8431cecb6c2ce684d Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916) 5eefa9a72385f6a4b145680a9dcc52d7e8293763 dopt is only available since nvrtc 11.7 (#1915) 2ec8fc711eafc72451eebf0f5e2a98a38bf3f6ef Kill computeAtBetween (#1911) d0d106a1d9af118d71673173674e875be35d259d Improve view support on pointwise and transpose scheduler (#1906) e71e1ecefe67219846070590bbed54bbc7416b79 Fix name clash of RNG with shared memory (#1904) 3381793a253689abf224febc73fd3fe2a0dbc921 Fix mutator and sameAs for expanded IterDomain (#1902) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84626 Approved by: https://github.com/malfet commit 52a8be523ce682ce26dd793a4154b668b1f37703 Author: atalman Date: Fri Sep 23 20:28:36 2022 +0000 Adjust retry time for conda upload (#85545) Adjusting retry times for conda upload. 
Refer to this failure: https://github.com/pytorch/pytorch/actions/runs/3110932965/jobs/5043384691

```
Error: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
+ sleep 1
......
Error: ('file osx-arm64/pytorch-1.13.0.dev20220923-py3.9_0.tar.bz2 already exists or being uploaded for package pytorch version 1.13.0.dev20220923. if your previous upload failed, please wait 2 minutes before trying again', 409)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85545 Approved by: https://github.com/datumbox commit 3007093007670c7fcf7ba54b07afadcfe2241d86 Author: atalman Date: Fri Sep 23 20:21:36 2022 +0000 Add new cudnn build for linux only (#85549) Add new cudnn build for linux only. New pypi packages are available only for linux: https://pypi.org/project/nvidia-cudnn-cu11/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/85549 Approved by: https://github.com/malfet commit d83ca9ebff09225d90a5fbae3edd533ebf1cd1aa Author: Nikita Shulga Date: Fri Sep 23 10:14:37 2022 -0700 [CI] Make `cuda-arch-list` a parameter to linux-build (#85523) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85523 Approved by: https://github.com/huydhn commit 108b25db25ef180bfbfaed2347c9b99103aa68ef Author: Edward Z. Yang Date: Fri Sep 23 13:44:42 2022 -0400 Let antoniojkim snoop all symbolic shapes PRs (#85555) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85555 Approved by: https://github.com/qihqi, https://github.com/wconstab commit 4dfaca6fb14d27fb17e498fd39861b26267ab06d Author: Taylor Robie Date: Thu Sep 22 15:00:15 2022 -0700 [Profiler] Clean up Tensor representation (#85161) I want to start using `TensorMetadata` elsewhere in profiler so we have a common representation of Tensor. The main changes in this PR are: 1) Replace raw pointers with strong typedefs and create a custom type caster to handle moving them to Python. 2) Add a `device()` method to handle reassembling type and index. Differential Revision: [D39563965](https://our.internmc.facebook.com/intern/diff/D39563965/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85161 Approved by: https://github.com/chaekit commit e296a82f239b431d97204f0c2891f4c49bef8f6b Author: Taylor Robie Date: Thu Sep 22 15:00:13 2022 -0700 [Profiler] Capture storage data pointer (#84276) This is approximately a re-land of the storage half of https://github.com/pytorch/pytorch/pull/80266 I've directly represented and exposed storage impl rather than using it as a first guess for an ID. (Mostly for testing, which happened to save me as I was initially recording the wrong thing.) Differential Revision: [D39136546](https://our.internmc.facebook.com/intern/diff/D39136546/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84276 Approved by: https://github.com/slgong-fb commit 4615d1bcfa0915a992e7445086ba559ca7441607 Author: Masaki Kozuki Date: Fri Sep 23 18:56:00 2022 +0000 resubmit: [mta] APEX style Fused Adam (#81705) (#85507) This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.
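A minimal usage sketch, assuming the fused optimizer is reached through the `fused=True` flag on `torch.optim.Adam` (as in the original #81705) and that a CUDA device is available; because of `_step_supports_amp_scaling`, `GradScaler.step` can defer gradient unscaling to the fused kernel:

```python
import torch

model = torch.nn.Linear(128, 128).cuda()
# Assumption: the fused implementation is selected via fused=True.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 128, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()
scaler.step(opt)   # unscaling happens inside the fused CUDA kernel
scaler.update()
```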
related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705 Approved by: https://github.com/ngimel cc @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/85507 Approved by: https://github.com/ngimel commit f1a6f32b72b7c2b73277f89bbf7e7459a400d80a Author: Erjia Guan Date: Fri Sep 23 18:52:52 2022 +0000 [DataLoader] Make distributed lazily initialized & share seed via PG (#85279) Fixes #84492 https://github.com/pytorch/data/issues/772 - Move the logic of distributed sharding from the constructor of DataLoader to the constructor of DataLoaderIterator. This would prevent the Error caused by lazy distributed process initialization - Replace distributed store by process group (`gloo`) to share the random seed because `mpi` backend doesn't provide distributed store. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85279 Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin commit e3766e9855d4fcf8d7d9b23f5f1f75ded51d8b9e Author: George Qi Date: Fri Sep 23 06:00:53 2022 +0000 [maskedtensor] move __torch_function/dispatch__ functions to a map (#85529) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85529 Approved by: https://github.com/bhosmer commit 7893748900c405ea4ba2a1eb525824a43ad8009a Author: Zain Rizvi Date: Fri Sep 23 18:23:34 2022 +0000 Add instructions on how to merge a PR (#85280) Adding basic instructions for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/85280 Approved by: https://github.com/malfet, https://github.com/huydhn, https://github.com/janeyx99 commit c7b17d7eb165b2311aac7ed6a9618d2136787f48 Author: Kevin Stephano Date: Fri Sep 23 18:03:35 2022 +0000 Add nvprims `rand_like` support for Dropout (#85077) NM Pull Request resolved: https://github.com/pytorch/pytorch/pull/85077 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry commit 1e4c88518c5cf9b41b8a652ae2ed1eef6ce6f000 Author: Elias Ellison Date: Thu Sep 22 21:57:25 2022 +0000 Fake tensor refactorings (#85498) The only semantic change is moving the error checking before the dynamic shapes handling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85498 Approved by: https://github.com/ezyang commit d10de31cc833f1defa2cb64fef3c27f657a3dee2 Author: PyTorch MergeBot Date: Fri Sep 23 17:21:43 2022 +0000 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)" This reverts commit 78afa0cf0ca04ce437ca4b519f07c04e73fe0d4c. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk https://hud.pytorch.org/pytorch/pytorch/commit/78afa0cf0ca04ce437ca4b519f07c04e73fe0d4c commit eb570ab7d0fd5df88fccf90cdadc581c722d20ef Author: PyTorch MergeBot Date: Fri Sep 23 17:19:06 2022 +0000 Revert "add amp tests (#85434)" This reverts commit c2f4bbe66918d167401ff5804c6b2d24fc6bda40. 
Reverted https://github.com/pytorch/pytorch/pull/85434 on behalf of https://github.com/clee2000 due to broke rocm and slow tests on trunk https://hud.pytorch.org/pytorch/pytorch/commit/c2f4bbe66918d167401ff5804c6b2d24fc6bda40 commit 3b195fd33e5149daac89fff5e9f9336cdafe004d Author: PyTorch MergeBot Date: Fri Sep 23 17:13:35 2022 +0000 Revert "Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471)" This reverts commit 1e92eb806865602be6d9c02a311108c4f88869b2. Reverted https://github.com/pytorch/pytorch/pull/85471 on behalf of https://github.com/clee2000 due to stacked prs https://github.com/pytorch/pytorch/pull/85417 and https://github.com/pytorch/pytorch/pull/85434 broke trunk, reverting this so I can revert the others commit f371b5267deb93caa4413482a5c942d9d14a8c2c Author: Richard Zou Date: Thu Sep 22 12:08:04 2022 -0700 Made max_pool2d_with_indices_backward_cuda contiguify `indices` (#85493) Currently, max_pool2d_with_indices_backward(grad_output, self, ..., indices) (on cuda) assumes that indices has the same suggested memory format as self. This is indeed always true in regular PyTorch: the max_pool2d_with_indices forward pass returns indices with the same suggested memory format as self. However, we'd like to make an argument that always contiguifying indices is good for consistency, has negligible added cost, and is more robust (for Tensor Subclass authors): - the max_pool2d_with_indices_backward implementation for CPU always contiguifies `indices`. Ditto for the max_pool3d_with_indices_backward implementation. - Calling .contiguous() has almost no cost (compared to before) because there is a fast-path that checks the cached memory_format on the TensorImpl. - functorch has trouble writing a batching rule for `max_pool2d_with_indices_backward`. Having it accept `indices` with arbitrary strides helps make it so that vmap doesn't need to special case the batching rule for the strides of `indices`. Test Plan: - Not sure if it's worth writing a separate test case - this PR fixes one of functorch's test cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85493 Approved by: https://github.com/ezyang commit ea72a0991c3422d8f314acdf8b911de42a6b4c1e Author: Erjia Guan Date: Fri Sep 23 16:21:25 2022 +0000 Add support to traverse all python collection objects (#84079) Fixes https://github.com/pytorch/data/issues/752 This PR makes the `traverse` function support more collection data structures from Python. The `getstate_hook` will be invoked after the custom `__getstate__` function. This would guarantee that the `traverse` function will work as long as the `DataPipe` is working properly with multiprocessing.
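As an illustration of what this enables, a small sketch (assuming the helper lives at `torch.utils.data.graph.traverse`, as it did around this release; the `ZipDict` pipe is a hypothetical example that stores its children inside a plain dict):

```python
from torch.utils.data import IterDataPipe
from torch.utils.data.datapipes.iter import IterableWrapper
from torch.utils.data.graph import traverse

class ZipDict(IterDataPipe):
    """Hypothetical pipe whose child pipes live in a Python dict."""
    def __init__(self, **pipes):
        self.pipes = dict(pipes)

    def __iter__(self):
        yield from zip(*self.pipes.values())

dp = ZipDict(a=IterableWrapper(range(3)), b=IterableWrapper("xyz"))
# With this change, traverse() can walk into the dict and find both
# IterableWrapper instances instead of stopping at ZipDict.
print(traverse(dp))
```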
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84079 Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin commit 1e92eb806865602be6d9c02a311108c4f88869b2 Author: Elias Ellison Date: Fri Sep 23 13:20:15 2022 +0000 Turn on aliasing tests for fake backwards, Fix Batch norm running mean/var decomp aliasing (#85471) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85471 Approved by: https://github.com/ezyang commit c2f4bbe66918d167401ff5804c6b2d24fc6bda40 Author: Elias Ellison Date: Fri Sep 23 13:13:20 2022 +0000 add amp tests (#85434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85434 Approved by: https://github.com/ngimel commit 78afa0cf0ca04ce437ca4b519f07c04e73fe0d4c Author: Elias Ellison Date: Fri Sep 23 13:13:19 2022 +0000 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang commit 2e883d4655ce4ad85b1a2af5cf9908c0032549c5 Author: PyTorch MergeBot Date: Fri Sep 23 14:09:29 2022 +0000 Revert "[mkldnn_matmul] enable mkldnn matmul for aarch64 bf16 devices (#83671)" This reverts commit f21e77d9a64b39bb9feb9946d912b7b4952430d6. Reverted https://github.com/pytorch/pytorch/pull/83671 on behalf of https://github.com/dagitses due to breaking meta internal builds where cpuinfo_has_arm_bf16 is not defined commit 034f2b4d231c2ca6fee889198b3a985e48514aa8 Author: andrewor14 Date: Thu Sep 22 16:31:56 2022 -0700 [Quant][fx] Enable FX static quantization for LSTM (#85068) **Summary:** This commit enables the custom module LSTM path for FX graph mode static quantization. This has the same flow as eager mode, which was already previously supported: ``` torch.nn.LSTM | (prepare_fx) v torch.ao.nn.quantizable.LSTM | (convert_fx) v torch.ao.nn.quantized.LSTM ``` The main reason why custom module LSTM is not supported in FX graph mode quantization today is because its inputs and outputs are nested tuples, and existing constructs such as observers, "quantize" nodes, and "dequantize" nodes do not understand how to handle complex structures. Note that the approach taken in this commit is only intended to be a short-term solution highly tailored to the input and output formats of custom module LSTM. In the future, for the longer-term solution, we should design a more general QConfig that allows users to specify complex input and output formats, and enable FX graph mode quantization to understand arbitrary nested structures and automatically infer how to transform the graph accordingly. **Context:** Today, in FX graph mode static quantization, custom modules are assumed to have quantized inputs and quantized outputs, with the exact dtypes derived from the associated QConfig (default quint8). Since custom modules are currently not handled through the reference model flow, their observer replacement logic are a little different from normal operators: ``` input -> custom_module -> output input -> obs0 -> custom_module -> obs1 -> output input -> quant -> quantized_custom_module -> dequant -> output ``` In the last step, input observers are replaced with "quantize" and output observers are replaced with "dequantize", in contrast to other non-custom-module patterns where observers are replaced with "quantize-dequantize" pairs instead. Note that, conceptually, the output observer `obs1` is really just a DeQuantStub, since no observation is actually needed. 
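Before getting to the LSTM-specific handling below, here is a minimal sketch of the prepare/convert flow this enables (assumptions: the default FBGEMM `QConfigMapping`, and that, depending on the release, the `torch.nn.LSTM` to `torch.ao.nn.quantizable.LSTM` custom-module mapping may also need to be supplied via the prepare/convert custom configs; this is not the exact test case from the PR):

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(8, 8, num_layers=1)

    def forward(self, x, hidden):
        out, (h, c) = self.lstm(x, hidden)
        return out, (h, c)

m = M().eval()
example = (torch.randn(5, 2, 8), (torch.zeros(1, 2, 8), torch.zeros(1, 2, 8)))

qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(m, qconfig_mapping, example_inputs=example)
prepared(*example)                # calibration
quantized = convert_fx(prepared)  # nn.LSTM -> torch.ao.nn.quantized.LSTM
print(quantized)
```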
**Custom module LSTM:** The reason why custom module LSTM cannot be handled in the same way is because, unlike other custom modules, its inputs and outputs are nested tuples instead of single tensors. This is how the existing custom module code would try to handle LSTMs: ``` input -> lstm -> output hidden0 -/ \-> hidden0 hidden1 -/ \-> hidden1 input -> obs0 -> lstm -> obs1 # fails hidden0 -/ # missing observer hidden1 -/ # missing observer ``` However, this fails today because 1) we assume there is only one input to the custom module, and so we never end up quantizing `hidden0` and `hidden1`, and 2) the output observer `obs1` is fed a tuple, which it does not understand how to handle. **Short-term fix:** This commit addresses the above by specifically handling the input and output structures used by custom module LSTM. For the inputs, we manually insert observers for `hidden0` and `hidden1` to ensure all input tensors are quantized. For the outputs, we split the tuple into its internal nodes, attach a DeQuantStub to each node, and recombine these DeQuantStubs according to the original structure. Finally, we must also reroute consumers of the original LSTM tuple (and its internal nodes, e.g. `lstm[0]`) to these DeQuantStubs: ``` input -> lstm -> output -> linear0 hidden0 -/ \-> hidden0 -> linear1 hidden1 -/ \-> hidden1 -> linear2 input -> obs0 -> lstm -> output -> dqstub -> linear0 -> obs3 hidden0 -> obs1 -/ \-> hidden0 -> dqstub -> linear1 -> obs4 hidden1 -> obs2 -/ \-> hidden1 -> dqstub -> linear2 -> obs5 input -> quant -> qlstm -> output -> dequant -> linear0 -> quant -> dequant hidden0 -> quant -/ \-> hidden0 -> dequant -> linear1 -> quant -> dequant hidden1 -> quant -/ \-> hidden1 -> dequant -> linear2 -> quant -> dequant input -> quant -> qlstm -> output -> quantized_linear0 -> dequant hidden0 -> quant -/ \-> hidden0 -> quantized_linear1 -> dequant hidden1 -> quant -/ \-> hidden1 -> quantized_linear2 -> dequant ``` Note that we choose to insert DeQuantStubs here instead of observers because these will ultimately be replaced by "dequantize" nodes. This matches the general custom module behavior, where output observers are replaced only with "dequantize" nodes (as opposed to the normal "quantize-dequantize" pair), since custom module outputs are assumed to already be quantized. Using DeQuantStubs instead of observers also simplifies the "dequantize" insertion logic. In the future, we should use DeQuantStubs in place of output observers for custom modules in general. **Test plan:** python test/test_quantization.py TestQuantizeFx.test_static_lstm python test/test_quantization.py TestQuantizeFx.test_static_lstm_consume_tuple **Reviewers:** jerryzh168, vkuzo **Subscribers:** jerryzh168, vkuzo Pull Request resolved: https://github.com/pytorch/pytorch/pull/85068 Approved by: https://github.com/jerryzh168 commit 71dddec6eac4e518d428d2fdc1324d421f5c8b56 Author: Ryan Spring Date: Fri Sep 23 06:52:38 2022 +0000 Cast grad_input to half when input_dtype is half in _softmax_backward_data aten decomposition (#85497) Fixes #85504 `_softmax_backward_data` and `_log_softmax_backward_data` cast `grad_input` to half when the `input_dtype` is half. When running with amp without the cast, consumer ops can trigger `RuntimeError: expected scalar type Float but found Half`. 
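A sketch of the decomposition-level fix in Python (the reference softmax backward formula with the added cast back to `input_dtype`; illustrative only, not the exact ATen decomposition):

```python
import torch

def softmax_backward_data(grad_output, output, dim, input_dtype):
    # reference softmax backward: out * (dL/dout - sum(dL/dout * out, dim))
    new_grad = grad_output * output
    grad_input = new_grad - output * new_grad.sum(dim, keepdim=True)
    # the fix: cast back to the forward input's dtype (e.g. half under amp)
    # so consumer ops don't see an unexpected float32 gradient
    if grad_input.dtype != input_dtype:
        grad_input = grad_input.to(input_dtype)
    return grad_input

out = torch.softmax(torch.randn(4, 8), dim=-1)   # float32 activations
g = torch.randn(4, 8)                            # float32 incoming grad
print(softmax_backward_data(g, out, -1, torch.half).dtype)  # torch.float16
```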
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L70-L83 https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/SoftMax.cpp#L102-L113 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85497 Approved by: https://github.com/ngimel commit b4f9b68225de19f87e9c3c6e148447506763a861 Author: Nikolay Korovaiko Date: Fri Sep 23 04:55:50 2022 +0000 should_check_strides (#85416) This PR ports `should_check_strides` checks from `origin/symbolic-shapes` to `master` as the part of our dynamic shapes landing effort. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85416 Approved by: https://github.com/ezyang commit d5cabf79469b8966f99e10a8f04b3a2f222027df Author: Nikita Shulga Date: Fri Sep 23 04:48:16 2022 +0000 Make functorch compilable with Py-3.11 (#85054) By using compatibility wrappers from [python_compat.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/python_compat.h) and skipping part of `getname` switch Fixes https://github.com/pytorch/pytorch/issues/85006 Please note that `import torch` right now fails by default on 3.11 with some jit issue, so I think this shouldn't be a really issue for a bit Pull Request resolved: https://github.com/pytorch/pytorch/pull/85054 Approved by: https://github.com/kit1980, https://github.com/zdevito commit 56c0c0af5b529d5fcfbef2bcba2f75ec1487be3a Author: Andrew Gu Date: Thu Sep 22 21:25:16 2022 +0000 [ShardedTensor] Add `is_floating_point` (#85483) This adds `is_floating_point()` support to `ShardedTensor`. This is needed for `ShardedTensor` + FSDP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85483 Approved by: https://github.com/wanchaol commit c8f78d417b305536d0b1e031afff2f91f2487bd0 Author: Andrew Gu Date: Thu Sep 22 21:25:15 2022 +0000 [ShardedTensor] Add `is_meta` (#85482) This adds `is_meta` support to `ShardedTensor`. This is needed for `ShardedTensor` + FSDP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85482 Approved by: https://github.com/wanchaol commit 05d0eb2aee495b6caf3de9b756fa0ceca8f1f67b Author: Andrew Gu Date: Thu Sep 22 21:25:15 2022 +0000 [FSDP] Make `_ran_pre_backward_hook` check more robust (#85481) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85481 Approved by: https://github.com/rohan-varma commit 8602873a122ddbe24e7df6f8246f6748abe25a60 Author: PyTorch MergeBot Date: Fri Sep 23 03:40:45 2022 +0000 [vision hash update] update the pinned vision hash (#85522) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85522 Approved by: https://github.com/pytorchbot commit cf0de77c2cfb8843b8ae67e6a6f053e6bf6bb3d9 Author: Andrew Gu Date: Thu Sep 22 17:09:51 2022 +0000 [Easy][FSDP] Simplify `assert` to `p_assert` (#85479) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85479 Approved by: https://github.com/rohan-varma commit 5704c73b56fb5fabb3ee7e51bc03a0b55081d524 Merge: f15886f9a2 6b416bf681 Author: mingfeima Date: Fri Sep 23 10:26:23 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 6b416bf6818ce814b8336113fadca8b05b052b01 Author: mingfeima Date: Fri Sep 23 10:26:23 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit f15886f9a2c1d0ddbae7baa5411cd02908bd556f Merge: e97c609edf 9ae1bd8724 Author: mingfeima Date: Fri Sep 23 10:03:34 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 9ae1bd872438b4814ea8982191a97b6cefe0d815 Author: mingfeima Date: Fri Sep 23 10:03:34 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 8bd4724f04386393d7f361f472e504e5fbb7501a Author: Wei Wang <109318740+weiwangmeta@users.noreply.github.com> Date: Fri Sep 23 01:05:15 2022 +0000 Adding a unit test that would gate PRs and prevent reverts, e.g. #83327 (#85442) PR #83327 slipped through CI, adding this unit test as part of efforts to minimize future reverts Pull Request resolved: https://github.com/pytorch/pytorch/pull/85442 Approved by: https://github.com/Balandat, https://github.com/mehtanirav commit 5f6735ea97c90913f54c00b5eb2fe782b65b257b Author: Rohan Varma Date: Thu Sep 22 12:54:19 2022 -0700 [FSDP] Address comments on previous PR (#85490) Address follow ups on https://github.com/pytorch/pytorch/pull/85223/ Differential Revision: [D39740878](https://our.internmc.facebook.com/intern/diff/D39740878/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85490 Approved by: https://github.com/awgu commit 539076e2c25675dafcef804f115a7979a44fdfdb Author: Ivan Yashchuk Date: Fri Sep 23 00:16:55 2022 +0000 Remove deprecated torch.lstsq (#70980) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.lstsq`. There's a note in `tools/codegen/gen.py` about `lstsq` schema in `native_function.yaml` that I will not remove: https://github.com/pytorch/pytorch/blob/87139d8532c99ff5dbeef1b97948d71793aa7851/tools/codegen/gen.py#L734-L770 cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70980 Approved by: https://github.com/lezcano, https://github.com/kit1980 commit 6380016bdd6637785e08c9ccb932e83ea46b7a18 Author: Nikita Shulga Date: Fri Sep 23 00:08:23 2022 +0000 Disable decomposition registration on Python-3.11 (#85509) As it is currently broken (probably need few tweaks to AST tree parsing) See https://github.com/pytorch/pytorch/issues/85506 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85509 Approved by: https://github.com/zou3519, https://github.com/soulitzer commit f0869cc8d095c9bdbcaca147ba52857932e7a743 Author: Richard Barnes Date: Thu Sep 22 23:15:10 2022 +0000 Make CUDA exceptions unlikely and isolate C10_CUDA_CHECK body (#85256) This marks CUDA exception checks as unlikely, which might have a positive performance impact. 
It further isolates part of `C10_CUDA_CHECK` into a separate function and file so that code can be made more expressive in subsequent diffs without bloating functions using the check or creating readability issues. Test Plan: Sandcastle Differential Revision: D39619861 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85256 Approved by: https://github.com/ezyang, https://github.com/ngimel commit f0a084f3db544ec7db2f56d29ad9dcaa4619bf5a Author: PyTorch MergeBot Date: Thu Sep 22 23:00:13 2022 +0000 Revert "[Profiler] Make `LibKinetoClient::stop()` directly call `ProfilerStateBase::pop` (#83965)" This reverts commit fdd366541333330387d0b262da8357984e0d311f. Reverted https://github.com/pytorch/pytorch/pull/83965 on behalf of https://github.com/robieta due to broke internal on-demand tracing: S296407 commit 46a6a50f4ef826bc88ef0783765c24e6f0aa28c2 Author: Nikita Shulga Date: Thu Sep 22 22:13:47 2022 +0000 Skip MPS test from generic M1 testsuite (#85500) As there is a separate Run MPS shard right now. See if this reduces the number of crashes Pull Request resolved: https://github.com/pytorch/pytorch/pull/85500 Approved by: https://github.com/clee2000, https://github.com/kit1980, https://github.com/huydhn commit 92a942100a9a061a6274a56b31cd905fc688e622 Author: Catherine Lee Date: Thu Sep 22 22:05:55 2022 +0000 disable circleci jobs b/c they are flaky (#85508) should undo this when they're ok again Pull Request resolved: https://github.com/pytorch/pytorch/pull/85508 Approved by: https://github.com/kit1980, https://github.com/ZainRizvi commit 5e700803c27260d2aaba92c42cf2da7f43ed0d68 Author: Mikayla Gawarecki Date: Thu Sep 22 14:33:04 2022 +0000 Use fallback approach for nested matmul (#85311) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311 Approved by: https://github.com/cpuhrsch, https://github.com/drisspg commit 63c1f2fef94a0c3a0b0ea87493ce3fb919876153 Author: Mike Iovine Date: Thu Sep 22 20:23:05 2022 +0000 [Static Runtime] Fold linear prepack ops (#85289) Summary: Split `quantized_linear_unpacked_weight_v2` into `linear_prepack` and `quantized_linear` so that the prepacking operation may be eliminated by constant folding. Test Plan: Fixes a huge regression in an internal model:

```
Before 89.6141 ms. 99.0923%. fb::quantized_linear_unpacked_weight_v2 (12 nodes)
After 0.806852 ms. 53.5365%. quantized::linear (12 nodes, out variant) (prepacking eliminated)
```

Differential Revision: D39622530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85289 Approved by: https://github.com/davidberard98 commit e4899764b2a560b65be9018131bfea7ebdc2cd84 Author: Mike Iovine Date: Thu Sep 22 20:21:52 2022 +0000 [Static Runtime] Fix aten::index_put list conversions (#85298) Summary: Apparently static runtime's list construct return value is always a `GenericList`, so we cannot use the `toOptionalTensorList` method in the general case -- we must convert each item individually. Test Plan: New unit test Differential Revision: D39628979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85298 Approved by: https://github.com/tenpercent commit bd854588fb927371c319d24d31b659731eddc3bc Author: Alexander Grund Date: Thu Sep 22 19:58:48 2022 +0000 Increase timeout for ProcessGroupGlooTest (#85474) We see spurious failures due to timeouts in `test_allreduce_coalesced_basics` but only when running the whole test suite with `python run_test.py --verbose -i distributed/test_c10d_gloo`. Increasing the timeout to 50s should provide enough leeway to avoid this.
Note that the default for the `_timeout` is 30 minutes. Originally reported in EasyBuild at https://github.com/easybuilders/easybuild-easyconfigs/pull/15137#issuecomment-1073809305 and patch proposed by @casparvl Pull Request resolved: https://github.com/pytorch/pytorch/pull/85474 Approved by: https://github.com/rohan-varma commit e505360eb8c21d713180d3e71add0513cb201581 Author: PyTorch MergeBot Date: Thu Sep 22 19:37:29 2022 +0000 Revert "[mta] APEX style Fused Adam (#81705)" This reverts commit 7a6c4d0c50dd0670d87bc39d53292cf8cb90ca04. Reverted https://github.com/pytorch/pytorch/pull/81705 on behalf of https://github.com/dagitses due to broke internal builds, details to come commit 848437590f41af5d3c9f9bb381106114f70fe572 Author: Richard Zou Date: Thu Sep 22 06:56:40 2022 -0700 Delete functorch's monkeypatching (#85430) By upstreaming functorch's tensor printing logic into PyTorch. There's no way of creating a custom print function for a TensorImpl subclass (as opposed to a torch_dispatch or torch_function tensor subclass, which can just override repr()) right now, so we need to directly interpose inside regular Tensor printing in PyTorch. Monkey patching is bad; users do not expect `import blah` to change something about another library. Fixes https://github.com/pytorch/functorch/issues/900 Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85430 Approved by: https://github.com/ezyang commit 5e5c31954994274e51c09731ac71a4f824ddb620 Author: Richard Zou Date: Thu Sep 22 06:56:40 2022 -0700 Move functorch python bindings to torch/csrc (#85426) This moves functorch's python bindings to torch/csrc/functorch/init.cpp. Coming next is the torchdim move. I didn't do torchdim yet because moving functorch's python bindings unblocks some other things that I want to do first. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85426 Approved by: https://github.com/ezyang commit bcf93181a0ca5db75bd038db0d5f7e4cee733db7 Author: Ivan Yashchuk Date: Thu Sep 22 17:40:46 2022 +0000 Remove deprecated torch.matrix_rank (#70981) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.matrix_rank`. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70981 Approved by: https://github.com/lezcano, https://github.com/kit1980 commit e34297690787dc89e98989b977b845cb1fa86d1e Author: atalman Date: Thu Sep 22 17:33:59 2022 +0000 Adding conda retry upload to mitigate connection reset errors (#85407) Adding conda retry upload to mitigate connection reset errors Mitigate errors like this: https://github.com/pytorch/pytorch/actions/runs/3095808905/jobs/5012840560 ``` Uploading file "pytorch-nightly/pytorch/1.13.0.dev20220921/linux-64/pytorch-1.13.0.dev20220921-py3.9_cuda11.6_cudnn8.3.2_0.tar.bz2" 0%| | 0.00/1.24G [00:00 Date: Thu Sep 22 16:30:16 2022 +0000 Exposing native _scaled_dot_product_attention to torch.nn (#85044) This exposes the _scaled_dot_product_attention function to python in the nn namespace. It is still underscored because the api for args, and kwargs is still in flux for the next few weeks and will eventually land as a prototype feature. 
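A usage sketch, with the caveat from the commit itself that argument names and the return layout were still in flux at the time (the positional `(query, key, value)` call below is an assumption about the prototype API):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 16, 8)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

# Underscored prototype entry point exposed by this PR; the exact
# signature/return value was expected to change before stabilization.
result = F._scaled_dot_product_attention(q, k, v)
print(type(result))
```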
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85044 Approved by: https://github.com/cpuhrsch commit 0d04e5489855dbdc8fd83bb2a6a3f13ba7017f63 Author: Nikita Shulga Date: Thu Sep 22 16:12:51 2022 +0000 [GHF] Create "Core Reviewers" group (#85477) And add @mruberry and @lezcano to it Pull Request resolved: https://github.com/pytorch/pytorch/pull/85477 Approved by: https://github.com/albanD commit e227e3ec4897d2ce04de705423afefa028d82b34 Author: atalman Date: Thu Sep 22 16:01:38 2022 +0000 Disabling the pypi cudnn wheel from uploading temporarily (#85470) Disabling the pypi cudnn wheel from uploading Temporary change untill the cudnn wheel package is ready for release This mitigates following issue: https://github.com/pytorch/vision/issues/6628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85470 Approved by: https://github.com/malfet commit 5043457a8ed07e06961c3b92579b856ed2bc9f6f Author: PyTorch MergeBot Date: Thu Sep 22 15:44:38 2022 +0000 Revert "Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417)" This reverts commit 9c77083965e1283763a83f72a3adf299281761e3. Reverted https://github.com/pytorch/pytorch/pull/85417 on behalf of https://github.com/clee2000 due to broke tests on trunk (and pull somehow) https://hud.pytorch.org/pytorch/pytorch/commit/9c77083965e1283763a83f72a3adf299281761e3 commit 9baf6770bcd67272a2cb9212c49e3bb95f0679c3 Author: Edward Z. Yang Date: Wed Sep 21 07:00:52 2022 -0700 Apply new symbolic shape strategy to make_fx symbolic mode (#85260) This results in some test wobbling, which looks legit. I also added some debug helpers for stuff that I found useful while working on this. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85260 Approved by: https://github.com/albanD commit 2f50d2f685db0cc52c52577b25c935970d96b99e Author: Justin Chu Date: Thu Sep 22 03:52:37 2022 +0000 [ONNX] Update docs on symbolic registration (#85290) - Move inline instructions on editing symbolic functions to the README - Add a line on using the symbolic function registration decorator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85290 Approved by: https://github.com/BowenBao commit 9c77083965e1283763a83f72a3adf299281761e3 Author: Elias Ellison Date: Wed Sep 21 21:24:39 2022 +0000 Add FakeCrossRef tests for backwards, Fix Layer Norm Backward Decomp (#85417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85417 Approved by: https://github.com/ezyang commit 61b4e8a7bfb69954680013e2e34fc099db900736 Author: Edward Z. Yang Date: Wed Sep 21 21:53:23 2022 -0700 More SymFloat support (#85411) - Support storing SymFloat in IValue - Add SymFloat to JIT type system (erases to float) - Printing support for SymFloat - add/sub/mul/truediv operator support for SymFloat - Support truediv on integers, it returns a SymFloat - Support parsing SymFloat from Python object Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411 Approved by: https://github.com/albanD commit 0c46e3ec6684ad64ce4ef54f07a886ef67bad924 Author: George Qi Date: Wed Sep 21 18:10:13 2022 +0000 [maskedtensor] add basic tests and unary/binary/reduction tests from common_method_invocations (#82841) Decided offline on the invariant that: `masked_tensor` calls `MaskedTensor()`, which is analogous to `torch.tensor` `as_masked_tensor` calls `MaskedTensor._from_values()`, which is analogous to `torch.as_tensor` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82841 Approved by: https://github.com/cpuhrsch, https://github.com/bhosmer commit 2bc82163eb70341b9e644689f54a6c3e0fafdc92 Author: Eddie Yan Date: Thu Sep 22 07:34:45 2022 +0000 Reduce memory usage requirement of test_warp_softmax_64bit_indexing in test_nn.py (re-open of #85037) (#85373) CC @ngimel @xwang233 @ptrblck Adds fix for `get_tolerances`, tested locally on a dgx Volta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85373 Approved by: https://github.com/ngimel commit e97c609edf89467e7c27f450505a581ff02c4055 Merge: 0ee1ee3159 48c34a9d00 Author: mingfeima Date: Thu Sep 22 15:08:44 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 48c34a9d00c38c5e5cc87b1481c2e8c0e818ab28 Author: mingfeima Date: Thu Sep 22 15:08:44 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 76d60778eb01b4213735be1c6e126fe2da519b8e Author: Justin Chu Date: Thu Sep 22 03:52:37 2022 +0000 [ONNX] Use decorators for symbolic function registration (#84448) This is the 4th PR in the series of #83787. It enables the use of `@onnx_symbolic` across `torch.onnx`. - **Backward breaking**: Removed some symbolic functions from `__all__` because of the use of `@onnx_symbolic` for registering the same function on multiple aten names. - Decorate all symbolic functions with `@onnx_symbolic` - Move Quantized and Prim ops out from classes to functions defined in the modules. Eliminate the need for `isfunction` checking, speeding up the registration process by 60%. - Remove the outdated unit test `test_symbolic_opset9.py` - Symbolic function registration moved from the first call to `_run_symbolic_function` to init time. 
- Registration is fast: ![image](https://user-images.githubusercontent.com/11205048/189164959-f3fca173-19bc-4682-b150-f13a586387bf.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84448 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 0ee1ee3159382ec49211d4276e760dd7e9581a5c Merge: cad2d77de3 ec4448e95a Author: mingfeima Date: Thu Sep 22 14:10:02 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit ec4448e95a1d6db57e7850898d4f3e7605b795c3 Author: mingfeima Date: Thu Sep 22 14:10:02 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit cad2d77de3b1858a93a1faab40dd00c25253de5d Merge: 75bfbc35ca 23ab87f96c Author: mingfeima Date: Thu Sep 22 13:56:40 2022 +0800 Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit 23ab87f96cb4a0963eb6521a23217d6f22d07377 Merge: 97c4f58755 08f413bd6a Author: mingfeima Date: Thu Sep 22 13:56:40 2022 +0800 Update base for Update on "port spmm_sum to pytorch and optimize it on CPU" [ghstack-poisoned] commit c7c2578f93fbfad5f769543848642a16b6071756 Author: Huy Do Date: Thu Sep 22 03:33:30 2022 +0000 Skip NVIDIA driver installation if it's already there (#85435) Address flaky failures such as https://github.com/pytorch/pytorch/actions/runs/3099236524/jobs/5018444060 in which NVIDIA driver has already been installed. The installation will be skipped if the same driver has already been installed. I also move NVIDIA driver installation before the installation of docker NVIDIA support to avoid any funny business with the latter interfering with the installation. * Run `.github/scripts/install_nvidia_utils_linux.sh` manually with an existing but different NVIDIA driver installed (515.65.01) ``` == Installing nvidia driver NVIDIA-Linux-x86_64-515.57.run == + HAS_NVIDIA_DRIVER=0 ++ command -v nvidia-smi + '[' -x /usr/bin/nvidia-smi ']' ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader + INSTALLED_DRIVER_VERSION=515.65.01 + '[' 515.65.01 '!=' 515.57 ']' + echo 'NVIDIA driver (515.65.01) has been installed, but we expect to have 515.57 instead. Continuing with NVIDIA driver installation' NVIDIA driver (515.65.01) has been installed, but we expect to have 515.57 instead. Continuing with NVIDIA driver installation + '[' 0 -eq 0 ']' + sudo yum groupinstall -y 'Development Tools' Loaded plugins: dkms-build-requires, extras_suggestions, langpacks, priorities, update-motd Maybe run: yum groups mark install (see man yum) No packages in any requested group available to install or update ++ uname -r + sudo yum install -y 'kernel-devel-uname-r == 4.14.290-217.505.amzn2.x86_64' Loaded plugins: dkms-build-requires, extras_suggestions, langpacks, priorities, update-motd Package kernel-devel-4.14.290-217.505.amzn2.x86_64 already installed and latest version Nothing to do + sudo modprobe backlight + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-515.57.run + sudo /bin/bash /tmp/nvidia_driver -s --no-drm ... 
``` * Run `.github/scripts/install_nvidia_utils_linux.sh` manually with the same NVIDIA driver installed (515.57) ``` == Installing nvidia driver NVIDIA-Linux-x86_64-515.57.run == + HAS_NVIDIA_DRIVER=0 ++ command -v nvidia-smi + '[' -x /usr/bin/nvidia-smi ']' ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader + INSTALLED_DRIVER_VERSION=515.57 + '[' 515.57 '!=' 515.57 ']' + HAS_NVIDIA_DRIVER=1 + echo 'NVIDIA driver (515.57) has already been installed. Skipping NVIDIA driver installation' NVIDIA driver (515.57) has already been installed. Skipping NVIDIA driver installation + '[' 1 -eq 0 ']' + nvidia-smi ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85435 Approved by: https://github.com/seemethere, https://github.com/malfet commit 99ad8a304898de8bf1e20a6fc12e335e9b7c5064 Author: PyTorch MergeBot Date: Thu Sep 22 03:12:46 2022 +0000 [vision hash update] update the pinned vision hash (#85451) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85451 Approved by: https://github.com/pytorchbot commit 34296e2f4c99841d5fe1d8043299f07923106a8d Author: Sherlock Huang Date: Wed Sep 21 22:18:37 2022 +0000 SubgraphMatcher remove invalid matches (#85444) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85444 Approved by: https://github.com/rkindi commit 4523ac7aa10cc5a6a5d93c3469c353d581a818be Author: Jerry Zhang Date: Tue Sep 20 18:34:56 2022 -0700 [quant][docs][ez] Fix formatting for qconfig_mapping (#85306) Summary: att Test Plan: visual inspection of generated docs Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/85306 Approved by: https://github.com/vkuzo, https://github.com/andrewor14 commit f21e77d9a64b39bb9feb9946d912b7b4952430d6 Author: Sunita Nadampalli Date: Thu Sep 22 00:54:59 2022 +0000 [mkldnn_matmul] enable mkldnn matmul for aarch64 bf16 devices (#83671) this PR enables mkldnn matmul for aarch64 bf16 devices for both bf16 as well as fp32 input. This PR is dependent on cpuinfo commit update PR: https://github.com/pytorch/pytorch/pull/83620 Issue: https://github.com/pytorch/pytorch/issues/83594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83671 Approved by: https://github.com/malfet commit 26a861cb27313f86fd1c7eb1348b577a0a4f0784 Author: Zafar Date: Thu Sep 22 00:50:49 2022 +0000 [quant] Check if cuda quantizing while on qnnpack engine (#85423) Although not possible in practice, while running the tests it is possible to try to quantize a CUDA tensor while quantization engine is set to QNNPACK. This would override the memory allocator from CUDA to MobileCPU, which would cause the new quantized tensors to be allocated on a CPU (instead of CUDA). Please, note that this is not a realistic scenario, as the qnnpack quantization engine is only "emulated" during the tests. When running on a real mobile CPU we don't expect a CUDA to be present. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85423 Approved by: https://github.com/jerryzh168 commit 56a41b5998a28566984fee70e1dc9604896bd180 Author: kshitij12345 Date: Thu Sep 22 00:21:11 2022 +0000 [composite compliance] ctc_loss (#84752) I have mixed feelings about adding new (private) operators. Backends writers will have to override them as well. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84752 Approved by: https://github.com/zou3519 commit 1910c5847edbf2f92debc8f73fc7d9056b9fd9a0 Author: Catherine Lee Date: Thu Sep 22 00:07:00 2022 +0000 rebase and merge - add sleep (#85420) not a fan of this solution, so if anyone has better ideas please tell me. Add a sleep between the tryrebase.py and trymerge.py scripts so that github has time to get the push and start workflows, and so that we don't get weird event orders like https://github.com/pytorch/pytorch/pull/85267 where the push from the rebase looks like it's after the merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/85420 Approved by: https://github.com/malfet, https://github.com/ZainRizvi commit caa0ab557dd10e04ca413c1508f76ec8ae5adea3 Author: PyTorch MergeBot Date: Wed Sep 21 22:55:40 2022 +0000 Revert "Use fallback approach for nested matmul (#85311)" This reverts commit 7c31f6e67213cbe773b0e2556f880f6ce2449fc3. Reverted https://github.com/pytorch/pytorch/pull/85311 on behalf of https://github.com/clee2000 due to broke lots of builds https://hud.pytorch.org/pytorch/pytorch/commit/7c31f6e67213cbe773b0e2556f880f6ce2449fc3 even though the pr was green commit 0336308be5c2d019b99ed5fb59ec1bf01f735a99 Author: Zafar Date: Wed Sep 21 22:46:25 2022 +0000 [AO] Callable norm function for sparsifier (#85236) The `WeightNormSparsifier` currently only supports L2-norm. This allows the users to specify the function that is applied to compute the norm. In addition, L1-norm is also added, as an `.abs` function. - The functions that are referred to as "norms" are not strictly such. For example, L2-norm of `x` is computed as `F.avg_pool(x * x, ...)`. Similarly, L1-norm of `x` is computed as `F.avg_pool(x.abs(), ...)`. - When passing callable functions for the norm, the above assumption must hold: `F.avg_pool(norm_fn(x), ...)` will be applied. ```python >>> # L3-norm >>> l3 = lambda T: T * T * T >>> sparsifier = WeightNormSparsifier(norm=l3) >>> >>> # L0-norm >>> l0 = lambda T: torch.logical_or(torch.zeros(T.shape), T != 0).to(T.dtype) >>> sparsifier = WeightNormSparsifier(norm=l0) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85236 Approved by: https://github.com/jcaip commit 6fb182c86b0bcf814ec4b5ece0fe1ffa8abcfbb6 Author: foram-chandra <96388449+foram-chandra@users.noreply.github.com> Date: Wed Sep 21 22:43:52 2022 +0000 [doc] document pin_memory for randn (#85219) Fixes #85123 cc: @mruberry @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85219 Approved by: https://github.com/mruberry commit 7c31f6e67213cbe773b0e2556f880f6ce2449fc3 Author: Mikayla Gawarecki Date: Wed Sep 21 16:31:27 2022 +0000 Use fallback approach for nested matmul (#85311) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85311 Approved by: https://github.com/cpuhrsch, https://github.com/drisspg commit 5aa84c16dbb9640da738866f0d52f1dd0d285f77 Author: Sourav Mandal Date: Wed Sep 21 22:17:46 2022 +0000 [pytorch] cuBLAS addmm malfunction test (#85432) Summary: Re-submit for approved PR that was then reverted: https://github.com/pytorch/pytorch/pull/85084 Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations Test Plan: Sample unit test output -- [...] test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ...
ok test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok [...] Reviewed By: mikekgfb Differential Revision: D39433029 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85432 Approved by: https://github.com/zrphercule commit 9c127986bfa5bf8759eb88eb2c77f2de7ad001ba Author: Zain Rizvi Date: Wed Sep 21 21:34:56 2022 +0000 Fix labeling detection bug (#85429) Fixes a bug where if a PR is pre-labeled with both a `release notes:` label and a `topic:` label then our bot still pings on the PR, erroneously asking for those labels to be added &-ing sets computes the set intersection, which isn't what was desired here Pull Request resolved: https://github.com/pytorch/pytorch/pull/85429 Approved by: https://github.com/janeyx99 commit 3dce26635f1bbab4bc96801e3cfd7ce728ba78b9 Author: PyTorch MergeBot Date: Wed Sep 21 20:21:25 2022 +0000 Revert "test in parallel at file granularity (#84961)" This reverts commit 8107666c6a1c25e96762a31296cace9ed343aaf6. Reverted https://github.com/pytorch/pytorch/pull/84961 on behalf of https://github.com/clee2000 due to makes test_forward_ad_nn_functional_max_unpool2d_cuda_float32 flakily unexpectedly pass commit 0278a141fc9723e94506d36c40c995aa77fcc00b Author: nikitaved Date: Wed Sep 21 20:10:24 2022 +0000 csr <-> csc, csc <-> csc, bsr <-> bsc, bsc <-> bsc, bsr <-> bsr conversions (#85091) As per title. Required to enable a wider selection of sparse formats for `nn.functional.linear`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85091 Approved by: https://github.com/amjames, https://github.com/cpuhrsch commit a2cbbbd46ffff8c43d8708fb7ef718bb4fdfaa87 Author: Edward Z. Yang Date: Wed Sep 21 11:24:23 2022 -0400 Improve SymInt print and move to correct file (#85413) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85413 Approved by: https://github.com/wconstab commit 3a09a1e8f01ee85f0854eba9acb5e049b2c1545e Author: foram-chandra Date: Wed Sep 21 19:31:56 2022 +0000 [doc] remove out argument from squeeze (#85222) Fixes #83972 cc- @ngimel @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85222 Approved by: https://github.com/ezyang commit 9a81da7ad1a57e0f6b17948872d7c0d08495ae91 Author: Peter Bell Date: Wed Sep 21 19:23:49 2022 +0000 Update NCCL to current master and remove patch step (#85367) The patch from #84245 has been upstreamed into NCCL, so the patch step is no longer required. 
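For the sparse layout-conversion commit above (#85091), a minimal sketch of the kind of round-trip those conversions enable, assuming the `to_sparse_csr`/`to_sparse_csc` Tensor methods are available in this build:

```python
import torch

# Toy matrix; the point is only the layout round-trip.
dense = torch.tensor([[0., 1.], [2., 0.]])

csr = dense.to_sparse_csr()   # compressed sparse row
csc = csr.to_sparse_csc()     # converted to compressed sparse column

print(csr.layout, csc.layout)              # torch.sparse_csr torch.sparse_csc
print(torch.equal(csc.to_dense(), dense))  # True: values survive the conversion
```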
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85367 Approved by: https://github.com/ezyang commit d5adf8151af7b1b1126ce4ae3d1bf140d0515485 Author: jiahongyu Date: Wed Sep 21 19:20:30 2022 +0000 [PolishTypo] inherentely->inherently, intentially->intentionally (#85325) Polish comment typo, `inherentely->inherently`, `intentially->intentionally` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85325 Approved by: https://github.com/ezyang commit 764cba6848e5b239d102773ec45080dc0729c9e4 Author: Thomas Viehmann Date: Wed Sep 21 18:53:34 2022 +0000 add Python ref for isreal (#85361) Dipping my toes into prims waters Pull Request resolved: https://github.com/pytorch/pytorch/pull/85361 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry commit 77f1f98479b204b2d0151d9cf3700f99915b9d50 Author: Mikayla Gawarecki Date: Tue Sep 20 15:13:49 2022 +0000 Re-introduce `torch.Tensor.to_padded_tensor` (#85293) Differential Revision: [D39629004](https://our.internmc.facebook.com/intern/diff/D39629004) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85293 Approved by: https://github.com/cpuhrsch commit 8e1ae1c19d7096e72d5e095a7c5de0acf05c5fbf Author: Edward Z. Yang Date: Wed Sep 21 13:53:02 2022 -0400 Add Krovatkin to symbolic-shapes team (#85422) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85422 Approved by: https://github.com/wconstab commit 25a5ada426a238461db39386cf39f6452b73a4b9 Author: Edward Z. Yang Date: Wed Sep 21 13:51:16 2022 -0400 Typofix (#85421) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85421 Approved by: https://github.com/wconstab, https://github.com/malfet commit 35943f30cbe99276847e3b04704a66f0318f0083 Author: Ivan Yashchuk Date: Wed Sep 21 18:12:52 2022 +0000 Reference implementation for torch.Tensor.sum_to_size (#85338) New ref: `torch._refs.sum_to_size`. View consistency validation is disabled because the ref returns a view instead of returning the input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85338 Approved by: https://github.com/mruberry commit 85073b8ddceb3705e333adfb08d3f1ba039a0370 Author: anjali411 Date: Wed Sep 21 09:45:09 2022 +0000 Add __all__ to fx, fistributed and cuda submodules (#85080) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85080 Approved by: https://github.com/albanD commit 0217a8d049ec54d303ef39776cf28bc80954b8e1 Author: PyTorch MergeBot Date: Wed Sep 21 18:02:50 2022 +0000 Revert "[fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330)" This reverts commit d3dec8097b847fc46755ef06ea6ff90eebc846eb. Reverted https://github.com/pytorch/pytorch/pull/85330 on behalf of https://github.com/dagitses due to a PR this is based on got reverted, rebase and reland commit 0ac6311356d21d052d2ca070b6f81706339deafb Author: PyTorch MergeBot Date: Wed Sep 21 17:57:49 2022 +0000 Revert "[CUBLAS][CUDA GRAPHS] (re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85292)" This reverts commit 4012e623e8689b873cae94590766d990d155017c. Reverted https://github.com/pytorch/pytorch/pull/85292 on behalf of https://github.com/dagitses due to broke an internal test during shutdown. Re-submit with #85399 in stack commit 0e194b32192298f411a47bb28b6a62b194a211b0 Author: Edward Z. 
Yang Date: Wed Sep 21 17:44:40 2022 +0000 Add Auto Request Review for reviewer assignment (#85414) I want this specifically for dynamic shapes work, but you can feel free to use it for your own needs too. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85414 Approved by: https://github.com/malfet commit 2a88f1b2d86a1fdc7380db768e67b18c24d199c4 Author: lezcano Date: Wed Sep 21 08:14:48 2022 +0000 Land "Make ceil,floor,round,trunc handle integers" (#85144) PR to land https://github.com/pytorch/pytorch/pull/78480, as Rohit does not work in the PyTorch project anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/85144 Approved by: https://github.com/ngimel, https://github.com/mruberry commit 6f2b390fc1d6910876233663a27a4c89d9e486f2 Author: Xu Zhao Date: Wed Sep 21 17:17:46 2022 +0000 Fix import of instruction count benchmark (#85359) This PR fixes the instruction count benchmark 1. Fix the updated import path 2. Allows building the benchmark with less compiler options (remove all "-W" options) Test plan: ``` BENCHMARK_USE_DEV_SHM=1 python main.py --mode ci ``` Manually tested and worked on the CI machine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85359 Approved by: https://github.com/robieta commit d9aa6dfe886597f2c6fb9d9b0582e669465fa28d Author: Elias Ellison Date: Wed Sep 21 01:47:27 2022 +0000 Add Fake Cross Ref Mode, migrate sparse to it (#85382) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85382 Approved by: https://github.com/ezyang commit 8107666c6a1c25e96762a31296cace9ed343aaf6 Author: Catherine Lee Date: Wed Sep 21 16:58:11 2022 +0000 test in parallel at file granularity (#84961) run tests in parallel at the test file granularity runs 3 files in parallel using multiprocessing pool, output goes to a file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to lacking memory), so we run those after. Sharding is changed to attempt to mask large files with other large files/run them on the same shard. test_ops* gets a custom handler to run it because it is simply too big (2hrs on windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler). reduces cuda tests by a lot, reduces total windows test time by ~1hr Ref. https://github.com/pytorch/pytorch/issues/82894 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961 Approved by: https://github.com/huydhn commit 2fb820455cc7b6d8f67e303098ffbcf4aac791f8 Author: PyTorch MergeBot Date: Wed Sep 21 16:48:55 2022 +0000 Revert "[pytorch] cuBLAS addmm malfunction test (#85084)" This reverts commit 0297c75c141103cc780c88bfe9749c460690bf58. Reverted https://github.com/pytorch/pytorch/pull/85084 on behalf of https://github.com/clee2000 due to broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419 commit eb94df28c748bff6a55c06e9b3440a525ea1f867 Author: atalman Date: Wed Sep 21 16:30:25 2022 +0000 Use pip install cu117 (#85097) Creates new wheel workflow specific to CUDA 11.7 that does not bundle the cudnn and cublas. Workflow: https://github.com/pytorch/pytorch/actions/runs/3094622781 New Package: manywheel-py3_10-cuda11_7-with-pypi-cudnn | 843 MB Old Package: manywheel-py3_10-cuda11_7 | 1.65 GB Testing workflow: [manywheel-py3_7-cuda11_7-with-pypi-cudnn-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000867662#logs): ``` Bundling without cudnn and cublas. 
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libgomp.so.1") ..... pytorch_extra_install_requirements: nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cublas-cu11 ``` [manywheel-py3_7-cuda11_7-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000863250#logs) ``` Bundling with cudnn and cublas. + DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "/usr/local/cuda/lib64/libcudnn_adv_infer.so.8" "/usr/local/cuda/lib64/libcudnn_adv_train.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_infer.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_train.so.8" "/usr/local/cuda/lib64/libcudnn_ops_infer.so.8" "/usr/local/cuda/lib64/libcudnn_ops_train.so.8" "/usr/local/cuda/lib64/libcudnn.so.8" "/usr/local/cuda/lib64/libcublas.so.11" "/usr/local/cuda/lib64/libcublasLt.so.11" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libcudnn_adv_infer.so.8" "libcudnn_adv_train.so.8" "libcudnn_cnn_infer.so.8" "libcudnn_cnn_train.so.8" "libcudnn_ops_infer.so.8" "libcudnn_ops_train.so.8" "libcudnn.so.8" "libcublas.so.11" "libcublasLt.so.11" "libgomp.so.1") ``` cc: @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85097 Approved by: https://github.com/malfet commit 90b64e231e6fc327a96ad78bbb4306f69bbf1406 Author: Jithun Nair Date: Wed Sep 21 16:22:12 2022 +0000 Update hipification logic for all ROCm headers (#85320) ...to remove deprecation warnings. Remove component-specific include dirs from include path Pull Request resolved: https://github.com/pytorch/pytorch/pull/85320 Approved by: https://github.com/kit1980 commit 2c285f3e9b83cca3cc3f08b723cc9e46b34c1ccd Author: Jerry Zhang Date: Wed Sep 21 05:13:20 2022 +0000 [quant][docs] README for FX Graph Mode Quantization (#85070) Summary: This is a developer-oriented design doc/README for FX Graph Mode Quantization, the goal for the doc is for new developers of FX Graph Mode Quantization to get familiarized with the high level algorithm of FX Graph Mode Quantization and ramp up quickly Test Plan: no test needed Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/85070 Approved by: https://github.com/vkuzo commit 5fa104a76c092d3d2259794f686e00285d8a3e46 Author: Richard Zou Date: Wed Sep 21 15:50:44 2022 +0000 Move functorch C++ into aten/src/ATen/functorch (#85381) This PR moves functorch C++ code that does not depend on python into aten/src/ATen/functorch. The C++ code that does depend on python (the python bindings as well as torchdim) will go into torch/csrc/functorch, to come later (see https://github.com/pytorch/pytorch/pull/85263 for initial attempt). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85381 Approved by: https://github.com/ezyang commit 125e9256f44d89cd20acbb6954e5f909ddc6da1e Author: Andrew Gu Date: Wed Sep 21 02:16:05 2022 +0000 [FSDP] Add back `forward_prefetch` (#85177) - This implements explicit forward prefetching following the static 1st iteration's pre-forward order when `forward_prefetch=True` in the FSDP constructor. 
- This has the same unit test coverage as the original `forward_prefetch`. - I checked via print statements that the prefetches are happening, but since I cannot get a good CPU bound workload, it is hard to tell via traces that the prefetch is working. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85177 Approved by: https://github.com/zhaojuanmao commit 977f8fce3cf642f8514eb9e79576d9784eb30b04 Author: Andrew Gu Date: Wed Sep 21 02:16:05 2022 +0000 [FSDP] Simplify backward prefetch implementation (#85176) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85176 Approved by: https://github.com/zhaojuanmao commit 563b065f5a4b4055fa6b025c2514b566d5fd9439 Author: Khushi Agrawal Date: Wed Sep 21 13:57:16 2022 +0000 [fix] rrelu, rrelu_, & RReLU when lower bound > upper bound (#84996) Fixes #83160 cc @kshitij12345 @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/84996 Approved by: https://github.com/mruberry, https://github.com/albanD commit de0f3c4200c17e156e632eef266fd6f27e482127 Author: lezcano Date: Wed Sep 21 09:12:58 2022 +0000 Change Lezcano to lezcano (#85396) I changed my handle to lezcano (no caps) as there are always issues with capital letters when automating stuff. The last issue was https://github.com/pytorch/test-infra/pull/751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85396 Approved by: https://github.com/ezyang commit 0297c75c141103cc780c88bfe9749c460690bf58 Author: Sourav Mandal Date: Wed Sep 21 13:42:12 2022 +0000 [pytorch] cuBLAS addmm malfunction test (#85084) Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations Test Plan: Sample unit test output -- [...] test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok [...] Reviewed By: mikekgfb Differential Revision: D39433029 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85084 Approved by: https://github.com/zrphercule commit b70c254ebbfe02f2e9a9990aa95368d8ee73be37 Author: Mateusz Sypniewski Date: Wed Sep 21 13:41:52 2022 +0000 Rework printing tensor aliases in CSAN error message (#85008) Small rework of how the error message is formatted; it introduces a distinction between the arguments and the output of kernels. Verified manually on multiple examples that the message is printed as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85008 Approved by: https://github.com/lw commit 3eb27229dd74dd0bea434326c471f16c50e558a4 Author: Edward Z. Yang Date: Tue Sep 20 22:01:48 2022 -0700 as_strided symbolic support (#85264) Signed-off-by: Edward Z.
Yang Differential Revision: [D39662820](https://our.internmc.facebook.com/intern/diff/D39662820) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85264 Approved by: https://github.com/wconstab commit 793deeefb44ee22dea9f6da5f57bf16e4d63d32d Author: Jesse Cai Date: Wed Sep 21 00:40:11 2022 +0000 [quant][core][feature] Implement masked_fill for CUDA tensors (#85108) Summary: - Add new cuda test for masked_fill - Add in QuantizedCUDA implementation for masked_fill Test Plan: ``` python test/test_quantization.py -k test_qtensor_masked_fill_cuda ``` Reviewers: dzdang Subscribers: Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110 Tags: quant Pull Request resolved: https://github.com/pytorch/pytorch/pull/85108 Approved by: https://github.com/dzdang commit 308b26fe4d2b82788862866a937c19c1b3934881 Author: Ivan Yashchuk Date: Wed Sep 21 12:45:15 2022 +0000 Add nvFuser support for transpose (#84629) `torch._refs.t`, `torch._refs.transpose`, `torch._refs.permute` are all should be working now with nvFuser executor. It would also work with graphs processed by AOT Autograd as these functions are registered to the aten->ref mapping via the "register_decomposition" decorator: https://github.com/pytorch/pytorch/blob/07d398fb269eebe314ae898287494a2bfdc7f278/torch/_refs/__init__.py#L3125-L3126 https://github.com/pytorch/pytorch/blob/07d398fb269eebe314ae898287494a2bfdc7f278/torch/_refs/__init__.py#L3143-L3144 https://github.com/pytorch/pytorch/blob/07d398fb269eebe314ae898287494a2bfdc7f278/torch/_refs/__init__.py#L2548-L2549 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84629 Approved by: https://github.com/ngimel commit 2f4a517d67fd693c4ff544e74947b01dc6063dfe Author: Horace He Date: Wed Sep 21 02:30:50 2022 +0000 Ported matmul compositeimplicitautograd impl into core (#85239) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85239 Approved by: https://github.com/ezyang, https://github.com/lezcano commit a3dc338ee1b30aa1f59f36b3ba4c98d6a30a8600 Author: PyTorch MergeBot Date: Wed Sep 21 08:34:51 2022 +0000 Revert "Exposing native _scaled_dot_product_attention to torch.nn (#85044)" This reverts commit 9fdd8a8b7f171be70ea3bd4724c38852ef292d73. Reverted https://github.com/pytorch/pytorch/pull/85044 on behalf of https://github.com/huydhn due to This breaks CUDA 10.2 in trunk. We are deprecating CUDA 10.2, but it is still here in the mean time commit 26ba2e9751dbda278603d40ed67e02d15ca834e3 Author: Jakub Pietrak Date: Wed Sep 21 09:37:40 2022 +0200 implementation for conv layers commit 09965957cd8ecc696852e73022892b3ad4475783 Author: Vasiliy Kuznetsov Date: Tue Sep 20 16:14:07 2022 -0700 quantization: align observer dtype with reference model spec (#85345) Summary: Before this PR, the `dtype` attribute of observers was not clearly defined. It originally meant `interface_dtype` in the eager mode workflow, which is how the codebase before this PR is using it. In the new reference model spec, `dtype` attribute of an observer represents the `dtype` value which needs to be passed into a `quantize` function in the reference model spec. This PR aligns the codebase to this definition of dtype. In detail: 1. change util functions to interpret `dtype` using the reference model definition 2. change `prepare` to interpret `dtype` using the reference model definition 3. change observers for dynamic quantization to interpret `dtype` using the reference model definition. 
A future PR (left out of this one to keep LOC small) will deprecate the `compute_dtype` field and instead expose `is_dynamic` on observers. Test plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Differential Revision: [D39675209](https://our.internmc.facebook.com/intern/diff/D39675209) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85345 Approved by: https://github.com/z-a-f, https://github.com/jerryzh168 commit 08f413bd6a076e41f0023dadb6f9b95f2fe2ddc6 Author: PyTorch MergeBot Date: Wed Sep 21 05:04:37 2022 +0000 [vision hash update] update the pinned vision hash (#85380) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85380 Approved by: https://github.com/pytorchbot commit 75451d3c81c88eebc878fb03aa5fcb89328989d9 Author: Edward Z. Yang Date: Tue Sep 20 17:17:33 2022 -0400 Address eellison's CR comments on AOTAutograd (#85370) For some reason, this change didn't actually make it to master. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85370 Approved by: https://github.com/eellison commit 3c870dadc3536b03bdcc5377ac85ef9e44cc1e87 Author: Nikita Shulga Date: Wed Sep 21 03:53:25 2022 +0000 [BE] Mark unused range-loop vars with `C10_UNUSED` (#85383) I.e. replace: ``` for(const auto i: c10::irange(lim)) { (void)i; ... } ``` with ``` for(const auto i C10_UNUSED: c10::irange(lim)) { ... } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85383 Approved by: https://github.com/kit1980 commit 3122a96ee45507e8d33f265410222e69cc66677a Author: PyTorch MergeBot Date: Wed Sep 21 03:13:20 2022 +0000 Revert "Improve and expose cpp_backtrace to python binding (#84896)" This reverts commit 73fbca1ea6ecc08ae4455a12b68fc2ead93a088c. Reverted https://github.com/pytorch/pytorch/pull/84896 on behalf of https://github.com/kit1980 due to Broke libtorch and linux-binary-manywheel - https://hud.pytorch.org/pytorch/pytorch/commit/73fbca1ea6ecc08ae4455a12b68fc2ead93a088c commit 9fdd8a8b7f171be70ea3bd4724c38852ef292d73 Author: Driss Guessous Date: Wed Sep 21 03:09:08 2022 +0000 Exposing native _scaled_dot_product_attention to torch.nn (#85044) This exposes the _scaled_dot_product_attention function to python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the next few weeks and will eventually land as a prototype feature.
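As a rough, self-contained sketch of the computation the exposed op performs (the private signature is still in flux, so this spells out softmax(Q K^T / sqrt(d)) V using only public ops):

```python
import math
import torch
import torch.nn.functional as F

def sdpa_reference(q, k, v):
    # softmax(Q @ K^T / sqrt(d)) @ V -- the core of scaled dot-product
    # attention that the native op implements.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(2, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
print(sdpa_reference(q, k, v).shape)  # torch.Size([2, 4, 8, 16])
```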
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85044 Approved by: https://github.com/cpuhrsch commit d7029fea5113468441cb358bced6045e6e4d4b9a Author: Michael Gschwind Date: Wed Sep 21 02:07:13 2022 +0000 Remove TS compatibility transition code (#85003) Summary: Remove TS compatibility transition code Test Plan: sandcastle Differential Revision: D39494677 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85003 Approved by: https://github.com/erichan1 commit 73fbca1ea6ecc08ae4455a12b68fc2ead93a088c Author: Sherlock Huang Date: Tue Sep 20 20:49:22 2022 +0000 Improve and expose cpp_backtrace to python binding (#84896) We can now get cpp stack trace by calling torch.utils.get_cpp_backtrace() Sample output when calling from a torch_dispatch stack: ``` frame #23: torch::handle_torch_function_no_python_arg_parser(c10::ArrayRef, _object*, _object*, char const*, _object*, char const*, torch::TorchFunctionName) (0x7f69330bab90 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/utils/python_arg_parser.cpp:323) frame #24: (0x7f6932a09e79 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/python_variable.cpp:2252) frame #25: (0x7f69261aee33 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:56) frame #26: (0x7f69261afef9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:19) frame #27: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #28: (0x7f6926fae9b9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/boxing.h:227) frame #29: at::Tensor c10::Dispatcher::redispatch(c10::TypedOperatorHandle const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821f5 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:106) frame #30: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #31: (0x7f692ae4f8be in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1361) frame #32: (0x7f692ae4f9b1 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1362) frame #33: (0x7f692aef77e9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #34: (0x7f6926fae7d8 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:50) frame #35: at::Tensor c10::Dispatcher::redispatch(c10::TypedOperatorHandle const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821c9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:97) frame #36: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #37: (0x7f6929ec654a in /fsx/users/bahuang/repos/pytorch_fsx/build/aten/src/ATen/RedispatchFunctions.h:10697) frame #38: (0x7f6929d9edae in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2837) frame #39: (0x7f6929d9f043 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2838) frame #40: (0x7f6929e7d2f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #41: 
(0x7f6929eb1344 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:478) frame #42: (0x7f6929ea7b99 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:490) frame #43: (0x7f6929e7d370 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:563) frame #44: (0x7f6929e7d43a in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:239) frame #45: (0x7f6929e7d48c in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:364) frame #46: (0x7f6929e7d50a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:554) frame #47: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #48: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #49: c10::Dispatcher::redispatchBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f692603890a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:652) frame #50: (0x7f69260387f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:388) frame #51: (0x7f69261af0ef in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:96) frame #52: (0x7f69261aff2b in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:25) frame #53: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #54: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #55: c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const (0x7f6925fd6ab2 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:628) frame #56: (0x7f6925fd6690 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:376) frame #57: (0x7f692bf5b525 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:380) frame #58: (0x7f692bf59fac in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/runtime/register_c10_ops.cpp:15) frame #59: (0x7f692bf5af41 in /usr/include/c++/7/bits/std_function.h:316) frame #60: std::function >&)>::operator()(std::vector >&) const (0x7f6932ab9a0f in /usr/include/c++/7/bits/std_function.h:706) frame #61: (0x7f6932aad541 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/stack.h:41) frame #62: (0x7f6932ab3102 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1206 (discriminator 1)) frame #63: (0x7f6932ab3943 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1272) frame #64: (0x7f6932a46120 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/init.cpp:1767) frame #65: (0x7f6932a997be in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1441) frame #66: (0x7f6932a8a985 in 
/fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1410) frame #67: (0x7f6932a66e1e in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:249) frame #68: (0x7f6932a66ec2 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:224) frame #69: (0x7f6932473111 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:929) frame #104: __libc_start_main (0x7f693485dc87 in /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:310) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84896 Approved by: https://github.com/ezyang commit 52fd7e491b24d4cf910b98bbe06c460f7d8e5577 Author: Will Constable Date: Wed Sep 21 00:06:54 2022 +0000 Update torch.ops.aten.all ref to be symbolic-trace friendly (#85352) - previous impl compared the summed bool values of the tensor to its nelem, which in a symbolic world is a symint and can't be coerced back into a bool for the purpose of shoving into a result tensor - new impl adds one extra negation op but avoids the need to compare to the symbolic nelem Pull Request resolved: https://github.com/pytorch/pytorch/pull/85352 Approved by: https://github.com/ezyang, https://github.com/mruberry commit f6a18d3d373f0391c722f6159a1dda23da556ab4 Author: Scott Wolchok Date: Mon Sep 19 15:41:33 2022 -0700 [PyTorch] StorageImpl: cache size_bytes.is_symbolic() (#85309) We've got 6 bools' worth of extra space, so let's try caching this. Differential Revision: [D39636570](https://our.internmc.facebook.com/intern/diff/D39636570/) **NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39636570/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/85309 Approved by: https://github.com/ezyang commit 90fa744c09fdbf8a2e7747ea82823714b0ee7e04 Author: Horace He Date: Tue Sep 20 18:07:46 2022 +0000 Fixed memory issues in linalg_lstsq (#85357) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85357 Approved by: https://github.com/ezyang, https://github.com/IvanYashchuk commit cb8e73bb71ffb64ff0ef0f6e9fbe6ef7dfcbc307 Author: Vasiliy Kuznetsov Date: Tue Sep 20 09:37:54 2022 -0700 fx quant: fix bug in custom module test (#85344) Summary: `TestQuantizeFx.test_custom_module_class` was subtly broken because the various parts of the test case were modifying the original model. This seems incorrect because `prepare_fx` and `convert_fx` are inplace. To fix this, we can `copy.deepcopy` the model before applying the test cases to it. This test case was triggered by an unrelated refactor, splitting the fix in a separate diff to keep the refator clean. Test plan: ``` python test/test_quantization.py TestQuantizeFx.test_custom_module_class ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85344 Approved by: https://github.com/dzdang, https://github.com/z-a-f, https://github.com/jerryzh168 commit 62786a09d334498c872f2c74e814ce56b27092ae Author: Matthew LeMay Date: Tue Sep 20 19:11:28 2022 +0000 Fix indentation in functorch limitations docs (#85346) Fixes a minor indentation error in the `functorch` UX limitations documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85346 Approved by: https://github.com/kit1980 commit e1f634753c2606ddf1d9e1ef611a882928ce415c Author: Edward Z. 
Yang Date: Mon Sep 19 14:42:46 2022 -0700 Setup fake tensor and symbolic shapes once at beginning of AOTAutograd (#85233) Signed-off-by: Edward Z. Yang Differential Revision: [D39662822](https://our.internmc.facebook.com/intern/diff/D39662822) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85233 Approved by: https://github.com/wconstab commit 5f623f5c4c759cada9f0dc3866b63c906178dbc1 Author: Edward Z. Yang Date: Mon Sep 19 13:29:47 2022 -0700 Correctly handle duplicate arguments to AOTAutograd (#85301) If we do not deduplicate them, our custom autograd function will double-count the gradient computed for the variable (since the same x.grad field will be embedded into the graph twice.) The alternative is to destroy aliasing relationships the inputs and trace accumulating individual gradients for each of the inputs into separate grad fields, but this prevents resizing of inputs inside the traced graph from working correctly. In principle, you could detach the inputs, allow metadata changes on them, and then reflect metadata changes to the originals as necessary (in fact, we should do this for other reasons), but AOTAutograd doesn't do this yet. Another alternative is to have dynamo guarantee not to give duplicate tensor inputs, but because AOTAutograd is public API, we are obligated to still handle it correctly here in case a direct user passes duplicate inputs. Signed-off-by: Edward Z. Yang Differential Revision: [D39662821](https://our.internmc.facebook.com/intern/diff/D39662821) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85301 Approved by: https://github.com/Chillee, https://github.com/albanD commit b9b27f7664c2da80458a21d799eb737cfd2490df Author: jjsjann123 Date: Tue Sep 20 18:52:02 2022 +0000 Added `Tensor.to` overloads to `torch._refs.to` (#84802) Fixes #84264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84802 Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry commit d3dec8097b847fc46755ef06ea6ff90eebc846eb Author: kshitij12345 Date: Tue Sep 20 18:18:39 2022 +0000 [fix] composite compliance: cumprod, _masked.cumprod, linalg.vander (#85330) Ref: #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85330 Approved by: https://github.com/zou3519 commit e24e17916fff4ea9959d23471fb3b8b05f2720dd Author: Edward Z. Yang Date: Tue Sep 20 13:53:08 2022 -0400 Remove errant semicolon (#85356) Signed-off-by: Edward Z. Yang Differential Revision: [D39662630](https://our.internmc.facebook.com/intern/diff/D39662630) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85356 Approved by: https://github.com/wconstab, https://github.com/Krovatkin, https://github.com/voznesenskym commit a3afb2c2f603f100e18e8aae9e9bfee2d1c67a4a Author: Elias Ellison Date: Tue Sep 20 15:24:19 2022 +0000 Fake: fix conv_transpose2d striding (#82846) The output striding channels-last preservation logic differs between cuda and cpu. For the meta kernel, we can peek at the fake tensor device and use that to determine whether to do cpu or cuda. You could argue there's a leaking of abstraction here but this seems like a pretty minimal leak and I'm not sure there's a much cleaner way forward for device-specific striding tracing logic. 
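For the duplicate-argument AOTAutograd commit above (#85301), a small illustration of the double-counting it guards against: when the same tensor is passed for two inputs, both contributions accumulate into the single `x.grad` field, so a traced graph that keeps two separate input slots must deduplicate them to match eager behavior.

```python
import torch

def f(a, b):
    # Per element: d/da = 2, d/db = 3.
    return (2 * a + 3 * b).sum()

x = torch.ones(3, requires_grad=True)
f(x, x).backward()
# Both argument slots alias the same leaf, so the gradients accumulate:
print(x.grad)  # tensor([5., 5., 5.])
```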
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82846 Approved by: https://github.com/ezyang commit e1ed485c65d68787971a2dfb1ace1ed66f5a4d5b Author: Kulin Seth Date: Tue Sep 20 17:53:43 2022 +0000 [MPS] Handle reduction of scalars in edge-cases (#83743) The issue was found as part of fixing Test consistency issues. Test case coming up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83743 Approved by: https://github.com/razarmehr, https://github.com/malfet commit d17b144e6564f10f55af639fbf2dd82eaacdfa3e Author: lezcano Date: Tue Sep 20 12:40:28 2022 +0000 Adding multigammaln ref and fix arange (#85153) Partially based on https://github.com/pytorch/pytorch/pull/83662. I'll help land this one, as Rob does not work in the PyTorch project anymore I removed the data-dependent check for the args, as data dependencies are bad for many reasons (and it was failing when the input has NaNs). It also registers arange as a decomposition, and fixes the naming of its args. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85153 Approved by: https://github.com/mruberry, https://github.com/ngimel commit 7a6c4d0c50dd0670d87bc39d53292cf8cb90ca04 Author: Masaki Kozuki Date: Tue Sep 20 17:18:33 2022 +0000 [mta] APEX style Fused Adam (#81705) This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel. related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436 cc @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705 Approved by: https://github.com/ngimel commit 00a1065286ef9b425cfe1d74d76b7ab5555eee83 Author: Vasu Agrawal Date: Tue Sep 20 17:15:59 2022 +0000 [pytorch] Inline std::forward definition (#85255) Summary: Alternative (probably better) solution to the problem laid out in D39562394. Test Plan: CI should be green. Differential Revision: D39612710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85255 Approved by: https://github.com/ezyang commit 9c1a6a522d5dddc9db2ce728dd751ad5a7fb577e Author: Sherlock Huang Date: Tue Sep 20 06:30:57 2022 +0000 Make ones and zeros's ref accepts variadic size argument (#85117) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85117 Approved by: https://github.com/ngimel, https://github.com/lezcano commit 5e8f16b8779775dc2a29d20de6827a0f25fb0df6 Author: Will Constable Date: Tue Sep 20 16:36:48 2022 +0000 Fix fake_tensor to_copy meta dispatch (#85337) Previously, no_dispatch() was causing us to hit real kernels (well, real decomps and prims) for to_copy when we were operating on FakeTensors. This change helps us hit meta kernels and seems to pass the relevant tests. I still have questions about why this line has to call .to("meta") input = new_kwargs.pop("input").to("meta") But that can wait for another PR. 
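A minimal sketch of the behavior the fix above (#85337) targets, assuming the `FakeTensorMode` entry point in `torch._subclasses`: under the mode, `.to(...)`-style copies should be served by meta kernels, propagating shape and dtype without allocating or copying real data.

```python
import torch
from torch._subclasses import FakeTensorMode  # assumed import path

with FakeTensorMode():
    x = torch.empty(4, 4)       # fake tensor: metadata only, no real storage
    y = x.to(torch.float64)     # should hit the meta kernel for _to_copy
    print(type(y).__name__, y.dtype, y.shape)
```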
Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85337 Approved by: https://github.com/eellison commit 4012e623e8689b873cae94590766d990d155017c Author: eqy Date: Tue Sep 20 16:31:54 2022 +0000 [CUBLAS][CUDA GRAPHS] (re-open of #83461) Explicitly set the workspace for cuBLAS handles (#85292) re-open of #83461 with fix for 10.2 build CC @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/85292 Approved by: https://github.com/malfet commit 39f482acdf959b2b22a310381a5a3d3bcf9699b9 Author: Kevin Stephano Date: Tue Sep 20 16:10:05 2022 +0000 Add a reset() method to nvFuser FusionCache to enable proper resetting during tests. (#85319) Fixes issue Jie found in his PR: https://github.com/pytorch/pytorch/pull/84626#issuecomment-1250745334 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85319 Approved by: https://github.com/jjsjann123 commit 86d8c61c7c122ede1628f967277073231f92c744 Author: Benoit Steiner Date: Tue Sep 20 15:38:58 2022 +0000 Revert D39583438: Multisect successfully blamed D39583438 for test or build failures (#85277) Summary: This diff is reverting D39583438 D39583438 has been identified to be causing the following test or build failures: Tests affected: - https://www.internalfb.com/intern/test/281475048999851/ Here's the Multisect link: https://www.internalfb.com/intern/testinfra/multisect/1260522 Here are the tasks that are relevant to this breakage: T124797105: 18 tests started failing for employee benoitsteiner in the last 2 weeks We're generating a revert to back out the changes in this diff, please note the backout may land if someone accepts it. Test Plan: NA Reviewed By: benoitsteiner Differential Revision: D39599694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85277 Approved by: https://github.com/dagitses commit cf2f552cd8a41f4913c370c15804173a3b56a415 Author: anjali411 Date: Tue Sep 20 10:18:31 2022 +0000 Add __all__ to torch.{fx, distributed, backends} submodules (#85079) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85079 Approved by: https://github.com/rohan-varma commit a4dca9822dfabcdbd1b36a12c013764f2af87613 Author: kshitij12345 Date: Tue Sep 20 08:03:36 2022 +0000 [composite compliance] prod (#81969) Ref: #69991 Also fixes #82644 (fix similar to #81617) For CompositeCompliance, we can't use `item` to choose a special fast-path when Tensor is a Subclass. Instead we always dispatch to the slower but safer implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81969 Approved by: https://github.com/zou3519 commit 077db3de92f34cff3187b61de4b18900a927b3fd Author: Kulin Seth Date: Tue Sep 20 06:19:40 2022 +0000 [MPS] Fix conv1d backwards crash for channels last case (#85283) Fixes pytorch#84511 Use the same logic as in the forward pass for the backward pass (in case of channels last, manually set the shape to NHWC) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85283 Approved by: https://github.com/malfet, https://github.com/razarmehr commit bcdef58a55665bba959a773633aa5b1e5758d94e Author: Kulin Seth Date: Tue Sep 20 06:00:58 2022 +0000 [MPS] Fix the crash in bitwise ops on x86 platforms. 
(#85285) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85285 Approved by: https://github.com/razarmehr, https://github.com/malfet commit 6c48a01cef029075d3787acab18d2a4e32b2cb4c Author: Xia, Weiwen Date: Tue Sep 20 05:33:26 2022 +0000 [Quant] Improve performance of ONEDNN backend (#84470) This PR improves performance of ONEDNN quantization backend by adding fast paths. For qconv, qconv_transpose and qlinear. It uses a cache to store reusable data on the first run thus reducing runtime overhead afterwards. Note: Other quantization backends not affected. **Correctness**: Covered by UT **Performance**: (Time to run each op, in microseconds) Convolution, 1 core per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1_ic128oc128_id2od2kd3sd1dd0pd1 _ih8oh8kh3sh1dh0ph1_iw10ow10kw3sw1dw0pw1 | 767.038 | 415.238 | 45.86% mb1_ic256oc128_id4od4kd1sd1dd0pd0 _ih16oh16kh1sh1dh0ph0_iw20ow20kw1sw1dw0pw0 | 194.979 | 167.353 | 14.17% mb1_ic32oc16_ih112oh112kh1sh1dh0ph0 _iw112ow112kw1sw1dw0pw0 | 104.024 | 78.206 | 24.82% mb1_ic3oc16_ih224oh112kh3sh2dh0ph1 _iw224ow112kw3sw2dw0pw1 | 104.178 | 81.559 | 21.71% mb30_ic256oc256_ih14oh14kh3sh1dh0ph1 _iw14ow14kw3sw1dw0pw1 | 12249.438 | 12079.125 | 1.39% mb56_ic3oc28_ih24oh22kh3sh1dh0ph0 _iw24ow22kw3sw1dw0pw0 | 438.046 | 405.21 | 7.50% mb100_ic128oc128_ih16oh16kh3sh1dh0ph1 _iw16ow16kw3sw1dw0pw1 | 13893.188 | 13797.609 | 0.69% g2mb1_ic128oc256_ih28oh28kh3sh1dh0ph1 _iw28ow28kw3sw1dw0pw1 | 499.014 | 475.333 | 4.75% g32mb1_ic1024oc1024_ih14oh14kh3sh1dh0ph1 _iw14ow14kw3sw1dw0pw1 | 294.877 | 270.568 | 8.24% g64mb1_ic1024oc2048_ih14oh7kh3sh2dh0ph1 _iw14ow7kw3sw2dw0pw1 | 122.664 | 95.503 | 22.14% g256mb1_ic256oc256_ih10oh5kh3sh2dh0ph1 _iw10ow5kw3sw2dw0pw1 | 31.343 | 13.522 | 56.86% g512mb1_ic512oc512_ih19oh10kh3sh2dh0ph1 _iw19ow10kw3sw2dw0pw1 | 54.116 | 34.651 | 35.97% g1024mb1_ic1024oc1024_ih10oh10kh3sh1dh0ph1 _iw10ow10kw3sw1dw0pw1 | 74.989 | 55.566 | 25.90% Convolution, 4 cores per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1_ic128oc128_id2od2kd3sd1dd0pd1 _ih8oh8kh3sh1dh0ph1_iw10ow10kw3sw1dw0pw1 | 249.978 | 160.429 | 35.82% mb1_ic256oc128_id4od4kd1sd1dd0pd0 _ih16oh16kh1sh1dh0ph0_iw20ow20kw1sw1dw0pw0 | 102.726 | 89.555 | 12.82% mb1_ic32oc16_ih112oh112kh1sh1dh0ph0 _iw112ow112kw1sw1dw0pw0 | 72.993 | 57.622 | 21.06% mb1_ic3oc16_ih224oh112kh3sh2dh0ph1 _iw224ow112kw3sw2dw0pw1 | 76.607 | 61.847 | 19.27% mb30_ic256oc256_ih14oh14kh3sh1dh0ph1 _iw14ow14kw3sw1dw0pw1 | 3109.625 | 3006.062 | 3.33% mb56_ic3oc28_ih24oh22kh3sh1dh0ph0 _iw24ow22kw3sw1dw0pw0 | 191.194 | 175.997 | 7.95% mb100_ic128oc128_ih16oh16kh3sh1dh0ph1 _iw16ow16kw3sw1dw0pw1 | 3435.625 | 3391.438 | 1.29% g2mb1_ic128oc256_ih28oh28kh3sh1dh0ph1 _iw28ow28kw3sw1dw0pw1 | 205.209 | 191.931 | 6.47% g32mb1_ic1024oc1024_ih14oh14kh3sh1dh0ph1 _iw14ow14kw3sw1dw0pw1 | 157.004 | 142.82 | 9.03% g64mb1_ic1024oc2048_ih14oh7kh3sh2dh0ph1 _iw14ow7kw3sw2dw0pw1 | 83.262 | 71.689 | 13.90% g256mb1_ic256oc256_ih10oh5kh3sh2dh0ph1 _iw10ow5kw3sw2dw0pw1 | 31.848 | 13.378 | 57.99% g512mb1_ic512oc512_ih19oh10kh3sh2dh0ph1 _iw19ow10kw3sw2dw0pw1 | 50.216 | 32.663 | 34.95% g1024mb1_ic1024oc1024_ih10oh10kh3sh1dh0ph1 _iw10ow10kw3sw1dw0pw1 | 67.359 | 49.779 | 26.10% Transposed Convolution, 1 core per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1_ic512oc256_ih4oh8kh4sh2dh0ph1_iw4ow8kw4sw2dw0pw1 | 537.12 | 505.142 | 5.95% 
mb1_ic256oc128_ih8oh16kh4sh2dh0ph1_iw8ow16kw4sw2dw0pw1 | 296.95 | 275.724 | 7.15% mb1_ic128oc64_ih16oh32kh4sh2dh0ph1_iw16ow32kw4sw2dw0pw1 | 266.933 | 251.175 | 5.90% mb1_ic64oc3_ih32oh64kh4sh2dh0ph1_iw32ow64kw4sw2dw0pw1 | 141.77 | 126.41 | 10.83% mb1_ic100oc512_ih1oh4kh4sh1dh0ph0_iw1ow4kw4sw1dw0pw0 | 89.511 | 66.719 | 25.46% Transposed Convolution, 4 cores per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1_ic512oc256_ih4oh8kh4sh2dh0ph1 _iw4ow8kw4sw2dw0pw1 | 181.594 | 163.77 | 9.82% mb1_ic256oc128_ih8oh16kh4sh2dh0ph1 _iw8ow16kw4sw2dw0pw1 | 163 | 145.104 | 10.98% mb1_ic128oc64_ih16oh32kh4sh2dh0ph1 _iw16ow32kw4sw2dw0pw1 | 163.158 | 150.71 | 7.63% mb1_ic64oc3_ih32oh64kh4sh2dh0ph1 _iw32ow64kw4sw2dw0pw1 | 109.955 | 98.603 | 10.32% mb1_ic100oc512_ih1oh4kh4sh1dh0ph0 _iw1ow4kw4sw1dw0pw0 | 69.502 | 54.523 | 21.55% Linear, 1 core per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1ic16oc8 | 54.415 | 35.816 | 34.18% mb1ic32oc16 | 26.764 | 16.041 | 40.07% mb1ic64oc32 | 26.735 | 16.007 | 40.13% mb1ic100oc1 | 26.712 | 16.06 | 39.88% mb1ic512oc1000 | 65.261 | 51.947 | 20.40% mb1ic1024oc1000 | 112.603 | 98.175 | 12.81% mb1ic2048oc1000 | 207.294 | 192.014 | 7.37% mb1ic4096oc4096 | 3761.094 | 3745.609 | 0.41% mb1ic9216oc4096 | 8918.672 | 8912.547 | 0.07% mb20ic2048oc91 | 52.487 | 44.623 | 14.98% mb30ic512oc37 | 29.257 | 19.642 | 32.86% mb100ic128oc256 | 39.32 | 29.81 | 24.19% mb100ic256oc512 | 74.499 | 64.322 | 13.66% mb100ic512oc1024 | 220.029 | 204.745 | 6.95% mb100ic1024oc784 | 352.311 | 336.309 | 4.54% Linear, 4 cores per instance, multiple instances on whole socket shape | onednn (old) | onednn (new) | Improvement -- | -- | -- | -- mb1ic16oc8 | 58.252 | 40.433 | 30.59% mb1ic32oc16 | 23.901 | 15.549 | 34.94% mb1ic64oc32 | 24.594 | 16.214 | 34.07% mb1ic100oc1 | 24.011 | 15.4 | 35.86% mb1ic512oc1000 | 49.781 | 41.988 | 15.65% mb1ic1024oc1000 | 70.304 | 61.88 | 11.98% mb1ic2048oc1000 | 92.259 | 85.715 | 7.09% mb1ic4096oc4096 | 794.937 | 781.137 | 1.74% mb1ic9216oc4096 | 2081.375 | 2067.75 | 0.65% mb20ic2048oc91 | 66.929 | 58.338 | 12.84% mb30ic512oc37 | 35.332 | 26.337 | 25.46% mb100ic128oc256 | 42.21 | 38.908 | 7.82% mb100ic256oc512 | 66.49 | 63.967 | 3.79% mb100ic512oc1024 | 130.828 | 122.673 | 6.23% mb100ic1024oc784 | 160.987 | 154.765 | 3.86% Environment: - PyTorch version: 1.13.0a0+gitcdd625b - Is debug build: False - CUDA used to build PyTorch: None - ROCM used to build PyTorch: N/A - OS: Ubuntu 20.04.3 LTS (x86_64) - GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 - Clang version: Could not collect - CMake version: version 3.22.5 - Libc version: glibc-2.31 - Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime) - Python platform: Linux-5.11.0-27-generic-x86_64-with-glibc2.31 - Is CUDA available: False - CUDA runtime version: No CUDA - GPU models and configuration: No CUDA - Nvidia driver version: No CUDA - cuDNN version: No CUDA - HIP runtime version: N/A - MIOpen runtime version: N/A - Is XNNPACK available: True Versions of relevant libraries: - [pip3] intel-extension-for-pytorch==1.13.0+cpu - [pip3] numpy==1.23.3 - [pip3] pytorch-widedeep==0.3.7 - [pip3] torch==1.13.0a0+git48b423b - [pip3] torchvision==0.14.0a0+ebb68f3 - [conda] blas 1.0 mkl - [conda] intel-extension-for-pytorch 1.13.0+cpu pypi_0 pypi - [conda] mkl 2021.4.0 h06a4308_640 - [conda] mkl-include 2022.1.0 pypi_0 pypi - [conda] mkl-service 2.4.0 py39h7f8727e_0 - [conda] 
mkl-static 2022.1.0 pypi_0 pypi - [conda] mkl_fft 1.3.1 py39hd3c417c_0 - [conda] mkl_random 1.2.2 py39h51133e4_0 - [conda] numpy 1.23.3 pypi_0 pypi - [conda] numpy-base 1.22.3 py39hf524024_0 - [conda] torch 1.13.0a0+git48b423b pypi_0 pypi - [conda] torchvision 0.14.0a0+ebb68f3 pypi_0 pypi Pull Request resolved: https://github.com/pytorch/pytorch/pull/84470 Approved by: https://github.com/jerryzh168 commit 35088f283e5a93c6775e65e19d34093bdfb101e1 Author: PyTorch MergeBot Date: Tue Sep 20 03:42:43 2022 +0000 Revert "Python stack tracing OD flow (part 1) (#84362)" This reverts commit 1f4f05e59c4cd72dfff9755629f7cc23f8df7abe. Reverted https://github.com/pytorch/pytorch/pull/84362 on behalf of https://github.com/malfet due to Broke CUDA-10.2 tests, see https://hud.pytorch.org/pytorch/pytorch/commit/1f4f05e59c4cd72dfff9755629f7cc23f8df7abe commit 8c7e20976e227f0cf85ccd742878c42f2d1c927d Author: PyTorch MergeBot Date: Tue Sep 20 03:04:44 2022 +0000 [vision hash update] update the pinned vision hash (#85315) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85315 Approved by: https://github.com/pytorchbot commit c05ca0dbf286d94a0575b8a037410dca200a523d Author: Nikita Shulga Date: Tue Sep 20 01:49:04 2022 +0000 [torch.futures] Fix nullptr deref (#85304) `torch.jit.wait(None)` and `torch.futures.collect_all((None,))` should not crash. Fixes https://github.com/pytorch/pytorch/issues/85237 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85304 Approved by: https://github.com/kit1980 commit 66907e7262da6d6ef3d471e4c90a1c48a8f39a76 Author: Richard Zou Date: Mon Sep 19 14:29:41 2022 -0700 [functorch] Fix dangling impls (#85299) Our dangling impls were: - positive_ (the in-place op just never existed) - unique (something happened to this op, maybe it was renamed) Test Plan: - `import functorch; torch._C._dispatch_find_dangling_impls` - It's difficult to write a test for this because the number of dangling impls depends on if `test_dispatch` has been run already or not (test_dispatch adds a dangling impl) - Can't remove the torchdynamo skip for this yet either Pull Request resolved: https://github.com/pytorch/pytorch/pull/85299 Approved by: https://github.com/ezyang commit 53fdd60635710a7a9f1c2a3eb1115f51b1247e94 Author: PyTorch MergeBot Date: Tue Sep 20 00:13:41 2022 +0000 Revert "Reduce memory usage requirement of `test_warp_softmax_64bit_indexing` in `test_nn.py` (#85037)" This reverts commit 66a9cba221ac32658ea837e88b68b859a08378d0. Reverted https://github.com/pytorch/pytorch/pull/85037 on behalf of https://github.com/clee2000 due to broke test_warp_softmax_64bit_indexing_cuda_float32 and test_warp_softmax_64bit_indexing_cuda_float16 on rocm https://github.com/pytorch/pytorch/actions/runs/3085764744/jobs/4989643817 commit a998a8eb1027b0e162fd6789efb378edc66d37e9 Author: Natalia Gimelshein Date: Mon Sep 19 22:00:41 2022 +0000 Fix segfault for `out` with a large number of dims (#85294) Fixes #85166, #85167, #79218, #85251 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85294 Approved by: https://github.com/malfet commit d9024ea284e925a5f6e0fd35151d682a70e288cf Author: Richard Zou Date: Mon Sep 19 21:49:18 2022 +0000 Setup torch/csrc/functorch/*; move CompileCache.{h, cpp} there (#85263) The plan for functorch C++ is: - all C++-only code goes into aten/functorch. 
- any C++ code with a python dependency goes into torch/csrc/functorch. This will include the functorch Python bindings as well as all of torchdim. Alternative: - we could split it so that code goes into torch/csrc/functorch/nopython and torch/csrc/functorch/python instead of putting anything into ATen. This just feels like a matter of cosmetics. This PR also does two more things: - fix a windows lint error regarding PyLong_asLong - clang-format the code (because the linter got triggered) Test Plan: - run tests - check internal build Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85263 Approved by: https://github.com/ezyang commit 1f4f05e59c4cd72dfff9755629f7cc23f8df7abe Author: Seonglyong Gong Date: Mon Sep 19 21:33:55 2022 +0000 Python stack tracing OD flow (part 1) (#84362) Summary: submodule update Test Plan: CI Differential Revision: D39176686 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84362 Approved by: https://github.com/robieta commit 66a9cba221ac32658ea837e88b68b859a08378d0 Author: eqy Date: Mon Sep 19 21:31:08 2022 +0000 Reduce memory usage requirement of `test_warp_softmax_64bit_indexing` in `test_nn.py` (#85037) For reference: #84944 CC @xwang233 @ngimel @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85037 Approved by: https://github.com/ngimel, https://github.com/pmeier commit e41d758e26bd2de00e9dd50e94e878f46f9f1b88 Author: Thomas Viehmann Date: Mon Sep 19 21:20:34 2022 +0000 Handle implicit real->complex casting for backward of stack (#84993) Fixes: #75852 P.S.: Yay for the PyTorch foundation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84993 Approved by: https://github.com/soulitzer commit cd7408e9505e3a7ae00e72a69ab17389ce086475 Author: Michael Voznesensky Date: Mon Sep 19 20:48:09 2022 +0000 Add aten _assert_tensor_metadata op (#84617) Example: ``` graph(): %arg0 : [#users=3] = placeholder[target=arg0] %arg_guard_equality_check : [#users=1] = call_function[target=torch._tensor_equal](args = (%arg0, (1, 1, 2), (2, 2, 1), torch.float32), kwargs = {}) %_assert_true : [#users=0] = call_function[target=torch._assert_true](args = (%arg_guard_equality_check, Guard evaluation failed equality check for arg0), kwargs = {}) %add : [#users=1] = call_function[target=operator.add](args = (%arg0, 1), kwargs = {}) return ([arg0, arg0], (add, add)) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84617 Approved by: https://github.com/jansel commit 6ed90379a848e1ff1422fb906253e38683c25c90 Author: PyTorch MergeBot Date: Mon Sep 19 20:34:08 2022 +0000 Revert "Legalize BFloat16 in NNC. (#83988)" This reverts commit b049493ed52292c344c5b17f6db16a0242419865. Reverted https://github.com/pytorch/pytorch/pull/83988 on behalf of https://github.com/clee2000 due to broke slow tests in trunk, https://github.com/pytorch/pytorch/actions/runs/3084421000/jobs/4986706931 commit 1456cca1fc31a16a5e7a6248d28ebfa80dae8db0 Author: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Date: Mon Sep 19 20:21:46 2022 +0000 Fix exception handling, improve overheads and avoid constructing storage for element size (#84612) These changes were proposed by @MatthiasKohl in #84271 and #84542 that fix #84267 and #84056 respectively. The reason I am creating the pull request is CLA check (see original PRs). 
cc @ptrblck @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/84612 Approved by: https://github.com/ngimel commit cbe5469e88db19f1683efcc49f3114a82ea58e32 Author: jiahongyu Date: Mon Sep 19 19:43:14 2022 +0000 [PolishComment] Polish code comment, revelant->relevant (#85238) Polish code comment, `revelant`->`relevant` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85238 Approved by: https://github.com/kit1980 commit 8c952db13ae6c634da2f1e42c1b053d0ad40003e Author: Ivan Yashchuk Date: Mon Sep 19 19:31:16 2022 +0000 Fix segfault case for torch.ormqr (#85278) Correct behavior is to raise an error for `tau.size[-1] > input.size[-1]`. Fixes https://github.com/pytorch/pytorch/issues/85218 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85278 Approved by: https://github.com/Lezcano, https://github.com/malfet, https://github.com/ngimel commit 555bb6cdb8ca82fb298b3fe6b017c59255b08621 Author: Thytu Date: Mon Sep 19 18:49:07 2022 +0000 Check that groups is > 0 in _convolution op (#85111) (#85248) `_convolution` will raise an error if it is called with groups <= 0 Signed-off-by: Thytu Fixes #85111 Side note : If I need to do it elsewhere, let me know 🙂 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85248 Approved by: https://github.com/Lezcano, https://github.com/malfet commit 7234eb06f73f0e2c0aaa02727aee4afb5300ff1a Author: PyTorch MergeBot Date: Mon Sep 19 18:46:35 2022 +0000 Revert "Land "Make ceil,floor,round,trunc handle integers" (#85144)" This reverts commit b27eb8d377fc8ac267fdaed7f95a03d609764604. Reverted https://github.com/pytorch/pytorch/pull/85144 on behalf of https://github.com/clee2000 due to broke slow tests in trunk ex https://ossci-raw-job-status.s3.amazonaws.com/log/8433956087 commit f0b06c64c8d169f41e025a76390efd89e3cdcd99 Author: Pearu Peterson Date: Mon Sep 19 10:13:41 2022 +0300 Fix bugs in sparse compressed tensor shape and device inference (#85240) Fixes #84999 This PR - uses device option to set sparse compressed tensor instance device - enables shape and device inference tests that was disabled due to an oversight - fixes a bug in shape inference of hybrid tensors - fixes a bug in to_sparse_bsr of a cuda tensor - updates tests that catch the above bugs Pull Request resolved: https://github.com/pytorch/pytorch/pull/85240 Approved by: https://github.com/cpuhrsch commit 6a18616296ab9b467a0437bd7248523860c3babc Author: Edward Z. Yang Date: Sat Sep 17 10:52:14 2022 -0700 Support for sym_strides() in backwards formulas (#85210) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85210 Approved by: https://github.com/Chillee, https://github.com/voznesenskym commit f38f9dfbfae27f255e83791670890c4383be98da Author: Edward Z. Yang Date: Mon Sep 19 07:49:42 2022 -0700 When tracing SymInts, peephole optimize multiply by one (#85261) This shows up a lot in graphs, so it is nice to not bother recording useless info. On pytorch_BERT, this optimization doesn't seem to speed anything up, so it's mostly for cleanliness. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85261 Approved by: https://github.com/wconstab commit ebf45a07858f8c07fd5574ea981d50d653fb0c4b Author: Wu, Chunyuan Date: Mon Sep 19 17:45:20 2022 +0000 [NNC] support aten::_convolution when it is 2D conv (#84038) Currently, only `aten::conv2d` has been supported in NNC. When using `torch.jit.trace`, the node on the graph will be `aten::_convolution`. 
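For illustration, a minimal hedged sketch of the tracing behaviour described above (not part of the PR itself): tracing a plain Conv2d module and printing its graph shows which convolution node NNC has to handle.
```python
# Minimal sketch: inspect the TorchScript graph produced by torch.jit.trace for a
# 2D convolution. Per the commit message, the traced graph carries aten::_convolution
# rather than aten::conv2d, which is why NNC needs to recognize that node.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
traced = torch.jit.trace(conv, torch.randn(1, 3, 32, 32))
print(traced.graph)  # look for the convolution node kind in the printed graph
```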
This PR adds support for the `aten::_convolution` node in NNC when we can infer from its parameters that it corresponds to a 2D convolution, so that models obtained from `torch.jit.trace` are also covered. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84038 Approved by: https://github.com/huiguoo commit b049493ed52292c344c5b17f6db16a0242419865 Author: Wang, Eikan Date: Fri Sep 16 07:54:16 2022 +0000 Legalize BFloat16 in NNC. (#83988) Regarding BF16 support in NNC, we always convert the BF16 to FP32 and then compute with FP32. After the FP32 computation, we convert the FP32 result to BF16. This logic has been supported in [half_support.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/half_support.h). But the BF16/FP32 conversion has not been supported by LLVM. Currently, LLVM only supports BF16 in its front end but still cannot generate the assembly code. So we implement this feature in [llvm_codegen](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/llvm_codegen.cpp) like the aten implementation. This PR contains three parts: take BF16 as uint16, convert BF16 to FP32, and convert FP32 to BF16.
- Take BF16 as uint16 - [PR Change](https://github.com/pytorch/pytorch/pull/83988/files#diff-9d09ca2fce1c689ab43cd71795ab9b8b63447de950cf98ae8a18114e18d87e79R544-R546) We cannot naively map the BF16 to LLVM's BF16 as the LLVM backend still does not support this data type, as mentioned. Meanwhile, the BF16 in PyTorch is a [structure](https://github.com/pytorch/pytorch/blob/master/c10/util/BFloat16.h#L73) whose real data is a uint16, so we also bitcast the BF16 tensor to uint16.
- BF16 to FP32 conversion - [PR Change](https://github.com/pytorch/pytorch/pull/83988/files#diff-9d09ca2fce1c689ab43cd71795ab9b8b63447de950cf98ae8a18114e18d87e79R1057-R1061) We just need to shift the BF16 value left by 16 bits and then bitcast the shifted value to FP32.
- FP32 to BF16 conversion - [PR Change](https://github.com/pytorch/pytorch/pull/83988/files#diff-9d09ca2fce1c689ab43cd71795ab9b8b63447de950cf98ae8a18114e18d87e79R1066-R1084) This conversion is more involved because we implement it with round-to-nearest-even (RNE). The RNE conversion from FP32 to BF16 is as follows.
```C++
uint16_t round_to_nearest_even(float src) {
  if (std::isnan(src)) {
    // canonical BF16 NaN
    return UINT16_C(0x7FC0);
  } else {
    // reinterpret the FP32 bits as uint32 without changing them
    union {
      uint32_t U32;
      float F32;
    };
    F32 = src;
    // round to nearest even: bias by 0x7FFF plus the lowest kept bit, then truncate
    uint32_t rounding_bias = ((U32 >> 16) & 1) + UINT32_C(0x7FFF);
    return static_cast<uint16_t>((U32 + rounding_bias) >> 16);
  }
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83988 Approved by: https://github.com/ZolotukhinM, https://github.com/frank-wei commit b27eb8d377fc8ac267fdaed7f95a03d609764604 Author: lezcano Date: Mon Sep 19 14:58:38 2022 +0000 Land "Make ceil,floor,round,trunc handle integers" (#85144) PR to land https://github.com/pytorch/pytorch/pull/78480, as Rohit does not work in the PyTorch project anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/85144 Approved by: https://github.com/ngimel, https://github.com/mruberry commit cd32a86bf2b7bdb928dd02fe7954c5852be8c27a Author: Richard Zou Date: Mon Sep 19 06:53:14 2022 -0700 Stop monkeypatching Tensor.backward() on `import functorch` (#85152) Monkeypatching is bad, we should never be doing it. This PR removes functorch's monkeypatching on Tensor.backward() by adding it directly to the implementation of Tensor.backward().
As an alternative, we could have done an `import functorch` and used `functorch._C.are_transforms_active` directly in `torch/autograd/__init__.py`. The problem with that is that it runs into a bunch of circular imports. NB: https://github.com/pytorch/pytorch/issues/72179 is still on my mind. I didn't choose to do it right now because: - This PR doesn't make the situation worse than it already is (no monkeypatching is better than having the monkeypatch) - We don't have a design for #72179 yet. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85152 Approved by: https://github.com/soulitzer commit 5ce56d9377914d3c273c0bce037b2443bfe6c21b Author: Richard Zou Date: Mon Sep 19 06:53:13 2022 -0700 Stop loading jit decomps in eager_transforms.py (#85151) They're already loaded in `torch/__init__.py` Test Plan: - functorch tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85151 Approved by: https://github.com/samdow, https://github.com/soulitzer commit 6fd8e28a993c578cbdc1e78d71fc2ef71682b165 Author: Antonio Kim Date: Mon Sep 19 15:41:19 2022 +0000 Make addmm meta kernel consistent with mm (#84960) Change the names of the parameters in the `addmm` meta kernel to be more consistent with `mm`. Functionally, the only difference in behaviour should be that `addmm` meta kernel gets its options from the `input` tensor instead of from the `bias` parameter. Fixes #84930 CC: @ezyang @ngimel @wconstab @ke1337 @glebk-cerebras Pull Request resolved: https://github.com/pytorch/pytorch/pull/84960 Approved by: https://github.com/ezyang commit 3a51b557efa0d1959210b96009b7153dbeb2e2dc Author: Sean Ross-Ross Date: Mon Sep 19 14:28:25 2022 +0000 Added docs and opinfo for narrow_copy (#84493) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84493 Approved by: https://github.com/amjames, https://github.com/ngimel, https://github.com/mruberry commit b0c447e954a335b0df60307a2e7c720320af7231 Author: Richard Zou Date: Fri Sep 16 14:51:34 2022 -0700 [functorch] add batch rule for linalg.lu_solve (#85175) Fixes https://github.com/pytorch/functorch/issues/1022 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85175 Approved by: https://github.com/Chillee commit bb0e6e54560c55f9d4319ce2f630ad648fe8dd29 Author: mingfeima Date: Fri Aug 19 15:06:55 2022 +0800 port spmm_sum to pytorch and optimize it on CPU [ghstack-poisoned] commit d561aa944b7e777eb0575be2427e26a86df85f11 Author: Mike Ruberry Date: Mon Sep 19 10:32:39 2022 +0000 Adds normal prim, randn reference, and randn OpInfo (#85128) This PR extends prims support for random operations by adding `prims.normal` and `refs.randn`. Note that in the future we may not want to model draws from distributions as their own prims. `prims.normal` accepts a shape and the mean and standard deviation of a normal distribution as numbers. This is distinct from `torch.normal` which takes two tensors so every generated datapoint can be drawn from a normal distribution with its own mean and standard deviation. To address this @ngimel and I expect to add `prims.normal_with_tensors`. The current `prims.normal` could be implemented using `prims.normal_with_tensors`, but we expect the case of two numbers is much more common, and that executors will likely want to specialize for it, anyway. In a follow-up PR I plan to add `refs.randn_like`, `prims.normal_with_tensors` (as mentioned above), and `refs.normal`. 
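To make the number-vs-tensor distinction above concrete, here is a small hedged sketch (the helper names are illustrative, not the actual prims API):
```python
# Illustrative only: a "normal from numbers" draws every element from one N(mean, std),
# while torch.normal with tensor arguments draws each element from its own distribution.
import torch

def normal_from_numbers(shape, mean, std, dtype=torch.float32, device="cpu"):
    return torch.randn(shape, dtype=dtype, device=device) * std + mean

def randn_sketch(*shape, dtype=torch.float32, device="cpu"):
    # a randn reference can then be the mean=0, std=1 special case
    return normal_from_numbers(shape, 0.0, 1.0, dtype=dtype, device=device)

x = randn_sketch(2, 3)
y = torch.normal(torch.zeros(2, 3), torch.ones(2, 3))  # per-element tensor variant
```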
While writing this PR I noticed the following issues: - https://github.com/pytorch/pytorch/issues/85123 - https://github.com/pytorch/pytorch/issues/85121 The latter of which is prohibiting some testing. In future PRs I plan to add a prim for changing layout, add support for pinned memory, and improve support for testing tensor creation operators, likely with a TensorCreationOpInfo class. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85128 Approved by: https://github.com/ngimel commit 17aefce0aaf939a50f81f06db658f942cbc1df1f Author: PyTorch MergeBot Date: Mon Sep 19 10:03:27 2022 +0000 [xla hash update] update the pinned xla hash (#85242) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85242 Approved by: https://github.com/pytorchbot commit 7df0878b9936961cc1bde9d20c834ac4331d140a Author: Rohan Varma Date: Sat Sep 17 19:37:35 2022 +0000 [FSDP] Option to keep grads in lower prec (#85223) Reland of https://github.com/pytorch/pytorch/pull/85134, fix is to use fp16 instead of bf16 which is not supported on all platforms. Differential Revision: [D39565189](https://our.internmc.facebook.com/intern/diff/D39565189/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85223 Approved by: https://github.com/awgu commit 9024015adf01d93fd2533c71fa1e7f06831c2ac7 Author: Nikita Shulga Date: Sun Sep 18 20:38:43 2022 +0000 [BE] Small improvements to device_count (#85192) If `_parse_visible_devices` returns an empty set, no need to make nvml calls Also, reduce indent a bit in `_device_count_nvml` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85192 Approved by: https://github.com/kit1980, https://github.com/ngimel commit dadd89a8a60f80222b8f3c3bdb83440b902c737b Author: Bin Bao Date: Fri Sep 16 20:15:11 2022 +0000 Add a flag to trigger inductor testing (#85183) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85183 Approved by: https://github.com/jansel commit 1378561d03d5bb1433f6404e829b49caaaba9e00 Author: PyTorch MergeBot Date: Sun Sep 18 02:46:48 2022 +0000 [vision hash update] update the pinned vision hash (#85199) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85199 Approved by: https://github.com/pytorchbot commit b8bf11bbf4e9ae0073b14ddd1966d47543e8d2b5 Author: Edward Z. Yang Date: Sat Sep 17 09:42:04 2022 -0700 Add braces around single line conditional (#85207) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85207 Approved by: https://github.com/Chillee commit 68929f4768a0cf77fe2bc4d9f49dd67fbad9f9af Author: Edward Z. Yang Date: Sat Sep 17 09:42:03 2022 -0700 Remove improper asserts. (#85206) strides() will raise an error if it is called on a tensor with symbolic shapes, so we cannot actually assert using it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85206 Approved by: https://github.com/Chillee commit 9d84db3b726c905beb00ff9ad3d995435c211ae6 Author: Edward Z. Yang Date: Sat Sep 17 09:42:03 2022 -0700 Templatize checkInBoundsForStorage and setStrided for SymInt (#85205) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85205 Approved by: https://github.com/Chillee commit 280e2f92831b92fa0440bdbaf2101df46570c5b9 Author: Edward Z. Yang Date: Sat Sep 17 09:42:02 2022 -0700 Fix bug in computeStorageNbytes (#85204) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85204 Approved by: https://github.com/Chillee commit 12a19a4846c924e9d1e2d37fa0a706fb8eaef9a7 Author: Horace He Date: Sat Sep 17 18:11:51 2022 +0000 Made tracing of proxy symints lazy (#85185) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85185 Approved by: https://github.com/ezyang commit 5dd9610e9d6a54cb6e7340e950606a06aa7eee96 Author: lezcano Date: Sat Sep 17 16:57:34 2022 +0000 Refs and decompositions for index_{add,copy,select,fill} (#85002) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002 Approved by: https://github.com/ngimel commit 45a9dcd4dd6b691d8e2fb867e68ef59a72f1fc75 Author: Nikita Shulga Date: Sat Sep 17 18:20:17 2022 +0000 [BE] Add explicit `__all__` to torch.cuda (#85193) This helps one avoid re-exporting torch, warnings and other system modules from `torch.cuda` Pull Request resolved: https://github.com/pytorch/pytorch/pull/85193 Approved by: https://github.com/kit1980 commit 8c9d7fabd60b7cbb84277d1db87e9a9c78fde266 Author: Edward Z. Yang Date: Sat Sep 17 06:47:59 2022 -0700 Add SymInt::guard_int (#85139) This allows you to explicitly guard on the specific integer value of a SymInt so that you can condition on it. If possible, prefer guarding on a boolean expression instead. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85139 Approved by: https://github.com/Chillee commit b0a631cd14c1072199826e97a2a6c302b9446dc9 Author: Kurt Mohler Date: Sat Sep 17 11:58:18 2022 +0000 Add nondeterministic alert for `MaxUnpool1d/2d/3d` (#84766) Part of #80827 Part of #78249 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84766 Approved by: https://github.com/Lezcano, https://github.com/mruberry, https://github.com/nikitaved commit b8418e02eb33b82ccb682767994379d840351f42 Author: Kevin Stephano Date: Sat Sep 17 10:52:54 2022 +0000 Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#85045) This PR does the following: - Replaces the `FusionOwner` with a `FusionCache` and `FusionInterface`. The `FusionCache` is a singleton that contains a cache of Fusions based on the `FusionDefinition`. It replaces the TorchScript graph caching that looked up a Fusion based on a stringified and canonicalized representation of the TorchScript graph with a prefix tree of statements in the `FusionDefinition`. The `FusionInterface` is an object that represents a Fusion in python. It can also query the cache based on id. - The ability to print out a mechanically derived definition, in python, for the user to use when debugging was added. - Replaces the python `examples` directory with true python tests under `test/test_nvfuser_frontend.py`. - Adds a set of C++ tests under the `test` directory to verify the `FusionCache`, `FusionDefinition`, and parts of the `RecordFunctor` child classes. - Adds a README file to explain how to use the Python Frontend While there are 3,000+ line edits, the bulk of the changes were repetitive line changes to the python bindings for each operation. An identical PR to #83267 to avoid tooling issues. 
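For intuition, a stand-alone hedged sketch of the prefix-tree caching idea described above (class and record names are illustrative, not the real nvfuser frontend):
```python
# Illustrative sketch: cache fusions in a prefix tree keyed by the sequence of
# definition statements, so identical definitions reuse a previously built fusion.
class _TrieNode:
    def __init__(self):
        self.children = {}   # statement (as a hashable record) -> _TrieNode
        self.fusion = None   # compiled fusion stored at the terminal node

class FusionCacheSketch:
    def __init__(self):
        self.root = _TrieNode()

    def lookup_or_create(self, statements, build_fn):
        node = self.root
        for stmt in statements:          # walk the prefix tree statement by statement
            node = node.children.setdefault(stmt, _TrieNode())
        if node.fusion is None:          # cache miss: build and remember the fusion
            node.fusion = build_fn(statements)
        return node.fusion

cache = FusionCacheSketch()
stmts = (("define_tensor", 2), ("ops.add", 0, 1), ("add_output", 2))
fusion = cache.lookup_or_create(stmts, build_fn=lambda s: f"compiled({len(s)} records)")
print(fusion)
```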
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85045 Approved by: https://github.com/davidberard98 commit d23ce29761dbc0e817fa80dcd35d9b8d30f16bbb Author: Hector Yuen Date: Sat Sep 17 09:42:42 2022 +0000 allow changing the cuda allocator settings even after the process started (#84970) Summary: - expose a python call to set the allocator settings, it uses the same format as the value for PYTORCH_CUDA_ALLOCATOR - keep the implementation contained within the cpp file to avoid increasing build times, only expose a function to call the setting - make some of the Allocator Config methods public, now it looks more like a singleton Test Plan: added the unit test Differential Revision: D39487522 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84970 Approved by: https://github.com/zdevito commit 81620c3360d4a15d266b8ad7daf556069db6dfc6 Author: PyTorch MergeBot Date: Sat Sep 17 06:53:11 2022 +0000 Revert "Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428)" This reverts commit d49943bda8e495bdb358e20b6eb114c442afa6e9. Reverted https://github.com/pytorch/pytorch/pull/83428 on behalf of https://github.com/osalpekar due to Reverted because __restrict symbol not supported by certain MSVC compilers, leading to undefined symbol error at compilation time commit 98b8ef99e1ec9b5d273b9612c08404fb34a9dc63 Author: lezcano Date: Sat Sep 17 03:52:56 2022 +0000 Add refs for sinc and sgn (#85142) This PR superseded https://github.com/pytorch/pytorch/pull/80171 This does not add the ref for `special.sinc` as I was getting some errors. This should be added to https://github.com/pytorch/pytorch/pull/84957 (cc @nkaretnikov) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85142 Approved by: https://github.com/ngimel, https://github.com/mruberry commit e33b464ffc8f08d9fb93b09816708d7f32500e68 Author: PyTorch MergeBot Date: Sat Sep 17 04:26:04 2022 +0000 Revert "Refs and decompositions for index_{add,copy,select,fill} (#85002)" This reverts commit 2f0b3de443dd8d4477d70c5a56fa14496d1eebe3. Reverted https://github.com/pytorch/pytorch/pull/85002 on behalf of https://github.com/huydhn due to Broke trunk slow tests commit 1838957e6f5c9f4d32ff446ec5af085d0f19ba2f Author: Brian Hirsh Date: Fri Sep 16 13:04:09 2022 -0700 fix external codegen kernel error checking (#85029) Fixes https://github.com/pytorch/pytorch/issues/84987. I followed the repro steps from the issue (changed `empty_symint` to `empty_symint2` and confirmed that and error gets raised. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85029 Approved by: https://github.com/ezyang commit 652707abc080d73d3477c96eca34a079d1e84e70 Author: John Detloff Date: Sat Sep 17 03:24:44 2022 +0000 Don't cache model specs within PTMCoreMLCompiler (#85136) Summary: It turns out disk cache space is more limited than I realized - Instagram starts evicting cached items at 10mb. We don't actually need to cache the model specs, once the model is compiled all we need is the compiled model. With this diff, after model compilation succeeds we cleanup the model specs from disk. Test Plan: Delete instagram from device to ensure an empty cache, build, launch camera, open a MCS or Segmentation effect, confirm it loads and works correctly. Restart the app and launch again, to confirm it can load the compiled model from cache as well. 
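The cache policy described in the summary can be sketched roughly as follows (hypothetical helper and paths; the real implementation lives in the Objective-C++ PTMCoreMLCompiler):
```python
# Hypothetical sketch of the policy: keep only the compiled model on disk,
# deleting the serialized specs once compilation has succeeded.
import os

def compile_and_trim_cache(spec_path: str, compile_fn) -> str:
    compiled_path = compile_fn(spec_path)     # produces the compiled model artifact
    if os.path.exists(spec_path):
        os.remove(spec_path)                  # specs no longer needed once compiled
    return compiled_path
```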
Differential Revision: D39562009 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85136 Approved by: https://github.com/kimishpatel commit 2dbd2673b6b440efd977e84904ab30db683da921 Author: Nikolay Korovaiko Date: Fri Sep 16 12:21:31 2022 -0700 remove symintnode bits in LTC (#85171) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85171 Approved by: https://github.com/ezyang commit 02f654abca89760ea8004d50702410aceb2296f4 Author: soulitzer Date: Fri Sep 16 20:48:44 2022 -0400 Disable torch.library.Library with PYTORCH_DISABLE_LIBRARY (#85190) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85190 Approved by: https://github.com/d4l3k commit dca42ec20c85aee1487627ff1158d39292c4e411 Author: PyTorch MergeBot Date: Sat Sep 17 02:34:56 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85198) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85198 Approved by: https://github.com/pytorchbot commit 21f2d55974d8543b70f26d3eac7ab1c4cc7a45ce Author: Taylor Robie Date: Fri Sep 16 09:51:05 2022 -0700 [Profiler][Trivial] Make `test/profiler` folder. (#84273) The first step to improving profiler test coverage is to improve the test structure and organization. This PR just pulls the tests into a dedicated folder. Differential Revision: [D39108645](https://our.internmc.facebook.com/intern/diff/D39108645/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39108645/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84273 Approved by: https://github.com/slgong-fb commit 4a5edbf0766c258a5cbf230758ce63b9794fb953 Author: Gavin Wu Date: Sat Sep 17 02:11:27 2022 +0000 Make param 'option' const& to prevent unnecessary copy at call-site (#84747) Reviewed By: ajtulloch Differential Revision: D39208916 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84747 Approved by: https://github.com/janeyx99 commit 32fc0b958e8b3280ccd8721009b8642394df7fcf Author: Will Constable Date: Sat Sep 17 02:10:23 2022 +0000 Expose get_active_ddp_module api for torchdynamo DDP (#83333) Pairs up with torchdynamo PR https://github.com/pytorch/torchdynamo/pull/628 Exposes a new API that lets torchdynamo know when it is compiling the 'forward' of a module that is inside a DDPmodule. 
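The rough shape of such an API might look like the following sketch (simplified and hypothetical, not the actual DistributedDataParallel code):
```python
# Hypothetical sketch: a DDP-like wrapper records which module is currently running
# forward, and a compiler backend queries it via get_active_ddp_module().
_active_ddp_module = None

def get_active_ddp_module():
    # returns the module whose forward is currently executing under DDP, else None
    return _active_ddp_module

class DDPLikeWrapper:
    def __init__(self, module):
        self.module = module

    def forward(self, *args, **kwargs):
        global _active_ddp_module
        _active_ddp_module = self.module      # entering a DDP-managed forward
        try:
            return self.module(*args, **kwargs)
        finally:
            _active_ddp_module = None         # leaving the DDP-managed forward

wrapped = DDPLikeWrapper(lambda x: x * 2)
print(wrapped.forward(3), get_active_ddp_module())  # 6 None
```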
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83333 Approved by: https://github.com/mrshenli commit 0a6f32619ece9ccb3c23fc9fb07aec7f3767d8ba Author: Bartek Rymkowski Date: Sat Sep 17 02:06:43 2022 +0000 CoreML .mlmodel export support (#84784) Test Plan: This was tested manually - model was exported and XCode was used to analyze it Reviewed By: jmdetloff Differential Revision: D39048536 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84784 Approved by: https://github.com/jmdetloff commit ca419c33382057ec19fa6186889b78d7eb2f41f5 Author: Wu, Chunyuan Date: Sat Sep 17 01:44:34 2022 +0000 [NNC] add eltwise OPs: mish and elu (#80586) Enable more eltwise OPs in NNC: - mish - elu Pull Request resolved: https://github.com/pytorch/pytorch/pull/80586 Approved by: https://github.com/ZolotukhinM, https://github.com/malfet commit 377b5d6f8ba09ea799dce103b2e7e0aa4d804cc0 Author: Horace He Date: Fri Sep 16 22:59:44 2022 +0000 Added additional simplifications/caching for replacements and divisibility (#84918) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84918 Approved by: https://github.com/ezyang commit 9d1155235b82a83b49b7645db5292491ea81dacf Author: Justin Chu Date: Fri Sep 16 22:42:41 2022 +0000 [ONNX] Create decorators for symbolic function registration (#84709) This PR creates and tests the decorators proposed in #83787 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84709 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 05ff3f896053bbf152a8add7dfaa381af847b500 Author: Mateusz Sypniewski Date: Sat Sep 17 00:11:05 2022 +0000 Add symlink resolution in benchmark timer interface (#82734) The `sys.executable` string does not take into account if the file is a symlink or not. This lead to a false negative during checking if the two python interpreters were the same, while using an interpreter that was symlinked to another one. Finding the realpath fixes the problem. Tested manually. 
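A minimal sketch of the check described above (standard library only; the function name is illustrative):
```python
import os
import sys

def same_interpreter(other_executable: str) -> bool:
    # resolve symlinks on both sides so e.g. /usr/bin/python3 -> /usr/bin/python3.9
    # still compares equal to the interpreter we are running under
    return os.path.realpath(sys.executable) == os.path.realpath(other_executable)

print(same_interpreter(sys.executable))  # True even if sys.executable is a symlink
```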
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82734 Approved by: https://github.com/ngimel commit 2f0b3de443dd8d4477d70c5a56fa14496d1eebe3 Author: lezcano Date: Fri Sep 16 20:22:08 2022 +0000 Refs and decompositions for index_{add,copy,select,fill} (#85002) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85002 Approved by: https://github.com/ngimel commit d6c2080eb49ccaaf43cff37b7f07a85906250b92 Author: Justin Chu Date: Fri Sep 16 21:56:41 2022 +0000 [ONNX] Update ONNX documentation to include unsupported operators (#84496) - Update ONNX documentation to include unsupported operators - Include aten, quantized and other namespaces Pull Request resolved: https://github.com/pytorch/pytorch/pull/84496 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao, https://github.com/kit1980 commit 46843be1e6b1ec831bfe62dd0eefb04a813566ca Author: Justin Chu Date: Fri Sep 16 19:45:42 2022 +0000 [ONNX] Update error messages (#85179) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85179 Approved by: https://github.com/kit1980 commit a4c7cadca61efd3a9b585e3985dfd407e1120bd4 Author: Zain Rizvi Date: Fri Sep 16 22:48:10 2022 +0000 Retry installing lintrunner if download fails (#85189) Occasionally lintrunner fails to download due to network issues (it caused one build [to fail](https://github.com/pytorch/pytorch/actions/runs/3054209039/jobs/4925814096) this week) Let's make sure we retry the download before giving up Pull Request resolved: https://github.com/pytorch/pytorch/pull/85189 Approved by: https://github.com/huydhn commit 14b3bdc025ebf8408a5b80064b0c51aec8b69403 Author: PyTorch MergeBot Date: Fri Sep 16 22:33:06 2022 +0000 Revert "[FSDP] Option to keep grads in lower prec (#85134)" This reverts commit 607eccb13ca586f775fb09daeb728a4b4e30ebdd. Reverted https://github.com/pytorch/pytorch/pull/85134 on behalf of https://github.com/ZainRizvi due to broke trunk, failing the tests test_grads_reduced_precision (main.TestFSDPMixedPrecisionUnsharded) commit 4382da5d5e1b306f42d434e58e093d74e364bfc9 Author: Muhammed Shuaibi Date: Fri Sep 16 22:04:42 2022 +0000 Remove assertEqualIgnoreType from test_pooling (#85112) Fix TODOs related to https://github.com/pytorch/pytorch/issues/38095 in test_pooling.py. This PR correctly casts the expected outputs to satisfy the asserts. If you'd prefer feeding `exact_dtype=False` as an argument instead I can update accordingly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85112 Approved by: https://github.com/kit1980 commit cd7e6d4ad1df9fb42bc557b6c8dffaa5535bae74 Author: Justin Chu Date: Fri Sep 16 17:30:24 2022 +0000 [ONNX] New symbolic function registry (#84382) The change brings the new registry for symbolic functions in ONNX. The `SymbolicRegistry` class in `torch.onnx._internal.registration` replaces the dictionary and various functions defined in `torch.onnx.symbolic_registry`. The new registry - Has faster lookup by storing only functions in the opset version they are defined in - Is easier to manage and interact with due to its class design - Builds the foundation for the more flexible registration process detailed in #83787 Implementation changes - **Breaking**: Remove `torch.onnx.symbolic_registry` - `register_custom_op_symbolic` and `unregister_custom_op_symbolic` in utils maintain their api for compatibility - Update _onnx_supported_ops.py for doc generation to include quantized ops. 
- Update code to register python ops in `torch/csrc/jit/passes/onnx.cpp` -0.1 seconds in execution time. -34% time spent in `_run_symbolic_function`. Tested on the alexnet example in public doc. ``` └─ 1.641 export <@beartype(torch.onnx.utils.export) at 0x7f19be17f790>:1 └─ 1.641 export torch/onnx/utils.py:185 └─ 1.640 _export torch/onnx/utils.py:1331 ├─ 0.889 _model_to_graph torch/onnx/utils.py:1005 │ ├─ 0.478 _optimize_graph torch/onnx/utils.py:535 │ │ ├─ 0.214 PyCapsule._jit_pass_onnx_graph_shape_type_inference :0 │ │ │ [2 frames hidden] │ │ ├─ 0.190 _run_symbolic_function torch/onnx/utils.py:1670 │ │ │ └─ 0.145 Constant torch/onnx/symbolic_opset9.py:5782 │ │ │ └─ 0.139 _graph_op torch/onnx/_patch_torch.py:18 │ │ │ └─ 0.134 PyCapsule._jit_pass_onnx_node_shape_type_inference :0 │ │ │ [2 frames hidden] │ │ └─ 0.033 [self] ``` ![image](https://user-images.githubusercontent.com/11205048/188032302-688d881e-860d-4046-bdba-90da54233576.png) The startup process takes 0.03 seconds. Calls to `inspect` will be eliminated when we switch to using decorators for registration in #84448 ![image](https://user-images.githubusercontent.com/11205048/188208910-250f0434-475d-4872-9abc-781535519305.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84382 Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 735154354b86296b4b8d99c78da764565af76018 Author: Kshiteej K Date: Fri Sep 16 21:44:23 2022 +0000 update torch.narrow doc (#85180) Fixes #84783 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85180 Approved by: https://github.com/kit1980 commit 5877cc9a9fce3ad1dc69ed6b862c9245a772631b Author: Catherine Lee Date: Fri Sep 16 21:35:33 2022 +0000 fix for rebase and merge (#85168) its --branch not -b Pull Request resolved: https://github.com/pytorch/pytorch/pull/85168 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi commit 17593f15bdc53c2ec41bc7f4a087f3a28e37b626 Author: Richard Zou Date: Fri Sep 16 12:34:40 2022 -0700 [functorch] Document DynamicLayer.{h, cpp} a bit more (#85178) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85178 Approved by: https://github.com/Chillee commit d559299ccf4b685ca1e42a7f090fbbb77e25efed Author: Salil Desai Date: Fri Sep 16 21:28:15 2022 +0000 [QNNPACK] Export cpuinfo-targets in clog CMakeLists (#84876) Summary: Fixes the following error when building qnnpack: ``` CMake Error: install(EXPORT "cpuinfo-targets" ...) includes target "cpuinfo" which requires target "clog" that is not in any export set. ``` This diff mirrors the changes to the CMakeLists of https://github.com/pytorch/cpuinfo/pull/69 Test Plan: ``` export ANDROID_NDK=/opt/android_ndk/r20 export ANDROID_NDK_HOME=${ANDROID_NDK} export ANDROID_SDK=/opt/android_sdk export ANDROID_HOME=${ANDROID_SDK} cd ~/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack ./scripts/build-android-arm64.sh ``` Succeeds Differential Revision: D39438768 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84876 Approved by: https://github.com/digantdesai commit c6c3346d5aaa548d82f3dff14ed54e687af50116 Author: Andrew Gu Date: Wed Sep 14 20:29:48 2022 +0000 [FSDP] Short-term fix to remove `optim_input` (#84201) This is a short-term quick fix to accommodate using the existing optimizer state APIs without passing `optim_input`. 
It preserves the existing `optim_input` code path but if `optim_input` is `None` while `optim` is not, then the APIs will use the new code path that relies on `self.param_groups` to get the information previously provided by `optim_input`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84201 Approved by: https://github.com/rohan-varma commit a9258eba8e87417479bde4004dfa9ffb6e60a8fa Author: Khushi Agrawal Date: Fri Sep 16 21:24:09 2022 +0000 [Testing] Port `bernoulli` and `multinomial` to ErrorInputs. (#74683) Hi, the PR aims to port `bernoulli` and `multinomial` to error inputs. Thanks! cc: @kshitij12345! :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/74683 Approved by: https://github.com/kshitij12345, https://github.com/mruberry commit a5d9d2aaa20f878d0a61bd2b682ae3e2248df07d Author: samdow Date: Fri Sep 16 15:49:11 2022 +0000 [functorch] remove argnums partial helper function, rewrite test to use slice argnum (#84951) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84951 Approved by: https://github.com/zou3519 commit 776e0fe75600b6d3a93060d91bbe0a31fc92afce Author: PyTorch MergeBot Date: Fri Sep 16 21:06:24 2022 +0000 Revert "Make ones and zeros's ref accepts variadic size argument (#85117)" This reverts commit 7e5616c9ff6347913d98627c60e39f72dce558e3. Reverted https://github.com/pytorch/pytorch/pull/85117 on behalf of https://github.com/ZainRizvi due to Failed trunk commit 490727a35f213778d2e709a3b03d899ad502c5f9 Author: Edward Z. Yang Date: Fri Sep 16 10:23:01 2022 -0700 New calling convention for Python dispatcher (#85133) Instead of calling into the Python dispatcher for EVERY dispatcher call, we now have a two step process. First, we getattr(op: OpOverload, dispatch_key) to "load" the handler for the function. This can either be a conventional function (in which case we will call it, in the same way the old Python dispatcher worked), or it can be a DispatchKey, in which case we will directly call that DispatchKey in C++, bypassing marshalling between Python and C++ entirely. OpOverload.__getattr__ is carefully written so that it will cache the handler it loads. A further optimization would be to define __slots__ on OpOverload, and to ensure that the DispatchKey strings are interned. The resulting Python dispatcher is less flexible: after the first lookup, the handler is cached and we won't recompute it. Furthermore, by default, dispatches will not go into Python, and so you won't get stack frames for the Python dispatcher by default. But we get a huge performance improvement: on the following microbenchmark we go from 2.5s to 1.9s.
```
import time
import torch
from functorch import make_fx

def f(x):
    for i in range(1000):
        x = x * x
    return x

begin = time.time()
res = make_fx(f, tracing_mode="symbolic")(torch.randn(10, 20))
print(time.time() - begin)
```
Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85133 Approved by: https://github.com/wconstab commit e5fac7f5dc4f16070193bb7d06322e0faaa94099 Author: Edward Z. Yang Date: Thu Sep 15 21:48:59 2022 -0700 Optimize torch.ops.ns.opname.overload accessor in torch dispatch (#85132) This doesn't actually seem to help all that much. Signed-off-by: Edward Z.
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85132 Approved by: https://github.com/wconstab commit 607eccb13ca586f775fb09daeb728a4b4e30ebdd Author: Rohan Varma Date: Fri Sep 16 17:54:08 2022 +0000 [FSDP] Option to keep grads in lower prec (#85134) Differential Revision: [D39565189](https://our.internmc.facebook.com/intern/diff/D39565189) Rehash of a similar PR from a month ago that got stale. Adds a config to FSDP MP so that gradients can be kept in lower precision, to support optimizers such as AnyPrecisionOptimizer which would like to keep grads in bf16. To do this, for sharded cases, we cannot simply omit the cast back to the full precision param dtype, otherwise when setting `p.grad = p._saved_grad_shard` in finalize_params, autograd will throw an error indicating that the grad dtype should match the param dtype when it is being set. As a workaround, we re-cast after setting this. Although, this means that for cases that use gradient accumulation, p._saved_grad_shard will be of the reduced dtype because it is set to p.grad in `_prep_grad_for_backward`. As a result, add a check + recast here as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85134 Approved by: https://github.com/awgu commit 7e5616c9ff6347913d98627c60e39f72dce558e3 Author: Sherlock Huang Date: Fri Sep 16 17:05:58 2022 +0000 Make ones and zeros's ref accepts variadic size argument (#85117) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85117 Approved by: https://github.com/ngimel, https://github.com/Lezcano commit 38778add8d7047bfcf29c754cd43ae9b258e4410 Author: Christian Puhrsch Date: Fri Sep 16 19:27:31 2022 +0000 flash_attention_helper mitigation: pass contiguous inputs (#85135) There appears to be a transient issue with respect to non-contiguous inputs in flash_attn and thus we're passing contiguous inputs to mitigate it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85135 Approved by: https://github.com/drisspg commit 7b3e177b8772d95e5aaa92415a632d280320c740 Author: Zain Rizvi Date: Fri Sep 16 19:07:57 2022 +0000 Increase docker build timeout (#85156) Docker builds used to take around 15 mins to run (more than the 10 min timeout) and have recently started taking even longer due to conda's slow dependency resolver. We were in this bad state where we _depended_ on the retry to complete the build. That is, the first attempt would try to build docker, timeout, then the second attempt would continue to build on top of the cache the first build had setup, etc. Increasing the timeout so that docker builds actually have enough time to complete the build within a single attempt Pull Request resolved: https://github.com/pytorch/pytorch/pull/85156 Approved by: https://github.com/huydhn commit 29eba319b4f56eae2b6a3bcc3830f8e080214aef Author: Sherlock Huang Date: Fri Sep 16 03:51:05 2022 +0000 Use alias for nop decomp (#84727) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84727 Approved by: https://github.com/Chillee commit d8eae6283db867a1abd75df0a86b1a48faef3043 Author: Feisi Fu Date: Fri Sep 16 17:49:06 2022 +0000 Rename 'torch/ao/nn/quantized._reference' to 'torch/ao/nn/quantized/reference'. 
(#84974) Currently, the path for reference modules contains _ which means it's private (https://github.com/pytorch/pytorch/tree/master/torch/ao/nn/quantized/_reference), but we would like to make it public since the reference module is now enabled by default in the fx graph mode quantization flow and it will be added to eager mode flow as well in the future. To make '_reference' public, it should satisfy the [public API rules](https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation). I did in the first commit (prepare '_reference' to be public): 1: add __all__ to public modules and packages; 2. made functions, that are only used in the file that the function is defined, private by adding _ at their names. Fixes #83090. (we rename the 'torch/ao/nn/quantized/_reference', because of migration #81667.) This is a dup for the #84786. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84974 Approved by: https://github.com/andrewor14, https://github.com/z-a-f commit d710c95cc01486b7f2922799dd033da9893b0e21 Author: lezcano Date: Fri Sep 16 15:27:25 2022 +0000 Implement forward AD for scatter_reduce (#85000) I left the case `reduction="prod"` for future work as it's a bit of a pain. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85000 Approved by: https://github.com/soulitzer commit 6162a043640cf01695cf568edd0be047d56477ff Author: Natalia Gimelshein Date: Fri Sep 16 15:54:50 2022 +0000 fix half_to_float arg in *softmax decomp (#85120) Fixes https://github.com/pytorch/torchdynamo/issues/1239 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85120 Approved by: https://github.com/Chillee commit f37069aac7a0c4fd1c3455e4e9058bf56fb759f4 Author: Elias Ellison Date: Fri Sep 16 05:01:03 2022 +0000 Re-enable fixed dynamo tests (#84969) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969 Approved by: https://github.com/bdhirsh, https://github.com/ezyang commit 54c9c4e73d5c2bfc9f244b62a81a99e8852890e7 Author: Elias Ellison Date: Fri Sep 16 05:01:02 2022 +0000 Flip fake tensors on in aot autograd (#84968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84968 Approved by: https://github.com/Chillee commit 61ba125064dc452e2fef3bfa1731db46d57fc322 Author: Richard Zou Date: Thu Sep 15 11:10:16 2022 -0700 Add warning about installing functorch via setup.py (#85095) We'll probably delete the functorch/setup.py file in 2 weeks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85095 Approved by: https://github.com/samdow commit 2e1ec5d18cbf49a195a4bff2ac1cff43b159c498 Author: lezcano Date: Fri Sep 16 09:10:06 2022 +0000 Re-enables some tests for linalg.det (#85141) At last. 
Fixes https://github.com/pytorch/functorch/issues/961 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85141 Approved by: https://github.com/zou3519 commit 8b29b7953a46fbab9363294214f7689d04df0a85 Author: lezcano Date: Thu Sep 15 19:30:51 2022 +0000 Fix behaviour of index_add / atomicAdd(bool,bool) (#85100) This fixes one of the first PyTorch issues I opened, as it bit me again Fixes https://github.com/pytorch/pytorch/issues/54317 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85100 Approved by: https://github.com/ngimel commit 4bdc0af53d235b5939e6792d2f54004fc11442bd Author: Horace He Date: Fri Sep 16 02:29:13 2022 +0000 Added support for symbolic is_contiguous (#84829) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84829 Approved by: https://github.com/ezyang commit 5652ab22f6d56ed74b21f13a1ba71f09cc94ee4a Author: Andrew Gu Date: Thu Sep 15 19:20:50 2022 +0000 [FSDP] Add `_set_flattened()`; `_is_flattened()` (#85038) For both exposing the original parameters and for TP integration, we cannot only rely on `isinstance(param, FlatParameter)` to ignore already-flattened parameters in `.named_parameters()`. As a simple workaround, we can mark original parameters or `ShardedTensor`s with an attribute `_fsdp_flattened` (saved as a string variable `FSDP_FLATTENED`) to indicate that the parameter/tensor has already been flattened. This issue only arises for recursive/nested FSDP wrapping. This PR also changes `isinstance(param, FlatParameter)` checks to `type(param) is FlatParameter` because all tensor subclasses that have `_is_param == True` will return `True` for `isinstance(param, )`. This means that a `ShardedTensor` parameter will return `True` for `isinstance(st, FlatParameter)`, which is not what we want. https://github.com/pytorch/pytorch/blob/5271494ef21ae0140755a41f3b16a8bd745642b6/torch/nn/parameter.py#L8-L10 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85038 Approved by: https://github.com/rohan-varma commit 0ec19db7ac88e307135100ddcfc418ae3925844f Author: PyTorch MergeBot Date: Fri Sep 16 02:44:33 2022 +0000 [vision hash update] update the pinned vision hash (#85130) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85130 Approved by: https://github.com/pytorchbot commit b363e9874a1d33ac8e6d3c6f528025d7217bb101 Author: PyTorch MergeBot Date: Fri Sep 16 02:43:16 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85129) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85129 Approved by: https://github.com/pytorchbot commit 647aeb831f5fbaae9c815d02b2eca256c43c9042 Author: Shisuiuzumaki Date: Fri Sep 16 01:45:20 2022 +0000 torch/jit/_trace.py in compare_outputs(original, reference, match_wha… (#84850) Fixes #83533 ``` /opt/homebrew/lib/python3.9/site-packages/torch/jit/_trace.py in _check_trace(check_inputs, func, traced_func, check_tolerance, strict, force_outplace, is_trace_module, _module_class) 525 traced_outs = run_mod_and_filter_tensor_outputs(traced_func, inputs, "trace") 526 fn_outs = run_mod_and_filter_tensor_outputs(func, inputs, "Python function") --> 527 if compare_outputs(traced_outs, fn_outs, "Python function"): 528 check_outs = run_mod_and_filter_tensor_outputs( 529 check_mod_func, inputs, "repeated trace" /opt/homebrew/lib/python3.9/site-packages/torch/jit/_trace.py in compare_outputs(original, reference, match_what) 500 else: 501 torch.testing.assert_close( --> 502 orig.double(), 503 ref.double(), 504 rtol=check_tolerance, TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead. ``` ``` if orig.is_mps or ref.is_mps: torch.testing.assert_close( orig.float(), ref.float(), rtol=check_tolerance, atol=default_tolerances(orig, ref)[1], equal_nan=True, ) else: torch.testing.assert_close( orig.double(), ref.double(), rtol=check_tolerance, atol=default_tolerances(orig, ref)[1], equal_nan=True, ) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84850 Approved by: https://github.com/davidberard98 commit 54bccbb22fcc970c48a750d658fe675c80809d42 Author: Catherine Lee Date: Fri Sep 16 01:33:42 2022 +0000 [mergebot] rebase + merge (#85028) adds flag to rebase and merge with one command by just running the tryrebase.py script before running the trymerge.py script testing on https://github.com/clee2000/random-testing/pull/19 corresponding test-infra change: https://github.com/pytorch/test-infra/pull/712 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85028 Approved by: https://github.com/janeyx99, https://github.com/huydhn commit 89525cbd6930ae0be3003dc55e02edb70e395458 Author: Steven Krawczyk Date: Fri Sep 16 01:26:22 2022 +0000 Add variable_list support to ExtractVariables struct (#84583) This is required to unblock https://github.com/pytorch/xla/pull/3843, which lowers the einsum op for pytorch/xla. Because one method input parameter is a TensorList, we need to support TensorLists here so that we can support einsum gradients. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84583 Approved by: https://github.com/soulitzer commit 50733c8bbafa596359caf08bdf97c8a3628aaf6c Author: Animesh Jain Date: Fri Sep 16 01:20:54 2022 +0000 TorchDynamo Remove context manager (#85124) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/85124 Approved by: https://github.com/ezyang commit 95a2c3df31983bbe5c28b19e2910855189fae7a1 Author: Kurt Mohler Date: Fri Sep 16 01:10:12 2022 +0000 Replace `expectedAlertNondeterministic` with simpler check function (#84808) Fixes #84807 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84808 Approved by: https://github.com/mruberry commit 1275e2df1fcc8ba7651450a0e6c7ed30036de340 Author: Edward Z. Yang Date: Thu Sep 15 09:03:37 2022 -0700 Remove getattr magic method from OpOverload (#85090) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85090 Approved by: https://github.com/wconstab commit 00ce302c077cf1b26e9190da146b008dd319eed2 Author: Edward Z. Yang Date: Wed Sep 14 22:00:50 2022 -0700 Performance optimizations to proxy tensor (#85049) - Lazily allocate FX nodes for size/stride accessors on proxy tensor - Properly track derived computations on strides/numel/etc - Remove unnecessary tree_map at end of proxy tensor trace checking invariants; we will just have to be smart (it's too expensive) - Avoid tree_map in sym proxy tracing Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85049 Approved by: https://github.com/wconstab commit d49943bda8e495bdb358e20b6eb114c442afa6e9 Author: nikitaved Date: Thu Sep 15 21:36:21 2022 +0000 Faster mul(sparse, sparse) with broadcasting in dense dims. (#83428) Preliminary benchmarks (square matrices of shape (n, n)).
Script
```python
import torch
import math
from IPython import get_ipython
from itertools import product, repeat
import pickle
from torch.utils.benchmark import Timer, Compare

torch.manual_seed(13)

problem_dims = (
    (10000, 100),
    (100000, 1000),
    (1000000, 10000),
    (10, 100),
    (10, 1000),
    (10, 10000),
    (100, 1000),
    (100, 10000),
    (1000, 10000),
    (1000, 100000),
    (1000, 1000000),
)

name = "PR"
device = "cuda"
results = []

for n, nnz in problem_dims:
    def gen_tensor(coalesce=False):
        shape = (n, n)
        nrows, ncols = shape
        rowidx = torch.randint(low=0, high=nrows, size=(nnz,), device=device)
        colidx = torch.randint(low=0, high=ncols, size=(nnz,), device=device)
        itemidx = torch.vstack((rowidx, colidx))
        xvalues = torch.randn(nnz, device=device)
        itemidx = torch.hstack((itemidx, itemidx))
        xvalues = torch.hstack((xvalues, xvalues))
        res = torch.sparse_coo_tensor(itemidx, xvalues, size=shape)
        if coalesce:
            return res.coalesce()
        else:
            return res

    for x_coalesce, y_coalesce in product(*repeat((True, False), 2)):
        x = gen_tensor(x_coalesce)
        y = gen_tensor(y_coalesce)
        smtp = "x * y"
        timer = Timer(smtp,
                      globals=globals(),
                      label="coo.mul",
                      description=f"{name}: mul, device: {device}",
                      sub_label=f"n={n}, nnz={nnz}, coalesce=({x_coalesce, y_coalesce})",
                      num_threads=torch.get_num_threads())
        results.append(timer.blocked_autorange())

compare = Compare(results)
compare.trim_significant_figures()
compare.print()

with open(f"{name}_{device}_mul.pickle", 'wb') as f:
    pickle.dump(results, f)
```
Gather results
```python
import pickle
from torch.utils.benchmark import Timer, Compare

files = [
    "PR",
    "master"
]

device = 'cuda'
timers = []

for name in files:
    with open("{}_{}_mul.pickle".format(name, device), 'rb') as f:
        timers += pickle.load(f)

compare = Compare(timers)
compare.trim_significant_figures()
compare.print()
```
CUDA
```
[------------------------------------------------- coo.mul -------------------------------------------------]
                                                   |  PR: mul, device: cuda  |  master: mul, device: cuda
24 threads: -------------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True))          |   95  |     91
n=10000, nnz=100, coalesce=((True, False))         |   87  |    242
n=10000, nnz=100, coalesce=((False, True))         |   87  |    226
n=10000, nnz=100, coalesce=((False, False))        |  130  |    371
n=100000, nnz=1000, coalesce=((True, True))        |  100  |    521
n=100000, nnz=1000, coalesce=((True, False))       |   90  |    649
n=100000, nnz=1000, coalesce=((False, True))       |  100  |    659
n=100000, nnz=1000, coalesce=((False, False))      |  200  |    781
n=1000000, nnz=10000, coalesce=((True, True))      |  100  |   4861
n=1000000, nnz=10000, coalesce=((True, False))     |  100  |   5012
n=1000000, nnz=10000, coalesce=((False, True))     |   98  |   5010
n=1000000, nnz=10000, coalesce=((False, False))    |  384  |   5174
n=10, nnz=100, coalesce=((True, True))             |  100  |     79
n=10, nnz=100, coalesce=((True, False))            |  100  |    221
n=10, nnz=100, coalesce=((False, True))            |  100  |    221
n=10, nnz=100, coalesce=((False, False))           |  100  |    350
n=10, nnz=1000, coalesce=((True, True))            |  100  |    100
n=10, nnz=1000, coalesce=((True, False))           |  100  |    240
n=10, nnz=1000, coalesce=((False, True))           |  100  |    254
n=10, nnz=1000, coalesce=((False, False))          |  100  |    392
n=10, nnz=10000, coalesce=((True, True))           |  100  |    110
n=10, nnz=10000, coalesce=((True, False))          |  110  |    286
n=10, nnz=10000, coalesce=((False, True))          |  110  |    286
n=10, nnz=10000, coalesce=((False, False))         |  271  |    455
n=100, nnz=1000, coalesce=((True, True))           |  110  |    851
n=100, nnz=1000, coalesce=((True, False))          |  110  |   1000
n=100, nnz=1000, coalesce=((False, True))          |  110  |    990
n=100, nnz=1000, coalesce=((False, False))         |  140  |   1124
n=100, nnz=10000, coalesce=((True, True))          |  110  |   5137
n=100, nnz=10000, coalesce=((True, False))         |  110  |   5391
n=100, nnz=10000, coalesce=((False, True))         |  100  |   5405
n=100, nnz=10000, coalesce=((False, False))        |  249  |   5539
n=1000, nnz=10000, coalesce=((True, True))         |  100  |   8598
n=1000, nnz=10000, coalesce=((True, False))        |  100  |   8800
n=1000, nnz=10000, coalesce=((False, True))        |  100  |   8782
n=1000, nnz=10000, coalesce=((False, False))       |  255  |   8956
n=1000, nnz=100000, coalesce=((True, True))        |  120  |  84500
n=1000, nnz=100000, coalesce=((True, False))       |  200  |  88560
n=1000, nnz=100000, coalesce=((False, True))       |  160  |  89000
n=1000, nnz=100000, coalesce=((False, False))      |  373  |  89000
n=1000, nnz=1000000, coalesce=((True, True))       |  312  | 606400
n=1000, nnz=1000000, coalesce=((True, False))      | 1340  | 609200
n=1000, nnz=1000000, coalesce=((False, True))      | 1340  | 609100
n=1000, nnz=1000000, coalesce=((False, False))     | 4408  | 611400

Times are in microseconds (us).
```
CPU
```
[------------------------------------------------ coo.mul ------------------------------------------------]
                                                   |  PR: mul, device: cpu  |  master: mul, device: cpu
24 threads: -----------------------------------------------------------------------------------------------
n=10000, nnz=100, coalesce=((True, True))          |      8  |      8
n=10000, nnz=100, coalesce=((True, False))         |     32  |     34
n=10000, nnz=100, coalesce=((False, True))         |     32  |     34
n=10000, nnz=100, coalesce=((False, False))        |     41  |     56
n=100000, nnz=1000, coalesce=((True, True))        |     24  |     24
n=100000, nnz=1000, coalesce=((True, False))       |     90  |    100
n=100000, nnz=1000, coalesce=((False, True))       |     87  |    100
n=100000, nnz=1000, coalesce=((False, False))      |    231  |    255
n=1000000, nnz=10000, coalesce=((True, True))      |    190  |    200
n=1000000, nnz=10000, coalesce=((True, False))     |    908  |   2023
n=1000000, nnz=10000, coalesce=((False, True))     |    800  |   2036
n=1000000, nnz=10000, coalesce=((False, False))    |   3684  |   3989
n=10, nnz=100, coalesce=((True, True))             |      8  |      7
n=10, nnz=100, coalesce=((True, False))            |     34  |     30
n=10, nnz=100, coalesce=((False, True))            |     33  |     30
n=10, nnz=100, coalesce=((False, False))           |     44  |     50
n=10, nnz=1000, coalesce=((True, True))            |      8  |      7
n=10, nnz=1000, coalesce=((True, False))           |    100  |    100
n=10, nnz=1000, coalesce=((False, True))           |    130  |    100
n=10, nnz=1000, coalesce=((False, False))          |    746  |    210
n=10, nnz=10000, coalesce=((True, True))           |      8  |      7
n=10, nnz=10000, coalesce=((True, False))          |   1000  |   1500
n=10, nnz=10000, coalesce=((False, True))          |   1000  |   1510
n=10, nnz=10000, coalesce=((False, False))         |   3063  |   2457
n=100, nnz=1000, coalesce=((True, True))           |     25  |     25
n=100, nnz=1000, coalesce=((True, False))          |    180  |    130
n=100, nnz=1000, coalesce=((False, True))          |    200  |    130
n=100, nnz=1000, coalesce=((False, False))         |    271  |    255
n=100, nnz=10000, coalesce=((True, True))          |    100  |    100
n=100, nnz=10000, coalesce=((True, False))         |   2444  |   2290
n=100, nnz=10000, coalesce=((False, True))         |   2455  |   2357
n=100, nnz=10000, coalesce=((False, False))        |   5316  |   3783
n=1000, nnz=10000, coalesce=((True, True))         |    204  |    211
n=1000, nnz=10000, coalesce=((True, False))        |   2457  |   2480
n=1000, nnz=10000, coalesce=((False, True))        |   2448  |   2539
n=1000, nnz=10000, coalesce=((False, False))       |   3665  |   4801
n=1000, nnz=100000, coalesce=((True, True))        |   2293  |   2374
n=1000, nnz=100000, coalesce=((True, False))       |   9000  |  24620
n=1000, nnz=100000, coalesce=((False, True))       |   8000  |  25080
n=1000, nnz=100000, coalesce=((False, False))      |  26500  |  47650
n=1000, nnz=1000000, coalesce=((True, True))       |  10000  |  13000
n=1000, nnz=1000000, coalesce=((True, False))      |  80000  | 362200
n=1000, nnz=1000000, coalesce=((False, True))      |  78050  | 392600
n=1000, nnz=1000000, coalesce=((False, False))     | 312100  | 766900

Times are in microseconds (us).
```
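For orientation, a minimal sketch of the operation being benchmarked above (not taken from the PR; the shapes and values are arbitrary):

```python
import torch

# Elementwise product of two sparse COO tensors; this is the `x * y` the Timer
# measures. The PR speeds this path up and, per its title, adds broadcasting
# support over the dense dimensions of hybrid sparse tensors.
indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])
x = torch.sparse_coo_tensor(indices, torch.randn(3), size=(2, 3))
y = torch.sparse_coo_tensor(indices, torch.randn(3), size=(2, 3))
out = x * y          # sparse * sparse elementwise multiply
print(out.coalesce())
```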
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83428 Approved by: https://github.com/cpuhrsch commit abaf99d37fbbec58125f3e28b90b0bbff3026527 Author: Pearu Peterson Date: Wed Sep 14 23:14:42 2022 +0300 Enable unary elementwise inplace ops for all sparse compressed layouts. (#85031) Fixes #84998 Unblocks #84897 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85031 Approved by: https://github.com/cpuhrsch commit 27ec195a81522df397098f7ffd12c06773261000 Author: samdow Date: Thu Sep 15 18:10:03 2022 +0000 [functorch] fix jacfwd so all inputs get wrappers (#84915) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84915 Approved by: https://github.com/zou3519 commit 64899c5d10944a617cc65c938e84c38011456a58 Author: Nikolay Korovaiko Date: Thu Sep 15 22:58:50 2022 +0000 change the type of storage_offset to SymInt (#85102) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/85102 Approved by: https://github.com/ezyang commit 7f88934a8fb9b376b32c722ac2f05959da34c147 Author: soulitzer Date: Thu Sep 15 22:46:16 2022 +0000 [reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976) Reland of https://github.com/pytorch/pytorch/pull/84675 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84976 Approved by: https://github.com/zou3519 commit 7dcc723d35a8795c7f386cf0a439299a89432a75 Author: Rodrigo Kumpera Date: Thu Sep 15 22:32:48 2022 +0000 [c10d] Ensure collectives are called with the same dtype for all tensor params. (#84664) While passing tensors with different dtypes don't crash, they don't produce sensible results. We see data tearing instead of casting. It's not clear we want to support transparent casting so, for now, we fail when such input is presented. Fixes #84525 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84664 Approved by: https://github.com/rohan-varma commit 5558deac592cd82c1f2ccc9fcb9c3924bcaf0266 Author: Bowen Bao Date: Thu Sep 15 22:25:48 2022 +0000 [ONNX] Add `caffe2/python/onnx/**` to merge rule (#85118) This PR extends merge rule such that any related fixes needed for caffe2 onnx tests can be merged in the same PR. Test skips need to be added to `caffe2/python/onnx/tests/onnx_backend_test.py` for new ONNX operators when ONNX submodule is updated. Unblocks #83201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85118 Approved by: https://github.com/malfet commit b1f5644fadda2c202b8f431f6eca3e326ac92b4e Author: Larry Liu <8188269+larryliu0820@users.noreply.github.com> Date: Thu Sep 15 22:16:30 2022 +0000 [frontend] Print real type for Argument (#85103) Retry of [#84985](https://github.com/pytorch/pytorch/pull/84985). For some reason `ghstack land` doesn't work on that PR In JIT world, if we parse an argument in schema and print its type, we always get `fake_type`. For example, `MemoryFormat? memory_format` becomes `int? memory_format`. This doesn't align with the original schema string and creates discrepency between `torchgen.FunctionSchema` and `torch._C.FunctionSchema`. Here I'm letting `torch._C.Argument` print its `real_type` and hence be aligned with the original schema string. Rely on newly added unit test. 
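A rough illustration of the round-trip being fixed (a sketch assuming `torch._C.parse_schema`; not code from this PR):

```python
import torch

schema = torch._C.parse_schema(
    "aten::clone(Tensor self, *, MemoryFormat? memory_format=None) -> Tensor"
)
# Previously, printing the parsed schema rendered the optional argument with its
# fake type (`int? memory_format`); printing the real_type keeps it aligned with
# the original schema string (`MemoryFormat? memory_format`).
print(schema)
```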
Differential Revision: [D39550665](https://our.internmc.facebook.com/intern/diff/D39550665) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85103 Approved by: https://github.com/cccclai commit 52a2b612035e081626ff6f23eec87782bde6643c Author: Xu Zhao Date: Thu Sep 15 21:48:28 2022 +0000 Fix fetch function which breaks user code (#85099) The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses DataSetGetter to fetch data from the dataset. The following code breaks because of https://github.com/pytorch/pytorch/pull/84301: ``` from fastNLP.io.pipe.qa import CMRC2018BertPipe input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim") data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir) data_bundle.rename_field('chars', 'words') data_bundle.get_dataset('dev') dataset = DataSetGetter(dataset, as_numpy) dataiter = torch.utils.data.DataLoader(dataset=dataset) for batch in dataiter: ``` This is because for the `DataSetGetter` class, the following condition holds: ``` ``` This PR adds an additional check to make sure `__getitems__` is only called when it is not None. This error was found by the torchbench nightly CI, original error stack trace: ``` ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark) ---------------------------------------------------------------------- components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last): File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block exec( # noqa: P204 File "", line 35, in File "", line 12, in _run_in_worker_f File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__ obj = type.__call__(cls, *args, **kwargs) File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__ self.example_inputs = self._prefetch(example_inputs) File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch for batch_x, batch_y in example_inputs: File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__ for indices, batch_x, batch_y in self.dataiter: File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__ data = self._next_data() File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data data = self._dataset_fetcher.fetch(index) # may raise StopIteration File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch data = self.dataset.__getitems__(possibly_batched_index) TypeError: 'NoneType' object is not callable ``` Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85099 Approved by: https://github.com/ejguan commit 2386cd2945498ce7261b761a8d9bd5b59d06c5a1 Author: Khushi Agrawal Date: Thu Sep 15 19:34:44 2022 +0000 [reland] [numpy] add torch.concatenate, alias of torch.cat (#85073) Previous PR: #82946 Fixes #81161 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85073 Approved by: https://github.com/mruberry commit 25ecc4889de895fa2041b556e1ea4dc057c33712 Author: Andrew Gu Date: Thu Sep 15 15:43:12 2022 +0000 [FSDP] Fix memory regression! 
(#85087) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85087 Approved by: https://github.com/zhaojuanmao commit 4306a1882622d0ba293b3704c0bc1ef5ce181edb Author: Elias Ellison Date: Thu Sep 15 17:04:32 2022 +0000 Fix tied params with Fake Tensor (#85065) Tested locally to fix `USE_FAKE_TENSOR=1 python benchmarks/huggingface.py --ci -d cuda --float32 --backend=aot_nop --training --only=RobertaForCausalLM`. When I tried to repro with a small example test could not successfully, but @anijain2305 is attempting to with minifier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85065 Approved by: https://github.com/anijain2305 commit 2e41fbc2114d89d046a3002f0614bbfe933c01b9 Author: Ubuntu Date: Wed Sep 14 16:48:26 2022 +0000 [ONNX] Enable test_custom_opsets_inverse (#85013) in [#87004](https://github.com/pytorch/pytorch/pull/80074), `aten::inverse` becomes alias of `aten::linalg.inv`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85013 Approved by: https://github.com/BowenBao commit a225f3cfce19a9baf40db4814efcc44a9161b286 Author: Pearu Peterson Date: Wed Sep 14 23:14:42 2022 +0300 torch.zero_ on a sparse compressed tensor resets nnz to 0 (#85030) Fixes #84997 and #82683 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85030 Approved by: https://github.com/cpuhrsch commit 21e656b020ab16c5748720a4eb95aea93778e0de Author: Bowen Bao Date: Thu Sep 15 18:22:41 2022 +0000 [ONNX] Add `third_party/onnx` to merge rule (#84715) We expect to bump onnx submodule version regularly to develop support for new onnx operators/functions. Adding this to merge rule reduces the burden for core maintainers for approval. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84715 Approved by: https://github.com/thiagocrepaldi, https://github.com/malfet commit 6bd7d0f85665cecc0396f1be4c1bac6b14e3f5d1 Author: Salahuddin <60926009+ShisuiUzumaki@users.noreply.github.com> Date: Thu Sep 15 18:17:10 2022 +0000 doc string fixed in torch.distributed.reduce_scatter (#84983) Fixes #84865 Previous `torch.distributed.reduce_scatter`: ``` def reduce_scatter(output, input_list, op=ReduceOp.SUM, group=None, async_op=False): """ Reduces, then scatters a list of tensors to all processes in a group. Args: output (Tensor): Output tensor. input_list (list[Tensor]): List of tensors to reduce and scatter. group (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. async_op (bool, optional): Whether this op should be an async op. ``` Fixed: ``` def reduce_scatter(output, input_list, op=ReduceOp.SUM, group=None, async_op=False): """ Reduces, then scatters a list of tensors to all processes in a group. Args: output (Tensor): Output tensor. input_list (list[Tensor]): List of tensors to reduce and scatter. op (optional): One of the values from ``torch.distributed.ReduceOp`` enum. Specifies an operation used for element-wise reductions group (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. async_op (bool, optional): Whether this op should be an async op. 
``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84983 Approved by: https://github.com/H-Huang commit d52452b3d1ff14a87e1d79c8d8bd67f0f074e6e6 Author: Nikita Shulga Date: Thu Sep 15 17:45:05 2022 +0000 [Functorch] Set rpath for Mac builds (#85086) For the `delocate-wheel` to be able to find dependent libs Fixes https://github.com/pytorch/pytorch/issues/85007 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85086 Approved by: https://github.com/kit1980, https://github.com/zou3519 commit 4db1588ca09b4f6328c5bd98701e9c302cb39800 Author: Kshiteej K Date: Thu Sep 15 15:59:23 2022 +0000 [functorch] follow-up vmapjvpvjp (#84992) Ref 1: https://github.com/pytorch/pytorch/pull/83375#discussion_r970046113 Ref 2: https://github.com/pytorch/pytorch/pull/83375#discussion_r970047848 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84992 Approved by: https://github.com/zou3519 commit 50142827925374c23b75a840eccddf3ad6d05c1e Author: atalman Date: Thu Sep 15 15:47:36 2022 +0000 Removing cuda 11.3 nightly builds (#84866) Removing cuda 11.3 nightly builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/84866 Approved by: https://github.com/weiwangmeta, https://github.com/malfet commit ebd4e90ff7f6d1e57bbf6f2e717a8addb6b28e28 Author: Seonglyong Gong Date: Thu Sep 15 06:41:33 2022 +0000 [Profiler] add config option to remove 'Call stack' field from trace file (#84982) Summary: `Call stack` field increases trace file size exponentially for Python stack tracing (need to be deprecated carefully). Added a config option to avoid this increase. Test Plan: `experimental_config=_ExperimentalConfig(no_callstack_trace=True),` will remove the field. + CI tests Differential Revision: D39489828 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84982 Approved by: https://github.com/robieta commit a22f4f535b32664bd6c1286604773f508ed8ff69 Author: zhuyuhua-v Date: Thu Sep 15 06:01:14 2022 +0000 Add xpu path for GRUCell (#83723) Add xpu path for GRUCell. We supported a new kernel named _thnn_fused_gru_cell to fuse the small ops of GRU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83723 Approved by: https://github.com/ezyang commit 17925122d091fcf5b1b14a82d4718396dd0199e8 Author: Sherlock Huang Date: Wed Sep 14 21:46:46 2022 +0000 Rewrite new_zeros, new_ones, new_full decomp with aten.full (#84946) We should **NOT** introducing non-functional op for decomps of functional op. For example ``` make_fx(functionalize(lambda x: x.new_zeros(3)), decomposition_table=decomposition_table)(x) ``` is producing ``` def forward(self, x_1): empty = torch.ops.aten.empty.memory_format([3, 4], dtype = torch.float32, layout = torch.strided, device = device(type='cpu'), pin_memory = False) zero_ = torch.ops.aten.zero_.default(empty); empty = None return zero_ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84946 Approved by: https://github.com/ngimel commit 65158b8876b6e65f82d7844e543afff55d1a44f9 Author: Edward Z. Yang Date: Wed Sep 14 21:05:03 2022 -0700 empty strided symint (#84830) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84830 Approved by: https://github.com/ezyang commit d05f07494a9a32c63f9218c0e703764a02033bb9 Author: Sergii Dymchenko Date: Thu Sep 15 03:08:49 2022 +0000 Use angle brackets in include for internal clangtidy (#85032) This issue was found after importing https://github.com/pytorch/pytorch/pull/70978 into fbsource. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85032 Approved by: https://github.com/huydhn commit be800cd6ea783a66d9b722116a6f483248f5c53e Author: PyTorch MergeBot Date: Thu Sep 15 02:36:55 2022 +0000 [vision hash update] update the pinned vision hash (#85061) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85061 Approved by: https://github.com/pytorchbot commit 625e44c1df211d6753609a9b391cb10f2f94367f Author: PyTorch MergeBot Date: Thu Sep 15 02:36:32 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#85060) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85060 Approved by: https://github.com/pytorchbot commit 62af1c9eedf958293aa9a60c59581410c93e264c Author: Andrew Gu Date: Wed Sep 14 23:48:14 2022 +0000 [Easy][FSDP] Change `assert` -> `p_assert` (#85052) This changes a few `assert`s to `p_assert()`s because they can run in the backward (some are in the forward, but AC can make them run in the backward). Pull Request resolved: https://github.com/pytorch/pytorch/pull/85052 Approved by: https://github.com/zhaojuanmao commit cdd625ba702fe1ef812a910256fcfc60f233dadf Author: Andrew Gu Date: Wed Sep 14 23:34:18 2022 +0000 [Easy][FSDP] Remove outdated comment (#85051) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85051 Approved by: https://github.com/zhaojuanmao commit cc62ad79c752534bd8fdd07c5bf494c9534337d2 Author: Andrew Gu Date: Wed Sep 14 22:52:07 2022 +0000 [FSDP] Fix `pin_memory()` for CPU offloading (#85048) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85048 Approved by: https://github.com/zhaojuanmao commit e7ad699be0c33f72be43249445c430d953e0747e Author: Rohan Varma Date: Wed Sep 14 22:40:12 2022 +0000 Resubmit bfloat support for im2col,col2im (#84372) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84372 Approved by: https://github.com/awgu, https://github.com/ngimel commit 8ca1839d32a56ea7ef007bf43fab38cc94b3f608 Author: Michael Voznesensky Date: Thu Sep 15 00:43:36 2022 +0000 Python Dispatcher integration with C++ dispatcher (#85050) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050 Approved by: https://github.com/malfet commit 3a107bc9bedb6642b280c430ee5389b6ca4c2ca3 Author: Richard Zou Date: Wed Sep 14 10:33:46 2022 -0700 [functorch] fix vmapvjpvjp test for prelu (#84939) Turns out this is just a composite compliance issue. Branching on if something requires grad or not can lead to incorrect gradients if we have a BatchedTensor wrapping a tensor that requires grad. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/84939 Approved by: https://github.com/soulitzer commit 8badb00ff4239acb69a277cff7527f80707d09aa Author: Richard Zou Date: Wed Sep 14 10:33:46 2022 -0700 [functorch] fix conv_transpose with groups batching rule (#84938) The original batching rule didn't work in all cases, so this PR fixes it. Test Plan: - added new test case to conv_transpose2d's OpInfo. Surprisingly the other test cases didn't catch the bug. - Removed some xfails and skips as a result. 
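A small sketch of the kind of call the fixed batching rule serves (illustrative only; the shapes and `groups=2` are arbitrary choices, not taken from the PR):

```python
import torch
import torch.nn.functional as F
from functorch import vmap

weight = torch.randn(4, 2, 3, 3)   # conv_transpose2d weight: (C_in, C_out // groups, kH, kW)
xs = torch.randn(5, 4, 8, 8)       # five samples to vmap over, each (C_in, H, W)

# vmap routes this through the conv_transpose batching rule rather than a real batch dim.
out = vmap(lambda x: F.conv_transpose2d(x.unsqueeze(0), weight, groups=2))(xs)
print(out.shape)                   # torch.Size([5, 1, 4, 10, 10])
```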
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84938 Approved by: https://github.com/samdow commit 8cb7826889ad488307caef918a731471be370eac Author: Rohan Varma Date: Wed Sep 14 14:59:34 2022 -0700 [CheckpointWrapper] Reentrant kwarg support (#84908) A temporary patch to support keyword args when reentrant checkpoint wrapper is used. This is need to unblock some crucial workloads, the ideal fix would be checking this directly into torch.utils.checkpoint. Differential Revision: [D39453453](https://our.internmc.facebook.com/intern/diff/D39453453/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84908 Approved by: https://github.com/awgu commit 55ca6901a79c68353525b247873dfbf3bf14e959 Author: Rohan Varma Date: Wed Sep 14 14:59:33 2022 -0700 [CheckpointWrapper] Decouple CPU offload (#84907) This fixes the activation offload for checkpoint wrapper, which was previously broken. It was broken because it was tightly coupled with activation checkpoint, i.e. we did: ``` with save_on_cpu: checkpoint(module_forward()) ``` which would not offload any activation tensors to CPU, as those activations would already be not saved by autograd due to the checkpoint implementation taking priority. Now, if `offload_to_cpu` is specified, we only do `save_on_cpu` and no checkpoint, so all intermediate tensors are offloaded to CPU instead of checkpointed. These wrappers can be composed, i.e. if we have `(Linear, Linear) -> (Linear, Linear) -> (Linear, Linear)` we can do `Offload( checkpoint(Linear, Linear) -> checkpoint(Linear, Linear) -> checkpoint(Linear, Linear))` and inner tensors would be checkpointed while outers will be offloaded. Differential Revision: [D39448882](https://our.internmc.facebook.com/intern/diff/D39448882/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84907 Approved by: https://github.com/awgu commit 166ea7e6b1b7a7fc34fa6abc1cac6d9eca6fc720 Author: samdow Date: Wed Sep 14 15:08:30 2022 -0400 [functorch] fix jacrev so all inputs get wrappers (#84914) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84914 Approved by: https://github.com/zou3519 commit 1a6cf6ea8861549b70f40eace4817ee4ed84a152 Author: Nikita Shulga Date: Wed Sep 14 23:40:20 2022 +0000 [MPS] Fix int rounding div crash on M1 (#85016) Fixes https://github.com/pytorch/pytorch/issues/84995 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85016 Approved by: https://github.com/kulinseth commit 976f8bee94f35f37f0f58a8bebccf87f0a80d8a6 Author: Wanchao Liang Date: Wed Sep 14 06:02:08 2022 +0000 [c10d] add ncclGetLastError to NCCL pg (#83724) This PR add ncclGetLastError API to the nccl pg, to provide better error reporting out of nccl failures directly, instead of guessing on random reasons Differential Revision: [D39161199](https://our.internmc.facebook.com/intern/diff/D39161199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83724 Approved by: https://github.com/kwen2501, https://github.com/H-Huang commit ccade9410f1d72e766d86fabeeb80822dd36f449 Author: Edward Z. Yang Date: Wed Sep 14 10:51:36 2022 -0700 Don't detach when making views; force caller to detach (#84893) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893 Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad commit 2711b9fa63af23c25b0f3f1301a72291afd68655 Author: PyTorch MergeBot Date: Wed Sep 14 22:27:30 2022 +0000 Revert "[CUBLAS][CUDA GRAPHS] Explicitly set the workspace for cuBLAS handles (#83461)" This reverts commit 713d8b855223970dc98ec81bb722fba002ac1390. Reverted https://github.com/pytorch/pytorch/pull/83461 on behalf of https://github.com/malfet due to Broke CUDA-10.2 builds, see https://hud.pytorch.org/pytorch/pytorch/commit/713d8b855223970dc98ec81bb722fba002ac1390 commit a1a95d402d300070d5ffc21b8f3a0e0bfbc38323 Author: Zain Rizvi Date: Wed Sep 14 22:04:43 2022 +0000 Fix inheritance in TestDataLoaderUtil (#85018) TestDataLoaderUtils needs to run it's parent class's setUp method to actually disable flaky tests (see https://github.com/pytorch/pytorch/issues/70516#issuecomment-1247045072 for details) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85018 Approved by: https://github.com/clee2000, https://github.com/huydhn commit 713d8b855223970dc98ec81bb722fba002ac1390 Author: Eddie Yan Date: Wed Sep 14 21:56:48 2022 +0000 [CUBLAS][CUDA GRAPHS] Explicitly set the workspace for cuBLAS handles (#83461) We're seeing an issue where repeatedly capturing graphs incurs increasing memory usage as cuBLAS internally allocates a new workspace for each graph even when the same handle is being used: https://gist.github.com/tomconerlyanth/a20c04a4a46a0f6e9ce18f5280729b36 This PR works around the issue by intercepting the `CUBLAS_WORKSPACE_CONFIG` environment variable and allocating the workspace for the cuBLAS handle explicitly. CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/83461 Approved by: https://github.com/ngimel commit 7b64c885d5ac0ebac50041e0bbbb82ff92e92dc3 Author: Huy Do Date: Wed Sep 14 21:55:10 2022 +0000 Enable manual test config label selection on GHA macos (#84895) Following up with https://github.com/pytorch/pytorch/pull/83690 and https://github.com/pytorch/pytorch/pull/84669, functorch team has started using the new label in some of their [PRs](https://github.com/pytorch/pytorch/labels/test-config%2Ffunctorch). This is to enable manual test config using label on GHA macos. This also works with `ciflow/mps` as follows: * If only `test-config/functorch` is present, no arm64 build is performed and mps test is skipped * If only `ciflow/mps` is present, mps test is run in addition to all other tests * If both `test-config/functorch` and `ciflow/mps` is present, both functorch and mps tests are run * If none of the label is present, pull workflow is run as usual Pull Request resolved: https://github.com/pytorch/pytorch/pull/84895 Approved by: https://github.com/ZainRizvi commit fa7bf3e2dc63cc27b2b0bcc90d7a2ab387dd0c9f Author: PyTorch MergeBot Date: Wed Sep 14 21:32:11 2022 +0000 Revert "[numpy] add `torch.concatenate`, alias of torch.cat (#82946)" This reverts commit 270e5e519d98868af0166f3a179b286682cfb267. 
Reverted https://github.com/pytorch/pytorch/pull/82946 on behalf of https://github.com/malfet due to Broke M1 tests, see https://hud.pytorch.org/pytorch/pytorch/commit/270e5e519d98868af0166f3a179b286682cfb267 commit 23b7a5fc7a1cc95f31c7004173b07d0d4a83a22d Author: Huy Do Date: Wed Sep 14 21:16:56 2022 +0000 Shard distributed tests on non CUDA focal (#84891) [pull / linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)](https://hud.pytorch.org/tts/pytorch/pytorch/master?jobName=pull%20%2F%20linux-focal-py3.7-gcc7%20%2F%20test%20(distributed%2C%201%2C%201%2C%20linux.2xlarge)) p90 TTS is about 2.2 hours, 2x the default shards. This is non-CUDA common Linux runners, so we can simply add one more shard for distributed. I missed this change in https://github.com/pytorch/pytorch/pull/84430 Having 2 shards with test time around 55m each: * https://github.com/pytorch/pytorch/actions/runs/3040900328/jobs/4897576932 * https://github.com/pytorch/pytorch/actions/runs/3040900328/jobs/4897577014 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84891 Approved by: https://github.com/clee2000 commit 3e57c9550ea272b6fc4e752e7ac643c5105405f3 Author: Edward Z. Yang Date: Wed Sep 14 10:39:17 2022 -0700 Ensure as_strided_tensorimpl is never called with MPS (#85020) See https://github.com/pytorch/pytorch/pull/84893 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/85020 Approved by: https://github.com/soulitzer, https://github.com/kulinseth commit 5271494ef21ae0140755a41f3b16a8bd745642b6 Author: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Date: Wed Sep 14 19:56:12 2022 +0000 [CUDA graphs] Fixes errors in RNG seed (#84967) Fixes #84614 Prior to this PR CUDAGraph did not store the RNG seed, that is why `torch.cuda.manual_seed(new_seed)` would only reset the offset but not update the seed at all keeping whatever value was used during graph capture. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84967 Approved by: https://github.com/ngimel commit 270e5e519d98868af0166f3a179b286682cfb267 Author: Khushi Agrawal Date: Wed Sep 14 19:28:43 2022 +0000 [numpy] add `torch.concatenate`, alias of torch.cat (#82946) As per the title. Fixes: #81161 - [x] add ErrorInputs - ~[ ] dtype argument?~ - ~[ ] casting argument?~ As discussed offline with @kshitij12345, we can currently ignore `dtype` and `casting` arguments. cc: @kshitij12345! Pull Request resolved: https://github.com/pytorch/pytorch/pull/82946 Approved by: https://github.com/mruberry commit 94b67f4cd8dc1ab5f7add5f006f7f3fd988b8ecf Author: PyTorch MergeBot Date: Wed Sep 14 17:40:22 2022 +0000 Revert "Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#83267)" This reverts commit ec916bf6afcfa91305bb69d1bedbd6dafccb7c95. Reverted https://github.com/pytorch/pytorch/pull/83267 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 4247cc98a22760932d0b53236403e478e8612c2f Author: Denis Vieriu Date: Wed Sep 14 17:24:24 2022 +0000 [MPS] Fix mps to cpu casting from a smaller dtype to a bigger dtype (#84928) Fixes #82566 , #80800 - mps->cpu casts from a smaller dtype to a bigger dtype mps->mps cast from smaller/bigger dtype to another dtype in case of scatter - For mps->cpu copies where we don't have a source/destination offset, we can save the cast result directly in the destTensor, so we can skip the additional overhead of the blit. 
- In case we can return the data without doing the blit, we need to check if it's a blocking call, in which case we'd need a synchronize(SyncType::COMMIT_AND_WAIT); call (previously this was done by the blit). Pull Request resolved: https://github.com/pytorch/pytorch/pull/84928 Approved by: https://github.com/razarmehr commit 1a81ab3ba58d23566ba6cecd0d30eafe93dc7bc8 Author: Shen Li Date: Wed Sep 14 04:48:10 2022 +0000 Test tracing consecutive comms on the same input tensor (#84980) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84980 Approved by: https://github.com/wanchaol commit 0f30059227c6e64ca0e503b6bd3f436d837cdcef Author: Shen Li Date: Wed Sep 14 04:48:10 2022 +0000 Remove eager mode support from CommTensor (#84978) We don't need eager mode support (automatic wait on read) for now. Removing that to simplify the code. We can always add this back if necessary in the future. Note that we still need the eager mode code in `__torch_dispatch__`, as `make_fx` will also run the ops in eager mode to get the output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84978 Approved by: https://github.com/wanchaol commit b6d6a78c12e5869d0c738456e28155a3a2554ece Author: Michael Melesse Date: Wed Sep 14 15:50:14 2022 +0000 [ROCM] test_batchnorm_cudnn_nhwc (#84603) This PR enables test_batchnorm_cudnn_nhwc. This is a follow-up to https://github.com/pytorch/pytorch/pull/82512 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84603 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet commit 706b99030656c573619cebaa3be9298a575fc776 Author: PyTorch MergeBot Date: Wed Sep 14 14:07:58 2022 +0000 Revert "Python Dispatcher integration with C++ dispatcher (#84826)" This reverts commit 35f6a69191ef762cf22b6cbfe94b8d9406e16674. Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see https://hud.pytorch.org/pytorch/pytorch/commit/35f6a69191ef762cf22b6cbfe94b8d9406e16674 commit 74ead619446669ff638e56127759d432cf85a7ee Author: Howard Huang Date: Tue Sep 13 12:07:22 2022 -0700 [2/N] [Dispatchable Collectives] Extract ProcessGroup::Work into a separate class and update references (#83680) - Move ProcessGroup::Work into its own class and update all the references to it / header includes. In future PRs we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. This change is to prevent a circular dependency with ProcessGroup depending on Backend and Backend depending on ProcessGroup::Work. Differential Revision: [D38839212](https://our.internmc.facebook.com/intern/diff/D38839212) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83680 Approved by: https://github.com/kwen2501 commit 54c46e4f902209dc9cca2fbeb8181acf05409cb6 Author: atalman Date: Wed Sep 14 12:06:15 2022 +0000 Upgrade to CUDNN version for cuda 11.7 (#84964) Upgrade CUDNN version to 8.5 for cuda 11.7. This is a reland of: https://github.com/pytorch/pytorch/pull/84859 Issues in the periodic build should be fixed by: https://github.com/pytorch/pytorch/pull/84943 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84964 Approved by: https://github.com/ZainRizvi commit 6750946b820a3dff6de00f1ed93c9165e2f222b7 Author: Ivan Yashchuk Date: Wed Sep 14 12:03:11 2022 +0000 Skip validate_view_consistency for nvFuser tests (#84858) nvFuser's execute function always returns a copy for now. Ref.
https://github.com/pytorch/pytorch/pull/84629#discussion_r966375582 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84858 Approved by: https://github.com/mruberry, https://github.com/ngimel commit 35f6a69191ef762cf22b6cbfe94b8d9406e16674 Author: Michael Voznesensky Date: Tue Sep 13 20:32:42 2022 +0000 Python Dispatcher integration with C++ dispatcher (#84826) Signed-off-by: Edward Z. Yang From @ezyang's original PR: There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interpose over ordinary C++ dispatch. The ingredients: We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key, and shunting to a Python implementation The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch. I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful. I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826 Approved by: https://github.com/ezyang commit 44c30c5d1ce9995a000d5a55cb87b168972e2801 Author: Jerry Zhang Date: Tue Sep 13 22:20:21 2022 +0000 [quant][docs] Add example for the error message for fixed qparam ops (#84666) Summary: att, since example makes it clearer what the user needs to do Test Plan: local test for the error message Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84666 Approved by: https://github.com/vkuzo, https://github.com/andrewor14 commit 55ca297d4e048c641d149a76f2fda7c9ce630ff6 Author: Edward Z. Yang Date: Tue Sep 13 16:46:15 2022 -0700 Remove enable_recursive_torch_dispatch (#84945) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84945 Approved by: https://github.com/soulitzer commit 922560b872415b96552e90eed521d7b91a7600b4 Author: Jane Xu Date: Wed Sep 14 02:52:54 2022 +0000 Removes unnecessary namespace of functions used only in einsum (#84955) This is cosmetic change that removes a few function declarations and derives values instead of hardcoding. 
This is step 1 in relanding a cleaner version of einsum with opt_einsum. See https://github.com/pytorch/pytorch/pull/60191 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84955 Approved by: https://github.com/soulitzer commit d26e9cd9b27b7e90c2650a1bd092b6b9682c56d5 Author: PyTorch MergeBot Date: Wed Sep 14 02:46:40 2022 +0000 [vision hash update] update the pinned vision hash (#84975) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84975 Approved by: https://github.com/pytorchbot commit b28d82cb1ddb4030f8c58a99f70e6af870829541 Author: PyTorch MergeBot Date: Wed Sep 14 02:44:34 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84912) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84912 Approved by: https://github.com/pytorchbot commit d00cabae7bc894b951d4b2c9c24c7d95bebd86e1 Author: Kurt Mohler Date: Wed Sep 14 01:39:24 2022 +0000 Fix `expectedFailureMeta` to avoid skipping tests (#84875) Fixes #84874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84875 Approved by: https://github.com/mruberry commit 8cbbd3a25f3bbac006cc7e3d8f43829235648a5c Author: Shen Li Date: Tue Sep 13 22:59:17 2022 +0000 Avoid nested CommTensor wrapping (#84963) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84963 Approved by: https://github.com/wanchaol commit 8ca057eb7179f4dfce47515309d12303fa1c11d9 Author: PyTorch MergeBot Date: Wed Sep 14 01:09:04 2022 +0000 Revert "Don't detach when making views; force caller to detach (#84893)" This reverts commit 3bb8d6a93cc4cc4403dd2e3dfcd39b841c71a3c3. 
Reverted https://github.com/pytorch/pytorch/pull/84893 on behalf of https://github.com/malfet due to Broke MPS, see https://hud.pytorch.org/pytorch/pytorch/commit/3bb8d6a93cc4cc4403dd2e3dfcd39b841c71a3c3 commit a185dc2e631c7bc25213f0fa4c4cc41851737079 Author: Arindam Roy Date: Wed Sep 14 00:41:12 2022 +0000 [ROCm] re-enable tensorexpr and test_openmp (#81367) The following tests are being re-enabled for ROCm: - test_openmp.py - TestTensorExprPyBind tests in test_tensorexpr_pybind.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/81367 Approved by: https://github.com/jeffdaily, https://github.com/malfet commit cb9ef4668ed37460d99cc8ee3d9960fef2075902 Author: Nayef Ahmed <22487263+Nayef211@users.noreply.github.com> Date: Wed Sep 14 00:35:36 2022 +0000 Updated library level maintainers for torchtext (#84950) - Updated library level maintainers for torchtext to reflect internal changes to the team Pull Request resolved: https://github.com/pytorch/pytorch/pull/84950 Approved by: https://github.com/mthrok commit d05a11337c8aafb663ca3b29722c5219d1589fec Author: Nikita Shulga Date: Tue Sep 13 16:36:57 2022 +0000 [CMake] Add functorch target (#83464) Move functorch/functorch into `functorch` folder - Add functorch/CMakeLists.txt that adds `functorch` native python exension - Modify `setup.py` to package pytorch and functorch together into a single wheel - Modify `functorch.__version__` is not equal to that of `torch.__version__` - Add dummy `functorch/setup.py` file for the projects that still want to build it Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464 Approved by: https://github.com/zou3519 commit 26b59862978ccdff925c1b457eb003b334143736 Author: Masaki Kozuki Date: Wed Sep 14 00:01:06 2022 +0000 `ReflectionPad` supports `BFloat16` (#84949) Just by looking at some commits, I didn't find why BFloat16 isn't there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84949 Approved by: https://github.com/ngimel commit fdd366541333330387d0b262da8357984e0d311f Author: Taylor Robie Date: Tue Sep 13 14:04:10 2022 -0700 [Profiler] Make `LibKinetoClient::stop()` directly call `ProfilerStateBase::pop` (#83965) It has been discussed earlier in the stack at length, but if profiler fails after it pops the profiler state but before stopping Kineto then the next profiler call will see `LibKinetoClient::stop()` try to clean up the prior run (which it still thinks is active) by calling `disableProfiler()`. (Which fails because there is not an active profiler.) This PR addresses the issue rather bluntly by simply rug pulling any active profiler from `LibKinetoClient::stop()`. I'm not particularly fond of this solution and we should refine the semantics in the future, but for now it has the desired effect of returning to a clean state. Earlier PRs in this stack cleaned up some of the lifetime management such that objects being destroyed triggers appropriate cleanup. As a result it is no longer catastrophic to simply pop the profiler state and let the destructor chain clean up. Differential Revision: [D38958237](https://our.internmc.facebook.com/intern/diff/D38958237/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83965 Approved by: https://github.com/slgong-fb commit 3bb8d6a93cc4cc4403dd2e3dfcd39b841c71a3c3 Author: Edward Z. Yang Date: Tue Sep 13 11:42:12 2022 -0700 Don't detach when making views; force caller to detach (#84893) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893 Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad commit ec916bf6afcfa91305bb69d1bedbd6dafccb7c95 Author: Kevin Stephano Date: Tue Sep 13 23:28:39 2022 +0000 Create Cache for Fusion Reuse in NVFuser in Python Frontend for Primtorch (#83267) This PR does the following: - Replaces the `FusionOwner` with a `FusionCache` and `FusionInterface`. The `FusionCache` is a singleton that contains a cache of Fusions based on the `FusionDefinition`. It replaces the TorchScript graph caching that looked up a Fusion based on a stringified and canonicalized representation of the TorchScript graph with a prefix tree of statements in the `FusionDefinition`. The `FusionInterface` is an object that represents a Fusion in python. It can also query the cache based on id. - The ability to print out a mechanically derived definition, in python, for the user to use when debugging was added. - Replaces the python `examples` directory with true python tests under `test/test_nvfuser_frontend.py`. - Adds a set of C++ tests under the `test` directory to verify the `FusionCache`, `FusionDefinition`, and parts of the `RecordFunctor` child classes. - Adds a README file to explain how to use the Python Frontend While there are 3,000+ line edits, the bulk of the changes were repetitive line changes to the python bindings for each operation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83267 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98 commit 59bb5c933b051226572a08f36a110576d9abaf29 Author: Mike Ruberry <38511765+mruberry@users.noreply.github.com> Date: Tue Sep 13 23:17:19 2022 +0000 Adds mruberry as superuser (#84869) (so PRs I approve can be merged) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84869 Approved by: https://github.com/malfet, https://github.com/seemethere commit c61e89545ec37751317697040f4d391ff2cda819 Author: XiaobingSuper Date: Tue Sep 13 04:24:14 2022 -0400 disable onednn gelu for empty input (#84926) This PR is about disabling onednn gelu for empty input, fix https://github.com/pytorch/pytorch/issues/78152. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84926 Approved by: https://github.com/Lezcano, https://github.com/zou3519 commit 25d91e0a9d5eb2e366076850e2e81bf968cc0fbb Author: atalman Date: Tue Sep 13 23:00:09 2022 +0000 Updating cudnn_frontend to 0.7.1 (#84943) Updating cudnn_frontend to 0.7.1 To enable CUDNN 8.5 integration cc @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/84943 Approved by: https://github.com/huydhn, https://github.com/malfet commit 36d79143cef8847a0d6455d65f52a5ef9f23471b Author: PyTorch MergeBot Date: Tue Sep 13 22:54:53 2022 +0000 Revert "[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675)" This reverts commit bb4e96c9644a034e593085026b781ee78a4d6a77. Reverted https://github.com/pytorch/pytorch/pull/84675 on behalf of https://github.com/osalpekar due to causing asan xplat link-time errors like ld.lld: error: undefined symbol: torch::jit::has_jit_decomposition(c10::FunctionSchema const&) commit 38192f63cdca6b80e6eb369a2eddad7728f0492c Author: Rodrigo Kumpera Date: Tue Sep 13 21:57:46 2022 +0000 Add __all__ for a few distributed modules plus a little typing (reland) (#84872) This handles distributed_c10d, which is massive and ddp_comm_hooks. This relands #84119 with the required fixes. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84872 Approved by: https://github.com/rohan-varma commit 53c71e214233b0e97c1cb2bf02676a9f800c1e91 Author: kshitij12345 Date: Tue Sep 13 21:02:53 2022 +0000 [functorch] test - vmapjvpvjp (#83375) Adds `vmapjvpvjp` test to `functorch` Runtime of the test: ``` = 856 passed, 250 skipped, 16175 deselected, 137 xfailed, 197 warnings in 2231.84s (0:37:11) = ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83375 Approved by: https://github.com/zou3519 commit b4a881afac62989d130c8a92b4c83d16ccc7384a Author: Jithun Nair Date: Tue Sep 13 20:43:42 2022 +0000 [ROCm] Remove gfx900 from base docker build and Pytorch build scripts (#80015) CI doesn't have any MI25s anymore. Should improve docker and Pytorch build times in CI for ROCm. Will take out of Draft mode after https://github.com/pytorch/pytorch/pull/79596 is merged Pull Request resolved: https://github.com/pytorch/pytorch/pull/80015 Approved by: https://github.com/jeffdaily, https://github.com/malfet commit 0e8c5cf8477e3235a7574c9436f30bbcbcd82e89 Author: Jianyu Huang Date: Tue Sep 13 20:42:52 2022 +0000 Revert D34636039: Multisect successfully blamed D34636039 for test or build failures (#84942) Test Plan: NA Reviewed By: jianyuh Differential Revision: D39373091 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84942 Approved by: https://github.com/xuzhao9 commit 81da50a972fc402a6dd880fe392af0f0051cb6de Author: Nikita Shulga Date: Tue Sep 13 00:59:59 2022 +0000 Return device count using nvml (#84879) Fixes https://github.com/pytorch/pytorch/issues/83973 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84879 Approved by: https://github.com/ngimel commit 94f20c3514ce16f637d9863b867bac3ec6f2d9ce Author: Nikita Shulga Date: Mon Sep 12 20:22:37 2022 +0000 Memoize `torch.cuda.device_count` (#84878) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84878 Approved by: https://github.com/ngimel commit bda8a5729b2ee739c9b8dd6bf2696ba3b0bdec78 Author: drisspg Date: Tue Sep 13 20:35:58 2022 +0000 [Nested Tensor] Create differentiable nt to tensor view functions (#83371) This PR attempts to implements 2) "the safe way" of creating a view of nested tensor that returns a regular tensor. The rest of the break down is here: https://fb.quip.com/J8QCAx41af11 https://gist.github.com/drisspg/8622e9c97d374fa920ac647e1167cabc This is a short list of some edge cases. After some more work I was able to address two of the test cases in the above gist. There are few complex aspects here that I left defeated comments inline. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83371 Approved by: https://github.com/bdhirsh commit fa86874bbddfdbd2f4095e4084a3e1b2f81fde50 Author: Peter Bell Date: Tue Sep 13 17:13:37 2022 +0000 Fix intermittent link errors in NCCL build (#84245) Should fix #13362 and fix #83790 I think I've discovered the root cause of the intermittent nccl link failures. If we look at the variable name in the redefinition error: ``` _02021d91_11_sendrecv_cu_0bc7b9c8_11152 ``` this is the name of the file being compiled + some form of unique ID. As part of NCCL's build process, the same file is compiled multiple times with different macro definitions depending on which operator and dtype are being compiled, e.g. ``` nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -dc sendrecv.cu -o sendrecv_sum_i8.o ``` Since the filename parts are the same, then if the unique IDs also happen to collide then the entire identifier will collide and the link fails. 
So the fix here is to generate a unique `.cu` file for each object file. I've implemented this as a `.patch` file that gets applied from our cmake code, but if we instead fork nccl that would be cleaner. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84245 Approved by: https://github.com/janeyx99, https://github.com/malfet commit 74d0c64708c79351ca8b43992422f6b647f46a9f Author: Edward Z. Yang Date: Tue Sep 13 08:50:57 2022 -0700 Don't use reentrant dispatch for composite compliance (#84909) I believe these were added in to prevent changing behavior when https://github.com/pytorch/pytorch/pull/75827 landed, but I actually think they are unnecessary, and they are causing asserts to fire on the subsequent PR (where I assert that tensors returned by views MUST NOT already have view metadata associated with them.) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84909 Approved by: https://github.com/zou3519, https://github.com/soulitzer commit b4799736ee6cb5941c8aff1b20e1ca19372e98e2 Author: Thomas Orozco Date: Tue Sep 13 18:41:15 2022 +0000 autograd: fix non-deterministic output in codegen comments (#84695) Summary: Like it says in the title. Currently, this will return output like this: In Buck1, that's OK because Buck1's caching doesn't really care too much about However, in Buck2, this is a disaster, because caching is based exclusively on inputs and outputs and The diff here proposes making the path relative to the codegen script itself, which should carry about as much info, but avoid cache misses. Concretely, this: ``` // generated from /dev/shm/uid-34135/cfbc5712-seed-nspid4026533424_cgpid2794673-ns-4026533443/tools/autograd/templates/python_functions.h ``` Becomes, this: ``` // generated from ../tools/autograd/templates/python_functions.h ``` So, we keep the useful part, and we get caching. This matters because those headers are used in actions like: ``` fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops -- action (cxx_compile gen_embedding_backward_adam_split_unweighted_cuda.cu (pic)) ``` Those actions take upwards of 5 minutes to finish, so by allowing a cache hit, we are a) saving our users a lot of time and b) saving some RE capacity as well. This actually matters a lot because right now those targets are produced by `//caffe2:generate-code`, which itself doesn't get cache hits from RE because `generate_code.par` is non-deterministic (this is, unfortunately, true of PARs in general), so that rule introduces non-determinism that the codegen propagates and we get zero caching. This diff doesn't fix `//caffe2:generate-code`'s inputs being non-deterministic, but it does fix its *outputs* being non-deterministic, which means the non-determinism stops there, and we get back to cache hits. 
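A minimal sketch of the idea (a hypothetical helper, not the actual codegen code): emit the template path relative to the generating script, so the comment no longer embeds the build sandbox's absolute path.

```python
import os

def generated_from_comment(template_path: str, codegen_script: str) -> str:
    # Make the path relative to the codegen script's directory so the emitted
    # comment is identical across build sandboxes and stays cache-friendly.
    rel = os.path.relpath(template_path, start=os.path.dirname(codegen_script))
    return f"// generated from {rel}"

# e.g. an absolute sandbox path to python_functions.h becomes
# "// generated from ../tools/autograd/templates/python_functions.h"
```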
Test Plan: - CI ``` buck2 build fbcode//caffe2:generate-code buck2 build fbcode//deeplearning/fbgemm/fbgemm_gpu/codegen:embedding_ops ``` Reviewed By: ndmitchell Differential Revision: D39348565 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84695 Approved by: https://github.com/soulitzer commit 2e65f187cdc8fd461c77c501cf7ec40d76f0b34f Author: Nikita Shulga Date: Tue Sep 13 00:42:19 2022 +0000 [Functorch] Delete unused files (#83777) Differential Revision: [D39032967](https://our.internmc.facebook.com/intern/diff/D39032967) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83777 Approved by: https://github.com/zou3519 commit 33352336b443dbfce1394f6b4950e8f33eff2cef Author: Andrew Gu Date: Tue Sep 13 00:22:53 2022 +0000 [FSDP] Add rate limiter (#83917) **Overview** This PR adds a `bool` argument `limit_all_gathers` to the FSDP constructor, defaulted to `False`. - Setting `limit_all_gathers=True` limits the max number of inflight all-gathers to 2 (an empirically chosen constant), preventing a fast CPU thread from over-allocating blocks to the all-gather stream. - When experiencing a high number of CUDA malloc retries, the limiter can help reduce the number and hence lead to QPS improvement. **Exploration** I experimented with both a count-based limiter and size-based limiter (where the size is based on the inflight all-gather size in bytes). - The size-based limiter did not provide any advantage, only confusing the developer and user alike on what threshold to set. - For the count-based approach, I decided not to expose the max number of inflight all-gathers to the user since values other than 2 do not show improvements and exposing the knob may confuse users. **T5-11B** T5-11B evidences the performance gain from enabling the limiter and that a limit of 2 is a reasonable choice. This is run on an AWS cluster with 8 A100s per node and EFA. For both 2 and 4 nodes, we scale the batch size maximally before hitting OOM, which is a common practice.
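Enabling the limiter is a one-argument change at wrap time (a usage sketch; the module and process-group setup are assumptions, not part of this PR's text):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Requires an initialized process group (e.g. launched via torchrun with
# dist.init_process_group) and a CUDA device.
my_module = nn.Linear(8, 8).cuda()

# limit_all_gathers=True caps the number of inflight all-gathers (to 2) so a
# fast CPU thread cannot over-allocate blocks to the all-gather stream.
model = FSDP(my_module, limit_all_gathers=True)
```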

For 2 nodes, the limit of 2 yields 3.01x QPS improvement, and for 4 nodes, the limit of 2 yields 2.87x QPS improvement. We need more data points, but the limiter may simplify the batch size scaling workflow. Normally, a practitioner may scale until hitting OOM and back off until there are few CUDA malloc retries. However, now the practitioner may be able to scale until hitting OOM and simply turn on the limiter to reduce the number of retries instead of backing off. Differential Revision: [D39331201](https://our.internmc.facebook.com/intern/diff/D39331201) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83917 Approved by: https://github.com/zhaojuanmao commit 39676a977f7dc91c5c05cce8c93f0cb8481fc3da Author: Andrew Gu Date: Tue Sep 13 00:22:53 2022 +0000 [FSDP][Easy] Save unpadded/padded unsharded sizes as attributes (#84366) Differential Revision: [D39331199](https://our.internmc.facebook.com/intern/diff/D39331199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84366 Approved by: https://github.com/rohan-varma commit afcc7c7f5c7cef740241ff0abdae8d4f2ad22a03 Author: Andrew Gu Date: Tue Sep 13 00:22:52 2022 +0000 [FSDP] Generalize prefetching; lower unshard/reshard to handle (#83665) - `self.sharding_strategy` - If the world size is 1, I clamp the sharding strategy to `NO_SHARD`, regardless of the passed-in sharding strategy, since the behavior is fully equivalent. This absolves the need for `p._is_sharded or self.world_size == 1` checks in the core code. Once we fully shift the paradigm to using handles, this should result in a clear net positive. However, for now, we still have some places where we interface directly with the `FlatParameter`, in which case we have some temporary hacky code. - `HandleConfig` - As a part of the new design abstraction, much logic is lowered to the `FlatParamHandle`. This requires the handle be aware of mixed precision, CPU offloading, sharding strategy, and the process group (for world size > 1). To be less error-prone, I re-defined the `dataclass`s and `enum`s for the handle. These can be removed and coalesced with the existing ones. - The drawback is that the `FlattenParamsWrapper` constructor now takes in the `HandleConfig` to forward it to the `FlatParamHandle` constructor. I tolerate this since we plan to retire the FPW. For now, the handle's process group attributes are set later when we call `handle.shard()`. - We will dive into this logic lowering later. For now, the idea is we need to pass some extra info to the handle, which must go through the FPW. - `FullyShardedDataParallel._shard_parameters()` -> `FlatParamHandle.shard()` - [Important] Generalizing attributes to remove the 1 `FullyShardedDataParallel` : 1 `FlatParameter` assumption - **Before:** `_fsdp_graph_order`, `_pre_backward_hook_full_params_prefetched`, `_forward_full_params_prefetched`, `reshard_after_forward` are with respect to 1 `FullyShardedDataParallel` - **After:** (1) We use `FlatParamHandle` in place of `FullyShardedDataParallel`. (2) The atomic unit for forward and pre-backward is a _group_ of handles involved in the same module's forward/pre-backward. This is represented as `Tuple[FlatParamHandle, ...]`. For now, this is **always a singleton tuple**, but this shift enables a module having multiple FSDP parameters (which we have use cases for). - `_reset_lazy_init()` attributes - The prefetched flags are merged into `self._handles_prefetched`, which is directly defined in the constructor. 
`reshard_after_forward` is retired since it can be fully determined by other attributes (`_is_root` and `sharding_strategy`). The first step is to read the existing `_rebuild_full_params()`. A few notable observations: - It returns `Tuple[Tensor, bool]`. The first element is the _padded unsharded flattened parameter_, and the second element is whether we can free it upon exiting `summon_full_params()`. This return value is **only used in `summon_full_params()`**. - If parameter mixed precision is enabled and the `FlatParameter` is already unsharded, then the low precision shard (`_mp_shard`) is still re-allocated on GPU. (It is freed at the end of the method.) - If CPU offloading is enabled and the `FlatParameter` is already unsharded, then there is a no-op `p.data = p.data.to(self.compute_device, non_blocking=True)`. - Inside `summon_full_params()`, `mixed_precision_cast_ran` is always `False`. Therefore, the return value for the `not p._is_sharded and mixed_precision_cast_ran` branch is unused. -`summon_full_params()` can only be called (before forward or after backward) or (between forward and backward). Given this, I cannot think of a case where we call `summon_full_params()`, the `FlatParameter` is already unsharded, but `reshard_after_forward` is `True`. The `FlatParameter` should be sharded (before forward or after backward), and the `FlatParameter` may only be unsharded (between forward and backward) if `reshard_after_forward` is `False`. - If parameter mixed precision is enabled and the sharding strategy is a sharded one, then inside `summon_full_params()`, the `FlatParameter` is unsharded in full precision. This involves allocating a new padded unsharded flattened parameter on GPU in full precision since `_full_param_padded` is in the low precision. Some comments: - Ideally, we reduce the complexity of the core code path: i.e. unshard for forward and pre-backward. If the return value is only used for `summon_full_params()`, we should consider if we can compartmentalize that logic. - The branching is complex, and some return values are never used, where this fact is not immediately obvious. We should see if we can reduce the branch complexity. Disclaimer: The difference in attribute semantics between `NO_SHARD` and the sharded strategies makes it challenging to unify the cases. This PR does not attempt to address that since it requires more design thought. However, it does attempt to reduce the complexity for the sharded strategies. Let us trace through the new logical unshard. 1. `FullyShardedDataParallel._unshard(self, handles: List[FlatParamHandle], prepare_gradient: bool)` - This iterates over the handles and calls `handle.pre_unshard()`, `handle.unshard()`, and `handle.post_unshard(prepare_gradient)` in the all-gather stream. 2. `FlatParamHandle.needs_unshard(self)` - We take an aside to look at this key subroutine. - For `NO_SHARD`, this returns `False`. - For sharded strategies, this checks if the padded unsharded flattened parameter is allocated. The padded unsharded flattened parameter is the base tensor for the unpadded unsharded flattened parameter, which is a view into the padded one. Thus, the padded one's allocation fully determines if the `FlatParameter` is unsharded. - For sharded strategies, to accommodate the parameter mixed precision + `summon_full_params()` case, we introduce `_full_prec_full_param_padded`, which is the padded unsharded flattened parameter in full precision. 
The helper `_get_padded_unsharded_flat_param()` takes care of this casing and returns the padded unsharded flattened parameter. Instead of allocating a new tensor each time, we manually manage `_full_prec_full_param_padded`'s storage just like for `_full_param_padded`. 3. `FlatParamHandle.pre_unshard(self)` - For sharded strategies, the postcondition is that the handle's `FlatParameter` points to the tensor to all-gather. This should be on the communication device and in the desired precision. The allocation and usage of the low precision shard for parameter mixed precision and the CPU -> GPU copy for CPU offloading both classify naturally in the pre-unshard. - For sharded strategies, if the `FlatParameter` does not need to be unsharded, `pre_unshard()` is a no-op. This avoids unnecessarily allocating and freeing the low precision shard. - For `NO_SHARD`, we simply preserve the existing semantics. 4. `FlatParamHandle.unshard(self)` - If the handle was resharded without freeing the padded unsharded flattened parameter (e.g. `summon_full_params()` between forward and backward when `reshard_after_forward=False`), then the `FlatParameter` points to the sharded flattened parameter. We need to switch to using the unsharded parameter. This is a design choice. Alternatively, we may not switch to using the sharded flattened parameter in `reshard()` if we do not free the padded unsharded flattened parameter. However, the postcondition that the `FlatParameter` points to the sharded flattened parameter after `reshard()` is helpful logically, so I prefer this approach. - Otherwise, this allocates the padded unsharded flattened parameter, all-gathers, and switches to using the unpadded unsharded flattened parameter. - In the future, we may add an option to `unshard()` that additionally all-gathers the gradient. 5. `FlatParamHandle.post_unshard(self, prepare_gradient: bool)` - For sharded strategies, if using parameter mixed precision, this frees the low precision shard. More generally, this should free any sharded allocations made in `pre_unshard()` since the all-gather has been launched. If using CPU offloading, the GPU copy of the local shard goes out of scope after `unshard()` and is able to be garbage collected. **We should understand if there is any performance difference between manually freeing versus deferring to garbage collection since our usage is inconsistent.** For now, I preserve the existing semantics here. - `prepare_gradient` is meant to be set to `True` for the pre-backward unshard and `False` for the forward unshard. This runs the equivalent logic of `_prep_grads_for_backward()`. - This post-unshard logic (notably the gradient preparation) now runs in the all-gather stream, which is fine because we always have the current stream wait for the all-gather stream immediately after `FullyShardedDataParallel._unshard()`. IIUC, we do not need to call `_mp_shard.record_stream(current_stream)` (where `current_stream` is the default stream) because `_mp_shard` is allocated and freed in the same (all-gather) stream. - A postcondition is that the `FlatParameter` is on the compute device. It should also have the unpadded unsharded size (though I do not have a check for this at the moment). Now that we see how the logical unshard has been reorganized for the core code path, let us dive into `summon_full_params()`. The two constraints are: 1. If using parameter mixed precision, we should unshard in full precision. 2. 
We must determine if we should free the padded unsharded flattened parameter upon exiting. The first constraint is addressed as described before in the core unshard code path, so it remains to explore the second constraint. I propose a simple rule: **We free iff we actually unshard the `FlatParameter` in `summon_full_params()`** (i.e. it was not already unsharded). We perform a case analysis: **Parameter mixed precision enabled:** * `NO_SHARD`: `flat_param.data` points to `flat_param._local_shard`, which is the full precision unsharded flattened parameter. This is **not safe to free**. * `FULL_SHARD` / `SHARD_GRAD_OP`: We force full precision and all-gather to `_full_prec_full_param_padded`. We do not support `nested summon_full_params()`, so `_full_prec_full_param_padded` must be unallocated. We unshard, and it is **safe to free**. **Parameter mixed precision disabled:** * `NO_SHARD`: This is the same as with mixed precision enabled. This is **not safe to free**. * `FULL_SHARD` / `SHARD_GRAD_OP`: We all-gather to `_full_param_padded`. It may already be unsharded. * Already unsharded: The unshard is a no-op. This is **not safe to free**. * For `FULL_SHARD`, this can happen for the root FSDP instance after `forward()` but before backward. * For `SHARD_GRAD_OP`, this can happen for all FSDP instances after `forward()` but before backward. * Needs unshard: We unshard. This is **safe to free**. Therefore, we see that it is not safe to free when using `NO_SHARD` and when using a sharded strategy but the `FlatParameter` is already unsharded. This is precisely the proposed rule. There were two notable edge cases that the existing code did not address. 1. The existing code tests if the `FlatParameter` is already unsharded by checking the allocation status of `_full_param_padded`. When using parameter mixed precision, this is the incorrect tensor to check. If `_full_param_padded` is allocated (e.g. when `reshard_after_forward=False` and calling `summon_full_params()` between forward and backward), the already-unsharded check is a false positive, and `summon_full_params()` does not correctly force full precision. https://github.com/pytorch/pytorch/issues/83068 - This PR's `needs_unshard()` check correctly routes to the appropriate padded unsharded flattened parameter depending on the calling context (i.e. if it needs to force full precision or not). 2. The existing code does not free the GPU copy of the padded unsharded flattened parameter when calling `summon_full_params(offload_to_cpu=True)`. It unshards the `FlatParameter`, moves the padded unsharded flattened parameter to CPU, and sets the `FlatParameter` data to be the appropriate unpadded view into the padded unsharded flattened parameter on CPU. However, `_full_param_padded` still points to the all-gathered padded unsharded flattened parameter on GPU, which is kept in memory. https://github.com/pytorch/pytorch/issues/83076 - This PR frees the GPU copy and reallocates it upon exiting `summon_full_params()`. This is essential for avoiding peak GPU memory usage from increasing as we recurse through the module tree. There may be some cases where we can avoid reallocation altogether, but that can be addressed in a follow-up PR. - This PR offloads the *unpadded* unsharded flattened parameter to CPU directly instead of the *padded* one. As far as I can tell, there is no need to include the padding since unflattening the original parameters does not require the padding. - The relevant code is in the context manager `FlatParamHandle.to_cpu()`. 
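The proposed rule can be summarized in a small decision helper (a sketch only; the function and argument names are hypothetical and do not match the real FSDP attributes):
```
def should_free_unsharded_flat_param(
    uses_sharded_strategy: bool,   # FULL_SHARD / SHARD_GRAD_OP vs. NO_SHARD
    was_already_unsharded: bool,   # padded unsharded flat param already allocated
) -> bool:
    # Free iff summon_full_params() actually performed the unshard.
    if not uses_sharded_strategy:
        # NO_SHARD: flat_param.data is the local (and only) copy -- never free.
        return False
    # Sharded strategies: if the FlatParameter was already unsharded (e.g.
    # between forward and backward with reshard_after_forward=False), the
    # unshard is a no-op and it is not safe to free.
    return not was_already_unsharded
```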
This PR removes the mixed precision stream usage. As is, I do not think there is any extra overlap being achieved by the stream usage. The low precision shard is allocated and copied to in the mixed precision stream ([code](https://github.com/pytorch/pytorch/blob/1f99bdfcc4a3f97d28471a531d2b69def762f6ba/torch/distributed/fsdp/fully_sharded_data_parallel.py#L1401-L1412)), and the current stream (in this case the all-gather stream) waits for the mixed precision stream ([code](https://github.com/pytorch/pytorch/blob/1f99bdfcc4a3f97d28471a531d2b69def762f6ba/torch/distributed/fsdp/fully_sharded_data_parallel.py#L1414)). However, we immediately schedule an all-gather that communicates that exact low precision shard ([code](https://github.com/pytorch/pytorch/blob/1f99bdfcc4a3f97d28471a531d2b69def762f6ba/torch/distributed/fsdp/fully_sharded_data_parallel.py#L3338)) with no other meaningful computation between. If we remove the mixed precision stream, the low precision shard is allocated and copied to in the all-gather stream (including the non-blocking CPU -> GPU copy if using CPU offloading). Under this PR's design, we may consider a "pre-unshard" stream for all logical pre-unshard data transfers if we want to overlap in the future. IIUC, the overlap opportunity exists if there are multiple `FlatParameter`s per module, and we only have the all-gather stream wait for the data transfer corresponding to the local shard it communicates, not the others. If we agree on removing the mixed-precision stream for now, I will remember to delete it from `_init_streams()`. Like with unshard, the first step is the look at the existing `_free_full_params()` and `_use_param_local_shard()`. A few notable observations: - For only `NO_SHARD`, `_free_full_params()` includes a call to `_free_mp_shard()`. - For `summon_full_params()`, there is a separate `_free_full_params_and_use_local_shard()` that duplicates the main logic of `_free_full_params()` and calls `_use_param_local_shard()`. - In `forward()`, if `reshard_after_forward=True`, we call `_free_full_params()` and then `_free_mp_shard()`. Hence, for `NO_SHARD`, the `_free_mp_shard()` is a no-op. - In the post-backward hook, we typically call `_free_full_params()` and `_free_mp_shard()`. The `_free_mp_shard()` is a no-op for `NO_SHARD` and if `reshard_after_forward=True`. Some comments: - The code certainly works, but some of the no-ops are subtle. When possible, we should make it clear when calls are no-ops or not. It is good that the existing code documents that `_free_mp_shard()` is a no-op in the post-backward hook when `reshard_after_forward=True`. However, there are still some non-obvious no-ops (around `NO_SHARD`). - We should see if we can avoid the duplicate `_free_full_params_and_use_local_shard()`. Let us trace through the logical reshard: 1. `FullyShardedDataParallel._reshard(self, handles: List[FlatParamHandle], free_unsharded_flat_params: List[bool])` - The two args should have the same length since they are to be zipped. - The goal of having `free_unsharded_flat_params` is that the caller should be explicit about whether the (padded) unsharded flattened parameter should be freed. The low precision shard is always meant to be freed (as early as possible), so there is no corresponding `List[bool]`. 2. `FlatParamHandle.reshard(self, free_unsharded_flat_param: bool)` - This frees the (padded) unsharded flattened parameter if `free_unsharded_flat_param` and switches to using the sharded flattened parameter. 
- Echoing back to forcing full precision in `summon_full_params()`, `_free_unsharded_flat_param()` frees the correct tensor by using `_get_padded_unsharded_flat_parameter()`. 3. `FlatParamHandle.post_reshard(self)` - I am not fully content with the existence of this method, but this seems to be an unavoidable consequence of `NO_SHARD`. Perhaps, this may be useful in the future for other reasons though. - Right now, this method is only meaningful for `NO_SHARD` + parameter mixed precision + outside `summon_full_params()`. `_mp_shard` is not freed in the post-unshard since it is also the low precision _unsharded_ flattened parameter, so we must delay the free until the post-reshard. Below the `FlatParamHandle.reshard()` and `post_reshard()` layer, there should not be any no-ops. One final comment I will mention is that I like the `pre_unshard()`, `unshard()`, `post_unshard()`, and `reshard()`, `post_reshard()` organization because it makes it clear what the boundaries are and their temporal relationship. Through that, we can set pre- and post-conditions. Furthermore, we can eventually convert logic to hooks that may be registered on the `FlatParamHandle` (for `pre_unshard()`, `post_unshard()`, and `post_reshard()`). This may improve the customizability of FSDP. - This PR reorganizes `forward()` in preparation for non-recursive wrapping, which uses pre-forward and post-forward hooks that expect the signature `hook(module, input)`. For FSDP, the `module` and `input` arguments are not used. - This PR creates a new method `_fsdp_root_pre_forward()` to handle the logic only the root FSDP should run. Finally, we dive into the prefetching changes. Some highlights: 1. This PR unifies the execution order validation and prefetching implementations. - Both involve the execution order and can be unified to share some boilerplate. 2. Execution order validation only runs when the distributed debug level is `INFO`. - We have yet to have one success case where we actually catch an unintended source of dynamism. The warning is also too verbose. Hence, we are gating it by the `INFO` level. 3. This PR moves prefetching to be with respect to groups of handles (as mentioned in the constructor comment). - This is essential for supporting prefetching with non-recursive wrapping. 4. This PR does not include "bubbles", i.e. modules with no handles, in the recorded execution order(s). This deviates from the existing implementation. - This makes prefetching possibly more aggressive (when there are such bubbles), but it should not have significant performance implications either way. 5. This PR changes backward prefetching to reset the post-forward order each iteration (as intended). 6. This PR changes forward prefetching to use the first iteration's pre-forward order instead of the first iteration's post-forward order. (We can discuss whether we want this in this PR or not. Otherwise, I can keep it as using the post-forward order to preserve the existing semantics.) This PR also removes the `all_gather_stream.wait_stream(current_stream)` before forward prefetching because it does not help with high GPU reserved memory. We can add that back if desired. The existing PT-D FSDP pre-backward prefetching uses the reverse post-forward order.
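To make the phase boundaries above concrete, here is a skeletal sketch of the `pre_unshard()` / `unshard()` / `post_unshard()` and `reshard()` / `post_reshard()` organization (method bodies elided; `FlatParamHandleSketch` and `_unshard_driver` are hypothetical illustration names, not the real FSDP code):
```
class FlatParamHandleSketch:
    def pre_unshard(self) -> None:
        # Postcondition: flat_param points at the tensor to all-gather,
        # on the communication device and in the desired precision
        # (low-precision shard allocation / CPU -> GPU copy happen here).
        ...

    def unshard(self) -> None:
        # Allocate the padded unsharded flat parameter, all-gather into it,
        # and switch to using the unpadded unsharded view.
        ...

    def post_unshard(self, prepare_gradient: bool) -> None:
        # Free pre-unshard allocations (e.g. the low-precision shard) now
        # that the all-gather is launched; optionally prepare gradients
        # for the pre-backward unshard.
        ...

    def reshard(self, free_unsharded_flat_param: bool) -> None:
        # Optionally free the padded unsharded flat parameter and switch
        # back to using the sharded flat parameter.
        ...

    def post_reshard(self) -> None:
        # Only meaningful for NO_SHARD + parameter mixed precision outside
        # summon_full_params(): free _mp_shard here, since it doubles as
        # the low-precision unsharded parameter.
        ...


def _unshard_driver(handles, prepare_gradient: bool) -> None:
    # FSDP-level driver: run the three unshard phases for each handle in
    # the group (in the all-gather stream in the real implementation).
    for handle in handles:
        handle.pre_unshard()
        handle.unshard()
        handle.post_unshard(prepare_gradient)
```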
Model Code
```
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(4, 4, kernel_size=3),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=False),
        )
        self.block3 = nn.Linear(12, 8)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=(1, 1)),
            nn.Flatten(),
            nn.Linear(4, 10),
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        return self.head(x)


model = Model().cuda()
fsdp_kwargs = {}
model.block1[1] = FSDP(model.block1[1], **fsdp_kwargs)  # BN2d
model.block2[1] = FSDP(model.block2[1], **fsdp_kwargs)  # BN2d
model.block1 = FSDP(model.block1, **fsdp_kwargs)
model.block2 = FSDP(model.block2, **fsdp_kwargs)
model.block3 = FSDP(model.block3, **fsdp_kwargs)
model = FSDP(model, **fsdp_kwargs)
```
Execution Orders
```
Pre-backward hook for ('head.2.weight', 'head.2.bias') 140339520587136 (model)
Pre-backward hook for ('weight', 'bias') 140339461194656 (block3)
Pre-backward hook for ('0.weight', '0.bias') 140339520589776 (block2)
Pre-backward hook for ('weight', 'bias') 140339520587664 (block2 BN)
Pre-backward hook for ('weight', 'bias') 140339520586656 (block1 BN)
Pre-backward hook for ('0.weight', '0.bias') 140339520588768 (block1)

Pre-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('weight', 'bias') 140339461194656 (block3)

Reverse post-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('weight', 'bias') 140339461194656 (block3)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)
```
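Given a recorded order of handle groups like the ones above, the prefetching logic reduces to picking the group that follows the currently executing one. A hedged sketch (the names are illustrative, not the real FSDP helpers):
```
from typing import List, Optional, Tuple

# The atomic unit for prefetching: all handles in one module's forward
# (a singleton tuple today, but multiple FlatParameters are allowed).
HandlesKey = Tuple["FlatParamHandle", ...]


def next_handles_to_prefetch(
    order: List[HandlesKey],   # e.g. reverse post-forward order for backward
    current: HandlesKey,
) -> Optional[HandlesKey]:
    # Prefetch the all-gather for the group that comes right after the
    # group whose pre-backward (or pre-forward) hook is currently running.
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None
```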
Differential Revision: [D39293429](https://our.internmc.facebook.com/intern/diff/D39293429) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83665 Approved by: https://github.com/zhaojuanmao commit a2acead00256cb2580afe8297dda0ad0134fe21e Author: Andrew Gu Date: Tue Sep 13 00:22:52 2022 +0000 [FSDP][Easy] Minor cleanup (#84761) This PR simply pulls out some minor changes from the next (monolithic) PR. Differential Revision: [D39392147](https://our.internmc.facebook.com/intern/diff/D39392147) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84761 Approved by: https://github.com/zhaojuanmao commit 8c2da0616c217c7732f5893b7a5e7ee80b8af4ff Author: PyTorch MergeBot Date: Tue Sep 13 16:46:24 2022 +0000 Revert "Upgrade to CUDNN version for cuda 11.7 (#84859)" This reverts commit 9064bf2c721ee1df0e3698344412043eb80e4fa7. Reverted https://github.com/pytorch/pytorch/pull/84859 on behalf of https://github.com/atalman due to Reverting broke periodic tests commit 351ac63cddf992e7dfff0af058a4d175ac37e142 Author: nikitaved Date: Tue Sep 13 07:35:51 2022 -0500 coo binary_op intersection primitives (#83427) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83427 Approved by: https://github.com/bhosmer, https://github.com/amjames, https://github.com/cpuhrsch commit 3f047b2a90fddf55487cbe42c17558beb7f29903 Author: Nikolay Korovaiko Date: Mon Sep 12 22:36:16 2022 -0700 SymInt support for computeStride (#84905) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84905 Approved by: https://github.com/ezyang commit 8b8141e971ed97aa5633b412bffde6e5bf31187c Author: Nikolay Korovaiko Date: Mon Sep 12 22:36:16 2022 -0700 SymInt support for multiply_integers (#84904) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84904 Approved by: https://github.com/ezyang commit ecee6c742f3d88bd567dc2e95a9ecccdad674854 Author: Nikolay Korovaiko Date: Mon Sep 12 16:56:39 2022 -0700 StmInt support for InferSize (#84903) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84903 Approved by: https://github.com/ezyang commit 7e900f204f8494ab52f4ad089608c8cb008a273c Author: Edward Z. Yang Date: Mon Sep 12 22:02:36 2022 -0700 Avoid throwing an exception when ScriptList doesn't match. (#84921) This prevents 'catch throw' gdb breakpoint pollution and should also improve performance. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84921 Approved by: https://github.com/Chillee commit 33bb8ae350611760139457b85842b1d7edf9aa11 Author: erjia Date: Tue Sep 13 13:38:58 2022 +0000 Set shuffle to DataPipes with set_shuffle API (#83741) This PR requires PR is landed: https://github.com/pytorch/pytorch/pull/83202 - For `apply_shuffle_setting` and `apply_shuffle_seed`, it makes sure it will apply shuffle setting to each of DataPipe that contains a method called `set_shuffle` or `set_seed`. - Change the API from `apply_shuffle_seed` to `apply_random_seed`. - Fix a bug that `apply_shuffle_seed` only accepts DataPipe that is hashable. After the PR, this function uses `id` to prevent seeding the same DataPipe multiple times per epoch. - Fix another bug from `shuffler` that `reset` with `_enable=False` would also reset `_seed`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83741 Approved by: https://github.com/NivekT commit 7a9ab5c232f54430704456d18a22f99838489817 Author: Edward Z. 
Yang Date: Mon Sep 12 22:02:31 2022 -0700 Move Python argument related functions to cpp file (#84919) No changes to contents, just moving things out of header. I only moved the stuff I suspected I'd be editing; maybe more things from this header could migrate out. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84919 Approved by: https://github.com/suo commit 99cfaf9eeea0a6f20d0b11d211db379473db748e Author: Andrew Gu Date: Tue Sep 13 00:22:52 2022 +0000 [FSDP] Subtest prefetching for `test_fsdp_grad_acc.py` (#84601) This modifies `test_fsdp_grad_acc.py` to test all 3 current sharding strategies and subtests prefetching. Differential Revision: [D39293432](https://our.internmc.facebook.com/intern/diff/D39293432) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84601 Approved by: https://github.com/zhaojuanmao commit dbd38f63f5731f8403edfdf9d5956ca872453dd3 Author: John Detloff Date: Tue Sep 13 05:09:15 2022 +0000 Include CoreML error description in exception thrown when inference fails (#84804) Summary: Catch the error and throw an exception with PTMCoreMLBackend when inference fails. This way the error description will be available in the logged crash, as opposed to crashing with a less descriptive exception. I'll be drafting follow up diffs to actually catch exceptions in the segmentation shim next. Ideally we would fail inference gracefully and not crash at all, but at least after this diff we'll have the full diagnostic info Test Plan: Force an error, and confirm its description appears in the exception thrown via the console Differential Revision: D39407865 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84804 Approved by: https://github.com/mcr229 commit e980ff8eb912576f5846f14563ba1bc2ee297bce Author: Sergii Dymchenko Date: Tue Sep 13 04:04:04 2022 +0000 Remove unused method_assignments (#84917) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84917 Approved by: https://github.com/huydhn commit d951165bd8e35e486fddd5c80f469ff644e66971 Author: vfdev-5 Date: Tue Sep 13 03:54:07 2022 +0000 [C++ API] Added missing antialiasing path in interpolation C++ api (#84599) Description: Following https://github.com/pytorch/pytorch/pull/69318#issuecomment-1238433540 adding missing bicubic path for anti-alias flag to C++ frontend. - https://github.com/pytorch/pytorch/pull/70930 - added tests in pytorch/test/cpp/api/functional.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/84599 Approved by: https://github.com/kit1980, https://github.com/malfet commit 2fbc0fab20d4af520f69f158f8777e99ad761e1d Author: Huy Do Date: Tue Sep 13 03:06:11 2022 +0000 Setup sccache for linux test (#84916) The TTS alarm in HUD is quite noisy because of `pull / linux-focal-py3.7-gcc7 / test (backwards_compat)`. Some runs take up to 50m, i.e. [4893945786](https://github.com/pytorch/pytorch/actions/runs/3038960118/jobs/4893945786) while others take only 10m, i.e. [4893781147](https://github.com/pytorch/pytorch/actions/runs/3038943635/jobs/4893781147). Looking closer into their logs, it turns out that the longer runs have a much higher rate of cache miss. 
For example, [4893945786](https://github.com/pytorch/pytorch/actions/runs/3038960118/jobs/4893945786)
```
Compile requests 6487
Compile requests executed 6224
Cache hits 4975
Cache hits (C/C++) 4975
Cache misses 1227
Cache misses (C/C++) 1227
Cache timeouts 0
Cache read errors 0
Forced recaches 0
Cache write errors 0
Compilation failures 9
Cache errors 13
Cache errors (C/C++) 13
Non-cacheable compilations 0
Non-cacheable calls 16
Non-compilation calls 247
Unsupported compiler calls 0
Average cache write 0.096 s
Average cache read miss 11.681 s
Average cache read hit 0.040 s
Failed distributed compilations 0
Non-cacheable reasons:
multiple input files 15
unknown source language 1
Cache location S3, bucket: Bucket(name=ossci-compiler-cache-circleci-v2, base_url=http://ossci-compiler-cache-circleci-v2.s3.amazonaws.com/)
```
In https://github.com/pytorch/pytorch/pull/82103, we didn't set up `SCCACHE_S3_KEY_PREFIX` for `_linux-test`, which could explain the high rate of cache misses here. `backwards_compat` is a bit different from other tests in that it compiles PyTorch and gets the benefit from caching. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84916 Approved by: https://github.com/seemethere commit 9d5b3e4da8723bbac6879fc7cae8f27177f0c26d Author: Andrew Gu Date: Tue Sep 13 00:22:51 2022 +0000 [FSDP] Remove `forward_prefetch` (#84600) We are removing the `forward_prefetch` option. By the nature of async GPU kernel execution, launching the CPU kernel for the next layer's all-gather early does not actually improve performance. Moreover, the existing `forward_prefetch` uses the post-forward order instead of the pre-forward order, which leads to mis-targeted prefetched all-gathers. Differential Revision: [D39454217](https://our.internmc.facebook.com/intern/diff/D39454217) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84600 Approved by: https://github.com/zhaojuanmao commit 8f92140c4052ceecb104f6caf078fd8614c32be4 Author: PyTorch MergeBot Date: Tue Sep 13 02:43:15 2022 +0000 [vision hash update] update the pinned vision hash (#84913) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84913 Approved by: https://github.com/pytorchbot commit dc865bff4e9de5e02ea27d0b702a74c2bf63f02f Author: Seonglyong Gong Date: Tue Sep 13 01:48:41 2022 +0000 [Profiler] set_class util (part 1 of Record Optimizer) (#84779) Summary: Part 1 of Record Optimizer param_groups and states (https://github.com/pytorch/pytorch/pull/84063) - nnModule and Optimizer have duplicated parts - create a util function to avoid duplication Test Plan: buck run mode/opt //caffe2/test:profiler Differential Revision: D39397210 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84779 Approved by: https://github.com/robieta commit 6d222116a13d55c2aa2211938f9df686535fbd51 Author: Abhijit Deo <72816663+abhi-glitchhg@users.noreply.github.com> Date: Tue Sep 13 00:29:50 2022 +0000 [Documentation] Minor rendering issue (#84856) There is a rendering issue with the docstring of nn.GELU. Hope this fixes the [issue.](https://pytorch.org/docs/stable/generated/torch.nn.GELU.html) cc: @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/84856 Approved by: https://github.com/kit1980 commit 964fde7d7ceac67db6f0e30fc4a499d02904b09e Author: Edward Z.
Yang Date: Mon Sep 12 11:38:08 2022 -0700 Raise AttributeError for __origin__ to avoid C++ exception raise (#84880) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84880 Approved by: https://github.com/wconstab commit 260b716c65a17a6791fc70420de3553be220a3da Author: Dhruv Matani Date: Sun Sep 11 19:44:27 2022 -0700 [Mobile Tracer] Allow tracing multiple input models at once (#84833) Summary: For practical usage, folks may want to custom build PyTorch for support with multiple models. The current tracer allows tracing just one model. There are multiple way to address this limitation: 1. Provide a tool to merge multiple YAML files produced by each of these runs. Each run corresponds to a YAML file for a single model. 2. Allow the tracer to run multiple models at once. This PR implements the solution [2] above. Test Plan: Build the tracer using: `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop` Run with 1 input file: `./build/bin/model_tracer --model_input_path /tmp/path_to_model.ptl --build_yaml_path /tmp/selected_ops.yaml` Run with multiple input files: `./build/bin/model_tracer --model_input_path /tmp/path_to_model.ptl,/tmp/path_to_model.ptl --build_yaml_path /tmp/selected_ops.yaml` Both runs completed successfully. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84833 Approved by: https://github.com/JacobSzwejbka commit 5a29db142e4665577149e305db0583e55ef21683 Author: Xiao Wang <24860335+xwang233@users.noreply.github.com> Date: Mon Sep 12 22:45:38 2022 +0000 Use int64_t index type in multiplications to avoid integer overflow in max_pool2d and avg_pool2d on CUDA (#68682) Fix https://github.com/pytorch/pytorch/issues/68418 - [X] operator benchmark: https://github.com/xwang233/code-snippet/tree/master/pooling-bench-68682, 10% or worse regression are seen in some shapes - [X] end-to-end benchmark: no major regression seen in our test suites Pull Request resolved: https://github.com/pytorch/pytorch/pull/68682 Approved by: https://github.com/ngimel commit 918cd8b9bafe92da4209765c303b861eb10edf82 Author: David Eklov Date: Mon Sep 12 22:35:19 2022 +0000 [torch::deploy] Ignore return value of function declared with 'warn_unused_result' (#84862) Summary: Addresses the following build failure that we get on some of our internal build environments: caffe2/torch/csrc/deploy/environment.h:60:5: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result] system(rmCmd.c_str()); Test Plan: buck build //caffe2/torch/... 
Differential Revision: D39364411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84862 Approved by: https://github.com/PaliC commit 9b16bf04af4bdcc352998f77b96052f21567a2f2 Author: Nikita Shulga Date: Mon Sep 12 22:25:26 2022 +0000 Fix MPS test sanity (#84889) Follow up after https://github.com/pytorch/pytorch/pull/84834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84889 Approved by: https://github.com/tugsbayasgalan, https://github.com/janeyx99, https://github.com/ZainRizvi commit d09e8b23bf8c8c73f406b5610eda94e9dd3c2e96 Author: Ryan Spring Date: Mon Sep 12 22:19:06 2022 +0000 [primTorch] Add repeat and unfold_copy references (#81374) Add References: - repeat - unfold - expand_as Pull Request resolved: https://github.com/pytorch/pytorch/pull/81374 Approved by: https://github.com/mruberry, https://github.com/ngimel commit d6733327829fa02295c239ad96a26bef8afa6da4 Author: Ivan Zaitsev Date: Mon Sep 12 21:47:25 2022 +0000 Forward fix for FB internal breakage (manual export of internal diff D39421802) (#84871) D39421802 is a forward-fix for D39419569 (corresponds to #84806). The forward-fix is mostly internal-facing, but has a single line that has to be exported to GH first. Manual export was required, because automatic export failed due to diff dependecies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84871 Approved by: https://github.com/mehtanirav commit 5238404f4d8fcb21adbed7bd7a4a2a836e5764c2 Author: Kento Nozawa Date: Mon Sep 12 21:38:16 2022 +0000 Increment `version_range_max` (#84815) Python 3.10 should be added as a listing in `Programming Language` on https://pypi.org/project/torch/: Screenshot 2022-09-11 at 2 48 01 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84815 Approved by: https://github.com/malfet commit c85e47b36895d44e797536d2fbb45b3edc049767 Author: Andrew Gu Date: Mon Sep 12 21:31:01 2022 +0000 [BE][PT-D] Fix race on checkpoint file (#84881) Without calling `dist.barrier()` before removing the checkpoint file, rank 0 may run ahead and delete the checkpoint file before nonzero ranks are able to load from the checkpoint. This PR adds a `dist.barrier()` to ensure all ranks can load the checkpoint before rank 0 deletes it. For example, including the added `dist.barrier()`: https://github.com/pytorch/pytorch/blob/037e8eefcf0b669430211b83d19aedf2185ed6fc/torch/testing/_internal/distributed/distributed_test.py#L5068-L5098 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84881 Approved by: https://github.com/rohan-varma commit 3baa363f715547b2e9c569434fc7d2d226afd37e Author: Nikita Shulga Date: Fri Sep 9 22:52:12 2022 +0000 [Functorch] Make minpybind less likely to crash (#84788) Error handling in `vector_args::parse` is fundamentally broken, as it calls to [`_PyArg_ParseStackAndKeywords`](https://github.com/python/cpython/blob/000593c0f97ac9b75b56064a957b84a3aaa60674/Include/modsupport.h#L106) variadic function, which are akin to `sscanf` without any arguments, which results, in case of partial parse in a random segfault/stack corruption. 
Remedy it by passing a few references to dummy pyobjects Pull Request resolved: https://github.com/pytorch/pytorch/pull/84788 Approved by: https://github.com/zou3519 commit ccb1ff22333ed1d28ee5287d75a3322b12b93f6a Author: karanprime Date: Mon Sep 12 21:15:02 2022 +0000 Updated invalid type error message to explicitly say only float types… (#83170) … allowed Fixes #82983 I added an error message consistent with existing invalid type error messages (line 330 in torch/csrc/tensor/python_tensor.cpp). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83170 Approved by: https://github.com/ezyang, https://github.com/kit1980 commit cfeb53170051f4cf942ae683e1727fd48f60f18c Author: Alexander Grund Date: Mon Sep 12 20:59:17 2022 +0000 Fix failing test_model_dump due to empty file (#84744) The `torch.jit.save` call on a file object may not actually write the data to disk due to buffering. The call to `model_dump.main` on that file will then fail with an error like > zipfile.BadZipFile: File is not a zip file Inspecting the file confirms that it is either empty (usually) or incomplete (possible). Fix this by flushing the file after saving the model. Fixes #84745 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84744 Approved by: https://github.com/kit1980 commit cd3731bd1774ed3d152a5307c45cdbdc90ef8536 Author: Nikita Shulga Date: Mon Sep 12 18:28:23 2022 +0000 [BE] Refactor `_is_compiled()` function (#84877) Call it from `is_available()` and `device_count()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84877 Approved by: https://github.com/ngimel commit 31cc03cc132020244f6985aefdcd1c05babc2e17 Author: Jack Danger Date: Mon Sep 12 20:37:39 2022 +0000 fixing English typo in MPSFallback error message (#84834) Changing "current supported" to "currently supported" Pull Request resolved: https://github.com/pytorch/pytorch/pull/84834 Approved by: https://github.com/Chillee, https://github.com/kulinseth, https://github.com/kit1980 commit bb4e96c9644a034e593085026b781ee78a4d6a77 Author: soulitzer Date: Mon Sep 12 11:47:12 2022 -0400 [reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151) (#84675) This reverts commit acb4a09628284201281e262aaee58e3dc6be9c2b. In addition, we also fix a memory leak in layer norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84675 Approved by: https://github.com/zou3519 commit a2cccb2d6b0461b7f9c9922096a74240225ebc7b Author: Wenzhe Xue Date: Mon Sep 12 20:09:00 2022 +0000 add oneDNN graph fuser context API and unittest (#82491) Add oneDNN graph context manager API to be consistent with other fusers. NNC and nvFuser have two ways to use: 1) a function to enable/disable and 2) a context manager. The latter way is used extensively in libraries like Dynamo. Currently the oneDNN Graph fuser only has the former way. To promote the usage of the oneDNN graph fuser, this PR creates the context manager for it. This PR should not affect any performance. A unit test `test_context_manager` is added under `test/test_jit_llga_fuser.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82491 Approved by: https://github.com/malfet commit c304a1206bbffef33ed7c7c20aa0a4f1e169a32c Author: Andrew Gu Date: Mon Sep 12 17:24:12 2022 +0000 [FSDP][Easy] Remove unused functions (#84598) This removes some leftover functions from the constructor refactor.
Differential Revision: [D39293430](https://our.internmc.facebook.com/intern/diff/D39293430) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84598 Approved by: https://github.com/zhaojuanmao commit 9e5563dbb1cb789d41370b8ab6120c62a385a74a Author: Edward Z. Yang Date: Mon Sep 12 09:25:09 2022 -0700 Delete SymIntArrayRef wrapper struct (#84837) Since we separated at::foo and at::foo_symint there is no benefit to trying to make initializer lists work in both cases. So we can get rid of the special different struct. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837 Approved by: https://github.com/kit1980 commit 7e43c6f28e9b0699dd6f0a3803513feacc60eaf3 Author: titaiwang Date: Fri Sep 9 23:25:54 2022 +0000 [ONNX] replace AT_ASSERT with TORCH_INTERNAL_ASSERT (#84790) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84790 Approved by: https://github.com/kit1980 commit 034f2db1fdb253421e79bf36edca5423fd390e3a Author: PyTorch MergeBot Date: Mon Sep 12 19:04:07 2022 +0000 Revert "Delete SymIntArrayRef wrapper struct (#84837)" This reverts commit 9c78f599e40eac7fee027d86e03af06e251705b5. Reverted https://github.com/pytorch/pytorch/pull/84837 on behalf of https://github.com/ZainRizvi due to The test test_post_localSGD_optimizer_step_reload in the X linux-bionic-cuda11.6-py3.10-gcc7 workflow has started consistently failing since this PR was submitted commit c3df78f436d1906a920afabbd8c58af4ec8471d9 Author: Christian Puhrsch Date: Mon Sep 12 17:41:38 2022 +0000 TARGETs changes for flash attention and cutlass (#84781) Summary: Integrate flash attention and use it when the inputs align just right Test Plan: Unit tests and such Differential Revision: D39364603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84781 Approved by: https://github.com/mikaylagawarecki commit 9064bf2c721ee1df0e3698344412043eb80e4fa7 Author: atalman Date: Mon Sep 12 17:09:05 2022 +0000 Upgrade to CUDNN version for cuda 11.7 (#84859) Upgrade to CUDNN version for cuda 11.7 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84859 Approved by: https://github.com/malfet commit 4f6027b78a8f2e1fc07a50f9e0096de28ede429d Author: kshitij12345 Date: Mon Sep 12 16:59:05 2022 +0000 [opinfo] narrow: add new sample for Tensor overload (#84785) `narrow` accepts `start` argument to be a Tensor. We add a sample to test this overload. NOTE: This leads to a bunch of failed tests and hence the skips and xfails Pull Request resolved: https://github.com/pytorch/pytorch/pull/84785 Approved by: https://github.com/zou3519 commit a06f2edab63adc951afe1a8e3bf9ba606b729af1 Author: Dhruv Matani Date: Sat Sep 10 20:19:53 2022 -0700 [Build] Replace message() in caffe2/CMakeLists.txt with message in cmake/Summary.cmake (#84814) Summary: In [PR 84755](https://github.com/pytorch/pytorch/pull/84755), @cccclai noticed and mentioned the presence of `message(STATUS...)` logging in caffe2/CMakeLists.txt and suggested moving it to the file cmake/Summary.cmake. This PR addresses that comment/suggestion. 
Test Plan: Ran the build as `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop` and saw the following being printed:
```
-- BUILD_MOBILE_AUTOGRAD : OFF
-- BUILD_LITE_INTERPRETER: OFF
-- INTERN_BUILD_MOBILE :
-- TRACING_BASED : 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84814 Approved by: https://github.com/cccclai commit d6b2f5c6433ef3e63e046096f4ad54e26eb17d10 Author: Jesse Cai Date: Fri Sep 9 07:29:36 2022 -0700 [Quant][fx] Remove `remove_quant_dequant_pairs` and fix tests (#84203) Summary: - `remove_quant_dequant_pairs` removes ops when a `quant` is followed by a `dequant` - It looks like the quantized implementation of `layer_norm` only supports float weights, so updated the default qconfig to avoid quantizing the weight param. - Fixes broken test, `test_norm_weight_bias`. This was the only test that broke, because the default qconfig dict we pass in quantizes the weight. I just pulled the native qconfig object and converted it to a dict. - Adds in qconfig and backend config support for layernorm Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
```
Reviewers: Subscribers: Tasks: Fixes https://github.com/pytorch/pytorch/issues/83110 Tags: quant, fx Differential Revision: [D39395141](https://our.internmc.facebook.com/intern/diff/D39395141) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84203 Approved by: https://github.com/jerryzh168 commit e217b30b0fc98df226654cef0617eb41a177531f Author: Mikayla Gawarecki Date: Mon Sep 12 04:03:49 2022 +0000 Add `torch.nested` namespace (#84102) First step towards #83775 - only `to_padded_tensor` is moved to the nested namespace for now - following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`. ~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this be deleted since it is shared by `torch.nested.to_padded_tensor`?~~ [generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested) Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102 Approved by: https://github.com/drisspg commit 9c78f599e40eac7fee027d86e03af06e251705b5 Author: Edward Z. Yang Date: Mon Sep 12 09:25:09 2022 -0700 Delete SymIntArrayRef wrapper struct (#84837) Since we separated at::foo and at::foo_symint there is no benefit to trying to make initializer lists work in both cases. So we can get rid of the special different struct. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837 Approved by: https://github.com/kit1980 commit 8cdc0679b91d6e688e857b3e02caa2b4823d19ab Author: Jeff Daily Date: Mon Sep 12 15:20:51 2022 +0000 [ROCm][jiterator] unskip additional tests (#84371) Follow-up to #77982. Unskip additional jiterator tests for ROCm.
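As a usage note for the `torch.nested` namespace commit above, the relocated binding can be exercised roughly like this (a sketch; it assumes a build where the `torch.nested` namespace and the nested-tensor constructor are available):
```
import torch

# Two variable-length sequences packed into a nested tensor.
nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(4, 5)])

# Pad out to a dense (2, 4, 5) tensor, filling the gaps with 0.0.
padded = torch.nested.to_padded_tensor(nt, 0.0)
print(padded.shape)  # torch.Size([2, 4, 5])
```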
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84371 Approved by: https://github.com/ngimel, https://github.com/SherlockNoMad commit 2698f99dc7a2efe6d60fffa43beb545901a57c9b Author: Slava Kovalevskyi Date: Mon Sep 12 14:15:52 2022 +0000 fixing form link for governance (#84861) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84861 Approved by: https://github.com/malfet commit d2d145a40001d1e1f815a144160bd0b8d0f60ea0 Author: PyTorch MergeBot Date: Mon Sep 12 10:03:48 2022 +0000 [xla hash update] update the pinned xla hash (#84853) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84853 Approved by: https://github.com/pytorchbot commit 5ea2eb304ea6e9bab0c68fc57dbffbc068354ce7 Author: Horace He Date: Mon Sep 12 02:16:46 2022 +0000 Converted batch norm over to use symints (#84113) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84113 Approved by: https://github.com/wconstab, https://github.com/ezyang commit caf034a9a2326bbf70a8f042cb5b527b789b3062 Author: Edward Z. Yang Date: Sun Sep 11 06:29:09 2022 -0700 Fix bugs in how LTC decides whether or not to symint op or not (#84832) This fixes two problems: - First, shape signature didn't respect the symint property (so it would always mark the operator as symint). This was relatively easy to fix. - Second, the call to fallback goes directly through at::_ops, so it must always be SymInt-aware, even if SymInt is disabled externally. This was a bit more difficult, because the current LTC codegen is poorly factored. First, I needed to make it so individual arguments knew if they were going to be SymInt in LTC or not; second, I need to plumb enough information about the enclosing bindings so that I could use translate to do the expressions (previously, it was just assumed the signatures matched.) The LTC codegen would do well to have a complete rewrite, but this will have to do for now, I suppose. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84832 Approved by: https://github.com/wconstab commit bfc6db0a5af1acf2a4cb864c334f17bcd08fc079 Author: PyTorch MergeBot Date: Sun Sep 11 02:38:32 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84828) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84828 Approved by: https://github.com/pytorchbot commit 5f960db0e01839f1de8735060b374ea6cbd1713a Author: Dhruv Matani Date: Sat Sep 10 09:36:31 2022 -0700 [Mobile] Add support for dtypes and custom classes in model tracer (#84795) Summary: Currently, the model tracer generates the selected features YAML file only with used operators. This change adds support for dtypes and custom classes as well. We need to add the flag `-DENABLE_RECORD_KERNEL_FUNCTION_DTYPE` when building PyTorch in Instrumentation Mode (i.e. with `TRACING_BASED=1` for server builds) to enable capturing this data. 
Test Plan: Built using `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop` Ran the model tracer to observe this generated file: https://gist.github.com/dhruvbird/50e1860b39ae065e57d58f17e0912136 Then used the generated YAML to build pytorch (minimal build) using the command
```
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN=1 \
USE_LIGHTWEIGHT_DISPATCH=0 BUILD_LITE_INTERPRETER=1 \
SELECTED_OP_LIST=/tmp/selected_ops.yaml \
TRACING_BASED=1 \
./scripts/build_mobile.sh
```
After that I generated a binary using this command:
```
g++ /tmp/main.cpp -L build_mobile/lib/ -I build_mobile/install/include/ -ffunction-sections -fdata-sections -Wl,--gc-sections \
-lpthread -lc10 -Wl,--whole-archive -ltorch_cpu -Wl,--no-whole-archive -ltorch -lXNNPACK \
-lpytorch_qnnpack -lcpuinfo -lclog -lpthreadpool -lkineto -lfmt -ldl -lc10
```
The table below shows the size reduction in all build modes.

| Build Type | Unstripped | Stripped |
| ----------- | ----------- | ----------- |
| Standard | 49MiB | 34MiB |
| Minimal w/o dtype | 6.1MiB (12%) | 4.5MiB (18%) |
| Minimal w/ dtype | 3.7MiB (7%) | 2.7MiB (11%) |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84795 Approved by: https://github.com/cccclai commit 0455c9b036a18505d7ae19b6d8b4ef9bef869365 Author: PyTorch MergeBot Date: Sat Sep 10 18:34:33 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84797) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84797 Approved by: https://github.com/pytorchbot, https://github.com/kit1980 commit b5e921b89e355a6aa7b80fa35556cffe9438bc15 Author: PyTorch MergeBot Date: Sat Sep 10 18:33:49 2022 +0000 [vision hash update] update the pinned vision hash (#84798) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84798 Approved by: https://github.com/pytorchbot, https://github.com/kit1980 commit 21bf9a467eb86a152e471e04837b98617098f32f Author: Khushi Agrawal Date: Sat Sep 10 13:30:43 2022 +0000 [jiterator] logical_{or, xor} : complex (#75947) Follows: #74748 cc @kshitij12345! Pull Request resolved: https://github.com/pytorch/pytorch/pull/75947 Approved by: https://github.com/ngimel, https://github.com/kshitij12345 commit 08c4f8c7a76f3f1c874aa357fc5aafdfb87ce680 Author: Xiang Gao Date: Sat Sep 10 10:56:05 2022 +0000 ProcessGroupUCC tests (#83285) - [x] Direct dependency on UCX is completely removed, UCC active set API always enabled - [x] Remove `TORCH_UCC_PROFILING_ENABLE`, always enable profiling - [x] Fixes profiling of `recv` and `all_gather` - [x] Use the NCCL TL of UCC on CUDA, as the UCP TL is not well supported on CUDA Most tests are passing, but there are a few skipped tests: - `scatter` and `gather` are not supported by the UCP TL of UCC on CPU tensors - A few flaky tests in PyTorch's CI environment - Profiler-related failures, some of them will be fixed by @Fuzzkatt in https://github.com/pytorch/pytorch/pull/84368 After this PR is merged, I will continue to work on these skipped failures.
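Relating to the ProcessGroupUCC tests above, a minimal end-to-end sketch of exercising the UCC backend (assumes a PyTorch build with UCC support and the usual rank/world-size environment variables set by the launcher):
```
import torch
import torch.distributed as dist

# Initialize a UCC-backed process group (env:// rendezvous).
dist.init_process_group(backend="ucc")
rank, world_size = dist.get_rank(), dist.get_world_size()

t = torch.full((4,), float(rank))
gathered = [torch.empty(4) for _ in range(world_size)]
dist.all_gather(gathered, t)  # one of the collectives covered by the tests

if rank == 0:
    print([g[0].item() for g in gathered])  # one entry per rank

dist.destroy_process_group()
```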
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83285 Approved by: https://github.com/vtlam, https://github.com/malfet, https://github.com/kwen2501 commit 2765243cd5e657a92142d09504dafadb058de63f Author: Mengwei Liu Date: Sat Sep 10 06:58:56 2022 +0000 [torchgen] Refactor static_dispatch to take in source signature (#84384) Summary: Context: currently `static_dispatch` assumes that given a native function `f`, we always want to map from its `DispatchSignature` to its `CppSignature`. This assumption may not hold true for some use cases, where the source bindings may not come from its `DispatchSignature`. Here I'm changing the argument `sig: DispatcherSignature` to be `sig: Union[CppSignature, DispatcherSignature]`, also removes unused `f` Test Plan: Rely on added unit test. Differential Revision: D39192969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84384 Approved by: https://github.com/iseeyuan commit c5a8946e40d6cda42aa38dda2705ea4e9930c2cb Author: Edward Z. Yang Date: Sat Sep 10 00:00:38 2022 -0400 Revert "Revert "Redo how custom/python_custom methods on TensorImpl work (#84796)" (#84806) This reverts commit ca3b2bfbe3945c756a67a784aaa7d9891698c59b. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84806 Approved by: https://github.com/Chillee commit bccc26f365d8b795e2931797d283e32c5f47aa0f Author: Abhishek Pathak Date: Sat Sep 10 03:10:04 2022 +0000 [MPS] Handle casting for div operation (#84742) * Handle casting for div operation * Update divmode test to test for rounding mode in div cc. @lhoenig Pull Request resolved: https://github.com/pytorch/pytorch/pull/84742 Approved by: https://github.com/razarmehr commit ddc56732ae30a3290a577e9694e037e108a3fff3 Author: Nikita Shulga Date: Sat Sep 10 02:45:35 2022 +0000 [GHF][BE] Delete land checks branch when done (#84767) Also, don't create this branch if running with dry-run Pull Request resolved: https://github.com/pytorch/pytorch/pull/84767 Approved by: https://github.com/clee2000, https://github.com/huydhn commit b7d2818598b1c9f6d35f027f8d122543521c6a6a Author: Ryan Spring Date: Sat Sep 10 02:36:41 2022 +0000 Return contiguous tensor from native_layer_norm reference (#84799) Fixes https://github.com/pytorch/pytorch/issues/84618 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84799 Approved by: https://github.com/Chillee commit 5e25c2b4ccb3224366d3cd3dc790b1e23440a49f Author: Xiang Gao Date: Sat Sep 10 00:50:02 2022 +0000 Add missing spaces to error messages in PG (#84780) Just some formatting, no real changes. See https://github.com/pytorch/pytorch/pull/83285#discussion_r966469992 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84780 Approved by: https://github.com/kit1980 commit ca3b2bfbe3945c756a67a784aaa7d9891698c59b Author: Eli Uriegas Date: Sat Sep 10 00:18:13 2022 +0000 Revert "Redo how custom/python_custom methods on TensorImpl work (#84796) This reverts commit 591b75bf98b92acd4f3d0a1dc934198afeaa6fc1. 
Manual revert of https://github.com/pytorch/pytorch/pull/84641 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84796 Approved by: https://github.com/izaitsevfb commit 96e4bd950027a2f472fafa98616c92403a890bd2 Author: Dmytro Dzhulgakov Date: Sat Sep 10 00:09:57 2022 +0000 [docs] Person of interest update: sparse, torchrec and smaller tweaks (#84772) Fixes #83363 This is not a full update yet, but fixes some obvious things: missing modules (torchrec, sparse) and brings a few people from merge_rules.json who are working on the respective modules. There are still discrepancies - e.g. Intel CPU work is split in many categories in merge_rules, but it's better to improve things incrementally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84772 Approved by: https://github.com/b0noI, https://github.com/malfet commit f598b5be1825fc0a12b5013c547fb5972b57b208 Author: Sergii Dymchenko Date: Fri Sep 9 23:00:12 2022 +0000 Remove last bit or torch.eig from functorch/test/test_ops.py (#84787) After https://github.com/pytorch/pytorch/pull/70982 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84787 Approved by: https://github.com/suo, https://github.com/seemethere commit cd50512d414e352fb9088805d8d66bf6880895d1 Author: Xu Zhao Date: Fri Sep 9 22:01:20 2022 +0000 Upload the benchmark result to S3 and post the URL (#84726) Upload the benchmark result to S3 and make it accessible to the public. The URL is available at the end of the "Upload to S3" step of the workflow. For example, this PR uploads 3 files: ``` Uploaded the result file control.json to https://ossci-metrics.s3.amazonaws.com/torchbench-pr-test/pr84726/control.json Uploading file treatment.json to S3 with key: torchbench-pr-test/pr84726/treatment.json Uploaded the result file treatment.json to https://ossci-metrics.s3.amazonaws.com/torchbench-pr-test/pr84726/treatment.json Uploading file result.csv to S3 with key: torchbench-pr-test/pr84726/result.csv Uploaded the result file result.csv to https://ossci-metrics.s3.amazonaws.com/torchbench-pr-test/pr84726/result.csv ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/84726 Approved by: https://github.com/davidberard98 commit 01c54ad6dedf2ab0206a37f6df1af4fe41afa051 Author: Ivan Yashchuk Date: Fri Sep 9 21:31:57 2022 +0000 Remove deprecated torch.eig (#70982) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.eig`. cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/70982 Approved by: https://github.com/Lezcano, https://github.com/malfet commit c4a5255df77c7b945d134fa58fe59684f41c33a8 Author: Dhruv Matani Date: Fri Sep 9 12:16:52 2022 -0700 [Mobile Tracer] Use unified source file list for BUCK build (#84770) Currently, the source list `torch_mobile_tracer_sources` in `build_variables.bzl` is used only for OSS build. This resulted in a regression for OSS builds when `TRACING_BASED=1` was used to build the OSS model tracer binary. To prevent this from happening in the future, it makes sense to re-use this list for internal BUCK builds as well. This change does that. Differential Revision: [D39392010](https://our.internmc.facebook.com/intern/diff/D39392010/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39392010/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84770 Approved by: https://github.com/cccclai commit 1dabb51a16eb6cf81475efecb1d39c4683af50fb Author: Vasiliy Kuznetsov Date: Fri Sep 9 10:35:47 2022 -0700 quant: add `extra_repr` to HistogramObserver (#84760) Summary: Adds `extra_repr` to `HistogramObserver`. This is useful when debugging PTQ models because it allows to quickly check whether a `HistogramObserver` has received data or not. Test plan: ``` >>> import torch >>> obs = torch.ao.quantization.HistogramObserver() >>> obs(torch.randn(1, 3, 224, 224)) ... >>> print(obs) // before - hard to tell if observer has seen data HistogramObserver() // after HistogramObserver(min_val=-4.778339862823486, max_val=4.311892986297607) >>> ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84760 Approved by: https://github.com/andrewor14 commit 0fc02dbba40129e4d0cb01aea2a4667bf0cc928f Author: Driss Guessous Date: Fri Sep 9 20:11:26 2022 +0000 flash_attention integration (#81434) - I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This is defaulted to off resulting in flash to not be build anywhere. This is done on purpose since we don't have A100 machines to compile and test on. - Only looked at CMake did not attempt bazel or buck yet. - I included the mha_fwd from flash_attention that has ben refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434 Approved by: https://github.com/cpuhrsch commit 219ff26172d0b5abea89ea5bde7e0f7119efed59 Author: PyTorch MergeBot Date: Fri Sep 9 20:01:07 2022 +0000 Revert "Add __all__ for a few distributed modules plus a little typing (#84119)" This reverts commit 6f216805634e5859b76253432542a1c4c60ee573. Reverted https://github.com/pytorch/pytorch/pull/84119 on behalf of https://github.com/izaitsevfb due to breaking internal builds, see D39386448 commit 2614079f890193ea78099cdc7b4361d5e1ccfde1 Author: Peter Bell Date: Thu Sep 8 15:18:23 2022 +0100 OpInfo: Prevent clamp sample inputs from sharing tensors (#84696) As per the comment, re-using tensors between sample inputs is strongly discouraged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84696 Approved by: https://github.com/ngimel commit 5c0c8f2ce344f74849afaed88df93292cb30ce0b Author: Max Ren Date: Fri Sep 9 19:32:40 2022 +0000 [coreml][bug] coreml gpu flag not set (#84725) Summary: Delegated CoreML models with cpuAndGPU flag set does not properly run models on CPU - Fix will allow us to target models on CPU Test Plan: brymkowski can you test this on your performance benchmarks? 
Reviewed By: salilsdesai Differential Revision: D39361382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84725 Approved by: https://github.com/jmdetloff commit fe47e61425dc7b44c7e90fc6c06052a83786ad57 Author: Salil Desai Date: Fri Sep 9 19:19:12 2022 +0000 [QNNPack] Update GoogleTest SHA256 Hash (#84754) Summary: Fixes hash mismatch error when building qnnpack Test Plan: ``` export ANDROID_NDK=/opt/android_ndk/r20 export ANDROID_NDK_HOME=${ANDROID_NDK} export ANDROID_SDK=/opt/android_sdk export ANDROID_HOME=${ANDROID_SDK} cd ~/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack ./scripts/build-android-arm64.sh ``` ``` [1/9] Creating directories for 'googletest' [2/9] Performing download step (download, verify and extract) for 'googletest' FAILED: googletest-prefix/src/googletest-stamp/googletest-download /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/googletest-stamp/googletest-download cd /home/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/deps && /data/users/salilsdesai/miniconda3/envs/pytorch/bin/cmake -P /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/googletest-stamp/download-googletest.cmake && /data/users/salilsdesai/miniconda3/envs/pytorch/bin/cmake -P /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/googletest-stamp/verify-googletest.cmake && /data/users/salilsdesai/miniconda3/envs/pytorch/bin/cmake -P /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/googletest-stamp/extract-googletest.cmake && /data/users/salilsdesai/miniconda3/envs/pytorch/bin/cmake -E touch /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/googletest-stamp/googletest-download -- Downloading... dst='/data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip' timeout='none' inactivity timeout='none' -- Using src='https://github.com/google/googletest/archive/release-1.10.0.zip' -- [download 1% complete] ... -- [download 100% complete] -- verifying file... file='/data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip' -- SHA256 hash of /data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip does not match expected value expected: 'f3ed3b58511efd272eb074a3a6d6fb79d7c2e6a0e374323d1e6bcbcc1ef141bf' actual: '94c634d499558a76fa649edb13721dce6e98fb1e7018dfaeba3cd7a083945e91' -- Hash mismatch, removing... ```` ``` [1/9] Creating directories for 'googletest' [2/9] Performing download step (download, verify and extract) for 'googletest' -- Downloading... 
dst='/data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip' timeout='none' inactivity timeout='none' -- Using src='https://github.com/google/googletest/archive/release-1.10.0.zip' -- [download 1% complete] ... -- [download 100% complete] -- verifying file... file='/data/users/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip' -- Downloading... done -- extracting... src='/home/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/build/android/arm64-v8a/deps/googletest-download/googletest-prefix/src/release-1.10.0.zip' dst='/home/salilsdesai/fbsource/fbcode/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/deps/googletest' -- extracting... [tar xfz] -- extracting... [analysis] -- extracting... [rename] -- extracting... [clean up] -- extracting... done [3/9] No update step for 'googletest' [4/9] No patch step for 'googletest' [5/9] No configure step for 'googletest' [6/9] No build step for 'googletest' [7/9] No install step for 'googletest' [8/9] No test step for 'googletest' [9/9] Completed 'googletest' ``` Reviewed By: digantdesai Differential Revision: D39273970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84754 Approved by: https://github.com/digantdesai commit daffff9986b2f427496b1b8a782f95f93b70f725 Author: Taylor Robie Date: Thu Sep 8 11:03:22 2022 -0700 [Profiler] Make `RecordQueue` manage the lifetime of `PythonTracer`. (#83964) `PythonTracer` holds a pointer to an owning `RecordQueue`, however that relationship is not enforced and it is possible to dangle that pointer if the ProfilerState owning the `RecordQueue` is destroyed without proper cleanup. We currently use a singleton to enforce the requirement that only one python tracer is active at a time, however a better formulation is to simply enforce that with an atomic bool and manage object lifetime through composition. In this new architecture, `RecordQueue` explicitly holds a unique_ptr to the python tracer instance. That way if `~RecordQueue` is called it will call `~PythonTracer` which can then clean up any state. Overall it is just a simpler ownership model, and less prone to unexpected failures. Differential Revision: [D38955616](https://our.internmc.facebook.com/intern/diff/D38955616/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83964 Approved by: https://github.com/slgong-fb commit 328538700a505e7fee4ba66f4b8de72c2cc44217 Author: Taylor Robie Date: Thu Sep 8 11:03:20 2022 -0700 [Profiler][Trivial] Move `PythonTracerBase` to `torch/csrc/profiler/orchestration` (#83895) The ownership model between `RecordQueue` and `PythonTracer` is brittle; if a profiler is popped without proper shutdown it can dangle a reference in `PythonTracer` which will segfault when dereferenced. The next PR will address this; to start we simply move the code into `torch/csrc/profiler/orchestration` to limit the sloc delta when making actual changes. Differential Revision: [D38933962](https://our.internmc.facebook.com/intern/diff/D38933962/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38933962/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83895 Approved by: https://github.com/slgong-fb commit e8b950186159247639e6645ba50f57f2a00ac6b0 Author: Sean Ross-Ross Date: Fri Sep 9 18:54:47 2022 +0000 test: adding uniform (#84292) Adding OpInfo for uniform Pull Request resolved: https://github.com/pytorch/pytorch/pull/84292 Approved by: https://github.com/amjames, https://github.com/ngimel commit a3855cc611e8be5a76254165a7468503208c7285 Author: Kimish Patel Date: Fri Sep 9 18:54:14 2022 +0000 Make xnnpack based convs thread safe (#84602) Summary: For convolution xnnpack uses indirection buffer. This needs setup if input dimensions change. If we run the same model from multiple threads each supplying different input sized tensor, then there is a race condition where, indirection buffer might be in use by one thread, while being reset by another. This diff adds a lock to each conv object so as to serialize the execution and prevent such race conditions. When uncontended, it should not have perf impact. Test Plan: TestConvolution2dMultiThreaded Differential Revision: D39288298 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84602 Approved by: https://github.com/digantdesai commit 2c4eaddb28a9216d049adf41518199299ffac93e Author: Peter Bell Date: Thu Sep 8 15:18:18 2022 +0100 Use exclude_zero in i0e sample inputs function (#84453) As per the todo, this is now supported by `make_tensor` directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84453 Approved by: https://github.com/mruberry, https://github.com/ngimel commit 93aef3a010b5488c57ffce3e5dc14ea13d0b78d2 Author: Eli Uriegas Date: Fri Sep 9 11:29:07 2022 -0700 Use presence of _symint in kernel name to generate symint sig or not (#84579) Something people found confusing was that whether or not a native:: signature would get SymInt or not in its type was based on the dispatch key. This changes it so that SymInt or not in type is based on whether or not you have _symint in the name of the kernel or not. This means that even when we make operators support SymInt, you no longer have to go and update all the preexisting definitions; instead, you now selectively write _symint to opt individual kernels into SymInt support. I then go and update a bunch of kernels that don't have proper SymInt support to make use of this convention. There is some hacking around for view generation code. I also add support for external backends to specify 'symint' operators, for which we generate SymInt signatures instead of regular signatures. Signed-off-by: Edward Z. Yang Differential Revision: [D39310060](https://our.internmc.facebook.com/intern/diff/D39310060) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84579 Approved by: https://github.com/wconstab commit 18a31cc0448f226f4c2dd9926d24aaef86409f1c Author: Dhruv Matani Date: Fri Sep 9 08:51:12 2022 -0700 [Mobile] Fix The Build For Model Tracer (#84755) Summary: Currently, the model tracer build is broken because of 2 reasons: 1. A few source files are missing, resulting in missing link time symbols 2. The `TRACING_BASED` flag isn't passed correctly from the command line (specified as an evnironment variable) as a CMake flag Both these issues were fixed. Test Plan: Ran this command: `USE_CUDA=0 TRACING_BASED=1 python setup.py develop --cmake` and saw that the tracer binary was built at `build/bin/model_tracer` - also ran it to ensure that it can generate a YAML file. 
Differential Revision: [D39391270](https://our.internmc.facebook.com/intern/diff/D39391270) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84755 Approved by: https://github.com/cccclai commit 52224139b85bed65595e6c57eda8c265bb4e9d84 Author: PyTorch MergeBot Date: Fri Sep 9 18:21:51 2022 +0000 Revert "Convert NoopPyInterpreterVTable into a Meyer singleton (#84656)" This reverts commit 9162bc025256d638369c77c845b8a5ed66eeff5a. Reverted https://github.com/pytorch/pytorch/pull/84656 on behalf of https://github.com/ezyang due to this breaks some build configs commit ac364f8ba1a20484d118362bcc11b5730dec676a Author: Jean Schmidt Date: Fri Sep 9 17:23:28 2022 +0000 Removing .github/scale-config.yml, now this repo is using the config in test-infra (#84753) [As communicated internally](https://fb.workplace.com/groups/pytorch.dev/permalink/1189033455008466/), all repositories now rely on a single scale-config.yml that is on pytorch/test-infra. As such this file is no longer used and to avoid confusion it is better to remove it. Here is a short summary of the announcement: > [As previously announced](https://fb.workplace.com/groups/pytorch.dev/permalink/1173939633184515/), the scale-config.yml file in each repository for the pytorch/ organization is now not being used to control GHA runners. On its place, [the file with same path on test-infra](https://github.com/pytorch/test-infra/blob/main/.github/scale-config.yml) repository is controlling and enabling runners. If you feel the need for new runners, or change settings for current ones, feel free to submit a PR with required changes on the former file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84753 Approved by: https://github.com/janeyx99, https://github.com/seemethere commit 28c830ac0725c3689fb7cd9ff293fdf4b0453941 Author: Chien-Chin Huang Date: Thu Sep 8 08:45:23 2022 -0700 [FSDP] Optimizer states may be on CPU, copy them to GPU before gathering (#84708) **Background**: Optimizer states may not always on GPUs. Some examples include, 1.) CPU offload is enable, 2.) after lightning trainer fit() is called. **What Does This PR Do?** If states are not on GPUs, move them to GPUs before gathering the global states. Differential Revision: [D39332300](https://our.internmc.facebook.com/intern/diff/D39332300/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84708 Approved by: https://github.com/awgu commit 0fd8f6b93cb3d1342a10ef71d4b27356f0dfc9b1 Author: Clive Chan Date: Fri Sep 9 16:13:05 2022 +0000 Missed one CHECK_NOTNULL in #82032's find-replace (#84720) Building master fails with the following: ``` pytorch/caffe2/contrib/nccl/cuda_nccl_gpu.cc:180:51: error: 'CHECK_NOTNULL' was not declared in this scope; did you mean 'TORCH_CHECK_NOTNULL'? 180 | CUDA_ENFORCE(cudaStreamWaitEvent(CHECK_NOTNULL(ex.stream), event, 0)); ``` Seems like #82032 just missed one find-replace. cc @wconstab Not sure why this wouldn't have been caught elsewhere. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84720 Approved by: https://github.com/wconstab commit d12f3524b7a3b7267d90ae502208f0aeda881ced Author: Mateusz Sypniewski Date: Fri Sep 9 15:29:34 2022 +0000 Add user facing documentation for CSAN (#84689) This adds a user facing tutorial for the CSAN tool. The documentation preview should be available [here](https://docs-preview.pytorch.org/84689/index.html) once the GitHub job completes on this PR. 
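As a companion to the CSAN tutorial referenced above (#84689), a minimal sketch (assuming a CUDA-capable build) of the stream-ordering idiom whose absence produces the kind of race the tool reports; a variant of this snippet without synchronization appears in the CSAN example later in this log:
```
import torch

# `a` is produced on the default stream.
a = torch.rand(4, 2, device="cuda")

side = torch.cuda.Stream()
# Order the side stream after the default stream before touching `a` on it;
# omitting this wait_stream() call is exactly the data race CSAN flags.
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    torch.mul(a, 5, out=a)

# Symmetrically, wait on the side stream before the default stream reuses `a`.
torch.cuda.current_stream().wait_stream(side)
```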
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84689 Approved by: https://github.com/lw commit a8198a09559d8f94247106499443f95e1a6da06e Author: Michael Andreas Dagitses Date: Thu Sep 8 05:26:40 2022 -0700 remove c10_defs.bzl and embed its logic directly where it is used (#84595) Differential Revision: [D39287079](https://our.internmc.facebook.com/intern/diff/D39287079/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39287079/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84595 Approved by: https://github.com/DanilBaibak commit 214a6500e3c03ecfadf12e28fcd576efebcd8cfc Author: Jerry Zhang Date: Thu Sep 8 22:20:12 2022 -0700 [quant][docs] Additonal fixes for quantize_fx docs (#84587) Summary: Some more clarifications for the arguments, including linking to object docs (QConfigMapping, BackendConfig) and adding types in the doc Test Plan: ``` cd docs make html ``` and visual inspection for the generated docs Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84587 Approved by: https://github.com/vkuzo commit 0a89bdf9892b3021aca0bcd3df7388a20e24cfd1 Author: Richard Zou Date: Fri Sep 9 08:00:04 2022 -0700 Set up aten/src/ATen/functorch directory; move some files there (#84648) This PR: - sets up aten/src/ATen/functorch in PyTorch's build system - Moves {BatchedTensorImpl.h, and BatchedTensorImpl.cpp} there as a test. Test Plan: - functorch build and test should pass Differential Revision: [D39315051](https://our.internmc.facebook.com/intern/diff/D39315051) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84648 Approved by: https://github.com/ezyang commit 8e57ce63a11041c27d182b12cb30ac24d5c0cdcd Author: Mateusz Sypniewski Date: Fri Sep 9 04:51:33 2022 -0700 Add CSAN support for CPU synchronizations (#84428) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84428 Approved by: https://github.com/ngimel, https://github.com/lw commit 7702ca49937b34843fe2f79ea7344451a83f1157 Author: Richard Zou Date: Fri Sep 9 08:00:03 2022 -0700 [functorch] Simplify BatchedTensorImpl (#84642) Three changes: - deleted an unused constructor - simplified an implementation from the old days when a BatchedTensorImpl could have multiple bdims - added a comment about getKeysToPropagateToWrapper Differential Revision: [D39315049](https://our.internmc.facebook.com/intern/diff/D39315049) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84642 Approved by: https://github.com/samdow commit 0d46bfac5b236774f50227c99328c6ef44f2fe1a Author: Dhruv Matani Date: Thu Sep 8 13:48:19 2022 -0700 [Mobile] Use -ffunction-sections and -fdata-sections for Mobile builds (#84704) Summary: Set `-ffunction-sections` and `-fdata-sections` so that each method has its own text section. This allows the linker to remove unused section when the flag `-Wl,-gc-sections` is provided at link time. Test Plan: CI and local build using `build_mobile.sh` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84704 Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai commit 747f27a9adb484113f91560525183d8814eab41d Author: Dhruv Matani Date: Thu Sep 8 13:48:19 2022 -0700 [Mobile] Update build_mobile.sh to allow lite interpreter and tracing based builds (#84647) Summary: Currently, build_mobile.sh doesn't allow lite interpreter builds or tracing based selective builds. 
build_mobile.sh is used for host builds of PyTorch for Mobile deployment. Additionally, certain flags such as `USE_BLAS` were not being respected as they should be. This change addresses that as well. Test Plan: Build using: ``` cat /tmp/selected_ops.yaml - aten::add - aten::sub ``` ``` BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN=1 USE_LIGHTWEIGHT_DISPATCH=0 BUILD_LITE_INTERPRETER=1 SELECTED_OP_LIST=/tmp/selected_ops.yaml ./scripts/build_mobile.sh ``` ``` cat /tmp/main.cpp int main() { auto m = torch::jit::_load_for_mobile("/tmp/path_to_model.ptl"); auto res = m.forward({}); return 0; } ``` Test using: ``` g++ /tmp/main.cpp -L build_mobile/lib/ -I build_mobile/install/include/ -lpthread -lc10 -ltorch_cpu -ltorch -lXNNPACK -lpytorch_qnnpack -lcpuinfo -lclog -lpthreadpool -lgloo -lkineto -lfmt -ldl -lc10 ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84647 Approved by: https://github.com/JacobSzwejbka, https://github.com/cccclai commit 27e5299ee3dfe8a48f835e6a8ce11ae697d01937 Author: Kevin Tse Date: Thu Sep 8 23:51:21 2022 +0000 [DataPipe] Fix mishandling of exception message when error is not iterable (#84676) We sometimes get an exception message like this: ``` This exception is thrown by __iter__ of TarArchiveLoaderIterDataPipe(datapipe=FileOpenerIterDataPipe, length=-1, mode='r:') elif msg not in e.args[0] and single_iterator_msg not in e.args[0]: TypeError: argument of type 'int' is not iterable ``` The `TypeError` raised by the mishandling of the error message obfuscates the true exception, which now will be show as: ``` FileNotFoundError: [Errno 2] No such file or directory: ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84676 Approved by: https://github.com/ejguan commit df3377fb6485f7fe4cb4a9bc928ee8eb9a0d5b10 Author: Richard Zou Date: Fri Sep 9 06:58:37 2022 -0700 [functorch] delete functorch/csrc/Constants.h (#84639) This file aliased dispatch keys. The original purpose was so that we could change the dispatch keys in pytorch core without changing functorch too much, but there's no need for the layer of indirection anymore. Also moved SINGLE_ARG to functorch/csrc/Macros.h, but that might need a new home later. Differential Revision: [D39315052](https://our.internmc.facebook.com/intern/diff/D39315052) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84639 Approved by: https://github.com/samdow commit 09bcc006e9d10ef62dcf38dd6547a2154c3b1b57 Author: jataylo Date: Fri Sep 9 14:14:59 2022 +0000 ROCm support for test_lazy_init (#84333) Added ROCm support for the test_lazy_init unit test by including a condition on TEST_WITH_ROCM to switch CUDA_VISIBLE_DEVICES with HIP_VISIBLE_DEVICES. This is needed because HIP_VISIBLE_DEVICES is set when running the single-GPU tests in CI: https://github.com/pytorch/pytorch/blob/a47bc96fb7176d43752d3e376697971d4ba47317/.jenkins/pytorch/test.sh#L38, but this test sets CUDA_VISIBLE_DEVICES, which takes lower precedence than HIP_VISIBLE_DEVICES on ROCm. **Testing Logs (to show behavior difference)** 12:40:41 Aug 30 11:40:41 CUDA_VISIBLE_DEVICES='0': 0 12:40:41 Aug 30 11:40:41 1 12:40:41 Aug 30 11:40:41 CUDA_VISIBLE_DEVICES='32': 32 12:40:41 Aug 30 11:40:41 1 12:40:41 Aug 30 11:40:41 HIP_VISIBLE_DEVICES='0': 0 12:40:41 Aug 30 11:40:41 1 12:40:41 Aug 30 11:40:41 HIP_VISIBLE_DEVICES='32': 32 12:40:41 Aug 30 11:40:41 0 **Passing UT** Aug 30 17:03:15 test_lazy_init (main.TestCuda) Aug 30 17:03:17 Validate that no CUDA calls are made during import torch call ... 
ok (2.471s) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84333 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet commit 67d6f7160ce82be80eb22e40c4ae16084cfd4ee0 Author: Mateusz Sypniewski Date: Fri Sep 9 03:20:05 2022 -0700 Add synchronize hooks (#84427) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84427 Approved by: https://github.com/ngimel, https://github.com/lw commit 591b75bf98b92acd4f3d0a1dc934198afeaa6fc1 Author: Edward Z. Yang Date: Thu Sep 8 09:26:09 2022 -0700 Redo how custom/python_custom methods on TensorImpl work (#84641) A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`. The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested. In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true. Billing of changes: * Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions. * Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.) * I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly. * The default custom implementations now more reliably call their default() implementations * As bonus refactor, I devirtualized some functions that don't need to be virtual * `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize. * This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641 Approved by: https://github.com/wconstab commit d802fcfcd81f86333e25135a20802aa39b275da1 Author: Ivan Yashchuk Date: Fri Sep 9 07:58:21 2022 +0000 Add config to PrimTorch's nvFuser executor (#84482) This PR adds `executor_parameters` keyword argument to `torch._prims.executor.execute`. For now there are two knobs: * `use_python_fusion_cache: bool = True` whether to use lru_cache when constructing fusion object or not. 
* `allow_single_op_fusion: bool = True` whether to allow fusions with single callable Behavior can be controlled by passing dict with custom specified values as `executor_parameters` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84482 Approved by: https://github.com/jjsjann123, https://github.com/ngimel commit 6f72c13f9b749b28b91732a56e997b25aa692a8d Author: mingfeima Date: Thu Sep 8 14:09:57 2022 +0800 test mkldnn conv2d channels last when weight is nchw format (#77348) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77348 Approved by: https://github.com/malfet commit 1840f24df737046d085691d230ca2fe86cccb0d2 Author: Chien-Chin Huang Date: Wed Sep 7 17:23:09 2022 -0700 [FSDP] Ensure that all ranks use the same order to iterate through optimizer states (#84654) **Background:** Optimizer states are of the type `Dict[int, Dict[str, torch.Tensor]]` and the order of `dict.items()` is the creation order of keys. Without checkpoint (state_dict/load_state_dict), the creation order of keys depends on the implementation of the optimizer (e.g., Adam seems to creates `exp_avg` then `exp_avg_sq`). However, when loading states from a checkpoint, since the optimizer state are lazily initialized, the order depends on the user code (reading state_dict from IO). See the following example: ``` optimizer_state_dict = USER_CODE_TO_READ_STATE_FROM_IO() optimizer.load_state_dict(optimizer_state_dict) ``` The key order of `optimizer_state_dict` depends on `USER_CODE_TO_READ_STATE_FROM_IO` and there is no guarantee that the order is the same across ranks. **What Can Go Wrong?** After the first checkpoint load, the key order of optimizer may not be the same on different ranks. When users try to save another checkpoint, user will call `_unflatten_optim_state()` to save the optimizer states. Inside `_unflatten_optim_state()`, `dict.itmes()` will be called to iterate all the local optimizer state and `all_gather()` will be used to gather the local states. Since the order may be different across ranks, the gathered states are not correct. We have seen some models get NaN loss after the second checkpoint load because of this issue. **What This PR Does?** This PR implements a `sorted_items()` to return sorted `(key, value)` pairs. We can do this because the key is either an integer or a string. Differential Revision: [D39315184](https://our.internmc.facebook.com/intern/diff/D39315184/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84654 Approved by: https://github.com/awgu commit 2211949513d6ff126f1caae4d817055ec9d3fab1 Author: Shen Li Date: Fri Sep 9 02:30:55 2022 +0000 Moving CommTensor from tests to private _spmd folder (#84719) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84719 Approved by: https://github.com/wanchaol commit b00a4b7cf1655cf9a6cf78ef9536b63898cbfb08 Author: Kunal Bhalla Date: Fri Sep 9 05:44:29 2022 +0000 [torch.fx.wrap] Use callable / function.__name__ instead of function.__code__.co_name (#84373) Ran across this issue while using torch.fx.wrap on a decorated function: it triggered a KeyError: 'wrapper_inside_decorator'. torch.fx.wrap stores function.__code__.co_name, but that isn't set correctly (and doesn't match it's name in the global namespace) for decorators; function.__name__ is set correctly. Also adjusted to checking for callable instead of checking for the existing of __code__ to allow for a broader variety of functions that can be passed in. Eg. 
using functools.cache returns a callable that won't have a __code__ attribute. I added a unit test (that incidentally fails every test in the suite before the fix commit -- because it affects the global state), and then a fix that addresses it. ``` In [1]: import functools In [2]: def decorator(f): ...: @functools.wraps(f) ...: def wrapper(*args, **kwargs): ...: return f(*args, **kwargs) ...: return wrapper ...: In [3]: @decorator ...: def some_function(x): ...: return x ...: In [4]: some_function.__name__ Out[4]: 'some_function' In [5]: some_function.__code__.co_name Out[5]: 'wrapper' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84373 Approved by: https://github.com/jamesr66a, https://github.com/SherlockNoMad commit 1c8cb0877095f64e1244c42de851627fc8c3116f Author: PyTorch MergeBot Date: Fri Sep 9 03:12:08 2022 +0000 [vision hash update] update the pinned vision hash (#84731) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84731 Approved by: https://github.com/pytorchbot commit 1c8f02d406a059e4ea1d29c7db3ac2386765e2c3 Author: PyTorch MergeBot Date: Fri Sep 9 03:09:41 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84730) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84730 Approved by: https://github.com/pytorchbot commit e6ee8e613dfe764f61b2b29ad0a4a2c36f143eaa Author: Sherlock Huang Date: Thu Sep 8 23:25:47 2022 +0000 Return x.alias() when transpose is an nop (#84674) To fix bug in https://gist.github.com/SherlockNoMad/b8dfbc614d3e65707d1bc379a098196d ``` def f(x): return x.t() x = torch.randn(2, requires_grad=True) y = f(x) compiled_f = make_fx(f)(x) y_compiled = compiled_f(x) print(compiled_f) print("y.requires_grad", y.requires_grad) print("y_compiled.requires_grad", y_compiled.requires_grad) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84674 Approved by: https://github.com/cpuhrsch commit dbdc1cd590169576cfb78008f33b7cc795150729 Author: Justin Chu Date: Fri Sep 9 01:52:38 2022 +0000 [ONNX] Fix node attributes when namespace is aten (#84211) When `g.at` is used, the previous clean up in #83136 mistakenly removed the behavior that sets `aten=True` in `_add_attribute`. This PR brings the behavior back. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84211 Approved by: https://github.com/thiagocrepaldi, https://github.com/BowenBao commit 2fa8142cf9469fa6570507ec097d4dd5e0b13c92 Author: Justin Chu Date: Fri Sep 9 01:22:12 2022 +0000 [ONNX] Rename constants for clarity (#84645) Rename constants to make them more clear. Fix styles to upper case. Removed `onnx_stable_opsets` because it can be computed from `ONNX_MIN_OPSET` and `ONNX_MAX_OPSET`. Fixes #84643 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84645 Approved by: https://github.com/BowenBao commit bc3683de83adc9b9213cc926c677b3f8bb309722 Author: xndcn Date: Fri Sep 9 01:22:09 2022 +0000 [quant] remove mkldnn headers in OnednnUtils.h (#84195) mkldnn headers are not installed in include directory, also we don't have to include mkldnn here. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84195 Approved by: https://github.com/jerryzh168 commit 7a5d5a00207e91d5a7d1820781109a989aadc86c Author: Eric Han Date: Fri Sep 9 01:15:22 2022 +0000 Disable Transformer/MHA fast path when autocast is enabled (#84722) Differential Revision: D39362298 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84722 Approved by: https://github.com/cpuhrsch commit f23a1cf805356f0519d3dfc276958c4c518e6db5 Author: Huy Do Date: Fri Sep 9 00:15:46 2022 +0000 Fix conda cmake setup for macos x86-64 (#84682) The latest conda setup with cache causes package conflicts for macos x86-64 with cmake-3.19, for example https://github.com/pytorch/pytorch/runs/8237917073. It's near impossible to understand the cryptic package conflicts errors from conda. `cmake=3.22.1` is the same cmake version used in macos arm64, which doesn't have the issue. At the moment, the mac x86-64 build and test jobs success because they are reinstalled with stock conda from miniconda installation script every time and don't use any cache. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84682 Approved by: https://github.com/ZainRizvi commit 6f216805634e5859b76253432542a1c4c60ee573 Author: Rodrigo Kumpera Date: Thu Sep 8 23:28:28 2022 +0000 Add __all__ for a few distributed modules plus a little typing (#84119) This handles distributed_c10d, which is massive and ddp_comm_hooks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84119 Approved by: https://github.com/rohan-varma commit 9b8e0a38a6674c78a8f2729ce15393859aa2bd3d Author: rzou Date: Thu Sep 8 08:54:32 2022 -0700 [functorch] excise older custom_vjp prototype (#84638) It was based off of the Python op registration API that has been implemented in PyTorch already, so we can always bring it back, but we're probably taking a different approach here. Test Plan: - tests Differential Revision: [D39315050](https://our.internmc.facebook.com/intern/diff/D39315050) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84638 Approved by: https://github.com/samdow commit 8c91dd2677cca5983d410592684827784a6cfc07 Author: rzou Date: Thu Sep 8 08:54:30 2022 -0700 [functorch] Add some C++ documentation (#84637) Clarify the purpose of many files. Differential Revision: [D39315053](https://our.internmc.facebook.com/intern/diff/D39315053) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84637 Approved by: https://github.com/samdow commit 05b778f958a77adc33cc51db5717d6b9ab2e8b35 Author: PyTorch MergeBot Date: Thu Sep 8 20:30:23 2022 +0000 Revert "Add mkl implementation for exponential on CPU (#69967)" This reverts commit 189768ed64561e61ff05c9e42adfa40139388204. 
Reverted https://github.com/pytorch/pytorch/pull/69967 on behalf of https://github.com/izaitsevfb due to This PR caused internal breakage (internal revert D39348330; relevant task T131416326) commit 6bedb7a75e2c6712ef3a8de3283fe44adab4a659 Author: Paul Saab Date: Thu Sep 8 19:42:20 2022 +0000 [aarch64] Fix aarch64 build so that quantize_val_arm is defined (#84564) Summary: quantize_val_arm is used in the kernels when building under aarch64 Test Plan: CI Differential Revision: D39272746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84564 Approved by: https://github.com/kimishpatel commit a6e6276c8b1715fd3685c565065b35184d103a48 Author: PyTorch MergeBot Date: Thu Sep 8 19:28:38 2022 +0000 Revert "Moving CommTensor from tests to private _spmd folder (#84655)" This reverts commit 07dad15583a1a6bb6a65594883fa094a3b109baf. Reverted https://github.com/pytorch/pytorch/pull/84655 on behalf of https://github.com/kit1980 due to Several test failures on trunk https://hud.pytorch.org/pytorch/pytorch/commit/07dad15583a1a6bb6a65594883fa094a3b109baf, PR also had failures commit c5cf8f6b28a3c5abf443d60fd3811b8a4b7fcc16 Author: xiaxiaohua1 Date: Thu Sep 8 18:22:48 2022 +0000 fix [rpc] Wrong usage of RRefContext::handleException #71458 (#83166) Fixes #71458 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83166 Approved by: https://github.com/kumpera commit 1459a909b4034fc330b3ee55e164d77ffde1bdd8 Author: Horace He Date: Thu Sep 8 00:15:52 2022 +0000 Added mv, mm, and binary_cross_entropy_with_logits decomps (#84451) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84451 Approved by: https://github.com/ngimel commit fe353e1413e2262993fb71dba7317f21bc6fc3bc Author: Huy Do Date: Thu Sep 8 17:51:15 2022 +0000 Enable manual test config label selection on Windows (#84669) After https://github.com/pytorch/pytorch/pull/83690, functorch team has started using the new label in some of their [PRs](https://github.com/pytorch/pytorch/labels/test-config%2Ffunctorch). So this enabled the same feature on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84669 Approved by: https://github.com/ZainRizvi commit 07dad15583a1a6bb6a65594883fa094a3b109baf Author: Shen Li Date: Thu Sep 8 14:31:26 2022 +0000 Moving CommTensor from tests to private _spmd folder (#84655) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84655 Approved by: https://github.com/wanchaol commit 9beb0c0b877b85880e51366270a5916f70559e42 Author: Taylor Robie Date: Wed Sep 7 15:01:58 2022 -0700 Reland: [Profiler] Unify global and thread local profiler lookup. (#83894) (#84668) This PR renames `ProfilerThreadLocalStateBase` to simply `ProfilerStateBase`, and adds `push`, `pop`, and `get` methods. `global` can be specified, or can be omitted for priority selection. In order to support this unification it was necessary to make a (mostly) non-throwing version of pop. The asserts around observer removal are intended to act as guard rails against multiple profilers trampling over each other. However on-demand wants to do exactly that because it wants to be able to preempt. A hack would be to get the current observer and then only pop if an observer is found, but that would be prone to race conditions. By removing the asserts, we can preserve the old behavior by adding `ASSERT(pop())` on the caller side while allowing more complex handling for the kineto client interface. (Later PR.) 
Differential Revision: [D39326253](https://our.internmc.facebook.com/intern/diff/D39326253/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84668 Approved by: https://github.com/slgong-fb commit bea01840335f0990c8d481de70c86f276d7c1654 Author: Taylor Robie Date: Wed Sep 7 14:58:28 2022 -0700 Reland: [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)" (#84667) Reland of https://github.com/pytorch/pytorch/pull/83893 Differential Revision: [D39282536](https://our.internmc.facebook.com/intern/diff/D39282536/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39282536/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84667 Approved by: https://github.com/slgong-fb commit b9793a66b56ec01e4ec85dce879552dfa650d0c8 Author: Peter Bell Date: Thu Sep 8 15:18:18 2022 +0100 Fix linalg.norm sample inputs function and related failures (#84452) Due to an indentation error, the return statement happens after just 1 loop of `for test_size in test_sizes`, so only one shape was ever tested. This also revealed several cases where the provided shapes don't work, so I've disabled the generation of those sample inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84452 Approved by: https://github.com/Lezcano, https://github.com/zou3519 commit 335033f7182bf421d203d5eeaad598fa1102933f Author: Junteng Jia Date: Thu Sep 8 17:00:45 2022 +0000 asyncio increase throughput (pytorch change) (#84301) Summary: This diff adds a check in the fetcher: if the dataset to be fetched has a function "getitems", then use it to fetch a batch of elements at once, as opposed to one by one. This is beneficial for I/O-bound usage (see the sketch below). Differential Revision: D39145980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84301 Approved by: https://github.com/VitalyFedyunin commit 368742d0596f8c17bbcf118f232cc63c4c86b5b7 Author: cchheennhhaaoo Date: Thu Sep 8 13:48:16 2022 +0000 Dispatch for xpu in adaptive_avg_pooling (#84541) Motivation: - See native_functions.yaml: operators adaptive_avg_pool2d/adaptive_avg_pool3d are not recommended to be registered for backends. - When adaptive_avg_pool2d/adaptive_avg_pool3d receive an xpu tensor as input, they cannot dispatch into the xpu implementation. Solution: - Dispatch to _adaptive_avg_pool2d/_adaptive_avg_pool3d for the xpu backend in the adaptive_avg_pool2d/adaptive_avg_pool3d implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84541 Approved by: https://github.com/ezyang commit eddc2370ec33938adbd9a3136852c3ab19e51a78 Author: kshitij12345 Date: Thu Sep 8 13:35:19 2022 +0000 [functorch] vmapvjpvjp (re-enable test with skips and xfails) (#83999) Enable `vmapvjpvjp` test and add relevant skips and xfails. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83999 Approved by: https://github.com/zou3519 commit 76fc690522024d978176b74a73e0222ac4d062de Author: PyTorch MergeBot Date: Thu Sep 8 10:44:37 2022 +0000 Revert "[functorch] vmapvjpvjp (re-enable test with skips and xfails) (#83999)" This reverts commit 9addeccb6b1ec9fed0246ba18fdb70550c813a90.
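To make the fetcher change in #84301 above concrete, a hypothetical sketch of the dispatch it describes: prefer a single batched lookup when the dataset exposes one, otherwise fall back to per-index access. The class name and the "getitems" hook follow the commit text and are illustrative, not the exact identifiers in torch.utils.data:
```
class BatchFetcher:
    """Fetch a whole batch in one call when the dataset supports it."""

    def __init__(self, dataset, collate_fn):
        self.dataset = dataset
        self.collate_fn = collate_fn

    def fetch(self, batched_index):
        if hasattr(self.dataset, "getitems"):
            # One round trip for the entire batch; helps I/O-bound datasets.
            data = self.dataset.getitems(batched_index)
        else:
            data = [self.dataset[idx] for idx in batched_index]
        return self.collate_fn(data)
```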
Reverted https://github.com/pytorch/pytorch/pull/83999 on behalf of https://github.com/kshitij12345 due to Broke trunk commit 9addeccb6b1ec9fed0246ba18fdb70550c813a90 Author: kshitij12345 Date: Thu Sep 8 06:23:12 2022 +0000 [functorch] vmapvjpvjp (re-enable test with skips and xfails) (#83999) Enable `vmapvjpvjp` test and add relevant skips and xfails. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83999 Approved by: https://github.com/zou3519 commit 8bd9fe3f493073bf8f4a2e428c3048096fb36052 Author: Elias Ellison Date: Wed Sep 7 23:39:28 2022 +0000 Changes to prepare for fake tensors on in functorch by default (#84432) Fixes some errors you run into in dynamo when turning on fake tensors. I'm waiting on flipping the switch because I need to also get some fixes into dynamo + do benchmarking. I could manually turn off fake tensors in functorch in dynamo, and then turn it on here if requested, although the changes here are pretty minimal. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84432 Approved by: https://github.com/Chillee commit b288cfd328be3908ffc42b948bf0137940b01e85 Author: Aaron Enye Shi Date: Thu Sep 8 03:37:39 2022 +0000 [Profiler] Add quoted metadata API to remove empty trace cpu_op metadata (#84128) Summary: The profiler utility function, stacksToStr, is quoting all metadata values, and therefore even empty metadata fields are being pushed into the trace files. Remove this and add an argument to use quoted metadata api provided by libkineto::GenericTraceActivity. Test Plan: Before, a trace file will dump extra empty fields for Module Hierarchy and Call Stack: ``` { "ph": "X", "cat": "cpu_op", "name": "autograd::engine::evaluate_function: AddBackward0", "pid": 798015, "tid": 798264, "ts": 1661451887593736, "dur": 21, "args": { "Trace name": "PyTorch Profiler", "Trace iteration": 0, "External id": 513, "Profiler Event Index": 0, "Module Hierarchy": "", "Call stack": "", "Fwd thread id": 3, "Sequence number": 1, "ID": 139880536829952, "Parent ID": null } } ``` After, these fields will not be in the trace file anymore: ``` { "ph": "X", "cat": "cpu_op", "name": "autograd::engine::evaluate_function: AddBackward0", "pid": 1482813, "tid": 1483069, "ts": 1661468912444365, "dur": 43, "args": { "Trace name": "PyTorch Profiler", "Trace iteration": 0, "External id": 513, "Profiler Event Index": 0, "Fwd thread id": 3, "Sequence number": 1, "ID": 139852271321088, "Parent ID": null } } ``` Also, with input tracking on, it looks correct compared to previous kineto observer: ``` { "ph": "X", "cat": "cpu_op", "name": "aten::add_", "pid": 1572428, "tid": 1572776, "ts": 1661469920242309, "dur": 19, "args": { "Trace name": "PyTorch Profiler", "Trace iteration": 0, "External id": 531, "Profiler Event Index": 18, "Input Dims": [[256, 256], [256, 256], []], "Input type": ["float", "float", "Scalar"], "ID": 140023871647232, "Parent ID": 140023871646720 } } ``` Differential Revision: D39041244 Pulled By: aaronenyeshi Pull Request resolved: https://github.com/pytorch/pytorch/pull/84128 Approved by: https://github.com/robieta commit 942c0f31dfffbc5eb180cadd0fd1302d5e907f64 Author: titaiwang Date: Thu Sep 8 00:58:09 2022 +0000 [ONNX] Align Optional Type in block (#83599) Why: Previously, we use `replaceAlluseswith` after adding Optional on the node which is right before output. However, this may break the graph by also changing the nodes that needs the node (original) as input. We only need the node to be optional in output. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83599 Approved by: https://github.com/justinchuby, https://github.com/BowenBao, https://github.com/malfet commit 49ec8d32c706e3df1f777b2361b2ee673269f8b8 Author: Sergii Dymchenko Date: Thu Sep 8 03:12:50 2022 +0000 Suggest draft PRs in contribution_guide.rst (#84658) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84658 Approved by: https://github.com/huydhn commit 0945074a8e4e7d0d07b7a929873d1f0dbdca7173 Author: Sherlock Huang Date: Thu Sep 8 00:31:58 2022 +0000 Preserve stacktrace over functionalization (#84662) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84662 Approved by: https://github.com/Chillee commit cb6ba27db3e1e55e9a429fb4a576a9e8389c2b93 Author: PyTorch MergeBot Date: Thu Sep 8 02:34:33 2022 +0000 [vision hash update] update the pinned vision hash (#84679) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84679 Approved by: https://github.com/pytorchbot commit 889540d091086bb31367a602295730f64e2ff690 Author: PyTorch MergeBot Date: Thu Sep 8 02:34:16 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84678) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84678 Approved by: https://github.com/pytorchbot commit e0229d6517385a98afeadbc6391d3592d5027c63 Author: John Detloff Date: Thu Sep 8 01:49:55 2022 +0000 Remove caffe2 mobile (#84338) We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our makefiles. Any lingering internal dependencies will use the buck build and so won't be affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338 Approved by: https://github.com/dreiss commit 9669e3c6ec6b6f232bed3b29bcd593434992f57d Author: Edward Z. Yang Date: Wed Sep 7 17:25:49 2022 -0400 Ignore UB on multiply (#84665) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84665 Approved by: https://github.com/Chillee commit 1a1bcc736197f1f7943d568512c3c1e44ba05fbc Author: Nikita Shulga Date: Thu Sep 8 01:09:10 2022 +0000 Actually chown artifacts (#84672) Roll back part of https://github.com/pytorch/pytorch/commit/045ebc771d5070696f839e586285ace9c06f1339 to actually chown the artifacts folder rather than the workspace. Fixes https://github.com/pytorch/pytorch/issues/84644 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84672 Approved by: https://github.com/kit1980, https://github.com/huydhn commit 93359bf9b3503135332d40cb297515efe5290ec6 Author: Edward Z. Yang Date: Wed Sep 7 17:19:08 2022 -0400 Convert ConcretePyInterpreterVTable into Meyer singleton (#84657) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84657 Approved by: https://github.com/wconstab commit 9162bc025256d638369c77c845b8a5ed66eeff5a Author: Edward Z. Yang Date: Wed Sep 7 15:43:58 2022 -0400 Convert NoopPyInterpreterVTable into a Meyer singleton (#84656) Signed-off-by: Edward Z.
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84656 Approved by: https://github.com/wconstab commit 29672b2136fc80537edf4632b2cf40f48efe0ab8 Author: samdow Date: Wed Sep 7 20:46:24 2022 +0000 [functorch] add pinv batch rule (#83761) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83761 Approved by: https://github.com/zou3519 commit 586832ce65607c6a1d1d8245b55d4ec24ddfc0e4 Author: Howard Huang Date: Wed Sep 7 08:26:16 2022 -0700 Add underlying_store property for PrefixStore (#84640) Add a property to `PrefixStore` to retrieve the underlying store it is wrapping around (see the usage sketch below). Open for suggestions on the property name. This change is based on discussion in [D39225101](https://www.internalfb.com/diff/D39225101) where we need to read properties of the store that PrefixStore is wrapping around. Differential Revision: [D39311151](https://our.internmc.facebook.com/intern/diff/D39311151) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84640 Approved by: https://github.com/xush6528 commit e68df8e4a14ce1fbedf6b20e132b11ec7b151f8a Author: Elias Ellison Date: Wed Sep 7 07:55:51 2022 -0700 Turn on functionalization by default in functorch (#84435) I talked to @SherlockNoMad about this PR and we agreed that, prior to Brian coming back, it was worth disabling this test in order to get functionalization on (and that is already the state of torchdynamo) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84435 Approved by: https://github.com/Chillee commit 6b2111619e801064065c0eaba7ca03f00feef59b Author: Catherine Lee Date: Wed Sep 7 21:44:39 2022 +0000 check rate limits of other tokens too (#83632) We keep running into API rate limit issues, but apparently they're connected to pytorchbot, so check the rate limits of our other tokens too. According to https://docs.github.com/en/rest/rate-limit this doesn't count against the rate limit. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83632 Approved by: https://github.com/huydhn commit 9532c7e267b3ccf2ca500fdae1ed5298c1f0f146 Author: samdow Date: Wed Sep 7 17:50:54 2022 +0000 [functorch] add matrix_rank rule (#83760) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83760 Approved by: https://github.com/zou3519 commit e14f46f9ddf143dbe894ee40e3a698fb401523ae Author: Howard Huang Date: Wed Sep 7 07:39:21 2022 -0700 Add host and port to TCPStore pyi definition (#84636) `host` and `port` are already exposed in the `TCPStore` pybind definition; this is a small change adding them to the pyi stub. Differential Revision: [D39311153](https://our.internmc.facebook.com/intern/diff/D39311153) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84636 Approved by: https://github.com/wz337 commit c7f6deb6678f4df578584439e4ab26d185da5ef3 Author: Scott Wolchok Date: Tue Sep 6 14:42:09 2022 -0700 [PyTorch] Guard against self assignment in SymInt (#84375) Self assignment was broken; now it's not.
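A short usage sketch for the PrefixStore change (#84640) above, assuming the property kept the `underlying_store` name proposed in the PR; the host and port are placeholders:
```
from datetime import timedelta
import torch.distributed as dist

# A TCPStore acting as the base key-value store (placeholder host/port).
base = dist.TCPStore("127.0.0.1", 29500, 1, True, timedelta(seconds=30))
prefixed = dist.PrefixStore("trainer0", base)
prefixed.set("step", "1")  # key is stored under the "trainer0" prefix

# New with #84640: recover the wrapped store instead of threading it around separately.
inner = prefixed.underlying_store
inner.set("global_flag", "1")  # writes to the base store directly, no prefix applied
```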
Differential Revision: [D39189342](https://our.internmc.facebook.com/intern/diff/D39189342/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84375 Approved by: https://github.com/suo commit fc4acd4425ca0896ca1c4f0a8bd7e22a51e94731 Author: WEN Hao Date: Wed Sep 7 19:12:33 2022 +0000 Fix error in the index range math expression in the docstring of MultiMarginLoss (#84513) Fixes #84512 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84513 Approved by: https://github.com/Lezcano, https://github.com/cpuhrsch commit d892d5d6829c315ba9b5038b8796e1c96a54f9b5 Author: Eddie Yan Date: Wed Sep 7 18:30:23 2022 +0000 [CUBLAS][TF32][CUDNN] Update numerical_accuracy.rst (#79537) CC @mruberry @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/79537 Approved by: https://github.com/ngimel, https://github.com/mruberry commit acb4a09628284201281e262aaee58e3dc6be9c2b Author: PyTorch MergeBot Date: Wed Sep 7 18:02:27 2022 +0000 Revert "Call jit decomposition in VariableType to increase forward AD coverage (#84151)" This reverts commit 42d99e6f196233627a28b8e9efb26a0a166fa370. Reverted https://github.com/pytorch/pytorch/pull/84151 on behalf of https://github.com/malfet due to Regressed test_jvpvjp_nn_functional_layer_norm_cuda_float32, see https://hud.pytorch.org/pytorch/pytorch/commit/42d99e6f196233627a28b8e9efb26a0a166fa370 commit 31ef8ddb8c4467f5b8698ef1eb9bb8bab7056855 Author: Wei Wei Date: Wed Sep 7 17:21:27 2022 +0000 add option to remove passes (#84425) Summary: Add a remove_pass method in pass_manager to provide user option to remove any pass. Reviewed By: wushirong Differential Revision: D39080077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84425 Approved by: https://github.com/yinghai commit 2feb31cb269bd640ff2858ebe8adb3fb0aec8dc0 Author: Peter Bell Date: Wed Sep 7 15:00:54 2022 +0100 Improve torch::jit::as_{module,object} performance (#84399) This caches the import of `torch.jit.ScriptModule`, `torch.ScriptObject` and `torch.jit.RecursiveScriptClass`. I measure a ~0.8 us performance uplift locally when calling a `torch.ops` function with a `ScriptObject` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84399 Approved by: https://github.com/ezyang commit 2b2e0fddf8001c0c662bd582e1d958a74bc84ac4 Author: Mateusz Sypniewski Date: Wed Sep 7 07:23:03 2022 -0700 Add CUDA Sanitizer (#83984) Example of a simple synchronization error: ``` a = torch.rand(4, 2, device="cuda") with torch.cuda.stream(second_stream): torch.mul(a, 5, out=a) ``` Output produced by CSAN: ``` ============================ CSAN detected a possible data race on tensor with data pointer 139719969079296 Access by stream 94646435460352 during kernel: aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) writing to argument: self, out, output With stack trace: File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch stack_trace = traceback.StackSummary.extract( File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__ errors = self.event_handler._handle_kernel_launch( File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped return f(self, *args, **kwargs) File "/private/home/sypniewski/pytorch/tester.py", line 9, in torch.mul(a, 5, out=a) Previous access by stream 0 during kernel: aten::rand(int[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor writing to argument: output With stack trace: File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch stack_trace = traceback.StackSummary.extract( File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__ errors = self.event_handler._handle_kernel_launch( File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped return f(self, *args, **kwargs) File "/private/home/sypniewski/pytorch/tester.py", line 6, in a = torch.rand(10000, device="cuda") Tensor was allocated with stack trace: File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation traceback.StackSummary.extract( File "/private/home/sypniewski/pytorch/torch/utils/_cuda_trace.py", line 23, in fire_callbacks cb(*args, **kwargs) File "/private/home/sypniewski/pytorch/torch/_ops.py", line 60, in __call__ return self._op(*args, **kwargs or {}) File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 541, in __torch_dispatch__ outputs = func(*args, **kwargs) File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped return f(self, *args, **kwargs) File "/private/home/sypniewski/pytorch/tester.py", line 6, in a = torch.rand(10000, device="cuda") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83984 Approved by: https://github.com/ezyang commit 19e27b15562b261e87e3e629cb32cb6876b9caca Author: Edward Z. Yang Date: Wed Sep 7 05:58:32 2022 -0700 Make dispatcher registrations of SymInt functions backwards compatible (#84557) Previously, when we SymInt-ify a schema, this is a BC-breaking change for all people who registered functions for that function; they must accept c10::SymInt where they previously accepted int64_t. This is not great. With this change, I accept old type registrations transparently. The idea is in several parts: - At the registration site, at compile time I have no idea whether or not if the function being registered has a SymInt schema or not. So I must defer the exact compatibility check. What I do instead is check if the function pointer registered to me has SymInt in the argument or not. If it does, I assume it is new-style and ensure it is also registered to a special sym_ slot on KernelFunction. If not, it only goes in the conventional slot. - At the dispatcher site, I know at compile time whether or not this is a SymInt function. If it is, I check for a sym_ slot on the KernelFunction, and preferentially use that. If no such slot exists, I then fall back to the regular slot... but I convert all SymInt arguments to int64_t arguments (doing assertions that no true symbolic integer was passed.) I can skip this test entirely if the function doesn't have any SymInts in it; in that case I know that only the original slot could have been registered. Fortunately, both branches of the short circuit typecheck, so I didn't have to use SFINAE or if-constexpr to make it work; just a plain if statement that I expect the compiler to optimize away. - Schema validation is now modestly more complicated. There are two parts. First, function schema validation proceeds by checking if the signature in question has any SymInt-like types in it or not. If it does, we do function schema validation against the real types; if it doesn't, we do validation against the fake types (but only for symint; MemoryFormat is always MemoryFormat). 
Second, cpp signature validation also keeps track of a "symint" cpp signature and a "non-symint" cpp signature. We only compare symint with symint, and non-symint with non-symint. I did not implement checking a conflict between a symint and non-symint cpp signature, though in principle you could try converting the SymInt types to non-SymInt types and doing the comparison that way. To show it is working, I remove a bunch of c10::asIntArrayRefSlow shims, as the dispatcher is able to insert them automatically now. I didn't update the Metal registrations (though they can get similar treatment) as OSS CI coverage is insufficient for this case. Signed-off-by: Edward Z. Yang Differential Revision: [D39280965](https://our.internmc.facebook.com/intern/diff/D39280965) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84557 Approved by: https://github.com/wconstab commit ed46b9670ebafa1c6bf7d078dcf5687109fee6ae Author: samdow Date: Tue Sep 6 16:41:00 2022 -0400 add randomness kwarg to jacfwd (#84220) From https://github.com/pytorch/functorch/issues/1010, if a user runs jacfwd with a function that uses randomness, it will fail since the default behavior for vmap is error. This lets the user specify the randomness behavior to jacfwd too since it is doing vmap(jvp(forward)). This is less likely to show up in jacrev since that only vmaps over the backwards pass Pull Request resolved: https://github.com/pytorch/pytorch/pull/84220 Approved by: https://github.com/zou3519 commit 50ae5c9141fc752c80e7fe88a123ea77ee0265f9 Author: Jianyu Huang Date: Wed Sep 7 16:14:23 2022 +0000 set workspace size to 4M (#74159) Summary: Follow D34480690 (https://github.com/pytorch/pytorch/commit/3ec1dd9989ac5441c767f975f5e0fc46847400a2) Test Plan: CI Differential Revision: D34636039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74159 Approved by: https://github.com/xuzhao9 commit 87738f2073d808f0f76d607d1593f7683a463f45 Author: Shen Li Date: Wed Sep 7 02:22:56 2022 +0000 Remove expired c10d::broadcast backward compatibility check (#84107) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84107 Approved by: https://github.com/wanchaol commit 99b7eb4dfbf8387d15b46913f1ff4e771782f499 Author: mikey dagitses Date: Wed Sep 7 15:44:20 2022 +0000 move internal only PyTorch test defs into fb/ subdirectories (#84605) Test Plan: Rely on CI. Differential Revision: D39289373 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84605 Approved by: https://github.com/DanilBaibak commit d3d163af8061e08097c3ae37079bf61535b81ff1 Author: lezcano Date: Wed Sep 7 13:12:49 2022 +0000 Add xla/ folder to gitignore (#84632) As per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/84632 Approved by: https://github.com/ezyang commit 42d99e6f196233627a28b8e9efb26a0a166fa370 Author: soulitzer Date: Tue Sep 6 21:37:03 2022 -0400 Call jit decomposition in VariableType to increase forward AD coverage (#84151) This PR: - updates forward AD codegen in core to generate code that tries calling into decompositions registered to jit when - (1) the function is not in-place or out variant - AND (2) the function is differentiable (requires_derivative=True) - AND (3) there are no forward AD formulas registered - To simplify things we always generating the if/else (as long as (1) is true), but generate 'false' when either (2) or (3) are false. 
- removes the mechanism from functorch - (follow up) some functorch tests should be updated here so they no longer have to compute the Jacobian with vjp - factors out some logic to generate the any_has_forward_grad condition - (bc-breaking) when TensorList inputs unexpectedly have forward grad, the error will no longer contain the name See https://github.com/pytorch/pytorch/pull/84151#issuecomment-1238519247 for codegen output and more discussion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84151 Approved by: https://github.com/samdow, https://github.com/albanD, https://github.com/zou3519 commit e31ad1c2d3a08a6421cd7a8adcd7b3f66727305a Author: soulitzer Date: Tue Sep 6 13:23:03 2022 -0400 [reland] Move decompositions and helpers for jvp from functorch into core (#84581) Reland of https://github.com/pytorch/pytorch/pull/84358 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84581 Approved by: https://github.com/samdow commit 3eb16509c761c41f50163d404428246ea117c7fd Author: nikitaved Date: Wed Sep 7 15:29:44 2022 +0000 optimize householder product backward to be more memory-efficient (#84627) A follow-up on discussions in https://github.com/pytorch/pytorch/pull/84180. Makes backward more memory-efficient with fewer kernel calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84627 Approved by: https://github.com/kshitij12345, https://github.com/zou3519 commit e96fb5d58c2accd717f0859b510ae7facb6d6aac Author: Rodrigo Kumpera Date: Wed Sep 7 14:49:45 2022 +0000 [c10d] Fix docstring of scatter_object_list (#84596) The docstring for scatter_object_list mentions it doesn't work with NCCL, but this was fixed in #79034 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84596 Approved by: https://github.com/H-Huang commit a47bc96fb7176d43752d3e376697971d4ba47317 Author: Richard Zou Date: Tue Sep 6 10:20:07 2022 -0700 [composite compliance] fix linalg.eigvals (#84137) linalg.eigvals fails in some cases with functorch and the root of the problem is that it is not composite compliant. In particular, checks that branch on whether or not a Tensor requires grad do not work with functorch. In order to support functorch with them, we have to include an additional "if the tensor is a Tensor Subclass, then assume that it MAY require grad, so we must always go through the differentiable path". This PR also changes the batching rule for linalg.eigvals to be a decomposition instead of what it was previously; the previous batching rule was masking the error in functorch's test suite. Unfortunately we don't have comprehensive tests for this on the functorch side, which is why this was not caught before. I'll look into why that is in the future; it's a bit complicated.
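As a rough Python illustration of the composite-compliance pattern described above (a hedged sketch, not the actual ATen implementation; `maybe_requires_grad` and `eigvals_sketch` are made-up names):
```python
import torch

def maybe_requires_grad(t: torch.Tensor) -> bool:
    # Branching on `t.requires_grad` alone is not composite compliant: a Tensor
    # subclass (e.g. a functorch wrapper) may hide grad-ness, so if we see a
    # subclass we assume it MAY require grad and take the differentiable path.
    return t.requires_grad or type(t) is not torch.Tensor

def eigvals_sketch(a: torch.Tensor) -> torch.Tensor:
    if maybe_requires_grad(a):
        # Differentiable path: compute eigenvectors too and keep only the values.
        values, _ = torch.linalg.eig(a)
        return values
    # Fast path for plain tensors that definitely do not need grad.
    return torch.linalg.eigvals(a)

print(eigvals_sketch(torch.randn(3, 3)))
```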
Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/84137 Approved by: https://github.com/Lezcano, https://github.com/IvanYashchuk, https://github.com/samdow commit 89c4654ba9e3c552d3a6e0a56da8adf656cce469 Author: Shen Li Date: Tue Sep 6 21:51:34 2022 +0000 Add scatter_ to CommTensor (#84606) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84606 Approved by: https://github.com/wanchaol commit f43c38bdc820650ad974bb1c48360b0c6931961a Author: Shen Li Date: Tue Sep 6 21:35:20 2022 +0000 Add broadcast_ to CommTensor (#84604) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84604 Approved by: https://github.com/wanchaol commit a24d7a8565f5aac8448775552557112d0239fc8f Author: Shen Li Date: Tue Sep 6 21:35:12 2022 +0000 Add reduce_scatter_ to CommTensor (#84592) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84592 Approved by: https://github.com/wanchaol commit e4519548a5a5f4026645f4a240ac026094ef1be5 Author: Shen Li Date: Tue Sep 6 21:35:12 2022 +0000 Supported nested lists in CommTensor and enable tracing allgather_ (#84585) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84585 Approved by: https://github.com/wanchaol commit 189768ed64561e61ff05c9e42adfa40139388204 Author: CaoE Date: Wed Sep 7 13:48:43 2022 +0000 Add mkl implementation for exponential on CPU (#69967) Add mkl implementation for exponential on CPU to improve the performance of exponential. data type: float32 single socket (28cores): ``` before: torch.Size([10, 128, 10, 124]) 0.065 ms torch.Size([10, 128, 20, 124]) 0.130 ms after: torch.Size([10, 128, 10, 124]) 5.9e-05 ms torch.Size([10, 128, 20, 124]) 0.000113 ms ``` single core: ``` before: torch.Size([10, 128, 10, 124]) 0.065 ms torch.Size([10, 128, 20, 124]) 0.130 ms after: torch.Size([10, 128, 10, 124]) 0.00117 ms torch.Size([10, 128, 20, 124]) 0.002347 ms ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/69967 Approved by: https://github.com/frank-wei commit 9e7af4e8d4540c6034806e84fec64d08643031bd Author: Mateusz Sypniewski Date: Wed Sep 7 13:01:51 2022 +0000 Add alias info to torch._C (#84580) This adds the `AliasInfo` class to torch._C, as defined in https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/python/init.cpp#L1943. This will fix MYPY errors for missing `Argument` attributes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84580 Approved by: https://github.com/lw commit ec3939a62f7e09807e0e7e9701c354c94aef7a66 Author: Sean Silva Date: Wed Sep 7 12:53:08 2022 +0000 Detect `__code__` a bit more reliably. (#84610) Based on Ed's patch. 
Fixes https://github.com/pytorch/pytorch/issues/84570 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84610 Approved by: https://github.com/Chillee commit 07d398fb269eebe314ae898287494a2bfdc7f278 Author: kshitij12345 Date: Wed Sep 7 09:33:37 2022 +0000 [composite compliance] linalg_householder_product (#84180) Ref: #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84180 Approved by: https://github.com/zou3519 commit 045ebc771d5070696f839e586285ace9c06f1339 Author: Nikita Shulga Date: Wed Sep 7 05:52:27 2022 +0000 [BE] Use `teardown-linux`/`chown` actions for binary builds (#84449) Also embed `wait_for_ssh_to_drain.sh` into the action (to make it more reusable across repos) and delete unused teardown_linux template from `common.yml` Also, in `_binary-test-linux.yml` move artifact download step after repo checkout, to make errors during that step more parseable Pull Request resolved: https://github.com/pytorch/pytorch/pull/84449 Approved by: https://github.com/kit1980 commit 1a33e944b58a75efe6154f1d02a32b80b7661edf Author: jjsjann123 Date: Wed Sep 7 05:22:37 2022 +0000 nvfuser torchbench patch (#84411) 1. Patching nvfuser_execute to take aten nvprim fallback when no cuda tensors are provided as inputs 2. Extending support of nvfuser python API on cpu scalar tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84411 Approved by: https://github.com/ngimel, https://github.com/kevinstephano, https://github.com/IvanYashchuk commit 7c3102f3f09e0ac0bf272df2ad48dd40515eceea Author: Antonio Kim Date: Wed Sep 7 05:03:02 2022 +0000 Add ShouldSyncTensor interface (#84418) Adding an `ShouldSyncTensor` interface to allow for the case of output pruning should a vendor not support retrieving the value of a certain output. CC: @wconstab @JackCaoG @Krovatkin Pull Request resolved: https://github.com/pytorch/pytorch/pull/84418 Approved by: https://github.com/wconstab commit c4e0c927e31d59c51a6d4e09d58038becc1faf29 Author: Ke Wen Date: Wed Sep 7 04:48:02 2022 +0000 [c10d] Add a soft error handling mode (#84386) Adding new value "2" to env `NCCL_ASYNC_ERROR_HANDLING` standing for a "CleanUpOnly" error handling mode. Comparing to `NCCL_ASYNC_ERROR_HANDLING=1`, the "CleanUpOnly" mode will just abort the collectives and NCCL communicators, and will not tear down the process. User will have the chance to query the state of the process group (in a later PR) and abort the process group (in a later PR), and re-create the process group if needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84386 Approved by: https://github.com/rohan-varma commit 5b58140d1a471b144baf66cc61a45a86746f0215 Author: Kurt Mohler Date: Wed Sep 7 03:12:49 2022 +0000 Add deterministic impl of `scatter_add` CUDA for all input sizes (#79466) Fixes #50469 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79466 Approved by: https://github.com/ngimel commit 039b0146f9d831ae20ce293989db07a711dae09a Author: PyTorch MergeBot Date: Wed Sep 7 02:39:36 2022 +0000 [vision hash update] update the pinned vision hash (#83900) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83900 Approved by: https://github.com/pytorchbot commit 15c5baf87802cda783824c1762bf16d848b6625f Author: Elias Ellison Date: Wed Sep 7 00:20:29 2022 +0000 Throw on data dependent ops (#83567) Previously, we would trace through the following with no error: ``` from torch.fx.experimental.proxy_tensor import make_fx import torch def f(x, y): return x[0, y:] ``` Even though the output shape is dependent on the data of `y`. Now, throw on the conversion of `y` to an integer. It would be nice to not break on constant tensors but I'll do that as the next PR (Edit: done with https://github.com/pytorch/pytorch/pull/84387). Sketching out how that would work (and keep in mind this is applicable Dynamo tracing and not just AOT Autograd) I think to do that you would need to : - hold strong refs to a set of constant tensors, and only allow them to be captured from `lift_fresh.copy` - when you run a mutable op, either remove it from the set of constant tensors or run the operator for real - limit to small constant tensors Anything else ? Pull Request resolved: https://github.com/pytorch/pytorch/pull/83567 Approved by: https://github.com/ezyang commit 0be77d54159ae9b95297e978d29eb2c92d5bafee Author: PyTorch MergeBot Date: Wed Sep 7 02:32:47 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84613) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84613 Approved by: https://github.com/pytorchbot commit b168c4faa23b5684ede608140febec7c97a795d0 Author: Shen Li Date: Tue Sep 6 16:14:16 2022 +0000 Make CommTensor Generic to arguments and outputs structures (#84576) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84576 Approved by: https://github.com/aazzolini commit 00e0228050739dd33335cc2a8663c9759ba2f144 Author: Nikita Shulga Date: Wed Sep 7 00:53:46 2022 +0000 [BE] Delete `Check for new workflow" check (#84608) This check was introduced back in Mar, and all PRs for the last 60+ days should pass this check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84608 Approved by: https://github.com/ZainRizvi, https://github.com/kit1980 commit 06ebe2d5bc1055f226f56ed2fe26a29038a466e5 Author: Bin Chen Date: Wed Sep 7 00:17:20 2022 +0000 Add watchdog to TorchElastic agent and trainers (#84081) Summary: D38604238 (https://github.com/pytorch/pytorch/commit/3b11b80fc3f9f9a0171abb5eb2299835feba8b04) introduced a named pipe based watchdog timer. This diff uses the named pipe based watchdog timer in TorchElastic agent and training worker processes (in the StuckJobDetector class) to allow the TorchElastic agent to detect the stuck of a training process, and kill the process to create a core dump. 
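The named-pipe watchdog mechanism itself is not shown in the commit message; as a loose, hypothetical sketch of the general idea only (the FIFO path, function names, and use of SIGABRT are assumptions, not the TorchElastic or StuckJobDetector API):
```python
import os
import signal
import time

FIFO_PATH = "/tmp/elastic_watchdog_fifo"  # hypothetical path


def worker_heartbeat(period_s: float = 5.0) -> None:
    """Worker side: periodically write our PID into the pipe."""
    fd = os.open(FIFO_PATH, os.O_WRONLY)  # blocks until the agent has the pipe open
    while True:
        os.write(fd, f"{os.getpid()} ".encode())
        time.sleep(period_s)


def agent_watchdog(timeout_s: float = 60.0) -> None:
    """Agent side: abort any worker whose heartbeat goes silent for too long."""
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    # O_RDWR keeps the pipe readable even when no worker is connected;
    # O_NONBLOCK lets the agent poll on its own schedule.
    fd = os.open(FIFO_PATH, os.O_RDWR | os.O_NONBLOCK)
    last_seen = {}  # pid -> time of last heartbeat
    while True:
        try:
            data = os.read(fd, 4096)
        except BlockingIOError:
            data = b""
        now = time.monotonic()
        for token in data.decode().split():
            last_seen[int(token)] = now
        for pid, seen in list(last_seen.items()):
            if now - seen > timeout_s:
                os.kill(pid, signal.SIGABRT)  # stuck worker: abort it to get a core dump
                del last_seen[pid]
        time.sleep(1.0)
```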
Test Plan: ``` buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test ``` ``` RemoteExecution session id: reSessionID-0bfcacef-24d1-42bc-a1d3-f3058fc42b2f-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7318349503394739 ✓ ListingSuccess: caffe2/test/distributed/elastic/agent/server/test:local_agent_test : 55 tests discovered (22.699) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_barrier_failed_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (47.140) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_homogeneous_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (49.198) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_happy_function_c10d (local_elastic_agent_test.LocalElasticAgentTest) (46.387) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_happy_function_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (46.094) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_bipolar_function_etcd (local_elastic_agent_test.LocalElasticAgentTest) (106.342) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_correct_rank_assignment_homogeneous_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (64.888) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_correct_rank_assignment_homogeneous_etcd (local_elastic_agent_test.LocalElasticAgentTest) (69.158) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_agent_local_watchdog_setup_enabled_etcd (local_elastic_agent_test.LocalElasticAgentTest) (46.965) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_double_agent_elastic_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (79.626) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_function_with_return_value_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (46.113) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_sad_function_etcd (local_elastic_agent_test.LocalElasticAgentTest) (46.487) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_shutdown_called_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (24.358) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_torch_rpc_c10d (local_elastic_agent_test.LocalElasticAgentTest) (48.216) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_homogeneous_c10d (local_elastic_agent_test.LocalElasticAgentTest) (48.433) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_torch_rpc_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (47.029) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_simple_dist_sum_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (44.357) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_check_master_addr_port_override_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (45.176) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_check_nccl_async_error_handling_env_default_c10d (local_elastic_agent_test.LocalElasticAgentTest) (45.980) ✓ Pass: 
caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_simple_dist_sum_c10d (local_elastic_agent_test.LocalElasticAgentTest) (47.151) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_simple_dist_sum_etcd (local_elastic_agent_test.LocalElasticAgentTest) (44.614) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_correct_rank_assignment_heterogeneous_etcd (local_elastic_agent_test.LocalElasticAgentTest) (69.099) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_agent_local_watchdog_setup_enabled_c10d (local_elastic_agent_test.LocalElasticAgentTest) (45.367) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_shutdown_called_etcd (local_elastic_agent_test.LocalElasticAgentTest) (22.804) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_double_agent_elastic_c10d (local_elastic_agent_test.LocalElasticAgentTest) (77.560) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_dummy_compute_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (46.050) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_heterogeneous_c10d (local_elastic_agent_test.LocalElasticAgentTest) (48.088) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_double_agent_elastic_etcd (local_elastic_agent_test.LocalElasticAgentTest) (77.286) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_double_agent_fault_tolerance_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (50.670) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_check_master_addr_port_override_etcd (local_elastic_agent_test.LocalElasticAgentTest) (45.631) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_heterogeneous_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (50.867) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_double_agent_fault_tolerance_etcd (local_elastic_agent_test.LocalElasticAgentTest) (51.095) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_happy_function_etcd (local_elastic_agent_test.LocalElasticAgentTest) (45.000) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_sad_function_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (45.197) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_homogeneous_etcd (local_elastic_agent_test.LocalElasticAgentTest) (46.873) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_shutdown_called_c10d (local_elastic_agent_test.LocalElasticAgentTest) (23.160) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_barrier_failed_etcd (local_elastic_agent_test.LocalElasticAgentTest) (43.632) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_torch_rpc_etcd (local_elastic_agent_test.LocalElasticAgentTest) (44.536) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_bipolar_function_c10d (local_elastic_agent_test.LocalElasticAgentTest) (89.859) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_workers_drift_fail_etcd (local_elastic_agent_test.LocalElasticAgentTest) (48.277) ✓ Pass: 
caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_check_nccl_async_error_handling_env_c10d (local_elastic_agent_test.LocalElasticAgentTest) (43.930) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_bipolar_function_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (87.677) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_workers_drift_success_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (48.965) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_workers_drift_fail_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (50.143) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_workers_drift_success_etcd (local_elastic_agent_test.LocalElasticAgentTest) (46.781) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_function_with_return_value_etcd (local_elastic_agent_test.LocalElasticAgentTest) (45.152) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_barrier_failed_c10d (local_elastic_agent_test.LocalElasticAgentTest) (44.832) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_function_with_return_value_c10d (local_elastic_agent_test.LocalElasticAgentTest) (45.281) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_correct_rank_assignment_heterogeneous_etcd_v2 (local_elastic_agent_test.LocalElasticAgentTest) (74.968) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_agent_local_watchdog_setup_disabled_c10d (local_elastic_agent_test.LocalElasticAgentTest) (46.141) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_dummy_compute_c10d (local_elastic_agent_test.LocalElasticAgentTest) (44.960) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_dummy_compute_etcd (local_elastic_agent_test.LocalElasticAgentTest) (45.292) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_agent_local_watchdog_setup_disabled_etcd (local_elastic_agent_test.LocalElasticAgentTest) (44.611) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_check_env_function_etcd (local_elastic_agent_test.LocalElasticAgentTest) (44.939) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_distributed_sum_heterogeneous_etcd (local_elastic_agent_test.LocalElasticAgentTest) (47.609) ✓ Pass: caffe2/test/distributed/elastic/agent/server/test:local_agent_test - test_run_sad_function_c10d (local_elastic_agent_test.LocalElasticAgentTest) (45.628) Summary Pass: 55 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7318349503394739 ``` ----------- ``` buck test caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test ``` ``` RemoteExecution session id: reSessionID-607a0028-4095-4dfc-b657-55f0807fe621-tpx Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774432794818 ✓ ListingSuccess: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test : 11 tests discovered (39.037) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_thrift_api_called (caffe2.torch.fb.trainer.stuck_detection.tests.collect_quickstack_test.CollectQuickstackTrace) (0.655) ✓ Pass: 
caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_setup_local_watchdog (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (36.510) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_dont_print_when_job_normal (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (36.727) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_send_watchdog_request_on_batch_callbacks_no_server (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.060) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_quickstack_stuck_job (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.242) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_setup_local_watchdog_disabled (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.243) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_print_stack_trace_when_job_stuck (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.590) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_print_when_stuck (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.590) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_setup_local_watchdog_no_file (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (37.589) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_signposts_stack_trace_when_job_stuck (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (38.132) ✓ Pass: caffe2/torch/fb/trainer/stuck_detection/tests:stuck_job_detector_test - test_send_watchdog_request_on_batch_callbacks (caffe2.torch.fb.trainer.stuck_detection.tests.stuck_job_detector_test.StuckJobDetectorTest) (38.133) Summary Pass: 11 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774432794818 ``` Differential Revision: D38930476 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84081 Approved by: https://github.com/d4l3k commit d9ceda49c497bc39c2b360b038ee07e145b32f5b Author: Thibault Date: Tue Sep 6 23:32:16 2022 +0000 ONNX: fix default function value in _optimize_graph (#83996) The default value for params_dict in _optimize_graph, which is None, throw the following error: > _C._jit_pass_onnx_unpack_quantized_weights( > TypeError: _jit_pass_onnx_unpack_quantized_weights(): incompatible function arguments. The following argument types are supported: > 1. (arg0: torch::jit::Graph, arg1: Dict[str, IValue], arg2: bool) -> Dict[str, IValue] Replacing it by an empty dict fixes the issue (and makes more sense). 
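For illustration only, the shape of that fix (a hedged sketch; the real `_optimize_graph` takes several more parameters):
```python
from typing import Any, Dict, Optional

def _optimize_graph_sketch(graph: Any, params_dict: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # The pybind-bound pass expects Dict[str, IValue]; passing None hits the
    # "incompatible function arguments" error above, so normalize to {} first.
    if params_dict is None:
        params_dict = {}
    # ...the real code would now forward params_dict to
    # _C._jit_pass_onnx_unpack_quantized_weights(graph, params_dict, ...)
    return params_dict
```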
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83996 Approved by: https://github.com/BowenBao commit 16f8dc00f0331c839b04381d8ec644fbc2220313 Author: Hansong Zhang Date: Tue Sep 6 23:08:07 2022 +0000 [nnapi] remove unused field 'order_' in nnapi.h (#84067) Summary: This fixes the build Test Plan: `buck build //xplat/caffe2:nnapi_benchmarkAndroid` Reviewed By: salilsdesai Differential Revision: D38924916 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84067 Approved by: https://github.com/SS-JIA commit 166dec74b5ce3968a53d4c0f616776d0a2bf4309 Author: PyTorch MergeBot Date: Tue Sep 6 22:31:14 2022 +0000 Revert "Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761)" This reverts commit 65beff5acb0d7c0c484bd0558bcaf8ddc9c96aab. Reverted https://github.com/pytorch/pytorch/pull/81761 on behalf of https://github.com/mehtanirav due to Breakages in pytorch/glow commit 0e49bcfd416b1a83de0820f910c7c9ac38cbebaf Author: Paul Saab Date: Tue Sep 6 22:28:43 2022 +0000 [aarch64] Use cross build ld/ar/objcopy when creating libraries for cross building etc (#84558) Summary: ^^ Test Plan: CI Differential Revision: D39267050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84558 Approved by: https://github.com/ajtulloch commit 1cad744694d7feb7c55e5f4ff4a6ae749686bfb5 Author: Mikayla Gawarecki Date: Tue Sep 6 16:59:51 2022 +0000 Enable select.int when NestedTensor requires grad (#83875) Previously indexing a nested tensor when it requires_grad would raise an error because the backward formula for `select.int` uses `self.sizes()`. This PR fixes that by temporarily registering a _nested_select_backward function which can be removed when we start using the symint approach to register kernels. For now this functionality is needed for creating a POC that nested tensor can be an API to `segment_coo` and `segment_csr` in the torch_scatter repo ``` a = torch.arange(10).reshape(2, 5).float() b = torch.arange(12).reshape(2, 6).float() nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(True) nt[0] ``` whereas ``` nt = torch.nested_tensor([a, b], dtype=torch.float).requires_grad_(False) nt[0] ``` would succeed Pull Request resolved: https://github.com/pytorch/pytorch/pull/83875 Approved by: https://github.com/albanD, https://github.com/drisspg commit 752c3bcb474e8024f59c9977dea67adfb256146d Author: Ivan Yashchuk Date: Tue Sep 6 22:08:11 2022 +0000 Enable nvfuser tests for refs.broadcast_to and refs.broadcast_tensors (#84337) Previously these tests were failing because they required some other op alongside prims.broadcast_in_dim to be executed. Now it works standalone. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84337 Approved by: https://github.com/mruberry, https://github.com/ngimel commit aec76e391f8c5c44e0340c7f4c67347f043e3144 Author: Catherine Lee Date: Tue Sep 6 21:32:03 2022 +0000 circleci - add master back, retry checkout for ios (#84443) add master back so its easier to determine when something started failing retry checkout for ios, based on the provided circleci checkout but with a lot of stuff removed Pull Request resolved: https://github.com/pytorch/pytorch/pull/84443 Approved by: https://github.com/janeyx99 commit 7a7b05802ac6b2cd14ffcc1af512d0c5cc46bf33 Author: Peter Bell Date: Tue Sep 6 19:19:14 2022 +0100 Add col2im_batched kernel (#84543) Closes #84407 This changes col2im on CUDA to launch a single batch-aware kernel instead of launching n single slice kernels. 
The `istft` call in the linked issue goes from 98.7 ms to 858 us on my machine, for an over 100x speedup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84543 Approved by: https://github.com/ngimel commit bab1304f59cf48901891aad73974dc123ad9614a Author: Antonio Kim Date: Tue Sep 6 20:55:34 2022 +0000 Add step closures (#84300) Ports over the step closure functionality from PyTorch/XLA to Lazy Tensor Core: References: https://github.com/pytorch/xla/blob/205ae574c0a24e092899ea8610c360f93f5d8142/torch_xla/core/xla_model.py#L852-L900 https://github.com/pytorch/xla/blob/205ae574c0a24e092899ea8610c360f93f5d8142/torch_xla/utils/closures.py#L7-L83 CC: @wconstab @JackCaoG @Krovatkin Pull Request resolved: https://github.com/pytorch/pytorch/pull/84300 Approved by: https://github.com/JackCaoG, https://github.com/wconstab commit 02da9437b0ed501c2403e133b8c81eab5802c586 Author: Edward Z. Yang Date: Tue Sep 6 09:57:24 2022 -0700 Store SymInt out of line (#84390) swolchok reported that non-tracing usage of Tensor we are wasting a lot of time on is_symbolic() tests, e.g., when destructing SymInts. This is a regression for no good reason because we don't actually ever have SymInts in those cases. This PR moves the stored SymInts on Tensor out of line, into a separate ExtraMeta struct, which is only allocated when we make a Tensor store symbolic sizes/strides. To avoid adding another word to TensorImpl, I take over the named tensor metadata field. This makes named tensor require a double indirection and use up more space, but it's OK since we're going to delete this feature anyway soon. I restore regular int64_t storage on Tensor. This entailed reverting https://github.com/pytorch/pytorch/pull/82467 ; there are no other substantive changes to SizesAndStrides so a close review is not necessary. I don't bother optimizes sizes and strides in ExtraMeta in the same way stock tensor is optimized. I add a SymDimVector alias. I make SymInt UNCHECKED constructor public as it is a useful optimization in some situations when the int is known to be positive. I thought about storing the SymInts on the Python object instead. However, because we can allocate symbolic shape tensors directly from C++, we cannot guarantee that there is a PyInterpreter for a Tensor. So we do it this way instead; it's also faster since you don't have to take out the GIL to do accesses. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84390 Approved by: https://github.com/swolchok, https://github.com/Krovatkin commit 7f90606309bda30e82f571d9720b25e85a041246 Author: Max Podkorytov Date: Tue Sep 6 20:07:56 2022 +0000 [static-runtime] update generator for the modified tests; re-run autogen script (#84437) Test Plan: CI Reviewed By: mikeiovine Differential Revision: D39183148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84437 Approved by: https://github.com/mikeiovine commit 6363b1b3587aa64ad055ba0a905af28d8dec52d2 Author: Ivan Yashchuk Date: Tue Sep 6 19:56:17 2022 +0000 Add nvFuser support for aten.native_batch_norm_backward (#84546) Replacing `tensor.reshape(broadcast_mask)` with unsqueezes makes the implementation of `batch_norm_backward` more friendly for PrimTorch+nvFuser. 
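As a small, hedged illustration of the equivalence being exploited here (shapes and variable names are made up; this is not the actual decomposition):
```python
import torch

# Per-channel statistics of shape [C], broadcast against an NCHW input.
c = 8
stats = torch.randn(c)
broadcast_mask = [1, c, 1, 1]

via_reshape = stats.reshape(broadcast_mask)                     # reshape-based broadcasting helper
via_unsqueeze = stats.unsqueeze(0).unsqueeze(-1).unsqueeze(-1)  # prim-friendly equivalent

assert via_reshape.shape == via_unsqueeze.shape == torch.Size([1, c, 1, 1])
assert torch.equal(via_reshape, via_unsqueeze)
```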
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84546 Approved by: https://github.com/Chillee commit 7243264c61a15446bad0fcd412a1fee1bc08ec1e Author: F-G Fernandez <26927750+frgfm@users.noreply.github.com> Date: Tue Sep 6 19:24:10 2022 +0000 fix: Allowed optimizers with more than 2 betas (#84486) Hello there :wave: As discussed in #84485, this PR enables more flexibility on the optimizers that are wrapped by LR schedulers in PyTorch. Currently, it is incompatible with optimizers that have a number of betas different than 2. This PR fixes that with minimal modifications. Fixes #84485 Any feedback is welcome! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84486 Approved by: https://github.com/Lezcano, https://github.com/soulitzer commit e20f2172954609e44f014146a291fd521d29180e Author: Ivan Yashchuk Date: Tue Sep 6 19:16:39 2022 +0000 Remove unnecessary decomposition_table= from test/test_prims.py (#84188) Follow-up to 83782 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84188 Approved by: https://github.com/jjsjann123, https://github.com/ngimel commit 88b1cc885cc92b9483eec95546bb48c7bccea070 Author: Fabio Rocha Date: Tue Sep 6 16:35:26 2022 +0000 Removed tri[lu]* tests, superseded by OpInfos (#84256) triu, tril, triu_indices and tril_indices had some tests in test_tensor_creation_ops.py and test_cuda.py that are redundant with the ones done by OpInfos for those ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84256 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 92a6b970baf87b9cb85112f5facb6af51c48c8c0 Author: cchheennhhaaoo Date: Tue Sep 6 18:38:27 2022 +0000 Be compatible with SYCL 2020 and SYCL1.2.1 for sycl.hpp (#83259) - In SYCL 2020, SYCL provides one standard header file: `<sycl/sycl.hpp>`, which needs to be included in every translation unit that uses the SYCL programming API. - For compatibility with SYCL 1.2.1, SYCL provides another standard header file: `<CL/sycl.hpp>`, which can be included in place of `<sycl/sycl.hpp>`. - SYCL documents this change in [doc](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:headers-and-namespaces)(4.3). - SYCL_LANGUAGE_VERSION substitutes an integer reflecting the version number and revision of the SYCL language being supported by the implementation in SYCL 2020. In SYCL 1.2.1, the macro name is CL_SYCL_LANGUAGE_VERSION. So these two macros can be used to distinguish SYCL 1.2.1 and SYCL 2020. - SYCL 2020 doc: https://registry.khronos.org/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf - SYCL 1.2.1 doc: https://registry.khronos.org/SYCL/specs/sycl-1.2.1.pdf Pull Request resolved: https://github.com/pytorch/pytorch/pull/83259 Approved by: https://github.com/malfet commit c4e8d6282bc730ab35fc3a42c12bfda7a99a5b1c Author: Jason Ansel Date: Tue Sep 6 18:36:24 2022 +0000 Improve getitem syntax for TensorType (#84555) Allows `TensorType[Dyn, 3, Dyn]` instead of the prior `TensorType[(Dyn, 3, Dyn)]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84555 Approved by: https://github.com/jamesr66a commit fa99b7b8f726f8ce63f0dff076b7e9171e3dd40a Author: Sergei Vorobev Date: Tue Sep 6 18:14:08 2022 +0000 [bazel] fix integration test (#79843) Fixes broken bazel `bazel test //:integration_test`. Bazel needs a way to download the mnist dataset that's used in the integration test. This patch does it through a genrule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79843 Approved by: https://github.com/malfet commit 4f0b9f3c31bebdb46df8f78f13a0857f6c4ed43f Author: mikey dagitses Date: Tue Sep 6 18:08:42 2022 +0000 move PyTorch internal-only starlark files into fb/ subdirectories (#84548) Summary: These are not used in OSS so should not clutter them there. Test Plan: Rely on CI. Differential Revision: D39262135 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84548 Approved by: https://github.com/DanilBaibak commit c794ee5cc12192da527bbbcf5c5b9ec33c935cbe Author: Nikita Shulga Date: Tue Sep 6 17:49:29 2022 +0000 Reenable TestCppExtensionJIT on M1 (#84552) Works fine locally, let's see if it'll pass CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/84552 Approved by: https://github.com/kit1980 commit c771d73461449f89e26bc4130d1641340a03e05d Author: Richard Zou Date: Tue Sep 6 07:10:33 2022 -0700 [composite compliance] fix max_pool1d (#84127) max_pool1d has a fast path for CPU tensors that do not require grad that directly accesses the data_ptr. This PR makes the change that if the input Tensor is a Tensor Subclass, then we want to walk through the "slow path" of calling max_pool1d_with_indices. Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/84127 Approved by: https://github.com/kshitij12345, https://github.com/samdow, https://github.com/malfet commit 139599ba954e084ed6962dc94c99f5f2ce6ec2e7 Author: Richard Zou Date: Tue Sep 6 07:10:33 2022 -0700 Contiguify bias in slow_conv_transpose3d kernel (#84125) Users never run into this because PyTorch now comes with cudnn by default and cudnn has a better conv_transpose implementation. However we seem to test without cudnn in our CI; and also, ROCM goes down this path. The .contiguous() call does not regress anything because previously it was a runtime error. Because this kernel is the "slow conv transpose3d kernel", we don't care much for its performance. Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/84125 Approved by: https://github.com/ngimel commit ee228ad9499ca97c267e5597d36570e096dcf2c0 Author: PyTorch MergeBot Date: Tue Sep 6 17:12:12 2022 +0000 Revert "[BE] Use `teardown-linux`/`chown` actions for binary builds (#84449)" This reverts commit 1a16b2576f69383480e8be889531e4f574356c62. Reverted https://github.com/pytorch/pytorch/pull/84449 on behalf of https://github.com/malfet due to Revert as it broke trunk, though on next PR commit faac3dbce20a6068a3e530c11788896e81a73c64 Author: kshitij12345 Date: Tue Sep 6 16:58:42 2022 +0000 [optim] asgd : handle complex params as independent real params (#84472) Ref: #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84472 Approved by: https://github.com/Lezcano, https://github.com/soulitzer commit f725009a48dcbec6c9e9378880314d30a9080c82 Author: Nikolay Korovaiko Date: Wed Aug 31 23:51:35 2022 -0700 as_strided supports SymInt; codegen supports optional SymInt (#84393) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393 Approved by: https://github.com/ezyang commit ee57f5c6c81b1622e3d34f5f0c4f20aad108797f Author: Nikolay Korovaiko Date: Wed Aug 31 20:55:41 2022 -0700 fix skipIfTorchDynamo on classes (#84392) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84392 Approved by: https://github.com/ezyang commit 5e9c26c8e25e5d5be18ff98b7808b674b1e7a0a5 Author: George Qi Date: Tue Sep 6 06:56:38 2022 +0000 [maskedtensor] adding reductions (#82839) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82839 Approved by: https://github.com/bhosmer commit f125bd2cbb8301c12685957ace573c301e1056e2 Author: Peter Bell Date: Tue Sep 6 13:43:09 2022 +0100 Support torch.ScriptObject in torch::jit::as_object (#84398) When a torchbind class is returned from an operator, it has the class `torch.ScriptObject`, yet the `torch.ops` interface checks against `torch.jit.RecursiveScriptClass` or else falls back to a much slower path that doesn't return the original c++ object. On my machine I see a 2 us performance improvement when calling a `torch.ops` function with a `ScriptObject` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84398 Approved by: https://github.com/ezyang commit 207a5a8fa9bfd9361038d46636c0440290c171bb Author: PyTorch MergeBot Date: Tue Sep 6 13:23:19 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84383) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84383 Approved by: https://github.com/pytorchbot, https://github.com/ezyang commit d2b8b8f29121ed23e2b39446d09bba3a7eb96684 Author: Paul Saab Date: Tue Sep 6 03:05:52 2022 +0000 [aarch64] Unused variable (#84549) Summary: Declare another variable unused Test Plan: CI Reviewed By: andrewjcg Differential Revision: D39263305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84549 Approved by: https://github.com/jianyuh commit 26c136a135fe0215195a6e0566651baaffb01159 Author: Peter Bell Date: Mon Sep 5 15:02:02 2022 +0100 Use TensorBase in Shuffle and WeightNorm cpu kernels (#84499) These files are already only using the subset available with TensorBase, so this is a straightforward name substitution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84499 Approved by: https://github.com/ezyang commit 6f29642b6f27f53295ead7c3f2767ef45307e710 Author: Peter Bell Date: Mon Sep 5 15:02:01 2022 +0100 Remove Tensor.h includes from spdiags cpu kernel (#84500) This file uses `Tensor::operator[]` in the middle of a `cpu_kernel`, which is not allowed because it relies on the thread-local dispatcher state. Instead, we should just do the stride calculations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84500 Approved by: https://github.com/ezyang commit 1a16b2576f69383480e8be889531e4f574356c62 Author: Nikita Shulga Date: Mon Sep 5 21:44:30 2022 +0000 [BE] Use `teardown-linux`/`chown` actions for binary builds (#84449) Also embed `wait_for_ssh_to_drain.sh` into the action (to make it more reusable across repos) and delete unused teardown_linux template from `common.yml` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84449 Approved by: https://github.com/kit1980 commit 91a5f52f51de9d6aa305d184fe07fe15d20b82c9 Author: Fabio Rocha Date: Mon Sep 5 14:27:37 2022 +0000 Decomp for nn.functional.grid_sampler_2d (#84350) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84350 Approved by: https://github.com/jansel, https://github.com/Lezcano commit acb11da556ddb2302ac14531c5ddf7016ff34a97 Author: Alexander Grund Date: Mon Sep 5 21:23:50 2022 +0000 Increase default test timeout for distributed tests (#80330) When running on clusters the startup time for the subprocesses might be much higher which leads to spurious failures. So increase this to 300s similar to torch/testing/_internal/distributed/distributed_test.py Also introduces `DISTRIBUTED_TESTS_DEFAULT_TIMEOUT` as suggested by @malfet in #55896 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80330 Approved by: https://github.com/malfet commit da99008d3775859832990b3b930ed3c1e4151637 Author: Boyoon Jang Date: Mon Sep 5 16:48:32 2022 +0000 fix typo in torch/package/_mock.py (#84508) Fixed a typo in torch/package/_mock.py Fixes #84507 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84508 Approved by: https://github.com/H-Huang commit e79d0ebfa6d09bc4728bf63ae56cae28b831dbfe Author: Hyeongjun Sim Date: Mon Sep 5 16:34:02 2022 +0000 Fix typo in core.py (#84534) This is a minor typo fix in core.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/84534 Approved by: https://github.com/H-Huang commit 1896d801913fe156f46b0b65f4b1e38f314210b3 Author: Louis Feng Date: Mon Sep 5 16:11:49 2022 +0000 [PyTorch][Profiler] Increase max number of elements to record in execution graph (#84285) Summary: Noticed some jobs are exceeding the max num of elements in an array. 100 was too conservative (observed 128 sizes in CMF model), but we also don't want have unbounded container size. Setting to a large number 4096 that probably will catch extreme cases. Test Plan: ``` buck build mode/opt-split-dwarf //hpc/models/ads:ads_10x_launcher --show-output buck-out/gen/hpc/models/ads/ads_10x_launcher.par +checkpoint=model_store +launcher=mast +data_loader=dist +mode=mast launcher.data_project=ads_model_platform launcher.fbl_entitlement=ads_global_qps checkpoint.model_type=ctr_mobile_feed_model data_loader.table_ds=["2022-08-15"] data_loader.num_batches=5000 profiling_trace=true ``` Differential Revision: D39137530 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84285 Approved by: https://github.com/robieta commit 7e05879b463cfa21b1cf3c26279bf248f835f52e Author: Vasilis Vryniotis Date: Mon Sep 5 13:15:55 2022 +0000 Fix fx test for S3D (#84526) Fixing [failing](https://github.com/pytorch/pytorch/runs/8083404365?check_suite_focus=true) tests by adjusting the input size for S3D. The reason the test is failing is because S3D requires a bigger input size than previously passed. 
As noted before, TorchVision already checks that its models are FX traceable and ensures all the tests are updated and work properly prior to adding new architectures. The tests here seem to duplicate our efforts and often break because they don't factor in details about each model. It might be worth considering running TorchVision's tests instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84526 Approved by: https://github.com/pbelevich commit 437b066e26fab4f84c55314d9a0f6299525297a1 Author: PyTorch MergeBot Date: Mon Sep 5 09:58:11 2022 +0000 [xla hash update] update the pinned xla hash (#84533) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84533 Approved by: https://github.com/pytorchbot commit edab44f6dd4d5fbe00136c70c99be12a8f67e9f7 Author: Ivan Yashchuk Date: Mon Sep 5 08:49:01 2022 +0000 Support a few corner cases for nvFuser executor (#84416) This PR adds asserts to the `nvfuser_execute` function for the cases that do not work. Fallback to eager is used in those cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84416 Approved by: https://github.com/jjsjann123, https://github.com/ngimel commit 9a6aa9053f79127721875e371addd9c3baeaaac0 Author: Edward Z. Yang Date: Fri Sep 2 22:15:00 2022 -0400 Don't convert INT64_MAX start index into zero (#84509) I... don't understand why we did it this way in the first place? Source: https://github.com/pytorch/pytorch/pull/48719/files#r962087365 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84509 Approved by: https://github.com/ngimel commit e91c1e65b6b4b324284d891c13ce2f612129e9be Author: Paul Saab Date: Sun Sep 4 23:47:59 2022 +0000 [aarch64] Fix _mm_pause() on aarch64 (#84505) Summary: It's possible if you're using simde that _mm_pause is already defined, so instead use the asm for yield Test Plan: CI Differential Revision: D39225258 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84505 Approved by: https://github.com/ajtulloch commit 7c4c7dafbdf2c41ccd9042f1db4f9f9f01a42f00 Author: titaiwang Date: Sun Sep 4 00:01:00 2022 +0000 [ONNX] Add onnx::LayerNorm support for version 17 (#84293) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84293 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit 6d6e04d6cc9a3d7cf5d9a2eda5baafd5c3ee75c0 Author: Kshiteej K Date: Sat Sep 3 07:21:48 2022 +0000 [test_nn] move dropout tests to test/nn/test_dropout.py (#84165) Ref https://github.com/pytorch/pytorch/issues/63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84165 Approved by: https://github.com/albanD commit e46c1c7931da2d723a6cad4ec307ff4ed4e9cb7f Author: Paul Saab Date: Sat Sep 3 04:06:26 2022 +0000 [aarch64] Cast to signed char to fix aarch64 build (#84429) Summary: Force SHORT_BINUNICODE and PROTO to signed char to fix build on aarch64 Test Plan: CI Differential Revision: D39198776 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84429 Approved by: https://github.com/ajtulloch commit 388368b6996479f6eca484d4e60a6250b2535dec Author: Justin Chu Date: Fri Sep 2 23:19:03 2022 +0000 [ONNX] Fix type annotations and enable type checking for all apis (#84091) Enable runtime type checking for all torch.onnx public apis, symbolic functions and most helpers (minus two that do not have a checkable type:
`_.JitType` does not exist) by adding the beartype decorator. Fix type annotations to make unit tests green. Profile: export `torchvision.models.alexnet(pretrained=True)` ``` with runtime type checking: 21.314 / 10 passes without runtime type checking: 20.797 / 10 passes + 2.48% ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84091 Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi commit 2a332afbf41b68080a9436e910b93af7cd336fbc Author: Edward Z. Yang Date: Fri Sep 2 08:53:59 2022 -0700 Add SymFloat, support SymInt to SymFloat conversion (#84284) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284 Approved by: https://github.com/albanD commit 7f5da70ef0be0d3fa60d92430548d2fff6f93ef9 Author: Yeounoh Chung Date: Fri Sep 2 23:15:17 2022 +0000 Avoid hitting the fused path in Linear for xla backend. (#84503) Fixes #84244 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84503 Approved by: https://github.com/JackCaoG, https://github.com/ezyang commit 3dfbf09afebc067f5ddea60f7db5cd2aa0b98f93 Author: lezcano Date: Fri Sep 2 12:19:03 2022 +0000 Optimise the decomposition for `adaptive_avg_pool2d` wrt. TorchInductor (#84483) This fixes some part of the implementation that did not work with TorchInductor (e.g. the indices in TorchInductor need to be `int64`s, while in PyTorch we can have `int32`s). It also brings the performance of the kernel up to numbers similar to those of the lowering (benchmarks below). Pull Request resolved: https://github.com/pytorch/pytorch/pull/84483 Approved by: https://github.com/jansel commit ab6c57217a97438c8e13952a407e42873e2259f3 Author: Masaki Kozuki Date: Fri Sep 2 21:57:45 2022 +0000 Add NCCL PreMul Sum to c10d `reduce` ops (#84243) This is based on #81272 but this conforms to the TorchScript compiler - [ ] Update https://github.com/pytorch/pytorch/blob/abaf8112e6d6bed2a5d33dcbc1d46ed20b8e80de/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp#L64-L73 to use `ReduceOp::RedOpType`. In my first try with `USE_SYSTEM_UCC=1`, this change wasn't necessary (I think) because of the `ReduceOp::RedOpType` operator. That being said, I want to make it more explicit. cc @ptrblck @kwen2501 @aazzolini cc @zasdfgbnm for visibility to the TODO above Pull Request resolved: https://github.com/pytorch/pytorch/pull/84243 Approved by: https://github.com/kwen2501 commit 0b363c5c5c1832820466b7768b353db121809018 Author: Natalia Gimelshein Date: Fri Sep 2 21:18:58 2022 +0000 don't synchronize single element any/all reductions (#84465) Fixes #84291 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84465 Approved by: https://github.com/ezyang commit 5ffda02388f1a1a3c83d8e6676ec1c7019c5ecd1 Author: Peter Bell Date: Fri Sep 2 18:49:30 2022 +0100 Fix alertCuBLASConfigNotDeterministic to respect warn_only=True (#84215) This cublas check would error even if the `warn_only=True` flag is passed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84215 Approved by: https://github.com/kurtamohler, https://github.com/albanD commit 65beff5acb0d7c0c484bd0558bcaf8ddc9c96aab Author: lezcano Date: Thu Sep 1 08:25:04 2022 +0000 Dispatch torch.norm to linalg.vector_norm and linalg.matrix_norm (#81761) `torch.norm` is very odd. Some notable issues are: - The default value of `"fro"` in `torch.norm` has an odd behaviour when `dim=None`. This is handled in the new dispatch - The treatment of the `dtype` argument in `torch.norm` was completely wrong.
This should fix it - Some `out=` variants in the previous implementation were also wrong. This should fix those. - This new dispatch should make some paths much faster. For example, `torch.norm(x)` where `x` is complex. I'll try to make the changes in these PRs as incremental as possible as this is a tricky one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81761 Approved by: https://github.com/ngimel commit 72f0f24a764e01a0af2c8c96394fa15db0b41a41 Author: Natalia Gimelshein Date: Fri Sep 2 18:08:39 2022 +0000 remove unneeded _to_copy meta (#84460) Fixes #84335 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84460 Approved by: https://github.com/Chillee commit 9b115c7bd32b4a516f253a217bc8ec47bd07c44d Author: Andrew M. James Date: Thu Sep 1 13:54:42 2022 -0500 Sparse Compressed Transpose add support for Batch dims and BSR/BSC layouts (#82122) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82122 Approved by: https://github.com/bhosmer commit 0192a34910c8873175380791b963517b18c44075 Author: Andrew M. James Date: Thu Sep 1 13:54:42 2022 -0500 Dense -> CSC support batch dimensions (#83086) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83086 Approved by: https://github.com/bhosmer, https://github.com/nikitaved commit a5a01e443ce1dd8e31ef7d0b3fd6a2359881a922 Author: Andrew M. James Date: Thu Sep 1 13:54:42 2022 -0500 Dense->BSR performance improvement (#83085) Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83085 Approved by: https://github.com/bhosmer, https://github.com/nikitaved commit f0e5b7336410a24088069a7b620bfccc6372338a Author: Andrew M. James Date: Thu Sep 1 13:54:41 2022 -0500 Dense -> CSR support batch dimensions (#83084) Only requires changes to the dense->sparse pathway. The reverse already has support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83084 Approved by: https://github.com/bhosmer, https://github.com/nikitaved commit 2d969dc2ca9e3ccf0c87d5d45d9321228f51b865 Author: PyTorch MergeBot Date: Fri Sep 2 17:40:17 2022 +0000 Revert "Support a few corner cases for nvFuser executor (#84416)" This reverts commit 3db3845f5f20047d9a30f450d3936e4113975ae6. Reverted https://github.com/pytorch/pytorch/pull/84416 on behalf of https://github.com/malfet due to Broke both trunk and pull, see https://hud.pytorch.org/pytorch/pytorch/commit/3db3845f5f20047d9a30f450d3936e4113975ae6 commit f803fa9fc94ea7e744885926f654479e578850cf Author: Driss Guessous Date: Fri Sep 2 16:31:55 2022 +0000 [Nested Tensor] Add a NestedTensorUtils header and cpp file for organization (#84385) Trying to do some cleanup of the code structure for nested tensors. This introduces a utility header and cpp file that implements helper functions. This is the initial PR in a larger cleanup. The next would be separating out all the native functions that create nested tensors into their own file, since they do not in fact do math on nested tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84385 Approved by: https://github.com/mikaylagawarecki commit ae67099e88970b8fab140717d8251d9f5e9943b0 Author: Justin Chu Date: Fri Sep 2 15:15:30 2022 +0000 Fix type annotation in `_ConvNd` for in_channels (#84302) `_ConvNd` has an attribute `in_channels` that was mistakenly annotated as `_in_channels`.
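A quick hedged illustration (not code from the PR) of why the annotation name matters to static type checkers:
```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
channels: int = conv.in_channels  # mypy resolves this against the class annotation,
                                  # so the annotation must be spelled `in_channels`
print(channels)  # 3
```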
This fixes https://github.com/pytorch/pytorch/issues/84223 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84302 Approved by: https://github.com/albanD commit 3db3845f5f20047d9a30f450d3936e4113975ae6 Author: Ivan Yashchuk Date: Fri Sep 2 14:57:05 2022 +0000 Support a few corner cases for nvFuser executor (#84416) This PR adds asserts to the `nvfuser_execute` function for the cases that do not work. Fallback to eager is used in those cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84416 Approved by: https://github.com/jjsjann123, https://github.com/ngimel commit 0fd173b097f27b7dd190b25ae13075ba3bf25a5a Author: PyTorch MergeBot Date: Fri Sep 2 10:45:41 2022 +0000 Revert "Support a few corner cases for nvFuser executor (#84416)" This reverts commit 3ac9f6683dc8f17e030699da4df6c767f22939b6. Reverted https://github.com/pytorch/pytorch/pull/84416 on behalf of https://github.com/IvanYashchuk due to trunk CI is failing due to sneaked in print_tabular() call commit 3ac9f6683dc8f17e030699da4df6c767f22939b6 Author: Ivan Yashchuk Date: Fri Sep 2 06:42:39 2022 +0000 Support a few corner cases for nvFuser executor (#84416) This PR adds asserts to the `nvfuser_execute` function for the cases that do not work. Fallback to eager is used in those cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84416 Approved by: https://github.com/jjsjann123, https://github.com/ngimel commit cb4421b19cf7aa3a1a6cfcc7e0677f2b2ba0a9b6 Author: Huy Do Date: Fri Sep 2 05:12:55 2022 +0000 [Proof of Concept] Use labels to select the test configs to run (#83690) This is the proof-of-concept PR to support linux. Other platforms will follow in subsequent PRs. Per feedbacks from the team, I have changed the label to be `test-config/CONFIG`, for example `test-config/functorch` to make it clear that this is not `ciflow`. * The script maintains a set of valid test configs (shard names) including `default`, `functorch`, `dynamo`, etc. * If the PR has one or more labels as specified in the set, i.e. **test-config/functorch**, only these test configs will be selected. If the PR has both `test-config/functorch` and `ciflow/trunk`, both will be taken into account: **All functorch builds and tests in trunk will be run** * If the PR has none of the test-config label, all tests are run as usual. Basically, the CI workflow will be `filter (part of build) -> build -> filter -> test[filtered_matrix]`. The filter is applied twice before build and test because we want to get the latest labels from the PR right before the steps are run. This is mainly to avoid GHA static list of labels that is only populated at the time of the pull request event, for example, a new pull request will have no label, This PR has a bunch of random labels but it includes two important labels among them `test-config/functorch` and `test-config/dynamo`. The former was added before the CI started while the latter was added after (but before the test started). Only functorch and dynamo tests (multiple shards) were run. Also, I manage to find a way to hide the majority of skipped tests, so they won't clutter the signal box that much Pull Request resolved: https://github.com/pytorch/pytorch/pull/83690 Approved by: https://github.com/ZainRizvi commit 97b2dff60081e1092cfd6d1b3a80c995ff3d6148 Author: Elias Ellison Date: Thu Sep 1 23:37:55 2022 +0000 Add Initial Support For Fake Tensor Constant Tracking (#84387) Adds support for constant tensor tracking within FakeTensors. 
Copy-pasta'ing from `proxy_tensor.py` why this is useful: ``` ``` This PR only attempts to add support for the tracing scenarios where we run each operation linearly - aot autograd, torchdynamo. It does not yet handle how constant tensors should be handled as part of the persistent fx graph. Additionally, it does not yet attempt to de-duplicate or interact with ProxyMode's only constant tensor handling. Edit: plan is to rely on functionalization for fx graph Pull Request resolved: https://github.com/pytorch/pytorch/pull/84387 Approved by: https://github.com/ezyang commit 832ce5f8fad374ab1dd8bae16c28cd6004938ab3 Author: Zafar Date: Thu Sep 1 22:40:31 2022 +0000 Adding codeowners to quantization, sparsity, ns, etc. (#79505) The notifications for the AO-maintained codebase. This should not be blocking, just PR/test notifications. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79505 Approved by: https://github.com/vkuzo, https://github.com/albanD commit f6ce2a442e8f88b39c11b07fb5c716f6ef4bd06d Author: Edward Z. Yang Date: Thu Sep 1 13:43:06 2022 -0700 Refactor PyInterpreter to use normal vtables (#84388) I realized that we can deal with the dead vtable problem by... introducing another indirection! The resulting code is worse (you have to do one more dereference to get to the vtable), but the reduction in boilerplate is, IMO, worth it. I did this refactor because I'm about to add a lot more methods to PyInterpreter to handle expunging SymInt from TensorImpl. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84388 Approved by: https://github.com/albanD commit 241c99232e67dfde18dd40bf821e453ab4c313b1 Author: Nikita Shulga Date: Thu Sep 1 23:56:05 2022 +0000 Fix typo (#84439) s/bionicl/bionic/ hattip to @kit1980 for reporting in https://github.com/pytorch/pytorch/pull/84314#discussion_r960099849 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84439 Approved by: https://github.com/seemethere, https://github.com/clee2000, https://github.com/atalman commit edec9698abde6207ca3a06718568807fe5c037dd Author: Sergii Dymchenko Date: Thu Sep 1 23:55:25 2022 +0000 Fix ScripModule typo (#84444) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84444 Approved by: https://github.com/malfet commit 375d6cd5b7075286f9d925341201cb2776e311a8 Author: PyTorch MergeBot Date: Thu Sep 1 23:42:48 2022 +0000 Revert "Move decompositions and helpers for jvp from functorch into core (#84358)" This reverts commit a3c60a4db464aa32b3217e45fdc9013ad6a535ae. Reverted https://github.com/pytorch/pytorch/pull/84358 on behalf of https://github.com/malfet due to Broke lint commit 6ef85dc99079b770d96e4cc87bdc5b047441e9a9 Author: updaun Date: Thu Sep 1 23:01:06 2022 +0000 Fix minor typo in rpc_test.py (#84431) This fixes a very minor typo in the `rpc_test.py` comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84431 Approved by: https://github.com/mrshenli commit a65b88d516316695f3f930a0d39e5c25f0f38729 Author: Sergii Dymchenko Date: Thu Sep 1 22:57:50 2022 +0000 Import forgotten pack_weight_bias in rnn.py (#84315) `pack_weight_bias` is exported in `__all__`, but the actual import was lot during migration in https://github.com/pytorch/pytorch/pull/78714. 
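A self-contained sketch of the failure mode described above (hypothetical module, not the actual rnn.py):
```python
import types

# A module advertises a name in __all__, but the import that should
# provide that name was dropped during a refactor.
rnn = types.ModuleType("rnn_sketch")
rnn.__all__ = ["pack_weight_bias"]   # exported in __all__ ...
# rnn.pack_weight_bias = ...         # ... but never actually bound

try:
    rnn.pack_weight_bias
except AttributeError as err:
    print(err)  # module 'rnn_sketch' has no attribute 'pack_weight_bias'
```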
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84315 Approved by: https://github.com/seemethere, https://github.com/malfet commit 73cb6cf8ae355417e0e9b6b9614492b280f66ae7 Author: drisspg Date: Thu Sep 1 22:50:59 2022 +0000 Fixing back invariant on offsets (#84433) I changed the calculation of offsets to add an extra element for bounding above. This invariant makes sense in the contiguous case, but when ntensor[i] is sliced like in this PR: #83736, it doesn't make semantic sense anymore. So changing back. Borderline stampy Pull Request resolved: https://github.com/pytorch/pytorch/pull/84433 Approved by: https://github.com/mikaylagawarecki commit a3c60a4db464aa32b3217e45fdc9013ad6a535ae Author: soulitzer Date: Thu Sep 1 15:26:23 2022 -0400 Move decompositions and helpers for jvp from functorch into core (#84358) This refactor shouldn't change any behavior. At this point functorch still relies on the mechanism in DynamicLayerFront; we just moved some parts of it into core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84358 Approved by: https://github.com/samdow commit eaab653376da76cd3038b7f2bed37b03e2048522 Author: Ian Graves Date: Thu Sep 1 22:38:59 2022 +0000 Read via FileAdapter when loading files in torch if not flatbuffer - Part 2 (#84296) Summary: D38998858 (https://github.com/pytorch/pytorch/commit/3fae89d4a468a02be501357eb123ce2bf7086d2f) used the wrong version of `_load_for_mobile` that kept the "load everything in memory then parse" technique. This fixes it to call the `_load_for_mobile_impl` version, which for non-flatbuffer models will stream parse. See D38998858 (https://github.com/pytorch/pytorch/commit/3fae89d4a468a02be501357eb123ce2bf7086d2f) for the expected memory optimization gains. Test Plan: CI Signals. Reviewed By: qihqi Differential Revision: D39138280 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84296 Approved by: https://github.com/qihqi commit a563a4880fe577e986b63a288bb8bf00a1fb7618 Author: Linbin Yu Date: Thu Sep 1 22:32:55 2022 +0000 [Edge] Add an option to avoid adding base ops to static op library (#84360) Summary: We use a static op library in a test for PyTorch C++ usages, but don't want to introduce all base ops, because the goal is to check if a given model can run on the exact op collection (i.e., fbios ops, fb4a ops), and these base ops are not present in real apps. So add an option to disable this feature. Test Plan: Build. Expect no change to existing targets. Differential Revision: D39164021 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84360 Approved by: https://github.com/kit1980 commit ff56f1c30d2b4ad3a018b8f0c9fee1ffcb06ca4f Author: Yu, Guangye Date: Thu Sep 1 22:22:25 2022 +0000 Define the SYCL device version assertion used in other backends, like XPU (#84106) We need a device version assertion that can be used in SYCL kernels. SYCL_KERNEL_ASSERT will be used in kernels launched on the XPU device. We add a macro SYCL_KERNEL_ASSERT via the __assert_fail declaration for Linux and the _wassert declaration for Windows, even though NDEBUG is enabled. `__assert_fail` in a SYCL kernel: `extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);` `_wassert` in a SYCL kernel: `extern SYCL_EXTERNAL void _wassert(const wchar_t *wexpr, const wchar_t *wfile, unsigned line);` No additional unit test because this change does not affect PyTorch's functionality. It only affects assertions in kernels on the XPU backend.
So it is difficult to add a unit test for it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84106 Approved by: https://github.com/malfet commit 1463c6f3de1bb2113fd22ab9b1bddd2b3a84355d Author: Huy Do Date: Thu Sep 1 22:18:07 2022 +0000 Increase distributed shards (#84430) Per title, increase from 2 to 3 shards. With 2 shards, the test time was about 1.7 hours as shown in [HUD](https://hud.pytorch.org/tts/pytorch/pytorch/master?jobName=pull%20%2F%20linux-bionic-cuda11.6-py3.10-gcc7%20%2F%20test%20(distributed%2C%201%2C%202%2C%20linux.8xlarge.nvidia.gpu)) With 3 shards, the time drops to about 1.1 hours: * 1st shard: https://github.com/pytorch/pytorch/runs/8141516281 (1h16m) * 2nd shard: https://github.com/pytorch/pytorch/runs/8141516449 (59m) * 3rd shard: https://github.com/pytorch/pytorch/runs/8141516593 (1h3m) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84430 Approved by: https://github.com/clee2000 commit ce1b727e774c75f8e31b28ff5915851385c70dcf Author: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Date: Thu Sep 1 21:34:51 2022 +0000 Disable autocast cache in torch.cuda.make_graphed_callables (#84289) There are conflicts between `torch.clear_autocast_cache()` and `cudaMallocAsync` from #82682. Moreover, the use of autocast caching is not reasonable during training, which is the main target of `make_graphed_callables`. cc @eqy @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/84289 Approved by: https://github.com/ngimel commit d39490a711f6d5119444d76d1d2e337e0213beea Author: Edward Z. Yang Date: Thu Sep 1 07:08:03 2022 -0700 Add meta function for repeat (#84349) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349 Approved by: https://github.com/Krovatkin commit 0fb1495512852fd12f77c6bfb7bf9b86013c8caa Author: Paul Saab Date: Thu Sep 1 20:26:35 2022 +0000 [aarch64] Fix ATen-cpu aarch64 builds (#84294) Summary: Fix ATen-cpu aarch64 builds and hook up cpukernel_neon Test Plan: CI Differential Revision: D39142670 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84294 Approved by: https://github.com/ajtulloch commit 5e5c610a587d671044303c4fa56af20f33eee5dd Author: Andrey Talman Date: Thu Sep 1 20:24:06 2022 +0000 Move slow-grad checks to CUDA-11.6 (#84313) Mitigates #84192 by skipping two tests. Please note: We tried to increase the tolerance for test_fn_gradgrad_linalg_det_singular_cuda_float64 but this did not help. Ref: Increase `test_fn_gradgrad_linalg_det_singular_cuda_float64` error tolerance to 1e-4 as suggested in https://github.com/pytorch/pytorch/issues/84192#issuecomment-1230644574 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84313 Approved by: https://github.com/malfet, https://github.com/huydhn, https://github.com/Lezcano commit 673b35c847ee6ba67367ba27ff8597c8ae382257 Author: YifanShenSZ Date: Thu Sep 1 20:01:39 2022 +0000 Better reshape with autograd support (#82754) (#84154) The original author is @YifanShenSZ and the original PR is: #82754 Previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is ok for forward, but needs improvement for backward: we need to handle the "sometimes view, sometimes copy" behavior. This pull request fixes it by: 1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as a nested-tensor version of `CompositeImplicitAutograd` 2.
register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor` Side changes: * add contiguous memory format support to `clone_nested` * add `view_nested` * add `reshape_as_nested` Fix issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754 Test Plan: Imported from GitHub, without a `Test Plan:` line. **Static Docs Preview: executorch** |[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)| |**Modified Pages**| Reviewed By: albanD Differential Revision: D39023822 Pulled By: drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154 Approved by: https://github.com/bdhirsh, https://github.com/albanD commit 9bcad063d8b5253ca5b3013735d3ad0cb3f7e3cb Author: Catherine Lee Date: Thu Sep 1 19:53:36 2022 +0000 disable ios on circleci b/c failing (#84438) reenable when fixed cause is likely: https://status.circleci.com/incidents/lbhyrt87g89r examples of failures: https://app.circleci.com/pipelines/github/pytorch/pytorch/559778/workflows/e17e6b96-649e-4e49-b9f1-c0b1ecd96e02/jobs/17073870 something related to ssh started around 12 hours ago? Pull Request resolved: https://github.com/pytorch/pytorch/pull/84438 Approved by: https://github.com/ZainRizvi commit 88802719b699ce75f1be7818293c76748311a79b Author: Andrew Gu Date: Thu Sep 1 16:59:34 2022 +0000 [FSDP][Easy] Move utils to `_utils.py` (#84212) I pulled this out into a separate PR. This just moves some utility functions to `_utils.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84212 Approved by: https://github.com/rohan-varma commit e71370064c1a475e9179ba8dc05834fefe51413b Author: Qiming Lu Date: Thu Sep 1 18:39:26 2022 +0000 Improvements to FX Minimizer (#83833) Summary: This diff improves the FX Minimizer for better error reports, and fixes a few other issues. Test Plan: CI Differential Revision: D38900309 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83833 Approved by: https://github.com/yuhc, https://github.com/Chillee commit dd82b31e552d4da255bb36266681a0400367314a Author: Angela Yi Date: Wed Aug 31 16:03:47 2022 -0700 [fx] Add metadata to fx.GraphModule (#84378) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84378 Approved by: https://github.com/SherlockNoMad commit 8b578849b4bce1e6ad012d659e1aced04fb2bdc3 Author: PyTorch MergeBot Date: Thu Sep 1 18:34:57 2022 +0000 Revert "[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)" This reverts commit 48a596ad3f2ca617cd2fafc3fa3c368f5600930a. Reverted https://github.com/pytorch/pytorch/pull/83893 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 5a73a0291d28f7d510756d8eab4fc942a0455ba8 Author: Michael Andreas Dagitses Date: Thu Sep 1 04:05:25 2022 -0700 re-enable ATen packedtensoraccessor_test (#84397) Summary: Test Plan: Rely on CI. 
Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84397 Approved by: https://github.com/malfet commit fd756caa3633cf4bc0bbcdd5db77683cf18e5eaf Author: BowenBao Date: Thu Sep 1 18:29:41 2022 +0000 [ONNX] Support nn.init.normal (#84149) * Updated symbolic function for `aten::normal` to support additional generator arguments emitted from https://github.com/pytorch/pytorch/blob/5563248b5882231cb99105b042cc32bddd18b912/torch/csrc/jit/passes/remove_mutation.cpp#L51 * Added symbolic function for `aten::is_pinned` and `prim::layout`. Both are unused by ONNX later on. Fixes #83647 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84149 Approved by: https://github.com/AllenTiTaiWang, https://github.com/abock commit 5d39e8de572c1ae426a762b7f1b71a4bb064e85c Author: samdow Date: Tue Aug 30 11:03:30 2022 -0400 add matrix rank op info tests with non-default kwargs (#84074) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84074 Approved by: https://github.com/zou3519 commit 041edeeecb75f3c110605d7311fa46abe1c62ea9 Author: SmirnovKol <31559413+OccupyMars2025@users.noreply.github.com> Date: Thu Sep 1 17:56:50 2022 +0000 Fix several typos (#83823) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83823 Approved by: https://github.com/ngimel, https://github.com/kit1980 commit 7a348a1d4aa2dcea4d78a4cd4f772155fce38012 Author: Rodrigo Kumpera Date: Thu Sep 1 17:54:10 2022 +0000 Fix internal breakage caused by #82134 (#84363) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84363 Approved by: https://github.com/rohan-varma, https://github.com/mehtanirav commit 7ffa10036c846a3d4148bb3deed8b77ff506a9cc Author: PyTorch MergeBot Date: Thu Sep 1 17:47:52 2022 +0000 Revert "[Profiler] Unify global and thread local profiler lookup. (#83894)" This reverts commit c06a5586f57c844fdc4a98e52f88e71f64dd54d2. Reverted https://github.com/pytorch/pytorch/pull/83894 on behalf of https://github.com/mehtanirav due to [Internal breakages](https://www.internalfb.com/intern/sandcastle/job/13510799644553996/artifact/runsandcastle?selectedLines=990-990-7-65) commit 6dc9223c8bb107fc9794d867a0ec8cdcff89382b Author: Andrew M. James Date: Wed Aug 31 15:25:08 2022 -0500 Sparse_coo: Be more agressive in setting coalesced True to avoid suprising behaviors (#82426) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82426 Approved by: https://github.com/pearu, https://github.com/bhosmer commit 2e0f5bce3917ba42ac106101b21e20d99d067928 Author: PyTorch MergeBot Date: Thu Sep 1 17:23:21 2022 +0000 Revert "Fix several typos (#83823)" This reverts commit f9609d82038897ac560b408808e9dba9f39bc922. 
Reverted https://github.com/pytorch/pytorch/pull/83823 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit bf62ece5364486385bdabc43c72b7681e213057e Author: Max Podkorytov Date: Thu Sep 1 17:21:22 2022 +0000 [static-runtime] add schema checks to most of the ops where these checks are missing (#84163) Test Plan: existing unit tests; also fix some failing ones along the way Differential Revision: D39074902 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84163 Approved by: https://github.com/mikeiovine commit d648375f13b4a4efd4cd35247098679fce5d4bcd Author: Kevin Tse Date: Thu Sep 1 15:12:37 2022 +0000 [GHF] Changing the ordering in merge rules to allow more appropriate messages to be raised first (#84359) Changing the ordering in merge rules to allow more appropriate messages to be raised first. Context: [#84279](https://github.com/pytorch/pytorch/pull/84279#issuecomment-1233130498) @janeyx99: "Approving to unblock, but modifying the merge rules to move the Core maintainers rule to last would be a good idea." Pull Request resolved: https://github.com/pytorch/pytorch/pull/84359 Approved by: https://github.com/janeyx99, https://github.com/ZainRizvi, https://github.com/malfet commit bfdfeecd151fde72b05cc96113999d4049485673 Author: soulitzer Date: Wed Aug 31 17:53:32 2022 -0400 Add per-op MPS gradient tests and update skips (#84242) Follow up: - ~Remove non-float dtypes from allow-list for gradients~ - ~Map dtypes to short-hand so there aren't so many lines, i.e. float16 should be f16.~ - ~There were a lot of linting issues that flake8 wouldn't format for me, so I reformatted with black. This makes the diff a little trickier to parse.~ Observations: - there are entries in the allow-list that weren't there before - some forward that we previously passing now fail with requires_grad=True - Because the allow list does not know about variants, a special skip was added for that in the block list Pull Request resolved: https://github.com/pytorch/pytorch/pull/84242 Approved by: https://github.com/kulinseth, https://github.com/malfet commit f1ee162193102464d92140edb84c3a99012ad0cb Author: Edward Z. Yang Date: Thu Sep 1 07:10:00 2022 -0700 Use SymInt signature to compute saved variables (#84354) This seems to have been accidentally working, but it broke when I added support for saving optional SymInt directly from input arguments. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84354 Approved by: https://github.com/Krovatkin commit 5e2c23377a0ea6410c8e6a624b1cc516af19f63b Author: Edward Z. Yang Date: Thu Sep 1 07:10:46 2022 -0700 LTC codegen appears to be hardcoded to only support tensors (#84355) Assert accordingly Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84355 Approved by: https://github.com/wconstab commit 7d9e54673881501e3a2b165fe3d703d2898350fd Author: breidct <51497916+breidct@users.noreply.github.com> Date: Thu Sep 1 16:16:45 2022 +0000 Replace assertEqualIgnoreTypes in common_nn.py (#84210) See #38095 Replaced all instances of assertEqualIgnoreTypes in common_nn.py with assertEqual Pull Request resolved: https://github.com/pytorch/pytorch/pull/84210 Approved by: https://github.com/kit1980 commit 5cfe76938735a7cae06f8fa8cd1ab3962fbe384f Author: Nikita Karetnikov Date: Thu Sep 1 16:14:31 2022 +0000 [primTorch] Add refs for `reshape_as`, `view_as`, unify tests (#84222) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84222 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 8778f337442ab7ad512d20c3a9028df59380c6f0 Author: Andrew M. James Date: Wed Aug 31 09:58:57 2022 -0500 Dense <-> bsc conversions (#80781) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80781 Approved by: https://github.com/bhosmer, https://github.com/nikitaved commit 0909639c9045e9f9435165778319fdb59728baa6 Author: Yu, Guangye Date: Thu Sep 1 11:53:32 2022 +0000 fix dispatch declaration bug about quantized op (#83649) Fixes issue #83051. _fake_quantize_learnable_per_tensor_affine_backward and _fake_quantize_learnable_per_channel_affine_backward are implemented for CPU and CUDA. Currently, these two are in the CompositeImplicitAutograd category. If this issue is not fixed, we need to provide their autograd functions when we want to register a new backend. It doesn't make sense to implement autograd functions for them since they are all backward operators implemented directly with TensorIterators. Add a dispatch keyword in aten/src/ATen/native/native_functions.yaml and explicitly dispatch the operators to CPU and CUDA, like this: ` dispatch:` ` CPU, CUDA: _fake_quantize_learnable_per_tensor_affine_backward` No additional unit test because this change does not affect PyTorch's functionality. It only affects registration on other backends, like XPU. So it is difficult to add a unit test for it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83649 Approved by: https://github.com/jerryzh168 commit 70ef06cc1913e1d9c333819b222152e6abc5b870 Author: Michael Andreas Dagitses Date: Wed Aug 31 21:42:05 2022 -0700 fix and enable ATen ExclusivelyOwned_test (#84395) Summary: This depends on caffe2 so it must move to that section. Test Plan: Rely on CI. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84395 Approved by: https://github.com/DanilBaibak commit 521d1071f881c14a8f49bdc1aff984a0e7928294 Author: Zafar Date: Thu Sep 1 11:35:01 2022 +0000 [quant] Subpackage import in nn.quantized (#84141) Some of the subpackages were not included in 'torch.nn.quantized'. That would cause some specific cases to fail. For example, `from torch.nn.quantized import dynamic` would work, but `import torch; torch.nn.quantized.dynamic` would fail. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84141 Approved by: https://github.com/andrewor14 commit 546e5fa0c5df42ad83f336a77f5b7cb9ab40e16f Author: Michael Andreas Dagitses Date: Wed Aug 31 21:13:08 2022 -0700 register skipped ATen tests in CMake (#84345) Summary: These tests were not being built or executed as part of CI. Test Plan: Rely on CI.
Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84345 Approved by: https://github.com/kit1980 commit 65e887c041943bf5d1ae2c515cc7a89e3b89b588 Author: Ivan Yashchuk Date: Thu Sep 1 07:18:42 2022 +0000 Remove unnecessary copy from torch._refs.to, add OpInfo for torch.Tensor.to (#84270) This PR removes unnecessary copy from `torch._refs.to`, adds OpInfo for `torch.Tensor.to`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84270 Approved by: https://github.com/ngimel commit 90d6112a948644dac77120cfcf1de9ac5566ab79 Author: Huy Do Date: Thu Sep 1 03:48:52 2022 +0000 Test distributed backends in parallel (#84034) This allows multiple backends (nccl, gloo) to be tested in parallel and speed up the process. The improvement is mainly in the 1st distributed CUDA shard where the long pole `distributed/test_distributed_spawn` test is executed: * [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596825?check_suite_focus=true#logs) takes 1h24m. This is better than the current average expectation of 2h12m On the other hand, there is no improvement for the following two jobs: * [linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)](https://github.com/pytorch/pytorch/runs/8007417353?check_suite_focus=true#logs) takes 1h47m * [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596870?check_suite_focus=true#logs) takes 1h40m This is still a gain though because it allows us to add more shards for distributed test if needed. Issue https://github.com/pytorch/pytorch/issues/83694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84034 Approved by: https://github.com/wanchaol commit 693ed8b14777d1515c18653f5f8f28a602898662 Author: Howard Huang Date: Wed Aug 31 11:42:08 2022 -0700 [1/N] [Dispatchable Collectives] Create Backend class (#83679) - Create a new Backend class which contains collectives similar to that of https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/ProcessGroup.hpp. In future PRs, the existing ProcessGroupNCCL/Gloo/UCC will be migrated to derive from this Backend class. The idea is that we will repurpose ProcessGroup to instead contain a list of Backends (ProcessGroupNCCL/Gloo/UCC) and perform dispatching to them based on tensor type. Differential Revision: [D38839213](https://our.internmc.facebook.com/intern/diff/D38839213) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83679 Approved by: https://github.com/kwen2501 commit ece0002c4beaebaf083dc75b7bf8ceb19edf7a0b Author: titaiwang Date: Wed Aug 31 21:40:27 2022 +0000 [ONNX] Disable autocast cache in exporter (#84219) This PR provides a temporary fix on #84092 in exporter to avoid more cases falling into this bug. A long-term fix will be provided later. A simple repro with torch.onnx.export is still under investigation, as torch.jit.trace() is not the API we call inside torch.onnx.export, and it may introduce the difference. Therefore, a test case is provided here only. 
A specific test one can use:
```python
import torch
import onnxruntime
from onnxruntime.training.ortmodule import DebugOptions, LogLevel
from onnxruntime.training.ortmodule import ORTModule

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.cv1 = torch.nn.Conv2d(3, 3, 5, 2, 1)

    def forward(self, x):
        x = self.cv1(x)
        return x

x = torch.randn(10, 3, 20, 20) * 2
m = MyModule().eval()
x = x.cuda()
m = m.cuda()
debug_options = DebugOptions(log_level=LogLevel.VERBOSE, save_onnx=True, onnx_prefix="ViT-B")
m = ORTModule(m, debug_options=debug_options)
with torch.cuda.amp.autocast(dtype=torch.float16, cache_enabled=True):
    loss = m(x)
```
AND make the assertion fail in ORTModule at https://github.com/microsoft/onnxruntime/blob/17ccd6fa02877a1c8d3201344137b1ca105b681d/orttraining/orttraining/python/training/ortmodule/_io.py#L578-L581 Without the fix, the user will see the weight/bias of the Conv node become constant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84219 Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi commit 18264432f7f9b7545e7d494b1e9391883fc8ab60 Author: titaiwang Date: Wed Aug 31 21:39:08 2022 +0000 [ONNX] replace all _C._flatten to torch.jit._flatten (#83598) _C._flatten is exactly the same as torch.jit._flatten. Unifying them to reduce confusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83598 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit f701cb04fbc864f5eb9e928c16bae24f006cfd5d Author: Elias Ellison Date: Wed Aug 31 13:40:04 2022 -0700 Test Dynamo CI w Fake Tensors (#84282) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282 Approved by: https://github.com/anijain2305 commit ef3ab31f1c57b357a23f729f8d986432185ebaa4 Author: Sherlock Huang Date: Wed Aug 31 21:22:17 2022 +0000 Decomp for aten.im2col (#84303) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84303 Approved by: https://github.com/jansel, https://github.com/ngimel commit cd96f3f6769af7b01a3b50e0d19d9fc0ea015346 Author: Edward Z. Yang Date: Wed Aug 31 14:14:44 2022 -0700 Use register_meta for everything in meta_registrations (#84297) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297 Approved by: https://github.com/Chillee commit 305c6a6c35ace740ca000851ad908714daad4b7a Author: Chien-Chin Huang Date: Wed Aug 31 09:20:45 2022 -0700 [FSDP] Fix the FQN not found issue for load sharded_state_dict when using activation checkpoint (#84253) The current sharded_state_dict load will fail if activation checkpoint is also enabled. This PR fixes the issue.
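A rough usage sketch of the scenario fixed here; the model construction and checkpoint wrapping are placeholders, and only the `state_dict_type` context manager usage is taken from the FSDP API:
```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

# Assumes the default process group is already initialized and that the
# model's submodules are wrapped with activation checkpointing before
# FSDP wrapping; build_model() is a placeholder.
model = FSDP(build_model().cuda())

# Save and re-load a sharded state dict. Before the fix, the load path
# could fail to match FQNs that carry the checkpoint-wrapper prefix.
with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
    sharded_sd = model.state_dict()
    model.load_state_dict(sharded_sd)
```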
Differential Revision: [D39125431](https://our.internmc.facebook.com/intern/diff/D39125431/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84253 Approved by: https://github.com/awgu commit e8885a872c5a444711bb75aaf4b3a792fe674057 Author: Nikita Shulga Date: Wed Aug 31 23:02:42 2022 +0000 [CI] Move bazel from 11.3 to 11.6 (#84314) In process of doing so have to: - Delete `/usr/local/cuda-11.6/cuda-11.6` symlink to self, otherwise Bazel builds fail with ``` ERROR: circular symlinks detected [start of symlink cycle] /usr/local/cuda-11.6/cuda-11.6 [end of symlink cycle] ``` - Add `-DCUB_WRAPPED_NAMESPACE=at_cuda_detail"` to `COMMON_COPTS` if building with CUDA, to mimic the behaviour in https://github.com/pytorch/pytorch/blob/4b8ae047881314580826113f8a224f3fd935b203/cmake/Dependencies.cmake#L1664-L1668 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84314 Approved by: https://github.com/ngimel, https://github.com/atalman commit fddfc4488afb207971c54ad4bf58130fdc8a4dc5 Author: Zain Rizvi Date: Wed Aug 31 22:44:14 2022 +0000 Further improve mergebot messages (#84283) Reword the rejection reasons to better match the format mergebot uses to output the message, and repoints the workflow links to point to the commit page in hud instead of github **Context:** Some of the mergebot messages looked a bit weird. For example, it would claim to be offering a reason for a merge failing, but instead the message would be of a more diagnostic nature. Example of a weird message ("view failures on hud" is not a reason!): image The above message would now look like: image Pull Request resolved: https://github.com/pytorch/pytorch/pull/84283 Approved by: https://github.com/huydhn commit c585e149e2d7d6fbc460a0ff0324bdc189246578 Author: Slava Kovalevskyi Date: Wed Aug 31 21:48:39 2022 +0000 Process for maintaining Build + CI contributors list (#83869) The following issues are fixed: * process of adding new contributors to the "Build + CI" module added * folks who qualified are explicitly added Pull Request resolved: https://github.com/pytorch/pytorch/pull/83869 Approved by: https://github.com/svekars, https://github.com/seemethere, https://github.com/malfet commit 4b8ae047881314580826113f8a224f3fd935b203 Author: Nikita Shulga Date: Wed Aug 31 19:59:31 2022 +0000 [BE] Delete torch._dl extension (#84361) And lots of complexity around the availability of RTLD_GLOBAL flags in `os` module As this flag is always present since Python-3.3, see https://docs.python.org/3/library/os.html#os.RTLD_GLOBAL Fixes https://github.com/pytorch/pytorch/issues/84351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84361 Approved by: https://github.com/kit1980 commit cfb9d0d23314fd28be118b6ca280ded55364e71c Author: Kevin Tse Date: Wed Aug 31 17:18:07 2022 +0000 [DataPipe] Fixing `map` function signature validation (#84279) As @pmeier [points out](https://github.com/pytorch/pytorch/pull/80267#discussion_r958423241), #80267 introduces a bug where an exception is thrown when a built-in function (or a function implemented in C) is used with `.map` because `inspect.signature(fn)` cannot find the function's signature. This PR skips over a function when its signature cannot be found. I believe this case is rare, and if the `fn` is truly incompatible with the usage of `input_col`/`output_col`, an exception will be raised at run time such that users will be able to examine what is wrong. 
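A small illustration of the guard described above: `inspect.signature` cannot introspect many C-implemented callables, so the validation simply has to be skipped for them (sketch only, not the actual DataPipe code):
```python
import inspect

def try_get_signature(fn):
    """Return fn's signature, or None when it cannot be inspected."""
    try:
        return inspect.signature(fn)
    except ValueError:
        # Builtins without __text_signature__ (e.g. min) and many
        # C-implemented functions raise ValueError here.
        return None

print(try_get_signature(lambda x: x + 1))  # (x)
print(try_get_signature(min))              # None -> skip validation
```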
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84279 Approved by: https://github.com/pmeier, https://github.com/janeyx99 commit 744019ece76aef07c38e64dcb53a9801c5b51d49 Author: Salil Desai Date: Wed Aug 31 19:47:57 2022 +0000 [AIBench] Pass Vulkan Profiling Data to Kineto Profiler in lite_predictor_benchmark (#84185) Summary: This lets us more easily analyze operator-level performance of models run with Vulkan Test Plan: Generated chrometrace with vulkan events recorded Reviewed By: kimishpatel Differential Revision: D38280587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84185 Approved by: https://github.com/SS-JIA commit a0ccfe08477486e6adc536d29e7acdc53e13899b Author: Huy Do Date: Wed Aug 31 19:29:25 2022 +0000 Temporary fix to not fail concurrent viable/strict updates (#84324) Until we have a solution for https://github.com/pytorch/pytorch/issues/83986 and can use our runner for the job, we need to live with the fact that GitHub runner can have a pretty long queue throughout the day. This keeps trunk green in the meantime. This's a follow-up of https://github.com/pytorch/pytorch/pull/84249 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84324 Approved by: https://github.com/zengk95 commit 84ceebebf9d232a7f5e17012402e195afaf57129 Author: Andrew Gu Date: Wed Aug 31 15:55:02 2022 +0000 [FSDP] ufmt `flat_param.py`, `flatten_params_wrapper.py` (#83664) I think we can move FSDP code to start using ufmt (https://ufmt.omnilib.dev/en/stable/) to unify formatting across developers. ufmt is the recommended formatter for PyTorch's Python code. If we have consensus, I can ufmt all of the FSDP code in follow-ups. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83664 Approved by: https://github.com/rohan-varma commit 040263d7dc7bd1e4e620bd1889717890b1bf9b30 Author: Michael Andreas Dagitses Date: Wed Aug 31 05:28:35 2022 -0700 sort ATen tests in CMake (#84344) Summary: This will make it easier to compare and spot missing files. Test Plan: Rely on CI. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84344 Approved by: https://github.com/malfet commit 65f98eb47dbf75335d08f7676835a5e1f1fc3574 Author: PyTorch MergeBot Date: Wed Aug 31 18:27:58 2022 +0000 Revert "Add meta function for repeat (#84349)" This reverts commit 44bc6db8f88faf1b7543e825f1282140b9efa504. Reverted https://github.com/pytorch/pytorch/pull/84349 on behalf of https://github.com/janeyx99 due to Land race with the revert causing test_fx failures https://hud.pytorch.org/pytorch/pytorch/commit/44bc6db8f88faf1b7543e825f1282140b9efa504 commit 6efadf7e7e6655b543b5a9819b6e2eac2d76f09c Author: Jeff Daily Date: Wed Aug 31 18:26:22 2022 +0000 [ROCm] guard ROCm-only files in NVFUSER_RUNTIME_FILES (#84312) Addresses comment in #82498 as a follow-up PR. https://github.com/pytorch/pytorch/pull/82498#discussion_r958745967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84312 Approved by: https://github.com/jjsjann123 commit 762890d11ef4a11a2cb1eac2f61b2805328fad72 Author: Andrew Gu Date: Wed Aug 31 15:55:02 2022 +0000 [FSDP] Retire `self.device_id`; clean up ctor (#83663) This PR retires `self.device_id` by coalescing it with `self.compute_device` and more generally cleans up the FSDP constructor. 1. Compute the ignored parameters/modules from `ignored_modules` and the buffer names (to avoid cloning in `state_dict()`) 2. Recursively auto wrap if needed 5. Define process group attributes 6. Determine `device_id` 7. 
Materialize the wrapped module if using meta device or `torchdistX` deferred initialization 8. Move the module if needed (based on `self.device_id`) 9. Determine `compute_device` 10. Define `training_state`, gradient divide factors, FSDP feature-related attributes (`cpu_offload`, `forward_prefetch`, `backward_prefetch`, `sharding_strategy`, `mixed_precision`), `_orig_buffer_dtypes` 11. Determine the parameters to flatten 12. Sync module states if `sync_module_states` 13. Initialize the `FlattenParamsWrapper` with the parameters to flatten and the wrapped module, which constructs the `FlatParameter` 14. Shard the `FlatParameter` (in-place) 15. Define `_is_root`, shared attributes (`_streams`, `_fsdp_graph_order`), prefetching attributes (`_my_fsdp_idx_in_graph`, `_pre_backward_hook_full_params_prefetched`, `_forward_full_params_prefetched`), `reshard_after_forward` -- all of this is done in `_reset_lazy_init()` 16. Define `_require_backward_grad_sync` to configure `no_sync()` 17. Define state dict attributes (`_state_dict_type`, `_state_dict_config`) and register state dict hooks 18. Define backward pass flags (`_pre_backward_hook_has_run`, `_need_rebuild_full_params`) 19. Move `FlatParameter`s to CPU if `cpu_offload.offload_params` 20. Define `_exec_order_data` for execution order validation 21. Define communication hook attributes (`communication_hook`, `communication_hook_state`, `_hook_registered`) - `self.mixed_precision` - **Before:** `self.mixed_precision` itself could be `None`. Equivalently, `self.mixed_precision` could be `MixedPrecision(None, None, None)`. Both would disable mixed precision completely. - **After:** `self.mixed_precision` itself is never `None`. We only have `MixedPrecision(None, None, None)` (default construction of the `dataclass`) to disable mixed precision. This catches the issue that for `test_summon_full_params.py`, we were passing `MixedPrecision(None, None, None)` when we wanted to actually enable mixed precision. - `cpu_offload.offload_params=True` + `device_id` - **Before:** For nested FSDP and `device_id` specified, `FlatParameter`s already offloaded to CPU are moved back to GPU and not re-offloaded to CPU. - **After:** The nested `FlatParameter`s are re-offloaded to CPU. This is a temporary hack. The ideal solution removes the `module = module.to()` in the first place and only moves the relevant parameters. Because the `module.to()` implementation has some complexity, I did not want to remove that call in this PR. - `device_id` and `compute_device` - **Before:** `self.device_id` is either `None` or equal to `self.compute_device`. `self.device_id` is not used after the FSDP constructor. - **After:** `self.device_id` is removed and instead coalesced with `self.compute_device`. The only semantic change is that `test_module_device_mismatches_device_id()` errors earlier (but importantly, still errors). - This PR also uses a helper method `_get_orig_params()`, which is more robust and may avoid issues like https://github.com/pytorch/pytorch/issues/82891 without having to gate higher-level logic. - `_reset_lazy_init()` attributes - **Before:** Some attributes were being _defined_ in `_reset_lazy_init()` (which may not be obvious to all devs). - **After:** For this PR, we define these attributes in the constructor but leave `_reset_lazy_init()` as is. In the follow-ups, this gets further refactored. - Otherwise, I simply moved some logic into their own methods and reorganized the attribute definitions to be grouped logically. 1. 
What should the specification be for `device_id` + `ignored_modules`? 2. Investigate removing the `module = module.to()` in favor of moving per parameter. 3. Should we call `_reset_lazy_init()` in `register_comm_hook()`? Pull Request resolved: https://github.com/pytorch/pytorch/pull/83663 Approved by: https://github.com/zhaojuanmao, https://github.com/rohan-varma commit 85931eaa6beab53d138f873d3505aee34e98ee89 Author: Horace He Date: Wed Aug 31 07:53:03 2022 +0000 Rename fake_result to val (#84331) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84331 Approved by: https://github.com/ezyang commit 85b889fa5f1a478e1f15183008f01c56537f10d7 Author: Nikita Karetnikov Date: Tue Aug 30 18:59:08 2022 +0200 [primTorch] Add ref for `poisson_nll_loss` (#83805) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83805 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 71ce9cd0726394a5e1dd85e1d7430776a4d05a82 Author: Nikita Karetnikov Date: Tue Aug 30 18:59:07 2022 +0200 [primTorch] Add decomp for `soft_margin_loss` (#83804) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83804 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 305af90d0f78171fef0c9d36078794b3b4acad36 Author: Nikita Karetnikov Date: Tue Aug 30 18:59:07 2022 +0200 [primTorch] Add docstring and promotion for `l1_loss` ref (#83803) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83803 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 44bc6db8f88faf1b7543e825f1282140b9efa504 Author: Edward Z. Yang Date: Wed Aug 31 07:39:53 2022 -0700 Add meta function for repeat (#84349) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84349 Approved by: https://github.com/Krovatkin commit 7834f557d7477fc9a11494a03eaa88228e40636f Author: Will Constable Date: Wed Aug 31 17:15:05 2022 +0000 Add dynamo_timed to aot autograd (#84307) Provides visibility into time spent running AotAutograd Partially fixes [torchdynamo/795](https://github.com/pytorch/torchdynamo/issues/795) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84307 Approved by: https://github.com/Chillee commit 14093b5979cb5c0b777e3920819ab8252eb6d3ea Author: PyTorch MergeBot Date: Wed Aug 31 16:32:24 2022 +0000 Revert "Use register_meta for everything in meta_registrations (#84297)" This reverts commit 8cd296f6804727899b39198d1641055b64f99056. Reverted https://github.com/pytorch/pytorch/pull/84297 on behalf of https://github.com/suo due to broke test_proxy_tensor on master commit bf67589915de07a6b8756a685de9abbd90ec2dfa Author: Sungmin Cho Date: Wed Aug 31 15:15:21 2022 +0000 Escape curly brackets in FxGraphDrawer _typename (#83604) Summary: Encountered `Error: bad label format` from dot (i.e. graphviz) when benchmarking models that have dict-like structure. The root cause was that curly brackets were not properly escaped, like this example P522499127 (unescaped curly brackets in target= string) This diff insert the fix in FxGraphDrawer, since many of these graph generation codes rely on that class. (Modified summary before exporting to GitHub PR) Test Plan: ``` CUDA_VISIBLE_DEVICES=7 buck run mode/opt -c python.package_style=inplace //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --model-name={INSERT IFR QE MODEL NAME HERE} --batch-iter 100 --batch-size 768 --num-gpu 1 --lower-presets {INSERT ITS PRESET} ``` Will not encounter dot errors after this diff. 
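For the curly-bracket problem described in the FxGraphDrawer commit above, a minimal sketch of the escaping idea (illustrative helper, not the actual `_typename` code):
```python
def escape_dot_label(name: str) -> str:
    # Graphviz "record" labels treat {, }, |, < and > as structure
    # characters, so a type name such as "Dict{str: Tensor}" must be
    # escaped before being embedded in a node label.
    for ch in "{}|<>":
        name = name.replace(ch, "\\" + ch)
    return name

print(escape_dot_label("Dict{str: Tensor}"))  # Dict\{str: Tensor\}
```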
(Modified test plan before exporting to GitHub PR) Reviewed By: yinghai Differential Revision: D38758827 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83604 Approved by: https://github.com/yinghai, https://github.com/jianyuh commit b170db855441059218ead33c88af5d7576a1bc59 Author: Michael Andreas Dagitses Date: Wed Aug 31 05:12:16 2022 -0700 build/test MaybeOwned_test in OSS and fix it (#84342) Summary: This was not listed in the compilation for the ATen tests and was only getting built in Meta internal repositories. This ended up with the following problems: * at::zeros was not available * equal() for tensors was being selected from ATen/ops/equal.h and crashing Test Plan: Verified locally. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/84342 Approved by: https://github.com/DanilBaibak commit a27a4a02fecfdd626b25794a84954731b80f29fb Author: Horace He Date: Wed Aug 31 07:01:37 2022 +0000 Refactored proxytensor to clean up separate branches (#84325) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84325 Approved by: https://github.com/ezyang commit 8843f5b9868a99c41d5259ac0346bc99f2c578a0 Author: Horace He Date: Wed Aug 31 01:17:31 2022 +0000 remove data-dependent shapes from some distributions (#84322) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84322 Approved by: https://github.com/voznesenskym commit 6a3ecda5a25025d48bbc5f0215db8c338745ef79 Author: Horace He Date: Wed Aug 31 00:29:55 2022 +0000 Started storing faketensor/symbolic shape metadata on FX nodes in make_fx (#84114) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84114 Approved by: https://github.com/SherlockNoMad commit 79e3a39f95e91af03823a8579da06c35bb519faf Author: Nikita Shulga Date: Wed Aug 31 04:34:01 2022 +0000 [BE] Remove unused `export.h` include (#84305) As flatbuffer_serializer can be compiled without it Found while debugging cause of https://github.com/pytorch/pytorch/pull/82040#issuecomment-1229503604 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84305 Approved by: https://github.com/kit1980, https://github.com/qihqi commit abaf8112e6d6bed2a5d33dcbc1d46ed20b8e80de Author: Eli Uriegas Date: Tue Aug 30 16:08:59 2022 -0700 ci: Replace setup-miniconda with test-infra version (#84236) Replaces our use of the conda-incubator version of setup-miniconda with one that's more tailored to our specific needs. Should address issues highlighted in https://github.com/pytorch/pytorch/issues/84196 Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/84236 Approved by: https://github.com/atalman, https://github.com/janeyx99, https://github.com/malfet commit b343febe610b8c95ca07fe9a0b061f138ed7c94d Author: PyTorch MergeBot Date: Wed Aug 31 02:58:14 2022 +0000 [torchdynamo hash update] update the pinned torchdynamo hash (#84317) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned torchdynamo hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84317 Approved by: https://github.com/pytorchbot commit 8cd296f6804727899b39198d1641055b64f99056 Author: Edward Z. Yang Date: Tue Aug 30 15:55:04 2022 -0700 Use register_meta for everything in meta_registrations (#84297) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/84297 Approved by: https://github.com/Chillee commit 71d99662a0d7f8a9ad68999c9a014b71591cbb68 Author: David Berard Date: Tue Aug 30 20:06:22 2022 +0000 add nvidia-smi to run_torchbench (#83857) Seeing an OOM in #83239, this would help understand whether the issue is with the infra or with the test. RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/83857 Approved by: https://github.com/xuzhao9 commit 9c452abcf18a811023530b9673ae362bf987068a Author: Elias Ellison Date: Tue Aug 30 15:25:33 2022 -0700 Use reentrant mode when invoking prims, delete global prim_fake_mode (#84090) Maybe I should be using the meta_impl instead of the prim_impl, but it's not terribly clear why, since the prim impl will be better tested and should work under the re-entrant FakeTensorMode. Fixes https://github.com/pytorch/pytorch/issues/78613 in the process Pull Request resolved: https://github.com/pytorch/pytorch/pull/84090 Approved by: https://github.com/ezyang, https://github.com/samdow commit db7784e7227ea296c9c23be731bcf5bb4ad4dff7 Author: Mike Iovine Date: Wed Aug 31 01:20:14 2022 +0000 [Static Runtime] Schema checks for index_put (#84152) Summary: `index_put` can take a list of tensors, but Static Runtime always tries to convert its argument to a list of optional tensors. This was causing crashes for some users. Add some schema checks to prevent this, and add a new overload for the new case. Also, I found a clear bug in the JIT interpreter (mutating the argument when its not supposed to), so I fixed that too. Test Plan: New unit test Differential Revision: D39072214 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84152 Approved by: https://github.com/tenpercent commit 7532d5b125fff65945cbb95d5f6cbee082e7238f Author: samdow Date: Tue Aug 30 12:24:07 2022 -0400 [Modes] remove inner constructor kwarg (#83925) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83925 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit e23d159bc57c1651e47e555092c2486bb55db37a Author: Scott Wolchok Date: Mon Aug 29 09:39:36 2022 -0700 [PyTorch][caffe2] Add CAFFE2_{DECLARE,DEFINE}_KNOWN_TYPE (#83707) It looks like we aren't getting inlining for the defined `_typeMetaData` functions from CAFFE_KNOWN_TYPE and there's some cost associated with that. I added new macros that fix this problem; I will migrate to them in a follow-up after I get buy-in from reviewers. Differential Revision: [D36883685](https://our.internmc.facebook.com/intern/diff/D36883685/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36883685/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83707 Approved by: https://github.com/ezyang commit af741e821bb3efecc22feca984519e472e933e9e Author: Catherine Lee Date: Tue Aug 30 22:45:15 2022 +0000 no ios arm builds on circleci (#84299) Get rid of ios arm builds on circleci b/c most people dont have these permissions and they make the job show up as failing/red. 
Next step is to see if we can do only builds since they might not require credentials Pull Request resolved: https://github.com/pytorch/pytorch/pull/84299 Approved by: https://github.com/janeyx99, https://github.com/malfet commit e014bd8e4ef65377374640310cbafbccbcd0f5f7 Author: Xu Zhao Date: Tue Aug 30 22:40:44 2022 +0000 Upgrade default cuda version of torchbench (#84248) Upgrade CUDA version of torchbench as we are moving away from CUDA 11.3 This PR needs to land together with https://github.com/pytorch/benchmark/pull/1141 RUN_TORCHBENCH: nvfuser TORCHBENCH_BRANCH: xz9/setup-cuda-compile Pull Request resolved: https://github.com/pytorch/pytorch/pull/84248 Approved by: https://github.com/erichan1, https://github.com/davidberard98 commit 7acdb2d5642557053df00951b51b94929302a9b7 Author: Huy Do Date: Tue Aug 30 22:19:07 2022 +0000 Don't start land checks if the PR hasn't been approved yet (#84239) Per title, don't start land checks if the PR hasn't been approved yet. This is very important to make sure that we don't start CI jobs from unknown devs, i.e. first time contributor. Also rename force to `skip_mandatory_checks` to make it clearer on what this flag does ``` python .github/scripts/test_trymerge.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84239 Approved by: https://github.com/zengk95, https://github.com/ZainRizvi commit eabe34cc40aeb79a10208df291b2a4d92302fbc2 Author: Jesse Cai Date: Tue Aug 30 09:50:03 2022 -0700 [Quant] Remove warnings from using torch.tensor(value) (#84277) Summary: I think zafar made an earlier pull for these changes [here](https://github.com/pytorch/pytorch/commit/ce0786add26c1e117b16b58e8ae12dbe776133e1), but they didn't seem to make it through the migration. Test Plan: ``` python test/test_quantization.py ``` Reviewers: Subscribers: Tasks: https://github.com/pytorch/pytorch/issues/73566 Tags: quant Differential Revision: [D39145070](https://our.internmc.facebook.com/intern/diff/D39145070) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84277 Approved by: https://github.com/z-a-f commit eda217ab672a08e555a7d09a1e4f10d2f98ee478 Author: Nikolay Korovaiko Date: Tue Aug 30 21:53:34 2022 +0000 Reland symint_numel (#84281) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84281 Approved by: https://github.com/ezyang commit d09486ab233284e9f298e45a43977fed8f075fe4 Author: Jeff Daily Date: Tue Aug 30 21:50:39 2022 +0000 [ROCm] enable nvfuser (#82498) The nvfuser is enabled for ROCm. CI label ciflow/trunk covers the newly enabled ROCm functionality as well as any CUDA regressions caused by these changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82498 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98 commit f9609d82038897ac560b408808e9dba9f39bc922 Author: SmirnovKol <31559413+OccupyMars2025@users.noreply.github.com> Date: Tue Aug 30 21:41:11 2022 +0000 Fix several typos (#83823) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83823 Approved by: https://github.com/ngimel, https://github.com/kit1980 commit c06a5586f57c844fdc4a98e52f88e71f64dd54d2 Author: Taylor Robie Date: Tue Aug 30 09:05:15 2022 -0700 [Profiler] Unify global and thread local profiler lookup. (#83894) This PR renames `ProfilerThreadLocalStateBase` to simply `ProfilerStateBase`, and adds `push`, `pop`, and `get` methods. `global` can be specified, or can be omitted for priority selection. 
In order to support this unification it was necessary to make a (mostly) non-throwing version of pop. The asserts around observer removal are intended to act as guard rails against multiple profilers trampling over each other. However on-demand wants to do exactly that because it wants to be able to preempt. A hack would be to get the current observer and then only pop if an observer is found, but that would be prone to race conditions. By removing the asserts, we can preserve the old behavior by adding `ASSERT(pop())` on the caller side while allowing more complex handling for the kineto client interface. (Later PR.) Differential Revision: [D38931521](https://our.internmc.facebook.com/intern/diff/D38931521/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83894 Approved by: https://github.com/slgong-fb commit 48a596ad3f2ca617cd2fafc3fa3c368f5600930a Author: Taylor Robie Date: Tue Aug 30 09:05:13 2022 -0700 [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893) Just a basic move. Later I'll add other subsystems. (Python, Kineto) Differential Revision: [D38925895](https://our.internmc.facebook.com/intern/diff/D38925895/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38925895/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83893 Approved by: https://github.com/slgong-fb commit c26b53f6a4c05a280aabe525a5c5918e3db3da57 Author: Taylor Robie Date: Tue Aug 30 09:05:11 2022 -0700 [Profiler] Encapsulate callback handle management. (#83892) Right now the profiler is capible of leaking callback handles if a client does not call `at::removeCallback`. (As well as a double free if two clients handle it.) This modestly improves the situation by pulling removal into a single method and calling that removal code in the dtor unless explicitly opted out. Once we deprecate the legacy profiler we can further simplify by making the ProfilerThreadLocalStateBase own the handle outright. Differential Revision: [D38920537](https://our.internmc.facebook.com/intern/diff/D38920537/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83892 Approved by: https://github.com/slgong-fb commit ddd841b3168750fa888b9c97e21cf9a6f0934d5b Author: atalman Date: Tue Aug 30 21:23:17 2022 +0000 Removing multigpu 10.2 . Using 11.6 cuda for multigpu tests instead (#84286) Removing multigpu 10.2 . Using 11.6 cuda for multigpu tests instead Pull Request resolved: https://github.com/pytorch/pytorch/pull/84286 Approved by: https://github.com/huydhn, https://github.com/malfet commit 772721a4b7ea68a21e14eb74fedbd6c22f616905 Author: PyTorch MergeBot Date: Tue Aug 30 21:01:25 2022 +0000 Revert "Test distributed backends in parallel (#84034)" This reverts commit 3ae5be74ac7aa4feed6ec8e7c29b280b148651a7. Reverted https://github.com/pytorch/pytorch/pull/84034 on behalf of https://github.com/huydhn due to This somehow revives the flaky test https://github.com/pytorch/pytorch/issues/76428 commit 20018aa7667284a21303a52e5ac0bed5971af2bd Author: Isaac Hoffman Date: Tue Aug 30 20:36:30 2022 +0000 modify split_by_tags to retain output order (#84136) Summary: Currently `split_by_tags` determines submodule output order by iterating over `used_in_main`. Since this is a `Set`, insertion order is not retained so we run into problems with submodule output order being "randomized" & inconsistent between splits. 
By using `Dict[Node, None]` we can implement `used_in_main` as an ordered set so that output order is consistent when splitting the same model. Test Plan: CI Differential Revision: D39039268 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84136 Approved by: https://github.com/houseroad commit 90161c23cf28d5c61295d5c392b14cc3483d3a33 Author: Ivan Yashchuk Date: Tue Aug 30 20:36:11 2022 +0000 Add nvfuser support for squeeze (#84117) "_refs.squeeze" and "refs.unsqueeze" now work with nvfuser executor tests. Similarly to `_refs.reshape` we need to explicitly save the concrete shape on the trace to pass that info to nvfuser, as it gets lost in translation (https://github.com/pytorch/pytorch/pull/83739#discussion_r950352124). Pull Request resolved: https://github.com/pytorch/pytorch/pull/84117 Approved by: https://github.com/ngimel commit 174c3c6859529f30a7dfa4920a9a52e1373b02a9 Author: Driss Guessous Date: Tue Aug 30 19:22:38 2022 +0000 [Nested Tensor]Clean up offsets (#84145) - Document contiguous offset construction - Expand offsets by 1 so that storage offsets for `ntensor[i] = offsets[i+1] - offsets[i]` Another simple one. While looking into this issue https://github.com/pytorch/pytorch/issues/84082 I noticed that the kernels essentially rebuild the offsets but with the added last element. I added this and also cleaned up the code a little Pull Request resolved: https://github.com/pytorch/pytorch/pull/84145 Approved by: https://github.com/albanD commit 3ae5be74ac7aa4feed6ec8e7c29b280b148651a7 Author: Huy Do Date: Tue Aug 30 19:06:49 2022 +0000 Test distributed backends in parallel (#84034) This allows multiple backends (nccl, gloo) to be tested in parallel and speed up the process. The improvement is mainly in the 1st distributed CUDA shard where the long pole `distributed/test_distributed_spawn` test is executed: * [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596825?check_suite_focus=true#logs) takes 1h24m. This is better than the current average expectation of 2h12m On the other hand, there is no improvement for the following two jobs: * [linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)](https://github.com/pytorch/pytorch/runs/8007417353?check_suite_focus=true#logs) takes 1h47m * [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596870?check_suite_focus=true#logs) takes 1h40m This is still a gain though because it allows us to add more shards for distributed test if needed. Issue https://github.com/pytorch/pytorch/issues/83694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84034 Approved by: https://github.com/wanchaol commit 641c3952516baf444f75058f76cde59d0e1110f0 Author: titaiwang Date: Mon Aug 29 20:12:28 2022 +0000 [ONNX] refactor test_pytorch_onnx_onnxruntime_cuda.py (#84218) Fix #80037 After https://github.com/pytorch/pytorch/pull/79641, the code was outdated. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84218 Approved by: https://github.com/BowenBao commit b8ee81014481f58cc87fbda19737307435951d02 Author: Andrew Gu Date: Mon Aug 29 16:28:07 2022 +0000 [Easy][FSDP] Update `StateDictType` doc (#84200) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84200 Approved by: https://github.com/rohan-varma commit 7f58db7424121f035fa70c9504c437bbda722efe Author: Andrew Gu Date: Mon Aug 29 16:27:58 2022 +0000 [Easy][FSDP] ufmt `_optim_utils.py` (#84199) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84199 Approved by: https://github.com/rohan-varma commit 5bceaadb70a87d317f5855514f2e6c730844a015 Author: titaiwang Date: Fri Aug 26 20:02:04 2022 +0000 [ONNX] Add script/trace different flatten and move optional type tests to runtime (#83184) fix #78119 Why: As in onnx tests verification code, we used to only consider tracing output, which ignores None type, this PR enables runtime test to keep None type in torch in script mode. 1. Move Optional Type tests from no runtime to runtime, as it's supported by ONNXRUNTIME. 2. Add ignoreNone flag for output comparison of internal tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83184 Approved by: https://github.com/justinchuby, https://github.com/BowenBao commit b106a04d766c21d15137318c506fc1ed823016b9 Author: lezcano Date: Tue Aug 30 12:11:57 2022 +0000 Fix the edge case when y = 0 in kl_div (#82714) Brought up in https://github.com/pytorch/pytorch/pull/80334#issuecomment-1193600883 We also prepare its opinfo to fix https://github.com/pytorch/pytorch/issues/80488 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82714 Approved by: https://github.com/albanD commit b182f0813519d9b153bf7acb4214c8e3f795866e Author: Eric Han Date: Tue Aug 30 18:06:25 2022 +0000 Fix issue in softmax.cu with transformer error when mask seqlen > 1024 (#83639) Fixes #83142 Adds - test to catch this issue. - fix to softmax.cu that broadcasts src_key_padding_mask to regular attention_mask shape Pull Request resolved: https://github.com/pytorch/pytorch/pull/83639 Approved by: https://github.com/ngimel commit 897907d42cc379af2c16885345bc20b5e8ca894d Author: Peter Bell Date: Mon Aug 29 20:53:39 2022 +0100 Fix split torch_function handling (#83866) `Tensor.split` calls `TensorBase.split` whose `handle_torch_function` statement passes `func` as `Tensor.split` which is usually correct, but not here because of the use of `super()`. Instead this calls `torch._VF.split` which correctly differentiates from `torch.split`. This is currently okay since we never hit `TensorBase.split` for types with `__torch_function__` however, once we allow skipping only one hop of `__torch_function__` this will expose the error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83866 Approved by: https://github.com/albanD commit 65dc5dd3f317b8eb9440d22e044816adab2ffa9e Author: Rodrigo Kumpera Date: Tue Aug 30 17:44:57 2022 +0000 [c10d] Introduce dist.get_local_rank, dist.get_global_rank and dist.get_global_ranks (#82134) Those functions enable membership introspection into a ProcessGroup. A common scenario that needs this is library code that consumes a PG but doesn't create it, which means it likely doesn't know the global ranks used to create it. 
Translating from local to global is necessary when using c10d collectives like broadcast, so if your library code adopts the convention of using local rank 0, it needs to do the following:
```python
import torch.distributed as dist

my_pg: dist.ProcessGroup = ...

def my_library_bcast(tensor):
    # broadcast from local rank 0 of my_pg, translated to its global rank
    dist.broadcast(tensor, src=dist.get_global_rank(my_pg, 0), group=my_pg)
```
This implements some of the helpers needed to implement the `clone` API from: https://github.com/pytorch/pytorch/issues/81291 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82134 Approved by: https://github.com/rohan-varma commit 56a37ea1a6e89a8aa31abc888127ccac647b92d4 Author: Shen Li Date: Tue Aug 30 01:16:42 2022 +0000 Set default value for nccl make MAX_JOBS if ProcessorCount returns 0 (#84231) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84231 Approved by: https://github.com/malfet, https://github.com/rohan-varma commit f0efc1c2d19e561ecdc6dd1e556f76fe1a91e484 Author: Andrew Gu Date: Mon Aug 29 16:27:49 2022 +0000 [Easy][FSDP] Fix sharded optim state dict doc formatting (#84198) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84198 Approved by: https://github.com/rohan-varma commit 546d68226c355fb21ed374588b219f5d7d7a66c3 Author: Marko Horatio Mekjavic <48606569+Pompey21@users.noreply.github.com> Date: Tue Aug 30 15:00:30 2022 +0000 Update README.md (#84263) Just fixed a couple of typos (i.e. upzipped -> unzipped) :) Fixes #84262 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84263 Approved by: https://github.com/Lezcano, https://github.com/albanD commit 44a975335e2d08cbbb07df9a1cebe2620f337ed9 Author: Nikolay Korovaiko Date: Mon Aug 29 21:12:34 2022 -0700 Revert "Re-land sym_numel (#82374) (#82726) (#82731) (#82855)" (#84207) This reverts commit bfebf254dd92f3ed35154597166e7e71fb04f31b. Differential Revision: [D39104562](https://our.internmc.facebook.com/intern/diff/D39104562) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84207 Approved by: https://github.com/robieta commit 60f47cb0021b0ea245aa6cc4654bf9e6d0f4ab20 Author: PyTorch MergeBot Date: Tue Aug 30 13:16:21 2022 +0000 Revert "Use self-hosted runner for viable/strict update (#84249)" This reverts commit acd6ca8cfa9537284928fb5d36834d1e5ae1e6f3. Reverted https://github.com/pytorch/pytorch/pull/84249 on behalf of https://github.com/malfet due to Broke trunk, as one can't use regular actions on self-hosted runners, see https://github.com/pytorch/pytorch/runs/8092593881?check_suite_focus=true commit acd6ca8cfa9537284928fb5d36834d1e5ae1e6f3 Author: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Date: Tue Aug 30 07:33:59 2022 +0000 Use self-hosted runner for viable/strict update (#84249) Queuing times for GH self-hosted runners have been too long to be acceptable and this job gets canceled too frequently. Let's use our own runner here instead for this important job. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84249 Approved by: https://github.com/suo commit ec714e33a3365ce405e2ae048c8adaa2751b9dba Author: Shenxiu Liu Date: Tue Aug 30 05:16:19 2022 +0000 [PT] Allowing deepcopy in unitialized parameter (#83809) Summary: UninitializedParameter overrides the `__new__` method, so the parent class's `__deepcopy__` method no longer works, which means models using LazyModule cannot be instantiated. Test Plan: locally copied lazy module.
After change: ``` shenxiu@devbig1109:fbcode (5c57dd833)$ bento console --kernel pytorch --local /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/scribeutil/lib.py:9: DeprecationWarning: The "thrift" clients in libfb.py.thrift_clients are not proper thrift clients, and often have unexpected or incorrect behaviour. They are also completely unsupported. Please use a supported client from https://fburl.com/srpy or a supported raw thrift client if you cannot use ServiceRouter. from libfb.py.thrift_clients.scribe_thrift_client import ScribeThriftClient /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/ipykernel/iostream.py:14: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses from imp import lock_held as import_lock_held Python 3.8.6 (default, Jun 10 2022, 04:32:13) Type 'copyright', 'credits' or 'license' for more information IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: import copy ...: import torch ...: ...: class LazyModule(torch.nn.Module): ...: def __init__(self): ...: super().__init__() ...: self.m = torch.nn.LazyLinear(10) ...: ...: def forward(self, input): ...: x = self.m(input) ...: return x ...: ...: m = LazyModule() ...: print(m.state_dict()) copy.deepcopy(m) /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/mpmath/ctx_mp_python.py:892: SyntaxWarning: "is" with a literal. Did you mean "=="? if other is 0: /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/mpmath/ctx_mp_python.py:986: SyntaxWarning: "is" with a literal. Did you mean "=="? if other is 0: /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/sympy/solvers/diophantine.py:3188: SyntaxWarning: "is" with a literal. Did you mean "=="? if feasible is 1: # it's prime and k == 2 /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/sympy/plotting/plot.py:520: SyntaxWarning: "is" with a literal. Did you mean "=="? if self.xscale is 'log': /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/sympy/plotting/plot.py:540: SyntaxWarning: "is" with a literal. Did you mean "=="? if self.xscale is 'log': /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/sympy/plotting/plot.py:553: SyntaxWarning: "is" with a literal. Did you mean "=="? if self.xscale is 'log': /data/users/shenxiu/fbsource/buck-out/v2/gen/fbcode/26f2c80c27f9e71d/bento/kernels/__bento_kernel_pytorch__/bento_kernel_pytorch#link-tree/sympy/plotting/plot.py:560: SyntaxWarning: "is" with a literal. Did you mean "=="? 
if self.xscale is 'log': OrderedDict([('m.weight', ), ('m.bias', )]) In [2]: copy.deepcopy(m) Out[2]: LazyModule( (m): LazyLinear(in_features=0, out_features=10, bias=True) ) ``` Before change, above code will give ``` TypeError: empty() received an invalid combination of arguments - got (int, dtype=NoneType, device=bool), but expected one of: * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad) * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad) * (tuple of SymInts size, *, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad) ``` Cloned n2369721 locally and successful (thru console not notebook because somehow bento notebook doesn't work with buck2 well). Reviewed By: avilay Differential Revision: D38866072 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83809 Approved by: https://github.com/ngimel commit 856a7d94116e175cd4b631c61b4a9a8c572b83c8 Author: Peter Bell Date: Tue Aug 30 02:12:13 2022 +0000 Vectorize conversions to BFloat16 on CPU (#80906) This adds explicit vectorization for converting float or double to bfloat16. Most conversions are sufficiently handled by the auto-vectorizer, but these conversions aren't (presumably due to branching in the scalar conversion code). Benchmark results with 512K elements on an AVX2 machine: | conversion | Before (us) | After (us) | |---------------------|-------------|------------| | float32 -> bfloat16 | 53.3 | 39.8 | | float64 -> bfloat16 | 92.1 | 78.2 | Pull Request resolved: https://github.com/pytorch/pytorch/pull/80906 Approved by: https://github.com/ngimel commit 7a14c56beecf5f21e88c029bf306cbda8e91fbed Author: Catherine Lee Date: Tue Aug 30 03:53:16 2022 +0000 only run the circleci mac/ios jobs on prs (#84227) as in title, since they were being run on nightly when they dont need to be (and they were failing), also dont run on master b/c the github actions version already exists for that Pull Request resolved: https://github.com/pytorch/pytorch/pull/84227 Approved by: https://github.com/seemethere, https://github.com/janeyx99, https://github.com/huydhn, https://github.com/malfet commit 71369051ee99f679cbb026b571e2521e3845a93e Author: Driss Guessous Date: Tue Aug 30 03:48:09 2022 +0000 [Nested Tensor] fix from_padded bug (#84217) Fixes #84082 Explained in the issue that the problem was arising from grad being not contiguous and the fast kernel not handiling this case gracefully. The other thing I can do is add a contiguous call to https://github.com/pytorch/pytorch/blob/d144594512e10ab2a9625347816c2dee1fb55667/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp#L45 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84217 Approved by: https://github.com/albanD commit df98c529480b2ece3809b19fc850f57d2054605a Author: Zhengxu Chen Date: Tue Aug 30 03:09:48 2022 +0000 [fx] Make get_isolated_graphmodule accept tracing mode. (#84238) Summary: make get_isolated_graphmodule be able to run with symbolic mode. 
Test Plan: eyes Reviewed By: angelayi Differential Revision: D39110454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84238 Approved by: https://github.com/angelayi commit 399b1eb84b006f7a6e2bdcda7083fd264ac204da Author: samdow Date: Mon Aug 29 11:31:36 2022 -0400 [functorch] fix multinomial (#83838) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83838 Approved by: https://github.com/zou3519 commit 34e5b0997e73916a15a9802ea4525705ba033cb1 Author: Shen Li Date: Mon Aug 29 20:46:14 2022 +0000 [reland] Make allreduce compatible with make_fx (#84221) land after #83122 This PR explores solutions for 2 issues: 1. Collective comm ops are inplace ops, and does not return a tensor. With that, `make_fx` cannot include comm ops in the traced graph. The current solution is to make comm ops return a tuple of `(output_tensors, work_handle)`, so that [`proxy_call`](https://github.com/pytorch/pytorch/blob/90821aab100a436424113e2306eac63f5e247ee5/torch/fx/experimental/proxy_tensor.py#L170-L172) can handle that. It won't change the behavior of existing c10d Python/C++ APIs, so I directly added the code to `Ops.cpp`. 2. `make_fx` does not recognize `ProcessGroup::Work` and will ignore the `wait()` call on the work when tracing graph. However, this might break correctness, as when running the traced function, it could consume a tensor before it's ready. The current solution is to create a `CommTensor` tensor subclass to explicitly call `wait()`. In this PR, I am only doing this in the test, as we will need more discussion to see if we can add this to c10d Python implementations. kudos to Chillee wanchaol Edit: `print_tabular` breaks CI. removing that from tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84221 Approved by: https://github.com/wanchaol commit a402e100be1fc88fe8499f96ed36a08cf1bb0de0 Author: Zhengxu Chen Date: Tue Aug 30 01:16:56 2022 +0000 [fx] Make wrapped_fn also work for non-mutating passes. (#84232) Summary: Before the change, wrapped_fn should only take mutating passes, but we don't actually have any way to detect whether a pass is mutating before running it. To make this an abstraction without involving any precondition depending on PassManager run, we could just relax the precondition to take any kind of passes, and conditionally return the original pass based on the pass result. Test Plan: eyes Reviewed By: qihqi, angelayi Differential Revision: D39086343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84232 Approved by: https://github.com/angelayi commit 8aba2535e4ebe01e1461a17beedcabfd34db9d87 Author: zh Wang Date: Tue Aug 30 01:04:26 2022 +0000 Fix typo (#83802) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83802 Approved by: https://github.com/ngimel, https://github.com/kit1980 commit 7371761d9cc4a1c235f65aff07452da7de482eae Author: Antonio Kim Date: Tue Aug 30 00:31:35 2022 +0000 Add Lazy backend type string (#84228) As the title suggest, the `Lazy` case was missing the in the `backend_to_string` switch case causing ``` RuntimeError: Unimplemented backend Lazy ``` when called with a lazy backend. CC: @wconstab @Krovatkin @desertfire Pull Request resolved: https://github.com/pytorch/pytorch/pull/84228 Approved by: https://github.com/wconstab commit adc54dc2195fbfe37b2f01649b8788314382a9be Author: Huy Do Date: Mon Aug 29 23:50:24 2022 +0000 Give better error message when merge fails to find any rules (#84160) Fixes #84147 and https://github.com/pytorch/test-infra/issues/421. 
* If merge rule file is missing or fails to load for whatever reasons: ``` No rules find to match PR, please [report]{issue_link} this issue to DevX team. ``` * If the list of rules is empty: ``` Merges are not allowed into repository without a rules. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84160 Approved by: https://github.com/ZainRizvi, https://github.com/malfet commit 54d8661266665c0f66a6d819cc24e5b0053b0be9 Author: ssjia Date: Mon Aug 29 08:39:52 2022 -0700 [vulkan] Add vulkan_api_test as an instrumentation test (#83978) This diff adds a `fb_xplat_cxx_test` Android instrumentation test in internal repo that runs `vulkan_api_test.cpp`. Some small changes to `vulkan_api_test.cpp` were needed to build/run the binary successfully. Differential Revision: [D38954229](https://our.internmc.facebook.com/intern/diff/D38954229/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38954229/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83978 Approved by: https://github.com/kirklandsign commit e7635c06ce15b1e5952b34d4e50018c1c8d545db Author: apeltop Date: Mon Aug 29 23:32:44 2022 +0000 Fix typos in docs (#80602) I hope it helps. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80602 Approved by: https://github.com/kit1980 commit 372a19d2c673a20fe50955b20ea4e3685266d630 Author: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Date: Mon Aug 29 22:53:40 2022 +0000 Update start_index and end_index for adaptive pooling (#84010) The PR fixes the issue #81409. To fix the issue the procedure of determining start and end indices for adaptive max pooling and average pooling is modified towards integer-only arithmetic. The testing of the new functions is straightforward: ``` int64_t start_index(int64_t a, int64_t b, int64_t c) { return (int64_t)std::floor((float)(a * c) / b); } int64_t end_index(int64_t a, int64_t b, int64_t c) { return (int64_t)std::ceil((float)((a + 1) * c) / b); } int64_t start_index_new(int64_t a, int64_t b, int64_t c) { return (a / b) * c + ((a % b) * c) / b; } int64_t end_index_new(int64_t a, int64_t b, int64_t c) { return 1 + ((a + 1) * c - 1) / b; } int main() { size_t N = 2<<24; std::cout< Date: Mon Aug 29 22:15:20 2022 +0000 Make mergebot failure messages more readable (#84214) Reformat how mergebot outputs merge and revert errors to make the failure more obvious and hide text that doesn't actually help most users debug their PRs. (The workflow job helps the DevX team to debug mergebot errors) image image image Pull Request resolved: https://github.com/pytorch/pytorch/pull/84214 Approved by: https://github.com/malfet, https://github.com/huydhn commit d62a6ca5216fa4465dad546307c406544917ffea Author: Zain Rizvi Date: Mon Aug 29 20:31:30 2022 +0000 Link to instructions on submitting an RFC (#83990) Point people to instructions on how to create an RFC Pull Request resolved: https://github.com/pytorch/pytorch/pull/83990 Approved by: https://github.com/janeyx99 commit 724b63d69452faac365131d63cbbbeb0a3c2d94a Author: Michael Suo Date: Mon Aug 29 10:49:53 2022 -0700 [ci] move XLA pin update to weekly (#84208) - Create a `weekly` workflow and move XLA pin update to that - Move the other two pin updates to the `nightly` workflow (instead of having a special workflow just for them. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84208 Approved by: https://github.com/janeyx99, https://github.com/malfet, https://github.com/seemethere commit 806878518f32c5b93acc7da576e57ab52f6f5232 Author: BowenBao Date: Mon Aug 29 10:26:43 2022 -0700 [ONNX][Reland] Export node and value with scope name (#82040) Introduce `_jit_pass_onnx_assign_node_and_value_names` to parse and assign scoped names for nodes and values in the exported onnx graph. Module layer information is obtained from `ONNXScopeName` captured in the `scope` attribute in nodes. For nodes, the processed onnx node names are stored in the attribute `onnx_name`. For values, the processed onnx output names are stored as `debugName`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82040 Approved by: https://github.com/AllenTiTaiWang, https://github.com/justinchuby, https://github.com/abock commit d144594512e10ab2a9625347816c2dee1fb55667 Author: Jesse Cai Date: Mon Aug 29 18:08:36 2022 +0000 [Quant][fx] Remove WEIGHT_INDEX_DICT and BIAS_INDEX_DICT (Part 2) (#83853) Summary: - Finishes the second part of https://github.com/pytorch/pytorch/pull/83263 - Removes WEIGHT_INDEX_DICT and BIAS_INDEX_DICT from utils.py - Moves two functions, `node_arg_is_weight` and `node_arg_is_bias`, into utils.py from prepare.py; convert.py and _equalize.py now use node_arg_is_weight instead of the dictionaries - Adds in quantization support for `F.groupnorm`. Adds missing BackendPatternConfigs for layernorm, instancenorm, and groupnorm Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2b157e0dc4f1553be1f4813b4693db952e6fc558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83848 Fixes #83093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83853 Approved by: https://github.com/jerryzh168, https://github.com/andrewor14 commit ad44670fa1ce2dad7e2cdc3f90d27668e88e9548 Author: Edward Z. Yang Date: Mon Aug 29 06:08:43 2022 -0700 Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628)" (#84173) Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature" Original commit changeset: dab4a9dba4fa Original commit changeset: dcaf16c037a9 Original Phabricator Diff: D38984222 Original Phabricator Diff: D39075159 Also update Metal registrations for C++ registration changes. Also update NNPI registration to account for tightened schema checking Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173 Approved by: https://github.com/Krovatkin commit cfd18e105fe795072edafe54c1f5861967ca746a Author: Kimish Patel Date: Sat Aug 27 16:06:16 2022 -0700 [Pytorch][Ondevice quantization] Add device side API to convert model (#83807) Summary: This diff adds a device side API which will convert the model to its quantized equivalent. The input model must have been prepared AOT for quantization. API is implemented by: - Running reset observers - Running observe method - Running quantize method - And replacing the method, e.g. forward, with its quantized equivalent.
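A purely illustrative Python sketch of the four-step convert flow listed above; the stub class and the method names `reset_observers_` and `quantized_forward` are assumptions (only `observe_forward`/`quantize_forward` are named in this commit stack), so this is not the real TorchScript API:
```python
# Hedged sketch: mirrors the described steps (reset observers -> observe ->
# quantize -> swap forward) on a stand-in object, not on a scripted module.
class PreparedModuleStub:
    def reset_observers_(self):
        print("1. reset observer state")
    def observe_forward(self, x):
        print("2. run forward once to record observer statistics")
    def quantize_forward(self, x):
        print("3. compute qparams and insert quant/dequant")
    def quantized_forward(self, x):
        print("4. quantized forward")
        return x

def convert_on_device(m, example_input):
    m.reset_observers_()
    m.observe_forward(example_input)
    m.quantize_forward(example_input)
    m.forward = m.quantized_forward  # replace forward with its quantized equivalent
    return m

converted = convert_on_device(PreparedModuleStub(), example_input=None)
converted.forward(None)
```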
Test Plan: test/quantization/jit/test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38889818](https://our.internmc.facebook.com/intern/diff/D38889818) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83807 Approved by: https://github.com/iseeyuan commit eebdcb5a2ef6a117a608b9ca5ca1eb2fd4f72fbd Author: Kimish Patel Date: Sat Aug 27 16:06:16 2022 -0700 [Pytorch][quantization][ondevice] Add a wrapper API for server side prep (#83742) for ondevice quantization Summary: THis diff just wraps existing API for ondevice quantization Test Plan: test/quantization/jit/test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38868647](https://our.internmc.facebook.com/intern/diff/D38868647) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83742 Approved by: https://github.com/jerryzh168 commit 5c7e801c50e4478e8f96ab287a77ee4b08051f75 Author: Kimish Patel Date: Sat Aug 27 16:06:15 2022 -0700 [pytorch][on device quant] Finalize method for ondevice quant (#83571) Summary: After inserting quant dequant nodes in the graph, we need 1. Insert packed param creation and quantized op 2. Create packed_params attribute in the top module. For this we need graph that inlined except for calculate_qparams method calls. But they can be inlined too. So perhaps we need to make sure no other callmethods exist. 3. Insert SetAttr for the packed param 4. Insert GetAttr for the packed param 5. Use GetAttr output for quantized op where applicable, e.g. linear_dynamic The above is added to quantize_ method created inprevious step. Once the above steps are done clone the method into quantized_ Modify quantize_: 1. Remove all outputs from the method. 2. Run dce 3. Remove all inputs from the method except self. Modify quantized_: 1. Remove all packed_param setAttr nodes. 2. Run dce. This should result in removal of all nodes that generate packed param. Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571 Approved by: https://github.com/jerryzh168 commit 446afb5f9f62d50d159e58e36f4596dfdfa8bcd5 Author: Kimish Patel Date: Sat Aug 27 16:06:14 2022 -0700 [On Device Quantization][pytorch]Make insert_quant_dequant support ondevice ptq (#83570) Summary: This diff adds a way to: - clone previously observed method - Add calls to observer's calculate_qparams methods - Extract the scale and zero point - Use them to insert quant dequant nodes Now for forward method we have - observe_forward - quantize_forward observe_forward is used post training to observer statistics. In the case of dynamic PTQ this requires just running that method once to update weight observer statistics. quantize_forward method will be used to use the observer statistics to calculate quantization parameters and apply that to quant dequant op. 
Subsequent diffs will replace dequant + op with their quantized op counter parts and replace quantize ops with relevant packed params class where possible Test Plan: To be written Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570 Approved by: https://github.com/jerryzh168 commit 6a5d9f1be0c7a7ebe556d427930683a438195def Author: Kimish Patel Date: Fri Aug 26 10:53:28 2022 -0700 Replace "_scalar_type" string with constant (#83569) Summary: Use this refactor to make insertQuantizationOps tempelatized in the later diff Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771418](https://our.internmc.facebook.com/intern/diff/D38771418) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83569 Approved by: https://github.com/jerryzh168 commit 9189edb3b3eb7ab9b94d514c428af284b8d978e1 Author: Kimish Patel Date: Fri Aug 26 10:53:27 2022 -0700 [Quantization][Pytorch] On device quantization support part 1 (#83568) Summary: TO support on device quantization this diff introduces observer insertion. Specifically observers are inserted by adding new method with prefix observ_. Intent is that post training, this method will be run to record statistics Test Plan: test_ondevice_quantization.py Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568 Approved by: https://github.com/jerryzh168 commit 8acc92eb009bc4df2ee4f9cbd06cd6b9cee533a6 Author: Rohan Varma Date: Mon Aug 29 17:10:25 2022 +0000 [FSDP] Print exec order only in debug mode (#83868) Since exec order warning can result in very long module name print out, gating this only to be printing in debug mode. Oftentimes such as in multiModal training, there is not a lot we can do about this warning since some modules go unused in certain iterations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83868 Approved by: https://github.com/awgu commit 352da6de6b731c04576701295b1b88e733ebaf76 Author: Angela Yi Date: Thu Aug 25 16:29:27 2022 -0700 [fx][pass] Fix type of exception (#84094) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84094 Approved by: https://github.com/SherlockNoMad commit 7088a98fba3a5031a2afc293cbf25cec09f248a5 Author: soulitzer Date: Fri Aug 26 11:21:19 2022 -0400 conv2d: require bias to have the same dtype as input and weight on cpu (#83686) Fixes https://github.com/pytorch/pytorch/issues/83505 BC-breaking message: - Previously we only required input and weight to have the same dtype on cpu (when input is non-complex). After this change, the dtype of bias is now also expected to have the same dtype. This change was necessary to improve the error message for certain combinations of inputs. This behavior now also matches that of convolution on cuda.
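A hedged illustration of the BC-breaking behavior described above: on CPU, a conv2d call whose bias dtype differs from the input/weight dtype is now expected to fail (the exception type and message below are assumptions, not taken from this commit):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)               # float32 input
w = torch.randn(4, 3, 3, 3)               # float32 weight
b = torch.randn(4, dtype=torch.float64)   # float64 bias: no longer allowed to differ

try:
    F.conv2d(x, w, b)
except RuntimeError as err:
    print("conv2d rejected mismatched bias dtype:", err)
```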
Old plan: Previously convolution (at least for slow_conv2d) did not perform type promotion, i.e. the output of `conv(int, int, float)` is an int, and that leads to the autograd assert. This PR adds type promotion handling at the `at::native::conv2d` (this is a composite) level. We also need to correct or remove many tests that assume that conv errors when input types are mixed. Pros: - Doing type promotion at this level avoids the complex path from having any special handling for mixed dtypes, and can potentially speed up mixed-dtype inputs to now dispatch to faster kernels which are only capable of handling floats. Cons: - Doing type promotion at this level has the risk of introducing extra overhead when we would've dispatched to a kernel capable of handling mixed types anyway. I don't know if any of these exist at all, though - it is possible that inputs with any non-float arguments are dispatched to the slow path. If this approach is OK, we can proceed with the other convolutions as well:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83686 Approved by: https://github.com/ngimel commit 1945d28f58732a883220563c0dcebf43f1412c72 Author: PyTorch MergeBot Date: Mon Aug 29 16:41:09 2022 +0000 Revert "[fx][pass] Fix type of exception (#84094)" This reverts commit eb2fa2e042b18ba35fa6eedb769c2efe411dbcfb. Reverted https://github.com/pytorch/pytorch/pull/84094 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit c29b7865d02239d89d4407559a85a556039cb7c6 Author: PyTorch MergeBot Date: Mon Aug 29 15:44:31 2022 +0000 Revert "[xla hash update] update the pinned xla hash (#84164)" This reverts commit fbf5a3f9f41d69248099c957571be0474659b15a. Reverted https://github.com/pytorch/pytorch/pull/84164 on behalf of https://github.com/weiwangmeta due to MESSAGE commit eff312f07be85508f049e5ddfe9ada5aa5df4fc4 Author: samdow Date: Fri Aug 26 14:49:05 2022 +0000 nit fixes in modes (#83924) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83924 Approved by: https://github.com/ezyang, https://github.com/zou3519 commit 1a53e35b9db8442432d8dfcfff430d2a569ad062 Author: Rohan Varma Date: Fri Aug 26 16:58:59 2022 +0000 Enforce explicit ProcessGroup passed into DefaultState (#84105) Would prefer to enforce that users pass in explicit PG into these state objects when using comm hooks with FSDP, so that it is clear and easy debugable over which processes communication is taking place. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84105 Approved by: https://github.com/mrshenli, https://github.com/zhaojuanmao commit f66be71d77c2998122c5177c2362c3d66b7b19cc Author: Rodrigo Kumpera Date: Mon Aug 29 14:38:32 2022 +0000 [checkpoint] Adopt Planner interface across the board. (#83781) Change StorageReader and StorageWriter to follow the new SavePlanner / LoadPlanner design. Add optional planner param to load_state_dict and save_state_dict and implement the new protocol. This includes a small rework of FileSystem layer to support single file per rank and making fsync optional to match torch.save behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83781 Approved by: https://github.com/wanchaol, https://github.com/fduwjj commit fbf5a3f9f41d69248099c957571be0474659b15a Author: PyTorch MergeBot Date: Mon Aug 29 10:29:47 2022 +0000 [xla hash update] update the pinned xla hash (#84164) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84164 Approved by: https://github.com/pytorchbot commit b8e1c54f53404393e5fe421c26f7bfc951251e43 Author: Nikita Shulga Date: Mon Aug 29 09:29:28 2022 +0000 [Prim] Implement group_norm_backward (#84037) Test plan: CI, i.e. 
`python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus: ``` import torch func = torch.ops.aten.native_group_norm_backward.default decomp = torch._decomp.decomposition_table[func] for args in ( (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]), (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])): nrc=func(*args) drc=decomp(*args) for i in range(len(nrc)): print(i, torch.max(nrc[i]-drc[i])) print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037 Approved by: https://github.com/Chillee, https://github.com/ngimel commit 2436cf8aa8a557ffa031d700f6448047b0fd58a3 Author: Driss Guessous Date: Mon Aug 29 09:12:24 2022 +0000 [Nested Tensor] detach (#84078) Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually. The Detach test is failing only for the dtype=torch.float32, torch.float16 and device=cuda. The chain of ops that called are sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b. Does this potentially indicated that cuda implementation for one of these ops, likely from_padded() is incorrect? Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078 Approved by: https://github.com/albanD commit 0095571135c8e3d2017a270c7652fd8605425879 Author: Animesh Jain Date: Mon Aug 29 09:11:54 2022 +0000 [AOT Autograd] Redirect named_parameters to original mod (#84157) Helps in comparing accuracy. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84157 Approved by: https://github.com/Chillee commit 3f947264533f318355b848c070a4279032cbb5d8 Author: erjia Date: Fri Aug 26 20:55:44 2022 +0000 [DataPipe] Convert MapDataPipe.shuffle to IterDataPipe (#83202) Fixes: https://github.com/pytorch/data/issues/718 This is an alternative PR against https://github.com/pytorch/pytorch/pull/82974 This PR would change the behavior for both types to the same behavior as `IterDataPipe.shuffle` - Lazily generating seed per iteration - Each iterators has a new seed - Convert `MapDataPipe.shuffle` to an `IterDataPipe` This PR changes the return type of `MapDataPipe.shuffle` from a `MapDataPipe` to a `IterDataPipe`. Output as `MapDataPipe` ``` >>> from torch.utils.data import IterDataPipe, MapDataPipe >>> from torch.utils.data.datapipes.map import SequenceWrapper >>> dp = SequenceWrapper(list(range(10))).shuffle() >>> isinstance(dp, MapDataPipe) True >>> isinstance(dp, IterDataPipe) False ``` Output as `IterDataPipe` ``` >>> from torch.utils.data import IterDataPipe, MapDataPipe >>> from torch.utils.data.datapipes.map import SequenceWrapper >>> dp = SequenceWrapper(list(range(10))).shuffle() >>> isinstance(dp, MapDataPipe) False >>> isinstance(dp, IterDataPipe) True ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83202 Approved by: https://github.com/NivekT commit 7480e83338e69bd39905ed90a23b7c22960e1aff Author: Taylor Robie Date: Fri Aug 26 12:49:11 2022 -0700 [Profiler] Add `disabled` and `global` methods to ProfilerConfig. (#83891) `ProfilerState::Disabled` and `ProfilerState::KINETO_ONDEMAND` have special semantics. The former is somewhat intuitive, but the degree of behavior branching on the latter (and why the branching is necessary) is less clear. 
By factoring the enum checks into methods, we can both clairify intent and future proof in case we ever add other global profiling contexts. Differential Revision: [D38917980](https://our.internmc.facebook.com/intern/diff/D38917980/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83891 Approved by: https://github.com/slgong-fb commit 8e6207bcd8beff791c517977c3f83179e0f51d45 Author: PyTorch MergeBot Date: Mon Aug 29 06:36:17 2022 +0000 Revert "[ONNX] Export node and value with scope name (#82040)" This reverts commit 6a3666282d000a0f196fbdd8b182bb4fd711f189. Reverted https://github.com/pytorch/pytorch/pull/82040 on behalf of https://github.com/weiwangmeta due to Diff reverted internally commit d50aa517b532dd58daafb79160bcc8758ecd01b7 Author: PyTorch MergeBot Date: Mon Aug 29 06:34:50 2022 +0000 Revert "Add support to traverse all python collection objects (#84079)" This reverts commit e0f0c8e7b9acf6b821956acadbe79aaa0f6f0237. Reverted https://github.com/pytorch/pytorch/pull/84079 on behalf of https://github.com/weiwangmeta due to Diff reverted internally commit 0ac2986d3334f8f9b35ca2fa7a30c20022c26fa6 Author: Natalia Gimelshein Date: Mon Aug 29 04:29:09 2022 +0000 Fixes softmax indexing for large tensors (#84182) Fixes #84144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84182 Approved by: https://github.com/janeyx99 commit 533203f5aaa9f8987f25d828e1c37e755a2ba4ea Author: Natalia Gimelshein Date: Mon Aug 29 02:25:00 2022 +0000 _to_copy decomp (#84108) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108 Approved by: https://github.com/Chillee commit 9fc02f6bc558f26a460241cbeaea915ea2b41005 Author: lezcano Date: Sun Aug 28 22:58:52 2022 +0000 Decomposition for adaptive_avg_pool2d (#84062) This was already implemented as a lowering in https://github.com/pytorch/torchdynamo/pull/962. I'm putting the idea up here ~(I haven't even run this code, so it surely has *many* issues, but I reckon the general idea should hopefully be alright).~ The tests now pass and I corrected the issues that the first implementation had. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84062 Approved by: https://github.com/jansel commit 3aae6ff1e13128412e44a69ca3da5582f17fac02 Author: Ivan Yashchuk Date: Sun Aug 28 18:45:25 2022 +0000 Add nvprims.var_mean (#83508) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. 
Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel commit 261be8e5c2e3702528105005035d2e151b4f2724 Author: PyTorch MergeBot Date: Sun Aug 28 18:30:05 2022 +0000 Revert "[Profiler] Add `disabled` and `global` methods to ProfilerConfig. (#83891)" This reverts commit 69e9f905b7ddc0f453fa273746c9db5ed60bc71a. Reverted https://github.com/pytorch/pytorch/pull/83891 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 7244a3737c8c6fd4c2e4e42fcddc14e2f56a35c1 Author: PyTorch MergeBot Date: Sun Aug 28 18:00:17 2022 +0000 Revert "[DataPipe] Convert MapDataPipe.shuffle to IterDataPipe (#83202)" This reverts commit a423c966a780a1fdac6a29c6d2be2a0709de2cd5. Reverted https://github.com/pytorch/pytorch/pull/83202 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 33db5da4c16f048a966542f8e916afc02463f71c Author: PyTorch MergeBot Date: Sun Aug 28 17:30:50 2022 +0000 Revert "[Prim] Implement group_norm_backward (#84037)" This reverts commit bed85cce8b2e7c7430c1f3b5f7c8c765b779ec3e. Reverted https://github.com/pytorch/pytorch/pull/84037 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit df523a6eeef283a79062f828c3fedce2cc3e32f0 Author: PyTorch MergeBot Date: Sun Aug 28 16:29:08 2022 +0000 Revert "[AOT Autograd] Redirect named_parameters to original mod (#84157)" This reverts commit 43620b7e8d722d1b5c34cbda2619ccd9f92ca820. Reverted https://github.com/pytorch/pytorch/pull/84157 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit f4f54c7ce1bac0db91922c618c38a5f72cab130b Author: PyTorch MergeBot Date: Sun Aug 28 15:30:21 2022 +0000 Revert "[Nested Tensor] detach (#84078)" This reverts commit 092fe71f33fe37b8d09499708230307aea028eaf. Reverted https://github.com/pytorch/pytorch/pull/84078 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 5cf4542f86e0907ac0ac514d64995ae90d41ac78 Author: PyTorch MergeBot Date: Sun Aug 28 14:30:18 2022 +0000 Revert "Enforce explicit ProcessGroup passed into DefaultState (#84105)" This reverts commit adc9a1e2fbd0e6d873dc2441d250b94fe9098e9e. Reverted https://github.com/pytorch/pytorch/pull/84105 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit ff23f3ac1c10f6bd5104f27aa1566b71e2ae6fa0 Author: PyTorch MergeBot Date: Sun Aug 28 13:27:49 2022 +0000 Revert "_to_copy decomp (#84108)" This reverts commit e33897cb9999f124bce126c7e43f96c0755413ef. Reverted https://github.com/pytorch/pytorch/pull/84108 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit d8cc8368abfc540725b8f944419bcf1e7e79458e Author: PyTorch MergeBot Date: Sun Aug 28 12:28:58 2022 +0000 Revert "[ONNX] Fix type annotations and enable type checking for all apis (#84091)" This reverts commit 6446da17305960088dfae501d5c7358af068fa81. Reverted https://github.com/pytorch/pytorch/pull/84091 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit b159a5230ffb497c3683e67f95095615493ef65f Author: PyTorch MergeBot Date: Sun Aug 28 11:30:27 2022 +0000 Revert "Add nvprims.var_mean (#83508)" This reverts commit 7e7694b6615fbf46abfab234615fa891c2819eb7. 
Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit 71cd3fa2d56d24a3ef246102ebb145a06fbe88a3 Author: PyTorch MergeBot Date: Sun Aug 28 10:29:30 2022 +0000 Revert "[xla hash update] update the pinned xla hash (#84164)" This reverts commit c032b097e315177af5bc867eeee5452b7df32952. Reverted https://github.com/pytorch/pytorch/pull/84164 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally commit b078d242c481344ab083926e4010322ec68884c9 Author: jjsjann123 Date: Sun Aug 28 04:26:36 2022 +0000 Nvfuser to copy decomp to prim (#83782) Conditional decomposing aten::_to_copy to nvprim::convert_element_type to allow fusion with type casting, which is introduced during type promotion phase at torch decomposition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83782 Approved by: https://github.com/ngimel commit c9b144ff47ff3b6f358752976d29ac61f2b9b070 Author: kuttire42 <64169153+kuttire42@users.noreply.github.com> Date: Sun Aug 28 01:25:07 2022 +0000 Replace assertEqualIgnoreTypes from common_methods_invocations.py (#84076) This addresses TODO:38095 . More details at https://github.com/pytorch/pytorch/issues/38095 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/84076 Approved by: https://github.com/kit1980 commit b8fe0edcf5a92f53d8f0254d3ad10c2770b23772 Author: PyTorch MergeBot Date: Sat Aug 27 14:14:58 2022 +0000 Revert "Make allreduce compatible with fx ProxyTensor (#84126)" This reverts commit ec5b83f76847584013a9cd4177d389a408033614. Reverted https://github.com/pytorch/pytorch/pull/84126 on behalf of https://github.com/malfet due to Likely broke multigpu periodic jobs, see https://github.com/pytorch/pytorch/runs/8044611438?check_suite_focus=true commit c032b097e315177af5bc867eeee5452b7df32952 Author: PyTorch MergeBot Date: Sat Aug 27 10:24:21 2022 +0000 [xla hash update] update the pinned xla hash (#84164) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84164 Approved by: https://github.com/pytorchbot commit 7e7694b6615fbf46abfab234615fa891c2819eb7 Author: Ivan Yashchuk Date: Sat Aug 27 09:05:20 2022 +0000 Add nvprims.var_mean (#83508) This PR adds nvfuser-specific primitive - `var_mean`. Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager. I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`). Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. 
Here's a simple comparison of performance with this PR and master (on 3080ti): ```py import torch from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.executor import execute def func(a): return torch.native_layer_norm(a, (1024,), None, None, 1e-6) a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda") with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) for _ in range(10): execute(gm, a, executor="strictly_nvfuser"); ``` run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py` ```py ``` So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape. Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`). Ref. https://github.com/pytorch/pytorch/issues/80187 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508 Approved by: https://github.com/ngimel commit 6446da17305960088dfae501d5c7358af068fa81 Author: Justin Chu Date: Sat Aug 27 02:05:37 2022 +0000 [ONNX] Fix type annotations and enable type checking for all apis (#84091) Enable runtime type checking for all torch.onnx public apis, symbolic functions and most helpers (minus two that does not have a checkable type: `_.JitType` does not exist) by adding the beartype decorator. Fix type annotations to makes unit tests green. Profile: export `torchvision.models.alexnet(pretrained=True)` ``` with runtime type checking: 21.314 / 10 passes without runtime type checking: 20.797 / 10 passes + 2.48% ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84091 Approved by: https://github.com/BowenBao commit e33897cb9999f124bce126c7e43f96c0755413ef Author: Natalia Gimelshein Date: Sat Aug 27 03:51:03 2022 +0000 _to_copy decomp (#84108) Per title Pull Request resolved: https://github.com/pytorch/pytorch/pull/84108 Approved by: https://github.com/Chillee commit adc9a1e2fbd0e6d873dc2441d250b94fe9098e9e Author: Rohan Varma Date: Fri Aug 26 16:58:59 2022 +0000 Enforce explicit ProcessGroup passed into DefaultState (#84105) Would prefer to enforce that users pass in explicit PG into these state objects when using comm hooks with FSDP, so that it is clear and easy debugable over which processes communication is taking place. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84105 Approved by: https://github.com/mrshenli, https://github.com/zhaojuanmao commit 092fe71f33fe37b8d09499708230307aea028eaf Author: Driss Guessous Date: Sat Aug 27 03:00:53 2022 +0000 [Nested Tensor] detach (#84078) Add detach op for nested tensors. Nested tensors are not part of the composite explicit dispatch key set and therefore need to be added manually. The Detach test is failing only for the dtype=torch.float32, torch.float16 and device=cuda. The chain of ops that called are sum.backward() -> from_padded() -> unbind(). This populates the grad for a and b. Does this potentially indicated that cuda implementation for one of these ops, likely from_padded() is incorrect? Pull Request resolved: https://github.com/pytorch/pytorch/pull/84078 Approved by: https://github.com/albanD commit 43620b7e8d722d1b5c34cbda2619ccd9f92ca820 Author: Animesh Jain Date: Sat Aug 27 02:53:58 2022 +0000 [AOT Autograd] Redirect named_parameters to original mod (#84157) Helps in comparing accuracy. 
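A small, hypothetical sketch of that idea (not the actual AOT Autograd code): a wrapper module forwards `named_parameters` to the original module, so parameter names line up when comparing the two models' accuracy:
```python
import torch.nn as nn

class CompiledWrapper(nn.Module):
    """Stand-in for a compiled module that keeps the original module around."""
    def __init__(self, orig_mod):
        super().__init__()
        self._orig_mod = orig_mod

    def forward(self, *args, **kwargs):
        # A real implementation would call the compiled function here.
        return self._orig_mod(*args, **kwargs)

    def named_parameters(self, *args, **kwargs):
        # Redirect so names match the original ("weight" rather than "_orig_mod.weight").
        return self._orig_mod.named_parameters(*args, **kwargs)

orig = nn.Linear(4, 4)
wrapped = CompiledWrapper(orig)
assert dict(wrapped.named_parameters()).keys() == dict(orig.named_parameters()).keys()
```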
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84157 Approved by: https://github.com/Chillee commit c7edcd69683f6e3b08305ed0d4621e148fbfbe17 Author: PyTorch MergeBot Date: Sat Aug 27 01:23:17 2022 +0000 Revert "Don't introduce new overload for SymInt (#83628)" This reverts commit 9790d90e4b0288796ab44a6b4979db0a67580ba8. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487 commit 38e5e4a85f18c716ed84d12e6c7d5155ac582b65 Author: PyTorch MergeBot Date: Sat Aug 27 01:18:43 2022 +0000 Revert "[xla hash update] update the pinned xla hash (#84043)" This reverts commit ddedc294fbb4c13170811442b590a18e950dae67. Reverted https://github.com/pytorch/pytorch/pull/84043 on behalf of https://github.com/malfet due to Depends on https://github.com/pytorch/pytorch/pull/83628 commit bed85cce8b2e7c7430c1f3b5f7c8c765b779ec3e Author: Nikita Shulga Date: Sat Aug 27 01:10:27 2022 +0000 [Prim] Implement group_norm_backward (#84037) Test plan: CI, i.e. `python3 test_decomp.py -v -k test_comprehensive_nn_functional_group_norm` plus: ``` import torch func = torch.ops.aten.native_group_norm_backward.default decomp = torch._decomp.decomposition_table[func] for args in ( (torch.rand(1, 6, 3), torch.rand(1, 6, 3), torch.rand(1, 2), torch.rand(1, 2), torch.rand(6), 1, 6, 3, 2, [True, True, True]), (torch.rand(64, 768, 7, 7), torch.rand(64, 768, 7, 7), torch.rand(64, 1), torch.rand(64, 1), torch.rand(768), 64, 768, 49, 1, [True, True, True])): nrc=func(*args) drc=decomp(*args) for i in range(len(nrc)): print(i, torch.max(nrc[i]-drc[i])) print(all(torch.allclose(x, y) for (x, y) in zip(nrc, drc))) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84037 Approved by: https://github.com/Chillee, https://github.com/ngimel commit a423c966a780a1fdac6a29c6d2be2a0709de2cd5 Author: erjia Date: Fri Aug 26 20:55:44 2022 +0000 [DataPipe] Convert MapDataPipe.shuffle to IterDataPipe (#83202) Fixes: https://github.com/pytorch/data/issues/718 This is an alternative PR against https://github.com/pytorch/pytorch/pull/82974 This PR would change the behavior for both types to the same behavior as `IterDataPipe.shuffle` - Lazily generating seed per iteration - Each iterators has a new seed - Convert `MapDataPipe.shuffle` to an `IterDataPipe` This PR changes the return type of `MapDataPipe.shuffle` from a `MapDataPipe` to a `IterDataPipe`. Output as `MapDataPipe` ``` >>> from torch.utils.data import IterDataPipe, MapDataPipe >>> from torch.utils.data.datapipes.map import SequenceWrapper >>> dp = SequenceWrapper(list(range(10))).shuffle() >>> isinstance(dp, MapDataPipe) True >>> isinstance(dp, IterDataPipe) False ``` Output as `IterDataPipe` ``` >>> from torch.utils.data import IterDataPipe, MapDataPipe >>> from torch.utils.data.datapipes.map import SequenceWrapper >>> dp = SequenceWrapper(list(range(10))).shuffle() >>> isinstance(dp, MapDataPipe) False >>> isinstance(dp, IterDataPipe) True ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83202 Approved by: https://github.com/NivekT commit 69e9f905b7ddc0f453fa273746c9db5ed60bc71a Author: Taylor Robie Date: Fri Aug 26 12:49:11 2022 -0700 [Profiler] Add `disabled` and `global` methods to ProfilerConfig. (#83891) `ProfilerState::Disabled` and `ProfilerState::KINETO_ONDEMAND` have special semantics. The former is somewhat intuitive, but the degree of behavior branching on the latter (and why the branching is necessary) is less clear. 
By factoring the enum checks into methods, we can both clairify intent and future proof in case we ever add other global profiling contexts. Differential Revision: [D38917980](https://our.internmc.facebook.com/intern/diff/D38917980/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83891 Approved by: https://github.com/slgong-fb commit f4dc7b3a8a60bb1823f58154f1d041b489cbdf25 Author: Taylor Robie Date: Fri Aug 26 10:33:18 2022 -0700 [Profiler][Trivial] Cleanup ExperimentalConfig (#83890) I'm trying to limit how much is in headers to make it easier to read the API surface. In a similar vein, we can replace `hasOptions` with `operator bool` so it just does the right thing in the check. Differential Revision: [D38917366](https://our.internmc.facebook.com/intern/diff/D38917366/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83890 Approved by: https://github.com/slgong-fb commit eb2fa2e042b18ba35fa6eedb769c2efe411dbcfb Author: Angela Yi Date: Thu Aug 25 16:29:27 2022 -0700 [fx][pass] Fix type of exception (#84094) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84094 Approved by: https://github.com/SherlockNoMad commit aa4be48b58f5f22e15d1695b6332064c3c4d7074 Author: Jeff Daily Date: Fri Aug 26 21:48:06 2022 +0000 [Nested Tensor] do not use at::cuda::getDefaultCUDAStream() (#84134) Use at::cuda::getCurrentCUDAStream(), not getDefaultCUDAStream(). Otherwise, add/remove padding kernels won't sync with current stream, resulting in flaky unit tests in test_nestedtensor.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84134 Approved by: https://github.com/drisspg commit 82efb0e196c71a75985595fbbf294d8c816e9753 Author: Huy Do Date: Fri Aug 26 21:33:22 2022 +0000 Enable cache action for windows and other minor workflows (#84093) Following up on https://github.com/pytorch/pytorch/pull/84026, these are the rest of pip dependencies that I can find. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84093 Approved by: https://github.com/malfet commit 3fae89d4a468a02be501357eb123ce2bf7086d2f Author: Ian Graves Date: Fri Aug 26 21:04:04 2022 +0000 Read via FileAdapter when loading files in torch if not flatbuffer (#84028) Summary: This will optimize memory usage at the small cost of loading time when loading mobile models restoring the behavior before D36926217 (https://github.com/pytorch/pytorch/commit/fed12ff680813c0fab7dba7232f6b4cd8b33b8d3). Test Plan: Signals Reviewed By: qihqi Differential Revision: D38998858 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84028 Approved by: https://github.com/qihqi, https://github.com/cccclai commit e0f0c8e7b9acf6b821956acadbe79aaa0f6f0237 Author: erjia Date: Fri Aug 26 21:02:43 2022 +0000 Add support to traverse all python collection objects (#84079) Fixes https://github.com/pytorch/data/issues/752 This PR makes `traverse` function supporting more collections data structures from Python. Please let me know if anyone has a better idea about how to elegantly check if the object is a collection then we can dive into this object to see wether there is any DataPipe wrapped. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84079 Approved by: https://github.com/NivekT commit 6a3666282d000a0f196fbdd8b182bb4fd711f189 Author: BowenBao Date: Fri Aug 26 10:29:44 2022 -0700 [ONNX] Export node and value with scope name (#82040) Introduce `_jit_pass_onnx_assign_node_and_value_names` to parse and assign scoped name for nodes and values in exported onnx graph. 
Module layer information is obtained from `ONNXScopeName` captured in `scope` attribute in nodes. For nodes, the processed onnx node name are stored in attribute `onnx_name`. For values, the processed onnx output name are stored as `debugName`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82040 Approved by: https://github.com/AllenTiTaiWang, https://github.com/justinchuby, https://github.com/abock commit b5c2b0b2004c3e0c4b0850bd841f13e72d88e82f Author: Catherine Lee Date: Fri Aug 26 20:56:09 2022 +0000 make job pass even if monitoring script fails (#84068) makes github slightly less confusing to look at when a test fails Pull Request resolved: https://github.com/pytorch/pytorch/pull/84068 Approved by: https://github.com/huydhn, https://github.com/malfet commit 6a5860395619700633ab148b7bdbaed331eb67d5 Author: Animesh Jain Date: Fri Aug 26 20:49:43 2022 +0000 Update Dynamo pin (#83829) As title Pull Request resolved: https://github.com/pytorch/pytorch/pull/83829 Approved by: https://github.com/ezyang commit 61b9d8fccd3361f21e1f3548c2a9538b62cc7525 Author: Taylor Robie Date: Fri Aug 26 10:33:17 2022 -0700 [Profiler][Trivial] Add null handling to `AppendOnlyList::copy` memcpy path. (#83963) It is apparently undefined behavior to do pointer arithmetic on nullptr. In the case of AppendOnlyList, `next_` will only be null if `end_` is also null and thus the `memcpy` path will only be triggered if `n == 0`. Nonetheless, it is UB to `memcpy(0, 0, 0)` The extra null check is in a `C10_LIKELY` block so the extra cost should be negligible, and indeed after dusting off the component microbenchmarks there's no observable difference. Differential Revision: [D38969443](https://our.internmc.facebook.com/intern/diff/D38969443/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83963 Approved by: https://github.com/slgong-fb commit 014a333df37ca331d4ae969d200aece76b1d4536 Author: Taylor Robie Date: Fri Aug 26 10:33:15 2022 -0700 [Profiler][Minor] Extend Python bindings (#83622) Adding some fields which are needed for memory profiling. Differential Revision: [D38528382](https://our.internmc.facebook.com/intern/diff/D38528382/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83622 Approved by: https://github.com/Gamrix commit 681c38704e1efd16079ed79b9a92ba6d0a57db29 Author: Justin Chu Date: Thu Aug 25 20:28:23 2022 +0000 [ONNX] Clean up patch functions (#83136) Changes: - Move namespace handling from `_new_node` to `_graph_op` for clarity - Always require the `aten` namespace when creating aten ops. Remove the `aten` argument supplied in `_aten_op` for clarity - Rename the `_ATTR_PATTERN` global - Improve types - Update `_add_attribute` to raise ValueErrors Pull Request resolved: https://github.com/pytorch/pytorch/pull/83136 Approved by: https://github.com/BowenBao commit ec5b83f76847584013a9cd4177d389a408033614 Author: Shen Li Date: Fri Aug 26 16:36:41 2022 +0000 Make allreduce compatible with fx ProxyTensor (#84126) land after #83122 This PR explores solutions for 2 issues: 1. Collective comm ops are inplace ops, and does not return a tensor. With that, `make_fx` cannot include comm ops in the traced graph. The current solution is to make comm ops return a tuple of `(output_tensors, work_handle)`, so that [`proxy_call`](https://github.com/pytorch/pytorch/blob/90821aab100a436424113e2306eac63f5e247ee5/torch/fx/experimental/proxy_tensor.py#L170-L172) can handle that. 
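As a rough, hedged sketch of the tuple-return idea above (this is not the actual c10d change; `fake_allreduce` and its `work` placeholder are invented for illustration), a "collective" that returns its output tensors instead of mutating in place is something `make_fx` can record:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def fake_allreduce(t):
    # Stand-in for a comm op: return the "communicated" tensors plus a
    # placeholder for the work handle, rather than mutating and returning None.
    out = t * 2
    work = None  # placeholder for ProcessGroup::Work
    return [out], work

def fn(x):
    outs, _work = fake_allreduce(x)
    return outs[0].sum()

gm = make_fx(fn)(torch.randn(4))
print(gm.graph)  # the mul and sum show up as aten nodes in the traced graph
```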
It won't change the behavior of existing c10d Python/C++ APIs, so I directly added the code to `Ops.cpp`. 2. `make_fx` does not recognize `ProcessGroup::Work` and will ignore the `wait()` call on the work when tracing graph. However, this might break correctness, as when running the traced function, it could consume a tensor before it's ready. The current solution is to create a `CommTensor` tensor subclass to explicitly call `wait()`. In this PR, I am only doing this in the test, as we will need more discussion to see if we can add this to c10d Python implementations. kudos to @Chillee @wanchaol Pull Request resolved: https://github.com/pytorch/pytorch/pull/84126 Approved by: https://github.com/wanchaol commit f93446adc2b5b90e144d1b0a3e81269ab0c3401b Author: Shen Li Date: Fri Aug 26 13:42:37 2022 +0000 Update proxy_tensor.py to support List input/output (#83302) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83302 Approved by: https://github.com/Chillee commit 527a16016995c63dc7a7fcf74f18a75e2a96ff0e Author: Shen Li Date: Fri Aug 26 13:42:36 2022 +0000 Expose ProcessGroup::Work.wait() API to TorchScript (#83303) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83303 Approved by: https://github.com/rohan-varma commit c6348a7109796887d6497ed4c463537016003c39 Author: Adam J. Stewart Date: Fri Aug 26 18:58:25 2022 +0000 Add type hints to torch.save, torch.load (#83937) I'll probably need help with this one. I'm not sure what the full type signature for `map_location` should be. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83937 Approved by: https://github.com/malfet, https://github.com/albanD commit 582c0833d520fa802664eabeb689879b7e67dd2b Author: Catherine Lee Date: Fri Aug 26 18:48:46 2022 +0000 mac circleci workflows (#82780) Add mac and ios workflows to circleci so they can be run on pull m1 tests not included because circleci doesnt have machines Unsure how to get certain environment variables (specifically for arm64 ios builds that require env vars like `IOS_SIGN_KEY_2022` and `IOS_DEV_TEAM_ID` that are stored in the org-member context which is not accessible by everyone. doc regarding env vars https://docs.google.com/document/d/1J_3Z9sfu2vlHMF1fjdJfeTuxPXC6dgqJs7aU0KpYSBU/edit# Pull Request resolved: https://github.com/pytorch/pytorch/pull/82780 Approved by: https://github.com/malfet, https://github.com/huydhn commit e9dff858c3d9aa57d4ecca4410bfbcd996eaf8eb Author: samdow Date: Thu Aug 25 15:29:34 2022 -0400 [functorch] add lstsq batch rule (#82325) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82325 Approved by: https://github.com/zou3519 commit a08911400edb62c9caa0c94d1ce176cf8cb29765 Author: Peter Bell Date: Thu Aug 25 23:43:25 2022 +0100 Use C10_HAS_CPP_ATTRIBUTE to simplify nodiscard definition (#83976) `C10_HAS_CPP_ATTRIBUTE` only expands to `__has_cpp_attribute` when it is defined, so we avoid the extra `#if defined(__has_cpp_attribute)` checks and double-nested `#if`s Pull Request resolved: https://github.com/pytorch/pytorch/pull/83976 Approved by: https://github.com/albanD commit b429a17545be8418f8d5887ad302c9b8af031177 Author: Peter Bell Date: Thu Aug 25 23:43:25 2022 +0100 Enable -Wunused-local-typedefs (#83708) I recently had a PR reverted because it triggered an unused-local-typedefs warning, so disabling these in the CMake build is counter-productive. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708 Approved by: https://github.com/albanD commit 65ea3d062161f3aa5c8969b62ca322b0518300ae Author: kshitij12345 Date: Fri Aug 26 15:14:37 2022 +0000 [composite compliance] cov, corrcoef (#82954) Ref: #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82954 Approved by: https://github.com/zou3519 commit cddf96c4ba9425bd70782979901b78007760fef5 Author: lezcano Date: Fri Aug 26 00:03:54 2022 +0000 Fix preconditions of adaptive_avg_pooling2d (#84061) Before, if the input had dimension `4`, the channel had to be of dimension non zero. This was not what the errors advertised Pull Request resolved: https://github.com/pytorch/pytorch/pull/84061 Approved by: https://github.com/Chillee commit 9a236c7ab423a8893461b9d6f538d4aca02a086a Author: Horace He Date: Fri Aug 26 08:23:49 2022 +0000 Made some minor cleanups to decompositions (#83814) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83814 Approved by: https://github.com/ngimel commit ddedc294fbb4c13170811442b590a18e950dae67 Author: PyTorch MergeBot Date: Fri Aug 26 10:08:56 2022 +0000 [xla hash update] update the pinned xla hash (#84043) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84043 Approved by: https://github.com/pytorchbot commit d54fad5675138e9f2a0d504e6c7dee3cc099f342 Author: Sergii Dymchenko Date: Fri Aug 26 06:17:29 2022 +0000 Remove unreachable except block (#84070) This was introduced because two PRs tried to fix an issue concurently. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84070 Approved by: https://github.com/huydhn, https://github.com/janeyx99 commit f03ab28b971e8e0b11dda8bf49e85ff3be6fb97d Author: Sergii Dymchenko Date: Fri Aug 26 06:16:20 2022 +0000 Use an unused variable (#84073) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84073 Approved by: https://github.com/huydhn commit 993d8bb77e1ce79940705b1c7667dc9276f449df Author: Min Si Date: Fri Aug 26 05:45:59 2022 +0000 Use size to check same tensor sizes in reduce_scatter and allgather (#84099) Summary: Previous code uses tensor.numel() to check if all tensors have the same size in order to switch between reduce_scatter_v v.s. reduce_scatter, same applies to allgather. However, if the user input tensor is zero in the last dimension (e.g., [648632,0]), then numel() returns zero and check_same_numel is always true. This patch fixes the check to use size rather than numel, to cover the above case. Differential Revision: D39044439 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84099 Approved by: https://github.com/kwen2501 commit 089101fc82971aca874093e7504cf24b11462bcc Author: Christian Jauvin Date: Fri Aug 26 04:53:49 2022 +0000 Fix small typo in cuda.rst (#84012) This fixes a very minor typo in the CUDA semantics doc. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84012 Approved by: https://github.com/malfet commit 15b560a5c4d638c82e738f3496e2faf95fc328a5 Author: Gao, Xiang Date: Fri Aug 26 03:11:46 2022 +0000 Fix missing include for size_t (#84088) Fixes the following issue: ```C++ In file included from /home/gaoxiang/pytorch-ucc/c10/test/util/ConstexprCrc_test.cpp:1: In file included from /home/gaoxiang/pytorch-ucc/c10/util/ConstexprCrc.h:3: /home/gaoxiang/pytorch-ucc/c10/util/IdWrapper.h:42:10: error: unknown type name 'size_t'; did you mean 'std::size_t'? friend size_t hash_value(const concrete_type& v) { ^~~~~~ std::size_t /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/x86_64-pc-linux-gnu/bits/c++config.h:298:26: note: 'std::size_t' declared here typedef __SIZE_TYPE__ size_t; ^ 1 error generated. [111/2069] Generating /home/gaoxiang/pytorch-ucc/torch/csrc/a...ch-ucc/torch/testing/_internal/generated/annotated_fn_args.py ninja: build stopped: subcommand failed. ``` This error happens with my GCC 12.2.0 + Clang 14.0.6. Full environment: ``` Collecting environment information... PyTorch version: 1.13.0a0+git14a53e6 Is debug build: True CUDA used to build PyTorch: 11.7 ROCM used to build PyTorch: N/A OS: Arch Linux (x86_64) GCC version: (GCC) 12.2.0 Clang version: 14.0.6 CMake version: version 3.24.1 Libc version: glibc-2.36 Python version: 3.10.6 (main, Aug 3 2022, 17:39:45) [GCC 12.1.1 20220730] (64-bit runtime) Python platform: Linux-5.19.3-arch1-1-x86_64-with-glibc2.36 Is CUDA available: True CUDA runtime version: 11.7.99 GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090 GPU 1: NVIDIA GeForce RTX 2080 Ti Nvidia driver version: 515.65.01 cuDNN version: Probably one of the following: /usr/lib/libcudnn.so.8.4.1 /usr/lib/libcudnn_adv_infer.so.8.4.1 /usr/lib/libcudnn_adv_train.so.8.4.1 /usr/lib/libcudnn_cnn_infer.so.8.4.1 /usr/lib/libcudnn_cnn_train.so.8.4.1 /usr/lib/libcudnn_ops_infer.so.8.4.1 /usr/lib/libcudnn_ops_train.so.8.4.1 HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True Versions of relevant libraries: [pip3] numpy==1.23.1 [pip3] torch==1.13.0a0+gitbcc6f6c [pip3] torch-ucc==1.0.0 [pip3] torchani==2.2 [pip3] torchvision==0.2.2.post3 [conda] Could not collect ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84088 Approved by: https://github.com/ezyang commit 9790d90e4b0288796ab44a6b4979db0a67580ba8 Author: Edward Z. Yang Date: Thu Aug 25 18:33:45 2022 -0700 Don't introduce new overload for SymInt (#83628) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts. This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. 
This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type. Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. 
Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit d2f37401b85c9bdea342c4eb0f1d1f277ae93ed0 Author: Rohan Varma Date: Thu Aug 25 18:36:30 2022 +0000 Silence namedtuple warning in dist (#84072) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84072 Approved by: https://github.com/awgu commit b35e7c5da770ec5b5080cb69eb3618f5a3203e9c Author: Rohan Varma Date: Thu Aug 25 18:36:06 2022 +0000 Fix FSDP not all outputs used in loss (#83195) There are a couple issues / assumptions within FSDP today that this PR attempts to fix: - In wait_for_post_backward, we assume that if a param required grad, its post backward was called, but this is not true, i.e. if its output did not participate in grad computation, it would not have called post backward. To fix this we simply removed those assertions. - There is a deeper issue where in `_finalize_params`, we could end up assigning a grad of the sharded shape to an unsharded parameter gradient field, which would raise a shape error. This can happen for example if a parameter's usage transitions from used --> unused. In this case, when the parameter was used, it would have had a gradient, then user could have possibly called `zero_grad()` and p.grad would not be `None`. This in `_prep_grad_for_backward`, we would assign a `_saved_grad_shard` to this gradient field which would be the sharded shape. In `_finalize_param`, our parameter would be unsharded (since post_backward was not called), but we'd try to assign, raising the shape issue. This issue is fixed by checking `_post_backward_called`. If this is False, we simply skip the assignment because there is no new gradient to update. - A final issue as mentioned above is that if post_backward is not called, we never reshard the full param. 
This is fixed by checking if we haven't resharded (basically if post_backward_called == False), and if so, performing a reshard. A few things to note: - This logic may have to be revisited when non-recursive wrapping lands as there are multiple FlatParams per FSDP unit - This logic may not work when post_backward_hook fires but p.grad is None, i.e. the short-circuiting here: https://github.com/pytorch/pytorch/blob/f534b2c627da65bbee7ccc8f7e054da0ba48eb79/torch/distributed/fsdp/fully_sharded_data_parallel.py#L2884. As a quick fix, we could just move `_post_backward_called` flag change to after this, or just perform a reshard before returning early. I am not sure how to repro a case where p.grad == None but we call the post-backward hook, https://github.com/pytorch/pytorch/issues/83197 might be a possibility, but I think it is fine to not support this yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83195 Approved by: https://github.com/awgu commit 7e5c76da47beec83ba539fdb53d52e13d492cc4f Author: Sherlock Huang Date: Thu Aug 25 20:20:52 2022 +0000 Make graph_module.print_readable() discoverable (#83960) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83960 Approved by: https://github.com/ezyang commit a4a55f5ea6403472a25b12a6e9b8f4f713e664a3 Author: Xiang Gao Date: Thu Aug 25 21:33:15 2022 +0000 New TORCH_UCC_BLOCKING_WAIT env variable (#81791) Cherry-pick of https://github.com/facebookresearch/torch_ucc/pull/95. I recommend waiting until https://github.com/pytorch/pytorch/pull/81583 is merged first, so the CI is checking if this PR compiles correctly. Marking this as a draft for now, will change to "ready for review" once https://github.com/pytorch/pytorch/pull/81583 merged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81791 Approved by: https://github.com/kwen2501 commit 85f82f7311d33b0ed28fe4865c4ac24f96d6cbaa Author: migeedz Date: Thu Aug 25 11:23:00 2022 -0700 example program for paper intro (#83945) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83945 Approved by: https://github.com/jansel commit bf25a140f9d948915d52f294b5204196d1ca22e3 Author: Justin Chu Date: Thu Aug 25 21:24:35 2022 +0000 [ONNX] Add runtime type checking to `export` (#83673) This PR adds an internal wrapper on the [beartype](https://github.com/beartype/beartype) library to perform runtime type checking in `torch.onnx`. It uses beartype when it is found in the environment and is reduced to a no-op when beartype is not found. Setting the env var `TORCH_ONNX_EXPERIMENTAL_RUNTIME_TYPE_CHECK=ERRORS` will turn on the feature. setting `TORCH_ONNX_EXPERIMENTAL_RUNTIME_TYPE_CHECK=DISABLED` will disable all checks. When not set and `beartype` is installed, a warning message is emitted. Now when users call an api with invalid arguments e.g. 
```python torch.onnx.export(conv, y, path, export_params=True, training=False) ``` they get ``` Traceback (most recent call last): File "bisect_m1_error.py", line 63, in main() File "bisect_m1_error.py", line 59, in main reveal_error() File "bisect_m1_error.py", line 32, in reveal_error torch.onnx.export(conv, y, cpu_model_path, export_params=True, training=False) File "<@beartype(torch.onnx.utils.export) at 0x1281f5a60>", line 136, in export File "pytorch/venv/lib/python3.9/site-packages/beartype/_decor/_error/errormain.py", line 301, in raise_pep_call_exception raise exception_cls( # type: ignore[misc] beartype.roar.BeartypeCallHintParamViolation: @beartyped export() parameter training=False violates type hint , as False not instance of . ``` when `TORCH_ONNX_EXPERIMENTAL_RUNTIME_TYPE_CHECK` is not set and `beartype` is installed, a warning message is emitted. ``` >>> torch.onnx.export("foo", "bar", "f") :1: CallHintViolationWarning: Traceback (most recent call last): File "/home/justinchu/dev/pytorch/torch/onnx/_internal/_beartype.py", line 54, in _coerce_beartype_exceptions_to_warnings return beartyped(*args, **kwargs) File "<@beartype(torch.onnx.utils.export) at 0x7f1d4ab35280>", line 39, in export File "/home/justinchu/anaconda3/envs/pytorch/lib/python3.9/site-packages/beartype/_decor/_error/errormain.py", line 301, in raise_pep_call_exception raise exception_cls( # type: ignore[misc] beartype.roar.BeartypeCallHintParamViolation: @beartyped export() parameter model='foo' violates type hint typing.Union[torch.nn.modules.module.Module, torch.jit._script.ScriptModule, torch.jit.ScriptFunction], as 'foo' not , , or . Traceback (most recent call last): File "", line 1, in File "/home/justinchu/dev/pytorch/torch/onnx/_internal/_beartype.py", line 63, in _coerce_beartype_exceptions_to_warnings return func(*args, **kwargs) File "/home/justinchu/dev/pytorch/torch/onnx/utils.py", line 482, in export _export( File "/home/justinchu/dev/pytorch/torch/onnx/utils.py", line 1422, in _export with exporter_context(model, training, verbose): File "/home/justinchu/anaconda3/envs/pytorch/lib/python3.9/contextlib.py", line 119, in __enter__ return next(self.gen) File "/home/justinchu/dev/pytorch/torch/onnx/utils.py", line 177, in exporter_context with select_model_mode_for_export( File "/home/justinchu/anaconda3/envs/pytorch/lib/python3.9/contextlib.py", line 119, in __enter__ return next(self.gen) File "/home/justinchu/dev/pytorch/torch/onnx/utils.py", line 95, in select_model_mode_for_export originally_training = model.training AttributeError: 'str' object has no attribute 'training' ``` We see the error is caught right when the type mismatch happens, improving from what otherwise would become `AttributeError: 'str' object has no attribute 'training'` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83673 Approved by: https://github.com/BowenBao commit 562a021cf3c6468c1f86e34c5d46accf2aac8eab Author: Nikita Shulga Date: Thu Aug 25 21:13:09 2022 +0000 [GHF] Land validation should not change default branch (#84084) This prevents a loophole, where somebody submits a PR that modifies merge rules and request land validation, so that their PR will be validated against those rules, rather than ones currently in trunk. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84084 Approved by: https://github.com/janeyx99, https://github.com/kit1980 commit 9f4626ea1b5326843756d866d90d583fbca32616 Author: ssjia Date: Thu Aug 25 06:15:32 2022 -0700 [vulkan] use VMA at third-party (#83934) Remove the VMA checked in at `aten/src/ATen/native/vulkan/api/vk_mem_alloc.h`, and use the version checked into `fbsource/third_party` instead. Also change open source CMakeLists to look for VMA in third_party submodule directory. Note that I had to add an alternate VulkanMemoryAllocator target that uses `fb_xplat_cxx_library` instead of `oxx_static_library` to make it work with vulkan targets in `caffe2`. Before landing this diff, make sure https://github.com/pytorch/pytorch/pull/83906 is committed on open source, which adds VMA as a git submodule of pytorch. Differential Revision: [D38943217](https://our.internmc.facebook.com/intern/diff/D38943217/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38943217/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83934 Approved by: https://github.com/manuelcandales commit ced2ca8f867b376c5b4e495f183aeba78a27c0c4 Author: Michael Voznesensky Date: Thu Aug 25 20:11:53 2022 +0000 Torch cond operator, python dispatch, pyoperator (#83154) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83154 Approved by: https://github.com/ezyang commit 3c2a0780b8f4e717b8a754fdc3dde68193ccae6c Author: BowenBao Date: Tue Aug 23 17:53:29 2022 -0700 [ONNX] Assign ONNXScopeName during function substituion (#82039) Previously only traced IR graph stores module typename and variable name in `scope` in `node`. This change enables such `scope` info for IR graph generated by torch script. Torch script produced IR graphs emit nodes for module object and module method call. This structured graph is flattened in `function_substition` pass prior to other ONNX conversion passes. This PR extends `function_substition` pass to record the module typename and variable name info in `scope`, while inlining the graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82039 Approved by: https://github.com/justinchuby, https://github.com/abock commit 4c19981316eb4c343a935cbd8b06b4bd2b610c50 Author: erjia Date: Wed Aug 24 19:29:06 2022 +0000 [DataPipe] Reset Shuffler's iterator when NotStarted (#83535) This PR changes the behavior of `IterDataPipe` to always invoke `reset` for the state of `NotStarted`. The main reason is we normally put lazy initialization code into `reset` function. Even for the state of `NotStarted`, we should invoke `reset` to initialize those lazy variables. Otherwise, we have to manually determine if the state is `NotStarted` or `Iterating` in `__iter__` function and only manually invoke `reset` in the state of `NotStarted`. This PR also makes `Shuffler` is able to serialize with `buffer` and `rng_state`. The following part is removed: ~I am also add `_snapshot_state` into serialization state and during `__setstate__` only change the state to `Restored` if the original state is `Iterating`. Especially, for the case of deserializing/serializing `NotStarted` DataPipe (multiprocessing), we would invoke `set_seed` for `Shuffler`. 
We need the `DataPipe` remains as `NotStarted` to properly `reset`.~ I am listing all the expected behavior state transition below: - Initial state: `NotStarted` - `iter` -> Call `reset` and change the state to `Iterating` - serialize/deserialize -> Keep the state as `NotStarted` (will `reset` if `iter` is called afterwards) - Initial state: `Iterating` - `iter` -> Call `reset` and keep the state to `Iterating` - serialize/deserialize -> Change the state as `Restored` - Initial state: `Restored` - `iter` -> Only change the state to `Iterating` - serialize/deserialize -> Not allowed Pull Request resolved: https://github.com/pytorch/pytorch/pull/83535 Approved by: https://github.com/NivekT commit b82c74da07430ba4a221403d383eeb27de04f7f7 Author: Brian Hirsh Date: Thu Aug 25 08:45:33 2022 -0700 functionalization: support inplace views on inputs (#83993) A version of this PR was sitting at https://github.com/pytorch/pytorch/pull/82601 but that PR some other cleanup that relies on being able to use functorch in pytorch/pytorch CI tests, which isn't ready yet. I pulled the change out here to unblock functionalization for some models run with inductor (see https://github.com/pytorch/torchdynamo/issues/964#issuecomment-1225971788). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83993 Approved by: https://github.com/ezyang commit 0c6a616af0b14b9bbe190b93655edf24bac14cfd Author: Brian Hirsh Date: Thu Aug 25 08:45:32 2022 -0700 run functorch decomps after functionalization when enabled (#83992) This is a short-to-midterm fix for https://github.com/pytorch/pytorch/issues/83923. By running functionalization before decomps, we guarantee that functionalization won't have to see any primtorch view/inplace ops like `broadcast_in_dim`. This will only really be a problem if there's a function in the decomposition table that decomposes a functional op into mutations. If that comes up later, we'll need to revisit https://github.com/pytorch/pytorch/issues/83923. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83992 Approved by: https://github.com/ezyang commit caaa723ae24a904a76d3851b1b84de3ad128735a Author: Nikita Shulga Date: Thu Aug 25 18:21:44 2022 +0000 [GHF][BE] Move merge rules to yaml (#84065) To allow comments Update `trymerge.yaml`, `revert.yaml` and `tryrebase.yaml` to use v4 setup-python action and install pyyaml Reformat json to yaml by running: ``` python -c "import yaml;print(yaml.dump(yaml.safe_load(open('.github/merge_rules.yaml')), sort_keys=False))" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84065 Approved by: https://github.com/b0noI, https://github.com/huydhn commit 86e134ddf777d6f4b82a1860102698ef13bf0c33 Author: Nikolay Korovaiko Date: Thu Aug 25 17:28:23 2022 +0000 disable c10::SymIntNode tests on mobile (#84066) This fixes c++ tests' breaks where we were passing pointers and expected `is_symbolic` to return `true` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84066 Approved by: https://github.com/albanD commit 2f04ba2c7c8920418ad77ebb1ab09d93374e6578 Author: zaf Date: Thu Aug 25 01:53:51 2022 -0700 [quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. 
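As a minimal sketch of what this migration means for user code (assuming, as elsewhere in this migration stack, that the old import path is kept as a backwards-compatible alias):

```python
import torch.ao.nn.qat as nnqat          # new, canonical location
import torch.nn.qat as nnqat_legacy      # old location, assumed still importable
import torch.ao.quantization as tq

m = nnqat.Linear(8, 8, qconfig=tq.default_qat_qconfig)  # QAT Linear with fake-quant observers
print(type(m).__module__)  # resolves to the torch.ao.nn.qat module path
```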
The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat` - [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)! Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716 Approved by: https://github.com/jerryzh168 commit 29e83b6599e91ddc087540880f7c14cbe33df200 Author: zaf Date: Thu Aug 25 01:53:49 2022 -0700 [quant][ao_migration] `torch.nn.quantizable` → `torch.ao.nn.quantizable`. (#78717) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] [Current PR] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - `torch/ao/nn/__init__.py` → Changing the imports to lazy. Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861090/)! 
Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78717 Approved by: https://github.com/jerryzh168 commit b1455f9424227ec3a0ff7c6b41a73868b03c7967 Author: zaf Date: Thu Aug 25 01:53:48 2022 -0700 [quant][ao_migration] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` (#78715) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)! Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715 Approved by: https://github.com/jerryzh168 commit d32a762147343bbb9404ef995c979fe8f048b836 Author: zaf Date: Thu Aug 25 01:53:46 2022 -0700 [quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. 
The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 - [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo - [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a - [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)! Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714 Approved by: https://github.com/jerryzh168 commit c92e5ac95be6a1ccae505fb391bb02329b9a7413 Author: zaf Date: Thu Aug 25 01:53:44 2022 -0700 [quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. 
However, specific files need to be double checked: - Documentation @vkuzo - docs/source/conf.py - docs/source/quantization.rst - [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168 - [common test routine](test/quantization/ao_migration/common.py) @HDCharles - JIT stuff @jamesr66a - torch/csrc/jit/passes/hoist_conv_packed_params.cpp - torch/csrc/jit/passes/quantization/helper.h - torch/csrc/jit/serialization/import_source.cpp Differential Revision: [D38926012](https://our.internmc.facebook.com/intern/diff/D38926012/) Differential Revision: [D38926012](https://our.internmc.facebook.com/intern/diff/D38926012) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713 Approved by: https://github.com/jerryzh168 commit a83d7d8b654bc5169a1450f2e7a5fb2fc58f5ffe Author: Jianyu Huang Date: Thu Aug 25 16:47:02 2022 +0000 enable qlinear dynamic parallelization with fbgemm (#84033) Test Plan: CI Differential Revision: D39004891 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84033 Approved by: https://github.com/jerryzh168 commit e2f75d63d42cdf7940cc919e745daad02a961395 Author: Animesh Jain Date: Thu Aug 25 16:09:52 2022 +0000 Decomposition - batch_norm, save_mean and save_variance always float32 (#84013) AMP error shown here - https://github.com/pytorch/torchdynamo/issues/835 Test missing Pull Request resolved: https://github.com/pytorch/pytorch/pull/84013 Approved by: https://github.com/ezyang commit 56fef4e6ee4020de957c4888032e04a5721576cb Author: erjia Date: Thu Aug 25 16:05:14 2022 +0000 fix `NoneType` object has no attribute `python_exit_status` (#83985) Fixes #83791 Prevents the Error when `_utils` has been cleared by Python before `__del__` is invoked. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83985 Approved by: https://github.com/NivekT commit 00cb184512f3a636d87793f46d3f9c7fea406b25 Author: Richard Zou Date: Wed Aug 24 13:51:12 2022 -0700 [functorch] add batching rule for fill_.Tensor (#84015) I think this is what the theseus folks ran into, but will confirm with them later. Test Plan: - new manual test; the OpInfo for fill_ isn't sufficient and it is difficult to modify Pull Request resolved: https://github.com/pytorch/pytorch/pull/84015 Approved by: https://github.com/Chillee commit 31f151767b2fde8ba2b73e0fe2b9bd68f284673b Author: XiaobingSuper Date: Thu Aug 25 10:03:19 2022 +0000 add qscheme check for quantization observer (#80126) Motivation: each quantization observer only supports a limit qschemes, we need to do this check at the initiation step, rather than at the running step, such as MinMaxObserver with set qscheme with **torch.per_channel_affine**, there will have a runtime error at the running the calibration step: ``` AttributeError: 'MinMaxObserver' object has no attribute 'ch_axis' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/80126 Approved by: https://github.com/jerryzh168 commit f5a3515083cce8335913e1985b54d5e6ead95498 Author: Mario Lezcano Date: Thu Aug 25 04:24:44 2022 -0500 Make linalg.inv composite of linalg.solve (#80074) The `getri` kernel calls inside `getrs` so we can do so explicitly ourselves and save ourselves from having to maintain an extra kernel. This way we just need to optimise `lu_factor` and `lu_solve` and `inv` will be as efficient as it can be, as it'll be choosing the best backend to perform the factorisation and the best backend (not necessarily the same) to perform the solve. 
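A hedged sketch of the idea (not the actual ATen implementation): once `inv` is just `solve` against the identity, it automatically inherits whichever factorisation and solve backends `linalg.solve` picks.

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64) + 4 * torch.eye(4, dtype=torch.float64)

# inv(A) == solve(A, I): one LU factorisation followed by a multi-RHS solve.
inv_via_solve = torch.linalg.solve(A, torch.eye(4, dtype=torch.float64))

print(torch.allclose(inv_via_solve, torch.linalg.inv(A)))  # True
```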
Fixes https://github.com/pytorch/pytorch/issues/77498 The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet commit e3c89d07789d632b135862b87a5cbd4cfce7f53a Author: Horace He Date: Thu Aug 25 06:59:37 2022 +0000 Disable autocast cache during aotdispatch (#84035) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84035 Approved by: https://github.com/jansel commit 63cbdc92a750a667ffdcfbdac563d02db6fd9559 Author: Nikolay Korovaiko Date: Thu Aug 25 08:28:38 2022 +0000 switching the exact check to isinstance check (#84023) Simplifying a type check if an object is a SymIntNode in `is_symint_node` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84023 Approved by: https://github.com/ezyang commit 02c3781332031981988cd0cadfd573a210210b33 Author: Huy Do Date: Thu Aug 25 07:28:50 2022 +0000 Enable cache action for lint workflow (#84026) Cache all python dependencies using [GHA cache](https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows). I'm doing this for lint workflow first and will slowly roll it out to other workflows. Before caching, pip cache is not found. Dependencies installation continues as usual: ![Screen Shot 2022-08-24 at 16 36 15](https://user-images.githubusercontent.com/475357/186543554-9d7f5978-2c2d-4362-9535-c3b17e922da1.png) After caching https://github.com/pytorch/pytorch/runs/8006214772?check_suite_focus=true. The long hash at the end of the cache key is the hash of requirements files ![Screen Shot 2022-08-24 at 16 51 51](https://user-images.githubusercontent.com/475357/186543825-055ea025-3d42-42fc-877d-baec358de0ed.png) Note that the cache is in the runners themselves. This should be a transparent process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84026 Approved by: https://github.com/seemethere, https://github.com/suo, https://github.com/malfet commit c00f0c80c0d9e4b8dae9ff6493f963315abc777c Author: Alex Beloi Date: Thu Aug 25 07:10:30 2022 +0000 [fx] add deferred weights (xl_weight) and tracing for xl_embedding_bag (#84016) Test Plan: added unit tests Reviewed By: jfix71 Differential Revision: D36152238 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84016 Approved by: https://github.com/jfix71 commit 8b8942b11464bbe042b731bc332d65297161353a Author: Horace He Date: Thu Aug 25 01:53:33 2022 +0000 Fix dumb make_fx issue (#84011) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84011 Approved by: https://github.com/ezyang commit c03f8abb21d7848f83162a82a42ffdd219b668e3 Author: Mandar Deshpande Date: Thu Aug 25 06:23:32 2022 +0000 [fx+scripting] Adding num_iter_1 and num_iter_2 params LearningRate op (#83691) Summary: Adding num_iter_1 and num_iter_2 to learning rate op Test Plan: Exisiting unit tests Differential Revision: D38762710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83691 Approved by: https://github.com/qxy11 commit 2000eba4547f885dc937c4335bee4ba1a71b4df5 Author: Peter Bell Date: Thu Aug 25 00:57:57 2022 +0100 NCCL: Re-enable parallel builds (#83696) Since #83173 was merged I have noticed some CI being slowed down by the nccl building step. e.g. if there are no C++ changes then sccache compiles everything else very quickly and nccl becomes the limiting factor. 
This re-enables parallel builds with some safeguards to protect against oversubscription. When `make` is the parent build system, we can use `$(MAKE)` and the `make` jobserver will coordinate job allocation with the sub-process. For other build systems, this calls `make` with the `-l` flag which should prevent it launching jobs when the system load average is already too high. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696 Approved by: https://github.com/malfet commit d5af2a70ba47f604193e137f3d05d8ffee4ed7f0 Author: PyTorch MergeBot Date: Thu Aug 25 05:09:12 2022 +0000 Revert "[TorchTidy] Adding support for unique tensor identifiers (#80266)" This reverts commit b6ba41921daf6365a762562641bfd846437c8529. Reverted https://github.com/pytorch/pytorch/pull/80266 on behalf of https://github.com/malfet due to Broke number of trunk jobs, see https://hud.pytorch.org/pytorch/pytorch/commit/b6ba41921daf6365a762562641bfd846437c8529 commit 1f61c39ac43d8cfccbe345ab42924ab739c4c1a8 Author: PyTorch MergeBot Date: Thu Aug 25 05:01:37 2022 +0000 Revert "Support NCCL Premul Sum (#81272)" This reverts commit 432c508e71111f9d5382322e0e6b1bc1c66bf0ec. Reverted https://github.com/pytorch/pytorch/pull/81272 on behalf of https://github.com/weiwangmeta due to breaking internal builds commit 460636ab9402b759c28c2bb1adb6b2ab12c0e773 Author: Andrew Gallagher Date: Thu Aug 25 04:14:09 2022 +0000 [caffe2] Remove last clang-for-cuda sources (#84021) Summary: We're no longer pursuing clang-for-cuda, so remove the last use-case. Test Plan: CI Reviewed By: pallab-zz Differential Revision: D38996710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84021 Approved by: https://github.com/malfet commit a013597b32d7c14b76e1d18214ec77770196fe0d Author: XiaobingSuper Date: Thu Aug 25 03:58:11 2022 +0000 fix oneDNN channels_last path issue (#83653) Fix #82060(N>1 will call in OneDNN path) and #80837, those two issues are introduced by the definition of channels last is different between PyTorch FW side with ideep side, this PR will fix this gap which ideep will use the format flag given by FW side. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83653 Approved by: https://github.com/mingfeima, https://github.com/malfet commit b6ba41921daf6365a762562641bfd846437c8529 Author: John Clow Date: Wed Aug 24 16:58:09 2022 -0700 [TorchTidy] Adding support for unique tensor identifiers (#80266) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80266 Approved by: https://github.com/robieta commit b21a6ff6397b74c148c12e4fc41ef12b382443e2 Author: jjsjann123 Date: Tue Aug 23 23:22:37 2022 +0000 [NVFuser] Upstream push 0811 (#83239) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Code changes includes: - codegen improvements: 1. double support in expression evaluator - bug fixes: 1. dropout fix - rework RNG to support broadcasted dropout (Fixes #82784) 2. expand fix - Patch expand+reduction, expand+view, rework view analysis and guard - scheduler: 1. manual transpose schedule example 2. 
WIP transpose scheduler Commits that's in this PR from the devel branch: ``` b7435afcd22c917713c2f41a7237bc26e1183f14 Transpose scheduler, step 1 (#1854) 8a45dbf72034684eb8e18b1835b533e90b68f184 Add an example on how to manually schedule transpose (#1889) 83dbf56a9554b2efbd5416461d938fff477b0b27 Patch dropout fix (#1898) 69d3519a532250719b1aa8341b50e067b181b42d Expand+Reduction, Expand+View support, rework View analysis and guards (#1883) 15091c488e96343bdc49e3990acbf238a3b3da51 Rework RNG to correctly support broadcasted dropout (#1888) aafe2d048aaac596e503596a41303423619f3954 Make ExpressionEvaluator support Double (#1885) ``` RUN_TORCHBENCH: nvfuser Differential Revision: [D38657074](https://our.internmc.facebook.com/intern/diff/D38657074) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83239 Approved by: https://github.com/davidberard98 commit e90db1756585250096b0ea8e9ca31ad4fd007809 Author: atalman Date: Thu Aug 25 01:08:25 2022 +0000 Increase timeout for linux binary builds (#84008) Increase timeout for linux binary builds This mitigates conda build issue: https://github.com/pytorch/pytorch/issues/84003 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84008 Approved by: https://github.com/malfet commit 6b597595b2fb54dcc63e25169d58a2c4602306c1 Author: Weiwen Xia Date: Thu Aug 25 01:07:18 2022 +0000 [Quant] Vectorize scalar remainder in quantized kernel for normalization (#79673) This PR improves performance of quantized kernel for normalize by vectorizing scalar remainder. In the current implementation [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp), the computation is vectorized while the scalar remainder is handled in a `for` loop. The remainder is also vectorized to improve performance in this PR. This kernel is for contiguous (NCHW) memory layout. For channels-last memory layout, a fast path is added in this PR https://github.com/pytorch/pytorch/pull/70520 The improvement is beneficial for layer norm, group norm and instance norm as this kernel is used for them. 1. Add an argument `size` to `Vectorized::loadu()` for vec256_qint and vec512_qint. 2. Load the remainder with the new `loadu` and do computation in the similar way as for vectorized part. Run quantized group norm with group = 2. 
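The measurement harness could look roughly like the sketch below (the exact shapes, warmup and active counts are spelled out right after this); `run_quantized_group_norm` is a stand-in whose actual kernel call is elided:

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

def run_quantized_group_norm():
    # Stand-in for the benchmarked op: quantize an input of one of the shapes
    # listed below; the call into the quantized group-norm kernel is elided.
    x = torch.randn(1, 2, 8, 5)
    return torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

with profile(activities=[ProfilerActivity.CPU],
             schedule=schedule(wait=0, warmup=20, active=200)) as prof:
    for _ in range(220):
        run_quantized_group_norm()
        prof.step()

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```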
Op CPU time measured by `torch.profiler.profile` with warmup = 20, active = 200
- Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
- OS: CentOS Linux 7 (Core) (x86_64)
- Python version: 3.7.10
- Use JeMalloc memory allocator
- MALLOC_CONF=oversize_threshold:1,background_thread:true,metadata_thp:auto
- Using Intel OpenMP
- KMP_AFFINITY=granularity=fine,compact,1,0
- KMP_BLOCKTIME=1

**Environment**
- GCC version: (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
- AVX2 enabled, AVX512 disabled, i.e., vec256 used

**Run a single instance on a single core**

Shape | New impl (us) | Old impl (us) | Fp32 (us) | New/old | New/fp32 | Comments
-- | -- | -- | -- | -- | -- | --
(1, 2, 8, 5) | 3.73 | 3.75 | 4.51 | 99.41% | 82.75% | Remainder size = 8
(1, 2, 8, 6) | 3.76 | 4.00 | 4.53 | 93.93% | 82.95% | Remainder size = 16
(1, 2, 8, 7) | 3.74 | 4.01 | 4.52 | 93.34% | 82.84% | Remainder size = 24
(1, 2, 8, 8) | 3.90 | 3.96 | 4.49 | 98.49% | 87.00% | No remainder
(1, 2, 8, 17) | 4.00 | 4.17 | 4.72 | 95.83% | 84.69% | Remainder size = 8
(1, 2, 8, 18) | 4.00 | 4.23 | 4.72 | 94.54% | 84.89% | Remainder size = 16
(1, 2, 8, 19) | 4.03 | 4.29 | 4.76 | 94.01% | 84.70% | Remainder size = 24
(1, 2, 8, 20) | 3.92 | 3.93 | 4.76 | 99.67% | 82.29% | No remainder
(1, 2, 8, 33) | 4.10 | 4.18 | 5.06 | 97.92% | 81.00% | Remainder size = 8
(1, 2, 8, 34) | 4.07 | 4.23 | 5.06 | 96.40% | 80.53% | Remainder size = 16
(1, 2, 8, 35) | 4.11 | 4.42 | 5.09 | 93.03% | 80.72% | Remainder size = 24
(1, 2, 8, 36) | 4.03 | 4.06 | 5.11 | 99.24% | 78.83% | No remainder

![image](https://user-images.githubusercontent.com/12522207/173979129-e393e13f-71f5-4987-95ea-ac6e0c895bd7.png)

**Run a single instance on two cores**

Shape | New impl (us) | Old impl (us) | Fp32 (us) | New/old | New/fp32 | Comments
-- | -- | -- | -- | -- | -- | --
(1, 4, 8, 5) | 5.09 | 5.24 | 5.52 | 97.17% | 92.29% | Remainder size = 8
(1, 4, 8, 6) | 5.22 | 5.50 | 5.56 | 94.95% | 93.86% | Remainder size = 16
(1, 4, 8, 7) | 5.04 | 5.60 | 5.51 | 89.97% | 91.44% | Remainder size = 24
(1, 4, 8, 8) | 5.30 | 5.29 | 5.56 | 100.23% | 95.27% | No remainder
(1, 4, 8, 17) | 5.36 | 5.56 | 6.05 | 96.53% | 88.69% | Remainder size = 8
(1, 4, 8, 18) | 5.48 | 5.71 | 6.25 | 95.99% | 87.67% | Remainder size = 16
(1, 4, 8, 19) | 5.44 | 5.81 | 6.25 | 93.65% | 87.11% | Remainder size = 24
(1, 4, 8, 20) | 5.43 | 5.34 | 6.07 | 101.76% | 89.43% | No remainder
(1, 4, 8, 33) | 5.52 | 5.58 | 6.51 | 98.89% | 84.75% | Remainder size = 8
(1, 4, 8, 34) | 5.50 | 5.71 | 6.63 | 96.22% | 82.95% | Remainder size = 16
(1, 4, 8, 35) | 5.50 | 6.16 | 6.40 | 89.33% | 85.95% | Remainder size = 24
(1, 4, 8, 36) | 5.37 | 5.48 | 6.54 | 97.94% | 81.98% | No remainder

![image](https://user-images.githubusercontent.com/12522207/173981377-6222e278-0948-4f52-809b-28899399ca65.png)

**Environment**
- GCC version: (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
- AVX512 enabled, i.e., vec512 used

**Run a single instance on a single core**

Shape | New impl (us) | Old impl (us) | Fp32 (us) | New/old | New/fp32 | Comments
-- | -- | -- | -- | -- | -- | --
(1, 2, 16, 5) | 3.66 | 3.94 | 4.52 | 92.79% | 80.93% | Remainder size = 16
(1, 2, 16, 6) | 3.77 | 4.28 | 4.60 | 88.15% | 81.90% | Remainder size = 32
(1, 2, 16, 7) | 3.85 | 4.41 | 4.57 | 87.36% | 84.20% | Remainder size = 48
(1, 2, 16, 8) | 3.70 | 3.76 | 4.62 | 98.62% | 80.10% | No remainder
(1, 2, 16, 17) | 3.91 | 4.06 | 4.97 | 96.43% | 78.71% | Remainder size = 16
(1, 2, 16, 18) | 3.82 | 4.34 | 5.01 | 88.19% | 76.30% | Remainder size = 32
(1, 2, 16, 19) | 3.86 | 4.56 | 5.05 | 84.63% | 76.28% | Remainder size = 48
(1, 2, 16, 20) | 3.80 | 3.87 | 5.08 | 98.14% | 74.73% | No remainder
(1, 2, 16, 33) | 3.89 | 4.23 | 5.65 | 91.94% | 68.85% | Remainder size = 16
(1, 2, 16, 34) | 3.91 | 4.46 | 5.70 | 87.68% | 68.61% | Remainder size = 32
(1, 2, 16, 35) | 4.04 | 4.68 | 5.72 | 86.44% | 70.64% | Remainder size = 48
(1, 2, 16, 36) | 4.00 | 3.99 | 5.71 | 100.28% | 69.96% | No remainder

![image](https://user-images.githubusercontent.com/12522207/173982490-4687c5bc-50e8-49aa-9fe2-7967c738dbfb.png)

**Run a single instance on two cores**

Shape | New impl (us) | Old impl (us) | Fp32 (us) | New/old | New/fp32 | Comments
-- | -- | -- | -- | -- | -- | --
(1, 4, 16, 5) | 5.43 | 5.53 | 5.92 | 98.12% | 91.60% | Remainder size = 16
(1, 4, 16, 6) | 5.35 | 5.85 | 6.05 | 91.53% | 88.54% | Remainder size = 32
(1, 4, 16, 7) | 5.31 | 6.04 | 6.18 | 87.97% | 85.93% | Remainder size = 48
(1, 4, 16, 8) | 5.30 | 5.27 | 6.30 | 100.66% | 84.16% | No remainder
(1, 4, 16, 17) | 5.47 | 5.67 | 6.48 | 96.51% | 84.45% | Remainder size = 16
(1, 4, 16, 18) | 5.53 | 5.86 | 6.59 | 94.28% | 83.78% | Remainder size = 32
(1, 4, 16, 19) | 5.48 | 6.13 | 6.57 | 89.39% | 83.38% | Remainder size = 48
(1, 4, 16, 20) | 5.35 | 5.31 | 6.95 | 100.79% | 76.91% | No remainder
(1, 4, 16, 33) | 5.62 | 5.77 | 7.31 | 97.28% | 76.80% | Remainder size = 16
(1, 4, 16, 34) | 5.56 | 5.85 | 7.06 | 95.03% | 78.71% | Remainder size = 32
(1, 4, 16, 35) | 5.67 | 6.10 | 7.09 | 93.03% | 79.98% | Remainder size = 48
(1, 4, 16, 36) | 5.50 | 5.39 | 7.20 | 102.15% | 76.42% | No remainder

![image](https://user-images.githubusercontent.com/12522207/173982748-5f003630-18a4-4c3d-a643-b8711892cc39.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79673 Approved by: https://github.com/jerryzh168 commit a7edf713608806f10e17eab90d0a5df727d9a16e Author: PyTorch MergeBot Date: Thu Aug 25 00:49:40 2022 +0000 Revert "Don't introduce new overload for SymInt (#83628)" This reverts commit 8fae7027b399e65e6071d335aa874497682c84d0. Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222 commit 7a02ee55dbf46d3d85d389cf013e1d97f79c7100 Author: PyTorch MergeBot Date: Thu Aug 25 00:45:05 2022 +0000 Revert "[xla hash update] update the pinned xla hash (#83967)" This reverts commit ce7a9f92e30b93ab6efff4135be005c9afd0533a. Reverted https://github.com/pytorch/pytorch/pull/83967 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628 commit 5321bf52f2791932ec5c1ea0eb3a1b585bfedba7 Author: PyTorch MergeBot Date: Thu Aug 25 00:43:00 2022 +0000 Revert "Make linalg.inv composite of linalg.solve (#80074)" This reverts commit 4737b3361479f4104efaa3bfa2ea517eaacb60fb.
Reverted https://github.com/pytorch/pytorch/pull/80074 on behalf of https://github.com/malfet due to Depends on the changes from https://github.com/pytorch/pytorch/pull/83628 commit 4a6726a84073c98a773ae846d01dc63c73302c82 Author: Catherine Lee Date: Thu Aug 25 00:34:23 2022 +0000 use condensed disabled tests file (#84017) follow up to https://github.com/pytorch/test-infra/pull/545 then we can get rid of the non condensed version Pull Request resolved: https://github.com/pytorch/pytorch/pull/84017 Approved by: https://github.com/huydhn, https://github.com/janeyx99 commit cef522a8a9eec5355f5db528142231ab1176643c Author: ProGamerGov Date: Wed Aug 24 23:41:09 2022 +0000 Add docstring type guidelines for list & tuple to `CONTRIBUTING.md` (#83634) Minor followup to: https://github.com/pytorch/pytorch/pull/83536 For Google style docstrings, `list` and `tuple` should be completely lowercase. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83634 Approved by: https://github.com/ngimel commit 101709f43b501bc7dc862431fffa822a38852b3e Author: chengscott <60510scott@gmail.com> Date: Wed Aug 24 23:04:03 2022 +0000 Add comments for block_reduce.cuh (#83825) ~~Add warning for the BlockReduce result Remove redundant __syncthreads~~ Add comments for BlockReduce Pull Request resolved: https://github.com/pytorch/pytorch/pull/83825 Approved by: https://github.com/ngimel commit bf8d5e83289a35e64a0ce98ef36fd45ab2dd0d43 Author: Sherlock Huang Date: Wed Aug 24 17:34:44 2022 +0000 Pretty print stack trace with gm.print_readable() (#83706) Precondition: https://github.com/pytorch/torchdynamo/pull/899 Given following function ``` def my_relu(a): return a.relu() def func(a, b): d = torch.square(a + b) e = my_relu(d) f = d.sin() s = torch.stack([e, f]) s = s.sum() ``` Here are the possible result with various tracing frontend: dynamo, symbolic_trace, make_fx - joint graph with torchdynamo.optimize("aot_nop") Notice that it has a special stack for gradient addition node (for multiple uses of tensor) in backward Notice that "No stacktrace found for following nodes" are shown for nodes with stacktrace ``` def forward(self, primals, tangents): primals_1, primals_2, tangents_1, = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec) add_tensor = torch.ops.aten.add.Tensor(primals_1, primals_2); primals_1 = primals_2 = None pow_tensor_scalar = torch.ops.aten.pow.Tensor_Scalar(add_tensor, 2) relu_default = torch.ops.aten.relu.default(pow_tensor_scalar) detach_default = torch.ops.aten.detach.default(relu_default) sin_default = torch.ops.aten.sin.default(pow_tensor_scalar) stack_default = torch.ops.aten.stack.default([relu_default, sin_default]); relu_default = sin_default = None sum_default = torch.ops.aten.sum.default(stack_default); stack_default = None is_same_size_default = torch.ops.aten.is_same_size.default(sum_default, tangents_1) expand_default = torch.ops.aten.expand.default(tangents_1, [2, 10, 10]); tangents_1 = None unbind_int = torch.ops.aten.unbind.int(expand_default); expand_default = None getitem = unbind_int[0] getitem_1 = unbind_int[1]; unbind_int = None cos_default = torch.ops.aten.cos.default(pow_tensor_scalar); pow_tensor_scalar = None mul_tensor = torch.ops.aten.mul.Tensor(getitem_1, cos_default); getitem_1 = cos_default = None detach_default_1 = torch.ops.aten.detach.default(detach_default); detach_default = None threshold_backward_default = torch.ops.aten.threshold_backward.default(getitem, detach_default_1, 0); getitem = detach_default_1 = None add_tensor_1 = 
torch.ops.aten.add.Tensor(mul_tensor, threshold_backward_default); mul_tensor = threshold_backward_default = None pow_tensor_scalar_1 = torch.ops.aten.pow.Tensor_Scalar(add_tensor, 1.0); add_tensor = None mul_scalar = torch.ops.aten.mul.Scalar(pow_tensor_scalar_1, 2.0); pow_tensor_scalar_1 = None mul_tensor_1 = torch.ops.aten.mul.Tensor(add_tensor_1, mul_scalar); add_tensor_1 = mul_scalar = None sum_sym_int = torch.ops.aten.sum.SymInt(mul_tensor_1, [0], True) view_sym_int = torch.ops.aten.view.SymInt(sum_sym_int, [10]); sum_sym_int = None return pytree.tree_unflatten([sum_default, mul_tensor_1, view_sym_int], self._out_spec) ``` - default symbolic_trace Notice that nodes without stacktrace are folded under same region ``` def forward(self, a, b): add = a + b; a = b = None square = torch.square(add); add = None relu = square.relu() sin = square.sin(); square = None stack = torch.stack([relu, sin]); relu = sin = None sum_1 = stack.sum(); stack = None return sum_1 ``` - symbolic_trace with record_stack_traces=True ``` def forward(self, a, b): add = a + b; a = b = None square = torch.square(add); add = None relu = square.relu() sin = square.sin(); square = None stack = torch.stack([relu, sin]); relu = sin = None sum_1 = stack.sum(); stack = None return sum_1 ``` - make_fx without decomposition ``` def forward(self, a_1, b_1): add_tensor = torch.ops.aten.add.Tensor(a_1, b_1); a_1 = b_1 = None pow_tensor_scalar = torch.ops.aten.pow.Tensor_Scalar(add_tensor, 2); add_tensor = None relu_default = torch.ops.aten.relu.default(pow_tensor_scalar) detach_default = torch.ops.aten.detach.default(relu_default) sin_default = torch.ops.aten.sin.default(pow_tensor_scalar); pow_tensor_scalar = None stack_default = torch.ops.aten.stack.default([relu_default, sin_default]); relu_default = sin_default = None sum_default = torch.ops.aten.sum.default(stack_default); stack_default = None return sum_default ``` - make_fx with decomposition to prims ``` def forward(self, a_1, b_1): broadcast_in_dim_default = torch.ops.prims.broadcast_in_dim.default(b_1, [10, 10], [1]); b_1 = None add_default = torch.ops.prims.add.default(a_1, broadcast_in_dim_default); a_1 = broadcast_in_dim_default = None mul_default = torch.ops.prims.mul.default(add_default, add_default); add_default = None le_default = torch.ops.prims.le.default(mul_default, 0.0) where_default = torch.ops.prims.where.default(le_default, 0.0, mul_default); le_default = None sin_default = torch.ops.prims.sin.default(mul_default); mul_default = None cat_default = torch.ops.prims.cat.default([where_default, sin_default], 0); where_default = sin_default = None split_dim_default = torch.ops.prims.split_dim.default(cat_default, 0, 2); cat_default = None convert_element_type_default = torch.ops.prims.convert_element_type.default(split_dim_default, torch.float32); split_dim_default = None sum_default = torch.ops.prims.sum.default(convert_element_type_default, [0, 1, 2]); convert_element_type_default = None return sum_default ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83706 Approved by: https://github.com/Chillee, https://github.com/ezyang commit e72256604f80e66b6a380479e4b610be19c82e71 Author: Chen, Jian Ping Date: Wed Aug 24 22:42:53 2022 +0000 Enhance add_out_dense_sparse_cpu for hybrid sparse tensor (#23057) This is to improve the performance for hybrid sparse coo tensor on CPU path. This case is appeared at the DLRM terabyte test. 
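For context, a hybrid sparse COO tensor (sparse indices plus trailing dense dimensions) added to a strided tensor looks roughly like the snippet below; this is only an illustration of the case that CPU path handles, not code from the PR:

```python
import torch

# Hybrid sparse COO tensor: 1 sparse dim + 1 dense dim, overall shape (4, 3).
indices = torch.tensor([[0, 2]])              # nnz = 2 along the sparse dimension
values = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])         # each nnz carries a dense row of size 3
sp = torch.sparse_coo_tensor(indices, values, size=(4, 3))

dense = torch.ones(4, 3)
out = dense + sp                              # dense + sparse addition on CPU
assert torch.equal(out, dense + sp.to_dense())
```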
With this fix, according to the previous performance test data, it got a ~10x performance improvement on DLRM execution.
Without this fix, DLRM runs as:
Finished training it 100/1000 of epoch 0, 2969.25 ms/it, loss 0.220505, accuracy 0.000 %
With this fix, DLRM runs as:
Finished training it 100/1000 of epoch 0, 270.71 ms/it, loss 0.220505, accuracy 0.000 %
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23057 Approved by: https://github.com/VitalyFedyunin, https://github.com/malfet commit 3b11b80fc3f9f9a0171abb5eb2299835feba8b04 Author: Bin Chen Date: Wed Aug 24 22:16:12 2022 +0000 Named pipe based watchdog timer (#83695) Summary: This diff implements a named pipe based watchdog timer (`FileTimerClient` and `FileTimerServer`). This is similar to the existing `LocalTimerClient` and `LocalTimerServer` (https://fburl.com/code/j4b9pyya). The motivation comes from the need to handle various timeout issues: the training process occasionally gets stuck, and we need a proper watchdog to monitor the liveness of the training processes. This timer allows the TorchElastic agent (as the watchdog) to monitor the progress of the training processes that it spawned. If a timeout occurs, the TorchElastic agent can take action to kill the stuck process and create a core dump for it. `LocalTimerClient` and `LocalTimerServer` require a `multiprocessing.Queue()` to work, so they can only be used between `multiprocessing` parent and child processes. `FileTimerClient` and `FileTimerServer` do not have this limitation. Test Plan:
```
buck test mode/opt caffe2/test/distributed/elastic/timer:file_based_timer_test
```
```
RemoteExecution session id: reSessionID-06d70a77-043c-4d9d-b0f2-94c24460740a-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425186732666
✓ ListingSuccess: caffe2/test/distributed/elastic/timer:file_based_timer_test : 12 tests discovered (2.177)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_happy_path (file_based_local_timer_test.FileTimerTest) (2.463)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_expired_timers (file_based_local_timer_test.FileTimerServerTest) (1.889)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_send_request_release (file_based_local_timer_test.FileTimerServerTest) (1.700)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_valid_timers (file_based_local_timer_test.FileTimerServerTest) (1.873)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_watchdog_call_count (file_based_local_timer_test.FileTimerServerTest) (1.715)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_watchdog_empty_queue (file_based_local_timer_test.FileTimerServerTest) (1.609)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_exception_propagation (file_based_local_timer_test.FileTimerTest) (1.633)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_multiple_clients_interaction (file_based_local_timer_test.FileTimerTest) (2.189)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_get_timer_recursive (file_based_local_timer_test.FileTimerTest) (2.295)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_no_client (file_based_local_timer_test.FileTimerTest) (1.753)
✓ Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_timer (file_based_local_timer_test.FileTimerTest) (2.151)
✓
Pass: caffe2/test/distributed/elastic/timer:file_based_timer_test - test_client_interaction (file_based_local_timer_test.FileTimerTest) (1.895) Summary Pass: 12 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425186732666 ``` Differential Revision: D38604238 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83695 Approved by: https://github.com/d4l3k commit 37d3db7579386e928f70815afa7b5a21ffb2fefd Author: Jane Xu Date: Wed Aug 24 21:43:09 2022 +0000 Deletes CCACHE_DISABLE and SCCACHE_DISABLE from nccl.cmake (#84007) Looking through the code and online, it does not look like these variables actually change anything. Regardless, this change was instituted to fix https://github.com/pytorch/pytorch/issues/13362, but we are again running into similar issues even with the workaround: see https://github.com/pytorch/pytorch/issues/83790. Thus, since 1. this change isn't preventing flakiness 2. these variables do not seem used anywhere in pytorch/pytorch nor mozilla/sccache we should remove this confusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84007 Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi commit 1eff853fdc91619ea24abf0cbd51ca992fe10c97 Author: Jane Xu Date: Wed Aug 24 21:22:14 2022 +0000 Pin conda to 4.13.0 (#83991) Recent update to conda 4.14.0 caused breakages in our docker builds: https://hud.pytorch.org/pytorch/pytorch/commit/754d7f05b6841e555cea5a4b2c505dd9e0baec1d This pins to prevent the errors: ``` Traceback (most recent call last): 2022-08-24T16:20:49.2412247Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1125, in __call__ 2022-08-24T16:20:49.2413036Z File "/opt/conda/lib/python3.9/site-packages/conda/cli/main.py", line 86, in main_subshell 2022-08-24T16:20:49.2413615Z File "/opt/conda/lib/python3.9/site-packages/conda/cli/conda_argparse.py", line 93, in do_call 2022-08-24T16:20:49.2414282Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/core.py", line 75, in wrapper 2022-08-24T16:20:49.2415036Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/core.py", line 39, in display_notices 2022-08-24T16:20:49.2415853Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 36, in get_notice_responses 2022-08-24T16:20:49.2416661Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 39, in 2022-08-24T16:20:49.2417399Z File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator 2022-08-24T16:20:49.2418145Z File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 446, in result 2022-08-24T16:20:49.2418831Z File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result 2022-08-24T16:20:49.2419543Z File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run 2022-08-24T16:20:49.2420292Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 42, in 2022-08-24T16:20:49.2421070Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/cache.py", line 37, in wrapper 2022-08-24T16:20:49.2421712Z File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 58, in get_channel_notice_response 2022-08-24T16:20:49.2422258Z File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 600, in get 2022-08-24T16:20:49.2422801Z File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 587, in request 2022-08-24T16:20:49.2423226Z File 
"/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 701, in send 2022-08-24T16:20:49.2423634Z File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 460, in send 2022-08-24T16:20:49.2424239Z File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 263, in cert_verify 2022-08-24T16:20:49.2424731Z OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /opt/conda/lib/python3.9/site-packages/certifi/cacert.pem 2022-08-24T16:20:49.2424967Z 2022-08-24T16:20:49.2425110Z During handling of the above exception, another exception occurred: 2022-08-24T16:20:49.2425279Z 2022-08-24T16:20:49.2425377Z Traceback (most recent call last): 2022-08-24T16:20:49.2425610Z File "/opt/conda/bin/conda", line 13, in 2022-08-24T16:20:49.2425845Z sys.exit(main()) 2022-08-24T16:20:49.2426176Z File "/opt/conda/lib/python3.9/site-packages/conda/cli/main.py", line 129, in main 2022-08-24T16:20:49.2426614Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1413, in conda_exception_handler 2022-08-24T16:20:49.2427054Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1128, in __call__ 2022-08-24T16:20:49.2427555Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1170, in handle_exception 2022-08-24T16:20:49.2427995Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1181, in handle_unexpected_exception 2022-08-24T16:20:49.2428471Z File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1251, in print_unexpected_error_report 2022-08-24T16:20:49.2428873Z ModuleNotFoundError: No module named 'conda.cli.main_info' 2022-08-24T16:20:55.5428691Z The command '/bin/sh -c bash ./install_conda.sh && rm install_conda.sh' returned a non-zero code: 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83991 Approved by: https://github.com/malfet commit f5bfa4d0888e6cd5984092b38cb8b10609558d05 Author: Jagadish Krishnamoorthy Date: Wed Aug 24 20:49:20 2022 +0000 [ROCm] Enable test_multiprocessing tests (#82356) Signed-off-by: Jagadish Krishnamoorthy Issue fixed in ROCm 5.2 user space. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82356 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/huydhn commit d56577284afeb5c82b43365d422e63db7893f70b Author: Huy Do Date: Wed Aug 24 20:19:38 2022 +0000 Set python build-docs timeout to 30 minutes and cpp build-docs timeout to 180 minutes (#83957) Anything more means there's something wrong and we should just return. AFAIK the timeout doesn't include queuing time, only the job duration https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes ![Screen Shot 2022-08-23 at 18 31 57](https://user-images.githubusercontent.com/475357/186298046-5637384f-887c-4c6a-a946-c101b6c66741.png) This will help avoid having python build docs timeout after 6 hours. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83957 Approved by: https://github.com/ZainRizvi commit f38a32c905351aa67a2c2e22c9cb11736f81408f Author: chengscott <60510scott@gmail.com> Date: Wed Aug 24 20:18:55 2022 +0000 remove duplicate WarpReduceSum (#83757) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83757 Approved by: https://github.com/ngimel commit 67f0940cdd497005c6a78fd03e0d9f5a3dfbb2e7 Author: Richard Barnes Date: Wed Aug 24 20:12:25 2022 +0000 Check all CUDA API calls for errors in test/ (#74921) (#83954) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74921 Test Plan: Sandcastle Reviewed By: ezyang, malfet, ngimel Differential Revision: D35194966 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83954 Approved by: https://github.com/ezyang commit a741927e614a96ac94b82ffe0e9eeba231626a9f Author: chengscott <60510scott@gmail.com> Date: Wed Aug 24 20:09:47 2022 +0000 Improve Normalization.cuh (#83871) remove unused Ops replaced copy-and-paste by calling BlockReduce (+SumReduceOp +2D block indexing) and removing duplicate warpSum Pull Request resolved: https://github.com/pytorch/pytorch/pull/83871 Approved by: https://github.com/ngimel commit 7b1a056b88af5b4fca48b89fd5f74025ad5f4741 Author: Richard Barnes Date: Wed Aug 24 20:02:57 2022 +0000 Map new CUDA error handling to HIP (#75032) (#83953) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75032 Test Plan: Sandcastle Reviewed By: ezyang, malfet Differential Revision: D35253785 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83953 Approved by: https://github.com/ezyang, https://github.com/malfet commit ef782e730dff7bef078fc95d7fb5b78bafcc5284 Author: thomasw21 <24695242+thomasw21@users.noreply.github.com> Date: Wed Aug 24 20:01:57 2022 +0000 Support BF16 for fast layernorm (#83971) Fixes #83970 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83971 Approved by: https://github.com/ngimel commit a5564c4bd073bbfe4c1f41a01b2e7618500fa7ac Author: Sherlock Huang Date: Wed Aug 24 17:06:16 2022 +0000 Suppress Anomaly mode warning message (#83966) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83966 Approved by: https://github.com/albanD commit a8a36c45a64bdf51d5d4e0bd79c8779b2b918318 Author: Larry Liu <8188269+larryliu0820@users.noreply.github.com> Date: Wed Aug 24 19:50:19 2022 +0000 [frontend] Fix tensor list alias annotation (#84005) For issue https://github.com/pytorch/pytorch/issues/77920 and a retry of https://github.com/pytorch/pytorch/pull/83921 The current logic checks alias info before `[]` and after. If no alias info exists after `[]`, we overwrite the alias info before. This logic failed on argument like `Tensor(a!)[]`, dropping the alias info before `[]` on the floor. This PR adds a new alias info if it's missing after `[]`. This way we can keep the alias info before `[]`. 
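A toy re-creation of the described logic in Python (the real change is in the C++ schema parser; `before`/`after` here are hypothetical names for the alias annotation found before and after `[]`):

```python
def alias_info_old(before, after):
    # old behaviour: the annotation after "[]" (even when missing) overwrote
    # the one before it, so "Tensor(a!)[]" lost the "a!" on the floor
    return after

def alias_info_fixed(before, after):
    # fixed behaviour: when nothing follows "[]", keep the annotation that was
    # written before it instead of overwriting it with "nothing"
    return after if after is not None else before

assert alias_info_old("a!", None) is None      # alias info dropped
assert alias_info_fixed("a!", None) == "a!"    # alias info preserved
assert alias_info_fixed("a!", "b") == "b"      # explicit info after "[]" still wins
```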
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84005 Approved by: https://github.com/cccclai, https://github.com/bdhirsh commit b745e5f1157b7dd4b5814d42989af52ef8e2d68b Author: Richard Barnes Date: Wed Aug 24 18:59:05 2022 +0000 Check all CUDA API calls for errors in benchmarks/cpp/nvfuser (#74920) (#81817) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74920 Test Plan: Sandcastle Differential Revision: D35194656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81817 Approved by: https://github.com/malfet commit f7e668b7b5994370aedcaa5d96ac33a78ac19e5d Author: Catherine Lee Date: Wed Aug 24 18:38:36 2022 +0000 add hud link to merge failure message (#83946) as in title, related to https://github.com/pytorch/test-infra/issues/568 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83946 Approved by: https://github.com/huydhn commit 3a9ae518f2c9424251eae13fb23db6ea571cfce1 Author: Nikita Shulga Date: Wed Aug 24 18:31:25 2022 +0000 Skip NCCL slimming for cxx11 libtorch builds (#83959) Fixes https://github.com/pytorch/pytorch/issues/83887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83959 Approved by: https://github.com/atalman commit d79ccb7b4589ab65727b16cde19918dfdd11d32c Author: Digant Desai Date: Wed Aug 24 18:17:27 2022 +0000 [pthreadpool] Cap max thread count to fix TSAN issues (#83950) Summary: Cap the thread count to 64 unconditionally to solve this tsan issue which leads to harder to debug, flaky test failures. Test Plan: CI Reviewed By: kimishpatel Differential Revision: D38136212 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83950 Approved by: https://github.com/kimishpatel commit 5e01fb995c626be0a50e8d3a0ff4f7564fb37461 Author: Nikolay Korovaiko Date: Tue Aug 23 16:31:18 2022 -0700 strip SymIntNodes off in the mobile builds (#83938) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83938 Approved by: https://github.com/ezyang commit b842670aa54072f532ce75edfc3663f3509de146 Author: Nikolay Korovaiko Date: Tue Aug 23 14:04:35 2022 -0700 logical ops (#83879) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83879 Approved by: https://github.com/ezyang commit 2b805e3520f842c13e559e46398ae64206c2cf7a Author: Nikolay Korovaiko Date: Tue Aug 23 14:04:35 2022 -0700 add arithmetic ops (#83878) arithmetic ops tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83878 Approved by: https://github.com/ezyang commit 0831813e26ebcd406b261ffb9629f933c627d4d3 Author: Nikolay Korovaiko Date: Tue Aug 23 14:04:35 2022 -0700 support more symintnode operations (#83877) remove debug code Pull Request resolved: https://github.com/pytorch/pytorch/pull/83877 Approved by: https://github.com/ezyang commit 5c49c7bbba5bbdf6ee941f84fdc13d4bd34d4014 Author: Robert Date: Wed Aug 24 17:34:28 2022 +0000 [WIP] Validating input_col for certain datapipes (#80267) Follow up from #79344. Currently WIP due to multiple test failures. 
Waiting for #80140 to land Pull Request resolved: https://github.com/pytorch/pytorch/pull/80267 Approved by: https://github.com/ejguan commit 30a5583d7566ef25ffee14ee3699f839e84fd5df Author: John Clow Date: Tue Aug 23 21:37:39 2022 -0700 [TorchTidy Fix] Don't try to collect strides for non-strided tensors (#83935) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83935 Approved by: https://github.com/robieta, https://github.com/slgong-fb commit 3f88171240d73c694ab45b3f3640137b5d695b2f Author: BowenBao Date: Wed Aug 24 00:54:17 2022 +0000 [ONNX] Remove static None graph output (#82623) Fixes #82370 * Unify the export behavior regarding static None outputs. These are dropped for both traced graph and TorchScript graph export. * `Optional` outputs are not affected. Fixes #82370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82623 Approved by: https://github.com/AllenTiTaiWang, https://github.com/abock commit 7a8152530d490b30a56bb090e9a67397d20e16b1 Author: kshitij12345 Date: Wed Aug 24 16:17:50 2022 +0000 move pooling test from test_nn to test/nn/test_pooling (#83915) Ref #63085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83915 Approved by: https://github.com/albanD commit 4eb02e863718abf5ff75fa4b296cd2331f938701 Author: Antonio Kim Date: Wed Aug 24 15:35:43 2022 +0000 [LTC] Add custom lazy tensor save function (#83294) We need a custom `save` function for checkpointing a lazy model, similar to what exists in PyTorch/XLA: https://github.com/pytorch/xla/blob/3eb8a9d9eb4ebb0b064461c3704650241625654e/torch_xla/core/xla_model.py#L994 The purpose of this function is to move any lazy tensors to CPU before saving the checkpoint. The way I implemented it was to create a general structure visitor, adapted from a function that we use quite often in Cerebras internal repositories. If there is a better tool already available in PyTorch that does the same things, I'm open to suggestions. CC: @wconstab @Krovatkin @JackCaoG Pull Request resolved: https://github.com/pytorch/pytorch/pull/83294 Approved by: https://github.com/wconstab commit 3e6e0a1d1093992d35b2248fd5a54feab4b01984 Author: Mario Lezcano Date: Wed Aug 24 05:53:25 2022 -0500 Support a stable double backward on linalg.det for real inputs (#80217) The complex case still fails. I do not know why. Fixes https://github.com/pytorch/pytorch/issues/62327 Fixes https://github.com/pytorch/pytorch/issues/53364 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80217 Approved by: https://github.com/nikitaved, https://github.com/albanD, https://github.com/malfet commit 4737b3361479f4104efaa3bfa2ea517eaacb60fb Author: Mario Lezcano Date: Wed Aug 24 05:53:24 2022 -0500 Make linalg.inv composite of linalg.solve (#80074) The `getri` kernel calls inside `getrs` so we can do so explicitly ourselves and save ourselves from having to maintain an extra kernel. This way we just need to optimise `lu_factor` and `lu_solve` and `inv` will be as efficient as it can be, as it'll be choosing the best backend to perform the factorisation and the best backend (not necessarily the same) to perform the solve. 
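The identity the composite relies on, checked here via the public Python API (illustrative only; the PR changes the internal C++ path, not this call):

```python
import torch

# inv(A) is exactly the X that solves A @ X = I; the composite computes it by
# LU-factorizing A once and then solving against the identity.
A = torch.randn(4, 4, dtype=torch.float64) + 4 * torch.eye(4, dtype=torch.float64)
eye4 = torch.eye(4, dtype=torch.float64)

assert torch.allclose(torch.linalg.inv(A), torch.linalg.solve(A, eye4))
```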
Fixes https://github.com/pytorch/pytorch/issues/77498 The benchmarks: https://github.com/pytorch/pytorch/pull/80074#issuecomment-1164309071 Pull Request resolved: https://github.com/pytorch/pytorch/pull/80074 Approved by: https://github.com/IvanYashchuk, https://github.com/albanD, https://github.com/malfet commit 0bdcfcb840bedad546cc97662a2272b34f5c7d64 Author: lezcano Date: Wed Aug 24 09:07:59 2022 +0000 Strenghten preconditions of linalg.cross (#83798) This makes `linalg.cross` array API compliant (https://github.com/data-apis/array-api/issues/415) and fixes a few bugs. Fixes https://github.com/pytorch/pytorch/issues/77629 Fixes https://github.com/pytorch/pytorch/issues/83756 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83798 Approved by: https://github.com/mruberry commit 4a18d0a9729b69e02f036a0d218626fdb9ca2dda Author: Henry Tu Date: Wed Aug 24 14:33:52 2022 +0000 Fix LTC build warnings (#83955) Addresses the `Wc++98-compat-extra-semi` warning from https://github.com/llvm/torch-mlir/issues/1264 by removing the extraneous semicolon after autogen LTC native function definitions.
```
/home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python/torch_mlir/csrc/base_lazy_backend/generated/LazyNativeFunctions.cpp:4241:6: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi]
};
 ^
```
cc: @wconstab @desertfire @ke1337 @antoniojkim Pull Request resolved: https://github.com/pytorch/pytorch/pull/83955 Approved by: https://github.com/wconstab commit ce7a9f92e30b93ab6efff4135be005c9afd0533a Author: PyTorch MergeBot Date: Wed Aug 24 10:14:44 2022 +0000 [xla hash update] update the pinned xla hash (#83967) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83967 Approved by: https://github.com/pytorchbot commit fa241fd50e226d74af06c36482fa68f0e2a4fb3c Author: Seonglyong Gong Date: Wed Aug 24 08:17:20 2022 +0000 [Profiler] record nn.Module's parameters (#83209) Summary: Record nn.Module's parameters for detailed memory profiling:
- extend 'module_' in value cache & NNModuleInfo to save parameters
- python binding and unit test case
Test Plan: buck run mode/opt //caffe2/test:profiler -- -r test_nnmodule Differential Revision: D38379717 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83209 Approved by: https://github.com/robieta commit 0ae298f8696f00b9201a6f433788e1583d914a8b Author: Souranil Sen Date: Wed Aug 24 08:00:20 2022 +0000 Test type promotion assertignoretypes (#83867) See #38095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83867 Approved by: https://github.com/kit1980, https://github.com/mruberry commit 432c508e71111f9d5382322e0e6b1bc1c66bf0ec Author: Masaki Kozuki Date: Wed Aug 24 04:53:25 2022 +0000 Support NCCL Premul Sum (#81272) This PR adds the support for https://docs.nvidia.com/deeplearning/nccl/archives/nccl_21212/user-guide/docs/api/ops.html?highlight=premul#c.ncclRedOpCreatePreMulSum. The major changes include:
- convert enum ReduceOp to struct
- add premul sum specific paths to init.cpp and Ops.cpp.
note: - For pip wheels / conda binaries to support this, ~~I think https://github.com/pytorch/pytorch/pull/79132 would be needed~~ https://github.com/pytorch/pytorch/pull/82775 landed The commit titled "add nccl premul" whose current hash is https://github.com/pytorch/pytorch/pull/81272/commits/cb99ad67447b5899ecf8c4c3d78deaafa1cc09b8 was authored by @mcarilli and @ptrblck. cc @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/81272 Approved by: https://github.com/kwen2501 commit 67aed393195970459029a2f6a825d9db726187cf Author: Lu, Chengjun Date: Wed Aug 24 04:35:43 2022 +0000 Support the XPU backend untyped storage (#83952) Simple add XPU backend in untyped torch storage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83952 Approved by: https://github.com/ezyang commit 0491e1a13a62ead5c22f4396012da5fb6e09800f Author: Edward Z. Yang Date: Tue Aug 23 15:14:05 2022 -0700 Support returning symbolic strides from t.stride() in Python (#83842) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83842 Approved by: https://github.com/albanD, https://github.com/Chillee, https://github.com/bdhirsh commit df70714e763ea6e93dcefe0072ea9982afb73b7f Author: Nikita Shulga Date: Wed Aug 24 04:28:45 2022 +0000 [BE][CUDA] Use packed_accessor64 (#83949) Not sure why we are ignoring those, but SoftMax.cu alone generates 100+ lines of warnings: ``` /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In function ‘at::Tensor at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::get_offsets(const at::Tensor&, const IntArrayRef&, int64_t)’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:261:69: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = long int; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto indices_accessor = indices.packed_accessor(); ^ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax(at::Tensor&, const at::Tensor&, int64_t) [with scalar_t = double; bool LogSoftMax = false; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:607:924: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:423:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:426:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long 
int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax(at::Tensor&, const at::Tensor&, int64_t) [with scalar_t = float; bool LogSoftMax = false; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:607:1677: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:423:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:426:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax(at::Tensor&, const at::Tensor&, int64_t) [with scalar_t = double; bool LogSoftMax = true; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:623:927: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:423:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:426:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ 
/home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax(at::Tensor&, const at::Tensor&, int64_t) [with scalar_t = float; bool LogSoftMax = true; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:623:1679: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:423:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:426:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax_backward(at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, c10::ScalarType) [with scalar_t = double; bool LogSoftMax = false; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:641:977: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:542:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:545:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ 
/home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:548:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto grad_values_accessor = grad_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax_backward(at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, c10::ScalarType) [with scalar_t = float; bool LogSoftMax = false; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:641:1775: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:542:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:545:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:548:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto grad_values_accessor = grad_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax_backward(at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, c10::ScalarType) [with scalar_t = double; bool LogSoftMax = true; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:661:980: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:542:6: warning: 
‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:545:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:548:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto grad_values_accessor = grad_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘void at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::cuda_sparse_coo_softmax_backward(at::Tensor&, const at::Tensor&, const at::Tensor&, int64_t, c10::ScalarType) [with scalar_t = float; bool LogSoftMax = true; int64_t = long int]’: /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:661:1777: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:542:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = values_2.packed_accessor(); ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:545:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto out_values_accessor = out_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ 
/home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:548:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto grad_values_accessor = grad_values_2.packed_accessor(); ^~~~~~~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘std::tuple at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::compute_pool_max(const at::Tensor&, const at::Tensor&, const IntArrayRef&, int64_t, int64_t) [with scalar_t = double; bool requireMxRows = true; at::IntArrayRef = c10::ArrayRef; int64_t = long int]’: /tmp/tmpxft_000040e0_00000000-6_SoftMax.cudafe1.stub.c:16:557: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:347:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘std::tuple at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::compute_pool_max(const at::Tensor&, const at::Tensor&, const IntArrayRef&, int64_t, int64_t) [with scalar_t = float; bool requireMxRows = true; at::IntArrayRef = c10::ArrayRef; int64_t = long int]’: /tmp/tmpxft_000040e0_00000000-6_SoftMax.cudafe1.stub.c:18:556: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:347:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘std::tuple at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::compute_pool_max(const at::Tensor&, const at::Tensor&, const IntArrayRef&, int64_t, int64_t) [with scalar_t = double; bool requireMxRows = false; at::IntArrayRef = c10::ArrayRef; int64_t = long int]’: /tmp/tmpxft_000040e0_00000000-6_SoftMax.cudafe1.stub.c:20:557: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:347:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = double; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead 
[-Wdeprecated-declarations] auto values_accessor = ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu: In instantiation of ‘std::tuple at::native::_GLOBAL__N__39f8a8aa_10_SoftMax_cu_75209b9c::compute_pool_max(const at::Tensor&, const at::Tensor&, const IntArrayRef&, int64_t, int64_t) [with scalar_t = float; bool requireMxRows = false; at::IntArrayRef = c10::ArrayRef; int64_t = long int]’: /tmp/tmpxft_000040e0_00000000-6_SoftMax.cudafe1.stub.c:21:556: required from here /home/nshulga/git/pytorch/pytorch/aten/src/ATen/native/sparse/cuda/SoftMax.cu:347:6: warning: ‘at::GenericPackedTensorAccessor at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 2; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations] auto values_accessor = ^~~~~~~~~~~~~~~ /home/nshulga/git/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:245:1: note: declared here GenericPackedTensorAccessor packed_accessor() const & { ^ ~~~~~~~~~~~~~ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83949 Approved by: https://github.com/ngimel commit 754d7f05b6841e555cea5a4b2c505dd9e0baec1d Author: Peter Bell Date: Wed Aug 24 00:54:14 2022 +0100 Remove conj kernels for real dtypes (#80374) `conj_physical_stub` is currently implemented for all dtypes despite it just being a plain copy for real dtypes. So, instead we should defer to the existing copy kernel in these cases. On my build for one CUDA architecture, I see a 2.2 MB decrease in `libtorch_cuda.so` size. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80374 Approved by: https://github.com/ngimel, https://github.com/atalman commit 2c76d05b8fde5a08cb795d5e3d2c99aad7bd352f Author: Driss Guessous Date: Wed Aug 24 02:50:45 2022 +0000 [Nested Tensor] Make offset copy and move assignment more explicit. (#83488) Currently the nested tensor construction for the offset_ parameter takes in references and in the chain of delegation uses value. This could lead to unnecessary copies. Whenever a nested tensor impl is constructed it should take ownership of all its metadata. The only non-trivially copyable metadata associated with the class is `offsets_`. The goal of this PR is to make sure that consumers of nested_tensor_impl constructors ensure that they are passing offsets as a temporary - either buy explicitly copying a reference, or by constructing the offsets vector in the scope of construction. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83488 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit 7fdc2f70c659b51555dd293f1c63bfa724596a20 Author: Ishan-Rajgarhia Date: Wed Aug 24 02:45:49 2022 +0000 Task: T129772171 remove assertEqualIgnoreTypes from test/test_nn.py (#83870) See https://github.com/pytorch/pytorch/issues/38095 Replaced assertEqualIgnoreType with assertEqual Pull Request resolved: https://github.com/pytorch/pytorch/pull/83870 Approved by: https://github.com/kit1980 commit 6edcf8e18c68b4450fcb710cc13189ac078cee16 Author: Hansong Zhang Date: Wed Aug 24 02:17:52 2022 +0000 Move nnapi code from ATen common code to specific library (#83748) Summary: Currently we include nnapi code in all targets using ATen even if it's not used (actually there is no usage and being deprecated). Move it to `nnapi_backend_lib` for now. Test Plan: Sandcastle. Differential Revision: D38761095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83748 Approved by: https://github.com/salilsdesai, https://github.com/SS-JIA commit 84f0411f4f29bf185f941b98c9af52ff010172b4 Author: Catherine Lee Date: Wed Aug 24 02:06:50 2022 +0000 add merge blocking to ci: sev template (#83940) as in title, so that by default, ci: sev will block merges the line can be removed to not block merges Pull Request resolved: https://github.com/pytorch/pytorch/pull/83940 Approved by: https://github.com/huydhn, https://github.com/janeyx99, https://github.com/malfet, https://github.com/seemethere commit c47e0450f8bb5826108c45ff2bc3f77f2297b94b Author: Nan Xiao Date: Wed Aug 24 01:15:25 2022 +0000 [fbia] Keep Track of full qualified name before and after remote sharding (#83889) Summary: track qualname changes in embedding sharding & FX split, and compose target qualname in the end of FBIA transform stage, so we can use the qualname mapping in XL materialize stage Test Plan: CI/CD with DISABLE_XLEBB_MATERIALIZATION = True https://fburl.com/fblearner/a8yljbux with DISABLE_XLEBB_MATERIALIZATION = False https://fburl.com/fblearner/2nvi0dam Reviewed By: lliu315gt Differential Revision: D38772525 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83889 Approved by: https://github.com/houseroad commit 58f61d50a45f21512947588ac8e79b5e52956cb5 Author: Edward Z. Yang Date: Tue Aug 23 18:05:26 2022 -0400 Add hypothesis to requirements.txt (#83740) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83740 Approved by: https://github.com/zhxchen17, https://github.com/janeyx99, https://github.com/zou3519 commit 84e45e7e907484f300cade2ce23e5272da660e4f Author: PyTorch MergeBot Date: Wed Aug 24 00:47:03 2022 +0000 Revert "Optimize transpose copy on CPU using fbgemm transpose (#83327)" This reverts commit 04d8da88a6a1abf0da2b11096c85244bf38d3b2a. Reverted https://github.com/pytorch/pytorch/pull/83327 on behalf of https://github.com/weiwangmeta due to breaking internal builds/causing out-of-bounds errors/training accuracy commit 591222f5d93630d312899f335f7f58505bf44544 Author: Sergii Dymchenko Date: Wed Aug 24 00:26:46 2022 +0000 Fix use-dict-literal lint (#83718) Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor. 
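For context, the shape of the change that lint asks for (both build the same dict; the literal simply avoids a global name lookup and a call; the config keys below are made up for illustration):

```python
def make_config_old():
    return dict(lr=0.1, momentum=0.9)    # flagged by pylint's use-dict-literal

def make_config_new():
    return {"lr": 0.1, "momentum": 0.9}  # preferred literal form

assert make_config_old() == make_config_new()
```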
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718 Approved by: https://github.com/albanD commit fc470cf9806643efdbc1df650f9e8eafb671ba17 Author: Shirong Wu Date: Wed Aug 24 00:17:46 2022 +0000 Back out "Support regex-style matching for Any and Oneof (#82853)" (#83922) Reviewed By: hl475 Differential Revision: D38945806 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83922 Approved by: https://github.com/hl475 commit 89072177e10b3cf9c8fdf204381db17fe1fde068 Author: Angela Yi Date: Tue Aug 23 23:56:50 2022 +0000 [fx][pass infra] Adding error catching (#83933) Example: ``` ====================================================================== ERROR: test_pass_manager_error (fx.test_pass_infra.TestPassManager) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/angelayi/Projects/pytorch/torch/fx/passes/infra/pass_manager.py", line 285, in __call__ res = fn(module) File "/Users/angelayi/Projects/pytorch/test/fx/test_pass_infra.py", line 164, in pass_fail raise RuntimeError("bad") RuntimeError: bad The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/Users/angelayi/Projects/pytorch/test/fx/test_pass_infra.py", line 170, in test_pass_manager_error pm(traced_m) File "/Users/angelayi/Projects/pytorch/torch/fx/passes/infra/pass_manager.py", line 289, in __call__ raise RuntimeError(msg) from e RuntimeError: An error occured when running the 'pass_fail' pass after the following passes: ['replace_add_with_mul_pass', 'replace_mul_with_div_pass'] ``` Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83933 Approved by: https://github.com/SherlockNoMad commit 7c8d265822088621a09e4526a4612b4acb6dc5d6 Author: Eli Uriegas Date: Tue Aug 23 11:32:03 2022 -0700 ci: Remove dead code related to android uploads (#83930) These uploads actually never got triggered in nightlies, so removing it altogether. Someone can re-add it in the future if they feel these are important, but I can't find an instance of this running since we migrated, so I have a hard time believing anyone will miss it. https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=android Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/83930 Approved by: https://github.com/atalman, https://github.com/malfet commit 21bc77ca9673e067c7b8903f536d8f8901d148f8 Author: John Detloff Date: Tue Aug 23 22:50:09 2022 +0000 Remove CoreMLMemoryObserver (#83703) Summary: We added this observer to help us diagnose memory issues that have since been resolved. It should be safe to clean this up. Test Plan: Diff just removed logging, so just build IG and confirm no errors. Differential Revision: D38843701 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83703 Approved by: https://github.com/mcr229 commit 8fae7027b399e65e6071d335aa874497682c84d0 Author: Edward Z. Yang Date: Tue Aug 23 12:31:10 2022 -0700 Don't introduce new overload for SymInt (#83628) Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented. This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.
This is BC-breaking in the following ways: * The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this. This is not BC-breaking in the following ways: * The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...)). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed. * This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type. Structure of the PR: * The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other: * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular: * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences. * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!) * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway. * Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do.
Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other cases. * The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK. * I change how unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it. * I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload) * I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.) * I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints. * I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit 4808bda7963712a5d77cb543f25fa72e2b7b3d91 Author: Zain Rizvi Date: Tue Aug 23 21:55:02 2022 +0000 Prefer signal from land checks over PR signals (#83715) When a dev forks their branch from a red master build, their branch can fail CI checks for reasons unrelated to their changes, but the same checks would however pass in the land validation commit (which is rebased off of viable/strict). Today, in the above scenario, the `merge -l` command fails because mergebot sees the failing checks in the PR, which is not helpful when that same check passes in land validation. This PR changes the behavior so that: 1. If both the PR and land validation ran a workflow, only look at the results from land validation. 2. If only the PR ran a specific workflow (e.g. for CLA Check or a nightly run), then continue to look at the result from the PR (which matches existing behavior) It also includes a few extra BE fixes: - Replaces the tuple we used to pass workflow check results around with a named tuple so that it's easier to tell what data is being used - Reduces the number of API calls to github by ~50% during merges. Before, we were pulling results from github every time and then filtering it down to the relevant category of checks (e.g. failed/pending/startup_failed).
Now, our filters share the check results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83715 Approved by: https://github.com/zengk95 commit 25dd2a0422cf5fe38937c5b9441b1f6abc1c40cb Author: chenlai Date: Mon Aug 22 20:35:11 2022 -0700 Fix load_extra_only api for flatbuffers and enable flatbuffers in mobile for OSS properly (#83855) `_load_extra_only_for_mobile` API hasn't handled flatbuffers logic yet. Update the API accordingly. Also found out that the mobile build in OSS doesn't build with flatbuffers. Filed task T129996445 to track this. Differential Revision: [D38890847](https://our.internmc.facebook.com/intern/diff/D38890847/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38890847/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83855 Approved by: https://github.com/qihqi commit bbe803cb35948df77b46a2d38372910c96693dcd Author: PyTorch MergeBot Date: Tue Aug 23 19:36:43 2022 +0000 Revert "Strenghten preconditions of linalg.cross (#83798)" This reverts commit 7f0198e7390eff2f2f5fcb33ce36c99ec3b7f55e. Reverted https://github.com/pytorch/pytorch/pull/83798 on behalf of https://github.com/janeyx99 due to Sorry, land race caused functorch issues https://hud.pytorch.org/pytorch/pytorch/commit/7f0198e7390eff2f2f5fcb33ce36c99ec3b7f55e commit a802603ef77960f016dc81aaa7b0b773a19d3d73 Author: kshitij12345 Date: Tue Aug 23 19:31:22 2022 +0000 [complex] conv_transpose1d (#79694) Reference: https://github.com/pytorch/pytorch/issues/71108 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79694 Approved by: https://github.com/ngimel commit 9095030239bde50da8b4bbdbc6d5701d3fdfdcae Author: Khushi Agrawal Date: Tue Aug 23 19:23:39 2022 +0000 [fix] edge case in `MaxPool1d` and add ErrorInputs (#83553) Fixes #83224 cc @kshitij12345 @albanD! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83553 Approved by: https://github.com/albanD commit 8f9ae35648e1ac24bd05a4c962ca0abc626bb6ca Author: Kaichen Liu Date: Tue Aug 23 19:19:38 2022 +0000 remove assertEqualIgnoreTypes from test/distributions/test_distributions.py (#83709) See https://github.com/pytorch/pytorch/issues/38095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83709 Approved by: https://github.com/kit1980 commit 5204b8e4f9045a6132e882776075f085e59eca16 Author: Mengwei Liu Date: Mon Aug 22 16:13:19 2022 -0700 [torchgen] Add documentation for `autogen` keyword (#83610) This is a follow up for #81437. This PR explains which operators can use `autogen` and what will be generated. Also talked about generated kernels and where to find them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83610 Approved by: https://github.com/albanD, https://github.com/bdhirsh commit 732255f0318f188f66dc527ce7a067d5e185194c Author: Stephen Jia Date: Tue Aug 23 09:48:07 2022 -0400 [vulkan] Add VMA as a third_party subrepo (#83906) The [VulkanMemoryAllocator](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator) is a popular library for GPU memory allocation using Vulkan. The Vulkan backend has a dependency on it, but since it is only a single header file we currently include it by checking it into the repo under [aten/src/ATen/native/vulkan/api/vk_mem_alloc.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/vk_mem_alloc.h). However, it is better to check it in as a third party submodule, since it allows better version tracking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83906 Approved by: https://github.com/kimishpatel commit 81843596cba1c3fadaed63e76d2094161347d9a4 Author: soulitzer Date: Tue Aug 23 11:19:03 2022 -0400 Fix view_func replay in no-grad mode (#83872) Fixes https://github.com/pytorch/pytorch/issues/83828 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83872 Approved by: https://github.com/albanD commit 7f0198e7390eff2f2f5fcb33ce36c99ec3b7f55e Author: lezcano Date: Tue Aug 23 10:59:11 2022 +0000 Strenghten preconditions of linalg.cross (#83798) This makes `linalg.cross` array API compliant (https://github.com/data-apis/array-api/issues/415) and fixes a few bugs. Fixes https://github.com/pytorch/pytorch/issues/77629 Fixes https://github.com/pytorch/pytorch/issues/83756 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83798 Approved by: https://github.com/mruberry commit 9beddde1d7fc401880ea8da0ad39833ff6a3cb93 Author: Ke Wen Date: Tue Aug 23 17:57:16 2022 +0000 Enable NCCL_DESYNC_DEBUG when TORCH_DISTRIBUTED_DEBUG=DETAIL (#83881) Automatically enable `NCCL_DESYNC_DEBUG` when `TORCH_DISTRIBUTED_DEBUG` is set to `DETAIL`. This saves the user from having to set two env variables. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83881 Approved by: https://github.com/malfet, https://github.com/rohan-varma, https://github.com/H-Huang commit cb488e6d2f21269d16da8bd60ec7dff1368baf98 Author: Ivan Yashchuk Date: Tue Aug 23 17:47:08 2022 +0000 Allow None arguments for elementwise type promotion wrapper and fix clamp with None arguments (#83586) Fixes https://github.com/pytorch/torchdynamo/issues/759 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83586 Approved by: https://github.com/ezyang, https://github.com/ngimel commit 8793cd2fd30b89cbc060e7ccce7deba30b6af52b Author: Nikita Shulga Date: Tue Aug 23 17:46:45 2022 +0000 Move ATenNVRTC.h include from `jit_utils.h` to `jit_utils.cpp` (#83886) In general, `.h` files should only include headers that are used in the header. Fixes https://github.com/pytorch/pytorch/issues/83856 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83886 Approved by: https://github.com/ngimel commit 8db04c111363bde5c90885fee3debbd288605336 Author: Brian Hirsh Date: Mon Aug 22 08:39:28 2022 -0700 reinplace pass: special handling for view_scatter ops (#83846) There is already special handling in the reinplacing pass for removing `{view}_scatter` ops, but there is another case that needs special handling.
In this code: ``` def f(): a = torch.zeros(4, 4, 4) a[:, 2:] = torch.ones(4, 2, 4) return a ``` Tracing normally with `make_fx()` gives you: ``` def forward(self): zeros = torch.ops.aten.zeros.default([4, 4, 4], device = device(type='cpu'), pin_memory = False) ones = torch.ops.aten.ones.default([4, 2, 4], device = device(type='cpu'), pin_memory = False) slice_tensor = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807) slice_tensor_1 = torch.ops.aten.slice.Tensor(slice_tensor, 1, 2, 9223372036854775807); slice_tensor = None copy__default = torch.ops.aten.copy_.default(slice_tensor_1, ones); slice_tensor_1 = ones = None return zeros ``` Functionalizing it gives you: ``` def forward(self): zeros = torch.ops.aten.zeros.default([4, 4, 4], device = device(type='cpu'), pin_memory = False) ones = torch.ops.aten.ones.default([4, 2, 4], device = device(type='cpu'), pin_memory = False) slice_tensor = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807) slice_tensor_1 = torch.ops.aten.slice.Tensor(slice_tensor, 1, 2, 9223372036854775807); slice_tensor = None slice_tensor_2 = torch.ops.aten.slice.Tensor(zeros, 0, 0, 9223372036854775807) slice_scatter_default = torch.ops.aten.slice_scatter.default(slice_tensor_2, ones, 1, 2, 9223372036854775807); slice_tensor_2 = ones = None slice_scatter_default_1 = torch.ops.aten.slice_scatter.default(zeros, slice_scatter_default, 0, 0, 9223372036854775807); zeros = slice_scatter_default = None return slice_scatter_default_1 ``` Notice that there are not any functional ops to directly re-inplace! What actually happened is that functionalization turned the `copy_()` into a `copy()`, but the out-of-place `copy()` operator gets optimized away because it's a no-op (when the input and output metadata are the same, `out = copy(a, b)` just returns `b`). What we actually want is to replace this line: ``` slice_scatter_default = torch.ops.aten.slice_scatter.default(slice_tensor_2, ones, 1, 2, ...); ``` with this: ``` new_slice = torch.ops.aten.slice.Tensor(slice_tensor_2, 1, 2, ...); _ = torch.ops.aten.copy_.default(new_slice, ones) ``` In the above, we're taking a fresh slice of the "base" tensor, and performing a `copy_()` on the slice, adding back what functionalization removed. We actually need to create a fresh "slice" node, because we're not guaranteed that one already exists in the graph (technically there should be one, but it might have been DCE'd by the time we hit re-inplacing) I also updated the docs for re-inplacing to more closely match the order of the logic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83846 Approved by: https://github.com/ezyang commit 75ec7b754707b4ec19328d9e35f4d30f8045268c Author: Brian Hirsh Date: Mon Aug 22 08:39:25 2022 -0700 reinplace pass: bugfix for output node replacement (#83845) Cleaned up some of the arg replacement logic to use tree_map, so it handles FX nodes that have nested containers. 
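As a rough illustration of the utility mentioned above (a simplified sketch, not code from the PR), `torch.utils._pytree.tree_map` applies a function to every leaf of a nested container while preserving the container structure, which is what makes it convenient for `node.args` values that mix tuples, lists, and dicts:
```
from torch.utils._pytree import tree_map

args = (1, [2, 3], {"scale": 4})
# Replace every leaf; the tuple/list/dict structure is kept as-is.
replaced = tree_map(lambda x: x * 10, args)
print(replaced)  # (10, [20, 30], {'scale': 40})
```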
See the added test: when you write a function that returns a list, the `output` node in the FX graph shows up as having `node.args = tuple(immutable_list(...))`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83845 Approved by: https://github.com/ezyang commit 01434c2d206e3732219a1afaf4f33b294c60a7b3 Author: chengscott <60510scott@gmail.com> Date: Tue Aug 23 17:12:50 2022 +0000 Improve DistanceKernel.cu (#83811) Include device_sqrt, replace reduce_agg by BlockReduce, and choose implementation by impl_fptr instead of error-prone copy-and-paste. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83811 Approved by: https://github.com/ngimel commit df048414e0c8485ed22bb7b28a02ab5189d888f4 Author: samdow Date: Mon Aug 22 09:45:04 2022 -0400 [functorch] add linalg cross batch rule (#83759) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83759 Approved by: https://github.com/zou3519 commit e4af53c1a1b33ff38375a0fecc531324c5cd6b0b Author: Scott Wolchok Date: Mon Aug 22 16:02:39 2022 -0700 [PyTorch] Remove unused sstream/string includes from c10/macros/Macros.h (#83353) Nothing in the rest of the header seems to use these. Differential Revision: [D38672680](https://our.internmc.facebook.com/intern/diff/D38672680/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83353 Approved by: https://github.com/malfet commit 7e386845a41d2ae57e4bf80e65c39e930baba99a Author: Jane Xu Date: Tue Aug 23 16:38:34 2022 +0000 Update retry action to latest version (#83911) We're running into EPERM issues when trying to install nvidia tools, see failure example https://github.com/pytorch/pytorch/runs/7975726013?check_suite_focus=true. ``` WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. /home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1049 throw err; ^ Error: kill EPERM at process.kill (internal/process/per_thread.js:199:13) at killPid (/home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1059:17) at /home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1036:21 at Array.forEach () at /home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1034:23 at Array.forEach () at killAll (/home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1033:27) at /home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1024:13 at ChildProcess.onClose (/home/ec2-user/actions-runner/_work/_actions/nick-fields/retry/71062288b76e2b6214ebde0e673ce0de1755740a/dist/index.js:1080:17) at ChildProcess.emit (events.js:314:20) { errno: 'EPERM', code: 'EPERM', syscall: 'kill' } ``` The root issue probably lies elsewhere but this action is not helping/the errors seem to say it's unable to kill child processes. A more recent commit in that repo uses spawn instead of exec which might make a difference. Regardless, we should keep our actions up to date anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83911 Approved by: https://github.com/malfet commit a315a2c79bbd25dfb022e294df8df67568b14dea Author: Jeff Daily Date: Tue Aug 23 16:22:14 2022 +0000 [ROCm] restore MIOpen benchmark flag default to true (#82656) PR https://github.com/pytorch/pytorch/pull/77438 allowed MIOpen to support the benchmark flag. Previously, the benchmark flag was ignored by MIOpen such that benchmarking was always turned on. This commit restores the behavior that MIOpen benchmarking is by default turned on. CI unit tests cover this capability. Torchvision models demonstrate the performance delta. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82656 Approved by: https://github.com/ngimel commit 0270a707e5e6464785291f090715f0525a5019d0 Author: Horace He Date: Tue Aug 23 05:11:03 2022 +0000 Fix stride issue with faketensors (#83822) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83822 Approved by: https://github.com/ezyang, https://github.com/ngimel commit 7ebdb4c72f4158cca889f2696f12c5fbff3c1023 Author: Horace He Date: Tue Aug 23 05:11:03 2022 +0000 Refactored ops on size to be dispatcher ops (#83719) An example of how the graph looks now. ``` def forward(self, x_1): size = torch.ops.math.size(x_1, 0) size_1 = torch.ops.math.size(x_1, 1); x_1 = None ones = torch.ops.aten.ones.default([1], device = device(type='cpu'), pin_memory = False) expand_sym_int = torch.ops.aten.expand.SymInt(ones, [size, size_1]); ones = size = size_1 = None cos_default = torch.ops.aten.cos.default(expand_sym_int); expand_sym_int = None return (cos_default,) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83719 Approved by: https://github.com/ezyang commit 58170fb8aafc955258609693cd368cf8ecc8ff2b Author: Vasiliy Kuznetsov Date: Mon Aug 22 14:55:52 2022 -0700 Remove DBR quantization from the codebase (#83642) Summary: DBR quantization is a no-go for now because it does not align well with PyTorch 2.0 plans and we do not want to build yet another tracing system. Deleting it from the codebase for now since there are no plans to develop this in the near future. We can bring it back at a later time if necessary. Test plan: CI Differential Revision: [D38839556](https://our.internmc.facebook.com/intern/diff/D38839556) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83642 Approved by: https://github.com/andrewor14, https://github.com/jerryzh168 commit 4dfa6d28a139e8325fe9b255af86e8d1360ae7ee Author: mattip Date: Tue Aug 23 15:03:29 2022 +0000 Normalize DLPack stride to 1 where shape < 2 (#83158) Fixes #83069. Also move all the dlpack tests to a new file, `test_dlpack.py`. The fix involves always allocating a "strides" int array when converting to DLPack and deleting the strides when the capsule destructor is called. Then the strides are copied from the tensor, and `strides[i]` is set to `1` where `shape[i] < 2`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83158 Approved by: https://github.com/ezyang commit 247468baf0d9e105ced371a9eb42074147c1ee57 Author: jpvillam Date: Tue Aug 23 13:54:09 2022 +0000 [ROCm] More Sparse UTs enablement and more hipification mappings. (#78939) Enables: test_bmm_cuda_float64 test_bmm_deterministic_cuda_float64 test_csr_matvec_cuda_complex128 test_csr_matvec_cuda_complex64 test_csr_matvec_cuda_float32 test_csr_matvec_cuda_float64 To enable the above tests, some more hip mappings had to be added for the hipification process.
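To give a feel for what a hipification mapping does, here is a hypothetical, minimal sketch: the idea is a table of CUDA identifiers rewritten to their HIP counterparts. The entries and helper below are made up for illustration; the real tables in torch/utils/hipify use a richer entry format.
```
# Hypothetical, simplified sketch of identifier hipification; not the actual
# torch/utils/hipify tables or API.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cusparseHandle_t": "hipsparseHandle_t",
}

def hipify_source(source: str) -> str:
    # Rewrite each known CUDA identifier to its HIP equivalent.
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify_source("cudaMalloc(&ptr, nbytes); cusparseHandle_t handle;"))
# hipMalloc(&ptr, nbytes); hipsparseHandle_t handle;
```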
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939 Approved by: https://github.com/pruthvistony, https://github.com/malfet commit ed949e22580f6f76a1476626badbc9d6161b7745 Author: PyTorch MergeBot Date: Tue Aug 23 10:25:52 2022 +0000 [xla hash update] update the pinned xla hash (#83899) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83899 Approved by: https://github.com/pytorchbot commit 7c20ad3dfae18c5e262c59e09c2b6079ec7fc69f Author: kshitij12345 Date: Tue Aug 23 08:39:35 2022 +0000 [optim] rprop: handle complex params as independent real params (#83858) Ref #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83858 Approved by: https://github.com/albanD commit dd67d52b570629b7c157e4a11a9fae18f517a6e6 Author: Kshiteej K Date: Tue Aug 23 08:34:39 2022 +0000 [nn] split rnn_utils test from test_nn.py (#83675) Ref: https://github.com/pytorch/pytorch/issues/63085 Proposed folder structure ``` -> test -> nn -> test_conv.py -> test_pooling.py -> ..... ``` This PR: Moves test related RNN utilities to a different file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83675 Approved by: https://github.com/albanD commit a419e483b21eb572848f85c815e3f993cb13040c Author: Jerry Zhang Date: Mon Aug 22 19:55:57 2022 -0700 [quant][fx] Add support for quantized matmul (#83885) Summary: att, probably missed the op during migration to the reference flow Test Plan: python test/test_quantization.py TestQuantizeFxOps.test_qmatmul Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83885 Approved by: https://github.com/andrewor14 commit 3dfb8dfcf31530aad457df50a5a89ef691438c11 Author: Justin Chu Date: Tue Aug 23 05:39:17 2022 +0000 [ONNX] Use `errors.SymbolicValueError` for more context (#83332) Replace runtime errors in torch.onnx with `errors.SymbolicValueError` for more context around jit values. 
- Extend `_unimplemented`, `_onnx_unsupported`, `_onnx_opset_unsupported`, `_onnx_opset_unsupported_detailed` errors to include JIT value information - Replace plain RuntimeError with `errors.SymbolicValueError` - Clean up: Use `_is_bool` to replace string comparison on jit types - Clean up: Remove the todo `Remove type ignore after #81112` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83332 Approved by: https://github.com/AllenTiTaiWang, https://github.com/thiagocrepaldi, https://github.com/BowenBao commit 04d8da88a6a1abf0da2b11096c85244bf38d3b2a Author: CaoE Date: Tue Aug 23 04:48:38 2022 +0000 Optimize transpose copy on CPU using fbgemm transpose (#83327) Optimize transpose copy on CPU using fbgemm transpose single socket (28cores): ``` before: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 4.819e-05 ms; bf16: 4.846e-05 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.000171 ms; bf16: 0.000129 ms after: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 2.439e-05 ms; bf16: 2.152e-05 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.000132 ms; bf16: 3.916e-05 ms ``` single core: ``` before: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 0.00109 ms; bf16: 0.00103 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.00339 ms; bf16: 0.00295 ms after: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 0.000566 ms; bf16: 0.000382 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.00282 ms; bf16: 0.000999 ms ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83327 Approved by: https://github.com/frank-wei commit b29a074882a2194d61f1cd7ccf939618d8384d08 Author: Rohan Varma Date: Mon Aug 22 19:00:50 2022 +0000 [BE] Revert distributed change in https://github.com/pytorch/pytorch/pull/68779 (#83181) https://github.com/pytorch/pytorch/issues/82641 points out a regression in how inputs / outputs are processed by DDP, blocking their HF use case. It was narrowed down to https://github.com/pytorch/pytorch/pull/68779 and reverting the distributed change there fixes the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83181 Approved by: https://github.com/kumpera commit 4e90526a4f03d1950e8db6e8722cce8e0fb4a5f5 Author: Rohan Varma Date: Mon Aug 22 19:00:49 2022 +0000 [FSDP] Remove unneeded checks (#83150) @awgu pointed out these checks aren't really doing anything, as they just make sure we're setting training state in certain ways throughout FSDP and is sort of arbitrary. So, removing them to avoid confusion. We still keep the checking around `_post_backward_called` because this is needed in `finalize_params` for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83150 Approved by: https://github.com/awgu commit 8e074f455720245c733bb0f3417a21c9fd5a73c7 Author: Catherine Lee Date: Tue Aug 23 01:50:26 2022 +0000 hash update - bug fix for branches (#83865) hash updates for xla were failing because the current pinned hash is a branch, so the git command for getting the date couldn't find the branch due to not having a local version of the branch. Fixed by checking out the branch to make sure it exists locally. 
example of failure: https://github.com/pytorch/pytorch/runs/7913835742?check_suite_focus=true Test plan: made it a pull request trigger and ran it, getting this: https://github.com/pytorch/pytorch/runs/7959221184?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/83865 Approved by: https://github.com/zengk95 commit 7cfc8b78207e6d5f0c2caff435525b4da65ce68d Author: Driss Guessous Date: Tue Aug 23 01:13:14 2022 +0000 [MPS] Move mps_linear to mps dispatch key (#80068) Fixes #77394 This is related to #79920 which adds linear support for nested tensors. Codegen still throws an assert stopping this from compiling. However, I tested locally by commenting out this assert: https://github.com/pytorch/pytorch/blob/61305cd638b6fcd73a0b66b4cde7014fecb9e8ce/tools/autograd/gen_variable_type.py#L798 and the intended behavior appears to be working. I am not sure what changes need to be made to codegen to make this work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80068 Approved by: https://github.com/albanD, https://github.com/malfet, https://github.com/kulinseth commit b18f984307070a883dbfabf138cfcd48a391ec75 Author: Xiao Wang <24860335+xwang233@users.noreply.github.com> Date: Tue Aug 23 01:09:29 2022 +0000 [cmake] Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT (#83716) Close https://github.com/pytorch/pytorch/issues/83500 Change COLORIZE_OUTPUT option to USE_COLORIZE_OUTPUT so that it can be passed and disabled through an environment variable. Not sure why COLORIZE_OUTPUT=0 didn't work before, but USE_COLORIZE_OUTPUT=0 works after the change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83716 Approved by: https://github.com/malfet commit b7afee8a2736b1a9d04de59d66b016838772a95d Author: Zafar Date: Tue Aug 23 00:54:42 2022 +0000 Lazy deprecation import function in torch.nn (#83834) This introduces a mechanism to show a deprecation warning on import. This is achieved by overriding `__getattr__` in the modules that need the migration. See https://peps.python.org/pep-0562/ for details. Some of the code under torch.nn is migrating and will require a deprecation warning on import. Specifically, quantized modules are in the process of migration to the `torch.ao` package (see https://github.com/pytorch/pytorch/issues/81667).
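A minimal sketch of the PEP 562 pattern described above (simplified, with a made-up mapping; not the exact torch.nn implementation):
```
# Module-level __getattr__ (PEP 562): warn and redirect when a submodule has
# moved. The _MIGRATED mapping below is hypothetical, for illustration only.
import importlib
import warnings

_MIGRATED = {"quantized": "torch.ao.nn.quantized"}

def __getattr__(name):
    if name in _MIGRATED:
        new_name = _MIGRATED[name]
        warnings.warn(
            f"{__name__}.{name} has been moved to {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return importlib.import_module(new_name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```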
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83834 Approved by: https://github.com/albanD commit 658f958bc4bb314d9c6030eeaf3e1784792b5d15 Author: XiaobingSuper Date: Tue Aug 23 00:53:37 2022 +0000 fix upsample bf16 issue for channels last path by using high pricsion to compute index (#83847) Given the following case: ``` import torch a = torch.ones(1, 3, 320, 480).bfloat16().to(memory_format=torch.channels_last) out_bf16 = torch.nn.functional.interpolate(a, size = (640, 960), scale_factor = None, mode = 'bilinear', align_corners = False, recompute_scale_factor= None, antialias = False) out_fp32= torch.nn.functional.interpolate(a.float(), size = (640, 960), scale_factor = None, mode = 'bilinear', align_corners = False, recompute_scale_factor= None, antialias = False) print(out_bf16[0, 2, :, :]) print(out_fp32[0, 2, :, :]) ``` the boundary of bfloat16 output gets a wrong value: ``` tensor([[1.0000e+00, 1.0000e+00, 1.0000e+00, ..., 1.0000e+00, 1.0000e+00, 1.0000e+00], [1.0000e+00, 1.0000e+00, 1.0000e+00, ..., 1.0000e+00, 1.0000e+00, 1.0000e+00], [1.0000e+00, 1.0000e+00, 1.0000e+00, ..., 1.0000e+00, 1.0000e+00, 1.0000e+00], ..., [1.0000e+00, 1.0000e+00, 1.0000e+00, ..., 1.0000e+00, 1.0000e+00, 1.0000e+00], [1.0000e+00, 1.0000e+00, 1.0000e+00, ..., 1.0000e+00, 1.0000e+00, 1.0000e+00], [0.0000e+00, 0.0000e+00, 1.8367e-40, ..., 0.0000e+00, 0.0000e+00, 0.0000e+00]], dtype=torch.bfloat16) tensor([[1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.], ..., [1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.]]) ``` the expected behavior is that the bfloat16 output value should also be one. The main reason is that we use low precision to compute the index, see https://github.com/pytorch/pytorch/blob/fcb124406bdf86bc2d15e999d5a3e09b86238bba/aten/src/ATen/native/UpSample.h#L448, we should use a high precison to do the computation as GPU path: https://github.com/pytorch/pytorch/blob/fcb124406bdf86bc2d15e999d5a3e09b86238bba/aten/src/ATen/native/cuda/UpSample.cuh#L123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83847 Approved by: https://github.com/frank-wei commit 80cfafc3857c981d67507f135232bf43da4a1caa Author: Justin Chu Date: Fri Aug 19 19:04:47 2022 -0700 [ONNX] Add quantization support to more single output ops (#83008) - Implement quantization support for single output ops - quantized::sigmoid - quantized::instance_norm - aten::reshape - aten::reshape_as - aten::sum - aten::mean - aten::prod - aten::t - aten::numpy_T - aten::expand - aten::expand_as - aten::embedding - aten::embedding_bag - aten::view - aten::select - aten::eq - aten::ne - aten::gt - aten::lt - aten::le - aten::ge - quantized::layer_norm - aten::elu - aten::selu - aten::maximum - aten::minimum - aten::amax - aten::amin - aten::hardtanh - aten::hardswish - quantized::group_norm - aten::as_strided - quantized::leaky_relu - aten::transpose - Avoid modifying functions in `quantized_args` and have the wrapper closed over `scale` and `zero_point` instead (for purity) - Remove magic number and assign it to INT64_MAX - implement `_unpack_quantized_tensor` for handling quantized tensor unpacking to separate the logic from tuple unpacking and for clearer error handling Pull Request resolved: https://github.com/pytorch/pytorch/pull/83008 Approved by: https://github.com/BowenBao commit 1e4383f7563c9bb2e3c8e6989b6853d1d04f652f Author: Wonjoo Lee Date: Mon Aug 22 22:52:10 2022 +0000 Add lazy shape inference for cholesky op (#83720) 
PyTorch/XLA companion PR: https://github.com/pytorch/xla/pull/3907 --- Add lazy shape inference for cholesky op Pull Request resolved: https://github.com/pytorch/pytorch/pull/83720 Approved by: https://github.com/JackCaoG commit 36f6d91a2d6a913d22d5b80d64148b53243e3584 Author: Jane Xu Date: Mon Aug 22 22:19:41 2022 +0000 Migrate last workflows from 18.04 to 22.04 (#83861) 18.04 is getting deprecated in december--let's migrate them off now. This PR does NOT touch functorch nor third_party workflows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83861 Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/seemethere commit 09331c947cd559211bec20fd016e48f86d48e51f Author: Kshiteej K Date: Mon Aug 22 21:55:01 2022 +0000 [optim] rmsprop: handle complex params as independent real params (#83860) Ref: #65711 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83860 Approved by: https://github.com/albanD commit 62d9f1559e3fc1f807e1e51e95d2d2b03e8bf374 Author: Janosh Riebesell Date: Mon Aug 22 21:42:37 2022 +0000 Fix model type CNN->MLP in functorch ensembling notebook intro (#83603) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83603 Approved by: https://github.com/albanD, https://github.com/zou3519 commit dc557b94ec98e7651ac14975851ff8015188114a Author: Reza Sharifi Date: Mon Aug 22 21:34:42 2022 +0000 Used generator for "any" and "all" (#83844) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83844 Approved by: https://github.com/albanD commit e10c47a7d09c4a558ba87d900033f565e6662812 Author: George Qi Date: Fri Aug 19 14:26:30 2022 +0000 [maskedtensor] adding unary and binary operations (#82837) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82837 Approved by: https://github.com/bhosmer commit daca0ee5e23402b6f11c90978c4655f31657b2ca Author: BowenBao Date: Mon Aug 22 10:14:36 2022 -0700 [ONNX] Introduce ONNXScopeName (#82038) Update `_setup_trace_module_map` to always record module/layer info in `Scope` attribute for nodes. Extend `Scope` name to not only record module typename, but also module object variable name. Both names are formatted and stored as `name` attribute in `Scope`. Introduce `ONNXScopeName` class to manage the formatting and parsing. Updated local function export code adjusting to this update. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82038 Approved by: https://github.com/AllenTiTaiWang, https://github.com/justinchuby, https://github.com/abock, https://github.com/malfet commit 91766360b18be25a5230702c1c855d08c62d0171 Author: zengk95 <34172846+zengk95@users.noreply.github.com> Date: Mon Aug 22 20:17:40 2022 +0000 [mergebot] Post PR Comment on cancel (#82744) When someone cancels a PR merge, it's not apparent that it's canceled unless the user clicks into that job. In this PR, we add a message if the pr gets canceled. The only thing is the user will not receive a comment if the PR is canceled immediately since posting the message requires that the checkout be finished. n/a Tested it on canary https://github.com/pytorch/pytorch-canary/pull/132 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82744 Approved by: https://github.com/huydhn, https://github.com/seemethere commit b136f3f310aa01a8b3c1e63dc0bfda8fd2234b06 Author: joncrall Date: Mon Aug 22 20:07:23 2022 +0000 More doctest refinements. 
(#83317) Follow-up to #82797. Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way. @ezyang @vadimkantorov Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317 Approved by: https://github.com/ezyang commit 9c9f42481761c7f42f47c296a9dbee60f3407b90 Author: Fabian Ricardo Latorre Gomez Date: Mon Aug 22 19:48:46 2022 +0000 modify the signature of method `__getitem__` from `ModuleList` (#83799) The type of the parameter idx can be either slice or int. The same applies to the `Sequential` class. Fixes #83797 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83799 Approved by: https://github.com/malfet, https://github.com/albanD commit 91eb1b9bb93ddd997691295114a5e34bd61793ad Author: Peter Bell Date: Mon Aug 22 14:23:55 2022 +0100 Move _masked opinfos to opinfo/definitions/_masked.py (#83763) Ref #82518 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83763 Approved by: https://github.com/albanD commit 7656ef73f1ae73798ae965da6dedd260b7cb4f01 Author: Peter Bell Date: Mon Aug 22 14:23:55 2022 +0100 Move `torch.special` OpInfos into opinfo/definitions/special.py (#83762) Ref #82518 As with `linalg` this doesn't include ops with an alias in special, only the ones where `special.foo` is the actual name of the opinfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83762 Approved by: https://github.com/albanD commit 35d4fa444b67cbcbe34a862782ddf2d92f5b1ce7 Author: George Petterson Date: Mon Aug 22 19:05:41 2022 +0000 Fix for transposed convolution shape functions (#83557) This fixes an issue with #80860 when the in channels and out channels are different. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83557 Approved by: https://github.com/Gamrix commit eff28d61c9c961e6c2724b78f57441ee2e3e40cb Author: John Clow Date: Wed Aug 17 15:07:57 2022 -0700 [JIT SSA] Allow updating shape functions without recompilation (#83629) In order to avoid extra round trips, and to avoid the confusion of having to manually pull in the latest copy of the shape_functions.py file, shape functions can now be updated without recompilation. This also fixes the cases where people pull in the wrong version of the file. This can happen in cases such as when developers run `python setup.py install` instead of `python setup.py develop` to generate their current copy of PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83629 Approved by: https://github.com/davidberard98 commit 53cda905be74e03161e6732d679be3b9cb2c65b0 Author: PyTorch MergeBot Date: Mon Aug 22 17:47:06 2022 +0000 Revert "Optimize transpose copy on CPU using fbgemm transpose (#83327)" This reverts commit f56720ea7c7ad0bcb4c5af669e28bf7de8122cb6.
Reverted https://github.com/pytorch/pytorch/pull/83327 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks mac functorch tests on trunk https://hud.pytorch.org/pytorch/pytorch/commit/f56720ea7c7ad0bcb4c5af669e28bf7de8122cb6 commit d1be36ceab0bda0aa348846f44bc3c9372e0eda3 Author: Ramin Azarmehr Date: Mon Aug 22 17:07:09 2022 +0000 [MPS] Fix the index error in constant_pad_nd() with single-dimension input (#83745) * Fix the index error in constant_pad_nd() with single-dimension input (#83343) - Also added a test case in test_mps for it * Move padding code into new file Pad.mm Fixes https://github.com/pytorch/pytorch/issues/83343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83745 Approved by: https://github.com/razarmehr commit a6b75bb0990f2c949bf6de4e6ae58f019d61c0ac Author: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com> Date: Mon Aug 22 17:05:53 2022 +0000 [MPS] Fix placeholder case for missing gather graph (#83744) Fixes https://github.com/pytorch/pytorch/issues/82543, https://github.com/pytorch/pytorch/issues/83230 The current Placeholder code relies on finding a gather graph in order to make the data contiguous; otherwise we'll try calling into tensor.contiguous() directly, which, for slice elements, won't do anything. E.g., consider the following basic case where we index a 2 element tensor: ``` tensor_list = torch.tensor([1.2, 1.0], device="mps") for scalar in tensor_list: r_mps = torch.ceil(scalar) r_cpu = torch.ceil(scalar.to("cpu")) self.assertEqual(r_mps.cpu(), r_cpu) ``` The second element 1.0 is a contiguous view tensor (similar to slicing), but it has no gather graph created behind it. In the placeholder, we won't be able to find the graph, thus relying on the fallback case where we call _tensor = src.contiguous();. For an already contiguous tensor, this won't do anything, thus we end up creating the NDArray with all the values of the tensor (1.2 and 1.0 instead of just 1.0). Doing clone instead of contiguous will actually perform a blit behind the scenes and take into consideration the storage_offset of the view when performing the copy. Similarly, the following basic case is also failing because of this issue: ``` x = torch.tensor([1.0, 0.49], device="mps") print(x) # prints 1.0 and 0.0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83744 Approved by: https://github.com/razarmehr commit b8496eb411a4b7af2fcd12a4fe4a6fb1690c8f6a Author: Andrew Or Date: Mon Aug 22 06:53:46 2022 -0700 [Quant] Separate FBGEMM/QNNPACK BackendConfigs (#83566) Summary: Previously we used a single BackendConfig (get_native_backend_config) for both the FBGEMM and QNNPACK backends. However, these two backends have subtle differences in terms of their requirements that cannot be satisfied using a single BackendConfig. Therefore, this commit is the first step towards decoupling the two backends. The real change in functionality will come in a future commit after DTypeConfig supports quant_min/quant_max and scale_min/scale_max. Existing uses of `get_native_backend_config` should not be affected.
Public facing changes: ``` from torch.ao.quantization.backend_config import ( get_fbgemm_backend_config, get_qnnpack_backend_config, ) fbgemm_backend_config = get_fbgemm_backend_config() qnnpack_backend_config = get_qnnpack_backend_config() ``` Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps Reviewers: jerryzh168 Subscribers: jerryzh168 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83566 Approved by: https://github.com/jerryzh168 commit 07d0c9ec75c1fbf7b14d67010f49548d6cb5574c Author: Nikolay Korovaiko Date: Mon Aug 22 16:41:48 2022 +0000 make sym sizes be computed lazily (#82233) Creating size nodes proactively for each tensor is leading to increased memory pressure, as they hold strong pointers to tensor data. See https://github.com/pytorch/pytorch/issues/80942. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82233 Approved by: https://github.com/wconstab commit f56720ea7c7ad0bcb4c5af669e28bf7de8122cb6 Author: ecao Date: Mon Aug 22 16:39:33 2022 +0000 Optimize transpose copy on CPU using fbgemm transpose (#83327) Optimize transpose copy on CPU using fbgemm transpose single socket (28cores): ``` before: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 4.819e-05 ms; bf16: 4.846e-05 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.000171 ms; bf16: 0.000129 ms after: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 2.439e-05 ms; bf16: 2.152e-05 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.000132 ms; bf16: 3.916e-05 ms ``` single core: ``` before: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 0.00109 ms; bf16: 0.00103 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.00339 ms; bf16: 0.00295 ms after: torch.Size([10, 128, 10, 124]) -> torch.Size([10, 128, 124, 10]) fp32: 0.000566 ms; bf16: 0.000382 ms torch.Size([10, 128, 30, 124]) -> torch.Size([10, 128, 124, 30]) fp32: 0.00282 ms; bf16: 0.000999 ms ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83327 Approved by: https://github.com/frank-wei commit fcb124406bdf86bc2d15e999d5a3e09b86238bba Author: Nikolay Korovaiko Date: Fri Aug 19 20:37:32 2022 -0700 release the current symintnode in the move c-tor (#83789) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83789 Approved by: https://github.com/ezyang commit b47f712b7b571831c7e5a0690bccef8d8b1a54b6 Author: Nikolay Korovaiko Date: Fri Aug 19 20:37:24 2022 -0700 Fix uninitialized member if the default c-tor is called (#83788) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83788 Approved by: https://github.com/ezyang commit 09157c76c04546f9f55b15a37dda938d6327df8a Author: Mike Iovine Date: Mon Aug 22 13:42:47 2022 +0000 [Static Runtime] Add schema checks for aten::list (#83753) Summary: The previous implementation assumed that there was only one overload and unconditionally tried to convert its input into a string. Some users were running into crashes because of this. Added handling for the list overload and schema checks. Also, I managed to uncover another bug when writing tests for this case (yikes). Returning inputs didn't work because the input cleanup process would destroy the output. Extended `CreateOwnedRefsForSpecialIValues` to fix that.
Test Plan: CI + new unit tests Differential Revision: D38870803 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83753 Approved by: https://github.com/tenpercent, https://github.com/albanD commit d46dba18f7e96811c672afe629d891ad6b8d095d Author: lezcano Date: Sun Aug 21 23:56:09 2022 +0000 Simplify reshape and fix _refs.unflatten (#83827) Unflatten was incorrectly calling into `reshape` rather than `view`. When looking at the checks performed in `reshape`, I saw that the in PrimTorch is quite divergent from that in PyTorch, to the point that it took me some time to be able to prove that they were equivalent. I refactored that part into a separate function, and I implemented the logic that we have in ATen, together with the same errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83827 Approved by: https://github.com/ngimel commit 473b733bae7009945cc5712699d346678e8a40ff Author: Ivan Yashchuk Date: Mon Aug 22 09:12:13 2022 +0000 Replace .new_zeros(()) with 0.0 in torch/_decomp/decompositions (#83734) `new_zeros` is decomposed into `prims.empty_strided`+`prims.fill`+`prims.copy_to` and none of these are supported by prims+nvFuser executor currently. Replacing it with 0.0 makes these backward decompositions nvFuser friendly. Example with `torch.ops.aten.hardsigmoid_backward.default`: ```py opcode name target args kwargs ------------- ------------------------ -------------------------------- ------------------------------------------------------------ ---------------------------------------------------------------------------------------- placeholder a_1 a_1 () {} placeholder g_1 g_1 () {} call_function gt_default nvprims.gt.default (a_1, -3.0) {} call_function lt_default nvprims.lt.default (a_1, 3.0) {} call_function bitwise_and_default nvprims.bitwise_and.default (gt_default, lt_default) {} call_function mul_default nvprims.mul.default (g_1, 0.16666666666666666) {} call_function empty_strided prims.empty_strided.default ([], []) {'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False} call_function fill_default prims.fill.default (empty_strided, 0) {} call_function copy_to_default prims.copy_to.default (empty_strided, fill_default) {} call_function broadcast_in_dim_default nvprims.broadcast_in_dim.default (copy_to_default, [3, 2], []) {} call_function where_default nvprims.where.default (bitwise_and_default, mul_default, broadcast_in_dim_default) {} output output output (where_default,) {} opcode name target args kwargs ------------- ------------------- --------------------------- --------------------------------------- -------- placeholder a_1 a_1 () {} placeholder g_1 g_1 () {} call_function gt_default nvprims.gt.default (a_1, -3.0) {} call_function lt_default nvprims.lt.default (a_1, 3.0) {} call_function bitwise_and_default nvprims.bitwise_and.default (gt_default, lt_default) {} call_function mul_default nvprims.mul.default (g_1, 0.16666666666666666) {} call_function where_default nvprims.where.default (bitwise_and_default, mul_default, 0.0) {} output output output (where_default,) {} Pull Request resolved: https://github.com/pytorch/pytorch/pull/83734 Approved by: https://github.com/Chillee commit 6a9c02339d02fe2f701e17ae7d7f3304dab15d98 Author: PyTorch MergeBot Date: Mon Aug 22 07:32:37 2022 +0000 Revert "[quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713)" This reverts commit 432f037498e3f470f1f6d2a5cc7c6ae8eb4fc870. 
Reverted https://github.com/pytorch/pytorch/pull/78713 on behalf of https://github.com/janeyx99 due to Reverting for breaking (trunk-only) ios build commit b1a7b67529110ce6cfdb50b9ea9e3e0ccf8196bc Author: PyTorch MergeBot Date: Mon Aug 22 07:30:48 2022 +0000 Revert "[quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714)" This reverts commit e6fb97d8ae0d2a45e26c9a597426f1ded13d3aec. Reverted https://github.com/pytorch/pytorch/pull/78714 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted commit 355d343fa85a6c9ae415bddaaf5352c6ce850f1e Author: PyTorch MergeBot Date: Mon Aug 22 07:29:15 2022 +0000 Revert "[quant][ao_migration] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` (#78715)" This reverts commit a7344e52b9d746062923647ae00ca38578d272d4. Reverted https://github.com/pytorch/pytorch/pull/78715 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted commit e9dd4d5adf391ed38b0b9152493958fc3d9a1350 Author: PyTorch MergeBot Date: Mon Aug 22 07:26:43 2022 +0000 Revert "[quant][ao_migration] `torch.nn.quantizable` → `torch.ao.nn.quantizable`. (#78717)" This reverts commit e0876feb493d4378d5aedced367eeaae75339741. Reverted https://github.com/pytorch/pytorch/pull/78717 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted commit 4cbb1986fe9e1f0ae3d352686378808789aa9186 Author: PyTorch MergeBot Date: Mon Aug 22 07:23:24 2022 +0000 Revert "[quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716)" This reverts commit 7cd2fa1d388bf240cd33ff933dc120e74ebc2eb3. Reverted https://github.com/pytorch/pytorch/pull/78716 on behalf of https://github.com/janeyx99 due to sorry, reverting so https://github.com/pytorch/pytorch/pull/78713 could be cleanly reverted commit 3c6c39e66e99a09677739a72c339afbd79cdc12f Author: Alex Beloi Date: Mon Aug 22 06:54:18 2022 +0000 [fx] refactor fba_passes into FBAPassManagerBuilder (#83268) Summary: This diff integrate FBAPassManagerBuilder as the primary orchestrator of FBA-FX passes Reviewed By: jfix71, dborkovic Differential Revision: D38186354 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83268 Approved by: https://github.com/dborkovic commit 7cd2fa1d388bf240cd33ff933dc120e74ebc2eb3 Author: zaf Date: Sun Aug 21 19:34:58 2022 -0700 [quant][ao_migration] `torch.nn.qat` → `torch.ao.nn.qat` (#78716) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. 
The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [X] [Current PR] `torch.nn.qat` → `torch.ao.nn.qat` - [X] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [X] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36861197](https://our.internmc.facebook.com/intern/diff/D36861197/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861197/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78716 Approved by: https://github.com/jerryzh168 commit e0876feb493d4378d5aedced367eeaae75339741 Author: zaf Date: Sun Aug 21 19:34:56 2022 -0700 [quant][ao_migration] `torch.nn.quantizable` → `torch.ao.nn.quantizable`. (#78717) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [X] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [X] [Current PR] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36861090](https://our.internmc.facebook.com/intern/diff/D36861090/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36861090/)! 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78717 Approved by: https://github.com/jerryzh168 commit a7344e52b9d746062923647ae00ca38578d272d4 Author: zaf Date: Sun Aug 21 19:34:54 2022 -0700 [quant][ao_migration] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` (#78715) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [X] [Current PR] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - None Differential Revision: [D36860927](https://our.internmc.facebook.com/intern/diff/D36860927/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860927/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78715 Approved by: https://github.com/jerryzh168 commit 08126c8967937be9e6be7bbb34a2c01b84aa0c1d Author: Animesh Jain Date: Mon Aug 22 05:22:30 2022 +0000 Minifier fixes (#83754) cc @Chillee Pull Request resolved: https://github.com/pytorch/pytorch/pull/83754 Approved by: https://github.com/Chillee commit e6fb97d8ae0d2a45e26c9a597426f1ded13d3aec Author: zaf Date: Sun Aug 21 19:34:53 2022 -0700 [quant][ao_migration] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` (#78714) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. 
The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [X] [Current PR] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 - [BC test](test/quantization/bc/test_backward_compatibility.py) @vkuzo - [IR emitter](torch/csrc/jit/frontend/ir_emitter.cpp) @jamesr66a - [JIT serialization](torch/csrc/jit/serialization/import_source.cpp) @IvanKobzarev @jamesr66a Differential Revision: [D36860660](https://our.internmc.facebook.com/intern/diff/D36860660/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36860660/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78714 Approved by: https://github.com/jerryzh168 commit 8948fdc525488c08c703befa704b4c4179732e3c Author: chenlai Date: Sun Aug 21 14:13:02 2022 -0700 Switch mobile targets to flatbuffers_mobile (#82829) Differential Revision: [D38412635](https://our.internmc.facebook.com/intern/diff/D38412635/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38412635/)! Differential Revision: [D38412635](https://our.internmc.facebook.com/intern/diff/D38412635) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82829 Approved by: https://github.com/qihqi commit f0eb841d209f251d6a735827d4b903962d0d31b8 Author: Emilio Castillo Date: Mon Aug 22 03:37:10 2022 +0000 Make `torch.optim.RMSprop` differentiable (#83578) Blocked by #82205 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83578 Approved by: https://github.com/albanD commit ac39d2bd6e423c215338ce72b150afad2afe924c Author: Edward Z. Yang Date: Sat Aug 20 22:52:56 2022 -0400 Make negative integer test always done for Int to SymInt (#83815) Otherwise, it would be easy to trigger arbitrary memory access by passing a sufficiently negative integer to the API. Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83815 Approved by: https://github.com/Chillee commit 4902254b9b595adf1a0346d6f79a6c7b145dbcaa Author: migeedz Date: Thu Aug 18 10:13:37 2022 -0700 fix torch._C._nn.linear bug (#83682) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83682 Approved by: https://github.com/jansel commit da6cd12173194559a1420e94ca3d0a6f6319929e Author: migeedz Date: Thu Aug 18 10:13:36 2022 -0700 gt constraint heutristic (#83334) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83334 Approved by: https://github.com/jansel commit 432f037498e3f470f1f6d2a5cc7c6ae8eb4fc870 Author: zaf Date: Sun Aug 21 14:54:38 2022 -0700 [quant][ao_migration] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` (#78713) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [X] [Current PR] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - Documentation @vkuzo - docs/source/conf.py - docs/source/quantization.rst - [quantize_fx](torch/ao/quantization/quantize_fx.py) @jerryzh168 - [common test routine](test/quantization/ao_migration/common.py) @HDCharles - JIT stuff @jamesr66a - torch/csrc/jit/passes/hoist_conv_packed_params.cpp - torch/csrc/jit/passes/quantization/helper.h - torch/csrc/jit/serialization/import_source.cpp Differential Revision: [D36860145](https://our.internmc.facebook.com/intern/diff/D36860145/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78713 Approved by: https://github.com/jerryzh168 commit 765fd77d9a96983e1a2adf496ac2fe66b4825f45 Author: Eli Uriegas Date: Sun Aug 21 12:33:41 2022 -0700 ci: Switch binary builds to github artifacting (#83778) Switches binary builds artifacting from s3 artifact solution to github's artifact solution. 
Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/83778 Approved by: https://github.com/malfet commit 91e754b268c1869df5b2836f15c73e6ec1e265f1 Author: Nikita Shulga Date: Fri Aug 19 22:01:43 2022 +0000 [BE] setup.py refactors (#83635) No function changes, just move stuff around: - Move main code to `main` routine - Define torch and torchgen package data list in local vars Pull Request resolved: https://github.com/pytorch/pytorch/pull/83635 Approved by: https://github.com/kit1980 commit 5c5a5f150589b63220ffa2c8da11781d2a593a2b Author: Rui Zhu Date: Sun Aug 21 06:51:11 2022 +0000 Add HIP libs into torch depoly init list & corresponding dependency for CURE benchmark running on AMD (#83434) Summary: This diff adds needed targets for CURE benchmark on AMD, and also add hip lib to torch deploy init list Test Plan: on AMD host fbcode/, With model generated by D38509136 model.pt. cp model.pt /tmp/textray_v20220509.pt buck build mode/{dev-nosan,amd-gpu} mode/lower-locally -c fbcode.enable_gpu_sections=true -c fbcode.rocm_arch=mi100 -c fbcode.platform=platform010 //accelerators/tools/benchmark:PyTorchPredictorInferenceBenchmark buck-out/gen/accelerators/tools/benchmark/PyTorchPredictorInferenceBenchmark --replay_record_format recordio --replay_record_source /tmp/textray_20220509_prod.recordio --model_path /tmp/textray_v20220509.pt --batch_size=64 --batching_threads=1 --max_batch_wait_ms=500 --min_threads 5 --max_threads 5 --timeout_seconds 120 --check_allow_extra_field --diff_threshold 1e-3 --equal_threshold 1e-4 --thread_step 5 --use_cuda Reviewed By: mikekgfb Differential Revision: D38596119 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83434 Approved by: https://github.com/erichan1 commit 09e837634bc76c4380db7119e0e997816d584844 Author: Taylor Robie Date: Fri Aug 19 18:42:39 2022 -0700 [Profiler][Minor] Set end time on python events when profiling stops. (#83621) We don't have an end event for calls that are ongoing when profiling stops. (e.g. main) This cropped up when I was adding checks for negative durations. I also refactored `populate` to use a pop method. This not only allows me to implement this fix, but should also provide a convenient entry point for https://github.com/pytorch/pytorch/pull/82154 Differential Revision: [D38426342](https://our.internmc.facebook.com/intern/diff/D38426342/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83621 Approved by: https://github.com/slgong-fb commit 37f91d700bcb80c5f254c734a420ad89928771f4 Author: Taylor Robie Date: Fri Aug 19 18:42:37 2022 -0700 [Profiler] Break metadata generation into multiple visitors (#83033) This is a no-op change which establishes a base class to handle Result to Kineto details, and then splits the existing logging logic. (With the idea that at some point we'll probably conditionally run things to manage trace size.) Differential Revision: [D38469409](https://our.internmc.facebook.com/intern/diff/D38469409/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83033 Approved by: https://github.com/aaronenyeshi commit f295dd0735e05146106d6fc25df1449ff76d078b Author: Taylor Robie Date: Fri Aug 19 18:42:36 2022 -0700 [Profiler][Minor] Add typed visit method to Result. (#82993) Often in post processing we want to step into a specific typed context: "If X is a torch op, do Y". This PR simply adds an ergonomic way to write such cases. 
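The typed-visit change in #82993 is a C++ addition to `torch::profiler::impl::Result`, but the pattern it enables ("if X is a torch op, do Y") can be sketched in Python; everything below (`TorchOpEvent`, `AllocationEvent`, `visit`) is an illustrative assumption, not the actual API:

```python
# Rough Python analogue of the "typed visit" idea from #82993; the real change
# is a C++ method on torch::profiler::impl::Result, so these names are hypothetical.
from dataclasses import dataclass
from functools import singledispatch

@dataclass
class TorchOpEvent:
    name: str

@dataclass
class AllocationEvent:
    nbytes: int

@singledispatch
def visit(event):
    # Default case: ignore event kinds we don't care about.
    pass

@visit.register
def _(event: TorchOpEvent):
    print(f"torch op: {event.name}")      # "if X is a torch op, do Y"

@visit.register
def _(event: AllocationEvent):
    print(f"allocation: {event.nbytes} bytes")

for e in [TorchOpEvent("aten::add"), AllocationEvent(1024)]:
    visit(e)
```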
Differential Revision: [D38426341](https://our.internmc.facebook.com/intern/diff/D38426341/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82993 Approved by: https://github.com/chaekit commit 294f9d12826d63df1a736ffd4ef9dab2af2d10d0 Author: Taylor Robie Date: Fri Aug 19 18:42:34 2022 -0700 [Profiler][Minor] Organize collection.h/.cpp (#82992) Collection of Torch ops is quite complex compared to backend events / allocations / ooms. Python is also complex, however it is already factored into a standalone unit. This PR just shuffles the contents of collection.cpp to group the Torch op specific parts together, and does various cleanups to the code. Differential Revision: [D38426344](https://our.internmc.facebook.com/intern/diff/D38426344/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82992 Approved by: https://github.com/chaekit commit c9475fa927ef5557aa54e4e9a7bc2a9ab98cdcf7 Author: chenlai Date: Fri Aug 19 16:02:29 2022 -0700 Create flatbuffers_mobile (#82828) Differential Revision: [D38412636](https://our.internmc.facebook.com/intern/diff/D38412636/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38412636/)! Differential Revision: [D38412636](https://our.internmc.facebook.com/intern/diff/D38412636) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82828 Approved by: https://github.com/qihqi commit f45cd00d7ac1ce02c890bd73a96dc4be2233ad0d Author: Horace He Date: Sat Aug 20 01:46:32 2022 +0000 Added inference to context when only compiling forwards (#83783) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83783 Approved by: https://github.com/pyjhzwh, https://github.com/jansel commit 08c03c91d70a625f70487af81cd54edd5d16aa1a Author: Pallab Bhattacharya Date: Sat Aug 20 12:29:02 2022 +0000 guard include of x64 intrinsics headers (#83793) Summary: make inclusion of immintrin.h only for x64 Test Plan: CI Differential Revision: D38886597 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83793 Approved by: https://github.com/ajtulloch commit e0f2eba93d2804d22cd53ea8c09a479ae546dc7f Author: Rui Zhu Date: Sat Aug 20 10:02:08 2022 +0000 Move odd num_head in TransformerEncoder to slow_path (#83483) Summary: odd nhead is not supported for masked softmax, therefore we just move it to use old slow_path Test Plan: CI Differential Revision: D38720086 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83483 Approved by: https://github.com/erichan1 commit 5a1f6d50a9a4059315c5530703e5dc9e38229715 Author: David Berard Date: Sat Aug 20 05:25:03 2022 +0000 Skip pr-sanity-checks with skip-pr-sanity-checks label (#83751) see #83752 for demo Pull Request resolved: https://github.com/pytorch/pytorch/pull/83751 Approved by: https://github.com/albanD, https://github.com/malfet commit f0ee21fe0ad3ac5ae9b07a863697346651c7e230 Author: Huy Do Date: Sat Aug 20 06:16:54 2022 +0000 Update cpuinfo to the latest commit (#83620) This hasn't been updated for a while, so pulling the latest commit from https://github.com/pytorch/cpuinfo. 
I wonder if it breaks anything Fixes #83594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83620 Approved by: https://github.com/malfet commit b2ddef28d70ca96f75a39f535a41a2d130f14d40 Author: Huy Do Date: Sat Aug 20 03:37:21 2022 +0000 Freeze the rest of python docs requirement (#83785) This is to avoid similar issue like #83774 `pip freeze -r requirements.txt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83785 Approved by: https://github.com/malfet commit 9732a7d84ee72521d006c9617430c4415016daef Author: Adam J. Stewart Date: Sat Aug 20 01:26:30 2022 +0000 torch.cartesian_prod: add type hints (#81377) Noticed this function was missing type hints. There are plenty more obviously, but this is the only one I happen to be using that is missing type hints. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81377 Approved by: https://github.com/malfet commit 0e0af73ba20f0fae3a20c385cc112a3be12337ef Author: Horace He Date: Sat Aug 20 00:47:11 2022 +0000 Add support for partial decompositions in make_fx (#83770) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83770 Approved by: https://github.com/ngimel commit d5a74efc82f632444f71dc19b111cc57bd406d19 Author: Edward Z. Yang Date: Thu Aug 18 19:22:38 2022 -0700 Don't extract tensor metadata from sparse tensors (#83669) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83669 Approved by: https://github.com/Chillee, https://github.com/bdhirsh commit 329deb9757469340379efe3edb09b7dac814a4e7 Author: Edward Z. Yang Date: Thu Aug 18 19:22:37 2022 -0700 Refactor is_X_like, better invariant checking for SymInt overload (#83668) Add is_symint_like, by way of is_base_ty_like which generalizes the pattern for is_tensor_like and is_generator_like. Now that we can query if a signature contains a SymInt, we can enforce that you must name the overload with SymInt if the signature contains SymInt. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83668 Approved by: https://github.com/bdhirsh, https://github.com/larryliu0820 commit 7fe19c03e482be1d108fbfca8fb0214e133970ad Author: Brian Hirsh Date: Fri Aug 19 11:56:55 2022 -0700 fix functionalization <> fake tensor mode (#83701) The bug is that: (1) functionalization kernels internally call `at::empty_strided()` to construct meta tensors, and then call the meta tensor op (2) This happens with the Python dispatch key already added to the TLS exclude set, so we expect these meta tensors never to enter python (3) When calling detach() though, `TensorImpl::shallow_copy_and_detach()` will currently always call into python when a PythonMode is set. Instead, I updated it to check if the Python key is in the TLS exclude set first. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83701 Approved by: https://github.com/ezyang commit e9e7363854978e2f70cc65200b5b76a462e37672 Author: Brian Hirsh Date: Fri Aug 19 11:56:54 2022 -0700 reinplacing pass fixes for torchbench + huggingface (#83626) I'm testing out turning on re-inplacing + functionalization by default with the AOTAutograd + eager backend on torchbench + huggingface models. This PR contains a few bug fixes from turning re-inplacing on: (1) Handle more gracefully when FakeTensorMode is already turned on when you call reinplace (2) More robust detection for when an inplace variant of an op exists (the dumb bug was that `pow.Scalar` doesn't have an inplace variant, even though there are several overloads of `pow_`. 
None of them are eligible though (3) Avoid re-inplacing when it would require resizing the input buffer. This isn't allowed, because inplace ops aren't allowed to resize their inputs. For the last one, I gave the two main examples in more detail in the comments. Important cases are: ``` torch.add(tensor[1, 4], tensor[4, 4]) torch.ge(a, b) ``` (4) There's some logic around keeping `storage_to_nodes` up to date when we see a view op: if we re-inplace `out = a.add(...)`, and later in the program we encounter a "later_node",`out.view(..)`, and need to replace it with `a.view(...)`, then we need to update some metadata structures. I had to fix that logic: specifically, if "later_node" isn't a dispatcher op, (e.g. if it's an FX output node), I wasn't properly handling the case where the node's fake_meta info was not a tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83626 Approved by: https://github.com/ezyang commit cce32c6fa16bea7b1574bca76f0c18a78b68a04a Author: Brian Hirsh Date: Fri Aug 19 06:29:43 2022 -0700 functionalization: handle models that resize their program inputs (#83542) Context: When turning on functionalization + fake tensor mode and running `resnet50_quantized_qat` from torchbench, the model fails (and should be fixed by this PR). Before landing this PR, we need ProxyTensors (soon to be just fake tensors after https://github.com/pytorch/pytorch/pull/83330) to be resizable Pull Request resolved: https://github.com/pytorch/pytorch/pull/83542 Approved by: https://github.com/ezyang commit 0c24af498578480547822bce5c3aa43a6fa8b920 Author: Brian Hirsh Date: Fri Aug 19 06:29:43 2022 -0700 Always allow tensor metadata changes (#83590) Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`. This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring. My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes. It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang. (Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590 Approved by: https://github.com/ezyang commit a7d8863c7af99da8c5e10e2ed942f15d35980b86 Author: ssjia Date: Fri Aug 19 13:12:22 2022 -0700 [vulkan][ez] lock cache mutex when purging for ShaderCache (#83738) Acquire mutex before clearing cache in `ShaderCache`. Differential Revision: [D38865341](https://our.internmc.facebook.com/intern/diff/D38865341/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83738 Approved by: https://github.com/kirklandsign commit 155343ef2d9e75701ddf772e942a7b76b1f80570 Author: Huy Do Date: Fri Aug 19 22:27:29 2022 +0000 Pin sphinxcontrib.katex to 0.8.6 (#83774) sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering but it doesn't seem to work and hangs around idly. The initial thought is probably something related to Docker setup. We can investigate this later. 
Here is the release change log from the [sphinxcontrib-katex](https://github.com/hagenw/sphinxcontrib-katex/commit/e27a051532dee33fbe329636b042426bf3ad6e26) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83774 Approved by: https://github.com/janeyx99, https://github.com/malfet commit 2efbdbfcc490660e875ffe0bb8fad3dad7e8920c Author: Horace He Date: Fri Aug 19 04:37:55 2022 +0000 Make some optimizations to minifier (#83641) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83641 Approved by: https://github.com/eellison commit 13f42069a8659f24005e92a52432b9b91423150b Author: Jerry Zhang Date: Fri Aug 19 04:18:09 2022 +0000 [quant][fx][refactor] Rename qconfig_utils.py to qconfig_mapping_utils.py in torch/ao/quantization/fx (#83369) Summary: att, it seems more appropriate to name it qconfig_mapping_utils, also we probably want to move the functions in torch/ao/quantization/qconfig_mapping_utils.py to torch/ao/quantization/fx/qconfig_mapping_utils.py as well Test Plan: python test/test_quantization.py TestQuantizeFx Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83369 Approved by: https://github.com/andrewor14 commit 1f38225b56b873c944196241ea448445a61798fd Author: Nikita Karetnikov Date: Fri Aug 19 18:51:57 2022 +0000 [primTorch] Add ref for `new_empty_strided` (#82466) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82466 Approved by: https://github.com/ezyang, https://github.com/ngimel commit 307421930a0e1ac610c42c07470813ea14f53964 Author: Aashaka Shah Date: Fri Aug 19 17:59:28 2022 +0000 Enable pg_nccl to perform vector AllGather for uneven output splits (#83713) Pushing PR on behalf of @aashaka To replace: https://github.com/pytorch/pytorch/pull/82835 Summary: A vector all_gather requires each process to gather other process' inputs into an output tensor according to the ouput list provided. Internally, pg_nccl.allgather will coalesce a list of pg_nccl._broadcast_oop to implement a vector all-gather in the case when the any shape is different in the output list. Otherwise, it will perform a ncclAllGather as usual. - This change adds an out-of-place `_broadcast_oop` function to ProcessGroupNCCL. It allows broadcasting an input tensor and placing the output in a separate output tensor. Since allgather provides an out-of-place API, an allgather_v semantic implemented inside `pg_nccl.allgather` also needs to support out-of-place, for which an out-of-place broadcast is required to be added. Test Plan: Added a new test `test_all_gather_v_cuda` for all_gather_v to `distributed_nccl_spawn`. Differential Revision: D37735263 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83713 Approved by: https://github.com/mingzhe09088 commit 1fa9a377d01ba8e1a0b65cc2d05ed8a2d53a89f2 Author: Taylor Robie Date: Thu Aug 18 17:51:32 2022 -0700 [Profiler] Start moving python bindings out of autograd (#82584) A lot of profiler code still lives in autograd for historic reasons. However as we formalize and clean up profiler internals it makes sense to pull more and more into the profiler folders/namespace. For now I'm just moving some of the core config data structures and those related to `torch::profiler::impl::Result` to keep the scope manageable. 
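For the vector all-gather (#83713) described above, here is a minimal per-rank sketch of the uneven-output usage it enables; the helper name and setup are assumptions, and an already-initialized NCCL process group is presumed:

```python
# Per-rank sketch of the "all_gather_v" semantics from #83713 (assumptions:
# process group already initialized, rank/world_size come from the launcher).
import torch
import torch.distributed as dist

def gather_uneven(rank: int, world_size: int, device: torch.device):
    # Each rank contributes `rank + 1` rows, so output shapes differ per rank.
    my_input = torch.full((rank + 1, 4), float(rank), device=device)
    outputs = [torch.empty(r + 1, 4, device=device) for r in range(world_size)]
    # With mismatched shapes in `outputs`, ProcessGroupNCCL coalesces
    # out-of-place broadcasts internally instead of a single ncclAllGather.
    dist.all_gather(outputs, my_input)
    return outputs
```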
Differential Revision: [D37961462](https://our.internmc.facebook.com/intern/diff/D37961462/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37961462/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/82584 Approved by: https://github.com/albanD, https://github.com/Gamrix commit 7453019e7943717c79b4b3b07e01f2ae5d7bc89f Author: Daniel Recoskie Date: Thu Aug 18 07:55:17 2022 -0700 Remove duplicate_dequantize_node and remove_extra_dequantize (#83611) Summary: removed duplicate_dequantize_node and remove_extra_dequantize Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: jerryzh168 Subscribers: Tasks: Tags: Differential Revision: [D38841052](https://our.internmc.facebook.com/intern/diff/D38841052) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83611 Approved by: https://github.com/jerryzh168 commit da520a43f228d4b2f5fda6ec0412080504822fc7 Author: Manuel Candales Date: Fri Aug 19 16:28:59 2022 +0000 [Vulkan] Fix issues in GRU and LSTM (#83722) Summary: This diffs fixes several issues in GRU and LSTM vulkan ops: - Add create_gru_context and create_lstm_context to vulkanFoldPrePackingOps - Add filter to insertPrePackedGruOp and insertPrePackedLstmOp to avoid matching gru.data and lstm.data usages - Fixed output dimension of GRU and LSTM - Allowed batch_first to be false when batch=1 and seq=1 Test Plan: Check that optimize_for_mobile runs and correctly folds the create context ops ``` buck run :export_for_mobile ~/ferraris/ferraris.ptl ~/ferraris ``` Check that vulkan api tests are still passing ``` buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 ``` Reviewed By: SS-JIA Differential Revision: D38811967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83722 Approved by: https://github.com/SS-JIA commit 108a1fb17374974534254f5b4652bbf2b3dff0e5 Author: Ivan Yashchuk Date: Fri Aug 19 16:20:34 2022 +0000 Avoid using fx.Interpreter in nvfuser executor function (#83607) Using fx.Interpreter is a nice way of modifying the calls inside of FX graphs, but it introduces unnecessary overhead in this case. Example: ```py import torch from torch.fx.experimental.proxy_tensor import make_fx from torch._prims.context import TorchRefsNvfuserCapabilityMode from torch._prims.executor import execute a = torch.randn(3, 2, dtype=torch.float16, device="cuda") s = torch.sigmoid d = torch.digamma # digamma is not supported in nvfuser and aten eager execution is used def func(a): return s(d(s(d(s(d(s(a))))))) with TorchRefsNvfuserCapabilityMode(): gm = make_fx(func)(a) %%timeit execute(gm, a, executor="nvfuser"); torch.cuda.synchronize(); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83607 Approved by: https://github.com/ezyang commit e0d26ee0927aa4193a444840e1d52699f65f9724 Author: ssjia Date: Thu Aug 18 14:49:04 2022 -0700 [vulkan] Throw std::runtime_error instead of using TORCH_CHECK when creating Vulkan context/runtime fails (#83627) Currently, if unable to load the global context/runtime, an error will be thrown using `TORCH_CHECK(false, ...)`. This diff changes it to throw a `std::runtime_error` directly instead. The reason for this is that `TORCH_CHECK()` will not preserve error messages `#ifdef STRIP_ERROR_MESSAGES`. 
However, it is more useful for the error reason to be always present for the purpose of diagnosing driver support issues and detecting if a model load failure is related to Vulkan. Differential Revision: [D38800348](https://our.internmc.facebook.com/intern/diff/D38800348/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83627 Approved by: https://github.com/salilsdesai commit 1407e6728cb29d33eceffc55428f82baceb628c5 Author: jjsjann123 Date: Fri Aug 19 16:05:39 2022 +0000 Nvfuser python api patch take 2 (#83684) landing #83645 again. Previously we are breaking on codegen bf16 kernel for cuda TK 10.2. Added a short-cut to disable bf tests on pre cuda 11 build. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83684 Approved by: https://github.com/ngimel commit 0ec7fc13d6e02ad0f09bd115cba89fa8304d4f12 Author: Edward Z. Yang Date: Thu Aug 18 19:22:37 2022 -0700 Refactor CppSignatureGroup to collect signatures as list. (#83667) This makes it easier to add more signatures to the signature group, as relevant logic which needs to run for each signature no longer needs to be adjusted. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83667 Approved by: https://github.com/larryliu0820, https://github.com/bdhirsh commit 03e322c8d64056db748dd28ca6121180db2f9fe3 Author: Sherlock Huang Date: Fri Aug 19 15:55:44 2022 +0000 Switch fx.replace_pattern to use new SubgraphMatcher (#83717) This is a duplicate of https://github.com/pytorch/pytorch/pull/82295 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83717 Approved by: https://github.com/ezyang commit 73652dd1c40f6824aaa432fdfb9d2da82fc317aa Author: Ezgi Çiçek Date: Fri Aug 19 14:14:32 2022 +0000 Avoid unnecessary copy of pointeeSet in MemoryDAG::setWildcards (#83681) Test Plan: CI By making use of latest [copy constructor tags](https://fb.workplace.com/groups/638005567605797/permalink/731932211546465/), [this strobelight query](https://fburl.com/scuba/strobelight_services/5nizhawx) shows that this is used in several services such as `mast_hpc_job/customer_application` or `mui/mui_service_bi` Differential Revision: D38830797 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83681 Approved by: https://github.com/mikeiovine commit 93eedc51a5f8a6ba18bf4c87a26e1cc3b34cc177 Author: Richard Zou Date: Thu Aug 18 06:57:23 2022 -0700 [functorch] re-classify linalg.eigh in vmap testing (#83614) Similar to the previous PR, linalg.eigh doesn't give unique output. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83614 Approved by: https://github.com/samdow commit 8788e92f0f3f23249161fdb91aafa4ecc7d4f131 Author: Peter Bell Date: Fri Aug 19 03:32:18 2022 +0100 Move `torch.linalg` opinfos to opinfo.definitions (2/2) (#83554) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83554 Approved by: https://github.com/albanD commit 8dbb0990bccb7b12f986f5cbc182c384041334ff Author: Peter Bell Date: Fri Aug 19 03:32:18 2022 +0100 Move `torch.linalg` opinfos to opinfo.definitions (1/2) (#83547) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83547 Approved by: https://github.com/albanD commit 4aeb98dee9756119f6a6414338e92f2b52c83346 Author: Peter Bell Date: Fri Aug 19 03:32:17 2022 +0100 Move RefInfo classes into opinfo.refs (#83563) Given that there is already a clear `op_db`, `python_ref_db` split I think it makes sense to have the `RefInfo` classes be defined in a different file. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83563 Approved by: https://github.com/albanD commit f4caeb25e94abdb40f8217a84dbe4d55a21f7d7a Author: Peter Bell Date: Fri Aug 19 03:32:17 2022 +0100 Move gradcheck_wrapper and clone_sample funcs into opinfo.core (#83560) The linalg OpInfos need these, so moving them into core to prevent circular dependencies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83560 Approved by: https://github.com/albanD commit ae68e455be3c264b2b3bcc61819dda5627f751a9 Author: Peter Bell Date: Fri Aug 19 03:32:16 2022 +0100 Enable formatting in all of testing/_internal/opinfo (#83559) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83559 Approved by: https://github.com/albanD commit b4bc0d249f782d3afc877e61189f7427f6d55968 Author: Kshiteej K Date: Fri Aug 19 11:59:31 2022 +0000 [composite compliance] batch_norm (#79990) Fixes https://github.com/pytorch/pytorch/issues/76283 Ref: #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79990 Approved by: https://github.com/zou3519 commit a6f777c80dd7fca5022ba102b86833089b4d3444 Author: Luca Wehrstedt Date: Fri Aug 19 11:52:18 2022 +0000 Ensure cuda_primary_ctx test is run on multigpu CI (#83252) Summary: It requires 2 GPUs, but it wasn't added to the list of tests running on multigpu jobs. Test Plan: Look at CI Differential Revision: D38616784 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83252 Approved by: https://github.com/malfet commit ca9919e3e81bb563422beb83e61b71cb8deca62c Author: PyTorch MergeBot Date: Fri Aug 19 10:26:40 2022 +0000 [vision hash update] update the pinned vision hash (#83729) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83729 Approved by: https://github.com/pytorchbot commit b8d647e1d56114514b98a6fdc8f5141784b8a016 Author: Catherine Lee Date: Fri Aug 19 06:18:40 2022 +0000 Revert "Manually shard slow-gradcheck CI job to prevent timeout #83354" (#83704) Now that https://github.com/pytorch/test-infra/pull/529 exists, we can undo the custom sharding from #83354 for slow grad check test plan: look at logs to see if it sharded + look at time to see that its evenly distributed Pull Request resolved: https://github.com/pytorch/pytorch/pull/83704 Approved by: https://github.com/huydhn commit 5bc85fccebebba3397fc42c72a68186efe80abe8 Author: Collin Schlager Date: Fri Aug 19 05:04:56 2022 +0000 Remove assertEqualIgnoreTypes from test_unary_ufuncs (#83711) Fix TODOs related to #38095 in test_unary_ufuncs.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/83711 Approved by: https://github.com/kit1980 commit 0ff929f4871a283b15672e23e23e267cab4f866b Author: Wonjoo Lee Date: Fri Aug 19 03:51:15 2022 +0000 Add lazy shape inference for take op (#82679) Add lazy shape inference for take op --- Companion PR on XLA's side: https://github.com/pytorch/xla/pull/3818 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82679 Approved by: https://github.com/JackCaoG commit 76d5699e13352930be89d61442b82230950a35cf Author: Ian Barber Date: Fri Aug 19 02:51:44 2022 +0000 Fix use-generator lint warnings in module.py (#83700) % pylint --disable=all --enable=R1729 torch/nn/modules/module.py Verified in pylint 2.14.5 -------------------------------------------------------------------- Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83700 Approved by: https://github.com/kit1980, https://github.com/albanD commit 61b2cde5270986476f47b58b984de80d02aac321 Author: PyTorch MergeBot Date: Fri Aug 19 02:27:03 2022 +0000 Revert "Enable formatting in all of testing/_internal/opinfo (#83559)" This reverts commit a7e619690936ed3b90e6f035ec078ed630e83e93. Reverted https://github.com/pytorch/pytorch/pull/83559 on behalf of https://github.com/peterbell10 due to Stack broke lint commit 107465af2cbbdc3d0ac6f89375bab0cd32eea5ff Author: PyTorch MergeBot Date: Fri Aug 19 02:23:32 2022 +0000 Revert "Move gradcheck_wrapper and clone_sample funcs into opinfo.core (#83560)" This reverts commit 5120263703b75bc3036cf2009d944e03e52eeb99. Reverted https://github.com/pytorch/pytorch/pull/83560 on behalf of https://github.com/peterbell10 due to Stack broke lint commit 0ddabe56ad7c3e955cfd3cfc8d5e2f48acdc13ac Author: PyTorch MergeBot Date: Fri Aug 19 02:21:40 2022 +0000 Revert "Move RefInfo classes into opinfo.refs (#83563)" This reverts commit 03ce36e3c139aa8aaf1e6184303dd6bf12d168f3. Reverted https://github.com/pytorch/pytorch/pull/83563 on behalf of https://github.com/peterbell10 due to Stack broke lint commit c8730d0a2fdb4b1b3b98e6e5431ca19b18eeaf52 Author: PyTorch MergeBot Date: Fri Aug 19 02:18:21 2022 +0000 Revert "Move `torch.linalg` opinfos to opinfo.definitions (1/2) (#83547)" This reverts commit bb86c31e2609304a81629351a107ebe810977606. 
Reverted https://github.com/pytorch/pytorch/pull/83547 on behalf of https://github.com/peterbell10 due to Stack broke lint commit 88e0165d085166ce13ef443991eea003ee86869e Author: vspenubarthi Date: Thu Aug 18 16:42:03 2022 -0700 [ao] Added Equalization QConfig generation to ModelReport class (#83698) Summary: This adds the capability to generate a QConfigMapping based on the suggestions of the ModelReport API for the user to use. The only dependency of this feature is that the calibration is run before the generation of the QConfigMapping and there is no dependency on the report generation other than that the observers cannot be removed before this is called. This maps module fqns to EqualizationQConfigs instead of regular QConfigs. Example Usage (after calibration): ``` quantization_mapping = mod_report.generate_qconfig_mapping() equalization_mapping = mod_report.generate_equalization_mapping() prepared_model = quantize_fx.prepare_fx(model, quantization_mapping, example_input, _equalization_config=equalization_mapping) quantized_model = quantize_fx.convert_fx(prepared_model) ``` This was tested by ensuring that the suggestions generated in the
QConfigMapping are: 1. Correct according to the set backend and data passed through 2. Able to be prepared and converted as a proper config (is a valid
config) The test for this is a part of the TestFxModelReportClass test suite. Test Plan: python test/test_quantization.py TestFxModelReportClass.test_equalization_mapping_generation Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83698 Approved by: https://github.com/jerryzh168 commit 393137e13fb6a56c9803e24131c4cac0bb1f1e48 Author: PyTorch MergeBot Date: Fri Aug 19 02:14:42 2022 +0000 Revert "Move `torch.linalg` opinfos to opinfo.definitions (2/2) (#83554)" This reverts commit 1f2efdce1534bef50d47e7706e58a1c611b2d4a7. Reverted https://github.com/pytorch/pytorch/pull/83554 on behalf of https://github.com/peterbell10 due to Stack broke lint commit 05849eafb92def1c0071d5a7b0bb782360145cbb Author: Justin Chu Date: Thu Aug 18 16:28:16 2022 -0700 [ONNX] Create empty opset 17 symbolic file (#83287) The PR - Creates an empty symbolic file to house the new ops defined in ONNX 17 - Increments the max version to 17 and fixes the doc for version 16 - Enables tests for opset 17 - Updates the IR version in `export.cpp` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83287 Approved by: https://github.com/thiagocrepaldi, https://github.com/AllenTiTaiWang, https://github.com/BowenBao commit 1f2efdce1534bef50d47e7706e58a1c611b2d4a7 Author: Peter Bell Date: Thu Aug 18 13:04:13 2022 +0100 Move `torch.linalg` opinfos to opinfo.definitions (2/2) (#83554) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83554 Approved by: https://github.com/albanD commit bb86c31e2609304a81629351a107ebe810977606 Author: Peter Bell Date: Thu Aug 18 13:04:12 2022 +0100 Move `torch.linalg` opinfos to opinfo.definitions (1/2) (#83547) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83547 Approved by: https://github.com/albanD commit 03ce36e3c139aa8aaf1e6184303dd6bf12d168f3 Author: Peter Bell Date: Thu Aug 18 13:04:12 2022 +0100 Move RefInfo classes into opinfo.refs (#83563) Given that there is already a clear `op_db`, `python_ref_db` split I think it makes sense to have the `RefInfo` classes be defined in a different file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83563 Approved by: https://github.com/albanD commit 5120263703b75bc3036cf2009d944e03e52eeb99 Author: Peter Bell Date: Thu Aug 18 13:04:12 2022 +0100 Move gradcheck_wrapper and clone_sample funcs into opinfo.core (#83560) The linalg OpInfos need these, so moving them into core to prevent circular dependencies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83560 Approved by: https://github.com/albanD commit a7e619690936ed3b90e6f035ec078ed630e83e93 Author: Peter Bell Date: Thu Aug 18 13:04:11 2022 +0100 Enable formatting in all of testing/_internal/opinfo (#83559) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83559 Approved by: https://github.com/albanD commit 7aba6f8e7b67d08628e80f93d9146224163b1300 Author: chenlai Date: Thu Aug 18 10:37:33 2022 -0700 Rename flatbuffer_serializer to *_mobile or *_full_jit (#82827) The target named `flatbuffer_serializer` in fbcode has dependency from full jit and the one in xplat has dependency for mobile only. Rename them accordingly ``` flatbuffer_serializer in fbode -> flatbuffer_serializer_full_jit flatbuffer_serializer in xplat -> flatbuffer_serializer_mobile ``` so it's more readable. 
Differential Revision: [D38413369](https://our.internmc.facebook.com/intern/diff/D38413369/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38413369/)! Differential Revision: [D38413369](https://our.internmc.facebook.com/intern/diff/D38413369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82827 Approved by: https://github.com/qihqi commit b02e620fa3e789645df099aedef29fb80a2068d5 Author: Scott Wolchok Date: Thu Aug 18 15:55:54 2022 -0700 [PyTorch] Bypass dispatch for narrow() calls within split_with_sizes (#83213) This can add up to a lot of dispatcher overhead if there are a lot of splits. split_with_sizes already has an autograd formula so this should Just Work? Differential Revision: [D38600576](https://our.internmc.facebook.com/intern/diff/D38600576/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83213 Approved by: https://github.com/albanD commit 784c47fbeea235823f29a6d035fe7eaea3f30680 Author: Jerry Zhang Date: Thu Aug 18 20:08:53 2022 +0000 [quant][fx][refactor] Move ObservationType to backend_config.py (#83368) Summary: Now we have a separate file to define BackendConfig related classes, we can move ObservationType to that file as well Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83368 Approved by: https://github.com/andrewor14 commit 82507ce334be9171729faecd8fc2ac9efa8c07e6 Author: Elias Ellison Date: Thu Aug 18 19:53:41 2022 +0000 Minifier fix for non tensor inputs (#83644) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83644 Approved by: https://github.com/Chillee commit 1f3ef5a2c800780e1f63daff5758f953ffe4dfc5 Author: Jeff Daily Date: Fri Aug 19 00:31:30 2022 +0000 [ROCm] unskip test_jit TestBackendsWithCompiler (#81281) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81281 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet commit f094113ebf5b4e5281ab1a134220a1a985f03964 Author: Nikita Shulga Date: Thu Aug 18 23:52:43 2022 +0000 [MPS] Add native bitwise-not implementation (#83678) Follows the same pattern as bitwise binary ops Rename `BitwiseBinaryOps.mm` to `BitwiseOps.mm` Already tested in `test_mps.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83678 Approved by: https://github.com/albanD, https://github.com/kulinseth commit b14df5334d5910fc77c2658532c303fac0809236 Author: Peter Bell Date: Thu Aug 18 17:40:16 2022 +0100 CMake: List python source files as codegen dependencies (#83683) The pyi, selected_mobile_ops and nvfuser code generators were missing some dependencies outright. The autograd codegen had some effort to list out specific files that it depends on, but this has clearly fallen out of sync so it's safer to just depend on the entire folder. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83683 Approved by: https://github.com/albanD commit 5e715be17ebb7f7fbfa232e951a14abf3b1a7a5a Author: vspenubarthi Date: Thu Aug 18 12:59:15 2022 -0700 [ao] Added Quantization QConfig generation to ModelReport class (#83688) Summary: This adds the capability to generate a QConfigMapping based on the suggestions of the ModelReport API for the user to use. 
The only dependency of this feature is that the callibration is run before the generation of the QConfigMapping and there is no dependency on the report generation other than that the observers cannot be removed before this is called. Example Usage (after callibration): ``` mapping = mod_report.generate_qconfig_mapping() prepared_model = quantize_fx.prepare_fx(model, mapping, example_input) quantized_model = quantize_fx.convert_fx(prepared) ``` This was tested by ensuring that the suggestions generated in the QConfigMapping are: 1. Correct according to the set backend and data passed through 2. Able to be prepared and converted as a proper config (is a valid config) The test for this is a part of the TestFxModelReportClass test suite. Test Plan: python test/test_quantization.py TestFxModelReportClass.test_qconfig_mapping_generation Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83688 Approved by: https://github.com/jerryzh168 commit 72963bbae9b7f2a4f2e7c5fc84abdaa2f3552e73 Author: Milad Mohammadi Date: Thu Aug 18 22:53:18 2022 +0000 Update isDynamic api to align with is_symbolic API (#83415) Downstream #https://github.com/pytorch/xla/pull/3888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83415 Approved by: https://github.com/Krovatkin commit 04353f7837dccdc7c344dc0bdd82288957dcbc94 Author: Justin Chu Date: Thu Aug 18 22:51:57 2022 +0000 Check existence of the array ref when tracing resize_ (#81422) When `.resize_` takes an empty `torch.Size` or ints, tracing it would result in a `RuntimeError: _Map_base::at` (key not found in map). In https://github.com/pytorch/pytorch/blob/0d124fc6961f5b39f1a46722dab2d88f23686783/torch/csrc/jit/frontend/tracer.h#L126-L129 - This change updates `TraceType::resize_` to check the mapping first. - It also updates the warning message when tracing `resize_` to suggest using reshape or view. Repo: ```python import torch class M(torch.nn.Module): def forward(self, x, y): print(y.shape) x = x.resize_(y.shape) return x, y x = torch.tensor(1.2) y = torch.tensor(4.2) M()(x, y) torch.jit.trace(M(), (x, y)) ``` Related: https://github.com/pytorch/pytorch/issues/76486 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81422 Approved by: https://github.com/BowenBao, https://github.com/malfet commit 91521449445077c9ee977b18e2d0f19be4dd1c5b Author: Edward Z. Yang Date: Wed Aug 17 20:30:41 2022 -0700 Coverage for nondeterministic_seeded, respect it in constant prop (#83650) - nondeterministic_seeded was not applied to enough functions. I added some heuristics to codegen for identifying functions that are likely to be random and added a bunch of these tags to functions. Not sure I got all of them. - Don't constant propagate through nondeterministic functions in FX tracing. It would be better to do some testing for the tag but this would be quite an effort. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83650 Approved by: https://github.com/bdhirsh, https://github.com/eellison commit 24acc3155fae43ea9f2ab9c8e31d83f55dd7d7f1 Author: Edward Z. Yang Date: Wed Aug 17 20:30:13 2022 -0700 Be more conservative about propagating constants. (#83648) If a constant would turn into something large, don't keep it as a constant, just drop it. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83648 Approved by: https://github.com/eellison commit 02581f053bbb824b7d42b1df8655eff977865093 Author: Edward Z. 
Yang Date: Wed Aug 17 20:30:12 2022 -0700 Address CR comments for "Delete ProxyTensor wrapper subclass" (#83646) CR is on https://github.com/pytorch/pytorch/pull/83330 - Factor proxy slot getters/setters into helper functions - Use a weak map for storing proxies, so they go away when tracing is done - More documentation on SymDispatchMode Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83646 Approved by: https://github.com/Chillee commit a7baad04f6f29a97743e98d25c369b21aed18faf Author: Sherlock Huang Date: Thu Aug 18 17:54:52 2022 +0000 Preserve stack trace for backward nodes over AOTAutograd (#83558) For the following program. ``` def my_relu(a): return a.relu() def func(a, b): a = torch.nn.Linear(10, 10)(a) d = torch.square(b) d = my_relu(d) loss = d.sum() return loss with torchdynamo.optimize("aot_nop"): x = torch.rand(10, 10, requires_grad=True) y = torch.rand(10, 10, requires_grad=True) out = func(x, y) ``` It would generate the following fx graph with stack_trace populated in both forward and backward nodes. ``` def forward(self, primals, tangents): primals_1, primals_2, primals_3, primals_4, tangents_1, = fx_pytree.tree_flatten_spec([primals, tangents], self._in_spec) t_default = torch.ops.aten.t.default(primals_3); primals_3 = None addmm_default = torch.ops.aten.addmm.default(primals_4, primals_1, t_default); primals_4 = primals_1 = t_default = None pow_tensor_scalar = torch.ops.aten.pow.Tensor_Scalar(primals_2, 2) relu_default = torch.ops.aten.relu.default(pow_tensor_scalar); pow_tensor_scalar = None detach_default = torch.ops.aten.detach.default(relu_default) sum_default = torch.ops.aten.sum.default(relu_default); relu_default = None is_same_size_default = torch.ops.aten.is_same_size.default(sum_default, tangents_1) expand_default = torch.ops.aten.expand.default(tangents_1, [10, 10]); tangents_1 = None detach_default_1 = torch.ops.aten.detach.default(detach_default); detach_default = None threshold_backward_default = torch.ops.aten.threshold_backward.default(expand_default, detach_default_1, 0); expand_default = detach_default_1 = None pow_tensor_scalar_1 = torch.ops.aten.pow.Tensor_Scalar(primals_2, 1.0); primals_2 = None mul_scalar = torch.ops.aten.mul.Scalar(pow_tensor_scalar_1, 2.0); pow_tensor_scalar_1 = None mul_tensor = torch.ops.aten.mul.Tensor(threshold_backward_default, mul_scalar); threshold_backward_default = mul_scalar = None return pytree.tree_unflatten([sum_default, None, mul_tensor, None, None], self._out_spec) ====== joint graph ======= primals_1 None primals_2 None primals_3 None primals_4 None tangents_1 None t_default File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 12, in func def func(a, b): File "/fsx/users/bahuang/repos/pytorch_fsx/torch/nn/modules/linear.py", line 114, in forward return F.linear(input, self.weight, self.bias) addmm_default File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 12, in func def func(a, b): File "/fsx/users/bahuang/repos/pytorch_fsx/torch/nn/modules/linear.py", line 114, in forward return F.linear(input, self.weight, self.bias) pow_tensor_scalar File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func d = torch.square(b) relu_default File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func d = my_relu(d) File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu return a.relu() detach_default File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func d = my_relu(d) File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", 
line 10, in my_relu return a.relu() sum_default is_same_size_default expand_default detach_default_1 File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func d = my_relu(d) File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu return a.relu() threshold_backward_default File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 15, in func d = my_relu(d) File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 10, in my_relu return a.relu() pow_tensor_scalar_1 File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func d = torch.square(b) mul_scalar File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func d = torch.square(b) mul_tensor File "/fsx/users/bahuang/repos/pytorch_fsx/test.py", line 14, in func d = torch.square(b) output None ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83558 Approved by: https://github.com/albanD commit e2e71c1f4c924c5e9e02b25eb66296a697f4b3e7 Author: samdow Date: Thu Aug 18 12:28:41 2022 -0400 [functorch] add linalg solve batch rule (#82814) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82814 Approved by: https://github.com/zou3519 commit ff533b1efa26ed0dc5e3caa332de05f53963e360 Author: Nikita Shulga Date: Thu Aug 18 21:59:15 2022 +0000 [MPS] Fix torch.full for uint8 (#83697) By creating uint32 tensor and then downcasting it to uint8 Workaround https://github.com/pytorch/pytorch/issues/83692 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83697 Approved by: https://github.com/albanD commit 88d3acd6b194cdd7e06d7b4e8e7d5aed7294adb2 Author: Mario Lezcano Date: Thu Aug 18 05:53:02 2022 -0500 Fix and improve the efficiency of the backward of xlog* functions. (#82713) That is `xlogy`, `special.xlogy`, `special.xlog1py`. Fixes https://github.com/pytorch/pytorch/issues/80770 Fixes https://github.com/pytorch/pytorch/issues/74279 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82713 Approved by: https://github.com/albanD commit 9e1560821eab91c00d03e4e257c6973c34735ba1 Author: Fabio Rocha Date: Thu Aug 18 12:31:21 2022 -0500 [primTorch] Refs for pdist, triu and related ops (#82819) This PR adds refs for the following ops: - `torch.triu` - `torch.tril` - `torch.triu_indices` - `torch.tril_indices` - `torch.nn.functional.pairwise_distance` - `torch.nn.functional.pdist` It adds OpInfos for - `torch.triu_indices` - `torch.tril_indices` Note that these were already tested in `test/test_tensor_creation_ops.py` but for the ref tests we need the OpInfos. Finally, it improves documentation for PairwiseDistance and adds a missing import to `torch/testing/_internal/opinfo/core.py`. This started with an attempt to just add the `nn.functional` refs above, but it turned out that `pdist` was easiest to implement using `triu_indices` so I added that one and the related functions as well. ~~In the end, I changed the `pdist` implementation to not use `triu_indices` but kept the other refs.~~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/82819 Approved by: https://github.com/ngimel commit 38348362608a47371c65d7fd52db138b4c6a5d65 Author: albanD Date: Thu Aug 18 20:54:44 2022 +0000 [DataLoader] Move loop content into a function to ensure we don't preserve anything (#83595) Can lead to CPU memory saving as we don't hold onto the pin memory buffer as long as we used to. 
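A hedged sketch of the refactor idea behind #83595: move the loop body into its own function so per-iteration locals are dropped when the call returns. The names below are hypothetical, not the actual `torch.utils.data` internals:

```python
# Pattern sketch: per-iteration locals (the fetched item / pinned buffer) die
# with the call frame instead of lingering in the loop until overwritten.
import queue

def _process_one(in_queue, out_queue):
    try:
        item = in_queue.get(timeout=5.0)
    except queue.Empty:
        return
    # pin_memory(item) would happen here; `item` is released on return.
    out_queue.put(item)

def _pin_memory_loop(in_queue, out_queue, done_event):
    while not done_event.is_set():
        _process_one(in_queue, out_queue)
```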
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83595 Approved by: https://github.com/ejguan, https://github.com/NivekT commit 23d22724739b7a27249cde1354150a43e278ed10 Author: Howard Huang Date: Thu Aug 18 15:46:09 2022 +0000 Add remaining device types in the pybinded DeviceType enum (#83676) Small change to update pybinded definition to match https://github.com/pytorch/pytorch/blob/master/c10/core/DeviceType.h#L32-L58 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83676 Approved by: https://github.com/albanD commit 46ba9f2e52a43a35d23b1ce8f00f9d2614c24204 Author: PyTorch MergeBot Date: Thu Aug 18 20:28:20 2022 +0000 Revert "Remove conj kernels for real dtypes (#80374)" This reverts commit ad44079952b945262808af8fa841994f736c1fe2. Reverted https://github.com/pytorch/pytorch/pull/80374 on behalf of https://github.com/atalman due to Breaks internal build UnaryOpsKernel.cpp:208:5: error: unused type alias 'scalar_t' [-Werror,-Wunused-local-typedef] commit d11d3dd036b4a7098ab3b4d333fcb556b97b4860 Author: Rodrigo Kumpera Date: Thu Aug 18 19:40:15 2022 +0000 [dist.cp] Introduce LoadPlanner and SavePlanner extensibility API. (#83419) The planners come with default implementations in default_planner.py. The default planners expose their core functionality as separate functions to make it easy for other checkpoint implementations to use this functionality. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83419 Approved by: https://github.com/wanchaol commit 4a033be4482441d3f61a887502d75356a90e6a6a Author: Richard Zou Date: Thu Aug 18 06:57:22 2022 -0700 [functorch] reclassify svd as an allowed failure; add test (#83612) svd when done on a batch of inputs vs the input in a for-loop may return different results because svd isn't unique. So, instead of checking that the output of vmap and the output of a for-loop are the same, we check that matrix-multiplying the decomposed tensors results in the same tensor when doing it under vmap vs under a for-loop. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/83612 Approved by: https://github.com/samdow commit 601aca2a2dedfe7b46f2815649682924a31d50ce Author: Richard Zou Date: Thu Aug 18 06:57:22 2022 -0700 [functorch] add some vmap+jvp inplace+view tests (#83178) No problems here, just more tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83178 Approved by: https://github.com/samdow commit d84dc589c24bb28147aaafddb67dff4fac6ac9a7 Author: Richard Zou Date: Thu Aug 18 06:57:21 2022 -0700 [functorch] relax as_strided batching rule (#83597) Previously there was a constraint that the bdim is required to be at the front. As I noted in the comment in the code that I wrote years ago, this is not necessary for correctness, we were just guarding against potentially incorrect behavior and assumed most people would not vmap over dimensions other than 0. Now, the above assumption did not age very well, because we have batch rules that return a BatchedTensor where the bdim is something other than 0 (e.g. convolution batch rule). This PR deletes the check for that assumption and adds additional manual tests that the as_strided batching rule works when one vmaps over a dimension other than 0. Automatic tests don't exist because it's a bit hard to get the test_vmap_exhaustive test runner to replicate the strides of the inputs faithfully. 
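As a generic illustration (not taken from the PR) of vmapping over a dimension other than 0, which is the case the relaxed as_strided batching rule is meant to cover:

```python
# vmap over dim 1 instead of the default dim 0; the mapped function sees
# slices of shape (3,), and the batch dimension is restored at dim 0.
import torch
from functorch import vmap

x = torch.randn(3, 5)
out = vmap(lambda col: col * 2.0, in_dims=1)(x)
print(out.shape)  # torch.Size([5, 3])
```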
Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83597 Approved by: https://github.com/samdow commit 69728d7dd9b6a05a503e25759ed589754741ff01 Author: Richard Zou Date: Thu Aug 18 06:57:21 2022 -0700 [functorch] annotate test_jvpvjp (#83530) Most of these are just "forward-mode Ad formula not implemented" Pull Request resolved: https://github.com/pytorch/pytorch/pull/83530 Approved by: https://github.com/samdow commit e4f74f0891da9e49c5c82df05794f7723b05cbac Author: Justin Chu Date: Tue Aug 16 16:33:37 2022 +0000 [ONNX] Update the default opset to version 14 (#83284) Update the default opset by running the `update_default_opset_version.py` script. The update is done in a regularly to ensure we are in sync with the onnx updates. All changes are produced by the script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83284 Approved by: https://github.com/AllenTiTaiWang, https://github.com/malfet, https://github.com/BowenBao commit f204afc2bbd7bec6826b49403978ed7f93ccc9f3 Author: Olga Andreeva Date: Thu Aug 18 18:41:14 2022 +0000 Added communication hook for sharded cases (#83254) Fixes https://github.com/pytorch/pytorch/issues/79114 An implementation of a FSDP communication hook interface for a sharded strategies: - Added `reduce_scatter_hook` to default hooks. Note the difference of `reduce_scatter` from `all_reduce`, it requires 2 tensors:`input_gradient` and `output` variables and stores result in `output`, which is further used as a summed gradient shard. - Adjusted FSDP logic to return `reduce_scatter_hook` as a default communication hook for sharded strategies, `DefaultState` is the same for sharded and non-sharded strategies. - Adjusted low-precision hooks to work with both `all_reduce` and `reduce_scatter` depending on whether `output` tensor is provided or not. Test plan: Added all existing sharded strategies as an input parameters to existing tests. For`test_default_communication_hook_behaviour` double checked how a linear layer is sharded across workers. This test creates a simple net ``1 X N``, where ``N`` - is the number of workers. For sharded cases, ``N`` parameters are sharded across ``N`` workers. This test checks that after backward, each worker has a proper value in it's chunk of the gradient, or the whole gradient on every worker is equal to an expected value. Checked that low-precision tests work for sharded cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83254 Approved by: https://github.com/rohan-varma, https://github.com/awgu commit 78c8a0d75220bdd4955415b5f81509e005af4232 Author: zaf Date: Thu Aug 18 03:59:30 2022 -0700 [quant][ao_migration] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` (#78712) Context: In order to avoid the cluttering of the `torch.nn` namespace the quantized modules namespace is moved to `torch.ao.nn`. 
The list of the `nn.quantized` files that are being migrated: - [ ] `torch.nn.quantized` → `torch.ao.nn.quantized` - [X] [Current PR] `torch.nn.quantized.functional` → `torch.ao.nn.quantized.functional` - [ ] `torch.nn.quantized.modules` → `torch.ao.nn.quantized.modules` - [ ] `torch.nn.quantized.dynamic` → `torch.ao.nn.quantized.dynamic` - [ ] `torch.nn.quantized._reference` → `torch.ao.nn.quantized._reference` - [ ] `torch.nn.quantizable` → `torch.ao.nn.quantizable` - [ ] `torch.nn.qat` → `torch.ao.nn.qat` - [ ] `torch.nn.qat.modules` → `torch.ao.nn.qat.modules` - [ ] `torch.nn.qat.dynamic` → `torch.ao.nn.qat.dynamic` - [ ] `torch.nn.intrinsic` → `torch.ao.nn.intrinsic` - [ ] `torch.nn.intrinsic.modules` → `torch.ao.nn.intrinsic.modules` - [ ] `torch.nn.intrinsic.qat` → `torch.ao.nn.intrinsic.qat` - [ ] `torch.nn.intrinsic.quantized` → `torch.ao.nn.intrinsic.quantized` - [ ] `torch.nn.intrinsic.quantized.modules` → `torch.ao.nn.intrinsic.quantized.modules` - [ ] `torch.nn.intrinsic.quantized.dynamic` → `torch.ao.nn.intrinsic.quantized.dynamic` Majority of the files are just moved to the new location. However, specific files need to be double checked: - [Documentation](docs/source/quantization-support.rst) @vkuzo - [Public API test list](test/allowlist_for_publicAPI.json) @peterbell10 Differential Revision: [D36792967](https://our.internmc.facebook.com/intern/diff/D36792967/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36792967/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/78712 Approved by: https://github.com/jerryzh168 commit 3e1fc85b23f9f12ff2ba5be645841bde90dba14e Author: Chien-Chin Huang Date: Wed Aug 17 22:46:54 2022 -0700 [FSDP] Implement sharded_optim_state_dict and flatten_sharded_optim_state_dict. (#77628) As title Differential Revision: [D36436496](https://our.internmc.facebook.com/intern/diff/D36436496/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77628 Approved by: https://github.com/awgu commit cd0ab154b5662f5dae36456971db5bc574d6cbe1 Author: JackCaoG Date: Thu Aug 18 16:36:54 2022 +0000 Handle python frame is empty in GetPythonFrames (#83643) Fixes https://github.com/pytorch/xla/issues/3900 and https://github.com/pytorch/xla/issues/3795 for pytorch/xla when `XLA_IR_DEBUG=1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83643 Approved by: https://github.com/Krovatkin commit abcf01196cd27805349aa892db847f9a61f52c0e Author: Clive Chan Date: Thu Aug 18 15:24:18 2022 +0000 Release the GIL when munmap'ing tensors - fixes #77139 (#83623) Fixes #77139, where deallocating large tensors with munmap takes a significant amount of time while holding the GIL. This causes the pin_memory thread to interfere with the main thread = performance sadness. Thanks @igozali @zhengwy888 @colesbury as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83623 Approved by: https://github.com/albanD commit f84e087d5e6f458f69274e2ace127af7d4fa8d82 Author: PyTorch MergeBot Date: Thu Aug 18 14:00:42 2022 +0000 Revert "fixing define_constant pybind signature to match std::complex scalar (#83645)" This reverts commit 278c726458c1febdde7420734477bf8b552c0243. 
Reverted https://github.com/pytorch/pytorch/pull/83645 on behalf of https://github.com/albanD due to broke master test commit 3f612b58be58fe38eb57cad7cbca545887ce7759 Author: mikey dagitses Date: Thu Aug 18 13:03:00 2022 +0000 fix quantization/core/test_docs for Buck2 (#83341) Summary: We extract the test to its own target, fixing the relative path to the quantization docs. This allows us to find the docs with a more simple implementation. Test Plan: Tested locally with buck1 and buck2. Differential Revision: D38662169 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83341 Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/ZainRizvi commit aad89bb77176a755cf7f916b4cb16bc4a021d1bb Author: Mario Lezcano Date: Thu Aug 18 05:17:32 2022 -0500 Make the derivative of masked_fill more efficient (#83515) There's no need to add all the zeros if we extract all the non-zero elements. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83515 Approved by: https://github.com/albanD, https://github.com/soulitzer commit 4b3f1bdb0cb7213ae5ac4f3e3d187648c7720175 Author: PyTorch MergeBot Date: Thu Aug 18 10:35:00 2022 +0000 [vision hash update] update the pinned vision hash (#83582) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83582 Approved by: https://github.com/pytorchbot commit eb6004146aba1e371a3c169f11e76390fd74a13e Author: PyTorch MergeBot Date: Thu Aug 18 10:23:01 2022 +0000 [xla hash update] update the pinned xla hash (#83581) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83581 Approved by: https://github.com/pytorchbot commit ce7177f88a8c76351087bd06520681e60591ff50 Author: Kulin Seth Date: Thu Aug 18 06:03:16 2022 +0000 [MPS] Register index.Tensor_out (#82507) * Add more tests from test_indexing into test_mps * Cache the indexing library on the MPSDevice Pull Request resolved: https://github.com/pytorch/pytorch/pull/82507 Approved by: https://github.com/malfet commit 6dc8673b1bb7a0f22a2453049751089943cc1f3b Author: yanbing-j Date: Thu Aug 18 05:08:12 2022 +0000 Update ideep for NNC post-op (#82705) This PR is to add NNC post-op fusion support in ideep for further NNC development. 
It includes: - element wise post op fusion - conv/matmal/linear + binary post op fusion **Common configuration:** - Jemalloc and iomp enabled - BS=1 - num_warmup = 300 - num_run = 500 - Average time of 1 iteration in ms is used - time_before: no fusion - time_after: with fusion - Eltwise OPs selected: hardswish and abs - Using oneDNN v2.6 **On ICX (32 cores per socket): Conv2d FP32 (in channels Last format)**   | shape | time_(ms)_before | time_(ms)_after | Gain -- | -- | -- | -- | -- 1socket | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.112174 | 0.071106 | 36.61% 1socket | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.11269 | 0.070586 | 37.36% 1socket | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.164219 | 0.129498 | 21.14% 1socket | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.169371 | 0.1277 | 24.60%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 1thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.994555 | 1.429813 | 28.31% 1thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.715168 | 1.459937 | 14.88% 1thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 2.997382 | 2.47915 | 17.29% 1thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 3.044476 | 2.499366 | 17.90%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 4thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.405204 | 0.38117 | 5.93% 4thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.410145 | 0.389279 | 5.09% 4thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.67917 | 0.662792 | 2.41% 4thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.682302 | 0.671226 | 1.62% **On CPX (28 cores per socket): Conv2d BF16 (in channels Last format)**   | shape | time_(ms)_before | time_(ms)_after | Gain -- | -- | -- | -- | -- 1socket | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.119289 | 0.091015 | 23.70% 1socket | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.144116 | 0.09339 | 35.20% 1socket | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.209975 | 0.177111 | 15.65% 1socket | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.234777 | 0.179945 | 23.36%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 1thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.296252 | 1.086423 | 16.19% 1thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.364738 | 1.131289 | 17.11% 1thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 3.99519 | 3.736147 | 6.48% 1thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 4.03415 | 3.77981 | 6.30%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 4thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.27474 | 0.245281 | 10.72% 4thread | 
Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.28595 | 0.254748 | 10.91% 4thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.847318 | 0.791453 | 6.59% 4thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.870212 | 0.801594 | 7.89% **On CPX (28 cores per socket): Linear BF16**   | shape | time_(ms)_before | time_(ms)_after | Gain -- | -- | -- | -- | -- 1socket | Linear+abs_N=1_iC=1024_oC=4096 | 0.043199 | 0.037603 | 12.95% 1socket | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.041845 | 0.038332 | 8.40% 1socket | Linear+abs_N=1_iC=4096_oC=1024 | 0.048282 | 0.044281 | 8.29% 1socket | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.048362 | 0.044106 | 8.80% 1socket | Linear+abs_N=1_iC=2048_oC=1000 | 0.036302 | 0.0344 | 5.24% 1socket | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.035734 | 0.035593 | 0.39%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 1thread | Linear+abs_N=1_iC=1024_oC=4096 | 0.365143 | 0.36279 | 0.64% 1thread | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.364464 | 0.363392 | 0.29% 1thread | Linear+abs_N=1_iC=4096_oC=1024 | 0.384498 | 0.379902 | 1.20% 1thread | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.382545 | 0.381252 | 0.34% 1thread | Linear+abs_N=1_iC=2048_oC=1000 | 0.213244 | 0.209999 | 1.52% 1thread | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.212003 | 0.208567 | 1.62%   |   |   |   |     | shape | time_(ms)_before | time_(ms)_after | Gain 4thread | Linear+abs_N=1_iC=1024_oC=4096 | 0.126096 | 0.12157 | 3.59% 4thread | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.126627 | 0.121662 | 3.92% 4thread | Linear+abs_N=1_iC=4096_oC=1024 | 0.132845 | 0.128921 | 2.95% 4thread | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.132642 | 0.12783 | 3.63% 4thread | Linear+abs_N=1_iC=2048_oC=1000 | 0.079582 | 0.072584 | 8.79% 4thread | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.077761 | 0.071981 | 7.43% Pull Request resolved: https://github.com/pytorch/pytorch/pull/82705 Approved by: https://github.com/frank-wei, https://github.com/eellison commit 278c726458c1febdde7420734477bf8b552c0243 Author: jjsjann123 Date: Thu Aug 18 04:52:31 2022 +0000 fixing define_constant pybind signature to match std::complex scalar (#83645) Fixes #83576 Previously complex scalar is defined as boolean and generating wrong result. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83645 Approved by: https://github.com/ezyang, https://github.com/kevinstephano commit badbdb033038d84c46550c4ddc8eab64257c5143 Author: Mengwei Liu Date: Thu Aug 18 04:47:13 2022 +0000 [torchgen] Relax the restriction on number of custom namespaces (#83580) Summary: We started to see use cases where it involves more than 1 custom namespace to live within the same yaml file. Hence relaxing the restriction that 1 yaml file can only have 1 custom namespace other than `aten`. Updated unit test as well. Differential Revision: D38775685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83580 Approved by: https://github.com/JacobSzwejbka commit 7263450c309443a8fd3f8ab29fbc04c35692e58f Author: PyTorch MergeBot Date: Thu Aug 18 02:58:15 2022 +0000 Revert "[primTorch] Add ref for `new_empty_strided` (#82466)" This reverts commit e154f5ae3b91fd462faa2120f8940811a47096de. 
Reverted https://github.com/pytorch/pytorch/pull/82466 on behalf of https://github.com/ezyang due to broke trunk only nnc tests commit d6a30e213e2355e8ad553c02d205391c889a0254 Author: Aashaka Shah Date: Thu Aug 18 02:16:24 2022 +0000 Enable pg_nccl.reduce_scatter to perform vector ReduceScatter for uneven input splits (#82924) Summary: A vector reduce_scatter requires each process to reduce and scatter an input tensor according to the input list provided. Internally, pg_nccl.reduce_scatter will coalesce a list of pg_nccl._reduce_oop to implement a vector reduce-scatter in the case when the any input shape is different in the input list. Otherwise, it will perform a ncclReduceScatter as usual. - This change adds a `CoalescedWorkNCCL` class which encapsulates the WorkNCCL requests from coalesced operations. A `.wait()` on a CoalescedWorkNCCL request will call a wait on each of the WorkNCCL requests that are coalesced. - This change adds an out-of-place `_reduce_oop` function to ProcessGroupNCCL. It allows reducing an input tensor and placing the output in a separate output tensor. Since reduce_scatter provides an out-of-place API, a reduce_scatter_v semantic implemented inside `pg_nccl.reduce_scatter` also needs to support out-of-place, for which an out-of-place reduce is required to be added. Test Plan: Added a new test `test_reduce_scatter_v_cuda` for reduce_scatter_v to `distributed_nccl_spawn`. Differential Revision: D38478781 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82924 Approved by: https://github.com/kwen2501 commit 52be908225a2019da8ff7a2dc52e28ce2b13e69a Author: Edward Z. Yang Date: Wed Aug 17 10:20:15 2022 -0700 Delete unnecessary sum.SymInt overload (#83591) Dims argument only ever takes dimensions, which we do not need to SymInt-ify. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83591 Approved by: https://github.com/albanD commit 6679d238fd6f85397559977920b5202390f8e4f1 Author: Edward Z. Yang Date: Wed Aug 17 10:20:14 2022 -0700 SymInt'ify schemas for prims (#83528) I audited these looking for places where ints were accepted for sizes and turned them into SymInts. Dimensions and miscellaneous ints were not modified. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83528 Approved by: https://github.com/ngimel commit 817a82704ff140cec001fab942437b96d901da42 Author: Edward Z. Yang Date: Tue Aug 16 13:37:29 2022 -0700 Delete ProxyTensor wrapper subclass (#83330) I was working on https://github.com/pytorch/torchdynamo/issues/80 and my working hypothesis for what was causing the error was that proxy tensor was not advertising correct dispatch keys, causing AMP to operate differently when you traced. I could have fixed this directly by replicating fake tensor's fix for setting dispatch keys to also apply to proxy tensor, but I was like, "Why must I repeat myself." This PR is the result. It completely deletes the ProxyTensor wrapper subclass, so that when we are tracing, the tensors flowing through the program are the *original* real or fake tensors, depending on what the user requested in the top-level API. There is no more wrapping. To store the Proxy objects necessary for actually doing tracing, I store the property directly on the tensors. 
(Note: I never clean up old entries from the map at the moment, this is easily fixed by using a weak map) Benefits of doing this: * No more tip-toeing around no_dispatch() creation of new ProxyTensors; we never create new tensors (except when we call the underlying func), so you don't have to worry about accidentally tracing them. * No more syncing up metadata from in place operators. In particular https://github.com/pytorch/pytorch/issues/81526 is mooted * This fixes https://github.com/pytorch/torchdynamo/issues/519 as we no longer need to teach proxy tensor to support sparse tensor. * No more schlepping symbolic integers from the inner fake tensor to the outer proxy tensor. If you can make a fake tensor with symbolic ints, you're done, nothing else to do. To avoid having to rewrite all of the guts, when I get to the actual proxy tensor handler, I first "fetch" the stored ProxyTensor data from the weakmap via a tree_map, and then operate on the consequent data as before. A more optimized implementation is possible. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83330 Approved by: https://github.com/Chillee commit 0a48cdfb3bad14a62d9386ae0d1499bec74b63a6 Author: Horace He Date: Wed Aug 17 22:54:09 2022 +0000 re-enable aotautograd tests (#83485) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83485 Approved by: https://github.com/zou3519 commit e154f5ae3b91fd462faa2120f8940811a47096de Author: Nikita Karetnikov Date: Thu Aug 18 01:35:11 2022 +0000 [primTorch] Add ref for `new_empty_strided` (#82466) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82466 Approved by: https://github.com/ezyang commit b3c99bef0cab618cb6fedf7832004011172f9a34 Author: Yifan Shen Date: Thu Aug 18 00:49:29 2022 +0000 Support nested dropout autograd (#83338) When the initial version came out, `NestedTensor` was not included in the `CompositeImplicitAutograd` key set, so we had to register dropout_nested to dropout and make it forward-only. Now is the time to improve it! This pr removes dropout_nested; instead native_dropout_nested is implemented along with native_dropout_backward_nested. Side change: remove dropout__nested since @cpuhrsch suggested to leave out nested in-place ops for now Pull Request resolved: https://github.com/pytorch/pytorch/pull/83338 Approved by: https://github.com/jbschlosser commit 451c6296af2deb5159848f3b579d201b4903c608 Author: Jay Chae Date: Wed Aug 17 22:31:49 2022 +0000 [kineto] deprecate USE_KINETO_UPDATED (#83305) Summary: This is used to do cross repo updates but has not been cleaned up properly Test Plan: CI Reviewed By: aaronenyeshi Differential Revision: D38633379 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83305 Approved by: https://github.com/aaronenyeshi commit 79534b7f259e23aa5f819eb13173567e99183bbf Author: Nikolay Korovaiko Date: Wed Aug 17 22:14:12 2022 +0000 Adding XLA folks to reviewer/approvers (#83555) XLA folks will be doing a lot of smaller changes to the Lazy component that they can review themselves w/o either @wconstab or myself. They need approval permissions for the Lazy component. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83555 Approved by: https://github.com/wconstab, https://github.com/JackCaoG commit cf2c94e6de0e50edb3c9f89bb602055ee6d11011 Author: Michael Gschwind Date: Wed Aug 17 21:57:39 2022 +0000 NestedTensor Softmax (#83435) Summary: Simple mask compute and softmax Test Plan: unit test Differential Revision: D38711915 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83435 Approved by: https://github.com/erichan1, https://github.com/huydhn commit 71141c30232d436ad61d2931af8808e4451f8cde Author: migeedz Date: Wed Aug 17 11:29:53 2022 -0700 extend torch.ones to handle tuple inputs (#83194) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83194 Approved by: https://github.com/jansel commit 7536ac7125ea50f23be5236aed387bd09215f939 Author: migeedz Date: Wed Aug 17 11:29:52 2022 -0700 prevent graph mutation in constraint generation (#83109) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83109 Approved by: https://github.com/jansel commit ea2183f0eaa6209b8f31ea01627c1ee34654a2b7 Author: Daniel Recoskie Date: Wed Aug 17 11:17:37 2022 -0700 removed duplicate_quantize_dynamic_node (#83459) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83459 Approved by: https://github.com/jerryzh168 commit cf5330977d10a3585358fb02c049939bf1401074 Author: Nikita Shulga Date: Wed Aug 17 20:54:06 2022 +0000 [CI] Move torch-deploy to cuda-11.6 (#83572) As we are slowly deprecating 11.3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83572 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi commit af8e34cca9d40f16cbfcd773750d791e83d4a39f Author: ssjia Date: Wed Aug 17 08:52:33 2022 -0700 [vulkan] Do not populate unpacked args of PackedContexts when deserializing (#83587) Vulkan ops that use `PackedContext` objects currently maintain two lists storing the parameters of the op: 1. `unpacked_` which stores the original arguments passed in to the op 2. `packed_` which stores pre-processed arguments which are used for inference. The `unpacked_` list is only needed for serialization - during inference, where it is not expected that the model will be saved, then there is no point keeping the `unpacked_` list in memory. This diff introduces a flag `fill_unpacked`, by default set to `true`, that is passed into the `*PackedContext()` constructors. `unpacked_` is populated only if `fill_unpacked = true`. The `create_*_context()` functions will call the constructor with `fill_unpacked = true`, which ensures that `unpacked_` is populated for serialization. However, when loading a model, the `*PackedContext` objects are deserialized by calling `*PackedContext::pack()`, which will call the constructor with `fill_unpacked = false` - the original tensors will therefore be discarded after packing, saving a significant amount of CPU memory during model inference. 
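The trade-off above (keep the original arguments only if the context may be serialized again) is easy to illustrate; the Python-flavored sketch below uses made-up names — the real implementation is C++ in the Vulkan backend.

```python
import torch

class LinearPackedContext:
    # Illustrative only: "packed" is a stand-in for the GPU-friendly re-layout that
    # inference needs; "unpacked" keeps the original tensors solely for serialization.
    def __init__(self, weight, bias, fill_unpacked=True):
        self.packed = [weight.t().contiguous(), bias]
        self.unpacked = [weight, bias] if fill_unpacked else None

# Deserialization paths can skip the unpacked copies and save that CPU memory.
ctx = LinearPackedContext(torch.randn(8, 4), torch.randn(8), fill_unpacked=False)
assert ctx.unpacked is None
```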
Differential Revision: [D38761645](https://our.internmc.facebook.com/intern/diff/D38761645/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83587 Approved by: https://github.com/kimishpatel commit cf52680d406f96304420fe070b362905089d0268 Author: Nikita Karetnikov Date: Wed Aug 17 19:23:12 2022 +0200 [primTorch] Add OpInfo and ref for eye (#82323) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82323 Approved by: https://github.com/ezyang commit 1a49eea30102b9d083367f8f088f60381576a54c Author: Nikita Karetnikov Date: Wed Aug 17 14:26:09 2022 +0200 [primTorch] Add ref for diag_embed (#82322) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82322 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit ea037344e81d5645ad2e8863a9b6ccdb33f60320 Author: Edward Z. Yang Date: Wed Aug 17 10:20:13 2022 -0700 Reset compile cache to fix flaky test (#83608) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83608 Approved by: https://github.com/seemethere, https://github.com/malfet commit ad44079952b945262808af8fa841994f736c1fe2 Author: Peter Bell Date: Wed Aug 17 17:38:07 2022 +0100 Remove conj kernels for real dtypes (#80374) `conj_physical_stub` is currently implemented for all dtypes despite it just being a plain copy for real dtypes. So, instead we should defer to the existing copy kernel in these cases. On my build for one CUDA architecture, I see a 2.2 MB decrease in `libtorch_cuda.so` size. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80374 Approved by: https://github.com/ngimel commit 652fb0335513026632cf14e78a095aa485fcc81d Author: John Clow Date: Tue Aug 16 10:10:18 2022 -0700 Symbolic Shape Analaysis: Add Generalized List of Tensor Shape Support (#78679) Pull Request resolved: https://github.com/pytorch/pytorch/pull/78679 Approved by: https://github.com/davidberard98 commit b1e02ae8fc85883dc1390add7e8b2ae1cc611c4c Author: Peter Bell Date: Wed Aug 17 16:58:29 2022 +0100 Move PythonRefInfos for `torch.fft` into opinfo.definitions (#83277) Ref #82518 The moves `python_ref_db` entries for `torch.fft` into `opinfo/definitions/fft.py`. I ran into a problem with `_find_referenced_opinfo` since it's called at init time for the module, yet relies on the completed op_db list. This PR fixes the circular dependency by explicitly passing in an op_db argument which can point to only the locally defined `op_db`. An alternative solution would be to have a different folder for the `op_db` and the `python_ref_db` definitions. However that would mean losing the convenience of having closely related opinfos be in the same file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83277 Approved by: https://github.com/albanD commit 5f50289b399ef8f24025832aaf12d143279ed5c0 Author: Peter Bell Date: Wed Aug 17 16:58:28 2022 +0100 Move OpInfos for torch.fft into `opinfo.definitions` (#83276) Ref #82518 This moves the `op_db` entries into `opinfo/definitions/fft.py` and also appends them to `common_methods_invocations.op_db` so existing users are unaffected by this change. 
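A minimal sketch of the re-export arrangement described above (the module paths come from the commit message; the exact symbols are assumptions):

```python
# torch/testing/_internal/opinfo/definitions/fft.py  (sketch)
op_db = [
    # OpInfo entries for torch.fft.* live here now, e.g. SpectralFuncInfo("fft.fft", ...)
]

# torch/testing/_internal/common_methods_invocations.py  (sketch)
# The aggregate list keeps its public name, so existing
# `from torch.testing._internal.common_methods_invocations import op_db` imports still work.
from torch.testing._internal.opinfo.definitions import fft as fft_opinfos

op_db = [
    # ... OpInfos still defined directly in this file ...
]
op_db.extend(fft_opinfos.op_db)
```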
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83276 Approved by: https://github.com/albanD commit 85ef1a1cd104033143cfa9a3f19fc3ab326d737a Author: Fabio Rocha Date: Wed Aug 17 10:45:14 2022 -0500 [primTorch] added ref for nn.functional.glu (#82214) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82214 Approved by: https://github.com/ngimel commit bd0ad7a84f125435f9e0685f86b1ca2efd2bd43b Author: Mikayla Gawarecki Date: Tue Aug 16 22:08:13 2022 +0000 Add backward support for rudimentary NestedTensor.sum(dim) (#82625) Per offline discussion, this will be updated to use expand once expand semantics for nested tensor have been fleshed out. Next steps will be to add support for other features for forward sum mentioned on #82387 and likewise update the backward Pull Request resolved: https://github.com/pytorch/pytorch/pull/82625 Approved by: https://github.com/albanD commit 68d2d7866daf766c3ff1b2b450d0a2e4d50e9908 Author: Max Podkorytov Date: Wed Aug 17 18:10:36 2022 +0000 [static-runtime] change the backend for permute_copy (#83532) Summary: Testing wrappable dims Differential Revision: D38717563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83532 Approved by: https://github.com/mikeiovine commit 30af17cea7e983b9e60354e05f7dcc2688183073 Author: John Clow Date: Tue Aug 16 10:18:47 2022 -0700 [HIP] Add extra exception handling for non-ROCM builds (#83009) I got the following error on OSX, which doesn't have HIP. As this file is supposed to compile with non-HIP builds, I added this error to the errors to ignore. ``` Traceback (most recent call last): File "test/test_profiler.py", line 31, in from torch.profiler._pattern_matcher import (Pattern, NamePattern, File "/Users/jclow/pytorch3/torch/profiler/_pattern_matcher.py", line 9, in import torch.utils.benchmark as benchmark File "/Users/jclow/pytorch3/torch/utils/benchmark/__init__.py", line 2, in from torch.utils.benchmark.utils.timer import * # noqa: F403 File "/Users/jclow/pytorch3/torch/utils/benchmark/utils/timer.py", line 8, in from torch.utils.benchmark.utils import common, cpp_jit File "/Users/jclow/pytorch3/torch/utils/benchmark/utils/cpp_jit.py", line 13, in from torch.utils import cpp_extension File "/Users/jclow/pytorch3/torch/utils/cpp_extension.py", line 19, in from .hipify import hipify_python File "/Users/jclow/pytorch3/torch/utils/hipify/hipify_python.py", line 34, in from .cuda_to_hip_mappings import CUDA_TO_HIP_MAPPINGS File "/Users/jclow/pytorch3/torch/utils/hipify/cuda_to_hip_mappings.py", line 34, in rocm_path = subprocess.check_output(["hipconfig", "--rocmpath"]).decode("utf-8") File "/Users/jclow/opt/anaconda3/envs/pytorch3/lib/python3.8/subprocess.py", line 415, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, File "/Users/jclow/opt/anaconda3/envs/pytorch3/lib/python3.8/subprocess.py", line 493, in run with Popen(*popenargs, **kwargs) as process: File "/Users/jclow/opt/anaconda3/envs/pytorch3/lib/python3.8/subprocess.py", line 858, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "/Users/jclow/opt/anaconda3/envs/pytorch3/lib/python3.8/subprocess.py", line 1706, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) PermissionError: [Errno 13] Permission denied: 'hipconfig' ``` Differential Revision: [D38766067](https://our.internmc.facebook.com/intern/diff/D38766067) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83009 Approved by: https://github.com/malfet commit 
244690205fdf5e27fe37c44aeea183eccd391307 Author: Chien-Chin Huang Date: Wed Aug 17 00:25:57 2022 -0700 [FSDP] Use _init_from_local_tensor to create ShardedTensor to avoid communication overhead (#82911) FSDP originally uses `_init_from_local_shards_and_global_metadata()` to create a ShardedTensor for sharded_state_dict(). We have seen some non-trivial overhead if the number of tensors is large. Using `_init_from_local_shards_and_global_metadata ` can significantly reduce the overhead. For a model with ~250 tensors in the state_dict trained with 16 GPUs, the original `sharded_state_dict` takes ~1.7 seconds and this PR reduces the overhead to ~0.6 seconds. Differential Revision: [D38452170](https://our.internmc.facebook.com/intern/diff/D38452170/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82911 Approved by: https://github.com/awgu commit 5e8b4c64aa1ecd2e56aeeb06559af97162effdd1 Author: Horace He Date: Wed Aug 17 02:45:19 2022 +0000 Delayed compilation of backwards pass to when backwards runs (#83367) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83367 Approved by: https://github.com/ngimel, https://github.com/zou3519 commit 1f7153bee80090c22490a26828bd74fb0d9fc60e Author: Digant Desai Date: Wed Aug 17 16:31:14 2022 +0000 [quant] Optionally clamp weights post quantization (#83438) Summary: Until we add quant_{min, max} args to `torch.quantize_per_{channel, tensor}`, this patch will make sure we will honor observer's restrictions on quantized values. Test Plan: Added new tests, run with - `buck run caffe2/test:quantization -- quantization.core.test_utils` Differential Revision: D38624119 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83438 Approved by: https://github.com/andrewor14 commit ab02b898116a2d8f0e6da2689298027543362ea9 Author: migeedz Date: Tue Aug 16 16:11:42 2022 -0700 expand torch.full to reason about integers (#83087) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83087 Approved by: https://github.com/jansel commit 1a38724ed3e189152f11ef576d0ff15a31a39eaa Author: migeedz Date: Tue Aug 16 16:11:42 2022 -0700 fix bug in a linear constraint (#82938) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82938 Approved by: https://github.com/jansel commit 0061e6762985511a1487c1c12a5353a6d4faf73e Author: PyTorch MergeBot Date: Wed Aug 17 16:19:38 2022 +0000 Revert "NestedTensor Softmax (#83435)" This reverts commit d7fc76a1ed33a155c8be795abe67315a4459e1a0. Reverted https://github.com/pytorch/pytorch/pull/83435 on behalf of https://github.com/huydhn due to This is suspected to break functorch tests in trunk https://hud.pytorch.org/pytorch/pytorch/commit/d7fc76a1ed33a155c8be795abe67315a4459e1a0 commit eb4e03ddf89dcacf204e3f95ae0711dfbcc1939b Author: Horace He Date: Wed Aug 17 01:26:25 2022 +0000 made some minor tweaks to minifier to reduce outputs more often (#83565) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83565 Approved by: https://github.com/voznesenskym commit 84c4b079328c2a97d78d95c47b841f8dca6036bb Author: albanD Date: Wed Aug 17 15:08:05 2022 +0000 Make sure that we can load old optimizer checkpoint (#83588) We want to make sure that we can load checkpoints that were saved with older version of the code (which doesn't contain the differentiable attribute). 
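A hedged sketch of what this guarantee means in practice (the key name comes from the commit message; the exact mechanism in the PR may differ): loading a state_dict whose param_groups predate the `differentiable` flag must fall back to a default rather than fail.

```python
import torch

params = [torch.nn.Parameter(torch.randn(2))]
opt = torch.optim.Adam(params, lr=0.1)

# Simulate a checkpoint written by an older PyTorch that had no `differentiable` option.
old_state = opt.state_dict()
for group in old_state["param_groups"]:
    group.pop("differentiable", None)

opt.load_state_dict(old_state)  # should succeed; the missing flag falls back to its default
```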
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83588 Approved by: https://github.com/mikaylagawarecki commit dcda907693e9e7661c099c3b9ed25fadaed273f8 Author: ProGamerGov Date: Wed Aug 17 14:53:02 2022 +0000 Add docstring type formatting guidelines to `CONTRIBUTING.md` (#83536) This PR builds on the following past PRs, and serves to help improve the consistency of PyTorch's docstring formatting. * `boolean` -> `bool` and `string` -> `str`: https://github.com/pytorch/pytorch/pull/82410 * Don't use plural of types: https://github.com/pytorch/pytorch/pull/82474 * Capitalize the Callable type, `callable` -> `Callable` : https://github.com/pytorch/pytorch/pull/82487 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83536 Approved by: https://github.com/H-Huang, https://github.com/albanD commit 9f03444f705a52833d8e3220446ae48e285c2cf9 Author: Ivan Yashchuk Date: Wed Aug 17 14:46:04 2022 +0000 Add torch.ops.aten -> torch._refs mapping to TorchRefsMode using decomposition_table (#82657) This PR adds the possibility to convert `torch.ops.aten` calls to `torch._refs` and consequently prims under TorchRefsMode. New test, `test_aten_overload_to_prims`, in `test/test_prims.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82657 Approved by: https://github.com/jjsjann123, https://github.com/ezyang commit 7af3208412c282c4b8d216f413df5bd26287f9fd Author: Jagadish Krishnamoorthy Date: Wed Aug 17 14:42:33 2022 +0000 [ROCm] Enable test_ddp_profiling_torch_profiler (#82749) Signed-off-by: Jagadish Krishnamoorthy Pull Request resolved: https://github.com/pytorch/pytorch/pull/82749 Approved by: https://github.com/ngimel, https://github.com/rohan-varma commit c8ec4ceb9ba8746b2b8c149ebb981914d2ef0483 Author: Sergii Dymchenko Date: Wed Aug 17 13:23:11 2022 +0000 Delete checked_dense_tensor_unwrap (#83543) As TH gone long time ago. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83543 Approved by: https://github.com/ZainRizvi, https://github.com/ezyang commit 822a8e057fa4e6a6a8413d22bae2c1a5aa853134 Author: Peter Bell Date: Wed Aug 17 01:40:00 2022 +0100 Use opmath_type for CUDA logcumsumexp (#83425) This improves precision by reducing the number of narrowing conversions, as well as reducing compile times from 2m 30s to 1m 25s on my machine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83425 Approved by: https://github.com/ngimel commit 2a096e940d33a33c4eb6df1c2ed4da607bd31a7f Author: Fabio Rocha Date: Tue Aug 16 14:23:09 2022 -0500 [primTorch] support for a few magic methods (#83524) Added support for mapping __rsub__, __rtruediv__, __rfloordiv__, __floordiv__, __pow__, and __rpow__ in TorchRefsMode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83524 Approved by: https://github.com/ngimel commit 5aab57e112d244f0cf3bbab30db640e52a0c2c44 Author: Emilio Castillo Date: Wed Aug 17 07:20:37 2022 +0000 Make Adam optimizer differentiable (#82205) Continues [80938](https://github.com/pytorch/pytorch/pull/80938) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82205 Approved by: https://github.com/albanD commit 11d4d91bdccab35928fd56a4fc5eac781f9fb71e Author: Larry Liu <8188269+larryliu0820@users.noreply.github.com> Date: Tue Aug 16 21:11:45 2022 -0700 [torchgen] Add logic in annotation parser to accept alias set (#83501) Extending the current regex in `model.py` to support annotation alias set. See issue #83214. 
Ideally we should have a full fledged lexer similar to `schema_type_parser.cpp`, since regex can be more and more difficult to read if we add more support to it. Adding this to unblock this issue for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83501 Approved by: https://github.com/SherlockNoMad commit a09c3fcb8d0e290b8d398c110ddbfd845e6c4058 Author: CaoE Date: Wed Aug 17 06:19:54 2022 +0000 Add loss operators to fp32 cast policy of AutocastCPU (#81689) Add loss operators to fp32 cast policy of AutocastCPU to improve accuracy of BFloat16 training. There will be no performance impact on fp32, only a slight impact on bf16 training. This is because conv transpose does not fully support bf16 before, and it will be replaced to _convolution in graph mode. If _convolution is in lower precision cast policy it will throw dtype related errors. conv transpose does not fully support bf16 yet, so _convolution still needs to be removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81689 Approved by: https://github.com/malfet commit d3a176a156819d57ed442579b806ff027402f4dc Author: fduwjj Date: Tue Aug 16 17:32:07 2022 -0700 [PT-D][BE][TP perf 1/N] Get rid of unnecessary collectives in Embedding/EmbeddingBag and use autograd-enabled collectives (#81853) These two ops (Embedding and EmbeddingBag for ShardedTensor) especially for row-wise sharding is very inefficient and hard to fit in the concept of future design. So this PR is trying to: 1. Remove all unnecessary collective communications. Only one gather and one reduce(or reduce scatter) is needed. 2. Use auto-grad enabled collectives so that we can use these ops in real model training. 3. Some minor code cleaning 4. Treat input differently when it's replicated tensor. (Will add more for this for the next few PRs). Differential Revision: [D37965687](https://our.internmc.facebook.com/intern/diff/D37965687/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81853 Approved by: https://github.com/wanchaol commit e09821f784bc9e9f13d361e9d2eb3fa1d7d07263 Author: Edward Z. Yang Date: Tue Aug 16 15:05:52 2022 -0400 Avoid using true division in split_dim (#83527) This makes it more amenable to tracing with dynamic shapes, where we don't support SymFloats yet. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83527 Approved by: https://github.com/ngimel commit d7fc76a1ed33a155c8be795abe67315a4459e1a0 Author: Michael Gschwind Date: Wed Aug 17 04:19:23 2022 +0000 NestedTensor Softmax (#83435) Summary: Simple mask compute and softmax Test Plan: unit test Differential Revision: D38711915 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83435 Approved by: https://github.com/erichan1 commit 343b5f86512f75f8f3bd4b90749c0459743b9e72 Author: John Clow Date: Tue Aug 16 10:18:47 2022 -0700 [TorchTidy] Adding support for accessing strides and scalars (#80072) Differential Revision: [D37571570](https://our.internmc.facebook.com/intern/diff/D37571570) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80072 Approved by: https://github.com/robieta commit 1a09b05c940b44968ccd6ba94698150512defbc7 Author: Nikita Shulga Date: Wed Aug 17 03:22:56 2022 +0000 Fix `torch.equal` on CPU (#83350) `torch.equal` should not raise an exception when comparing tensors of different types I.e. `torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2], dtype=torch.float)))` should return True rather than raise an exception. 
Also, this makes it consistent with GPU behaviour Fixes https://github.com/pytorch/pytorch/issues/83314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83350 Approved by: https://github.com/albanD commit df62ea76d1b443aad8d92b8c6fdad18fae5c6eb6 Author: migeedz Date: Tue Aug 16 16:11:41 2022 -0700 add the nessesairy constraints for the next 5 benchmarks (#82923) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82923 Approved by: https://github.com/jansel commit aac622ad55a8127e770217c6773031817be33b5f Author: Jacob Szwejbka Date: Wed Aug 17 01:45:30 2022 +0000 Optionally run fbgemm in tracer (#83531) Summary: Well this tech debt has come back to haunt me. Gonna slap more duct-tape on it for today. Test Plan: ci Differential Revision: D38753249 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83531 Approved by: https://github.com/dhruvbird commit 31d4b6f52a9f74526f4f666348029da260254ea5 Author: Kulin Seth Date: Wed Aug 17 00:26:41 2022 +0000 [MPS] Fix conv1D and conv2D with non-matching strides/paddings (#83522) * Add reference to the github issue in test_mps.py Fixes https://github.com/pytorch/pytorch/issues/83180, https://github.com/pytorch/pytorch/issues/82921, https://github.com/pytorch/pytorch/issues/82711, https://github.com/pytorch/pytorch/issues/82563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83522 Approved by: https://github.com/albanD, https://github.com/malfet commit 0e2efaf9cca53890004718eba76dfefa74838aa3 Author: Catherine Lee Date: Wed Aug 17 00:19:39 2022 +0000 use global var for disabled and slow test dicts (#83487) as in title Additional changes: * run isort for imports * rename some variables * warning instead of print Test plan * checked logs to see that tests were still being disabled * checked pytest xmls to check that pytest still disables things Pull Request resolved: https://github.com/pytorch/pytorch/pull/83487 Approved by: https://github.com/malfet, https://github.com/huydhn commit 1ee9eb52b612f5fb4b63bbda832e44c8902edb64 Author: Brian Hirsh Date: Mon Aug 15 13:49:32 2022 -0700 fix native_layer_norm meta kernel parity w cuda (#83457) Fixes #83362 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83457 Approved by: https://github.com/eellison commit f4b7c10e14fabd3cf3998746f834bf0d0410c070 Author: Brian Hirsh Date: Mon Aug 15 13:49:32 2022 -0700 fix resnet50_quantized_qat and mobilenet_v2_quantized_qat <> functionalization (#83339) This won't actually fix the issue until we make FakeTensor always-on for AOTAutograd. I confirmed with the following benchmark (with `normalize_ir=False` and `use_functionalize=True`) in the dynamo/functorch config (run inside the `torch dynamo` repo): ``` terminal...$ python benchmarks/torchbench.py --training --devices=cuda --accuracy-aot-nop --generate-aot-autograd-stats --use-eval-mode --isolate --only=mobilenet_v2_quantized_qat cuda train mobilenet_v2_quantized_qat 0.967x p=0.00 terminal...$ python benchmarks/torchbench.py --training --devices=cuda --accuracy-aot-nop --generate-aot-autograd-stats --use-eval-mode --isolate --only=resnet50_quantized_qat cuda train resnet50_quantized_qat 0.943x p=0.00 ``` I explained a bit more in the comment: quantized models use a running-mean style op, `fused_moving_avg_obs_fake_quant`, that takes in the running min/max stored on the module and mutates them, potentially resizing them. 
That causes `AOTAutograd` to complain: AOTAutograd first takes views of the inputs (using `.detach().requires_grad_(grad)`), and plumbs them through the function to figure out what output to trace the backward with. These new inputs now have `TensorImpl::allow_tensor_metadata_change_ = false`, which causes the op to fail when it tries to resize the running counter variables. Once we're always using fake tensors, we shouldn't need to use `.detach().requires_grad_()` anymore (since we already have fresh fake tensors to trace with). Pull Request resolved: https://github.com/pytorch/pytorch/pull/83339 Approved by: https://github.com/ezyang commit 785f7f62984c2a017ae5f31173d405d658435a66 Author: PyTorch MergeBot Date: Tue Aug 16 23:30:43 2022 +0000 Revert "Use opmath_type for CUDA logcumsumexp (#83425)" This reverts commit 06a64f7eaa47ce430a3fa61016010075b59b18a7. Reverted https://github.com/pytorch/pytorch/pull/83425 on behalf of https://github.com/huydhn due to This break ROCm trunk test https://hud.pytorch.org/pytorch/pytorch/commit/06a64f7eaa47ce430a3fa61016010075b59b18a7 commit 3586af8adce03ac44c57e42de23b8a6676d78961 Author: Jerry Zhang Date: Tue Aug 16 20:23:42 2022 +0000 [quant] Remove unused quantize handler definitions (#83360) Summary: CommonQuantizeHandler This was added previously to make some of the refactor to use reference quantized model flow easier, now we have fully migrated to use reference quantized model flow, it's no longer needed, so we can remove it Also updated some comments Test Plan: python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps python test/test_quantization.py TestQuantizeFxModels Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/83360 Approved by: https://github.com/andrewor14 commit 059321469e87801f2300ebd8c44d667e1b12bfa3 Author: ssjia Date: Tue Aug 16 10:31:54 2022 -0700 [vulkan] Use aliases when retrieving from packed/unpacked lists in OpContexts (#83526) Instead of retrieving elements of pack/unpacked lists using raw indices, this diff introduces aliases which improve code readability and guard against future errors. 
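The change above is a general readability pattern; this Python-flavored illustration uses invented names (the real code is C++): index the packed/unpacked lists through named aliases instead of bare integers.

```python
from enum import IntEnum

class Packed(IntEnum):
    # One place defines the layout of the packed argument list.
    WEIGHT = 0
    BIAS = 1
    STRIDE = 2
    PADDING = 3

packed = ["<prepacked weight>", "<bias>", (1, 1), (0, 0)]
weight = packed[Packed.WEIGHT]    # reads as intent, not as a magic number
padding = packed[Packed.PADDING]
```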
Differential Revision: [D38748293](https://our.internmc.facebook.com/intern/diff/D38748293/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83526 Approved by: https://github.com/manuelcandales commit 31fad3926a34c57e05d25a2cc22abf4028ebfc78 Author: soulitzer Date: Tue Aug 16 16:20:44 2022 -0400 Add option to run anomaly mode without nan checking (#83481) Fixes https://github.com/pytorch/pytorch/issues/83117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83481 Approved by: https://github.com/albanD commit 1b437718a351b3bc2bb424975bdf29c82fb1c6c8 Author: Eli Uriegas Date: Tue Aug 16 13:07:16 2022 -0700 ci: Add workflow to build official docker images with multiarch (#83437) Resolves https://github.com/pytorch/pytorch/issues/80764 Signed-off-by: Eli Uriegas Pull Request resolved: https://github.com/pytorch/pytorch/pull/83437 Approved by: https://github.com/ZainRizvi, https://github.com/malfet commit 3a511e83549d86ce9a2a8de2b64be340d2a23e4e Author: samdow Date: Tue Aug 16 22:39:06 2022 +0000 [Expanded Weights] add 'same' and 'valid' padding support (#83345) Co-authored-by: Ashkan Adds "same" and "valid" padding support, as Opacus (well @ashkan-software) did https://github.com/pytorch/opacus/pull/451 Basics of it are this: - during forward pass, if there's "same" padding, we manually pad the input (NB: this will cause a small perf hit, haven't benchmarked yet) - during backward pass, the gradient wrt input needs to be cut down to the correct size if the original padding was same (conv_transpose doesn't accept string padding). Because conv_transpose will give us a gradient wrt the padded shape, we cut down the gradient to the correct size (we know how much padding we added to the left and right) - then, for the per sample gradients wrt weights, the input is already padded so neither the unfold nor group convolution have any padding Pull Request resolved: https://github.com/pytorch/pytorch/pull/83345 Approved by: https://github.com/zou3519 commit cd68f08992e0985f1032726571ebe781aa50f82a Author: Justin Chu Date: Tue Aug 16 19:58:48 2022 +0000 [ONNX] Update the script for version updates (#83283) This PR updates the `tools/onnx/update_default_opset_version.py` script to ensure files are edited correctly to prepare for the opset 17 support in torch.onnx. - (clean up) Move script to `main()` - Add an `--skip_build` option to avoid building pytorch if we want to rerun the process due to errors after compilation is done - Update to edit the correct files now that the onnx files were refactored Pull Request resolved: https://github.com/pytorch/pytorch/pull/83283 Approved by: https://github.com/thiagocrepaldi, https://github.com/AllenTiTaiWang, https://github.com/abock commit d52d2bd5a94a49332d843bb909e4db58fe7ab1b2 Author: Jeff Daily Date: Tue Aug 16 20:49:33 2022 +0000 [ROCm] MIOpen fused convolution relu (#82002) Adds MIOpen fused convolution relu for fp32 and contiguous memory format. Adds fallbacks for conv + z + bias + relu, fp16, and channels last until MIOpen adds these features. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82002 Approved by: https://github.com/ngimel, https://github.com/malfet commit 79356311f5c3d9283da118286ed5dd8de7d43fb3 Author: kshitij12345 Date: Tue Aug 16 20:31:46 2022 +0000 update merge failed msg (#83462) Message seemed a bit incorrect to read Ref: https://github.com/pytorch/pytorch/pull/82955#issuecomment-1215523319 Before PR: ``` Merge failed due to This PR is too stale; the last push date was more than 3 days ago. 
Please rebase and try again. Raised by https://github.com/pytorch/pytorch/actions/runs/2862480424 ``` After PR ``` Merge failed Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. Raised by https://github.com/pytorch/pytorch/actions/runs/2862480424 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83462 Approved by: https://github.com/janeyx99, https://github.com/ZainRizvi commit 4b597019b735124b46947e8ff0490e0311f0bdb8 Author: Driss Guessous Date: Tue Aug 16 20:22:19 2022 +0000 [Nested Tensor] Created Nested Tensor to Nested Tensor Views (#82658) This is PR is pulling out all the changes from #81838 specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views, two nested_tensors sharing the same buffer_ but with different NestedTensor meta data. The function `create_nested_tensor_view` is a helper function for creating a new nested tensor whose storage aliases the base causing the underlying storage to be shared - and is therefore a view. This function by itself is not differentiable and therefore autograd does not track its uses. If a nested tensor function implementation uses this helper in its implementation the aten_op must meet two requirements: - The function must return a view of the input - The function must be explicit and defines its backward A bug was found when creating a base tensor out of inference mode and then creating a view in inference mode. This test has been aded to this PR in order to show the effect of the change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658 Approved by: https://github.com/albanD commit 94ba085ce0ccd48c2f1bd2eb1956b7800b873384 Author: George Qi Date: Mon Aug 15 19:14:34 2022 +0000 [maskedtensor] first commit, core and creation (#82836) * __->__ #82836 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82836 Approved by: https://github.com/albanD, https://github.com/bhosmer commit 84146f3d0db1a39e6a4b363e15e30c6f6f159f75 Author: Peter Bell Date: Tue Aug 16 16:25:15 2022 +0100 Vectorize cpu tensor conversions (#80905) This adds vectorization to the copy kernel acting between different dtypes through the use of `at::vec::convert`. Currently `vec::convert` falls back to a scalar copy loop for most dtypes, however the compiler is still better able to auto-vectorize the loop since it doesn't involve stride calculations. In a simple timeit benchmark I see around a 2x speedup copying from int32 to various dtypes: | To dtype | Master (us) | This PR (us) | |----------|-------------|--------------| | int64 | 23.8 | 10.3 | | float32 | 16.8 | 8.18 | | float64 | 18.0 | 9.47 | Pull Request resolved: https://github.com/pytorch/pytorch/pull/80905 Approved by: https://github.com/ngimel commit 559c8b8992cff9602b35735b837c3971e9224f36 Author: Peter Bell Date: Tue Aug 16 16:25:15 2022 +0100 Fix _refs.lcm using floating point maths (#82950) `lcm` is meant to use integer maths, but the use of `true_divide` introduces a promotion to float and thus a loss of precision. This also introduces promoting low precision integers to int32 which is required for 100% consistency with the C++ implementation since the "usual arithmetic conversions" means the intermediate terms are calculated to `int` precision in C++. This only really matters when the lower precision dtype would overflow, however the test cases for lcm do involve overflows. 
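The precision problem described above is easy to reproduce with plain tensor ops (a hedged sketch, not the ref's actual code):

```python
import torch

a = torch.tensor([2**61 + 1])   # int64 value too large for float32/float64 to represent exactly
b = torch.tensor([2])
g = torch.gcd(a, b)             # gcd is 1 here

lcm_via_float = (a / g * b).to(torch.int64)  # true division promotes to float -> rounding
lcm_via_int = a // g * b                     # floor division stays in int64

print(lcm_via_float.item(), lcm_via_int.item(), torch.lcm(a, b).item())
# Only the integer route matches torch.lcm for values this large.
```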
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82950 Approved by: https://github.com/ngimel commit 9745edf971f125a870b6db75dc9536185fbc84c7 Author: Michael Melesse Date: Tue Aug 16 19:56:17 2022 +0000 [ROCM] Enable test_memory_format_nn_BatchNorm tests on ROCM (#82512) This enables some unit tests related to BatchNorm for ROCM. We make sure that we call the MIOpen library incases where it can handle it and use the default path in other cases. When MIOpen implements this specific case we will file a follow up PR enabling that code path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82512 Approved by: https://github.com/jeffdaily, https://github.com/albanD commit 06a64f7eaa47ce430a3fa61016010075b59b18a7 Author: Peter Bell Date: Tue Aug 16 15:47:15 2022 +0100 Use opmath_type for CUDA logcumsumexp (#83425) This improves precision by reducing the number of narrowing conversions, as well as reducing compile times from 2m 30s to 1m 25s on my machine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83425 Approved by: https://github.com/ngimel commit 0faf10b0f4da0bcaaaba8834fa6984bbd7f793f9 Author: Peter Bell Date: Tue Aug 16 15:47:14 2022 +0100 Split ScanKernels.cu (#83422) On my machine `ScanKernels.cu` takes 10 minutes for just a single architecture which is by far the highest compile time of any single file. So this splits it into multiple files, the slowest being `LogcumsumexpKernel.cu` which takes 2m 30s Pull Request resolved: https://github.com/pytorch/pytorch/pull/83422 Approved by: https://github.com/ngimel commit 8473e6968487d736c75470eeae4d63b11156b622 Author: Pruthvi Madugundu Date: Tue Aug 16 19:22:31 2022 +0000 [ROCm] Fixes the kernel asserts API declaration mismatch error (#81790) This problem updates the the PR [#73040](https://github.com/pytorch/pytorch/pull/73040) The compilation error in pyTorch with ROCm is successful with these changes when `NDEBUG` is enabled. Solution: For HIP we keep `__device__ __assert_fail()` and for host side compilation we want to use the `__assert_fail()` from the glibc library. Tested the code by compiling with below steps ``` python3 tools/amd_build/build_amd.py python3 setup.py develop --cmake-only cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build cmake --build build ``` The UT test_fixed_cuda_assert_async is still skipped due performance overhead. 
cc @jithunnair-amd Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790 Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet commit b156f3329e75a5040fdf348c7dd6552bee5fcb40 Author: Nikita Karetnikov Date: Tue Aug 16 01:15:04 2022 +0200 [primTorch] Add ref for movedim (#83278) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83278 Approved by: https://github.com/ngimel commit 2c79b9c638e98b2fed0e29c601c3ed3e227280e6 Author: Slava Kovalevskyi Date: Tue Aug 16 18:38:06 2022 +0000 module names are made more consistent with POI page (#83219) Less intrusive update after the first attempt got reverted: https://github.com/pytorch/pytorch/pull/83127 fix for: #83363 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83219 Approved by: https://github.com/malfet commit 92a005883a18a0026161d933aa27a08d0ef68af2 Author: Mikayla Gawarecki Date: Fri Aug 12 23:14:13 2022 +0000 [easy] Fix .sizes() call in saved_variable.cpp for nested tensor (#83356) Small fix so that TestMultipleDispatch in the above PR will throw the correct error when using an inplace operation on a saved nested input Pull Request resolved: https://github.com/pytorch/pytorch/pull/83356 Approved by: https://github.com/albanD commit 7e7afcabe70712e8d6bad0bba0adcd93e69cfd6b Author: Richard Zou Date: Tue Aug 16 09:02:50 2022 -0700 [functorch] classify some more test failures (#83520) Classifies test failures for test_vmapvjp and test_vmapjvpall Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83520 Approved by: https://github.com/samdow commit 52b8a581970830ad1b9a0c7ec66d16f2e9eae5b8 Author: Richard Zou Date: Tue Aug 16 09:02:50 2022 -0700 [functorch] audit skips and xfails for vjp tests (#83518) Went through test_vjp, test_grad, test_vjpvjp Pull Request resolved: https://github.com/pytorch/pytorch/pull/83518 Approved by: https://github.com/samdow commit 64a3fbae5e6e4fe5a5b71d065b30549ca7a03847 Author: Richard Zou Date: Tue Aug 16 08:04:27 2022 -0700 [functorch] Classify some vmap failures with comments (#83517) The silent incorrectness issues are hi-pri Test Plan: - wait for tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/83517 Approved by: https://github.com/samdow commit a3e3cbfbbe093d9046d704738c52212f7e76b11c Author: Nikita Karetnikov Date: Tue Aug 16 01:15:04 2022 +0200 [primTorch] Add ref for diagonal and more test inputs (#82321) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82321 Approved by: https://github.com/ngimel commit 4010f96121f85f452d22692fb7fa4f3fb84d76d8 Author: Nikita Karetnikov Date: Tue Aug 16 01:15:03 2022 +0200 [primTorch] Fix off by 1 in `canonicalize_dim` (#83198) Also fix an issue in the `unsqueeze` ref due to this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83198 Approved by: https://github.com/ngimel commit 6a5ca409da0d5f54997d5a74dbf36782bd42c3a3 Author: Seonglyong Gong Date: Tue Aug 16 17:42:34 2022 +0000 Revert "reverted diff: Add python stack tracing option on on-demand flow" (#82378) Summary: Changes: add an option in Config; can use 'PYTHON_STACK_TRACE=true' option (via .conf) deliver PYTHON_STACK_TRACE value to kineto_client_interface start() abstract class also changed. 
Trace after changes by running //kineto/libkineto/fb/integration_tests/trace_tester.cpp (requested by chaekit) https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1657304871%2F127.0.0.1%2Flibkineto_activities_3502962.json.gz&bucket=gpu_traces Test Plan: launch a python test case with the following command for on-demand flow: echo -e "PYTHON_STACK_TRACE=true" > /tmp/scott_kineto.conf && dyno gputrace --gputrace_duration 300ms --gpuconf /tmp/scott_kineto.conf Reviewed By: chaekit Differential Revision: D38220201 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82378 Approved by: https://github.com/chaekit commit bb94a13d0369ff6489d12c0e658b1257dabdf3d9 Author: ssjia Date: Mon Aug 15 20:05:00 2022 -0700 [vulkan][fix] Fix unsafe direct array access (#83432) This diff fixes an instance of unsafe array access of a sizes array. Differential Revision: [D38710499](https://our.internmc.facebook.com/intern/diff/D38710499/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83432 Approved by: https://github.com/kirklandsign, https://github.com/manuelcandales commit 08d38bbcfba4c09c8463acebd7bfc436c7f3a229 Author: ssjia Date: Mon Aug 15 15:10:24 2022 -0700 [vulkan] Replace *_size() functions with get_dim() (#83423) This diff replaces the `batch_size`, `channels_size`, etc. functions with a template function `get_dim` to reduce duplicate code. `batch_size()` has been replaced with `get_dim` and so on. Differential Revision: [D38706526](https://our.internmc.facebook.com/intern/diff/D38706526/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83423 Approved by: https://github.com/salilsdesai commit cd86d2551525446f2046b34143935d9db1fc5e7a Author: Nikita Karetnikov Date: Tue Aug 16 17:23:00 2022 +0000 [primTorch] Move addcdiv from decompositions -> refs (#80842) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80842 Approved by: https://github.com/Lezcano, https://github.com/ngimel commit 59fccab85775da7a0ecf33bda241f81eade3ad4b Author: Ramiro Leal-Cavazos Date: Tue Aug 16 17:13:21 2022 +0000 [Shape Fns] Fix handling of empty dim list in sum_mean_dim shape fn (#83357) The current implementation of the `sum_mean_dim` shape function takes `dim=[]` and `dim=None` to mean "no reduction". However, in the ops `torch.sum` and `torch.mean`, both `dim=[]` and `dim=None` are equivalent to "reduce along all dimensions". This commit fixes the handling of `dim` in the `sum_mean_dim` shape function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83357 Approved by: https://github.com/Gamrix commit d589aa531ffc3cb657f9f76d38abf034df474c57 Author: Michael Gschwind Date: Tue Aug 16 16:53:10 2022 +0000 TS jit 2 week compatibility window for new TEL forward() (#83467) Summary: TS jit 2 week compatibility window for new TEL forward() Test Plan: sandcastle Differential Revision: D38711177 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83467 Approved by: https://github.com/erichan1, https://github.com/jbschlosser commit cf4fb5a6313d467a1024849a4de0f253400247ff Author: Edward Z. Yang Date: Tue Aug 16 10:23:24 2022 -0400 Make test_jvpvjp_as_strided_scatter skipped due to flaky (#83516) Signed-off-by: Edward Z. 
Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83516 Approved by: https://github.com/zou3519 commit f9a3d82220586e3804bbc5317658115296dc6c18 Author: albanD Date: Tue Aug 16 15:32:43 2022 +0000 Fix typo in MPS allocator (#83465) Fixes https://github.com/pytorch/pytorch/issues/81184 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83465 Approved by: https://github.com/malfet commit 4c8cfb57aa3ac58112efb693635198b07edf008f Author: Edward Z. Yang Date: Mon Aug 15 20:03:13 2022 -0700 Convert SymInt tracing to mode based tracing (#83380) We're on our way to deleting ProxyTensor entirely (see https://github.com/pytorch/pytorch/pull/83330 ), but before we can do that, we have to delete ProxySymInt first. Here's the plan. Changes in torch.fx.experimental.symbolic_shapes * The general idea is to do mode based tracing. This means we need a mode that can interpose on all SymInt operations. There are a few ways to do this, but I've done it the easy way: (1) I have a separate mode for SymInt operations specifically called SymDispatchMode, and (2) this mode operates on PySymInt (and not the basic SymInt which is user visible). I elided Int from the name because if we add SymFloats I want to use the same mode to handle those as well, and I used Dispatch rather than Function because this is the "inner" dispatch operating PySymInt and not SymInt (this is not a perfect analogy, but SymFunctionMode definitely seemed wrong as you still must go through the C++ binding.) The mode is entirely implemented in Python for ease of implementation. We could have implemented this more symmetrically to TorchFunctionMode in C++, but I leave that as later work; this API is unlikely to get used by others (unlike TorchFunctionMode). One downside to not doing the mode in C++ is that we still have to do the hop via a preexisting PySymInt to wrap; this is currently not a big deal as conversion to SymInts only really happens when there is already another SymInt floating around. SymDispatchMode is pared down from TorchDispatchMode; there is no ancestor tracking since I don't expect people to be mixing up SymDispatchModes. * I made some improvements for tracing. When I invoke the SymDispatchMode handler, I would like constants to show up as constants, so they can be directly inlined into the FX graph (rather than going through a wrapping process first, and then the wrapped SymInt being used in the operation). To do this, I directly track if a PySymInt is a constant at construction time. Only wrapped PySymInts are constants. * For convenience, PySymInts now support all magic methods that regular SymInts do. This is so that redispatch inside the SymDispatchMode can be written the idiomatic way `func(*args, **kwargs)` where func is an operator. The original names are retained for direct C++ calls. Changes in torch.fx.experimental.proxy_tensor * OK, so we got a new SymDispatchMode, so we define a ProxySymDispatchMode and activate it when we start tracing. This mode is currently unconditionally activated although technically we only need to activate it when doing symbolic tracing (it doesn't matter either way as there are no SymInts if you are not doing symbolic tracing). * We delete ProxySymInt. To do this, we must now record the proxy for the SymInt some other way. Based on discussion with Chillee, it is more intuitive to him if the proxies are still recorded on the SymInt in some way. So we store them in the `__dict__` of the PySymInt, indexed by Tracer. 
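A toy sketch of that per-tracer storage scheme, purely illustrative (the class and attribute names are made up; the real code lives in torch.fx.experimental.proxy_tensor):

```python
class FakeTracer:            # hypothetical stand-in for an FX tracer
    pass

class FakeSymInt:            # hypothetical stand-in for PySymInt
    def __init__(self, value):
        self.value = value

_PROXY_SLOT = "__sym_proxies"   # made-up attribute name

def set_proxy(sym, tracer, proxy):
    # Stash the proxy in the symint's own __dict__, keyed by the tracer, so
    # each active tracer can later find "its" proxy for the same symbolic int.
    sym.__dict__.setdefault(_PROXY_SLOT, {})[tracer] = proxy

def get_proxy(sym, tracer):
    return sym.__dict__.get(_PROXY_SLOT, {}).get(tracer)

tracer = FakeTracer()
s = FakeSymInt(3)
set_proxy(s, tracer, "proxy(s0)")
assert get_proxy(s, tracer) == "proxy(s0)"
```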
An improvement is to make this a weak map, so that we remove all of these entries when the tracer dies. In an original version of this PR, I keyed on the mode itself, but tracer is better as it is accessible from both modes (and as you will see, we will need to fetch the map from both the ProxySymDispatchMode as well as the ProxyTorchDispatchMode.) The implementation of SymDispatchMode now simply retrieves the proxies, performs the underlying operation as well as the FX graph recording, and then records the output proxy to the PySymInt. Note that FX tracing does not work with proxies and SymInts, so we manually call `call_function` to ensure that the correct operations get recorded to the graph. This means conventional FX retracing with proxies only will not work with these graphs, but there wasn't really any reason to do this (as opposed to `make_fx` retracing) anyway. Constants are detected and converted directly into Python integers. * SymInts can show up as arguments to tensor operations, so they must be accounted for in ProxyTorchDispatchMode as well. This is done by searching for SymInt arguments and converting them into proxies before the proxy call. This can be done more efficiently in a single `tree_map` but I'm lazy. The helper `unwrap_symint_proxy` conveniently implements the unwrapping in one place given a tracer; unfortunately it cannot be shared with SymDispatchMode as SymDispatchMode gets PySymInts, but ProxyTensorMode gets SymInts. Similarly, tensors that are returned from tensor operations can have SymInts in their shapes, which need fresh proxies allocated. To avoid leaking internal details of SymInt shape computation to the tensor operation graph, these SymInts are always given proxies derived from `x.size(dim)` call on their return tensor. We also need to do this for strides and numel but have not done so yet. Furthermore, we must avoid tracing internal SymInt calls while we run meta operations on the true operation; this is achieved by also disabling SymInt tracing on the inside of tensor tracing. This is analogous to how tensor tracing is disabled inside the implementation of tracing mode, but unfortunately we are unable to use the same mechanism (this would have been easier if the two modes could be combined somehow, and I am amenable to suggestions to try harder to achieve this.) * Because there are no more ProxySymInts, we no longer need to do anything to unwrap SymInt. Furthermore, we do not need to reallocate ProxySymInts on class creation. * If a bare SymInt without a Proxy is encountered, it is assumed that this must be a constant. `create_arg` handles this case. Non-constant free SymInts result in an assert error. * The initial input handling in `dispatch_trace` involves traversing all of the input tensors, traversing over their shapes, and assigning proxies for the SymInts in shapes in the same way we handle proxies for the output tensors. The preexisting testing is inadequate but will be better after I rebase past https://github.com/pytorch/pytorch/pull/82209 Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83380 Approved by: https://github.com/samdow commit a3907ca92d73b380f4f1624e39b7f0c6a06ea5b1 Author: Edward Z. Yang Date: Mon Aug 15 20:03:12 2022 -0700 Respect TorchDispatchMode for shallow_copy_and_detach (#83372) I noticed I was missing tensor creations with modes when I tried to delete proxy tensor. This was the cause. Hypothetically, all PyInterpreter calls could get this treatment. 
But I think it only matters for detach; the rest do not return Tensors and most modes will not be interested in them. Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83372 Approved by: https://github.com/zou3519 commit 1665715cb0fcc7c5dd5311c36cd0ef5dac660442 Author: Brian Hirsh Date: Mon Aug 15 13:27:33 2022 -0700 add sym_strides() function, use in fake/proxy tensors (#81300) Add `TensorImpl::sym_strides`, bind it to python with `torch.ops.aten.sym_strides`, and use it in `ProxyTensor` and `FakeTensor`. Before, `ProxyTensor` was generating `ProxySymInt`'s for the sizes, but not for the strides. Internally we still represent strides with a `SymIntArrayRef` though, so I ran into some weird issues where sizes were showing up as `ProxySymInt`, but strides were `PySymInt`'s. Differential Revision: [D38594558](https://our.internmc.facebook.com/intern/diff/D38594558) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81300 Approved by: https://github.com/ezyang commit 2e8e386d6f718cc6e4e5df21ec8b02ae730a6283 Author: Ivan Yashchuk Date: Tue Aug 16 13:40:40 2022 +0000 Add refs for real and imag to __all__ (#83057) `imag` and `real` were missing from the ref's `__all__` list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83057 Approved by: https://github.com/ngimel commit 3500df79831d21725ea8d3883254ea8e3f11245e Author: kshitij12345 Date: Tue Aug 16 13:30:40 2022 +0000 [composite compliance] istft (#82955) Ref #69991 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82955 Approved by: https://github.com/zou3519 commit a9ba3fe1dbf2cea45c9a7e723010c27c211f7fe3 Author: PyTorch MergeBot Date: Tue Aug 16 10:14:25 2022 +0000 [vision hash update] update the pinned vision hash (#83503) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned vision hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83503 Approved by: https://github.com/pytorchbot commit 445b55682a4b794c8de89a2ffe25eaf96d1bd149 Author: PyTorch MergeBot Date: Tue Aug 16 10:13:25 2022 +0000 [xla hash update] update the pinned xla hash (#83502) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83502 Approved by: https://github.com/pytorchbot commit f77adb71cb78eabf8967d1d8139dfd893d58c5c5 Author: Horace He Date: Tue Aug 16 06:49:34 2022 +0000 made some minor refactoring of minifier (#83439) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83439 Approved by: https://github.com/ezyang commit ff75562cffb54d7500a94a1091e06dc9b5c284fc Author: Rob Zinkov Date: Tue Aug 16 08:19:46 2022 +0000 Adding maximize to rprop (#81864) Added the maximize flag #68052 to rprop optimizer and updates the respective tests. 
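For context on the `maximize` flag, the usual convention in torch.optim is that `maximize=True` flips the sign of the gradient so the step performs ascent rather than descent; a generic sketch of that convention (not Rprop's actual sign-based update logic):

```python
import torch

def step_with_maximize(params, lr=0.1, maximize=False):
    # Generic sketch: with maximize=True the gradient is negated, turning the
    # usual descent update into ascent. Illustrative only; see torch.optim.Rprop
    # for the real per-parameter step-size logic.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            grad = -p.grad if maximize else p.grad
            p.add_(grad, alpha=-lr)

w = torch.nn.Parameter(torch.tensor([1.0]))
(2.0 * w).sum().backward()          # d/dw = 2
step_with_maximize([w], maximize=True)
print(w)                            # moved in the +gradient direction: 1.2
```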
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81864 Approved by: https://github.com/albanD commit a8941aa99676436eb4f10595b010bb48d6dc3c6e Author: Nikita Shulga Date: Tue Aug 16 07:51:11 2022 +0000 [BE] Better test stats errors (#83484) When `BUILD_ENVIRONMENT` is not defined, print sensible error message Which is better than: ``` Could not download https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/test-times.json because: 'BUILD_ENVIRONMENT' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/83484 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi commit 03f9c7922edb4089ff65f0e2fde6c7cf8e2ab640 Author: Nikita Shulga Date: Mon Aug 15 19:29:20 2022 -0700 [FuncTorch] Fix compilation with -Werror (#83463) - Fixed signed-unsigned compares - Get rid of unused variables - Typecast to `PyCFunction` via `(void*)` - `ssize_t` is not a valid type on Win32 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83463 Approved by: https://github.com/zou3519 commit a5f688ad0a9630c71695ca132ec4236a51677067 Author: Rohan Varma Date: Tue Aug 16 07:20:58 2022 +0000 Remove unused var from ProcessGroupGloo (#83286) This variable was not used since the logic was refactored into `getElapsedTime`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83286 Approved by: https://github.com/mrshenli, https://github.com/H-Huang commit 43a94daca0a141fb3d7b3cdceee3b3dc296a0aa4 Author: PyTorch MergeBot Date: Tue Aug 16 02:47:17 2022 +0000 Revert "Add a workflow to cache third party dependencies on S3 (#83306)" This reverts commit 0961dd6e9981fe6580ee3f1d2c622f526d8ab9a9. Reverted https://github.com/pytorch/pytorch/pull/83306 on behalf of https://github.com/huydhn due to The fix in https://github.com/pytorch/pytorch/pull/83489 still doesn't work commit 641d75d0ba0053816a73a6c977ac4a2d6e00e896 Author: PyTorch MergeBot Date: Tue Aug 16 02:42:25 2022 +0000 Revert "S3 third-party deps sync workflow: specify correct secrets (#83489)" This reverts commit 7ec49810cc8a44cc2bc53c115fb03656ab136751. Reverted https://github.com/pytorch/pytorch/pull/83489 on behalf of https://github.com/huydhn due to It still doesn't work https://github.com/pytorch/pytorch/runs/7849815716 commit 7ec49810cc8a44cc2bc53c115fb03656ab136751 Author: Ivan Zaitsev Date: Tue Aug 16 02:16:12 2022 +0000 S3 third-party deps sync workflow: specify correct secrets (#83489) A followup for: #83306 image The correct secrets to access OSSCI buckets have `AWS_OSSCI_S3_***` prefix. This PR makes the workflow use the correct secrets. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83489 Approved by: https://github.com/huydhn, https://github.com/malfet commit 794ae6417456bedb99749a1b50ab17a9fda2b466 Author: Rohan Varma Date: Fri Aug 12 01:24:43 2022 +0000 [FSDP] Pass kwargs to load_state_dict (#83309) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83309 Approved by: https://github.com/awgu commit 0961dd6e9981fe6580ee3f1d2c622f526d8ab9a9 Author: Ivan Zaitsev Date: Mon Aug 15 23:58:36 2022 +0000 Add a workflow to cache third party dependencies on S3 (#83306) For the context, see #75703, pytorch/builder#1096. 
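Regarding the "Better test stats errors" change a couple of entries up: the goal is to fail with a readable message when `BUILD_ENVIRONMENT` is unset instead of surfacing a bare KeyError. A generic sketch of that pattern (the helper name is made up, not the actual tools script):

```python
import os
import sys

def require_env(name: str) -> str:
    # Fail fast with an actionable message instead of a bare KeyError such as
    # "because: 'BUILD_ENVIRONMENT'".
    value = os.environ.get(name)
    if value is None:
        sys.exit(f"{name} is not set; it is needed to select the correct "
                 "test-times file. Set it to the CI build name and retry.")
    return value
```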
Note: depends on the docker image `pytorch/sync_s3_thirdparty_deps` from pytorch/builder#1096 Summary of additions: * workflow config (based on pytorch/sync_s3_thirdparty_deps GH action) * S3 mapping config (sync_s3_cache.yml) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83306 Approved by: https://github.com/huydhn commit c177a7124cee54c5dfc30c38ca56414ddd9b5dca Author: John Clow Date: Mon Aug 15 13:59:16 2022 -0700 Adding additional debug logging and documentation for shape functions (#77115) Pull Request resolved: https://github.com/pytorch/pytorch/pull/77115 Approved by: https://github.com/eellison commit 9e1daf764419fb0b57c66dedce486e067cdd6be0 Author: Horace He Date: Mon Aug 15 22:07:52 2022 +0000 skip flaky tests for now (#83482) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83482 Approved by: https://github.com/huydhn commit cb64b558eeb273d1ad8f1e25a11725fc85bb1ddc Author: Edward Z. Yang Date: Mon Aug 15 11:56:27 2022 -0400 Add spaces so example is flake8 compatible (#83420) Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83420 Approved by: https://github.com/jbschlosser commit b75a214b36d74a775f3c4542f58ac8f9c9f107fd Author: Huy Do Date: Mon Aug 15 21:25:05 2022 +0000 Fix windows flaky test env var (#83466) Reland #83426 and #83436 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83466 Approved by: https://github.com/atalman commit a234774096cb99cb8826fc56092f720e74634987 Author: PyTorch MergeBot Date: Mon Aug 15 21:11:25 2022 +0000 Revert "Fix flaky tests env variable length on Windows (#83426)" This reverts commit beb83d7419bc21f9ca8881de81c8421409dd8f3a. Reverted https://github.com/pytorch/pytorch/pull/83426 on behalf of https://github.com/huydhn due to This has a bug which breaks internal builds D38714900 and other OSS test. The bug has been fixed by https://github.com/pytorch/pytorch/pull/83436. But we decide that it is safer to revert both, merge them into one PR, then reland the fix commit 6266003d71e85beabef52da54ccf2ae70c11491d Author: PyTorch MergeBot Date: Mon Aug 15 21:07:45 2022 +0000 Revert "Check if IMPORT_DISABLED_TESTS is set (#83436)" This reverts commit 1187dedd336e4f6c0028e0d081b676c2f5796316. Reverted https://github.com/pytorch/pytorch/pull/83436 on behalf of https://github.com/huydhn due to The previous change breaks internal builds D38714900 and other OSS tests. The bug has been fixed by this PR. But we decide that it is safer to revert both, merge them into one PR, then reland the fix commit dffa5d309a6f55aa8e07db827750a2b99e9e6b6e Author: Catherine Lee Date: Mon Aug 15 20:03:08 2022 +0000 shard `trunk / linux-bionic-cuda10.2-py3.9-gcc7 / test (default` from 2 -> 4 (#83424) it takes a long time Pull Request resolved: https://github.com/pytorch/pytorch/pull/83424 Approved by: https://github.com/huydhn commit 43f950af201f8a39e5728a65e03cfcafec04585d Author: soulitzer Date: Mon Aug 15 12:17:17 2022 -0400 Manually shard slow-gradcheck CI job to prevent timeout (#83354) Fixes https://github.com/pytorch/pytorch/issues/83335 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83354 Approved by: https://github.com/malfet, https://github.com/albanD commit 13e2a0a04838414058a45888a081a9ac81adb311 Author: Milad Mohammadi Date: Mon Aug 15 19:48:25 2022 +0000 Add `getDynamicValue` to `dynamic_ir` (#82188) Add `getDynamicValue` to `dynamic_ir`. 
This is a precondition to support https://github.com/pytorch/xla/issues/3759 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82188 Approved by: https://github.com/Krovatkin commit ca4f3534514a2310c540986cceb4cf9c3d6a0995 Author: Milad Mohammadi Date: Mon Aug 15 19:47:14 2022 +0000 Updated the build process for PyTorch/XLA CI testing (#82497) Updated the build process for PyTorch/XLA CI testing Related issue https://github.com/pytorch/pytorch/issues/82425 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82497 Approved by: https://github.com/wconstab commit 60295e3abde373a1ca7ceea518ef65c8d7c7f058 Author: Richard Zou Date: Mon Aug 15 08:42:38 2022 -0700 [functorch] Delete functorch_lagging_op_db (#83418) No need to have a lagging op db because there are no more sync issues between functorch and pytorch. If someone adds a new OpInfo, then we should explicitly check if we support it or not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83418 Approved by: https://github.com/samdow commit 759c37a4f4fb1962c37650bf24d88a7fa0918a5e Author: Nikolay Korovaiko Date: Mon Aug 15 19:12:15 2022 +0000 make sure arguments are tuples otherwise they won't be hashable (#83342) make sure arguments are tuples otherwise they won't be hashable if used in autograd.py or any other places that uses dictionaries for that matter Pull Request resolved: https://github.com/pytorch/pytorch/pull/83342 Approved by: https://github.com/bdhirsh, https://github.com/albanD commit a65825116a6e166d1d201acab9389986c827e422 Author: Horace He Date: Mon Aug 15 17:34:35 2022 +0000 clear cache in-between each test (#83431) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83431 Approved by: https://github.com/ezyang, https://github.com/malfet commit 1187dedd336e4f6c0028e0d081b676c2f5796316 Author: Huy Do Date: Mon Aug 15 18:40:20 2022 +0000 Check if IMPORT_DISABLED_TESTS is set (#83436) I just realize that some tests, i.e. MAC MPS https://github.com/pytorch/pytorch/runs/7842997537?check_suite_focus=true, doesn't have this IMPORT_DISABLED_TESTS set. Thus, it can be None Pull Request resolved: https://github.com/pytorch/pytorch/pull/83436 Approved by: https://github.com/clee2000 commit 2d8f091f6a193fc0e9d3c6e91bce8d66fc3f31de Author: Edward Z. Yang Date: Mon Aug 15 06:56:28 2022 -0700 Move TorchDispatchModeTLS to c10/core (#83370) I need to access it directly from TensorImpl to route directly TensorImpl induced operations to modes (upcoming PR). Signed-off-by: Edward Z. Yang Pull Request resolved: https://github.com/pytorch/pytorch/pull/83370 Approved by: https://github.com/zou3519 commit beb83d7419bc21f9ca8881de81c8421409dd8f3a Author: Huy Do Date: Mon Aug 15 17:18:55 2022 +0000 Fix flaky tests env variable length on Windows (#83426) We are currently keeping all flaky tests in a single env variable and this breaks Windows CI because the upper limit of a single variable there is only 32767 chars, i.e. 
https://github.com/pytorch/pytorch/runs/7840599767 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83426 Approved by: https://github.com/janeyx99 commit 03061472768c11afe8a3a822458ac36dc1130112 Author: Horace He Date: Mon Aug 15 02:44:25 2022 +0000 Fix issue with compiling under with_grad (#83395) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83395 Approved by: https://github.com/jansel commit ff5fe9e62284cb0a3ca6976c40978c9022c4503f Author: Jeff Daily Date: Mon Aug 15 16:04:09 2022 +0000 [ROCm] enable jiterator (#77982) Enables jiterator for ROCm builds. This includes necessary porting when hiprtc and nvrtc behavior differed. This also ported ROCm versus CUDA differences w.r.t. MAX_DIMS and NUM_THREADS from the non-jiterator code paths into jiterator. CI with ciflow/trunk label to force running ROCm workflows that are currently trunk-only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77982 Approved by: https://github.com/ngimel commit 316cb8a06a9860b7540d4032314005e7afd936aa Author: Mor Tzur Date: Mon Aug 15 15:08:55 2022 +0000 embedded_interpreter_hip (#83329) Summary: Adding embedded_interpreter_hip and deps to enable torch::deploy on AMD. Test Plan: Sandcastle Reviewed By: zrphercule Differential Revision: D38546701 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83329 Approved by: https://github.com/jfix71 commit 1bf23713658a702bd177f0e3478baa8269c3f120 Author: chengscott <60510scott@gmail.com> Date: Mon Aug 15 14:47:17 2022 +0000 Rename path on Windows from lib/x64 to lib\x64 (#83417) Use `os.path.join` to join path Pull Request resolved: https://github.com/pytorch/pytorch/pull/83417 Approved by: https://github.com/ezyang commit 50b1ecc28f9b8cf8c560f9ce087c87c4b18a41fb Author: kshitij12345 Date: Mon Aug 15 14:31:57 2022 +0000 [fix] cat : support different dtype tensor with 0-dim like before (#83391) Fixes: https://github.com/pytorch/pytorch/issues/82457 TODO: * [x] Add test (new test also passes on PyTorch version 1.11) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83391 Approved by: https://github.com/ezyang commit d4bd88b64be3d7156cf8617e33faae8eca307ce4 Author: Jesse Cai Date: Fri Aug 12 12:12:57 2022 -0700 [Quant][fx] Remove WEIGHT_INDEX_DICT and BIAS_INDEX_DICT (#83263) Summary: This change adds in input_type_to_index mappings to the backend patterns for `nn.functional.linear`, `nn.functional.conv1d`, `nn.functional.conv1d`, and `nn.functional.conv3d`. This let's us remove `WEIGHT_INDEX_DICT` and `BIAS_INDEX_DICT` from `prepare.py`. 
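Roughly, the shift described here (and continued just below) looks like the following sketch; the dict shapes and helper are illustrative, not the exact torch.ao.quantization BackendConfig API:

```python
from torch.nn import functional as F

# Before: global, per-op tables living in prepare.py (illustrative contents).
WEIGHT_INDEX_DICT = {F.linear: [1], F.conv2d: [1]}

# After: each backend pattern config carries its own argument-index mapping.
linear_pattern_config = {
    "pattern": F.linear,
    "input_type_to_index": {"weight": 1, "bias": 2},  # F.linear(x, weight, bias)
}

def is_weight_arg(pattern_config: dict, arg_index: int) -> bool:
    # Consult the pattern's own metadata instead of a global table.
    return pattern_config.get("input_type_to_index", {}).get("weight") == arg_index

assert is_weight_arg(linear_pattern_config, 1)
```

Keeping the index mapping next to each pattern means the prepare logic no longer needs hard-coded knowledge about every op.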
Instead we pass around `backend_config` and check wether an arg is weight/bias agains that config Test Plan: ``` python test/test_quantization.py TestQuantizeFx python test/test_quantization.py TestQuantizeFxOps ``` Reviewers: @andrewor14 Subscribers: Tasks: Tags: quant, fx Differential Revision: [D38705516](https://our.internmc.facebook.com/intern/diff/D38705516) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83263 Approved by: https://github.com/andrewor14 --- .bazelrc | 2 +- .circleci/README.md | 468 + .../cimodel/data/simple/ios_definitions.py | 48 +- .../cimodel/data/simple/macos_definitions.py | 18 +- .circleci/cimodel/data/simple/nightly_ios.py | 8 +- .../data/simple/util/branch_filters.py | 11 + .../cimodel/data/simple/util/versions.py | 14 +- .circleci/config.yml | 289 +- .circleci/docker/build.sh | 127 +- .circleci/docker/centos-rocm/Dockerfile | 4 + .circleci/docker/common/install_base.sh | 9 +- .circleci/docker/common/install_conda.sh | 19 +- .circleci/docker/common/install_cudnn.sh | 8 +- .circleci/docker/common/install_docs_reqs.sh | 4 +- .circleci/docker/common/install_protobuf.sh | 2 +- .circleci/docker/common/install_rocm.sh | 54 +- .circleci/docker/common/install_rocm_magma.sh | 29 + .circleci/docker/common/install_ucc.sh | 9 +- .circleci/docker/requirements-ci.txt | 28 +- .circleci/docker/ubuntu-cuda/Dockerfile | 6 + .circleci/docker/ubuntu-rocm/Dockerfile | 4 + .circleci/docker/ubuntu/Dockerfile | 1 + .circleci/generate_config_yml.py | 6 + .circleci/scripts/binary_install_miniconda.sh | 4 +- .circleci/scripts/binary_ios_build.sh | 2 +- .circleci/scripts/binary_ios_test.sh | 21 +- .circleci/scripts/binary_ios_upload.sh | 4 +- .circleci/scripts/binary_populate_env.sh | 11 +- .circleci/scripts/binary_upload.sh | 24 +- .circleci/scripts/driver_update.bat | 2 +- .../scripts/functorch_doc_push_script.sh | 47 + .circleci/scripts/python_doc_push_script.sh | 3 + .circleci/scripts/setup_ci_environment.sh | 6 +- .../scripts/setup_linux_system_environment.sh | 2 +- .circleci/scripts/vs_install.ps1 | 2 +- .circleci/scripts/vs_install_cmath.ps1 | 2 +- .circleci/scripts/windows_cudnn_install.sh | 4 +- .../job-specs/job-specs-custom.yml | 266 +- .github/ISSUE_TEMPLATE/ci-sev.md | 2 + .github/PULL_REQUEST_TEMPLATE.md | 2 +- .github/actionlint.yaml | 3 + .github/actions/build-android/action.yml | 2 +- .../actions/calculate-docker-image/action.yml | 12 +- .../download-build-artifacts/action.yml | 2 +- .../actions/filter-test-configs/action.yml | 62 + .../actions/get-workflow-job-id/action.yml | 4 +- .github/actions/pull-docker-image/action.yml | 23 - .github/actions/setup-rocm/action.yml | 7 +- .github/actions/setup-ssh/action.yml | 17 - .github/actions/setup-win/action.yml | 10 +- .github/actions/teardown-linux/action.yml | 28 - .../actions/test-pytorch-binary/action.yml | 1 - .../actions/upload-test-artifacts/action.yml | 26 +- .github/auto_request_review.yml | 29 + .github/ci_commit_pins/huggingface.txt | 1 + .github/ci_commit_pins/text.txt | 1 + .github/ci_commit_pins/timm.txt | 1 + .github/ci_commit_pins/torchbench.txt | 1 + .github/ci_commit_pins/torchdynamo.txt | 1 - .github/ci_commit_pins/triton.txt | 1 + .github/ci_commit_pins/vision.txt | 2 +- .github/ci_commit_pins/xla.txt | 2 +- .github/labeler.yml | 51 + .github/merge_rules.json | 302 - .github/merge_rules.yaml | 374 + .github/requirements-gha-cache.txt | 18 + .github/requirements/README.md | 24 + .github/requirements/conda-env-Linux-X64 | 10 + .github/requirements/conda-env-macOS-ARM64 | 20 + 
.github/requirements/conda-env-macOS-X64 | 18 + .../requirements/pip-requirements-macOS.txt | 22 + .github/scale-config.yml | 69 - .github/scripts/README.md | 4 +- .../scripts/build_publish_nightly_docker.sh | 44 - .github/scripts/build_triton_wheel.py | 51 + .github/scripts/check_labels.py | 87 + .github/scripts/comment_on_pr.py | 34 + .github/scripts/ensure_actions_will_cancel.py | 20 +- .github/scripts/fetch_latest_green_commit.py | 4 +- .github/scripts/filter_test_configs.py | 207 + .../scripts/generate_binary_build_matrix.py | 41 +- .github/scripts/generate_ci_workflows.py | 15 +- .github/scripts/generate_pytorch_version.py | 31 +- .github/scripts/gql_mocks.json | 39801 +++++++++++----- .github/scripts/install_nvidia_utils_linux.sh | 57 - .github/scripts/parse_ref.py | 24 +- .github/scripts/pr-sanity-check.sh | 60 + .github/scripts/process_commit.py | 106 - .github/scripts/run_torchbench.py | 42 +- .github/scripts/test_check_labels.py | 77 + .../scripts/test_fetch_latest_green_commit.py | 5 +- .github/scripts/test_filter_test_configs.py | 118 + .github/scripts/test_trymerge.py | 45 +- .github/scripts/trymerge.py | 373 +- .github/scripts/trymerge_explainer.py | 93 +- .github/scripts/tryrebase.py | 1 + .github/scripts/update_commit_hashes.py | 1 + .github/scripts/wait_for_ssh_to_drain.sh | 13 - .github/templates/common.yml.j2 | 320 +- .../linux_binary_build_workflow.yml.j2 | 7 +- .../macos_binary_build_workflow.yml.j2 | 27 +- .../windows_binary_build_workflow.yml.j2 | 4 +- .github/workflows/_android-build-test.yml | 14 +- .../workflows/_android-full-build-test.yml | 52 +- .github/workflows/_bazel-build-test.yml | 14 +- .github/workflows/_binary-build-linux.yml | 59 +- .github/workflows/_binary-test-linux.yml | 53 +- .github/workflows/_binary-upload.yml | 66 +- .github/workflows/_buck-build-test.yml | 27 +- .github/workflows/_docs.yml | 67 +- .github/workflows/_ios-build-test.yml | 76 +- .github/workflows/_linux-build.yml | 65 +- .github/workflows/_linux-test.yml | 81 +- .github/workflows/_mac-build.yml | 80 +- .github/workflows/_mac-test-mps.yml | 42 +- .github/workflows/_mac-test.yml | 91 +- .github/workflows/_rocm-test.yml | 38 +- .github/workflows/_run_android_tests.yml | 20 +- .github/workflows/_update-commit-hash.yml | 2 +- .github/workflows/_win-build.yml | 43 +- .github/workflows/_win-test.yml | 40 +- .github/workflows/auto_request_review.yml | 22 + .github/workflows/build-triton-wheel.yml | 149 + .github/workflows/check-labels.yml | 44 + .github/workflows/docker-builds.yml | 16 +- .github/workflows/docker-release.yml | 110 + .../generated-linux-binary-conda-nightly.yml | 480 - ...inux-binary-libtorch-cxx11-abi-nightly.yml | 692 +- ...linux-binary-libtorch-pre-cxx11-master.yml | 18 +- ...inux-binary-libtorch-pre-cxx11-nightly.yml | 692 +- ...enerated-linux-binary-manywheel-master.yml | 22 +- ...nerated-linux-binary-manywheel-nightly.yml | 1021 +- ...rated-macos-arm64-binary-conda-nightly.yml | 54 +- ...rated-macos-arm64-binary-wheel-nightly.yml | 158 +- .../generated-macos-binary-conda-nightly.yml | 72 +- ...acos-binary-libtorch-cxx11-abi-nightly.yml | 92 +- ...acos-binary-libtorch-pre-cxx11-nightly.yml | 92 +- .../generated-macos-binary-wheel-nightly.yml | 72 +- ...generated-windows-binary-conda-nightly.yml | 1246 +- ...d-windows-binary-libtorch-debug-master.yml | 4 +- ...-windows-binary-libtorch-debug-nightly.yml | 1004 +- ...windows-binary-libtorch-release-master.yml | 4 +- ...indows-binary-libtorch-release-nightly.yml | 1004 +- .../generated-windows-binary-wheel-master.yml | 
236 - ...generated-windows-binary-wheel-nightly.yml | 1246 +- .github/workflows/inductor.yml | 41 + .github/workflows/labeler.yml | 20 + .github/workflows/lint.yml | 154 +- .github/workflows/mac-mps.yml | 4 + .github/workflows/nightly.yml | 9 + .github/workflows/periodic.yml | 172 +- .github/workflows/pr-labels.yml | 32 - .github/workflows/pull.yml | 209 +- .../workflows/push_nightly_docker_ghcr.yml | 39 - .github/workflows/revert.yml | 27 +- .github/workflows/run_torchbench.yml | 38 +- .github/workflows/scorecards.yml | 55 + .github/workflows/trunk.yml | 199 +- .github/workflows/trymerge.yml | 38 +- .github/workflows/tryrebase.yml | 28 +- .github/workflows/update-commit-hashes.yml | 37 - .github/workflows/update-viablestrict.yml | 27 +- .github/workflows/update_pytorch_labels.yml | 2 +- .github/workflows/update_s3_htmls.yml | 2 +- .github/workflows/upload-test-stats.yml | 31 +- .github/workflows/weekly.yml | 19 + .gitignore | 15 +- .gitmodules | 6 + .jenkins/caffe2/bench.sh | 54 - .jenkins/caffe2/build.sh | 231 - .jenkins/caffe2/dirty.sh | 7 - .jenkins/caffe2/test.sh | 7 +- .jenkins/pytorch/build-asan.sh | 2 +- .jenkins/pytorch/build-tsan.sh | 29 + .jenkins/pytorch/build.sh | 58 +- .jenkins/pytorch/common.sh | 22 - .jenkins/pytorch/common_utils.sh | 102 +- .jenkins/pytorch/dirty.sh | 9 - .jenkins/pytorch/macos-build.sh | 6 +- .jenkins/pytorch/macos-common.sh | 46 - .jenkins/pytorch/macos-test.sh | 25 - .jenkins/pytorch/multigpu-test.sh | 13 +- .jenkins/pytorch/test.sh | 251 +- .../win-test-helpers/build_pytorch.bat | 13 +- .../install_test_functorch.bat | 9 - .../activate_miniconda3.bat | 2 +- .../installation-helpers/install_magma.bat | 2 +- .../installation-helpers/install_mkl.bat | 2 +- .../installation-helpers/install_sccache.bat | 4 +- .../win-test-helpers/setup_pytorch_env.bat | 3 +- .jenkins/pytorch/win-test.sh | 4 - .lintrunner.toml | 50 +- BUILD.bazel | 44 +- CITATION | 10 - CITATION.cff | 73 + CMakeLists.txt | 118 +- CODEOWNERS | 57 +- CONTRIBUTING.md | 126 +- Dockerfile | 33 +- MANIFEST.in | 1 + Makefile | 4 + README.md | 16 +- RELEASE.md | 8 +- WORKSPACE | 11 +- android/gradle.properties | 2 +- .../src/main/cpp/pytorch_jni_common.cpp | 2 +- .../src/main/cpp/pytorch_jni_jit.cpp | 12 +- .../src/main/cpp/pytorch_jni_lite.cpp | 12 +- aten/CMakeLists.txt | 4 +- aten/src/ATen/ATen.h | 4 + aten/src/ATen/BatchedTensorImpl.cpp | 2 +- aten/src/ATen/BatchingRegistrations.cpp | 33 +- aten/src/ATen/CMakeLists.txt | 32 +- aten/src/ATen/Context.cpp | 39 +- aten/src/ATen/Context.h | 27 + aten/src/ATen/DLConvertor.cpp | 25 +- aten/src/ATen/DeviceGuard.h | 3 +- aten/src/ATen/Dispatch.h | 18 +- aten/src/ATen/EmptyTensor.cpp | 99 +- aten/src/ATen/EmptyTensor.h | 33 +- aten/src/ATen/ExpandUtils.cpp | 10 +- aten/src/ATen/ExpandUtils.h | 36 +- aten/src/ATen/FunctionalInverses.cpp | 65 +- aten/src/ATen/FunctionalStorageImpl.cpp | 90 +- aten/src/ATen/FunctionalStorageImpl.h | 68 +- aten/src/ATen/FunctionalTensorWrapper.cpp | 167 +- aten/src/ATen/FunctionalTensorWrapper.h | 62 +- aten/src/ATen/FunctionalizeFallbackKernel.cpp | 16 +- aten/src/ATen/InferSize.h | 21 +- aten/src/ATen/NamedTensorUtils.cpp | 6 +- aten/src/ATen/NamedTensorUtils.h | 3 +- aten/src/ATen/NestedTensorImpl.cpp | 149 +- aten/src/ATen/NestedTensorImpl.h | 95 +- aten/src/ATen/NumericUtils.h | 3 +- aten/src/ATen/OpaqueTensorImpl.h | 6 +- aten/src/ATen/PadNd.h | 28 + aten/src/ATen/Parallel.h | 1 + aten/src/ATen/PythonTorchFunctionTLS.cpp | 24 +- aten/src/ATen/PythonTorchFunctionTLS.h | 18 +- aten/src/ATen/SavedTensorHooks.cpp | 56 +- 
aten/src/ATen/SavedTensorHooks.h | 34 +- aten/src/ATen/SparseCsrTensorImpl.cpp | 142 +- aten/src/ATen/SparseCsrTensorImpl.h | 11 +- aten/src/ATen/SparseCsrTensorUtils.h | 52 + aten/src/ATen/SparseTensorImpl.cpp | 18 +- aten/src/ATen/SparseTensorImpl.h | 43 +- aten/src/ATen/TensorGeometry.cpp | 11 +- aten/src/ATen/TensorGeometry.h | 72 +- aten/src/ATen/TensorIndexing.h | 57 +- aten/src/ATen/TensorIterator.cpp | 16 +- aten/src/ATen/TensorIterator.h | 11 +- aten/src/ATen/TensorMeta.h | 1 + aten/src/ATen/TensorSubclassLikeUtils.h | 32 +- aten/src/ATen/TensorUtils.cpp | 46 +- aten/src/ATen/TensorUtils.h | 14 + aten/src/ATen/ThreadLocalState.cpp | 21 +- aten/src/ATen/ThreadLocalState.h | 19 +- aten/src/ATen/Utils.h | 53 - aten/src/ATen/VmapTransforms.cpp | 3 +- aten/src/ATen/VmapTransforms.h | 3 +- aten/src/ATen/WrapDimUtils.h | 90 +- aten/src/ATen/autocast_mode.cpp | 627 +- aten/src/ATen/autocast_mode.h | 32 + aten/src/ATen/core/ATen_fwd.h | 1 + aten/src/ATen/core/Formatting.cpp | 42 +- aten/src/ATen/core/Formatting.h | 4 +- aten/src/ATen/core/IListRef.h | 14 +- aten/src/ATen/core/IListRef_inl.h | 8 +- aten/src/ATen/core/IListRef_test.cpp | 14 +- aten/src/ATen/core/List_test.cpp | 4 +- aten/src/ATen/core/NamedRegistrations.cpp | 2 - aten/src/ATen/core/PhiloxRNGEngine.h | 1 - aten/src/ATen/core/PythonFallbackKernel.cpp | 29 +- aten/src/ATen/core/PythonFallbackKernel.h | 2 +- .../core/PythonOpRegistrationTrampoline.cpp | 28 + .../core/PythonOpRegistrationTrampoline.h | 18 + aten/src/ATen/core/TensorAccessor.h | 2 +- aten/src/ATen/core/TensorBase.h | 58 + aten/src/ATen/core/TorchDispatchModeTLS.cpp | 58 - aten/src/ATen/core/TorchDispatchModeTLS.h | 25 - aten/src/ATen/core/TorchDispatchUtils.cpp | 31 + aten/src/ATen/core/TorchDispatchUtils.h | 17 + aten/src/ATen/core/Variadic.h | 9 + aten/src/ATen/core/boxing/KernelFunction.h | 59 +- .../ATen/core/boxing/KernelFunction_impl.h | 67 +- .../impl/kernel_function_legacy_test.cpp | 10 +- .../core/boxing/impl/kernel_function_test.cpp | 4 +- .../boxing/impl/kernel_lambda_legacy_test.cpp | 10 +- .../core/boxing/impl/kernel_lambda_test.cpp | 4 +- .../impl/make_boxed_from_unboxed_functor.h | 30 +- .../make_boxed_from_unboxed_functor_test.cpp | 6 +- aten/src/ATen/core/class_type.cpp | 4 +- aten/src/ATen/core/custom_class.cpp | 1 + .../ATen/core/dispatch/DispatchKeyExtractor.h | 11 +- aten/src/ATen/core/dispatch/Dispatcher.cpp | 51 +- aten/src/ATen/core/dispatch/Dispatcher.h | 87 +- aten/src/ATen/core/dispatch/OperatorEntry.cpp | 90 +- aten/src/ATen/core/dispatch/OperatorEntry.h | 19 +- aten/src/ATen/core/dynamic_type.cpp | 4 - aten/src/ATen/core/dynamic_type.h | 3 +- aten/src/ATen/core/function_schema.cpp | 31 + aten/src/ATen/core/function_schema.h | 27 +- aten/src/ATen/core/interned_strings.h | 4 + aten/src/ATen/core/ivalue.cpp | 14 + aten/src/ATen/core/ivalue.h | 102 +- aten/src/ATen/core/ivalue_inl.h | 73 +- aten/src/ATen/core/jit_type.h | 184 +- aten/src/ATen/core/jit_type_base.h | 2 + aten/src/ATen/core/library.cpp | 66 +- aten/src/ATen/core/op_registration/adaption.h | 2 +- .../core/op_registration/infer_schema.cpp | 2 +- .../ATen/core/op_registration/infer_schema.h | 8 +- .../op_registration/op_registration_test.cpp | 12 +- aten/src/ATen/core/type.cpp | 5 + aten/src/ATen/cpp_custom_type_hack.h | 8 +- aten/src/ATen/cpu/vec/vec256/vec256.h | 45 + .../src/ATen/cpu/vec/vec256/vec256_bfloat16.h | 37 + aten/src/ATen/cpu/vec/vec256/vec256_double.h | 5 + aten/src/ATen/cpu/vec/vec256/vec256_float.h | 5 + .../ATen/cpu/vec/vec256/vec256_float_neon.h | 7 + 
aten/src/ATen/cpu/vec/vec256/vec256_int.h | 587 + aten/src/ATen/cpu/vec/vec256/vec256_qint.h | 78 + .../cpu/vec/vec256/vsx/vec256_float_vsx.h | 224 +- aten/src/ATen/cpu/vec/vec512/vec512.h | 50 + .../src/ATen/cpu/vec/vec512/vec512_bfloat16.h | 45 +- aten/src/ATen/cpu/vec/vec512/vec512_double.h | 5 + aten/src/ATen/cpu/vec/vec512/vec512_float.h | 5 + aten/src/ATen/cpu/vec/vec512/vec512_int.h | 481 + aten/src/ATen/cpu/vec/vec512/vec512_qint.h | 72 + aten/src/ATen/cpu/vec/vec_base.h | 50 +- aten/src/ATen/cuda/Atomic.cuh | 24 +- aten/src/ATen/cuda/CUDABlas.cpp | 63 +- aten/src/ATen/cuda/CUDABlas.h | 18 +- aten/src/ATen/cuda/CUDAContext.h | 2 + aten/src/ATen/cuda/CUDADataType.h | 8 +- aten/src/ATen/cuda/CUDAEvent.h | 12 +- aten/src/ATen/cuda/CUDAGeneratorImpl.cpp | 5 +- aten/src/ATen/cuda/CUDAGeneratorImpl.h | 15 +- aten/src/ATen/cuda/CUDAGraph.cpp | 41 +- aten/src/ATen/cuda/CUDAGraph.h | 1 + aten/src/ATen/cuda/CUDASparse.h | 15 +- aten/src/ATen/cuda/CUDASparseDescriptors.cpp | 8 +- aten/src/ATen/cuda/CUDASparseDescriptors.h | 15 +- aten/src/ATen/cuda/CublasHandlePool.cpp | 58 + aten/src/ATen/cuda/PeerToPeerAccess.cpp | 37 +- aten/src/ATen/cuda/detail/CUDAHooks.cpp | 17 +- aten/src/ATen/cuda/detail/KernelUtils.h | 1 + .../ATen/cuda/detail/PhiloxCudaStateRaw.cuh | 8 +- aten/src/ATen/cuda/detail/UnpackRaw.cuh | 4 +- aten/src/ATen/cuda/jiterator.h | 2 +- aten/src/ATen/cuda/jiterator_impl.h | 30 +- aten/src/ATen/cuda/llvm_complex.cpp | 28 +- aten/src/ATen/cudnn/Descriptors.cpp | 2 +- aten/src/ATen/cudnn/Descriptors.h | 13 +- aten/src/ATen/cudnn/Utils.h | 2 +- aten/src/ATen/detail/FunctionTraits.h | 24 + .../src/ATen/functorch}/ADInterpreters.cpp | 72 +- .../src/ATen/functorch}/ADInterpreters.h | 16 +- .../ATen/functorch}/BatchRulesActivation.cpp | 6 +- .../ATen/functorch}/BatchRulesBinaryOps.cpp | 51 +- .../ATen/functorch}/BatchRulesConvolution.cpp | 124 +- .../functorch}/BatchRulesDecompositions.cpp | 58 +- .../src/ATen/functorch}/BatchRulesDynamic.cpp | 15 +- .../src/ATen/functorch}/BatchRulesFactory.cpp | 51 +- .../src/ATen/functorch}/BatchRulesHelper.cpp | 27 +- .../src/ATen/functorch}/BatchRulesHelper.h | 40 +- .../functorch}/BatchRulesLinearAlgebra.cpp | 238 +- .../src/ATen/functorch}/BatchRulesLoss.cpp | 20 +- .../src/ATen/functorch}/BatchRulesModules.cpp | 64 +- .../src/ATen/functorch}/BatchRulesNorm.cpp | 60 +- .../src/ATen/functorch}/BatchRulesPooling.cpp | 8 +- .../ATen/functorch}/BatchRulesRandomness.cpp | 82 +- .../ATen/functorch}/BatchRulesReduceOps.cpp | 17 +- .../ATen/functorch}/BatchRulesScatterOps.cpp | 21 +- .../ATen/functorch}/BatchRulesUnaryOps.cpp | 7 +- .../src/ATen/functorch}/BatchRulesViews.cpp | 131 +- .../src/ATen/functorch}/BatchedFallback.cpp | 19 +- .../src/ATen/functorch}/BatchedFallback.h | 33 +- .../src/ATen/functorch}/BatchedTensorImpl.cpp | 76 +- .../src/ATen/functorch}/BatchedTensorImpl.h | 24 +- .../ATen/functorch}/BatchingMetaprogramming.h | 8 + .../src/ATen/functorch}/DynamicLayer.cpp | 179 +- aten/src/ATen/functorch/DynamicLayer.h | 131 + .../functorch}/FunctionalizeInterpreter.cpp | 7 +- .../functorch}/FunctionalizeInterpreter.h | 7 +- .../src/ATen/functorch}/Interpreter.cpp | 34 +- .../src/ATen/functorch}/Interpreter.h | 21 +- .../LegacyBatchingRegistrations.cpp | 233 +- .../ATen/functorch}/LegacyVmapTransforms.cpp | 23 +- .../ATen/functorch}/LegacyVmapTransforms.h | 15 +- aten/src/ATen/functorch/Macros.h | 3 + .../src/ATen/functorch}/PlumbingHelper.cpp | 8 +- aten/src/ATen/functorch/PlumbingHelper.h | 61 + .../ATen/functorch}/PyTorchOperatorHacks.cpp | 24 
+- .../src/ATen/functorch}/TensorWrapper.cpp | 22 +- aten/src/ATen/functorch/TensorWrapper.h | 97 + .../src/ATen/functorch}/VmapInterpreter.cpp | 9 +- .../src/ATen/functorch}/VmapInterpreter.h | 7 +- .../ATen/functorch}/VmapModeRegistrations.cpp | 14 +- aten/src/ATen/jit_macros.h | 7 - aten/src/ATen/jiterator_macros.h | 4 +- aten/src/ATen/miopen/Descriptors.h | 2 +- aten/src/ATen/miopen/Utils.h | 2 +- aten/src/ATen/mkl/SparseBlas.cpp | 2 +- aten/src/ATen/mps/EmptyTensor.cpp | 1 + aten/src/ATen/mps/IndexKernels.h | 181 + aten/src/ATen/mps/MPSAllocator.h | 309 +- aten/src/ATen/mps/MPSAllocator.mm | 646 +- aten/src/ATen/mps/MPSDevice.h | 15 + aten/src/ATen/mps/MPSDevice.mm | 58 +- aten/src/ATen/mps/MPSFallback.mm | 19 +- aten/src/ATen/mps/MPSGuardImpl.h | 8 +- aten/src/ATen/mps/MPSGuardImpl.mm | 4 +- aten/src/ATen/mps/MPSStream.h | 28 +- aten/src/ATen/mps/MPSStream.mm | 106 +- aten/src/ATen/native/Activation.cpp | 72 +- aten/src/ATen/native/Activation.h | 2 + .../ATen/native/AdaptiveAveragePooling.cpp | 40 +- .../ATen/native/AdaptiveAveragePooling3d.cpp | 32 +- aten/src/ATen/native/AdaptiveMaxPooling2d.cpp | 11 +- aten/src/ATen/native/AdaptiveMaxPooling3d.cpp | 19 +- aten/src/ATen/native/AdaptivePooling.h | 5 +- aten/src/ATen/native/AffineGridGenerator.cpp | 14 +- aten/src/ATen/native/AutogradComposite.cpp | 15 +- aten/src/ATen/native/AveragePool2d.cpp | 12 +- aten/src/ATen/native/AveragePool3d.cpp | 13 +- aten/src/ATen/native/BatchLinearAlgebra.cpp | 573 +- aten/src/ATen/native/BatchLinearAlgebra.h | 7 +- .../ATen/native/BatchLinearAlgebraKernel.cpp | 127 +- aten/src/ATen/native/Batching.cpp | 1 + aten/src/ATen/native/BinaryOps.cpp | 160 +- aten/src/ATen/native/Blas.cpp | 25 +- aten/src/ATen/native/BlasKernel.cpp | 6 +- aten/src/ATen/native/Bucketization.cpp | 9 +- aten/src/ATen/native/CPUBlas.cpp | 1 + aten/src/ATen/native/CPUFallback.cpp | 10 +- aten/src/ATen/native/CPUFallback.h | 23 +- aten/src/ATen/native/ChanelShuffle.cpp | 15 +- aten/src/ATen/native/Col2Im.cpp | 52 +- aten/src/ATen/native/ComparisonUtils.cpp | 32 + aten/src/ATen/native/ComplexHelper.h | 40 +- aten/src/ATen/native/ConvUtils.h | 132 +- aten/src/ATen/native/Convolution.cpp | 1048 +- aten/src/ATen/native/ConvolutionMM2d.cpp | 17 +- aten/src/ATen/native/ConvolutionMM3d.cpp | 16 +- aten/src/ATen/native/ConvolutionMM3d.h | 2 +- aten/src/ATen/native/ConvolutionTBC.cpp | 14 +- aten/src/ATen/native/Copy.cpp | 83 +- aten/src/ATen/native/Correlation.cpp | 30 +- aten/src/ATen/native/Cross.cpp | 43 +- aten/src/ATen/native/DilatedMaxPool2d.cpp | 16 +- aten/src/ATen/native/DilatedMaxPool3d.cpp | 14 +- aten/src/ATen/native/DispatchStub.cpp | 2 + aten/src/ATen/native/DispatchStub.h | 7 +- aten/src/ATen/native/Distance.cpp | 34 +- aten/src/ATen/native/DistributionTemplates.h | 12 +- aten/src/ATen/native/Distributions.cpp | 42 +- aten/src/ATen/native/Dropout.cpp | 38 +- aten/src/ATen/native/Embedding.cpp | 69 +- aten/src/ATen/native/EmbeddingBag.cpp | 69 +- aten/src/ATen/native/EmbeddingBag.h | 3 +- aten/src/ATen/native/Fill.cpp | 19 +- aten/src/ATen/native/ForeachOpsKernels.cpp | 80 +- aten/src/ATen/native/ForeachUtils.h | 40 + aten/src/ATen/native/FractionalMaxPool2d.cpp | 14 +- aten/src/ATen/native/FractionalMaxPool3d.cpp | 16 +- aten/src/ATen/native/GatedLinearUnit.cpp | 17 +- aten/src/ATen/native/GridSampler.cpp | 30 +- aten/src/ATen/native/GridSamplerUtils.h | 2 +- aten/src/ATen/native/Histogram.cpp | 22 +- aten/src/ATen/native/Histogram.h | 2 - aten/src/ATen/native/Im2Col.cpp | 77 +- aten/src/ATen/native/IndexKernel.h | 1 + 
aten/src/ATen/native/IndexingUtils.cpp | 13 +- aten/src/ATen/native/IndexingUtils.h | 12 +- aten/src/ATen/native/Integration.cpp | 17 +- aten/src/ATen/native/Itertools.cpp | 19 +- aten/src/ATen/native/Lerp.cpp | 9 + aten/src/ATen/native/Lerp.h | 27 + aten/src/ATen/native/Linear.cpp | 429 +- aten/src/ATen/native/LinearAlgebra.cpp | 150 +- aten/src/ATen/native/LinearAlgebraUtils.h | 3 +- aten/src/ATen/native/Loss.cpp | 92 +- aten/src/ATen/native/LossCTC.cpp | 95 +- aten/src/ATen/native/LossMulti.h | 8 +- aten/src/ATen/native/LossMultiLabelMargin.cpp | 15 +- aten/src/ATen/native/LossMultiMargin.cpp | 14 +- aten/src/ATen/native/LossNLL.cpp | 86 +- aten/src/ATen/native/LossNLL2d.cpp | 28 +- .../src/ATen/native/MathBitFallThroughLists.h | 1 - aten/src/ATen/native/MathBitsFallback.h | 9 +- aten/src/ATen/native/MaxPooling.cpp | 33 +- aten/src/ATen/native/MaxUnpooling.cpp | 21 +- aten/src/ATen/native/Memory.cpp | 13 +- aten/src/ATen/native/MetaTensor.cpp | 28 +- aten/src/ATen/native/NNPACK.cpp | 17 +- .../native/NaiveConvolutionTranspose2d.cpp | 15 +- .../native/NaiveConvolutionTranspose3d.cpp | 16 +- .../ATen/native/NaiveDilatedConvolution.cpp | 15 +- aten/src/ATen/native/NamedTensor.cpp | 28 +- aten/src/ATen/native/NegateFallback.cpp | 1 + aten/src/ATen/native/NonSymbolicBC.h | 27 + aten/src/ATen/native/Normalization.cpp | 258 +- aten/src/ATen/native/Onehot.cpp | 12 +- aten/src/ATen/native/PackedSequence.cpp | 27 +- aten/src/ATen/native/PadNd.cpp | 73 +- aten/src/ATen/native/PadNd.h | 22 - aten/src/ATen/native/PixelShuffle.cpp | 36 +- aten/src/ATen/native/PointwiseOps.cpp | 15 +- aten/src/ATen/native/Pool.h | 41 +- aten/src/ATen/native/Pooling.cpp | 27 +- aten/src/ATen/native/Pow.cpp | 15 +- aten/src/ATen/native/QuantizedLinear.cpp | 26 +- aten/src/ATen/native/README.md | 36 +- aten/src/ATen/native/RNN.cpp | 70 +- aten/src/ATen/native/RNN.h | 2 +- aten/src/ATen/native/RangeFactories.cpp | 16 +- aten/src/ATen/native/ReduceAllOps.cpp | 28 +- aten/src/ATen/native/ReduceOps.cpp | 168 +- aten/src/ATen/native/ReduceOpsUtils.h | 2 +- aten/src/ATen/native/ReflectionPad.cpp | 37 +- aten/src/ATen/native/Repeat.cpp | 30 +- aten/src/ATen/native/ReplicationPadding.cpp | 19 +- aten/src/ATen/native/Resize.cpp | 11 +- aten/src/ATen/native/Resize.h | 42 +- aten/src/ATen/native/ResizeCommon.h | 5 +- aten/src/ATen/native/RowwisePrune.cpp | 11 +- aten/src/ATen/native/Scalar.cpp | 12 +- aten/src/ATen/native/SegmentReduce.cpp | 15 +- aten/src/ATen/native/SobolEngineOps.cpp | 16 +- aten/src/ATen/native/SobolEngineOpsUtils.cpp | 1 + aten/src/ATen/native/SobolEngineOpsUtils.h | 10 +- aten/src/ATen/native/SoftMax.cpp | 92 +- aten/src/ATen/native/Sorting.cpp | 38 +- aten/src/ATen/native/SpectralOps.cpp | 168 +- aten/src/ATen/native/SpmmReduce.cpp | 32 - aten/src/ATen/native/SpmmReduce.h | 12 - aten/src/ATen/native/SummaryOps.cpp | 21 +- .../ATen/native/TensorAdvancedIndexing.cpp | 232 +- aten/src/ATen/native/TensorAdvancedIndexing.h | 51 +- .../ATen/native/TensorAdvancedIndexingUtils.h | 10 +- aten/src/ATen/native/TensorCompare.cpp | 109 +- aten/src/ATen/native/TensorConversions.cpp | 968 +- aten/src/ATen/native/TensorConversions.h | 2 +- aten/src/ATen/native/TensorDimApply.h | 3 +- aten/src/ATen/native/TensorFactories.cpp | 154 +- aten/src/ATen/native/TensorFactories.h | 5 +- aten/src/ATen/native/TensorIteratorReduce.cpp | 11 +- aten/src/ATen/native/TensorProperties.cpp | 33 +- aten/src/ATen/native/TensorShape.cpp | 979 +- aten/src/ATen/native/TensorShape.h | 15 +- .../src/ATen/native/TensorTransformations.cpp | 21 +- 
aten/src/ATen/native/TestOps.cpp | 19 +- aten/src/ATen/native/TriangularOps.cpp | 30 +- aten/src/ATen/native/TriangularOpsUtils.h | 2 +- aten/src/ATen/native/TypeProperties.cpp | 26 +- aten/src/ATen/native/UnaryOps.cpp | 198 +- aten/src/ATen/native/Unfold2d.cpp | 1 + aten/src/ATen/native/Unfold3d.cpp | 4 +- aten/src/ATen/native/Unfold3d.h | 2 +- aten/src/ATen/native/UnfoldBackward.cpp | 6 + aten/src/ATen/native/UnfoldBackward.h | 78 +- aten/src/ATen/native/Unique.cpp | 21 +- aten/src/ATen/native/UpSample.cpp | 1 + aten/src/ATen/native/UpSample.h | 24 +- aten/src/ATen/native/UpSampleBicubic2d.cpp | 50 +- aten/src/ATen/native/UpSampleBilinear2d.cpp | 43 +- aten/src/ATen/native/UpSampleLinear1d.cpp | 27 +- aten/src/ATen/native/UpSampleNearest1d.cpp | 40 +- aten/src/ATen/native/UpSampleNearest2d.cpp | 41 +- aten/src/ATen/native/UpSampleNearest3d.cpp | 48 +- aten/src/ATen/native/UpSampleTrilinear3d.cpp | 28 +- aten/src/ATen/native/VariableMethodStubs.cpp | 20 +- aten/src/ATen/native/WeightNorm.cpp | 20 +- aten/src/ATen/native/ao_sparse/library.cpp | 1 + .../ao_sparse/quantized/cpu/fbgemm_utils.cpp | 4 +- .../ao_sparse/quantized/cpu/packed_params.h | 6 +- .../ao_sparse/quantized/cpu/qlinear.cpp | 10 +- .../quantized/cpu/qlinear_deserialize.cpp | 93 +- .../quantized/cpu/qlinear_dynamic.cpp | 17 +- .../quantized/cpu/qlinear_prepack.cpp | 13 +- .../quantized/cpu/qlinear_serialize.cpp | 37 +- .../quantized/cpu/qlinear_unpack.cpp | 12 +- aten/src/ATen/native/cpu/Activation.cpp | 114 +- aten/src/ATen/native/cpu/AtomicAddFloat.h | 6 +- aten/src/ATen/native/cpu/BinaryOpsKernel.cpp | 26 +- aten/src/ATen/native/cpu/BlasKernel.cpp | 67 +- .../ATen/native/cpu/ChannelShuffleKernel.cpp | 18 +- .../ATen/native/cpu/ChannelShuffleKernel.h | 10 +- aten/src/ATen/native/cpu/CopyKernel.cpp | 41 +- aten/src/ATen/native/cpu/CopyKernel.h | 12 + .../src/ATen/native/cpu/DepthwiseConvKernel.h | 3 +- .../src/ATen/native/cpu/DistanceOpsKernel.cpp | 3 +- .../cpu/FunctionOfAMatrixUtilsKernel.cpp | 3 +- aten/src/ATen/native/cpu/HistogramKernel.cpp | 8 +- aten/src/ATen/native/cpu/IndexKernel.cpp | 107 +- aten/src/ATen/native/cpu/LerpKernel.cpp | 134 +- aten/src/ATen/native/cpu/Loops.h | 9 +- .../ATen/native/cpu/PixelShuffleKernel.cpp | 30 +- aten/src/ATen/native/cpu/PixelShuffleKernel.h | 9 +- aten/src/ATen/native/cpu/README.md | 4 +- aten/src/ATen/native/cpu/Reduce.h | 3 +- aten/src/ATen/native/cpu/ReduceOpsKernel.cpp | 27 +- .../ATen/native/cpu/ScatterGatherKernel.cpp | 214 +- aten/src/ATen/native/cpu/SortingKernel.cpp | 3 +- aten/src/ATen/native/cpu/SparseFactories.cpp | 41 +- aten/src/ATen/native/cpu/SpmmReduceKernel.cpp | 601 +- aten/src/ATen/native/cpu/SpmmReduceKernel.h | 45 + .../ATen/native/cpu/TensorCompareKernel.cpp | 6 +- aten/src/ATen/native/cpu/UnaryOpsKernel.cpp | 20 +- aten/src/ATen/native/cpu/Unfold2d.cpp | 3 +- .../ATen/native/cpu/UnfoldBackwardKernel.cpp | 84 +- aten/src/ATen/native/cpu/UpSampleKernel.cpp | 100 +- .../ATen/native/cpu/UpSampleMoreKernel.cpp | 4 +- aten/src/ATen/native/cpu/WeightNormKernel.cpp | 68 +- aten/src/ATen/native/cpu/WeightNormKernel.h | 13 +- aten/src/ATen/native/cpu/radix_sort.h | 18 +- aten/src/ATen/native/cuda/Activation.cpp | 2 +- .../native/cuda/AdaptiveAveragePooling.cu | 15 +- .../native/cuda/AdaptiveAveragePooling3d.cu | 8 +- .../ATen/native/cuda/AdaptiveMaxPooling2d.cu | 8 +- .../ATen/native/cuda/AdaptiveMaxPooling3d.cu | 8 +- aten/src/ATen/native/cuda/AveragePool2d.cu | 16 +- .../native/cuda/BinaryLogicalOpsKernels.cu | 62 +- aten/src/ATen/native/cuda/Bucketization.cu | 6 - 
aten/src/ATen/native/cuda/Col2Im.cu | 89 +- aten/src/ATen/native/cuda/Copy.cu | 39 +- aten/src/ATen/native/cuda/Copy.h | 10 + aten/src/ATen/native/cuda/CumminmaxKernel.cu | 29 + aten/src/ATen/native/cuda/CumprodKernel.cu | 23 + aten/src/ATen/native/cuda/CumsumKernel.cu | 25 + aten/src/ATen/native/cuda/DepthwiseConv2d.cu | 1 - aten/src/ATen/native/cuda/DilatedMaxPool2d.cu | 26 +- aten/src/ATen/native/cuda/DistanceKernel.cu | 138 +- aten/src/ATen/native/cuda/Distributions.cu | 1 + aten/src/ATen/native/cuda/EmbeddingBag.cu | 11 +- aten/src/ATen/native/cuda/ForeachFunctors.cuh | 19 + .../ATen/native/cuda/ForeachPointwiseOp.cu | 35 + .../ATen/native/cuda/FractionalMaxPool2d.cu | 14 +- aten/src/ATen/native/cuda/FusedAdamKernel.cu | 45 + aten/src/ATen/native/cuda/GridSampler.cu | 4 +- aten/src/ATen/native/cuda/Im2Col.cu | 64 +- aten/src/ATen/native/cuda/IndexKernel.cu | 18 + aten/src/ATen/native/cuda/Indexing.cu | 198 +- aten/src/ATen/native/cuda/JitLoops.cuh | 4 - aten/src/ATen/native/cuda/KernelUtils.cuh | 48 +- aten/src/ATen/native/cuda/Lerp.cu | 21 +- aten/src/ATen/native/cuda/LinearAlgebra.cu | 4 +- .../ATen/native/cuda/LinearAlgebraStubs.cpp | 40 +- .../ATen/native/cuda/LogcumsumexpKernel.cu | 37 + aten/src/ATen/native/cuda/Loss.cu | 61 +- aten/src/ATen/native/cuda/MaxUnpooling.cu | 8 + aten/src/ATen/native/cuda/MultiMarginLoss.cu | 1 + .../src/ATen/native/cuda/MultiTensorApply.cuh | 68 + .../src/ATen/native/cuda/MultinomialKernel.cu | 4 +- aten/src/ATen/native/cuda/NLLLoss2d.cu | 23 +- .../cuda/NaiveConvolutionTranspose3d.cu | 11 +- aten/src/ATen/native/cuda/Normalization.cu | 23 +- aten/src/ATen/native/cuda/Normalization.cuh | 83 +- aten/src/ATen/native/cuda/Pow.cuh | 58 + aten/src/ATen/native/cuda/PowKernel.cu | 49 +- aten/src/ATen/native/cuda/Reduce.cuh | 35 +- aten/src/ATen/native/cuda/ReflectionPad.cu | 14 +- aten/src/ATen/native/cuda/RreluWithNoise.cu | 2 +- .../cuda/{ScanKernels.cu => ScanUtils.cuh} | 89 +- aten/src/ATen/native/cuda/Shape.cu | 2 +- aten/src/ATen/native/cuda/SoftMax.cu | 19 +- .../cuda/SparseBinaryOpIntersectionKernel.cu | 150 + aten/src/ATen/native/cuda/SummaryOps.cu | 16 +- aten/src/ATen/native/cuda/TensorFactories.cu | 12 +- aten/src/ATen/native/cuda/TriangularOps.cu | 130 +- .../ATen/native/cuda/UnaryComplexKernels.cu | 39 +- .../ATen/native/cuda/UnaryFractionKernels.cu | 2 +- .../ATen/native/cuda/UnarySpecialOpsKernel.cu | 9 +- .../ATen/native/cuda/UnfoldBackwardKernel.cu | 96 +- .../src/ATen/native/cuda/UpSampleNearest2d.cu | 22 +- .../src/ATen/native/cuda/UpSampleNearest3d.cu | 47 - aten/src/ATen/native/cuda/block_reduce.cuh | 43 +- .../native/cuda/fused_adam_amsgrad_impl.cu | 52 + .../native/cuda/fused_adam_amsgrad_impl.cuh | 24 + aten/src/ATen/native/cuda/fused_adam_impl.cu | 51 + aten/src/ATen/native/cuda/fused_adam_impl.cuh | 23 + .../src/ATen/native/cuda/fused_adam_utils.cuh | 166 + aten/src/ATen/native/cuda/im2col.cuh | 210 +- aten/src/ATen/native/cuda/jit_utils.cpp | 263 +- aten/src/ATen/native/cuda/jit_utils.h | 1 - .../src/ATen/native/cuda/layer_norm_kernel.cu | 476 +- .../native/cuda/linalg/BatchLinearAlgebra.cpp | 479 +- .../cuda/linalg/BatchLinearAlgebraLib.cpp | 156 +- .../cuda/linalg/BatchLinearAlgebraLib.h | 6 - .../ATen/native/cuda/reduction_template.cuh | 16 + aten/src/ATen/native/cuda/vol2col.cuh | 58 +- .../ATen/native/cudnn/AffineGridGenerator.cpp | 13 +- aten/src/ATen/native/cudnn/BatchNorm.cpp | 14 +- .../ATen/native/cudnn/ConvPlaceholders.cpp | 14 +- aten/src/ATen/native/cudnn/ConvShared.cpp | 24 +- 
aten/src/ATen/native/cudnn/ConvShared.h | 3 +- aten/src/ATen/native/cudnn/Conv_v7.cpp | 19 +- aten/src/ATen/native/cudnn/Conv_v8.cpp | 51 +- aten/src/ATen/native/cudnn/GridSampler.cpp | 13 +- aten/src/ATen/native/cudnn/LossCTC.cpp | 65 +- aten/src/ATen/native/cudnn/RNN.cpp | 22 +- aten/src/ATen/native/group_norm.cpp | 55 +- aten/src/ATen/native/im2col.h | 2 +- aten/src/ATen/native/im2col_shape_check.h | 10 +- aten/src/ATen/native/layer_norm.cpp | 40 +- aten/src/ATen/native/metal/MetalAten.mm | 3 +- aten/src/ATen/native/metal/MetalContext.mm | 4 +- aten/src/ATen/native/metal/MetalConvParams.h | 2 +- aten/src/ATen/native/metal/MetalTensorImpl.h | 4 + .../ATen/native/metal/mpscnn/MPSCNNConvOp.mm | 10 +- .../native/metal/mpscnn/MPSImageWrapper.mm | 3 + aten/src/ATen/native/metal/ops/MetalConcat.mm | 27 +- .../ATen/native/metal/ops/MetalConvolution.mm | 4 +- .../ATen/native/metal/ops/MetalHardshrink.mm | 3 +- .../src/ATen/native/metal/ops/MetalPadding.mm | 2 +- .../src/ATen/native/metal/ops/MetalReshape.mm | 5 +- .../ATen/native/miopen/BatchNorm_miopen.cpp | 13 +- aten/src/ATen/native/miopen/Conv_miopen.cpp | 248 +- aten/src/ATen/native/miopen/RNN_miopen.cpp | 17 +- aten/src/ATen/native/mkl/LinearAlgebra.cpp | 1 + aten/src/ATen/native/mkl/LinearAlgebra.h | 3 +- aten/src/ATen/native/mkl/SparseBlasImpl.cpp | 143 +- .../native/mkl/SparseCsrLinearAlgebra.cpp | 1 + .../ATen/native/mkl/SparseCsrLinearAlgebra.h | 3 +- aten/src/ATen/native/mkl/SpectralOps.cpp | 23 +- aten/src/ATen/native/mkldnn/BinaryOps.cpp | 10 +- aten/src/ATen/native/mkldnn/Conv.cpp | 527 +- aten/src/ATen/native/mkldnn/Copy.cpp | 8 +- aten/src/ATen/native/mkldnn/Gelu.cpp | 11 +- .../ATen/native/mkldnn/IDeepRegistration.cpp | 3 +- aten/src/ATen/native/mkldnn/Linear.cpp | 272 +- aten/src/ATen/native/mkldnn/MKLDNNCommon.h | 2 +- .../ATen/native/mkldnn/MKLDNNConversions.cpp | 100 +- aten/src/ATen/native/mkldnn/Matmul.cpp | 58 +- aten/src/ATen/native/mkldnn/Matmul.h | 2 +- .../ATen/native/mkldnn/MkldnnTensorMath.cpp | 10 +- aten/src/ATen/native/mkldnn/Normalization.cpp | 49 +- aten/src/ATen/native/mkldnn/Pooling.cpp | 22 +- aten/src/ATen/native/mkldnn/Prelu.cpp | 4 +- .../mkldnn/RegisterMkldnnOpContextClass.cpp | 30 + aten/src/ATen/native/mkldnn/Relu.cpp | 10 +- aten/src/ATen/native/mkldnn/SoftMax.cpp | 8 +- .../ATen/native/mkldnn/TensorFactories.cpp | 12 +- aten/src/ATen/native/mkldnn/TensorShape.cpp | 17 +- aten/src/ATen/native/mkldnn/UnaryOps.cpp | 9 +- aten/src/ATen/native/mkldnn/Utils.cpp | 133 + aten/src/ATen/native/mkldnn/Utils.h | 44 +- aten/src/ATen/native/mps/Copy.h | 15 +- aten/src/ATen/native/mps/MPSGraphVenturaOps.h | 17 + aten/src/ATen/native/mps/OperationUtils.h | 25 +- aten/src/ATen/native/mps/OperationUtils.mm | 109 +- aten/src/ATen/native/mps/TensorFactory.cpp | 12 +- .../ATen/native/mps/operations/Activation.mm | 318 +- .../native/mps/operations/AdaptivePooling.mm | 88 +- .../ATen/native/mps/operations/BinaryOps.mm | 62 +- .../{BitwiseBinaryOps.mm => BitwiseOps.mm} | 76 +- aten/src/ATen/native/mps/operations/Blas.mm | 27 +- .../ATen/native/mps/operations/ConstantOps.mm | 13 +- .../ATen/native/mps/operations/Convolution.mm | 60 +- aten/src/ATen/native/mps/operations/Copy.mm | 161 +- .../native/mps/operations/Distributions.mm | 903 +- aten/src/ATen/native/mps/operations/Eye.mm | 5 +- .../src/ATen/native/mps/operations/Indexing.h | 39 + .../ATen/native/mps/operations/Indexing.mm | 264 +- aten/src/ATen/native/mps/operations/Linear.mm | 18 +- .../src/ATen/native/mps/operations/LossOps.mm | 4 +- 
.../native/mps/operations/Normalization.mm | 50 +- aten/src/ATen/native/mps/operations/Pad.mm | 306 + .../native/mps/operations/PointwiseOps.mm | 8 +- .../native/mps/operations/RangeFactories.mm | 19 +- .../ATen/native/mps/operations/ReduceOps.mm | 464 +- aten/src/ATen/native/mps/operations/Repeat.mm | 9 +- aten/src/ATen/native/mps/operations/RnnOps.mm | 6 +- .../native/mps/operations/ScatterGather.mm | 12 +- aten/src/ATen/native/mps/operations/Shape.mm | 292 +- .../native/mps/operations/TensorCompare.mm | 122 +- .../native/mps/operations/TriangularOps.mm | 192 - .../ATen/native/mps/operations/UnaryOps.mm | 144 +- aten/src/ATen/native/mps/operations/View.mm | 32 +- aten/src/ATen/native/native_functions.yaml | 1428 +- .../native/nested/NestedTensorAliases.cpp | 15 + .../native/nested/NestedTensorBackward.cpp | 83 +- .../native/nested/NestedTensorBinaryOps.cpp | 247 + .../native/nested/NestedTensorBinaryOps.h | 16 + .../native/nested/NestedTensorFactories.cpp | 125 + .../native/nested/NestedTensorFactories.h | 7 + .../ATen/native/nested/NestedTensorMath.cpp | 969 +- .../src/ATen/native/nested/NestedTensorMath.h | 253 +- .../ATen/native/nested/NestedTensorMatmul.cpp | 352 + .../NestedTensorTransformerFunctions.cpp | 31 +- .../nested/NestedTensorTransformerFunctions.h | 18 +- .../native/nested/NestedTensorUnaryOps.cpp | 74 + .../ATen/native/nested/NestedTensorUtils.cpp | 112 + .../ATen/native/nested/NestedTensorUtils.h | 423 + .../nested/cuda/NestedTensorBinaryOps.cu | 120 + .../native/nested/cuda/NestedTensorMatmul.cu | 416 + .../cuda/NestedTensorTransformerFunctions.cpp | 382 +- .../cuda/NestedTensorTransformerFunctions.cu | 23 +- .../src/ATen/native/prim_native_functions.cpp | 9 +- .../ATen/native/quantized/AffineQuantizer.cpp | 47 +- .../ATen/native/quantized/AffineQuantizer.h | 3 +- .../native/quantized/AffineQuantizerBase.cpp | 27 + .../ATen/native/quantized/FakeQuantAffine.h | 3 +- .../quantized/FakeQuantPerTensorAffine.cpp | 6 +- aten/src/ATen/native/quantized/IndexKernel.h | 3 +- aten/src/ATen/native/quantized/PackedParams.h | 2 +- aten/src/ATen/native/quantized/QTensor.cpp | 6 + aten/src/ATen/native/quantized/README.md | 3 +- .../quantized/TensorAdvancedIndexing.cpp | 91 + .../ATen/native/quantized/TensorCompare.cpp | 13 + .../ATen/native/quantized/TensorFactories.cpp | 10 - .../quantized/cpu/AdaptiveAveragePooling.cpp | 16 +- .../native/quantized/cpu/AveragePool2d.cpp | 14 +- .../native/quantized/cpu/AveragePool3d.cpp | 18 +- .../ATen/native/quantized/cpu/BinaryOps.cpp | 25 +- .../src/ATen/native/quantized/cpu/BinaryOps.h | 2 +- .../native/quantized/cpu/ChannelShuffle.cpp | 18 +- .../quantized/cpu/EmbeddingPackedParams.h | 2 +- .../native/quantized/cpu/IntReprQuant.cpp | 12 +- .../native/quantized/cpu/LinearUnpackImpl.cpp | 14 +- .../cpu/MakePerTensorQuantizedTensor.cpp | 7 + .../native/quantized/cpu/Normalization.cpp | 14 +- .../ATen/native/quantized/cpu/OnednnUtils.h | 276 +- .../src/ATen/native/quantized/cpu/Pooling.cpp | 19 +- .../ATen/native/quantized/cpu/QnnpackUtils.h | 13 +- .../ATen/native/quantized/cpu/QuantUtils.h | 13 +- .../ATen/native/quantized/cpu/QuantizedOps.h | 9 +- .../ATen/native/quantized/cpu/ReduceOps.cpp | 19 +- .../src/ATen/native/quantized/cpu/Sorting.cpp | 18 +- .../native/quantized/cpu/TensorOperators.cpp | 24 +- .../ATen/native/quantized/cpu/TensorShape.cpp | 88 +- .../quantized/cpu/UpSampleBilinear2d.cpp | 15 +- .../quantized/cpu/UpSampleNearest2d.cpp | 16 +- .../quantized/cpu/UpSampleNearest3d.cpp | 38 +- .../ATen/native/quantized/cpu/XnnpackUtils.h | 2 
+- .../native/quantized/cpu/conv_serialization.h | 41 +- .../native/quantized/cpu/fbgemm_utils.cpp | 18 +- .../quantized/cpu/fused_obs_fake_quant.cpp | 19 +- .../native/quantized/cpu/init_qnnpack.cpp | 3 +- .../cpu/kernels/QuantizedOpKernels.cpp | 51 +- aten/src/ATen/native/quantized/cpu/qclamp.cpp | 19 +- aten/src/ATen/native/quantized/cpu/qconv.cpp | 134 +- .../native/quantized/cpu/qconv_dynamic.cpp | 14 +- .../native/quantized/cpu/qconv_prepack.cpp | 50 +- .../quantized/cpu/qconv_unpack_impl.cpp | 2 +- aten/src/ATen/native/quantized/cpu/qelu.cpp | 12 +- .../native/quantized/cpu/qembeddingbag.cpp | 13 +- .../ATen/native/quantized/cpu/qembeddingbag.h | 4 +- .../quantized/cpu/qembeddingbag_prepack.cpp | 21 +- .../quantized/cpu/qembeddingbag_prepack.h | 6 +- .../quantized/cpu/qembeddingbag_unpack.cpp | 13 +- aten/src/ATen/native/quantized/cpu/qgelu.cpp | 17 +- .../native/quantized/cpu/qhardsigmoid.cpp | 15 +- .../ATen/native/quantized/cpu/qhardswish.cpp | 12 +- .../src/ATen/native/quantized/cpu/qlinear.cpp | 61 +- .../native/quantized/cpu/qlinear_dynamic.cpp | 87 +- .../native/quantized/cpu/qlinear_prepack.cpp | 22 +- .../src/ATen/native/quantized/cpu/qmatmul.cpp | 4 +- aten/src/ATen/native/quantized/cpu/qmul.cpp | 153 +- .../quantized/cpu/qnnpack/CMakeLists.txt | 1 - .../cpu/qnnpack/bench/q8gemm_sparse.cc | 41 +- .../quantized/cpu/qnnpack/buckbuild.bzl | 1 - .../qnnpack/cmake/DownloadGoogleTest.cmake | 2 +- .../cpu/qnnpack/deps/clog/CMakeLists.txt | 5 +- .../deps/clog/cmake/DownloadGoogleTest.cmake | 2 +- .../cpu/qnnpack/include/pack_block_sparse.h | 291 +- .../cpu/qnnpack/include/pytorch_qnnpack.h | 12 +- .../cpu/qnnpack/src/fully-connected-sparse.c | 35 +- .../native/quantized/cpu/qnnpack/src/init.c | 24 +- .../quantized/cpu/qnnpack/src/operator-run.c | 171 +- .../cpu/qnnpack/src/pack_block_sparse.cc | 170 - .../cpu/qnnpack/src/q8gemm/4x4c2-sse2.c | 9 +- .../4x8c1x4-dq-packedA-aarch32-neon.S | 804 +- .../4x8c8x1-dq-packedA-aarch32-neon.S | 622 +- .../q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c | 451 +- .../q8gemm_sparse/8x4c1x4-dq-packedA-sse2.h | 435 + .../8x8c1x4-dq-packedA-aarch64-neon.S | 948 +- .../8x8c8x1-dq-packedA-aarch64-neon.S | 806 +- .../cpu/qnnpack/src/qnnpack/common.h | 12 + .../cpu/qnnpack/src/qnnpack/operator.h | 13 +- .../cpu/qnnpack/src/qnnpack/params.h | 34 +- .../cpu/qnnpack/src/qnnpack/q8gemm_sparse.h | 80 +- .../fully-connected-sparse-operator-tester.h | 38 +- .../gemm-block-sparse-microkernel-tester.h | 29 +- .../cpu/qnnpack/test/q8gemm_sparse.cc | 1362 +- .../native/quantized/cpu/qnormalization.cpp | 9 +- aten/src/ATen/native/quantized/cpu/qrelu.cpp | 21 +- .../ATen/native/quantized/cpu/qsigmoid.cpp | 17 +- aten/src/ATen/native/quantized/cpu/qtanh.cpp | 17 +- .../ATen/native/quantized/cpu/qthreshold.cpp | 13 +- .../ATen/native/quantized/cuda/Activation.cpp | 9 + .../ATen/native/quantized/cuda/Activation.cu | 21 + .../native/quantized/cuda/AffineQuantizer.cu | 16 +- .../native/quantized/cuda/EmbeddingBag.cu | 14 +- .../native/quantized/cuda/FakeQuantizeCore.cu | 6 +- .../quantized/cuda/FusedObsFakeQuant.cu | 16 +- .../native/quantized/cuda/IntReprQuant.cu | 13 +- .../cuda/MakePerTensorQuantizedTensor.cu | 17 +- .../ATen/native/quantized/cudnn/BinaryOps.cpp | 11 +- .../native/quantized/cudnn/ConvPrepack.cpp | 2 +- aten/src/ATen/native/quantized/cudnn/utils.h | 12 +- .../ATen/native/quantized/qconv_unpack.cpp | 20 +- aten/src/ATen/native/sparse/Macros.h | 19 + .../sparse/SparseBinaryOpIntersectionCommon.h | 585 + .../SparseBinaryOpIntersectionKernel.cpp | 107 + 
.../src/ATen/native/sparse/SparseBlasImpl.cpp | 204 + aten/src/ATen/native/sparse/SparseBlasImpl.h | 14 + .../ATen/native/sparse/SparseCsrTensor.cpp | 48 +- .../native/sparse/SparseCsrTensorMath.cpp | 321 +- .../ATen/native/sparse/SparseCsrTensorMath.h | 60 + .../ATen/native/sparse/SparseFactories.cpp | 1 + aten/src/ATen/native/sparse/SparseFactories.h | 8 +- aten/src/ATen/native/sparse/SparseStubs.h | 16 + aten/src/ATen/native/sparse/SparseTensor.cpp | 50 +- .../ATen/native/sparse/SparseTensorMath.cpp | 235 +- .../src/ATen/native/sparse/SparseTensorMath.h | 1 + .../src/ATen/native/sparse/SparseUnaryOps.cpp | 52 +- .../sparse/ValidateCompressedIndicesCommon.h | 14 +- aten/src/ATen/native/sparse/cuda/SoftMax.cu | 14 +- .../ATen/native/sparse/cuda/SparseBlas.cpp | 11 +- .../native/sparse/cuda/SparseBlasImpl.cpp | 146 +- .../sparse/cuda/SparseCUDAApplyUtils.cuh | 111 - .../native/sparse/cuda/SparseCUDABlas.cpp | 10 +- .../native/sparse/cuda/SparseCUDATensor.cu | 1 + .../sparse/cuda/SparseCUDATensorMath.cu | 71 +- .../native/sparse/cuda/SparseCsrTensorMath.cu | 2 +- .../ATen/native/sparse/cuda/SparseMatMul.cu | 8 +- aten/src/ATen/native/tags.yaml | 12 +- .../ATen/native/transformers/attention.cpp | 128 +- aten/src/ATen/native/transformers/attention.h | 33 + .../native/transformers/cuda/attention.cu | 581 +- .../transformers/cuda/attention_backward.cu | 289 + .../transformers/cuda/flash_attn/epilogue.h | 149 + .../epilogue_predicated_tile_iterator.h | 493 + .../transformers/cuda/flash_attn/fmha.h | 154 + .../transformers/cuda/flash_attn/fmha_api.cpp | 248 + .../transformers/cuda/flash_attn/fmha_api.h | 25 + .../cuda/flash_attn/fmha_fprop_kernel_1xN.h | 722 + .../flash_attn/fmha_fprop_kernel_dispatch.cu | 134 + .../cuda/flash_attn/fmha_kernel.h | 71 + .../transformers/cuda/flash_attn/fmha_utils.h | 52 + .../transformers/cuda/flash_attn/gemm.h | 95 + .../transformers/cuda/flash_attn/gmem_tile.h | 272 + .../cuda/flash_attn/kernel_traits.h | 154 + .../transformers/cuda/flash_attn/mask.h | 92 + .../cuda/flash_attn/mma_core_sm75.h | 382 + .../transformers/cuda/flash_attn/philox.cuh | 146 + .../transformers/cuda/flash_attn/softmax.h | 446 + .../cuda/flash_attn/static_switch.h | 25 + .../cuda/flash_attn/summary_stats.h | 55 + .../transformers/cuda/flash_attn/utils.h | 404 + .../attention_scaling_coefs_updater.h | 479 + .../cuda/mem_eff_attention/debug_utils.h | 129 + .../mem_eff_attention/epilogue_pipelined.h | 629 + .../epilogue_rescale_output.h | 234 + .../epilogue_thread_apply_logsumexp.h | 177 + .../cuda/mem_eff_attention/find_default_mma.h | 159 + .../cuda/mem_eff_attention/gemm/custom_mma.h | 92 + .../mem_eff_attention/gemm/custom_mma_base.h | 183 + .../gemm/custom_mma_multistage.h | 769 + .../gemm/custom_mma_pipelined.h | 402 + .../mem_eff_attention/gemm_kernel_utils.h | 226 + .../epilogue_predicated_tile_iterator.h | 750 + .../iterators/make_residual_last.h | 67 + ...cated_tile_access_iterator_residual_last.h | 2116 + .../predicated_tile_iterator_residual_last.h | 2120 + .../cuda/mem_eff_attention/kernel_backward.h | 1575 + .../cuda/mem_eff_attention/kernel_forward.h | 895 + .../kernels/backward_bf16.cu | 6 + .../kernels/backward_bf16_aligned.cu | 6 + .../kernels/backward_bf16_aligned_k128.cu | 6 + .../kernels/backward_bf16_aligned_k64.cu | 6 + .../kernels/backward_bf16_k128.cu | 6 + .../kernels/backward_bf16_k64.cu | 6 + .../mem_eff_attention/kernels/backward_f16.cu | 6 + .../kernels/backward_f16_aligned.cu | 6 + .../kernels/backward_f16_aligned_k128.cu | 6 + 
.../kernels/backward_f16_aligned_k64.cu | 6 + .../kernels/backward_f16_k128.cu | 6 + .../kernels/backward_f16_k64.cu | 6 + .../mem_eff_attention/kernels/backward_f32.cu | 6 + .../kernels/backward_f32_aligned.cu | 6 + .../kernels/backward_f32_aligned_k128.cu | 6 + .../kernels/backward_f32_aligned_k64.cu | 6 + .../kernels/backward_f32_k128.cu | 6 + .../kernels/backward_f32_k64.cu | 6 + .../mem_eff_attention/kernels/forward_bf16.cu | 74 + .../kernels/forward_bf16_aligned.cu | 74 + .../mem_eff_attention/kernels/forward_f16.cu | 54 + .../kernels/forward_f16_aligned.cu | 34 + .../mem_eff_attention/kernels/forward_f32.cu | 14 + .../kernels/forward_f32_aligned.cu | 14 + .../kernels/generate_kernels.sh | 56 + .../cuda/mem_eff_attention/mma_from_smem.h | 1785 + .../mma_simt_tile_iterator_residual.h | 302 + .../ATen/native/transformers/cuda/sdp_utils.h | 316 + .../ATen/native/transformers/sdp_utils_cpp.h | 9 + .../ATen/native/transformers/transformer.cpp | 67 +- aten/src/ATen/native/ts_native_functions.yaml | 46 +- aten/src/ATen/native/utils/Factory.cpp | 1 + aten/src/ATen/native/utils/Factory.h | 2 +- aten/src/ATen/native/utils/ParamUtils.h | 21 +- aten/src/ATen/native/vol2col.h | 4 +- .../native/vulkan/VulkanOpaqueTensorImpl.h | 4 + aten/src/ATen/native/vulkan/api/Adapter.cpp | 4 +- aten/src/ATen/native/vulkan/api/Adapter.h | 2 + aten/src/ATen/native/vulkan/api/Allocator.h | 6 +- aten/src/ATen/native/vulkan/api/Command.cpp | 23 + aten/src/ATen/native/vulkan/api/Command.h | 7 + aten/src/ATen/native/vulkan/api/Common.h | 68 +- aten/src/ATen/native/vulkan/api/Context.cpp | 78 +- aten/src/ATen/native/vulkan/api/Context.h | 80 +- aten/src/ATen/native/vulkan/api/Descriptor.h | 1 + aten/src/ATen/native/vulkan/api/Pipeline.h | 5 +- aten/src/ATen/native/vulkan/api/QueryPool.cpp | 31 +- aten/src/ATen/native/vulkan/api/QueryPool.h | 6 +- aten/src/ATen/native/vulkan/api/Resource.cpp | 40 +- aten/src/ATen/native/vulkan/api/Resource.h | 30 +- aten/src/ATen/native/vulkan/api/Runtime.cpp | 20 +- aten/src/ATen/native/vulkan/api/Shader.cpp | 27 + aten/src/ATen/native/vulkan/api/Shader.h | 32 +- aten/src/ATen/native/vulkan/api/Types.h | 21 + aten/src/ATen/native/vulkan/api/Utils.h | 36 + .../src/ATen/native/vulkan/api/vk_mem_alloc.h | 19558 -------- .../ATen/native/vulkan/glsl/batchnorm.glsl | 70 +- .../native/vulkan/glsl/buffer_to_buffer.glsl | 78 + aten/src/ATen/native/vulkan/glsl/conv2d.glsl | 153 +- .../ATen/native/vulkan/glsl/conv2d_dw.glsl | 102 +- .../ATen/native/vulkan/glsl/conv2d_pw.glsl | 48 - .../native/vulkan/glsl/conv2d_pw_2x2.glsl | 100 - .../vulkan/glsl/conv2d_pw_2x2_buffered.glsl | 154 - .../native/vulkan/glsl/conv_transpose2d.glsl | 117 +- .../native/vulkan/glsl/image2d_to_nchw.glsl | 52 + .../native/vulkan/glsl/image_to_nchw.glsl | 55 +- .../vulkan/glsl/image_to_nchw_quantized.glsl | 106 +- aten/src/ATen/native/vulkan/glsl/indexing.h | 13 + .../native/vulkan/glsl/nchw_to_image.glsl | 57 +- .../native/vulkan/glsl/nchw_to_image2d.glsl | 53 + .../vulkan/glsl/nchw_to_image_quantized.glsl | 79 +- .../vulkan/glsl/quantize_per_tensor.glsl | 8 +- .../native/vulkan/glsl/quantized_add.glsl | 4 +- .../native/vulkan/glsl/quantized_conv2d.glsl | 234 +- .../vulkan/glsl/quantized_conv2d_dw.glsl | 156 +- .../vulkan/glsl/quantized_conv2d_pw_2x2.glsl | 299 +- .../native/vulkan/glsl/quantized_div.glsl | 4 +- .../native/vulkan/glsl/quantized_mul.glsl | 4 +- .../native/vulkan/glsl/quantized_sub.glsl | 4 +- .../glsl/quantized_upsample_nearest2d.glsl | 3 +- .../vulkan/glsl/templates/conv2d_pw.glslt | 154 + 
.../glsl/templates/conv2d_pw_params.yaml | 7 + .../src/ATen/native/vulkan/ops/Arithmetic.cpp | 62 +- aten/src/ATen/native/vulkan/ops/Batchnorm.cpp | 290 +- aten/src/ATen/native/vulkan/ops/Batchnorm.h | 68 + aten/src/ATen/native/vulkan/ops/Clone.cpp | 9 +- aten/src/ATen/native/vulkan/ops/Common.cpp | 51 +- aten/src/ATen/native/vulkan/ops/Common.h | 89 +- aten/src/ATen/native/vulkan/ops/Concat.cpp | 58 +- .../ATen/native/vulkan/ops/Convolution.cpp | 1635 +- aten/src/ATen/native/vulkan/ops/Convolution.h | 87 +- aten/src/ATen/native/vulkan/ops/Copy.cpp | 38 +- aten/src/ATen/native/vulkan/ops/Copy.h | 12 +- aten/src/ATen/native/vulkan/ops/Glu.cpp | 2 +- aten/src/ATen/native/vulkan/ops/Gru.cpp | 109 +- aten/src/ATen/native/vulkan/ops/Gru.h | 30 + aten/src/ATen/native/vulkan/ops/Lerp.cpp | 14 +- aten/src/ATen/native/vulkan/ops/Lstm.cpp | 128 +- aten/src/ATen/native/vulkan/ops/Lstm.h | 30 + aten/src/ATen/native/vulkan/ops/Mm.cpp | 37 +- aten/src/ATen/native/vulkan/ops/Mm.h | 24 + .../native/vulkan/ops/QuantizedFunctions.h | 2 +- aten/src/ATen/native/vulkan/ops/Register.cpp | 36 + aten/src/ATen/native/vulkan/ops/Shape.cpp | 4 +- aten/src/ATen/native/vulkan/ops/Tensor.cpp | 449 +- aten/src/ATen/native/vulkan/ops/Tensor.h | 165 +- aten/src/ATen/native/vulkan/ops/Utils.cpp | 385 +- aten/src/ATen/native/vulkan/ops/Utils.h | 7 + aten/src/ATen/native/vulkan/ops/cumsum.cpp | 3 +- aten/src/ATen/native/xnnpack/Common.h | 5 +- aten/src/ATen/native/xnnpack/Engine.h | 3 +- aten/src/ATen/native/xnnpack/Init.cpp | 1 + aten/src/ATen/native/xnnpack/OpContext.cpp | 4 + aten/src/ATen/native/xnnpack/OpContext.h | 14 + aten/src/ATen/native/xnnpack/Shim.cpp | 1 + aten/src/ATen/quantized/Quantizer.cpp | 4 + aten/src/ATen/record_function.h | 28 + .../templates/CompositeViewCopyKernels.cpp | 12 +- .../templates/RegisterFunctionalization.cpp | 11 +- aten/src/ATen/templates/TensorBody.h | 2 + aten/src/ATen/test/CMakeLists.txt | 104 +- aten/src/ATen/test/ExclusivelyOwned_test.cpp | 3 +- aten/src/ATen/test/MaybeOwned_test.cpp | 21 +- aten/src/ATen/test/extension_backend_test.cpp | 4 +- aten/src/ATen/test/math_kernel_test.cpp | 10 - aten/src/ATen/test/mps_test_print.cpp | 34 + aten/src/ATen/test/scalar_test.cpp | 14 + aten/src/ATen/test/vulkan_api_test.cpp | 568 +- aten/src/ATen/test/vulkan_perf_test.cpp | 20 +- .../ATen/test/vulkan_quantized_api_test.cpp | 170 +- aten/src/ATen/test/xnnpack_test.cpp | 91 + aten/src/README.md | 4 +- benchmarks/cpp/nvfuser/CMakeLists.txt | 1 + .../cpp/nvfuser/batch_norm_channels_first.cpp | 4 - .../batch_norm_channels_first_backward.cpp | 4 - .../cpp/nvfuser/batch_norm_channels_last.cpp | 4 - .../batch_norm_channels_last_backward.cpp | 4 - benchmarks/cpp/nvfuser/bert.cpp | 52 +- benchmarks/cpp/nvfuser/broadcast.cpp | 10 +- benchmarks/cpp/nvfuser/gelu_backward.cpp | 9 +- benchmarks/cpp/nvfuser/heuristic_lookup.cpp | 14 +- benchmarks/cpp/nvfuser/instance_norm.cpp | 6 +- benchmarks/cpp/nvfuser/layer_norm.cpp | 8 +- .../cpp/nvfuser/layer_norm_backward.cpp | 9 +- benchmarks/cpp/nvfuser/lstm_cell.cpp | 4 +- benchmarks/cpp/nvfuser/matmul.cpp | 357 + benchmarks/cpp/nvfuser/reduction.cpp | 10 +- benchmarks/cpp/nvfuser/rms_norm.cpp | 2 - benchmarks/cpp/nvfuser/rms_norm_backward.cpp | 3 - benchmarks/cpp/nvfuser/scale_bias_relu.cpp | 18 +- benchmarks/cpp/nvfuser/shape_inference.cpp | 9 +- benchmarks/cpp/nvfuser/softmax.cpp | 6 +- benchmarks/cpp/nvfuser/softmax_backward.cpp | 34 +- benchmarks/cpp/nvfuser/softmax_dropout.cpp | 4 +- benchmarks/cpp/nvfuser/timm.cpp | 11 +- benchmarks/cpp/nvfuser/utils.cpp | 25 
+- benchmarks/cpp/nvfuser/utils.h | 26 +- benchmarks/distributed/ddp/README.md | 2 +- benchmarks/distributed/ddp/benchmark.py | 2 +- benchmarks/dynamo/Makefile_dashboard | 40 + benchmarks/dynamo/README.md | 52 + .../dbr => benchmarks/dynamo}/__init__.py | 0 benchmarks/dynamo/check_csv.py | 40 + benchmarks/dynamo/common.py | 2078 + benchmarks/dynamo/dist_util.py | 148 + benchmarks/dynamo/distributed.py | 164 + benchmarks/dynamo/huggingface.py | 585 + benchmarks/dynamo/huggingface_models_list.txt | 51 + .../dynamo/microbenchmarks}/__init__.py | 0 .../microbenchmarks/bench_autotune_conv.py | 170 + .../dynamo/microbenchmarks/bench_conv.py | 144 + .../dynamo/microbenchmarks/bench_conv1x1.py | 140 + .../microbenchmarks/bench_conv_fusion.py | 298 + .../dynamo/microbenchmarks/bench_mm_fusion.py | 121 + .../microbenchmarks/benchmark_helper.py | 13 + .../dynamo/microbenchmarks/inductor_bmm.py | 61 + .../dynamo/microbenchmarks/inductor_mm.py | 134 + .../dynamo/microbenchmarks/matmul_relu.py | 100 + .../dynamo/microbenchmarks/microbench.py | 176 + benchmarks/dynamo/microbenchmarks/model.py | 26 + .../hf_train/AlbertForMaskedLM_training.txt | 115 + .../AlbertForQuestionAnswering_training.txt | 110 + .../AllenaiLongformerBase_training.txt | 186 + .../hf_train/BartForCausalLM_training.txt | 73 + .../BartForConditionalGeneration_training.txt | 89 + .../hf_train/BertForMaskedLM_training.txt | 81 + .../BertForQuestionAnswering_training.txt | 88 + .../hf_train/BigBird_training.txt | 237 + .../BlenderbotSmallForCausalLM_training.txt | 74 + ...SmallForConditionalGeneration_training.txt | 81 + .../hf_train/CamemBert_training.txt | 88 + .../hf_train/DebertaForMaskedLM_training.txt | 132 + .../DebertaForQuestionAnswering_training.txt | 133 + .../DebertaV2ForMaskedLM_training.txt | 85 + ...DebertaV2ForQuestionAnswering_training.txt | 92 + .../DistilBertForMaskedLM_training.txt | 78 + ...istilBertForQuestionAnswering_training.txt | 85 + .../hf_train/DistillGPT2_training.txt | 91 + .../hf_train/ElectraForCausalLM_training.txt | 92 + .../ElectraForQuestionAnswering_training.txt | 94 + ...GPT2ForSequenceClassification_training.txt | 106 + .../hf_train/GPTNeoForCausalLM_training.txt | 96 + ...TNeoForSequenceClassification_training.txt | 101 + .../hf_train/GoogleFnet_training.txt | 83 + .../hf_train/LayoutLMForMaskedLM_training.txt | 90 + ...utLMForSequenceClassification_training.txt | 98 + ...2M100ForConditionalGeneration_training.txt | 88 + .../hf_train/MBartForCausalLM_training.txt | 73 + ...MBartForConditionalGeneration_training.txt | 94 + .../MegatronBertForCausalLM_training.txt | 85 + ...atronBertForQuestionAnswering_training.txt | 88 + .../MobileBertForMaskedLM_training.txt | 112 + ...obileBertForQuestionAnswering_training.txt | 106 + .../hf_train/OPTForCausalLM_training.txt | 103 + .../hf_train/PLBartForCausalLM_training.txt | 73 + ...LBartForConditionalGeneration_training.txt | 94 + .../hf_train/PegasusForCausalLM_training.txt | 72 + ...gasusForConditionalGeneration_training.txt | 79 + .../hf_train/RobertaForCausalLM_training.txt | 94 + .../RobertaForQuestionAnswering_training.txt | 97 + .../Speech2Text2ForCausalLM_training.txt | 82 + .../hf_train/TrOCRForCausalLM_training.txt | 73 + .../hf_train/XGLMForCausalLM_training.txt | 88 + .../hf_train/XLNetLMHeadModel_training.txt | 105 + .../hf_train/YituTechConvBert_training.txt | 119 + .../timm_train/adv_inception_v3_training.txt | 239 + .../beit_base_patch16_224_training.txt | 100 + .../timm_train/botnet26t_256_training.txt | 244 + 
.../timm_train/cait_m36_384_training.txt | 149 + .../timm_train/coat_lite_mini_training.txt | 348 + .../timm_train/convmixer_768_32_training.txt | 45 + .../timm_train/convnext_base_training.txt | 210 + .../timm_train/crossvit_9_240_training.txt | 203 + .../timm_train/cspdarknet53_training.txt | 177 + ...it_base_distilled_patch16_224_training.txt | 87 + .../timm_train/densenet121_training.txt | 616 + .../timm_train/dla102_training.txt | 189 + .../timm_train/dm_nfnet_f0_training.txt | 296 + .../timm_train/dpn107_training.txt | 545 + .../eca_botnext26ts_256_training.txt | 288 + .../timm_train/eca_halonext26ts_training.txt | 343 + .../timm_train/ecaresnet101d_training.txt | 195 + .../timm_train/ese_vovnet19b_dw_training.txt | 182 + .../timm_train/fbnetc_100_training.txt | 189 + .../timm_train/fbnetv3_b_training.txt | 287 + .../timm_train/gernet_l_training.txt | 118 + .../timm_train/ghostnet_100_training.txt | 411 + .../gluon_inception_v3_training.txt | 239 + .../timm_train/gluon_senet154_training.txt | 187 + .../timm_train/gluon_xception65_training.txt | 155 + .../timm_train/gmixer_24_224_training.txt | 83 + .../timm_train/gmlp_s16_224_training.txt | 70 + .../timm_train/hardcorenas_a_training.txt | 260 + .../timm_train/hrnet_w18_training.txt | 247 + .../timm_train/inception_v3_training.txt | 239 + .../timm_train/jx_nest_base_training.txt | 269 + .../timm_train/lcnet_050_training.txt | 158 + .../timm_train/legacy_senet154_training.txt | 183 + .../timm_train/levit_128_training.txt | 295 + .../timm_train/mixer_b16_224_training.txt | 70 + .../timm_train/mixnet_l_training.txt | 378 + .../timm_train/mnasnet_100_training.txt | 170 + .../timm_train/mobilenetv2_100_training.txt | 172 + .../mobilenetv3_large_100_training.txt | 269 + .../timm_train/mobilevit_s_training.txt | 313 + .../timm_train/nasnetalarge_training.txt | 309 + .../timm_train/nfnet_l0_training.txt | 267 + .../timm_train/pit_b_224_training.txt | 185 + .../timm_train/pnasnet5large_training.txt | 293 + .../timm_train/poolformer_m36_training.txt | 111 + .../timm_train/regnety_002_training.txt | 181 + .../timm_train/repvgg_a2_training.txt | 90 + .../timm_train/res2net101_26w_4s_training.txt | 209 + .../timm_train/res2net50_14w_8s_training.txt | 209 + .../timm_train/res2next50_training.txt | 197 + .../timm_train/resmlp_12_224_training.txt | 75 + .../timm_train/resnest101e_training.txt | 269 + .../timm_train/resnet18_training.txt | 88 + .../timm_train/rexnet_100_training.txt | 573 + .../timm_train/sebotnet33ts_256_training.txt | 334 + .../timm_train/selecsls42b_training.txt | 167 + .../timm_train/spnasnet_100_training.txt | 182 + .../swin_base_patch4_window7_224_training.txt | 341 + .../swsl_resnext101_32x16d_training.txt | 143 + .../tf_efficientnet_b0_training.txt | 312 + .../timm_train/tf_mixnet_l_training.txt | 408 + .../timm_train/tinynet_a_training.txt | 302 + .../timm_train/tnt_s_patch16_224_training.txt | 146 + .../timm_train/twins_pcpvt_base_training.txt | 245 + .../timm_train/visformer_small_training.txt | 132 + .../vit_base_patch16_224_training.txt | 83 + .../timm_train/volo_d1_224_training.txt | 216 + .../BERT_pytorch_training.txt | 94 + .../Background_Matting_training.txt | 119 + .../LearningToPaint_training.txt | 86 + .../torchbench_train/Super_SloMo_training.txt | 255 + .../torchbench_train/alexnet_training.txt | 58 + ...ntion_is_all_you_need_pytorch_training.txt | 148 + .../torchbench_train/dcgan_training.txt | 42 + .../torchbench_train/densenet121_training.txt | 609 + .../fambench_dlrm_training.txt | 1063 + 
.../fastNLP_Bert_training.txt | 157 + .../torchbench_train/hf_Albert_training.txt | 110 + .../torchbench_train/hf_Bart_training.txt | 76 + .../torchbench_train/hf_Bert_training.txt | 76 + .../torchbench_train/hf_BigBird_training.txt | 235 + .../hf_DistilBert_training.txt | 73 + .../torchbench_train/hf_GPT2_training.txt | 88 + .../hf_Longformer_training.txt | 189 + .../maml_omniglot_training.txt | 49 + .../torchbench_train/mnasnet1_0_training.txt | 163 + .../mobilenet_v2_training.txt | 165 + .../mobilenet_v3_large_training.txt | 277 + .../nvidia_deeprecommender_training.txt | 36 + .../pytorch_CycleGAN_and_pix2pix_training.txt | 67 + .../pytorch_stargan_training.txt | 80 + .../pytorch_struct_training.txt | 63 + .../pytorch_unet_training.txt | 119 + .../torchbench_train/resnet18_training.txt | 81 + .../torchbench_train/resnet50_training.txt | 134 + .../resnext50_32x4d_training.txt | 124 + .../shufflenet_v2_x1_0_training.txt | 123 + .../speech_transformer_training.txt | 178 + .../squeezenet1_1_training.txt | 90 + .../timm_efficientdet_training.txt | 623 + .../timm_efficientnet_training.txt | 295 + .../torchbench_train/timm_nfnet_training.txt | 289 + .../torchbench_train/timm_regnet_training.txt | 178 + .../timm_resnest_training.txt | 205 + .../timm_vision_transformer_training.txt | 77 + .../torchbench_train/timm_vovnet_training.txt | 130 + .../torchbench_train/tts_angular_training.txt | 51 + .../torchbench_train/vgg16_training.txt | 72 + .../vision_maskrcnn_training.txt | 477 + .../torchbench_train/yolov3_training.txt | 261 + .../microbenchmarks/operator_inp_utils.py | 342 + .../dynamo/microbenchmarks/operatorbench.py | 242 + .../dynamo/microbenchmarks/profile_conv.py | 107 + benchmarks/dynamo/microbenchmarks/utils.py | 19 + benchmarks/dynamo/runner.py | 1345 + benchmarks/dynamo/test.py | 44 + benchmarks/dynamo/timm_models.py | 322 + benchmarks/dynamo/timm_models_list.txt | 62 + benchmarks/dynamo/torchbench.py | 365 + benchmarks/dynamo/torchbench_models_list.txt | 28 + benchmarks/dynamo/training_loss.py | 205 + benchmarks/instruction_counts/README.md | 2 +- benchmarks/instruction_counts/core/utils.py | 2 +- benchmarks/nested/nested_bmm_bench.py | 53 + benchmarks/operator_benchmark/README.md | 2 +- .../pt/ao_sparsifier_test.py | 4 +- .../operator_benchmark/pt/interpolate_test.py | 12 + .../operator_benchmark/pt/qactivation_test.py | 14 +- .../operator_benchmark/pt/qarithmetic_test.py | 2 +- .../pt/qatembedding_ops_test.py | 2 +- benchmarks/operator_benchmark/pt/qcat_test.py | 2 +- .../operator_benchmark/pt/qconv_test.py | 2 +- .../pt/qembeddingbag_test.py | 2 +- .../operator_benchmark/pt/qlinear_test.py | 4 +- .../pt/quantization_test.py | 2 +- .../static_runtime/test_generated_ops.cc | 398 +- .../static_runtime/test_static_module.cc | 19 +- .../static_runtime/test_static_runtime.cc | 84 +- benchmarks/static_runtime/test_utils.cc | 19 +- .../better_transformer_vs_mha_functional.py | 195 + benchmarks/transformer/sdp.py | 157 + benchmarks/transformer/sdp_backwards.py | 189 + binaries/CMakeLists.txt | 13 +- binaries/optimize_for_mobile.cc | 15 +- binaries/speed_benchmark_torch.cc | 4 + buckbuild.bzl | 85 +- build.bzl | 4 + build_variables.bzl | 108 +- c10/CMakeLists.txt | 5 +- c10/c10_defs.bzl | 29 - c10/core/AutogradState.cpp | 6 +- c10/core/AutogradState.h | 18 +- c10/core/Device.cpp | 13 +- c10/core/Device.h | 3 +- c10/core/DeviceType.cpp | 48 +- c10/core/DeviceType.h | 3 + c10/core/DispatchKey.cpp | 15 +- c10/core/DispatchKey.h | 7 + c10/core/DispatchKeySet.cpp | 12 +- 
c10/core/DispatchKeySet.h | 14 +- c10/core/InferenceMode.h | 3 +- c10/core/MemoryFormat.h | 36 +- c10/core/PyHandleCache.h | 75 + c10/core/QEngine.h | 4 + c10/core/SafePyObject.cpp | 5 + c10/core/SafePyObject.h | 31 +- c10/core/Scalar.cpp | 13 +- c10/core/Scalar.h | 166 +- c10/core/ScalarType.h | 2 +- c10/core/Storage.h | 4 + c10/core/StorageImpl.h | 14 +- c10/core/SymFloat.cpp | 81 + c10/core/SymFloat.h | 71 + c10/core/SymInt.cpp | 187 +- c10/core/SymInt.h | 258 +- c10/core/SymIntArrayRef.cpp | 34 +- c10/core/SymIntArrayRef.h | 215 +- c10/core/SymIntNodeImpl.cpp | 11 - c10/core/SymIntNodeImpl.h | 81 - c10/core/SymNodeImpl.cpp | 3 + c10/core/SymNodeImpl.h | 118 + c10/core/TensorImpl.cpp | 410 +- c10/core/TensorImpl.h | 924 +- c10/core/UndefinedTensorImpl.cpp | 8 +- c10/core/UndefinedTensorImpl.h | 2 + c10/core/WrapDimMinimal.cpp | 26 +- c10/core/WrapDimMinimal.h | 39 +- c10/core/impl/HermeticPyObjectTLS.cpp | 23 + c10/core/impl/HermeticPyObjectTLS.h | 61 + c10/core/impl/PyInterpreter.cpp | 202 +- c10/core/impl/PyInterpreter.h | 252 +- c10/core/impl/PythonDispatcherTLS.cpp | 32 + c10/core/impl/PythonDispatcherTLS.h | 27 + c10/core/impl/SizesAndStrides.cpp | 66 +- c10/core/impl/SizesAndStrides.h | 119 +- c10/core/impl/TorchDispatchModeTLS.cpp | 72 + c10/core/impl/TorchDispatchModeTLS.h | 27 + c10/cuda/CMakeLists.txt | 10 +- c10/cuda/CUDACachingAllocator.cpp | 1029 +- c10/cuda/CUDACachingAllocator.h | 225 +- c10/cuda/CUDAException.cpp | 35 + c10/cuda/CUDAException.h | 62 +- c10/cuda/CUDAFunctions.cpp | 4 + c10/cuda/CUDAFunctions.h | 11 + c10/cuda/CUDAMallocAsyncAllocator.cpp | 856 + c10/cuda/CUDAMiscFunctions.cpp | 5 + c10/cuda/CUDAMiscFunctions.h | 5 +- c10/cuda/CUDAStream.cpp | 4 +- c10/cuda/impl/CUDAGuardImpl.h | 10 +- c10/defs_hip.bzl | 126 - c10/macros/Macros.h | 91 +- c10/macros/build.bzl | 9 + c10/test/core/SymInt_test.cpp | 11 +- c10/test/core/impl/SizesAndStrides_test.cpp | 4 +- c10/test/util/complex_math_test_common.h | 128 + c10/test/util/intrusive_ptr_test.cpp | 5 + c10/test/util/string_view_test.cpp | 16 +- c10/util/C++17.h | 26 +- c10/util/DimVector.h | 2 + c10/util/Exception.cpp | 79 +- c10/util/Exception.h | 148 +- c10/util/FunctionRef.h | 2 +- c10/util/Half-inl.h | 6 +- c10/util/Half.h | 6 +- c10/util/IdWrapper.h | 1 + c10/util/Optional.h | 3 +- c10/util/SmallVector.cpp | 1 + c10/util/SmallVector.h | 1 + c10/util/ThreadLocalDebugInfo.cpp | 4 +- c10/util/build.bzl | 2 +- c10/util/complex_math.h | 31 + c10/util/hash.h | 8 + c10/util/intrusive_ptr.h | 3 + c10/util/irange.h | 30 +- c10/util/logging_is_not_google_glog.h | 2 +- c10/util/safe_numerics.h | 6 + c10/util/string_view.h | 8 +- c10/util/strong_type.h | 8 - c10/util/typeid.cpp | 64 +- c10/util/typeid.h | 121 +- c2_defs.bzl | 48 +- caffe2/CMakeLists.txt | 1244 +- caffe2/README.md | 2 - caffe2/contrib/aten/gen_op.py | 17 +- caffe2/contrib/nccl/cuda_nccl_gpu.cc | 2 +- caffe2/contrib/tensorrt/README.md | 2 +- .../contrib/tensorrt/tensorrt_tranformer.cc | 2 +- caffe2/core/CMakeLists.txt | 2 +- caffe2/core/context_gpu.cu | 6 +- caffe2/core/context_gpu.h | 12 +- caffe2/core/macros.h.in | 2 + caffe2/core/nomnigraph/CMakeLists.txt | 2 +- caffe2/core/tensor.cc | 2 +- caffe2/core/tensor.h | 5 + caffe2/defs.bzl | 89 - caffe2/defs_hip.bzl | 149 - .../mobile/contrib/libopencl-stub/README.md | 2 +- caffe2/mobile/contrib/nnapi/nnapi.h | 4 +- caffe2/operators/batch_box_cox_op.cc | 300 +- caffe2/operators/batch_box_cox_op.h | 60 +- .../generate_proposals_op_util_nms_gpu.cu | 42 +- ...generate_proposals_op_util_nms_gpu_test.cc | 2 +- 
.../rnn/recurrent_network_executor_gpu.cc | 3 +- caffe2/operators/scale_blobs_op.cu | 8 +- caffe2/operators/segment_reduction_op_gpu.cu | 18 +- caffe2/perfkernels/CMakeLists.txt | 2 +- caffe2/perfkernels/batch_box_cox.cc | 113 + caffe2/perfkernels/batch_box_cox.h | 35 + caffe2/perfkernels/batch_box_cox_avx2.cc | 399 + caffe2/perfkernels/common.h | 3 + caffe2/perfkernels/lstm_unit_cpu-impl.h | 22 +- caffe2/perfkernels/vectorizer.h | 28 + caffe2/proto/caffe2_pb.h | 2 +- caffe2/python/CMakeLists.txt | 1 + caffe2/python/clean_workspace_test.py | 15 + caffe2/python/onnx/ONNXOpCoverage.md | 2 +- caffe2/python/operator_test/_utils.py | 50 + .../operator_test/layer_norm_op_test.py | 30 +- .../operator_test/torch_integration_test.py | 66 +- caffe2/python/optimizer.py | 72 +- caffe2/python/optimizer_test.py | 22 +- caffe2/python/pybind_state.cc | 263 +- caffe2/python/pybind_workspace.cc | 72 + caffe2/python/pybind_workspace.h | 15 + caffe2/python/utils.py | 2 + caffe2/python/workspace_test.py | 6 - caffe2/quantization/server/README.md | 4 +- caffe2/quantization/server/dnnlowp.h | 2 + .../server/fully_connected_fake_lowp_op.h | 2 + caffe2/release-notes.md | 2 +- caffe2/serialize/inline_container.cc | 18 +- caffe2/serialize/inline_container.h | 2 +- caffe2/sgd/learning_rate_op.cc | 10 +- caffe2/utils/CMakeLists.txt | 2 +- caffe2/utils/math/elementwise.cu | 2 +- caffe2/utils/math/reduce.cu | 4 +- caffe2/utils/math_gpu.cu | 8 +- caffe2/utils/threadpool/ThreadPool.cc | 11 + cmake/Dependencies.cmake | 34 +- cmake/External/nccl.cmake | 39 +- cmake/Modules/FindMKLDNN.cmake | 2 + .../FindCUDA/select_compute_arch.cmake | 24 +- cmake/Summary.cmake | 6 +- cmake/VulkanCodegen.cmake | 12 +- cmake/public/LoadHIP.cmake | 3 - cmake/public/mkl.cmake | 7 +- cmake/public/utils.cmake | 119 +- defs.bzl | 8 - defs_gpu.bzl | 166 - defs_hip.bzl | 136 - docker.Makefile | 48 +- docs/Makefile | 7 +- docs/caffe2/.Doxyfile-c | 2 +- docs/caffe2/.Doxyfile-python | 2 +- docs/cpp/source/notes/tensor_cuda_stream.rst | 10 +- docs/requirements.txt | 16 +- docs/source/_dynamo.rst | 13 + .../_static/img/masked/tensor_comparison.jpg | Bin 0 -> 179951 bytes docs/source/amp.rst | 1 - docs/source/autograd.rst | 2 + docs/source/backends.rst | 42 + docs/source/bottleneck.rst | 6 +- docs/source/community/build_ci_governance.rst | 19 + docs/source/community/contribution_guide.rst | 16 +- docs/source/community/governance.rst | 29 +- docs/source/community/persons_of_interest.rst | 64 +- docs/source/conf.py | 54 +- docs/source/cuda._sanitizer.rst | 102 + docs/source/cuda.rst | 16 + docs/source/data.rst | 6 +- docs/source/deploy.rst | 241 +- docs/source/distributed.checkpoint.rst | 4 + docs/source/distributed.rst | 20 +- docs/source/elastic/agent.rst | 15 + docs/source/elastic/timer.rst | 11 + docs/source/fsdp.rst | 12 + docs/source/fx.rst | 6 +- docs/source/index.rst | 7 +- docs/source/jit_language_reference.rst | 2 +- docs/source/jit_language_reference_v2.rst | 4 +- docs/source/jit_unsupported.rst | 2 +- docs/source/linalg.rst | 6 + docs/source/masked.rst | 297 + docs/source/mobile_optimizer.rst | 5 +- docs/source/nested.rst | 125 +- docs/source/notes/autograd.rst | 2 +- docs/source/notes/cuda.rst | 150 +- docs/source/notes/extending.rst | 8 +- docs/source/notes/hip.rst | 11 + docs/source/notes/modules.rst | 21 +- docs/source/notes/numerical_accuracy.rst | 53 +- docs/source/onnx.rst | 344 +- docs/source/onnx_diagnostics.rst | 35 + docs/source/onnx_supported_aten_ops.rst | 34 +- docs/source/optim.rst | 6 +- docs/source/profiler.rst | 11 + 
.../quantization-accuracy-debugging.rst | 2 +- docs/source/quantization-support.rst | 168 +- docs/source/quantization.rst | 64 +- docs/source/rpc.rst | 2 +- .../onnx/build_onnx_diagnostics_rules_md.py | 37 + .../build_onnx_supported_aten_op_csv_table.py | 51 +- docs/source/signal.rst | 30 + docs/source/sparse.rst | 247 +- docs/source/special.rst | 2 + docs/source/storage.rst | 6 +- docs/source/tensor_attributes.rst | 16 +- docs/source/tensors.rst | 2 - docs/source/torch.rst | 16 +- functorch/.circleci/config.yml | 316 - .../unittest/linux/scripts/environment.yml | 17 - .../unittest/linux/scripts/install.sh | 61 - .../unittest/linux/scripts/post_process.sh | 8 - .../unittest/linux/scripts/run_test.sh | 16 - .../unittest/linux/scripts/setup_env.sh | 39 - .../unittest/windows/scripts/environment.yml | 20 - .../unittest/windows/scripts/install.sh | 46 - .../windows/scripts/install_conda.bat | 1 - .../unittest/windows/scripts/post_process.sh | 6 - .../unittest/windows/scripts/run_test.sh | 26 - .../unittest/windows/scripts/set_cuda_envs.sh | 48 - .../unittest/windows/scripts/setup_env.sh | 39 - .../windows/scripts/vc_env_helper.bat | 39 - functorch/.flake8 | 20 - functorch/.github/workflows/docs.yml | 82 - functorch/.github/workflows/lint.yml | 63 - functorch/.github/workflows/wheels.yml | 61 - functorch/.lintrunner.toml | 48 - functorch/CMakeLists.txt | 38 + functorch/CODE_OF_CONDUCT.md | 76 - functorch/CONTRIBUTING.md | 12 - functorch/LICENSE | 26 - functorch/README.md | 66 +- functorch/{functorch => }/__init__.py | 11 +- functorch/{functorch => }/_src/__init__.py | 0 functorch/_src/aot_autograd.py | 1965 + .../{functorch => }/_src/benchmark_utils.py | 0 .../{functorch => }/_src/compile_utils.py | 10 + functorch/{functorch => }/_src/compilers.py | 92 +- functorch/_src/config.py | 38 + .../{functorch => }/_src/eager_transforms.py | 159 +- functorch/_src/fx_minifier.py | 306 + .../{functorch => }/_src/make_functional.py | 2 +- .../_src/named_members_polyfill.py | 0 .../{functorch => }/_src/partitioners.py | 208 +- functorch/{functorch => }/_src/python_key.py | 5 +- .../{functorch => }/_src/pytree_hacks.py | 0 .../_src/top_operators_github_usage.py | 0 functorch/{functorch => }/_src/vmap.py | 16 +- functorch/benchmarks/operator_authoring.py | 8 +- functorch/benchmarks/pointwise_scorecard.py | 4 +- .../transformer_fusion_patterns/benchmark.py | 3 +- .../bias_gelu_dropout.py | 3 +- functorch/{functorch => }/compile/__init__.py | 7 +- functorch/{functorch => }/csrc/dim/arena.h | 0 functorch/{functorch => }/csrc/dim/dim.cpp | 100 +- functorch/{functorch => }/csrc/dim/dim.h | 0 .../{functorch => }/csrc/dim/minpybind.h | 9 +- .../csrc/dim/python_variable_simple.h | 0 functorch/csrc/init_dim_only.cpp | 22 + functorch/{functorch => }/dim/README.md | 22 +- functorch/{functorch => }/dim/__init__.py | 4 +- functorch/{functorch => }/dim/batch_tensor.py | 2 +- .../{functorch => }/dim/delayed_mul_tensor.py | 0 functorch/{functorch => }/dim/dim.py | 0 functorch/{functorch => }/dim/magic_trace.py | 0 .../{functorch => }/dim/op_properties.py | 0 functorch/{functorch => }/dim/reference.py | 0 functorch/{functorch => }/dim/tree_map.py | 0 functorch/{functorch => }/dim/wrap_type.py | 0 functorch/docs/source/_static/css/custom.css | 8 - .../docs/source/_static/images/functorch.svg | 6 - functorch/docs/source/_templates/layout.html | 339 +- functorch/docs/source/batch_norm.rst | 2 +- functorch/docs/source/conf.py | 6 +- functorch/docs/source/experimental.rst | 2 - functorch/docs/source/functorch.rst | 1 + 
functorch/docs/source/index.rst | 6 +- functorch/docs/source/install.rst | 40 +- functorch/docs/source/ux_limitations.rst | 4 +- functorch/examples/compilation/fuse_module.py | 4 +- .../examples/dp_cifar10/cifar10_transforms.py | 2 +- functorch/examples/maml_omniglot/README.md | 2 +- .../maml_omniglot/maml-omniglot-transforms.py | 2 +- .../maml_omniglot/support/omniglot_loaders.py | 2 +- .../{functorch => }/experimental/__init__.py | 5 +- functorch/experimental/_map.py | 105 + .../experimental/batch_norm_replacement.py | 0 functorch/experimental/cond.py | 157 + functorch/experimental/control_flow.py | 1 + functorch/experimental/ops.py | 1 + functorch/functorch/_src/aot_autograd.py | 808 - functorch/functorch/_src/config.py | 27 - functorch/functorch/_src/custom_function.py | 20 - functorch/functorch/_src/fx_minifier.py | 269 - functorch/functorch/_src/monkey_patching.py | 80 - functorch/functorch/csrc/CompileCache.cpp | 288 - functorch/functorch/csrc/CompileCache.h | 17 - functorch/functorch/csrc/Constants.h | 31 - functorch/functorch/csrc/CustomFunction.cpp | 291 - functorch/functorch/csrc/CustomFunction.h | 14 - functorch/functorch/csrc/DynamicLayer.h | 93 - functorch/functorch/csrc/Macros.h | 10 - functorch/functorch/csrc/PlumbingHelper.h | 39 - functorch/functorch/csrc/TensorWrapper.h | 68 - functorch/functorch/csrc/init.cpp | 419 - .../aot_autograd_optimizations.ipynb | 31 +- .../notebooks/colab/ensembling_colab.ipynb | 598 - .../colab/jacobians_hessians_colab.ipynb | 1120 - .../colab/per_sample_grads_colab.ipynb | 795 - functorch/notebooks/colab/readme.md | 5 - functorch/notebooks/ensembling.ipynb | 4 +- functorch/notebooks/jacobians_hessians.ipynb | 2 +- .../notebooks/neural_tangent_kernels.ipynb | 4 + functorch/notebooks/per_sample_grads.ipynb | 2 +- functorch/notebooks/whirlwind_tour.ipynb | 4 + functorch/op_analysis/public_api | 24 +- functorch/packaging/build_wheel.sh | 19 - functorch/packaging/pkg_helpers.bash | 414 - .../windows/internal/cuda_install.bat | 264 - .../windows/internal/driver_update.bat | 25 - .../windows/internal/vc_env_helper.bat | 43 - .../windows/internal/vc_install_helper.sh | 16 - functorch/pull_request_template.md | 5 - functorch/setup.cfg | 18 - functorch/setup.py | 149 - functorch/test/functorch_lagging_op_db.py | 635 - functorch/test/pytest.ini | 2 - functorch/test/test_compile_cache.py | 686 - functorch/test/test_minifier.py | 53 - functorch/test/test_pythonkey.py | 645 - functorch/tools/lint/black_linter.py | 228 - functorch/tools/lint/flake8_linter.py | 373 - functorch/tools/lint/pip_init.py | 75 - functorch/version.txt | 1 - ios/LibTorch-Lite.podspec | 3 +- ios/LibTorch.podspec | 3 +- ios/TestApp/AppleWWDRCAG3.cer | Bin 1109 -> 0 bytes ios/TestApp/README.md | 12 + ios/TestApp/TestApp.xcodeproj/project.pbxproj | 42 +- ios/TestApp/TestApp/Benchmark.h | 15 + ios/TestApp/TestApp/Benchmark.mm | 108 + ios/TestApp/TestApp/ViewController.mm | 40 + ios/TestApp/benchmark/config.json | 7 + ios/TestApp/benchmark/setup.rb | 15 +- ios/TestApp/fastlane/Fastfile | 16 - mypy-strict.ini | 1 + pt_ops.bzl | 6 +- pt_template_srcs.bzl | 1 + pytest.ini | 8 +- requirements.txt | 4 + scripts/buck_setup.sh | 6 +- scripts/build_android.sh | 36 +- scripts/build_ios.sh | 47 +- scripts/build_mobile.sh | 31 + scripts/onnx/test.sh | 1 + scripts/release_notes/commitlist.py | 4 + scripts/xcode_build.rb | 18 +- setup.py | 424 +- test/allowlist_for_publicAPI.json | 813 +- .../ao/sparsity/test_activation_sparsifier.py | 4 +- test/ao/sparsity/test_composability.py | 22 +- 
test/ao/sparsity/test_data_scheduler.py | 4 +- test/ao/sparsity/test_data_sparsifier.py | 4 +- test/ao/sparsity/test_kernels.py | 10 +- test/ao/sparsity/test_parametrization.py | 2 +- test/ao/sparsity/test_pruner.py | 2 +- .../ao/sparsity/test_qlinear_packed_params.py | 105 +- test/ao/sparsity/test_scheduler.py | 100 +- test/ao/sparsity/test_sparsifier.py | 10 +- test/ao/sparsity/test_sparsity_utils.py | 2 +- test/conftest.py | 23 + test/cpp/api/CMakeLists.txt | 11 +- test/cpp/api/autograd.cpp | 86 +- test/cpp/api/functional.cpp | 14 + test/cpp/api/imethod.cpp | 64 - test/cpp/api/inference_mode.cpp | 6 +- test/cpp/api/modules.cpp | 2 +- test/cpp/api/nested.cpp | 15 + test/cpp/api/nn_utils.cpp | 2 +- test/cpp/api/serialize.cpp | 41 + test/cpp/api/static.cpp | 4 + test/cpp/api/support.h | 13 +- test/cpp/c10d/CMakeLists.txt | 13 + test/cpp/c10d/FileStoreTest.cpp | 16 +- test/cpp/c10d/HashStoreTest.cpp | 4 +- test/cpp/c10d/ProcessGroupGlooAsyncTest.cpp | 17 +- test/cpp/c10d/ProcessGroupGlooTest.cpp | 26 +- test/cpp/c10d/ProcessGroupMPITest.cpp | 45 +- test/cpp/c10d/ProcessGroupNCCLErrorsTest.cpp | 4 +- test/cpp/c10d/ProcessGroupNCCLTest.cpp | 30 +- test/cpp/c10d/ProcessGroupUCCTest.cpp | 35 + test/cpp/c10d/StoreTestCommon.hpp | 2 +- test/cpp/c10d/TCPStoreTest.cpp | 4 +- test/cpp/c10d/example/allreduce.cpp | 6 +- test/cpp/jit/CMakeLists.txt | 21 +- test/cpp/jit/test_custom_class.cpp | 14 + .../jit/test_custom_class_registrations.cpp | 12 +- .../cpp/jit/test_custom_class_registrations.h | 5 + test/cpp/jit/test_flatbuffer.cpp | 12 +- test/cpp/jit/test_jit_logging_levels.cpp | 10 +- test/cpp/jit/test_misc.cpp | 138 + test/cpp/jit/test_module_api.cpp | 4 +- test/cpp/lazy/CMakeLists.txt | 1 - test/cpp/lazy/test_ir.cpp | 10 +- test/cpp/lazy/test_ir_util.cpp | 2 +- test/cpp/lazy/test_lazy_ops.cpp | 34 +- test/cpp/lazy/test_symbolic_shape.cpp | 161 - test/cpp/lite_interpreter_runtime/resources.h | 41 + .../test_mobile_profiler.cpp | 75 +- test/cpp/profiler/perf_events.cpp | 248 + test/cpp/rpc/e2e_test_base.h | 2 +- test/cpp/rpc/test_e2e_tensorpipe.cpp | 2 +- test/cpp/tensorexpr/test_cuda.cpp | 819 +- test/cpp/tensorexpr/test_kernel.cpp | 73 +- test/cpp/tensorexpr/test_loopnest.cpp | 4 +- test/cpp/tensorexpr/test_quantization.cpp | 3 +- test/cpp_extensions/cpp_c10d_extension.cpp | 26 +- test/cpp_extensions/cpp_c10d_extension.hpp | 37 +- .../open_registration_extension.cpp | 7 +- test/cpp_extensions/ort_extension.cpp | 6 - test/defs.bzl | 112 - .../_composable/test_checkpoint.py | 83 + test/distributed/_composable/test_contract.py | 122 + .../_composable/test_fully_shard.py | 267 + .../distributed/_composable/test_replicate.py | 107 + .../_shard/checkpoint/test_checkpoint.py | 413 - .../sharded_tensor/ops/test_embedding.py | 18 +- .../sharded_tensor/ops/test_embedding_bag.py | 12 +- .../sharding_spec/test_sharding_spec.py | 66 + test/distributed/_tensor/README.md | 11 + test/distributed/_tensor/__init__.py | 1 + .../distributed/_tensor/parallel}/__init__.py | 0 .../_tensor/parallel/test_2d_parallel.py | 214 + .../_tensor/parallel/test_parallelize_api.py | 219 + .../_tensor/parallel/test_tp_examples.py | 437 + .../_tensor/parallel/test_tp_style.py | 197 + .../parallel/test_view_sharding_dim_change.py | 30 + test/distributed/_tensor/test_api.py | 234 + test/distributed/_tensor/test_common_rules.py | 476 + test/distributed/_tensor/test_device_mesh.py | 518 + test/distributed/_tensor/test_dtensor.py | 359 + test/distributed/_tensor/test_dtensor_ops.py | 704 + test/distributed/_tensor/test_math_ops.py | 126 + 
test/distributed/_tensor/test_matrix_ops.py | 302 + .../distributed/_tensor/test_pointwise_ops.py | 285 + test/distributed/_tensor/test_redistribute.py | 317 + test/distributed/_tensor/test_tensor_ops.py | 365 + .../_tensor/test_tp_sharding_ops.py | 101 + test/distributed/_tensor/test_view_ops.py | 480 + .../ddp_comm_hooks/test_ddp_hooks.py | 10 +- .../distributed/checkpoint/test_checkpoint.py | 392 + .../checkpoint/test_dedup_tensors.py | 45 + .../checkpoint/test_file_system_checkpoint.py | 20 +- .../test_file_system_checkpoint_cpu.py | 2 +- test/distributed/checkpoint/test_planner.py | 269 + test/distributed/checkpoint/test_traverse.py | 176 + .../{_shard => }/checkpoint/test_utils.py | 8 +- test/distributed/defs.bzl | 39 - .../server/test/local_elastic_agent_test.py | 62 +- .../timer/file_based_local_timer_test.py | 266 + test/distributed/fsdp/defs.bzl | 22 - .../fsdp/test_checkpoint_wrapper.py | 264 +- .../fsdp/test_distributed_checkpoint.py | 27 +- .../fsdp/test_flatten_params_wrapper.py | 315 - test/distributed/fsdp/test_fsdp_apply.py | 5 +- test/distributed/fsdp/test_fsdp_checkpoint.py | 127 +- .../fsdp/test_fsdp_clip_grad_norm.py | 264 +- test/distributed/fsdp/test_fsdp_comm.py | 72 +- test/distributed/fsdp/test_fsdp_comm_hooks.py | 331 +- test/distributed/fsdp/test_fsdp_core.py | 52 +- test/distributed/fsdp/test_fsdp_exec_order.py | 46 +- .../fsdp/test_fsdp_flatten_params.py | 445 + .../fsdp/test_fsdp_freezing_weights.py | 8 +- test/distributed/fsdp/test_fsdp_grad_acc.py | 107 +- .../fsdp/test_fsdp_ignored_modules.py | 29 +- test/distributed/fsdp/test_fsdp_input.py | 7 +- test/distributed/fsdp/test_fsdp_memory.py | 8 +- test/distributed/fsdp/test_fsdp_meta.py | 68 +- test/distributed/fsdp/test_fsdp_misc.py | 184 +- .../fsdp/test_fsdp_mixed_precision.py | 241 +- .../fsdp/test_fsdp_multiple_forward.py | 8 +- .../fsdp/test_fsdp_multiple_wrapping.py | 3 +- .../distributed/fsdp/test_fsdp_optim_state.py | 844 +- test/distributed/fsdp/test_fsdp_overlap.py | 13 +- .../fsdp/test_fsdp_param_exec_order_wrap.py | 134 - test/distributed/fsdp/test_fsdp_pure_fp16.py | 6 +- .../fsdp/test_fsdp_sharded_grad_scaler.py | 69 +- test/distributed/fsdp/test_fsdp_state_dict.py | 426 +- .../fsdp/test_fsdp_summon_full_params.py | 304 +- .../fsdp/test_fsdp_tp_integration.py | 486 + test/distributed/fsdp/test_fsdp_traversal.py | 15 +- test/distributed/fsdp/test_fsdp_uneven.py | 7 +- .../fsdp/test_fsdp_use_orig_params.py | 1057 + test/distributed/fsdp/test_shard_utils.py | 26 +- test/distributed/fsdp/test_utils.py | 117 +- test/distributed/fsdp/test_wrap.py | 130 +- .../optim/test_apply_optimizer_in_backward.py | 113 + test/distributed/pipeline/sync/defs.bzl | 22 - test/distributed/test_c10d_common.py | 383 +- test/distributed/test_c10d_error_logger.py | 142 + test/distributed/test_c10d_gloo.py | 181 +- test/distributed/test_c10d_nccl.py | 257 +- test/distributed/test_c10d_spawn_ucc.py | 110 + test/distributed/test_distributed_spawn.py | 2 +- test/distributed/test_dynamo_distributed.py | 567 + test/distributed/test_multi_threaded_pg.py | 87 + test/distributed/test_store.py | 48 +- test/distributions/test_distributions.py | 23 +- .../dynamo}/__init__.py | 0 .../dynamo/mock_modules}/__init__.py | 0 test/dynamo/mock_modules/mock_module1.py | 2 + test/dynamo/mock_modules/mock_module2.py | 19 + test/dynamo/mock_modules/mock_module3.py | 7 + test/dynamo/test_aot_autograd.py | 288 + test/dynamo/test_aot_cudagraphs.py | 208 + test/dynamo/test_dynamic_shapes.py | 112 + test/dynamo/test_export.py | 1493 + 
test/dynamo/test_export_mutations.py | 134 + test/dynamo/test_functions.py | 697 + test/dynamo/test_global.py | 233 + test/dynamo/test_global_declaration.py | 4 + test/dynamo/test_minifier.py | 318 + test/dynamo/test_misc.py | 3062 ++ test/dynamo/test_model_output.py | 166 + test/dynamo/test_modules.py | 1045 + test/dynamo/test_nops.py | 72 + test/dynamo/test_optimizations.py | 206 + test/dynamo/test_optimizers.py | 167 + test/dynamo/test_python_autograd.py | 287 + test/dynamo/test_recompile_ux.py | 205 + test/dynamo/test_replay_record.py | 194 + test/dynamo/test_repros.py | 2066 + test/dynamo/test_skip_non_tensor.py | 113 + test/dynamo/test_subgraphs.py | 546 + test/dynamo/test_torchxla_integration.py | 131 + test/dynamo/test_torchxla_num_output.py | 120 + test/dynamo/test_torchxla_util.py | 26 + test/dynamo/test_unspec.py | 229 + test/dynamo/test_verify_correctness.py | 175 + ..._compat-fx_backcompat_class_members.expect | 4 +- ...t-fx_backcompat_function_signatures.expect | 8 +- .../check_forward_backward_compatibility.py | 203 +- {functorch/test => test/functorch}/attn_ft.py | 0 .../functorch}/attn_positional.py | 0 .../test => test/functorch}/common_utils.py | 212 +- .../functorch}/discover_coverage.py | 4 - .../functorch}/functorch_additional_op_db.py | 12 +- test/functorch/test_aotdispatch.py | 1989 + test/functorch/test_control_flow.py | 467 + .../test => test/functorch}/test_dims.py | 32 +- .../functorch}/test_eager_transforms.py | 745 +- .../functorch}/test_functionalize.py | 11 +- .../test_memory_efficient_fusion.py | 0 test/functorch/test_minifier.py | 116 + .../test => test/functorch}/test_ops.py | 830 +- .../test => test/functorch}/test_vmap.py | 291 +- .../functorch}/xfail_suggester.py | 4 +- test/fx/quantization.py | 2 +- test/fx/test_common_passes.py | 9 +- test/fx/test_fx_param_shape_control_flow.py | 8 +- test/fx/test_gradual_type.py | 9 +- test/fx/test_pass_infra.py | 15 + test/fx/test_subgraph_rewriter.py | 372 +- test/fx/test_z3_gradual_types.py | 84 +- .../callbacks => test/inductor}/__init__.py | 0 test/inductor/cpp/.gitignore | 13 + test/inductor/cpp/CMakeLists.txt | 47 + test/inductor/cpp/test.sh | 7 + test/inductor/cpp/test_cpp_prefix.cpp | 21 + test/inductor/opinfo_harness.py | 25 + test/inductor/test_minifier.py | 213 + test/inductor/test_perf.py | 502 + test/inductor/test_smoke.py | 30 + test/inductor/test_torchinductor.py | 5566 +++ test/inductor/test_torchinductor_opinfo.py | 591 + test/jit/test_async.py | 15 - test/jit/test_backends.py | 11 +- test/jit/test_freezing.py | 537 +- test/jit/test_hooks.py | 2 +- test/jit/test_misc.py | 19 + test/jit/test_module_interface.py | 8 +- test/jit/test_python_bindings.py | 5 + test/jit/test_symbolic_shape_analysis.py | 7 +- test/jit/test_tensor_creation_ops.py | 8 +- test/jit/test_tracer.py | 8 - test/jit/test_with.py | 2 + test/jit/xnnpack/test_xnnpack_delegate.py | 192 + test/lazy/test_debug_util.py | 44 + test/lazy/test_extract_compiled_graph.py | 2 +- test/lazy/test_meta_kernel.py | 34 + test/lazy/test_reuse_ir.py | 4 + test/lazy/test_step_closures.py | 91 + test/lazy/test_ts_opinfo.py | 34 +- test/mobile/model_test/README.md | 2 +- test/mobile/test_lite_script_module.py | 4 +- test/mobile/test_lite_script_type.py | 14 +- .../test_quantize_fx_lite_script_module.py | 16 +- test/nn/test_convolution.py | 2480 + test/nn/test_dropout.py | 283 + test/nn/test_embedding.py | 1193 + test/nn/test_init.py | 420 + test/nn/test_lazy_modules.py | 626 + test/nn/test_module_hooks.py | 1334 + test/nn/test_packed_sequence.py | 392 + 
test/nn/test_parametrization.py | 1525 + test/nn/test_pooling.py | 1450 + test/nn/test_pruning.py | 939 + .../expect/TestOperators.test_acos.expect | 2 +- .../TestOperators.test_add_broadcast.expect | 2 +- ...stOperators.test_add_left_broadcast.expect | 2 +- ...tOperators.test_add_size1_broadcast.expect | 2 +- ...tors.test_add_size1_right_broadcast.expect | 2 +- ....test_add_size1_singleton_broadcast.expect | 2 +- .../TestOperators.test_addconstant.expect | 2 +- .../expect/TestOperators.test_addmm.expect | 2 +- .../expect/TestOperators.test_argmax.expect | 2 +- .../expect/TestOperators.test_asin.expect | 2 +- .../expect/TestOperators.test_at_op.expect | 8 +- .../expect/TestOperators.test_atan.expect | 2 +- .../TestOperators.test_avg_pool2d.expect | 9 +- .../expect/TestOperators.test_baddbmm.expect | 26 +- .../expect/TestOperators.test_basic.expect | 2 +- .../TestOperators.test_batchnorm.expect | 7 +- .../TestOperators.test_batchnorm_1d.expect | 7 +- ...stOperators.test_batchnorm_noaffine.expect | 7 +- ...tOperators.test_batchnorm_onnx_irv4.expect | 7 +- ...stOperators.test_batchnorm_training.expect | 9 +- .../expect/TestOperators.test_chunk.expect | 2 +- .../expect/TestOperators.test_clip.expect | 2 +- .../expect/TestOperators.test_clip_max.expect | 2 +- .../expect/TestOperators.test_clip_min.expect | 2 +- .../expect/TestOperators.test_concat2.expect | 2 +- .../expect/TestOperators.test_conv.expect | 2 +- .../TestOperators.test_conv_onnx_irv4.expect | 2 +- .../TestOperators.test_convtranspose.expect | 2 +- .../onnx/expect/TestOperators.test_cos.expect | 2 +- .../expect/TestOperators.test_dict.expect | 2 +- .../expect/TestOperators.test_dict_str.expect | 2 +- .../onnx/expect/TestOperators.test_dim.expect | 2 +- .../expect/TestOperators.test_dropout.expect | 2 +- .../TestOperators.test_dropout_default.expect | 2 +- ...TestOperators.test_dropout_training.expect | 2 +- .../onnx/expect/TestOperators.test_elu.expect | 2 +- .../TestOperators.test_embedding_bags.expect | 2 +- .../TestOperators.test_empty_like.expect | 2 +- .../expect/TestOperators.test_equal.expect | 2 +- .../onnx/expect/TestOperators.test_erf.expect | 2 +- .../onnx/expect/TestOperators.test_exp.expect | 2 +- .../expect/TestOperators.test_expand.expect | 2 +- .../expect/TestOperators.test_flatten.expect | 7 +- .../TestOperators.test_flatten2D.expect | 2 +- .../TestOperators.test_frobenius_norm.expect | 2 +- .../expect/TestOperators.test_full.expect | 2 +- .../TestOperators.test_full_like.expect | 2 +- .../expect/TestOperators.test_gather.expect | 2 +- test/onnx/expect/TestOperators.test_ge.expect | 2 +- .../expect/TestOperators.test_gelu.expect | 2 +- test/onnx/expect/TestOperators.test_gt.expect | 2 +- .../expect/TestOperators.test_hardtanh.expect | 2 +- .../TestOperators.test_implicit_expand.expect | 2 +- .../expect/TestOperators.test_index.expect | 2 +- .../expect/TestOperators.test_isnan.expect | 2 +- .../TestOperators.test_layer_norm_aten.expect | 2 +- test/onnx/expect/TestOperators.test_le.expect | 2 +- .../expect/TestOperators.test_linear.expect | 2 +- .../TestOperators.test_log_sigmoid.expect | 2 +- .../TestOperators.test_logsoftmax.expect | 2 +- test/onnx/expect/TestOperators.test_lt.expect | 2 +- .../onnx/expect/TestOperators.test_max.expect | 2 +- .../expect/TestOperators.test_maxpool.expect | 2 +- .../TestOperators.test_maxpool_indices.expect | 2 +- .../expect/TestOperators.test_mean.expect | 2 +- .../TestOperators.test_mean_dtype.expect | 2 +- .../expect/TestOperators.test_meshgrid.expect | 32 +- 
.../onnx/expect/TestOperators.test_min.expect | 2 +- test/onnx/expect/TestOperators.test_mm.expect | 2 +- .../expect/TestOperators.test_mul_bool.expect | 2 +- .../TestOperators.test_mul_fp_bool.expect | 2 +- .../expect/TestOperators.test_narrow.expect | 14 +- test/onnx/expect/TestOperators.test_ne.expect | 2 +- .../expect/TestOperators.test_nonzero.expect | 2 +- .../expect/TestOperators.test_norm_p1.expect | 2 +- .../expect/TestOperators.test_norm_p2.expect | 2 +- .../TestOperators.test_ones_like.expect | 2 +- .../onnx/expect/TestOperators.test_pad.expect | 12 +- .../expect/TestOperators.test_params.expect | 2 +- ...TestOperators.test_params_onnx_irv4.expect | 2 +- .../expect/TestOperators.test_permute2.expect | 2 +- .../onnx/expect/TestOperators.test_pow.expect | 2 +- .../expect/TestOperators.test_prelu.expect | 2 +- .../expect/TestOperators.test_prod.expect | 2 +- .../TestOperators.test_prod_dtype.expect | 2 +- .../expect/TestOperators.test_rand.expect | 2 +- .../expect/TestOperators.test_randn.expect | 2 +- ...rs.test_reduce_sum_negative_indices.expect | 2 +- .../TestOperators.test_reduced_mean.expect | 2 +- ...stOperators.test_reduced_mean_dtype.expect | 2 +- ...Operators.test_reduced_mean_keepdim.expect | 2 +- .../TestOperators.test_reduced_prod.expect | 2 +- ...stOperators.test_reduced_prod_dtype.expect | 2 +- ...Operators.test_reduced_prod_keepdim.expect | 2 +- .../TestOperators.test_reduced_sum.expect | 2 +- ...estOperators.test_reduced_sum_dtype.expect | 2 +- ...tOperators.test_reduced_sum_keepdim.expect | 2 +- .../TestOperators.test_reducemax.expect | 2 +- .../TestOperators.test_reducemin.expect | 2 +- .../TestOperators.test_remainder.expect | 2 +- .../expect/TestOperators.test_repeat.expect | 2 +- ...tOperators.test_repeat_dim_overflow.expect | 2 +- .../expect/TestOperators.test_rrelu.expect | 2 +- .../expect/TestOperators.test_rsqrt.expect | 2 +- .../expect/TestOperators.test_rsub.expect | 2 +- .../TestOperators.test_scatter_add.expect | 2 +- .../expect/TestOperators.test_selu.expect | 2 +- .../TestOperators.test_shape_value_map.expect | 40 +- .../expect/TestOperators.test_sign.expect | 2 +- .../onnx/expect/TestOperators.test_sin.expect | 2 +- .../expect/TestOperators.test_slice.expect | 2 +- .../expect/TestOperators.test_split.expect | 2 +- ...TestOperators.test_split_with_sizes.expect | 2 +- .../expect/TestOperators.test_sqrt.expect | 2 +- .../onnx/expect/TestOperators.test_std.expect | 2 +- .../onnx/expect/TestOperators.test_sum.expect | 2 +- .../TestOperators.test_sum_dtype.expect | 2 +- .../onnx/expect/TestOperators.test_tan.expect | 2 +- .../TestOperators.test_transpose.expect | 2 +- .../expect/TestOperators.test_type_as.expect | 2 +- .../expect/TestOperators.test_unfold.expect | 2 +- .../TestOperators.test_unsqueeze.expect | 2 +- ...erators.test_upsample_nearest_scale.expect | 2 +- ..._nearest_scale_default_scale_factor.expect | 2 +- ...perators.test_upsample_nearest_size.expect | 2 +- .../expect/TestOperators.test_view.expect | 7 +- .../TestOperators.test_view_flatten.expect | 7 +- .../TestOperators.test_zeros_like.expect | 2 +- test/onnx/internal/test_beartype.py | 86 + test/onnx/internal/test_diagnostics.py | 304 + test/onnx/internal/test_registraion.py | 254 + test/onnx/onnx_test_common.py | 27 +- test/onnx/pytorch_test_common.py | 71 +- .../symbolic_opsets/test_symbolic_opset9.py | 32 - test/onnx/test_autograd_funs.py | 10 +- test/onnx/test_custom_ops.py | 14 +- test/{jit => onnx}/test_export_modes.py | 91 +- test/onnx/test_models.py | 8 +- 
test/onnx/test_models_onnxruntime.py | 5 +- test/onnx/test_onnx_opset.py | 5 +- test/onnx/test_onnxscript_no_runtime.py | 164 + test/onnx/test_onnxscript_runtime.py | 130 + test/onnx/test_operators.py | 29 +- test/onnx/test_pytorch_helper.py | 3 +- test/onnx/test_pytorch_jit_onnx.py | 5 +- test/onnx/test_pytorch_onnx_caffe2.py | 27 +- .../test_pytorch_onnx_caffe2_quantized.py | 7 +- test/onnx/test_pytorch_onnx_no_runtime.py | 465 +- test/onnx/test_pytorch_onnx_onnxruntime.py | 548 +- .../test_pytorch_onnx_onnxruntime_cuda.py | 26 +- .../onnx/test_pytorch_onnx_shape_inference.py | 190 +- test/onnx/test_utility_funs.py | 509 +- test/onnx/test_verification.py | 2 + test/onnx/verify.py | 2 +- test/profiler/profiler_utils_mock_events.json | 1 + test/profiler/test_memory_profiler.py | 1418 + test/{ => profiler}/test_profiler.py | 898 +- test/{ => profiler}/test_profiler_tree.py | 185 +- test/profiler_utils_mock_events.json | 1 - test/quantization/ao_migration/common.py | 22 +- .../ao_migration/test_ao_migration.py | 425 + .../ao_migration/test_quantization.py | 8 +- .../ao_migration/test_quantization_fx.py | 2 +- .../bc/test_backward_compatibility.py | 4 +- test/quantization/core/test_backend_config.py | 77 +- test/quantization/core/test_docs.py | 31 +- .../core/test_quantized_functional.py | 2 +- .../core/test_quantized_module.py | 115 +- test/quantization/core/test_quantized_op.py | 364 +- .../core/test_quantized_tensor.py | 174 +- test/quantization/core/test_top_level_apis.py | 93 + test/quantization/core/test_utils.py | 65 + .../quantization/core/test_workflow_module.py | 56 +- test/quantization/core/test_workflow_ops.py | 8 +- test/quantization/dbr/test_quantize_dbr.py | 1619 - test/quantization/eager/test_fuse_eager.py | 16 +- .../quantization/eager/test_model_numerics.py | 4 +- .../eager/test_numeric_suite_eager.py | 2 +- .../eager/test_quantize_eager_ptq.py | 121 +- .../eager/test_quantize_eager_qat.py | 49 +- test/quantization/fx/test_equalize_fx.py | 2 +- test/quantization/fx/test_model_report_fx.py | 100 +- test/quantization/fx/test_numeric_suite_fx.py | 406 +- test/quantization/fx/test_quantize_fx.py | 904 +- .../jit/test_ondevice_quantization.py | 529 + test/quantization/jit/test_quantize_jit.py | 7 +- test/run_test.py | 445 +- test/scripts/run_cuda_memcheck.py | 2 +- test/test_ao_sparsity.py | 1 + test/test_autocast.py | 61 + test/test_autograd.py | 966 +- test/test_binary_ufuncs.py | 63 +- test/test_comparison_utils.py | 32 + test/test_cpp_extensions_jit.py | 4 +- test/test_cuda.py | 672 +- test/test_cuda_nvml_based_avail.py | 69 + test/test_cuda_sanitizer.py | 505 + test/test_cuda_trace.py | 28 + test/test_dataloader.py | 154 +- test/test_datapipe.py | 502 +- test/test_decomp.py | 141 +- test/test_dispatch.py | 4 + test/test_dlpack.py | 193 + test/test_dynamic_shapes.py | 378 +- test/test_expanded_weights.py | 36 +- test/test_fake_tensor.py | 298 +- test/test_foreach.py | 55 +- test/test_function_schema.py | 20 + test/test_functional_optim.py | 7 +- test/test_functionalization.py | 912 +- test/test_futures.py | 9 + test/test_fx.py | 60 +- test/test_fx_backends.py | 258 - test/test_fx_experimental.py | 19 +- test/test_fx_passes.py | 236 +- test/test_fx_reinplace_pass.py | 207 +- test/test_indexing.py | 32 +- test/test_itt.py | 26 + test/test_jit.py | 161 +- test/test_jit_autocast.py | 30 +- test/test_jit_cuda_fuser.py | 238 +- test/test_jit_fuser_te.py | 52 +- test/test_jit_llga_fuser.py | 522 +- test/test_jiterator.py | 10 +- test/test_linalg.py | 523 +- test/test_masked.py | 16 
+- test/test_maskedtensor.py | 912 + test/test_matmul_cuda.py | 155 + test/test_meta.py | 621 +- test/test_mkldnn.py | 15 +- test/test_mkldnn_fusion.py | 282 +- test/test_model_dump.py | 2 + test/test_module_init.py | 89 +- test/test_modules.py | 14 +- test/test_mps.py | 2738 +- test/test_multiprocessing.py | 5 +- test/test_namedtuple_return_api.py | 7 +- test/test_native_functions.py | 42 +- test/test_native_mha.py | 94 +- test/test_nestedtensor.py | 1513 +- test/test_nn.py | 11087 +---- test/test_nnapi.py | 10 +- test/test_nvfuser_dynamo.py | 148 + test/test_nvfuser_frontend.py | 366 + test/test_ops.py | 358 +- test/test_ops_fwd_gradients.py | 76 + test/test_ops_gradients.py | 183 +- test/test_ops_jit.py | 9 +- test/test_optim.py | 3190 +- test/test_overrides.py | 212 +- test/test_prims.py | 766 +- test/test_proxy_tensor.py | 784 +- test/test_public_bindings.py | 27 +- test/test_python_dispatch.py | 429 +- test/test_pytree.py | 8 +- test/test_quantization.py | 19 +- test/test_reductions.py | 40 +- test/test_scatter_gather_ops.py | 94 +- test/test_schema_check.py | 79 +- test/test_serialization.py | 232 +- test/test_shape_ops.py | 10 + test/test_sparse.py | 223 +- test/test_sparse_csr.py | 1048 +- test/test_spectral_ops.py | 47 +- test/test_stateless.py | 31 + test/test_subclass.py | 25 +- test/test_tensor_creation_ops.py | 217 - test/test_tensorexpr.py | 352 +- test/test_testing.py | 147 +- test/test_torch.py | 859 +- test/test_transformers.py | 608 +- test/test_type_promotion.py | 91 +- test/test_unary_ufuncs.py | 29 +- test/test_utils.py | 126 +- test/test_view_ops.py | 11 +- test/test_xnnpack_integration.py | 3 +- third_party/VulkanMemoryAllocator | 1 + third_party/build_bundled.py | 9 +- third_party/cpuinfo | 2 +- third_party/cpuinfo.BUILD | 55 - third_party/cudnn_frontend | 2 +- third_party/cutlass | 1 + third_party/cutlass.BUILD | 11 + third_party/fbgemm | 2 +- third_party/fmt | 2 +- third_party/gloo | 2 +- third_party/gloo.BUILD | 3 +- third_party/ideep | 2 +- third_party/kineto | 2 +- third_party/mkl-dnn.BUILD | 5 +- third_party/nccl/nccl | 2 +- third_party/pybind11 | 2 +- third_party/xnnpack.buck.bzl | 89 +- tools/BUCK.bzl | 32 +- tools/amd_build/build_amd.py | 3 + tools/autograd/derivatives.yaml | 740 +- tools/autograd/gen_autograd_functions.py | 59 +- tools/autograd/gen_inplace_or_view_type.py | 24 +- tools/autograd/gen_python_functions.py | 104 +- tools/autograd/gen_trace_type.py | 9 +- tools/autograd/gen_variable_factories.py | 60 +- tools/autograd/gen_variable_type.py | 287 +- tools/autograd/load_derivatives.py | 94 +- tools/autograd/templates/Functions.h | 8 +- tools/autograd/templates/VariableType.cpp | 3 +- tools/autograd/templates/VariableType.h | 2 +- tools/autograd/templates/python_functions.cpp | 2 +- .../templates/python_nested_functions.cpp | 81 + .../templates/python_nn_functions.cpp | 2 +- .../templates/python_variable_methods.cpp | 58 +- tools/code_analyzer/gen_oplist.py | 4 +- tools/code_coverage/README.md | 6 +- .../package/tool/summarize_jsons.py | 2 +- tools/cpuinfo_target_definition.bzl | 12 - tools/dynamo/verify_dynamo.py | 156 + tools/gen_vulkan_glsl.py | 111 + tools/gen_vulkan_spv.py | 121 +- tools/generate_torch_version.py | 13 +- tools/jit/gen_unboxing.py | 4 +- tools/linter/adapters/newlines_linter.py | 138 +- tools/linter/adapters/pip_init.py | 6 +- tools/linter/adapters/s3_init_config.json | 8 +- .../linter/clang_tidy/generate_build_files.py | 1 - tools/miniz_target_definition.bzl | 25 - tools/onnx/gen_diagnostics.py | 244 + 
tools/onnx/gen_diagnostics.sh | 16 + tools/onnx/sarif/code-gen-hints.json | 10 + tools/onnx/sarif/gen_sarif.sh | 51 + tools/onnx/templates/rules.h.in | 21 + tools/onnx/templates/rules.py.in | 20 + tools/onnx/update_default_opset_version.py | 150 +- tools/perf_kernel_defs.bzl | 66 - tools/pyi/gen_pyi.py | 27 +- tools/setup_helpers/cmake.py | 1 + tools/setup_helpers/cmake_utils.py | 2 +- tools/sgx_aten_target_definitions.bzl | 261 - tools/sgx_caffe2_target_definitions.bzl | 255 - tools/sgx_target_definitions.bzl | 96 - tools/stats/check_disabled_tests.py | 277 + tools/stats/import_test_stats.py | 39 +- tools/stats/monitor.py | 26 +- tools/stats/print_test_stats.py | 6 +- tools/stats/upload_artifacts.py | 61 + tools/stats/upload_stats_lib.py | 31 + tools/stats/upload_test_stats.py | 57 +- tools/target_definitions.bzl | 571 - tools/test/gen_oplist_test.py | 2 +- tools/test/test_codegen.py | 122 +- tools/test/test_codegen_model.py | 53 +- tools/test/test_gen_backend_stubs.py | 43 +- tools/test/test_vulkan_codegen.py | 100 + tools/testing/test_selections.py | 73 +- tools/update_masked_docs.py | 12 +- torch/CMakeLists.txt | 117 +- torch/_C/_VariableFunctions.pyi.in | 2 +- torch/_C/__init__.pyi.in | 214 +- torch/_C/_autograd.pyi | 101 +- torch/_C/_distributed_c10d.pyi | 81 +- torch/_C/_distributed_rpc.pyi | 5 +- torch/_C/_functorch.pyi | 46 + torch/_C/_itt.pyi | 1 + torch/_C/_lazy.pyi | 10 +- torch/_C/_profiler.pyi | 218 + torch/__init__.py | 238 +- torch/_decomp/__init__.py | 126 +- torch/_decomp/decompositions.py | 1628 +- .../_decomp/decompositions_for_jvp.py | 105 +- torch/_deploy.py | 2 +- .../scheduler => _dispatch}/__init__.py | 0 torch/_dispatch/python.py | 142 + torch/_dynamo/__init__.py | 122 + torch/_dynamo/allowed_functions.py | 272 + torch/_dynamo/bytecode_analysis.py | 197 + torch/_dynamo/bytecode_transformation.py | 388 + torch/_dynamo/codegen.py | 364 + torch/_dynamo/config.py | 182 + torch/_dynamo/convert_frame.py | 499 + torch/_dynamo/debug_utils.py | 944 + torch/_dynamo/eval_frame.py | 754 + torch/_dynamo/exc.py | 72 + torch/_dynamo/guards.py | 847 + torch/_dynamo/logging.py | 88 + torch/_dynamo/mutation_guard.py | 119 + torch/_dynamo/optimizations/__init__.py | 6 + torch/_dynamo/optimizations/analysis.py | 150 + torch/_dynamo/optimizations/backends.py | 830 + torch/_dynamo/optimizations/distributed.py | 277 + torch/_dynamo/optimizations/inference.py | 197 + torch/_dynamo/optimizations/log_args.py | 74 + torch/_dynamo/optimizations/normalize.py | 441 + torch/_dynamo/optimizations/subgraph.py | 236 + .../optimizations/torchxla_integration.py | 189 + torch/_dynamo/optimizations/training.py | 547 + torch/_dynamo/output_graph.py | 629 + torch/_dynamo/profiler.py | 177 + torch/_dynamo/replay_record.py | 118 + torch/_dynamo/resume_execution.py | 304 + torch/_dynamo/side_effects.py | 336 + torch/_dynamo/skipfiles.py | 213 + torch/_dynamo/source.py | 259 + torch/_dynamo/symbolic_convert.py | 1860 + torch/_dynamo/test_case.py | 68 + torch/_dynamo/test_minifier_common.py | 131 + torch/_dynamo/testing.py | 272 + torch/_dynamo/utils.py | 1157 + torch/_dynamo/variables/__init__.py | 89 + torch/_dynamo/variables/base.py | 301 + torch/_dynamo/variables/builder.py | 809 + torch/_dynamo/variables/builtin.py | 857 + torch/_dynamo/variables/constant.py | 158 + torch/_dynamo/variables/dicts.py | 436 + torch/_dynamo/variables/functions.py | 413 + torch/_dynamo/variables/lists.py | 511 + torch/_dynamo/variables/misc.py | 705 + torch/_dynamo/variables/nn_module.py | 574 + 
torch/_dynamo/variables/tensor.py | 593 + torch/_dynamo/variables/torch.py | 751 + torch/_dynamo/variables/user_defined.py | 386 + .../sparsifier => _functorch}/__init__.py | 0 torch/_functorch/pyfunctorch.py | 142 + torch/_functorch/utils.py | 14 + torch/_inductor/__init__.py | 0 torch/_inductor/codecache.py | 612 + torch/_inductor/codegen/__init__.py | 0 torch/_inductor/codegen/autotuner.py | 274 + torch/_inductor/codegen/common.py | 635 + torch/_inductor/codegen/cpp.py | 1561 + torch/_inductor/codegen/cpp_prefix.h | 71 + torch/_inductor/codegen/triton.py | 1481 + .../_inductor/codegen/triton_conv_delta_x.j2 | 181 + .../codegen/triton_conv_delta_x_hwc.j2 | 200 + torch/_inductor/codegen/triton_mm.j2 | 80 + torch/_inductor/codegen/triton_template.py | 351 + torch/_inductor/codegen/wrapper.py | 417 + torch/_inductor/compile_fx.py | 405 + torch/_inductor/config.py | 184 + torch/_inductor/cuda_properties.py | 54 + torch/_inductor/debug.py | 331 + torch/_inductor/decomposition.py | 529 + torch/_inductor/dependencies.py | 288 + torch/_inductor/exc.py | 85 + torch/_inductor/graph.py | 448 + torch/_inductor/ir.py | 4047 ++ torch/_inductor/lowering.py | 3670 ++ torch/_inductor/metrics.py | 17 + torch/_inductor/overrides.py | 1168 + torch/_inductor/scheduler.py | 1129 + torch/_inductor/sizevars.py | 586 + torch/_inductor/triton_ops/__init__.py | 8 + torch/_inductor/triton_ops/autotune.py | 692 + torch/_inductor/triton_ops/batched_matmul.py | 274 + torch/_inductor/triton_ops/conv.py | 744 + torch/_inductor/triton_ops/conv1x1.py | 195 + torch/_inductor/triton_ops/conv_perf_model.py | 165 + torch/_inductor/triton_ops/matmul.py | 136 + torch/_inductor/triton_ops/mm_perf_model.py | 90 + torch/_inductor/triton_ops/utils.py | 31 + torch/_inductor/utils.py | 383 + torch/_inductor/virtualized.py | 140 + torch/_lazy/__init__.py | 19 + torch/_lazy/closure.py | 134 + torch/_lazy/device_context.py | 25 + torch/_lazy/extract_compiled_graph.py | 2 +- torch/_linalg_utils.py | 31 +- torch/_lobpcg.py | 2 +- torch/_meta_registrations.py | 1409 +- torch/_ops.py | 364 +- torch/_prims/__init__.py | 339 +- torch/_prims/context.py | 268 +- torch/_prims/executor.py | 15 +- torch/_prims/nvfuser_executor.py | 378 +- torch/_prims/nvfuser_prims.py | 460 +- torch/_prims_common/__init__.py | 297 +- torch/_prims_common/wrappers.py | 81 +- torch/_python_dispatcher.py | 2 +- torch/_refs/__init__.py | 2583 +- torch/_refs/_conversions.py | 106 + torch/_refs/fft.py | 14 +- torch/_refs/linalg/__init__.py | 23 +- torch/_refs/nn/functional/__init__.py | 655 +- torch/_refs/special/__init__.py | 189 +- torch/_subclasses/__init__.py | 3 + torch/_subclasses/fake_tensor.py | 804 +- torch/_subclasses/fake_utils.py | 140 + torch/_subclasses/meta_utils.py | 349 +- torch/_tensor.py | 178 +- torch/_tensor_docs.py | 172 +- torch/_tensor_str.py | 47 +- torch/_torch_docs.py | 492 +- torch/_utils.py | 123 +- torch/_weights_only_unpickler.py | 291 + torch/amp/autocast_mode.py | 24 +- torch/ao/__init__.py | 16 + torch/ao/nn/__init__.py | 20 +- torch/ao/nn/intrinsic/__init__.py | 32 + torch/ao/nn/intrinsic/modules/__init__.py | 31 + torch/ao/nn/intrinsic/modules/fused.py | 128 + torch/ao/nn/intrinsic/qat/__init__.py | 1 + torch/ao/nn/intrinsic/qat/modules/__init__.py | 31 + .../ao/nn/intrinsic/qat/modules/conv_fused.py | 828 + .../nn/intrinsic/qat/modules/linear_fused.py | 167 + .../nn/intrinsic/qat/modules/linear_relu.py | 48 + torch/ao/nn/intrinsic/quantized/__init__.py | 10 + .../intrinsic/quantized/dynamic/__init__.py | 1 + 
.../quantized/dynamic/modules/__init__.py | 6 + .../quantized/dynamic/modules/linear_relu.py | 51 + .../intrinsic/quantized/modules/__init__.py | 12 + .../nn/intrinsic/quantized/modules/bn_relu.py | 78 + .../intrinsic/quantized/modules/conv_relu.py | 166 + .../quantized/modules/linear_relu.py | 41 + torch/ao/nn/qat/__init__.py | 1 + torch/ao/nn/qat/dynamic/__init__.py | 1 + torch/ao/nn/qat/dynamic/modules/__init__.py | 3 + torch/ao/nn/qat/dynamic/modules/linear.py | 24 + torch/ao/nn/qat/modules/__init__.py | 14 + torch/ao/nn/qat/modules/conv.py | 264 + torch/ao/nn/qat/modules/embedding_ops.py | 143 + torch/ao/nn/qat/modules/linear.py | 77 + torch/ao/nn/quantizable/__init__.py | 1 + torch/ao/nn/quantizable/modules/__init__.py | 9 + torch/ao/nn/quantizable/modules/activation.py | 454 + torch/ao/nn/quantizable/modules/rnn.py | 398 + torch/ao/nn/quantized/__init__.py | 38 + torch/ao/nn/quantized/dynamic/__init__.py | 1 + .../nn/quantized/dynamic/modules/__init__.py | 19 + torch/ao/nn/quantized/dynamic/modules/conv.py | 399 + .../ao/nn/quantized/dynamic/modules/linear.py | 127 + torch/ao/nn/quantized/dynamic/modules/rnn.py | 1054 + torch/ao/nn/quantized/functional.py | 616 + torch/ao/nn/quantized/modules/__init__.py | 136 + torch/ao/nn/quantized/modules/activation.py | 278 + torch/ao/nn/quantized/modules/batchnorm.py | 101 + torch/ao/nn/quantized/modules/conv.py | 937 + torch/ao/nn/quantized/modules/dropout.py | 27 + .../ao/nn/quantized/modules/embedding_ops.py | 295 + .../quantized/modules/functional_modules.py | 233 + torch/ao/nn/quantized/modules/linear.py | 302 + .../ao/nn/quantized/modules/normalization.py | 204 + torch/ao/nn/quantized/modules/rnn.py | 47 + torch/ao/nn/quantized/modules/utils.py | 113 + torch/ao/nn/quantized/reference/__init__.py | 17 + .../quantized/reference/modules/__init__.py | 20 + .../ao/nn/quantized/reference/modules/conv.py | 318 + .../nn/quantized/reference/modules/linear.py | 57 + .../ao/nn/quantized/reference/modules/rnn.py | 479 + .../nn/quantized/reference/modules/sparse.py | 94 + .../nn/quantized/reference/modules/utils.py | 160 + .../ao/nn/sparse/quantized/dynamic/linear.py | 8 +- torch/ao/nn/sparse/quantized/linear.py | 2 +- torch/ao/ns/_numeric_suite.py | 4 +- torch/ao/ns/_numeric_suite_dbr.py | 112 - torch/ao/ns/_numeric_suite_fx.py | 226 +- torch/ao/ns/fx/mappings.py | 14 +- torch/ao/ns/fx/n_shadows_utils.py | 917 + torch/ao/ns/fx/ns_types.py | 6 + torch/ao/ns/fx/qconfig_multi_mapping.py | 242 + torch/ao/ns/fx/utils.py | 2 +- torch/ao/ns/fx/weight_utils.py | 8 +- torch/ao/{sparsity => pruning}/__init__.py | 1 + torch/ao/pruning/_experimental/__init__.py | 0 .../activation_sparsifier/README.md | 2 +- .../activation_sparsifier/__init__.py | 0 .../activation_sparsifier.py | 0 .../_experimental/data_scheduler/README.md | 0 .../_experimental/data_scheduler/__init__.py | 0 .../data_scheduler/base_data_scheduler.py | 2 +- .../_experimental/data_sparsifier/README.md | 2 +- .../_experimental/data_sparsifier/__init__.py | 0 .../data_sparsifier/base_data_sparsifier.py | 0 .../data_sparsifier/benchmarks/README.md | 2 +- .../data_sparsifier/benchmarks/dlrm_utils.py | 0 .../benchmarks/evaluate_disk_savings.py | 2 +- .../benchmarks/evaluate_forward_time.py | 0 .../benchmarks/evaluate_model_metrics.py | 0 .../benchmarks/images/accuracy.png | Bin .../benchmarks/images/disk_savings.png | Bin .../benchmarks/images/forward_time.png | Bin .../data_sparsifier/data_norm_sparsifier.py | 0 .../data_sparsifier/lightning/__init__.py | 0 .../lightning/callbacks/README.md | 0 
.../lightning/callbacks/__init__.py | 0 .../callbacks/_data_sparstity_utils.py | 2 +- .../lightning/callbacks/data_sparsity.py | 0 .../lightning/tests/test_callbacks.py | 10 +- .../data_sparsifier/quantization_utils.py | 2 +- .../_experimental/pruner/README.md | 0 .../_experimental/pruner/__init__.py | 0 .../_experimental/pruner/base_pruner.py | 4 +- .../_experimental/pruner/images/prune_1.png | Bin .../_experimental/pruner/images/prune_2.png | Bin .../_experimental/pruner/images/prune_3.png | Bin .../_experimental/pruner/images/prune_4.png | Bin .../_experimental/pruner/parametrization.py | 0 torch/ao/{sparsity => pruning}/_mappings.py | 8 +- torch/ao/pruning/scheduler/__init__.py | 0 .../scheduler/base_scheduler.py | 16 +- torch/ao/pruning/scheduler/cubic_scheduler.py | 108 + .../scheduler/lambda_scheduler.py | 0 torch/ao/pruning/sparsifier/__init__.py | 0 .../sparsifier/base_sparsifier.py | 4 +- .../sparsifier/nearly_diagonal_sparsifier.py | 0 .../{sparsity => pruning}/sparsifier/utils.py | 2 +- .../sparsifier/weight_norm_sparsifier.py | 28 +- torch/ao/quantization/__init__.py | 125 +- torch/ao/quantization/_correct_bias.py | 2 +- torch/ao/quantization/_dbr/README.md | 259 - torch/ao/quantization/_dbr/auto_trace.py | 723 - .../quantization/_dbr/auto_trace_rewriter.py | 247 - torch/ao/quantization/_dbr/function_fusion.py | 101 - torch/ao/quantization/_dbr/fusion.py | 56 - torch/ao/quantization/_dbr/mappings.py | 178 - torch/ao/quantization/_dbr/model_utils.py | 163 - .../ao/quantization/_dbr/module_swap_utils.py | 79 - .../_dbr/qconfig_mapping_utils.py | 25 - .../quantization/_dbr/quantization_state.py | 986 - .../ao/quantization/_dbr/torchscript_utils.py | 15 - torch/ao/quantization/_dbr/utils.py | 751 - torch/ao/quantization/_quantize_dbr.py | 144 - .../ao/quantization/backend_config/README.md | 146 +- .../quantization/backend_config/__init__.py | 13 +- .../_common_operator_config_utils.py | 252 +- .../backend_config/backend_config.py | 236 +- .../quantization/backend_config/executorch.py | 226 + .../ao/quantization/backend_config/fbgemm.py | 114 + .../ao/quantization/backend_config/native.py | 242 +- .../backend_config/observation_type.py | 13 - .../ao/quantization/backend_config/qnnpack.py | 161 + .../quantization/backend_config/tensorrt.py | 8 +- torch/ao/quantization/backend_config/utils.py | 31 +- torch/ao/quantization/backend_config/x86.py | 111 + torch/ao/quantization/fake_quantize.py | 50 +- torch/ao/quantization/fuse_modules.py | 10 +- .../ao/quantization/fuser_method_mappings.py | 82 +- torch/ao/quantization/fx/README.md | 380 + torch/ao/quantization/fx/_decomposed.py | 309 + torch/ao/quantization/fx/_equalize.py | 36 +- .../fx/_lower_to_native_backend.py | 80 +- .../quantization/fx/_model_report/README.md | 20 +- .../quantization/fx/_model_report/detector.py | 207 +- .../fx/_model_report/model_report.py | 166 +- .../quantization/fx/backend_config_utils.py | 50 +- .../fx/common_quantization_patterns.py | 8 - torch/ao/quantization/fx/convert.py | 731 +- torch/ao/quantization/fx/custom_config.py | 94 +- torch/ao/quantization/fx/fuse.py | 2 +- torch/ao/quantization/fx/fusion_patterns.py | 5 +- torch/ao/quantization/fx/graph_module.py | 13 +- torch/ao/quantization/fx/match_utils.py | 5 +- torch/ao/quantization/fx/pattern_utils.py | 19 +- torch/ao/quantization/fx/prepare.py | 636 +- ...nfig_utils.py => qconfig_mapping_utils.py} | 57 +- .../quantization/fx/quantization_patterns.py | 46 +- torch/ao/quantization/fx/tracer.py | 2 +- torch/ao/quantization/fx/utils.py | 587 +- 
torch/ao/quantization/observer.py | 106 +- torch/ao/quantization/qconfig.py | 69 +- torch/ao/quantization/qconfig_mapping.py | 98 +- .../ao/quantization/qconfig_mapping_utils.py | 31 +- torch/ao/quantization/quant_type.py | 3 +- .../ao/quantization/quantization_mappings.py | 47 +- torch/ao/quantization/quantization_types.py | 18 - torch/ao/quantization/quantize.py | 46 +- torch/ao/quantization/quantize_fx.py | 176 +- torch/ao/quantization/quantize_jit.py | 122 + torch/ao/quantization/utils.py | 149 +- torch/autograd/__init__.py | 42 +- torch/autograd/anomaly_mode.py | 20 +- torch/autograd/forward_ad.py | 23 + torch/autograd/function.py | 3 + torch/autograd/functional.py | 2 + torch/autograd/grad_mode.py | 43 +- torch/autograd/gradcheck.py | 4 +- torch/autograd/graph.py | 300 +- torch/autograd/profiler.py | 67 +- torch/autograd/profiler_legacy.py | 21 +- torch/autograd/profiler_util.py | 32 +- torch/autograd/variable.py | 1 + torch/backends/_coreml/preprocess.py | 12 +- torch/backends/cuda/__init__.py | 96 + torch/backends/cudnn/__init__.py | 2 + torch/backends/opt_einsum/__init__.py | 99 + torch/backends/quantized/__init__.py | 4 +- torch/backends/xeon/run_cpu.py | 18 +- torch/cpu/amp/autocast_mode.py | 2 + torch/csrc/CudaIPCTypes.cpp | 2 +- torch/csrc/DynamicTypes.cpp | 11 +- torch/csrc/Exceptions.cpp | 59 +- torch/csrc/Exceptions.h | 35 +- torch/csrc/Module.cpp | 352 +- torch/csrc/Size.cpp | 6 +- torch/csrc/Storage.cpp | 2 + torch/csrc/StorageMethods.cpp | 2 +- torch/csrc/StorageSharing.cpp | 12 +- torch/csrc/TypeInfo.cpp | 9 +- torch/csrc/api/include/torch/all.h | 1 + torch/csrc/api/include/torch/nested.h | 95 + .../api/include/torch/nn/functional/padding.h | 2 +- .../include/torch/nn/functional/upsampling.h | 16 +- torch/csrc/api/include/torch/nn/pimpl.h | 10 +- torch/csrc/api/src/nn/modules/transformer.cpp | 4 +- torch/csrc/api/src/optim/optimizer.cpp | 4 +- torch/csrc/autograd/FunctionsManual.cpp | 937 +- torch/csrc/autograd/FunctionsManual.h | 132 +- torch/csrc/autograd/TraceTypeManual.cpp | 4 +- torch/csrc/autograd/VariableTypeManual.cpp | 18 +- torch/csrc/autograd/VariableTypeUtils.h | 104 +- torch/csrc/autograd/anomaly_mode.cpp | 8 +- torch/csrc/autograd/anomaly_mode.h | 12 +- torch/csrc/autograd/autograd_meta.cpp | 10 +- .../autograd_not_implemented_fallback.cpp | 3 +- torch/csrc/autograd/custom_function.h | 8 +- torch/csrc/autograd/engine.cpp | 121 +- torch/csrc/autograd/function.h | 3 + torch/csrc/autograd/functions/tensor.cpp | 8 +- torch/csrc/autograd/functions/utils.h | 18 + torch/csrc/autograd/graph_task.h | 18 +- torch/csrc/autograd/init.cpp | 427 +- torch/csrc/autograd/input_metadata.h | 4 +- torch/csrc/autograd/jit_decomp_interface.cpp | 21 + torch/csrc/autograd/jit_decomp_interface.h | 54 + torch/csrc/autograd/profiler_kineto.cpp | 392 +- torch/csrc/autograd/profiler_kineto.h | 11 +- torch/csrc/autograd/profiler_legacy.cpp | 39 +- torch/csrc/autograd/profiler_legacy.h | 1 + torch/csrc/autograd/profiler_python.cpp | 374 +- torch/csrc/autograd/python_anomaly_mode.cpp | 2 +- torch/csrc/autograd/python_cpp_function.cpp | 21 +- torch/csrc/autograd/python_cpp_function.h | 2 + torch/csrc/autograd/python_engine.cpp | 4 +- torch/csrc/autograd/python_nested_functions.h | 11 + .../python_nested_functions_manual.cpp | 44 + .../autograd/python_saved_variable_hooks.cpp | 2 +- .../python_torch_functions_manual.cpp | 41 +- torch/csrc/autograd/python_variable.cpp | 694 +- torch/csrc/autograd/python_variable.h | 1 + .../autograd/python_variable_indexing.cpp | 34 +- 
torch/csrc/autograd/saved_variable.cpp | 21 +- .../autograd/utils/grad_layout_contract.h | 16 +- torch/csrc/autograd/utils/warnings.cpp | 11 +- torch/csrc/autograd/utils/warnings.h | 13 +- torch/csrc/autograd/variable.cpp | 77 +- torch/csrc/autograd/variable.h | 56 +- torch/csrc/cuda/CUDAPluggableAllocator.cpp | 317 + torch/csrc/cuda/CUDAPluggableAllocator.h | 135 + torch/csrc/cuda/Graph.cpp | 10 +- torch/csrc/cuda/Module.cpp | 339 +- torch/csrc/cuda/Tensor.cpp | 2 + torch/csrc/cuda/comm.cpp | 16 +- torch/csrc/cuda/memory_snapshot.cpp | 167 + torch/csrc/cuda/memory_snapshot.h | 17 + torch/csrc/cuda/nccl.cpp | 32 +- torch/csrc/cuda/nccl.h | 9 +- torch/csrc/cuda/shared/cudart.cpp | 11 +- torch/csrc/deploy/.gitignore | 1 - torch/csrc/deploy/CMakeLists.txt | 83 - torch/csrc/deploy/Exception.h | 47 - torch/csrc/deploy/README.md | 29 +- torch/csrc/deploy/benchmark.cpp | 336 - torch/csrc/deploy/deploy.cpp | 366 - torch/csrc/deploy/deploy.h | 302 - torch/csrc/deploy/elf_file.cpp | 56 - torch/csrc/deploy/elf_file.h | 66 - torch/csrc/deploy/environment.h | 69 - torch/csrc/deploy/example/benchmark.cpp | 336 - torch/csrc/deploy/example/examples.py | 268 - torch/csrc/deploy/example/fx/examples.py | 16 - .../csrc/deploy/example/fx/some_dependency.py | 4 - .../csrc/deploy/example/generate_examples.py | 96 - torch/csrc/deploy/example/gpu_wrapper.py | 66 - torch/csrc/deploy/example/simple.pt | Bin 2432 -> 0 bytes torch/csrc/deploy/example/tensorrt_example.py | 63 - .../interactive_embedded_interpreter.cpp | 37 - torch/csrc/deploy/interpreter/CMakeLists.txt | 117 - .../deploy/interpreter/CMakePythonModules.txt | 69 - torch/csrc/deploy/interpreter/Optional.hpp | 1107 - .../deploy/interpreter/builtin_registry.cpp | 284 - .../deploy/interpreter/builtin_registry.h | 130 - .../deploy/interpreter/configure_cpython.sh | 6 - .../deploy/interpreter/cpython_patch.diff | 14 - torch/csrc/deploy/interpreter/defs.bzl | 117 - .../deploy/interpreter/hide_symbols.script | 4 - .../interpreter/import_find_sharedfuncptr.cpp | 45 - .../deploy/interpreter/interpreter_impl.cpp | 413 - .../deploy/interpreter/interpreter_impl.h | 185 - .../interpreter/register_frozenpython.cpp | 82 - .../deploy/interpreter/register_numpy.cpp | 51 - .../deploy/interpreter/register_pyyaml.cpp | 6 - .../interpreter/test_builtin_registry.cpp | 58 - .../deploy/interpreter/third_party/README.md | 2 - torch/csrc/deploy/loader.cpp | 1255 - torch/csrc/deploy/loader.h | 52 - torch/csrc/deploy/mem_file.h | 67 - torch/csrc/deploy/noop_environment.h | 14 - torch/csrc/deploy/path_environment.cpp | 13 - torch/csrc/deploy/path_environment.h | 19 - torch/csrc/deploy/remove_dt_needed.cpp | 82 - torch/csrc/deploy/test_deploy.cpp | 537 - torch/csrc/deploy/test_deploy_from_python.py | 7 - torch/csrc/deploy/test_deploy_gpu.cpp | 120 - torch/csrc/deploy/test_deploy_lib.cpp | 98 - .../test_deploy_missing_interpreter.cpp | 14 - torch/csrc/deploy/test_deploy_python.py | 26 - torch/csrc/deploy/test_deploy_python_ext.cpp | 25 - torch/csrc/deploy/unity/example.py | 10 - torch/csrc/deploy/unity/main.cpp | 35 - torch/csrc/deploy/unity/tests/simple_model.py | 15 - torch/csrc/deploy/unity/tests/sum.py | 5 - torch/csrc/deploy/unity/tests/test_unity.h | 5 - .../unity/tests/test_unity_simple_model.cpp | 40 - .../deploy/unity/tests/test_unity_sum.cpp | 31 - torch/csrc/deploy/unity/unity.bzl | 46 - torch/csrc/deploy/unity/xar_environment.cpp | 158 - torch/csrc/deploy/unity/xar_environment.h | 31 - .../autograd/engine/dist_engine.cpp | 8 + torch/csrc/distributed/c10d/Backend.cpp | 17 + 
torch/csrc/distributed/c10d/Backend.hpp | 277 + torch/csrc/distributed/c10d/FileStore.cpp | 12 +- torch/csrc/distributed/c10d/FileStore.hpp | 3 +- .../distributed/c10d/GlooDeviceFactory.cpp | 2 +- torch/csrc/distributed/c10d/HashStore.cpp | 2 +- torch/csrc/distributed/c10d/HashStore.hpp | 2 +- torch/csrc/distributed/c10d/NCCLUtils.cpp | 53 +- torch/csrc/distributed/c10d/NCCLUtils.hpp | 79 +- torch/csrc/distributed/c10d/Ops.cpp | 382 +- torch/csrc/distributed/c10d/Ops.hpp | 52 +- torch/csrc/distributed/c10d/OpsImpl.cpp | 552 + .../csrc/distributed/c10d/ParamCommsUtils.cpp | 2 +- .../csrc/distributed/c10d/ParamCommsUtils.hpp | 62 +- torch/csrc/distributed/c10d/PrefixStore.cpp | 6 +- torch/csrc/distributed/c10d/PrefixStore.hpp | 6 +- torch/csrc/distributed/c10d/ProcessGroup.cpp | 177 +- torch/csrc/distributed/c10d/ProcessGroup.hpp | 208 +- .../distributed/c10d/ProcessGroupGloo.cpp | 177 +- .../distributed/c10d/ProcessGroupGloo.hpp | 44 +- .../csrc/distributed/c10d/ProcessGroupMPI.cpp | 53 +- .../csrc/distributed/c10d/ProcessGroupMPI.hpp | 46 +- .../distributed/c10d/ProcessGroupNCCL.cpp | 740 +- .../distributed/c10d/ProcessGroupNCCL.hpp | 102 +- .../c10d/ProcessGroupRoundRobin.cpp | 47 +- .../c10d/ProcessGroupRoundRobin.hpp | 32 +- .../csrc/distributed/c10d/ProcessGroupUCC.cpp | 435 +- .../csrc/distributed/c10d/ProcessGroupUCC.hpp | 138 +- .../distributed/c10d/ProcessGroupWrapper.cpp | 47 +- .../distributed/c10d/ProcessGroupWrapper.hpp | 42 +- .../csrc/distributed/c10d/PyProcessGroup.hpp | 36 +- torch/csrc/distributed/c10d/Store.cpp | 22 +- torch/csrc/distributed/c10d/Store.hpp | 9 + torch/csrc/distributed/c10d/TCPStore.cpp | 8 +- torch/csrc/distributed/c10d/TCPStore.hpp | 2 +- torch/csrc/distributed/c10d/TraceUtils.h | 4 +- torch/csrc/distributed/c10d/Types.hpp | 119 +- torch/csrc/distributed/c10d/UCCTracing.cpp | 20 +- torch/csrc/distributed/c10d/UCCTracing.hpp | 2 +- torch/csrc/distributed/c10d/UCCUtils.cpp | 80 +- torch/csrc/distributed/c10d/UCCUtils.hpp | 97 +- torch/csrc/distributed/c10d/UnixSockUtils.hpp | 2 +- torch/csrc/distributed/c10d/Utils.cpp | 2 +- torch/csrc/distributed/c10d/Utils.hpp | 31 +- torch/csrc/distributed/c10d/WinSockUtils.hpp | 2 +- torch/csrc/distributed/c10d/Work.cpp | 182 + torch/csrc/distributed/c10d/Work.hpp | 138 + torch/csrc/distributed/c10d/comm.cpp | 14 +- torch/csrc/distributed/c10d/comm.hpp | 2 +- torch/csrc/distributed/c10d/debug.cpp | 6 +- torch/csrc/distributed/c10d/debug.h | 2 +- .../distributed/c10d/default_comm_hooks.cpp | 6 +- .../distributed/c10d/default_comm_hooks.hpp | 4 +- torch/csrc/distributed/c10d/exception.cpp | 2 +- torch/csrc/distributed/c10d/init.cpp | 318 +- torch/csrc/distributed/c10d/logger.cpp | 8 +- torch/csrc/distributed/c10d/logger.hpp | 2 +- torch/csrc/distributed/c10d/logging.cpp | 4 +- .../distributed/c10d/python_comm_hook.cpp | 2 +- .../csrc/distributed/c10d/python_comm_hook.h | 4 +- .../c10d/quantization/quantization_gpu.cu | 2 +- torch/csrc/distributed/c10d/reducer.cpp | 12 +- torch/csrc/distributed/c10d/reducer.hpp | 20 +- torch/csrc/distributed/c10d/reducer_cuda.cpp | 2 +- torch/csrc/distributed/c10d/sequence_num.cpp | 2 +- torch/csrc/distributed/c10d/socket.cpp | 8 +- torch/csrc/distributed/c10d/socket.h | 2 +- torch/csrc/distributed/rpc/agent_utils.h | 2 +- torch/csrc/distributed/rpc/py_rref.cpp | 4 +- torch/csrc/distributed/rpc/rref_context.cpp | 14 +- torch/csrc/distributed/rpc/rref_context.h | 3 + .../csrc/distributed/rpc/tensorpipe_agent.cpp | 23 +- torch/csrc/distributed/rpc/tensorpipe_agent.h | 4 +- 
torch/csrc/distributed/rpc/utils.cpp | 2 +- torch/csrc/dl.c | 32 - torch/csrc/dynamo/eval_frame.c | 606 + torch/csrc/dynamo/eval_frame.h | 6 + torch/csrc/dynamo/guards.cpp | 422 + torch/csrc/dynamo/guards.h | 4 + torch/csrc/dynamo/init.cpp | 32 + torch/csrc/dynamo/init.h | 14 + torch/csrc/functorch/init.cpp | 509 + torch/csrc/functorch/init.h | 12 + torch/csrc/itt.cpp | 1 + torch/csrc/itt_wrapper.cpp | 5 + torch/csrc/itt_wrapper.h | 1 + torch/csrc/jit/OVERVIEW.md | 6 +- .../backends/coreml/objc/PTMCoreMLBackend.mm | 85 +- .../backends/coreml/objc/PTMCoreMLCompiler.h | 4 +- .../backends/coreml/objc/PTMCoreMLCompiler.mm | 143 +- .../backends/coreml/objc/PTMCoreMLExecutor.h | 2 +- .../backends/coreml/objc/PTMCoreMLExecutor.mm | 11 +- .../coreml/objc/PTMCoreMLModelWrapper.h | 9 - .../coreml/observer/PTMCoreMLObserver.h | 47 - .../coreml/observer/PTMCoreMLObserver.mm | 8 - .../xnnpack/compiler/xnn_compiler.cpp | 118 + .../backends/xnnpack/compiler/xnn_compiler.h | 27 + .../backends/xnnpack/executor/xnn_executor.h | 70 + .../backends/xnnpack/serialization/schema.fbs | 97 + .../xnnpack/serialization/serializer.cpp | 102 + .../xnnpack/serialization/serializer.h | 86 + .../backends/xnnpack/xnnpack_backend_lib.cpp | 119 + .../xnnpack/xnnpack_backend_preprocess.cpp | 132 + .../xnnpack/xnnpack_graph_builder.cpp | 324 + .../backends/xnnpack/xnnpack_graph_builder.h | 93 + torch/csrc/jit/codegen/cuda/README.md | 4 +- torch/csrc/jit/codegen/cuda/arith.cpp | 218 +- torch/csrc/jit/codegen/cuda/arith.h | 49 +- torch/csrc/jit/codegen/cuda/codegen.cpp | 288 +- torch/csrc/jit/codegen/cuda/compute_at.cpp | 21 +- torch/csrc/jit/codegen/cuda/compute_at.h | 2 +- .../csrc/jit/codegen/cuda/compute_at_map.cpp | 371 +- torch/csrc/jit/codegen/cuda/compute_at_map.h | 27 +- torch/csrc/jit/codegen/cuda/contiguity.cpp | 654 +- torch/csrc/jit/codegen/cuda/contiguity.h | 198 +- torch/csrc/jit/codegen/cuda/disjoint_set.h | 17 +- torch/csrc/jit/codegen/cuda/dispatch.cpp | 90 + torch/csrc/jit/codegen/cuda/dispatch.h | 24 + torch/csrc/jit/codegen/cuda/dynamic_type.h | 312 + .../jit/codegen/cuda/evaluator_common.cpp | 234 +- .../csrc/jit/codegen/cuda/evaluator_common.h | 102 +- torch/csrc/jit/codegen/cuda/executor.cpp | 497 +- torch/csrc/jit/codegen/cuda/executor.h | 57 +- .../jit/codegen/cuda/executor_kernel_arg.cpp | 35 + .../jit/codegen/cuda/executor_kernel_arg.h | 260 +- .../csrc/jit/codegen/cuda/executor_utils.cpp | 462 +- torch/csrc/jit/codegen/cuda/executor_utils.h | 9 +- .../csrc/jit/codegen/cuda/expr_evaluator.cpp | 114 +- torch/csrc/jit/codegen/cuda/expr_evaluator.h | 28 +- torch/csrc/jit/codegen/cuda/fusion.cpp | 43 +- torch/csrc/jit/codegen/cuda/fusion.h | 12 +- .../jit/codegen/cuda/fusion_segmenter.cpp | 60 +- .../csrc/jit/codegen/cuda/fusion_segmenter.h | 14 +- torch/csrc/jit/codegen/cuda/graph_fuser.cpp | 36 +- .../jit/codegen/cuda/grouped_reduction.cpp | 18 +- .../csrc/jit/codegen/cuda/grouped_reduction.h | 4 + torch/csrc/jit/codegen/cuda/index_compute.cpp | 400 +- torch/csrc/jit/codegen/cuda/index_compute.h | 112 +- .../codegen/cuda/index_reference_replay.cpp | 625 - .../jit/codegen/cuda/index_reference_replay.h | 132 - .../jit/codegen/cuda/inline_propagator.cpp | 385 - .../csrc/jit/codegen/cuda/inline_propagator.h | 118 - torch/csrc/jit/codegen/cuda/inlining.cpp | 306 + torch/csrc/jit/codegen/cuda/inlining.h | 100 + .../csrc/jit/codegen/cuda/instrumentation.cpp | 2 +- torch/csrc/jit/codegen/cuda/instrumentation.h | 4 +- torch/csrc/jit/codegen/cuda/interface.cpp | 185 +- torch/csrc/jit/codegen/cuda/interface.h | 8 
+ torch/csrc/jit/codegen/cuda/ir_base_nodes.cpp | 60 +- torch/csrc/jit/codegen/cuda/ir_base_nodes.h | 37 +- torch/csrc/jit/codegen/cuda/ir_builder.cpp | 4 + torch/csrc/jit/codegen/cuda/ir_cloner.cpp | 16 + torch/csrc/jit/codegen/cuda/ir_cloner.h | 4 + torch/csrc/jit/codegen/cuda/ir_graphviz.cpp | 38 + torch/csrc/jit/codegen/cuda/ir_graphviz.h | 4 + .../jit/codegen/cuda/ir_interface_nodes.h | 29 +- .../csrc/jit/codegen/cuda/ir_internal_nodes.h | 564 +- torch/csrc/jit/codegen/cuda/ir_iostream.cpp | 355 +- torch/csrc/jit/codegen/cuda/ir_iostream.h | 6 + torch/csrc/jit/codegen/cuda/ir_nodes.cpp | 968 +- torch/csrc/jit/codegen/cuda/ir_utils.cpp | 135 +- torch/csrc/jit/codegen/cuda/ir_utils.h | 11 +- torch/csrc/jit/codegen/cuda/iter_visitor.cpp | 149 +- torch/csrc/jit/codegen/cuda/iter_visitor.h | 99 +- torch/csrc/jit/codegen/cuda/kernel.cpp | 17 +- torch/csrc/jit/codegen/cuda/kernel.h | 2 +- torch/csrc/jit/codegen/cuda/kernel_cache.cpp | 417 +- torch/csrc/jit/codegen/cuda/kernel_cache.h | 90 +- .../codegen/cuda/kernel_expr_evaluator.cpp | 79 +- .../jit/codegen/cuda/kernel_expr_evaluator.h | 14 +- torch/csrc/jit/codegen/cuda/kernel_ir.cpp | 240 +- torch/csrc/jit/codegen/cuda/kernel_ir.h | 127 +- torch/csrc/jit/codegen/cuda/lower2device.cpp | 25 +- torch/csrc/jit/codegen/cuda/lower2device.h | 27 +- .../jit/codegen/cuda/lower_alias_memory.cpp | 22 +- .../jit/codegen/cuda/lower_allocation.cpp | 15 +- .../jit/codegen/cuda/lower_bank_conflict.cpp | 332 + .../jit/codegen/cuda/lower_bank_conflict.h | 46 + .../codegen/cuda/lower_divisible_split.cpp | 121 + .../jit/codegen/cuda/lower_divisible_split.h | 29 + .../csrc/jit/codegen/cuda/lower_expr_sort.cpp | 11 +- .../codegen/cuda/lower_fused_reduction.cpp | 12 +- torch/csrc/jit/codegen/cuda/lower_index.cpp | 307 +- torch/csrc/jit/codegen/cuda/lower_index.h | 22 + .../jit/codegen/cuda/lower_index_compute.cpp | 199 +- .../jit/codegen/cuda/lower_index_compute.h | 12 + .../jit/codegen/cuda/lower_insert_syncs.cpp | 4 +- .../jit/codegen/cuda/lower_instrument.cpp | 2 +- torch/csrc/jit/codegen/cuda/lower_loops.cpp | 4 +- .../cuda/lower_misaligned_vectorization.cpp | 2 +- .../csrc/jit/codegen/cuda/lower_predicate.cpp | 69 +- .../cuda/lower_predicate_elimination.cpp | 129 +- torch/csrc/jit/codegen/cuda/lower_shift.cpp | 136 +- torch/csrc/jit/codegen/cuda/lower_shift.h | 35 +- .../codegen/cuda/lower_sync_information.cpp | 50 +- .../codegen/cuda/lower_thread_predicate.cpp | 7 +- .../codegen/cuda/lower_trivial_broadcast.cpp | 6 +- .../codegen/cuda/lower_trivial_broadcast.h | 3 +- torch/csrc/jit/codegen/cuda/lower_unroll.cpp | 39 +- torch/csrc/jit/codegen/cuda/lower_unroll.h | 2 + torch/csrc/jit/codegen/cuda/lower_utils.cpp | 386 +- torch/csrc/jit/codegen/cuda/lower_utils.h | 116 +- .../jit/codegen/cuda/lower_validation.cpp | 49 +- .../jit/codegen/cuda/lower_warp_reduce.cpp | 6 +- torch/csrc/jit/codegen/cuda/manager.cpp | 12 +- torch/csrc/jit/codegen/cuda/mutator.cpp | 121 +- .../jit/codegen/cuda/non_divisible_split.cpp | 13 +- torch/csrc/jit/codegen/cuda/nvfuser.cmake | 17 +- torch/csrc/jit/codegen/cuda/ops/alias.cpp | 90 +- torch/csrc/jit/codegen/cuda/ops/composite.cpp | 2 +- .../jit/codegen/cuda/ops/normalization.cpp | 70 +- .../csrc/jit/codegen/cuda/ops/normalization.h | 11 + .../codegen/cuda/parallel_dimension_map.cpp | 2 +- .../jit/codegen/cuda/parallel_type_bitmap.cpp | 2 + torch/csrc/jit/codegen/cuda/parser.cpp | 180 +- torch/csrc/jit/codegen/cuda/partition.cpp | 2 +- .../codegen/cuda/python_frontend/README.md | 138 + .../examples/double_half_cast.py | 30 - 
.../examples/half_double_cast.py | 28 - .../examples/python_example.py | 36 - .../python_example_broadcast_in_dim.py | 94 - .../examples/python_example_fp16.py | 35 - .../cuda/python_frontend/fusion_cache.cpp | 155 + .../cuda/python_frontend/fusion_cache.h | 111 + .../python_frontend/fusion_definition.cpp | 179 +- .../cuda/python_frontend/fusion_definition.h | 114 +- .../cuda/python_frontend/fusion_interface.cpp | 65 + .../cuda/python_frontend/fusion_interface.h | 72 + .../cuda/python_frontend/fusion_owner.h | 36 - .../cuda/python_frontend/fusion_record.h | 1463 +- .../cuda/python_frontend/python_bindings.cpp | 1704 +- .../test/test_nvfuser_fusion_cache.cpp | 266 + .../test/test_nvfuser_fusion_definition.cpp | 196 + .../test/test_nvfuser_fusion_record.cpp | 136 + .../csrc/jit/codegen/cuda/reference_tensor.h | 27 - .../jit/codegen/cuda/register_interface.cpp | 1 + .../csrc/jit/codegen/cuda/root_domain_map.cpp | 208 +- torch/csrc/jit/codegen/cuda/root_domain_map.h | 17 +- .../jit/codegen/cuda/runtime/array_rocm.cu | 236 + .../codegen/cuda/runtime/bf16_support_rocm.cu | 39 + .../cuda/runtime/block_sync_default_rocm.cu | 12 + .../codegen/cuda/runtime/fused_reduction.cu | 1855 +- .../cuda/runtime/fused_welford_helper.cu | 93 + .../cuda/runtime/fused_welford_impl.cu | 623 + .../csrc/jit/codegen/cuda/runtime/helpers.cu | 25 +- torch/csrc/jit/codegen/cuda/runtime/memory.cu | 25 + .../codegen/cuda/runtime/random_numbers.cu | 172 +- torch/csrc/jit/codegen/cuda/runtime/tuple.cu | 173 + .../jit/codegen/cuda/runtime/warp_rocm.cu | 76 + .../codegen/cuda/scheduler/all_schedulers.h | 5 +- .../cuda/scheduler/compile_time_info.h | 56 +- .../jit/codegen/cuda/scheduler/heuristic.h | 3 +- .../jit/codegen/cuda/scheduler/mma_utils.cpp | 10 +- .../codegen/cuda/scheduler/normalization.cpp | 8 +- .../jit/codegen/cuda/scheduler/pointwise.cpp | 278 +- .../jit/codegen/cuda/scheduler/pointwise.h | 138 + .../cuda/scheduler/pointwise_utils.cpp | 46 +- .../codegen/cuda/scheduler/pointwise_utils.h | 20 +- .../jit/codegen/cuda/scheduler/reduction.cpp | 8 +- .../cuda/scheduler/reduction_utils.cpp | 13 +- .../jit/codegen/cuda/scheduler/registry.cpp | 487 +- .../jit/codegen/cuda/scheduler/registry.h | 30 +- .../jit/codegen/cuda/scheduler/transpose.cpp | 1140 + .../jit/codegen/cuda/scheduler/transpose.h | 115 + .../cuda/scheduler/transpose_heuristic.h | 163 + .../csrc/jit/codegen/cuda/scheduler/utils.cpp | 962 +- torch/csrc/jit/codegen/cuda/scheduler/utils.h | 178 +- .../cuda/scheduler/vectorize_helper.cpp | 286 + .../codegen/cuda/scheduler/vectorize_helper.h | 14 +- torch/csrc/jit/codegen/cuda/tensor_view.cpp | 141 +- torch/csrc/jit/codegen/cuda/test/test_gpu.cpp | 25499 ---------- .../csrc/jit/codegen/cuda/test/test_gpu1.cpp | 9985 ++++ .../csrc/jit/codegen/cuda/test/test_gpu2.cpp | 9801 ++++ .../csrc/jit/codegen/cuda/test/test_gpu3.cpp | 6538 +++ .../cuda/test/test_gpu_fused_reduction.cpp | 312 + .../jit/codegen/cuda/test/test_gpu_rng.cu | 399 + .../jit/codegen/cuda/test/test_gpu_shift.cpp | 67 + .../cuda/test/test_gpu_tensor_factories.cpp | 339 + .../codegen/cuda/test/test_gpu_transpose.cpp | 1260 + .../jit/codegen/cuda/test/test_gpu_utils.cpp | 273 + .../codegen/cuda/test/test_gpu_validator.h | 61 +- .../jit/codegen/cuda/test/test_gpu_view.cpp | 1108 +- torch/csrc/jit/codegen/cuda/test/test_utils.h | 310 +- .../jit/codegen/cuda/tools/stringify_file.py | 10 +- .../csrc/jit/codegen/cuda/transform_iter.cpp | 14 +- .../jit/codegen/cuda/transform_rfactor.cpp | 2 +- .../csrc/jit/codegen/cuda/transform_view.cpp | 999 +- 
torch/csrc/jit/codegen/cuda/transform_view.h | 62 +- torch/csrc/jit/codegen/cuda/type.cpp | 64 +- torch/csrc/jit/codegen/cuda/type.h | 17 +- .../csrc/jit/codegen/cuda/type_inference.cpp | 11 +- torch/csrc/jit/codegen/cuda/utils.cpp | 129 +- torch/csrc/jit/codegen/cuda/utils.h | 26 +- torch/csrc/jit/codegen/fuser/codegen.cpp | 2 +- .../csrc/jit/codegen/fuser/cpu/fused_kernel.h | 1 - torch/csrc/jit/codegen/fuser/cpu/temp_file.h | 4 +- .../jit/codegen/fuser/cuda/fused_kernel.cpp | 7 + torch/csrc/jit/codegen/fuser/fused_kernel.h | 4 +- .../jit/codegen/onednn/LlgaTensorImpl.cpp | 11 +- .../csrc/jit/codegen/onednn/LlgaTensorImpl.h | 9 +- torch/csrc/jit/codegen/onednn/README.md | 31 +- .../jit/codegen/onednn/decompose_silu.cpp | 65 + .../csrc/jit/codegen/onednn/decompose_silu.h | 15 + .../csrc/jit/codegen/onednn/graph_helper.cpp | 495 +- torch/csrc/jit/codegen/onednn/graph_helper.h | 15 +- torch/csrc/jit/codegen/onednn/interface.cpp | 13 +- torch/csrc/jit/codegen/onednn/kernel.cpp | 53 +- .../jit/codegen/onednn/layout_propagation.cpp | 9 + torch/csrc/jit/codegen/onednn/operator.h | 63 +- .../jit/codegen/onednn/prepare_binary.cpp | 123 +- torch/csrc/jit/docs/serialization.md | 2 +- .../jit/frontend/function_schema_parser.cpp | 6 +- torch/csrc/jit/frontend/ir_emitter.cpp | 2 +- torch/csrc/jit/frontend/schema_matching.cpp | 19 +- torch/csrc/jit/frontend/schema_matching.h | 2 + .../csrc/jit/frontend/schema_type_parser.cpp | 9 +- .../csrc/jit/frontend/script_type_parser.cpp | 2 +- torch/csrc/jit/frontend/tracer.cpp | 45 +- torch/csrc/jit/frontend/tracer.h | 18 + torch/csrc/jit/ir/alias_analysis.cpp | 15 +- torch/csrc/jit/ir/constants.cpp | 6 +- torch/csrc/jit/ir/ir.cpp | 3 + torch/csrc/jit/ir/ir.h | 26 +- torch/csrc/jit/ir/irparser.cpp | 2 +- .../mobile/compatibility/backport_manager.cpp | 2 + .../compatibility/model_compatibility.cpp | 2 +- torch/csrc/jit/mobile/flatbuffer_loader.cpp | 8 +- torch/csrc/jit/mobile/import.cpp | 54 +- torch/csrc/jit/mobile/import_data.cpp | 2 +- torch/csrc/jit/mobile/interpreter.cpp | 4 + .../jit/mobile/model_tracer/TracerRunner.cpp | 27 +- .../jit/mobile/model_tracer/TracerRunner.h | 13 + torch/csrc/jit/mobile/model_tracer/tracer.cpp | 83 +- torch/csrc/jit/mobile/module.cpp | 44 + torch/csrc/jit/mobile/module.h | 18 + torch/csrc/jit/mobile/parse_bytecode.cpp | 2 +- torch/csrc/jit/mobile/parse_operators.cpp | 4 +- torch/csrc/jit/mobile/profiler_edge.cpp | 12 +- torch/csrc/jit/mobile/profiler_edge.h | 3 +- torch/csrc/jit/mobile/promoted_prim_ops.cpp | 20 + torch/csrc/jit/mobile/promoted_prim_ops.h | 8 + torch/csrc/jit/mobile/quantization.cpp | 66 + torch/csrc/jit/mobile/quantization.h | 38 + torch/csrc/jit/operator_upgraders/README.md | 6 +- torch/csrc/jit/passes/freeze_module.cpp | 199 +- .../frozen_conv_add_relu_fusion_cuda.cpp | 16 +- .../csrc/jit/passes/frozen_ops_to_mkldnn.cpp | 14 +- .../jit/passes/hoist_conv_packed_params.cpp | 15 +- torch/csrc/jit/passes/mkldnn_rewrite.cpp | 3 - torch/csrc/jit/passes/mobile_optimizer_type.h | 13 + torch/csrc/jit/passes/normalize_ops.cpp | 2 + torch/csrc/jit/passes/onnx.cpp | 88 +- torch/csrc/jit/passes/onnx.h | 2 - torch/csrc/jit/passes/onnx/constant_fold.cpp | 6 +- .../passes/onnx/fixup_onnx_controlflow.cpp | 17 +- .../jit/passes/onnx/function_extraction.cpp | 11 +- .../jit/passes/onnx/function_substitution.cpp | 109 +- torch/csrc/jit/passes/onnx/helper.cpp | 12 +- torch/csrc/jit/passes/onnx/naming.cpp | 205 + torch/csrc/jit/passes/onnx/naming.h | 30 + .../pattern_conversion/pattern_conversion.cpp | 18 +- 
torch/csrc/jit/passes/onnx/peephole.cpp | 8 +- .../onnx/remove_inplace_ops_for_onnx.cpp | 15 + .../jit/passes/onnx/scalar_type_analysis.cpp | 7 +- .../jit/passes/onnx/shape_type_inference.cpp | 145 +- .../jit/passes/onnx/shape_type_inference.h | 7 +- .../passes/onnx/unpack_quantized_weights.cpp | 47 +- torch/csrc/jit/passes/peephole_non_tensor.cpp | 2 +- .../csrc/jit/passes/quantization/finalize.cpp | 172 + torch/csrc/jit/passes/quantization/finalize.h | 4 + torch/csrc/jit/passes/quantization/helper.cpp | 15 + torch/csrc/jit/passes/quantization/helper.h | 10 +- .../passes/quantization/insert_observers.cpp | 133 + .../passes/quantization/insert_observers.h | 22 + .../quantization/insert_quant_dequant.cpp | 349 +- .../quantization/insert_quant_dequant.h | 7 + .../quantization/quantization_patterns.h | 23 + .../quantization/register_packed_params.cpp | 149 + .../quantization/register_packed_params.h | 20 + .../jit/passes/symbolic_shape_analysis.cpp | 53 +- torch/csrc/jit/passes/tensorexpr_fuser.cpp | 52 +- torch/csrc/jit/passes/utils/memory_dag.cpp | 2 +- torch/csrc/jit/passes/vulkan_rewrite.cpp | 101 +- torch/csrc/jit/passes/vulkan_rewrite.h | 2 + torch/csrc/jit/passes/xnnpack_rewrite.cpp | 1 + torch/csrc/jit/passes/xnnpack_rewrite.h | 10 +- torch/csrc/jit/python/init.cpp | 345 +- torch/csrc/jit/python/module_python.h | 20 +- torch/csrc/jit/python/pybind_utils.cpp | 412 +- torch/csrc/jit/python/pybind_utils.h | 290 +- torch/csrc/jit/python/python_ir.cpp | 12 +- .../csrc/jit/python/python_sugared_value.cpp | 2 +- torch/csrc/jit/python/python_tracer.cpp | 65 +- torch/csrc/jit/python/python_tracer.h | 10 + torch/csrc/jit/python/script_init.cpp | 99 +- .../jit/runtime/decomposition_registry.cpp | 42 + .../csrc/jit/runtime/decomposition_registry.h | 6 + torch/csrc/jit/runtime/graph_executor.cpp | 13 +- torch/csrc/jit/runtime/interpreter.cpp | 3 +- torch/csrc/jit/runtime/register_prim_ops.cpp | 49 +- .../serialized_shape_function_registry.cpp | 44 +- torch/csrc/jit/runtime/static/README.md | 104 +- .../csrc/jit/runtime/static/generated_ops.cpp | 230 +- torch/csrc/jit/runtime/static/impl.cpp | 42 +- torch/csrc/jit/runtime/static/native_ops.cpp | 286 +- torch/csrc/jit/runtime/static/ops.cpp | 153 +- torch/csrc/jit/runtime/static/ops.h | 21 +- torch/csrc/jit/runtime/static/passes.cpp | 28 +- torch/csrc/jit/runtime/static/passes.h | 4 +- .../jit/runtime/symbolic_shape_registry.cpp | 6 + .../runtime/symbolic_shape_registry_util.cpp | 2 + .../callstack_debug_info_serialization.cpp | 4 +- torch/csrc/jit/serialization/export.cpp | 84 +- .../jit/serialization/export_bytecode.cpp | 2 +- .../csrc/jit/serialization/export_module.cpp | 7 +- .../serialization/flatbuffer_serializer.cpp | 9 +- torch/csrc/jit/serialization/import.cpp | 2 +- .../csrc/jit/serialization/import_source.cpp | 33 +- torch/csrc/jit/serialization/pickler.cpp | 21 +- torch/csrc/jit/serialization/pickler.h | 74 +- .../source_range_serialization.cpp | 2 +- torch/csrc/jit/serialization/unpickler.cpp | 96 +- torch/csrc/jit/serialization/unpickler.h | 6 +- torch/csrc/jit/tensorexpr/half_support.h | 19 +- torch/csrc/jit/tensorexpr/kernel.cpp | 118 +- torch/csrc/jit/tensorexpr/kernel.h | 3 + torch/csrc/jit/tensorexpr/llvm_codegen.cpp | 110 +- torch/csrc/jit/tensorexpr/loopnest.cpp | 12 +- torch/csrc/jit/tensorexpr/lowerings.cpp | 52 + torch/csrc/jit/tensorexpr/operators/misc.cpp | 41 +- torch/csrc/jit/tensorexpr/reduction.cpp | 31 + torch/csrc/jit/tensorexpr/reduction.h | 62 + torch/csrc/jit/tensorexpr/stmt.h | 8 +- 
torch/csrc/jit/tensorexpr/tensor.cpp | 40 +- torch/csrc/jit/tensorexpr/tensor.h | 9 + torch/csrc/jit/tensorexpr/types.cpp | 2 +- torch/csrc/lazy/backend/backend_device.cpp | 6 +- torch/csrc/lazy/backend/backend_device.h | 2 + torch/csrc/lazy/backend/backend_interface.cpp | 3 +- torch/csrc/lazy/backend/backend_interface.h | 6 +- torch/csrc/lazy/backend/lowering_context.cpp | 2 +- torch/csrc/lazy/backend/lowering_context.h | 4 +- torch/csrc/lazy/core/config.cpp | 7 +- torch/csrc/lazy/core/config.h | 1 + torch/csrc/lazy/core/debug_util.cpp | 2 +- torch/csrc/lazy/core/dynamic_ir.h | 5 +- torch/csrc/lazy/core/internal_ops/ltc_ops.h | 8 - torch/csrc/lazy/core/ir_builder.h | 160 +- torch/csrc/lazy/core/ir_dump_util.cpp | 16 +- torch/csrc/lazy/core/ir_dump_util.h | 12 +- torch/csrc/lazy/core/ir_metadata.cpp | 29 +- torch/csrc/lazy/core/ir_util.cpp | 30 +- torch/csrc/lazy/core/ir_util.h | 11 +- torch/csrc/lazy/core/lazy_graph_executor.cpp | 81 +- torch/csrc/lazy/core/lazy_graph_executor.h | 21 +- torch/csrc/lazy/core/lazy_view.cpp | 262 - torch/csrc/lazy/core/lazy_view.h | 173 - torch/csrc/lazy/core/metrics.cpp | 45 +- torch/csrc/lazy/core/metrics.h | 9 + torch/csrc/lazy/core/shape_inference.cpp | 90 +- torch/csrc/lazy/core/shape_inference.h | 9 +- torch/csrc/lazy/core/tensor.cpp | 167 +- torch/csrc/lazy/core/tensor.h | 61 +- torch/csrc/lazy/core/tensor_impl.cpp | 49 +- torch/csrc/lazy/core/tensor_impl.h | 20 +- torch/csrc/lazy/core/tensor_util.cpp | 3 + torch/csrc/lazy/python/init.cpp | 18 +- torch/csrc/lazy/python/python_util.cpp | 4 +- torch/csrc/lazy/ts_backend/dynamic_ir.cpp | 14 +- torch/csrc/lazy/ts_backend/dynamic_ir.h | 8 +- torch/csrc/lazy/ts_backend/ir_builder.h | 82 - .../csrc/lazy/ts_backend/tensor_aten_ops.cpp | 219 - torch/csrc/lazy/ts_backend/tensor_aten_ops.h | 92 +- .../csrc/lazy/ts_backend/ts_backend_impl.cpp | 11 +- .../lazy/ts_backend/ts_lowering_context.cpp | 2 +- .../lazy/ts_backend/ts_lowering_context.h | 2 +- .../lazy/ts_backend/ts_native_functions.cpp | 92 +- .../csrc/lazy/ts_backend/ts_node_lowering.cpp | 196 - torch/csrc/lazy/tutorial.md | 2 +- torch/csrc/onnx/diagnostics/diagnostics.h | 63 + torch/csrc/onnx/diagnostics/generated/rules.h | 48 + torch/csrc/onnx/init.cpp | 13 +- torch/csrc/onnx/onnx.h | 2 + torch/csrc/profiler/api.cpp | 184 - torch/csrc/profiler/api.h | 167 +- torch/csrc/profiler/collection.cpp | 571 +- torch/csrc/profiler/collection.h | 289 +- torch/csrc/profiler/containers.h | 22 +- torch/csrc/profiler/data_flow.cpp | 197 + torch/csrc/profiler/data_flow.h | 95 + torch/csrc/profiler/events.h | 30 + .../csrc/profiler/kineto_client_interface.cpp | 43 +- torch/csrc/profiler/kineto_shim.cpp | 4 +- torch/csrc/profiler/kineto_shim.h | 2 + .../csrc/profiler/orchestration/observer.cpp | 181 + torch/csrc/profiler/orchestration/observer.h | 135 + .../profiler/orchestration/python_tracer.cpp | 37 + .../profiler/orchestration/python_tracer.h | 62 + torch/csrc/profiler/perf-inl.h | 72 + torch/csrc/profiler/perf.cpp | 199 + torch/csrc/profiler/perf.h | 105 + torch/csrc/profiler/python/init.cpp | 295 + torch/csrc/profiler/python/init.h | 35 + torch/csrc/profiler/python/pybind.h | 50 + .../execution_graph_observer.cpp | 28 +- .../execution_graph_observer.h | 0 .../{ => standalone}/itt_observer.cpp | 9 +- .../profiler/{ => standalone}/itt_observer.h | 0 .../{ => standalone}/nvtx_observer.cpp | 11 +- .../profiler/{ => standalone}/nvtx_observer.h | 0 torch/csrc/profiler/stubs/base.cpp | 81 + torch/csrc/profiler/stubs/base.h | 43 + torch/csrc/profiler/{ => stubs}/cuda.cpp | 
16 +- torch/csrc/profiler/{ => stubs}/itt.cpp | 6 +- torch/csrc/profiler/util.cpp | 17 +- torch/csrc/profiler/util.h | 26 +- torch/csrc/serialization.cpp | 25 +- torch/csrc/tensor/python_tensor.cpp | 8 +- torch/csrc/utils.cpp | 42 + torch/csrc/utils/disable_torch_function.cpp | 5 +- torch/csrc/utils/disallow_copy.h | 5 - torch/csrc/utils/invalid_arguments.cpp | 33 +- torch/csrc/utils/nested.cpp | 91 + torch/csrc/utils/nested.h | 17 + torch/csrc/utils/pybind.cpp | 83 + torch/csrc/utils/pybind.h | 130 +- torch/csrc/utils/python_arg_parser.cpp | 225 +- torch/csrc/utils/python_arg_parser.h | 188 +- torch/csrc/utils/python_compat.h | 5 + torch/csrc/utils/python_dispatch.cpp | 328 +- torch/csrc/utils/python_dispatch.h | 7 +- torch/csrc/utils/python_numbers.h | 9 - torch/csrc/utils/python_symnode.cpp | 19 + torch/csrc/utils/python_symnode.h | 178 + torch/csrc/utils/python_torch_function_mode.h | 15 +- torch/csrc/utils/schema_info.cpp | 4 + torch/csrc/utils/schema_info.h | 2 + torch/csrc/utils/tensor_memoryformats.cpp | 6 +- torch/csrc/utils/tensor_memoryformats.h | 4 +- torch/csrc/utils/tensor_new.cpp | 32 +- torch/csrc/utils/tensor_types.cpp | 4 + torch/csrc/utils/torch_dispatch_mode.h | 29 +- torch/cuda/__init__.py | 183 +- torch/cuda/_dynamo_graphs.py | 21 +- torch/cuda/_memory_viz.py | 256 +- torch/cuda/_sanitizer.py | 641 + torch/cuda/amp/autocast_mode.py | 4 +- torch/cuda/amp/common.py | 1 + torch/cuda/amp/grad_scaler.py | 2 + torch/cuda/graphs.py | 13 +- torch/cuda/jiterator.py | 4 +- torch/cuda/memory.py | 124 +- torch/cuda/profiler.py | 1 + torch/deploy.h | 3 - torch/distributed/__init__.py | 18 +- torch/distributed/_composable/__init__.py | 4 + torch/distributed/_composable/_ddp.py | 1877 + .../_composable/checkpoint_activation.py | 157 + torch/distributed/_composable/contract.py | 152 + torch/distributed/_composable/fully_shard.py | 80 + torch/distributed/_composable/replicate.py | 107 + .../distributed/_shard/checkpoint/__init__.py | 25 +- .../_shard/checkpoint/filesystem.py | 145 - .../_shard/checkpoint/resharding.py | 306 - .../_shard/checkpoint/state_dict_loader.py | 174 - .../_shard/checkpoint/state_dict_saver.py | 177 - .../distributed/_shard/checkpoint/storage.py | 188 - .../_shard/sharded_tensor/_ops/tensor_ops.py | 15 +- .../distributed/_shard/sharded_tensor/api.py | 10 +- .../_shard/sharding_spec/_internals.py | 85 +- .../sharding_spec/chunk_sharding_spec.py | 8 +- .../chunk_sharding_spec_ops/_common.py | 270 +- .../chunk_sharding_spec_ops/embedding.py | 170 +- .../chunk_sharding_spec_ops/embedding_bag.py | 621 +- torch/distributed/_sharding_spec/__init__.py | 4 +- torch/distributed/_spmd/__init__.py | 0 torch/distributed/_spmd/comm_tensor.py | 241 + torch/distributed/_tensor/README.md | 3 + torch/distributed/_tensor/__init__.py | 189 + torch/distributed/_tensor/api.py | 393 + torch/distributed/_tensor/device_mesh.py | 506 + torch/distributed/_tensor/dispatch.py | 301 + torch/distributed/_tensor/ops/__init__.py | 7 + torch/distributed/_tensor/ops/common_rules.py | 376 + torch/distributed/_tensor/ops/math_ops.py | 141 + torch/distributed/_tensor/ops/matrix_ops.py | 129 + .../distributed/_tensor/ops/pointwise_ops.py | 396 + torch/distributed/_tensor/ops/tensor_ops.py | 481 + .../_tensor/ops/tp_sharding_ops.py | 55 + torch/distributed/_tensor/ops/utils.py | 81 + torch/distributed/_tensor/ops/view_ops.py | 707 + .../distributed/_tensor/parallel/__init__.py | 36 + .../_tensor/parallel/_view_with_dim_change.py | 108 + torch/distributed/_tensor/parallel/api.py | 415 + 
torch/distributed/_tensor/parallel/fsdp.py | 359 + .../parallel/multihead_attention_tp.py | 273 + torch/distributed/_tensor/parallel/style.py | 233 + torch/distributed/_tensor/parallel/utils.py | 152 + torch/distributed/_tensor/placement_types.py | 432 + torch/distributed/_tensor/redistribute.py | 236 + torch/distributed/_tensor/utils.py | 53 + .../_checkpoint/checkpoint_wrapper.py | 197 +- .../algorithms/_comm_hooks/default_hooks.py | 85 +- .../ddp_comm_hooks/ddp_zero_hook.py | 5 +- .../ddp_comm_hooks/debugging_hooks.py | 1 + .../ddp_comm_hooks/default_hooks.py | 1 + .../ddp_comm_hooks/optimizer_overlap_hooks.py | 4 +- .../ddp_comm_hooks/powerSGD_hook.py | 4 +- .../hierarchical_model_averager.py | 2 +- .../algorithms/model_averaging/utils.py | 2 + torch/distributed/benchmarks/README.md | 2 +- .../benchmarks/benchmark_ddp_rpc.py | 2 +- torch/distributed/c10d_error_logger.py | 33 + torch/distributed/checkpoint/__init__.py | 21 + .../{_shard => }/checkpoint/api.py | 10 +- torch/distributed/checkpoint/dedup_tensors.py | 58 + .../distributed/checkpoint/default_planner.py | 244 + torch/distributed/checkpoint/filesystem.py | 313 + .../{_shard => }/checkpoint/metadata.py | 55 +- torch/distributed/checkpoint/planner.py | 377 + .../distributed/checkpoint/planner_helpers.py | 221 + torch/distributed/checkpoint/resharding.py | 55 + .../checkpoint/state_dict_loader.py | 111 + .../checkpoint/state_dict_saver.py | 115 + torch/distributed/checkpoint/storage.py | 233 + torch/distributed/checkpoint/traverse.py | 170 + .../{_shard => }/checkpoint/utils.py | 113 +- torch/distributed/distributed_c10d.py | 894 +- .../elastic/agent/server/__init__.py | 3 +- torch/distributed/elastic/agent/server/api.py | 2 +- .../agent/server/local_elastic_agent.py | 99 +- .../elastic/multiprocessing/api.py | 6 +- .../multiprocessing/errors/__init__.py | 1 + .../elastic/multiprocessing/tail_log.py | 1 + .../elastic/rendezvous/etcd_rendezvous.py | 2 +- torch/distributed/elastic/timer/__init__.py | 1 + .../elastic/timer/file_based_local_timer.py | 330 + torch/distributed/fsdp/__init__.py | 2 +- torch/distributed/fsdp/_common_utils.py | 202 + torch/distributed/fsdp/_exec_order_utils.py | 384 + torch/distributed/fsdp/_fsdp_extensions.py | 115 + torch/distributed/fsdp/_init_utils.py | 763 + torch/distributed/fsdp/_limiter_utils.py | 33 + torch/distributed/fsdp/_optim_utils.py | 595 +- torch/distributed/fsdp/_runtime_utils.py | 1155 + .../fsdp/{shard_utils.py => _shard_utils.py} | 111 +- torch/distributed/fsdp/_state_dict_utils.py | 694 + torch/distributed/fsdp/_symbolic_trace.py | 15 +- .../distributed/fsdp/_unshard_param_utils.py | 254 + torch/distributed/fsdp/_utils.py | 107 +- torch/distributed/fsdp/_wrap_utils.py | 170 + torch/distributed/fsdp/api.py | 245 + torch/distributed/fsdp/flat_param.py | 1657 +- .../fsdp/flatten_params_wrapper.py | 156 - .../fsdp/fully_sharded_data_parallel.py | 4369 +- torch/distributed/fsdp/sharded_grad_scaler.py | 66 +- torch/distributed/fsdp/wrap.py | 309 +- torch/distributed/logging_handlers.py | 16 + torch/distributed/nn/api/remote_module.py | 41 +- torch/distributed/optim/__init__.py | 1 + .../optim/apply_optimizer_in_backward.py | 78 + torch/distributed/optim/functional_adam.py | 13 +- torch/distributed/optim/functional_rprop.py | 5 +- torch/distributed/optim/optimizer.py | 6 +- .../optim/zero_redundancy_optimizer.py | 20 +- .../optim/zero_redundancy_optimizer.pyi | 3 - .../pipeline/sync/_balance/profile.py | 4 +- torch/distributed/pipeline/sync/checkpoint.py | 4 +- 
torch/distributed/pipeline/sync/copy.py | 2 +- torch/distributed/pipeline/sync/dependency.py | 2 +- torch/distributed/pipeline/sync/microbatch.py | 2 +- torch/distributed/pipeline/sync/phony.py | 2 +- torch/distributed/pipeline/sync/pipe.py | 7 +- torch/distributed/pipeline/sync/pipeline.py | 2 +- torch/distributed/pipeline/sync/stream.py | 6 +- torch/distributed/pipeline/sync/utils.py | 2 + torch/distributed/pipeline/sync/worker.py | 2 +- torch/distributed/rpc/__init__.py | 5 +- torch/distributed/rpc/api.py | 18 +- torch/distributed/rpc/backend_registry.py | 3 + torch/distributed/rpc/constants.py | 4 +- torch/distributed/rpc/internal.py | 16 +- torch/distributed/rpc/options.py | 3 +- torch/distributed/utils.py | 73 +- torch/distributions/distribution.py | 50 +- torch/distributions/half_cauchy.py | 2 +- torch/distributions/half_normal.py | 2 +- torch/distributions/kl.py | 1 + torch/distributions/lkj_cholesky.py | 2 +- .../lowrank_multivariate_normal.py | 4 +- torch/distributions/mixture_same_family.py | 2 +- torch/distributions/multivariate_normal.py | 4 +- .../distributions/transformed_distribution.py | 25 +- torch/distributions/utils.py | 2 + torch/distributions/wishart.py | 11 +- torch/fft/__init__.py | 2 +- torch/functional.py | 185 +- torch/futures/__init__.py | 2 + torch/fx/OVERVIEW.md | 2 +- torch/fx/_symbolic_trace.py | 113 +- .../experimental/accelerator_partitioner.py | 2 +- torch/fx/experimental/const_fold.py | 9 +- .../experimental/graph_gradual_typechecker.py | 4 +- torch/fx/experimental/meta_tracer.py | 4 +- .../constraint_generator.py | 64 +- torch/fx/experimental/normalize.py | 1 + torch/fx/experimental/proxy_tensor.py | 787 +- torch/fx/experimental/symbolic_shapes.py | 713 +- torch/fx/experimental/unification/core.py | 2 + torch/fx/experimental/unification/dispatch.py | 2 +- torch/fx/experimental/unification/match.py | 4 +- .../unification/multipledispatch/conflict.py | 2 + .../unification/multipledispatch/core.py | 5 +- .../multipledispatch/dispatcher.py | 4 +- .../unification/multipledispatch/utils.py | 1 + .../unification/multipledispatch/variadic.py | 1 + torch/fx/experimental/unification/utils.py | 1 + torch/fx/graph.py | 128 +- torch/fx/graph_module.py | 22 +- torch/fx/immutable_collections.py | 2 + torch/fx/interpreter.py | 17 +- torch/fx/node.py | 17 +- torch/fx/operator_schemas.py | 3 + torch/fx/passes/README.md | 2 +- torch/fx/passes/backends/cudagraphs.py | 7 +- torch/fx/passes/backends/nvfuser.py | 286 - torch/fx/passes/fake_tensor_prop.py | 14 +- torch/fx/passes/graph_drawer.py | 14 +- torch/fx/passes/infra/partitioner.py | 299 +- torch/fx/passes/infra/pass_manager.py | 85 +- torch/fx/passes/net_min_base.py | 153 +- torch/fx/passes/pass_manager.py | 65 +- torch/fx/passes/reinplace.py | 316 +- torch/fx/passes/shape_prop.py | 2 +- torch/fx/passes/split_module.py | 162 +- torch/fx/passes/split_utils.py | 8 +- torch/fx/passes/splitter_base.py | 72 +- torch/fx/passes/tests/test_pass_manager.py | 22 + torch/fx/passes/utils/fuser_utils.py | 4 +- torch/fx/passes/utils/matcher_utils.py | 183 +- torch/fx/proxy.py | 21 +- torch/fx/subgraph_rewriter.py | 339 +- torch/fx/tensor_type.py | 4 +- torch/fx/traceback.py | 13 +- torch/hub.py | 16 +- torch/jit/_builtins.py | 2 +- torch/jit/_freeze.py | 7 +- torch/jit/_fuser.py | 17 +- torch/jit/_recursive.py | 3 + torch/jit/_shape_functions.py | 63 +- torch/jit/_trace.py | 175 +- torch/jit/annotations.py | 14 +- torch/jit/frontend.py | 4 +- torch/jit/quantized.py | 18 +- torch/lib/libshm/CMakeLists.txt | 30 +- torch/library.h | 27 
+- torch/library.py | 8 +- torch/linalg/__init__.py | 51 +- torch/masked/__init__.py | 37 + torch/{_masked => masked}/_docs.py | 42 +- torch/{_masked/__init__.py => masked/_ops.py} | 179 +- torch/masked/maskedtensor/__init__.py | 8 + torch/masked/maskedtensor/_ops_refs.py | 473 + torch/masked/maskedtensor/binary.py | 192 + torch/masked/maskedtensor/core.py | 335 + torch/masked/maskedtensor/creation.py | 21 + torch/masked/maskedtensor/passthrough.py | 43 + torch/masked/maskedtensor/reductions.py | 173 + torch/masked/maskedtensor/unary.py | 188 + torch/monitor/__init__.py | 1 + torch/multiprocessing/reductions.py | 21 +- torch/nested/__init__.py | 149 + torch/nn/functional.py | 204 +- torch/nn/init.py | 2 +- torch/nn/intrinsic/__init__.py | 36 +- torch/nn/intrinsic/modules/__init__.py | 15 +- torch/nn/intrinsic/modules/fused.py | 158 +- torch/nn/intrinsic/qat/modules/conv_fused.py | 764 +- .../nn/intrinsic/qat/modules/linear_fused.py | 176 +- torch/nn/intrinsic/qat/modules/linear_relu.py | 57 +- torch/nn/intrinsic/quantized/__init__.py | 9 + .../quantized/dynamic/modules/__init__.py | 1 - .../quantized/dynamic/modules/linear_relu.py | 54 +- .../nn/intrinsic/quantized/modules/bn_relu.py | 83 +- .../intrinsic/quantized/modules/conv_relu.py | 175 +- .../quantized/modules/linear_relu.py | 44 +- torch/nn/modules/_functions.py | 12 +- torch/nn/modules/activation.py | 79 +- torch/nn/modules/batchnorm.py | 15 +- torch/nn/modules/container.py | 4 +- torch/nn/modules/conv.py | 4 +- torch/nn/modules/distance.py | 13 +- torch/nn/modules/fold.py | 16 +- torch/nn/modules/loss.py | 9 +- torch/nn/modules/module.py | 472 +- torch/nn/modules/pooling.py | 6 +- torch/nn/modules/rnn.py | 3 + torch/nn/modules/sparse.py | 10 +- torch/nn/modules/transformer.py | 56 +- torch/nn/modules/upsampling.py | 88 +- torch/nn/parallel/distributed.py | 501 +- torch/nn/parallel/distributed.pyi | 21 - torch/nn/parameter.py | 12 +- torch/nn/qat/__init__.py | 17 + torch/nn/qat/dynamic/__init__.py | 6 + torch/nn/qat/dynamic/modules/linear.py | 35 +- torch/nn/qat/modules/__init__.py | 20 +- torch/nn/qat/modules/conv.py | 276 +- torch/nn/qat/modules/embedding_ops.py | 151 +- torch/nn/qat/modules/linear.py | 87 +- torch/nn/quantizable/modules/__init__.py | 6 +- torch/nn/quantizable/modules/activation.py | 464 +- torch/nn/quantizable/modules/rnn.py | 395 +- torch/nn/quantized/__init__.py | 39 + .../quantized/_reference/modules/__init__.py | 19 +- torch/nn/quantized/_reference/modules/conv.py | 335 +- .../nn/quantized/_reference/modules/linear.py | 67 +- torch/nn/quantized/_reference/modules/rnn.py | 494 +- .../nn/quantized/_reference/modules/sparse.py | 105 +- .../nn/quantized/_reference/modules/utils.py | 175 +- torch/nn/quantized/dynamic/__init__.py | 2 +- .../nn/quantized/dynamic/modules/__init__.py | 19 +- torch/nn/quantized/dynamic/modules/conv.py | 409 +- torch/nn/quantized/dynamic/modules/linear.py | 137 +- torch/nn/quantized/dynamic/modules/rnn.py | 1066 +- torch/nn/quantized/functional.py | 619 +- torch/nn/quantized/modules/__init__.py | 127 +- torch/nn/quantized/modules/activation.py | 296 +- torch/nn/quantized/modules/batchnorm.py | 115 +- torch/nn/quantized/modules/conv.py | 934 +- torch/nn/quantized/modules/dropout.py | 35 +- torch/nn/quantized/modules/embedding_ops.py | 303 +- .../quantized/modules/functional_modules.py | 240 +- torch/nn/quantized/modules/linear.py | 305 +- torch/nn/quantized/modules/normalization.py | 216 +- torch/nn/quantized/modules/rnn.py | 54 +- torch/nn/quantized/modules/utils.py | 88 +- 
torch/nn/utils/_deprecation_utils.py | 45 + .../conv_expanded_weights.py | 23 +- .../nn/utils/_expanded_weights/conv_utils.py | 56 +- torch/nn/utils/fusion.py | 8 +- torch/nn/utils/parametrizations.py | 16 +- torch/nn/utils/parametrize.py | 6 +- torch/nn/utils/stateless.py | 18 +- torch/onnx/README.md | 96 +- torch/onnx/__init__.py | 48 +- torch/onnx/_constants.py | 13 +- torch/onnx/_deprecation.py | 39 +- torch/onnx/_exporter_states.py | 25 +- torch/onnx/_globals.py | 30 +- torch/onnx/_internal/__init__.py | 0 torch/onnx/_internal/_beartype.py | 99 + torch/onnx/_internal/diagnostics/OVERVIEW.md | 83 + torch/onnx/_internal/diagnostics/__init__.py | 19 + .../onnx/_internal/diagnostics/_diagnostic.py | 153 + torch/onnx/_internal/diagnostics/_rules.py | 172 + .../_internal/diagnostics/infra/__init__.py | 27 + .../_internal/diagnostics/infra/_infra.py | 450 + .../_internal/diagnostics/infra/engine.py | 107 + .../_internal/diagnostics/infra/formatter.py | 77 + .../diagnostics/infra/sarif/__init__.py | 100 + .../diagnostics/infra/sarif/_address.py | 48 + .../diagnostics/infra/sarif/_artifact.py | 90 + .../infra/sarif/_artifact_change.py | 31 + .../infra/sarif/_artifact_content.py | 33 + .../infra/sarif/_artifact_location.py | 33 + .../diagnostics/infra/sarif/_attachment.py | 39 + .../diagnostics/infra/sarif/_code_flow.py | 31 + .../infra/sarif/_configuration_override.py | 31 + .../diagnostics/infra/sarif/_conversion.py | 35 + .../diagnostics/infra/sarif/_edge.py | 31 + .../infra/sarif/_edge_traversal.py | 31 + .../diagnostics/infra/sarif/_exception.py | 37 + .../infra/sarif/_external_properties.py | 100 + .../_external_property_file_reference.py | 33 + .../_external_property_file_references.py | 86 + .../_internal/diagnostics/infra/sarif/_fix.py | 31 + .../diagnostics/infra/sarif/_graph.py | 35 + .../infra/sarif/_graph_traversal.py | 43 + .../diagnostics/infra/sarif/_invocation.py | 117 + .../diagnostics/infra/sarif/_location.py | 50 + .../infra/sarif/_location_relationship.py | 28 + .../infra/sarif/_logical_location.py | 39 + .../diagnostics/infra/sarif/_message.py | 33 + .../sarif/_multiformat_message_string.py | 25 + .../diagnostics/infra/sarif/_node.py | 36 + .../diagnostics/infra/sarif/_notification.py | 55 + .../infra/sarif/_physical_location.py | 40 + .../diagnostics/infra/sarif/_property_bag.py | 19 + .../diagnostics/infra/sarif/_rectangle.py | 36 + .../diagnostics/infra/sarif/_region.py | 58 + .../diagnostics/infra/sarif/_replacement.py | 31 + .../infra/sarif/_reporting_configuration.py | 35 + .../infra/sarif/_reporting_descriptor.py | 71 + .../sarif/_reporting_descriptor_reference.py | 38 + .../_reporting_descriptor_relationship.py | 34 + .../diagnostics/infra/sarif/_result.py | 130 + .../infra/sarif/_result_provenance.py | 44 + .../_internal/diagnostics/infra/sarif/_run.py | 136 + .../infra/sarif/_run_automation_details.py | 33 + .../diagnostics/infra/sarif/_sarif_log.py | 39 + .../infra/sarif/_special_locations.py | 27 + .../diagnostics/infra/sarif/_stack.py | 31 + .../diagnostics/infra/sarif/_stack_frame.py | 33 + .../diagnostics/infra/sarif/_suppression.py | 38 + .../diagnostics/infra/sarif/_thread_flow.py | 40 + .../infra/sarif/_thread_flow_location.py | 69 + .../diagnostics/infra/sarif/_tool.py | 27 + .../infra/sarif/_tool_component.py | 125 + .../infra/sarif/_tool_component_reference.py | 30 + .../infra/sarif/_translation_metadata.py | 44 + .../infra/sarif/_version_control_details.py | 42 + .../diagnostics/infra/sarif/_web_request.py | 48 + 
.../diagnostics/infra/sarif/_web_response.py | 48 + .../diagnostics/infra/sarif/version.py | 5 + .../onnx/_internal/diagnostics/infra/utils.py | 35 + torch/onnx/_internal/diagnostics/rules.yaml | 84 + torch/onnx/_internal/jit_utils.py | 396 + torch/onnx/_internal/onnx_proto_utils.py | 143 + torch/onnx/_internal/registration.py | 339 + torch/onnx/_onnx_supported_ops.py | 85 +- torch/onnx/_patch_torch.py | 158 +- torch/onnx/_type_utils.py | 132 +- torch/onnx/errors.py | 52 +- torch/onnx/symbolic_caffe2.py | 143 +- torch/onnx/symbolic_helper.py | 590 +- torch/onnx/symbolic_opset10.py | 647 +- torch/onnx/symbolic_opset11.py | 698 +- torch/onnx/symbolic_opset12.py | 190 +- torch/onnx/symbolic_opset13.py | 368 +- torch/onnx/symbolic_opset14.py | 54 +- torch/onnx/symbolic_opset15.py | 53 +- torch/onnx/symbolic_opset16.py | 35 +- torch/onnx/symbolic_opset17.py | 56 + torch/onnx/symbolic_opset7.py | 20 +- torch/onnx/symbolic_opset8.py | 202 +- torch/onnx/symbolic_opset9.py | 3442 +- torch/onnx/symbolic_registry.py | 168 - torch/onnx/utils.py | 883 +- torch/onnx/verification.py | 141 +- torch/optim/_functional.py | 3 + torch/optim/adadelta.py | 47 +- torch/optim/adagrad.py | 22 +- torch/optim/adam.py | 219 +- torch/optim/adamax.py | 32 +- torch/optim/adamw.py | 35 +- torch/optim/asgd.py | 40 +- torch/optim/lr_scheduler.py | 93 +- torch/optim/lr_scheduler.pyi | 39 +- torch/optim/nadam.py | 45 +- torch/optim/optimizer.py | 18 +- torch/optim/radam.py | 39 +- torch/optim/rmsprop.py | 58 +- torch/optim/rprop.py | 68 +- torch/optim/sgd.py | 1 + torch/optim/sparse_adam.py | 4 +- torch/optim/swa_utils.py | 9 +- torch/overrides.py | 191 +- torch/package/_mock.py | 2 +- torch/package/package_exporter.py | 6 +- torch/package/package_importer.py | 11 +- torch/profiler/__init__.py | 35 +- torch/profiler/_memory_profiler.py | 807 + torch/profiler/_pattern_matcher.py | 95 +- torch/profiler/_utils.py | 27 +- torch/profiler/itt.py | 19 +- torch/profiler/profiler.py | 43 +- torch/quantization/__init__.py | 5 +- torch/quantization/fuser_method_mappings.py | 2 +- torch/quantization/fx/quantization_types.py | 2 +- torch/quantization/qconfig.py | 4 +- torch/quantization/quant_type.py | 2 +- torch/quantization/quantize_jit.py | 1 + torch/return_types.py | 10 +- torch/serialization.py | 140 +- torch/signal/__init__.py | 5 + torch/signal/windows/__init__.py | 26 + torch/signal/windows/windows.py | 761 + torch/sparse/__init__.py | 41 + torch/sparse/matmul.py | 27 + torch/special/__init__.py | 4 +- torch/storage.py | 254 +- torch/testing/__init__.py | 6 +- torch/testing/_comparison.py | 29 +- torch/testing/_creation.py | 34 +- torch/testing/_deprecated.py | 66 +- .../testing/_internal/autocast_test_lists.py | 19 + .../_internal/check_kernel_launches.py | 2 +- torch/testing/_internal/common_cuda.py | 20 +- torch/testing/_internal/common_device_type.py | 144 +- torch/testing/_internal/common_distributed.py | 178 +- torch/testing/_internal/common_dtype.py | 148 +- torch/testing/_internal/common_fsdp.py | 158 +- .../_internal/common_methods_invocations.py | 13219 +++-- torch/testing/_internal/common_modules.py | 73 +- torch/testing/_internal/common_nn.py | 84 +- .../testing/_internal/common_quantization.py | 11 +- torch/testing/_internal/common_quantized.py | 2 + torch/testing/_internal/common_utils.py | 812 +- .../testing/_internal/composite_compliance.py | 172 +- .../_internal/distributed/_tensor/__init__.py | 0 .../distributed/_tensor/common_dtensor.py | 334 + .../_tensor/dtensor_lagging_op_db.py | 661 + 
.../_tensor/gen_dtensor_lagging_op_db.py | 51 +- .../_internal/distributed/distributed_test.py | 280 +- .../distributed/multi_threaded_pg.py | 375 + .../_internal/distributed/rpc/jit/rpc_test.py | 18 +- .../_internal/distributed/rpc/rpc_test.py | 79 +- .../_internal/distributed/rpc_utils.py | 2 +- torch/testing/_internal/inductor_utils.py | 23 + torch/testing/_internal/opinfo/__init__.py | 2 + torch/testing/_internal/opinfo/core.py | 1040 +- .../_internal/opinfo/definitions/__init__.py | 25 + .../_internal/opinfo/definitions/_masked.py | 1148 + .../_internal/opinfo/definitions/fft.py | 755 + .../_internal/opinfo/definitions/linalg.py | 2232 + .../_internal/opinfo/definitions/signal.py | 827 + .../_internal/opinfo/definitions/special.py | 772 + torch/testing/_internal/opinfo/refs.py | 216 + torch/testing/_internal/opinfo/utils.py | 183 +- torch/testing/_internal/schema_check_mode.py | 6 +- torch/testing/_legacy.py | 158 - torch/types.py | 3 +- torch/utils/__init__.py | 2 + torch/utils/_cuda_trace.py | 23 + torch/utils/_mode_utils.py | 124 +- torch/utils/_python_dispatch.py | 165 +- torch/utils/_pytree.py | 94 +- torch/utils/backend_registration.py | 30 + torch/utils/benchmark/examples/fuzzer.py | 2 +- .../utils/benchmark/examples/sparse/fuzzer.py | 2 +- torch/utils/benchmark/utils/cpp_jit.py | 4 + torch/utils/benchmark/utils/timer.py | 6 +- .../utils/valgrind_wrapper/timer_interface.py | 10 +- torch/utils/bottleneck/__main__.py | 2 +- torch/utils/bundled_inputs.py | 4 +- torch/utils/checkpoint.py | 18 +- torch/utils/collect_env.py | 13 + torch/utils/cpp_backtrace.py | 11 + torch/utils/cpp_extension.py | 102 +- torch/utils/data/__init__.py | 4 - torch/utils/data/_utils/__init__.py | 14 - torch/utils/data/_utils/collate.py | 190 +- torch/utils/data/_utils/fetch.py | 17 +- torch/utils/data/_utils/pin_memory.py | 18 +- torch/utils/data/_utils/worker.py | 16 +- torch/utils/data/communication/__init__.py | 6 - torch/utils/data/communication/eventloop.py | 70 - torch/utils/data/communication/iter.py | 181 - torch/utils/data/communication/map.py | 159 - torch/utils/data/communication/messages.py | 75 - torch/utils/data/communication/protocol.py | 205 - torch/utils/data/communication/queue.py | 51 - torch/utils/data/dataloader.py | 142 +- torch/utils/data/dataloader_experimental.py | 150 - torch/utils/data/datapipes/_hook_iterator.py | 4 +- torch/utils/data/datapipes/_typing.py | 4 +- .../data/datapipes/dataframe/dataframes.py | 2 +- torch/utils/data/datapipes/datapipe.py | 32 +- torch/utils/data/datapipes/gen_pyi.py | 7 +- torch/utils/data/datapipes/iter/callable.py | 6 +- .../data/datapipes/iter/combinatorics.py | 25 +- torch/utils/data/datapipes/iter/combining.py | 66 +- torch/utils/data/datapipes/iter/filelister.py | 2 +- torch/utils/data/datapipes/iter/grouping.py | 49 +- torch/utils/data/datapipes/iter/selecting.py | 45 +- torch/utils/data/datapipes/map/__init__.py | 2 +- .../utils/data/datapipes/map/combinatorics.py | 106 +- torch/utils/data/datapipes/utils/common.py | 101 +- torch/utils/data/datapipes/utils/snapshot.py | 4 +- torch/utils/data/dataset.py | 5 +- torch/utils/data/graph.py | 76 +- torch/utils/data/graph_settings.py | 79 +- torch/utils/dlpack.py | 3 +- torch/utils/hipify/cuda_to_hip_mappings.py | 97 +- torch/utils/hipify/hipify_python.py | 3 +- torch/utils/hooks.py | 64 +- torch/utils/mobile_optimizer.py | 5 +- torch/utils/model_dump/__init__.py | 5 +- torch/utils/show_pickle.py | 1 + torch/utils/tensorboard/_pytorch_graph.py | 4 +- torch/utils/tensorboard/summary.py | 7 +- 
torch/utils/throughput_benchmark.py | 2 +- torchgen/api/autograd.py | 6 +- torchgen/api/cpp.py | 102 +- torchgen/api/dispatcher.py | 32 +- torchgen/api/lazy.py | 81 +- torchgen/api/native.py | 34 +- torchgen/api/python.py | 93 +- torchgen/api/structured.py | 9 +- torchgen/api/translate.py | 51 +- torchgen/api/types.py | 118 +- torchgen/api/ufunc.py | 7 +- torchgen/api/unboxing.py | 19 +- torchgen/context.py | 3 +- torchgen/dest/lazy_ir.py | 94 +- torchgen/dest/register_dispatch_key.py | 55 +- torchgen/gen.py | 274 +- torchgen/gen_backend_stubs.py | 57 +- torchgen/gen_functionalization_type.py | 148 +- torchgen/gen_lazy_tensor.py | 25 +- torchgen/gen_vmap_plumbing.py | 19 +- torchgen/local.py | 16 +- torchgen/model.py | 261 +- torchgen/native_function_generation.py | 12 +- .../gen_jit_shape_functions.py | 30 +- torchgen/static_runtime/config.py | 40 +- .../static_runtime/gen_static_runtime_ops.py | 3 + torchgen/static_runtime/generator.py | 159 +- torchgen/utils.py | 8 + ubsan.supp | 2 - version.txt | 2 +- 3888 files changed, 418919 insertions(+), 188556 deletions(-) create mode 100644 .circleci/README.md create mode 100644 .circleci/docker/common/install_rocm_magma.sh create mode 100755 .circleci/scripts/functorch_doc_push_script.sh create mode 100644 .github/actions/filter-test-configs/action.yml delete mode 100644 .github/actions/pull-docker-image/action.yml delete mode 100644 .github/actions/setup-ssh/action.yml delete mode 100644 .github/actions/teardown-linux/action.yml create mode 100644 .github/auto_request_review.yml create mode 100644 .github/ci_commit_pins/huggingface.txt create mode 100644 .github/ci_commit_pins/text.txt create mode 100644 .github/ci_commit_pins/timm.txt create mode 100644 .github/ci_commit_pins/torchbench.txt delete mode 100644 .github/ci_commit_pins/torchdynamo.txt create mode 100644 .github/ci_commit_pins/triton.txt create mode 100644 .github/labeler.yml delete mode 100644 .github/merge_rules.json create mode 100644 .github/merge_rules.yaml create mode 100644 .github/requirements-gha-cache.txt create mode 100644 .github/requirements/README.md create mode 100644 .github/requirements/conda-env-Linux-X64 create mode 100644 .github/requirements/conda-env-macOS-ARM64 create mode 100644 .github/requirements/conda-env-macOS-X64 create mode 100644 .github/requirements/pip-requirements-macOS.txt delete mode 100644 .github/scale-config.yml delete mode 100644 .github/scripts/build_publish_nightly_docker.sh create mode 100644 .github/scripts/build_triton_wheel.py create mode 100755 .github/scripts/check_labels.py create mode 100644 .github/scripts/comment_on_pr.py create mode 100755 .github/scripts/filter_test_configs.py delete mode 100755 .github/scripts/install_nvidia_utils_linux.sh create mode 100644 .github/scripts/pr-sanity-check.sh delete mode 100644 .github/scripts/process_commit.py create mode 100644 .github/scripts/test_check_labels.py create mode 100755 .github/scripts/test_filter_test_configs.py delete mode 100755 .github/scripts/wait_for_ssh_to_drain.sh create mode 100644 .github/workflows/auto_request_review.yml create mode 100644 .github/workflows/build-triton-wheel.yml create mode 100644 .github/workflows/check-labels.yml create mode 100644 .github/workflows/docker-release.yml delete mode 100644 .github/workflows/generated-windows-binary-wheel-master.yml create mode 100644 .github/workflows/inductor.yml create mode 100644 .github/workflows/labeler.yml delete mode 100644 .github/workflows/pr-labels.yml delete mode 100644 
.github/workflows/push_nightly_docker_ghcr.yml create mode 100644 .github/workflows/scorecards.yml delete mode 100644 .github/workflows/update-commit-hashes.yml create mode 100644 .github/workflows/weekly.yml delete mode 100755 .jenkins/caffe2/bench.sh delete mode 100755 .jenkins/caffe2/build.sh delete mode 100755 .jenkins/caffe2/dirty.sh create mode 100755 .jenkins/pytorch/build-tsan.sh delete mode 100755 .jenkins/pytorch/dirty.sh delete mode 100644 CITATION create mode 100644 CITATION.cff create mode 100644 aten/src/ATen/PadNd.h create mode 100644 aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp create mode 100644 aten/src/ATen/core/PythonOpRegistrationTrampoline.h delete mode 100644 aten/src/ATen/core/TorchDispatchModeTLS.cpp delete mode 100644 aten/src/ATen/core/TorchDispatchModeTLS.h create mode 100644 aten/src/ATen/core/TorchDispatchUtils.cpp create mode 100644 aten/src/ATen/core/TorchDispatchUtils.h rename {functorch/functorch/csrc => aten/src/ATen/functorch}/ADInterpreters.cpp (70%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/ADInterpreters.h (71%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesActivation.cpp (98%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesBinaryOps.cpp (90%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesConvolution.cpp (82%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesDecompositions.cpp (82%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesDynamic.cpp (86%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesFactory.cpp (73%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesHelper.cpp (92%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesHelper.h (94%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesLinearAlgebra.cpp (59%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesLoss.cpp (94%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesModules.cpp (89%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesNorm.cpp (93%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesPooling.cpp (92%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesRandomness.cpp (87%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesReduceOps.cpp (96%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesScatterOps.cpp (98%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesUnaryOps.cpp (97%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchRulesViews.cpp (83%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchedFallback.cpp (97%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchedFallback.h (63%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchedTensorImpl.cpp (58%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchedTensorImpl.h (82%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/BatchingMetaprogramming.h (92%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/DynamicLayer.cpp (72%) create mode 100644 aten/src/ATen/functorch/DynamicLayer.h rename {functorch/functorch/csrc => aten/src/ATen/functorch}/FunctionalizeInterpreter.cpp (94%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/FunctionalizeInterpreter.h (75%) rename {functorch/functorch/csrc => 
aten/src/ATen/functorch}/Interpreter.cpp (75%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/Interpreter.h (91%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/LegacyBatchingRegistrations.cpp (82%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/LegacyVmapTransforms.cpp (88%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/LegacyVmapTransforms.h (95%) create mode 100644 aten/src/ATen/functorch/Macros.h rename {functorch/functorch/csrc => aten/src/ATen/functorch}/PlumbingHelper.cpp (91%) create mode 100644 aten/src/ATen/functorch/PlumbingHelper.h rename {functorch/functorch/csrc => aten/src/ATen/functorch}/PyTorchOperatorHacks.cpp (95%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/TensorWrapper.cpp (89%) create mode 100644 aten/src/ATen/functorch/TensorWrapper.h rename {functorch/functorch/csrc => aten/src/ATen/functorch}/VmapInterpreter.cpp (68%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/VmapInterpreter.h (76%) rename {functorch/functorch/csrc => aten/src/ATen/functorch}/VmapModeRegistrations.cpp (83%) create mode 100644 aten/src/ATen/mps/IndexKernels.h create mode 100644 aten/src/ATen/native/ComparisonUtils.cpp create mode 100644 aten/src/ATen/native/NonSymbolicBC.h delete mode 100644 aten/src/ATen/native/PadNd.h delete mode 100644 aten/src/ATen/native/SpmmReduce.cpp delete mode 100644 aten/src/ATen/native/SpmmReduce.h create mode 100644 aten/src/ATen/native/cpu/CopyKernel.h create mode 100644 aten/src/ATen/native/cpu/SpmmReduceKernel.h create mode 100644 aten/src/ATen/native/cuda/Copy.h create mode 100644 aten/src/ATen/native/cuda/CumminmaxKernel.cu create mode 100644 aten/src/ATen/native/cuda/CumprodKernel.cu create mode 100644 aten/src/ATen/native/cuda/CumsumKernel.cu create mode 100644 aten/src/ATen/native/cuda/FusedAdamKernel.cu create mode 100644 aten/src/ATen/native/cuda/LogcumsumexpKernel.cu create mode 100644 aten/src/ATen/native/cuda/Pow.cuh rename aten/src/ATen/native/cuda/{ScanKernels.cu => ScanUtils.cuh} (84%) create mode 100644 aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu create mode 100644 aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu create mode 100644 aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cuh create mode 100644 aten/src/ATen/native/cuda/fused_adam_impl.cu create mode 100644 aten/src/ATen/native/cuda/fused_adam_impl.cuh create mode 100644 aten/src/ATen/native/cuda/fused_adam_utils.cuh create mode 100644 aten/src/ATen/native/mps/MPSGraphVenturaOps.h rename aten/src/ATen/native/mps/operations/{BitwiseBinaryOps.mm => BitwiseOps.mm} (79%) create mode 100644 aten/src/ATen/native/mps/operations/Indexing.h create mode 100644 aten/src/ATen/native/mps/operations/Pad.mm create mode 100644 aten/src/ATen/native/nested/NestedTensorAliases.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorBinaryOps.h create mode 100644 aten/src/ATen/native/nested/NestedTensorFactories.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorFactories.h create mode 100644 aten/src/ATen/native/nested/NestedTensorMatmul.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorUtils.cpp create mode 100644 aten/src/ATen/native/nested/NestedTensorUtils.h create mode 100644 aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu create mode 100644 aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu delete mode 
100644 aten/src/ATen/native/quantized/cpu/qnnpack/src/pack_block_sparse.cc create mode 100644 aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.h create mode 100644 aten/src/ATen/native/quantized/cuda/Activation.cu create mode 100644 aten/src/ATen/native/sparse/Macros.h create mode 100644 aten/src/ATen/native/sparse/SparseBinaryOpIntersectionCommon.h create mode 100644 aten/src/ATen/native/sparse/SparseBinaryOpIntersectionKernel.cpp create mode 100644 aten/src/ATen/native/sparse/SparseStubs.h create mode 100644 aten/src/ATen/native/transformers/attention.h create mode 100644 aten/src/ATen/native/transformers/cuda/attention_backward.cu create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/epilogue.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/epilogue_predicated_tile_iterator.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_api.cpp create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_api.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_fprop_kernel_1xN.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_fprop_kernel_dispatch.cu create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_kernel.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/fmha_utils.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/gemm.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/gmem_tile.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/kernel_traits.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/mask.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/mma_core_sm75.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/philox.cuh create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/softmax.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/static_switch.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/summary_stats.h create mode 100644 aten/src/ATen/native/transformers/cuda/flash_attn/utils.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/attention_scaling_coefs_updater.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/debug_utils.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/epilogue_pipelined.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/epilogue_rescale_output.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/epilogue_thread_apply_logsumexp.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/find_default_mma.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/custom_mma.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/custom_mma_base.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/custom_mma_multistage.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm/custom_mma_pipelined.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/gemm_kernel_utils.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/iterators/epilogue_predicated_tile_iterator.h create mode 100644 
aten/src/ATen/native/transformers/cuda/mem_eff_attention/iterators/make_residual_last.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/iterators/predicated_tile_access_iterator_residual_last.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/iterators/predicated_tile_iterator_residual_last.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_backward.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernel_forward.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_aligned_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_bf16_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16_aligned.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16_aligned_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16_aligned_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f16_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32_aligned.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32_aligned_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32_aligned_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32_k128.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/backward_f32_k64.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_bf16.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_bf16_aligned.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_f16.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_f16_aligned.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_f32.cu create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/forward_f32_aligned.cu create mode 100755 aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/generate_kernels.sh create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/mma_from_smem.h create mode 100644 aten/src/ATen/native/transformers/cuda/mem_eff_attention/mma_simt_tile_iterator_residual.h create mode 100644 aten/src/ATen/native/transformers/cuda/sdp_utils.h create mode 100644 aten/src/ATen/native/transformers/sdp_utils_cpp.h create mode 100644 
aten/src/ATen/native/vulkan/api/Types.h delete mode 100644 aten/src/ATen/native/vulkan/api/vk_mem_alloc.h create mode 100644 aten/src/ATen/native/vulkan/glsl/buffer_to_buffer.glsl delete mode 100644 aten/src/ATen/native/vulkan/glsl/conv2d_pw.glsl delete mode 100644 aten/src/ATen/native/vulkan/glsl/conv2d_pw_2x2.glsl delete mode 100644 aten/src/ATen/native/vulkan/glsl/conv2d_pw_2x2_buffered.glsl create mode 100644 aten/src/ATen/native/vulkan/glsl/image2d_to_nchw.glsl create mode 100644 aten/src/ATen/native/vulkan/glsl/indexing.h create mode 100644 aten/src/ATen/native/vulkan/glsl/nchw_to_image2d.glsl create mode 100644 aten/src/ATen/native/vulkan/glsl/templates/conv2d_pw.glslt create mode 100644 aten/src/ATen/native/vulkan/glsl/templates/conv2d_pw_params.yaml create mode 100644 aten/src/ATen/native/vulkan/ops/Batchnorm.h create mode 100644 aten/src/ATen/test/mps_test_print.cpp create mode 100644 benchmarks/cpp/nvfuser/matmul.cpp create mode 100644 benchmarks/dynamo/Makefile_dashboard create mode 100644 benchmarks/dynamo/README.md rename {test/quantization/dbr => benchmarks/dynamo}/__init__.py (100%) create mode 100644 benchmarks/dynamo/check_csv.py create mode 100644 benchmarks/dynamo/common.py create mode 100644 benchmarks/dynamo/dist_util.py create mode 100644 benchmarks/dynamo/distributed.py create mode 100755 benchmarks/dynamo/huggingface.py create mode 100644 benchmarks/dynamo/huggingface_models_list.txt rename {torch/ao/quantization/_dbr => benchmarks/dynamo/microbenchmarks}/__init__.py (100%) create mode 100644 benchmarks/dynamo/microbenchmarks/bench_autotune_conv.py create mode 100644 benchmarks/dynamo/microbenchmarks/bench_conv.py create mode 100644 benchmarks/dynamo/microbenchmarks/bench_conv1x1.py create mode 100644 benchmarks/dynamo/microbenchmarks/bench_conv_fusion.py create mode 100644 benchmarks/dynamo/microbenchmarks/bench_mm_fusion.py create mode 100644 benchmarks/dynamo/microbenchmarks/benchmark_helper.py create mode 100644 benchmarks/dynamo/microbenchmarks/inductor_bmm.py create mode 100644 benchmarks/dynamo/microbenchmarks/inductor_mm.py create mode 100644 benchmarks/dynamo/microbenchmarks/matmul_relu.py create mode 100755 benchmarks/dynamo/microbenchmarks/microbench.py create mode 100644 benchmarks/dynamo/microbenchmarks/model.py create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/AlbertForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/AlbertForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/AllenaiLongformerBase_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BartForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BartForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BertForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BertForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BigBird_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BlenderbotSmallForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/BlenderbotSmallForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/CamemBert_training.txt 
create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DebertaForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DebertaForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DebertaV2ForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DebertaV2ForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DistilBertForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DistilBertForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/DistillGPT2_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/ElectraForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/ElectraForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/GPT2ForSequenceClassification_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/GPTNeoForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/GPTNeoForSequenceClassification_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/GoogleFnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/LayoutLMForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/LayoutLMForSequenceClassification_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/M2M100ForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MBartForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MBartForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MegatronBertForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MegatronBertForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MobileBertForMaskedLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/MobileBertForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/OPTForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/PLBartForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/PLBartForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/PegasusForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/PegasusForConditionalGeneration_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/RobertaForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/RobertaForQuestionAnswering_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/Speech2Text2ForCausalLM_training.txt create mode 100644 
benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/TrOCRForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/XGLMForCausalLM_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/XLNetLMHeadModel_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/hf_train/YituTechConvBert_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/adv_inception_v3_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/beit_base_patch16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/botnet26t_256_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/cait_m36_384_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/coat_lite_mini_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/convmixer_768_32_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/convnext_base_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/crossvit_9_240_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/cspdarknet53_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/deit_base_distilled_patch16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/densenet121_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/dla102_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/dm_nfnet_f0_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/dpn107_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/eca_botnext26ts_256_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/eca_halonext26ts_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/ecaresnet101d_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/ese_vovnet19b_dw_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/fbnetc_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/fbnetv3_b_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gernet_l_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/ghostnet_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gluon_inception_v3_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gluon_senet154_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gluon_xception65_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gmixer_24_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/gmlp_s16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/hardcorenas_a_training.txt create mode 100644 
benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/hrnet_w18_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/inception_v3_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/jx_nest_base_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/lcnet_050_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/legacy_senet154_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/levit_128_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mixer_b16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mixnet_l_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mnasnet_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mobilenetv2_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mobilenetv3_large_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/mobilevit_s_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/nasnetalarge_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/nfnet_l0_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/pit_b_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/pnasnet5large_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/poolformer_m36_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/regnety_002_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/repvgg_a2_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/res2net101_26w_4s_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/res2net50_14w_8s_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/res2next50_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/resmlp_12_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/resnest101e_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/resnet18_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/rexnet_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/sebotnet33ts_256_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/selecsls42b_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/spnasnet_100_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/swin_base_patch4_window7_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/swsl_resnext101_32x16d_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/tf_efficientnet_b0_training.txt create mode 100644 
benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/tf_mixnet_l_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/tinynet_a_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/tnt_s_patch16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/twins_pcpvt_base_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/visformer_small_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/vit_base_patch16_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/timm_train/volo_d1_224_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/BERT_pytorch_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/Background_Matting_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/LearningToPaint_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/Super_SloMo_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/alexnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/attention_is_all_you_need_pytorch_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/dcgan_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/densenet121_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/fambench_dlrm_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/fastNLP_Bert_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_Albert_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_Bart_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_Bert_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_BigBird_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_DistilBert_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_GPT2_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/hf_Longformer_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/maml_omniglot_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/mnasnet1_0_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/mobilenet_v2_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/mobilenet_v3_large_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/nvidia_deeprecommender_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/pytorch_CycleGAN_and_pix2pix_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/pytorch_stargan_training.txt create mode 100644 
benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/pytorch_struct_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/pytorch_unet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/resnet18_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/resnet50_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/resnext50_32x4d_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/shufflenet_v2_x1_0_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/speech_transformer_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/squeezenet1_1_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_efficientdet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_efficientnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_nfnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_regnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_resnest_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_vision_transformer_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/timm_vovnet_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/tts_angular_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/vgg16_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/vision_maskrcnn_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_logs/torchbench_train/yolov3_training.txt create mode 100644 benchmarks/dynamo/microbenchmarks/operator_inp_utils.py create mode 100644 benchmarks/dynamo/microbenchmarks/operatorbench.py create mode 100644 benchmarks/dynamo/microbenchmarks/profile_conv.py create mode 100644 benchmarks/dynamo/microbenchmarks/utils.py create mode 100755 benchmarks/dynamo/runner.py create mode 100644 benchmarks/dynamo/test.py create mode 100755 benchmarks/dynamo/timm_models.py create mode 100644 benchmarks/dynamo/timm_models_list.txt create mode 100755 benchmarks/dynamo/torchbench.py create mode 100644 benchmarks/dynamo/torchbench_models_list.txt create mode 100644 benchmarks/dynamo/training_loss.py create mode 100644 benchmarks/nested/nested_bmm_bench.py create mode 100644 benchmarks/transformer/better_transformer_vs_mha_functional.py create mode 100644 benchmarks/transformer/sdp.py create mode 100644 benchmarks/transformer/sdp_backwards.py delete mode 100644 c10/c10_defs.bzl create mode 100644 c10/core/PyHandleCache.h create mode 100644 c10/core/SymFloat.cpp create mode 100644 c10/core/SymFloat.h delete mode 100644 c10/core/SymIntNodeImpl.cpp delete mode 100644 c10/core/SymIntNodeImpl.h create mode 100644 c10/core/SymNodeImpl.cpp create mode 100644 c10/core/SymNodeImpl.h create mode 100644 c10/core/impl/HermeticPyObjectTLS.cpp create mode 100644 c10/core/impl/HermeticPyObjectTLS.h create mode 100644 
c10/core/impl/PythonDispatcherTLS.cpp create mode 100644 c10/core/impl/PythonDispatcherTLS.h create mode 100644 c10/core/impl/TorchDispatchModeTLS.cpp create mode 100644 c10/core/impl/TorchDispatchModeTLS.h create mode 100644 c10/cuda/CUDAException.cpp create mode 100644 c10/cuda/CUDAMallocAsyncAllocator.cpp delete mode 100644 c10/defs_hip.bzl delete mode 100644 caffe2/defs.bzl delete mode 100644 caffe2/defs_hip.bzl create mode 100644 caffe2/perfkernels/batch_box_cox.cc create mode 100644 caffe2/perfkernels/batch_box_cox.h create mode 100644 caffe2/perfkernels/batch_box_cox_avx2.cc create mode 100644 caffe2/perfkernels/vectorizer.h create mode 100644 caffe2/python/clean_workspace_test.py create mode 100644 caffe2/python/operator_test/_utils.py create mode 100644 caffe2/python/pybind_workspace.cc create mode 100644 caffe2/python/pybind_workspace.h delete mode 100644 defs_gpu.bzl delete mode 100644 defs_hip.bzl create mode 100644 docs/source/_dynamo.rst create mode 100644 docs/source/_static/img/masked/tensor_comparison.jpg create mode 100644 docs/source/community/build_ci_governance.rst create mode 100644 docs/source/cuda._sanitizer.rst create mode 100644 docs/source/distributed.checkpoint.rst create mode 100644 docs/source/masked.rst create mode 100644 docs/source/onnx_diagnostics.rst create mode 100644 docs/source/scripts/onnx/build_onnx_diagnostics_rules_md.py create mode 100644 docs/source/signal.rst delete mode 100644 functorch/.circleci/config.yml delete mode 100644 functorch/.circleci/unittest/linux/scripts/environment.yml delete mode 100755 functorch/.circleci/unittest/linux/scripts/install.sh delete mode 100755 functorch/.circleci/unittest/linux/scripts/post_process.sh delete mode 100755 functorch/.circleci/unittest/linux/scripts/run_test.sh delete mode 100755 functorch/.circleci/unittest/linux/scripts/setup_env.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/environment.yml delete mode 100644 functorch/.circleci/unittest/windows/scripts/install.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/install_conda.bat delete mode 100644 functorch/.circleci/unittest/windows/scripts/post_process.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/run_test.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/set_cuda_envs.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/setup_env.sh delete mode 100644 functorch/.circleci/unittest/windows/scripts/vc_env_helper.bat delete mode 100644 functorch/.flake8 delete mode 100644 functorch/.github/workflows/docs.yml delete mode 100644 functorch/.github/workflows/lint.yml delete mode 100644 functorch/.github/workflows/wheels.yml delete mode 100644 functorch/.lintrunner.toml create mode 100644 functorch/CMakeLists.txt delete mode 100644 functorch/CODE_OF_CONDUCT.md delete mode 100644 functorch/CONTRIBUTING.md delete mode 100644 functorch/LICENSE rename functorch/{functorch => }/__init__.py (76%) rename functorch/{functorch => }/_src/__init__.py (100%) create mode 100644 functorch/_src/aot_autograd.py rename functorch/{functorch => }/_src/benchmark_utils.py (100%) rename functorch/{functorch => }/_src/compile_utils.py (91%) rename functorch/{functorch => }/_src/compilers.py (79%) create mode 100644 functorch/_src/config.py rename functorch/{functorch => }/_src/eager_transforms.py (92%) create mode 100644 functorch/_src/fx_minifier.py rename functorch/{functorch => }/_src/make_functional.py (99%) rename functorch/{functorch => }/_src/named_members_polyfill.py (100%) rename 
functorch/{functorch => }/_src/partitioners.py (61%) rename functorch/{functorch => }/_src/python_key.py (50%) rename functorch/{functorch => }/_src/pytree_hacks.py (100%) rename functorch/{functorch => }/_src/top_operators_github_usage.py (100%) rename functorch/{functorch => }/_src/vmap.py (97%) rename functorch/{functorch => }/compile/__init__.py (70%) rename functorch/{functorch => }/csrc/dim/arena.h (100%) rename functorch/{functorch => }/csrc/dim/dim.cpp (96%) rename functorch/{functorch => }/csrc/dim/dim.h (100%) rename functorch/{functorch => }/csrc/dim/minpybind.h (98%) rename functorch/{functorch => }/csrc/dim/python_variable_simple.h (100%) create mode 100644 functorch/csrc/init_dim_only.cpp rename functorch/{functorch => }/dim/README.md (94%) rename functorch/{functorch => }/dim/__init__.py (97%) rename functorch/{functorch => }/dim/batch_tensor.py (95%) rename functorch/{functorch => }/dim/delayed_mul_tensor.py (100%) rename functorch/{functorch => }/dim/dim.py (100%) rename functorch/{functorch => }/dim/magic_trace.py (100%) rename functorch/{functorch => }/dim/op_properties.py (100%) rename functorch/{functorch => }/dim/reference.py (100%) rename functorch/{functorch => }/dim/tree_map.py (100%) rename functorch/{functorch => }/dim/wrap_type.py (100%) delete mode 100644 functorch/docs/source/_static/images/functorch.svg rename functorch/{functorch => }/experimental/__init__.py (60%) create mode 100644 functorch/experimental/_map.py rename functorch/{functorch => }/experimental/batch_norm_replacement.py (100%) create mode 100644 functorch/experimental/cond.py create mode 100644 functorch/experimental/control_flow.py create mode 100644 functorch/experimental/ops.py delete mode 100644 functorch/functorch/_src/aot_autograd.py delete mode 100644 functorch/functorch/_src/config.py delete mode 100644 functorch/functorch/_src/custom_function.py delete mode 100644 functorch/functorch/_src/fx_minifier.py delete mode 100644 functorch/functorch/_src/monkey_patching.py delete mode 100644 functorch/functorch/csrc/CompileCache.cpp delete mode 100644 functorch/functorch/csrc/CompileCache.h delete mode 100644 functorch/functorch/csrc/Constants.h delete mode 100644 functorch/functorch/csrc/CustomFunction.cpp delete mode 100644 functorch/functorch/csrc/CustomFunction.h delete mode 100644 functorch/functorch/csrc/DynamicLayer.h delete mode 100644 functorch/functorch/csrc/Macros.h delete mode 100644 functorch/functorch/csrc/PlumbingHelper.h delete mode 100644 functorch/functorch/csrc/TensorWrapper.h delete mode 100644 functorch/functorch/csrc/init.cpp delete mode 100644 functorch/notebooks/colab/ensembling_colab.ipynb delete mode 100644 functorch/notebooks/colab/jacobians_hessians_colab.ipynb delete mode 100644 functorch/notebooks/colab/per_sample_grads_colab.ipynb delete mode 100644 functorch/notebooks/colab/readme.md delete mode 100644 functorch/packaging/build_wheel.sh delete mode 100644 functorch/packaging/pkg_helpers.bash delete mode 100644 functorch/packaging/windows/internal/cuda_install.bat delete mode 100644 functorch/packaging/windows/internal/driver_update.bat delete mode 100644 functorch/packaging/windows/internal/vc_env_helper.bat delete mode 100644 functorch/packaging/windows/internal/vc_install_helper.sh delete mode 100644 functorch/pull_request_template.md delete mode 100644 functorch/setup.cfg delete mode 100644 functorch/setup.py delete mode 100644 functorch/test/functorch_lagging_op_db.py delete mode 100644 functorch/test/pytest.ini delete mode 100644 
functorch/test/test_compile_cache.py delete mode 100644 functorch/test/test_minifier.py delete mode 100644 functorch/test/test_pythonkey.py delete mode 100644 functorch/tools/lint/black_linter.py delete mode 100644 functorch/tools/lint/flake8_linter.py delete mode 100644 functorch/tools/lint/pip_init.py delete mode 100644 functorch/version.txt delete mode 100644 ios/TestApp/AppleWWDRCAG3.cer create mode 100644 ios/TestApp/TestApp/Benchmark.h create mode 100644 ios/TestApp/TestApp/Benchmark.mm create mode 100644 ios/TestApp/benchmark/config.json delete mode 100644 test/cpp/api/imethod.cpp create mode 100644 test/cpp/api/nested.cpp create mode 100644 test/cpp/c10d/ProcessGroupUCCTest.cpp delete mode 100644 test/cpp/lazy/test_symbolic_shape.cpp create mode 100644 test/cpp/lite_interpreter_runtime/resources.h create mode 100644 test/cpp/profiler/perf_events.cpp delete mode 100644 test/defs.bzl create mode 100644 test/distributed/_composable/test_checkpoint.py create mode 100644 test/distributed/_composable/test_contract.py create mode 100644 test/distributed/_composable/test_fully_shard.py create mode 100644 test/distributed/_composable/test_replicate.py delete mode 100644 test/distributed/_shard/checkpoint/test_checkpoint.py create mode 100644 test/distributed/_tensor/README.md create mode 100644 test/distributed/_tensor/__init__.py rename {torch/ao/sparsity/_experimental => test/distributed/_tensor/parallel}/__init__.py (100%) create mode 100644 test/distributed/_tensor/parallel/test_2d_parallel.py create mode 100644 test/distributed/_tensor/parallel/test_parallelize_api.py create mode 100644 test/distributed/_tensor/parallel/test_tp_examples.py create mode 100644 test/distributed/_tensor/parallel/test_tp_style.py create mode 100644 test/distributed/_tensor/parallel/test_view_sharding_dim_change.py create mode 100644 test/distributed/_tensor/test_api.py create mode 100644 test/distributed/_tensor/test_common_rules.py create mode 100644 test/distributed/_tensor/test_device_mesh.py create mode 100644 test/distributed/_tensor/test_dtensor.py create mode 100644 test/distributed/_tensor/test_dtensor_ops.py create mode 100644 test/distributed/_tensor/test_math_ops.py create mode 100644 test/distributed/_tensor/test_matrix_ops.py create mode 100644 test/distributed/_tensor/test_pointwise_ops.py create mode 100644 test/distributed/_tensor/test_redistribute.py create mode 100644 test/distributed/_tensor/test_tensor_ops.py create mode 100644 test/distributed/_tensor/test_tp_sharding_ops.py create mode 100644 test/distributed/_tensor/test_view_ops.py create mode 100644 test/distributed/checkpoint/test_checkpoint.py create mode 100644 test/distributed/checkpoint/test_dedup_tensors.py rename test/distributed/{_shard => }/checkpoint/test_file_system_checkpoint.py (95%) rename test/distributed/{_shard => }/checkpoint/test_file_system_checkpoint_cpu.py (99%) create mode 100644 test/distributed/checkpoint/test_planner.py create mode 100644 test/distributed/checkpoint/test_traverse.py rename test/distributed/{_shard => }/checkpoint/test_utils.py (93%) delete mode 100644 test/distributed/defs.bzl create mode 100644 test/distributed/elastic/timer/file_based_local_timer_test.py delete mode 100644 test/distributed/fsdp/defs.bzl delete mode 100644 test/distributed/fsdp/test_flatten_params_wrapper.py create mode 100644 test/distributed/fsdp/test_fsdp_flatten_params.py delete mode 100644 test/distributed/fsdp/test_fsdp_param_exec_order_wrap.py create mode 100644 test/distributed/fsdp/test_fsdp_tp_integration.py 
create mode 100644 test/distributed/fsdp/test_fsdp_use_orig_params.py create mode 100644 test/distributed/optim/test_apply_optimizer_in_backward.py delete mode 100644 test/distributed/pipeline/sync/defs.bzl create mode 100644 test/distributed/test_c10d_error_logger.py create mode 100644 test/distributed/test_c10d_spawn_ucc.py create mode 100644 test/distributed/test_dynamo_distributed.py create mode 100644 test/distributed/test_multi_threaded_pg.py rename {torch/ao/sparsity/_experimental/activation_sparsifier => test/dynamo}/__init__.py (100%) rename {torch/ao/sparsity/_experimental/data_sparsifier/lightning => test/dynamo/mock_modules}/__init__.py (100%) create mode 100644 test/dynamo/mock_modules/mock_module1.py create mode 100644 test/dynamo/mock_modules/mock_module2.py create mode 100644 test/dynamo/mock_modules/mock_module3.py create mode 100644 test/dynamo/test_aot_autograd.py create mode 100644 test/dynamo/test_aot_cudagraphs.py create mode 100644 test/dynamo/test_dynamic_shapes.py create mode 100644 test/dynamo/test_export.py create mode 100644 test/dynamo/test_export_mutations.py create mode 100644 test/dynamo/test_functions.py create mode 100644 test/dynamo/test_global.py create mode 100644 test/dynamo/test_global_declaration.py create mode 100644 test/dynamo/test_minifier.py create mode 100644 test/dynamo/test_misc.py create mode 100644 test/dynamo/test_model_output.py create mode 100644 test/dynamo/test_modules.py create mode 100644 test/dynamo/test_nops.py create mode 100644 test/dynamo/test_optimizations.py create mode 100644 test/dynamo/test_optimizers.py create mode 100644 test/dynamo/test_python_autograd.py create mode 100644 test/dynamo/test_recompile_ux.py create mode 100644 test/dynamo/test_replay_record.py create mode 100644 test/dynamo/test_repros.py create mode 100644 test/dynamo/test_skip_non_tensor.py create mode 100644 test/dynamo/test_subgraphs.py create mode 100644 test/dynamo/test_torchxla_integration.py create mode 100644 test/dynamo/test_torchxla_num_output.py create mode 100644 test/dynamo/test_torchxla_util.py create mode 100644 test/dynamo/test_unspec.py create mode 100644 test/dynamo/test_verify_correctness.py rename {functorch/test => test/functorch}/attn_ft.py (100%) rename {functorch/test => test/functorch}/attn_positional.py (100%) rename {functorch/test => test/functorch}/common_utils.py (62%) rename {functorch/test => test/functorch}/discover_coverage.py (99%) rename {functorch/test => test/functorch}/functorch_additional_op_db.py (97%) create mode 100644 test/functorch/test_aotdispatch.py create mode 100644 test/functorch/test_control_flow.py rename {functorch/test => test/functorch}/test_dims.py (92%) rename {functorch/test => test/functorch}/test_eager_transforms.py (82%) rename {functorch/test => test/functorch}/test_functionalize.py (65%) rename {functorch/test => test/functorch}/test_memory_efficient_fusion.py (100%) create mode 100644 test/functorch/test_minifier.py rename {functorch/test => test/functorch}/test_ops.py (61%) rename {functorch/test => test/functorch}/test_vmap.py (93%) rename {functorch/test => test/functorch}/xfail_suggester.py (96%) rename {torch/ao/sparsity/_experimental/data_sparsifier/lightning/callbacks => test/inductor}/__init__.py (100%) create mode 100644 test/inductor/cpp/.gitignore create mode 100644 test/inductor/cpp/CMakeLists.txt create mode 100755 test/inductor/cpp/test.sh create mode 100644 test/inductor/cpp/test_cpp_prefix.cpp create mode 100644 test/inductor/opinfo_harness.py create mode 100644 
test/inductor/test_minifier.py create mode 100644 test/inductor/test_perf.py create mode 100644 test/inductor/test_smoke.py create mode 100644 test/inductor/test_torchinductor.py create mode 100644 test/inductor/test_torchinductor_opinfo.py create mode 100644 test/jit/xnnpack/test_xnnpack_delegate.py create mode 100644 test/lazy/test_debug_util.py create mode 100644 test/lazy/test_meta_kernel.py create mode 100644 test/lazy/test_step_closures.py create mode 100644 test/nn/test_convolution.py create mode 100644 test/nn/test_dropout.py create mode 100644 test/nn/test_embedding.py create mode 100644 test/nn/test_init.py create mode 100644 test/nn/test_lazy_modules.py create mode 100644 test/nn/test_module_hooks.py create mode 100644 test/nn/test_packed_sequence.py create mode 100644 test/nn/test_parametrization.py create mode 100644 test/nn/test_pooling.py create mode 100644 test/nn/test_pruning.py create mode 100644 test/onnx/internal/test_beartype.py create mode 100644 test/onnx/internal/test_diagnostics.py create mode 100644 test/onnx/internal/test_registraion.py delete mode 100644 test/onnx/symbolic_opsets/test_symbolic_opset9.py rename test/{jit => onnx}/test_export_modes.py (64%) create mode 100644 test/onnx/test_onnxscript_no_runtime.py create mode 100644 test/onnx/test_onnxscript_runtime.py create mode 100644 test/profiler/profiler_utils_mock_events.json create mode 100644 test/profiler/test_memory_profiler.py rename test/{ => profiler}/test_profiler.py (67%) rename test/{ => profiler}/test_profiler_tree.py (86%) delete mode 100644 test/profiler_utils_mock_events.json create mode 100644 test/quantization/core/test_top_level_apis.py delete mode 100644 test/quantization/dbr/test_quantize_dbr.py create mode 100644 test/quantization/jit/test_ondevice_quantization.py mode change 100644 => 100755 test/run_test.py create mode 100644 test/test_comparison_utils.py create mode 100644 test/test_cuda_nvml_based_avail.py create mode 100644 test/test_cuda_sanitizer.py create mode 100644 test/test_dlpack.py delete mode 100644 test/test_fx_backends.py create mode 100644 test/test_itt.py create mode 100644 test/test_maskedtensor.py create mode 100644 test/test_matmul_cuda.py create mode 100644 test/test_nvfuser_dynamo.py create mode 100644 test/test_nvfuser_frontend.py create mode 100644 test/test_ops_fwd_gradients.py create mode 160000 third_party/VulkanMemoryAllocator delete mode 100644 third_party/cpuinfo.BUILD create mode 160000 third_party/cutlass create mode 100644 third_party/cutlass.BUILD create mode 100644 tools/autograd/templates/python_nested_functions.cpp delete mode 100644 tools/cpuinfo_target_definition.bzl create mode 100644 tools/dynamo/verify_dynamo.py create mode 100644 tools/gen_vulkan_glsl.py delete mode 100644 tools/miniz_target_definition.bzl create mode 100644 tools/onnx/gen_diagnostics.py create mode 100755 tools/onnx/gen_diagnostics.sh create mode 100644 tools/onnx/sarif/code-gen-hints.json create mode 100755 tools/onnx/sarif/gen_sarif.sh create mode 100644 tools/onnx/templates/rules.h.in create mode 100644 tools/onnx/templates/rules.py.in delete mode 100644 tools/perf_kernel_defs.bzl delete mode 100644 tools/sgx_aten_target_definitions.bzl delete mode 100644 tools/sgx_caffe2_target_definitions.bzl delete mode 100644 tools/sgx_target_definitions.bzl create mode 100644 tools/stats/check_disabled_tests.py create mode 100644 tools/stats/upload_artifacts.py delete mode 100644 tools/target_definitions.bzl create mode 100644 tools/test/test_vulkan_codegen.py create mode 100644 
torch/_C/_functorch.pyi create mode 100644 torch/_C/_profiler.pyi rename functorch/functorch/_src/decompositions.py => torch/_decomp/decompositions_for_jvp.py (61%) rename torch/{ao/sparsity/scheduler => _dispatch}/__init__.py (100%) create mode 100644 torch/_dispatch/python.py create mode 100644 torch/_dynamo/__init__.py create mode 100644 torch/_dynamo/allowed_functions.py create mode 100644 torch/_dynamo/bytecode_analysis.py create mode 100644 torch/_dynamo/bytecode_transformation.py create mode 100644 torch/_dynamo/codegen.py create mode 100644 torch/_dynamo/config.py create mode 100644 torch/_dynamo/convert_frame.py create mode 100644 torch/_dynamo/debug_utils.py create mode 100644 torch/_dynamo/eval_frame.py create mode 100644 torch/_dynamo/exc.py create mode 100644 torch/_dynamo/guards.py create mode 100644 torch/_dynamo/logging.py create mode 100644 torch/_dynamo/mutation_guard.py create mode 100644 torch/_dynamo/optimizations/__init__.py create mode 100644 torch/_dynamo/optimizations/analysis.py create mode 100644 torch/_dynamo/optimizations/backends.py create mode 100644 torch/_dynamo/optimizations/distributed.py create mode 100644 torch/_dynamo/optimizations/inference.py create mode 100644 torch/_dynamo/optimizations/log_args.py create mode 100644 torch/_dynamo/optimizations/normalize.py create mode 100644 torch/_dynamo/optimizations/subgraph.py create mode 100644 torch/_dynamo/optimizations/torchxla_integration.py create mode 100644 torch/_dynamo/optimizations/training.py create mode 100644 torch/_dynamo/output_graph.py create mode 100644 torch/_dynamo/profiler.py create mode 100644 torch/_dynamo/replay_record.py create mode 100644 torch/_dynamo/resume_execution.py create mode 100644 torch/_dynamo/side_effects.py create mode 100644 torch/_dynamo/skipfiles.py create mode 100644 torch/_dynamo/source.py create mode 100644 torch/_dynamo/symbolic_convert.py create mode 100644 torch/_dynamo/test_case.py create mode 100644 torch/_dynamo/test_minifier_common.py create mode 100644 torch/_dynamo/testing.py create mode 100644 torch/_dynamo/utils.py create mode 100644 torch/_dynamo/variables/__init__.py create mode 100644 torch/_dynamo/variables/base.py create mode 100644 torch/_dynamo/variables/builder.py create mode 100644 torch/_dynamo/variables/builtin.py create mode 100644 torch/_dynamo/variables/constant.py create mode 100644 torch/_dynamo/variables/dicts.py create mode 100644 torch/_dynamo/variables/functions.py create mode 100644 torch/_dynamo/variables/lists.py create mode 100644 torch/_dynamo/variables/misc.py create mode 100644 torch/_dynamo/variables/nn_module.py create mode 100644 torch/_dynamo/variables/tensor.py create mode 100644 torch/_dynamo/variables/torch.py create mode 100644 torch/_dynamo/variables/user_defined.py rename torch/{ao/sparsity/sparsifier => _functorch}/__init__.py (100%) create mode 100644 torch/_functorch/pyfunctorch.py create mode 100644 torch/_functorch/utils.py create mode 100644 torch/_inductor/__init__.py create mode 100644 torch/_inductor/codecache.py create mode 100644 torch/_inductor/codegen/__init__.py create mode 100644 torch/_inductor/codegen/autotuner.py create mode 100644 torch/_inductor/codegen/common.py create mode 100644 torch/_inductor/codegen/cpp.py create mode 100644 torch/_inductor/codegen/cpp_prefix.h create mode 100644 torch/_inductor/codegen/triton.py create mode 100644 torch/_inductor/codegen/triton_conv_delta_x.j2 create mode 100644 torch/_inductor/codegen/triton_conv_delta_x_hwc.j2 create mode 100644 
torch/_inductor/codegen/triton_mm.j2 create mode 100644 torch/_inductor/codegen/triton_template.py create mode 100644 torch/_inductor/codegen/wrapper.py create mode 100644 torch/_inductor/compile_fx.py create mode 100644 torch/_inductor/config.py create mode 100644 torch/_inductor/cuda_properties.py create mode 100644 torch/_inductor/debug.py create mode 100644 torch/_inductor/decomposition.py create mode 100644 torch/_inductor/dependencies.py create mode 100644 torch/_inductor/exc.py create mode 100644 torch/_inductor/graph.py create mode 100644 torch/_inductor/ir.py create mode 100644 torch/_inductor/lowering.py create mode 100644 torch/_inductor/metrics.py create mode 100644 torch/_inductor/overrides.py create mode 100644 torch/_inductor/scheduler.py create mode 100644 torch/_inductor/sizevars.py create mode 100644 torch/_inductor/triton_ops/__init__.py create mode 100644 torch/_inductor/triton_ops/autotune.py create mode 100644 torch/_inductor/triton_ops/batched_matmul.py create mode 100644 torch/_inductor/triton_ops/conv.py create mode 100644 torch/_inductor/triton_ops/conv1x1.py create mode 100644 torch/_inductor/triton_ops/conv_perf_model.py create mode 100644 torch/_inductor/triton_ops/matmul.py create mode 100644 torch/_inductor/triton_ops/mm_perf_model.py create mode 100644 torch/_inductor/triton_ops/utils.py create mode 100644 torch/_inductor/utils.py create mode 100644 torch/_inductor/virtualized.py create mode 100644 torch/_lazy/closure.py create mode 100644 torch/_lazy/device_context.py create mode 100644 torch/_refs/_conversions.py create mode 100644 torch/_subclasses/fake_utils.py create mode 100644 torch/_weights_only_unpickler.py create mode 100644 torch/ao/nn/intrinsic/__init__.py create mode 100644 torch/ao/nn/intrinsic/modules/__init__.py create mode 100644 torch/ao/nn/intrinsic/modules/fused.py create mode 100644 torch/ao/nn/intrinsic/qat/__init__.py create mode 100644 torch/ao/nn/intrinsic/qat/modules/__init__.py create mode 100644 torch/ao/nn/intrinsic/qat/modules/conv_fused.py create mode 100644 torch/ao/nn/intrinsic/qat/modules/linear_fused.py create mode 100644 torch/ao/nn/intrinsic/qat/modules/linear_relu.py create mode 100644 torch/ao/nn/intrinsic/quantized/__init__.py create mode 100644 torch/ao/nn/intrinsic/quantized/dynamic/__init__.py create mode 100644 torch/ao/nn/intrinsic/quantized/dynamic/modules/__init__.py create mode 100644 torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py create mode 100644 torch/ao/nn/intrinsic/quantized/modules/__init__.py create mode 100644 torch/ao/nn/intrinsic/quantized/modules/bn_relu.py create mode 100644 torch/ao/nn/intrinsic/quantized/modules/conv_relu.py create mode 100644 torch/ao/nn/intrinsic/quantized/modules/linear_relu.py create mode 100644 torch/ao/nn/qat/__init__.py create mode 100644 torch/ao/nn/qat/dynamic/__init__.py create mode 100644 torch/ao/nn/qat/dynamic/modules/__init__.py create mode 100644 torch/ao/nn/qat/dynamic/modules/linear.py create mode 100644 torch/ao/nn/qat/modules/__init__.py create mode 100644 torch/ao/nn/qat/modules/conv.py create mode 100644 torch/ao/nn/qat/modules/embedding_ops.py create mode 100644 torch/ao/nn/qat/modules/linear.py create mode 100644 torch/ao/nn/quantizable/__init__.py create mode 100644 torch/ao/nn/quantizable/modules/__init__.py create mode 100644 torch/ao/nn/quantizable/modules/activation.py create mode 100644 torch/ao/nn/quantizable/modules/rnn.py create mode 100644 torch/ao/nn/quantized/__init__.py create mode 100644 
torch/ao/nn/quantized/dynamic/__init__.py create mode 100644 torch/ao/nn/quantized/dynamic/modules/__init__.py create mode 100644 torch/ao/nn/quantized/dynamic/modules/conv.py create mode 100644 torch/ao/nn/quantized/dynamic/modules/linear.py create mode 100644 torch/ao/nn/quantized/dynamic/modules/rnn.py create mode 100644 torch/ao/nn/quantized/functional.py create mode 100644 torch/ao/nn/quantized/modules/__init__.py create mode 100644 torch/ao/nn/quantized/modules/activation.py create mode 100644 torch/ao/nn/quantized/modules/batchnorm.py create mode 100644 torch/ao/nn/quantized/modules/conv.py create mode 100644 torch/ao/nn/quantized/modules/dropout.py create mode 100644 torch/ao/nn/quantized/modules/embedding_ops.py create mode 100644 torch/ao/nn/quantized/modules/functional_modules.py create mode 100644 torch/ao/nn/quantized/modules/linear.py create mode 100644 torch/ao/nn/quantized/modules/normalization.py create mode 100644 torch/ao/nn/quantized/modules/rnn.py create mode 100644 torch/ao/nn/quantized/modules/utils.py create mode 100644 torch/ao/nn/quantized/reference/__init__.py create mode 100644 torch/ao/nn/quantized/reference/modules/__init__.py create mode 100644 torch/ao/nn/quantized/reference/modules/conv.py create mode 100644 torch/ao/nn/quantized/reference/modules/linear.py create mode 100644 torch/ao/nn/quantized/reference/modules/rnn.py create mode 100644 torch/ao/nn/quantized/reference/modules/sparse.py create mode 100644 torch/ao/nn/quantized/reference/modules/utils.py delete mode 100644 torch/ao/ns/_numeric_suite_dbr.py create mode 100644 torch/ao/ns/fx/n_shadows_utils.py create mode 100644 torch/ao/ns/fx/qconfig_multi_mapping.py rename torch/ao/{sparsity => pruning}/__init__.py (93%) create mode 100644 torch/ao/pruning/_experimental/__init__.py rename torch/ao/{sparsity => pruning}/_experimental/activation_sparsifier/README.md (98%) create mode 100644 torch/ao/pruning/_experimental/activation_sparsifier/__init__.py rename torch/ao/{sparsity => pruning}/_experimental/activation_sparsifier/activation_sparsifier.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_scheduler/README.md (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_scheduler/__init__.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_scheduler/base_data_scheduler.py (98%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/README.md (98%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/__init__.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/base_data_sparsifier.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/README.md (97%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/dlrm_utils.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/evaluate_disk_savings.py (98%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/evaluate_forward_time.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/evaluate_model_metrics.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/images/accuracy.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/images/disk_savings.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/benchmarks/images/forward_time.png (100%) rename torch/ao/{sparsity => 
pruning}/_experimental/data_sparsifier/data_norm_sparsifier.py (100%) create mode 100644 torch/ao/pruning/_experimental/data_sparsifier/lightning/__init__.py rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/lightning/callbacks/README.md (100%) create mode 100644 torch/ao/pruning/_experimental/data_sparsifier/lightning/callbacks/__init__.py rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/lightning/callbacks/_data_sparstity_utils.py (93%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/lightning/callbacks/data_sparsity.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/lightning/tests/test_callbacks.py (95%) rename torch/ao/{sparsity => pruning}/_experimental/data_sparsifier/quantization_utils.py (98%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/README.md (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/__init__.py (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/base_pruner.py (98%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/images/prune_1.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/images/prune_2.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/images/prune_3.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/images/prune_4.png (100%) rename torch/ao/{sparsity => pruning}/_experimental/pruner/parametrization.py (100%) rename torch/ao/{sparsity => pruning}/_mappings.py (72%) create mode 100644 torch/ao/pruning/scheduler/__init__.py rename torch/ao/{sparsity => pruning}/scheduler/base_scheduler.py (89%) create mode 100644 torch/ao/pruning/scheduler/cubic_scheduler.py rename torch/ao/{sparsity => pruning}/scheduler/lambda_scheduler.py (100%) create mode 100644 torch/ao/pruning/sparsifier/__init__.py rename torch/ao/{sparsity => pruning}/sparsifier/base_sparsifier.py (99%) rename torch/ao/{sparsity => pruning}/sparsifier/nearly_diagonal_sparsifier.py (100%) rename torch/ao/{sparsity => pruning}/sparsifier/utils.py (99%) rename torch/ao/{sparsity => pruning}/sparsifier/weight_norm_sparsifier.py (89%) delete mode 100644 torch/ao/quantization/_dbr/README.md delete mode 100644 torch/ao/quantization/_dbr/auto_trace.py delete mode 100644 torch/ao/quantization/_dbr/auto_trace_rewriter.py delete mode 100644 torch/ao/quantization/_dbr/function_fusion.py delete mode 100644 torch/ao/quantization/_dbr/fusion.py delete mode 100644 torch/ao/quantization/_dbr/mappings.py delete mode 100644 torch/ao/quantization/_dbr/model_utils.py delete mode 100644 torch/ao/quantization/_dbr/module_swap_utils.py delete mode 100644 torch/ao/quantization/_dbr/qconfig_mapping_utils.py delete mode 100644 torch/ao/quantization/_dbr/quantization_state.py delete mode 100644 torch/ao/quantization/_dbr/torchscript_utils.py delete mode 100644 torch/ao/quantization/_dbr/utils.py delete mode 100644 torch/ao/quantization/_quantize_dbr.py create mode 100644 torch/ao/quantization/backend_config/executorch.py create mode 100644 torch/ao/quantization/backend_config/fbgemm.py create mode 100644 torch/ao/quantization/backend_config/qnnpack.py create mode 100644 torch/ao/quantization/backend_config/x86.py create mode 100644 torch/ao/quantization/fx/README.md create mode 100644 torch/ao/quantization/fx/_decomposed.py delete mode 100644 torch/ao/quantization/fx/common_quantization_patterns.py rename torch/ao/quantization/fx/{qconfig_utils.py => qconfig_mapping_utils.py} (85%) delete mode 100644 
torch/ao/quantization/quantization_types.py create mode 100644 torch/backends/opt_einsum/__init__.py create mode 100644 torch/csrc/api/include/torch/nested.h create mode 100644 torch/csrc/autograd/jit_decomp_interface.cpp create mode 100644 torch/csrc/autograd/jit_decomp_interface.h create mode 100644 torch/csrc/autograd/python_nested_functions.h create mode 100644 torch/csrc/autograd/python_nested_functions_manual.cpp create mode 100644 torch/csrc/cuda/CUDAPluggableAllocator.cpp create mode 100644 torch/csrc/cuda/CUDAPluggableAllocator.h create mode 100644 torch/csrc/cuda/memory_snapshot.cpp create mode 100644 torch/csrc/cuda/memory_snapshot.h delete mode 100644 torch/csrc/deploy/.gitignore delete mode 100644 torch/csrc/deploy/CMakeLists.txt delete mode 100644 torch/csrc/deploy/Exception.h delete mode 100644 torch/csrc/deploy/benchmark.cpp delete mode 100644 torch/csrc/deploy/deploy.cpp delete mode 100644 torch/csrc/deploy/deploy.h delete mode 100644 torch/csrc/deploy/elf_file.cpp delete mode 100644 torch/csrc/deploy/elf_file.h delete mode 100644 torch/csrc/deploy/environment.h delete mode 100644 torch/csrc/deploy/example/benchmark.cpp delete mode 100644 torch/csrc/deploy/example/examples.py delete mode 100644 torch/csrc/deploy/example/fx/examples.py delete mode 100644 torch/csrc/deploy/example/fx/some_dependency.py delete mode 100644 torch/csrc/deploy/example/generate_examples.py delete mode 100644 torch/csrc/deploy/example/gpu_wrapper.py delete mode 100644 torch/csrc/deploy/example/simple.pt delete mode 100644 torch/csrc/deploy/example/tensorrt_example.py delete mode 100644 torch/csrc/deploy/interactive_embedded_interpreter.cpp delete mode 100644 torch/csrc/deploy/interpreter/CMakeLists.txt delete mode 100644 torch/csrc/deploy/interpreter/CMakePythonModules.txt delete mode 100644 torch/csrc/deploy/interpreter/Optional.hpp delete mode 100644 torch/csrc/deploy/interpreter/builtin_registry.cpp delete mode 100644 torch/csrc/deploy/interpreter/builtin_registry.h delete mode 100755 torch/csrc/deploy/interpreter/configure_cpython.sh delete mode 100644 torch/csrc/deploy/interpreter/cpython_patch.diff delete mode 100644 torch/csrc/deploy/interpreter/defs.bzl delete mode 100644 torch/csrc/deploy/interpreter/hide_symbols.script delete mode 100644 torch/csrc/deploy/interpreter/import_find_sharedfuncptr.cpp delete mode 100644 torch/csrc/deploy/interpreter/interpreter_impl.cpp delete mode 100644 torch/csrc/deploy/interpreter/interpreter_impl.h delete mode 100644 torch/csrc/deploy/interpreter/register_frozenpython.cpp delete mode 100644 torch/csrc/deploy/interpreter/register_numpy.cpp delete mode 100644 torch/csrc/deploy/interpreter/register_pyyaml.cpp delete mode 100644 torch/csrc/deploy/interpreter/test_builtin_registry.cpp delete mode 100644 torch/csrc/deploy/interpreter/third_party/README.md delete mode 100644 torch/csrc/deploy/loader.cpp delete mode 100644 torch/csrc/deploy/loader.h delete mode 100644 torch/csrc/deploy/mem_file.h delete mode 100644 torch/csrc/deploy/noop_environment.h delete mode 100644 torch/csrc/deploy/path_environment.cpp delete mode 100644 torch/csrc/deploy/path_environment.h delete mode 100644 torch/csrc/deploy/remove_dt_needed.cpp delete mode 100644 torch/csrc/deploy/test_deploy.cpp delete mode 100644 torch/csrc/deploy/test_deploy_from_python.py delete mode 100644 torch/csrc/deploy/test_deploy_gpu.cpp delete mode 100644 torch/csrc/deploy/test_deploy_lib.cpp delete mode 100644 torch/csrc/deploy/test_deploy_missing_interpreter.cpp delete mode 100644 
torch/csrc/deploy/test_deploy_python.py delete mode 100644 torch/csrc/deploy/test_deploy_python_ext.cpp delete mode 100644 torch/csrc/deploy/unity/example.py delete mode 100644 torch/csrc/deploy/unity/main.cpp delete mode 100644 torch/csrc/deploy/unity/tests/simple_model.py delete mode 100644 torch/csrc/deploy/unity/tests/sum.py delete mode 100644 torch/csrc/deploy/unity/tests/test_unity.h delete mode 100644 torch/csrc/deploy/unity/tests/test_unity_simple_model.cpp delete mode 100644 torch/csrc/deploy/unity/tests/test_unity_sum.cpp delete mode 100644 torch/csrc/deploy/unity/unity.bzl delete mode 100644 torch/csrc/deploy/unity/xar_environment.cpp delete mode 100644 torch/csrc/deploy/unity/xar_environment.h create mode 100644 torch/csrc/distributed/c10d/Backend.cpp create mode 100644 torch/csrc/distributed/c10d/Backend.hpp create mode 100644 torch/csrc/distributed/c10d/OpsImpl.cpp create mode 100644 torch/csrc/distributed/c10d/Work.cpp create mode 100644 torch/csrc/distributed/c10d/Work.hpp delete mode 100644 torch/csrc/dl.c create mode 100644 torch/csrc/dynamo/eval_frame.c create mode 100644 torch/csrc/dynamo/eval_frame.h create mode 100644 torch/csrc/dynamo/guards.cpp create mode 100644 torch/csrc/dynamo/guards.h create mode 100644 torch/csrc/dynamo/init.cpp create mode 100644 torch/csrc/dynamo/init.h create mode 100644 torch/csrc/functorch/init.cpp create mode 100644 torch/csrc/functorch/init.h delete mode 100644 torch/csrc/jit/backends/coreml/observer/PTMCoreMLObserver.h delete mode 100644 torch/csrc/jit/backends/coreml/observer/PTMCoreMLObserver.mm create mode 100644 torch/csrc/jit/backends/xnnpack/compiler/xnn_compiler.cpp create mode 100644 torch/csrc/jit/backends/xnnpack/compiler/xnn_compiler.h create mode 100644 torch/csrc/jit/backends/xnnpack/executor/xnn_executor.h create mode 100644 torch/csrc/jit/backends/xnnpack/serialization/schema.fbs create mode 100644 torch/csrc/jit/backends/xnnpack/serialization/serializer.cpp create mode 100644 torch/csrc/jit/backends/xnnpack/serialization/serializer.h create mode 100644 torch/csrc/jit/backends/xnnpack/xnnpack_backend_lib.cpp create mode 100644 torch/csrc/jit/backends/xnnpack/xnnpack_backend_preprocess.cpp create mode 100644 torch/csrc/jit/backends/xnnpack/xnnpack_graph_builder.cpp create mode 100644 torch/csrc/jit/backends/xnnpack/xnnpack_graph_builder.h create mode 100644 torch/csrc/jit/codegen/cuda/dynamic_type.h delete mode 100644 torch/csrc/jit/codegen/cuda/index_reference_replay.cpp delete mode 100644 torch/csrc/jit/codegen/cuda/index_reference_replay.h delete mode 100644 torch/csrc/jit/codegen/cuda/inline_propagator.cpp delete mode 100644 torch/csrc/jit/codegen/cuda/inline_propagator.h create mode 100644 torch/csrc/jit/codegen/cuda/inlining.cpp create mode 100644 torch/csrc/jit/codegen/cuda/inlining.h create mode 100644 torch/csrc/jit/codegen/cuda/lower_bank_conflict.cpp create mode 100644 torch/csrc/jit/codegen/cuda/lower_bank_conflict.h create mode 100644 torch/csrc/jit/codegen/cuda/lower_divisible_split.cpp create mode 100644 torch/csrc/jit/codegen/cuda/lower_divisible_split.h create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/README.md delete mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/examples/double_half_cast.py delete mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/examples/half_double_cast.py delete mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/examples/python_example.py delete mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/examples/python_example_broadcast_in_dim.py delete 
mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/examples/python_example_fp16.py create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/fusion_cache.cpp create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/fusion_cache.h create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/fusion_interface.cpp create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/fusion_interface.h delete mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/fusion_owner.h create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/test/test_nvfuser_fusion_cache.cpp create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/test/test_nvfuser_fusion_definition.cpp create mode 100644 torch/csrc/jit/codegen/cuda/python_frontend/test/test_nvfuser_fusion_record.cpp delete mode 100644 torch/csrc/jit/codegen/cuda/reference_tensor.h create mode 100644 torch/csrc/jit/codegen/cuda/runtime/array_rocm.cu create mode 100644 torch/csrc/jit/codegen/cuda/runtime/bf16_support_rocm.cu create mode 100644 torch/csrc/jit/codegen/cuda/runtime/block_sync_default_rocm.cu create mode 100644 torch/csrc/jit/codegen/cuda/runtime/fused_welford_helper.cu create mode 100644 torch/csrc/jit/codegen/cuda/runtime/fused_welford_impl.cu create mode 100644 torch/csrc/jit/codegen/cuda/runtime/warp_rocm.cu create mode 100644 torch/csrc/jit/codegen/cuda/scheduler/transpose.cpp create mode 100644 torch/csrc/jit/codegen/cuda/scheduler/transpose.h create mode 100644 torch/csrc/jit/codegen/cuda/scheduler/transpose_heuristic.h create mode 100644 torch/csrc/jit/codegen/cuda/scheduler/vectorize_helper.cpp delete mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu1.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu2.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu3.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu_rng.cu create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu_tensor_factories.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu_transpose.cpp create mode 100644 torch/csrc/jit/codegen/cuda/test/test_gpu_utils.cpp create mode 100644 torch/csrc/jit/codegen/onednn/decompose_silu.cpp create mode 100644 torch/csrc/jit/codegen/onednn/decompose_silu.h create mode 100644 torch/csrc/jit/mobile/quantization.cpp create mode 100644 torch/csrc/jit/mobile/quantization.h create mode 100644 torch/csrc/jit/passes/mobile_optimizer_type.h create mode 100644 torch/csrc/jit/passes/onnx/naming.cpp create mode 100644 torch/csrc/jit/passes/onnx/naming.h create mode 100644 torch/csrc/jit/passes/quantization/register_packed_params.cpp create mode 100644 torch/csrc/jit/passes/quantization/register_packed_params.h delete mode 100644 torch/csrc/lazy/core/lazy_view.cpp delete mode 100644 torch/csrc/lazy/core/lazy_view.h create mode 100644 torch/csrc/onnx/diagnostics/diagnostics.h create mode 100644 torch/csrc/onnx/diagnostics/generated/rules.h delete mode 100644 torch/csrc/profiler/api.cpp create mode 100644 torch/csrc/profiler/data_flow.cpp create mode 100644 torch/csrc/profiler/data_flow.h create mode 100644 torch/csrc/profiler/events.h create mode 100644 torch/csrc/profiler/orchestration/observer.cpp create mode 100644 torch/csrc/profiler/orchestration/observer.h create mode 100644 torch/csrc/profiler/orchestration/python_tracer.cpp create mode 100644 torch/csrc/profiler/orchestration/python_tracer.h create mode 100644 torch/csrc/profiler/perf-inl.h create mode 100644 torch/csrc/profiler/perf.cpp create 
mode 100644 torch/csrc/profiler/perf.h create mode 100644 torch/csrc/profiler/python/init.cpp create mode 100644 torch/csrc/profiler/python/init.h create mode 100644 torch/csrc/profiler/python/pybind.h rename torch/csrc/profiler/{ => standalone}/execution_graph_observer.cpp (96%) rename torch/csrc/profiler/{ => standalone}/execution_graph_observer.h (100%) rename torch/csrc/profiler/{ => standalone}/itt_observer.cpp (89%) rename torch/csrc/profiler/{ => standalone}/itt_observer.h (100%) rename torch/csrc/profiler/{ => standalone}/nvtx_observer.cpp (95%) rename torch/csrc/profiler/{ => standalone}/nvtx_observer.h (100%) create mode 100644 torch/csrc/profiler/stubs/base.cpp create mode 100644 torch/csrc/profiler/stubs/base.h rename torch/csrc/profiler/{ => stubs}/cuda.cpp (94%) rename torch/csrc/profiler/{ => stubs}/itt.cpp (96%) delete mode 100644 torch/csrc/utils/disallow_copy.h create mode 100644 torch/csrc/utils/nested.cpp create mode 100644 torch/csrc/utils/nested.h create mode 100644 torch/csrc/utils/pybind.cpp create mode 100644 torch/csrc/utils/python_symnode.cpp create mode 100644 torch/csrc/utils/python_symnode.h create mode 100644 torch/cuda/_sanitizer.py delete mode 100644 torch/deploy.h create mode 100644 torch/distributed/_composable/__init__.py create mode 100644 torch/distributed/_composable/_ddp.py create mode 100644 torch/distributed/_composable/checkpoint_activation.py create mode 100644 torch/distributed/_composable/contract.py create mode 100644 torch/distributed/_composable/fully_shard.py create mode 100644 torch/distributed/_composable/replicate.py delete mode 100644 torch/distributed/_shard/checkpoint/filesystem.py delete mode 100644 torch/distributed/_shard/checkpoint/resharding.py delete mode 100644 torch/distributed/_shard/checkpoint/state_dict_loader.py delete mode 100644 torch/distributed/_shard/checkpoint/state_dict_saver.py delete mode 100644 torch/distributed/_shard/checkpoint/storage.py create mode 100644 torch/distributed/_spmd/__init__.py create mode 100644 torch/distributed/_spmd/comm_tensor.py create mode 100644 torch/distributed/_tensor/README.md create mode 100644 torch/distributed/_tensor/__init__.py create mode 100644 torch/distributed/_tensor/api.py create mode 100644 torch/distributed/_tensor/device_mesh.py create mode 100644 torch/distributed/_tensor/dispatch.py create mode 100644 torch/distributed/_tensor/ops/__init__.py create mode 100644 torch/distributed/_tensor/ops/common_rules.py create mode 100644 torch/distributed/_tensor/ops/math_ops.py create mode 100644 torch/distributed/_tensor/ops/matrix_ops.py create mode 100644 torch/distributed/_tensor/ops/pointwise_ops.py create mode 100644 torch/distributed/_tensor/ops/tensor_ops.py create mode 100644 torch/distributed/_tensor/ops/tp_sharding_ops.py create mode 100644 torch/distributed/_tensor/ops/utils.py create mode 100644 torch/distributed/_tensor/ops/view_ops.py create mode 100644 torch/distributed/_tensor/parallel/__init__.py create mode 100644 torch/distributed/_tensor/parallel/_view_with_dim_change.py create mode 100644 torch/distributed/_tensor/parallel/api.py create mode 100644 torch/distributed/_tensor/parallel/fsdp.py create mode 100644 torch/distributed/_tensor/parallel/multihead_attention_tp.py create mode 100644 torch/distributed/_tensor/parallel/style.py create mode 100644 torch/distributed/_tensor/parallel/utils.py create mode 100644 torch/distributed/_tensor/placement_types.py create mode 100644 torch/distributed/_tensor/redistribute.py create mode 100644 
torch/distributed/_tensor/utils.py create mode 100644 torch/distributed/c10d_error_logger.py create mode 100644 torch/distributed/checkpoint/__init__.py rename torch/distributed/{_shard => }/checkpoint/api.py (90%) create mode 100644 torch/distributed/checkpoint/dedup_tensors.py create mode 100644 torch/distributed/checkpoint/default_planner.py create mode 100644 torch/distributed/checkpoint/filesystem.py rename torch/distributed/{_shard => }/checkpoint/metadata.py (75%) create mode 100644 torch/distributed/checkpoint/planner.py create mode 100644 torch/distributed/checkpoint/planner_helpers.py create mode 100644 torch/distributed/checkpoint/resharding.py create mode 100644 torch/distributed/checkpoint/state_dict_loader.py create mode 100644 torch/distributed/checkpoint/state_dict_saver.py create mode 100644 torch/distributed/checkpoint/storage.py create mode 100644 torch/distributed/checkpoint/traverse.py rename torch/distributed/{_shard => }/checkpoint/utils.py (73%) create mode 100644 torch/distributed/elastic/timer/file_based_local_timer.py create mode 100644 torch/distributed/fsdp/_common_utils.py create mode 100644 torch/distributed/fsdp/_exec_order_utils.py create mode 100644 torch/distributed/fsdp/_fsdp_extensions.py create mode 100644 torch/distributed/fsdp/_init_utils.py create mode 100644 torch/distributed/fsdp/_limiter_utils.py create mode 100644 torch/distributed/fsdp/_runtime_utils.py rename torch/distributed/fsdp/{shard_utils.py => _shard_utils.py} (64%) create mode 100644 torch/distributed/fsdp/_state_dict_utils.py create mode 100644 torch/distributed/fsdp/_unshard_param_utils.py create mode 100644 torch/distributed/fsdp/_wrap_utils.py create mode 100644 torch/distributed/fsdp/api.py delete mode 100644 torch/distributed/fsdp/flatten_params_wrapper.py create mode 100644 torch/distributed/logging_handlers.py create mode 100644 torch/distributed/optim/apply_optimizer_in_backward.py delete mode 100644 torch/fx/passes/backends/nvfuser.py create mode 100644 torch/masked/__init__.py rename torch/{_masked => masked}/_docs.py (97%) rename torch/{_masked/__init__.py => masked/_ops.py} (92%) create mode 100644 torch/masked/maskedtensor/__init__.py create mode 100644 torch/masked/maskedtensor/_ops_refs.py create mode 100644 torch/masked/maskedtensor/binary.py create mode 100644 torch/masked/maskedtensor/core.py create mode 100644 torch/masked/maskedtensor/creation.py create mode 100644 torch/masked/maskedtensor/passthrough.py create mode 100644 torch/masked/maskedtensor/reductions.py create mode 100644 torch/masked/maskedtensor/unary.py delete mode 100644 torch/nn/parallel/distributed.pyi create mode 100644 torch/nn/utils/_deprecation_utils.py create mode 100644 torch/onnx/_internal/__init__.py create mode 100644 torch/onnx/_internal/_beartype.py create mode 100644 torch/onnx/_internal/diagnostics/OVERVIEW.md create mode 100644 torch/onnx/_internal/diagnostics/__init__.py create mode 100644 torch/onnx/_internal/diagnostics/_diagnostic.py create mode 100644 torch/onnx/_internal/diagnostics/_rules.py create mode 100644 torch/onnx/_internal/diagnostics/infra/__init__.py create mode 100644 torch/onnx/_internal/diagnostics/infra/_infra.py create mode 100644 torch/onnx/_internal/diagnostics/infra/engine.py create mode 100644 torch/onnx/_internal/diagnostics/infra/formatter.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/__init__.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_address.py create mode 100644 
torch/onnx/_internal/diagnostics/infra/sarif/_artifact.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_artifact_change.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_artifact_content.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_artifact_location.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_attachment.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_code_flow.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_configuration_override.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_conversion.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_edge.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_edge_traversal.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_exception.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_external_properties.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_reference.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_external_property_file_references.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_fix.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_graph.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_graph_traversal.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_invocation.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_location.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_location_relationship.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_logical_location.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_message.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_multiformat_message_string.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_node.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_notification.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_physical_location.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_property_bag.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_rectangle.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_region.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_replacement.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_reporting_configuration.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_reference.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_reporting_descriptor_relationship.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_result.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_result_provenance.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_run.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_run_automation_details.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_sarif_log.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_special_locations.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_stack.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_stack_frame.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_suppression.py 
create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_thread_flow_location.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_tool.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_tool_component.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_tool_component_reference.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_translation_metadata.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_version_control_details.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_web_request.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/_web_response.py create mode 100644 torch/onnx/_internal/diagnostics/infra/sarif/version.py create mode 100644 torch/onnx/_internal/diagnostics/infra/utils.py create mode 100644 torch/onnx/_internal/diagnostics/rules.yaml create mode 100644 torch/onnx/_internal/jit_utils.py create mode 100644 torch/onnx/_internal/onnx_proto_utils.py create mode 100644 torch/onnx/_internal/registration.py create mode 100644 torch/onnx/symbolic_opset17.py delete mode 100644 torch/onnx/symbolic_registry.py create mode 100644 torch/profiler/_memory_profiler.py create mode 100644 torch/signal/__init__.py create mode 100644 torch/signal/windows/__init__.py create mode 100644 torch/signal/windows/windows.py create mode 100644 torch/sparse/matmul.py create mode 100644 torch/testing/_internal/distributed/_tensor/__init__.py create mode 100644 torch/testing/_internal/distributed/_tensor/common_dtensor.py create mode 100644 torch/testing/_internal/distributed/_tensor/dtensor_lagging_op_db.py rename functorch/codegen/gen_functorch_lagging_op_db.py => torch/testing/_internal/distributed/_tensor/gen_dtensor_lagging_op_db.py (57%) create mode 100644 torch/testing/_internal/distributed/multi_threaded_pg.py create mode 100644 torch/testing/_internal/inductor_utils.py create mode 100644 torch/testing/_internal/opinfo/definitions/__init__.py create mode 100644 torch/testing/_internal/opinfo/definitions/_masked.py create mode 100644 torch/testing/_internal/opinfo/definitions/fft.py create mode 100644 torch/testing/_internal/opinfo/definitions/linalg.py create mode 100644 torch/testing/_internal/opinfo/definitions/signal.py create mode 100644 torch/testing/_internal/opinfo/definitions/special.py create mode 100644 torch/testing/_internal/opinfo/refs.py delete mode 100644 torch/testing/_legacy.py create mode 100644 torch/utils/backend_registration.py create mode 100644 torch/utils/cpp_backtrace.py delete mode 100644 torch/utils/data/communication/__init__.py delete mode 100644 torch/utils/data/communication/eventloop.py delete mode 100644 torch/utils/data/communication/iter.py delete mode 100644 torch/utils/data/communication/map.py delete mode 100644 torch/utils/data/communication/messages.py delete mode 100644 torch/utils/data/communication/protocol.py delete mode 100644 torch/utils/data/communication/queue.py delete mode 100644 torch/utils/data/dataloader_experimental.py delete mode 100644 ubsan.supp diff --git a/.bazelrc b/.bazelrc index ce8406b58aaa..f8ff2215f2d6 100644 --- a/.bazelrc +++ b/.bazelrc @@ -1,4 +1,4 @@ -build --cxxopt=--std=c++14 +build --cxxopt=--std=c++17 build --copt=-I. # Bazel does not support including its cc_library targets as system # headers. 
We work around this for generated code diff --git a/.circleci/README.md b/.circleci/README.md new file mode 100644 index 000000000000..e2429b4d1f03 --- /dev/null +++ b/.circleci/README.md @@ -0,0 +1,468 @@ +Warning +======= + +Contents may be out of date. Our CircleCI workflows are gradually being migrated to Github actions. + +Structure of CI +=============== + +setup job: +1. Does a git checkout +2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why? + We don't always do a Git checkout on all subjobs, but we usually + still want to be able to call scripts one way or another in a subjob. + Persisting files this way lets us have access to them without doing a + checkout. This workspace is conventionally mounted on `~/workspace` + (this is distinguished from `~/project`, which is the conventional + working directory that CircleCI will default to starting your jobs + in.) +3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so + we can determine in subjobs if we should actually run the jobs or + not, even if there isn't a Git checkout. + + +CircleCI configuration generator +================================ + +One may no longer make changes to the `.circleci/config.yml` file directly. +Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory. + + +Usage +---------- + +1. Make changes to these scripts. +2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`. + +You'll see a build failure on GitHub if the scripts don't agree with the checked-in version. + + +Motivation +---------- + +These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix. +The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content. + +Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate +multiple parts of the file. + +* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets + +Also see https://github.com/pytorch/pytorch/issues/17038 + + +Future direction +---------------- + +### Declaring sparse config subsets +See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747): + +In contrast with a full recursive tree traversal of configuration dimensions, +> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this. +---------------- +---------------- + +# How do the binaries / nightlies / releases work? + +### What is a binary? + +A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source. + +A **binary configuration** is a collection of + +* release or nightly + * releases are stable, nightlies are beta and built every night +* python version + * linux: 3.7m (mu is wide unicode or something like that. 
It usually doesn't matter but you should know that it exists) + * macos: 3.7, 3.8 + * windows: 3.7, 3.8 +* cpu version + * cpu, cuda 9.0, cuda 10.0 + * The supported cuda versions occasionally change +* operating system + * Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu + * MacOS + * Windows - these are built on Azure pipelines +* devtoolset version (gcc compiler version) + * This only matters on Linux because only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string + +### Where are the binaries? + +The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months. + +We have 3 types of binary packages + +* pip packages - nightlies are stored on s3 (pip install -f \). releases are stored in a pip repo (pip install torch) (ask Soumith about this) +* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix +* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only + * shared with dependencies (the only supported option for Windows) + * static with dependencies + * shared without dependencies + * static without dependencies + +All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release) + +# CircleCI structure of the binaries + +Some quick vocab: + +* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows. +* **jobs** are a sequence of '**steps**' +* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps* +* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps. + +## How are the workflows structured? + +The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration + +1. binary_builds + 1. every day midnight EST + 2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml + 3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml + 4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a + 1. binary_linux_conda_3.7_cpu_build + 1. Builds the package. On linux jobs this uses the 'docker executor'. + 2. Persists the package to the workspace + 2. binary_linux_conda_3.7_cpu_test + 1. Loads the package from the workspace + 2. Spins up a docker image (on Linux), mapping the package and code repos into the docker + 3. Runs some smoke tests in the docker + 4. (Actually, for macos this is a step rather than a separate job) + 3. binary_linux_conda_3.7_cpu_upload + 1. Logs in to aws/conda + 2. Uploads the package +2. update_s3_htmls + 1. every day 5am EST + 2.
https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml + 3. See below for what these are for and why they're needed + 4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3 +3. binarysmoketests + 1. every day + 2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml + 3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a + 1. smoke_linux_conda_3.7_cpu + 1. Downloads the package from the cloud, e.g. using the official pip or conda instructions + 2. Runs the smoke tests + +## How are the jobs structured? + +The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts . + +* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml + * binary_linux_build.sh + * binary_linux_test.sh + * binary_linux_upload.sh +* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml + * binary_macos_build.sh + * binary_macos_test.sh + * binary_macos_upload.sh +* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml + * These delegate from the pytorch/builder repo + * https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh + * https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh +* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml + * These delegate from the pytorch/builder repo + * https://github.com/pytorch/builder/blob/master/run_tests.sh + * https://github.com/pytorch/builder/blob/master/smoke_test.sh + * https://github.com/pytorch/builder/blob/master/check_binary.sh +* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml + * binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh + * binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps. + * binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables + * binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image + +### **Why do the steps all refer to scripts?** + +CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems. + +### **What is binary_run_in_docker for?** + +So, CircleCI has several executor types: macos, machine, and docker are the ones we use. 
The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus + +* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor +* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs +* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use +* linux smoke test jobs use the machine executor for the same reason as the linux test jobs + +binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs + +### **Why does binary_checkout also checkout pytorch? Why shouldn't it?** + +We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where. + +# Code structure of the binaries (circleci agnostic) + +## Overview + +The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is + + +``` +# All code needed to set-up environments for build code to run in, +# but only code that is specific to the current CI system +pytorch/pytorch +- .circleci/ # Folder that holds all circleci related stuff + - config.yml # GENERATED file that actually controls all circleci behavior + - verbatim-sources # Used to generate job/workflow sections in ^ + - scripts/ # Code needed to prepare circleci environments for binary build scripts +- setup.py # Builds pytorch. This is wrapped in pytorch/builder +- cmake files # used in normal building of pytorch +# All code needed to prepare a binary build, given an environment +# with all the right variables/packages/paths. +pytorch/builder +# Given an installed binary and a proper python env, runs some checks +# to make sure the binary was built the proper way. Checks things like +# the library dependencies, symbols present, etc. +- check_binary.sh +# Given an installed binary, runs python tests to make sure everything +# is in order. These should be de-duped. Right now they both run smoke +# tests, but are called from different places. Usually just call some +# import statements, but also has overlap with check_binary.sh above +- run_tests.sh +- smoke_test.sh +# Folders that govern how packages are built. 
See paragraphs below +- conda/ + - build_pytorch.sh # Entrypoint. Delegates to proper conda build folder + - switch_cuda_version.sh # Switches the active CUDA installation in Docker + - pytorch-nightly/ # Build-folder +- manywheel/ + - build_cpu.sh # Entrypoint for cpu builds + - build.sh # Entrypoint for CUDA builds + - build_common.sh # Actual build script that ^^ call into +- wheel/ + - build_wheel.sh # Entrypoint for wheel builds +- windows/ + - build_pytorch.bat # Entrypoint for wheel builds on Windows +``` + +Every type of package has an entrypoint build script that handles all the important logic. + +## Conda + +Linux, MacOS and Windows use the same code flow for the conda builds. + +Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html + +Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing. +tl;dr on conda-build is + +1. Creates a brand new conda environment, based off of deps in the meta.yaml + 1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml + 2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is. +2. Calls build.sh in the environment +3. Copies the finished package to a new conda env, also specified by the meta.yaml +4. Runs some simple import tests (if specified in the meta.yaml) +5. Saves the finished package as a tarball + +The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths. + +The entrypoint file `builder/conda/build_conda.sh` is complicated because + +* It works for Linux, MacOS and Windows + * The mac builds used to create their own environments, since they all used to be on the same machine. There’s now a lot of extra logic to handle conda envs. This extra machinery could be removed +* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed. + +## Manywheels (linux pip and libtorch packages) + +Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant. + +`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh` + +The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because + +* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unnecessary folders and movements here and there. + * The script is never used this way anymore. This extra machinery could be removed. +* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff + * The script is never used this way anymore. This extra machinery could be removed. +* This also builds libtorch packages + * This should really be separate. libtorch packages are c++ only and have no python.
They should not share infra with all the python specific stuff in this file. +* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed. + +## Wheels (MacOS pip and libtorch packages) + +The entrypoint file `builder/wheel/build_wheel.sh` is complicated because + +* The mac builds used to all run on one machine (we didn’t have autoscaling mac machines till circleci). So this script handled siloing itself by setting-up and tearing-down its build env and siloing itself into its own build directory. + * The script is never used this way anymore. This extra machinery could be removed. +* This also builds libtorch packages + * Ditto the comment above. This should definitely be separated out. + +Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda. + +## Windows Wheels (Windows pip and libtorch packages) + +The entrypoint file `builder/windows/build_pytorch.bat` is complicated because + +* This used to handle building for several different python versions at the same time. This is why there are loops everywhere + * The script is never used this way anymore. This extra machinery could be removed. +* This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff + * The script is never used this way anymore. This extra machinery could be removed. +* This also builds libtorch packages + * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file. + +Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda. + +## General notes + +### Note on run_tests.sh, smoke_test.sh, and check_binary.sh + +* These should all be consolidated +* These must run on all OS types: MacOS, Linux, and Windows +* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn’t mess anything up. +* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package. + +### Note on libtorch + +Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this + +* It’s confusing. Most of those scripts deal with python specifics. +* The extra conditionals everywhere severely complicate the wheel build scripts +* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script) + +### Note on docker images / Dockerfiles + +All linux builds occur in docker images. The docker images are + +* pytorch/conda-cuda + * Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. 
/usr/local/cuda-10.0 to enable different CUDA builds + * Also used for cpu builds +* pytorch/manylinux-cuda90 +* pytorch/manylinux-cuda100 + * Also used for cpu builds + +The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now. + +### General Python + +* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2 + +# How to manually rebuild the binaries + +tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159 + +Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want. + +## How to test changes to the binaries via .circleci + +Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this. + +```sh +# Make your changes +touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml +# Regenerate the yaml, has to be in python 3.7 +.circleci/regenerate.sh +# Make a commit +git add .circleci * +git commit -m "My real changes" +git push origin my_branch +# Now hardcode the jobs that you want in the .circleci/config.yml workflows section +# Also eliminate ensure-consistency and should_run_job checks +# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d +# Make a commit you won't keep +git add .circleci +git commit -m "[DO NOT LAND] testing binaries for above changes" +git push origin my_branch +# Now you need to make some changes to the first commit. +git rebase -i HEAD~2 # mark the first commit as 'edit' +# Make the changes +touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml +.circleci/regenerate.sh +# Amend the commit and continue the rebase +git add .circleci +git commit --amend +git rebase --continue +# Update the PR, need to force since the commits are different now +git push origin my_branch --force +``` + +The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
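+
+Before pushing the "real changes" commit, it can also help to confirm locally that the regenerated config agrees with what you are committing, since GitHub will fail the build when the generators and the checked-in `config.yml` disagree (the temporary hardcoded commit will of course diverge, which is why the flow above removes the ensure-consistency check). A minimal sketch of such a check, assuming you are at the repo root with python 3.7 available; this is an illustration, not a checked-in script:
+
+```sh
+# Regenerate the CircleCI config and fail loudly if the checked-in
+# .circleci/config.yml no longer matches the generator output.
+set -e
+.circleci/regenerate.sh
+if ! git diff --exit-code .circleci/config.yml; then
+  echo "config.yml is out of date; commit the regenerated file" >&2
+  exit 1
+fi
+```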
+ +## How to build a binary locally + +### Linux + +You can build Linux binaries locally easily using docker. + +```sh +# Run the docker +# Use the correct docker image, pytorch/conda-cuda used here as an example +# +# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the +# machine that you're running the command on) accessible to the docker +# container at path/to/bar. So if you then run `touch path/to/bar/baz` +# in the docker container then you will see path/to/foo/baz on your local +# machine. You could also clone the pytorch and builder repos in the docker. +# +# If you know how, add ccache as a volume too and speed up everything +docker run \ + -v your/pytorch/repo:/pytorch \ + -v your/builder/repo:/builder \ + -v where/you/want/packages/to/appear:/final_pkgs \ + -it pytorch/conda-cuda /bin/bash +# Export whatever variables are important to you. All variables that you'd +# possibly need are in .circleci/scripts/binary_populate_env.sh +# You should probably always export at least these 3 variables +export PACKAGE_TYPE=conda +export DESIRED_PYTHON=3.7 +export DESIRED_CUDA=cpu +# Call the entrypoint +# `|& tee foo.log` just copies all stdout and stderr output to foo.log +# The builds generate lots of output so you probably need this when +# building locally. +/builder/conda/build_pytorch.sh |& tee build_output.log +``` + +**Building CUDA binaries on docker** + +You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though it’s gonna take a long time). + +For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast. + +### MacOS + +There’s no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If you’re trying to repro an error on a Mac build in .circleci and you can’t seem to repro locally, then my best advice is actually to iterate on .circleci :/ + +But if you want to try, then I’d recommend + +```sh +# Create a new terminal +# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you +# know how to do +# Install a new miniconda +# First remove any other python or conda installation from your PATH +# Always install miniconda 3, even if building for Python <3 +new_conda="~/my_new_conda" +conda_sh="$new_conda/install_miniconda.sh" +curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh +chmod +x "$conda_sh" +"$conda_sh" -b -p "$MINICONDA_ROOT" +rm -f "$conda_sh" +export PATH="~/my_new_conda/bin:$PATH" +# Create a clean python env +# All MacOS builds use conda to manage the python env and dependencies +# that are built with, even the pip packages +conda create -yn binary python=2.7 +conda activate binary +# Export whatever variables are important to you. All variables that you'd +# possibly need are in .circleci/scripts/binary_populate_env.sh +# You should probably always export at least these 3 variables +export PACKAGE_TYPE=conda +export DESIRED_PYTHON=3.7 +export DESIRED_CUDA=cpu +# Call the entrypoint you want +path/to/builder/wheel/build_wheel.sh +``` + +N.B. installing a brand new miniconda is important. This has to do with how conda installations work. 
See the “General Python” section above, but tldr; is that + +1. You make the ‘conda’ command accessible by prepending `path/to/conda_root/bin` to your PATH. +2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH` +3. Now say you (or some code that you ran) call python executable `foo` + 1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected. + 2. But if you forgot to installed `foo` in `new_env` but happened to previously install it in your root conda env (called ‘base’), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version! + +Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe. + +### Windows + +TODO: fill in diff --git a/.circleci/cimodel/data/simple/ios_definitions.py b/.circleci/cimodel/data/simple/ios_definitions.py index a01a2db8229f..42aac5d90127 100644 --- a/.circleci/cimodel/data/simple/ios_definitions.py +++ b/.circleci/cimodel/data/simple/ios_definitions.py @@ -1,4 +1,5 @@ from cimodel.data.simple.util.versions import MultiPartVersion +from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude import cimodel.lib.miniutils as miniutils XCODE_VERSION = MultiPartVersion([12, 5, 1]) @@ -11,7 +12,7 @@ def __init__(self, name, custom_build_name=""): def render(self): extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else [] - return "_".join([self.name] + extra_parts) + return "-".join([self.name] + extra_parts).replace("_", "-") def get_platform(arch_variant_name): @@ -25,30 +26,25 @@ def __init__(self, xcode_version, arch_variant, is_org_member_context=True, extr self.is_org_member_context = is_org_member_context self.extra_props = extra_props - def gen_name_parts(self, with_version_dots): - - version_parts = self.xcode_version.render_dots_or_parts(with_version_dots) - build_variant_suffix = "_".join([self.arch_variant.render(), "build"]) - + def gen_name_parts(self): + version_parts = self.xcode_version.render_dots_or_parts("-") + build_variant_suffix = self.arch_variant.render() return [ - "pytorch", "ios", ] + version_parts + [ build_variant_suffix, ] def gen_job_name(self): - return "_".join(self.gen_name_parts(False)) + return "-".join(self.gen_name_parts()) def gen_tree(self): - platform_name = get_platform(self.arch_variant.name) - props_dict = { - "build_environment": "-".join(self.gen_name_parts(True)), + "name": self.gen_job_name(), + "build_environment": self.gen_job_name(), "ios_arch": self.arch_variant.name, "ios_platform": platform_name, - "name": self.gen_job_name(), } if self.is_org_member_context: @@ -57,30 +53,28 @@ def gen_tree(self): if self.extra_props: props_dict.update(self.extra_props) + props_dict["filters"] = gen_filter_dict_exclude() + return [{"pytorch_ios_build": props_dict}] WORKFLOW_DATA = [ IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False, extra_props={ "lite_interpreter": miniutils.quote(str(int(True)))}), - IOSJob(XCODE_VERSION, ArchVariant("x86_64", "full_jit"), is_org_member_context=False, extra_props={ - "lite_interpreter": miniutils.quote(str(int(False)))}), - IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={ - "lite_interpreter": miniutils.quote(str(int(True)))}), - IOSJob(XCODE_VERSION, ArchVariant("arm64", 
"metal"), extra_props={ - "use_metal": miniutils.quote(str(int(True))), - "lite_interpreter": miniutils.quote(str(int(True)))}), - IOSJob(XCODE_VERSION, ArchVariant("arm64", "full_jit"), extra_props={ - "lite_interpreter": miniutils.quote(str(int(False)))}), - IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={ - "op_list": "mobilenetv2.yaml", - "lite_interpreter": miniutils.quote(str(int(True)))}), + # IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={ + # "lite_interpreter": miniutils.quote(str(int(True)))}), + # IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={ + # "use_metal": miniutils.quote(str(int(True))), + # "lite_interpreter": miniutils.quote(str(int(True)))}), + # IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom-ops"), extra_props={ + # "op_list": "mobilenetv2.yaml", + # "lite_interpreter": miniutils.quote(str(int(True)))}), IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={ "use_coreml": miniutils.quote(str(int(True))), "lite_interpreter": miniutils.quote(str(int(True)))}), - IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={ - "use_coreml": miniutils.quote(str(int(True))), - "lite_interpreter": miniutils.quote(str(int(True)))}), + # IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={ + # "use_coreml": miniutils.quote(str(int(True))), + # "lite_interpreter": miniutils.quote(str(int(True)))}), ] diff --git a/.circleci/cimodel/data/simple/macos_definitions.py b/.circleci/cimodel/data/simple/macos_definitions.py index 371c8b694cf3..fff146dbf6bb 100644 --- a/.circleci/cimodel/data/simple/macos_definitions.py +++ b/.circleci/cimodel/data/simple/macos_definitions.py @@ -11,10 +11,14 @@ def gen_tree(self): non_phase_parts = ["pytorch", "macos", self.os_version, "py3"] extra_name_list = [name for name, exist in self.extra_props.items() if exist] - full_job_name_list = non_phase_parts + extra_name_list + [ - 'build' if self.is_build else None, - 'test' if self.is_test else None, - ] + full_job_name_list = ( + non_phase_parts + + extra_name_list + + [ + "build" if self.is_build else None, + "test" if self.is_test else None, + ] + ) full_job_name = "_".join(list(filter(None, full_job_name_list))) @@ -41,10 +45,8 @@ def gen_tree(self): "10_13", is_build=True, is_test=True, - extra_props=tuple({ - "lite_interpreter": True - }.items()), - ) + extra_props=tuple({"lite_interpreter": True}.items()), + ), ] diff --git a/.circleci/cimodel/data/simple/nightly_ios.py b/.circleci/cimodel/data/simple/nightly_ios.py index 941a61a73b91..f75bcb4bfe21 100644 --- a/.circleci/cimodel/data/simple/nightly_ios.py +++ b/.circleci/cimodel/data/simple/nightly_ios.py @@ -15,7 +15,7 @@ def __init__(self, def get_phase_name(self): return "upload" if self.is_upload else "build" - def get_common_name_pieces(self, with_version_dots): + def get_common_name_pieces(self, sep): extra_name_suffix = [self.get_phase_name()] if self.is_upload else [] @@ -24,7 +24,7 @@ def get_common_name_pieces(self, with_version_dots): common_name_pieces = [ "ios", ] + extra_name + [ - ] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [ + ] + ios_definitions.XCODE_VERSION.render_dots_or_parts(sep) + [ "nightly", self.variant, "build", @@ -33,14 +33,14 @@ def get_common_name_pieces(self, with_version_dots): return common_name_pieces def gen_job_name(self): - return "_".join(["pytorch"] + self.get_common_name_pieces(False)) + return "_".join(["pytorch"] + self.get_common_name_pieces(None)) def 
gen_tree(self): build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else [] props_dict = { - "build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)), + "build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(".")), "requires": extra_requires, "context": "org-member", "filters": {"branches": {"only": "nightly"}}, diff --git a/.circleci/cimodel/data/simple/util/branch_filters.py b/.circleci/cimodel/data/simple/util/branch_filters.py index ba4e00a059ef..e87d0045636d 100644 --- a/.circleci/cimodel/data/simple/util/branch_filters.py +++ b/.circleci/cimodel/data/simple/util/branch_filters.py @@ -12,6 +12,9 @@ RC_PATTERN = r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/" +MAC_IOS_EXCLUSION_LIST = ["nightly", "postnightly"] + + def gen_filter_dict( branches_list=NON_PR_BRANCH_LIST, tags_list=None @@ -26,3 +29,11 @@ def gen_filter_dict( if tags_list is not None: filter_dict["tags"] = {"only": tags_list} return filter_dict + + +def gen_filter_dict_exclude(branches_list=MAC_IOS_EXCLUSION_LIST): + return { + "branches": { + "ignore": branches_list, + }, + } diff --git a/.circleci/cimodel/data/simple/util/versions.py b/.circleci/cimodel/data/simple/util/versions.py index 53d3a837248c..518feb2e3869 100644 --- a/.circleci/cimodel/data/simple/util/versions.py +++ b/.circleci/cimodel/data/simple/util/versions.py @@ -1,3 +1,6 @@ +from typing import Optional + + class MultiPartVersion: def __init__(self, parts, prefix=""): self.parts = parts @@ -13,14 +16,11 @@ def prefixed_parts(self): else: return [self.prefix] - def render_dots(self): - return ".".join(self.prefixed_parts()) - - def render_dots_or_parts(self, with_dots): - if with_dots: - return [self.render_dots()] - else: + def render_dots_or_parts(self, sep: Optional[str] = None): + if sep is None: return self.prefixed_parts() + else: + return [sep.join(self.prefixed_parts())] class CudaVersion(MultiPartVersion): diff --git a/.circleci/config.yml b/.circleci/config.yml index 4ca08b1b7c18..0d353fb2a32e 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -570,6 +570,198 @@ jobs: paths: - miniconda3 + mac_build: + parameters: + build-environment: + type: string + description: Top-level label for what's being built/tested. + xcode-version: + type: string + default: "13.3.1" + description: What xcode version to build with. 
+ build-generates-artifacts: + type: boolean + default: true + description: if the build generates build artifacts + python-version: + type: string + default: "3.8" + macos: + xcode: << parameters.xcode-version >> + resource_class: medium + environment: + BUILD_ENVIRONMENT: << parameters.build-environment >> + AWS_REGION: us-east-1 + steps: + + - checkout + - run_brew_for_macos_build + + - run: + name: Install sccache + command: | + sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${BASH_ENV}" + echo "export SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${BASH_ENV}" + + set +x + echo "export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}" + echo "export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}" + set -x + + - run: + name: Get workflow job id + command: | + echo "export OUR_GITHUB_JOB_ID=${CIRCLE_WORKFLOW_JOB_ID}" >> "${BASH_ENV}" + + - run: + name: Build + command: | + set -x + + git submodule sync + git submodule update --init --recursive --depth 1 --jobs 0 + + export PATH="/usr/local/bin:$PATH" + export WORKSPACE_DIR="${HOME}/workspace" + mkdir -p "${WORKSPACE_DIR}" + MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-MacOSX-x86_64.sh" + if [ << parameters.python-version >> == 3.9.12 ]; then + MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh" + fi + + # If a local installation of conda doesn't exist, we download and install conda + if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then + mkdir -p "${WORKSPACE_DIR}" + curl --retry 3 ${MINICONDA_URL} -o "${WORKSPACE_DIR}"/miniconda3.sh + bash "${WORKSPACE_DIR}"/miniconda3.sh -b -p "${WORKSPACE_DIR}"/miniconda3 + fi + export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH" + # shellcheck disable=SC1091 + source "${WORKSPACE_DIR}"/miniconda3/bin/activate + + brew link --force libomp + + echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}" + .jenkins/pytorch/macos-build.sh + + - when: + condition: << parameters.build-generates-artifacts >> + steps: + - run: + name: Archive artifacts into zip + command: | + zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json + cp artifacts.zip /Users/distiller/workspace + + - persist_to_workspace: + root: /Users/distiller/workspace/ + paths: + - miniconda3 + - artifacts.zip + + - store_artifacts: + path: /Users/distiller/project/artifacts.zip + + mac_test: + parameters: + build-environment: + type: string + shard-number: + type: string + num-test-shards: + type: string + xcode-version: + type: string + test-config: + type: string + default: 'default' + + macos: + xcode: << parameters.xcode-version >> + environment: + GIT_DEFAULT_BRANCH: 'master' + BUILD_ENVIRONMENT: << parameters.build-environment >> + TEST_CONFIG: << parameters.test-config >> + SHARD_NUMBER: << parameters.shard-number >> + NUM_TEST_SHARDS: << parameters.num-test-shards >> + PYTORCH_RETRY_TEST_CASES: 1 + PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 + steps: + - checkout + - attach_workspace: + at: ~/workspace + - run_brew_for_macos_build + - run: + name: Test + no_output_timeout: "2h" + command: | + set -x + + git submodule sync --recursive + git submodule update --init --recursive + + mv ~/workspace/artifacts.zip . 
+ unzip artifacts.zip + + export IN_CI=1 + + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + + export PATH="/usr/local/bin:$PATH" + export WORKSPACE_DIR="${HOME}/workspace" + mkdir -p "${WORKSPACE_DIR}" + + export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH" + source "${WORKSPACE_DIR}"/miniconda3/bin/activate + + # sanitize the input commit message and PR body here: + + # trim all new lines from commit messages to avoid issues with batch environment + # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028 + COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}" + + # then trim all special characters like single and double quotes to avoid unescaped inputs to + # wreak havoc internally + export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}" + + python3 -mpip install dist/*.whl + .jenkins/pytorch/macos-test.sh + - run: + name: Copy files for uploading test stats + command: | + # copy into a parent folder test-reports because we can't use CIRCLEI_BUILD_NUM in path when persisting to workspace + mkdir -p test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports + cp -r test/test-reports test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports + - store_test_results: + path: test/test-reports + - persist_to_workspace: + root: /Users/distiller/project/ + paths: + - test-reports + + upload_test_stats: + machine: # executor type + image: ubuntu-2004:202010-01 # # recommended linux image - includes Ubuntu 20.04, docker 19.03.13, docker-compose 1.27.4 + steps: + - checkout + - attach_workspace: + at: ~/workspace + - run: + name: upload + command: | + set -ex + if [ -z ${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} ]; then + echo "No credentials found, cannot upload test stats (are you on a fork?)" + exit 0 + fi + cp -r ~/workspace/test-reports/* ~/project + pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0 + export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} + export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} + # i dont know how to get the run attempt number for reruns so default to 1 + python3 -m tools.stats.upload_test_stats --workflow-run-id "${CIRCLE_WORKFLOW_JOB_ID}" --workflow-run-attempt 1 --head-branch << pipeline.git.branch >> --circleci pytorch_macos_10_13_py3_test: environment: BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test @@ -795,10 +987,43 @@ jobs: macos: xcode: "12.5.1" steps: - - checkout + - run: + name: checkout with retry + command: | + checkout() { + set -ex + # Workaround old docker images with incorrect $HOME + # check https://github.com/docker/docker/issues/2968 for details + if [ "${HOME}" = "/" ] + then + export HOME=$(getent passwd $(id -un) | cut -d: -f6) + fi + + mkdir -p ~/.ssh + + echo 'github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ== + ' >> ~/.ssh/known_hosts + + # use git+ssh instead of https + git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true + git config --global gc.auto 0 || true + + echo 'Cloning git repository' + mkdir -p '/Users/distiller/project' + cd '/Users/distiller/project' + git clone "$CIRCLE_REPOSITORY_URL" . 
+ echo 'Checking out branch' + git checkout --force -B "$CIRCLE_BRANCH" "$CIRCLE_SHA1" + git --no-pager log --no-color -n 1 --format='HEAD is now at %h %s' + } + + retry () { + $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*) + } + retry checkout - run_brew_for_ios_build - run: - name: Run Fastlane + name: Setup Fastlane no_output_timeout: "1h" command: | set -e @@ -806,20 +1031,6 @@ jobs: cd ${PROJ_ROOT}/ios/TestApp # install fastlane sudo gem install bundler && bundle install - # install certificates - echo ${IOS_CERT_KEY_2022} >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo ${IOS_SIGN_KEY_2022} >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - run: name: Build no_output_timeout: "1h" @@ -877,18 +1088,12 @@ jobs: command: | set -e PROJ_ROOT=/Users/distiller/project - PROFILE=PyTorch_CI_2022 # run the ruby build script if ! [ -x "$(command -v xcodebuild)" ]; then echo 'Error: xcodebuild is not installed.' exit 1 fi - echo ${IOS_DEV_TEAM_ID} - if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then - ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} - else - ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} - fi + ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} if ! [ "$?" -eq "0" ]; then echo 'xcodebuild failed!' 
exit 1 @@ -911,12 +1116,13 @@ jobs: cd ${PROJ_ROOT}/ios/TestApp/benchmark mkdir -p ../models if [ ${USE_COREML_DELEGATE} == 1 ]; then - pip install coremltools==5.0b5 - pip install six + pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0 python coreml_backend.py else - python trace_model.py + cd "${PROJ_ROOT}" + python test/mobile/model_test/gen_test_model.py ios-test fi + cd "${PROJ_ROOT}/ios/TestApp/benchmark" if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then echo "Setting up the TestApp for LiteInterpreter" ruby setup.rb --lite 1 @@ -924,10 +1130,10 @@ jobs: echo "Setting up the TestApp for Full JIT" ruby setup.rb fi - cd ${PROJ_ROOT}/ios/TestApp - instruments -s -devices - if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then - if [ ${USE_COREML_DELEGATE} == 1 ]; then + cd "${PROJ_ROOT}/ios/TestApp" + # instruments -s -devices + if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then + if [ "${USE_COREML_DELEGATE}" == 1 ]; then fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML else fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter @@ -1241,4 +1447,27 @@ workflows: branches: only: - postnightly + - pytorch_ios_build: + build_environment: ios-12-5-1-x86-64 + filters: + branches: + ignore: + - nightly + - postnightly + ios_arch: x86_64 + ios_platform: SIMULATOR + lite_interpreter: "1" + name: ios-12-5-1-x86-64 + - pytorch_ios_build: + build_environment: ios-12-5-1-x86-64-coreml + filters: + branches: + ignore: + - nightly + - postnightly + ios_arch: x86_64 + ios_platform: SIMULATOR + lite_interpreter: "1" + name: ios-12-5-1-x86-64-coreml + use_coreml: "1" when: << pipeline.parameters.run_build >> diff --git a/.circleci/docker/build.sh b/.circleci/docker/build.sh index 6eeee5f1ebaa..ebea9eda85a6 100755 --- a/.circleci/docker/build.sh +++ b/.circleci/docker/build.sh @@ -33,7 +33,7 @@ function extract_all_from_image_name() { if [ "x${name}" = xpy ]; then vername=ANACONDA_PYTHON_VERSION fi - # skip non-conforming fields such as "pytorch", "linux" or "xenial" without version string + # skip non-conforming fields such as "pytorch", "linux" or "bionic" without version string if [ -n "${name}" ]; then extract_version_from_image_name "${name}" "${vername}" fi @@ -46,11 +46,7 @@ if [[ "$image" == *xla* ]]; then exit 0 fi -if [[ "$image" == *-xenial* ]]; then - UBUNTU_VERSION=16.04 -elif [[ "$image" == *-artful* ]]; then - UBUNTU_VERSION=17.10 -elif [[ "$image" == *-bionic* ]]; then +if [[ "$image" == *-bionic* ]]; then UBUNTU_VERSION=18.04 elif [[ "$image" == *-focal* ]]; then UBUNTU_VERSION=20.04 @@ -79,56 +75,17 @@ elif [[ "$image" == *rocm* ]]; then DOCKERFILE="${OS}-rocm/Dockerfile" fi -if [[ "$image" == *xenial* ]] || [[ "$image" == *bionic* ]]; then - CMAKE_VERSION=3.13.5 -fi +# CMake 3.18 is needed to support CUDA17 language variant +CMAKE_VERSION=3.18.5 TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64" -UCX_COMMIT=v1.13.x -UCC_COMMIT=a7bda274b10f8adf5bb729f01da064f4e735fb23 +_UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab +_UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee # It's annoying to rename jobs every time you want to rewrite a # configuration, so we hardcode everything here rather than do it # from scratch case "$image" in - pytorch-linux-xenial-py3.8) - ANACONDA_PYTHON_VERSION=3.8 - GCC_VERSION=7 - # Do not install PROTOBUF, DB, and VISION as a test - ;; - pytorch-linux-xenial-py3.7-gcc7.2) - ANACONDA_PYTHON_VERSION=3.7 - GCC_VERSION=7 - # Do not install PROTOBUF, DB, and VISION as a test - ;; - 
pytorch-linux-xenial-py3.7-gcc7) - ANACONDA_PYTHON_VERSION=3.7 - GCC_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes - ;; - pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7) - CUDA_VERSION=10.2 - CUDNN_VERSION=7 - ANACONDA_PYTHON_VERSION=3.7 - GCC_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes - KATEX=yes - ;; - pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7) - CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names - CUDNN_VERSION=8 - TENSORRT_VERSION=8.0.1.6 - ANACONDA_PYTHON_VERSION=3.7 - GCC_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes - KATEX=yes - ;; pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9) CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names CUDNN_VERSION=8 @@ -139,6 +96,7 @@ case "$image" in DB=yes VISION=yes KATEX=yes + CONDA_CMAKE=yes ;; pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7) CUDA_VERSION=11.6.2 @@ -149,8 +107,9 @@ case "$image" in DB=yes VISION=yes KATEX=yes - UCX_COMMIT=${UCX_COMMIT} - UCC_COMMIT=${UCC_COMMIT} + UCX_COMMIT=${_UCX_COMMIT} + UCC_COMMIT=${_UCC_COMMIT} + CONDA_CMAKE=yes ;; pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7) CUDA_VERSION=11.7.0 @@ -161,22 +120,9 @@ case "$image" in DB=yes VISION=yes KATEX=yes - UCX_COMMIT=${UCX_COMMIT} - UCC_COMMIT=${UCC_COMMIT} - ;; - pytorch-linux-xenial-py3-clang5-asan) - ANACONDA_PYTHON_VERSION=3.7 - CLANG_VERSION=5.0 - PROTOBUF=yes - DB=yes - VISION=yes - ;; - pytorch-linux-xenial-py3-clang7-asan) - ANACONDA_PYTHON_VERSION=3.7 - CLANG_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes + UCX_COMMIT=${_UCX_COMMIT} + UCC_COMMIT=${_UCC_COMMIT} + CONDA_CMAKE=yes ;; pytorch-linux-focal-py3-clang7-asan) ANACONDA_PYTHON_VERSION=3.7 @@ -184,13 +130,7 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes - ;; - pytorch-linux-xenial-py3-clang7-onnx) - ANACONDA_PYTHON_VERSION=3.7 - CLANG_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes + CONDA_CMAKE=yes ;; pytorch-linux-focal-py3-clang10-onnx) ANACONDA_PYTHON_VERSION=3.7 @@ -198,10 +138,11 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes + CONDA_CMAKE=yes ;; - pytorch-linux-xenial-py3-clang5-android-ndk-r19c) + pytorch-linux-focal-py3-clang7-android-ndk-r19c) ANACONDA_PYTHON_VERSION=3.7 - CLANG_VERSION=5.0 + CLANG_VERSION=7 LLVMDEV=yes PROTOBUF=yes ANDROID=yes @@ -209,13 +150,6 @@ case "$image" in GRADLE_VERSION=6.8.3 NINJA_VERSION=1.9.0 ;; - pytorch-linux-xenial-py3.7-clang7) - ANACONDA_PYTHON_VERSION=3.7 - CLANG_VERSION=7 - PROTOBUF=yes - DB=yes - VISION=yes - ;; pytorch-linux-bionic-py3.7-clang9) ANACONDA_PYTHON_VERSION=3.7 CLANG_VERSION=9 @@ -224,6 +158,7 @@ case "$image" in VISION=yes VULKAN_SDK_VERSION=1.2.162.1 SWIFTSHADER=yes + CONDA_CMAKE=yes ;; pytorch-linux-bionic-py3.8-gcc9) ANACONDA_PYTHON_VERSION=3.8 @@ -231,6 +166,7 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes + CONDA_CMAKE=yes ;; pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9) CUDA_VERSION=10.2 @@ -240,6 +176,7 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes + CONDA_CMAKE=yes ;; pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7) CUDA_VERSION=10.2 @@ -249,31 +186,34 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes + CONDA_CMAKE=yes ;; - pytorch-linux-focal-rocm5.1-py3.7) - ANACONDA_PYTHON_VERSION=3.7 + pytorch-linux-focal-rocm5.1-py3.8) + ANACONDA_PYTHON_VERSION=3.8 GCC_VERSION=9 PROTOBUF=yes DB=yes VISION=yes ROCM_VERSION=5.1.1 + CONDA_CMAKE=yes ;; - pytorch-linux-focal-rocm5.2-py3.7) - ANACONDA_PYTHON_VERSION=3.7 + pytorch-linux-focal-rocm5.2-py3.8) + ANACONDA_PYTHON_VERSION=3.8 GCC_VERSION=9 PROTOBUF=yes DB=yes VISION=yes 
ROCM_VERSION=5.2 + CONDA_CMAKE=yes ;; pytorch-linux-focal-py3.7-gcc7) ANACONDA_PYTHON_VERSION=3.7 - CMAKE_VERSION=3.16.9 # Required for precompiled header support GCC_VERSION=7 PROTOBUF=yes DB=yes VISION=yes KATEX=yes + CONDA_CMAKE=yes ;; pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12) ANACONDA_PYTHON_VERSION=3.8 @@ -283,8 +223,6 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes - UCX_COMMIT=${UCX_COMMIT} - UCC_COMMIT=${UCC_COMMIT} ;; pytorch-linux-jammy-cuda11.7-cudnn8-py3.8-clang12) ANACONDA_PYTHON_VERSION=3.8 @@ -294,8 +232,6 @@ case "$image" in PROTOBUF=yes DB=yes VISION=yes - UCX_COMMIT=${UCX_COMMIT} - UCC_COMMIT=${UCC_COMMIT} ;; *) # Catch-all for builds that are not hardcoded. @@ -312,6 +248,10 @@ case "$image" in fi if [[ "$image" == *rocm* ]]; then extract_version_from_image_name rocm ROCM_VERSION + NINJA_VERSION=1.9.0 + fi + if [[ "$image" == *centos7* ]]; then + NINJA_VERSION=1.10.2 fi if [[ "$image" == *gcc* ]]; then extract_version_from_image_name gcc GCC_VERSION @@ -383,10 +323,11 @@ docker build \ --build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \ --build-arg "KATEX=${KATEX:-}" \ --build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \ - --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \ + --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906}" \ --build-arg "IMAGE_NAME=${IMAGE_NAME}" \ --build-arg "UCX_COMMIT=${UCX_COMMIT}" \ --build-arg "UCC_COMMIT=${UCC_COMMIT}" \ + --build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \ -f $(dirname ${DOCKERFILE})/Dockerfile \ -t "$tmp_tag" \ "$@" \ diff --git a/.circleci/docker/centos-rocm/Dockerfile b/.circleci/docker/centos-rocm/Dockerfile index 7c7708d416fe..894f39fe471c 100644 --- a/.circleci/docker/centos-rocm/Dockerfile +++ b/.circleci/docker/centos-rocm/Dockerfile @@ -40,6 +40,7 @@ RUN bash ./install_user.sh && rm install_user.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ARG CONDA_CMAKE COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh @@ -71,6 +72,9 @@ ARG ROCM_VERSION COPY ./common/install_rocm.sh install_rocm.sh RUN bash ./install_rocm.sh RUN rm install_rocm.sh +COPY ./common/install_rocm_magma.sh install_rocm_magma.sh +RUN bash ./install_rocm_magma.sh +RUN rm install_rocm_magma.sh ENV PATH /opt/rocm/bin:$PATH ENV PATH /opt/rocm/hcc/bin:$PATH ENV PATH /opt/rocm/hip/bin:$PATH diff --git a/.circleci/docker/common/install_base.sh b/.circleci/docker/common/install_base.sh index 6724031c0a44..84835d6de50d 100755 --- a/.circleci/docker/common/install_base.sh +++ b/.circleci/docker/common/install_base.sh @@ -68,7 +68,10 @@ install_ubuntu() { sudo \ vim \ jq \ - libtool + libtool \ + vim \ + unzip \ + gdb # Should resolve issues related to various apt package repository cert issues # see: https://github.com/pytorch/pytorch/issues/65931 @@ -126,7 +129,9 @@ install_centos() { opencv-devel \ sudo \ wget \ - vim + vim \ + unzip \ + gdb # Cleanup yum clean all diff --git a/.circleci/docker/common/install_conda.sh b/.circleci/docker/common/install_conda.sh index 49afcb5aef42..84f9538ce124 100755 --- a/.circleci/docker/common/install_conda.sh +++ b/.circleci/docker/common/install_conda.sh @@ -55,8 +55,10 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then # Ensure we run conda in a directory that jenkins has write access to pushd /opt/conda - # Track latest conda update - as_jenkins conda update -y -n base conda + # Prevent conda from updating to 4.14.0, which causes 
docker build failures + # See https://hud.pytorch.org/pytorch/pytorch/commit/754d7f05b6841e555cea5a4b2c505dd9e0baec1d + # Uncomment the below when resolved to track the latest conda update + # as_jenkins conda update -y -n base conda # Install correct Python version as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION" @@ -73,8 +75,6 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then } # Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README - # DO NOT install cmake here as it would install a version newer than 3.13, but - # we want to pin to version 3.13. CONDA_COMMON_DEPS="astunparse pyyaml mkl=2022.0.1 mkl-include=2022.0.1 setuptools cffi future six" if [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then # Install llvm-8 as it is required to compile llvmlite-0.30.0 from source @@ -90,15 +90,20 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} typing_extensions fi + # Use conda cmake in some cases. Conda cmake will be newer than our supported + # min version (3.5 for xenial and 3.10 for bionic), so we only do it in those + # following builds that we know should use conda. Specifically, Ubuntu bionic + # and focal cannot find conda mkl with stock cmake, so we need a cmake from conda + if [ -n "${CONDA_CMAKE}" ]; then + conda_install cmake + fi + # Magma package names are concatenation of CUDA major and minor ignoring revision # I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89 if [ -n "$CUDA_VERSION" ]; then conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch fi - # TODO: This isn't working atm - conda_install nnpack -c killeent - # Install some other packages, including those needed for Python test reporting pip_install -r /opt/conda/requirements-ci.txt diff --git a/.circleci/docker/common/install_cudnn.sh b/.circleci/docker/common/install_cudnn.sh index 1f1c34ea200d..f68fc6946c2e 100644 --- a/.circleci/docker/common/install_cudnn.sh +++ b/.circleci/docker/common/install_cudnn.sh @@ -4,7 +4,13 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then # cuDNN license: https://developer.nvidia.com/cudnn/license_agreement mkdir tmp_cudnn && cd tmp_cudnn CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive" - curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz + if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then + CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive" + curl --retry 3 -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz + else + curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz + fi + tar xf ${CUDNN_NAME}.tar.xz cp -a ${CUDNN_NAME}/include/* /usr/include/ cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/ diff --git a/.circleci/docker/common/install_docs_reqs.sh b/.circleci/docker/common/install_docs_reqs.sh index 1adc9e8009a0..e60171208ae1 100644 --- a/.circleci/docker/common/install_docs_reqs.sh +++ b/.circleci/docker/common/install_docs_reqs.sh @@ -7,10 +7,10 @@ if [ -n "$KATEX" ]; then # Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04) apt-get install -y gpg-agent || : - curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash - + curl --retry 3 -sL https://deb.nodesource.com/setup_12.x | sudo -E bash - sudo apt-get install -y nodejs - curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add - + curl --retry 3 -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add - echo 
"deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list apt-get update diff --git a/.circleci/docker/common/install_protobuf.sh b/.circleci/docker/common/install_protobuf.sh index 9d9f6c40ba0c..4b7a7a6ac23f 100755 --- a/.circleci/docker/common/install_protobuf.sh +++ b/.circleci/docker/common/install_protobuf.sh @@ -12,7 +12,7 @@ install_protobuf_317() { # g++: error: ./../lib64/crti.o: No such file or directory ln -s /usr/lib64 "$pb_dir/lib64" - curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" + curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3 tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz # -j6 to balance memory usage and speed. # naked `-j` seems to use too much memory. diff --git a/.circleci/docker/common/install_rocm.sh b/.circleci/docker/common/install_rocm.sh index ceebd7d60671..7ad0c4f123e1 100644 --- a/.circleci/docker/common/install_rocm.sh +++ b/.circleci/docker/common/install_rocm.sh @@ -2,34 +2,6 @@ set -ex -install_magma() { - # "install" hipMAGMA into /opt/rocm/magma by copying after build - git clone https://bitbucket.org/icl/magma.git - pushd magma - # Fixes memory leaks of magma found while executing linalg UTs - git checkout 5959b8783e45f1809812ed96ae762f38ee701972 - cp make.inc-examples/make.inc.hip-gcc-mkl make.inc - echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc - echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc - echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc - export PATH="${PATH}:/opt/rocm/bin" - if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then - amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'` - else - amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs` - fi - for arch in $amdgpu_targets; do - echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc - done - # hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition - sed -i 's/^FOPENMP/#FOPENMP/g' make.inc - make -f make.gen.hipMAGMA -j $(nproc) - LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda - make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda - popd - mv magma /opt/rocm -} - ver() { printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' 
' '); } @@ -57,7 +29,12 @@ install_ubuntu() { if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then # Add amdgpu repository UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'` - local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu" + local amdgpu_baseurl + if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then + amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu" + else + amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu" + fi echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list fi @@ -66,6 +43,10 @@ install_ubuntu() { ROCM_REPO="xenial" fi + if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then + ROCM_REPO="${UBUNTU_VERSION_NAME}" + fi + # Add rocm repository wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}" @@ -89,8 +70,6 @@ install_ubuntu() { DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS} fi - install_magma - # Cleanup apt-get autoclean && apt-get clean rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* @@ -108,7 +87,16 @@ install_centos() { if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then # Add amdgpu repository - local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64" + local amdgpu_baseurl + if [[ $OS_VERSION == 9 ]]; then + amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/9.0/main/x86_64" + else + if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then + amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/7.9/main/x86_64" + else + amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64" + fi + fi echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo @@ -135,8 +123,6 @@ install_centos() { rocprofiler-dev \ roctracer-dev - install_magma - # Cleanup yum clean all rm -rf /var/cache/yum diff --git a/.circleci/docker/common/install_rocm_magma.sh b/.circleci/docker/common/install_rocm_magma.sh new file mode 100644 index 000000000000..c7b116b93868 --- /dev/null +++ b/.circleci/docker/common/install_rocm_magma.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +set -ex + +# "install" hipMAGMA into /opt/rocm/magma by copying after build +git clone https://bitbucket.org/icl/magma.git +pushd magma +# Fixes memory leaks of magma found while executing linalg UTs +git checkout 5959b8783e45f1809812ed96ae762f38ee701972 +cp make.inc-examples/make.inc.hip-gcc-mkl make.inc +echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc +echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc +echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc +export PATH="${PATH}:/opt/rocm/bin" +if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then + amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'` +else + amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs` +fi +for arch in $amdgpu_targets; do + echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc +done +# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition +sed -i 's/^FOPENMP/#FOPENMP/g' make.inc +make -f make.gen.hipMAGMA -j $(nproc) +LANG=C.UTF-8 
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda +make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda +popd +mv magma /opt/rocm diff --git a/.circleci/docker/common/install_ucc.sh b/.circleci/docker/common/install_ucc.sh index a7b90286a0fb..333e44e6f779 100755 --- a/.circleci/docker/common/install_ucc.sh +++ b/.circleci/docker/common/install_ucc.sh @@ -2,6 +2,12 @@ set -ex +if [[ -d "/usr/local/cuda/" ]]; then + with_cuda=/usr/local/cuda/ +else + with_cuda=no +fi + function install_ucx() { set -ex git clone --recursive https://github.com/openucx/ucx.git @@ -12,6 +18,7 @@ function install_ucx() { ./autogen.sh ./configure --prefix=$UCX_HOME \ --enable-mt \ + --with-cuda=$with_cuda \ --enable-profiling \ --enable-stats time make -j @@ -29,7 +36,7 @@ function install_ucc() { git submodule update --init --recursive ./autogen.sh - ./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-nccl=no + ./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-cuda=$with_cuda time make -j sudo make install diff --git a/.circleci/docker/requirements-ci.txt b/.circleci/docker/requirements-ci.txt index 8b18a1745808..e527d29d4989 100644 --- a/.circleci/docker/requirements-ci.txt +++ b/.circleci/docker/requirements-ci.txt @@ -124,12 +124,17 @@ numba==0.55.2 ; python_version == "3.10" #Pinned versions: 1.9.0 #test that import: +opt-einsum==3.3 +#Description: Python library to optimize tensor contraction order, used in einsum +#Pinned versions: 3.3 +#test that import: test_linalg.py + #pillow #Description: Python Imaging Library fork #Pinned versions: #test that import: -protobuf==3.20.1 +protobuf==3.20.2 #Description: Google’s data interchange format #Pinned versions: 3.20.1 #test that import: test_tensorboard.py @@ -149,8 +154,18 @@ pytest-xdist #Pinned versions: #test that import: +pytest-shard +#Description: plugin spliting up tests in pytest +#Pinned versions: +#test that import: + +pytest-flakefinder==1.1.0 +#Description: plugin for rerunning tests a fixed number of times in pytest +#Pinned versions: 1.1.0 +#test that import: + pytest-rerunfailures -#Description: plugin for rerunning tests in pytest +#Description: plugin for rerunning failure tests in pytest #Pinned versions: #test that import: @@ -164,11 +179,16 @@ pytest-rerunfailures #Pinned versions: #test that import: -#xdoctest +xdoctest==1.0.2 #Description: runs doctests in pytest -#Pinned versions: +#Pinned versions: 1.0.2 #test that import: +pygments==2.12.0 +#Description: support doctest highlighting +#Pinned versions: 2.12.0 +#test that import: the doctests + #PyYAML #Description: data serialization format #Pinned versions: diff --git a/.circleci/docker/ubuntu-cuda/Dockerfile b/.circleci/docker/ubuntu-cuda/Dockerfile index a3a623996ad0..307071c8f4fc 100644 --- a/.circleci/docker/ubuntu-cuda/Dockerfile +++ b/.circleci/docker/ubuntu-cuda/Dockerfile @@ -26,6 +26,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ARG CONDA_CMAKE COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh @@ -118,9 +119,14 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm # Install CUDNN ARG CUDNN_VERSION +ARG CUDA_VERSION COPY ./common/install_cudnn.sh install_cudnn.sh RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi RUN rm install_cudnn.sh +# Delete /usr/local/cuda-11.X/cuda-11.X symlinks +RUN if [ -h 
/usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi +RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi + USER jenkins CMD ["bash"] diff --git a/.circleci/docker/ubuntu-rocm/Dockerfile b/.circleci/docker/ubuntu-rocm/Dockerfile index a994b2e52f23..b9c8feab06cf 100644 --- a/.circleci/docker/ubuntu-rocm/Dockerfile +++ b/.circleci/docker/ubuntu-rocm/Dockerfile @@ -28,6 +28,7 @@ RUN bash ./install_user.sh && rm install_user.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ARG CONDA_CMAKE COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh @@ -64,6 +65,9 @@ ARG ROCM_VERSION COPY ./common/install_rocm.sh install_rocm.sh RUN bash ./install_rocm.sh RUN rm install_rocm.sh +COPY ./common/install_rocm_magma.sh install_rocm_magma.sh +RUN bash ./install_rocm_magma.sh +RUN rm install_rocm_magma.sh ENV PATH /opt/rocm/bin:$PATH ENV PATH /opt/rocm/hcc/bin:$PATH ENV PATH /opt/rocm/hip/bin:$PATH diff --git a/.circleci/docker/ubuntu/Dockerfile b/.circleci/docker/ubuntu/Dockerfile index e86baf0d6690..5f41ed53f954 100644 --- a/.circleci/docker/ubuntu/Dockerfile +++ b/.circleci/docker/ubuntu/Dockerfile @@ -37,6 +37,7 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh # Install conda and other packages (e.g., numpy, pytest) ENV PATH /opt/conda/bin:$PATH ARG ANACONDA_PYTHON_VERSION +ARG CONDA_CMAKE COPY requirements-ci.txt /opt/conda/requirements-ci.txt COPY ./common/install_conda.sh install_conda.sh RUN bash ./install_conda.sh && rm install_conda.sh diff --git a/.circleci/generate_config_yml.py b/.circleci/generate_config_yml.py index e068dd98fd8e..9366f59c465c 100755 --- a/.circleci/generate_config_yml.py +++ b/.circleci/generate_config_yml.py @@ -14,6 +14,7 @@ import cimodel.data.simple.mobile_definitions import cimodel.data.simple.nightly_ios import cimodel.data.simple.anaconda_prune_defintions +import cimodel.data.simple.ios_definitions import cimodel.lib.miniutils as miniutils import cimodel.lib.miniyaml as miniyaml @@ -70,6 +71,7 @@ def write(self, output_filehandle): for line in filter(None, lines): output_filehandle.write(line + "\n") + def _for_all_items(items, functor) -> None: if isinstance(items, list): for item in items: @@ -78,6 +80,7 @@ def _for_all_items(items, functor) -> None: item_type, item = next(iter(items.items())) functor(item_type, item) + def filter_master_only_jobs(items): def _is_main_or_master_item(item): filters = item.get('filters', None) @@ -116,6 +119,7 @@ def _do_filtering(items): _for_all_items(items, _save_requires_if_master) return _do_filtering(items) + def generate_required_docker_images(items): required_docker_images = set() @@ -131,11 +135,13 @@ def _requires_docker_image(item_type, item): _for_all_items(items, _requires_docker_image) return required_docker_images + def gen_build_workflows_tree(): build_workflows_functions = [ cimodel.data.simple.mobile_definitions.get_workflow_jobs, cimodel.data.simple.nightly_ios.get_workflow_jobs, cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs, + cimodel.data.simple.ios_definitions.get_workflow_jobs, ] build_jobs = [f() for f in build_workflows_functions] build_jobs.extend( diff --git a/.circleci/scripts/binary_install_miniconda.sh b/.circleci/scripts/binary_install_miniconda.sh index 43eb006742ae..3541a32ac6bf 100755 --- a/.circleci/scripts/binary_install_miniconda.sh +++ 
b/.circleci/scripts/binary_install_miniconda.sh @@ -31,9 +31,9 @@ fi conda_sh="$workdir/install_miniconda.sh" if [[ "$(uname)" == Darwin ]]; then - curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh else - curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh + curl --retry 3 --retry-all-errors -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh fi chmod +x "$conda_sh" "$conda_sh" -b -p "$MINICONDA_ROOT" diff --git a/.circleci/scripts/binary_ios_build.sh b/.circleci/scripts/binary_ios_build.sh index 6c7674ed510e..4bb5ea28af73 100644 --- a/.circleci/scripts/binary_ios_build.sh +++ b/.circleci/scripts/binary_ios_build.sh @@ -8,7 +8,7 @@ PROJ_ROOT=/Users/distiller/project export TCLLIBPATH="/usr/local/lib" # Install conda -curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh +curl --retry 3 --retry-all-errors -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x ~/conda.sh /bin/bash ~/conda.sh -b -p ~/anaconda export PATH="~/anaconda/bin:${PATH}" diff --git a/.circleci/scripts/binary_ios_test.sh b/.circleci/scripts/binary_ios_test.sh index 3f052175235c..c750dbceca87 100644 --- a/.circleci/scripts/binary_ios_test.sh +++ b/.circleci/scripts/binary_ios_test.sh @@ -1,30 +1,19 @@ #!/bin/bash set -ex -o pipefail +if ! [ "$IOS_PLATFORM" == "SIMULATOR" ]; then + exit 0 +fi + echo "" echo "DIR: $(pwd)" PROJ_ROOT=/Users/distiller/project cd ${PROJ_ROOT}/ios/TestApp # install fastlane sudo gem install bundler && bundle install -# install certificates -echo "${IOS_CERT_KEY_2022}" >> cert.txt -base64 --decode cert.txt -o Certificates.p12 -rm cert.txt -bundle exec fastlane install_root_cert -bundle exec fastlane install_dev_cert -# install the provisioning profile -PROFILE=PyTorch_CI_2022.mobileprovision -PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles -mkdir -pv "${PROVISIONING_PROFILES}" -cd "${PROVISIONING_PROFILES}" -echo "${IOS_SIGN_KEY_2022}" >> cert.txt -base64 --decode cert.txt -o ${PROFILE} -rm cert.txt # run the ruby build script if ! [ -x "$(command -v xcodebuild)" ]; then echo 'Error: xcodebuild is not installed.' 
exit 1 fi -PROFILE=PyTorch_CI_2022 -ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} +ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} diff --git a/.circleci/scripts/binary_ios_upload.sh b/.circleci/scripts/binary_ios_upload.sh index 02037da8e07b..7949dc9170b0 100644 --- a/.circleci/scripts/binary_ios_upload.sh +++ b/.circleci/scripts/binary_ios_upload.sh @@ -33,7 +33,7 @@ fi cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/ # zip the library export DATE="$(date -u +%Y%m%d)" -export IOS_NIGHTLY_BUILD_VERSION="1.13.0.${DATE}" +export IOS_NIGHTLY_BUILD_VERSION="1.14.0.${DATE}" if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then # libtorch_lite_ios_nightly_1.11.0.20210810.zip ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip" @@ -47,7 +47,7 @@ echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt zip -r ${ZIPFILE} install src version.txt LICENSE # upload to aws # Install conda then 'conda install' awscli -curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh +curl --retry 3 --retry-all-errors -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x ~/conda.sh /bin/bash ~/conda.sh -b -p ~/anaconda export PATH="~/anaconda/bin:${PATH}" diff --git a/.circleci/scripts/binary_populate_env.sh b/.circleci/scripts/binary_populate_env.sh index 56c4f556adbb..3294c72024aa 100755 --- a/.circleci/scripts/binary_populate_env.sh +++ b/.circleci/scripts/binary_populate_env.sh @@ -59,7 +59,7 @@ PIP_UPLOAD_FOLDER='nightly/' # We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it export DATE="$(date -u +%Y%m%d)" #TODO: We should be pulling semver version from the base version.txt -BASE_BUILD_VERSION="1.13.0.dev$DATE" +BASE_BUILD_VERSION="1.14.0.dev$DATE" # Change BASE_BUILD_VERSION to git tag when on a git tag # Use 'git -C' to make doubly sure we're in the correct directory for checking # the git tag @@ -76,6 +76,11 @@ if [[ "$(uname)" == 'Darwin' ]] || [[ "$PACKAGE_TYPE" == conda ]]; then else export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA" fi + +if [[ -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then + export PYTORCH_BUILD_VERSION="${PYTORCH_BUILD_VERSION}-with-pypi-cudnn" +fi + export PYTORCH_BUILD_NUMBER=1 @@ -124,9 +129,9 @@ if [[ "${OSTYPE}" == "msys" ]]; then else export DESIRED_DEVTOOLSET="${DESIRED_DEVTOOLSET:-}" fi - +export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" export DATE="$DATE" -export NIGHTLIES_DATE_PREAMBLE=1.13.0.dev +export NIGHTLIES_DATE_PREAMBLE=1.14.0.dev export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION" export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER" export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION" diff --git a/.circleci/scripts/binary_upload.sh b/.circleci/scripts/binary_upload.sh index 2c7cf95a963a..74f238bea528 100755 --- a/.circleci/scripts/binary_upload.sh +++ b/.circleci/scripts/binary_upload.sh @@ -14,6 +14,12 @@ UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly} UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu} UPLOAD_BUCKET="s3://pytorch" BACKUP_BUCKET="s3://pytorch-backup" +BUILD_NAME=${BUILD_NAME:-} + +# this is temporary change to upload pypi-cudnn builds to separate folder +if [[ ${BUILD_NAME} == *with-pypi-cudnn* ]]; then + UPLOAD_SUBFOLDER="${UPLOAD_SUBFOLDER}_pypi_cudnn" +fi DRY_RUN=${DRY_RUN:-enabled} # 
Don't actually do work unless explicit @@ -24,6 +30,11 @@ if [[ "${DRY_RUN}" = "disabled" ]]; then AWS_S3_CP="aws s3 cp" fi +# Sleep 2 minutes between retries for conda upload +retry () { + "$@" || (sleep 5m && "$@") || (sleep 5m && "$@") || (sleep 5m && "$@") || (sleep 5m && "$@") +} + do_backup() { local backup_dir backup_dir=$1 @@ -37,13 +48,14 @@ do_backup() { conda_upload() { ( set -x + retry \ ${ANACONDA} \ - upload \ - ${PKG_DIR}/*.tar.bz2 \ - -u "pytorch-${UPLOAD_CHANNEL}" \ - --label main \ - --no-progress \ - --force + upload \ + ${PKG_DIR}/*.tar.bz2 \ + -u "pytorch-${UPLOAD_CHANNEL}" \ + --label main \ + --no-progress \ + --force ) } diff --git a/.circleci/scripts/driver_update.bat b/.circleci/scripts/driver_update.bat index 46c05475cdba..fb8774366621 100644 --- a/.circleci/scripts/driver_update.bat +++ b/.circleci/scripts/driver_update.bat @@ -1,5 +1,5 @@ set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/452.39-data-center-tesla-desktop-win10-64bit-international.exe" -curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe +curl --retry 3 --retry-all-errors -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe if errorlevel 1 exit /b 1 start /wait 452.39-data-center-tesla-desktop-win10-64bit-international.exe -s -noreboot diff --git a/.circleci/scripts/functorch_doc_push_script.sh b/.circleci/scripts/functorch_doc_push_script.sh new file mode 100755 index 000000000000..aed2a1c451b9 --- /dev/null +++ b/.circleci/scripts/functorch_doc_push_script.sh @@ -0,0 +1,47 @@ +#!/bin/bash +# =================== The following code **should** be executed inside Docker container =================== + +# Install dependencies +sudo apt-get -y update +sudo apt-get -y install expect-dev + +# This is where the local pytorch install in the docker image is located +pt_checkout="/var/lib/jenkins/workspace" +source "$pt_checkout/.jenkins/pytorch/common_utils.sh" +echo "functorch_doc_push_script.sh: Invoked with $*" + +set -ex + +version=${DOCS_VERSION:-nightly} +echo "version: $version" + +# Build functorch docs +pushd $pt_checkout/functorch/docs +pip -q install -r requirements.txt +make html +popd + +git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages +pushd functorch_ghpages + +if [ $version == "master" ]; then + version=nightly +fi + +git rm -rf "$version" || true +mv "$pt_checkout/functorch/docs/build/html" "$version" + +git add "$version" || true +git status +git config user.email "soumith+bot@pytorch.org" +git config user.name "pytorchbot" +# If there aren't changes, don't make a commit; push is no-op +git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true +git status + +if [[ "${WITH_PUSH:-}" == true ]]; then + git push -u origin gh-pages +fi + +popd +# =================== The above code **should** be executed inside Docker container =================== diff --git a/.circleci/scripts/python_doc_push_script.sh b/.circleci/scripts/python_doc_push_script.sh index f9b019ec069b..d255f77c82e8 100755 --- a/.circleci/scripts/python_doc_push_script.sh +++ b/.circleci/scripts/python_doc_push_script.sh @@ -135,6 +135,9 @@ git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true git status if [[ "${WITH_PUSH:-}" == true ]]; then + # push to a temp branch first to trigger CLA check and satisfy branch protections + git push -u origin HEAD:pytorchbot/temp-branch-py -f + sleep 30 git push -u origin "${branch}" fi 
diff --git a/.circleci/scripts/setup_ci_environment.sh b/.circleci/scripts/setup_ci_environment.sh index 8ac4f5b43a9a..42a605cd4445 100755 --- a/.circleci/scripts/setup_ci_environment.sh +++ b/.circleci/scripts/setup_ci_environment.sh @@ -32,7 +32,7 @@ if ! command -v aws >/dev/null; then fi if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then - DRIVER_FN="NVIDIA-Linux-x86_64-515.57.run" + DRIVER_FN="NVIDIA-Linux-x86_64-515.76.run" wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false) nvidia-smi @@ -40,8 +40,8 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then # Taken directly from https://github.com/NVIDIA/nvidia-docker # Add the package repositories distribution=$(. /etc/os-release;echo "$ID$VERSION_ID") - curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - - curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list + curl -s -L --retry 3 --retry-all-errors https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - + curl -s -L --retry 3 --retry-all-errors "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list retry sudo apt-get update -qq # Necessary to get the `--gpus` flag to function within docker diff --git a/.circleci/scripts/setup_linux_system_environment.sh b/.circleci/scripts/setup_linux_system_environment.sh index ce64076e2d64..780f7c1bd379 100755 --- a/.circleci/scripts/setup_linux_system_environment.sh +++ b/.circleci/scripts/setup_linux_system_environment.sh @@ -2,7 +2,7 @@ set -eux -o pipefail # Set up CircleCI GPG keys for apt, if needed -curl --retry 3 -s -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add - +curl --retry 3 --retry-all-errors -s -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add - # Stop background apt updates. Hypothetically, the kill should not # be necessary, because stop is supposed to send a kill signal to diff --git a/.circleci/scripts/vs_install.ps1 b/.circleci/scripts/vs_install.ps1 index a2e373078adb..4bbbc24bb043 100644 --- a/.circleci/scripts/vs_install.ps1 +++ b/.circleci/scripts/vs_install.ps1 @@ -29,7 +29,7 @@ if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswher } echo "Downloading VS installer from S3." 
-curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe +curl.exe --retry 3 --retry-all-errors -kL $VS_DOWNLOAD_LINK --output vs_installer.exe if ($LASTEXITCODE -ne 0) { echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed" exit 1 diff --git a/.circleci/scripts/vs_install_cmath.ps1 b/.circleci/scripts/vs_install_cmath.ps1 index c2998eba2521..62b637ec21b8 100644 --- a/.circleci/scripts/vs_install_cmath.ps1 +++ b/.circleci/scripts/vs_install_cmath.ps1 @@ -1,5 +1,5 @@ $CMATH_DOWNLOAD_LINK = "https://raw.githubusercontent.com/microsoft/STL/12c684bba78f9b032050526abdebf14f58ca26a3/stl/inc/cmath" $VC14_28_INSTALL_PATH="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include" -curl.exe --retry 3 -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath" +curl.exe --retry 3 --retry-all-errors -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath" Move-Item -Path "$home\cmath" -Destination "$VC14_28_INSTALL_PATH" -Force diff --git a/.circleci/scripts/windows_cudnn_install.sh b/.circleci/scripts/windows_cudnn_install.sh index 763bc950fc4b..bbf45a3290b3 100644 --- a/.circleci/scripts/windows_cudnn_install.sh +++ b/.circleci/scripts/windows_cudnn_install.sh @@ -18,7 +18,7 @@ case ${CUDA_VERSION} in ;; 11.7) # Use cudnn8.3 with hard-coded cuda11.5 version - cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive" + cudnn_file_name="cudnn-windows-x86_64-8.5.0.96_cuda11-archive" ;; *) echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet" @@ -36,7 +36,7 @@ else tmp_dir=$(mktemp -d) ( pushd "${tmp_dir}" - curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link" + curl --retry 3 --retry-all-errors -o "${cudnn_installer_name}" "$cudnn_installer_link" 7z x "${cudnn_installer_name}" -ocudnn # Use '${var:?}/*' to avoid potentially expanding to '/*' # Remove all of the directories before attempting to copy files diff --git a/.circleci/verbatim-sources/job-specs/job-specs-custom.yml b/.circleci/verbatim-sources/job-specs/job-specs-custom.yml index 180ea014db6d..7d5f7f686512 100644 --- a/.circleci/verbatim-sources/job-specs/job-specs-custom.yml +++ b/.circleci/verbatim-sources/job-specs/job-specs-custom.yml @@ -95,6 +95,198 @@ paths: - miniconda3 + mac_build: + parameters: + build-environment: + type: string + description: Top-level label for what's being built/tested. + xcode-version: + type: string + default: "13.3.1" + description: What xcode version to build with. 
+ build-generates-artifacts: + type: boolean + default: true + description: if the build generates build artifacts + python-version: + type: string + default: "3.8" + macos: + xcode: << parameters.xcode-version >> + resource_class: medium + environment: + BUILD_ENVIRONMENT: << parameters.build-environment >> + AWS_REGION: us-east-1 + steps: + + - checkout + - run_brew_for_macos_build + + - run: + name: Install sccache + command: | + sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${BASH_ENV}" + echo "export SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${BASH_ENV}" + + set +x + echo "export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}" + echo "export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}" + set -x + + - run: + name: Get workflow job id + command: | + echo "export OUR_GITHUB_JOB_ID=${CIRCLE_WORKFLOW_JOB_ID}" >> "${BASH_ENV}" + + - run: + name: Build + command: | + set -x + + git submodule sync + git submodule update --init --recursive --depth 1 --jobs 0 + + export PATH="/usr/local/bin:$PATH" + export WORKSPACE_DIR="${HOME}/workspace" + mkdir -p "${WORKSPACE_DIR}" + MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-MacOSX-x86_64.sh" + if [ << parameters.python-version >> == 3.9.12 ]; then + MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh" + fi + + # If a local installation of conda doesn't exist, we download and install conda + if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then + mkdir -p "${WORKSPACE_DIR}" + curl --retry 3 ${MINICONDA_URL} -o "${WORKSPACE_DIR}"/miniconda3.sh + bash "${WORKSPACE_DIR}"/miniconda3.sh -b -p "${WORKSPACE_DIR}"/miniconda3 + fi + export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH" + # shellcheck disable=SC1091 + source "${WORKSPACE_DIR}"/miniconda3/bin/activate + + brew link --force libomp + + echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}" + .jenkins/pytorch/macos-build.sh + + - when: + condition: << parameters.build-generates-artifacts >> + steps: + - run: + name: Archive artifacts into zip + command: | + zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json + cp artifacts.zip /Users/distiller/workspace + + - persist_to_workspace: + root: /Users/distiller/workspace/ + paths: + - miniconda3 + - artifacts.zip + + - store_artifacts: + path: /Users/distiller/project/artifacts.zip + + mac_test: + parameters: + build-environment: + type: string + shard-number: + type: string + num-test-shards: + type: string + xcode-version: + type: string + test-config: + type: string + default: 'default' + + macos: + xcode: << parameters.xcode-version >> + environment: + GIT_DEFAULT_BRANCH: 'master' + BUILD_ENVIRONMENT: << parameters.build-environment >> + TEST_CONFIG: << parameters.test-config >> + SHARD_NUMBER: << parameters.shard-number >> + NUM_TEST_SHARDS: << parameters.num-test-shards >> + PYTORCH_RETRY_TEST_CASES: 1 + PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 + steps: + - checkout + - attach_workspace: + at: ~/workspace + - run_brew_for_macos_build + - run: + name: Test + no_output_timeout: "2h" + command: | + set -x + + git submodule sync --recursive + git submodule update --init --recursive + + mv ~/workspace/artifacts.zip . 
+ unzip artifacts.zip + + export IN_CI=1 + + COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") + + export PATH="/usr/local/bin:$PATH" + export WORKSPACE_DIR="${HOME}/workspace" + mkdir -p "${WORKSPACE_DIR}" + + export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH" + source "${WORKSPACE_DIR}"/miniconda3/bin/activate + + # sanitize the input commit message and PR body here: + + # trim all new lines from commit messages to avoid issues with batch environment + # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028 + COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}" + + # then trim all special characters like single and double quotes to avoid unescaped inputs to + # wreak havoc internally + export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}" + + python3 -mpip install dist/*.whl + .jenkins/pytorch/macos-test.sh + - run: + name: Copy files for uploading test stats + command: | + # copy into a parent folder test-reports because we can't use CIRCLEI_BUILD_NUM in path when persisting to workspace + mkdir -p test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports + cp -r test/test-reports test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports + - store_test_results: + path: test/test-reports + - persist_to_workspace: + root: /Users/distiller/project/ + paths: + - test-reports + + upload_test_stats: + machine: # executor type + image: ubuntu-2004:202010-01 # # recommended linux image - includes Ubuntu 20.04, docker 19.03.13, docker-compose 1.27.4 + steps: + - checkout + - attach_workspace: + at: ~/workspace + - run: + name: upload + command: | + set -ex + if [ -z ${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} ]; then + echo "No credentials found, cannot upload test stats (are you on a fork?)" + exit 0 + fi + cp -r ~/workspace/test-reports/* ~/project + pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0 + export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} + export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} + # i dont know how to get the run attempt number for reruns so default to 1 + python3 -m tools.stats.upload_test_stats --workflow-run-id "${CIRCLE_WORKFLOW_JOB_ID}" --workflow-run-attempt 1 --head-branch << pipeline.git.branch >> --circleci pytorch_macos_10_13_py3_test: environment: BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test @@ -320,10 +512,43 @@ macos: xcode: "12.5.1" steps: - - checkout + - run: + name: checkout with retry + command: | + checkout() { + set -ex + # Workaround old docker images with incorrect $HOME + # check https://github.com/docker/docker/issues/2968 for details + if [ "${HOME}" = "/" ] + then + export HOME=$(getent passwd $(id -un) | cut -d: -f6) + fi + + mkdir -p ~/.ssh + + echo 'github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ== + ' >> ~/.ssh/known_hosts + + # use git+ssh instead of https + git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true + git config --global gc.auto 0 || true + + echo 'Cloning git repository' + mkdir -p '/Users/distiller/project' + cd '/Users/distiller/project' + git clone "$CIRCLE_REPOSITORY_URL" . 
+ echo 'Checking out branch' + git checkout --force -B "$CIRCLE_BRANCH" "$CIRCLE_SHA1" + git --no-pager log --no-color -n 1 --format='HEAD is now at %h %s' + } + + retry () { + $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*) + } + retry checkout - run_brew_for_ios_build - run: - name: Run Fastlane + name: Setup Fastlane no_output_timeout: "1h" command: | set -e @@ -331,20 +556,6 @@ cd ${PROJ_ROOT}/ios/TestApp # install fastlane sudo gem install bundler && bundle install - # install certificates - echo ${IOS_CERT_KEY_2022} >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo ${IOS_SIGN_KEY_2022} >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - run: name: Build no_output_timeout: "1h" @@ -402,18 +613,12 @@ command: | set -e PROJ_ROOT=/Users/distiller/project - PROFILE=PyTorch_CI_2022 # run the ruby build script if ! [ -x "$(command -v xcodebuild)" ]; then echo 'Error: xcodebuild is not installed.' exit 1 fi - echo ${IOS_DEV_TEAM_ID} - if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then - ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} - else - ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} - fi + ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} if ! [ "$?" -eq "0" ]; then echo 'xcodebuild failed!' exit 1 @@ -436,12 +641,13 @@ cd ${PROJ_ROOT}/ios/TestApp/benchmark mkdir -p ../models if [ ${USE_COREML_DELEGATE} == 1 ]; then - pip install coremltools==5.0b5 - pip install six + pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0 python coreml_backend.py else - python trace_model.py + cd "${PROJ_ROOT}" + python test/mobile/model_test/gen_test_model.py ios-test fi + cd "${PROJ_ROOT}/ios/TestApp/benchmark" if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then echo "Setting up the TestApp for LiteInterpreter" ruby setup.rb --lite 1 @@ -449,10 +655,10 @@ echo "Setting up the TestApp for Full JIT" ruby setup.rb fi - cd ${PROJ_ROOT}/ios/TestApp - instruments -s -devices - if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then - if [ ${USE_COREML_DELEGATE} == 1 ]; then + cd "${PROJ_ROOT}/ios/TestApp" + # instruments -s -devices + if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then + if [ "${USE_COREML_DELEGATE}" == 1 ]; then fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML else fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter diff --git a/.github/ISSUE_TEMPLATE/ci-sev.md b/.github/ISSUE_TEMPLATE/ci-sev.md index 8178c68d978b..2b6bbfc982c9 100644 --- a/.github/ISSUE_TEMPLATE/ci-sev.md +++ b/.github/ISSUE_TEMPLATE/ci-sev.md @@ -5,6 +5,8 @@ about: Tracking incidents for PyTorch's CI infra. > NOTE: Remember to label this issue with "`ci: sev`" +**MERGE BLOCKING** + ## Current Status *Status could be: preemptive, ongoing, mitigated, closed. Also tell people if they need to take action to fix it (i.e. rebase)*. 
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 7d428014cd79..dff11e6aae5c 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -1 +1 @@ -Fixes #ISSUE_NUMBER +Fixes #ISSUE_NUMBER diff --git a/.github/actionlint.yaml b/.github/actionlint.yaml index 4b5afb13f367..ff640de7bde5 100644 --- a/.github/actionlint.yaml +++ b/.github/actionlint.yaml @@ -5,9 +5,12 @@ self-hosted-runner: - linux.large - linux.2xlarge - linux.4xlarge + - linux.12xlarge + - linux.24xlarge - linux.4xlarge.nvidia.gpu - linux.8xlarge.nvidia.gpu - linux.16xlarge.nvidia.gpu + - linux.g5.4xlarge.nvidia.gpu - windows.4xlarge - windows.8xlarge.nvidia.gpu - bm-runner diff --git a/.github/actions/build-android/action.yml b/.github/actions/build-android/action.yml index 5233b62cef0e..6513d82f6966 100644 --- a/.github/actions/build-android/action.yml +++ b/.github/actions/build-android/action.yml @@ -73,4 +73,4 @@ runs: # Copy install binaries back mkdir -p "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}" docker cp "${container_name}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}" - echo "::set-output name=container_id::${container_name}" + echo "container_id=${container_name}" >> "${GITHUB_OUTPUT}" diff --git a/.github/actions/calculate-docker-image/action.yml b/.github/actions/calculate-docker-image/action.yml index 7215bf84e987..ff090d623f8e 100644 --- a/.github/actions/calculate-docker-image/action.yml +++ b/.github/actions/calculate-docker-image/action.yml @@ -47,12 +47,12 @@ runs: if [ -n "${IS_XLA}" ]; then echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}" DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "::set-output name=docker-tag::${DOCKER_TAG}" - echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" + echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" + echo "docker-image=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_OUTPUT}" else DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "::set-output name=docker-tag::${DOCKER_TAG}" - echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" + echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" + echo "docker-image=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" fi - name: Check if image should be built @@ -93,10 +93,10 @@ runs: # In order to avoid a stampeding herd of jobs trying to push all at once we set it to # skip the push. 
If this is negatively affecting TTS across the board the suggestion # should be to run the docker-builds.yml workflow to generate the correct docker builds - echo ::set-output name=skip_push::true + echo "skip_push=true" >> "${GITHUB_OUTPUT}" fi fi - echo ::set-output name=rebuild::yes + echo "rebuild=yes" >> "${GITHUB_OUTPUT}" - name: Build and push docker image if: inputs.always-rebuild || steps.check.outputs.rebuild diff --git a/.github/actions/download-build-artifacts/action.yml b/.github/actions/download-build-artifacts/action.yml index 9b11d0f7fe32..a7107f2067de 100644 --- a/.github/actions/download-build-artifacts/action.yml +++ b/.github/actions/download-build-artifacts/action.yml @@ -21,7 +21,7 @@ runs: - name: Download PyTorch Build Artifacts from GHA if: inputs.use-gha - uses: actions/download-artifact@v2 + uses: actions/download-artifact@v3 with: name: ${{ inputs.name }} diff --git a/.github/actions/filter-test-configs/action.yml b/.github/actions/filter-test-configs/action.yml new file mode 100644 index 000000000000..0253577134c8 --- /dev/null +++ b/.github/actions/filter-test-configs/action.yml @@ -0,0 +1,62 @@ +name: Filter test configs matrix + +description: | + Apply filter to the test configs matrix to keep only entries specified + by the PR test-config labels. If no test-config label is set, the same + test configs matrix is returned untouched. + +inputs: + github-token: + description: GITHUB_TOKEN + required: true + test-matrix: + required: true + type: string + description: JSON description of what test configs to run. + +outputs: + test-matrix: + description: The filtered test configs matrix. + value: ${{ steps.filter.outputs.test-matrix }} + is-test-matrix-empty: + description: True if the filtered test configs matrix is empty. False otherwise. 
+ value: ${{ steps.filter.outputs.is-test-matrix-empty }} + +runs: + using: composite + steps: + - uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 + name: Setup dependencies + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + with: + shell: bash + timeout_minutes: 10 + max_attempts: 5 + retry_wait_seconds: 30 + command: | + set -eux + python3 -m pip install requests==2.26.0 pyyaml==6.0 + + - name: Parse ref + shell: bash + id: parse-ref + run: .github/scripts/parse_ref.py + + - name: Select all requested test configurations + shell: bash + env: + GITHUB_TOKEN: ${{ inputs.github-token }} + id: filter + run: | + .github/scripts/filter_test_configs.py \ + --test-matrix "${{ inputs.test-matrix }}" \ + --pr-number "${{ github.event.pull_request.number }}" \ + --tag "${{ steps.parse-ref.outputs.tag }}" \ + --event-name "${{ github.event_name }}" \ + --schedule "${{ github.event.schedule }}" + + - name: Print the filtered test matrix + shell: bash + run: | + echo "${{ steps.filter.outputs.test-matrix }}" diff --git a/.github/actions/get-workflow-job-id/action.yml b/.github/actions/get-workflow-job-id/action.yml index 34863677407a..54b7bbe5e174 100644 --- a/.github/actions/get-workflow-job-id/action.yml +++ b/.github/actions/get-workflow-job-id/action.yml @@ -15,7 +15,7 @@ outputs: runs: using: composite steps: - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + - uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 id: get-job-id env: GITHUB_TOKEN: ${{ inputs.github-token }} @@ -28,4 +28,4 @@ runs: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") - echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}" + echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}" diff --git a/.github/actions/pull-docker-image/action.yml b/.github/actions/pull-docker-image/action.yml deleted file mode 100644 index 75e8baf6f2c9..000000000000 --- a/.github/actions/pull-docker-image/action.yml +++ /dev/null @@ -1,23 +0,0 @@ -name: Pull docker image - -description: pull a specific docker image - -inputs: - docker-image: - description: the image to pull - required: true - -runs: - using: composite - steps: - - name: Pull Docker image - shell: bash - env: - DOCKER_IMAGE: ${{ inputs.docker-image }} - run: | - retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } - # ignore output since only exit code is used for conditional - # only pull docker image if it's not available locally - if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then - retry docker pull "${DOCKER_IMAGE}" - fi diff --git a/.github/actions/setup-rocm/action.yml b/.github/actions/setup-rocm/action.yml index 97dfd22c76ac..d91762eb9a86 100644 --- a/.github/actions/setup-rocm/action.yml +++ b/.github/actions/setup-rocm/action.yml @@ -36,7 +36,12 @@ runs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi diff --git a/.github/actions/setup-ssh/action.yml b/.github/actions/setup-ssh/action.yml deleted file mode 100644 index c2be35a805c4..000000000000 --- a/.github/actions/setup-ssh/action.yml +++ /dev/null @@ -1,17 +0,0 @@ -name: Setup SSH - -description: Adds ssh keys for current user to machine - -inputs: - github-secret: - description: GitHub token - required: true - -runs: - using: composite - steps: - - name: "Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ inputs.github-secret }} - activate-with-label: false diff --git a/.github/actions/setup-win/action.yml b/.github/actions/setup-win/action.yml index 12f287b23089..6dc1a1b6c6fe 100644 --- a/.github/actions/setup-win/action.yml +++ b/.github/actions/setup-win/action.yml @@ -55,6 +55,12 @@ runs: .circleci/scripts/windows_cudnn_install.sh - name: Setup Python3 - uses: actions/setup-python@v2 + uses: actions/setup-python@v4 with: - python-version: "3.x" + python-version: '3.x' + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt + **/.circleci/docker/requirements-ci.txt + **/.github/requirements-gha-cache.txt diff --git a/.github/actions/teardown-linux/action.yml b/.github/actions/teardown-linux/action.yml deleted file mode 100644 index 9238a073a6b6..000000000000 --- a/.github/actions/teardown-linux/action.yml +++ /dev/null @@ -1,28 +0,0 @@ -name: Teardown Linux - -description: Stuff that should always run at the end of a linux job - -inputs: - skip-wait-ssh: - description: If set, don't wait for ssh to drain before tearing down - required: false - default: "" - -runs: - using: composite - steps: - - name: Hold runner for 2 hours or until ssh sessions have drained - # TODO working-directory: !{{ pytorch_directory }} - # Always hold for active ssh sessions - shell: bash - if: inputs.skip-wait-ssh == '' - run: .github/scripts/wait_for_ssh_to_drain.sh - - - name: Kill containers, clean up images - shell: bash - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af diff --git a/.github/actions/test-pytorch-binary/action.yml b/.github/actions/test-pytorch-binary/action.yml index bc2c546f57b2..be2090db533d 100644 --- a/.github/actions/test-pytorch-binary/action.yml +++ b/.github/actions/test-pytorch-binary/action.yml @@ -15,7 +15,6 @@ runs: -e BINARY_ENV_FILE \ -e BUILDER_ROOT \ -e BUILD_ENVIRONMENT \ - -e BUILD_SPLIT_CUDA \ -e DESIRED_CUDA \ -e DESIRED_DEVTOOLSET \ -e DESIRED_PYTHON \ diff --git a/.github/actions/upload-test-artifacts/action.yml b/.github/actions/upload-test-artifacts/action.yml index 35e249ea96be..9fd2342601f1 100644 --- a/.github/actions/upload-test-artifacts/action.yml +++ b/.github/actions/upload-test-artifacts/action.yml @@ -34,7 +34,7 @@ runs: run: | # Remove any previous test reports if they exist rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' + zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv' - name: Zip usage log for upload if: runner.os != 'Windows' && !inputs.use-gha @@ -67,7 +67,7 @@ runs: FILE_SUFFIX: ${{ inputs.file-suffix }} run: | # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' + 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' -ir'!test\*.csv' - name: Zip usage log for upload if: 
runner.os == 'Windows' && !inputs.use-gha @@ -111,7 +111,7 @@ runs: # GHA upload - name: Store Test Downloaded JSONs on Github - uses: actions/upload-artifact@v2 + uses: actions/upload-artifact@v3 if: inputs.use-gha with: # Add the run attempt, see [Artifact run attempt] @@ -121,11 +121,25 @@ runs: path: test/**/*.json - name: Store Test Reports on Github - uses: actions/upload-artifact@v2 + uses: actions/upload-artifact@v3 if: inputs.use-gha with: # Add the run attempt, see [Artifact run attempt] name: test-reports-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip retention-days: 14 - if-no-files-found: error - path: test/**/*.xml + # Don't want to fail the workflow here because not all workflows have csv files + if-no-files-found: ignore + path: | + test/**/*.xml + test/**/*.csv + + - name: Store Usage Logs on Github + uses: actions/upload-artifact@v3 + if: inputs.use-gha + with: + # Add the run attempt, see [Artifact run attempt] + name: usage-log-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip + retention-days: 14 + if-no-files-found: ignore + path: usage_log.txt + continue-on-error: true diff --git a/.github/auto_request_review.yml b/.github/auto_request_review.yml new file mode 100644 index 000000000000..339f085d939a --- /dev/null +++ b/.github/auto_request_review.yml @@ -0,0 +1,29 @@ +# Documented at https://github.com/necojackarc/auto-request-review +reviewers: + groups: + symbolic-shapes: + - ezyang + - Chillee + - anjali411 + - albanD + - miladm + - bdhirsh + - voznesenskym + - SherlockNoMad + + per_author: + symbolic-shapes: + - symbolic-shapes + - antoniojkim + - wconstab + +files: + # none yet, TODO: migrate CODEOWNERS here + +options: + ignore_draft: true + ignored_keywords: + - DO NOT REVIEW + # Just manually setup a self-referential per_author rule if you + # want group assignment + enable_group_assignment: false diff --git a/.github/ci_commit_pins/huggingface.txt b/.github/ci_commit_pins/huggingface.txt new file mode 100644 index 000000000000..4b199567e9a7 --- /dev/null +++ b/.github/ci_commit_pins/huggingface.txt @@ -0,0 +1 @@ +ebee0a27940adfbb30444d83387b9ea0f1173f40 diff --git a/.github/ci_commit_pins/text.txt b/.github/ci_commit_pins/text.txt new file mode 100644 index 000000000000..c0e01da17fd0 --- /dev/null +++ b/.github/ci_commit_pins/text.txt @@ -0,0 +1 @@ +5b78d074bd303eb230d30567646fcf0358ee2dd4 diff --git a/.github/ci_commit_pins/timm.txt b/.github/ci_commit_pins/timm.txt new file mode 100644 index 000000000000..cdda1d14775c --- /dev/null +++ b/.github/ci_commit_pins/timm.txt @@ -0,0 +1 @@ +6635bc3f7d06c6a0d0481803b24d6ad0004b61ac diff --git a/.github/ci_commit_pins/torchbench.txt b/.github/ci_commit_pins/torchbench.txt new file mode 100644 index 000000000000..28041e71960e --- /dev/null +++ b/.github/ci_commit_pins/torchbench.txt @@ -0,0 +1 @@ +24b95f2f627bf07a61cefed653419389a7586357 diff --git a/.github/ci_commit_pins/torchdynamo.txt b/.github/ci_commit_pins/torchdynamo.txt deleted file mode 100644 index 3d570d9605ed..000000000000 --- a/.github/ci_commit_pins/torchdynamo.txt +++ /dev/null @@ -1 +0,0 @@ -f19410cd8204fa1c30ca72f81142508e128be66f diff --git a/.github/ci_commit_pins/triton.txt b/.github/ci_commit_pins/triton.txt new file mode 100644 index 000000000000..7c5e80098f7b --- /dev/null +++ b/.github/ci_commit_pins/triton.txt @@ -0,0 +1 @@ +0d7e7532279e45672555e344646f5c19c3972331 diff --git a/.github/ci_commit_pins/vision.txt b/.github/ci_commit_pins/vision.txt index 511567c66dff..6874c288beca 100644 --- 
a/.github/ci_commit_pins/vision.txt +++ b/.github/ci_commit_pins/vision.txt @@ -1 +1 @@ -a61e6ef6ff5af041661ecc70b1a7e3dacb2240b6 +72686211e2a8b78e5a5dc8c28be34eb9cfcdad4c diff --git a/.github/ci_commit_pins/xla.txt b/.github/ci_commit_pins/xla.txt index cb6944f39202..5650a48e646b 100644 --- a/.github/ci_commit_pins/xla.txt +++ b/.github/ci_commit_pins/xla.txt @@ -1 +1 @@ -3935e4445eba5af370ebc01b4daf5cec4c026900 +216d221f4d75ddfe9d0bd3ff2e8b92b39c67d381 diff --git a/.github/labeler.yml b/.github/labeler.yml new file mode 100644 index 000000000000..e86ff2192ede --- /dev/null +++ b/.github/labeler.yml @@ -0,0 +1,51 @@ +"module: dynamo": +- torch/_dynamo/** +- torch/csrc/dynamo/** +- benchmarks/dynamo/** +- test/dynamo/** + +"module: inductor": +- torch/_inductor/** +- test/inductor/** + +"ciflow/inductor": +- torch/_dynamo/** +- torch/_inductor/** +- benchmarks/dynamo/** +- torch/_subclasses/fake_tensor.py +- torch/_subclasses/fake_utils.py +- torch/_subclasses/meta_utils.py + +"module: cpu": +- aten/src/ATen/cpu/** +- aten/src/ATen/native/cpu/** +- aten/src/ATen/native/quantized/cpu/** +- aten/src/ATen/native/Convolution*.cpp +- aten/src/ATen/native/mkldnn/** +- torch/cpu/** +- torch/utils/mkldnn.py +- test/test_mkldnn.py + +"module: mkldnn": +- third_party/ideep +- caffe2/ideep/** +- caffe2/python/ideep/** +- cmake/Modules/FindMKLDNN.cmake +- third_party/mkl-dnn.BUILD +- torch/csrc/jit/codegen/onednn/** +- test/test_jit_llga_fuser.py + +"module: amp (automated mixed precision)": +- torch/amp/** +- aten/src/ATen/autocast_mode.* +- torch/csrc/jit/passes/autocast.cpp +- test/test_autocast.py + +"NNC": +- torch/csrc/jit/tensorexpr/** + +"oncall: quantization": +- torch/ao/quantization/** +- torch/quantization/** +- aten/src/ATen/quantized/** +- aten/src/ATen/native/quantized/cpu/** diff --git a/.github/merge_rules.json b/.github/merge_rules.json deleted file mode 100644 index c0b53c7f0c69..000000000000 --- a/.github/merge_rules.json +++ /dev/null @@ -1,302 +0,0 @@ -[ - { - "name": "ONNX exporter", - "patterns": [ - ".jenkins/caffe2/*", - "aten/src/ATen/core/interned_strings.h", - "docs/source/onnx.rst", - "docs/source/scripts/onnx/**", - "scripts/onnx/**", - "test/jit/test_export_modes.py", - "test/onnx/**", - "tools/onnx/**", - "torch/_C/__init__.pyi.in", - "torch/csrc/jit/passes/onnx.*", - "torch/csrc/jit/passes/onnx/**", - "torch/csrc/jit/serialization/export.*", - "torch/csrc/jit/serialization/onnx.*", - "torch/csrc/onnx/**", - "torch/onnx/**" - ], - "approved_by": ["BowenBao", "abock"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "NVFuser", - "patterns": [ - "test/test_jit_cuda_fuser.py", - "torch/csrc/jit/codegen/fuser/cuda/**", - "torch/csrc/jit/codegen/cuda/**", - "benchmarks/cpp/nvfuser/**" - ], - "approved_by": ["csarofeen", "ngimel", "jjsjann123"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "OSS CI", - "patterns": [".github/**", ".circleci/**", ".jenkins/**", "scripts/**", "tools/**"], - "approved_by": ["ezyang", "pytorch/pytorch-dev-infra"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "CI Pinned Hashes", - "patterns": [ - ".github/ci_commit_pins/vision.txt", - ".github/ci_commit_pins/torchdynamo.txt" - ], - "approved_by": ["pytorchbot", "ezyang", "pytorch/pytorch-dev-infra"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "XLA hash pin update", - "patterns": 
[".github/ci_commit_pins/xla.txt"], - "approved_by": ["pytorchbot", "ezyang", "pytorch/pytorch-dev-infra"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull / linux-bionic-py3_7-clang8-xla / build", - "pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)" - ] - }, - { - "name": "Documentation", - "patterns": ["docs/**", "torch/*docs.py"], - "approved_by": ["mruberry", "ngimel", "janeyx99", "svekars"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Mobile", - "patterns": ["ios/**", "android/**", "test/mobile/**"], - "approved_by": ["linbinyu", "kit1980", "IvanKobzarev", "dreiss"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Linear Algebra", - "patterns": [ - "aten/src/ATen/native/cuda/linalg/**", - "aten/src/ATen/LinalgBackend.h", - "aten/src/ATen/native/**LinearAlgebra*", - "docs/source/linalg.rst", - "torch/linalg/**", - "torch/_linalg_utils.py", - "torch/**python_linalg_functions.*", - "torch/**linalg.h", - "tools/autograd/templates/python_linalg_functions.cpp", - "test/test_linalg.py" - ], - "approved_by": ["nikitaved", "mruberry", "pearu", "Lezcano", "IvanYashchuk"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "FFT", - "patterns": [ - "aten/src/ATen/native/cuda/*FFT*.h", - "aten/src/ATen/native/SpectralOps.cpp", - "aten/src/ATen/native/mkl/SpectralOps.cpp", - "aten/src/ATen/native/cuda/SpectralOps.*", - "docs/source/fft.rst", - "torch/fft/**", - "torch/csrc/api/include/torch/fft.h", - "torch/**python_fft_functions.*", - "tools/autograd/templates/python_fft_functions.cpp", - "test/cpp/api/fft.cpp" - ], - "approved_by": ["mruberry", "peterbell10"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Sparse", - "patterns": [ - "benchmarks/sparse", - "c10/util/sparse_bitset.h", - "docs/source/sparse.rst", - "torch/**sparse/**", - "torch/**sparse*", - "torch/optim/sparse*", - "torch/ao/nn/sparse/**", - "torch/utils/benchmark/**sparse*", - "aten/src/ATen/native/ao_sparse/**", - "aten/src/ATen/native/sparse/**", - "aten/src/ATen/**Sparse*", - "aten/src/ATen/*Sparse*", - "torch/_masked/**", - "test/*_masked*", - "test/**sparse*" - ], - "approved_by": ["nikitaved", "cpuhrsch", "pearu", "IvanYashchuk"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "MPS", - "patterns": [ - "test/test_mps.py", - "aten/src/ATen/native/native_functions.yaml", - "aten/src/ATen/mps/**", - "aten/src/ATen/native/mps/**" - ], - "approved_by": ["kulinseth", "razarmehr"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Distributions", - "patterns": [ - "torch/distributions/**", - "test/distributions/**" - ], - "approved_by": ["fritzo", "neerajprad", "alicanb", "vishwakftw"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Distributed", - "patterns": [ - "docs/source/pipeline.rst", - "docs/source/distributed*", - "docs/source/rpc.rst", - "docs/source/rpc/**", - "docs/source/_static/img/rpc*", - "docs/source/_static/img/*distributed*", - "docs/source/elastic/**", - "benchmarks/distributed/**", - "torch/distributed/**", - "torch/nn/parallel/distributed*", - "torch/_C/_distributed*", - "torch/csrc/distributed/**", - "torch/testing/_internal/distributed/**", - "test/distributed/**", - "test/cpp/dist_autograd/**", - "test/cpp/rpc/**" - ], - 
"approved_by": ["mrshenli", "pritamdamania87", "d4l3k", "kiukchung", "pietern"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "IDEEP", - "patterns": [ - "third_party/ideep", - "caffe2/ideep/**", - "caffe2/python/ideep/**" - ], - "approved_by": ["XiaobingSuper", "yanbing-j"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "oneDNN graph", - "patterns": [ - "torch/csrc/jit/codegen/onednn/**", - "test/test_jit_llga_fuser.py" - ], - "approved_by": ["sanchitintel", "chunyuan-w"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "CPU ATen backend", - "patterns": [ - "aten/src/ATen/cpu/**", - "aten/src/ATen/native/cpu/**", - "aten/src/ATen/native/quantized/cpu/**", - "aten/src/ATen/native/Convolution*.cpp", - "aten/src/ATen/native/mkldnn/**" - ], - "approved_by": ["mingfeima", "XiaobingSuper"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "CPU frontend", - "patterns": [ - "torch/cpu/**", - "torch/utils/mkldnn.py", - "test/test_mkldnn.py" - ], - "approved_by": ["leslie-fang-intel", "CaoE"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "Autocast", - "patterns": [ - "torch/amp/**", - "aten/src/ATen/autocast_mode.*", - "torch/csrc/jit/passes/autocast.cpp", - "test/test_autocast.py" - ], - "approved_by": ["leslie-fang-intel", "CaoE"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - }, - { - "name": "superuser", - "patterns": ["*"], - "approved_by": ["pytorch/metamates"], - "mandatory_checks_name": [ - "Facebook CLA Check", - "Lint", - "pull" - ] - } -] diff --git a/.github/merge_rules.yaml b/.github/merge_rules.yaml new file mode 100644 index 000000000000..1837cce32b2f --- /dev/null +++ b/.github/merge_rules.yaml @@ -0,0 +1,374 @@ +- name: ONNX exporter + patterns: + - .jenkins/caffe2/* + - aten/src/ATen/core/interned_strings.h + - docs/source/onnx.rst + - docs/source/onnx* + - docs/source/scripts/onnx/** + - scripts/onnx/** + - test/onnx/** + - tools/onnx/** + - torch/_C/__init__.pyi.in + - torch/csrc/jit/passes/onnx.* + - torch/csrc/jit/passes/onnx/** + - torch/csrc/jit/serialization/export.* + - torch/csrc/jit/serialization/onnx.* + - torch/csrc/onnx/** + - torch/onnx/** + - third_party/onnx + - caffe2/python/onnx/** + approved_by: + - BowenBao + - abock + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: NVFuser + patterns: + - test/test_jit_cuda_fuser.py + - torch/csrc/jit/codegen/fuser/cuda/** + - torch/csrc/jit/codegen/cuda/** + - benchmarks/cpp/nvfuser/** + approved_by: + - csarofeen + - ngimel + - jjsjann123 + - kevinstephano + - ptrblck + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: OSS CI + patterns: + - .github/** + - .circleci/** + - .jenkins/** + - scripts/** + - tools/** + approved_by: + - alband + - dagitses + - pytorch/pytorch-dev-infra + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: OSS CI / pytorchbot + patterns: + - .github/ci_commit_pins/vision.txt + - .github/ci_commit_pins/torchdynamo.txt + approved_by: + - pytorchbot + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: OSS CI / pytorchbot / XLA + patterns: + - .github/ci_commit_pins/xla.txt + approved_by: + - pytorchbot + mandatory_checks_name: + - EasyCLA + - Lint + - pull / linux-bionic-py3_7-clang8-xla / build + - pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge) + +- name: 
Documentation + patterns: + - docs/** + - torch/*docs.py + approved_by: + - svekars + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Mobile + patterns: + - ios/** + - android/** + - test/mobile/** + approved_by: + - linbinyu + - IvanKobzarev + - dreiss + - raziel + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Linear Algebra + patterns: + - aten/src/ATen/native/cuda/linalg/** + - aten/src/ATen/LinalgBackend.h + - aten/src/ATen/native/**LinearAlgebra* + - docs/source/linalg.rst + - torch/linalg/** + - torch/_linalg_utils.py + - torch/**python_linalg_functions.* + - torch/**linalg.h + - tools/autograd/templates/python_linalg_functions.cpp + - test/test_linalg.py + approved_by: + - mruberry + - lezcano + - IvanYashchuk + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: FFT + patterns: + - aten/src/ATen/native/cuda/*FFT*.h + - aten/src/ATen/native/SpectralOps.cpp + - aten/src/ATen/native/mkl/SpectralOps.cpp + - aten/src/ATen/native/cuda/SpectralOps.* + - docs/source/fft.rst + - torch/fft/** + - torch/csrc/api/include/torch/fft.h + - torch/**python_fft_functions.* + - tools/autograd/templates/python_fft_functions.cpp + - test/cpp/api/fft.cpp + approved_by: + - mruberry + - peterbell10 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Sparse + patterns: + - benchmarks/sparse + - c10/util/sparse_bitset.h + - docs/source/sparse.rst + - torch/**sparse/** + - torch/**sparse* + - torch/optim/sparse* + - torch/ao/nn/sparse/** + - torch/utils/benchmark/**sparse* + - aten/src/ATen/native/ao_sparse/** + - aten/src/ATen/native/sparse/** + - aten/src/ATen/**Sparse* + - aten/src/ATen/*Sparse* + - torch/_masked/** + - test/*_masked* + - test/**sparse* + approved_by: + - nikitaved + - cpuhrsch + - pearu + - IvanYashchuk + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: MPS + patterns: + - test/test_mps.py + - aten/src/ATen/native/native_functions.yaml + - aten/src/ATen/mps/** + - aten/src/ATen/native/mps/** + approved_by: + - kulinseth + - alband + - malfet + - razarmehr + mandatory_checks_name: + - EasyCLA + - Lint + - pull +- name: Distributions + patterns: + - torch/distributions/** + - test/distributions/** + approved_by: + - fritzo + - neerajprad + - alicanb + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Distributed + patterns: + - docs/source/pipeline.rst + - docs/source/distributed* + - docs/source/rpc.rst + - docs/source/rpc/** + - docs/source/_static/img/rpc* + - docs/source/_static/img/*distributed* + - docs/source/elastic/** + - benchmarks/distributed/** + - torch/distributed/** + - torch/nn/parallel/distributed* + - torch/_C/_distributed* + - torch/csrc/distributed/** + - torch/testing/_internal/distributed/** + - test/distributed/** + - test/cpp/dist_autograd/** + - test/cpp/rpc/** + approved_by: + - mrshenli + - pritamdamania87 + - zhaojuanmao + - rohan-varma + - wanchaol + - fduwjj + - H-Huang + - d4l3k + - aazzolini + - kwen2501 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: IDEEP + patterns: + - third_party/ideep + - caffe2/ideep/** + - caffe2/python/ideep/** + - cmake/Modules/FindMKLDNN.cmake + - third_party/mkl-dnn.BUILD + approved_by: + - XiaobingSuper + - jgong5 + - mingfeima + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: oneDNN graph + patterns: + - torch/csrc/jit/codegen/onednn/** + - test/test_jit_llga_fuser.py + approved_by: + - sanchitintel + - chunyuan-w + - jgong5 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: CPU ATen backend 
+ patterns: + - aten/src/ATen/cpu/** + - aten/src/ATen/native/cpu/** + - aten/src/ATen/native/quantized/cpu/** + - aten/src/ATen/native/Convolution*.cpp + - aten/src/ATen/native/mkldnn/** + - test/test_mkldnn.py + approved_by: + - mingfeima + - XiaobingSuper + - jgong5 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: CPU frontend + patterns: + - torch/cpu/** + - torch/utils/mkldnn.py + - test/test_mkldnn.py + approved_by: + - leslie-fang-intel + - jgong5 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Autocast + patterns: + - torch/amp/** + - aten/src/ATen/autocast_mode.* + - torch/csrc/jit/passes/autocast.cpp + - test/test_autocast.py + approved_by: + - leslie-fang-intel + - jgong5 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: NNC + patterns: + - torch/csrc/jit/tensorexpr/** + approved_by: + - EikanWang + - jgong5 + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Lazy Tensor + patterns: + - torch/csrc/lazy/** + - test/cpp/lazy/** + - test/lazy/** + - torchgen/api/lazy.py + - torchgen/dest/lazy_ir.py + - torchgen/dest/lazy_ts_lowering.py + - torchgen/gen_lazy_tensor.py + - aten/src/ATen/native/ts_native_functions.yaml + - .github/ci_commit_pins/xla.txt + approved_by: + - alanwaketan + - JackCaoG + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: superuser + patterns: + - '*' + approved_by: + - pytorch/metamates + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Core Reviewers + patterns: + - '*' + approved_by: + - mruberry + - lezcano + mandatory_checks_name: + - EasyCLA + - Lint + - pull + +- name: Core Maintainers + patterns: + - '*' + approved_by: + - soumith + - gchanan + - ezyang + - dzhulgakov + mandatory_checks_name: + - EasyCLA + - Lint + - pull diff --git a/.github/requirements-gha-cache.txt b/.github/requirements-gha-cache.txt new file mode 100644 index 000000000000..6badbe2cc65c --- /dev/null +++ b/.github/requirements-gha-cache.txt @@ -0,0 +1,18 @@ +# This file is to cache other dependencies not specified elsewhere in: +# requirement.txt +# requirements-flake8.txt +# docs/requirements.txt +# docs/cpp/requirements.txt +# functorch/docs/requirements.txt +# .circleci/docker/requirements-ci.txt +boto3==1.19.12 +cffi==1.15.0 +dataclasses==0.6 +jinja2==3.0.1 +lintrunner==0.9.2 +ninja==1.10.0.post1 +pynvml==11.4.1 +pyyaml==6.0 +requests==2.26 +rich==10.9.0 +rockset==0.8.10 diff --git a/.github/requirements/README.md b/.github/requirements/README.md new file mode 100644 index 000000000000..7300eee14562 --- /dev/null +++ b/.github/requirements/README.md @@ -0,0 +1,24 @@ +### Cached requirements and consolidation of conda and pip installation + +At the moment, the installation of conda and pip dependencies happens at +different places in the CI depending at the whim of different +developers, which makes it very challenging to handle issues like +network flakiness or upstream dependency failures gracefully. So, this +center directory is created to gradually include all the conda environment +and pip requirement files that are used to setup CI jobs. Not only it +gives a clear picture of all the dependencies required by different CI +jobs, but it also allows them to be cached properly to improve CI +reliability. + +The list of support files are as follows: + +* Conda: + * conda-env-macOS-ARM64. This is used by MacOS (m1, arm64) build and + test jobs to setup the conda environment + * conda-env-macOS-X64. 
This is use by MacOS (x86-64) build and test + jobs to setup the conda environment + * conda-env-Linux-X64. This is used by Linux buck build and test jobs + to setup the conda environment +* Pip: + * pip-requirements-macOS.txt. This is used by MacOS build and test jobs to + setup the pip environment diff --git a/.github/requirements/conda-env-Linux-X64 b/.github/requirements/conda-env-Linux-X64 new file mode 100644 index 000000000000..f2b3811263e5 --- /dev/null +++ b/.github/requirements/conda-env-Linux-X64 @@ -0,0 +1,10 @@ +cffi=1.15.1 +cmake=3.22.1 +mkl=2022.1.0 +mkl-include=2022.1.0 +ninja=1.10.2 +numpy=1.23.3 +pyyaml=6.0 +requests=2.28.1 +setuptools=65.5.0 +typing_extensions=4.3.0 diff --git a/.github/requirements/conda-env-macOS-ARM64 b/.github/requirements/conda-env-macOS-ARM64 new file mode 100644 index 000000000000..a031b014365f --- /dev/null +++ b/.github/requirements/conda-env-macOS-ARM64 @@ -0,0 +1,20 @@ +numpy=1.22.3 +pyyaml=6.0 +setuptools=61.2.0 +cmake=3.22.1 +cffi=1.15.1 +typing_extensions=4.3.0 +dataclasses=0.8 +pip=22.2.2 +six=1.16.0 +pillow=9.2.0 +pkg-config=0.29.2 +wheel=0.37.1 +expecttest=0.1.3 + +# Not pinning certifi so that we can always get the latest certificates +certifi + +# Cross-compiling arm64 from x86-64 picks up 1.40.0 while testing on arm64 +# itself only has up to 1.39.0 from upstream conda. Both work though +libuv>=1.39.0,<=1.40.0 diff --git a/.github/requirements/conda-env-macOS-X64 b/.github/requirements/conda-env-macOS-X64 new file mode 100644 index 000000000000..81463d4b39d5 --- /dev/null +++ b/.github/requirements/conda-env-macOS-X64 @@ -0,0 +1,18 @@ +mkl=2021.2.0 +mkl-include=2021.2.0 +numpy=1.18.5 +pyyaml=5.3 +setuptools=46.0.0 +cmake=3.22.1 +cffi=1.15.1 +typing_extensions=4.3.0 +dataclasses=0.8 +pip=22.2.2 +six=1.16.0 +pillow=9.2.0 +libuv=1.40.0 +pkg-config=0.29.2 +wheel=0.37.1 + +# Not pinning certifi so that we can always get the latest certificates +certifi diff --git a/.github/requirements/pip-requirements-macOS.txt b/.github/requirements/pip-requirements-macOS.txt new file mode 100644 index 000000000000..dfbaea260116 --- /dev/null +++ b/.github/requirements/pip-requirements-macOS.txt @@ -0,0 +1,22 @@ +boto3==1.19.12 +hypothesis==6.56.4 +expecttest==0.1.3 +librosa>=0.6.2 +mpmath==1.2.1 +networkx==2.8.7 +# Use numba-0.49.1 or older on Intel Macs, but 0.56.0 on M1 machines, as older numba is not available +numba==0.56.0; platform_machine == "arm64" +numba<=0.49.1; platform_machine != "arm64" +opt-einsum>=3.3 +psutil==5.9.1 +pynvml==11.4.1 +pygments==2.12.0 +pytest==7.2.0 +pytest-xdist==3.0.2 +pytest-rerunfailures==10.2 +pytest-flakefinder==1.1.0 +pytest-shard==0.1.2 +scipy==1.9.0 +sympy==1.11.1 +unittest-xml-reporting<=3.2.0,>=2.0.0 +xdoctest==1.0.2 diff --git a/.github/scale-config.yml b/.github/scale-config.yml deleted file mode 100644 index 1cf99b326ba8..000000000000 --- a/.github/scale-config.yml +++ /dev/null @@ -1,69 +0,0 @@ -# scale-config.yml: -# Powers what instance types are available for GHA auto-scaled -# runners. Runners listed here will be available as self hosted -# runners, configuration is directly pulled from the main branch. 
-# -# NOTE (Apr, 5, 2021): Linux runners are currently all an amazonlinux2 -# -# NOTE (Jan 5, 2021): Linux runners are all non-ephemeral to reduce the amount of CreateInstaces calls -# to avoid RequestLimitExceeded issues -# -# TODO: Add some documentation on how the auto-scaling works -# -# NOTE: Default values, -# -# runner_types: -# runner_label: -# instance_type: m4.large -# os: linux -# max_available: 20 -# disk_size: 50 -# is_ephemeral: true - -runner_types: - # mainly used for ciflow-should-run, not made to run any serious tests - linux.large: - instance_type: c5.large - os: linux - disk_size: 10 - is_ephemeral: false - linux.2xlarge: - instance_type: c5.2xlarge - os: linux - max_available: 1000 - disk_size: 150 - is_ephemeral: false - linux.4xlarge: # for binary-builds - instance_type: c5.4xlarge - os: linux - max_available: 500 - disk_size: 150 - is_ephemeral: false - linux.8xlarge.nvidia.gpu: - instance_type: g3.8xlarge - os: linux - max_available: 200 - disk_size: 150 - is_ephemeral: false - linux.4xlarge.nvidia.gpu: - instance_type: g3.4xlarge - os: linux - max_available: 250 - disk_size: 150 - is_ephemeral: false - linux.16xlarge.nvidia.gpu: - instance_type: g3.16xlarge - os: linux - max_available: 10 - disk_size: 150 - is_ephemeral: false - windows.4xlarge: - instance_type: c5d.4xlarge - os: windows - max_available: 200 - disk_size: 256 - windows.8xlarge.nvidia.gpu: - instance_type: p3.2xlarge - os: windows - max_available: 100 - disk_size: 256 diff --git a/.github/scripts/README.md b/.github/scripts/README.md index 22099c3732ea..cc9e1617b11a 100644 --- a/.github/scripts/README.md +++ b/.github/scripts/README.md @@ -3,7 +3,7 @@ > NOTE: This README contains information for the `.github` directory but cannot be located there because it will overwrite the repo README. -This directory contains workflows and scripts to support our CI infrastructure that runs on Github Actions. +This directory contains workflows and scripts to support our CI infrastructure that runs on GitHub Actions. ## Workflows @@ -36,7 +36,7 @@ New generated binary workflows can be added in the `.github/scripts/generate_ci_ examples from that script in order to add the workflow to the stream that is relevant to what you particularly care about. -Different parameters can be used to acheive different goals, i.e. running jobs on a cron, running only on trunk, etc. +Different parameters can be used to achieve different goals, i.e. running jobs on a cron, running only on trunk, etc. 
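(Sketch only, not part of the script's documented API: the entry below simply mirrors the `BinaryBuildWorkflow` examples that appear later in this patch; the keyword arguments, helper names, and label constants are copied from those examples and may not cover every available parameter.)

    BinaryBuildWorkflow(
        os=OperatingSystem.LINUX,
        package_type="manywheel",
        build_configs=generate_binary_build_matrix.generate_wheels_matrix(
            OperatingSystem.LINUX,
            arches=["11.6"],
            python_versions=["3.7"],
        ),
        # Run only on trunk; other entries in this patch instead gate on ciflow labels via
        # ciflow_config=CIFlowConfig(labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}).
        branches="master",
    )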
#### ciflow (trunk) diff --git a/.github/scripts/build_publish_nightly_docker.sh b/.github/scripts/build_publish_nightly_docker.sh deleted file mode 100644 index db84704aa3e4..000000000000 --- a/.github/scripts/build_publish_nightly_docker.sh +++ /dev/null @@ -1,44 +0,0 @@ -#!/usr/bin/env bash - -set -xeuo pipefail - -PYTORCH_DOCKER_TAG=$(git describe --tags --always)-devel -CUDA_VERSION=11.3.1 - -# Build PyTorch nightly docker -make -f docker.Makefile \ - DOCKER_REGISTRY=ghcr.io \ - DOCKER_ORG=pytorch \ - CUDA_VERSION=${CUDA_VERSION} \ - DOCKER_IMAGE=pytorch-nightly \ - DOCKER_TAG=${PYTORCH_DOCKER_TAG} \ - INSTALL_CHANNEL=pytorch-nightly BUILD_TYPE=official devel-image - -# Get the PYTORCH_NIGHTLY_COMMIT from the docker image -PYTORCH_NIGHTLY_COMMIT=$(docker run \ - ghcr.io/pytorch/pytorch-nightly:${PYTORCH_DOCKER_TAG} \ - python -c 'import torch; print(torch.version.git_version)' | head -c 7) - -docker tag ghcr.io/pytorch/pytorch-nightly:${PYTORCH_DOCKER_TAG} \ - ghcr.io/pytorch/pytorch-nightly:${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION} - -docker tag ghcr.io/pytorch/pytorch-nightly:${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION} \ - ghcr.io/pytorch/pytorch-nightly:latest - -if [[ ${WITH_PUSH:-} == "true" ]]; then - # Push the nightly docker to GitHub Container Registry - echo $GHCR_PAT | docker login ghcr.io -u pytorch --password-stdin - make -f docker.Makefile \ - DOCKER_REGISTRY=ghcr.io \ - DOCKER_ORG=pytorch \ - DOCKER_IMAGE=pytorch-nightly \ - DOCKER_TAG=${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION} \ - devel-push - - make -f docker.Makefile \ - DOCKER_REGISTRY=ghcr.io \ - DOCKER_ORG=pytorch \ - DOCKER_IMAGE=pytorch-nightly \ - DOCKER_TAG=latest \ - devel-push -fi diff --git a/.github/scripts/build_triton_wheel.py b/.github/scripts/build_triton_wheel.py new file mode 100644 index 000000000000..d9d2a2e98bd3 --- /dev/null +++ b/.github/scripts/build_triton_wheel.py @@ -0,0 +1,51 @@ +#!/usr/bin/env python3 +from subprocess import check_call +from pathlib import Path +from tempfile import TemporaryDirectory +import sys +import shutil +SCRIPT_DIR = Path(__file__).parent + +def read_triton_pin() -> str: + with open(SCRIPT_DIR.parent / "ci_commit_pins" / "triton.txt") as f: + return f.read().strip() + + +def check_and_replace(inp: str, src: str, dst: str) -> str: + """ Checks that `src` can be found in `input` and replaces it with `dst` """ + if src not in inp: + raise RuntimeError(f"Can't find ${src} in the input") + return inp.replace(src, dst) + + +def patch_setup_py(path: Path, *, version: str = "2.0.0", name: str = "triton") -> None: + with open(path) as f: + orig = f.read() + # Replace name + orig = check_and_replace(orig, "name=\"triton\",", f"name=\"{name}\",") + # Replace version + orig = check_and_replace(orig, "version=\"2.0.0\",", f"version=\"{version}\",") + with open(path, "w") as f: + f.write(orig) + + +def build_triton(commit_hash: str) -> Path: + with TemporaryDirectory() as tmpdir: + triton_basedir = Path(tmpdir) / "triton" + triton_pythondir = triton_basedir / "python" + check_call(["git", "clone", "https://github.com/openai/triton"], cwd=tmpdir) + check_call(["git", "checkout", commit_hash], cwd=triton_basedir) + patch_setup_py(triton_pythondir / "setup.py", name="torchtriton", version=f"2.0.0+{commit_hash[:10]}") + check_call([sys.executable, "setup.py", "bdist_wheel"], cwd=triton_pythondir) + whl_path = list((triton_pythondir / "dist").glob("*.whl"))[0] + shutil.copy(whl_path, Path.cwd()) + return Path.cwd() / whl_path.name + + +def main() -> None: + pin = read_triton_pin() 
+ build_triton(pin) + + +if __name__ == "__main__": + main() diff --git a/.github/scripts/check_labels.py b/.github/scripts/check_labels.py new file mode 100755 index 000000000000..2d4a216daf94 --- /dev/null +++ b/.github/scripts/check_labels.py @@ -0,0 +1,87 @@ +#!/usr/bin/env python3 +"""check_labels.py""" + +from typing import Any, List + +from export_pytorch_labels import get_pytorch_labels +from gitutils import ( + get_git_remote_name, + get_git_repo_dir, + GitRepo, +) +from trymerge import ( + _fetch_url, + gh_post_pr_comment, + GitHubPR, +) + + +BOT_AUTHORS = ["github-actions", "pytorchmergebot", "pytorch-bot"] + +ERR_MSG_TITLE = "This PR needs a label" +ERR_MSG = ( + f"# {ERR_MSG_TITLE}\n" + "If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.\n\n" # noqa: E501 pylint: disable=line-too-long + "If not, please add the `topic: not user facing` label.\n\n" + "For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work." # noqa: E501 pylint: disable=line-too-long +) + + +def get_release_notes_labels() -> List[str]: + return [label for label in get_pytorch_labels() if label.lstrip().startswith("release notes:")] + + +def delete_comment(comment_id: int) -> None: + url = f"https://api.github.com/repos/pytorch/pytorch/issues/comments/{comment_id}" + _fetch_url(url, method="DELETE") + + +def has_required_labels(pr: GitHubPR) -> bool: + pr_labels = pr.get_labels() + # Check if PR is not user facing + is_not_user_facing_pr = any(label.strip() == "topic: not user facing" for label in pr_labels) + return is_not_user_facing_pr or any(label.strip() in get_release_notes_labels() for label in pr_labels) + + +def delete_comments(pr: GitHubPR) -> None: + # Delete all previous comments + for comment in pr.get_comments(): + if comment.body_text.lstrip(" #").startswith(ERR_MSG_TITLE) and comment.author_login in BOT_AUTHORS: + delete_comment(comment.database_id) + + +def add_comment(pr: GitHubPR) -> None: + # Only make a comment if one doesn't exist already + for comment in pr.get_comments(): + if comment.body_text.lstrip(" #").startswith(ERR_MSG_TITLE) and comment.author_login in BOT_AUTHORS: + return + gh_post_pr_comment(pr.org, pr.project, pr.pr_num, ERR_MSG) + + +def parse_args() -> Any: + from argparse import ArgumentParser + parser = ArgumentParser("Check PR labels") + parser.add_argument("pr_num", type=int) + + return parser.parse_args() + + +def main() -> None: + args = parse_args() + repo = GitRepo(get_git_repo_dir(), get_git_remote_name()) + org, project = repo.gh_owner_and_name() + pr = GitHubPR(org, project, args.pr_num) + + try: + if not has_required_labels(pr): + print(ERR_MSG) + add_comment(pr) + exit(1) + else: + delete_comments(pr) + except Exception as e: + pass + + +if __name__ == "__main__": + main() diff --git a/.github/scripts/comment_on_pr.py b/.github/scripts/comment_on_pr.py new file mode 100644 index 000000000000..06b2eefe0988 --- /dev/null +++ b/.github/scripts/comment_on_pr.py @@ -0,0 +1,34 @@ +from typing import Any +from trymerge import gh_post_pr_comment +from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo +from trymerge_explainer import BOT_COMMANDS_WIKI +import os + + +def parse_args() -> Any: + from argparse import ArgumentParser + + parser = ArgumentParser("Comment on a PR") + parser.add_argument("pr_num", type=int) + parser.add_argument("action", type=str) + return parser.parse_args() + + +def main() 
-> None: + args = parse_args() + repo = GitRepo(get_git_repo_dir(), get_git_remote_name(), debug=True) + org, project = repo.gh_owner_and_name() + run_url = os.environ.get("GH_RUN_URL") + + job_link = f"[job]({run_url})" if run_url is not None else "job" + msg = ( + f"The {args.action} {job_link} was canceled. If you believe this is a mistake," + + f"then you can re trigger it through [pytorch-bot]({BOT_COMMANDS_WIKI})." + ) + + gh_post_pr_comment(org, project, args.pr_num, msg) + print(org, project, args.pr_num, msg) + + +if __name__ == "__main__": + main() diff --git a/.github/scripts/ensure_actions_will_cancel.py b/.github/scripts/ensure_actions_will_cancel.py index c479aefb9fc4..729d02f560fa 100755 --- a/.github/scripts/ensure_actions_will_cancel.py +++ b/.github/scripts/ensure_actions_will_cancel.py @@ -42,26 +42,26 @@ def should_check(filename: Path) -> bool: print("ERROR: duplicate workflow name:", name, file=sys.stderr) errors_found = True names.add(name) - - expected = { - "group": EXPECTED_GROUP, - "cancel-in-progress": True, - } - actual = data.get("concurrency", None) - if actual != expected: + actual = data.get("concurrency", {}) + if not actual.get("group", "").startswith(EXPECTED_GROUP): print( f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'", file=sys.stderr, ) print( - f"expected: {expected}", + f"concurrency group should start with {EXPECTED_GROUP} but found {actual.get('group', None)}", file=sys.stderr, ) + errors_found = True + if not actual.get("cancel-in-progress", False): print( - f"actual: {actual}", + f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'", + file=sys.stderr, + ) + print( + f"concurrency cancel-in-progress should be True but found {actual.get('cancel-in-progress', None)}", file=sys.stderr, ) - errors_found = True if errors_found: sys.exit(1) diff --git a/.github/scripts/fetch_latest_green_commit.py b/.github/scripts/fetch_latest_green_commit.py index c9bb4830ab72..447b76b2dd8b 100644 --- a/.github/scripts/fetch_latest_green_commit.py +++ b/.github/scripts/fetch_latest_green_commit.py @@ -84,8 +84,6 @@ def isGreen(commit: str, results: Dict[str, Any]) -> Tuple[bool, str]: return (False, workflowName + " checks were not successful") else: regex[required_check] = True - if workflowName in ["periodic", "docker-release-builds"] and conclusion not in ["success", "skipped"]: - return (False, workflowName + " checks were not successful") missing_workflows = [x for x in regex.keys() if not regex[x]] if len(missing_workflows) > 0: @@ -110,7 +108,7 @@ def main() -> None: ) qlambda = rs.QueryLambda.retrieve( 'commit_jobs_batch_query', - version='15aba20837ae9d75', + version='8003fdfd18b64696', workspace='commons') commits = get_latest_commits() diff --git a/.github/scripts/filter_test_configs.py b/.github/scripts/filter_test_configs.py new file mode 100755 index 000000000000..eab32401ad97 --- /dev/null +++ b/.github/scripts/filter_test_configs.py @@ -0,0 +1,207 @@ +#!/usr/bin/env python3 + +import sys +import re +import json +import os +import requests +from typing import Any, Dict, Set, List +import yaml +import warnings + +PREFIX = "test-config/" + +# Same as shard names +VALID_TEST_CONFIG_LABELS = {f"{PREFIX}{label}" for label in { + "backwards_compat", + "crossref", + "default", + "deploy", + "distributed", + "docs_tests", + "dynamo", + "force_on_cpu", + "functorch", + "inductor", + "inductor_distributed", + "inductor_huggingface", + "inductor_timm", + "inductor_torchbench", + "jit_legacy", + "multigpu", 
+ "nogpu_AVX512", + "nogpu_NO_AVX2", + "slow", + "tsan", + "xla", +}} + +# Supported modes when running periodically +SUPPORTED_PERIODICAL_MODES = { + "mem_leak_check", + "rerun_disabled_tests", +} + + +def parse_args() -> Any: + from argparse import ArgumentParser + parser = ArgumentParser("Filter all test configurations and keep only requested ones") + parser.add_argument("--test-matrix", type=str, required=True, help="the original test matrix") + parser.add_argument("--pr-number", type=str, help="the pull request number") + parser.add_argument("--tag", type=str, help="the associated tag if it exists") + parser.add_argument("--event-name", type=str, help="name of the event that triggered the job (pull, schedule, etc)") + parser.add_argument("--schedule", type=str, help="cron schedule that triggered the job") + return parser.parse_args() + + +def get_labels(pr_number: int) -> Set[str]: + """ + Dynamical get the latest list of labels from the pull request + """ + # From https://docs.github.com/en/actions/learn-github-actions/environment-variables + PYTORCH_REPO = os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch") + PYTORCH_GITHUB_API = f"https://api.github.com/repos/{PYTORCH_REPO}" + GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] + + REQUEST_HEADERS = { + "Accept": "application/vnd.github.v3+json", + "Authorization": "token " + GITHUB_TOKEN, + } + + response = requests.get( + f"{PYTORCH_GITHUB_API}/issues/{pr_number}/labels", + headers=REQUEST_HEADERS, + ) + + if response.status_code != requests.codes.ok: + warnings.warn(f"Failed to get the labels for #{pr_number} (status code {response.status_code})") + return set() + + return {label.get("name") for label in response.json() if label.get("name")} + + +def filter(test_matrix: Dict[str, List[Any]], labels: Set[str]) -> Dict[str, List[Any]]: + """ + Select the list of test config to run from the test matrix. The logic works + as follows: + + If the PR has one or more labels as specified in the VALID_TEST_CONFIG_LABELS set, only + these test configs will be selected. This also works with ciflow labels, for example, + if a PR has both ciflow/trunk and test-config/functorch, only trunk functorch builds + and tests will be run + + If the PR has none of the test-config label, all tests are run as usual. 
+ """ + + filtered_test_matrix: Dict[str, List[Any]] = { + "include": [] + } + + for entry in test_matrix.get("include", []): + config_name = entry.get("config", "") + if not config_name: + continue + + label = f"{PREFIX}{config_name.strip()}" + if label in labels: + print(f"Select {config_name} because label {label} is presented in the pull request by the time the test starts") + filtered_test_matrix["include"].append(entry) + + valid_test_config_labels = labels.intersection(VALID_TEST_CONFIG_LABELS) + + if not filtered_test_matrix["include"] and not valid_test_config_labels: + # Found no valid label and the filtered test matrix is empty, return the same + # test matrix as before so that all tests can be run normally + return test_matrix + else: + # When the filter test matrix contain matches or if a valid test config label + # is found in the PR, return the filtered test matrix + return filtered_test_matrix + + +def set_periodic_modes(test_matrix: Dict[str, List[Any]]) -> Dict[str, List[Any]]: + """ + Apply all periodic modes when running under a schedule + """ + scheduled_test_matrix: Dict[str, List[Any]] = { + "include": [], + } + + for config in test_matrix.get("include", []): + for mode in SUPPORTED_PERIODICAL_MODES: + cfg = config.copy() + cfg[mode] = mode + scheduled_test_matrix["include"].append(cfg) + + return scheduled_test_matrix + + +def set_output(name: str, val: Any) -> None: + if os.getenv("GITHUB_OUTPUT"): + with open(str(os.getenv("GITHUB_OUTPUT")), "a") as env: + print(f"{name}={val}", file=env) + else: + print(f"::set-output name={name}::{val}") + + +def main() -> None: + args = parse_args() + # Load the original test matrix set by the workflow. Its format, however, + # doesn't follow the strict JSON format, so we load it using yaml here for + # its more relaxed syntax + test_matrix = yaml.safe_load(args.test_matrix) + + if test_matrix is None: + warnings.warn(f"Invalid test matrix input '{args.test_matrix}', exiting") + # We handle invalid test matrix gracefully by marking it as empty + set_output("is-test-matrix-empty", True) + sys.exit(0) + + pr_number = args.pr_number + tag = args.tag + + # If the tag matches, we can get the PR number from it, this is from ciflow + # workflow dispatcher + tag_regex = re.compile(r"^ciflow/\w+/(?P\d+)$") + + if pr_number: + # If a PR number is set, query all the labels from that PR + labels = get_labels(int(pr_number)) + # Then filter the test matrix and keep only the selected ones + filtered_test_matrix = filter(test_matrix, labels) + + elif tag: + m = tag_regex.match(tag) + + if m: + pr_number = m.group("pr_number") + + # The PR number can also come from the tag in ciflow tag event + labels = get_labels(int(pr_number)) + # Filter the test matrix and keep only the selected ones + filtered_test_matrix = filter(test_matrix, labels) + + else: + # There is a tag but it isn't ciflow, so there is nothing left to do + filtered_test_matrix = test_matrix + + else: + # No PR number, no tag, we can just return the test matrix as it is + filtered_test_matrix = test_matrix + + if args.event_name == "schedule" and args.schedule == '29 8 * * *': + # we don't want to run the mem leack check or disabled tests on normal + # periodically scheduled jobs, only the ones at this time + filtered_test_matrix = set_periodic_modes(filtered_test_matrix) + + # Set the filtered test matrix as the output + set_output("test-matrix", json.dumps(filtered_test_matrix)) + + filtered_test_matrix_len = len(filtered_test_matrix.get("include", [])) + # and also put a flag 
if the test matrix is empty, so subsequent jobs can + # quickly check it without the need to parse the JSON string + set_output("is-test-matrix-empty", filtered_test_matrix_len == 0) + + +if __name__ == "__main__": + main() diff --git a/.github/scripts/generate_binary_build_matrix.py b/.github/scripts/generate_binary_build_matrix.py index b1e3b46bda34..deb225287b3f 100644 --- a/.github/scripts/generate_binary_build_matrix.py +++ b/.github/scripts/generate_binary_build_matrix.py @@ -13,10 +13,10 @@ from typing import Dict, List, Tuple, Optional -CUDA_ARCHES = ["10.2", "11.3", "11.6", "11.7"] +CUDA_ARCHES = ["11.6", "11.7"] -ROCM_ARCHES = ["5.1.1", "5.2"] +ROCM_ARCHES = ["5.2", "5.3"] def arch_type(arch_version: str) -> str: @@ -90,11 +90,8 @@ def generate_conda_matrix(os: str) -> List[Dict[str, str]]: ret: List[Dict[str, str]] = [] arches = ["cpu"] python_versions = FULL_PYTHON_VERSIONS - if os == "linux": + if os == "linux" or os == "windows": arches += CUDA_ARCHES - elif os == "windows": - # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 - arches += list_without(CUDA_ARCHES, ["10.2"]) elif os == "macos-arm64": python_versions = list_without(python_versions, ["3.7"]) for python_version in python_versions: @@ -129,8 +126,7 @@ def generate_libtorch_matrix(os: str, abi_version: str, arches += CUDA_ARCHES arches += ROCM_ARCHES elif os == "windows": - # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 - arches += list_without(CUDA_ARCHES, ["10.2"]) + arches += CUDA_ARCHES if libtorch_variants is None: libtorch_variants = [ @@ -198,8 +194,7 @@ def generate_wheels_matrix(os: str, if os == "linux": arches += CUDA_ARCHES + ROCM_ARCHES elif os == "windows": - # We don't build CUDA 10.2 for window see https://github.com/pytorch/pytorch/issues/65648 - arches += list_without(CUDA_ARCHES, ["10.2"]) + arches += CUDA_ARCHES ret: List[Dict[str, str]] = [] for python_version in python_versions: @@ -209,6 +204,32 @@ def generate_wheels_matrix(os: str, # Skip rocm 3.11 binaries for now as the docker image are not correct if python_version == "3.11" and gpu_arch_type == "rocm": continue + + # special 11.7 wheels package without dependencies + # dependency downloaded via pip install + if arch_version == "11.7" and os == "linux": + ret.append( + { + "python_version": python_version, + "gpu_arch_type": gpu_arch_type, + "gpu_arch_version": gpu_arch_version, + "desired_cuda": translate_desired_cuda( + gpu_arch_type, gpu_arch_version + ), + "container_image": WHEEL_CONTAINER_IMAGES[arch_version], + "package_type": package_type, + "pytorch_extra_install_requirements": + "nvidia-cuda-runtime-cu11; platform_system == 'Linux' | " + "nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | " + "nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux'", + "build_name": + f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}-with-pypi-cudnn" + .replace( + ".", "_" + ), + } + ) + ret.append( { "python_version": python_version, diff --git a/.github/scripts/generate_ci_workflows.py b/.github/scripts/generate_ci_workflows.py index 653cfeebaab7..35680e30ee6a 100755 --- a/.github/scripts/generate_ci_workflows.py +++ b/.github/scripts/generate_ci_workflows.py @@ -134,7 +134,7 @@ class OperatingSystem: package_type="manywheel", build_configs=generate_binary_build_matrix.generate_wheels_matrix( OperatingSystem.LINUX, - arches=["10.2"], + arches=["11.6"], python_versions=["3.7"]), branches="master", ), @@ -154,7 +154,7 @@ class 
OperatingSystem: package_type="libtorch", abi_version=generate_binary_build_matrix.PRE_CXX11_ABI, build_configs=generate_binary_build_matrix.generate_libtorch_matrix( - OperatingSystem.LINUX, generate_binary_build_matrix.CXX11_ABI, + OperatingSystem.LINUX, generate_binary_build_matrix.PRE_CXX11_ABI, arches=["cpu"], libtorch_variants=["shared-with-deps"], ), @@ -207,15 +207,6 @@ class OperatingSystem: ), ] WINDOWS_BINARY_SMOKE_WORKFLOWS = [ - BinaryBuildWorkflow( - os=OperatingSystem.WINDOWS, - package_type="wheel", - build_configs=generate_binary_build_matrix.generate_wheels_matrix( - OperatingSystem.WINDOWS, - arches=["11.3"], - python_versions=["3.7"]), - branches="master", - ), BinaryBuildWorkflow( os=OperatingSystem.WINDOWS, package_type="libtorch", @@ -286,7 +277,7 @@ class OperatingSystem: BinaryBuildWorkflow( os=OperatingSystem.MACOS_ARM64, package_type="wheel", - build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.MACOS), + build_configs=generate_binary_build_matrix.generate_wheels_matrix(OperatingSystem.MACOS_ARM64), cross_compile_arm64=True, ciflow_config=CIFlowConfig( labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL}, diff --git a/.github/scripts/generate_pytorch_version.py b/.github/scripts/generate_pytorch_version.py index 0655df137e07..02c19844cd09 100755 --- a/.github/scripts/generate_pytorch_version.py +++ b/.github/scripts/generate_pytorch_version.py @@ -23,27 +23,22 @@ def get_pytorch_root() -> Path: def get_tag() -> str: root = get_pytorch_root() - # We're on a tag - am_on_tag = ( - subprocess.run( - ['git', 'describe', '--tags', '--exact'], - cwd=root, - stdout=subprocess.DEVNULL, - stderr=subprocess.DEVNULL - ).returncode == 0 - ) - tag = "" - if am_on_tag: + try: dirty_tag = subprocess.check_output( - ['git', 'describe'], + ['git', 'describe', '--tags', '--exact'], cwd=root ).decode('ascii').strip() - # Strip leading v that we typically do when we tag branches - # ie: v1.7.1 -> 1.7.1 - tag = re.sub(LEADING_V_PATTERN, "", dirty_tag) - # Strip trailing rc pattern - # ie: 1.7.1-rc1 -> 1.7.1 - tag = re.sub(TRAILING_RC_PATTERN, "", tag) + except subprocess.CalledProcessError: + return "" + # Strip leading v that we typically do when we tag branches + # ie: v1.7.1 -> 1.7.1 + tag = re.sub(LEADING_V_PATTERN, "", dirty_tag) + # Strip trailing rc pattern + # ie: 1.7.1-rc1 -> 1.7.1 + tag = re.sub(TRAILING_RC_PATTERN, "", tag) + # Ignore ciflow tags + if tag.startswith("ciflow/"): + return "" return tag def get_base_version() -> str: diff --git a/.github/scripts/gql_mocks.json b/.github/scripts/gql_mocks.json index b146600f936a..7f6dbc05d341 100644 --- a/.github/scripts/gql_mocks.json +++ b/.github/scripts/gql_mocks.json @@ -1,20 +1,20 @@ { - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=73811 owner=pytorch": { + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=82169 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, "isCrossRepository": false, "author": { - "login": "seemethere" + "login": "ezyang" }, - "title": "ci: Migrate metrics credentials to managed IAM", - "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* __->__ #73811\n\r\nMigrates our credentials to upload metrics statistics to managed IAM\r\ncredentials in order to make it easier to know where the credentials are\r\ncoming from and to make it easier to add more permissions / less\r\npermissions later on.\r\n\r\nRelates to work done in 
[D34535827](https://www.internalfb.com/diff/D34535827)\r\n\r\nSigned-off-by: Eli Uriegas ", - "headRefName": "gh/seemethere/215/head", + "title": "Move test_dtypes so it runs later", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):\n* __->__ #82169\n\nThe error messages it gives are very unhelpful (because a failure\ngets translated into \"dtype was not supported\" rather than the\nactual backtrace), so I'd rather get error messages about this after\nI've tested basic functionality.\n\nSigned-off-by: Edward Z. Yang ", + "headRefName": "gh/ezyang/1279/head", "headRepository": { "nameWithOwner": "pytorch/pytorch" }, - "baseRefName": "gh/seemethere/215/base", + "baseRefName": "gh/ezyang/1279/base", "baseRepository": { "nameWithOwner": "pytorch/pytorch", "isPrivate": false, @@ -29,32 +29,44 @@ "commit": { "author": { "user": { - "login": "seemethere" + "login": "ezyang" }, - "email": "eliuriegas@fb.com", - "name": "Eli Uriegas" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "13c44d16a876a56bca479b4cf30715d21fa16e99" + "oid": "cef34da55a59da5a32494bff218ccd4978b659d3" } }, { "commit": { "author": { "user": { - "login": "seemethere" + "login": "ezyang" }, - "email": "eliuriegas@fb.com", - "name": "Eli Uriegas" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + "oid": "83ad7e73a07111ac1d85e931d14360cc22c01edd" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "28140e4008289251b695385acfb48ac7a47cd49c" } } ], "pageInfo": { - "endCursor": "Mg", + "endCursor": "Mw", "hasNextPage": false }, - "totalCount": 2 + "totalCount": 3 }, "commits": { "nodes": [ @@ -62,54 +74,6 @@ "commit": { "checkSuites": { "edges": [ - { - "node": { - "app": { - "name": "Facebook GitHub Tools", - "databaseId": 12274 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [ - { - "name": "Facebook CLA Check", - "conclusion": "SUCCESS", - "detailsUrl": "https://code.intern.facebook.com/cla/" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOaHA=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658275867" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcBs=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276090" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPo=" - }, { "node": { "app": { @@ -118,20 +82,61 @@ }, "workflowRun": { "workflow": { - "name": "win-vs2019-cpu-py3" + "name": "Lint" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310707890" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708140" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708223" + }, + { + "name": "Test collect_env (older_python_version)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708332" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708496" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708710" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708937" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310709169" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGj1lc=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276092" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696649" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8k=" }, { "node": { @@ -141,7 +146,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3-clang5-mobile-build" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { @@ -152,9 +157,9 @@ } }, "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276094" + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696651" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8s=" }, { "node": { @@ -164,20 +169,26 @@ }, "workflowRun": { "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823982/jobs/4310707884" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjz0w=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276095" + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696656" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9A=" }, { "node": { @@ -198,9 +209,9 @@ } }, "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276097" + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696660" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9Q=" }, { "node": { @@ -210,7 +221,7 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + "name": "pull" } }, "checkRuns": { @@ -221,9 +232,9 @@ } }, "conclusion": "CANCELLED", - "url": 
"https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276098" + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696715" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdAs=" }, { "node": { @@ -233,375 +244,304 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc7-no-ops" + "name": "pull" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815315?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRM=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276099" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQM=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "Test tools" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276100" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQQ=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-clang7-asan" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276101" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQU=" - } - ], - "pageInfo": { - "hasNextPage": true - } - }, - "pushedDate": "2022-03-14T23:01:55Z", - "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" - } - } - ] - }, - "changedFiles": 3, - "files": { - "nodes": [ - { - "path": ".github/templates/common.yml.j2" - }, - { - "path": ".github/workflows/generated-macos-11-py3-x86-64.yml" - }, - { - "path": ".github/workflows/update_pytorch_labels.yml" - } - ], - "pageInfo": { - "endCursor": "Mw", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ - { - "author": { - "login": "kit1980" - }, - "state": "APPROVED" - }, - { - "author": { - "login": "janeyx99" - }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0wNFQxNjoyNDo0OC0wNjowMLkyMDIyLTAzLTA0VDE2OjI0OjQ4LTA2OjAwzjWwwqA=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ - { - "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1988337976", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1068270969 - }, - { - "bodyText": "@pytorchbot force merge this", - "author": { - "login": "seemethere" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1068436128 - }, - { - "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1989076952", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1068437098 - }, - { - "bodyText": "@pytorchbot merge this", - 
"author": { - "login": "seemethere" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1068482921 - }, - { - "bodyText": "Hey @seemethere.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1068484404 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOP6yFeQ==", - "hasPreviousPage": true - } - }, - "labels": { - "edges": [ - { - "node": { - "name": "cla signed" - } - } - ] - } - } - } - } - }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcQU= name=pytorch number=73811 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ - { - "commit": { - "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7", - "checkSuites": { - "edges": [ - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276102" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQY=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-bionic-py3.7-clang9" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276103" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQc=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-clang7-onnx" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276104" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQg=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc7" - } - }, - "checkRuns": { - "nodes": [ + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708487" + }, { - "name": "build", + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815361?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708713" }, { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5545915218?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708942" }, { - "name": "test (distributed, 1, 1, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545915270?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709174" }, { - "name": "test (default, 1, 2, linux.2xlarge)", + "name": "linux-bionic-py3_7-clang8-xla / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545915344?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP89A=", - "hasNextPage": false - } - }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276105" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQk=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276106" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQo=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" - } - }, - "checkRuns": { - "nodes": [ + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709340" + }, { - "name": "build-and-test", + "name": "linux-focal-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815353?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObTk=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276107" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQs=" - }, - { + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709579" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709844" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710003" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710175" + }, + { + "name": "win-vs2019-cuda11.6-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710516" + }, + { + "name": "linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710716" + }, + { + 
"name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710890" + }, + { + "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711097" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711234" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711429" + }, + { + "name": "linux-focal-rocm5.2-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711603" + }, + { + "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711765" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711946" + }, + { + "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310712129" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310712276" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194495" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194591" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194659" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194749" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194858" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194934" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311195003" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311220458" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311220540" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311222725" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311222869" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223128" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223225" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223324" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223396" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223496" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223569" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223690" + }, + { + "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311224360" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311230050" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311301930" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302152" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302303" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302433" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302531" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491082" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491172" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491232" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491289" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491348" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcG0YME=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696836" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdIQ=" + }, + { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 + "name": "Facebook GitHub Tools", + "databaseId": 12274 }, - "workflowRun": { - "workflow": { - "name": "linux-docs" + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjyQg=", + "hasNextPage": false } }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696896" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdMA=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, "checkRuns": { "nodes": [], "pageInfo": { @@ -609,22 +549,18 @@ "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276110" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697185" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQ4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdeE=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "win-vs2019-cuda11.3-py3" - } + "name": "Azure Pipelines", + "databaseId": 9426 }, + "workflowRun": null, "checkRuns": { "nodes": [], "pageInfo": { @@ -632,82 +568,197 @@ "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276111" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697205" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQ8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdfU=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 + "name": "Dependabot", + "databaseId": 29110 }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7" + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + 
"hasNextPage": false } }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815317?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546189850?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546189908?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546189954?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUJII=", - "hasNextPage": false - } - }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276112" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRA=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pytorch-xla-linux-bionic-py3.7-clang8" - } - }, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276114" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697224" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdgg=" } ], "pageInfo": { "hasNextPage": true } + }, + "status": null, + "pushedDate": "2022-07-27T15:34:17Z", + "oid": "28140e4008289251b695385acfb48ac7a47cd49c" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "test/test_ops.py" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "zou3519" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "Chillee" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0yNVQxNDo0NTozNS0wNzowMLkyMDIyLTA3LTI1VDE0OjQ1OjM1LTA3OjAwzj6XYmg=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "@pytorchbot merge -f FORCE", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197107402 + }, + { + "bodyText": "You need to provide a reason for using force merge, in the format @pytorchbot merge -f '[CATEGORY] Explanation'. With [CATEGORY] being one the following:\nEMERGENCY - an emergency fix to quickly address an issue\nMINOR - a minor fix such as cleaning locally unused variables, which shouldn't break anything\nPRE_TESTED - a previous CI run tested everything and you've only added minor changes like fixing lint\nOTHER - something not covered above", + "author": { + "login": "pytorch-bot" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1197107439 + }, + { + "bodyText": "@pytorchbot merge -f \"[OTHER] normal land failed twice already\"", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197108130 + }, + { + "bodyText": "@pytorchbot successfully started a merge job. 
Check the current status here", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197119348 + }, + { + "bodyText": "Hey @ezyang.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1197120095 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOR1poyg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "Merged" + } + }, + { + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAcG0YME= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAcHRdAs= name=pytorch number=82169 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ + { + "commit": { + "oid": "28140e4008289251b695385acfb48ac7a47cd49c", + "checkSuites": { + "nodes": [ + { + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491405" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (functorch, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491484" + }, + { + "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491703" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311551941" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311552010" + }, + { + "name": "win-vs2019-cpu-py3 / test (functorch, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311552076" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcG1sTc=", + "hasNextPage": false + } + } + } + ] } } } @@ -717,7 +768,7 @@ } } }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcRI= name=pytorch number=73811 owner=pytorch": { + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAcHRdgg= name=pytorch number=82169 owner=pytorch": { "data": { "repository": { "pullRequest": { @@ -725,20 +776,16 @@ "nodes": [ { "commit": { - "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7", + "oid": "28140e4008289251b695385acfb48ac7a47cd49c", "checkSuites": { "edges": [ { "node": { "app": { - 
"name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc5.4" - } + "name": "Codecov", + "databaseId": 254 }, + "workflowRun": null, "checkRuns": { "nodes": [], "pageInfo": { @@ -746,22 +793,18 @@ "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276115" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697240" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRM=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdhg=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-vulkan-bionic-py3.7-clang9" - } + "name": "PyTorch Bot", + "databaseId": 40112 }, + "workflowRun": null, "checkRuns": { "nodes": [], "pageInfo": { @@ -769,54 +812,111 @@ "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276117" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697255" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRU=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-bionic-py3.7-clang9" - } - }, - "checkRuns": { + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdic=" + } + ], + "pageInfo": { + "hasNextPage": false + } + } + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=73811 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "seemethere" + }, + "title": "ci: Migrate metrics credentials to managed IAM", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* __->__ #73811\n\r\nMigrates our credentials to upload metrics statistics to managed IAM\r\ncredentials in order to make it easier to know where the credentials are\r\ncoming from and to make it easier to add more permissions / less\r\npermissions later on.\r\n\r\nRelates to work done in [D34535827](https://www.internalfb.com/diff/D34535827)\r\n\r\nSigned-off-by: Eli Uriegas ", + "headRefName": "gh/seemethere/215/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/seemethere/215/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "13c44d16a876a56bca479b4cf30715d21fa16e99" + } + }, + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ], + "pageInfo": { + "endCursor": "Mg", + "hasNextPage": false + }, + "totalCount": 2 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815309?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545918134?check_suite_focus=true" - }, - { - "name": "test (noarch, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545918256?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", + "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545918319?check_suite_focus=true" + "detailsUrl": "https://code.intern.facebook.com/cla/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP_28=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOaHA=", "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276119" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658275867" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcBs=" }, { "node": { @@ -826,7 +926,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc7-no-ops" + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" } }, "checkRuns": { @@ -837,9 +937,9 @@ } }, "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276122" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276090" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRo=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPo=" }, { "node": { @@ -849,36 +949,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-clang7-onnx" + "name": "win-vs2019-cpu-py3" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815351?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545931419?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545931552?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQMyA=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276123" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276092" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPw=" }, { "node": { @@ -888,41 +972,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-clang7-asan" + "name": "linux-xenial-py3-clang5-mobile-build" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815311?check_suite_focus=true" - }, - { - "name": "test (default, 3, 3, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5545947543?check_suite_focus=true" - }, - { - "name": "test (default, 1, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545947625?check_suite_focus=true" - }, - { - "name": "test (default, 2, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545947792?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQcpA=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276124" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276094" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP4=" }, { "node": { @@ -932,66 +995,20 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" } }, "checkRuns": { - "nodes": [ - { - "name": "cmakelint", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815342?check_suite_focus=true" - }, - { - "name": "clang-format", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815564?check_suite_focus=true" - }, - { - "name": "clang-tidy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815688?check_suite_focus=true" - }, - { - "name": "flake8-py3", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815821?check_suite_focus=true" - }, - { - "name": "quick-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816003?check_suite_focus=true" - }, - { - "name": "mypy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816076?check_suite_focus=true" - }, - { - "name": "py2-setup-validate-errormsg", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816154?check_suite_focus=true" - }, - { - "name": "shellcheck", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816266?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816398?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcU4=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276126" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276095" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcR4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP8=" }, { "node": { @@ -1001,26 +1018,20 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + "name": "Lint" } }, "checkRuns": { - "nodes": [ - { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815207?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKc=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": 
"SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276127" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276097" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcR8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQE=" }, { "node": { @@ -1030,7 +1041,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { @@ -1041,9 +1052,9 @@ } }, "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276129" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276098" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQI=" }, { "node": { @@ -1053,26 +1064,199 @@ }, "workflowRun": { "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + "name": "linux-xenial-py3.7-gcc7-no-ops" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602966/jobs/2839950629" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRM=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276130" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276099" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276100" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQQ=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276101" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQU=" } ], "pageInfo": { "hasNextPage": true } - } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17044969?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17045014?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + 
"state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17044975?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-03-14T23:01:55Z", + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ] + }, + "changedFiles": 3, + "files": { + "nodes": [ + { + "path": ".github/templates/common.yml.j2" + }, + { + "path": ".github/workflows/generated-macos-11-py3-x86-64.yml" + }, + { + "path": ".github/workflows/update_pytorch_labels.yml" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "kit1980" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "janeyx99" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0wNFQxNDoyNDo0OC0wODowMLkyMDIyLTAzLTA0VDE0OjI0OjQ4LTA4OjAwzjWwwqA=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1988337976", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068270969 + }, + { + "bodyText": "@pytorchbot force merge this", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068436128 + }, + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1989076952", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068437098 + }, + { + "bodyText": "@pytorchbot merge this", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068482921 + }, + { + "bodyText": "Hey @seemethere.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1068484404 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP6yFeQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" } } ] @@ -1081,7 +1265,7 @@ } } }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcSI= name=pytorch number=73811 owner=pytorch": { + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcQU= name=pytorch number=73811 owner=pytorch": { "data": { "repository": { "pullRequest": { @@ -1100,31 +1284,20 @@ }, "workflowRun": { "workflow": { - "name": "pytorch-xla-linux-bionic-py3.7-clang8" + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815348?check_suite_focus=true" - }, - { - "name": "test (xla, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545954339?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQjCM=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276131" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276102" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSM=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQY=" }, { "node": { @@ -1134,41 +1307,20 @@ }, "workflowRun": { "workflow": { - "name": "win-vs2019-cuda11.3-py3" + "name": "linux-bionic-py3.7-clang9" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815322?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546226404?check_suite_focus=true" - }, - { - "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546226489?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546226540?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUs2w=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276132" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276103" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSQ=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQc=" }, { "node": { @@ -1178,26 +1330,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + "name": "linux-xenial-py3.7-clang7-onnx" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815307?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQs=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276133" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276104" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQg=" }, { "node": { @@ -1207,26 +1353,41 @@ }, "workflowRun": { "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + "name": "linux-xenial-py3.7-gcc7" } }, "checkRuns": { "nodes": [ { - "name": "build-and-test", + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602973/jobs/2839950664" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602973/jobs/2840019714" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602973/jobs/2840019747" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815362?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602973/jobs/2840019794" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObUI=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP89A=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276134" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276105" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSY=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQk=" }, { "node": { @@ -1236,26 +1397,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" } }, "checkRuns": { - "nodes": [ - { - "name": "build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815337?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObSk=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276135" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276106" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQo=" }, { "node": { @@ -1265,31 +1420,26 @@ }, "workflowRun": { "workflow": { - "name": "linux-vulkan-bionic-py3.7-clang9" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" } }, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815561?check_suite_focus=true" - }, - { - "name": "test (default, 1, 1, linux.2xlarge)", + "name": 
"build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545929390?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602977/jobs/2839950658" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQKq4=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObTk=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276136" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276107" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSg=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQs=" }, { "node": { @@ -1303,32 +1453,16 @@ } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815356?check_suite_focus=true" - }, - { - "name": "build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545920544?check_suite_focus=true" - }, - { - "name": "build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545920612?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQCGQ=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276137" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276110" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSk=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQ4=" }, { "node": { @@ -1338,36 +1472,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-bionic-rocm4.5-py3.7" + "name": "win-vs2019-cuda11.3-py3" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815326?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545983951?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.rocm.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545984049?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqRADE=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276140" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276111" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQ8=" }, { "node": { @@ -1377,7 +1495,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3-clang5-mobile-build" + "name": "linux-xenial-cuda11.3-py3.7-gcc7" } }, "checkRuns": { @@ -1385,18 +1503,33 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815205?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602979/jobs/2839950630" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + 
"detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602979/jobs/2840213785" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602979/jobs/2840213832" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602979/jobs/2840213866" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKU=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUJII=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276141" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276112" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcS0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRA=" }, { "node": { @@ -1406,36 +1539,20 @@ }, "workflowRun": { "workflow": { - "name": "win-vs2019-cpu-py3" + "name": "pytorch-xla-linux-bionic-py3.7-clang8" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815314?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546093287?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, windows.4xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546093438?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSq34=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276143" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276114" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcS8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRI=" } ], "pageInfo": { @@ -1450,7 +1567,7 @@ } } }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcS8= name=pytorch number=73811 owner=pytorch": { + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcRI= name=pytorch number=73811 owner=pytorch": { "data": { "repository": { "pullRequest": { @@ -1473,52 +1590,16 @@ } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815359?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545923802?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545923899?check_suite_focus=true" - }, - { - "name": "test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545924024?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545924110?check_suite_focus=true" 
- }, - { - "name": "test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545924249?check_suite_focus=true" - }, - { - "name": "test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545924341?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQFvU=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276145" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276115" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRM=" }, { "node": { @@ -1528,7 +1609,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc7" + "name": "linux-vulkan-bionic-py3.7-clang9" } }, "checkRuns": { @@ -1539,9 +1620,9 @@ } }, "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276149" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276117" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRU=" }, { "node": { @@ -1551,20 +1632,41 @@ }, "workflowRun": { "workflow": { - "name": "linux-bionic-rocm4.5-py3.7" + "name": "linux-bionic-py3.7-clang9" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602984/jobs/2839950624" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602984/jobs/2840021854" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602984/jobs/2840021946" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602984/jobs/2840021988" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqP_28=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276152" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276119" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTg=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRc=" }, { "node": { @@ -1574,26 +1676,20 @@ }, "workflowRun": { "workflow": { - "name": "Test tools" + "name": "linux-xenial-py3.7-gcc7-no-ops" } }, "checkRuns": { - "nodes": [ - { - "name": "test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815310?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQ4=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276157" + "conclusion": "CANCELLED", + "url": 
"https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276122" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcT0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRo=" }, { "node": { @@ -1603,7 +1699,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + "name": "linux-xenial-py3.7-clang7-onnx" } }, "checkRuns": { @@ -1611,18 +1707,28 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545815320?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602988/jobs/2839950656" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602988/jobs/2840031185" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602988/jobs/2840031288" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRg=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQMyA=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276159" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276123" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcT8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRs=" }, { "node": { @@ -1632,7 +1738,7 @@ }, "workflowRun": { "workflow": { - "name": "macos-10-15-py3-arm64" + "name": "linux-xenial-py3.7-clang7-asan" } }, "checkRuns": { @@ -1640,18 +1746,33 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816079?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602989/jobs/2839950625" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602989/jobs/2840042498" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602989/jobs/2840042534" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602989/jobs/2840042646" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA8=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQcpA=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276857" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276124" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_k=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcRw=" }, { "node": { @@ -1661,26 +1782,66 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-arm64-coreml" + "name": "Lint" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "cmakelint", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816078?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839950650" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839950743" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839950808" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839950884" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839950992" + }, + { + "name": "mypy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839951037" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839951085" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839951170" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602990/jobs/2839951266" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA4=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcU4=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276860" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276126" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_w=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcR4=" }, { "node": { @@ -1690,26 +1851,26 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-arm64" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816071?check_suite_focus=true" + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602993/jobs/2839950562" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAc=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKc=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276861" + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276127" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcR8=" }, { "node": { @@ -1719,36 +1880,20 @@ }, "workflowRun": { "workflow": { - "name": "macos-11-py3-x86-64" + "name": "linux-xenial-cuda11.3-py3.7-gcc7" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816073?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, macos-11)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546066712?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, macos-11)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5546066787?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSQ2M=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": 
"https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276862" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276129" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSE=" }, { "node": { @@ -1758,26 +1903,20 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-arm64-custom-ops" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816081?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBE=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276864" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276130" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSI=" } ], "pageInfo": { @@ -1792,7 +1931,7 @@ } } }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCdAA= name=pytorch number=73811 owner=pytorch": { + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcSI= name=pytorch number=73811 owner=pytorch": { "data": { "repository": { "pullRequest": { @@ -1811,7 +1950,7 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-x86-64-coreml" + "name": "pytorch-xla-linux-bionic-py3.7-clang8" } }, "checkRuns": { @@ -1819,18 +1958,23 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816077?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602994/jobs/2839950655" + }, + { + "name": "test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602994/jobs/2840047401" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA0=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQjCM=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276867" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276131" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAM=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSM=" }, { "node": { @@ -1840,7 +1984,7 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-arm64-metal" + "name": "win-vs2019-cuda11.3-py3" } }, "checkRuns": { @@ -1848,18 +1992,33 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816080?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602996/jobs/2839950632" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602996/jobs/2840239369" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1983602996/jobs/2840239408" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602996/jobs/2840239445" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBA=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqUs2w=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276869" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276132" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSQ=" }, { "node": { @@ -1869,7 +2028,7 @@ }, "workflowRun": { "workflow": { - "name": "macos-10-15-py3-lite-interpreter-x86-64" + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" } }, "checkRuns": { @@ -1877,18 +2036,18 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816075?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602998/jobs/2839950621" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAs=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQs=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276873" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276133" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAk=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSU=" }, { "node": { @@ -1898,186 +2057,168 @@ }, "workflowRun": { "workflow": { - "name": "ios-12-5-1-x86-64" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5545816068?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602997/jobs/2839950665" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAQ=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObUI=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276881" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276134" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdBE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSY=" }, { "node": { "app": { - "name": "Netlify", - "databaseId": 13473 + "name": "GitHub Actions", + "databaseId": 15368 }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277331" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCddM=" - }, - { - "node": { - "app": { - "name": "Azure Pipelines", - "databaseId": 9426 - }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1983603001/jobs/2839950648" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObSk=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277340" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276135" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCddw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSc=" }, { "node": { "app": { - "name": "Dependabot", - "databaseId": 29110 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603002/jobs/2839950741" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603002/jobs/2840029810" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQKq4=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277346" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276136" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdeI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSg=" }, { "node": { "app": { - "name": "Codecov", - "databaseId": 254 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603000/jobs/2839950661" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603000/jobs/2840023513" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603000/jobs/2840023552" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQCGQ=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277350" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276137" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdeY=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSk=" }, { "node": { "app": { - "name": "PyTorch Bot", - "databaseId": 40112 + "name": "GitHub Actions", + "databaseId": 15368 }, - "workflowRun": null, - "checkRuns": { - "nodes": [], + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603003/jobs/2839950637" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603003/jobs/2840068586" + }, + { + "name": 
"test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603003/jobs/2840068671" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqRADE=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277355" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276140" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdes=" - } - ], - "pageInfo": { - "hasNextPage": false - } - } - } - } - ] - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=31093 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": true, - "isCrossRepository": true, - "author": { - "login": "mingxiaoh" - }, - "title": "improve mkldnn convolution test coverage", - "body": "This pr will improve the test coverage of mkldnn convolution.\r\n1.test input: specific sensitive numbers\r\n2.pass criteria: output of mkldnn convolution matches output of thnn convolution\r\n3.coverage: by using coverage tool, we found out the following sensitive parameters. Overall the case will test 4352 patterns, takes 8.8s on my machine.\r\n\r\nto run the test case:\r\n\r\npython test_mkldnn_conv2d_ext.py\r\nor\r\npython run_test.py -i mkldnn_conv2d_ext\r\n\r\nIn case of failure, the pattern will be printed in the log for further debugging.\r\n\r\nactually, this PR is created to replace and improve that PR we created before(https://github.com/pytorch/pytorch/pull/25085) ", - "headRefName": "master", - "headRepository": { - "nameWithOwner": "mingxiaoh/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ - { - "commit": { - "author": { - "user": { - "login": "11pikachu" - }, - "email": "junx.du@intel.com", - "name": "dujun" - }, - "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" - } - } - ], - "pageInfo": { - "endCursor": "MQ", - "hasNextPage": false - }, - "totalCount": 1 - }, - "commits": { - "nodes": [ - { - "commit": { - "checkSuites": { - "edges": [ + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcSw=" + }, { "node": { "app": { @@ -2086,26 +2227,26 @@ }, "workflowRun": { "workflow": { - "name": "clang-format" + "name": "linux-xenial-py3-clang5-mobile-build" } }, "checkRuns": { "nodes": [ { - "name": "clang-format", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676797?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603004/jobs/2839950560" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHOQYu8fQ==", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObKU=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281097" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276141" }, - "cursor": "Y3Vyc29yOnYyOpHORg1dyQ==" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcS0=" }, { "node": { @@ -2115,2861 +2256,7081 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "win-vs2019-cpu-py3" } }, "checkRuns": { "nodes": [ { - 
"name": "flake8-py3", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676800?check_suite_focus=true" - }, - { - "name": "quick-checks", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676817?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603005/jobs/2839950626" }, { - "name": "clang-tidy", + "name": "test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676829?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603005/jobs/2840145642" }, { - "name": "cmakelint", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676840?check_suite_focus=true" + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603005/jobs/2840145755" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHOQYu8qA==", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSq34=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281099" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276143" }, - "cursor": "Y3Vyc29yOnYyOpHORg1dyw==" - }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcS8=" + } + ], + "pageInfo": { + "hasNextPage": true + } + } + } + } + ] + } + } + } + } + }, + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCcS8= name=pytorch number=73811 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ + { + "commit": { + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7", + "checkSuites": { + "edges": [ { "node": { "app": { - "name": "Codecov", - "databaseId": 254 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } }, - "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "codecov/project", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://codecov.io" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2839950666" }, { - "name": "codecov/patch", + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://codecov.io" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840025927" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840025995" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840026086" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840026134" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840026235" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1983603007/jobs/2840026282" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHOQZhcFQ==", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqQFvU=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100822" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276145" }, - "cursor": "Y3Vyc29yOnYyOpHORhnf1g==" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTE=" }, { "node": { "app": { - "name": "Codecov", - "databaseId": 254 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [ - { - "name": "codecov/patch", - "conclusion": "SUCCESS", - "detailsUrl": "https://codecov.io" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHOQZZsEQ==", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100824" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276149" }, - "cursor": "Y3Vyc29yOnYyOpHORhnf2A==" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTU=" }, { "node": { "app": { - "name": "Facebook GitHub Tools", - "databaseId": 12274 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [ - { - "name": "Facebook CLA Check", - "conclusion": "SUCCESS", - "detailsUrl": "https://code.facebook.com/cla/" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHOUquzJg==", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1487517306" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276152" }, - "cursor": "Y3Vyc29yOnYyOpHOWKm2eg==" - } - ], - "pageInfo": { - "hasNextPage": false + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcTg=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603012/jobs/2839950623" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObQ4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276157" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcT0=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603013/jobs/2839950631" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRg=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + 
"url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276159" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcT8=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-10-15-py3-arm64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603251/jobs/2839951040" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276857" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_k=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-coreml" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603253/jobs/2839951038" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276860" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_w=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603254/jobs/2839951030" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAc=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276861" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_0=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-11-py3-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603255/jobs/2839951034" + }, + { + "name": "test (default, 1, 2, macos-11)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603255/jobs/2840127016" + }, + { + "name": "test (default, 2, 2, macos-11)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603255/jobs/2840127073" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqSQ2M=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276862" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCc_4=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-custom-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603256/jobs/2839951041" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBE=", + "hasNextPage": false + } + }, + "conclusion": 
"SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276864" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAA=" + } + ], + "pageInfo": { + "hasNextPage": true } - }, - "pushedDate": "2020-09-11T01:58:24Z", - "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + } } } ] - }, - "changedFiles": 5, - "files": { - "nodes": [ - { - "path": "test/math_libraries/convolutions.py" - }, - { - "path": "test/math_libraries/convolutions_cases/shapes_googlenet_v3.json" - }, - { - "path": "test/math_libraries/convolutions_cases/shapes_maskrcnn_p1.json" - }, - { - "path": "test/math_libraries/convolutions_cases/shapes_mobilenet.json" - }, - { - "path": "test/math_libraries/convolutions_cases/shapes_resnet_50.json" - } - ], - "pageInfo": { - "endCursor": "NQ", - "hasNextPage": false - } - }, - "reviews": { + } + } + } + } + }, + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAVFCdAA= name=pytorch number=73811 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { "nodes": [ { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, + "commit": { + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7", + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-x86-64-coreml" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603259/jobs/2839951039" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcA0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276867" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-arm64-metal" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603261/jobs/2839951042" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcBA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276869" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "macos-10-15-py3-lite-interpreter-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603264/jobs/2839951036" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276873" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdAk=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "ios-12-5-1-x86-64" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983603269/jobs/2839951029" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOcAQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276881" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdBE=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277331" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCddM=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277340" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCddw=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277346" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdeI=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277350" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdeY=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658277355" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCdes=" + } + ], + "pageInfo": { + "hasNextPage": false + } + } + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=31093 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "mingxiaoh" + }, + "title": "improve mkldnn convolution test coverage", + "body": "This pr will improve the test coverage of mkldnn convolution.\r\n1.test input: specific sensitive numbers\r\n2.pass criteria: output of mkldnn convolution matches output of thnn convolution\r\n3.coverage: by using coverage tool, we found out the following sensitive parameters. 
Overall the case will test 4352 patterns, takes 8.8s on my machine.\r\n\r\nto run the test case:\r\n\r\npython test_mkldnn_conv2d_ext.py\r\nor\r\npython run_test.py -i mkldnn_conv2d_ext\r\n\r\nIn case of failure, the pattern will be printed in the log for further debugging.\r\n\r\nactually, this PR is created to replace and improve that PR we created before(https://github.com/pytorch/pytorch/pull/25085) ", + "headRefName": "master", + "headRepository": { + "nameWithOwner": "mingxiaoh/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "CHANGES_REQUESTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "CHANGES_REQUESTED" - }, - { - "author": { - "login": "ailzhang" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "VitalyFedyunin" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mingxiaoh" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "mingxiaoh" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "VitalyFedyunin" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "VitalyFedyunin" - }, - "state": "APPROVED" + "commit": { + "author": { + "user": { + "login": "11pikachu" + }, + "email": "junx.du@intel.com", + "name": "dujun" + }, + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + } } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAxOS0xMi0zMFQxMjoxOToxMS0wNjowMLkyMDE5LTEyLTMwVDEyOjE5OjExLTA2OjAwzhQZLuY=", - "hasPreviousPage": false - } + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 }, - "comments": { + "commits": { "nodes": [ { - 
"bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.\n\n@mruberry It is suggested by @VitalyFedyunin that, we need to display fail test to avoid invalid inputs, I guess we should set it as expected failures under the pytest test framework, right? we will change it as expected failure cases under pytest test framework. The result will looks like be low, is it ok?\n2500 passed, 136 skipped, 0 failed, 0 errors, 2 expected failures, 0 unexpected passes", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": { - "login": "mingxiaoh" - }, - "databaseId": 673816925 - }, - { - "bodyText": "Displaying tests that fail is fine, but I don't think @VitalyFedyunin meant that it was OK if the tests didn't pass. If these are expected failures then yes, you can use with self.assertRaises(RuntimeError):... when testing them. If you also want to report that the test has test cases with these properties you can print or warn, which will appear in the test output.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 673858224 - }, - { - "bodyText": "Codecov Report\n\nMerging #31093 into master will not change coverage.\nThe diff coverage is n/a.\n\n\n@@ Coverage Diff @@\n## master #31093 +/- ##\n=======================================\n Coverage 68.00% 68.00% \n=======================================\n Files 382 382 \n Lines 49527 49527 \n=======================================\n Hits 33679 33679 \n Misses 15848 15848 \n\nContinue to review full report at Codecov.\n\nLegend - Click here to learn more\n\u0394 = absolute (impact), \u00f8 = not affected, ? = missing data\nPowered by Codecov. Last update 69f6d94...29f6aa6. Read the comment docs.", - "author": { - "login": "codecov" - }, - "authorAssociation": "NONE", - "editor": { - "login": "codecov" - }, - "databaseId": 686921371 - }, - { - "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. 
If you are unable to remove the Stale label please contact a maintainer in order to do so. Stale pull requests will automatically be closed 30 days after being marked Stale", - "author": { - "login": "pytorchbot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1095860944 - }, - { - "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1152854802 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOKCmhXQ==", + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "clang-format" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676797?check_suite_focus=true" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQYu8fQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281097" + }, + "cursor": "Y3Vyc29yOnYyOpHORg1dyQ==" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676800?check_suite_focus=true" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676817?check_suite_focus=true" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676829?check_suite_focus=true" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676840?check_suite_focus=true" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQYu8qA==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281099" + }, + "cursor": "Y3Vyc29yOnYyOpHORg1dyw==" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "codecov/project", + "conclusion": "SUCCESS", + "detailsUrl": "https://codecov.io" + }, + { + "name": "codecov/patch", + "conclusion": "SUCCESS", + "detailsUrl": "https://codecov.io" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQZhcFQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100822" + }, + "cursor": "Y3Vyc29yOnYyOpHORhnf1g==" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "codecov/patch", + "conclusion": "SUCCESS", + "detailsUrl": "https://codecov.io" + } + ], + 
"pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOQZZsEQ==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100824" + }, + "cursor": "Y3Vyc29yOnYyOpHORhnf2A==" + }, + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHOUquzJg==", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1487517306" + }, + "cursor": "Y3Vyc29yOnYyOpHOWKm2eg==" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406538?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406947?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406544?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406931?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_debug_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406550?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_debug_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406887?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_release_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406526?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_release_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406707?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_main_py3_6_clang7_ubuntu16_04_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406533?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_main_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407256?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + 
}, + { + "context": "ci/circleci: caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407254?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407255?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406556?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406532?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406527?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406553?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-py3.6-clang9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406537?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406529?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-rocm3.5.1-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406554?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-rocm3.7-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406545?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406543?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406536?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406552?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7", + "state": "SUCCESS", + "targetUrl": 
"https://circleci.com/gh/pytorch/pytorch/7406535?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406540?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406528?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406541?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-asan", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406549?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-clang7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406555?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc4.8", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406546?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc5.4", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406531?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406534?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc7.2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406523?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.8", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406539?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-rocm3.3-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406547?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-rocm3.5.1-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406551?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407209?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", + 
"state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406611?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_bazel_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406607?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_bazel_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406984?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_cpp_doc_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407013?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_doc_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407011?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_ios_11_2_1_x86_64_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406548?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406563?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408680?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_backward_compatibility_check_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407014?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406567?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406945?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_8_gcc9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406561?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_8_gcc9_coverage_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407422?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_rocm3_7_py3_6_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406562?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build", + "state": "SUCCESS", + "targetUrl": 
"https://circleci.com/gh/pytorch/pytorch/7406612?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408107?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_ge_config_legacy_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408111?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_ge_config_profiling_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408101?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406613?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406565?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407017?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_profiling_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407019?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407012?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407016?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_vulkan_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406608?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406609?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406606?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_test1", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407435?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + 
"context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_test2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407436?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_mobile_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406605?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_mobile_custom_build_dynamic", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406610?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_macos_10_13_py3_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406525?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_macos_10_13_py3_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407415?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_python_doc_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407018?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_vulkan_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406566?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_vulkan_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406946?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cpu_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406542?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406530?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_test1", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407028?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_test2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407027?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda11.0_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406524?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_xla_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406572?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: 
pytorch_xla_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407253?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "codecov/patch", + "state": "SUCCESS", + "targetUrl": "https://codecov.io/gh/pytorch/pytorch/compare/69f6d94caa3559d4f50745c26af5df041b83fee8...29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + }, + { + "context": "codecov/project", + "state": "SUCCESS", + "targetUrl": "https://codecov.io/gh/pytorch/pytorch/compare/69f6d94caa3559d4f50745c26af5df041b83fee8...29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + }, + { + "context": "pr/caffe2-pytorch-linux-bionic-rocm3.7-py3.6-test", + "state": "SUCCESS", + "targetUrl": "https://ci.pytorch.org/jenkins/job/caffe2-builds/job/pytorch-linux-bionic-rocm3.7-py3.6-trigger-test/2319/" + }, + { + "context": "pr/pytorch-linux-bionic-rocm3.7-py3.6", + "state": "SUCCESS", + "targetUrl": "https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm3.7-py3.6-trigger/2325/" + } + ] + }, + "pushedDate": "2020-09-11T01:58:24Z", + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + } + } + ] + }, + "changedFiles": 5, + "files": { + "nodes": [ + { + "path": "test/math_libraries/convolutions.py" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_googlenet_v3.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_maskrcnn_p1.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_mobilenet.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_resnet_50.json" + } + ], + "pageInfo": { + "endCursor": "NQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" + }, + { + "author": { + "login": "ailzhang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" + 
}, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAxOS0xMi0zMFQxMDoxOToxMS0wODowMLkyMDE5LTEyLTMwVDEwOjE5OjExLTA4OjAwzhQZLuY=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, 
**kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.\n\n@mruberry It is suggested by @VitalyFedyunin that, we need to display fail test to avoid invalid inputs, I guess we should set it as expected failures under the pytest test framework, right? we will change it as expected failure cases under pytest test framework. The result will looks like be low, is it ok?\n2500 passed, 136 skipped, 0 failed, 0 errors, 2 expected failures, 0 unexpected passes", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 673816925 + }, + { + "bodyText": "Displaying tests that fail is fine, but I don't think @VitalyFedyunin meant that it was OK if the tests didn't pass. If these are expected failures then yes, you can use with self.assertRaises(RuntimeError):... when testing them. If you also want to report that the test has test cases with these properties you can print or warn, which will appear in the test output.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 673858224 + }, + { + "bodyText": "Codecov Report\n\nMerging #31093 into master will not change coverage.\nThe diff coverage is n/a.\n\n\n@@ Coverage Diff @@\n## master #31093 +/- ##\n=======================================\n Coverage 68.00% 68.00% \n=======================================\n Files 382 382 \n Lines 49527 49527 \n=======================================\n Hits 33679 33679 \n Misses 15848 15848 \n\nContinue to review full report at Codecov.\n\nLegend - Click here to learn more\n\u0394 = absolute (impact), \u00f8 = not affected, ? = missing data\nPowered by Codecov. Last update 69f6d94...29f6aa6. Read the comment docs.", + "author": { + "login": "codecov" + }, + "authorAssociation": "NONE", + "editor": { + "login": "codecov" + }, + "databaseId": 686921371 + }, + { + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. 
If you are unable to remove the Stale label please contact a maintainer in order to do so. Stale pull requests will automatically be closed 30 days after being marked Stale", + "author": { + "login": "pytorchbot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1095860944 + }, + { + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1152854802 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOKCmhXQ==", "hasPreviousPage": true } }, - "labels": { - "edges": [ + "labels": { + "edges": [ + { + "node": { + "name": "triaged" + } + }, + { + "node": { + "name": "open source" + } + }, + { + "node": { + "name": "cla signed" + } + }, + { + "node": { + "name": "Stale" + } + } + ] + } + } + } + } + }, + "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOKCmhXQ== name=pytorch number=31093 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "Hi, @mingfeima @soumith @Jianhui-Li\nthis will improve the test coverage of mkldnn convolution, would you please review it?\nThe current code is forward only, do we need to cover backward, if yes, we can add backward.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564806270 + }, + { + "bodyText": "@mingxiaoh, what is the value in testing DNNL as part of Pytorch validation for the Pytorch developers? Shouldn't having these tests run in DNNL validation be enough?", + "author": { + "login": "vpirogov" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564808528 + }, + { + "bodyText": "@vpirogov The main value is to serve as a blind test to DNNL. If DNNL adds these test to DNNL test sets, it lost the value as a blind test. The spirit of validation is to cross check.\n@gottbrath @gchanan The test was developed per the request of Pytorch team. Mingxiao made an effort to reduce the execution time to a few second but still with good coverage. Although the test today is focused on DNNL, it could be easily extended to be blind test for any conv implementation used in Pytorch.", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 567826907 + }, + { + "bodyText": "@mruberry thanks for the comment. As for the chainer dependency, we import it is because we would like to use its testing function for pytest test cases combinations, other wise we need to write much more code to achieve same effect. So, can we use it?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 574563012 + }, + { + "bodyText": "@mingxiaoh You cannot import chainer. Looking at the code you should be able to achieve the same effect without it.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 575272358 + }, + { + "bodyText": "@mruberry ok, we will change it according to your requirement. 
Thanks", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 583917522 + }, + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/31093\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 29f6aa6 (more details on the Dr. CI page):\n\nCommit 29f6aa6 was recently pushed. Waiting for builds...\n\nThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "dr-ci" + }, + "authorAssociation": "NONE", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 628466876 + }, + { + "bodyText": "@mruberry how about those cudnn UT error? we add check for it but it should be NV to fix cudnn bugs.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 629955767 + }, + { + "bodyText": "Hey @mingxiaoh! You're right, of course, that you shouldn't have to fix cuDNN bugs. Would you please:\n\nAssert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update.\nFile a new issue explaining the behavior and providing a short PyTorch program to reproduce the issue.\n\nThen we can ping NVIDIA on that issue.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 629997129 + }, + { + "bodyText": "about the suggestion 'Assert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update. ', if we only assert it and continue the following test, I guess users might always ignore them in later test. Anyway, any similar example case for reference?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 630010734 + }, + { + "bodyText": "In this recent PR https://github.com/pytorch/pytorch/pull/38505/files, for example, you can see that the construction of bool tensors wasn't working properly, so the test author cited the relevant issue and asserted that the incorrect behavior happened, as expected. You can also see how these lines are being removed by https://github.com/pytorch/pytorch/pull/38392/files, which fixes the issue.\nAnother common pattern is to use with self.assertRaises(RuntimeError/AssertionError/etc.):.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 630014823 + }, + { + "bodyText": "@mruberry the failed UT case is not introduced by our modification, how to handle this issue?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631187735 + }, + { + "bodyText": "@mingxiaoh You mean the failures on ROCm? You may ignore them. 
Be sure to re-request review when you're ready.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631191425 + }, + { + "bodyText": "@mruberry we already skipped those ROCm errors, but there are stil somel error caused by the original code, they are not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631886529 + }, + { + "bodyText": "I understand. Let me know when you're ready for me to review.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631908011 + }, + { + "bodyText": "@mruberry thanks, we are ready for review now.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631909442 + }, + { + "bodyText": "@mingxiaoh Great! I'll take a look ASAP.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631910556 + }, + { + "bodyText": "@mruberry we just pull the latest code and updated the patch according to your comment, may you please help double check it? BTW, the new failed case in preci is not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 633430458 + }, + { + "bodyText": "@ailzhang would you please check the comment below? Thanks.\nIs there a reason why this TestConv2dExt is a new class instead a test inside TestNN?\n//comment: it is actually suggested by Tongzhou Wang in another thread before.\nAlthough this test sits in generic testing framework, it's actually comparing thnn/mkldnn/cudnn results specially. I feel it's better to make it truly generic so that it compares any device result with CPU result. Alternatively you can mark this test only run when torch.backends.mkldnn.is_available()=True\n//comment: but our goal is to compare the result with that of thnn. Anyway, if you insist, we can start to compare it with cpu.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 634432326 + }, + { + "bodyText": "Pruning reviewers. @ngimel, @VitalyFedyunin, this PR is looking pretty good from a test framework perspective. Would one of you like to review?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 634557563 + }, + { + "bodyText": "@mruberry Thanks, would you please help review it again. BTW: failed case is not introduced by our modification.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 635256214 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637364148 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? 
BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637444457 + }, + { + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.\n\n@mruberry thank you", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637479226 + }, + { + "bodyText": "Improving test coverage of math libraries is certainly a good goal and this PR is moving towards it. I have some doubts about implementation decisions made, and about running this PR as part of regular pytorch CI.\nIf the primary goal of this PR is to test correctness of the convolution implementations in the vendor library, then it does not serve this purpose. The absolute majority of the 4000+ test cases come from group 1, where different kernel sizes/strides/dilations are used to produce the output of size 1x1. This can test whether pytorch correctly passes convolution parameters to the backends (although there are cheaper ways to do that), but as actual library correctness check it is almost useless - libraries use very different kernels depending in the input/output sizes, and tests with toy sizes like this don't invoke the real bread-and-butter kernels.\nAlso, if this test suite is meant as primary a means of testing vendor libraries (which is a good goal!) it does not have a place as a part of pytorch regular CI, and should be run when the corresponding vendor libraries are updated. I'd suggest moving this test out into a separate file (maybe even outside of torch/test directory) and have it as a part of library update/qualification process rather than regular CI.\nAlso, if the primary goal is to enable easier testing of vendor libraries correctness, perhaps we should rethink the mechanism of the generation of test cases. It should be easy to add a test case with a particular set of parameters that was found to be buggy. Also, running a cross-product of cases in a multi-dimensional space (as this PR does) is rarely an efficient way of getting a signal, some forms of random sampling usually provide a way to get better correctness signal why using less resources.\nAlso, when testing libraries it is important to test both forward and backward functions, whereas this PR does forward only. I'm openminded on whether convTransposed should be tested or not - if we are testing vendor libraries, then it's not necessary, convTransposed calls the same underlying functions, if we are testing pytorch, then it makes sense to test it separately because it takes different codepaths.", + "author": { + "login": "ngimel" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637827507 + }, + { + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? 
Thanks in advance.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637912105 + }, + { + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.\n\nWe know this PR has been open for awhile and we respect that your time is valuable, but we want to make sure we're making the right change here, and I think @ngimel's comments reflect that and should not be too difficult to address. As I understand, her points are:\n\nThis is a good PR with an exciting idea. To let it run longer and test more cases maybe it should run outside the regular PyTorch CI.\nTo remedy this, let's create a test/math_libraries folder and put this test there: test/math_libaries/convolutions.py. Yes, this is different from our requests in the past, which is our mistake, but it should be an easy change.\nTo make the test more interesting it'd be good for the test cases to resemble convolutions used in practice. The current test cases seem like similar \"toy\" examples. Without time pressure we should be able to run larger, more computationally intensive convolutions.\nLet's change the test cases to include some practical convolutions, make it easy to add test cases, and think about how we might generate other interesting cases. (We should also test backwards once we have more time!)\n\nAnd I think these are good points. Maybe the PR doesn't create a new way to generate interesting convolutions to start and instead only runs a few representative convolutions, but @ngimel is positioning the work for success so that it's useful and we can continue to improve on it in the future.\nDoes that make sense?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637924703 + }, + { + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 637960626 + }, + { + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. 
Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.\n\nI'm sorry, I don't think I've talked to @Jianhui-Li before. It's true that the team we expressed a concern about timing if the test was to be run in the CI initially, but I think now that we understand what the test is trying to do better we're not sure the CI is the best place for it. The PR was also closed after a lengthy period of inactivity, and we assumed it had simply been abandoned.\nDo you know who @Jianhui-Li spoke with about this issue originally? Maybe I can follow-up with them for more context.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637967153 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637978356 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 638446723 + }, + { + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.\n\nLet me sync with Mingxiao and follow up with this. Thanks.", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 638451670 + }, + { + "bodyText": "@mruberry would you please help review it again?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 653028208 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 654443242 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 656062287 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. 
Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 658071151 + }, + { + "bodyText": "super nit: renaming files to .json will make it more IDE friendly.", + "author": { + "login": "VitalyFedyunin" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 658464685 + }, + { + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.\n\nCool! I took a look with @ngimel, once these issues are addressed I think we're good to go!", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 659164401 + }, + { + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 660884305 + }, + { + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.\n\nUpdated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 662678464 + }, + { + "bodyText": "Updated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.\n@mruberry we have finished the modification according to your comment, would you please review it again? 
Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 662930687 + }, + { + "bodyText": "The code looks good, but I tried running the test suite and hit the following failures:\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float16, group:1, batchsize:22input channel:448, output channel:384, bias:False, padding:[1, 1], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float32, group:1, batchsize:22input channel:80, output channel:192, bias:False, padding:[0, 0], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n 
method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 106, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\nLooking at the first invalid convolution, for example, it's:\n {\n \"case_name\":\"masknet_p1:conv33\",\n \"mb\":1,\n \"g\":1,\n \"ic\":512,\n \"ih\":64,\n \"iw\":64,\n \"oc\":12,\n \"kh\":1,\n \"kw\":1,\n \"sh\":1,\n \"sw\":1,\n \"ph\":0,\n \"pw\":0,\n \"dh\":0,\n \"dw\":0,\n \"bias\":\"False\"\n },\n\nwhich has a dh and dw of zero, causing it to be added to invalid cases here:\ndh, dw = case['dh'], case['dw']\n has_bias = case['bias']\n if dh == 0 or dw == 0:\n invalid_cases.append(case_name)", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "mruberry" + }, + "databaseId": 663240268 + }, + { + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 664373079 + }, + { + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.\n\nBefore I run these tests again, is an atol of 1e-2 needed for all types or just half? 
Also, how does 1e-2 compare to the values that are being compared?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 664569507 + }, + { + "bodyText": "@mruberry 1e-2 is experimental result, details see below, random means it might be failed sometimes.\n\n\n\natol,rtol\n1e-2,1e-2\n1e-2,1e-3\n1e-3,1e-2\n1e-3,1e-3\n1e-4,1e-3\n1e-3,1e-4\n1e-4,1e-4\n1e-4,1e-5\n1e-5,1e-4\n\n\n\n\nCuda float16\npass\npass\npass\npass\npass\nfail\nFail\nFail\nfail\n\n\nCuda float32\npass\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nfail", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 666894774 + }, + { + "bodyText": "@mruberry would you please find time to review it again? Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 668380451 + }, + { + "bodyText": "@mruberry would you please find time to review it again? Thanks.\n\nI was just about to try and run this again locally but it looks like the files describing the convolutions are missing?", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 670306210 + }, + { + "bodyText": "@mruberry sorry but what is missing actually?", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 670322557 + }, + { + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 670591170 + }, + { + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.\n\n@mruberry sorry, we add them now, would you please check it again? 
Thanks.", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 673402901 + }, + { + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 673760580 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOIapCfg==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=2dc8bfb6750c4a2402124dc53123d266427c0b92d06add20e3221b57a0f5268f commit=6882717f73deffb692219ccd1fd6db258d8ed684 name=pytorch owner=pytorch": { + "data": { + "repository": { + "object": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625272" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hng=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625297" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hpE=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625308" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hpw=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625328" + }, + 
"cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hrA=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625347" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hsM=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625357" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hs0=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095495959" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496003" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496162" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496320" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496465" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496523" + }, + { + "name": "Test collect_env (older_python_version)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496558" + }, + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241883/jobs/4095496708" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCVA2Y=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625464" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hzg=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "trunk" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095496376" + }, + { + "name": "android-emulator-build-test / build-and-test", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095496525" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095496611" + }, + { + "name": "macos-10-15-py3-arm64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095496713" + }, + { 
+ "name": "linux-bionic-cuda10.2-py3.9-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095496857" + }, + { + "name": "ios-12-5-1-x86-64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497178" + }, + { + "name": "libtorch-linux-bionic-cuda11.6-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497392" + }, + { + "name": "win-vs2019-cuda11.6-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497580" + }, + { + "name": "libtorch-linux-xenial-cuda10.2-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497781" + }, + { + "name": "linux-bionic-py3.7-clang9-slow / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497886" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095497997" + }, + { + "name": "macos-10-15-py3-lite-interpreter-x86-64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095498146" + }, + { + "name": "macos-11-py3-x86-64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095498338" + }, + { + "name": "caffe2-linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095498448" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095498648" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095659992" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095660077" + }, + { + "name": "linux-bionic-py3.7-clang9-slow / test (slow, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095798458" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840103" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840227" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (slow, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840377" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_AVX512, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840521" + }, + { + "name": 
"linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840605" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (jit_legacy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840689" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840741" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095840795" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095874982" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875042" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 1, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875174" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 2, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875221" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 3, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875266" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875320" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 5, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875369" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4095875417" + }, + { + "name": "macos-12.3-py3.8-arm64-test / Run MPS tests", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4096110771" + }, + { + "name": "macos-11-py3-x86-64 / test (default, 1, 2, macos-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4096408234" + }, + { + "name": "macos-11-py3-x86-64 / test (default, 2, 2, macos-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241915/jobs/4096408307" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCn27w=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625556" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1h5Q=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 
15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-rocm5.1-py3.7", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496220" + }, + { + "name": "win-vs2019-cuda11.6-py3", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496344" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496466" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496612" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496726" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095496862" + }, + { + "name": "linux-bionic-py3_7-clang8-xla / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095497204" + }, + { + "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095497405" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095497578" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095497784" + }, + { + "name": "linux-focal-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095497875" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498008" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498155" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498346" + }, + { + "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498440" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498650" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498724" + }, + { + "name": "linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095498883" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095499064" + }, + { + "name": "linux-focal-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095499218" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095499360" + }, + { + "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095615833" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668105" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668215" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668293" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668402" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668480" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095668571" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095776890" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095776922" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095778975" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794308" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794370" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794452" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794502" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794566" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794652" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794748" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095794836" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800591" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800638" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800676" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800723" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800762" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095800805" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095813130" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095813208" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095858004" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095858063" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095858127" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCcmdI=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625557" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1h5U=" + } + ], + "pageInfo": { + "hasNextPage": false + } + } + } + } + } + }, + "query_sha=23d6a47e5fd875c42231779040ec1d35d0042b502c9142cb0d33d6f65d58fead commit=6882717f73deffb692219ccd1fd6db258d8ed684 cr_cursor=Y3Vyc29yOnYyOpHPAAAAAbCcmdI= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAbH1h5Q= name=pytorch 
owner=pytorch": { + "data": { + "repository": { + "object": { + "oid": "6882717f73deffb692219ccd1fd6db258d8ed684", + "checkSuites": { + "nodes": [ + { + "checkRuns": { + "nodes": [ + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095858194" + }, + { + "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4095858272" + }, + { + "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2638241914/jobs/4096006884" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCfo8c=", + "hasNextPage": false + } + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "malfet" + }, + "title": "Dummy change with lots of commits", + "body": "Draft PR with 100+ commits, to test mergebot ", + "headRefName": "malfet/pr-with-lots-of-commits", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "3067f2240afc7a29dc348000aa19eccbd9772303" + } + }, + { + "commit": { + "author": { + "user": { + "login": "andrewor14" + }, + "email": "andrewor@fb.com", + "name": "Andrew Or" + }, + "oid": "2f655b71f70c496c4e645f6cdb27d7bb7e825701" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "0c6dcaa7f58a19c42a530f4ee14bb6f0f03ca9fb" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "cad11c563d41ebcffb1683fe1f1288b8157413b3" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "jwtan@fb.com", + "name": "Jiewen Tan" + }, + "oid": "4dfd0875a68d87fccb5ad0d81692db480043b86e" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "2d37e74690582a4a26890e4c8b98f1f80e589c82" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "jwtan@fb.com", + "name": "Jiewen Tan" + }, + "oid": "d4aee60947e1a3ef23c7c42990621e0746fdd0a8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "aac6204bf710beb5e50a383d426ae6222396335a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "4b0362cab884584c24f5834b3874f5f357f56b5d" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "7536df613cbc645a9e68e6a3b0a8450753260fd1" + } + }, + { + "commit": { + "author": { + "user": null, + "email": 
"mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "20a50cb966d28d7bf82924adf781cf72a01ef90e" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "486387e8644afb46edff5aa5925b55c8119f67f0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "acb9d78b9b732d3667b881727e6ed9f92a8c549f" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "683bb7959a5b973f8470c081ad02e8fc508e784a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "a870cb40af65adf0b77d55f6b554d7093d284d7a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "Krovatkin" + }, + "email": "korovaikon@gmail.com", + "name": "Nikolay Korovaiko" + }, + "oid": "70793b9f328ddf52cc86336104c3a064c8582ef4" + } + }, + { + "commit": { + "author": { + "user": { + "login": "suo" + }, + "email": "suo@fb.com", + "name": "Michael Suo" + }, + "oid": "f70b31f62b1c5159eef2725484b175983517c88c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "04d3ec1db60defe1c6904bf77e9f8dfa87dc0b63" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "46b754a55b63e3168ad5854ad412c124934b675d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "robieta" + }, + "email": "taylorrobie@fb.com", + "name": "Taylor Robie" + }, + "oid": "13df69e13ee571fdd716139419a00aec47ade7d6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "70642e911ec80a47cdbf4a50aac475c11aa129b6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "59bb7c39384bf3e0b284a037adef8b3caa53c1c4" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "007cfb97b55d70ff63e1ed71d1a674638f847376" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "0a7b858a5af1393fa3cf2853f92eca0e1d408dde" + } + }, + { + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "7917d789f0a523715041ade5177d271082628236" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kit1980" + }, + "email": "sdym@fb.com", + "name": "Sergii Dymchenko (Meta Employee)" + }, + "oid": "91eb6017f0fb8a1b29e8cb48fac93bc9709f73b3" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "bd04dca5fabb0c2a51ac87063a515f256ef274fa" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "1f805a5defda7dabc49d0059edb9ccb06bc29352" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@fb.com", + "name": "Mike Ruberry" + }, + "oid": 
"4982c0a8db8f23d15ec4bfcbca4ce939afc04954" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pearu" + }, + "email": "pearu.peterson@gmail.com", + "name": "Pearu Peterson" + }, + "oid": "28502265cb5925cb7db8dcb2dd2334963092714a" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "e03fcaedb1342e6d65c7f7f20243000938ba60b2" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pritamdamania" + }, + "email": "pritam.damania@fb.com", + "name": "pritam" + }, + "oid": "efb28f5a1a5d18aa96bd668ab2ab5c651be359f3" + } + }, + { + "commit": { + "author": { + "user": { + "login": "MagiaSN" + }, + "email": "magialiao@tencent.com", + "name": "magialiao" + }, + "oid": "52cc1b9994f861ebdd3908759ed1ab11cba1f8de" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "3cd99f23d1acd6a5bedf6f3b02be79d64350a5b6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" + }, + "oid": "b00502c634a5146f4d996bd90e84d317f049e7b0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "davidberard98" + }, + "email": "dberard@fb.com", + "name": "David Berard" + }, + "oid": "976eb7cee799dddfbe6a4122b249aaee1b6c8854" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "9608ab28744d5cae32f371490557b248c9549c66" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "4e119f0c39eb5ff0777f0e71561e6b633d85fb34" + } + }, + { + "commit": { + "author": { + "user": { + "login": "rohan-varma" + }, + "email": "rvarm1@fb.com", + "name": "Rohan Varma" + }, + "oid": "447580dc565f3660eddb2c996c6ed25b88338684" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "2bc8f43e9233008ea23053fab87b83ab36fca5e3" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "c13a8e891c3e3e714f60649ca1e3b082e090e9fe" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "fddc861b7ee473f57d3c2161e4618a2663a237e8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "jiyuanzFB" + }, + "email": "jiyuanz@fb.com", + "name": "Jiyuan Zhang" + }, + "oid": "e2336dbc539d6c021720cbe43c92c9e4c8463299" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "26e2759d1ad59aac12168b74d1ca55e42ba9455c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "ad7aa914ee3b3d1252e31514f010ba96c40aae87" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "f113c5d78065aafbe7b1c0e611945bfe9f67b3c0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "a366fd01136292544b7862968ae92feba4b6d8fe" + } + }, + { + "commit": { + 
"author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "afeba0773749da5883c378a2e6ac066e1ce62ca0" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "d306c99addc543908f64666baeecacbd0749f4a7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" + }, + "oid": "c2456ea658f41f64ea054a422edf22a9c977399f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" + }, + "oid": "a8b0a1b681c9fe41e0d553c962a5c93e81d92503" + } + }, + { + "commit": { + "author": { + "user": { + "login": "anjali411" + }, + "email": "chourdiaanjali123@gmail.com", + "name": "anjali411" + }, + "oid": "af761d9a5d058c9188f16589bae4f307d35185be" + } + }, + { + "commit": { + "author": { + "user": { + "login": "clee2000" + }, + "email": "csl@fb.com", + "name": "Catherine Lee" + }, + "oid": "beceb417baef35b15c2716e23178fb49f7fd6f9d" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "1516554e22136db89d0aeba43a1a1a987e995d68" + } + }, + { + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "68eb1fa8374eff6cbdcf0be5e37ed6775d22e722" + } + }, + { + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "3c7bcb99b5c0c879c2610f427880b03881f82f38" + } + }, + { + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "38c1a2028090353e40a019c673c9ab16b39e4825" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "8091cbea2c95ed2c4c406b3c61547a27c6319bae" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "d81f59121969a47c8b2213a88e02cf9be0219be9" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "20d798b319cd107a767fe220f7a3027c18a1c844" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "eb35381a770b58c1cd41e935910cb4df2f3d8f14" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "e6498a657b9aa47546dcd92d1b4ffb2e1a50ebdb" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "7f821382db5ad08efe5b09a145c606852b8a9272" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "995c0e11a97d854ff969962bd81d7341e46ecb07" + } + }, + { + "commit": { + "author": { + "user": { + "login": "davidberard98" + }, + "email": "dberard@fb.com", + "name": "David Berard" + }, + "oid": "28d6258e62c9fc361a18689877c962c69889dc23" + } + }, + { + "commit": { + "author": { + "user": { + "login": "HarborYuan" + }, + "email": "yuanhaobo@whu.edu.cn", + "name": "Haobo Yuan" + }, + "oid": "2350fad8391367ebf81c7236a2c883644b4ff622" + } + }, + { + "commit": { + "author": { + "user": { + "login": "zou3519" + }, + "email": "zou3519@gmail.com", + "name": "Richard Zou" + }, + "oid": "3f789c9ccecdd7e2e52269453646e992a68c6b92" + } + }, + { + "commit": { + "author": { + "user": { + "login": "jeffdaily" + }, + "email": "jeff.daily@amd.com", + "name": "Jeff Daily" + }, + "oid": "20f79f610c1a3314da96d49515bbfbee9442e4f8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "5823958f047f3b71a5dc8c52a20eb8ae3291bd3e" + } + }, + { + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "a0b15c49ecf3844daf2c0dcaef44f0214259db20" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "4afc38c25ca2ca126ba4987a419a58a5c572223b" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "b606f58d4a36683fbe0a7d02adfdde7d5cc694c2" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "2d61b4d630f6482a6c3cc7437091fad6d27c347e" + } + }, + { + "commit": { + "author": { + "user": { + "login": "george-qi" + }, + "email": "georgeqi94@gmail.com", + "name": "George Qi" + }, + "oid": "bc5384c47036a6cda94129f3e2f9e43c43393698" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "60fc3277634365b64465712b13db2acb76d6c890" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "1b8762e95bc38d1847fe99ed3230546c8b800bfd" + } + }, + { + "commit": { + "author": { + "user": { + "login": "jerryzh168" + }, + "email": "jerryzh168@gmail.com", + "name": "Jerry Zhang" + }, + "oid": "6acf60f95f59ecbc6e8ce830dea0abba7d3ec763" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ysiraichi" + }, + "email": "yukio.siraichi@gmail.com", + "name": "Yukio Siraichi" + }, + "oid": "8fb0276561fdd530c5a06ea195e930e0584f8705" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "1da7aed95a8700406671425eac1e4bbc2c7a24b5" + } + }, + { + "commit": { + "author": { + "user": { + "login": "thiagocrepaldi" + }, + "email": "thiago.crepaldi@microsoft.com", + "name": "Thiago Crepaldi" + }, + "oid": "83208e7dee4503c1bee1df9f6632794694dffa01" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "1a46cf08dcd3d3564604c17b2c02d7e4eb45a7ff" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "b7f9b6689445f826c83694652fea5f7cfc7070d7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "fatcat-z" + }, + "email": "jiz@microsoft.com", + "name": "Jay Zhang" + }, + "oid": "f273961c1696b156e35f8c76f7ad37934031050d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pavithranrao" + }, + "email": "pavithran@fb.com", + "name": "Pavithran Ramachandran" + }, + "oid": "eb410a51fcbc716873fd80a970eb932d4aaaea61" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "7dbb12cdc02332fa64264ed0df576511a5070d7e" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "43675665fa6b5154de8b25125dd03d7be35c884f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "6c4d23c402c413667463770d9a2fa801f493d3c5" + } + }, + { + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "cf3778a35129a40dee14366515201b7ed2c0f346" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "9d00a051373cb81f79cb6375942cf3ec9fff2fe6" + } + }, + { + "commit": { + "author": { + "user": 
{ + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "1eae67cf404aa8dffb80b8e85180f943878d52a6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "ce0e69dcda0fe41a6e964d6ac70ce8016979c71a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "swolchok" + }, + "email": "swolchok@fb.com", + "name": "Scott Wolchok" + }, + "oid": "6faba554f6e49777f24911928edb3061b6ed0e3d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "IvanYashchuk" + }, + "email": "ivan.yashchuk@aalto.fi", + "name": "Ivan Yashchuk" + }, + "oid": "d1d0e03f57a359f8f95331f9a34b8bed3e7cc845" + } + }, + { + "commit": { + "author": { + "user": { + "login": "Chillee" + }, + "email": "chilli@fb.com", + "name": "Horace He" + }, + "oid": "bb46bd9233a9fc631802a902cb48a4c13c2722ca" + } + }, + { + "commit": { + "author": { + "user": { + "login": "mehtanirav" + }, + "email": "niravmehta@fb.com", + "name": "Nirav Mehta" + }, + "oid": "3b1007fe4be12e483f2620fbac67cae42e703efc" + } + }, + { + "commit": { + "author": { + "user": { + "login": "mehtanirav" + }, + "email": "niravmehta@fb.com", + "name": "Nirav Mehta" + }, + "oid": "b4b65228dd0c109f5fdf17c7d9e56f60a98e398b" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "d629e300705196d3ae0bac5ed983b197101fa2ee" + } + }, + { + "commit": { + "author": { + "user": { + "login": "bigfootjon" + }, + "email": "jonjanzen@fb.com", + "name": "Jon Janzen" + }, + "oid": "52754b9e515f378f8476ad44d75b0a692bad8cde" + } + }, + { + "commit": { + "author": { + "user": { + "login": "samdow" + }, + "email": "samdow@fb.com", + "name": "samdow" + }, + "oid": "128c3ad747093f4970329a82c7c4720420faeff2" + } + }, + { + "commit": { + "author": { + "user": { + "login": "arindamroy-eng" + }, + "email": "61168652+arindamroy-eng@users.noreply.github.com", + "name": "arindamroy-eng" + }, + "oid": "2a0bda7d32a5bcc9827f7254a7b77cceb16ba973" + } + } + ], + "pageInfo": { + "endCursor": "MTAw", + "hasNextPage": true + }, + "totalCount": 131 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNRg4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693698" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRAI=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693712" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRBA=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": 
"https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693725" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRB0=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693741" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRC0=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693761" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsREE=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693774" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRE4=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192463/jobs/3232430975" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNR-Y=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694412" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRsw=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461134" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461211" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461301" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461386" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461521" + }, + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461634" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461717" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuN84s=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694417" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRtE=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232460797" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232460951" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461088" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461294" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461410" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461543" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461628" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461719" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461789" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461869" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461946" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462044" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462112" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462244" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462360" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462432" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462521" + }, + { + "name": 
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462621" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462683" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462738" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232545510" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232545571" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547522" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547612" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547714" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547764" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547824" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547869" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547909" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547973" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553452" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553558" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553605" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553650" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232563716" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232563763" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582650" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582703" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582741" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232590204" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232608872" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232608976" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637097" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637199" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637259" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232639932" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232687012" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232687074" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232785088" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232785153" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuVD9M=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694439" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRuc=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-04-20T17:10:41Z", + "oid": "5696e8357cf38f852ef3d680381513e26f202371" + } + } + ] + }, + "changedFiles": 348, + "files": { + "nodes": [ + { + "path": ".circleci/cimodel/data/pytorch_build_data.py" + }, + { + "path": ".circleci/cimodel/data/pytorch_build_definitions.py" + }, + { + "path": ".circleci/scripts/cpp_doc_push_script.sh" + }, + { + "path": ".circleci/scripts/python_doc_push_script.sh" + }, + { + "path": ".github/actions/checkout-pytorch/action.yml" + }, + { + "path": ".github/merge_rules.json" + }, + { + "path": ".github/scripts/gitutils.py" + }, + { + "path": ".github/scripts/gql_mocks.json" + }, + { + "path": ".github/scripts/trymerge.py" + }, + { + "path": ".github/workflows/_bazel-build-test.yml" + }, + { + "path": ".github/workflows/_linux-build.yml" + }, + { + "path": ".github/workflows/_linux-test.yml" + }, + { + "path": ".github/workflows/_mac-test.yml" + }, + { + "path": ".github/workflows/_rocm-test.yml" + }, + { + "path": ".github/workflows/_win-test.yml" + }, + { + "path": ".github/workflows/buck_build_test.yml" + }, + { + "path": ".github/workflows/lint.yml" + }, + { + "path": ".github/workflows/periodic.yml" + }, + { + "path": ".github/workflows/pull.yml" + }, + { + "path": ".github/workflows/trunk.yml" + }, + { + "path": ".jenkins/pytorch/macos-test.sh" + }, + { + "path": ".jenkins/pytorch/test.sh" + }, + { + "path": ".jenkins/pytorch/win-test.sh" + }, + { + "path": ".lintrunner.toml" + }, + { + "path": "BUILD.bazel" + }, + { + "path": "CODEOWNERS" + }, + { + "path": "README.md" + }, + { + "path": "aten/src/ATen/BatchingRegistrations.cpp" + }, + { + "path": "aten/src/ATen/Dispatch.h" + }, + { + "path": "aten/src/ATen/ExpandUtils.h" + }, + { + "path": "aten/src/ATen/FunctionalInverses.cpp" + }, + { + "path": "aten/src/ATen/FunctionalStorageImpl.cpp" + }, + { + "path": "aten/src/ATen/FunctionalStorageImpl.h" + }, + { + "path": "aten/src/ATen/FunctionalTensorWrapper.cpp" + }, + { + "path": "aten/src/ATen/FunctionalTensorWrapper.h" + }, + { + "path": "aten/src/ATen/FunctionalizeFallbackKernel.cpp" + }, + { + "path": "aten/src/ATen/NestedTensorImpl.cpp" + }, + { + "path": "aten/src/ATen/OpMathType.h" + }, + { + "path": "aten/src/ATen/SparseCsrTensorUtils.h" + }, + { + "path": "aten/src/ATen/ThreadLocalState.cpp" + }, + { + "path": "aten/src/ATen/ThreadLocalState.h" + }, + { + "path": "aten/src/ATen/autocast_mode.cpp" + }, + { + "path": "aten/src/ATen/autocast_mode.h" + }, + { + "path": "aten/src/ATen/core/SymIntArrayRef.cpp" + }, + { + "path": "aten/src/ATen/core/SymIntArrayRef.h" + }, + { + "path": "aten/src/ATen/core/TensorBase.h" + }, + { + "path": "aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h" + }, + { + "path": "aten/src/ATen/core/dispatch/Dispatcher.h" + }, + { + "path": "aten/src/ATen/core/interned_strings.h" + }, + { + "path": "aten/src/ATen/core/ivalue.cpp" + }, + { + "path": "aten/src/ATen/core/ivalue.h" + }, + { + "path": "aten/src/ATen/core/ivalue_inl.h" + }, + { + "path": "aten/src/ATen/core/jit_type.h" + }, + { + "path": "aten/src/ATen/core/jit_type_base.h" + }, + { + "path": "aten/src/ATen/core/type.cpp" + }, + { + "path": "aten/src/ATen/cuda/CUDASparse.h" + }, { - "node": { - "name": "triaged" - } + "path": "aten/src/ATen/cuda/llvm_complex.cpp" }, { - "node": { - "name": "open 
source" - } + "path": "aten/src/ATen/cuda/llvm_jit_strings.h" }, { - "node": { - "name": "cla signed" - } + "path": "aten/src/ATen/native/Blas.cpp" }, { - "node": { - "name": "Stale" - } - } - ] - } - } - } - } - }, - "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOKCmhXQ== name=pytorch number=31093 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "comments": { - "nodes": [ + "path": "aten/src/ATen/native/Itertools.cpp" + }, { - "bodyText": "Hi, @mingfeima @soumith @Jianhui-Li\nthis will improve the test coverage of mkldnn convolution, would you please review it?\nThe current code is forward only, do we need to cover backward, if yes, we can add backward.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 564806270 + "path": "aten/src/ATen/native/LinearAlgebra.cpp" }, { - "bodyText": "@mingxiaoh, what is the value in testing DNNL as part of Pytorch validation for the Pytorch developers? Shouldn't having these tests run in DNNL validation be enough?", - "author": { - "login": "vpirogov" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 564808528 + "path": "aten/src/ATen/native/SoftMax.cpp" }, { - "bodyText": "@vpirogov The main value is to serve as a blind test to DNNL. If DNNL adds these test to DNNL test sets, it lost the value as a blind test. The spirit of validation is to cross check.\n@gottbrath @gchanan The test was developed per the request of Pytorch team. Mingxiao made an effort to reduce the execution time to a few second but still with good coverage. Although the test today is focused on DNNL, it could be easily extended to be blind test for any conv implementation used in Pytorch.", - "author": { - "login": "Jianhui-Li" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 567826907 + "path": "aten/src/ATen/native/TensorConversions.cpp" }, { - "bodyText": "@mruberry thanks for the comment. As for the chainer dependency, we import it is because we would like to use its testing function for pytest test cases combinations, other wise we need to write much more code to achieve same effect. So, can we use it?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 574563012 + "path": "aten/src/ATen/native/TensorShape.cpp" }, { - "bodyText": "@mingxiaoh You cannot import chainer. Looking at the code you should be able to achieve the same effect without it.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 575272358 + "path": "aten/src/ATen/native/TensorShape.h" }, { - "bodyText": "@mruberry ok, we will change it according to your requirement. Thanks", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 583917522 + "path": "aten/src/ATen/native/Unique.cpp" }, { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/31093\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 29f6aa6 (more details on the Dr. CI page):\n\nCommit 29f6aa6 was recently pushed. Waiting for builds...\n\nThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", - "author": { - "login": "dr-ci" - }, - "authorAssociation": "NONE", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 628466876 + "path": "aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu" }, { - "bodyText": "@mruberry how about those cudnn UT error? we add check for it but it should be NV to fix cudnn bugs.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 629955767 + "path": "aten/src/ATen/native/cuda/CUDAJitLoops.cuh" }, { - "bodyText": "Hey @mingxiaoh! You're right, of course, that you shouldn't have to fix cuDNN bugs. Would you please:\n\nAssert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update.\nFile a new issue explaining the behavior and providing a short PyTorch program to reproduce the issue.\n\nThen we can ping NVIDIA on that issue.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 629997129 + "path": "aten/src/ATen/native/cuda/JitLoops.cuh" }, { - "bodyText": "about the suggestion 'Assert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update. ', if we only assert it and continue the following test, I guess users might always ignore them in later test. Anyway, any similar example case for reference?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 630010734 + "path": "aten/src/ATen/native/cuda/Lerp.cu" }, { - "bodyText": "In this recent PR https://github.com/pytorch/pytorch/pull/38505/files, for example, you can see that the construction of bool tensors wasn't working properly, so the test author cited the relevant issue and asserted that the incorrect behavior happened, as expected. You can also see how these lines are being removed by https://github.com/pytorch/pytorch/pull/38392/files, which fixes the issue.\nAnother common pattern is to use with self.assertRaises(RuntimeError/AssertionError/etc.):.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 630014823 + "path": "aten/src/ATen/native/cuda/PersistentSoftmax.cuh" }, { - "bodyText": "@mruberry the failed UT case is not introduced by our modification, how to handle this issue?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 631187735 + "path": "aten/src/ATen/native/cuda/SoftMax.cu" }, { - "bodyText": "@mingxiaoh You mean the failures on ROCm? You may ignore them. Be sure to re-request review when you're ready.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 631191425 + "path": "aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu" }, { - "bodyText": "@mruberry we already skipped those ROCm errors, but there are stil somel error caused by the original code, they are not introduced by our modification.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 631886529 + "path": "aten/src/ATen/native/cuda/Unique.cu" }, { - "bodyText": "I understand. 
Let me know when you're ready for me to review.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 631908011 + "path": "aten/src/ATen/native/cuda/jit_utils.cpp" }, { - "bodyText": "@mruberry thanks, we are ready for review now.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 631909442 + "path": "aten/src/ATen/native/cuda/jit_utils.h" }, { - "bodyText": "@mingxiaoh Great! I'll take a look ASAP.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 631910556 + "path": "aten/src/ATen/native/native_functions.yaml" }, { - "bodyText": "@mruberry we just pull the latest code and updated the patch according to your comment, may you please help double check it? BTW, the new failed case in preci is not introduced by our modification.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 633430458 + "path": "aten/src/ATen/native/nested/NestedTensorMath.cpp" }, { - "bodyText": "@ailzhang would you please check the comment below? Thanks.\nIs there a reason why this TestConv2dExt is a new class instead a test inside TestNN?\n//comment: it is actually suggested by Tongzhou Wang in another thread before.\nAlthough this test sits in generic testing framework, it's actually comparing thnn/mkldnn/cudnn results specially. I feel it's better to make it truly generic so that it compares any device result with CPU result. Alternatively you can mark this test only run when torch.backends.mkldnn.is_available()=True\n//comment: but our goal is to compare the result with that of thnn. Anyway, if you insist, we can start to compare it with cpu.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": { - "login": "mingxiaoh" - }, - "databaseId": 634432326 + "path": "aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp" }, { - "bodyText": "Pruning reviewers. @ngimel, @VitalyFedyunin, this PR is looking pretty good from a test framework perspective. Would one of you like to review?", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 634557563 + "path": "aten/src/ATen/native/quantized/cpu/qsoftmax.cpp" }, { - "bodyText": "@mruberry Thanks, would you please help review it again. BTW: failed case is not introduced by our modification.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 635256214 + "path": "aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp" }, { - "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code", - "author": { - "login": "1pikachu" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 637364148 + "path": "aten/src/ATen/native/quantized/cudnn/Linear.cpp" }, { - "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 637444457 + "path": "aten/src/ATen/native/quantized/cudnn/utils.h" }, { - "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? 
BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.\n\n@mruberry thank you", - "author": { - "login": "1pikachu" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 637479226 + "path": "aten/src/ATen/native/sparse/SparseCsrTensor.cpp" }, { - "bodyText": "Improving test coverage of math libraries is certainly a good goal and this PR is moving towards it. I have some doubts about implementation decisions made, and about running this PR as part of regular pytorch CI.\nIf the primary goal of this PR is to test correctness of the convolution implementations in the vendor library, then it does not serve this purpose. The absolute majority of the 4000+ test cases come from group 1, where different kernel sizes/strides/dilations are used to produce the output of size 1x1. This can test whether pytorch correctly passes convolution parameters to the backends (although there are cheaper ways to do that), but as actual library correctness check it is almost useless - libraries use very different kernels depending in the input/output sizes, and tests with toy sizes like this don't invoke the real bread-and-butter kernels.\nAlso, if this test suite is meant as primary a means of testing vendor libraries (which is a good goal!) it does not have a place as a part of pytorch regular CI, and should be run when the corresponding vendor libraries are updated. I'd suggest moving this test out into a separate file (maybe even outside of torch/test directory) and have it as a part of library update/qualification process rather than regular CI.\nAlso, if the primary goal is to enable easier testing of vendor libraries correctness, perhaps we should rethink the mechanism of the generation of test cases. It should be easy to add a test case with a particular set of parameters that was found to be buggy. Also, running a cross-product of cases in a multi-dimensional space (as this PR does) is rarely an efficient way of getting a signal, some forms of random sampling usually provide a way to get better correctness signal why using less resources.\nAlso, when testing libraries it is important to test both forward and backward functions, whereas this PR does forward only. I'm openminded on whether convTransposed should be tested or not - if we are testing vendor libraries, then it's not necessary, convTransposed calls the same underlying functions, if we are testing pytorch, then it makes sense to test it separately because it takes different codepaths.", - "author": { - "login": "ngimel" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 637827507 + "path": "aten/src/ATen/native/ts_native_functions.yaml" }, { - "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? 
Thanks in advance.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 637912105 + "path": "aten/src/ATen/record_function.cpp" }, { - "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.\n\nWe know this PR has been open for awhile and we respect that your time is valuable, but we want to make sure we're making the right change here, and I think @ngimel's comments reflect that and should not be too difficult to address. As I understand, her points are:\n\nThis is a good PR with an exciting idea. To let it run longer and test more cases maybe it should run outside the regular PyTorch CI.\nTo remedy this, let's create a test/math_libraries folder and put this test there: test/math_libaries/convolutions.py. Yes, this is different from our requests in the past, which is our mistake, but it should be an easy change.\nTo make the test more interesting it'd be good for the test cases to resemble convolutions used in practice. The current test cases seem like similar \"toy\" examples. Without time pressure we should be able to run larger, more computationally intensive convolutions.\nLet's change the test cases to include some practical convolutions, make it easy to add test cases, and think about how we might generate other interesting cases. (We should also test backwards once we have more time!)\n\nAnd I think these are good points. Maybe the PR doesn't create a new way to generate interesting convolutions to start and instead only runs a few representative convolutions, but @ngimel is positioning the work for success so that it's useful and we can continue to improve on it in the future.\nDoes that make sense?", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 637924703 + "path": "aten/src/ATen/record_function.h" + }, + { + "path": "aten/src/ATen/templates/Operators.h" + }, + { + "path": "aten/src/ATen/templates/RegisterFunctionalization.cpp" }, { - "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": { - "login": "mingxiaoh" - }, - "databaseId": 637960626 + "path": "aten/src/ATen/test/basic.cpp" }, { - "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? 
you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.\n\nI'm sorry, I don't think I've talked to @Jianhui-Li before. It's true that the team we expressed a concern about timing if the test was to be run in the CI initially, but I think now that we understand what the test is trying to do better we're not sure the CI is the best place for it. The PR was also closed after a lengthy period of inactivity, and we assumed it had simply been abandoned.\nDo you know who @Jianhui-Li spoke with about this issue originally? Maybe I can follow-up with them for more context.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 637967153 + "path": "aten/src/ATen/test/vmap_test.cpp" }, { - "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 637978356 + "path": "binaries/record_function_benchmark.cc" }, { - "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 638446723 + "path": "c10/core/DispatchKey.cpp" }, { - "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.\n\nLet me sync with Mingxiao and follow up with this. Thanks.", - "author": { - "login": "Jianhui-Li" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 638451670 + "path": "c10/core/DispatchKey.h" }, { - "bodyText": "@mruberry would you please help review it again?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 653028208 + "path": "c10/core/DispatchKeySet.h" }, { - "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 654443242 + "path": "c10/test/core/DispatchKeySet_test.cpp" }, { - "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. 
Thanks", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 656062287 + "path": "c10/util/ArrayRef.h" }, { - "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 658071151 + "path": "caffe2/core/tensor.h" }, { - "bodyText": "super nit: renaming files to .json will make it more IDE friendly.", + "path": "docs/source/conf.py" + }, + { + "path": "docs/source/fx.rst" + } + ], + "pageInfo": { + "endCursor": "MTAw", + "hasNextPage": true + } + }, + "reviews": { + "nodes": [], + "pageInfo": { + "startCursor": null, + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to Matched rule superuser, but it was not reviewed yet by any of:zou3519,abhikrish,mehtanirav,wconstab,lc0, ...", "author": { - "login": "VitalyFedyunin" + "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 658464685 + "databaseId": 1104215370 }, { - "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.\n\nCool! I took a look with @ngimel, once these issues are addressed I think we're good to go!", + "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet", "author": { - "login": "mruberry" + "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 659164401 + "databaseId": 1104220908 }, { - "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.", + "bodyText": "@pytorchbot merge this", "author": { - "login": "mingxiaoh" + "login": "malfet" }, - "authorAssociation": "NONE", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 660884305 + "databaseId": 1104378397 }, { - "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? 
Thanks.\n\nUpdated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.", + "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet\nRaised by https://github.com/pytorch/pytorch/actions/runs/2197877090", "author": { - "login": "mruberry" + "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 662678464 + "databaseId": 1104379712 }, { - "bodyText": "Updated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.\n@mruberry we have finished the modification according to your comment, would you please review it again? Thanks.", + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", "author": { - "login": "mingxiaoh" + "login": "github-actions" }, "authorAssociation": "NONE", "editor": null, - "databaseId": 662930687 - }, + "databaseId": 1160658699 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQdD9Sg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "bodyText": "The code looks good, but I tried running the test suite and hit the following failures:\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float16, group:1, batchsize:22input channel:448, output channel:384, bias:False, padding:[1, 1], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File 
\"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float32, group:1, batchsize:22input channel:80, output channel:192, bias:False, padding:[0, 0], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 106, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\nLooking at the first invalid convolution, for example, it's:\n {\n \"case_name\":\"masknet_p1:conv33\",\n \"mb\":1,\n \"g\":1,\n \"ic\":512,\n \"ih\":64,\n \"iw\":64,\n \"oc\":12,\n 
\"kh\":1,\n \"kw\":1,\n \"sh\":1,\n \"sw\":1,\n \"ph\":0,\n \"pw\":0,\n \"dh\":0,\n \"dw\":0,\n \"bias\":\"False\"\n },\n\nwhich has a dh and dw of zero, causing it to be added to invalid cases here:\ndh, dw = case['dh'], case['dw']\n has_bias = case['bias']\n if dh == 0 or dw == 0:\n invalid_cases.append(case_name)", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": { - "login": "mruberry" - }, - "databaseId": 663240268 + "node": { + "name": "cla signed" + } }, { - "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 664373079 - }, + "node": { + "name": "Stale" + } + } + ] + } + } + } + } + }, + "query_sha=74bd29fe945c49fde4818e873fa62bc60b55b4ef6ae3f2bb719bab6cddbaa7ce cursor=MTAw name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits_with_authors": { + "nodes": [ { - "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.\n\nBefore I run these tests again, is an atol of 1e-2 needed for all types or just half? Also, how does 1e-2 compare to the values that are being compared?", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 664569507 + "commit": { + "author": { + "user": { + "login": "clee2000" + }, + "email": "csl@fb.com", + "name": "Catherine Lee" + }, + "oid": "7f560351ae04ea43e58fbfda885bcf216aa26cde" + } }, { - "bodyText": "@mruberry 1e-2 is experimental result, details see below, random means it might be failed sometimes.\n\n\n\natol,rtol\n1e-2,1e-2\n1e-2,1e-3\n1e-3,1e-2\n1e-3,1e-3\n1e-4,1e-3\n1e-3,1e-4\n1e-4,1e-4\n1e-4,1e-5\n1e-5,1e-4\n\n\n\n\nCuda float16\npass\npass\npass\npass\npass\nfail\nFail\nFail\nfail\n\n\nCuda float32\npass\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nfail", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 666894774 + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "e8677ed168a036bc7e590d800fe98dd15f10581b" + } }, { - "bodyText": "@mruberry would you please find time to review it again? Thanks.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 668380451 + "commit": { + "author": { + "user": { + "login": "robieta" + }, + "email": "taylorrobie@fb.com", + "name": "Taylor Robie" + }, + "oid": "ac5611caa13642ef8dbe0db453b283b42cbd900b" + } }, { - "bodyText": "@mruberry would you please find time to review it again? 
Thanks.\n\nI was just about to try and run this again locally but it looks like the files describing the convolutions are missing?", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 670306210 + "commit": { + "author": { + "user": { + "login": "robieta" + }, + "email": "taylorrobie@fb.com", + "name": "Taylor Robie" + }, + "oid": "1184afbd3bfde0f46133aef09e55e18d3bfb3c3e" + } }, { - "bodyText": "@mruberry sorry but what is missing actually?", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 670322557 + "commit": { + "author": { + "user": { + "login": "minsii" + }, + "email": "msi@fb.com", + "name": "Min Si" + }, + "oid": "1c05604f3d049c67dc678d0295c0add470bff3dc" + } }, { - "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 670591170 + "commit": { + "author": { + "user": null, + "email": "eellison@devfair044.h1.fair", + "name": "Elias Ellison" + }, + "oid": "76ab5101bd36e8d73637d31bbea125240b7b27f0" + } }, { - "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.\n\n@mruberry sorry, we add them now, would you please check it again? Thanks.", - "author": { - "login": "mingxiaoh" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 673402901 + "commit": { + "author": { + "user": null, + "email": "eellison@devfair044.h1.fair", + "name": "Elias Ellison" + }, + "oid": "c774050e92c3d8e52968e1eb635dd3e9491104b3" + } }, { - "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 673760580 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOIapCfg==", - "hasPreviousPage": false - } - } - } - } - } - }, - "query_sha=2dc8bfb6750c4a2402124dc53123d266427c0b92d06add20e3221b57a0f5268f commit=6882717f73deffb692219ccd1fd6db258d8ed684 name=pytorch owner=pytorch": { - "data": { - "repository": { - "object": { - "checkSuites": { - "edges": [ - { - "node": { - "app": { - "name": "Facebook GitHub Tools", - "databaseId": 12274 + "commit": { + "author": { + "user": { + "login": "guoyejun" + }, + "email": "yejun.guo@intel.com", + "name": "Guo Yejun" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "8981595c5361f07186f4534f3be71f1d829a3046" + } + }, + { + "commit": { + "author": { + "user": { + "login": "BowenBao" + }, + "email": "bowbao@microsoft.com", + "name": "BowenBao" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625272" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hng=" + "oid": "036f362904024ac9481248965009f312bec6656b" + } }, { - "node": { - "app": { - "name": "Netlify", - "databaseId": 13473 + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "457d994933f164a9fd70da5ca2733dd6c046a28b" + } + }, + { + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625297" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hpE=" + "oid": "f49ebc77520774e71722111d554a0215a26956df" + } }, { - "node": { - "app": { - "name": "Azure Pipelines", - "databaseId": 
9426 + "commit": { + "author": { + "user": { + "login": "mikeiovine" + }, + "email": "mikeiovine@fb.com", + "name": "Mike Iovine" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "f069e1a4a5f98d3fe961e4fc562ede59f59b4026" + } + }, + { + "commit": { + "author": { + "user": { + "login": "salilsdesai" + }, + "email": "salilsdesai@fb.com", + "name": "Salil Desai" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625308" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hpw=" + "oid": "30bccf58393b288412a0f5a2423a1a41ffce258e" + } }, { - "node": { - "app": { - "name": "Dependabot", - "databaseId": 29110 + "commit": { + "author": { + "user": { + "login": "angelayi" + }, + "email": "angelayi@fb.com", + "name": "Angela Yi" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "f4ba440fe8a632c1ee88e01f7746a8a92c8f3902" + } + }, + { + "commit": { + "author": { + "user": null, + "email": "shirong@fb.com", + "name": "Shirong Wu" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625328" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hrA=" + "oid": "d203346c93ba96d626c6c02910888198c789ba69" + } }, { - "node": { - "app": { - "name": "Codecov", - "databaseId": 254 + "commit": { + "author": { + "user": null, + "email": "jamesreed@fb.com", + "name": "James Reed" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "73a4e34963e212b799a191fd031d2fa31d17e0ac" + } + }, + { + "commit": { + "author": { + "user": { + "login": "Krovatkin" + }, + "email": "korovaikon@gmail.com", + "name": "Nikolay Korovaiko" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625347" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hsM=" + "oid": "b9d5206dfb46f09f953aba3ffb0e1e33a99032ee" + } }, { - "node": { - "app": { - "name": "PyTorch Bot", - "databaseId": 40112 + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } + "oid": "12114e6937573fead54e11ae6cdebe5b31dee302" + } + }, + { + "commit": { + "author": { + "user": { + "login": "s4ayub" + }, + "email": "shababayub@fb.com", + "name": "Shabab Ayub" }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625357" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hs0=" + "oid": "f2323f76ad6f7f590285bf9c6d20c14a79542563" + } }, { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "Lint" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "workflow-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257521878?check_suite_focus=true" - }, - { - "name": "quick-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257521941?check_suite_focus=true" - }, - { - "name": "Test tools", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7257522171?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522418?check_suite_focus=true" - }, - { - "name": "Test collect_env (with_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522648?check_suite_focus=true" - }, - { - "name": "Test collect_env (without_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522731?check_suite_focus=true" - }, - { - "name": "Test collect_env (older_python_version)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522798?check_suite_focus=true" - }, - { - "name": "lintrunner", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523046?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCVA2Y=", - "hasNextPage": false - } + "commit": { + "author": { + "user": { + "login": "jaglinux" + }, + "email": "jagdish.krishna@gmail.com", + "name": "Jagadish Krishnamoorthy" }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625464" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1hzg=" + "oid": "acd4b5abe2739c09c1a02524eceda46ff93fd385" + } }, { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 + "commit": { + "author": { + "user": { + "login": "cccclai" + }, + "email": "chenlai@fb.com", + "name": "Chen Lai" }, - "workflowRun": { - "workflow": { - "name": "trunk" - } + "oid": "04179f533283132fa334a9f91a070b1712f7323d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "zaxtax" + }, + "email": "rob@zinkov.com", + "name": "Rob Zinkov" }, - "checkRuns": { - "nodes": [ - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522494?check_suite_focus=true" - }, - { - "name": "android-emulator-build-test / build-and-test", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522741?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522887?check_suite_focus=true" - }, - { - "name": "macos-10-15-py3-arm64 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523057?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523301?check_suite_focus=true" - }, - { - "name": "ios-12-5-1-x86-64 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523681?check_suite_focus=true" - }, - { - "name": "libtorch-linux-bionic-cuda11.6-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523926?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524141?check_suite_focus=true" - }, - { - "name": "libtorch-linux-xenial-cuda10.2-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524423?check_suite_focus=true" - }, - { - "name": 
"linux-bionic-py3.7-clang9-slow / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524568?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524710?check_suite_focus=true" - }, - { - "name": "macos-10-15-py3-lite-interpreter-x86-64 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524925?check_suite_focus=true" - }, - { - "name": "macos-11-py3-x86-64 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525196?check_suite_focus=true" - }, - { - "name": "caffe2-linux-focal-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525344?check_suite_focus=true" - }, - { - "name": "parallelnative-linux-focal-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525621?check_suite_focus=true" - }, - { - "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257748822?check_suite_focus=true" - }, - { - "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257748937?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9-slow / test (slow, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257940181?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996123?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996266?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (slow, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996436?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_AVX512, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996598?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996687?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (jit_legacy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996800?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996869?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257996947?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, 
linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258043565?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258043644?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (default, 1, 5, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258043840?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (default, 2, 5, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258043904?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (default, 3, 5, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258043967?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258044051?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (default, 5, 5, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258044125?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258044194?check_suite_focus=true" - }, - { - "name": "macos-12.3-py3.8-arm64-test / Run MPS tests", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258358668?check_suite_focus=true" - }, - { - "name": "macos-11-py3-x86-64 / test (default, 1, 2, macos-12)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258757994?check_suite_focus=true" - }, - { - "name": "macos-11-py3-x86-64 / test (default, 2, 2, macos-12)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258758076?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCn27w=", - "hasNextPage": false - } + "oid": "5097cdcd6994ad82b3cec942b70e75dbeaee8ca4" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625556" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1h5Q=" + "oid": "5015ecb5a2b86943f457d71f5a977444dd062732" + } }, { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" }, - "workflowRun": { - "workflow": { - "name": "pull" - } + "oid": "1c42b7789d3966cd541b08fce359b9738fee69f6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" }, - "checkRuns": { - "nodes": [ - { - "name": "linux-bionic-rocm5.1-py3.7", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522250?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522456?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522650?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang10-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257522894?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523070?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523312?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3_7-clang8-xla / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523709?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257523936?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524138?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524427?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524554?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524720?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257524938?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525212?check_suite_focus=true" - }, - { - "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525332?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525623?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525714?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/7257525946?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257526187?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257526402?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257526593?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257688277?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257759879?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257760015?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257760116?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257760245?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257760346?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257760456?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257909951?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257909994?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257912956?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257934535?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257934615?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257934714?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7257934784?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257934866?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257934975?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257935092?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257935201?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943077?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943146?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943200?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943268?check_suite_focus=true" - }, + "oid": "893ac3d334fd3e85e22423a06fe986ce453fe304" + } + }, + { + "commit": { + "author": { + "user": { + "login": "emcastillo" + }, + "email": "ecastill@preferred.jp", + "name": "Emilio Castillo" + }, + "oid": "aa5d1b6b031ee2b8bb85f793a842ac1327ae4a19" + } + }, + { + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "0707a1d00f33d7098f56de339cb30436e8c2ea44" + } + }, + { + "commit": { + "author": { + "user": { + "login": "NivekT" + }, + "email": "ktse@fb.com", + "name": "Kevin Tse" + }, + "oid": "ccb082d42af99f6374183cf914cc712bac585f0f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ryandaryl" + }, + "email": "ryandarylmills@gmail.com", + "name": "ryandaryl" + }, + "oid": "4f2909cc8747808786a1871b0a6825cc4566f48c" + } + }, + { + "commit": { + "author": { + "user": { + "login": "clee2000" + }, + "email": "csl@fb.com", + "name": "Catherine Lee" + }, + "oid": "f764010648a29223d9ed4b955073d9d2fb1b2f43" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "5696e8357cf38f852ef3d680381513e26f202371" + } + } + ], + "pageInfo": { + "endCursor": "MTMx", + "hasNextPage": false + } + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=76123 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "kumpera" + }, + "title": "Introduce distributed checkpoint with ShardedTensor.", + "body": "Co-authored-by: Wen Zhang \r\nCo-authored-by: Yifu Wang \r\n\r\n", + "headRefName": "st_checkpoint", + "headRepository": { + "nameWithOwner": "kumpera/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + 
"isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": "kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "6bf248bc20a71f248064b795f38276326fe43aae" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": "kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "10f84fb90bf02d7062e565ebf2c1da6352b64db7" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": "kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" + } + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + }, + "totalCount": 3 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943319?check_suite_focus=true" + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS2l4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755666" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSmtI=" }, { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257943373?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063614/jobs/3379894109" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2r3Q=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755785" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0k=" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257960183?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894107" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894332" + }, + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894444" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894520" + }, + { + "name": "Test collect_env 
(without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894567" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894616" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894672" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2shU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755786" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0o=" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7257960282?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902301" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902363" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902507" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902560" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902579" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902603" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902637" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902685" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902740" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902761" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902794" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902874" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903006" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903111" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903193" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903284" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903357" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903446" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903512" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903546" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379944655" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379944695" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946308" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946337" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946359" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946391" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946423" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946453" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946496" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946529" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950041" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950137" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950165" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950192" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950646" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379951202" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379951230" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963877" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963928" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963976" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379964018" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379966372" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379996173" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379996218" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379997861" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998374" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998397" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 
2, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998422" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998441" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3380042106" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd5yuY=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755806" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm14=" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258020141?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419477" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419699" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419923" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419992" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420129" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420208" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420309" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS3SE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363240" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNGg=" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258020221?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796862/jobs/3387419465" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS1-o=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363271" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNIc=" }, { - 
"name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258020306?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCcmdI=", - "hasNextPage": true - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/6882717f73deffb692219ccd1fd6db258d8ed684/checks?check_suite_id=7280625557" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbH1h5U=" - } - ], - "pageInfo": { - "hasNextPage": false - } - } - } - } - } - }, - "query_sha=23d6a47e5fd875c42231779040ec1d35d0042b502c9142cb0d33d6f65d58fead commit=6882717f73deffb692219ccd1fd6db258d8ed684 cr_cursor=Y3Vyc29yOnYyOpHPAAAAAbCcmdI= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAbH1h5Q= name=pytorch owner=pytorch": { - "data": { - "repository": { - "object": { - "oid": "6882717f73deffb692219ccd1fd6db258d8ed684", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258020388?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258020493?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7258219463?check_suite_focus=true" + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387419999" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420164" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420316" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420477" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420675" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420934" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421278" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421672" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421888" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421982" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422191" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422303" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422476" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422715" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422963" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423092" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423234" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423421" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423622" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423739" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387545789" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387546032" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387546119" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553028" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553144" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553251" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553438" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test 
(backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553556" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553668" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387554002" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387554098" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387558927" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559016" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559071" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559139" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387563803" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387563894" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580868" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580936" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580993" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387581053" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387592286" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387631950" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387632035" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387649916" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387649974" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650084" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650151" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650373" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387753429" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgaCXo=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363300" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNKQ=" + } + ], + "pageInfo": { + "hasNextPage": false } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbCfo8c=", - "hasNextPage": false - } + }, + "status": null, + "pushedDate": "2022-05-05T00:34:26Z", + "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" } } ] - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=76118 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": true, - "isCrossRepository": false, - "author": { - "login": "malfet" - }, - "title": "Dummy change with lots of commits", - "body": "Draft PR with 100+ commits, to test mergebot ", - "headRefName": "malfet/pr-with-lots-of-commits", - "headRepository": { - "nameWithOwner": "pytorch/pytorch" }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" + "changedFiles": 11, + "files": { + "nodes": [ + { + "path": "test/distributed/_shard/checkpoint/test_checkpoint.py" + }, + { + "path": "test/distributed/_shard/checkpoint/test_file_system_checkpoint.py" + }, + { + "path": "test/distributed/_shard/sharded_tensor/test_sharded_tensor.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/__init__.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/filesystem.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/metadata.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/resharding.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/state_dict_loader.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/state_dict_saver.py" + }, + { + "path": "torch/distributed/_shard/checkpoint/storage.py" + }, + { + "path": "torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py" + } + ], + "pageInfo": { + "endCursor": "MTE", + "hasNextPage": false } }, - "mergeCommit": null, - "commits_with_authors": { + "reviews": { "nodes": [ { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - 
"email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "3067f2240afc7a29dc348000aa19eccbd9772303" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "wanchaol" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "andrewor14" - }, - "email": "andrewor@fb.com", - "name": "Andrew Or" - }, - "oid": "2f655b71f70c496c4e645f6cdb27d7bb7e825701" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "0c6dcaa7f58a19c42a530f4ee14bb6f0f03ca9fb" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "cad11c563d41ebcffb1683fe1f1288b8157413b3" - } + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "jwtan@fb.com", - "name": "Jiewen Tan" - }, - "oid": "4dfd0875a68d87fccb5ad0d81692db480043b86e" - } + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "2d37e74690582a4a26890e4c8b98f1f80e589c82" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "jwtan@fb.com", - "name": "Jiewen Tan" - }, - "oid": "d4aee60947e1a3ef23c7c42990621e0746fdd0a8" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "peterbell10" - }, - "email": 
"peterbell10@live.co.uk", - "name": "Peter Bell" - }, - "oid": "aac6204bf710beb5e50a383d426ae6222396335a" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "4b0362cab884584c24f5834b3874f5f357f56b5d" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "7536df613cbc645a9e68e6a3b0a8450753260fd1" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "20a50cb966d28d7bf82924adf781cf72a01ef90e" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "486387e8644afb46edff5aa5925b55c8119f67f0" - } + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "acb9d78b9b732d3667b881727e6ed9f92a8c549f" - } + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "683bb7959a5b973f8470c081ad02e8fc508e784a" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "qihqi" - }, - "email": "qihan@fb.com", - "name": "Han Qi" - }, - "oid": "a870cb40af65adf0b77d55f6b554d7093d284d7a" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "Krovatkin" - }, - "email": "korovaikon@gmail.com", - "name": "Nikolay Korovaiko" - }, - "oid": "70793b9f328ddf52cc86336104c3a064c8582ef4" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "suo" - }, - "email": "suo@fb.com", - "name": "Michael Suo" - }, - "oid": "f70b31f62b1c5159eef2725484b175983517c88c" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dagitses" - }, - "email": "mikeyd@fb.com", - "name": "Michael Andreas Dagitses" - }, - "oid": "04d3ec1db60defe1c6904bf77e9f8dfa87dc0b63" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "46b754a55b63e3168ad5854ad412c124934b675d" - } + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "robieta" - }, - "email": "taylorrobie@fb.com", - "name": "Taylor Robie" - }, - "oid": "13df69e13ee571fdd716139419a00aec47ade7d6" - } + "author": { + "login": "xunnanxu" + }, + "state": "DISMISSED" }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "70642e911ec80a47cdbf4a50aac475c11aa129b6" - } + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" 
}, { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "59bb7c39384bf3e0b284a037adef8b3caa53c1c4" - } + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "007cfb97b55d70ff63e1ed71d1a674638f847376" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "0a7b858a5af1393fa3cf2853f92eca0e1d408dde" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "qihqi" - }, - "email": "qihan@fb.com", - "name": "Han Qi" - }, - "oid": "7917d789f0a523715041ade5177d271082628236" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "kit1980" - }, - "email": "sdym@fb.com", - "name": "Sergii Dymchenko (Meta Employee)" - }, - "oid": "91eb6017f0fb8a1b29e8cb48fac93bc9709f73b3" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dagitses" - }, - "email": "mikeyd@fb.com", - "name": "Michael Andreas Dagitses" - }, - "oid": "bd04dca5fabb0c2a51ac87063a515f256ef274fa" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dagitses" - }, - "email": "mikeyd@fb.com", - "name": "Michael Andreas Dagitses" - }, - "oid": "1f805a5defda7dabc49d0059edb9ccb06bc29352" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "mruberry" - }, - "email": "mruberry@fb.com", - "name": "Mike Ruberry" - }, - "oid": "4982c0a8db8f23d15ec4bfcbca4ce939afc04954" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pearu" - }, - "email": "pearu.peterson@gmail.com", - "name": "Pearu Peterson" - }, - "oid": "28502265cb5925cb7db8dcb2dd2334963092714a" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "e03fcaedb1342e6d65c7f7f20243000938ba60b2" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pritamdamania" - }, - "email": "pritam.damania@fb.com", - "name": "pritam" - }, - "oid": "efb28f5a1a5d18aa96bd668ab2ab5c651be359f3" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "MagiaSN" - }, - "email": "magialiao@tencent.com", - "name": "magialiao" - }, - "oid": "52cc1b9994f861ebdd3908759ed1ab11cba1f8de" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "3cd99f23d1acd6a5bedf6f3b02be79d64350a5b6" - } + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "awgu" - }, - "email": "andgu@fb.com", - "name": "Andrew Gu" - }, - "oid": 
"b00502c634a5146f4d996bd90e84d317f049e7b0" - } + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "davidberard98" - }, - "email": "dberard@fb.com", - "name": "David Berard" - }, - "oid": "976eb7cee799dddfbe6a4122b249aaee1b6c8854" - } + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" - }, - "oid": "9608ab28744d5cae32f371490557b248c9549c66" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "4e119f0c39eb5ff0777f0e71561e6b633d85fb34" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "rohan-varma" - }, - "email": "rvarm1@fb.com", - "name": "Rohan Varma" - }, - "oid": "447580dc565f3660eddb2c996c6ed25b88338684" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "2bc8f43e9233008ea23053fab87b83ab36fca5e3" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "c13a8e891c3e3e714f60649ca1e3b082e090e9fe" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "fddc861b7ee473f57d3c2161e4618a2663a237e8" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "jiyuanzFB" - }, - "email": "jiyuanz@fb.com", - "name": "Jiyuan Zhang" - }, - "oid": "e2336dbc539d6c021720cbe43c92c9e4c8463299" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "bdhirsh" - }, - "email": "hirsheybar@fb.com", - "name": "Brian Hirsh" - }, - "oid": "26e2759d1ad59aac12168b74d1ca55e42ba9455c" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "bdhirsh" - }, - "email": "hirsheybar@fb.com", - "name": "Brian Hirsh" - }, - "oid": "ad7aa914ee3b3d1252e31514f010ba96c40aae87" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "bdhirsh" - }, - "email": "hirsheybar@fb.com", - "name": "Brian Hirsh" - }, - "oid": "f113c5d78065aafbe7b1c0e611945bfe9f67b3c0" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "bdhirsh" - }, - "email": "hirsheybar@fb.com", - "name": "Brian Hirsh" - }, - "oid": "a366fd01136292544b7862968ae92feba4b6d8fe" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "seemethere" - }, - "email": "eliuriegas@fb.com", - "name": "Eli Uriegas" - }, - "oid": "afeba0773749da5883c378a2e6ac066e1ce62ca0" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "bdhirsh" - }, - "email": "hirsheybar@fb.com", - "name": 
"Brian Hirsh" - }, - "oid": "d306c99addc543908f64666baeecacbd0749f4a7" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "awgu" - }, - "email": "andgu@fb.com", - "name": "Andrew Gu" - }, - "oid": "c2456ea658f41f64ea054a422edf22a9c977399f" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "awgu" - }, - "email": "andgu@fb.com", - "name": "Andrew Gu" - }, - "oid": "a8b0a1b681c9fe41e0d553c962a5c93e81d92503" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "anjali411" - }, - "email": "chourdiaanjali123@gmail.com", - "name": "anjali411" - }, - "oid": "af761d9a5d058c9188f16589bae4f307d35185be" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "clee2000" - }, - "email": "csl@fb.com", - "name": "Catherine Lee" - }, - "oid": "beceb417baef35b15c2716e23178fb49f7fd6f9d" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "1516554e22136db89d0aeba43a1a1a987e995d68" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "qihqi" - }, - "email": "qihan@fb.com", - "name": "Han Qi" - }, - "oid": "68eb1fa8374eff6cbdcf0be5e37ed6775d22e722" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "janeyx99" - }, - "email": "janeyx@fb.com", - "name": "Jane Xu" - }, - "oid": "3c7bcb99b5c0c879c2610f427880b03881f82f38" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "janeyx99" - }, - "email": "janeyx@fb.com", - "name": "Jane Xu" - }, - "oid": "38c1a2028090353e40a019c673c9ab16b39e4825" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" - }, - "oid": "8091cbea2c95ed2c4c406b3c61547a27c6319bae" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "d81f59121969a47c8b2213a88e02cf9be0219be9" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. 
Yang" - }, - "oid": "20d798b319cd107a767fe220f7a3027c18a1c844" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "eb35381a770b58c1cd41e935910cb4df2f3d8f14" - } + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "e6498a657b9aa47546dcd92d1b4ffb2e1a50ebdb" - } + "author": { + "login": "pritamdamania87" + }, + "state": "APPROVED" }, { - "commit": { - "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" - }, - "oid": "7f821382db5ad08efe5b09a145c606852b8a9272" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" - }, - "oid": "995c0e11a97d854ff969962bd81d7341e46ecb07" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "davidberard98" - }, - "email": "dberard@fb.com", - "name": "David Berard" - }, - "oid": "28d6258e62c9fc361a18689877c962c69889dc23" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "HarborYuan" - }, - "email": "yuanhaobo@whu.edu.cn", - "name": "Haobo Yuan" - }, - "oid": "2350fad8391367ebf81c7236a2c883644b4ff622" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "zou3519" - }, - "email": "zou3519@gmail.com", - "name": "Richard Zou" - }, - "oid": "3f789c9ccecdd7e2e52269453646e992a68c6b92" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "jeffdaily" - }, - "email": "jeff.daily@amd.com", - "name": "Jeff Daily" - }, - "oid": "20f79f610c1a3314da96d49515bbfbee9442e4f8" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "peterbell10" - }, - "email": "peterbell10@live.co.uk", - "name": "Peter Bell" - }, - "oid": "5823958f047f3b71a5dc8c52a20eb8ae3291bd3e" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "peterbell10" - }, - "email": "peterbell10@live.co.uk", - "name": "Peter Bell" - }, - "oid": "a0b15c49ecf3844daf2c0dcaef44f0214259db20" - } - }, + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0yNVQxMTozNTowMS0wNzowMLkyMDIyLTA0LTI1VDExOjM1OjAwLTA3OjAwzjjC2d0=", + "hasPreviousPage": true + } + }, + "comments": { + "nodes": [ { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "4afc38c25ca2ca126ba4987a419a58a5c572223b" - } + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118495479 }, { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. 
Yang" - }, - "oid": "b606f58d4a36683fbe0a7d02adfdde7d5cc694c2" - } + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118511287 }, { - "commit": { - "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" - }, - "oid": "2d61b4d630f6482a6c3cc7437091fad6d27c347e" - } + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118662274 }, { - "commit": { - "author": { - "user": { - "login": "george-qi" - }, - "email": "georgeqi94@gmail.com", - "name": "George Qi" - }, - "oid": "bc5384c47036a6cda94129f3e2f9e43c43393698" - } + "bodyText": "Merge failed due to Can't fetch all PR reviews Raised by https://github.com/pytorch/pytorch/actions/runs/2275691136\n\n@osalpekar @malfet This is failing because there are 109 review comments on this PR but we only fetch the first 100. This could be solved with a similar concept as how we fetch more comments/check_runs.", + "author": { + "login": "janeyx99" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118689010 }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "60fc3277634365b64465712b13db2acb76d6c890" - } - }, + "bodyText": "On a side note, has the test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_False_cpu_offload_CPUOffload failure on the distributed test first shard of this PR been addressed?", + "author": { + "login": "janeyx99" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118693497 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQqri9w==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "1b8762e95bc38d1847fe99ed3230546c8b800bfd" + "node": { + "name": "oncall: distributed" } }, { - "commit": { - "author": { - "user": { - "login": "jerryzh168" - }, - "email": "jerryzh168@gmail.com", - "name": "Jerry Zhang" - }, - "oid": "6acf60f95f59ecbc6e8ce830dea0abba7d3ec763" + "node": { + "name": "cla signed" } + } + ] + } + } + } + } + }, + "query_sha=6a8ce6412a780d5804bfe180ed1dc807269e1eae2ae50de2346d56d1283884bc cursor=Y3Vyc29yOnYyOpO5MjAyMi0wNC0yNVQxMTozNTowMS0wNzowMLkyMDIyLTA0LTI1VDExOjM1OjAwLTA3OjAwzjjC2d0= name=pytorch number=76123 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "reviews": { + "nodes": [ + { + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "ysiraichi" - }, - "email": "yukio.siraichi@gmail.com", - "name": "Yukio Siraichi" - }, - "oid": "8fb0276561fdd530c5a06ea195e930e0584f8705" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" - }, - "oid": "1da7aed95a8700406671425eac1e4bbc2c7a24b5" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "thiagocrepaldi" - }, - 
"email": "thiago.crepaldi@microsoft.com", - "name": "Thiago Crepaldi" - }, - "oid": "83208e7dee4503c1bee1df9f6632794694dffa01" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "1a46cf08dcd3d3564604c17b2c02d7e4eb45a7ff" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" - }, - "oid": "b7f9b6689445f826c83694652fea5f7cfc7070d7" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "fatcat-z" - }, - "email": "jiz@microsoft.com", - "name": "Jay Zhang" - }, - "oid": "f273961c1696b156e35f8c76f7ad37934031050d" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "commit": { - "author": { - "user": { - "login": "pavithranrao" - }, - "email": "pavithran@fb.com", - "name": "Pavithran Ramachandran" - }, - "oid": "eb410a51fcbc716873fd80a970eb932d4aaaea61" - } + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, + { + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0yMlQyMDozNzo1NC0wNzowMLkyMDIyLTA0LTIyVDE2OjAyOjA5LTA3OjAwzjip7G8=", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=71759 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "coolteemf" + }, + "title": "Optimize grid sample 3d", + "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. 
Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", + "headRefName": "optimize_grid_sample_3d", + "headRepository": { + "nameWithOwner": "coolteemf/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { "commit": { "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "7dbb12cdc02332fa64264ed0df576511a5070d7e" + "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e" } }, - { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" + { + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "43675665fa6b5154de8b25125dd03d7be35c884f" + "oid": "563ec73747ad53b63b36736c47c4342f962c2a09" } }, { "commit": { "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "6c4d23c402c413667463770d9a2fa801f493d3c5" + "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8" } }, { "commit": { "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "cf3778a35129a40dee14366515201b7ed2c0f346" + "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8" } }, { "commit": { "author": { - "user": { - "login": "dzdang" - }, - "email": "dzdang@umich.edu", - "name": "dzdang" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "9d00a051373cb81f79cb6375942cf3ec9fff2fe6" + "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8" } }, { "commit": { "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "1eae67cf404aa8dffb80b8e85180f943878d52a6" + "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a" } }, { "commit": { "author": { - "user": { - "login": "janeyx99" - }, - "email": "janeyx@fb.com", - "name": "Jane Xu" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "ce0e69dcda0fe41a6e964d6ac70ce8016979c71a" + "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af" } }, { "commit": { "author": { - "user": { - "login": "swolchok" - }, - "email": "swolchok@fb.com", - "name": "Scott Wolchok" + "user": null, 
+ "email": "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", + "name": "coolteemf" }, - "oid": "6faba554f6e49777f24911928edb3061b6ed0e3d" + "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470" } }, { "commit": { "author": { - "user": { - "login": "IvanYashchuk" - }, - "email": "ivan.yashchuk@aalto.fi", - "name": "Ivan Yashchuk" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "d1d0e03f57a359f8f95331f9a34b8bed3e7cc845" + "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f" } }, { "commit": { "author": { - "user": { - "login": "Chillee" - }, - "email": "chilli@fb.com", - "name": "Horace He" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "bb46bd9233a9fc631802a902cb48a4c13c2722ca" + "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5" } }, { "commit": { "author": { - "user": { - "login": "mehtanirav" - }, - "email": "niravmehta@fb.com", - "name": "Nirav Mehta" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "3b1007fe4be12e483f2620fbac67cae42e703efc" + "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309" } }, { "commit": { "author": { - "user": { - "login": "mehtanirav" - }, - "email": "niravmehta@fb.com", - "name": "Nirav Mehta" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "b4b65228dd0c109f5fdf17c7d9e56f60a98e398b" + "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69" } }, { "commit": { "author": { - "user": { - "login": "albanD" - }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "d629e300705196d3ae0bac5ed983b197101fa2ee" + "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48" } }, { "commit": { "author": { - "user": { - "login": "bigfootjon" - }, - "email": "jonjanzen@fb.com", - "name": "Jon Janzen" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "52754b9e515f378f8476ad44d75b0a692bad8cde" + "oid": "f683e8aec7aea76097a264eec01511e704c31154" } }, { "commit": { "author": { "user": { - "login": "samdow" + "login": "coolteemf" }, - "email": "samdow@fb.com", - "name": "samdow" + "email": "67541941+coolteemf@users.noreply.github.com", + "name": "Fran\u00e7ois Lecomte" }, - "oid": "128c3ad747093f4970329a82c7c4720420faeff2" + "oid": "b932e9e286c22aaf352375186df851ef060b295a" } }, { "commit": { "author": { - "user": { - "login": "arindamroy-eng" - }, - "email": "61168652+arindamroy-eng@users.noreply.github.com", - "name": "arindamroy-eng" + "user": null, + "email": "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", + "name": "coolteemf" }, - "oid": "2a0bda7d32a5bcc9827f7254a7b77cceb16ba973" + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" } } ], "pageInfo": { - "endCursor": "MTAw", - "hasNextPage": true + "endCursor": "MTY", + "hasNextPage": false }, - "totalCount": 131 + "totalCount": 16 }, "commits": { "nodes": [ @@ -4993,109 +9354,53 @@ } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNRg4=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGYqY=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693698" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRAI=" - }, - { - "node": { - "app": { - "name": "Netlify", - "databaseId": 13473 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - 
"endCursor": null, - "hasNextPage": false - } - }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693712" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRBA=" - }, - { - "node": { - "app": { - "name": "Azure Pipelines", - "databaseId": 9426 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693725" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRB0=" - }, - { - "node": { - "app": { - "name": "Dependabot", - "databaseId": 29110 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693741" + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801320" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRC0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_T6g=" }, { "node": { "app": { - "name": "Codecov", - "databaseId": 254 + "name": "GitHub Actions", + "databaseId": 15368 }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693761" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsREE=" - }, - { - "node": { - "app": { - "name": "PyTorch Bot", - "databaseId": 40112 - }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663109808" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663214802" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663214856" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIob0=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693774" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801849" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRE4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubk=" }, { "node": { @@ -5105,26 +9410,26 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + "name": "linux-xenial-py3-clang5-mobile-build" } }, "checkRuns": { "nodes": [ { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099388390?check_suite_focus=true" + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754064/jobs/2663109676" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNR-Y=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1E=", "hasNextPage": false } }, - "conclusion": 
"SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694412" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801852" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRsw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubw=" }, { "node": { @@ -5134,56 +9439,41 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "linux-bionic-rocm4.5-py3.7" } }, "checkRuns": { "nodes": [ { - "name": "Test collect_env (with_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431378?check_suite_focus=true" - }, - { - "name": "Test collect_env (without_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431511?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431693?check_suite_focus=true" - }, - { - "name": "Test tools", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431829?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663109684" }, { - "name": "quick-checks", + "name": "test (default, 2, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432018?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401083" }, { - "name": "lintrunner", + "name": "test (default, 1, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432195?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401143" }, { - "name": "workflow-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432331?check_suite_focus=true" + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401186" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuN84s=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwMsZY=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694417" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801853" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRtE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub0=" }, { "node": { @@ -5193,654 +9483,518 @@ }, "workflowRun": { "workflow": { - "name": "pull" + "name": "win-vs2019-cuda11.3-py3" } }, "checkRuns": { "nodes": [ { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099430906?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431117?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431312?check_suite_focus=true" - }, - { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/6099431677?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099431819?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432057?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432191?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.0-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432334?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432446?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432577?check_suite_focus=true" - }, - { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432685?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432822?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099432932?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433128?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.3-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433280?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433402?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433542?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433675?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433758?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099433859?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099554424?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099554523?check_suite_focus=true" - }, - { - "name": 
"linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557184?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557310?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557449?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557512?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557588?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557655?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557717?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099557795?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099565740?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099565906?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099565972?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099566036?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663109680" }, { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099580613?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995756" }, { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099580676?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995819" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099608194?check_suite_focus=true" - }, + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995900" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwZbzg=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801855" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub8=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "name": "mypy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099608322?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109683" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "name": "shellcheck", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099608371?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109827" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "name": "py2-setup-validate-errormsg", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099619007?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109962" }, { - "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "name": "clang-format", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099645951?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110044" }, { - "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "name": "cmakelint", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099646089?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110132" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099685555?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110233" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099685664?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110320" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "name": "clang-tidy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099685757?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110461" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "name": "flake8-py3", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099689530?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110575" + } + ], + "pageInfo": { + "endCursor": 
"Y3Vyc29yOnYyOpHPAAAAATwGbAQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801856" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099757872?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663109804" }, { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "name": "test (default, 3, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099757955?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233675" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "name": "test (default, 1, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099898234?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233731" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "name": "test (default, 2, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099898323?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233805" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuVD9M=", - "hasNextPage": true - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694439" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRuc=" - } - ], - "pageInfo": { - "hasNextPage": false - } - }, - "pushedDate": "2022-04-20T17:10:41Z", - "oid": "5696e8357cf38f852ef3d680381513e26f202371" - } - } - ] - }, - "changedFiles": 348, - "files": { - "nodes": [ - { - "path": ".circleci/cimodel/data/pytorch_build_data.py" - }, - { - "path": ".circleci/cimodel/data/pytorch_build_definitions.py" - }, - { - "path": ".circleci/scripts/cpp_doc_push_script.sh" - }, - { - "path": ".circleci/scripts/python_doc_push_script.sh" - }, - { - "path": ".github/actions/checkout-pytorch/action.yml" - }, - { - "path": ".github/merge_rules.json" - }, - { - "path": ".github/scripts/gitutils.py" - }, - { - "path": ".github/scripts/gql_mocks.json" - }, - { - "path": ".github/scripts/trymerge.py" - }, - { - "path": ".github/workflows/_bazel-build-test.yml" - }, - { - "path": ".github/workflows/_linux-build.yml" - }, - { - "path": ".github/workflows/_linux-test.yml" - }, - { - "path": ".github/workflows/_mac-test.yml" - }, - { - "path": ".github/workflows/_rocm-test.yml" - }, - { - "path": ".github/workflows/_win-test.yml" - }, - { - "path": ".github/workflows/buck_build_test.yml" - }, - { - "path": ".github/workflows/lint.yml" - }, - { - "path": ".github/workflows/periodic.yml" - }, - { - "path": ".github/workflows/pull.yml" - }, - { - "path": ".github/workflows/trunk.yml" - }, - { - "path": ".jenkins/pytorch/macos-test.sh" - }, - { - "path": ".jenkins/pytorch/test.sh" - }, - { - "path": 
".jenkins/pytorch/win-test.sh" - }, - { - "path": ".lintrunner.toml" - }, - { - "path": "BUILD.bazel" - }, - { - "path": "CODEOWNERS" - }, - { - "path": "README.md" - }, - { - "path": "aten/src/ATen/BatchingRegistrations.cpp" - }, - { - "path": "aten/src/ATen/Dispatch.h" - }, - { - "path": "aten/src/ATen/ExpandUtils.h" - }, - { - "path": "aten/src/ATen/FunctionalInverses.cpp" - }, - { - "path": "aten/src/ATen/FunctionalStorageImpl.cpp" - }, - { - "path": "aten/src/ATen/FunctionalStorageImpl.h" - }, - { - "path": "aten/src/ATen/FunctionalTensorWrapper.cpp" - }, - { - "path": "aten/src/ATen/FunctionalTensorWrapper.h" - }, - { - "path": "aten/src/ATen/FunctionalizeFallbackKernel.cpp" - }, - { - "path": "aten/src/ATen/NestedTensorImpl.cpp" - }, - { - "path": "aten/src/ATen/OpMathType.h" - }, - { - "path": "aten/src/ATen/SparseCsrTensorUtils.h" - }, - { - "path": "aten/src/ATen/ThreadLocalState.cpp" - }, - { - "path": "aten/src/ATen/ThreadLocalState.h" - }, - { - "path": "aten/src/ATen/autocast_mode.cpp" - }, - { - "path": "aten/src/ATen/autocast_mode.h" - }, - { - "path": "aten/src/ATen/core/SymIntArrayRef.cpp" - }, - { - "path": "aten/src/ATen/core/SymIntArrayRef.h" - }, - { - "path": "aten/src/ATen/core/TensorBase.h" - }, - { - "path": "aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h" - }, - { - "path": "aten/src/ATen/core/dispatch/Dispatcher.h" - }, - { - "path": "aten/src/ATen/core/interned_strings.h" - }, - { - "path": "aten/src/ATen/core/ivalue.cpp" - }, - { - "path": "aten/src/ATen/core/ivalue.h" - }, - { - "path": "aten/src/ATen/core/ivalue_inl.h" - }, - { - "path": "aten/src/ATen/core/jit_type.h" - }, - { - "path": "aten/src/ATen/core/jit_type_base.h" - }, - { - "path": "aten/src/ATen/core/type.cpp" - }, - { - "path": "aten/src/ATen/cuda/CUDASparse.h" - }, - { - "path": "aten/src/ATen/cuda/llvm_complex.cpp" - }, - { - "path": "aten/src/ATen/cuda/llvm_jit_strings.h" - }, - { - "path": "aten/src/ATen/native/Blas.cpp" - }, - { - "path": "aten/src/ATen/native/Itertools.cpp" - }, - { - "path": "aten/src/ATen/native/LinearAlgebra.cpp" - }, - { - "path": "aten/src/ATen/native/SoftMax.cpp" - }, - { - "path": "aten/src/ATen/native/TensorConversions.cpp" - }, - { - "path": "aten/src/ATen/native/TensorShape.cpp" - }, - { - "path": "aten/src/ATen/native/TensorShape.h" - }, - { - "path": "aten/src/ATen/native/Unique.cpp" - }, - { - "path": "aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu" - }, - { - "path": "aten/src/ATen/native/cuda/CUDAJitLoops.cuh" - }, - { - "path": "aten/src/ATen/native/cuda/JitLoops.cuh" - }, - { - "path": "aten/src/ATen/native/cuda/Lerp.cu" - }, - { - "path": "aten/src/ATen/native/cuda/PersistentSoftmax.cuh" - }, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwJC4U=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801857" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcE=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754076/jobs/2663109810" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ_w=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801862" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663109777" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201383" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201458" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201512" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201580" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201672" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201839" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIWu4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801866" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uco=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754079/jobs/2663109681" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801869" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uc0=" + } + ], + "pageInfo": { + "hasNextPage": true + } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017798?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017799?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017816?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: 
pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017800?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-02-23T10:39:30Z", + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + } + } + ] + }, + "changedFiles": 9, + "files": { + "nodes": [ { - "path": "aten/src/ATen/native/cuda/SoftMax.cu" + "path": "aten/src/ATen/native/GridSampler.cpp" }, { - "path": "aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu" + "path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp" }, { - "path": "aten/src/ATen/native/cuda/Unique.cu" + "path": "aten/src/ATen/native/cuda/GridSampler.cpp" }, { - "path": "aten/src/ATen/native/cuda/jit_utils.cpp" + "path": "aten/src/ATen/native/cuda/GridSampler.cu" }, { - "path": "aten/src/ATen/native/cuda/jit_utils.h" + "path": "aten/src/ATen/native/cuda/GridSampler.h" }, { "path": "aten/src/ATen/native/native_functions.yaml" }, { - "path": "aten/src/ATen/native/nested/NestedTensorMath.cpp" - }, - { - "path": "aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp" - }, - { - "path": "aten/src/ATen/native/quantized/cpu/qsoftmax.cpp" - }, - { - "path": "aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp" + "path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py" }, { - "path": "aten/src/ATen/native/quantized/cudnn/Linear.cpp" + "path": "test/test_nn.py" }, { - "path": "aten/src/ATen/native/quantized/cudnn/utils.h" - }, + "path": "tools/autograd/derivatives.yaml" + } + ], + "pageInfo": { + "endCursor": "OQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "path": "aten/src/ATen/native/sparse/SparseCsrTensor.cpp" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/ts_native_functions.yaml" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/record_function.cpp" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/record_function.h" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/templates/Operators.h" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/templates/RegisterFunctionalization.cpp" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/test/basic.cpp" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/test/vmap_test.cpp" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "binaries/record_function_benchmark.cc" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "c10/core/DispatchKey.cpp" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "c10/core/DispatchKey.h" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "c10/core/DispatchKeySet.h" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "c10/test/core/DispatchKeySet_test.cpp" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "c10/util/ArrayRef.h" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "path": "caffe2/core/tensor.h" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "path": "docs/source/conf.py" + "author": { + "login": "albanD" + }, + "state": "APPROVED" }, { - "path": "docs/source/fx.rst" + "author": { + 
"login": "albanD" + }, + "state": "APPROVED" } ], "pageInfo": { - "endCursor": "MTAw", - "hasNextPage": true - } - }, - "reviews": { - "nodes": [], - "pageInfo": { - "startCursor": null, + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMS0yNVQwODoyODoxMC0wODowMLkyMDIyLTAxLTI1VDA3OjU0OjA1LTA4OjAwzjNooqI=", "hasPreviousPage": false } }, "comments": { "nodes": [ { - "bodyText": "Merge failed due to Matched rule superuser, but it was not reviewed yet by any of:zou3519,abhikrish,mehtanirav,wconstab,lc0, ...", + "bodyText": "Merge failed due to 'NoneType' object is not subscriptable\nRaised by https://github.com/pytorch/pytorch/actions/runs/1887945630", "author": { "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1104215370 + "databaseId": 1048868910 }, { - "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet", + "bodyText": "Thanks for the update! The windows failure is not your fault, you can ignore it!\n\nThank you very much for all of your feedback and sorry for the delay !", "author": { - "login": "pytorchmergebot" + "login": "coolteemf" }, - "authorAssociation": "MEMBER", + "authorAssociation": "CONTRIBUTOR", "editor": null, - "databaseId": 1104220908 + "databaseId": 1048983572 }, { - "bodyText": "@pytorchbot merge this", + "bodyText": "@coolteemf can you please send either me or @albanD an email? (or I can send you and invite to collab on private repo)", "author": { "login": "malfet" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1104378397 + "databaseId": 1049048119 }, { - "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet\nRaised by https://github.com/pytorch/pytorch/actions/runs/2197877090", + "bodyText": "@pytorchbot merge this please", "author": { - "login": "pytorchmergebot" + "login": "albanD" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1104379712 + "databaseId": 1049131992 }, { - "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", + "bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { "login": "github-actions" }, "authorAssociation": "NONE", "editor": null, - "databaseId": 1160658699 + "databaseId": 1049134520 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQdD9Sg==", + "startCursor": "Y3Vyc29yOnYyOpHOPoR4Lg==", "hasPreviousPage": true } }, "labels": { "edges": [ + { + "node": { + "name": "triaged" + } + }, + { + "node": { + "name": "open source" + } + }, { "node": { "name": "cla signed" @@ -5848,7 +10002,12 @@ }, { "node": { - "name": "Stale" + "name": "release notes: nn" + } + }, + { + "node": { + "name": "topic: performance" } } ] @@ -5857,268 +10016,174 @@ } } }, - "query_sha=74bd29fe945c49fde4818e873fa62bc60b55b4ef6ae3f2bb719bab6cddbaa7ce cursor=MTAw name=pytorch number=76118 owner=pytorch": { + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=75095 owner=pytorch": { "data": { "repository": { "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "mruberry" + }, + "title": "Initial prims, references, and test architecture for them", + "body": "This PR adds an initial set of experimental primitive operations and Python references that reimplement existing PyTorch operations using them. See https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577 for additional context.\r\n\r\nThe following experimental primitives are added:\r\n\r\n- Elementwise unary prims -- abs, acos, acosh, asin, atan, cos, cosh, bessel_i0e, bessel_i1e, cbrt, ceil, digamma, erf, erf_inv, erfc, exp, expm1, floor, igamma, igammac, is_finite, lgamma, log, log1p, neg, reciprocal, round, sign, sinh, sqrt, square, tan. \r\n- Elementwise binary prims -- add, atan2, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, div, eq, ge, gt, le, lt, max, min, mul, ne, nextafter, pow, rsqrt, shift_left, shift_right_arithmetic\r\n- View prims -- brodcast_in_dim, collapse_view, split_dim, squeeze\r\n- Shape prims -- collapse, concatenate, reshape\r\n- Conditional prims -- select\r\n- Data conversion & movement prims -- convert_element_type, device_put\r\n- Inplace prims -- copy_to, resize\r\n\r\nThese primitives do not add any new functionality to PyTorch, but are intended to be the semantic building blocks for reference operators. We have tried to make them consistent with the operations in [jax.lax](https://jax.readthedocs.io/en/latest/jax.lax.html) where possible (because PyTorch prefers being consistent with other frameworks), although there are key differences between these prims and operations in jax.lax. Most notably is that these prims model view semantics and inplace operations.\r\n\r\nIn addition to these primitives the following elementwise binary Python references are added:\r\n\r\n- Elementwise binary Python references -- add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor, eq, float_power, ge, gt, le, lt, maximum, minimum, mul, ne, nextafter, pow, sub, true_divide\r\n- Conditional Python references - where\r\n- Data conversion & movement references - copy_to\r\n\r\nA Python reference implements the same behavior as its corresponding PyTorch operator (excepting slight numerical differences, bug fixes, and in some cases additional features). \r\n\r\nThe start of an OpInfo-based test architecture for these references is also included in this PR. A new list, `python_ref_db`, is added to `common_methods_invocations.py`. 
This list introduces the new `ElementwiseBinaryPythonRefInfo`, which inherits input arguments from the original operators' OpInfo, allows them to be overridden, and then constructs the OpInfo for the Python reference using the (potentially modified) arguments. OpInfo-based tests can opt-into testing references by including this new list in the Sequence passed to the `@ops` decorator. \r\n\r\ncc @ngimel @csarofeen @kevinstephano @Lezcano ", + "headRefName": "prims_and_references", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, "commits_with_authors": { "nodes": [ - { - "commit": { - "author": { - "user": { - "login": "clee2000" - }, - "email": "csl@fb.com", - "name": "Catherine Lee" - }, - "oid": "7f560351ae04ea43e58fbfda885bcf216aa26cde" - } - }, - { - "commit": { - "author": { - "user": { - "login": "pytorchmergebot" - }, - "email": "pytorchmergebot@users.noreply.github.com", - "name": "PyTorch MergeBot" - }, - "oid": "e8677ed168a036bc7e590d800fe98dd15f10581b" - } - }, - { - "commit": { - "author": { - "user": { - "login": "robieta" - }, - "email": "taylorrobie@fb.com", - "name": "Taylor Robie" - }, - "oid": "ac5611caa13642ef8dbe0db453b283b42cbd900b" - } - }, - { - "commit": { - "author": { - "user": { - "login": "robieta" - }, - "email": "taylorrobie@fb.com", - "name": "Taylor Robie" - }, - "oid": "1184afbd3bfde0f46133aef09e55e18d3bfb3c3e" - } - }, - { - "commit": { - "author": { - "user": { - "login": "minsii" - }, - "email": "msi@fb.com", - "name": "Min Si" - }, - "oid": "1c05604f3d049c67dc678d0295c0add470bff3dc" - } - }, { "commit": { "author": { "user": null, - "email": "eellison@devfair044.h1.fair", - "name": "Elias Ellison" + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "76ab5101bd36e8d73637d31bbea125240b7b27f0" + "oid": "a790467c650be92775103cde5e866c90b56f5376" } }, { "commit": { "author": { "user": null, - "email": "eellison@devfair044.h1.fair", - "name": "Elias Ellison" - }, - "oid": "c774050e92c3d8e52968e1eb635dd3e9491104b3" - } - }, - { - "commit": { - "author": { - "user": { - "login": "guoyejun" - }, - "email": "yejun.guo@intel.com", - "name": "Guo Yejun" - }, - "oid": "8981595c5361f07186f4534f3be71f1d829a3046" - } - }, - { - "commit": { - "author": { - "user": { - "login": "BowenBao" - }, - "email": "bowbao@microsoft.com", - "name": "BowenBao" - }, - "oid": "036f362904024ac9481248965009f312bec6656b" - } - }, - { - "commit": { - "author": { - "user": { - "login": "janeyx99" - }, - "email": "janeyx@fb.com", - "name": "Jane Xu" + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "457d994933f164a9fd70da5ca2733dd6c046a28b" + "oid": "bd6fcf50692e208ebecdc2eaa517a2bfcdcd35cf" } }, { "commit": { "author": { - "user": { - "login": "janeyx99" - }, - "email": "janeyx@fb.com", - "name": "Jane Xu" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "f49ebc77520774e71722111d554a0215a26956df" + "oid": "4a119c8f21529fe1375e7e8789b91f41a3df80c5" } }, { "commit": { "author": { - "user": { - "login": "mikeiovine" - }, - "email": "mikeiovine@fb.com", - "name": "Mike Iovine" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "f069e1a4a5f98d3fe961e4fc562ede59f59b4026" + "oid": "ea6750dc34d66be759fdfe84b09fb0e23ee59c79" } }, { "commit": { "author": { - 
"user": { - "login": "salilsdesai" - }, - "email": "salilsdesai@fb.com", - "name": "Salil Desai" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "30bccf58393b288412a0f5a2423a1a41ffce258e" + "oid": "2eef8a55fe0227e1921b51bf1f56f9d0a29b49ac" } }, { "commit": { "author": { - "user": { - "login": "angelayi" - }, - "email": "angelayi@fb.com", - "name": "Angela Yi" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "f4ba440fe8a632c1ee88e01f7746a8a92c8f3902" + "oid": "b886ed6c20dd1785fd31ed6fa6a8c5b6d0d0b16c" } }, { "commit": { "author": { "user": null, - "email": "shirong@fb.com", - "name": "Shirong Wu" + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "d203346c93ba96d626c6c02910888198c789ba69" + "oid": "9ad9b63d09aa4f7a8549bcf1d88ea4ff0674299c" } }, { "commit": { "author": { - "user": { - "login": "jamesr66a" - }, - "email": "jamesreed@fb.com", - "name": "James Reed" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "73a4e34963e212b799a191fd031d2fa31d17e0ac" + "oid": "63fdd580118477416ae160e0670ae722ea248090" } }, { "commit": { "author": { - "user": { - "login": "Krovatkin" - }, - "email": "korovaikon@gmail.com", - "name": "Nikolay Korovaiko" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "b9d5206dfb46f09f953aba3ffb0e1e33a99032ee" + "oid": "0ccf7dc292af1d40d0a094eb2b2fb0c7ab4ccc70" } }, { "commit": { "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "12114e6937573fead54e11ae6cdebe5b31dee302" + "oid": "e8a8a4d1fbe35f20eb88e1a43cf5a653883638e5" } }, { "commit": { "author": { - "user": { - "login": "s4ayub" - }, - "email": "shababayub@fb.com", - "name": "Shabab Ayub" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "f2323f76ad6f7f590285bf9c6d20c14a79542563" + "oid": "186634dfdd25645c05b58a212f9e8d77c4125fc0" } }, { "commit": { "author": { - "user": { - "login": "jaglinux" - }, - "email": "jagdish.krishna@gmail.com", - "name": "Jagadish Krishnamoorthy" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "acd4b5abe2739c09c1a02524eceda46ff93fd385" + "oid": "f5b4741312b5c42a79f6c8a1d3930b79db38ed8f" } }, { "commit": { "author": { "user": { - "login": "cccclai" + "login": "ezyang" }, - "email": "chenlai@fb.com", - "name": "Chen Lai" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "04179f533283132fa334a9f91a070b1712f7323d" + "oid": "23d50391bb0fd12111fd3171591c4235ffb2fc1a" } }, { "commit": { "author": { "user": { - "login": "zaxtax" + "login": "ezyang" }, - "email": "rob@zinkov.com", - "name": "Rob Zinkov" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "5097cdcd6994ad82b3cec942b70e75dbeaee8ca4" + "oid": "bac9d45422d58f513b60b4b854441cfdc253d4c5" } }, { @@ -6130,7 +10195,7 @@ "email": "ezyang@fb.com", "name": "Edward Z. Yang" }, - "oid": "5015ecb5a2b86943f457d71f5a977444dd062732" + "oid": "13240ae0b4a0332c3167b65ac026a3172da90cb7" } }, { @@ -6142,171 +10207,125 @@ "email": "ezyang@fb.com", "name": "Edward Z. 
Yang" }, - "oid": "1c42b7789d3966cd541b08fce359b9738fee69f6" + "oid": "1ee34468cb1db3dc6cbae204669f4fec20e2a466" } }, { "commit": { "author": { "user": { - "login": "albanD" + "login": "ezyang" }, - "email": "albandes@fb.com", - "name": "Alban Desmaison" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "893ac3d334fd3e85e22423a06fe986ce453fe304" + "oid": "561d132bc686d00e8911f7feb3da5901b2bdc574" } }, { "commit": { "author": { "user": { - "login": "emcastillo" + "login": "ngimel" }, - "email": "ecastill@preferred.jp", - "name": "Emilio Castillo" + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" }, - "oid": "aa5d1b6b031ee2b8bb85f793a842ac1327ae4a19" + "oid": "ac42bedc84b7c96256376ad09917263bb020b2c3" } }, { "commit": { "author": { "user": { - "login": "dzdang" + "login": "ngimel" }, - "email": "dzdang@umich.edu", - "name": "dzdang" + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" }, - "oid": "0707a1d00f33d7098f56de339cb30436e8c2ea44" + "oid": "7f7d5ba40a0b5e10526d90b018b30b54673d12d8" } }, { "commit": { "author": { - "user": { - "login": "NivekT" - }, - "email": "ktse@fb.com", - "name": "Kevin Tse" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "ccb082d42af99f6374183cf914cc712bac585f0f" + "oid": "37a6b4a8b1adb712d5777c7c3479866c27fb3c4e" } }, { "commit": { "author": { "user": { - "login": "ryandaryl" + "login": "ngimel" }, - "email": "ryandarylmills@gmail.com", - "name": "ryandaryl" + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" }, - "oid": "4f2909cc8747808786a1871b0a6825cc4566f48c" + "oid": "65b613868c44e519c1777af79b9fd3498c5a7e58" } }, { "commit": { "author": { "user": { - "login": "clee2000" + "login": "ngimel" }, - "email": "csl@fb.com", - "name": "Catherine Lee" + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" }, - "oid": "f764010648a29223d9ed4b955073d9d2fb1b2f43" + "oid": "442c405e9da0d66744ef03e379224c41eedf5b57" } }, { "commit": { "author": { - "user": { - "login": "malfet" - }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "5696e8357cf38f852ef3d680381513e26f202371" + "oid": "031ac49ae9c192989385986b6707fa781e3229e0" } - } - ], - "pageInfo": { - "endCursor": "MTMx", - "hasNextPage": false - } - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=76123 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": true, - "isCrossRepository": true, - "author": { - "login": "kumpera" - }, - "title": "Introduce distributed checkpoint with ShardedTensor.", - "body": "Co-authored-by: Wen Zhang \r\nCo-authored-by: Yifu Wang \r\n\r\n", - "headRefName": "st_checkpoint", - "headRepository": { - "nameWithOwner": "kumpera/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ + }, { "commit": { "author": { - "user": { - "login": "kumpera" - }, - "email": "kumpera@fb.com", - "name": "Rodrigo Kumpera" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "6bf248bc20a71f248064b795f38276326fe43aae" + "oid": "9a6c3b00039c0c985c1c9cb59490012d1c0b38ba" } }, { "commit": { "author": { - "user": { - "login": "kumpera" - }, - "email": "kumpera@fb.com", - "name": "Rodrigo Kumpera" + "user": null, + 
"email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "10f84fb90bf02d7062e565ebf2c1da6352b64db7" + "oid": "d5c30e408af1889b90012d2e09f6ec3cda333bcb" } }, { "commit": { "author": { - "user": { - "login": "kumpera" - }, - "email": "kumpera@fb.com", - "name": "Rodrigo Kumpera" + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" }, - "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" + "oid": "db355d55655bb252a699cd532441bb98e52b98d5" } } ], "pageInfo": { - "endCursor": "Mw", + "endCursor": "MjY", "hasNextPage": false }, - "totalCount": 3 + "totalCount": 26 }, "commits": { "nodes": [ @@ -6327,379 +10346,146 @@ "name": "Facebook CLA Check", "conclusion": "SUCCESS", "detailsUrl": "https://code.intern.facebook.com/cla/" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS2l4=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755666" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSmtI=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" - } - }, - "checkRuns": { - "nodes": [ + }, { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234164?check_suite_focus=true" + "name": "Meta Internal-Only Changes Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://opensource.facebook.com/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2r3Q=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6ux14=", "hasNextPage": false } }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755785" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454954" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0k=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2o=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "Lint" - } + "name": "Netlify", + "databaseId": 13473 }, - "checkRuns": { - "nodes": [ - { - "name": "quick-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234165?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234428?check_suite_focus=true" - }, - { - "name": "lintrunner", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234555?check_suite_focus=true" - }, - { - "name": "Test collect_env (with_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234642?check_suite_focus=true" - }, - { - "name": "Test collect_env (without_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234701?check_suite_focus=true" - }, - { - "name": "Test tools", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234761?check_suite_focus=true" - }, - { - "name": "workflow-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299234837?check_suite_focus=true" - } - ], + "workflowRun": null, + "checkRuns": { + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2shU=", + 
"endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755786" + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454956" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0o=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2w=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 + "name": "Azure Pipelines", + "databaseId": 9426 }, - "workflowRun": { - "workflow": { - "name": "pull" + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false } }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454965" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3U=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, "checkRuns": { - "nodes": [ - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299245858?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299245958?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246168?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246250?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246281?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246329?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246373?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246442?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246517?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246547?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246591?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246687?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / build", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/6299246843?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299246972?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247064?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247163?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.3-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247261?check_suite_focus=true" - }, - { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247380?check_suite_focus=true" - }, - { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247471?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299247519?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299305596?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299305656?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299307925?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299307961?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308001?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308035?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308082?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308120?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308169?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299308217?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299312986?check_suite_focus=true" - }, - { - 
"name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299313146?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299313195?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299313235?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299313977?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299314888?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299314937?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299332358?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299332420?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299332476?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299332526?check_suite_focus=true" - }, - { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299335580?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299375031?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299375079?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299377190?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299378010?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299378053?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/6299378105?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299378136?check_suite_focus=true" - }, + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454970" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3o=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454974" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC34=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454977" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC4E=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6299437798?check_suite_focus=true" + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622865/jobs/3270915028" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd5yuY=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-c8=", + "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755806" + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455322" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm14=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDNo=" }, { "node": { @@ -6714,80 +10500,51 @@ }, "checkRuns": { "nodes": [ + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915027" + }, { "name": "lintrunner", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468155?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915071" }, { - "name": "quick-checks", + "name": "Test tools", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468457?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915141" }, { "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468841?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915194" }, { "name": "Test collect_env (without_torch)", "conclusion": 
"SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468942?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915229" }, { - "name": "Test tools", + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469180?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915283" }, { "name": "workflow-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469314?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469473?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915321" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS3SE=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-zM=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363240" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNGg=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468138?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS1-o=", - "hasNextPage": false - } - }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363271" + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455334" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNIc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDOY=" }, { "node": { @@ -6803,1262 +10560,2489 @@ "checkRuns": { "nodes": [ { - "name": "linux-bionic-rocm5.1-py3.7 / build", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309468956?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927344" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-bionic-rocm5.0-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469237?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927442" }, { - "name": "linux-xenial-py3.7-clang7-asan / build", + "name": "linux-xenial-py3.7-clang7-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469475?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927507" }, { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309469750?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927567" }, { "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/6309470049?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309470368?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927674" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "name": "win-vs2019-cuda11.3-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309470787?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927727" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309471290?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927802" }, { - "name": "linux-xenial-py3-clang5-mobile-build / build", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309471585?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927853" }, { - "name": "linux-xenial-py3.7-clang7-onnx / build", + "name": "linux-xenial-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309471734?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927948" }, { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309472014?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927996" }, { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "name": "linux-xenial-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309472172?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928061" }, { "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309472411?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928116" }, { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309472715?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928198" }, { "name": "linux-xenial-py3.7-gcc5.4 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309473041?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928256" }, { "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309473226?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/6309473414?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928291" }, { "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309473700?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928317" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309473992?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.3-py3 / build", + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309474162?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928338" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309647069?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928367" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309647413?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928410" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309647538?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928445" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657055?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991071" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657196?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991125" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657332?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991162" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657575?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991195" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657726?check_suite_focus=true" + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991233" 
}, { "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309657858?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991261" }, { "name": "linux-docs / build-docs (cpp)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309658314?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991305" }, { "name": "linux-docs / build-docs (python)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309658433?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991349" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309665388?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996024" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309665513?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996068" }, { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309665597?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996092" }, { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996505" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270998987" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309665697?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270999027" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309672367?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271006886" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309672499?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271006941" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309696458?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018097" }, { - "name": 
"linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309696554?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018135" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309696638?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018162" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309696725?check_suite_focus=true" + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271021143" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309712838?check_suite_focus=true" + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271034041" }, { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309767601?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271034072" }, { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309767717?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271048218" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309792321?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049553" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309792407?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049587" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309792546?check_suite_focus=true" + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049616" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/6309792639?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271068293" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309792972?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271068336" }, { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271149276" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6309939578?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271149321" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgaCXo=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6jVK8=", "hasNextPage": true } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363300" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455360" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNKQ=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDQA=" } ], "pageInfo": { "hasNextPage": false } }, - "pushedDate": "2022-05-05T00:34:26Z", - "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" + "status": null, + "pushedDate": "2022-04-25T02:30:31Z", + "oid": "db355d55655bb252a699cd532441bb98e52b98d5" } } ] }, - "changedFiles": 11, + "changedFiles": 5, "files": { "nodes": [ { - "path": "test/distributed/_shard/checkpoint/test_checkpoint.py" + "path": "test/test_ops.py" + }, + { + "path": "torch/_prims/__init__.py" + }, + { + "path": "torch/_prims/utils.py" + }, + { + "path": "torch/_refs/__init__.py" + }, + { + "path": "torch/testing/_internal/common_methods_invocations.py" + } + ], + "pageInfo": { + "endCursor": "NQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/distributed/_shard/checkpoint/test_file_system_checkpoint.py" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/distributed/_shard/sharded_tensor/test_sharded_tensor.py" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/__init__.py" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/filesystem.py" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/metadata.py" + "author": { 
+ "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/resharding.py" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/state_dict_loader.py" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/state_dict_saver.py" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/checkpoint/storage.py" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py" - } - ], - "pageInfo": { - "endCursor": "MTE", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" + }, { "author": { - "login": "kumpera" + "login": "ngimel" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "lezcano" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "zou3519" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "peterbell10" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "lezcano" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "lezcano" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ngimel" }, "state": "COMMENTED" }, { "author": { - "login": "wanchaol" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "simpkins" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "zzzwen" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" 
}, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "simpkins" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "simpkins" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "pritamdamania87" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "pritamdamania87" + "login": "ngimel" }, "state": "COMMENTED" }, { "author": { - "login": "pritamdamania87" + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "wilson100hong" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "wilson100hong" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "wilson100hong" + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "xunnanxu" + "login": "mruberry" }, - "state": "DISMISSED" + "state": "COMMENTED" }, { "author": { - "login": "xunnanxu" + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" }, "state": "COMMENTED" }, { "author": { - "login": "xunnanxu" + "login": "ngimel" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "ezyang" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "mruberry" }, "state": "COMMENTED" - }, + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0wNlQxMjo1NjoyNC0wNzowMLkyMDIyLTA0LTA2VDA4OjQwOjM4LTA3OjAwzjenO6Y=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { + "bodyText": "Ref implementations by themselves can handle any shapes (and broadcast ops by themselves don't bake in any shapes). 
The question is can we decide if a particular trace is applicable for a different input, but that depends on the tracing technology and what we are caching on, so out of scope for initial PR.", "author": { - "login": "kumpera" + "login": "ngimel" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1105643418 }, { + "bodyText": "@pytorchbot merge this please", "author": { - "login": "kumpera" + "login": "mruberry" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1108072887 }, { + "bodyText": "Merge failed due to 'mruberry'\nRaised by https://github.com/pytorch/pytorch/actions/runs/2218044244", "author": { - "login": "kumpera" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1108073536 }, { + "bodyText": "@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", "author": { - "login": "kumpera" + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1108075965 }, { + "bodyText": "Hey @mruberry.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "kumpera" + "login": "github-actions" }, - "state": "COMMENTED" + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1108351107 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQebHmg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" + } + }, + { + "node": { + "name": "topic: not user facing" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "node": { + "name": "module: primTorch" + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=77700 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "kit1980" + }, + "title": "Move pull linux-docs job to Ubuntu 20.04", + "body": "", + "headRefName": "sdym/pull-xenial-focal-linux-docs", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "kit1980" + }, + "email": "sdym@fb.com", + "name": "Sergii Dymchenko" + }, + "oid": "81261599614423baa17df72300b8e109677b6799" + } + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", 
+ "detailsUrl": "https://code.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNmNqE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147714" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuMI=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147726" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuM4=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147733" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuNU=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147746" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuOI=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147762" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuPI=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147780" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuQQ=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528127876" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128023" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128196" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128519" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128575" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128663" + }, + { + "name": "Test tools", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128857" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYVY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148336" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867843/jobs/3528127882" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdXEg=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148344" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzg=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "docker-builds" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "docker-build (pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528127883" + }, + { + "name": "docker-build (pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528127945" + }, + { + "name": "docker-build (pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128001" + }, + { + "name": "docker-build (pytorch-linux-bionic-py3.7-clang9)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128067" + }, + { + "name": "docker-build (pytorch-linux-bionic-rocm5.0-py3.7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128124" + }, + { + "name": "docker-build (pytorch-linux-bionic-rocm5.1-py3.7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128191" + }, + { + "name": "docker-build (pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128259" + }, + { + "name": "docker-build (pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128321" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang5-android-ndk-r19c)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128365" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang5-asan)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128446" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang7-asan)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128507" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang7-onnx)", + "conclusion": "SUCCESS", 
+ "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128563" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3.7-gcc5.4)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128639" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3.7-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128687" + }, + { + "name": "docker-build (pytorch-linux-focal-py3.7-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128741" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYLI=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148352" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu0A=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528150762" + }, + { + "name": "linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528150903" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151086" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151258" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151511" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151776" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151896" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152014" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152139" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152216" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152378" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152516" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152599" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / 
build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152723" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152802" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152913" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152969" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153005" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153062" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153125" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153207" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528242483" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528242528" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245875" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245914" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245964" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528246008" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528248520" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528255086" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528255128" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + 
"detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274064" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274097" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274133" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274173" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274209" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528277014" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528308958" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309747" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309810" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309837" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309864" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309895" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309925" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528310044" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528310101" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384337" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384379" + 
}, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384408" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384441" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384471" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNi1Nc=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148369" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu1E=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-05-19T00:02:11Z", + "oid": "81261599614423baa17df72300b8e109677b6799" + } + } + ] + }, + "changedFiles": 3, + "files": { + "nodes": [ + { + "path": ".circleci/docker/build.sh" }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "path": ".circleci/docker/common/install_katex.sh" }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" - }, + "path": ".github/workflows/pull.yml" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { "author": { - "login": "kumpera" + "login": "suo" }, "state": "COMMENTED" }, { "author": { - "login": "kumpera" + "login": "kit1980" }, "state": "COMMENTED" }, { "author": { - "login": "xunnanxu" + "login": "janeyx99" }, - "state": "COMMENTED" - }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNS0xOFQxMjo0MTowNS0wNzowMLkyMDIyLTA1LTE4VDEyOjQxOjA0LTA3OjAwzjpD7es=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/77700\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 8126159 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", "author": { - "login": "xunnanxu" + "login": "facebook-github-bot" }, - "state": "COMMENTED" - }, - { - "author": { - "login": "xunnanxu" + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "databaseId": 1129400934 }, { + "bodyText": "@pytorchbot merge", "author": { - "login": "kumpera" + "login": "kit1980" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131884232 }, { + "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) linux-docs / build-docs (cpp), linux-docs / build-docs (python) are pending/not yet run for rule OSS CI\nRaised by https://github.com/pytorch/pytorch/actions/runs/2353067846", "author": { - "login": "kumpera" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131886153 }, { + "bodyText": "@pytorchbot merge -f", "author": { - "login": "kumpera" + "login": "kit1980" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131945610 }, { + "bodyText": "Hey @kit1980.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "kumpera" + "login": "github-actions" }, - "state": "COMMENTED" - }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1131947473 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQ1FKZg==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "node": { + "name": "Merged" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" - }, + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAYNi1Nc= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAYduu0A= name=pytorch number=77700 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" - }, + "commit": { + "oid": "81261599614423baa17df72300b8e109677b6799", + "checkSuites": { + "nodes": [ + { + "checkRuns": { + "nodes": [ + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384494" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528477548" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528477578" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528728152" + 
}, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528728187" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNqJcE=", + "hasNextPage": false + } + } + } + ] + } + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=68111 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "chunyuan-w" + }, + "title": "Add JIT graph fuser for oneDNN Graph API (Preview4)", + "body": "## Description\r\nPreview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).\r\n\r\nOn the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:\r\n\r\n- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used\r\n- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.\r\n\r\n### User API:\r\nThe optimization pass is disabled by default. Users could enable it by:\r\n```\r\ntorch.jit.enable_onednn_fusion(True)\r\n```\r\n\r\n### Performance:\r\n[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:\r\n- SkyLake 8180 (1 socket of 28 cores):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)\r\n\r\n- SkyLake 8180 (single thread):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)\r\n \\* By mapping hardswish to oneDNN Graph, it\u2019s 8% faster than PyTorch JIT (NNC + OFI)\r\n \\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops\r\n\r\n\r\n### Directory structure of the integration code\r\nFuser-related code are placed under:\r\n```\r\ntorch/csrc/jit/codegen/onednn/\r\n```\r\n\r\nOptimization pass registration is done in:\r\n```\r\ntorch/csrc/jit/passes/onednn_graph_fuser.h\r\n```\r\n\r\nCMake for the integration code is:\r\n```\r\ncaffe2/CMakeLists.txt\r\n```\r\n\r\n## Limitations\r\n\r\n- In this PR, we have only supported the optimization on Linux platform. 
The support on Windows and MacOS will be enabled as the next step.\r\n- We have only optimized the inference use case.", + "headRefName": "chunyuan/llga_preview2", + "headRepository": { + "nameWithOwner": "chunyuan-w/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "0096fcc49f277fd8e006fcb42e0cb28a1422ec98" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "7bcc4de26a5472f1d252735dd425b46794b0844f" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "3a2a588bfe6bbf9bf74d88d441cd22affda207da" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "ca7df12fbfaa3ddbabeca39b76300d17f4a33f2f" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "81d44f35b8bc043c38837d0694e5bc072203b832" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "14fd5d1bfc2c58a71379f778871e3fca0a8e79b2" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "954dc23663125897f4b199eb2a8607dc5fca3274" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9f77a0b476accc678b6f0569e4ff33fa6bbe97fc" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "fbf3b23bc1288697e1aec539a7c4ee3dc0bcb84c" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "f8b8e78f786586c3cdf3966fd83ffa124d3eda70" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6fffa2f7453ee7e0f8d8e2f73ea8a65230539589" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "849385404e6f3cd1cf7cef19f931ecf4fa28afdb" + } }, { - "author": { - 
"login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "adbae7b77f8c0dbc59fccf15207d97ba86cfade2" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6dcf2a4981aff24fa16fc7461ae4ec29690f956f" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "54f3e05ad524cffd0911ee93be3c50f589b51f58" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "edbfc640ea79a0af85757d9e73796dcc90231519" + } }, { - "author": { - "login": "pritamdamania87" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "67654db7cba562809d1b4a44cdda58af5cc9daaf" + } }, { - "author": { - "login": "pritamdamania87" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9c9d99b930b11af9ff03f52d45bf49c652df758d" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ffb25119cd9ce815cc4d9d14a2317fcbbfa9ea86" + } }, { - "author": { - "login": "pritamdamania87" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ab9eee84512ca1bdfbc81e25c6eb67b29d0f302a" + } }, { - "author": { - "login": "pritamdamania87" - }, - "state": "APPROVED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "62a4642cf3330524990a69ac29e002c97812320a" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ca9b1223be4af2c8b4929303d498eafd71793128" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "6f4a23d24514a02954d2ec792830085f612223c9" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b2a9a9c0926b02d0b2e87722ed61450f224a61d0" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e88b492be733f24b6aa395829c76add67d0901e7" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": 
"sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c44336d7a914952bfb78e012e08d9a6d6dde5937" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "5157930f7b3921d41a586260582b574c915f6ca1" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "04cb8353813f6bbd0d913a994923cc7e1e291406" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0yNVQxMzozNTowMS0wNTowMLkyMDIyLTA0LTI1VDEzOjM1OjAwLTA1OjAwzjjC2d0=", - "hasPreviousPage": true - } - }, - "comments": { - "nodes": [ - { - "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1118495479 + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "62991eaad0e638bb0bced327e03f932f66f68732" + } }, { - "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1118511287 + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "7496bf1588050191595d833d23b8972b2f22655e" + } }, { - "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1118662274 + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d9d35f23cca0cd29c78a845731b24826152dcf1c" + } }, { - "bodyText": "Merge failed due to Can't fetch all PR reviews Raised by https://github.com/pytorch/pytorch/actions/runs/2275691136\n\n@osalpekar @malfet This is failing because there are 109 review comments on this PR but we only fetch the first 100. 
This could be solved with a similar concept as how we fetch more comments/check_runs.", - "author": { - "login": "janeyx99" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1118689010 + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "f74ec134f18a65a7c72455bdf44f72e3ebb27105" + } }, { - "bodyText": "On a side note, has the test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_False_cpu_offload_CPUOffload failure on the distributed test first shard of this PR been addressed?", - "author": { - "login": "janeyx99" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1118693497 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQqri9w==", - "hasPreviousPage": true - } - }, - "labels": { - "edges": [ + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "eb32cc65a975361160948bfc3d6a577991ea262e" + } + }, { - "node": { - "name": "oncall: distributed" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c7665f8d695b680c54db0bad2b7b7df46d886b50" } }, { - "node": { - "name": "cla signed" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e6321ad8f59ea01130568c202d186448bb9cb9d0" } - } - ] - } - } - } - } - }, - "query_sha=6a8ce6412a780d5804bfe180ed1dc807269e1eae2ae50de2346d56d1283884bc cursor=Y3Vyc29yOnYyOpO5MjAyMi0wNC0yNVQxMzozNTowMS0wNTowMLkyMDIyLTA0LTI1VDEzOjM1OjAwLTA1OjAwzjjC2d0= name=pytorch number=76123 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "reviews": { - "nodes": [ + }, { - "author": { - "login": "pritamdamania87" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a72cd0d02693f45e5354a70654581ad514581ec7" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b3cd3028b4ed31805e82f7eaf02217ab74ca59b9" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "49a592d9788d08e6cd0593882f867e129057c1cc" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "0575766b2144b13f6a38227c4e2b8d22ec8db80f" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b5c9b10ff87d622350e8ca64fae3a476eb70d5aa" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "66bc652a30ccc329adb929870a4ac726bb98b38c" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": 
"sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "72b9ca9c8e2dac98cbb7199b3dfac7c7305b80c5" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a7892ed7373207d96406c8b5734a089643c5cdbd" + } }, { - "author": { - "login": "kumpera" - }, - "state": "COMMENTED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0yMlQyMjozNzo1NC0wNTowMLkyMDIyLTA0LTIyVDE4OjAyOjA5LTA1OjAwzjip7G8=", - "hasPreviousPage": false - } - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=71759 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": true, - "isCrossRepository": true, - "author": { - "login": "coolteemf" - }, - "title": "Optimize grid sample 3d", - "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. 
Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", - "headRefName": "optimize_grid_sample_3d", - "headRepository": { - "nameWithOwner": "coolteemf/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d54cb084e1daad8a08c3f8de0ad3f7afb5b05ac1" + } + }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" }, - "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e" + "oid": "aef71d692a8a159e0ca56be363e2cc1225ce7647" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "563ec73747ad53b63b36736c47c4342f962c2a09" + "oid": "bf618e205ec31cff962dcc8ab478e0a699a9572d" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8" + "oid": "e4a331f1088448f7d7d86256ce71e0e71da006b0" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8" + "oid": "0b743523d1430fec759d5fefbb687f17c89335a5" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8" + "oid": "e80a351a62d98b810ec8985c4b25257af1d6c5bb" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a" + "oid": "c189eca154b6691919d0e21489d1c322c7435c0b" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" }, - "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af" + "oid": "e080a067c75d7b888a8a362682a2d5ba70e0c3a8" 
} }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" }, - "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470" + "oid": "028561fbf8f3ed90e074e6e0e3a4ca4dd7ffa2a8" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f" + "oid": "d550cf14037badd4caa2f52202e2f20bc4db8432" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5" + "oid": "574159ebadd1dec24daaf883879ffeca8d9e71b7" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309" + "oid": "9eb3ee98ea756067ed1c8f52f309f6d3e211a904" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69" + "oid": "29929f48be03dcdd1bbfade572de7feafa825547" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48" + "oid": "8a7358ca8da547b40ea1a99ddc57ebed19959684" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "f683e8aec7aea76097a264eec01511e704c31154" + "oid": "6606637d2c5525b43e294a8b366a85052e1be0c6" } }, { "commit": { "author": { "user": { - "login": "coolteemf" + "login": "sanchitintel" }, - "email": "67541941+coolteemf@users.noreply.github.com", - "name": "Fran\u00e7ois Lecomte" + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "b932e9e286c22aaf352375186df851ef060b295a" + "oid": "5ecfd1f28b87045deb8bc8ffe33b3d8b906f3264" } }, { "commit": { "author": { - "user": null, - "email": "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", - "name": "coolteemf" + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" }, - "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + "oid": "be2d4345c65442c4cfbe8afdfb2ae0893945da42" + } + }, + { + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b5b89d3644a43e2dbda841cafb71b32edbe07c8a" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nikita.shulga@gmail.com", + "name": "Nikita Shulga" + }, + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" } } ], "pageInfo": { - "endCursor": "MTY", + "endCursor": "NjI", "hasNextPage": false }, - "totalCount": 16 + 
"totalCount": 62 }, "commits": { "nodes": [ @@ -8076,88 +13060,25 @@ "checkRuns": { "nodes": [ { - "name": "Facebook CLA Check", - "conclusion": "SUCCESS", - "detailsUrl": "https://code.intern.facebook.com/cla/" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGYqY=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801320" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_T6g=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-clang7-onnx" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020089?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302165846?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302165949?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIob0=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801849" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubk=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3-clang5-mobile-build" - } - }, - "checkRuns": { - "nodes": [ + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + }, { - "name": "build", + "name": "Meta Internal-Only Changes Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302019921?check_suite_focus=true" + "detailsUrl": "https://opensource.facebook.com/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1E=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NXnc=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801852" + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625010" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYwzI=" }, { "node": { @@ -8167,41 +13088,81 @@ }, "workflowRun": { "workflow": { - "name": "linux-bionic-rocm4.5-py3.7" + "name": "Lint" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "clang-format", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302019934?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895825" }, { - "name": "test (default, 2, 2, linux.rocm.gpu)", + "name": "py2-setup-validate-errormsg", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302431993?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895911" }, { - "name": "test (default, 1, 2, linux.rocm.gpu)", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302432078?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895963" }, { - "name": "test (distributed, 1, 1, linux.rocm.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302432150?check_suite_focus=true" + "name": "shellcheck", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896134" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896253" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896371" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896525" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896658" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896771" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896795" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896838" + }, + { + "name": "mypy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896897" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwMsZY=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NZqw=", "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801853" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625458" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPI=" }, { "node": { @@ -8211,41 +13172,26 @@ }, "workflowRun": { "workflow": { - "name": "win-vs2019-cuda11.3-py3" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302019928?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5303266925?check_suite_focus=true" - }, - { - "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5303267017?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5303267128?check_suite_focus=true" + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440031/jobs/2903895828" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwZbzg=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NYIw=", "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801855" + "conclusion": "SKIPPED", + 
"url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625463" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPc=" }, { "node": { @@ -8255,485 +13201,1024 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "pull" } }, "checkRuns": { "nodes": [ { - "name": "mypy", + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896014" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302019930?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896165" }, { - "name": "shellcheck", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896394" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020111?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896572" }, { - "name": "py2-setup-validate-errormsg", + "name": "linux-xenial-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020318?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896666" }, { - "name": "clang-format", + "name": "linux-xenial-py3.7-clang7-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020421?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896778" }, { - "name": "cmakelint", + "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020539?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896837" }, { - "name": "toc", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020668?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896896" }, { - "name": "quick-checks", + "name": "linux-xenial-py3.7-gcc5.4 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020780?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896936" }, { - "name": "clang-tidy", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020970?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897025" }, { - "name": "flake8-py3", + "name": "linux-xenial-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302021124?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGbAQ=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801856" - }, - "cursor": 
"Y3Vyc29yOnYyOpHPAAAAAUK_UcA=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-clang7-asan" - } - }, - "checkRuns": { - "nodes": [ + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897161" + }, { - "name": "build", + "name": "linux-xenial-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020084?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897213" }, { - "name": "test (default, 3, 3, linux.2xlarge)", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302192846?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897280" }, { - "name": "test (default, 1, 3, linux.2xlarge)", + "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302192926?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897368" }, { - "name": "test (default, 2, 3, linux.2xlarge)", + "name": "win-vs2019-cuda11.3-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302193029?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwJC4U=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801857" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcE=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" - } - }, - "checkRuns": { - "nodes": [ + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897431" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897476" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897578" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897630" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897699" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897733" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327787" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327838" + }, + { + "name": 
"linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327956" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327997" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328035" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328093" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328131" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328177" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904333962" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904334006" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430419" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430459" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430508" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430573" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443663" + }, { - "name": "build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020092?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ_w=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801862" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcY=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc5.4" - } - }, - "checkRuns": { - "nodes": [ + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443723" + }, { - 
"name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302020048?check_suite_focus=true" + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443787" }, { - "name": "test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147216?check_suite_focus=true" + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904454239" }, { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147336?check_suite_focus=true" + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904454303" }, { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147409?check_suite_focus=true" + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904554602" }, { - "name": "test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147493?check_suite_focus=true" + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904554698" }, { - "name": "test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147622?check_suite_focus=true" + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588855" }, { - "name": "test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302147822?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIWu4=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801866" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uco=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" - } - }, - "checkRuns": { - "nodes": [ + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588886" + }, { - "name": "build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5302019929?check_suite_focus=true" + "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588924" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, 
linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904655702" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656104" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656150" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656192" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904706520" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904706565" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1k=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_fN1g=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801869" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625483" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uc0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxQs=" } ], "pageInfo": { - "hasNextPage": true + "hasNextPage": false } }, - "pushedDate": "2022-02-23T10:39:30Z", - "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048428?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048429?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048431?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048430?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-03-21T19:58:52Z", + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" } } ] }, - "changedFiles": 9, - "files": { + "changedFiles": 37, + "files": { + "nodes": [ + { + "path": "aten/src/ATen/core/interned_strings.h" + }, + { + "path": "caffe2/CMakeLists.txt" + }, + { + "path": "cmake/Dependencies.cmake" + }, + { + "path": "cmake/Modules/FindMKLDNN.cmake" + }, + { + "path": "cmake/public/mkldnn.cmake" + }, + { + "path": "docs/source/jit.rst" + }, + { + "path": 
"test/test_jit_llga_fuser.py" + }, + { + "path": "torch/_C/__init__.pyi.in" + }, + { + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/README.md" + }, + { + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_helper.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_helper.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/graph_rewriter.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/guard_shape.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/guard_shape.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/interface.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/interface.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/kernel.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/kernel.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/operator.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.cpp" + }, + { + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.h" + }, + { + "path": "torch/csrc/jit/codegen/onednn/register_interface.cpp" + }, + { + "path": "torch/csrc/jit/ir/alias_analysis.cpp" + }, + { + "path": "torch/csrc/jit/ir/ir.cpp" + }, + { + "path": "torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp" + }, + { + "path": "torch/csrc/jit/passes/onednn_graph_fuser.h" + }, + { + "path": "torch/csrc/jit/python/init.cpp" + }, + { + "path": "torch/csrc/jit/runtime/operator.cpp" + }, + { + "path": "torch/jit/__init__.py" + } + ], + "pageInfo": { + "endCursor": "Mzc", + "hasNextPage": false + } + }, + "reviews": { "nodes": [ { - "path": "aten/src/ATen/native/GridSampler.cpp" + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp" + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "chunyuan-w" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": 
"sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "wukong1992" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "eellison" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/cuda/GridSampler.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/cuda/GridSampler.cu" + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/cuda/GridSampler.h" + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "aten/src/ATen/native/native_functions.yaml" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "test/test_nn.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "tools/autograd/derivatives.yaml" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" } ], "pageInfo": { - "endCursor": "OQ", - "hasNextPage": false + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMS0xMi0xMFQwOToyNDoxOS0wODowMLkyMDIxLTEyLTEwVDA5OjI0OjE5LTA4OjAwzjFryLE=", + "hasPreviousPage": false } }, - "reviews": { + "comments": { "nodes": [ { + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. I am reverting.", "author": { - "login": "albanD" + "login": "suo" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498483 }, { + "bodyText": "@pytorchbot revert this", "author": { - "login": "coolteemf" + "login": "suo" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498550 }, { + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. 
I am reverting.\n\nOops! Will fix it ASAP.", "author": { - "login": "albanD" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1074499668 }, { + "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", "author": { - "login": "coolteemf" + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074508608 }, { + "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", "author": { - "login": "albanD" + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1082508130 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQAuLsw==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "oncall: jit" + } }, { - "author": { - "login": "coolteemf" - }, - "state": "COMMENTED" + "node": { + "name": "triaged" + } + }, + { + "node": { + "name": "open source" + } + }, + { + "node": { + "name": "cla signed" + } + }, + { + "node": { + "name": "Reverted" + } }, { + "node": { + "name": "intel priority" + } + } + ] + } + } + } + } + }, + "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOQAuLsw== name=pytorch number=68111 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/chunyuan-w/pytorch/blob/7496bf1588050191595d833d23b8972b2f22655e/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 
triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries/conda\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-manywheel\nciflow/binaries, ciflow/binaries/wheel\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, 
ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.1-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.1-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\n\n\nYou can add a comment to the PR and tag @pytorchbot with the following commands:\n\n# ciflow rerun, \"ciflow/default\" will always be added automatically\n@pytorchbot ciflow rerun\n\n# ciflow rerun with additional labels \"-l \", which is equivalent to adding these labels manually and trigger the rerun\n@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow\n\nFor more information, please take a look at the CI Flow Wiki.", "author": { - "login": "coolteemf" + "login": "pytorch-probot" }, - "state": "COMMENTED" + "authorAssociation": "NONE", + "editor": { + "login": "pytorch-probot" + }, + "databaseId": 964902865 }, { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/68111\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 7388141 (more details on the Dr. 
CI page):\n\n\n29/29 failures introduced in this PR\n\n\n\ud83d\udd75\ufe0f 29 new failures recognized by patterns\nThe following CI failures do not appear to be due to upstream breakages:\n pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:31:38.6978776Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:31:38.3001628Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:31:38.5169168Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:31:38.5362923Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:31:38.5413452Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:31:38.5458747Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:31:38.5484014Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:31:38.5497924Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:31:38.5656491Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:31:38.5678893Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:31:38.6888479Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f6488c20adb4dca4\n2022-03-21T21:31:38.6978776Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:31:38.6992648Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:31:38.7003010Z ##[error]Process completed with exit code 2.\n2022-03-21T21:31:38.7044027Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:31:38.7044261Z with:\n2022-03-21T21:31:38.7044413Z env:\n2022-03-21T21:31:38.7044565Z IN_CI: 1\n2022-03-21T21:31:38.7044709Z IS_GHA: 1\n2022-03-21T21:31:38.7044885Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:31:38.7045067Z ##[endgroup]\n2022-03-21T21:31:38.7060958Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (2/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:35:19.2635222Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:35:18.9028722Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:35:19.1132721Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:35:19.1310590Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:35:19.1360251Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:35:19.1386865Z Requirement already satisfied: 
botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:35:19.1429182Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:35:19.1441925Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:35:19.1468280Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:35:19.1617667Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:35:19.2545368Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-098be2985e0392130\n2022-03-21T21:35:19.2635222Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:35:19.2648463Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:35:19.2658727Z ##[error]Process completed with exit code 2.\n2022-03-21T21:35:19.2706355Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:35:19.2706591Z with:\n2022-03-21T21:35:19.2706748Z env:\n2022-03-21T21:35:19.2706908Z IN_CI: 1\n2022-03-21T21:35:19.2707061Z IS_GHA: 1\n2022-03-21T21:35:19.2707246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:35:19.2707438Z ##[endgroup]\n2022-03-21T21:35:19.2724554Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (3/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:11:52.7662022Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T23:11:53.1213298Z ---------------------------------------- 8.1/8.1 MB 23.6 MB/s eta 0:00:00\n2022-03-21T23:11:53.1644665Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:11:53.2218699Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T23:11:53.2389674Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T23:11:53.2787295Z -------------------------------------- 247.7/247.7 KB 7.4 MB/s eta 0:00:00\n2022-03-21T23:11:53.3761842Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:11:53.5457622Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T23:11:57.4175080Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T23:11:57.5296815Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0105d4db093574f40\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:11:57.5564814Z + 
GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:11:57.5587712Z ##[error]Process completed with exit code 2.\n2022-03-21T23:11:57.5790311Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T23:11:57.5790832Z with:\n2022-03-21T23:11:57.5791104Z env:\n2022-03-21T23:11:57.5791358Z IN_CI: 1\n2022-03-21T23:11:57.5791620Z IS_GHA: 1\n2022-03-21T23:11:57.5791939Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:11:57.5792425Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T23:11:57.5792884Z ##[endgroup]\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu) (4/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T02:17:12.6257577Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T02:17:11.9280556Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T02:17:11.9335199Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:11.9682045Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T02:17:11.9850357Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0403171Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T02:17:12.0468875Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0590000Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T02:17:12.0607093Z Installing collected packages: jmespath, urllib3, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T02:17:12.5273459Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T02:17:12.6032812Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-114\n2022-03-22T02:17:12.6257577Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T02:17:12.6259543Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T02:17:12.6291924Z ##[error]Process completed with exit code 2.\n2022-03-22T02:17:12.6387977Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T02:17:12.6388298Z with:\n2022-03-22T02:17:12.6388521Z wait-ssh: false\n2022-03-22T02:17:12.6388727Z env:\n2022-03-22T02:17:12.6388932Z IN_CI: 1\n2022-03-22T02:17:12.6389143Z IS_GHA: 1\n2022-03-22T02:17:12.6389368Z GIT_DEFAULT_BRANCH: master\n2022-03-22T02:17:12.6389669Z DOCKER_HOST: unix:///run/user/1121/docker.sock\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge) (5/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:19:24.4890693Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:19:24.0962005Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:19:24.3152253Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:19:24.3341183Z Requirement already satisfied: boto3==1.19.12 
in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:19:24.3391374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:19:24.3436392Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:19:24.3448982Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:19:24.3474092Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:19:24.3502003Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:19:24.3655072Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:19:24.4799309Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0bc9250521f338cae\n2022-03-21T22:19:24.4890693Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:19:24.4903625Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:19:24.4913841Z ##[error]Process completed with exit code 2.\n2022-03-21T22:19:24.4957338Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:19:24.4957575Z with:\n2022-03-21T22:19:24.4957735Z env:\n2022-03-21T22:19:24.4957900Z IN_CI: 1\n2022-03-21T22:19:24.4958055Z IS_GHA: 1\n2022-03-21T22:19:24.4958246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:19:24.4958437Z ##[endgroup]\n2022-03-21T22:19:24.4989649Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (6/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T01:05:07.6983899Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T01:05:06.8364546Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T01:05:06.8431763Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.8949391Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T01:05:06.9180079Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.9803351Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T01:05:06.9882133Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:07.0067062Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T01:05:07.0088676Z Installing collected packages: urllib3, jmespath, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T01:05:07.5819667Z 
Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T01:05:07.6774717Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-60\n2022-03-22T01:05:07.6983899Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T01:05:07.6988652Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T01:05:07.7023073Z ##[error]Process completed with exit code 2.\n2022-03-22T01:05:07.7102087Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T01:05:07.7102389Z with:\n2022-03-22T01:05:07.7102603Z wait-ssh: false\n2022-03-22T01:05:07.7102820Z env:\n2022-03-22T01:05:07.7103015Z IN_CI: 1\n2022-03-22T01:05:07.7103224Z IS_GHA: 1\n2022-03-22T01:05:07.7103458Z GIT_DEFAULT_BRANCH: master\n2022-03-22T01:05:07.7103737Z DOCKER_HOST: unix:///run/user/1502/docker.sock\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge) (7/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:51:39.3637996Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:51:39.2041249Z Attempting uninstall: s3transfer\n2022-03-21T20:51:39.2043010Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:51:39.2083799Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:51:39.2089675Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:51:39.2480546Z Attempting uninstall: boto3\n2022-03-21T20:51:39.2482953Z Found existing installation: boto3 1.16.34\n2022-03-21T20:51:39.2584292Z Uninstalling boto3-1.16.34:\n2022-03-21T20:51:39.2599474Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:51:39.3130921Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:51:39.3550598Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03ef7efc3078e3da5\n2022-03-21T20:51:39.3637996Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:51:39.3650651Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:51:39.3660484Z ##[error]Process completed with exit code 2.\n2022-03-21T20:51:39.3696465Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:51:39.3696693Z with:\n2022-03-21T20:51:39.3696850Z env:\n2022-03-21T20:51:39.3697012Z IN_CI: 1\n2022-03-21T20:51:39.3697161Z IS_GHA: 1\n2022-03-21T20:51:39.3697342Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:51:39.3697528Z ##[endgroup]\n2022-03-21T20:51:39.3730420Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge) (8/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:36.3916860Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:03:36.0096309Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:03:36.2278560Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:03:36.2461618Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:03:36.2513260Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:03:36.2541524Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:03:36.2554899Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:03:36.2598277Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:03:36.2758299Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:03:36.2780690Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:03:36.3825021Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0a4a552890e6ef7d3\n2022-03-21T21:03:36.3916860Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:03:36.3930343Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:03:36.3941263Z ##[error]Process completed with exit code 2.\n2022-03-21T21:03:36.3979258Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:03:36.3979496Z with:\n2022-03-21T21:03:36.3979654Z env:\n2022-03-21T21:03:36.3979814Z IN_CI: 1\n2022-03-21T21:03:36.3979968Z IS_GHA: 1\n2022-03-21T21:03:36.3980157Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:03:36.3980360Z ##[endgroup]\n2022-03-21T21:03:36.3996257Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (9/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:41:10.3015614Z Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)\n2022-03-22T00:41:10.3625659Z ---------------------------------------- 79.5/79.5 KB 1.1 MB/s eta 0:00:00\n2022-03-22T00:41:10.4120236Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-22T00:41:10.4170155Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-22T00:41:10.4722115Z -------------------------------------- 247.7/247.7 KB 5.2 MB/s eta 0:00:00\n2022-03-22T00:41:10.4843512Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:41:10.6596108Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:41:10.8733354Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-22T00:41:15.3745408Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-22T00:41:15.4987162Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-09cacc848abc3dd32\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:41:15.5373630Z + 
GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:41:15.5404353Z ##[error]Process completed with exit code 2.\n2022-03-22T00:41:15.5790508Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-22T00:41:15.5791192Z with:\n2022-03-22T00:41:15.5791530Z env:\n2022-03-22T00:41:15.5791849Z IN_CI: 1\n2022-03-22T00:41:15.5792186Z IS_GHA: 1\n2022-03-22T00:41:15.5792599Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:41:15.5793237Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-22T00:41:15.5793831Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge) (10/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:32.9799307Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:32.8167560Z Attempting uninstall: s3transfer\n2022-03-21T20:50:32.8169351Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:50:32.8213295Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:50:32.8219209Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:50:32.8602320Z Attempting uninstall: boto3\n2022-03-21T20:50:32.8603289Z Found existing installation: boto3 1.16.34\n2022-03-21T20:50:32.8704535Z Uninstalling boto3-1.16.34:\n2022-03-21T20:50:32.8719403Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:50:32.9244278Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:50:32.9710449Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0c568461a276d4a71\n2022-03-21T20:50:32.9799307Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:32.9812238Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:32.9823052Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:32.9859290Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:32.9859527Z with:\n2022-03-21T20:50:32.9859664Z env:\n2022-03-21T20:50:32.9859817Z IN_CI: 1\n2022-03-21T20:50:32.9859977Z IS_GHA: 1\n2022-03-21T20:50:32.9860144Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:32.9860327Z ##[endgroup]\n2022-03-21T20:50:32.9893642Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge) (11/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7163042Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.6660824Z #10 0x55fc8a3ea801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.6661768Z #11 0x55fc8a3f57a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.6662455Z #12 0x55fc8a3f580b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.6663570Z #13 0x55fc8a3f5908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.6663952Z #14 0x55fc8a3f5908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.6664431Z #15 0x55fc8a3f5908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.6665304Z #16 0x55fc8a3f5ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7162113Z #17 
0x7f940d00f83f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7162534Z #18 0x55fc8a39a554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7162711Z \n2022-03-21T21:05:00.7163042Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.7334595Z + retcode=1\n2022-03-21T21:05:00.7334954Z + set -e\n2022-03-21T21:05:00.7335215Z + return 1\n2022-03-21T21:05:00.7338688Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.7339232Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.7340113Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.7340612Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.7341187Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.7341668Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.7344466Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (12/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:06:03.4437430Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:06:03.0752199Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:06:03.2853252Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:06:03.3032326Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:06:03.3081589Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:06:03.3093911Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:06:03.3120244Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:06:03.3162406Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:06:03.3188431Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:06:03.3337181Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:06:03.4348072Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ee48c8811fafc444\n2022-03-21T22:06:03.4437430Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:06:03.4450920Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:06:03.4461263Z ##[error]Process completed with exit code 2.\n2022-03-21T22:06:03.4502346Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:06:03.4502576Z with:\n2022-03-21T22:06:03.4502730Z env:\n2022-03-21T22:06:03.4502888Z IN_CI: 1\n2022-03-21T22:06:03.4503038Z IS_GHA: 1\n2022-03-21T22:06:03.4503302Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:06:03.4503492Z 
##[endgroup]\n2022-03-21T22:06:03.4519156Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (13/29)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:13.2205634Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:12.8679322Z + python3 -m pip install boto3==1.19.12\n2022-03-21T20:50:13.0744228Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T20:50:13.0916284Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T20:50:13.0964264Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T20:50:13.1005656Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T20:50:13.1017299Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T20:50:13.1041042Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T20:50:13.1189450Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T20:50:13.1208751Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T20:50:13.2119445Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d02da60fd18c22f5\n2022-03-21T20:50:13.2205634Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:13.2217939Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:13.2220259Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:13.2248664Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:13.2249012Z with:\n2022-03-21T20:50:13.2249260Z env:\n2022-03-21T20:50:13.2249500Z IN_CI: 1\n2022-03-21T20:50:13.2249738Z IS_GHA: 1\n2022-03-21T20:50:13.2250025Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:13.2250329Z ##[endgroup]\n2022-03-21T20:50:13.2272735Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (14/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:47:38.0451999Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:47:37.5554508Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:47:37.8411473Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:47:37.8631484Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:47:37.8699561Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:47:37.8737037Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:47:37.8754443Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:47:37.8814393Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:47:37.8849540Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:47:37.9059579Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:47:38.0336298Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0b44f47f4292089a2\n2022-03-21T23:47:38.0451999Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:47:38.0469471Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:47:38.0484106Z ##[error]Process completed with exit code 2.\n2022-03-21T23:47:38.0532678Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:47:38.0533007Z with:\n2022-03-21T23:47:38.0533223Z env:\n2022-03-21T23:47:38.0533440Z IN_CI: 1\n2022-03-21T23:47:38.0533649Z IS_GHA: 1\n2022-03-21T23:47:38.0533902Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:47:38.0534170Z GPU_FLAG: --gpus all\n2022-03-21T23:47:38.0534401Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge) (15/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:04:59.3115800Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:04:59.2595213Z #10 0x55a7f39a4801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:04:59.2595707Z #11 0x55a7f39af7a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:04:59.2597203Z #12 0x55a7f39af80b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:04:59.2598205Z #13 0x55a7f39af908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:04:59.2598697Z #14 0x55a7f39af908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:04:59.2599178Z #15 0x55a7f39af908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:04:59.2599747Z #16 0x55a7f39afccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:04:59.3114751Z #17 0x7f3b3822383f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:04:59.3115277Z #18 0x55a7f3954554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:04:59.3115468Z \n2022-03-21T21:04:59.3115800Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:04:59.3292385Z + retcode=1\n2022-03-21T21:04:59.3292781Z + set -e\n2022-03-21T21:04:59.3293062Z + return 1\n2022-03-21T21:04:59.3295462Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:04:59.3295802Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X 
]]\n2022-03-21T21:04:59.3296394Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:04:59.3296700Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:04:59.3297055Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:04:59.3297416Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:04:59.3299623Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (16/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:14:25.5525714Z Collecting jmespath<1.0.0,>=0.7.1\n2022-03-21T22:14:25.5568155Z Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)\n2022-03-21T22:14:25.5952617Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:14:25.6169392Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:14:25.6629996Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:14:25.6710247Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:14:25.8284354Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:14:25.9816751Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:14:31.6672236Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:14:31.7630473Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ed0915ecee5d2424\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:14:31.7876742Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:14:31.7897140Z ##[error]Process completed with exit code 2.\n2022-03-21T22:14:31.8195621Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:14:31.8196110Z with:\n2022-03-21T22:14:31.8196356Z env:\n2022-03-21T22:14:31.8196614Z IN_CI: 1\n2022-03-21T22:14:31.8196876Z IS_GHA: 1\n2022-03-21T22:14:31.8197169Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:14:31.8197652Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:14:31.8198093Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (17/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:19:15.8845728Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:19:15.5116060Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:19:15.7231476Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:19:15.7409711Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:19:15.7458478Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) 
(0.10.0)\n2022-03-21T21:19:15.7470508Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:19:15.7496799Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:19:15.7538362Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:19:15.7566161Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:19:15.7711630Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:19:15.8753543Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0e2b3b4ddb246ff2a\n2022-03-21T21:19:15.8845728Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:19:15.8859814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:19:15.8870165Z ##[error]Process completed with exit code 2.\n2022-03-21T21:19:15.8917039Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:19:15.8917279Z with:\n2022-03-21T21:19:15.8917433Z env:\n2022-03-21T21:19:15.8917586Z IN_CI: 1\n2022-03-21T21:19:15.8917734Z IS_GHA: 1\n2022-03-21T21:19:15.8917917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:19:15.8918102Z ##[endgroup]\n2022-03-21T21:19:15.8934572Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (18/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:19:48.5900162Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:19:48.0742254Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:19:48.3742563Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:19:48.3976536Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:19:48.4048700Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:19:48.4065374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:19:48.4128076Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:19:48.4164273Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:19:48.4202610Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:19:48.4416723Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:19:48.5773033Z ++ python3 
.github/scripts/get_workflow_job_id.py 2018440039 i-07ab7a3c4a5402af2\n2022-03-21T23:19:48.5900162Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:19:48.5919822Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:19:48.5936087Z ##[error]Process completed with exit code 2.\n2022-03-21T23:19:48.6007930Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:19:48.6008268Z with:\n2022-03-21T23:19:48.6008483Z env:\n2022-03-21T23:19:48.6008701Z IN_CI: 1\n2022-03-21T23:19:48.6008920Z IS_GHA: 1\n2022-03-21T23:19:48.6009170Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:19:48.6009440Z GPU_FLAG: --gpus all\n2022-03-21T23:19:48.6009671Z ##[endgroup]\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (19/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:53:59.0889659Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T22:53:59.6881416Z ---------------------------------------- 8.1/8.1 MB 14.0 MB/s eta 0:00:00\n2022-03-21T22:53:59.7427779Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:53:59.7691882Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:53:59.7779847Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:53:59.8281663Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:54:00.0185115Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:54:00.2359770Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:54:04.1208891Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:54:04.2505862Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03b4fbe63be8ef4b0\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:54:04.2891082Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:54:04.2919900Z ##[error]Process completed with exit code 2.\n2022-03-21T22:54:04.3377901Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:54:04.3378575Z with:\n2022-03-21T22:54:04.3378930Z env:\n2022-03-21T22:54:04.3379275Z IN_CI: 1\n2022-03-21T22:54:04.3379600Z IS_GHA: 1\n2022-03-21T22:54:04.3380023Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:54:04.3380691Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:54:04.3381278Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge) (20/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:09:34.0074610Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:09:33.6365531Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:09:33.8475619Z Defaulting to user installation because normal 
site-packages is not writeable\n2022-03-21T22:09:33.8655152Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:09:33.8704395Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:09:33.8716774Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:09:33.8760145Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:09:33.8785000Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:09:33.8811316Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:09:33.8960134Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:09:33.9984866Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d325eb9fd156146f\n2022-03-21T22:09:34.0074610Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:09:34.0087465Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:09:34.0101743Z ##[error]Process completed with exit code 2.\n2022-03-21T22:09:34.0154014Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:09:34.0154246Z with:\n2022-03-21T22:09:34.0154412Z env:\n2022-03-21T22:09:34.0154574Z IN_CI: 1\n2022-03-21T22:09:34.0154728Z IS_GHA: 1\n2022-03-21T22:09:34.0154917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:09:34.0155112Z ##[endgroup]\n2022-03-21T22:09:34.0191047Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge) (21/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:17.8502655Z [E request_callbac...yUniqueId(created_on=0, local_id=0) to be created.\n\n2022-03-21T21:03:14.4669960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxgdsmeer\n2022-03-21T21:03:14.4671407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxgdsmeer/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.4973023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1i2hfmpc\n2022-03-21T21:03:14.4973800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1i2hfmpc/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.5532339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgx4da7b0\n2022-03-21T21:03:14.5533064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgx4da7b0/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.7050673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0\n2022-03-21T21:03:14.7097127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3\n2022-03-21T21:03:14.7398339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2\n2022-03-21T21:03:14.7922283Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1\n2022-03-21T21:03:17.8502655Z [E request_callback_no_python.cpp:559] Received error while processing request type 261: false INTERNAL ASSERT FAILED at \"/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp\":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.\n2022-03-21T21:03:17.8503603Z Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):\n2022-03-21T21:03:17.8504385Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x69 (0x7f180df19e19 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505131Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xd2 (0x7f180df160e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505927Z frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) + 0x4e (0x7f180df17a7e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8506674Z frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x4b4 (0x7f18118b7b64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8507642Z frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr >) const + 0x70 (0x7f18118a7bf0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8508613Z frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0xc8 (0x7f1819736208 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8509749Z frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f18118ac914 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8510708Z frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f1819735865 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8511369Z frame #8: + 0x375249a (0x7f18118a949a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test (22/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERR...t available for the merge-base of your branch\"\ufffd[0m\n\n2022-03-21T20:01:07.7012399Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7012634Z \ufffd[36;1m# Covers the case where a previous tag doesn't exist for the tree\ufffd[0m\n2022-03-21T20:01:07.7012992Z \ufffd[36;1m# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. 
nightly\ufffd[0m\n2022-03-21T20:01:07.7013373Z \ufffd[36;1mif ! git rev-parse \"$MERGE_BASE:.circleci/docker\"; then\ufffd[0m\n2022-03-21T20:01:07.7013784Z \ufffd[36;1m echo \"Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit\"\ufffd[0m\n2022-03-21T20:01:07.7014149Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7014325Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7014573Z \ufffd[36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse \"$MERGE_BASE:.circleci/docker\")\ufffd[0m\n2022-03-21T20:01:07.7014907Z \ufffd[36;1m# If no image exists but the hash is the same as the previous hash then we should error out here\ufffd[0m\n2022-03-21T20:01:07.7015231Z \ufffd[36;1mif [[ \"${PREVIOUS_DOCKER_TAG}\" = \"${DOCKER_TAG}\" ]]; then\ufffd[0m\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch\"\ufffd[0m\n2022-03-21T20:01:07.7015931Z \ufffd[36;1m echo \" contact the PyTorch team to restore the original images\"\ufffd[0m\n2022-03-21T20:01:07.7016225Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7016400Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7016608Z \ufffd[36;1mecho ::set-output name=rebuild::yes\ufffd[0m\n2022-03-21T20:01:07.7027605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}\n2022-03-21T20:01:07.7027837Z env:\n2022-03-21T20:01:07.7028006Z IN_CI: 1\n2022-03-21T20:01:07.7028159Z IS_GHA: 1\n2022-03-21T20:01:07.7028346Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:01:07.7028589Z BASE_REVISION: 6643522db9ff595f564b8081de58b3a33c546178\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu) (23/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:49:54.2949572Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:49:53.8049151Z + python3 -m pip install boto3==1.19.12\n2022-03-22T00:49:54.0981629Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-22T00:49:54.1207562Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-22T00:49:54.1277146Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-22T00:49:54.1315027Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-22T00:49:54.1331813Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-22T00:49:54.1391622Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:49:54.1609217Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-22T00:49:54.1637417Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:49:54.2830197Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f7c32fe13be12fea\n2022-03-22T00:49:54.2949572Z python3: can't open file 
'.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:49:54.2966933Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:49:54.2982588Z ##[error]Process completed with exit code 2.\n2022-03-22T00:49:54.3031464Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T00:49:54.3031794Z with:\n2022-03-22T00:49:54.3032012Z env:\n2022-03-22T00:49:54.3032227Z IN_CI: 1\n2022-03-22T00:49:54.3032434Z IS_GHA: 1\n2022-03-22T00:49:54.3032681Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:49:54.3033084Z GPU_FLAG: --gpus all\n2022-03-22T00:49:54.3033312Z ##[endgroup]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (24/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:56:07.3365589Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T21:56:07.7926584Z ---------------------------------------- 8.1/8.1 MB 17.3 MB/s eta 0:00:00\n2022-03-21T21:56:07.9319362Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T21:56:07.9366132Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T21:56:08.0077590Z -------------------------------------- 247.7/247.7 KB 3.0 MB/s eta 0:00:00\n2022-03-21T21:56:08.0164070Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:56:08.1775537Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:56:08.3393469Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T21:56:12.4576766Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T21:56:12.5641959Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0afad69838118af0e\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:56:12.5905611Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:56:12.5927729Z ##[error]Process completed with exit code 2.\n2022-03-21T21:56:12.6239531Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T21:56:12.6240039Z with:\n2022-03-21T21:56:12.6240299Z env:\n2022-03-21T21:56:12.6240557Z IN_CI: 1\n2022-03-21T21:56:12.6240805Z IS_GHA: 1\n2022-03-21T21:56:12.6241118Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:56:12.6241613Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T21:56:12.6242052Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (25/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:46:39.5474616Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:46:39.1884210Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:46:39.3928976Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:46:39.4105069Z Requirement already satisfied: boto3==1.19.12 in 
/home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:46:39.4152571Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:46:39.4194931Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:46:39.4218947Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:46:39.4230812Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:46:39.4380089Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:46:39.4399461Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:46:39.5387703Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0888bed1149cca415\n2022-03-21T21:46:39.5474616Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:46:39.5487145Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:46:39.5497480Z ##[error]Process completed with exit code 2.\n2022-03-21T21:46:39.5541319Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:46:39.5541544Z with:\n2022-03-21T21:46:39.5541698Z env:\n2022-03-21T21:46:39.5541851Z IN_CI: 1\n2022-03-21T21:46:39.5541997Z IS_GHA: 1\n2022-03-21T21:46:39.5542176Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:46:39.5542361Z ##[endgroup]\n2022-03-21T21:46:39.5557878Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (26/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:34:57.0623859Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:34:56.9039884Z Attempting uninstall: s3transfer\n2022-03-21T21:34:56.9041446Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:34:56.9090783Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:34:56.9095968Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:34:56.9453014Z Attempting uninstall: boto3\n2022-03-21T21:34:56.9454356Z Found existing installation: boto3 1.16.34\n2022-03-21T21:34:56.9564320Z Uninstalling boto3-1.16.34:\n2022-03-21T21:34:56.9578035Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:34:57.0091363Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:34:57.0536230Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-034a3afd5d80b91fd\n2022-03-21T21:34:57.0623859Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:34:57.0637167Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:34:57.0647396Z ##[error]Process completed with exit code 2.\n2022-03-21T21:34:57.0688237Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:34:57.0688481Z with:\n2022-03-21T21:34:57.0688631Z env:\n2022-03-21T21:34:57.0688769Z IN_CI: 1\n2022-03-21T21:34:57.0688930Z IS_GHA: 1\n2022-03-21T21:34:57.0689109Z 
GIT_DEFAULT_BRANCH: master\n2022-03-21T21:34:57.0689462Z ##[endgroup]\n2022-03-21T21:34:57.0704768Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge) (27/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7896545Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.7395504Z #10 0x5597fd5a9801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.7396330Z #11 0x5597fd5b47a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.7396688Z #12 0x5597fd5b480b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.7398664Z #13 0x5597fd5b4908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.7399177Z #14 0x5597fd5b4908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.7399663Z #15 0x5597fd5b4908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.7399986Z #16 0x5597fd5b4ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7895241Z #17 0x7f0a5905983f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7895772Z #18 0x5597fd559554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7896033Z \n2022-03-21T21:05:00.7896545Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.8063448Z + retcode=1\n2022-03-21T21:05:00.8063787Z + set -e\n2022-03-21T21:05:00.8064058Z + return 1\n2022-03-21T21:05:00.8067638Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.8068127Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.8069018Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.8069500Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.8070105Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.8070580Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.8072640Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (28/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:48:17.3384813Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:48:16.8599645Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:48:17.1464241Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:48:17.1685222Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:48:17.1754164Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:48:17.1771662Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:48:17.1808722Z 
Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:48:17.1868636Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:48:17.1903889Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:48:17.2113746Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:48:17.3267404Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-01fe178c405417375\n2022-03-21T22:48:17.3384813Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:48:17.3402286Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:48:17.3418376Z ##[error]Process completed with exit code 2.\n2022-03-21T22:48:17.3470528Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:48:17.3470874Z with:\n2022-03-21T22:48:17.3471096Z env:\n2022-03-21T22:48:17.3471327Z IN_CI: 1\n2022-03-21T22:48:17.3471538Z IS_GHA: 1\n2022-03-21T22:48:17.3471802Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:48:17.3472083Z GPU_FLAG: --gpus all\n2022-03-21T22:48:17.3472322Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (29/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:16:38.9646300Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:16:38.7995969Z Attempting uninstall: s3transfer\n2022-03-21T21:16:38.7998039Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:16:38.8066994Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:16:38.8072844Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:16:38.8449275Z Attempting uninstall: boto3\n2022-03-21T21:16:38.8451430Z Found existing installation: boto3 1.16.34\n2022-03-21T21:16:38.8559828Z Uninstalling boto3-1.16.34:\n2022-03-21T21:16:38.8574290Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:16:38.9100438Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:16:38.9558098Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d779c59d277d32ee\n2022-03-21T21:16:38.9646300Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:16:38.9658894Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:16:38.9673240Z ##[error]Process completed with exit code 2.\n2022-03-21T21:16:38.9720106Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:16:38.9720333Z with:\n2022-03-21T21:16:38.9720485Z env:\n2022-03-21T21:16:38.9720645Z IN_CI: 1\n2022-03-21T21:16:38.9720793Z IS_GHA: 1\n2022-03-21T21:16:38.9720970Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:16:38.9721151Z ##[endgroup]\n2022-03-21T21:16:38.9736762Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", "author": { - "login": "albanD" + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 964902894 }, { + "bodyText": "@vitaly-fedyunin @gottbrath FYI that this is the oneDNN Graph API integration. It depends on the #63748.", "author": { - "login": "coolteemf" + "login": "Jianhui-Li" }, - "state": "COMMENTED" + "authorAssociation": "NONE", + "editor": null, + "databaseId": 970451860 }, { + "bodyText": "CI failures are currently being caused by some issues in the CI infra, and are also occurring with other PRs.", "author": { - "login": "albanD" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 990641309 }, { + "bodyText": "CI failures are unrelated.", "author": { - "login": "albanD" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 991281407 }, { + "bodyText": "The CI failure is unrelated.", "author": { - "login": "coolteemf" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 995389295 }, { + "bodyText": "Hi, thank you for the PR!\nDo you mind running a larger amount of torchbench and reporting numbers ? You can look at Jason's post here for what models are supported in script. Initially just the vision models would be useful. @Krovatkin also did some benchmarking of a traced Bert model and found on average a ~16% speedup with this PR.", "author": { - "login": "albanD" + "login": "eellison" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1015689390 }, { + "bodyText": "Thanks a lot for reviewing, @eellison & @Krovatkin!\nWe just wanted to let you know that we're working on the benchmarking & will get back to you in a day, or two.\nUPDATE (Jan 21): While running some TorchBench models, we discovered some composability issues, and are working to ensure that oneDNN Graph would complement PyTorch's existing fusion capabilities, not hinder them.\nUPDATE (Jan 24): We've resolved the issues & will update this PR later today. Thanks!", "author": { - "login": "coolteemf" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1016996190 }, { + "bodyText": "Hello @eellison,\nWe used this TorchBench branch for comparison. compare_llga.sh can be run for comparison.\nFor benchmarking mobilenet_v3_large with hardswish support in oneDNN Graph, this oneDNN Graph branch can be used in third_party/ideep/mkl-dnn. It delivers a speedup over PyTorch JIT (NNC + OFI) because 21 additional reorders are prevented (the major factor here), and fusion with conv also helps further.\nThe next release of oneDNN Graph would have hardswish support.\nWe're also exploring adding a hardsigmoid op in oneDNN Graph.\nThank you!", "author": { - "login": "albanD" + "login": "sanchitintel" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1022709513 }, { + "bodyText": "Please note that this PR should be merged after #71546, as #71546 changes the third_party/ideep commit (this PR also uses that ideep commit, but it'd probably be better to merge #71546 first, so that oneDNN v2.5.2 upgrade would be in a separate PR). 
Thank you!", "author": { - "login": "albanD" + "login": "sanchitintel" }, - "state": "APPROVED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1026330085 }, { + "bodyText": "@sanchitintel mind rebasing and i'll land ?", "author": { - "login": "albanD" + "login": "eellison" }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMS0yNVQxMDoyODoxMC0wNjowMLkyMDIyLTAxLTI1VDA5OjU0OjA1LTA2OjAwzjNooqI=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1055813984 + }, { - "bodyText": "Merge failed due to 'NoneType' object is not subscriptable\nRaised by https://github.com/pytorch/pytorch/actions/runs/1887945630", + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", "author": { - "login": "pytorchmergebot" + "login": "facebook-github-bot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1048868910 + "databaseId": 1057203495 }, { - "bodyText": "Thanks for the update! The windows failure is not your fault, you can ignore it!\n\nThank you very much for all of your feedback and sorry for the delay !", + "bodyText": "Thanks a lot for taking a look, @eellison! To fix this error, we would enable Bazel build for oneDNN Graph.", "author": { - "login": "coolteemf" + "login": "sanchitintel" }, - "authorAssociation": "CONTRIBUTOR", + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1061230087 + }, + { + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1048983572 + "databaseId": 1063276600 }, { - "bodyText": "@coolteemf can you please send either me or @albanD an email? (or I can send you and invite to collab on private repo)", + "bodyText": "@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", "author": { - "login": "malfet" + "login": "facebook-github-bot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1049048119 + "databaseId": 1074355779 }, { - "bodyText": "@pytorchbot merge this please", + "bodyText": "And graph_rewriter.cpp is full of DOS newlines...", "author": { - "login": "albanD" + "login": "malfet" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1049131992 + "databaseId": 1074407452 }, { - "bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "bodyText": "Hey @chunyuan-w.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' 
label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { "login": "github-actions" }, "authorAssociation": "NONE", "editor": null, - "databaseId": 1049134520 + "databaseId": 1074471758 + }, + { + "bodyText": "Thanks a ton for your help, @malfet & @eellison! :)\nWe'll incorporate your suggestions in subsequent PR(s).", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1074492365 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOPoR4Lg==", - "hasPreviousPage": true + "startCursor": "Y3Vyc29yOnYyOpHOOYM_0Q==", + "hasPreviousPage": false } - }, - "labels": { - "edges": [ - { - "node": { - "name": "triaged" - } - }, - { - "node": { - "name": "open source" - } - }, - { - "node": { - "name": "cla signed" - } - }, - { - "node": { - "name": "release notes: nn" - } - }, - { - "node": { - "name": "topic: performance" - } - } - ] } } } } }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=75095 owner=pytorch": { + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=73969 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, - "isCrossRepository": false, + "isCrossRepository": true, "author": { - "login": "mruberry" + "login": "malfet" }, - "title": "Initial prims, references, and test architecture for them", - "body": "This PR adds an initial set of experimental primitive operations and Python references that reimplement existing PyTorch operations using them. See https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577 for additional context.\r\n\r\nThe following experimental primitives are added:\r\n\r\n- Elementwise unary prims -- abs, acos, acosh, asin, atan, cos, cosh, bessel_i0e, bessel_i1e, cbrt, ceil, digamma, erf, erf_inv, erfc, exp, expm1, floor, igamma, igammac, is_finite, lgamma, log, log1p, neg, reciprocal, round, sign, sinh, sqrt, square, tan. \r\n- Elementwise binary prims -- add, atan2, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, div, eq, ge, gt, le, lt, max, min, mul, ne, nextafter, pow, rsqrt, shift_left, shift_right_arithmetic\r\n- View prims -- brodcast_in_dim, collapse_view, split_dim, squeeze\r\n- Shape prims -- collapse, concatenate, reshape\r\n- Conditional prims -- select\r\n- Data conversion & movement prims -- convert_element_type, device_put\r\n- Inplace prims -- copy_to, resize\r\n\r\nThese primitives do not add any new functionality to PyTorch, but are intended to be the semantic building blocks for reference operators. We have tried to make them consistent with the operations in [jax.lax](https://jax.readthedocs.io/en/latest/jax.lax.html) where possible (because PyTorch prefers being consistent with other frameworks), although there are key differences between these prims and operations in jax.lax. 
Most notably is that these prims model view semantics and inplace operations.\r\n\r\nIn addition to these primitives the following elementwise binary Python references are added:\r\n\r\n- Elementwise binary Python references -- add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor, eq, float_power, ge, gt, le, lt, maximum, minimum, mul, ne, nextafter, pow, sub, true_divide\r\n- Conditional Python references - where\r\n- Data conversion & movement references - copy_to\r\n\r\nA Python reference implements the same behavior as its corresponding PyTorch operator (excepting slight numerical differences, bug fixes, and in some cases additional features). \r\n\r\nThe start of an OpInfo-based test architecture for these references is also included in this PR. A new list, `python_ref_db`, is added to `common_methods_invocations.py`. This list introduces the new `ElementwiseBinaryPythonRefInfo`, which inherits input arguments from the original operators' OpInfo, allows them to be overridden, and then constructs the OpInfo for the Python reference using the (potentially modified) arguments. OpInfo-based tests can opt-into testing references by including this new list in the Sequence passed to the `@ops` decorator. \r\n\r\ncc @ngimel @csarofeen @kevinstephano @Lezcano ", - "headRefName": "prims_and_references", + "title": "Dummy change", + "body": "Test Plan: None at all\n\nDifferential Revision: D34753911\n\n", + "headRefName": "export-D34753911", "headRepository": { - "nameWithOwner": "pytorch/pytorch" + "nameWithOwner": "malfet/pytorch" }, "baseRefName": "master", "baseRepository": { @@ -8744,299 +14229,781 @@ } }, "mergeCommit": null, - "commits_with_authors": { - "nodes": [ - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "a790467c650be92775103cde5e866c90b56f5376" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "bd6fcf50692e208ebecdc2eaa517a2bfcdcd35cf" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "4a119c8f21529fe1375e7e8789b91f41a3df80c5" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "ea6750dc34d66be759fdfe84b09fb0e23ee59c79" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "2eef8a55fe0227e1921b51bf1f56f9d0a29b49ac" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "b886ed6c20dd1785fd31ed6fa6a8c5b6d0d0b16c" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "9ad9b63d09aa4f7a8549bcf1d88ea4ff0674299c" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "63fdd580118477416ae160e0670ae722ea248090" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "0ccf7dc292af1d40d0a094eb2b2fb0c7ab4ccc70" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "e8a8a4d1fbe35f20eb88e1a43cf5a653883638e5" - } - }, - { - 
"commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "186634dfdd25645c05b58a212f9e8d77c4125fc0" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "f5b4741312b5c42a79f6c8a1d3930b79db38ed8f" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "23d50391bb0fd12111fd3171591c4235ffb2fc1a" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "bac9d45422d58f513b60b4b854441cfdc253d4c5" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "13240ae0b4a0332c3167b65ac026a3172da90cb7" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "1ee34468cb1db3dc6cbae204669f4fec20e2a466" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "561d132bc686d00e8911f7feb3da5901b2bdc574" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" - }, - "oid": "ac42bedc84b7c96256376ad09917263bb020b2c3" - } - }, + "commits_with_authors": { + "nodes": [ { "commit": { "author": { "user": { - "login": "ngimel" + "login": "malfet" }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" - }, - "oid": "7f7d5ba40a0b5e10526d90b018b30b54673d12d8" - } - }, - { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" + "email": "nshulga@fb.com", + "name": "Nikita Shulga" }, - "oid": "37a6b4a8b1adb712d5777c7c3479866c27fb3c4e" + "oid": "4746da707a9912356f5179625da89616b228dc21" } - }, + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 + }, + "commits": { + "nodes": [ { "commit": { - "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280134/jobs/2794078044" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280134/jobs/2794189060" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRQMQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592963" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280135/jobs/2794078023" + } + ], + "pageInfo": { + "endCursor": 
"Y3Vyc29yOnYyOpHPAAAAAUbO2aM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592965" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794078060" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292071" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292205" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292306" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbTiXw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592966" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794078053" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794536907" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794536998" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794537089" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbY_vU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592967" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qc=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280136/jobs/2794078031" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2ao=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592969" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qk=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794078055" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794183768" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794183828" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRIt0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592970" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qo=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794078017" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181109" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181305" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181488" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRFm4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592971" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qs=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280143/jobs/2794078025" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592974" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q4=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "shellcheck", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078028" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078196" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078407" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078610" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078760" + }, + { + "name": "toc", 
+ "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078898" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078999" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794079087" + }, + { + "name": "mypy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794079199" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO4Es=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592975" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q8=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280146/jobs/2794078040" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2b0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592976" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RA=" + } + ], + "pageInfo": { + "hasNextPage": true + } }, - "oid": "65b613868c44e519c1777af79b9fd3498c5a7e58" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ngimel" - }, - "email": "ngimel@fb.com", - "name": "Natalia Gimelshein" + "status": { + "contexts": [ + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040614?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040643?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040615?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] }, - "oid": "442c405e9da0d66744ef03e379224c41eedf5b57" + "pushedDate": "2022-03-09T15:57:16Z", + "oid": "4746da707a9912356f5179625da89616b228dc21" } - }, + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "031ac49ae9c192989385986b6707fa781e3229e0" - } - }, + "path": "tools/build_variables.bzl" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [], + "pageInfo": { + "startCursor": null, + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "9a6c3b00039c0c985c1c9cb59490012d1c0b38ba" - } + "bodyText": "CI Flow 
Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/malfet/pytorch/blob/4746da707a9912356f5179625da89616b228dc21/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\nAdd ciflow labels to this PR to trigger more builds:\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-manywheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-rocm4.5-py3.7\nciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build\nciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nmacos-arm64-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-arm64-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 
triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwindows-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nwindows-binary-libtorch-debug\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-libtorch-release\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-wheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.3-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab 
skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\npytorch-xla-linux-bionic-py3.7-clang8\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla\n\ud83d\udeab skipped", + "author": { + "login": "pytorch-bot" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1063079053 }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "d5c30e408af1889b90012d2e09f6ec3cda333bcb" - } + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/73969\n\ud83d\udcc4 \u00a0Preview docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 4746da7 (more details on the Dr. CI page):\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1063079113 }, { - "commit": { - "author": { - "user": null, - "email": "mruberry@devfair044.h1.fair", - "name": "Mike Ruberry" - }, - "oid": "db355d55655bb252a699cd532441bb98e52b98d5" + "bodyText": "This pull request was exported from Phabricator. Differential Revision: D34753911", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1063079731 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP11MjQ==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "fb-exported" + } + }, + { + "node": { + "name": "cla signed" } } - ], - "pageInfo": { - "endCursor": "MjY", - "hasNextPage": false - }, - "totalCount": 26 - }, + ] + } + } + } + } + }, + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAU2F-RA= name=pytorch number=73969 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { "commits": { "nodes": [ { "commit": { + "oid": "4746da707a9912356f5179625da89616b228dc21", "checkSuites": { "edges": [ + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280141/jobs/2794078056" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2c8=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592977" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RE=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Test tools" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280142/jobs/2794078033" + } + ], + 
"pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2as=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592978" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RI=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280144/jobs/2794078046" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280144/jobs/2794338293" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280144/jobs/2794338408" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280144/jobs/2794338568" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbUkMA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592980" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RQ=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7-no-ops" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280148/jobs/2794078065" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592981" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280149/jobs/2794078067" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280149/jobs/2794407041" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280149/jobs/2794407168" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbWDX8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592982" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1958280150/jobs/2794078029" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592983" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rc=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280151/jobs/2794078062" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280151/jobs/2794225603" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280151/jobs/2794225793" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280151/jobs/2794226005" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSD-k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592985" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rk=" + }, { "node": { "app": { @@ -9053,115 +15020,313 @@ }, { "name": "Meta Internal-Only Changes Check", - "conclusion": "SUCCESS", + "conclusion": "NEUTRAL", "detailsUrl": "https://opensource.facebook.com/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6ux14=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO574=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454954" + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592986" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2o=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Ro=" }, { "node": { "app": { - "name": "Netlify", - "databaseId": 13473 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-xla-linux-bionic-py3.7-clang8" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280152/jobs/2794078032" + }, + { + "name": "test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280152/jobs/2794227475" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSGAM=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454956" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592987" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2w=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rs=" }, { "node": { "app": { - "name": "Azure Pipelines", - "databaseId": 9426 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + 
"name": "linux-xenial-py3.7-gcc5.4" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794078054" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794203297" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794203553" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794203717" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794203878" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794203982" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280160/jobs/2794204149" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRlJs=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454965" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592997" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3U=" - }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-SU=" + } + ], + "pageInfo": { + "hasNextPage": true + } + } + } + } + ] + } + } + } + } + }, + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAU2F-SU= name=pytorch number=73969 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ + { + "commit": { + "oid": "4746da707a9912356f5179625da89616b228dc21", + "checkSuites": { + "edges": [ { "node": { "app": { - "name": "Dependabot", - "databaseId": 29110 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280162/jobs/2794078019" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280162/jobs/2794187280" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280162/jobs/2794187423" + }, + { + "name": "test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280162/jobs/2794187582" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRN_c=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454970" + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593001" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3o=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Sk=" }, { "node": { "app": { - "name": "Codecov", - "databaseId": 254 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280164/jobs/2794078039" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280164/jobs/2794213425" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280164/jobs/2794213615" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRySo=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454974" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593014" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC34=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-TY=" }, { "node": { "app": { - "name": "PyTorch Bot", - "databaseId": 40112 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280168/jobs/2794078064" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d0=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454977" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593026" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC4E=" - }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-UI=" + } + ], + "pageInfo": { + "hasNextPage": false + } + } + } + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=73099 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "BowenBao" + }, + "title": "[ONNX] Make graph name spec-compliant (#71961)", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* #73104\n* #73103\n* #73102\n* #73101\n* #73100\n* __->__ #73099\n\n[According to the ONNX spec](https://github.com/onnx/onnx/blob/main/docs/IR.md#names-within-a-graph),\nall names must adhere to C90 identifier syntax rules, which means no\ndashes.\n\nFixes: #30952", + "headRefName": "gh/BowenBao/138/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/BowenBao/138/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + 
"author": { + "user": { + "login": "BowenBao" + }, + "email": "bowbao@microsoft.com", + "name": "BowenBao" + }, + "oid": "3038b939eb2069653305c419326a0f47d2598e39" + } + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ { "node": { "app": { @@ -9178,18 +15343,18 @@ { "name": "run-torchbench", "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879695?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041786/jobs/2626264278" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-c8=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNn9o=", "hasNextPage": false } }, "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455322" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189561" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDNo=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7k=" }, { "node": { @@ -9199,56 +15364,41 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "linux-xenial-cuda11.3-py3.7-gcc7" } }, "checkRuns": { "nodes": [ { - "name": "quick-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879696?check_suite_focus=true" - }, - { - "name": "lintrunner", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879758?check_suite_focus=true" - }, - { - "name": "Test tools", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879835?check_suite_focus=true" - }, - { - "name": "Test collect_env (with_torch)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879901?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626264385" }, { - "name": "Test collect_env (without_torch)", + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150879942?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417658" }, { - "name": "toc", + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150880005?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417743" }, { - "name": "workflow-checks", + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150880051?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417885" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-zM=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkRE_E=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455334" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189562" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDOY=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7o=" }, { "node": { @@ -9258,913 +15408,933 @@ }, "workflowRun": { "workflow": { - "name": "pull" 
+ "name": "linux-xenial-py3.7-gcc7-no-ops" } }, "checkRuns": { "nodes": [ { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895177?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.0-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895295?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895365?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895428?check_suite_focus=true" - }, - { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895554?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.3-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895614?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895698?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895758?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895866?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895923?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150895991?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896053?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896146?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896213?check_suite_focus=true" - }, - { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896256?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896288?check_suite_focus=true" - }, - { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896313?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896352?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - 
"conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896403?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150896443?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970691?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970749?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970796?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970831?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970876?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970911?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150970959?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150971013?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150976613?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150976667?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150976694?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150977190?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150980317?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041789/jobs/2626264416" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189563" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7s=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": 
[ { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150980363?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041787/jobs/2626264407" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoIY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189564" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7w=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "name": "build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150989669?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041788/jobs/2626264422" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189566" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS74=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6150989736?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626264414" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151003389?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349405" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "name": "test (noarch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151003429?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349522" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "name": "test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151003460?check_suite_focus=true" - }, - { - "name": "pytorch-xla-linux-bionic-py3.7-clang8", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151007051?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349618" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiwA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189567" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS78=" + }, + { + "node": { + "app": { + "name": 
"GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151023043?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041793/jobs/2626264431" }, { - "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "name": "test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151023077?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041793/jobs/2626359364" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPxgQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189568" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8A=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151040240?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041792/jobs/2626264427" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoKA=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189570" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8I=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151041874?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626264386" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "name": "test (default, 1, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151041915?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626722677" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "name": "test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151041959?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626722710" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkX070=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189571" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8M=" + }, + { + 
"node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } + }, + "checkRuns": { + "nodes": [ { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151065166?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626264401" }, { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "name": "test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151065218?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349045" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "name": "test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151165045?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349141" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6151165103?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349272" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6jVK8=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiQA=", + "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455360" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189572" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDQA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8Q=" } ], "pageInfo": { - "hasNextPage": false + "hasNextPage": true } }, - "pushedDate": "2022-04-25T02:30:31Z", - "oid": "db355d55655bb252a699cd532441bb98e52b98d5" + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010288?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010289?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010488?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010326?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-02-18T18:46:28Z", + "oid": "3038b939eb2069653305c419326a0f47d2598e39" } } ] }, - "changedFiles": 5, + "changedFiles": 162, "files": { "nodes": [ { - "path": "test/test_ops.py" 
+ "path": "test/onnx/expect/TestOperators.test_acos.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_left_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addconstant.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addmm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_arange_dynamic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_argmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_asin.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_at_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_atan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_avg_pool2d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_baddbmm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_basic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_1d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_training.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_bitshift.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_c2_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_chunk.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_concat2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv.expect" }, { - "path": "torch/_prims/__init__.py" + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect" }, { - "path": "torch/_prims/utils.py" + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4_opset8.expect" }, { - "path": "torch/_refs/__init__.py" + "path": "test/onnx/expect/TestOperators.test_convtranspose.expect" }, { - "path": "torch/testing/_internal/common_methods_invocations.py" - } - ], - "pageInfo": { - "endCursor": "NQ", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ + "path": "test/onnx/expect/TestOperators.test_cos.expect" + }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_cumsum.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_det.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dict.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dict_str.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": 
"test/onnx/expect/TestOperators.test_dim.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dropout.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dropout_default.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dropout_opset12.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dropout_training.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dropout_training_opset12.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add_inputs_same_symbolic_shape.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_matmul.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_reduce_mean.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_unchange.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_elu.expect" }, { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_embedding_bags.expect" }, { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_empty_like.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_empty_like_opset7.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_equal.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_erf.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_exp.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten2D.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_fmod.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_frobenius_norm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather_opset11.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ge.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gelu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_hardtanh.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_implicit_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_index.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_isnan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + }, + { + "path": 
"test/onnx/expect/TestOperators.test_le.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_linear.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_log_sigmoid.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_logsoftmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lstm_none_sequence_lens.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_master_opset.expect" }, + { + "path": "test/onnx/expect/TestOperators.test_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_dilations.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_indices.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_meshgrid.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_narrow.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ne.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_nonzero.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ones_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_pad.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_permute2.expect" + } + ], + "pageInfo": { + "endCursor": "MTAw", + "hasNextPage": true + } + }, + "reviews": { + "nodes": [ { "author": { - "login": "zou3519" + "login": "garymm" }, - "state": "COMMENTED" - }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMi0xOFQxNzoxODo0NC0wODowMLkyMDIyLTAyLTE4VDE3OjE4OjQ0LTA4OjAwzjTr0H0=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet \n \n \n pytorch/.github/scripts/trymerge.py\n \n \n Line 63\n in\n 932adf2\n \n \n \n \n\n \n \n files(last: 100) { \n \n \n \n\n Can this be relaxed? If not please import.", "author": { - "login": "mruberry" + "login": "BowenBao" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1048084569 }, { + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet\nCan this be relaxed? If not please import.\n\nWow, you've hit a really interesting problem. 100 is a limitation enforced by GitHub, see https://docs.github.com/en/graphql/overview/resource-limitations, but I can implement a pagination. Do you mind keeping it like that for a bit, want to land a fix soonish.", "author": { - "login": "peterbell10" + "login": "malfet" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1048088691 }, { + "bodyText": "@malfet Thank you for info. 
Sure, I have separated the rest of stack from this one, we'll wait for the fix to try again.", "author": { - "login": "mruberry" + "login": "BowenBao" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1048090640 }, { + "bodyText": "@pytorchbot merge this", "author": { - "login": "mruberry" + "login": "BowenBao" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1050293881 }, { + "bodyText": "Hey @BowenBao.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "mruberry" + "login": "github-actions" }, - "state": "COMMENTED" - }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1050295451 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOPniAWQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "node": { + "name": "oncall: jit" + } }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "node": { + "name": "open source" + } }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "node": { + "name": "cla signed" + } }, { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" + "node": { + "name": "release notes: onnx" + } }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - }, + "node": { + "name": "topic: bug fixes" + } + } + ] + } + } + } + } + }, + "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=73099 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "files": { + "nodes": [ { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_pixel_shuffle.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_pow.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_prelu.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_prod.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_prod_dtype.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_rand.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_randn.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_mean.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect" }, { - "author": { - "login": 
"ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_prod.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_sum.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reducemax.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_reducemin.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_remainder.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_repeat.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_round.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_rrelu.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_rsqrt.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_rsub.expect" }, { - "author": { - "login": "ngimel" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_scatter_add.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_scatter_add_opset11.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_selu.expect" }, - { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + { + "path": "test/onnx/expect/TestOperators.test_shape_value_map.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_sign.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_sin.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_slice.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_slice_dynamic.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d.expect" }, { - "author": { - "login": "mruberry" - }, - "state": 
"COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d_none.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_4d.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_ignore_index.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_weights.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_split.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_split_with_sizes.expect" }, { - "author": { - "login": "Lezcano" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_sqrt.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_std.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_sum.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_sum_dtype.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_tan.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_topk.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_topk_smallest_unsorted.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_transpose.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_type_as.expect" }, { - "author": { - "login": "ngimel" - }, - "state": "APPROVED" + "path": "test/onnx/expect/TestOperators.test_unfold.expect" }, { - "author": { - "login": "ezyang" - }, - "state": "COMMENTED" + "path": "test/onnx/expect/TestOperators.test_unique.expect" }, { - "author": { - "login": "mruberry" - }, - "state": "COMMENTED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0wNlQxNDo1NjoyNC0wNTowMLkyMDIyLTA0LTA2VDEwOjQwOjM4LTA1OjAwzjenO6Y=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "path": "test/onnx/expect/TestOperators.test_unsqueeze.expect" + }, { - "bodyText": "Ref implementations by themselves can handle any shapes (and broadcast ops by themselves don't bake in any shapes). 
The question is can we decide if a particular trace is applicable for a different input, but that depends on the tracing technology and what we are caching on, so out of scope for initial PR.", - "author": { - "login": "ngimel" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1105643418 + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect" }, { - "bodyText": "@pytorchbot merge this please", - "author": { - "login": "mruberry" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1108072887 + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect" }, { - "bodyText": "Merge failed due to 'mruberry'\nRaised by https://github.com/pytorch/pytorch/actions/runs/2218044244", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1108073536 + "path": "test/onnx/expect/TestOperators.test_upsample_nearest_size.expect" }, { - "bodyText": "@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1108075965 + "path": "test/onnx/expect/TestOperators.test_view.expect" }, { - "bodyText": "Hey @mruberry.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1108351107 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQebHmg==", - "hasPreviousPage": true - } - }, - "labels": { - "edges": [ + "path": "test/onnx/expect/TestOperators.test_view_flatten.expect" + }, { - "node": { - "name": "cla signed" - } + "path": "test/onnx/expect/TestOperators.test_zeros_like.expect" }, { - "node": { - "name": "topic: not user facing" - } + "path": "torch/csrc/jit/serialization/export.cpp" }, { - "node": { - "name": "module: primTorch" - } + "path": "torch/csrc/jit/serialization/export.h" } - ] + ], + "pageInfo": { + "endCursor": "MTYy", + "hasNextPage": false + } } } } } }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=77700 owner=pytorch": { + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=74649 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, "isCrossRepository": false, "author": { - "login": "kit1980" + "login": "malfet" }, - "title": "Move pull linux-docs job to Ubuntu 20.04", - "body": "", - "headRefName": "sdym/pull-xenial-focal-linux-docs", + "title": "This should fail flake8", + "body": "Test issue for GHF mandatory checks", + "headRefName": "malfet-patch-8", "headRepository": { "nameWithOwner": "pytorch/pytorch" }, @@ -10183,20 +16353,32 @@ "commit": { "author": { "user": { - "login": "kit1980" + "login": "malfet" }, - "email": "sdym@fb.com", - "name": "Sergii Dymchenko" + "email": "nshulga@fb.com", + "name": "Nikita Shulga" }, - "oid": "81261599614423baa17df72300b8e109677b6799" + "oid": "57c86ff1c5ab948888fd329986c9d55796680e33" + } + }, + { + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" } } ], "pageInfo": { - "endCursor": "MQ", + "endCursor": "Mg", "hasNextPage": false }, - "totalCount": 1 + "totalCount": 2 }, "commits": { "nodes": [ @@ -10216,18 +16398,18 @@ { "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://code.facebook.com/cla/" + "detailsUrl": "https://code.intern.facebook.com/cla/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNmNqE=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsK3w=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147714" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018129" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuMI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1E=" }, { "node": { @@ -10244,9 +16426,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147726" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018131" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuM4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1M=" }, { "node": { @@ -10263,9 +16445,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147733" + "url": 
"https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018132" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuNU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Q=" }, { "node": { @@ -10282,9 +16464,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147746" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018134" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuOI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Y=" }, { "node": { @@ -10301,9 +16483,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147762" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018139" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuPI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1s=" }, { "node": { @@ -10320,9 +16502,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147780" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018142" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuQQ=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj14=" }, { "node": { @@ -10338,50 +16520,75 @@ "checkRuns": { "nodes": [ { - "name": "lintrunner", + "name": "clang-format", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901060?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925132" }, { - "name": "workflow-checks", + "name": "clang-tidy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901248?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925189" }, { - "name": "quick-checks", + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925230" + }, + { + "name": "flake8-py3", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925307" + }, + { + "name": "mypy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901458?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925365" }, { "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901863?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925427" }, { "name": "Test collect_env (without_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901951?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925449" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925537" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925644" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925688" }, { "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498902083?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925809" }, { - "name": "Test tools", + "name": "shellcheck", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498902358?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925945" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYVY=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsMiY=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148336" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018384" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFA=" }, { "node": { @@ -10399,18 +16606,18 @@ { "name": "run-torchbench", "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901064?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576288/jobs/2928925134" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdXEg=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsLW0=", "hasNextPage": false } }, "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148344" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018395" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzg=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFs=" }, { "node": { @@ -10420,1331 +16627,2622 @@ }, "workflowRun": { "workflow": { - "name": "docker-builds" + "name": "pull" } }, "checkRuns": { "nodes": [ { - "name": "docker-build (pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901070?check_suite_focus=true" - }, - { - "name": "docker-build (pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901146?check_suite_focus=true" - }, - { - "name": "docker-build (pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901221?check_suite_focus=true" + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935743" }, { - "name": "docker-build (pytorch-linux-bionic-py3.7-clang9)", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901302?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935775" }, { - "name": "docker-build (pytorch-linux-bionic-rocm5.0-py3.7)", + "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901366?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935850" }, { - "name": "docker-build (pytorch-linux-bionic-rocm5.1-py3.7)", + "name": 
"linux-bionic-rocm4.5-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901454?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935994" }, { - "name": "docker-build (pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901538?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936064" }, { - "name": "docker-build (pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)", + "name": "linux-xenial-py3.7-gcc5.4 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901617?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936179" }, { - "name": "docker-build (pytorch-linux-xenial-py3-clang5-android-ndk-r19c)", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901670?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936265" }, { - "name": "docker-build (pytorch-linux-xenial-py3-clang5-asan)", + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901773?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936309" }, { - "name": "docker-build (pytorch-linux-xenial-py3-clang7-asan)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901846?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936353" }, { - "name": "docker-build (pytorch-linux-xenial-py3-clang7-onnx)", + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498901939?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936395" }, { - "name": "docker-build (pytorch-linux-xenial-py3.7-gcc5.4)", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498902041?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936426" }, { - "name": "docker-build (pytorch-linux-xenial-py3.7-gcc7)", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498902117?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936483" }, { - "name": "docker-build (pytorch-linux-focal-py3.7-gcc7)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498902194?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYLI=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": 
"https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148352" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu0A=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pull" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "linux-bionic-py3.7-clang9 / build", + "name": "win-vs2019-cuda11.3-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498932877?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936516" }, { - "name": "linux-focal-py3.7-gcc7 / build", + "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498933082?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936558" }, { "name": "linux-xenial-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498933297?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936633" }, { "name": "linux-xenial-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498933508?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498933805?check_suite_focus=true" - }, - { - "name": "linux-bionic-rocm5.1-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934115?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934258?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-gcc5.4 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934411?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934576?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936705" }, { "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934681?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936736" }, { - "name": "win-vs2019-cuda11.3-py3 / build", + "name": "linux-xenial-py3.7-clang7-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498934902?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936756" }, { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935080?check_suite_focus=true" + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936796" }, { - "name": "linux-xenial-py3-clang5-mobile-build / build", + "name": "linux-xenial-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935207?check_suite_focus=true" 
+ "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936823" }, { - "name": "linux-xenial-py3.7-clang7-asan / build", + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935381?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928990551" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935482?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928990588" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "name": "linux-docs / build-docs (cpp)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935669?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992832" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "name": "linux-docs / build-docs (python)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935747?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992868" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935802?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992932" }, { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935884?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992965" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498935972?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993011" }, { - "name": "win-vs2019-cpu-py3 / build", + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6498936102?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993042" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499060931?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993086" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/6499060996?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993128" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499065639?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995802" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499065699?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499065764?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995853" }, { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499065815?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995889" }, { "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499069355?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928997626" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499078217?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928999058" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499078276?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499104194?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928999075" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499104243?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012407" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499104298?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012438" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499104357?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012469" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499104403?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929034328" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499108043?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929034340" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499152001?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153180?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153280?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153315?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153355?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153395?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929040801" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153439?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929045939" }, { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153610?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929046016" }, { - "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499153676?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929046063" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "name": 
"win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259414?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929082254" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259466?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929082275" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259509?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157614" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259568?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157635" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259607?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157656" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNi1Nc=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHxIT4=", + "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148369" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018405" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu1E=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkGU=" } ], "pageInfo": { "hasNextPage": false } }, - "pushedDate": "2022-05-19T00:02:11Z", - "oid": "81261599614423baa17df72300b8e109677b6799" + "status": null, + "pushedDate": "2022-03-24T00:42:33Z", + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "torch/nn/cpp.py" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "seemethere" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0yM1QxNTo1MDo0NS0wNzowMLkyMDIyLTAzLTIzVDE1OjUwOjQ1LTA3OjAwzjbPEDg=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/74649\n\u21a9\ufe0f \u00a0[fb-only] Re-run with SSH instructions\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 6c3c3de (more details on the Dr. 
CI page):\n\n\n1/1 failures introduced in this PR\n\n\n1 failure not recognized by patterns:\n\n\n\nJob\nStep\nAction\n\n\n\n\n Lint / flake8-py3\nFail if there were any warnings\n\ud83d\udd01 rerun\n\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1076891218 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQDAOUg==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" } } - ] - }, - "changedFiles": 3, + ] + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "dreiss" + }, + { + "login": "kumpera" + }, + { + "login": "ezyang" + }, + { + "login": "stephenroller" + }, + { + "login": "swolchok" + }, + { + "login": "hyuen" + }, + { + "login": "orionr" + }, + { + "login": "dhruvbird" + }, + { + "login": "likethesky" + }, + { + "login": "lw" + }, + { + "login": "raziel" + }, + { + "login": "simpkins" + }, + { + "login": "ebyrne" + }, + { + "login": "Babar" + }, + { + "login": "kostmo" + }, + { + "login": "0x00b1" + }, + { + "login": "bhosmer" + }, + { + "login": "digantdesai" + }, + { + "login": "zdevito" + }, + { + "login": "bugra" + }, + { + "login": "kunalb" + }, + { + "login": "caraya10" + }, + { + "login": "kit1980" + }, + { + "login": "shoumikhin" + }, + { + "login": "huydhn" + }, + { + "login": "teytaud" + }, + { + "login": "xuzhao9" + }, + { + "login": "jansel" + }, + { + "login": "abhinavarora" + }, + { + "login": "b0noI" + }, + { + "login": "djthorne" + }, + { + "login": "nairbv" + }, + { + "login": "Mortimerp9" + }, + { + "login": "dadkins20" + }, + { + "login": "colesbury" + }, + { + "login": "laurencer" + }, + { + "login": "nickgg" + }, + { + "login": "yzhao30" + }, + { + "login": "rmaz" + }, + { + "login": "bearzx" + }, + { + "login": "mattjgalloway" + }, + { + "login": "chenyang78" + }, + { + "login": "yns88" + }, + { + "login": "lc0" + }, + { + "login": "wenleix" + }, + { + "login": "jingsh" + }, + { + "login": "mthrok" + }, + { + "login": "drdarshan" + }, + { + "login": "tvalentius" + }, + { + "login": "d4l3k" + }, + { + "login": "jamiemccrindle" + }, + { + "login": "kazhang" + }, + { + "login": "simonhollis" + }, + { + "login": "ajyu" + }, + { + "login": "govardhan" + }, + { + "login": "yinghai" + }, + { + "login": "zyan0" + }, + { + "login": "ajtulloch" + }, + { + "login": "smeenai" + }, + { + "login": "vtlam" + }, + { + "login": "pbelevich" + }, + { + "login": "VitalyFedyunin" + }, + { + "login": "dbish" + }, + { + "login": "khabinov" + }, + { + "login": "NicolasHug" + }, + { + "login": "jfix71" + }, + { + "login": "atuljangra" + }, + { + "login": "idning" + }, + { + "login": "soumith" + }, + { + "login": "nimin98" + }, + { + "login": "chaekit" + }, + { + "login": "radkris-git" + }, + { + "login": "xunnanxu" + }, + { + "login": "javier-m" + }, + { + "login": "jmdetloff" + }, + { + "login": "mostafaelhoushi" + }, + { + "login": "brianjo" + }, + { + "login": "wangkuiyi" + }, + { + "login": "suo" + }, + { + "login": "vkuzo" + }, + { + "login": "seemethere" + }, + { + "login": "cpuhrsch" + }, + { + "login": "qihqi" + }, + { + "login": "jackm321" + 
}, + { + "login": "linbinyu" + }, + { + "login": "neerajprad" + }, + { + "login": "rsemenov" + }, + { + "login": "ziky90" + }, + { + "login": "gmagogsfm" + }, + { + "login": "zzzwen" + }, + { + "login": "ikriv" + }, + { + "login": "deeptigp" + }, + { + "login": "andrewor14" + }, + { + "login": "jianyuh" + }, + { + "login": "cykustcc" + }, + { + "login": "highker" + }, + { + "login": "beauby" + }, + { + "login": "jeffreyksmithjr" + }, + { + "login": "suphoff" + }, + { + "login": "smessmer" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOACQ5JQ==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOACQ5JQ== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "ananthsub" + }, + { + "login": "firstprayer" + }, + { + "login": "malfet" + }, + { + "login": "fegin" + }, + { + "login": "hanton" + }, + { + "login": "zanqi" + }, + { + "login": "supriyar" + }, + { + "login": "kausv" + }, + { + "login": "dagitses" + }, + { + "login": "bilgeacun" + }, + { + "login": "caogao" + }, + { + "login": "miguelmartin75" + }, + { + "login": "penguinwu" + }, + { + "login": "shz117" + }, + { + "login": "ajliu" + }, + { + "login": "msaroufim" + }, + { + "login": "davides" + }, + { + "login": "alannnna" + }, + { + "login": "hlin09" + }, + { + "login": "hudeven" + }, + { + "login": "terrychenism" + }, + { + "login": "xiaomengy" + }, + { + "login": "jisaacso" + }, + { + "login": "fkhan1337" + }, + { + "login": "xing-liu" + }, + { + "login": "alanadakotashine" + }, + { + "login": "desertfire" + }, + { + "login": "YosuaMichael" + }, + { + "login": "banitag1" + }, + { + "login": "gchanan" + }, + { + "login": "dbort" + }, + { + "login": "bilalsal" + }, + { + "login": "DanilBaibak" + }, + { + "login": "serhaty" + }, + { + "login": "yf225" + }, + { + "login": "mlazos" + }, + { + "login": "yifuwang" + }, + { + "login": "z-a-f" + }, + { + "login": "tenpercent" + }, + { + "login": "bertmaher" + }, + { + "login": "chauhang" + }, + { + "login": "ZainRizvi" + }, + { + "login": "jiayisuse" + }, + { + "login": "bochko" + }, + { + "login": "jeanschmidt" + }, + { + "login": "bradleyhd" + }, + { + "login": "mullachv" + }, + { + "login": "voznesenskym" + }, + { + "login": "bwasti" + }, + { + "login": "NivekT" + }, + { + "login": "zhxchen17" + }, + { + "login": "jerryzh168" + }, + { + "login": "MohammadMahdiJavanmard" + }, + { + "login": "wconstab" + }, + { + "login": "Hangjun" + }, + { + "login": "davidberard98" + }, + { + "login": "Krovatkin" + }, + { + "login": "CamiWilliams" + }, + { + "login": "datumbox" + }, + { + "login": "aartibasant" + }, + { + "login": "xta0" + }, + { + "login": "zou3519" + }, + { + "login": "xman1979" + }, + { + "login": "suraj813" + }, + { + "login": "gqchen" + }, + { + "login": "george-qi" + }, + { + "login": "abhikrish" + }, + { + "login": "zhangguanheng66" + }, + { + "login": "mikeiovine" + }, + { + "login": "Chillee" + }, + { + "login": "albanD" + }, + { + "login": "bigfootjon" + }, + { + "login": "robotal" + }, + { + "login": "MarcioPorto" + }, + { + "login": "srsuryadev" + }, + { + "login": "IvanKobzarev" + }, + { + "login": "eprivezentsev" + }, + { + "login": "kwen2501" + }, + { + "login": "linux-jedi" + }, + { + "login": "chandlerzuo" + }, + { + "login": "otsneh" + }, + { + "login": "husthyc" + }, + { + "login": "briancoutinho" + }, + { + "login": "fduwjj" + }, + { + "login": "frank-wei" + }, + { + "login": "prabhat00155" + }, + { + 
"login": "QuentinDuval" + }, + { + "login": "atalman" + }, + { + "login": "xush6528" + }, + { + "login": "dracifer" + }, + { + "login": "SS-JIA" + }, + { + "login": "helunwencser" + }, + { + "login": "xw285cornell" + }, + { + "login": "hhbyyh" + }, + { + "login": "rohan-varma" + }, + { + "login": "jcaip" + }, + { + "login": "teng-li" + }, + { + "login": "larryliu0820" + }, + { + "login": "lyoka" + }, + { + "login": "SungMinCho" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOAH1fDg==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAH1fDg== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "cbalioglu" + }, + { + "login": "hl475" + }, + { + "login": "hwangjeff" + }, + { + "login": "Jack-Khuu" + }, + { + "login": "mehtanirav" + }, + { + "login": "nateanl" + }, + { + "login": "fuqianz" + }, + { + "login": "boyuantan" + }, + { + "login": "muntaqim" + }, + { + "login": "ymao1993" + }, + { + "login": "fmassa" + }, + { + "login": "esantorella" + }, + { + "login": "HamidShojanazeri" + }, + { + "login": "jubinchheda" + }, + { + "login": "mehdimashayekhi" + }, + { + "login": "rkindi" + }, + { + "login": "wanchaol" + }, + { + "login": "zephirefaith" + }, + { + "login": "kapilsh" + }, + { + "login": "plahera" + }, + { + "login": "SherlockNoMad" + }, + { + "login": "pritamdamania87" + }, + { + "login": "iseeyuan" + }, + { + "login": "protonu" + }, + { + "login": "terhuhf" + }, + { + "login": "aruntonic" + }, + { + "login": "gcatron" + }, + { + "login": "yingrliu" + }, + { + "login": "alexanderguzhva" + }, + { + "login": "angelayi" + }, + { + "login": "zhaoalex" + }, + { + "login": "vivekmig" + }, + { + "login": "sangongs" + }, + { + "login": "jspisak" + }, + { + "login": "akshaypandian" + }, + { + "login": "drej82" + }, + { + "login": "tktrungna" + }, + { + "login": "eellison" + }, + { + "login": "NarineK" + }, + { + "login": "andrewconnors" + }, + { + "login": "wenwei202" + }, + { + "login": "jg2912" + }, + { + "login": "robieta" + }, + { + "login": "mreso" + }, + { + "login": "soulitzer" + }, + { + "login": "PaliC" + }, + { + "login": "anijain2305" + }, + { + "login": "pvtuan10" + }, + { + "login": "huangyi1979" + }, + { + "login": "osalpekar" + }, + { + "login": "xiaohui-zhang" + }, + { + "login": "jerry39213gh" + }, + { + "login": "jarodhou" + }, + { + "login": "hlu1" + }, + { + "login": "huiguoo" + }, + { + "login": "H-Huang" + }, + { + "login": "vtsyvina" + }, + { + "login": "Nitrokitty" + }, + { + "login": "satgera" + }, + { + "login": "ngimel" + }, + { + "login": "markkm" + }, + { + "login": "EscapeZero" + }, + { + "login": "bdhirsh" + }, + { + "login": "cccclai" + }, + { + "login": "carolineechen" + }, + { + "login": "tugsbayasgalan" + }, + { + "login": "agunapal" + }, + { + "login": "frankseide" + }, + { + "login": "YazhiGao" + }, + { + "login": "mrshenli" + }, + { + "login": "bashnick" + }, + { + "login": "lena-kashtelyan" + }, + { + "login": "brad-mengchi" + }, + { + "login": "kimishpatel" + }, + { + "login": "aaronenyeshi" + }, + { + "login": "shajrawi" + }, + { + "login": "samdow" + }, + { + "login": "great-way" + }, + { + "login": "ashkan-software" + }, + { + "login": "bankawas" + }, + { + "login": "jbitton" + }, + { + "login": "jdsgomes" + }, + { + "login": "zhangxy988" + }, + { + "login": "samlurye" + }, + { + "login": "anjali411" + }, + { + "login": "joecummings" + }, + { + "login": "842974287" + }, + { + 
"login": "JacobSzwejbka" + }, + { + "login": "nishantpdce" + }, + { + "login": "srinivas212" + }, + { + "login": "shreyanb98" + }, + { + "login": "dzdang" + }, + { + "login": "naveedgol" + }, + { + "login": "Nayef211" + }, + { + "login": "zrphercule" + }, + { + "login": "HengruiX" + }, + { + "login": "langong347" + }, + { + "login": "ebsmothers" + }, + { + "login": "anshuljain1" + }, + { + "login": "salilsdesai" + } + ], + "pageInfo": { + "hasNextPage": true, + "endCursor": "Y3Vyc29yOnYyOpHOAYM3gA==" + } + } + } + } + } + }, + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAYM3gA== name=metamates org=pytorch": { + "data": { + "organization": { + "team": { + "members": { + "nodes": [ + { + "login": "vmoens" + }, + { + "login": "yoavnavon" + }, + { + "login": "printfoo" + }, + { + "login": "xinyang0" + }, + { + "login": "abhinav19792" + }, + { + "login": "fbbradheintz" + }, + { + "login": "kauterry" + }, + { + "login": "anirbanraywork" + }, + { + "login": "houseroad" + }, + { + "login": "erichan1" + }, + { + "login": "hsrussell" + }, + { + "login": "ilia-cher" + }, + { + "login": "ajitmaths" + }, + { + "login": "awgu" + }, + { + "login": "wz337" + }, + { + "login": "qxy11" + }, + { + "login": "janeyx99" + }, + { + "login": "msedwar" + }, + { + "login": "glaringlee" + }, + { + "login": "anj-s" + }, + { + "login": "drisspg" + }, + { + "login": "kmh4321" + }, + { + "login": "RdoubleA" + }, + { + "login": "jramseyer" + }, + { + "login": "jianingfu" + }, + { + "login": "zengk95" + }, + { + "login": "gtarjun" + }, + { + "login": "mikaylagawarecki" + }, + { + "login": "xianxl" + }, + { + "login": "mingzhe09088" + }, + { + "login": "aazzolini" + }, + { + "login": "nataliakliushkina" + }, + { + "login": "Xirider" + }, + { + "login": "HDCharles" + }, + { + "login": "mcr229" + }, + { + "login": "manuelcandales" + }, + { + "login": "guangy10" + }, + { + "login": "mengwa41" + }, + { + "login": "YulunW" + }, + { + "login": "hx89" + }, + { + "login": "hanhsienhuang" + }, + { + "login": "clee2000" + }, + { + "login": "lhuang04" + }, + { + "login": "sidneyfletcher" + }, + { + "login": "gottbrath" + }, + { + "login": "lessw2020" + }, + { + "login": "taivu1998" + }, + { + "login": "danrecoskie" + }, + { + "login": "zhaojuanmao" + }, + { + "login": "johncalab" + }, + { + "login": "dhthompson" + }, + { + "login": "superwizard2019" + }, + { + "login": "shunting314" + }, + { + "login": "edward-io" + }, + { + "login": "xcheng16" + }, + { + "login": "adamomainz" + }, + { + "login": "sluks" + }, + { + "login": "SebastianAment" + }, + { + "login": "poojahp" + }, + { + "login": "ansley" + }, + { + "login": "cheetah2216" + }, + { + "login": "pinaki-mukerji" + }, + { + "login": "hongxiayang" + }, + { + "login": "kyulee-com" + }, + { + "login": "sstsai-adl" + }, + { + "login": "dahsh" + }, + { + "login": "szewaiyuen7" + }, + { + "login": "byterover" + }, + { + "login": "wmao533" + }, + { + "login": "ejguan" + }, + { + "login": "nimaelyasi" + }, + { + "login": "qxu-fb" + }, + { + "login": "sshawnwu" + }, + { + "login": "njuvekar" + }, + { + "login": "iramazanli" + }, + { + "login": "jnkwok1" + }, + { + "login": "kurman" + }, + { + "login": "jbschlosser" + }, + { + "login": "haichuan-fb" + }, + { + "login": "janghyuncho" + }, + { + "login": "wwang84" + }, + { + "login": "JustinPinero" + }, + { + "login": "gcramer23" + }, + { + "login": "yuguo68" + }, + { + "login": "c-odrin" + }, + { + "login": "chowarfb" + }, + { + "login": "priyaramani" + }, + { + "login": "asalioufb" + 
}, + { + "login": "four4fish" + }, + { + "login": "kkosik20" + }, + { + "login": "KZFB" + }, + { + "login": "henryliu-bluehills" + }, + { + "login": "muchulee8" + }, + { + "login": "bchen2020" + }, + { + "login": "anirbanr-fb-r2p" + }, + { + "login": "kirklandsign" + }, + { + "login": "izaitsevfb" + }, + { + "login": "ashramac" + }, + { + "login": "weiwangmeta" + }, + { + "login": "andysamfb" + } + ], + "pageInfo": { + "hasNextPage": false, + "endCursor": "Y3Vyc29yOnYyOpHOBp303g==" + } + } + } + } + } + }, + "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "files": { + "nodes": [ + { + "path": "docs/source/quantization.rst" + }, + { + "path": "docs/source/scripts/build_quantization_configs.py" + }, + { + "path": "test/allowlist_for_publicAPI.json" + }, + { + "path": "test/cpp/jit/source_range_test.cpp" + }, + { + "path": "test/cpp/jit/test_backend.cpp" + }, + { + "path": "test/cpp/jit/test_flatbuffer.cpp" + }, + { + "path": "test/cpp/jit/test_misc.cpp" + }, + { + "path": "test/cpp/jit/test_utils.h" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl.ff" + }, + { + "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl.ff" + }, + { + "path": "test/cpp/profiler/record_function.cpp" + }, + { + "path": "test/distributed/_shard/sharded_tensor/test_sharded_tensor.py" + }, + { + "path": "test/distributed/_shard/test_replicated_tensor.py" + }, + { + "path": "test/distributed/fsdp/test_fsdp_comm.py" + }, + { + "path": "test/distributed/fsdp/test_fsdp_optim_state.py" + }, + { + "path": "test/distributed/optim/test_zero_redundancy_optimizer.py" + }, + { + "path": "test/jit/test_export_modes.py" + }, + { + "path": "test/jit/test_if_hoisting.py" + }, + { + "path": "test/jit/test_tracer.py" + }, + { + "path": "test/jit/test_upgraders.py" + }, + { + "path": "test/mobile/test_lite_script_type.py" + }, + { + "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + }, + { + "path": "test/onnx/test_operators.py" + }, + { + "path": "test/onnx/test_pytorch_onnx_onnxruntime.py" + }, + { + "path": "test/quantization/ao_migration/test_quantization_fx.py" + }, + { + "path": "test/quantization/core/test_quantized_op.py" + }, + { + "path": "test/quantization/core/test_quantized_tensor.py" + }, + { + "path": "test/quantization/fx/test_numeric_suite_fx.py" + }, + { + "path": "test/quantization/fx/test_quantize_fx.py" + }, + { + "path": "test/test_autograd.py" + }, + { + "path": "test/test_binary_ufuncs.py" + }, + { + "path": "test/test_expanded_weights.py" + }, + { + "path": "test/test_functionalization.py" + }, + { + "path": "test/test_fx_experimental.py" + }, + { + 
"path": "test/test_jit.py" + }, + { + "path": "test/test_jit_cuda_fuser.py" + }, + { + "path": "test/test_linalg.py" + }, + { + "path": "test/test_nestedtensor.py" + }, + { + "path": "test/test_nn.py" + }, + { + "path": "test/test_ops.py" + }, + { + "path": "test/test_ops_gradients.py" + }, + { + "path": "test/test_ops_jit.py" + }, + { + "path": "test/test_optim.py" + }, + { + "path": "test/test_overrides.py" + }, + { + "path": "test/test_profiler.py" + }, + { + "path": "test/test_public_bindings.py" + }, + { + "path": "test/test_pytree.py" + }, + { + "path": "test/test_reductions.py" + }, + { + "path": "test/test_sort_and_select.py" + }, + { + "path": "test/test_sparse.py" + }, + { + "path": "test/test_sparse_csr.py" + }, + { + "path": "test/test_spectral_ops.py" + }, + { + "path": "test/test_tensor_creation_ops.py" + }, + { + "path": "test/test_tensorboard.py" + }, + { + "path": "test/test_testing.py" + }, + { + "path": "test/test_torch.py" + }, + { + "path": "test/test_unary_ufuncs.py" + }, + { + "path": "third_party/BUCK.github" + }, + { + "path": "third_party/fbgemm" + }, + { + "path": "tools/autograd/derivatives.yaml" + }, + { + "path": "tools/autograd/gen_inplace_or_view_type.py" + }, + { + "path": "tools/autograd/load_derivatives.py" + }, + { + "path": "tools/build_variables.bzl" + }, + { + "path": "tools/codegen/api/autograd.py" + }, + { + "path": "tools/codegen/api/cpp.py" + }, + { + "path": "tools/codegen/api/dispatcher.py" + }, + { + "path": "tools/codegen/api/functionalization.py" + }, + { + "path": "tools/codegen/api/lazy.py" + }, + { + "path": "tools/codegen/api/meta.py" + }, + { + "path": "tools/codegen/api/native.py" + }, + { + "path": "tools/codegen/api/python.py" + }, + { + "path": "tools/codegen/api/structured.py" + }, + { + "path": "tools/codegen/api/translate.py" + }, + { + "path": "tools/codegen/api/types.py" + }, + { + "path": "tools/codegen/api/ufunc.py" + }, + { + "path": "tools/codegen/api/unboxing.py" + }, + { + "path": "tools/codegen/code_template.py" + }, + { + "path": "tools/codegen/context.py" + }, + { + "path": "tools/codegen/decompositions/gen_jit_decompositions.py" + }, + { + "path": "tools/codegen/dest/__init__.py" + }, + { + "path": "tools/codegen/dest/lazy_ir.py" + }, + { + "path": "tools/codegen/dest/lazy_ts_lowering.py" + }, + { + "path": "tools/codegen/dest/native_functions.py" + }, + { + "path": "tools/codegen/dest/register_dispatch_key.py" + }, + { + "path": "tools/codegen/dest/ufunc.py" + }, + { + "path": "tools/codegen/gen.py" + }, + { + "path": "tools/codegen/gen_backend_stubs.py" + }, + { + "path": "tools/codegen/gen_functionalization_type.py" + }, + { + "path": "tools/codegen/gen_lazy_tensor.py" + }, + { + "path": "tools/codegen/local.py" + }, + { + "path": "tools/codegen/model.py" + }, + { + "path": "tools/codegen/operator_versions/gen_mobile_upgraders.py" + } + ], + "pageInfo": { + "endCursor": "MjAw", + "hasNextPage": true + } + } + } + } + } + }, + "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MjAw name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { "files": { "nodes": [ { - "path": ".circleci/docker/build.sh" + "path": "tools/codegen/selective_build/operator.py" + }, + { + "path": "tools/codegen/selective_build/selector.py" + }, + { + "path": "tools/codegen/shape_functions/gen_jit_shape_functions.py" + }, + { + "path": "tools/codegen/static_runtime/config.py" + }, + { + "path": "tools/codegen/static_runtime/gen_static_runtime_ops.py" + }, + { + "path": 
"tools/codegen/static_runtime/gen_structured.py" + }, + { + "path": "tools/codegen/utils.py" + }, + { + "path": "tools/linter/adapters/circleci_linter.py" + }, + { + "path": "tools/linter/adapters/clangformat_linter.py" + }, + { + "path": "tools/linter/adapters/grep_linter.py" + }, + { + "path": "tools/linter/adapters/nativefunctions_linter.py" + }, + { + "path": "tools/setup_helpers/BUILD.bazel" + }, + { + "path": "tools/setup_helpers/generate_code.py" + }, + { + "path": "torch/_C/__init__.pyi.in" + }, + { + "path": "torch/amp/autocast_mode.py" + }, + { + "path": "torch/ao/ns/fx/pattern_utils.py" + }, + { + "path": "torch/ao/quantization/backend_config/README.md" + }, + { + "path": "torch/ao/quantization/backend_config/__init__.py" + }, + { + "path": "torch/ao/quantization/backend_config/native.py" + }, + { + "path": "torch/ao/quantization/backend_config/observation_type.py" + }, + { + "path": "torch/ao/quantization/backend_config/tensorrt.py" + }, + { + "path": "torch/ao/quantization/backend_config/utils.py" + }, + { + "path": "torch/ao/quantization/fx/__init__.py" + }, + { + "path": "torch/ao/quantization/fx/backend_config/fuse_handler.py" + }, + { + "path": "torch/ao/quantization/fx/backend_config/quantize_handler.py" + }, + { + "path": "torch/ao/quantization/fx/backend_config_utils.py" + }, + { + "path": "torch/ao/quantization/fx/convert.py" + }, + { + "path": "torch/ao/quantization/fx/fuse.py" + }, + { + "path": "torch/ao/quantization/fx/fusion_patterns.py" + }, + { + "path": "torch/ao/quantization/fx/match_utils.py" + }, + { + "path": "torch/ao/quantization/fx/pattern_utils.py" + }, + { + "path": "torch/ao/quantization/fx/prepare.py" + }, + { + "path": "torch/ao/quantization/fx/quantization_patterns.py" + }, + { + "path": "torch/ao/quantization/qconfig.py" + }, + { + "path": "torch/ao/quantization/quantization_types.py" + }, + { + "path": "torch/ao/quantization/quantize_fx.py" + }, + { + "path": "torch/autograd/__init__.py" + }, + { + "path": "torch/csrc/Module.cpp" + }, + { + "path": "torch/csrc/autograd/FunctionsManual.cpp" + }, + { + "path": "torch/csrc/autograd/FunctionsManual.h" + }, + { + "path": "torch/csrc/autograd/engine.cpp" + }, + { + "path": "torch/csrc/autograd/function.h" + }, + { + "path": "torch/csrc/autograd/functions/accumulate_grad.h" + }, + { + "path": "torch/csrc/autograd/init.cpp" + }, + { + "path": "torch/csrc/autograd/python_torch_functions_manual.cpp" + }, + { + "path": "torch/csrc/autograd/python_variable.cpp" + }, + { + "path": "torch/csrc/autograd/record_function_ops.h" + }, + { + "path": "torch/csrc/autograd/utils/grad_layout_contract.h" + }, + { + "path": "torch/csrc/deploy/CMakeLists.txt" + }, + { + "path": "torch/csrc/distributed/c10d/logger.cpp" + }, + { + "path": "torch/csrc/jit/codegen/cuda/graph_fuser.cpp" + }, + { + "path": "torch/csrc/jit/codegen/cuda/parser.cpp" + }, + { + "path": "torch/csrc/jit/frontend/function_schema_parser.cpp" + }, + { + "path": "torch/csrc/jit/frontend/lexer.h" + }, + { + "path": "torch/csrc/jit/frontend/parser.cpp" + }, + { + "path": "torch/csrc/jit/frontend/parser.h" + }, + { + "path": "torch/csrc/jit/frontend/script_type_parser.cpp" + }, + { + "path": "torch/csrc/jit/frontend/source_range.cpp" + }, + { + "path": "torch/csrc/jit/frontend/source_range.h" + }, + { + "path": "torch/csrc/jit/frontend/source_ref.h" + }, + { + "path": "torch/csrc/jit/frontend/tracer.cpp" + }, + { + "path": "torch/csrc/jit/frontend/tracer.h" + }, + { + "path": "torch/csrc/jit/mobile/debug_info.cpp" + }, + { + "path": 
"torch/csrc/jit/mobile/debug_info.h" + }, + { + "path": "torch/csrc/jit/mobile/flatbuffer_loader.cpp" + }, + { + "path": "torch/csrc/jit/mobile/module.h" + }, + { + "path": "torch/csrc/jit/passes/common_expression_hoisting.cpp" + }, + { + "path": "torch/csrc/jit/passes/common_expression_hoisting.h" + }, + { + "path": "torch/csrc/jit/passes/frozen_graph_optimizations.cpp" + }, + { + "path": "torch/csrc/jit/passes/onnx/pattern_conversion/common.cpp" }, { - "path": ".circleci/docker/common/install_katex.sh" + "path": "torch/csrc/jit/passes/onnx/scalar_type_analysis.cpp" }, { - "path": ".github/workflows/pull.yml" - } - ], - "pageInfo": { - "endCursor": "Mw", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ + "path": "torch/csrc/jit/python/init.cpp" + }, { - "author": { - "login": "suo" - }, - "state": "COMMENTED" + "path": "torch/csrc/jit/python/python_tree_views.cpp" }, { - "author": { - "login": "kit1980" - }, - "state": "COMMENTED" + "path": "torch/csrc/jit/python/script_init.cpp" }, { - "author": { - "login": "janeyx99" - }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNS0xOFQxNDo0MTowNS0wNTowMLkyMDIyLTA1LTE4VDE0OjQxOjA0LTA1OjAwzjpD7es=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "path": "torch/csrc/jit/runtime/graph_executor.cpp" + }, { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/77700\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 8126159 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 1129400934 + "path": "torch/csrc/jit/runtime/interpreter.cpp" }, { - "bodyText": "@pytorchbot merge", - "author": { - "login": "kit1980" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1131884232 + "path": "torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp" }, { - "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) linux-docs / build-docs (cpp), linux-docs / build-docs (python) are pending/not yet run for rule OSS CI\nRaised by https://github.com/pytorch/pytorch/actions/runs/2353067846", - "author": { - "login": "pytorchmergebot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1131886153 + "path": "torch/csrc/jit/runtime/script_profile.cpp" }, { - "bodyText": "@pytorchbot merge -f", - "author": { - "login": "kit1980" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1131945610 + "path": "torch/csrc/jit/runtime/serialized_shape_function_registry.cpp" }, { - "bodyText": "Hey @kit1980.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' 
label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1131947473 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQ1FKZg==", - "hasPreviousPage": false - } - }, - "labels": { - "edges": [ + "path": "torch/csrc/jit/runtime/serialized_shape_function_registry.h" + }, { - "node": { - "name": "Merged" - } + "path": "torch/csrc/jit/runtime/shape_function_registry.h" }, { - "node": { - "name": "cla signed" - } - } - ] - } - } - } - } - }, - "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAYNi1Nc= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAYduu0A= name=pytorch number=77700 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ + "path": "torch/csrc/jit/runtime/shape_functions.h" + }, { - "commit": { - "oid": "81261599614423baa17df72300b8e109677b6799", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499259645?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499394792?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499394839?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499739021?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6499739073?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNqJcE=", - "hasNextPage": false - } - } - } - ] - } - } + "path": "torch/csrc/jit/runtime/shape_functions_1.h" + }, + { + "path": "torch/csrc/jit/runtime/static/impl.cpp" + }, + { + "path": "torch/csrc/jit/runtime/static/passes.cpp" + }, + { + "path": "torch/csrc/jit/runtime/symbolic_shape_registry.cpp" + }, + { + "path": "torch/csrc/jit/runtime/symbolic_shape_registry.h" + }, + { + "path": "torch/csrc/jit/serialization/export_module.cpp" + }, + { + "path": "torch/csrc/jit/serialization/flatbuffer_serializer.cpp" + }, + { + "path": "torch/csrc/jit/serialization/import.cpp" + }, + { + "path": "torch/csrc/jit/serialization/import_export_helpers.cpp" + }, + { + "path": "torch/csrc/jit/serialization/import_export_helpers.h" + }, + { + "path": "torch/csrc/jit/serialization/import_source.cpp" + }, + { + "path": "torch/csrc/jit/serialization/import_source.h" + }, + { + "path": "torch/csrc/jit/serialization/source_range_serialization.cpp" + }, + { + "path": "torch/csrc/jit/serialization/source_range_serialization.h" + }, + { + "path": "torch/csrc/jit/testing/file_check.cpp" + }, + { + "path": "torch/csrc/lazy/core/dynamic_ir.cpp" + }, + { + "path": "torch/csrc/lazy/core/dynamic_ir.h" + }, + { + "path": "torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp" 
} - ] + ], + "pageInfo": { + "endCursor": "MzAw", + "hasNextPage": true + } } } } } }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=68111 owner=pytorch": { + "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MzAw name=pytorch number=76118 owner=pytorch": { "data": { "repository": { "pullRequest": { - "closed": true, - "isCrossRepository": true, - "author": { - "login": "chunyuan-w" - }, - "title": "Add JIT graph fuser for oneDNN Graph API (Preview4)", - "body": "## Description\r\nPreview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).\r\n\r\nOn the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:\r\n\r\n- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used\r\n- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.\r\n\r\n### User API:\r\nThe optimization pass is disabled by default. Users could enable it by:\r\n```\r\ntorch.jit.enable_onednn_fusion(True)\r\n```\r\n\r\n### Performance:\r\n[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:\r\n- SkyLake 8180 (1 socket of 28 cores):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)\r\n\r\n- SkyLake 8180 (single thread):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)\r\n \\* By mapping hardswish to oneDNN Graph, it\u2019s 8% faster than PyTorch JIT (NNC + OFI)\r\n \\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops\r\n\r\n\r\n### Directory structure of the integration code\r\nFuser-related code are placed under:\r\n```\r\ntorch/csrc/jit/codegen/onednn/\r\n```\r\n\r\nOptimization pass registration is done in:\r\n```\r\ntorch/csrc/jit/passes/onednn_graph_fuser.h\r\n```\r\n\r\nCMake for the integration code is:\r\n```\r\ncaffe2/CMakeLists.txt\r\n```\r\n\r\n## Limitations\r\n\r\n- In this PR, we have only supported the optimization on Linux platform. 
The support on Windows and MacOS will be enabled as the next step.\r\n- We have only optimized the inference use case.", - "headRefName": "chunyuan/llga_preview2", - "headRepository": { - "nameWithOwner": "chunyuan-w/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ - { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "0096fcc49f277fd8e006fcb42e0cb28a1422ec98" - } - }, - { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "7bcc4de26a5472f1d252735dd425b46794b0844f" - } + "files": { + "nodes": [ + { + "path": "torch/csrc/lazy/ts_backend/ts_native_functions.cpp" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "3a2a588bfe6bbf9bf74d88d441cd22affda207da" - } + "path": "torch/csrc/utils/python_arg_parser.cpp" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "ca7df12fbfaa3ddbabeca39b76300d17f4a33f2f" - } + "path": "torch/csrc/utils/python_arg_parser.h" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "81d44f35b8bc043c38837d0694e5bc072203b832" - } + "path": "torch/csrc/utils/tensor_list.cpp" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "14fd5d1bfc2c58a71379f778871e3fca0a8e79b2" - } + "path": "torch/csrc/utils/tensor_new.cpp" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "954dc23663125897f4b199eb2a8607dc5fca3274" - } + "path": "torch/csrc/utils/tensor_new.h" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "9f77a0b476accc678b6f0569e4ff33fa6bbe97fc" - } + "path": "torch/distributed/_shard/__init__.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "fbf3b23bc1288697e1aec539a7c4ee3dc0bcb84c" - } + "path": "torch/distributed/_shard/api.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "f8b8e78f786586c3cdf3966fd83ffa124d3eda70" - } + "path": "torch/distributed/_shard/replicated_tensor.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "6fffa2f7453ee7e0f8d8e2f73ea8a65230539589" - } + "path": "torch/distributed/_shard/sharded_tensor/__init__.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "849385404e6f3cd1cf7cef19f931ecf4fa28afdb" - } + "path": "torch/distributed/_shard/sharded_tensor/api.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": 
"adbae7b77f8c0dbc59fccf15207d97ba86cfade2" - } + "path": "torch/distributed/_shard/sharded_tensor/utils.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "6dcf2a4981aff24fa16fc7461ae4ec29690f956f" - } + "path": "torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "54f3e05ad524cffd0911ee93be3c50f589b51f58" - } + "path": "torch/distributed/algorithms/model_averaging/utils.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "edbfc640ea79a0af85757d9e73796dcc90231519" - } + "path": "torch/distributed/fsdp/_optim_utils.py" }, { - "commit": { - "author": { - "user": { - "login": "chunyuan-w" - }, - "email": "chunyuan.wu@intel.com", - "name": "chunyuan" - }, - "oid": "67654db7cba562809d1b4a44cdda58af5cc9daaf" - } + "path": "torch/distributed/fsdp/fully_sharded_data_parallel.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "9c9d99b930b11af9ff03f52d45bf49c652df758d" - } + "path": "torch/distributed/nn/__init__.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "ffb25119cd9ce815cc4d9d14a2317fcbbfa9ea86" - } + "path": "torch/distributed/nn/functional.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "ab9eee84512ca1bdfbc81e25c6eb67b29d0f302a" - } + "path": "torch/distributed/optim/functional_adagrad.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "62a4642cf3330524990a69ac29e002c97812320a" - } + "path": "torch/fx/experimental/meta_tracer.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "ca9b1223be4af2c8b4929303d498eafd71793128" - } + "path": "torch/fx/graph.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "6f4a23d24514a02954d2ec792830085f612223c9" - } + "path": "torch/jit/_shape_functions.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "b2a9a9c0926b02d0b2e87722ed61450f224a61d0" - } + "path": "torch/nn/parallel/_replicated_tensor_ddp_interop.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "e88b492be733f24b6aa395829c76add67d0901e7" - } + "path": "torch/nn/parallel/_replicated_tensor_ddp_utils.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "c44336d7a914952bfb78e012e08d9a6d6dde5937" - } + "path": "torch/nn/parallel/distributed.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": 
"5157930f7b3921d41a586260582b574c915f6ca1" - } + "path": "torch/nn/utils/_expanded_weights/__init__.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "04cb8353813f6bbd0d913a994923cc7e1e291406" - } + "path": "torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "62991eaad0e638bb0bced327e03f932f66f68732" - } + "path": "torch/onnx/symbolic_opset11.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "7496bf1588050191595d833d23b8972b2f22655e" - } + "path": "torch/onnx/symbolic_opset12.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "d9d35f23cca0cd29c78a845731b24826152dcf1c" - } + "path": "torch/onnx/symbolic_opset9.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "f74ec134f18a65a7c72455bdf44f72e3ebb27105" - } + "path": "torch/optim/adagrad.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "eb32cc65a975361160948bfc3d6a577991ea262e" - } + "path": "torch/optim/lr_scheduler.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "c7665f8d695b680c54db0bad2b7b7df46d886b50" - } + "path": "torch/overrides.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "e6321ad8f59ea01130568c202d186448bb9cb9d0" - } + "path": "torch/quantization/fx/pattern_utils.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "a72cd0d02693f45e5354a70654581ad514581ec7" - } + "path": "torch/quantization/fx/quantization_patterns.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "b3cd3028b4ed31805e82f7eaf02217ab74ca59b9" - } + "path": "torch/quantization/fx/quantization_types.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "49a592d9788d08e6cd0593882f867e129057c1cc" - } + "path": "torch/return_types.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "0575766b2144b13f6a38227c4e2b8d22ec8db80f" - } + "path": "torch/testing/_internal/common_device_type.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "b5c9b10ff87d622350e8ca64fae3a476eb70d5aa" - } + "path": "torch/testing/_internal/common_distributed.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "66bc652a30ccc329adb929870a4ac726bb98b38c" - } + "path": 
"torch/testing/_internal/common_fx2trt.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "72b9ca9c8e2dac98cbb7199b3dfac7c7305b80c5" - } + "path": "torch/testing/_internal/common_methods_invocations.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "a7892ed7373207d96406c8b5734a089643c5cdbd" - } + "path": "torch/testing/_internal/common_utils.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "d54cb084e1daad8a08c3f8de0ad3f7afb5b05ac1" - } + "path": "torch/testing/_internal/composite_compliance.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" - }, - "oid": "aef71d692a8a159e0ca56be363e2cc1225ce7647" - } + "path": "torch/testing/_internal/distributed/distributed_test.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "bf618e205ec31cff962dcc8ab478e0a699a9572d" - } + "path": "torch/testing/_internal/jit_metaprogramming_utils.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "e4a331f1088448f7d7d86256ce71e0e71da006b0" - } + "path": "torch/utils/cpp_extension.py" }, { - "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "0b743523d1430fec759d5fefbb687f17c89335a5" - } + "path": "torch/utils/data/datapipes/_typing.py" }, + { + "path": "torch/utils/model_dump/__init__.py" + } + ], + "pageInfo": { + "endCursor": "MzQ4", + "hasNextPage": false + } + } + } + } + } + }, + "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAWuVD9M= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAXEsRtE= name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ { "commit": { - "author": { - "user": { - "login": "sanchitintel" - }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" - }, - "oid": "e80a351a62d98b810ec8985c4b25257af1d6c5bb" + "oid": "5696e8357cf38f852ef3d680381513e26f202371", + "checkSuites": { + "nodes": [ + { + "checkRuns": { + "nodes": [ + { + "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232785220" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuVECw=", + "hasNextPage": false + } + } + } + ] + } } - }, + } + ] + } + } + } + } + }, + "query_sha=e29d0e1d73b9847dacfd671c80e117d82111b407f7daa8ff885d3c444eafe47f name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "kshitij12345" + }, + "title": "[complex] conv_transpose1d", + "body": "Reference: https://github.com/pytorch/pytorch/issues/71108", + "headRefName": "develop/complex/conv_transpose1d", + "headRepository": { + "nameWithOwner": "kshitij12345/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + 
"defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "c189eca154b6691919d0e21489d1c322c7435c0b" + "oid": "d1ea948e65ac6d31ad056287ab65d38ecc68b30d" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "e080a067c75d7b888a8a362682a2d5ba70e0c3a8" + "oid": "b4ba1db9a3a71bd8c03158dcd1b68711360633d8" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "028561fbf8f3ed90e074e6e0e3a4ca4dd7ffa2a8" + "oid": "655a4220beae163bfe578f0318a130df01ec05d6" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "Kshiteej K" }, - "oid": "d550cf14037badd4caa2f52202e2f20bc4db8432" + "oid": "8181716be7a8005eb13ad5c3f2e1279ed1c60aff" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "574159ebadd1dec24daaf883879ffeca8d9e71b7" + "oid": "9e5ca3663e7471786eeebebfdf84aea5d761712f" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "9eb3ee98ea756067ed1c8f52f309f6d3e211a904" + "oid": "9c110f39bcdc4e56386b6f9c4e2c082c8940ade6" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "29929f48be03dcdd1bbfade572de7feafa825547" + "oid": "49315e79d0eee8008e2a74575c6fc0f6a9531ee4" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "8a7358ca8da547b40ea1a99ddc57ebed19959684" + "oid": "728752480760226270c374a0acc08e28b9b133f3" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "6606637d2c5525b43e294a8b366a85052e1be0c6" + "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "5ecfd1f28b87045deb8bc8ffe33b3d8b906f3264" + "oid": "9672a2198472567bae4ac6f55d004f7e1fa8a9fa" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": 
"sanchit.jain@intel.com", - "name": "sanchit.jain" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "be2d4345c65442c4cfbe8afdfb2ae0893945da42" + "oid": "48a0ebf32b895286f036b36c871f671dc867e400" } }, { "commit": { "author": { "user": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "email": "sanchit.jain@intel.com", - "name": "sanchitintel" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "b5b89d3644a43e2dbda841cafb71b32edbe07c8a" + "oid": "52fbe80d5c8a94e03d816c0bd21fd82019dcd5ac" } }, { "commit": { "author": { "user": { - "login": "malfet" + "login": "kshitij12345" }, - "email": "nikita.shulga@gmail.com", - "name": "Nikita Shulga" + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" }, - "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce" } } ], "pageInfo": { - "endCursor": "NjI", + "endCursor": "MTM", "hasNextPage": false }, - "totalCount": 62 + "totalCount": 13 }, "commits": { "nodes": [ @@ -11764,7 +19262,7 @@ { "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://code.intern.facebook.com/cla/" + "detailsUrl": "https://code.facebook.com/cla/" }, { "name": "Meta Internal-Only Changes Check", @@ -11773,14 +19271,14 @@ } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NXnc=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdtq8Hc=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625010" + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899098" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYwzI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqFo=" }, { "node": { @@ -11790,81 +19288,26 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { "nodes": [ { - "name": "clang-format", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633826958?check_suite_focus=true" - }, - { - "name": "py2-setup-validate-errormsg", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827084?check_suite_focus=true" - }, - { - "name": "quick-checks", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827160?check_suite_focus=true" - }, - { - "name": "shellcheck", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827410?check_suite_focus=true" - }, - { - "name": "toc", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827566?check_suite_focus=true" - }, - { - "name": "clang-tidy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827701?check_suite_focus=true" - }, - { - "name": "cmakelint", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827899?check_suite_focus=true" - }, - { - "name": "flake8-py3", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828081?check_suite_focus=true" - }, - { - "name": "Test collect_env (with_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828249?check_suite_focus=true" - }, - { - "name": "Test collect_env (without_torch)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828312?check_suite_focus=true" 
- }, - { - "name": "Test tools", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828407?check_suite_focus=true" - }, - { - "name": "mypy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828524?check_suite_focus=true" + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393316/jobs/4628529923" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NZqw=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqTEwk=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625458" + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899387" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqXs=" }, { "node": { @@ -11874,26 +19317,66 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + "name": "Lint" } }, "checkRuns": { "nodes": [ { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633826956?check_suite_focus=true" + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628529910" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530162" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530698" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530867" + }, + { + "name": "Test collect_env (older_python_version)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530989" + }, + { + "name": "pr-sanity-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531151" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531475" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531753" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531853" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NYIw=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqTHFY=", "hasNextPage": false } }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625463" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899388" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqXw=" }, { "node": { @@ -11909,782 +19392,1391 @@ "checkRuns": { "nodes": [ { - "name": "pytorch-xla-linux-bionic-py3.7-clang8", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827223?check_suite_focus=true" + "name": 
"linux-focal-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531149" }, { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827451?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531473" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827729?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531754" }, { - "name": "linux-bionic-rocm4.5-py3.7 / build", + "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633827956?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531857" }, { - "name": "linux-xenial-py3.7-clang7-asan / build", + "name": "linux-focal-py3.7-gcc7-pch / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828089?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532179" }, { - "name": "linux-xenial-py3.7-clang7-onnx / build", + "name": "linux-focal-py3.7-clang10-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828258?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532543" }, { - "name": "linux-bionic-py3.7-clang9 / build", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828406?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532694" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-focal-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828523?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532918" }, { - "name": "linux-xenial-py3.7-gcc5.4 / build", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533033" + }, + { + "name": "linux-focal-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533181" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533420" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533630" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533825" + }, + { + "name": 
"linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828594?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533959" }, { "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828765?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534129" }, { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "name": "linux-bionic-py3_7-clang8-xla / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633828992?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534256" }, { - "name": "linux-xenial-py3.7-gcc7 / build", + "name": "linux-focal-rocm5.2-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829085?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534388" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534571" + }, + { + "name": "linux-bionic-cuda11_6-py3_10-gcc7-deploy / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534714" + }, + { + "name": "win-vs2019-cuda11.6-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829195?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534989" }, { "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829321?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628535311" }, { - "name": "win-vs2019-cuda11.3-py3 / build", + "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829420?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639115" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829488?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639198" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829666?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639265" }, { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "name": "linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829746?check_suite_focus=true" + 
"detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639339" }, { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829845?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639395" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5633829904?check_suite_focus=true" + "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639450" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639509" }, { "name": "linux-docs / build-docs (cpp)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453168?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639572" }, { "name": "linux-docs / build-docs (python)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453232?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639635" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453388?check_suite_focus=true" + "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647047" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453444?check_suite_focus=true" + "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647119" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453499?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647215" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453573?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647277" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453624?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647348" }, { - "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634453683?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647432" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634462211?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647522" }, { - "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634462270?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647641" }, { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634602176?check_suite_focus=true" + "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647762" }, { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634602239?check_suite_focus=true" + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628653797" }, { - "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634602319?check_suite_focus=true" + "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679376" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634602425?check_suite_focus=true" + "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679431" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634622529?check_suite_focus=true" + "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679469" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634622639?check_suite_focus=true" + "name": "linux-focal-py3.7-clang7-asan / test (default, 
4, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679519" }, { - "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634622730?check_suite_focus=true" + "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679594" }, { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634637718?check_suite_focus=true" + "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628681226" }, { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634637817?check_suite_focus=true" + "name": "linux-bionic-cuda11_6-py3_10-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628854932" }, { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634775159?check_suite_focus=true" + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856434" }, { - "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634775273?check_suite_focus=true" + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856501" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634823038?check_suite_focus=true" + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856575" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqZ2fA=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899419" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqZs=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-libtorch-debug" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "libtorch-cpu-shared-with-deps-debug-build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351637/jobs/4634503587" }, { - "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634823099?check_suite_focus=true" + "name": 
"libtorch-cpu-shared-with-deps-debug-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351637/jobs/4635312938" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsbsmM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953056" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-wheel" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "wheel-py3_7-cuda11_3-build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351640/jobs/4634503571" }, { - "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634823171?check_suite_focus=true" + "name": "wheel-py3_7-cuda11_3-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351640/jobs/4636146265" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsskcw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953059" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-libtorch-release" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "libtorch-cpu-shared-with-deps-release-build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351643/jobs/4634503570" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634920855?check_suite_focus=true" + "name": "libtorch-cpu-shared-with-deps-release-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351643/jobs/4635003925" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsVbD8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953061" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-libtorch-cxx11-abi" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351698/jobs/4634504079" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634921428?check_suite_focus=true" - }, + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-test / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351698/jobs/4635072931" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsW5Aw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953185" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2E=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-libtorch-pre-cxx11" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634921484?check_suite_focus=true" + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351700/jobs/4634503897" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634921543?check_suite_focus=true" - }, + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-test / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351700/jobs/4635077148" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsW-jo=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953186" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2I=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-manywheel" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634995986?check_suite_focus=true" + "name": "manywheel-py3_7-cuda10_2-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351699/jobs/4634503896" }, { - "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5634996056?check_suite_focus=true" + "name": "manywheel-py3_7-cuda10_2-test / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351699/jobs/4635934290" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_fN1g=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsoMEA=", "hasNextPage": false } }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625483" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953187" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxQs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2M=" } ], "pageInfo": { - "hasNextPage": false + "hasNextPage": true } }, - "pushedDate": "2022-03-21T19:58:52Z", - "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + "status": null, + "pushedDate": "2022-08-22T22:04:19Z", + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce" } } ] }, - "changedFiles": 37, - "files": { - "nodes": [ - { - "path": "aten/src/ATen/core/interned_strings.h" - }, - { - "path": "caffe2/CMakeLists.txt" - }, - { - "path": "cmake/Dependencies.cmake" - }, - { - "path": "cmake/Modules/FindMKLDNN.cmake" - }, - { - "path": "cmake/public/mkldnn.cmake" - }, - { - 
"path": "docs/source/jit.rst" - }, - { - "path": "test/test_jit_llga_fuser.py" - }, - { - "path": "torch/_C/__init__.pyi.in" - }, - { - "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/README.md" - }, - { - "path": "torch/csrc/jit/codegen/onednn/defer_size_check.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/defer_size_check.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/graph_fuser.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/graph_fuser.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/graph_helper.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/graph_helper.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/graph_rewriter.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/guard_shape.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/guard_shape.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/interface.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/interface.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/kernel.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/kernel.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/layout_propagation.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/layout_propagation.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/operator.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/prepare_binary.cpp" - }, - { - "path": "torch/csrc/jit/codegen/onednn/prepare_binary.h" - }, - { - "path": "torch/csrc/jit/codegen/onednn/register_interface.cpp" - }, - { - "path": "torch/csrc/jit/ir/alias_analysis.cpp" - }, - { - "path": "torch/csrc/jit/ir/ir.cpp" - }, - { - "path": "torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp" - }, - { - "path": "torch/csrc/jit/passes/onednn_graph_fuser.h" - }, - { - "path": "torch/csrc/jit/python/init.cpp" - }, - { - "path": "torch/csrc/jit/runtime/operator.cpp" - }, - { - "path": "torch/jit/__init__.py" - } - ], - "pageInfo": { - "endCursor": "Mzc", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ - { - "author": { - "login": "pinzhenx" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "pinzhenx" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "pinzhenx" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "chunyuan-w" - }, - "state": "COMMENTED" - }, - { - "author": { - "login": "eellison" - }, - "state": "COMMENTED" - }, + "changedFiles": 3, + "files": { + "nodes": [ { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "path": "aten/src/ATen/native/Convolution.cpp" }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "path": "torch/testing/_internal/common_methods_invocations.py" }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" - }, + "path": "torch/testing/_internal/common_modules.py" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { "author": { - "login": "sanchitintel" + "login": "ngimel" }, - "state": "COMMENTED" - }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0xOVQxMDowNzo1NC0wNzowMLkyMDIyLTA3LTE5VDEwOjA3OjU0LTA3OjAwzj43QcY=", + "hasPreviousPage": false + } + }, + "comments": { + 
"nodes": [ { + "bodyText": "@pytorchbot merge -g\nAll is green internally!", "author": { - "login": "sanchitintel" + "login": "albanD" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224702749 }, { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here.\nThe merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.\nPlease reach out to the PyTorch DevX Team with feedback or questions!", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224705564 }, { + "bodyText": "Thanks for looking into it \ud83d\ude42 @albanD @jeanschmidt", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1224712351 }, { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "sanchitintel" + "login": "github-actions" }, - "state": "COMMENTED" + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1224956051 }, { + "bodyText": "Yeah, discussed with my manager and I got the required permissions to do so. Sorry for not responding promptly yesterday. 
But I am available from now on to provide assistance :)", "author": { - "login": "sanchitintel" + "login": "jeanschmidt" }, - "state": "COMMENTED" - }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1225462612 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOSP97HQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "node": { + "name": "open source" + } }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "node": { + "name": "Merged" + } }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "node": { + "name": "cla signed" + } }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "node": { + "name": "Reverted" + } }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" + "node": { + "name": "ciflow/trunk" + } }, { - "author": { - "login": "sanchitintel" - }, - "state": "COMMENTED" - }, + "node": { + "name": "ciflow/periodic" + } + } + ] + } + } + } + } + }, + "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOSP97HQ== name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/79694\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 2fd08f1 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", "author": { - "login": "sanchitintel" + "login": "facebook-github-bot" }, - "state": "COMMENTED" - }, - { - "author": { - "login": "sanchitintel" + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" }, - "state": "COMMENTED" + "databaseId": 1157454523 }, { + "bodyText": "Unable to reproduce jit failure locally (will skip the test)\nCI Failure : https://github.com/pytorch/pytorch/runs/6926187074?check_suite_focus=true#step:9:20230\npytest test/test_ops_jit.py -k test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 -v\n=============================================================== test session starts ===============================================================\nplatform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/kshiteej/.conda/envs/pytorch-cuda-dev/bin/python\ncachedir: .pytest_cache\nhypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/kshiteej/Pytorch/pytorch_complex_convolution.py/.hypothesis/examples')\nrootdir: /home/kshiteej/Pytorch/pytorch_complex_convolution.py, configfile: pytest.ini\nplugins: hypothesis-6.23.2, repeat-0.9.1\ncollected 1976 items / 1975 deselected / 1 selected \n\ntest/test_ops_jit.py::TestJitCPU::test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 PASSED [100%]\n\n================================================================ warnings summary =================================================================\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives\n from distutils.version import LooseVersion\n\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.\n warnings.warn(\n\n-- Docs: https://docs.pytest.org/en/stable/warnings.html\n================================================= 1 passed, 1975 deselected, 2 warnings in 4.90s =================================================", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" - }, - { - "author": { - "login": "sanchitintel" + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "kshitij12345" }, - "state": "COMMENTED" + "databaseId": 1186949486 }, { + "bodyText": "@pytorchbot merge", "author": { - "login": "wukong1992" + "login": "ngimel" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189347786 }, { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", "author": { - "login": "eellison" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189350009 }, { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' 
label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "eellison" + "login": "github-actions" }, - "state": "COMMENTED" + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1189350932 }, { + "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1189459845 }, { + "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189460926 }, { + "bodyText": "Will not revert as @kshitij12345 is not a MEMBER, but COLLABORATOR", "author": { - "login": "eellison" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189460942 }, { + "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", "author": { - "login": "sanchitintel" + "login": "anjali411" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189529734 }, { + "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189530756 }, { + "bodyText": "@kshitij12345 your PR has been successfully reverted.", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189530831 }, { + "bodyText": "@pytorchbot merge -g", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1190070141 }, { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1190071424 }, { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", "author": { - "login": "eellison" + "login": "github-actions" }, - "state": "APPROVED" + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1190258272 }, { + "bodyText": "commit is breaking internal builds/tests https://pastebin.com/HX4RUusH (pytorch/functorch/test:test_eager_transforms)", "author": { - "login": "sanchitintel" + "login": "jeanschmidt" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191327616 }, { + "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", "author": { - "login": "eellison" + "login": "jeanschmidt" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191328013 }, { + "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", "author": { - "login": "malfet" + "login": "jeanschmidt" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191329792 }, { + "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", "author": { - "login": "sanchitintel" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191330586 }, { + "bodyText": "@kshitij12345 your PR has been successfully reverted.", "author": { - "login": "malfet" + "login": "pytorchmergebot" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191330690 }, { + "bodyText": "@jeanschmidt which test is it failing on? I tried running the test_eager_transforms in functorch but couldn't reproduce it.", "author": { - "login": "malfet" + "login": "kshitij12345" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1193667568 }, { + "bodyText": "@jbschlosser have added a ref as discussed offline. Can you please take a look? And if it looks good, can you import the PR to check if it is breaking anything internally.\nThanks", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1204329491 }, { + "bodyText": "@jbschlosser @jeanschmidt @albanD anything we can do to unblock this on our side?", "author": { - "login": "sanchitintel" + "login": "lezcano" }, - "state": "COMMENTED" + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1221266218 }, { + "bodyText": "Functorch tests should be running here now so can you rebase on top of master please?", "author": { - "login": "sanchitintel" + "login": "albanD" }, - "state": "COMMENTED" + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1223129944 }, { + "bodyText": "@albanD have rebased on latest master.", "author": { - "login": "sanchitintel" + "login": "kshitij12345" }, - "state": "COMMENTED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMS0xMi0xMFQxMToyNDoxOS0wNjowMLkyMDIxLTEyLTEwVDExOjI0OjE5LTA2OjAwzjFryLE=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1223758571 + }, { - "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. 
I am reverting.", + "bodyText": "I triggered all the tests not to have any issues with slow tests again", "author": { - "login": "suo" + "login": "lezcano" }, - "authorAssociation": "MEMBER", + "authorAssociation": "COLLABORATOR", "editor": null, - "databaseId": 1074498483 + "databaseId": 1223796413 }, { - "bodyText": "@pytorchbot revert this", + "bodyText": "Thanks @lezcano! However, last time it was reverted for internal failures. So it would be great if someone can import and verify that.\ncc: @albanD @jeanschmidt", "author": { - "login": "suo" + "login": "kshitij12345" }, - "authorAssociation": "MEMBER", + "authorAssociation": "COLLABORATOR", "editor": null, - "databaseId": 1074498550 + "databaseId": 1223863075 }, { - "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. I am reverting.\n\nOops! Will fix it ASAP.", + "bodyText": "@albanD has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", "author": { - "login": "sanchitintel" + "login": "facebook-github-bot" }, - "authorAssociation": "CONTRIBUTOR", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1074499668 + "databaseId": 1224175731 }, { - "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "bodyText": "I am not the right person to provide assistence, as currently I am not based in a Tier 1 location, so my permissions to access are so restricted that I am not able to import this commit, run the tests and provide meaningful responses.", "author": { - "login": "facebook-github-bot" + "login": "jeanschmidt" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1074508608 + "databaseId": 1224272324 }, { - "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "bodyText": "@jeanschmidt has imported this pull request. 
If you are a Meta employee, you can view this diff on Phabricator.", "author": { "login": "facebook-github-bot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1082508130 + "databaseId": 1224351135 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQAuLsw==", - "hasPreviousPage": true + "startCursor": "Y3Vyc29yOnYyOpHORP1auw==", + "hasPreviousPage": false } - }, - "labels": { - "edges": [ - { - "node": { - "name": "oncall: jit" - } - }, - { - "node": { - "name": "triaged" - } - }, - { - "node": { - "name": "open source" - } - }, - { - "node": { - "name": "cla signed" - } - }, + } + } + } + } + }, + "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAdqZ2fA= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAdioqXw= name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ { - "node": { - "name": "Reverted" + "commit": { + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce", + "checkSuites": { + "nodes": [ + { + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856668" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856772" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856812" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (functorch, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856867" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628858900" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628858948" + }, + { + "name": "win-vs2019-cpu-py3 / test (functorch, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628859006" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqZ5lE=", + "hasNextPage": false + } + } + } + ] + } } - }, + } + ] + } + } + } + } + }, + "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAdkUS2M= name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "commits": { + "nodes": [ { - "node": { - "name": "intel priority" + "commit": { + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce", + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "trunk" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "macos-12-py3-x86-64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634504326" + }, + { + "name": "macos-12-py3-arm64 / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634504522" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634504655" + }, + { + "name": "caffe2-linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634504882" + }, + { + "name": "android-emulator-build-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505033" + }, + { + "name": "ios-12-5-1-x86-64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505167" + }, + { + "name": "linux-bionic-py3.7-clang9-slow / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505347" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505499" + }, + { + "name": "libtorch-linux-bionic-cuda11.6-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505639" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634505767" + }, + { + "name": "win-vs2019-cuda11.6-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634506032" + }, + { + "name": "macos-12-py3-x86-64-lite-interpreter / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634506202" + }, + { + "name": "linux-focal-rocm5.2-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634506357" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634506535" + }, + { + "name": "linux-bionic-py3.7-clang9-slow / test (slow, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634664404" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634669945" + }, + { + "name": "parallelnative-linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634670046" + }, + { + "name": "macos-12-py3-x86-64 / test (default, 1, 2, macos-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634734165" + }, + { + "name": "macos-12-py3-x86-64 / test (default, 2, 2, macos-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634734293" + }, + { + "name": "macos-12-py3-x86-64 / test (functorch, 1, 1, macos-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634734388" + }, + { + "name": 
"linux-focal-rocm5.2-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634772323" + }, + { + "name": "linux-focal-rocm5.2-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634772410" + }, + { + "name": "macos-12-py3-arm64 / test (default, 1, 2, macos-m1-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634812657" + }, + { + "name": "macos-12-py3-arm64 / test (default, 2, 2, macos-m1-12)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634812746" + }, + { + "name": "macos-12-py3-arm64-mps / Run MPS tests", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634812878" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634868761" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634868884" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869012" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869132" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (functorch, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869240" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (slow, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869348" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (slow, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869457" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_AVX512, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869537" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869649" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (jit_legacy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869743" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869861" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4634869984" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 1, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635049837" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 2, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635049935" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 3, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635050025" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 4, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635050129" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (default, 5, 5, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635050234" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (functorch, 1, 1, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635050323" + }, + { + "name": "win-vs2019-cuda11.6-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351701/jobs/4635050460" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsWbDg=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953192" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2g=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "periodic" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "ios-12-5-1-arm64-metal / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634504650" + }, + { + "name": "linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634504883" + }, + { + "name": "ios-12-5-1-arm64 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505024" + }, + { + "name": "buck-build-test / buck-build-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505165" + }, + { + "name": "ios-12-5-1-arm64-coreml / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505316" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7-debug / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505521" + }, + { + "name": "libtorch-linux-bionic-cuda11.7-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505667" + }, + { + "name": "linux-bionic-cuda11.7-py3.7-gcc7-debug / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634505786" + }, + { + "name": "linux-focal-rocm5.2-py3.7-slow / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634506031" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634506209" + }, + { + "name": "linux-focal-rocm5.2-py3.7-distributed / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634506353" + }, + { + "name": "win-vs2019-cuda11.7-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634506550" + }, + { + "name": "ios-12-5-1-x86-64-coreml / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634506968" + }, + { + "name": "ios-12-5-1-arm64-custom-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634507176" + }, + { + "name": "linux-focal-rocm5.2-py3.7-distributed / test (distributed, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634799214" + }, + { + "name": "linux-focal-rocm5.2-py3.7-distributed / test (distributed, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634799342" + }, + { + "name": "linux-focal-rocm5.2-py3.7-slow / test (slow, 1, 1, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634800216" + }, + { + "name": "linux-bionic-cuda10.2-py3.9-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634896194" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7-debug / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634955955" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7-debug / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634956066" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7-debug / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634956160" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7-debug / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634956251" + }, + { + "name": "linux-bionic-cuda11.7-py3.7-gcc7-debug / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634987167" + }, + { + "name": "linux-bionic-cuda11.7-py3.7-gcc7-debug / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634987289" + }, + { + "name": "linux-bionic-cuda11.7-py3.7-gcc7-debug / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", 
+ "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634987406" + }, + { + "name": "linux-bionic-cuda11.7-py3.7-gcc7-debug / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4634987543" + }, + { + "name": "win-vs2019-cuda11.7-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4635020787" + }, + { + "name": "win-vs2019-cuda11.7-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4635020896" + }, + { + "name": "win-vs2019-cuda11.7-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4635021008" + }, + { + "name": "linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4635184380" + }, + { + "name": "linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351759/jobs/4635184472" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsZHek=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953337" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS_k=" + } + ], + "pageInfo": { + "hasNextPage": false + } + } } } ] @@ -12693,212 +20785,93 @@ } } }, - "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHOQAuLsw== name=pytorch number=68111 owner=pytorch": { + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=pytorch-dev-infra org=pytorch": { "data": { - "repository": { - "pullRequest": { - "comments": { + "organization": { + "team": { + "members": { "nodes": [ { - "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/chunyuan-w/pytorch/blob/7496bf1588050191595d833d23b8972b2f22655e/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 
triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries/conda\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-manywheel\nciflow/binaries, ciflow/binaries/wheel\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab 
skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.1-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.1-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\n\n\nYou can add a comment to the PR and tag @pytorchbot with the following commands:\n\n# ciflow rerun, \"ciflow/default\" will always be added automatically\n@pytorchbot ciflow rerun\n\n# ciflow rerun with additional labels \"-l \", which is equivalent to adding these labels manually and trigger the rerun\n@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow\n\nFor more information, please take a look at the CI Flow Wiki.", - "author": { - "login": "pytorch-probot" - }, - "authorAssociation": "NONE", - "editor": { - "login": "pytorch-probot" - }, - "databaseId": 964902865 - }, - { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/68111\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 7388141 (more details on the Dr. 
CI page):\n\n\n29/29 failures introduced in this PR\n\n\n\ud83d\udd75\ufe0f 29 new failures recognized by patterns\nThe following CI failures do not appear to be due to upstream breakages:\n pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:31:38.6978776Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:31:38.3001628Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:31:38.5169168Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:31:38.5362923Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:31:38.5413452Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:31:38.5458747Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:31:38.5484014Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:31:38.5497924Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:31:38.5656491Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:31:38.5678893Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:31:38.6888479Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f6488c20adb4dca4\n2022-03-21T21:31:38.6978776Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:31:38.6992648Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:31:38.7003010Z ##[error]Process completed with exit code 2.\n2022-03-21T21:31:38.7044027Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:31:38.7044261Z with:\n2022-03-21T21:31:38.7044413Z env:\n2022-03-21T21:31:38.7044565Z IN_CI: 1\n2022-03-21T21:31:38.7044709Z IS_GHA: 1\n2022-03-21T21:31:38.7044885Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:31:38.7045067Z ##[endgroup]\n2022-03-21T21:31:38.7060958Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (2/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:35:19.2635222Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:35:18.9028722Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:35:19.1132721Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:35:19.1310590Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:35:19.1360251Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:35:19.1386865Z Requirement already satisfied: 
botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:35:19.1429182Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:35:19.1441925Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:35:19.1468280Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:35:19.1617667Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:35:19.2545368Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-098be2985e0392130\n2022-03-21T21:35:19.2635222Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:35:19.2648463Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:35:19.2658727Z ##[error]Process completed with exit code 2.\n2022-03-21T21:35:19.2706355Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:35:19.2706591Z with:\n2022-03-21T21:35:19.2706748Z env:\n2022-03-21T21:35:19.2706908Z IN_CI: 1\n2022-03-21T21:35:19.2707061Z IS_GHA: 1\n2022-03-21T21:35:19.2707246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:35:19.2707438Z ##[endgroup]\n2022-03-21T21:35:19.2724554Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (3/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:11:52.7662022Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T23:11:53.1213298Z ---------------------------------------- 8.1/8.1 MB 23.6 MB/s eta 0:00:00\n2022-03-21T23:11:53.1644665Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:11:53.2218699Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T23:11:53.2389674Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T23:11:53.2787295Z -------------------------------------- 247.7/247.7 KB 7.4 MB/s eta 0:00:00\n2022-03-21T23:11:53.3761842Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:11:53.5457622Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T23:11:57.4175080Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T23:11:57.5296815Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0105d4db093574f40\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:11:57.5564814Z + 
GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:11:57.5587712Z ##[error]Process completed with exit code 2.\n2022-03-21T23:11:57.5790311Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T23:11:57.5790832Z with:\n2022-03-21T23:11:57.5791104Z env:\n2022-03-21T23:11:57.5791358Z IN_CI: 1\n2022-03-21T23:11:57.5791620Z IS_GHA: 1\n2022-03-21T23:11:57.5791939Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:11:57.5792425Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T23:11:57.5792884Z ##[endgroup]\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu) (4/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T02:17:12.6257577Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T02:17:11.9280556Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T02:17:11.9335199Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:11.9682045Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T02:17:11.9850357Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0403171Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T02:17:12.0468875Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0590000Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T02:17:12.0607093Z Installing collected packages: jmespath, urllib3, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T02:17:12.5273459Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T02:17:12.6032812Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-114\n2022-03-22T02:17:12.6257577Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T02:17:12.6259543Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T02:17:12.6291924Z ##[error]Process completed with exit code 2.\n2022-03-22T02:17:12.6387977Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T02:17:12.6388298Z with:\n2022-03-22T02:17:12.6388521Z wait-ssh: false\n2022-03-22T02:17:12.6388727Z env:\n2022-03-22T02:17:12.6388932Z IN_CI: 1\n2022-03-22T02:17:12.6389143Z IS_GHA: 1\n2022-03-22T02:17:12.6389368Z GIT_DEFAULT_BRANCH: master\n2022-03-22T02:17:12.6389669Z DOCKER_HOST: unix:///run/user/1121/docker.sock\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge) (5/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:19:24.4890693Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:19:24.0962005Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:19:24.3152253Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:19:24.3341183Z Requirement already satisfied: boto3==1.19.12 
in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:19:24.3391374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:19:24.3436392Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:19:24.3448982Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:19:24.3474092Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:19:24.3502003Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:19:24.3655072Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:19:24.4799309Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0bc9250521f338cae\n2022-03-21T22:19:24.4890693Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:19:24.4903625Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:19:24.4913841Z ##[error]Process completed with exit code 2.\n2022-03-21T22:19:24.4957338Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:19:24.4957575Z with:\n2022-03-21T22:19:24.4957735Z env:\n2022-03-21T22:19:24.4957900Z IN_CI: 1\n2022-03-21T22:19:24.4958055Z IS_GHA: 1\n2022-03-21T22:19:24.4958246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:19:24.4958437Z ##[endgroup]\n2022-03-21T22:19:24.4989649Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (6/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T01:05:07.6983899Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T01:05:06.8364546Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T01:05:06.8431763Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.8949391Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T01:05:06.9180079Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.9803351Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T01:05:06.9882133Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:07.0067062Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T01:05:07.0088676Z Installing collected packages: urllib3, jmespath, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T01:05:07.5819667Z 
Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T01:05:07.6774717Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-60\n2022-03-22T01:05:07.6983899Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T01:05:07.6988652Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T01:05:07.7023073Z ##[error]Process completed with exit code 2.\n2022-03-22T01:05:07.7102087Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T01:05:07.7102389Z with:\n2022-03-22T01:05:07.7102603Z wait-ssh: false\n2022-03-22T01:05:07.7102820Z env:\n2022-03-22T01:05:07.7103015Z IN_CI: 1\n2022-03-22T01:05:07.7103224Z IS_GHA: 1\n2022-03-22T01:05:07.7103458Z GIT_DEFAULT_BRANCH: master\n2022-03-22T01:05:07.7103737Z DOCKER_HOST: unix:///run/user/1502/docker.sock\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge) (7/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:51:39.3637996Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:51:39.2041249Z Attempting uninstall: s3transfer\n2022-03-21T20:51:39.2043010Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:51:39.2083799Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:51:39.2089675Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:51:39.2480546Z Attempting uninstall: boto3\n2022-03-21T20:51:39.2482953Z Found existing installation: boto3 1.16.34\n2022-03-21T20:51:39.2584292Z Uninstalling boto3-1.16.34:\n2022-03-21T20:51:39.2599474Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:51:39.3130921Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:51:39.3550598Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03ef7efc3078e3da5\n2022-03-21T20:51:39.3637996Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:51:39.3650651Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:51:39.3660484Z ##[error]Process completed with exit code 2.\n2022-03-21T20:51:39.3696465Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:51:39.3696693Z with:\n2022-03-21T20:51:39.3696850Z env:\n2022-03-21T20:51:39.3697012Z IN_CI: 1\n2022-03-21T20:51:39.3697161Z IS_GHA: 1\n2022-03-21T20:51:39.3697342Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:51:39.3697528Z ##[endgroup]\n2022-03-21T20:51:39.3730420Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge) (8/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:36.3916860Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:03:36.0096309Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:03:36.2278560Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:03:36.2461618Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:03:36.2513260Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:03:36.2541524Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:03:36.2554899Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:03:36.2598277Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:03:36.2758299Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:03:36.2780690Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:03:36.3825021Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0a4a552890e6ef7d3\n2022-03-21T21:03:36.3916860Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:03:36.3930343Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:03:36.3941263Z ##[error]Process completed with exit code 2.\n2022-03-21T21:03:36.3979258Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:03:36.3979496Z with:\n2022-03-21T21:03:36.3979654Z env:\n2022-03-21T21:03:36.3979814Z IN_CI: 1\n2022-03-21T21:03:36.3979968Z IS_GHA: 1\n2022-03-21T21:03:36.3980157Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:03:36.3980360Z ##[endgroup]\n2022-03-21T21:03:36.3996257Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (9/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:41:10.3015614Z Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)\n2022-03-22T00:41:10.3625659Z ---------------------------------------- 79.5/79.5 KB 1.1 MB/s eta 0:00:00\n2022-03-22T00:41:10.4120236Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-22T00:41:10.4170155Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-22T00:41:10.4722115Z -------------------------------------- 247.7/247.7 KB 5.2 MB/s eta 0:00:00\n2022-03-22T00:41:10.4843512Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:41:10.6596108Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:41:10.8733354Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-22T00:41:15.3745408Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-22T00:41:15.4987162Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-09cacc848abc3dd32\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:41:15.5373630Z + 
GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:41:15.5404353Z ##[error]Process completed with exit code 2.\n2022-03-22T00:41:15.5790508Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-22T00:41:15.5791192Z with:\n2022-03-22T00:41:15.5791530Z env:\n2022-03-22T00:41:15.5791849Z IN_CI: 1\n2022-03-22T00:41:15.5792186Z IS_GHA: 1\n2022-03-22T00:41:15.5792599Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:41:15.5793237Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-22T00:41:15.5793831Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge) (10/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:32.9799307Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:32.8167560Z Attempting uninstall: s3transfer\n2022-03-21T20:50:32.8169351Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:50:32.8213295Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:50:32.8219209Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:50:32.8602320Z Attempting uninstall: boto3\n2022-03-21T20:50:32.8603289Z Found existing installation: boto3 1.16.34\n2022-03-21T20:50:32.8704535Z Uninstalling boto3-1.16.34:\n2022-03-21T20:50:32.8719403Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:50:32.9244278Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:50:32.9710449Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0c568461a276d4a71\n2022-03-21T20:50:32.9799307Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:32.9812238Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:32.9823052Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:32.9859290Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:32.9859527Z with:\n2022-03-21T20:50:32.9859664Z env:\n2022-03-21T20:50:32.9859817Z IN_CI: 1\n2022-03-21T20:50:32.9859977Z IS_GHA: 1\n2022-03-21T20:50:32.9860144Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:32.9860327Z ##[endgroup]\n2022-03-21T20:50:32.9893642Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge) (11/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7163042Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.6660824Z #10 0x55fc8a3ea801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.6661768Z #11 0x55fc8a3f57a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.6662455Z #12 0x55fc8a3f580b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.6663570Z #13 0x55fc8a3f5908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.6663952Z #14 0x55fc8a3f5908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.6664431Z #15 0x55fc8a3f5908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.6665304Z #16 0x55fc8a3f5ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7162113Z #17 
0x7f940d00f83f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7162534Z #18 0x55fc8a39a554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7162711Z \n2022-03-21T21:05:00.7163042Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.7334595Z + retcode=1\n2022-03-21T21:05:00.7334954Z + set -e\n2022-03-21T21:05:00.7335215Z + return 1\n2022-03-21T21:05:00.7338688Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.7339232Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.7340113Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.7340612Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.7341187Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.7341668Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.7344466Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (12/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:06:03.4437430Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:06:03.0752199Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:06:03.2853252Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:06:03.3032326Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:06:03.3081589Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:06:03.3093911Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:06:03.3120244Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:06:03.3162406Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:06:03.3188431Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:06:03.3337181Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:06:03.4348072Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ee48c8811fafc444\n2022-03-21T22:06:03.4437430Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:06:03.4450920Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:06:03.4461263Z ##[error]Process completed with exit code 2.\n2022-03-21T22:06:03.4502346Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:06:03.4502576Z with:\n2022-03-21T22:06:03.4502730Z env:\n2022-03-21T22:06:03.4502888Z IN_CI: 1\n2022-03-21T22:06:03.4503038Z IS_GHA: 1\n2022-03-21T22:06:03.4503302Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:06:03.4503492Z 
##[endgroup]\n2022-03-21T22:06:03.4519156Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (13/29)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:13.2205634Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:12.8679322Z + python3 -m pip install boto3==1.19.12\n2022-03-21T20:50:13.0744228Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T20:50:13.0916284Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T20:50:13.0964264Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T20:50:13.1005656Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T20:50:13.1017299Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T20:50:13.1041042Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T20:50:13.1189450Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T20:50:13.1208751Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T20:50:13.2119445Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d02da60fd18c22f5\n2022-03-21T20:50:13.2205634Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:13.2217939Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:13.2220259Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:13.2248664Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:13.2249012Z with:\n2022-03-21T20:50:13.2249260Z env:\n2022-03-21T20:50:13.2249500Z IN_CI: 1\n2022-03-21T20:50:13.2249738Z IS_GHA: 1\n2022-03-21T20:50:13.2250025Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:13.2250329Z ##[endgroup]\n2022-03-21T20:50:13.2272735Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (14/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:47:38.0451999Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:47:37.5554508Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:47:37.8411473Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:47:37.8631484Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:47:37.8699561Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:47:37.8737037Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:47:37.8754443Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:47:37.8814393Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:47:37.8849540Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:47:37.9059579Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:47:38.0336298Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0b44f47f4292089a2\n2022-03-21T23:47:38.0451999Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:47:38.0469471Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:47:38.0484106Z ##[error]Process completed with exit code 2.\n2022-03-21T23:47:38.0532678Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:47:38.0533007Z with:\n2022-03-21T23:47:38.0533223Z env:\n2022-03-21T23:47:38.0533440Z IN_CI: 1\n2022-03-21T23:47:38.0533649Z IS_GHA: 1\n2022-03-21T23:47:38.0533902Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:47:38.0534170Z GPU_FLAG: --gpus all\n2022-03-21T23:47:38.0534401Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge) (15/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:04:59.3115800Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:04:59.2595213Z #10 0x55a7f39a4801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:04:59.2595707Z #11 0x55a7f39af7a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:04:59.2597203Z #12 0x55a7f39af80b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:04:59.2598205Z #13 0x55a7f39af908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:04:59.2598697Z #14 0x55a7f39af908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:04:59.2599178Z #15 0x55a7f39af908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:04:59.2599747Z #16 0x55a7f39afccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:04:59.3114751Z #17 0x7f3b3822383f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:04:59.3115277Z #18 0x55a7f3954554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:04:59.3115468Z \n2022-03-21T21:04:59.3115800Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:04:59.3292385Z + retcode=1\n2022-03-21T21:04:59.3292781Z + set -e\n2022-03-21T21:04:59.3293062Z + return 1\n2022-03-21T21:04:59.3295462Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:04:59.3295802Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X 
]]\n2022-03-21T21:04:59.3296394Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:04:59.3296700Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:04:59.3297055Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:04:59.3297416Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:04:59.3299623Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (16/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:14:25.5525714Z Collecting jmespath<1.0.0,>=0.7.1\n2022-03-21T22:14:25.5568155Z Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)\n2022-03-21T22:14:25.5952617Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:14:25.6169392Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:14:25.6629996Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:14:25.6710247Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:14:25.8284354Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:14:25.9816751Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:14:31.6672236Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:14:31.7630473Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ed0915ecee5d2424\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:14:31.7876742Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:14:31.7897140Z ##[error]Process completed with exit code 2.\n2022-03-21T22:14:31.8195621Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:14:31.8196110Z with:\n2022-03-21T22:14:31.8196356Z env:\n2022-03-21T22:14:31.8196614Z IN_CI: 1\n2022-03-21T22:14:31.8196876Z IS_GHA: 1\n2022-03-21T22:14:31.8197169Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:14:31.8197652Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:14:31.8198093Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (17/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:19:15.8845728Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:19:15.5116060Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:19:15.7231476Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:19:15.7409711Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:19:15.7458478Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) 
(0.10.0)\n2022-03-21T21:19:15.7470508Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:19:15.7496799Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:19:15.7538362Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:19:15.7566161Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:19:15.7711630Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:19:15.8753543Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0e2b3b4ddb246ff2a\n2022-03-21T21:19:15.8845728Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:19:15.8859814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:19:15.8870165Z ##[error]Process completed with exit code 2.\n2022-03-21T21:19:15.8917039Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:19:15.8917279Z with:\n2022-03-21T21:19:15.8917433Z env:\n2022-03-21T21:19:15.8917586Z IN_CI: 1\n2022-03-21T21:19:15.8917734Z IS_GHA: 1\n2022-03-21T21:19:15.8917917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:19:15.8918102Z ##[endgroup]\n2022-03-21T21:19:15.8934572Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (18/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:19:48.5900162Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:19:48.0742254Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:19:48.3742563Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:19:48.3976536Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:19:48.4048700Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:19:48.4065374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:19:48.4128076Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:19:48.4164273Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:19:48.4202610Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:19:48.4416723Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:19:48.5773033Z ++ python3 
.github/scripts/get_workflow_job_id.py 2018440039 i-07ab7a3c4a5402af2\n2022-03-21T23:19:48.5900162Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:19:48.5919822Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:19:48.5936087Z ##[error]Process completed with exit code 2.\n2022-03-21T23:19:48.6007930Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:19:48.6008268Z with:\n2022-03-21T23:19:48.6008483Z env:\n2022-03-21T23:19:48.6008701Z IN_CI: 1\n2022-03-21T23:19:48.6008920Z IS_GHA: 1\n2022-03-21T23:19:48.6009170Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:19:48.6009440Z GPU_FLAG: --gpus all\n2022-03-21T23:19:48.6009671Z ##[endgroup]\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (19/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:53:59.0889659Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T22:53:59.6881416Z ---------------------------------------- 8.1/8.1 MB 14.0 MB/s eta 0:00:00\n2022-03-21T22:53:59.7427779Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:53:59.7691882Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:53:59.7779847Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:53:59.8281663Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:54:00.0185115Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:54:00.2359770Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:54:04.1208891Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:54:04.2505862Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03b4fbe63be8ef4b0\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:54:04.2891082Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:54:04.2919900Z ##[error]Process completed with exit code 2.\n2022-03-21T22:54:04.3377901Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:54:04.3378575Z with:\n2022-03-21T22:54:04.3378930Z env:\n2022-03-21T22:54:04.3379275Z IN_CI: 1\n2022-03-21T22:54:04.3379600Z IS_GHA: 1\n2022-03-21T22:54:04.3380023Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:54:04.3380691Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:54:04.3381278Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge) (20/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:09:34.0074610Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:09:33.6365531Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:09:33.8475619Z Defaulting to user installation because normal 
site-packages is not writeable\n2022-03-21T22:09:33.8655152Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:09:33.8704395Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:09:33.8716774Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:09:33.8760145Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:09:33.8785000Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:09:33.8811316Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:09:33.8960134Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:09:33.9984866Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d325eb9fd156146f\n2022-03-21T22:09:34.0074610Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:09:34.0087465Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:09:34.0101743Z ##[error]Process completed with exit code 2.\n2022-03-21T22:09:34.0154014Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:09:34.0154246Z with:\n2022-03-21T22:09:34.0154412Z env:\n2022-03-21T22:09:34.0154574Z IN_CI: 1\n2022-03-21T22:09:34.0154728Z IS_GHA: 1\n2022-03-21T22:09:34.0154917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:09:34.0155112Z ##[endgroup]\n2022-03-21T22:09:34.0191047Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge) (21/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:17.8502655Z [E request_callbac...yUniqueId(created_on=0, local_id=0) to be created.\n\n2022-03-21T21:03:14.4669960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxgdsmeer\n2022-03-21T21:03:14.4671407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxgdsmeer/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.4973023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1i2hfmpc\n2022-03-21T21:03:14.4973800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1i2hfmpc/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.5532339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgx4da7b0\n2022-03-21T21:03:14.5533064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgx4da7b0/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.7050673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0\n2022-03-21T21:03:14.7097127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3\n2022-03-21T21:03:14.7398339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2\n2022-03-21T21:03:14.7922283Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1\n2022-03-21T21:03:17.8502655Z [E request_callback_no_python.cpp:559] Received error while processing request type 261: false INTERNAL ASSERT FAILED at \"/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp\":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.\n2022-03-21T21:03:17.8503603Z Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):\n2022-03-21T21:03:17.8504385Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x69 (0x7f180df19e19 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505131Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xd2 (0x7f180df160e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505927Z frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) + 0x4e (0x7f180df17a7e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8506674Z frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x4b4 (0x7f18118b7b64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8507642Z frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr >) const + 0x70 (0x7f18118a7bf0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8508613Z frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0xc8 (0x7f1819736208 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8509749Z frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f18118ac914 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8510708Z frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f1819735865 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8511369Z frame #8: + 0x375249a (0x7f18118a949a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test (22/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERR...t available for the merge-base of your branch\"\ufffd[0m\n\n2022-03-21T20:01:07.7012399Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7012634Z \ufffd[36;1m# Covers the case where a previous tag doesn't exist for the tree\ufffd[0m\n2022-03-21T20:01:07.7012992Z \ufffd[36;1m# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. 
nightly\ufffd[0m\n2022-03-21T20:01:07.7013373Z \ufffd[36;1mif ! git rev-parse \"$MERGE_BASE:.circleci/docker\"; then\ufffd[0m\n2022-03-21T20:01:07.7013784Z \ufffd[36;1m echo \"Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit\"\ufffd[0m\n2022-03-21T20:01:07.7014149Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7014325Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7014573Z \ufffd[36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse \"$MERGE_BASE:.circleci/docker\")\ufffd[0m\n2022-03-21T20:01:07.7014907Z \ufffd[36;1m# If no image exists but the hash is the same as the previous hash then we should error out here\ufffd[0m\n2022-03-21T20:01:07.7015231Z \ufffd[36;1mif [[ \"${PREVIOUS_DOCKER_TAG}\" = \"${DOCKER_TAG}\" ]]; then\ufffd[0m\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch\"\ufffd[0m\n2022-03-21T20:01:07.7015931Z \ufffd[36;1m echo \" contact the PyTorch team to restore the original images\"\ufffd[0m\n2022-03-21T20:01:07.7016225Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7016400Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7016608Z \ufffd[36;1mecho ::set-output name=rebuild::yes\ufffd[0m\n2022-03-21T20:01:07.7027605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}\n2022-03-21T20:01:07.7027837Z env:\n2022-03-21T20:01:07.7028006Z IN_CI: 1\n2022-03-21T20:01:07.7028159Z IS_GHA: 1\n2022-03-21T20:01:07.7028346Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:01:07.7028589Z BASE_REVISION: 6643522db9ff595f564b8081de58b3a33c546178\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu) (23/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:49:54.2949572Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:49:53.8049151Z + python3 -m pip install boto3==1.19.12\n2022-03-22T00:49:54.0981629Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-22T00:49:54.1207562Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-22T00:49:54.1277146Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-22T00:49:54.1315027Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-22T00:49:54.1331813Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-22T00:49:54.1391622Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:49:54.1609217Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-22T00:49:54.1637417Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:49:54.2830197Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f7c32fe13be12fea\n2022-03-22T00:49:54.2949572Z python3: can't open file 
'.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:49:54.2966933Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:49:54.2982588Z ##[error]Process completed with exit code 2.\n2022-03-22T00:49:54.3031464Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T00:49:54.3031794Z with:\n2022-03-22T00:49:54.3032012Z env:\n2022-03-22T00:49:54.3032227Z IN_CI: 1\n2022-03-22T00:49:54.3032434Z IS_GHA: 1\n2022-03-22T00:49:54.3032681Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:49:54.3033084Z GPU_FLAG: --gpus all\n2022-03-22T00:49:54.3033312Z ##[endgroup]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (24/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:56:07.3365589Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T21:56:07.7926584Z ---------------------------------------- 8.1/8.1 MB 17.3 MB/s eta 0:00:00\n2022-03-21T21:56:07.9319362Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T21:56:07.9366132Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T21:56:08.0077590Z -------------------------------------- 247.7/247.7 KB 3.0 MB/s eta 0:00:00\n2022-03-21T21:56:08.0164070Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:56:08.1775537Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:56:08.3393469Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T21:56:12.4576766Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T21:56:12.5641959Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0afad69838118af0e\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:56:12.5905611Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:56:12.5927729Z ##[error]Process completed with exit code 2.\n2022-03-21T21:56:12.6239531Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T21:56:12.6240039Z with:\n2022-03-21T21:56:12.6240299Z env:\n2022-03-21T21:56:12.6240557Z IN_CI: 1\n2022-03-21T21:56:12.6240805Z IS_GHA: 1\n2022-03-21T21:56:12.6241118Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:56:12.6241613Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T21:56:12.6242052Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (25/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:46:39.5474616Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:46:39.1884210Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:46:39.3928976Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:46:39.4105069Z Requirement already satisfied: boto3==1.19.12 in 
/home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:46:39.4152571Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:46:39.4194931Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:46:39.4218947Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:46:39.4230812Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:46:39.4380089Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:46:39.4399461Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:46:39.5387703Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0888bed1149cca415\n2022-03-21T21:46:39.5474616Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:46:39.5487145Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:46:39.5497480Z ##[error]Process completed with exit code 2.\n2022-03-21T21:46:39.5541319Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:46:39.5541544Z with:\n2022-03-21T21:46:39.5541698Z env:\n2022-03-21T21:46:39.5541851Z IN_CI: 1\n2022-03-21T21:46:39.5541997Z IS_GHA: 1\n2022-03-21T21:46:39.5542176Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:46:39.5542361Z ##[endgroup]\n2022-03-21T21:46:39.5557878Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (26/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:34:57.0623859Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:34:56.9039884Z Attempting uninstall: s3transfer\n2022-03-21T21:34:56.9041446Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:34:56.9090783Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:34:56.9095968Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:34:56.9453014Z Attempting uninstall: boto3\n2022-03-21T21:34:56.9454356Z Found existing installation: boto3 1.16.34\n2022-03-21T21:34:56.9564320Z Uninstalling boto3-1.16.34:\n2022-03-21T21:34:56.9578035Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:34:57.0091363Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:34:57.0536230Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-034a3afd5d80b91fd\n2022-03-21T21:34:57.0623859Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:34:57.0637167Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:34:57.0647396Z ##[error]Process completed with exit code 2.\n2022-03-21T21:34:57.0688237Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:34:57.0688481Z with:\n2022-03-21T21:34:57.0688631Z env:\n2022-03-21T21:34:57.0688769Z IN_CI: 1\n2022-03-21T21:34:57.0688930Z IS_GHA: 1\n2022-03-21T21:34:57.0689109Z 
GIT_DEFAULT_BRANCH: master\n2022-03-21T21:34:57.0689462Z ##[endgroup]\n2022-03-21T21:34:57.0704768Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge) (27/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7896545Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.7395504Z #10 0x5597fd5a9801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.7396330Z #11 0x5597fd5b47a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.7396688Z #12 0x5597fd5b480b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.7398664Z #13 0x5597fd5b4908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.7399177Z #14 0x5597fd5b4908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.7399663Z #15 0x5597fd5b4908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.7399986Z #16 0x5597fd5b4ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7895241Z #17 0x7f0a5905983f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7895772Z #18 0x5597fd559554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7896033Z \n2022-03-21T21:05:00.7896545Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.8063448Z + retcode=1\n2022-03-21T21:05:00.8063787Z + set -e\n2022-03-21T21:05:00.8064058Z + return 1\n2022-03-21T21:05:00.8067638Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.8068127Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.8069018Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.8069500Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.8070105Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.8070580Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.8072640Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (28/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:48:17.3384813Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:48:16.8599645Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:48:17.1464241Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:48:17.1685222Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:48:17.1754164Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:48:17.1771662Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:48:17.1808722Z 
Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:48:17.1868636Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:48:17.1903889Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:48:17.2113746Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:48:17.3267404Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-01fe178c405417375\n2022-03-21T22:48:17.3384813Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:48:17.3402286Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:48:17.3418376Z ##[error]Process completed with exit code 2.\n2022-03-21T22:48:17.3470528Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:48:17.3470874Z with:\n2022-03-21T22:48:17.3471096Z env:\n2022-03-21T22:48:17.3471327Z IN_CI: 1\n2022-03-21T22:48:17.3471538Z IS_GHA: 1\n2022-03-21T22:48:17.3471802Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:48:17.3472083Z GPU_FLAG: --gpus all\n2022-03-21T22:48:17.3472322Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (29/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:16:38.9646300Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:16:38.7995969Z Attempting uninstall: s3transfer\n2022-03-21T21:16:38.7998039Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:16:38.8066994Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:16:38.8072844Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:16:38.8449275Z Attempting uninstall: boto3\n2022-03-21T21:16:38.8451430Z Found existing installation: boto3 1.16.34\n2022-03-21T21:16:38.8559828Z Uninstalling boto3-1.16.34:\n2022-03-21T21:16:38.8574290Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:16:38.9100438Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:16:38.9558098Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d779c59d277d32ee\n2022-03-21T21:16:38.9646300Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:16:38.9658894Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:16:38.9673240Z ##[error]Process completed with exit code 2.\n2022-03-21T21:16:38.9720106Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:16:38.9720333Z with:\n2022-03-21T21:16:38.9720485Z env:\n2022-03-21T21:16:38.9720645Z IN_CI: 1\n2022-03-21T21:16:38.9720793Z IS_GHA: 1\n2022-03-21T21:16:38.9720970Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:16:38.9721151Z ##[endgroup]\n2022-03-21T21:16:38.9736762Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 964902894 - }, - { - "bodyText": "@vitaly-fedyunin @gottbrath FYI that this is the oneDNN Graph API integration. It depends on the #63748.", - "author": { - "login": "Jianhui-Li" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 970451860 - }, - { - "bodyText": "CI failures are currently being caused by some issues in the CI infra, and are also occurring with other PRs.", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": null, - "databaseId": 990641309 + "login": "kit1980" }, { - "bodyText": "CI failures are unrelated.", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": null, - "databaseId": 991281407 + "login": "huydhn" }, { - "bodyText": "The CI failure is unrelated.", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": null, - "databaseId": 995389295 + "login": "b0noI" }, { - "bodyText": "Hi, thank you for the PR!\nDo you mind running a larger amount of torchbench and reporting numbers ? You can look at Jason's post here for what models are supported in script. Initially just the vision models would be useful. @Krovatkin also did some benchmarking of a traced Bert model and found on average a ~16% speedup with this PR.", - "author": { - "login": "eellison" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1015689390 + "login": "seemethere" }, { - "bodyText": "Thanks a lot for reviewing, @eellison & @Krovatkin!\nWe just wanted to let you know that we're working on the benchmarking & will get back to you in a day, or two.\nUPDATE (Jan 21): While running some TorchBench models, we discovered some composability issues, and are working to ensure that oneDNN Graph would complement PyTorch's existing fusion capabilities, not hinder them.\nUPDATE (Jan 24): We've resolved the issues & will update this PR later today. Thanks!", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": { - "login": "sanchitintel" - }, - "databaseId": 1016996190 + "login": "malfet" }, { - "bodyText": "Hello @eellison,\nWe used this TorchBench branch for comparison. compare_llga.sh can be run for comparison.\nFor benchmarking mobilenet_v3_large with hardswish support in oneDNN Graph, this oneDNN Graph branch can be used in third_party/ideep/mkl-dnn. It delivers a speedup over PyTorch JIT (NNC + OFI) because 21 additional reorders are prevented (the major factor here), and fusion with conv also helps further.\nThe next release of oneDNN Graph would have hardswish support.\nWe're also exploring adding a hardsigmoid op in oneDNN Graph.\nThank you!", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": { - "login": "sanchitintel" - }, - "databaseId": 1022709513 + "login": "DanilBaibak" }, { - "bodyText": "Please note that this PR should be merged after #71546, as #71546 changes the third_party/ideep commit (this PR also uses that ideep commit, but it'd probably be better to merge #71546 first, so that oneDNN v2.5.2 upgrade would be in a separate PR). 
Thank you!", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": null, - "databaseId": 1026330085 + "login": "ZainRizvi" }, { - "bodyText": "@sanchitintel mind rebasing and i'll land ?", - "author": { - "login": "eellison" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1055813984 + "login": "jeanschmidt" }, { - "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1057203495 + "login": "atalman" }, { - "bodyText": "Thanks a lot for taking a look, @eellison! To fix this error, we would enable Bazel build for oneDNN Graph.", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": { - "login": "sanchitintel" - }, - "databaseId": 1061230087 + "login": "mehtanirav" }, { - "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1063276600 + "login": "osalpekar" }, { - "bodyText": "@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1074355779 + "login": "janeyx99" }, { - "bodyText": "And graph_rewriter.cpp is full of DOS newlines...", - "author": { - "login": "malfet" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1074407452 + "login": "zengk95" }, { - "bodyText": "Hey @chunyuan-w.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1074471758 + "login": "clee2000" }, { - "bodyText": "Thanks a ton for your help, @malfet & @eellison! 
:)\nWe'll incorporate your suggestions in subsequent PR(s).", - "author": { - "login": "sanchitintel" - }, - "authorAssociation": "CONTRIBUTOR", - "editor": { - "login": "sanchitintel" - }, - "databaseId": 1074492365 + "login": "izaitsevfb" + }, + { + "login": "weiwangmeta" } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOOYM_0Q==", - "hasPreviousPage": false + "hasNextPage": false, + "endCursor": "Y3Vyc29yOnYyOpHOBoQSVA==" } } } } } }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=73969 owner=pytorch": { + "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=qwertyuiop org=pytorch": { + "data": { + "organization": { + "team": null + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=82169 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, - "isCrossRepository": true, + "isCrossRepository": false, "author": { - "login": "malfet" + "login": "ezyang" }, - "title": "Dummy change", - "body": "Test Plan: None at all\n\nDifferential Revision: D34753911\n\n", - "headRefName": "export-D34753911", + "title": "Move test_dtypes so it runs later", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):\n* __->__ #82169\n\nThe error messages it gives are very unhelpful (because a failure\ngets translated into \"dtype was not supported\" rather than the\nactual backtrace), so I'd rather get error messages about this after\nI've tested basic functionality.\n\nSigned-off-by: Edward Z. Yang ", + "headRefName": "gh/ezyang/1279/head", "headRepository": { - "nameWithOwner": "malfet/pytorch" + "nameWithOwner": "pytorch/pytorch" }, - "baseRefName": "master", + "baseRefName": "gh/ezyang/1279/base", "baseRepository": { "nameWithOwner": "pytorch/pytorch", "isPrivate": false, @@ -12913,20 +20886,44 @@ "commit": { "author": { "user": { - "login": "malfet" + "login": "ezyang" }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" }, - "oid": "4746da707a9912356f5179625da89616b228dc21" + "oid": "cef34da55a59da5a32494bff218ccd4978b659d3" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "83ad7e73a07111ac1d85e931d14360cc22c01edd" + } + }, + { + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "28140e4008289251b695385acfb48ac7a47cd49c" } } ], "pageInfo": { - "endCursor": "MQ", + "endCursor": "Mw", "hasNextPage": false }, - "totalCount": 1 + "totalCount": 3 }, "commits": { "nodes": [ @@ -12942,148 +20939,61 @@ }, "workflowRun": { "workflow": { - "name": "linux-vulkan-bionic-py3.7-clang9" + "name": "Lint" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "lintrunner", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928580?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310707890" }, { - "name": "test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483086020?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRQMQ=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592963" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QM=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928547?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aM=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592965" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QU=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-bionic-rocm4.5-py3.7" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", + "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928602?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708140" }, { - "name": "test (default, 1, 2, linux.rocm.gpu)", + "name": "Test collect_env (without_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483235366?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708223" }, { - "name": "test (default, 2, 2, linux.rocm.gpu)", + "name": "Test collect_env (older_python_version)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483235570?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708332" }, { - "name": "test (distributed, 1, 1, linux.rocm.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483235708?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbTiXw=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592966" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QY=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "win-vs2019-cuda11.3-py3" - } - }, - "checkRuns": { 
- "nodes": [ - { - "name": "build", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928594?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708496" }, { - "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483593208?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708710" }, { - "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "name": "Test tools", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483593337?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310708937" }, { - "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "name": "workflow-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483593461?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823981/jobs/4310709169" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbY_vU=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGj1lc=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592967" + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696649" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8k=" }, { "node": { @@ -13093,26 +21003,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { - "nodes": [ - { - "name": "build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928554?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2ao=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592969" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696651" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qk=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8s=" }, { "node": { @@ -13122,36 +21026,26 @@ }, "workflowRun": { "workflow": { - "name": "linux-docs" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928595?check_suite_focus=true" - }, - { - "name": "build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483078289?check_suite_focus=true" - }, - { - "name": "build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483078365?check_suite_focus=true" + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747823982/jobs/4310707884" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRIt0=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjz0w=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": 
"https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592970" + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696656" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qo=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9A=" }, { "node": { @@ -13161,41 +21055,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc7" + "name": "Lint" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928553?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483074693?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483074951?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483075182?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRFm4=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592971" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696660" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9Q=" }, { "node": { @@ -13205,26 +21078,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3-clang5-mobile-build" + "name": "pull" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928556?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aw=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592974" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696715" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q4=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdAs=" }, { "node": { @@ -13234,103 +21101,362 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "pull" } }, "checkRuns": { "nodes": [ { - "name": "shellcheck", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708487" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708713" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310708942" + }, + { + "name": "linux-focal-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709174" + }, + { + "name": "linux-bionic-py3_7-clang8-xla / build", + "conclusion": "SUCCESS", + 
"detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709340" + }, + { + "name": "linux-focal-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709579" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310709844" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710003" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710175" + }, + { + "name": "win-vs2019-cuda11.6-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710516" + }, + { + "name": "linux-focal-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710716" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310710890" + }, + { + "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711097" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711234" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711429" + }, + { + "name": "linux-focal-rocm5.2-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711603" + }, + { + "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711765" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310711946" + }, + { + "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310712129" + }, + { + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4310712276" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194495" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194591" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194659" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194749" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194858" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311194934" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311195003" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311220458" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311220540" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311222725" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311222869" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223128" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223225" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223324" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223396" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223496" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223569" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311223690" + }, + { + "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311224360" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928552?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311230050" }, { - "name": "quick-checks", + "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928797?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311301930" }, { - "name": "clang-tidy", + "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482929069?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302152" }, { - "name": "clang-format", + "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482929350?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302303" }, { - "name": "cmakelint", + "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482929628?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302433" }, { - "name": "toc", + "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482929838?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311302531" }, { - "name": "py2-setup-validate-errormsg", + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482929972?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491082" }, { - "name": "flake8-py3", + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482930102?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491172" }, { - "name": "mypy", + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491232" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491289" + }, + { + "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482930251?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2747824048/jobs/4311491348" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO4Es=", - "hasNextPage": false + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcG0YME=", + "hasNextPage": true } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592975" + "url": 
"https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696836" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q8=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdIQ=" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" - } + "name": "Facebook GitHub Tools", + "databaseId": 12274 }, + "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "build-and-test", + "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928573?check_suite_focus=true" + "detailsUrl": "https://code.intern.facebook.com/cla/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2b0=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjyQg=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592976" + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696896" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdMA=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697185" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdeE=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697205" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdfU=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697224" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdgg=" } ], "pageInfo": { "hasNextPage": true } }, - "pushedDate": "2022-03-09T15:57:16Z", - "oid": "4746da707a9912356f5179625da89616b228dc21" + "status": null, + "pushedDate": "2022-07-27T15:34:17Z", + "oid": "28140e4008289251b695385acfb48ac7a47cd49c" } } ] @@ -13339,7 +21465,7 @@ "files": { "nodes": [ { - "path": "tools/build_variables.bzl" + "path": "test/test_ops.py" } ], "pageInfo": { @@ -13348,54 +21474,88 @@ } }, "reviews": { - "nodes": [], + "nodes": [ + { + "author": { + "login": "zou3519" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "Chillee" + }, + "state": "APPROVED" + } + ], "pageInfo": { - "startCursor": null, + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0yNVQxNDo0NTozNS0wNzowMLkyMDIyLTA3LTI1VDE0OjQ1OjM1LTA3OjAwzj6XYmg=", "hasPreviousPage": false } }, "comments": { "nodes": [ { - "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/malfet/pytorch/blob/4746da707a9912356f5179625da89616b228dc21/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\nAdd ciflow 
labels to this PR to trigger more builds:\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-manywheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-rocm4.5-py3.7\nciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build\nciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nmacos-arm64-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-arm64-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, 
ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwindows-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nwindows-binary-libtorch-debug\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-libtorch-release\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-wheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.3-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\npytorch-xla-linux-bionic-py3.7-clang8\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla\n\ud83d\udeab skipped", + "bodyText": "@pytorchbot merge -f 
FORCE", + "createdAt": "2022-07-27T17:56:43Z", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197107402 + }, + { + "bodyText": "You need to provide a reason for using force merge, in the format @pytorchbot merge -f '[CATEGORY] Explanation'. With [CATEGORY] being one the following:\nEMERGENCY - an emergency fix to quickly address an issue\nMINOR - a minor fix such as cleaning locally unused variables, which shouldn't break anything\nPRE_TESTED - a previous CI run tested everything and you've only added minor changes like fixing lint\nOTHER - something not covered above", + "createdAt": "2022-07-27T17:56:45Z", "author": { "login": "pytorch-bot" }, "authorAssociation": "NONE", "editor": null, - "databaseId": 1063079053 + "databaseId": 1197107439 }, { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/73969\n\ud83d\udcc4 \u00a0Preview docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 4746da7 (more details on the Dr. CI page):\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "bodyText": "@pytorchbot merge -f \"[OTHER] normal land failed twice already\"", + "createdAt": "2022-07-27T17:57:28Z", "author": { - "login": "facebook-github-bot" + "login": "malfet" }, "authorAssociation": "MEMBER", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 1063079113 + "editor": null, + "databaseId": 1197108130 }, { - "bodyText": "This pull request was exported from Phabricator. Differential Revision: D34753911", + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "createdAt": "2022-07-27T18:08:13Z", "author": { - "login": "facebook-github-bot" + "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1063079731 + "databaseId": 1197119348 + }, + { + "bodyText": "Hey @ezyang.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-07-27T18:08:58Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1197120095 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOP11MjQ==", - "hasPreviousPage": false + "startCursor": "Y3Vyc29yOnYyOpHOR1poyg==", + "hasPreviousPage": true } }, "labels": { "edges": [ { "node": { - "name": "fb-exported" + "name": "Merged" } }, { @@ -13409,118 +21569,93 @@ } } }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAU2F-RA= name=pytorch number=73969 owner=pytorch": { + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=73811 owner=pytorch": { "data": { "repository": { "pullRequest": { - "commits": { - "nodes": [ - { - "commit": { - "oid": "4746da707a9912356f5179625da89616b228dc21", - "checkSuites": { - "edges": [ - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928591?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2c8=", - "hasNextPage": false - } - }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592977" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RE=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "Test tools" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928555?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2as=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592978" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RI=" - }, + "closed": true, + "isCrossRepository": false, + "author": { + "login": "seemethere" + }, + "title": "ci: Migrate metrics credentials to managed IAM", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* __->__ #73811\n\r\nMigrates our credentials to upload metrics statistics to managed IAM\r\ncredentials in order to make it easier to know where the credentials are\r\ncoming from and to make it easier to add more permissions / less\r\npermissions later on.\r\n\r\nRelates to work done in [D34535827](https://www.internalfb.com/diff/D34535827)\r\n\r\nSigned-off-by: Eli Uriegas ", + "headRefName": "gh/seemethere/215/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/seemethere/215/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "13c44d16a876a56bca479b4cf30715d21fa16e99" + } + }, + { + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + 
"email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ], + "pageInfo": { + "endCursor": "Mg", + "hasNextPage": false + }, + "totalCount": 2 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7" - } + "name": "Facebook GitHub Tools", + "databaseId": 12274 }, + "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928570?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483302702?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483302867?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483303104?check_suite_focus=true" + "detailsUrl": "https://code.intern.facebook.com/cla/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbUkMA=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqOaHA=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592980" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658275867" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RQ=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcBs=" }, { "node": { @@ -13530,26 +21665,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc7-no-ops" + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928607?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d8=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592981" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276090" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPo=" }, { "node": { @@ -13563,61 +21692,16 @@ } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928611?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483400398?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483400575?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbWDX8=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": 
"https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592982" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RY=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928548?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aQ=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592983" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276092" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rc=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcPw=" }, { "node": { @@ -13627,71 +21711,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-clang7-asan" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928603?check_suite_focus=true" - }, - { - "name": "test (default, 3, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483138456?check_suite_focus=true" - }, - { - "name": "test (default, 1, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483138698?check_suite_focus=true" - }, - { - "name": "test (default, 2, 3, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483139049?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSD-k=", - "hasNextPage": false + "name": "linux-xenial-py3-clang5-mobile-build" } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592985" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rk=" - }, - { - "node": { - "app": { - "name": "Facebook GitHub Tools", - "databaseId": 12274 - }, - "workflowRun": null, "checkRuns": { - "nodes": [ - { - "name": "Facebook CLA Check", - "conclusion": "SUCCESS", - "detailsUrl": "https://code.intern.facebook.com/cla/" - }, - { - "name": "Meta Internal-Only Changes Check", - "conclusion": "NEUTRAL", - "detailsUrl": "https://opensource.facebook.com/" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO574=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592986" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276094" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Ro=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP4=" }, { "node": { @@ -13701,31 +21734,20 @@ }, "workflowRun": { "workflow": { - "name": "pytorch-xla-linux-bionic-py3.7-clang8" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5482928559?check_suite_focus=true" - }, - { - "name": "test (xla, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483141123?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbSGAM=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592987" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276095" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Rs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcP8=" }, { "node": { @@ -13735,81 +21757,21 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-gcc5.4" + "name": "Lint" } }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928593?check_suite_focus=true" - }, - { - "name": "test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483106295?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483106609?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483106835?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483107050?check_suite_focus=true" - }, - { - "name": "test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483107208?check_suite_focus=true" - }, - { - "name": "test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483107483?check_suite_focus=true" - } - ], + "checkRuns": { + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRlJs=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592997" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276097" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-SU=" - } - ], - "pageInfo": { - "hasNextPage": true - } - } - } - } - ] - } - } - } - } - }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAU2F-SU= name=pytorch number=73969 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ - { - "commit": { - "oid": "4746da707a9912356f5179625da89616b228dc21", - "checkSuites": { - "edges": [ + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQE=" + }, { "node": { "app": { @@ -13818,41 +21780,20 @@ }, "workflowRun": { "workflow": { - "name": "linux-bionic-py3.7-clang9" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928550?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - 
"conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483083368?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483083553?check_suite_focus=true" - }, - { - "name": "test (noarch, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483083767?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRN_c=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593001" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276098" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Sk=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQI=" }, { "node": { @@ -13862,7 +21803,7 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3.7-clang7-onnx" + "name": "linux-xenial-py3.7-gcc7-no-ops" } }, "checkRuns": { @@ -13870,28 +21811,18 @@ { "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928572?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483120691?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5483120938?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1983602966/jobs/2839950629" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRySo=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUqObRM=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593014" + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276099" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-TY=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQM=" }, { "node": { @@ -13901,32 +21832,175 @@ }, "workflowRun": { "workflow": { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + "name": "Test tools" } }, "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5482928605?check_suite_focus=true" - } - ], + "nodes": [], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2d0=", + "endCursor": null, "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595593026" + "conclusion": "CANCELLED", + "url": "https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276100" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-UI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQQ=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": "CANCELLED", + "url": 
"https://github.com/pytorch/pytorch/commit/9d26f4e6d8c8df275ea546180fef42548257d2d7/checks?check_suite_id=5658276101" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVFCcQU=" } ], "pageInfo": { - "hasNextPage": false + "hasNextPage": true } - } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17044969?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17045014?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17044975?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-03-14T23:01:55Z", + "oid": "9d26f4e6d8c8df275ea546180fef42548257d2d7" + } + } + ] + }, + "changedFiles": 3, + "files": { + "nodes": [ + { + "path": ".github/templates/common.yml.j2" + }, + { + "path": ".github/workflows/generated-macos-11-py3-x86-64.yml" + }, + { + "path": ".github/workflows/update_pytorch_labels.yml" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "kit1980" + }, + "state": "APPROVED" + }, + { + "author": { + "login": "janeyx99" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0wNFQxNDoyNDo0OC0wODowMLkyMDIyLTAzLTA0VDE0OjI0OjQ4LTA4OjAwzjWwwqA=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1988337976", + "createdAt": "2022-03-15T17:43:28Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068270969 + }, + { + "bodyText": "@pytorchbot force merge this", + "createdAt": "2022-03-15T20:26:36Z", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068436128 + }, + { + "bodyText": "Merge failed due to Too many checksuites for commit\nRaised by https://github.com/pytorch/pytorch/actions/runs/1989076952", + "createdAt": "2022-03-15T20:27:47Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068437098 + }, + { + "bodyText": "@pytorchbot merge this", + "createdAt": "2022-03-15T21:18:55Z", + "author": { + "login": "seemethere" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1068482921 + }, + { + "bodyText": "Hey @seemethere.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-03-15T21:20:40Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1068484404 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP6yFeQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" } } ] @@ -13935,22 +22009,22 @@ } } }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=73099 owner=pytorch": { + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=31093 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, - "isCrossRepository": false, + "isCrossRepository": true, "author": { - "login": "BowenBao" + "login": "mingxiaoh" }, - "title": "[ONNX] Make graph name spec-compliant (#71961)", - "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* #73104\n* #73103\n* #73102\n* #73101\n* #73100\n* __->__ #73099\n\n[According to the ONNX spec](https://github.com/onnx/onnx/blob/main/docs/IR.md#names-within-a-graph),\nall names must adhere to C90 identifier syntax rules, which means no\ndashes.\n\nFixes: #30952", - "headRefName": "gh/BowenBao/138/head", + "title": "improve mkldnn convolution test coverage", + "body": "This pr will improve the test coverage of mkldnn convolution.\r\n1.test input: specific sensitive numbers\r\n2.pass criteria: output of mkldnn convolution matches output of thnn convolution\r\n3.coverage: by using coverage tool, we found out the following sensitive parameters. Overall the case will test 4352 patterns, takes 8.8s on my machine.\r\n\r\nto run the test case:\r\n\r\npython test_mkldnn_conv2d_ext.py\r\nor\r\npython run_test.py -i mkldnn_conv2d_ext\r\n\r\nIn case of failure, the pattern will be printed in the log for further debugging.\r\n\r\nactually, this PR is created to replace and improve that PR we created before(https://github.com/pytorch/pytorch/pull/25085) ", + "headRefName": "master", "headRepository": { - "nameWithOwner": "pytorch/pytorch" + "nameWithOwner": "mingxiaoh/pytorch" }, - "baseRefName": "gh/BowenBao/138/base", + "baseRefName": "master", "baseRepository": { "nameWithOwner": "pytorch/pytorch", "isPrivate": false, @@ -13965,12 +22039,12 @@ "commit": { "author": { "user": { - "login": "BowenBao" + "login": "11pikachu" }, - "email": "bowbao@microsoft.com", - "name": "BowenBao" + "email": "junx.du@intel.com", + "name": "dujun" }, - "oid": "3038b939eb2069653305c419326a0f47d2598e39" + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" } } ], @@ -13994,157 +22068,26 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161498?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNn9o=", - "hasNextPage": false - } - }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189561" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7k=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - 
"conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161648?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252387496?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252387628?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252387825?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkRE_E=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189562" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7o=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc7-no-ops" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161681?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJE=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189563" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7s=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3-clang5-mobile-build" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161670?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoIY=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189564" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7w=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + "name": "clang-format" } }, "checkRuns": { "nodes": [ { - "name": "build-and-test", + "name": "clang-format", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161691?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676797?check_suite_focus=true" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJs=", + "endCursor": "Y3Vyc29yOnYyOpHOQYu8fQ==", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189566" + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281097" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS74=" + "cursor": "Y3Vyc29yOnYyOpHORg1dyQ==" }, { "node": { @@ -14154,866 +22097,2625 @@ }, "workflowRun": { "workflow": { - "name": "linux-bionic-py3.7-clang9" + "name": "Lint" } }, "checkRuns": { "nodes": [ { - "name": "build", + "name": "flake8-py3", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5252161678?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676800?check_suite_focus=true" }, { - "name": "test (default, 1, 2, linux.2xlarge)", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252286900?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676817?check_suite_focus=true" }, { - "name": "test (noarch, 1, 1, linux.2xlarge)", + "name": "clang-tidy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252287072?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676829?check_suite_focus=true" }, { - "name": "test (default, 2, 2, linux.2xlarge)", + "name": "cmakelint", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252287232?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/runs/1099676840?check_suite_focus=true" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiwA=", + "endCursor": "Y3Vyc29yOnYyOpHOQYu8qA==", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189567" + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1175281099" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS78=" + "cursor": "Y3Vyc29yOnYyOpHORg1dyw==" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-vulkan-bionic-py3.7-clang9" - } + "name": "Codecov", + "databaseId": 254 }, + "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "build", + "name": "codecov/project", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161699?check_suite_focus=true" + "detailsUrl": "https://codecov.io" }, { - "name": "test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252302340?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPxgQ=", - "hasNextPage": false - } - }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189568" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8A=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", + "name": "codecov/patch", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161696?check_suite_focus=true" + "detailsUrl": "https://codecov.io" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoKA=", + "endCursor": "Y3Vyc29yOnYyOpHOQZhcFQ==", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189570" + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100822" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8I=" + "cursor": "Y3Vyc29yOnYyOpHORhnf1g==" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "win-vs2019-cpu-py3" - } + "name": 
"Codecov", + "databaseId": 254 }, + "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161646?check_suite_focus=true" - }, - { - "name": "test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252830090?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, windows.4xlarge)", + "name": "codecov/patch", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252830141?check_suite_focus=true" + "detailsUrl": "https://codecov.io" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkX070=", + "endCursor": "Y3Vyc29yOnYyOpHOQZZsEQ==", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189571" + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1176100824" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8M=" + "cursor": "Y3Vyc29yOnYyOpHORhnf2A==" }, { "node": { "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "linux-xenial-py3.7-gcc7" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252161666?check_suite_focus=true" - }, - { - "name": "test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252286386?check_suite_focus=true" - }, - { - "name": "test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252286526?check_suite_focus=true" - }, + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ { - "name": "test (default, 1, 2, linux.2xlarge)", + "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5252286720?check_suite_focus=true" + "detailsUrl": "https://code.facebook.com/cla/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiQA=", + "endCursor": "Y3Vyc29yOnYyOpHOUquzJg==", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189572" + "url": "https://github.com/pytorch/pytorch/commit/29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9/checks?check_suite_id=1487517306" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8Q=" + "cursor": "Y3Vyc29yOnYyOpHOWKm2eg==" } ], "pageInfo": { - "hasNextPage": true + "hasNextPage": false } }, - "pushedDate": "2022-02-18T18:46:28Z", - "oid": "3038b939eb2069653305c419326a0f47d2598e39" + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406538?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406947?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": 
"SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406544?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406931?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_debug_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406550?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_debug_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406887?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_release_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406526?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: binary_windows_libtorch_3_7_cpu_release_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406707?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_main_py3_6_clang7_ubuntu16_04_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406533?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_main_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407256?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407254?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407255?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406556?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406532?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406527?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406553?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": 
"ci/circleci: docker-pytorch-linux-bionic-py3.6-clang9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406537?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-py3.8-gcc9", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406529?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-rocm3.5.1-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406554?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-bionic-rocm3.7-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406545?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406543?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406536?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406552?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406535?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406540?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406528?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406541?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-asan", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406549?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-clang7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406555?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc4.8", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406546?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" 
+ }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc5.4", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406531?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc7", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406534?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.6-gcc7.2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406523?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3.8", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406539?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-rocm3.3-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406547?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-rocm3.5.1-py3.6", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406551?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407209?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406611?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_bazel_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406607?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_bazel_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406984?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_cpp_doc_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407013?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_doc_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407011?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_ios_11_2_1_x86_64_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406548?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406563?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: 
pytorch_libtorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408680?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_backward_compatibility_check_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407014?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406567?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406945?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_8_gcc9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406561?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_py3_8_gcc9_coverage_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407422?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_bionic_rocm3_7_py3_6_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406562?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406612?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408107?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_ge_config_legacy_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408111?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_ge_config_profiling_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7408101?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406613?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406565?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test", + "state": "SUCCESS", + "targetUrl": 
"https://circleci.com/gh/pytorch/pytorch/7407017?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_profiling_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407019?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407012?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_6_gcc5_4_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407016?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_vulkan_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406608?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406609?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406606?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_test1", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407435?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_asan_test2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407436?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_mobile_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406605?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_mobile_custom_build_dynamic", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406610?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_macos_10_13_py3_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406525?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_macos_10_13_py3_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407415?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_python_doc_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407018?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_vulkan_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + 
"targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406566?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_vulkan_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406946?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cpu_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406542?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406530?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_test1", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407028?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda10.1_test2", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407027?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_windows_vs2019_py36_cuda11.0_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406524?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_xla_linux_bionic_py3_6_clang9_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7406572?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_xla_linux_bionic_py3_6_clang9_test", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/7407253?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "codecov/patch", + "state": "SUCCESS", + "targetUrl": "https://codecov.io/gh/pytorch/pytorch/compare/69f6d94caa3559d4f50745c26af5df041b83fee8...29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + }, + { + "context": "codecov/project", + "state": "SUCCESS", + "targetUrl": "https://codecov.io/gh/pytorch/pytorch/compare/69f6d94caa3559d4f50745c26af5df041b83fee8...29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" + }, + { + "context": "pr/caffe2-pytorch-linux-bionic-rocm3.7-py3.6-test", + "state": "SUCCESS", + "targetUrl": "https://ci.pytorch.org/jenkins/job/caffe2-builds/job/pytorch-linux-bionic-rocm3.7-py3.6-trigger-test/2319/" + }, + { + "context": "pr/pytorch-linux-bionic-rocm3.7-py3.6", + "state": "SUCCESS", + "targetUrl": "https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm3.7-py3.6-trigger/2325/" + } + ] + }, + "pushedDate": "2020-09-11T01:58:24Z", + "oid": "29f6aa6ecc2ece3fa58170ff4561f9d8d5c129f9" } } ] }, - "changedFiles": 162, + "changedFiles": 5, "files": { "nodes": [ { - "path": "test/onnx/expect/TestOperators.test_acos.expect" + "path": "test/math_libraries/convolutions.py" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_googlenet_v3.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_maskrcnn_p1.json" + }, + { + "path": "test/math_libraries/convolutions_cases/shapes_mobilenet.json" + }, + { + 
"path": "test/math_libraries/convolutions_cases/shapes_resnet_50.json" + } + ], + "pageInfo": { + "endCursor": "NQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_add_broadcast.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_add_left_broadcast.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_add_size1_broadcast.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_addconstant.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_addmm.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_arange_dynamic.expect" + "author": { + "login": "mruberry" + }, + "state": "CHANGES_REQUESTED" }, { - "path": "test/onnx/expect/TestOperators.test_argmax.expect" + "author": { + "login": "ailzhang" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_asin.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_at_op.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_atan.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_aten_embedding_1.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_aten_embedding_2.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_avg_pool2d.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_baddbmm.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_basic.expect" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_batchnorm.expect" + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_batchnorm_1d.expect" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect" + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" }, { - "path": 
"test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_batchnorm_training.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_bitshift.expect" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "mingxiaoh" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "COMMENTED" + }, + { + "author": { + "login": "VitalyFedyunin" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAxOS0xMi0zMFQxMDoxOToxMS0wODowMLkyMDE5LTEyLTMwVDEwOjE5OjExLTA4OjAwzhQZLuY=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File 
\"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.\n\n@mruberry It is suggested by @VitalyFedyunin that, we need to display fail test to avoid invalid inputs, I guess we should set it as expected failures under the pytest test framework, right? we will change it as expected failure cases under pytest test framework. The result will looks like be low, is it ok?\n2500 passed, 136 skipped, 0 failed, 0 errors, 2 expected failures, 0 unexpected passes", + "createdAt": "2020-08-14T01:36:20Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 673816925 + }, + { + "bodyText": "Displaying tests that fail is fine, but I don't think @VitalyFedyunin meant that it was OK if the tests didn't pass. If these are expected failures then yes, you can use with self.assertRaises(RuntimeError):... when testing them. If you also want to report that the test has test cases with these properties you can print or warn, which will appear in the test output.", + "createdAt": "2020-08-14T03:09:37Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 673858224 + }, + { + "bodyText": "Codecov Report\n\nMerging #31093 into master will not change coverage.\nThe diff coverage is n/a.\n\n\n@@ Coverage Diff @@\n## master #31093 +/- ##\n=======================================\n Coverage 68.00% 68.00% \n=======================================\n Files 382 382 \n Lines 49527 49527 \n=======================================\n Hits 33679 33679 \n Misses 15848 15848 \n\nContinue to review full report at Codecov.\n\nLegend - Click here to learn more\n\u0394 = absolute (impact), \u00f8 = not affected, ? = missing data\nPowered by Codecov. Last update 69f6d94...29f6aa6. 
Read the comment docs.", + "createdAt": "2020-09-04T05:41:01Z", + "author": { + "login": "codecov" + }, + "authorAssociation": "NONE", + "editor": { + "login": "codecov" + }, + "databaseId": 686921371 + }, + { + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. Stale pull requests will automatically be closed 30 days after being marked Stale", + "createdAt": "2022-04-12T02:35:37Z", + "author": { + "login": "pytorchbot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1095860944 + }, + { + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", + "createdAt": "2022-06-11T04:40:16Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1152854802 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOKCmhXQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "triaged" + } + }, + { + "node": { + "name": "open source" + } + }, + { + "node": { + "name": "cla signed" + } + }, + { + "node": { + "name": "Stale" + } + } + ] + } + } + } + } + }, + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOKCmhXQ== name=pytorch number=31093 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "Hi, @mingfeima @soumith @Jianhui-Li\nthis will improve the test coverage of mkldnn convolution, would you please review it?\nThe current code is forward only, do we need to cover backward, if yes, we can add backward.", + "createdAt": "2019-12-12T01:19:02Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564806270 + }, + { + "bodyText": "@mingxiaoh, what is the value in testing DNNL as part of Pytorch validation for the Pytorch developers? Shouldn't having these tests run in DNNL validation be enough?", + "createdAt": "2019-12-12T01:28:32Z", + "author": { + "login": "vpirogov" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 564808528 + }, + { + "bodyText": "@vpirogov The main value is to serve as a blind test to DNNL. If DNNL adds these test to DNNL test sets, it lost the value as a blind test. The spirit of validation is to cross check.\n@gottbrath @gchanan The test was developed per the request of Pytorch team. Mingxiao made an effort to reduce the execution time to a few second but still with good coverage. Although the test today is focused on DNNL, it could be easily extended to be blind test for any conv implementation used in Pytorch.", + "createdAt": "2019-12-20T07:44:30Z", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 567826907 }, { - "path": "test/onnx/expect/TestOperators.test_c2_op.expect" + "bodyText": "@mruberry thanks for the comment. 
As for the chainer dependency, we import it is because we would like to use its testing function for pytest test cases combinations, other wise we need to write much more code to achieve same effect. So, can we use it?", + "createdAt": "2020-01-15T09:04:34Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 574563012 }, { - "path": "test/onnx/expect/TestOperators.test_chunk.expect" + "bodyText": "@mingxiaoh You cannot import chainer. Looking at the code you should be able to achieve the same effect without it.", + "createdAt": "2020-01-16T17:59:46Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 575272358 }, { - "path": "test/onnx/expect/TestOperators.test_clip.expect" + "bodyText": "@mruberry ok, we will change it according to your requirement. Thanks", + "createdAt": "2020-02-10T00:59:34Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 583917522 }, { - "path": "test/onnx/expect/TestOperators.test_clip_max.expect" + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/31093\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 29f6aa6 (more details on the Dr. CI page):\n\nCommit 29f6aa6 was recently pushed. Waiting for builds...\n\nThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2020-05-14T08:04:30Z", + "author": { + "login": "dr-ci" + }, + "authorAssociation": "NONE", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 628466876 }, { - "path": "test/onnx/expect/TestOperators.test_clip_min.expect" + "bodyText": "@mruberry how about those cudnn UT error? we add check for it but it should be NV to fix cudnn bugs.", + "createdAt": "2020-05-18T05:34:11Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 629955767 }, { - "path": "test/onnx/expect/TestOperators.test_concat2.expect" + "bodyText": "Hey @mingxiaoh! You're right, of course, that you shouldn't have to fix cuDNN bugs. Would you please:\n\nAssert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update.\nFile a new issue explaining the behavior and providing a short PyTorch program to reproduce the issue.\n\nThen we can ping NVIDIA on that issue.", + "createdAt": "2020-05-18T07:27:08Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 629997129 }, { - "path": "test/onnx/expect/TestOperators.test_conv.expect" + "bodyText": "about the suggestion 'Assert that the test case fails, so we know it's failing and if someone fixes it they'll know what test to update. ', if we only assert it and continue the following test, I guess users might always ignore them in later test. 
Anyway, any similar example case for reference?", + "createdAt": "2020-05-18T07:55:08Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 630010734 }, { - "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect" + "bodyText": "In this recent PR https://github.com/pytorch/pytorch/pull/38505/files, for example, you can see that the construction of bool tensors wasn't working properly, so the test author cited the relevant issue and asserted that the incorrect behavior happened, as expected. You can also see how these lines are being removed by https://github.com/pytorch/pytorch/pull/38392/files, which fixes the issue.\nAnother common pattern is to use with self.assertRaises(RuntimeError/AssertionError/etc.):.", + "createdAt": "2020-05-18T08:02:13Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 630014823 }, { - "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4_opset8.expect" + "bodyText": "@mruberry the failed UT case is not introduced by our modification, how to handle this issue?", + "createdAt": "2020-05-20T01:59:13Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631187735 }, { - "path": "test/onnx/expect/TestOperators.test_convtranspose.expect" + "bodyText": "@mingxiaoh You mean the failures on ROCm? You may ignore them. Be sure to re-request review when you're ready.", + "createdAt": "2020-05-20T02:12:58Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631191425 }, { - "path": "test/onnx/expect/TestOperators.test_cos.expect" + "bodyText": "@mruberry we already skipped those ROCm errors, but there are stil somel error caused by the original code, they are not introduced by our modification.", + "createdAt": "2020-05-21T05:18:07Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631886529 }, { - "path": "test/onnx/expect/TestOperators.test_cumsum.expect" + "bodyText": "I understand. Let me know when you're ready for me to review.", + "createdAt": "2020-05-21T06:24:15Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631908011 }, { - "path": "test/onnx/expect/TestOperators.test_det.expect" + "bodyText": "@mruberry thanks, we are ready for review now.", + "createdAt": "2020-05-21T06:28:11Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 631909442 }, { - "path": "test/onnx/expect/TestOperators.test_dict.expect" + "bodyText": "@mingxiaoh Great! I'll take a look ASAP.", + "createdAt": "2020-05-21T06:31:10Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 631910556 }, { - "path": "test/onnx/expect/TestOperators.test_dict_str.expect" + "bodyText": "@mruberry we just pull the latest code and updated the patch according to your comment, may you please help double check it? BTW, the new failed case in preci is not introduced by our modification.", + "createdAt": "2020-05-25T07:44:58Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 633430458 }, { - "path": "test/onnx/expect/TestOperators.test_dim.expect" + "bodyText": "@ailzhang would you please check the comment below? 
Thanks.\nIs there a reason why this TestConv2dExt is a new class instead a test inside TestNN?\n//comment: it is actually suggested by Tongzhou Wang in another thread before.\nAlthough this test sits in generic testing framework, it's actually comparing thnn/mkldnn/cudnn results specially. I feel it's better to make it truly generic so that it compares any device result with CPU result. Alternatively you can mark this test only run when torch.backends.mkldnn.is_available()=True\n//comment: but our goal is to compare the result with that of thnn. Anyway, if you insist, we can start to compare it with cpu.", + "createdAt": "2020-05-27T05:11:08Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 634432326 }, { - "path": "test/onnx/expect/TestOperators.test_dropout.expect" + "bodyText": "Pruning reviewers. @ngimel, @VitalyFedyunin, this PR is looking pretty good from a test framework perspective. Would one of you like to review?", + "createdAt": "2020-05-27T09:58:42Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 634557563 }, { - "path": "test/onnx/expect/TestOperators.test_dropout_default.expect" + "bodyText": "@mruberry Thanks, would you please help review it again. BTW: failed case is not introduced by our modification.", + "createdAt": "2020-05-28T10:26:32Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 635256214 }, { - "path": "test/onnx/expect/TestOperators.test_dropout_opset12.expect" + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code", + "createdAt": "2020-06-02T08:00:01Z", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637364148 }, { - "path": "test/onnx/expect/TestOperators.test_dropout_training.expect" + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.", + "createdAt": "2020-06-02T10:23:47Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637444457 }, { - "path": "test/onnx/expect/TestOperators.test_dropout_training_opset12.expect" + "bodyText": "@mruberry we moved our case to TestNNDeviceType class, would you please help review it again? BTW, those failed cases are not introduced by our code\n\n@ngimel will follow-up on the test itself sometime this week or early next week.\n\n@mruberry thank you", + "createdAt": "2020-06-02T11:32:06Z", + "author": { + "login": "1pikachu" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637479226 }, { - "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add.expect" + "bodyText": "Improving test coverage of math libraries is certainly a good goal and this PR is moving towards it. I have some doubts about implementation decisions made, and about running this PR as part of regular pytorch CI.\nIf the primary goal of this PR is to test correctness of the convolution implementations in the vendor library, then it does not serve this purpose. 
The absolute majority of the 4000+ test cases come from group 1, where different kernel sizes/strides/dilations are used to produce the output of size 1x1. This can test whether pytorch correctly passes convolution parameters to the backends (although there are cheaper ways to do that), but as actual library correctness check it is almost useless - libraries use very different kernels depending in the input/output sizes, and tests with toy sizes like this don't invoke the real bread-and-butter kernels.\nAlso, if this test suite is meant as primary a means of testing vendor libraries (which is a good goal!) it does not have a place as a part of pytorch regular CI, and should be run when the corresponding vendor libraries are updated. I'd suggest moving this test out into a separate file (maybe even outside of torch/test directory) and have it as a part of library update/qualification process rather than regular CI.\nAlso, if the primary goal is to enable easier testing of vendor libraries correctness, perhaps we should rethink the mechanism of the generation of test cases. It should be easy to add a test case with a particular set of parameters that was found to be buggy. Also, running a cross-product of cases in a multi-dimensional space (as this PR does) is rarely an efficient way of getting a signal, some forms of random sampling usually provide a way to get better correctness signal why using less resources.\nAlso, when testing libraries it is important to test both forward and backward functions, whereas this PR does forward only. I'm openminded on whether convTransposed should be tested or not - if we are testing vendor libraries, then it's not necessary, convTransposed calls the same underlying functions, if we are testing pytorch, then it makes sense to test it separately because it takes different codepaths.", + "createdAt": "2020-06-02T21:56:33Z", + "author": { + "login": "ngimel" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 637827507 }, { - "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add_inputs_same_symbolic_shape.expect" + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.", + "createdAt": "2020-06-03T02:16:07Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637912105 }, { - "path": "test/onnx/expect/TestOperators.test_dynamic_axes_matmul.expect" + "bodyText": "@mruberry ngimel is quite responsible, but it seems that she is not familiar with the background of this pull-request, since this pull-request is pending for so such a long time, each time we are almost done, then reviewer changes, each reviewer has different idea, it is good, but, would it be better if you help review it or ask the same reviewer to review it considering that you are more familiar with the background/change history? Thanks in advance.\n\nWe know this PR has been open for awhile and we respect that your time is valuable, but we want to make sure we're making the right change here, and I think @ngimel's comments reflect that and should not be too difficult to address. 
As I understand, her points are:\n\nThis is a good PR with an exciting idea. To let it run longer and test more cases maybe it should run outside the regular PyTorch CI.\nTo remedy this, let's create a test/math_libraries folder and put this test there: test/math_libaries/convolutions.py. Yes, this is different from our requests in the past, which is our mistake, but it should be an easy change.\nTo make the test more interesting it'd be good for the test cases to resemble convolutions used in practice. The current test cases seem like similar \"toy\" examples. Without time pressure we should be able to run larger, more computationally intensive convolutions.\nLet's change the test cases to include some practical convolutions, make it easy to add test cases, and think about how we might generate other interesting cases. (We should also test backwards once we have more time!)\n\nAnd I think these are good points. Maybe the PR doesn't create a new way to generate interesting convolutions to start and instead only runs a few representative convolutions, but @ngimel is positioning the work for success so that it's useful and we can continue to improve on it in the future.\nDoes that make sense?", + "createdAt": "2020-06-03T03:04:55Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637924703 }, { - "path": "test/onnx/expect/TestOperators.test_dynamic_axes_reduce_mean.expect" + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.", + "createdAt": "2020-06-03T05:22:43Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": { + "login": "mingxiaoh" + }, + "databaseId": 637960626 }, { - "path": "test/onnx/expect/TestOperators.test_dynamic_axes_unchange.expect" + "bodyText": "@mruberry we were required to finish the test in limited time long long before, at that time, jianhui discussed this issue with you, and you are all agreed with the current test scope and test case number and test time, so you meant you change your mind now? you are not care about the test time currently? Sorry, this issue is pending so long, we are struggling with it now and would like to finish it asap. Given this, it would be be better if you raise all the requirement at a time, considering that we have many tasks at hand, we are hoping so eagerly that we can finish this PR and use it for further test for bugs finding.\n\nI'm sorry, I don't think I've talked to @Jianhui-Li before. It's true that the team we expressed a concern about timing if the test was to be run in the CI initially, but I think now that we understand what the test is trying to do better we're not sure the CI is the best place for it. The PR was also closed after a lengthy period of inactivity, and we assumed it had simply been abandoned.\nDo you know who @Jianhui-Li spoke with about this issue originally? 
Maybe I can follow-up with them for more context.", + "createdAt": "2020-06-03T05:42:28Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 637967153 }, { - "path": "test/onnx/expect/TestOperators.test_elu.expect" + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?", + "createdAt": "2020-06-03T06:13:14Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 637978356 }, { - "path": "test/onnx/expect/TestOperators.test_embedding_bags.expect" + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.", + "createdAt": "2020-06-03T20:34:05Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 638446723 }, { - "path": "test/onnx/expect/TestOperators.test_empty_like.expect" + "bodyText": "@mruberry it is reviewed and discussed with @soumith before. Anyway, since current reviewer is you, so, it should be decided by you. So, what we should do next?\n\nI think this will be easier to discuss at the regular Intel-FB meeting.\n\nLet me sync with Mingxiao and follow up with this. Thanks.", + "createdAt": "2020-06-03T20:44:44Z", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 638451670 }, { - "path": "test/onnx/expect/TestOperators.test_empty_like_opset7.expect" + "bodyText": "@mruberry would you please help review it again?", + "createdAt": "2020-07-02T14:09:23Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 653028208 }, { - "path": "test/onnx/expect/TestOperators.test_equal.expect" + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?", + "createdAt": "2020-07-06T20:15:04Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 654443242 }, { - "path": "test/onnx/expect/TestOperators.test_erf.expect" + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks", + "createdAt": "2020-07-09T11:04:06Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 656062287 }, { - "path": "test/onnx/expect/TestOperators.test_exp.expect" + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. 
Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.", + "createdAt": "2020-07-14T09:16:48Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 658071151 }, { - "path": "test/onnx/expect/TestOperators.test_expand.expect" + "bodyText": "super nit: renaming files to .json will make it more IDE friendly.", + "createdAt": "2020-07-14T23:38:37Z", + "author": { + "login": "VitalyFedyunin" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 658464685 }, { - "path": "test/onnx/expect/TestOperators.test_flatten.expect" + "bodyText": "@mruberry would you please help review it again?\n\nHappy to help out, but as last discussed this needs some follow-up at the Intel-FB meeting. Did you get a chance to discuss it there, yet? If so, what did you decide?\n\nyes, we talked it with jianhui, and we decided to follow your ideas. Anyway, we would like to do so modification later, will contact you for review tomorrow. Thanks\n\n@mruberry the code is ready for review now, would you please take time for it? Thanks.\n\nCool! I took a look with @ngimel, once these issues are addressed I think we're good to go!", + "createdAt": "2020-07-16T05:17:29Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 659164401 }, { - "path": "test/onnx/expect/TestOperators.test_flatten2D.expect" + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.", + "createdAt": "2020-07-20T08:30:01Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 660884305 }, { - "path": "test/onnx/expect/TestOperators.test_fmod.expect" + "bodyText": "@ngimel & @VitalyFedyunin We have changed the code according to your suggestions, would you please review it again? Thanks.\n\nUpdated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.", + "createdAt": "2020-07-22T20:26:42Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 662678464 }, { - "path": "test/onnx/expect/TestOperators.test_frobenius_norm.expect" + "bodyText": "Updated: one more question about tolerances, one code cleanup recommendation, and one task leftover from the last review.\n@mruberry we have finished the modification according to your comment, would you please review it again? 
Thanks.", + "createdAt": "2020-07-23T10:24:26Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 662930687 }, { - "path": "test/onnx/expect/TestOperators.test_full.expect" + "bodyText": "The code looks good, but I tried running the test suite and hit the following failures:\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float16, group:1, batchsize:22input channel:448, output channel:384, bias:False, padding:[1, 1], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 102, in test_conv2d_ext\n msg=msg\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 1085, in assertEqual\n self.assertTrue(result, msg=msg)\nAssertionError: False is not true : device:cuda:0, dtype:torch.float32, group:1, batchsize:22input channel:80, output channel:192, bias:False, padding:[0, 0], dilation:[1, 1], stride:[1, 1], kernel:[3, 3]\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File 
\"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 777, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 241, in instantiated_test\n result = test(self, device_arg, dtype)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 542, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 411, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 106, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\nLooking at the first invalid convolution, for example, it's:\n {\n \"case_name\":\"masknet_p1:conv33\",\n \"mb\":1,\n \"g\":1,\n \"ic\":512,\n \"ih\":64,\n \"iw\":64,\n \"oc\":12,\n \"kh\":1,\n \"kw\":1,\n \"sh\":1,\n \"sw\":1,\n \"ph\":0,\n \"pw\":0,\n \"dh\":0,\n \"dw\":0,\n \"bias\":\"False\"\n },\n\nwhich has a dh and dw of zero, causing it to be added to invalid cases here:\ndh, dw = case['dh'], case['dw']\n has_bias = case['bias']\n if dh == 0 or dw == 0:\n invalid_cases.append(case_name)", + "createdAt": "2020-07-23T21:25:19Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "mruberry" + }, + "databaseId": 663240268 }, { - "path": "test/onnx/expect/TestOperators.test_full_like.expect" + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? Thanks.", + "createdAt": "2020-07-27T12:43:44Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 664373079 }, { - "path": "test/onnx/expect/TestOperators.test_gather.expect" + "bodyText": "@mruberry the failure was not detected is because we did not export the cudnn path. Yes, you are right, we need to a large atol of 1e-2 . Would you please help review it again? 
Thanks.\n\nBefore I run these tests again, is an atol of 1e-2 needed for all types or just half? Also, how does 1e-2 compare to the values that are being compared?", + "createdAt": "2020-07-27T18:39:27Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 664569507 }, { - "path": "test/onnx/expect/TestOperators.test_gather_opset11.expect" + "bodyText": "@mruberry 1e-2 is experimental result, details see below, random means it might be failed sometimes.\n\n\n\natol,rtol\n1e-2,1e-2\n1e-2,1e-3\n1e-3,1e-2\n1e-3,1e-3\n1e-4,1e-3\n1e-3,1e-4\n1e-4,1e-4\n1e-4,1e-5\n1e-5,1e-4\n\n\n\n\nCuda float16\npass\npass\npass\npass\npass\nfail\nFail\nFail\nfail\n\n\nCuda float32\npass\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nrandom\nfail", + "createdAt": "2020-07-31T03:33:27Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 666894774 }, { - "path": "test/onnx/expect/TestOperators.test_ge.expect" + "bodyText": "@mruberry would you please find time to review it again? Thanks.", + "createdAt": "2020-08-04T05:01:20Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 668380451 }, { - "path": "test/onnx/expect/TestOperators.test_gelu.expect" + "bodyText": "@mruberry would you please find time to review it again? Thanks.\n\nI was just about to try and run this again locally but it looks like the files describing the convolutions are missing?", + "createdAt": "2020-08-07T03:49:44Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 670306210 }, { - "path": "test/onnx/expect/TestOperators.test_gt.expect" + "bodyText": "@mruberry sorry but what is missing actually?", + "createdAt": "2020-08-07T05:00:20Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 670322557 }, { - "path": "test/onnx/expect/TestOperators.test_hardtanh.expect" + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.", + "createdAt": "2020-08-07T16:06:41Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 670591170 }, { - "path": "test/onnx/expect/TestOperators.test_implicit_expand.expect" + "bodyText": "@mruberry sorry but what is missing actually?\n\nThe JSON files.\n\n@mruberry sorry, we add them now, would you please check it again? 
Thanks.", + "createdAt": "2020-08-13T10:40:11Z", + "author": { + "login": "mingxiaoh" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 673402901 }, { - "path": "test/onnx/expect/TestOperators.test_index.expect" - }, + "bodyText": "I cloned your repo and ran the tests:\n~/pytorch/test/math_libraries$ python convolutions.py\nFFFF\n======================================================================\nFAIL: test_conv2d_ext_cpu_float32 (__main__.TestConvExtCPU)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float16 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 
114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float32 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n======================================================================\nFAIL: test_conv2d_ext_cuda_float64 (__main__.TestConvExtCUDA)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_utils.py\", line 815, in wrapper\n method(*args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 244, in instantiated_test\n result = test(self, *args)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 615, in only_fn\n return fn(self, device, *args, **kwargs)\n File \"/private/home/mruberry/git/pytorch/torch/testing/_internal/common_device_type.py\", line 472, in dep_fn\n return fn(slf, device, *args, **kwargs)\n File \"convolutions.py\", line 114, in test_conv2d_ext\n \"invalid cases:\" + \",\".join(invalid_cases)\nAssertionError: invalid 
cases:masknet_p1:conv33,masknet_p1:conv8,masknet_p1:conv2*4,masknet_p1:conv12,masknet_p1:conv4*3,masknet_p1:conv19,masknet_p1:conv4,masknet_p1:conv4,masknet_p1:conv27,masknet_p1:conv39,masknet_p1:conv23,masknet_p1:conv20,masknet_p1:conv25,masknet_p1:conv17,masknet_p1:conv9*4,masknet_p1:conv36,masknet_p1:conv18,masknet_p1:conv5,masknet_p1:conv38,masknet_p1:conv31,masknet_p1:conv14,masknet_p1:conv26,masknet_p1:conv2,masknet_p1:conv5*2,masknet_p1:conv28,masknet_p1:conv16,masknet_p1:conv20*3,masknet_p1:conv9,masknet_p1:conv14*23,masknet_p1:conv32,masknet_p1:conv30,masknet_p1:conv35,masknet_p1:conv37,masknet_p1:conv3,masknet_p1:conv24,masknet_p1:conv13,masknet_p1:conv21*3,masknet_p1:conv10,masknet_p1:conv7,masknet_p1:conv34,masknet_p1:conv13*24,masknet_p1:conv10*4,masknet_p1:conv22*2,masknet_p1:conv6,masknet_p1:conv22,masknet_p1:conv11,masknet_p1:conv40,masknet_p1:conv15,masknet_p1:conv17*23,masknet_p1:conv29,masknet_p1:conv21,masknet_p1:conv1,masknet_p1:conv11*3,mobilenet:conv3,mobilenet:conv2*4,mobilenet:conv6,mobilenet:conv7,mobilenet:conv5*4,mobilenet:conv4*4,mobilenet:conv7*4,mobilenet:conv1*3,mobilenet:conv10,mobilenet:conv2,mobilenet:conv5,mobilenet:conv4,mobilenet:conv9*4,mobilenet:conv8,mobilenet:conv9,mobilenet:conv6*4,mobilenet:conv10*4,mobilenet:conv11,mobilenet:conv8*20,mobilenet:conv1,mobilenet:conv11*4,mobilenet:conv3*4\n\n----------------------------------------------------------------------\nRan 4 tests in 33.838s\n\nFAILED (failures=4)\n\nStill fails.", + "createdAt": "2020-08-13T23:35:00Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 673760580 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOIapCfg==", + "hasPreviousPage": false + } + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=76118 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "malfet" + }, + "title": "Dummy change with lots of commits", + "body": "Draft PR with 100+ commits, to test mergebot ", + "headRefName": "malfet/pr-with-lots-of-commits", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "path": "test/onnx/expect/TestOperators.test_isnan.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "3067f2240afc7a29dc348000aa19eccbd9772303" + } }, { - "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + "commit": { + "author": { + "user": { + "login": "andrewor14" + }, + "email": "andrewor@fb.com", + "name": "Andrew Or" + }, + "oid": "2f655b71f70c496c4e645f6cdb27d7bb7e825701" + } }, { - "path": "test/onnx/expect/TestOperators.test_le.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "0c6dcaa7f58a19c42a530f4ee14bb6f0f03ca9fb" + } }, { - "path": "test/onnx/expect/TestOperators.test_linear.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "cad11c563d41ebcffb1683fe1f1288b8157413b3" + } }, { - "path": "test/onnx/expect/TestOperators.test_log_sigmoid.expect" + "commit": { + 
"author": { + "user": null, + "email": "jwtan@fb.com", + "name": "Jiewen Tan" + }, + "oid": "4dfd0875a68d87fccb5ad0d81692db480043b86e" + } }, { - "path": "test/onnx/expect/TestOperators.test_logsoftmax.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "2d37e74690582a4a26890e4c8b98f1f80e589c82" + } }, { - "path": "test/onnx/expect/TestOperators.test_lstm_none_sequence_lens.expect" + "commit": { + "author": { + "user": null, + "email": "jwtan@fb.com", + "name": "Jiewen Tan" + }, + "oid": "d4aee60947e1a3ef23c7c42990621e0746fdd0a8" + } }, { - "path": "test/onnx/expect/TestOperators.test_lt.expect" + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "aac6204bf710beb5e50a383d426ae6222396335a" + } }, { - "path": "test/onnx/expect/TestOperators.test_master_opset.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "4b0362cab884584c24f5834b3874f5f357f56b5d" + } }, { - "path": "test/onnx/expect/TestOperators.test_max.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "7536df613cbc645a9e68e6a3b0a8450753260fd1" + } }, { - "path": "test/onnx/expect/TestOperators.test_maxpool.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "20a50cb966d28d7bf82924adf781cf72a01ef90e" + } }, { - "path": "test/onnx/expect/TestOperators.test_maxpool_dilations.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "486387e8644afb46edff5aa5925b55c8119f67f0" + } }, { - "path": "test/onnx/expect/TestOperators.test_maxpool_indices.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "acb9d78b9b732d3667b881727e6ed9f92a8c549f" + } }, { - "path": "test/onnx/expect/TestOperators.test_mean.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "683bb7959a5b973f8470c081ad02e8fc508e784a" + } }, { - "path": "test/onnx/expect/TestOperators.test_mean_dtype.expect" + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "a870cb40af65adf0b77d55f6b554d7093d284d7a" + } }, { - "path": "test/onnx/expect/TestOperators.test_meshgrid.expect" + "commit": { + "author": { + "user": { + "login": "Krovatkin" + }, + "email": "korovaikon@gmail.com", + "name": "Nikolay Korovaiko" + }, + "oid": "70793b9f328ddf52cc86336104c3a064c8582ef4" + } }, { - "path": "test/onnx/expect/TestOperators.test_min.expect" + "commit": { + "author": { + "user": { + "login": "suo" + }, + "email": "suo@fb.com", + "name": "Michael Suo" + }, + "oid": "f70b31f62b1c5159eef2725484b175983517c88c" + } }, { - "path": "test/onnx/expect/TestOperators.test_mm.expect" + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "04d3ec1db60defe1c6904bf77e9f8dfa87dc0b63" + } }, { - "path": "test/onnx/expect/TestOperators.test_narrow.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": 
"46b754a55b63e3168ad5854ad412c124934b675d" + } }, { - "path": "test/onnx/expect/TestOperators.test_ne.expect" + "commit": { + "author": { + "user": { + "login": "robieta" + }, + "email": "taylorrobie@fb.com", + "name": "Taylor Robie" + }, + "oid": "13df69e13ee571fdd716139419a00aec47ade7d6" + } }, { - "path": "test/onnx/expect/TestOperators.test_nonzero.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "70642e911ec80a47cdbf4a50aac475c11aa129b6" + } }, { - "path": "test/onnx/expect/TestOperators.test_norm_p1.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "59bb7c39384bf3e0b284a037adef8b3caa53c1c4" + } }, { - "path": "test/onnx/expect/TestOperators.test_norm_p2.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "007cfb97b55d70ff63e1ed71d1a674638f847376" + } }, { - "path": "test/onnx/expect/TestOperators.test_ones_like.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "0a7b858a5af1393fa3cf2853f92eca0e1d408dde" + } }, { - "path": "test/onnx/expect/TestOperators.test_pad.expect" + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "7917d789f0a523715041ade5177d271082628236" + } }, { - "path": "test/onnx/expect/TestOperators.test_params.expect" + "commit": { + "author": { + "user": { + "login": "kit1980" + }, + "email": "sdym@fb.com", + "name": "Sergii Dymchenko (Meta Employee)" + }, + "oid": "91eb6017f0fb8a1b29e8cb48fac93bc9709f73b3" + } }, { - "path": "test/onnx/expect/TestOperators.test_params_onnx_irv4.expect" + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "bd04dca5fabb0c2a51ac87063a515f256ef274fa" + } }, { - "path": "test/onnx/expect/TestOperators.test_permute2.expect" - } - ], - "pageInfo": { - "endCursor": "MTAw", - "hasNextPage": true - } - }, - "reviews": { - "nodes": [ - { - "author": { - "login": "garymm" - }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMi0xOFQxOToxODo0NC0wNjowMLkyMDIyLTAyLTE4VDE5OjE4OjQ0LTA2OjAwzjTr0H0=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ - { - "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet \n \n \n pytorch/.github/scripts/trymerge.py\n \n \n Line 63\n in\n 932adf2\n \n \n \n \n\n \n \n files(last: 100) { \n \n \n \n\n Can this be relaxed? If not please import.", - "author": { - "login": "BowenBao" - }, - "authorAssociation": "COLLABORATOR", - "editor": null, - "databaseId": 1048084569 + "commit": { + "author": { + "user": { + "login": "dagitses" + }, + "email": "mikeyd@fb.com", + "name": "Michael Andreas Dagitses" + }, + "oid": "1f805a5defda7dabc49d0059edb9ccb06bc29352" + } }, { - "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet\nCan this be relaxed? If not please import.\n\nWow, you've hit a really interesting problem. 100 is a limitation enforced by GitHub, see https://docs.github.com/en/graphql/overview/resource-limitations, but I can implement a pagination. 
Do you mind keeping it like that for a bit, want to land a fix soonish.", - "author": { - "login": "malfet" - }, - "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1048088691 + "commit": { + "author": { + "user": null, + "email": "mruberry@fb.com", + "name": "Mike Ruberry" + }, + "oid": "4982c0a8db8f23d15ec4bfcbca4ce939afc04954" + } }, { - "bodyText": "@malfet Thank you for info. Sure, I have separated the rest of stack from this one, we'll wait for the fix to try again.", - "author": { - "login": "BowenBao" - }, - "authorAssociation": "COLLABORATOR", - "editor": null, - "databaseId": 1048090640 + "commit": { + "author": { + "user": { + "login": "pearu" + }, + "email": "pearu.peterson@gmail.com", + "name": "Pearu Peterson" + }, + "oid": "28502265cb5925cb7db8dcb2dd2334963092714a" + } }, { - "bodyText": "@pytorchbot merge this", - "author": { - "login": "BowenBao" - }, - "authorAssociation": "COLLABORATOR", - "editor": null, - "databaseId": 1050293881 + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "e03fcaedb1342e6d65c7f7f20243000938ba60b2" + } }, { - "bodyText": "Hey @BowenBao.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", - "author": { - "login": "github-actions" - }, - "authorAssociation": "NONE", - "editor": null, - "databaseId": 1050295451 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOPniAWQ==", - "hasPreviousPage": true - } - }, - "labels": { - "edges": [ + "commit": { + "author": { + "user": { + "login": "pritamdamania" + }, + "email": "pritam.damania@fb.com", + "name": "pritam" + }, + "oid": "efb28f5a1a5d18aa96bd668ab2ab5c651be359f3" + } + }, { - "node": { - "name": "oncall: jit" + "commit": { + "author": { + "user": { + "login": "MagiaSN" + }, + "email": "magialiao@tencent.com", + "name": "magialiao" + }, + "oid": "52cc1b9994f861ebdd3908759ed1ab11cba1f8de" } }, { - "node": { - "name": "open source" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "3cd99f23d1acd6a5bedf6f3b02be79d64350a5b6" } }, { - "node": { - "name": "cla signed" + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" + }, + "oid": "b00502c634a5146f4d996bd90e84d317f049e7b0" } }, { - "node": { - "name": "release notes: onnx" + "commit": { + "author": { + "user": { + "login": "davidberard98" + }, + "email": "dberard@fb.com", + "name": "David Berard" + }, + "oid": "976eb7cee799dddfbe6a4122b249aaee1b6c8854" } }, { - "node": { - "name": "topic: bug fixes" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "9608ab28744d5cae32f371490557b248c9549c66" } - } - ] - } - } - } - } - }, - "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=73099 owner=pytorch": { - "data": { - 
"repository": { - "pullRequest": { - "files": { - "nodes": [ + }, { - "path": "test/onnx/expect/TestOperators.test_pixel_shuffle.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "4e119f0c39eb5ff0777f0e71561e6b633d85fb34" + } }, { - "path": "test/onnx/expect/TestOperators.test_pow.expect" + "commit": { + "author": { + "user": { + "login": "rohan-varma" + }, + "email": "rvarm1@fb.com", + "name": "Rohan Varma" + }, + "oid": "447580dc565f3660eddb2c996c6ed25b88338684" + } }, { - "path": "test/onnx/expect/TestOperators.test_prelu.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "2bc8f43e9233008ea23053fab87b83ab36fca5e3" + } }, { - "path": "test/onnx/expect/TestOperators.test_prod.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "c13a8e891c3e3e714f60649ca1e3b082e090e9fe" + } }, { - "path": "test/onnx/expect/TestOperators.test_prod_dtype.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "fddc861b7ee473f57d3c2161e4618a2663a237e8" + } }, { - "path": "test/onnx/expect/TestOperators.test_rand.expect" + "commit": { + "author": { + "user": { + "login": "jiyuanzFB" + }, + "email": "jiyuanz@fb.com", + "name": "Jiyuan Zhang" + }, + "oid": "e2336dbc539d6c021720cbe43c92c9e4c8463299" + } }, { - "path": "test/onnx/expect/TestOperators.test_randn.expect" + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "26e2759d1ad59aac12168b74d1ca55e42ba9455c" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduce_sum_negative_indices.expect" + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "ad7aa914ee3b3d1252e31514f010ba96c40aae87" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_mean.expect" + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "f113c5d78065aafbe7b1c0e611945bfe9f67b3c0" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_mean_dtype.expect" + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "a366fd01136292544b7862968ae92feba4b6d8fe" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_mean_keepdim.expect" + "commit": { + "author": { + "user": { + "login": "seemethere" + }, + "email": "eliuriegas@fb.com", + "name": "Eli Uriegas" + }, + "oid": "afeba0773749da5883c378a2e6ac066e1ce62ca0" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_prod.expect" + "commit": { + "author": { + "user": { + "login": "bdhirsh" + }, + "email": "hirsheybar@fb.com", + "name": "Brian Hirsh" + }, + "oid": "d306c99addc543908f64666baeecacbd0749f4a7" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_prod_dtype.expect" + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" + }, + "oid": "c2456ea658f41f64ea054a422edf22a9c977399f" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_prod_keepdim.expect" + "commit": { + "author": { + "user": { + "login": "awgu" + }, + "email": "andgu@fb.com", + "name": "Andrew Gu" 
+ }, + "oid": "a8b0a1b681c9fe41e0d553c962a5c93e81d92503" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_sum.expect" + "commit": { + "author": { + "user": { + "login": "anjali411" + }, + "email": "chourdiaanjali123@gmail.com", + "name": "anjali411" + }, + "oid": "af761d9a5d058c9188f16589bae4f307d35185be" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_sum_dtype.expect" + "commit": { + "author": { + "user": { + "login": "clee2000" + }, + "email": "csl@fb.com", + "name": "Catherine Lee" + }, + "oid": "beceb417baef35b15c2716e23178fb49f7fd6f9d" + } }, { - "path": "test/onnx/expect/TestOperators.test_reduced_sum_keepdim.expect" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "1516554e22136db89d0aeba43a1a1a987e995d68" + } }, { - "path": "test/onnx/expect/TestOperators.test_reducemax.expect" + "commit": { + "author": { + "user": { + "login": "qihqi" + }, + "email": "qihan@fb.com", + "name": "Han Qi" + }, + "oid": "68eb1fa8374eff6cbdcf0be5e37ed6775d22e722" + } }, { - "path": "test/onnx/expect/TestOperators.test_reducemin.expect" + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "3c7bcb99b5c0c879c2610f427880b03881f82f38" + } }, { - "path": "test/onnx/expect/TestOperators.test_remainder.expect" + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "38c1a2028090353e40a019c673c9ab16b39e4825" + } }, { - "path": "test/onnx/expect/TestOperators.test_repeat.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "8091cbea2c95ed2c4c406b3c61547a27c6319bae" + } }, { - "path": "test/onnx/expect/TestOperators.test_repeat_dim_overflow.expect" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "d81f59121969a47c8b2213a88e02cf9be0219be9" + } }, { - "path": "test/onnx/expect/TestOperators.test_round.expect" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "20d798b319cd107a767fe220f7a3027c18a1c844" + } }, { - "path": "test/onnx/expect/TestOperators.test_rrelu.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "eb35381a770b58c1cd41e935910cb4df2f3d8f14" + } }, { - "path": "test/onnx/expect/TestOperators.test_rsqrt.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "e6498a657b9aa47546dcd92d1b4ffb2e1a50ebdb" + } }, { - "path": "test/onnx/expect/TestOperators.test_rsub.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "7f821382db5ad08efe5b09a145c606852b8a9272" + } }, { - "path": "test/onnx/expect/TestOperators.test_scatter_add.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "995c0e11a97d854ff969962bd81d7341e46ecb07" + } }, { - "path": "test/onnx/expect/TestOperators.test_scatter_add_opset11.expect" + "commit": { + "author": { + "user": { + "login": "davidberard98" + }, + "email": "dberard@fb.com", + "name": "David Berard" + }, + "oid": "28d6258e62c9fc361a18689877c962c69889dc23" + } }, { - "path": "test/onnx/expect/TestOperators.test_selu.expect" + "commit": { + "author": { + "user": { + "login": "HarborYuan" + }, + "email": "yuanhaobo@whu.edu.cn", + "name": "Haobo Yuan" + }, + "oid": "2350fad8391367ebf81c7236a2c883644b4ff622" + } }, { - "path": "test/onnx/expect/TestOperators.test_shape_value_map.expect" + "commit": { + "author": { + "user": { + "login": "zou3519" + }, + "email": "zou3519@gmail.com", + "name": "Richard Zou" + }, + "oid": "3f789c9ccecdd7e2e52269453646e992a68c6b92" + } }, { - "path": "test/onnx/expect/TestOperators.test_sign.expect" + "commit": { + "author": { + "user": { + "login": "jeffdaily" + }, + "email": "jeff.daily@amd.com", + "name": "Jeff Daily" + }, + "oid": "20f79f610c1a3314da96d49515bbfbee9442e4f8" + } }, { - "path": "test/onnx/expect/TestOperators.test_sin.expect" + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "5823958f047f3b71a5dc8c52a20eb8ae3291bd3e" + } }, { - "path": "test/onnx/expect/TestOperators.test_slice.expect" + "commit": { + "author": { + "user": { + "login": "peterbell10" + }, + "email": "peterbell10@live.co.uk", + "name": "Peter Bell" + }, + "oid": "a0b15c49ecf3844daf2c0dcaef44f0214259db20" + } }, { - "path": "test/onnx/expect/TestOperators.test_slice_dynamic.expect" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "4afc38c25ca2ca126ba4987a419a58a5c572223b" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy.expect" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "b606f58d4a36683fbe0a7d02adfdde7d5cc694c2" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "2d61b4d630f6482a6c3cc7437091fad6d27c347e" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_3d_none.expect" + "commit": { + "author": { + "user": { + "login": "george-qi" + }, + "email": "georgeqi94@gmail.com", + "name": "George Qi" + }, + "oid": "bc5384c47036a6cda94129f3e2f9e43c43393698" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_4d.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "60fc3277634365b64465712b13db2acb76d6c890" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_ignore_index.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "1b8762e95bc38d1847fe99ed3230546c8b800bfd" + } }, { - "path": "test/onnx/expect/TestOperators.test_softmaxcrossentropy_weights.expect" + "commit": { + "author": { + "user": { + "login": "jerryzh168" + }, + "email": "jerryzh168@gmail.com", + "name": "Jerry Zhang" + }, + "oid": "6acf60f95f59ecbc6e8ce830dea0abba7d3ec763" + } }, { - "path": "test/onnx/expect/TestOperators.test_split.expect" + "commit": { + "author": { + "user": { + "login": "ysiraichi" + }, + "email": "yukio.siraichi@gmail.com", + "name": "Yukio Siraichi" + }, + "oid": "8fb0276561fdd530c5a06ea195e930e0584f8705" + } }, { - "path": "test/onnx/expect/TestOperators.test_split_with_sizes.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "1da7aed95a8700406671425eac1e4bbc2c7a24b5" + } }, { - "path": "test/onnx/expect/TestOperators.test_sqrt.expect" + "commit": { + "author": { + "user": { + "login": "thiagocrepaldi" + }, + "email": "thiago.crepaldi@microsoft.com", + "name": "Thiago Crepaldi" + }, + "oid": "83208e7dee4503c1bee1df9f6632794694dffa01" + } }, { - "path": "test/onnx/expect/TestOperators.test_std.expect" + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "1a46cf08dcd3d3564604c17b2c02d7e4eb45a7ff" + } }, { - "path": "test/onnx/expect/TestOperators.test_sum.expect" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nshulga@fb.com", + "name": "Nikita Shulga" + }, + "oid": "b7f9b6689445f826c83694652fea5f7cfc7070d7" + } }, { - "path": "test/onnx/expect/TestOperators.test_sum_dtype.expect" + "commit": { + "author": { + "user": { + "login": "fatcat-z" + }, + "email": "jiz@microsoft.com", + "name": "Jay Zhang" + }, + "oid": "f273961c1696b156e35f8c76f7ad37934031050d" + } }, { - "path": "test/onnx/expect/TestOperators.test_tan.expect" + "commit": { + "author": { + "user": { + "login": "pavithranrao" + }, + "email": "pavithran@fb.com", + "name": "Pavithran Ramachandran" + }, + "oid": "eb410a51fcbc716873fd80a970eb932d4aaaea61" + } }, { - "path": "test/onnx/expect/TestOperators.test_topk.expect" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "7dbb12cdc02332fa64264ed0df576511a5070d7e" + } }, { - "path": 
"test/onnx/expect/TestOperators.test_topk_smallest_unsorted.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "43675665fa6b5154de8b25125dd03d7be35c884f" + } }, { - "path": "test/onnx/expect/TestOperators.test_transpose.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "6c4d23c402c413667463770d9a2fa801f493d3c5" + } }, { - "path": "test/onnx/expect/TestOperators.test_type_as.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "cf3778a35129a40dee14366515201b7ed2c0f346" + } }, { - "path": "test/onnx/expect/TestOperators.test_unfold.expect" + "commit": { + "author": { + "user": { + "login": "dzdang" + }, + "email": "dzdang@umich.edu", + "name": "dzdang" + }, + "oid": "9d00a051373cb81f79cb6375942cf3ec9fff2fe6" + } }, { - "path": "test/onnx/expect/TestOperators.test_unique.expect" + "commit": { + "author": { + "user": { + "login": "pytorchmergebot" + }, + "email": "pytorchmergebot@users.noreply.github.com", + "name": "PyTorch MergeBot" + }, + "oid": "1eae67cf404aa8dffb80b8e85180f943878d52a6" + } }, { - "path": "test/onnx/expect/TestOperators.test_unsqueeze.expect" + "commit": { + "author": { + "user": { + "login": "janeyx99" + }, + "email": "janeyx@fb.com", + "name": "Jane Xu" + }, + "oid": "ce0e69dcda0fe41a6e964d6ac70ce8016979c71a" + } }, { - "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale.expect" + "commit": { + "author": { + "user": { + "login": "swolchok" + }, + "email": "swolchok@fb.com", + "name": "Scott Wolchok" + }, + "oid": "6faba554f6e49777f24911928edb3061b6ed0e3d" + } }, { - "path": "test/onnx/expect/TestOperators.test_upsample_nearest_scale_default_scale_factor.expect" + "commit": { + "author": { + "user": { + "login": "IvanYashchuk" + }, + "email": "ivan.yashchuk@aalto.fi", + "name": "Ivan Yashchuk" + }, + "oid": "d1d0e03f57a359f8f95331f9a34b8bed3e7cc845" + } }, { - "path": "test/onnx/expect/TestOperators.test_upsample_nearest_size.expect" + "commit": { + "author": { + "user": { + "login": "Chillee" + }, + "email": "chilli@fb.com", + "name": "Horace He" + }, + "oid": "bb46bd9233a9fc631802a902cb48a4c13c2722ca" + } }, { - "path": "test/onnx/expect/TestOperators.test_view.expect" + "commit": { + "author": { + "user": { + "login": "mehtanirav" + }, + "email": "niravmehta@fb.com", + "name": "Nirav Mehta" + }, + "oid": "3b1007fe4be12e483f2620fbac67cae42e703efc" + } }, { - "path": "test/onnx/expect/TestOperators.test_view_flatten.expect" + "commit": { + "author": { + "user": { + "login": "mehtanirav" + }, + "email": "niravmehta@fb.com", + "name": "Nirav Mehta" + }, + "oid": "b4b65228dd0c109f5fdf17c7d9e56f60a98e398b" + } }, { - "path": "test/onnx/expect/TestOperators.test_zeros_like.expect" + "commit": { + "author": { + "user": { + "login": "albanD" + }, + "email": "albandes@fb.com", + "name": "Alban Desmaison" + }, + "oid": "d629e300705196d3ae0bac5ed983b197101fa2ee" + } }, { - "path": "torch/csrc/jit/serialization/export.cpp" + "commit": { + "author": { + "user": { + "login": "bigfootjon" + }, + "email": "jonjanzen@fb.com", + "name": "Jon Janzen" + }, + "oid": "52754b9e515f378f8476ad44d75b0a692bad8cde" + } }, - { - "path": "torch/csrc/jit/serialization/export.h" - } - ], - "pageInfo": { - "endCursor": "MTYy", - 
"hasNextPage": false - } - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=74649 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": true, - "isCrossRepository": false, - "author": { - "login": "malfet" - }, - "title": "This should fail flake8", - "body": "Test issue for GHF mandatory checks", - "headRefName": "malfet-patch-8", - "headRepository": { - "nameWithOwner": "pytorch/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ { "commit": { "author": { "user": { - "login": "malfet" + "login": "samdow" }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" + "email": "samdow@fb.com", + "name": "samdow" }, - "oid": "57c86ff1c5ab948888fd329986c9d55796680e33" + "oid": "128c3ad747093f4970329a82c7c4720420faeff2" } }, { "commit": { "author": { "user": { - "login": "malfet" + "login": "arindamroy-eng" }, - "email": "nshulga@fb.com", - "name": "Nikita Shulga" + "email": "61168652+arindamroy-eng@users.noreply.github.com", + "name": "arindamroy-eng" }, - "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + "oid": "2a0bda7d32a5bcc9827f7254a7b77cceb16ba973" } } ], "pageInfo": { - "endCursor": "Mg", - "hasNextPage": false + "endCursor": "MTAw", + "hasNextPage": true }, - "totalCount": 2 + "totalCount": 131 }, "commits": { "nodes": [ @@ -15037,14 +24739,14 @@ } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsK3w=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNRg4=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018129" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693698" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1E=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRAI=" }, { "node": { @@ -15061,9 +24763,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018131" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693712" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1M=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRBA=" }, { "node": { @@ -15080,9 +24782,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018132" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693725" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Q=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRB0=" }, { "node": { @@ -15099,9 +24801,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018134" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693741" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Y=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRC0=" }, { "node": { @@ -15118,9 +24820,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018139" + "url": 
"https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693761" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1s=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsREE=" }, { "node": { @@ -15137,9 +24839,9 @@ } }, "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018142" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193693774" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj14=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRE4=" }, { "node": { @@ -15149,110 +24851,85 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" } }, "checkRuns": { "nodes": [ { - "name": "clang-format", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669399915?check_suite_focus=true" - }, - { - "name": "clang-tidy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669399990?check_suite_focus=true" - }, - { - "name": "cmakelint", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400052?check_suite_focus=true" - }, - { - "name": "flake8-py3", - "conclusion": "FAILURE", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400154?check_suite_focus=true" - }, - { - "name": "mypy", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400239?check_suite_focus=true" - }, + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192463/jobs/3232430975" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuNR-Y=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694412" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRsw=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ { "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400327?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461134" }, { "name": "Test collect_env (without_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400361?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461211" }, { - "name": "Test tools", + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400470?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461301" }, { - "name": "py2-setup-validate-errormsg", + "name": "Test tools", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400681?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461386" }, { "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669400789?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461521" }, { - "name": "toc", + "name": "lintrunner", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5669400953?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461634" }, { - "name": "shellcheck", + "name": "workflow-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669401126?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsMiY=", - "hasNextPage": false - } - }, - "conclusion": "FAILURE", - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018384" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFA=" - }, - { - "node": { - "app": { - "name": "GitHub Actions", - "databaseId": 15368 - }, - "workflowRun": { - "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" - } - }, - "checkRuns": { - "nodes": [ - { - "name": "run-torchbench", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669399917?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192461/jobs/3232461717" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsLW0=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuN84s=", "hasNextPage": false } }, - "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018395" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694417" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRtE=" }, { "node": { @@ -15268,2704 +24945,7298 @@ "checkRuns": { "nodes": [ { - "name": "pytorch-xla-linux-bionic-py3.7-clang8", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414276?check_suite_focus=true" + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232460797" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414324?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232460951" }, { - "name": "linux-bionic-py3.7-clang9 / build", + "name": "linux-xenial-py3.7-clang7-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414430?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461088" }, { - "name": "linux-bionic-rocm4.5-py3.7 / build", + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414605?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461294" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414697?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461410" }, { - "name": "linux-xenial-py3.7-gcc5.4 / build", + "name": "linux-xenial-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5669414841?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461543" }, { - "name": "linux-xenial-py3-clang5-mobile-build / build", + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669414951?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461628" }, { - "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "name": "linux-bionic-rocm5.0-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415003?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461719" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415060?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461789" }, { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415120?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461869" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415166?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232461946" }, { - "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415236?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462044" }, { - "name": "win-vs2019-cuda11.3-py3 / build", + "name": "linux-xenial-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415288?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462112" }, { - "name": "win-vs2019-cpu-py3 / build", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415348?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462244" }, { - "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "name": "win-vs2019-cuda11.3-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415451?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462360" }, { - "name": "linux-xenial-py3.7-gcc7 / build", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415561?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462432" }, { - "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415607?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462521" }, { - "name": "linux-xenial-py3.7-clang7-onnx / build", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415642?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462621" }, { - "name": "pytorch-xla-linux-bionic-py3.7-clang8", - "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415706?check_suite_focus=true" + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462683" }, { - "name": "linux-xenial-py3.7-clang7-asan / build", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669415757?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232462738" }, { "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669488974?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232545510" }, { "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669489019?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232545571" }, { "name": "linux-docs / build-docs (cpp)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492162?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547522" }, { "name": "linux-docs / build-docs (python)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492211?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547612" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492293?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547714" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492341?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547764" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492396?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547824" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/5669492440?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547869" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492497?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547909" }, { "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669492558?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232547973" }, { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669496296?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553452" }, { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669496350?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553558" }, { - "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669496393?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553605" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669498726?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232553650" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669500818?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232563716" }, { "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669500848?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232563763" }, { "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669518721?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582650" }, { "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669518760?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582703" }, { "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", "conclusion": "SUCCESS", - 
"detailsUrl": "https://github.com/pytorch/pytorch/runs/5669518798?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232582741" }, { - "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669549301?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232590204" }, { - "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669549318?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232608872" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669559843?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232608976" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669567414?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637097" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669567499?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637199" }, { "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669567553?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232637259" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232639932" }, { "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669619773?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232687012" }, { "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669619803?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232687074" }, { "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669724420?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232785088" }, { "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669724451?check_suite_focus=true" - 
}, - { - "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/5669724478?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2197192471/jobs/3232785153" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHxIT4=", - "hasNextPage": false + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAWuVD9M=", + "hasNextPage": true } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018405" + "url": "https://github.com/pytorch/pytorch/commit/5696e8357cf38f852ef3d680381513e26f202371/checks?check_suite_id=6193694439" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkGU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXEsRuc=" } ], "pageInfo": { "hasNextPage": false } }, - "pushedDate": "2022-03-24T00:42:33Z", - "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + "status": null, + "pushedDate": "2022-04-20T17:10:41Z", + "oid": "5696e8357cf38f852ef3d680381513e26f202371" } } ] }, - "changedFiles": 1, + "changedFiles": 348, "files": { "nodes": [ { - "path": "torch/nn/cpp.py" - } - ], - "pageInfo": { - "endCursor": "MQ", - "hasNextPage": false - } - }, - "reviews": { - "nodes": [ + "path": ".circleci/cimodel/data/pytorch_build_data.py" + }, { - "author": { - "login": "seemethere" - }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0yM1QxNzo1MDo0NS0wNTowMLkyMDIyLTAzLTIzVDE3OjUwOjQ1LTA1OjAwzjbPEDg=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "path": ".circleci/cimodel/data/pytorch_build_definitions.py" + }, { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/74649\n\u21a9\ufe0f \u00a0[fb-only] Re-run with SSH instructions\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 6c3c3de (more details on the Dr. CI page):\n\n\n1/1 failures introduced in this PR\n\n\n1 failure not recognized by patterns:\n\n\n\nJob\nStep\nAction\n\n\n\n\n Lint / flake8-py3\nFail if there were any warnings\n\ud83d\udd01 rerun\n\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 1076891218 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOQDAOUg==", - "hasPreviousPage": false - } - }, - "labels": { - "edges": [ + "path": ".circleci/scripts/cpp_doc_push_script.sh" + }, { - "node": { - "name": "cla signed" - } - } - ] - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=metamates org=pytorch": { - "data": { - "organization": { - "team": { - "members": { - "nodes": [ + "path": ".circleci/scripts/python_doc_push_script.sh" + }, { - "login": "dreiss" + "path": ".github/actions/checkout-pytorch/action.yml" }, { - "login": "kumpera" + "path": ".github/merge_rules.json" }, { - "login": "ezyang" + "path": ".github/scripts/gitutils.py" }, { - "login": "yaroslavvb" + "path": ".github/scripts/gql_mocks.json" }, { - "login": "stephenroller" + "path": ".github/scripts/trymerge.py" }, { - "login": "swolchok" + "path": ".github/workflows/_bazel-build-test.yml" }, { - "login": "hyuen" + "path": ".github/workflows/_linux-build.yml" }, { - "login": "orionr" + "path": ".github/workflows/_linux-test.yml" }, { - "login": "dhruvbird" + "path": ".github/workflows/_mac-test.yml" }, { - "login": "likethesky" + "path": ".github/workflows/_rocm-test.yml" }, { - "login": "lw" + "path": ".github/workflows/_win-test.yml" }, { - "login": "raziel" + "path": ".github/workflows/buck_build_test.yml" }, { - "login": "simpkins" + "path": ".github/workflows/lint.yml" }, { - "login": "ebyrne" + "path": ".github/workflows/periodic.yml" }, { - "login": "Babar" + "path": ".github/workflows/pull.yml" }, { - "login": "kostmo" + "path": ".github/workflows/trunk.yml" }, { - "login": "0x00b1" + "path": ".jenkins/pytorch/macos-test.sh" }, { - "login": "bhosmer" + "path": ".jenkins/pytorch/test.sh" }, { - "login": "zdevito" + "path": ".jenkins/pytorch/win-test.sh" }, { - "login": "bugra" + "path": ".lintrunner.toml" }, { - "login": "caraya10" + "path": "BUILD.bazel" }, { - "login": "kit1980" + "path": "CODEOWNERS" }, { - "login": "shoumikhin" + "path": "README.md" }, { - "login": "huydhn" + "path": "aten/src/ATen/BatchingRegistrations.cpp" }, { - "login": "teytaud" + "path": "aten/src/ATen/Dispatch.h" }, { - "login": "xuzhao9" + "path": "aten/src/ATen/ExpandUtils.h" }, { - "login": "jansel" + "path": "aten/src/ATen/FunctionalInverses.cpp" + }, + { + "path": "aten/src/ATen/FunctionalStorageImpl.cpp" + }, + { + "path": "aten/src/ATen/FunctionalStorageImpl.h" + }, + { + "path": "aten/src/ATen/FunctionalTensorWrapper.cpp" + }, + { + "path": "aten/src/ATen/FunctionalTensorWrapper.h" + }, + { + "path": "aten/src/ATen/FunctionalizeFallbackKernel.cpp" + }, + { + "path": "aten/src/ATen/NestedTensorImpl.cpp" + }, + { + "path": "aten/src/ATen/OpMathType.h" + }, + { + "path": "aten/src/ATen/SparseCsrTensorUtils.h" + }, + { + "path": "aten/src/ATen/ThreadLocalState.cpp" + }, + { + "path": "aten/src/ATen/ThreadLocalState.h" + }, + { + "path": "aten/src/ATen/autocast_mode.cpp" + }, + { + "path": "aten/src/ATen/autocast_mode.h" + }, + { + "path": "aten/src/ATen/core/SymIntArrayRef.cpp" + }, + { + "path": "aten/src/ATen/core/SymIntArrayRef.h" + }, + { + "path": "aten/src/ATen/core/TensorBase.h" + }, + { + "path": "aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h" + }, + { + "path": "aten/src/ATen/core/dispatch/Dispatcher.h" + 
}, + { + "path": "aten/src/ATen/core/interned_strings.h" + }, + { + "path": "aten/src/ATen/core/ivalue.cpp" + }, + { + "path": "aten/src/ATen/core/ivalue.h" + }, + { + "path": "aten/src/ATen/core/ivalue_inl.h" + }, + { + "path": "aten/src/ATen/core/jit_type.h" + }, + { + "path": "aten/src/ATen/core/jit_type_base.h" + }, + { + "path": "aten/src/ATen/core/type.cpp" + }, + { + "path": "aten/src/ATen/cuda/CUDASparse.h" + }, + { + "path": "aten/src/ATen/cuda/llvm_complex.cpp" + }, + { + "path": "aten/src/ATen/cuda/llvm_jit_strings.h" + }, + { + "path": "aten/src/ATen/native/Blas.cpp" + }, + { + "path": "aten/src/ATen/native/Itertools.cpp" + }, + { + "path": "aten/src/ATen/native/LinearAlgebra.cpp" + }, + { + "path": "aten/src/ATen/native/SoftMax.cpp" + }, + { + "path": "aten/src/ATen/native/TensorConversions.cpp" + }, + { + "path": "aten/src/ATen/native/TensorShape.cpp" + }, + { + "path": "aten/src/ATen/native/TensorShape.h" + }, + { + "path": "aten/src/ATen/native/Unique.cpp" + }, + { + "path": "aten/src/ATen/native/cuda/BinaryMiscBackwardOpsKernels.cu" }, { - "login": "abhinavarora" + "path": "aten/src/ATen/native/cuda/CUDAJitLoops.cuh" }, { - "login": "b0noI" + "path": "aten/src/ATen/native/cuda/JitLoops.cuh" }, { - "login": "djthorne" + "path": "aten/src/ATen/native/cuda/Lerp.cu" }, { - "login": "nairbv" + "path": "aten/src/ATen/native/cuda/PersistentSoftmax.cuh" }, { - "login": "Mortimerp9" + "path": "aten/src/ATen/native/cuda/SoftMax.cu" }, { - "login": "dadkins20" + "path": "aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu" }, { - "login": "colesbury" + "path": "aten/src/ATen/native/cuda/Unique.cu" }, { - "login": "laurencer" + "path": "aten/src/ATen/native/cuda/jit_utils.cpp" }, { - "login": "nickgg" + "path": "aten/src/ATen/native/cuda/jit_utils.h" }, { - "login": "yzhao30" + "path": "aten/src/ATen/native/native_functions.yaml" }, { - "login": "rmaz" + "path": "aten/src/ATen/native/nested/NestedTensorMath.cpp" }, { - "login": "bearzx" + "path": "aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp" }, { - "login": "mattjgalloway" + "path": "aten/src/ATen/native/quantized/cpu/qsoftmax.cpp" }, { - "login": "chenyang78" + "path": "aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp" }, { - "login": "yns88" + "path": "aten/src/ATen/native/quantized/cudnn/Linear.cpp" }, { - "login": "lc0" + "path": "aten/src/ATen/native/quantized/cudnn/utils.h" }, { - "login": "wenleix" + "path": "aten/src/ATen/native/sparse/SparseCsrTensor.cpp" }, { - "login": "jingsh" + "path": "aten/src/ATen/native/ts_native_functions.yaml" }, { - "login": "mthrok" + "path": "aten/src/ATen/record_function.cpp" }, { - "login": "drdarshan" + "path": "aten/src/ATen/record_function.h" }, { - "login": "tvalentius" + "path": "aten/src/ATen/templates/Operators.h" }, { - "login": "d4l3k" + "path": "aten/src/ATen/templates/RegisterFunctionalization.cpp" }, { - "login": "jamiemccrindle" + "path": "aten/src/ATen/test/basic.cpp" }, { - "login": "kazhang" + "path": "aten/src/ATen/test/vmap_test.cpp" }, { - "login": "simonhollis" + "path": "binaries/record_function_benchmark.cc" }, { - "login": "lqiao" + "path": "c10/core/DispatchKey.cpp" }, { - "login": "ajyu" + "path": "c10/core/DispatchKey.h" }, { - "login": "govardhan" + "path": "c10/core/DispatchKeySet.h" }, { - "login": "yinghai" + "path": "c10/test/core/DispatchKeySet_test.cpp" }, { - "login": "zyan0" + "path": "c10/util/ArrayRef.h" }, { - "login": "ajtulloch" + "path": "caffe2/core/tensor.h" }, { - "login": "vtlam" + "path": "docs/source/conf.py" }, { - "login": "pbelevich" + 
"path": "docs/source/fx.rst" + } + ], + "pageInfo": { + "endCursor": "MTAw", + "hasNextPage": true + } + }, + "reviews": { + "nodes": [], + "pageInfo": { + "startCursor": null, + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "Merge failed due to Matched rule superuser, but it was not reviewed yet by any of:zou3519,abhikrish,mehtanirav,wconstab,lc0, ...", + "createdAt": "2022-04-20T17:26:18Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1104215370 }, { - "login": "VitalyFedyunin" + "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet", + "createdAt": "2022-04-20T17:31:26Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1104220908 }, { - "login": "dbish" + "bodyText": "@pytorchbot merge this", + "createdAt": "2022-04-20T19:30:50Z", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1104378397 }, { - "login": "NicolasHug" + "bodyText": "Merge failed due to Matched rule superuser, but PR has not been reviewed yet\nRaised by https://github.com/pytorch/pytorch/actions/runs/2197877090", + "createdAt": "2022-04-20T19:32:10Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1104379712 }, { - "login": "efaust" + "bodyText": "Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale. Feel free to remove the Stale label if you feel this was a mistake. If you are unable to remove the Stale label please contact a maintainer in order to do so. If you want the bot to never mark this PR stale again, add the no-stale label.Stale pull requests will automatically be closed after 30 days of inactivity.", + "createdAt": "2022-06-20T16:44:05Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1160658699 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQdD9Sg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" + } }, { - "login": "jfix71" + "node": { + "name": "Stale" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=76123 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "kumpera" + }, + "title": "Introduce distributed checkpoint with ShardedTensor.", + "body": "Co-authored-by: Wen Zhang \r\nCo-authored-by: Yifu Wang \r\n\r\n", + "headRefName": "st_checkpoint", + "headRepository": { + "nameWithOwner": "kumpera/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": "kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "6bf248bc20a71f248064b795f38276326fe43aae" + } }, { - "login": "atuljangra" + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": "kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "10f84fb90bf02d7062e565ebf2c1da6352b64db7" + } }, { - "login": "idning" + "commit": { + "author": { + "user": { + "login": "kumpera" + }, + "email": 
"kumpera@fb.com", + "name": "Rodrigo Kumpera" + }, + "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" + } + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + }, + "totalCount": 3 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS2l4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755666" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSmtI=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063614/jobs/3379894109" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2r3Q=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755785" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0k=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894107" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894332" + }, + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894444" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894520" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894567" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894616" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063615/jobs/3379894672" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd2shU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755786" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm0o=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902301" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", 
+ "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902363" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902507" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902560" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902579" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902603" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902637" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902685" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902740" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902761" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902794" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379902874" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903006" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903111" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903193" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903284" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903357" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903446" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903512" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379903546" + }, + { + "name": 
"linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379944655" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379944695" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946308" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946337" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946359" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946391" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946423" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946453" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946496" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379946529" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950041" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950137" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950165" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950192" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379950646" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379951202" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379951230" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963877" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963928" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379963976" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379964018" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379966372" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379996173" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379996218" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379997861" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998374" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998397" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998422" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3379998441" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2273063632/jobs/3380042106" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXd5yuY=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6380755806" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXxSm14=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419477" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419699" + }, + { + "name": "Test collect_env (with_torch)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419923" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387419992" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420129" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420208" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796859/jobs/3387420309" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS3SE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363240" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNGg=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796862/jobs/3387419465" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgS1-o=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363271" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNIc=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387419999" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420164" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420316" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420477" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420675" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387420934" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421278" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421672" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421888" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387421982" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422191" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422303" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422476" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422715" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387422963" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423092" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423234" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423421" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423622" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387423739" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387545789" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387546032" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387546119" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553028" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553144" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553251" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553438" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553556" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387553668" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387554002" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387554098" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387558927" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559016" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559071" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387559139" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387563803" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387563894" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580868" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580936" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387580993" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 4, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387581053" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387592286" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387631950" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387632035" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387649916" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387649974" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650084" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650151" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387650373" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2276796865/jobs/3387753429" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAXgaCXo=", + "hasNextPage": true + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/96c5299740ec791f3cf0975c03a40a7b219b6747/checks?check_suite_id=6390363300" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXzlNKQ=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-05-05T00:34:26Z", + "oid": "96c5299740ec791f3cf0975c03a40a7b219b6747" + } + } + ] + }, + "changedFiles": 11, + "files": { + "nodes": [ + { + "path": "test/distributed/_shard/checkpoint/test_checkpoint.py" }, { - "login": "soumith" + "path": "test/distributed/_shard/checkpoint/test_file_system_checkpoint.py" }, { - "login": "nimin98" + "path": "test/distributed/_shard/sharded_tensor/test_sharded_tensor.py" }, { - "login": "chaekit" + "path": "torch/distributed/_shard/checkpoint/__init__.py" }, { - "login": "radkris-git" + "path": "torch/distributed/_shard/checkpoint/filesystem.py" }, { - "login": "xunnanxu" + "path": "torch/distributed/_shard/checkpoint/metadata.py" }, { - "login": "javier-m" + "path": "torch/distributed/_shard/checkpoint/resharding.py" }, { - "login": "jmdetloff" + "path": "torch/distributed/_shard/checkpoint/state_dict_loader.py" }, { - "login": "mostafaelhoushi" + "path": "torch/distributed/_shard/checkpoint/state_dict_saver.py" }, { - "login": "brianjo" + "path": "torch/distributed/_shard/checkpoint/storage.py" }, { - "login": "ShijunK" - }, + "path": "torch/testing/_internal/distributed/_shard/sharded_tensor/_test_st_common.py" + } + ], + "pageInfo": { + "endCursor": "MTE", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "login": "suo" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "vkuzo" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "seemethere" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "cpuhrsch" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "qihqi" + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "jackm321" + 
"author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "linbinyu" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "neerajprad" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "gnadathur" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "rsemenov" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "ziky90" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "gmagogsfm" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "zzzwen" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "ikriv" + "author": { + "login": "wanchaol" + }, + "state": "COMMENTED" }, { - "login": "deeptigp" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "andrewor14" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jianyuh" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "cykustcc" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "highker" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "beauby" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jeffreyksmithjr" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "suphoff" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "smessmer" - } - ], - "pageInfo": { - "hasNextPage": true, - "endCursor": "Y3Vyc29yOnYyOpHOACQ5JQ==" - } - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOACQ5JQ== name=metamates org=pytorch": { - "data": { - "organization": { - "team": { - "members": { - "nodes": [ + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + }, { - "login": "ananthsub" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "d1jang" + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "firstprayer" + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "malfet" + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "login": "fegin" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "hanton" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "zanqi" + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "bujar" + "author": { + "login": "zzzwen" + }, + "state": "COMMENTED" }, { - "login": "supriyar" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "kausv" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "divchenko" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "dagitses" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "rahuln32" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bilgeacun" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "caogao" + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "login": "miguelmartin75" + "author": { + "login": "simpkins" + }, + "state": "COMMENTED" }, { - "login": "penguinwu" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "login": "shz117" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - 
"login": "ajliu" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "login": "saketh-are" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "msaroufim" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "mdundas" + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" }, { - "login": "davides" + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" }, { - "login": "alannnna" + "author": { + "login": "wilson100hong" + }, + "state": "COMMENTED" }, { - "login": "hlin09" + "author": { + "login": "xunnanxu" + }, + "state": "DISMISSED" }, { - "login": "hudeven" + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "login": "terrychenism" + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "login": "xiaomengy" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jisaacso" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "fkhan1337" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "xing-liu" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "alanadakotashine" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "desertfire" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "YosuaMichael" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "banitag1" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "letterx" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "gchanan" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "dbort" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bilalsal" + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "login": "DanilBaibak" + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "login": "serhaty" + "author": { + "login": "xunnanxu" + }, + "state": "COMMENTED" }, { - "login": "yf225" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "yifuwang" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "piyushmh" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "z-a-f" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "superzgc" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bertmaher" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "chauhang" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "ZainRizvi" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jiayisuse" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bochko" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jeanschmidt" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bradleyhd" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "ZolotukhinM" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jamesr66a" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "mullachv" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "voznesenskym" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" 
}, { - "login": "charliechen0401" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "bwasti" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "cryptopic" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "chinannyang" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "NivekT" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "zhxchen17" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "jerryzh168" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "MohammadMahdiJavanmard" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "login": "rajkar86" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "login": "wconstab" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "Hangjun" + "author": { + "login": "pritamdamania87" + }, + "state": "COMMENTED" }, { - "login": "davidberard98" + "author": { + "login": "pritamdamania87" + }, + "state": "APPROVED" }, { - "login": "Krovatkin" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "CamiWilliams" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "J0Nreynolds" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "datumbox" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "aartibasant" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "xta0" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "zou3519" + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" }, { - "login": "xman1979" - }, + "author": { + "login": "kumpera" + }, + "state": "COMMENTED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0yNVQxMTozNTowMS0wNzowMLkyMDIyLTA0LTI1VDExOjM1OjAwLTA3OjAwzjjC2d0=", + "hasPreviousPage": true + } + }, + "comments": { + "nodes": [ { - "login": "suraj813" + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "createdAt": "2022-05-05T12:35:49Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118495479 }, { - "login": "gqchen" + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "createdAt": "2022-05-05T12:53:15Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118511287 }, { - "login": "george-qi" + "bodyText": "Merge failed due to Can't fetch all PR reviews\nRaised by https://github.com/pytorch/pytorch/actions/runs/2275691136", + "createdAt": "2022-05-05T15:00:08Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118662274 }, { - "login": "abhikrish" + "bodyText": "Merge failed due to Can't fetch all PR reviews Raised by https://github.com/pytorch/pytorch/actions/runs/2275691136\n\n@osalpekar @malfet This is failing because there are 109 review comments on this PR but we only fetch the first 100. 
This could be solved with a similar concept as how we fetch more comments/check_runs.", + "createdAt": "2022-05-05T15:20:46Z", + "author": { + "login": "janeyx99" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118689010 }, { - "login": "zhangguanheng66" - }, + "bodyText": "On a side note, has the test_fsdp_clip_grad_norm_norm_type_2_0_nested_fsdp_False_cpu_offload_CPUOffload failure on the distributed test first shard of this PR been addressed?", + "createdAt": "2022-05-05T15:24:08Z", + "author": { + "login": "janeyx99" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1118693497 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQqri9w==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "login": "mikeiovine" + "node": { + "name": "oncall: distributed" + } }, { - "login": "Adolfo-Karim" - }, + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=71759 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "coolteemf" + }, + "title": "Optimize grid sample 3d", + "body": "Fixes #71415\r\nI have implemented the changes that replicate what @to-mi did in this [PR](https://github.com/pytorch/pytorch/pull/65986#issue-1012959443) for the 3D case :\r\n\r\n> Fixes #64977\r\n> \r\n> Avoids creating a tensor for and calculating `input` gradient if it's not needed in the backward pass of `grid_sample` (2d case, native CPU & CUDA kernels). Especially the tensor creation seemed time consuming (see #64977).\r\n> \r\n> Brief description of the changes:\r\n> \r\n> * I have tried to go with rather minimal changes. It would probably be possible to make a more elegant version with a bit larger refactoring (or possibly with better understanding of PyTorch internals and C++ functionalities).\r\n> \r\n> * Changed the `native_functions.yaml` and `derivatives.yaml` so that the gradient input mask is passed to the functions.\r\n> \r\n> * Changed the CPU kernels:\r\n> (1) added `bool input_requires_grad` template parameter to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorAccessor* gInp_slice_ptr` instead of `TensorAccessor& gInp_slice` so that I can pass a `nullptr` in case gradient for `input` is not requested. (A bit inelegant perhaps, but allows to keep one signature for `backward` function and not require breaking it to smaller pieces. 
Perhaps there's a more elegant way to achieve this?)\r\n> \r\n> * Changed CUDA kernel:\r\n> (1) added ~`bool input_requires_grad` template parameter~ `const bool input_requires_grad` argument to the `backward` function,\r\n> (2) added if branches based on it to remove `input` gradient computations if it's not requested,\r\n> (3) feed in `TensorInfo()` instead of `getTensorInfo(grad_input)` in case gradient for `input` is not requested.\r\n> \r\n> * Modified tests in `test/test_nn.py` so that they run also cases with no `input` gradient needed.\r\n> \r\n> * Have not touched the CPU fallback kernel.\r\n\r\nNote: the changes number (3) are N/A in this case.\r\n\r\n", + "headRefName": "optimize_grid_sample_3d", + "headRepository": { + "nameWithOwner": "coolteemf/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "login": "Chillee" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "e0b0d1e695aeddceaf265da602c4704592053e9e" + } }, { - "login": "albanD" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "563ec73747ad53b63b36736c47c4342f962c2a09" + } }, { - "login": "bigfootjon" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "51abe41a132d9dd5b1c0551bdca902aacc028ff8" + } }, { - "login": "robotal" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "be9898205992034a00e8ace8a55c2ecdcee2c2f8" + } }, { - "login": "MarcioPorto" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "2929c60b64384c2deae0f7dea8bab94ad4bc9ec8" + } }, { - "login": "srsuryadev" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "9241b737e7e2b257905cc74ad9c50b737d7f9d0a" + } }, { - "login": "IvanKobzarev" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "64d6b795d0636928a8aa2fd3da01302fb5f5f7af" + } }, { - "login": "eprivezentsev" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "4503577e53760a0006f1e80ca6bfe04d2be90470" + } }, { - "login": "kwen2501" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "b16f4b11ffbbbf2ca2098f9702af4ef6b6fc5e1f" + } }, { - "login": "linux-jedi" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "7ffc23368a604afdc92d2818747f730ce31a2bb5" + } }, { - "login": "chandlerzuo" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "b85292604b9ad6c31706b76b5a5498c4f6d94309" + } }, { - "login": "prateek1404" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "9d81d7bae8ad91aaa24b3ceab83e3138894dbc69" + } }, { - "login": 
"otsneh" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "e79f6a2202512b294c55bf4bfb2e0524fafd4c48" + } }, { - "login": "husthyc" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "f683e8aec7aea76097a264eec01511e704c31154" + } }, { - "login": "briancoutinho" + "commit": { + "author": { + "user": { + "login": "coolteemf" + }, + "email": "67541941+coolteemf@users.noreply.github.com", + "name": "Fran\u00e7ois Lecomte" + }, + "oid": "b932e9e286c22aaf352375186df851ef060b295a" + } }, { - "login": "fduwjj" + "commit": { + "author": { + "user": null, + "email": "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", + "name": "coolteemf" + }, + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + } } ], "pageInfo": { - "hasNextPage": true, - "endCursor": "Y3Vyc29yOnYyOpHOAGncmA==" - } - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAGncmA== name=metamates org=pytorch": { - "data": { - "organization": { - "team": { - "members": { + "endCursor": "MTY", + "hasNextPage": false + }, + "totalCount": 16 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGYqY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801320" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_T6g=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-onnx" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663109808" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663214802" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754066/jobs/2663214856" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIob0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801849" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubk=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754064/jobs/2663109676" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1E=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801852" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ubw=" 
+ }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663109684" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401083" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401143" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754065/jobs/2663401186" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwMsZY=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801853" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub0=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663109680" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995756" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995819" + }, + { + "name": "test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754068/jobs/2663995900" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwZbzg=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801855" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Ub8=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "mypy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109683" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109827" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663109962" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110044" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110132" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110233" + }, + { + "name": "quick-checks", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110320" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110461" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754069/jobs/2663110575" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGbAQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801856" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-clang7-asan" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663109804" + }, + { + "name": "test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233675" + }, + { + "name": "test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233731" + }, + { + "name": "test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754070/jobs/2663233805" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwJC4U=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801857" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcE=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754076/jobs/2663109810" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ_w=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801862" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_UcY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663109777" + }, + { + "name": "test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201383" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201458" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201512" + }, + { + "name": "test (distributed, 1, 1, 
linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201580" + }, + { + "name": "test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201672" + }, + { + "name": "test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754078/jobs/2663201839" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwIWu4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801866" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uco=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1886754079/jobs/2663109681" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATwGZ1k=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/346e0c547953d98eb84d23c1391a95badb9c4a22/checks?check_suite_id=5414801869" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAUK_Uc0=" + } + ], + "pageInfo": { + "hasNextPage": true + } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017798?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017799?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017816?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17017800?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-02-23T10:39:30Z", + "oid": "346e0c547953d98eb84d23c1391a95badb9c4a22" + } + } + ] + }, + "changedFiles": 9, + "files": { "nodes": [ { - "login": "frank-wei" - }, - { - "login": "esqu1" - }, - { - "login": "prabhat00155" - }, - { - "login": "Gamrix" - }, - { - "login": "QuentinDuval" - }, - { - "login": "atalman" - }, - { - "login": "xush6528" - }, - { - "login": "dracifer" - }, - { - "login": "SS-JIA" - }, - { - "login": "helunwencser" - }, - { - "login": "xw285cornell" - }, - { - "login": "hhbyyh" - }, - { - "login": "rohan-varma" - }, - { - "login": "teng-li" - }, - { - "login": "larryliu0820" - }, - { - "login": "lyoka" - }, - { - "login": "cbalioglu" - }, - { - "login": "hl475" - }, - { - "login": "hwangjeff" - }, - { - "login": "Jack-Khuu" - }, - { - "login": "mehtanirav" - }, - { - "login": "nateanl" - }, - { - "login": "fuqianz" - }, - { - 
"login": "boyuantan" - }, - { - "login": "muntaqim" - }, - { - "login": "ymao1993" - }, - { - "login": "fmassa" - }, - { - "login": "esantorella" - }, - { - "login": "HamidShojanazeri" - }, - { - "login": "akshayParashar1995" - }, - { - "login": "jubinchheda" - }, - { - "login": "mehdimashayekhi" - }, - { - "login": "rkindi" - }, - { - "login": "wanchaol" - }, - { - "login": "zephirefaith" - }, - { - "login": "alexbeloi" - }, - { - "login": "kapilsh" - }, - { - "login": "plahera" + "path": "aten/src/ATen/native/GridSampler.cpp" }, { - "login": "SherlockNoMad" + "path": "aten/src/ATen/native/cpu/GridSamplerKernel.cpp" }, { - "login": "pritamdamania87" + "path": "aten/src/ATen/native/cuda/GridSampler.cpp" }, { - "login": "psavla2" + "path": "aten/src/ATen/native/cuda/GridSampler.cu" }, { - "login": "rahxephon89" + "path": "aten/src/ATen/native/cuda/GridSampler.h" }, { - "login": "migeed-z" + "path": "aten/src/ATen/native/native_functions.yaml" }, { - "login": "iseeyuan" + "path": "test/forward_backward_compatibility/check_forward_backward_compatibility.py" }, { - "login": "Matphyler" + "path": "test/test_nn.py" }, { - "login": "protonu" - }, + "path": "tools/autograd/derivatives.yaml" + } + ], + "pageInfo": { + "endCursor": "OQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "login": "terhuhf" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "aruntonic" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "gcatron" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "yingrliu" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "alexanderguzhva" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "angelayi" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "zhaoalex" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "shahofblah" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "vivekmig" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "jspisak" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "akshaypandian" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "tktrungna" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "eellison" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "ziab" + "author": { + "login": "coolteemf" + }, + "state": "COMMENTED" }, { - "login": "NarineK" + "author": { + "login": "albanD" + }, + "state": "COMMENTED" }, { - "login": "andrewconnors" + "author": { + "login": "albanD" + }, + "state": "APPROVED" }, { - "login": "wenwei202" - }, + "author": { + "login": "albanD" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMS0yNVQwODoyODoxMC0wODowMLkyMDIyLTAxLTI1VDA3OjU0OjA1LTA4OjAwzjNooqI=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { - "login": "jg2912" + "bodyText": "Merge failed due to 'NoneType' object is not subscriptable\nRaised by https://github.com/pytorch/pytorch/actions/runs/1887945630", + "createdAt": "2022-02-23T14:55:36Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1048868910 }, { - "login": "robieta" + "bodyText": "Thanks for the update! 
The windows failure is not your fault, you can ignore it!\n\nThank you very much for all of your feedback and sorry for the delay !", + "createdAt": "2022-02-23T16:44:36Z", + "author": { + "login": "coolteemf" + }, + "authorAssociation": "CONTRIBUTOR", + "editor": null, + "databaseId": 1048983572 }, { - "login": "davidxili" + "bodyText": "@coolteemf can you please send either me or @albanD an email? (or I can send you and invite to collab on private repo)", + "createdAt": "2022-02-23T17:49:55Z", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1049048119 }, { - "login": "mreso" + "bodyText": "@pytorchbot merge this please", + "createdAt": "2022-02-23T19:23:55Z", + "author": { + "login": "albanD" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1049131992 }, { - "login": "soulitzer" - }, + "bodyText": "Hey @coolteemf.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-02-23T19:26:51Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1049134520 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOPoR4Lg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "login": "prigoyal" + "node": { + "name": "triaged" + } }, { - "login": "PaliC" + "node": { + "name": "open source" + } }, { - "login": "aovladi" + "node": { + "name": "cla signed" + } }, { - "login": "anijain2305" + "node": { + "name": "release notes: nn" + } }, { - "login": "pvtuan10" - }, + "node": { + "name": "topic: performance" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=75095 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "mruberry" + }, + "title": "Initial prims, references, and test architecture for them", + "body": "This PR adds an initial set of experimental primitive operations and Python references that reimplement existing PyTorch operations using them. See https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577 for additional context.\r\n\r\nThe following experimental primitives are added:\r\n\r\n- Elementwise unary prims -- abs, acos, acosh, asin, atan, cos, cosh, bessel_i0e, bessel_i1e, cbrt, ceil, digamma, erf, erf_inv, erfc, exp, expm1, floor, igamma, igammac, is_finite, lgamma, log, log1p, neg, reciprocal, round, sign, sinh, sqrt, square, tan. 
\r\n- Elementwise binary prims -- add, atan2, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, div, eq, ge, gt, le, lt, max, min, mul, ne, nextafter, pow, rsqrt, shift_left, shift_right_arithmetic\r\n- View prims -- brodcast_in_dim, collapse_view, split_dim, squeeze\r\n- Shape prims -- collapse, concatenate, reshape\r\n- Conditional prims -- select\r\n- Data conversion & movement prims -- convert_element_type, device_put\r\n- Inplace prims -- copy_to, resize\r\n\r\nThese primitives do not add any new functionality to PyTorch, but are intended to be the semantic building blocks for reference operators. We have tried to make them consistent with the operations in [jax.lax](https://jax.readthedocs.io/en/latest/jax.lax.html) where possible (because PyTorch prefers being consistent with other frameworks), although there are key differences between these prims and operations in jax.lax. Most notably is that these prims model view semantics and inplace operations.\r\n\r\nIn addition to these primitives the following elementwise binary Python references are added:\r\n\r\n- Elementwise binary Python references -- add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor, eq, float_power, ge, gt, le, lt, maximum, minimum, mul, ne, nextafter, pow, sub, true_divide\r\n- Conditional Python references - where\r\n- Data conversion & movement references - copy_to\r\n\r\nA Python reference implements the same behavior as its corresponding PyTorch operator (excepting slight numerical differences, bug fixes, and in some cases additional features). \r\n\r\nThe start of an OpInfo-based test architecture for these references is also included in this PR. A new list, `python_ref_db`, is added to `common_methods_invocations.py`. This list introduces the new `ElementwiseBinaryPythonRefInfo`, which inherits input arguments from the original operators' OpInfo, allows them to be overridden, and then constructs the OpInfo for the Python reference using the (potentially modified) arguments. OpInfo-based tests can opt-into testing references by including this new list in the Sequence passed to the `@ops` decorator. 
\r\n\r\ncc @ngimel @csarofeen @kevinstephano @Lezcano ", + "headRefName": "prims_and_references", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "login": "huangyi1979" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "a790467c650be92775103cde5e866c90b56f5376" + } }, { - "login": "osalpekar" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "bd6fcf50692e208ebecdc2eaa517a2bfcdcd35cf" + } }, { - "login": "xiaohui-zhang" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "4a119c8f21529fe1375e7e8789b91f41a3df80c5" + } }, { - "login": "jerry39213gh" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "ea6750dc34d66be759fdfe84b09fb0e23ee59c79" + } }, { - "login": "jarodhou" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "2eef8a55fe0227e1921b51bf1f56f9d0a29b49ac" + } }, { - "login": "hlu1" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "b886ed6c20dd1785fd31ed6fa6a8c5b6d0d0b16c" + } }, { - "login": "huiguoo" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "9ad9b63d09aa4f7a8549bcf1d88ea4ff0674299c" + } }, { - "login": "H-Huang" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "63fdd580118477416ae160e0670ae722ea248090" + } }, { - "login": "vtsyvina" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "0ccf7dc292af1d40d0a094eb2b2fb0c7ab4ccc70" + } }, { - "login": "qchip" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "e8a8a4d1fbe35f20eb88e1a43cf5a653883638e5" + } }, { - "login": "Nitrokitty" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "186634dfdd25645c05b58a212f9e8d77c4125fc0" + } }, { - "login": "satgera" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "f5b4741312b5c42a79f6c8a1d3930b79db38ed8f" + } }, { - "login": "ngimel" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "23d50391bb0fd12111fd3171591c4235ffb2fc1a" + } }, { - "login": "dongreenberg" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "bac9d45422d58f513b60b4b854441cfdc253d4c5" + } }, { - "login": "sijiac" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "13240ae0b4a0332c3167b65ac026a3172da90cb7" + } }, { - "login": "markkm" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. 
Yang" + }, + "oid": "1ee34468cb1db3dc6cbae204669f4fec20e2a466" + } }, { - "login": "EscapeZero" + "commit": { + "author": { + "user": { + "login": "ezyang" + }, + "email": "ezyang@fb.com", + "name": "Edward Z. Yang" + }, + "oid": "561d132bc686d00e8911f7feb3da5901b2bdc574" + } }, { - "login": "bdhirsh" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "ac42bedc84b7c96256376ad09917263bb020b2c3" + } }, { - "login": "cccclai" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "7f7d5ba40a0b5e10526d90b018b30b54673d12d8" + } }, { - "login": "carolineechen" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "37a6b4a8b1adb712d5777c7c3479866c27fb3c4e" + } }, { - "login": "tugsbayasgalan" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "65b613868c44e519c1777af79b9fd3498c5a7e58" + } }, { - "login": "agunapal" + "commit": { + "author": { + "user": { + "login": "ngimel" + }, + "email": "ngimel@fb.com", + "name": "Natalia Gimelshein" + }, + "oid": "442c405e9da0d66744ef03e379224c41eedf5b57" + } }, { - "login": "frankseide" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "031ac49ae9c192989385986b6707fa781e3229e0" + } }, { - "login": "YazhiGao" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "9a6c3b00039c0c985c1c9cb59490012d1c0b38ba" + } }, { - "login": "pavithranrao" + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "d5c30e408af1889b90012d2e09f6ec3cda333bcb" + } }, { - "login": "VirgileHlav" - }, + "commit": { + "author": { + "user": null, + "email": "mruberry@devfair044.h1.fair", + "name": "Mike Ruberry" + }, + "oid": "db355d55655bb252a699cd532441bb98e52b98d5" + } + } + ], + "pageInfo": { + "endCursor": "MjY", + "hasNextPage": false + }, + "totalCount": 26 + }, + "commits": { + "nodes": [ { - "login": "mrshenli" + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + }, + { + "name": "Meta Internal-Only Changes Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://opensource.facebook.com/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6ux14=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454954" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2o=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454956" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC2w=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + 
"checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454965" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3U=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454970" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC3o=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454974" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC34=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241454977" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFC4E=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622865/jobs/3270915028" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-c8=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455322" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDNo=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915027" + }, + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915071" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915141" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915194" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915229" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915283" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622869/jobs/3270915321" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6e-zM=", + "hasNextPage": 
false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455334" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDOY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927344" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927442" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927507" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927567" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927674" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927727" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927802" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927853" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927948" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270927996" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928061" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928116" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928198" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928256" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928291" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928317" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928338" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928367" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928410" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270928445" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991071" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991125" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991162" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991195" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991233" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991261" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991305" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270991349" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996024" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996068" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996092" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270996505" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270998987" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3270999027" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 
1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271006886" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271006941" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018097" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018135" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271018162" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271021143" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271034041" + }, + { + "name": "linux-bionic-rocm5.0-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271034072" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271048218" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049553" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049587" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271049616" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271068293" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271068336" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271149276" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2217622878/jobs/3271149321" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAW6jVK8=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": 
"https://github.com/pytorch/pytorch/commit/db355d55655bb252a699cd532441bb98e52b98d5/checks?check_suite_id=6241455360" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAXQFDQA=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-04-25T02:30:31Z", + "oid": "db355d55655bb252a699cd532441bb98e52b98d5" + } } - ], - "pageInfo": { - "hasNextPage": true, - "endCursor": "Y3Vyc29yOnYyOpHOAQNk0w==" - } - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAQNk0w== name=metamates org=pytorch": { - "data": { - "organization": { - "team": { - "members": { + ] + }, + "changedFiles": 5, + "files": { "nodes": [ { - "login": "lena-kashtelyan" - }, - { - "login": "brad-mengchi" - }, - { - "login": "kimishpatel" - }, - { - "login": "aaronenyeshi" - }, - { - "login": "shajrawi" - }, - { - "login": "samdow" - }, - { - "login": "dzhulgakov" - }, - { - "login": "great-way" - }, - { - "login": "ashkan-software" - }, - { - "login": "garroud" - }, - { - "login": "jbitton" - }, - { - "login": "jdsgomes" - }, - { - "login": "zhangxy988" - }, - { - "login": "samlurye" - }, - { - "login": "EdwardTyantov" - }, - { - "login": "anjali411" - }, - { - "login": "kryanchun" - }, - { - "login": "842974287" - }, - { - "login": "JacobSzwejbka" - }, - { - "login": "macandro96" - }, - { - "login": "nishantpdce" - }, - { - "login": "srinivas212" - }, - { - "login": "cherie11" - }, - { - "login": "shreyanb98" - }, - { - "login": "kavoor" - }, - { - "login": "dzdang" - }, - { - "login": "yushangdi" - }, - { - "login": "naveedgol" - }, - { - "login": "Nayef211" - }, - { - "login": "zrphercule" - }, - { - "login": "HengruiX" - }, - { - "login": "langong347" - }, - { - "login": "soapisnotfat" - }, - { - "login": "ebsmothers" - }, - { - "login": "swang392" - }, - { - "login": "anshuljain1" - }, - { - "login": "b-koopman" - }, - { - "login": "salilsdesai" - }, - { - "login": "vmoens" - }, - { - "login": "LinjianMa" - }, - { - "login": "printfoo" - }, - { - "login": "xinyang0" - }, - { - "login": "ramvenkat98" - }, - { - "login": "fbbradheintz" - }, - { - "login": "davidchencsl" - }, - { - "login": "kauterry" - }, - { - "login": "VenkatSubramaniam" - }, - { - "login": "yxia11" - }, - { - "login": "anirbanraywork" - }, - { - "login": "houseroad" - }, - { - "login": "erichan1" - }, - { - "login": "hsrussell" - }, - { - "login": "ilia-cher" - }, - { - "login": "ajitmaths" - }, - { - "login": "awgu" - }, - { - "login": "wz337" - }, - { - "login": "qxy11" - }, - { - "login": "janeyx99" - }, - { - "login": "msedwar" - }, - { - "login": "dustinh1999" - }, - { - "login": "glaringlee" - }, - { - "login": "anj-s" - }, - { - "login": "liuchen9494" - }, - { - "login": "drisspg" - }, - { - "login": "kmh4321" - }, - { - "login": "RdoubleA" - }, - { - "login": "jramseyer" - }, - { - "login": "goldenxuett" - }, - { - "login": "zengk95" - }, - { - "login": "gtarjun" - }, - { - "login": "mikaylagawarecki" - }, - { - "login": "xianxl" - }, - { - "login": "mingzhe09088" - }, - { - "login": "Vucibatina" - }, - { - "login": "aazzolini" - }, - { - "login": "nataliakliushkina" - }, - { - "login": "mruberry" - }, - { - "login": "HDCharles" - }, - { - "login": "mcr229" - }, - { - "login": "manuelcandales" - }, - { - "login": "guangy10" - }, - { - "login": "mengwa41" - }, - { - "login": "YulunW" - }, - { - "login": "hx89" + "path": "test/test_ops.py" }, { - "login": "hanhsienhuang" + "path": "torch/_prims/__init__.py" }, { - "login": "clee2000" + "path": 
"torch/_prims/utils.py" }, { - "login": "lhuang04" + "path": "torch/_refs/__init__.py" }, { - "login": "sidneyfletcher" - }, + "path": "torch/testing/_internal/common_methods_invocations.py" + } + ], + "pageInfo": { + "endCursor": "NQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "login": "gottbrath" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "lessw2020" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "mmh683" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "dwarakrajagopal" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "YifanShenSZ" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "lazysjb" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "zhaojuanmao" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "johncalab" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "dhthompson" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "superwizard2019" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "fbhuba" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "shunting314" - } - ], - "pageInfo": { - "hasNextPage": true, - "endCursor": "Y3Vyc29yOnYyOpHOAyJyuA==" - } - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=Y3Vyc29yOnYyOpHOAyJyuA== name=metamates org=pytorch": { - "data": { - "organization": { - "team": { - "members": { - "nodes": [ + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" + }, { - "login": "edward-io" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "sean-ngo" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "bzinodev" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "skim0514" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "xcheng16" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "login": "adamomainz" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "login": "sluks" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "poojahp" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "ansley" + "author": { + "login": "zou3519" + }, + "state": "COMMENTED" }, { - "login": "mvsampath" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "cheetah2216" + "author": { + "login": "peterbell10" + }, + "state": "COMMENTED" }, { - "login": "pinaki-mukerji" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "hongxiayang" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "kyulee-com" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "sstsai-adl" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "dahsh" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "login": "ohgnoes" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "szewaiyuen7" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "login": "byterover" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "asl3" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" 
}, { - "login": "ejguan" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "nimaelyasi" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "nikithamalgifb" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "rohan-ahluwalia" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "qxu-fb" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "sshawnwu" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "andrewyounkins" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "njuvekar" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "iramazanli" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "jnkwok1" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "kurman" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "jbschlosser" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "ccongge" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "haichuan-fb" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "janghyuncho" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "wwang84" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "JustinPinero" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "gcramer23" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "yuguo68" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "c-odrin" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "chowarfb" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "priyaramani" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "yidawang-oss" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "asalioufb" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "four4fish" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "kkosik20" + "author": { + "login": "ngimel" + }, + "state": "COMMENTED" }, { - "login": "pmabbo13" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "login": "KZFB" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "dborkovic" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "sisilmehta2000" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "henryliu-bluehills" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "madhu-fb" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "muchulee8" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "anirbanr-fb-r2p" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "kirklandsign" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "o-hanna" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "izaitsevfb" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "login": "weiwangmeta" - } - ], - "pageInfo": { - "hasNextPage": false, - "endCursor": "Y3Vyc29yOnYyOpHOBoQSVA==" - } - } - } - } - } - }, - 
"query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MTAw name=pytorch number=76118 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "files": { - "nodes": [ - { - "path": "docs/source/quantization.rst" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "docs/source/scripts/build_quantization_configs.py" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/allowlist_for_publicAPI.json" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/source_range_test.cpp" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/test_backend.cpp" + "author": { + "login": "lezcano" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/test_flatbuffer.cpp" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/test_misc.cpp" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/test_utils.h" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_float_v2.ptl.ff" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_float_v2.ptl.ff" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_inplace_int_v2.ptl.ff" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_int_v2.ptl.ff" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_float_v2.ptl.ff" + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_reciprocal_int_v2.ptl.ff" + "author": { + "login": "ngimel" + }, + "state": "APPROVED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_scalar_scalar_v2.ptl.ff" + "author": { + "login": "ezyang" + }, + "state": "COMMENTED" }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_inplace_v2.ptl.ff" - }, + "author": { + "login": "mruberry" + }, + "state": "COMMENTED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNC0wNlQxMjo1NjoyNC0wNzowMLkyMDIyLTA0LTA2VDA4OjQwOjM4LTA3OjAwzjenO6Y=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_out_v2.ptl.ff" + "bodyText": "Ref implementations by themselves can handle any shapes (and broadcast ops by themselves don't bake in any shapes). 
The question is can we decide if a particular trace is applicable for a different input, but that depends on the tracing technology and what we are caching on, so out of scope for initial PR.", + "createdAt": "2022-04-21T19:00:28Z", + "author": { + "login": "ngimel" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1105643418 }, { - "path": "test/cpp/jit/upgrader_models/test_versioned_div_tensor_v2.ptl.ff" + "bodyText": "@pytorchbot merge this please", + "createdAt": "2022-04-25T04:42:29Z", + "author": { + "login": "mruberry" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1108072887 }, { - "path": "test/cpp/profiler/record_function.cpp" + "bodyText": "Merge failed due to 'mruberry'\nRaised by https://github.com/pytorch/pytorch/actions/runs/2218044244", + "createdAt": "2022-04-25T04:43:54Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1108073536 }, { - "path": "test/distributed/_shard/sharded_tensor/test_sharded_tensor.py" + "bodyText": "@mruberry has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-04-25T04:51:11Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1108075965 }, { - "path": "test/distributed/_shard/test_replicated_tensor.py" + "bodyText": "Hey @mruberry.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-04-25T09:57:56Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1108351107 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQebHmg==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" + } }, { - "path": "test/distributed/fsdp/test_fsdp_comm.py" + "node": { + "name": "topic: not user facing" + } }, { - "path": "test/distributed/fsdp/test_fsdp_optim_state.py" - }, + "node": { + "name": "module: primTorch" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=77700 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "kit1980" + }, + "title": "Move pull linux-docs job to Ubuntu 20.04", + "body": "", + "headRefName": "sdym/pull-xenial-focal-linux-docs", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "path": "test/distributed/optim/test_zero_redundancy_optimizer.py" - }, + "commit": { + "author": { + "user": { + "login": "kit1980" + }, + "email": "sdym@fb.com", + "name": "Sergii Dymchenko" + }, + "oid": "81261599614423baa17df72300b8e109677b6799" + } + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 + }, + "commits": { + "nodes": [ { - "path": "test/jit/test_export_modes.py" - }, + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.facebook.com/cla/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNmNqE=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147714" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuMI=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147726" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuM4=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147733" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuNU=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": 
"https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147746" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuOI=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147762" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuPI=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567147780" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuQQ=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "lintrunner", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528127876" + }, + { + "name": "workflow-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128023" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128196" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128519" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128575" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128663" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867841/jobs/3528128857" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYVY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148336" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867843/jobs/3528127882" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdXEg=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148344" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduuzg=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "docker-builds" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "docker-build (pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528127883" + }, + { + "name": "docker-build (pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528127945" + }, + { + "name": "docker-build (pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128001" + }, + { + "name": "docker-build (pytorch-linux-bionic-py3.7-clang9)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128067" + }, + { + "name": "docker-build (pytorch-linux-bionic-rocm5.0-py3.7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128124" + }, + { + "name": "docker-build (pytorch-linux-bionic-rocm5.1-py3.7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128191" + }, + { + "name": "docker-build (pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128259" + }, + { + "name": "docker-build (pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128321" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang5-android-ndk-r19c)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128365" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang5-asan)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128446" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang7-asan)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128507" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3-clang7-onnx)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128563" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3.7-gcc5.4)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128639" + }, + { + "name": "docker-build (pytorch-linux-xenial-py3.7-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128687" + }, + { + "name": "docker-build (pytorch-linux-focal-py3.7-gcc7)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867844/jobs/3528128741" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNdYLI=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148352" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu0A=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528150762" + }, + { + "name": "linux-focal-py3.7-gcc7 / build", + "conclusion": 
"SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528150903" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151086" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151258" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151511" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151776" + }, + { + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528151896" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152014" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152139" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152216" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152378" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152516" + }, + { + "name": "linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152599" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152723" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152802" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152913" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528152969" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153005" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153062" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / build", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153125" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528153207" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528242483" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528242528" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245875" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245914" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528245964" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528246008" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528248520" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528255086" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528255128" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274064" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274097" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274133" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274173" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528274209" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528277014" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528308958" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309747" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309810" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309837" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309864" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309895" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528309925" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528310044" + }, + { + "name": "linux-bionic-rocm5.1-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528310101" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384337" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384379" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384408" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384441" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2348867849/jobs/3528384471" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAYNi1Nc=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/81261599614423baa17df72300b8e109677b6799/checks?check_suite_id=6567148369" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAYduu1E=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-05-19T00:02:11Z", + "oid": "81261599614423baa17df72300b8e109677b6799" + } + } + ] + }, + "changedFiles": 3, + "files": { + "nodes": [ { - "path": "test/jit/test_if_hoisting.py" + "path": ".circleci/docker/build.sh" }, { - "path": "test/jit/test_tracer.py" + 
"path": ".circleci/docker/common/install_katex.sh" }, { - "path": "test/jit/test_upgraders.py" - }, + "path": ".github/workflows/pull.yml" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "path": "test/mobile/test_lite_script_type.py" + "author": { + "login": "suo" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + "author": { + "login": "kit1980" + }, + "state": "COMMENTED" }, { - "path": "test/onnx/test_operators.py" - }, + "author": { + "login": "janeyx99" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNS0xOFQxMjo0MTowNS0wNzowMLkyMDIyLTA1LTE4VDEyOjQxOjA0LTA3OjAwzjpD7es=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ { - "path": "test/onnx/test_pytorch_onnx_onnxruntime.py" + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/77700\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 8126159 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-05-17T23:01:48Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1129400934 }, { - "path": "test/quantization/ao_migration/test_quantization_fx.py" + "bodyText": "@pytorchbot merge", + "createdAt": "2022-05-19T15:39:05Z", + "author": { + "login": "kit1980" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131884232 }, { - "path": "test/quantization/core/test_quantized_op.py" + "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) linux-docs / build-docs (cpp), linux-docs / build-docs (python) are pending/not yet run for rule OSS CI\nRaised by https://github.com/pytorch/pytorch/actions/runs/2353067846", + "createdAt": "2022-05-19T15:40:59Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131886153 }, { - "path": "test/quantization/core/test_quantized_tensor.py" + "bodyText": "@pytorchbot merge -f", + "createdAt": "2022-05-19T16:41:29Z", + "author": { + "login": "kit1980" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1131945610 }, { - "path": "test/quantization/fx/test_numeric_suite_fx.py" - }, + "bodyText": "Hey @kit1980.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-05-19T16:43:37Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1131947473 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQ1FKZg==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ { - "path": "test/quantization/fx/test_quantize_fx.py" + "node": { + "name": "Merged" + } }, { - "path": "test/test_autograd.py" - }, + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=68111 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "chunyuan-w" + }, + "title": "Add JIT graph fuser for oneDNN Graph API (Preview4)", + "body": "## Description\r\nPreview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).\r\n\r\nOn the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:\r\n\r\n- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used\r\n- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.\r\n\r\n### User API:\r\nThe optimization pass is disabled by default. Users could enable it by:\r\n```\r\ntorch.jit.enable_onednn_fusion(True)\r\n```\r\n\r\n### Performance:\r\n[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:\r\n- SkyLake 8180 (1 socket of 28 cores):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)\r\n\r\n- SkyLake 8180 (single thread):\r\n\r\n ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)\r\n \\* By mapping hardswish to oneDNN Graph, it\u2019s 8% faster than PyTorch JIT (NNC + OFI)\r\n \\** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops\r\n\r\n\r\n### Directory structure of the integration code\r\nFuser-related code are placed under:\r\n```\r\ntorch/csrc/jit/codegen/onednn/\r\n```\r\n\r\nOptimization pass registration is done in:\r\n```\r\ntorch/csrc/jit/passes/onednn_graph_fuser.h\r\n```\r\n\r\nCMake for the integration code is:\r\n```\r\ncaffe2/CMakeLists.txt\r\n```\r\n\r\n## Limitations\r\n\r\n- In this PR, we have only supported the optimization on Linux platform. 
The support on Windows and MacOS will be enabled as the next step.\r\n- We have only optimized the inference use case.", + "headRefName": "chunyuan/llga_preview2", + "headRepository": { + "nameWithOwner": "chunyuan-w/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { - "path": "test/test_binary_ufuncs.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "0096fcc49f277fd8e006fcb42e0cb28a1422ec98" + } }, { - "path": "test/test_expanded_weights.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "7bcc4de26a5472f1d252735dd425b46794b0844f" + } }, { - "path": "test/test_functionalization.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "3a2a588bfe6bbf9bf74d88d441cd22affda207da" + } }, { - "path": "test/test_fx_experimental.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "ca7df12fbfaa3ddbabeca39b76300d17f4a33f2f" + } }, { - "path": "test/test_jit.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "81d44f35b8bc043c38837d0694e5bc072203b832" + } }, { - "path": "test/test_jit_cuda_fuser.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "14fd5d1bfc2c58a71379f778871e3fca0a8e79b2" + } }, { - "path": "test/test_linalg.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "954dc23663125897f4b199eb2a8607dc5fca3274" + } }, { - "path": "test/test_nestedtensor.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9f77a0b476accc678b6f0569e4ff33fa6bbe97fc" + } }, { - "path": "test/test_nn.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "fbf3b23bc1288697e1aec539a7c4ee3dc0bcb84c" + } }, { - "path": "test/test_ops.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "f8b8e78f786586c3cdf3966fd83ffa124d3eda70" + } }, { - "path": "test/test_ops_gradients.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6fffa2f7453ee7e0f8d8e2f73ea8a65230539589" + } }, { - "path": "test/test_ops_jit.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "849385404e6f3cd1cf7cef19f931ecf4fa28afdb" + } }, { - "path": "test/test_optim.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "adbae7b77f8c0dbc59fccf15207d97ba86cfade2" + } }, { - "path": "test/test_overrides.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": 
"chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "6dcf2a4981aff24fa16fc7461ae4ec29690f956f" + } }, { - "path": "test/test_profiler.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "54f3e05ad524cffd0911ee93be3c50f589b51f58" + } }, { - "path": "test/test_public_bindings.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "edbfc640ea79a0af85757d9e73796dcc90231519" + } }, { - "path": "test/test_pytree.py" + "commit": { + "author": { + "user": { + "login": "chunyuan-w" + }, + "email": "chunyuan.wu@intel.com", + "name": "chunyuan" + }, + "oid": "67654db7cba562809d1b4a44cdda58af5cc9daaf" + } }, { - "path": "test/test_reductions.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9c9d99b930b11af9ff03f52d45bf49c652df758d" + } }, { - "path": "test/test_sort_and_select.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ffb25119cd9ce815cc4d9d14a2317fcbbfa9ea86" + } }, { - "path": "test/test_sparse.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ab9eee84512ca1bdfbc81e25c6eb67b29d0f302a" + } }, { - "path": "test/test_sparse_csr.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "62a4642cf3330524990a69ac29e002c97812320a" + } }, { - "path": "test/test_spectral_ops.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "ca9b1223be4af2c8b4929303d498eafd71793128" + } }, { - "path": "test/test_tensor_creation_ops.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "6f4a23d24514a02954d2ec792830085f612223c9" + } }, { - "path": "test/test_tensorboard.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b2a9a9c0926b02d0b2e87722ed61450f224a61d0" + } }, { - "path": "test/test_testing.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e88b492be733f24b6aa395829c76add67d0901e7" + } }, { - "path": "test/test_torch.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c44336d7a914952bfb78e012e08d9a6d6dde5937" + } }, { - "path": "test/test_unary_ufuncs.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "5157930f7b3921d41a586260582b574c915f6ca1" + } }, { - "path": "third_party/BUCK.github" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "04cb8353813f6bbd0d913a994923cc7e1e291406" + } }, { - "path": "third_party/fbgemm" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + 
}, + "oid": "62991eaad0e638bb0bced327e03f932f66f68732" + } }, { - "path": "tools/autograd/derivatives.yaml" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "7496bf1588050191595d833d23b8972b2f22655e" + } }, { - "path": "tools/autograd/gen_inplace_or_view_type.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d9d35f23cca0cd29c78a845731b24826152dcf1c" + } }, { - "path": "tools/autograd/load_derivatives.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "f74ec134f18a65a7c72455bdf44f72e3ebb27105" + } }, { - "path": "tools/build_variables.bzl" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "eb32cc65a975361160948bfc3d6a577991ea262e" + } }, { - "path": "tools/codegen/api/autograd.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c7665f8d695b680c54db0bad2b7b7df46d886b50" + } }, { - "path": "tools/codegen/api/cpp.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e6321ad8f59ea01130568c202d186448bb9cb9d0" + } }, { - "path": "tools/codegen/api/dispatcher.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a72cd0d02693f45e5354a70654581ad514581ec7" + } }, { - "path": "tools/codegen/api/functionalization.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b3cd3028b4ed31805e82f7eaf02217ab74ca59b9" + } }, { - "path": "tools/codegen/api/lazy.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "49a592d9788d08e6cd0593882f867e129057c1cc" + } }, { - "path": "tools/codegen/api/meta.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "0575766b2144b13f6a38227c4e2b8d22ec8db80f" + } }, { - "path": "tools/codegen/api/native.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "b5c9b10ff87d622350e8ca64fae3a476eb70d5aa" + } }, { - "path": "tools/codegen/api/python.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "66bc652a30ccc329adb929870a4ac726bb98b38c" + } }, { - "path": "tools/codegen/api/structured.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "72b9ca9c8e2dac98cbb7199b3dfac7c7305b80c5" + } }, { - "path": "tools/codegen/api/translate.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "a7892ed7373207d96406c8b5734a089643c5cdbd" + } }, { - "path": "tools/codegen/api/types.py" + "commit": { + "author": { + "user": { + "login": 
"sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "d54cb084e1daad8a08c3f8de0ad3f7afb5b05ac1" + } }, { - "path": "tools/codegen/api/ufunc.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "aef71d692a8a159e0ca56be363e2cc1225ce7647" + } }, { - "path": "tools/codegen/api/unboxing.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "bf618e205ec31cff962dcc8ab478e0a699a9572d" + } }, { - "path": "tools/codegen/code_template.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e4a331f1088448f7d7d86256ce71e0e71da006b0" + } }, { - "path": "tools/codegen/context.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "0b743523d1430fec759d5fefbb687f17c89335a5" + } }, { - "path": "tools/codegen/decompositions/gen_jit_decompositions.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "e80a351a62d98b810ec8985c4b25257af1d6c5bb" + } }, { - "path": "tools/codegen/dest/__init__.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "c189eca154b6691919d0e21489d1c322c7435c0b" + } }, { - "path": "tools/codegen/dest/lazy_ir.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "e080a067c75d7b888a8a362682a2d5ba70e0c3a8" + } }, { - "path": "tools/codegen/dest/lazy_ts_lowering.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "028561fbf8f3ed90e074e6e0e3a4ca4dd7ffa2a8" + } }, { - "path": "tools/codegen/dest/native_functions.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "d550cf14037badd4caa2f52202e2f20bc4db8432" + } }, { - "path": "tools/codegen/dest/register_dispatch_key.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "574159ebadd1dec24daaf883879ffeca8d9e71b7" + } }, { - "path": "tools/codegen/dest/ufunc.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "9eb3ee98ea756067ed1c8f52f309f6d3e211a904" + } }, { - "path": "tools/codegen/gen.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "29929f48be03dcdd1bbfade572de7feafa825547" + } }, { - "path": "tools/codegen/gen_backend_stubs.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "8a7358ca8da547b40ea1a99ddc57ebed19959684" + } }, { - "path": "tools/codegen/gen_functionalization_type.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": 
"6606637d2c5525b43e294a8b366a85052e1be0c6" + } }, { - "path": "tools/codegen/gen_lazy_tensor.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "5ecfd1f28b87045deb8bc8ffe33b3d8b906f3264" + } }, { - "path": "tools/codegen/local.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchit.jain" + }, + "oid": "be2d4345c65442c4cfbe8afdfb2ae0893945da42" + } }, { - "path": "tools/codegen/model.py" + "commit": { + "author": { + "user": { + "login": "sanchitintel" + }, + "email": "sanchit.jain@intel.com", + "name": "sanchitintel" + }, + "oid": "b5b89d3644a43e2dbda841cafb71b32edbe07c8a" + } }, { - "path": "tools/codegen/operator_versions/gen_mobile_upgraders.py" + "commit": { + "author": { + "user": { + "login": "malfet" + }, + "email": "nikita.shulga@gmail.com", + "name": "Nikita Shulga" + }, + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + } } ], "pageInfo": { - "endCursor": "MjAw", - "hasNextPage": true - } - } - } - } - } - }, - "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MjAw name=pytorch number=76118 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { + "endCursor": "NjI", + "hasNextPage": false + }, + "totalCount": 62 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ + { + "name": "Facebook CLA Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://code.intern.facebook.com/cla/" + }, + { + "name": "Meta Internal-Only Changes Check", + "conclusion": "SUCCESS", + "detailsUrl": "https://opensource.facebook.com/" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NXnc=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625010" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYwzI=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895825" + }, + { + "name": "py2-setup-validate-errormsg", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895911" + }, + { + "name": "quick-checks", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903895963" + }, + { + "name": "shellcheck", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896134" + }, + { + "name": "toc", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896253" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896371" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896525" + }, + { + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896658" + }, + { + "name": "Test collect_env (with_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896771" + }, + { + "name": "Test collect_env (without_torch)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896795" + }, + { + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896838" + }, + { + "name": "mypy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440028/jobs/2903896897" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NZqw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625458" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPI=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "run-torchbench", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440031/jobs/2903895828" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_NYIw=", + "hasNextPage": false + } + }, + "conclusion": "SKIPPED", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625463" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxPc=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "pull" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896014" + }, + { + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896165" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896394" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896572" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896666" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896778" + }, + { + "name": "linux-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896837" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896896" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903896936" + }, + { + "name": 
"linux-xenial-py3-clang5-mobile-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897025" + }, + { + "name": "linux-xenial-py3.7-gcc7-no-ops / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897161" + }, + { + "name": "linux-xenial-py3.7-gcc7 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897213" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897280" + }, + { + "name": "win-vs2019-cpu-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897368" + }, + { + "name": "win-vs2019-cuda11.3-py3 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897431" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897476" + }, + { + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897578" + }, + { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897630" + }, + { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897699" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2903897733" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327787" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327838" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327956" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904327997" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328035" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328093" + }, + { + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328131" + }, + { + "name": 
"linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904328177" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904333962" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904334006" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430419" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430459" + }, + { + "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430508" + }, + { + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904430573" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443663" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443723" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904443787" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904454239" + }, + { + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904454303" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904554602" + }, + { + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904554698" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588855" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588886" + }, + { + "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904588924" + }, + { + "name": 
"linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904655702" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656104" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656150" + }, + { + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904656192" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904706520" + }, + { + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2018440039/jobs/2904706565" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAU_fN1g=", + "hasNextPage": false + } + }, + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/73881411e2bfb3aaa2e89926a82390b4c587ad75/checks?check_suite_id=5743625483" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVZYxQs=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048428?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048429?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048431?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17048430?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-03-21T19:58:52Z", + "oid": "73881411e2bfb3aaa2e89926a82390b4c587ad75" + } + } + ] + }, + "changedFiles": 37, "files": { "nodes": [ { - "path": "tools/codegen/selective_build/operator.py" - }, - { - "path": "tools/codegen/selective_build/selector.py" - }, - { - "path": "tools/codegen/shape_functions/gen_jit_shape_functions.py" - }, - { - "path": "tools/codegen/static_runtime/config.py" - }, - { - "path": "tools/codegen/static_runtime/gen_static_runtime_ops.py" - }, - { - "path": "tools/codegen/static_runtime/gen_structured.py" - }, - { - "path": "tools/codegen/utils.py" + "path": "aten/src/ATen/core/interned_strings.h" }, { - "path": "tools/linter/adapters/circleci_linter.py" + "path": "caffe2/CMakeLists.txt" }, { - "path": 
"tools/linter/adapters/clangformat_linter.py" + "path": "cmake/Dependencies.cmake" }, { - "path": "tools/linter/adapters/grep_linter.py" + "path": "cmake/Modules/FindMKLDNN.cmake" }, { - "path": "tools/linter/adapters/nativefunctions_linter.py" + "path": "cmake/public/mkldnn.cmake" }, { - "path": "tools/setup_helpers/BUILD.bazel" + "path": "docs/source/jit.rst" }, { - "path": "tools/setup_helpers/generate_code.py" + "path": "test/test_jit_llga_fuser.py" }, { "path": "torch/_C/__init__.pyi.in" }, { - "path": "torch/amp/autocast_mode.py" - }, - { - "path": "torch/ao/ns/fx/pattern_utils.py" - }, - { - "path": "torch/ao/quantization/backend_config/README.md" - }, - { - "path": "torch/ao/quantization/backend_config/__init__.py" - }, - { - "path": "torch/ao/quantization/backend_config/native.py" - }, - { - "path": "torch/ao/quantization/backend_config/observation_type.py" - }, - { - "path": "torch/ao/quantization/backend_config/tensorrt.py" - }, - { - "path": "torch/ao/quantization/backend_config/utils.py" - }, - { - "path": "torch/ao/quantization/fx/__init__.py" - }, - { - "path": "torch/ao/quantization/fx/backend_config/fuse_handler.py" - }, - { - "path": "torch/ao/quantization/fx/backend_config/quantize_handler.py" - }, - { - "path": "torch/ao/quantization/fx/backend_config_utils.py" - }, - { - "path": "torch/ao/quantization/fx/convert.py" - }, - { - "path": "torch/ao/quantization/fx/fuse.py" - }, - { - "path": "torch/ao/quantization/fx/fusion_patterns.py" - }, - { - "path": "torch/ao/quantization/fx/match_utils.py" - }, - { - "path": "torch/ao/quantization/fx/pattern_utils.py" - }, - { - "path": "torch/ao/quantization/fx/prepare.py" - }, - { - "path": "torch/ao/quantization/fx/quantization_patterns.py" - }, - { - "path": "torch/ao/quantization/qconfig.py" - }, - { - "path": "torch/ao/quantization/quantization_types.py" - }, - { - "path": "torch/ao/quantization/quantize_fx.py" - }, - { - "path": "torch/autograd/__init__.py" - }, - { - "path": "torch/csrc/Module.cpp" - }, - { - "path": "torch/csrc/autograd/FunctionsManual.cpp" - }, - { - "path": "torch/csrc/autograd/FunctionsManual.h" - }, - { - "path": "torch/csrc/autograd/engine.cpp" - }, - { - "path": "torch/csrc/autograd/function.h" - }, - { - "path": "torch/csrc/autograd/functions/accumulate_grad.h" - }, - { - "path": "torch/csrc/autograd/init.cpp" - }, - { - "path": "torch/csrc/autograd/python_torch_functions_manual.cpp" - }, - { - "path": "torch/csrc/autograd/python_variable.cpp" + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.cpp" }, { - "path": "torch/csrc/autograd/record_function_ops.h" + "path": "torch/csrc/jit/codegen/onednn/LlgaTensorImpl.h" }, { - "path": "torch/csrc/autograd/utils/grad_layout_contract.h" + "path": "torch/csrc/jit/codegen/onednn/README.md" }, { - "path": "torch/csrc/deploy/CMakeLists.txt" + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.cpp" }, { - "path": "torch/csrc/distributed/c10d/logger.cpp" + "path": "torch/csrc/jit/codegen/onednn/defer_size_check.h" }, { - "path": "torch/csrc/jit/codegen/cuda/graph_fuser.cpp" + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.cpp" }, { - "path": "torch/csrc/jit/codegen/cuda/parser.cpp" + "path": "torch/csrc/jit/codegen/onednn/graph_fuser.h" }, { - "path": "torch/csrc/jit/frontend/function_schema_parser.cpp" + "path": "torch/csrc/jit/codegen/onednn/graph_helper.cpp" }, { - "path": "torch/csrc/jit/frontend/lexer.h" + "path": "torch/csrc/jit/codegen/onednn/graph_helper.h" }, { - "path": "torch/csrc/jit/frontend/parser.cpp" + "path": 
"torch/csrc/jit/codegen/onednn/graph_rewriter.cpp" }, { - "path": "torch/csrc/jit/frontend/parser.h" + "path": "torch/csrc/jit/codegen/onednn/guard_shape.cpp" }, { - "path": "torch/csrc/jit/frontend/script_type_parser.cpp" + "path": "torch/csrc/jit/codegen/onednn/guard_shape.h" }, { - "path": "torch/csrc/jit/frontend/source_range.cpp" + "path": "torch/csrc/jit/codegen/onednn/interface.cpp" }, { - "path": "torch/csrc/jit/frontend/source_range.h" + "path": "torch/csrc/jit/codegen/onednn/interface.h" }, { - "path": "torch/csrc/jit/frontend/source_ref.h" + "path": "torch/csrc/jit/codegen/onednn/kernel.cpp" }, { - "path": "torch/csrc/jit/frontend/tracer.cpp" + "path": "torch/csrc/jit/codegen/onednn/kernel.h" }, { - "path": "torch/csrc/jit/frontend/tracer.h" + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.cpp" }, { - "path": "torch/csrc/jit/mobile/debug_info.cpp" + "path": "torch/csrc/jit/codegen/onednn/layout_propagation.h" }, { - "path": "torch/csrc/jit/mobile/debug_info.h" + "path": "torch/csrc/jit/codegen/onednn/operator.h" }, { - "path": "torch/csrc/jit/mobile/flatbuffer_loader.cpp" + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.cpp" }, { - "path": "torch/csrc/jit/mobile/module.h" + "path": "torch/csrc/jit/codegen/onednn/prepare_binary.h" }, { - "path": "torch/csrc/jit/passes/common_expression_hoisting.cpp" + "path": "torch/csrc/jit/codegen/onednn/register_interface.cpp" }, { - "path": "torch/csrc/jit/passes/common_expression_hoisting.h" + "path": "torch/csrc/jit/ir/alias_analysis.cpp" }, { - "path": "torch/csrc/jit/passes/frozen_graph_optimizations.cpp" + "path": "torch/csrc/jit/ir/ir.cpp" }, { - "path": "torch/csrc/jit/passes/onnx/pattern_conversion/common.cpp" + "path": "torch/csrc/jit/passes/inline_autodiff_subgraphs.cpp" }, { - "path": "torch/csrc/jit/passes/onnx/scalar_type_analysis.cpp" + "path": "torch/csrc/jit/passes/onednn_graph_fuser.h" }, { "path": "torch/csrc/jit/python/init.cpp" }, { - "path": "torch/csrc/jit/python/python_tree_views.cpp" - }, - { - "path": "torch/csrc/jit/python/script_init.cpp" - }, - { - "path": "torch/csrc/jit/runtime/graph_executor.cpp" - }, - { - "path": "torch/csrc/jit/runtime/interpreter.cpp" - }, - { - "path": "torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp" - }, - { - "path": "torch/csrc/jit/runtime/script_profile.cpp" - }, - { - "path": "torch/csrc/jit/runtime/serialized_shape_function_registry.cpp" - }, - { - "path": "torch/csrc/jit/runtime/serialized_shape_function_registry.h" - }, - { - "path": "torch/csrc/jit/runtime/shape_function_registry.h" - }, - { - "path": "torch/csrc/jit/runtime/shape_functions.h" - }, - { - "path": "torch/csrc/jit/runtime/shape_functions_1.h" - }, - { - "path": "torch/csrc/jit/runtime/static/impl.cpp" - }, - { - "path": "torch/csrc/jit/runtime/static/passes.cpp" - }, - { - "path": "torch/csrc/jit/runtime/symbolic_shape_registry.cpp" - }, - { - "path": "torch/csrc/jit/runtime/symbolic_shape_registry.h" + "path": "torch/csrc/jit/runtime/operator.cpp" }, { - "path": "torch/csrc/jit/serialization/export_module.cpp" - }, + "path": "torch/jit/__init__.py" + } + ], + "pageInfo": { + "endCursor": "Mzc", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ { - "path": "torch/csrc/jit/serialization/flatbuffer_serializer.cpp" + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/import.cpp" + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/import_export_helpers.cpp" + "author": { + 
"login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/import_export_helpers.h" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/import_source.cpp" + "author": { + "login": "pinzhenx" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/import_source.h" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/source_range_serialization.cpp" + "author": { + "login": "chunyuan-w" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/serialization/source_range_serialization.h" + "author": { + "login": "eellison" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/jit/testing/file_check.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/lazy/core/dynamic_ir.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/lazy/core/dynamic_ir.h" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/lazy/ts_backend/ts_eager_fallback.cpp" - } - ], - "pageInfo": { - "endCursor": "MzAw", - "hasNextPage": true - } - } - } - } - } - }, - "query_sha=0a34acb829d8aca9dd28a8ba388dfa52f6ecdde7e903ace1caabdcfaba87de98 cursor=MzAw name=pytorch number=76118 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "files": { - "nodes": [ - { - "path": "torch/csrc/lazy/ts_backend/ts_native_functions.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/utils/python_arg_parser.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/utils/python_arg_parser.h" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/utils/tensor_list.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/utils/tensor_new.cpp" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/csrc/utils/tensor_new.h" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/__init__.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/api.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/replicated_tensor.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/sharded_tensor/__init__.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/sharded_tensor/api.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/_shard/sharded_tensor/utils.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/algorithms/model_averaging/utils.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/fsdp/_optim_utils.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/fsdp/fully_sharded_data_parallel.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": 
"torch/distributed/nn/__init__.py" + "author": { + "login": "wukong1992" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/nn/functional.py" + "author": { + "login": "eellison" + }, + "state": "COMMENTED" }, { - "path": "torch/distributed/optim/functional_adagrad.py" + "author": { + "login": "eellison" + }, + "state": "COMMENTED" }, { - "path": "torch/fx/experimental/meta_tracer.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/fx/graph.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/jit/_shape_functions.py" + "author": { + "login": "eellison" + }, + "state": "COMMENTED" }, { - "path": "torch/nn/parallel/_replicated_tensor_ddp_interop.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/nn/parallel/_replicated_tensor_ddp_utils.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/nn/parallel/distributed.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/nn/utils/_expanded_weights/__init__.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/nn/utils/_expanded_weights/instance_norm_expanded_weights.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/onnx/symbolic_opset11.py" + "author": { + "login": "eellison" + }, + "state": "APPROVED" }, { - "path": "torch/onnx/symbolic_opset12.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/onnx/symbolic_opset9.py" + "author": { + "login": "eellison" + }, + "state": "COMMENTED" }, { - "path": "torch/optim/adagrad.py" + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "torch/optim/lr_scheduler.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/overrides.py" + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "torch/quantization/fx/pattern_utils.py" + "author": { + "login": "malfet" + }, + "state": "COMMENTED" }, { - "path": "torch/quantization/fx/quantization_patterns.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/quantization/fx/quantization_types.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/return_types.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" }, { - "path": "torch/testing/_internal/common_device_type.py" + "author": { + "login": "sanchitintel" + }, + "state": "COMMENTED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMS0xMi0xMFQwOToyNDoxOS0wODowMLkyMDIxLTEyLTEwVDA5OjI0OjE5LTA4OjAwzjFryLE=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. 
I am reverting.", + "createdAt": "2022-03-21T22:51:38Z", + "author": { + "login": "suo" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498483 }, { - "path": "torch/testing/_internal/common_distributed.py" + "bodyText": "@pytorchbot revert this", + "createdAt": "2022-03-21T22:51:44Z", + "author": { + "login": "suo" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074498550 }, { - "path": "torch/testing/_internal/common_fx2trt.py" + "bodyText": "Looks like this broke master https://hud.pytorch.org/pytorch/pytorch/commit/7dd08230117f4fa8bb82b3524e90fb00340198c7. I am reverting.\n\nOops! Will fix it ASAP.", + "createdAt": "2022-03-21T22:53:34Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1074499668 }, { - "path": "torch/testing/_internal/common_methods_invocations.py" + "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "createdAt": "2022-03-21T23:07:23Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074508608 }, { - "path": "torch/testing/_internal/common_utils.py" - }, + "bodyText": "This pull request has been reverted by e5bf879. To re-land this change, please open another pull request, assignthe same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).", + "createdAt": "2022-03-30T00:53:50Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1082508130 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQAuLsw==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ { - "path": "torch/testing/_internal/composite_compliance.py" + "node": { + "name": "oncall: jit" + } }, { - "path": "torch/testing/_internal/distributed/distributed_test.py" + "node": { + "name": "triaged" + } }, { - "path": "torch/testing/_internal/jit_metaprogramming_utils.py" + "node": { + "name": "open source" + } }, { - "path": "torch/utils/cpp_extension.py" + "node": { + "name": "cla signed" + } }, { - "path": "torch/utils/data/datapipes/_typing.py" + "node": { + "name": "Reverted" + } }, { - "path": "torch/utils/model_dump/__init__.py" + "node": { + "name": "intel priority" + } } - ], - "pageInfo": { - "endCursor": "MzQ4", - "hasNextPage": false - } + ] } } } } }, - "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAWuVD9M= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAXEsRtE= name=pytorch number=76118 owner=pytorch": { + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOQAuLsw== name=pytorch number=68111 owner=pytorch": { "data": { "repository": { "pullRequest": { - "commits": { + "comments": { "nodes": [ { - "commit": { - "oid": "5696e8357cf38f852ef3d680381513e26f202371", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/6099898412?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": 
"Y3Vyc29yOnYyOpHPAAAAAWuVECw=", - "hasNextPage": false - } - } - } - ] - } - } - } - ] - } - } - } - } - }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=pytorch-dev-infra org=pytorch": { - "data": { - "organization": { - "team": { - "members": { - "nodes": [ + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/chunyuan-w/pytorch/blob/7496bf1588050191595d833d23b8972b2f22655e/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab 
skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-full-jit\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries/conda\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries/libtorch\n\ud83d\udeab skipped\n\n\nlinux-binary-manywheel\nciflow/binaries, ciflow/binaries/wheel\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.1-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.1-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\n\n\nYou can add a comment to the PR and tag @pytorchbot with the following commands:\n\n# ciflow rerun, \"ciflow/default\" will always be added automatically\n@pytorchbot ciflow rerun\n\n# ciflow rerun with additional labels \"-l \", which is equivalent to adding these labels manually and trigger the rerun\n@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow\n\nFor more information, please take a look at the CI Flow Wiki.", + "createdAt": "2021-11-10T08:42:49Z", + "author": { + "login": "pytorch-probot" + }, + "authorAssociation": "NONE", + "editor": { + "login": "pytorch-probot" + }, + "databaseId": 964902865 + }, { - "login": "kit1980" + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea 
\u00a0See artifacts and rendered test results at hud.pytorch.org/pr/68111\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 7388141 (more details on the Dr. CI page):\n\n\n29/29 failures introduced in this PR\n\n\n\ud83d\udd75\ufe0f 29 new failures recognized by patterns\nThe following CI failures do not appear to be due to upstream breakages:\n pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:31:38.6978776Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:31:38.3001628Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:31:38.5169168Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:31:38.5362923Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:31:38.5413452Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:31:38.5458747Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:31:38.5484014Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:31:38.5497924Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:31:38.5656491Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:31:38.5678893Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:31:38.6888479Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f6488c20adb4dca4\n2022-03-21T21:31:38.6978776Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:31:38.6992648Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:31:38.7003010Z ##[error]Process completed with exit code 2.\n2022-03-21T21:31:38.7044027Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:31:38.7044261Z with:\n2022-03-21T21:31:38.7044413Z env:\n2022-03-21T21:31:38.7044565Z IN_CI: 1\n2022-03-21T21:31:38.7044709Z IS_GHA: 1\n2022-03-21T21:31:38.7044885Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:31:38.7045067Z ##[endgroup]\n2022-03-21T21:31:38.7060958Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (2/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:35:19.2635222Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:35:18.9028722Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:35:19.1132721Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:35:19.1310590Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages 
(1.19.12)\n2022-03-21T21:35:19.1360251Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:35:19.1386865Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:35:19.1429182Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:35:19.1441925Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:35:19.1468280Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:35:19.1617667Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:35:19.2545368Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-098be2985e0392130\n2022-03-21T21:35:19.2635222Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:35:19.2648463Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:35:19.2658727Z ##[error]Process completed with exit code 2.\n2022-03-21T21:35:19.2706355Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:35:19.2706591Z with:\n2022-03-21T21:35:19.2706748Z env:\n2022-03-21T21:35:19.2706908Z IN_CI: 1\n2022-03-21T21:35:19.2707061Z IS_GHA: 1\n2022-03-21T21:35:19.2707246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:35:19.2707438Z ##[endgroup]\n2022-03-21T21:35:19.2724554Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (3/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:11:57.5531419Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:11:52.7662022Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T23:11:53.1213298Z ---------------------------------------- 8.1/8.1 MB 23.6 MB/s eta 0:00:00\n2022-03-21T23:11:53.1644665Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:11:53.2218699Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T23:11:53.2389674Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T23:11:53.2787295Z -------------------------------------- 247.7/247.7 KB 7.4 MB/s eta 0:00:00\n2022-03-21T23:11:53.3761842Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:11:53.5457622Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T23:11:57.4175080Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T23:11:57.5296815Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0105d4db093574f40\n2022-03-21T23:11:57.5531419Z 
C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:11:57.5564814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:11:57.5587712Z ##[error]Process completed with exit code 2.\n2022-03-21T23:11:57.5790311Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T23:11:57.5790832Z with:\n2022-03-21T23:11:57.5791104Z env:\n2022-03-21T23:11:57.5791358Z IN_CI: 1\n2022-03-21T23:11:57.5791620Z IS_GHA: 1\n2022-03-21T23:11:57.5791939Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:11:57.5792425Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T23:11:57.5792884Z ##[endgroup]\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu) (4/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T02:17:12.6257577Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T02:17:11.9280556Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T02:17:11.9335199Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:11.9682045Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T02:17:11.9850357Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0403171Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T02:17:12.0468875Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T02:17:12.0590000Z Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T02:17:12.0607093Z Installing collected packages: jmespath, urllib3, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T02:17:12.5273459Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T02:17:12.6032812Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-114\n2022-03-22T02:17:12.6257577Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T02:17:12.6259543Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T02:17:12.6291924Z ##[error]Process completed with exit code 2.\n2022-03-22T02:17:12.6387977Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T02:17:12.6388298Z with:\n2022-03-22T02:17:12.6388521Z wait-ssh: false\n2022-03-22T02:17:12.6388727Z env:\n2022-03-22T02:17:12.6388932Z IN_CI: 1\n2022-03-22T02:17:12.6389143Z IS_GHA: 1\n2022-03-22T02:17:12.6389368Z GIT_DEFAULT_BRANCH: master\n2022-03-22T02:17:12.6389669Z DOCKER_HOST: unix:///run/user/1121/docker.sock\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge) (5/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:19:24.4890693Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or 
directory\n\n2022-03-21T22:19:24.0962005Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:19:24.3152253Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:19:24.3341183Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:19:24.3391374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:19:24.3436392Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:19:24.3448982Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:19:24.3474092Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:19:24.3502003Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:19:24.3655072Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:19:24.4799309Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0bc9250521f338cae\n2022-03-21T22:19:24.4890693Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:19:24.4903625Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:19:24.4913841Z ##[error]Process completed with exit code 2.\n2022-03-21T22:19:24.4957338Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:19:24.4957575Z with:\n2022-03-21T22:19:24.4957735Z env:\n2022-03-21T22:19:24.4957900Z IN_CI: 1\n2022-03-21T22:19:24.4958055Z IS_GHA: 1\n2022-03-21T22:19:24.4958246Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:19:24.4958437Z ##[endgroup]\n2022-03-21T22:19:24.4989649Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu) (6/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T01:05:07.6983899Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T01:05:06.8364546Z Using cached https://files.pythonhosted.org/packages/7b/9c/f51775ebe7df5a7aa4e7c79ed671bde94e154bd968aca8d65bb24aba0c8c/s3transfer-0.5.2-py3-none-any.whl\n2022-03-22T01:05:06.8431763Z Collecting urllib3<1.27,>=1.25.4 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.8949391Z Using cached https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl\n2022-03-22T01:05:06.9180079Z Collecting python-dateutil<3.0.0,>=2.1 (from botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:06.9803351Z Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n2022-03-22T01:05:06.9882133Z Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12)\n2022-03-22T01:05:07.0067062Z Using cached 
https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n2022-03-22T01:05:07.0088676Z Installing collected packages: urllib3, jmespath, six, python-dateutil, botocore, s3transfer, boto3\n2022-03-22T01:05:07.5819667Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2 six-1.16.0 urllib3-1.26.9\n2022-03-22T01:05:07.6774717Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 worker-rocm-amd-60\n2022-03-22T01:05:07.6983899Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T01:05:07.6988652Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T01:05:07.7023073Z ##[error]Process completed with exit code 2.\n2022-03-22T01:05:07.7102087Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T01:05:07.7102389Z with:\n2022-03-22T01:05:07.7102603Z wait-ssh: false\n2022-03-22T01:05:07.7102820Z env:\n2022-03-22T01:05:07.7103015Z IN_CI: 1\n2022-03-22T01:05:07.7103224Z IS_GHA: 1\n2022-03-22T01:05:07.7103458Z GIT_DEFAULT_BRANCH: master\n2022-03-22T01:05:07.7103737Z DOCKER_HOST: unix:///run/user/1502/docker.sock\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge) (7/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:51:39.3637996Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:51:39.2041249Z Attempting uninstall: s3transfer\n2022-03-21T20:51:39.2043010Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:51:39.2083799Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:51:39.2089675Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:51:39.2480546Z Attempting uninstall: boto3\n2022-03-21T20:51:39.2482953Z Found existing installation: boto3 1.16.34\n2022-03-21T20:51:39.2584292Z Uninstalling boto3-1.16.34:\n2022-03-21T20:51:39.2599474Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:51:39.3130921Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:51:39.3550598Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03ef7efc3078e3da5\n2022-03-21T20:51:39.3637996Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:51:39.3650651Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:51:39.3660484Z ##[error]Process completed with exit code 2.\n2022-03-21T20:51:39.3696465Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:51:39.3696693Z with:\n2022-03-21T20:51:39.3696850Z env:\n2022-03-21T20:51:39.3697012Z IN_CI: 1\n2022-03-21T20:51:39.3697161Z IS_GHA: 1\n2022-03-21T20:51:39.3697342Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:51:39.3697528Z ##[endgroup]\n2022-03-21T20:51:39.3730420Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge) (8/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:36.3916860Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:03:36.0096309Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:03:36.2278560Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:03:36.2461618Z Requirement already satisfied: boto3==1.19.12 in 
/home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:03:36.2513260Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:03:36.2541524Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:03:36.2554899Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:03:36.2598277Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:03:36.2758299Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:03:36.2780690Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:03:36.3825021Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0a4a552890e6ef7d3\n2022-03-21T21:03:36.3916860Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:03:36.3930343Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:03:36.3941263Z ##[error]Process completed with exit code 2.\n2022-03-21T21:03:36.3979258Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:03:36.3979496Z with:\n2022-03-21T21:03:36.3979654Z env:\n2022-03-21T21:03:36.3979814Z IN_CI: 1\n2022-03-21T21:03:36.3979968Z IS_GHA: 1\n2022-03-21T21:03:36.3980157Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:03:36.3980360Z ##[endgroup]\n2022-03-21T21:03:36.3996257Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (9/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:41:10.3015614Z Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)\n2022-03-22T00:41:10.3625659Z ---------------------------------------- 79.5/79.5 KB 1.1 MB/s eta 0:00:00\n2022-03-22T00:41:10.4120236Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-22T00:41:10.4170155Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-22T00:41:10.4722115Z -------------------------------------- 247.7/247.7 KB 5.2 MB/s eta 0:00:00\n2022-03-22T00:41:10.4843512Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:41:10.6596108Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:41:10.8733354Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-22T00:41:15.3745408Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-22T00:41:15.4987162Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 
i-09cacc848abc3dd32\n2022-03-22T00:41:15.5325784Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:41:15.5373630Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:41:15.5404353Z ##[error]Process completed with exit code 2.\n2022-03-22T00:41:15.5790508Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-22T00:41:15.5791192Z with:\n2022-03-22T00:41:15.5791530Z env:\n2022-03-22T00:41:15.5791849Z IN_CI: 1\n2022-03-22T00:41:15.5792186Z IS_GHA: 1\n2022-03-22T00:41:15.5792599Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:41:15.5793237Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-22T00:41:15.5793831Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge) (10/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:32.9799307Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:32.8167560Z Attempting uninstall: s3transfer\n2022-03-21T20:50:32.8169351Z Found existing installation: s3transfer 0.3.7\n2022-03-21T20:50:32.8213295Z Uninstalling s3transfer-0.3.7:\n2022-03-21T20:50:32.8219209Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T20:50:32.8602320Z Attempting uninstall: boto3\n2022-03-21T20:50:32.8603289Z Found existing installation: boto3 1.16.34\n2022-03-21T20:50:32.8704535Z Uninstalling boto3-1.16.34:\n2022-03-21T20:50:32.8719403Z Successfully uninstalled boto3-1.16.34\n2022-03-21T20:50:32.9244278Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T20:50:32.9710449Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0c568461a276d4a71\n2022-03-21T20:50:32.9799307Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:32.9812238Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:32.9823052Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:32.9859290Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:32.9859527Z with:\n2022-03-21T20:50:32.9859664Z env:\n2022-03-21T20:50:32.9859817Z IN_CI: 1\n2022-03-21T20:50:32.9859977Z IS_GHA: 1\n2022-03-21T20:50:32.9860144Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:32.9860327Z ##[endgroup]\n2022-03-21T20:50:32.9893642Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge) (11/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7163042Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.6660824Z #10 0x55fc8a3ea801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.6661768Z #11 0x55fc8a3f57a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.6662455Z #12 0x55fc8a3f580b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.6663570Z #13 0x55fc8a3f5908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.6663952Z #14 0x55fc8a3f5908 in pymain_run_python 
/tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.6664431Z #15 0x55fc8a3f5908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.6665304Z #16 0x55fc8a3f5ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7162113Z #17 0x7f940d00f83f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7162534Z #18 0x55fc8a39a554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7162711Z \n2022-03-21T21:05:00.7163042Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.7334595Z + retcode=1\n2022-03-21T21:05:00.7334954Z + set -e\n2022-03-21T21:05:00.7335215Z + return 1\n2022-03-21T21:05:00.7338688Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.7339232Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.7340113Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.7340612Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.7341187Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.7341668Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.7344466Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (12/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:06:03.4437430Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:06:03.0752199Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:06:03.2853252Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:06:03.3032326Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:06:03.3081589Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:06:03.3093911Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:06:03.3120244Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:06:03.3162406Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:06:03.3188431Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:06:03.3337181Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:06:03.4348072Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ee48c8811fafc444\n2022-03-21T22:06:03.4437430Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:06:03.4450920Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:06:03.4461263Z ##[error]Process completed with 
exit code 2.\n2022-03-21T22:06:03.4502346Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:06:03.4502576Z with:\n2022-03-21T22:06:03.4502730Z env:\n2022-03-21T22:06:03.4502888Z IN_CI: 1\n2022-03-21T22:06:03.4503038Z IS_GHA: 1\n2022-03-21T22:06:03.4503302Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:06:03.4503492Z ##[endgroup]\n2022-03-21T22:06:03.4519156Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (13/29)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:50:13.2205634Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T20:50:12.8679322Z + python3 -m pip install boto3==1.19.12\n2022-03-21T20:50:13.0744228Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T20:50:13.0916284Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T20:50:13.0964264Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T20:50:13.1005656Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T20:50:13.1017299Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T20:50:13.1041042Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T20:50:13.1189450Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T20:50:13.1208751Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T20:50:13.2119445Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d02da60fd18c22f5\n2022-03-21T20:50:13.2205634Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T20:50:13.2217939Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T20:50:13.2220259Z ##[error]Process completed with exit code 2.\n2022-03-21T20:50:13.2248664Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T20:50:13.2249012Z with:\n2022-03-21T20:50:13.2249260Z env:\n2022-03-21T20:50:13.2249500Z IN_CI: 1\n2022-03-21T20:50:13.2249738Z IS_GHA: 1\n2022-03-21T20:50:13.2250025Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:50:13.2250329Z ##[endgroup]\n2022-03-21T20:50:13.2272735Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (14/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:47:38.0451999Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:47:37.5554508Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:47:37.8411473Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:47:37.8631484Z Requirement already satisfied: boto3==1.19.12 in 
/home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:47:37.8699561Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:47:37.8737037Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:47:37.8754443Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:47:37.8814393Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:47:37.8849540Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:47:37.9059579Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:47:38.0336298Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0b44f47f4292089a2\n2022-03-21T23:47:38.0451999Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:47:38.0469471Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:47:38.0484106Z ##[error]Process completed with exit code 2.\n2022-03-21T23:47:38.0532678Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:47:38.0533007Z with:\n2022-03-21T23:47:38.0533223Z env:\n2022-03-21T23:47:38.0533440Z IN_CI: 1\n2022-03-21T23:47:38.0533649Z IS_GHA: 1\n2022-03-21T23:47:38.0533902Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:47:38.0534170Z GPU_FLAG: --gpus all\n2022-03-21T23:47:38.0534401Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge) (15/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:04:59.3115800Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:04:59.2595213Z #10 0x55a7f39a4801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:04:59.2595707Z #11 0x55a7f39af7a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:04:59.2597203Z #12 0x55a7f39af80b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:04:59.2598205Z #13 0x55a7f39af908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:04:59.2598697Z #14 0x55a7f39af908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:04:59.2599178Z #15 0x55a7f39af908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:04:59.2599747Z #16 0x55a7f39afccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:04:59.3114751Z #17 0x7f3b3822383f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:04:59.3115277Z #18 0x55a7f3954554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:04:59.3115468Z \n2022-03-21T21:04:59.3115800Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior 
/var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:04:59.3292385Z + retcode=1\n2022-03-21T21:04:59.3292781Z + set -e\n2022-03-21T21:04:59.3293062Z + return 1\n2022-03-21T21:04:59.3295462Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:04:59.3295802Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:04:59.3296394Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:04:59.3296700Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:04:59.3297055Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:04:59.3297416Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:04:59.3299623Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (16/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:14:25.5525714Z Collecting jmespath<1.0.0,>=0.7.1\n2022-03-21T22:14:25.5568155Z Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)\n2022-03-21T22:14:25.5952617Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:14:25.6169392Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:14:25.6629996Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:14:25.6710247Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:14:25.8284354Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:14:25.9816751Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:14:31.6672236Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:14:31.7630473Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0ed0915ecee5d2424\n2022-03-21T22:14:31.7846086Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:14:31.7876742Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:14:31.7897140Z ##[error]Process completed with exit code 2.\n2022-03-21T22:14:31.8195621Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:14:31.8196110Z with:\n2022-03-21T22:14:31.8196356Z env:\n2022-03-21T22:14:31.8196614Z IN_CI: 1\n2022-03-21T22:14:31.8196876Z IS_GHA: 1\n2022-03-21T22:14:31.8197169Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:14:31.8197652Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:14:31.8198093Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge) (17/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:19:15.8845728Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:19:15.5116060Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:19:15.7231476Z Defaulting to user 
installation because normal site-packages is not writeable\n2022-03-21T21:19:15.7409711Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:19:15.7458478Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:19:15.7470508Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:19:15.7496799Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:19:15.7538362Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:19:15.7566161Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:19:15.7711630Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:19:15.8753543Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0e2b3b4ddb246ff2a\n2022-03-21T21:19:15.8845728Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:19:15.8859814Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:19:15.8870165Z ##[error]Process completed with exit code 2.\n2022-03-21T21:19:15.8917039Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:19:15.8917279Z with:\n2022-03-21T21:19:15.8917433Z env:\n2022-03-21T21:19:15.8917586Z IN_CI: 1\n2022-03-21T21:19:15.8917734Z IS_GHA: 1\n2022-03-21T21:19:15.8917917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:19:15.8918102Z ##[endgroup]\n2022-03-21T21:19:15.8934572Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (18/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T23:19:48.5900162Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T23:19:48.0742254Z + python3 -m pip install boto3==1.19.12\n2022-03-21T23:19:48.3742563Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T23:19:48.3976536Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T23:19:48.4048700Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T23:19:48.4065374Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T23:19:48.4128076Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T23:19:48.4164273Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T23:19:48.4202610Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T23:19:48.4416723Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T23:19:48.5773033Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-07ab7a3c4a5402af2\n2022-03-21T23:19:48.5900162Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T23:19:48.5919822Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T23:19:48.5936087Z ##[error]Process completed with exit code 2.\n2022-03-21T23:19:48.6007930Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T23:19:48.6008268Z with:\n2022-03-21T23:19:48.6008483Z env:\n2022-03-21T23:19:48.6008701Z IN_CI: 1\n2022-03-21T23:19:48.6008920Z IS_GHA: 1\n2022-03-21T23:19:48.6009170Z GIT_DEFAULT_BRANCH: master\n2022-03-21T23:19:48.6009440Z GPU_FLAG: --gpus all\n2022-03-21T23:19:48.6009671Z ##[endgroup]\n\n\n pull / win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (19/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:53:59.0889659Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T22:53:59.6881416Z ---------------------------------------- 8.1/8.1 MB 14.0 MB/s eta 0:00:00\n2022-03-21T22:53:59.7427779Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:53:59.7691882Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T22:53:59.7779847Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T22:53:59.8281663Z -------------------------------------- 247.7/247.7 KB 5.1 MB/s eta 0:00:00\n2022-03-21T22:54:00.0185115Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:54:00.2359770Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T22:54:04.1208891Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T22:54:04.2505862Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-03b4fbe63be8ef4b0\n2022-03-21T22:54:04.2844259Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:54:04.2891082Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:54:04.2919900Z ##[error]Process completed with exit code 2.\n2022-03-21T22:54:04.3377901Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T22:54:04.3378575Z with:\n2022-03-21T22:54:04.3378930Z env:\n2022-03-21T22:54:04.3379275Z IN_CI: 1\n2022-03-21T22:54:04.3379600Z IS_GHA: 1\n2022-03-21T22:54:04.3380023Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:54:04.3380691Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T22:54:04.3381278Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (noarch, 1, 1, 
linux.2xlarge) (20/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:09:34.0074610Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:09:33.6365531Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:09:33.8475619Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:09:33.8655152Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:09:33.8704395Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:09:33.8716774Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:09:33.8760145Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:09:33.8785000Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:09:33.8811316Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:09:33.8960134Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:09:33.9984866Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d325eb9fd156146f\n2022-03-21T22:09:34.0074610Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:09:34.0087465Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:09:34.0101743Z ##[error]Process completed with exit code 2.\n2022-03-21T22:09:34.0154014Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:09:34.0154246Z with:\n2022-03-21T22:09:34.0154412Z env:\n2022-03-21T22:09:34.0154574Z IN_CI: 1\n2022-03-21T22:09:34.0154728Z IS_GHA: 1\n2022-03-21T22:09:34.0154917Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:09:34.0155112Z ##[endgroup]\n2022-03-21T22:09:34.0191047Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge) (21/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:03:17.8502655Z [E request_callbac...yUniqueId(created_on=0, local_id=0) to be created.\n\n2022-03-21T21:03:14.4669960Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxgdsmeer\n2022-03-21T21:03:14.4671407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxgdsmeer/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.4973023Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1i2hfmpc\n2022-03-21T21:03:14.4973800Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1i2hfmpc/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.5532339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgx4da7b0\n2022-03-21T21:03:14.5533064Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgx4da7b0/_remote_module_non_sriptable.py\n2022-03-21T21:03:14.7050673Z 
INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0\n2022-03-21T21:03:14.7097127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3\n2022-03-21T21:03:14.7398339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2\n2022-03-21T21:03:14.7922283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1\n2022-03-21T21:03:17.8502655Z [E request_callback_no_python.cpp:559] Received error while processing request type 261: false INTERNAL ASSERT FAILED at \"/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp\":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.\n2022-03-21T21:03:17.8503603Z Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):\n2022-03-21T21:03:17.8504385Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x69 (0x7f180df19e19 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505131Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xd2 (0x7f180df160e2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8505927Z frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) + 0x4e (0x7f180df17a7e in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\n2022-03-21T21:03:17.8506674Z frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x4b4 (0x7f18118b7b64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8507642Z frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr >) const + 0x70 (0x7f18118a7bf0 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8508613Z frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0xc8 (0x7f1819736208 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8509749Z frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7f18118ac914 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n2022-03-21T21:03:17.8510708Z frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f1819735865 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\n2022-03-21T21:03:17.8511369Z frame #8: + 0x375249a (0x7f18118a949a in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test (22/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERR...t available for the merge-base of your 
branch\"\ufffd[0m\n\n2022-03-21T20:01:07.7012399Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7012634Z \ufffd[36;1m# Covers the case where a previous tag doesn't exist for the tree\ufffd[0m\n2022-03-21T20:01:07.7012992Z \ufffd[36;1m# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly\ufffd[0m\n2022-03-21T20:01:07.7013373Z \ufffd[36;1mif ! git rev-parse \"$MERGE_BASE:.circleci/docker\"; then\ufffd[0m\n2022-03-21T20:01:07.7013784Z \ufffd[36;1m echo \"Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit\"\ufffd[0m\n2022-03-21T20:01:07.7014149Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7014325Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7014573Z \ufffd[36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse \"$MERGE_BASE:.circleci/docker\")\ufffd[0m\n2022-03-21T20:01:07.7014907Z \ufffd[36;1m# If no image exists but the hash is the same as the previous hash then we should error out here\ufffd[0m\n2022-03-21T20:01:07.7015231Z \ufffd[36;1mif [[ \"${PREVIOUS_DOCKER_TAG}\" = \"${DOCKER_TAG}\" ]]; then\ufffd[0m\n2022-03-21T20:01:07.7015580Z \ufffd[36;1m echo \"ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch\"\ufffd[0m\n2022-03-21T20:01:07.7015931Z \ufffd[36;1m echo \" contact the PyTorch team to restore the original images\"\ufffd[0m\n2022-03-21T20:01:07.7016225Z \ufffd[36;1m exit 1\ufffd[0m\n2022-03-21T20:01:07.7016400Z \ufffd[36;1mfi\ufffd[0m\n2022-03-21T20:01:07.7016608Z \ufffd[36;1mecho ::set-output name=rebuild::yes\ufffd[0m\n2022-03-21T20:01:07.7027605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}\n2022-03-21T20:01:07.7027837Z env:\n2022-03-21T20:01:07.7028006Z IN_CI: 1\n2022-03-21T20:01:07.7028159Z IS_GHA: 1\n2022-03-21T20:01:07.7028346Z GIT_DEFAULT_BRANCH: master\n2022-03-21T20:01:07.7028589Z BASE_REVISION: 6643522db9ff595f564b8081de58b3a33c546178\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu) (23/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-22T00:49:54.2949572Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-22T00:49:53.8049151Z + python3 -m pip install boto3==1.19.12\n2022-03-22T00:49:54.0981629Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-22T00:49:54.1207562Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-22T00:49:54.1277146Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-22T00:49:54.1315027Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-22T00:49:54.1331813Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-22T00:49:54.1391622Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-22T00:49:54.1609217Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-22T00:49:54.1637417Z Requirement already satisfied: six>=1.5 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-22T00:49:54.2830197Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0f7c32fe13be12fea\n2022-03-22T00:49:54.2949572Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-22T00:49:54.2966933Z + GHA_WORKFLOW_JOB_ID=\n2022-03-22T00:49:54.2982588Z ##[error]Process completed with exit code 2.\n2022-03-22T00:49:54.3031464Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-22T00:49:54.3031794Z with:\n2022-03-22T00:49:54.3032012Z env:\n2022-03-22T00:49:54.3032227Z IN_CI: 1\n2022-03-22T00:49:54.3032434Z IS_GHA: 1\n2022-03-22T00:49:54.3032681Z GIT_DEFAULT_BRANCH: master\n2022-03-22T00:49:54.3033084Z GPU_FLAG: --gpus all\n2022-03-22T00:49:54.3033312Z ##[endgroup]\n\n\n pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (24/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:56:07.3365589Z Downloading botocore-1.22.12-py3-none-any.whl (8.1 MB)\n2022-03-21T21:56:07.7926584Z ---------------------------------------- 8.1/8.1 MB 17.3 MB/s eta 0:00:00\n2022-03-21T21:56:07.9319362Z Collecting python-dateutil<3.0.0,>=2.1\n2022-03-21T21:56:07.9366132Z Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)\n2022-03-21T21:56:08.0077590Z -------------------------------------- 247.7/247.7 KB 3.0 MB/s eta 0:00:00\n2022-03-21T21:56:08.0164070Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:56:08.1775537Z Requirement already satisfied: six>=1.5 in c:\\actions-runner\\_work\\_tool\\python\\3.10.3\\x64\\lib\\site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:56:08.3393469Z Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, boto3\n2022-03-21T21:56:12.4576766Z Successfully installed boto3-1.19.12 botocore-1.22.12 jmespath-0.10.0 python-dateutil-2.8.2 s3transfer-0.5.2\n2022-03-21T21:56:12.5641959Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0afad69838118af0e\n2022-03-21T21:56:12.5872636Z C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\\python3.exe: can't open file 'C:\\\\actions-runner\\\\_work\\\\pytorch\\\\pytorch\\\\.github\\\\scripts\\\\get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:56:12.5905611Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:56:12.5927729Z ##[error]Process completed with exit code 2.\n2022-03-21T21:56:12.6239531Z ##[group]Run pytorch/pytorch/.github/actions/teardown-win@master\n2022-03-21T21:56:12.6240039Z with:\n2022-03-21T21:56:12.6240299Z env:\n2022-03-21T21:56:12.6240557Z IN_CI: 1\n2022-03-21T21:56:12.6240805Z IS_GHA: 1\n2022-03-21T21:56:12.6241118Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:56:12.6241613Z pythonLocation: C:\\actions-runner\\_work\\_tool\\Python\\3.10.3\\x64\n2022-03-21T21:56:12.6242052Z ##[endgroup]\n\n\n pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (25/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:46:39.5474616Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or 
directory\n\n2022-03-21T21:46:39.1884210Z + python3 -m pip install boto3==1.19.12\n2022-03-21T21:46:39.3928976Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T21:46:39.4105069Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T21:46:39.4152571Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T21:46:39.4194931Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T21:46:39.4218947Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T21:46:39.4230812Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T21:46:39.4380089Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T21:46:39.4399461Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T21:46:39.5387703Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0888bed1149cca415\n2022-03-21T21:46:39.5474616Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:46:39.5487145Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:46:39.5497480Z ##[error]Process completed with exit code 2.\n2022-03-21T21:46:39.5541319Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:46:39.5541544Z with:\n2022-03-21T21:46:39.5541698Z env:\n2022-03-21T21:46:39.5541851Z IN_CI: 1\n2022-03-21T21:46:39.5541997Z IS_GHA: 1\n2022-03-21T21:46:39.5542176Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:46:39.5542361Z ##[endgroup]\n2022-03-21T21:46:39.5557878Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (26/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:34:57.0623859Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:34:56.9039884Z Attempting uninstall: s3transfer\n2022-03-21T21:34:56.9041446Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:34:56.9090783Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:34:56.9095968Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:34:56.9453014Z Attempting uninstall: boto3\n2022-03-21T21:34:56.9454356Z Found existing installation: boto3 1.16.34\n2022-03-21T21:34:56.9564320Z Uninstalling boto3-1.16.34:\n2022-03-21T21:34:56.9578035Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:34:57.0091363Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:34:57.0536230Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-034a3afd5d80b91fd\n2022-03-21T21:34:57.0623859Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:34:57.0637167Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:34:57.0647396Z ##[error]Process completed with exit code 
2.\n2022-03-21T21:34:57.0688237Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:34:57.0688481Z with:\n2022-03-21T21:34:57.0688631Z env:\n2022-03-21T21:34:57.0688769Z IN_CI: 1\n2022-03-21T21:34:57.0688930Z IS_GHA: 1\n2022-03-21T21:34:57.0689109Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:34:57.0689462Z ##[endgroup]\n2022-03-21T21:34:57.0704768Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n pull / linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge) (27/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:05:00.7896545Z SUMMARY: Undefined.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in\n\n2022-03-21T21:05:00.7395504Z #10 0x5597fd5a9801 in run_mod /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:1037\n2022-03-21T21:05:00.7396330Z #11 0x5597fd5b47a9 in PyRun_StringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:961\n2022-03-21T21:05:00.7396688Z #12 0x5597fd5b480b in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1627392990942/work/Python/pythonrun.c:455\n2022-03-21T21:05:00.7398664Z #13 0x5597fd5b4908 in pymain_run_command /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:420\n2022-03-21T21:05:00.7399177Z #14 0x5597fd5b4908 in pymain_run_python /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:2907\n2022-03-21T21:05:00.7399663Z #15 0x5597fd5b4908 in pymain_main /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3460\n2022-03-21T21:05:00.7399986Z #16 0x5597fd5b4ccb in _Py_UnixMain /tmp/build/80754af9/python_1627392990942/work/Modules/main.c:3495\n2022-03-21T21:05:00.7895241Z #17 0x7f0a5905983f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291\n2022-03-21T21:05:00.7895772Z #18 0x5597fd559554 in _start (/opt/conda/bin/python3.7+0x1d7554)\n2022-03-21T21:05:00.7896033Z \n2022-03-21T21:05:00.7896545Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in \n2022-03-21T21:05:00.8063448Z + retcode=1\n2022-03-21T21:05:00.8063787Z + set -e\n2022-03-21T21:05:00.8064058Z + return 1\n2022-03-21T21:05:00.8067638Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]\n2022-03-21T21:05:00.8068127Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X ]]\n2022-03-21T21:05:00.8069018Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]\n2022-03-21T21:05:00.8069500Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\2 ]]\n2022-03-21T21:05:00.8070105Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]\n2022-03-21T21:05:00.8070580Z + [[ default == \\n\\o\\g\\p\\u\\_\\N\\O\\_\\A\\V\\X\\5\\1\\2 ]]\n2022-03-21T21:05:00.8072640Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]\n\n\n pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (28/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T22:48:17.3384813Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T22:48:16.8599645Z + python3 -m pip install boto3==1.19.12\n2022-03-21T22:48:17.1464241Z Defaulting to user installation because normal site-packages is not writeable\n2022-03-21T22:48:17.1685222Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12)\n2022-03-21T22:48:17.1754164Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0)\n2022-03-21T22:48:17.1771662Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2)\n2022-03-21T22:48:17.1808722Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12)\n2022-03-21T22:48:17.1868636Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2)\n2022-03-21T22:48:17.1903889Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9)\n2022-03-21T22:48:17.2113746Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0)\n2022-03-21T22:48:17.3267404Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-01fe178c405417375\n2022-03-21T22:48:17.3384813Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T22:48:17.3402286Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T22:48:17.3418376Z ##[error]Process completed with exit code 2.\n2022-03-21T22:48:17.3470528Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T22:48:17.3470874Z with:\n2022-03-21T22:48:17.3471096Z env:\n2022-03-21T22:48:17.3471327Z IN_CI: 1\n2022-03-21T22:48:17.3471538Z IS_GHA: 1\n2022-03-21T22:48:17.3471802Z GIT_DEFAULT_BRANCH: master\n2022-03-21T22:48:17.3472083Z GPU_FLAG: --gpus all\n2022-03-21T22:48:17.3472322Z ##[endgroup]\n\n\n pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (29/29)\nStep: \"Upload test statistics\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-03-21T21:16:38.9646300Z python3: can't ope...ow_job_id.py': [Errno 2] No such file or directory\n\n2022-03-21T21:16:38.7995969Z Attempting uninstall: s3transfer\n2022-03-21T21:16:38.7998039Z Found existing installation: s3transfer 0.3.7\n2022-03-21T21:16:38.8066994Z Uninstalling s3transfer-0.3.7:\n2022-03-21T21:16:38.8072844Z Successfully uninstalled s3transfer-0.3.7\n2022-03-21T21:16:38.8449275Z Attempting uninstall: boto3\n2022-03-21T21:16:38.8451430Z Found existing installation: boto3 1.16.34\n2022-03-21T21:16:38.8559828Z Uninstalling boto3-1.16.34:\n2022-03-21T21:16:38.8574290Z Successfully uninstalled boto3-1.16.34\n2022-03-21T21:16:38.9100438Z Successfully installed boto3-1.19.12 botocore-1.22.12 s3transfer-0.5.2\n2022-03-21T21:16:38.9558098Z ++ python3 .github/scripts/get_workflow_job_id.py 2018440039 i-0d779c59d277d32ee\n2022-03-21T21:16:38.9646300Z python3: can't open file '.github/scripts/get_workflow_job_id.py': [Errno 2] No such file or directory\n2022-03-21T21:16:38.9658894Z + GHA_WORKFLOW_JOB_ID=\n2022-03-21T21:16:38.9673240Z ##[error]Process completed with exit code 2.\n2022-03-21T21:16:38.9720106Z ##[group]Run pytorch/pytorch/.github/actions/teardown-linux@master\n2022-03-21T21:16:38.9720333Z with:\n2022-03-21T21:16:38.9720485Z env:\n2022-03-21T21:16:38.9720645Z IN_CI: 1\n2022-03-21T21:16:38.9720793Z IS_GHA: 1\n2022-03-21T21:16:38.9720970Z GIT_DEFAULT_BRANCH: master\n2022-03-21T21:16:38.9721151Z ##[endgroup]\n2022-03-21T21:16:38.9736762Z ##[group]Run # ignore expansion of \"docker ps -q\" since it could be empty\n\n\n\nThis comment was 
automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2021-11-10T08:42:52Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 964902894 }, { - "login": "huydhn" + "bodyText": "@vitaly-fedyunin @gottbrath FYI that this is the oneDNN Graph API integration. It depends on the #63748.", + "createdAt": "2021-11-16T16:36:52Z", + "author": { + "login": "Jianhui-Li" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 970451860 }, { - "login": "b0noI" + "bodyText": "CI failures are currently being caused by some issues in the CI infra, and are also occurring with other PRs.", + "createdAt": "2021-12-10T05:59:17Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 990641309 }, { - "login": "seemethere" + "bodyText": "CI failures are unrelated.", + "createdAt": "2021-12-10T20:44:09Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 991281407 }, { - "login": "malfet" + "bodyText": "The CI failure is unrelated.", + "createdAt": "2021-12-16T02:45:59Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 995389295 }, { - "login": "DanilBaibak" + "bodyText": "Hi, thank you for the PR!\nDo you mind running a larger amount of torchbench and reporting numbers ? You can look at Jason's post here for what models are supported in script. Initially just the vision models would be useful. @Krovatkin also did some benchmarking of a traced Bert model and found on average a ~16% speedup with this PR.", + "createdAt": "2022-01-18T18:22:34Z", + "author": { + "login": "eellison" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1015689390 }, { - "login": "ZainRizvi" + "bodyText": "Thanks a lot for reviewing, @eellison & @Krovatkin!\nWe just wanted to let you know that we're working on the benchmarking & will get back to you in a day, or two.\nUPDATE (Jan 21): While running some TorchBench models, we discovered some composability issues, and are working to ensure that oneDNN Graph would complement PyTorch's existing fusion capabilities, not hinder them.\nUPDATE (Jan 24): We've resolved the issues & will update this PR later today. Thanks!", + "createdAt": "2022-01-20T00:31:01Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1016996190 }, { - "login": "jeanschmidt" + "bodyText": "Hello @eellison,\nWe used this TorchBench branch for comparison. compare_llga.sh can be run for comparison.\nFor benchmarking mobilenet_v3_large with hardswish support in oneDNN Graph, this oneDNN Graph branch can be used in third_party/ideep/mkl-dnn. 
It delivers a speedup over PyTorch JIT (NNC + OFI) because 21 additional reorders are prevented (the major factor here), and fusion with conv also helps further.\nThe next release of oneDNN Graph would have hardswish support.\nWe're also exploring adding a hardsigmoid op in oneDNN Graph.\nThank you!", + "createdAt": "2022-01-26T23:51:38Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1022709513 }, { - "login": "atalman" + "bodyText": "Please note that this PR should be merged after #71546, as #71546 changes the third_party/ideep commit (this PR also uses that ideep commit, but it'd probably be better to merge #71546 first, so that oneDNN v2.5.2 upgrade would be in a separate PR). Thank you!", + "createdAt": "2022-01-31T23:57:21Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1026330085 }, { - "login": "mehtanirav" + "bodyText": "@sanchitintel mind rebasing and i'll land ?", + "createdAt": "2022-03-01T20:07:57Z", + "author": { + "login": "eellison" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1055813984 }, { - "login": "osalpekar" + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-03-02T17:44:47Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1057203495 + }, + { + "bodyText": "Thanks a lot for taking a look, @eellison! To fix this error, we would enable Bazel build for oneDNN Graph.", + "createdAt": "2022-03-07T23:03:45Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1061230087 }, { - "login": "swang392" + "bodyText": "@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-03-09T19:24:13Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1063276600 }, { - "login": "janeyx99" + "bodyText": "@malfet has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-03-21T19:59:41Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074355779 }, { - "login": "clee2000" + "bodyText": "And graph_rewriter.cpp is full of DOS newlines...", + "createdAt": "2022-03-21T20:53:40Z", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1074407452 }, { - "login": "izaitsevfb" + "bodyText": "Hey @chunyuan-w.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-03-21T22:12:51Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1074471758 }, { - "login": "weiwangmeta" + "bodyText": "Thanks a ton for your help, @malfet & @eellison! :)\nWe'll incorporate your suggestions in subsequent PR(s).", + "createdAt": "2022-03-21T22:41:25Z", + "author": { + "login": "sanchitintel" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "sanchitintel" + }, + "databaseId": 1074492365 } ], "pageInfo": { - "hasNextPage": false, - "endCursor": "Y3Vyc29yOnYyOpHOBoQSVA==" + "startCursor": "Y3Vyc29yOnYyOpHOOYM_0Q==", + "hasPreviousPage": false } } } } } }, - "query_sha=a91ab398f97fb43cbe6e0899980dad8ff7447457ea5a71bbc59f7702a9280eb5 cursor=None name=qwertyuiop org=pytorch": { - "data": { - "organization": { - "team": null - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=82169 owner=pytorch": { + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=73969 owner=pytorch": { "data": { "repository": { "pullRequest": { "closed": true, - "isCrossRepository": false, + "isCrossRepository": true, "author": { - "login": "ezyang" + "login": "malfet" }, - "title": "Move test_dtypes so it runs later", - "body": "Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):\n* __->__ #82169\n\nThe error messages it gives are very unhelpful (because a failure\ngets translated into \"dtype was not supported\" rather than the\nactual backtrace), so I'd rather get error messages about this after\nI've tested basic functionality.\n\nSigned-off-by: Edward Z. Yang ", - "headRefName": "gh/ezyang/1279/head", + "title": "Dummy change", + "body": "Test Plan: None at all\n\nDifferential Revision: D34753911\n\n", + "headRefName": "export-D34753911", "headRepository": { - "nameWithOwner": "pytorch/pytorch" + "nameWithOwner": "malfet/pytorch" }, - "baseRefName": "gh/ezyang/1279/base", + "baseRefName": "master", "baseRepository": { "nameWithOwner": "pytorch/pytorch", "isPrivate": false, @@ -17980,44 +32251,20 @@ "commit": { "author": { "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "cef34da55a59da5a32494bff218ccd4978b659d3" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" - }, - "email": "ezyang@fb.com", - "name": "Edward Z. Yang" - }, - "oid": "83ad7e73a07111ac1d85e931d14360cc22c01edd" - } - }, - { - "commit": { - "author": { - "user": { - "login": "ezyang" + "login": "malfet" }, - "email": "ezyang@fb.com", - "name": "Edward Z. 
Yang" + "email": "nshulga@fb.com", + "name": "Nikita Shulga" }, - "oid": "28140e4008289251b695385acfb48ac7a47cd49c" + "oid": "4746da707a9912356f5179625da89616b228dc21" } } ], "pageInfo": { - "endCursor": "Mw", + "endCursor": "MQ", "hasNextPage": false }, - "totalCount": 3 + "totalCount": 1 }, "commits": { "nodes": [ @@ -18033,61 +32280,358 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "linux-vulkan-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280134/jobs/2794078044" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280134/jobs/2794189060" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRQMQ=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592963" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280135/jobs/2794078023" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592965" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-rocm4.5-py3.7" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794078060" + }, + { + "name": "test (default, 1, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292071" + }, + { + "name": "test (default, 2, 2, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292205" + }, + { + "name": "test (distributed, 1, 1, linux.rocm.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280132/jobs/2794292306" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbTiXw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592966" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-QY=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cuda11.3-py3" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794078053" + }, + { + "name": "test (force_on_cpu, 1, 1, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794536907" + }, + { + "name": "test (default, 2, 2, 
windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794536998" + }, + { + "name": "test (default, 1, 2, windows.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280139/jobs/2794537089" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbY_vU=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592967" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qc=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280136/jobs/2794078031" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2ao=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592969" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qk=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-docs" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794078055" + }, + { + "name": "build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794183768" + }, + { + "name": "build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280138/jobs/2794183828" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRIt0=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592970" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qo=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" } }, "checkRuns": { "nodes": [ { - "name": "lintrunner", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543705427?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794078017" }, { - "name": "Test collect_env (with_torch)", + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543705796?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181109" }, { - "name": "Test collect_env (without_torch)", + "name": "test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543705914?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181305" }, { - "name": "Test collect_env (older_python_version)", + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/1958280140/jobs/2794181488" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbRFm4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592971" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Qs=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-build" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280143/jobs/2794078025" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2aw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592974" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q4=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "Lint" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "shellcheck", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706071?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078028" }, { "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706300?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078196" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078407" + }, + { + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078610" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078760" }, { "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706581?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078898" }, { - "name": "Test tools", + "name": "py2-setup-validate-errormsg", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706911?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794078999" }, { - "name": "workflow-checks", + "name": "flake8-py3", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794079087" + }, + { + "name": "mypy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543707223?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280145/jobs/2794079199" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGj1lc=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO4Es=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696649" + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592975" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8k=" + 
"cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-Q8=" }, { "node": { @@ -18097,21 +32641,185 @@ }, "workflowRun": { "workflow": { - "name": "TorchBench CI (pytorch-linux-py3.7-cu102)" + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build-and-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1958280146/jobs/2794078040" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAUbO2b0=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696651" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/4746da707a9912356f5179625da89616b228dc21/checks?check_suite_id=5595592976" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc8s=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAU2F-RA=" + } + ], + "pageInfo": { + "hasNextPage": true + } + }, + "status": { + "contexts": [ + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040614?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040643?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17040615?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-03-09T15:57:16Z", + "oid": "4746da707a9912356f5179625da89616b228dc21" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "tools/build_variables.bzl" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [], + "pageInfo": { + "startCursor": null, + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/malfet/pytorch/blob/4746da707a9912356f5179625da89616b228dc21/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\nAdd ciflow labels to this PR to trigger more builds:\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nlinux-binary-libtorch-cxx11-abi\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-libtorch-pre-cxx11\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-binary-manywheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk\n\u2705 triggered\n\n\nlinux-bionic-rocm4.5-py3.7\nciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk\n\u2705 triggered\n\n\nlinux-docs\nciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk\n\u2705 
triggered\n\n\nlinux-vulkan-bionic-py3.7-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-build\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3-clang5-mobile-custom-build-static\nciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-clang7-onnx\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build\nciflow/all, ciflow/cpu, ciflow/default, ciflow/libtorch, ciflow/linux, ciflow/mobile, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nlinux-xenial-py3.7-gcc7-no-ops\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nmacos-arm64-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-arm64-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-cxx11-abi\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-libtorch-pre-cxx11\nciflow/binaries, ciflow/binaries_libtorch, ciflow/default\n\u2705 triggered\n\n\nmacos-binary-wheel\nciflow/binaries, ciflow/binaries_wheel, ciflow/default\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit\nciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win\n\u2705 triggered\n\n\nwindows-binary-conda\nciflow/binaries, ciflow/binaries_conda, ciflow/default\n\u2705 triggered\n\n\nwindows-binary-libtorch-debug\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-libtorch-release\nciflow/all, ciflow/binaries, ciflow/binaries_libtorch, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nwindows-binary-wheel\nciflow/all, ciflow/binaries, ciflow/binaries_wheel, ciflow/default, ciflow/trunk\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\ncaffe2-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\ndocker-builds\nciflow/all, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab 
skipped\n\n\nios-12-5-1-arm64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-custom-ops\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-arm64-metal\nciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nios-12-5-1-x86-64-coreml\nciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda10.2-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk\n\ud83d\udeab skipped\n\n\nlinux-docs-push\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda11.3-py3.7-gcc7-no-ops\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-arm64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-10-15-py3-lite-interpreter-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nmacos-11-py3-x86-64\nciflow/all, ciflow/macos, ciflow/trunk\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.7-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-bionic-cuda11.5-py3.7-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.3-py3.7-gcc7-debug\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.5-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npytorch-linux-xenial-py3-clang5-android-ndk-r19c-build\nciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk\n\ud83d\udeab skipped\n\n\npytorch-xla-linux-bionic-py3.7-clang8\nciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla\n\ud83d\udeab skipped", + "createdAt": "2022-03-09T15:57:11Z", + "author": { + "login": "pytorch-bot" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1063079053 + }, + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/73969\n\ud83d\udcc4 \u00a0Preview docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 4746da7 (more details on the Dr. CI page):\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-03-09T15:57:12Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1063079113 + }, + { + "bodyText": "This pull request was exported from Phabricator. Differential Revision: D34753911", + "createdAt": "2022-03-09T15:57:34Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1063079731 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOP11MjQ==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "fb-exported" + } + }, + { + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=73099 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "BowenBao" + }, + "title": "[ONNX] Make graph name spec-compliant (#71961)", + "body": "Stack from [ghstack](https://github.com/ezyang/ghstack):\n* #73104\n* #73103\n* #73102\n* #73101\n* #73100\n* __->__ #73099\n\n[According to the ONNX spec](https://github.com/onnx/onnx/blob/main/docs/IR.md#names-within-a-graph),\nall names must adhere to C90 identifier syntax rules, which means no\ndashes.\n\nFixes: #30952", + "headRefName": "gh/BowenBao/138/head", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "gh/BowenBao/138/base", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "BowenBao" + }, + "email": "bowbao@microsoft.com", + "name": "BowenBao" + }, + "oid": "3038b939eb2069653305c419326a0f47d2598e39" + } + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + }, + "totalCount": 1 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ { "node": { "app": { @@ -18128,18 +32836,18 @@ { "name": "run-torchbench", "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543705420?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041786/jobs/2626264278" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjz0w=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNn9o=", "hasNextPage": false } }, "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696656" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189561" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9A=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7k=" }, { "node": { @@ -18149,20 +32857,41 @@ }, "workflowRun": { "workflow": { - "name": "Lint" + "name": "linux-xenial-cuda11.3-py3.7-gcc7" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626264385" + }, + { + "name": "test (default, 1, 2, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417658" + }, + { + "name": "test (default, 2, 2, linux.4xlarge.nvidia.gpu)", + 
"conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417743" + }, + { + "name": "test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041785/jobs/2626417885" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkRE_E=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696660" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189562" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRc9Q=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7o=" }, { "node": { @@ -18172,20 +32901,26 @@ }, "workflowRun": { "workflow": { - "name": "pull" + "name": "linux-xenial-py3.7-gcc7-no-ops" } }, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041789/jobs/2626264416" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJE=", "hasNextPage": false } }, - "conclusion": "CANCELLED", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696715" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189563" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdAs=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7s=" }, { "node": { @@ -18195,447 +32930,659 @@ }, "workflowRun": { "workflow": { - "name": "pull" + "name": "linux-xenial-py3-clang5-mobile-build" } - }, - "checkRuns": { - "nodes": [ - { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706290?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706587?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543706915?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543707231?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3_7-clang8-xla / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543707459?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7-no-ops / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543707794?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543708127?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543708379?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7543708606?check_suite_focus=true" - }, - { - "name": "win-vs2019-cuda11.6-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543709052?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543709309?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543709535?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543709809?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang10-onnx / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543709986?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-mobile-build / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543710238?check_suite_focus=true" - }, - { - "name": "linux-focal-rocm5.2-py3.7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543710467?check_suite_focus=true" - }, - { - "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543710675?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543710925?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543711166?check_suite_focus=true" - }, - { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7543711347?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544378552?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544378697?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544378800?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544378922?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544379063?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544379177?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7544379274?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544414957?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544415089?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (cpp)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418146?check_suite_focus=true" - }, - { - "name": "linux-docs / build-docs (python)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418325?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418649?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418760?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418892?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544418988?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544419111?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544419210?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544419367?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544420236?check_suite_focus=true" - }, - { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544427790?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544526201?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544526466?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544526651?check_suite_focus=true" - }, - { - "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544526810?check_suite_focus=true" - }, + }, + "checkRuns": { 
+ "nodes": [ { - "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544526939?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041787/jobs/2626264407" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoIY=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189564" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS7w=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "name": "build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544790873?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041788/jobs/2626264422" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoJs=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189566" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS74=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-bionic-py3.7-clang9" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544790983?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626264414" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "name": "test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791069?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349405" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "name": "test (noarch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791145?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349522" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "name": "test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791233?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041790/jobs/2626349618" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcG0YME=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiwA=", + "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696836" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189567" }, - 
"cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdIQ=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS78=" }, { "node": { "app": { - "name": "Facebook GitHub Tools", - "databaseId": 12274 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-vulkan-bionic-py3.7-clang9" + } }, - "workflowRun": null, "checkRuns": { "nodes": [ { - "name": "Facebook CLA Check", + "name": "build", "conclusion": "SUCCESS", - "detailsUrl": "https://code.intern.facebook.com/cla/" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041793/jobs/2626264431" + }, + { + "name": "test (default, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041793/jobs/2626359364" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcGjyQg=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPxgQ=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546696896" + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189568" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdMA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8A=" }, { "node": { "app": { - "name": "Netlify", - "databaseId": 13473 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3-clang5-mobile-custom-build-static" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041792/jobs/2626264427" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkNoKA=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697185" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189570" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdeE=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8I=" }, { "node": { "app": { - "name": "Azure Pipelines", - "databaseId": 9426 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "win-vs2019-cpu-py3" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626264386" + }, + { + "name": "test (default, 1, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626722677" + }, + { + "name": "test (default, 2, 2, windows.4xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041791/jobs/2626722710" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkX070=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697205" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189571" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdfU=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8M=" }, { "node": { "app": { - "name": "Dependabot", - 
"databaseId": 29110 + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-xenial-py3.7-gcc7" + } }, - "workflowRun": null, "checkRuns": { - "nodes": [], + "nodes": [ + { + "name": "build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626264401" + }, + { + "name": "test (distributed, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349045" + }, + { + "name": "test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349141" + }, + { + "name": "test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/1866041803/jobs/2626349272" + } + ], "pageInfo": { - "endCursor": null, + "endCursor": "Y3Vyc29yOnYyOpHPAAAAATkPiQA=", "hasNextPage": false } }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697224" + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/3038b939eb2069653305c419326a0f47d2598e39/checks?check_suite_id=5365189572" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdgg=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAT_KS8Q=" } ], "pageInfo": { "hasNextPage": true } }, - "pushedDate": "2022-07-27T15:34:17Z", - "oid": "28140e4008289251b695385acfb48ac7a47cd49c" + "status": { + "contexts": [ + { + "context": "ci/circleci: binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010288?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010289?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010488?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + }, + { + "context": "ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build", + "state": "SUCCESS", + "targetUrl": "https://circleci.com/gh/pytorch/pytorch/17010326?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link" + } + ] + }, + "pushedDate": "2022-02-18T18:46:28Z", + "oid": "3038b939eb2069653305c419326a0f47d2598e39" } } ] }, - "changedFiles": 1, + "changedFiles": 162, "files": { "nodes": [ { - "path": "test/test_ops.py" + "path": "test/onnx/expect/TestOperators.test_acos.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_left_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_right_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_add_size1_singleton_broadcast.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addconstant.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_addmm.expect" + }, + { + "path": 
"test/onnx/expect/TestOperators.test_arange_dynamic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_argmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_asin.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_at_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_atan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_aten_embedding_2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_avg_pool2d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_baddbmm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_basic.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_1d.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_noaffine.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_batchnorm_training.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_bitshift.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_c2_op.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_chunk.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_clip_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_concat2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_conv_onnx_irv4_opset8.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_convtranspose.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_cos.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_cumsum.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_det.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dict.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dict_str.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dim.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_default.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_opset12.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_training.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dropout_training_opset12.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_add_inputs_same_symbolic_shape.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_matmul.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_reduce_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_dynamic_axes_unchange.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_elu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_embedding_bags.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_empty_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_empty_like_opset7.expect" + }, + { + "path": 
"test/onnx/expect/TestOperators.test_equal.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_erf.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_exp.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_flatten2D.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_fmod.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_frobenius_norm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_full_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gather_opset11.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ge.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gelu.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_gt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_hardtanh.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_implicit_expand.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_index.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_isnan.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_layer_norm_aten.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_le.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_linear.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_log_sigmoid.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_logsoftmax.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lstm_none_sequence_lens.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_lt.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_master_opset.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_max.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_dilations.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_maxpool_indices.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mean_dtype.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_meshgrid.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_min.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_mm.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_narrow.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ne.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_nonzero.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p1.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_norm_p2.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_ones_like.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_pad.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_params_onnx_irv4.expect" + }, + { + "path": "test/onnx/expect/TestOperators.test_permute2.expect" } ], "pageInfo": { - "endCursor": "MQ", - "hasNextPage": false + "endCursor": "MTAw", + "hasNextPage": true } }, "reviews": { "nodes": [ { "author": { - "login": "zou3519" - }, - "state": "APPROVED" - }, - { - "author": { - "login": 
"Chillee" + "login": "garymm" }, "state": "APPROVED" } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0yNVQxNDo0NTozNS0wNzowMLkyMDIyLTA3LTI1VDE0OjQ1OjM1LTA3OjAwzj6XYmg=", + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMi0xOFQxNzoxODo0NC0wODowMLkyMDIyLTAyLTE4VDE3OjE4OjQ0LTA4OjAwzjTr0H0=", "hasPreviousPage": false } }, "comments": { "nodes": [ { - "bodyText": "@pytorchbot merge -f FORCE", + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet \n \n \n pytorch/.github/scripts/trymerge.py\n \n \n Line 63\n in\n 932adf2\n \n \n \n \n\n \n \n files(last: 100) { \n \n \n \n\n Can this be relaxed? If not please import.", + "createdAt": "2022-02-22T18:22:40Z", "author": { - "login": "malfet" + "login": "BowenBao" }, - "authorAssociation": "MEMBER", + "authorAssociation": "COLLABORATOR", "editor": null, - "databaseId": 1197107402 + "databaseId": 1048084569 }, { - "bodyText": "You need to provide a reason for using force merge, in the format @pytorchbot merge -f '[CATEGORY] Explanation'. With [CATEGORY] being one the following:\nEMERGENCY - an emergency fix to quickly address an issue\nMINOR - a minor fix such as cleaning locally unused variables, which shouldn't break anything\nPRE_TESTED - a previous CI run tested everything and you've only added minor changes like fixing lint\nOTHER - something not covered above", + "bodyText": "This PR cannot be merged by bot due to changing > 100 files. @malfet\nCan this be relaxed? If not please import.\n\nWow, you've hit a really interesting problem. 100 is a limitation enforced by GitHub, see https://docs.github.com/en/graphql/overview/resource-limitations, but I can implement a pagination. Do you mind keeping it like that for a bit, want to land a fix soonish.", + "createdAt": "2022-02-22T18:27:29Z", "author": { - "login": "pytorch-bot" + "login": "malfet" }, - "authorAssociation": "NONE", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1197107439 + "databaseId": 1048088691 }, { - "bodyText": "@pytorchbot merge -f \"[OTHER] normal land failed twice already\"", + "bodyText": "@malfet Thank you for info. Sure, I have separated the rest of stack from this one, we'll wait for the fix to try again.", + "createdAt": "2022-02-22T18:29:48Z", "author": { - "login": "malfet" + "login": "BowenBao" }, - "authorAssociation": "MEMBER", + "authorAssociation": "COLLABORATOR", "editor": null, - "databaseId": 1197108130 + "databaseId": 1048090640 }, { - "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "bodyText": "@pytorchbot merge this", + "createdAt": "2022-02-24T21:42:36Z", "author": { - "login": "pytorchmergebot" + "login": "BowenBao" }, - "authorAssociation": "MEMBER", + "authorAssociation": "COLLABORATOR", "editor": null, - "databaseId": 1197119348 + "databaseId": 1050293881 }, { - "bodyText": "Hey @ezyang.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' 
and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "bodyText": "Hey @BowenBao.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-02-24T21:44:39Z", "author": { "login": "github-actions" }, "authorAssociation": "NONE", "editor": null, - "databaseId": 1197120095 + "databaseId": 1050295451 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHOR1poyg==", + "startCursor": "Y3Vyc29yOnYyOpHOPniAWQ==", "hasPreviousPage": true } }, @@ -18643,283 +33590,91 @@ "edges": [ { "node": { - "name": "Merged" + "name": "oncall: jit" } }, { "node": { - "name": "cla signed" - } - } - ] - } - } - } - } - }, - "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAcG0YME= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAcHRdAs= name=pytorch number=82169 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ - { - "commit": { - "oid": "28140e4008289251b695385acfb48ac7a47cd49c", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791308?check_suite_focus=true" - }, - { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (functorch, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791418?check_suite_focus=true" - }, - { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544791778?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544877177?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544877276?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (functorch, 1, 1, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7544877367?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAcG1sTc=", - "hasNextPage": false - } - } - } - ] - } - } - } - ] - } - } - } - } - }, - "query_sha=4fa42dda073cf7ac75b2bbf595a8ef67b6dfff4bd248668750ff33ea913bf75f cursor=Y3Vyc29yOnYyOpHPAAAAAcHRdgg= name=pytorch number=82169 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ - { - "commit": { - "oid": "28140e4008289251b695385acfb48ac7a47cd49c", - "checkSuites": { - "edges": [ - { - "node": { - "app": { - "name": "Codecov", - "databaseId": 254 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - 
"endCursor": null, - "hasNextPage": false - } - }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697240" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdhg=" - }, - { - "node": { - "app": { - "name": "PyTorch Bot", - "databaseId": 40112 - }, - "workflowRun": null, - "checkRuns": { - "nodes": [], - "pageInfo": { - "endCursor": null, - "hasNextPage": false - } - }, - "conclusion": null, - "url": "https://github.com/pytorch/pytorch/commit/28140e4008289251b695385acfb48ac7a47cd49c/checks?check_suite_id=7546697255" - }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAcHRdic=" - } - ], - "pageInfo": { - "hasNextPage": false - } - } - } - } - ] - } - } - } - } - }, - "query_sha=0e2a29eda6405cea4c9de20fb80ae7924910e17272a7b251040182e7d8c390e0 name=pytorch number=79694 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "closed": false, - "isCrossRepository": true, - "author": { - "login": "kshitij12345" - }, - "title": "[complex] conv_transpose1d", - "body": "Reference: https://github.com/pytorch/pytorch/issues/71108", - "headRefName": "develop/complex/conv_transpose1d", - "headRepository": { - "nameWithOwner": "kshitij12345/pytorch" - }, - "baseRefName": "master", - "baseRepository": { - "nameWithOwner": "pytorch/pytorch", - "isPrivate": false, - "defaultBranchRef": { - "name": "master" - } - }, - "mergeCommit": null, - "commits_with_authors": { - "nodes": [ - { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "d1ea948e65ac6d31ad056287ab65d38ecc68b30d" - } - }, - { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "b4ba1db9a3a71bd8c03158dcd1b68711360633d8" - } - }, - { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "655a4220beae163bfe578f0318a130df01ec05d6" - } - }, - { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "Kshiteej K" - }, - "oid": "8181716be7a8005eb13ad5c3f2e1279ed1c60aff" + "name": "open source" } }, { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "9e5ca3663e7471786eeebebfdf84aea5d761712f" + "node": { + "name": "cla signed" } }, { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "9c110f39bcdc4e56386b6f9c4e2c082c8940ade6" + "node": { + "name": "release notes: onnx" } }, { - "commit": { - "author": { - "user": { - "login": "kshitij12345" - }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" - }, - "oid": "49315e79d0eee8008e2a74575c6fc0f6a9531ee4" + "node": { + "name": "topic: bug fixes" } - }, + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=74649 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": false, + "author": { + "login": "malfet" + }, + "title": "This should fail flake8", + "body": "Test issue for GHF mandatory checks", + "headRefName": "malfet-patch-8", + "headRepository": { + "nameWithOwner": "pytorch/pytorch" + }, + "baseRefName": "master", + 
"baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ { "commit": { "author": { "user": { - "login": "kshitij12345" + "login": "malfet" }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" + "email": "nshulga@fb.com", + "name": "Nikita Shulga" }, - "oid": "728752480760226270c374a0acc08e28b9b133f3" + "oid": "57c86ff1c5ab948888fd329986c9d55796680e33" } }, { "commit": { "author": { "user": { - "login": "kshitij12345" + "login": "malfet" }, - "email": "kshitijkalambarkar@gmail.com", - "name": "kshitij12345" + "email": "nshulga@fb.com", + "name": "Nikita Shulga" }, - "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f" + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" } } ], "pageInfo": { - "endCursor": "OQ", + "endCursor": "Mg", "hasNextPage": false }, - "totalCount": 9 + "totalCount": 2 }, "commits": { "nodes": [ @@ -18943,14 +33698,109 @@ } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAboNCRo=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsK3w=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7428002306" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018129" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1E=" + }, + { + "node": { + "app": { + "name": "Netlify", + "databaseId": 13473 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018131" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1M=" + }, + { + "node": { + "app": { + "name": "Azure Pipelines", + "databaseId": 9426 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018132" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Q=" + }, + { + "node": { + "app": { + "name": "Dependabot", + "databaseId": 29110 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018134" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1Y=" + }, + { + "node": { + "app": { + "name": "Codecov", + "databaseId": 254 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018139" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj1s=" + }, + { + "node": { + "app": { + "name": "PyTorch Bot", + "databaseId": 40112 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [], + "pageInfo": { + "endCursor": null, + "hasNextPage": false + } + }, + "conclusion": null, + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018142" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbq-UgI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlj14=" }, { "node": { @@ 
-18966,55 +33816,75 @@ "checkRuns": { "nodes": [ { - "name": "lintrunner", + "name": "clang-format", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925132" + }, + { + "name": "clang-tidy", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925189" + }, + { + "name": "cmakelint", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925230" + }, + { + "name": "flake8-py3", + "conclusion": "FAILURE", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925307" + }, + { + "name": "mypy", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574264?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925365" }, { "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574600?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925427" }, { "name": "Test collect_env (without_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574693?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925449" }, { - "name": "Test collect_env (older_python_version)", + "name": "Test tools", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574832?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925537" }, { - "name": "Test tools", + "name": "py2-setup-validate-errormsg", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575043?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925644" }, { - "name": "toc", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575297?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925688" }, { - "name": "workflow-checks", + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575617?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925809" }, { - "name": "quick-checks", + "name": "shellcheck", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575807?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576283/jobs/2928925945" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbqojb8=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsMiY=", "hasNextPage": false } }, - "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7437320797" + "conclusion": "FAILURE", + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018384" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbtMgl0=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFA=" }, { "node": { @@ -19032,18 +33902,18 @@ { "name": "run-torchbench", "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574246?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576288/jobs/2928925134" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbqoh6Y=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHsLW0=", "hasNextPage": false } }, "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7437320800" + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018395" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbtMgmA=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkFs=" }, { "node": { @@ -19059,265 +33929,561 @@ "checkRuns": { "nodes": [ { - "name": "linux-focal-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426574798?check_suite_focus=true" + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935743" }, { - "name": "linux-focal-py3.7-gcc7-no-ops / build", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575118?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935775" }, { "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575476?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935850" }, { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "name": "linux-bionic-rocm4.5-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575622?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928935994" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426575875?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936064" }, { - "name": "linux-focal-py3.7-clang7-asan / build", + "name": "linux-xenial-py3.7-gcc5.4 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426576118?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936179" }, { - "name": "linux-focal-py3.7-clang10-onnx / build", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426576360?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936265" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "name": "linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426576522?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936309" }, { - "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426576694?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936353" }, 
{ - "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426576858?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936395" }, { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426577069?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936426" }, { - "name": "linux-focal-rocm5.1-py3.7 / build", + "name": "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426577340?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936483" }, { - "name": "win-vs2019-cuda11.6-py3 / build", + "name": "win-vs2019-cuda11.3-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426577507?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936516" }, { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", + "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426577677?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936558" }, { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "name": "linux-xenial-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426577906?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936633" }, { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "name": "linux-xenial-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426578065?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936705" }, { - "name": "linux-bionic-py3_7-clang8-xla / build", + "name": "deploy-linux-xenial-cuda11.3-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426578285?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936736" }, { - "name": "linux-xenial-py3-clang5-mobile-build / build", + "name": "linux-xenial-py3.7-clang7-onnx / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936756" + }, + { + "name": "pytorch-xla-linux-bionic-py3.7-clang8", + "conclusion": "NEUTRAL", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936796" + }, + { + "name": "linux-xenial-py3.7-clang7-asan / build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928936823" + }, + { + "name": "linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7426578423?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928990551" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "name": "linux-xenial-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426578533?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928990588" }, { - "name": "win-vs2019-cpu-py3 / build", + "name": "linux-docs / build-docs (cpp)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426578766?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992832" }, { - "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", + "name": "linux-docs / build-docs (python)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426768328?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992868" }, { - "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426768494?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992932" }, { - "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426768635?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928992965" }, { - "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426768797?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993011" }, { - "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-gcc5.4 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426768904?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993042" }, { - "name": "linux-docs / build-docs (cpp)", + "name": "linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426769059?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993086" }, { - "name": "linux-docs / build-docs (python)", + "name": "linux-xenial-py3.7-gcc5.4 / test (jit_legacy, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426769221?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928993128" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426794528?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995802" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426794681?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426794811?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995853" }, { - "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", + "name": "linux-bionic-py3.7-clang9 / test (noarch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426794965?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928995889" }, { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426795132?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928997626" }, { - "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426795278?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928999058" }, { - "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426795396?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2928999075" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 1, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426815145?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012407" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 2, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426815265?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012438" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "name": "linux-xenial-py3.7-clang7-asan / test (default, 3, 3, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426818878?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929012469" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 1, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426857383?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929034328" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "name": "linux-bionic-rocm4.5-py3.7 / test (default, 2, 2, linux.rocm.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426857577?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929034340" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426857720?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929040801" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426857893?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929045939" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426858145?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929046016" }, { - "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "name": "linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426883486?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929046063" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426949849?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929082254" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426950005?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929082275" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "name": "win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426950152?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157614" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "name": "win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426950337?check_suite_focus=true" + "detailsUrl": 
"https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157635" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "name": "win-vs2019-cuda11.3-py3 / test (force_on_cpu, 1, 1, windows.4xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426950460?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2031576300/jobs/2929157656" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAVHxIT4=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4/checks?check_suite_id=5778018405" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAVhlkGU=" + } + ], + "pageInfo": { + "hasNextPage": false + } + }, + "status": null, + "pushedDate": "2022-03-24T00:42:33Z", + "oid": "6c3c3de6a5c1183d9a08f3c54148bc0b5de11bb4" + } + } + ] + }, + "changedFiles": 1, + "files": { + "nodes": [ + { + "path": "torch/nn/cpp.py" + } + ], + "pageInfo": { + "endCursor": "MQ", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "seemethere" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wMy0yM1QxNTo1MDo0NS0wNzowMLkyMDIyLTAzLTIzVDE1OjUwOjQ1LTA3OjAwzjbPEDg=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/74649\n\u21a9\ufe0f \u00a0[fb-only] Re-run with SSH instructions\nNeed help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 6c3c3de (more details on the Dr. CI page):\n\n\n1/1 failures introduced in this PR\n\n\n1 failure not recognized by patterns:\n\n\n\nJob\nStep\nAction\n\n\n\n\n Lint / flake8-py3\nFail if there were any warnings\n\ud83d\udd01 rerun\n\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-03-23T22:40:51Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1076891218 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQDAOUg==", + "hasPreviousPage": false + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "cla signed" + } + } + ] + } + } + } + } + }, + "query_sha=81fd873151c3cded18314e9e53bf54a93ffb0afa9c52fa2cbafb2ceab7df5e45 name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "closed": true, + "isCrossRepository": true, + "author": { + "login": "kshitij12345" + }, + "title": "[complex] conv_transpose1d", + "body": "Reference: https://github.com/pytorch/pytorch/issues/71108", + "headRefName": "develop/complex/conv_transpose1d", + "headRepository": { + "nameWithOwner": "kshitij12345/pytorch" + }, + "baseRefName": "master", + "baseRepository": { + "nameWithOwner": "pytorch/pytorch", + "isPrivate": false, + "defaultBranchRef": { + "name": "master" + } + }, + "mergeCommit": null, + "commits_with_authors": { + "nodes": [ + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "d1ea948e65ac6d31ad056287ab65d38ecc68b30d" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "b4ba1db9a3a71bd8c03158dcd1b68711360633d8" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "655a4220beae163bfe578f0318a130df01ec05d6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "Kshiteej K" + }, + "oid": "8181716be7a8005eb13ad5c3f2e1279ed1c60aff" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "9e5ca3663e7471786eeebebfdf84aea5d761712f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "9c110f39bcdc4e56386b6f9c4e2c082c8940ade6" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "49315e79d0eee8008e2a74575c6fc0f6a9531ee4" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "728752480760226270c374a0acc08e28b9b133f3" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "9672a2198472567bae4ac6f55d004f7e1fa8a9fa" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "48a0ebf32b895286f036b36c871f671dc867e400" + } + }, + { + "commit": { + "author": { + "user": { + "login": 
"kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "52fbe80d5c8a94e03d816c0bd21fd82019dcd5ac" + } + }, + { + "commit": { + "author": { + "user": { + "login": "kshitij12345" + }, + "email": "kshitijkalambarkar@gmail.com", + "name": "kshitij12345" + }, + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce" + } + } + ], + "pageInfo": { + "endCursor": "MTM", + "hasNextPage": false + }, + "totalCount": 13 + }, + "commits": { + "nodes": [ + { + "commit": { + "checkSuites": { + "edges": [ + { + "node": { + "app": { + "name": "Facebook GitHub Tools", + "databaseId": 12274 + }, + "workflowRun": null, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "name": "Facebook CLA Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426950568?check_suite_focus=true" + "detailsUrl": "https://code.facebook.com/cla/" }, { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "name": "Meta Internal-Only Changes Check", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7426961175?check_suite_focus=true" + "detailsUrl": "https://opensource.facebook.com/" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbqubxc=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdtq8Hc=", + "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7437320828" + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899098" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbtMgnw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqFo=" }, { "node": { @@ -19335,18 +34501,18 @@ { "name": "run-torchbench", "conclusion": "NEUTRAL", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453692770?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393316/jobs/4628529923" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbxGU2I=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqTEwk=", "hasNextPage": false } }, "conclusion": "SKIPPED", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7463496300" + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899387" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbzb6mw=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqXs=" }, { "node": { @@ -19364,53 +34530,58 @@ { "name": "lintrunner", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453692736?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628529910" }, { - "name": "toc", + "name": "quick-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453693139?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530162" }, { - "name": "workflow-checks", + "name": "Test collect_env (with_torch)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453693588?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530698" }, { - "name": "quick-checks", + "name": "Test collect_env (without_torch)", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7453693942?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530867" }, { - "name": "Test tools", + "name": "Test collect_env (older_python_version)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694270?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628530989" }, { - "name": "Test collect_env (with_torch)", + "name": "pr-sanity-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694519?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531151" }, { - "name": "Test collect_env (without_torch)", + "name": "workflow-checks", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694654?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531475" }, { - "name": "Test collect_env (older_python_version)", + "name": "Test tools", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531753" + }, + { + "name": "toc", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694759?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393315/jobs/4628531853" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbxGWyc=", + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqTHFY=", "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7463496306" + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899388" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbzb6nI=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqXw=" }, { "node": { @@ -19426,273 +34597,478 @@ "checkRuns": { "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / build", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453693883?check_suite_focus=true" - }, - { - "name": "linux-bionic-py3.7-clang9 / build", + "name": "linux-focal-py3.7-clang7-asan / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694269?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531149" }, { - "name": "linux-focal-py3.7-gcc7-no-ops / build", + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694482?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531473" }, { - "name": "linux-bionic-py3_7-clang8-xla / build", + "name": "linux-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453694773?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531754" }, { "name": "linux-jammy-cuda11.6-cudnn8-py3.8-clang12 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453695048?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628531857" }, { - "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", + "name": "linux-focal-py3.7-gcc7-pch 
/ build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453695376?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532179" }, { - "name": "linux-focal-py3.7-gcc7 / build", + "name": "linux-focal-py3.7-clang10-onnx / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453695572?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532543" }, { - "name": "linux-focal-py3.7-clang10-onnx / build", + "name": "linux-bionic-cuda11.3-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453695789?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532694" }, { - "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", + "name": "linux-focal-py3.7-gcc7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453696094?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628532918" }, { - "name": "win-vs2019-cpu-py3 / build", + "name": "linux-vulkan-bionic-py3.7-clang9 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453696262?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533033" }, { - "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", + "name": "linux-focal-py3.7-gcc7-no-ops / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453696440?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533181" }, { - "name": "linux-focal-py3.7-clang7-asan / build", + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453696619?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533420" }, { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", + "name": "linux-xenial-cuda11.3-py3.7-gcc7-bazel-test / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453696913?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533630" }, { - "name": "linux-focal-rocm5.1-py3.7 / build", + "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453697192?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533825" }, { - "name": "win-vs2019-cuda11.6-py3 / build", + "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453697504?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628533959" }, { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / build", + "name": "linux-xenial-py3-clang5-mobile-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453697701?check_suite_focus=true" + 
"detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534129" }, { - "name": "linux-xenial-py3-clang5-mobile-custom-build-static / build", + "name": "linux-bionic-py3_7-clang8-xla / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453697927?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534256" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / build", + "name": "linux-focal-rocm5.2-py3.7 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453698388?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534388" }, { - "name": "linux-xenial-py3-clang5-mobile-build / build", + "name": "linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453698629?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534571" }, { - "name": "linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single / build-and-test", + "name": "linux-bionic-cuda11_6-py3_10-gcc7-deploy / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453698800?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534714" }, { - "name": "linux-docs / build-docs (cpp)", + "name": "win-vs2019-cuda11.6-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453870481?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628534989" }, { - "name": "linux-docs / build-docs (python)", + "name": "win-vs2019-cpu-py3 / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453870600?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628535311" }, { "name": "linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453870806?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639115" }, { "name": "linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453870899?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639198" }, { "name": "linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453871006?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639265" + }, + { + "name": "linux-focal-py3.7-gcc7 / test (functorch, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639339" }, { "name": "linux-focal-py3.7-gcc7 / test (docs_test, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453871108?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639395" }, { "name": "linux-focal-py3.7-gcc7 / test (jit_legacy, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": 
"https://github.com/pytorch/pytorch/runs/7453871214?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639450" }, { "name": "linux-focal-py3.7-gcc7 / test (backwards_compat, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453871379?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639509" + }, + { + "name": "linux-docs / build-docs (cpp)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639572" + }, + { + "name": "linux-docs / build-docs (python)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628639635" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647047" + }, + { + "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647119" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453877423?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647215" }, { "name": "linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453877577?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647277" }, { "name": "linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453877679?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647348" }, { "name": "linux-bionic-py3.7-clang9 / test (crossref, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453877783?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647432" }, { "name": "linux-bionic-py3.7-clang9 / test (dynamo, 1, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453877932?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647522" }, { "name": "linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453878058?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647641" }, { "name": "linux-bionic-py3.7-clang9 / test (functorch, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453878178?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628647762" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 1, 2, linux.2xlarge)", + "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453882847?check_suite_focus=true" 
+ "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628653797" }, { - "name": "linux-focal-py3.7-clang10-onnx / test (default, 2, 2, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453882949?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679376" }, { - "name": "linux-vulkan-bionic-py3.7-clang9 / test (default, 1, 1, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453888149?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679431" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 1, 5, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453922173?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679469" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 2, 5, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453922275?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679519" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge)", + "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453922371?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628679594" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 4, 5, linux.2xlarge)", + "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628681226" + }, + { + "name": "linux-bionic-cuda11_6-py3_10-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628854932" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856434" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856501" + }, + { + "name": "linux-bionic-cuda11.6-py3.10-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2907393329/jobs/4628856575" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdqZ2fA=", + "hasNextPage": true + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7929899419" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdioqZs=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 
15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-libtorch-debug" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "libtorch-cpu-shared-with-deps-debug-build", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351637/jobs/4634503587" + }, + { + "name": "libtorch-cpu-shared-with-deps-debug-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351637/jobs/4635312938" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsbsmM=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953056" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuA=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-wheel" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "wheel-py3_7-cuda11_3-build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453922449?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351640/jobs/4634503571" }, { - "name": "linux-focal-py3.7-clang7-asan / test (default, 5, 5, linux.2xlarge)", + "name": "wheel-py3_7-cuda11_3-test", + "conclusion": "SUCCESS", + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351640/jobs/4636146265" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsskcw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953059" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuM=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "windows-binary-libtorch-release" + } + }, + "checkRuns": { + "nodes": [ + { + "name": "libtorch-cpu-shared-with-deps-release-build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453922527?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351643/jobs/4634503570" }, { - "name": "linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)", + "name": "libtorch-cpu-shared-with-deps-release-test", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7453931393?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351643/jobs/4635003925" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsVbD8=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953061" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUSuU=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-libtorch-cxx11-abi" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)", + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454011679?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351698/jobs/4634504079" }, { - "name": 
"linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu)", + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-test / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454011783?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351698/jobs/4635072931" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsW5Aw=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953185" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2E=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-libtorch-pre-cxx11" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu)", + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454011866?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351700/jobs/4634503897" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu)", + "name": "libtorch-cpu-shared-with-deps-cxx11-abi-test / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454011976?check_suite_focus=true" - }, + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351700/jobs/4635077148" + } + ], + "pageInfo": { + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsW-jo=", + "hasNextPage": false + } + }, + "conclusion": "SUCCESS", + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953186" + }, + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2I=" + }, + { + "node": { + "app": { + "name": "GitHub Actions", + "databaseId": 15368 + }, + "workflowRun": { + "workflow": { + "name": "linux-binary-manywheel" + } + }, + "checkRuns": { + "nodes": [ { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)", + "name": "manywheel-py3_7-cuda10_2-build / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454012075?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351699/jobs/4634503896" }, { - "name": "linux-bionic-cuda11.6-py3.7-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)", + "name": "manywheel-py3_7-cuda10_2-test / build", "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454012177?check_suite_focus=true" + "detailsUrl": "https://github.com/pytorch/pytorch/actions/runs/2910351699/jobs/4635934290" } ], "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbxLMxE=", - "hasNextPage": true + "endCursor": "Y3Vyc29yOnYyOpHPAAAAAdsoMEA=", + "hasNextPage": false } }, "conclusion": "SUCCESS", - "url": "https://github.com/pytorch/pytorch/commit/ffe43399d6f60ef7844523a5f465c11d9a67062f/checks?check_suite_id=7463496361" + "url": "https://github.com/pytorch/pytorch/commit/2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce/checks?check_suite_id=7936953187" }, - "cursor": "Y3Vyc29yOnYyOpHPAAAAAbzb6qk=" + "cursor": "Y3Vyc29yOnYyOpHPAAAAAdkUS2M=" } ], "pageInfo": { - "hasNextPage": false + "hasNextPage": true } }, - "pushedDate": "2022-07-19T19:21:58Z", - "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f" 
+ "status": null, + "pushedDate": "2022-08-22T22:04:19Z", + "oid": "2fd08f1c669bbb0f2e14ae40e76f9e0d3195f4ce" } } ] @@ -19701,263 +35077,789 @@ "files": { "nodes": [ { - "path": "aten/src/ATen/native/Convolution.cpp" + "path": "aten/src/ATen/native/Convolution.cpp" + }, + { + "path": "torch/testing/_internal/common_methods_invocations.py" + }, + { + "path": "torch/testing/_internal/common_modules.py" + } + ], + "pageInfo": { + "endCursor": "Mw", + "hasNextPage": false + } + }, + "reviews": { + "nodes": [ + { + "author": { + "login": "ngimel" + }, + "state": "APPROVED" + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0xOVQxMDowNzo1NC0wNzowMLkyMDIyLTA3LTE5VDEwOjA3OjU0LTA3OjAwzj43QcY=", + "hasPreviousPage": false + } + }, + "comments": { + "nodes": [ + { + "bodyText": "@pytorchbot merge -g\nAll is green internally!", + "createdAt": "2022-08-23T19:29:55Z", + "author": { + "login": "albanD" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224702749 + }, + { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here.\nThe merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.\nPlease reach out to the PyTorch DevX Team with feedback or questions!", + "createdAt": "2022-08-23T19:31:18Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224705564 + }, + { + "bodyText": "Thanks for looking into it \ud83d\ude42 @albanD @jeanschmidt", + "createdAt": "2022-08-23T19:34:36Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1224712351 + }, + { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-08-23T22:31:58Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1224956051 + }, + { + "bodyText": "Yeah, discussed with my manager and I got the required permissions to do so. Sorry for not responding promptly yesterday. 
But I am available from now on to provide assistance :)", + "createdAt": "2022-08-24T09:24:04Z", + "author": { + "login": "jeanschmidt" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1225462612 + } + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOSP97HQ==", + "hasPreviousPage": true + } + }, + "labels": { + "edges": [ + { + "node": { + "name": "open source" + } + }, + { + "node": { + "name": "Merged" + } + }, + { + "node": { + "name": "cla signed" + } + }, + { + "node": { + "name": "Reverted" + } + }, + { + "node": { + "name": "ciflow/trunk" + } + }, + { + "node": { + "name": "ciflow/periodic" + } + } + ] + } + } + } + } + }, + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOSP97HQ== name=pytorch number=79694 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { + "nodes": [ + { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/79694\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 2fd08f1 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-06-16T09:43:16Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1157454523 + }, + { + "bodyText": "Unable to reproduce jit failure locally (will skip the test)\nCI Failure : https://github.com/pytorch/pytorch/runs/6926187074?check_suite_focus=true#step:9:20230\npytest test/test_ops_jit.py -k test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 -v\n=============================================================== test session starts ===============================================================\nplatform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/kshiteej/.conda/envs/pytorch-cuda-dev/bin/python\ncachedir: .pytest_cache\nhypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/kshiteej/Pytorch/pytorch_complex_convolution.py/.hypothesis/examples')\nrootdir: /home/kshiteej/Pytorch/pytorch_complex_convolution.py, configfile: pytest.ini\nplugins: hypothesis-6.23.2, repeat-0.9.1\ncollected 1976 items / 1975 deselected / 1 selected \n\ntest/test_ops_jit.py::TestJitCPU::test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 PASSED [100%]\n\n================================================================ warnings summary =================================================================\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. 
Use setuptools or check PEP 632 for potential alternatives\n from distutils.version import LooseVersion\n\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.\n warnings.warn(\n\n-- Docs: https://docs.pytest.org/en/stable/warnings.html\n================================================= 1 passed, 1975 deselected, 2 warnings in 4.90s =================================================", + "createdAt": "2022-07-18T09:05:35Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": { + "login": "kshitij12345" + }, + "databaseId": 1186949486 + }, + { + "bodyText": "@pytorchbot merge", + "createdAt": "2022-07-19T17:12:23Z", + "author": { + "login": "ngimel" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189347786 + }, + { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "createdAt": "2022-07-19T17:13:42Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189350009 + }, + { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-07-19T17:14:25Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1189350932 + }, + { + "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", + "createdAt": "2022-07-19T19:15:41Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1189459845 + }, + { + "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", + "createdAt": "2022-07-19T19:16:59Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189460926 + }, + { + "bodyText": "Will not revert as @kshitij12345 is not a MEMBER, but COLLABORATOR", + "createdAt": "2022-07-19T19:17:00Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189460942 }, { - "path": "torch/testing/_internal/common_methods_invocations.py" + "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", + "createdAt": "2022-07-19T20:40:04Z", + "author": { + "login": "anjali411" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189529734 }, { - "path": "torch/testing/_internal/common_modules.py" + "bodyText": "@pytorchbot successfully started a revert job. 
Check the current status here", + "createdAt": "2022-07-19T20:41:20Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189530756 + }, + { + "bodyText": "@kshitij12345 your PR has been successfully reverted.", + "createdAt": "2022-07-19T20:41:25Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1189530831 + }, + { + "bodyText": "@pytorchbot merge -g", + "createdAt": "2022-07-20T09:53:08Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1190070141 + }, + { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "createdAt": "2022-07-20T09:54:24Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1190071424 + }, + { + "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "createdAt": "2022-07-20T13:00:51Z", + "author": { + "login": "github-actions" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1190258272 + }, + { + "bodyText": "commit is breaking internal builds/tests https://pastebin.com/HX4RUusH (pytorch/functorch/test:test_eager_transforms)", + "createdAt": "2022-07-21T10:39:01Z", + "author": { + "login": "jeanschmidt" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191327616 + }, + { + "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", + "createdAt": "2022-07-21T10:39:27Z", + "author": { + "login": "jeanschmidt" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191328013 + }, + { + "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", + "createdAt": "2022-07-21T10:41:23Z", + "author": { + "login": "jeanschmidt" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191329792 + }, + { + "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", + "createdAt": "2022-07-21T10:42:16Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191330586 + }, + { + "bodyText": "@kshitij12345 your PR has been successfully reverted.", + "createdAt": "2022-07-21T10:42:23Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1191330690 + }, + { + "bodyText": "@jeanschmidt which test is it failing on? I tried running the test_eager_transforms in functorch but couldn't reproduce it.", + "createdAt": "2022-07-25T07:11:19Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1193667568 + }, + { + "bodyText": "@jbschlosser have added a ref as discussed offline. Can you please take a look? 
And if it looks good, can you import the PR to check if it is breaking anything internally.\nThanks", + "createdAt": "2022-08-03T18:30:17Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1204329491 + }, + { + "bodyText": "@jbschlosser @jeanschmidt @albanD anything we can do to unblock this on our side?", + "createdAt": "2022-08-20T09:27:17Z", + "author": { + "login": "lezcano" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1221266218 + }, + { + "bodyText": "Functorch tests should be running here now so can you rebase on top of master please?", + "createdAt": "2022-08-22T21:42:37Z", + "author": { + "login": "albanD" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1223129944 + }, + { + "bodyText": "@albanD have rebased on latest master.", + "createdAt": "2022-08-23T08:49:10Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1223758571 + }, + { + "bodyText": "I triggered all the tests not to have any issues with slow tests again", + "createdAt": "2022-08-23T09:20:18Z", + "author": { + "login": "lezcano" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1223796413 + }, + { + "bodyText": "Thanks @lezcano! However, last time it was reverted for internal failures. So it would be great if someone can import and verify that.\ncc: @albanD @jeanschmidt", + "createdAt": "2022-08-23T10:17:50Z", + "author": { + "login": "kshitij12345" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1223863075 + }, + { + "bodyText": "@albanD has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-08-23T14:43:02Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224175731 + }, + { + "bodyText": "I am not the right person to provide assistence, as currently I am not based in a Tier 1 location, so my permissions to access are so restricted that I am not able to import this commit, run the tests and provide meaningful responses.", + "createdAt": "2022-08-23T15:57:48Z", + "author": { + "login": "jeanschmidt" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224272324 + }, + { + "bodyText": "@jeanschmidt has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.", + "createdAt": "2022-08-23T17:00:53Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1224351135 } ], "pageInfo": { - "endCursor": "Mw", - "hasNextPage": false + "startCursor": "Y3Vyc29yOnYyOpHORP1auw==", + "hasPreviousPage": false } - }, - "reviews": { + } + } + } + } + }, + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOR1poyg== name=pytorch number=82169 owner=pytorch": { + "data": { + "repository": { + "pullRequest": { + "comments": { "nodes": [ { + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/82169\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit 28140e4 (more details on the Dr. 
CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-07-25T21:41:41Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1194667199 + }, + { + "bodyText": "@pytorchbot merge -g", + "createdAt": "2022-07-25T21:46:04Z", + "author": { + "login": "ezyang" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194671445 + }, + { + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "createdAt": "2022-07-25T21:47:25Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194672744 + }, + { + "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) pull failed for rule superuser\nRaised by https://github.com/pytorch/pytorch/actions/runs/2735501647", + "createdAt": "2022-07-25T23:22:45Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194761219 + }, + { + "bodyText": "@pytorchbot rebase", + "createdAt": "2022-07-26T00:54:17Z", + "author": { + "login": "ezyang" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194839920 + }, + { + "bodyText": "@pytorchbot successfully started a rebase job. Check the current status here", + "createdAt": "2022-07-26T01:01:32Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194846575 + }, + { + "bodyText": "Successfully rebased gh/ezyang/1279/orig onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/82169)", + "createdAt": "2022-07-26T01:01:53Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1194846838 + }, + { + "bodyText": "@pytorchbot rebase", + "createdAt": "2022-07-27T15:32:13Z", + "author": { + "login": "ezyang" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1196915484 + }, + { + "bodyText": "@pytorchbot successfully started a rebase job. 
Check the current status here", + "createdAt": "2022-07-27T15:33:49Z", "author": { - "login": "ngimel" + "login": "pytorchmergebot" }, - "state": "APPROVED" - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpO5MjAyMi0wNy0xOVQxMDowNzo1NC0wNzowMLkyMDIyLTA3LTE5VDEwOjA3OjU0LTA3OjAwzj43QcY=", - "hasPreviousPage": false - } - }, - "comments": { - "nodes": [ + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1196917359 + }, { - "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", + "bodyText": "Successfully rebased gh/ezyang/1279/orig onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/82169)", + "createdAt": "2022-07-27T15:34:03Z", "author": { - "login": "jeanschmidt" + "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1191328013 + "databaseId": 1196917609 }, { - "bodyText": "@pytorchbot revert -m \"breaking internal builds\" -c \"ghfirst\"", + "bodyText": "@pytorchbot merge -g", + "createdAt": "2022-07-27T15:41:52Z", "author": { - "login": "jeanschmidt" + "login": "ezyang" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1191329792 + "databaseId": 1196927174 }, { - "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", + "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "createdAt": "2022-07-27T15:43:11Z", "author": { "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1191330586 + "databaseId": 1196928771 }, { - "bodyText": "@kshitij12345 your PR has been successfully reverted.", + "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) Lint failed for rule superuser\nRaised by https://github.com/pytorch/pytorch/actions/runs/2747872935", + "createdAt": "2022-07-27T15:43:14Z", "author": { "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1191330690 + "databaseId": 1196928849 }, { - "bodyText": "@jeanschmidt which test is it failing on? I tried running the test_eager_transforms in functorch but couldn't reproduce it.", + "bodyText": "@pytorchbot merge -g", + "createdAt": "2022-07-27T16:59:37Z", "author": { - "login": "kshitij12345" + "login": "ezyang" }, - "authorAssociation": "COLLABORATOR", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1193667568 - } - ], - "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHORwI5DQ==", - "hasPreviousPage": true - } - }, - "labels": { - "edges": [ + "databaseId": 1197046487 + }, { - "node": { - "name": "open source" - } + "bodyText": "@pytorchbot successfully started a merge job. 
Check the current status here", + "createdAt": "2022-07-27T17:07:32Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197055101 }, { - "node": { - "name": "Merged" - } + "bodyText": "Merge failed due to Refusing to merge as mandatory check(s) Lint failed for rule superuser\nRaised by https://github.com/pytorch/pytorch/actions/runs/2748317347", + "createdAt": "2022-07-27T17:07:36Z", + "author": { + "login": "pytorchmergebot" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197055259 }, { - "node": { - "name": "cla signed" - } + "bodyText": "@pytorchbot merge -f", + "createdAt": "2022-07-27T17:56:26Z", + "author": { + "login": "malfet" + }, + "authorAssociation": "MEMBER", + "editor": null, + "databaseId": 1197107106 }, { - "node": { - "name": "Reverted" - } + "bodyText": "\u274c \ud83e\udd16 pytorchbot command failed:\n@pytorchbot merge: error: argument -f/--force: expected one argument\n\nusage: @pytorchbot merge [-g | -f FORCE | -l]\n\nTry @pytorchbot --help for more info.", + "createdAt": "2022-07-27T17:56:27Z", + "author": { + "login": "pytorch-bot" + }, + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1197107129 } - ] + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHORzUsvw==", + "hasPreviousPage": false + } } } } } }, - "query_sha=62ce809793481ce6ddce6e1a19d9b0761755ff0ff75decaf8a79419eaf793110 cursor=Y3Vyc29yOnYyOpHORwI5DQ== name=pytorch number=79694 owner=pytorch": { + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOPoR4Lg== name=pytorch number=71759 owner=pytorch": { "data": { "repository": { "pullRequest": { "comments": { "nodes": [ { - "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/79694\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\u2705 No Failures (0 Pending)\nAs of commit ffe4339 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", - "author": { - "login": "facebook-github-bot" - }, - "authorAssociation": "MEMBER", - "editor": { - "login": "facebook-github-bot" - }, - "databaseId": 1157454523 - }, - { - "bodyText": "Unable to reproduce jit failure locally (will skip the test)\nCI Failure : https://github.com/pytorch/pytorch/runs/6926187074?check_suite_focus=true#step:9:20230\npytest test/test_ops_jit.py -k test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 -v\n=============================================================== test session starts ===============================================================\nplatform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/kshiteej/.conda/envs/pytorch-cuda-dev/bin/python\ncachedir: .pytest_cache\nhypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/kshiteej/Pytorch/pytorch_complex_convolution.py/.hypothesis/examples')\nrootdir: /home/kshiteej/Pytorch/pytorch_complex_convolution.py, configfile: pytest.ini\nplugins: hypothesis-6.23.2, repeat-0.9.1\ncollected 1976 items / 1975 deselected / 1 selected \n\ntest/test_ops_jit.py::TestJitCPU::test_variant_consistency_jit_nn_functional_conv_transpose1d_cpu_complex64 PASSED [100%]\n\n================================================================ warnings summary =================================================================\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py:9: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives\n from distutils.version import LooseVersion\n\n../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91\n /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:91: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. 
To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.\n warnings.warn(\n\n-- Docs: https://docs.pytest.org/en/stable/warnings.html\n================================================= 1 passed, 1975 deselected, 2 warnings in 4.90s =================================================", + "bodyText": "CI Flow Status\n\u269b\ufe0f CI Flow\nRuleset - Version: v1\nRuleset - File: https://github.com/coolteemf/pytorch/blob/7647f7953a68e4f1c3feaa19c77d925abfe8e377/.github/generated-ciflow-ruleset.json\nPR ciflow labels: ciflow/default\nAdd ciflow labels to this PR to trigger more builds:\n\n\n\nWorkflows\nLabels (bold enabled)\nStatus\n\n\n\n\nTriggered Workflows\n\n\n\n\nlinux-bionic-py3.6-clang9\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla\n\u2705 triggered\n\n\nlinux-xenial-cuda11.3-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/default, ciflow/linux\n\u2705 triggered\n\n\nlinux-xenial-py3.6-clang7-asan\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers\n\u2705 triggered\n\n\nlinux-xenial-py3.6-gcc5.4\nciflow/all, ciflow/cpu, ciflow/default, ciflow/linux\n\u2705 triggered\n\n\nlinux-xenial-py3.6-gcc7-bazel-test\nciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux\n\u2705 triggered\n\n\nwin-vs2019-cpu-py3\nciflow/all, ciflow/cpu, ciflow/default, ciflow/win\n\u2705 triggered\n\n\nwin-vs2019-cuda11.3-py3\nciflow/all, ciflow/cuda, ciflow/default, ciflow/win\n\u2705 triggered\n\n\nSkipped Workflows\n\n\n\n\nlibtorch-linux-xenial-cuda10.2-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux\n\ud83d\udeab skipped\n\n\nlibtorch-linux-xenial-cuda11.3-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux\n\ud83d\udeab skipped\n\n\nlinux-bionic-cuda10.2-py3.9-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow\n\ud83d\udeab skipped\n\n\nlinux-xenial-cuda10.2-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow\n\ud83d\udeab skipped\n\n\nparallelnative-linux-xenial-py3.6-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux\n\ud83d\udeab skipped\n\n\nperiodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-linux-xenial-cuda11.1-py3.6-gcc7\nciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled\n\ud83d\udeab skipped\n\n\nperiodic-win-vs2019-cuda11.1-py3\nciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win\n\ud83d\udeab skipped\n\n\npuretorch-linux-xenial-py3.6-gcc5.4\nciflow/all, ciflow/cpu, ciflow/linux\n\ud83d\udeab skipped", + "createdAt": "2022-01-25T09:31:05Z", "author": { - "login": "kshitij12345" - }, - "authorAssociation": "COLLABORATOR", - "editor": { - "login": "kshitij12345" + "login": "pytorch-bot" }, - "databaseId": 1186949486 + "authorAssociation": "NONE", + "editor": null, + "databaseId": 1020983378 }, { - "bodyText": "@pytorchbot merge", + "bodyText": "Hi @coolteemf!\nThank you for your pull request and welcome to our community.\nAction Required\nIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.\nProcess\nIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. 
If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.\nOnce the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.\nIf you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!", + "createdAt": "2022-01-25T09:31:06Z", "author": { - "login": "ngimel" + "login": "facebook-github-bot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1189347786 + "databaseId": 1020983383 }, { - "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/71759\n\ud83d\udcc4 \u00a0Preview docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\ud83d\udd27 \u00a0Opt-in to CIFlow to control what jobs run on your PRs\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit 346e0c5 (more details on the Dr. CI page):\n\n\n2/3 failures introduced in this PR\n1/3 tentatively recognized as flaky \u2744\ufe0f\n\nClick here to rerun these jobs\n\n\n\n\n\ud83d\udd75\ufe0f 2 new failures recognized by patterns\nThe following CI failures do not appear to be due to upstream breakages:\n win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (1/2)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-02-23T14:12:58.9371445Z FAIL [0.010s]: test_sparse_addmm_cpu_bfloat16 (__main__.TestSparseCPU)\n\n2022-02-23T14:12:58.9258506Z test_sparse_zeros_tanh_cpu_float64 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.002s)\n2022-02-23T14:12:58.9274771Z test_sparse_zeros_tanh_cpu_int16 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.001s)\n2022-02-23T14:12:58.9290805Z test_sparse_zeros_tanh_cpu_int32 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.001s)\n2022-02-23T14:12:58.9306695Z test_sparse_zeros_tanh_cpu_int64 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.000s)\n2022-02-23T14:12:58.9322595Z test_sparse_zeros_tanh_cpu_int8 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.000s)\n2022-02-23T14:12:58.9338535Z test_sparse_zeros_tanh_cpu_uint8 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.000s)\n2022-02-23T14:12:58.9354468Z test_sparse_zeros_trunc_cpu_float32 (__main__.TestSparseUnaryUfuncsCPU) ... ok (0.000s)\n2022-02-23T14:12:58.9370208Z test_sparse_zeros_trunc_cpu_float64 (__main__.TestSparseUnaryUfuncsCPU) ... 
ok (0.000s)\n2022-02-23T14:12:58.9370712Z \n2022-02-23T14:12:58.9370976Z ======================================================================\n2022-02-23T14:12:58.9371445Z FAIL [0.010s]: test_sparse_addmm_cpu_bfloat16 (__main__.TestSparseCPU)\n2022-02-23T14:12:58.9372134Z ----------------------------------------------------------------------\n2022-02-23T14:12:58.9372597Z Traceback (most recent call last):\n2022-02-23T14:12:58.9374021Z File \"C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\testing\\_internal\\common_device_type.py\", line 376, in instantiated_test\n2022-02-23T14:12:58.9374740Z result = test(self, **param_kwargs)\n2022-02-23T14:12:58.9375570Z File \"C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\testing\\_internal\\common_utils.py\", line 2951, in wrapped\n2022-02-23T14:12:58.9376266Z f(self, *args, **kwargs, coalesced=False)\n2022-02-23T14:12:58.9376972Z File \"test_sparse.py\", line 1272, in test_sparse_addmm\n2022-02-23T14:12:58.9377402Z test_shape(7, 8, 9, 20, True, None)\n2022-02-23T14:12:58.9377939Z File \"test_sparse.py\", line 1264, in test_shape\n2022-02-23T14:12:58.9378373Z self.assertEqual(Y, Y_dense)\n\n\n win-vs2019-cuda11.3-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (2/2)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun)\n\n\n2022-02-23T15:20:20.5710678Z FAIL [0.031s]: test_sparse_addmm_cpu_bfloat16 (__main__.TestSparseCPU)\n\n2022-02-23T15:20:20.5569146Z test_sparse_zeros_tanh_cuda_float64 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5589083Z test_sparse_zeros_tanh_cuda_int16 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5609025Z test_sparse_zeros_tanh_cuda_int32 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5629080Z test_sparse_zeros_tanh_cuda_int64 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.016s)\n2022-02-23T15:20:20.5649102Z test_sparse_zeros_tanh_cuda_int8 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5668867Z test_sparse_zeros_tanh_cuda_uint8 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5688700Z test_sparse_zeros_trunc_cuda_float32 (__main__.TestSparseUnaryUfuncsCUDA) ... ok (0.000s)\n2022-02-23T15:20:20.5708285Z test_sparse_zeros_trunc_cuda_float64 (__main__.TestSparseUnaryUfuncsCUDA) ... 
ok (0.000s)\n2022-02-23T15:20:20.5709405Z \n2022-02-23T15:20:20.5709879Z ======================================================================\n2022-02-23T15:20:20.5710678Z FAIL [0.031s]: test_sparse_addmm_cpu_bfloat16 (__main__.TestSparseCPU)\n2022-02-23T15:20:20.5711399Z ----------------------------------------------------------------------\n2022-02-23T15:20:20.5712013Z Traceback (most recent call last):\n2022-02-23T15:20:20.5713280Z File \"C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\testing\\_internal\\common_device_type.py\", line 376, in instantiated_test\n2022-02-23T15:20:20.5714267Z result = test(self, **param_kwargs)\n2022-02-23T15:20:20.5715299Z File \"C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\testing\\_internal\\common_utils.py\", line 2951, in wrapped\n2022-02-23T15:20:20.5716240Z f(self, *args, **kwargs, coalesced=False)\n2022-02-23T15:20:20.5716943Z File \"test_sparse.py\", line 1275, in test_sparse_addmm\n2022-02-23T15:20:20.5717516Z test_shape(7, 8, 9, 20, False, (1, 1))\n2022-02-23T15:20:20.5718323Z File \"test_sparse.py\", line 1264, in test_shape\n2022-02-23T15:20:20.5718915Z self.assertEqual(Y, Y_dense)\n\n\n\n\u2744\ufe0f 1 failure tentatively classified as flaky\nbut reruns have not yet been triggered to confirm:\n linux-bionic-rocm4.5-py3.7 / test (distributed, 1, 1, linux.rocm.gpu) (1/1)\nStep: \"Test\" (full log | diagnosis details | \ud83d\udd01 rerun) \u2744\ufe0f\n\n\n2022-02-23T16:16:26.7221984Z RuntimeError: Proc...ated or timed out after 100.06913685798645 seconds\n\n2022-02-23T16:16:26.7207909Z ERROR [100.093s]: test_collect_shards (__main__.TestZeroRedundancyOptimizerDistributed)\n2022-02-23T16:16:26.7209206Z Check the state consolidation mechanism, and the state dict exposed by ZeroRedundancyOptimizer\n2022-02-23T16:16:26.7213073Z ----------------------------------------------------------------------\n2022-02-23T16:16:26.7213996Z Traceback (most recent call last):\n2022-02-23T16:16:26.7215434Z File \"/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py\", line 483, in wrapper\n2022-02-23T16:16:26.7216409Z self._join_processes(fn)\n2022-02-23T16:16:26.7217801Z File \"/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py\", line 702, in _join_processes\n2022-02-23T16:16:26.7218822Z self._check_return_codes(elapsed_time)\n2022-02-23T16:16:26.7220266Z File \"/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py\", line 754, in _check_return_codes\n2022-02-23T16:16:26.7221201Z i, elapsed_time\n2022-02-23T16:16:26.7221984Z RuntimeError: Process 0 terminated or timed out after 100.06913685798645 seconds\n2022-02-23T16:16:26.7222551Z \n2022-02-23T16:16:26.7223245Z ----------------------------------------------------------------------\n2022-02-23T16:16:26.7224032Z Ran 26 tests in 303.663s\n2022-02-23T16:16:26.7224400Z \n2022-02-23T16:16:26.7224780Z FAILED (errors=1, skipped=8, unexpected successes=3)\n2022-02-23T16:16:26.7225718Z \n2022-02-23T16:16:26.7225992Z Generating XML reports...\n2022-02-23T16:16:26.7336797Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerDistributed-20220223161123.xml\n2022-02-23T16:16:26.7349296Z Generated XML report: test-reports/python-unittest/distributed.optim.test_zero_redundancy_optimizer/TEST-TestZeroRedundancyOptimizerSingleRank-20220223161123.xml\n2022-02-23T16:16:27.6823633Z Traceback (most 
recent call last):\n\n\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-01-25T09:31:08Z", "author": { - "login": "pytorchmergebot" + "login": "facebook-github-bot" }, "authorAssociation": "MEMBER", - "editor": null, - "databaseId": 1189350009 + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1020983433 }, { - "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "bodyText": "Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!", + "createdAt": "2022-01-25T18:07:45Z", "author": { - "login": "github-actions" + "login": "facebook-github-bot" }, - "authorAssociation": "NONE", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1189350932 + "databaseId": 1021467314 }, { - "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", + "bodyText": "@albanD Is there something that needs to be done to correct the failed check ?", + "createdAt": "2022-02-04T13:18:05Z", "author": { - "login": "kshitij12345" + "login": "coolteemf" }, - "authorAssociation": "COLLABORATOR", + "authorAssociation": "CONTRIBUTOR", "editor": null, - "databaseId": 1189459845 + "databaseId": 1029978104 }, { - "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", + "bodyText": "Hi,\nI think you didn't do the merge properly as there are now a lot more commits than it should be in this PR.\nYou can either clean up the branch locally and force push here or open a new clean PR.\nNote that in general, it is better to rebase on top of master than merge master into your branch!", + "createdAt": "2022-02-04T14:28:28Z", "author": { - "login": "pytorchmergebot" + "login": "albanD" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1189460926 + "databaseId": 1030038719 }, { - "bodyText": "Will not revert as @kshitij12345 is not a MEMBER, but COLLABORATOR", + "bodyText": "Okay thank you for the heads up", + "createdAt": "2022-02-04T16:44:46Z", "author": { - "login": "pytorchmergebot" + "login": "coolteemf" }, - "authorAssociation": "MEMBER", + "authorAssociation": "CONTRIBUTOR", "editor": null, - "databaseId": 1189460942 + "databaseId": 1030159616 }, { - "bodyText": "@pytorchbot revert -m \"broke slow test https://github.com/pytorch/pytorch/runs/7414560957?check_suite_focus=true#step:9:31516\" -c \"nosignal\"", + "bodyText": "@albanD I just rebased and updated the branch to take into account changes from 28388b4. 
Is it all clear for merging ?", + "createdAt": "2022-02-16T15:34:59Z", "author": { - "login": "anjali411" + "login": "coolteemf" }, - "authorAssociation": "MEMBER", + "authorAssociation": "CONTRIBUTOR", "editor": null, - "databaseId": 1189529734 + "databaseId": 1041720345 }, { - "bodyText": "@pytorchbot successfully started a revert job. Check the current status here", + "bodyText": "Thanks! The CI needs fixing for bc-compat and lint though\n\nThe lint should be fixed, however I didn't find clear instructions on how to fix the bc compat.\nI guess output_mask could be made optional, however in the case of native_group_norm_backward the same argument is not optional.", + "createdAt": "2022-02-17T08:04:30Z", "author": { - "login": "pytorchmergebot" + "login": "coolteemf" }, - "authorAssociation": "MEMBER", + "authorAssociation": "CONTRIBUTOR", "editor": null, - "databaseId": 1189530756 + "databaseId": 1042672732 }, { - "bodyText": "@kshitij12345 your PR has been successfully reverted.", + "bodyText": "Since we are changing the signature on purpose here, you can add it to the list at https://github.com/pytorch/pytorch/blob/master/test/forward_backward_compatibility/check_forward_backward_compatibility.py#L29 to silence the test.", + "createdAt": "2022-02-17T14:41:16Z", "author": { - "login": "pytorchmergebot" + "login": "albanD" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1189530831 + "databaseId": 1043020903 }, { - "bodyText": "@pytorchbot merge -g", + "bodyText": "@pytorchbot merge this please", + "createdAt": "2022-02-23T14:48:05Z", "author": { - "login": "kshitij12345" + "login": "albanD" }, - "authorAssociation": "COLLABORATOR", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1190070141 + "databaseId": 1048861185 }, { - "bodyText": "@pytorchbot successfully started a merge job. Check the current status here", + "bodyText": "Merge failed due to 'NoneType' object is not subscriptable\nRaised by https://github.com/pytorch/pytorch/actions/runs/1887914411", + "createdAt": "2022-02-23T14:49:16Z", "author": { "login": "pytorchmergebot" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1190071424 + "databaseId": 1048862374 }, { - "bodyText": "Hey @kshitij12345.\nYou've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.\nFor changes that are 'topic: not user facing' there is no need for a release notes label.", + "bodyText": "@coolteemf you can ignore me playing with the bot. 
Nothing is needed on your end anymore, I'll take it from here.", + "createdAt": "2022-02-23T14:52:10Z", "author": { - "login": "github-actions" + "login": "albanD" }, - "authorAssociation": "NONE", + "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1190258272 + "databaseId": 1048865236 }, { - "bodyText": "commit is breaking internal builds/tests https://pastebin.com/HX4RUusH (pytorch/functorch/test:test_eager_transforms)", + "bodyText": "@pytorchbot merge this", + "createdAt": "2022-02-23T14:54:23Z", "author": { - "login": "jeanschmidt" + "login": "malfet" }, "authorAssociation": "MEMBER", "editor": null, - "databaseId": 1191327616 + "databaseId": 1048867615 } ], "pageInfo": { - "startCursor": "Y3Vyc29yOnYyOpHORP1auw==", + "startCursor": "Y3Vyc29yOnYyOpHOPNr4Ug==", "hasPreviousPage": false } } @@ -19965,88 +35867,39 @@ } } }, - "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAbqubxc= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAbtMgmA= name=pytorch number=79694 owner=pytorch": { + "query_sha=2e2877d2452c4f233f042b7ccd50ab9c2a6e9a73d8819a0c876203c12364e8a3 cursor=Y3Vyc29yOnYyOpHOQebHmg== name=pytorch number=75095 owner=pytorch": { "data": { "repository": { "pullRequest": { - "commits": { + "comments": { "nodes": [ { - "commit": { - "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7427036779?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7427036925?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbqvlv0=", - "hasNextPage": false - } - } - } - ] - } - } - } - ] - } - } - } - } - }, - "query_sha=4c16925415d1fcc12ac0f5f7ce73b8e6122997d2f51c4c2757c2543e6493c60d cr_cursor=Y3Vyc29yOnYyOpHPAAAAAbxLMxE= cs_cursor=Y3Vyc29yOnYyOpHPAAAAAbzb6nI= name=pytorch number=79694 owner=pytorch": { - "data": { - "repository": { - "pullRequest": { - "commits": { - "nodes": [ + "bodyText": "\ud83d\udd17 Helpful links\n\n\ud83e\uddea \u00a0See artifacts and rendered test results at hud.pytorch.org/pr/75095\n\ud83d\udcc4 \u00a0Preview Python docs built from this PR\n\ud83d\udcc4 \u00a0Preview C++ docs built from this PR\n\u2753Need help or want to give feedback on the CI? Visit our office hours\n\n\ud83d\udc8a CI failures summary and remediations\nAs of commit db355d5 (more details on the Dr. CI page):\nExpand to see more\n\n\ud83d\udc9a \ud83d\udc9a Looks good so far! There are no failures yet. \ud83d\udc9a \ud83d\udc9a\n\nThis comment was automatically generated by Dr. CI (expand for details).\nPlease report bugs/suggestions to the (internal) Dr. 
CI Users group.\nClick here to manually regenerate this comment.", + "createdAt": "2022-04-01T08:49:06Z", + "author": { + "login": "facebook-github-bot" + }, + "authorAssociation": "MEMBER", + "editor": { + "login": "facebook-github-bot" + }, + "databaseId": 1085625658 + }, { - "commit": { - "oid": "ffe43399d6f60ef7844523a5f465c11d9a67062f", - "checkSuites": { - "nodes": [ - { - "checkRuns": { - "nodes": [ - { - "name": "linux-xenial-cuda11_3-py3_7-gcc7-deploy / test (deploy, 1, 1, linux.4xlarge.nvidia.gpu)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454025911?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454189584?check_suite_focus=true" - }, - { - "name": "win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge)", - "conclusion": "SUCCESS", - "detailsUrl": "https://github.com/pytorch/pytorch/runs/7454189772?check_suite_focus=true" - } - ], - "pageInfo": { - "endCursor": "Y3Vyc29yOnYyOpHPAAAAAbxN6Mw=", - "hasNextPage": false - } - } - } - ] - } - } + "bodyText": "High level question: how do we plan to validate that our ref implementations are compatible with somewhat-symbolic shapes? There are multiple ways to write the shape processing logic to be compatible vs not, it'd be good to catch such instances early. Does it make sense to throw in some proxy objects (that have state of 0,1,N) in tests early on? (maybe in a follow up PR). Otherwise it's not clear to me that squeeze/broadcast/etc are the right set of primitives for symbolic shapes", + "createdAt": "2022-04-21T18:51:24Z", + "author": { + "login": "dzhulgakov" + }, + "authorAssociation": "COLLABORATOR", + "editor": null, + "databaseId": 1105634766 } - ] + ], + "pageInfo": { + "startCursor": "Y3Vyc29yOnYyOpHOQLVVOg==", + "hasPreviousPage": false + } } } } diff --git a/.github/scripts/install_nvidia_utils_linux.sh b/.github/scripts/install_nvidia_utils_linux.sh deleted file mode 100755 index b5274fb5805f..000000000000 --- a/.github/scripts/install_nvidia_utils_linux.sh +++ /dev/null @@ -1,57 +0,0 @@ -#!/usr/bin/env bash - -set -eou pipefail - - -DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) \ -DRIVER_FN="NVIDIA-Linux-x86_64-515.57.run" -YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" - -install_nvidia_docker2_amzn2() { - ( - set -x - # Needed for yum-config-manager - sudo yum install -y yum-utils - sudo yum-config-manager --add-repo "${YUM_REPO_URL}" - sudo yum install -y nvidia-docker2 - sudo systemctl restart docker - ) -} - -install_nvidia_driver_amzn2() { - ( - set -x - sudo yum groupinstall -y "Development Tools" - # ensure our kernel install is the same as our underlying kernel, - # groupinstall "Development Tools" has a habit of mismatching kernel headers - sudo yum install -y "kernel-devel-uname-r == $(uname -r)" - sudo modprobe backlight - sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" - sudo /bin/bash /tmp/nvidia_driver -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false) - sudo rm -fv /tmp/nvidia_driver - nvidia-smi - ) -} - -# Install container toolkit based on distribution -echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" -case "${DISTRIBUTION}" in - amzn*) - install_nvidia_docker2_amzn2 - ;; - *) - echo "ERROR: Unknown distribution ${DISTRIBUTION}" - exit 1 - ;; -esac - -echo "== Installing nvidia driver ${DRIVER_FN} ==" -case "${DISTRIBUTION}" in - amzn*) - install_nvidia_driver_amzn2 - ;; - *) - echo "ERROR: Unknown distribution ${DISTRIBUTION}" - exit 1 - ;; -esac diff --git a/.github/scripts/parse_ref.py b/.github/scripts/parse_ref.py index 036146f734c3..59a454fe3025 100755 --- a/.github/scripts/parse_ref.py +++ b/.github/scripts/parse_ref.py @@ -4,18 +4,26 @@ import re +def set_output(name: str, val: str) -> None: + if os.getenv("GITHUB_OUTPUT"): + with open(str(os.getenv("GITHUB_OUTPUT")), "a") as env: + print(f"{name}={val}", file=env) + else: + print(f"::set-output name={name}::{val}") + + def main() -> None: - ref = os.environ['GITHUB_REF'] + ref = os.environ["GITHUB_REF"] m = re.match(r'^refs/(\w+)/(.*)$', ref) if m: category, stripped = m.groups() - if category == 'heads': - print(f'::set-output name=branch::{stripped}') - elif category == 'pull': - print(f'::set-output name=branch::pull/{stripped.split("/")[0]}') - elif category == 'tags': - print(f'::set-output name=tag::{stripped}') + if category == "heads": + set_output("branch", stripped) + elif category == "pull": + set_output("branch", "pull/" + stripped.split("/")[0]) + elif category == "tags": + set_output("tag", stripped) -if __name__ == '__main__': +if __name__ == "__main__": main() diff --git a/.github/scripts/pr-sanity-check.sh b/.github/scripts/pr-sanity-check.sh new file mode 100644 index 000000000000..13d037d5eaab --- /dev/null +++ b/.github/scripts/pr-sanity-check.sh @@ -0,0 +1,60 @@ +#!/usr/bin/env bash + +set -eou pipefail + +GIT_TOP_DIR=$(git rev-parse --show-toplevel) + +TMPFILE=$(mktemp) +trap "rm -rf ${TMPFILE}" EXIT + +# By default just run against the latest commit +BASE=${BASE:-HEAD~1} +HEAD=${HEAD:-HEAD} + +ancestor=$(git merge-base "${BASE}" "${HEAD}") +echo "INFO: Checking aginst the following stats" +( + set -x + git diff --stat "$ancestor" "${HEAD}" | sed '$d' > "${TMPFILE}" +) + +while read -r git_attribute; do + if echo "${git_attribute}" | grep "linguist-generated=true" >/dev/null 2>/dev/null; then + pattern=$(echo ${git_attribute} | cut -d' ' -f1) + escaped_pattern=$(printf '%s\n' "$pattern" | sed -e 's/[\/&]/\\&/g') + # Delete known generated files + sed -i '/'"${escaped_pattern}"'/d' "${TMPFILE}" + fi 
+done < "${GIT_TOP_DIR}/.gitattributes" + +echo "INFO: Showing non-generated files:" +( + set -x + cat "${TMPFILE}" +) + +# Get only files that have changed +changed_files=$(cut -d' ' -f2 "${TMPFILE}" | xargs) + +details=$(git diff --shortstat "$ancestor" "${HEAD}" -- ${changed_files}) +add=$(echo "$details" | grep -o '[0-9]* insertion' | grep -o '[0-9]*' || true) +remove=$(echo "$details" | grep -o '[0-9]* deletion' | grep -o '[0-9]*' || true) +pr_size=0 +if [ "$add" ]; then + pr_size=$(("$pr_size" + "$add")) +fi +if [ "$remove" ]; then + pr_size=$(("$pr_size" + "$remove")) +fi +echo "INFO: PR SIZE is ${pr_size}" + +if ((pr_size > 2000)); then + echo + echo 'Your PR is '"$pr_size"' LOC which is more than the 2000 maximum' + echo 'allowed within PyTorch infra. PLease make sure to split up' + echo 'your PR into smaller pieces that can be reviewed.' + echo 'If you think that this rule should not apply to your PR,' + echo 'please contact @albanD or @seemethere.' + echo + exit 1 +fi diff --git a/.github/scripts/process_commit.py b/.github/scripts/process_commit.py deleted file mode 100644 index 1bfca3237984..000000000000 --- a/.github/scripts/process_commit.py +++ /dev/null @@ -1,106 +0,0 @@ -#!/usr/bin/env python3 -""" -This script finds the user/pr creator responsible for labeling a PR by a commit SHA. It is used by the workflow in -'.github/workflows/pr-labels.yml'. If there exists no PR associated with the commit or the PR is properly labeled, -this script is a no-op. - -Note: we ping the user only, not the reviewers, as the reviewers can sometimes be external to pytorch -with no labeling responsibility, so we don't want to bother them. -This script is based on: https://github.com/pytorch/vision/blob/main/.github/process_commit.py -""" - -import sys -from typing import Any, Set, Tuple, List -import re -import os -import json -import requests - -# For a PR to be properly labeled it should have release notes label and one topic label -PULL_REQUEST_EXP = "Pull Request resolved:.*pull/(.*)" -PRIMARY_LABEL_FILTER = "release notes:" -SECONDARY_LABELS = { - "topic: bc_breaking", - "topic: deprecation", - "topic: new feature", - "topic: improvements", - "topic: bug fixes", - "topic: performance", - "topic: documentation", - "topic: developer feature", - "topic: not user facing", -} -# This secondary does not require a primary -ALLOWED_ONLY_SECONDARY = {"topic: not user facing"} -PYTORCH_REPO = "https://api.github.com/repos/pytorch/pytorch" -GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN') -REQUEST_HEADERS = {'Accept': 'application/vnd.github.v3+json', 'Authorization': f'token {GITHUB_TOKEN}'} - - -def query_pytorch(cmd: str) -> Any: - response = requests.get(f"{PYTORCH_REPO}/{cmd}", headers=REQUEST_HEADERS) - return response.json() - - -def get_pr_number(commit_hash: str) -> Any: - data = query_pytorch(f"commits/{commit_hash}") - if not data or (not data["commit"]["message"]): - return None - message = data["commit"]["message"] - p = re.compile(PULL_REQUEST_EXP) - result = p.search(message) - if not result: - return None - return result.group(1) - - -def get_pr_author_and_labels(pr_number: int) -> Tuple[str, Set[str]]: - # See https://docs.github.com/en/rest/reference/pulls#get-a-pull-request - data = query_pytorch(f"pulls/{pr_number}") - user = data["user"]["login"] - labels = {label["name"] for label in data["labels"]} - return user, labels - -def get_repo_labels() -> List[str]: - collected_labels: List[str] = list() - for page in range(0, 10): - response = 
query_pytorch(f"labels?per_page=100&page={page}") - page_labels = list(map(lambda x: str(x["name"]), response)) - if not page_labels: - break - collected_labels += page_labels - return collected_labels - -def post_pytorch_comment(pr_number: int, merger: str) -> Any: - message = {'body' : f"Hey @{merger}." + """ -You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. \ -Please add one of each to the PR. The 'release notes: ...' label should represent the part of \ -PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should \ -represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). \ -The list of valid labels can be found [here](https://github.com/pytorch/pytorch/labels?q=release+notes) \ -for the 'release notes: ...' and [here](https://github.com/pytorch/pytorch/labels?q=topic) for the \ -'topics: ...'. -For changes that are 'topic: not user facing' there is no need for a release notes label."""} - - response = requests.post( - f"{PYTORCH_REPO}/issues/{pr_number}/comments", - json.dumps(message), - headers=REQUEST_HEADERS) - return response.json() - -if __name__ == "__main__": - commit_hash = sys.argv[1] - pr_number = get_pr_number(commit_hash) - - if not pr_number: - sys.exit(0) - - user, labels = get_pr_author_and_labels(pr_number) - repo_labels = get_repo_labels() - - primary_labels = set(filter(lambda x: x.startswith(PRIMARY_LABEL_FILTER), repo_labels)) - has_both_labels = bool(primary_labels.intersection(labels) and SECONDARY_LABELS.intersection(labels)) - is_properly_labeled = has_both_labels or bool(ALLOWED_ONLY_SECONDARY.intersection(labels)) - - if not is_properly_labeled: - post_pytorch_comment(pr_number, user) diff --git a/.github/scripts/run_torchbench.py b/.github/scripts/run_torchbench.py index 44e53f6a14e2..352da69c8158 100644 --- a/.github/scripts/run_torchbench.py +++ b/.github/scripts/run_torchbench.py @@ -13,10 +13,12 @@ # 1. Does not reuse the build artifact in other CI workflows # 2. CI jobs are serialized because there is only one worker import os +import boto3 # type: ignore[import] import git # type: ignore[import] import pathlib import argparse import subprocess +from pathlib import Path from typing import List, Tuple @@ -31,6 +33,25 @@ direction: decrease timeout: 720 tests:""" +S3_BUCKET = "ossci-metrics" +S3_PREFIX = "torchbench-pr-test" +S3_URL_BASE = f"https://{S3_BUCKET}.s3.amazonaws.com/" + +class S3Client: + def __init__(self, bucket: str = S3_BUCKET, prefix: str = S3_PREFIX): + self.s3 = boto3.client('s3') + self.resource = boto3.resource('s3') + self.bucket = bucket + self.prefix = prefix + + def upload_file(self, file_path: Path, filekey_prefix: str) -> None: + assert file_path.is_file(), f"Specified file path {file_path} does not exist or not file." 
+ file_name = file_path.name + s3_key = f"{self.prefix}/{filekey_prefix}/{file_name}" + print(f"Uploading file {file_name} to S3 with key: {s3_key}") + self.s3.upload_file(str(file_path), self.bucket, s3_key) + # output the result URL + print(f"Uploaded the result file {file_name} to {S3_URL_BASE}{s3_key}") def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str: d = {} @@ -121,6 +142,7 @@ def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) -> "--pytorch-src", pytorch_path, "--torchbench-src", torchbench_path, "--config", os.path.join(output_dir, TORCHBENCH_CONFIG_NAME), "--output", os.path.join(output_dir, "result.txt")] + print(f"Running torchbench command: {command}") subprocess.check_call(command, cwd=torchbench_path, env=env) def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, head_sha: str, @@ -133,11 +155,24 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h "--head", head_sha, "--userbenchmark", userbenchmark, "--output-dir", output_dir] + print(f"Running torchbench userbenchmark command: {command}") subprocess.check_call(command, cwd=torchbench_path, env=env) +def process_upload_s3(result_dir: str) -> None: + # validate result directory + result_dir_path = Path(result_dir) + assert result_dir_path.exists(), f"Specified result directory {result_dir} doesn't exist." + # upload all files to S3 bucket oss-ci-metrics + files = [x for x in result_dir_path.iterdir() if x.is_file()] + # upload file to S3 bucket + s3_client: S3Client = S3Client() + filekey_prefix = result_dir_path.name + for f in files: + s3_client.upload_file(f, filekey_prefix) + if __name__ == "__main__": parser = argparse.ArgumentParser(description='Run TorchBench tests based on PR') - parser.add_argument('--pr-body', required=True, help="The file that contains body of a Pull Request") + parser.add_argument('--pr-body', help="The file that contains body of a Pull Request") subparsers = parser.add_subparsers(dest='command') # parser for setup the torchbench branch name env @@ -149,6 +184,9 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h run_parser.add_argument('--pr-head-sha', required=True, type=str, help="The Pull Request head hash") run_parser.add_argument('--pytorch-path', required=True, type=str, help="Path to pytorch repository") run_parser.add_argument('--torchbench-path', required=True, type=str, help="Path to TorchBench repository") + # parser to upload results to S3 + upload_parser = subparsers.add_parser("upload-s3") + upload_parser.add_argument('--result-dir', required=True, type=str, help="Path to benchmark output") args = parser.parse_args() if args.command == 'set-torchbench-branch': @@ -179,6 +217,8 @@ def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, h if not models and not userbenchmarks: print("Can't parse valid models or userbenchmarks from the pr body. 
Quit.") exit(-1) + elif args.command == 'upload-s3': + process_upload_s3(args.result_dir) else: print(f"The command {args.command} is not supported.") exit(-1) diff --git a/.github/scripts/test_check_labels.py b/.github/scripts/test_check_labels.py new file mode 100644 index 000000000000..64e91dcd8ecb --- /dev/null +++ b/.github/scripts/test_check_labels.py @@ -0,0 +1,77 @@ +"""test_check_labels.py""" + +from typing import Any +from unittest import TestCase, mock, main + +from trymerge import GitHubPR +from test_trymerge import mocked_gh_graphql +from check_labels import has_required_labels + +release_notes_labels = [ + "release notes: AO frontend", + "release notes: autograd", + "release notes: benchmark", + "release notes: build", + "release notes: complex", + "release notes: composability", + "release notes: cpp", + "release notes: cuda", + "release notes: cudnn", + "release notes: dataloader", + "release notes: distributed (c10d)", + "release notes: distributed (ddp)", + "release notes: distributed (fsdp)", + "release notes: distributed (pipeline)", + "release notes: distributed (rpc)", + "release notes: distributed (sharded)", + "release notes: foreach_frontend", + "release notes: functorch", + "release notes: fx", + "release notes: hub", + "release notes: jit", + "release notes: lazy", + "release notes: linalg_frontend", + "release notes: memory format", + "release notes: Meta API", + "release notes: mobile", + "release notes: mps", + "release notes: nested tensor", + "release notes: nn", + "release notes: onnx", + "release notes: package/deploy", + "release notes: performance_as_product", + "release notes: profiler", + "release notes: python_frontend", + "release notes: quantization", + "release notes: releng", + "release notes: rocm", + "release notes: sparse", + "release notes: visualization", + "release notes: vulkan", +] + + +class TestCheckLabels(TestCase): + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + @mock.patch('check_labels.get_release_notes_labels', return_value=release_notes_labels) + def test_pr_with_missing_labels(self, mocked_rn_labels: Any, mocked_gql: Any) -> None: + "Test PR with no 'release notes:' label or 'topic: not user facing' label" + pr = GitHubPR("pytorch", "pytorch", 82169) + self.assertFalse(has_required_labels(pr)) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + @mock.patch('check_labels.get_release_notes_labels', return_value=release_notes_labels) + def test_pr_with_release_notes_label(self, mocked_rn_labels: Any, mocked_gql: Any) -> None: + "Test PR with 'release notes: nn' label" + pr = GitHubPR("pytorch", "pytorch", 71759) + self.assertTrue(has_required_labels(pr)) + + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + @mock.patch('check_labels.get_release_notes_labels', return_value=release_notes_labels) + def test_pr_with_not_user_facing_label(self, mocked_rn_labels: Any, mocked_gql: Any) -> None: + "Test PR with 'topic: not user facing' label" + pr = GitHubPR("pytorch", "pytorch", 75095) + self.assertTrue(has_required_labels(pr)) + +if __name__ == "__main__": + main() diff --git a/.github/scripts/test_fetch_latest_green_commit.py b/.github/scripts/test_fetch_latest_green_commit.py index 2f84658e6394..f88e7f262fb9 100644 --- a/.github/scripts/test_fetch_latest_green_commit.py +++ b/.github/scripts/test_fetch_latest_green_commit.py @@ -81,13 +81,12 @@ def test_necessary_failed(self, mock_get_commit_results: Any) -> None: @mock.patch('fetch_latest_green_commit.get_commit_results', 
return_value=TestChecks().make_test_checks()) def test_skippable_failed(self, mock_get_commit_results: Any) -> None: - "Test with skippable job (ex: docker-release-builds) failing" + "Test with failing skippable jobs (ex: docker-release-builds) should pass" workflow_checks = mock_get_commit_results() workflow_checks = set_workflow_job_status(workflow_checks, "periodic", "skipped") workflow_checks = set_workflow_job_status(workflow_checks, "docker-release-builds", "failed") result = isGreen("sha", workflow_checks) - self.assertFalse(result[0]) - self.assertEqual(result[1], "docker-release-builds checks were not successful") + self.assertTrue(result[0]) @mock.patch('fetch_latest_green_commit.get_commit_results', return_value={}) def test_no_workflows(self, mock_get_commit_results: Any) -> None: diff --git a/.github/scripts/test_filter_test_configs.py b/.github/scripts/test_filter_test_configs.py new file mode 100755 index 000000000000..55410e846c97 --- /dev/null +++ b/.github/scripts/test_filter_test_configs.py @@ -0,0 +1,118 @@ +#!/usr/bin/env python3 + +import os +import yaml +import json +from unittest import TestCase, main, mock +from filter_test_configs import ( + get_labels, + filter, + set_periodic_modes, + PREFIX, + VALID_TEST_CONFIG_LABELS, + SUPPORTED_PERIODICAL_MODES +) +import requests +from requests.models import Response +from typing import Any, Dict + + +def mocked_gh_get_labels_failed(url: str, headers: Dict[str, str]) -> Response: + mocked_response = Response() + mocked_response.status_code = requests.codes.bad_request + return mocked_response + + +def mocked_gh_get_labels(url: str, headers: Dict[str, str]) -> Response: + mocked_response = Response() + mocked_response.status_code = requests.codes.ok + mocked_response._content = b'[{"name": "foo"}, {"name": "bar"}, {}, {"name": ""}]' + return mocked_response + + +class TestConfigFilter(TestCase): + + def setUp(self) -> None: + os.environ["GITHUB_TOKEN"] = "GITHUB_TOKEN" + if os.getenv("GITHUB_OUTPUT"): + del os.environ["GITHUB_OUTPUT"] + + @mock.patch("filter_test_configs.requests.get", side_effect=mocked_gh_get_labels) + def test_get_labels(self, mocked_gh: Any) -> None: + labels = get_labels(pr_number=12345) + self.assertSetEqual({"foo", "bar"}, labels) + + @mock.patch("filter_test_configs.requests.get", side_effect=mocked_gh_get_labels_failed) + def test_get_labels_failed(self, mocked_gh: Any) -> None: + labels = get_labels(pr_number=54321) + self.assertFalse(labels) + + def test_filter(self) -> None: + mocked_labels = {f"{PREFIX}cfg", "ciflow/trunk", "plain-cfg"} + testcases = [ + { + "test_matrix": '{include: [{config: "default", runner: "linux"}]}', + "expected": '{"include": [{"config": "default", "runner": "linux"}]}', + "description": "No match, keep the same test matrix", + }, + { + "test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "plain-cfg"}]}', + "expected": '{"include": [{"config": "default", "runner": "linux"}, {"config": "plain-cfg"}]}', + "description": "No match because there is no prefix or suffix, keep the same test matrix", + }, + { + "test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "cfg", shard: 1}]}', + "expected": '{"include": [{"config": "cfg", "shard": 1}]}', + "description": "Found a match, only keep that", + }, + ] + + for case in testcases: + filtered_test_matrix = filter(yaml.safe_load(case["test_matrix"]), mocked_labels) + self.assertEqual(case["expected"], json.dumps(filtered_test_matrix)) + + def test_filter_with_valid_label(self) -> None: + 
mocked_labels = {f"{PREFIX}cfg", "ciflow/trunk"} + VALID_TEST_CONFIG_LABELS.add(f"{PREFIX}cfg") + + testcases = [ + { + "test_matrix": '{include: [{config: "default", runner: "linux"}]}', + "expected": '{"include": []}', + "description": "Found a valid label in the PR body, return the filtered test matrix", + }, + { + "test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "cfg", shard: 1}]}', + "expected": '{"include": [{"config": "cfg", "shard": 1}]}', + "description": "Found a match, only keep that", + }, + ] + + for case in testcases: + filtered_test_matrix = filter(yaml.safe_load(case["test_matrix"]), mocked_labels) + self.assertEqual(case["expected"], json.dumps(filtered_test_matrix)) + + + def test_set_periodic_modes(self) -> None: + testcases = [ + { + "test_matrix": "{include: []}", + "description": "Empty test matrix", + }, + { + "test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "cfg", runner: "macos"}]}', + "descripion": "Replicate each periodic mode in a different config", + }, + ] + + for case in testcases: + test_matrix = yaml.safe_load(case["test_matrix"]) + scheduled_test_matrix = set_periodic_modes(test_matrix) + self.assertEqual( + len(test_matrix["include"]) * len(SUPPORTED_PERIODICAL_MODES), + len(scheduled_test_matrix["include"]) + ) + + +if __name__ == '__main__': + main() diff --git a/.github/scripts/test_trymerge.py b/.github/scripts/test_trymerge.py index af3faf8cd094..7d5dfe7f0a3a 100755 --- a/.github/scripts/test_trymerge.py +++ b/.github/scripts/test_trymerge.py @@ -18,9 +18,12 @@ gh_get_team_members, read_merge_rules, validate_revert, + filter_pending_checks, + filter_failed_checks, GitHubPR, MergeRule, MandatoryChecksMissingError, + WorkflowCheckState, main as trymerge_main) from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo from typing import Any, List, Optional @@ -90,7 +93,7 @@ def mock_revert(repo: GitRepo, pr: GitHubPR, *, def mock_merge(pr_num: int, repo: GitRepo, dry_run: bool = False, - force: bool = False, + skip_mandatory_checks: bool = False, comment_id: Optional[int] = None, mandatory_only: bool = False, on_green: bool = False, @@ -127,6 +130,11 @@ def mocked_read_merge_rules(repo: Any, org: str, project: str) -> List[MergeRule ), ] + +def mocked_read_merge_rules_raise(repo: Any, org: str, project: str) -> List[MergeRule]: + raise RuntimeError("testing") + + class DummyGitRepo(GitRepo): def __init__(self) -> None: super().__init__(get_git_repo_dir(), get_git_remote_name()) @@ -139,7 +147,7 @@ def commit_message(self, ref: str) -> str: class TestGitHubPR(TestCase): def test_merge_rules_valid(self) -> None: - "Test that merge_rules.json can be parsed" + "Test that merge_rules.yaml can be parsed" repo = DummyGitRepo() self.assertGreater(len(read_merge_rules(repo, "pytorch", "pytorch")), 1) @@ -151,6 +159,14 @@ def test_match_rules(self, mocked_gql: Any, mocked_rmr: Any) -> None: repo = DummyGitRepo() self.assertTrue(find_matching_merge_rule(pr, repo) is not None) + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) + @mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules_raise) + def test_read_merge_rules_fails(self, mocked_gql: Any, mocked_rmr: Any) -> None: + "Tests that PR fails to read the merge rules" + pr = GitHubPR("pytorch", "pytorch", 77700) + repo = DummyGitRepo() + self.assertRaisesRegex(RuntimeError, "testing", lambda: find_matching_merge_rule(pr, repo)) + @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) 
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules) def test_lint_fails(self, mocked_gql: Any, mocked_rmr: Any) -> None: @@ -203,7 +219,7 @@ def test_internal_changes(self, mocked_gql: Any) -> None: def test_checksuites_pagination(self, mocked_gql: Any) -> None: "Tests that PR with lots of checksuits can be fetched" pr = GitHubPR("pytorch", "pytorch", 73811) - self.assertEqual(len(pr.get_checkrun_conclusions()), 104) + self.assertEqual(len(pr.get_checkrun_conclusions()), 107) @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) def test_comments_pagination(self, mocked_gql: Any) -> None: @@ -310,7 +326,7 @@ def test_main_force(self, mock_merge: Any, mock_parse_args: Any, mock_gh_get_inf mock_merge.assert_called_once_with(mock.ANY, mock.ANY, dry_run=mock.ANY, - force=True, + skip_mandatory_checks=True, comment_id=mock.ANY, on_green=False, land_checks=False, @@ -324,18 +340,35 @@ def test_main_merge(self, mock_merge: Any, mock_parse_args: Any, mock_gh_get_inf mock_merge.assert_called_once_with(mock.ANY, mock.ANY, dry_run=mock.ANY, - force=False, + skip_mandatory_checks=False, comment_id=mock.ANY, on_green=False, land_checks=False, mandatory_only=False) @mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql) - def test_revert_rules(self, mock_gql: Any) -> None: + @mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules) + def test_revert_rules(self, mock_gql: Any, mock_mr: Any) -> None: """ Tests that reverts from collaborators are allowed """ pr = GitHubPR("pytorch", "pytorch", 79694) repo = DummyGitRepo() self.assertIsNotNone(validate_revert(repo, pr, comment_id=1189459845)) + def test_checks_filter(self) -> None: + checks = [ + WorkflowCheckState(name="check0", status="SUCCESS", url="url0"), + WorkflowCheckState(name="check1", status="FAILURE", url="url1"), + WorkflowCheckState(name="check2", status="STARTUP_FAILURE", url="url2"), + WorkflowCheckState(name="check3", status=None, url="url3"), + ] + + checks_dict = {check.name : check for check in checks} + + pending_checks = filter_pending_checks(checks_dict) + failing_checks = filter_failed_checks(checks_dict) + + self.assertListEqual(failing_checks, [checks[1], checks[2]]) + self.assertListEqual(pending_checks, [checks[3]]) + if __name__ == "__main__": main() diff --git a/.github/scripts/trymerge.py b/.github/scripts/trymerge.py index 8be44f240162..697b4b94faac 100755 --- a/.github/scripts/trymerge.py +++ b/.github/scripts/trymerge.py @@ -9,6 +9,7 @@ from dataclasses import dataclass from datetime import datetime from functools import lru_cache +import yaml from typing import ( Any, Callable, @@ -20,10 +21,12 @@ Tuple, Union, cast, + NamedTuple ) from urllib.error import HTTPError from urllib.request import Request, urlopen from warnings import warn +from pathlib import Path from gitutils import ( GitRepo, @@ -33,10 +36,14 @@ ) from trymerge_explainer import ( TryMergeExplainer, - get_land_check_troubleshooting_message, get_revert_message, ) +class WorkflowCheckState(NamedTuple): + status: Optional[str] + url: str + name: str + GH_PR_REVIEWS_FRAGMENT = """ fragment PRReviews on PullRequestReviewConnection { nodes { @@ -144,6 +151,13 @@ checkSuites(first: 10) { ...PRCheckSuites } + status { + contexts { + context + state + targetUrl + } + } pushedDate oid } @@ -165,6 +179,7 @@ comments(last: 5) { nodes { bodyText + createdAt author { login } @@ -322,6 +337,7 @@ comments(last: 100, before: $cursor) { nodes { bodyText + createdAt author { login } @@ -391,9 +407,12 @@ 
r'https://github.com/(?P[^/]+)/(?P[^/]+)/pull/(?P[0-9]+)', re.MULTILINE ) +RE_PR_CC_LINE = re.compile(r'^cc:? @\w+.*\r?\n?$', re.MULTILINE) RE_DIFF_REV = re.compile(r'^Differential Revision:.+?(D[0-9]+)', re.MULTILINE) CIFLOW_LABEL = re.compile(r"^ciflow/.+") CIFLOW_TRUNK_LABEL = re.compile(r"^ciflow/trunk") +MERGE_RULE_PATH = Path(".github") / "merge_rules.yaml" + def _fetch_url(url: str, *, headers: Optional[Dict[str, str]] = None, @@ -485,12 +504,11 @@ def get_check_run_name_prefix(workflow_run: Any) -> str: else: return f'{workflow_run["workflow"]["name"]} / ' - def add_workflow_conclusions( checksuites: Any, get_next_checkruns_page: Callable[[List[Dict[str, Dict[str, Any]]], int, Any], Any], get_next_checksuites: Callable[[Any], Any] -) -> Dict[str, Tuple[str, str]]: +) -> Dict[str, WorkflowCheckState]: conclusions = {} def add_conclusions(edges: Any) -> None: @@ -504,7 +522,10 @@ def add_conclusions(edges: Any) -> None: # Do not override existing status with cancelled if workflow_conclusion == "CANCELLED" and workflow_name in conclusions: continue - conclusions[workflow_name] = (workflow_conclusion, node["url"]) + conclusions[workflow_name] = WorkflowCheckState( + name=workflow_name, + status=workflow_conclusion, + url=node["url"]) has_failing_check = False while checkruns is not None: for checkrun_node in checkruns["nodes"]: @@ -513,8 +534,11 @@ def add_conclusions(edges: Any) -> None: continue if checkrun_node["conclusion"] == 'FAILURE': has_failing_check = True - conclusions[f'{get_check_run_name_prefix(workflow_run)}{checkrun_node["name"]}'] = ( - checkrun_node["conclusion"], checkrun_node["detailsUrl"] + checkrun_name = f'{get_check_run_name_prefix(workflow_run)}{checkrun_node["name"]}' + conclusions[checkrun_name] = WorkflowCheckState( + name=checkrun_name, + status=checkrun_node["conclusion"], + url=checkrun_node["detailsUrl"] ) if bool(checkruns["pageInfo"]["hasNextPage"]): checkruns = get_next_checkruns_page(edges, edge_idx, checkruns) @@ -522,7 +546,11 @@ def add_conclusions(edges: Any) -> None: checkruns = None # Github doesn't set conclusion to failure if a job is still pending if workflow_run is not None and has_failing_check: - conclusions[workflow_run["workflow"]["name"]] = ("FAILURE", node["url"]) + workflow_name = workflow_run["workflow"]["name"] + conclusions[workflow_name] = WorkflowCheckState( + name=workflow_name, + status="FAILURE", + url=node["url"]) add_conclusions(checksuites["edges"]) while bool(checksuites["pageInfo"]["hasNextPage"]): @@ -558,6 +586,7 @@ def can_skip_internal_checks(pr: "GitHubPR", comment_id: Optional[int] = None) - @dataclass class GitHubComment: body_text: str + created_at: str author_login: str author_association: str editor_login: Optional[str] @@ -573,7 +602,7 @@ def __init__(self, org: str, project: str, pr_num: int) -> None: self.info = gh_get_pr_info(org, project, pr_num) self.changed_files: Optional[List[str]] = None self.labels: Optional[List[str]] = None - self.conclusions: Optional[Dict[str, Tuple[str, str]]] = None + self.conclusions: Optional[Dict[str, WorkflowCheckState]] = None self.comments: Optional[List[GitHubComment]] = None self._authors: Optional[List[Tuple[str, str]]] = None self._reviews: Optional[List[Tuple[str, str]]] = None @@ -701,7 +730,7 @@ def get_labels(self) -> List[str]: self.labels = labels return self.labels - def get_checkrun_conclusions(self) -> Dict[str, Tuple[str, str]]: + def get_checkrun_conclusions(self) -> Dict[str, WorkflowCheckState]: """ Returns dict of checkrun -> [conclusion, url] """ if 
self.conclusions is not None: return self.conclusions @@ -733,6 +762,13 @@ def get_pr_next_checksuites(checksuites: Any) -> Any: checksuites = orig_last_commit["checkSuites"] self.conclusions = add_workflow_conclusions(checksuites, get_pr_next_check_runs, get_pr_next_checksuites) + + # Append old style statuses(like ones populated by CircleCI or EasyCLA) to conclusions + if orig_last_commit["status"] and orig_last_commit["status"]["contexts"]: + for status in orig_last_commit["status"]["contexts"]: + name = status["context"] + self.conclusions[name] = WorkflowCheckState(name=name, status=status["state"], url=status["targetUrl"]) + return self.conclusions def get_authors(self) -> Dict[str, str]: @@ -775,6 +811,7 @@ def get_pr_url(self) -> str: def _comment_from_node(node: Any) -> GitHubComment: editor = node["editor"] return GitHubComment(body_text=node["bodyText"], + created_at=node["createdAt"] if "createdAt" in node else "", author_login=node["author"]["login"], author_association=node["authorAssociation"], editor_login=editor["login"] if editor else None, @@ -826,9 +863,15 @@ def has_internal_changes(self) -> bool: checks = self.get_checkrun_conclusions() if checks is None or checkrun_name not in checks: return False - return checks[checkrun_name][0] != "SUCCESS" - - def merge_ghstack_into(self, repo: GitRepo, force: bool, comment_id: Optional[int] = None) -> None: + return checks[checkrun_name].status != "SUCCESS" + + def merge_ghstack_into( + self, + repo: GitRepo, + skip_mandatory_checks: bool, + comment_id: Optional[int] = None, + land_check_commit: Optional[str] = None + ) -> None: assert self.is_ghstack_pr() # For ghstack, cherry-pick commits based from origin orig_ref = f"{repo.remote}/{re.sub(r'/head$', '/orig', self.head_ref())}" @@ -849,7 +892,12 @@ def merge_ghstack_into(self, repo: GitRepo, force: bool, comment_id: Optional[in continue commit_msg = pr.gen_commit_message(filter_ghstack=True) # Raises exception if matching rule is not found - find_matching_merge_rule(pr, repo, force=force, skip_internal_checks=can_skip_internal_checks(self, comment_id)) + find_matching_merge_rule( + pr, + repo, + skip_mandatory_checks=skip_mandatory_checks, + skip_internal_checks=can_skip_internal_checks(self, comment_id), + land_check_commit=land_check_commit) repo.cherry_pick(rev) repo.amend_commit_message(commit_msg) @@ -860,28 +908,41 @@ def gen_commit_message(self, filter_ghstack: bool = False) -> str: filters out ghstack info """ # Adding the url here makes it clickable within the Github UI approved_by_urls = ', '.join(prefix_with_github_url(login) for login in self.get_approved_by()) + # Remove "cc: " line from the message body + msg_body = re.sub(RE_PR_CC_LINE, "", self.get_body()) + if filter_ghstack: + msg_body = re.sub(RE_GHSTACK_DESC, "", msg_body) msg = self.get_title() + f" (#{self.pr_num})\n\n" - msg += self.get_body() if not filter_ghstack else re.sub(RE_GHSTACK_DESC, "", self.get_body()) + msg += msg_body msg += f"\nPull Request resolved: {self.get_pr_url()}\n" msg += f"Approved by: {approved_by_urls}\n" return msg def merge_into(self, repo: GitRepo, *, - force: bool = False, + skip_mandatory_checks: bool = False, dry_run: bool = False, - comment_id: Optional[int] = None) -> None: + comment_id: Optional[int] = None, + land_check_commit: Optional[str] = None) -> None: # Raises exception if matching rule is not found - find_matching_merge_rule(self, repo, force=force, skip_internal_checks=can_skip_internal_checks(self, comment_id)) - self.merge_changes(repo, force, comment_id) + 
find_matching_merge_rule( + self, + repo, + skip_mandatory_checks=skip_mandatory_checks, + skip_internal_checks=can_skip_internal_checks(self, comment_id), + land_check_commit=land_check_commit) + self.merge_changes(repo, skip_mandatory_checks, comment_id, land_check_commit=land_check_commit) repo.push(self.default_branch(), dry_run) if not dry_run: + if land_check_commit: + self.delete_land_time_check_branch(repo) gh_add_labels(self.org, self.project, self.pr_num, ["merged"]) def merge_changes(self, repo: GitRepo, - force: bool = False, + skip_mandatory_checks: bool = False, comment_id: Optional[int] = None, + land_check_commit: Optional[str] = None, branch: Optional[str] = None) -> None: branch_to_merge_into = self.default_branch() if branch is None else branch if repo.current_branch() != branch_to_merge_into: @@ -893,14 +954,25 @@ def merge_changes(self, repo._run_git("merge", "--squash", pr_branch_name) repo._run_git("commit", f"--author=\"{self.get_author()}\"", "-m", msg) else: - self.merge_ghstack_into(repo, force, comment_id=comment_id) + self.merge_ghstack_into( + repo, + skip_mandatory_checks, + comment_id=comment_id, + land_check_commit=land_check_commit + ) def create_land_time_check_branch(self, repo: GitRepo, branch: str, - force: bool = False, + skip_mandatory_checks: bool = False, comment_id: Optional[int] = None,) -> str: - self.merge_changes(repo, branch=branch, force=force, comment_id=comment_id) + orig_branch = repo.current_branch() + self.merge_changes( + repo, + branch=branch, + skip_mandatory_checks=skip_mandatory_checks, + comment_id=comment_id + ) land_check_branch = f'landchecks/{self.pr_num}' try: repo._run_git('branch', "-D", land_check_branch) @@ -909,8 +981,16 @@ def create_land_time_check_branch(self, repo._run_git('checkout', "-b", land_check_branch) repo._run_git('push', '-u', 'origin', land_check_branch, '--force') commit = repo.get_commit('HEAD').commit_hash + # Important, return to original branch + if repo.current_branch() != orig_branch: + repo.checkout(orig_branch) return commit + def delete_land_time_check_branch(self, + repo: GitRepo) -> None: + land_check_branch = f'landchecks/{self.pr_num}' + repo._run_git('push', 'origin', '-d', land_check_branch) + class MandatoryChecksMissingError(Exception): pass @@ -927,10 +1007,20 @@ class MergeRule: mandatory_checks_name: Optional[List[str]] -def read_merge_rules(repo: Optional[GitRepo], org: str, project: str) -> List[MergeRule]: - from pathlib import Path +def gen_new_issue_link( + org: str, + project: str, + labels: List[str], + template: str = "bug-report.yml" +) -> str: + labels_str = ",". join(labels) + return (f"https://github.com/{org}/{project}/issues/new?" 
+ f"labels={urllib.parse.quote(labels_str)}&" + f"template={urllib.parse.quote(template)}") + - repo_relative_rules_path = Path(".github") / "merge_rules.json" +def read_merge_rules(repo: Optional[GitRepo], org: str, project: str) -> List[MergeRule]: + repo_relative_rules_path = MERGE_RULE_PATH if repo is None: json_data = _fetch_url( f"https://api.github.com/repos/{org}/{project}/contents/{repo_relative_rules_path}", @@ -938,28 +1028,46 @@ def read_merge_rules(repo: Optional[GitRepo], org: str, project: str) -> List[Me reader=json.load, ) content = base64.b64decode(json_data["content"]) - return cast(List[MergeRule], json.loads(content, object_hook=lambda x: MergeRule(**x))) + return [MergeRule(**x) for x in yaml.safe_load(content)] else: rules_path = Path(repo.repo_dir) / repo_relative_rules_path if not rules_path.exists(): print(f"{rules_path} does not exist, returning empty rules") return [] with open(rules_path) as fp: - rc = json.load(fp, object_hook=lambda x: MergeRule(**x)) - return cast(List[MergeRule], rc) + rc = yaml.safe_load(fp) + return [MergeRule(**x) for x in rc] def find_matching_merge_rule(pr: GitHubPR, repo: Optional[GitRepo] = None, - force: bool = False, - skip_internal_checks: bool = False + skip_mandatory_checks: bool = False, + skip_internal_checks: bool = False, + land_check_commit: Optional[str] = None, ) -> MergeRule: """Returns merge rule matching to this pr or raises an exception""" changed_files = pr.get_changed_files() approved_by = set(pr.get_approved_by()) + issue_link = gen_new_issue_link( + org=pr.org, + project=pr.project, + labels=["module: ci"], + ) + reject_reason = f"No rule found to match PR. Please [report]{issue_link} this issue to DevX team." + rules = read_merge_rules(repo, pr.org, pr.project) - reject_reason = f"PR {pr.pr_num} does not match merge rules" - # Used to determine best rejection reason + if not rules: + reject_reason = f"Rejecting the merge as no rules are defined for the repository in {MERGE_RULE_PATH}" + raise RuntimeError(reject_reason) + + # PRs can fail multiple merge rules, but it only needs to pass one rule to be approved. + # If it fails all rules, we need to find the rule that it came closest to passing and report + # that to the dev. + # + # reject_reason_score ranks rules by relevancy. The higher the score, the more relevant the + # rule & rejection reason, and we only care about the most relevant rule/reason + # + # reject_reason_score intrepretation: # Score 0 to 10K - how many files rule matched # Score 10K - matched all files, but no overlapping approvers # Score 20K - matched all files and approvers, but mandatory checks are pending @@ -969,6 +1077,8 @@ def find_matching_merge_rule(pr: GitHubPR, rule_name = rule.name patterns_re = patterns_to_regex(rule.patterns) non_matching_files = [] + + # Does this rule apply to all the files? for fname in changed_files: if not patterns_re.match(fname): non_matching_files.append(fname) @@ -976,16 +1086,21 @@ def find_matching_merge_rule(pr: GitHubPR, num_matching_files = len(changed_files) - len(non_matching_files) if num_matching_files > reject_reason_score: reject_reason_score = num_matching_files - reject_reason = (f"{num_matching_files} files matched rule {rule_name}, but there are still non-matching files: " + - f"{','.join(non_matching_files[:5])}{', ...' if len(non_matching_files) > 5 else ''}") + reject_reason = "\n".join(( + f"Not all files match rule `{rule_name}`." 
+ f"{num_matching_files} files matched, but there are still non-matching files:" + f"{','.join(non_matching_files[:5])}{', ...' if len(non_matching_files) > 5 else ''}" + )) continue + # If rule needs approvers but PR has not been reviewed, skip it if len(rule.approved_by) > 0 and len(approved_by) == 0: if reject_reason_score < 10000: reject_reason_score = 10000 - reject_reason = f"Matched rule {rule_name}, but PR #{pr.pr_num} has not been reviewed yet" + reject_reason = f"PR #{pr.pr_num} has not been reviewed yet (Rule {rule_name})" continue + # Does the PR have the required approvals for this rule? rule_approvers_set = set() for approver in rule.approved_by: if "/" in approver: @@ -998,35 +1113,51 @@ def find_matching_merge_rule(pr: GitHubPR, if len(approvers_intersection) == 0 and len(rule_approvers_set) > 0: if reject_reason_score < 10000: reject_reason_score = 10000 - reject_reason = (f"Matched rule {rule_name}, but PR #{pr.pr_num} was not reviewed yet by any of: " + - f"{', '.join(list(rule_approvers_set)[:5])}{', ...' if len(rule_approvers_set) > 5 else ''}") + reject_reason = "\n".join(( + f"Approval needed from one of the following (Rule '{rule_name}'):", + f"{', '.join(list(rule_approvers_set)[:5])}{', ...' if len(rule_approvers_set) > 5 else ''}" + )) continue + + # Does the PR pass the checks required by this rule? mandatory_checks = rule.mandatory_checks_name if rule.mandatory_checks_name is not None else [] - checks = pr.get_checkrun_conclusions() - required_checks = filter(lambda x: force is False or "CLA Check" in x, mandatory_checks) + checks = get_combined_checks_from_pr_and_land_validation(pr, land_check_commit) + required_checks = filter(lambda x: skip_mandatory_checks is False or "EasyCLA" in x, mandatory_checks) [pending_checks, failed_checks] = categorize_checks(checks, required_checks) + hud_link = f"https://hud.pytorch.org/{pr.org}/{pr.project}/commit/{pr.last_commit()['oid']}" if len(failed_checks) > 0: if reject_reason_score < 30000: reject_reason_score = 30000 - reject_reason = ("Refusing to merge as mandatory check(s) " + - checks_to_str(failed_checks) + f" failed for rule {rule_name}") + reject_reason = "\n".join(( + f"The following mandatory check(s) failed (Rule `{rule_name}`):", + *checks_to_markdown_bullets(failed_checks), + "", + f"Dig deeper by [viewing the failures on hud]({hud_link})" + )) continue elif len(pending_checks) > 0: if reject_reason_score < 20000: reject_reason_score = 20000 - reject_reason = f"Refusing to merge as mandatory check(s) {checks_to_str(pending_checks)}" - reject_reason += f" are pending/not yet run for rule {rule_name}" + reject_reason = "\n".join(( + f"The following mandatory check(s) are pending/not yet run (Rule `{rule_name}`):", + *checks_to_markdown_bullets(pending_checks), + "", + f"Dig deeper by [viewing the pending checks on hud]({hud_link})" + )) continue + if not skip_internal_checks and pr.has_internal_changes(): raise RuntimeError("This PR has internal changes and must be landed via Phabricator") + return rule + if reject_reason_score == 20000: raise MandatoryChecksMissingError(reject_reason) raise RuntimeError(reject_reason) -def get_land_checkrun_conclusions(org: str, project: str, commit: str) -> Dict[str, Tuple[str, str]]: +def get_land_checkrun_conclusions(org: str, project: str, commit: str) -> Dict[str, WorkflowCheckState]: def get_commit_next_check_runs(edges: List[Dict[str, Dict[str, Any]]], edge_idx: int, checkruns: Any) -> Any: rc = gh_graphql(GH_GET_COMMIT_NEXT_CHECK_RUNS, @@ -1055,18 +1186,48 @@ def 
get_commit_next_checksuites(checksuites: Any) -> Any: def checks_to_str(checks: List[Tuple[str, Optional[str]]]) -> str: return ", ".join(f"[{c[0]}]({c[1]})" if c[1] is not None else c[0] for c in checks) -def pr_get_checks_with_lambda(pr: GitHubPR, status_check: Callable[[Optional[str]], bool]) -> List[Tuple[str, str]]: - checks = pr.get_checkrun_conclusions() - return [(name, status[1]) for name, status in checks.items() if status_check(status[0])] +def checks_to_markdown_bullets(checks: List[Tuple[str, Optional[str]]]) -> List[str]: + return [f"- [{c[0]}]({c[1]})" if c[1] is not None else f"- {c[0]}" for c in checks] + +def get_combined_checks_from_pr_and_land_validation( + pr: GitHubPR, + land_check_commit: Optional[str] +) -> Dict[str, WorkflowCheckState]: + """ + Combines checks from both the PR and land validation to get a holistic view + of all checks. + + This helps us cover the corner case where certain workflows may have been + requested on the PR but are not part of land validation (e.g. nightly + builds) or are implicitly run on PRs but not on land validation branches + (like CLA Checks). -def pr_get_pending_checks(pr: GitHubPR) -> List[Tuple[str, str]]: - return pr_get_checks_with_lambda(pr, lambda x: x is None) + At the same time, we prioritize the signal workflows which do run on land + validation. + E.g. if a workflow fails on the PR but passes on land validation then we'd + use the successful result from the land validation. + """ -def pr_get_failed_checks(pr: GitHubPR) -> List[Tuple[str, str]]: - return pr_get_checks_with_lambda(pr, lambda x: x in ["FAILURE", "STARTUP_FAILURE"]) + pr_checks = pr.get_checkrun_conclusions() + land_validation_checks = get_land_checkrun_conclusions(pr.org, pr.project, land_check_commit) if land_check_commit else {} + # Merge the two checks together. 
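# (Illustrative, hypothetical check name: if "pull / linux-bionic / test" is FAILURE in pr_checks
# but SUCCESS in land_validation_checks, the later dict expansion below wins and SUCCESS is kept.)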
Land validation check results (if any) overwrite pr check results + merged_checks = {**pr_checks, **land_validation_checks} # explanation: https://stackoverflow.com/a/26853961/21539 + return merged_checks + +def filter_checks_with_lambda( + checks: Dict[str, WorkflowCheckState], + status_filter: Callable[[Optional[str]], bool] +) -> List[WorkflowCheckState]: + return [check for check in checks.values() if status_filter(check.status)] + +def filter_pending_checks(checks: Dict[str, WorkflowCheckState]) -> List[WorkflowCheckState]: + return filter_checks_with_lambda(checks, lambda x: x is None) + +def filter_failed_checks(checks: Dict[str, WorkflowCheckState]) -> List[WorkflowCheckState]: + return filter_checks_with_lambda(checks, lambda x: x in ["FAILURE", "STARTUP_FAILURE"]) def validate_revert(repo: GitRepo, pr: GitHubPR, *, comment_id: Optional[int] = None) -> Tuple[str, str]: @@ -1087,7 +1248,7 @@ def validate_revert(repo: GitRepo, pr: GitHubPR, *, skip_internal_checks = can_skip_internal_checks(pr, comment_id) # Raises exception if matching rule is not found, but ignores all status checks - find_matching_merge_rule(pr, repo, force=True, skip_internal_checks=skip_internal_checks) + find_matching_merge_rule(pr, repo, skip_mandatory_checks=True, skip_internal_checks=skip_internal_checks) commit_sha = pr.get_merge_commit() if commit_sha is None: commits = repo.commits_resolving_gh_pr(pr.pr_num) @@ -1129,8 +1290,8 @@ def post_comment(msg: str) -> None: def prefix_with_github_url(suffix_str: str) -> str: return f"https://github.com/{suffix_str}" -def check_for_sev(org: str, project: str, force: bool) -> None: - if force: +def check_for_sev(org: str, project: str, skip_mandatory_checks: bool) -> None: + if skip_mandatory_checks: return response = cast( Dict[str, Any], @@ -1164,24 +1325,22 @@ def validate_land_time_checks(org: str, project: str, commit: str) -> None: def has_label(labels: List[str], pattern: Pattern[str] = CIFLOW_LABEL) -> bool: return len(list(filter(pattern.match, labels))) > 0 -def categorize_checks(check_runs: Dict[str, Tuple[str, str]], +def categorize_checks(check_runs: Dict[str, WorkflowCheckState], required_checks: Iterable[str]) -> Tuple[List[Tuple[str, Optional[str]]], List[Tuple[str, Optional[str]]]]: pending_checks: List[Tuple[str, Optional[str]]] = [] failed_checks: List[Tuple[str, Optional[str]]] = [] for checkname in required_checks: if checkname not in check_runs: pending_checks.append((checkname, None)) - elif check_runs[checkname][0] is None: - pending_checks.append((checkname, check_runs[checkname][1])) - elif (check_runs[checkname][0].upper() != 'SUCCESS' - and check_runs[checkname][0].upper() != 'SKIPPED' - and check_runs[checkname][0].upper() != 'NEUTRAL'): - failed_checks.append((checkname, check_runs[checkname][1])) + elif check_runs[checkname].status is None: + pending_checks.append((checkname, check_runs[checkname].url)) + elif (str(check_runs[checkname].status).upper() not in ['SUCCESS', 'SKIPPED', 'NEUTRAL']): + failed_checks.append((checkname, check_runs[checkname].url)) return (pending_checks, failed_checks) def merge(pr_num: int, repo: GitRepo, dry_run: bool = False, - force: bool = False, + skip_mandatory_checks: bool = False, comment_id: Optional[int] = None, mandatory_only: bool = False, on_green: bool = False, @@ -1192,65 +1351,100 @@ def merge(pr_num: int, repo: GitRepo, org, project = repo.gh_owner_and_name() pr = GitHubPR(org, project, pr_num) initial_commit_sha = pr.last_commit()['oid'] - explainer = TryMergeExplainer(force, on_green, 
land_checks, pr.get_labels(), pr.pr_num, org, project) + explainer = TryMergeExplainer(skip_mandatory_checks, on_green, land_checks, pr.get_labels(), pr.pr_num, org, project) on_green, land_checks = explainer.get_flags() land_check_commit = None - check_for_sev(org, project, force) + check_for_sev(org, project, skip_mandatory_checks) - if force or can_skip_internal_checks(pr, comment_id): + if skip_mandatory_checks or can_skip_internal_checks(pr, comment_id): # do not wait for any pending signals if PR is closed as part of co-development process gh_post_pr_comment(org, project, pr.pr_num, explainer.get_merge_message()) - return pr.merge_into(repo, dry_run=dry_run, force=force, comment_id=comment_id) + return pr.merge_into( + repo, + dry_run=dry_run, + skip_mandatory_checks=skip_mandatory_checks, + comment_id=comment_id + ) - if land_checks: - land_check_commit = pr.create_land_time_check_branch(repo, 'viable/strict', force=force, comment_id=comment_id) + # Important: check for merge rule once before starting land checks + # because we want to make sure that only approved PRs can start CI + # jobs. If there's missing approval, a RuntimeError will be raised + # here to stop the merge process right away + find_matching_merge_rule(pr, repo, skip_mandatory_checks=True) + + if land_checks and not dry_run: + land_check_commit = pr.create_land_time_check_branch( + repo, + 'viable/strict', + skip_mandatory_checks=skip_mandatory_checks, + comment_id=comment_id + ) gh_post_pr_comment(org, project, pr.pr_num, explainer.get_merge_message(land_check_commit)) if (datetime.utcnow() - pr.last_pushed_at()).days > stale_pr_days: - raise RuntimeError("This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.") + if land_checks and not dry_run: + pr.delete_land_time_check_branch(repo) + raise RuntimeError(f"This PR is too stale; the last push date was more than {stale_pr_days} days ago. " + "Please rebase and try again. You can rebase by leaving the following comment on this PR:\n" + "`@pytorchbot rebase`") start_time = time.time() last_exception = '' elapsed_time = 0.0 while elapsed_time < timeout_minutes * 60: - check_for_sev(org, project, force) + check_for_sev(org, project, skip_mandatory_checks) current_time = time.time() elapsed_time = current_time - start_time print(f"Attempting merge of https://github.com/{org}/{project}/pull/{pr_num} ({elapsed_time / 60} minutes elapsed)") pr = GitHubPR(org, project, pr_num) if initial_commit_sha != pr.last_commit()['oid']: + if land_checks and not dry_run: + pr.delete_land_time_check_branch(repo) raise RuntimeError("New commits were pushed while merging. Please rerun the merge command.") try: find_matching_merge_rule(pr, repo) - pending = pr_get_pending_checks(pr) - failing = pr_get_failed_checks(pr) + checks = get_combined_checks_from_pr_and_land_validation(pr, land_check_commit) + pending = filter_pending_checks(checks) + failing = filter_failed_checks(checks) # HACK until GitHub will be better about surfacing those - startup_failures = pr_get_checks_with_lambda(pr, lambda x: x == "STARTUP_FAILURE") + startup_failures = filter_checks_with_lambda(checks, lambda status: status == "STARTUP_FAILURE") if len(startup_failures) > 0: raise RuntimeError(f"{len(failing)} STARTUP failures reported, please check workflows syntax! 
" + - ' ,'.join(f"[{x[0]}]({x[1]})" for x in startup_failures[:5])) + ' ,'.join(f"[{x.name}]({x.url})" for x in startup_failures[:5])) # END of HACK if (not mandatory_only and on_green) and len(failing) > 0: raise RuntimeError(f"{len(failing)} additional jobs have failed, first few of them are: " + - ' ,'.join(f"[{x[0]}]({x[1]})" for x in failing[:5])) + ' ,'.join(f"[{x.name}]({x.url})" for x in failing[:5])) if (not mandatory_only and on_green) and len(pending) > 0: raise MandatoryChecksMissingError(f"Still waiting for {len(pending)} additional jobs to finish, " + - f"first few of them are: {' ,'.join(x[0] for x in pending[:5])}") + f"first few of them are: {' ,'.join(x.name for x in pending[:5])}") if land_checks and land_check_commit is not None: validate_land_time_checks(org, project, land_check_commit) - return pr.merge_into(repo, dry_run=dry_run, force=force, comment_id=comment_id) + return pr.merge_into( + repo, + dry_run=dry_run, + skip_mandatory_checks=skip_mandatory_checks, + comment_id=comment_id, + land_check_commit=land_check_commit + ) except MandatoryChecksMissingError as ex: last_exception = str(ex) print(f"Merge of https://github.com/{org}/{project}/pull/{pr_num} failed due to: {ex}. Retrying in 5 min") time.sleep(5 * 60) + except RuntimeError: + if land_checks and not dry_run: + pr.delete_land_time_check_branch(repo) + raise # Finally report timeout back msg = f"Merged timed out after {timeout_minutes} minutes. Please contact the pytorch_dev_infra team." msg += f"The last exception was: {last_exception}" if not dry_run: + if land_checks: + pr.delete_land_time_check_branch(repo) gh_add_labels(org, project, pr_num, ["land-failed"]) raise RuntimeError(msg) @@ -1260,13 +1454,26 @@ def main() -> None: org, project = repo.gh_owner_and_name() pr = GitHubPR(org, project, args.pr_num) - def handle_exception(e: Exception, msg: str = "Merge failed") -> None: - msg += f" due to {e}" + def handle_exception(e: Exception, title: str = "Merge failed") -> None: + exception = f"**Reason**: {e}" + + internal_debugging = "" run_url = os.getenv("GH_RUN_URL") if run_url is not None: - msg += f"\nRaised by {run_url}" - if args.land_checks: - msg += get_land_check_troubleshooting_message() + # Hide this behind a collapsed bullet since it's not helpful to most devs + internal_debugging = "\n".join(( + "
<details><summary>Details for Dev Infra team</summary>", + f"Raised by <a href=\"{run_url}\">workflow job</a>", + "</details>
" + )) + + msg = "\n".join(( + f"## {title}", + f"{exception}", + "", + f"{internal_debugging}" + )) + gh_post_pr_comment(org, project, args.pr_num, msg, dry_run=args.dry_run) import traceback traceback.print_exc() @@ -1290,7 +1497,7 @@ def handle_exception(e: Exception, msg: str = "Merge failed") -> None: try: merge(args.pr_num, repo, dry_run=args.dry_run, - force=args.force, + skip_mandatory_checks=args.force, comment_id=args.comment_id, on_green=args.on_green, mandatory_only=args.on_mandatory, diff --git a/.github/scripts/trymerge_explainer.py b/.github/scripts/trymerge_explainer.py index e59307f10854..a7be2f78c4bc 100644 --- a/.github/scripts/trymerge_explainer.py +++ b/.github/scripts/trymerge_explainer.py @@ -9,12 +9,10 @@ CIFLOW_TRUNK_LABEL = re.compile(r"^ciflow/trunk") OFFICE_HOURS_LINK = "https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours" -CONTACT_US = f"Please reach out to the [PyTorch DevX Team]({OFFICE_HOURS_LINK}) with feedback or questions!" +CONTACT_US = f"Questions? Feedback? Please reach out to the [PyTorch DevX Team]({OFFICE_HOURS_LINK})" ALTERNATIVES = ( - "If this is not the intended behavior, feel free to use some " - + f"of the other merge options in the [wiki]({BOT_COMMANDS_WIKI})." + f"Learn more about merging in the [wiki]({BOT_COMMANDS_WIKI})." ) -LAND_CHECK_ROLLOUT = "https://github.com/pytorch/test-infra/blob/main/torchci/lib/bot/rolloutUtils.ts#L1-L34" def has_label(labels: List[str], pattern: Pattern[str] = CIFLOW_LABEL) -> bool: @@ -62,68 +60,49 @@ def get_flags(self) -> Tuple[bool, bool]: def _get_flag_msg(self) -> str: if self.force: - return " the force (-f) flag." + return "Your change will be merged immediately since you used the force (-f) flag, " + \ + "**bypassing any CI checks** (ETA: 1-5 minutes)." elif self.on_green: - return " the green (-g) flag." + return "Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours)." elif self.land_checks: - return ( - " the land checks (-l) flag." - + " If you did not specify this flag yourself, " - + f" you are likely enrolled in the [land checks rollout]({LAND_CHECK_ROLLOUT})." - ) + flag_msg = \ + "**The `-l` land checks flag is deprecated and no longer needed.** Instead we now automatically " + \ + "add the `ciflow\\trunk` label to your PR once it's approved\n\n" + + if self.has_trunk_label: + flag_msg += "Your change will be merged once all checks on your PR pass (ETA 0-4 Hours)." + else: + flag_msg += "Your change will be merged once the land checks pass (**ETA 4 Hours**)." + + return flag_msg else: - return "out a flag." + return "Your change will be merged once all checks pass (ETA 0-4 Hours)." def _get_land_check_progress(self, commit: Optional[str]) -> str: if commit is not None: return ( " and land check " - + f"progress [here](https://hud.pytorch.org/{self.org}/{self.project}/commit/{commit})" + + f"progress here" ) else: return "" - def _get_flag_explanation_message(self) -> str: - if self.force: - return "This means your change will be merged **immediately**, bypassing any CI checks (ETA: 1-5 minutes)." - elif self.on_green: - return "This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours)." - elif self.land_checks: - if self.has_trunk_label: - land_check_msg_suffix = "have passed since you have added the `ciflow/trunk` label to your PR (ETA 0-4 Hours)." - else: - land_check_msg_suffix = ( - "and the land checks have passed (**ETA 4 Hours**). 
" - ) - land_check_msg_suffix += "If you need to coordinate lands between different changes and cannot risk a land race, " - land_check_msg_suffix += "please add the `ciflow/trunk` label to your PR and wait for signal to complete, " - land_check_msg_suffix += "and then land your changes in proper order." - land_check_msg_suffix += ( - " Having `trunk`, `pull`, and `Lint` pre-run on a " - ) - land_check_msg_suffix += ( - "PR will bypass land checks and the ETA should be immediate." - ) - - return ( - "This means that your change will be merged once all checks on your PR " - + land_check_msg_suffix - ) - else: - return "This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours)." - def get_merge_message(self, commit: Optional[str] = None) -> str: - message_prefix = "@pytorchbot successfully started a merge job." - progress_links = f"Check the current status [here]({os.getenv('GH_RUN_URL')}){self._get_land_check_progress(commit)}." - flag_message = f"The merge job was triggered with{self._get_flag_msg()}" - explanation_message = self._get_flag_explanation_message() - - msg = message_prefix + " " - msg += progress_links + "\n" - msg += flag_message + " " - msg += explanation_message + " " - msg += ALTERNATIVES + "\n" + title = "### Merge started" + main_message = self._get_flag_msg() + + advanced_debugging = "\n".join(( + "
<details><summary>Advanced Debugging</summary>", + "Check the merge workflow status ", + f"<a href=\"{os.getenv('GH_RUN_URL')}\">here</a>{self._get_land_check_progress(commit)}", + "</details>
" + )) + + msg = title + "\n" + msg += main_message + "\n\n" + msg += ALTERNATIVES + "\n\n" msg += CONTACT_US + msg += advanced_debugging return msg @@ -134,13 +113,3 @@ def get_revert_message(org: str, project: str, pr_num: int) -> str: ) msg += CONTACT_US return msg - - -def get_land_check_troubleshooting_message() -> str: - return ( - " If you believe this is an error, you can use the old behavior with `@pytorchbot merge -g`" - + " (optionally with the `ciflow/trunk` to get land checks)" - + ' or use `@pytorchbot merge -f "some reason here"`.' - + f" For more information, see the [bot wiki]({BOT_COMMANDS_WIKI}). \n" - + CONTACT_US - ) diff --git a/.github/scripts/tryrebase.py b/.github/scripts/tryrebase.py index 1b69f653e525..2e8987e9faaa 100755 --- a/.github/scripts/tryrebase.py +++ b/.github/scripts/tryrebase.py @@ -69,6 +69,7 @@ def rebase_ghstack_onto(pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: push_result = ghstack_result.stdout.decode("utf-8") print(push_result) if ghstack_result.returncode != 0: + print(ghstack_result.stderr.decode("utf-8")) raise Exception(f"\n```{push_result}```") # The contents of a successful push result should look like: # Summary of changes (ghstack 0.6.0) diff --git a/.github/scripts/update_commit_hashes.py b/.github/scripts/update_commit_hashes.py index 5dad5877ca4a..4b638cf11c90 100644 --- a/.github/scripts/update_commit_hashes.py +++ b/.github/scripts/update_commit_hashes.py @@ -136,6 +136,7 @@ def main() -> None: ) with open(f".github/ci_commit_pins/{args.repo_name}.txt", "r+") as f: old_hash = f.read().strip() + subprocess.run(f"git checkout {old_hash}".split(), cwd=args.repo_name) f.seek(0) f.truncate() f.write(f"{hash}\n") diff --git a/.github/scripts/wait_for_ssh_to_drain.sh b/.github/scripts/wait_for_ssh_to_drain.sh deleted file mode 100755 index f33d80764033..000000000000 --- a/.github/scripts/wait_for_ssh_to_drain.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/usr/bin/env bash - -set -eou pipefail - -echo "Holding runner for 2 hours until all ssh sessions have logged out" -for _ in $(seq 1440); do - # Break if no ssh session exists anymore - if [ "$(who)" = "" ]; then - break - fi - echo "." - sleep 5 -done diff --git a/.github/templates/common.yml.j2 b/.github/templates/common.yml.j2 index f0f3e3a430f7..edb652ff16ce 100644 --- a/.github/templates/common.yml.j2 +++ b/.github/templates/common.yml.j2 @@ -1,10 +1,8 @@ {%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v5" -%} {%- set download_artifact_s3_action = "seemethere/download-artifact-s3@v4" -%} +{%- set upload_artifact_action = "actions/upload-artifact@v3" -%} +{%- set download_artifact_action = "actions/download-artifact@v3" -%} -{# squid_proxy is an private ELB that only available for GHA custom runners #} -{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%} -{# squid_no_proxy is a list of common set of fixed domains or IPs that we don't need to proxy. 
See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #} -{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%} {%- set timeout_minutes = 240 -%} # NOTE: If testing pytorch/builder changes you can change this variable to change what pytorch/builder reference @@ -17,43 +15,6 @@ concurrency: cancel-in-progress: true {%- endmacro -%} -{%- macro add_retry_to_env() -%} - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } -{%- endmacro -%} - -{%- macro gen_dispatch_rules(on_pull_request, is_scheduled, ciflow_labels, branches = ['master', 'main', 'release/*'], enable_doc_jobs = True) -%} -on: -{%- if on_pull_request %} - pull_request: -{%- endif %} - push: -{%- if enable_doc_jobs and is_scheduled %} - tags: - # NOTE: Binary build pipelines should only get triggered on release candidate builds - # Release candidate tags look like: v1.11.0-rc1 - - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ -{%- endif %} -{%- for label in ciflow_labels | sort %} - {%- if loop.first and not (enable_doc_jobs and is_scheduled) %} - tags: - {%- endif %} - - '!{{ label }}/*' -{%- endfor %} -{%- if not is_scheduled %} - branches: -{%- for branch in branches %} - - !{{ branch }} -{%- endfor %} -{%- endif %} -{%- if is_scheduled %} - schedule: - - cron: !{{ is_scheduled }} -{%- endif %} - workflow_dispatch: -{%- endmacro -%} - {%- macro display_ec2_information() -%} - name: Display EC2 information shell: bash @@ -71,52 +32,6 @@ on: echo "system info $(uname -a)" {%- endmacro -%} -{%- macro parse_ref(pytorch_directory="") -%} - - name: Parse ref - shell: bash -{%- if pytorch_directory %} - working-directory: !{{ pytorch_directory }} -{%- endif %} - id: parse-ref - run: ./.github/scripts/parse_ref.py -{%- endmacro -%} - -{%- macro upload_test_statistics(build_environment, when="always()", pytorch_directory="", needs_credentials=False) -%} - - name: Upload test statistics -{%- if pytorch_directory %} - working-directory: !{{ pytorch_directory }} -{%- endif %} - if: !{{ when }} - env: - AWS_DEFAULT_REGION: us-east-1 - GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} - BRANCH: ${{ steps.parse-ref.outputs.branch }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - TAG: ${{ steps.parse-ref.outputs.tag }} - WORKFLOW_ID: '${{ github.run_id }}' - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} -{%- if needs_credentials %} - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} -{%- endif %} - shell: bash - run: | - set -x - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") - export GHA_WORKFLOW_JOB_ID - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test -{%- endmacro -%} - -{%- macro chown_dir(dir) -%} - - name: Chown artifacts - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "!{{ dir }}:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
-{%- endmacro -%} {%- macro setup_ec2_windows() -%} !{{ display_ec2_information() }} @@ -136,27 +51,6 @@ on: Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore {%- endmacro -%} -{%- macro setup_ec2_linux() -%} - - name: Checkout PyTorch - uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - - name: Setup Linux - uses: ./.github/actions/setup-linux - - name: Chown workspace - run: | - !{{ add_retry_to_env() }} - retry docker pull "${ALPINE_IMAGE}" - # Ensure the working directory gets chowned back to the current user - docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Clean workspace - run: | - rm -rf "${GITHUB_WORKSPACE}" - mkdir "${GITHUB_WORKSPACE}" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} -{%- endmacro -%} - {%- macro setup_rocm_linux() -%} - name: Clean workspace run: | @@ -184,7 +78,12 @@ on: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -197,29 +96,6 @@ on: env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" {%- endmacro -%} -{%- macro teardown_ec2_linux(pytorch_directory="") -%} - - name: Hold runner for 2 hours or until ssh sessions have drained -{%- if pytorch_directory %} - working-directory: !{{ pytorch_directory }} -{%- endif %} - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af -{%- endmacro -%} - {%- macro teardown_rocm_linux() -%} - name: Kill containers, clean up images if: always() @@ -260,186 +136,6 @@ on: {%- endif %} {%- endmacro -%} -{%- macro upload_downloaded_files(name, config=None, shard=None, num_shards=None, runner=None, artifact_name="", use_s3=True, when="always()") -%} - - name: Zip JSONs for upload - if: !{{ when }} - env: -{%- if name == 'linux' or name == 'windows' or name == 'macos' %} - FILE_SUFFIX: '${{ github.job }}-!{{ config }}-!{{ shard }}-!{{ num_shards }}-!{{ runner }}'{%- else %} - FILE_SUFFIX: '!{{ name }}-${{ github.job }}' -{%- endif %} -{%- if name == 'windows' %} - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json' -{%- else %} - run: | - # Remove any previous test jsons if they exist - rm -f test-jsons-*.zip - zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' -{%- endif %} -{%- if use_s3 %} - - uses: !{{ upload_artifact_s3_action }} - name: Store Test Downloaded JSONs on S3 -{%- else %} - - uses: actions/upload-artifact@v2 - name: Store Test Downloaded JSONs on Github -{%- endif %} - if: !{{ when }} - with: -{%- if artifact_name != "" %} - name: !{{ artifact_name }} -{%- endif %} - retention-days: 14 - if-no-files-found: warn - path: - test-jsons-*.zip -{%- endmacro -%} - -{%- macro upload_test_reports(name, config=None, shard=None, num_shards=None, runner=None, artifact_name="", use_s3=True) -%} - - name: Zip test reports for upload - if: always() - env: -{%- if name == 'linux' or name == 'windows' or name == 'macos' %} - FILE_SUFFIX: '${{ github.job }}-!{{ config }}-!{{ shard }}-!{{ num_shards }}-!{{ runner }}' -{%- else %} - FILE_SUFFIX: '!{{ name }}-${{ github.job }}' -{%- endif %} -{%- if name == 'windows' %} - shell: powershell - run: | - # -ir => recursive include all files in pattern - 7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' -{%- else %} - run: | - # Remove any previous test reports if they exist - rm -f test-reports-*.zip - zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -{%- endif %} -{%- if use_s3 %} - - uses: !{{ upload_artifact_s3_action }} - name: Store Test Reports on S3 -{%- else %} - - uses: actions/upload-artifact@v2 - name: Store Test Reports on Github -{%- endif %} - if: always() - with: -{%- if artifact_name != "" %} - name: !{{ artifact_name }} -{%- endif %} - retention-days: 14 - if-no-files-found: error - path: - test-reports-*.zip -{%- endmacro -%} - -{%- macro upload_cores(artifact_name="coredumps", config=None, shard=None, use_s3=True) -%} -{%- if use_s3 %}- uses: !{{ upload_artifact_s3_action }} - name: Store Core dumps on S3 -{%- else %}- uses: actions/upload-artifact@v2 - name: Store Core dumps on Github -{%- endif %} - if: failure() - with: -{%- if config != "" and shard != "" %} - name: !{{ artifact_name }}-!{{ config }}-!{{ shard }} -{%- else %} - name: !{{ artifact_name }} -{%- endif %} - retention-days: 14 - if-no-files-found: ignore - path: - ./**/core.[1-9]* -{%- endmacro -%} - -{%- macro render_test_results() -%} - - name: Install render_test_results dependencies - if: always() - shell: bash - run: | - python3 -m pip install junitparser==2.1.1 rich==10.9.0 - - name: "[[ Click me for rendered test results (useful for 
finding failing tests) ]]" - if: always() - shell: bash - # Encoding is weird on windows, just try to default to utf-8 if possible - env: - PYTHONIOENCODING: "utf-8" - run: | - python3 tools/render_junit.py test/ -{%- endmacro -%} - -{%- macro calculate_docker_image(always_rebuild) -%} - - name: Calculate docker image tag - id: calculate-tag - run: | - DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker) - echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}" - echo "::set-output name=docker_tag::${DOCKER_TAG}" - echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" - - name: Check if image should be built - id: check - env: - BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }} - run: | - set -x -{%- if not always_rebuild %} - # Check if image already exists, if it does then skip building it - if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then - exit 0 - fi - if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then - # if we're on the base branch then use the parent commit - MERGE_BASE=$(git rev-parse HEAD~) - else - # otherwise we're on a PR, so use the most recent base commit - MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") - fi - # Covers the case where a previous tag doesn't exist for the tree - # this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly - if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then - echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit" - exit 1 - fi - PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker") - # If no image exists but the hash is the same as the previous hash then we should error out here - if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then - echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch" - echo " contact the PyTorch team to restore the original images" - exit 1 - fi -{%- endif %} - echo ::set-output name=rebuild::yes - - name: Build and push docker image - if: ${{ steps.check.outputs.rebuild }} - env: - DOCKER_SKIP_S3_UPLOAD: 1 - working-directory: .circleci/docker - run: | - export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/} - ./build_docker.sh -{%- endmacro -%} - -{%- macro setup_miniconda(python_version, activate_environment=True) -%} - - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: !{{ python_version }} -{%- if activate_environment %} - activate-environment: build -{%- endif %} -{%- endmacro -%} - -{%- macro set_xcode_version(xcode_version) -%} -{%- if xcode_version != '' %} - # Set xcode xcode version to !{{ xcode_version }} - DEVELOPER_DIR: /Applications/Xcode_!{{ xcode_version }}.app/Contents/Developer -{%- endif %} -{%- endmacro -%} - {%- macro wait_and_kill_ssh_windows(pytorch_directory="") -%} - name: Wait until all sessions have drained shell: powershell diff --git a/.github/templates/linux_binary_build_workflow.yml.j2 b/.github/templates/linux_binary_build_workflow.yml.j2 index 2879da9dad9c..2c6529d32b66 100644 --- a/.github/templates/linux_binary_build_workflow.yml.j2 +++ b/.github/templates/linux_binary_build_workflow.yml.j2 @@ -52,6 +52,9 @@ jobs: with:!{{ upload.binary_env_as_input(config) }} build_name: !{{ config["build_name"] }} build_environment: !{{ build_environment }} + {%- if 
config.pytorch_extra_install_requirements is defined %} + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: !{{ config.pytorch_extra_install_requirements }} + {%- endif %} secrets: github-token: ${{ secrets.GITHUB_TOKEN }} @@ -78,7 +81,7 @@ jobs: !{{ upload.binary_env(config) }} steps: !{{ common.setup_rocm_linux() }} - - uses: !{{ common.download_artifact_s3_action }} + - uses: !{{ common.download_artifact_action }} name: Download Build Artifacts with: name: !{{ config["build_name"] }} @@ -89,7 +92,7 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: !{{ config["container_image"] }} - name: Test Pytorch binary diff --git a/.github/templates/macos_binary_build_workflow.yml.j2 b/.github/templates/macos_binary_build_workflow.yml.j2 index 64bc3653e8de..eb0c2ff4b373 100644 --- a/.github/templates/macos_binary_build_workflow.yml.j2 +++ b/.github/templates/macos_binary_build_workflow.yml.j2 @@ -58,17 +58,8 @@ jobs: {%- for config in build_configs %} !{{ config["build_name"] }}-build: if: ${{ github.repository_owner == 'pytorch' }} - {%- if config["package_type"] == "libtorch" %} - runs-on: macos-10.15 - {%- else %} runs-on: macos-12-xl - {%- endif %} -{%- if config["package_type"] == "libtorch" %} - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 -{%- else %} timeout-minutes: !{{ common.timeout_minutes }} -{%- endif %} !{{ upload.binary_env(config, true) }} # For sccache access (only on non-forked PRs) AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} @@ -78,18 +69,24 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" !{{ common.checkout(deep_clone=False, directory="pytorch") }} !{{ common.checkout(deep_clone=False, directory="builder", repository="pytorch/builder", branch=common.builder_branch) }} - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -100,7 +97,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" 
"${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: !{{ config["build_name"] }} diff --git a/.github/templates/windows_binary_build_workflow.yml.j2 b/.github/templates/windows_binary_build_workflow.yml.j2 index 6b0cbbd18740..9f68df06b704 100644 --- a/.github/templates/windows_binary_build_workflow.yml.j2 +++ b/.github/templates/windows_binary_build_workflow.yml.j2 @@ -72,7 +72,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: !{{ common.upload_artifact_s3_action }} + - uses: !{{ common.upload_artifact_action }} if: always() with: name: !{{ config["build_name"] }} @@ -93,7 +93,7 @@ jobs: steps: !{{ common.setup_ec2_windows() }} !{{ set_runner_specific_vars() }} - - uses: !{{ common.download_artifact_s3_action }} + - uses: !{{ common.download_artifact_action }} name: Download Build Artifacts with: name: !{{ config["build_name"] }} diff --git a/.github/workflows/_android-build-test.yml b/.github/workflows/_android-build-test.yml index 4d3e07826eae..dfa48daa84ac 100644 --- a/.github/workflows/_android-build-test.yml +++ b/.github/workflows/_android-build-test.yml @@ -28,6 +28,11 @@ jobs: if: github.repository_owner == 'pytorch' runs-on: [self-hosted, linux.2xlarge] steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -35,11 +40,6 @@ jobs: - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Calculate docker image id: calculate-docker-image uses: ./.github/actions/calculate-docker-image @@ -48,7 +48,7 @@ jobs: xla: ${{ contains(inputs.build-environment, 'xla') }} - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} @@ -112,5 +112,5 @@ jobs: if: always() - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main if: always() diff --git a/.github/workflows/_android-full-build-test.yml b/.github/workflows/_android-full-build-test.yml index efc66846db7a..ea07fda814b1 100644 --- a/.github/workflows/_android-full-build-test.yml +++ b/.github/workflows/_android-full-build-test.yml @@ -19,23 +19,6 @@ on: If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. 
- secrets: - SONATYPE_NEXUS_USERNAME: - description: nexus user - required: true - SONATYPE_NEXUS_PASSWORD: - description: nexus pass - required: true - ANDROID_SIGN_KEY: - description: android key - required: true - ANDROID_SIGN_PASS: - description: android pass - required: true - SCRIBE_GRAPHQL_ACCESS_TOKEN: - description: token for writing to scribe/scuba - required: true - env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} @@ -45,6 +28,11 @@ jobs: if: github.repository_owner == 'pytorch' runs-on: [self-hosted, linux.2xlarge] steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -52,11 +40,6 @@ jobs: - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Calculate docker image id: calculate-docker-image uses: ./.github/actions/calculate-docker-image @@ -64,7 +47,7 @@ jobs: docker-image-name: ${{ inputs.docker-image-name }} - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} @@ -145,7 +128,7 @@ jobs: # run gradle buildRelease (echo "./.circleci/scripts/build_android_gradle.sh" | docker exec \ - -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build" \ + -e BUILD_ENVIRONMENT="pytorch-linux-focal-py3-clang7-android-ndk-r19c-gradle-build" \ -e MAX_JOBS="$(nproc --ignore=2)" \ -e AWS_DEFAULT_REGION \ -e PR_NUMBER \ @@ -160,25 +143,6 @@ jobs: mkdir -p "${GITHUB_WORKSPACE}/build_android_artifacts" docker cp "${ID_X86_32}:/var/lib/jenkins/workspace/android/artifacts.tgz" "${GITHUB_WORKSPACE}/build_android_artifacts/" - - name: Publish android snapshot - if: ${{ github.event_name == 'push' && github.event.ref == 'refs/heads/nightly' }} - env: - SONATYPE_NEXUS_USERNAME: ${{ secrets.SONATYPE_NEXUS_USERNAME }} - SONATYPE_NEXUS_PASSWORD: ${{ secrets.SONATYPE_NEXUS_PASSWORD }} - ANDROID_SIGN_KEY: ${{ secrets.ANDROID_SIGN_KEY }} - ANDROID_SIGN_PASS: ${{ secrets.ANDROID_SIGN_PASS }} - ID_X86_32: ${{ steps.build-x86_32.outputs.container_id }} - run: | - set -eux - (echo "./.circleci/scripts/publish_android_snapshot.sh" | docker exec \ - -e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot" \ - -e SONATYPE_NEXUS_USERNAME \ - -e SONATYPE_NEXUS_PASSWORD \ - -e ANDROID_SIGN_KEY \ - -e ANDROID_SIGN_PASS \ - -e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \ - -u jenkins -i "${ID_X86_32}" bash) 2>&1 - - name: Store PyTorch Android Build Artifacts on S3 uses: seemethere/upload-artifact-s3@v5 with: @@ -192,5 +156,5 @@ jobs: if: always() - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main if: always() diff --git a/.github/workflows/_bazel-build-test.yml b/.github/workflows/_bazel-build-test.yml index 
06786d237f07..79445e1dad6c 100644 --- a/.github/workflows/_bazel-build-test.yml +++ b/.github/workflows/_bazel-build-test.yml @@ -28,6 +28,11 @@ jobs: if: github.repository_owner == 'pytorch' runs-on: [self-hosted, linux.2xlarge] steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -35,11 +40,6 @@ jobs: - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Calculate docker image id: calculate-docker-image uses: ./.github/actions/calculate-docker-image @@ -47,7 +47,7 @@ jobs: docker-image-name: ${{ inputs.docker-image-name }} - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} @@ -197,5 +197,5 @@ jobs: python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main if: always() diff --git a/.github/workflows/_binary-build-linux.yml b/.github/workflows/_binary-build-linux.yml index b1b88a5b32f8..192ca251b79f 100644 --- a/.github/workflows/_binary-build-linux.yml +++ b/.github/workflows/_binary-build-linux.yml @@ -55,6 +55,11 @@ on: required: false type: string description: Desired python version + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: + required: false + type: string + description: Extra install requirements + default: "" secrets: github-token: required: true @@ -62,8 +67,8 @@ on: jobs: build: - runs-on: linux.4xlarge - timeout-minutes: 240 + runs-on: linux.12xlarge + timeout-minutes: 150 env: PYTORCH_ROOT: ${{ inputs.PYTORCH_ROOT }} BUILDER_ROOT: ${{ inputs.BUILDER_ROOT }} @@ -79,6 +84,7 @@ jobs: LIBTORCH_VARIANT: ${{ inputs.LIBTORCH_VARIANT }} DESIRED_DEVTOOLSET: ${{ inputs.DESIRED_DEVTOOLSET }} DESIRED_PYTHON: ${{ inputs.DESIRED_PYTHON }} + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: ${{ inputs.PYTORCH_EXTRA_INSTALL_REQUIREMENTS }} # Needed for conda builds ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" ANACONDA_USER: pytorch @@ -97,7 +103,6 @@ jobs: echo "PYTORCH_ROOT=${{ env.PYTORCH_ROOT }}" echo "BUILDER_ROOT=${{ env.BUILDER_ROOT }}" echo "PACKAGE_TYPE=${{ env.PACKAGE_TYPE }}" - echo "DESIRED_CUDA=${{ env.DESIRED_CUDA }}" echo "GPU_ARCH_VERSION=${{ env.GPU_ARCH_VERSION }}" echo "GPU_ARCH_TYPE=${{ env.GPU_ARCH_TYPE }}" @@ -107,12 +112,13 @@ jobs: echo "LIBTORCH_VARIANT=${{ env.LIBTORCH_VARIANT }}" echo "DESIRED_DEVTOOLSET=${{ env.DESIRED_DEVTOOLSET }}" echo "DESIRED_PYTHON=${{ env.DESIRED_PYTHON }}" - + echo "PYTORCH_EXTRA_INSTALL_REQUIREMENTS=${{ env.PYTORCH_EXTRA_INSTALL_REQUIREMENTS }}" echo "ALPINE_IMAGE=${{ env.ALPINE_IMAGE }}" echo "ANACONDA_USER=${{ env.ANACONDA_USER }}" echo "AWS_DEFAULT_REGION=${{ env.AWS_DEFAULT_REGION }}" echo "BINARY_ENV_FILE=${{ env.BINARY_ENV_FILE }}" echo "BUILD_ENVIRONMENT=${{ env.BUILD_ENVIRONMENT }}" + echo "BUILD_NAME=${{ env.BUILD_NAME }}" echo "PR_NUMBER=${{ env.PR_NUMBER }}" echo "PYTORCH_FINAL_PACKAGE_DIR=${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" echo "SHA1=${{ env.SHA1 }}" @@ -120,16 +126,16 @@ jobs: - name: List the env shell: bash run: env + - name: "[FB 
EMPLOYEES] Enable SSH (Click me for login details)" + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.github-token }} - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Setup Linux uses: ./.github/actions/setup-linux - name: Chown workspace uses: ./.github/actions/chown-workspace - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.github-token }} - name: Clean workspace shell: bash run: | @@ -161,17 +167,10 @@ jobs: git clean -fxd working-directory: builder - - name: Set BUILD_SPLIT_CUDA - if: ${{ inputs.GPU_ARCH_TYPE == 'cuda' && startsWith(inputs.GPU_ARCH_VERSION, '11') }} - shell: bash - run: | - echo "BUILD_SPLIT_CUDA='ON'" >> "$GITHUB_ENV" - name: Pull Docker image - run: | - retry () { - "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") - } - retry docker pull "${DOCKER_IMAGE}" + uses: pytorch/test-infra/.github/actions/pull-docker-image@main + with: + docker-image: ${{ inputs.DOCKER_IMAGE }} - name: Build PyTorch binary run: | set -x @@ -180,7 +179,6 @@ jobs: -e BINARY_ENV_FILE \ -e BUILDER_ROOT \ -e BUILD_ENVIRONMENT \ - -e BUILD_SPLIT_CUDA \ -e DESIRED_CUDA \ -e DESIRED_DEVTOOLSET \ -e DESIRED_PYTHON \ @@ -192,6 +190,7 @@ jobs: -e PYTORCH_FINAL_PACKAGE_DIR \ -e PYTORCH_ROOT \ -e SKIP_ALL_TESTS \ + -e PYTORCH_EXTRA_INSTALL_REQUIREMENTS \ --tty \ --detach \ -v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \ @@ -209,29 +208,17 @@ jobs: # Ensure the working directory gets chowned back to the current user docker run --rm -v "${RUNNER_TEMP}/artifacts:/v" -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 with: name: ${{ inputs.build_name }} - retention-days: 14 if-no-files-found: error path: ${{ runner.temp }}/artifacts/* - - name: Hold runner for 2 hours or until ssh sessions have drained - working-directory: pytorch/ - # Always hold for active ssh sessions + - name: Teardown Linux if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh + uses: pytorch/test-infra/.github/actions/teardown-linux@main + - name: Chown workspace if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af + uses: ./pytorch/.github/actions/chown-workspace diff --git a/.github/workflows/_binary-test-linux.yml b/.github/workflows/_binary-test-linux.yml index 5c29288b8246..471a2af88b8f 100644 --- a/.github/workflows/_binary-test-linux.yml +++ b/.github/workflows/_binary-test-linux.yml @@ -122,6 +122,10 @@ jobs: echo "SHA1=${{ env.SHA1 }}" } >> "${GITHUB_ENV} }}" + - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.github-token }} # Setup the environment - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -129,23 +133,12 @@ jobs: uses: ./.github/actions/setup-linux - name: Chown workspace uses: ./.github/actions/chown-workspace - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.github-token }} - name: Clean workspace shell: bash run: | rm -rf "${GITHUB_WORKSPACE}" mkdir "${GITHUB_WORKSPACE}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: ${{ inputs.build_name }} - path: "${{ runner.temp }}/artifacts/" - - - name: Checkout PyTorch to pytorch dir uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -171,42 +164,28 @@ jobs: git clean -fxd working-directory: builder + - uses: actions/download-artifact@v3 + name: Download Build Artifacts + with: + name: ${{ inputs.build_name }} + path: "${{ runner.temp }}/artifacts/" + - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + uses: pytorch/test-infra/.github/actions/setup-nvidia@main if: ${{ inputs.GPU_ARCH_TYPE == 'cuda' }} - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - pushd pytorch - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - popd - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ inputs.DOCKER_IMAGE }} - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - - name: Hold runner for 2 hours or until ssh sessions have drained - working-directory: pytorch/ - # Always hold for active ssh sessions + - name: Teardown Linux if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh + uses: pytorch/test-infra/.github/actions/teardown-linux@main + - name: Chown workspace if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . 
- - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af + uses: ./pytorch/.github/actions/chown-workspace diff --git a/.github/workflows/_binary-upload.yml b/.github/workflows/_binary-upload.yml index cf47de9ccf21..2dc77dba09bc 100644 --- a/.github/workflows/_binary-upload.yml +++ b/.github/workflows/_binary-upload.yml @@ -70,7 +70,9 @@ on: description: Conda PyTorchBot token jobs: build: - runs-on: linux.2xlarge + runs-on: ubuntu-22.04 + container: + image: continuumio/miniconda3:4.12.0 env: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder @@ -86,40 +88,20 @@ jobs: LIBTORCH_VARIANT: ${{ inputs.LIBTORCH_VARIANT }} DESIRED_DEVTOOLSET: ${{ inputs.DESIRED_DEVTOOLSET }} DESIRED_PYTHON: ${{ inputs.DESIRED_PYTHON }} - # Needed for conda builds - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" ANACONDA_USER: pytorch - AWS_DEFAULT_REGION: us-east-1 BINARY_ENV_FILE: /tmp/env GITHUB_TOKEN: ${{ secrets.github-token }} PR_NUMBER: ${{ github.event.pull_request.number }} PYTORCH_FINAL_PACKAGE_DIR: /artifacts SHA1: ${{ github.event.pull_request.head.sha || github.sha }} steps: - - name: List the env - shell: bash - run: env - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - - name: Setup Linux - uses: ./.github/actions/setup-linux - - name: Chown workspace - uses: ./.github/actions/chown-workspace - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: ./.github/actions/setup-ssh with: - github-secret: ${{ secrets.github-token }} + no-sudo: true - - name: Download Build Artifacts with S3 - uses: seemethere/download-artifact-s3@v4 - if: ${{ inputs.use_s3 }} - with: - name: ${{ inputs.build_name }} - path: "${{ runner.temp }}/artifacts/" - - - name: Download Build Artifacts without S3 - uses: actions/download-artifact@v2 - if: ${{ !inputs.use_s3 }} + - name: Download Build Artifacts + uses: actions/download-artifact@v3 with: name: ${{ inputs.build_name }} path: "${{ runner.temp }}/artifacts/" @@ -130,6 +112,7 @@ jobs: echo "DRY_RUN=disabled" >> "$GITHUB_ENV" - name: Set UPLOAD_CHANNEL (only for tagged pushes) if: ${{ github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/') }} + shell: bash -e -l {0} run: | # reference ends with an RC suffix if [[ ${GITHUB_REF_NAME} = *-rc[0-9]* ]]; then @@ -143,36 +126,7 @@ jobs: AWS_ACCESS_KEY_ID: ${{ secrets.aws-access-key-id }} AWS_SECRET_ACCESS_KEY: ${{ secrets.aws-pytorch-uploader-secret-access-key }} ANACONDA_API_TOKEN: ${{ secrets.conda-pytorchbot-token }} + BUILD_NAME: ${{ inputs.build_name }} run: | - docker run --rm -i \ - -e ANACONDA_API_TOKEN \ - -e AWS_ACCESS_KEY_ID \ - -e AWS_SECRET_ACCESS_KEY \ - -e DRY_RUN \ - -e PACKAGE_TYPE \ - -e PKG_DIR=/artifacts \ - -e UPLOAD_CHANNEL \ - -e UPLOAD_SUBFOLDER \ - -v "${RUNNER_TEMP}/artifacts:/artifacts" \ - -v "${GITHUB_WORKSPACE}:/v" \ - -w /v \ - 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/miniconda3:4.10.3 \ - bash -c '.circleci/scripts/binary_upload.sh' - - - name: Hold runner for 2 hours or until ssh sessions have drained - # Always hold for active ssh sessions - if: always() - run: .github/scripts/wait_for_ssh_to_drain.sh - - name: Chown workspace - if: always() - run: | - # Ensure the working directory gets chowned back to the current user - docker run --rm -v 
"$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" . - - name: Kill containers, clean up images - if: always() - run: | - # ignore expansion of "docker ps -q" since it could be empty - # shellcheck disable=SC2046 - docker stop $(docker ps -q) || true - # Prune all of the docker images - docker system prune -af + set -ex + bash .circleci/scripts/binary_upload.sh diff --git a/.github/workflows/_buck-build-test.yml b/.github/workflows/_buck-build-test.yml index 59fb21bf8965..07f41299c711 100644 --- a/.github/workflows/_buck-build-test.yml +++ b/.github/workflows/_buck-build-test.yml @@ -21,32 +21,13 @@ jobs: distribution: 'temurin' - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 + uses: pytorch/test-infra/.github/actions/setup-miniconda@main with: - auto-update-conda: true python-version: 3.8 - activate-environment: build - - - name: Install dependencies - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - with: - timeout_minutes: 10 - max_attempts: 5 - command: | - conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions + environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }} - name: Install Buck - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 with: timeout_minutes: 10 max_attempts: 5 @@ -56,7 +37,7 @@ jobs: sudo apt install ./buck.2021.01.12.01_all.deb - name: Download third party libraries and generate wrappers - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 with: timeout_minutes: 10 max_attempts: 5 diff --git a/.github/workflows/_docs.yml b/.github/workflows/_docs.yml index a925beaf768e..318471e7c786 100644 --- a/.github/workflows/_docs.yml +++ b/.github/workflows/_docs.yml @@ -38,11 +38,41 @@ jobs: build-docs: # Don't run on forked repos. if: github.repository_owner == 'pytorch' - runs-on: [self-hosted, linux.4xlarge] + runs-on: ${{ matrix.runner }} strategy: matrix: - docs_type: [cpp, python] + include: + - docs_type: cpp + # We recently seeing lots of exit code 137 running this in Docker indicating + # an OOM issue when running the job, so this upgrades the runner from 4xlarge + # to the next available tier of 12xlarge. So much memory just to generate cpp + # doc + runner: linux.12xlarge + # TODO: Nightly cpp docs take longer and longer to finish (more than 3h now) + # Let's try to figure out how this can be improved + timeout-minutes: 240 + - docs_type: python + runner: linux.2xlarge + # It takes less than 30m to finish python docs unless there are issues + timeout-minutes: 30 + - docs_type: functorch + runner: linux.2xlarge + # It takes less than 15m to finish functorch docs unless there are issues + timeout-minutes: 15 + # Set a fixed name for this job instead of using the current matrix-generated name, i.e. 
build-docs (cpp, linux.12xlarge, 180) + # The current name requires updating the Rockset last docs push query from test-infra every time the matrix is updated + name: build-docs-${{ matrix.docs_type }}-${{ inputs.push }} steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + instructions: | + All builds are done inside the container, to start an interactive session run: + docker exec -it $(docker container ps --format '{{.ID}}') bash + To start Python docs build type: + cd docs && make html && make coverage + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -50,13 +80,8 @@ jobs: - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ inputs.docker-image }} @@ -76,8 +101,13 @@ jobs: echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc" - name: Build ${{ matrix.docs_type }} docs + timeout-minutes: ${{ matrix.timeout-minutes }} + id: build-docs env: - WITH_PUSH: ${{ github.event_name == 'schedule' || startsWith(github.event.ref, 'refs/tags/v') }} + # After https://github.com/pytorch/pytorch/pull/88373, pull workflow can now be run periodically, + # so using a schedule event to determine if the docs should be pushed or not doesn't hold true + # anymore + WITH_PUSH: ${{ inputs.push }} DOCKER_IMAGE: ${{ inputs.docker-image }} DOCS_TYPE: ${{ matrix.docs_type }} RUN_DOXYGEN: ${{ inputs.run-doxygen }} @@ -110,7 +140,7 @@ jobs: -w /var/lib/jenkins/workspace \ "${DOCKER_IMAGE}" ) - docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" + docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . 
&& pip install $(echo dist/*.whl)[opt-einsum] && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh" - name: Chown workspace uses: ./.github/actions/chown-workspace @@ -118,7 +148,7 @@ jobs: - name: Upload Python Docs Preview uses: seemethere/upload-artifact-s3@v5 - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }} + if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' && steps.build-docs.outcome == 'success' }} with: retention-days: 14 s3-bucket: doc-previews @@ -128,10 +158,23 @@ jobs: - name: Upload C++ Docs Preview uses: seemethere/upload-artifact-s3@v5 - if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }} + if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' && steps.build-docs.outcome == 'success' }} with: retention-days: 14 if-no-files-found: error s3-bucket: doc-previews path: cppdocs/ s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs + + - name: Upload functorch Docs Preview + uses: seemethere/upload-artifact-s3@v5 + if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'functorch' && steps.build-docs.outcome == 'success' }} + with: + retention-days: 14 + s3-bucket: doc-previews + if-no-files-found: error + path: functorch_ghpages/nightly/ + s3-prefix: pytorch/${{ github.event.pull_request.number }}/functorchdocs + - name: Teardown Linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main + if: always() diff --git a/.github/workflows/_ios-build-test.yml b/.github/workflows/_ios-build-test.yml index 56443419ef1d..269ad3f153ca 100644 --- a/.github/workflows/_ios-build-test.yml +++ b/.github/workflows/_ios-build-test.yml @@ -23,20 +23,6 @@ on: If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. 
- secrets: - IOS_CERT_KEY_2022: - required: true - description: ios cert - IOS_CERT_SECRET: - required: true - description: ios cert - IOS_DEV_TEAM_ID: - required: true - description: ios cert - IOS_SIGN_KEY_2022: - required: true - description: ios cert - env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} BUILD_ENVIRONMENT: ${{ inputs.build-environment }} @@ -45,16 +31,8 @@ env: jobs: build: - # NOTE: These builds will not run successfully without running on `pytorch/pytorch` due to the limitations - # of accessing secrets from forked pull requests and IOS' dependency on secrets for their build/test - if: github.repository_owner == 'pytorch' runs-on: macos-12 timeout-minutes: 240 - env: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET }} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID }} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} steps: # [see note: pytorch repo ref] - name: Checkout PyTorch @@ -90,47 +68,34 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions + blas=1.0 \ + cffi=1.15.1 \ + cmake=3.22.1 \ + mkl=2022.1.0 \ + mkl-include=2022.1.0 \ + ninja=1.10.2 \ + numpy=1.23.3 \ + pyyaml=6.0 \ + requests=2.28.1 \ + setuptools=63.4.1 \ + typing_extensions=4.3.0 - - name: Run Fastlane + - name: Setup Fastlane run: | set -x cd ios/TestApp # install fastlane sudo gem install bundler && bundle install bundle update fastlane - # install certificates - echo "${IOS_CERT_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o Certificates.p12 - rm cert.txt - bundle exec fastlane install_root_cert - bundle exec fastlane install_dev_cert - # install the provisioning profile - PROFILE=PyTorch_CI_2022.mobileprovision - PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles - mkdir -pv "${PROVISIONING_PROFILES}" - cd "${PROVISIONING_PROFILES}" - echo "${IOS_SIGN_KEY_2022}" >> cert.txt - base64 --decode cert.txt -o ${PROFILE} - rm cert.txt - - name: Build + - name: Build PyTorch Mobile Runtime run: | # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" @@ -139,20 +104,16 @@ jobs: export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"} scripts/build_ios.sh - - name: Run Build Test - timeout-minutes: 5 + - name: Build TestApp + if: inputs.ios-platform == 'SIMULATOR' + timeout-minutes: 15 run: | - PROFILE=PyTorch_CI_2022 # run the ruby build script if ! [ -x "$(command -v xcodebuild)" ]; then echo 'Error: xcodebuild is not installed.' 
exit 1 fi - if [ "${IOS_PLATFORM}" != "SIMULATOR" ]; then - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" -c "${PROFILE}" -t "${IOS_DEV_TEAM_ID}" - else - ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - fi + ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "${IOS_PLATFORM}" - name: Run Simulator Tests if: inputs.ios-platform == 'SIMULATOR' @@ -191,6 +152,7 @@ jobs: else bundle exec fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT fi + - name: Dump Simulator Tests On a Failure if: | failure() && inputs.ios-platform == 'SIMULATOR' diff --git a/.github/workflows/_linux-build.yml b/.github/workflows/_linux-build.yml index 09a400c4d502..be3d2ce98c03 100644 --- a/.github/workflows/_linux-build.yml +++ b/.github/workflows/_linux-build.yml @@ -28,21 +28,49 @@ on: description: | If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. + cuda-arch-list: + required: false + type: string + default: "5.2" + description: | + List of CUDA architectures CI build should target. + runner: + required: false + type: string + default: "linux.2xlarge" + description: | + List of CUDA architectures CI build should target. + + test-matrix: + required: false + type: string + description: | + An option JSON description of what test configs to run later on. This + is moved here from the Linux test workflow so that we can apply filter + logic using test-config labels earlier and skip unnecessary builds outputs: docker-image: value: ${{ jobs.build.outputs.docker-image }} description: The docker image containing the built PyTorch. + test-matrix: + value: ${{ inputs.test-matrix }} + description: An optional JSON description of what test configs to run later on. jobs: build: - # Don't run on forked repos. + # Don't run on forked repos if: github.repository_owner == 'pytorch' - runs-on: [self-hosted, linux.2xlarge] + runs-on: ${{ inputs.runner }} timeout-minutes: 240 outputs: docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + # [pytorch repo ref] # Use a pytorch/pytorch reference instead of a reference to the local # checkout because when we run this action we don't *have* a local @@ -50,21 +78,9 @@ jobs: - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - - name: Check for new workflows - run: | - if [ ! -f "./.github/actions/setup-linux/action.yml" ]; then - echo "::error::Your PR is based on a version of master that is too old for our CI to work. Please rebase your PR on latest master and resubmit." 
- exit 1 - fi - - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Calculate docker image id: calculate-docker-image uses: ./.github/actions/calculate-docker-image @@ -73,7 +89,7 @@ jobs: xla: ${{ contains(inputs.build-environment, 'xla') }} - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ steps.calculate-docker-image.outputs.docker-image }} @@ -88,7 +104,17 @@ jobs: with: github-token: ${{ secrets.GITHUB_TOKEN }} + # Apply the filter logic to the build step too if the test-config label is already there + - name: Select all requested test configurations (if the test matrix is available) + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + - name: Build + if: steps.filter.outputs.is-test-matrix-empty == 'False' || inputs.test-matrix == '' + id: build env: BUILD_ENVIRONMENT: ${{ inputs.build-environment }} BRANCH: ${{ steps.parse-ref.outputs.branch }} @@ -100,7 +126,7 @@ jobs: SCCACHE_S3_KEY_PREFIX: ${{ github.workflow }} XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }} - TORCH_CUDA_ARCH_LIST: 5.2 + TORCH_CUDA_ARCH_LIST: ${{ inputs.cuda-arch-list }} DOCKER_IMAGE: ${{ steps.calculate-docker-image.outputs.docker-image }} XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }} DEBUG: ${{ inputs.build-with-debug && '1' || '0' }} @@ -135,13 +161,13 @@ jobs: docker exec -t "${container_name}" sh -c '.jenkins/pytorch/build.sh' - name: Archive artifacts into zip - if: inputs.build-generates-artifacts + if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped' run: | zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json - name: Store PyTorch Build Artifacts on S3 uses: seemethere/upload-artifact-s3@v5 - if: inputs.build-generates-artifacts + if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped' with: name: ${{ inputs.build-environment }} retention-days: 14 @@ -149,6 +175,7 @@ jobs: path: artifacts.zip - name: Upload sccache stats + if: steps.build.outcome != 'skipped' uses: seemethere/upload-artifact-s3@v5 with: s3-prefix: | @@ -158,5 +185,5 @@ jobs: path: sccache-stats-*.json - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main if: always() diff --git a/.github/workflows/_linux-test.yml b/.github/workflows/_linux-test.yml index aa81647c53fc..a444a5fc530a 100644 --- a/.github/workflows/_linux-test.yml +++ b/.github/workflows/_linux-test.yml @@ -22,46 +22,70 @@ on: description: | If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. 
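For orientation, a minimal sketch of how a calling workflow might exercise the new _linux-build.yml inputs and outputs shown above. The job names, build environment, Docker image, and runner labels are hypothetical, and a real caller would also supply whatever other inputs and secrets the reusable workflows declare outside these hunks:

  jobs:
    build:
      uses: ./.github/workflows/_linux-build.yml
      with:
        build-environment: linux-bionic-cuda11.6-py3.7-gcc7              # hypothetical value
        docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 # hypothetical value
        cuda-arch-list: "8.6"       # overrides the "5.2" default added above
        runner: linux.8xlarge       # hypothetical runner label
        test-matrix: |
          {"include": [{"config": "default", "shard": 1, "num_shards": 1, "runner": "linux.4xlarge.nvidia.gpu"}]}

    test:
      needs: build
      uses: ./.github/workflows/_linux-test.yml
      with:
        build-environment: linux-bionic-cuda11.6-py3.7-gcc7
        docker-image: ${{ needs.build.outputs.docker-image }}
        # the build workflow echoes inputs.test-matrix back out as an output,
        # so the test workflow can consume the same matrix downstream
        test-matrix: ${{ needs.build.outputs.test-matrix }}

Threading the matrix through the build workflow is what lets the filter step above skip the build entirely when every test config has been filtered out by test-config labels.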
+ timeout-minutes: + required: false + type: number + default: 240 + description: | + Set the maximum (in minutes) how long the workflow should take to finish env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} jobs: + # This needs to be run right before the test starts so that it can gather the + # latest labels from the PR + filter: + runs-on: [self-hosted, linux.large] + outputs: + test-matrix: ${{ steps.filter.outputs.test-matrix }} + is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }} + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + fetch-depth: 1 + submodules: false + + - name: Select all requested test configurations + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + test: - # Don't run on forked repos. - if: github.repository_owner == 'pytorch' + needs: filter + # Don't run on forked repos or empty test matrix + if: github.repository_owner == 'pytorch' && needs.filter.outputs.is-test-matrix-empty == 'False' strategy: - matrix: ${{ fromJSON(inputs.test-matrix) }} + matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }} fail-fast: false runs-on: ${{ matrix.runner }} + timeout-minutes: ${{ inputs.timeout-minutes }} steps: - # [see note: pytorch repo ref] + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + instructions: | + All testing is done inside the container, to start an interactive session run: + docker exec -it $(docker container ps --format '{{.ID}}') bash + - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - name: Setup Linux uses: ./.github/actions/setup-linux - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ inputs.docker-image }} - name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a + uses: pytorch/test-infra/.github/actions/setup-nvidia@main if: contains(inputs.build-environment, 'cuda') && !contains(matrix.config, 'nogpu') - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/install_nvidia_utils_linux.sh - echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" - name: Start monitoring script id: monitor-script @@ -70,7 +94,7 @@ jobs: python3 -m pip install psutil==5.9.1 python3 -m pip install pynvml==11.4.1 python3 -m tools.stats.monitor > usage_log.txt 2>&1 & - echo "::set-output name=monitor-script-pid::${!}" + echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" - name: Download build artifacts uses: ./.github/actions/download-build-artifacts @@ -96,11 +120,13 @@ jobs: NUM_TEST_SHARDS: ${{ matrix.num_shards }} PR_BODY: ${{ github.event.pull_request.body }} SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 + SCCACHE_S3_KEY_PREFIX: ${{ github.workflow }} SHM_SIZE: ${{ contains(inputs.build-environment, 'cuda') && '2g' || '1g' }} DOCKER_IMAGE: ${{ inputs.docker-image }} XLA_CUDA: ${{ contains(inputs.build-environment, 'xla') && '0' || '' }} XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla - timeout-minutes: 240 + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 
${{ matrix.mem_leak_check && '1' || '0' }} + PYTORCH_TEST_RERUN_DISABLED_TESTS: ${{ matrix.rerun_disabled_tests && '1' || '0' }} run: | set -x @@ -150,8 +176,11 @@ jobs: -e PR_LABELS \ -e MAX_JOBS="$(nproc --ignore=2)" \ -e SCCACHE_BUCKET \ + -e SCCACHE_S3_KEY_PREFIX \ -e XLA_CUDA \ -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ + -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ + -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ --ulimit stack=10485760:83886080 \ --security-opt seccomp=unconfined \ @@ -166,7 +195,8 @@ jobs: -w /var/lib/jenkins/workspace \ "${DOCKER_IMAGE}" ) - docker exec -t "${container_name}" sh -c "pip install dist/*.whl && ${TEST_COMMAND}" + echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" + docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" - name: Get workflow job id id: get-job-id @@ -178,6 +208,7 @@ jobs: - name: Stop monitoring script if: always() && steps.monitor-script.outputs.monitor-script-pid shell: bash + continue-on-error: true env: MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }} run: | @@ -189,6 +220,12 @@ jobs: with: file-suffix: ${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}_${{ steps.get-job-id.outputs.job-id }} + - name: Collect backtraces from coredumps (if any) + if: always() + run: | + # shellcheck disable=SC2156 + find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; + - name: Store Core dumps on S3 uses: seemethere/upload-artifact-s3@v5 if: failure() @@ -223,5 +260,5 @@ jobs: python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main if: always() diff --git a/.github/workflows/_mac-build.yml b/.github/workflows/_mac-build.yml index 316656b6ec9b..5ee909f02c22 100644 --- a/.github/workflows/_mac-build.yml +++ b/.github/workflows/_mac-build.yml @@ -33,6 +33,25 @@ on: default: "3.8" description: | The python version to be used. Will be 3.8 by default + environment-file: + required: false + type: string + description: Set the conda environment file used to setup macOS build. + test-matrix: + required: false + type: string + description: | + An option JSON description of what test configs to run later on. This + is moved here from the Linux test workflow so that we can apply filter + logic using test-config labels earlier and skip unnecessary builds + + outputs: + test-matrix: + value: ${{ inputs.test-matrix }} + description: An optional JSON description of what test configs to run later on. + build-outcome: + value: ${{ jobs.build.outputs.build-outcome }} + description: The outcome of the build step. This is used to influence test filtering logic later on. secrets: MACOS_SCCACHE_S3_ACCESS_KEY_ID: @@ -42,11 +61,6 @@ on: required: true description: Secret for S3 bucket for macOS sccache. -# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 -defaults: - run: - shell: bash -e -l {0} - jobs: build: # Don't run on forked repos. 
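The filter job added to the Linux test workflow above, and repeated below for the macOS, ROCm, and Windows variants, is a small job-to-job data-flow pattern: one job republishes its step outputs as job outputs, and the dependent job gates itself and builds its matrix from them. A stripped-down sketch, with a hypothetical inline filter step standing in for ./.github/actions/filter-test-configs:

  jobs:
    filter:
      runs-on: ubuntu-latest
      outputs:
        test-matrix: ${{ steps.filter.outputs.test-matrix }}
        is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }}
      steps:
        - name: Select all requested test configurations
          id: filter
          run: |
            # stand-in for the real filter action: emit a one-config matrix
            echo 'test-matrix={"include":[{"config":"default","runner":"linux.2xlarge"}]}' >> "${GITHUB_OUTPUT}"
            echo 'is-test-matrix-empty=False' >> "${GITHUB_OUTPUT}"

    test:
      needs: filter
      if: needs.filter.outputs.is-test-matrix-empty == 'False'
      strategy:
        fail-fast: false
        matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }}
      runs-on: ${{ matrix.runner }}
      steps:
        - run: echo "running test config ${{ matrix.config }} on ${{ matrix.runner }}"

Running the filter as its own job immediately before the tests means the latest PR labels are consulted at test time rather than when the workflow was first dispatched.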
@@ -57,6 +71,8 @@ jobs: AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} BUILD_ENVIRONMENT: ${{ inputs.build-environment }} + outputs: + build-outcome: ${{ steps.build.outcome }} steps: # [see note: pytorch repo ref] - name: Checkout PyTorch @@ -71,25 +87,39 @@ jobs: fi - name: Setup miniconda - uses: conda-incubator/setup-miniconda@v2 + if: inputs.environment-file == '' + uses: pytorch/test-infra/.github/actions/setup-miniconda@main + with: + python-version: ${{ inputs.python_version }} + environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }} + + # This option is used when cross-compiling arm64 from x86-64. Specifically, we need arm64 conda + # environment even though the arch is x86-64 + - name: Setup miniconda using the provided environment file + if: inputs.environment-file != '' + uses: pytorch/test-infra/.github/actions/setup-miniconda@main with: - auto-update-conda: true python-version: ${{ inputs.python_version }} - activate-environment: build - miniconda-version: 4.7.12 + environment-file: ${{ inputs.environment-file }} - name: Install macOS homebrew dependencies run: | # Install dependencies brew install libomp + brew link --force libomp - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - echo "SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + echo "SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${GITHUB_ENV}" - name: Get workflow job id id: get-job-id @@ -98,21 +128,31 @@ jobs: with: github-token: ${{ secrets.GITHUB_TOKEN }} + # Apply the filter logic to the build step too if the test-config label is already there + - name: Select all requested test configurations (if the test matrix is available) + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + - name: Build + if: steps.filter.outputs.is-test-matrix-empty == 'False' || inputs.test-matrix == '' + id: build env: OUR_GITHUB_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} run: | echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}" - .jenkins/pytorch/macos-build.sh + ${CONDA_RUN} .jenkins/pytorch/macos-build.sh - name: Archive artifacts into zip - if: inputs.build-generates-artifacts + if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped' run: | zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json - name: Store PyTorch Build Artifacts on GHA - uses: actions/upload-artifact@v2 - if: inputs.build-generates-artifacts + uses: actions/upload-artifact@v3 + if: inputs.build-generates-artifacts && steps.build.outcome != 'skipped' with: name: ${{ env.BUILD_ENVIRONMENT }} retention-days: 14 @@ -120,9 
+160,9 @@ jobs: path: artifacts.zip - name: Upload sccache stats to GHA - uses: actions/upload-artifact@v2 + uses: actions/upload-artifact@v3 # Only if sccache is installed, see above - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} + if: ${{ (github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository) && steps.build.outcome != 'skipped' }} with: name: sccache-stats-${{ inputs.build-environment }}-runattempt${{ github.run_attempt }}-${{ steps.get-job-id.outputs.job-id }} retention-days: 14 diff --git a/.github/workflows/_mac-test-mps.yml b/.github/workflows/_mac-test-mps.yml index fa189307358a..24203e005153 100644 --- a/.github/workflows/_mac-test-mps.yml +++ b/.github/workflows/_mac-test-mps.yml @@ -15,7 +15,6 @@ on: If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. - jobs: run_mps_test: name: "Run MPS tests" @@ -38,6 +37,22 @@ jobs: name: ${{ inputs.build-environment }} use-gha: true + # This is copied from the main macos test workflow. It was missed in the earlier fix because macos M1 + # runners are shared and not ephemeral, so the issue wasn't manifested if the runners with the fix were + # used + - name: Install macOS homebrew dependencies + run: | + # Install dependencies + brew install libomp + brew link --force libomp + + - name: Setup miniconda + uses: pytorch/test-infra/.github/actions/setup-miniconda@main + with: + python-version: 3.9 + environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }} + pip-requirements-file: .github/requirements/pip-requirements-${{ runner.os }}.txt + - name: Install PyTorch env: ENV_NAME: conda-test-env-${{ github.run_id }} @@ -45,24 +60,33 @@ jobs: shell: arch -arch arm64 bash {0} run: | # shellcheck disable=SC1090 - . ~/miniconda3/etc/profile.d/conda.sh set -ex - conda create -yp "${ENV_NAME}" "python=${PY_VERS}" numpy expecttest pyyaml # As wheels are cross-compiled they are reported as x86_64 ones ORIG_WHLNAME=$(ls -1 dist/*.whl); ARM_WHLNAME=${ORIG_WHLNAME/x86_64/arm64}; mv ${ORIG_WHLNAME} ${ARM_WHLNAME} - conda run -p "${ENV_NAME}" python3 -mpip install dist/*.whl + ${CONDA_RUN} python3 -mpip install --no-index --no-deps dist/*.whl - name: Run MPS tests + id: test env: ENV_NAME: conda-test-env-${{ github.run_id }} shell: arch -arch arm64 bash {0} run: | # shellcheck disable=SC1090 - . ~/miniconda3/etc/profile.d/conda.sh set -ex # TODO(https://github.com/pytorch/pytorch/issues/79293) - # This step currently fails if we actually run as if we're in CI. 
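The ${CONDA_RUN} prefix that replaces the explicit conda.sh sourcing and `conda activate`/`conda run -p` calls in these macOS hunks appears to be exported by the pytorch/test-infra setup-miniconda action; the sketch below shows the intended usage under that assumption, with illustrative step names and commands:

      - name: Setup miniconda
        uses: pytorch/test-infra/.github/actions/setup-miniconda@main
        with:
          python-version: "3.9"
          environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }}

      - name: Run commands inside the conda environment
        run: |
          # CONDA_RUN wraps each command so it executes in the environment the
          # action created, without activating conda in every subsequent shell
          ${CONDA_RUN} python3 --version
          ${CONDA_RUN} python3 -mpip install --no-index --no-deps dist/*.whl   # illustrative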
- unset CI - conda run --cwd test -p "${ENV_NAME}" python3 test_mps.py -v - conda env remove -p "${ENV_NAME}" + ${CONDA_RUN} python3 test/run_test.py --mps --verbose + + - name: Get workflow job id + id: get-job-id + uses: ./.github/actions/get-workflow-job-id + if: always() + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + + - name: Upload test artifacts + uses: ./.github/actions/upload-test-artifacts + if: always() && (steps.test.conclusion == 'success' || steps.test.conclusion == 'failure') + with: + use-gha: true + file-suffix: ${{ github.job }}-mps-1-1-macos-m1-12_${{ steps.get-job-id.outputs.job-id }} diff --git a/.github/workflows/_mac-test.yml b/.github/workflows/_mac-test.yml index 8b648c7a8762..cbc3372e1c42 100644 --- a/.github/workflows/_mac-test.yml +++ b/.github/workflows/_mac-test.yml @@ -32,18 +32,39 @@ on: required: true description: secret acess key for test stats upload - jobs: + # This needs to be run right before the test starts so that it can gather the + # latest labels from the PR + filter: + runs-on: [self-hosted, linux.large] + outputs: + test-matrix: ${{ steps.filter.outputs.test-matrix }} + is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }} + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + fetch-depth: 1 + submodules: false + + - name: Select all requested test configurations + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + test: - # Don't run on forked repos. - if: github.repository_owner == 'pytorch' + needs: filter + # Don't run on forked repos or empty test matrix + if: github.repository_owner == 'pytorch' && needs.filter.outputs.is-test-matrix-empty == 'False' # For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179 # Also ensure that we always run with the right architecture defaults: run: shell: arch -arch ${{ inputs.arch }} bash -e -l {0} strategy: - matrix: ${{ fromJSON(inputs.test-matrix) }} + matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }} fail-fast: false runs-on: ${{ matrix.runner }} timeout-minutes: 240 @@ -61,43 +82,39 @@ jobs: - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - - name: Start monitoring script - id: monitor-script - run: | - python3 -m pip install psutil==5.9.1 - python3 -m pip install pynvml==11.4.1 - python3 -m tools.stats.monitor > usage_log.txt 2>&1 & - echo "::set-output name=monitor-script-pid::${!}" - - name: Download build artifacts uses: ./.github/actions/download-build-artifacts with: name: ${{ inputs.build-environment }} use-gha: true - - name: Setup miniconda for x86 - if: inputs.build-environment == 'macos-12-py3-x86-64' - uses: conda-incubator/setup-miniconda@v2 + - name: Setup miniconda (x86, py3.8) + if: ${{ runner.arch == 'X64' }} + uses: pytorch/test-infra/.github/actions/setup-miniconda@main with: - auto-update-conda: true python-version: 3.8 - activate-environment: build - miniconda-version: 4.7.12 + environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }} + pip-requirements-file: .github/requirements/pip-requirements-${{ runner.os }}.txt - - name: Setup miniconda for arm64 - if: inputs.build-environment == 'macos-12-py3-arm64' + - name: Setup miniconda (arm64, py3.9) + if: ${{ runner.arch == 'ARM64' }} + uses: pytorch/test-infra/.github/actions/setup-miniconda@main + with: + python-version: 3.9 + environment-file: 
.github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }} + pip-requirements-file: .github/requirements/pip-requirements-${{ runner.os }}.txt + + - name: Start monitoring script + id: monitor-script run: | - # Conda is already installed and setup for bash here - # Cleanup lingering conda environment and create - # a new one for this run - conda env remove -n build - conda create -n build python=3.9.12 - conda list + ${CONDA_RUN} python3 -m tools.stats.monitor > usage_log.txt 2>&1 & + echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" - name: Install macOS homebrew dependencies run: | # Install dependencies brew install libomp + brew link --force libomp - name: Parse ref id: parse-ref @@ -111,6 +128,9 @@ jobs: - name: Test id: test + env: + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: ${{ matrix.mem_leak_check && '1' || '0' }} + PYTORCH_TEST_RERUN_DISABLED_TESTS: ${{ matrix.rerun_disabled_tests && '1' || '0' }} run: | COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") @@ -127,18 +147,8 @@ jobs: export PR_BODY="${PR_BODY//[\'\"]}" arch - # This is a no-op for x86 - conda activate build - - python3 -mpip install dist/*.whl - .jenkins/pytorch/macos-test.sh - - - name: Cleanup miniconda for arm64 - if: inputs.build-environment == 'macos-12-py3-arm64' - run: | - # Cleanup conda env - conda deactivate - conda env remove -n build + ${CONDA_RUN} python3 -mpip install --no-index --no-deps $(echo dist/*.whl) + ${CONDA_RUN} .jenkins/pytorch/macos-test.sh - name: Get workflow job id id: get-job-id @@ -149,6 +159,7 @@ jobs: - name: Stop monitoring script if: always() && ${{ steps.monitor-script.outputs.monitor-script-pid }} + continue-on-error: true env: MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }} run: | @@ -182,6 +193,4 @@ jobs: GHA_WORKFLOW_JOB_ID: ${{ steps.get-job-id.outputs.job-id }} run: | set -x - python3 -m pip install -r requirements.txt - python3 -m pip install boto3==1.19.12 - python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test + ${CONDA_RUN} python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test diff --git a/.github/workflows/_rocm-test.yml b/.github/workflows/_rocm-test.yml index b5550fdda7f0..be4a5c9dcc6c 100644 --- a/.github/workflows/_rocm-test.yml +++ b/.github/workflows/_rocm-test.yml @@ -39,12 +39,34 @@ env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} jobs: + # This needs to be run right before the test starts so that it can gather the + # latest labels from the PR + filter: + runs-on: [self-hosted, linux.large] + outputs: + test-matrix: ${{ steps.filter.outputs.test-matrix }} + is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }} + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + fetch-depth: 1 + submodules: false + + - name: Select all requested test configurations + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + test: - # Don't run on forked repos. 
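Several hunks above, and the ROCm and Windows test workflows below, replace the deprecated `::set-output` workflow command with writes to the file named by GITHUB_OUTPUT. A minimal sketch of the monitor-script pattern these workflows use; the kill in the stop step is assumed, since that step's body sits outside these hunks:

      - name: Start monitoring script
        id: monitor-script
        run: |
          python3 -m tools.stats.monitor > usage_log.txt 2>&1 &
          # deprecated form: echo "::set-output name=monitor-script-pid::${!}"
          echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"

      - name: Stop monitoring script
        if: always() && steps.monitor-script.outputs.monitor-script-pid
        continue-on-error: true
        env:
          MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }}
        run: kill "${MONITOR_SCRIPT_PID}"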
- if: github.repository_owner == 'pytorch' + needs: filter + # Don't run on forked repos or empty test matrix + if: github.repository_owner == 'pytorch' && needs.filter.outputs.is-test-matrix-empty == 'False' timeout-minutes: 300 strategy: - matrix: ${{ fromJSON(inputs.test-matrix) }} + matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }} fail-fast: false runs-on: ${{ matrix.runner }} steps: @@ -58,7 +80,7 @@ jobs: uses: ./.github/actions/setup-rocm - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ inputs.docker-image }} @@ -69,7 +91,7 @@ jobs: python3 -m pip install psutil==5.9.1 python3 -m pip install pynvml==11.4.1 python3 -m tools.stats.monitor > usage_log.txt 2>&1 & - echo "::set-output name=monitor-script-pid::${!}" + echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" - name: Download build artifacts uses: ./.github/actions/download-build-artifacts @@ -96,6 +118,9 @@ jobs: SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 DOCKER_IMAGE: ${{ inputs.docker-image }} XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla + PYTORCH_JIT_ENABLE_NVFUSER: 1 + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: ${{ matrix.mem_leak_check && '1' || '0' }} + PYTORCH_TEST_RERUN_DISABLED_TESTS: ${{ matrix.rerun_disabled_tests && '1' || '0' }} timeout-minutes: 270 run: | set -x @@ -145,6 +170,8 @@ jobs: -e MAX_JOBS="$(nproc --ignore=2)" \ -e SCCACHE_BUCKET \ -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ + -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ + -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ --ulimit stack=10485760:83886080 \ --security-opt seccomp=unconfined \ @@ -179,6 +206,7 @@ jobs: - name: Stop monitoring script if: always() && steps.monitor-script.outputs.monitor-script-pid shell: bash + continue-on-error: true env: MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }} run: | diff --git a/.github/workflows/_run_android_tests.yml b/.github/workflows/_run_android_tests.yml index 273ec2db81ae..ae992baab11a 100644 --- a/.github/workflows/_run_android_tests.yml +++ b/.github/workflows/_run_android_tests.yml @@ -21,16 +21,16 @@ jobs: - name: Install dependencies run: | conda install -y \ - cffi \ - cmake \ - mkl \ - mkl-include \ - ninja \ - numpy \ - pyyaml \ - requests \ - setuptools \ - typing_extensions + cffi=1.15.1 \ + cmake=3.22.1 \ + mkl=2022.1.0 \ + mkl-include=2022.1.0 \ + ninja=1.10.2 \ + numpy=1.23.3 \ + pyyaml=6.0 \ + requests=2.28.1 \ + setuptools=65.5.0 \ + typing_extensions=4.3.0 # [see note: pytorch repo ref] - name: Checkout PyTorch diff --git a/.github/workflows/_update-commit-hash.yml b/.github/workflows/_update-commit-hash.yml index 42e12d9dca9f..416e05c0cc53 100644 --- a/.github/workflows/_update-commit-hash.yml +++ b/.github/workflows/_update-commit-hash.yml @@ -27,7 +27,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout repo - uses: actions/checkout@v2 + uses: actions/checkout@v3 with: fetch-depth: 1 submodules: false diff --git a/.github/workflows/_win-build.yml b/.github/workflows/_win-build.yml index fb2195fafce6..8baaca498d17 100644 --- a/.github/workflows/_win-build.yml +++ b/.github/workflows/_win-build.yml @@ -23,6 +23,18 @@ on: description: | If this is set, our linter will use this to make sure that every other job with the same `sync-tag` is identical. + test-matrix: + required: false + type: string + description: | + An option JSON description of what test configs to run later on. 
This + is moved here from the Linux test workflow so that we can apply filter + logic using test-config labels earlier and skip unnecessary builds + + outputs: + test-matrix: + value: ${{ inputs.test-matrix }} + description: An optional JSON description of what test configs to run later on. env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} @@ -34,6 +46,20 @@ jobs: runs-on: [self-hosted, windows.4xlarge] timeout-minutes: 240 steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + instructions: | + To forward remote desktop on your local machine ssh as follows: + ssh -L 3389:localhost:3389 %%username%%@%%hostname%% + And then change password using `passwd` command. + + To start build locally, change working folder to \actions-runner\_work\pytorch\pytorch, + Activate miniconda and Visual Studio environment, but running: + call C:\Jenkins\Miniconda3\Scripts\activate.bat C:\Jenkins\Miniconda3 + call "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64 + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -45,11 +71,6 @@ jobs: with: cuda-version: ${{ inputs.cuda-version }} - - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh - with: - github-secret: ${{ secrets.GITHUB_TOKEN }} - - name: Parse ref id: parse-ref run: .github/scripts/parse_ref.py @@ -61,7 +82,17 @@ jobs: with: github-token: ${{ secrets.GITHUB_TOKEN }} + # Apply the filter logic to the build step too if the test-config label is already there + - name: Select all requested test configurations (if the test matrix is available) + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + - name: Build + if: steps.filter.outputs.is-test-matrix-empty == 'False' || inputs.test-matrix == '' + id: build shell: bash env: PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/ @@ -89,6 +120,7 @@ jobs: # Upload to github so that people can click and download artifacts - name: Upload artifacts to s3 + if: steps.build.outcome != 'skipped' uses: seemethere/upload-artifact-s3@v5 with: retention-days: 14 @@ -97,6 +129,7 @@ jobs: path: C:\${{ github.run_id }}\build-results - name: Upload sccache stats + if: steps.build.outcome != 'skipped' uses: seemethere/upload-artifact-s3@v5 with: s3-prefix: | diff --git a/.github/workflows/_win-test.yml b/.github/workflows/_win-test.yml index 560c0fe84e1d..0cabb8ec469a 100644 --- a/.github/workflows/_win-test.yml +++ b/.github/workflows/_win-test.yml @@ -27,15 +27,42 @@ env: GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} jobs: + # This needs to be run right before the test starts so that it can gather the + # latest labels from the PR + filter: + runs-on: [self-hosted, linux.large] + outputs: + test-matrix: ${{ steps.filter.outputs.test-matrix }} + is-test-matrix-empty: ${{ steps.filter.outputs.is-test-matrix-empty }} + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + fetch-depth: 1 + submodules: false + + - name: Select all requested test configurations + id: filter + uses: ./.github/actions/filter-test-configs + with: + github-token: ${{ secrets.GITHUB_TOKEN }} + test-matrix: ${{ inputs.test-matrix }} + test: - # Don't run on forked repos. 
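The PYTORCH_TEST_CUDA_MEM_LEAK_CHECK and PYTORCH_TEST_RERUN_DISABLED_TESTS variables added throughout these test workflows are driven by optional per-entry keys in the test matrix. A sketch of a matrix entry that opts in; the entry itself is hypothetical, while the mapping expressions are the ones used in the hunks above:

        test-matrix: |
          {"include": [
            {"config": "default", "shard": 1, "num_shards": 1,
             "runner": "linux.4xlarge.nvidia.gpu",
             "mem_leak_check": true, "rerun_disabled_tests": true}
          ]}

      env:
        # true in the matrix entry resolves to '1'; a missing or false key is
        # falsy, so both variables default to '0'
        PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: ${{ matrix.mem_leak_check && '1' || '0' }}
        PYTORCH_TEST_RERUN_DISABLED_TESTS: ${{ matrix.rerun_disabled_tests && '1' || '0' }}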
- if: github.repository_owner == 'pytorch' + needs: filter + # Don't run on forked repos or empty test matrix + if: github.repository_owner == 'pytorch' && needs.filter.outputs.is-test-matrix-empty == 'False' strategy: - matrix: ${{ fromJSON(inputs.test-matrix) }} + matrix: ${{ fromJSON(needs.filter.outputs.test-matrix) }} fail-fast: false runs-on: ${{ matrix.runner }} timeout-minutes: 300 steps: + - name: Enable git symlinks on Windows + shell: bash + run: | + git config --global core.symlinks true + # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -48,7 +75,7 @@ jobs: cuda-version: ${{ inputs.cuda-version }} - name: Setup SSH (Click me for login details) - uses: ./.github/actions/setup-ssh + uses: pytorch/test-infra/.github/actions/setup-ssh@main with: github-secret: ${{ secrets.GITHUB_TOKEN }} @@ -59,7 +86,7 @@ jobs: python3 -m pip install psutil==5.9.1 python3 -m pip install pynvml==11.4.1 python3 -m tools.stats.monitor > usage_log.txt 2>&1 & - echo "::set-output name=monitor-script-pid::${!}" + echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" - name: Download PyTorch Build Artifacts uses: seemethere/download-artifact-s3@v4 @@ -97,6 +124,8 @@ jobs: TEST_CONFIG: ${{ matrix.config }} PR_BODY: ${{ github.event.pull_request.body }} TORCH_CUDA_ARCH_LIST: "7.0" + PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: ${{ matrix.mem_leak_check && '1' || '0' }} + PYTORCH_TEST_RERUN_DISABLED_TESTS: ${{ matrix.rerun_disabled_tests && '1' || '0' }} run: | COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") @@ -124,6 +153,7 @@ jobs: - name: Stop monitoring script if: always() && steps.monitor-script.outputs.monitor-script-pid shell: bash + continue-on-error: true env: MONITOR_SCRIPT_PID: ${{ steps.monitor-script.outputs.monitor-script-pid }} run: | diff --git a/.github/workflows/auto_request_review.yml b/.github/workflows/auto_request_review.yml new file mode 100644 index 000000000000..7c98c2990fba --- /dev/null +++ b/.github/workflows/auto_request_review.yml @@ -0,0 +1,22 @@ +name: Auto Request Review + +on: + pull_request: + types: [opened, ready_for_review, reopened] + +jobs: + auto-request-review: + # Don't run on forked repos + if: ${{ !github.event.pull_request.head.repo.fork }} + name: Auto Request Review + runs-on: ubuntu-latest + steps: + - name: Request review based on files changes and/or groups the author belongs to + # v0.7.0 + uses: necojackarc/auto-request-review@e08cdffa277d50854744de3f76230260e61c67f4 + with: + token: ${{ secrets.GITHUB_TOKEN }} + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true diff --git a/.github/workflows/build-triton-wheel.yml b/.github/workflows/build-triton-wheel.yml new file mode 100644 index 000000000000..fac2a1340b42 --- /dev/null +++ b/.github/workflows/build-triton-wheel.yml @@ -0,0 +1,149 @@ +name: Build Triton wheels + +on: + push: + branches: + - main + - master + paths: + - .github/workflows/build-triton-wheel.yml + - .github/scripts/build_triton_wheel.py + - .github/ci_commit_pins/triton.txt + pull_request: + paths: + - .github/workflows/build-triton-wheel.yml + - .github/scripts/build_triton_wheel.py + - .github/ci_commit_pins/triton.txt + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + build-wheel: + runs-on: 
[self-hosted, linux.2xlarge] + strategy: + fail-fast: false + matrix: + py_vers: [ "3.7", "3.8", "3.9", "3.10", "3.11" ] + timeout-minutes: 40 + env: + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + PY_VERS: ${{ matrix.py_vers }} + steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + submodules: false + + - name: Setup Linux + uses: ./.github/actions/setup-linux + + - name: Pull Docker image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main + with: + docker-image: ${{ env.DOCKER_IMAGE }} + + - name: Build Triton wheel + run: | + set -x + mkdir -p "${RUNNER_TEMP}/artifacts/" + container_name=$(docker run \ + --tty \ + --detach \ + -v "${GITHUB_WORKSPACE}:/pytorch" \ + -v "${RUNNER_TEMP}/artifacts:/artifacts" \ + -w /artifacts/ \ + "${DOCKER_IMAGE}" \ + ) + + # Determine python executable for given version + case $PY_VERS in + 3.7) + PYTHON_EXECUTABLE=/opt/python/cp37-cp37m/bin/python + ;; + 3.8) + PYTHON_EXECUTABLE=/opt/python/cp38-cp38/bin/python + ;; + 3.9) + PYTHON_EXECUTABLE=/opt/python/cp39-cp39/bin/python + ;; + 3.10) + PYTHON_EXECUTABLE=/opt/python/cp310-cp310/bin/python + ;; + 3.11) + PYTHON_EXECUTABLE=/opt/python/cp311-cp311/bin/python + ;; + *) + echo "Unsupported python version ${PY_VERS}" + exit 1 + ;; + esac + + docker exec -t "${container_name}" yum install -y llvm11 llvm11-devel llvm11-static llvm11-libs zlib-devel + docker exec -t "${container_name}" "${PYTHON_EXECUTABLE}" /pytorch/.github/scripts/build_triton_wheel.py + docker exec -t "${container_name}" chown -R 1000.1000 /artifacts + + - uses: actions/upload-artifact@v3 + with: + name: "pytorch-triton-${{ matrix.py_vers }}" + if-no-files-found: error + path: + ${{ runner.temp }}/artifacts/* + + - name: Teardown Linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main + if: always() + upload-wheel: + runs-on: linux.20_04.4x + needs: build-wheel + container: + image: continuumio/miniconda3:4.12.0 + env: + GITHUB_TOKEN: ${{ secrets.github-token }} + steps: + - name: Download Build Artifacts (3.7) + uses: actions/download-artifact@v3 + with: + name: "pytorch-triton-3.7" + path: "${{ runner.temp }}/artifacts/" + - name: Download Build Artifacts (3.8) + uses: actions/download-artifact@v3 + with: + name: "pytorch-triton-3.8" + path: "${{ runner.temp }}/artifacts/" + - name: Download Build Artifacts (3.9) + uses: actions/download-artifact@v3 + with: + name: "pytorch-triton-3.9" + path: "${{ runner.temp }}/artifacts/" + - name: Download Build Artifacts (3.10) + uses: actions/download-artifact@v3 + with: + name: "pytorch-triton-3.10" + path: "${{ runner.temp }}/artifacts/" + - name: Download Build Artifacts (3.11) + uses: actions/download-artifact@v3 + with: + name: "pytorch-triton-3.11" + path: "${{ runner.temp }}/artifacts/" + - name: Upload binaries + if: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/master' || github.event.ref == 'refs/heads/main') }} + env: + PKG_DIR: "${{ runner.temp }}/artifacts" + # When running these on pull_request events these should be blank + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_S3_UPDATE_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_S3_UPDATE_SECRET_ACCESS_KEY }} + UPLOAD_BUCKET: "s3://pytorch" + run: | + set -ex + pip install -q awscli + s3_dir="${UPLOAD_BUCKET}/whl/nightly/" + for pkg in "${PKG_DIR}/"*.whl; do + aws s3 cp 
--no-progress --acl public-read "${pkg}" "${s3_dir}" + done diff --git a/.github/workflows/check-labels.yml b/.github/workflows/check-labels.yml new file mode 100644 index 000000000000..5fa5fed16daf --- /dev/null +++ b/.github/workflows/check-labels.yml @@ -0,0 +1,44 @@ +name: Check Labels + +on: + pull_request: + types: [opened, synchronize, reopened, labeled, unlabeled] + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + check-labels: + name: Check labels + runs-on: linux.20_04.4x + steps: + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + submodules: false + fetch-depth: 1 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/.github/requirements-gha-cache.txt + + - name: Install requirements + id: requirements + run: | + pip install -r .github/requirements-gha-cache.txt --user + + - name: Check labels + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PR_NUM: ${{ github.event.number }} + run: | + set -ex + python3 .github/scripts/check_labels.py "${PR_NUM}" diff --git a/.github/workflows/docker-builds.yml b/.github/workflows/docker-builds.yml index 974ac458d4ca..3108f4b926a8 100644 --- a/.github/workflows/docker-builds.yml +++ b/.github/workflows/docker-builds.yml @@ -33,20 +33,15 @@ jobs: strategy: matrix: include: - - docker-image-name: pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 - docker-image-name: pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9 - docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 - docker-image-name: pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7 - docker-image-name: pytorch-linux-bionic-py3.7-clang9 - - docker-image-name: pytorch-linux-focal-rocm5.1-py3.7 - - docker-image-name: pytorch-linux-focal-rocm5.2-py3.7 + - docker-image-name: pytorch-linux-focal-rocm5.1-py3.8 + - docker-image-name: pytorch-linux-focal-rocm5.2-py3.8 - docker-image-name: pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12 - docker-image-name: pytorch-linux-jammy-cuda11.7-cudnn8-py3.8-clang12 - - docker-image-name: pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 - - docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c - - docker-image-name: pytorch-linux-xenial-py3-clang5-asan - - docker-image-name: pytorch-linux-xenial-py3-clang7-onnx + - docker-image-name: pytorch-linux-focal-py3-clang7-android-ndk-r19c - docker-image-name: pytorch-linux-focal-py3.7-gcc7 - docker-image-name: pytorch-linux-focal-py3-clang7-asan - docker-image-name: pytorch-linux-focal-py3-clang10-onnx @@ -81,7 +76,7 @@ jobs: push-ghcr-image: ${{ github.event_name == 'push' }} - name: Pull docker image - uses: ./.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: docker-image: ${{ steps.build-docker-image.outputs.docker-image }} @@ -90,5 +85,6 @@ jobs: if: always() - name: Teardown Linux - uses: ./.github/actions/teardown-linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main + if: always() diff --git a/.github/workflows/docker-release.yml b/.github/workflows/docker-release.yml new file mode 100644 index 000000000000..0f9638e210ad --- /dev/null +++ b/.github/workflows/docker-release.yml @@ -0,0 +1,110 @@ +name: Build Official Docker 
Images + +on: + workflow_dispatch: + pull_request: + paths: + - Dockerfile + - docker.Makefile + - .github/workflows/docker-release.yml + push: + branches: + - nightly + tags: + # Release candidate tags look like: v1.11.0-rc1 + - v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+ + - ciflow/nightly/* + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +env: + BUILD_PROGRESS: plain + BUILD_TYPE: official + DOCKER_ORG: pytorch + DOCKER_REGISTRY: ghcr.io + NO_BUILD_SUFFIX: true + USE_BUILDX: 1 + WITH_PUSH: ${{ github.event_name == 'push' && (github.event.ref == 'refs/heads/nightly' || (startsWith(github.event.ref, 'refs/tags/') && !startsWith(github.event.ref, 'refs/tags/ciflow/'))) }} + +jobs: + build: + if: ${{ github.repository == 'pytorch/pytorch' }} + runs-on: [self-hosted, linux.2xlarge] + timeout-minutes: 240 + strategy: + matrix: + include: + # nvidia specific images don't exist for arm64 so only build the runtime image + - image_type: runtime + platform: linux/arm64,linux/amd64 + - image_type: devel + platform: linux/amd64 + env: + BUILD_IMAGE_TYPE: ${{ matrix.image_type }} + BUILD_PLATFORMS: ${{ matrix.platform }} + steps: + - name: Setup SSH (Click me for login details) + uses: pytorch/test-infra/.github/actions/setup-ssh@main + with: + github-secret: ${{ secrets.GITHUB_TOKEN }} + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0) required for git merge-base + - name: Checkout PyTorch + uses: actions/checkout@v3 + with: + fetch-depth: 0 + submodules: 'recursive' + - name: Setup Linux + uses: ./.github/actions/setup-linux + - name: Login to GitHub Container Registry + if: ${{ env.WITH_PUSH == 'true' }} + uses: docker/login-action@v2 + with: + registry: ghcr.io + username: pytorch + password: ${{ secrets.GHCR_PAT }} + # Setup multi-arch image builds + - name: Set up QEMU + uses: docker/setup-qemu-action@v2 + env: + QEMU_BINARY_PATH: ${{ runner.temp }}/bin + - name: Set up Docker Buildx + uses: docker/setup-buildx-action@v2 + - name: Setup job specific variables + run: | + set -eou pipefail + # To get QEMU binaries in our PATh + echo "${RUNNER_TEMP}/bin" >> "${GITHUB_PATH}" + # Generate PyTorch version to use + echo "PYTORCH_VERSION=$(python3 .github/scripts/generate_pytorch_version.py)" >> "${GITHUB_ENV}" + - name: Setup nightly specific variables + if: ${{ github.event.ref == 'refs/heads/nightly' || startsWith(github.event.ref, 'refs/tags/ciflow/nightly/') }} + run: | + { + echo "DOCKER_IMAGE=pytorch-nightly"; + echo "INSTALL_CHANNEL=pytorch-nightly"; + echo "TRITON_VERSION=2.0.0+$(cut -c -10 .github/ci_commit_pins/triton.txt)"; + } >> "${GITHUB_ENV}" + - name: Run docker build / push + # WITH_PUSH is used here to determine whether or not to add the --push flag + run: | + make -f docker.Makefile "${BUILD_IMAGE_TYPE}-image" + - name: Push nightly tags + if: ${{ github.event.ref == 'refs/heads/nightly' && matrix.image_type == 'runtime' }} + run: | + PYTORCH_DOCKER_TAG="${PYTORCH_VERSION}-runtime" + CUDA_VERSION=$(python3 -c "import re;print(re.search('CUDA_VERSION\s+=\s+([0-9\.]+)',open('docker.Makefile').read())[1],end='')") + PYTORCH_NIGHTLY_COMMIT=$(docker run ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_DOCKER_TAG}" \ + python -c 'import torch; print(torch.version.git_version[:7],end="")') + docker tag ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_DOCKER_TAG}" \ + ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION}" + docker push 
ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION}" + + docker tag ghcr.io/pytorch/pytorch-nightly:"${PYTORCH_NIGHTLY_COMMIT}-cu${CUDA_VERSION}" \ + ghcr.io/pytorch/pytorch-nightly:latest + docker push ghcr.io/pytorch/pytorch-nightly:latest + - name: Teardown Linux + uses: pytorch/test-infra/.github/actions/teardown-linux@main + if: always() diff --git a/.github/workflows/generated-linux-binary-conda-nightly.yml b/.github/workflows/generated-linux-binary-conda-nightly.yml index 81f779f2f014..f37b8de5144c 100644 --- a/.github/workflows/generated-linux-binary-conda-nightly.yml +++ b/.github/workflows/generated-linux-binary-conda-nightly.yml @@ -93,126 +93,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_7-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda10_2 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_7-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda10_2 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_7-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_7-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_3 - build_environment: linux-binary-conda - secrets: - github-token: ${{ 
secrets.GITHUB_TOKEN }} - - conda-py3_7-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_3-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_3 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_7-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_3-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml conda-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml @@ -390,126 +270,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda10_2 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_8-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda10_2 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_8-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: 
pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_3 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_8-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_3 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_8-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml conda-py3_8-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml @@ -687,126 +447,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda10_2 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_9-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: 
conda-py3_9-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda10_2 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_9-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_3 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_9-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_3-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_3 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_9-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_3-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml conda-py3_9-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} 
uses: ./.github/workflows/_binary-build-linux.yml @@ -984,126 +624,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_10-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda10_2 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_10-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda10_2 - build_environment: linux-binary-conda - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_10-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_10-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda11_3 - build_environment: linux-binary-conda - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - conda-py3_10-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda11_3 - build_environment: linux-binary-conda - 
runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - conda-py3_10-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/conda-builder:cuda11.3 - DESIRED_PYTHON: "3.10" - build_name: conda-py3_10-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml conda-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml diff --git a/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml index a4cfb807988b..6b1765b9a405 100644 --- a/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml +++ b/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml @@ -276,510 +276,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-shared-with-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-shared-with-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-with-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-shared-with-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-with-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 
10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-with-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-shared-without-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-shared-without-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-without-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-shared-without-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-without-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-shared-without-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-static-with-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: 
static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-static-with-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-with-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-static-with-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-with-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-with-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-static-without-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-static-without-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-without-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - 
libtorch-cuda10_2-static-without-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-without-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda10_2-static-without-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-with-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-shared-with-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-shared-with-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-with-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-without-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: 
./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-shared-without-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-shared-without-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-shared-without-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-with-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-static-with-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 
11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-with-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-static-with-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-with-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-without-deps-cxx11-abi-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-static-without-deps-cxx11-abi-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-cxx11-abi-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-without-deps-cxx11-abi - build_environment: linux-binary-libtorch-cxx11-abi - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-static-without-deps-cxx11-abi-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-cxx11-abi-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cuda11_3-static-without-deps-cxx11-abi - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: 
${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml libtorch-cuda11_6-shared-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml @@ -1284,7 +780,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_1_1-shared-with-deps-cxx11-abi-build: + libtorch-rocm5_2-shared-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1293,20 +789,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_1_1-shared-with-deps-cxx11-abi + build_name: libtorch-rocm5_2-shared-with-deps-cxx11-abi build_environment: linux-binary-libtorch-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_1_1-shared-with-deps-cxx11-abi-test: # Testing + libtorch-rocm5_2-shared-with-deps-cxx11-abi-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-shared-with-deps-cxx11-abi-build + needs: libtorch-rocm5_2-shared-with-deps-cxx11-abi-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1315,11 +811,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: @@ -1349,7 +845,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1360,10 +861,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_1_1-shared-with-deps-cxx11-abi + name: libtorch-rocm5_2-shared-with-deps-cxx11-abi path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1392,9 +893,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/libtorch-cxx11-builder:rocm5.1.1 + docker-image: pytorch/libtorch-cxx11-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1405,29 +906,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_1_1-shared-with-deps-cxx11-abi-upload: # Uploading + libtorch-rocm5_2-shared-with-deps-cxx11-abi-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-shared-with-deps-cxx11-abi-test + needs: libtorch-rocm5_2-shared-with-deps-cxx11-abi-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_1_1-shared-with-deps-cxx11-abi + build_name: libtorch-rocm5_2-shared-with-deps-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_1_1-static-with-deps-cxx11-abi-build: + libtorch-rocm5_2-static-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1436,20 +937,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_1_1-static-with-deps-cxx11-abi + build_name: libtorch-rocm5_2-static-with-deps-cxx11-abi build_environment: linux-binary-libtorch-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_1_1-static-with-deps-cxx11-abi-test: # Testing + libtorch-rocm5_2-static-with-deps-cxx11-abi-test: # Testing if: ${{ 
github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-static-with-deps-cxx11-abi-build + needs: libtorch-rocm5_2-static-with-deps-cxx11-abi-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1458,11 +959,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: @@ -1492,7 +993,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1503,10 +1009,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_1_1-static-with-deps-cxx11-abi + name: libtorch-rocm5_2-static-with-deps-cxx11-abi path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1535,9 +1041,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/libtorch-cxx11-builder:rocm5.1.1 + docker-image: pytorch/libtorch-cxx11-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1548,29 +1054,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_1_1-static-with-deps-cxx11-abi-upload: # Uploading + libtorch-rocm5_2-static-with-deps-cxx11-abi-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-static-with-deps-cxx11-abi-test + needs: libtorch-rocm5_2-static-with-deps-cxx11-abi-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_1_1-static-with-deps-cxx11-abi + build_name: libtorch-rocm5_2-static-with-deps-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: 
${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_2-shared-with-deps-cxx11-abi-build: + libtorch-rocm5_3-shared-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1579,20 +1085,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_2-shared-with-deps-cxx11-abi + build_name: libtorch-rocm5_3-shared-with-deps-cxx11-abi build_environment: linux-binary-libtorch-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_2-shared-with-deps-cxx11-abi-test: # Testing + libtorch-rocm5_3-shared-with-deps-cxx11-abi-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-shared-with-deps-cxx11-abi-build + needs: libtorch-rocm5_3-shared-with-deps-cxx11-abi-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1601,11 +1107,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: @@ -1635,7 +1141,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1646,10 +1157,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_2-shared-with-deps-cxx11-abi + name: libtorch-rocm5_3-shared-with-deps-cxx11-abi path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1678,9 +1189,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/libtorch-cxx11-builder:rocm5.2 + docker-image: pytorch/libtorch-cxx11-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1691,29 +1202,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_2-shared-with-deps-cxx11-abi-upload: # Uploading + libtorch-rocm5_3-shared-with-deps-cxx11-abi-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-shared-with-deps-cxx11-abi-test + needs: libtorch-rocm5_3-shared-with-deps-cxx11-abi-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_2-shared-with-deps-cxx11-abi + build_name: libtorch-rocm5_3-shared-with-deps-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_2-static-with-deps-cxx11-abi-build: + libtorch-rocm5_3-static-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1722,20 +1233,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_2-static-with-deps-cxx11-abi + build_name: libtorch-rocm5_3-static-with-deps-cxx11-abi build_environment: linux-binary-libtorch-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_2-static-with-deps-cxx11-abi-test: # Testing + libtorch-rocm5_3-static-with-deps-cxx11-abi-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - 
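(Note on the runner health-check hunk that recurs in the ROCm test jobs above: the patch replaces the single "Failed to detect GPUs on the runner" message with a diagnostic that distinguishes "no GPUs detected" from "an unexpected GPU count" and asks the operator to report the faulty runner. Consolidated from the +/- lines into one readable step, the post-patch check is roughly the following; the run script and messages are taken from the hunks, while the step name and surrounding job context are assumed for illustration:

  - name: ROCm runner health check
    run: |
      # Healthy ROCm runners are expected to expose exactly 2 or 4 GPUs.
      ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx')
      if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then
        if [[ $ngpu -eq 0 ]]; then
          echo "Error: Failed to detect any GPUs on the runner"
        else
          echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected"
        fi
        echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified"
        exit 1
      fi

The generated diff below continues unchanged.)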
needs: libtorch-rocm5_2-static-with-deps-cxx11-abi-build + needs: libtorch-rocm5_3-static-with-deps-cxx11-abi-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1744,11 +1255,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi steps: @@ -1778,7 +1289,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1789,10 +1305,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_2-static-with-deps-cxx11-abi + name: libtorch-rocm5_3-static-with-deps-cxx11-abi path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1821,9 +1337,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/libtorch-cxx11-builder:rocm5.2 + docker-image: pytorch/libtorch-cxx11-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1834,22 +1350,22 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_2-static-with-deps-cxx11-abi-upload: # Uploading + libtorch-rocm5_3-static-with-deps-cxx11-abi-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-static-with-deps-cxx11-abi-test + needs: libtorch-rocm5_3-static-with-deps-cxx11-abi-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.2 + DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-rocm5_2-static-with-deps-cxx11-abi + build_name: libtorch-rocm5_3-static-with-deps-cxx11-abi secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} diff --git a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml index 
edacb2e949b0..39e41e67853a 100644 --- a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml +++ b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-master.yml @@ -31,7 +31,7 @@ concurrency: cancel-in-progress: true jobs: - libtorch-cpu-shared-with-deps-cxx11-abi-build: + libtorch-cpu-shared-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -42,17 +42,17 @@ jobs: # favor of GPU_ARCH_VERSION DESIRED_CUDA: cpu GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cpu-shared-with-deps-cxx11-abi + DESIRED_DEVTOOLSET: pre-cxx11 + build_name: libtorch-cpu-shared-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cpu-shared-with-deps-cxx11-abi-test: # Testing + libtorch-cpu-shared-with-deps-pre-cxx11-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cpu-shared-with-deps-cxx11-abi-build + needs: libtorch-cpu-shared-with-deps-pre-cxx11-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -62,10 +62,10 @@ jobs: # favor of GPU_ARCH_VERSION DESIRED_CUDA: cpu GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/libtorch-cxx11-builder:cpu + DOCKER_IMAGE: pytorch/manylinux-builder:cpu LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: cxx11-abi - build_name: libtorch-cpu-shared-with-deps-cxx11-abi + DESIRED_DEVTOOLSET: pre-cxx11 + build_name: libtorch-cpu-shared-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 runs_on: linux.4xlarge secrets: diff --git a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml index a09ce3c930a3..eaa928f3e09a 100644 --- a/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml +++ b/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml @@ -276,510 +276,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-shared-with-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-shared-with-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-with-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - 
DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-shared-with-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-with-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-with-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-shared-without-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-shared-without-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-without-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-shared-without-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-shared-without-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-shared-without-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - 
aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-static-with-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-static-with-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-with-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-static-with-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-with-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-with-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda10_2-static-without-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda10_2-static-without-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-without-deps-pre-cxx11-build 
- uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda10_2-static-without-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda10_2-static-without-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda10_2-static-without-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-with-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-shared-with-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-shared-with-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - 
DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-with-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-without-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-shared-without-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-shared-without-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: shared-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-shared-without-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-with-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: 
libtorch-cuda11_3-static-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-static-with-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-static-with-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-static-with-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-with-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-static-with-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-without-deps-pre-cxx11-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-static-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - libtorch-cuda11_3-static-without-deps-pre-cxx11-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-pre-cxx11-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-static-without-deps-pre-cxx11 - build_environment: linux-binary-libtorch-pre-cxx11 - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-cuda11_3-static-without-deps-pre-cxx11-upload: # Uploading - if: ${{ github.repository_owner == 
'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-pre-cxx11-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - LIBTORCH_VARIANT: static-without-deps - DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-cuda11_3-static-without-deps-pre-cxx11 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml libtorch-cuda11_6-shared-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml @@ -1284,7 +780,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_1_1-shared-with-deps-pre-cxx11-build: + libtorch-rocm5_2-shared-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1293,20 +789,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_1_1-shared-with-deps-pre-cxx11 + build_name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_1_1-shared-with-deps-pre-cxx11-test: # Testing + libtorch-rocm5_2-shared-with-deps-pre-cxx11-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-shared-with-deps-pre-cxx11-build + needs: libtorch-rocm5_2-shared-with-deps-pre-cxx11-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1315,11 +811,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: @@ -1349,7 +845,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1360,10 +861,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_1_1-shared-with-deps-pre-cxx11 + name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1392,9 +893,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1405,29 +906,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_1_1-shared-with-deps-pre-cxx11-upload: # Uploading + libtorch-rocm5_2-shared-with-deps-pre-cxx11-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-shared-with-deps-pre-cxx11-test + needs: libtorch-rocm5_2-shared-with-deps-pre-cxx11-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_1_1-shared-with-deps-pre-cxx11 + build_name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_1_1-static-with-deps-pre-cxx11-build: + libtorch-rocm5_2-static-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1436,20 +937,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_1_1-static-with-deps-pre-cxx11 + build_name: libtorch-rocm5_2-static-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_1_1-static-with-deps-pre-cxx11-test: # Testing + libtorch-rocm5_2-static-with-deps-pre-cxx11-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: 
libtorch-rocm5_1_1-static-with-deps-pre-cxx11-build + needs: libtorch-rocm5_2-static-with-deps-pre-cxx11-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1458,11 +959,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: @@ -1492,7 +993,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1503,10 +1009,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_1_1-static-with-deps-pre-cxx11 + name: libtorch-rocm5_2-static-with-deps-pre-cxx11 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1535,9 +1041,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1548,29 +1054,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_1_1-static-with-deps-pre-cxx11-upload: # Uploading + libtorch-rocm5_2-static-with-deps-pre-cxx11-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_1_1-static-with-deps-pre-cxx11-test + needs: libtorch-rocm5_2-static-with-deps-pre-cxx11-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_1_1-static-with-deps-pre-cxx11 + build_name: libtorch-rocm5_2-static-with-deps-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: 
./.github/workflows/_binary-upload.yml - libtorch-rocm5_2-shared-with-deps-pre-cxx11-build: + libtorch-rocm5_3-shared-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1579,20 +1085,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 + build_name: libtorch-rocm5_3-shared-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_2-shared-with-deps-pre-cxx11-test: # Testing + libtorch-rocm5_3-shared-with-deps-pre-cxx11-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-shared-with-deps-pre-cxx11-build + needs: libtorch-rocm5_3-shared-with-deps-pre-cxx11-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1601,11 +1107,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: @@ -1635,7 +1141,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1646,10 +1157,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 + name: libtorch-rocm5_3-shared-with-deps-pre-cxx11 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1678,9 +1189,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1691,29 +1202,29 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_2-shared-with-deps-pre-cxx11-upload: # Uploading + libtorch-rocm5_3-shared-with-deps-pre-cxx11-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-shared-with-deps-pre-cxx11-test + needs: libtorch-rocm5_3-shared-with-deps-pre-cxx11-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: shared-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_2-shared-with-deps-pre-cxx11 + build_name: libtorch-rocm5_3-shared-with-deps-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-rocm5_2-static-with-deps-pre-cxx11-build: + libtorch-rocm5_3-static-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1722,20 +1233,20 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_2-static-with-deps-pre-cxx11 + build_name: libtorch-rocm5_3-static-with-deps-pre-cxx11 build_environment: linux-binary-libtorch-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - libtorch-rocm5_2-static-with-deps-pre-cxx11-test: # Testing + libtorch-rocm5_3-static-with-deps-pre-cxx11-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: 
libtorch-rocm5_2-static-with-deps-pre-cxx11-build + needs: libtorch-rocm5_3-static-with-deps-pre-cxx11-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1744,11 +1255,11 @@ jobs: PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 steps: @@ -1778,7 +1289,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1789,10 +1305,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: libtorch-rocm5_2-static-with-deps-pre-cxx11 + name: libtorch-rocm5_3-static-with-deps-pre-cxx11 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1821,9 +1337,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1834,22 +1350,22 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - libtorch-rocm5_2-static-with-deps-pre-cxx11-upload: # Uploading + libtorch-rocm5_3-static-with-deps-pre-cxx11-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-rocm5_2-static-with-deps-pre-cxx11-test + needs: libtorch-rocm5_3-static-with-deps-pre-cxx11-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: libtorch # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 LIBTORCH_VARIANT: static-with-deps DESIRED_DEVTOOLSET: pre-cxx11 - build_name: libtorch-rocm5_2-static-with-deps-pre-cxx11 + build_name: libtorch-rocm5_3-static-with-deps-pre-cxx11 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} diff --git a/.github/workflows/generated-linux-binary-manywheel-master.yml b/.github/workflows/generated-linux-binary-manywheel-master.yml index 6412c82b0c46..e085fb5eb5fb 100644 --- 
a/.github/workflows/generated-linux-binary-manywheel-master.yml +++ b/.github/workflows/generated-linux-binary-manywheel-master.yml @@ -31,7 +31,7 @@ concurrency: cancel-in-progress: true jobs: - manywheel-py3_7-cuda10_2-build: + manywheel-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -40,19 +40,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda10_2 + build_name: manywheel-py3_7-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda10_2-test: # Testing + manywheel-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda10_2-build + needs: manywheel-py3_7-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -60,12 +60,12 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda10_2 + build_name: manywheel-py3_7-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: diff --git a/.github/workflows/generated-linux-binary-manywheel-nightly.yml b/.github/workflows/generated-linux-binary-manywheel-nightly.yml index 1867ce103a14..b93f797d7e01 100644 --- a/.github/workflows/generated-linux-binary-manywheel-nightly.yml +++ b/.github/workflows/generated-linux-binary-manywheel-nightly.yml @@ -93,67 +93,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_7-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda10_2 - build_environment: linux-binary-manywheel - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - manywheel-py3_7-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda10_2 - build_environment: linux-binary-manywheel - runs_on: 
linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_7-cuda11_3-build: + manywheel-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -162,19 +102,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_3 + build_name: manywheel-py3_7-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda11_3-test: # Testing + manywheel-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda11_3-build + needs: manywheel-py3_7-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -182,38 +122,38 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_3 + build_name: manywheel-py3_7-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda11_3-upload: # Uploading + manywheel-py3_7-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda11_3-test + needs: manywheel-py3_7-cuda11_6-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_3 + build_name: manywheel-py3_7-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: 
./.github/workflows/_binary-upload.yml - manywheel-py3_7-cuda11_6-build: + manywheel-py3_7-cuda11_7-with-pypi-cudnn-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -222,19 +162,20 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_6 + build_name: manywheel-py3_7-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-runtime-cu11; platform_system == 'Linux' | nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda11_6-test: # Testing + manywheel-py3_7-cuda11_7-with-pypi-cudnn-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda11_6-build + needs: manywheel-py3_7-cuda11_7-with-pypi-cudnn-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -242,31 +183,31 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_6 + build_name: manywheel-py3_7-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-cuda11_6-upload: # Uploading + manywheel-py3_7-cuda11_7-with-pypi-cudnn-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-cuda11_6-test + needs: manywheel-py3_7-cuda11_7-with-pypi-cudnn-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-cuda11_6 + build_name: manywheel-py3_7-cuda11_7-with-pypi-cudnn secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -333,7 +274,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_7-rocm5_1_1-build: + manywheel-py3_7-rocm5_2-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -342,19 +283,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: 
pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-rocm5_1_1 + build_name: manywheel-py3_7-rocm5_2 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-rocm5_1_1-test: # Testing + manywheel-py3_7-rocm5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_1_1-build + needs: manywheel-py3_7-rocm5_2-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -363,11 +304,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.7" steps: - name: Clean workspace @@ -396,7 +337,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -407,10 +353,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm5_1_1 + name: manywheel-py3_7-rocm5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -439,9 +385,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -452,28 +398,28 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm5_1_1-upload: # Uploading + manywheel-py3_7-rocm5_2-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_1_1-test + needs: manywheel-py3_7-rocm5_2-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-rocm5_1_1 + build_name: manywheel-py3_7-rocm5_2 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ 
secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_7-rocm5_2-build: + manywheel-py3_7-rocm5_3-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -482,19 +428,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-rocm5_2 + build_name: manywheel-py3_7-rocm5_3 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_7-rocm5_2-test: # Testing + manywheel-py3_7-rocm5_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_2-build + needs: manywheel-py3_7-rocm5_3-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -503,11 +449,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.7" steps: - name: Clean workspace @@ -536,7 +482,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -547,10 +498,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_7-rocm5_2 + name: manywheel-py3_7-rocm5_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -579,9 +530,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -592,21 +543,21 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_7-rocm5_2-upload: # Uploading + manywheel-py3_7-rocm5_3-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_7-rocm5_2-test + needs: manywheel-py3_7-rocm5_3-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.7" - build_name: manywheel-py3_7-rocm5_2 + build_name: manywheel-py3_7-rocm5_3 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -670,67 +621,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_8-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda10_2 - build_environment: linux-binary-manywheel - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - manywheel-py3_8-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda10_2 - build_environment: linux-binary-manywheel - runs_on: linux.4xlarge.nvidia.gpu - 
secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_8-cuda11_3-build: + manywheel-py3_8-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -739,19 +630,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_3 + build_name: manywheel-py3_8-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-cuda11_3-test: # Testing + manywheel-py3_8-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_3-build + needs: manywheel-py3_8-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -759,38 +650,38 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_3 + build_name: manywheel-py3_8-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-cuda11_3-upload: # Uploading + manywheel-py3_8-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_3-test + needs: manywheel-py3_8-cuda11_6-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_3 + build_name: manywheel-py3_8-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: 
./.github/workflows/_binary-upload.yml - manywheel-py3_8-cuda11_6-build: + manywheel-py3_8-cuda11_7-with-pypi-cudnn-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -799,19 +690,20 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_6 + build_name: manywheel-py3_8-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-runtime-cu11; platform_system == 'Linux' | nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-cuda11_6-test: # Testing + manywheel-py3_8-cuda11_7-with-pypi-cudnn-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_6-build + needs: manywheel-py3_8-cuda11_7-with-pypi-cudnn-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -819,31 +711,31 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_6 + build_name: manywheel-py3_8-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-cuda11_6-upload: # Uploading + manywheel-py3_8-cuda11_7-with-pypi-cudnn-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-cuda11_6-test + needs: manywheel-py3_8-cuda11_7-with-pypi-cudnn-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-cuda11_6 + build_name: manywheel-py3_8-cuda11_7-with-pypi-cudnn secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -910,7 +802,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_8-rocm5_1_1-build: + manywheel-py3_8-rocm5_2-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -919,19 +811,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: 
pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-rocm5_1_1 + build_name: manywheel-py3_8-rocm5_2 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-rocm5_1_1-test: # Testing + manywheel-py3_8-rocm5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_1_1-build + needs: manywheel-py3_8-rocm5_2-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -940,11 +832,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.8" steps: - name: Clean workspace @@ -973,7 +865,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -984,10 +881,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm5_1_1 + name: manywheel-py3_8-rocm5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1016,9 +913,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1029,28 +926,28 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm5_1_1-upload: # Uploading + manywheel-py3_8-rocm5_2-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_1_1-test + needs: manywheel-py3_8-rocm5_2-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-rocm5_1_1 + build_name: manywheel-py3_8-rocm5_2 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: 
${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_8-rocm5_2-build: + manywheel-py3_8-rocm5_3-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1059,19 +956,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-rocm5_2 + build_name: manywheel-py3_8-rocm5_3 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_8-rocm5_2-test: # Testing + manywheel-py3_8-rocm5_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_2-build + needs: manywheel-py3_8-rocm5_3-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1080,11 +977,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.8" steps: - name: Clean workspace @@ -1113,7 +1010,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1124,10 +1026,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_8-rocm5_2 + name: manywheel-py3_8-rocm5_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1156,9 +1058,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1169,21 +1071,21 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_8-rocm5_2-upload: # Uploading + manywheel-py3_8-rocm5_3-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_8-rocm5_2-test + needs: manywheel-py3_8-rocm5_3-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.8" - build_name: manywheel-py3_8-rocm5_2 + build_name: manywheel-py3_8-rocm5_3 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -1247,7 +1149,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_9-cuda10_2-build: + manywheel-py3_9-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1256,19 +1158,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda10_2 + build_name: manywheel-py3_9-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-cuda10_2-test: # Testing + manywheel-py3_9-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda10_2-build + needs: manywheel-py3_9-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -1276,98 +1178,38 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 + 
DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda10_2 + build_name: manywheel-py3_9-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-cuda10_2-upload: # Uploading + manywheel-py3_9-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_9-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_3 - build_environment: linux-binary-manywheel - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - manywheel-py3_9-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_3-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 - DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_3 - build_environment: linux-binary-manywheel - runs_on: linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_3-test + needs: manywheel-py3_9-cuda11_6-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_3 + build_name: manywheel-py3_9-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: 
./.github/workflows/_binary-upload.yml - manywheel-py3_9-cuda11_6-build: + manywheel-py3_9-cuda11_7-with-pypi-cudnn-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1376,19 +1218,20 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_6 + build_name: manywheel-py3_9-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-runtime-cu11; platform_system == 'Linux' | nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-cuda11_6-test: # Testing + manywheel-py3_9-cuda11_7-with-pypi-cudnn-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_6-build + needs: manywheel-py3_9-cuda11_7-with-pypi-cudnn-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -1396,31 +1239,31 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_6 + build_name: manywheel-py3_9-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-cuda11_6-upload: # Uploading + manywheel-py3_9-cuda11_7-with-pypi-cudnn-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-cuda11_6-test + needs: manywheel-py3_9-cuda11_7-with-pypi-cudnn-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-cuda11_6 + build_name: manywheel-py3_9-cuda11_7-with-pypi-cudnn secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -1487,7 +1330,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_9-rocm5_1_1-build: + manywheel-py3_9-rocm5_2-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1496,19 +1339,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: 
pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-rocm5_1_1 + build_name: manywheel-py3_9-rocm5_2 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-rocm5_1_1-test: # Testing + manywheel-py3_9-rocm5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_1_1-build + needs: manywheel-py3_9-rocm5_2-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1517,11 +1360,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.9" steps: - name: Clean workspace @@ -1550,7 +1393,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1561,10 +1409,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm5_1_1 + name: manywheel-py3_9-rocm5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1593,9 +1441,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1606,28 +1454,28 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm5_1_1-upload: # Uploading + manywheel-py3_9-rocm5_2-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_1_1-test + needs: manywheel-py3_9-rocm5_2-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-rocm5_1_1 + build_name: manywheel-py3_9-rocm5_2 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} 
aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_9-rocm5_2-build: + manywheel-py3_9-rocm5_3-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1636,19 +1484,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-rocm5_2 + build_name: manywheel-py3_9-rocm5_3 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_9-rocm5_2-test: # Testing + manywheel-py3_9-rocm5_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_2-build + needs: manywheel-py3_9-rocm5_3-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -1657,11 +1505,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.9" steps: - name: Clean workspace @@ -1690,7 +1538,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -1701,10 +1554,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_9-rocm5_2 + name: manywheel-py3_9-rocm5_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1733,9 +1586,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -1746,21 +1599,21 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_9-rocm5_2-upload: # Uploading + manywheel-py3_9-rocm5_3-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_9-rocm5_2-test + needs: manywheel-py3_9-rocm5_3-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.9" - build_name: manywheel-py3_9-rocm5_2 + build_name: manywheel-py3_9-rocm5_3 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -1824,67 +1677,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_10-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda10_2 - build_environment: linux-binary-manywheel - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - manywheel-py3_10-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda10_2 - build_environment: linux-binary-manywheel - runs_on: 
linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_10-cuda11_3-build: + manywheel-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1893,19 +1686,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_3 + build_name: manywheel-py3_10-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-cuda11_3-test: # Testing + manywheel-py3_10-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_3-build + needs: manywheel-py3_10-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -1913,38 +1706,38 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_3 + build_name: manywheel-py3_10-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-cuda11_3-upload: # Uploading + manywheel-py3_10-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_3-test + needs: manywheel-py3_10-cuda11_6-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_3 + build_name: manywheel-py3_10-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ 
secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_10-cuda11_6-build: + manywheel-py3_10-cuda11_7-with-pypi-cudnn-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -1953,19 +1746,20 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_6 + build_name: manywheel-py3_10-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-runtime-cu11; platform_system == 'Linux' | nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-cuda11_6-test: # Testing + manywheel-py3_10-cuda11_7-with-pypi-cudnn-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_6-build + needs: manywheel-py3_10-cuda11_7-with-pypi-cudnn-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -1973,31 +1767,31 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_6 + build_name: manywheel-py3_10-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-cuda11_6-upload: # Uploading + manywheel-py3_10-cuda11_7-with-pypi-cudnn-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-cuda11_6-test + needs: manywheel-py3_10-cuda11_7-with-pypi-cudnn-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-cuda11_6 + build_name: manywheel-py3_10-cuda11_7-with-pypi-cudnn secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -2064,7 +1858,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_10-rocm5_1_1-build: + manywheel-py3_10-rocm5_2-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -2073,19 +1867,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + 
GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-rocm5_1_1 + build_name: manywheel-py3_10-rocm5_2 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-rocm5_1_1-test: # Testing + manywheel-py3_10-rocm5_2-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-rocm5_1_1-build + needs: manywheel-py3_10-rocm5_2-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -2094,11 +1888,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.10" steps: - name: Clean workspace @@ -2127,7 +1921,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -2138,10 +1937,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-rocm5_1_1 + name: manywheel-py3_10-rocm5_2 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2170,9 +1969,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.1.1 + docker-image: pytorch/manylinux-builder:rocm5.2 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -2183,28 +1982,28 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-rocm5_1_1-upload: # Uploading + manywheel-py3_10-rocm5_2-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-rocm5_1_1-test + needs: manywheel-py3_10-rocm5_2-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.1.1 - GPU_ARCH_VERSION: 5.1.1 + DESIRED_CUDA: rocm5.2 + GPU_ARCH_VERSION: 5.2 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.1.1 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-rocm5_1_1 + build_name: manywheel-py3_10-rocm5_2 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ 
secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_10-rocm5_2-build: + manywheel-py3_10-rocm5_3-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -2213,19 +2012,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-rocm5_2 + build_name: manywheel-py3_10-rocm5_3 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_10-rocm5_2-test: # Testing + manywheel-py3_10-rocm5_3-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-rocm5_2-build + needs: manywheel-py3_10-rocm5_3-build runs-on: linux.rocm.gpu timeout-minutes: 240 env: @@ -2234,11 +2033,11 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm SKIP_ALL_TESTS: 1 - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.10" steps: - name: Clean workspace @@ -2267,7 +2066,12 @@ jobs: run: | ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then - echo "Failed to detect GPUs on the runner" + if [[ $ngpu -eq 0 ]]; then + echo "Error: Failed to detect any GPUs on the runner" + else + echo "Error: Detected $ngpu GPUs on the runner, when only 2 or 4 were expected" + fi + echo "Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" exit 1 fi - name: Runner health check disconnect on failure @@ -2278,10 +2082,10 @@ jobs: run: | env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: manywheel-py3_10-rocm5_2 + name: manywheel-py3_10-rocm5_3 path: "${{ runner.temp }}/artifacts/" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2310,9 +2114,9 @@ jobs: run: | echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}" - name: Pull Docker image - uses: ./pytorch/.github/actions/pull-docker-image + uses: pytorch/test-infra/.github/actions/pull-docker-image@main with: - docker-image: pytorch/manylinux-builder:rocm5.2 + docker-image: pytorch/manylinux-builder:rocm5.3 - name: Test Pytorch binary uses: ./pytorch/.github/actions/test-pytorch-binary - name: Kill containers, clean up images @@ -2323,21 +2127,21 @@ jobs: docker stop $(docker ps -q) || true # Prune all of the docker images docker system prune -af - manywheel-py3_10-rocm5_2-upload: # Uploading + manywheel-py3_10-rocm5_3-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_10-rocm5_2-test + needs: manywheel-py3_10-rocm5_3-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: rocm5.2 - GPU_ARCH_VERSION: 5.2 + DESIRED_CUDA: rocm5.3 + GPU_ARCH_VERSION: 5.3 GPU_ARCH_TYPE: rocm - DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.2 + DOCKER_IMAGE: pytorch/manylinux-builder:rocm5.3 DESIRED_PYTHON: "3.10" - build_name: manywheel-py3_10-rocm5_2 + build_name: manywheel-py3_10-rocm5_3 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} @@ -2401,67 +2205,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_11-cuda10_2-build: - if: ${{ github.repository_owner == 'pytorch' }} - uses: ./.github/workflows/_binary-build-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda10_2 - build_environment: linux-binary-manywheel - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - - manywheel-py3_11-cuda10_2-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda10_2-build - uses: ./.github/workflows/_binary-test-linux.yml - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda10_2 - build_environment: linux-binary-manywheel - runs_on: 
linux.4xlarge.nvidia.gpu - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_11-cuda10_2-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda10_2-test - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: manywheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu102 - GPU_ARCH_VERSION: 10.2 - GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda10.2 - DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda10_2 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_11-cuda11_3-build: + manywheel-py3_11-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -2470,19 +2214,19 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_3 + build_name: manywheel-py3_11-cuda11_6 build_environment: linux-binary-manywheel secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_11-cuda11_3-test: # Testing + manywheel-py3_11-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda11_3-build + needs: manywheel-py3_11-cuda11_6-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -2490,38 +2234,38 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_3 + build_name: manywheel-py3_11-cuda11_6 build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_11-cuda11_3-upload: # Uploading + manywheel-py3_11-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda11_3-test + needs: manywheel-py3_11-cuda11_6-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.3 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_3 + build_name: manywheel-py3_11-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ 
secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - manywheel-py3_11-cuda11_6-build: + manywheel-py3_11-cuda11_7-with-pypi-cudnn-build: if: ${{ github.repository_owner == 'pytorch' }} uses: ./.github/workflows/_binary-build-linux.yml with: @@ -2530,19 +2274,20 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_6 + build_name: manywheel-py3_11-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel + PYTORCH_EXTRA_INSTALL_REQUIREMENTS: nvidia-cuda-runtime-cu11; platform_system == 'Linux' | nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux' secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_11-cuda11_6-test: # Testing + manywheel-py3_11-cuda11_7-with-pypi-cudnn-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda11_6-build + needs: manywheel-py3_11-cuda11_7-with-pypi-cudnn-build uses: ./.github/workflows/_binary-test-linux.yml with: PYTORCH_ROOT: /pytorch @@ -2550,31 +2295,31 @@ jobs: PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_6 + build_name: manywheel-py3_11-cuda11_7-with-pypi-cudnn build_environment: linux-binary-manywheel runs_on: linux.4xlarge.nvidia.gpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} - manywheel-py3_11-cuda11_6-upload: # Uploading + manywheel-py3_11-cuda11_7-with-pypi-cudnn-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: manywheel-py3_11-cuda11_6-test + needs: manywheel-py3_11-cuda11_7-with-pypi-cudnn-test with: PYTORCH_ROOT: /pytorch BUILDER_ROOT: /builder PACKAGE_TYPE: manywheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda - DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.6 + DOCKER_IMAGE: pytorch/manylinux-builder:cuda11.7 DESIRED_PYTHON: "3.11" - build_name: manywheel-py3_11-cuda11_6 + build_name: manywheel-py3_11-cuda11_7-with-pypi-cudnn secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} diff --git a/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml b/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml index a6210cf4ad67..c88b107a90a9 100644 --- a/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml +++ b/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml @@ -67,10 +67,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors 
-o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -95,11 +96,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -110,7 +116,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_8-cpu @@ -171,10 +177,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -199,11 +206,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -214,7 +226,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: 
conda-py3_9-cpu @@ -275,10 +287,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -303,11 +316,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -318,7 +336,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_10-cpu diff --git a/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml b/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml index 61217b639ad5..c8858fd0501b 100644 --- a/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml +++ b/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml @@ -34,110 +34,6 @@ concurrency: cancel-in-progress: true jobs: - wheel-py3_7-cpu-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-12-xl - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" - # For sccache access (only on non-forked PRs) - AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} - steps: - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - # shellcheck disable=SC2129 - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - # shellcheck disable=SC2129 - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - # shellcheck disable=SC2129 - echo "MAC_PACKAGE_WORK_DIR=${RUNNER_TEMP}" >> "${GITHUB_ENV}" - - name: Install conda and dependencies - run: | - # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh - chmod +x "${RUNNER_TEMP}/conda.sh" - /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" - echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Install sccache (only for non-forked PRs, and pushes to trunk) - if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - - name: Populate binary env - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - run: | - # shellcheck disable=SC1091 - source "${RUNNER_TEMP}/anaconda/bin/activate" - "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 - if: always() - with: - name: wheel-py3_7-cpu - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - wheel-py3_7-cpu-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cpu-build - with: - PYTORCH_ROOT: /pytorch - BUILDER_ROOT: /builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - DOCKER_IMAGE: pytorch/manylinux-builder:cpu - DESIRED_PYTHON: "3.7" - build_name: wheel-py3_7-cpu - use_s3: False - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml wheel-py3_8-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: macos-12-xl @@ -171,10 +67,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o 
"${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -199,11 +96,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -214,7 +116,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_8-cpu @@ -275,10 +177,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -303,11 +206,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -318,7 +226,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" 
"${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_9-cpu @@ -379,10 +287,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -407,11 +316,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -422,7 +336,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_10-cpu diff --git a/.github/workflows/generated-macos-binary-conda-nightly.yml b/.github/workflows/generated-macos-binary-conda-nightly.yml index 174650de54d7..52cfb3d98f76 100644 --- a/.github/workflows/generated-macos-binary-conda-nightly.yml +++ b/.github/workflows/generated-macos-binary-conda-nightly.yml @@ -65,10 +65,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -93,11 +94,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 
https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -108,7 +114,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_7-cpu @@ -169,10 +175,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -197,11 +204,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -212,7 +224,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_8-cpu @@ -273,10 +285,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: 
zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -301,11 +314,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -316,7 +334,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_9-cpu @@ -377,10 +395,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -405,11 +424,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -420,7 +444,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_10-cpu diff --git a/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml b/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml index a6e4119c387e..cd9ad45ba561 100644 --- a/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml +++ b/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml @@ -34,9 
+34,8 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -70,10 +69,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -98,11 +98,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -113,7 +118,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-with-deps-cxx11-abi @@ -144,9 +149,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-shared-without-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -180,10 +184,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -208,11 
+213,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -223,7 +233,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-without-deps-cxx11-abi @@ -254,9 +264,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-static-with-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -290,10 +299,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -318,11 +328,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -333,7 +348,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: 
libtorch-cpu-static-with-deps-cxx11-abi @@ -364,9 +379,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-static-without-deps-cxx11-abi-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -400,10 +414,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -428,11 +443,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -443,7 +463,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-without-deps-cxx11-abi diff --git a/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml b/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml index 7853e8009393..4ce5c6f32c36 100644 --- a/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml +++ b/.github/workflows/generated-macos-binary-libtorch-pre-cxx11-nightly.yml @@ -34,9 +34,8 @@ concurrency: jobs: libtorch-cpu-shared-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -70,10 +69,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" 
https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -98,11 +98,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -113,7 +118,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-with-deps-pre-cxx11 @@ -144,9 +149,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-shared-without-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -180,10 +184,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -208,11 +213,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x 
/usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -223,7 +233,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-without-deps-pre-cxx11 @@ -254,9 +264,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-static-with-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -290,10 +299,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -318,11 +328,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -333,7 +348,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-with-deps-pre-cxx11 @@ -364,9 +379,8 @@ jobs: uses: ./.github/workflows/_binary-upload.yml libtorch-cpu-static-without-deps-pre-cxx11-build: if: ${{ github.repository_owner == 'pytorch' }} - runs-on: macos-10.15 - # libtorch builds take a long time on github hosted runners - timeout-minutes: 720 + runs-on: macos-12-xl + timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -400,10 +414,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" 
https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -428,11 +443,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -443,7 +463,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-without-deps-pre-cxx11 diff --git a/.github/workflows/generated-macos-binary-wheel-nightly.yml b/.github/workflows/generated-macos-binary-wheel-nightly.yml index 47442efef269..a3839d6e8a14 100644 --- a/.github/workflows/generated-macos-binary-wheel-nightly.yml +++ b/.github/workflows/generated-macos-binary-wheel-nightly.yml @@ -65,10 +65,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -93,11 +94,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + 
sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -108,7 +114,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_7-cpu @@ -169,10 +175,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -197,11 +204,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -212,7 +224,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_8-cpu @@ -273,10 +285,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -301,11 +314,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output 
/usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -316,7 +334,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_9-cpu @@ -377,10 +395,11 @@ jobs: - name: Install conda and dependencies run: | # Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on - curl --retry 3 -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh + curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh chmod +x "${RUNNER_TEMP}/conda.sh" /bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda" echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}" + echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 with: @@ -405,11 +424,16 @@ jobs: git clean -fxd working-directory: builder - name: Install sccache (only for non-forked PRs, and pushes to trunk) + uses: nick-fields/retry@v2.8.2 if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }} - run: | - sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache - sudo chmod +x /usr/local/bin/sccache - echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" + with: + timeout_minutes: 5 + max_attempts: 3 + retry_wait_seconds: 90 + command: | + sudo curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache + sudo chmod +x /usr/local/bin/sccache + echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}" - name: Populate binary env run: | # shellcheck disable=SC1091 @@ -420,7 +444,7 @@ jobs: # shellcheck disable=SC1091 source "${RUNNER_TEMP}/anaconda/bin/activate" "${PYTORCH_ROOT}/.circleci/scripts/binary_macos_build.sh" - - uses: actions/upload-artifact@v2 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_10-cpu diff --git a/.github/workflows/generated-windows-binary-conda-nightly.yml b/.github/workflows/generated-windows-binary-conda-nightly.yml index b4633b15c661..df7cc13d8a26 100644 --- a/.github/workflows/generated-windows-binary-conda-nightly.yml +++ b/.github/workflows/generated-windows-binary-conda-nightly.yml @@ -115,7 +115,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_7-cpu @@ -188,7 +188,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: 
actions/download-artifact@v3 name: Download Build Artifacts with: name: conda-py3_7-cpu @@ -256,7 +256,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_7-cuda11_3-build: + conda-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -266,8 +266,8 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -340,10 +340,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_7-cuda11_3 + name: conda-py3_7-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -360,9 +360,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_3-test: # Testing + conda-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_3-build + needs: conda-py3_7-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -371,8 +371,8 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -414,10 +414,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_7-cuda11_3 + name: conda-py3_7-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -463,27 +463,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_3-upload: # Uploading + conda-py3_7-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_3-test + needs: conda-py3_7-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_3 + build_name: conda-py3_7-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_7-cuda11_6-build: + conda-py3_7-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -493,8 +493,8 @@ jobs: PACKAGE_TYPE: conda # 
TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -567,10 +567,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_7-cuda11_6 + name: conda-py3_7-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -587,9 +587,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_6-test: # Testing + conda-py3_7-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_6-build + needs: conda-py3_7-cuda11_7-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -598,8 +598,8 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -641,10 +641,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_7-cuda11_6 + name: conda-py3_7-cuda11_7 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -690,27 +690,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_6-upload: # Uploading + conda-py3_7-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_6-test + needs: conda-py3_7-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_6 + build_name: conda-py3_7-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_7-cuda11_7-build: + conda-py3_8-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -720,11 +720,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" + DESIRED_PYTHON: "3.8" steps: - name: Display EC2 information shell: bash @@ -794,10 +793,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: 
always() with: - name: conda-py3_7-cuda11_7 + name: conda-py3_8-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -814,10 +813,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_7-test: # Testing + conda-py3_8-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_7-build - runs-on: windows.8xlarge.nvidia.gpu + needs: conda-py3_8-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -825,11 +824,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" + DESIRED_PYTHON: "3.8" steps: - name: Display EC2 information shell: bash @@ -868,10 +866,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_7-cuda11_7 + name: conda-py3_8-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -917,27 +915,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_7-cuda11_7-upload: # Uploading + conda-py3_8-cpu-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_7-cuda11_7-test + needs: conda-py3_8-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.7" - build_name: conda-py3_7-cuda11_7 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DESIRED_PYTHON: "3.8" + build_name: conda-py3_8-cpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cpu-build: + conda-py3_8-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -947,8 +944,9 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1020,10 +1018,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_8-cpu + name: conda-py3_8-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1040,10 +1038,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cpu-test: # Testing + conda-py3_8-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - 
needs: conda-py3_8-cpu-build - runs-on: windows.4xlarge + needs: conda-py3_8-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1051,8 +1049,9 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1093,10 +1092,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_8-cpu + name: conda-py3_8-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1142,26 +1141,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cpu-upload: # Uploading + conda-py3_8-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cpu-test + needs: conda-py3_8-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cpu + build_name: conda-py3_8-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cuda11_3-build: + conda-py3_8-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1171,8 +1171,8 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1245,10 +1245,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_8-cuda11_3 + name: conda-py3_8-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1265,9 +1265,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_3-test: # Testing + conda-py3_8-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-build + needs: conda-py3_8-cuda11_7-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -1276,8 +1276,8 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1319,10 +1319,10 
@@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_3 + name: conda-py3_8-cuda11_7 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1368,27 +1368,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_3-upload: # Uploading + conda-py3_8-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_3-test + needs: conda-py3_8-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_3 + build_name: conda-py3_8-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cuda11_6-build: + conda-py3_9-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1398,11 +1398,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1472,10 +1471,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_8-cuda11_6 + name: conda-py3_9-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1492,10 +1491,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_6-test: # Testing + conda-py3_9-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_6-build - runs-on: windows.8xlarge.nvidia.gpu + needs: conda-py3_9-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1503,11 +1502,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1546,10 +1544,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build 
Artifacts with: - name: conda-py3_8-cuda11_6 + name: conda-py3_9-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1595,27 +1593,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_6-upload: # Uploading + conda-py3_9-cpu-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_6-test + needs: conda-py3_9-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_6 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DESIRED_PYTHON: "3.9" + build_name: conda-py3_9-cpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_8-cuda11_7-build: + conda-py3_9-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1625,11 +1622,11 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1699,10 +1696,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_8-cuda11_7 + name: conda-py3_9-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1719,9 +1716,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_7-test: # Testing + conda-py3_9-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_7-build + needs: conda-py3_9-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -1730,11 +1727,11 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1773,10 +1770,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_8-cuda11_7 + name: conda-py3_9-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1822,27 +1819,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_8-cuda11_7-upload: # 
Uploading + conda-py3_9-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_8-cuda11_7-test + needs: conda-py3_9-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.8" - build_name: conda-py3_8-cuda11_7 + DESIRED_PYTHON: "3.9" + build_name: conda-py3_9-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cpu-build: + conda-py3_9-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1852,8 +1849,9 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1925,10 +1923,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_9-cpu + name: conda-py3_9-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1945,10 +1943,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cpu-test: # Testing + conda-py3_9-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cpu-build - runs-on: windows.4xlarge + needs: conda-py3_9-cuda11_7-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1956,8 +1954,9 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1998,10 +1997,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_9-cpu + name: conda-py3_9-cuda11_7 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2047,26 +2046,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cpu-upload: # Uploading + conda-py3_9-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cpu-test + needs: conda-py3_9-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu 
+ DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cpu + build_name: conda-py3_9-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cuda11_3-build: + conda-py3_10-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -2076,11 +2076,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2150,10 +2149,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: conda-py3_9-cuda11_3 + name: conda-py3_10-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -2170,10 +2169,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_3-test: # Testing + conda-py3_10-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu + needs: conda-py3_10-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -2181,11 +2180,10 @@ jobs: PACKAGE_TYPE: conda # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2224,10 +2222,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: conda-py3_9-cuda11_3 + name: conda-py3_10-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2273,688 +2271,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_3-upload: # Uploading + conda-py3_10-cpu-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_3-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: 
${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cuda11_6-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: conda-py3_9-cuda11_6 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_6-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_6-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
- shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: conda-py3_9-cuda11_6 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_6-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_6-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_6 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_9-cuda11_7-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata 
endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: conda-py3_9-cuda11_7 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_7-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_7-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor 
of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: conda-py3_9-cuda11_7 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_9-cuda11_7-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_9-cuda11_7-test - with: - 
PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: conda-py3_9-cuda11_7 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - conda-py3_10-cpu-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: conda-py3_10-cpu - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_10-cpu-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cpu-build - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
- shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: conda-py3_10-cpu - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_10-cpu-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cpu-test + needs: conda-py3_10-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -2971,233 +2290,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - conda-py3_10-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata 
instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: conda-py3_10-cuda11_3 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_10-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: conda-py3_10-cuda11_3 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - conda-py3_10-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: conda-py3_10-cuda11_3-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: conda - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: 
"3.10" - build_name: conda-py3_10-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml conda-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge @@ -3282,7 +2374,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_10-cuda11_6 @@ -3356,7 +2448,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: conda-py3_10-cuda11_6 @@ -3509,7 +2601,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: conda-py3_10-cuda11_7 @@ -3583,7 +2675,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: conda-py3_10-cuda11_7 diff --git a/.github/workflows/generated-windows-binary-libtorch-debug-master.yml b/.github/workflows/generated-windows-binary-libtorch-debug-master.yml index c34cb5250018..e52949eadf68 100644 --- a/.github/workflows/generated-windows-binary-libtorch-debug-master.yml +++ b/.github/workflows/generated-windows-binary-libtorch-debug-master.yml @@ -114,7 +114,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-with-deps-debug @@ -191,7 +191,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-with-deps-debug diff --git a/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml b/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml index de660d9ef218..c0b5ddae71fa 100644 --- a/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml +++ b/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml @@ -119,7 +119,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-with-deps-debug @@ -196,7 +196,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: 
libtorch-cpu-shared-with-deps-debug @@ -355,7 +355,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-without-deps-debug @@ -432,7 +432,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-debug @@ -591,7 +591,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-with-deps-debug @@ -668,7 +668,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-debug @@ -827,7 +827,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-without-deps-debug @@ -904,7 +904,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-debug @@ -976,962 +976,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-with-deps-debug-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary 
builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-shared-with-deps-debug - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-with-deps-debug-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-debug-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint 
for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-shared-with-deps-debug - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-with-deps-debug-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-debug-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - 
DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-shared-with-deps-debug - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-without-deps-debug-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-shared-without-deps-debug - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-without-deps-debug-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-debug-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-shared-without-deps-debug - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-without-deps-debug-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-debug-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-shared-without-deps-debug - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-with-deps-debug-build: - if: ${{ github.repository_owner == 'pytorch' }} - 
runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-static-with-deps-debug - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-with-deps-debug-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-debug-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-static-with-deps-debug - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-with-deps-debug-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-debug-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-static-with-deps-debug - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-without-deps-debug-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: 
windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-static-without-deps-debug - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-without-deps-debug-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-debug-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-static-without-deps-debug - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-without-deps-debug-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-debug-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: debug - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-static-without-deps-debug - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml libtorch-cuda11_6-shared-with-deps-debug-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: 
windows.4xlarge
@@ -2020,7 +1064,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_6-shared-with-deps-debug
@@ -2098,7 +1142,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_6-shared-with-deps-debug
@@ -2259,7 +1303,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_6-shared-without-deps-debug
@@ -2337,7 +1381,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_6-shared-without-deps-debug
@@ -2498,7 +1542,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_6-static-with-deps-debug
@@ -2576,7 +1620,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_6-static-with-deps-debug
@@ -2737,7 +1781,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_6-static-without-deps-debug
@@ -2815,7 +1859,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_6-static-without-deps-debug
@@ -2976,7 +2020,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_7-shared-with-deps-debug
@@ -3054,7 +2098,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_7-shared-with-deps-debug
@@ -3215,7 +2259,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_7-shared-without-deps-debug
@@ -3293,7 +2337,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_7-shared-without-deps-debug
@@ -3454,7 +2498,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_7-static-with-deps-debug
@@ -3532,7 +2576,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_7-static-with-deps-debug
@@ -3693,7 +2737,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cuda11_7-static-without-deps-debug
@@ -3771,7 +2815,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cuda11_7-static-without-deps-debug
diff --git a/.github/workflows/generated-windows-binary-libtorch-release-master.yml b/.github/workflows/generated-windows-binary-libtorch-release-master.yml
index 7765834eeda7..ada48aa7768c 100644
--- a/.github/workflows/generated-windows-binary-libtorch-release-master.yml
+++ b/.github/workflows/generated-windows-binary-libtorch-release-master.yml
@@ -114,7 +114,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cpu-shared-with-deps-release
@@ -191,7 +191,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cpu-shared-with-deps-release
diff --git a/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml b/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
index e96ddc5fb635..f2f1d3badfe3 100644
--- a/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
+++ b/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
@@ -119,7 +119,7 @@ jobs:
        shell: bash
        run: |
          "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh"
-      - uses: seemethere/upload-artifact-s3@v5
+      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: libtorch-cpu-shared-with-deps-release
@@ -196,7 +196,7 @@ jobs:
          echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}"
          echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}"
          echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}"
-      - uses: seemethere/download-artifact-s3@v4
+      - uses: actions/download-artifact@v3
        name: Download Build Artifacts
        with:
          name: libtorch-cpu-shared-with-deps-release
@@ -355,7 +355,7 @@ jobs:
        shell: bash
        run: |
"${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-shared-without-deps-release @@ -432,7 +432,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-shared-without-deps-release @@ -591,7 +591,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-with-deps-release @@ -668,7 +668,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-with-deps-release @@ -827,7 +827,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cpu-static-without-deps-release @@ -904,7 +904,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cpu-static-without-deps-release @@ -976,962 +976,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-with-deps-release-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: 
https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-shared-with-deps-release - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-with-deps-release-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-release-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 
- # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-shared-with-deps-release - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-with-deps-release-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-with-deps-release-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: 
cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-shared-with-deps-release - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-shared-without-deps-release-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-shared-without-deps-release - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-without-deps-release-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-release-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path 
"HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-shared-without-deps-release - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-shared-without-deps-release-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-shared-without-deps-release-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: shared-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-shared-without-deps-release - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - 
libtorch-cuda11_3-static-with-deps-release-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-static-with-deps-release - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-with-deps-release-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-release-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name 
"LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-static-with-deps-release - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-with-deps-release-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-with-deps-release-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-with-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-static-with-deps-release - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - libtorch-cuda11_3-static-without-deps-release-build: - if: ${{ github.repository_owner == 'pytorch' }} - 
runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: libtorch-cuda11_3-static-without-deps-release - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-without-deps-release-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-release-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path 
"HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: libtorch-cuda11_3-static-without-deps-release - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - libtorch-cuda11_3-static-without-deps-release-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: libtorch-cuda11_3-static-without-deps-release-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: libtorch - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - LIBTORCH_CONFIG: release - LIBTORCH_VARIANT: static-without-deps - # This is a dummy value for libtorch to work correctly with our batch scripts - # without this value pip does not get installed for some reason - DESIRED_PYTHON: "3.7" - build_name: libtorch-cuda11_3-static-without-deps-release - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml 
libtorch-cuda11_6-shared-with-deps-release-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge @@ -2020,7 +1064,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_6-shared-with-deps-release @@ -2098,7 +1142,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_6-shared-with-deps-release @@ -2259,7 +1303,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_6-shared-without-deps-release @@ -2337,7 +1381,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_6-shared-without-deps-release @@ -2498,7 +1542,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_6-static-with-deps-release @@ -2576,7 +1620,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_6-static-with-deps-release @@ -2737,7 +1781,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_6-static-without-deps-release @@ -2815,7 +1859,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_6-static-without-deps-release @@ -2976,7 +2020,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_7-shared-with-deps-release @@ -3054,7 +2098,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_7-shared-with-deps-release @@ -3215,7 +2259,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: 
libtorch-cuda11_7-shared-without-deps-release @@ -3293,7 +2337,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_7-shared-without-deps-release @@ -3454,7 +2498,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_7-static-with-deps-release @@ -3532,7 +2576,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_7-static-with-deps-release @@ -3693,7 +2737,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: libtorch-cuda11_7-static-without-deps-release @@ -3771,7 +2815,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: libtorch-cuda11_7-static-without-deps-release diff --git a/.github/workflows/generated-windows-binary-wheel-master.yml b/.github/workflows/generated-windows-binary-wheel-master.yml deleted file mode 100644 index 175507d4fd5a..000000000000 --- a/.github/workflows/generated-windows-binary-wheel-master.yml +++ /dev/null @@ -1,236 +0,0 @@ -# @generated DO NOT EDIT MANUALLY - -# Template is at: .github/templates/windows_binary_build_workflow.yml.j2 -# Generation script: .github/scripts/generate_ci_workflows.py -name: windows-binary-wheel - -on: - push: - branches: - - master - tags: - - 'ciflow/trunk/*' - workflow_dispatch: - -env: - # Needed for conda builds - ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine" - ANACONDA_USER: pytorch - AWS_DEFAULT_REGION: us-east-1 - BUILD_ENVIRONMENT: windows-binary-wheel - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - PR_NUMBER: ${{ github.event.pull_request.number }} - SHA1: ${{ github.event.pull_request.head.sha || github.sha }} - SKIP_ALL_TESTS: 1 -concurrency: - group: windows-binary-wheel-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true - -jobs: - wheel-py3_7-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 
- # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: wheel-py3_7-cuda11_3 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of 
GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: wheel-py3_7-cuda11_3 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 diff --git a/.github/workflows/generated-windows-binary-wheel-nightly.yml b/.github/workflows/generated-windows-binary-wheel-nightly.yml index 
df5ce57fff06..026c81e6bb58 100644 --- a/.github/workflows/generated-windows-binary-wheel-nightly.yml +++ b/.github/workflows/generated-windows-binary-wheel-nightly.yml @@ -115,7 +115,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_7-cpu @@ -188,7 +188,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: wheel-py3_7-cpu @@ -256,7 +256,7 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_7-cuda11_3-build: + wheel-py3_7-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -266,8 +266,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -340,10 +340,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_7-cuda11_3 + name: wheel-py3_7-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -360,9 +360,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_3-test: # Testing + wheel-py3_7-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_3-build + needs: wheel-py3_7-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -371,8 +371,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -414,10 +414,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_7-cuda11_3 + name: wheel-py3_7-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -463,27 +463,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_3-upload: # Uploading + wheel-py3_7-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_3-test + needs: wheel-py3_7-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu116 + 
GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.7" - build_name: wheel-py3_7-cuda11_3 + build_name: wheel-py3_7-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_7-cuda11_6-build: + wheel-py3_7-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -493,8 +493,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -567,10 +567,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_7-cuda11_6 + name: wheel-py3_7-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -587,9 +587,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_6-test: # Testing + wheel-py3_7-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_6-build + needs: wheel-py3_7-cuda11_7-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -598,8 +598,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.7" @@ -641,10 +641,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_7-cuda11_6 + name: wheel-py3_7-cuda11_7 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -690,27 +690,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_6-upload: # Uploading + wheel-py3_7-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_6-test + needs: wheel-py3_7-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.7" - build_name: wheel-py3_7-cuda11_6 + build_name: wheel-py3_7-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_7-cuda11_7-build: + wheel-py3_8-cpu-build: if: ${{ 
github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -720,11 +720,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" + DESIRED_PYTHON: "3.8" steps: - name: Display EC2 information shell: bash @@ -794,10 +793,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_7-cuda11_7 + name: wheel-py3_8-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -814,10 +813,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_7-test: # Testing + wheel-py3_8-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_7-build - runs-on: windows.8xlarge.nvidia.gpu + needs: wheel-py3_8-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -825,11 +824,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.7" + DESIRED_PYTHON: "3.8" steps: - name: Display EC2 information shell: bash @@ -868,10 +866,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_7-cuda11_7 + name: wheel-py3_8-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -917,27 +915,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_7-cuda11_7-upload: # Uploading + wheel-py3_8-cpu-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_7-cuda11_7-test + needs: wheel-py3_8-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.7" - build_name: wheel-py3_7-cuda11_7 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DESIRED_PYTHON: "3.8" + build_name: wheel-py3_8-cpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_8-cpu-build: + wheel-py3_8-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -947,8 +944,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + 
GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1020,10 +1018,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_8-cpu + name: wheel-py3_8-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1040,10 +1038,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cpu-test: # Testing + wheel-py3_8-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cpu-build - runs-on: windows.4xlarge + needs: wheel-py3_8-cuda11_6-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1051,8 +1049,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" steps: @@ -1093,10 +1092,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cpu + name: wheel-py3_8-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1142,26 +1141,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cpu-upload: # Uploading + wheel-py3_8-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cpu-test + needs: wheel-py3_8-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 + GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.8" - build_name: wheel-py3_8-cpu + build_name: wheel-py3_8-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_8-cuda11_3-build: + wheel-py3_8-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1171,8 +1171,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1245,10 +1245,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_8-cuda11_3 + name: wheel-py3_8-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1265,9 +1265,9 
@@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_3-test: # Testing + wheel-py3_8-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_3-build + needs: wheel-py3_8-cuda11_7-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -1276,8 +1276,8 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.8" @@ -1319,10 +1319,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_3 + name: wheel-py3_8-cuda11_7 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1368,27 +1368,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_3-upload: # Uploading + wheel-py3_8-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_3-test + needs: wheel-py3_8-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.8" - build_name: wheel-py3_8-cuda11_3 + build_name: wheel-py3_8-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_8-cuda11_6-build: + wheel-py3_9-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1398,11 +1398,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1472,10 +1471,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_8-cuda11_6 + name: wheel-py3_9-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1492,10 +1491,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_6-test: # Testing + wheel-py3_9-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_6-build - runs-on: windows.8xlarge.nvidia.gpu + needs: wheel-py3_9-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1503,11 +1502,10 @@ jobs: 
PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1546,10 +1544,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_6 + name: wheel-py3_9-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1595,27 +1593,26 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_6-upload: # Uploading + wheel-py3_9-cpu-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_6-test + needs: wheel-py3_9-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.8" - build_name: wheel-py3_8-cuda11_6 + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu + DESIRED_PYTHON: "3.9" + build_name: wheel-py3_9-cpu secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_8-cuda11_7-build: + wheel-py3_9-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1625,11 +1622,11 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: bash @@ -1699,10 +1696,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_8-cuda11_7 + name: wheel-py3_9-cuda11_6 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1719,9 +1716,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_7-test: # Testing + wheel-py3_9-cuda11_6-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_7-build + needs: wheel-py3_9-cuda11_6-build runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: @@ -1730,11 +1727,11 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.8" + DESIRED_PYTHON: "3.9" steps: - name: Display EC2 information shell: 
bash @@ -1773,10 +1770,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_8-cuda11_7 + name: wheel-py3_9-cuda11_6 path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -1822,27 +1819,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_8-cuda11_7-upload: # Uploading + wheel-py3_9-cuda11_6-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_8-cuda11_7-test + needs: wheel-py3_9-cuda11_6-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 + DESIRED_CUDA: cu116 + GPU_ARCH_VERSION: 11.6 GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.8" - build_name: wheel-py3_8-cuda11_7 + DESIRED_PYTHON: "3.9" + build_name: wheel-py3_9-cuda11_6 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_9-cpu-build: + wheel-py3_9-cuda11_7-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -1852,8 +1849,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1925,10 +1923,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_9-cpu + name: wheel-py3_9-cuda11_7 retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -1945,10 +1943,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cpu-test: # Testing + wheel-py3_9-cuda11_7-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cpu-build - runs-on: windows.4xlarge + needs: wheel-py3_9-cuda11_7-build + runs-on: windows.8xlarge.nvidia.gpu timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -1956,8 +1954,9 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda SKIP_ALL_TESTS: 1 DESIRED_PYTHON: "3.9" steps: @@ -1998,10 +1997,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cpu + name: wheel-py3_9-cuda11_7 path: "${{ 
env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2047,26 +2046,27 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cpu-upload: # Uploading + wheel-py3_9-cuda11_7-upload: # Uploading if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cpu-test + needs: wheel-py3_9-cuda11_7-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu + DESIRED_CUDA: cu117 + GPU_ARCH_VERSION: 11.7 + GPU_ARCH_TYPE: cuda DESIRED_PYTHON: "3.9" - build_name: wheel-py3_9-cpu + build_name: wheel-py3_9-cuda11_7 secrets: github-token: ${{ secrets.GITHUB_TOKEN }} aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_9-cuda11_3-build: + wheel-py3_10-cpu-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge timeout-minutes: 240 @@ -2076,11 +2076,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2150,10 +2149,10 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: - name: wheel-py3_9-cuda11_3 + name: wheel-py3_10-cpu retention-days: 14 if-no-files-found: error path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" @@ -2170,10 +2169,10 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_3-test: # Testing + wheel-py3_10-cpu-test: # Testing if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu + needs: wheel-py3_10-cpu-build + runs-on: windows.4xlarge timeout-minutes: 240 env: PYTORCH_ROOT: ${{ github.workspace }}/pytorch @@ -2181,11 +2180,10 @@ jobs: PACKAGE_TYPE: wheel # TODO: This is a legacy variable that we eventually want to get rid of in # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda + DESIRED_CUDA: cpu + GPU_ARCH_TYPE: cpu SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" + DESIRED_PYTHON: "3.10" steps: - name: Display EC2 information shell: bash @@ -2224,10 +2222,10 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: - name: wheel-py3_9-cuda11_3 + name: wheel-py3_10-cpu path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - name: Checkout PyTorch uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 @@ -2273,688 +2271,9 @@ jobs: if: always() run: | .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_3-upload: # Uploading + wheel-py3_10-cpu-upload: # Uploading if: ${{ 
github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_3-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: wheel-py3_9-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - wheel-py3_9-cuda11_6-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: wheel-py3_9-cuda11_6 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_6-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_6-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
- shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: wheel-py3_9-cuda11_6 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_6-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_6-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu116 - GPU_ARCH_VERSION: 11.6 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: wheel-py3_9-cuda11_6 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - wheel-py3_9-cuda11_7-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata 
endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: wheel-py3_9-cuda11_7 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_7-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_7-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor 
of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.9" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: wheel-py3_9-cuda11_7 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_9-cuda11_7-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_9-cuda11_7-test - with: - 
PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu117 - GPU_ARCH_VERSION: 11.7 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: "3.9" - build_name: wheel-py3_9-cuda11_7 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml - wheel-py3_10-cpu-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. 
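[Editor's aside] Each *-upload job in this generated file is a thin caller of the reusable _binary-upload.yml workflow. A trimmed sketch of that shape, taking the wheel-py3_9-cuda11_7 job from the hunks above (the generated file also passes PYTORCH_ROOT/BUILDER_ROOT and the AWS/conda upload secrets; this is only the skeleton):

  wheel-py3_9-cuda11_7-upload:  # Uploading
    if: ${{ github.repository_owner == 'pytorch' }}
    needs: wheel-py3_9-cuda11_7-test
    uses: ./.github/workflows/_binary-upload.yml
    with:
      PACKAGE_TYPE: wheel
      DESIRED_CUDA: cu117
      GPU_ARCH_VERSION: 11.7
      GPU_ARCH_TYPE: cuda
      DESIRED_PYTHON: "3.9"
      build_name: wheel-py3_9-cuda11_7
    secrets:
      github-token: ${{ secrets.GITHUB_TOKEN }}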
- - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: wheel-py3_10-cpu - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cpu-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cpu-build - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cpu - GPU_ARCH_TYPE: cpu - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. 
- shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: wheel-py3_10-cpu - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cpu-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cpu-test + needs: wheel-py3_10-cpu-test with: PYTORCH_ROOT: ${{ github.workspace }}/pytorch BUILDER_ROOT: ${{ github.workspace }}/builder @@ -2971,233 +2290,6 @@ jobs: aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} uses: ./.github/workflows/_binary-upload.yml - wheel-py3_10-cuda11_3-build: - if: ${{ github.repository_owner == 'pytorch' }} - runs-on: windows.4xlarge - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata 
instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Build PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 - if: always() - with: - name: wheel-py3_10-cuda11_3 - retention-days: 14 - if-no-files-found: error - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_3-test: # Testing - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_3-build - runs-on: windows.8xlarge.nvidia.gpu - timeout-minutes: 240 - env: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - SKIP_ALL_TESTS: 1 - DESIRED_PYTHON: "3.10" - steps: - - name: Display EC2 information - shell: bash - run: | - set -euo pipefail - function get_ec2_metadata() { - # Pulled from instance metadata endpoint for EC2 - # see 
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html - category=$1 - curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" - } - echo "ami-id: $(get_ec2_metadata ami-id)" - echo "instance-id: $(get_ec2_metadata instance-id)" - echo "instance-type: $(get_ec2_metadata instance-type)" - echo "system info $(uname -a)" - - name: "[FB EMPLOYEES] Enable SSH (Click me for login details)" - uses: seemethere/add-github-ssh-key@v1 - with: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560 - - name: Enable long paths on Windows - shell: powershell - run: | - Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 - # Since it's just a defensive command, the workflow should continue even the command fails - - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory. - shell: powershell - run: | - Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore - # NOTE: These environment variables are put here so that they can be applied on every job equally - # They are also here because setting them at a workflow level doesn't give us access to the - # runner.temp variable, which we need. - - name: Populate binary env - shell: bash - run: | - echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" - echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" - echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 - name: Download Build Artifacts - with: - name: wheel-py3_10-cuda11_3 - path: "${{ env.PYTORCH_FINAL_PACKAGE_DIR }}" - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - submodules: recursive - path: pytorch - - name: Clean PyTorch checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: pytorch - - name: Checkout pytorch/builder - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: main - submodules: recursive - repository: pytorch/builder - path: builder - - name: Clean pytorch/builder checkout - run: | - # Remove any artifacts from the previous checkouts - git clean -fxd - working-directory: builder - - name: Populate binary env - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_populate_env.sh" - - name: Test PyTorch binary - shell: bash - run: | - "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_test.sh" - - name: Wait until all sessions have drained - shell: powershell - working-directory: pytorch - if: always() - timeout-minutes: 120 - run: | - .github\scripts\wait_for_ssh_to_drain.ps1 - - name: Kill active ssh sessions if still around (Useful if workflow was cancelled) - shell: powershell - working-directory: pytorch - if: always() - run: | - .github\scripts\kill_active_ssh_sessions.ps1 - wheel-py3_10-cuda11_3-upload: # Uploading - if: ${{ github.repository_owner == 'pytorch' }} - needs: wheel-py3_10-cuda11_3-test - with: - PYTORCH_ROOT: ${{ github.workspace }}/pytorch - BUILDER_ROOT: ${{ github.workspace }}/builder - PACKAGE_TYPE: wheel - # TODO: This is a legacy variable that we eventually want to get rid of in - # favor of GPU_ARCH_VERSION - DESIRED_CUDA: cu113 - GPU_ARCH_VERSION: 11.3 - GPU_ARCH_TYPE: cuda - DESIRED_PYTHON: 
"3.10" - build_name: wheel-py3_10-cuda11_3 - secrets: - github-token: ${{ secrets.GITHUB_TOKEN }} - aws-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }} - aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }} - conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }} - uses: ./.github/workflows/_binary-upload.yml wheel-py3_10-cuda11_6-build: if: ${{ github.repository_owner == 'pytorch' }} runs-on: windows.4xlarge @@ -3282,7 +2374,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_10-cuda11_6 @@ -3356,7 +2448,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: wheel-py3_10-cuda11_6 @@ -3509,7 +2601,7 @@ jobs: shell: bash run: | "${PYTORCH_ROOT}/.circleci/scripts/binary_windows_build.sh" - - uses: seemethere/upload-artifact-s3@v5 + - uses: actions/upload-artifact@v3 if: always() with: name: wheel-py3_10-cuda11_7 @@ -3583,7 +2675,7 @@ jobs: echo "BINARY_ENV_FILE=${RUNNER_TEMP}/env" >> "${GITHUB_ENV}" echo "PYTORCH_FINAL_PACKAGE_DIR=${RUNNER_TEMP}/artifacts" >> "${GITHUB_ENV}" echo "WIN_PACKAGE_WORK_DIR=${RUNNER_TEMP}" - - uses: seemethere/download-artifact-s3@v4 + - uses: actions/download-artifact@v3 name: Download Build Artifacts with: name: wheel-py3_10-cuda11_7 diff --git a/.github/workflows/inductor.yml b/.github/workflows/inductor.yml new file mode 100644 index 000000000000..9179b186e918 --- /dev/null +++ b/.github/workflows/inductor.yml @@ -0,0 +1,41 @@ +name: inductor + +on: + schedule: + - cron: 45 1,5,9,13,17,21 * * * + push: + tags: + - ciflow/inductor/* + - ciflow/periodic/* + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true + +jobs: + linux-bionic-cuda11_6-py3_10-gcc7-inductor-build: + name: cuda11.6-py3.10-gcc7-sm86 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86 + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 + cuda-arch-list: '8.6' + test-matrix: | + { include: [ + { config: "inductor", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "inductor_torchbench", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "inductor_distributed", shard: 1, num_shards: 1, runner: "linux.g5.12xlarge.nvidia.gpu" }, + ]} + + linux-bionic-cuda11_6-py3_10-gcc7-inductor-test: + name: cuda11.6-py3.10-gcc7-sm86 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3_10-gcc7-inductor-build + with: + build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86 + docker-image: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-inductor-build.outputs.docker-image }} + test-matrix: ${{ 
needs.linux-bionic-cuda11_6-py3_10-gcc7-inductor-build.outputs.test-matrix }} diff --git a/.github/workflows/labeler.yml b/.github/workflows/labeler.yml new file mode 100644 index 000000000000..bdef7a1367bf --- /dev/null +++ b/.github/workflows/labeler.yml @@ -0,0 +1,20 @@ +name: Labeler + +on: +- pull_request_target + +jobs: + triage: + permissions: + contents: read + pull-requests: write + runs-on: ubuntu-latest + steps: + - uses: actions/labeler@v4 + with: + repo-token: "${{ secrets.GITHUB_TOKEN }}" + sync-labels: '' + +concurrency: + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + cancel-in-progress: true diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 763d284280f6..1f47e1defc2f 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -14,19 +14,25 @@ jobs: lintrunner: runs-on: linux.20_04.16x steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master with: submodules: false + fetch-depth: 1 + + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/.github/requirements-gha-cache.txt - - name: Install lintrunner - run: pip install lintrunner==0.9.* + - name: Install requirements + run: | + pip install -r .github/requirements-gha-cache.txt --user - name: Initialize lint dependencies run: lintrunner init @@ -64,11 +70,6 @@ jobs: name: quick-checks runs-on: linux.20_04.4x steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.x - architecture: x64 # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -79,9 +80,18 @@ jobs: run: | # Remove any artifacts from the previous checkouts git clean -fxd + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.x' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt - name: Install requirements id: requirements - run: pip3 install -r requirements.txt --user + run: pip install -r requirements.txt --user - name: Ensure no non-breaking spaces if: always() run: | @@ -111,7 +121,7 @@ jobs: name: pr-sanity-checks runs-on: linux.20_04.4x # Only run this on pull requests - if: github.event_name == 'pull_request' + if: github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'skip-pr-sanity-checks') steps: - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master @@ -123,56 +133,35 @@ jobs: BASE: ${{ github.event.pull_request.base.sha }} HEAD: ${{ github.event.pull_request.head.sha }} run: | - set -x - - ancestor=$(git merge-base "${BASE}" "${HEAD}") - details=$(git diff --shortstat "$ancestor" "${HEAD}") - add=$(echo "$details" | grep -o '[0-9]* insertion' | grep -o '[0-9]*' || true) - remove=$(echo "$details" | grep -o '[0-9]* deletion' | grep -o '[0-9]*' || true) - - pr_size=0 - if [ "$add" ]; then - pr_size=$(("$pr_size" + "$add")) - fi - if [ "$remove" ]; then - pr_size=$(("$pr_size" + "$remove")) - fi - - if ((pr_size > 2000)); then - echo - echo 'Your PR is '"$pr_size"' LOC which is more than the 2000 maximum' - echo 'allowed within PyTorch infra. 
PLease make sure to split up' - echo 'your PR into smaller pieces that can be reviewed.' - echo 'If you think that this rule should not apply to your PR,' - echo 'please contact @albanD or @seemethere.' - echo - false - fi - - + bash .github/scripts/pr-sanity-check.sh workflow-checks: name: workflow-checks runs-on: linux.20_04.4x steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.x - architecture: x64 # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master with: submodules: false fetch-depth: 1 + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.x' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt + **/.github/requirements-gha-cache.txt - name: Install requirements id: requirements run: | - pip3 install -r requirements.txt --user + pip install -r requirements.txt --user - name: Install Jinja2 run: | - pip3 install Jinja2==3.0.1 --user + pip install Jinja2==3.0.1 --user - name: Regenerate workflows id: generate_workflows run: .github/scripts/generate_ci_workflows.py @@ -202,14 +191,15 @@ jobs: env: NPM_CONFIG_PREFIX: ~/.npm-global steps: - - name: Setup Node - uses: actions/setup-node@v2 # [see note: pytorch repo ref] - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master with: submodules: false fetch-depth: 1 + # This is not a node project so there is no package-lock.json to cache + - name: Setup Node + uses: actions/setup-node@v3 - name: Install markdown-toc run: npm install -g markdown-toc - name: Regenerate ToCs and check that they didn't change @@ -241,29 +231,36 @@ jobs: if: ${{ github.repository == 'pytorch/pytorch' }} runs-on: linux.20_04.4x steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 # [see note: pytorch repo ref] # deep clone (fetch-depth 0) required, to allow us to use git log - name: Checkout PyTorch uses: pytorch/pytorch/.github/actions/checkout-pytorch@master with: submodules: false + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt + **/requirements-flake8.txt + **/.circleci/docker/requirements-ci.txt + **/.github/requirements-gha-cache.txt - name: Install dependencies # mypy and boto3 versions copied from # .circleci/docker/common/install_conda.sh run: | set -eux - python3 -mpip install -r requirements.txt - python3 -mpip install boto3==1.16.34 - pip3 install typing-extensions==3.10 --user - pip3 install -r requirements-flake8.txt --user - python3 -mpip install rockset==0.8.10 --user - python3 -mpip install -r requirements.txt --user - python3 -mpip install mypy==0.960 --user + pip install -r requirements.txt + pip install boto3==1.19.12 + pip install typing-extensions==3.10 --user + pip install -r requirements-flake8.txt --user + pip install rockset==0.8.10 --user + pip install -r requirements.txt --user + pip install mypy==0.960 --user make setup_lint - name: Test tools run: | @@ -278,28 +275,37 @@ jobs: matrix: test_type: [with_torch, without_torch, older_python_version] steps: + # [see note: pytorch repo ref] + # deep clone (fetch-depth 0) required, to allow us to use git log + - name: Checkout PyTorch + uses: pytorch/pytorch/.github/actions/checkout-pytorch@master + with: + submodules: false + fetch-depth: 1 - name: Setup Python 3.5 if: 
matrix.test_type == 'older_python_version' - uses: actions/setup-python@v2 + uses: actions/setup-python@v4 with: - python-version: 3.5 + python-version: '3.5' architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt - name: Setup Python 3.8 if: matrix.test_type != 'older_python_version' - uses: actions/setup-python@v2 + uses: actions/setup-python@v4 with: - python-version: 3.8 + python-version: '3.8' architecture: x64 - # [see note: pytorch repo ref] - # deep clone (fetch-depth 0) required, to allow us to use git log - - name: Checkout PyTorch - uses: pytorch/pytorch/.github/actions/checkout-pytorch@master - with: - submodules: false - fetch-depth: 1 + check-latest: false + cache: pip + cache-dependency-path: | + **/requirements.txt - name: Install torch if: matrix.test_type == 'with_torch' run: | + pip install -r requirements.txt # Doesn't really matter what torch version, we just need ANY torch installed pip install 'torch==1.*' - name: Run collect_env.py diff --git a/.github/workflows/mac-mps.yml b/.github/workflows/mac-mps.yml index 8fc2dd8336bf..5df7299cc507 100644 --- a/.github/workflows/mac-mps.yml +++ b/.github/workflows/mac-mps.yml @@ -22,6 +22,10 @@ jobs: build-generates-artifacts: true # To match the one pre-installed in the m1 runners python_version: 3.9.12 + # We need to set the environment file here instead of trying to detect it automatically because + # MacOS arm64 is cross-compiled from x86-64. Specifically, it means that arm64 conda environment + # is needed when building PyTorch MacOS arm64 from x86-64 + environment-file: .github/requirements/conda-env-macOS-ARM64 secrets: MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml index 133aa56865c7..a8de37ca85be 100644 --- a/.github/workflows/nightly.yml +++ b/.github/workflows/nightly.yml @@ -35,3 +35,12 @@ jobs: run-doxygen: true secrets: GH_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} + + update-vision-commit-hash: + uses: ./.github/workflows/_update-commit-hash.yml + with: + repo-name: vision + branch: main + secrets: + MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} + PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} diff --git a/.github/workflows/periodic.yml b/.github/workflows/periodic.yml index 7fbd04f8f161..9a188345899d 100644 --- a/.github/workflows/periodic.yml +++ b/.github/workflows/periodic.yml @@ -3,99 +3,106 @@ name: periodic on: schedule: - cron: 45 0,4,8,12,16,20 * * * + - cron: 29 8 * * * # about 1:29am PDT, for mem leak check and rerun disabled tests push: tags: - ciflow/periodic/* workflow_dispatch: concurrency: - group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }} cancel-in-progress: true jobs: - linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build: - name: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck + linux-bionic-cuda11_6-py3-gcc7-slow-gradcheck-build: + name: linux-bionic-cuda11.6-py3-gcc7-slow-gradcheck uses: ./.github/workflows/_linux-build.yml with: - build-environment: 
linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck - docker-image-name: pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7 - - linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-test: - name: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck - uses: ./.github/workflows/_linux-test.yml - needs: linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build - with: - build-environment: linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck - docker-image: ${{ needs.linux-xenial-cuda10_2-py3-gcc7-slow-gradcheck-build.outputs.docker-image }} + build-environment: linux-bionic-cuda11.6-py3-gcc7-slow-gradcheck + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, ]} - linux-focal-rocm5_2-py3_7-slow-build: - name: linux-focal-rocm5.2-py3.7-slow - uses: ./.github/workflows/_linux-build.yml + linux-bionic-cuda11_6-py3-gcc7-slow-gradcheck-test: + name: linux-bionic-cuda11.6-py3-gcc7-slow-gradcheck + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3-gcc7-slow-gradcheck-build with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image-name: pytorch-linux-focal-rocm5.2-py3.7 + build-environment: linux-bionic-cuda11.6-py3-gcc7-slow-gradcheck + docker-image: ${{ needs.linux-bionic-cuda11_6-py3-gcc7-slow-gradcheck-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-cuda11_6-py3-gcc7-slow-gradcheck-build.outputs.test-matrix }} + timeout-minutes: 300 - linux-focal-rocm5_2-py3_7-slow-test: - name: linux-focal-rocm5.2-py3.7-slow - uses: ./.github/workflows/_rocm-test.yml - needs: linux-focal-rocm5_2-py3_7-slow-build + linux-focal-rocm5_2-py3_8-slow-build: + name: linux-focal-rocm5.2-py3.8-slow + uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image: ${{ needs.linux-focal-rocm5_2-py3_7-slow-build.outputs.docker-image }} + build-environment: linux-focal-rocm5.2-py3.8 + docker-image-name: pytorch-linux-focal-rocm5.2-py3.8 test-matrix: | { include: [ { config: "slow", shard: 1, num_shards: 1, runner: "linux.rocm.gpu" }, ]} + + linux-focal-rocm5_2-py3_8-slow-test: + name: linux-focal-rocm5.2-py3.8-slow + uses: ./.github/workflows/_rocm-test.yml + needs: linux-focal-rocm5_2-py3_8-slow-build + with: + build-environment: linux-focal-rocm5.2-py3.8 + docker-image: ${{ needs.linux-focal-rocm5_2-py3_8-slow-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-focal-rocm5_2-py3_8-slow-build.outputs.test-matrix }} secrets: AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} - linux-focal-rocm5_2-py3_7-distributed-build: - name: linux-focal-rocm5.2-py3.7-distributed + linux-focal-rocm5_2-py3_8-distributed-build: + name: linux-focal-rocm5.2-py3.8-distributed uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image-name: pytorch-linux-focal-rocm5.2-py3.7 - - linux-focal-rocm5_2-py3_7-distributed-test: - name: linux-focal-rocm5.2-py3.7-distributed - uses: ./.github/workflows/_rocm-test.yml - needs: linux-focal-rocm5_2-py3_7-distributed-build - with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image: ${{ needs.linux-focal-rocm5_2-py3_7-distributed-build.outputs.docker-image }} + build-environment: linux-focal-rocm5.2-py3.8 + docker-image-name: 
pytorch-linux-focal-rocm5.2-py3.8 test-matrix: | { include: [ { config: "distributed", shard: 1, num_shards: 2, runner: "linux.rocm.gpu" }, { config: "distributed", shard: 2, num_shards: 2, runner: "linux.rocm.gpu" }, ]} + + linux-focal-rocm5_2-py3_8-distributed-test: + name: linux-focal-rocm5.2-py3.8-distributed + uses: ./.github/workflows/_rocm-test.yml + needs: linux-focal-rocm5_2-py3_8-distributed-build + with: + build-environment: linux-focal-rocm5.2-py3.8 + docker-image: ${{ needs.linux-focal-rocm5_2-py3_8-distributed-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-focal-rocm5_2-py3_8-distributed-build.outputs.test-matrix }} secrets: AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} - linux-bionic-cuda10_2-py3_9-gcc7-build: - name: linux-bionic-cuda10.2-py3.9-gcc7 + linux-bionic-cuda11_6-py3_9-gcc7-build: + name: linux-bionic-cuda11.6-py3.9-gcc7 uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-bionic-cuda10.2-py3.9-gcc7 - docker-image-name: pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 - - linux-bionic-cuda10_2-py3_9-gcc7-test: - name: linux-bionic-cuda10.2-py3.9-gcc7 - uses: ./.github/workflows/_linux-test.yml - needs: linux-bionic-cuda10_2-py3_9-gcc7-build - with: - build-environment: linux-bionic-cuda10.2-py3.9-gcc7 - docker-image: ${{ needs.linux-bionic-cuda10_2-py3_9-gcc7-build.outputs.docker-image }} + build-environment: linux-bionic-cuda11.6-py3.9-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 test-matrix: | { include: [ { config: "multigpu", shard: 1, num_shards: 1, runner: "linux.16xlarge.nvidia.gpu" }, ]} + build-with-debug: false + + linux-bionic-cuda11_6-py3_9-gcc7-test: + name: linux-bionic-cuda11.6-py3.9-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3_9-gcc7-build + with: + build-environment: linux-bionic-cuda11.6-py3.9-gcc7 + docker-image: ${{ needs.linux-bionic-cuda11_6-py3_9-gcc7-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-cuda11_6-py3_9-gcc7-build.outputs.test-matrix }} linux-bionic-cuda11_6-py3_7-gcc7-debug-build: name: linux-bionic-cuda11.6-py3.7-gcc7-debug @@ -104,6 +111,13 @@ jobs: build-environment: linux-bionic-cuda11.6-py3.7-gcc7-debug docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 build-with-debug: true + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + ]} linux-bionic-cuda11_6-py3_7-gcc7-debug-test: name: linux-bionic-cuda11.6-py3.7-gcc7-debug @@ -112,13 +126,7 @@ jobs: with: build-environment: linux-bionic-cuda11.6-py3.7-gcc7-debug docker-image: ${{ needs.linux-bionic-cuda11_6-py3_7-gcc7-debug-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - ]} + test-matrix: ${{ 
needs.linux-bionic-cuda11_6-py3_7-gcc7-debug-build.outputs.test-matrix }} linux-bionic-cuda11_7-py3_7-gcc7-debug-build: name: linux-bionic-cuda11.7-py3.7-gcc7-debug @@ -127,6 +135,13 @@ jobs: build-environment: linux-bionic-cuda11.7-py3.7-gcc7-debug docker-image-name: pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7 build-with-debug: true + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + ]} linux-bionic-cuda11_7-py3_7-gcc7-debug-test: name: linux-bionic-cuda11.7-py3.7-gcc7-debug @@ -135,13 +150,7 @@ jobs: with: build-environment: linux-bionic-cuda11.7-py3.7-gcc7-debug docker-image: ${{ needs.linux-bionic-cuda11_7-py3_7-gcc7-debug-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - ]} + test-matrix: ${{ needs.linux-bionic-cuda11_7-py3_7-gcc7-debug-build.outputs.test-matrix }} libtorch-linux-bionic-cuda11_7-py3_7-gcc7-build: name: libtorch-linux-bionic-cuda11.7-py3.7-gcc7 @@ -157,6 +166,13 @@ jobs: with: build-environment: win-vs2019-cuda11.7-py3 cuda-version: "11.7" + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} win-vs2019-cuda11_7-py3-test: name: win-vs2019-cuda11.7-py3 @@ -165,12 +181,7 @@ jobs: with: build-environment: win-vs2019-cuda11.7-py3 cuda-version: "11.7" - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, - { config: "default", shard: 2, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" }, - { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, - ]} + test-matrix: ${{ needs.win-vs2019-cuda11_7-py3-build.outputs.test-matrix }} ios-12-5-1-x86-64-coreml: name: ios-12-5-1-x86-64-coreml @@ -179,11 +190,6 @@ jobs: build-environment: ios-12-5-1-x86-64-coreml ios-platform: SIMULATOR ios-arch: x86_64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} ios-12-5-1-arm64: name: ios-12-5-1-arm64 @@ -192,11 +198,6 @@ jobs: build-environment: ios-12-5-1-arm64 ios-platform: OS ios-arch: arm64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} ios-12-5-1-arm64-coreml: name: ios-12-5-1-arm64-coreml @@ -205,11 +206,6 @@ jobs: build-environment: ios-12-5-1-arm64-coreml ios-platform: OS ios-arch: arm64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - 
IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} ios-12-5-1-arm64-custom-ops: name: ios-12-5-1-arm64-custom-ops @@ -218,11 +214,6 @@ jobs: build-environment: ios-12-5-1-arm64-custom-ops ios-platform: OS ios-arch: arm64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} ios-12-5-1-arm64-metal: name: ios-12-5-1-arm64-metal @@ -231,11 +222,6 @@ jobs: build-environment: ios-12-5-1-arm64-metal ios-platform: OS ios-arch: arm64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} buck-build-test: name: buck-build-test diff --git a/.github/workflows/pr-labels.yml b/.github/workflows/pr-labels.yml deleted file mode 100644 index 7313d0b8e968..000000000000 --- a/.github/workflows/pr-labels.yml +++ /dev/null @@ -1,32 +0,0 @@ -name: pr-labels - -on: - push: - branches: - - master - - main - -jobs: - is-properly-labeled: - runs-on: ubuntu-latest - - steps: - - name: Set up python - uses: actions/setup-python@v2 - - - name: Install requests - run: pip3 install requests==2.26 - - - name: Checkout repository - uses: actions/checkout@v2 - - - name: Process commit and find merger responsible for labeling - id: commit - env: - SHA1: ${{ github.sha }} - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - run: python .github/scripts/process_commit.py "${SHA1}" - -concurrency: - group: pr-labels-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/pull.yml b/.github/workflows/pull.yml index 76656febf928..3642c7fc1769 100644 --- a/.github/workflows/pull.yml +++ b/.github/workflows/pull.yml @@ -9,9 +9,11 @@ on: - release/* - landchecks/* workflow_dispatch: + schedule: + - cron: 29 8 * * * # about 1:29am PDT concurrency: - group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }} cancel-in-progress: true jobs: @@ -21,25 +23,27 @@ jobs: with: build-environment: linux-focal-py3.7-gcc7 docker-image-name: pytorch-linux-focal-py3.7-gcc7 - - linux-focal-py3_7-gcc7-test: - name: linux-focal-py3.7-gcc7 - uses: ./.github/workflows/_linux-test.yml - needs: linux-focal-py3_7-gcc7-build - with: - build-environment: linux-focal-py3.7-gcc7 - docker-image: ${{ needs.linux-focal-py3_7-gcc7-build.outputs.docker-image }} test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, - { config: "distributed", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "distributed", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "distributed", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "docs_test", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "jit_legacy", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: 
"backwards_compat", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, ]} + linux-focal-py3_7-gcc7-test: + name: linux-focal-py3.7-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-focal-py3_7-gcc7-build + with: + build-environment: linux-focal-py3.7-gcc7 + docker-image: ${{ needs.linux-focal-py3_7-gcc7-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-focal-py3_7-gcc7-build.outputs.test-matrix }} + linux-docs: name: linux-docs uses: ./.github/workflows/_docs.yml @@ -68,6 +72,15 @@ jobs: with: build-environment: linux-focal-py3.7-clang7-asan docker-image-name: pytorch-linux-focal-py3-clang7-asan + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 5, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 5, runner: "linux.2xlarge" }, + { config: "default", shard: 3, num_shards: 5, runner: "linux.2xlarge" }, + { config: "default", shard: 4, num_shards: 5, runner: "linux.4xlarge" }, + { config: "default", shard: 5, num_shards: 5, runner: "linux.4xlarge" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} linux-focal-py3_7-clang7-asan-test: name: linux-focal-py3.7-clang7-asan @@ -76,14 +89,7 @@ jobs: with: build-environment: linux-focal-py3.7-clang7-asan docker-image: ${{ needs.linux-focal-py3_7-clang7-asan-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 5, runner: "linux.2xlarge" }, - { config: "default", shard: 2, num_shards: 5, runner: "linux.2xlarge" }, - { config: "default", shard: 3, num_shards: 5, runner: "linux.2xlarge" }, - { config: "default", shard: 4, num_shards: 5, runner: "linux.2xlarge" }, - { config: "default", shard: 5, num_shards: 5, runner: "linux.2xlarge" }, - ]} + test-matrix: ${{ needs.linux-focal-py3_7-clang7-asan-build.outputs.test-matrix }} linux-focal-py3_7-clang10-onnx-build: name: linux-focal-py3.7-clang10-onnx @@ -91,6 +97,11 @@ jobs: with: build-environment: linux-focal-py3.7-clang10-onnx docker-image-name: pytorch-linux-focal-py3-clang10-onnx + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + ]} linux-focal-py3_7-clang10-onnx-test: name: linux-focal-py3.7-clang10-onnx @@ -99,11 +110,7 @@ jobs: with: build-environment: linux-focal-py3.7-clang10-onnx docker-image: ${{ needs.linux-focal-py3_7-clang10-onnx-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, - { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, - ]} + test-matrix: ${{ needs.linux-focal-py3_7-clang10-onnx-build.outputs.test-matrix }} linux-bionic-py3_7-clang9-build: name: linux-bionic-py3.7-clang9 @@ -111,14 +118,6 @@ jobs: with: build-environment: linux-bionic-py3.7-clang9 docker-image-name: pytorch-linux-bionic-py3.7-clang9 - - linux-bionic-py3_7-clang9-test: - name: linux-bionic-py3.7-clang9 - uses: ./.github/workflows/_linux-test.yml - needs: linux-bionic-py3_7-clang9-build - with: - build-environment: linux-bionic-py3.7-clang9 - docker-image: ${{ needs.linux-bionic-py3_7-clang9-build.outputs.docker-image }} test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, @@ -130,12 +129,14 @@ jobs: { config: "functorch", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, ]} - linux-bionic-cuda11_3-py3_7-clang9-build: - name: linux-bionic-cuda11.3-py3.7-clang9 - uses: 
./.github/workflows/_linux-build.yml + linux-bionic-py3_7-clang9-test: + name: linux-bionic-py3.7-clang9 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-py3_7-clang9-build with: - build-environment: linux-bionic-cuda11.3-py3.7-clang9 - docker-image-name: pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9 + build-environment: linux-bionic-py3.7-clang9 + docker-image: ${{ needs.linux-bionic-py3_7-clang9-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-py3_7-clang9-build.outputs.test-matrix }} linux-vulkan-bionic-py3_7-clang9-build: name: linux-vulkan-bionic-py3.7-clang9 @@ -143,6 +144,10 @@ jobs: with: build-environment: linux-vulkan-bionic-py3.7-clang9 docker-image-name: pytorch-linux-bionic-py3.7-clang9 + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} linux-vulkan-bionic-py3_7-clang9-test: name: linux-vulkan-bionic-py3.7-clang9 @@ -151,10 +156,7 @@ jobs: with: build-environment: linux-vulkan-bionic-py3.7-clang9 docker-image: ${{ needs.linux-vulkan-bionic-py3_7-clang9-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, - ]} + test-matrix: ${{ needs.linux-vulkan-bionic-py3_7-clang9-build.outputs.test-matrix }} linux-bionic-cuda11_6-py3_10-gcc7-build: name: linux-bionic-cuda11.6-py3.10-gcc7 @@ -162,31 +164,34 @@ jobs: with: build-environment: linux-bionic-cuda11.6-py3.10-gcc7 docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 - - linux-bionic-cuda11_6-py3_10-gcc7-test: - name: linux-bionic-cuda11.6-py3.10-gcc7 - uses: ./.github/workflows/_linux-test.yml - needs: linux-bionic-cuda11_6-py3_10-gcc7-build - with: - build-environment: linux-bionic-cuda11.6-py3.10-gcc7 - docker-image: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-build.outputs.docker-image }} test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "distributed", shard: 1, num_shards: 2, runner: "linux.8xlarge.nvidia.gpu" }, - { config: "distributed", shard: 2, num_shards: 2, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 1, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 2, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 3, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "deploy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, ]} - linux-xenial-py3-clang5-mobile-build: - name: linux-xenial-py3-clang5-mobile-build + linux-bionic-cuda11_6-py3_10-gcc7-test: + name: linux-bionic-cuda11.6-py3.10-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3_10-gcc7-build + with: + build-environment: linux-bionic-cuda11.6-py3.10-gcc7 + docker-image: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-build.outputs.test-matrix }} + + linux-focal-py3-clang7-mobile-build: + name: linux-focal-py3-clang7-mobile-build uses: ./.github/workflows/_linux-build.yml with: - build-environment: 
linux-xenial-py3-clang5-mobile-build - docker-image-name: pytorch-linux-xenial-py3-clang5-asan + build-environment: linux-focal-py3-clang7-mobile-build + docker-image-name: pytorch-linux-focal-py3-clang7-asan build-generates-artifacts: false linux-jammy-cuda-11_6-cudnn8-py3_8-clang12-build: @@ -196,12 +201,12 @@ jobs: build-environment: linux-jammy-cuda11.6-cudnn8-py3.8-clang12 docker-image-name: pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12 - linux-xenial-py3-clang5-mobile-custom-build-static: - name: linux-xenial-py3-clang5-mobile-custom-build-static + linux-focal-py3-clang7-mobile-custom-build-static: + name: linux-focal-py3-clang7-mobile-custom-build-static uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-xenial-py3-clang5-mobile-custom-build-static - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + build-environment: linux-focal-py3-clang7-mobile-custom-build-static + docker-image-name: pytorch-linux-focal-py3-clang7-android-ndk-r19c build-generates-artifacts: false linux-bionic-py3_7-clang8-xla-build: @@ -210,6 +215,10 @@ jobs: with: build-environment: linux-bionic-py3_7-clang8-xla docker-image-name: xla_base + test-matrix: | + { include: [ + { config: "xla", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} linux-bionic-py3_7-clang8-xla-test: name: linux-bionic-py3_7-clang8-xla @@ -218,10 +227,7 @@ jobs: with: build-environment: linux-bionic-py3_7-clang8-xla docker-image: ${{ needs.linux-bionic-py3_7-clang8-xla-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "xla", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, - ]} + test-matrix: ${{ needs.linux-bionic-py3_7-clang8-xla-build.outputs.test-matrix }} win-vs2019-cpu-py3-build: name: win-vs2019-cpu-py3 @@ -229,6 +235,12 @@ jobs: with: build-environment: win-vs2019-cpu-py3 cuda-version: cpu + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "windows.4xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "windows.4xlarge" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} win-vs2019-cpu-py3-test: name: win-vs2019-cpu-py3 @@ -237,12 +249,7 @@ jobs: with: build-environment: win-vs2019-cpu-py3 cuda-version: cpu - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "windows.4xlarge" }, - { config: "default", shard: 2, num_shards: 2, runner: "windows.4xlarge" }, - { config: "functorch", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, - ]} + test-matrix: ${{ needs.win-vs2019-cpu-py3-build.outputs.test-matrix }} win-vs2019-cuda11_6-py3-build: if: github.event_name == 'pull_request' @@ -252,27 +259,37 @@ jobs: build-environment: win-vs2019-cuda11.6-py3 cuda-version: "11.6" sync-tag: win-cuda-build + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 4, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "default", shard: 5, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "windows.8xlarge.nvidia.gpu" }, + { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, + ]} - linux-xenial-cuda11_3-py3_7-gcc7-bazel-test: - name: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test + 
linux-bionic-cuda11_6-py3_10-gcc7-bazel-test: + name: linux-bionic-cuda11.6-py3.10-gcc7-bazel-test uses: ./.github/workflows/_bazel-build-test.yml with: - build-environment: linux-xenial-cuda11.3-py3.7-gcc7-bazel-test - docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + build-environment: linux-bionic-cuda11.6-py3.10-gcc7-bazel-test + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 - linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single: - name: linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single + linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single: + name: linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single uses: ./.github/workflows/_android-build-test.yml with: - build-environment: linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + build-environment: linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single + docker-image-name: pytorch-linux-focal-py3-clang7-android-ndk-r19c - linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit: - name: linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit + linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single-full-jit: + name: linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single-full-jit uses: ./.github/workflows/_android-build-test.yml with: - build-environment: linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c + build-environment: linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single-full-jit + docker-image-name: pytorch-linux-focal-py3-clang7-android-ndk-r19c linux-focal-py3_7-gcc7-mobile-lightweight-dispatch-build: name: linux-focal-py3.7-gcc7-mobile-lightweight-dispatch-build @@ -282,31 +299,17 @@ jobs: docker-image-name: pytorch-linux-focal-py3.7-gcc7 build-generates-artifacts: false - linux-xenial-cuda11_3-py3_7-gcc7-deploy-build: - name: linux-xenial-cuda11_3-py3_7-gcc7-deploy - uses: ./.github/workflows/_linux-build.yml - with: - build-environment: linux-xenial-cuda11.3-py3.7-gcc7-deploy - docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 - - deploy-linux-xenial-cuda11_3-py3_7-gcc7-test: - name: linux-xenial-cuda11_3-py3_7-gcc7-deploy - uses: ./.github/workflows/_linux-test.yml - needs: linux-xenial-cuda11_3-py3_7-gcc7-deploy-build - with: - build-environment: linux-xenial-cuda11.3-py3.7-gcc7-deploy - docker-image: ${{ needs.linux-xenial-cuda11_3-py3_7-gcc7-deploy-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "deploy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, - ]} - - linux-focal-rocm5_2-py3_7-build: + linux-focal-rocm5_2-py3_8-build: # don't run build twice on master if: github.event_name == 'pull_request' - name: linux-focal-rocm5.2-py3.7 + name: linux-focal-rocm5.2-py3.8 uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image-name: pytorch-linux-focal-rocm5.2-py3.7 + build-environment: linux-focal-rocm5.2-py3.8 + docker-image-name: pytorch-linux-focal-rocm5.2-py3.8 sync-tag: rocm-build + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.rocm.gpu" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.rocm.gpu" }, + ]} diff --git a/.github/workflows/push_nightly_docker_ghcr.yml 
b/.github/workflows/push_nightly_docker_ghcr.yml deleted file mode 100644 index ca30c9651ff8..000000000000 --- a/.github/workflows/push_nightly_docker_ghcr.yml +++ /dev/null @@ -1,39 +0,0 @@ -name: docker-release-builds -on: - schedule: - # Push the nightly docker daily at 1 PM UTC - - cron: '0 13 * * *' - # Trigger when we modify something related to these images - pull_request: - paths: - - .github/scripts/build_publish_nightly_docker.sh - - .github/workflows/push_nightly_docker_ghcr.yml - - Dockerfile - - docker.Makefile - # Have the ability to trigger this job manually using the API as well - workflow_dispatch: - -jobs: - docker-release-build: - if: ${{ github.repository == 'pytorch/pytorch' }} - runs-on: linux.2xlarge - env: - GHCR_PAT: ${{ secrets.GHCR_PAT }} - WITH_PUSH: ${{ github.event_name == 'schedule' }} - steps: - - name: Checkout PyTorch - uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9 - with: - ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }} - - uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a - name: Build and upload nightly docker - with: - timeout_minutes: 10 - max_attempts: 3 - command: | - set -ex - bash .github/scripts/build_publish_nightly_docker.sh - -concurrency: - group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }} - cancel-in-progress: true diff --git a/.github/workflows/revert.yml b/.github/workflows/revert.yml index 1fbdacc82071..2a2fff27044e 100644 --- a/.github/workflows/revert.yml +++ b/.github/workflows/revert.yml @@ -8,18 +8,25 @@ jobs: do_revert: name: try_revert_pr_${{ github.event.client_payload.pr_num }} runs-on: linux.20_04.4x + env: + GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - name: Checkout repo uses: actions/checkout@v2 + id: checkout with: fetch-depth: 0 token: ${{ secrets.MERGEBOT_TOKEN }} + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + - run: pip install pyyaml==6.0 + - name: Setup committer id run: | git config --global user.email "pytorchmergebot@users.noreply.github.com" @@ -30,7 +37,6 @@ jobs: PR_NUM: ${{ github.event.client_payload.pr_num }} COMMENT_ID: ${{ github.event.client_payload.comment_id }} REASON: ${{ github.event.client_payload.reason }} - GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} run: | set -ex if [ -n "${COMMENT_ID}" ]; then @@ -46,5 +52,14 @@ jobs: python3 .github/scripts/trymerge.py --revert "${PR_NUM}" fi fi + - name: Comment on Canceled + if: ${{ cancelled() && steps.checkout.outcome == 'success' }} + continue-on-error: true + env: + GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} + PR_NUM: ${{ github.event.client_payload.pr_num }} + run: | + set -ex + python3 .github/scripts/comment_on_pr.py "${PR_NUM}" "revert" concurrency: try-revert diff --git a/.github/workflows/run_torchbench.yml b/.github/workflows/run_torchbench.yml index 1ec238fe4d32..b6c870fa7839 100644 --- a/.github/workflows/run_torchbench.yml +++ b/.github/workflows/run_torchbench.yml @@ -1,17 +1,18 @@ -name: TorchBench CI (pytorch-linux-py3.7-cu102) +name: TorchBench CI (pytorch-linux-py3.8-cu116) on: pull_request: env: PYTHON_VERSION: "3.8" - CUDA_VERSION: "11.3" - MAGMA_VERSION: 
"magma-cuda113" # must be consistent with https://github.com/pytorch/benchmark/blob/main/requirements.txt#L19 NUMPY_VERSION: "1.21.2" + SETUP_SCRIPT: "/data/nvme/bin/setup_instance.sh" PR_NUM: ${{ github.event.number }} PR_BODY: ${{ github.event.pull_request.body }} PR_BASE_SHA: ${{ github.event.pull_request.base.sha }} PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} jobs: run-torchbench: @@ -35,20 +36,19 @@ jobs: - name: Create conda environment and install deps run: | conda create -y -n pr-ci python="${PYTHON_VERSION}" - # shellcheck disable=SC1091 - . "${HOME}"/anaconda3/etc/profile.d/conda.sh + # shellcheck source=/dev/null + . "${SETUP_SCRIPT}" conda activate pr-ci # pin cmake version to 3.22 since 3.23 breaks pytorch build # see details at: https://github.com/pytorch/pytorch/issues/74985 conda install -y numpy="${NUMPY_VERSION}" requests ninja pyyaml mkl mkl-include \ - setuptools cmake=3.22 cffi typing_extensions \ + setuptools cmake=3.22 cffi typing_extensions boto3 \ future six dataclasses pillow pytest tabulate gitpython git-lfs tqdm psutil - # install magma - conda install -y -c pytorch "${MAGMA_VERSION}" + pip install --pre torch torchvision torchtext -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html - name: Setup TorchBench branch run: | - # shellcheck disable=SC1091 - . "${HOME}"/anaconda3/etc/profile.d/conda.sh + # shellcheck source=/dev/null + . "${SETUP_SCRIPT}" conda activate pr-ci PR_BODY_FILE=/tmp/pr-body.txt echo "$PR_BODY" > ${PR_BODY_FILE} @@ -60,15 +60,19 @@ jobs: path: benchmark lfs: false ref: ${{ env.TORCHBENCH_BRANCH }} + - name: GPU Info + run: | + nvidia-smi - name: Run TorchBench run: | + set -x pushd "${HOME}"/pytorch PR_MERGE_BASE=$(git merge-base "$PR_BASE_SHA" "$PR_HEAD_SHA") popd PR_BODY_FILE=/tmp/pr-body.txt echo "$PR_BODY" > ${PR_BODY_FILE} - # shellcheck disable=SC1091 - . "${HOME}"/anaconda3/etc/profile.d/conda.sh + # shellcheck source=/dev/null + . "${SETUP_SCRIPT}" conda activate pr-ci python3 pytorch/.github/scripts/run_torchbench.py \ --pr-body "$PR_BODY_FILE" \ @@ -78,12 +82,20 @@ jobs: --pr-num "$PR_NUM" \ --pr-base-sha "$PR_MERGE_BASE" \ --pr-head-sha "$PR_HEAD_SHA" + - name: Upload result to S3 + run: | + # shellcheck source=/dev/null + . "${SETUP_SCRIPT}" + conda activate pr-ci + python3 pytorch/.github/scripts/run_torchbench.py \ + upload-s3 \ + --result-dir "${HOME}/.torchbench/bisection/pr${{ github.event.number }}" - name: Remove conda environment and cleanup run: | conda env remove --name pr-ci rm /tmp/pr-body.txt - name: Upload artifact - uses: actions/upload-artifact@v2 + uses: actions/upload-artifact@v3 with: name: TorchBench result path: ~/.torchbench/bisection/pr${{ github.event.number }} diff --git a/.github/workflows/scorecards.yml b/.github/workflows/scorecards.yml new file mode 100644 index 000000000000..8abee79cf400 --- /dev/null +++ b/.github/workflows/scorecards.yml @@ -0,0 +1,55 @@ +name: ossf-scorecard +on: + # Only the default branch is supported. + branch_protection_rule: + workflow_dispatch: + schedule: + - cron: '32 16 * * 3' + push: + branches: [ "master" ] + +# Declare default permissions as read only. +permissions: read-all + +jobs: + analysis: + name: Scorecards analysis + runs-on: ubuntu-latest + permissions: + # Needed to upload the results to code-scanning dashboard. + security-events: write + # Used to receive a badge. 
+ id-token: write + + if: false && github.repository == 'pytorch/pytorch' # don't run on forks + + steps: + - name: "Checkout code" + uses: actions/checkout@v3 + with: + persist-credentials: false + + - name: "Run analysis" + uses: ossf/scorecard-action@865b4092859256271290c77adbd10a43f4779972 # tag=v2.0.3 + with: + results_file: results.sarif + results_format: sarif + + # Publish the results for public repositories to enable scorecard badges. For more details, see + # https://github.com/ossf/scorecard-action#publishing-results. + publish_results: true + + # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF + # format to the repository Actions tab. + - name: "Upload artifact" + uses: actions/upload-artifact@v3 + with: + name: SARIF file + path: results.sarif + retention-days: 5 + + # Upload the results to GitHub's code scanning dashboard. + - name: "Upload to code-scanning" + uses: github/codeql-action/upload-sarif@5f532563584d71fdef14ee64d17bafb34f751ce5 # tag=v1.0.26 + with: + sarif_file: results.sarif diff --git a/.github/workflows/trunk.yml b/.github/workflows/trunk.yml index 0b4c147386a3..6779a362209c 100644 --- a/.github/workflows/trunk.yml +++ b/.github/workflows/trunk.yml @@ -10,9 +10,11 @@ on: tags: - ciflow/trunk/* workflow_dispatch: + schedule: + - cron: 29 8 * * * # about 1:29am PDT concurrency: - group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }} + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }} cancel-in-progress: true jobs: @@ -22,6 +24,11 @@ jobs: with: build-environment: parallelnative-linux-focal-py3.7-gcc7 docker-image-name: pytorch-linux-focal-py3.7-gcc7 + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, + { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, + ]} parallelnative-linux-focal-py3_7-gcc7-test: name: parallelnative-linux-focal-py3.7-gcc7 @@ -30,11 +37,7 @@ jobs: with: build-environment: parallelnative-linux-focal-py3.7-gcc7 docker-image: ${{ needs.parallelnative-linux-focal-py3_7-gcc7-build.outputs.docker-image }} - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "linux.2xlarge" }, - { config: "default", shard: 2, num_shards: 2, runner: "linux.2xlarge" }, - ]} + test-matrix: ${{ needs.parallelnative-linux-focal-py3_7-gcc7-build.outputs.test-matrix }} # Build PyTorch with BUILD_CAFFE2=ON caffe2-linux-focal-py3_7-gcc7-build: @@ -44,34 +47,63 @@ jobs: build-environment: caffe2-linux-focal-py3.7-gcc7 docker-image-name: pytorch-linux-focal-py3.7-gcc7 - linux-bionic-cuda10_2-py3_9-gcc7-build: - name: linux-bionic-cuda10.2-py3.9-gcc7 + linux-bionic-cuda11_7-py3_10-gcc7-build: + name: linux-bionic-cuda11.7-py3.10-gcc7 uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-bionic-cuda10.2-py3.9-gcc7 - docker-image-name: pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7 - - linux-bionic-cuda10_2-py3_9-gcc7-test: - name: linux-bionic-cuda10.2-py3.9-gcc7 - uses: ./.github/workflows/_linux-test.yml - needs: linux-bionic-cuda10_2-py3_9-gcc7-build - with: - build-environment: linux-bionic-cuda10.2-py3.9-gcc7 - docker-image: ${{ 
needs.linux-bionic-cuda10_2-py3_9-gcc7-build.outputs.docker-image }} + build-environment: linux-bionic-cuda11.7-py3.10-gcc7 + docker-image-name: pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7 test-matrix: | { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "default", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 1, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, + { config: "default", shard: 4, num_shards: 4, runner: "linux.4xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, { config: "slow", shard: 1, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, { config: "slow", shard: 2, num_shards: 2, runner: "linux.4xlarge.nvidia.gpu" }, { config: "nogpu_AVX512", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "nogpu_NO_AVX2", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, { config: "jit_legacy", shard: 1, num_shards: 1, runner: "linux.4xlarge.nvidia.gpu" }, - { config: "distributed", shard: 1, num_shards: 2, runner: "linux.8xlarge.nvidia.gpu" }, - { config: "distributed", shard: 2, num_shards: 2, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 1, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 2, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, + { config: "distributed", shard: 3, num_shards: 3, runner: "linux.8xlarge.nvidia.gpu" }, + ]} + + linux-bionic-cuda11_7-py3_10-gcc7-test: + name: linux-bionic-cuda11.7-py3.10-gcc7 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_7-py3_10-gcc7-build + with: + build-environment: linux-bionic-cuda11.7-py3.10-gcc7 + docker-image: ${{ needs.linux-bionic-cuda11_7-py3_10-gcc7-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-cuda11_7-py3_10-gcc7-build.outputs.test-matrix }} + + linux-bionic-cuda11_6-py3_10-gcc7-sm86-build: + name: cuda11.6-py3.10-gcc7-sm86 + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86 + docker-image-name: pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 + cuda-arch-list: 8.6 + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "default", shard: 2, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "default", shard: 3, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "default", shard: 4, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, ]} + linux-bionic-cuda11_6-py3_10-gcc7-sm86-test: + name: cuda11.6-py3.10-gcc7-sm86 + uses: ./.github/workflows/_linux-test.yml + needs: linux-bionic-cuda11_6-py3_10-gcc7-sm86-build + with: + build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86 + docker-image: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-sm86-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-cuda11_6-py3_10-gcc7-sm86-build.outputs.test-matrix }} + libtorch-linux-bionic-cuda11_6-py3_7-gcc7-build: name: libtorch-linux-bionic-cuda11.6-py3.7-gcc7 uses: ./.github/workflows/_linux-build.yml @@ -79,27 +111,22 @@ jobs: build-environment: libtorch-linux-bionic-cuda11.6-py3.7-gcc7 docker-image-name: 
pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 build-generates-artifacts: false + runner: linux.4xlarge # no-ops builds test USE_PER_OPERATOR_HEADERS=0 where ATen/ops is not generated - linux-xenial-cuda11_3-py3_7-gcc7-no-ops-build: - name: linux-xenial-cuda11.3-py3.7-gcc7-no-ops + linux-bionic-cuda11_7-py3_10-gcc7-no-ops-build: + name: linux-bionic-cuda11.7-py3.10-gcc7-no-ops uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-xenial-cuda11.3-py3.7-gcc7-no-ops - docker-image-name: pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 + build-environment: linux-bionic-cuda11.7-py3.10-gcc7-no-ops + docker-image-name: pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7 - pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build: - name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build + pytorch-linux-focal-py3-clang7-android-ndk-r19c-build: + name: pytorch-linux-focal-py3-clang7-android-ndk-r19c-build uses: ./.github/workflows/_android-full-build-test.yml with: - build-environment: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build - docker-image-name: pytorch-linux-xenial-py3-clang5-android-ndk-r19c - secrets: - SONATYPE_NEXUS_USERNAME: ${{ secrets.SONATYPE_NEXUS_USERNAME }} - SONATYPE_NEXUS_PASSWORD: ${{ secrets.SONATYPE_NEXUS_PASSWORD }} - ANDROID_SIGN_KEY: ${{ secrets.ANDROID_SIGN_KEY }} - ANDROID_SIGN_PASS: ${{ secrets.ANDROID_SIGN_PASS }} - SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }} + build-environment: pytorch-linux-focal-py3-clang7-android-ndk-r19c-build + docker-image-name: pytorch-linux-focal-py3-clang7-android-ndk-r19c linux-bionic-py3_7-clang9-slow-build: name: linux-bionic-py3.7-clang9-slow @@ -107,6 +134,10 @@ jobs: with: build-environment: linux-bionic-py3.7-clang9-slow docker-image-name: pytorch-linux-bionic-py3.7-clang9 + test-matrix: | + { include: [ + { config: "slow", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + ]} linux-bionic-py3_7-clang9-slow-test: name: linux-bionic-py3.7-clang9-slow @@ -115,11 +146,28 @@ jobs: with: build-environment: linux-bionic-py3.7-clang9-slow docker-image: ${{ needs.linux-bionic-py3_7-clang9-slow-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-bionic-py3_7-clang9-slow-build.outputs.test-matrix }} + + linux-focal-py3_7-clang7-tsan-build: + name: linux-focal-py3.7-clang7-tsan + uses: ./.github/workflows/_linux-build.yml + with: + build-environment: linux-focal-py3.7-clang7-tsan + docker-image-name: pytorch-linux-focal-py3-clang7-asan test-matrix: | { include: [ - { config: "slow", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, + { config: "tsan", shard: 1, num_shards: 1, runner: "linux.2xlarge" }, ]} + linux-focal-py3_7-clang7-tsan-test: + name: linux-focal-py3.7-clang7-tsan + uses: ./.github/workflows/_linux-test.yml + needs: linux-focal-py3_7-clang7-tsan-build + with: + build-environment: linux-focal-py3.7-clang7-tsan + docker-image: ${{ needs.linux-focal-py3_7-clang7-tsan-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-focal-py3_7-clang7-tsan-build.outputs.test-matrix }} + ios-12-5-1-x86-64: name: ios-12-5-1-x86-64 uses: ./.github/workflows/_ios-build-test.yml @@ -127,11 +175,6 @@ jobs: build-environment: ios-12-5-1-x86-64 ios-platform: SIMULATOR ios-arch: x86_64 - secrets: - IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }} - IOS_CERT_SECRET: ${{ secrets.IOS_CERT_SECRET}} - IOS_DEV_TEAM_ID: ${{ secrets.IOS_DEV_TEAM_ID}} - IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }} macos-12-py3-x86-64-build: name: macos-12-py3-x86-64 @@ -141,6 +184,12 @@ jobs: 
xcode-version: "13.3.1" runner-type: macos-12-xl build-generates-artifacts: true + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "macos-12" }, + { config: "default", shard: 2, num_shards: 2, runner: "macos-12" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "macos-12" }, + ]} secrets: MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} @@ -151,12 +200,7 @@ jobs: needs: macos-12-py3-x86-64-build with: build-environment: macos-12-py3-x86-64 - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "macos-12" }, - { config: "default", shard: 2, num_shards: 2, runner: "macos-12" }, - { config: "functorch", shard: 1, num_shards: 1, runner: "macos-12" }, - ]} + test-matrix: ${{ needs.macos-12-py3-x86-64-build.outputs.test-matrix }} arch: x86_64 secrets: AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} @@ -185,6 +229,16 @@ jobs: build-generates-artifacts: true # To match the one pre-installed in the m1 runners python_version: 3.9.12 + # We need to set the environment file here instead of trying to detect it automatically because + # MacOS arm64 is cross-compiled from x86-64. Specifically, it means that arm64 conda environment + # is needed when building PyTorch MacOS arm64 from x86-64 + environment-file: .github/requirements/conda-env-macOS-ARM64 + test-matrix: | + { include: [ + { config: "default", shard: 1, num_shards: 2, runner: "macos-m1-12" }, + { config: "default", shard: 2, num_shards: 2, runner: "macos-m1-12" }, + { config: "functorch", shard: 1, num_shards: 1, runner: "macos-m1-12" }, + ]} secrets: MACOS_SCCACHE_S3_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }} MACOS_SCCACHE_S3_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }} @@ -193,6 +247,7 @@ jobs: name: macos-12-py3-arm64-mps uses: ./.github/workflows/_mac-test-mps.yml needs: macos-12-py3-arm64-build + if: needs.macos-12-py3-arm64-build.outputs.build-outcome == 'success' with: sync-tag: macos-12-py3-arm64-mps-test build-environment: macos-12-py3-arm64 @@ -203,11 +258,7 @@ jobs: needs: macos-12-py3-arm64-build with: build-environment: macos-12-py3-arm64 - test-matrix: | - { include: [ - { config: "default", shard: 1, num_shards: 2, runner: "macos-m1-12" }, - { config: "default", shard: 2, num_shards: 2, runner: "macos-m1-12" }, - ]} + test-matrix: ${{ needs.macos-12-py3-arm64-build.outputs.test-matrix }} arch: arm64 secrets: AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} @@ -220,14 +271,6 @@ jobs: build-environment: win-vs2019-cuda11.6-py3 cuda-version: "11.6" sync-tag: win-cuda-build - - win-vs2019-cuda11_6-py3-test: - name: win-vs2019-cuda11.6-py3 - uses: ./.github/workflows/_win-test.yml - needs: win-vs2019-cuda11_6-py3-build - with: - build-environment: win-vs2019-cuda11.6-py3 - cuda-version: "11.6" test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 5, runner: "windows.8xlarge.nvidia.gpu" }, @@ -239,26 +282,36 @@ jobs: { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" }, ]} - linux-focal-rocm5_2-py3_7-build: - name: linux-focal-rocm5.2-py3.7 - uses: ./.github/workflows/_linux-build.yml + win-vs2019-cuda11_6-py3-test: + name: win-vs2019-cuda11.6-py3 + uses: ./.github/workflows/_win-test.yml + needs: win-vs2019-cuda11_6-py3-build with: - build-environment: linux-focal-rocm5.2-py3.7 - 
docker-image-name: pytorch-linux-focal-rocm5.2-py3.7 - sync-tag: rocm-build + build-environment: win-vs2019-cuda11.6-py3 + cuda-version: "11.6" + test-matrix: ${{ needs.win-vs2019-cuda11_6-py3-build.outputs.test-matrix }} - linux-focal-rocm5_2-py3_7-test: - name: linux-focal-rocm5.2-py3.7 - uses: ./.github/workflows/_rocm-test.yml - needs: linux-focal-rocm5_2-py3_7-build + linux-focal-rocm5_2-py3_8-build: + name: linux-focal-rocm5.2-py3.8 + uses: ./.github/workflows/_linux-build.yml with: - build-environment: linux-focal-rocm5.2-py3.7 - docker-image: ${{ needs.linux-focal-rocm5_2-py3_7-build.outputs.docker-image }} + build-environment: linux-focal-rocm5.2-py3.8 + docker-image-name: pytorch-linux-focal-rocm5.2-py3.8 + sync-tag: rocm-build test-matrix: | { include: [ { config: "default", shard: 1, num_shards: 2, runner: "linux.rocm.gpu" }, { config: "default", shard: 2, num_shards: 2, runner: "linux.rocm.gpu" }, ]} + + linux-focal-rocm5_2-py3_8-test: + name: linux-focal-rocm5.2-py3.8 + uses: ./.github/workflows/_rocm-test.yml + needs: linux-focal-rocm5_2-py3_8-build + with: + build-environment: linux-focal-rocm5.2-py3.8 + docker-image: ${{ needs.linux-focal-rocm5_2-py3_8-build.outputs.docker-image }} + test-matrix: ${{ needs.linux-focal-rocm5_2-py3_8-build.outputs.test-matrix }} secrets: AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_V2_ACCESS_KEY_ID }} AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_V2_SECRET_ACCESS_KEY }} diff --git a/.github/workflows/trymerge.yml b/.github/workflows/trymerge.yml index 8db7b0c97c5c..3d1d92967d88 100644 --- a/.github/workflows/trymerge.yml +++ b/.github/workflows/trymerge.yml @@ -8,18 +8,25 @@ jobs: do_merge: name: try_merge_pr_${{ github.event.client_payload.pr_num }} runs-on: linux.20_04.4x + env: + GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - name: Checkout repo - uses: actions/checkout@v2 + id: checkout + uses: actions/checkout@v3 with: fetch-depth: 0 token: ${{ secrets.MERGEBOT_TOKEN }} + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + check-latest: false + cache: pip + architecture: x64 + - run: pip install pyyaml==6.0 + - name: Setup committer id run: | git config --global user.email "pytorchmergebot@users.noreply.github.com" @@ -28,13 +35,21 @@ jobs: env: GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} PR_NUM: ${{ github.event.client_payload.pr_num }} - GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} FORCE: ${{ github.event.client_payload.force}} ON_GREEN: ${{ github.event.client_payload.on_green}} LAND_CHECKS: ${{ github.event.client_payload.land_checks }} COMMENT_ID: ${{ github.event.client_payload.comment_id }} + REBASE: ${{ github.event.client_payload.rebase }} run: | set -ex + if [ -n "${REBASE}" ]; then + python3 .github/scripts/tryrebase.py "${PR_NUM}" --branch "${REBASE}" + git checkout master + git fetch -p + # give github some time between the push and start workflows so that Github's messages + # on the PR appear in chronological order (timing issues can shuffle them around) + sleep 60 + fi if [ -n "${FORCE}" ]; then if [ -n "${COMMENT_ID}" ]; then python3 .github/scripts/trymerge.py --force --comment-id "${COMMENT_ID}" "${PR_NUM}" @@ -50,6 +65,15 @@ jobs: else python3 .github/scripts/trymerge.py "${PR_NUM}" fi + - name: Comment on Canceled + if: ${{ 
cancelled() && steps.checkout.outcome == 'success' }} + continue-on-error: true + env: + GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} + PR_NUM: ${{ github.event.client_payload.pr_num }} + run: | + set -ex + python3 .github/scripts/comment_on_pr.py "${PR_NUM}" "merge" # We want newer merge commands to supercede old ones concurrency: diff --git a/.github/workflows/tryrebase.yml b/.github/workflows/tryrebase.yml index 748127ff2d62..53434310c3d0 100644 --- a/.github/workflows/tryrebase.yml +++ b/.github/workflows/tryrebase.yml @@ -7,19 +7,25 @@ on: jobs: do_rebase: runs-on: ubuntu-20.04 + env: + GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - - name: Checkout repo + id: checkout uses: actions/checkout@v2 with: fetch-depth: 0 token: ${{ secrets.MERGEBOT_TOKEN }} + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + - run: pip install pyyaml==6.0 + - name: Setup committer id run: | git config --global user.email "pytorchmergebot@users.noreply.github.com" @@ -29,7 +35,6 @@ jobs: env: GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} PR_NUM: ${{ github.event.client_payload.pr_num }} - GH_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} BRANCH: ${{ github.event.client_payload.branch }} run: | set -ex @@ -38,3 +43,12 @@ jobs: else python3 .github/scripts/tryrebase.py "${PR_NUM}" fi + - name: Comment on Canceled + if: ${{ cancelled() && steps.checkout.outcome == 'success' }} + continue-on-error: true + env: + GITHUB_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} + PR_NUM: ${{ github.event.client_payload.pr_num }} + run: | + set -ex + python3 .github/scripts/comment_on_pr.py "${PR_NUM}" "rebase" diff --git a/.github/workflows/update-commit-hashes.yml b/.github/workflows/update-commit-hashes.yml deleted file mode 100644 index 6c72492d93ac..000000000000 --- a/.github/workflows/update-commit-hashes.yml +++ /dev/null @@ -1,37 +0,0 @@ -name: update-commit-hashes - -on: - schedule: - # Every day at 7:37am UTC = 12:27am PST - # Choose a random time near midnight PST because it may be delayed if there are high loads - # See https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule - - cron: 37 7 * * * - workflow_dispatch: - -jobs: - update-xla-commit-hash: - uses: ./.github/workflows/_update-commit-hash.yml - with: - repo-name: xla - branch: master - secrets: - MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} - PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} - - update-torchdynamo-commit-hash: - uses: ./.github/workflows/_update-commit-hash.yml - with: - repo-name: torchdynamo - branch: main - secrets: - MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} - PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} - - update-vision-commit-hash: - uses: ./.github/workflows/_update-commit-hash.yml - with: - repo-name: vision - branch: main - secrets: - MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} - PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} diff --git a/.github/workflows/update-viablestrict.yml b/.github/workflows/update-viablestrict.yml index 872d8f5c1428..12bf4e271f92 100644 --- a/.github/workflows/update-viablestrict.yml +++ b/.github/workflows/update-viablestrict.yml @@ -7,24 +7,29 @@ on: concurrency: group: ${{ github.workflow }} - cancel-in-progress: true + cancel-in-progress: false jobs: 
do_update_viablestrict: runs-on: ubuntu-20.04 steps: - - name: Setup Python - uses: actions/setup-python@v2 - with: - python-version: 3.8 - architecture: x64 - - name: Checkout repo - uses: actions/checkout@v2 + uses: actions/checkout@v3 with: fetch-depth: 0 token: ${{ secrets.MERGEBOT_TOKEN }} + - name: Setup Python + uses: actions/setup-python@v4 + with: + python-version: '3.8' + architecture: x64 + check-latest: false + cache: pip + cache-dependency-path: | + **/.circleci/docker/requirements-ci.txt + **/.github/requirements-gha-cache.txt + - name: Install Python Packages run: | pip3 install rockset==0.8.10 @@ -36,7 +41,7 @@ jobs: ROCKSET_API_KEY: ${{ secrets.ROCKSET_API_KEY }} run: | output=$(python3 .github/scripts/fetch_latest_green_commit.py) - echo "::set-output name=latest_viable_sha::$output" + echo "latest_viable_sha=$output" >> "${GITHUB_OUTPUT}" id: get-latest-commit - name: Push SHA to viable/strict branch @@ -47,4 +52,6 @@ jobs: git config --global user.email "pytorchmergebot@users.noreply.github.com" git config --global user.name "PyTorch MergeBot" echo "Set the latest sha variable to be ${{ steps.get-latest-commit.outputs.latest_viable_sha }}" - git push origin "${{ steps.get-latest-commit.outputs.latest_viable_sha }}":viable/strict + # Pushing an older green commit here will fail because it's non-fast-forward, which is ok + # to ignore because we already have the later green commit in visable/strict + git push origin "${{ steps.get-latest-commit.outputs.latest_viable_sha }}":viable/strict || true diff --git a/.github/workflows/update_pytorch_labels.yml b/.github/workflows/update_pytorch_labels.yml index f19347070ece..31bbab78e2f9 100644 --- a/.github/workflows/update_pytorch_labels.yml +++ b/.github/workflows/update_pytorch_labels.yml @@ -10,7 +10,7 @@ concurrency: jobs: update-labels-in-S3: - runs-on: ubuntu-18.04 + runs-on: ubuntu-22.04 if: ${{ github.repository == 'pytorch/pytorch' }} steps: - name: Checkout PyTorch diff --git a/.github/workflows/update_s3_htmls.yml b/.github/workflows/update_s3_htmls.yml index 5f3ff056c5a4..d68b58911bed 100644 --- a/.github/workflows/update_s3_htmls.yml +++ b/.github/workflows/update_s3_htmls.yml @@ -8,7 +8,7 @@ on: jobs: update-html: - runs-on: ubuntu-18.04 + runs-on: ubuntu-22.04 if: ${{ github.repository == 'pytorch/pytorch' }} strategy: matrix: diff --git a/.github/workflows/upload-test-stats.yml b/.github/workflows/upload-test-stats.yml index b649aac2c7c5..3f3db80670d8 100644 --- a/.github/workflows/upload-test-stats.yml +++ b/.github/workflows/upload-test-stats.yml @@ -2,7 +2,7 @@ name: Upload test stats on: workflow_run: - workflows: [pull, trunk, periodic] + workflows: [pull, trunk, periodic, inductor] types: - completed @@ -58,6 +58,31 @@ jobs: python3 -m tools.stats.upload_test_stats --workflow-run-id "${WORKFLOW_RUN_ID}" --workflow-run-attempt "${WORKFLOW_RUN_ATTEMPT}" --head-branch "${HEAD_BRANCH}" python3 -m tools.stats.upload_sccache_stats --workflow-run-id "${WORKFLOW_RUN_ID}" --workflow-run-attempt "${WORKFLOW_RUN_ATTEMPT}" + - name: Upload test artifacts + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + WORKFLOW_ARTIFACTS_URL: ${{ github.event.workflow_run.artifacts_url }} + WORKFLOW_RUN_ID: ${{ github.event.workflow_run.id }} + WORKFLOW_RUN_ATTEMPT: ${{ github.event.workflow_run.run_attempt }} + REPO_FULLNAME: ${{ github.event.workflow_run.repository.full_name }} + run: | + echo "${WORKFLOW_ARTIFACTS_URL}" + + # Note that in the case of Linux and Windows, their artifacts have already been uploaded to S3, so there 
simply won't be + # anything on GitHub to upload. The command should return right away + python3 -m tools.stats.upload_artifacts --workflow-run-id "${WORKFLOW_RUN_ID}" --workflow-run-attempt "${WORKFLOW_RUN_ATTEMPT}" --repo "${REPO_FULLNAME}" + + - name: Analyze disabled tests rerun + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + WORKFLOW_ARTIFACTS_URL: ${{ github.event.workflow_run.artifacts_url }} + WORKFLOW_RUN_ID: ${{ github.event.workflow_run.id }} + WORKFLOW_RUN_ATTEMPT: ${{ github.event.workflow_run.run_attempt }} + REPO_FULLNAME: ${{ github.event.workflow_run.repository.full_name }} + run: | + # Analyze the results from disable tests rerun and upload them to S3 + python3 -m tools.stats.check_disabled_tests --workflow-run-id "${WORKFLOW_RUN_ID}" --workflow-run-attempt "${WORKFLOW_RUN_ATTEMPT}" --repo "${REPO_FULLNAME}" + check-api-rate: if: ${{ always() }} runs-on: [self-hosted, linux.2xlarge] @@ -66,5 +91,9 @@ jobs: - name: Get our GITHUB_TOKEN API limit usage env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN}} + MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN}} run: | curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit + curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $PYTORCHBOT_TOKEN" https://api.github.com/rate_limit + curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $MERGEBOT_TOKEN" https://api.github.com/rate_limit diff --git a/.github/workflows/weekly.yml b/.github/workflows/weekly.yml new file mode 100644 index 000000000000..d87c610e1426 --- /dev/null +++ b/.github/workflows/weekly.yml @@ -0,0 +1,19 @@ +name: weekly + +on: + schedule: + # Mondays at 7:37am UTC = 12:27am PST + # Choose a random time near midnight PST because it may be delayed if there are high loads + # See https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule + - cron: 37 7 * * 1 + workflow_dispatch: + +jobs: + update-xla-commit-hash: + uses: ./.github/workflows/_update-commit-hash.yml + with: + repo-name: xla + branch: master + secrets: + MERGEBOT_TOKEN: ${{ secrets.MERGEBOT_TOKEN }} + PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }} diff --git a/.gitignore b/.gitignore index 88d472b456f4..597ae390abe9 100644 --- a/.gitignore +++ b/.gitignore @@ -46,6 +46,7 @@ docs/source/generated/ log usage_log.txt test-reports/ +test/*.bak test/.coverage test/.hypothesis/ test/cpp/api/mnist @@ -78,10 +79,6 @@ torch/testing/_internal/generated/annotated_fn_args.py torch/testing/_internal/data/*.pt torch/csrc/api/include/torch/version.h torch/csrc/cudnn/cuDNN.cpp -torch/csrc/deploy/example/generated -torch/csrc/deploy/interpreter/cpython -torch/csrc/deploy/interpreter/frozen -torch/csrc/deploy/interpreter/third_party/typing_extensions.py torch/csrc/generated torch/csrc/generic/TensorMethods.cpp torch/csrc/jit/generated/* @@ -117,6 +114,7 @@ torch/test/ torch/utils/benchmark/utils/valgrind_wrapper/callgrind.h torch/utils/benchmark/utils/valgrind_wrapper/valgrind.h torch/version.py +minifier_launcher.py # Root level file used in CI to specify certain env configs. 
# E.g., see .circleci/config.yaml env @@ -307,6 +305,9 @@ TAGS # bazel symlinks bazel-* +# xla repo +xla/ + # direnv, posh-direnv .envrc .psenvrc @@ -335,3 +336,9 @@ buck-out/ # Downloaded libraries third_party/ruy/ third_party/glog/ + +# Virtualenv +venv/ + +# Log files +*.log diff --git a/.gitmodules b/.gitmodules index 538967d31764..282746ed0b53 100644 --- a/.gitmodules +++ b/.gitmodules @@ -148,3 +148,9 @@ [submodule "third_party/nlohmann"] path = third_party/nlohmann url = https://github.com/nlohmann/json.git +[submodule "third_party/VulkanMemoryAllocator"] + path = third_party/VulkanMemoryAllocator + url = https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git +[submodule "third_party/cutlass"] + path = third_party/cutlass + url = https://github.com/NVIDIA/cutlass.git diff --git a/.jenkins/caffe2/bench.sh b/.jenkins/caffe2/bench.sh deleted file mode 100755 index 55ac4e94df21..000000000000 --- a/.jenkins/caffe2/bench.sh +++ /dev/null @@ -1,54 +0,0 @@ -#!/bin/bash - -# shellcheck source=./common.sh -source "$(dirname "${BASH_SOURCE[0]}")/common.sh" - -# Anywhere except $ROOT_DIR should work. This is so the python import doesn't -# get confused by any 'caffe2' directory in cwd -cd "$INSTALL_PREFIX" - -if [[ $BUILD_ENVIRONMENT == *-cuda* ]]; then - num_gpus=$(nvidia-smi -L | wc -l) -elif [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then - num_gpus=$(rocminfo | grep 'Device Type.*GPU' | wc -l) -else - num_gpus=0 -fi - -caffe2_pypath="$(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.dirname(os.path.realpath(caffe2.__file__)))')" -# Resnet50 -if (( $num_gpus == 0 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --use_cpu -fi -if (( $num_gpus >= 1 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 1 - # Let's skip the fp16 bench runs for now, as it recompiles the miopen kernels and can take 10+min to run. 
- # We can resume when we (1) bindmount the miopen cache folder in jenkins; (2) install the pre-compiled miopen kernel library in the docker - # "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 256 --epoch_size 25600 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16 -fi -if (( $num_gpus >= 4 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 512 --epoch_size 51200 --num_epochs 2 --num_gpus 4 -fi - -# ResNext -if (( $num_gpus == 0 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu -fi -if (( $num_gpus >= 1 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1 - # "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 64 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --float16_compute --dtype float16 -fi -if (( $num_gpus >= 4 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --resnext_num_groups 32 --resnext_width_per_group 4 --num_layers 101 --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4 -fi - -# Shufflenet -if (( $num_gpus == 0 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --use_cpu --model shufflenet -fi -if (( $num_gpus >= 1 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 32 --epoch_size 3200 --num_epochs 2 --num_gpus 1 --model shufflenet -fi -if (( $num_gpus >= 4 )); then - "$PYTHON" "$caffe2_pypath/python/examples/imagenet_trainer.py" --train_data null --batch_size 128 --epoch_size 12800 --num_epochs 2 --num_gpus 4 --model shufflenet -fi diff --git a/.jenkins/caffe2/build.sh b/.jenkins/caffe2/build.sh deleted file mode 100755 index e6e06c1d7db5..000000000000 --- a/.jenkins/caffe2/build.sh +++ /dev/null @@ -1,231 +0,0 @@ -#!/bin/bash - -set -ex - -# shellcheck source=./common.sh -source "$(dirname "${BASH_SOURCE[0]}")/common.sh" - -# CMAKE_ARGS are only passed to 'cmake' and the -Dfoo=bar does not work with -# setup.py, so we build a list of foo=bars and then either convert it to -# -Dfoo=bars or export them before running setup.py -build_args=() -build_to_cmake () { - cmake_args=() - for build_arg in $*; do - cmake_args+=("-D$build_arg") - done - echo ${cmake_args[@]} -} - - -SCCACHE="$(which sccache)" - -# Setup ccache if configured to use it (and not sccache) -if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then - mkdir -p ./ccache - ln -sf "$(which ccache)" ./ccache/cc - ln -sf "$(which ccache)" ./ccache/c++ - ln -sf "$(which ccache)" ./ccache/gcc - ln -sf "$(which ccache)" ./ccache/g++ - ln -sf "$(which ccache)" ./ccache/x86_64-linux-gnu-gcc - if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then - mkdir -p ./ccache/cuda - ln -sf "$(which ccache)" ./ccache/cuda/nvcc - fi - export CACHE_WRAPPER_DIR="$PWD/ccache" - export PATH="$CACHE_WRAPPER_DIR:$PATH" -fi - -# sccache will fail for CUDA builds if all cores are used for compiling -if [ -z "$MAX_JOBS" ]; then - if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then 
- MAX_JOBS=`expr $(nproc) - 1` - else - MAX_JOBS=$(nproc) - fi -fi - -report_compile_cache_stats() { - if [[ -n "${SCCACHE}" ]]; then - "$SCCACHE" --show-stats - elif which ccache > /dev/null; then - ccache -s - fi -} - - -############################################################################### -# Use special scripts for Android and setup builds -############################################################################### -if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then - export ANDROID_NDK=/opt/ndk - build_args+=("BUILD_BINARY=ON") - build_args+=("BUILD_TEST=ON") - build_args+=("USE_OBSERVERS=ON") - build_args+=("USE_ZSTD=ON") - BUILD_CAFFE2_MOBILE=1 "${ROOT_DIR}/scripts/build_android.sh" $(build_to_cmake ${build_args[@]}) "$@" - exit 0 -fi - -############################################################################### -# Set parameters -############################################################################### -if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then - build_args+=("BUILD_PYTHON=OFF") -else - build_args+=("BUILD_PYTHON=ON") - build_args+=("PYTHON_EXECUTABLE=${PYTHON}") -fi -if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then - build_args+=("BLAS=MKL") - build_args+=("USE_MKLDNN=ON") -fi -build_args+=("BUILD_BINARY=ON") -build_args+=("BUILD_TEST=ON") -build_args+=("INSTALL_TEST=ON") -build_args+=("USE_ZSTD=ON") - -if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then - build_args+=("USE_CUDA=ON") - build_args+=("USE_NNPACK=OFF") - - # Target only our CI GPU machine's CUDA arch to speed up the build - build_args+=("TORCH_CUDA_ARCH_LIST=Maxwell") - - # Explicitly set path to NVCC such that the symlink to ccache or sccache is used - if [ -n "${CACHE_WRAPPER_DIR}" ]; then - build_args+=("CUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/cuda/nvcc") - build_args+=("CMAKE_CUDA_COMPILER_LAUNCHER=${CACHE_WRAPPER_DIR}/ccache") - fi - - # Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit. - # Setting PATH to resolve to the right nvcc alone isn't enough. - # See /usr/share/cmake-3.5/Modules/FindCUDA.cmake, block at line 589. - export CUDA_PATH="/usr/local/cuda" - - # Ensure the ccache symlink can still find the real nvcc binary. - export PATH="/usr/local/cuda/bin:$PATH" -fi -if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then - if [[ -n "$CI" && -z "$PYTORCH_ROCM_ARCH" ]]; then - # Set ROCM_ARCH to gfx900 and gfx906 for CI builds, if user doesn't override. - echo "Limiting PYTORCH_ROCM_ARCH to gfx90[06] for CI builds" - export PYTORCH_ROCM_ARCH="gfx900;gfx906" - fi - # This is needed to enable ImageInput operator in resnet50_trainer - build_args+=("USE_OPENCV=ON") - # This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip - build_args+=("USE_LMDB=ON") - # hcc used to run out of memory, silently exiting without stopping - # the build process, leaving undefined symbols in the shared lib, - # causing undefined symbol errors when later running tests. - # We used to set MAX_JOBS to 4 to avoid, but this is no longer an issue. 
- if [ -z "$MAX_JOBS" ]; then - export MAX_JOBS=$(($(nproc) - 1)) - fi - - ########## HIPIFY Caffe2 operators - ${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py" -fi - -# Try to include Redis support for Linux builds -if [ "$(uname)" == "Linux" ]; then - build_args+=("USE_REDIS=ON") -fi - -# Use a specialized onnx namespace in CI to catch hardcoded onnx namespace -build_args+=("ONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI") - -############################################################################### -# Configure and make -############################################################################### - -if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then - # cmake-only non-setup.py build, to test cpp only bits. This installs into - # /usr/local/caffe2 and installs no Python tests - build_args+=("CMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}") - - # Run cmake from ./build_caffe2 directory so it doesn't conflict with - # standard PyTorch build directory. Eventually these won't need to - # be separate. - rm -rf build_caffe2 - mkdir build_caffe2 - cd ./build_caffe2 - - # We test the presence of cmake3 (for platforms like Centos and Ubuntu 14.04) - # and use that if so. - if [[ -x "$(command -v cmake3)" ]]; then - CMAKE_BINARY=cmake3 - else - CMAKE_BINARY=cmake - fi - - # Configure - ${CMAKE_BINARY} "${ROOT_DIR}" $(build_to_cmake ${build_args[@]}) "$@" - - # Build - if [ "$(uname)" == "Linux" ]; then - make "-j${MAX_JOBS}" install - else - echo "Don't know how to build on $(uname)" - exit 1 - fi - - # This is to save test binaries for testing - mv "$INSTALL_PREFIX/test/" "$INSTALL_PREFIX/cpp_test/" - - ls -lah $INSTALL_PREFIX - -else - # Python build. Uses setup.py to install into site-packages - build_args+=("USE_LEVELDB=ON") - build_args+=("USE_LMDB=ON") - build_args+=("USE_OPENCV=ON") - build_args+=("BUILD_TEST=ON") - # These flags preserve the flags that were used before this refactor (blame - # me) - build_args+=("USE_GLOG=ON") - build_args+=("USE_GFLAGS=ON") - build_args+=("USE_FBGEMM=OFF") - build_args+=("USE_MKLDNN=OFF") - build_args+=("USE_DISTRIBUTED=ON") - for build_arg in "${build_args[@]}"; do - export $build_arg - done - - # sccache will be stuck if all cores are used for compiling - # see https://github.com/pytorch/pytorch/pull/7361 - if [[ -n "${SCCACHE}" && $BUILD_ENVIRONMENT != *rocm* ]]; then - export MAX_JOBS=`expr $(nproc) - 1` - fi - - pip install --user dataclasses typing_extensions - - $PYTHON setup.py install --user - - report_compile_cache_stats -fi - -############################################################################### -# Install ONNX -############################################################################### - -# Install ONNX into a local directory -pip install --user "file://${ROOT_DIR}/third_party/onnx#egg=onnx" - -report_compile_cache_stats - -if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then - # remove sccache wrappers post-build; runtime compilation of MIOpen kernels does not yet fully support them - sudo rm -f /opt/cache/bin/cc - sudo rm -f /opt/cache/bin/c++ - sudo rm -f /opt/cache/bin/gcc - sudo rm -f /opt/cache/bin/g++ - pushd /opt/rocm/llvm/bin - if [[ -d original ]]; then - sudo mv original/clang . - sudo mv original/clang++ . 
- fi - sudo rm -rf original - popd -fi diff --git a/.jenkins/caffe2/dirty.sh b/.jenkins/caffe2/dirty.sh deleted file mode 100755 index 6b9ba544dab9..000000000000 --- a/.jenkins/caffe2/dirty.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash -set -ex -upstream="$1" -pr="$2" -git diff --name-only "$upstream" "$pr" -# For safety, unconditionally trigger for any changes. -#git diff --name-only "$upstream" "$pr" | grep -Eq '^(CMakeLists.txt|Makefile|.gitmodules|.jenkins/caffe2|binaries|caffe|caffe2|cmake|conda|docker|docs/caffe2|modules|scripts|third_party)' diff --git a/.jenkins/caffe2/test.sh b/.jenkins/caffe2/test.sh index 3c1f42aa9d64..d245dabda4da 100755 --- a/.jenkins/caffe2/test.sh +++ b/.jenkins/caffe2/test.sh @@ -149,6 +149,9 @@ export DNNL_MAX_CPU_ISA=AVX2 # Should still run even in the absence of SHARD_NUMBER if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then + # TODO(sdym@meta.com) remove this when the linked issue is resolved. + # py is temporary until https://github.com/Teemu/pytest-sugar/issues/241 is fixed + pip install --user py==1.11.0 pip install --user pytest-sugar # NB: Warnings are disabled because they make it harder to see what # the actual erroring test is @@ -173,7 +176,9 @@ fi ############## if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)" - pip install -q --user ninja flatbuffers==2.0 numpy==1.21.5 onnxruntime==1.12.1 + pip install -q --user ninja flatbuffers==2.0 numpy==1.21.5 onnxruntime==1.12.1 beartype==0.10.4 onnx==1.12.0 + # TODO: change this when onnx-script is on TestPyPI + pip install 'onnx-script @ git+https://github.com/microsoft/onnx-script' # numba requires numpy <= 1.20, onnxruntime requires numpy >= 1.21. # We don't actually need it for our tests, but it's imported if it's present, so uninstall. pip uninstall -q --yes numba diff --git a/.jenkins/pytorch/build-asan.sh b/.jenkins/pytorch/build-asan.sh index d46f4bd2a685..91953c322f22 100755 --- a/.jenkins/pytorch/build-asan.sh +++ b/.jenkins/pytorch/build-asan.sh @@ -26,7 +26,7 @@ CC="clang" CXX="clang++" LDSHARED="clang --shared" \ CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize-address-use-after-scope -shared-libasan" \ USE_ASAN=1 USE_CUDA=0 USE_MKLDNN=0 \ python setup.py bdist_wheel - python -mpip install dist/*.whl + pip_install_whl "$(echo dist/*.whl)" # Test building via the sdist source tarball python setup.py sdist diff --git a/.jenkins/pytorch/build-tsan.sh b/.jenkins/pytorch/build-tsan.sh new file mode 100755 index 000000000000..e10edb310d81 --- /dev/null +++ b/.jenkins/pytorch/build-tsan.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +# Required environment variable: $BUILD_ENVIRONMENT +# (This is set by default in the Docker images we build, so you don't +# need to set it yourself.)
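+# As a rough illustration (the BUILD_ENVIRONMENT value below is only a hypothetical example), a local run of this script would look something like: +# BUILD_ENVIRONMENT=linux-focal-py3.7-clang7-tsan .jenkins/pytorch/build-tsan.sh +# build.sh execs into this script whenever $BUILD_ENVIRONMENT matches *-clang7-tsan*.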
+ +# shellcheck source=./common.sh +source "$(dirname "${BASH_SOURCE[0]}")/common.sh" +# shellcheck source=./common-build.sh +source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh" + +echo "Clang version:" +clang --version + +python tools/stats/export_test_times.py + +if [ -n "$(which conda)" ]; then + export CMAKE_PREFIX_PATH=/opt/conda +fi + +CC="clang" CXX="clang++" LDSHARED="clang --shared" \ + CFLAGS="-fsanitize=thread" \ + USE_TSAN=1 USE_CUDA=0 USE_MKLDNN=0 \ + python setup.py bdist_wheel + pip_install_whl "$(echo dist/*.whl)" + +print_sccache_stats + +assert_git_not_dirty diff --git a/.jenkins/pytorch/build.sh b/.jenkins/pytorch/build.sh index e258c8e9b6b1..bb7b2c5d03c8 100755 --- a/.jenkins/pytorch/build.sh +++ b/.jenkins/pytorch/build.sh @@ -15,14 +15,12 @@ if [[ "$BUILD_ENVIRONMENT" == *-clang7-asan* ]]; then exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@" fi -if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then - exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@" +if [[ "$BUILD_ENVIRONMENT" == *-clang7-tsan* ]]; then + exec "$(dirname "${BASH_SOURCE[0]}")/build-tsan.sh" "$@" fi -if [[ "$BUILD_ENVIRONMENT" == *deploy* ]]; then - # Enabling DEPLOY build (embedded torch python interpreter, experimental) - # only on one config for now, can expand later - export USE_DEPLOY=ON +if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then - exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@" fi echo "Python version:" @@ -43,9 +41,9 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then fi if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then - # enable split torch_cuda build option in CMake - export BUILD_SPLIT_CUDA=ON - if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* ]]; then + if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* && "$BUILD_ENVIRONMENT" != *clang* ]]; then + # TODO: there is a linking issue when building with UCC using clang, + # disable it for now, to be fixed later. export USE_UCC=1 export USE_SYSTEM_UCC=1 fi @@ -62,20 +60,20 @@ elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then export ATEN_THREADING=NATIVE fi -# TODO: Don't run this... -pip_install -r requirements.txt || true - # Enable LLVM dependency for TensorExpr testing -export USE_LLVM=/opt/llvm -export LLVM_DIR=/opt/llvm/lib/cmake/llvm +if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then + export USE_LLVM=/opt/rocm/llvm + export LLVM_DIR=/opt/rocm/llvm/lib/cmake/llvm +else + export USE_LLVM=/opt/llvm + export LLVM_DIR=/opt/llvm/lib/cmake/llvm +fi -# TODO: Don't install this here if ! which conda; then # In ROCm CIs, we are doing cross compilation on build machines with # intel cpu and later run tests on machines with amd cpu. # Also leave out two builds to make sure non-mkldnn builds still work. if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then - pip_install mkl mkl-devel export USE_MKLDNN=1 else export USE_MKLDNN=0 @@ -144,9 +142,9 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then fi if [[ -n "$CI" && -z "$PYTORCH_ROCM_ARCH" ]]; then - # Set ROCM_ARCH to gfx900 and gfx906 for CI builds, if user doesn't override. - echo "Limiting PYTORCH_ROCM_ARCH to gfx90[06] for CI builds" - export PYTORCH_ROCM_ARCH="gfx900;gfx906" + # Set ROCM_ARCH to gfx906 for CI builds, if user doesn't override.
+ echo "Limiting PYTORCH_ROCM_ARCH to gfx906 for CI builds" + export PYTORCH_ROCM_ARCH="gfx906" fi # hipify sources @@ -161,8 +159,11 @@ if [ -z "$MAX_JOBS" ]; then fi fi -# Target only our CI GPU machine's CUDA arch to speed up the build -export TORCH_CUDA_ARCH_LIST="5.2" +# TORCH_CUDA_ARCH_LIST must be passed from an environment variable +if [[ "$BUILD_ENVIRONMENT" == *cuda* && -z "$TORCH_CUDA_ARCH_LIST" ]]; then + echo "TORCH_CUDA_ARCH_LIST must be defined" + exit 1 +fi if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then export CC=clang @@ -181,17 +182,8 @@ if [[ "${BUILD_ENVIRONMENT}" == *linux-focal-py3.7-gcc7-build* ]]; then export USE_GLOO_WITH_OPENSSL=ON fi -# TODO: Remove after xenial->focal migration -if [[ "${BUILD_ENVIRONMENT}" == pytorch-linux-xenial-py3* ]]; then - if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]; then - export BUILD_STATIC_RUNTIME_BENCHMARK=ON - fi -fi - -if [[ "${BUILD_ENVIRONMENT}" == pytorch-linux-focal-py3* ]]; then - if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]; then - export BUILD_STATIC_RUNTIME_BENCHMARK=ON - fi +if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]; then + export BUILD_STATIC_RUNTIME_BENCHMARK=ON fi if [[ "$BUILD_ENVIRONMENT" == *-bazel-* ]]; then @@ -222,7 +214,7 @@ else else python setup.py bdist_wheel fi - python -mpip install dist/*.whl + pip_install_whl "$(echo dist/*.whl)" # TODO: I'm not sure why, but somehow we lose verbose commands set -x diff --git a/.jenkins/pytorch/common.sh b/.jenkins/pytorch/common.sh index c71acc7e66cf..d8330243db57 100644 --- a/.jenkins/pytorch/common.sh +++ b/.jenkins/pytorch/common.sh @@ -23,28 +23,6 @@ fi # shellcheck disable=SC2034 BUILD_TEST_LIBTORCH=0 -# Use conda cmake in some CI build. Conda cmake will be newer than our supported -# min version (3.5 for xenial and 3.10 for bionic), -# so we only do it in four builds that we know should use conda. -# Linux bionic cannot find conda mkl with cmake 3.10, so we need a cmake from conda. -# Alternatively we could point cmake to the right place -# export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} -if [[ "${TEST_CONFIG:-}" == *xla* ]] || \ - [[ "$BUILD_ENVIRONMENT" == *centos* ]] || \ - [[ "$BUILD_ENVIRONMENT" == *linux-bionic* ]] || \ - [[ "$BUILD_ENVIRONMENT" == *linux-focal* ]]; then - if ! 
which conda; then - echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty" - exit 1 - else - conda install -q -y cmake - fi - if [[ "$BUILD_ENVIRONMENT" == *centos* ]]; then - # cmake3 package will conflict with conda cmake - sudo yum -y remove cmake3 || true - fi -fi - retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } diff --git a/.jenkins/pytorch/common_utils.sh b/.jenkins/pytorch/common_utils.sh index 0584ddab9e2a..6d3c96b9278f 100644 --- a/.jenkins/pytorch/common_utils.sh +++ b/.jenkins/pytorch/common_utils.sh @@ -9,6 +9,10 @@ log() { printf '%s\n' "$*"; } error() { log "ERROR: $*" >&2; } fatal() { error "$@"; exit 1; } +retry () { + "$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@") +} + # compositional trap taken from https://stackoverflow.com/a/7287873/23845 # appends a command to a trap # @@ -49,6 +53,12 @@ function assert_git_not_dirty() { fi } +function pip_install_whl() { + # This is used to install PyTorch and other build artifacts wheel locally + # without using any network connection + python3 -mpip install --no-index --no-deps "$@" +} + function pip_install() { # retry 3 times # old versions of pip don't have the "--progress-bar" flag @@ -72,12 +82,12 @@ function get_exit_code() { function get_bazel() { if [[ $(uname) == "Darwin" ]]; then # download bazel version - curl https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-darwin-x86_64 -Lo tools/bazel + retry curl https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-darwin-x86_64 -Lo tools/bazel # verify content echo '74d93848f0c9d592e341e48341c53c87e3cb304a54a2a1ee9cff3df422f0b23c tools/bazel' | shasum -a 256 -c >/dev/null else # download bazel version - curl https://ossci-linux.s3.amazonaws.com/bazel-4.2.1-linux-x86_64 -o tools/bazel + retry curl https://ossci-linux.s3.amazonaws.com/bazel-4.2.1-linux-x86_64 -o tools/bazel # verify content echo '1a4f3a3ce292307bceeb44f459883859c793436d564b95319aacb8af1f20557c tools/bazel' | shasum -a 256 -c >/dev/null fi @@ -95,20 +105,16 @@ function get_pinned_commit() { cat .github/ci_commit_pins/"${1}".txt } -function install_torchvision() { +function install_torchtext() { local commit - commit=$(get_pinned_commit vision) - pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}" + commit=$(get_pinned_commit text) + pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${commit}" } -function checkout_install_torchvision() { +function install_torchvision() { local commit commit=$(get_pinned_commit vision) - git clone https://github.com/pytorch/vision - pushd vision - git checkout "${commit}" - time python setup.py install - popd + pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}" } function clone_pytorch_xla() { @@ -117,31 +123,81 @@ function clone_pytorch_xla() { pushd xla # pin the xla hash so that we don't get broken by changes to xla git checkout "$(cat ../.github/ci_commit_pins/xla.txt)" + git submodule sync + git submodule update --init --recursive popd fi } -function install_torchdynamo() { +function install_filelock() { + pip_install filelock +} + +function install_triton() { local commit - commit=$(get_pinned_commit torchdynamo) - pip_install --user "git+https://github.com/pytorch/torchdynamo.git@${commit}" + if [[ "${TEST_CONFIG}" == *rocm* ]]; then + echo "skipping triton due to rocm" + else + commit=$(get_pinned_commit triton) + pip_install --user 
"git+https://github.com/openai/triton@${commit}#subdirectory=python" + pip_install --user jinja2 + fi +} + +function setup_torchdeploy_deps(){ + conda install -y cmake + conda install -y -c conda-forge libpython-static=3.10 + local CC + local CXX + CC="$(which gcc)" + CXX="$(which g++)" + export CC + export CXX + pip install --upgrade pip } -function checkout_install_torchdynamo() { +function checkout_install_torchdeploy() { local commit - commit=$(get_pinned_commit torchdynamo) + setup_torchdeploy_deps pushd .. - git clone https://github.com/pytorch/torchdynamo - pushd torchdynamo - git checkout "${commit}" - time python setup.py develop + git clone --recurse-submodules https://github.com/pytorch/multipy.git + pushd multipy + python multipy/runtime/example/generate_examples.py + pip install -e . --install-option="--cudatests" popd popd } -function install_functorch() { - pushd functorch - time python setup.py develop +function test_torch_deploy(){ + pushd .. + pushd multipy + ./multipy/runtime/build/test_deploy + ./multipy/runtime/build/test_deploy_gpu + popd + popd +} + +function install_huggingface() { + local commit + commit=$(get_pinned_commit huggingface) + pip_install pandas + pip_install scipy + pip_install "git+https://github.com/huggingface/transformers.git@${commit}#egg=transformers" +} + +function install_timm() { + local commit + commit=$(get_pinned_commit timm) + pip_install pandas + pip_install scipy + pip_install "git+https://github.com/rwightman/pytorch-image-models@${commit}" +} + +function checkout_install_torchbench() { + git clone https://github.com/pytorch/benchmark torchbench + pushd torchbench + git checkout no_torchaudio + python install.py popd } diff --git a/.jenkins/pytorch/dirty.sh b/.jenkins/pytorch/dirty.sh deleted file mode 100755 index 230d69606664..000000000000 --- a/.jenkins/pytorch/dirty.sh +++ /dev/null @@ -1,9 +0,0 @@ -#!/bin/bash -set -ex -upstream="$1" -pr="$2" -git diff --name-only "$upstream" "$pr" -# Now that PyTorch build depends on Caffe2, unconditionally trigger -# for any changes. -# TODO: Replace this with a NEGATIVE regex that allows us to skip builds when they are unnecessary -#git diff --name-only "$upstream" "$pr" | grep -Eq '^(aten/|caffe2/|.jenkins/pytorch|docs/(make.bat|Makefile|requirements.txt|source)|mypy|requirements.txt|setup.py|test/|third_party/|tools/|\.gitmodules|torch/)' diff --git a/.jenkins/pytorch/macos-build.sh b/.jenkins/pytorch/macos-build.sh index d40ec521520b..dbba68081d3e 100755 --- a/.jenkins/pytorch/macos-build.sh +++ b/.jenkins/pytorch/macos-build.sh @@ -35,11 +35,13 @@ fi cross_compile_arm64() { # Cross compilation for arm64 - USE_DISTRIBUTED=1 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF python setup.py bdist_wheel + # Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests + # that building with USE_DISTRIBUTED=0 works at all. 
See https://github.com/pytorch/pytorch/issues/86448 + USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel } compile_x86_64() { - USE_DISTRIBUTED=1 WERROR=1 python setup.py bdist_wheel + USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel } build_lite_interpreter() { diff --git a/.jenkins/pytorch/macos-common.sh b/.jenkins/pytorch/macos-common.sh index 4df378d505ec..d1b31ec94188 100755 --- a/.jenkins/pytorch/macos-common.sh +++ b/.jenkins/pytorch/macos-common.sh @@ -7,52 +7,6 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh" sysctl -a | grep machdep.cpu -if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then - # We use different versions here as the arm build/tests runs on python 3.9 - # while the x86 one runs on python 3.8 - retry conda install -y \ - numpy=1.22.3 \ - pyyaml=6.0 \ - setuptools=61.2.0 \ - cmake=3.22.1 \ - cffi \ - ninja \ - typing_extensions \ - dataclasses \ - pip -else - # NOTE: mkl 2021.3.0+ cmake requires sub-command PREPEND, may break the build - retry conda install -y \ - mkl=2021.2.0 \ - mkl-include=2021.2.0 \ - numpy=1.18.5 \ - pyyaml=5.3 \ - setuptools=46.0.0 \ - cmake=3.19 \ - cffi \ - ninja \ - typing_extensions \ - dataclasses \ - pip -fi - -# The torch.hub tests make requests to GitHub. -# -# The certifi package from conda-forge is new enough to make the -# following error disappear (included for future reference): -# -# > ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] -# > certificate verify failed: unable to get local issuer certificate -# > (_ssl.c:1056) -# -retry conda install -y -c conda-forge certifi wheel=0.36.2 - -# Needed by torchvision, which is imported from TestHub in test_utils.py. -retry conda install -y pillow - -# Building with USE_DISTRIBUTED=1 requires libuv (for Gloo). -retry conda install -y libuv pkg-config - # These are required for both the build job and the test job. # In the latter to test cpp extensions. 
export MACOSX_DEPLOYMENT_TARGET=10.9 diff --git a/.jenkins/pytorch/macos-test.sh b/.jenkins/pytorch/macos-test.sh index 68f7f2619209..4beab880ddbb 100755 --- a/.jenkins/pytorch/macos-test.sh +++ b/.jenkins/pytorch/macos-test.sh @@ -4,22 +4,6 @@ # shellcheck source=./macos-common.sh source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh" -conda install -y six -if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then - pip install hypothesis "expecttest==0.1.3" "librosa>=0.6.2" "numba==0.56.0" psutil "scipy==1.9.0" -else - pip install hypothesis "expecttest==0.1.3" "librosa>=0.6.2" "numba<=0.49.1" psutil "scipy==1.6.3" -fi - -# TODO move this to docker -# Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 -pip install "unittest-xml-reporting<=3.2.0,>=2.0.0" \ - pytest \ - pytest-xdist \ - pytest-rerunfailures - # TODO: enable xdoctest later - # xdoctest - if [ -z "${CI}" ]; then rm -rf "${WORKSPACE_DIR}"/miniconda3/lib/python3.6/site-packages/torch* fi @@ -170,14 +154,7 @@ test_jit_hooks() { assert_git_not_dirty } -test_dynamo() { - pushd ../torchdynamo - pytest tests - popd -} - if [[ "${TEST_CONFIG}" == *functorch* ]]; then - install_functorch test_functorch elif [[ $NUM_TEST_SHARDS -gt 1 ]]; then test_python_shard "${SHARD_NUMBER}" @@ -189,11 +166,9 @@ elif [[ $NUM_TEST_SHARDS -gt 1 ]]; then test_custom_backend fi else - checkout_install_torchdynamo test_python_all test_libtorch test_custom_script_ops test_jit_hooks test_custom_backend - test_dynamo fi diff --git a/.jenkins/pytorch/multigpu-test.sh b/.jenkins/pytorch/multigpu-test.sh index d75d701e8e18..9d7efc969823 100755 --- a/.jenkins/pytorch/multigpu-test.sh +++ b/.jenkins/pytorch/multigpu-test.sh @@ -7,12 +7,7 @@ # shellcheck source=./common.sh source "$(dirname "${BASH_SOURCE[0]}")/common.sh" -echo "Testing pytorch (distributed only)" -if [ -n "${CI}" ]; then - # TODO move this to docker - # Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 - pip_install "unittest-xml-reporting<=3.2.0,>=2.0.0" -fi +echo "Testing pytorch" # Disabling tests to see if they solve timeout issues; see https://github.com/pytorch/pytorch/issues/70015 # python tools/download_mnist.py --quiet -d test/cpp/api/mnist @@ -28,8 +23,8 @@ time python test/run_test.py --verbose -i distributed/rpc/cuda/test_tensorpipe_a # FSDP tests for f in test/distributed/fsdp/*.py ; do time python test/run_test.py --verbose -i "${f#*/}" ; done # ShardedTensor tests -time python test/run_test.py --verbose -i distributed/_shard/checkpoint/test_checkpoint -time python test/run_test.py --verbose -i distributed/_shard/checkpoint/test_file_system_checkpoint +time python test/run_test.py --verbose -i distributed/checkpoint/test_checkpoint +time python test/run_test.py --verbose -i distributed/checkpoint/test_file_system_checkpoint time python test/run_test.py --verbose -i distributed/_shard/sharding_spec/test_sharding_spec time python test/run_test.py --verbose -i distributed/_shard/sharding_plan/test_sharding_plan time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_megatron_prototype @@ -48,4 +43,6 @@ time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/ time python test/run_test.py --verbose -i distributed/_shard/sharded_optim/test_sharded_optim time python test/run_test.py --verbose -i distributed/_shard/test_partial_tensor time python test/run_test.py --verbose -i 
distributed/_shard/test_replicated_tensor +# Other tests +time python test/run_test.py --verbose -i test_cuda_primary_ctx assert_git_not_dirty diff --git a/.jenkins/pytorch/test.sh b/.jenkins/pytorch/test.sh index 9c767500477c..ca50a31beb60 100755 --- a/.jenkins/pytorch/test.sh +++ b/.jenkins/pytorch/test.sh @@ -15,6 +15,45 @@ BUILD_DIR="build" BUILD_RENAMED_DIR="build_renamed" BUILD_BIN_DIR="$BUILD_DIR"/bin +export VALGRIND=ON +if [[ "$BUILD_ENVIRONMENT" == *clang9* ]]; then + # clang9 appears to miscompile code involving c10::optional<c10::SymInt>, + # such that valgrind complains along these lines: + # + # Conditional jump or move depends on uninitialised value(s) + # at 0x40303A: ~optional_base (Optional.h:281) + # by 0x40303A: call (Dispatcher.h:448) + # by 0x40303A: call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::SymInt>) (basic.cpp:10) + # by 0x403700: main (basic.cpp:16) + # Uninitialised value was created by a stack allocation + # at 0x402AAA: call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::optional<c10::SymInt>) (basic.cpp:6) + # + # The problem does not appear with gcc or newer versions of clang (we tested + # clang14). So we suppress valgrind testing for clang9 specifically. + # You may need to suppress it for other versions of clang if they still have + # the bug. + # + # A minimal repro for the valgrind error is below: + # + # #include <ATen/ATen.h> + # #include <ATen/core/dispatch/Dispatcher.h> + # + # using namespace at; + # + # Tensor call(const at::Tensor & self, c10::SymIntArrayRef size, c10::SymIntArrayRef stride, c10::optional<c10::SymInt> storage_offset) { + # auto op = c10::Dispatcher::singleton() + # .findSchemaOrThrow(at::_ops::as_strided::name, at::_ops::as_strided::overload_name) + # .typed<at::_ops::as_strided::schema>(); + # return op.call(self, size, stride, storage_offset); + # } + # + # int main(int argv) { + # Tensor b = empty({3, 4}); + # auto z = call(b, b.sym_sizes(), b.sym_strides(), c10::nullopt); + # } + export VALGRIND=OFF +fi + # Get fully qualified path using realpath if [[ "$BUILD_ENVIRONMENT" != *bazel* ]]; then CUSTOM_TEST_ARTIFACT_BUILD_DIR=$(realpath "${CUSTOM_TEST_ARTIFACT_BUILD_DIR:-"build/custom_test_artifacts"}") @@ -58,10 +97,6 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* || "$BUILD_ENVIRONMENT" == *rocm* ]]; then export PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" fi -if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then - export BUILD_SPLIT_CUDA=ON -fi - if [[ "$TEST_CONFIG" == *crossref* ]]; then export PYTORCH_TEST_WITH_CROSSREF=1 fi @@ -70,12 +105,8 @@ if [[ "$TEST_CONFIG" == *dynamo* ]]; then export PYTORCH_TEST_WITH_DYNAMO=1 fi -# TODO: this condition is never true, need to fix this. -if [[ -n "$PR_NUMBER" ]] && [[ -z "$CI_MASTER" || "$CI_MASTER" == "false" ]]; then - # skip expensive checks when on PR and CI_MASTER flag is not set - export PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1 -else - export PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0 +if [[ "$TEST_CONFIG" == *inductor* ]]; then + export PYTORCH_TEST_WITH_INDUCTOR=1 fi if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then @@ -86,7 +117,7 @@ fi if [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]] ; then # JIT C++ extensions require ninja. - pip_install --user ninja + pip_install --user "ninja==1.10.2" # ninja is installed in $HOME/.local/bin, e.g., /var/lib/jenkins/.local/bin for CI user jenkins # but this script should be runnable by any user, including root export PATH="$HOME/.local/bin:$PATH" @@ -96,9 +127,8 @@ fi # if you're not careful.
Check this if you made some changes and the # ASAN test is not working if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then - # Suppress vptr violations arising from multiple copies of pybind11 export ASAN_OPTIONS=detect_leaks=0:symbolize=1:detect_stack_use_after_return=1:strict_init_order=true:detect_odr_violation=0 - export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp + export UBSAN_OPTIONS=print_stacktrace=1 export PYTORCH_TEST_WITH_ASAN=1 export PYTORCH_TEST_WITH_UBSAN=1 # TODO: Figure out how to avoid hard-coding these paths @@ -141,12 +171,17 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then ulimit -s 81920 (cd test && python -c "import torch; print(torch.__version__, torch.version.git_version)") - echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured" + echo "The next four invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured" (cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)") (cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_ubsan(0)") + (cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_vptr_ubsan()") (cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)") fi +if [[ "$BUILD_ENVIRONMENT" == *-tsan* ]]; then + export PYTORCH_TEST_WITH_TSAN=1 +fi + if [[ $TEST_CONFIG == 'nogpu_NO_AVX2' ]]; then export ATEN_CPU_CAPABILITY=default elif [[ $TEST_CONFIG == 'nogpu_AVX512' ]]; then @@ -163,7 +198,9 @@ test_python_shard() { echo "NUM_TEST_SHARDS must be defined to run a Python test shard" exit 1 fi + time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "$1" "$NUM_TEST_SHARDS" --verbose + assert_git_not_dirty } @@ -192,16 +229,84 @@ test_dynamo_shard() { test_reductions \ test_namedtensor \ test_namedtuple_return_api \ - test_profiler \ - test_profiler_tree \ + profiler/test_profiler \ + profiler/test_profiler_tree \ test_overrides \ test_python_dispatch \ test_fx \ + test_package \ + test_vmap \ --shard "$1" "$NUM_TEST_SHARDS" \ --verbose assert_git_not_dirty } +test_inductor_distributed() { + # this runs on both single-gpu and multi-gpu instances. It should be smart about skipping tests that aren't supported + # if the required number of GPUs isn't available + PYTORCH_TEST_WITH_INDUCTOR=0 python test/run_test.py --include distributed/test_dynamo_distributed --verbose + assert_git_not_dirty +} + +test_inductor() { + python test/run_test.py --include test_modules test_ops --verbose + PYTORCH_TEST_WITH_INDUCTOR=0 python test/run_test.py --include inductor/test_torchinductor --include inductor/test_torchinductor_opinfo --verbose + # TODO: investigate "RuntimeError: CUDA driver API confirmed a leak" + # seen in test_ops_gradients.py + # pytest test/test_ops_gradients.py --verbose -k "not _complex and not test_inplace_grad_acos_cuda_float64" +} + +test_inductor_huggingface() { + # Use test-reports directory under test folder will allow the CI to automatically pick up + # the test reports and upload them to S3.
Need to use full path here otherwise the script + # will bark about file not found later on + TEST_REPORTS_DIR=$(pwd)/test/test-reports + mkdir -p "$TEST_REPORTS_DIR" + # Check inference with --float32 + python benchmarks/dynamo/huggingface.py --ci --accuracy \ + --device cuda --inductor --float32 --output "$TEST_REPORTS_DIR"/inductor_inference_huggingface.csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_inference_huggingface.csv + # Check training with --amp + python benchmarks/dynamo/huggingface.py --ci --training --accuracy \ + --device cuda --inductor --amp --output "$TEST_REPORTS_DIR"/inductor_training_huggingface.csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_training_huggingface.csv +} + +test_inductor_timm_shard() { + if [[ -z "$NUM_TEST_SHARDS" ]]; then + echo "NUM_TEST_SHARDS must be defined to run a Python test shard" + exit 1 + fi + # Use test-reports directory under test folder will allow the CI to automatically pick up + # the test reports and upload them to S3. Need to use full path here otherwise the script + # will bark about file not found later on + TEST_REPORTS_DIR=$(pwd)/test/test-reports + mkdir -p "$TEST_REPORTS_DIR" + # Check inference with --float32 + python benchmarks/dynamo/timm_models.py --ci --accuracy \ + --device cuda --inductor --float32 --total-partitions 2 --partition-id "$1" \ + --output "$TEST_REPORTS_DIR"/inductor_inference_timm_"$1".csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_inference_timm_"$1".csv + # Check training with --amp + python benchmarks/dynamo/timm_models.py --ci --training --accuracy \ + --device cuda --inductor --amp --total-partitions 2 --partition-id "$1" \ + --output "$TEST_REPORTS_DIR"/inductor_training_timm_"$1".csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_training_timm_"$1".csv +} + +test_inductor_torchbench() { + TEST_REPORTS_DIR=$(pwd)/test/test-reports + mkdir -p "$TEST_REPORTS_DIR" + # Check inference with --float32 + PYTHONPATH=$(pwd)/torchbench python benchmarks/dynamo/torchbench.py --ci --accuracy \ + --device cuda --inductor --float32 --output "$TEST_REPORTS_DIR"/inductor_inference_torchbench.csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_inference_torchbench.csv + # Check training with --amp + PYTHONPATH=$(pwd)/torchbench python benchmarks/dynamo/torchbench.py --ci --training --accuracy \ + --device cuda --inductor --amp --output "$TEST_REPORTS_DIR"/inductor_training_torchbench.csv + python benchmarks/dynamo/check_csv.py -f "$TEST_REPORTS_DIR"/inductor_training_torchbench.csv +} + test_python_gloo_with_tls() { source "$(dirname "${BASH_SOURCE[0]}")/run_glootls_test.sh" assert_git_not_dirty @@ -290,8 +395,11 @@ test_libtorch() { TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_libtorch mkdir -p $TEST_REPORTS_DIR - # Run JIT cpp tests - python test/cpp/jit/tests_setup.py setup + if [[ "$BUILD_ENVIRONMENT" != *-tsan* ]]; then + # Run JIT cpp tests + python test/cpp/jit/tests_setup.py setup + fi + if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then "$TORCH_BIN_DIR"/test_jit --gtest_output=xml:$TEST_REPORTS_DIR/test_jit.xml else @@ -305,19 +413,19 @@ test_libtorch() { "$TORCH_BIN_DIR"/test_lazy --gtest_output=xml:$TEST_REPORTS_DIR/test_lazy.xml fi - python test/cpp/jit/tests_setup.py shutdown + if [[ "$BUILD_ENVIRONMENT" != *-tsan* ]]; then + python test/cpp/jit/tests_setup.py shutdown + fi + # Wait for background download to finish wait # Exclude IMethodTest that relies on 
torch::deploy, which will instead be ran in test_deploy. OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$TORCH_BIN_DIR"/test_api --gtest_filter='-IMethodTest.*' --gtest_output=xml:$TEST_REPORTS_DIR/test_api.xml "$TORCH_BIN_DIR"/test_tensorexpr --gtest_output=xml:$TEST_REPORTS_DIR/test_tensorexpr.xml - # TODO: this condition is never (BUILD_ENVIRONMENT doesn't start with pytorch-), need to fix this. - if [[ "${BUILD_ENVIRONMENT}" == pytorch-linux-xenial-py3* ]]; then - if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* && "${BUILD_ENVIRONMENT}" != *asan* ]]; then - # TODO: Consider to run static_runtime_test from $TORCH_BIN_DIR (may need modify build script) - "$BUILD_BIN_DIR"/static_runtime_test --gtest_output=xml:$TEST_REPORTS_DIR/static_runtime_test.xml - fi + if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* && "${BUILD_ENVIRONMENT}" != *asan* ]]; then + # TODO: Consider to run static_runtime_test from $TORCH_BIN_DIR (may need modify build script) + "$BUILD_BIN_DIR"/static_runtime_test --gtest_output=xml:$TEST_REPORTS_DIR/static_runtime_test.xml fi assert_git_not_dirty fi @@ -325,6 +433,14 @@ test_libtorch() { test_aot_compilation() { echo "Testing Ahead of Time compilation" + ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR" + ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR" + + # Make test_reports directory + # NB: the ending test_libtorch must match the current function name for the current + # test reporting process (in print_test_stats.py) to function as expected. + TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_aot_compilation + mkdir -p $TEST_REPORTS_DIR if [ -f "$TORCH_BIN_DIR"/test_mobile_nnc ]; then "$TORCH_BIN_DIR"/test_mobile_nnc --gtest_output=xml:$TEST_REPORTS_DIR/test_mobile_nnc.xml; fi # shellcheck source=test/mobile/nnc/test_aot_compile.sh if [ -f "$TORCH_BIN_DIR"/aot_model_compiler_test ]; then source test/mobile/nnc/test_aot_compile.sh; fi @@ -457,6 +573,11 @@ build_xla() { clone_pytorch_xla # shellcheck disable=SC1091 source "xla/.circleci/common.sh" + + # TODO: The torch pin #73164 is involved in the sev https://github.com/pytorch/pytorch/issues/86093 + # so this is temporarily removed until XLA fixes the weird logic in https://github.com/pytorch/xla/blob/master/scripts/apply_patches.sh#L17-L18 + rm "${XLA_DIR}/torch_patches/.torch_pin" || true + apply_patches SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')" # These functions are defined in .circleci/common.sh in pytorch/xla repo @@ -593,39 +714,19 @@ test_vec256() { fi } -test_dynamo() { - pushd ../torchdynamo - pytest tests - popd -} - -test_torch_deploy() { - python torch/csrc/deploy/example/generate_examples.py - ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR" - ln -sf "$TORCH_LIB_DIR"/libshm* "$TORCH_BIN_DIR" - ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR" - "$TORCH_BIN_DIR"/test_deploy - "$TORCH_BIN_DIR"/test_deploy_gpu - assert_git_not_dirty -} - test_docs_test() { .jenkins/pytorch/docs-test.sh } -if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-bazel-* ]]; then +if ! 
[[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-bazel-* || "${BUILD_ENVIRONMENT}" == *-tsan* ]]; then (cd test && python -c "import torch; print(torch.__config__.show())") (cd test && python -c "import torch; print(torch.__config__.parallel_info())") fi -if [[ "${TEST_CONFIG}" == *deploy* ]]; then - install_torchdynamo - test_torch_deploy -elif [[ "${TEST_CONFIG}" == *backward* ]]; then +if [[ "${TEST_CONFIG}" == *backward* ]]; then test_forward_backward_compatibility # Do NOT add tests after bc check tests, see its comment. elif [[ "${TEST_CONFIG}" == *xla* ]]; then install_torchvision - install_torchdynamo build_xla test_xla elif [[ "$TEST_CONFIG" == 'jit_legacy' ]]; then @@ -634,32 +735,67 @@ elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then # TODO: run some C++ tests echo "no-op at the moment" elif [[ "$TEST_CONFIG" == distributed ]]; then - install_torchdynamo + install_filelock + install_triton test_distributed # Only run RPC C++ tests on the first shard if [[ "${SHARD_NUMBER}" == 1 ]]; then test_rpc fi +elif [[ "$TEST_CONFIG" == deploy ]]; then + checkout_install_torchdeploy + test_torch_deploy +elif [[ "${TEST_CONFIG}" == *inductor_distributed* ]]; then + install_filelock + install_triton + install_huggingface + test_inductor_distributed elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then test_without_numpy install_torchvision - install_torchdynamo + install_triton test_dynamo_shard 1 test_aten elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then install_torchvision - checkout_install_torchdynamo + install_filelock + install_triton test_dynamo_shard 2 - test_dynamo +elif [[ "${TEST_CONFIG}" == *inductor_huggingface* ]]; then + install_torchvision + install_filelock + install_triton + install_huggingface + test_inductor_huggingface +elif [[ "${TEST_CONFIG}" == *inductor_timm* && $NUM_TEST_SHARDS -gt 1 ]]; then + install_torchvision + install_filelock + install_triton + install_timm + id=$((SHARD_NUMBER-1)) + test_inductor_timm_shard $id +elif [[ "${TEST_CONFIG}" == *inductor_torchbench* ]]; then + install_torchtext + install_torchvision + install_filelock + install_triton + checkout_install_torchbench + test_inductor_torchbench +elif [[ "${TEST_CONFIG}" == *inductor* && "${SHARD_NUMBER}" == 1 ]]; then + install_torchvision + install_filelock + install_triton + test_inductor + test_inductor_distributed elif [[ "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then test_without_numpy install_torchvision - install_torchdynamo + install_triton test_python_shard 1 test_aten elif [[ "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then install_torchvision - checkout_install_torchdynamo + install_triton test_python_shard 2 test_libtorch test_aot_compilation @@ -668,7 +804,7 @@ elif [[ "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then test_torch_function_benchmark elif [[ "${SHARD_NUMBER}" -gt 2 ]]; then # Handle arbitrary number of shards - install_torchdynamo + install_triton test_python_shard "$SHARD_NUMBER" elif [[ "${BUILD_ENVIRONMENT}" == *vulkan* ]]; then test_vulkan @@ -676,14 +812,17 @@ elif [[ "${BUILD_ENVIRONMENT}" == *-bazel-* ]]; then test_bazel elif [[ "${BUILD_ENVIRONMENT}" == *-mobile-lightweight-dispatch* ]]; then test_libtorch +elif [[ "${BUILD_ENVIRONMENT}" == *-tsan* ]]; then + # TODO: TSAN check is currently failing with 415 data race warnings. 
This will + # be addressed later, the first PR can be merged first to setup the CI jobs + test_libtorch || true elif [[ "${TEST_CONFIG}" = docs_test ]]; then test_docs_test elif [[ "${TEST_CONFIG}" == *functorch* ]]; then - install_functorch test_functorch else install_torchvision - install_torchdynamo + install_triton install_monkeytype test_python test_aten diff --git a/.jenkins/pytorch/win-test-helpers/build_pytorch.bat b/.jenkins/pytorch/win-test-helpers/build_pytorch.bat index 7edeca96ed8d..da28956cae97 100644 --- a/.jenkins/pytorch/win-test-helpers/build_pytorch.bat +++ b/.jenkins/pytorch/win-test-helpers/build_pytorch.bat @@ -135,16 +135,17 @@ if "%REBUILD%" == "" ( if not errorlevel 0 exit /b ) ) -:: tests if BUILD_ENVIRONMENT contains cuda11 as a substring -if not x%BUILD_ENVIRONMENT:cuda11=%==x%BUILD_ENVIRONMENT% ( - set BUILD_SPLIT_CUDA=ON -) -python setup.py install --cmake && sccache --show-stats && ( +python setup.py bdist_wheel +if errorlevel 1 exit /b +if not errorlevel 0 exit /b +sccache --show-stats +python -c "import os, glob; os.system('python -mpip install ' + glob.glob('dist/*.whl')[0] + '[opt-einsum]')" +( if "%BUILD_ENVIRONMENT%"=="" ( echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash. ) else ( - 7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torchgen %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\caffe2 && copy /Y "%TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z" "%PYTORCH_FINAL_PACKAGE_DIR%\" + 7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torchgen %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\functorch && copy /Y "%TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z" "%PYTORCH_FINAL_PACKAGE_DIR%\" if errorlevel 1 exit /b if not errorlevel 0 exit /b diff --git a/.jenkins/pytorch/win-test-helpers/install_test_functorch.bat b/.jenkins/pytorch/win-test-helpers/install_test_functorch.bat index 7679bffbc70e..d06d46f3dd22 100644 --- a/.jenkins/pytorch/win-test-helpers/install_test_functorch.bat +++ b/.jenkins/pytorch/win-test-helpers/install_test_functorch.bat @@ -6,15 +6,6 @@ if not errorlevel 0 ( exit /b ) -pushd functorch -echo "Install functorch" -:: --no-deps because for some reason, on windows, `torch` isn't found in -:: `pip list` despite being installed. With just `python setup.py develop`, -:: setuptools explicitly checks for the existence of torch and can't find it. 
-python setup.py develop --no-deps -popd -if ERRORLEVEL 1 goto fail - echo "Installing test dependencies" pip install networkx if errorlevel 1 exit /b diff --git a/.jenkins/pytorch/win-test-helpers/installation-helpers/activate_miniconda3.bat b/.jenkins/pytorch/win-test-helpers/installation-helpers/activate_miniconda3.bat index e6660a17b389..0552d85a407a 100644 --- a/.jenkins/pytorch/win-test-helpers/installation-helpers/activate_miniconda3.bat +++ b/.jenkins/pytorch/win-test-helpers/installation-helpers/activate_miniconda3.bat @@ -13,7 +13,7 @@ if not exist %CONDA_PARENT_DIR%\Miniconda3 ( ) if "%INSTALL_FRESH_CONDA%"=="1" ( - curl --retry 3 -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe + curl --retry 3 --retry-all-errors -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe if errorlevel 1 exit /b if not errorlevel 0 exit /b diff --git a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_magma.bat b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_magma.bat index d9f3ab1cf821..d0fbf5b20d88 100644 --- a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_magma.bat +++ b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_magma.bat @@ -24,7 +24,7 @@ if "%CUDA_SUFFIX%" == "" ( if "%REBUILD%"=="" ( if "%BUILD_ENVIRONMENT%"=="" ( - curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z + curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z ) else ( aws s3 cp s3://ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet ) diff --git a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_mkl.bat b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_mkl.bat index c700a04a1e4a..6c676d1baede 100644 --- a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_mkl.bat +++ b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_mkl.bat @@ -1,6 +1,6 @@ if "%REBUILD%"=="" ( if "%BUILD_ENVIRONMENT%"=="" ( - curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/mkl_2020.2.254.7z --output %TMP_DIR_WIN%\mkl.7z + curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/mkl_2020.2.254.7z --output %TMP_DIR_WIN%\mkl.7z ) else ( aws s3 cp s3://ossci-windows/mkl_2020.2.254.7z %TMP_DIR_WIN%\mkl.7z --quiet ) diff --git a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_sccache.bat b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_sccache.bat index 0165604400dd..6f8cc15ba868 100644 --- a/.jenkins/pytorch/win-test-helpers/installation-helpers/install_sccache.bat +++ b/.jenkins/pytorch/win-test-helpers/installation-helpers/install_sccache.bat @@ -7,8 +7,8 @@ if "%REBUILD%"=="" ( del %TMP_DIR_WIN%\bin\sccache.exe || ver > nul del %TMP_DIR_WIN%\bin\sccache-cl.exe || ver > nul if "%BUILD_ENVIRONMENT%"=="" ( - curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe - curl --retry 3 -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output %TMP_DIR_WIN%\bin\sccache-cl.exe + curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache.exe 
--output %TMP_DIR_WIN%\bin\sccache.exe + curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output %TMP_DIR_WIN%\bin\sccache-cl.exe ) else ( aws s3 cp s3://ossci-windows/sccache.exe %TMP_DIR_WIN%\bin\sccache.exe aws s3 cp s3://ossci-windows/sccache-cl.exe %TMP_DIR_WIN%\bin\sccache-cl.exe diff --git a/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat b/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat index c598a04e0f97..29c213ad4246 100644 --- a/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat +++ b/.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat @@ -36,8 +36,7 @@ popd ======= :: Pin unittest-xml-reporting to freeze printing test summary logic, related: https://github.com/pytorch/pytorch/issues/69014 -pip install "ninja==1.10.0.post1" future "hypothesis==5.35.1" "expecttest==0.1.3" "librosa>=0.6.2" "scipy==1.6.3" psutil pillow "unittest-xml-reporting<=3.2.0,>=2.0.0" pytest pytest-xdist pytest-rerunfailures -:: # TODO: enable xdoctest later +pip install "ninja==1.10.0.post1" future "hypothesis==5.35.1" "expecttest==0.1.3" "librosa>=0.6.2" "scipy==1.6.3" psutil pillow "unittest-xml-reporting<=3.2.0,>=2.0.0" pytest pytest-xdist pytest-shard pytest-rerunfailures sympy "xdoctest==1.0.2" "pygments==2.12.0" "opt-einsum>=3.3" if errorlevel 1 exit /b if not errorlevel 0 exit /b diff --git a/.jenkins/pytorch/win-test.sh b/.jenkins/pytorch/win-test.sh index dc2852120487..560b039dbf67 100755 --- a/.jenkins/pytorch/win-test.sh +++ b/.jenkins/pytorch/win-test.sh @@ -39,10 +39,6 @@ fi export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers -if [[ "${BUILD_ENVIRONMENT}" == *cuda11* ]]; then - export BUILD_SPLIT_CUDA=ON -fi - if [[ "$TEST_CONFIG" = "force_on_cpu" ]]; then # run the full test suite for force_on_cpu test export USE_CUDA=0 diff --git a/.lintrunner.toml b/.lintrunner.toml index 4c206c5fc744..34b673c7e09a 100644 --- a/.lintrunner.toml +++ b/.lintrunner.toml @@ -99,6 +99,8 @@ include_patterns = [ exclude_patterns = [ 'torch/include/**', 'torch/csrc/**', + 'torch/_dynamo/**/*.py', + 'torch/_inductor/**/*.py', 'torch/distributed/elastic/agent/server/api.py', 'torch/testing/_internal/**', 'torch/distributed/fsdp/fully_sharded_data_parallel.py', @@ -154,6 +156,7 @@ include_patterns = [ exclude_patterns = [ # (linbinyu) copied from internal repo 'tools/code_analyzer/gen_operators_yaml.py', + 'tools/dynamo/verify_dynamo.py', 'tools/gen_vulkan_spv.py', 'tools/test/gen_operators_yaml_test.py', 'tools/test/gen_oplist_test.py', @@ -170,7 +173,6 @@ command = [ [[linter]] code = 'CLANGTIDY' include_patterns = [ - 'torch/csrc/deploy/**/*.cpp', 'torch/csrc/fx/**/*.cpp', 'torch/csrc/generic/**/*.cpp', 'torch/csrc/onnx/**/*.cpp', @@ -183,7 +185,6 @@ exclude_patterns = [ # FunctionsManual.cpp is excluded to keep this diff clean. It will be fixed # in a follow up PR. # /torch/csrc/generic/*.cpp is excluded because those files aren't actually built. 
- # deploy/interpreter files are excluded due to using macros and other techniquies # that are not easily converted to accepted c++ 'torch/csrc/jit/passes/onnx/helper.cpp', 'torch/csrc/jit/passes/onnx/shape_type_inference.cpp', @@ -197,11 +198,6 @@ exclude_patterns = [ 'torch/csrc/autograd/FunctionsManual.cpp', 'torch/csrc/generic/*.cpp', 'torch/csrc/jit/codegen/cuda/runtime/*', - 'torch/csrc/deploy/interactive_embedded_interpreter.cpp', - 'torch/csrc/deploy/interpreter/**', - 'torch/csrc/deploy/test_deploy_python_ext.cpp', - 'torch/csrc/deploy/test_deploy_missing_interpreter.cpp', - 'torch/csrc/deploy/test_deploy_gpu.cpp', 'torch/csrc/utils/disable_torch_function.cpp', ] init_command = [ @@ -293,8 +289,10 @@ include_patterns=['**'] exclude_patterns=[ '**/contrib/**', 'third_party/**', + '**/*.bat', '**/*.expect', '**/*.ipynb', + '**/*.ps1', '**/*.ptl', 'tools/clang_format_hash/**', 'test/cpp/jit/upgrader_models/*.ptl', @@ -339,6 +337,7 @@ include_patterns = ['**'] exclude_patterns = [ '**/*.svg', '**/*Makefile', + '**/*Makefile_dashboard', '**/contrib/**', 'third_party/**', '**/.gitattributes', @@ -373,6 +372,7 @@ include_patterns = [ exclude_patterns = [ 'aten/src/ATen/native/quantized/cpu/qnnpack/**', 'aten/src/ATen/native/vulkan/api/vk_mem_alloc.h', + 'aten/src/ATen/native/vulkan/glsl/**', 'torch/csrc/jit/serialization/mobile_bytecode_generated.h', ] command = [ @@ -422,6 +422,35 @@ command = [ '@{{PATHSFILE}}' ] +[[linter]] +code = 'ERROR_PRONE_ISINSTANCE' +include_patterns = [ + 'torch/_refs/**/*.py', + 'torch/_prims/**/*.py', + 'torch/_prims_common/**/*.py', + 'torch/_decomp/**/*.py', + 'torch/_meta_registrations.py', +] +command = [ + 'python3', + 'tools/linter/adapters/grep_linter.py', + '--pattern=isinstance\([^)]+(int|float)\)', + '--linter-name=ERROR_PRONE_ISINSTANCE', + '--error-name=error prone isinstance', + """--error-description=\ + This line has an isinstance call that directly refers to \ + int or float. This is error-prone because you may also \ + have wanted to allow SymInt or SymFloat in your test. \ + To suppress this lint, use an appropriate type alias defined \ + in torch._prims_common; use IntLike/FloatLike when you would accept \ + both regular and symbolic numbers, Dim for ints representing \ + dimensions, or IntWithoutSymInt/FloatWithoutSymFloat if you really \ + meant to exclude symbolic numbers. 
+ """, + '--', + '@{{PATHSFILE}}' +] + [[linter]] code = 'PYBIND11_SPECIALIZATION' include_patterns = [ @@ -710,6 +739,11 @@ include_patterns = [ 'test/onnx/**/*.py', 'test/test_dynamo_cudagraphs.py', 'tools/**/*.py', + 'torch/_dynamo/**/*.py', + 'test/dynamo/**/*.py', + 'benchmarks/dynamo/**/*.py', + 'torch/_inductor/**/*.py', + 'test/inductor/**/*.py', 'torch/onnx/**/*.py', 'torch/package/**/*.py', 'torch/_decomp/**/*.py', @@ -719,6 +753,7 @@ include_patterns = [ 'torch/_refs/**/*.py', 'torch/_subclasses/**/*.py', 'torch/_*.py', + 'torch/testing/_internal/opinfo/**/*.py', 'torchgen/**/*.py', 'functorch/functorch/_src/aot_autograd.py', 'functorch/functorch/_src/compilers.py', @@ -737,6 +772,7 @@ init_command = [ 'python3', 'tools/linter/adapters/pip_init.py', '--dry-run={{DRYRUN}}', + '--no-black-binary', 'black==22.3.0', 'ufmt==1.3.3', 'usort==1.0.2', diff --git a/BUILD.bazel b/BUILD.bazel index 4c0791bffbb4..172a31723a0b 100644 --- a/BUILD.bazel +++ b/BUILD.bazel @@ -28,6 +28,10 @@ COMMON_COPTS = [ ] + if_cuda([ "-DUSE_CUDA", "-DUSE_CUDNN", + # TODO: This should be passed only when building for CUDA-11.5 or newer + # use cub in a safe manner, see: + # https://github.com/pytorch/pytorch/pull/55292 + "-DCUB_WRAPPED_NAMESPACE=at_cuda_detail", ]) aten_generation_srcs = ["aten/src/ATen/native/native_functions.yaml"] + ["aten/src/ATen/native/tags.yaml"] + glob(["aten/src/ATen/templates/**"]) @@ -47,6 +51,7 @@ generated_cpu_cpp = [ "aten/src/ATen/RegisterSparseCsrCPU.cpp", "aten/src/ATen/RegisterZeroTensor.cpp", "aten/src/ATen/RegisterCompositeImplicitAutograd.cpp", + "aten/src/ATen/RegisterCompositeImplicitAutogradNestedTensor.cpp", "aten/src/ATen/RegisterCompositeExplicitAutograd.cpp", "aten/src/ATen/RegisterCompositeExplicitAutogradNonFunctional.cpp", "aten/src/ATen/RegisterMeta.cpp", @@ -62,6 +67,8 @@ generated_cpu_cpp = [ "aten/src/ATen/CompositeExplicitAutogradNonFunctionalFunctions_inl.h", "aten/src/ATen/CompositeImplicitAutogradFunctions.h", "aten/src/ATen/CompositeImplicitAutogradFunctions_inl.h", + "aten/src/ATen/CompositeImplicitAutogradNestedTensorFunctions.h", + "aten/src/ATen/CompositeImplicitAutogradNestedTensorFunctions_inl.h", "aten/src/ATen/CompositeViewCopyKernels.cpp", "aten/src/ATen/FunctionalInverses.h", "aten/src/ATen/Functions.h", @@ -126,6 +133,7 @@ filegroup( name = "aten_base_cpp", srcs = glob([ "aten/src/ATen/*.cpp", + "aten/src/ATen/functorch/*.cpp", "aten/src/ATen/detail/*.cpp", "aten/src/ATen/cpu/*.cpp", ]), @@ -421,6 +429,7 @@ cu_library( "@cuda//:cublas", "@cuda//:cufft", "@cuda//:cusparse", + "@cutlass", ], alwayslink = True, ) @@ -1665,6 +1674,7 @@ cc_library( ] + if_cuda([ ":torch_distributed_cuda", "@cuda//:nvToolsExt", + "@cutlass", ]), alwayslink = True, ) @@ -1740,10 +1750,28 @@ cc_library( # Torch integration tests rely on a labeled data set from the MNIST database. # http://yann.lecun.com/exdb/mnist/ -# imethod.cpp is excluded since torch/csrc/deploy* build is not yet supported. 
cpp_api_tests = glob( ["test/cpp/api/*.cpp"], - exclude = ["test/cpp/api/imethod.cpp"], + exclude = [ + "test/cpp/api/imethod.cpp", + "test/cpp/api/integration.cpp", + ], +) + +cc_test( + name = "integration_test", + size = "medium", + srcs = ["test/cpp/api/integration.cpp"], + deps = [ + ":test_support", + "@com_google_googletest//:gtest_main", + ], + tags = [ + "gpu-required", + ], + data = [ + ":download_mnist", + ], ) [ @@ -1885,3 +1913,15 @@ test_suite( "torch/csrc/lazy/ts_backend/ts_native_functions.cpp", ] ] + +genrule( + name = "download_mnist", + srcs = ["//:tools/download_mnist.py"], + outs = [ + "mnist/train-images-idx3-ubyte", + "mnist/train-labels-idx1-ubyte", + "mnist/t10k-images-idx3-ubyte", + "mnist/t10k-labels-idx1-ubyte", + ], + cmd = "python3 tools/download_mnist.py -d $(RULEDIR)/mnist", +) diff --git a/CITATION b/CITATION deleted file mode 100644 index f7db31f23627..000000000000 --- a/CITATION +++ /dev/null @@ -1,10 +0,0 @@ -@incollection{NEURIPS2019_9015, -title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library}, -author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith}, -booktitle = {Advances in Neural Information Processing Systems 32}, -editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett}, -pages = {8024--8035}, -year = {2019}, -publisher = {Curran Associates, Inc.}, -url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf} -} diff --git a/CITATION.cff b/CITATION.cff new file mode 100644 index 000000000000..2bebc947bfb2 --- /dev/null +++ b/CITATION.cff @@ -0,0 +1,73 @@ +cff-version: 1.2.0 +message: If you use this software, please cite it as below. +title: PyTorch +authors: + - family-names: PyTorch Team +url: https://pytorch.org +preferred-citation: + type: conference-paper + title: "PyTorch: An Imperative Style, High-Performance Deep Learning Library" + authors: + - family-names: Paszke + given-names: Adam + - family-names: Gross + given-names: Sam + - family-names: Massa + given-names: Francisco + - family-names: Lerer + given-names: Adam + - family-names: Bradbury + given-names: James + - family-names: Chanan + given-names: Gregory + - family-names: Killeen + given-names: Trevor + - family-names: Lin + given-names: Zeming + - family-names: Gimelshein + given-names: Natalia + - family-names: Antiga + given-names: Luca + - family-names: Desmaison + given-names: Alban + - family-names: Kopf + given-names: Andreas + - family-names: Yang + given-names: Edward + - family-names: DeVito + given-names: Zachary + - family-names: Raison + given-names: Martin + - family-names: Tejani + given-names: Alykhan + - family-names: Chilamkurthy + given-names: Sasank + - family-names: Steiner + given-names: Benoit + - family-names: Fang + given-names: Lu + - family-names: Bai + given-names: Junjie + - family-names: Chintala + given-names: Soumith + collection-title: Advances in Neural Information Processing Systems 32 + collection-type: proceedings + editors: + - family-names: Wallach + given-names: H. + - family-names: Larochelle + given-names: H. 
+ - family-names: Beygelzimer + given-names: A. + - family-names: "d'Alché-Buc" + given-names: F. + - family-names: Fox + given-names: E. + - family-names: Garnett + given-names: R. + start: 8024 + end: 8035 + year: 2019 + publisher: + name: Curran Associates, Inc. + url: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf diff --git a/CMakeLists.txt b/CMakeLists.txt index 5bd4fb954b4d..784b52841704 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -1,4 +1,4 @@ -cmake_minimum_required(VERSION 3.13 FATAL_ERROR) +cmake_minimum_required(VERSION 3.18 FATAL_ERROR) #cmake_policy(SET CMP0022 NEW) #cmake_policy(SET CMP0023 NEW) @@ -11,13 +11,9 @@ cmake_policy(SET CMP0025 NEW) # Suppress warning flags in default MSVC configuration. It's not # mandatory that we do this (and we don't if cmake is old), but it's # nice when it's possible, and it's possible on our Windows configs. -if(NOT CMAKE_VERSION VERSION_LESS 3.15.0) - cmake_policy(SET CMP0092 NEW) -endif() +cmake_policy(SET CMP0092 NEW) -if(NOT CMAKE_VERSION VERSION_LESS 3.10) - set(FIND_CUDA_MODULE_DEPRECATED ON) -endif() +set(FIND_CUDA_MODULE_DEPRECATED ON) # ---[ Project and semantic versioning. project(Torch CXX C) @@ -165,9 +161,6 @@ option(BUILD_LITE_INTERPRETER "Master flag to build Lite Interpreter" OFF) cmake_dependent_option( BUILD_CAFFE2_OPS "Build Caffe2 operators" ON "BUILD_CAFFE2" OFF) -cmake_dependent_option( - BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" OFF - "BUILD_CAFFE2" OFF) option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON) cmake_dependent_option( CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON @@ -186,21 +179,14 @@ cmake_dependent_option( INSTALL_TEST "Install test binaries if BUILD_TEST is on" ON "BUILD_TEST" OFF) option(USE_CPP_CODE_COVERAGE "Compile C/C++ with code coverage flags" OFF) -option(COLORIZE_OUTPUT "Colorize output during compilation" ON) -option(USE_ASAN "Use Address Sanitizer" OFF) +option(USE_COLORIZE_OUTPUT "Colorize output during compilation" ON) +option(USE_ASAN "Use Address+Undefined Sanitizers" OFF) option(USE_TSAN "Use Thread Sanitizer" OFF) option(USE_CUDA "Use CUDA" ON) -# BUILD_SPLIT_CUDA must also be exported as an environment variable before building, with -# `export BUILD_SPLIT_CUDA=1` because cpp_extension.py can only work properly if this variable -# also exists in the environment. -# This option is incompatible with CUDA_SEPARABLE_COMPILATION. -cmake_dependent_option( - BUILD_SPLIT_CUDA "Split torch_cuda library into torch_cuda_cu and torch_cuda_cpp" OFF - "USE_CUDA AND NOT CUDA_SEPARABLE_COMPILATION" OFF) cmake_dependent_option( BUILD_LAZY_CUDA_LINALG "Build cuda linalg ops as separate library" ON "USE_CUDA AND LINUX AND BUILD_PYTHON" OFF) option(USE_FAST_NVCC "Use parallel NVCC build" OFF) -option(USE_ROCM "Use ROCm" ON) +cmake_dependent_option(USE_ROCM "Use ROCm" ON "LINUX" OFF) option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF) cmake_dependent_option( USE_CUDNN "Use cuDNN" ON @@ -295,6 +281,7 @@ if(NOT USE_XNNPACK AND CMAKE_VERSION VERSION_LESS ${XNNPACK_MIN_CMAKE_VER}) endif() option(USE_ZMQ "Use ZMQ" OFF) option(USE_ZSTD "Use ZSTD" OFF) +option(TORCH_DISABLE_GPU_ASSERTS "Disable GPU asserts by default" OFF) # Ensure that an ITT build is the default for x86 CPUs cmake_dependent_option( USE_ITT "Use Intel(R) VTune Profiler ITT functionality" ON @@ -348,9 +335,6 @@ cmake_dependent_option( option(ONNX_ML "Enable traditional ONNX ML API." 
ON) option(HAVE_SOVERSION "Whether to add SOVERSION to the shared objects" OFF) option(BUILD_LIBTORCH_CPU_WITH_DEBUG "Enable RelWithDebInfo for libtorch_cpu target only" OFF) -cmake_dependent_option( - USE_DEPLOY "Build embedded torch::deploy interpreter. See torch/csrc/deploy/README.md for more info." OFF - "BUILD_PYTHON" OFF) cmake_dependent_option(USE_CCACHE "Attempt using CCache to wrap the compilation" ON "UNIX" OFF) option(WERROR "Build with -Werror supported by the compiler" OFF) option(USE_COREML_DELEGATE "Use the CoreML backend through delegate APIs" OFF) @@ -358,6 +342,8 @@ option(USE_PER_OPERATOR_HEADERS "Whether ATen should generate separate headers f cmake_dependent_option( BUILD_LAZY_TS_BACKEND "Build the lazy Torchscript backend, not compatible with mobile builds" ON "NOT INTERN_BUILD_MOBILE" OFF) +cmake_dependent_option( + BUILD_FUNCTORCH "Build Functorch" ON "BUILD_PYTHON" OFF) if(USE_CCACHE) @@ -556,6 +542,9 @@ if(MSVC) # Try harder string(APPEND CMAKE_CUDA_FLAGS " -Xcompiler /w -w") + + string(APPEND CMAKE_CXX_FLAGS " /FS") + string(APPEND CMAKE_CUDA_FLAGS " -Xcompiler /FS") endif(MSVC) string(APPEND CMAKE_CUDA_FLAGS " -Xfatbin -compress-all") @@ -575,6 +564,22 @@ if(ANDROID OR IOS OR DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN}) message(WARNING "INTERN_BUILD_MOBILE is on, disabling BUILD_LAZY_TS_BACKEND") set(BUILD_LAZY_TS_BACKEND OFF) + # Set -ffunction-sections and -fdata-sections so that each method has its own + # text section. This allows the linker to remove unused section when the flag + # -Wl,-gc-sections is provided at link time. + string(APPEND CMAKE_CXX_FLAGS " -ffunction-sections") + string(APPEND CMAKE_C_FLAGS " -ffunction-sections") + string(APPEND CMAKE_CXX_FLAGS " -fdata-sections") + string(APPEND CMAKE_C_FLAGS " -fdata-sections") + + # Please note that the use of the following flags is required when linking + # against libtorch_cpu.a for mobile builds. + # -Wl,--whole-archive -ltorch_cpu -Wl,--no-whole-archive + # + # This allows global constructors to be included and run. Global + # constructors are used for operator/kernel registration with the + # PyTorch Dispatcher. + if(DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN}) # C10_MOBILE is derived from Android/iOS toolchain macros in # c10/macros/Macros.h, so it needs to be explicitly set here. @@ -591,18 +596,15 @@ if(ANDROID OR IOS OR DEFINED ENV{BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN}) endif() # INTERN_BUILD_ATEN_OPS is used to control whether to build ATen/TH operators. -# It's disabled for caffe2 mobile library. -if(INTERN_BUILD_MOBILE AND BUILD_CAFFE2_MOBILE) - set(INTERN_BUILD_ATEN_OPS OFF) -else() - set(INTERN_BUILD_ATEN_OPS ON) +set(INTERN_BUILD_ATEN_OPS ON) + +if(NOT DEFINED USE_BLAS) + set(USE_BLAS ON) endif() -# BUILD_CAFFE2_MOBILE is the master switch to choose between libcaffe2 v.s. libtorch mobile build. 
-# When it's enabled it builds original libcaffe2 mobile library without ATen/TH ops nor TorchScript support; -# When it's disabled it builds libtorch mobile library, which contains ATen/TH ops and native support for +# Build libtorch mobile library, which contains ATen/TH ops and native support for # TorchScript model, but doesn't contain not-yet-unified caffe2 ops; -if(INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE) +if(INTERN_BUILD_MOBILE) if(NOT BUILD_SHARED_LIBS AND NOT "${SELECTED_OP_LIST}" STREQUAL "") string(APPEND CMAKE_CXX_FLAGS " -DNO_EXPORT") endif() @@ -612,13 +614,18 @@ if(INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE) set(INTERN_DISABLE_AUTOGRAD ON) endif() set(BUILD_PYTHON OFF) + set(BUILD_FUNCTORCH OFF) set(BUILD_CAFFE2_OPS OFF) set(USE_DISTRIBUTED OFF) set(NO_API ON) set(USE_FBGEMM OFF) set(USE_QNNPACK OFF) set(INTERN_DISABLE_ONNX ON) - set(INTERN_USE_EIGEN_BLAS ON) + if(USE_BLAS) + set(INTERN_USE_EIGEN_BLAS ON) + else() + set(INTERN_USE_EIGEN_BLAS OFF) + endif() # Disable developing mobile interpreter for actual mobile build. # Enable it elsewhere to capture build error. set(INTERN_DISABLE_MOBILE_INTERP ON) @@ -707,6 +714,16 @@ set(BUILD_ONEDNN_GRAPH OFF) include(cmake/Dependencies.cmake) +# Moved this cmake set option down here because CMAKE_CUDA_COMPILER_VERSION is not available until now +cmake_dependent_option( + USE_FLASH_ATTENTION + "Whether to build the flash_attention kernel for scaled dot product attention" ON + "USE_CUDA AND NOT ROCM AND NOT MSVC AND NOT CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11.6" OFF) +if(USE_FLASH_ATTENTION) + ADD_DEFINITIONS(-DUSE_FLASH_ATTENTION) +ENDIF() + + if(USE_CUDA AND (CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 10.2) AND (CMAKE_HOST_SYSTEM_NAME MATCHES "Windows")) # CUDA < 10.2 doesn't support compiling and extracting header dependencies in # one call, so instead CMake calls nvcc twice with && in between. @@ -794,27 +811,27 @@ endif() # ---[ Build flags if(NOT MSVC) string(APPEND CMAKE_CXX_FLAGS " -O2 -fPIC") - string(APPEND CMAKE_CXX_FLAGS " -Wno-narrowing") # Eigen fails to build with some versions, so convert this to a warning # Details at http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1459 string(APPEND CMAKE_CXX_FLAGS " -Wall") string(APPEND CMAKE_CXX_FLAGS " -Wextra") append_cxx_flag_if_supported("-Werror=return-type" CMAKE_CXX_FLAGS) - if(NOT USE_CUDNN) - # Temporary fix to ignore non virtual dtor error if cudnn is used.
A - # separate PR to cudnn_frontend is needed to address this later on - append_cxx_flag_if_supported("-Werror=non-virtual-dtor" CMAKE_CXX_FLAGS) - endif() + append_cxx_flag_if_supported("-Werror=non-virtual-dtor" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Werror=braced-scalar-init" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Werror=range-loop-construct" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Wnarrowing" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-missing-field-initializers" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-type-limits" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-array-bounds" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-unknown-pragmas" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Wunused-local-typedefs" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-unused-parameter" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-unused-function" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-unused-result" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-strict-overflow" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-strict-aliasing" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-error=deprecated-declarations" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Wvla-extension" CMAKE_CXX_FLAGS) if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang") string(APPEND CMAKE_CXX_FLAGS " -Wno-range-loop-analysis") string(APPEND CMAKE_CXX_FLAGS " -Wno-pass-failed") @@ -870,12 +887,14 @@ if(NOT MSVC) append_cxx_flag_if_supported("-Wno-c++14-extensions" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-constexpr-not-const" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Wno-missing-braces" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Wunused-lambda-capture" CMAKE_CXX_FLAGS) + append_cxx_flag_if_supported("-Wunused-local-typedef" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Qunused-arguments" CMAKE_CXX_FLAGS) - if(${COLORIZE_OUTPUT}) + if(${USE_COLORIZE_OUTPUT}) endif() endif() - if(${COLORIZE_OUTPUT}) + if(${USE_COLORIZE_OUTPUT}) append_cxx_flag_if_supported("-fcolor-diagnostics" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-fdiagnostics-color=always" CMAKE_CXX_FLAGS) endif() @@ -903,14 +922,16 @@ if(NOT MSVC) append_cxx_flag_if_supported("-fno-trapping-math" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Werror=format" CMAKE_CXX_FLAGS) append_cxx_flag_if_supported("-Werror=cast-function-type" CMAKE_CXX_FLAGS) - check_cxx_compiler_flag("-Werror=sign-compare" HAS_WERROR_SIGN_COMPARE) - # This doesn't work globally so we use the test on specific - # target_compile_options endif() if(USE_ASAN) - string(APPEND CMAKE_CXX_FLAGS_DEBUG " -fsanitize=address") - string(APPEND CMAKE_LINKER_FLAGS_DEBUG " -fsanitize=address") + string(APPEND CMAKE_CXX_FLAGS_DEBUG " -fsanitize=address -fsanitize=undefined") + string(APPEND CMAKE_LINKER_FLAGS_DEBUG " -fsanitize=address -fsanitize=undefined") +endif() + +if(USE_TSAN) + string(APPEND CMAKE_CXX_FLAGS_DEBUG " -fsanitize=thread") + string(APPEND CMAKE_LINKER_FLAGS_DEBUG " -fsanitize=thread") endif() if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64") @@ -1150,7 +1171,6 @@ endif() include(cmake/Summary.cmake) caffe2_print_configuration_summary() -# ---[ Torch Deploy -if(USE_DEPLOY) - add_subdirectory(torch/csrc/deploy) +if(BUILD_FUNCTORCH) + add_subdirectory(functorch) endif() diff --git a/CODEOWNERS b/CODEOWNERS index ccd111beba86..179e87198dba 100644 --- a/CODEOWNERS +++ b/CODEOWNERS @@ -1,3 +1,8 @@ +# IMPORTANT: +# This file is ONLY used to subscribe for notifications for PRs +# related to 
a specific file path. Approvals from people in this +# file are not required for merges. + # This is a comment. # Each line is a file pattern followed by one or more owners. # For module labels => owners mapping, please see https://github.com/pytorch/pytorch/issues/24422. @@ -9,13 +14,30 @@ /torch/csrc/autograd/ @albanD @soulitzer /torch/autograd/ @albanD @soulitzer /tools/autograd/ @albanD @soulitzer -/torch/nn/ @albanD @jbschlosser @saketh-are +/torch/nn/ @albanD @jbschlosser /torch/optim/ @albanD /test/test_public_bindings.py @albanD /test/allowlist_for_publicAPI.json @albanD @anjali411 /docs/source/conf.py @albanD /aten/src/ATen/native/tags.yaml @anjali411 +# Architecture Optimization (quantization, sparsity, etc.) +/aten/src/ATen/native/ao_sparse @z-a-f @salilsdesai @kimishpatel @digantdesai @jianyuh +/aten/src/ATen/native/quantized @jerryzh168 @z-a-f @salilsdesai @kimishpatel @digantdesai @jianyuh +/aten/src/ATen/native/quantized/cpu @jerryzh168 @z-a-f @salilsdesai @kimishpatel @digantdesai @jianyuh +/aten/src/ATen/native/quantized/cuda @jerryzh168 @dzdang +/aten/src/ATen/native/quantized/cudnn @jerryzh168 @dzdang +/test/test_quantization.py @jerryzh168 +/test/ao/ @jerryzh168 @z-a-f @hdcharles +/test/quantization/ @jerryzh168 @z-a-f +/torch/quantization/ @jerryzh168 +ao/sparisty/ @z-a-f @hdcharles +ao/quantization/ @jerryzh168 +nn/intrinsic/ @jerryzh168 +nn/quantized/ @jerryzh168 +nn/quantizable/ @jerryzh168 @z-a-f +nn/qat/ @jerryzh168 + # Tensorpipe RPC Agent. /torch/csrc/distributed/rpc/tensorpipe_agent.cpp @jiayisuse @osalpekar @lw @beauby /torch/csrc/distributed/rpc/tensorpipe_agent.h @jiayisuse @osalpekar @lw @beauby @@ -23,22 +45,23 @@ # Distributed package # This list is mostly if you'd like to be tagged as reviewer, feel free to add # or remove yourself from it. -/torch/csrc/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu -/torch/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu -/torch/nn/parallel/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @mingzhe09088 @H-Huang @awgu +/torch/csrc/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @H-Huang @awgu @kwen2501 +/torch/distributed/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @H-Huang @awgu @kwen2501 +/torch/distributed/_composable @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @H-Huang @awgu @kwen2501 @yhcharles +/torch/nn/parallel/ @mrshenli @zhaojuanmao @pritamdamania87 @rohan-varma @H-Huang @awgu @kwen2501 # Distributed tests # This list is mostly if you'd like to be tagged as reviewer, feel free to add # or remove yourself from it. 
-/test/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu -/torch/testing/_internal/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu +/test/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu @kwen2501 +/torch/testing/_internal/distributed @mrshenli @pritamdamania87 @zhaojuanmao @rohan-varma @H-Huang @awgu @kwen2501 # ONNX Export -/torch/csrc/jit/passes/onnx.h @bowenbao @shubhambhokare1 -/torch/csrc/jit/passes/onnx.cpp @bowenbao @shubhambhokare1 -/torch/csrc/jit/passes/onnx/ @bowenbao @shubhambhokare1 -/torch/onnx/ @bowenbao @shubhambhokare1 -/test/onnx/ @bowenbao @shubhambhokare1 +/torch/csrc/jit/passes/onnx.h @bowenbao @abock +/torch/csrc/jit/passes/onnx.cpp @bowenbao @abock +/torch/csrc/jit/passes/onnx/ @bowenbao @abock +/torch/onnx/ @bowenbao @abock +/test/onnx/ @bowenbao @abock # Docker /.circleci/docker/ @jeffdaily @@ -68,12 +91,19 @@ /torch/testing/_internal/common_methods_invocations.py @mruberry @ngimel /torch/testing/_internal/common_device_type.py @mruberry @ngimel test/test_ops.py @mruberry @ngimel -test/test_ops_gradients.py @mruberry @ngimel +test/test_ops_gradients.py @mruberry @ngimel @soulitzer +test/test_ops_fwd_gradients.py @mruberry @ngimel @soulitzer test/test_unary_ufuncs.py @mruberry @ngimel test/test_binary_ufuncs.py @mruberry @ngimel test/test_reductions.py @mruberry @ngimel test/test_type_promotion.py @mruberry @ngimel +# functorch-related things +# This list is for people wanting to be notified every time there's a change +# Useful for e.g. auditing xfails that other folks add to tests +test/functorch/test_ops.py @zou3519 +test/functorch/test_vmap.py @zou3519 + # torch MPS test/test_mps.py @kulinseth aten/src/ATen/mps/ @kulinseth @@ -84,3 +114,6 @@ torch/csrc/autograd/profiler* @robieta torch/autograd/profiler* @robieta torch/csrc/profiler/ @robieta torch/profiler/ @robieta + +# AOTDispatch tests +test/functorch/test_aotdispatch.py @ezyang @Chillee diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 7b4a1246d002..eaf81b19eefa 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -16,7 +16,9 @@ - [Running `mypy`](#running-mypy) - [C++ Unit Testing](#c-unit-testing) - [Run Specific CI Jobs](#run-specific-ci-jobs) +- [Merging your Change](#merging-your-change) - [Writing documentation](#writing-documentation) + - [Docstring type formatting](#docstring-type-formatting) - [Building documentation](#building-documentation) - [Tips](#tips) - [Building C++ Documentation](#building-c-documentation) @@ -116,21 +118,9 @@ git submodule sync --recursive git submodule update --init --recursive --jobs 0 ``` -If you want to have no-op incremental rebuilds (which are fast), see the section below titled "Make no-op build fast." +If you want to have no-op incremental rebuilds (which are fast), see [Make no-op build fast](#make-no-op-build-fast) below. -3. Follow the instructions for [installing PyTorch from source](https://github.com/pytorch/pytorch#from-source), except when it's time to install PyTorch instead of invoking `setup.py install` you'll want to call `setup.py develop` instead: - -Specifically, the change you have to make is to replace - -```bash -python setup.py install -``` - -with - -```bash -python setup.py develop -``` +3. Follow the instructions for [installing PyTorch from source](https://github.com/pytorch/pytorch#from-source), but instead of installing PyTorch via `python setup.py install`, use `python setup.py develop`. 
This mode will symlink the Python files from the current local source tree into the Python install. This way when you modify a Python file, you @@ -434,6 +424,17 @@ ghstack submit [`ghstack`](https://github.com/ezyang/ghstack). It creates a large commit that is of very low signal to reviewers. +## Merging your Change +If you know the right people or team that should approve your PR (and you have the required permissions to do so), add them to the Reviewers list. + +If not, leave the Reviewers section empty. Our triage squad will review your PR, add a module label, and assign it to the appropriate reviewer in a couple of business days. The reviewer will then look at your PR and respond. + +Occasionally, things might fall through the cracks (sorry!). In case your PR either doesn't get assigned to a reviewer or doesn't get any response from the reviewer for 4 business days, please leave a comment on the PR (mentioning the reviewer if one has been assigned). That'll get it nudged back onto people's radar. + +If that still doesn't help, come see us during [our office hours](https://github.com/pytorch/pytorch/wiki/Contact-Pytorch-Dev-Infra-Office). + +Once your PR is approved, you can merge it in by entering a comment with the content `@pytorchmergebot merge` ([what's this bot?](https://github.com/pytorch/pytorch/wiki/Bot-commands)). + ## Writing documentation So you want to write some documentation and don't know where to start? @@ -447,9 +448,47 @@ If you're interested in adding new developer docs, please read this [page on the The rest of this section is about user-facing documentation. -PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) +PyTorch uses [Google style](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html) for formatting docstrings. Each line inside a docstrings block must be limited to 80 characters so that it fits into Jupyter documentation popups. + +### Docstring type formatting + +In addition to the standard Google Style docstring formatting rules, the following guidelines should be followed for docstring types (docstring types are the type information contained in the round brackets after the variable name): + +* The "`Callable`", "`Any`", "`Iterable`", "`Iterator`", "`Generator`" types should have their first letter capitalized. + +* The "`list`" and "`tuple`" types should be completely lowercase. + +* Types should not be made plural. For example: `tuple of int` should be used instead of `tuple of ints`. + +* The only acceptable delimiter words for types are `or` and `of`. No other non-type words should be used other than `optional`. + +* The word `optional` should only be used after the types, and it is only used if the user does not have to specify a value for the variable. Default values are listed after the variable description. Example: + + ``` + my_var (int, optional): Variable description. Default: 1 + ``` + +* Basic Python types should match their type name so that the [Intersphinx](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) extension can correctly identify them. For example: + * Use `str` instead of `string`. + * Use `bool` instead of `boolean`. + * Use `dict` instead of `dictionary`. + +* Square brackets should be used for the dictionary type. For example: + + ``` + my_var (dict[str, int]): Variable description. + ``` + +* If a variable has two different possible types, then the word `or` should be used without a comma.
Otherwise variables with 3 or more types should use commas to separate the types. Example: + + ``` + x (type1 or type2): Variable description. + y (type1, type2, or type3): Variable description. + ``` + + ### Building documentation To build the documentation: @@ -1207,13 +1246,6 @@ In 2018, we merged Caffe2 into the PyTorch source repository. While the steady state aspiration is that Caffe2 and PyTorch share code freely, in the meantime there will be some separation. -If you submit a PR to only PyTorch or only Caffe2 code, CI will only -run for the project you edited. The logic for this is implemented -in `.jenkins/pytorch/dirty.sh` and `.jenkins/caffe2/dirty.sh`; you -can look at this to see what path prefixes constitute changes. -This also means if you ADD a new top-level path, or you start -sharing code between projects, you need to modify these files. - There are a few "unusual" directories which, for historical reasons, are Caffe2/PyTorch specific. Here they are: @@ -1246,8 +1278,9 @@ our [CI wiki](https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-f ### Which commit is used in CI? For CI run on `master`, this repository is checked out for a given `master` -commit, and CI is run on that commit (there isn't really any other choice). For -PRs, however, it's a bit more complicated. Consider this commit graph, where +commit, and CI is run on that commit (there isn't really any other choice). + +For PRs, however, it's a bit more complicated. Consider this commit graph, where `master` is at commit `A`, and the branch for PR #42 (just a placeholder) is at commit `B`: @@ -1256,7 +1289,7 @@ commit `B`: / \ / C (refs/pull/42/merge) / / ----o---o---o---A (refs/heads/master) +---o---o---o---A (merge-destination) - usually master ``` There are two possible choices for which commit to use: @@ -1264,37 +1297,18 @@ There are two possible choices for which commit to use: 1. Checkout commit `B`, the head of the PR (manually committed by the PR author). 2. Checkout commit `C`, the hypothetical result of what would happen if the PR - were merged into `master` (automatically generated by GitHub). - -This choice depends on several factors; here is the decision tree as of -2021-03-30: - -- For CI jobs on CircleCI: - - If the name of the job (or one of its ancestors in the workflow DAG) - contains "xla" or "gcc5", choice **2** is used. This includes the following - jobs: - - pytorch_linux_xenial_py3_6_gcc5_4_build - - pytorch_cpp_doc_build - - pytorch_doc_test - - pytorch_linux_forward_backward_compatibility_check_test - - pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test - - pytorch_linux_xenial_py3_6_gcc5_4_test - - pytorch_python_doc_build - - pytorch_xla_linux_bionic_py3_6_clang9_build - - pytorch_xla_linux_bionic_py3_6_clang9_test - - Otherwise, choice **1** is used. -- For CI jobs on GitHub Actions: - - If the PR was created using [`ghstack`](https://github.com/ezyang/ghstack), - choice **1** is used. - - Otherwise, choice **2** is used. - -This is important to be aware of, because if you see a CI failure on your PR and -choice **2** is being used for that CI job, it is possible that the failure is -nondeterministically caused by a commit that does not exist in the ancestry of -your PR branch. If you happen to have write access to this repo, you can choose -to use `ghstack` to eliminate this nondeterminism for GitHub Actions jobs on -your PRs, but it will still be present for the select CircleCI jobs listed -above. + were merged into it's destination (usually `master`). 
+ +For all practical purposes, most people can think of the commit being used as +commit `B` (choice **1**). + +However, if workflow files (which govern CI behavior) were modified (either by your PR or since your dev branch was created) there's +a nuance to know about: +The workflow files themselves get taken from commit `C`, the merge of your +PR and the `master` branch. But only the workflow files get taken from that merged +commit. Everything else (tests, code, etc.) gets taken directly from your +PR's commit (commit `B`). Please note that this scenario would never affect PRs authored by `ghstack`, as they do not automatically ingest the updates from the default branch. + ## Dev Infra Office Hours [Dev Infra Office Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours) are hosted every Friday to answer any questions regarding developer experience, Green HUD, and CI. diff --git a/Dockerfile b/Dockerfile index 1bd522a62406..e125271607c9 100644 --- a/Dockerfile +++ b/Dockerfile @@ -11,8 +11,7 @@ ARG BASE_IMAGE=ubuntu:18.04 ARG PYTHON_VERSION=3.8 FROM ${BASE_IMAGE} as dev-base -RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \ - apt-get update && apt-get install -y --no-install-recommends \ +RUN apt-get update && apt-get install -y --no-install-recommends \ build-essential \ ca-certificates \ ccache \ @@ -28,9 +27,16 @@ ENV PATH /opt/conda/bin:$PATH FROM dev-base as conda ARG PYTHON_VERSION=3.8 +# Automatically set by buildx +ARG TARGETPLATFORM +# translating Docker's TARGETPLATFORM into miniconda arches +RUN case ${TARGETPLATFORM} in \ + "linux/arm64") MINICONDA_ARCH=aarch64 ;; \ + *) MINICONDA_ARCH=x86_64 ;; \ + esac && \ + curl -fsSL -v -o ~/miniconda.sh -O "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-${MINICONDA_ARCH}.sh" COPY requirements.txt . -RUN curl -fsSL -v -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \ - chmod +x ~/miniconda.sh && \ +RUN chmod +x ~/miniconda.sh && \ ~/miniconda.sh -b -p /opt/conda && \ rm ~/miniconda.sh && \ /opt/conda/bin/conda install -y python=${PYTHON_VERSION} cmake conda-build pyyaml numpy ipython && \ @@ -53,19 +59,28 @@ RUN --mount=type=cache,target=/opt/ccache \ FROM conda as conda-installs ARG PYTHON_VERSION=3.8 -ARG CUDA_VERSION=11.3 +ARG CUDA_VERSION=11.6 ARG CUDA_CHANNEL=nvidia ARG INSTALL_CHANNEL=pytorch-nightly -ENV CONDA_OVERRIDE_CUDA=${CUDA_VERSION} -RUN /opt/conda/bin/conda install -c "${INSTALL_CHANNEL}" -c "${CUDA_CHANNEL}" -y python=${PYTHON_VERSION} pytorch torchvision torchtext "cudatoolkit=${CUDA_VERSION}" && \ +# Automatically set by buildx +RUN /opt/conda/bin/conda update -y conda +RUN /opt/conda/bin/conda install -c "${INSTALL_CHANNEL}" -y python=${PYTHON_VERSION} +ARG TARGETPLATFORM +ARG TRITON_VERSION + +# On arm64 we can only install wheel packages +RUN case ${TARGETPLATFORM} in \ + "linux/arm64") pip install --extra-index-url https://download.pytorch.org/whl/cpu/ torch torchvision torchtext ;; \ + *) /opt/conda/bin/conda install -c "${INSTALL_CHANNEL}" -c "${CUDA_CHANNEL}" -y "python=${PYTHON_VERSION}" pytorch torchvision torchtext "pytorch-cuda=$(echo $CUDA_VERSION | cut -d'.'
-f 1-2)" ;; \ + esac && \ /opt/conda/bin/conda clean -ya RUN /opt/conda/bin/pip install torchelastic +RUN if test -n "${TRITON_VERSION}" -a "${TARGETPLATFORM}" != "linux/arm64"; then /opt/conda/bin/pip install "torchtriton==${TRITON_VERSION}" --extra-index-url https://download.pytorch.org/whl/nightly/cpu ; fi FROM ${BASE_IMAGE} as official ARG PYTORCH_VERSION LABEL com.nvidia.volumes.needed="nvidia_driver" -RUN --mount=type=cache,id=apt-final,target=/var/cache/apt \ - apt-get update && apt-get install -y --no-install-recommends \ +RUN apt-get update && apt-get install -y --no-install-recommends \ ca-certificates \ libjpeg-dev \ libpng-dev && \ diff --git a/MANIFEST.in b/MANIFEST.in index acf4c7291f43..403b90b702df 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -26,5 +26,6 @@ recursive-include benchmarks *.* recursive-include scripts *.* recursive-include mypy_plugins *.* recursive-include modules *.* +recursive-include functorch *.* prune */__pycache__ global-exclude *.o *.so *.dylib *.a .git *.pyc *.swp diff --git a/Makefile b/Makefile index 21745f42a887..45dfeb8cda26 100644 --- a/Makefile +++ b/Makefile @@ -31,3 +31,7 @@ lint: quicklint: lintrunner + +triton: + $(PIP) uninstall -y triton + $(PIP) install -U "git+https://github.com/openai/triton@$(shell cat .github/ci_commit_pins/triton.txt)#subdirectory=python" diff --git a/README.md b/README.md index 10d14d354cc8..bcce2997b25b 100644 --- a/README.md +++ b/README.md @@ -72,7 +72,7 @@ PyTorch provides Tensors that can live either on the CPU or the GPU and accelera computation by a huge amount. We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs -such as slicing, indexing, math operations, linear algebra, reductions. +such as slicing, indexing, mathematical operations, linear algebra, reductions. And they are fast! ### Dynamic Neural Networks: Tape-Based Autograd @@ -234,7 +234,7 @@ python tools/amd_build/build_amd.py Install PyTorch ```bash export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} -python setup.py install +python setup.py develop ``` Note that if you are using [Anaconda](https://www.anaconda.com/distribution/#download-section), you may experience an error caused by the linker: @@ -245,13 +245,13 @@ collect2: error: ld returned 1 exit status error: command 'g++' failed with exit status 1 ``` -This is caused by `ld` from Conda environment shadowing the system `ld`. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.7.6+ and 3.8.1+. +This is caused by `ld` from the Conda environment shadowing the system `ld`. You should use a newer version of Python that fixes this issue. The recommended Python version is 3.7.6+ and 3.8.1+. **On macOS** ```bash export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} -MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install +MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py develop ``` **On Windows** @@ -274,7 +274,7 @@ In this mode PyTorch computations will run on your CPU, not your GPU ```cmd conda activate -python setup.py install +python setup.py develop ``` Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the building environment by tweaking `CMAKE_INCLUDE_PATH` and `LIB`. 
The instruction [here](https://github.com/pytorch/pytorch/blob/master/docs/source/notes/windows.rst#building-from-source) is an example for setting up both MKL and Intel OpenMP. Without these configurations for CMake, Microsoft Visual C OpenMP runtime (vcomp) will be used. @@ -284,7 +284,7 @@ Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In ord In this mode PyTorch computations will leverage your GPU via CUDA for faster number crunching [NVTX](https://docs.nvidia.com/gameworks/content/gameworkslibrary/nvtx/nvidia_tools_extension_library_nvtx.htm) is needed to build Pytorch with CUDA. -NVTX is a part of CUDA distributive, where it is called "Nsight Compute". To install it onto already installed CUDA run CUDA installation once again and check the corresponding checkbox. +NVTX is a part of CUDA distributive, where it is called "Nsight Compute". To install it onto an already installed CUDA run CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio. Currently, VS 2017 / 2019, and Ninja are supported as the generator of CMake. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator, otherwise, it will use VS 2017 / 2019. @@ -299,7 +299,7 @@ You can refer to the [build_pytorch.bat](https://github.com/pytorch/pytorch/blob ```cmd cmd -:: Set the environment variables after you have downloaded and upzipped the mkl package, +:: Set the environment variables after you have downloaded and unzipped the mkl package, :: else CMake would throw an error as `Could NOT find OpenMP`. set CMAKE_INCLUDE_PATH={Your directory}\mkl\include set LIB={Your directory}\mkl\lib;%LIB% @@ -315,7 +315,7 @@ for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\ :: [Optional] If you want to override the CUDA host compiler set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe -python setup.py install +python setup.py develop ``` diff --git a/RELEASE.md b/RELEASE.md index 32f71e124141..d13ca5d11e10 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -14,7 +14,7 @@ - [Release Candidate health validation](#release-candidate-health-validation) - [Cherry Picking Fixes](#cherry-picking-fixes) - [Promoting RCs to Stable](#promoting-rcs-to-stable) - - [Additonal Steps to prepare for release day](#additonal-steps-to-prepare-for-release-day) + - [Additional Steps to prepare for release day](#additional-steps-to-prepare-for-release-day) - [Modify release matrix](#modify-release-matrix) - [Open Google Colab issue](#open-google-colab-issue) - [Patch Releases](#patch-releases) @@ -186,7 +186,7 @@ Promotion should occur in two steps: **NOTE**: The promotion of wheels to PyPI can only be done once so take caution when attempting to promote wheels to PyPI, (see https://github.com/pypa/warehouse/issues/726 for a discussion on potential draft releases within PyPI) -## Additonal Steps to prepare for release day +## Additional Steps to prepare for release day The following should be prepared for the release day @@ -264,7 +264,7 @@ For versions of Python that we support we follow the [NEP 29 policy](https://num ## Accelerator Software -For acclerator software like CUDA and ROCm we will typically use the following criteria: +For accelerator software like CUDA and ROCm we will typically use the following criteria: * Support latest 2 minor versions ### Special support cases @@ -281,7 +281,7 @@ need to support these 
particular versions of software. In the event a submodule cannot be fast forwarded and a patch must be applied we can take two different approaches: -* (preferred) Fork the said repository under the pytorch Github organization, apply the patches we need there, and then switch our submodule to accept our fork. +* (preferred) Fork the said repository under the pytorch GitHub organization, apply the patches we need there, and then switch our submodule to accept our fork. * Get the dependencies maintainers to support a release branch for us Editing submodule remotes can be easily done with: (running from the root of the git repository) diff --git a/WORKSPACE b/WORKSPACE index d26dfca5a333..e8591f291abd 100644 --- a/WORKSPACE +++ b/WORKSPACE @@ -84,10 +84,17 @@ new_local_repository( path = "third_party/eigen", ) +new_local_repository( + name = "cutlass", + build_file = "//third_party:cutlass.BUILD", + path = "third_party/cutlass", +) + new_local_repository( name = "fbgemm", build_file = "//third_party:fbgemm/BUILD.bazel", path = "third_party/fbgemm", + repo_mapping = {"@cpuinfo" : "@org_pytorch_cpuinfo"} ) new_local_repository( @@ -103,8 +110,8 @@ new_local_repository( ) new_local_repository( - name = "cpuinfo", - build_file = "//third_party:cpuinfo.BUILD", + name = "org_pytorch_cpuinfo", + build_file = "//third_party:cpuinfo/BUILD.bazel", path = "third_party/cpuinfo", ) diff --git a/android/gradle.properties b/android/gradle.properties index 9d2640f9a185..ecefc09a587b 100644 --- a/android/gradle.properties +++ b/android/gradle.properties @@ -1,6 +1,6 @@ ABI_FILTERS=armeabi-v7a,arm64-v8a,x86,x86_64 -VERSION_NAME=1.13.0-SNAPSHOT +VERSION_NAME=1.14.0-SNAPSHOT GROUP=org.pytorch MAVEN_GROUP=org.pytorch SONATYPE_STAGING_PROFILE=orgpytorch diff --git a/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp b/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp index 5ed0c9978e83..beafc0a7114a 100644 --- a/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp +++ b/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp @@ -303,7 +303,7 @@ facebook::jni::local_ref JIValue::newJIValueFromStringDict( facebook::jni::alias_ref>::create(); for (auto& pair : dict) { jmap->put( - facebook::jni::make_jstring(pair.key().toString()->string()), + facebook::jni::make_jstring(pair.key().toStringRef()), JIValue::newJIValueFromAtIValue(pair.value())); } return jMethodDictStringKey(JIValue::javaClassStatic(), jmap); diff --git a/android/pytorch_android/src/main/cpp/pytorch_jni_jit.cpp b/android/pytorch_android/src/main/cpp/pytorch_jni_jit.cpp index 1b0d54784d76..6ef4f462df16 100644 --- a/android/pytorch_android/src/main/cpp/pytorch_jni_jit.cpp +++ b/android/pytorch_android/src/main/cpp/pytorch_jni_jit.cpp @@ -195,14 +195,16 @@ class PytorchJni : public facebook::jni::HybridClass { std::vector inputs{}; size_t n = jinputs->size(); inputs.reserve(n); + const bool requires_backend_transfers = + module_.attr("requires_backend_transfers", at::IValue(true)).toBool(); for (size_t i = 0; i < n; i++) { at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i)); - if (at::kVulkan == deviceType_) { + if (at::kVulkan == deviceType_ && requires_backend_transfers) { inputs.push_back( atIValue.isTensor() ? 
at::IValue{atIValue.toTensor().vulkan()} : std::move(atIValue)); } else { - TORCH_CHECK(at::kCPU == deviceType_); + TORCH_CHECK(at::kCPU == deviceType_ || !requires_backend_transfers); inputs.push_back(std::move(atIValue)); } } @@ -223,14 +225,16 @@ class PytorchJni : public facebook::jni::HybridClass { std::vector inputs{}; size_t n = jinputs->size(); inputs.reserve(n); + const bool requires_backend_transfers = + module_.attr("requires_backend_transfers", at::IValue(true)).toBool(); for (size_t i = 0; i < n; i++) { at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i)); - if (at::kVulkan == deviceType_) { + if (at::kVulkan == deviceType_ && requires_backend_transfers) { inputs.push_back( atIValue.isTensor() ? at::IValue{atIValue.toTensor().vulkan()} : std::move(atIValue)); } else { - TORCH_CHECK(at::kCPU == deviceType_); + TORCH_CHECK(at::kCPU == deviceType_ || !requires_backend_transfers); inputs.push_back(std::move(atIValue)); } } diff --git a/android/pytorch_android/src/main/cpp/pytorch_jni_lite.cpp b/android/pytorch_android/src/main/cpp/pytorch_jni_lite.cpp index 86fd1e2260f9..802bb801a1f9 100644 --- a/android/pytorch_android/src/main/cpp/pytorch_jni_lite.cpp +++ b/android/pytorch_android/src/main/cpp/pytorch_jni_lite.cpp @@ -158,14 +158,16 @@ class PytorchJni : public facebook::jni::HybridClass { std::vector inputs{}; size_t n = jinputs->size(); inputs.reserve(n); + const bool requires_backend_transfers = + module_.attr("requires_backend_transfers", at::IValue(true)).toBool(); for (const auto i : c10::irange(n)) { at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i)); - if (at::kVulkan == deviceType_) { + if (at::kVulkan == deviceType_ && requires_backend_transfers) { inputs.push_back( atIValue.isTensor() ? at::IValue{atIValue.toTensor().vulkan()} : std::move(atIValue)); } else { - TORCH_CHECK(at::kCPU == deviceType_); + TORCH_CHECK(at::kCPU == deviceType_ || !requires_backend_transfers); inputs.push_back(std::move(atIValue)); } } @@ -187,14 +189,16 @@ class PytorchJni : public facebook::jni::HybridClass { std::vector inputs{}; size_t n = jinputs->size(); inputs.reserve(n); + const bool requires_backend_transfers = + module_.attr("requires_backend_transfers", at::IValue(true)).toBool(); for (const auto i : c10::irange(n)) { at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i)); - if (at::kVulkan == deviceType_) { + if (at::kVulkan == deviceType_ && requires_backend_transfers) { inputs.push_back( atIValue.isTensor() ? at::IValue{atIValue.toTensor().vulkan()} : std::move(atIValue)); } else { - TORCH_CHECK(at::kCPU == deviceType_); + TORCH_CHECK(at::kCPU == deviceType_ || !requires_backend_transfers); inputs.push_back(std::move(atIValue)); } } diff --git a/aten/CMakeLists.txt b/aten/CMakeLists.txt index 9c3757f346cd..9ba141c29e42 100644 --- a/aten/CMakeLists.txt +++ b/aten/CMakeLists.txt @@ -33,6 +33,7 @@ set(ATen_HIP_SRCS_W_SORT_BY_KEY) set(ATen_HIP_TEST_SRCS) set(ATen_HIP_INCLUDE) set(ATen_MPS_SRCS) +set(ATen_MPS_TEST_SRCS) set(ATen_VULKAN_TEST_SRCS) set(ATen_CPU_DEPENDENCY_LIBS) set(ATen_CUDA_DEPENDENCY_LIBS) @@ -55,7 +56,7 @@ set(TH_CPU_INCLUDE list(APPEND ATen_CPU_INCLUDE ${TH_CPU_INCLUDE}) if(USE_VULKAN) - list(APPEND ATen_CPU_INCLUDE ${CMAKE_BINARY_DIR}/vulkan) + list(APPEND ATen_CPU_INCLUDE ${CMAKE_BINARY_DIR}/vulkan ${CMAKE_CURRENT_SOURCE_DIR}/../third_party/VulkanMemoryAllocator) endif() # Find the HIP package, set the HIP paths, load the HIP CMake. 
@@ -106,6 +107,7 @@ set(ATen_CUDA_SRCS_W_SORT_BY_KEY ${ATen_CUDA_SRCS_W_SORT_BY_KEY} PARENT_SCOPE) set(ATen_CUDA_CU_SRCS_W_SORT_BY_KEY ${ATen_CUDA_CU_SRCS_W_SORT_BY_KEY} PARENT_SCOPE) set(ATen_HIP_SRCS ${ATen_HIP_SRCS} PARENT_SCOPE) set(ATen_MPS_SRCS ${ATen_MPS_SRCS} PARENT_SCOPE) +set(ATen_MPS_TEST_SRCS ${ATen_MPS_TEST_SRCS} PARENT_SCOPE) set(ATen_HIP_SRCS_W_SORT_BY_KEY ${ATen_HIP_SRCS_W_SORT_BY_KEY} PARENT_SCOPE) set(ATen_NVRTC_STUB_SRCS ${ATen_NVRTC_STUB_SRCS} PARENT_SCOPE) set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE) diff --git a/aten/src/ATen/ATen.h b/aten/src/ATen/ATen.h index 1be43cbe7def..4a5a949f0dd7 100644 --- a/aten/src/ATen/ATen.h +++ b/aten/src/ATen/ATen.h @@ -31,3 +31,7 @@ #include #include #include + +// TODO: try to remove this +// There is some back story, see https://github.com/pytorch/pytorch/issues/48684 +#include diff --git a/aten/src/ATen/BatchedTensorImpl.cpp b/aten/src/ATen/BatchedTensorImpl.cpp index d5ab588de53d..fdedfa7c6316 100644 --- a/aten/src/ATen/BatchedTensorImpl.cpp +++ b/aten/src/ATen/BatchedTensorImpl.cpp @@ -17,7 +17,7 @@ BatchedTensorImpl::BatchedTensorImpl(Tensor value, BatchDims bdims) { TORCH_INTERNAL_ASSERT(value_.defined()); set_storage_access_should_throw(); - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); + set_custom_sizes_strides(SizesStridesPolicy::CustomStrides); checkInvariants(); const auto public_dims = value_.dim() - bdims_.size(); diff --git a/aten/src/ATen/BatchingRegistrations.cpp b/aten/src/ATen/BatchingRegistrations.cpp index a269f82fa817..5a01f949745f 100644 --- a/aten/src/ATen/BatchingRegistrations.cpp +++ b/aten/src/ATen/BatchingRegistrations.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -185,14 +186,6 @@ Tensor expand_batching_rule(const Tensor& self, IntArrayRef size, bool implicit) return self_physical.getPhysicalToLogicalMap().apply(result); } -Tensor expand_symint_batching_rule(const Tensor& self, SymIntArrayRef psize, bool implicit) { - return self.expand(asIntArrayRefSlow(psize), implicit); -} - -Tensor sum_symint_batching_rule(const Tensor& input_t, c10::SymIntArrayRef dim, bool keepdim, optional opt_dtype) { - return input_t.sum(c10::asIntArrayRefSlow(dim), keepdim, opt_dtype); -} - std::vector chunk_batching_rule(const Tensor& self, int64_t chunks, int64_t dim) { auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); auto dim_physical = self_physical.getPhysicalDim(dim); @@ -472,10 +465,6 @@ Tensor view_batching_rule(const Tensor& self, IntArrayRef size) { return self_physical.getPhysicalToLogicalMap().apply(result); } -Tensor view_symint_batching_rule(const Tensor& self, c10::SymIntArrayRef size) { - return self.view(asIntArrayRefSlow(size)); -} - Tensor view_as_complex_batching_rule(const Tensor& self) { // guard against the user passing in a batch of scalar tensors with batch // size equal to 2. 
@@ -928,7 +917,7 @@ Tensor mm_batching_rule(const Tensor& self, const Tensor& other) { TORCH_INTERNAL_ASSERT(false, "either self or other must be a BatchedTensor"); } -Tensor cat_batching_rule(TensorList tensors, int64_t dim) { +Tensor cat_batching_rule(const ITensorListRef& tensors, int64_t dim) { auto physical_views = MultiBatchVmapTransform::logicalToPhysical(tensors); auto physical_tensors = fmap( physical_views, [](const VmapPhysicalView& view) -> Tensor { return view.tensor(); }); @@ -1006,16 +995,6 @@ Tensor new_empty_batching_rule( return physical_view.getPhysicalToLogicalMap().apply(result); } -Tensor new_empty_symint_batching_rule( - const Tensor& self, - c10::SymIntArrayRef size, - c10::optional dtype, - c10::optional layout, - c10::optional device, - c10::optional pin_memory) { - return new_empty_batching_rule(self, asIntArrayRefSlow(size), dtype, layout, device, pin_memory); -} - Tensor new_empty_strided_batching_rule( const Tensor& self, IntArrayRef size, @@ -1100,7 +1079,6 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { m.impl("_new_zeros_with_same_feature_meta", _new_zeros_with_same_feature_meta_batching_rule); m.impl("sum.dim_IntList", sum_batching_rule); - m.impl("sum.SymInt", sum_symint_batching_rule); m.impl("is_complex", native::is_complex); // inplace operations @@ -1115,13 +1093,12 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { m.impl("tensor_split.indices", tensor_split_indices_batching_rule); m.impl("diagonal", diagonal_batching_rule); m.impl("expand", expand_batching_rule); - m.impl("expand.SymInt", expand_symint_batching_rule); m.impl("expand_as", native::expand_as); // composite wrt autograd m.impl("movedim.intlist", movedim_batching_rule); m.impl("movedim.int", static_cast(native::movedim)); // composite wrt autograd - // NB: static_cast because there's another variant of narrow. However, we don't + // There is another variant of narrow. However, we don't // want to support the other variant yet bc it isn't documented... 
- m.impl("narrow", static_cast(native::narrow)); // composite wrt autograd + m.impl("narrow", native::narrow_symint); // composite wrt autograd m.impl("numpy_T", native::numpy_T); // composite wrt autograd m.impl("matrix_H", native::matrix_H); // composite wrt autograd m.impl("mT", native::mT); // composite wrt autograd @@ -1144,7 +1121,6 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { m.impl("unfold", unfold_batching_rule); m.impl("unsqueeze", unsqueeze_batching_rule); m.impl("view", view_batching_rule); - m.impl("view.SymInt", view_symint_batching_rule); m.impl("view_as", native::view_as); // composite wrt autograd // clamp operations @@ -1283,7 +1259,6 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { // Tensor.new_* operators m.impl("new_empty", new_empty_batching_rule); - m.impl("new_empty.SymInt", new_empty_symint_batching_rule); m.impl("new_empty_strided", new_empty_strided_batching_rule); m.impl("new_zeros", new_zeros_batching_rule); diff --git a/aten/src/ATen/CMakeLists.txt b/aten/src/ATen/CMakeLists.txt index 286d59f3e97d..613c6a6834e3 100644 --- a/aten/src/ATen/CMakeLists.txt +++ b/aten/src/ATen/CMakeLists.txt @@ -56,8 +56,8 @@ if(NOT BUILD_CAFFE2 AND NOT BUILD_LITE_INTERPRETER) EXCLUDE(ATen_CORE_TEST_SRCS "${ATen_CORE_TEST_SRCS}" ${ATen_CORE_EXCLUDED_TEST_SRCS}) endif() -file(GLOB base_h "*.h" "detail/*.h" "cpu/*.h" "cpu/vec/vec512/*.h" "cpu/vec/vec256/*.h" "cpu/vec/*.h" "quantized/*.h") -file(GLOB base_cpp "*.cpp" "detail/*.cpp" "cpu/*.cpp") +file(GLOB base_h "*.h" "detail/*.h" "cpu/*.h" "cpu/vec/vec512/*.h" "cpu/vec/vec256/*.h" "cpu/vec/vec256/vsx/*.h" "cpu/vec/*.h" "quantized/*.h" "functorch/*.h") +file(GLOB base_cpp "*.cpp" "detail/*.cpp" "cpu/*.cpp" "functorch/*.cpp") file(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh") file(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp") file(GLOB cuda_nvrtc_stub_h "cuda/nvrtc_stub/*.h") @@ -130,15 +130,13 @@ file(GLOB native_cuda_h "native/cuda/*.h" "native/cuda/*.cuh") file(GLOB native_cuda_linalg_cpp "native/cuda/linalg/*.cpp") file(GLOB native_hip_h "native/hip/*.h" "native/hip/*.cuh") file(GLOB native_cudnn_cpp "native/cudnn/*.cpp") -file(GLOB native_nested_cuda_cu "native/nested/cuda/*.cu") -file(GLOB native_nested_cuda_cpp "native/nested/cuda/*.cpp") file(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu") file(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp") file(GLOB native_quantized_cuda_cu "native/quantized/cuda/*.cu") file(GLOB native_quantized_cuda_cpp "native/quantized/cuda/*.cpp") file(GLOB native_quantized_cudnn_cpp "native/quantized/cudnn/*.cpp") -file(GLOB native_transformers_cuda_cu "native/transformers/cuda/*.cu") -file(GLOB native_transformers_cuda_cpp "native/transformers/cuda/*.cpp") +file(GLOB native_nested_cuda_cu "native/nested/cuda/*.cu") +file(GLOB native_nested_cuda_cpp "native/nested/cuda/*.cpp") file(GLOB native_hip_hip "native/hip/*.hip") file(GLOB native_hip_cpp "native/hip/*.cpp") @@ -151,11 +149,31 @@ file(GLOB native_sparse_hip_hip "native/sparse/hip/*.hip") file(GLOB native_sparse_hip_cpp "native/sparse/hip/*.cpp") file(GLOB native_quantized_hip_hip "native/quantized/hip/*.hip") file(GLOB native_quantized_hip_cpp "native/quantized/hip/*.cpp") +file(GLOB native_transformers_cuda_cu "native/transformers/cuda/*.cu") +file(GLOB native_transformers_cuda_cpp "native/transformers/cuda/*.cpp") file(GLOB native_transformers_hip_hip "native/transformers/hip/*.hip") file(GLOB native_transformers_hip_cpp "native/transformers/hip/*.cpp") file(GLOB native_quantized_cudnn_hip_cpp 
"native/quantized/cudnn/hip/*.cpp") file(GLOB native_utils_cpp "native/utils/*.cpp") +# flash_attention sources +file(GLOB flash_attention_cuda_cu "native/transformers/cuda/flash_attn/*.cu") +file(GLOB flash_attention_cuda_cpp "native/transformers/cuda/flash_attn/*.cpp") + +#Mem_eff attention sources +file(GLOB mem_eff_attention_cuda_cu "native/transformers/cuda/mem_eff_attention/*.cu") +file(GLOB mem_eff_attention_cuda_kernels_cu "native/transformers/cuda/mem_eff_attention/kernels/*.cu") +file(GLOB mem_eff_attention_cuda_cpp "native/transformers/cuda/mem_eff_attention/*.cpp") + +if(USE_FLASH_ATTENTION) + list(APPEND native_transformers_cuda_cu ${flash_attention_cuda_cu}) + list(APPEND native_transformers_cuda_cpp ${flash_attention_cuda_cpp}) + + list(APPEND native_transformers_cuda_cu ${mem_eff_attention_cuda_cu}) + list(APPEND native_transformers_cuda_cu ${mem_eff_attention_cuda_kernels_cu}) + list(APPEND native_transformers_cuda_cpp ${mem_eff_attention_cuda_cpp}) +endif() + # XNNPACK file(GLOB native_xnnpack "native/xnnpack/*.cpp") @@ -415,6 +433,7 @@ if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE) endif() if(USE_CUDA AND NOT USE_ROCM) + list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/../../../third_party/cutlass/include) if($ENV{ATEN_STATIC_CUDA}) list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${CUDA_LIBRARIES} @@ -593,6 +612,7 @@ set(ATen_MOBILE_BENCHMARK_SRCS ${ATen_MOBILE_BENCHMARK_SRCS} PARENT_SCOPE) set(ATen_MOBILE_TEST_SRCS ${ATen_MOBILE_TEST_SRCS} ${ATen_VULKAN_TEST_SRCS} PARENT_SCOPE) set(ATen_VEC_TEST_SRCS ${ATen_VEC_TEST_SRCS} PARENT_SCOPE) set(ATen_QUANTIZED_TEST_SRCS ${ATen_QUANTIZED_TEST_SRCS} PARENT_SCOPE) +set(ATen_MPS_TEST_SRCS ${ATen_MPS_TEST_SRCS} PARENT_SCOPE) set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE) set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE) set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE) diff --git a/aten/src/ATen/Context.cpp b/aten/src/ATen/Context.cpp index 4e8c9cae04f7..7086a05ab6c7 100644 --- a/aten/src/ATen/Context.cpp +++ b/aten/src/ATen/Context.cpp @@ -104,6 +104,30 @@ void Context::setAllowTF32CuDNN(bool b) { allow_tf32_cudnn = b; } +bool Context::userEnabledFlashSDP() const { + return enabled_flashSDP; +} + +void Context::setSDPUseFlash(bool e) { + enabled_flashSDP = e; +} + +bool Context::userEnabledMemEfficientSDP() const { + return enabled_mem_efficientSDP; +} + +void Context::setSDPUseMemEfficient(bool e) { + enabled_mem_efficientSDP = e; +} + +bool Context::userEnabledMathSDP() const { + return enabled_mathSDP; +} + +void Context::setSDPUseMath(bool e) { + enabled_mathSDP = e; +} + // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays) static const char cublas_config_var_name[] = "CUBLAS_WORKSPACE_CONFIG"; // NOLINTNEXTLINE(cppcoreguidelines-avoid-c-arrays,modernize-avoid-c-arrays) @@ -125,7 +149,11 @@ bool Context::checkCuBLASConfigDeterministic() { void Context::alertCuBLASConfigNotDeterministic() const { static bool cublas_config_deterministic = checkCuBLASConfigDeterministic(); - TORCH_CHECK(!deterministicAlgorithms() || cublas_config_deterministic, + if (C10_LIKELY(!deterministicAlgorithms() || cublas_config_deterministic)) { + return; + } + + auto msg = c10::str( "Deterministic behavior was enabled with either `torch.use_deterministic_algorithms(True)` or ", "`at::Context::setDeterministicAlgorithms(true)`, but this operation is not deterministic because ", "it uses CuBLAS and you have CUDA >= 10.2. 
To enable deterministic behavior in this ", @@ -134,6 +162,12 @@ void Context::alertCuBLASConfigNotDeterministic() const { cublas_config_var_name, "=", cublas_deterministic_configs[1], ". For more information, go to ", "https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility" ); + + if (deterministicAlgorithmsWarnOnly()) { + TORCH_WARN(msg); + } else { + TORCH_CHECK(false, msg); + } } bool Context::benchmarkCuDNN() const { @@ -298,9 +332,12 @@ const std::vector& Context::supportedQEngines() { #ifdef USE_FBGEMM if (fbgemm::fbgemmSupportedCPU()) { + // The X86 qengine is available if and only if FBGEMM is available + engines.push_back(at::kX86); engines.push_back(at::kFBGEMM); } #endif + return engines; }(); return supported_qengines; diff --git a/aten/src/ATen/Context.h b/aten/src/ATen/Context.h index 8f3928376473..48e3c935a2c0 100644 --- a/aten/src/ATen/Context.h +++ b/aten/src/ATen/Context.h @@ -126,6 +126,26 @@ class TORCH_API Context { bool deterministicCuDNN() const; void setDeterministicCuDNN(bool); + // Note [Disabling Fused SDP Kernels] + // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + // Flash and Memory Efficient SDP kernels are enabled by default. + // However, they can be disabled by setting + // at::globalContext().setUserEnabledFlashSDP(false) flag. + // This is useful for debugging purposes. For example, if you want to + // compare the performance of the flash SDP kernels with the unfused + // kernel, you can disable the flash SDP kernels. By disabling + // the math SDP kernel, you can force your code to use flash kernels. + // The math SDP kernel can be disabled by setting + // at::globalContext().setUserEnabledMathSDP(false) flag. + void setSDPUseFlash(bool); + bool userEnabledFlashSDP() const; + + void setSDPUseMemEfficient(bool); + bool userEnabledMemEfficientSDP() const; + + void setSDPUseMath(bool); + bool userEnabledMathSDP() const; + at::LinalgBackend linalgPreferredBackend() const; void setLinalgPreferredBackend(at::LinalgBackend); @@ -253,7 +273,14 @@ class TORCH_API Context { bool deterministic_cudnn = false; bool _deterministic_algorithms = false; bool _deterministic_algorithms_warn_only = false; + bool enabled_flashSDP = true; + bool enabled_mem_efficientSDP = true; + bool enabled_mathSDP = true; +#ifdef USE_ROCM + bool benchmark_cudnn = true; +#else bool benchmark_cudnn = false; +#endif Float32MatmulPrecision float32_matmul_precision = at::Float32MatmulPrecision::HIGHEST; int benchmark_limit_cudnn = 10; diff --git a/aten/src/ATen/DLConvertor.cpp b/aten/src/ATen/DLConvertor.cpp index fb3f3596e1fe..614dc46158e8 100644 --- a/aten/src/ATen/DLConvertor.cpp +++ b/aten/src/ATen/DLConvertor.cpp @@ -215,11 +215,22 @@ void deleter(DLManagedTensor* arg) { // This function returns a shared_ptr to memory managed DLpack tensor // constructed out of ATen tensor DLManagedTensor* toDLPack(const Tensor& src) { + // create a new tensor with possibly normalized strides + // gh-83069 + auto shape = src.sizes(); + auto strides = src.strides().vec(); + for (int i=0; ihandle = src; + atDLMTensor->handle = view; atDLMTensor->tensor.manager_ctx = atDLMTensor; atDLMTensor->tensor.deleter = &deleter; - atDLMTensor->tensor.dl_tensor.data = src.data_ptr(); + atDLMTensor->tensor.dl_tensor.data = view.data_ptr(); int64_t device_id = 0; if (src.is_cuda()) { device_id = src.get_device(); @@ -229,10 +240,10 @@ DLManagedTensor* toDLPack(const Tensor& src) { atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src); atDLMTensor->tensor.dl_tensor.shape = // 
NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) - const_cast(src.sizes().data()); + const_cast(view.sizes().data()); atDLMTensor->tensor.dl_tensor.strides = // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) - const_cast(src.strides().data()); + const_cast(view.strides().data()); atDLMTensor->tensor.dl_tensor.byte_offset = 0; return &(atDLMTensor->tensor); } @@ -241,8 +252,10 @@ Tensor fromDLPack(const DLManagedTensor* src) { Device device = getATenDevice(src->dl_tensor.device); ScalarType stype = toScalarType(src->dl_tensor.dtype); auto deleter = [src](void* self) { - // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) - src->deleter(const_cast(src)); + if (src->deleter) { + // NOLINTNEXTLINE(cppcoreguidelines-pro-type-const-cast) + src->deleter(const_cast(src)); + } }; if (!src->dl_tensor.strides) { return at::from_blob(src->dl_tensor.data, diff --git a/aten/src/ATen/DeviceGuard.h b/aten/src/ATen/DeviceGuard.h index a827a1ccc7fa..83bb31d7fd42 100644 --- a/aten/src/ATen/DeviceGuard.h +++ b/aten/src/ATen/DeviceGuard.h @@ -1,5 +1,6 @@ #pragma once +#include #include #include #include // TensorList whyyyyy @@ -29,7 +30,7 @@ inline c10::optional device_of(const c10::optional& t) { /// Return the Device of a TensorList, if the list is non-empty and /// the first Tensor is defined. (This function implicitly assumes /// that all tensors in the list have the same device.) -inline c10::optional device_of(TensorList t) { +inline c10::optional device_of(ITensorListRef t) { if (!t.empty()) { return device_of(t.front()); } else { diff --git a/aten/src/ATen/Dispatch.h b/aten/src/ATen/Dispatch.h index 08d41126a161..d2f5a244ad57 100644 --- a/aten/src/ATen/Dispatch.h +++ b/aten/src/ATen/Dispatch.h @@ -8,6 +8,10 @@ #include #include +#ifdef __CUDACC__ +#include // For CUDA_VERSION +#endif + #ifdef TEMPLATE_SELECTIVE_BUILD #include #else @@ -72,10 +76,20 @@ TORCH_API void record_kernel_function_dtype(std::string name); }) #endif +// Workaround for C10_UNUSED because CUDA 10.2 and below fails to handle unused +// attribute in the type aliasing context. Keep name long and verbose to avoid +// macro collisions. +#if defined(__CUDACC__) && CUDA_VERSION < 11000 +#define C10_UNUSED_DISPATCH_CUDA_WORKAROUND +#else +#define C10_UNUSED_DISPATCH_CUDA_WORKAROUND C10_UNUSED +#endif + #define AT_PRIVATE_CASE_TYPE_USING_HINT(enum_type, HINT, ...) \ case enum_type: { \ AT_PRIVATE_CHECK_SELECTIVE_BUILD(enum_type); \ - using HINT = c10::impl::ScalarTypeToCPPTypeT; \ + using HINT C10_UNUSED_DISPATCH_CUDA_WORKAROUND = \ + c10::impl::ScalarTypeToCPPTypeT; \ return __VA_ARGS__(); \ } @@ -186,7 +200,7 @@ inline void deprecated_AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX() {} // conditionally compile fragments of the case statements such // that the kernel functions are specialized only for the dtypes // that are needed. The NAME parameter *must* be a build time -// cons char* (can't be std::string, etc...) +// const char* (can't be std::string, etc...) // // Please ensure that the NAME is unique for every implementation // or you run the risk of over-including code for the kernel diff --git a/aten/src/ATen/EmptyTensor.cpp b/aten/src/ATen/EmptyTensor.cpp index ff91aa0bd14d..daf0b6842365 100644 --- a/aten/src/ATen/EmptyTensor.cpp +++ b/aten/src/ATen/EmptyTensor.cpp @@ -106,6 +106,35 @@ size_t computeStorageNbytes( #endif } +// not including mobile-only macros in this function, +// since mobile shouldn't be using symints. 
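For context on the Dispatch.h hunk above: AT_PRIVATE_CASE_TYPE_USING_HINT expands inside the AT_DISPATCH_* macros, and the HINT alias it declares (conventionally scalar_t) is exactly what the new CUDA < 11 workaround guards. A typical call site looks like the sketch below; the function name and kernel body are illustrative, and it assumes a contiguous CPU tensor.

    #include <ATen/ATen.h>
    #include <ATen/Dispatch.h>

    // Hypothetical example: fill a contiguous CPU tensor with ones, dispatching
    // on its dtype. Inside the lambda, `scalar_t` is the type alias produced by
    // AT_PRIVATE_CASE_TYPE_USING_HINT for the selected ScalarType.
    void fill_with_one_example(at::Tensor& t) {
      AT_DISPATCH_ALL_TYPES(t.scalar_type(), "fill_with_one_example", [&] {
        scalar_t* data = t.data_ptr<scalar_t>();
        for (int64_t i = 0; i < t.numel(); ++i) {
          data[i] = static_cast<scalar_t>(1);
        }
      });
    }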
+SymInt computeStorageNbytes( + SymIntArrayRef sizes, + SymIntArrayRef strides, + SymInt itemsize_bytes, + SymInt storage_offset + ) { + TORCH_CHECK( + sizes.size() == strides.size(), + "dimensionality of sizes (", + sizes.size(), + ") must match dimensionality of strides (", + strides.size(), + ")"); + + // size of the underlying storage is 1 bigger than the offset + // of the last element according to stride + SymInt size = 1; + for (const auto i : c10::irange(sizes.size())) { + if (sizes[i] == 0) { + return 0; + } + + size += strides[i] * (sizes[i] - 1); + } + return itemsize_bytes * (storage_offset + size); +} + TensorBase empty_generic( IntArrayRef size, c10::Allocator* allocator, @@ -140,20 +169,20 @@ TensorBase empty_generic( return tensor; } -TensorBase empty_strided_generic( - IntArrayRef size, - IntArrayRef stride, +template +TensorBase _empty_strided_generic( + T size, + T stride, c10::Allocator* allocator, c10::DispatchKeySet ks, ScalarType scalar_type) { at::detail::check_size_nonnegative(size); at::detail::raise_warning_for_complex_half(scalar_type); caffe2::TypeMeta dtype = scalarTypeToTypeMeta(scalar_type); - size_t size_bytes = computeStorageNbytes(size, stride, dtype.itemsize()); + auto size_bytes = computeStorageNbytes(size, stride, dtype.itemsize()); auto storage_impl = c10::make_intrusive( c10::StorageImpl::use_byte_size_t(), size_bytes, - allocator->allocate(size_bytes), allocator, /*resizeable=*/true); @@ -163,6 +192,24 @@ TensorBase empty_strided_generic( return tensor; } +TensorBase empty_strided_generic( + IntArrayRef size, + IntArrayRef stride, + c10::Allocator* allocator, + c10::DispatchKeySet ks, + ScalarType scalar_type) { + return _empty_strided_generic(size, stride, allocator, ks, scalar_type); +} + +TensorBase empty_strided_symint_generic( + SymIntArrayRef size, + SymIntArrayRef stride, + c10::Allocator* allocator, + c10::DispatchKeySet ks, + ScalarType scalar_type) { + return _empty_strided_generic(size, stride, allocator, ks, scalar_type); +} + TensorBase empty_cpu(IntArrayRef size, ScalarType dtype, bool pin_memory, c10::optional memory_format_opt) { auto allocator = GetCPUAllocatorMaybePinned(pin_memory); @@ -303,9 +350,7 @@ TensorBase empty_symint_meta( auto scalar_type = dtype_or_default(dtype_opt); auto *allocator = GetAllocator(kMeta); constexpr c10::DispatchKeySet meta_dks(c10::DispatchKey::Meta); - // TODO: do this. 
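To make the storage-size rule in the SymInt computeStorageNbytes overload above concrete: the required bytes are itemsize * (storage_offset + 1 + sum_i strides[i] * (sizes[i] - 1)), and 0 if any dimension has size 0. A small standalone sketch of the same arithmetic (the function name is ours, plain int64_t in place of SymInt):

    #include <cstdint>
    #include <vector>

    int64_t storage_nbytes_example(const std::vector<int64_t>& sizes,
                                   const std::vector<int64_t>& strides,
                                   int64_t itemsize,
                                   int64_t storage_offset) {
      // Storage must hold one element past the offset of the last element.
      int64_t size = 1;
      for (size_t i = 0; i < sizes.size(); ++i) {
        if (sizes[i] == 0) {
          return 0;  // empty tensors need no storage
        }
        size += strides[i] * (sizes[i] - 1);
      }
      return itemsize * (storage_offset + size);
    }

    // e.g. a contiguous 2x3 float tensor: sizes {2, 3}, strides {3, 1}, itemsize 4,
    // storage_offset 0 -> 4 * (0 + 1 + 3*1 + 1*2) = 24 bytes.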
Note that naive implementation will choke on truly - // unknown sizes without on the fly reasoning - // at::detail::check_size_nonnegative(size); + at::detail::check_size_nonnegative(size); at::detail::raise_warning_for_complex_half(scalar_type); caffe2::TypeMeta dtype = scalarTypeToTypeMeta(scalar_type); SymInt size_bytes = dtype.itemsize(); @@ -343,7 +388,7 @@ TensorBase empty_symint_meta( TORCH_CHECK(0, "other memory format not implemented yet"); } - tensor.unsafeGetTensorImpl()->set_sym_sizes_and_strides(size, strides); + tensor.unsafeGetTensorImpl()->set_sizes_and_strides(size, strides); return tensor; } @@ -395,4 +440,40 @@ TensorBase empty_strided_meta( options.pinned_memory_opt()); } +TensorBase empty_strided_symint_meta(SymIntArrayRef size, SymIntArrayRef stride, + ScalarType dtype) { + auto *allocator = GetAllocator(kMeta); + constexpr c10::DispatchKeySet meta_dks(c10::DispatchKey::Meta); + return at::detail::empty_strided_symint_generic( + size, stride, allocator, meta_dks, dtype); +} + +TensorBase empty_strided_symint_meta( + SymIntArrayRef size, + SymIntArrayRef stride, + c10::optional dtype_opt, + c10::optional layout_opt, + c10::optional device_opt, + c10::optional pin_memory_opt) { + auto device = device_or_default(device_opt); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(device.type() == DeviceType::Meta); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(layout_or_default(layout_opt) == Layout::Strided); + + auto dtype = dtype_or_default(dtype_opt); + return at::detail::empty_strided_symint_meta(size, stride, dtype); +} + +TensorBase empty_strided_symint_meta( + SymIntArrayRef size, + SymIntArrayRef stride, + const TensorOptions &options) { + return at::detail::empty_strided_symint_meta( + size, + stride, + optTypeMetaToScalarType(options.dtype_opt()), + options.layout_opt(), + options.device_opt(), + options.pinned_memory_opt()); +} + }} // namespace at::detail diff --git a/aten/src/ATen/EmptyTensor.h b/aten/src/ATen/EmptyTensor.h index 06a33601a154..969eeb6dc5ee 100644 --- a/aten/src/ATen/EmptyTensor.h +++ b/aten/src/ATen/EmptyTensor.h @@ -4,7 +4,8 @@ namespace at { namespace detail { -inline void check_size_nonnegative(IntArrayRef size) { +template +inline void check_size_nonnegative(ArrayRefType size) { for (auto x : size) { TORCH_CHECK( x >= 0, @@ -24,6 +25,11 @@ TORCH_API size_t computeStorageNbytes( IntArrayRef strides, size_t itemsize, size_t storage_offset = 0); +TORCH_API SymInt computeStorageNbytes( + SymIntArrayRef sizes, + SymIntArrayRef strides, + SymInt itemsize, + SymInt storage_offset = 0); TORCH_API TensorBase empty_generic( IntArrayRef size, @@ -39,6 +45,13 @@ TORCH_API TensorBase empty_strided_generic( c10::DispatchKeySet ks, ScalarType scalar_type); +TORCH_API TensorBase empty_strided_symint_generic( + SymIntArrayRef size, + SymIntArrayRef stride, + c10::Allocator* allocator, + c10::DispatchKeySet ks, + ScalarType scalar_type); + TORCH_API TensorBase empty_cpu( IntArrayRef size, ScalarType dtype, @@ -113,5 +126,23 @@ TORCH_API TensorBase empty_strided_meta( IntArrayRef stride, const TensorOptions& options); +TORCH_API TensorBase empty_strided_symint_meta( + SymIntArrayRef size, + SymIntArrayRef stride, + ScalarType dtype); + +TORCH_API TensorBase empty_strided_symint_meta( + SymIntArrayRef size, + SymIntArrayRef stride, + c10::optional dtype_opt, + c10::optional layout_opt, + c10::optional device_opt, + c10::optional pin_memory_opt); + +TORCH_API TensorBase empty_strided_symint_meta( + SymIntArrayRef size, + SymIntArrayRef stride, + const TensorOptions& options); + } // 
namespace detail } // namespace at diff --git a/aten/src/ATen/ExpandUtils.cpp b/aten/src/ATen/ExpandUtils.cpp index a44005a2ef81..ee846c9b82e3 100644 --- a/aten/src/ATen/ExpandUtils.cpp +++ b/aten/src/ATen/ExpandUtils.cpp @@ -13,8 +13,8 @@ TensorBase expand_slow_path(const TensorBase &self, IntArrayRef size) { namespace { // NOTE: are_expandable did a similar check, please keep them sync if change is needed -template -Container infer_size_impl(IntArrayRef a, IntArrayRef b) { +template +Container infer_size_impl(ArrayType a, ArrayType b) { size_t dimsA = a.size(); size_t dimsB = b.size(); size_t ndim = dimsA > dimsB ? dimsA : dimsB; @@ -25,8 +25,8 @@ Container infer_size_impl(IntArrayRef a, IntArrayRef b) { ptrdiff_t offset = ndim - 1 - i; ptrdiff_t dimA = dimsA - 1 - offset; ptrdiff_t dimB = dimsB - 1 - offset; - int64_t sizeA = (dimA >= 0) ? a[dimA] : 1; - int64_t sizeB = (dimB >= 0) ? b[dimB] : 1; + auto sizeA = (dimA >= 0) ? a[dimA] : 1; + auto sizeB = (dimB >= 0) ? b[dimB] : 1; TORCH_CHECK( sizeA == sizeB || sizeA == 1 || sizeB == 1, @@ -35,7 +35,7 @@ Container infer_size_impl(IntArrayRef a, IntArrayRef b) { ") at non-singleton dimension ", i); // 1s map to the other size (even 0). - expandedSizes[i] = sizeA == 1 ? sizeB : sizeA; + expandedSizes[i] = sizeA == 1 ? std::move(sizeB) : std::move(sizeA); } return expandedSizes; diff --git a/aten/src/ATen/ExpandUtils.h b/aten/src/ATen/ExpandUtils.h index 7a81076a7dd0..9e48421e540f 100644 --- a/aten/src/ATen/ExpandUtils.h +++ b/aten/src/ATen/ExpandUtils.h @@ -3,6 +3,7 @@ #ifndef AT_PER_OPERATOR_HEADERS #include #else +#include #include #endif @@ -20,6 +21,8 @@ namespace at { TORCH_API std::vector infer_size(IntArrayRef a, IntArrayRef b); TORCH_API DimVector infer_size_dimvector(IntArrayRef a, IntArrayRef b); +TORCH_API SymDimVector +infer_size_symdimvector(SymIntArrayRef a, SymIntArrayRef b); // Named type instead of a pair/tuple so that we can be sure to // construct the vectors in place and get NRVO. @@ -93,10 +96,11 @@ inline void check_defined( inline c10::MaybeOwned expand_inplace( const Tensor& tensor, const Tensor& to_expand) { - if (tensor.sizes().equals(to_expand.sizes())) { + if (tensor.sym_sizes().equals(to_expand.sym_sizes())) { return c10::MaybeOwned::borrowed(to_expand); } - return c10::MaybeOwned::owned(to_expand.expand(tensor.sizes())); + return c10::MaybeOwned::owned( + to_expand.expand_symint(tensor.sym_sizes())); } inline c10::MaybeOwned expand_inplace( @@ -437,16 +441,17 @@ inline std::vector expand_outplace(TensorList to_expand) { return result; } -static inline Tensor sum_to( +template +inline Tensor _sum_to( Tensor tensor, - const c10::SymIntArrayRef shape, + const c10::ArrayRef shape, bool always_return_non_view = false) { if (shape.size() == 0) { return tensor.sum(); } - auto sizes = tensor.sym_sizes(); - c10::SmallVector reduce_dims; + auto sizes = at::symint::sizes(tensor); + c10::SmallVector reduce_dims; const int64_t leading_dims = sizes.size() - shape.size(); for (const auto i : c10::irange(leading_dims)) { reduce_dims.push_back(i); @@ -458,29 +463,34 @@ static inline Tensor sum_to( } if (!reduce_dims.empty()) { - tensor = tensor.sum_symint(reduce_dims, /*keepdim=*/true); + tensor = tensor.sum(reduce_dims, /*keepdim=*/true); } if (always_return_non_view) { // This is only actually used by the functionalization pass. // We want to be able to guarantee that this function doesn't return a view // of the input. - return leading_dims > 0 ? at::view_copy_symint(tensor, shape) + return leading_dims > 0 ? 
at::symint::view_copy(tensor, shape) : tensor.clone(); } else { - return leading_dims > 0 ? tensor.view_symint(shape) : tensor; + return leading_dims > 0 ? at::symint::view(tensor, shape) : tensor; } } +inline Tensor sum_to( + Tensor tensor, + const c10::SymIntArrayRef shape, + bool always_return_non_view = false) { + return _sum_to(tensor, shape, always_return_non_view); +} + // Sums `tensor` repeatedly to produce a tensor of shape `shape`. // Precondition: is_expandable_to(shape, tensor.sizes()) must be true -static inline Tensor sum_to( +inline Tensor sum_to( Tensor tensor, const IntArrayRef shape, bool always_return_non_view = false) { - auto sym_size = c10::SymIntArrayRef( - reinterpret_cast(shape.data()), shape.size()); - return sum_to(tensor, sym_size, always_return_non_view); + return _sum_to(tensor, shape, always_return_non_view); } static inline bool is_expandable_to( diff --git a/aten/src/ATen/FunctionalInverses.cpp b/aten/src/ATen/FunctionalInverses.cpp index 471c74a73c95..2bdc76c7764a 100644 --- a/aten/src/ATen/FunctionalInverses.cpp +++ b/aten/src/ATen/FunctionalInverses.cpp @@ -127,9 +127,9 @@ Tensor FunctionalInverses::_neg_view_copy_inverse(const Tensor& base, const Tens } } -Tensor FunctionalInverses::as_strided_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::IntArrayRef size, at::IntArrayRef stride, c10::optional storage_offset) { +Tensor FunctionalInverses::as_strided_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::SymIntArrayRef size, at::SymIntArrayRef stride, c10::optional storage_offset) { // Pessimism: we can't reapply views for as_strided_scatter. - return base.as_strided_scatter(mutated_view, size, stride, storage_offset); + return base.as_strided_scatter_symint(mutated_view, size, stride, storage_offset); } Tensor FunctionalInverses::diagonal_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t offset, int64_t dim1, int64_t dim2) { @@ -137,19 +137,15 @@ Tensor FunctionalInverses::diagonal_copy_inverse(const Tensor& base, const Tenso return base.diagonal_scatter(mutated_view, offset, dim1, dim2); } -Tensor FunctionalInverses::expand_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::IntArrayRef size, bool implicit) { - return at::sum_to(mutated_view, base.sizes(),/*always_return_non_view=*/!reapply_views); -} - -Tensor FunctionalInverses::expand_copy_SymInt_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, c10::SymIntArrayRef size, bool implicit) { - return at::sum_to(mutated_view, c10::asIntArrayRefSlow(base.sym_sizes()),/*always_return_non_view=*/!reapply_views); +Tensor FunctionalInverses::expand_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::SymIntArrayRef size, bool implicit) { + return at::sum_to(mutated_view, base.sym_sizes(),/*always_return_non_view=*/!reapply_views); } Tensor FunctionalInverses::permute_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::IntArrayRef dims) { return at::functionalization::permute_copy_inverse(mutated_view, dims, reapply_views); } -Tensor FunctionalInverses::_reshape_alias_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::IntArrayRef size, at::IntArrayRef stride) { +Tensor FunctionalInverses::_reshape_alias_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::SymIntArrayRef size, at::SymIntArrayRef stride) { // Note that 
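Two quick worked examples for the ExpandUtils changes above, with shapes that are ours rather than from the patch. infer_size applies the broadcasting rule ("1s map to the other size"), and sum_to is its gradient-side counterpart, reducing a broadcasted tensor back to a target shape:

    #include <vector>

    #include <ATen/ATen.h>
    #include <ATen/ExpandUtils.h>

    void expand_utils_examples() {
      // Broadcasting: trailing dims are aligned, size-1 dims stretch to match.
      auto out_shape = at::infer_size({3, 1, 5}, {4, 5});  // -> {3, 4, 5}
      (void)out_shape;

      // sum_to: reduce a {4, 3, 5} tensor down to shape {3, 1}. The leading dim
      // is summed away and dim 2 is summed with keepdim, then viewed, so every
      // entry of the result equals 4 * 5 = 20 here.
      at::Tensor grad = at::ones({4, 3, 5});
      std::vector<int64_t> target_shape = {3, 1};
      at::Tensor reduced = at::sum_to(grad, target_shape);
      (void)reduced;
    }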
I'm directly calling reshape(), and ignoring the strides. // _reshape_alias() isn't available from user code, and is an implementation detail of reshape(). // Specifically, passing in the strides directly can get us into trouble in cases like: @@ -157,16 +153,17 @@ Tensor FunctionalInverses::_reshape_alias_copy_inverse(const Tensor& base, const // When we eventually run the _reshape_alias_inverse() call here, if we were to pass in both sizes and strides, // The call would fail because `mutated_view` doesn't have enough bytes of storage. if (reapply_views) { - return at::_reshape_alias(mutated_view, base.sizes(), base.strides()); + return at::_reshape_alias_symint(mutated_view, base.sym_sizes(), base.sym_strides()); } else { - return at::_reshape_alias_copy(mutated_view, base.sizes(), base.strides()); + return at::_reshape_alias_copy_symint(mutated_view, base.sym_sizes(), base.sym_strides()); } } -Tensor FunctionalInverses::select_copy_int_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t dim, int64_t index) { +Tensor FunctionalInverses::select_copy_int_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t dim, c10::SymInt index) { // Pessimism: we can't reapply views for slice_scatter. - return base.select_scatter(mutated_view, dim, index); + return base.select_scatter_symint(mutated_view, dim, index); } + Tensor FunctionalInverses::detach_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views) { // the functionalization pass doesn't care about autograd metadata - as a view, I think detach() is just an identity function return mutated_view; @@ -176,36 +173,36 @@ Tensor FunctionalInverses::lift_fresh_copy_inverse(const Tensor& base, const Ten return mutated_view; } -Tensor FunctionalInverses::slice_copy_Tensor_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t dim, c10::optional start, c10::optional end, int64_t step) { +Tensor FunctionalInverses::slice_copy_Tensor_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t dim, c10::optional start, c10::optional end, c10::SymInt step) { // Pessimism: we can't reapply views for slice_scatter. - return base.slice_scatter(mutated_view, dim, start, end, step); + return base.slice_scatter_symint(mutated_view, dim, start, end, step); } -Tensor FunctionalInverses::split_copy_Tensor_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t mutated_view_idx, int64_t split_size, int64_t dim) { +Tensor FunctionalInverses::split_copy_Tensor_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t mutated_view_idx, c10::SymInt split_size, int64_t dim) { // It would be nice if this logic could be re-used from autograd's split_backward(), but I don't think it can. // For functionalization, we have only have one of the tensors from the TensorList outputed by split(), and we want to layer i // on top of the base tensor. // For autograd, we have all of the tensors outputted by split() and we just want to stack them. - dim = at::maybe_wrap_dim(dim, base.sizes().size()); - auto dim_size = base.size(dim); - auto start = mutated_view_idx * split_size; - auto end = start + split_size; + dim = at::maybe_wrap_dim(dim, base.dim()); + auto dim_size = base.sym_size(dim); + auto start = split_size * mutated_view_idx; + auto end = split_size + start; if (end > dim_size) end = dim_size; // Pessimism: we can't reapply views for slice_scatter. 
- return base.slice_scatter(mutated_view, dim, start, end, 1); + return base.slice_scatter_symint(mutated_view, dim, start, end, 1); } -Tensor FunctionalInverses::split_with_sizes_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t mutated_view_idx, at::IntArrayRef split_sizes, int64_t dim) { - dim = at::maybe_wrap_dim(dim, base.sizes().size()); - auto dim_size = base.size(dim); - int64_t start = 0; +Tensor FunctionalInverses::split_with_sizes_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t mutated_view_idx, c10::SymIntArrayRef split_sizes, int64_t dim) { + dim = at::maybe_wrap_dim(dim, base.dim()); + auto dim_size = base.sym_size(dim); + c10::SymInt start = 0; for (auto i = 0; i < mutated_view_idx; ++i) { start += split_sizes[i]; } auto end = start + split_sizes[mutated_view_idx]; if (end > dim_size) end = dim_size; // Pessimism: we can't reapply views for slice_scatter. - return base.slice_scatter(mutated_view, dim, start, end, 1); + return base.slice_scatter_symint(mutated_view, dim, start, end, 1); } Tensor FunctionalInverses::squeeze_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views) { @@ -232,6 +229,11 @@ Tensor FunctionalInverses::transpose_copy_int_inverse(const Tensor& base, const } } +Tensor FunctionalInverses::_nested_view_from_buffer_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, const Tensor& nested_size_tensor, const Tensor& nested_stride_tensor, IntArrayRef offsets) { + TORCH_INTERNAL_ASSERT(false, "Attempted to call _nested_view_from_buffer() during the functionalization pass. For now, nested tensors aren't supported during functionalization"); + return Tensor(); +} + Tensor FunctionalInverses::unsqueeze_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, int64_t dim) { if (reapply_views) { return at::squeeze(mutated_view, dim); @@ -291,15 +293,7 @@ Tensor FunctionalInverses::unbind_copy_int_inverse(const Tensor& base, const Ten return base.select_scatter(mutated_view, dim, mutated_view_idx); } -Tensor FunctionalInverses::view_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::IntArrayRef size) { - if (reapply_views) { - return mutated_view.view(base.sizes()); - } else { - return at::view_copy(mutated_view, base.sizes()); - } -} - -Tensor FunctionalInverses::view_copy_SymInt_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, c10::SymIntArrayRef size) { +Tensor FunctionalInverses::view_copy_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::SymIntArrayRef size) { if (reapply_views) { return mutated_view.view_symint(base.sym_sizes()); } else { @@ -307,6 +301,7 @@ Tensor FunctionalInverses::view_copy_SymInt_inverse(const Tensor& base, const Te } } + Tensor FunctionalInverses::view_copy_dtype_inverse(const Tensor& base, const Tensor& mutated_view, bool reapply_views, at::ScalarType dtype) { if (reapply_views) { return mutated_view.view(base.scalar_type()); diff --git a/aten/src/ATen/FunctionalStorageImpl.cpp b/aten/src/ATen/FunctionalStorageImpl.cpp index 2fad6bfad606..8e80ce0ca7dd 100644 --- a/aten/src/ATen/FunctionalStorageImpl.cpp +++ b/aten/src/ATen/FunctionalStorageImpl.cpp @@ -1,7 +1,9 @@ #include +#include #include #include +#include #include #include @@ -13,23 +15,9 @@ ViewMeta ViewMeta::to_out_idx(int64_t out_idx) { return ViewMeta(forward_fn, reverse_fn, out_idx); } -Alias::Alias(const at::Tensor& base) { - 
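The start/end bookkeeping in the split inverses above is plain prefix-summing over the split sizes; here is a standalone sketch with hypothetical numbers (the helper name is ours):

    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Compute the [start, end) slice along `dim` that view `mutated_view_idx`
    // of split_with_sizes occupies, clamped to the dimension size, mirroring
    // split_with_sizes_copy_inverse.
    std::pair<int64_t, int64_t> split_range_example(
        const std::vector<int64_t>& split_sizes,
        int64_t mutated_view_idx,
        int64_t dim_size) {
      int64_t start = 0;
      for (int64_t i = 0; i < mutated_view_idx; ++i) {
        start += split_sizes[i];
      }
      int64_t end = std::min(start + split_sizes[mutated_view_idx], dim_size);
      return {start, end};
    }

    // e.g. split_sizes {2, 3, 4}, mutated_view_idx 2, dim_size 9 -> {5, 9}:
    // the third view is scattered back with slice_scatter over [5, 9).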
TORCH_INTERNAL_ASSERT(!at::functionalization::impl::isFunctionalTensor(base)); - base_ = base; -} - -const at::Tensor& Alias::base() const { - return base_; -} - -void Alias::add_update(const at::Tensor& updated_val, const std::vector& metas) { - updates_.push_back({updated_val, metas}); - generation_++; -} - // Note [Functionalization: Alias Removal Part 2] // See Note [Functionalization: Alias Removal] for more details. -// This function applies a single update from one of the views to the Alias object. +// This function applies a single update from one of the views to the StorageImpl. // We start out with and , and our goal is to end up with . // Consider this program: // @@ -44,15 +32,15 @@ void Alias::add_update(const at::Tensor& updated_val, const std::vectorhas_symbolic_sizes_strides()) { + // Today, the two implementations of SymInt are in Python (proxy tensor), + // and lazy tensor (LTC/XLA). + // LTC hasn't implemented SymInt support yet though + // Once it does, we should remove this check. + if (value.key_set().has(c10::DispatchKey::Python)) { + return value.storage().sym_nbytes(); + } + } + // XLA storage objects also do not properly track nbytes. + return at::detail::computeStorageNbytes(value.sizes(), value.strides(), value.dtype().itemsize(), value.storage_offset()); +} + +FunctionalStorageImpl::FunctionalStorageImpl(const Tensor& base) + : c10::StorageImpl( + c10::StorageImpl::use_byte_size_t(), + get_nbytes(base), + DataPtr{nullptr, base.device()}, + GetAllocator(kMeta), + /*resizeable=*/true + ), + base_(base) + { + TORCH_INTERNAL_ASSERT(!at::functionalization::impl::isFunctionalTensor(base_)); +} + +void FunctionalStorageImpl::add_update(const Tensor& updated_val, const std::vector& metas) { + TORCH_CHECK(!frozen_, "cannot mutate tensors with frozen storage"); + updates_.push_back({updated_val, metas}); + generation_++; +} + +bool FunctionalStorageImpl::apply_updates() { // N.B:none of the tensors used in this function should be FunctionalTensorWrappers at this point. // The only reason we currently need the TLS exclude guard here is because of functorch's DynamicLayer stack. // It adds the Functionalize key into TLS before redispatching to the functionalization kernels, @@ -89,33 +111,5 @@ bool Alias::apply_updates() { return any_updates; } -FunctionalStorageImpl::FunctionalStorageImpl(const Tensor& value) - : c10::StorageImpl( - c10::StorageImpl::use_byte_size_t(), - value.numel() * value.dtype().itemsize(), - DataPtr{nullptr, value.device()}, - // Using a null allocator, since FunctionalTensorImpl's aren't resizeable. - nullptr, - /*resizeable=*/false - ), - alias_(Alias(value)) - {} - -void FunctionalStorageImpl::add_update(const Tensor& updated_val, const std::vector& view_metas) { - alias_.add_update(updated_val, view_metas); -} - -bool FunctionalStorageImpl::apply_updates() { - return alias_.apply_updates(); -} - -const Tensor& FunctionalStorageImpl::base() { - return alias_.base(); -} - -size_t FunctionalStorageImpl::generation() const { - return alias_.generation(); -} - } // namespace functionalization } // namespace at diff --git a/aten/src/ATen/FunctionalStorageImpl.h b/aten/src/ATen/FunctionalStorageImpl.h index 6caeac2737fd..dbaf30c9963d 100644 --- a/aten/src/ATen/FunctionalStorageImpl.h +++ b/aten/src/ATen/FunctionalStorageImpl.h @@ -46,13 +46,18 @@ struct ViewMeta { ViewMeta to_out_idx(int64_t out_idx); }; -// Alias represents the state shared by (potentially multiple) views of the same -// tensor. 
For example, in the following code: +// FunctionalStorageImpl is a subclass of StorageImpl used by the +// functionalization pass. It has no underlying data (similar to meta storage). +// It also knows how to reflect mutations to tensors in the absence of a valid +// data pointer. +// +// A storage represents the state shared by (potentially multiple) views of the +// same tensor. For example, in the following code: // // b = a.view1(...) // c = b.view2(...) // b.add_(1) -// --> alias.add_update(b, {view1_meta}) +// --> storage.add_update(b, {view1_meta}) // // The call to add_(1) will result in a call to alias.add_update(b, // {view1_meta}), queueing up the mutation from b onto the alias. Later, suppose @@ -65,58 +70,49 @@ struct ViewMeta { // --> c.sync_() // --> alias.apply_updates() // after this, the alias will be updated to // reflect the mutation to b -class Alias { +struct TORCH_API FunctionalStorageImpl : public c10::StorageImpl { public: struct Update { const at::Tensor new_val; const std::vector view_metas; }; - explicit Alias(const at::Tensor& base); - const at::Tensor& base() const; + + explicit FunctionalStorageImpl(const Tensor& value); + + void add_update( + const Tensor& updated_val, + const std::vector& view_metas); + bool apply_updates(); + const Tensor& base() { + return base_; + } size_t generation() const { return generation_; } - void add_update( - const at::Tensor& updated_val, - const std::vector& metas); - bool apply_updates(); + void freeze() { + frozen_ = true; + } + + ~FunctionalStorageImpl() override = default; private: // NB: base_ should always point to a tensor BELOW the current // functionalization layer. This is mainly to avoid reference cycles. e.g. // given `b = a.view(...)` Both a.storage_ and b.storage_ are a - // FunctionStorageImpl containing an Alias, with contains a Tensor `base_`. In - // this case (where a and b are FunctionalTensorWrapper's), base_ should point - // not to a, but to a's unwrapped value, a.value_` See Note - // [Functionalization: Alias Removal] for a diagram that shows this visually. + // FunctionStorageImpl containing an Walualias, with contains a Tensor + // `base_`. In this case (where a and b are FunctionalTensorWrapper's), base_ + // should point not to a, but to a's unwrapped value, a.value_` See Note + // [Functionalization: Walualias Removal] for a diagram that shows this + // visually. at::Tensor base_; std::vector updates_; // generation_ gets incremented every time a mutation is queued onto the // alias. It is used to determine if a given tensor is "up to date", or if it // needs to be regenerated from the alias. size_t generation_ = 0; -}; - -// FunctionalStorageImpl is a subclass of StorageImpl used by the -// functionalization pass. It has no underlying data (similar to meta storage). -// It also knows how to reflect mutations to tensors in the absence of a valid -// data pointer. It does this by separately storing an Alias object, which knows -// how to reflect mutations that may have happened to views of the original -// tensor. -struct TORCH_API FunctionalStorageImpl : public c10::StorageImpl { - explicit FunctionalStorageImpl(const Tensor& value); - - void add_update( - const Tensor& updated_val, - const std::vector& view_metas); - bool apply_updates(); - const Tensor& base(); - size_t generation() const; - - ~FunctionalStorageImpl() override = default; - - private: - at::functionalization::Alias alias_; + // If frozen, no more mutations are allowed on this storage. 
Once frozen, a + // storage cannot be unfrozen. + bool frozen_ = false; }; } // namespace functionalization diff --git a/aten/src/ATen/FunctionalTensorWrapper.cpp b/aten/src/ATen/FunctionalTensorWrapper.cpp index a8c58466a052..2c3a12020eb6 100644 --- a/aten/src/ATen/FunctionalTensorWrapper.cpp +++ b/aten/src/ATen/FunctionalTensorWrapper.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -36,19 +37,10 @@ void FunctionalTensorWrapper::set_constructor_metadata() { // Functorch transforms all have their own wrapper tensors (e.g. BatchedTensorImpl) which expect // to participate in the functorch transforms. key_set_ = key_set_ - c10::functorch_transforms_ks - c10::python_ks; - // For better error handling, - // we also don't want our wrapper tensor to be able to dispatch directly - // to a backend kernel. - // Dispatching directly to e.g. a CPU kernel would always segfault, - // because wrapper tensors don't have any real data. - // (This should never happen because we should always hit a functionalization kernel, - // but can help make bugs less nasty). - // Here, we defensively remove any backend keys from the wrapper's keyset. - // We don't want to remove actual backend bits though (say we're redispatching to autograd; - // we need to know if we're dispatching to AutogradCPU or AutogradXLA). - // Instead, it's sufficient to remove the `Dense` dispatch key, - // which prevents us from accidentally trying to directly run a CPU/CUDA kernel. - key_set_ = key_set_.remove(c10::DispatchKey::Dense); + // We override a bunch of _custom(), so make sure they get called + // TODO: metadata copying may not actually be necessary then + set_custom_sizes_strides(SizesStridesPolicy::CustomSizes); + set_custom_device(true); } FunctionalTensorWrapper::FunctionalTensorWrapper(const Tensor& value) @@ -62,6 +54,10 @@ FunctionalTensorWrapper::FunctionalTensorWrapper(const Tensor& value) set_constructor_metadata(); } +void FunctionalTensorWrapper::freeze_storage() const { + functional_storage_impl()->freeze(); +} + // Note [Functionalization: Alias Removal] // When someone calls a view() op during the functionalization pass, e.g. 'b = a.view(...)', // we link `b` and `a` to a shared Alias object to preserve the aliasing relationship. @@ -202,12 +198,7 @@ void FunctionalTensorWrapper::replace_(const Tensor& other) { value_ = other; // out= ops are allowed to resize the output tensors, mutating both the data and metadata of the tensor. // We need to propagate that metadata mutation to the wrapper (new size). - if (sizes() != value_.sizes() || strides() != value_.strides()) { - set_sizes_and_strides(value_.sizes(), value_.strides()); - } - if (storage_offset() != value_.storage_offset()) { - set_storage_offset(value_.storage_offset()); - } + set_sizes_and_strides(value_.sym_sizes(), value_.sym_strides(), value_.sym_storage_offset()); if (dtype() != value_.unsafeGetTensorImpl()->dtype() || layout() != value_.unsafeGetTensorImpl()->layout()) { // .to() should not re-entrantly go through functionalization. 
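A brief, hypothetical usage sketch for the freeze machinery above. The call sequence and variable names are ours; only to_functional_tensor, freeze_functional_tensor and the "cannot mutate tensors with frozen storage" check come from this patch.

    #include <ATen/ATen.h>
    #include <ATen/FunctionalTensorWrapper.h>

    void freeze_sketch() {
      at::Tensor inner = at::ones({2, 2});
      // Wrap the tensor for the functionalization pass, then freeze its
      // FunctionalStorageImpl so no further mutations can be queued on it.
      at::Tensor wrapped = at::functionalization::impl::to_functional_tensor(inner);
      at::functionalization::impl::freeze_functional_tensor(wrapped);
      // Any mutation the pass later tries to record via add_update() now fails
      // with "cannot mutate tensors with frozen storage".
    }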
at::AutoDispatchSkipFunctionalize guard; @@ -296,19 +287,23 @@ c10::intrusive_ptr FunctionalTensorWrapper::shallow_copy_and_detach_ bool allow_tensor_metadata_change) const { if (key_set_.has(DispatchKey::Python) && !c10::impl::tls_is_dispatch_key_excluded(DispatchKey::Python)) { - auto r = pyobj_interpreter_.load(std::memory_order_acquire)->detach(this); + auto r = (*pyobj_interpreter_.load(std::memory_order_acquire))->detach(this); if (r) { r->set_version_counter(std::forward(version_counter)); r->set_allow_tensor_metadata_change(allow_tensor_metadata_change); return r; } } + auto impl = c10::make_intrusive(value_); copy_tensor_metadata( /*src_impl=*/this, /*dest_impl=*/impl.get(), /*version_counter=*/std::forward(version_counter), /*allow_tensor_metadata_change=*/allow_tensor_metadata_change); + impl->level_ = level_; + impl->generation_ = generation_; + impl->view_metas_ = view_metas_; impl->refresh_numel(); impl->refresh_contiguous(); return impl; @@ -328,6 +323,9 @@ c10::intrusive_ptr FunctionalTensorWrapper::shallow_copy_and_detach( std::move(version_counter), allow_tensor_metadata_change); } +c10::Device FunctionalTensorWrapper::device_custom() const { + return value_.unsafeGetTensorImpl()->device(); +} at::IntArrayRef FunctionalTensorWrapper::sizes_custom() const { return value_.unsafeGetTensorImpl()->sizes(); } @@ -343,12 +341,18 @@ int64_t FunctionalTensorWrapper::numel_custom() const { bool FunctionalTensorWrapper::is_contiguous_custom(at::MemoryFormat memory_format) const { return value_.unsafeGetTensorImpl()->is_contiguous(); } -c10::SymIntArrayRef FunctionalTensorWrapper::sym_sizes() const { - return value_.unsafeGetTensorImpl()->sym_sizes(); -} c10::SymIntArrayRef FunctionalTensorWrapper::sym_sizes_custom() const { return value_.unsafeGetTensorImpl()->sym_sizes(); } +c10::SymIntArrayRef FunctionalTensorWrapper::sym_strides_custom() const { + return value_.unsafeGetTensorImpl()->sym_strides(); +} +c10::SymInt FunctionalTensorWrapper::sym_size_custom(int64_t d) const { + return value_.unsafeGetTensorImpl()->sym_size(d); +} +c10::SymInt FunctionalTensorWrapper::sym_storage_offset_custom() const { + return value_.unsafeGetTensorImpl()->sym_storage_offset(); +} namespace functionalization { namespace impl { @@ -367,14 +371,6 @@ c10::optional to_functional_tensor(const c10::optional& tensor) } return c10::nullopt; } -c10::List to_functional_tensor(const c10::List& t_list) { - c10::List outputs; - outputs.reserve(t_list.size()); - for (const auto i : c10::irange(t_list.size())) { - outputs.push_back(to_functional_tensor(t_list[i])); - } - return outputs; -} c10::List> to_functional_tensor(const c10::List>& t_list) { c10::List> outputs; outputs.reserve(t_list.size()); @@ -383,17 +379,11 @@ c10::List> to_functional_tensor(const c10::List to_functional_tensor(const std::vector& t_list) { - std::vector outputs(t_list.size()); - for (const auto i : c10::irange(t_list.size())) { - outputs[i] = to_functional_tensor(t_list[i]); - } - return outputs; -} -std::vector to_functional_tensor(const TensorList& t_list) { - std::vector outputs(t_list.size()); - for (const auto i : c10::irange(t_list.size())) { - outputs[i] = to_functional_tensor(t_list[i]); +std::vector to_functional_tensor(ITensorListRef t_list) { + std::vector outputs; + outputs.reserve(t_list.size()); + for (const auto& tensor : t_list) { + outputs.push_back(to_functional_tensor(tensor)); } return outputs; } @@ -419,17 +409,17 @@ c10::optional from_functional_tensor(const c10::optional& t, boo } return c10::nullopt; } 
-c10::List from_functional_tensor(const c10::List& t_list) { - c10::List outputs; +std::vector from_functional_tensor(ITensorListRef t_list) { + std::vector outputs; outputs.reserve(t_list.size()); - for (const auto i : c10::irange(t_list.size())) { + for (const auto& tensor : t_list) { // from_functional_tensor(Tensor) has asserts to make sure you don't accidentally call // it on a non-functional input, // but from_functional_tensor(TensorList) can recieve a list containing both // functional and non-functional tensors. // Example of when that can happen: torch.cat(function_input_tensor, global_state_tensor). // When that happens, we're okay with only unwrapping the functional tensors. - outputs.push_back(from_functional_tensor(t_list[i], /*assert_functional=*/false)); + outputs.push_back(from_functional_tensor(tensor, /*assert_functional=*/false)); } return outputs; } @@ -441,13 +431,6 @@ c10::List> from_functional_tensor(const c10::List from_functional_tensor(const TensorList& t_list) { - std::vector outputs(t_list.size()); - for (const auto i : c10::irange(t_list.size())) { - outputs[i] = from_functional_tensor(t_list[i], /*assert_functional=*/false); - } - return outputs; -} void sync(const Tensor& t) { if (t.unsafeGetTensorImpl()->is_wrapped_number()) { @@ -471,13 +454,8 @@ void sync(const c10::optional& t) { sync(*t); } } -void sync(const c10::List t_list) { - for (const auto i : c10::irange(t_list.size())) { - sync(t_list[i]); - } -} -void sync(const at::TensorList t_list) { - for (auto t: t_list) { +void sync(ITensorListRef t_list) { + for (const auto& t : t_list) { sync(t); } } @@ -492,22 +470,24 @@ void replace_(const Tensor& functional_tensor, const Tensor& other) { unsafeGetFunctionalWrapper(functional_tensor)->replace_(other); } -void replace_(const TensorList functional_tensor, TensorList other) { +void replace_(const ITensorListRef functional_tensor, ITensorListRef other) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(functional_tensor.size() == other.size()); + auto functional_tensor_it = functional_tensor.begin(); + auto other_it = other.begin(); for (const auto i : c10::irange(functional_tensor.size())) { - replace_(functional_tensor[i], other[i]); + (void)i; // Suppress unused variable warning + replace_(*functional_tensor_it++, *other_it++); } } - void commit_update(const Tensor& functional_tensor) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(isFunctionalTensor(functional_tensor)); unsafeGetFunctionalWrapper(functional_tensor)->commit_update(); } -void commit_update(const TensorList functional_tensor) { - for (const auto i : c10::irange(functional_tensor.size())) { - commit_update(functional_tensor[i]); +void commit_update(ITensorListRef functional_tensor) { + for (const auto& t : functional_tensor) { + commit_update(t); } } @@ -523,21 +503,6 @@ bool isFunctionalTensor(const c10::optional& t) { } } -// For lists that have a mix of functional and nonfunctional tensors, -// functionalization machinery should just unwrap the functional wrappers -// and leave the ordinary tensors alone. 
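Much of this hunk swaps the c10::List/TensorList overloads for a single ITensorListRef parameter, which binds to several concrete list types and is iterated uniformly, as the loops above do. A small sketch (function name ours; include paths assumed):

    #include <ATen/ATen.h>
    #include <ATen/core/IListRef.h>

    // Count the defined tensors in any list kind that binds to ITensorListRef
    // (e.g. std::vector<at::Tensor> or at::TensorList).
    int64_t count_defined_example(at::ITensorListRef tensors) {
      int64_t n = 0;
      for (const auto& t : tensors) {
        if (t.defined()) {
          ++n;
        }
      }
      return n;
    }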
-bool isFunctionalTensor(const c10::List& t_list) { - if (t_list.size() == 0) return false; - auto functional_count = 0; - for (const auto i : c10::irange(t_list.size())) { - if (!t_list[i].defined()) continue; - if (isFunctionalTensor(t_list[i])) { - ++functional_count; - } - } - return functional_count > 0; -} - bool isFunctionalTensor(const c10::List>& t_list) { if (t_list.size() == 0) return false; auto functional_count = 0; @@ -550,18 +515,29 @@ bool isFunctionalTensor(const c10::List>& t_list) { return functional_count > 0; } -bool isFunctionalTensor(const c10::ArrayRef t_list) { - if (t_list.size() == 0) return false; +template +bool isFunctionalTensorIListRef(c10::IListRef list) { + if (list.size() == 0) return false; auto functional_count = 0; - for (const auto i : c10::irange(t_list.size())) { - if (!t_list[i].defined()) continue; - if (isFunctionalTensor(t_list[i])) { + for (const auto& tensor : list) { + if (!tensor.defined()) continue; + if (isFunctionalTensor(tensor)) { ++functional_count; } } return functional_count > 0; } +bool isFunctionalTensor(ITensorListRef list) { + return isFunctionalTensorIListRef(list); +} + +void freeze_functional_tensor(const Tensor& tensor) { + TORCH_INTERNAL_ASSERT(at::functionalization::impl::isFunctionalTensor(tensor)); + auto functional_base_impl = at::functionalization::impl::unsafeGetFunctionalWrapper(tensor); + functional_base_impl->freeze_storage(); +} + Tensor create_functional_tensor_with_view_meta(const at::Tensor& view_to_wrap, const at::Tensor& base, functionalization::ViewMeta meta, int64_t out_idx) { TORCH_INTERNAL_ASSERT(!at::functionalization::impl::isFunctionalTensor(view_to_wrap)); TORCH_INTERNAL_ASSERT(at::functionalization::impl::isFunctionalTensor(base)); @@ -575,18 +551,12 @@ Tensor create_functional_tensor_with_view_meta(const at::Tensor& view_to_wrap, c return at::detail::make_tensor(view_to_wrap, functional_base_impl, meta); } -std::vector create_functional_tensor_with_view_meta(const c10::List& view_to_wrap, const at::Tensor& base, functionalization::ViewMeta meta) { - std::vector outputs(view_to_wrap.size()); - for (const auto i : c10::irange(view_to_wrap.size())) { - outputs[i] = create_functional_tensor_with_view_meta(view_to_wrap[i], base, meta, i); - } - return outputs; -} - -std::vector create_functional_tensor_with_view_meta(const std::vector& view_to_wrap, const at::Tensor& base, functionalization::ViewMeta meta) { +std::vector create_functional_tensor_with_view_meta(ITensorListRef view_to_wrap, const at::Tensor& base, functionalization::ViewMeta meta) { std::vector outputs(view_to_wrap.size()); - for (const auto i : c10::irange(view_to_wrap.size())) { - outputs[i] = create_functional_tensor_with_view_meta(view_to_wrap[i], base, meta, i); + int64_t i = 0; + for (const auto& tensor : view_to_wrap) { + outputs[i] = create_functional_tensor_with_view_meta(tensor, base, meta, i); + i++; } return outputs; } @@ -602,8 +572,7 @@ void mutate_view_meta(const at::Tensor& self, functionalization::ViewMeta meta) // calls each {view} reference implementations with meta tensors. // The output meta tensor's stride info serves as a reference for what the correct strides should be. 
void set_sizes_strides_offset(const Tensor& out, const Tensor& reference_out) { - out.unsafeGetTensorImpl()->set_sizes_and_strides(reference_out.sizes(), reference_out.strides()); - out.unsafeGetTensorImpl()->set_storage_offset(reference_out.storage_offset()); + out.unsafeGetTensorImpl()->set_sizes_and_strides(reference_out.sym_sizes(), reference_out.sym_strides(), reference_out.sym_storage_offset()); } void set_sizes_strides_offset(const std::vector& outs, const std::vector& reference_outs) { diff --git a/aten/src/ATen/FunctionalTensorWrapper.h b/aten/src/ATen/FunctionalTensorWrapper.h index c5c0339fc1bf..0762fb1f7f9b 100644 --- a/aten/src/ATen/FunctionalTensorWrapper.h +++ b/aten/src/ATen/FunctionalTensorWrapper.h @@ -3,6 +3,7 @@ #include #include +#include #include #include #include @@ -99,6 +100,8 @@ struct TORCH_API FunctionalTensorWrapper : public c10::TensorImpl { // used to determine if it's up-to-date with its alias. The act of syncing a // tensor will set a tensor's generation equal to its alias's generation. bool is_up_to_date() const; + // Freezes the storage of this tensor, preventing subsequent mutations + void freeze_storage() const; // Every FunctionalTensorWrapper contains a vector objects // describing the series of view ops that ran to generate the current tensor // from the base tensor. This method is used by inplace-view ops like @@ -134,15 +137,18 @@ struct TORCH_API FunctionalTensorWrapper : public c10::TensorImpl { ~FunctionalTensorWrapper() override = default; // FunctionalTensorWrapper overrides all custom size/stride function, - // so that if the inner tensor has a custo implementation + // so that if the inner tensor has a custom implementation // we make sure to call that implementation. at::IntArrayRef sizes_custom() const override; at::IntArrayRef strides_custom() const override; int64_t dim_custom() const override; int64_t numel_custom() const override; bool is_contiguous_custom(at::MemoryFormat memory_format) const override; - c10::SymIntArrayRef sym_sizes() const override; c10::SymIntArrayRef sym_sizes_custom() const override; + c10::SymInt sym_size_custom(int64_t d) const override; + c10::SymIntArrayRef sym_strides_custom() const override; + c10::SymInt sym_storage_offset_custom() const override; + c10::Device device_custom() const override; private: const char* tensorimpl_type_name() const override; @@ -183,44 +189,40 @@ TORCH_API inline FunctionalTensorWrapper* unsafeGetFunctionalWrapper( TORCH_API bool isFunctionalTensor(const at::Tensor& tensor); TORCH_API bool isFunctionalTensor(const c10::optional& t); -TORCH_API bool isFunctionalTensor(const c10::List& t_list); TORCH_API bool isFunctionalTensor( const c10::List>& t_list); -TORCH_API bool isFunctionalTensor(const c10::ArrayRef t_list); +TORCH_API bool isFunctionalTensor(ITensorListRef list); TORCH_API Tensor to_functional_tensor(const Tensor& tensor); TORCH_API c10::optional to_functional_tensor( const c10::optional& tensor); -TORCH_API c10::List to_functional_tensor( - const c10::List& t_list); TORCH_API c10::List> to_functional_tensor( const c10::List>& t_list); -TORCH_API std::vector to_functional_tensor( - const std::vector& t_list); -TORCH_API std::vector to_functional_tensor(const TensorList& t_list); +TORCH_API std::vector to_functional_tensor(ITensorListRef t_list); + +TORCH_API void freeze_functional_tensor(const Tensor& tensor); TORCH_API Tensor from_functional_tensor(const Tensor& tensor, bool assert_functional = true); TORCH_API c10::optional from_functional_tensor( const c10::optional& 
t, bool assert_functional = true); -TORCH_API c10::List from_functional_tensor( - const c10::List& t_list); TORCH_API c10::List> from_functional_tensor( const c10::List>& t_list); -TORCH_API std::vector from_functional_tensor(const TensorList& tensors); +TORCH_API std::vector from_functional_tensor(ITensorListRef t_list); TORCH_API void sync(const at::Tensor& t); TORCH_API void sync(const c10::optional& t); -TORCH_API void sync(const c10::List t_list); -TORCH_API void sync(const at::TensorList t_list); TORCH_API void sync(const c10::List> t_list); +TORCH_API void sync(ITensorListRef t_list); TORCH_API void replace_(const Tensor& functional_tensor, const Tensor& other); -TORCH_API void replace_(const TensorList functional_tensor, TensorList other); +TORCH_API void replace_( + const ITensorListRef functional_tensor, + ITensorListRef other); TORCH_API void commit_update(const Tensor& functional_tensor); -TORCH_API void commit_update(const TensorList functional_tensor); +TORCH_API void commit_update(ITensorListRef functional_tensor); Tensor create_functional_tensor_with_view_meta( const Tensor& view_to_wrap, @@ -228,11 +230,7 @@ Tensor create_functional_tensor_with_view_meta( functionalization::ViewMeta meta, int64_t out_idx = 0); std::vector create_functional_tensor_with_view_meta( - const c10::List& view_to_wrap, - const Tensor& base, - functionalization::ViewMeta meta); -std::vector create_functional_tensor_with_view_meta( - const std::vector& view_to_wrap, + ITensorListRef view_to_wrap, const Tensor& base, functionalization::ViewMeta meta); @@ -280,18 +278,21 @@ TORCH_API void functionalize_op_helper( const c10::OperatorHandle& op, torch::jit::Stack* stack); -template +template struct _functionalize_aten_op final {}; -template -struct _functionalize_aten_op final { - static ReturnType call(ParameterTypes... args) { +template +struct _functionalize_aten_op final { + static ReturnType call( + typename c10::maybe_keep_symint::type... args) { + using FuncType = ReturnType( + typename c10::maybe_keep_symint::type...); auto op = c10::Dispatcher::singleton() .findSchemaOrThrow( (const char*)Op::name, (const char*)Op::overload_name) - .typed(); + .typed(); - return c10::impl::BoxedKernelWrapper::call( + return c10::impl::BoxedKernelWrapper::call( c10::BoxedKernel::makeFromFunction(), op, // BoxedKernelWrapper knows to ignore this keyset argument, @@ -302,7 +303,12 @@ struct _functionalize_aten_op final { }; template -using functionalize_aten_op = _functionalize_aten_op; +using functionalize_aten_op = + _functionalize_aten_op; + +template +using functionalize_aten_op_symint = + _functionalize_aten_op; } // namespace functionalization } // namespace at diff --git a/aten/src/ATen/FunctionalizeFallbackKernel.cpp b/aten/src/ATen/FunctionalizeFallbackKernel.cpp index 25c81165f883..3b7d3361133b 100644 --- a/aten/src/ATen/FunctionalizeFallbackKernel.cpp +++ b/aten/src/ATen/FunctionalizeFallbackKernel.cpp @@ -256,35 +256,35 @@ at::Tensor _to_copy_functionalize( // The idea with _unsafe_view is that you're guaranteed that the input // is a temporary, and don't actually have to worry about propagating // mutations between the input and output. 
-at::Tensor _unsafe_view_functionalize(const at::Tensor & self, at::IntArrayRef size) { +at::Tensor _unsafe_view_functionalize(const at::Tensor & self, at::SymIntArrayRef size) { if (!at::functionalization::impl::isFunctionalTensor(self)) { at::AutoDispatchSkipFunctionalize guard; - return at::_unsafe_view(self, size); + return at::_unsafe_view_symint(self, size); } auto self_ = at::functionalization::impl::from_functional_tensor(self); at::Tensor tmp_output; { at::AutoDispatchSkipFunctionalize guard; - tmp_output = at::_unsafe_view(self_, size); + tmp_output = at::_unsafe_view_symint(self_, size); } at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( [size = size.vec()](const at::Tensor & base, int64_t mutated_view_idx) -> at::Tensor { - return at::_unsafe_view(base, size); + return at::_unsafe_view_symint(base, size); }, [size = size.vec()](const at::Tensor & base, const at::Tensor & mutated_view, int64_t mutated_view_idx) -> at::Tensor { - return at::_unsafe_view(mutated_view, base.sizes()); + return at::_unsafe_view_symint(mutated_view, base.sym_sizes()); } ); auto out = at::functionalization::impl::create_functional_tensor_with_view_meta(tmp_output, self, view_meta); // See Note [Propagating strides in the functionalization pass] // (for _unsafe_view, I'm just manually doing the shape inference rule here instead of calling the meta function for unsafe_view) - auto inferred_size = at::infer_size_dv(size, self.numel()); - auto stride = at::detail::computeStride(self.sizes(), self.strides(), inferred_size); + auto inferred_size = at::infer_size_dv(size, self.sym_numel()); + auto stride = at::detail::computeStride(self.sym_sizes(), self.sym_strides(), inferred_size); TORCH_INTERNAL_ASSERT(stride.has_value()); - out.unsafeGetTensorImpl()->set_sizes_and_strides(size, stride.value()); + out.unsafeGetTensorImpl()->set_sizes_and_strides(inferred_size, stride.value()); return out; } diff --git a/aten/src/ATen/InferSize.h b/aten/src/ATen/InferSize.h index e0bedb751bf2..111c7eb8f5fc 100644 --- a/aten/src/ATen/InferSize.h +++ b/aten/src/ATen/InferSize.h @@ -2,6 +2,8 @@ #include #include +#include +#include #include #include #include @@ -14,9 +16,13 @@ namespace at { // templated to handle std::vector and DimVector use cases, see // below // -template -inline void infer_size_impl(IntArrayRef shape, int64_t numel, ResultVec& res) { - int64_t newsize = 1; +template +inline void infer_size_impl( + InputArrayRef shape, + NumelType numel, + ResultVec& res) { + NumelType newsize = 1; + // N.B. this is an index, not a sym dim! 
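The infer_size_impl generalisation in the InferSize.h hunk above keeps the usual "-1 dimension" rule: at most one entry of the requested shape may be -1, and it is inferred so that the product of the dims matches numel. A tiny example using the existing non-symbolic entry point (values are ours):

    #include <ATen/InferSize.h>

    void infer_size_dv_example() {
      const int64_t shape[] = {2, -1, 4};
      // 2 * 4 = 8 known elements, so the -1 is inferred as 24 / 8 = 3.
      auto dims = at::infer_size_dv(c10::IntArrayRef(shape), /*numel=*/24);  // -> {2, 3, 4}
      (void)dims;
    }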
auto infer_dim = c10::optional(); for (int64_t dim = 0, ndim = shape.size(); dim != ndim; dim++) { if (shape[dim] == -1) { @@ -69,4 +75,13 @@ inline at::DimVector infer_size_dv(IntArrayRef shape, int64_t numel) { return res; } +inline at::SymDimVector infer_size_dv( + c10::SymIntArrayRef shape, + c10::SymInt numel) { + auto res = at::SymDimVector(shape); + infer_size_impl( + shape, std::move(numel), res); + return res; +} + } // namespace at diff --git a/aten/src/ATen/NamedTensorUtils.cpp b/aten/src/ATen/NamedTensorUtils.cpp index ca38f7be31bd..13d5ddb902de 100644 --- a/aten/src/ATen/NamedTensorUtils.cpp +++ b/aten/src/ATen/NamedTensorUtils.cpp @@ -234,7 +234,7 @@ std::vector compute_squeeze_outnames(const Tensor& tensor) { std::vector outnames; auto tensor_names = tensor.names(); for (const auto d : c10::irange(tensor.dim())) { - if (tensor.sizes()[d] != 1) { + if (tensor.sym_sizes()[d] != 1) { outnames.push_back(tensor_names[d]); } } @@ -410,12 +410,12 @@ std::vector broadcast_to_outnames( return unify_from_right(reference_names, tensor_names); } -std::vector compute_cat_outnames(ITensorListRef tensors) { +std::vector compute_cat_outnames(const MaterializedITensorListRef& tensors) { if (!at::has_names(tensors)) { return {}; } std::vector result; - for (const auto& tensor : tensors) { + for (const Tensor& tensor : tensors) { const auto tensor_names = tensor.names(); TORCH_CHECK(tensor_names.size() > 0, "zero-dimensional tensor cannot be concatenated"); TORCH_CHECK(result.empty() || tensor_names.size() == result.size(), diff --git a/aten/src/ATen/NamedTensorUtils.h b/aten/src/ATen/NamedTensorUtils.h index a77f38501f53..c9ff27c2d1b2 100644 --- a/aten/src/ATen/NamedTensorUtils.h +++ b/aten/src/ATen/NamedTensorUtils.h @@ -118,7 +118,8 @@ TORCH_API void propagate_names_for_expand( const Tensor& result, const Tensor& self); -TORCH_API std::vector compute_cat_outnames(ITensorListRef tensors); +TORCH_API std::vector compute_cat_outnames( + const MaterializedITensorListRef& tensors); TORCH_API std::vector compute_broadcast_outnames( const Tensor& self, diff --git a/aten/src/ATen/NestedTensorImpl.cpp b/aten/src/ATen/NestedTensorImpl.cpp index 122c6f10a7d6..4ed527cfd486 100644 --- a/aten/src/ATen/NestedTensorImpl.cpp +++ b/aten/src/ATen/NestedTensorImpl.cpp @@ -4,8 +4,70 @@ #include #include #include +#include #include +#include +#include +#include +#include + +namespace { +inline void validate_nested_tensor_metadata( + const at::Tensor& nested_sizes, + const at::Tensor& nested_strides, + const std::vector& offsets) { + TORCH_INTERNAL_ASSERT(nested_sizes.is_contiguous()); + int64_t size_dim = nested_sizes.dim(); + TORCH_INTERNAL_ASSERT(size_dim == 0 || size_dim == 2); + TORCH_INTERNAL_ASSERT(nested_strides.is_contiguous()); + TORCH_INTERNAL_ASSERT(nested_strides.dim() == size_dim); + TORCH_INTERNAL_ASSERT(nested_sizes.sizes() == nested_strides.sizes()); + TORCH_INTERNAL_ASSERT( + (size_dim == 0 && (int64_t)offsets.empty()) || + (size_dim == 2 && nested_sizes.size(0) == (int64_t)offsets.size())); +} + +/** + * Generates a nested key_set from a non-nested tensor. 
+ * + * When creating a nested tensor from a non-nested tensor + * We want to maintain the same keyset as the buffer but + * swap non nested keys for nested ones + * + * @return Appropriate key set for nested tensor + */ +inline c10::DispatchKeySet generate_nested_key_set_from_buffer( + const at::Tensor& buffer) { + auto nested_key_set = buffer.key_set(); + const bool has_autograd = nested_key_set.has_any(c10::autograd_dispatch_keyset); + // Remove non_nested tensor specific keys + nested_key_set = nested_key_set - + c10::DispatchKeySet{c10::DispatchKey::Dense, c10::DispatchKey::Autograd}; + + // Add nested tensor specific keys + nested_key_set = + nested_key_set | c10::DispatchKeySet{c10::DispatchKey::NestedTensor}; + nested_key_set = + has_autograd ? nested_key_set | c10::autograd_nested : nested_key_set; + return nested_key_set; +} + +/** + * Generates a the correct view keyset. + * + * When creating a nested tensor view of base + * The appropriate keyset will be dependent on the nested + * status of the base + * + * @return Appropriate key set for nested tensor + */ +c10::DispatchKeySet get_view_key_set(const at::Tensor& base) { + return base.is_nested() ? base.key_set() + : generate_nested_key_set_from_buffer(base); +} + +} // namespace namespace at { namespace native { @@ -67,14 +129,22 @@ inline at::Tensor construct_nested_stride_tensor(const at::Tensor& sizes) { return strides; } -// assume contiguous, we can construct offsets from size +/** + * Create a vector of offsets assuming the nested tensor is contiguous + * + * This function iterates over the implicit ntensor outer dimension + * populating a vector with the num_elements in each implicit tensor. + * The first element is always 0 and the length of the returned vector + * is n_tensor. + * + * @return A vector of offsets + */ inline std::vector construct_offsets(const at::Tensor& sizes) { // empty `sizes` means empty nested tensor, so return empty strides if (sizes.dim() == 0) { return std::vector(); } - int64_t ntensors = sizes.size(0), - orig_dim = sizes.size(1); + int64_t ntensors = sizes.size(0), orig_dim = sizes.size(1); std::vector offsets(ntensors); // nesting scalars has easy offsets if (orig_dim == 0) { @@ -83,44 +153,27 @@ inline std::vector construct_offsets(const at::Tensor& sizes) { } const int64_t* sizes_ptr = sizes.data_ptr(); offsets[0] = 0; - for (int64_t i = 0; i < ntensors - 1; i++) { - int64_t row_product = sizes_ptr[0]; - for (int64_t j = 1; j < orig_dim; j++) { - row_product *= sizes_ptr[j]; - } + for (const auto i : c10::irange(ntensors - 1)) { + const int64_t row_product = std::accumulate(sizes_ptr, sizes_ptr + orig_dim, 1, std::multiplies()); offsets[i + 1] = offsets[i] + row_product; sizes_ptr += orig_dim; } return offsets; } -// [Note: Nested Tensor Autograd] The Nested Tensor key is a functionality -// key and therefore getAutogradRelatedKeySetFromBackend will return the -// wrong autograd key. 
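A worked example of the contiguous-offset construction documented above: for a nested tensor holding constituents of sizes {2, 3} and {4, 5}, the first starts at element 0 and the second starts right after the 2*3 = 6 elements of the first, so the offsets are {0, 6}. The sketch below mirrors that loop with plain std::vector inputs (names ours):

    #include <cstdint>
    #include <functional>
    #include <numeric>
    #include <vector>

    std::vector<int64_t> construct_offsets_example(
        const std::vector<std::vector<int64_t>>& nested_sizes) {
      std::vector<int64_t> offsets(nested_sizes.size(), 0);
      for (size_t i = 0; i + 1 < nested_sizes.size(); ++i) {
        // Number of elements in constituent i; the next constituent starts after it.
        const int64_t row_product = std::accumulate(
            nested_sizes[i].begin(), nested_sizes[i].end(),
            static_cast<int64_t>(1), std::multiplies<int64_t>());
        offsets[i + 1] = offsets[i] + row_product;
      }
      return offsets;
    }

    // construct_offsets_example({{2, 3}, {4, 5}}) -> {0, 6}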
For this specific impl we make sure to register the -// correct Autograd key which is AutogradNestedTensor -c10::DispatchKeySet generate_nested_key_set(at::Tensor buffer) { - c10::DispatchKeySet key_set = - c10::DispatchKeySet(DispatchKey::NestedTensor) | c10::DispatchKeySet{buffer.key_set().highestBackendKey()}; - - // Add AutogradNestedTensor specific keys - key_set = key_set | inplace_or_view_ks | autograd_nested; - return key_set; -} - NestedTensorImpl::NestedTensorImpl( - int64_t buffer_size, Storage storage, c10::DispatchKeySet key_set, const caffe2::TypeMeta data_type, at::Tensor nested_size_tensor, at::Tensor nested_stride_tensor, - std::vector offsets) + std::vector&& offsets) : TensorImpl(std::move(storage), key_set, data_type), - buffer_size_(buffer_size), nested_size_tensor_(std::move(nested_size_tensor)), nested_stride_tensor_(std::move(nested_stride_tensor)), - offsets_(std::move(offsets)), + storage_offsets_(std::move(offsets)), opt_sizes_(construct_opt_sizes(nested_size_tensor_)) { + C10_LOG_API_USAGE_ONCE("torch.NestedTensor"); TORCH_WARN_ONCE( "The PyTorch API of nested tensors is in prototype stage and will change " "in the near future."); @@ -129,34 +182,23 @@ NestedTensorImpl::NestedTensorImpl( storage_device.is_cpu() || storage_device.is_cuda(), "NestedTensorImpl storage must be either CUDA or CPU but got ", storage_device); - TORCH_INTERNAL_ASSERT(nested_size_tensor_.is_contiguous()); - int64_t size_dim = nested_size_tensor_.dim(); - TORCH_INTERNAL_ASSERT(size_dim == 0 || size_dim == 2); - TORCH_INTERNAL_ASSERT(nested_stride_tensor_.is_contiguous()); - TORCH_INTERNAL_ASSERT(nested_stride_tensor_.dim() == size_dim); - TORCH_INTERNAL_ASSERT( - nested_stride_tensor_.sizes() == nested_size_tensor_.sizes()); - TORCH_INTERNAL_ASSERT( - (size_dim == 0 && (int64_t)offsets_.empty()) || - (size_dim == 2 && - nested_size_tensor_.size(0) == (int64_t)offsets_.size())); + validate_nested_tensor_metadata(nested_size_tensor_, nested_stride_tensor_, storage_offsets_); refresh_dim(); - set_sizes_strides_policy(c10::TensorImpl::SizesStridesPolicy::CustomSizes); + set_custom_sizes_strides(c10::TensorImpl::SizesStridesPolicy::CustomSizes); } NestedTensorImpl::NestedTensorImpl( at::Tensor buffer, at::Tensor nested_size_tensor, at::Tensor nested_stride_tensor, - std::vector offsets) + std::vector&& offsets) : NestedTensorImpl( - buffer.sizes()[0], buffer.storage(), - generate_nested_key_set(buffer), + generate_nested_key_set_from_buffer(buffer), buffer.dtype(), nested_size_tensor, nested_stride_tensor, - offsets) { + std::move(offsets)) { TORCH_INTERNAL_ASSERT( buffer.dim() == 1, @@ -177,6 +219,22 @@ NestedTensorImpl::NestedTensorImpl( construct_offsets(nested_size_tensor)) {} +NestedTensorImpl::NestedTensorImpl( + c10::TensorImpl::ImplType impl_type, + const at::Tensor& base_tensor, + at::Tensor nested_size_tensor, + at::Tensor nested_stride_tensor, + std::vector&& offsets) + : TensorImpl(impl_type, Storage(base_tensor.storage()), get_view_key_set(base_tensor), base_tensor.dtype()), + nested_size_tensor_(std::move(nested_size_tensor)), + nested_stride_tensor_(std::move(nested_stride_tensor)), + storage_offsets_(std::move(offsets)), + opt_sizes_(construct_opt_sizes(nested_size_tensor_)) { + validate_nested_tensor_metadata(nested_size_tensor_, nested_stride_tensor_, storage_offsets_); + refresh_dim(); + set_custom_sizes_strides(c10::TensorImpl::SizesStridesPolicy::CustomSizes); +} + void NestedTensorImpl::refresh_dim() { const auto my_dim = nested_size_tensor_.dim() ? 
nested_size_tensor_.sizes()[1] + 1 : 1; sizes_and_strides_.resize(my_dim); @@ -227,8 +285,8 @@ c10::SymIntArrayRef NestedTensorImpl::sym_sizes_custom() const { TORCH_CHECK(false, "Internal error: NestedTensorImpl doesn't support sizes. Please file an issue on https://github.com/pytorch/nestedtensor"); } -c10::SymIntArrayRef NestedTensorImpl::sym_sizes() const { - return sym_sizes_custom(); +c10::SymIntArrayRef NestedTensorImpl::sym_strides_custom() const { + TORCH_CHECK(false, "Internal error: NestedTensorImpl doesn't support strides. Please file an issue on https://github.com/pytorch/nestedtensor"); } IntArrayRef NestedTensorImpl::strides_custom() const { @@ -246,7 +304,7 @@ c10::intrusive_ptr NestedTensorImpl::shallow_copy_and_detach_core( bool allow_tensor_metadata_change) const { if (key_set_.has(DispatchKey::Python) && !c10::impl::tls_is_dispatch_key_excluded(DispatchKey::Python)) { - auto r = pyobj_interpreter_.load(std::memory_order_acquire)->detach(this); + auto r = (*pyobj_interpreter_.load(std::memory_order_acquire))->detach(this); if (r) { r->set_version_counter(std::forward(version_counter)); r->set_allow_tensor_metadata_change(allow_tensor_metadata_change); @@ -256,13 +314,12 @@ c10::intrusive_ptr NestedTensorImpl::shallow_copy_and_detach_core( // the interpreter is dead no one can call us out on it } auto impl = c10::make_intrusive( - buffer_size_, storage_, key_set_, data_type_, nested_size_tensor_, nested_stride_tensor_, - offsets_); + std::vector(storage_offsets_)); copy_tensor_metadata( /*src_impl=*/this, diff --git a/aten/src/ATen/NestedTensorImpl.h b/aten/src/ATen/NestedTensorImpl.h index d2e0381425f4..4790cda1cbcb 100644 --- a/aten/src/ATen/NestedTensorImpl.h +++ b/aten/src/ATen/NestedTensorImpl.h @@ -1,6 +1,8 @@ #pragma once #include #include +#include +#include #include #include #include @@ -10,26 +12,35 @@ namespace at { namespace native { +struct NestedTensorImpl; +inline bool nested_tensor_impl_is_contiguous(const NestedTensorImpl* nt); struct TORCH_API NestedTensorImpl : public c10::TensorImpl { explicit NestedTensorImpl( - int64_t buffer_size, Storage storage, c10::DispatchKeySet key_set, const caffe2::TypeMeta data_type, at::Tensor nested_size_tensor, at::Tensor nested_stride_tensor, - std::vector offsets); + std::vector&& offsets); explicit NestedTensorImpl( at::Tensor buffer, at::Tensor nested_size_tensor, at::Tensor nested_stride_tensor, - std::vector offsets); + std::vector&& offsets); // assume contiguous, `nested_stride_tensor` and `offsets` // can be infered from `nested_size_tensor` explicit NestedTensorImpl(at::Tensor buffer, at::Tensor nested_size_tensor); + // This constructor is used creating view tensors from nested tensors + explicit NestedTensorImpl( + c10::TensorImpl::ImplType impl_type, + const at::Tensor& base_tensor, + at::Tensor nested_size_tensor, + at::Tensor nested_stride_tensor, + std::vector&& offsets); + // TODO: don't expose private implementation details like this; in // particular, resizing this tensor will mess up our dim() and // callers cannot fix it. @@ -40,8 +51,8 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { const Tensor& get_nested_stride_tensor() const { return nested_stride_tensor_; } - const std::vector& get_offsets() const { - return offsets_; + const std::vector& get_storage_offsets() const { + return storage_offsets_; } // Returns nullopt if the ith dimension is irregular. 
The ith dimension // of a NestedTensor is regular if the unbound tensors match in @@ -63,16 +74,41 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { " is irregular and does not have a size."); return *optional_size; } - + /** + * Return a view of the nested tensor as a 1 dimensional contiguous tensor. + * + * The buffer tensor created by this function shares the same storage_impl as + * the original nested tensor, and therefore can be seen as a view. + * + * @return A newly constructed view tensor + */ at::Tensor get_buffer() const { - auto buffer_key_set_ = c10::DispatchKeySet{c10::DispatchKey::Dense} | - c10::DispatchKeySet{this->key_set_.highestBackendKey()}; + TORCH_CHECK( + nested_tensor_impl_is_contiguous(this), + "NestedTensor must be contiguous to get buffer."); + return get_unsafe_storage_as_tensor(); + } + /** + * If possible use get_buffer() instead. This function returns the storage + * as a tensor directly, which is not safe to use in general. If using this + * function, The caller must ensure to account for nested_sizes, + * nested_strides and storage_offsets. + * + * @return A newly constructed view tensor + */ + at::Tensor get_unsafe_storage_as_tensor() const { + auto buffer_key_set_ = generate_buffer_key_set(); + const auto buffer_size = get_buffer_size(); auto buffer_tensor_impl = c10::make_intrusive( - Storage(storage_), buffer_key_set_, data_type_); - buffer_tensor_impl->set_sizes_contiguous(c10::makeArrayRef(buffer_size_)); + c10::TensorImpl::VIEW, Storage(storage_), buffer_key_set_, data_type_); + buffer_tensor_impl->set_sizes_contiguous(c10::makeArrayRef(buffer_size)); return Tensor(buffer_tensor_impl); } + int64_t get_buffer_size() const { + return storage_.nbytes() / data_type_.itemsize(); + } + protected: const char* tensorimpl_type_name() const override; @@ -89,8 +125,8 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { } IntArrayRef sizes_custom() const override; c10::SymIntArrayRef sym_sizes_custom() const override; - c10::SymIntArrayRef sym_sizes() const override; IntArrayRef strides_custom() const override; + c10::SymIntArrayRef sym_strides_custom() const override; // this one is real int64_t dim_custom() const override; @@ -115,10 +151,7 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { // Must be called after any changes to our dim() to sync the state // to TensorImpl. void refresh_dim(); - // Store the size of the buffer for use in get_buffer(). - // get_buffer constructs a flat, contiguous tensor from the NestedTensor - // storage - int64_t buffer_size_; + const at::Tensor nested_size_tensor_, nested_stride_tensor_; // The starting positions of the underlying tensors in contiguous buffer // i.e. the buffer memory offsets to get the underlying tensors @@ -132,7 +165,7 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { // Some strong enough constraints are: // 1. every underlying tensor is contiguous in memory // && nesting in ascending order - std::vector offsets_; + std::vector storage_offsets_; // NOTE: -1 here means the size is missing // TODO: maybe we can remove this metadata since // we can compute it from `nested_size_tensor_` @@ -142,6 +175,34 @@ struct TORCH_API NestedTensorImpl : public c10::TensorImpl { c10::intrusive_ptr shallow_copy_and_detach_core( VariableVersion&& version_counter, bool allow_tensor_metadata_change) const; + + /** + * Generates a non-nested key_set from a nested tensor. 
+ * + * For many nested tensor kernel implementations a buffer tensor + * is generated and redispatched to a non-nested kernel this function + * generates the key set used by that buffer tensor + * + * @return Appropriate key set for non-nested tensor + */ + inline c10::DispatchKeySet generate_buffer_key_set() const { + auto buffer_key_set = this->key_set(); + const bool Autograd = buffer_key_set.has_any(c10::autograd_dispatch_keyset); + // Remove nested tensor specific keys + buffer_key_set = buffer_key_set - + c10::DispatchKeySet{ + c10::DispatchKey::NestedTensor, + c10::DispatchKey::AutogradNestedTensor}; + + // Add dense tensor specific keys + buffer_key_set = + buffer_key_set | c10::DispatchKeySet{c10::DispatchKey::Dense}; + buffer_key_set = Autograd + ? c10::DispatchKeySet{c10::DispatchKey::Autograd} | buffer_key_set + : buffer_key_set; + + return buffer_key_set; + } }; inline NestedTensorImpl* get_nested_tensor_impl_or_null( @@ -165,7 +226,7 @@ inline bool nested_tensor_impl_is_contiguous(const NestedTensorImpl* nt) { } const Tensor &sizemat = nt->get_nested_size_tensor(), &stridemat = nt->get_nested_stride_tensor(); - const auto& offsets = nt->get_offsets(); + const auto& offsets = nt->get_storage_offsets(); int64_t orig_dim = sizemat.size(1); // nesting scalars if (orig_dim == 0) { diff --git a/aten/src/ATen/NumericUtils.h b/aten/src/ATen/NumericUtils.h index 816cc4e8a44b..241addbc5c28 100644 --- a/aten/src/ATen/NumericUtils.h +++ b/aten/src/ATen/NumericUtils.h @@ -7,8 +7,9 @@ #include #include #include +#include + #include -#include #include namespace at { diff --git a/aten/src/ATen/OpaqueTensorImpl.h b/aten/src/ATen/OpaqueTensorImpl.h index c87fcab77bd2..e6c6413815bb 100644 --- a/aten/src/ATen/OpaqueTensorImpl.h +++ b/aten/src/ATen/OpaqueTensorImpl.h @@ -30,7 +30,7 @@ struct TORCH_API OpaqueTensorImpl : public TensorImpl { : TensorImpl(key_set, data_type, device), opaque_handle_(std::move(opaque_handle)) { set_storage_access_should_throw(); - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); + set_custom_sizes_strides(SizesStridesPolicy::CustomStrides); sizes_and_strides_.set_sizes(sizes); refresh_numel(); is_non_overlapping_and_dense_ = is_non_overlapping_and_dense; @@ -77,7 +77,7 @@ struct TORCH_API OpaqueTensorImpl : public TensorImpl { dtype(), device(), opaque_handle_, - asIntArrayRefSlow(sizes_and_strides_.sizes_arrayref())); + sizes_and_strides_.sizes_arrayref()); copy_tensor_metadata( /*src_opaque_impl=*/this, /*dest_opaque_impl=*/impl.get(), @@ -101,7 +101,7 @@ struct TORCH_API OpaqueTensorImpl : public TensorImpl { dtype(), device(), opaque_handle_, - asIntArrayRefSlow(sizes_and_strides_.sizes_arrayref())); + sizes_and_strides_.sizes_arrayref()); copy_tensor_metadata( /*src_opaque_impl=*/this, /*dest_opaque_impl=*/impl.get(), diff --git a/aten/src/ATen/PadNd.h b/aten/src/ATen/PadNd.h new file mode 100644 index 000000000000..573d1a7b88ab --- /dev/null +++ b/aten/src/ATen/PadNd.h @@ -0,0 +1,28 @@ +#pragma once +#include +#include + +namespace at { + +enum class padding_mode { + reflect, + replicate, + circular, + constant, +}; + +static inline c10::string_view padding_mode_string(padding_mode m) { + switch (m) { + case padding_mode::reflect: + return "reflect"; + case padding_mode::replicate: + return "replicate"; + case padding_mode::circular: + return "circular"; + case padding_mode::constant: + return "constant"; + } + TORCH_CHECK(false, "Invalid padding mode (", static_cast(m), ")"); +} + +} // namespace at diff --git a/aten/src/ATen/Parallel.h 
b/aten/src/ATen/Parallel.h index 6c99fcd422cb..4693997624e9 100644 --- a/aten/src/ATen/Parallel.h +++ b/aten/src/ATen/Parallel.h @@ -2,6 +2,7 @@ #include #include #include +#include namespace at { diff --git a/aten/src/ATen/PythonTorchFunctionTLS.cpp b/aten/src/ATen/PythonTorchFunctionTLS.cpp index ae9f722de60a..c9487c6958cb 100644 --- a/aten/src/ATen/PythonTorchFunctionTLS.cpp +++ b/aten/src/ATen/PythonTorchFunctionTLS.cpp @@ -6,16 +6,24 @@ namespace impl { static thread_local PythonTorchFunctionTLS pythonTorchFunctionState; -void PythonTorchFunctionTLS::set_mode(std::shared_ptr mode) { - pythonTorchFunctionState.mode_ = std::move(mode); +void PythonTorchFunctionTLS::push_onto_stack(std::shared_ptr mode) { + pythonTorchFunctionState.stack_.push_back(std::move(mode)); } -const std::shared_ptr& PythonTorchFunctionTLS::get_mode() { - return pythonTorchFunctionState.mode_; +const std::shared_ptr PythonTorchFunctionTLS::pop_stack() { + TORCH_CHECK(pythonTorchFunctionState.stack_.size() > 0, "trying to pop from empty mode stack"); + const auto out = pythonTorchFunctionState.stack_.back(); + pythonTorchFunctionState.stack_.pop_back(); + return out; } -void PythonTorchFunctionTLS::swap_mode(std::shared_ptr& mode) { - pythonTorchFunctionState.mode_.swap(mode); +const std::shared_ptr& PythonTorchFunctionTLS::get_stack_at(int64_t idx) { + TORCH_CHECK(idx < static_cast(pythonTorchFunctionState.stack_.size()), "Tried to get stack at idx that's too big"); + return pythonTorchFunctionState.stack_[idx]; +} + +int64_t PythonTorchFunctionTLS::stack_len() { + return pythonTorchFunctionState.stack_.size(); } void PythonTorchFunctionTLS::set_disabled(bool disabled) { @@ -34,5 +42,9 @@ const PythonTorchFunctionTLS& PythonTorchFunctionTLS::get_state() { return pythonTorchFunctionState; } +bool torch_function_mode_enabled() { + return PythonTorchFunctionTLS::stack_len() > 0; +} + } // namespace impl } // namespace at diff --git a/aten/src/ATen/PythonTorchFunctionTLS.h b/aten/src/ATen/PythonTorchFunctionTLS.h index 003dcef1e90f..5940fb6f2dee 100644 --- a/aten/src/ATen/PythonTorchFunctionTLS.h +++ b/aten/src/ATen/PythonTorchFunctionTLS.h @@ -10,17 +10,25 @@ struct TORCH_API PythonTorchFunctionTLS { static void set_disabled(bool); static bool is_disabled(); - static void set_mode(std::shared_ptr); - static const std::shared_ptr& get_mode(); - static void swap_mode(std::shared_ptr&); + static void push_onto_stack(std::shared_ptr mode); + static const std::shared_ptr pop_stack(); + static const std::shared_ptr& get_stack_at(int64_t idx); + static int64_t stack_len(); - static void set_state(const PythonTorchFunctionTLS& state); static const PythonTorchFunctionTLS& get_state(); + static void set_state(const PythonTorchFunctionTLS& state); private: + // The mode TLS is split into + // - disabled_, which says whether or not to disable all torch function + // modes + // - stack_, which is a vector of modes representing the stack of user + // defined modes bool disabled_; - std::shared_ptr mode_; + std::vector> stack_; }; +TORCH_API bool torch_function_mode_enabled(); + } // namespace impl } // namespace at diff --git a/aten/src/ATen/SavedTensorHooks.cpp b/aten/src/ATen/SavedTensorHooks.cpp index aff6ddd1b06e..6b3b63f5987e 100644 --- a/aten/src/ATen/SavedTensorHooks.cpp +++ b/aten/src/ATen/SavedTensorHooks.cpp @@ -5,46 +5,78 @@ namespace at { namespace { - // PyObject is defined in c10/util/python_stub.h - thread_local std::stack> stack; + thread_local impl::SavedTensorDefaultHooksTLS tls; // This flag is set to true the 
first time default hooks are registered // and left at true for the rest of the execution. // It's an optimization so that users who never use default hooks don't need to // read the thread_local variables pack_hook_ and unpack_hook_. - static bool is_enabled(false); + static bool is_initialized(false); +} + +static void assertSavedTensorHooksNotDisabled() { + TORCH_CHECK(SavedTensorDefaultHooks::is_enabled(), tls.disabled_error_message.value()); +} + +bool SavedTensorDefaultHooks::is_enabled() { + // See NOTE: [disabled_error_message invariant] + return !tls.disabled_error_message.has_value(); +} + +void SavedTensorDefaultHooks::disable(const std::string& message) { + tls.disabled_error_message = message; + if (tls.stack.size() > 0) { + assertSavedTensorHooksNotDisabled(); + } } void SavedTensorDefaultHooks::enable() { - is_enabled = true; + tls.disabled_error_message = c10::nullopt; +} + +const c10::optional& SavedTensorDefaultHooks::get_disabled_error_message() { + return tls.disabled_error_message; +} + +const impl::SavedTensorDefaultHooksTLS& SavedTensorDefaultHooks::get_tls_state() { + return tls; +} + +void SavedTensorDefaultHooks::set_tls_state(const impl::SavedTensorDefaultHooksTLS& state) { + tls = state; +} + +void SavedTensorDefaultHooks::lazy_initialize() { + is_initialized = true; } void SavedTensorDefaultHooks::push_hooks(PyObject* pack_hook, PyObject* unpack_hook) { // Reference counting is handled by the caller of `push_hooks` - TORCH_INTERNAL_ASSERT(is_enabled); + TORCH_INTERNAL_ASSERT(is_initialized); TORCH_INTERNAL_ASSERT(pack_hook != nullptr && unpack_hook != nullptr); - stack.push(std::make_pair(pack_hook, unpack_hook)); + assertSavedTensorHooksNotDisabled(); + tls.stack.push(std::make_pair(pack_hook, unpack_hook)); } void SavedTensorDefaultHooks::pop_hooks() { // Reference counting is handled by the caller of `pop_hooks` - TORCH_INTERNAL_ASSERT(is_enabled && !stack.empty()); - stack.pop(); + TORCH_INTERNAL_ASSERT(is_initialized && !tls.stack.empty()); + tls.stack.pop(); } std::pair SavedTensorDefaultHooks::get_hooks() { - if (!is_enabled || stack.empty()) { + if (!is_initialized || tls.stack.empty()) { return std::make_pair(nullptr, nullptr); } - return stack.top(); + return tls.stack.top(); } std::stack> SavedTensorDefaultHooks::get_stack() { - return stack; + return tls.stack; } void SavedTensorDefaultHooks::set_stack(std::stack> stack_) { - stack = stack_; + tls.stack = stack_; } } diff --git a/aten/src/ATen/SavedTensorHooks.h b/aten/src/ATen/SavedTensorHooks.h index 0cdfa3c9ecc3..af821cb908c6 100644 --- a/aten/src/ATen/SavedTensorHooks.h +++ b/aten/src/ATen/SavedTensorHooks.h @@ -1,20 +1,52 @@ #pragma once #include +#include #include #include +#include #include namespace at { +namespace impl { + +struct TORCH_API SavedTensorDefaultHooksTLS { + // PyObject is defined in c10/util/python_stub.h + std::stack> stack; + + // See NOTE: [Disabling SavedTensorDefaultHooks] for context + // NOTE: [disabled_error_message invariant] + // disabled_error_message is nullopt IFF Saved Tensor hooks is enabled + // We did this for efficiency (so we didn't have to keep a separate bool + // around) + c10::optional disabled_error_message; +}; + +} // namespace impl + struct TORCH_API SavedTensorDefaultHooks { static void push_hooks(PyObject* pack_hook, PyObject* unpack_hook); static void pop_hooks(); static std::pair get_hooks(); - static void enable(); + static void lazy_initialize(); static std::stack> get_stack(); static void set_stack(std::stack>); + + static const 
impl::SavedTensorDefaultHooksTLS& get_tls_state(); + static void set_tls_state(const impl::SavedTensorDefaultHooksTLS& tls); + + // NOTE: [Disabling SavedTensorDefaultHooks] + // A developer of a PyTorch feature may choose to disable SavedTensorDefault + // hooks, especially if their feature does not work with it. If they are + // disabled, then the following will raise an error: + // - Attempting to push_hooks + // - calling disable(message) with a non-zero stack (from get_stack) size + static void disable(const std::string& error_message); + static void enable(); + static bool is_enabled(); + static const c10::optional& get_disabled_error_message(); }; } // namespace at diff --git a/aten/src/ATen/SparseCsrTensorImpl.cpp b/aten/src/ATen/SparseCsrTensorImpl.cpp index dab45065fa71..f07bd176d6c4 100644 --- a/aten/src/ATen/SparseCsrTensorImpl.cpp +++ b/aten/src/ATen/SparseCsrTensorImpl.cpp @@ -8,22 +8,10 @@ #include namespace at { -namespace { -DeviceType SparseCsrTensorSetToDeviceType(DispatchKeySet key_set) { - if (key_set.has(DispatchKey::SparseCsrCPU)) { - return kCPU; - } else if (key_set.has(DispatchKey::SparseCsrCUDA)) { - return kCUDA; - } else { - TORCH_CHECK(false, - "Cannot construct SparseCsrTensor with non-sparse tensor type ID ", - key_set); - } -} -} // namespace SparseCsrTensorImpl::SparseCsrTensorImpl( at::DispatchKeySet key_set, + at::Device device, at::Layout layout, const caffe2::TypeMeta data_type) : SparseCsrTensorImpl( @@ -32,19 +20,19 @@ SparseCsrTensorImpl::SparseCsrTensorImpl( at::empty( {0}, at::initialTensorOptions() - .device(SparseCsrTensorSetToDeviceType(key_set)) + .device(device) .dtype(ScalarType::Int)) // crow_indices , at::empty( {0}, at::initialTensorOptions() - .device(SparseCsrTensorSetToDeviceType(key_set)) + .device(device) .dtype(ScalarType::Int)) // col_indices , at::empty( {0}, at::initialTensorOptions() - .device(SparseCsrTensorSetToDeviceType(key_set)) + .device(device) .dtype(data_type)) // values , layout @@ -66,15 +54,24 @@ SparseCsrTensorImpl::SparseCsrTensorImpl( TORCH_WARN_ONCE("Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensor support is in beta state. " "If you miss a functionality in the sparse tensor support, please submit a feature request " "to https://github.com/pytorch/pytorch/issues."); + + TORCH_INTERNAL_ASSERT(((key_set.has(DispatchKey::SparseCsrCPU) && device().type() == kCPU) + || (key_set.has(DispatchKey::SparseCsrCUDA) && device().type() == kCUDA)), + "Inconsistent key_set (=", key_set, ") and device (=", device(), ")"); + set_storage_access_should_throw(); is_non_overlapping_and_dense_ = false; - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); + set_custom_sizes_strides(SizesStridesPolicy::CustomStrides); // TODO: If this check ever shows up as a bottleneck, which is unlikely given that // comparing devices only involves comparing the type and index (two integers), we // can move this to a DEBUG only assert. Until then this confirms and maintains a // crucial invariance. 
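// Illustrative standalone sketch (not part of this patch) of the
// NOTE: [disabled_error_message invariant] referenced above: a single
// optional<string> doubles as the "hooks enabled" flag and the error
// message, so no separate bool is kept. std::optional stands in for
// c10::optional; all names below are invented for illustration only.
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

namespace sketch {

struct HooksTLS {
  // nullopt IFF saved-tensor hooks are enabled.
  std::optional<std::string> disabled_error_message;
};

thread_local HooksTLS tls;

bool is_enabled() { return !tls.disabled_error_message.has_value(); }

void disable(std::string message) {
  tls.disabled_error_message = std::move(message);
}

void enable() { tls.disabled_error_message = std::nullopt; }

void push_hook_or_throw() {
  if (!is_enabled()) {
    // Mirrors the role of assertSavedTensorHooksNotDisabled():
    // surface the stored reason instead of a generic failure.
    throw std::runtime_error(*tls.disabled_error_message);
  }
  // ... push the pack/unpack hook pair onto the per-thread stack here ...
}

} // namespace sketch

int main() {
  assert(sketch::is_enabled());
  sketch::disable("saved-tensor hooks are not supported under feature X");
  assert(!sketch::is_enabled());
  sketch::enable();
  sketch::push_hook_or_throw(); // succeeds again once re-enabled
  return 0;
}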
- TORCH_CHECK(values_.device() == crow_indices_.device(), "Values and crow_indices need to be on the same device."); - TORCH_CHECK(values_.device() == col_indices_.device(), "Values and col_indices need to be on the same device."); + TORCH_CHECK(values_.device() == crow_indices_.device(), "Values and ", + at::sparse_csr::compressedIndicesName(layout_), " need to be on the same device."); + TORCH_CHECK(values_.device() == col_indices_.device(), "Values and ", + at::sparse_csr::plainIndicesName(layout_), " need to be on the same device."); + TORCH_INTERNAL_ASSERT(values_.device() == device(), + "Values and compressed sparse tensor instance need to have the same device."); } const char* SparseCsrTensorImpl::tensorimpl_type_name() const { @@ -104,23 +101,83 @@ void SparseCsrTensorImpl::resize_(int64_t nnz, IntArrayRef size) { sizes_and_strides_.set_sizes(size); } -void SparseCsrTensorImpl::resize_as_sparse_csr_tensor_(const Tensor& src) { +void SparseCsrTensorImpl::resize_and_clear_(int64_t sparse_dim, IntArrayRef size) { TORCH_CHECK( !has_symbolic_sizes_strides_, - "resize_as_sparse_csr_tensor_ called on tensor with symbolic shape") - set_layout(src.layout()); - crow_indices_ = at::empty_like( - src.crow_indices(), - src.crow_indices().options(), - src.crow_indices().suggest_memory_format()); - col_indices_ = at::empty_like( - src.col_indices(), - src.col_indices().options(), - src.col_indices().suggest_memory_format()); - values_ = at::empty_like( - src.values(), - src.values().options(), - src.values().suggest_memory_format()); + "resize_and_clear_ called on tensor with symbolic shape"); + TORCH_CHECK(sparse_dim >= 2, "resize_and_clear_ sparse dimensionality must be at least 2, got ", sparse_dim); + TORCH_CHECK(static_cast(size.size()) >= sparse_dim, "resize_and_clear_ size length must be at least sparse dimensionality (=", + sparse_dim, "), got ", size.size()); + auto batch_dim = sparse_dim - 2; + auto batchsize = size.slice(0, batch_dim); + auto densesize = size.slice(batch_dim + 2, size.size() - batch_dim - 2); + + auto values_size = DimVector(batchsize); + values_size.push_back(0); // nse + values_size.append(densesize.begin(), densesize.end()); + + auto col_indices_size = DimVector(batchsize); + col_indices_size.push_back(0); // nse + + auto n_compressed_indices = AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS(layout_, "resize_and_clear_", + [&] () -> int64_t { return size[batch_dim]; }, + [&] () -> int64_t { return size[batch_dim + 1]; } + ); + AT_DISPATCH_PLAIN_SPARSE_COMPRESSED_LAYOUTS(layout_, + "resize_and_clear_", + [] () {}, + [&] () { + auto blocksize = this->values_.sizes().slice(this->batch_dim() + 1, 2); + values_size.append(blocksize.begin(), blocksize.end()); + n_compressed_indices /= blocksize[(the_layout == kSparseBsr ? 
0 : 1)]; + }); + auto crow_indices_size = DimVector(batchsize); + crow_indices_size.push_back(n_compressed_indices + 1); + + crow_indices_.resize_(crow_indices_size); + crow_indices_.zero_(); + col_indices_.resize_(col_indices_size); + values_.resize_(values_size); + sizes_and_strides_.set_sizes(size); + refresh_numel(); +} + +void SparseCsrTensorImpl::resize_as_sparse_compressed_tensor_( + const Tensor& src) { + TORCH_CHECK( + !has_symbolic_sizes_strides_, + "resize_as_sparse_compressed_tensor_ called on tensor with symbolic shape"); + + // We cannot resize as other layout and preserve the invariants for self + // layout + TORCH_CHECK( + src.layout() == layout_, + "resize_as_sparse_compressed_tensor_: self and src must have the same layout, but got: self (", + layout_, + ") and source (", + src.layout(), + ")"); + + Tensor compressed_indices; + Tensor plain_indices; + std::tie(compressed_indices, plain_indices) = + sparse_csr::getCompressedPlainIndices(src); + // reuse self indices storage + if (crow_indices_.sizes() != compressed_indices.sizes()) { + crow_indices_.resize_as_(compressed_indices); + } + if (col_indices_.sizes() != plain_indices.sizes()) { + col_indices_.resize_as_(plain_indices); + } + // Update indices data to ensure result is valid under invariants check + if ((sizes() != src.sizes()) || (dense_dim() != src.dense_dim())) { + crow_indices_.copy_(compressed_indices); + col_indices_.copy_(plain_indices); + } + // Reuse values storage + if (values_.sizes() != src.values().sizes()) { + values_.resize_as_(src.values()); + } sizes_and_strides_.set_sizes(src.sizes()); refresh_numel(); } @@ -132,7 +189,7 @@ void SparseCsrTensorImpl::set_member_tensors( IntArrayRef size) { TORCH_CHECK( !has_symbolic_sizes_strides_, - "set_member_tensors called on tensor with symbolic shape") + "set_member_tensors called on tensor with symbolic shape"); // CSR Type Invariants TORCH_CHECK( @@ -142,7 +199,6 @@ void SparseCsrTensorImpl::set_member_tensors( ") must match dtype of sparse tensor (", typeMetaToScalarType(dtype()), ")"); - crow_indices_ = crow_indices; col_indices_ = col_indices; values_ = values; @@ -153,13 +209,20 @@ void SparseCsrTensorImpl::set_member_tensors( // comparing devices only involves comparing the type and index (two integers), we // can move this to a DEBUG only assert. Until then this confirms and maintains a // crucial invariance. 
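// Illustrative standalone sketch (not part of this patch) of the shape
// bookkeeping that resize_and_clear_ above performs for a plain (non-block)
// CSR layout: a size [B..., nrows, ncols, D...] is split into batch, sparse
// and dense parts, and the empty index/value shapes are derived from it
// (in the patch the number of batch dims is computed as sparse_dim - 2).
// Plain std::vector stands in for DimVector; names are invented.
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

struct CsrShapes {
  Shape compressed_indices; // batchsize + {nrows + 1}
  Shape plain_indices;      // batchsize + {0}  (nse starts at zero)
  Shape values;             // batchsize + {0} + densesize
};

CsrShapes empty_csr_shapes(const Shape& size, int64_t batch_dim) {
  assert(batch_dim >= 0 && static_cast<int64_t>(size.size()) >= batch_dim + 2);
  const Shape batchsize(size.begin(), size.begin() + batch_dim);
  const Shape densesize(size.begin() + batch_dim + 2, size.end());
  const int64_t n_compressed = size[batch_dim]; // rows for a CSR layout

  CsrShapes out;
  out.compressed_indices = batchsize;
  out.compressed_indices.push_back(n_compressed + 1);
  out.plain_indices = batchsize;
  out.plain_indices.push_back(0);
  out.values = batchsize;
  out.values.push_back(0);
  out.values.insert(out.values.end(), densesize.begin(), densesize.end());
  return out;
}

int main() {
  // A batched CSR tensor of shape [2, 4, 5] with one batch dimension:
  auto s = empty_csr_shapes({2, 4, 5}, /*batch_dim=*/1);
  assert((s.compressed_indices == Shape{2, 5})); // 4 rows -> 5 row pointers
  assert((s.plain_indices == Shape{2, 0}));
  assert((s.values == Shape{2, 0}));
  return 0;
}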
- TORCH_CHECK(values_.device() == crow_indices_.device(), "Values and crow_indices need to be on the same device."); - TORCH_CHECK(values_.device() == col_indices_.device(), "Values and col_indices need to be on the same device."); + TORCH_CHECK(values_.device() == crow_indices_.device(), "Values and ", + at::sparse_csr::compressedIndicesName(layout_), " need to be on the same device."); + TORCH_CHECK(values_.device() == col_indices_.device(), "Values and ", + at::sparse_csr::plainIndicesName(layout_), " need to be on the same device."); + TORCH_CHECK(values_.device() == device(), + "Values and compressed tensor instance need to be on the same device."); } IntArrayRef SparseCsrTensorImpl::strides_custom() const { TORCH_CHECK(false, "Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensors do not have strides"); } +SymIntArrayRef SparseCsrTensorImpl::sym_strides_custom() const { + TORCH_CHECK(false, "Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensors do not have strides"); +} void SparseCsrTensorImpl::set_size(int64_t dim, int64_t new_size) { TORCH_CHECK(false, "Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensors do not have set_size."); } @@ -169,5 +232,8 @@ void SparseCsrTensorImpl::set_stride(int64_t dim, int64_t new_stride) { void SparseCsrTensorImpl::set_storage_offset(int64_t storage_offset) { TORCH_CHECK(false, "Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensors do not have set_storage_offset."); } +bool SparseCsrTensorImpl::is_contiguous_custom(MemoryFormat) const { + TORCH_CHECK(false, "Sparse ", at::sparse_csr::layoutToString(layout_, /*upper=*/true), " tensors do not have is_contiguous"); +} } // namespace at diff --git a/aten/src/ATen/SparseCsrTensorImpl.h b/aten/src/ATen/SparseCsrTensorImpl.h index 878c465962b8..12ef1de24ff7 100644 --- a/aten/src/ATen/SparseCsrTensorImpl.h +++ b/aten/src/ATen/SparseCsrTensorImpl.h @@ -3,7 +3,6 @@ #include #include #include - namespace at { // Struct implementing a sparse CSR tensor. 
It uses three 1-D tensors for @@ -33,11 +32,13 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { public: explicit SparseCsrTensorImpl( at::DispatchKeySet, + at::Device device, Layout layout, const caffe2::TypeMeta); void resize_(int64_t nnz, IntArrayRef size); - void resize_as_sparse_csr_tensor_(const Tensor& src); + void resize_and_clear_(int64_t sparse_dim, IntArrayRef size); + void resize_as_sparse_compressed_tensor_(const Tensor& src); void set_member_tensors( const Tensor& crow_indices, const Tensor& col_indices, @@ -76,6 +77,8 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { protected: IntArrayRef strides_custom() const override; + SymIntArrayRef sym_strides_custom() const override; + bool is_contiguous_custom(MemoryFormat) const override; public: void set_size(int64_t dim, int64_t new_size) override; @@ -107,7 +110,7 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { const c10::VariableVersion& version_counter, bool allow_tensor_metadata_change) const override { auto impl = c10::make_intrusive( - key_set(), layout_impl(), dtype()); + key_set(), device(), layout_impl(), dtype()); copy_tensor_metadata( /*src_impl=*/this, /*dest_impl=*/impl.get(), @@ -127,7 +130,7 @@ struct TORCH_API SparseCsrTensorImpl : public TensorImpl { c10::VariableVersion&& version_counter, bool allow_tensor_metadata_change) const override { auto impl = c10::make_intrusive( - key_set(), layout_impl(), dtype()); + key_set(), device(), layout_impl(), dtype()); copy_tensor_metadata( /*src_impl=*/this, /*dest_impl=*/impl.get(), diff --git a/aten/src/ATen/SparseCsrTensorUtils.h b/aten/src/ATen/SparseCsrTensorUtils.h index 24b5ae47df7d..e76d2707c6f4 100644 --- a/aten/src/ATen/SparseCsrTensorUtils.h +++ b/aten/src/ATen/SparseCsrTensorUtils.h @@ -127,6 +127,22 @@ namespace sparse_csr { using SparseCsrTensor = Tensor; +inline bool is_sparse_compressed(const Layout& layout) { + switch (layout) { + case kSparseCsr: + case kSparseCsc: + case kSparseBsr: + case kSparseBsc: + return true; + default:; + } + return false; +} + +inline bool is_sparse_compressed(const Tensor& self) { + return is_sparse_compressed(self.layout()); +} + inline SparseCsrTensorImpl* get_sparse_csr_impl(const SparseCsrTensor& self) { AT_DISPATCH_ALL_SPARSE_COMPRESSED_LAYOUTS( self.layout(), "get_sparse_csr_impl", [&] {}); @@ -235,5 +251,41 @@ inline int plainDimension( return size.size() - dense_ndim - (isCompressedRow(layout) ? 
1 : 2); } +inline int64_t numBatchDimensions(Tensor const& self) { + return AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "numBatchDimensions", + [&self] { return self.crow_indices().dim() - 1; }, + [&self] { return self.ccol_indices().dim() - 1; }); +} + +inline std::pair getCompressedPlainIndices(Tensor const& self) { + return AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "getCompressedPlainIndices", + [&self] { + return std::make_pair(self.crow_indices(), self.col_indices()); + }, + [&self] { + return std::make_pair(self.ccol_indices(), self.row_indices()); + }); +} + +inline Layout flip_compressed_layout(Layout layout) { + switch (layout) { + case kSparseCsr: + return kSparseCsc; + case kSparseCsc: + return kSparseCsr; + case kSparseBsr: + return kSparseBsc; + case kSparseBsc: + return kSparseBsr; + default: + TORCH_CHECK(false, "Not a sparse compressed layout:", layout); + return kSparseCsr; + } +} + } // namespace sparse_csr } // namespace at diff --git a/aten/src/ATen/SparseTensorImpl.cpp b/aten/src/ATen/SparseTensorImpl.cpp index 03999da97312..36c93b706db8 100644 --- a/aten/src/ATen/SparseTensorImpl.cpp +++ b/aten/src/ATen/SparseTensorImpl.cpp @@ -46,7 +46,7 @@ SparseTensorImpl::SparseTensorImpl(at::DispatchKeySet key_set, const caffe2::Typ is_non_overlapping_and_dense_ = false; set_storage_access_should_throw(); - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); + set_custom_sizes_strides(SizesStridesPolicy::CustomStrides); } // Destructor doesn't call release_resources because it's @@ -89,16 +89,16 @@ void SparseTensorImpl::set_indices_and_values_unsafe(const Tensor& indices, cons TORCH_CHECK(indices.options().backend() == values.options().backend(), "backend of indices (", indices.options().backend(), ") must match backend of values (", values.options().backend(), ")"); TORCH_CHECK(!indices.is_cuda() || indices.get_device() == values.get_device(), "device of indices (", indices.get_device(), ") must match device of values (", values.get_device(), ")"); - TORCH_CHECK(indices.dim() == 2, "indices must be sparse_dim x nnz, but got: ", indices.sizes()); - TORCH_CHECK(indices.size(1) == values.size(0), "indices and values must have same nnz, but got nnz from indices: ", indices.size(1), ", nnz from values: ", values.size(0)); - TORCH_CHECK(indices.size(0) == sparse_dim_, "indices has incorrect first dimension, expected ", sparse_dim_, ", got ", indices.size(0)); + TORCH_CHECK(indices.dim() == 2, "indices must be sparse_dim x nnz, but got: ", indices.sym_sizes()); + TORCH_CHECK(indices.sym_size(1) == values.sym_size(0), "indices and values must have same nnz, but got nnz from indices: ", indices.sym_size(1), ", nnz from values: ", values.sym_size(0)); + TORCH_CHECK(indices.sym_size(0) == sparse_dim_, "indices has incorrect first dimension, expected ", sparse_dim_, ", got ", indices.sym_size(0)); TORCH_CHECK(values.dim() == dense_dim_ + 1, "values has incorrect number of dimensions, expected ", dense_dim_ + 1, ", got ", values.dim()); - auto dense_size_original = sizes().slice(sparse_dim_); - std::vector expected_values_size_vec = {values.size(0)}; + auto dense_size_original = sym_sizes().slice(sparse_dim_); + std::vector expected_values_size_vec = {values.sym_size(0)}; expected_values_size_vec.insert(expected_values_size_vec.end(), dense_size_original.begin(), dense_size_original.end()); - IntArrayRef expected_values_size(expected_values_size_vec); - auto new_values_size = values.sizes(); + SymIntArrayRef 
expected_values_size(expected_values_size_vec); + auto new_values_size = values.sym_sizes(); TORCH_CHECK( std::equal(expected_values_size.begin(), expected_values_size.end(), new_values_size.begin()), "values has incorrect size, expected ", expected_values_size, ", got ", new_values_size @@ -109,7 +109,7 @@ void SparseTensorImpl::set_indices_and_values_unsafe(const Tensor& indices, cons AT_ASSERT(device() == values_.device()); AT_ASSERT(values_.device() == indices_.device()); - coalesced_ = false; + coalesced_ = sym_nnz() < 2; } diff --git a/aten/src/ATen/SparseTensorImpl.h b/aten/src/ATen/SparseTensorImpl.h index 9bbe3b86bc09..d90734100ca6 100644 --- a/aten/src/ATen/SparseTensorImpl.h +++ b/aten/src/ATen/SparseTensorImpl.h @@ -9,6 +9,7 @@ #include #else #include +#include #endif namespace at { @@ -51,6 +52,10 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { int64_t nnz() const { return values_.size(0); } + + c10::SymInt sym_nnz() const { + return values_.sym_size(0); + } int64_t sparse_dim() const { return sparse_dim_; } @@ -85,7 +90,7 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { TORCH_CHECK( !has_symbolic_sizes_strides_, "raw_resize_ called on tensor with symbolic shape") - sizes_and_strides_.set_sizes(size); + set_sizes_and_strides(size, std::vector(size.size())); sparse_dim_ = sparse_dim; dense_dim_ = dense_dim; refresh_numel(); @@ -116,7 +121,8 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { // 4. When we attempt to shrink the size of any of the sparse dimensions on a // non-empty sparse tensor (this could make some of the stored indices // out-of-bound and thus unsafe). - void resize_(int64_t sparse_dim, int64_t dense_dim, IntArrayRef size) { + template + void _resize_(int64_t sparse_dim, int64_t dense_dim, ArrayRef size) { TORCH_CHECK( allow_tensor_metadata_change(), "resize_ ", @@ -160,7 +166,7 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { bool shrinking_sparse_dims = false; bool shrinking_dense_dim = false; - auto sparse_size_original = sizes().slice(0, sparse_dim); + auto sparse_size_original = generic_sizes().slice(0, sparse_dim); auto sparse_size_new = size.slice(0, sparse_dim); for (const auto i : c10::irange(sparse_dim)) { if (sparse_size_new[i] < sparse_size_original[i]) { @@ -168,7 +174,7 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { break; } } - auto dense_size_original = sizes().slice(sparse_dim); + auto dense_size_original = generic_sizes().slice(sparse_dim); auto dense_size_new = size.slice(sparse_dim); for (const auto i : c10::irange(dense_dim)) { if (dense_size_new[i] < dense_size_original[i]) { @@ -196,8 +202,7 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { alt_options_msg); } - IntArrayRef sizes_and_strides = - asIntArrayRefSlow(sizes_and_strides_.sizes_arrayref()); + auto sizes_and_strides = generic_sizes(); const bool size_equals_sizes = std::equal( size.begin(), size.end(), @@ -205,23 +210,34 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { sizes_and_strides.end()); if ((!size_equals_sizes) || (sparse_dim != sparse_dim_) || (dense_dim != dense_dim_)) { - auto nnz = values().size(0); - std::vector values_size = {nnz}; + auto nnz = at::symint::sizes(values())[0]; + std::vector values_size = {nnz}; auto dense_size = size.slice(sparse_dim); values_size.insert( values_size.end(), dense_size.begin(), dense_size.end()); - values_.resize_(values_size); - indices_.resize_({sparse_dim, nnz}); + at::symint::resize_(values_, values_size); + at::symint::resize_(indices_, {T(sparse_dim), nnz}); 
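// Illustrative standalone sketch (not part of this patch) of why the
// SparseTensorImpl resize above becomes a template _resize_<T>: the same
// shape arithmetic builds the new `values` and `indices` sizes whether T is
// a concrete int64_t or a symbolic integer type. Shown here with int64_t
// only; the struct and function names are invented for illustration.
#include <cassert>
#include <cstdint>
#include <vector>

template <typename T>
struct SparseResizePlan {
  std::vector<T> values_size;  // {nnz, dense_size...}
  std::vector<T> indices_size; // {sparse_dim, nnz}
};

template <typename T>
SparseResizePlan<T> plan_resize(const std::vector<T>& new_size,
                                int64_t sparse_dim,
                                T nnz) {
  SparseResizePlan<T> plan;
  plan.values_size.push_back(nnz);
  plan.values_size.insert(plan.values_size.end(),
                          new_size.begin() + sparse_dim, new_size.end());
  plan.indices_size = {T(sparse_dim), nnz};
  return plan;
}

int main() {
  // A COO tensor resized to shape [10, 10, 3] with sparse_dim = 2, nnz = 7:
  auto plan = plan_resize<int64_t>({10, 10, 3}, /*sparse_dim=*/2, /*nnz=*/7);
  assert((plan.values_size == std::vector<int64_t>{7, 3}));
  assert((plan.indices_size == std::vector<int64_t>{2, 7}));
  return 0;
}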
} if (!size_equals_sizes) { - sizes_and_strides_.set_sizes(size); + set_sizes_and_strides(size, std::vector(size.size())); } sparse_dim_ = sparse_dim; dense_dim_ = dense_dim; refresh_numel(); } + void resize_(int64_t sparse_dim, int64_t dense_dim, ArrayRef size) { + return _resize_(sparse_dim, dense_dim, size); + } + + void resize_( + int64_t sparse_dim, + int64_t dense_dim, + ArrayRef size) { + return _resize_(sparse_dim, dense_dim, size); + } + // NOTE: this function will resize the sparse tensor and also set `indices` // and `values` to empty. void resize_and_clear_( @@ -244,7 +260,7 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { "), but got ", size.size()); - sizes_and_strides_.set_sizes(size); + set_sizes_and_strides(size, std::vector(size.size())); sparse_dim_ = sparse_dim; dense_dim_ = dense_dim; @@ -275,6 +291,9 @@ struct TORCH_API SparseTensorImpl : public TensorImpl { AT_ASSERT(new_nnz <= nnz()); indices_ = indices_.narrow(1, 0, new_nnz); values_ = values_.narrow(0, 0, new_nnz); + if (new_nnz < 2) { + coalesced_ = true; + } } // Takes indices and values and directly puts them into the sparse tensor, no diff --git a/aten/src/ATen/TensorGeometry.cpp b/aten/src/ATen/TensorGeometry.cpp index 164a7b279129..7dbc5973b7fe 100644 --- a/aten/src/ATen/TensorGeometry.cpp +++ b/aten/src/ATen/TensorGeometry.cpp @@ -6,10 +6,11 @@ namespace at { // See TensorGeometry.h on why this is useful now that we cache is_contiguous. -bool geometry_is_contiguous(IntArrayRef sizes, IntArrayRef strides) { +template +bool _geometry_is_contiguous(ArrayRef sizes, ArrayRef strides) { assert(!overflows(sizes.size())); auto dim = static_cast(sizes.size()); - int64_t expected_stride = 1; + T expected_stride = 1; bool contig_if_nonempty = true; for (int64_t i = dim - 1; i >= 0; i--) { if (sizes[i] == 0) { @@ -25,11 +26,15 @@ bool geometry_is_contiguous(IntArrayRef sizes, IntArrayRef strides) { return contig_if_nonempty; } +bool geometry_is_contiguous(IntArrayRef sizes, IntArrayRef strides) { + return _geometry_is_contiguous(sizes, strides); +} + bool TensorGeometry::is_contiguous() const { if (numel_ == 0) { return true; } - return at::geometry_is_contiguous(sizes_, strides_); + return at::_geometry_is_contiguous(sizes_, strides_); } } // namespace at diff --git a/aten/src/ATen/TensorGeometry.h b/aten/src/ATen/TensorGeometry.h index e89a666a8c56..110f2356c3a5 100644 --- a/aten/src/ATen/TensorGeometry.h +++ b/aten/src/ATen/TensorGeometry.h @@ -15,10 +15,14 @@ TORCH_API bool geometry_is_contiguous(IntArrayRef sizes, IntArrayRef strides); struct TORCH_API TensorGeometry { TensorGeometry() : storage_offset_(0) {} - explicit TensorGeometry(IntArrayRef sizes) - : sizes_(sizes.vec()), strides_(sizes.size()), storage_offset_(0) { + explicit TensorGeometry(c10::SymIntArrayRef sizes) + : sizes_(sizes.vec()), + strides_(sizes.size()), + storage_offset_(0), + has_symbolic_sizes_strides_( + !c10::asIntArrayRefSlowOpt(sizes).has_value()) { int64_t dim = sizes.size(); - int64_t expected_stride = 1; + c10::SymInt expected_stride = 1; for (int64_t i = dim - 1; i >= 0; i--) { strides_[i] = expected_stride; expected_stride *= sizes_[i]; @@ -27,10 +31,12 @@ struct TORCH_API TensorGeometry { } explicit TensorGeometry(const TensorBase& t) - : sizes_(t.sizes().vec()), - strides_(t.strides().vec()), - storage_offset_(t.storage_offset()), - numel_(t.numel()) {} + : sizes_(t.sym_sizes().vec()), + strides_(t.sym_strides().vec()), + storage_offset_(t.sym_storage_offset()), + numel_(t.sym_numel()), + has_symbolic_sizes_strides_( + 
t.unsafeGetTensorImpl()->has_symbolic_sizes_strides()) {} // true if the tensor is contiguous bool is_contiguous() const; @@ -38,24 +44,52 @@ struct TORCH_API TensorGeometry { int64_t dim() const { return sizes_.size(); } + int64_t size(int64_t dim) const { + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); dim = c10::maybe_wrap_dim(dim, this->dim()); - return sizes_.at(static_cast(dim)); + return sizes_.at(static_cast(dim)).as_int_unchecked(); } - IntArrayRef sizes() const { - return IntArrayRef{sizes_}; + c10::IntArrayRef sizes() const { + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); + return c10::asIntArrayRefUnchecked(sizes_); } int64_t stride(int64_t dim) const { + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); dim = c10::maybe_wrap_dim(dim, this->dim()); - return strides_.at(static_cast(dim)); + return strides_.at(static_cast(dim)).as_int_unchecked(); } - IntArrayRef strides() const { - return IntArrayRef{strides_}; + c10::IntArrayRef strides() const { + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); + return c10::asIntArrayRefUnchecked(strides_); } int64_t storage_offset() const { - return storage_offset_; + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); + return storage_offset_.as_int_unchecked(); } int64_t numel() const { + TORCH_INTERNAL_ASSERT(!has_symbolic_sizes_strides_); + return numel_.as_int_unchecked(); + } + + c10::SymInt sym_size(int64_t dim) const { + dim = c10::maybe_wrap_dim(dim, this->dim()); + return sizes_.at(static_cast(dim)); + } + c10::SymIntArrayRef sym_sizes() const { + return sizes_; + } + c10::SymInt sym_stride(int64_t dim) const { + dim = c10::maybe_wrap_dim(dim, this->dim()); + return strides_.at(static_cast(dim)); + } + c10::SymIntArrayRef sym_strides() const { + return strides_; + } + c10::SymInt sym_storage_offset() const { + return storage_offset_; + } + c10::SymInt sym_numel() const { return numel_; } @@ -80,10 +114,12 @@ struct TORCH_API TensorGeometry { return r; } - std::vector sizes_; - std::vector strides_; - int64_t storage_offset_; - int64_t numel_; + private: + std::vector sizes_; + std::vector strides_; + c10::SymInt storage_offset_; + c10::SymInt numel_; + bool has_symbolic_sizes_strides_; }; } // namespace at diff --git a/aten/src/ATen/TensorIndexing.h b/aten/src/ATen/TensorIndexing.h index f1eedfa83ef9..19333741a981 100644 --- a/aten/src/ATen/TensorIndexing.h +++ b/aten/src/ATen/TensorIndexing.h @@ -1,15 +1,22 @@ #pragma once #include -#include #include +#include #include +#include #include #include -// TODO: try to remove this -// There is some back story, see https://github.com/pytorch/pytorch/issues/48684 +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#endif #include @@ -211,7 +218,7 @@ static inline Tensor applySlice( int64_t step, bool disable_slice_optimization, const at::Device& self_device, - const c10::optional& self_sizes) { + const c10::optional& self_sizes) { // TODO: implement negative step TORCH_CHECK_VALUE(step > 0, "step must be greater than zero"); @@ -220,10 +227,10 @@ static inline Tensor applySlice( // Skip this optimization if we are tracing, as the trace may be polymorphic // over the shape of the `self` tensor, and we still want to record // the slice. - int64_t length = (self_device == at::kCPU || self_device == at::kCUDA) + SymInt length = (self_device == at::kCPU || self_device == at::kCUDA) ? 
(*self_sizes)[dim] - : self.size(dim); - if (!disable_slice_optimization && start == 0 && stop == length && + : self.sym_size(dim); + if (!disable_slice_optimization && start == 0 && length == stop && step == 1) { return self; } @@ -237,7 +244,7 @@ static inline Tensor applySelect( int64_t index, int64_t real_dim, const at::Device& /*self_device*/, - const c10::optional& self_sizes) { + const c10::optional& self_sizes) { // See NOTE [nested tensor size for indexing] if (self_sizes.has_value()) { TORCH_CHECK_INDEX( @@ -245,9 +252,9 @@ static inline Tensor applySelect( "invalid index of a 0-dim tensor. ", "Use `tensor.item()` in Python or `tensor.item()` in C++ to convert a 0-dim tensor to a number"); - int64_t size = (*self_sizes)[dim]; + auto size = (*self_sizes)[dim]; TORCH_CHECK_INDEX( - index >= -size && index < size, + size >= -index && size > index, "index ", index, " is out of bounds for dimension ", @@ -383,7 +390,7 @@ static inline Tensor scalarToTensor( // To match numpy semantics: // As a special case for backwards compatibility, // strip away unit dimensions from the left of 'src' -static inline IntArrayRef slicePrefix1sSize(const IntArrayRef& sizes) { +static inline SymIntArrayRef slicePrefix1sSize(const SymIntArrayRef& sizes) { size_t first_non1_src = sizes.size(); for (const auto i : c10::irange(sizes.size())) { if (sizes[i] != 1) { @@ -396,7 +403,7 @@ static inline IntArrayRef slicePrefix1sSize(const IntArrayRef& sizes) { } static inline void copy_to(const Tensor& dst, const Tensor& src) { - if (dst.sizes().equals(src.sizes())) { + if (dst.sym_sizes().equals(src.sym_sizes())) { // A shortcut to avoid generating hard-coded constant sizes during tracing. // This is not a perfect solution: when src & dst have different shapes, // constants will still appear. Users can workaround that case by @@ -407,7 +414,7 @@ static inline void copy_to(const Tensor& dst, const Tensor& src) { dst.fill_(src); return; } - auto src_view = src.view(slicePrefix1sSize(src.sizes())); + auto src_view = src.view_symint(slicePrefix1sSize(src.sym_sizes())); c10::MaybeOwned b_src = expand_inplace(dst, src_view, "setitem"); dst.copy_(*b_src); } @@ -424,7 +431,7 @@ static inline Tensor handleDimInMultiDimIndexing( std::vector& outIndices, bool disable_slice_optimization, const at::Device& original_tensor_device, - const c10::optional& prev_dim_result_sizes) { + const c10::optional& prev_dim_result_sizes) { if (index.is_integer()) { return impl::applySelect( prev_dim_result, @@ -508,7 +515,7 @@ static inline Tensor applySlicing( std::vector& outIndices, bool disable_slice_optimization, const at::Device& self_device, - const c10::optional& self_sizes) { + const c10::optional& self_sizes) { int64_t dim = 0; int64_t specified_dims = impl::count_specified_dimensions(indices); @@ -524,9 +531,9 @@ static inline Tensor applySlicing( for (const auto i : c10::irange(indices.size())) { auto& obj = indices[i]; // See NOTE [nested tensor size for indexing] - c10::optional result_sizes = result.is_nested() - ? c10::optional(c10::nullopt) - : c10::optional(result.sizes()); + c10::optional result_sizes = result.is_nested() + ? 
c10::optional(c10::nullopt) + : c10::optional(result.sym_sizes()); result = handleDimInMultiDimIndexing( /*prev_dim_result=*/result, /*original_tensor=*/self, @@ -600,9 +607,9 @@ static inline Tensor get_item( // nested tensor does not have a size (yet) so for now we represent its size // as null may need to be changed after we reach a better solution for nested // tensor size - c10::optional self_sizes = self.is_nested() - ? c10::optional(c10::nullopt) - : c10::optional(self.sizes()); + c10::optional self_sizes = self.is_nested() + ? c10::optional(c10::nullopt) + : c10::optional(self.sym_sizes()); // handle simple types: integers, slices, none, ellipsis, bool if (indices.size() == 1) { @@ -663,7 +670,7 @@ static inline void set_item( const Tensor& value, bool disable_slice_optimization = false) { at::Device self_device = self.device(); - IntArrayRef self_sizes = self.sizes(); + SymIntArrayRef self_sizes = self.sym_sizes(); // handle simple types: integers, slices, ellipsis, bool if (indices.size() == 1) { @@ -713,11 +720,11 @@ static inline void set_item( return; } - IntArrayRef valueSizes = value.sizes(); - IntArrayRef slicedValueSizes = slicePrefix1sSize(valueSizes); + SymIntArrayRef valueSizes = value.sym_sizes(); + SymIntArrayRef slicedValueSizes = slicePrefix1sSize(valueSizes); Tensor valuesSliced; if (!valueSizes.equals(slicedValueSizes)) { - valuesSliced = value.view(slicedValueSizes); + valuesSliced = value.view_symint(slicedValueSizes); } else { valuesSliced = value; } diff --git a/aten/src/ATen/TensorIterator.cpp b/aten/src/ATen/TensorIterator.cpp index a4715a2caabb..7e86163f1ca4 100644 --- a/aten/src/ATen/TensorIterator.cpp +++ b/aten/src/ATen/TensorIterator.cpp @@ -431,7 +431,7 @@ void TensorIteratorBase::compute_types(const TensorIteratorConfig& config) { } // Computes a common dtype, if needed - if (has_different_input_dtypes && config.promote_inputs_to_common_dtype_) { + if ((has_different_input_dtypes || all_ops_are_scalars_) && config.promote_inputs_to_common_dtype_) { common_dtype_ = compute_common_dtype(); } @@ -1237,11 +1237,12 @@ void TensorIteratorBase::compute_shape(const TensorIteratorConfig& config) { shape_ = infer_size_dimvector(shape_, shape); } } + all_ops_are_scalars_ = !has_tensors; } void TensorIteratorBase::compute_strides(const TensorIteratorConfig& config) { for (auto& op : operands_) { - if (op.tensor_base().defined()) { + if (op.tensor_base().defined() && !op.will_resize) { IntArrayRef original_shape = config.static_shape_ ? shape_ : op.tensor_base().sizes(); auto original_stride = op.tensor_base().strides(); auto element_size_in_bytes = op.tensor_base().element_size(); @@ -1491,10 +1492,19 @@ void TensorIteratorBase::build(TensorIteratorConfig& config) { if (is_meta_) return; + auto has_storage = true; + for (auto& op : operands_) { + has_storage &= op.tensor_base().has_storage(); + } + auto privateuse1_without_storage = + common_device_.type() == DeviceType::PrivateUse1 && + !has_storage; + // XLA and lazy tensors don't have storage, so they don't have an underlying data pointer. // Nothing beyond this point is important for meta functions, so it's fine to exit early here. // Extend the condition to ORT tesnors as ORT tensors also don't have storage. 
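// Illustrative standalone sketch (not part of this patch) of the
// slicePrefix1sSize() behaviour used by set_item above: to match numpy
// semantics, unit dimensions are stripped from the *left* of the value's
// shape before it is broadcast into the destination. Plain int64_t vectors
// stand in for SymIntArrayRef; the function name is invented.
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

Shape slice_prefix_ones(const Shape& sizes) {
  size_t first_non1 = sizes.size();
  for (size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] != 1) {
      first_non1 = i;
      break;
    }
  }
  return Shape(sizes.begin() + first_non1, sizes.end());
}

int main() {
  assert((slice_prefix_ones({1, 1, 3, 4}) == Shape{3, 4}));
  assert((slice_prefix_ones({3, 1, 4}) == Shape{3, 1, 4})); // only leading 1s go
  assert((slice_prefix_ones({1, 1}) == Shape{}));           // all-ones collapses away
  return 0;
}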
- if (common_device_.type() == DeviceType::XLA || + if (privateuse1_without_storage || + common_device_.type() == DeviceType::XLA || common_device_.type() == DeviceType::IPU || common_device_.type() == DeviceType::Lazy || common_device_.type() == DeviceType::ORT || diff --git a/aten/src/ATen/TensorIterator.h b/aten/src/ATen/TensorIterator.h index fdf86cbba6af..31ae65466870 100644 --- a/aten/src/ATen/TensorIterator.h +++ b/aten/src/ATen/TensorIterator.h @@ -473,6 +473,10 @@ struct TORCH_API TensorIteratorBase : public impl::MetaBase { } bool has_contiguous_first_dim() const { + if (ndim() == 0) { + return true; + } + int num_tensors = ntensors(); for (const auto i : c10::irange(num_tensors)) { if (strides(i)[0] != element_size(i)) { @@ -655,9 +659,12 @@ struct TORCH_API TensorIteratorBase : public impl::MetaBase { /// in operands_). int num_outputs_ = 0; - /// Whether or not all operands have the same shape. Having all the same - /// shape affects whether or not the iterator is eligible for fast setup. + /// Whether or not all operands have the same shape and are 1d+. Having all + /// the same shape affects whether or not the iterator is eligible for fast + /// setup. bool all_ops_same_shape_ = false; + /// Whether or not all operands are 0d, this affects type promotion + bool all_ops_are_scalars_ = false; /// The "computation" dtype of TensorIterator, specifying what the dtype /// we will do the internal computation in TensorIterator. Typically, diff --git a/aten/src/ATen/TensorMeta.h b/aten/src/ATen/TensorMeta.h index 97124611ca13..07631c3552fd 100644 --- a/aten/src/ATen/TensorMeta.h +++ b/aten/src/ATen/TensorMeta.h @@ -71,6 +71,7 @@ namespace impl { struct TORCH_API MetaBase { virtual const Tensor& maybe_get_output(int64_t output_idx) = 0; + // Note: [set_output_*] // See: https://github.com/pytorch/pytorch/issues/69813 // Whenever defining the output properties in the META function of a // structured kernel (what was usually done with `set_output`), use one of diff --git a/aten/src/ATen/TensorSubclassLikeUtils.h b/aten/src/ATen/TensorSubclassLikeUtils.h index 5c01ce979040..44b422324590 100644 --- a/aten/src/ATen/TensorSubclassLikeUtils.h +++ b/aten/src/ATen/TensorSubclassLikeUtils.h @@ -1,5 +1,13 @@ #pragma once -#include +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif namespace at { @@ -23,7 +31,9 @@ namespace at { // or returning a regular non-Tensor-subclass Tensor! 
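// Illustrative standalone sketch (not part of this patch) of the
// has_contiguous_first_dim() check patched above: every operand's innermost
// stride (in bytes) must equal its element size, and a 0-d iteration is
// trivially contiguous, which is exactly the early return the patch adds.
// Plain structs stand in for TensorIterator operands; names are invented.
#include <cassert>
#include <cstdint>
#include <vector>

struct Operand {
  std::vector<int64_t> byte_strides; // iteration strides, in bytes
  int64_t element_size;              // size of one element, in bytes
};

bool has_contiguous_first_dim(const std::vector<Operand>& ops, int64_t ndim) {
  if (ndim == 0) {
    return true; // nothing to stride over: 0-d is trivially contiguous
  }
  for (const auto& op : ops) {
    if (op.byte_strides[0] != op.element_size) {
      return false;
    }
  }
  return true;
}

int main() {
  // Two float operands iterated over a contiguous 1-d loop.
  std::vector<Operand> ops = {{{4}, 4}, {{4}, 4}};
  assert(has_contiguous_first_dim(ops, /*ndim=*/1));

  // A scalar (0-d) computation: no first dim to check.
  assert(has_contiguous_first_dim({}, /*ndim=*/0));

  // A strided operand (e.g. every other element) breaks the fast path.
  ops[1].byte_strides = {8};
  assert(!has_contiguous_first_dim(ops, /*ndim=*/1));
  return 0;
}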
constexpr auto kFunctorchWrappedTensors = DispatchKeySet( - {DispatchKey::FuncTorchGradWrapper, DispatchKey::FuncTorchBatched}); + {DispatchKey::FuncTorchGradWrapper, + DispatchKey::FuncTorchBatched, + DispatchKey::Functionalize}); constexpr auto kTensorSubclassLike = kFunctorchWrappedTensors | @@ -39,16 +49,22 @@ constexpr auto kTensorSubclassLike = DispatchKeySet(BackendComponent::MetaBit); inline bool isTensorSubclassLike(const Tensor& tensor) { + if (c10::impl::dispatch_mode_enabled()) + return true; auto key_set = tensor.unsafeGetTensorImpl()->key_set(); return !(key_set & kTensorSubclassLike).empty(); } inline bool areAnyTensorSubclassLike(TensorList tensors) { + if (c10::impl::dispatch_mode_enabled()) + return true; return std::any_of(tensors.begin(), tensors.end(), isTensorSubclassLike); } inline bool areAnyOptionalTensorSubclassLike( const c10::List>& tensors) { + if (c10::impl::dispatch_mode_enabled()) + return true; return std::any_of( tensors.begin(), tensors.end(), [](const optional& opt_tensor) { return ( @@ -56,4 +72,16 @@ inline bool areAnyOptionalTensorSubclassLike( }); } +// Helper function to deal testing truthfulness of a scalar tensor +// in a Composite Compliant manner. +// NOTE: This function expects a scalar tensor of boolean dtype. +// Eg. +// Non-Composite Compliant Pattern : (t == 0).all().item() +// Composite Compliant Patter : is_salar_tensor_true((t == 0).all()) +inline bool is_scalar_tensor_true(const Tensor& t) { + TORCH_INTERNAL_ASSERT(t.dim() == 0) + TORCH_INTERNAL_ASSERT(t.scalar_type() == kBool) + return at::equal(t, t.new_ones({}, t.options())); +} + } // namespace at diff --git a/aten/src/ATen/TensorUtils.cpp b/aten/src/ATen/TensorUtils.cpp index 7fbddd7a3482..4f211df680ec 100644 --- a/aten/src/ATen/TensorUtils.cpp +++ b/aten/src/ATen/TensorUtils.cpp @@ -75,6 +75,14 @@ void checkSize(CheckedFrom c, const TensorGeometryArg& t, IntArrayRef sizes) { " for ", t, " (while checking arguments for ", c, ")"); } +void checkSize_symint(CheckedFrom c, const TensorGeometryArg& t, c10::SymIntArrayRef sizes) { + checkDim(c, t, sizes.size()); + TORCH_CHECK( + t->sym_sizes().equals(sizes), + "Expected tensor of size ", sizes, ", but got tensor of size ", t->sizes(), + " for ", t, " (while checking arguments for ", c, ")"); +} + void checkSize(CheckedFrom c, const TensorGeometryArg& t, int64_t dim, int64_t size) { TORCH_CHECK( t->size(dim) == size, @@ -83,6 +91,14 @@ void checkSize(CheckedFrom c, const TensorGeometryArg& t, int64_t dim, int64_t s " (while checking arguments for ", c, ")"); } +void checkSize_symint(CheckedFrom c, const TensorGeometryArg& t, int64_t dim, c10::SymInt size) { + TORCH_CHECK( + t->sym_size(dim) == size, + "Expected tensor to have size ", size, " at dimension ", dim, + ", but got size ", t->size(dim), " for ", t, + " (while checking arguments for ", c, ")"); +} + void checkAllSame(CheckedFrom c, ArrayRef tensors, void(*fn)(CheckedFrom, const TensorArg&, const TensorArg&)) { const TensorArg* t0 = nullptr; for (auto& t : tensors) { @@ -310,12 +326,12 @@ std::vector defaultStrides(IntArrayRef sizes) { // templatized for DimVector and IntArrayRef use cases, // see overloads of computeStride() below. 
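// Illustrative standalone sketch (not part of this patch) of the simplest
// case handled by the computeStride() overloads below: when the source is
// contiguous, the strides of a reshaped view are just the contiguous strides
// of the new shape (the general algorithm additionally matches "chunks" of
// dims whose element counts agree). int64_t stands in for both the plain
// and the SymInt instantiations; the function name is invented.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

Shape contiguous_strides(const Shape& shape) {
  Shape strides(shape.size());
  int64_t expected = 1;
  for (int64_t d = static_cast<int64_t>(shape.size()) - 1; d >= 0; --d) {
    strides[d] = expected;
    expected *= std::max<int64_t>(shape[d], 1); // size-0 dims still get a stride
  }
  return strides;
}

int main() {
  // A contiguous [2, 3, 4] tensor has strides [12, 4, 1]...
  assert((contiguous_strides({2, 3, 4}) == Shape{12, 4, 1}));
  // ...and viewing it as [6, 4] yields the contiguous strides of [6, 4].
  assert((contiguous_strides({6, 4}) == Shape{4, 1}));
  return 0;
}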
// -template +template inline c10::optional computeStride_impl( - IntArrayRef oldshape, - IntArrayRef oldstride, + const NewShapeVec& oldshape, + const NewShapeVec& oldstride, const NewShapeVec& newshape, - ResultVec toResult(const IntArrayRef&) + ResultVec toResult(const NewShapeVec&) ) { if (oldshape.empty()) { return ResultVec(newshape.size(), 1); @@ -326,7 +342,7 @@ inline c10::optional computeStride_impl( // we use the stride as if it were computed via resize. // This could perhaps be combined with the below code, but the complexity // didn't seem worth it. - const int64_t numel = c10::multiply_integers(oldshape); + const Numel numel = c10::multiply_integers(oldshape); if (numel == 0 && oldshape.equals(newshape)) { return toResult(oldstride); } @@ -338,7 +354,7 @@ inline c10::optional computeStride_impl( newstride[view_d] = 1; } else { newstride[view_d] = - std::max(newshape[view_d+1], 1) * newstride[view_d+1]; + std::max(newshape[view_d+1], Numel(1)) * newstride[view_d+1]; } } return newstride; @@ -346,10 +362,10 @@ inline c10::optional computeStride_impl( int64_t view_d = (int64_t)newshape.size() - 1; // stride for each subspace in the chunk - int64_t chunk_base_stride = oldstride.back(); + Numel chunk_base_stride = oldstride.back(); // numel in current chunk - int64_t tensor_numel = 1; - int64_t view_numel = 1; + Numel tensor_numel = 1; + Numel view_numel = 1; for (int64_t tensor_d = oldshape.size() - 1; tensor_d >= 0; tensor_d--) { tensor_numel *= oldshape[tensor_d]; // if end of tensor size chunk, check view @@ -383,7 +399,15 @@ c10::optional> computeStride( IntArrayRef oldstride, IntArrayRef newshape) { auto toResult = [](const IntArrayRef& a) { return a.vec(); }; - return computeStride_impl, IntArrayRef>(oldshape, oldstride, newshape, toResult); + return computeStride_impl, IntArrayRef, int64_t>(oldshape, oldstride, newshape, toResult); +} + +c10::optional computeStride( + c10::SymIntArrayRef oldshape, + c10::SymIntArrayRef oldstride, + c10::SymIntArrayRef newshape) { + auto toResult = [](const SymIntArrayRef& a) { return SymDimVector(a); }; + return computeStride_impl(oldshape, oldstride, newshape, toResult); } c10::optional computeStride( @@ -391,7 +415,7 @@ c10::optional computeStride( IntArrayRef oldstride, const DimVector& newshape) { auto toResult = [](const IntArrayRef& a) { return DimVector(a); }; - return computeStride_impl(oldshape, oldstride, newshape, toResult); + return computeStride_impl(oldshape, oldstride, newshape, toResult); } } // namespace detail diff --git a/aten/src/ATen/TensorUtils.h b/aten/src/ATen/TensorUtils.h index 4bfe87c9de44..8f5d2687c9c3 100644 --- a/aten/src/ATen/TensorUtils.h +++ b/aten/src/ATen/TensorUtils.h @@ -85,11 +85,20 @@ TORCH_API void checkSize( CheckedFrom c, const TensorGeometryArg& t, IntArrayRef sizes); +TORCH_API void checkSize_symint( + CheckedFrom c, + const TensorGeometryArg& t, + c10::SymIntArrayRef sizes); TORCH_API void checkSize( CheckedFrom c, const TensorGeometryArg& t, int64_t dim, int64_t size); +TORCH_API void checkSize_symint( + CheckedFrom c, + const TensorGeometryArg& t, + int64_t dim, + c10::SymInt size); TORCH_API void checkNumel( CheckedFrom c, const TensorGeometryArg& t, @@ -157,6 +166,11 @@ TORCH_API c10::optional> computeStride( IntArrayRef oldstride, IntArrayRef newshape); +TORCH_API c10::optional computeStride( + c10::SymIntArrayRef oldshape, + c10::SymIntArrayRef oldstride, + c10::SymIntArrayRef newshape); + TORCH_API c10::optional computeStride( IntArrayRef oldshape, IntArrayRef oldstride, diff --git 
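Assuming the new SymInt overload of computeStride lands in at::detail, as the namespace-closing comment in the hunk suggests, a caller could look roughly like the sketch below (strides_if_viewable is a hypothetical name, and the exact optional/SymDimVector spelling of the return type follows the declaration above):

    #include <ATen/ATen.h>
    #include <ATen/TensorUtils.h>

    // Ask whether reshaping `self` to a symbolic `new_shape` is expressible as a
    // view, and if so with which (symbolic) strides; returns an optional
    // SymDimVector per the declaration above, nullopt when no view is possible.
    auto strides_if_viewable(const at::Tensor& self, c10::SymIntArrayRef new_shape) {
      return at::detail::computeStride(self.sym_sizes(), self.sym_strides(), new_shape);
    }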
a/aten/src/ATen/ThreadLocalState.cpp b/aten/src/ATen/ThreadLocalState.cpp index 8315ddad97b2..5c8214b7d882 100644 --- a/aten/src/ATen/ThreadLocalState.cpp +++ b/aten/src/ATen/ThreadLocalState.cpp @@ -6,6 +6,7 @@ #include #include +#include namespace at { @@ -14,18 +15,24 @@ ThreadLocalState::ThreadLocalState() debug_info_(c10::ThreadLocalDebugInfo::current()), functorch_tls_(functorch::getCopyOfFuncTorchTLS()), autograd_tls_(c10::AutogradState::get_tls_state()), - python_torch_function_state_(at::impl::PythonTorchFunctionTLS::get_state()) { + python_dispatcher_state_(c10::impl::PythonDispatcherTLS::get_state()), + python_torch_function_state_(at::impl::PythonTorchFunctionTLS::get_state()), + functionalization_reapply_views_state_(at::functionalization::impl::getFunctionalizationReapplyViewsTLS()) { rf_tls_ = at::get_record_function_tls_(); - saved_tensors_default_hooks_ = at::SavedTensorDefaultHooks::get_stack(); + saved_tensors_default_hooks_state_ = at::SavedTensorDefaultHooks::get_tls_state(); - torch_dispatch_mode_state_ = at::impl::TorchDispatchModeTLS::get_state(); + torch_dispatch_mode_state_ = c10::impl::TorchDispatchModeTLS::get_state(); } void ThreadLocalState::set_grad_mode(bool enabled) { autograd_tls_.set_grad_mode(enabled); } +void ThreadLocalState::set_multithreading_enabled(bool enabled) { + autograd_tls_.set_multithreading_enabled(enabled); +} + /* static */ void ThreadLocalState::setThreadLocalState( const ThreadLocalState& state) { @@ -33,19 +40,23 @@ void ThreadLocalState::setThreadLocalState( // restore the dispatch key set TLS at the same time. c10::AutogradState::set_tls_state(state.autograd_tls_); - at::impl::TorchDispatchModeTLS::set_state(state.torch_dispatch_mode_state_); + c10::impl::TorchDispatchModeTLS::set_state(state.torch_dispatch_mode_state_); at::impl::PythonTorchFunctionTLS::set_state(state.python_torch_function_state_); at::set_record_function_tls_(state.rf_tls_); - at::SavedTensorDefaultHooks::set_stack(state.saved_tensors_default_hooks_); + at::SavedTensorDefaultHooks::set_tls_state(state.saved_tensors_default_hooks_state_); + + c10::impl::PythonDispatcherTLS::set_state(state.python_dispatcher_state_); c10::ThreadLocalDebugInfo::_forceCurrentDebugInfo(state.debug_info_); c10::impl::_force_tls_local_dispatch_key_set(state.dispatch_key_); functorch::setFuncTorchTLS(state.functorch_tls_); + + at::functionalization::impl::setFunctionalizationReapplyViewsTLS(state.functionalization_reapply_views_state_); } } // namespace at diff --git a/aten/src/ATen/ThreadLocalState.h b/aten/src/ATen/ThreadLocalState.h index a21ee6a674f3..0184cc9b82c4 100644 --- a/aten/src/ATen/ThreadLocalState.h +++ b/aten/src/ATen/ThreadLocalState.h @@ -9,8 +9,10 @@ #include #include -#include +#include #include +#include +#include namespace at { @@ -28,6 +30,12 @@ class TORCH_API ThreadLocalState { // autograd engine. void set_grad_mode(bool enabled); + // set_multithreading_enabled - force the value of the multithreading-enabled TLS in + // the current state object. This is used for example in the + // autograd engine.
+ void set_multithreading_enabled(bool enabled); + // Sets thread local variables in the current thread, // according to the thread boundary specified static void setThreadLocalState(const ThreadLocalState& state); @@ -55,13 +63,18 @@ class TORCH_API ThreadLocalState { AutogradState autograd_tls_; // TLS for enable_torch_dispatch_mode - std::shared_ptr torch_dispatch_mode_state_; + c10::impl::TorchDispatchModeTLS torch_dispatch_mode_state_; + + // TLS for enable_python_dispatcher + c10::impl::PyInterpreter* python_dispatcher_state_; // TLS for __torch_function__ (mode and disable_torch_function) at::impl::PythonTorchFunctionTLS python_torch_function_state_; // TLS for saved tensors default hooks - std::stack> saved_tensors_default_hooks_; + at::impl::SavedTensorDefaultHooksTLS saved_tensors_default_hooks_state_; + + bool functionalization_reapply_views_state_; friend class ThreadLocalStateGuard; }; diff --git a/aten/src/ATen/Utils.h b/aten/src/ATen/Utils.h index bbc235182f1e..61c9c58fa437 100644 --- a/aten/src/ATen/Utils.h +++ b/aten/src/ATen/Utils.h @@ -26,59 +26,6 @@ namespace at { TORCH_API int _crash_if_asan(int); -// TODO: This unwrapping code is ONLY used for TH bindings; once TH goes -// away, we can delete this function -static inline TensorImpl* checked_dense_tensor_unwrap( - const Tensor& expr, - const char* name, - int pos, - const char* api, - bool allowNull, - DeviceType device_type, - ScalarType scalar_type) { - if (allowNull && !expr.defined()) { - return nullptr; - } - if (expr.layout() != Layout::Strided) { - AT_ERROR( - "Expected dense tensor but got ", - expr.layout(), - " for argument #", - pos, - " '", - name, - "' in call to ", - api); - } - if (expr.device().type() != device_type) { - AT_ERROR( - "Expected object of device type ", - device_type, - " but got device type ", - expr.device().type(), - " for argument #", - pos, - " '", - name, - "' in call to ", - api); - } - if (expr.scalar_type() != scalar_type) { - AT_ERROR( - "Expected object of scalar type ", - scalar_type, - " but got scalar type ", - expr.scalar_type(), - " for argument #", - pos, - " '", - name, - "' in call to ", - api); - } - return expr.unsafeGetTensorImpl(); -} - // Converts a TensorList (i.e. ArrayRef to vector of TensorImpl*) // NB: This is ONLY used by legacy TH bindings, and ONLY used by cat. // Once cat is ported entirely to ATen this can be deleted! diff --git a/aten/src/ATen/VmapTransforms.cpp b/aten/src/ATen/VmapTransforms.cpp index 20c792f73709..71ef7a169026 100644 --- a/aten/src/ATen/VmapTransforms.cpp +++ b/aten/src/ATen/VmapTransforms.cpp @@ -1,5 +1,6 @@ #include #include +#include #include namespace at { @@ -188,7 +189,7 @@ static Tensor alignBatchDimsAtFront( // 4. Expand each physical tensor so that they have output batch size equal // to `batch_sizes` VmapPhysicalViewVec -MultiBatchVmapTransform::logicalToPhysical(TensorList logical_tensors) { +MultiBatchVmapTransform::logicalToPhysical(ITensorListRef logical_tensors) { // Figure out all of the collective vmap levels in `logical_tensors`. std::bitset collective_levels; for (const auto& logical_tensor : logical_tensors) { diff --git a/aten/src/ATen/VmapTransforms.h b/aten/src/ATen/VmapTransforms.h index 53e476e2243f..cece52dcbc41 100644 --- a/aten/src/ATen/VmapTransforms.h +++ b/aten/src/ATen/VmapTransforms.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace at { @@ -55,7 +56,7 @@ using VmapDimVector = SmallVector; // and returns a VmapPhysicalView on the tensor(s). 
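The ThreadLocalState members added above snapshot more TLS so that work handed to another thread observes the caller's settings. A sketch of the capture/restore pattern these fields support, assuming only the ThreadLocalState and ThreadLocalStateGuard types from this header (run_with_callers_tls is a hypothetical wrapper, not part of the patch):

    #include <ATen/ThreadLocalState.h>
    #include <functional>
    #include <thread>

    // Capture the caller's TLS (grad mode, dispatch modes, saved-tensor hooks,
    // the Python dispatcher and functionalization flags, ...) and re-apply it on
    // a worker thread before running the task.
    void run_with_callers_tls(std::function<void()> fn) {
      at::ThreadLocalState snapshot;  // captured on the launching thread
      std::thread worker([fn = std::move(fn), snapshot]() {
        at::ThreadLocalStateGuard guard(snapshot);  // restored on the worker
        fn();
      });
      worker.join();
    }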
struct TORCH_API MultiBatchVmapTransform { static VmapPhysicalView logicalToPhysical(const Tensor& logical_tensor); - static VmapPhysicalViewVec logicalToPhysical(TensorList logical_tensors); + static VmapPhysicalViewVec logicalToPhysical(ITensorListRef logical_tensors); }; // VmapTransform for operators that broadcast all inputs. diff --git a/aten/src/ATen/WrapDimUtils.h b/aten/src/ATen/WrapDimUtils.h index 13f8658c354d..b0bc583b90c2 100644 --- a/aten/src/ATen/WrapDimUtils.h +++ b/aten/src/ATen/WrapDimUtils.h @@ -8,22 +8,17 @@ namespace at { -static inline int64_t maybe_wrap_dim( - int64_t dim, - int64_t dim_post_expr, - bool wrap_scalar = true) { - // if dim_post_expr is 0 and wrap_scalar is true, then dim must be in the - // range [-1, 0]. This is a special case for scalar tensors and manifests in - // e.g. torch.sum(scalar_tensor, 0) Otherwise, dim should be in the range - // [-dim_post_expr, dim_post_expr-1]. - return c10::maybe_wrap_dim(dim, dim_post_expr, wrap_scalar); -} +// if dim_post_expr is 0 and wrap_scalar is true, then dim must be in the +// range [-1, 0]. This is a special case for scalar tensors and manifests in +// e.g. torch.sum(scalar_tensor, 0) Otherwise, dim should be in the range +// [-dim_post_expr, dim_post_expr-1]. +using c10::maybe_wrap_dim; -static inline int64_t maybe_wrap_dim(int64_t dim, TensorImpl* tensor) { +inline int64_t maybe_wrap_dim(int64_t dim, TensorImpl* tensor) { return maybe_wrap_dim(dim, tensor->dim()); } -static inline int64_t maybe_wrap_dim(int64_t dim, TensorList tensors) { +inline int64_t maybe_wrap_dim(int64_t dim, TensorList tensors) { if (tensors.size() == 0) { // can't wrap empty TensorList; rely on underlying implementation to throw // error if necessary. @@ -32,7 +27,7 @@ static inline int64_t maybe_wrap_dim(int64_t dim, TensorList tensors) { return maybe_wrap_dim(dim, tensors[0].dim()); } -static inline int64_t maybe_wrap_dim( +inline int64_t maybe_wrap_dim( int64_t dim, const std::vector>& tensor_sizes) { if (tensor_sizes.size() == 0) { @@ -43,14 +38,29 @@ static inline int64_t maybe_wrap_dim( return maybe_wrap_dim(dim, tensor_sizes[0].size()); } -// wrap each dim in the dims array, taking dim_post_expr as the true number of -// dimensions -static inline void maybe_wrap_dims_n( +// Given an array of dimensions `dims` of length `ndims`, this function "Wraps" +// each dim in-place for a tensor of rank `dim_post_expr`, allowing dims to be +// specified using negative indices. +// +// Additionally, if `wrap_scalar` is true then scalar tensors with rank 0, will +// allow dimensions in the range [-1, 0]. Otherwise, an IndexError is raised for +// dimensions not in the range [-dim_post_expr, dim_post_expr). +inline void maybe_wrap_dims_n( int64_t* dims, int64_t ndims, - int64_t dim_post_expr) { + int64_t dim_post_expr, + bool wrap_scalars = true) { if (dim_post_expr <= 0) { - dim_post_expr = 1; // this will make range [-1, 0] + if (wrap_scalars) { + dim_post_expr = 1; // this will make range [-1, 0] + } else { + TORCH_CHECK_INDEX( + ndims == 0, + "Dimension specified as ", + dims[0], + " but tensor has no dimensions"); + return; + } } int64_t min = -dim_post_expr; int64_t max = dim_post_expr - 1; @@ -72,11 +82,20 @@ static inline void maybe_wrap_dims_n( } } -// Wrap each dim in a contiguous container, taking dim_post_expr as the true -// number of dimensions E.g. 
could also be std::array or c10::SmallVector +// Given a contiguous container of dimensions `dims`, this function "Wraps" +// each dim in-place for a tensor of rank `dim_post_expr`, allowing dims to be +// specified using negative indices. +// +// Additionally, if `wrap_scalar` is true then scalar tensors with rank 0, will +// allow dimensions in the range [-1, 0]. Otherwise, an IndexError is raised for +// dimensions not in the range [-dim_post_expr, dim_post_expr). template -inline void maybe_wrap_dims(Container& dims, int64_t dim_post_expr) { - return maybe_wrap_dims_n(dims.data(), dims.size(), dim_post_expr); +inline void maybe_wrap_dims( + Container& dims, + int64_t dim_post_expr, + bool wrap_scalars = true) { + return maybe_wrap_dims_n( + dims.data(), dims.size(), dim_post_expr, wrap_scalars); } // previously, size [0] tensors were the only possible empty tensors; thus, it @@ -85,11 +104,12 @@ inline void maybe_wrap_dims(Container& dims, int64_t dim_post_expr) { // dimension behavior and dimension size checking). We maintain this behavior // for backwards compatibility, but only for this specific size (i.e. other // empty sizes are not skipped). -static inline int64_t legacy_cat_wrap_dim( +template +inline int64_t _legacy_cat_wrap_dim( int64_t dim, - const std::vector>& tensor_sizes) { + const std::vector>& tensor_sizes) { for (auto& sizes : tensor_sizes) { - if (sizes == std::vector({0})) { + if (sizes.size() == 1 && sizes[0] == 0) { continue; } return maybe_wrap_dim(dim, sizes.size()); @@ -97,8 +117,22 @@ static inline int64_t legacy_cat_wrap_dim( return dim; } -static inline int64_t legacy_cat_wrap_dim(int64_t dim, ITensorListRef tensors) { - for (auto& tensor : tensors) { +inline int64_t legacy_cat_wrap_dim( + int64_t dim, + const std::vector>& tensor_sizes) { + return _legacy_cat_wrap_dim(dim, tensor_sizes); +} + +inline int64_t legacy_cat_wrap_dim_symint( + int64_t dim, + const std::vector>& tensor_sizes) { + return _legacy_cat_wrap_dim(dim, tensor_sizes); +} + +inline int64_t legacy_cat_wrap_dim( + int64_t dim, + const MaterializedITensorListRef& tensors) { + for (const Tensor& tensor : tensors) { if (tensor.dim() == 1 && tensor.sizes()[0] == 0) { continue; } @@ -108,7 +142,7 @@ static inline int64_t legacy_cat_wrap_dim(int64_t dim, ITensorListRef tensors) { } // wrap negative dims in a vector -static inline void wrap_all_dims( +inline void wrap_all_dims( std::vector& dims_to_wrap, int64_t tensor_total_dims) { for (const auto i : c10::irange(dims_to_wrap.size())) { diff --git a/aten/src/ATen/autocast_mode.cpp b/aten/src/ATen/autocast_mode.cpp index da0a87b02d1d..ee8b4b30b152 100644 --- a/aten/src/ATen/autocast_mode.cpp +++ b/aten/src/ATen/autocast_mode.cpp @@ -2,12 +2,14 @@ #include #include #include +#include #include #include #include #include +#include namespace at { namespace autocast { @@ -36,6 +38,14 @@ void set_xpu_enabled(bool new_enabled) { c10::impl::tls_set_dispatch_key_excluded(DispatchKey::AutocastXPU, !new_enabled); } +bool is_hpu_enabled() { + return !c10::impl::tls_is_dispatch_key_excluded(DispatchKey::AutocastHPU); +} + +void set_hpu_enabled(bool new_enabled) { + c10::impl::tls_set_dispatch_key_excluded(DispatchKey::AutocastHPU, !new_enabled); +} + namespace { // Imitate Apex and cache some of the casts to streamline parameter reuse. // Our heuristic is to cache lower_precision_fp casts of fp32 model weights (see cached_cast below). @@ -55,7 +65,8 @@ namespace { // directly against incoming TensorImpl*s. 
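A hypothetical helper showing what the dim-wrapping utilities documented above do in practice (normalized_dims is illustrative only and not part of the patch):

    #include <ATen/WrapDimUtils.h>
    #include <cstdint>
    #include <vector>

    // Normalize user-supplied (possibly negative) dims against a tensor rank,
    // in place: e.g. {-1, 0, -3} with rank 4 becomes {3, 0, 1}. Out-of-range
    // entries fail the TORCH_CHECK_INDEX inside maybe_wrap_dims_n.
    std::vector<int64_t> normalized_dims(std::vector<int64_t> dims, int64_t rank) {
      at::maybe_wrap_dims(dims, rank);  // wrap_scalars defaults to true
      return dims;
    }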
using weakref_type = c10::weak_intrusive_ptr; using val_type = std::tuple; -thread_local std::unordered_map cached_casts; +std::unordered_map cached_casts; +std::mutex cached_casts_mutex; // nesting tracks the nesting depth of the Python-side context manager. // When the autocast context manager exits to a nesting level that's outside @@ -69,6 +80,9 @@ thread_local at::ScalarType autocast_cpu_dtype = at::kBFloat16; // autocast_xpu_dtype is the lower_precision_fp used by AutocastXPU. thread_local at::ScalarType autocast_xpu_dtype = at::kBFloat16; +// autocast_hpu_dtype is the lower_precision_fp used by AutocastHPU. +thread_local at::ScalarType autocast_hpu_dtype = at::kBFloat16; + // should we enabled the cache inside autocast. thread_local bool cache_enabled = true; @@ -77,6 +91,7 @@ thread_local at::ScalarType autocast_gpu_dtype = at::kHalf; } void clear_cache() { + const std::lock_guard lock(cached_casts_mutex); cached_casts.clear(); } @@ -100,6 +115,10 @@ at::ScalarType get_autocast_xpu_dtype() { return autocast_xpu_dtype; } +at::ScalarType get_autocast_hpu_dtype() { + return autocast_hpu_dtype; +} + void set_autocast_cpu_dtype(at::ScalarType dtype) { TORCH_CHECK( dtype == at::kBFloat16, @@ -115,6 +134,10 @@ void set_autocast_xpu_dtype(at::ScalarType dtype) { autocast_xpu_dtype = dtype; } +void set_autocast_hpu_dtype(at::ScalarType dtype) { + autocast_hpu_dtype = dtype; +} + bool is_autocast_cache_enabled() { return cache_enabled; } @@ -135,6 +158,7 @@ Tensor cached_cast(at::ScalarType to_type, const Tensor& arg, DeviceType device_ arg.scalar_type() == at::kFloat && arg.requires_grad() && arg.is_leaf() && !arg.is_view() && cache_enabled); if (can_try_cache) { + const std::lock_guard lock(cached_casts_mutex); auto it = cached_casts.find(arg.unsafeGetTensorImpl()); if (it != cached_casts.end()) { return std::get<1>(it->second); @@ -304,9 +328,12 @@ Therefore, for the moment, this is all copy pasted in from VariableTypeEverythin // Common cases where registration signature matches redispatch signature // (that's why SIGNATURE is repeated in the WrapFunction instantiation) -#define KERNEL(FUNC, REGISTER_NAME, SIGNATURE, POLICY) \ - m.impl(TORCH_SELECTIVE_NAME("aten::" REGISTER_NAME), \ - &WrapFunction::type::call); +#define KERNEL(OP, POLICY) \ + m.impl(TORCH_SELECTIVE_NAME("aten::" #OP), \ + &WrapFunction::type::call); +#define KERNEL2(OP, OVERLOAD, POLICY) \ + m.impl(TORCH_SELECTIVE_NAME("aten::" #OP "." #OVERLOAD), \ + &WrapFunction::type::call); // Less-common but still useful case: redispatching to a function with a new signature (e.g. appending a dtype) #define KERNEL_DIFFERENT_REDISPATCH_SIGNATURE(REDISPATCH_FUNC, REGISTER_NAME, REGISTER_SIGNATURE, REDISPATCH_SIGNATURE, POLICY) \ @@ -314,9 +341,12 @@ Therefore, for the moment, this is all copy pasted in from VariableTypeEverythin &WrapFunction::type::call); // KERNEL_CPU registration for AutocastCPU -#define KERNEL_CPU(FUNC, REGISTER_NAME, SIGNATURE, POLICY) \ - m.impl(TORCH_SELECTIVE_NAME("aten::" REGISTER_NAME), \ - &WrapFunction::type::call); +#define KERNEL_CPU(OP, POLICY) \ + m.impl(TORCH_SELECTIVE_NAME("aten::" #OP), \ + &WrapFunction::type::call); +#define KERNEL_CPU2(OP, OVERLOAD, POLICY) \ + m.impl(TORCH_SELECTIVE_NAME("aten::" #OP "." 
#OVERLOAD), \ + &WrapFunction::type::call); /***************************************** Explicit registration for out-of-place ops @@ -327,136 +357,110 @@ TORCH_LIBRARY_IMPL(_, Autocast, m) { TORCH_LIBRARY_IMPL(aten, Autocast, m) { // lower_precision_fp - KERNEL(ADD_NS(_convolution), "_convolution.deprecated", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, bool, IntArrayRef, int64_t, bool, bool, bool), lower_precision_fp) - KERNEL(ADD_NS(_convolution), "_convolution", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, bool, IntArrayRef, int64_t, bool, bool, bool, bool), lower_precision_fp) - KERNEL(ADD_NS(conv1d), "conv1d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL(ADD_NS(conv2d), "conv2d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL(ADD_NS(conv3d), "conv3d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL(ADD_NS(conv_tbc), "conv_tbc", Tensor (const Tensor &, const Tensor &, const Tensor &, int64_t), lower_precision_fp) - KERNEL(ADD_NS(conv_transpose1d), "conv_transpose1d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), lower_precision_fp) - KERNEL(ADD_NS(conv_transpose2d), "conv_transpose2d.input", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), lower_precision_fp) - KERNEL(ADD_NS(conv_transpose3d), "conv_transpose3d.input", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), lower_precision_fp) - KERNEL(ADD_NS(convolution), "convolution", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, bool, IntArrayRef, int64_t), lower_precision_fp) - KERNEL(ADD_NS(cudnn_convolution), "cudnn_convolution", Tensor (const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, bool, bool, bool), lower_precision_fp) - KERNEL(ADD_NS(cudnn_convolution_transpose), "cudnn_convolution_transpose", Tensor (const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, bool, bool, bool), lower_precision_fp) - KERNEL(ADD_NS(prelu), "prelu", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL(ADD_NS(addmm), "addmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL(ADD_NS(addmv), "addmv", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL(ADD_NS(addr), "addr", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL(ADD_NS(matmul), "matmul", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL(ADD_NS(einsum), "einsum", Tensor (c10::string_view, TensorList), lower_precision_fp) - KERNEL(ADD_NS(mm), "mm", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL(ADD_NS(mv), "mv", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL(ADD_NS(linear), "linear", Tensor (const Tensor &, const Tensor &, const c10::optional&), lower_precision_fp) - KERNEL(ADD_NS(addbmm), 
"addbmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL(ADD_NS(baddbmm), "baddbmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL(ADD_NS(bmm), "bmm", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL(ADD_NS(chain_matmul), "chain_matmul", Tensor (TensorList), lower_precision_fp) - KERNEL(ADD_NS(linalg_multi_dot), "linalg_multi_dot", Tensor (TensorList), lower_precision_fp) - // The macro doesn't like these (I think it chokes on commas inside <>) so write them manually - m.impl(TORCH_SELECTIVE_NAME("aten::_thnn_fused_lstm_cell"), - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - std::tuple (const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(_thnn_fused_lstm_cell)>::type::call))); - m.impl("_thnn_fused_gru_cell", - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - std::tuple (const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(_thnn_fused_gru_cell)>::type::call))); - m.impl("lstm_cell", - TORCH_FN((&WrapFunction (const Tensor &, TensorList, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - std::tuple (const Tensor &, TensorList, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(lstm_cell)>::type::call))); - m.impl("gru_cell", - TORCH_FN((&WrapFunction&, const c10::optional&), - Tensor (const Tensor &, const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(gru_cell)>::type::call))); - m.impl("rnn_tanh_cell", // tanh unary op is executed as a cuda math library call. 
- TORCH_FN((&WrapFunction&, const c10::optional&), - Tensor (const Tensor &, const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(rnn_tanh_cell)>::type::call))); - m.impl("rnn_relu_cell", - TORCH_FN((&WrapFunction&, const c10::optional&), - Tensor (const Tensor &, const Tensor &, const Tensor &, const Tensor &, const c10::optional&, const c10::optional&), - &ADD_NS(rnn_relu_cell)>::type::call))); + KERNEL2(_convolution, deprecated, lower_precision_fp) + KERNEL(_convolution, lower_precision_fp) + KERNEL(conv1d, lower_precision_fp) + KERNEL(conv2d, lower_precision_fp) + KERNEL(conv3d, lower_precision_fp) + KERNEL(conv_tbc, lower_precision_fp) + KERNEL(conv_transpose1d, lower_precision_fp) + KERNEL2(conv_transpose2d, input, lower_precision_fp) + KERNEL2(conv_transpose3d, input, lower_precision_fp) + KERNEL(convolution, lower_precision_fp) + KERNEL(cudnn_convolution, lower_precision_fp) + KERNEL(cudnn_convolution_transpose, lower_precision_fp) + KERNEL(prelu, lower_precision_fp) + KERNEL(addmm, lower_precision_fp) + KERNEL(addmv, lower_precision_fp) + KERNEL(addr, lower_precision_fp) + KERNEL(matmul, lower_precision_fp) + KERNEL(einsum, lower_precision_fp) + KERNEL(mm, lower_precision_fp) + KERNEL(mv, lower_precision_fp) + KERNEL(linear, lower_precision_fp) + KERNEL(addbmm, lower_precision_fp) + KERNEL(baddbmm, lower_precision_fp) + KERNEL(bmm, lower_precision_fp) + KERNEL(chain_matmul, lower_precision_fp) + KERNEL(linalg_multi_dot, lower_precision_fp) + KERNEL(_thnn_fused_lstm_cell, lower_precision_fp) + KERNEL(_thnn_fused_gru_cell, lower_precision_fp) + KERNEL(lstm_cell, lower_precision_fp) + KERNEL(gru_cell, lower_precision_fp) + KERNEL(rnn_tanh_cell, lower_precision_fp) + KERNEL(rnn_relu_cell, lower_precision_fp) + // fp32 - KERNEL(ADD_NS(acos), "acos", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(asin), "asin", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(cosh), "cosh", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(erfinv), "erfinv", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(exp), "exp", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(expm1), "expm1", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(log), "log", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(log10), "log10", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(log2), "log2", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(log1p), "log1p", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(reciprocal), "reciprocal", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(rsqrt), "rsqrt", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(sinh), "sinh", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(tan), "tan", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(pow), "pow.Tensor_Scalar", Tensor (const Tensor &, const Scalar&), fp32) - KERNEL(ADD_NS(pow), "pow.Tensor_Tensor", Tensor (const Tensor &, const Tensor &), fp32) - KERNEL(ADD_NS(pow), "pow.Scalar", Tensor (const Scalar&, const Tensor &), fp32) - KERNEL(ADD_NS(softplus), "softplus", Tensor (const Tensor &, const Scalar&, const Scalar&), fp32) - KERNEL(ADD_NS(layer_norm), "layer_norm", Tensor (const Tensor &, IntArrayRef, const c10::optional&, const c10::optional&, double, bool), fp32) - // The macro doesn't like this one (I think it chokes on commas inside <>) so write it manually - m.impl(TORCH_SELECTIVE_NAME("aten::native_layer_norm"), - TORCH_FN((&WrapFunction (const Tensor&, IntArrayRef, const c10::optional&, const c10::optional&, double), - std::tuple (const Tensor&, IntArrayRef, const c10::optional&, const c10::optional&, double), 
- &ADD_NS(native_layer_norm)>::type::call))); - KERNEL(ADD_NS(group_norm), "group_norm", Tensor (const Tensor &, int64_t, const c10::optional&, const c10::optional&, double, bool), fp32) - KERNEL(ADD_NS(frobenius_norm), "frobenius_norm", Tensor (const Tensor &), fp32) - KERNEL(ADD_NS(frobenius_norm), "frobenius_norm.dim", Tensor (const Tensor &, IntArrayRef, bool), fp32) - KERNEL(ADD_NS(nuclear_norm), "nuclear_norm", Tensor (const Tensor &, bool), fp32) - KERNEL(ADD_NS(nuclear_norm), "nuclear_norm.dim", Tensor (const Tensor &, IntArrayRef, bool), fp32) - KERNEL(ADD_NS(cosine_similarity), "cosine_similarity", Tensor (const Tensor &, const Tensor &, int64_t, double), fp32) - KERNEL(ADD_NS(poisson_nll_loss), "poisson_nll_loss", Tensor (const Tensor &, const Tensor &, bool, bool, double, int64_t), fp32) - KERNEL(ADD_NS(cosine_embedding_loss), "cosine_embedding_loss", Tensor (const Tensor &, const Tensor &, const Tensor &, double, int64_t), fp32) - KERNEL(ADD_NS(nll_loss), "nll_loss", Tensor (const Tensor &, const Tensor &, const c10::optional&, int64_t, int64_t), fp32) - KERNEL(ADD_NS(nll_loss2d), "nll_loss2d", Tensor (const Tensor &, const Tensor &, const c10::optional&, int64_t, int64_t), fp32) - KERNEL(ADD_NS(hinge_embedding_loss), "hinge_embedding_loss", Tensor (const Tensor &, const Tensor &, double, int64_t), fp32) - KERNEL(ADD_NS(kl_div), "kl_div", Tensor (const Tensor &, const Tensor &, int64_t, bool), fp32) - KERNEL(ADD_NS(l1_loss), "l1_loss", Tensor (const Tensor &, const Tensor &, int64_t), fp32) - KERNEL(ADD_NS(smooth_l1_loss), "smooth_l1_loss", Tensor (const Tensor &, const Tensor &, int64_t, double), fp32) - KERNEL(ADD_NS(huber_loss), "huber_loss", Tensor (const Tensor &, const Tensor &, int64_t, double), fp32) - KERNEL(ADD_NS(mse_loss), "mse_loss", Tensor (const Tensor &, const Tensor &, int64_t), fp32) - KERNEL(ADD_NS(margin_ranking_loss), "margin_ranking_loss", Tensor (const Tensor &, const Tensor &, const Tensor &, double, int64_t), fp32) - KERNEL(ADD_NS(multilabel_margin_loss), "multilabel_margin_loss", Tensor (const Tensor &, const Tensor &, int64_t), fp32) - KERNEL(ADD_NS(soft_margin_loss), "soft_margin_loss", Tensor (const Tensor &, const Tensor &, int64_t), fp32) - KERNEL(ADD_NS(triplet_margin_loss), "triplet_margin_loss", Tensor (const Tensor &, const Tensor &, const Tensor &, double, double, double, bool, int64_t), fp32) - KERNEL(ADD_NS(multi_margin_loss), "multi_margin_loss", Tensor (const Tensor &, const Tensor &, const Scalar&, const Scalar&, const c10::optional&, int64_t), fp32) - KERNEL(ADD_NS(binary_cross_entropy_with_logits), "binary_cross_entropy_with_logits", Tensor (const Tensor &, const Tensor &, const c10::optional&, const c10::optional&, int64_t), fp32) - KERNEL(ADD_NS(dist), "dist", Tensor (const Tensor &, const Tensor &, const Scalar&), fp32) - KERNEL(ADD_NS(pdist), "pdist", Tensor (const Tensor &, double), fp32) - KERNEL(ADD_NS(cdist), "cdist", Tensor (const Tensor &, const Tensor &, double, c10::optional), fp32) - KERNEL(ADD_NS(renorm), "renorm", Tensor (const Tensor &, const Scalar&, int64_t, const Scalar&), fp32) - KERNEL(ADD_NS(logsumexp), "logsumexp", Tensor (const Tensor &, IntArrayRef, bool), fp32) + KERNEL(acos, fp32) + KERNEL(asin, fp32) + KERNEL(cosh, fp32) + KERNEL(erfinv, fp32) + KERNEL(exp, fp32) + KERNEL(expm1, fp32) + KERNEL(log, fp32) + KERNEL(log10, fp32) + KERNEL(log2, fp32) + KERNEL(log1p, fp32) + KERNEL(reciprocal, fp32) + KERNEL(rsqrt, fp32) + KERNEL(sinh, fp32) + KERNEL(tan, fp32) + KERNEL2(pow, Tensor_Scalar, fp32) + KERNEL2(pow, 
Tensor_Tensor, fp32) + KERNEL2(pow, Scalar, fp32) + KERNEL(softplus, fp32) + KERNEL(layer_norm, fp32) + KERNEL(native_layer_norm, fp32) + KERNEL(group_norm, fp32) + KERNEL(frobenius_norm, fp32) + KERNEL2(frobenius_norm, dim, fp32) + KERNEL(nuclear_norm, fp32) + KERNEL2(nuclear_norm, dim, fp32) + KERNEL(cosine_similarity, fp32) + KERNEL(poisson_nll_loss, fp32) + KERNEL(cosine_embedding_loss, fp32) + KERNEL(nll_loss, fp32) + KERNEL(nll_loss2d, fp32) + KERNEL(hinge_embedding_loss, fp32) + KERNEL(kl_div, fp32) + KERNEL(l1_loss, fp32) + KERNEL(smooth_l1_loss, fp32) + KERNEL(huber_loss, fp32) + KERNEL(mse_loss, fp32) + KERNEL(margin_ranking_loss, fp32) + KERNEL(multilabel_margin_loss, fp32) + KERNEL(soft_margin_loss, fp32) + KERNEL(triplet_margin_loss, fp32) + KERNEL(multi_margin_loss, fp32) + KERNEL(binary_cross_entropy_with_logits, fp32) + KERNEL(dist, fp32) + KERNEL(pdist, fp32) + KERNEL(cdist, fp32) + KERNEL(renorm, fp32) + KERNEL(logsumexp, fp32) // fp32_set_opt_dtype - KERNEL(ADD_NS(prod), "prod", Tensor (const Tensor &, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(prod), "prod.dim_int", Tensor (const Tensor &, int64_t, bool, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(prod), "prod.dim_Dimname", Tensor (const Tensor &, Dimname, bool, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(softmax), "softmax.int", Tensor (const Tensor &, int64_t, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(softmax), "softmax.Dimname", Tensor (const Tensor &, Dimname, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(log_softmax), "log_softmax.int", Tensor (const Tensor &, int64_t, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(log_softmax), "log_softmax.Dimname", Tensor (const Tensor &, Dimname, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(cumprod), "cumprod", Tensor (const Tensor &, int64_t, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(cumprod), "cumprod.dimname", Tensor (const Tensor &, Dimname, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(cumsum), "cumsum", Tensor (const Tensor &, int64_t, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(cumsum), "cumsum.dimname", Tensor (const Tensor &, Dimname, c10::optional), fp32_set_opt_dtype) + KERNEL(prod, fp32_set_opt_dtype) + KERNEL2(prod, dim_int, fp32_set_opt_dtype) + KERNEL2(prod, dim_Dimname, fp32_set_opt_dtype) + KERNEL2(softmax, int, fp32_set_opt_dtype) + KERNEL2(softmax, Dimname, fp32_set_opt_dtype) + KERNEL2(log_softmax, int, fp32_set_opt_dtype) + KERNEL2(log_softmax, Dimname, fp32_set_opt_dtype) + KERNEL(cumprod, fp32_set_opt_dtype) + KERNEL2(cumprod, dimname, fp32_set_opt_dtype) + KERNEL(cumsum, fp32_set_opt_dtype) + KERNEL2(cumsum, dimname, fp32_set_opt_dtype) + KERNEL(linalg_vector_norm, fp32_set_opt_dtype) + KERNEL(linalg_matrix_norm, fp32_set_opt_dtype) + KERNEL2(linalg_matrix_norm, str_ord, fp32_set_opt_dtype) // commenting these out because they accept an explicit (not-optional) dtype, and we shouldn't try to flip that even // when autocasting. 
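The registrations in this section lean entirely on the name-based KERNEL/KERNEL2 (and KERNEL_CPU/KERNEL_CPU2) forms defined earlier in the file, so each line only has to spell the op, the overload if any, and the cast policy. A rough stand-in for how those macros assemble the "aten::op.overload" schema string (simplified; the real macros dispatch through WrapFunction rather than a plain function):

    #include <cstdio>

    // Toy registry call so the sketch is self-contained.
    static void register_autocast_rule(const char* schema, const char* policy) {
      std::printf("register %s -> %s\n", schema, policy);
    }

    // The op name and overload are stringized and pasted into the schema.
    #define MY_KERNEL(OP, POLICY) register_autocast_rule("aten::" #OP, #POLICY);
    #define MY_KERNEL2(OP, OVERLOAD, POLICY) \
      register_autocast_rule("aten::" #OP "." #OVERLOAD, #POLICY);

    int main() {
      MY_KERNEL(addmm, lower_precision_fp)      // -> "aten::addmm"
      MY_KERNEL2(pow, Tensor_Scalar, fp32)      // -> "aten::pow.Tensor_Scalar"
    }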
- // KERNEL(ADD_NS(norm), "norm.ScalarOpt_dtype", Tensor (const Tensor &, c10::optional, ScalarType), fp32_set_opt_dtype) - // KERNEL(ADD_NS(norm), "norm.ScalarOpt_dim_dtype", Tensor (const Tensor &, c10::optional, IntArrayRef, bool, ScalarType), fp32_set_opt_dtype) - // KERNEL(ADD_NS(norm), "norm.names_ScalarOpt_dim_dtype", Tensor (const Tensor &, c10::optional, DimnameList, bool, ScalarType), fp32_set_opt_dtype) - KERNEL(ADD_NS(sum), "sum", Tensor (const Tensor &, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(sum), "sum.dim_IntList", Tensor (const Tensor &, OptionalIntArrayRef, bool, c10::optional), fp32_set_opt_dtype) - KERNEL(ADD_NS(sum), "sum.dim_DimnameList", Tensor (const Tensor &, DimnameList, bool, c10::optional), fp32_set_opt_dtype) + // KERNEL2(norm, ScalarOpt_dtype, fp32_set_opt_dtype) + // KERNEL2(norm, ScalarOpt_dim_dtype, fp32_set_opt_dtype) + // KERNEL2(norm, names_ScalarOpt_dim_dtype, fp32_set_opt_dtype) + KERNEL(sum, fp32_set_opt_dtype) + KERNEL2(sum, dim_IntList, fp32_set_opt_dtype) + KERNEL2(sum, dim_DimnameList, fp32_set_opt_dtype) // fp32_append_dtype // The fp32_append_dtype wrapper overrides implicit promotion behavior. // norm does not implicitly promote, but be aware when adding new ops to this policy. @@ -464,16 +468,16 @@ TORCH_LIBRARY_IMPL(aten, Autocast, m) { KERNEL_DIFFERENT_REDISPATCH_SIGNATURE(ADD_NS(norm), "norm.ScalarOpt_dim", Tensor (const Tensor &, const c10::optional&, IntArrayRef, bool), Tensor (const Tensor &, const c10::optional&, IntArrayRef, bool, ScalarType), fp32_append_dtype) KERNEL_DIFFERENT_REDISPATCH_SIGNATURE(ADD_NS(norm), "norm.names_ScalarOpt_dim", Tensor (const Tensor &, const c10::optional&, DimnameList, bool), Tensor (const Tensor &, const c10::optional&, DimnameList, bool, ScalarType), fp32_append_dtype) // promote - KERNEL(ADD_NS(addcdiv), "addcdiv", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&), promote) - KERNEL(ADD_NS(addcmul), "addcmul", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&), promote) - KERNEL(ADD_NS(atan2), "atan2", Tensor (const Tensor &, const Tensor &), promote) - KERNEL(ADD_NS(bilinear), "bilinear", Tensor (const Tensor &, const Tensor &, const Tensor &, const c10::optional&), promote) - KERNEL(ADD_NS(cross), "cross", Tensor (const Tensor &, const Tensor &, c10::optional), promote) - KERNEL(ADD_NS(dot), "dot", Tensor (const Tensor &, const Tensor &), promote) - KERNEL(ADD_NS(grid_sampler), "grid_sampler", Tensor (const Tensor &, const Tensor &, int64_t, int64_t, bool), promote) - KERNEL(ADD_NS(index_put), "index_put", Tensor (const Tensor &, const torch::List>&, const Tensor &, bool), promote) - KERNEL(ADD_NS(tensordot), "tensordot", Tensor (const Tensor &, const Tensor &, IntArrayRef, IntArrayRef), promote) - KERNEL(ADD_NS(scatter_add), "scatter_add", Tensor (const Tensor&, int64_t, const Tensor&, const Tensor&), promote) + KERNEL(addcdiv, promote) + KERNEL(addcmul, promote) + KERNEL(atan2, promote) + KERNEL(bilinear, promote) + KERNEL(cross, promote) + KERNEL(dot, promote) + KERNEL(grid_sampler, promote) + KERNEL(index_put, promote) + KERNEL(tensordot, promote) + KERNEL(scatter_add, promote) m.impl(TORCH_SELECTIVE_NAME("aten::binary_cross_entropy"), TORCH_FN((&at::autocast::binary_cross_entropy_banned))); @@ -486,223 +490,134 @@ TORCH_LIBRARY_IMPL(_, AutocastCPU, m) { TORCH_LIBRARY_IMPL(aten, AutocastCPU, m) { // lower_precision_fp cast policy - KERNEL_CPU(ADD_NS(conv1d), "conv1d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, 
IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv1d), "conv1d.padding", Tensor (const Tensor&, const Tensor&, const c10::optional&, IntArrayRef, c10::string_view, IntArrayRef, int64_t groups), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv2d), "conv2d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv2d), "conv2d.padding", Tensor (const Tensor&, const Tensor&, const c10::optional&, IntArrayRef, c10::string_view, IntArrayRef, int64_t groups), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv3d), "conv3d", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, int64_t), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv3d), "conv3d.padding", Tensor (const Tensor&, const Tensor&, const c10::optional&, IntArrayRef, c10::string_view, IntArrayRef, int64_t groups), lower_precision_fp) - KERNEL_CPU(ADD_NS(bmm), "bmm", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL_CPU(ADD_NS(mm), "mm", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL_CPU(ADD_NS(baddbmm), "baddbmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL_CPU(ADD_NS(addmm), "addmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL_CPU(ADD_NS(addbmm), "addbmm", Tensor (const Tensor &, const Tensor &, const Tensor &, const Scalar&, const Scalar&), lower_precision_fp) - KERNEL_CPU(ADD_NS(linear), "linear", Tensor (const Tensor &, const Tensor &, const c10::optional &), lower_precision_fp) - KERNEL_CPU(ADD_NS(_convolution), "_convolution.deprecated", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, bool, IntArrayRef, int64_t, bool, bool, bool), lower_precision_fp) - KERNEL_CPU(ADD_NS(_convolution), "_convolution", Tensor (const Tensor &, const Tensor &, const c10::optional&, IntArrayRef, IntArrayRef, IntArrayRef, bool, IntArrayRef, int64_t, bool, bool, bool, bool), lower_precision_fp) - KERNEL_CPU(ADD_NS(matmul), "matmul", Tensor (const Tensor &, const Tensor &), lower_precision_fp) - KERNEL_CPU(ADD_NS(conv_tbc), "conv_tbc", Tensor(const Tensor &, const Tensor &, const Tensor &, int64_t), lower_precision_fp) + KERNEL_CPU(conv1d, lower_precision_fp) + KERNEL_CPU2(conv1d, padding, lower_precision_fp) + KERNEL_CPU(conv2d, lower_precision_fp) + KERNEL_CPU2(conv2d, padding, lower_precision_fp) + KERNEL_CPU(conv3d, lower_precision_fp) + KERNEL_CPU2(conv3d, padding, lower_precision_fp) + KERNEL_CPU(bmm, lower_precision_fp) + KERNEL_CPU(mm, lower_precision_fp) + KERNEL_CPU(baddbmm, lower_precision_fp) + KERNEL_CPU(addmm, lower_precision_fp) + KERNEL_CPU(addbmm, lower_precision_fp) + KERNEL_CPU(linear, lower_precision_fp) + KERNEL_CPU2(_convolution, deprecated, lower_precision_fp) + KERNEL_CPU(matmul, lower_precision_fp) + KERNEL_CPU(conv_tbc, lower_precision_fp) // fp32 cast policy - KERNEL_CPU(ADD_NS(conv_transpose1d), "conv_transpose1d", Tensor (const Tensor &, const Tensor &, const c10::optional &, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(conv_transpose2d), "conv_transpose2d.input", Tensor (const Tensor &, const Tensor &, const c10::optional &, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(conv_transpose3d), "conv_transpose3d.input", Tensor (const Tensor &, const 
Tensor &, const c10::optional &, IntArrayRef, IntArrayRef, IntArrayRef, int64_t, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(avg_pool3d), "avg_pool3d", Tensor (const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef, bool, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(binary_cross_entropy), "binary_cross_entropy", Tensor (const Tensor &, const Tensor &, const c10::optional&, int64_t), fp32) - KERNEL_CPU(ADD_NS(grid_sampler), "grid_sampler", Tensor(const Tensor &, const Tensor &, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(polar), "polar", Tensor(const Tensor &, const Tensor &), fp32) - KERNEL_CPU(ADD_NS(prod), "prod", Tensor(const Tensor &, c10::optional), fp32) - KERNEL_CPU(ADD_NS(prod), "prod.dim_int", Tensor(const Tensor &, int64_t, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(prod), "prod.dim_Dimname", Tensor(const Tensor &, at::Dimname, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(quantile), "quantile", Tensor(const Tensor &, const Tensor &, c10::optional, bool, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(quantile), "quantile.scalar", Tensor(const Tensor &, double, c10::optional, bool, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(nanquantile), "nanquantile", Tensor(const Tensor &, const Tensor &, c10::optional, bool, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(nanquantile), "nanquantile.scalar", Tensor(const Tensor &, double, c10::optional, bool, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(stft), "stft", Tensor(const Tensor &, int64_t, c10::optional, c10::optional, const c10::optional &, bool, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(stft), "stft.center", Tensor(const Tensor &, int64_t, c10::optional, c10::optional, const c10::optional &, bool, c10::string_view, bool, c10::optional, c10::optional), fp32) - KERNEL_CPU(ADD_NS(cdist), "cdist", Tensor(const Tensor &, const Tensor &, double, c10::optional), fp32) - KERNEL_CPU(ADD_NS(grid_sampler_2d), "grid_sampler_2d", Tensor(const Tensor &, const Tensor &, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(_grid_sampler_2d_cpu_fallback), "_grid_sampler_2d_cpu_fallback", Tensor(const Tensor &, const Tensor &, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(grid_sampler_3d), "grid_sampler_3d", Tensor(const Tensor &, const Tensor &, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(trace), "trace", Tensor(const Tensor &), fp32) - KERNEL_CPU(ADD_NS(view_as_complex), "view_as_complex", Tensor(const Tensor &), fp32) - KERNEL_CPU(ADD_NS(cholesky), "cholesky", Tensor(const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(cholesky_inverse), "cholesky_inverse", Tensor(const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(cholesky_solve), "cholesky_solve", Tensor(const Tensor &, const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(inverse), "inverse", Tensor(const Tensor &), fp32) - KERNEL_CPU(ADD_NS(lu_solve), "lu_solve", Tensor(const Tensor &, const Tensor &, const Tensor &), fp32) - KERNEL_CPU(ADD_NS(matrix_rank), "matrix_rank", Tensor(const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(orgqr), "orgqr", Tensor(const Tensor &, const Tensor &), fp32) - KERNEL_CPU(ADD_NS(ormqr), "ormqr", Tensor(const Tensor &, const Tensor &, const Tensor &, bool, bool), fp32) - KERNEL_CPU(ADD_NS(pinverse), "pinverse", Tensor(const Tensor &, double), fp32) - KERNEL_CPU(ADD_NS(max_pool3d), "max_pool3d", Tensor(const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef, IntArrayRef, bool), fp32) - KERNEL_CPU(ADD_NS(max_unpool2d), "max_unpool2d", Tensor(const Tensor &, const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(max_unpool3d), "max_unpool3d", 
Tensor(const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(adaptive_avg_pool3d), "adaptive_avg_pool3d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(reflection_pad1d), "reflection_pad1d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(reflection_pad2d), "reflection_pad2d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(replication_pad1d), "replication_pad1d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(replication_pad2d), "replication_pad2d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(replication_pad3d), "replication_pad3d", Tensor(const Tensor &, IntArrayRef), fp32) - KERNEL_CPU(ADD_NS(mse_loss), "mse_loss", Tensor(const Tensor &, const Tensor &, int64_t), fp32) - KERNEL_CPU(ADD_NS(ctc_loss), "ctc_loss.IntList", Tensor(const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(ctc_loss), "ctc_loss.Tensor", Tensor(const Tensor &, const Tensor &, const Tensor &, const Tensor &, int64_t, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(kl_div), "kl_div", Tensor(const Tensor &, const Tensor &, int64_t, bool), fp32) - KERNEL_CPU(ADD_NS(multilabel_margin_loss), "multilabel_margin_loss", Tensor(const Tensor &, const Tensor &, int64_t), fp32) - KERNEL_CPU(ADD_NS(fft_fft), "fft_fft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ifft), "fft_ifft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_fft2), "fft_fft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ifft2), "fft_ifft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_fftn), "fft_fftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ifftn), "fft_ifftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_rfft), "fft_rfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_irfft), "fft_irfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_rfft2), "fft_rfft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_irfft2), "fft_irfft2", Tensor(const Tensor &, at::OptionalIntArrayRef, at::IntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_rfftn), "fft_rfftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_irfftn), "fft_irfftn", Tensor(const Tensor &, at::OptionalIntArrayRef, at::OptionalIntArrayRef, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_hfft), "fft_hfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(fft_ihfft), "fft_ihfft", Tensor(const Tensor &, c10::optional, int64_t, c10::optional), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_norm), "linalg_matrix_norm", Tensor(const Tensor &, const at::Scalar &, at::IntArrayRef, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_norm), "linalg_matrix_norm.str_ord", Tensor(const Tensor &, c10::string_view, at::IntArrayRef, bool, c10::optional), fp32) - KERNEL_CPU(ADD_NS(linalg_cond), "linalg_cond", Tensor(const Tensor &, const c10::optional &), fp32) - KERNEL_CPU(ADD_NS(linalg_cond), 
"linalg_cond.p_str", Tensor(const Tensor &, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_rank), "linalg_matrix_rank", Tensor(const Tensor &, double, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_rank), "linalg_matrix_rank.tol_tensor", Tensor(const Tensor &, const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_rank), "linalg_matrix_rank.atol_rtol_tensor", Tensor(const Tensor &, const c10::optional &, const c10::optional &, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_matrix_rank), "linalg_matrix_rank.atol_rtol_float", Tensor(const Tensor &, c10::optional, c10::optional, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_solve), "linalg_solve", Tensor(const Tensor &, const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_cholesky), "linalg_cholesky", Tensor(const Tensor &, bool), fp32) - KERNEL_CPU(ADD_NS(linalg_svdvals), "linalg_svdvals", Tensor(const Tensor &, c10::optional), fp32) - KERNEL_CPU(ADD_NS(linalg_eigvals), "linalg_eigvals", Tensor(const Tensor &), fp32) - KERNEL_CPU(ADD_NS(linalg_eigvalsh), "linalg_eigvalsh", Tensor(const Tensor &, c10::string_view), fp32) - KERNEL_CPU(ADD_NS(linalg_inv), "linalg_inv", Tensor(const Tensor &), fp32) - KERNEL_CPU(ADD_NS(linalg_householder_product), "linalg_householder_product", Tensor(const Tensor &, const Tensor &), fp32) - KERNEL_CPU(ADD_NS(linalg_tensorinv), "linalg_tensorinv", Tensor(const Tensor &, int64_t), fp32) - KERNEL_CPU(ADD_NS(linalg_tensorsolve), "linalg_tensorsolve", Tensor(const Tensor &, const Tensor &, at::OptionalIntArrayRef), fp32) - KERNEL_CPU(ADD_NS(fake_quantize_per_tensor_affine), "fake_quantize_per_tensor_affine", Tensor (const Tensor &, double, int64_t, int64_t, int64_t), fp32) - - m.impl(TORCH_SELECTIVE_NAME("aten::eig"), - TORCH_FN((&WrapFunction (const Tensor &, bool), - std::tuple (const Tensor &, bool), - &ADD_NS(eig)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::geqrf"), - TORCH_FN((&WrapFunction (const Tensor &), - std::tuple (const Tensor &), - &ADD_NS(geqrf)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::lstsq"), - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &), - std::tuple (const Tensor &, const Tensor &), - &ADD_NS(lstsq)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::_lu_with_info"), - TORCH_FN((&WrapFunction (const Tensor &, bool, bool), - std::tuple (const Tensor &, bool, bool), - &ADD_NS(_lu_with_info)>::type::call))); - - - m.impl(TORCH_SELECTIVE_NAME("aten::qr"), - TORCH_FN((&WrapFunction (const Tensor &, bool), - std::tuple (const Tensor &, bool), - &ADD_NS(qr)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::svd"), - TORCH_FN((&WrapFunction (const Tensor &, bool, bool), - std::tuple (const Tensor &, bool, bool), - &ADD_NS(svd)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::symeig"), - TORCH_FN((&WrapFunction (const Tensor &, bool, bool), - std::tuple (const Tensor &, bool, bool), - &ADD_NS(symeig)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::triangular_solve"), - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &, bool, bool, bool), - std::tuple (const Tensor &, const Tensor &, bool, bool, bool), - &ADD_NS(triangular_solve)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::fractional_max_pool2d"), - TORCH_FN((&WrapFunction (const Tensor &, IntArrayRef, IntArrayRef, const Tensor &), - std::tuple (const Tensor &, IntArrayRef, IntArrayRef, const Tensor &), - &ADD_NS(fractional_max_pool2d)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::fractional_max_pool3d"), - TORCH_FN((&WrapFunction (const Tensor &, 
IntArrayRef, IntArrayRef, const Tensor &), - std::tuple (const Tensor &, IntArrayRef, IntArrayRef, const Tensor &), - &ADD_NS(fractional_max_pool3d)>::type::call))); - - - m.impl(TORCH_SELECTIVE_NAME("aten::adaptive_max_pool3d"), - TORCH_FN((&WrapFunction (const Tensor &, IntArrayRef), - std::tuple (const Tensor &, IntArrayRef), - &ADD_NS(adaptive_max_pool3d)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::multilabel_margin_loss_forward"), - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &, int64_t), - std::tuple (const Tensor &, const Tensor &, int64_t), - &ADD_NS(multilabel_margin_loss_forward)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_qr"), - TORCH_FN((&WrapFunction (const Tensor &, c10::string_view), - std::tuple (const Tensor &, c10::string_view), - &ADD_NS(linalg_qr)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_cholesky_ex"), - TORCH_FN((&WrapFunction (const Tensor &, bool, bool), - std::tuple (const Tensor &, bool, bool), - &ADD_NS(linalg_cholesky_ex)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_svd"), - TORCH_FN((&WrapFunction (const Tensor &, bool, c10::optional), - std::tuple (const Tensor &, bool, c10::optional), - &ADD_NS(linalg_svd)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_eig"), - TORCH_FN((&WrapFunction (const Tensor &), - std::tuple (const Tensor &), - &ADD_NS(linalg_eig)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_eigh"), - TORCH_FN((&WrapFunction (const Tensor &, c10::string_view), - std::tuple (const Tensor &, c10::string_view), - &ADD_NS(linalg_eigh)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_lstsq"), - TORCH_FN((&WrapFunction (const Tensor &, const Tensor &, c10::optional, c10::optional), - std::tuple (const Tensor &, const Tensor &, c10::optional, c10::optional), - &ADD_NS(linalg_lstsq)>::type::call))); - - m.impl(TORCH_SELECTIVE_NAME("aten::linalg_inv_ex"), - TORCH_FN((&WrapFunction (const Tensor &, bool), - std::tuple (const Tensor &, bool), - &ADD_NS(linalg_inv_ex)>::type::call))); + KERNEL_CPU(conv_transpose1d, fp32) + KERNEL_CPU2(conv_transpose2d, input, fp32) + KERNEL_CPU2(conv_transpose3d, input, fp32) + KERNEL_CPU(avg_pool3d, fp32) + KERNEL_CPU(binary_cross_entropy, fp32) + KERNEL_CPU(grid_sampler, fp32) + KERNEL_CPU(polar, fp32) + KERNEL_CPU(prod, fp32) + KERNEL_CPU2(prod, dim_int, fp32) + KERNEL_CPU2(prod, dim_Dimname, fp32) + KERNEL_CPU(quantile, fp32) + KERNEL_CPU2(quantile, scalar, fp32) + KERNEL_CPU(nanquantile, fp32) + KERNEL_CPU2(nanquantile, scalar, fp32) + KERNEL_CPU(stft, fp32) + KERNEL_CPU2(stft, center, fp32) + KERNEL_CPU(cdist, fp32) + KERNEL_CPU(grid_sampler_2d, fp32) + KERNEL_CPU(_grid_sampler_2d_cpu_fallback, fp32) + KERNEL_CPU(grid_sampler_3d, fp32) + KERNEL_CPU(trace, fp32) + KERNEL_CPU(view_as_complex, fp32) + KERNEL_CPU(cholesky, fp32) + KERNEL_CPU(cholesky_inverse, fp32) + KERNEL_CPU(cholesky_solve, fp32) + KERNEL_CPU(inverse, fp32) + KERNEL_CPU(lu_solve, fp32) + KERNEL_CPU(orgqr, fp32) + KERNEL_CPU(ormqr, fp32) + KERNEL_CPU(pinverse, fp32) + KERNEL_CPU(max_pool3d, fp32) + KERNEL_CPU(max_unpool2d, fp32) + KERNEL_CPU(max_unpool3d, fp32) + KERNEL_CPU(adaptive_avg_pool3d, fp32) + KERNEL_CPU(reflection_pad1d, fp32) + KERNEL_CPU(reflection_pad2d, fp32) + KERNEL_CPU(replication_pad1d, fp32) + KERNEL_CPU(replication_pad2d, fp32) + KERNEL_CPU(replication_pad3d, fp32) + KERNEL_CPU(mse_loss, fp32) + KERNEL_CPU(cosine_embedding_loss, fp32) + KERNEL_CPU(nll_loss, fp32) + KERNEL_CPU(nll_loss2d, fp32) + 
KERNEL_CPU(hinge_embedding_loss, fp32) + KERNEL_CPU(poisson_nll_loss, fp32) + KERNEL_CPU(smooth_l1_loss, fp32) + KERNEL_CPU(cross_entropy_loss, fp32) + KERNEL_CPU(l1_loss, fp32) + KERNEL_CPU(huber_loss, fp32) + KERNEL_CPU(margin_ranking_loss, fp32) + KERNEL_CPU(soft_margin_loss, fp32) + KERNEL_CPU(triplet_margin_loss, fp32) + KERNEL_CPU(multi_margin_loss, fp32) + KERNEL_CPU2(ctc_loss, IntList, fp32) + KERNEL_CPU2(ctc_loss, Tensor, fp32) + KERNEL_CPU(kl_div, fp32) + KERNEL_CPU(multilabel_margin_loss, fp32) + KERNEL_CPU(binary_cross_entropy_with_logits, fp32) + KERNEL_CPU(fft_fft, fp32) + KERNEL_CPU(fft_ifft, fp32) + KERNEL_CPU(fft_fft2, fp32) + KERNEL_CPU(fft_ifft2, fp32) + KERNEL_CPU(fft_fftn, fp32) + KERNEL_CPU(fft_ifftn, fp32) + KERNEL_CPU(fft_rfft, fp32) + KERNEL_CPU(fft_irfft, fp32) + KERNEL_CPU(fft_rfft2, fp32) + KERNEL_CPU(fft_irfft2, fp32) + KERNEL_CPU(fft_rfftn, fp32) + KERNEL_CPU(fft_irfftn, fp32) + KERNEL_CPU(fft_hfft, fp32) + KERNEL_CPU(fft_ihfft, fp32) + KERNEL_CPU(linalg_cond, fp32) + KERNEL_CPU2(linalg_cond, p_str, fp32) + KERNEL_CPU(linalg_matrix_rank, fp32) + KERNEL_CPU2(linalg_matrix_rank, tol_tensor, fp32) + KERNEL_CPU2(linalg_matrix_rank, atol_rtol_tensor, fp32) + KERNEL_CPU2(linalg_matrix_rank, atol_rtol_float, fp32) + KERNEL_CPU(linalg_solve, fp32) + KERNEL_CPU(linalg_cholesky, fp32) + KERNEL_CPU(linalg_svdvals, fp32) + KERNEL_CPU(linalg_eigvals, fp32) + KERNEL_CPU(linalg_eigvalsh, fp32) + KERNEL_CPU(linalg_inv, fp32) + KERNEL_CPU(linalg_householder_product, fp32) + KERNEL_CPU(linalg_tensorinv, fp32) + KERNEL_CPU(linalg_tensorsolve, fp32) + KERNEL_CPU(fake_quantize_per_tensor_affine, fp32) + KERNEL_CPU(geqrf, fp32) + KERNEL_CPU(_lu_with_info, fp32) + KERNEL_CPU(qr, fp32) + KERNEL_CPU(svd, fp32) + KERNEL_CPU(symeig, fp32) + KERNEL_CPU(triangular_solve, fp32) + KERNEL_CPU(fractional_max_pool2d, fp32) + KERNEL_CPU(fractional_max_pool3d, fp32) + KERNEL_CPU(adaptive_max_pool3d, fp32) + KERNEL_CPU(multilabel_margin_loss_forward, fp32) + KERNEL_CPU(linalg_qr, fp32) + KERNEL_CPU(linalg_cholesky_ex, fp32) + KERNEL_CPU(linalg_svd, fp32) + KERNEL_CPU(linalg_eig, fp32) + KERNEL_CPU(linalg_eigh, fp32) + KERNEL_CPU(linalg_lstsq, fp32) + KERNEL_CPU(linalg_inv_ex, fp32) // promote - KERNEL_CPU(ADD_NS(cat), "cat", Tensor (TensorList, int64_t), promote) - KERNEL_CPU(ADD_NS(stack), "stack", Tensor (TensorList, int64_t), promote) - KERNEL_CPU(ADD_NS(index_copy), "index_copy", Tensor (const Tensor &, int64_t, const Tensor &, const Tensor &), promote) - KERNEL_CPU(ADD_NS(index_copy), "index_copy.dimname", Tensor (const Tensor &, at::Dimname, const Tensor &, const Tensor &), promote) + KERNEL_CPU(stack, promote) + KERNEL_CPU(cat, promote) + KERNEL_CPU(index_copy, promote) + KERNEL_CPU2(index_copy, dimname, promote) } } // namespace diff --git a/aten/src/ATen/autocast_mode.h b/aten/src/ATen/autocast_mode.h index f5e88a0b88f1..3d57ac923116 100644 --- a/aten/src/ATen/autocast_mode.h +++ b/aten/src/ATen/autocast_mode.h @@ -20,6 +20,10 @@ TORCH_API bool is_xpu_enabled(); TORCH_API void set_xpu_enabled(bool enabled); TORCH_API at::ScalarType get_autocast_xpu_dtype(); TORCH_API void set_autocast_xpu_dtype(at::ScalarType dtype); +TORCH_API bool is_hpu_enabled(); +TORCH_API void set_hpu_enabled(bool enabled); +TORCH_API at::ScalarType get_autocast_hpu_dtype(); +TORCH_API void set_autocast_hpu_dtype(at::ScalarType dtype); TORCH_API bool is_autocast_cache_enabled(); TORCH_API void set_autocast_cache_enabled(bool enabled); @@ -34,6 +38,8 @@ bool is_autocast_eligible(const Tensor& tensor, DeviceType 
device_type) { tensor.is_floating_point(); case DeviceType::XPU: return tensor.is_xpu() && tensor.is_floating_point(); + case DeviceType::HPU: + return tensor.is_hpu() && tensor.is_floating_point(); default: return false; } @@ -49,6 +55,8 @@ inline DispatchKey get_autocast_dispatch_key_from_device_type( return DispatchKey::AutocastCPU; case DeviceType::XPU: return DispatchKey::AutocastXPU; + case DeviceType::HPU: + return DispatchKey::AutocastHPU; default: throw std::runtime_error( "unknown device type for autocast in get_autocast_dispatch_key_from_device_type"); @@ -64,6 +72,8 @@ inline at::ScalarType get_lower_precision_fp_from_device_type( return get_autocast_cpu_dtype(); case DeviceType::XPU: return get_autocast_xpu_dtype(); + case DeviceType::HPU: + return get_autocast_hpu_dtype(); default: throw std::runtime_error( "unknown device type for autocast in get_lower_precision_fp_from_device_type"); @@ -116,6 +126,16 @@ inline at::ScalarType prioritize( return current; } +inline at::ScalarType prioritize( + at::ScalarType current, + const ITensorListRef& list, + DeviceType device_type = DeviceType::CUDA) { + for (const auto& tensor : list) { + current = prioritize(current, tensor, device_type); + } + return current; +} + // Template to catch non-Tensor args (no-op that returns current best guess) template inline at::ScalarType prioritize( @@ -186,6 +206,18 @@ inline std::vector cached_cast( return vec; } +inline std::vector cached_cast( + at::ScalarType to_type, + const ITensorListRef& arg, + DeviceType device_type = DeviceType::CUDA) { + std::vector vec; + vec.reserve(arg.size()); + for (const auto& t : arg) { + vec.push_back(cached_cast(to_type, t, device_type)); + } + return vec; +} + // Template to catch non-Tensor args. template inline T cached_cast( diff --git a/aten/src/ATen/core/ATen_fwd.h b/aten/src/ATen/core/ATen_fwd.h index f6676a0c4ff1..63d576797251 100644 --- a/aten/src/ATen/core/ATen_fwd.h +++ b/aten/src/ATen/core/ATen_fwd.h @@ -35,6 +35,7 @@ using IOptTensorListRef = c10::IListRef; using DimnameList = c10::ArrayRef; using IntArrayRef = c10::ArrayRef; using OptionalIntArrayRef = c10::OptionalArrayRef; +using OptionalSymIntArrayRef = c10::OptionalArrayRef; using c10::Stream; using c10::Storage; diff --git a/aten/src/ATen/core/Formatting.cpp b/aten/src/ATen/core/Formatting.cpp index 832059ed1980..4537adff5aa4 100644 --- a/aten/src/ATen/core/Formatting.cpp +++ b/aten/src/ATen/core/Formatting.cpp @@ -13,7 +13,7 @@ std::ostream& operator<<(std::ostream & out, Backend b) { return out << toString(b); } -std::ostream& operator<<(std::ostream & out, Scalar s) { +std::ostream& operator<<(std::ostream & out, const Scalar& s) { if (s.isFloatingPoint()) { return out << s.toDouble(); } @@ -23,13 +23,19 @@ std::ostream& operator<<(std::ostream & out, Scalar s) { if (s.isBoolean()) { return out << (s.toBool() ? 
"true" : "false"); } + if (s.isSymInt()) { + return out << (s.toSymInt()); + } + if (s.isSymFloat()) { + return out << (s.toSymFloat()); + } if (s.isIntegral(false)) { return out << s.toLong(); } throw std::logic_error("Unknown type in Scalar"); } -std::string toString(Scalar s) { +std::string toString(const Scalar& s) { std::stringstream out; out << s; return out.str(); @@ -83,14 +89,9 @@ static std::tuple __printFormat(std::ostream& stream, const Ten break; } } - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - double expMin; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - double expMax; - if(offset == size) { - expMin = 1; - expMax = 1; - } else { + double expMin = 1; + double expMax = 1; + if(offset != size) { expMin = fabs(self_p[offset]); expMax = fabs(self_p[offset]); for (const auto i : c10::irange(offset, size)) { @@ -116,8 +117,7 @@ static std::tuple __printFormat(std::ostream& stream, const Ten } } double scale = 1; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - int64_t sz; + int64_t sz = 11; if(intMode) { if(expMax > 9) { sz = 11; @@ -153,8 +153,7 @@ static std::tuple __printFormat(std::ostream& stream, const Ten static void __printIndent(std::ostream &stream, int64_t indent) { - for (const auto i : c10::irange(indent)) { - (void)i; //Suppress unused variable warning + for (C10_UNUSED const auto i : c10::irange(indent)) { stream << " "; } } @@ -165,10 +164,8 @@ static void printScale(std::ostream & stream, double scale) { } static void __printMatrix(std::ostream& stream, const Tensor& self, int64_t linesize, int64_t indent) { - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - double scale; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - int64_t sz; + double scale = 0.0; + int64_t sz = 0; std::tie(scale, sz) = __printFormat(stream, self); __printIndent(stream, indent); @@ -277,6 +274,9 @@ std::ostream& print(std::ostream& stream, const Tensor & tensor_, int64_t linesi } else if (tensor_.is_mkldnn()) { stream << "MKLDNN Tensor: "; tensor = tensor_.to_dense().to(kCPU, kDouble).contiguous(); + } else if (tensor_.is_mps()) { + // MPS does not support double tensors, so first copy then convert + tensor = tensor_.to(kCPU).to(kDouble).contiguous(); } else { tensor = tensor_.to(kCPU, kDouble).contiguous(); } @@ -285,10 +285,8 @@ std::ostream& print(std::ostream& stream, const Tensor & tensor_, int64_t linesi stream << "[ " << tensor_.toString() << "{}"; } else if(tensor.ndimension() == 1) { if (tensor.numel() > 0) { - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - double scale; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - int64_t sz; + double scale = 0.0; + int64_t sz = 0; std::tie(scale, sz) = __printFormat(stream, tensor); if(scale != 1) { printScale(stream, scale); diff --git a/aten/src/ATen/core/Formatting.h b/aten/src/ATen/core/Formatting.h index 6dcfc6c7b3cd..9dcd14e1902e 100644 --- a/aten/src/ATen/core/Formatting.h +++ b/aten/src/ATen/core/Formatting.h @@ -8,8 +8,8 @@ namespace c10 { TORCH_API std::ostream& operator<<(std::ostream& out, Backend b); -TORCH_API std::ostream& operator<<(std::ostream & out, Scalar s); -TORCH_API std::string toString(Scalar s); +TORCH_API std::ostream& operator<<(std::ostream & out, const Scalar& s); +TORCH_API std::string toString(const Scalar& s); } namespace at { diff --git a/aten/src/ATen/core/IListRef.h b/aten/src/ATen/core/IListRef.h index 35ac34b22020..0b0ff67b02e2 100644 --- a/aten/src/ATen/core/IListRef.h +++ b/aten/src/ATen/core/IListRef.h @@ -313,7 +313,10 @@ using 
_MaterializedIListRefElem = typename std::conditional< T>::type; template -using MaterializedIListRef = std::vector<_MaterializedIListRefElem>>; +using MaterializedIListRefElem = _MaterializedIListRefElem>; + +template +using MaterializedIListRef = std::vector>; } // namespace detail @@ -388,6 +391,9 @@ class IListRefIterator : public std::iterator; using const_iterator = IListRefIterator; + using reverse_iterator = std::reverse_iterator; using value_type = typename iterator::value_type; IListRef() : tag_(IListRefTag::None) {} diff --git a/aten/src/ATen/core/IListRef_inl.h b/aten/src/ATen/core/IListRef_inl.h index a14bcfddae2d..534272f69b64 100644 --- a/aten/src/ATen/core/IListRef_inl.h +++ b/aten/src/ATen/core/IListRef_inl.h @@ -93,9 +93,9 @@ class IListRefTagImplBase { * implementation for `IListRefTag::Materialized`. */ template -class IListRefTagImplBase> { +class IListRefTagImplBase> { public: - using elem_type = _MaterializedIListRefElem; + using elem_type = MaterializedIListRefElem; using list_type = MaterializedIListRef; static const list_type& unwrap(const IListRef& ilist) { @@ -141,7 +141,7 @@ class IListRefTagImpl : public IListRefTagImplBase< IListRefTag::Materialized, at::Tensor, - _MaterializedIListRefElem> {}; + MaterializedIListRefElem> {}; /* * [Note: IOptTensorListRef] @@ -182,7 +182,7 @@ class IListRefTagImpl : public IListRefTagImplBase< IListRefTag::Materialized, at::OptionalTensorRef, - _MaterializedIListRefElem> {}; + MaterializedIListRefElem> {}; } // namespace detail } // namespace c10 diff --git a/aten/src/ATen/core/IListRef_test.cpp b/aten/src/ATen/core/IListRef_test.cpp index 1a609de74f80..67bd6efebfe4 100644 --- a/aten/src/ATen/core/IListRef_test.cpp +++ b/aten/src/ATen/core/IListRef_test.cpp @@ -77,7 +77,7 @@ TEST(ITensorListRefTest, CtorUnboxedIndirect_IsUnboxed) { }; check_is_unboxed(at::ITensorListRef{vec[0]}); check_is_unboxed(at::ITensorListRef{vec.data(), vec.size()}); - check_is_unboxed(at::ITensorListRef{&*vec.begin(), &*vec.end()}); + check_is_unboxed(at::ITensorListRef{vec.data(), vec.data() + vec.size()}); check_is_unboxed(vec); check_is_unboxed({vec[0], vec[1], vec[2]}); } @@ -137,7 +137,7 @@ TEST(ITensorListRefTest, UnboxedIndirect_Equal) { // Implicit constructors check_elements_same(vec[0], std::vector{vec[0]}, /* use_count= */ 3); check_elements_same({vec.data(), vec.size()}, vec, /* use_count= */ 1); - check_elements_same({&*vec.begin(), &*vec.end()}, vec, /* use_count= */ 1); + check_elements_same({vec.data(), vec.data() + vec.size()}, vec, /* use_count= */ 1); // Vector constructor check_elements_same(vec, vec, /* use_count= */ 1); // InitializerList constructor @@ -165,9 +165,15 @@ TEST(ITensorListRefTest, UnboxedMaterialize_Equal) { } TEST(ITensorListRefIteratorTest, CtorEmpty_ThrowsError) { - at::ITensorListRefIterator it; + at::ITensorListRefIterator* it = new at::ITensorListRefIterator(); // NOLINTNEXTLINE(cppcoreguidelines-avoid-goto,hicpp-avoid-goto) - EXPECT_THROW(*it, c10::Error); + EXPECT_THROW(**it, c10::Error); + +#if defined(_MSC_VER) && _ITERATOR_DEBUG_LEVEL == 2 + EXPECT_THROW({ delete it; }, c10::Error); +#else + delete it; +#endif } TEST(ITensorListRefIteratorTest, Boxed_GetFirstElement) { diff --git a/aten/src/ATen/core/List_test.cpp b/aten/src/ATen/core/List_test.cpp index e16e26b6042e..f37f3c008493 100644 --- a/aten/src/ATen/core/List_test.cpp +++ b/aten/src/ATen/core/List_test.cpp @@ -1118,7 +1118,7 @@ TEST(ListTest, canAccessStringByReference) { List list({"one", "two"}); const auto& listRef = list; 
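The IListRef_test change just above swaps `&*vec.begin(), &*vec.end()` for `vec.data(), vec.data() + vec.size()`. As a side note, a minimal standalone sketch of why the pointer form is preferable (dereferencing the past-the-end iterator is undefined behavior even when only its address is taken); `sum_range` is a hypothetical helper, not part of the patch:

#include <cassert>
#include <vector>

// Hypothetical helper: sums the half-open pointer range [first, last).
static int sum_range(const int* first, const int* last) {
  int s = 0;
  for (const int* p = first; p != last; ++p) s += *p;
  return s;
}

int main() {
  std::vector<int> v{1, 2, 3};
  // Well-defined: data() points at contiguous storage and data() + size() is one past the end.
  assert(sum_range(v.data(), v.data() + v.size()) == 6);
  // Also well-defined for an empty vector, where &*empty.end() would dereference end().
  std::vector<int> empty;
  assert(sum_range(empty.data(), empty.data() + empty.size()) == 0);
  return 0;
}
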
static_assert(std::is_same::value, - "const List acccess should be by const reference"); + "const List access should be by const reference"); std::string str = list[1]; const std::string& strRef = listRef[1]; EXPECT_EQ("two", str); @@ -1130,7 +1130,7 @@ TEST(ListTest, canAccessOptionalStringByReference) { const auto& listRef = list; static_assert( std::is_same>>::value, - "List> acccess should be by const reference"); + "List> access should be by const reference"); c10::optional str1 = list[1]; c10::optional str2 = list[2]; decltype(auto) strRef1 = listRef[1]; diff --git a/aten/src/ATen/core/NamedRegistrations.cpp b/aten/src/ATen/core/NamedRegistrations.cpp index a9ae2f12c4dd..b78a563b673b 100644 --- a/aten/src/ATen/core/NamedRegistrations.cpp +++ b/aten/src/ATen/core/NamedRegistrations.cpp @@ -179,7 +179,6 @@ TORCH_LIBRARY_IMPL(aten, Named, m) { m.impl("exp.out", CppFunction::makeFallthrough()); m.impl("exp_", CppFunction::makeFallthrough()); m.impl("expand", CppFunction::makeFallthrough()); - m.impl("expand.SymInt", CppFunction::makeFallthrough()); m.impl("expm1", CppFunction::makeFallthrough()); m.impl("expm1.out", CppFunction::makeFallthrough()); m.impl("expm1_", CppFunction::makeFallthrough()); @@ -467,7 +466,6 @@ TORCH_LIBRARY_IMPL(aten, Named, m) { m.impl("sum.IntList_out", CppFunction::makeFallthrough()); m.impl("sum.dim_DimnameList", CppFunction::makeFallthrough()); m.impl("sum.dim_IntList", CppFunction::makeFallthrough()); - m.impl("sum.SymInt", CppFunction::makeFallthrough()); m.impl("t", CppFunction::makeFallthrough()); m.impl("tan", CppFunction::makeFallthrough()); m.impl("tan.out", CppFunction::makeFallthrough()); diff --git a/aten/src/ATen/core/PhiloxRNGEngine.h b/aten/src/ATen/core/PhiloxRNGEngine.h index a702de8998d9..c6536d29e798 100644 --- a/aten/src/ATen/core/PhiloxRNGEngine.h +++ b/aten/src/ATen/core/PhiloxRNGEngine.h @@ -213,7 +213,6 @@ class philox_engine { inline detail::FLOAT2 normalize_pair_uniform(detail::FLOAT2 in) { // TODO(voz) We use std:: below, and thus need a separate impl for CUDA. 
float u1 = in[0]; - float u2 = in[1]; constexpr float two_pi = 2.0 * M_PI; diff --git a/aten/src/ATen/core/PythonFallbackKernel.cpp b/aten/src/ATen/core/PythonFallbackKernel.cpp index 37b46ae15a3c..2d8834afe59e 100644 --- a/aten/src/ATen/core/PythonFallbackKernel.cpp +++ b/aten/src/ATen/core/PythonFallbackKernel.cpp @@ -1,4 +1,5 @@ -#include +#include +#include #include #include @@ -51,9 +52,10 @@ void pythonFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { // If Torch Dispatch Mode is active, use its PyInterpreter for dispatch - const auto& maybe_torch_dispatch_mode_state = at::impl::TorchDispatchModeTLS::get_state(); - if (maybe_torch_dispatch_mode_state) { - maybe_torch_dispatch_mode_state->pyinterpreter()->dispatch(op, stack); + const auto mode_stack_len = c10::impl::TorchDispatchModeTLS::stack_len(); + if (mode_stack_len > 0) { + const auto& cur_torch_dispatch_mode_state = c10::impl::TorchDispatchModeTLS::get_stack_at(mode_stack_len - 1); + cur_torch_dispatch_mode_state->pyinterpreter()->dispatch(op, stack); return; } @@ -69,16 +71,19 @@ void pythonFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { if (ivalue.isTensor()) { auto* interpreter = ivalue.unsafeToTensorImpl()->pyobj_interpreter(); if (interpreter) { - interpreter->dispatch(op, stack); + (*interpreter)->dispatch(op, stack); return; } - } else if (ivalue.isTensorList()) { + } else if (ivalue.isTensorList() || ivalue.isOptionalTensorList()) { // NB: use toListRef as it doesn't induce refcount bumps (toTensorListRef // is not a thing) for (const auto& nv : ivalue.toListRef()) { + if (nv.isNone()) { + continue; + } auto* interpreter = nv.unsafeToTensorImpl()->pyobj_interpreter(); if (interpreter) { - interpreter->dispatch(op, stack); + (*interpreter)->dispatch(op, stack); return; } } @@ -87,6 +92,12 @@ void pythonFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { TORCH_INTERNAL_ASSERT(0, "Hit Python dispatch key but no arguments had PyInterpreter (no tensor args?)"); } +void pythonDispatcherFallback(const c10::OperatorHandle& op, c10::DispatchKeySet dispatch_keys, torch::jit::Stack* stack) { + auto* state = c10::impl::PythonDispatcherTLS::get_state(); + TORCH_INTERNAL_ASSERT(state, "Hit PythonDispatcher dispatch key but PythonDispatcherTLS was not set"); + (*state)->python_dispatcher(op, dispatch_keys.remove(c10::DispatchKey::PythonDispatcher), stack); +} + void pythonTLSSnapshotFallback(const c10::OperatorHandle &op, c10::DispatchKeySet dispatch_keys, torch::jit::Stack* stack) { // It is ok for the tls to be already set here. // It means that there are multiple calls into the dispatcher not originating from python code. 
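The pythonFallback change above moves from a single optional torch dispatch mode to a thread-local stack of modes and always forwards to the innermost (most recently pushed) one. A rough standalone sketch of that stack discipline, using stand-in types rather than the real TorchDispatchModeTLS/PyInterpreter machinery:

#include <iostream>
#include <string>
#include <vector>

// Stand-in for the per-thread mode stack; the real code keeps interpreter-backed mode state.
thread_local std::vector<std::string> mode_stack;

static void dispatch(const std::string& op) {
  if (!mode_stack.empty()) {
    // The innermost mode (top of the stack) gets the first chance to handle the op.
    std::cout << mode_stack.back() << " handles " << op << "\n";
    return;
  }
  std::cout << "no active mode; regular dispatch for " << op << "\n";
}

int main() {
  dispatch("aten::add");
  mode_stack.push_back("LoggingMode");
  mode_stack.push_back("FakeTensorMode");
  dispatch("aten::add");  // handled by FakeTensorMode, the innermost mode
  mode_stack.pop_back();
  dispatch("aten::add");  // now handled by LoggingMode
  return 0;
}
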
@@ -134,6 +145,10 @@ TORCH_LIBRARY_IMPL(_, Python, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&pythonFallback>()); } +TORCH_LIBRARY_IMPL(_, PythonDispatcher, m) { + m.fallback(torch::CppFunction::makeFromBoxedFunction<&pythonDispatcherFallback>()); +} + TORCH_LIBRARY_IMPL(_, PythonTLSSnapshot, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&pythonTLSSnapshotFallback>()); } diff --git a/aten/src/ATen/core/PythonFallbackKernel.h b/aten/src/ATen/core/PythonFallbackKernel.h index 94cd4e81291a..f38bdd2ada90 100644 --- a/aten/src/ATen/core/PythonFallbackKernel.h +++ b/aten/src/ATen/core/PythonFallbackKernel.h @@ -1,5 +1,5 @@ #pragma once - +#include namespace at { namespace impl { diff --git a/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp b/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp new file mode 100644 index 000000000000..2d9b15a6b03c --- /dev/null +++ b/aten/src/ATen/core/PythonOpRegistrationTrampoline.cpp @@ -0,0 +1,28 @@ +#include + +namespace at { +namespace impl { + +// The strategy is that all python interpreters attempt to register themselves +// as the main interpreter, but only one wins. Only that interpreter is +// allowed to interact with the C++ dispatcher. Furthermore, when we execute +// logic on that interpreter, we do so hermetically, never setting pyobj field +// on Tensor. + +std::atomic PythonOpRegistrationTrampoline::interpreter_{nullptr}; + +bool PythonOpRegistrationTrampoline::registerInterpreter(c10::impl::PyInterpreter* interp) { + c10::impl::PyInterpreter* expected = nullptr; + interpreter_.compare_exchange_strong(expected, interp); + if (expected != nullptr) { + // This is the second (or later) Python interpreter, which means we need + // non-trivial hermetic PyObject TLS + c10::impl::HermeticPyObjectTLS::init_state(); + return false; + } else { + return true; + } +} + +} // namespace impl +} // namespace at diff --git a/aten/src/ATen/core/PythonOpRegistrationTrampoline.h b/aten/src/ATen/core/PythonOpRegistrationTrampoline.h new file mode 100644 index 000000000000..00d3c635859a --- /dev/null +++ b/aten/src/ATen/core/PythonOpRegistrationTrampoline.h @@ -0,0 +1,18 @@ +#include + +// TODO: this can probably live in c10 + +namespace at { +namespace impl { + +class TORCH_API PythonOpRegistrationTrampoline final { + static std::atomic interpreter_; + +public: + // Returns true if you successfully registered yourself (that means + // you are in the hot seat for doing the operator registrations!) + static bool registerInterpreter(c10::impl::PyInterpreter*); +}; + +} // namespace impl +} // namespace at diff --git a/aten/src/ATen/core/TensorAccessor.h b/aten/src/ATen/core/TensorAccessor.h index 9c60f84a16b3..fea6c09f262f 100644 --- a/aten/src/ATen/core/TensorAccessor.h +++ b/aten/src/ATen/core/TensorAccessor.h @@ -160,7 +160,7 @@ class GenericPackedTensorAccessorBase { index_t strides_[N]; C10_HOST void bounds_check_(index_t i) const { TORCH_CHECK_INDEX( - 0 <= i && i < N, + 0 <= i && i < index_t{N}, "Index ", i, " is not within bounds of a tensor of dimension ", diff --git a/aten/src/ATen/core/TensorBase.h b/aten/src/ATen/core/TensorBase.h index 334cbba102a2..0ecd4456033b 100644 --- a/aten/src/ATen/core/TensorBase.h +++ b/aten/src/ATen/core/TensorBase.h @@ -48,6 +48,7 @@ inline bool variable_excluded_from_dispatch() { return c10::impl::tls_local_dispatch_key_set().excluded_.isSupersetOf(c10::autograd_dispatch_keyset); #endif } + } // NOTE: [Tensor vs. 
TensorBase] @@ -161,6 +162,14 @@ class TORCH_API TensorBase { return impl_->sym_size(dim); } + c10::SymInt sym_stride(int64_t dim) const { + const auto sizes = this->sym_strides(); + const auto ndim = static_cast(sizes.size()); + // false is passed to maybe_wrap_dim so behavior is identical to array access (but with wrapping) + return sizes[c10::maybe_wrap_dim(dim, ndim, /*wrap_scalar=*/false)]; + + } + int64_t size(int64_t dim) const { return impl_->size(dim); } @@ -225,6 +234,9 @@ class TORCH_API TensorBase { c10::SymIntArrayRef sym_sizes() const { return impl_->sym_sizes(); } + c10::SymIntArrayRef sym_strides() const { + return impl_->sym_strides(); + } IntArrayRef strides() const { return impl_->strides(); } @@ -282,6 +294,14 @@ class TORCH_API TensorBase { return impl_->numel() * impl_->itemsize(); } + c10::SymInt sym_nbytes() const { + TORCH_CHECK(layout () != at::kSparse, + "nbytes is not defined for sparse tensors. If you want the size of the constituent " \ + "tensors, add the nbytes of the indices and values. If you want the size of the " \ + "equivalent dense tensor, multiply numel() by element_size()"); + return impl_->sym_numel() * impl_->itemsize(); + } + int64_t numel() const { return impl_->numel(); } @@ -290,6 +310,10 @@ class TORCH_API TensorBase { return impl_->sym_numel(); } + c10::SymInt sym_storage_offset() const { + return impl_->sym_storage_offset(); + } + // Length of one array element in bytes. This is the traditional // Numpy naming. size_t itemsize() const { @@ -553,6 +577,10 @@ class TORCH_API TensorBase { template class PtrTraits = DefaultPtrTraits> PackedTensorAccessor32 packed_accessor32() const& { + TORCH_CHECK( + impl_->numel() <= + static_cast(std::numeric_limits::max()), + "numel needs to be smaller than int32_t max; otherwise, please use packed_accessor64"); return generic_packed_accessor(); } template class PtrTraits = DefaultPtrTraits> @@ -914,4 +942,34 @@ inline c10::MaybeOwned TensorBase::expect_contiguous(MemoryFormat me return c10::MaybeOwned::owned(__dispatch_contiguous(memory_format)); } } + +namespace symint { + +template +using enable_if_symint = std::enable_if_t::value>; +template +using enable_if_int = std::enable_if_t::value>; + +template > +c10::SymIntArrayRef sizes(const TensorBase& t) { return t.sym_sizes(); } +template > +IntArrayRef sizes(const TensorBase& t) { return t.sizes(); } + +template > +c10::SymInt size(const TensorBase& t, int64_t dim) { return t.sym_size(dim); } +template > +int64_t size(const TensorBase& t, int64_t dim) { return t.size(dim); } + +template > +c10::SymIntArrayRef strides(const TensorBase& t) { return t.sym_strides(); } +template > +IntArrayRef strides(const TensorBase& t) { return t.strides(); } + +template > +c10::SymInt numel(const TensorBase& t) { return t.sym_numel(); } +template > +int64_t numel(const TensorBase& t) { return t.numel(); } + +} // namespace symint + } // namespace at diff --git a/aten/src/ATen/core/TorchDispatchModeTLS.cpp b/aten/src/ATen/core/TorchDispatchModeTLS.cpp deleted file mode 100644 index d224b08d5b54..000000000000 --- a/aten/src/ATen/core/TorchDispatchModeTLS.cpp +++ /dev/null @@ -1,58 +0,0 @@ -#include -#include -#include - -namespace at { namespace impl { - -thread_local std::shared_ptr torchDispatchModeState; - -void TorchDispatchModeTLS::set_state(std::shared_ptr state) { - if (state) { - c10::impl::tls_set_dispatch_key_included(DispatchKey::Python, true); - c10::impl::tls_set_dispatch_key_included(DispatchKey::PythonTLSSnapshot, true); - } else { - 
TorchDispatchModeTLS::reset_state(); - } - torchDispatchModeState = std::move(state); -} - -const std::shared_ptr& TorchDispatchModeTLS::get_state() { - return torchDispatchModeState; -} - -void TorchDispatchModeTLS::reset_state() { - torchDispatchModeState.reset(); - c10::impl::tls_set_dispatch_key_included(DispatchKey::Python, false); - c10::impl::tls_set_dispatch_key_included(DispatchKey::PythonTLSSnapshot, false); -} - -bool dispatch_mode_enabled() { - return static_cast(at::impl::TorchDispatchModeTLS::get_state()); -} - -bool tensor_has_dispatch(const at::Tensor& t) { - DispatchKeySet key_set({DispatchKey::Python, DispatchKey::PythonTLSSnapshot}); - return t.key_set().has_any(key_set); -} - -bool tensorlist_has_dispatch(const at::TensorList& li) { - for (const auto& t: li) { - if (tensor_has_dispatch(t)) { - return true; - } - } - return false; -} - -bool tensorlist_has_dispatch(const c10::List>& li) { - for (auto i : c10::irange(li.size())) { - auto t = li.get(i); - if (t && tensor_has_dispatch(*t)) { - return true; - } - } - return false; -} - -} // namespace impl -} // namespace at diff --git a/aten/src/ATen/core/TorchDispatchModeTLS.h b/aten/src/ATen/core/TorchDispatchModeTLS.h deleted file mode 100644 index 9ae015e6582f..000000000000 --- a/aten/src/ATen/core/TorchDispatchModeTLS.h +++ /dev/null @@ -1,25 +0,0 @@ -#pragma once - -#include -#include -#include -#include -#include - -namespace at { -namespace impl { - -struct TORCH_API TorchDispatchModeTLS { - static void set_state(std::shared_ptr state); - static const std::shared_ptr& get_state(); - static void reset_state(); -}; - -bool dispatch_mode_enabled(); -bool tensor_has_dispatch(const at::Tensor& t); -bool tensorlist_has_dispatch(const at::TensorList& li); -bool tensorlist_has_dispatch(const c10::List>& li); - - -} // namespace impl -} // namespace at diff --git a/aten/src/ATen/core/TorchDispatchUtils.cpp b/aten/src/ATen/core/TorchDispatchUtils.cpp new file mode 100644 index 000000000000..e2f981c6a833 --- /dev/null +++ b/aten/src/ATen/core/TorchDispatchUtils.cpp @@ -0,0 +1,31 @@ +#include + +namespace at { +namespace impl { + +bool tensor_has_dispatch(const at::Tensor& t) { + DispatchKeySet key_set({DispatchKey::Python, DispatchKey::PythonTLSSnapshot}); + return t.key_set().has_any(key_set); +} + +bool tensorlist_has_dispatch(at::ITensorListRef li) { + for (const auto& t : li) { + if (tensor_has_dispatch(t)) { + return true; + } + } + return false; +} + +bool tensorlist_has_dispatch(const c10::List>& li) { + for (auto i : c10::irange(li.size())) { + auto t = li.get(i); + if (t && tensor_has_dispatch(*t)) { + return true; + } + } + return false; +} + +} // namespace impl +} // namespace at diff --git a/aten/src/ATen/core/TorchDispatchUtils.h b/aten/src/ATen/core/TorchDispatchUtils.h new file mode 100644 index 000000000000..ed7b4181095d --- /dev/null +++ b/aten/src/ATen/core/TorchDispatchUtils.h @@ -0,0 +1,17 @@ +#pragma once + +#include +#include +#include +#include +#include + +namespace at { +namespace impl { + +bool tensor_has_dispatch(const at::Tensor& t); +bool tensorlist_has_dispatch(at::ITensorListRef li); +bool tensorlist_has_dispatch(const c10::List>& li); +using c10::impl::dispatch_mode_enabled; + +}} diff --git a/aten/src/ATen/core/Variadic.h b/aten/src/ATen/core/Variadic.h index d33f3d575177..61b6a35a0b1c 100644 --- a/aten/src/ATen/core/Variadic.h +++ b/aten/src/ATen/core/Variadic.h @@ -48,6 +48,15 @@ struct IterArgs { // you may be able to process these structures more efficiently // than handling them 
one-by-one. + template + void operator()(c10::IListRef args) { + for (const auto& arg : args) { + self()(arg); + if (self().short_circuit()) + return; + } + } + template void operator()(at::ArrayRef args) { for (const auto& arg : args) { diff --git a/aten/src/ATen/core/boxing/KernelFunction.h b/aten/src/ATen/core/boxing/KernelFunction.h index 8ab34e95046a..f1bfc9ec6f27 100644 --- a/aten/src/ATen/core/boxing/KernelFunction.h +++ b/aten/src/ATen/core/boxing/KernelFunction.h @@ -1,5 +1,6 @@ #pragma once +#include #include #include #include @@ -14,6 +15,56 @@ class OperatorHandle; struct OperatorKernel; class KernelFunction; +template +using has_symint = + guts::disjunction< + std::is_same>, + std::is_same>, + std::is_same>, + std::is_same, std::decay_t> + >; + +template +struct remove_symint { + using type = T; +}; + +template <> +struct remove_symint { + using type = int64_t; +}; + +template <> +struct remove_symint { + using type = OptionalIntArrayRef; +}; + +template <> +struct remove_symint { + using type = c10::IntArrayRef; +}; + +template <> +struct remove_symint> { + using type = c10::optional; +}; + + +template +struct maybe_keep_symint final {}; + +template +struct maybe_keep_symint { using type = T; }; + +template +struct maybe_keep_symint { using type = typename remove_symint::type; }; + +template +using fn_has_symint = typename guts::typelist::true_for_any_type< + has_symint, + typename guts::infer_function_traits::type::parameter_types +>; + /** * KernelFunction is similar to std::function but stores a kernel function. * You can create a KernelFunction from a boxed or unboxed function/functor/lambda @@ -31,6 +82,7 @@ class TORCH_API KernelFunction final { // Fast path for dispatch to allow not touching the boxed kernel in // the common case where unboxed is available. 
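A few hunks above, PythonOpRegistrationTrampoline.cpp describes a first-writer-wins scheme: every Python interpreter attempts an atomic compare-exchange, only the first one becomes the main interpreter, and later arrivals switch to hermetic PyObject handling. A minimal sketch of that pattern under stand-in types (Interpreter here is not the real c10::impl::PyInterpreter):

#include <atomic>
#include <cassert>

struct Interpreter {};  // stand-in for c10::impl::PyInterpreter

static std::atomic<Interpreter*> main_interpreter{nullptr};

// Returns true only for the first caller; later callers keep the existing winner.
static bool register_interpreter(Interpreter* interp) {
  Interpreter* expected = nullptr;
  main_interpreter.compare_exchange_strong(expected, interp);
  // On failure, expected is updated to the current winner, so it is no longer nullptr.
  return expected == nullptr;
}

int main() {
  Interpreter a, b;
  assert(register_interpreter(&a));   // first registration wins
  assert(!register_interpreter(&b));  // later registrations are told they lost
  assert(main_interpreter.load() == &a);
  return 0;
}
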
bool isValidUnboxed() const; + bool isValidSymUnboxed() const; bool isValid() const; bool isFallthrough() const; @@ -182,13 +234,16 @@ class TORCH_API KernelFunction final { explicit KernelFunction( std::unique_ptr functor, InternalBoxedKernelFunction* boxed_kernel_func, - void* unboxed_kernel_func); + void* unboxed_kernel_func, + void* sym_unboxed_kernel_func); explicit KernelFunction( BoxedKernel boxed_fn, - void* unboxed_kernel_func); + void* unboxed_kernel_func, + void* sym_unboxed_kernel_func); BoxedKernel boxed_kernel_func_; void* unboxed_kernel_func_; + void* sym_unboxed_kernel_func_; }; } diff --git a/aten/src/ATen/core/boxing/KernelFunction_impl.h b/aten/src/ATen/core/boxing/KernelFunction_impl.h index c33175e4b99a..9637f8fc2043 100644 --- a/aten/src/ATen/core/boxing/KernelFunction_impl.h +++ b/aten/src/ATen/core/boxing/KernelFunction_impl.h @@ -8,22 +8,29 @@ namespace c10 { inline KernelFunction::KernelFunction() : boxed_kernel_func_() , unboxed_kernel_func_(nullptr) + , sym_unboxed_kernel_func_(nullptr) {} -inline KernelFunction::KernelFunction(std::unique_ptr functor, InternalBoxedKernelFunction* boxed_kernel_func, void* unboxed_kernel_func) +inline KernelFunction::KernelFunction(std::unique_ptr functor, InternalBoxedKernelFunction* boxed_kernel_func, void* unboxed_kernel_func, void* sym_unboxed_kernel_func = nullptr) : boxed_kernel_func_(std::move(functor), boxed_kernel_func) , unboxed_kernel_func_(unboxed_kernel_func) + , sym_unboxed_kernel_func_(sym_unboxed_kernel_func) {} -inline KernelFunction::KernelFunction(BoxedKernel boxed_fn, void* unboxed_kernel_func) +inline KernelFunction::KernelFunction(BoxedKernel boxed_fn, void* unboxed_kernel_func, void* sym_unboxed_kernel_func = nullptr) : boxed_kernel_func_(std::move(boxed_fn)) , unboxed_kernel_func_(unboxed_kernel_func) + , sym_unboxed_kernel_func_(sym_unboxed_kernel_func) {} inline bool KernelFunction::isValidUnboxed() const { return unboxed_kernel_func_ != nullptr; } +inline bool KernelFunction::isValidSymUnboxed() const { + return sym_unboxed_kernel_func_ != nullptr; +} + inline bool KernelFunction::isValid() const { return boxed_kernel_func_.isValid(); } @@ -43,16 +50,58 @@ inline Return callUnboxedKernelFunction(void* unboxed_kernel_func, OperatorKerne return (*func)(functor, dispatchKeySet, std::forward(args)...); } +// This template requires you to explicitly specify the argument you want to +// forward; it doesn't work if you try to deduce it +// NB: keep this in sync with cloneWithRealTypes in function_schema.cpp + +template +inline typename remove_symint::type unpackSymInt(T x) { return x; } + +template <> +inline typename remove_symint::type unpackSymInt(c10::SymInt x) { + return x.expect_int(); +} + +template <> +inline typename remove_symint::type unpackSymInt(c10::SymIntArrayRef x) { + return c10::asIntArrayRefSlow(x); +} + +template <> +inline typename remove_symint>::type unpackSymInt(c10::optional x) { + return x.has_value() ? c10::make_optional(x->expect_int()) : c10::nullopt; +} + +template <> +inline typename remove_symint::type unpackSymInt(at::OptionalSymIntArrayRef x) { + return x.has_value() ? c10::make_optional(c10::asIntArrayRefSlow(*x)) : c10::nullopt; +} + template C10_ALWAYS_INLINE Return KernelFunction::call(const OperatorHandle& opHandle, DispatchKeySet dispatchKeySet, Args... args) const { // note: Args above is intentionally not Args&&. We don't want perfect // forwarding, which would require Args to be deduced, but instead we // want callers to explicitly specify the Args. 
- if (C10_LIKELY(unboxed_kernel_func_ != nullptr)) { - auto *functor = boxed_kernel_func_.getFunctor(); - return callUnboxedKernelFunction( - unboxed_kernel_func_, functor, dispatchKeySet, std::forward(args)...); + // This should get inlined by compiler + if (guts::disjunction...>::value) { + if (sym_unboxed_kernel_func_ != nullptr) { + auto *functor = boxed_kernel_func_.getFunctor(); + return callUnboxedKernelFunction( + sym_unboxed_kernel_func_, functor, dispatchKeySet, std::forward(args)...); + } + + if (unboxed_kernel_func_ != nullptr) { + auto *functor = boxed_kernel_func_.getFunctor(); + return callUnboxedKernelFunction::type...>( + unboxed_kernel_func_, functor, dispatchKeySet, unpackSymInt(args)...); + } + } else { + if (C10_LIKELY(unboxed_kernel_func_ != nullptr)) { + auto *functor = boxed_kernel_func_.getFunctor(); + return callUnboxedKernelFunction( + unboxed_kernel_func_, functor, dispatchKeySet, std::forward(args)...); + } } return impl::BoxedKernelWrapper::call( @@ -102,10 +151,14 @@ inline KernelFunction KernelFunction::makeFromUnboxedFunctor(std::unique_ptr::value, "Tried to call KernelFunction::makeFromUnboxedFunctor, but the functor doesn't inherit from c10::OperatorKernel. Please have the functor inherit from it."); + auto* unboxed_fn = &impl::wrap_kernel_functor_unboxed::call; + void* void_unboxed_fn = reinterpret_cast(unboxed_fn); + bool is_symint = fn_has_symint::value; return KernelFunction( std::move(kernelFunctor), &impl::make_boxed_from_unboxed_functor::call, - reinterpret_cast(&impl::wrap_kernel_functor_unboxed::call) + is_symint ? nullptr : void_unboxed_fn, + is_symint ? void_unboxed_fn : nullptr ); } diff --git a/aten/src/ATen/core/boxing/impl/kernel_function_legacy_test.cpp b/aten/src/ATen/core/boxing/impl/kernel_function_legacy_test.cpp index 4db6794e50eb..3c87fec710aa 100644 --- a/aten/src/ATen/core/boxing/impl/kernel_function_legacy_test.cpp +++ b/aten/src/ATen/core/boxing/impl/kernel_function_legacy_test.cpp @@ -508,8 +508,8 @@ TEST(OperatorRegistrationTest_LegacyFunctionBasedKernel, givenKernelWithStringLi auto output = std::move(outputs[0]).toList(); EXPECT_EQ(2, output.size()); - EXPECT_EQ("value1", output.get(0).toString()->string()); - EXPECT_EQ("value2", output.get(1).toString()->string()); + EXPECT_EQ("value1", output.get(0).toStringRef()); + EXPECT_EQ("value2", output.get(1).toStringRef()); } int captured_dict_size = 0; @@ -550,7 +550,7 @@ TEST(OperatorRegistrationTest_LegacyFunctionBasedKernel, givenKernelWithDictInpu dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } Dict kernelWithDictOutput(Dict input) { @@ -612,7 +612,7 @@ TEST(OperatorRegistrationTest_LegacyFunctionBasedKernel, givenKernelWithUnordere dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } std::unordered_map kernelWithUnorderedMapOutput(std::unordered_map input) { @@ -897,7 +897,7 @@ TEST(OperatorRegistrationTest_LegacyFunctionBasedKernel, givenKernelWithOptional EXPECT_EQ(3, outputs.size()); EXPECT_EQ(DispatchKey::CUDA, extractDispatchKey(outputs[0].toTensor())); EXPECT_TRUE(outputs[1].isNone()); - EXPECT_EQ("text", outputs[2].toString()->string()); + EXPECT_EQ("text", outputs[2].toStringRef()); outputs = callOp(*op, dummyTensor(DispatchKey::CPU), c10::IValue(), 4, c10::IValue()); 
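The KernelFunction changes above keep two unboxed entry points (plain and SymInt-aware), use a has_symint-style trait to detect SymInt arguments at compile time, and fall back to unpackSymInt when only the plain kernel pointer is populated. A simplified, self-contained sketch of that selection logic; the SymInt type, kernels, and unpack helpers below are stand-ins, not the real dispatcher code:

#include <cstdint>
#include <iostream>
#include <type_traits>

struct SymInt {  // stand-in for c10::SymInt
  int64_t v;
  int64_t expect_int() const { return v; }
};

static int64_t plain_kernel(int64_t d) { return d * 2; }                   // int64_t-only entry point
static int64_t sym_kernel(const SymInt& d) { return d.expect_int() * 2; }  // SymInt-aware entry point

// In the spirit of unpackSymInt: degrade a SymInt argument to a concrete int64_t.
static int64_t unpack(int64_t x) { return x; }
static int64_t unpack(const SymInt& x) { return x.expect_int(); }

template <typename Arg>
int64_t call(bool have_sym_kernel, Arg arg) {
  // Compile-time check, in the spirit of has_symint / fn_has_symint.
  if constexpr (std::is_same<std::decay_t<Arg>, SymInt>::value) {
    if (have_sym_kernel) {
      return sym_kernel(arg);          // SymInt-aware kernel available: no unpacking needed
    }
    return plain_kernel(unpack(arg));  // otherwise degrade SymInt and use the plain kernel
  } else {
    return plain_kernel(arg);          // no SymInt anywhere: fast path straight to the plain kernel
  }
}

int main() {
  std::cout << call(true, SymInt{4}) << " "    // 8, via sym_kernel
            << call(false, SymInt{4}) << " "   // 8, via unpack + plain_kernel
            << call(false, int64_t{3}) << "\n";  // 6
  return 0;
}
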
EXPECT_EQ(3, outputs.size()); diff --git a/aten/src/ATen/core/boxing/impl/kernel_function_test.cpp b/aten/src/ATen/core/boxing/impl/kernel_function_test.cpp index 10d2a3fdeb2f..b4fe9290b9e2 100644 --- a/aten/src/ATen/core/boxing/impl/kernel_function_test.cpp +++ b/aten/src/ATen/core/boxing/impl/kernel_function_test.cpp @@ -484,7 +484,7 @@ TEST(OperatorRegistrationTest_FunctionBasedKernel, givenKernelWithDictInput_with dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } Dict kernelWithDictOutput(Dict input) { @@ -639,7 +639,7 @@ TEST(OperatorRegistrationTest_FunctionBasedKernel, givenKernelWithOptionalInputs EXPECT_EQ(3, outputs.size()); EXPECT_EQ(DispatchKey::CPU, extractDispatchKey(outputs[0].toTensor())); EXPECT_TRUE(outputs[1].isNone()); - EXPECT_EQ("text", outputs[2].toString()->string()); + EXPECT_EQ("text", outputs[2].toStringRef()); outputs = callOp(*op, dummyTensor(DispatchKey::CPU), c10::IValue(), 4, c10::IValue()); EXPECT_EQ(3, outputs.size()); diff --git a/aten/src/ATen/core/boxing/impl/kernel_lambda_legacy_test.cpp b/aten/src/ATen/core/boxing/impl/kernel_lambda_legacy_test.cpp index 0b4d1e8ad6b7..dc527d98eb99 100644 --- a/aten/src/ATen/core/boxing/impl/kernel_lambda_legacy_test.cpp +++ b/aten/src/ATen/core/boxing/impl/kernel_lambda_legacy_test.cpp @@ -456,8 +456,8 @@ TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithStringList auto output = std::move(outputs[0]).toList(); EXPECT_EQ(2, output.size()); - EXPECT_EQ("value1", output.get(0).toString()->string()); - EXPECT_EQ("value2", output.get(1).toString()->string()); + EXPECT_EQ("value1", output.get(0).toStringRef()); + EXPECT_EQ("value2", output.get(1).toStringRef()); } TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithDictInput_withoutOutput_whenRegistered_thenCanBeCalled) { @@ -494,7 +494,7 @@ TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithDictInput_ dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithDictOutput_whenRegistered_thenCanBeCalled) { @@ -552,7 +552,7 @@ TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithUnorderedM dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithUnorderedMapOutput_whenRegistered_thenCanBeCalled) { @@ -832,7 +832,7 @@ TEST(OperatorRegistrationTest_LegacyLambdaBasedKernel, givenKernelWithOptionalIn EXPECT_EQ(3, outputs.size()); EXPECT_EQ(DispatchKey::CUDA, extractDispatchKey(outputs[0].toTensor())); EXPECT_TRUE(outputs[1].isNone()); - EXPECT_EQ("text", outputs[2].toString()->string()); + EXPECT_EQ("text", outputs[2].toStringRef()); outputs = callOp(*op, dummyTensor(DispatchKey::CPU), c10::IValue(), 4, c10::IValue()); EXPECT_EQ(3, outputs.size()); diff --git a/aten/src/ATen/core/boxing/impl/kernel_lambda_test.cpp b/aten/src/ATen/core/boxing/impl/kernel_lambda_test.cpp index 19f4ee4acbeb..c9b72e23048f 100644 --- a/aten/src/ATen/core/boxing/impl/kernel_lambda_test.cpp +++ b/aten/src/ATen/core/boxing/impl/kernel_lambda_test.cpp @@ -410,7 
+410,7 @@ TEST(OperatorRegistrationTest_LambdaBasedKernel, givenKernelWithDictInput_withOu dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } TEST(OperatorRegistrationTest_LambdaBasedKernel, givenKernelWithDictOutput_whenRegistered_thenCanBeCalled) { @@ -554,7 +554,7 @@ TEST(OperatorRegistrationTest_LambdaBasedKernel, givenKernelWithOptionalInputs_w EXPECT_EQ(3, outputs.size()); EXPECT_EQ(DispatchKey::CPU, extractDispatchKey(outputs[0].toTensor())); EXPECT_TRUE(outputs[1].isNone()); - EXPECT_EQ("text", outputs[2].toString()->string()); + EXPECT_EQ("text", outputs[2].toStringRef()); outputs = callOp(*op, dummyTensor(DispatchKey::CPU), c10::IValue(), 4, c10::IValue()); EXPECT_EQ(3, outputs.size()); diff --git a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h index 0a28330a0bfb..a99f45040788 100644 --- a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h +++ b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -342,6 +343,13 @@ namespace impl { } }; + template + struct ivalue_to_arg final { + static List call(IValue& v) { + return v.toTensorList(); + } + }; + template struct ivalue_to_arg, AllowDeprecatedTypes> final { // If an argument is ArrayRef, convert the IValue to a std::vector and pass that @@ -353,7 +361,27 @@ namespace impl { template struct ivalue_to_arg final { static std::vector call(IValue& v) { - return ivalue_to_arg, AllowDeprecatedTypes>::call(v); + if (v.isIntList()) { + std::vector r; + auto src = v.toIntList(); + std::transform(src.begin(), src.end(), std::back_inserter(r), [](int64_t i) { return c10::SymInt(i); }); + return r; + } else { + return ivalue_to_arg, AllowDeprecatedTypes>::call(v); + } + } + }; + template + struct ivalue_to_arg, AllowDeprecatedTypes> final { + static OptionalArray call(IValue& v) { + if (v.isIntList()) { + std::vector r; + auto src = v.toIntList(); + std::transform(src.begin(), src.end(), std::back_inserter(r), [](int64_t i) { return c10::SymInt(i); }); + return OptionalArray(r); + } else { + return std::move(v).to>(); + } } }; template diff --git a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor_test.cpp b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor_test.cpp index 933e1bbdf94c..9eebb55cc34b 100644 --- a/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor_test.cpp +++ b/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor_test.cpp @@ -491,7 +491,7 @@ TEST(OperatorRegistrationTest_FunctorBasedKernel, givenKernelWithDictInput_withO dict.insert("key2", "value2"); auto outputs = callOp(*op, dict); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("value2", outputs[0].toString()->string()); + EXPECT_EQ("value2", outputs[0].toStringRef()); } struct KernelWithDictOutput final : OperatorKernel { @@ -546,7 +546,7 @@ TEST(OperatorRegistrationTest_FunctorBasedKernel, givenKernelWithTupleInput_with std::tuple tup{"foobar", 123, 420.1337}; auto outputs = callOp(*op, tup); EXPECT_EQ(1, outputs.size()); - EXPECT_EQ("foobar", outputs[0].toString()->string()); + EXPECT_EQ("foobar", outputs[0].toStringRef()); } TEST(OperatorRegistrationTest_FunctorBasedKernel, givenKernelWithCache_thenCacheIsKeptCorrectly) { @@ -774,7 +774,7 @@ TEST(OperatorRegistrationTest_FunctorBasedKernel, 
givenKernelWithOptionalInputs_ EXPECT_EQ(3, outputs.size()); EXPECT_EQ(DispatchKey::CPU, extractDispatchKey(outputs[0].toTensor())); EXPECT_TRUE(outputs[1].isNone()); - EXPECT_EQ("text", outputs[2].toString()->string()); + EXPECT_EQ("text", outputs[2].toStringRef()); outputs = callOp(*op, dummyTensor(DispatchKey::CPU), c10::IValue(), 4, c10::IValue()); EXPECT_EQ(3, outputs.size()); diff --git a/aten/src/ATen/core/class_type.cpp b/aten/src/ATen/core/class_type.cpp index 9d7b38d4d67b..2478bde034bc 100644 --- a/aten/src/ATen/core/class_type.cpp +++ b/aten/src/ATen/core/class_type.cpp @@ -86,7 +86,7 @@ std::string ClassType::getForwardPreHookErrorMessage(int pre_hook_idx) const { std::string pre_hook_schema = pre_hook_name + "(self, input: Tuple[" + input_types + "])"; std::string return_string = - "This error occured while scripting the forward pre-hook '" + + "This error occurred while scripting the forward pre-hook '" + pre_hook_name + "' on module '" + name()->name() + "'. If you did not want to script this pre-hook remove it from the " "original NN module before scripting. Pre-hooks for module '" + @@ -111,7 +111,7 @@ std::string ClassType::getForwardHookErrorMessage(int hook_idx) const { std::string hook_schema = hook_name + "(self, input: Tuple[" + input_types + "], output: " + output_types + ")"; std::string return_string = - "This error occured while scripting the forward hook '" + "This error occurred while scripting the forward hook '" + hook_name + "' on module " + name()->name() + ". If you did not want to script this hook remove it from" + " the original NN module before scripting. This hook was" + diff --git a/aten/src/ATen/core/custom_class.cpp b/aten/src/ATen/core/custom_class.cpp index 2bba7e6df62f..d719dde6ea0c 100644 --- a/aten/src/ATen/core/custom_class.cpp +++ b/aten/src/ATen/core/custom_class.cpp @@ -143,6 +143,7 @@ c10::FunctionSchema class_base::withNewArguments( new_args.emplace_back( default_arg.name_, old_arg.type(), + old_arg.real_type(), old_arg.N(), default_arg.value_); } diff --git a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h index 6a46a795be42..7401297c66a6 100644 --- a/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h +++ b/aten/src/ATen/core/dispatch/DispatchKeyExtractor.h @@ -74,7 +74,13 @@ namespace detail { } } } - void operator()(at::ArrayRef>) { + // Structured Tensor[] translates to this case + void operator()(at::ITensorListRef xs) { + for (const auto& x : xs) { + ts = ts | x.key_set(); + } + } + [[noreturn]] void operator()(at::ArrayRef>) { // Just checking that the handling of Tensor?[] didn't change. TORCH_INTERNAL_ASSERT(false); } @@ -114,6 +120,9 @@ namespace detail { * they have been registered as fallthrough. The set of excluded backends * varies from operator, as some operators may have overridden the * fallthrough with custom behavior. 
+ * + * Note - this should maintain identical impl to the py dispatcher key extraction logic + * at pytorch/torch/dispatcher.py */ struct TORCH_API DispatchKeyExtractor final { public: diff --git a/aten/src/ATen/core/dispatch/Dispatcher.cpp b/aten/src/ATen/core/dispatch/Dispatcher.cpp index 667eefdcc5ab..8b2257605161 100644 --- a/aten/src/ATen/core/dispatch/Dispatcher.cpp +++ b/aten/src/ATen/core/dispatch/Dispatcher.cpp @@ -1,6 +1,7 @@ #include #include #include +#include namespace c10 { @@ -9,6 +10,12 @@ bool show_dispatch_trace() { return temp != nullptr; } +static thread_local int64_t dispatch_trace_nesting_value_; + +void dispatch_trace_nesting_incr() { ++dispatch_trace_nesting_value_; } +void dispatch_trace_nesting_decr() { --dispatch_trace_nesting_value_; } +int64_t dispatch_trace_nesting_value() { return dispatch_trace_nesting_value_; } + namespace detail { class RegistrationListenerList final { @@ -44,7 +51,9 @@ Dispatcher::Dispatcher() , operatorLookupTable_() , backendFallbackKernels_() , listeners_(std::make_unique()) -, mutex_() {} +, mutex_() +, cond_var_() +{} Dispatcher::~Dispatcher() = default; @@ -63,6 +72,41 @@ c10::optional Dispatcher::findOp(const OperatorName& overload_na }); } +// NB: If you add more waitFor* implementations, you also have to add +// appropriate notify_all() calls to the relevant register calls + +void Dispatcher::waitForDef(const FunctionSchema& schema) { + using namespace std::chrono_literals; + std::unique_lock lock(mutex_); + bool r = cond_var_.wait_for(lock, 2s, [&]{ + return findOp(schema.operator_name()) != c10::nullopt; + }); + TORCH_INTERNAL_ASSERT(r, + "Expected main interpreter to define ", schema.operator_name(), + ", but this didn't happen within timeout. Are you trying to load " + "different models in the same torchdeploy/multipy instance? You " + "must warmup each interpreter identically, e.g., import all " + "the same dependencies."); +} + +void Dispatcher::waitForImpl(const OperatorName& op_name, c10::optional maybe_dk) { + using namespace std::chrono_literals; + std::unique_lock lock(mutex_); + auto dk = maybe_dk.value_or(DispatchKey::CompositeImplicitAutograd); + auto op = findOrRegisterName_(op_name); + bool r = cond_var_.wait_for(lock, 2s, [&]{ + // NB: this is slightly unsound for overrides, but overrides are + // funny business anyway + return op.hasKernelForDispatchKey(dk); + }); + TORCH_INTERNAL_ASSERT(r, + "Expected main interpreter to implement ", dk, " for ", op_name, + ", but this didn't happen within timeout. Are you trying to load " + "different models in the same torchdeploy/multipy instance? 
You " + "must warmup each interpreter identically, e.g., import all " + "the same dependencies."); +} + c10::optional Dispatcher::findSchema(const OperatorName& overload_name) { auto it = findOp(overload_name); if (it.has_value()) { @@ -169,6 +213,8 @@ RegistrationHandleRAII Dispatcher::registerDef(FunctionSchema schema, std::strin ++op.operatorDef_->def_count; ++op.operatorDef_->def_and_impl_count; + cond_var_.notify_all(); + return RegistrationHandleRAII([this, op, op_name] { deregisterDef_(op, op_name); }); @@ -221,6 +267,8 @@ RegistrationHandleRAII Dispatcher::registerImpl( ++op.operatorDef_->def_and_impl_count; + cond_var_.notify_all(); + return RegistrationHandleRAII([this, op, op_name, dispatch_key, handle] { deregisterImpl_(op, op_name, dispatch_key, handle); }); @@ -243,6 +291,7 @@ RegistrationHandleRAII Dispatcher::registerName(OperatorName op_name) { std::lock_guard lock(mutex_); auto op = findOrRegisterName_(op_name); ++op.operatorDef_->def_and_impl_count; + return RegistrationHandleRAII( [this, op, op_name] { deregisterName_(op, op_name); }); } diff --git a/aten/src/ATen/core/dispatch/Dispatcher.h b/aten/src/ATen/core/dispatch/Dispatcher.h index bc40bc5b62e0..5af8ef1e52de 100644 --- a/aten/src/ATen/core/dispatch/Dispatcher.h +++ b/aten/src/ATen/core/dispatch/Dispatcher.h @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -19,6 +20,14 @@ namespace c10 { TORCH_API bool show_dispatch_trace(); +TORCH_API void dispatch_trace_nesting_incr(); +TORCH_API void dispatch_trace_nesting_decr(); +TORCH_API int64_t dispatch_trace_nesting_value(); + +struct DispatchTraceNestingGuard { + DispatchTraceNestingGuard() { dispatch_trace_nesting_incr(); } + ~DispatchTraceNestingGuard() { dispatch_trace_nesting_decr(); } +}; class TORCH_API OperatorHandle; template class TypedOperatorHandle; @@ -168,6 +177,15 @@ class TORCH_API Dispatcher final { // See Note [Plumbing Keys Through The Dispatcher] void redispatchBoxed(const OperatorHandle& op, DispatchKeySet dispatchKeySet, Stack* stack) const; + bool hasBackendFallbackForDispatchKey(DispatchKey dk) { + auto dispatch_ix = getDispatchTableIndexForDispatchKey(dk); + if (dispatch_ix < 0) return false; + return backendFallbackKernels_[dispatch_ix].kernel.isValid(); + } + + // Used by torchdeploy/multipy for multiple interpreters racing. + void waitForDef(const FunctionSchema& schema); + void waitForImpl(const OperatorName& op_name, c10::optional dispatch_key); // ------------------------------------------------------------------------ // @@ -293,7 +311,23 @@ class TORCH_API Dispatcher final { std::array backendFallbackKernels_; std::unique_ptr listeners_; + + // This mutex protects concurrent access to the dispatcher std::mutex mutex_; + + // This condition variable gets notified whenever we add a new def/impl to the + // dispatch table. This is primarily used by multipy/torchdeploy, when + // we have multiple interpreters trying to register to the dispatch table. + // In this situation, whenever the non-primary interpreter would have tried + // to register to the dispatch table, instead it will check to see if the + // expected registration has already been made, and if it hasn't, wait on + // this condition variable to see if it was just racing with the primary + // interpreter. + // + // We expect it to be rare for there to be any waiters on this condition + // variable. 
This is mostly just to help give better diagnostics if + // something goes horribly wrong + std::condition_variable cond_var_; }; /** @@ -302,6 +336,8 @@ class TORCH_API Dispatcher final { * to lookup a kernel for a certain set of arguments. */ class TORCH_API OperatorHandle { + template friend class std::hash; + public: OperatorHandle(OperatorHandle&&) noexcept = default; OperatorHandle& operator=(OperatorHandle&&) noexcept = default; @@ -333,6 +369,10 @@ class TORCH_API OperatorHandle { return operatorDef_->op.hasKernelForDispatchKey(k); } + bool hasKernelForAnyDispatchKey(DispatchKeySet k) const { + return operatorDef_->op.hasKernelForAnyDispatchKey(k); + } + bool hasComputedKernelForDispatchKey(DispatchKey k) const { return operatorDef_->op.hasComputedKernelForDispatchKey(k); } @@ -388,6 +428,19 @@ class TORCH_API OperatorHandle { c10::Dispatcher::singleton().redispatchBoxed(*this, ks, stack); } + template + PyObject* getPythonOp(c10::impl::PyInterpreter* self_interpreter, F slow_accessor) const { + return operatorDef_->op.getPythonOp(self_interpreter, slow_accessor); + } + + bool operator==(const OperatorHandle& other) const { + return operatorDef_ == other.operatorDef_; + } + + bool operator!=(const OperatorHandle& other) const { + return operatorDef_ != other.operatorDef_; + } + private: explicit OperatorHandle(std::list::iterator operatorIterator) : operatorDef_(&*operatorIterator), operatorIterator_(operatorIterator) {} @@ -568,7 +621,10 @@ C10_ALWAYS_INLINE_UNLESS_MOBILE Return Dispatcher::call(const TypedOperatorHandl auto dispatchKeySet = op.operatorDef_->op.dispatchKeyExtractor() .template getDispatchKeySetUnboxed(args...); #ifndef NDEBUG + DispatchTraceNestingGuard debug_guard; if (show_dispatch_trace()) { + auto nesting_value = dispatch_trace_nesting_value(); + for (int64_t i = 0; i < nesting_value; ++i) std::cerr << " "; std::cerr << "[call] op=[" << op.operator_name() << "], key=[" << toString(dispatchKeySet.highestPriorityTypeId()) << "]" << std::endl; } #endif @@ -588,7 +644,10 @@ inline Return Dispatcher::redispatch(const TypedOperatorHandle detail::unused_arg_(args...); // workaround for a false-positive warning about unused parameters in gcc 5 // do not use RecordFunction on redispatch #ifndef NDEBUG + DispatchTraceNestingGuard debug_guard; if (show_dispatch_trace()) { + auto nesting_value = dispatch_trace_nesting_value(); + for (int64_t i = 0; i < nesting_value; ++i) std::cerr << " "; std::cerr << "[redispatch] op=[" << op.operator_name() << "], key=[" << toString(currentDispatchKeySet.highestPriorityTypeId()) << "]" << std::endl; } #endif @@ -601,7 +660,10 @@ inline void Dispatcher::callBoxed(const OperatorHandle& op, Stack* stack) const const auto& entry = op.operatorDef_->op; auto dispatchKeySet = entry.dispatchKeyExtractor().getDispatchKeySetBoxed(stack); #ifndef NDEBUG + DispatchTraceNestingGuard debug_guard; if (show_dispatch_trace()) { + auto nesting_value = dispatch_trace_nesting_value(); + for (int64_t i = 0; i < nesting_value; ++i) std::cerr << " "; std::cerr << "[callBoxed] op=[" << op.operator_name() << "], key=[" << toString(dispatchKeySet.highestPriorityTypeId()) << "]" << std::endl; } #endif @@ -635,16 +697,26 @@ inline void Dispatcher::callBoxedForDispatchKey(const OperatorHandle& op, Dispat // We still compute this as we're obligated to pass it on to the internal // kernel, if it is a boxed fallback auto dispatchKeySet = entry.dispatchKeyExtractor().getDispatchKeySetBoxed(stack); - const auto& kernel = entry.kernelForDispatchKey(dk); + const auto& 
kernel = ([&]() { + if (op.hasKernelForDispatchKey(dk)) { + return entry.kernelForDispatchKey(dk); + } else { + auto idx = getDispatchTableIndexForDispatchKey(dk); + TORCH_INTERNAL_ASSERT(idx >= 0); + return backendFallbackKernels_[idx].kernel; + } + })(); kernel.callBoxed(op, dispatchKeySet, stack); } - inline void Dispatcher::redispatchBoxed(const OperatorHandle& op, DispatchKeySet dispatchKeySet, Stack* stack) const { // note: this doesn't need the mutex because write operations on the list keep iterators intact. const auto& entry = op.operatorDef_->op; #ifndef NDEBUG + DispatchTraceNestingGuard debug_guard; if (show_dispatch_trace()) { + auto nesting_value = dispatch_trace_nesting_value(); + for (int64_t i = 0; i < nesting_value; ++i) std::cerr << " "; std::cerr << "[redispatchBoxed] op=[" << op.operator_name() << "], key=[" << toString(dispatchKeySet.highestPriorityTypeId()) << "]" << std::endl; } #endif @@ -653,3 +725,14 @@ inline void Dispatcher::redispatchBoxed(const OperatorHandle& op, DispatchKeySet } } // namespace c10 + +namespace std { + +template <> +struct hash { + size_t operator()(c10::OperatorHandle op) const noexcept { + return std::hash{}(static_cast(op.operatorDef_)); + } +}; + +} // namespace std diff --git a/aten/src/ATen/core/dispatch/OperatorEntry.cpp b/aten/src/ATen/core/dispatch/OperatorEntry.cpp index 5c1c42bb6226..5bd5d8abf54d 100644 --- a/aten/src/ATen/core/dispatch/OperatorEntry.cpp +++ b/aten/src/ATen/core/dispatch/OperatorEntry.cpp @@ -26,6 +26,7 @@ OperatorEntry::OperatorEntry(OperatorName&& operator_name) , dispatchKeyExtractor_(DispatchKeyExtractor::makeUninitialized()) , kernels_() , cpp_signature_() +, sym_cpp_signature_() , is_observed_(ObservedOperators::isObserved(name_)) { // Pick up any backend fallbacks that were registered prior to this @@ -34,7 +35,10 @@ OperatorEntry::OperatorEntry(OperatorName&& operator_name) } namespace { - void checkSchema(const OperatorName& name, const FunctionSchema& from_def, const std::string& from_def_debug, const FunctionSchema& inferred, const std::string& inferred_debug) { + void checkSchema(const OperatorName& name, const FunctionSchema& from_def_, const std::string& from_def_debug, const KernelFunction& kernel, const FunctionSchema& inferred_, const std::string& inferred_debug) { + // TODO: figure out if we can just directly save real schema at def time + FunctionSchema from_def = from_def_.cloneWithRealTypes(kernel.isValidSymUnboxed()); + FunctionSchema inferred = inferred_.cloneWithRealTypes(); c10::optional schema_difference = findSchemaDifferences(from_def, inferred); if (schema_difference.has_value()) { TORCH_CHECK(false, @@ -60,12 +64,24 @@ const AnnotatedKernel& OperatorEntry::ambiguousAutogradOtherKernel() const { return kernel; } +void OperatorEntry::assertSignatureIsCorrect(const CppSignature call_signature, bool has_symint) const { + if (has_symint) { + if (C10_UNLIKELY(sym_cpp_signature_.has_value() && (call_signature != sym_cpp_signature_->signature))) { + reportSignatureError(call_signature, *sym_cpp_signature_); + } + } else { + if (C10_UNLIKELY(cpp_signature_.has_value() && (call_signature != cpp_signature_->signature))) { + reportSignatureError(call_signature, *cpp_signature_); + } + } +} + void OperatorEntry::registerSchema(FunctionSchema&& schema, std::string&& debug, std::vector tags) { TORCH_INTERNAL_ASSERT(!schema_.has_value()); for (const auto& kernel : kernels_) { for (const auto &j : kernel.second) { if (j.inferred_function_schema != nullptr) { - checkSchema(name_, schema, debug, 
*j.inferred_function_schema, j.debug); + checkSchema(name_, schema, debug, j.kernel, *j.inferred_function_schema, j.debug); } } } @@ -99,25 +115,26 @@ OperatorEntry::AnnotatedKernelContainerIterator OperatorEntry::registerKernel( // which means if you could validly change the type of a cpp_signature, then // that would also invalidate the old TypedOperatorHandles. if (cpp_signature.has_value()) { - if (cpp_signature_.has_value()) { - TORCH_CHECK(*cpp_signature == cpp_signature_->signature, + auto& local_cpp_signature = kernel.isValidSymUnboxed() ? sym_cpp_signature_ : cpp_signature_; + if (local_cpp_signature.has_value()) { + TORCH_CHECK(*cpp_signature == local_cpp_signature->signature, "\nMismatch in kernel C++ signatures\n", " operator: ", (this->schema_.has_value() ? toString(this->schema_->schema) : toString(name_)), "\n", " ", (this->schema_.has_value() ? this->schema_->debug : "no debug info"), "\n", - " kernel 1: ", cpp_signature_->signature.name(), "\n", - " dispatch key: ", toString(cpp_signature_->dispatch_key), "\n", - " ", cpp_signature_->debug, "\n", + " kernel 1: ", local_cpp_signature->signature.name(), "\n", + " dispatch key: ", toString(local_cpp_signature->dispatch_key), "\n", + " ", local_cpp_signature->debug, "\n", " kernel 2: ", cpp_signature->name(), "\n", " dispatch key: ", toString(dispatch_key), "\n", " ", debug, "\n" ); } else { - cpp_signature_ = CppSignatureWithDebug { *cpp_signature, debug, dispatch_key }; + local_cpp_signature = CppSignatureWithDebug { *cpp_signature, debug, dispatch_key }; } } if (schema_ && inferred_function_schema) { - checkSchema(name_, schema_->schema, schema_->debug, *inferred_function_schema, debug); + checkSchema(name_, schema_->schema, schema_->debug, kernel, *inferred_function_schema, debug); } // Add the kernel to the kernels list, @@ -130,13 +147,17 @@ OperatorEntry::AnnotatedKernelContainerIterator OperatorEntry::registerKernel( #else if (k.size() > 0) { #endif - TORCH_WARN("Overriding a previously registered kernel for the same operator and the same dispatch key\n", - " operator: ", (schema_.has_value() ? toString(schema_->schema) : toString(name_)), "\n", - " ", (this->schema_.has_value() ? this->schema_->debug : "no debug info"), "\n", - " dispatch key: ", toString(dispatch_key), "\n", - " previous kernel: ", (cpp_signature_.has_value() ? cpp_signature_->debug : "no debug info"), "\n", - " new kernel: ", debug - ); + // Suppress the warning for Meta key as we are overriding C++ meta functions with python meta functions + // for some ops + if (dispatch_key != DispatchKey::Meta) { + TORCH_WARN("Overriding a previously registered kernel for the same operator and the same dispatch key\n", + " operator: ", (schema_.has_value() ? toString(schema_->schema) : toString(name_)), "\n", + " ", (this->schema_.has_value() ? this->schema_->debug : "no debug info"), "\n", + " dispatch key: ", toString(dispatch_key), "\n", + " previous kernel: ", (cpp_signature_.has_value() ? cpp_signature_->debug : (sym_cpp_signature_.has_value() ? sym_cpp_signature_->debug : "no debug info")), "\n", + " new kernel: ", debug + ); + } } #ifdef C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY @@ -303,6 +324,19 @@ std::pair OperatorEntry::computeDispatchTab // For AutogradOther, we return ambiguousAutogradOtherKernel() if there's registration // to any of its backends. // See Note [Undefined in dispatchTable_] for the special handling for Undefined. 
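// The operator== / std::hash<OperatorHandle> additions above make OperatorHandle usable
// as a key in standard hash containers (hashing on the underlying OperatorDef pointer).
// A small usage sketch, assuming OperatorHandle stays copyable; the function name and
// the set are illustrative, not part of the dispatcher:
//
//   #include <unordered_set>
//   #include <ATen/core/dispatch/Dispatcher.h>
//
//   void note_seen(const c10::OperatorHandle& op) {
//     static std::unordered_set<c10::OperatorHandle> seen;  // uses std::hash<c10::OperatorHandle>
//     seen.insert(op);  // equality compares the underlying OperatorDef pointers
//   }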
+ + // If the dispatch key is included in CompositeImplicitAutogradNestedTensor, + // then we register it to nested-tensor kernel rather than + // regular-tensor CompositeImplicitAutograd kernel. + // We have no intention to change the behavior of Undefined, + // so this nested-tensor branch requires `dispatch_key != DispatchKey::Undefined` + // to let the original CompositeImplicitAutograd handle Undefined + if (dispatch_key != DispatchKey::Undefined && isIncludedInAlias(dispatch_key, DispatchKey::CompositeImplicitAutogradNestedTensor)) { + if (auto nested_registration = getKernelForDispatchKey(DispatchKey::CompositeImplicitAutogradNestedTensor)) { + return {*nested_registration, "nested kernel"}; + } + } + if (dispatch_key == DispatchKey::Undefined || isIncludedInAlias(dispatch_key, DispatchKey::CompositeImplicitAutograd)) { if (auto math_registration = getKernelForDispatchKey(DispatchKey::CompositeImplicitAutograd)) { if (dispatch_key == DispatchKey::AutogradOther @@ -452,19 +486,35 @@ std::string OperatorEntry::listAllDispatchKeys() const { return str.str(); } -void OperatorEntry::reportSignatureError(const CppSignature call_signature) const { +void OperatorEntry::reportSignatureError(const CppSignature& call_signature, const CppSignatureWithDebug& saved_signature) const { TORCH_CHECK(false, "\nTried to access or call an operator with a wrong signature.\n", " operator: ", (schema_.has_value() ? toString(schema_->schema) : toString(name_)), "\n", " ", (schema_.has_value() ? schema_->debug : "unknown debug info"), "\n", - " correct signature: ", cpp_signature_->signature.name(), "\n", - " ", cpp_signature_->debug, "\n", + " correct signature: ", saved_signature.signature.name(), "\n", + " ", saved_signature.debug, "\n", " accessed/called as: ", call_signature.name(), "\n", "This likely happened in a call to OperatorHandle::typed(). ", "Please make sure that the function signature matches the signature in the operator registration call." ); }; +std::string post_process_dispatch_key_str(std::string dispatch_key) { + const std::string substr = "PrivateUse1"; + if (substr.size() <= dispatch_key.size() && std::equal(substr.rbegin(), substr.rend(), dispatch_key.rbegin())) { + auto privateuse1_backend = get_privateuse1_backend(); + if (privateuse1_backend != "privateuseone") { + // remove trailing "*PrivateUse1" + dispatch_key.erase(dispatch_key.length() - substr.length()); + // append the registered backend's name. + // AutogradPrivateUse1 -> AutogradFoo + auto backend_name = c10::get_privateuse1_backend(); + dispatch_key = dispatch_key + backend_name; + } + } + return dispatch_key; +} + void OperatorEntry::reportError(DispatchKey dispatchKey) const { // If there is an invariant problem, report it now. checkInvariants(); @@ -479,7 +529,7 @@ void OperatorEntry::reportError(DispatchKey dispatchKey) const { } TORCH_CHECK_NOT_IMPLEMENTED(false, "Could not run '", name_, "' with arguments", - " from the '", toString(dispatchKey), "' backend. This could be because " + " from the '", post_process_dispatch_key_str(toString(dispatchKey)), "' backend. This could be because " "the operator doesn't exist for this backend, or was omitted during ", "the selective/custom build process (if using custom build). 
If you are a ", "Facebook employee using PyTorch on mobile, please visit ", diff --git a/aten/src/ATen/core/dispatch/OperatorEntry.h b/aten/src/ATen/core/dispatch/OperatorEntry.h index 1d9f1495f3c7..c3bd91197f5e 100644 --- a/aten/src/ATen/core/dispatch/OperatorEntry.h +++ b/aten/src/ATen/core/dispatch/OperatorEntry.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -163,14 +164,10 @@ class TORCH_API OperatorEntry final { // Asserts that the given FuncType is correct for calling this operator in an unboxed way. template inline void assertSignatureIsCorrect() { - assertSignatureIsCorrect(CppSignature::make()); + assertSignatureIsCorrect(CppSignature::make(), fn_has_symint::value); } - void assertSignatureIsCorrect(const CppSignature call_signature) { - if (C10_UNLIKELY(cpp_signature_.has_value() && (call_signature != cpp_signature_->signature))) { - reportSignatureError(call_signature); - } - } + void assertSignatureIsCorrect(const CppSignature call_signature, bool has_symint) const; [[noreturn]] void reportError(DispatchKey dispatchKey) const; @@ -215,6 +212,11 @@ class TORCH_API OperatorEntry final { // Returns all the operator tags added at the time of registration const std::vector& getTags() const; + template + PyObject* getPythonOp(PyInterpreter* self_interpreter, F slow_accessor) const { + return py_cache_.ptr_or(self_interpreter, slow_accessor); + } + private: OperatorName name_; @@ -224,6 +226,8 @@ class TORCH_API OperatorEntry final { #endif std::array dispatchTable_; DispatchKeyExtractor dispatchKeyExtractor_; + // Pointer to the torch.ops.ns.op.overload object for speed + c10::PyHandleCache py_cache_; // kernels_ stores all registered kernels for the corresponding dispatch key // and catchAllKernels_ stores the catch-all kernels. 
@@ -280,11 +284,12 @@ class TORCH_API OperatorEntry final { c10::optional dispatch_key; }; c10::optional cpp_signature_; + c10::optional sym_cpp_signature_; // Whether this operator needs to be observed with RecordFunction const bool is_observed_; - [[noreturn]] void reportSignatureError(CppSignature call_signature) const; + [[noreturn]] void reportSignatureError(const CppSignature& call_signature, const CppSignatureWithDebug& saved_signature) const; const KernelFunction& computeDispatchTableEntry(const c10::Dispatcher& dispatcher, DispatchKey dispatch_key) const; std::pair computeDispatchTableEntryWithDebug( const c10::Dispatcher& dispatcher, DispatchKey dispatch_key diff --git a/aten/src/ATen/core/dynamic_type.cpp b/aten/src/ATen/core/dynamic_type.cpp index 5920d7c05f1f..49dd593e38d3 100644 --- a/aten/src/ATen/core/dynamic_type.cpp +++ b/aten/src/ATen/core/dynamic_type.cpp @@ -231,8 +231,6 @@ TypePtr DynamicType::fallback() const { return BoolType::get(); case Tag::Int: return IntType::get(); - case Tag::SymInt: - return SymIntType::get(); case Tag::Float: return FloatType::get(); case Tag::Complex: @@ -326,8 +324,6 @@ DynamicType::Ptr IValue::TagType::get(const c10::IValue& v) { return DynamicTypeTrait::getBaseType(); case Tag::Int: return DynamicTypeTrait::getBaseType(); - case Tag::SymInt: - return DynamicTypeTrait::getBaseType(); case Tag::Bool: return DynamicTypeTrait::getBaseType(); case Tag::String: diff --git a/aten/src/ATen/core/dynamic_type.h b/aten/src/ATen/core/dynamic_type.h index a84644ddde04..1f649c8217cb 100644 --- a/aten/src/ATen/core/dynamic_type.h +++ b/aten/src/ATen/core/dynamic_type.h @@ -16,7 +16,6 @@ constexpr DynamicTypeBits kDynamicAnyTypeBit = DYNAMIC_TYPE_BIT(30); constexpr DynamicTypeBits kDynamicNoneTypeBit = DYNAMIC_TYPE_BIT(1); constexpr DynamicTypeBits kDynamicIntTypeBit = DYNAMIC_TYPE_BIT(3); -constexpr DynamicTypeBits kDynamicSymIntTypeBit = DYNAMIC_TYPE_BIT(23); constexpr DynamicTypeBits kDynamicFloatTypeBit = DYNAMIC_TYPE_BIT(4); constexpr DynamicTypeBits kDynamicComplexTypeBit = DYNAMIC_TYPE_BIT(5); constexpr DynamicTypeBits kDynamicListTypeBit = DYNAMIC_TYPE_BIT(7); @@ -29,7 +28,6 @@ constexpr DynamicTypeBits kDynamicClassTypeBit = DYNAMIC_TYPE_BIT(10); _(Bool, DYNAMIC_TYPE_BIT(2), 1) \ _(Int, kDynamicIntTypeBit, 1) \ _(Float, kDynamicFloatTypeBit, 1) \ - _(SymInt, kDynamicSymIntTypeBit, 1) \ _(Complex, kDynamicComplexTypeBit, 1) \ _(Number, \ (kDynamicIntTypeBit | kDynamicFloatTypeBit | kDynamicComplexTypeBit), \ @@ -63,6 +61,7 @@ constexpr DynamicTypeBits kDynamicClassTypeBit = DYNAMIC_TYPE_BIT(10); #define FORALL_DYNAMIC_TYPES_FAKE(_) \ _(ScalarType, kDynamicIntTypeBit, 1) \ _(Layout, kDynamicIntTypeBit, 1) \ + _(SymInt, kDynamicIntTypeBit, 1) \ _(MemoryFormat, kDynamicIntTypeBit, 1) #define FORWARD_DECL_TYPE(NAME, _, __) struct NAME ## Type; diff --git a/aten/src/ATen/core/function_schema.cpp b/aten/src/ATen/core/function_schema.cpp index a3a10862178c..440ee446d499 100644 --- a/aten/src/ATen/core/function_schema.cpp +++ b/aten/src/ATen/core/function_schema.cpp @@ -17,6 +17,37 @@ const std::vector& FunctionSchema::getCorrectList(SchemaArgType type) } } +FunctionSchema FunctionSchema::cloneWithRealTypes(bool with_symint) const { + auto cloneWithRealTypes = [&](const Argument& a) { + if (with_symint) { + return a.cloneWithType(a.real_type()); + } + // Don't use real type if it looks like a SymInt + // NB: keep this in sync with unpackSymInt in KernelFunction_impl.h + if ( + *a.real_type() == *getTypePtr() || + *a.real_type() == *getTypePtr>() || + 
*a.real_type() == *getTypePtr() || + *a.real_type() == *getTypePtr() + ) { + // Keep the fake type + return a.cloneWithType(a.type()); + } else { + return a.cloneWithType(a.real_type()); + } + }; + std::vector new_arguments, new_returns; + std::transform(arguments().begin(), arguments().end(), std::back_inserter(new_arguments), cloneWithRealTypes); + std::transform(returns().begin(), returns().end(), std::back_inserter(new_returns), cloneWithRealTypes); + return FunctionSchema( + name(), + overload_name(), + std::move(new_arguments), + std::move(new_returns), + is_vararg(), + is_varret()); +} + bool FunctionSchema::canAliasTypeSetsAlias(const c10::optional &lhs, const c10::optional &rhs) const { if (!lhs || !rhs) { return false; diff --git a/aten/src/ATen/core/function_schema.h b/aten/src/ATen/core/function_schema.h index 77fdb20f6516..d80eaf6581e0 100644 --- a/aten/src/ATen/core/function_schema.h +++ b/aten/src/ATen/core/function_schema.h @@ -44,7 +44,7 @@ struct Argument { c10::optional alias_info = c10::nullopt) : name_(std::move(name)), type_(fake_type ? std::move(fake_type) : TensorType::get()), - real_type_(real_type ? std::move(real_type) : TensorType::get()), + real_type_(real_type ? std::move(real_type) : type_), N_(std::move(N)), default_value_(std::move(default_value)), alias_info_(alias_info ? std::make_unique(std::move(*alias_info)) : nullptr), @@ -88,6 +88,8 @@ struct Argument { const TypePtr& type() const { return type_; } + // if type() is non-null, this is guaranteed to be non-null (if no real + // type was provided, this takes on type()'s value) const TypePtr& real_type() const { return real_type_; } @@ -214,6 +216,7 @@ enum struct TORCH_API SchemaArgType { input, output }; struct TORCH_API SchemaArgument { SchemaArgType type; size_t index; + SchemaArgument(SchemaArgType tpe, size_t idx) : type(tpe), index(idx) {} bool operator==(const SchemaArgument& rhs) const { return type == rhs.type && index == rhs.index; } @@ -472,6 +475,8 @@ struct TORCH_API FunctionSchema { FunctionSchema cloneWithRemappedTypes( const std::function type_map) const; + FunctionSchema cloneWithRealTypes(bool with_symint=true) const; + // Check that inputs have the correct types and appends any missing default // values. template @@ -546,19 +551,31 @@ inline std::ostream& operator<<(std::ostream& out, const Argument& arg) { // in schema, we have Tensor?(a!) input, and t(a!)?. // however, t?(a!) doesn't work with schema parser. // so we always use Type(alias)? format - auto type = arg.type(); + // real_type versus fake_type: in order to be compatible with FunctionSchema + // parser, printing an argument with either MemoryFormat or Layout type should + // give us the original schema string, hence printing out real_type. + auto type = arg.real_type(); bool is_opt = type->kind() == OptionalType::Kind; auto unopt_type = is_opt ? type->castRaw()->getElementType() : type; - if (unopt_type->kind() == ListType::Kind && arg.N()) { + if (unopt_type->kind() == ListType::Kind) { // sized lists get size N from arg, not type auto list = unopt_type->cast(); - out << list->getElementType()->str() << "[" << *arg.N() << "]"; + out << list->getElementType()->str(); + if (arg.alias_info() && !arg.alias_info()->containedTypes().empty()){ + out << arg.alias_info()->containedTypes()[0]; + } + std::string N = ""; + if (arg.N()) { + N = std::to_string(*arg.N()); + } + out << "[" << N << "]"; } else { out << unopt_type->str(); } - if (arg.alias_info()) { + // print alias info if it has beforeSets. 
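// A hypothetical restatement of the SymInt check inside cloneWithRealTypes above: an
// argument keeps its fake type whenever its real type is SymInt-flavoured. The exact
// type set below is an assumption, and (per the comment above) any such predicate must
// stay in sync with unpackSymInt in KernelFunction_impl.h:
//
//   #include <ATen/core/jit_type.h>
//   #include <c10/core/SymInt.h>
//   #include <c10/util/Optional.h>
//
//   // Hypothetical helper, not the actual lambda.
//   inline bool looks_like_symint(const c10::TypePtr& t) {
//     return *t == *c10::getTypePtr<c10::SymInt>() ||
//            *t == *c10::getTypePtr<c10::SymIntArrayRef>() ||
//            *t == *c10::getTypePtr<c10::optional<c10::SymInt>>();
//   }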
+ if (arg.alias_info() && !arg.alias_info()->beforeSets().empty()) { out << *arg.alias_info(); } diff --git a/aten/src/ATen/core/interned_strings.h b/aten/src/ATen/core/interned_strings.h index dc5860ebf2c4..2abc6217516d 100644 --- a/aten/src/ATen/core/interned_strings.h +++ b/aten/src/ATen/core/interned_strings.h @@ -50,8 +50,11 @@ namespace c10 { _(prim, FunctionalGraph) \ _(prim, add_optional) \ _(prim, view_copy) \ + _(prim, permute_copy) \ _(prim, reshape_copy) \ _(prim, squeeze_copy) \ + _(prim, t_copy) \ + _(prim, transpose_copy) \ _(prim, unsqueeze_copy) \ _(prim, flatten_copy) \ _(prim, expand_copy) \ @@ -236,6 +239,7 @@ namespace c10 { _(onnx, LSTM) \ _(onnx, MatMul) \ _(onnx, Min) \ + _(onnx, Max) \ _(onnx, Mul) \ _(onnx, Pow) \ _(onnx, RNN) \ diff --git a/aten/src/ATen/core/ivalue.cpp b/aten/src/ATen/core/ivalue.cpp index eb977f09cbe6..304ff8cf3f5c 100644 --- a/aten/src/ATen/core/ivalue.cpp +++ b/aten/src/ATen/core/ivalue.cpp @@ -93,6 +93,8 @@ c10::TypePtr IValue::TagType::get(const IValue& v) { return IntType::get(); case Tag::SymInt: return c10::SymIntType::get(); + case Tag::SymFloat: + return c10::SymFloatType::get(); case Tag::Bool: return BoolType::get(); case Tag::String: @@ -302,6 +304,10 @@ IValue IValue::equals(const IValue& rhs) const { return rhs.isInt() && lhs.toInt() == rhs.toInt(); case Tag::SymInt: return rhs.isSymInt() && lhs.toSymInt() == rhs.toSymInt(); + case Tag::SymFloat: + // NB: this doesn't actually work as sym floats don't have equality + // defined + return rhs.isSymFloat() && lhs.toSymFloat() == rhs.toSymFloat(); case Tag::Bool: return rhs.isBool() && lhs.toBool() == rhs.toBool(); case Tag::String: @@ -353,8 +359,11 @@ size_t IValue::hash(const IValue& v) { return c10::get_hash(v.payload.u.as_int); case Tag::Int: return c10::get_hash(v.payload.u.as_int); + // NB: these are technically strict aliasing violations case Tag::SymInt: return c10::get_hash(v.payload.u.as_int); + case Tag::SymFloat: + return c10::get_hash(v.payload.u.as_int); case Tag::String: return c10::get_hash(v.toStringRef()); case Tag::Tuple: @@ -584,6 +593,8 @@ std::ostream& IValue::repr( return out << v.toInt(); case IValue::Tag::SymInt: return out << v.toSymInt(); + case IValue::Tag::SymFloat: + return out << v.toSymFloat(); case IValue::Tag::Bool: return out << (v.toBool() ? "True" : "False"); case IValue::Tag::Tuple: { @@ -772,6 +783,8 @@ std::ostream& operator<<(std::ostream & out, const IValue & v) { return out << v.toInt(); case IValue::Tag::SymInt: return out << v.toSymInt(); + case IValue::Tag::SymFloat: + return out << v.toSymFloat(); case IValue::Tag::Bool: return out << (v.toBool() ? 
"True" : "False"); case IValue::Tag::Tuple: { @@ -906,6 +919,7 @@ IValue IValue::deepcopy( case IValue::Tag::Double: case IValue::Tag::Int: case IValue::Tag::SymInt: + case IValue::Tag::SymFloat: case IValue::Tag::Bool: case IValue::Tag::Device: case IValue::Tag::Uninitialized: { diff --git a/aten/src/ATen/core/ivalue.h b/aten/src/ATen/core/ivalue.h index 8d0199b3c954..3461fe2300e4 100644 --- a/aten/src/ATen/core/ivalue.h +++ b/aten/src/ATen/core/ivalue.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -27,6 +28,8 @@ template class Dict; template class List; +template +class IListRef; struct IValue; struct ClassType; struct Type; @@ -145,6 +148,7 @@ struct Capsule { _(ComplexDouble) \ _(Int) \ _(SymInt) \ + _(SymFloat) \ _(Bool) \ _(Tuple) \ _(String) \ @@ -421,6 +425,7 @@ struct TORCH_API IValue final { at::Tensor& toTensor() &; const at::Tensor& toTensor() const&; at::TensorImpl* unsafeToTensorImpl() const { + TORCH_INTERNAL_ASSERT(isTensor()); return payload.as_tensor.unsafeGetTensorImpl(); } @@ -558,21 +563,35 @@ struct TORCH_API IValue final { IValue(c10::SymInt i) { if (i.is_symbolic()) { tag = Tag::SymInt; - payload.u.as_intrusive_ptr = i.toSymIntNodeImpl().release(); + payload.u.as_intrusive_ptr = i.toSymNodeImpl().release(); } else { tag = Tag::Int; payload.u.as_int = i.as_int_unchecked(); } } - IValue(c10::SymIntArrayRef v); - bool isSymInt() const { return Tag::SymInt == tag; } c10::SymInt toSymInt() const; + IValue(c10::SymFloat i) { + if (i.is_symbolic()) { + tag = Tag::SymFloat; + payload.u.as_intrusive_ptr = i.toSymNodeImpl().release(); + } else { + tag = Tag::Double; + payload.u.as_double = i.as_float_unchecked(); + } + } + + bool isSymFloat() const { + return Tag::SymFloat == tag; + } + + c10::SymFloat toSymFloat() const; + // allow you to pass literals (3, 4) without ambiguity IValue(int32_t i) : IValue(static_cast(i)) {} @@ -665,22 +684,59 @@ struct TORCH_API IValue final { c10::ArrayRef toListRef() const; // Some template constructors of IValue calls another constructor recursively. - // This SNIFAEs the called constructor exists. + // This SFINAEs the called constructor exists. template using enable_if_ivalue_constructible = std::enable_if_t::value, std::nullptr_t>; - template = nullptr> + // The rule for lists is more complicated; the generic constructor is only + // acceptable if your element isn't SymInt. If you do have a SymInt element, + // then you must also, at construction time, check if you can decay the list + // into an int list (this is MANDATORY, as at a use site we may expect + // toIntList to work even if at the call site you had a SymIntArrayRef + // argument). In practice, only SymIntArrayRef is used this way, so we + // didn't bother making it work for the other constructors, we just make sure + // they're not selectable. + template + using enable_if_list_is_ivalue_constructible = + std::enable_if_t::value && + !std::is_same::value, std::nullptr_t>; + + template = nullptr> IValue(c10::List&& v); - template = nullptr> + template = nullptr> IValue(const c10::List& v); - template = nullptr> + template = nullptr> IValue(at::ArrayRef v); - template = nullptr> + template = nullptr> IValue(const std::vector& v); template IValue(std::array v); + // Manual constructors for lists of symints, which decay to int list if + // possible. 
To avoid ambiguous overload situations, we template them + // to prevent implicit conversions + template + using enable_if_symint = + std::enable_if_t::value, std::nullptr_t>; + + template = nullptr> + IValue(at::ArrayRef v); + template = nullptr> + IValue(at::OptionalArrayRef v); + template = nullptr> + IValue(const std::vector& v); + + template + using enable_if_ilist_is_ivalue_constructible = std::enable_if_t< + std::is_constructible::value && + std::is_constructible::boxed_type>::value && + !std::is_same::value, + std::nullptr_t>; + + template = nullptr> + IValue(c10::IListRef v); + // GenericDict IValue(c10::Dict v); bool isGenericDict() const { @@ -702,7 +758,7 @@ struct TORCH_API IValue final { template = nullptr> IValue(c10::optional v); - template = nullptr> + template = nullptr> IValue(c10::OptionalArrayRef v); IValue(c10::nullopt_t); @@ -753,7 +809,15 @@ struct TORCH_API IValue final { // Scalar, which gets encoded as either an Int, a Double or a ComplexDouble IValue(const at::Scalar& s) : IValue() { - if (s.isFloatingPoint()) { + // NB: do the symbolic versions first, as isFloatingPoint is true + // for both SymFloat and double + if (s.isSymInt()) { + tag = Tag::SymInt; + payload.u.as_intrusive_ptr = s.toSymInt().toSymNodeImpl().release(); + } else if (s.isSymFloat()) { + tag = Tag::SymFloat; + payload.u.as_intrusive_ptr = s.toSymFloat().toSymNodeImpl().release(); + } else if (s.isFloatingPoint()) { tag = Tag::Double; payload.u.as_double = s.toDouble(); } else if (s.isComplex()) { @@ -769,7 +833,7 @@ struct TORCH_API IValue final { } bool isScalar() const { - return isDouble() || isInt() || isComplexDouble() || isBool(); + return isDouble() || isInt() || isComplexDouble() || isBool() || isSymInt() || isSymFloat(); } at::Scalar toScalar() const { @@ -781,6 +845,10 @@ struct TORCH_API IValue final { return toComplexDouble(); else if (isBool()) return toBool(); + else if (isSymInt()) + return toSymInt(); + else if (isSymFloat()) + return toSymFloat(); throw std::runtime_error("IValue is not a Scalar"); } @@ -1077,6 +1145,8 @@ struct TORCH_API IValue final { return false; case Tag::SymInt: return true; + case Tag::SymFloat: + return true; case Tag::Bool: return false; case Tag::Tuple: @@ -1126,6 +1196,7 @@ struct TORCH_API IValue final { } union Payload { + // [TriviallyCopyablePayload] // We use a nested union here so that we can make the copy easy // and efficient in the non-tensor (i.e., trivially copyable) // case. 
Specifically, we do not have to do a switch-on-tag to @@ -1339,6 +1410,10 @@ struct WeakOrStrongCompilationUnit { return strong_ptr_ != c10::nullopt; } + bool holdingEmptyStrongRef() const { + return holdingStrongRef() && *strong_ptr_ == nullptr; + } + c10::optional> strong_ptr_; c10::optional> weak_ptr_; }; @@ -1362,9 +1437,14 @@ struct TORCH_API WeakOrStrongTypePtr { WeakOrStrongCompilationUnit cu_; TypePtr type_; + bool holds_strong_ref() const { return cu_.holdingStrongRef(); } + + bool holds_empty_strong_ref() const { + return cu_.holdingEmptyStrongRef(); + } }; diff --git a/aten/src/ATen/core/ivalue_inl.h b/aten/src/ATen/core/ivalue_inl.h index 00361c80a01c..bea795c8d81e 100644 --- a/aten/src/ATen/core/ivalue_inl.h +++ b/aten/src/ATen/core/ivalue_inl.h @@ -7,6 +7,7 @@ #include #include +#include #include #include #include @@ -218,12 +219,21 @@ inline at::Generator IValue::toGenerator() const& { inline c10::SymInt IValue::toSymInt() const { AT_ASSERT(isSymInt() || isInt(), "Expected SymInt or int but got ", tagKind()); if (isSymInt()) { - return c10::SymInt::toSymInt(toIntrusivePtr()); + return c10::SymInt(toIntrusivePtr()); } else { return c10::SymInt(payload.u.as_int); } } +inline c10::SymFloat IValue::toSymFloat() const { + AT_ASSERT(isSymFloat() || isDouble(), "Expected SymFloat or double but got ", tagKind()); + if (isSymFloat()) { + return c10::SymFloat(toIntrusivePtr()); + } else { + return c10::SymFloat(payload.u.as_double); + } +} + namespace ivalue { void TORCH_API @@ -1455,6 +1465,10 @@ struct C10_EXPORT ivalue::Object final : c10::intrusive_ptr_target { return !type_.holds_strong_ref(); } + bool is_empty_strong_compilation_ref() const { + return type_.holds_empty_strong_ref(); + } + private: void resizeObject(size_t slot); WeakOrStrongTypePtr type_; @@ -1594,6 +1608,7 @@ DEFINE_TO(at::QScheme, toQScheme) DEFINE_TO(at::Dimname, toDimname) DEFINE_TO(at::Generator, toGenerator) DEFINE_TO(c10::SymInt, toSymInt) +DEFINE_TO(c10::SymFloat, toSymFloat) template struct _fake_type {}; @@ -1987,11 +2002,11 @@ inline IValue::IValue(c10::impl::GenericList v) payload.u.as_intrusive_ptr = null_to_undefined_tensor(v.impl_.release()); } -template > +template > inline IValue::IValue(c10::List&& v) : IValue(impl::toList(std::move(v))) {} -template > +template > inline IValue::IValue(const c10::List& v) : IValue(impl::toList(v)) {} -template > +template > inline IValue::IValue(at::ArrayRef v) : IValue(c10::List()) { auto list = to>(); list.reserve(v.size()); @@ -1999,8 +2014,33 @@ inline IValue::IValue(at::ArrayRef v) : IValue(c10::List()) { list.push_back(e); } } -inline IValue::IValue(c10::SymIntArrayRef v) : IValue(at::ArrayRef(v.data(), v.size())) {} -template > +template > +inline IValue::IValue(at::ArrayRef v) : IValue() { + auto vi = c10::asIntArrayRefSlowOpt(v); + if (vi.has_value()) { + // This list is entirely integers; ensure it is typed as + // an IntList so toIntList works + *this = IValue(*vi); + } else { + // This list has SymInts; type it as a SymInt + *this = IValue(impl::toList(c10::List())); + auto list = to>(); + list.reserve(v.size()); + for (const auto& e : v) { + list.push_back(e); + } + } +} +template > +inline IValue::IValue(at::OptionalArrayRef mb_v) : IValue() { + if (!mb_v.has_value()) return; + *this = IValue(*mb_v); +} +template > +inline IValue::IValue(const std::vector& v) : IValue() { + *this = IValue(at::ArrayRef(v)); +} +template > inline IValue::IValue(const std::vector& v) : IValue(c10::List()) { auto list = to>(); list.reserve(v.size()); @@ -2008,7 
+2048,7 @@ inline IValue::IValue(const std::vector& v) : IValue(c10::List()) { list.push_back(e); } } -template > +template > inline IValue::IValue(c10::OptionalArrayRef v) : IValue() { if (v.has_value()) { *this = IValue(std::move(*v)); @@ -2024,6 +2064,25 @@ inline IValue::IValue(std::array v) : IValue(c10::List()) { } } +template > +inline IValue::IValue(c10::IListRef v) : IValue() { + constexpr bool boxed_type_constructs_ivalue = + std::is_constructible::boxed_type>::value; + // First, we try to use the boxed value. + // If we fail (either it's not in the boxed state, or its boxed type + // can not construct an IValue), we fallback to copying the list. + if (boxed_type_constructs_ivalue && v.isBoxed()) { + *this = IValue(impl::toList(v.toBoxed())); + } else { + c10::List list; + list.reserve(v.size()); + for (const auto& t : v) { + list.push_back(t); + } + *this = IValue(impl::toList(std::move(list))); + } +} + inline IValue::IValue(c10::impl::GenericDict v) : tag(Tag::GenericDict) { payload.u.as_intrusive_ptr = null_to_undefined_tensor(v.impl_.release()); diff --git a/aten/src/ATen/core/jit_type.h b/aten/src/ATen/core/jit_type.h index 50b27a0e8fd8..0a8f5e14d9a5 100644 --- a/aten/src/ATen/core/jit_type.h +++ b/aten/src/ATen/core/jit_type.h @@ -9,6 +9,7 @@ #include #include #include +#include #include #include @@ -1309,7 +1310,6 @@ struct TORCH_API SymIntType : public Type { return "SymInt"; } std::string annotation_str_impl(TypePrinter printer = nullptr) const override { - // TODO: will become a Union[SymIntNodeImpl|int] in the near future return "int"; } static const TypeKind Kind = TypeKind::SymIntType; @@ -1320,6 +1320,26 @@ struct TORCH_API SymIntType : public Type { SymIntType() : Type(TypeKind::SymIntType) {} }; +struct SymFloatType; +using SymFloatTypePtr = SingletonTypePtr; +struct TORCH_API SymFloatType : public Type { + bool equals(const Type& rhs) const override { + return rhs.kind() == kind(); + } + std::string str() const override { + return "SymFloat"; + } + std::string annotation_str_impl(TypePrinter printer = nullptr) const override { + return "float"; + } + static const TypeKind Kind = TypeKind::SymFloatType; + // global singleton + static SymFloatTypePtr get(); + + private: + SymFloatType() : Type(TypeKind::SymFloatType) {} +}; + struct IntType; using IntTypePtr = SingletonTypePtr; // This type represents a Python int number @@ -1738,6 +1758,13 @@ struct getTypePtr_ final { } }; +template +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return getTypePtr_::call(); + } +}; + template <> struct getTypePtr_ final { static decltype(auto) call() { @@ -1783,33 +1810,35 @@ struct getTypePtr_ final { }; template <> -struct getTypePtr_ final { +struct getMaybeFakeTypePtr_ final { static decltype(auto) call() { return SymIntType::get(); } }; template <> -struct getTypePtr_ final { +struct getMaybeFakeTypePtr_ final { static decltype(auto) call() { return IntType::get(); } }; + template <> -struct getTypePtr_ final { +struct getMaybeFakeTypePtr_ final { static decltype(auto) call() { - return DeviceObjType::get(); + return SymFloatType::get(); } }; template <> -struct getTypePtr_ final { +struct getMaybeFakeTypePtr_ final { static decltype(auto) call() { - return IntType::get(); + return FloatType::get(); } }; + template <> -struct getTypePtr_ final { +struct getTypePtr_ final { static decltype(auto) call() { - return IntType::get(); + return DeviceObjType::get(); } }; template <> @@ -1855,47 +1884,55 @@ struct getTypePtr_ final { return StringType::get(); 
} }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_type = getTypePtr_::call(); + static auto inner_type = getMaybeFakeTypePtr_::call(); // The "per vector" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. static auto type = ListType::get("vector", inner_type); return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_type = getTypePtr_::call(); + static auto inner_type = getMaybeFakeTypePtr_::call(); // The "per ArrayRef" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. static auto type = ListType::get("ArrayRef", inner_type); return type; } }; -template <> -struct getTypePtr_ final { +template +struct getMaybeFakeTypePtr_ final { static const auto& call() { - static auto type = ListType::create(getTypePtr_::call()); + static auto type = ListType::create(getMaybeFakeTypePtr_::call()); return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_type = getTypePtr_::call(); + static auto inner_type = getMaybeFakeTypePtr_::call(); // The "per List" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. static auto type = ListType::get("List", inner_type); return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_type = getTypePtr_::call(); + static auto inner_type = getMaybeFakeTypePtr_::call(); + static auto type = ListType::get("List", inner_type); + return type; + } +}; +template +struct getMaybeFakeTypePtr_, fake> final { + static const auto& call() { + static auto inner_type = getMaybeFakeTypePtr_::call(); // The "per array" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. // (Concatenating the length onto the end of the string because we want a unique @@ -1904,22 +1941,22 @@ struct getTypePtr_> final { return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_key_type = getTypePtr_::call(); - static auto inner_val_type = getTypePtr_::call(); + static auto inner_key_type = getMaybeFakeTypePtr_::call(); + static auto inner_val_type = getMaybeFakeTypePtr_::call(); // The "per unordered_map" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. static auto type = DictType::get("unordered_map", inner_key_type, inner_val_type); return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_key_type = getTypePtr_::call(); - static auto inner_val_type = getTypePtr_::call(); + static auto inner_key_type = getMaybeFakeTypePtr_::call(); + static auto inner_val_type = getMaybeFakeTypePtr_::call(); // The "per Dict" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. 
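// Hedged illustration of the fake/real split being wired up here: the fakeness flag
// chosen at the top level propagates into container element types. getFakeTypePtr is
// added a few hunks below; the printed strings in the comments are assumptions:
//
//   #include <ATen/core/jit_type.h>
//   #include <c10/core/SymInt.h>
//   #include <vector>
//
//   void fake_vs_real_types() {
//     auto real_scalar = c10::getTypePtr<c10::SymInt>();                // real: "SymInt"
//     auto fake_scalar = c10::getFakeTypePtr<c10::SymInt>();            // fake: "int"
//     auto real_list   = c10::getTypePtr<std::vector<c10::SymInt>>();   // real: "SymInt[]"
//     auto fake_list   = c10::getFakeTypePtr<std::vector<c10::SymInt>>(); // fake: "int[]"
//     (void)real_scalar; (void)fake_scalar; (void)real_list; (void)fake_list;
//   }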
static auto type = DictType::get("Dict", inner_key_type, inner_val_type); @@ -1927,10 +1964,10 @@ struct getTypePtr_> final { } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { - static auto inner_type = getTypePtr_::call(); + static auto inner_type = getMaybeFakeTypePtr_::call(); // The "per optional" static singleton needs to live in a .cpp file, // otherwise we'll end up with one singleton instance per shared library. static auto type = OptionalType::get(inner_type); @@ -1942,17 +1979,31 @@ struct getTypePtr_> final { template<> struct getTypePtr_ final { static const auto& call() { - static auto type = OptionalType::create(getTypePtr_::call()); + static auto inner_type = getMaybeFakeTypePtr_::call(); + // The "per optional" static singleton needs to live in a .cpp file, + // otherwise we'll end up with one singleton instance per shared library. + static auto type = OptionalType::get(inner_type); return type; } }; -template -struct getTypePtr_> final { +template +struct getMaybeFakeTypePtr_ final { + static const auto& call() { + // The "per optional" static singleton needs to live in a .cpp file, + // otherwise we'll end up with one singleton instance per shared library. + static auto inner_type = getMaybeFakeTypePtr_::call(); + static auto type = OptionalType::get(inner_type); + return type; + } +}; + +template +struct getMaybeFakeTypePtr_, fake> final { static const auto& call() { static auto type = ([]() { std::vector contained_types = { - (getTypePtr_::call())... + (getMaybeFakeTypePtr_::call())... }; return TupleType::create(std::move(contained_types)); })(); @@ -1970,7 +2021,7 @@ template inline decltype(auto) getTypePtr() { // TODO: static_assert that a templated function exists, and throw a friendly // error message if not - return detail::getTypePtr_::call(); + return detail::getMaybeFakeTypePtr_::call(); } template @@ -1980,6 +2031,16 @@ inline TypePtr getTypePtrCopy() { return getTypePtr(); } +template +inline decltype(auto) getFakeTypePtr() { + return detail::getMaybeFakeTypePtr_::call(); +} + +template +inline TypePtr getFakeTypePtrCopy() { + return getFakeTypePtr(); +} + using TypeEnv = std::unordered_map; struct MatchTypeReturn { MatchTypeReturn(std::string reason) : reason_(std::move(reason)) {} @@ -2109,7 +2170,7 @@ struct MemoryFormatType; using MemoryFormatTypePtr = SingletonTypePtr; struct TORCH_API MemoryFormatType : public EnumerationType { std::string str() const override { -return "MemoryFormatType"; +return "MemoryFormat"; } static const TypeKind Kind = TypeKind::MemoryFormatType; // global singleton @@ -2123,7 +2184,7 @@ struct LayoutType; using LayoutTypePtr = SingletonTypePtr; struct TORCH_API LayoutType : public EnumerationType { std::string str() const override { -return "LayoutType"; +return "Layout"; } static const TypeKind Kind = TypeKind::LayoutType; // global singleton @@ -2133,6 +2194,45 @@ static LayoutTypePtr get(); LayoutType() : EnumerationType() {} }; +namespace detail { +template <> +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return ScalarTypeType::get(); + } +}; +template <> +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return LayoutType::get(); + } +}; +template <> +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return MemoryFormatType::get(); + } +}; +template <> +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return IntType::get(); + } +}; +template <> +struct 
getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return IntType::get(); + } +}; +template <> +struct getMaybeFakeTypePtr_ final { + static decltype(auto) call() { + return IntType::get(); + } +}; +} // namespace detail + // the common supertype of all lists, // List[T] <: AnyList for all T struct AnyListType; diff --git a/aten/src/ATen/core/jit_type_base.h b/aten/src/ATen/core/jit_type_base.h index 6fee9fe0a113..beb553eb935a 100644 --- a/aten/src/ATen/core/jit_type_base.h +++ b/aten/src/ATen/core/jit_type_base.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -52,6 +53,7 @@ namespace c10 { _(AnyTupleType) \ _(AnyClassType) \ _(SymIntType) \ + _(SymFloatType) \ _(UnionType) \ _(DynamicType) diff --git a/aten/src/ATen/core/library.cpp b/aten/src/ATen/core/library.cpp index 5c9cea05ea76..965d3f243d01 100644 --- a/aten/src/ATen/core/library.cpp +++ b/aten/src/ATen/core/library.cpp @@ -89,7 +89,7 @@ Library::Library(Kind kind, std::string ns, c10::optional k, c // merge everything #define DEF_PRELUDE "def(\"", schema.operator_name(), "\"): " -Library& Library::_def(c10::FunctionSchema&& schema, c10::OperatorName* out_name, const std::vector& tags) & { +Library& Library::_def(c10::FunctionSchema&& schema, c10::OperatorName* out_name, const std::vector& tags, _RegisterOrVerify rv) & { TORCH_CHECK(kind_ == DEF || kind_ == FRAGMENT, DEF_PRELUDE, "Cannot define an operator inside of a ", toString(kind_), " block. " @@ -125,13 +125,20 @@ Library& Library::_def(c10::FunctionSchema&& schema, c10::OperatorName* out_name if (out_name) { *out_name = schema.operator_name(); // copy! } - registrars_.emplace_back( - c10::Dispatcher::singleton().registerDef( - std::move(schema), - debugString(file_, line_), - tags - ) - ); + switch (rv) { + case _RegisterOrVerify::REGISTER: + registrars_.emplace_back( + c10::Dispatcher::singleton().registerDef( + std::move(schema), + debugString(file_, line_), + tags + ) + ); + break; + case _RegisterOrVerify::VERIFY: + c10::Dispatcher::singleton().waitForDef(schema); + break; + } return *this; } #undef DEF_PRELUDE @@ -174,11 +181,10 @@ Library& Library::_def(c10::either&& nam } #define IMPL_PRELUDE "impl(\"", name_str, "\", ...): " -Library& Library::_impl(const char* name_str, CppFunction&& f) & { +at::OperatorName Library::_parseNameForLib(const char* name_str) const { auto name = torch::jit::parseName(name_str); auto ns_opt = name.getNamespace(); - // This is kind of similar to the checking in def(), but the error - // messages are a little different for this call site + // This is a copy paste of Library::_impl if (ns_opt.has_value()) { // See Note [Redundancy in registration code is OK] TORCH_CHECK(*ns_opt == *ns_, @@ -193,6 +199,11 @@ Library& Library::_impl(const char* name_str, CppFunction&& f) & { bool b = name.setNamespaceIfNotSet(ns_->c_str()); TORCH_INTERNAL_ASSERT(b, ERROR_CONTEXT); } + return name; +} + +Library& Library::_impl(const char* name_str, CppFunction&& f, _RegisterOrVerify rv) & { + at::OperatorName name = _parseNameForLib(name_str); // See Note [Redundancy in registration code is OK] TORCH_CHECK(!(f.dispatch_key_.has_value() && dispatch_key_.has_value() && @@ -205,19 +216,30 @@ Library& Library::_impl(const char* name_str, CppFunction&& f) & { ERROR_CONTEXT ); auto dispatch_key = f.dispatch_key_.has_value() ? 
f.dispatch_key_ : dispatch_key_; - registrars_.emplace_back( - c10::Dispatcher::singleton().registerImpl( - std::move(name), - dispatch_key, - std::move(f.func_), - // NOLINTNEXTLINE(performance-move-const-arg) - std::move(f.cpp_signature_), - std::move(f.schema_), - debugString(std::move(f.debug_), file_, line_) - ) - ); + switch (rv) { + case _RegisterOrVerify::REGISTER: + registrars_.emplace_back( + c10::Dispatcher::singleton().registerImpl( + std::move(name), + dispatch_key, + std::move(f.func_), + // NOLINTNEXTLINE(performance-move-const-arg) + std::move(f.cpp_signature_), + std::move(f.schema_), + debugString(std::move(f.debug_), file_, line_) + ) + ); + break; + case _RegisterOrVerify::VERIFY: + c10::Dispatcher::singleton().waitForImpl(name, dispatch_key); + break; + } return *this; } + +c10::OperatorName Library::_resolve(const char* name_str) const { + return _parseNameForLib(name_str); +} #undef IMPL_PRELUDE Library& Library::_fallback(CppFunction&& f) & { diff --git a/aten/src/ATen/core/op_registration/adaption.h b/aten/src/ATen/core/op_registration/adaption.h index 5bf1b691ebad..3112a206bb4e 100644 --- a/aten/src/ATen/core/op_registration/adaption.h +++ b/aten/src/ATen/core/op_registration/adaption.h @@ -68,7 +68,7 @@ inline void check_and_update_common_device(optional& common_device, cons } } -inline void check_and_update_common_device(optional& common_device, at::TensorList tensors, at::CheckedFrom methodName, at::CheckedFrom argName) { +inline void check_and_update_common_device(optional& common_device, at::ITensorListRef tensors, at::CheckedFrom methodName, at::CheckedFrom argName) { for (const auto& tensor : tensors) { check_and_update_common_device(common_device, tensor, methodName, argName); } diff --git a/aten/src/ATen/core/op_registration/infer_schema.cpp b/aten/src/ATen/core/op_registration/infer_schema.cpp index df1925aba5ed..e9e93a2556e0 100644 --- a/aten/src/ATen/core/op_registration/infer_schema.cpp +++ b/aten/src/ATen/core/op_registration/infer_schema.cpp @@ -23,7 +23,7 @@ std::vector createArgumentVector(c10::ArrayRef args) { result.reserve(args.size()); for (const auto i : c10::irange(args.size())) { // Arguments are named "_" - result.emplace_back(fastToString(i), (*args[i].getTypeFn)()); + result.emplace_back(fastToString(i), (*args[i].getFakeTypeFn)(), (*args[i].getTypeFn)()); } return result; } diff --git a/aten/src/ATen/core/op_registration/infer_schema.h b/aten/src/ATen/core/op_registration/infer_schema.h index 7539cd59cac9..2938e2a8d564 100644 --- a/aten/src/ATen/core/op_registration/infer_schema.h +++ b/aten/src/ATen/core/op_registration/infer_schema.h @@ -22,8 +22,9 @@ namespace infer_schema { struct ArgumentDef final { using GetTypeFn = TypePtr(); GetTypeFn* getTypeFn; - constexpr ArgumentDef(): getTypeFn(nullptr) {} - explicit constexpr ArgumentDef(GetTypeFn *getTypeFn): getTypeFn(getTypeFn) {} + GetTypeFn* getFakeTypeFn; + constexpr ArgumentDef(): getTypeFn(nullptr), getFakeTypeFn(nullptr) {} + explicit constexpr ArgumentDef(GetTypeFn *getTypeFn, GetTypeFn *getFakeTypeFn): getTypeFn(getTypeFn), getFakeTypeFn(getFakeTypeFn) {} }; template @@ -52,7 +53,8 @@ constexpr std::array createArgumentVectorFromTypes(s checkStaticTypes(), // Create the return value - std::array{ArgumentDef(&getTypePtrCopy>)...} + std::array{ + ArgumentDef(&getTypePtrCopy>, &getFakeTypePtrCopy>)...} ); } diff --git a/aten/src/ATen/core/op_registration/op_registration_test.cpp b/aten/src/ATen/core/op_registration/op_registration_test.cpp index 05294c25548e..ade5da971172 100644 
--- a/aten/src/ATen/core/op_registration/op_registration_test.cpp +++ b/aten/src/ATen/core/op_registration/op_registration_test.cpp @@ -418,7 +418,7 @@ TEST(OperatorRegistrationTest, whenRegisteringMismatchingKernelsInSameOpCall_the } void backend_fallback_kernel(const c10::OperatorHandle& op, c10::Stack* stack) { - (*stack)[1] = (*stack)[1].toString()->string() + op.schema().name(); + (*stack)[1] = (*stack)[1].toStringRef() + op.schema().name(); } TEST(OperatorRegistrationTest, whenRegisteringBackendFallbackKernel_thenCanBeCalled) { @@ -428,7 +428,7 @@ TEST(OperatorRegistrationTest, whenRegisteringBackendFallbackKernel_thenCanBeCal auto op = Dispatcher::singleton().findSchema({"_test::dummy", ""}); ASSERT_TRUE(op.has_value()); auto stack = callOp(*op, dummyTensor(c10::DispatchKey::CPU), "hello "); - EXPECT_EQ("hello _test::dummy", stack[1].toString()->string()); + EXPECT_EQ("hello _test::dummy", stack[1].toStringRef()); } TEST(OperatorRegistrationTest, whenRegisteringBackendFallbackKernelForWrongBackend_thenCannotBeCalled) { @@ -472,7 +472,7 @@ TEST(OperatorRegistrationTest, whenRegisteringBackendFallbackKernelAndRegularKer called = false; auto stack = callOp(*op, dummyTensor(c10::DispatchKey::CPU), "hello "); EXPECT_FALSE(called); - EXPECT_EQ("hello _test::dummy", stack[1].toString()->string()); + EXPECT_EQ("hello _test::dummy", stack[1].toStringRef()); } TEST(OperatorRegistrationTest, whenRegisteringBackendFallbackKernelAndRegularKernelForSameBackend_thenCallsRegularKernel) { @@ -875,7 +875,7 @@ TEST(OperatorRegistrationTest, testAvailableArgTypes) { "(bool a) -> bool"); testArgTypes::test( "string1", [] (const std::string& v) {EXPECT_EQ("string1", v);}, - "string2", [] (const IValue& v) {EXPECT_EQ("string2", v.toString()->string());}, + "string2", [] (const IValue& v) {EXPECT_EQ("string2", v.toStringRef());}, "(str a) -> str"); testArgTypes::test( dummyTensor(c10::DispatchKey::CPU), [] (const Tensor& v) {EXPECT_EQ(c10::DispatchKey::CPU, extractDispatchKey(v));}, @@ -902,7 +902,7 @@ TEST(OperatorRegistrationTest, testAvailableArgTypes) { "(bool? a) -> bool?"); testArgTypes>::test( c10::optional("string1"), [] (const c10::optional& v) {EXPECT_EQ("string1", v.value());}, - c10::optional("string2"), [] (const IValue& v) {EXPECT_EQ("string2", v.toString()->string());}, + c10::optional("string2"), [] (const IValue& v) {EXPECT_EQ("string2", v.toStringRef());}, "(str? 
a) -> str?"); testArgTypes>::test( c10::optional(dummyTensor(c10::DispatchKey::CPU)), [] (const c10::optional& v) {EXPECT_EQ(c10::DispatchKey::CPU, extractDispatchKey(v.value()));}, @@ -1939,7 +1939,7 @@ TEST(NewOperatorRegistrationTest, fallback) { auto op = Dispatcher::singleton().findSchema({"_test::dummy", ""}); ASSERT_TRUE(op.has_value()); auto stack = callOp(*op, dummyTensor(c10::DispatchKey::CPU), "hello "); - EXPECT_EQ("hello _test::dummy", stack[1].toString()->string()); + EXPECT_EQ("hello _test::dummy", stack[1].toStringRef()); } TEST(NewOperatorRegistrationTest, BackendSelectRedispatchesToCPU) { diff --git a/aten/src/ATen/core/type.cpp b/aten/src/ATen/core/type.cpp index 00e4ceffc156..1c291da06c91 100644 --- a/aten/src/ATen/core/type.cpp +++ b/aten/src/ATen/core/type.cpp @@ -329,6 +329,11 @@ SymIntTypePtr SymIntType::get() { return value; } +SymFloatTypePtr SymFloatType::get() { + static SymFloatTypePtr value(new SymFloatType()); + return value; +} + c10::optional unifyTypesImpl(const TypePtr& t1, const TypePtr& t2, bool default_to_union=false, TypePtr type_hint=nullptr) { // check direct subtyping relation if (t1->isSubtypeOf(*t2)) { diff --git a/aten/src/ATen/cpp_custom_type_hack.h b/aten/src/ATen/cpp_custom_type_hack.h index ff5310b80968..75b900c0d694 100644 --- a/aten/src/ATen/cpp_custom_type_hack.h +++ b/aten/src/ATen/cpp_custom_type_hack.h @@ -48,8 +48,14 @@ // STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP // STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP -#include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif namespace at { namespace cpp_custom_type_hack { diff --git a/aten/src/ATen/cpu/vec/vec256/vec256.h b/aten/src/ATen/cpu/vec/vec256/vec256.h index 98ec588137ce..d0a8cb03604a 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256.h @@ -222,6 +222,51 @@ inline deinterleave2(const Vectorized& a, const Vectorized& _mm256_permute2f128_ps(a_grouped, b_grouped, 0b0110001)); // 1, 3. 
4 bits apart } +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FLIP ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m256i mask_float = _mm256_set_epi32(0, 1, 2, 3, 4, 5, 6, 7); + return _mm256_permutevar8x32_ps(v, mask_float); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + return _mm256_permute4x64_pd(v, 27); // 27 == _MM_SHUFFLE(0, 1, 2, 3) +} + +template<> +inline Vectorized flip(const Vectorized & v) { + return _mm256_permute4x64_epi64(v, 27); // 27 == _MM_SHUFFLE(0, 1, 2, 3) +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m256i mask_int32 = _mm256_set_epi32(0, 1, 2, 3, 4, 5, 6, 7); + return _mm256_permutevar8x32_epi32(v, mask_int32); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m256i mask = _mm256_set_epi8( + 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14, + 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 + ); + auto reversed = _mm256_shuffle_epi8(v, mask); + return _mm256_permute2x128_si256(reversed, reversed, 1); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m256i mask_int8 = _mm256_set_epi8( + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 + ); + auto reversed = _mm256_shuffle_epi8(v, mask_int8); + return _mm256_permute2x128_si256(reversed, reversed, 1); +} + + #endif // (defined(CPU_CAPABILITY_AVX2) && !defined(_MSC_VER) }}} diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h b/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h index c64e3e589905..15d8ac269e3d 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h @@ -698,6 +698,43 @@ inline void convert(const BFloat16* src, BFloat16* dst, int64_t n) { } } +template <> +inline void convert(const float* src, BFloat16* dst, int64_t n) { + int64_t i; + for (i = 0; i + Vectorized::size() <= n; i += Vectorized::size()) { + __m256 a = _mm256_loadu_ps(&src[i]); + __m256 b = _mm256_loadu_ps(&src[i + 8]); + + __m256i bf = cvtfp32_bf16(a, b); + _mm256_storeu_si256(reinterpret_cast<__m256i*>(&dst[i]), bf); + } + for (; i < n; i++) { + dst[i] = c10::convert(src[i]); + } +} + +template <> +inline void convert(const double* src, BFloat16* dst, int64_t n) { + auto load_float = [](const double *src) -> __m256 { + // Load one float vector from an array of doubles + __m128 a = _mm256_cvtpd_ps(_mm256_loadu_pd(src)); + __m128 b = _mm256_cvtpd_ps(_mm256_loadu_pd(src + 4)); + return _mm256_insertf128_ps(_mm256_castps128_ps256(a), b, 1); + }; + + int64_t i; + for (i = 0; i + Vectorized::size() <= n; i += Vectorized::size()) { + __m256 a = load_float(&src[i]); + __m256 b = load_float(&src[i + 8]); + + __m256i bf = cvtfp32_bf16(a, b); + _mm256_storeu_si256(reinterpret_cast<__m256i*>(&dst[i]), bf); + } + for (; i < n; i++) { + dst[i] = c10::convert(src[i]); + } +} + template <> Vectorized inline fmadd(const Vectorized& a, const Vectorized& b, const Vectorized& c) { diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_double.h b/aten/src/ATen/cpu/vec/vec256/vec256_double.h index 138daf3f588a..7956ff24966a 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_double.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_double.h @@ -412,6 +412,11 @@ template <> Vectorized inline fmadd(const Vectorized& a, const Vectorized& b, const Vectorized& c) { return _mm256_fmadd_pd(a, b, c); } + +template <> +Vectorized inline fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + return 
_mm256_fmsub_pd(a, b, c); +} #endif #endif diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_float.h b/aten/src/ATen/cpu/vec/vec256/vec256_float.h index 6981676c92c8..440875e59de9 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_float.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_float.h @@ -419,6 +419,11 @@ Vectorized inline fmadd(const Vectorized& a, const Vectorized +Vectorized inline fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + return _mm256_fmsub_ps(a, b, c); +} + #endif }}} diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h index cbd349083636..76cc7ba3f59c 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_float_neon.h @@ -827,6 +827,13 @@ Vectorized inline fmadd(const Vectorized& a, const Vectorized(r0, r1); } +template <> +Vectorized inline fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + float32x4_t r0 = vfmsq_f32(c.get_low(), a.get_low(), b.get_low()); + float32x4_t r1 = vfmsq_f32(c.get_high(), a.get_high(), b.get_high()); + return Vectorized(r0, r1); +} + #endif /* defined(aarch64) */ }}} diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_int.h b/aten/src/ATen/cpu/vec/vec256/vec256_int.h index 0cc36d590019..391baeb8b6a3 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_int.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_int.h @@ -745,6 +745,257 @@ class Vectorized : public Vectorizedi { Vectorized le(const Vectorized& other) const; }; +template <> +class Vectorized : public Vectorizedi { +private: + static const Vectorized ones; +public: + using value_type = uint8_t; + static constexpr int size() { + return 32; + } + using Vectorizedi::Vectorizedi; + Vectorized() {} + Vectorized(uint8_t v) { values = _mm256_set1_epi8(v); } + Vectorized(uint8_t val1, uint8_t val2, uint8_t val3, uint8_t val4, + uint8_t val5, uint8_t val6, uint8_t val7, uint8_t val8, + uint8_t val9, uint8_t val10, uint8_t val11, uint8_t val12, + uint8_t val13, uint8_t val14, uint8_t val15, uint8_t val16, + uint8_t val17, uint8_t val18, uint8_t val19, uint8_t val20, + uint8_t val21, uint8_t val22, uint8_t val23, uint8_t val24, + uint8_t val25, uint8_t val26, uint8_t val27, uint8_t val28, + uint8_t val29, uint8_t val30, uint8_t val31, uint8_t val32) { + values = _mm256_setr_epi8(val1, val2, val3, val4, val5, val6, val7, val8, + val9, val10, val11, val12, val13, val14, val15, val16, + val17, val18, val19, val20, val21, val22, val23, val24, + val25, val26, val27, val28, val29, val30, val31, val32); + } + template + static Vectorized blend(Vectorized a, Vectorized b) { + __at_align__ uint8_t tmp_values[size()]; + a.store(tmp_values); + if (mask & 0x01) + tmp_values[0] = _mm256_extract_epi8(b.values, 0); + if (mask & 0x02) + tmp_values[1] = _mm256_extract_epi8(b.values, 1); + if (mask & 0x04) + tmp_values[2] = _mm256_extract_epi8(b.values, 2); + if (mask & 0x08) + tmp_values[3] = _mm256_extract_epi8(b.values, 3); + if (mask & 0x10) + tmp_values[4] = _mm256_extract_epi8(b.values, 4); + if (mask & 0x20) + tmp_values[5] = _mm256_extract_epi8(b.values, 5); + if (mask & 0x40) + tmp_values[6] = _mm256_extract_epi8(b.values, 6); + if (mask & 0x80) + tmp_values[7] = _mm256_extract_epi8(b.values, 7); + if (mask & 0x100) + tmp_values[8] = _mm256_extract_epi8(b.values, 8); + if (mask & 0x200) + tmp_values[9] = _mm256_extract_epi8(b.values, 9); + if (mask & 0x400) + tmp_values[10] = _mm256_extract_epi8(b.values, 10); + if (mask & 0x800) + tmp_values[11] = 
_mm256_extract_epi8(b.values, 11); + if (mask & 0x1000) + tmp_values[12] = _mm256_extract_epi8(b.values, 12); + if (mask & 0x2000) + tmp_values[13] = _mm256_extract_epi8(b.values, 13); + if (mask & 0x4000) + tmp_values[14] = _mm256_extract_epi8(b.values, 14); + if (mask & 0x8000) + tmp_values[15] = _mm256_extract_epi8(b.values, 15); + if (mask & 0x010000) + tmp_values[16] = _mm256_extract_epi8(b.values, 16); + if (mask & 0x020000) + tmp_values[17] = _mm256_extract_epi8(b.values, 17); + if (mask & 0x040000) + tmp_values[18] = _mm256_extract_epi8(b.values, 18); + if (mask & 0x080000) + tmp_values[19] = _mm256_extract_epi8(b.values, 19); + if (mask & 0x100000) + tmp_values[20] = _mm256_extract_epi8(b.values, 20); + if (mask & 0x200000) + tmp_values[21] = _mm256_extract_epi8(b.values, 21); + if (mask & 0x400000) + tmp_values[22] = _mm256_extract_epi8(b.values, 22); + if (mask & 0x800000) + tmp_values[23] = _mm256_extract_epi8(b.values, 23); + if (mask & 0x1000000) + tmp_values[24] = _mm256_extract_epi8(b.values, 24); + if (mask & 0x2000000) + tmp_values[25] = _mm256_extract_epi8(b.values, 25); + if (mask & 0x4000000) + tmp_values[26] = _mm256_extract_epi8(b.values, 26); + if (mask & 0x8000000) + tmp_values[27] = _mm256_extract_epi8(b.values, 27); + if (mask & 0x10000000) + tmp_values[28] = _mm256_extract_epi8(b.values, 28); + if (mask & 0x20000000) + tmp_values[29] = _mm256_extract_epi8(b.values, 29); + if (mask & 0x40000000) + tmp_values[30] = _mm256_extract_epi8(b.values, 30); + if (mask & 0x80000000) + tmp_values[31] = _mm256_extract_epi8(b.values, 31); + return loadu(tmp_values); + } + static Vectorized blendv(const Vectorized& a, const Vectorized& b, + const Vectorized& mask) { + return _mm256_blendv_epi8(a.values, b.values, mask.values); + } + template + static Vectorized arange(uint8_t base = 0, step_t step = static_cast(1)) { + return Vectorized( + base, base + step, base + 2 * step, base + 3 * step, + base + 4 * step, base + 5 * step, base + 6 * step, base + 7 * step, + base + 8 * step, base + 9 * step, base + 10 * step, base + 11 * step, + base + 12 * step, base + 13 * step, base + 14 * step, base + 15 * step, + base + 16 * step, base + 17 * step, base + 18 * step, base + 19 * step, + base + 20 * step, base + 21 * step, base + 22 * step, base + 23 * step, + base + 24 * step, base + 25 * step, base + 26 * step, base + 27 * step, + base + 28 * step, base + 29 * step, base + 30 * step, base + 31 * step); + } + static Vectorized + set(Vectorized a, Vectorized b, uint8_t count = size()) { + switch (count) { + case 0: + return a; + case 1: + return blend<0x1>(a, b); + case 2: + return blend<0x3>(a, b); + case 3: + return blend<0x7>(a, b); + case 4: + return blend<0xF>(a, b); + case 5: + return blend<0x1F>(a, b); + case 6: + return blend<0x3F>(a, b); + case 7: + return blend<0x7F>(a, b); + case 8: + return blend<0xFF>(a, b); + case 9: + return blend<0x1FF>(a, b); + case 10: + return blend<0x3FF>(a, b); + case 11: + return blend<0x7FF>(a, b); + case 12: + return blend<0xFFF>(a, b); + case 13: + return blend<0x1FFF>(a, b); + case 14: + return blend<0x3FFF>(a, b); + case 15: + return blend<0x7FFF>(a, b); + case 16: + return blend<0xFFFF>(a, b); + case 17: + return blend<0x1FFFF>(a, b); + case 18: + return blend<0x3FFFF>(a, b); + case 19: + return blend<0x7FFFF>(a, b); + case 20: + return blend<0xFFFFF>(a, b); + case 21: + return blend<0x1FFFFF>(a, b); + case 22: + return blend<0x3FFFFF>(a, b); + case 23: + return blend<0x7FFFFF>(a, b); + case 24: + return blend<0xFFFFFF>(a, b); + case 25: + 
return blend<0x1FFFFFF>(a, b); + case 26: + return blend<0x3FFFFFF>(a, b); + case 27: + return blend<0x7FFFFFF>(a, b); + case 28: + return blend<0xFFFFFFF>(a, b); + case 29: + return blend<0x1FFFFFFF>(a, b); + case 30: + return blend<0x3FFFFFFF>(a, b); + case 31: + return blend<0x7FFFFFFF>(a, b); + } + return b; + } + static Vectorized loadu(const void* ptr) { + return _mm256_loadu_si256(reinterpret_cast(ptr)); + } + static Vectorized loadu(const void* ptr, uint8_t count) { + __at_align__ uint8_t tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, ptr, count * sizeof(uint8_t)); + return loadu(tmp_values); + } + void store(void* ptr, int count = size()) const { + if (count == size()) { + // ptr need not to be aligned here. See + // https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions/intrinsics-for-load-and-store-operations-1/mm256-storeu-si256.html + _mm256_storeu_si256(reinterpret_cast<__m256i*>(ptr), values); + } else if (count > 0) { + __at_align__ uint8_t tmp_values[size()]; + _mm256_storeu_si256(reinterpret_cast<__m256i*>(tmp_values), values); + std::memcpy(ptr, tmp_values, count * sizeof(uint8_t)); + } + } + const uint8_t& operator[](int idx) const = delete; + uint8_t& operator[](int idx) = delete; + Vectorized abs() const { + return values; + } + Vectorized real() const { + return *this; + } + Vectorized imag() const { + return _mm256_set1_epi8(0); + } + Vectorized conj() const { + return *this; + } + Vectorized frac() const; + Vectorized neg() const; + Vectorized operator==(const Vectorized& other) const { + return _mm256_cmpeq_epi8(values, other.values); + } + Vectorized operator!=(const Vectorized& other) const { + return invert(_mm256_cmpeq_epi8(values, other.values)); + } + Vectorized operator<(const Vectorized& other) const { + __m256i max = _mm256_max_epu8(values, other.values); + return invert(_mm256_cmpeq_epi8(max, values)); + } + Vectorized operator<=(const Vectorized& other) const { + __m256i max = _mm256_max_epu8(values, other.values); + return _mm256_cmpeq_epi8(max, other.values); + } + Vectorized operator>(const Vectorized& other) const { + return other < *this; + } + Vectorized operator>=(const Vectorized& other) const { + return other <= *this; + } + + Vectorized eq(const Vectorized& other) const; + Vectorized ne(const Vectorized& other) const; + Vectorized gt(const Vectorized& other) const; + Vectorized ge(const Vectorized& other) const; + Vectorized lt(const Vectorized& other) const; + Vectorized le(const Vectorized& other) const; +}; + template <> Vectorized inline operator+(const Vectorized& a, const Vectorized& b) { return _mm256_add_epi64(a, b); @@ -765,6 +1016,12 @@ Vectorized inline operator+(const Vectorized& a, const Vectorize return _mm256_add_epi8(a, b); } +template <> +Vectorized inline operator+(const Vectorized& a, const Vectorized& b) { + return _mm256_add_epi8(a, b); +} + + template <> Vectorized inline operator-(const Vectorized& a, const Vectorized& b) { return _mm256_sub_epi64(a, b); @@ -785,6 +1042,11 @@ Vectorized inline operator-(const Vectorized& a, const 
Vectorize return _mm256_sub_epi8(a, b); } +template <> +Vectorized inline operator-(const Vectorized& a, const Vectorized& b) { + return _mm256_sub_epi8(a, b); +} + // Negation. Defined here so we can utilize operator- inline Vectorized Vectorized::neg() const { return Vectorized(0) - *this; @@ -802,6 +1064,10 @@ inline Vectorized Vectorized::neg() const { return Vectorized(0) - *this; } +inline Vectorized Vectorized::neg() const { + return Vectorized(0) - *this; +} + // Emulate operations with no native 64-bit support in avx, // by extracting each element, performing the operation pointwise, // then combining the results into a vector. @@ -888,6 +1154,12 @@ Vectorized inline operator*(const Vectorized& a, const Vectorize return int_elementwise_binary_256(a, b, std::multiplies()); } +template <> +Vectorized inline operator*(const Vectorized& a, const Vectorized& b) { + // We don't have an instruction for multiplying uint8_t + return int_elementwise_binary_256(a, b, std::multiplies()); +} + template <> Vectorized inline minimum(const Vectorized& a, const Vectorized& b) { return emulate(a, b, [](int64_t a_point, int64_t b_point) {return std::min(a_point, b_point);}); @@ -908,6 +1180,11 @@ Vectorized inline minimum(const Vectorized& a, const Vectorized< return _mm256_min_epi8(a, b); } +template <> +Vectorized inline minimum(const Vectorized& a, const Vectorized& b) { + return _mm256_min_epu8(a, b); +} + template <> Vectorized inline maximum(const Vectorized& a, const Vectorized& b) { return emulate(a, b, [](int64_t a_point, int64_t b_point) {return std::max(a_point, b_point);}); @@ -928,6 +1205,11 @@ Vectorized inline maximum(const Vectorized& a, const Vectorized< return _mm256_max_epi8(a, b); } +template <> +Vectorized inline maximum(const Vectorized& a, const Vectorized& b) { + return _mm256_max_epu8(a, b); +} + template <> Vectorized inline clamp(const Vectorized& a, const Vectorized& min_val, const Vectorized& max_val) { return emulate(a, min_val, max_val, [](int64_t a_point, int64_t min_point, int64_t max_point) {return std::min(max_point, std::max(a_point, min_point));}); @@ -948,6 +1230,11 @@ Vectorized inline clamp(const Vectorized& a, const Vectorized +Vectorized inline clamp(const Vectorized& a, const Vectorized& min_val, const Vectorized& max_val) { + return _mm256_min_epu8(max_val, _mm256_max_epu8(a, min_val)); +} + template <> Vectorized inline clamp_max(const Vectorized& a, const Vectorized& max_val) { return emulate(a, max_val, [](int64_t a_point, int64_t max_point) {return std::min(max_point, a_point);}); @@ -968,6 +1255,11 @@ Vectorized inline clamp_max(const Vectorized& a, const Vectorize return _mm256_min_epi8(max_val, a); } +template <> +Vectorized inline clamp_max(const Vectorized& a, const Vectorized& max_val) { + return _mm256_min_epu8(max_val, a); +} + template <> Vectorized inline clamp_min(const Vectorized& a, const Vectorized& min_val) { return emulate(a, min_val, [](int64_t a_point, int64_t min_point) {return std::max(min_point, a_point);}); @@ -988,6 +1280,11 @@ Vectorized inline clamp_min(const Vectorized& a, const Vectorize return _mm256_max_epi8(min_val, a); } +template <> +Vectorized inline clamp_min(const Vectorized& a, const Vectorized& min_val) { + return _mm256_max_epu8(min_val, a); +} + template Vectorized inline convert_to_int32(const T* ptr) { return Vectorized::loadu(ptr); @@ -1019,6 +1316,10 @@ template <> Vectorized inline operator/(const Vectorized& a, const Vectorized& b) { return int_elementwise_binary_256(a, b, std::divides()); } +template <> 
+Vectorized inline operator/(const Vectorized& a, const Vectorized& b) { + return int_elementwise_binary_256(a, b, std::divides()); +} template>::value, int> = 0> inline Vectorized operator&(const Vectorized& a, const Vectorized& b) { @@ -1133,6 +1434,292 @@ inline Vectorized Vectorized::le(const Vectorized& other return (*this <= other) & Vectorized(1); } +inline Vectorized Vectorized::eq(const Vectorized& other) const { + return (*this == other) & Vectorized(1); +} + +inline Vectorized Vectorized::ne(const Vectorized& other) const { + return (*this != other) & Vectorized(1); +} + +inline Vectorized Vectorized::gt(const Vectorized& other) const { + return (*this > other) & Vectorized(1); +} + +inline Vectorized Vectorized::ge(const Vectorized& other) const { + return (*this >= other) & Vectorized(1); +} + +inline Vectorized Vectorized::lt(const Vectorized& other) const { + return (*this < other) & Vectorized(1); +} + +inline Vectorized Vectorized::le(const Vectorized& other) const { + return (*this <= other) & Vectorized(1); +} + +template +Vectorized inline shift_256_16(const Vectorized& a, const Vectorized& b) { + // No vector instruction for shifting int16_t, so emulating it instead. + + // Control masks for shuffle operation, treating 256 bits as an + // array of 16-bit elements, and considering pairs of neighboring + // elements. Specifially, a mask named "ctl_M_N" (M,N in [0,1], and + // M!=N) is set so that shuffle will move element with index M from + // input pair into element with index N in output pair, and element + // with index M in output pair will be set to all 0s. + __m256i ctl_0_1 = _mm256_set_epi8(29, 28, 0x80, 0x80, 25, 24, 0x80, 0x80, + 21, 20, 0x80, 0x80, 17, 16, 0x80, 0x80, + 13, 12, 0x80, 0x80, 9, 8, 0x80, 0x80, + 5, 4, 0x80, 0x80, 1, 0, 0x80, 0x80); + __m256i ctl_1_0 = _mm256_set_epi8(0x80, 0x80, 31, 30, 0x80, 0x80, 27, 26, + 0x80, 0x80, 23, 22, 0x80, 0x80, 19, 18, + 0x80, 0x80, 15, 14, 0x80, 0x80, 11, 10, + 0x80, 0x80, 7, 6, 0x80, 0x80, 3, 2); + + // Masks for bitwise and operation, treating 256 bits as an array of + // 16-bit elements, and considering them in pairs of neighboring + // elements. A mask named "keep_M" (M in [0,1]) is set so that + // bitwise and will copy element with index M from input pair into + // element with the same index in output pair, while the other + // element in output pair will be set to all 0s. + __m256i keep_0 = _mm256_set1_epi32(0xFFFF); + __m256i keep_1 = _mm256_set1_epi32(0xFFFF0000); + + // Take each 16-bit element with idx%2==0 from input array to be + // shifted and extend it to 32 bits so that 0s are added to the + // right. Then, perform shifting on this 32-bit number. Upper 16 + // bits will be proper result of shifting original 16-bit number, so + // write them to result array, into the same position from which + // corresponding input element is taken. Also, make sure that + // result array elements with idx%2!=0 are set to all 0s. + // + // Note that number of bits to shift for is extended to 32 bits by + // adding 0s to the left. That means this number is not properly + // sign-extended for negative values. However, number of bits to + // shift is treated as an unsigned integer by respective shift + // intrinsics anyway so if negative then either with or without + // proper sign extension, it will be interpreted as a number greater + // than 32, and the shifting result will be the same. 
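In scalar terms, the widening trick the comment above describes boils down to the following minimal sketch for a single even-indexed lane (illustrative only, not part of the diff; left_shift mirrors the function's template parameter, and the intrinsics' count-saturation behaviour is modelled explicitly):

    #include <cstdint>

    // Place the 16-bit element in the upper half of a 32-bit temporary (zeros
    // below), do the 32-bit variable shift, then read the upper 16 bits back.
    // The shuffle/shift/shuffle sequence below does this for all lanes at once.
    inline int16_t shift16_via_32(int16_t x, uint32_t s, bool left_shift) {
      uint32_t widened = uint32_t(uint16_t(x)) << 16;            // x in bits 31..16
      uint32_t shifted = left_shift
          ? (s < 32 ? widened << s : 0)                          // _mm256_sllv_epi32: counts >= 32 give 0
          : uint32_t(s < 32 ? int32_t(widened) >> s              // _mm256_srav_epi32: counts >= 32
                            : int32_t(widened) >> 31);           //   fill with the sign bit
      return int16_t(shifted >> 16);                             // upper half is the result
    }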
+ __m256i a0 = _mm256_shuffle_epi8(a, ctl_0_1); + __m256i b0 = _mm256_and_si256(b, keep_0); + __m256i c0; + if (left_shift) + c0 = _mm256_sllv_epi32(a0, b0); + else + c0 = _mm256_srav_epi32(a0, b0); + c0 = _mm256_shuffle_epi8(c0, ctl_1_0); + + // Peform shifting the same way for input array elements with + // idx%2==1. + __m256i a1 = _mm256_and_si256(a, keep_1); + __m256i b1 = _mm256_shuffle_epi8(b, ctl_1_0); + __m256i c1; + if (left_shift) + c1 = _mm256_sllv_epi32(a1, b1); + else + c1 = _mm256_srav_epi32(a1, b1); + c1 = _mm256_and_si256(c1, keep_1); + + // Merge partial results into the final result. + __m256i c = _mm256_or_si256(c0, c1); + + return c; +} + +template ::value || std::is_same::value, int> = 0> +Vectorized inline shift_256_8(const Vectorized& a, const Vectorized& b) { + // No vector instruction for shifting int8_t/uint8_t, so emulating + // it instead. + + // Control masks for shuffle operation, treating 256 bits as an + // array of 8-bit elements, and considering quadruples of + // neighboring elements. Specifially, a mask named "ctl_M_N" (M,N + // in [0,1,2,3], and M!=N) is set so that shuffle will move element + // with index M from input quadruple into element with index N in + // output quadruple, and other elements in output quadruple will be + // set to all 0s. + __m256i ctl_0_3 = _mm256_set_epi8(28, 0x80, 0x80, 0x80, 24, 0x80, 0x80, 0x80, + 20, 0x80, 0x80, 0x80, 16, 0x80, 0x80, 0x80, + 12, 0x80, 0x80, 0x80, 8, 0x80, 0x80, 0x80, + 4, 0x80, 0x80, 0x80, 0, 0x80, 0x80, 0x80); + __m256i ctl_1_0 = _mm256_set_epi8(0x80, 0x80, 0x80, 29, 0x80, 0x80, 0x80, 25, + 0x80, 0x80, 0x80, 21, 0x80, 0x80, 0x80, 17, + 0x80, 0x80, 0x80, 13, 0x80, 0x80, 0x80, 9, + 0x80, 0x80, 0x80, 5, 0x80, 0x80, 0x80, 1); + __m256i ctl_1_3 = _mm256_set_epi8(29, 0x80, 0x80, 0x80, 25, 0x80, 0x80, 0x80, + 21, 0x80, 0x80, 0x80, 17, 0x80, 0x80, 0x80, + 13, 0x80, 0x80, 0x80, 9, 0x80, 0x80, 0x80, + 5, 0x80, 0x80, 0x80, 1, 0x80, 0x80, 0x80); + __m256i ctl_2_0 = _mm256_set_epi8(0x80, 0x80, 0x80, 30, 0x80, 0x80, 0x80, 26, + 0x80, 0x80, 0x80, 22, 0x80, 0x80, 0x80, 18, + 0x80, 0x80, 0x80, 14, 0x80, 0x80, 0x80, 10, + 0x80, 0x80, 0x80, 6, 0x80, 0x80, 0x80, 2); + __m256i ctl_2_3 = _mm256_set_epi8(30, 0x80, 0x80, 0x80, 26, 0x80, 0x80, 0x80, + 22, 0x80, 0x80, 0x80, 18, 0x80, 0x80, 0x80, + 14, 0x80, 0x80, 0x80, 10, 0x80, 0x80, 0x80, + 6, 0x80, 0x80, 0x80, 2, 0x80, 0x80, 0x80); + __m256i ctl_3_0 = _mm256_set_epi8(0x80, 0x80, 0x80, 31, 0x80, 0x80, 0x80, 27, + 0x80, 0x80, 0x80, 23, 0x80, 0x80, 0x80, 19, + 0x80, 0x80, 0x80, 15, 0x80, 0x80, 0x80, 11, + 0x80, 0x80, 0x80, 7, 0x80, 0x80, 0x80, 3); + __m256i ctl_3_1 = _mm256_set_epi8(0x80, 0x80, 31, 0x80, 0x80, 0x80, 27, 0x80, + 0x80, 0x80, 23, 0x80, 0x80, 0x80, 19, 0x80, + 0x80, 0x80, 15, 0x80, 0x80, 0x80, 11, 0x80, + 0x80, 0x80, 7, 0x80, 0x80, 0x80, 3, 0x80); + __m256i ctl_3_2 = _mm256_set_epi8(0x80, 31, 0x80, 0x80, 0x80, 27, 0x80, 0x80, + 0x80, 23, 0x80, 0x80, 0x80, 19, 0x80, 0x80, + 0x80, 15, 0x80, 0x80, 0x80, 11, 0x80, 0x80, + 0x80, 7, 0x80, 0x80, 0x80, 3, 0x80, 0x80); + + // Masks for bitwise and operation, treating 256 bits as an array of + // 8-bit elements, and considering them in quadruples of neighboring + // elements. A mask named "keep_M" (M in [0,1,2,3]) is set so that + // bitwise and will copy element with index M from input quadruple + // into element with the same index in output quadruple, while the + // other elements in output quadruple will be set to all 0s. 
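The 8-bit routine below applies the same widening idea, just with quadruples of bytes instead of pairs, and with one extra wrinkle: on the right-shift path the signed instantiation must shift arithmetically (_mm256_srav_epi32 on the widened lanes) while the unsigned one shifts logically (_mm256_srlv_epi32). A scalar sketch of that distinction (illustrative only, assuming shift counts in the range the emulation supports):

    #include <cstdint>

    // int8_t: vacated bits are refilled from the sign (arithmetic shift).
    inline int8_t  shr_i8(int8_t v, uint32_t s)  { return int8_t(int32_t(v) >> (s < 31 ? s : 31)); }
    // uint8_t: vacated bits are refilled with zeros (logical shift).
    inline uint8_t shr_u8(uint8_t v, uint32_t s) { return s < 32 ? uint8_t(uint32_t(v) >> s) : uint8_t(0); }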
+ __m256i keep_0 = _mm256_set1_epi32(0xFF); + __m256i keep_3 = _mm256_set1_epi32(0xFF000000); + + // Take each 8-bit element with idx%4==0 from input array to be + // shifted and extend it to 32 bits so that 0s are added to the + // right. Then, perform shifting on this 32-bit number. Upper 8 + // bits will be proper result of shifting original 8-bit number, so + // write them to result array, into the same position from which + // corresponding input element is taken. Also, make sure that + // result array elements with idx%4!=0 are set to all 0s. + // + // Note that number of bits to shift for is extended to 32 bits by + // adding 0s to the left. That means this number is not properly + // sign-extended for negative values. However, number of bits to + // shift is treated as an unsigned integer by respective shift + // intrinsics anyway so if negative then either with or without + // proper sign extension, it will be interpreted as a number greater + // than 32, and the shifting result will be the same. + __m256i a0 = _mm256_shuffle_epi8(a, ctl_0_3); + __m256i b0 = _mm256_and_si256(b, keep_0); + __m256i c0; + if (left_shift) + c0 = _mm256_sllv_epi32(a0, b0); + else + if (std::is_same::value) + c0 = _mm256_srav_epi32(a0, b0); + else + c0 = _mm256_srlv_epi32(a0, b0); + c0 = _mm256_shuffle_epi8(c0, ctl_3_0); + + // Peform shifting the same way for input array elements with + // idx%4==1. + __m256i a1 = _mm256_shuffle_epi8(a, ctl_1_3); + __m256i b1 = _mm256_shuffle_epi8(b, ctl_1_0); + __m256i c1; + if (left_shift) + c1 = _mm256_sllv_epi32(a1, b1); + else + if (std::is_same::value) + c1 = _mm256_srav_epi32(a1, b1); + else + c1 = _mm256_srlv_epi32(a1, b1); + c1 = _mm256_shuffle_epi8(c1, ctl_3_1); + + // Peform shifting the same way for input array elements with + // idx%4==2. + __m256i a2 = _mm256_shuffle_epi8(a, ctl_2_3); + __m256i b2 = _mm256_shuffle_epi8(b, ctl_2_0); + __m256i c2; + if (left_shift) + c2 = _mm256_sllv_epi32(a2, b2); + else + if (std::is_same::value) + c2 = _mm256_srav_epi32(a2, b2); + else + c2 = _mm256_srlv_epi32(a2, b2); + c2 = _mm256_shuffle_epi8(c2, ctl_3_2); + + // Peform shifting the same way for input array elements with + // idx%4==3. + __m256i a3 = _mm256_and_si256(a, keep_3); + __m256i b3 = _mm256_shuffle_epi8(b, ctl_3_0); + __m256i c3; + if (left_shift) + c3 = _mm256_sllv_epi32(a3, b3); + else + if (std::is_same::value) + c3 = _mm256_srav_epi32(a3, b3); + else + c3 = _mm256_srlv_epi32(a3, b3); + c3 = _mm256_and_si256(c3, keep_3); + + // Merge partial results into the final result. + __m256i c01 = _mm256_or_si256(c0, c1); + __m256i c23 = _mm256_or_si256(c2, c3); + __m256i c = _mm256_or_si256(c01, c23); + + return c; +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return _mm256_sllv_epi64(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return _mm256_sllv_epi32(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return shift_256_16(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return shift_256_8(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return shift_256_8(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + // No vector instruction for right shifting int64_t, so emulating it + // instead. 
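A scalar sketch of the emulation that follows (illustrative; it assumes 0 <= s < 64, and the guard on s == 0 stands in for the fact that _mm256_sllv_epi64 simply returns 0 for a count of 64):

    #include <cstdint>

    // Arithmetic right shift built from a logical one: shift zeros in, then OR
    // the vacated high bits back in from the sign.
    inline int64_t sar64(int64_t a, uint64_t s) {
      uint64_t logical  = uint64_t(a) >> s;                    // _mm256_srlv_epi64
      uint64_t sign_all = a < 0 ? ~uint64_t(0) : uint64_t(0);  // _mm256_cmpgt_epi64(0, a)
      uint64_t sign_ext = s == 0 ? 0 : sign_all << (64 - s);   // _mm256_sllv_epi64 by (64 - s)
      return int64_t(logical | sign_ext);
    }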
+ + // Shift the number logically to the right, thus filling the most + // significant bits with 0s. Then, replace these bits with the sign + // bit. + __m256i sign_bits = _mm256_cmpgt_epi64(_mm256_set1_epi64x(0), a); + __m256i b_inv_mod_64 = _mm256_sub_epi64(_mm256_set1_epi64x(64), b); + __m256i sign_ext = _mm256_sllv_epi64(sign_bits, b_inv_mod_64); + __m256i c = _mm256_srlv_epi64(a, b); + c = _mm256_or_si256(c, sign_ext); + + return c; +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return _mm256_srav_epi32(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return shift_256_16(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return shift_256_8(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return shift_256_8(a, b); +} + #endif }}} diff --git a/aten/src/ATen/cpu/vec/vec256/vec256_qint.h b/aten/src/ATen/cpu/vec/vec256/vec256_qint.h index 1dc624f6668a..0ee43b53e635 100644 --- a/aten/src/ATen/cpu/vec/vec256/vec256_qint.h +++ b/aten/src/ATen/cpu/vec/vec256/vec256_qint.h @@ -257,6 +257,19 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return _mm256_loadu_si256((const __m256i*)tmp_values); + } + float_vec_return_type dequantize( Vectorized scale, Vectorized /*zero_point*/, @@ -436,6 +449,19 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return _mm256_loadu_si256((const __m256i*)tmp_values); + } + private: __m256i cvtepi8_epi32(__m128i epi8_vals) const { return _mm256_cvtepi8_epi32(epi8_vals); @@ -601,6 +627,19 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. 
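The counted loadu overloads added here, and the matching ones for the other quantized types and the AVX-512 classes further down, all follow one pattern: zero a stack buffer of full vector width, copy only count valid elements into it, and then issue a normal full-width load from the buffer. A condensed sketch of the idiom (names are illustrative, and it value-initializes the buffer directly rather than with the explicit loop the diff keeps for codegen reasons):

    #include <array>
    #include <cstdint>
    #include <cstring>

    // Lanes past `count` read as zero instead of uninitialized stack memory.
    template <typename T, int N>
    std::array<T, N> load_partial(const T* src, int64_t count) {
      std::array<T, N> buf{};                        // one full vector, zero-filled
      std::memcpy(buf.data(), src, count * sizeof(T));
      return buf;                                    // safe input for a full-width vector load
    }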
+ for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return _mm256_loadu_si256((const __m256i*)tmp_values); + } + private: __m256i cvtepu8_epi32(__m128i epu8_vals) const { return _mm256_cvtepu8_epi32(epu8_vals); @@ -820,6 +859,19 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return Vectorized(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, @@ -952,6 +1004,19 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return Vectorized(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, @@ -1072,6 +1137,19 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. 
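On the caller side, the counted overload is typically only taken for the final, possibly short, chunk of a buffer. A sketch with an assumed vector type V whose loadu/store interfaces follow the ones in this diff:

    #include <cstdint>

    // Process n elements in full-width chunks; the tail uses the counted
    // load/store so no out-of-bounds memory is touched.
    template <typename V, typename T>
    void copy_through_vec(const T* src, T* dst, int64_t n) {
      int64_t i = 0;
      for (; i + V::size() <= n; i += V::size()) {
        V::loadu(src + i).store(dst + i);
      }
      if (i < n) {
        V::loadu(src + i, n - i).store(dst + i, n - i);
      }
    }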
+ for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy( + tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return Vectorized(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, diff --git a/aten/src/ATen/cpu/vec/vec256/vsx/vec256_float_vsx.h b/aten/src/ATen/cpu/vec/vec256/vsx/vec256_float_vsx.h index 77cf3695ab91..8fe6cc25f0ee 100644 --- a/aten/src/ATen/cpu/vec/vec256/vsx/vec256_float_vsx.h +++ b/aten/src/ATen/cpu/vec/vec256/vsx/vec256_float_vsx.h @@ -256,29 +256,29 @@ class Vectorized { } Vectorized C10_ALWAYS_INLINE acos() const { - return {Sleef_acosf4_u10vsx(_vec0), Sleef_acosf4_u10vsx(_vec1)}; + return {Sleef_acosf4_u10vsx(_vec0), Sleef_acosf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE asin() const { - return {Sleef_asinf4_u10vsx(_vec0), Sleef_asinf4_u10vsx(_vec1)}; + return {Sleef_asinf4_u10vsx(_vec0), Sleef_asinf4_u10vsx(_vec1)}; } Vectorized atan() const { - return {Sleef_atanf4_u10vsx(_vec0), Sleef_atanf4_u10vsx(_vec1)}; + return {Sleef_atanf4_u10vsx(_vec0), Sleef_atanf4_u10vsx(_vec1)}; } Vectorized atan2(const Vectorized& b) const { - return {Sleef_atan2f4_u10vsx(_vec0, b._vec0), Sleef_atan2f4_u10vsx(_vec1, b._vec1)}; + return {Sleef_atan2f4_u10vsx(_vec0, b._vec0), Sleef_atan2f4_u10vsx(_vec1, b._vec1)}; } Vectorized copysign(const Vectorized &sign) const { return {Sleef_copysignf4_vsx(_vec0, sign._vec0), Sleef_copysignf4_vsx(_vec1, sign._vec1)}; } Vectorized lgamma() const { - return {Sleef_lgammaf4_u10vsx(_vec0), Sleef_lgammaf4_u10vsx(_vec1)}; + return {Sleef_lgammaf4_u10vsx(_vec0), Sleef_lgammaf4_u10vsx(_vec1)}; } Vectorized erf() const { - return {Sleef_erff4_u10vsx(_vec0), Sleef_erff4_u10vsx(_vec1)}; + return {Sleef_erff4_u10vsx(_vec0), Sleef_erff4_u10vsx(_vec1)}; } Vectorized erfc() const { - return {Sleef_erfcf4_u15vsx(_vec0), Sleef_erfcf4_u15vsx(_vec1)}; + return {Sleef_erfcf4_u15vsx(_vec0), Sleef_erfcf4_u15vsx(_vec1)}; } Vectorized erfinv() const { @@ -301,133 +301,32 @@ class Vectorized { } Vectorized C10_ALWAYS_INLINE exp() const { - // implementation logic from avx_mathfun with some modifications from sleef - // Express e**x = e**g 2**n - /// = e**g e**( n loge(2) ) - /// = e**( g + n loge(2) ) - // - auto tmp_x = *this; - auto fx = (tmp_x * log2e_inv).round(); - - auto x = fx.madd(negln2f_hi, tmp_x); - x = fx.madd(negln2f_lo, x); - auto z = x * x; - auto y = x.madd(exp_p0, exp_p1); - y = y.madd(x, exp_p2); - y = y.madd(x, exp_p3); - y = y.madd(x, exp_p4); - y = y.madd(x, exp_p5); - y = y.madd(z, x) + one; - - // vm_pow2n 2^n - vint32 imm0 = vec_signed(fx._vec0); - vint32 imm1 = vec_signed(fx._vec1); - // this pow2n logic is from Sleef code - vint32 imm00 = imm0 >> 1; //>>1 - vint32 imm01 = imm1 >> 1; - vint32 imm10 = imm0 - imm00; - vint32 imm11 = imm1 - imm01; - imm00 = (imm00 + v0x7f) << vu_23; - imm01 = (imm01 + v0x7f) << vu_23; - imm10 = (imm10 + v0x7f) << vu_23; - imm11 = (imm11 + v0x7f) << vu_23; - // treat imm as float vector without conversion - - y._vec0 = (y._vec0 * (vfloat32)imm00) * (vfloat32)imm10; - y._vec1 = (y._vec1 * (vfloat32)imm01) * (vfloat32)imm11; - // boundary check - auto tmp = blendv(y, v_inf, (Vectorized(exp_hi) <= tmp_x)); - y = blendv(tmp, zero, (tmp_x < Vectorized(exp_lo))); - - return y; + return {Sleef_expf4_u10vsx(_vec0), Sleef_expf4_u10vsx(_vec1)}; } Vectorized expm1() const { - return exp() - one; + return {Sleef_expm1f4_u10vsx(_vec0), Sleef_expm1f4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE log() const { return {Sleef_logf4_u10vsx(_vec0), 
Sleef_logf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE log10() const { - return {Sleef_log10f4_u10vsx(_vec0), Sleef_log10f4_u10vsx(_vec1)}; + return {Sleef_log10f4_u10vsx(_vec0), Sleef_log10f4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE log1p() const { - return {Sleef_log1pf4_u10vsx(_vec0), Sleef_log1pf4_u10vsx(_vec1)}; + return {Sleef_log1pf4_u10vsx(_vec0), Sleef_log1pf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE log2() const { - return {Sleef_log2f4_u10vsx(_vec0), Sleef_log2f4_u10vsx(_vec1)}; + return {Sleef_log2f4_u10vsx(_vec0), Sleef_log2f4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE ceil() const { return {vec_ceil(_vec0), vec_ceil(_vec1)}; } Vectorized C10_ALWAYS_INLINE cos() const { - // take the absolute value - auto x = abs(); - // extract the sign bit (upper one) - auto sign_bit = (*this) & sign_mask; - // scale by 4/Pi - auto y = x * _4div_pi; - // store the integer part of y in mm0 - // j=(j+1) & (~1) (see the cephes sources) - vint32 imm0 = (vec_signed(y._vec0) + vi_1) & vi_inv1; - vint32 imm1 = (vec_signed(y._vec1) + vi_1) & vi_inv1; - y._vec0 = vec_float(imm0); - y._vec1 = vec_float(imm1); - - imm0 = imm0 - vi_2; - imm1 = imm1 - vi_2; - Vectorized poly_mask; - // get the swap sign flag - vint32 tmp0 = vec_and(vec_nand(imm0, imm0), vi_4); - vint32 tmp1 = vec_and(vec_nand(imm1, imm1), vi_4); - sign_bit._vecb0 = (vbool32)vec_sl(tmp0, vu_29); - sign_bit._vecb1 = (vbool32)vec_sl(tmp1, vu_29); - // get the polynom selection mask - // there is one polynom for 0 <= x <= Pi / 4 - // and another one for Pi / 4 < x <= Pi / 2 - // Both branches will be computed. - - poly_mask._vecb0 = (vbool32)vec_cmpeq((imm0 & vi_2), vi_0); - poly_mask._vecb1 = (vbool32)vec_cmpeq((imm1 & vi_2), vi_0); - - // The magic pass: "Extended precision modular arithmetic" - // x = ((x - y * DP1) - y * DP2) - y * DP3; - x = y.madd(minus_cephes_dp1, x); - x = y.madd(minus_cephes_dp2, x); - x = y.madd(minus_cephes_dp3, x); - - // Evaluate the first polynom (0 <= x <= Pi/4) - auto z = x * x; - y = z.madd(coscof_p0, coscof_p1); - y = y.madd(z, coscof_p2); - y = y * z * z; - y = y - z * half + one; - - // Evaluate the second polynom (Pi/4 <= x <= 0) - auto y_2 = z.madd(sincof_p0, sincof_p1); - y_2 = y_2.madd(z, sincof_p2); - y_2 = y_2 * z; - y_2 = y_2.madd(x, x); - - // select the correct result from the two polynoms - y = blendv(y, y_2, poly_mask); - // update the sign - y = y ^ sign_bit; - - return y; + return {Sleef_cosf4_u10vsx(_vec0), Sleef_cosf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE cosh() const { - // cosh = 1/2 * (e^x + e^-x) - auto x = abs(); - auto e_x = x.exp(); - auto ret = (e_x + Vectorized(one) / e_x) * half; - // inf and nan checks -#if 0 - ret = blendv(ret, v_inf, x >= vf_89); - ret = blendv(ret, v_inf, ret.isnan()); - ret = blendv(ret, v_nan, this->isnan()); -#endif - return ret; + return {Sleef_coshf4_u10vsx(_vec0), Sleef_coshf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE floor() const { return {vec_floor(_vec0), vec_floor(_vec1)}; @@ -440,97 +339,16 @@ class Vectorized { return {vec_round(_vec0), vec_round(_vec1)}; } Vectorized C10_ALWAYS_INLINE sin() const { - // take the absolute value and xtract sign - auto x = abs(); - auto sign_bit = (*this) & sign_mask; - - // scale by 4/Pi - auto y = x * _4div_pi; - // store the integer part of y in mm0 - - // j=(j+1) & (~1) (see the cephes sources) - vint32 imm0 = (vec_signed(y._vec0) + vi_1) & vi_inv1; - vint32 imm1 = (vec_signed(y._vec1) + vi_1) & vi_inv1; - y._vec0 = vec_float(imm0); - y._vec1 = vec_float(imm1); - // get the swap 
sign flag - Vectorized swap_sign_bit, poly_mask; - swap_sign_bit._vecb0 = (vbool32)vec_sl(imm0 & vi_4, vu_29); - swap_sign_bit._vecb1 = (vbool32)vec_sl(imm1 & vi_4, vu_29); - // get the polynom selection mask - // there is one polynom for 0 <= x <= Pi/4 - // and another one for Pi/4 C10_ALWAYS_INLINE sinh() const { - auto temp_abs = abs(); - // get exponent - auto ret = temp_abs.exp(); - auto recp = Vectorized(half) / ret; - auto v = ret * half - recp; - // extract the sign bit (upper one) - auto sign_bit = (*this) & sign_mask; - auto z = temp_abs * temp_abs; - auto y = z.madd(p0, p1); - y = y.madd(z, p2); - y = (y * z).madd(temp_abs, temp_abs); - // check and select - auto result = blendv(y, v, temp_abs > one); - return result | sign_bit; + return {Sleef_sinhf4_u10vsx(_vec0), Sleef_sinhf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE tan() const { - return {Sleef_tanf4_u10vsx(_vec0), Sleef_tanf4_u10vsx(_vec1)}; + return {Sleef_tanf4_u10vsx(_vec0), Sleef_tanf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE tanh() const { - auto x = *this; - auto vabs = abs(); - // get exponent - auto exp2x = (vabs + vabs).exp(); - auto vv = Vectorized(one) - Vectorized(two) / (exp2x + one); - // extract the sign bit (upper one) - auto sign_bit = (*this) & sign_mask; - auto z = vabs * vabs; - auto y = z.madd(tanh_p0, tanh_p1); - auto tmp = y.madd(z, tanh_p2); - y = z.madd(tmp, tanh_p3); - tmp = y.madd(z, tanh_p4); - y = tmp * z; - tmp = y.madd(x, x); - // add sign - vv = vv | sign_bit; - // check and select - auto sel_mask = vabs >= tanh_0p625; - auto max_mask = vabs > tanh_half_max; - auto max_ret = sign_bit ^ one; - return blendv(blendv(tmp, vv, sel_mask), max_ret, max_mask); + return {Sleef_tanhf4_u10vsx(_vec0), Sleef_tanhf4_u10vsx(_vec1)}; } Vectorized C10_ALWAYS_INLINE trunc() const { return {vec_trunc(_vec0), vec_trunc(_vec1)}; @@ -555,15 +373,15 @@ class Vectorized { } Vectorized fmod(const Vectorized& b) const { - return {Sleef_fmodf4_vsx(_vec0, b._vec0),Sleef_fmodf4_vsx(_vec1, b._vec1)}; + return {Sleef_fmodf4_vsx(_vec0, b._vec0),Sleef_fmodf4_vsx(_vec1, b._vec1)}; } Vectorized hypot(const Vectorized& b) const { - return {Sleef_hypotf4_u05vsx(_vec0, b._vec0), Sleef_hypotf4_u05vsx(_vec1, b._vec1)}; + return {Sleef_hypotf4_u05vsx(_vec0, b._vec0), Sleef_hypotf4_u05vsx(_vec1, b._vec1)}; } Vectorized nextafter(const Vectorized& b) const { - return {Sleef_nextafterf4_vsx(_vec0, b._vec0), Sleef_nextafterf4_vsx(_vec1, b._vec1)}; + return {Sleef_nextafterf4_vsx(_vec0, b._vec0), Sleef_nextafterf4_vsx(_vec1, b._vec1)}; } Vectorized igamma(const Vectorized& x) const { diff --git a/aten/src/ATen/cpu/vec/vec512/vec512.h b/aten/src/ATen/cpu/vec/vec512/vec512.h index 0c6f33fa08a0..dd1235e82ece 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512.h @@ -190,6 +190,56 @@ inline deinterleave2(const Vectorized& a, const Vectorized& _mm512_mask_permutex2var_ps(a, 0xffff, idx2, b)); } +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FLIP ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15); + return _mm512_permutexvar_ps(mask, v); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask = _mm512_set_epi64(0, 1, 2, 3, 4, 5, 6, 7); + return _mm512_permutexvar_pd(mask, v); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask = _mm512_set_epi64(0, 1, 2, 3, 4, 5, 6, 7); + return 
_mm512_permutexvar_epi64(mask, v); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15); + return _mm512_permutexvar_epi32(mask, v); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask = _mm512_set_epi16( + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 + ); + return _mm512_permutexvar_epi16(mask, v); +} + +template<> +inline Vectorized flip(const Vectorized & v) { + const __m512i mask1 = _mm512_set_epi8( + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 + ); + const __m512i mask2 = _mm512_set_epi64(1, 0, 3, 2, 5, 4, 7, 6); + auto reversed_vec = _mm512_shuffle_epi8(v, mask1); + return _mm512_permutexvar_epi64(mask2, reversed_vec); +} + #endif // defined(CPU_CAPABILITY_AVX512) && !defined(_MSC_VER) }}} diff --git a/aten/src/ATen/cpu/vec/vec512/vec512_bfloat16.h b/aten/src/ATen/cpu/vec/vec512/vec512_bfloat16.h index c690682a4aa4..65fca9154215 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512_bfloat16.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512_bfloat16.h @@ -800,6 +800,43 @@ inline void convert(const BFloat16* src, BFloat16* dst, int64_t n) { } } +template <> +inline void convert(const float* src, BFloat16* dst, int64_t n) { + int64_t i; + for (i = 0; i + Vectorized::size() <= n; i += Vectorized::size()) { + __m512 a = _mm512_loadu_ps(&src[i]); + __m512 b = _mm512_loadu_ps(&src[i + 16]); + + __m512i bf = cvtfp32_bf16(a, b); + _mm512_storeu_si512(reinterpret_cast<__m512i*>(&dst[i]), bf); + } + for (; i < n; i++) { + dst[i] = c10::convert(src[i]); + } +} + +template <> +inline void convert(const double* src, BFloat16* dst, int64_t n) { + auto load_float = [](const double *src) -> __m512 { + // Load one float vector from an array of doubles + __m256 a = _mm512_cvtpd_ps(_mm512_loadu_pd(src)); + __m256 b = _mm512_cvtpd_ps(_mm512_loadu_pd(src + 8)); + return _mm512_insertf32x8(_mm512_castps256_ps512(a), b, 1); + }; + + int64_t i; + for (i = 0; i + Vectorized::size() <= n; i += Vectorized::size()) { + __m512 a = load_float(&src[i]); + __m512 b = load_float(&src[i + 16]); + + __m512i bf = cvtfp32_bf16(a, b); + _mm512_storeu_si512(reinterpret_cast<__m512i*>(&dst[i]), bf); + } + for (; i < n; i++) { + dst[i] = c10::convert(src[i]); + } +} + template <> Vectorized inline fmadd(const Vectorized& a, const Vectorized& b, const Vectorized& c) { @@ -831,7 +868,9 @@ inline std::tuple, Vectorized> convert_bfloat16_float(c __at_align__ float arr[K]; __at_align__ BFloat16 arr2[K]; a.store(arr2); - convert(arr2, arr, K); + for (const auto k : c10::irange(K)) { + arr[k] = c10::convert(arr2[k]); + } return std::make_tuple( Vectorized::loadu(arr), Vectorized::loadu(arr + Vectorized::size())); @@ -843,7 +882,9 @@ inline Vectorized convert_float_bfloat16(const Vectorized& a, c __at_align__ BFloat16 arr2[K]; a.store(arr); b.store(arr + Vectorized::size()); - convert(arr, arr2, K); + for (const auto k : c10::irange(K)) { + arr2[k] = c10::convert(arr[k]); + } return Vectorized::loadu(arr2); } diff --git a/aten/src/ATen/cpu/vec/vec512/vec512_double.h b/aten/src/ATen/cpu/vec/vec512/vec512_double.h index 077ce2381cdc..c4c0749d14c2 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512_double.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512_double.h @@ 
-450,6 +450,11 @@ Vectorized inline fmadd(const Vectorized& a, const Vectorized +Vectorized inline fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + return _mm512_fmsub_pd(a, b, c); +} + #endif }}} diff --git a/aten/src/ATen/cpu/vec/vec512/vec512_float.h b/aten/src/ATen/cpu/vec/vec512/vec512_float.h index e0c93a834118..849e1320f55a 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512_float.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512_float.h @@ -465,6 +465,11 @@ Vectorized inline fmadd(const Vectorized& a, const Vectorized +Vectorized inline fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + return _mm512_fmsub_ps(a, b, c); +} + #endif }}} diff --git a/aten/src/ATen/cpu/vec/vec512/vec512_int.h b/aten/src/ATen/cpu/vec/vec512/vec512_int.h index c2cbc0b1d7f9..a2550fbfc1df 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512_int.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512_int.h @@ -828,6 +828,280 @@ class Vectorized : public Vectorizedi { Vectorized le(const Vectorized& other) const; }; +template <> +class Vectorized : public Vectorizedi { +private: + static constexpr __m512i zero_vector {0, 0, 0, 0, 0, 0, 0, 0}; + static const Vectorized ones; +public: + using value_type = uint8_t; + static constexpr int size() { + return 64; + } + using Vectorizedi::Vectorizedi; + Vectorized() {} + Vectorized(uint8_t v) { values = _mm512_set1_epi8(v); } + Vectorized(uint8_t val1, uint8_t val2, uint8_t val3, uint8_t val4, + uint8_t val5, uint8_t val6, uint8_t val7, uint8_t val8, + uint8_t val9, uint8_t val10, uint8_t val11, uint8_t val12, + uint8_t val13, uint8_t val14, uint8_t val15, uint8_t val16, + uint8_t val17, uint8_t val18, uint8_t val19, uint8_t val20, + uint8_t val21, uint8_t val22, uint8_t val23, uint8_t val24, + uint8_t val25, uint8_t val26, uint8_t val27, uint8_t val28, + uint8_t val29, uint8_t val30, uint8_t val31, uint8_t val32, + uint8_t val33, uint8_t val34, uint8_t val35, uint8_t val36, + uint8_t val37, uint8_t val38, uint8_t val39, uint8_t val40, + uint8_t val41, uint8_t val42, uint8_t val43, uint8_t val44, + uint8_t val45, uint8_t val46, uint8_t val47, uint8_t val48, + uint8_t val49, uint8_t val50, uint8_t val51, uint8_t val52, + uint8_t val53, uint8_t val54, uint8_t val55, uint8_t val56, + uint8_t val57, uint8_t val58, uint8_t val59, uint8_t val60, + uint8_t val61, uint8_t val62, uint8_t val63, uint8_t val64){ + values = _mm512_set_epi8(val64, val63, val62, val61, val60, val59, val58, val57, + val56, val55, val54, val53,val52, val51, val50, val49, + val48, val47, val46, val45, val44, val43, val42, val41, + val40, val39, val38, val37, val36, val35, val34, val33, + val32, val31, val30, val29, val28, val27, val26, val25, + val24, val23, val22, val21, val20, val19, val18, val17, + val16, val15, val14, val13, val12, val11, val10, val9, + val8, val7, val6, val5, val4, val3, val2, val1); + } + template + static Vectorized blend(Vectorized a, Vectorized b) { + return _mm512_mask_blend_epi8(mask, a.values, b.values); + } + static Vectorized blendv(const Vectorized& a, const Vectorized& b, + const Vectorized& mask) { + auto msb_one = _mm512_set1_epi8(0xFF); + auto mask_ = _mm512_cmp_epu8_mask(mask, msb_one, _MM_CMPINT_EQ); + return _mm512_mask_blend_epi8(mask_, a.values, b.values); + } + template + static Vectorized arange(uint8_t base = 0, step_t step = static_cast(1)) { + return Vectorized( + base, base + step, base + 2 * step, base + 3 * step, + base + 4 * step, base + 5 * step, base + 6 * step, base + 7 * step, + base + 8 * step, base + 9 * step, base + 10 * 
step, base + 11 * step, + base + 12 * step, base + 13 * step, base + 14 * step, base + 15 * step, + base + 16 * step, base + 17 * step, base + 18 * step, base + 19 * step, + base + 20 * step, base + 21 * step, base + 22 * step, base + 23 * step, + base + 24 * step, base + 25 * step, base + 26 * step, base + 27 * step, + base + 28 * step, base + 29 * step, base + 30 * step, base + 31 * step, + base + 32 * step, base + 33 * step, base + 34 * step, base + 35 * step, + base + 36 * step, base + 37 * step, base + 38 * step, base + 39 * step, + base + 40 * step, base + 41 * step, base + 42 * step, base + 43 * step, + base + 44 * step, base + 45 * step, base + 46 * step, base + 47 * step, + base + 48 * step, base + 49 * step, base + 50 * step, base + 51 * step, + base + 52 * step, base + 53 * step, base + 54 * step, base + 55 * step, + base + 56 * step, base + 57 * step, base + 58 * step, base + 59 * step, + base + 60 * step, base + 61 * step, base + 62 * step, base + 63 * step); + } + static Vectorized + set(Vectorized a, Vectorized b, uint8_t count = size()) { + switch (count) { + case 0: + return a; + case 1: + return blend<0x1>(a, b); + case 2: + return blend<0x3>(a, b); + case 3: + return blend<0x7>(a, b); + case 4: + return blend<0xF>(a, b); + case 5: + return blend<0x1F>(a, b); + case 6: + return blend<0x3F>(a, b); + case 7: + return blend<0x7F>(a, b); + case 8: + return blend<0xFF>(a, b); + case 9: + return blend<0x1FF>(a, b); + case 10: + return blend<0x3FF>(a, b); + case 11: + return blend<0x7FF>(a, b); + case 12: + return blend<0xFFF>(a, b); + case 13: + return blend<0x1FFF>(a, b); + case 14: + return blend<0x3FFF>(a, b); + case 15: + return blend<0x7FFF>(a, b); + case 16: + return blend<0xFFFF>(a, b); + case 17: + return blend<0x1FFFF>(a, b); + case 18: + return blend<0x3FFFF>(a, b); + case 19: + return blend<0x7FFFF>(a, b); + case 20: + return blend<0xFFFFF>(a, b); + case 21: + return blend<0x1FFFFF>(a, b); + case 22: + return blend<0x3FFFFF>(a, b); + case 23: + return blend<0x7FFFFF>(a, b); + case 24: + return blend<0xFFFFFF>(a, b); + case 25: + return blend<0x1FFFFFF>(a, b); + case 26: + return blend<0x3FFFFFF>(a, b); + case 27: + return blend<0x7FFFFFF>(a, b); + case 28: + return blend<0xFFFFFFF>(a, b); + case 29: + return blend<0x1FFFFFFF>(a, b); + case 30: + return blend<0x3FFFFFFF>(a, b); + case 31: + return blend<0x7FFFFFFF>(a, b); + case 32: + return blend<0xFFFFFFFF>(a, b); + case 33: + return blend<0x1FFFFFFFF>(a, b); + case 34: + return blend<0x3FFFFFFFF>(a, b); + case 35: + return blend<0x7FFFFFFFF>(a, b); + case 36: + return blend<0xFFFFFFFFF>(a, b); + case 37: + return blend<0x1FFFFFFFFF>(a, b); + case 38: + return blend<0x3FFFFFFFFF>(a, b); + case 39: + return blend<0x7FFFFFFFFF>(a, b); + case 40: + return blend<0xFFFFFFFFFF>(a, b); + case 41: + return blend<0x1FFFFFFFFFF>(a, b); + case 42: + return blend<0x3FFFFFFFFFF>(a, b); + case 43: + return blend<0x7FFFFFFFFFF>(a, b); + case 44: + return blend<0xFFFFFFFFFFF>(a, b); + case 45: + return blend<0x1FFFFFFFFFFF>(a, b); + case 46: + return blend<0x3FFFFFFFFFFF>(a, b); + case 47: + return blend<0x7FFFFFFFFFFF>(a, b); + case 48: + return blend<0xFFFFFFFFFFFF>(a, b); + case 49: + return blend<0x1FFFFFFFFFFFF>(a, b); + case 50: + return blend<0x3FFFFFFFFFFFF>(a, b); + case 51: + return blend<0x7FFFFFFFFFFFF>(a, b); + case 52: + return blend<0xFFFFFFFFFFFFF>(a, b); + case 53: + return blend<0x1FFFFFFFFFFFFF>(a, b); + case 54: + return blend<0x3FFFFFFFFFFFFF>(a, b); + case 55: + return blend<0x7FFFFFFFFFFFFF>(a, b); + case 56: 
+ return blend<0xFFFFFFFFFFFFFF>(a, b); + case 57: + return blend<0x1FFFFFFFFFFFFFF>(a, b); + case 58: + return blend<0x3FFFFFFFFFFFFFF>(a, b); + case 59: + return blend<0x7FFFFFFFFFFFFFF>(a, b); + case 60: + return blend<0xFFFFFFFFFFFFFFF>(a, b); + case 61: + return blend<0x1FFFFFFFFFFFFFFF>(a, b); + case 62: + return blend<0x3FFFFFFFFFFFFFFF>(a, b); + case 63: + return blend<0x7FFFFFFFFFFFFFFF>(a, b); + } + return b; + } + static Vectorized loadu(const void* ptr) { + return _mm512_loadu_si512(reinterpret_cast(ptr)); + } + static Vectorized loadu(const void* ptr, uint8_t count) { + __at_align__ uint8_t tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, ptr, count * sizeof(uint8_t)); + return loadu(tmp_values); + } + void store(void* ptr, int count = size()) const { + if (count == size()) { + // ptr need not to be aligned here. See + // https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-intel-advanced-vector-extensions/intrinsics-for-load-and-store-operations-1/mm512-storeu-si512.html + _mm512_storeu_si512(reinterpret_cast<__m512i*>(ptr), values); + } else if (count > 0) { + __at_align__ uint8_t tmp_values[size()]; + _mm512_storeu_si512(reinterpret_cast<__m512i*>(tmp_values), values); + std::memcpy(ptr, tmp_values, count * sizeof(uint8_t)); + } + } + const uint8_t& operator[](int idx) const = delete; + uint8_t& operator[](int idx) = delete; + Vectorized abs() const { + return values; + } + Vectorized real() const { + return *this; + } + Vectorized imag() const { + return _mm512_set1_epi8(0); + } + Vectorized conj() const { + return *this; + } + Vectorized frac() const; + Vectorized neg() const; + Vectorized operator==(const Vectorized& other) const { + auto mask = _mm512_cmpeq_epu8_mask(values, other.values); + return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); + } + Vectorized operator!=(const Vectorized& other) const { + auto mask = _mm512_cmpneq_epu8_mask(values, other.values); + return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); + } + Vectorized operator<(const Vectorized& other) const { + auto mask = _mm512_cmplt_epu8_mask(values, other.values); + return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); + } + Vectorized operator<=(const Vectorized& other) const { + auto mask = _mm512_cmple_epu8_mask(values, other.values); + return _mm512_mask_set1_epi8(zero_vector, mask, 0xFF); + } + Vectorized operator>(const Vectorized& other) const { + return other < *this; + } + Vectorized operator>=(const Vectorized& other) const { + return other <= *this; + } + + Vectorized eq(const Vectorized& other) const; + Vectorized ne(const Vectorized& other) const; + Vectorized gt(const Vectorized& other) const; + Vectorized ge(const Vectorized& other) const; + Vectorized lt(const Vectorized& other) const; + Vectorized le(const Vectorized& other) const; +}; + template <> Vectorized inline operator+(const Vectorized& a, const Vectorized& b) { return _mm512_add_epi64(a, b); @@ -848,6 +1122,11 @@ Vectorized inline operator+(const Vectorized& a, const Vectorize return _mm512_add_epi8(a, b); } +template <> +Vectorized inline operator+(const 
Vectorized& a, const Vectorized& b) { + return _mm512_add_epi8(a, b); +} + template <> Vectorized inline operator-(const Vectorized& a, const Vectorized& b) { return _mm512_sub_epi64(a, b); @@ -868,6 +1147,11 @@ Vectorized inline operator-(const Vectorized& a, const Vectorize return _mm512_sub_epi8(a, b); } +template <> +Vectorized inline operator-(const Vectorized& a, const Vectorized& b) { + return _mm512_sub_epi8(a, b); +} + // Negation. Defined here so we can utilize operator- inline Vectorized Vectorized::neg() const { return Vectorized(0) - *this; @@ -885,6 +1169,10 @@ inline Vectorized Vectorized::neg() const { return Vectorized(0) - *this; } +inline Vectorized Vectorized::neg() const { + return Vectorized(0) - *this; +} + template <> Vectorized inline operator*(const Vectorized& a, const Vectorized& b) { return _mm512_mullo_epi64(a, b); @@ -918,6 +1206,12 @@ Vectorized inline operator*(const Vectorized& a, const Vectorize return int_elementwise_binary_512(a, b, std::multiplies()); } +template <> +Vectorized inline operator*(const Vectorized& a, const Vectorized& b) { + // We don't have an instruction for multiplying uint8_t + return int_elementwise_binary_512(a, b, std::multiplies()); +} + template <> Vectorized inline minimum(const Vectorized& a, const Vectorized& b) { return _mm512_min_epi64(a, b); @@ -938,6 +1232,11 @@ Vectorized inline minimum(const Vectorized& a, const Vectorized< return _mm512_min_epi8(a, b); } +template <> +Vectorized inline minimum(const Vectorized& a, const Vectorized& b) { + return _mm512_min_epu8(a, b); +} + template <> Vectorized inline maximum(const Vectorized& a, const Vectorized& b) { return _mm512_max_epi64(a, b); @@ -958,6 +1257,11 @@ Vectorized inline maximum(const Vectorized& a, const Vectorized< return _mm512_max_epi8(a, b); } +template <> +Vectorized inline maximum(const Vectorized& a, const Vectorized& b) { + return _mm512_max_epi8(a, b); +} + template <> Vectorized inline clamp(const Vectorized& a, const Vectorized& min_val, const Vectorized& max_val) { return _mm512_min_epi64(max_val, _mm512_max_epi64(a, min_val)); @@ -978,6 +1282,11 @@ Vectorized inline clamp(const Vectorized& a, const Vectorized +Vectorized inline clamp(const Vectorized& a, const Vectorized& min_val, const Vectorized& max_val) { + return _mm512_min_epu8(max_val, _mm512_max_epu8(a, min_val)); +} + template <> Vectorized inline clamp_max(const Vectorized& a, const Vectorized& max_val) { return _mm512_min_epi64(max_val, a); @@ -998,6 +1307,11 @@ Vectorized inline clamp_max(const Vectorized& a, const Vectorize return _mm512_min_epi8(max_val, a); } +template <> +Vectorized inline clamp_max(const Vectorized& a, const Vectorized& max_val) { + return _mm512_min_epu8(max_val, a); +} + template <> Vectorized inline clamp_min(const Vectorized& a, const Vectorized& min_val) { return _mm512_max_epi64(min_val, a); @@ -1018,6 +1332,11 @@ Vectorized inline clamp_min(const Vectorized& a, const Vectorize return _mm512_max_epi8(min_val, a); } +template <> +Vectorized inline clamp_min(const Vectorized& a, const Vectorized& min_val) { + return _mm512_max_epu8(min_val, a); +} + template Vectorized inline convert_to_int32(const T* ptr) { return Vectorized::loadu(ptr); @@ -1049,6 +1368,10 @@ template <> Vectorized inline operator/(const Vectorized& a, const Vectorized& b) { return int_elementwise_binary_512(a, b, std::divides()); } +template <> +Vectorized inline operator/(const Vectorized& a, const Vectorized& b) { + return int_elementwise_binary_512(a, b, std::divides()); +} template>::value, 
int> = 0> inline Vectorized operator&(const Vectorized& a, const Vectorized& b) { @@ -1163,6 +1486,164 @@ inline Vectorized Vectorized::le(const Vectorized& other return (*this <= other) & Vectorized(1); } +inline Vectorized Vectorized::eq(const Vectorized& other) const { + return (*this == other) & Vectorized(1); +} + +inline Vectorized Vectorized::ne(const Vectorized& other) const { + return (*this != other) & Vectorized(1); +} + +inline Vectorized Vectorized::gt(const Vectorized& other) const { + return (*this > other) & Vectorized(1); +} + +inline Vectorized Vectorized::ge(const Vectorized& other) const { + return (*this >= other) & Vectorized(1); +} + +inline Vectorized Vectorized::lt(const Vectorized& other) const { + return (*this < other) & Vectorized(1); +} + +inline Vectorized Vectorized::le(const Vectorized& other) const { + return (*this <= other) & Vectorized(1); +} + +template ::value || std::is_same::value, int> = 0> +Vectorized inline shift_512_8(const Vectorized& a, const Vectorized& b) { + // No vector instruction for shifting int8_t/uint8_t, so emulating + // it instead. + + // Control masks for shuffle operation, treating 512 bits as an + // array of 8-bit elements, and considering pairs of neighboring + // elements. Specifially, a mask named "ctl_M_N" (M,N in [0,1], and + // M!=N) is set so that shuffle will move element with index M from + // input pair into element with index N in output pair, and element + // with index M in output pair will be set to all 0s. + __m512i ctl_0_1 = _mm512_set_epi8(62, 0x80, 60, 0x80, 58, 0x80, 56, 0x80, + 54, 0x80, 52, 0x80, 50, 0x80, 48, 0x80, + 46, 0x80, 44, 0x80, 42, 0x80, 40, 0x80, + 38, 0x80, 36, 0x80, 34, 0x80, 32, 0x80, + 30, 0x80, 28, 0x80, 26, 0x80, 24, 0x80, + 22, 0x80, 20, 0x80, 18, 0x80, 16, 0x80, + 14, 0x80, 12, 0x80, 10, 0x80, 8, 0x80, + 6, 0x80, 4, 0x80, 2, 0x80, 0, 0x80); + __m512i ctl_1_0 = _mm512_set_epi8(0x80, 63, 0x80, 61, 0x80, 59, 0x80, 57, + 0x80, 55, 0x80, 53, 0x80, 51, 0x80, 49, + 0x80, 47, 0x80, 45, 0x80, 43, 0x80, 41, + 0x80, 39, 0x80, 37, 0x80, 35, 0x80, 33, + 0x80, 31, 0x80, 29, 0x80, 27, 0x80, 25, + 0x80, 23, 0x80, 21, 0x80, 19, 0x80, 17, + 0x80, 15, 0x80, 13, 0x80, 11, 0x80, 9, + 0x80, 7, 0x80, 5, 0x80, 3, 0x80, 1); + + // Masks for bitwise and operation, treating 512 bits as an array of + // 8-bit elements, and considering them in pairs of neighboring + // elements. A mask named "keep_M" (M in [0,1]) is set so that + // bitwise and will copy element with index M from input pair into + // element with the same index in output pair, while the other + // element in output pair will be set to all 0s. + __m512i keep_0 = _mm512_set1_epi16(0xFF); + __m512i keep_1 = _mm512_set1_epi16(0xFF00); + + // Take each 8-bit element with idx%2==0 from input array to be + // shifted and extend it to 16 bits so that 0s are added to the + // right. Then, perform shifting on this 16-bit number. Upper 8 + // bits will be proper result of shifting original 8-bit number, so + // write them to result array, into the same position from which + // corresponding input element is taken. Also, make sure that + // result array elements with idx%2!=0 are set to all 0s. + // + // Note that number of bits to shift for is extended to 16 bits by + // adding 0s to the left. That means this number is not properly + // sign-extended for negative values. 
However, number of bits to + // shift is treated as an unsigned integer by respective shift + // intrinsics anyway so if negative then either with or without + // proper sign extension, it will be interpreted as a number greater + // than 32, and the shifting result will be the same. + __m512i a0 = _mm512_shuffle_epi8(a, ctl_0_1); + __m512i b0 = _mm512_and_si512(b, keep_0); + __m512i c0; + if (left_shift) + c0 = _mm512_sllv_epi16(a0, b0); + else + if (std::is_same::value) + c0 = _mm512_srav_epi16(a0, b0); + else + c0 = _mm512_srlv_epi16(a0, b0); + c0 = _mm512_shuffle_epi8(c0, ctl_1_0); + + // Peform shifting the same way for input array elements with + // idx%2==1. + __m512i a1 = _mm512_and_si512(a, keep_1); + __m512i b1 = _mm512_shuffle_epi8(b, ctl_1_0); + __m512i c1; + if (left_shift) + c1 = _mm512_sllv_epi16(a1, b1); + else + if (std::is_same::value) + c1 = _mm512_srav_epi16(a1, b1); + else + c1 = _mm512_srlv_epi16(a1, b1); + c1 = _mm512_and_si512(c1, keep_1); + + // Merge partial results into the final result. + __m512i c = _mm512_or_si512(c0, c1); + + return c; +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return _mm512_sllv_epi64(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return _mm512_sllv_epi32(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return _mm512_sllv_epi16(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return shift_512_8(a, b); +} + +template <> +Vectorized inline operator<<(const Vectorized& a, const Vectorized& b) { + return shift_512_8(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return _mm512_srav_epi64(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return _mm512_srav_epi32(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return _mm512_srav_epi16(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return shift_512_8(a, b); +} + +template <> +Vectorized inline operator>>(const Vectorized& a, const Vectorized& b) { + return shift_512_8(a, b); +} + #endif }}} diff --git a/aten/src/ATen/cpu/vec/vec512/vec512_qint.h b/aten/src/ATen/cpu/vec/vec512/vec512_qint.h index 0f3474eaa2ad..87cf44283c0b 100644 --- a/aten/src/ATen/cpu/vec/vec512/vec512_qint.h +++ b/aten/src/ATen/cpu/vec/vec512/vec512_qint.h @@ -268,6 +268,18 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. 
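The widening trick shift_512_8 relies on can be checked against a scalar model: place the 8-bit value in the upper byte of a 16-bit lane, shift the 16-bit lane, and read the upper byte back. A minimal sketch assuming only standard C++; shift8_via_16 is an illustrative name, not part of the patch.

#include <cstdint>
#include <type_traits>

// Scalar model of one lane of shift_512_8: the value sits in bits [15:8] of a
// 16-bit lane, the 16-bit shift is performed, and bits [15:8] of the result
// are the correctly shifted (and, for int8_t right shifts, sign-extended) byte.
template <bool left_shift, typename T>
T shift8_via_16(T value, uint8_t count) {
  static_assert(std::is_same<T, int8_t>::value || std::is_same<T, uint8_t>::value,
                "8-bit lanes only");
  uint16_t widened = static_cast<uint16_t>(static_cast<uint8_t>(value)) << 8;
  uint16_t shifted;
  if (left_shift) {
    shifted = static_cast<uint16_t>(widened << count);                          // like _mm512_sllv_epi16
  } else if (std::is_same<T, int8_t>::value) {
    shifted = static_cast<uint16_t>(static_cast<int16_t>(widened) >> count);    // like _mm512_srav_epi16
  } else {
    shifted = static_cast<uint16_t>(widened >> count);                          // like _mm512_srlv_epi16
  }
  return static_cast<T>(static_cast<uint8_t>(shifted >> 8));
}

// e.g. shift8_via_16<false>(int8_t(-32), 2) == int8_t(-8), matching an
// arithmetic right shift on the original 8-bit value.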
+ for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + float_vec_return_type dequantize( Vectorized scale, Vectorized zero_point, @@ -447,6 +459,18 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + private: __m512i cvtepi8_epi32(__m128i epi8_vals) const { return _mm512_cvtepi8_epi32(epi8_vals); @@ -611,6 +635,18 @@ struct Vectorized : public Vectorizedqi { return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + private: __m512i cvtepu8_epi32(__m128i epu8_vals) const { return _mm512_cvtepu8_epi32(epu8_vals); @@ -833,6 +869,18 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, @@ -965,6 +1013,18 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. 
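The same masked-load pattern repeats for each quantized vector type in this file: zero a full-width staging buffer, copy only the valid elements, then perform an ordinary full-width load. A generic sketch of that pattern with illustrative names (StagingLoad, kLanes); it assumes 0 <= count <= kLanes.

#include <cstdint>
#include <cstring>

// Generic form of the loadu(ptr, count) overloads: the tail lanes are zeroed
// explicitly so the subsequent full-width load never reads uninitialized
// memory, and only `count` elements are copied from the source.
template <typename T, int kLanes>
struct StagingLoad {
  alignas(64) T tmp[kLanes];

  const T* fill(const void* src, int64_t count) {
    for (int i = 0; i < kLanes; ++i) {
      tmp[i] = T(0);
    }
    std::memcpy(tmp, src, count * sizeof(T));
    return tmp;  // safe input for a full-width vector load
  }
};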
+ for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, @@ -1085,6 +1145,18 @@ struct Vectorized : public VectorizedQuantizedConverter< return Vectorized(ptr); } + static Vectorized loadu(const void* ptr, int64_t count) { + __at_align__ value_type tmp_values[size()]; + // Ensure uninitialized memory does not change the output value See https://github.com/pytorch/pytorch/issues/32502 + // for more details. We do not initialize arrays to zero using "={0}" because gcc would compile it to two + // instructions while a loop would be compiled to one instruction. + for (const auto i : c10::irange(size())) { + tmp_values[i] = 0; + } + std::memcpy(tmp_values, reinterpret_cast(ptr), count * sizeof(value_type)); + return loadu(tmp_values); + } + static Vectorized quantize( const float_vec_return_type& rhs, float scale, diff --git a/aten/src/ATen/cpu/vec/vec_base.h b/aten/src/ATen/cpu/vec/vec_base.h index 3bf1010efd68..abf106e8d5b3 100644 --- a/aten/src/ATen/cpu/vec/vec_base.h +++ b/aten/src/ATen/cpu/vec/vec_base.h @@ -33,6 +33,7 @@ #include #include #include +#include // These macros helped us unify vec_base.h #ifdef CPU_CAPABILITY_AVX512 @@ -131,8 +132,9 @@ struct Vectorized { // versions GCC/Clang have buggy determinations on whether or not an // identifier is odr-used or not, and in any case it's hard to tell if // a variable is odr-used or not. So best to just cut the problem at the root. + static constexpr size_type size_T = sizeof(T); // Workaround to compile with VS2022. static constexpr size_type size() { - return VECTOR_WIDTH / sizeof(T); + return VECTOR_WIDTH / size_T; } Vectorized() : values{static_cast(0)} {} Vectorized(T val) { @@ -797,6 +799,21 @@ inline Vectorized operator~(const Vectorized& a) { return a ^ ones; } +template Vectorized inline operator<<(const Vectorized &a, const Vectorized &b) { + Vectorized c; + for (int i = 0; i != Vectorized::size(); i++) { + c[i] = a[i] << b[i]; + } + return c; +} + +template Vectorized inline operator>>(const Vectorized &a, const Vectorized &b) { + Vectorized c; + for (int i = 0; i != Vectorized::size(); i++) { + c[i] = a[i] >> b[i]; + } + return c; +} template inline Vectorized& operator += (Vectorized& a, const Vectorized& b) { @@ -824,11 +841,28 @@ inline Vectorized& operator *= (Vectorized& a, const Vectorized& b) { return a; } +template +inline Vectorized& operator <<= (Vectorized& a, const Vectorized& b) { + a = a << b; + return a; +} + +template +inline Vectorized& operator >>= (Vectorized& a, const Vectorized& b) { + a = a >> b; + return a; +} + template inline Vectorized fmadd(const Vectorized& a, const Vectorized& b, const Vectorized& c) { return a * b + c; } +template +inline Vectorized fmsub(const Vectorized& a, const Vectorized& b, const Vectorized& c) { + return a * b - c; +} + template std::enable_if_t> inline gather(T const* base_addr, const Vectorized>& vindex) { @@ -975,10 +1009,22 @@ inline void convert(const src_T *src, dst_T *dst, int64_t n) { #endif for (const auto i : c10::irange(n)) { (void)i; //Suppress unused variable warning - *dst = c10::static_cast_with_inter_type::apply(*src); + *dst = c10::convert(c10::load(src)); src++; dst++; } } +template +inline Vectorized flip(const Vectorized & data) { + static constexpr int size = Vectorized::size(); + T output[size]; + T buffer[size]; + data.store(static_cast(buffer)); + for 
(const auto i : c10::irange(size)) { + output[i] = buffer[size - i - 1]; + } + return Vectorized::loadu(static_cast(output)); +} + }}} diff --git a/aten/src/ATen/cuda/Atomic.cuh b/aten/src/ATen/cuda/Atomic.cuh index 079b289ef8c3..3d60b672e972 100644 --- a/aten/src/ATen/cuda/Atomic.cuh +++ b/aten/src/ATen/cuda/Atomic.cuh @@ -6,6 +6,10 @@ #include +#if !(defined(USE_ROCM) || ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800)))) +#include +#endif + template struct AtomicFPOp; @@ -164,6 +168,7 @@ Atomic##NAME##IntegerImpl()(address, } \ ATOMIC_INTEGER_IMPL(Add) +GPU_ATOMIC_INTEGER(Add, a || b, bool) // Don't instantiate gpuAtomicAdd with the macro as it seems non-standard (see int32, int64) static inline __device__ void gpuAtomicAdd(uint8_t *address, uint8_t val) { @@ -206,10 +211,6 @@ static inline __device__ void gpuAtomicAdd(int64_t *address, int64_t val) { #endif } -static inline __device__ void gpuAtomicAdd(bool *address, bool val) { - *address = address && val; -} - static inline __device__ at::Half gpuAtomicAdd(at::Half *address, at::Half val) { #if defined(USE_ROCM) || ((defined(CUDA_VERSION) && CUDA_VERSION < 10000) || (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700))) return AtomicFPOp()(address, val, @@ -222,10 +223,15 @@ static inline __device__ at::Half gpuAtomicAdd(at::Half *address, at::Half val) } static inline __device__ at::BFloat16 gpuAtomicAdd(at::BFloat16 *address, at::BFloat16 val) { - return AtomicFPOp()(address, val, - [](at::BFloat16 bsum, at::BFloat16 val) { - return bsum + val; - }); +#if defined(USE_ROCM) || ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))) +return AtomicFPOp()(address, val, + [](at::BFloat16 bsum, at::BFloat16 val) { + return bsum + val; + }); +#else + __nv_bfloat16 r = atomicAdd(reinterpret_cast<__nv_bfloat16*>(address), *reinterpret_cast<__nv_bfloat16*>(&val)); + return *reinterpret_cast(&r); +#endif } #if defined(CUDA_VERSION) && defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 600 || CUDA_VERSION < 8000) @@ -256,7 +262,7 @@ static inline __device__ double atomicAdd(double* address, double val) * minimal. 
*/ -#if defined(__HIP_PLATFORM_HCC__) && __hcc_workweek__ < 18312 && !__HIP__ +#if defined(USE_ROCM) && __hcc_workweek__ < 18312 && !__HIP__ // This needs to be defined for the host side pass static inline __device__ double atomicAdd(double *address, double val) { } #endif diff --git a/aten/src/ATen/cuda/CUDABlas.cpp b/aten/src/ATen/cuda/CUDABlas.cpp index e99017289d68..866f53ee7f87 100644 --- a/aten/src/ATen/cuda/CUDABlas.cpp +++ b/aten/src/ATen/cuda/CUDABlas.cpp @@ -1162,7 +1162,7 @@ void vdot>(CUDABLAS_DOT_ARGTYPES(c10::complex)) { reinterpret_cast(result))); } -// This guards blocks use of getrsBatched, geqrfBatched, getrfBatched, getriBatched on platforms other than cuda +// This guards blocks use of getrsBatched, geqrfBatched, getrfBatched on platforms other than cuda #ifdef CUDART_VERSION template <> @@ -1323,67 +1323,6 @@ void getrfBatched>( batchsize)); } -template <> -void getriBatched( - int n, double** dA_array, int ldda, int* ipiv_array, double** dC_array, int lddc, int* info_array, int batchsize) { - auto handle = at::cuda::getCurrentCUDABlasHandle(); - TORCH_CUDABLAS_CHECK(cublasDgetriBatched( - handle, n, dA_array, ldda, ipiv_array, dC_array, lddc, info_array, batchsize)); -} - -template <> -void getriBatched( - int n, float** dA_array, int ldda, int* ipiv_array, float** dC_array, int lddc, int* info_array, int batchsize) { - auto handle = at::cuda::getCurrentCUDABlasHandle(); - TORCH_CUDABLAS_CHECK(cublasSgetriBatched( - handle, n, dA_array, ldda, ipiv_array, dC_array, lddc, info_array, batchsize)); -} - -template <> -void getriBatched>( - int n, - c10::complex** dA_array, - int ldda, - int* ipiv_array, - c10::complex** dC_array, - int lddc, - int* info_array, - int batchsize) { - auto handle = at::cuda::getCurrentCUDABlasHandle(); - TORCH_CUDABLAS_CHECK(cublasZgetriBatched( - handle, - n, - reinterpret_cast(dA_array), - ldda, - ipiv_array, - reinterpret_cast(dC_array), - lddc, - info_array, - batchsize)); -} - -template <> -void getriBatched>( - int n, - c10::complex** dA_array, - int ldda, - int* ipiv_array, - c10::complex** dC_array, - int lddc, - int* info_array, - int batchsize) { - auto handle = at::cuda::getCurrentCUDABlasHandle(); - TORCH_CUDABLAS_CHECK(cublasCgetriBatched( - handle, - n, - reinterpret_cast(dA_array), - ldda, - ipiv_array, - reinterpret_cast(dC_array), - lddc, - info_array, - batchsize)); -} template <> void gelsBatched(CUDABLAS_GELS_BATCHED_ARGTYPES(double)) { diff --git a/aten/src/ATen/cuda/CUDABlas.h b/aten/src/ATen/cuda/CUDABlas.h index 10e589ecd6c9..96c7fc818422 100644 --- a/aten/src/ATen/cuda/CUDABlas.h +++ b/aten/src/ATen/cuda/CUDABlas.h @@ -227,7 +227,7 @@ void vdot>(CUDABLAS_DOT_ARGTYPES(c10::complex)); template <> void vdot>(CUDABLAS_DOT_ARGTYPES(c10::complex)); -// This guards blocks use of getrsBatched, geqrfBatched, getrfBatched, getriBatched on platforms other than cuda +// This guards blocks use of getrsBatched, geqrfBatched, getrfBatched on platforms other than cuda #ifdef CUDART_VERSION #define CUDABLAS_GETRS_ARGTYPES(Dtype) \ @@ -287,22 +287,6 @@ TORCH_CUDA_CU_API void getrfBatched>(CUDABLAS_GETRF_ARGTYPE template<> TORCH_CUDA_CU_API void getrfBatched>(CUDABLAS_GETRF_ARGTYPES(c10::complex)); -#define CUDABLAS_GETRI_ARGTYPES(Dtype) \ - int n, Dtype** dA_array, int ldda, int* ipiv_array, Dtype** dC_array, int lddc, int* info_array, int batchsize - -template -void getriBatched(CUDABLAS_GETRI_ARGTYPES(Dtype)) { - TORCH_CHECK(false, "at::cuda::blas::getriBatched: not implemented for ", typeid(Dtype).name()); -} -template<> 
-TORCH_CUDA_CU_API void getriBatched(CUDABLAS_GETRI_ARGTYPES(float)); -template<> -TORCH_CUDA_CU_API void getriBatched(CUDABLAS_GETRI_ARGTYPES(double)); -template<> -TORCH_CUDA_CU_API void getriBatched>(CUDABLAS_GETRI_ARGTYPES(c10::complex)); -template<> -TORCH_CUDA_CU_API void getriBatched>(CUDABLAS_GETRI_ARGTYPES(c10::complex)); - #define CUDABLAS_GELS_BATCHED_ARGTYPES(Dtype) \ cublasHandle_t handle, cublasOperation_t trans, int m, int n, int nrhs, Dtype** dA_array, int ldda, Dtype** dC_array, int lddc, int* info, int *devInfoArray, int batchSize diff --git a/aten/src/ATen/cuda/CUDAContext.h b/aten/src/ATen/cuda/CUDAContext.h index 0167cd585eaa..12349b709050 100644 --- a/aten/src/ATen/cuda/CUDAContext.h +++ b/aten/src/ATen/cuda/CUDAContext.h @@ -72,6 +72,8 @@ TORCH_CUDA_CPP_API Allocator* getCUDADeviceAllocator(); TORCH_CUDA_CPP_API cusparseHandle_t getCurrentCUDASparseHandle(); TORCH_CUDA_CPP_API cublasHandle_t getCurrentCUDABlasHandle(); +TORCH_CUDA_CPP_API void clearCublasWorkspaces(); + #ifdef CUDART_VERSION TORCH_CUDA_CPP_API cusolverDnHandle_t getCurrentCUDASolverDnHandle(); #endif diff --git a/aten/src/ATen/cuda/CUDADataType.h b/aten/src/ATen/cuda/CUDADataType.h index 5221b233398c..d25722c080ec 100644 --- a/aten/src/ATen/cuda/CUDADataType.h +++ b/aten/src/ATen/cuda/CUDADataType.h @@ -33,7 +33,7 @@ template<> inline cudaDataType getCudaDataType>() { } // HIP doesn't define integral types -#ifndef __HIP_PLATFORM_HCC__ +#ifndef USE_ROCM template<> inline cudaDataType getCudaDataType() { return CUDA_R_8U; } @@ -45,7 +45,7 @@ template<> inline cudaDataType getCudaDataType() { } #endif -#if !defined(__HIP_PLATFORM_HCC__) && defined(CUDA_VERSION) && CUDA_VERSION >= 11000 +#if !defined(USE_ROCM) && defined(CUDA_VERSION) && CUDA_VERSION >= 11000 template<> inline cudaDataType getCudaDataType() { return CUDA_R_16I; } @@ -60,7 +60,7 @@ template<> inline cudaDataType getCudaDataType() { inline cudaDataType ScalarTypeToCudaDataType(const c10::ScalarType& scalar_type) { switch (scalar_type) { // HIP doesn't define integral types -#ifndef __HIP_PLATFORM_HCC__ +#ifndef USE_ROCM case c10::ScalarType::Byte: return CUDA_R_8U; case c10::ScalarType::Char: @@ -80,7 +80,7 @@ inline cudaDataType ScalarTypeToCudaDataType(const c10::ScalarType& scalar_type) return CUDA_C_32F; case c10::ScalarType::ComplexDouble: return CUDA_C_64F; -#if !defined(__HIP_PLATFORM_HCC__) && defined(CUDA_VERSION) && CUDA_VERSION >= 11000 +#if !defined(USE_ROCM) && defined(CUDA_VERSION) && CUDA_VERSION >= 11000 case c10::ScalarType::Short: return CUDA_R_16I; case c10::ScalarType::Long: diff --git a/aten/src/ATen/cuda/CUDAEvent.h b/aten/src/ATen/cuda/CUDAEvent.h index 205fad8c1121..1c3c67949e58 100644 --- a/aten/src/ATen/cuda/CUDAEvent.h +++ b/aten/src/ATen/cuda/CUDAEvent.h @@ -48,7 +48,7 @@ struct TORCH_CUDA_CPP_API CUDAEvent { CUDAGuard guard(device_index_); const c10::impl::PyInterpreter* interp = c10::impl::GPUTrace::get_trace(); if (C10_UNLIKELY(interp)) { - interp->trace_gpu_event_deletion(reinterpret_cast(event_)); + (*interp)->trace_gpu_event_deletion(reinterpret_cast(event_)); } cudaEventDestroy(event_); } @@ -120,7 +120,7 @@ struct TORCH_CUDA_CPP_API CUDAEvent { AT_CUDA_CHECK(cudaEventRecord(event_, stream)); const c10::impl::PyInterpreter* interp = c10::impl::GPUTrace::get_trace(); if (C10_UNLIKELY(interp)) { - interp->trace_gpu_event_record( + (*interp)->trace_gpu_event_record( reinterpret_cast(event_), reinterpret_cast(stream.stream()) ); @@ -136,7 +136,7 @@ struct TORCH_CUDA_CPP_API CUDAEvent { 
AT_CUDA_CHECK(cudaStreamWaitEvent(stream, event_, 0)); const c10::impl::PyInterpreter* interp = c10::impl::GPUTrace::get_trace(); if (C10_UNLIKELY(interp)) { - interp->trace_gpu_event_wait( + (*interp)->trace_gpu_event_wait( reinterpret_cast(event_), reinterpret_cast(stream.stream()) ); @@ -157,6 +157,10 @@ struct TORCH_CUDA_CPP_API CUDAEvent { // Note: cudaEventSynchronize can be safely called from any device void synchronize() const { if (is_created_) { + const c10::impl::PyInterpreter* interp = c10::impl::GPUTrace::get_trace(); + if (C10_UNLIKELY(interp)) { + (*interp)->trace_gpu_event_synchronization(reinterpret_cast(event_)); + } AT_CUDA_CHECK(cudaEventSynchronize(event_)); } } @@ -185,7 +189,7 @@ struct TORCH_CUDA_CPP_API CUDAEvent { AT_CUDA_CHECK(cudaEventCreateWithFlags(&event_, flags_)); const c10::impl::PyInterpreter* interp = c10::impl::GPUTrace::get_trace(); if (C10_UNLIKELY(interp)) { - interp->trace_gpu_event_creation(reinterpret_cast(event_)); + (*interp)->trace_gpu_event_creation(reinterpret_cast(event_)); } is_created_ = true; } diff --git a/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp b/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp index 0cac5d6da2d5..a678354dca49 100644 --- a/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp +++ b/aten/src/ATen/cuda/CUDAGeneratorImpl.cpp @@ -231,7 +231,8 @@ uint64_t CUDAGeneratorImpl::philox_offset_per_thread() const { * offset_extragraph is the initial offset at the start of the graphed region. * offset_intragraph tracks the offset in the graphed region. */ -void CUDAGeneratorImpl::capture_prologue(int64_t* offset_extragraph) { +void CUDAGeneratorImpl::capture_prologue(int64_t* seed_extragraph, int64_t* offset_extragraph) { + seed_extragraph_ = seed_extragraph; offset_extragraph_ = offset_extragraph; offset_intragraph_ = 0; graph_expects_this_gen_ = true; @@ -279,7 +280,7 @@ PhiloxCudaState CUDAGeneratorImpl::philox_cuda_state(uint64_t increment) { TORCH_INTERNAL_ASSERT(this->offset_intragraph_ <= std::numeric_limits::max() - increment); this->offset_intragraph_ += increment; - return PhiloxCudaState(this->seed_, + return PhiloxCudaState(this->seed_extragraph_, this->offset_extragraph_, offset); } else { diff --git a/aten/src/ATen/cuda/CUDAGeneratorImpl.h b/aten/src/ATen/cuda/CUDAGeneratorImpl.h index 768f0b7549c2..b8d563343f24 100644 --- a/aten/src/ATen/cuda/CUDAGeneratorImpl.h +++ b/aten/src/ATen/cuda/CUDAGeneratorImpl.h @@ -19,10 +19,10 @@ namespace at { * * A CUDA graph containing multiple RNG ops behaves like a * single giant kernel from the perspective of ops external - * to the graph. During graph capture, logic below records - * the total of all offset increments that occur in the graphed - * region, and records the final total as the offset for the - * entire graph. + * to the graph. During graph capture, logic in CUDAGeneratorImpl + * records the total of all offset increments that occur in the + * graphed region, and records the final total as the offset for + * the entire graph. 
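With this change the seed follows the same pointer-or-value scheme the offset already used during graph capture. A compact sketch of the payload layout and the consumer-side unpacking, using illustrative names (PhiloxStateSketch, unpack_sketch); the actual definitions are in PhiloxCudaStateRaw.cuh and UnpackRaw.cuh later in this patch.

#include <cstdint>
#include <tuple>

// During graph capture the state carries device pointers into the one-element
// seed/offset tensors that the graph refills before every replay; outside
// capture it carries plain values.
struct PhiloxStateSketch {
  union Payload { uint64_t val; int64_t* ptr; };
  Payload seed_;
  Payload offset_;
  uint32_t offset_intragraph_ = 0;
  bool captured_ = false;
};

// Consumer-kernel side: resolve the effective (seed, offset) pair.
inline std::tuple<uint64_t, uint64_t> unpack_sketch(const PhiloxStateSketch& s) {
  if (s.captured_) {
    return std::make_tuple(static_cast<uint64_t>(*s.seed_.ptr),
                           static_cast<uint64_t>(*s.offset_.ptr) + s.offset_intragraph_);
  }
  return std::make_tuple(s.seed_.val, s.offset_.val);
}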
* * When the graph reruns, the logic that reruns it * increments this device's CUDA generator's offset @@ -30,8 +30,8 @@ namespace at { * * Meanwhile, within the graph, at capture time, instead of * populating PhiloxCudaStates with the uint64_t offset pulled - * directly from the global state, PhiloxCudaState instead - * holds a pointer to one-element stream-local int64_t device tensor + * directly from the global state, PhiloxCudaState uses a pointer + * to a one-element stream-local int64_t device tensor * holding an initial offset value, and a uint64_t holding an * intra-graph offset. (The intra-graph offset starts from zero * when capture begins.) In each consumer kernel, @@ -100,7 +100,7 @@ struct TORCH_CUDA_CPP_API CUDAGeneratorImpl : public c10::GeneratorImpl { c10::intrusive_ptr get_state() const override; void set_philox_offset_per_thread(uint64_t offset); uint64_t philox_offset_per_thread() const; - void capture_prologue(int64_t* offset_extragraph); + void capture_prologue(int64_t* seed_extragraph, int64_t* offset_extragraph); uint64_t capture_epilogue(); PhiloxCudaState philox_cuda_state(uint64_t increment); @@ -114,6 +114,7 @@ struct TORCH_CUDA_CPP_API CUDAGeneratorImpl : public c10::GeneratorImpl { CUDAGeneratorImpl* clone_impl() const override; uint64_t seed_ = default_rng_seed_val; uint64_t philox_offset_per_thread_ = 0; + int64_t* seed_extragraph_{}; int64_t* offset_extragraph_{}; uint32_t offset_intragraph_ = 0; bool graph_expects_this_gen_ = false; diff --git a/aten/src/ATen/cuda/CUDAGraph.cpp b/aten/src/ATen/cuda/CUDAGraph.cpp index c7734334f4e2..24ee0b19ab90 100644 --- a/aten/src/ATen/cuda/CUDAGraph.cpp +++ b/aten/src/ATen/cuda/CUDAGraph.cpp @@ -65,9 +65,11 @@ void CUDAGraph::capture_begin(MempoolId_t pool/*=0*/) { c10::nullopt, cuda::detail::getDefaultCUDAGenerator()); auto options = TensorOptions().device(at::kCUDA).dtype(at::kLong); + seed_extragraph_ = at::empty({1}, options); offset_extragraph_ = at::empty({1}, options); - gen->capture_prologue(offset_extragraph_.data_ptr()); + seed_extragraph_.fill_(int64_t(gen->current_seed())); + gen->capture_prologue(seed_extragraph_.data_ptr(), offset_extragraph_.data_ptr()); auto stream = at::cuda::getCurrentCUDAStream(); @@ -131,16 +133,42 @@ void CUDAGraph::capture_end() { TORCH_CHECK(stream == capture_stream_, "Capture must end on the same stream it began on."); - c10::cuda::CUDACachingAllocator::notifyCaptureEnd(capture_dev_, id_); + c10::cuda::CUDACachingAllocator::notifyCaptureAboutToEnd(capture_dev_, id_); AT_CUDA_CHECK(cudaStreamEndCapture(capture_stream_, &graph_)); TORCH_CHECK(graph_ != NULL, "Invalid capture."); has_graph_ = true; - // Trailing NULL, NULL, 0 arguments were recommended by Cuda driver people, - // who prefer not to report error message through these arguments moving forward - // (they prefer return value, or errors on api calls internal to the capture) - AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0)); + c10::cuda::CUDACachingAllocator::notifyCaptureEnded(capture_dev_, id_); + + // In typical graph usage some tensors (e.g. the tensors used for graph IO) are not freed + // between replays. + // If Pytorch compiles and runs with a CUDA 11.4+ toolkit, there's a chance the allocator backend + // is cudaMallocAsync. + // cudaMallocAsync is generally graph-safe, but if some tensors are not freed between replays, + // the graph's internal bookkeeping requires that we instantiate with + // cudaGraphInstantiateFlagAutoFreeOnLaunch. 
See + // cudaGraphLaunch + // https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH_1g1accfe1da0c605a577c22d9751a09597 + // cudaGraphInstantiateWithFlags + // https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html#group__CUDART__GRAPH_1ga2c652a24ba93e52b99a47bec0888233 +#if CUDA_VERSION >= 11040 + int version; + AT_CUDA_CHECK(cudaDriverGetVersion(&version)); + if (version < 11040) { +#endif + // Trailing NULL, NULL, 0 arguments were recommended by Cuda driver people, + // who prefer not to report error message through these arguments moving forward + // (they prefer return value, or errors on api calls internal to the capture) + AT_CUDA_CHECK(cudaGraphInstantiate(&graph_exec_, graph_, NULL, NULL, 0)); +#if CUDA_VERSION >= 11040 + } else { + AT_CUDA_CHECK(cudaGraphInstantiateWithFlags(&graph_exec_, + graph_, + cudaGraphInstantiateFlagAutoFreeOnLaunch)); + } +#endif + has_graph_exec_ = true; auto* gen = get_generator_or_default( @@ -175,6 +203,7 @@ void CUDAGraph::replay() { std::lock_guard lock(gen->mutex_); rng_engine_inputs = gen->philox_cuda_state(wholegraph_increment_); } + seed_extragraph_.fill_(int64_t(gen->current_seed())); offset_extragraph_.fill_(int64_t(rng_engine_inputs.offset_.val)); // graph_exec_ may be replayed in any stream. diff --git a/aten/src/ATen/cuda/CUDAGraph.h b/aten/src/ATen/cuda/CUDAGraph.h index 09b0b7b5d800..bacad79102a3 100644 --- a/aten/src/ATen/cuda/CUDAGraph.h +++ b/aten/src/ATen/cuda/CUDAGraph.h @@ -69,6 +69,7 @@ struct TORCH_CUDA_CPP_API CUDAGraph { int capture_dev_; // RNG state trackers + at::Tensor seed_extragraph_; at::Tensor offset_extragraph_; uint64_t wholegraph_increment_; }; diff --git a/aten/src/ATen/cuda/CUDASparse.h b/aten/src/ATen/cuda/CUDASparse.h index ecb7127dfa32..d309cd5d8e31 100644 --- a/aten/src/ATen/cuda/CUDASparse.h +++ b/aten/src/ATen/cuda/CUDASparse.h @@ -4,13 +4,26 @@ // cuSparse Generic API added in CUDA 10.1 // Windows support added in CUDA 11.0 -// ROCm is not enabled #if defined(CUDART_VERSION) && defined(CUSPARSE_VERSION) && ((CUSPARSE_VERSION >= 10300) || (CUSPARSE_VERSION >= 11000 && defined(_WIN32))) #define AT_USE_CUSPARSE_GENERIC_API() 1 #else #define AT_USE_CUSPARSE_GENERIC_API() 0 #endif +// hipSparse Generic API ROCm 5.2 +#if defined(USE_ROCM) && ROCM_VERSION >= 50200 +#define AT_USE_HIPSPARSE_GENERIC_52_API() 1 +#else +#define AT_USE_HIPSPARSE_GENERIC_52_API() 0 +#endif + +// hipSparse Generic API ROCm 5.1 +#if defined(USE_ROCM) && ROCM_VERSION >= 50100 +#define AT_USE_HIPSPARSE_GENERIC_API() 1 +#else +#define AT_USE_HIPSPARSE_GENERIC_API() 0 +#endif + // cuSparse Generic API spsv function was added in CUDA 11.3.0 #if defined(CUDART_VERSION) && defined(CUSPARSE_VERSION) && (CUSPARSE_VERSION >= 11500) #define AT_USE_CUSPARSE_GENERIC_SPSV() 1 diff --git a/aten/src/ATen/cuda/CUDASparseDescriptors.cpp b/aten/src/ATen/cuda/CUDASparseDescriptors.cpp index 3065babf89b6..6319e214ac98 100644 --- a/aten/src/ATen/cuda/CUDASparseDescriptors.cpp +++ b/aten/src/ATen/cuda/CUDASparseDescriptors.cpp @@ -9,7 +9,7 @@ namespace at { namespace cuda { namespace sparse { -#if AT_USE_CUSPARSE_GENERIC_API() +#if AT_USE_CUSPARSE_GENERIC_API() || AT_USE_HIPSPARSE_GENERIC_API() namespace { @@ -53,6 +53,7 @@ cusparseIndexType_t getCuSparseIndexType(const c10::ScalarType& scalar_type) { } } +#if AT_USE_HIPSPARSE_GENERIC_52_API() || AT_USE_CUSPARSE_GENERIC_API() CuSparseDnMatDescriptor::CuSparseDnMatDescriptor(const Tensor& input, int64_t batch_offset) { 
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(input.layout() == kStrided); IntArrayRef input_strides = input.strides(); @@ -105,6 +106,7 @@ CuSparseDnMatDescriptor::CuSparseDnMatDescriptor(const Tensor& input, int64_t ba descriptor_.reset(raw_descriptor); } +#endif // AT_USE_HIPSPARSE_GENERIC_52_API() || AT_USE_CUSPARSE_GENERIC_API() CuSparseDnVecDescriptor::CuSparseDnVecDescriptor(const Tensor& input) { // cuSPARSE doesn't support batched vectors @@ -175,7 +177,7 @@ CuSparseSpMatCsrDescriptor::CuSparseSpMatCsrDescriptor(const Tensor& input, int6 value_type // data type of values )); -#if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 +#if AT_USE_HIPSPARSE_GENERIC_52_API() || (defined(CUDA_VERSION) && CUDA_VERSION >= 11000) if (ndim == 3 && batch_offset == -1) { int batch_count = at::native::cuda_int_cast(at::native::batchCount(input), "batch_count"); @@ -204,7 +206,7 @@ CuSparseSpMatCsrDescriptor::CuSparseSpMatCsrDescriptor(const Tensor& input, int6 descriptor_.reset(raw_descriptor); } -#endif // AT_USE_CUSPARSE_GENERIC_API() +#endif // AT_USE_CUSPARSE_GENERIC_API() || AT_USE_HIPSPARSE_GENERIC_API() } // namespace sparse } // namespace cuda diff --git a/aten/src/ATen/cuda/CUDASparseDescriptors.h b/aten/src/ATen/cuda/CUDASparseDescriptors.h index 40078b65df64..60c9ff0ffa88 100644 --- a/aten/src/ATen/cuda/CUDASparseDescriptors.h +++ b/aten/src/ATen/cuda/CUDASparseDescriptors.h @@ -40,6 +40,11 @@ class CuSparseDescriptor { #if defined(USE_ROCM) // hipSPARSE doesn't define this using cusparseMatDescr = std::remove_pointer::type; +using cusparseDnMatDescr = std::remove_pointer::type; +using cusparseDnVecDescr = std::remove_pointer::type; +using cusparseSpMatDescr = std::remove_pointer::type; +using cusparseSpMatDescr = std::remove_pointer::type; +using cusparseSpGEMMDescr = std::remove_pointer::type; #if AT_USE_HIPSPARSE_TRIANGULAR_SOLVE() using bsrsv2Info = std::remove_pointer::type; using bsrsm2Info = std::remove_pointer::type; @@ -92,15 +97,17 @@ class TORCH_CUDA_CPP_API CuSparseBsrsm2Info #endif // AT_USE_HIPSPARSE_TRIANGULAR_SOLVE -#if AT_USE_CUSPARSE_GENERIC_API() +#if AT_USE_CUSPARSE_GENERIC_API() || AT_USE_HIPSPARSE_GENERIC_API() cusparseIndexType_t getCuSparseIndexType(const c10::ScalarType& scalar_type); +#if AT_USE_HIPSPARSE_GENERIC_52_API() || AT_USE_CUSPARSE_GENERIC_API() class TORCH_CUDA_CPP_API CuSparseDnMatDescriptor : public CuSparseDescriptor { public: explicit CuSparseDnMatDescriptor(const Tensor& input, int64_t batch_offset = -1); }; +#endif //AT_USE_HIPSPARSE_GENERIC_52_API() || AT_USE_CUSPARSE_GENERIC_API() class TORCH_CUDA_CPP_API CuSparseDnVecDescriptor : public CuSparseDescriptor { @@ -116,7 +123,7 @@ class TORCH_CUDA_CPP_API CuSparseSpMatCsrDescriptor public: explicit CuSparseSpMatCsrDescriptor(const Tensor& input, int64_t batch_offset = -1); -#if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 +#if defined(USE_ROCM) || (defined(CUDA_VERSION) && CUDA_VERSION >= 11000) std::tuple get_size() { int64_t rows, cols, nnz; TORCH_CUDASPARSE_CHECK(cusparseSpMatGetSize( @@ -190,7 +197,7 @@ class TORCH_CUDA_CPP_API CuSparseSpSMDescriptor }; #endif -#if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 +#if (defined(USE_ROCM) && ROCM_VERSION >= 50200) || (defined(CUDA_VERSION) && CUDA_VERSION >= 11000) class TORCH_CUDA_CPP_API CuSparseSpGEMMDescriptor : public CuSparseDescriptor { public: @@ -202,7 +209,7 @@ class TORCH_CUDA_CPP_API CuSparseSpGEMMDescriptor }; #endif -#endif // AT_USE_CUSPARSE_GENERIC_API() +#endif // AT_USE_CUSPARSE_GENERIC_API() || AT_USE_HIPSPARSE_GENERIC_API() } // namespace 
sparse } // namespace cuda diff --git a/aten/src/ATen/cuda/CublasHandlePool.cpp b/aten/src/ATen/cuda/CublasHandlePool.cpp index 08fa4e4904c9..b168c6bcdfcf 100644 --- a/aten/src/ATen/cuda/CublasHandlePool.cpp +++ b/aten/src/ATen/cuda/CublasHandlePool.cpp @@ -1,9 +1,19 @@ #include #include +#include + +#include + namespace at { namespace cuda { + namespace { +std::map, at::DataPtr>& cublas_handle_stream_to_workspace() { + static auto& instance = *new std::map, at::DataPtr>; + return instance; +} + void createCublasHandle(cublasHandle_t *handle) { TORCH_CUDABLAS_CHECK(cublasCreate(handle)); } @@ -25,6 +35,44 @@ using CuBlasPoolType = DeviceThreadHandlePoolallocate(getChosenWorkspaceSize()); +} + cublasHandle_t getCurrentCUDABlasHandle() { int device; AT_CUDA_CHECK(cudaGetDevice(&device)); @@ -47,6 +95,16 @@ cublasHandle_t getCurrentCUDABlasHandle() { auto handle = myPoolWindow->reserve(device); auto stream = c10::cuda::getCurrentCUDAStream(); TORCH_CUDABLAS_CHECK(cublasSetStream(handle, stream)); +#if !defined(USE_ROCM) && CUDA_VERSION >= 11000 + // cublasSetWorkspace not available on CUDA 10.2 + cudaStream_t _stream = stream; + auto key = std::make_tuple(static_cast(handle), static_cast(_stream)); + auto workspace_it = cublas_handle_stream_to_workspace().find(key); + if (workspace_it == cublas_handle_stream_to_workspace().end()) { + workspace_it = cublas_handle_stream_to_workspace().insert(workspace_it, {key, getNewWorkspace()}); + } + TORCH_CUDABLAS_CHECK(cublasSetWorkspace(handle, workspace_it->second.get(), getChosenWorkspaceSize())); +#endif #if defined(CUDA_VERSION) && CUDA_VERSION >= 11000 // On CUDA >= 11, and architecture >= Ampere, cuBLAS can use TF32 to speedup // FP32 data type calculations based on the value of the allow_tf32 flag. diff --git a/aten/src/ATen/cuda/PeerToPeerAccess.cpp b/aten/src/ATen/cuda/PeerToPeerAccess.cpp index 8d2e16776f9e..4c0e4f9c1f1d 100644 --- a/aten/src/ATen/cuda/PeerToPeerAccess.cpp +++ b/aten/src/ATen/cuda/PeerToPeerAccess.cpp @@ -1,10 +1,11 @@ #include + +#include #include #include #include #include -#include namespace at { namespace cuda { @@ -38,6 +39,12 @@ bool get_p2p_access(int dev, int dev_to_access) { dev_to_access, " is not a device"); TORCH_INTERNAL_ASSERT(num_devices_ >= 0, "p2p access cache not initialized"); +#ifdef USE_ROCM + bool needs_pool_specific_peer_access = false; +#else + bool needs_pool_specific_peer_access = CUDACachingAllocator::get()->needsPoolSpecificPeerAccess(); +#endif + auto &cache = p2pAccessEnabled_[dev * num_devices_ + dev_to_access]; if (cache != -1) { @@ -49,12 +56,30 @@ bool get_p2p_access(int dev, int dev_to_access) { int access = 0; C10_CUDA_CHECK(cudaDeviceCanAccessPeer(&access, dev, dev_to_access)); if (access) { - cudaError_t err = cudaDeviceEnablePeerAccess(dev_to_access, 0); - if (err == cudaErrorPeerAccessAlreadyEnabled) { - // ignore and clear the error if access was already enabled - cudaGetLastError(); + if (needs_pool_specific_peer_access) { +#if CUDA_VERSION >= 11040 + // Double-checks allocator backend hasn't changed, which would definitely be an error. + // cudaMallocAsync pools are unaffected by cudaDeviceEnablePeerAccess. + // We need pool-specific enablement. 
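The CublasHandlePool.cpp change above keys one workspace allocation off each (handle, stream) pair and reuses it on every subsequent call. The lookup-or-allocate pattern, reduced to plain standard C++ with illustrative names (workspace_cache, workspace_for); the real code hands the buffer to cublasSetWorkspace.

#include <cstddef>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

using WorkspaceKey = std::tuple<void*, void*>;  // (cublas handle, CUDA stream)

// Leaked singleton map, mirroring cublas_handle_stream_to_workspace(): the
// cached workspaces must survive static destruction ordering.
std::map<WorkspaceKey, std::unique_ptr<std::vector<char>>>& workspace_cache() {
  static auto& instance = *new std::map<WorkspaceKey, std::unique_ptr<std::vector<char>>>;
  return instance;
}

// Allocate lazily on first use for a given (handle, stream), then reuse.
void* workspace_for(void* handle, void* stream, size_t bytes) {
  auto key = std::make_tuple(handle, stream);
  auto it = workspace_cache().find(key);
  if (it == workspace_cache().end()) {
    it = workspace_cache().emplace_hint(
        it, key, std::make_unique<std::vector<char>>(bytes));
  }
  return it->second->data();
}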
See + // https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-2/ + cudaMemPool_t mempool; + C10_CUDA_CHECK(cudaDeviceGetDefaultMemPool(&mempool, dev_to_access)); + cudaMemAccessDesc desc = {}; + desc.location.type = cudaMemLocationTypeDevice; + desc.location.id = dev; + desc.flags = cudaMemAccessFlagsProtReadWrite; + C10_CUDA_CHECK(cudaMemPoolSetAccess(mempool, &desc, 1 /* numDescs */)); +#else + TORCH_INTERNAL_ASSERT(false); +#endif } else { - C10_CUDA_CHECK(err); + cudaError_t err = cudaDeviceEnablePeerAccess(dev_to_access, 0); + if (err == cudaErrorPeerAccessAlreadyEnabled) { + // ignore and clear the error if access was already enabled + cudaGetLastError(); + } else { + C10_CUDA_CHECK(err); + } } cache = 1; } else { diff --git a/aten/src/ATen/cuda/detail/CUDAHooks.cpp b/aten/src/ATen/cuda/detail/CUDAHooks.cpp index ea335180259e..25e4c2b44fa9 100644 --- a/aten/src/ATen/cuda/detail/CUDAHooks.cpp +++ b/aten/src/ATen/cuda/detail/CUDAHooks.cpp @@ -53,6 +53,20 @@ void set_magma_init_fn(void (*fn)()) { magma_init_fn = fn; } +// Sets the CUDA_MODULE_LOADING environment variable +// if it's not set by the user. +void maybe_set_cuda_module_loading(const std::string &def_value) { + auto value = std::getenv("CUDA_MODULE_LOADING"); + if (!value) { +#ifdef _WIN32 + auto env_var = "CUDA_MODULE_LOADING=" + def_value; + _putenv(env_var.c_str()); +#else + setenv("CUDA_MODULE_LOADING", def_value.c_str(), 1); +#endif + } +} + // NB: deleter is dynamic, because we need it to live in a separate // compilation unit (alt is to have another method in hooks, but // let's not if we don't need to!) @@ -62,12 +76,13 @@ void CUDAHooks::initCUDA() const { // have a chance to enable vitals. at::vitals::VitalsAPI.setVital("CUDA", "used", "true", /* force = */ true); + maybe_set_cuda_module_loading("LAZY"); const auto num_devices = c10::cuda::device_count_ensure_non_zero(); c10::cuda::CUDACachingAllocator::init(num_devices); at::cuda::detail::init_p2p_access_cache(num_devices); #if AT_MAGMA_ENABLED() - TORCH_INTERNAL_ASSERT(magma_init_fn != nullptr, "Cannot initilaize magma, init routine not set"); + TORCH_INTERNAL_ASSERT(magma_init_fn != nullptr, "Cannot initialize magma, init routine not set"); magma_init_fn(); #endif } diff --git a/aten/src/ATen/cuda/detail/KernelUtils.h b/aten/src/ATen/cuda/detail/KernelUtils.h index b36e78c9b9a6..5479f500a3e1 100644 --- a/aten/src/ATen/cuda/detail/KernelUtils.h +++ b/aten/src/ATen/cuda/detail/KernelUtils.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace at { namespace cuda { namespace detail { diff --git a/aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh b/aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh index e14680f88793..a9b67b41ac45 100644 --- a/aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh +++ b/aten/src/ATen/cuda/detail/PhiloxCudaStateRaw.cuh @@ -13,14 +13,14 @@ struct PhiloxCudaState { // Called if graph capture is not underway PhiloxCudaState(uint64_t seed, uint64_t offset) { - seed_ = seed; + seed_.val = seed; offset_.val = offset; } // Called if graph capture is underway - PhiloxCudaState(uint64_t seed, + PhiloxCudaState(int64_t* seed, int64_t* offset_extragraph, uint32_t offset_intragraph) { - seed_ = seed; + seed_.ptr = seed; offset_.ptr = offset_extragraph; offset_intragraph_ = offset_intragraph; captured_ = true; @@ -34,7 +34,7 @@ struct PhiloxCudaState { int64_t* ptr; }; - uint64_t seed_ = 0; + Payload seed_; Payload offset_; uint32_t offset_intragraph_ = 0; bool captured_ = false; diff --git 
a/aten/src/ATen/cuda/detail/UnpackRaw.cuh b/aten/src/ATen/cuda/detail/UnpackRaw.cuh index e6746fbe4fd0..f8fa4ebbf160 100644 --- a/aten/src/ATen/cuda/detail/UnpackRaw.cuh +++ b/aten/src/ATen/cuda/detail/UnpackRaw.cuh @@ -21,9 +21,9 @@ unpack(at::PhiloxCudaState arg) { // static_cast avoids "warning: invalid narrowing conversion from "long" to "unsigned long". // *(arg.offset_.ptr) is a broadcast load of a single int64_t to the entire kernel. // For most threads' reads it will hit in cache, so it shouldn't hurt performance. - return std::make_tuple(arg.seed_, static_cast(*(arg.offset_.ptr) + arg.offset_intragraph_)); + return std::make_tuple(static_cast(*arg.seed_.ptr), static_cast(*(arg.offset_.ptr) + arg.offset_intragraph_)); } else { - return std::make_tuple(arg.seed_, arg.offset_.val); + return std::make_tuple(arg.seed_.val, arg.offset_.val); } } diff --git a/aten/src/ATen/cuda/jiterator.h b/aten/src/ATen/cuda/jiterator.h index 41a6f719a9e3..ac2c4d7cecf3 100644 --- a/aten/src/ATen/cuda/jiterator.h +++ b/aten/src/ATen/cuda/jiterator.h @@ -33,7 +33,7 @@ TORCH_CUDA_CPP_API c10::SmallVector CompileAndLaunchKernel( const c10::SmallVector& tensors, const c10::SmallVector& extra_args, bool return_by_ref) { - TORCH_CHECK(false, "Jiterator is not supported on ROCm"); + TORCH_CHECK(false, "Jiterator is not supported"); } }} // namespace at::cuda diff --git a/aten/src/ATen/cuda/jiterator_impl.h b/aten/src/ATen/cuda/jiterator_impl.h index 7144b6d8eeaf..5ba251055ad2 100644 --- a/aten/src/ATen/cuda/jiterator_impl.h +++ b/aten/src/ATen/cuda/jiterator_impl.h @@ -27,6 +27,16 @@ namespace native { _(7) \ _(8) +#define AT_FOR_8_CASES_WITH_COMMA(_) \ + _(1) , \ + _(2) , \ + _(3) , \ + _(4) , \ + _(5) , \ + _(6) , \ + _(7) , \ + _(8) + c10::SmallVector get_extra_args_typenames(const c10::SmallVector& extra_args) { c10::SmallVector args_typenames(extra_args.size()); for (auto i = 0; i < extra_args.size(); ++i) { @@ -83,9 +93,9 @@ static std::unique_ptr> make_unique_offset_calculator( template struct OffsetCalculatorVariant { -#define DEFINE_CASE(index) std::unique_ptr>, +#define DEFINE_CASE(index) std::unique_ptr> using OffsetCalculatorTypes = c10::variant< - AT_FOR_8_CASES(DEFINE_CASE) + AT_FOR_8_CASES_WITH_COMMA(DEFINE_CASE) >; #undef DEFINE_CASE @@ -113,9 +123,9 @@ struct OffsetCalculatorVariant { struct ArrayVariant { // works for up to 8 input + 8 outputs -#define DEFINE_CASE(index) at::detail::Array, at::detail::Array, +#define DEFINE_CASE(index) at::detail::Array, at::detail::Array using ArrayTypes = c10::variant< - AT_FOR_8_CASES(DEFINE_CASE) + AT_FOR_8_CASES_WITH_COMMA(DEFINE_CASE) >; #undef DEFINE_CASE @@ -149,9 +159,9 @@ struct ArrayVariant { }; struct TrivialOffsetCalculatorVariant { -#define DEFINE_CASE(index) TrivialOffsetCalculator, +#define DEFINE_CASE(index) TrivialOffsetCalculator using TrivialOffsetCalculatorTypes = c10::variant< - AT_FOR_8_CASES(DEFINE_CASE) + AT_FOR_8_CASES_WITH_COMMA(DEFINE_CASE) >; #undef DEFINE_CASE @@ -177,9 +187,9 @@ struct TrivialOffsetCalculatorVariant { }; struct LoadWithCastVariant { -#define DEFINE_CASE(index) std::unique_ptr>, +#define DEFINE_CASE(index) std::unique_ptr> using LoadWithCastPtr = c10::variant< - AT_FOR_8_CASES(DEFINE_CASE) + AT_FOR_8_CASES_WITH_COMMA(DEFINE_CASE) >; #undef DEFINE_CASE @@ -206,9 +216,9 @@ struct LoadWithCastVariant { }; struct StoreWithCastVariant { -#define DEFINE_CASE(index) std::unique_ptr>, +#define DEFINE_CASE(index) std::unique_ptr> using StoreWithCastPtr = c10::variant< - AT_FOR_8_CASES(DEFINE_CASE) + 
AT_FOR_8_CASES_WITH_COMMA(DEFINE_CASE) >; #undef DEFINE_CASE diff --git a/aten/src/ATen/cuda/llvm_complex.cpp b/aten/src/ATen/cuda/llvm_complex.cpp index 55e39e280272..0bb2c2ba9a09 100644 --- a/aten/src/ATen/cuda/llvm_complex.cpp +++ b/aten/src/ATen/cuda/llvm_complex.cpp @@ -48,6 +48,10 @@ class complex void real(value_type __re) {__re_ = __re;} void imag(value_type __im) {__im_ = __im;} + constexpr operator bool() const { + return real() || imag(); + } + complex& operator= (const value_type& __re) {__re_ = __re; __im_ = value_type(); return *this;} complex& operator+=(const value_type& __re) {__re_ += __re; return *this;} @@ -106,6 +110,10 @@ class complex void real(value_type __re) {__re_ = __re;} void imag(value_type __im) {__im_ = __im;} + constexpr operator bool() const { + return real() || imag(); + } + complex& operator= (float __re) {__re_ = __re; __im_ = value_type(); return *this;} complex& operator+=(float __re) {__re_ += __re; return *this;} @@ -162,6 +170,10 @@ class complex void real(value_type __re) {__re_ = __re;} void imag(value_type __im) {__im_ = __im;} + constexpr operator bool() const { + return real() || imag(); + } + complex& operator= (double __re) {__re_ = __re; __im_ = value_type(); return *this;} complex& operator+=(double __re) {__re_ += __re; return *this;} @@ -482,7 +494,15 @@ inline constexpr bool operator&&(const complex<_Tp>& __x, const complex<_Tp>& __y) { - return (__x.real() || __x.imag()) && (__y.real() || __y.imag()); + return bool(__x) && bool(__y); +} + +template +inline constexpr +bool +operator||(const complex<_Tp>& __x, const complex<_Tp>& __y) +{ + return bool(__x) || bool(__y); } // 26.3.7 values: @@ -834,7 +854,7 @@ complex::type> pow(const complex<_Tp>& __x, const complex<_Up>& __y) { typedef complex::type> result_type; - return _VSTD::pow(result_type(__x), result_type(__y)); + return std::pow(result_type(__x), result_type(__y)); } template @@ -847,7 +867,7 @@ typename enable_if pow(const complex<_Tp>& __x, const _Up& __y) { typedef complex::type> result_type; - return _VSTD::pow(result_type(__x), result_type(__y)); + return std::pow(result_type(__x), result_type(__y)); } template @@ -860,7 +880,7 @@ typename enable_if pow(const _Tp& __x, const complex<_Up>& __y) { typedef complex::type> result_type; - return _VSTD::pow(result_type(__x), result_type(__y)); + return std::pow(result_type(__x), result_type(__y)); } // __sqr, computes pow(x, 2) diff --git a/aten/src/ATen/cudnn/Descriptors.cpp b/aten/src/ATen/cudnn/Descriptors.cpp index f954bbf5623a..0e739a49bb33 100644 --- a/aten/src/ATen/cudnn/Descriptors.cpp +++ b/aten/src/ATen/cudnn/Descriptors.cpp @@ -164,7 +164,7 @@ void FilterDescriptor::set(const at::Tensor &t, const at::MemoryFormat memory_fo filter_format = CUDNN_TENSOR_NHWC; break; default: - TORCH_INTERNAL_ASSERT(false, "unsurpported memory_format for cuDNN filters"); + TORCH_INTERNAL_ASSERT(false, "unsupported memory_format for cuDNN filters"); } set(getDataType(t), (int) dim, size, filter_format); } diff --git a/aten/src/ATen/cudnn/Descriptors.h b/aten/src/ATen/cudnn/Descriptors.h index a7bcb5eb72ea..e111987785cc 100644 --- a/aten/src/ATen/cudnn/Descriptors.h +++ b/aten/src/ATen/cudnn/Descriptors.h @@ -7,11 +7,17 @@ #include #include -#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { std::string cudnnTypeToString(cudnnDataType_t dtype); @@ -40,7 +46,8 @@ inline int dataSize(cudnnDataType_t dataType) // that the stride for dim i is the 
product of the sizes of dims // i+1 to the end. This stride is indeed uniquely determined. This // function modifies 'stride' in place so this invariant holds. -static inline void fixSizeOneDimStride(int dim, const int *size, int *stride, bool nhwc) { +template +static inline void fixSizeOneDimStride(int dim, const T *size, T *stride, bool nhwc) { int64_t z = 1; int index = 0; std::vector permutation(dim); @@ -144,7 +151,7 @@ class TORCH_CUDA_CPP_API TensorDescriptor : public Descriptor< void set(cudnnDataType_t dataType, IntArrayRef sizes, IntArrayRef strides, size_t pad, bool nhwc); void set(cudnnDataType_t dataType, int dim, int* size, int* stride, bool nhwc) { - fixSizeOneDimStride(dim, size, stride, nhwc); + fixSizeOneDimStride(dim, size, stride, nhwc); AT_CUDNN_CHECK(cudnnSetTensorNdDescriptor(mut_desc(), dataType, dim, size, stride)); } }; diff --git a/aten/src/ATen/cudnn/Utils.h b/aten/src/ATen/cudnn/Utils.h index 9552953e88ee..64c13c68aa21 100644 --- a/aten/src/ATen/cudnn/Utils.h +++ b/aten/src/ATen/cudnn/Utils.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/aten/src/ATen/detail/FunctionTraits.h b/aten/src/ATen/detail/FunctionTraits.h index aab7300b585f..f49a55e1326d 100644 --- a/aten/src/ATen/detail/FunctionTraits.h +++ b/aten/src/ATen/detail/FunctionTraits.h @@ -76,3 +76,27 @@ struct binary_function_traits { using arg1_t = typename traits::template arg<0>::type; using arg2_t = typename traits::template arg<1>::type; }; + + +// Traits for calling with c10::guts::invoke, where member_functions have a first argument of ClassType +template +struct invoke_traits : public function_traits{ +}; + +template +struct invoke_traits : public invoke_traits{ +}; + +template +struct invoke_traits : public invoke_traits{ +}; + +template +struct invoke_traits : + public function_traits { +}; + +template +struct invoke_traits : + public function_traits { +}; diff --git a/functorch/functorch/csrc/ADInterpreters.cpp b/aten/src/ATen/functorch/ADInterpreters.cpp similarity index 70% rename from functorch/functorch/csrc/ADInterpreters.cpp rename to aten/src/ATen/functorch/ADInterpreters.cpp index 6a269d7e5394..174949bbc3b4 100644 --- a/functorch/functorch/csrc/ADInterpreters.cpp +++ b/aten/src/ATen/functorch/ADInterpreters.cpp @@ -1,9 +1,12 @@ -#include -#include -#include +#include +#include +#include +#include namespace at { namespace functorch { +constexpr size_t default_bitset_size = 64; + static void checkForInvalidMutationOnCaptures( const c10::OperatorHandle& op, const torch::jit::Stack* stack, @@ -14,7 +17,7 @@ static void checkForInvalidMutationOnCaptures( auto args = torch::jit::last(stack, op.schema().arguments().size()); auto mutated_arg = unwrapIfDead(args[0].toTensor()); auto* wrapper = maybeGetTensorWrapper(mutated_arg); - if (wrapper && wrapper->level().has_value() && wrapper->level().value() == cur_level) { + if (wrapper && wrapper->level().has_value() && wrapper->level().value() == cur_level && !(wrapper->is_immutable())) { return; } TORCH_CHECK(false, @@ -25,20 +28,28 @@ static void checkForInvalidMutationOnCaptures( "as inputs."); } -static Tensor materializeGradWrappers(const Tensor& tensor, int64_t current_level) { +Tensor materializeGradWrappers(const Tensor& tensor, int64_t current_level) { if (!tensor.defined()) { return tensor; } auto* wrapper = maybeGetTensorWrapper(tensor); if (!wrapper) { - return makeTensorWrapper(tensor, current_level); + return makeTensorWrapper(tensor, current_level, /*is_immutable=*/true); } 
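Tensors lifted into the transform are now wrapped with an immutability flag, which is what the relaxed condition in checkForInvalidMutationOnCaptures keys on. The rule it enforces, as a minimal standalone sketch; WrapperSketch and mutation_allowed are illustrative stand-ins, not the real TensorWrapper API.

#include <cstdint>

// A wrapper auto-created while lifting a captured tensor is marked immutable;
// only wrappers created at the current level by the user's own computation may
// be mutated in place.
struct WrapperSketch {
  int64_t level;
  bool is_immutable;
};

bool mutation_allowed(const WrapperSketch* wrapper, int64_t cur_level) {
  return wrapper != nullptr
      && wrapper->level == cur_level
      && !wrapper->is_immutable;
}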
TORCH_INTERNAL_ASSERT(wrapper->level().value() <= current_level, "escaped?"); if (wrapper->level().value() == current_level) { TORCH_INTERNAL_ASSERT(tensor.defined()); return tensor; } - return makeTensorWrapper(tensor, current_level); + return makeTensorWrapper(tensor, current_level, /*is_immutable=*/true); +} + +Tensor GradInterpreterPtr::lift(const Tensor& tensor) const { + return materializeGradWrappers(tensor, level()); +} + +Tensor JvpInterpreterPtr::lift(const Tensor& tensor) const { + return materializeGradWrappers(tensor, level()); } static void autogradBasedTransformProcess( @@ -69,7 +80,8 @@ static void autogradBasedTransformSendToNext( int64_t current_level, TransformType transform_type, optional prev_grad_mode, - optional prev_fwd_grad_mode) { + optional prev_fwd_grad_mode, + bool grad_special_case) { if (transform_type == TransformType::Grad) { TORCH_INTERNAL_ASSERT(prev_grad_mode.has_value()); } @@ -91,14 +103,14 @@ static void autogradBasedTransformSendToNext( } return tensor; }; - auto wrap = [&](const Tensor& tensor) { + auto wrap = [&](const Tensor& tensor, bool is_immutable) { if (!tensor.defined()) { return tensor; } // if (c10::show_dispatch_trace_enabled()) { // std::cout << "wrap " << current_level << std::endl; // } - return makeTensorWrapper(tensor, current_level); + return makeTensorWrapper(tensor, current_level, is_immutable); }; // TODO: we only need to do the following (marked with !) on in-place functions @@ -113,11 +125,34 @@ static void autogradBasedTransformSendToNext( // Step 1 & 2 auto args_size = op.schema().arguments().size(); + const auto ret_size = op.schema().returns().size(); // Step 1 auto front = stack->size() - args_size; for (const auto arg_idx : c10::irange(0, args_size)) { stack->push_back((*stack)[front + arg_idx]); } + + std::bitset outputs_aliasing_immutable; // set = 1 for all bits + if(!grad_special_case) { + for (auto idx = stack->size() - args_size; idx < stack->size(); idx++) { + const auto ivalue = (*stack)[idx]; + if (!ivalue.isTensor()) { + continue; // only input that can be aliased is a tensor, not a tensor list (expect in ops without returns) + } + const auto tensor = ivalue.toTensor(); + auto* maybe_tensor_wrapper = maybeGetTensorWrapper(tensor); + if (!maybe_tensor_wrapper || maybe_tensor_wrapper->is_immutable()) { + // if the input is immutable, we find if it aliases anything, noting that + // args are in reverse order on stack, so the last arg is at the top of the stack + const auto relative_pos = idx - (stack->size() - args_size); + const auto aliased_out = findAliasedOutput(op.schema(), relative_pos); + if (aliased_out.has_value()) { + outputs_aliasing_immutable.flip(*aliased_out); // each output aliases at most one input, so we can only hit this once + } + } + } + } + // Step 2 foreachTensorInplace(*stack, stack->size() - args_size, stack->size(), unwrap); @@ -136,12 +171,13 @@ static void autogradBasedTransformSendToNext( if (getDynamicLayerStack().size() == 0) { sanityCheckStack(op, stack); } - op.callBoxed(stack); // Step 4, 5, 6 - auto ret_size = op.schema().returns().size(); + + op.callBoxed(stack); + // Step 4 - foreachTensorInplace(*stack, stack->size() - ret_size, stack->size(), wrap); + foreachTensorInplaceWithFlag(*stack, stack->size() - ret_size, stack->size(), outputs_aliasing_immutable, wrap); // Step 5 auto args_front = stack->size() - args_size - ret_size; @@ -169,10 +205,11 @@ void GradInterpreterPtr::processImpl( void GradInterpreterPtr::sendToNextInterpreterImpl( const c10::OperatorHandle& op, - 
torch::jit::Stack* stack) { + torch::jit::Stack* stack, + bool grad_special_case) { autogradBasedTransformSendToNext( op, stack, level(), - TransformType::Grad, prevGradMode(), nullopt); + TransformType::Grad, prevGradMode(), nullopt, grad_special_case); } void JvpInterpreterPtr::processImpl( @@ -183,10 +220,11 @@ void JvpInterpreterPtr::processImpl( void JvpInterpreterPtr::sendToNextInterpreterImpl( const c10::OperatorHandle& op, - torch::jit::Stack* stack) { + torch::jit::Stack* stack, + bool grad_special_case) { autogradBasedTransformSendToNext( op, stack, level(), - TransformType::Jvp, nullopt, prevFwdGradMode()); + TransformType::Jvp, nullopt, prevFwdGradMode(), grad_special_case); } }} // namespace at::functorch diff --git a/functorch/functorch/csrc/ADInterpreters.h b/aten/src/ATen/functorch/ADInterpreters.h similarity index 71% rename from functorch/functorch/csrc/ADInterpreters.h rename to aten/src/ATen/functorch/ADInterpreters.h index 6f79afc6144f..6ec1cca065d6 100644 --- a/functorch/functorch/csrc/ADInterpreters.h +++ b/aten/src/ATen/functorch/ADInterpreters.h @@ -1,30 +1,36 @@ #pragma once -#include +#include namespace at { namespace functorch { -struct GradInterpreterPtr { +// These are the interpreters for our AD transforms +// (grad, vjp and jvp). +// See NOTE: [functorch interpreter stack] for more details. + +struct TORCH_API GradInterpreterPtr { explicit GradInterpreterPtr(const Interpreter* base): base_(base) { TORCH_INTERNAL_ASSERT(base->key() == TransformType::Grad); } TransformType key() const { return base_->key(); } int64_t level() const { return base_->level(); } void processImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); - void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); + void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case); bool prevGradMode() const { return c10::get(base_->meta()).prevGradMode_; } + Tensor lift(const Tensor& tensor) const; private: const Interpreter* base_; }; -struct JvpInterpreterPtr { +struct TORCH_API JvpInterpreterPtr { explicit JvpInterpreterPtr(const Interpreter* base): base_(base) { TORCH_INTERNAL_ASSERT(base->key() == TransformType::Jvp); } TransformType key() const { return base_->key(); } int64_t level() const { return base_->level(); } void processImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); - void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); + void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case); bool prevFwdGradMode() const { return c10::get(base_->meta()).prevFwdGradMode_; } + Tensor lift(const Tensor& tensor) const; private: const Interpreter* base_; }; diff --git a/functorch/functorch/csrc/BatchRulesActivation.cpp b/aten/src/ATen/functorch/BatchRulesActivation.cpp similarity index 98% rename from functorch/functorch/csrc/BatchRulesActivation.cpp rename to aten/src/ATen/functorch/BatchRulesActivation.cpp index b761c70b1575..d96ab08a7e2f 100644 --- a/functorch/functorch/csrc/BatchRulesActivation.cpp +++ b/aten/src/ATen/functorch/BatchRulesActivation.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include +#include +#include #include // NB: most activation functions fit pointwise unary or binary rules. 
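Since most of these ops are pointwise, their batch rules all reduce to the same shape: move the vmap dimension to the front, call the underlying op, and report where the batch dimension ended up in the output. A sketch of that shape, assuming the helpers from BatchRulesHelper.h; my_relu_batch_rule is an illustrative name, not a rule added by this patch.

#include <ATen/ATen.h>
#include <ATen/functorch/BatchRulesHelper.h>

namespace at { namespace functorch {

// Pointwise unary rule: the op is layout-agnostic, so it is enough to
// normalize the batch dim to dim 0 and forward it to the output.
std::tuple<Tensor, c10::optional<int64_t>> my_relu_batch_rule(
    const Tensor& self, c10::optional<int64_t> self_bdim) {
  auto self_ = moveBatchDimToFront(self, self_bdim);
  auto out = at::relu(self_);
  return std::make_tuple(
      out, self_bdim.has_value() ? c10::optional<int64_t>(0) : c10::nullopt);
}

}} // namespace at::functorch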
@@ -216,7 +216,7 @@ std::tuple,Tensor,optional> prelu_backward_bat return std::make_tuple(std::get<0>(grads), 0, std::get<1>(grads), (weight_grad_is_batched ? optional(0) : nullopt)); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT(glu_backward, glu_backward_batch_rule); VMAP_SUPPORT(glu, glu_batch_rule); VMAP_SUPPORT(prelu, prelu_batch_rule) diff --git a/functorch/functorch/csrc/BatchRulesBinaryOps.cpp b/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp similarity index 90% rename from functorch/functorch/csrc/BatchRulesBinaryOps.cpp rename to aten/src/ATen/functorch/BatchRulesBinaryOps.cpp index afc3579eb22e..4e228afdfc61 100644 --- a/functorch/functorch/csrc/BatchRulesBinaryOps.cpp +++ b/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include +#include +#include #include #include @@ -53,7 +53,7 @@ struct BinaryRandomPointwiseBatchRuleHelper; template struct BinaryRandomPointwiseBatchRuleHelper> { static Tensor apply(const Tensor& tensor, const Tensor& other, T... extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); auto cur_level = maybe_layer->layerId(); RandomnessType randomness = maybe_layer->randomness(); @@ -268,6 +268,43 @@ std::tuple> cdist_backward_batch_rule( return std::make_tuple(out, out_bdim); } +void fill__Tensor_batch_rule( + Tensor& self, + optional self_bdim, + const Tensor& other, + optional other_bdim) { + if (!other_bdim.has_value()) { + // Optimization: fill_ is faster than the other path which does + // reshaping + copy_ + self.fill_(other); + return; + } + if (!self_bdim && other_bdim) { + vmapIncompatibleInplaceError("fill_"); + } + auto self_and_other = _binary_pointwise_helper( + self, self_bdim, other, other_bdim, /*do_type_promotion*/false); + std::get<0>(self_and_other).copy_(std::get<1>(self_and_other)); +} + +std::tuple> log_sigmoid_backward_batch_rule( + Tensor& grad, optional grad_bdim, + Tensor& self, optional self_bdim, + Tensor& buffer, optional buffer_bdim) { + // NB: This emulates handle_pointwise_ops except we ignore the last argument, buffer + // when any of the inputs are on cuda. 
+ // We do this because on cuda, buffer is a dummy tensor always of logical rank 1 and + // it becomes an issue when the rest of the inputs are scalar + int64_t out_logical_rank = std::max(rankWithoutBatchDim(grad, grad_bdim), rankWithoutBatchDim(self, self_bdim)); + if (!grad.is_cuda() && !self.is_cuda() && !buffer.is_cuda()) { + out_logical_rank = std::max(out_logical_rank, rankWithoutBatchDim(buffer, buffer_bdim)); + } + Tensor out_grad = maybePadToLogicalRank(moveBatchDimToFront(grad, grad_bdim), grad_bdim, out_logical_rank); + Tensor out_self = maybePadToLogicalRank(moveBatchDimToFront(self, self_bdim), self_bdim, out_logical_rank); + Tensor out_buffer = maybePadToLogicalRank(moveBatchDimToFront(buffer, buffer_bdim), buffer_bdim, out_logical_rank); + return std::make_tuple(at::log_sigmoid_backward(out_grad, out_self, out_buffer), 0); +} + Tensor binomial_wrapper(const Tensor& count, const Tensor& prob, c10::optional gen) { return at::binomial(count, prob.contiguous(), gen); // Bug in PyTorch, prob shouldn't need to be contiguous } @@ -282,7 +319,7 @@ TORCH_LIBRARY_IMPL(aten, FuncTorchVmapMode, m) { m.impl("binomial", BINARY_RANDOM_POINTWISE_BATCH_RULE(at::functorch::binomial_wrapper)); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { #define BINARY_POINTWISE2(op, overload) \ VMAP_SUPPORT2(op, overload, BINARY_POINTWISE_BATCH_RULE(ATEN_FN2(op, overload))); #define BINARY_POINTWISE(op) \ @@ -395,7 +432,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { // BINARY_POINTWISE(infinitely_differentiable_gelu_backward); BINARY_POINTWISE(leaky_relu_backward); BINARY_POINTWISE(logit_backward); - POINTWISE_BOXED(log_sigmoid_backward); + VMAP_SUPPORT(log_sigmoid_backward, log_sigmoid_backward_batch_rule); VMAP_SUPPORT(gelu_backward, gelu_backward_batch_rule); BINARY_POINTWISE(sigmoid_backward); POINTWISE_BOXED(softplus_backward); @@ -456,7 +493,9 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { #undef SINGLE_ARG #undef LOGICAL_COMPARISON_POINTWISE VMAP_SUPPORT(masked_select, masked_select_batch_rule); - VMAP_SUPPORT(masked_select_backward, masked_select_backward_batch_rule) + VMAP_SUPPORT(masked_select_backward, masked_select_backward_batch_rule); + + VMAP_SUPPORT2(fill_, Tensor, fill__Tensor_batch_rule); } }} diff --git a/functorch/functorch/csrc/BatchRulesConvolution.cpp b/aten/src/ATen/functorch/BatchRulesConvolution.cpp similarity index 82% rename from functorch/functorch/csrc/BatchRulesConvolution.cpp rename to aten/src/ATen/functorch/BatchRulesConvolution.cpp index 8382070283cd..79523ed1fb6d 100644 --- a/functorch/functorch/csrc/BatchRulesConvolution.cpp +++ b/aten/src/ATen/functorch/BatchRulesConvolution.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include +#include +#include #include namespace at { namespace functorch { @@ -17,7 +17,7 @@ namespace at { namespace functorch { // we do not support batch_group_count (which is needed for convolution backwards). // Instead, there's a convolution_backward op that needs a batching rule. 
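To make the convolution batching strategy above concrete: the simplest case is a batched input with an unbatched weight, where the vmap dimension is folded into the convolution's own N dimension, one real convolution runs, and the result is unfolded. A rough standalone sketch of that reshape trick, assuming a 2-D convolution (an illustration of the idea, not the convolution_batch_rule code itself):

#include <torch/torch.h>
#include <iostream>
#include <vector>

int main() {
  // Logical picture: vmap over B independent conv2d calls, each on an
  // [N, C, H, W] input, all sharing one unbatched weight [O, C, kH, kW].
  int64_t B = 5, N = 2, C = 3, O = 4, H = 8, W = 8;
  auto x = torch::randn({B, N, C, H, W});          // physical input, batch dim 0
  auto w = torch::randn({O, C, 3, 3});

  // Fold the vmap dim into N: [B, N, C, H, W] -> [(B*N), C, H, W]
  auto x_folded = x.reshape({B * N, C, H, W});
  auto out = at::conv2d(x_folded, w);              // one real convolution
  // Unfold: [(B*N), O, H', W'] -> [B, N, O, H', W']
  out = out.reshape({B, N, O, out.size(-2), out.size(-1)});

  // Reference: loop over the vmap dim one sample at a time.
  std::vector<torch::Tensor> per_sample;
  for (int64_t b = 0; b < B; ++b) {
    per_sample.push_back(at::conv2d(x[b], w));
  }
  auto ref = at::stack(per_sample);
  std::cout << std::boolalpha << torch::allclose(out, ref) << "\n";  // true
}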
std::tuple> -convolution_batch_rule(const Tensor& lhs, optional lhs_bdim, const Tensor& rhs, optional rhs_bdim, const optional& bias, optional bias_bdim, IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool transposed, IntArrayRef output_padding, int64_t groups) { +convolution_batch_rule(const Tensor& lhs, optional lhs_bdim, const Tensor& rhs, optional rhs_bdim, const optional& bias, optional bias_bdim, IntArrayRef stride, c10::SymIntArrayRef padding, IntArrayRef dilation, bool transposed, c10::SymIntArrayRef output_padding, int64_t groups) { DimVector lhs_spec(stride.size() + 2); std::iota(lhs_spec.begin(), lhs_spec.end(), 0); DimVector rhs_spec = lhs_spec; @@ -42,36 +42,68 @@ convolution_batch_rule(const Tensor& lhs, optional lhs_bdim, const Tens std::tuple> result; if (lhs_bdim && !rhs_bdim) { auto new_x = reshape_dim_into(*lhs_bdim, lhs_spec[0], lhs); - auto out = at::convolution(new_x, rhs, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); + auto out = at::convolution_symint(new_x, rhs, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); out = reshape_dim_outof(out_spec[0], lhs.sizes()[*lhs_bdim], out); result = std::make_tuple(out, out_spec[0]); } else if (!lhs_bdim && rhs_bdim) { if (groups == 1) { auto new_w = reshape_dim_into(*rhs_bdim, rhs_spec[0], rhs); - auto out = at::convolution(lhs, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); - out = reshape_dim_outof(out_spec[1], rhs.sizes()[*rhs_bdim], out); + auto out = at::convolution_symint(lhs, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); + out = reshape_dim_outof(out_spec[1], rhs.size(*rhs_bdim), out); result = std::make_tuple(out, out_spec[1]); } else { - auto dim_with_groups = transposed ? 
1 : 0; - auto new_w = reshape_dim_outof(rhs_spec[dim_with_groups] + (*rhs_bdim <= rhs_spec[0]), groups, rhs); - new_w = reshape_dim_into(*rhs_bdim + (rhs_spec[0] < rhs_bdim), rhs_spec[0] + 1, new_w); - new_w = reshape_dim_into(rhs_spec[0], rhs_spec[0], new_w); - auto out = at::convolution(lhs, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); - out = reshape_dim_outof(out_spec[1], groups, out); - out = reshape_dim_outof(out_spec[1] + 1, rhs.sizes()[*rhs_bdim], out); - out = reshape_dim_into(out_spec[1], out_spec[1] + 1, out); - result = std::make_tuple(out, out_spec[1]); + if (transposed) { + // conv_transpose with groups is normally NIHW, IOHW -> N(GO)HW + // With RHS batched, we do the following: + // NIHW, BIOHW -> NIHW, I(BO)HW -> N(GBO)HW -> BN(GO)HW + // NB: the following isn't written using rhs_spec + // (PyTorch convs have a fixed dimension order) + + // BIOHW -> I(BO)HW + auto new_w = reshape_dim_into(*rhs_bdim, 1, rhs); + // NIHW, I(BO)HW -> N(GBO)HW + auto out = at::convolution_symint(lhs, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); + // N(GBO)HW -> NG(BO)HW + out = reshape_dim_outof(1, groups, out); + // NG(BO)HW -> NGBOHW + out = reshape_dim_outof(2, rhs.size(*rhs_bdim), out); + // NGBOHW -> NB(GO)HW + out = reshape_dim_into(1, 2, out); + result = std::make_tuple(out, 1); + } else { + // conv with groups is normally N(GI)HW, (GO)IHW -> N(GO)HW + // With RHS batched, we do the following: + // N(GI)HW, B(GO)IHW -> N(GI)HW, (GBO)IHW -> N(GBO)HW -> BN(GO)HW + // NB: the following isn't written using rhs_spec + // (PyTorch convs have a fixed dimension order) + + // B(GO)IHW -> BGOIHW + auto new_w = reshape_dim_outof(0 + (*rhs_bdim == 0), groups, rhs); + // BGOIHW -> G(BO)IHW + new_w = reshape_dim_into(*rhs_bdim + (*rhs_bdim > 0), 1, new_w); + // G(BO)IHW -> (GBO)IHW + new_w = reshape_dim_into(0, 0, new_w); + // N(GI)HW, (GBO)IHW -> N(GBO)HW + auto out = at::convolution_symint(lhs, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); + // N(GBO)HW -> NG(BO)HW + out = reshape_dim_outof(1, groups, out); + // NG(BO)HW -> NGBOHW + out = reshape_dim_outof(2, rhs.size(*rhs_bdim), out); + // NGBOHW -> NB(GO)HW + out = reshape_dim_into(1, 2, out); + result = std::make_tuple(out, 1); + } } } else if (lhs_bdim && rhs_bdim) { auto new_x = reshape_dim_into(*lhs_bdim, lhs_spec[1], lhs); groups *= lhs.sizes()[*lhs_bdim]; auto dim_with_groups = transposed ? 
1 : 0; auto new_w = reshape_dim_into(*rhs_bdim, rhs_spec[dim_with_groups], rhs); - auto out = at::convolution(new_x, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); + auto out = at::convolution_symint(new_x, new_w, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups); out = reshape_dim_outof(out_spec[1], lhs.sizes()[*lhs_bdim], out); result = std::make_tuple(out, out_spec[1]); } else { - result = std::make_tuple(at::convolution(lhs, rhs, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups), nullopt); + result = std::make_tuple(at::convolution_symint(lhs, rhs, unbatched_bias, stride, padding, dilation, transposed, output_padding, groups), nullopt); } if (separate_bias) { auto A = std::get<0>(result); @@ -165,7 +197,7 @@ Tensor _convolution_decomp( // std::tie(weight_value, weight_bdim) = unwrapTensorAtLevel(weight, cur_level); // // if (self_bdim.has_value() && self_value.dim() == 5 && first_dim_has_size_1(self_value, *self_bdim) && grad_output_bdim.has_value() && !weight_bdim.has_value()) { -// c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); +// c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); // auto result = cudnn_conv_per_sample_grad_rule( // self_value, self_bdim, // grad_output_value, grad_output_bdim, @@ -212,8 +244,8 @@ convolution_backward_input_batch_rule( const Tensor& grad_output, optional grad_output_bdim, const Tensor& input, optional input_bdim, const Tensor& weight, optional weight_bdim, - IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool transposed, - IntArrayRef output_padding, int64_t groups) { + IntArrayRef stride, c10::SymIntArrayRef padding, IntArrayRef dilation, bool transposed, + c10::SymIntArrayRef output_padding, int64_t groups) { const std::array mask = {true, false, false}; if (grad_output_bdim && weight_bdim) { // regular: BNO, BOI -> N(BO), (BO)I -> N(BI) @@ -222,7 +254,7 @@ convolution_backward_input_batch_rule( const auto grad_output_ = reshape_dim_into(*grad_output_bdim, 1, grad_output); const auto weight_ = reshape_dim_into(*weight_bdim, 0, weight); auto dummy_input = make_dummy(input, input_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, dummy_input, weight_, nullopt, stride, padding, dilation, transposed, output_padding, groups * batch_size, mask); const auto grad_input = reshape_dim_outof(1, batch_size, std::get<0>(result)); @@ -233,7 +265,7 @@ convolution_backward_input_batch_rule( const auto batch_size = grad_output.size(*grad_output_bdim); const auto grad_output_ = reshape_dim_into(*grad_output_bdim, 0, grad_output); auto dummy_input = make_dummy(input, input_bdim, 0, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, dummy_input, weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); const auto grad_input = reshape_dim_outof(0, batch_size, std::get<0>(result)); @@ -246,7 +278,7 @@ convolution_backward_input_batch_rule( const auto in_ch_dim = transposed ? 
0 : 1; const auto weight_ = reshape_dim_into(*weight_bdim, in_ch_dim, weight); auto dummy_input = make_dummy(input, input_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, dummy_input, weight_, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); const auto grad_input = reshape_dim_outof(1, batch_size, std::get<0>(result)); @@ -257,7 +289,7 @@ convolution_backward_input_batch_rule( // N(GO), B(GO)I -> N(GO), (GO)(BI) -> N(GBI) const auto weight_ = reshape_dim_into(*weight_bdim, 1, weight); auto dummy_input = make_dummy(input, input_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, dummy_input, weight_, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); grad_input = std::get<0>(result); // N(GBI) @@ -268,7 +300,7 @@ convolution_backward_input_batch_rule( weight_ = weight_.transpose(0, 1); // GBIO weight_ = weight_.flatten(0, 2); // (GBI)O const auto dummy_input = make_dummy(input, input_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, dummy_input, weight_, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); grad_input = std::get<0>(result); // N(GBI) @@ -282,7 +314,7 @@ convolution_backward_input_batch_rule( } else { TORCH_INTERNAL_ASSERT(input_bdim); const auto dummy_input = make_dummy(input, input_bdim, 0, 1); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, dummy_input, weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); return std::make_tuple(std::get<0>(result), nullopt); @@ -293,8 +325,8 @@ convolution_backward_weight_batch_rule( const Tensor& grad_output, optional grad_output_bdim, const Tensor& input, optional input_bdim, const Tensor& weight, optional weight_bdim, - IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool transposed, - IntArrayRef output_padding, int64_t groups) { + IntArrayRef stride, c10::SymIntArrayRef padding, IntArrayRef dilation, bool transposed, + c10::SymIntArrayRef output_padding, int64_t groups) { const std::array mask = {false, true, false}; if (grad_output_bdim && input_bdim) { // BNO, BNI -> N(BO), N(BI) -> (BO)I (regular) (BI)O (transposed) @@ -302,7 +334,7 @@ convolution_backward_weight_batch_rule( const auto grad_output_ = reshape_dim_into(*grad_output_bdim, 1, grad_output); const auto input_ = reshape_dim_into(*input_bdim, 1, input); const auto dummy_weight = make_dummy(weight, weight_bdim, 0, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, input_, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups * batch_size, mask); auto grad_weight = std::get<1>(result); @@ -316,7 +348,7 @@ convolution_backward_weight_batch_rule( const auto grad_output_ = reshape_dim_into(*grad_output_bdim, 1, grad_output); const auto out_ch_dim = transposed ? 
1 : 0; const auto dummy_weight = make_dummy(weight, weight_bdim, out_ch_dim, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, input, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -330,7 +362,7 @@ convolution_backward_weight_batch_rule( if (!transposed) { // BN(GO), N(GI) -> N(GBO), N(GI) -> (GBO)I const auto dummy_weight = make_dummy(weight, weight_bdim, 0, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, input, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -341,7 +373,7 @@ convolution_backward_weight_batch_rule( } else { // BN(GO), N(GI) -> N(GBO), N(GI) -> (GI)(BO) const auto dummy_weight = make_dummy(weight, weight_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output_, input, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -357,7 +389,7 @@ convolution_backward_weight_batch_rule( const auto input_ = reshape_dim_into(*input_bdim, 1, input); const auto in_ch_dim = transposed ? 0 : 1; const auto dummy_weight = make_dummy(weight, weight_bdim, in_ch_dim, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, input_, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -371,7 +403,7 @@ convolution_backward_weight_batch_rule( if (!transposed) { // regular: N(GO), BN(GI) -> N(GO), N(GBI) -> (GO)(BI) const auto dummy_weight = make_dummy(weight, weight_bdim, 1, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, input_, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -380,7 +412,7 @@ convolution_backward_weight_batch_rule( } else { // transposed: N(GO), BN(GI) -> N(GO), N(GBI) -> (GBI)O const auto dummy_weight = make_dummy(weight, weight_bdim, 0, batch_size); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, input_, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); auto grad_weight = std::get<1>(result); @@ -393,7 +425,7 @@ convolution_backward_weight_batch_rule( } else { TORCH_INTERNAL_ASSERT(weight_bdim); const auto dummy_weight = make_dummy(weight, weight_bdim, 0, 1); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, input, dummy_weight, nullopt, stride, padding, dilation, transposed, output_padding, groups, mask); return std::make_tuple(std::get<1>(result), nullopt); @@ -403,16 +435,16 @@ convolution_backward_weight_batch_rule( std::tuple convolution_backward_plumbing( const Tensor& grad_output_, const Tensor& input_, const Tensor& weight_, - const c10::OptionalArrayRef bias_sizes_opt, - IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool transposed, - IntArrayRef output_padding, int64_t groups, std::array output_mask) { + const c10::OptionalArrayRef bias_sizes_opt, + IntArrayRef 
stride, c10::SymIntArrayRef padding, IntArrayRef dilation, bool transposed, + c10::SymIntArrayRef output_padding, int64_t groups, std::array output_mask) { const auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); if (!areAnyBatchedAtLevel({grad_output_, input_, weight_}, cur_level)){ - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); - return at::convolution_backward( + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); + return at::convolution_backward_symint( grad_output_, input_, weight_, bias_sizes_opt, stride, padding, dilation, transposed, output_padding, groups, output_mask); } @@ -448,14 +480,14 @@ std::tuple convolution_backward_plumbing( // BNO, BNI, BOI // AKA one of the model ensembling case if (grad_output_bdim && input_bdim && weight_bdim) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); grad_output = reshape_dim_into(*grad_output_bdim, 1, grad_output); // BNO, BNI, BOI -> N(BO), N(BI), (BO)I const auto batch_size = weight.size(*weight_bdim); input = reshape_dim_into(*input_bdim, 1, input); weight = reshape_dim_into(*weight_bdim, 0, weight); - const auto result = at::convolution_backward( + const auto result = at::convolution_backward_symint( grad_output, input, weight, nullopt, stride, padding, dilation, transposed, output_padding, batch_size * groups, output_mask); // N(BI), (BO)I -> NBI, BOI @@ -471,7 +503,7 @@ std::tuple convolution_backward_plumbing( Tensor grad_input; if (output_mask[0]) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto result = convolution_backward_input_batch_rule( grad_output, grad_output_bdim, input, input_bdim, @@ -482,7 +514,7 @@ std::tuple convolution_backward_plumbing( Tensor grad_weight; if (output_mask[1]) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto result = convolution_backward_weight_batch_rule( grad_output, grad_output_bdim, input, input_bdim, @@ -504,7 +536,7 @@ std::tuple convolution_backward_plumbing( } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT(convolution, convolution_batch_rule); m.impl("_convolution", _convolution_decomp); m.impl("convolution_backward", convolution_backward_plumbing); diff --git a/functorch/functorch/csrc/BatchRulesDecompositions.cpp b/aten/src/ATen/functorch/BatchRulesDecompositions.cpp similarity index 82% rename from functorch/functorch/csrc/BatchRulesDecompositions.cpp rename to aten/src/ATen/functorch/BatchRulesDecompositions.cpp index 46219f542ef1..13dedcfb879a 100644 --- a/functorch/functorch/csrc/BatchRulesDecompositions.cpp +++ b/aten/src/ATen/functorch/BatchRulesDecompositions.cpp @@ -8,24 +8,24 @@ #include #include #include -#include -#include -#include -#include +#include +#include +#include +#include namespace at { namespace functorch { #define OP_DECOMPOSE(op) m.impl(#op, static_cast(native::op)); #define OP_DECOMPOSE2(op, overload) m.impl(#op"."#overload, static_cast(native::op)); -TORCH_LIBRARY_IMPL(aten, FT_VMAP_MODE_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchVmapMode, m) { OP_DECOMPOSE(alpha_dropout_); OP_DECOMPOSE(dropout_); OP_DECOMPOSE(feature_alpha_dropout_); OP_DECOMPOSE(feature_dropout_); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { 
+TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { OP_DECOMPOSE2(__and__, Scalar); OP_DECOMPOSE2(__and__, Tensor); OP_DECOMPOSE2(__iand__, Tensor); @@ -44,8 +44,8 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE(avg_pool1d); OP_DECOMPOSE(adaptive_max_pool1d); OP_DECOMPOSE(adaptive_avg_pool1d); - OP_DECOMPOSE(adaptive_avg_pool2d); - OP_DECOMPOSE(adaptive_avg_pool3d); + m.impl("adaptive_avg_pool2d", native::adaptive_avg_pool2d_symint); + m.impl("adaptive_avg_pool3d", native::adaptive_avg_pool3d_symint); OP_DECOMPOSE(adjoint); OP_DECOMPOSE(arccos); OP_DECOMPOSE(arccosh); @@ -63,7 +63,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(bitwise_or, Scalar); OP_DECOMPOSE2(bitwise_xor, Scalar); OP_DECOMPOSE(broadcast_tensors); - OP_DECOMPOSE(broadcast_to); + m.impl("broadcast_to", native::broadcast_to_symint); OP_DECOMPOSE(cartesian_prod); OP_DECOMPOSE(cdist); OP_DECOMPOSE(clip); @@ -75,17 +75,16 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE(cosine_embedding_loss); OP_DECOMPOSE(cosine_similarity); OP_DECOMPOSE(cov); - OP_DECOMPOSE(cross_entropy_loss); + m.impl("cross_entropy_loss", native::cross_entropy_loss_symint); OP_DECOMPOSE2(cumulative_trapezoid, x); OP_DECOMPOSE2(cumulative_trapezoid, dx); OP_DECOMPOSE2(dsplit, int); OP_DECOMPOSE2(dsplit, array); OP_DECOMPOSE(det); - OP_DECOMPOSE(diag_backward); OP_DECOMPOSE(diff); OP_DECOMPOSE(dstack); OP_DECOMPOSE(einsum); - OP_DECOMPOSE(embedding_backward); + m.impl("embedding_backward", native::embedding_backward_symint); OP_DECOMPOSE(expand_as); OP_DECOMPOSE(fft_fft); OP_DECOMPOSE(fft_fftshift); @@ -126,13 +125,14 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(hsplit, int); OP_DECOMPOSE2(hsplit, array); OP_DECOMPOSE(hstack); - OP_DECOMPOSE(index_select_backward); + m.impl("index_select_backward", native::index_select_backward_symint); OP_DECOMPOSE(inner); OP_DECOMPOSE(inverse); + OP_DECOMPOSE(concatenate); OP_DECOMPOSE(instance_norm); OP_DECOMPOSE(kron); OP_DECOMPOSE(l1_loss); - OP_DECOMPOSE(layer_norm); + m.impl("layer_norm", native::layer_norm_symint); OP_DECOMPOSE2(ldexp, Tensor); OP_DECOMPOSE2(less_equal, Tensor ); OP_DECOMPOSE2(less, Tensor ); @@ -140,13 +140,16 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE(linalg_cholesky); OP_DECOMPOSE(linalg_det); OP_DECOMPOSE(linalg_eigvalsh); + OP_DECOMPOSE(linalg_eigvals); OP_DECOMPOSE(linalg_inv); OP_DECOMPOSE(linalg_matmul); OP_DECOMPOSE(linalg_matrix_norm); OP_DECOMPOSE2(linalg_matrix_norm, str_ord); OP_DECOMPOSE(linalg_multi_dot); OP_DECOMPOSE(linalg_norm); + OP_DECOMPOSE2(linalg_norm, ord_str); OP_DECOMPOSE(linalg_solve); + OP_DECOMPOSE(linalg_solve_ex); OP_DECOMPOSE(linalg_svd); OP_DECOMPOSE(linalg_svdvals); OP_DECOMPOSE(linalg_tensorinv); @@ -165,24 +168,25 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(movedim, int); OP_DECOMPOSE(msort); OP_DECOMPOSE(mT); - OP_DECOMPOSE(narrow); + m.impl("narrow", native::narrow_symint); OP_DECOMPOSE(negative); OP_DECOMPOSE2(frobenius_norm, dim); OP_DECOMPOSE2(nuclear_norm, dim); OP_DECOMPOSE(nuclear_norm); - OP_DECOMPOSE(nll_loss_nd); - OP_DECOMPOSE(nll_loss); - OP_DECOMPOSE(nll_loss2d); + m.impl("nll_loss_nd", native::nll_loss_nd_symint); + m.impl("nll_loss", native::nll_loss_symint); + m.impl("nll_loss2d", native::nll_loss2d_symint); OP_DECOMPOSE2(not_equal, Tensor ); OP_DECOMPOSE(outer); OP_DECOMPOSE(pairwise_distance); OP_DECOMPOSE(pinverse); OP_DECOMPOSE(poisson_nll_loss); + OP_DECOMPOSE(positive); OP_DECOMPOSE(qr); OP_DECOMPOSE(ravel); - OP_DECOMPOSE2(repeat_interleave, 
self_int); + m.impl("repeat_interleave.self_int", native::repeat_interleave_symint); OP_DECOMPOSE2(repeat_interleave, self_Tensor); - OP_DECOMPOSE(reshape); + m.impl("reshape", native::reshape_symint); OP_DECOMPOSE(resolve_conj); OP_DECOMPOSE(resolve_neg); OP_DECOMPOSE(row_stack); @@ -196,10 +200,11 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE(special_multigammaln); OP_DECOMPOSE(special_polygamma); OP_DECOMPOSE(special_softmax); - OP_DECOMPOSE2(split, sizes); + m.impl("split.sizes", native::split_symint); OP_DECOMPOSE(square); OP_DECOMPOSE(numpy_T); OP_DECOMPOSE(reshape_as); + OP_DECOMPOSE(slogdet); OP_DECOMPOSE(t); OP_DECOMPOSE2(result_type, Tensor); OP_DECOMPOSE2(result_type, Scalar); @@ -225,7 +230,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(trapezoid, dx); OP_DECOMPOSE2(trapz, x); OP_DECOMPOSE2(trapz, dx); - OP_DECOMPOSE(value_selecting_reduction_backward); + m.impl("value_selecting_reduction_backward", native::value_selecting_reduction_backward_symint); OP_DECOMPOSE(var); OP_DECOMPOSE2(var, dim); OP_DECOMPOSE(var_mean); @@ -237,7 +242,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(where, ScalarSelf); OP_DECOMPOSE(orgqr); OP_DECOMPOSE2(unflatten, int); - OP_DECOMPOSE(_convolution_double_backward); + m.impl("_convolution_double_backward", native::_convolution_double_backward); OP_DECOMPOSE(conv_transpose1d); OP_DECOMPOSE2(conv_transpose2d, input); OP_DECOMPOSE2(conv_transpose3d, input); @@ -248,14 +253,15 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { OP_DECOMPOSE2(conv2d, padding); OP_DECOMPOSE2(conv3d, padding); OP_DECOMPOSE(_convolution_mode); - OP_DECOMPOSE(frobenius_norm); OP_DECOMPOSE(type_as); OP_DECOMPOSE(linalg_diagonal); - OP_DECOMPOSE(pad); - OP_DECOMPOSE(_pad_circular); + OP_DECOMPOSE(diagonal_copy); + m.impl("pad", native::pad_symint); + m.impl("_pad_circular", native::_pad_circular_symint); OP_DECOMPOSE(t_); OP_DECOMPOSE(swapdims_); OP_DECOMPOSE(swapaxes_); + OP_DECOMPOSE(unfold_copy); // divide, alias for div OP_DECOMPOSE2(divide, Tensor); diff --git a/functorch/functorch/csrc/BatchRulesDynamic.cpp b/aten/src/ATen/functorch/BatchRulesDynamic.cpp similarity index 86% rename from functorch/functorch/csrc/BatchRulesDynamic.cpp rename to aten/src/ATen/functorch/BatchRulesDynamic.cpp index e752d96d168d..a85d7f18953f 100644 --- a/functorch/functorch/csrc/BatchRulesDynamic.cpp +++ b/aten/src/ATen/functorch/BatchRulesDynamic.cpp @@ -5,11 +5,15 @@ // LICENSE file in the root directory of this source tree. #include -#include -#include +#include +#include #include #include +// This file contains batching rules for operations that return Tensors of +// dynamic shape. We generally don't support those with vmap so we raise +// errors for them. 
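Why the ops below get errors rather than batch rules: their output shape depends on the tensor's values, so different elements of a vmap batch can produce differently-sized results, and there is no single physical tensor to stack them into. A small eager-mode demonstration of the underlying issue (plain ATen, no functorch involved):

#include <torch/torch.h>
#include <iostream>

int main() {
  // Two would-be "batch elements" with different numbers of nonzeros.
  auto a = torch::tensor({1, 0, 3});               // 2 nonzero entries
  auto b = torch::tensor({0, 0, 7});               // 1 nonzero entry

  std::cout << at::nonzero(a).sizes() << "\n";     // [2, 1]
  std::cout << at::nonzero(b).sizes() << "\n";     // [1, 1]

  // vmap would need to return one tensor with a leading batch dimension,
  // but the per-sample shapes disagree, which is exactly what the
  // UNSUPPORTED_DYNAMIC registrations guard against.
}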
+ namespace at { namespace functorch { @@ -57,10 +61,13 @@ void unsupportedAllclose(const c10::OperatorHandle& op, torch::jit::Stack* stack "support over at github.com/pytorch/functorch/issues/275"); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { UNSUPPORTED_DYNAMIC(nonzero); UNSUPPORTED_DYNAMIC(where); - UNSUPPORTED_DYNAMIC(unique); + UNSUPPORTED_DYNAMIC(unique_dim); + UNSUPPORTED_DYNAMIC(unique_consecutive); + UNSUPPORTED_DYNAMIC(unique_dim_consecutive); + UNSUPPORTED_DYNAMIC(_unique2); m.impl("_local_scalar_dense", torch::CppFunction::makeFromBoxedFunction<&unsupportedLocalScalarDense>()); m.impl("item", torch::CppFunction::makeFromBoxedFunction<&unsupportedItem>()); m.impl("is_nonzero", torch::CppFunction::makeFromBoxedFunction<&unsupportedIsNonzero>()); diff --git a/functorch/functorch/csrc/BatchRulesFactory.cpp b/aten/src/ATen/functorch/BatchRulesFactory.cpp similarity index 73% rename from functorch/functorch/csrc/BatchRulesFactory.cpp rename to aten/src/ATen/functorch/BatchRulesFactory.cpp index 3f63d27a0c8e..06d497959f42 100644 --- a/functorch/functorch/csrc/BatchRulesFactory.cpp +++ b/aten/src/ATen/functorch/BatchRulesFactory.cpp @@ -4,11 +4,30 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include "c10/core/SymIntArrayRef.h" +#include +#include namespace at { namespace functorch { +template +struct NewBlahBatchRuleHelperSymInt; + +template +struct NewBlahBatchRuleHelperSymInt> { + static std::tuple> apply( + const Tensor& tensor, + optional batch_dim, + SymIntArrayRef shape, + T... extra_args) { + const auto bdim_size = tensor.sym_size(batch_dim.value()); + c10::SmallVector new_shape; + new_shape.reserve(shape.size() + 1); + new_shape.emplace_back(bdim_size); + new_shape.insert(new_shape.end(), shape.begin(), shape.end()); + return std::make_tuple(Func(tensor, new_shape, std::forward(extra_args)...), 0); + } +}; + template struct NewBlahBatchRuleHelper; @@ -37,6 +56,12 @@ struct NewBlahBatchRuleHelper> { &fn,\ c10::guts::function_traits::parameter_types>::apply) +#define NEW_BLAH_BATCH_RULE_SYMINT(fn) SINGLE_ARG(\ + NewBlahBatchRuleHelperSymInt<\ + decltype(&fn),\ + &fn,\ + c10::guts::function_traits::parameter_types>::apply) + std::tuple> _new_zeros_with_same_feature_meta_batch_rule( const Tensor& self, optional self_bdim, const Tensor& other, optional other_bdim, @@ -82,18 +107,7 @@ bool _has_same_storage_numel_batch_rule(const Tensor& a, const Tensor& b) { return true; } -Tensor new_empty_symint_decomp( - const Tensor& self, - SymIntArrayRef size, - c10::optional dtype_opt, - c10::optional layout_opt, - c10::optional device_opt, - c10::optional pin_memory_opt - ) { - return self.new_empty(c10::asIntArrayRefSlow(size), dtype_opt, layout_opt, device_opt, pin_memory_opt); -} - -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { m.impl("_has_same_storage_numel", _has_same_storage_numel_batch_rule); VMAP_SUPPORT(ones_like, BASIC_UNARY_BATCH_RULE(ATEN_FN(ones_like))); VMAP_SUPPORT(zeros_like, BASIC_UNARY_BATCH_RULE(ATEN_FN(zeros_like))); @@ -101,11 +115,10 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { VMAP_SUPPORT(randn_like, BASIC_UNARY_BATCH_RULE(ATEN_FN(randn_like))); VMAP_SUPPORT(rand_like, BASIC_UNARY_BATCH_RULE(ATEN_FN(rand_like))); VMAP_SUPPORT(full_like, BASIC_UNARY_BATCH_RULE(ATEN_FN(full_like))); - VMAP_SUPPORT(new_empty, NEW_BLAH_BATCH_RULE(ATEN_FN(new_empty))); - 
m.impl("new_empty.SymInt", new_empty_symint_decomp); - VMAP_SUPPORT(new_zeros, NEW_BLAH_BATCH_RULE(ATEN_FN(new_zeros))); - VMAP_SUPPORT(new_ones, NEW_BLAH_BATCH_RULE(ATEN_FN(new_ones))); - VMAP_SUPPORT(new_full, NEW_BLAH_BATCH_RULE(ATEN_FN(new_full))); + VMAP_SUPPORT(new_empty, NEW_BLAH_BATCH_RULE_SYMINT(ATEN_FN(new_empty))); + VMAP_SUPPORT(new_zeros, NEW_BLAH_BATCH_RULE_SYMINT(ATEN_FN(new_zeros))); + VMAP_SUPPORT(new_ones, NEW_BLAH_BATCH_RULE_SYMINT(ATEN_FN(new_ones))); + VMAP_SUPPORT(new_full, NEW_BLAH_BATCH_RULE_SYMINT(ATEN_FN(new_full))); VMAP_SUPPORT(_new_zeros_with_same_feature_meta, _new_zeros_with_same_feature_meta_batch_rule); // Not sure how to add the ones with irregular args to the mix cleanly (i.e. randint takes an extra int parameter) } diff --git a/functorch/functorch/csrc/BatchRulesHelper.cpp b/aten/src/ATen/functorch/BatchRulesHelper.cpp similarity index 92% rename from functorch/functorch/csrc/BatchRulesHelper.cpp rename to aten/src/ATen/functorch/BatchRulesHelper.cpp index dfd690ac2168..136a23e17088 100644 --- a/functorch/functorch/csrc/BatchRulesHelper.cpp +++ b/aten/src/ATen/functorch/BatchRulesHelper.cpp @@ -4,8 +4,7 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include +#include #include namespace at { namespace functorch { @@ -122,6 +121,16 @@ Tensor reshape_dim_outof(int64_t src, int64_t size1, const Tensor& x) { return at::reshape(x, shape); } +Tensor reshape_dim_outof_symint(int64_t src, c10::SymInt size1, const Tensor& x) { + src = maybe_wrap_dim(src, x.dim()); + c10::SymDimVector shape(x.sym_sizes().begin(), x.sym_sizes().end()); + TORCH_INTERNAL_ASSERT(shape[src] % size1 == 0); + auto size2 = shape[src] / size1; + shape[src] = size1; + shape.insert(shape.begin() + src + 1, size2); + return at::reshape_symint(x, shape); +} + void vmapIncompatibleInplaceError(const char* schema_name) { TORCH_CHECK(false, "vmap: ", schema_name, "(self, *extra_args) is not possible because ", @@ -133,20 +142,6 @@ void vmapIncompatibleInplaceError(const char* schema_name) { "please file a bug report instead."); } -void run_jit_decomposition(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - const auto& schema = op.schema(); - // TODO: templatize based on op and keep static trace_exec - auto * trace_exec = torch::jit::GetDecompositionExecutor(schema); - trace_exec->run((*stack)); - if (stack->back().isTuple()) { - IValue tup = stack->back(); - stack->pop_back(); - for (const auto& elem: tup.toTuple()->elements()) { - stack->push_back(elem); - } - } -} - static void handleScalarTypePromotion(Tensor& logical_scalar_tensor, Tensor& second) { auto result_type = at::native::result_type(logical_scalar_tensor[0], second); if (logical_scalar_tensor.scalar_type() != result_type) { diff --git a/functorch/functorch/csrc/BatchRulesHelper.h b/aten/src/ATen/functorch/BatchRulesHelper.h similarity index 94% rename from functorch/functorch/csrc/BatchRulesHelper.h rename to aten/src/ATen/functorch/BatchRulesHelper.h index 552a38b20e20..219c01c89c56 100644 --- a/functorch/functorch/csrc/BatchRulesHelper.h +++ b/aten/src/ATen/functorch/BatchRulesHelper.h @@ -4,24 +4,26 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
-#include #include #include -#include - -#include -#include -#include -#include -#include -#include + +#include +#include +#include +#include +#include +#include #include -#include #include +// This file contains helper functions for batching rules. + namespace at { namespace functorch { -Tensor reshape_dim_into(int64_t src, int64_t dst, const Tensor& x); -Tensor reshape_dim_outof(int64_t src, int64_t size1, const Tensor& x); + +TORCH_API Tensor reshape_dim_into(int64_t src, int64_t dst, const Tensor& x); +TORCH_API Tensor reshape_dim_outof(int64_t src, int64_t size1, const Tensor& x); + +TORCH_API Tensor reshape_dim_outof_symint(int64_t src, c10::SymInt size1, const Tensor& x); Tensor moveBatchDimToFront(const Tensor& tensor, optional maybe_batch_dim); int64_t rankWithoutBatchDim(const Tensor& tensor, optional maybe_batch_dim); @@ -119,7 +121,7 @@ void boxed_tensor_inputs_batch_rule(const c10::OperatorHandle& op, torch::jit::S const auto num_returns = schema.returns().size(); const auto num_arguments = schema.arguments().size(); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -195,12 +197,6 @@ inline void handle_variadic_bdims(std::vector>()); -void run_jit_decomposition(const c10::OperatorHandle& op, torch::jit::Stack* stack); - -#define RUN_JIT_DECOMPOSITION(op) \ - m.impl(#op, torch::CppFunction::makeFromBoxedFunction<&run_jit_decomposition>()); - - using UnpackedBatchedTensor = std::tuple>; inline void find_and_unpack_tensors( @@ -243,7 +239,7 @@ inline void boxed_existing_bdim_all_batch_rule( const auto num_returns = schema.returns().size(); const auto num_arguments = schema.arguments().size(); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -299,7 +295,7 @@ inline void boxed_all_tensors_have_optional_bdim( const auto num_returns = schema.returns().size(); const auto num_arguments = schema.arguments().size(); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -390,7 +386,7 @@ struct ExistingBdimBatchRuleHelper> { T... extra_args) { auto self_ = reshape_dim_into(*self_bdim, 0, self); auto out = Func(self_, std::forward(extra_args)...); - return std::make_tuple(reshape_dim_outof(0, self.sizes()[*self_bdim], out), 0); + return std::make_tuple(reshape_dim_outof_symint(0, self.sym_sizes()[*self_bdim], out), 0); } }; diff --git a/functorch/functorch/csrc/BatchRulesLinearAlgebra.cpp b/aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp similarity index 59% rename from functorch/functorch/csrc/BatchRulesLinearAlgebra.cpp rename to aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp index 63efbec1caba..f26a4f79b146 100644 --- a/functorch/functorch/csrc/BatchRulesLinearAlgebra.cpp +++ b/aten/src/ATen/functorch/BatchRulesLinearAlgebra.cpp @@ -4,7 +4,7 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
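For readers unfamiliar with the reshape helpers used throughout these rules: reshape_dim_into(src, dst, x) folds dimension src into dimension dst (batch-major within the merged group), and reshape_dim_outof(src, size1, x) splits dimension src back into [size1, remaining]. Their shape semantics can be sketched with plain tensor ops (equivalent in effect for this example, not the actual helper implementations):

#include <torch/torch.h>
#include <iostream>

int main() {
  // Think of dim 0 as the vmap batch dim B, dims 1..2 as logical N, O.
  auto x = torch::randn({4, 6, 5});                // B=4, N=6, O=5

  // reshape_dim_into(0, 1, x): BNO -> N(BO), the batch dim is folded into
  // dim 1, batch-major inside the merged group.
  auto folded = x.movedim(0, 1).flatten(1, 2);     // [6, 20]
  std::cout << folded.sizes() << "\n";

  // reshape_dim_outof(1, 4, folded): N(BO) -> NBO, peel the batch dim back out.
  auto unfolded = folded.unflatten(1, {4, 5});     // [6, 4, 5]
  std::cout << unfolded.sizes() << "\n";

  // The round trip recovers the original tensor up to the dim permutation.
  std::cout << std::boolalpha
            << torch::equal(unfolded.movedim(1, 0), x) << "\n";  // true
}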
-#include +#include namespace at { namespace functorch { @@ -264,6 +264,59 @@ struct LinalgCheckMatrixBinaryRuleHelper> } }; +static void expect_at_least_rank( + const Tensor& tensor, + optional tensor_bdim, + int64_t expected_rank, + const char* name) { + auto rank = rankWithoutBatchDim(tensor, tensor_bdim); + TORCH_CHECK(rank >= expected_rank, + name, " should have at least ", expected_rank, " dimensions, but has ", + rank, " dimensions instead."); +} + +oneOutput linalg_lu_solve_batch_rule( + const Tensor& LU, optional LU_bdim, + const Tensor& pivots, optional pivots_bdim, + const Tensor& B, optional B_bdim, + bool left, bool adjoint) { + const auto LU_min_rank = 2; + const auto pivots_min_rank = 1; + const auto B_min_rank = 2; + + expect_at_least_rank(LU, LU_bdim, LU_min_rank, "LU"); + expect_at_least_rank(pivots, pivots_bdim, pivots_min_rank, "pivots"); + expect_at_least_rank(B, B_bdim, B_min_rank, "B"); + + auto LU_ = moveBatchDimToFront(LU, LU_bdim); + auto pivots_ = moveBatchDimToFront(pivots, pivots_bdim); + auto B_ = moveBatchDimToFront(B, B_bdim); + + // LU and pivots's first {N-2} (for LU), {N-1} (for pivots) dimensions must match + // So if only one of them is being vmapped over, we must expand out that dimension. + if (LU_bdim.has_value() ^ pivots_bdim.has_value()) { + auto bdim_size = get_bdim_size2(LU, LU_bdim, pivots, pivots_bdim); + LU_ = ensure_has_bdim(LU_, LU_bdim.has_value(), bdim_size); + pivots_ = ensure_has_bdim(pivots_, pivots_bdim.has_value(), bdim_size); + pivots_bdim = 0; + LU_bdim = 0; + } + + // Now, {LU, pivots} and B's first dimensions are allowed to broadcast. + // The rest of the logic handles that. + const auto LU_num_batch_dims = rankWithoutBatchDim(LU_, LU_bdim) - LU_min_rank; + const auto pivots_num_batch_dims = rankWithoutBatchDim(pivots_, pivots_bdim) - pivots_min_rank; + const auto B_num_batch_dims = rankWithoutBatchDim(B_, B_bdim) - B_min_rank; + const auto max_num_batch_dims = std::max(std::max(LU_num_batch_dims, pivots_num_batch_dims), B_num_batch_dims); + + LU_ = maybePadToLogicalRank(LU_, LU_bdim, max_num_batch_dims + LU_min_rank); + pivots_ = maybePadToLogicalRank(pivots_, pivots_bdim, max_num_batch_dims + pivots_min_rank); + B_ = maybePadToLogicalRank(B_, B_bdim, max_num_batch_dims + B_min_rank); + + const auto result = at::linalg_lu_solve(LU_, pivots_, B_, left, adjoint); + return std::make_tuple(result, 0); +} + oneOutput cholesky_solve_batch_rule( const Tensor& self, c10::optional self_bdim, const Tensor& A, c10::optional A_bdim, @@ -293,6 +346,151 @@ oneOutput matrix_exp_batch_rule(const Tensor& self, c10::optional self_ return std::make_tuple(at::matrix_exp(self_), 0); } +fourOutputs solve_ex_batch_rule( + const Tensor& A, optional A_bdim, + const Tensor& B, optional B_bdim, + bool left, bool check_errors) { + auto batch_size = get_bdim_size2(A, A_bdim, B, B_bdim); + const auto A_logical_rank = rankWithoutBatchDim(A, A_bdim); + const auto B_logical_rank = rankWithoutBatchDim(B, B_bdim); + const auto max_logical_rank = std::max(A_logical_rank, B_logical_rank); + + TORCH_CHECK(A_logical_rank >= 2, + "linalg.solve: The input tensor A must have at least 2 dimensions."); + + int b_logical_rank = max_logical_rank; + if (A_logical_rank > B_logical_rank) { // vector case: B was a vector or batched vector + // not accurate but matches linalg error message + TORCH_CHECK(B_logical_rank >= 1, "linalg.solve: The input tensor B must have at least 2 dimensions."); + b_logical_rank = max_logical_rank - 1; + } else { // matrix case: A and B are both 
matrices or batches of matrices + TORCH_CHECK(B_logical_rank >= 2, "linalg.solve: The input tensor B must have at least 2 dimensions."); + } + + // basically binary pointwise helper but if B was a vector incoming, we must pad it to be 1 dim smaller than A + auto A_ = moveBatchDimToFront(A, A_bdim); + auto B_ = moveBatchDimToFront(B, B_bdim); + A_ = maybePadToLogicalRank(A_, A_bdim, max_logical_rank); + B_ = maybePadToLogicalRank(B_, B_bdim, b_logical_rank); + + A_ = ensure_has_bdim(A_, A_bdim.has_value(), batch_size); + B_ = ensure_has_bdim(B_, B_bdim.has_value(), batch_size); + + // NOTE [ solve_ex Batch Rule Contiguity ] + // A determines whether or not linalg_solve takes an optimized path. We need the check on A_ to match the one run on + // A as BatchedTensor since it might have been saved by autograd (specifically by the jvp) and the autograd behvaior + // differs based on whether or not the optimized path was taken + const auto batched_A_was_contiguous = A_bdim.has_value() ? at::select(A, *A_bdim, 0).is_contiguous() : A.is_contiguous(); + if (batched_A_was_contiguous && !A.is_complex()) { + A_ = A_.contiguous(); + } + const auto res = _linalg_solve_ex(A_, B_, left, check_errors); + return std::make_tuple(std::get<0>(res), 0, std::get<1>(res), 0, std::get<2>(res), 0, std::get<3>(res), 0); +} + +oneOutput cross_batch_rule(const Tensor& self, c10::optional self_bdim, + const Tensor& other, c10::optional other_bdim, const int64_t dim) { + // match cross dimension checks + TORCH_CHECK(rankWithoutBatchDim(self, self_bdim) == rankWithoutBatchDim(other, other_bdim), + "linalg.cross: inputs must have the same number of dimensions." + ); + + const auto batch_size = get_bdim_size2(self, self_bdim, other, other_bdim); + const auto self_other_bundled = _binary_pointwise_helper(self, self_bdim, other, other_bdim, false); + + const auto self_ = ensure_has_bdim(std::get<0>(self_other_bundled), self_bdim.has_value(), batch_size); + const auto other_ = ensure_has_bdim(std::get<1>(self_other_bundled), other_bdim.has_value(), batch_size); + + const auto dim_ = getPhysicalDim(self_, true, dim); + + return std::make_tuple(linalg_cross(self_, other_, dim_), 0); +} + +c10::optional batch_dim_if_not_empty(const Tensor& t) { + if (t.dim() == 1 && t.size(0) == 0) { + return c10::optional(); + } + return c10::optional(0); +} + +fourOutputs linalg_lstsq_batch_rule( + const Tensor& self, c10::optional self_bdim, const Tensor& b, c10::optional b_bdim, + c10::optional rcond, c10::optional driver) { + TORCH_CHECK(rankWithoutBatchDim(self, self_bdim) >= 2, "torch.linalg.lstsq: input must have at least 2 dimensions."); + TORCH_CHECK(rankWithoutBatchDim(b, b_bdim) >= 1, "torch.linalg.lstsq: other must have at least 1 dimension."); + + const auto batch_size = get_bdim_size2(self, self_bdim, b, b_bdim); + const auto tensor_other = _binary_pointwise_helper(self, self_bdim, b, b_bdim, /*do_type_promotion=*/false); + + // because of ambiguity with vector case, lstsq can broadcast [1, 2] -> [batch_size, 2] but not [2] -> [batch_size, 2] + // so could unsqueeze if there's no bdim or just ensure_has_bdim + const auto self_ = ensure_has_bdim(std::get<0>(tensor_other), self_bdim.has_value(), batch_size); + const auto b_ = ensure_has_bdim(std::get<1>(tensor_other), b_bdim.has_value(), batch_size); + + Tensor res, res_1, res_2, res_3; + std::tie(res, res_1, res_2, res_3) = at::linalg_lstsq(self_, b_, rcond, driver); + + // everything but the 0th output are only sometimes computed. 
When they aren't, they're empty tensors without a bdim + const auto res_1_bdim = batch_dim_if_not_empty(res_1); + const auto res_2_bdim = batch_dim_if_not_empty(res_2); + const auto res_3_bdim = batch_dim_if_not_empty(res_3); + return std::make_tuple(res, 0, res_1, res_1_bdim, res_2, res_2_bdim, res_3, res_3_bdim); +} + +template +std::tuple> +atol_rtol_tensor_batch_rule( + F Func, const Tensor& input, optional input_bdim, + const optional& atol, const optional atol_bdim, + const optional& rtol, const optional rtol_bdim, bool hermitian, char const *op_name) { + auto input_logical_rank = rankWithoutBatchDim(input, input_bdim); + + TORCH_CHECK(input_logical_rank >= 2, + op_name, ": The input tensor input must have at least 2 dimensions."); + + // atol and rtol's dims must be broadcastable to the number of batch dims of input + // which is input's dim - 2 (input represents a batch of matrices, so 2 is for the matrix dimensions) + const auto input_logical_num_bdims = input_logical_rank - 2; + const int64_t atol_logical_num_bdims = atol.has_value() ? rankWithoutBatchDim(*atol, atol_bdim) : 0; + const int64_t rtol_logical_num_bdims = rtol.has_value() ? rankWithoutBatchDim(*rtol, rtol_bdim) : 0; + const auto max_logical_bdims = std::max({input_logical_num_bdims, atol_logical_num_bdims, rtol_logical_num_bdims}); + + auto input_ = moveBatchDimToFront(input, input_bdim); + auto atol_ = atol.has_value() ? moveBatchDimToFront(*atol, atol_bdim) : atol; + auto rtol_ = rtol.has_value() ? moveBatchDimToFront(*rtol, rtol_bdim) : rtol; + + // pad all inputs to have the same number of (non-vmap) batch dimensions + input_ = maybePadToLogicalRank(input_, input_bdim, max_logical_bdims + 2); + atol_ = atol_.has_value() ? maybePadToLogicalRank(*atol_, atol_bdim, max_logical_bdims) : atol_; + rtol_ = rtol_.has_value() ? maybePadToLogicalRank(*rtol_, rtol_bdim, max_logical_bdims) : rtol_; + + return std::make_tuple(Func(input_, atol_, rtol_, hermitian), 0); +} + +std::tuple> +matrix_rank_atol_rtol_tensor_batch_rule( + const Tensor& input, c10::optional input_bdim, const optional& atol, + const c10::optional atol_bdim, const optional& rtol, + const c10::optional rtol_bdim, bool hermitian) { + return atol_rtol_tensor_batch_rule(ATEN_FN2(linalg_matrix_rank, atol_rtol_tensor), input, input_bdim, atol, atol_bdim, rtol, rtol_bdim, hermitian, "torch.linalg.matrix_rank"); +} + +std::tuple> +pinv_batch_rule( + const Tensor& input, c10::optional input_bdim, const optional& atol, + const c10::optional atol_bdim, const optional& rtol, + const c10::optional rtol_bdim, bool hermitian) { + return atol_rtol_tensor_batch_rule(ATEN_FN2(linalg_pinv, atol_rtol_tensor), input, input_bdim, atol, atol_bdim, rtol, rtol_bdim, hermitian, "linalg.pinv"); +} + +std::tuple> +matrix_rank_atol_rtol_float_batch_rule( + const Tensor& input, optional input_bdim, optional atol, optional rtol, bool hermitian) { + TORCH_CHECK(rankWithoutBatchDim(input, input_bdim) >= 2, + "torch.linalg.matrix_rank: The input tensor input must have at least 2 dimensions."); + return std::make_tuple(linalg_matrix_rank(moveBatchDimToFront(input, input_bdim), atol, rtol, hermitian), 0); +} + #define LINALG_CHECK_MATRIX_UNARY_BATCH_RULE(fn, num_out) SINGLE_ARG(\ LinalgCheckMatrixUnaryRuleHelper<\ func_string_##fn,\ @@ -317,51 +515,65 @@ oneOutput matrix_exp_batch_rule(const Tensor& self, c10::optional self_ // Define string constants with the function names. 
These will be used as template parameters // C++ doesn't let us use string literals as template parameters, so we have to declare them as consts first +// What is going on with these macros? +// - clang-5 seems to require the constexpr +// - windows compiles with or without the constexpr, but the constexpr causes test problems +// - as a result we have some macro guards. +#if defined(_MSC_VER) #define LINALG_STRING_CONST(fn, op_name) \ const char func_string_##fn[] = #op_name;\ #define LINALG_STRING_CONST2(fn, overload, op_name) \ const char func_string_##fn_##overload[] = #op_name;\ +#else +#define LINALG_STRING_CONST(fn, op_name) \ + constexpr const char func_string_##fn[] = #op_name;\ + +#define LINALG_STRING_CONST2(fn, overload, op_name) \ + constexpr const char func_string_##fn_##overload[] = #op_name;\ + +#endif + #define LINALG_CHECK_MATRIX_UNARY_ONE_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_UNARY_BATCH_RULE(fn, one));\ } #define LINALG_CHECK_MATRIX_UNARY_ONE_OUT2(fn, overload, op_name) \ LINALG_STRING_CONST2(fn, overload, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT2(fn, overload, LINALG_CHECK_MATRIX_UNARY_BATCH_RULE2(fn, overload, one));\ } #define LINALG_CHECK_MATRIX_UNARY_TWO_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_UNARY_BATCH_RULE(fn, two));\ } #define LINALG_CHECK_MATRIX_UNARY_THREE_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_UNARY_BATCH_RULE(fn, three));\ } #define LINALG_CHECK_MATRIX_UNARY_FOUR_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_UNARY_BATCH_RULE(fn, four));\ } #define LINALG_CHECK_MATRIX_BINARY_ONE_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_BINARY_BATCH_RULE(fn, one));\ } #define LINALG_CHECK_MATRIX_BINARY_TWO_OUT(fn, op_name) \ LINALG_STRING_CONST(fn, op_name);\ - TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) {\ + TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) {\ VMAP_SUPPORT(fn, LINALG_CHECK_MATRIX_BINARY_BATCH_RULE(fn, two));\ } @@ -370,7 +582,6 @@ LINALG_CHECK_MATRIX_UNARY_ONE_OUT(cholesky, cholesky); LINALG_CHECK_MATRIX_UNARY_ONE_OUT(cholesky_inverse, cholesky_inverse); LINALG_CHECK_MATRIX_UNARY_TWO_OUT(linalg_cholesky_ex, linalg.cholesky); LINALG_CHECK_MATRIX_UNARY_TWO_OUT(linalg_eig, linalg.eig); -LINALG_CHECK_MATRIX_UNARY_ONE_OUT(linalg_eigvals, linalg.eigvals); LINALG_CHECK_MATRIX_UNARY_TWO_OUT(linalg_inv_ex, linalg.inv_ex); LINALG_CHECK_MATRIX_UNARY_THREE_OUT(linalg_ldl_factor_ex, torch.linalg.ldl_factor_ex); LINALG_CHECK_MATRIX_UNARY_ONE_OUT(linalg_matrix_power, linalg.matrix_power); @@ -389,7 +600,7 @@ LINALG_CHECK_MATRIX_UNARY_TWO_OUT(_linalg_eigh, linalg.eigh); LINALG_CHECK_MATRIX_UNARY_FOUR_OUT(_linalg_slogdet, linalg.slogdet); LINALG_CHECK_MATRIX_UNARY_THREE_OUT(_linalg_svd, linalg.svd); -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { 
+TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT(bmm, bmm_batch_rule); m.impl("addmv", addmv_decomp); m.impl("addmm", addmm_decomp); @@ -399,10 +610,17 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { VMAP_SUPPORT(mv, mv_batch_rule); VMAP_SUPPORT(mm, mm_batch_rule); m.impl("linear", linear_decomp); + VMAP_SUPPORT(linalg_lu_solve, linalg_lu_solve_batch_rule); VMAP_SUPPORT(linalg_householder_product, householder_product_batch_rule); VMAP_SUPPORT(cholesky_solve, cholesky_solve_batch_rule); // custom dim error + VMAP_SUPPORT(linalg_lstsq, linalg_lstsq_batch_rule); // custom errors and sometimes empty return VMAP_SUPPORT(linalg_lu_factor_ex, linalg_lu_factor_ex_batch_rule); VMAP_SUPPORT(linalg_matrix_exp, matrix_exp_batch_rule); + VMAP_SUPPORT(_linalg_solve_ex, solve_ex_batch_rule); + VMAP_SUPPORT(linalg_cross, cross_batch_rule); + VMAP_SUPPORT2(linalg_matrix_rank, atol_rtol_tensor, matrix_rank_atol_rtol_tensor_batch_rule); + VMAP_SUPPORT2(linalg_matrix_rank, atol_rtol_float, matrix_rank_atol_rtol_float_batch_rule); + VMAP_SUPPORT2(linalg_pinv, atol_rtol_tensor, pinv_batch_rule); VMAP_SUPPORT(_linalg_check_errors, _linalg_check_errors_batch_rule); } diff --git a/functorch/functorch/csrc/BatchRulesLoss.cpp b/aten/src/ATen/functorch/BatchRulesLoss.cpp similarity index 94% rename from functorch/functorch/csrc/BatchRulesLoss.cpp rename to aten/src/ATen/functorch/BatchRulesLoss.cpp index 16ee2fb7e9c1..66c2b7fb3194 100644 --- a/functorch/functorch/csrc/BatchRulesLoss.cpp +++ b/aten/src/ATen/functorch/BatchRulesLoss.cpp @@ -4,9 +4,9 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include -#include +#include +#include +#include #include namespace at { namespace functorch { @@ -64,7 +64,7 @@ Tensor binary_cross_entropy_plumbing( if (!isBatchedAtLevel(self, cur_level) && !isBatchedAtLevel(target, cur_level) && !isBatchedAtLevel(weight, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::binary_cross_entropy(self, target, weight, reduction); } @@ -77,7 +77,7 @@ Tensor binary_cross_entropy_plumbing( Tensor result; if (self_bdim || target_bdim) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto bdim_size = get_bdim_size2(self_value, self_bdim, target_value, target_bdim); auto self_ = moveBatchDimToFront(self_value, self_bdim); auto target_ = moveBatchDimToFront(target_value, target_bdim); @@ -86,7 +86,7 @@ Tensor binary_cross_entropy_plumbing( result = at::binary_cross_entropy(self_, target_, nullopt, Reduction::None); result = makeBatched(result, 0, cur_level); } else { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); result = at::binary_cross_entropy(self_value, target_value, nullopt, Reduction::None); } if (weight.has_value() && weight->defined()) { @@ -103,7 +103,7 @@ Tensor binary_cross_entropy_backward_plumbing( int64_t cur_level = maybe_layer->layerId(); if (!areAnyBatchedAtLevel({grad, input, target, weight_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::binary_cross_entropy_backward(grad, input, target, weight_opt, reduction); } @@ -120,7 +120,7 @@ Tensor binary_cross_entropy_backward_plumbing( 
Tensor grad_input; if (grad_bdim || input_bdim || target_bdim) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto bdim_size = get_bdim_size3( grad_value, grad_bdim, input_value, input_bdim, target_value, target_bdim); @@ -136,7 +136,7 @@ Tensor binary_cross_entropy_backward_plumbing( grad_, input_, target_, nullopt, Reduction::None); grad_input = makeBatched(grad_input, 0, cur_level); } else { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); grad_input = at::binary_cross_entropy_backward( grad_value, input_value, target_value, nullopt, Reduction::None); } @@ -276,7 +276,7 @@ at::Tensor nll_loss_backward_decomposition( return grad_input * grad_output_; } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { m.impl("nll_loss_forward", nll_loss_forward_decomposition); m.impl("nll_loss2d_forward", nll_loss_forward_decomposition); m.impl("nll_loss_backward", nll_loss_backward_decomposition); diff --git a/functorch/functorch/csrc/BatchRulesModules.cpp b/aten/src/ATen/functorch/BatchRulesModules.cpp similarity index 89% rename from functorch/functorch/csrc/BatchRulesModules.cpp rename to aten/src/ATen/functorch/BatchRulesModules.cpp index 3d54ba5d0fe4..f51d63feaa8e 100644 --- a/functorch/functorch/csrc/BatchRulesModules.cpp +++ b/aten/src/ATen/functorch/BatchRulesModules.cpp @@ -4,33 +4,33 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include +#include +#include #include namespace at { namespace functorch { -static Tensor getStepTensor(const Tensor& indices, int64_t bdim_size, int64_t num_embeddings) { +static Tensor getStepTensor(const Tensor& indices, c10::SymInt bdim_size, c10::SymInt num_embeddings) { // [batch_size, 1, 1, 1, ..., 1] - DimVector view_shape(indices.dim(), 1); + c10::SymDimVector view_shape(indices.dim(), 1); view_shape[0] = bdim_size; auto range = at::arange(0, bdim_size * num_embeddings, num_embeddings, indices.options()); - return range.view(view_shape); + return range.view_symint(view_shape); } std::tuple> embedding_batch_rule( const Tensor& weight, optional weight_bdim, const Tensor& indices, optional indices_bdim, - int64_t padding_idx, bool scale_grad_by_freq, bool sparse) { + c10::SymInt padding_idx, bool scale_grad_by_freq, bool sparse) { if (!weight_bdim && indices_bdim) { // B*, ED -> B*D - const auto result = at::embedding(weight, indices, padding_idx, scale_grad_by_freq, sparse); + const auto result = at::embedding_symint(weight, indices, padding_idx, scale_grad_by_freq, sparse); return std::make_tuple(result, indices_bdim); } else if (weight_bdim && !indices_bdim) { // *, BED -> *, E(BD) -> *(BD) -> *BD const auto batch_size = weight.size(*weight_bdim); const auto weight_ = reshape_dim_into(*weight_bdim, /*embedding_dim*/1, weight); - auto result = at::embedding(weight_, indices, padding_idx, scale_grad_by_freq, sparse); + auto result = at::embedding_symint(weight_, indices, padding_idx, scale_grad_by_freq, sparse); result = reshape_dim_outof(-1, batch_size, result); return std::make_tuple(result, result.dim() - 2); } @@ -44,7 +44,7 @@ std::tuple> embedding_batch_rule( const auto range = getStepTensor(indices, batch_size, num_embeddings); indices_ = indices_ + range; - const auto result = at::embedding(weight_, indices_, padding_idx, 
scale_grad_by_freq, sparse); + const auto result = at::embedding_symint(weight_, indices_, padding_idx, scale_grad_by_freq, sparse); return std::make_tuple(result, 0); } @@ -52,15 +52,15 @@ std::tuple> embedding_dense_backward_batch_rule( const Tensor& grad_, optional grad_bdim, const Tensor& indices_, optional indices_bdim, - int64_t num_weights, int64_t padding_idx, bool scale_grad_by_freq) { + c10::SymInt num_weights, c10::SymInt padding_idx, bool scale_grad_by_freq) { Tensor grad = grad_; Tensor indices = indices_; if (!indices_bdim && grad_bdim) { - const auto bdim_size = grad.size(*grad_bdim); + const auto bdim_size = grad.sym_size(*grad_bdim); grad = reshape_dim_into(*grad_bdim, -1, grad); - auto result = at::embedding_dense_backward( + auto result = at::embedding_dense_backward_symint( grad, indices, num_weights, padding_idx, scale_grad_by_freq); - result = reshape_dim_outof(1, bdim_size, result); + result = reshape_dim_outof_symint(1, bdim_size, result); return std::make_tuple(result, 1); } const auto bdim_size = indices.size(*indices_bdim); @@ -68,13 +68,13 @@ embedding_dense_backward_batch_rule( grad = moveBatchDimToFront(grad, grad_bdim); grad = ensure_has_bdim(grad, grad_bdim.has_value(), bdim_size); const auto range = getStepTensor(indices, bdim_size, num_weights); - auto result = at::embedding_dense_backward( + auto result = at::embedding_dense_backward_symint( grad, indices + range, num_weights * bdim_size, -1, scale_grad_by_freq); result = reshape_dim_outof(0, bdim_size, result); // Fill in the padding. We can't do it in the embedding_dense_backward call // because we need to fill in multiple rows! if (padding_idx >= 0) { - result.select(1, padding_idx).fill_(0); + result.select_symint(1, padding_idx).fill_(0); } return std::make_tuple(result, 0); } @@ -295,21 +295,21 @@ template struct UpsampleBackwardBatchRuleHelper> { static std::tuple> apply( const Tensor& grad_output, optional grad_output_bdim, - OptionalArrayRef output_size, IntArrayRef input_size, + c10::SymIntArrayRef output_size, c10::SymIntArrayRef input_size, T... 
extra_args) { auto grad_output_ = reshape_dim_into(*grad_output_bdim, 0, grad_output); TORCH_INTERNAL_ASSERT(input_size.size() > 0); // input_size is wrong so we correct it - DimVector physical_input_size(input_size.begin(), input_size.end()); - physical_input_size[0] = grad_output_.sizes()[0]; + c10::SymDimVector physical_input_size(input_size.begin(), input_size.end()); + physical_input_size[0] = grad_output_.sym_sizes()[0]; auto out = Func( grad_output_, output_size, physical_input_size, std::forward(extra_args)...); - return std::make_tuple(reshape_dim_outof(0, grad_output.sizes()[*grad_output_bdim], out), 0); + return std::make_tuple(reshape_dim_outof_symint(0, grad_output.sym_sizes()[*grad_output_bdim], out), 0); } }; @@ -375,20 +375,20 @@ struct CudnnGridSampleBackwardBatchRuleHelper { #define CUDNN_GRID_SAMPLE_BW_BATCH_RULE(fn)\ CudnnGridSampleBackwardBatchRuleHelper::apply -#define UPSAMPLE_BACKWARD(op, overload) VMAP_SUPPORT2(op, overload, SINGLE_ARG(\ +#define UPSAMPLE_BACKWARD(op) VMAP_SUPPORT(op, SINGLE_ARG(\ UpsampleBackwardBatchRuleHelper<\ - decltype(&ATEN_FN2(op, overload)),\ - &ATEN_FN2(op, overload),\ - c10::guts::function_traits::parameter_types>::apply)) + decltype(&ATEN_FN(op)),\ + &ATEN_FN(op),\ + c10::guts::function_traits::parameter_types>::apply)) #define UPSAMPLE_BATCH(op) \ EXISTING_BDIM2(op, vec); \ EXISTING_BDIM(op); -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { EXISTING_BDIM(im2col); - EXISTING_BDIM(im2col_backward); + EXISTING_BDIM(col2im); VMAP_SUPPORT(embedding, embedding_batch_rule); VMAP_SUPPORT(embedding_dense_backward, embedding_dense_backward_batch_rule); @@ -430,13 +430,13 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { UPSAMPLE_BATCH(upsample_nearest3d); UPSAMPLE_BATCH(upsample_trilinear3d); - UPSAMPLE_BACKWARD(upsample_bicubic2d_backward, vec); - UPSAMPLE_BACKWARD(upsample_bilinear2d_backward, vec); - UPSAMPLE_BACKWARD(upsample_linear1d_backward, vec); - UPSAMPLE_BACKWARD(upsample_nearest1d_backward, vec); - UPSAMPLE_BACKWARD(upsample_nearest2d_backward, vec); - UPSAMPLE_BACKWARD(upsample_nearest3d_backward, vec); - UPSAMPLE_BACKWARD(upsample_trilinear3d_backward, vec); + UPSAMPLE_BACKWARD(upsample_bicubic2d_backward); + UPSAMPLE_BACKWARD(upsample_bilinear2d_backward); + UPSAMPLE_BACKWARD(upsample_linear1d_backward); + UPSAMPLE_BACKWARD(upsample_nearest1d_backward); + UPSAMPLE_BACKWARD(upsample_nearest2d_backward); + UPSAMPLE_BACKWARD(upsample_nearest3d_backward); + UPSAMPLE_BACKWARD(upsample_trilinear3d_backward); m.impl("one_hot", one_hot_decomposition_hack); } }} diff --git a/functorch/functorch/csrc/BatchRulesNorm.cpp b/aten/src/ATen/functorch/BatchRulesNorm.cpp similarity index 93% rename from functorch/functorch/csrc/BatchRulesNorm.cpp rename to aten/src/ATen/functorch/BatchRulesNorm.cpp index e78538329582..d53d4f6a2e97 100644 --- a/functorch/functorch/csrc/BatchRulesNorm.cpp +++ b/aten/src/ATen/functorch/BatchRulesNorm.cpp @@ -4,9 +4,9 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
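// Illustrative sketch (not part of the patch): the upsample-backward helper above folds the
// batch dim into dim 0 with reshape_dim_into, fixes up physical_input_size[0], and splits the
// batch dim back out with reshape_dim_outof(_symint). The shape bookkeeping, modeled on plain
// vectors of int64_t; the helper names and signatures below are simplified stand-ins, not the
// real functorch utilities.
#include <cstdint>
#include <iostream>
#include <vector>

using Shape = std::vector<int64_t>;

// Merge dimension `src` into dimension `dst` of what remains.
Shape reshape_dim_into(std::size_t src, std::size_t dst, Shape s) {
  int64_t moved = s[src];
  s.erase(s.begin() + src);
  s[dst] *= moved;
  return s;
}

// Split dimension `dim` back out into (size, old_size / size).
Shape reshape_dim_outof(std::size_t dim, int64_t size, Shape s) {
  int64_t rest = s[dim] / size;
  s[dim] = size;
  s.insert(s.begin() + dim + 1, rest);
  return s;
}

int main() {
  // grad_output is [B, N, C, H, W]; fold B into dim 0, run the unbatched
  // backward on [B*N, C, H, W], then split B back out of dim 0.
  Shape grad_output = {3, 2, 8, 16, 16};
  Shape folded = reshape_dim_into(0, 0, grad_output);   // {6, 8, 16, 16}
  Shape restored = reshape_dim_outof(0, 3, folded);     // {3, 2, 8, 16, 16}
  for (auto d : restored) std::cout << d << ' ';
  std::cout << '\n';
}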
-#include -#include -#include +#include +#include +#include #include namespace at { namespace functorch { @@ -279,7 +279,7 @@ std::tuple batch_norm_backward_plumbing( std::tie(grad_normalized_input_value, grad_normalized_input_bdim) = unwrapTensorAtLevel(grad_normalized_input.transpose(0, 1), cur_level); // [B0, B, C, *] - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto results = batch_norm_backward_no_weight_bias_batch_rule( grad_normalized_input_value, grad_normalized_input_bdim, input_value, input_bdim, @@ -308,7 +308,7 @@ std::tuple native_group_norm_plumbing( int64_t cur_level = maybe_layer->layerId(); if (!areAnyBatchedAtLevel({input, weight_opt, bias_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::native_group_norm(input, weight_opt, bias_opt, N, C, HxW, group, eps); } @@ -323,13 +323,13 @@ std::tuple native_group_norm_plumbing( const auto input_ = reshape_dim_into(*input_bdim, 0, input_value); const auto bdim_size = input_value.size(*input_bdim); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto result = at::native_group_norm(input_, nullopt, nullopt, N * bdim_size, C, HxW, group, eps); result0 = makeBatched(reshape_dim_outof(0, bdim_size, std::get<0>(result)), 0, cur_level); mean = makeBatched(reshape_dim_outof(0, bdim_size, std::get<1>(result)), 0, cur_level); rstd = makeBatched(reshape_dim_outof(0, bdim_size, std::get<2>(result)), 0, cur_level); } else { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto result = at::native_group_norm(input_value, nullopt, nullopt, N, C, HxW, group, eps); result0 = std::get<0>(result); mean = std::get<1>(result); @@ -397,7 +397,7 @@ std::tuple native_group_norm_backward_plumbing( int64_t cur_level = maybe_layer->layerId(); if (!areAnyBatchedAtLevel({grad_out, input, mean, rstd, weight_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::native_group_norm_backward(grad_out, input, mean, rstd, weight_opt, N, C, HxW, group, output_mask); } @@ -441,7 +441,7 @@ std::tuple native_group_norm_backward_plumbing( std::tie(grad_normalized_input_value, grad_normalized_input_bdim) = unwrapTensorAtLevel(grad_normalized_input, cur_level); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto res = group_norm_backward_no_weight_bias_batch_rule( grad_normalized_input_value, grad_normalized_input_bdim, input_value, input_bdim, @@ -456,7 +456,7 @@ std::tuple native_group_norm_backward_plumbing( C10_ALWAYS_INLINE bool has_same_shape( const Tensor& tensor, optional tensor_bdim, - IntArrayRef normalized_shape) { + c10::SymIntArrayRef normalized_shape) { if (!tensor.defined()) { return true; } @@ -479,7 +479,7 @@ C10_ALWAYS_INLINE bool has_same_shape( C10_ALWAYS_INLINE void check_same_shape( const Tensor& tensor, optional tensor_bdim, - IntArrayRef normalized_shape, const std::string& name) { + c10::SymIntArrayRef normalized_shape, const std::string& name) { TORCH_CHECK(has_same_shape(tensor, tensor_bdim, normalized_shape), "Expected ", name, " to be of same shape as normalized_shape, 
but got ", name, " of shape ", @@ -490,7 +490,7 @@ C10_ALWAYS_INLINE void check_same_shape( // Ugh, hard to deduplicate C10_ALWAYS_INLINE void _check_layer_norm_inputs( - IntArrayRef normalized_shape, + SymIntArrayRef normalized_shape, const Tensor& weight, optional weight_bdim, const Tensor& bias, optional bias_bdim) { @@ -507,13 +507,13 @@ C10_ALWAYS_INLINE void _check_layer_norm_inputs( std::tuple,Tensor,optional,Tensor,optional> native_layer_norm_batch_rule( const Tensor& input, optional input_bdim, - IntArrayRef normalized_shape, + c10::SymIntArrayRef normalized_shape, const c10::optional& weight_opt, optional weight_bdim, const c10::optional& bias_opt, optional bias_bdim, double eps) { auto input_ = moveBatchDimToFront(input, input_bdim); if (!weight_bdim && !bias_bdim) { - const auto result = at::native_layer_norm(input_, normalized_shape, weight_opt, bias_opt, eps); + const auto result = at::native_layer_norm_symint(input_, normalized_shape, weight_opt, bias_opt, eps); const auto mean = std::get<1>(result); const auto rstd = std::get<2>(result); const auto stats_bdim = compute_stat_bdim(input_bdim, mean); @@ -528,7 +528,7 @@ native_layer_norm_batch_rule( _check_layer_norm_inputs(normalized_shape, weight, weight_bdim, bias, bias_bdim); const auto input_logical_rank = rankWithoutBatchDim(input, input_bdim); - const auto result = at::native_layer_norm(input_, normalized_shape, nullopt, nullopt, eps); + const auto result = at::native_layer_norm_symint(input_, normalized_shape, nullopt, nullopt, eps); auto result0 = std::get<0>(result); const auto mean = std::get<1>(result); const auto rstd = std::get<2>(result); @@ -607,7 +607,7 @@ std::tuple native_layer_norm_backward_plumbing TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); if (!areAnyBatchedAtLevel({grad_out, input, mean, rstd, weight_opt, bias_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::native_layer_norm_backward(grad_out, input, normalized_shape, mean, rstd, weight_opt, bias_opt, output_mask); } @@ -667,7 +667,7 @@ std::tuple native_layer_norm_backward_plumbing std::tie(grad_normalized_input_value, grad_normalized_input_bdim) = unwrapTensorAtLevel(grad_normalized_input, cur_level); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); const auto results = native_layer_norm_backward_no_weight_bias_batch_rule( grad_normalized_input_value, grad_normalized_input_bdim, input_value, input_bdim, @@ -761,7 +761,7 @@ struct NativeBatchNormBackwardBatchRuleHelper { if (!areAnyBatchedAtLevel({grad_out, input, weight_opt, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::native_batch_norm_backward(grad_out, input, weight_opt, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt, training, eps, output_mask); @@ -791,7 +791,7 @@ struct CudnnBatchNormBackwardBatchRuleHelper { if (!areAnyBatchedAtLevel({input, grad_out, weight, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt, reserve}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::cudnn_batch_norm_backward(input, grad_out, weight, running_mean_opt, 
running_var_opt, save_mean_opt, save_rstd_opt, eps, reserve); } @@ -819,7 +819,7 @@ struct MiopenBatchNormBackwardBatchRuleHelper { if (!areAnyBatchedAtLevel({input, grad_out, weight, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt}, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::miopen_batch_norm_backward(input, grad_out, weight, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt, eps); } @@ -875,10 +875,28 @@ std::tuple cudnn_batch_norm_backward_wrapper( return at::miopen_batch_norm_backward(input, grad_out, weight_opt, running_mean_opt, running_var_opt, save_mean_opt, save_rstd_opt, eps); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +// NB: This is NOT good. In the ideal world, we do NOT want to convert the new legit op back into native_batch_norm +// as native_batch_norm has a problematic schema--it promises it is functional when it is not. However, vmap doesn't +// work with dynamo anyway so we gain some buffer room to do wrong things here. The (reasonable) hope is that we will +// make native_batch_norm composite implicit within a few weeks and we can fix this before vmap works with dynamo. +std::tuple _native_batch_norm_legit_batch( + const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, + Tensor& running_mean, Tensor& running_var, bool train, double momentum, double eps) { + return at::native_batch_norm(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, eps); +} + +std::tuple _native_batch_norm_legit_no_stats_batch( + const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, + bool train, double momentum, double eps) { + return at::native_batch_norm(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, eps); +} + +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT(native_batch_norm, NATIVE_BATCH_NORM_BATCH_RULE(native_batch_norm)); VMAP_SUPPORT(cudnn_batch_norm, CUDNN_BATCH_NORM_BATCH_RULE(cudnn_batch_norm)); VMAP_SUPPORT(miopen_batch_norm, MIOPEN_BATCH_NORM_BATCH_RULE(miopen_batch_norm)); + m.impl("_native_batch_norm_legit", _native_batch_norm_legit_batch); + m.impl("_native_batch_norm_legit.no_stats", _native_batch_norm_legit_no_stats_batch); m.impl("native_batch_norm_backward", NATIVE_BATCH_NORM_BACKWARD_BATCH_RULE(native_batch_norm_backward)); m.impl("cudnn_batch_norm_backward", CUDNN_BATCH_NORM_BACKWARD_BATCH_RULE(at::functorch::cudnn_batch_norm_backward_wrapper)); m.impl("miopen_batch_norm_backward", MIOPEN_BATCH_NORM_BACKWARD_BATCH_RULE(at::functorch::miopen_batch_norm_backward_wrapper)); diff --git a/functorch/functorch/csrc/BatchRulesPooling.cpp b/aten/src/ATen/functorch/BatchRulesPooling.cpp similarity index 92% rename from functorch/functorch/csrc/BatchRulesPooling.cpp rename to aten/src/ATen/functorch/BatchRulesPooling.cpp index a04cba329697..ad79f49bc3b3 100644 --- a/functorch/functorch/csrc/BatchRulesPooling.cpp +++ b/aten/src/ATen/functorch/BatchRulesPooling.cpp @@ -4,9 +4,9 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
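// Illustrative sketch (not part of the patch): the kBatchedKey -> DispatchKey::FuncTorchBatched
// change throughout these files keeps relying on the same RAII pattern: exclude the batched key
// for the current scope so the re-dispatch reaches the plain (unbatched) kernel. Below is a tiny
// self-contained model of such a guard; the real one is c10::impl::ExcludeDispatchKeyGuard and
// operates on thread-local dispatch key sets, so everything here is a made-up stand-in.
#include <cassert>
#include <set>
#include <string>

thread_local std::set<std::string> g_excluded_keys;  // stand-in for the TLS exclude set

class ExcludeKeyGuard {
 public:
  explicit ExcludeKeyGuard(std::string key)
      : key_(std::move(key)), was_excluded_(g_excluded_keys.count(key_) > 0) {
    g_excluded_keys.insert(key_);
  }
  ~ExcludeKeyGuard() {
    if (!was_excluded_) g_excluded_keys.erase(key_);  // restore prior state on scope exit
  }
 private:
  std::string key_;
  bool was_excluded_;
};

int main() {
  {
    ExcludeKeyGuard guard("FuncTorchBatched");
    assert(g_excluded_keys.count("FuncTorchBatched") == 1);
    // ... call back into the operator here; it now skips the batched kernel ...
  }
  assert(g_excluded_keys.count("FuncTorchBatched") == 0);  // restored
}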
-#include -#include -#include +#include +#include +#include #include namespace at { namespace functorch { @@ -35,7 +35,7 @@ max_pool2d_with_indices_batch_rule( reshape_dim_outof(0, bdim_size, std::get<1>(result)), 0); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { EXISTING_BDIM(_adaptive_avg_pool2d); EXISTING_BDIM_ALL_BOXED(_adaptive_avg_pool2d_backward); EXISTING_BDIM(_adaptive_avg_pool3d); diff --git a/functorch/functorch/csrc/BatchRulesRandomness.cpp b/aten/src/ATen/functorch/BatchRulesRandomness.cpp similarity index 87% rename from functorch/functorch/csrc/BatchRulesRandomness.cpp rename to aten/src/ATen/functorch/BatchRulesRandomness.cpp index a4a9ef9abcb7..159abc4108e8 100644 --- a/functorch/functorch/csrc/BatchRulesRandomness.cpp +++ b/aten/src/ATen/functorch/BatchRulesRandomness.cpp @@ -5,17 +5,23 @@ // LICENSE file in the root directory of this source tree. #include -#include -#include +#include +#include + +// This file contains batching rules for random operations. These are different +// from our regular batching rules: regular batching rules get registered to the +// FuncTorchBatched key, but batching rules for random operations get +// registered to FuncTorchVmapMode. This is because we need to interpose on +// random operations even if they're not on a BatchedTensor. namespace at { namespace functorch { template -Tensor random_batching_rule(IntArrayRef shape, ExtraArgs... extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); +Tensor random_batching_rule(SymIntArrayRef shape, ExtraArgs... extra_args) { + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); - VmapDimVector shapeVec(1, maybe_layer->batchSize()); + c10::SmallVector shapeVec(1, maybe_layer->batchSize()); shapeVec.reserve(shape.size() + 1); shapeVec.insert(shapeVec.end(), shape.begin(), shape.end()); RandomnessType randomness = maybe_layer->randomness(); @@ -29,7 +35,7 @@ Tensor random_batching_rule(IntArrayRef shape, ExtraArgs... extra_args) { template Tensor& random_inplace_batching_rule(Tensor& self, ExtraArgs... extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); const auto cur_level = maybe_layer->layerId(); Tensor self_value; @@ -54,7 +60,7 @@ Tensor& random_inplace_batching_rule(Tensor& self, ExtraArgs... extra_args) { } Tensor& bernoulli_inplace_Tensor_batching_rule(Tensor& self, const Tensor& p_, c10::optional gen) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); auto cur_level = maybe_layer->layerId(); RandomnessType randomness = maybe_layer->randomness(); @@ -104,7 +110,7 @@ Tensor& bernoulli_inplace_Tensor_batching_rule(Tensor& self, const Tensor& p_, c template Tensor randperm_batching_rule(int64_t n, ExtraArgs... extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); auto const batch_size = maybe_layer->batchSize(); RandomnessType randomness = maybe_layer->randomness(); @@ -124,7 +130,7 @@ Tensor randperm_batching_rule(int64_t n, ExtraArgs... extra_args) { template Tensor unary_pointwise_random_batch_rule(const Tensor& tensor, ExtraArgs... 
extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); const auto cur_level = maybe_layer->layerId(); @@ -152,7 +158,7 @@ Tensor unary_pointwise_random_batch_rule(const Tensor& tensor, ExtraArgs... extr template Tensor tensor_like_random_batch_rule(const Tensor& self, ExtraArgs... extra_args) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); const auto cur_level = maybe_layer->layerId(); RandomnessType randomness = maybe_layer->randomness(); @@ -178,7 +184,7 @@ Tensor tensor_like_random_batch_rule(const Tensor& self, ExtraArgs... extra_args } std::tuple native_dropout_batching_rule(const Tensor& tensor, double p, c10::optional train) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); const auto cur_level = maybe_layer->layerId(); RandomnessType randomness = maybe_layer->randomness(); @@ -208,7 +214,7 @@ std::tuple native_dropout_batching_rule(const Tensor& tensor, dou } Tensor multinomial_batching_rule(const Tensor& self, const int64_t num_samples, const bool replacement, const c10::optional generator) { - c10::impl::ExcludeDispatchKeyGuard guard(kVmapModeKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchVmapMode); auto maybe_layer = maybeCurrentDynamicLayer(); const auto cur_level = maybe_layer->layerId(); @@ -220,24 +226,32 @@ Tensor multinomial_batching_rule(const Tensor& self, const int64_t num_samples, RandomnessType randomness = maybe_layer->randomness(); check_randomness(randomness, self_bdim.has_value()); - if (randomness == RandomnessType::Different && !self_bdim) { - auto shape = self_value.sizes(); - VmapDimVector shapeVec(1, maybe_layer->batchSize()); - shapeVec.reserve(shape.size() + 1); - shapeVec.insert(shapeVec.end(), shape.begin(), shape.end()); - self_value = self_value.expand(shapeVec); - } - if (self_value.dim() == 3 && (self_bdim || randomness == RandomnessType::Different)) { - self_value = reshape_dim_into(1, 0, self_value); - } - auto out = multinomial(self_value, num_samples, replacement, generator); - if (randomness == RandomnessType::Same && !self_bdim) { - return out; - } - if(self_value.dim() == 3 && self_bdim) { - out = out.reshape(self.sizes()); + if (randomness == RandomnessType::Different) { + // 1D cases: S -> BS -> multinomial(BS) + // BS -> multinomial(BS) + // + // 2D cases: MS -> BMS -> (BM)S -> multinomial((BM)S) -> (BM)S -> BMS + // BMS -> (BM)S -> multinomial((BM)S) -> (BM)S -> BMS + const auto is_2D_case = rankWithoutBatchDim(self_value, self_bdim) == 2; + if (!self_bdim.has_value()) { + self_value = ensure_has_bdim(self_value, self_bdim.has_value(), maybe_layer->batchSize()); + } + if (is_2D_case) { + self_value = reshape_dim_into(0, 0, self_value); + } + auto out = multinomial(self_value, num_samples, replacement, generator); + if (is_2D_case) { + out = reshape_dim_outof(0, maybe_layer->batchSize(), out); + } + return makeBatched(out, 0, cur_level);; } - return makeBatched(out, 0, cur_level); + + TORCH_INTERNAL_ASSERT(randomness == RandomnessType::Same); // check_randomness eliminates error randomness + TORCH_INTERNAL_ASSERT(!self_bdim.has_value()); // check_randomness eliminates same randomness with batched input + // Must be 
same randomness with unbatched input + // 1D case: S -> multinomial(S) -> S + // 2D case: MS -> multinomial(MS) -> MS + return multinomial(self_value, num_samples, replacement, generator); } template @@ -245,13 +259,13 @@ struct RandomBatchRuleHelper; template struct RandomBatchRuleHelper> { - static Tensor apply(IntArrayRef shape, T... extra_args) { + static Tensor apply(SymIntArrayRef shape, T... extra_args) { return random_batching_rule(shape, std::forward(extra_args)...); } }; template -Tensor rand_int_wrapper(IntArrayRef shape, int64_t high, T... extra_args) { +Tensor rand_int_wrapper(SymIntArrayRef shape, int64_t high, T... extra_args) { return Func(high, shape, std::forward(extra_args)...); } @@ -270,7 +284,7 @@ struct RandIntBatchRuleHelper; template struct RandIntBatchRuleHelper> { - static Tensor apply(int64_t high, IntArrayRef shape, T... extra_args) { + static Tensor apply(int64_t high, SymIntArrayRef shape, T... extra_args) { return random_batching_rule), &rand_int_wrapper, int64_t, T...>(shape, high, std::forward(extra_args)...); @@ -278,7 +292,7 @@ struct RandIntBatchRuleHelper> { }; template -Tensor rand_int_low_wrapper(IntArrayRef shape, T0 scalar0, T1 scalar1, T... extra_args) { +Tensor rand_int_low_wrapper(SymIntArrayRef shape, T0 scalar0, T1 scalar1, T... extra_args) { return Func(scalar0, scalar1, shape, std::forward(extra_args)...); } @@ -287,7 +301,7 @@ struct RandTwoLeadingScalarsBatchRuleHelper; template struct RandTwoLeadingScalarsBatchRuleHelper> { - static Tensor apply(T0 scalar0, T1 scalar1, IntArrayRef shape, T... extra_args) { + static Tensor apply(T0 scalar0, T1 scalar1, SymIntArrayRef shape, T... extra_args) { return random_batching_rule), &rand_int_low_wrapper, int64_t, int64_t, T...>(shape, scalar0, scalar1, std::forward(extra_args)...); diff --git a/functorch/functorch/csrc/BatchRulesReduceOps.cpp b/aten/src/ATen/functorch/BatchRulesReduceOps.cpp similarity index 96% rename from functorch/functorch/csrc/BatchRulesReduceOps.cpp rename to aten/src/ATen/functorch/BatchRulesReduceOps.cpp index 17f7a263f4ee..9126507e73be 100644 --- a/functorch/functorch/csrc/BatchRulesReduceOps.cpp +++ b/aten/src/ATen/functorch/BatchRulesReduceOps.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
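// Illustrative sketch (not part of the patch): the shape bookkeeping in random_batching_rule
// above. With randomness == Different every vmapped example gets its own sample, so the vmap
// batch size is prepended to the requested shape before calling the random op; with
// randomness == Same the original shape is used and the result is shared across the batch.
// Plain integers stand in for SymInt; the names below are illustrative only.
#include <cstdint>
#include <iostream>
#include <vector>

enum class Randomness { Same, Different };

std::vector<int64_t> physical_random_shape(
    const std::vector<int64_t>& requested, int64_t batch_size, Randomness r) {
  if (r == Randomness::Same) {
    return requested;                       // one sample, shared by every example
  }
  std::vector<int64_t> out;
  out.reserve(requested.size() + 1);
  out.push_back(batch_size);                // [B, *requested]
  out.insert(out.end(), requested.begin(), requested.end());
  return out;
}

int main() {
  auto same = physical_random_shape({4, 5}, 8, Randomness::Same);       // {4, 5}
  auto diff = physical_random_shape({4, 5}, 8, Randomness::Different);  // {8, 4, 5}
  std::cout << same.size() << " dims vs " << diff.size() << " dims\n";  // 2 vs 3
}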
-#include -#include +#include +#include #include #include @@ -20,10 +20,6 @@ Tensor sum_decomp( return at::sum(self, range(0, self.dim()), false, dtype); } -Tensor sum_symint_decomp(const Tensor& input_t, c10::SymIntArrayRef dim, bool keepdim, optional opt_dtype) { - return at::sum(input_t, c10::asIntArrayRefSlow(dim), keepdim, opt_dtype); -} - Tensor mean_decomp( const Tensor& self, optional dtype) { return at::mean(self, range(0, self.dim()), false, dtype); @@ -74,14 +70,14 @@ void boxed_reduction_batch_rule(const c10::OperatorHandle& op, torch::jit::Stack const auto num_returns = schema.returns().size(); const auto num_arguments = schema.arguments().size(); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); auto orig_arguments = torch::jit::last(*stack, num_arguments); if (std::none_of(orig_arguments.begin(), orig_arguments.end(), ivalueParticipatesInCurrentLevel)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); op.callBoxed(stack); return; } @@ -172,7 +168,7 @@ void boxed_reduction_batch_rule(const c10::OperatorHandle& op, torch::jit::Stack #define REDUCTION_BOXED_ARGS(op, dim_pos) \ m.impl(#op, torch::CppFunction::makeFromBoxedFunction>()); -// Skipping frobenius/nuclear/all/any since they don't have opinfo tests right now :P +// Skipping all/any since they don't have opinfo tests right now :P Tensor dist_decomp(const Tensor& self, const Tensor& other, const Scalar& p) { return at::norm((self - other), p); @@ -372,7 +368,7 @@ std::tuple> searchsorted_batch_rule( TORCH_INTERNAL_ASSERT(false); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT2(searchsorted, Tensor, searchsorted_batch_rule); REDUCTION_BOXED(_fft_r2c); REDUCTION_BOXED(_fft_c2r); @@ -426,7 +422,6 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { REDUCTION_BOXED(_log_softmax); REDUCTION_BOXED_ARGS(rot90, 2); VMAP_SUPPORT(aminmax, aminmax_batching_rule); - m.impl("sum.SymInt", sum_symint_decomp); VMAP_SUPPORT(_log_softmax_backward_data, _log_softmax_backward_batch_rule); VMAP_SUPPORT(_softmax_backward_data, _softmax_backward_batch_rule); } diff --git a/functorch/functorch/csrc/BatchRulesScatterOps.cpp b/aten/src/ATen/functorch/BatchRulesScatterOps.cpp similarity index 98% rename from functorch/functorch/csrc/BatchRulesScatterOps.cpp rename to aten/src/ATen/functorch/BatchRulesScatterOps.cpp index da01d464908e..fc51e9d74409 100644 --- a/functorch/functorch/csrc/BatchRulesScatterOps.cpp +++ b/aten/src/ATen/functorch/BatchRulesScatterOps.cpp @@ -4,11 +4,11 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
-#include +#include #include #include -#include -#include +#include +#include #include #include #include @@ -317,7 +317,7 @@ std::tuple> index_batch_rule( // plumbing done since we don't support List> in codegen Tensor index_plumbing(const Tensor & self, const List> & indices ) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -504,7 +504,7 @@ void index_put__batch_rule( // plumbing done since we don't support List> in codegen Tensor& index_put__plumbing(Tensor & self, const List> & indices , const Tensor & values, bool accumulate) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -543,7 +543,7 @@ void _index_put_impl__batch_rule( // plumbing done since we don't support List> in codegen Tensor &_index_put_impl__plumbing(Tensor &self, const List> &indices, const Tensor &values, bool accumulate, bool unsafe) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -664,7 +664,7 @@ std::tuple> index_put_batch_rule( // plumbing done since we don't support List> in codegen Tensor index_put_plumbing(const Tensor & self, const List> & indices, const Tensor & values, bool accumulate) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto maybe_layer = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); @@ -928,6 +928,11 @@ Tensor index_copy_decomp( return at::scatter(self, dim, index_, source); ; } +// Note [Fix vmap slice_scatter] +// registers a decomposition for `slice_scatter` that calls into `slice.src` +// *_scatter operators have some special semantics though, that we can't easily +// through a decomposition: slice_scatter's output needs to have the same +// size, size, strides and storage_offset as the input. Tensor slice_scatter_decomp(const Tensor &self, const Tensor &src, int64_t dim, c10::optional start, c10::optional end, int64_t step) @@ -1050,7 +1055,7 @@ std::tuple> masked_fill_scalar_batch_rule( return std::make_tuple(result, 0); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { m.impl("index.Tensor", index_plumbing); m.impl("index_put_", index_put__plumbing); m.impl("index_put", index_put_plumbing); diff --git a/functorch/functorch/csrc/BatchRulesUnaryOps.cpp b/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp similarity index 97% rename from functorch/functorch/csrc/BatchRulesUnaryOps.cpp rename to aten/src/ATen/functorch/BatchRulesUnaryOps.cpp index 660cb1f3c713..ee6391c6e284 100644 --- a/functorch/functorch/csrc/BatchRulesUnaryOps.cpp +++ b/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
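// Illustrative sketch (not part of the patch): the Note [Fix vmap slice_scatter] above points out
// that slice_scatter is not just "take a slice" -- its output has the full shape of `self`, with
// only the sliced positions overwritten by `src`. A minimal 1-D model of that contract on plain
// std::vector, purely for illustration; nothing here is ATen code.
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<int> slice_scatter_1d(std::vector<int> self, const std::vector<int>& src,
                                  int64_t start, int64_t end, int64_t step) {
  int64_t i = 0;
  for (int64_t pos = start; pos < end; pos += step) {
    self[pos] = src[i++];                   // overwrite only the sliced positions
  }
  return self;                              // same length as the input, not src's length
}

int main() {
  std::vector<int> self = {0, 1, 2, 3, 4, 5};
  std::vector<int> src = {10, 20, 30};
  auto out = slice_scatter_1d(self, src, /*start=*/1, /*end=*/6, /*step=*/2);
  assert((out == std::vector<int>{0, 10, 2, 20, 4, 30}));
  assert(out.size() == self.size());        // output keeps the input's size
}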
-#include -#include +#include +#include namespace at { namespace functorch { @@ -79,7 +79,7 @@ to_other_batch_rule(const Tensor& self, optional self_bdim, return std::make_tuple(self.to(other, non_blocking, copy, memory_format), self_bdim); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { #define UNARY_POINTWISE_ALL2(op, overload) \ POINTWISE_BOXED2(op ## _, overload); \ @@ -139,7 +139,6 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { UNARY_POINTWISE_ALL(mvlgamma); UNARY_POINTWISE_ALL(nan_to_num); UNARY_POINTWISE_ALL(neg); - UNARY_POINTWISE_ALL(positive); UNARY_POINTWISE_ALL(rad2deg); UNARY_POINTWISE_ALL(reciprocal); UNARY_POINTWISE_ALL(round); diff --git a/functorch/functorch/csrc/BatchRulesViews.cpp b/aten/src/ATen/functorch/BatchRulesViews.cpp similarity index 83% rename from functorch/functorch/csrc/BatchRulesViews.cpp rename to aten/src/ATen/functorch/BatchRulesViews.cpp index e4160ea4c98f..98eaf0f387a6 100644 --- a/functorch/functorch/csrc/BatchRulesViews.cpp +++ b/aten/src/ATen/functorch/BatchRulesViews.cpp @@ -4,12 +4,12 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include +#include #include #include -#include -#include +#include +#include #include #include #include @@ -58,7 +58,7 @@ namespace at { namespace functorch { // // Now that we have written `sum_batch_rule`, we have to register it inside a // TORCH_LIBRARY_IMPL block: -// TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +// TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { // ... // VMAP_SUPPORT2(sum, int, sum_batch_rule); // ... @@ -79,7 +79,7 @@ namespace at { namespace functorch { // return torch.add(self, product, value); // } // And register it inside a TORCH_LIBRARY_IMPL block: -// TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +// TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { // ... // m.impl("addcmul", addcmul_decomp); // ... @@ -102,15 +102,15 @@ std::tuple> unsqueeze_batch_rule( std::tuple> repeat_batch_rule( const Tensor& self, optional self_bdim, - IntArrayRef sizes) { + c10::SymIntArrayRef sizes) { - VmapDimVector sizes_with_bdim = { sizes.begin(), sizes.end() }; + SymDimVector sizes_with_bdim = { sizes.begin(), sizes.end() }; sizes_with_bdim.insert(sizes_with_bdim.begin(), 1); auto self_ = moveBatchDimToFront(self, self_bdim); while (self_.dim() < (int64_t)sizes_with_bdim.size()) { self_ = self_.unsqueeze(1); } - return std::make_tuple(self_.repeat(sizes_with_bdim), 0); + return std::make_tuple(self_.repeat_symint(sizes_with_bdim), 0); } @@ -136,22 +136,22 @@ std::tuple> diag_batch_rule( std::tuple> _unsafe_view_batch_rule( const Tensor& self, optional self_bdim, - IntArrayRef size) { + c10::SymIntArrayRef size) { auto self_ = moveBatchDimToFront(self, self_bdim); - VmapDimVector view_size(size); + SymDimVector view_size(size); view_size.insert(view_size.begin(), self_.size(0)); // See if the view is valid. If it's not, then we copy. // It's OK to copy, because _unsafe_view(x) guarantees that x isn't used // anymore. 
- const at::DimVector inferred_size = at::infer_size_dv(view_size, self_.numel()); - const auto stride = at::detail::computeStride(self_.sizes(), - self_.strides(), + const at::SymDimVector inferred_size = at::infer_size_dv(view_size, self_.sym_numel()); + const auto stride = at::detail::computeStride(self_.sym_sizes(), + self_.sym_strides(), inferred_size); if (!stride.has_value()) { self_ = self_.contiguous(); } - return std::make_tuple(at::_unsafe_view(self_, view_size), 0); + return std::make_tuple(at::_unsafe_view_symint(self_, view_size), 0); } std::tuple> flip_batch_rule(const Tensor& self, optional self_bdim, IntArrayRef dims) { @@ -175,7 +175,7 @@ const Tensor& resize__plumbing( TORCH_INTERNAL_ASSERT(maybe_layer.has_value()); int64_t cur_level = maybe_layer->layerId(); if (!isBatchedAtLevel(self, cur_level)) { - c10::impl::ExcludeDispatchKeyGuard guard2(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard2(DispatchKey::FuncTorchBatched); return self.resize_(size, optional_memory_format); } @@ -190,7 +190,7 @@ const Tensor& resize__plumbing( TORCH_INTERNAL_ASSERT(self_bdim.value() == 0, "NYI: resize_ batch rule for batch dim != 0"); // Resize the wrapped tensor - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); self_value = moveBatchDimToFront(self_value, self_bdim); VmapDimVector new_size(size); new_size.insert(new_size.begin(), self_value.size(*self_bdim)); @@ -275,26 +275,26 @@ std::tuple, optional> chunk_batching_rule(const Ten return std::make_tuple(at::chunk(self_, chunks, new_dim), 0); } -std::tuple> select_batching_rule(const Tensor& self, optional bdim, int64_t dim, int64_t index) { +std::tuple> select_batching_rule(const Tensor& self, optional bdim, int64_t dim, c10::SymInt index) { if (!bdim) { - return std::make_tuple(self.select(dim, index), nullopt); + return std::make_tuple(self.select_symint(dim, index), nullopt); } auto _self = moveBatchDimToFront(self, bdim); auto dim_physical = getPhysicalDim(_self, true, dim); - auto result = _self.select(dim_physical, index); + auto result = _self.select_symint(dim_physical, index); return std::make_tuple(result, 0); } -std::tuple> _reshape_alias_batch_rule(const Tensor& self, optional bdim, const IntArrayRef shape, const IntArrayRef strides) { +std::tuple> _reshape_alias_batch_rule(const Tensor& self, optional bdim, const c10::SymIntArrayRef shape, const c10::SymIntArrayRef strides) { (void) strides; TORCH_INTERNAL_ASSERT(bdim.has_value()); auto self_ = moveBatchDimToFront(self, bdim); - c10::SmallBuffer new_shape(shape.size() + 1); - new_shape[0] = self_.size(0); + c10::SymDimVector new_shape(shape.size() + 1); + new_shape[0] = self_.sym_size(0); std::copy(shape.begin(), shape.end(), new_shape.begin() + 1); - return std::make_tuple(at::reshape(self_, new_shape), 0); + return std::make_tuple(at::reshape_symint(self_, new_shape), 0); } std::tuple> roll_batch_rule(const Tensor& self, optional bdim, IntArrayRef shifts, IntArrayRef dims) { @@ -330,15 +330,15 @@ std::tuple> diagonal_batching_rule( std::tuple> diagonal_backward_batch_rule( const Tensor& grad_input, optional grad_input_bdim, - IntArrayRef input_sizes, int64_t offset, int64_t dim1, int64_t dim2) { + c10::SymIntArrayRef input_sizes, int64_t offset, int64_t dim1, int64_t dim2) { auto logical_rank = rankWithoutBatchDim(grad_input, grad_input_bdim); auto grad_input_ = moveBatchDimToFront(grad_input, grad_input_bdim); dim1 = maybe_wrap_dim(dim1, logical_rank + 1) + 1; dim2 = 
maybe_wrap_dim(dim2, logical_rank + 1) + 1; - c10::SmallBuffer input_sizes_(input_sizes.size() + 1); + c10::SymDimVector input_sizes_(input_sizes.size() + 1); input_sizes_[0] = grad_input_.size(0); std::copy(input_sizes.begin(), input_sizes.end(), input_sizes_.begin() + 1); - auto result = at::diagonal_backward(grad_input_, input_sizes_, offset, dim1, dim2); + auto result = at::diagonal_backward_symint(grad_input_, input_sizes_, offset, dim1, dim2); return std::make_tuple(std::move(result), 0); } @@ -346,13 +346,13 @@ std::tuple> slice_batch_rule( const Tensor& self, optional self_bdim, int64_t dim, - c10::optional start, - c10::optional end, - int64_t step) { + c10::optional start, + c10::optional end, + c10::SymInt step) { auto self_ = moveBatchDimToFront(self, self_bdim); dim = getPhysicalDim(self, self_bdim.has_value(), dim); - auto result = self_.slice(dim, start, end, step); + auto result = self_.slice_symint(dim, start, end, step); return std::make_tuple(result, 0); } @@ -402,51 +402,58 @@ std::tuple> permute_batching_rule( std::tuple> select_backward_batch_rule( const Tensor& grad_input, optional grad_input_bdim, - IntArrayRef input_sizes, int64_t dim, int64_t index) { + c10::SymIntArrayRef input_sizes, int64_t dim, c10::SymInt index) { auto logical_rank = rankWithoutBatchDim(grad_input, grad_input_bdim); auto grad_input_ = moveBatchDimToFront(grad_input, grad_input_bdim); dim = maybe_wrap_dim(dim, logical_rank + 1) + 1; - c10::SmallBuffer input_sizes_(input_sizes.size() + 1); - input_sizes_[0] = grad_input_.size(0); + c10::SymDimVector input_sizes_(input_sizes.size() + 1); + input_sizes_[0] = grad_input_.sym_size(0); std::copy(input_sizes.begin(), input_sizes.end(), input_sizes_.begin() + 1); - auto result = at::select_backward(grad_input_, input_sizes_, dim, index); + auto result = at::select_backward_symint(grad_input_, input_sizes_, dim, index); return std::make_tuple(std::move(result), 0); } std::tuple> slice_backward_batch_rule( const Tensor& grad_input, optional grad_input_bdim, - IntArrayRef input_sizes, int64_t dim, int64_t start, int64_t end, int64_t step) { + SymIntArrayRef input_sizes, int64_t dim, c10::SymInt start, c10::SymInt end, c10::SymInt step) { auto logical_rank = rankWithoutBatchDim(grad_input, grad_input_bdim); auto grad_input_ = moveBatchDimToFront(grad_input, grad_input_bdim); dim = maybe_wrap_dim(dim, logical_rank) + 1; - c10::SmallBuffer input_sizes_(input_sizes.size() + 1); + c10::SymDimVector input_sizes_(input_sizes.size() + 1); input_sizes_[0] = grad_input_.size(0); std::copy(input_sizes.begin(), input_sizes.end(), input_sizes_.begin() + 1); - auto result = at::slice_backward(grad_input_, input_sizes_, dim, start, end, step); + auto result = at::slice_backward_symint(grad_input_, input_sizes_, dim, start, end, step); return std::make_tuple(std::move(result), 0); } std::tuple> view_batching_rule( - const Tensor &self, optional self_bdim, IntArrayRef size) + const Tensor &self, optional self_bdim, SymIntArrayRef sym_size) { TORCH_INTERNAL_ASSERT(self_bdim.has_value()); auto self_ = moveBatchDimToFront(self, self_bdim); - VmapDimVector size_(size.size() + 1); + c10::SmallVector size_(sym_size.size() + 1); // copy batch size size_[0] = self_.size(0); - std::copy(size.cbegin(), size.cend(), size_.begin() + 1); - return std::make_tuple(self_.view(size_), 0); + std::copy(sym_size.cbegin(), sym_size.cend(), size_.begin() + 1); + return std::make_tuple(self_.view_symint(size_), 0); } -Tensor view_symint_decomposition(const Tensor& self, - c10::SymIntArrayRef 
size) { - return self.view( c10::asIntArrayRefSlow(size)); +std::tuple> view_copy_batch_rule( + const Tensor& self, + optional self_bdim, + c10::SymIntArrayRef size) { + auto self_ = moveBatchDimToFront(self, self_bdim); + SymDimVector view_size(size.size() + 1); + view_size[0] = self_.size(0); + std::copy(size.cbegin(), size.cend(), view_size.begin() + 1); + + return std::make_tuple(at::view_copy_symint(self_, view_size), 0); } template std::tuple> expand_batch_rule( - const Tensor &self, optional self_bdim, IntArrayRef size, bool implicit) + const Tensor &self, optional self_bdim, SymIntArrayRef size, bool implicit) { auto self_dim = self.dim(); TORCH_CHECK(static_cast(self_dim - 1) <= size.size(), @@ -457,7 +464,7 @@ std::tuple> expand_batch_rule( auto self_sizes = self_.sizes(); auto batch_size = self_sizes[0]; - c10::SmallBuffer size_(size.size() + 1); + c10::SmallVector size_(size.size() + 1); size_[0] = batch_size; std::copy(size.cbegin(), size.cend(), size_.begin() + 1); @@ -471,12 +478,12 @@ std::tuple> expand_batch_rule( // so the strategy here is to view it first as a tensor of size [B0, 1, 3] and // then expand. auto extra_dims = size.size() - (self_dim - 1); - VmapDimVector view_shape(size_.size(), /*init_value*/1); + c10::SmallVector view_shape(size_.size(), /*init_value*/1); view_shape[0] = batch_size; std::copy(self_sizes.cbegin() + 1, self_sizes.cend(), view_shape.begin() + 1 + extra_dims); - return std::make_tuple(Func(self_.view(view_shape), size_, implicit), 0); + return std::make_tuple(Func(self_.view_symint(view_shape), size_, implicit), 0); } std::tuple> unfold_batch_rule( @@ -496,6 +503,18 @@ std::tuple> unfold_batch_rule( return std::make_tuple(result, 0); } +std::tuple> narrow_copy_batch_rule( + const Tensor &self, optional self_bdim, int64_t dim, c10::SymInt start, c10::SymInt length) +{ + TORCH_INTERNAL_ASSERT(self_bdim.has_value()); + auto self_ = moveBatchDimToFront(self, self_bdim); + auto logical_rank = rankWithoutBatchDim(self, self_bdim); + dim = maybe_wrap_dim(dim, logical_rank) + 1; + auto result = self_.narrow_copy_symint(dim, start, length); + + return std::make_tuple(result, 0); +} + std::tuple> movedim_batch_rule(const Tensor& self, optional self_bdim, IntArrayRef source, IntArrayRef destination) { auto self_ = moveBatchDimToFront(self, self_bdim); auto source_ = getPhysicalDims(self_, self_bdim.has_value(), source); @@ -511,20 +530,16 @@ std::tuple> diag_embed_batch_rule(const Tensor& self, return std::make_tuple(at::diag_embed(self_, offset, dim1, dim2), 0); } -// We need to write a real batching rule to fully support symint. -// This requires symint variants of other operations, like `view`, -// which don't exist yet. 
-Tensor expand_symint_decomp_hack(const Tensor& self, SymIntArrayRef packed_size, bool implicit) { - auto size = asIntArrayRefSlow(packed_size); - return self.expand(size, implicit); +Tensor trace_decomp(const Tensor& tensor) { + return tensor.diagonal().sum(); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { VMAP_SUPPORT(diag, diag_batch_rule); VMAP_SUPPORT(chunk, chunk_batching_rule); m.impl("flatten.using_ints", static_cast(native::flatten)); VMAP_SUPPORT(flip, flip_batch_rule); - RUN_JIT_DECOMPOSITION(trace) + m.impl("trace", trace_decomp); VMAP_SUPPORT(tril, VARIADIC_BDIMS_BATCH_RULE(ATEN_FN(tril))); VMAP_SUPPORT(triu, VARIADIC_BDIMS_BATCH_RULE(ATEN_FN(triu))); VMAP_SUPPORT(repeat, repeat_batch_rule); @@ -542,6 +557,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { VMAP_SUPPORT(select_backward, select_backward_batch_rule); VMAP_SUPPORT(slice_backward, slice_backward_batch_rule); VMAP_SUPPORT(view, view_batching_rule); + VMAP_SUPPORT(view_copy, view_copy_batch_rule); VMAP_SUPPORT(expand, SINGLE_ARG(expand_batch_rule)); VMAP_SUPPORT(expand_copy, SINGLE_ARG(expand_batch_rule)); VMAP_SUPPORT(unfold, unfold_batch_rule); @@ -549,8 +565,7 @@ TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { VMAP_SUPPORT2(slice, Tensor, slice_batch_rule); VMAP_SUPPORT2(transpose, int, transpose_int_batch_rule); VMAP_SUPPORT(diag_embed, diag_embed_batch_rule); - m.impl("expand.SymInt", expand_symint_decomp_hack); - m.impl("view.SymInt", view_symint_decomposition); + VMAP_SUPPORT(narrow_copy, narrow_copy_batch_rule); } }} diff --git a/functorch/functorch/csrc/BatchedFallback.cpp b/aten/src/ATen/functorch/BatchedFallback.cpp similarity index 97% rename from functorch/functorch/csrc/BatchedFallback.cpp rename to aten/src/ATen/functorch/BatchedFallback.cpp index 6b6c58b243ee..87cdcc0fe9fc 100644 --- a/functorch/functorch/csrc/BatchedFallback.cpp +++ b/aten/src/ATen/functorch/BatchedFallback.cpp @@ -4,12 +4,11 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
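// Illustrative sketch (not part of the patch): the trace_decomp registered above replaces the old
// RUN_JIT_DECOMPOSITION(trace) with a direct C++ decomposition, trace(x) == x.diagonal().sum(),
// which vmap already knows how to batch through the diagonal and sum rules. A quick standalone
// check of that identity on a small row-major matrix (no ATen involved, just the arithmetic):
#include <cassert>
#include <cstddef>
#include <vector>

double trace_via_diagonal(const std::vector<double>& m, std::size_t n) {
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    acc += m[i * n + i];                    // the i-th diagonal element
  }
  return acc;
}

int main() {
  // 3x3 matrix, row-major: trace = 1 + 5 + 9 = 15
  std::vector<double> m = {1, 2, 3, 4, 5, 6, 7, 8, 9};
  assert(trace_via_diagonal(m, 3) == 15.0);
}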
-#include -#include -#include -#include -#include -#include +#include +#include +#include +#include +#include #include #include @@ -268,7 +267,7 @@ void batchedTensorForLoopFallback(const c10::OperatorHandle& op, torch::jit::Sta "We could not generate a fallback."); if (std::none_of(arguments.begin(), arguments.end(), ivalueParticipatesInCurrentLevel)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); op.callBoxed(stack); return; } @@ -354,7 +353,7 @@ void batchedTensorForLoopFallback(const c10::OperatorHandle& op, torch::jit::Sta // argument is a BatchedTensor TORCH_INTERNAL_ASSERT(input_physical_views_iter != input_physical_views.end()); const auto& physical_view_for_argument = *input_physical_views_iter; - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); torch::jit::push(stack, physical_view_for_argument.tensor().index(index)); batched_tensor_inputs_pos_iter++; input_physical_views_iter++; @@ -362,7 +361,7 @@ void batchedTensorForLoopFallback(const c10::OperatorHandle& op, torch::jit::Sta // std::cout << "[Fallback]: "; // at::dump_tensor((*stack)[stack->size() - 1].toTensor()); - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); op.callBoxed(stack); // Store the result into `output_shards`. See NOTE: [Output shards layout] @@ -379,7 +378,7 @@ void batchedTensorForLoopFallback(const c10::OperatorHandle& op, torch::jit::Sta auto output_shards_chunks = MatrixRef(output_shards, num_batches); for (const auto return_idx : c10::irange(0, num_returns)) { auto shards = output_shards_chunks[return_idx]; - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto flat_output = safeStack(shards); // See NOTE [vmap through backward and undefined grad] if (!flat_output.defined()) { diff --git a/functorch/functorch/csrc/BatchedFallback.h b/aten/src/ATen/functorch/BatchedFallback.h similarity index 63% rename from functorch/functorch/csrc/BatchedFallback.h rename to aten/src/ATen/functorch/BatchedFallback.h index 9130245f28b1..05d223568a37 100644 --- a/functorch/functorch/csrc/BatchedFallback.h +++ b/aten/src/ATen/functorch/BatchedFallback.h @@ -12,14 +12,19 @@ namespace at { namespace functorch { +// This file contains code for the vmap fallback (also known as the +// BatchedTensor fallback or the Batched fallback). This code runs +// when an operation doesn't have a batching rule implemented. + // If an operator doesn't have a batching rule implemented then we fallback -// to this implementation. The fallback only works on out-of-place operators -// that return only tensors with new memory. (e.g., no in-place operators, no -// view operations). +// to this implementation. The fallback doesn't work on out= variants or +// view operations; that is, it works for out-of-place operations and +// in-place non-view operations. // -// The fallback effectively takes all of the BatchedTensors in `stack`, slices -// them, and runs `op` on all of the corresponding slices to produce slices -// of the outputs. The output slices then get `torch.stack`ed to create the +// For out-of-place operations, the fallback effectively takes all of the +// BatchedTensors in `stack`, slices them, and runs `op` on all of the +// corresponding slices to produce slices of the outputs. 
The output slices +// then get `torch.stack`ed to create the // final returns. // // The performance of the fallback is not very good because it introduces an @@ -27,11 +32,15 @@ namespace functorch { // write batching rules for operators whenever possible. void batchedTensorForLoopFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack); -bool isVmapFallbackWarningEnabled(); -void setVmapFallbackWarningEnabled(bool enabled); +// The vmap fallback emits a warning by default, but it may be disabled if +// the user finds it to be too annoying. +TORCH_API bool isVmapFallbackWarningEnabled(); +TORCH_API void setVmapFallbackWarningEnabled(bool enabled); -bool isVmapFallbackEnabled(); -void setVmapFallbackEnabled(bool enabled); +// Used for testing. The vmap fallback is enabled by default. When it is disabled, +// it raises an error. +TORCH_API bool isVmapFallbackEnabled(); +TORCH_API void setVmapFallbackEnabled(bool enabled); template A vector_to_result(const std::vector& buffer) { return buffer[0].to(); @@ -43,8 +52,8 @@ template std::tuple vector_to_resu return std::make_tuple(buffer[0].to(), buffer[1].to(), buffer[2].to()); } -// This is a way to call the slow fallback from inside some plumbing -// TODO: Probably better way to metaprogram this +// slow_fallback is a way to call the vmap fallback inside some boxed kernel. +// There is probably some better way to metaprogram this. template Ret slow_fallback(const c10::OperatorHandle& op, ArrayRef args) { std::vector stack(args.begin(), args.end()); diff --git a/functorch/functorch/csrc/BatchedTensorImpl.cpp b/aten/src/ATen/functorch/BatchedTensorImpl.cpp similarity index 58% rename from functorch/functorch/csrc/BatchedTensorImpl.cpp rename to aten/src/ATen/functorch/BatchedTensorImpl.cpp index 487df2900071..c5d6eb34030d 100644 --- a/functorch/functorch/csrc/BatchedTensorImpl.cpp +++ b/aten/src/ATen/functorch/BatchedTensorImpl.cpp @@ -3,51 +3,19 @@ // // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include +#include #include #include -#include #include namespace at { namespace functorch { -BatchedTensorImpl::BatchedTensorImpl(Tensor value, int64_t bdim, int64_t level) - : TensorImpl( - c10::DispatchKeySet(kBatchedKey), - value.dtype(), - value.device() - ) - , value_(std::move(value)) - , level_(level) - , bdim_(bdim) -{ - // TODO: I don't think this ctor gets used. 
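// Illustrative sketch (not part of the patch): the BatchedFallback.h comment above describes the
// for-loop fallback -- slice every batched input along its batch dim, run the unbatched op once
// per slice, then stack the per-slice outputs. A self-contained model of that loop, with
// std::vector standing in for a tensor's batch dimension; all names here are made up and none of
// this is ATen code.
#include <functional>
#include <iostream>
#include <vector>

using Slice = std::vector<double>;                  // one example (no batch dim)
using Batched = std::vector<Slice>;                 // batch dim on the outside

Batched for_loop_fallback(const std::function<Slice(const Slice&)>& op,
                          const Batched& input) {
  Batched out_shards;
  out_shards.reserve(input.size());
  for (const auto& slice : input) {                 // one unbatched call per example
    out_shards.push_back(op(slice));
  }
  return out_shards;                                // "stack" of the output slices
}

int main() {
  Batched x = {{1, 2}, {3, 4}, {5, 6}};
  auto doubled = for_loop_fallback(
      [](const Slice& s) { Slice r; for (double v : s) r.push_back(2 * v); return r; }, x);
  std::cout << doubled[2][1] << '\n';               // 12
}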
- TORCH_INTERNAL_ASSERT(false); - TORCH_INTERNAL_ASSERT(value_.defined()); - set_storage_access_should_throw(); - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); - checkInvariants(); - - const auto public_dims = value_.dim() - 1; - const auto value_sizes = value_.sizes(); - const auto value_strides = value_.strides(); - sizes_and_strides_.resize(public_dims); - for (const auto dim : c10::irange(0, public_dims)) { - auto actual_dim = actualDim(dim, /*wrap_dim=*/false); - sizes_and_strides_.size_at_unchecked(dim) = value_sizes.at(actual_dim); - sizes_and_strides_.stride_at_unchecked(dim) = value_strides.at(actual_dim); - } - storage_offset_= value_.storage_offset(); - refresh_numel(); - refresh_contiguous(); -} - BatchedTensorImpl::BatchedTensorImpl(DispatchKeySet key_set, Tensor value, int64_t bdim, int64_t level) : TensorImpl( - key_set.add(kBatchedKey), + key_set.add(DispatchKey::FuncTorchBatched), value.dtype(), value.device() ) @@ -57,7 +25,7 @@ BatchedTensorImpl::BatchedTensorImpl(DispatchKeySet key_set, Tensor value, int64 { TORCH_INTERNAL_ASSERT(value_.defined()); set_storage_access_should_throw(); - set_sizes_strides_policy(SizesStridesPolicy::CustomStrides); + set_custom_sizes_strides(SizesStridesPolicy::CustomStrides); checkInvariants(); refreshTensorMetadata(); } @@ -82,36 +50,11 @@ int64_t BatchedTensorImpl::actualDim(int64_t dim, bool wrap_dim) const { const auto ndim = sizes_and_strides_.size(); dim = maybe_wrap_dim(dim, ndim); } - auto is_bdim = createBatchDimBitset(bdim_); - - // TODO(vfdev): As BatchedTensorImpl is refactored and has only one dim. - // Below code may be simplified. - - // Example: assume dim = 3, and is_bdim = 10010011000... - // The 1's are batch dims and 0's are normal dims of the underlying value_ Tensor. - // actualDim gives us the index of `dim` in the `value_` Tensor, which is equivalent - // to asking "where does the 3rd (0-indexed) zero occur in the bitset?". - // The answer to that is index 5. - // - // TODO(rzou): the PDEP instruction does exactly this - // (https://stackoverflow.com/questions/7669057/find-nth-set-bit-in-an-int) - // but it might require newer (>= ~2015) CPUs. We should clean this up - // if/when we have dropped support for older CPUs. - int64_t non_bdim_count = 0; - for (int64_t actual_dim = 0; actual_dim < kVmapMaxTensorDims; actual_dim++) { - if (is_bdim[actual_dim]) { - continue; - } - if (non_bdim_count == dim) { - return actual_dim; - } - non_bdim_count++; + if (bdim_ <= dim) { + return dim + 1; + } else { + return dim; } - // If we hit this assert, then that means - // `non_bdim_count` + #num_bdims > kVmapMaxTensorDims. We restrict the number - // of dims a BatchedTensorImpl can have to kVmapMaxTensorDims so this should - // never be hit. 
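// Illustrative sketch (not part of the patch): the simplified actualDim above works because a
// BatchedTensorImpl now carries exactly one batch dim, so a public (logical) dim maps to the
// underlying value_ tensor's dim by skipping over the single hidden bdim_. A standalone check of
// that mapping; the function below is a model, not the real member function.
#include <cassert>
#include <cstdint>

int64_t actual_dim(int64_t logical_dim, int64_t bdim) {
  return bdim <= logical_dim ? logical_dim + 1 : logical_dim;
}

int main() {
  // value_ is 4-D with the batch dim at index 0: logical dims 0,1,2 are physical dims 1,2,3.
  assert(actual_dim(0, /*bdim=*/0) == 1);
  assert(actual_dim(2, /*bdim=*/0) == 3);
  // Batch dim hidden at index 2: logical dims before it are unchanged,
  // logical dims at or after it shift by one.
  assert(actual_dim(1, /*bdim=*/2) == 1);
  assert(actual_dim(2, /*bdim=*/2) == 3);
}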
- TORCH_INTERNAL_ASSERT(false); } void BatchedTensorImpl::checkInvariants() const { @@ -124,6 +67,11 @@ IntArrayRef BatchedTensorImpl::strides_custom() const { return strides_default(); } +SymIntArrayRef BatchedTensorImpl::sym_strides_custom() const { + return sym_strides_default(); +} + + // TODO: implement proper contiguity on batched tensor, then put // sizes_strides_policy back to Default bool BatchedTensorImpl::is_contiguous_custom(at::MemoryFormat memory_format) const { diff --git a/functorch/functorch/csrc/BatchedTensorImpl.h b/aten/src/ATen/functorch/BatchedTensorImpl.h similarity index 82% rename from functorch/functorch/csrc/BatchedTensorImpl.h rename to aten/src/ATen/functorch/BatchedTensorImpl.h index 37294f20695c..320989604570 100644 --- a/functorch/functorch/csrc/BatchedTensorImpl.h +++ b/aten/src/ATen/functorch/BatchedTensorImpl.h @@ -12,9 +12,6 @@ #include #include -#include -#include - namespace at { namespace functorch { @@ -43,8 +40,7 @@ constexpr int64_t kBatchDimsStackSize = 5; // // bt.sizes() returns (5, 7); bt.sum(0) performs a reduction over the (public) // dim 0, which is equivalent to dim 3 in the underlying ones(2, 3, 5, 7) tensor. -struct BatchedTensorImpl : public c10::TensorImpl { - explicit BatchedTensorImpl(Tensor value, int64_t dim, int64_t level); +struct TORCH_API BatchedTensorImpl : public c10::TensorImpl { explicit BatchedTensorImpl(at::DispatchKeySet key_set, Tensor value, int64_t dim, int64_t level); // Returns batch dimension of this tensor @@ -68,6 +64,7 @@ struct BatchedTensorImpl : public c10::TensorImpl { // We have to override this because we opted into CustomStrides IntArrayRef strides_custom() const override; + SymIntArrayRef sym_strides_custom() const override; // Override a bunch of methods inherited from TensorImpl to return error messages. bool is_contiguous_custom(at::MemoryFormat memory_format=at::MemoryFormat::Contiguous) const override; void set_size(int64_t dim, int64_t new_size) override; @@ -78,9 +75,16 @@ struct BatchedTensorImpl : public c10::TensorImpl { #endif void refreshTensorMetadata(); + + // Used in torchdim. torchdim uses non-lexical BatchedTensor; the way it + // accomplishes this is a hack where it is able to modify the levels of + // BatchedTensor to match the level of the current vmap transform. void _unsafe_set_level(int64_t level) { level_ = level; } + + // Used in batching rule for in-place view operations that can change + // the index of the bdim (think squeeze_, unsqueeze_) void unsafe_set_bdim(int64_t bdim) { // NB: you MUST call refreshTensorMetadata after doing this. bdim_ = bdim; @@ -99,7 +103,7 @@ struct BatchedTensorImpl : public c10::TensorImpl { // NB: We use the term "BatchedTensor" to mean a Tensor that is backed with a // BatchedTensorImpl. 
inline bool isBatchedTensor(const Tensor& tensor) { - return tensor.unsafeGetTensorImpl()->key_set().has(kBatchedKey); + return tensor.unsafeGetTensorImpl()->key_set().has(DispatchKey::FuncTorchBatched); } // It is unsafe to call this on a Tensor that is not backed by a @@ -130,11 +134,15 @@ inline std::bitset createVmapLevelsBitset(int64_t level) { } // Use this to construct a BatchedTensor from a regular Tensor -FUNCTORCH_API Tensor makeBatched(const Tensor& tensor, int64_t dim, int64_t level); +TORCH_API Tensor makeBatched(const Tensor& tensor, int64_t dim, int64_t level); // Adds a batch dim to `tensor`, returning a BatchedTensor -FUNCTORCH_API Tensor addBatchDim(const Tensor& tensor, int64_t dim, int64_t level); +TORCH_API Tensor addBatchDim(const Tensor& tensor, int64_t dim, int64_t level); +// Certain dispatch keys must be propagated to the BatchedTensor (or, in general, +// any wrapper Tensor subclasses). This is because there are methods on Tensor +// that skip dispatch and check for the presence of a dispatch key (e.g. is_cpu()). +// TODO: should probably contain more (or all?) backend keys constexpr DispatchKeySet kKeysToPropagateToWrapper({ DispatchKey::Negative, DispatchKey::Conjugate, diff --git a/functorch/functorch/csrc/BatchingMetaprogramming.h b/aten/src/ATen/functorch/BatchingMetaprogramming.h similarity index 92% rename from functorch/functorch/csrc/BatchingMetaprogramming.h rename to aten/src/ATen/functorch/BatchingMetaprogramming.h index e054e58568be..e77960f441fe 100644 --- a/functorch/functorch/csrc/BatchingMetaprogramming.h +++ b/aten/src/ATen/functorch/BatchingMetaprogramming.h @@ -8,6 +8,14 @@ #include #include +// This file contains template metaprogramming things that are used for our +// batching rules. +// +// See NOTE: [vmap plumbing] for more details on why this is necessary. +// The plumbing has a bunch of metaprogramming hacks for determining the signature +// of a batching rule from the signature of the operator, many of which use the +// helper functions in this file. + namespace at { namespace functorch { diff --git a/functorch/functorch/csrc/DynamicLayer.cpp b/aten/src/ATen/functorch/DynamicLayer.cpp similarity index 72% rename from functorch/functorch/csrc/DynamicLayer.cpp rename to aten/src/ATen/functorch/DynamicLayer.cpp index 8bfd388358a0..d152f3c08c2d 100644 --- a/functorch/functorch/csrc/DynamicLayer.cpp +++ b/aten/src/ATen/functorch/DynamicLayer.cpp @@ -4,16 +4,15 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
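A quick note on the BatchedTensorImpl hunks above: after the refactor a BatchedTensorImpl carries exactly one batch dim, so actualDim no longer needs the bitset walk and reduces to a constant-time mapping that skips over bdim_. The standalone sketch below (illustrative only, invented names, not part of the patch) shows that mapping:

// --- illustrative sketch: the simplified actualDim mapping (not patch content) ---
#include <cassert>
#include <cstdint>

// With a single batch dim at position `bdim`, a public (logical) dim maps to the
// underlying value_ tensor's dim by skipping over `bdim`.
static int64_t actual_dim_sketch(int64_t logical_dim, int64_t bdim) {
  return (bdim <= logical_dim) ? logical_dim + 1 : logical_dim;
}

int main() {
  // value_ has shape (B, 3, 5): bdim = 0, logical shape (3, 5)
  assert(actual_dim_sketch(0, 0) == 1);
  assert(actual_dim_sketch(1, 0) == 2);
  // value_ has shape (3, B, 5): bdim = 1, logical shape (3, 5)
  assert(actual_dim_sketch(0, 1) == 0);
  assert(actual_dim_sketch(1, 1) == 2);
  return 0;
}
// --- end sketch ---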
-#include -#include -#include -#include +#include +#include +#include +#include #include #include #include #include -#include #include #include #include @@ -22,8 +21,8 @@ namespace at { namespace functorch { void setDynamicLayerFrontBackKeysIncluded(bool included) { - c10::impl::tls_set_dispatch_key_included(kDynamicLayerFrontModeKey, included); - c10::impl::tls_set_dispatch_key_included(kDynamicLayerBackModeKey, included); + c10::impl::tls_set_dispatch_key_included(DispatchKey::FuncTorchDynamicLayerFrontMode, included); + c10::impl::tls_set_dispatch_key_included(DispatchKey::FuncTorchDynamicLayerBackMode, included); } DynamicLayer::DynamicLayer( @@ -75,8 +74,8 @@ RandomnessType DynamicLayer::randomness() const { return VmapInterpreterPtr(&interpreter_).randomness(); } -constexpr DispatchKeySet kFrontBackKeys({kDynamicLayerBackModeKey, kDynamicLayerFrontModeKey}); - +// Maps level to life handle, see NOTE: [Life handles and lexically scoped transforms] +// for details using DynmetaData = std::unordered_map>; DynmetaData kDynMetaDataSingleton; @@ -84,6 +83,13 @@ static DynmetaData& getGlobalDynmetaData() { return kDynMetaDataSingleton; } +// functorch stores some TLS. Inside the TLS is the stack of transforms. +// Unfortunately, since functorch isn't a part of libtorch, we have +// a level of indirection. FuncTorchTLSBase is the interface that lives in libtorch, +// while FuncTorchTLS implements all the methods and stores data. +// +// TODO: after functorch C++ code is moved into PyTorch, we can get rid of +// this layer of indirection. class FuncTorchTLS : public FuncTorchTLSBase { public: FuncTorchTLS() {} @@ -95,7 +101,7 @@ class FuncTorchTLS : public FuncTorchTLSBase { } int64_t checkSupportsAutogradFunction() const override { - TORCH_CHECK(dynamicLayerStack.size() == 0, + TORCH_CHECK(dynamicLayerStack.size() == 0 || getAutogradFunctionAllowed(), "functorch functions (vmap, grad, vjp, etc.) currently do not support the use of autograd.Function. 
", "Please rewrite your function to not use autograd.Function while we work on fixing this"); return 0; @@ -122,6 +128,7 @@ class FuncTorchTLS : public FuncTorchTLSBase { std::vector dynamicLayerStack; bool allow_inplace_requires_grad_ = false; + bool allow_autograd_function_ = false; }; static FuncTorchTLS* getRawFunctorchTLS() { @@ -145,6 +152,15 @@ bool getInplaceRequiresGradAllowed() { return functorch_tls->allow_inplace_requires_grad_; } +void setAutogradFunctionAllowed(bool allowed) { + auto* functorch_tls = getRawFunctorchTLS(); + functorch_tls->allow_autograd_function_ = allowed; +} + +bool getAutogradFunctionAllowed() { + auto* functorch_tls = getRawFunctorchTLS(); + return functorch_tls->allow_autograd_function_; +} static std::vector& dynamicLayerStackAccessor() { return getRawFunctorchTLS()->dynamicLayerStack; @@ -198,7 +214,7 @@ bool areTransformsActive() { return !data.empty(); } -static DynamicLayer popDynamicLayer() { +DynamicLayer popDynamicLayer() { auto& dynamicLayerStack = dynamicLayerStackAccessor(); TORCH_INTERNAL_ASSERT(dynamicLayerStack.size() > 0); auto result = dynamicLayerStack.back(); @@ -216,7 +232,7 @@ static DynamicLayer popDynamicLayer() { return result; } -static int64_t pushDynamicLayer(DynamicLayer&& dynamic_layer) { +int64_t pushDynamicLayer(DynamicLayer&& dynamic_layer) { auto& dynamicLayerStack = dynamicLayerStackAccessor(); int64_t layerId = 1 + dynamicLayerStack.size(); TORCH_INTERNAL_ASSERT(layerId == dynamic_layer.layerId()); @@ -264,17 +280,11 @@ DynamicLayer popDynamicLayerAndDeleteMetadata() { auto level = result.layerId(); // TODO: is this lock safe? No one else should be writing to the same bucket - // if (c10::show_dispatch_trace_enabled()) { - // std::cout << "deleting metadata" << std::endl; - // } auto& data = getGlobalDynmetaData(); auto it = data.find(level); if (it == data.end()) { return result; } - // if (c10::show_dispatch_trace_enabled()) { - // std::cout << "deleted metadata for level " << level << std::endl; - // } // invalidate the thing *(it->second) = false; data.erase(level); @@ -294,10 +304,19 @@ Tensor unwrapIfDead(const Tensor& tensor) { void foreachTensorInplace(std::vector& args, int64_t begin, int64_t end, std::function func) { + auto func_with_bool = [&](const Tensor& tensor, bool unused) { return func(tensor); }; + foreachTensorInplaceWithFlag(args, begin, end, std::bitset<64>(), func_with_bool); +} + +void foreachTensorInplaceWithFlag(std::vector& args, int64_t begin, int64_t end, + const std::bitset<64> use_flag_relative, std::function func){ TORCH_INTERNAL_ASSERT(begin >= 0); TORCH_INTERNAL_ASSERT(end >= 0); TORCH_INTERNAL_ASSERT(begin <= end); - for (int64_t idx = begin; idx < end; idx++) { + for (int64_t relative_idx = 0; relative_idx < end - begin; relative_idx++) { + const bool flag = use_flag_relative[relative_idx] == 1; + + const auto idx = relative_idx + begin; auto ivalue = args[idx]; // Tensor?[] translates to a c10::List so we need to peek inside List if (ivalue.isList()) { @@ -307,7 +326,7 @@ void foreachTensorInplace(std::vector& args, int64_t begin, int64_t end, for (const auto list_idx : c10::irange(0, list.size())) { const auto& elt = list.get(list_idx); if (elt.isTensor()) { - list.set(list_idx, func(elt.toTensor())); + list.set(list_idx, func(elt.toTensor(), flag)); modified = true; } } @@ -319,7 +338,7 @@ void foreachTensorInplace(std::vector& args, int64_t begin, int64_t end, if (ivalue.isTensorList()) { auto list = ivalue.toTensorList(); for (const auto list_idx : c10::irange(0, list.size())) { - 
list[list_idx] = func(list[list_idx]); + list[list_idx] = func(list[list_idx], flag); } args[idx] = list; } @@ -328,7 +347,7 @@ void foreachTensorInplace(std::vector& args, int64_t begin, int64_t end, continue; } Tensor value = ivalue.toTensor(); - Tensor replacement = func(value); + Tensor replacement = func(value, flag); args[idx] = std::move(replacement); // sanity checks if (ivalue.toTensor().defined()) { @@ -371,6 +390,14 @@ bool isInplaceOp(const FunctionSchema& schema) { return return_alias_info && return_alias_info->isWrite(); } +c10::optional findAliasedOutput(const FunctionSchema& schema, const int64_t immutable_input_idx) { + for (size_t res_idx = 0; res_idx != schema.returns().size(); ++res_idx) { + if (schema.may_contain_alias(SchemaArgument(SchemaArgType::input, immutable_input_idx), SchemaArgument(SchemaArgType::output, res_idx))) { + return res_idx; // for everything currently in native_functions, each input aliases at most one output (tensor list counts as one output) + } + } + return nullopt; +} #ifdef HAS_TORCH_SHOW_DISPATCH_TRACE static void dump_local_tls() { @@ -391,43 +418,34 @@ WithoutTop::~WithoutTop() { pushDynamicLayer(std::move(layer_)); } -// NOTE: [forward-mode AD decompositions hack] -// -// The mechanism is: in DynamicLayerFrontMode, IF we are dispatching on the -// jvp transform, AND we have a decomposition for the operation, then run -// the decomposition. +// NOTE: [functorch front and back key fallbacks] // -// Let's break that down. There are a douple of moving pieces. +// Please read NOTE: [functorch interpreter stack] first for some context. +// The following doc also provides some visuals: +// https://docs.google.com/document/d/14qyaa3xIjmVxYiMLlIlQErunYgR_uR1WupsKMZlnGY4/edit // -// 0. How do we know what transform we're dispatching on? -// Easy, check the top of the DynamicLayerStack and read the transform. +// functorch's "stack of transforms" is implemented as the following: +// - each transform is associated with one or more dispatch keys in the PyTorch +// dispatcher. For example, vmap -> {FuncTorchBatched, FuncTorchVmapMode}, +// Autograd -> {Autograd{Backend}, ADInplaceOrView} +// - Whenever a functorch transform is active, the FuncTorchDynamicLayer{Front, Back}Mode +// keys are added to the dispatcher's local dispatch key set. // -// 1. Next, we must identify when an operation (e.g. nll_loss_backward) -// gets dispatched to. -// - register a special kernel to the DynamicLayerFrontMode key -// (see JVP_DECOMP) -// - that special kernel invokes dynamicLayerFrontFallbackOperator with -// an arg indicating we're going to use a decomp +// DynamicLayerFrontMode is responsible for: +// 1. selecting the transform that is at the top of the stack and grabbing its +// interpreter +// 2. Calling interpreter.process(), which does the following: +// 2a. enables/disables a bunch of dispatch keys, so that the only dispatch +// keys that are enabled are the ones that belong to the transform. +// 2b. redispatching // -// 2. Next, we need to call the decomposition. See call_decomposition_for_jvp. -// We currently use python decompositions that we torchscript. +// Eventually, DynamicLayerBackMode captures the redispatch from the transforms. +// DynamicLayerBackMode is responsible for: +// - redirecting back to DynamicLayerFrontMode -// Ideally c10::OperatorHandle would have a field like this -// to identify the operator. -// The stuff here should map 1:1 with the operator name. 
-// aten::nll_loss_backward -> nll_loss_backward -// aten::add.Tensor -> add_Tensor - -static void call_decomposition_for_jvp( +static void dynamicLayerFrontFallback( const c10::OperatorHandle& op, torch::jit::Stack* stack) { - run_jit_decomposition(op, stack); -} - -static void dynamicLayerFrontFallbackOperator( - const c10::OperatorHandle& op, - torch::jit::Stack* stack, - bool decomp_jvp) { auto& dynamicLayerStack = dynamicLayerStackAccessor(); TORCH_INTERNAL_ASSERT(dynamicLayerStack.size() > 0); #ifdef HAS_TORCH_SHOW_DISPATCH_TRACE @@ -436,13 +454,6 @@ static void dynamicLayerFrontFallbackOperator( dump_local_tls(); } #endif - - // Hack: if jvp and we have a decomposition registered, then do the decomposition - if (dynamicLayerStack.back().interpreter().key() == TransformType::Jvp && - decomp_jvp) { - return call_decomposition_for_jvp(op, stack); - } - // Save the current LocalDispatchKeySet (to the current DynamicLayer). // Upon exiting the current scope, that LocalDispatchKeySet gets restored. // When the current DynamicLayer dispatches to the next (inner) DynamicLayer, @@ -462,50 +473,44 @@ restoreLocalDispatchKeySetRAII(const c10::impl::LocalDispatchKeySet& key_set) { return c10::impl::ForceDispatchKeyGuard(key_set); } -void dynamicLayerFrontFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - return dynamicLayerFrontFallbackOperator(op, stack, false); +// right now grad_special_case as a bool is sufficient because this is the only special case for grad. If we need to add +// more special cases, it's more scalable to add an enum to know which op we're looking at without looking at the schema +void dynamicLayerBack(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case) { + auto& layer = dynamicLayerStackAccessor().back(); + auto restore_guard = restoreLocalDispatchKeySetRAII(layer.interpreter().getSavedLocalDispatchKeySet()); + WithoutTop guard; + + layer.interpreter().sendToNextInterpreter(op, stack, grad_special_case); } -void dynamicLayerFrontFallBackWithDecomp( - const c10::OperatorHandle& op, - torch::jit::Stack* stack) { - return dynamicLayerFrontFallbackOperator(op, stack, true); +// used for functions that have aliasing operations but should be treated like they're out of place (i.e. lift_fresh) +void dynamicLayerBackGradSpecialCase(const c10::OperatorHandle& op, torch::jit::Stack* stack) { + return dynamicLayerBack(op, stack, true); } void dynamicLayerBackFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - auto& layer = dynamicLayerStackAccessor().back(); - auto restore_guard = restoreLocalDispatchKeySetRAII(layer.interpreter().getSavedLocalDispatchKeySet()); - WithoutTop guard; - - layer.interpreter().sendToNextInterpreter(op, stack); + return dynamicLayerBack(op, stack, false); } -TORCH_LIBRARY_IMPL(_, FT_DYNAMIC_LAYER_FRONT_MODE_KEY, m) { +TORCH_LIBRARY_IMPL(_, FuncTorchDynamicLayerFrontMode, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&dynamicLayerFrontFallback>()); } -TORCH_LIBRARY_IMPL(_, FT_DYNAMIC_LAYER_BACK_MODE_KEY, m) { +TORCH_LIBRARY_IMPL(_, FuncTorchDynamicLayerBackMode, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&dynamicLayerBackFallback>()); } -#define JVP_DECOMP(op) \ - m.impl(#op, torch::CppFunction::makeFromBoxedFunction<&dynamicLayerFrontFallBackWithDecomp>()); -#define JVP_DECOMP2(op, overload) \ - m.impl(#op "." 
#overload, torch::CppFunction::makeFromBoxedFunction<&dynamicLayerFrontFallBackWithDecomp>()); +#define SPECIAL_GRAD_CASE(op) \ + m.impl(#op, torch::CppFunction::makeFromBoxedFunction<&dynamicLayerBackGradSpecialCase>()); -TORCH_LIBRARY_IMPL(aten, FT_DYNAMIC_LAYER_FRONT_MODE_KEY, m) { - JVP_DECOMP(nll_loss_backward); - JVP_DECOMP(nll_loss2d_backward); - JVP_DECOMP(_log_softmax_backward_data); - JVP_DECOMP(_softmax_backward_data); - OP_DECOMPOSE(log_sigmoid); - JVP_DECOMP(log_sigmoid_forward); - JVP_DECOMP(native_layer_norm_backward); - JVP_DECOMP(native_batch_norm_backward); - JVP_DECOMP(cudnn_batch_norm_backward); +TORCH_LIBRARY_IMPL(aten, FuncTorchDynamicLayerBackMode, m) { + // lift_fresh: it must be freshly allocated and should be wrapped. User shouldn't have access to input version + // alias: this is needed for the CompositeImplicit instance norm (running_mean/var get set to be a wrapped value) + // It's not a user-facing function, but is more prone to possible errors + SPECIAL_GRAD_CASE(lift_fresh); + SPECIAL_GRAD_CASE(alias); } - } } // namespace at diff --git a/aten/src/ATen/functorch/DynamicLayer.h b/aten/src/ATen/functorch/DynamicLayer.h new file mode 100644 index 000000000000..6c7139f5c01e --- /dev/null +++ b/aten/src/ATen/functorch/DynamicLayer.h @@ -0,0 +1,131 @@ +// Copyright (c) Facebook, Inc. and its affiliates. +// All rights reserved. +// +// This source code is licensed under the BSD-style license found in the +// LICENSE file in the root directory of this source tree. + +#pragma once +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +// Forward declared +namespace c10 { struct AutogradMetaInterface; } + +namespace at { +namespace functorch { + +// This file contains the implementation of functorch's interpreter stack. +// See NOTE: [functorch interpreter stack] first before reading on. +// +// NB: the functorch interpreter stack is also referred to as: +// - the "dynamic layer stack" -- an older name for "interpreter" was +// "dynamic layer". +// - the "functorch mode stack". You can think of each functorch transform as a +// "mode" (in the same sense as torch_dispatch mode or torch_function mode), +// and functorch being an implementation of a "mode stack" where the modes +// may be arbitrarily composed. + +// DynamicLayer is basically the same thing as an Interpreter. +// It represents a functorch transform and it holds an Interpreter, +// which contains metadata related to the transform and instructions on +// how to perform the transform. +// +// TODO: we can excise DynamicLayer in favor of Interpreter, +// but I am going to leave it for now as a compatibility shim to avoid +// needing to refactor a lot of callsites...
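As a rough mental model of the round-trip described in NOTE: [functorch front and back key fallbacks] above, the toy program below (illustrative only, all names invented) walks a stack of interpreters from the top down and back up, which is the order in which grad(vmap(f)) processes an operator:

// --- illustrative sketch: toy interpreter stack (not patch content) ---
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct ToyInterpreter {
  std::string name;
};

int main() {
  // grad(vmap(f)): the interpreter stack is [Grad, Vmap], with Vmap on top.
  std::vector<ToyInterpreter> stack{{"Grad"}, {"Vmap"}};

  // "front fallback": take the top interpreter, set up its keys, redispatch.
  // "back fallback": pop to the next interpreter and repeat.
  std::function<void(std::size_t)> redispatch = [&](std::size_t depth) {
    if (depth == 0) {
      std::cout << "run the underlying kernel\n";
      return;
    }
    const ToyInterpreter& interp = stack[depth - 1];
    std::cout << "process under " << interp.name << "\n";
    redispatch(depth - 1);  // sendToNextInterpreter
    std::cout << "return through " << interp.name << "\n";
  };
  redispatch(stack.size());
  return 0;
}
// --- end sketch ---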
+struct TORCH_API DynamicLayer { + explicit DynamicLayer( + TransformType transform_type, + int64_t layerId, + optional batchSize = nullopt, + optional randomness = nullopt, + optional prev_grad_mode = nullopt, + optional pre_fwd_grad_mode = nullopt, + optional functionalize_add_back_views = nullopt); + + TransformType key() const; + int64_t layerId() const; + + const Interpreter& interpreter() const { return interpreter_; } + Interpreter& interpreter() { return interpreter_; } + + // Only valid for vmap + int64_t batchSize() const; + RandomnessType randomness() const; + + private: + Interpreter interpreter_; +}; + +TORCH_API int64_t initAndPushDynamicLayer( + TransformType transform_type, + optional batch_size = nullopt, + optional randomness = nullopt, + optional prev_grad_mode = nullopt, + optional prev_fwd_grad_mode = nullopt, + optional functionalize_add_back_views = nullopt); +TORCH_API DynamicLayer popDynamicLayerAndDeleteMetadata(); +TORCH_API c10::optional maybeCurrentDynamicLayer(); +TORCH_API const std::vector& getDynamicLayerStack(); +TORCH_API void setDynamicLayerStack(const std::vector& stack); +TORCH_API void setDynamicLayerFrontBackKeysIncluded(bool included); + +// NB: Not lock safe, you should only call this from Python where the GIL will +// prevent race conditions. +TORCH_API bool areTransformsActive(); + +// NOTE: [Life handles and lexically scoped transforms] +// functorch transforms are lexically scoped. +// Given a level, we store a "life handle" that is a boolean that tells us if the +// transform with that level is active or not. +// +// functorch's TensorWrapper (for grad transforms) stores a life handle. +// If a TensorWrapper escapes from the scope of the transform, then somehow +// it must know it escaped; it can tell by querying the life handle. +// +// NB: not lock safe. TODO: does it need a lock? +TORCH_API std::shared_ptr getLifeHandleForLevel(int64_t level); + +// Returns if an operator is in-place. An operator is inplace if: +// 1. The first argument is a Tensor and it is being written to +// 2. The first argument is being returned +// 3. No other arguments are aliased +// Here is an example of an in-place operator: +// add_(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) +TORCH_API bool isInplaceOp(const c10::FunctionSchema& schema); + +// Given the indices of unwrapped inputs and the schema, this returns the indices of any outputs that should remain unwrapped +TORCH_API c10::optional findAliasedOutput(const FunctionSchema& schema, const int64_t immutable_input); + +TORCH_API Tensor unwrapIfDead(const Tensor& tensor); + +// Pretty printers +TORCH_API std::ostream& operator<<(std::ostream& os, const DynamicLayer& layer); +TORCH_API std::ostream& operator<<(std::ostream& os, const std::vector& dynamicLayerStack); + +// While a functorch transform is active, autograd.Function is disabled +// by default. The following two APIs are APIs for enabling +// autograd.Function. These are not user-facing APIs. +TORCH_API void setAutogradFunctionAllowed(bool allowed); +TORCH_API bool getAutogradFunctionAllowed(); + +// While a functorch grad transform is active, Tensor.requires_grad_() gets +// disabled. These two functions are the mechanism to controlling that. 
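The declarations continue below; first, a hedged usage sketch of the isInplaceOp and findAliasedOutput helpers declared above. It assumes torch::jit::parseSchema is available to build a FunctionSchema from a schema string; the schema literals are only examples and the expected results are noted in comments:

// --- illustrative sketch: exercising isInplaceOp / findAliasedOutput (not patch content) ---
#include <ATen/functorch/DynamicLayer.h>
#include <torch/csrc/jit/frontend/function_schema_parser.h>
#include <iostream>

int main() {
  auto inplace = torch::jit::parseSchema(
      "aten::add_(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!)");
  auto out_of_place = torch::jit::parseSchema(
      "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor");

  std::cout << at::functorch::isInplaceOp(inplace) << "\n";       // expected: 1
  std::cout << at::functorch::isInplaceOp(out_of_place) << "\n";  // expected: 0

  // For the in-place schema, input 0 aliases output 0.
  auto aliased = at::functorch::findAliasedOutput(inplace, /*immutable_input=*/0);
  if (aliased.has_value()) {
    std::cout << *aliased << "\n";  // expected: 0
  }
  return 0;
}
// --- end sketch ---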
+TORCH_API void setInplaceRequiresGradAllowed(bool allowed); +TORCH_API bool getInplaceRequiresGradAllowed(); + +TORCH_API DynamicLayer popDynamicLayer(); +TORCH_API int64_t pushDynamicLayer(DynamicLayer&& layer); + +} +} // namespace at diff --git a/functorch/functorch/csrc/FunctionalizeInterpreter.cpp b/aten/src/ATen/functorch/FunctionalizeInterpreter.cpp similarity index 94% rename from functorch/functorch/csrc/FunctionalizeInterpreter.cpp rename to aten/src/ATen/functorch/FunctionalizeInterpreter.cpp index 4242305636cf..40e22c455509 100644 --- a/functorch/functorch/csrc/FunctionalizeInterpreter.cpp +++ b/aten/src/ATen/functorch/FunctionalizeInterpreter.cpp @@ -1,5 +1,5 @@ -#include -#include +#include +#include #include namespace at { namespace functorch { @@ -47,7 +47,8 @@ void FunctionalizeInterpreterPtr::processImpl( void FunctionalizeInterpreterPtr::sendToNextInterpreterImpl( const c10::OperatorHandle& op, - torch::jit::Stack* stack) { + torch::jit::Stack* stack, + bool grad_special_case) { // For now, we don't support nested functionalization calls. // This check just enforces that - after the functionalize kernel runs // and we hit the BackModeFallback, we'll have unwrapped our FunctionalTensors diff --git a/functorch/functorch/csrc/FunctionalizeInterpreter.h b/aten/src/ATen/functorch/FunctionalizeInterpreter.h similarity index 75% rename from functorch/functorch/csrc/FunctionalizeInterpreter.h rename to aten/src/ATen/functorch/FunctionalizeInterpreter.h index 5475b38f068f..4157eb82d84f 100644 --- a/functorch/functorch/csrc/FunctionalizeInterpreter.h +++ b/aten/src/ATen/functorch/FunctionalizeInterpreter.h @@ -1,14 +1,17 @@ #pragma once -#include +#include namespace at { namespace functorch { +// This is the interpreter that handles the functionalize() transform. +// See NOTE: [functorch interpreter stack] for more details. 
+ struct FunctionalizeInterpreterPtr { explicit FunctionalizeInterpreterPtr(const Interpreter* base): base_(base) { TORCH_INTERNAL_ASSERT(base->key() == TransformType::Functionalize); } TransformType key() const { return base_->key(); } int64_t level() const { return base_->level(); } void processImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); - void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); + void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case); bool functionalizeAddBackViews() const { return c10::get(base_->meta()).functionalizeAddBackViews_; } diff --git a/functorch/functorch/csrc/Interpreter.cpp b/aten/src/ATen/functorch/Interpreter.cpp similarity index 75% rename from functorch/functorch/csrc/Interpreter.cpp rename to aten/src/ATen/functorch/Interpreter.cpp index cce9fa05f70e..2531a49d5f19 100644 --- a/functorch/functorch/csrc/Interpreter.cpp +++ b/aten/src/ATen/functorch/Interpreter.cpp @@ -1,9 +1,9 @@ -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include +#include +#include namespace at { namespace functorch { @@ -12,18 +12,18 @@ static DispatchKeySet get_all_dynlayer_keyset() { // "all dispatch keys between DynamicLayer{Front, Back}Mode, inclusive" auto result = - DispatchKeySet(DispatchKeySet::FULL_AFTER, kDynamicLayerFrontModeKey) - - DispatchKeySet(DispatchKeySet::FULL_AFTER, kDynamicLayerBackModeKey); - result = result | DispatchKeySet({kDynamicLayerFrontModeKey}); + DispatchKeySet(DispatchKeySet::FULL_AFTER, DispatchKey::FuncTorchDynamicLayerFrontMode) - + DispatchKeySet(DispatchKeySet::FULL_AFTER, DispatchKey::FuncTorchDynamicLayerBackMode); + result = result | DispatchKeySet({DispatchKey::FuncTorchDynamicLayerFrontMode}); // Hack: don't handle the autocast dispatch keys. Their interaction with functorch // is weird. result = result - autocast_dispatch_keyset; - // Hack: don't handle kVmapModeKey. We need a better way of modeling this. - // In e.g. grad(vmap(f)), kVmapModeKey makes it so that all random operations, + // Hack: don't handle DispatchKey::FuncTorchVmapMode. We need a better way of modeling this. + // In e.g. grad(vmap(f)), DispatchKey::FuncTorchVmapMode makes it so that all random operations, // even after we are done handling the vmap layer, error out. - result = result.remove(kVmapModeKey); + result = result.remove(DispatchKey::FuncTorchVmapMode); return result; } @@ -34,10 +34,10 @@ static DispatchKeySet all_dynlayer_keyset = get_all_dynlayer_keyset(); static DispatchKeySet keysForEnteringDynamicLayer(TransformType key) { if (key == TransformType::Vmap) { - // NB: Does not include kVmapModeKey. We may modulate the key when + // NB: Does not include DispatchKey::FuncTorchVmapMode. We may modulate the key when // constructing the DynamicLayer, but we don't control it when entering/exiting // the DynamicLayer. 
- return DispatchKeySet({kBatchedKey}); + return DispatchKeySet({DispatchKey::FuncTorchBatched}); } else if (key == TransformType::Grad || key == TransformType::Jvp) { return autograd_dispatch_keyset.add(DispatchKey::ADInplaceOrView); } else if (key == TransformType::Functionalize) { @@ -49,7 +49,7 @@ static DispatchKeySet keysForEnteringDynamicLayer(TransformType key) { DispatchKeySet keysToExcludeWhenEnteringDynamicLayer(TransformType key) { DispatchKeySet exclude = all_dynlayer_keyset; - exclude = exclude.remove(kDynamicLayerBackModeKey); + exclude = exclude.remove(DispatchKey::FuncTorchDynamicLayerBackMode); exclude = exclude - keysForEnteringDynamicLayer(key); return exclude; } @@ -115,8 +115,8 @@ void Interpreter::process(const c10::OperatorHandle& op, torch::jit::Stack* stac INTERPRETER_DISPATCH(key_, SINGLE_ARG(processImpl(op, stack))); } -void Interpreter::sendToNextInterpreter(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - INTERPRETER_DISPATCH(key_, SINGLE_ARG(sendToNextInterpreterImpl(op, stack))); +void Interpreter::sendToNextInterpreter(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case) { + INTERPRETER_DISPATCH(key_, SINGLE_ARG(sendToNextInterpreterImpl(op, stack, grad_special_case))); } }} diff --git a/functorch/functorch/csrc/Interpreter.h b/aten/src/ATen/functorch/Interpreter.h similarity index 91% rename from functorch/functorch/csrc/Interpreter.h rename to aten/src/ATen/functorch/Interpreter.h index 2a1a426824b1..f521e26f2b64 100644 --- a/functorch/functorch/csrc/Interpreter.h +++ b/aten/src/ATen/functorch/Interpreter.h @@ -1,14 +1,11 @@ #pragma once -// variant.h doesn't clean up after itself... -#include -#undef DECLTYPE_AUTO - -#include -#include +#include #include #include #include +#include +#include namespace at { namespace functorch { @@ -143,7 +140,7 @@ struct Interpreter { const InterpreterMeta& meta() const { return meta_; } void process(const c10::OperatorHandle& op, torch::jit::Stack* stack); - void sendToNextInterpreter(const c10::OperatorHandle& op, torch::jit::Stack* stack); + void sendToNextInterpreter(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case); void saveLocalDispatchKeySet(c10::impl::LocalDispatchKeySet keyset) { TORCH_INTERNAL_ASSERT(!savedLocalDispatchKeySet_.has_value()); @@ -178,6 +175,16 @@ struct Interpreter { void foreachTensorInplace(std::vector& args, int64_t begin, int64_t end, std::function func); +// Applies the following for-loop: +// for i in range(begin, end): +// if use_flag_relative[i] == 1: <-- treats use_flag_relative as a bitset +// args[i] = func(args[i], i - begin, true) +// args[i] = func(args[i], i - begin) +void foreachTensorInplaceWithFlag(std::vector& args, int64_t begin, int64_t end, + const std::bitset<64> use_flag_relative, std::function func); + +std::vector findUnwrappedInputs(std::vector& args, int64_t begin, int64_t end); + DispatchKeySet keysToExcludeWhenEnteringDynamicLayer(TransformType key); void setup_dispatch_key_tls(DispatchKeySet exclude, DispatchKeySet include); diff --git a/functorch/functorch/csrc/LegacyBatchingRegistrations.cpp b/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp similarity index 82% rename from functorch/functorch/csrc/LegacyBatchingRegistrations.cpp rename to aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp index 6de2a4000ede..8456bf0008fa 100644 --- a/functorch/functorch/csrc/LegacyBatchingRegistrations.cpp +++ b/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp @@ -7,15 +7,14 @@ #include 
#include #include -#include +#include -#include -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include +#include +#include namespace at { namespace functorch { @@ -23,6 +22,10 @@ namespace functorch { // NOTE: [What is a batching rule?] // +// NB: the following description only applies to this file and is about +// the legacy (deprecated) batching rule API. Please see writing_batch_rules.md +// for how to write new-style batching rules. +// // This files contains batching rules written with the legacy (now-deprecated) // batching rule API. // Please try to use the new-style batching rule API (see writing_batch_rules.md) @@ -61,23 +64,20 @@ namespace functorch { // to do steps (1), (2), and (4). // (see NOTE: [What is an VmapTransform?] in VmapTransforms.h) -// Note: [Future plans] -// The API for writing a batching rule isn't stable. In the future, we'd like -// to think about the problem of translating these batching rules to TorchScript. -// Ideally batching rules in eager mode vs TorchScript would look pretty similar, -// if not use the same mechanism. In order to accomplish that we might have to -// do some refactoring. - // PyTorch allows operations to specify dim 0 and dim -1 on a scalar tensor. static bool is_allowed_dim_on_scalar_tensor(int64_t dim) { return dim == 0 || dim == -1; } -// This check should probably go into the dispatcher... -static bool participatesInCurrentLevel(const Tensor& self) { +static int64_t get_current_level() { auto maybe_level = maybeCurrentDynamicLayer(); TORCH_INTERNAL_ASSERT(maybe_level.has_value()); - auto current_level = maybe_level->layerId(); + return maybe_level->layerId(); +} + +// This check should probably go into the dispatcher... +static bool participatesInCurrentLevel(const Tensor& self) { + auto current_level = get_current_level(); auto* maybe_batched_impl = maybeGetBatchedImpl(self); if (!maybe_batched_impl) { return false; @@ -87,7 +87,7 @@ static bool participatesInCurrentLevel(const Tensor& self) { return self_level == current_level; } -static bool participatesInCurrentLevel(TensorList self) { +static bool participatesInCurrentLevel(ITensorListRef self) { for (const Tensor& tensor : self) { if (participatesInCurrentLevel(tensor)) { return true; @@ -109,7 +109,7 @@ bool isPhysicalScalarTensor(const Tensor& logical_tensor) { std::vector chunk_batching_rule(const Tensor& self, int64_t chunks, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.chunk(chunks, dim); } @@ -122,7 +122,7 @@ std::vector chunk_batching_rule(const Tensor& self, int64_t chunks, int6 std::vector tensor_split_sections_batching_rule(const Tensor& self, int64_t sections, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::tensor_split(self, sections, dim); } auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -134,7 +134,7 @@ std::vector tensor_split_sections_batching_rule(const Tensor& self, int6 std::vector tensor_split_indices_batching_rule(const Tensor& self, IntArrayRef indices, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::tensor_split(self, indices, dim); } 
auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -146,7 +146,7 @@ std::vector tensor_split_indices_batching_rule(const Tensor& self, IntAr Tensor& squeeze_dim__batching_rule(Tensor& self, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.squeeze_(dim); } auto* batched = maybeGetBatchedImpl(self); @@ -180,7 +180,7 @@ Tensor& squeeze_dim__batching_rule(Tensor& self, int64_t dim) { Tensor& squeeze__batching_rule(Tensor& self) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.squeeze_(); } auto* batched = maybeGetBatchedImpl(self); @@ -217,7 +217,7 @@ Tensor& squeeze__batching_rule(Tensor& self) { Tensor& unsqueeze__batching_rule(Tensor& self, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.unsqueeze_(dim); } auto* batched = maybeGetBatchedImpl(self); @@ -237,7 +237,7 @@ Tensor& unsqueeze__batching_rule(Tensor& self, int64_t dim) { Tensor& transpose__batching_rule(Tensor& self, int64_t dim0, int64_t dim1) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.transpose_(dim0, dim1); } auto* batched = maybeGetBatchedImpl(self); @@ -269,7 +269,7 @@ Tensor& transpose__batching_rule(Tensor& self, int64_t dim0, int64_t dim1) { Tensor& fill_inplace_scalar_batching_rule(Tensor& self, Scalar value) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.fill_(value); } auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -299,7 +299,7 @@ Tensor& zero_inplace_batching_rule(Tensor &self) { Tensor transpose_int_batching_rule(const Tensor& self, int64_t dim0, int64_t dim1) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::transpose(self, dim0, dim1); } // PyTorch has a special case where scalar_tensor.transpose(dim0, dim1) works @@ -324,7 +324,7 @@ static int64_t getGradInputPhysicalDim(int64_t dim, IntArrayRef input_sizes, int Tensor select_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) { if (!participatesInCurrentLevel(grad)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::select_backward(grad, input_sizes, dim, index); } auto grad_physical = MultiBatchVmapTransform::logicalToPhysical(grad); @@ -336,7 +336,7 @@ Tensor select_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes Tensor slice_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t start, int64_t end, int64_t step) { if (!participatesInCurrentLevel(grad)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::slice_backward(grad, input_sizes, dim, start, end, step); } auto grad_physical = 
MultiBatchVmapTransform::logicalToPhysical(grad); @@ -348,7 +348,7 @@ Tensor slice_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes, std::vector split_batching_rule(const Tensor& self, int64_t split_size, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::split(self, split_size, dim); } auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -360,7 +360,7 @@ std::vector split_batching_rule(const Tensor& self, int64_t split_size, std::vector split_with_sizes_batching_rule(const Tensor& self, IntArrayRef split_sizes, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return split_with_sizes(self, split_sizes, dim); } auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -372,7 +372,7 @@ std::vector split_with_sizes_batching_rule(const Tensor& self, IntArrayR std::vector unbind_batching_rule(const Tensor& self, int64_t dim) { if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::unbind(self, dim); } auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self); @@ -382,35 +382,11 @@ std::vector unbind_batching_rule(const Tensor& self, int64_t dim) { return result; } -// Checks that the smallest batch stride is greater than the largest example -// stride. This is something we can support but we choose not to because it's -// potentially error prone. -static void checkBatchDimsAtFrontInLayout(IntArrayRef physical_strides, int64_t num_batch_dims) { - auto smallest_batch_stride = std::min_element( - physical_strides.begin(), physical_strides.begin() + num_batch_dims); - auto largest_example_stride = std::max_element( - physical_strides.begin() + num_batch_dims, physical_strides.end()); - if (largest_example_stride == physical_strides.end()) { - // No example dimensions - return; - } - if (num_batch_dims == 1 && physical_strides.size() > 0 && physical_strides[0] == 0) { - // degenerate batch dim - return; - } - TORCH_CHECK(*smallest_batch_stride >= *largest_example_stride, - "vmap: Calling Tensor.as_strided is not supported unless the batch dims being ", - "vmapped over are at the front of the tensor (in memory layout). When they are ", - "not at the front of the tensor this operation can be error prone so we " - "actively discourage it; please file us a bug report and/or try to ", - "express the as_strided operation in terms of PyTorch view operations"); -} - // given (sizes, strides, storage_offset) returns the maximum location that // can be indexed (or nullopt if such a location doesn't exist, e.g., tensors // with zero-size dims). 
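The function itself follows immediately below; as plain arithmetic (illustrative only, invented names), the largest flat index such a view can reach is storage_offset plus the sum over dims of (size - 1) * stride, provided no size is zero:

// --- illustrative sketch: the arithmetic behind maximum_indexable_location (not patch content) ---
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int64_t max_indexable_location_sketch(const std::vector<int64_t>& sizes,
                                      const std::vector<int64_t>& strides,
                                      int64_t storage_offset) {
  int64_t loc = storage_offset;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] == 0) {
      return -1;  // stand-in for nullopt: a zero-size dim means nothing is indexable
    }
    loc += (sizes[i] - 1) * strides[i];
  }
  return loc;
}

int main() {
  // A (2, 3) view with strides (3, 1) and storage offset 4 can reach up to 4 + 1*3 + 2*1 = 9.
  std::cout << max_indexable_location_sketch({2, 3}, {3, 1}, 4) << "\n";  // 9
  return 0;
}
// --- end sketch ---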
-static optional maximum_indexable_location( - IntArrayRef sizes, IntArrayRef strides, int64_t storage_offset) { +static optional maximum_indexable_location( + c10::SymIntArrayRef sizes, c10::SymIntArrayRef strides, c10::SymInt storage_offset) { auto result = native::storage_size_for(sizes, strides); if (result == 0) { return nullopt; @@ -425,12 +401,12 @@ static optional maximum_indexable_location( static void checkBasicAsStridedValidForSlice( const Tensor& physical_tensor, int64_t num_batch_dims, - IntArrayRef sizes, - IntArrayRef strides, - optional maybe_storage_offset) { - auto slice_sizes = physical_tensor.sizes().slice(num_batch_dims); - auto slice_strides = physical_tensor.strides().slice(num_batch_dims); - auto base_offset = physical_tensor.storage_offset(); + c10::SymIntArrayRef sizes, + c10::SymIntArrayRef strides, + optional maybe_storage_offset) { + auto slice_sizes = physical_tensor.sym_sizes().slice(num_batch_dims); + auto slice_strides = physical_tensor.sym_strides().slice(num_batch_dims); + auto base_offset = physical_tensor.sym_storage_offset(); auto storage_offset = maybe_storage_offset.value_or(base_offset); @@ -442,7 +418,7 @@ static void checkBasicAsStridedValidForSlice( } if (!max_slice_loc.has_value()) { TORCH_CHECK(false, - "result = tensor.as_strided(", sizes, ",", strides, ",", storage_offset, ")", + "result = tensor.as_strided(", sizes, ", ", strides, ", ", storage_offset, ") ", "can access memory outside of `tensor`. `tensor` has no storage but the ", "passed-in (size, stride, storage_offset) imply a result with some storage. ", "This is not supported inside of vmap, please try to rewrite the ", @@ -451,11 +427,11 @@ static void checkBasicAsStridedValidForSlice( TORCH_CHECK( *max_as_strided_loc <= *max_slice_loc && base_offset <= storage_offset, - "result = tensor.as_strided(", sizes, ",", strides, ",", storage_offset, ")", - "can access memory outside of `tensor`. `result` can access some", + "result = tensor.as_strided(", sizes, ", ", strides, ", ", storage_offset, ") ", + "can access memory outside of `tensor`. `result` can access some ", "memory in range [", storage_offset, ", ", *max_as_strided_loc, "], but ", "`tensor` can only access some memory in range [", base_offset, ", ", - *max_slice_loc, "]. This is not supported inside of vmap, please try to", + *max_slice_loc, "]. This is not supported inside of vmap, please try to ", "rewrite the `as_strided` call as a sequence of PyTorch view operations"); } @@ -483,12 +459,12 @@ static void checkBasicAsStridedValidForSlice( // >>> z = [x[i].as_strided([1], [1], 1 + x[i].storage_offset() - 1) for i in range(4)] Tensor as_strided_batching_rule( const Tensor& tensor, - IntArrayRef sizes, - IntArrayRef strides, - optional storage_offset) { + c10::SymIntArrayRef sizes, + c10::SymIntArrayRef strides, + optional storage_offset) { if (!participatesInCurrentLevel(tensor)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); - return at::as_strided(tensor, sizes, strides, storage_offset); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); + return at::as_strided_symint(tensor, sizes, strides, storage_offset); } auto physical_view = MultiBatchVmapTransform::logicalToPhysical(tensor); auto num_batch_dims = physical_view.numBatchDims(); @@ -502,18 +478,15 @@ Tensor as_strided_batching_rule( "same length! Got size ", sizes, " and stride ", strides); // Sanity checks: - // 1. 
All batch dims are at the front in memory layout (not necessary for - // correctness, but we are worried the user might be doing crazy things) - // 2. as_strided(sizes, strides, storage_offset + tensor[i].offset() - tensor.offset()) + // 1. as_strided(sizes, strides, storage_offset + tensor[i].offset() - tensor.offset()) // is valid for a slice of the input tensor. // See Note: [When will the as_strided batching rule fail?] for details. - checkBatchDimsAtFrontInLayout(physical_tensor.strides(), num_batch_dims); checkBasicAsStridedValidForSlice( physical_tensor, num_batch_dims, sizes, strides, storage_offset); // physical_strides = physical tensor's batch strides + (logical) strides auto batch_strides = physical_tensor.strides().slice(0, num_batch_dims); - VmapDimVector physical_strides; + SymDimVector physical_strides; physical_strides.reserve(num_batch_dims + strides.size()); physical_strides.insert( physical_strides.end(), batch_strides.begin(), batch_strides.end()); @@ -525,7 +498,7 @@ Tensor as_strided_batching_rule( // xs.as_strided(physical_sizes, physical_strides, offset) always succeeds // and creates a tensor y such that each y[i] references the same memory // locations as zi. See NOTE: [When will the as_strided batching rule fail?] - auto result = physical_view.tensor().as_strided( + auto result = physical_view.tensor().as_strided_symint( physical_sizes, physical_strides, storage_offset); return physical_view.getPhysicalToLogicalMap().apply(result); } @@ -618,7 +591,7 @@ Tensor as_strided_batching_rule( template Tensor unwrap_and_call(const Tensor& input, ExtraArgs... args) { if (!participatesInCurrentLevel(input)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return Func(input, args...); } // guard against the user passing in a batch of scalar tensors with batch @@ -630,7 +603,7 @@ Tensor unwrap_and_call(const Tensor& input, ExtraArgs... args) { template Tensor unwrap_and_call_method(const Tensor& input, ExtraArgs... extra_args) { if (!participatesInCurrentLevel(input)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return (input.*Func)(extra_args...); } auto* input_batched = unsafeGetBatchedImpl(input); @@ -638,23 +611,76 @@ Tensor unwrap_and_call_method(const Tensor& input, ExtraArgs... extra_args) { return makeBatched(output_physical, input_batched->bdim(), input_batched->level()); } -Tensor cat_batching_rule(TensorList tensors, int64_t dim) { +Tensor cat_batching_rule(const ITensorListRef& tensors, int64_t dim) { if (!participatesInCurrentLevel(tensors)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::cat(tensors, dim); } - auto physical_views = MultiBatchVmapTransform::logicalToPhysical(tensors); - auto physical_tensors = fmap( - physical_views, [](const VmapPhysicalView& view) -> Tensor { return view.tensor(); }); - TORCH_INTERNAL_ASSERT( - tensors.size() > 0, "The dispatcher should not have dispatched here otherwise."); - auto result = at::cat(physical_tensors, physical_views[0].getPhysicalDim(dim)); - return physical_views[0].getPhysicalToLogicalMap().apply(result); + + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); + + // NB: Probably bad for perf that we're allocating std::vectors for each level, but + // what can you do. 
+ auto materialized = tensors.materialize(); + dim = at::legacy_cat_wrap_dim(dim, materialized); + + // Strategy: + // we're going to unwrap tensors, move their batch dims to the front, + // and put them into `tensors_to_cat`. Tensors that don't have a batch dim + // will get one forced onto them. + // + // Then, we'll do at::cat(tensors_to_cat, ...). + // + // There's a special case where at::cat ignores tensors that have logical shape + // [0]. If we see a Tensor that has logical shape [0] (but physical shape [B, 0]), + // we'll just slice the tensor to get a Tensor of shape [0] to pass to at::cat. + std::vector tensors_to_cat; + tensors_to_cat.reserve(tensors.size()); + c10::optional bdim_size = c10::nullopt; + + // find the bdim size. Might not exist if all BatchedTensors should be skipped + // by cat's special case. + for (const auto& tensor : tensors) { + if (!participatesInCurrentLevel(tensor)) { + continue; + } + if (at::native::cat_should_skip_tensor(tensor)) { + continue; + } + const auto* batched = unsafeGetBatchedImpl(tensor); + bdim_size = batched->value().size(batched->bdim()); + break; + } + + // unwrap batchedtensors; expand out bdims + for (const auto& tensor : tensors) { + if (!participatesInCurrentLevel(tensor)) { + if (at::native::cat_should_skip_tensor(tensor) || !bdim_size.has_value()) { + tensors_to_cat.emplace_back(tensor); + continue; + } + tensors_to_cat.emplace_back(ensure_has_bdim(tensor, /*has_bdim*/false, *bdim_size)); + continue; + } + const auto* batched = unsafeGetBatchedImpl(tensor); + if (at::native::cat_should_skip_tensor(tensor)) { + // Special case: slice the tensor to get something of shape [0] to pass to cat + // We slice instead of allocate a new tensor to propagate requires_gradness... + tensors_to_cat.emplace_back(batched->value().select(/*dim=*/batched->bdim(), /*index=*/0)); + continue; + } + tensors_to_cat.emplace_back(moveBatchDimToFront(batched->value(), batched->bdim())); + } + + auto new_dim = bdim_size.has_value() ? dim + 1 : dim; + c10::optional new_bdim = bdim_size.has_value() ? 
c10::make_optional((int64_t)0) : nullopt; + auto result = at::cat(tensors_to_cat, new_dim); + return makeBatched(result, new_bdim, get_current_level()); } Tensor block_diag_batching_rule(TensorList tensors) { if (!participatesInCurrentLevel(tensors)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::block_diag(tensors); } auto physical_views = MultiBatchVmapTransform::logicalToPhysical(tensors); @@ -682,7 +708,7 @@ Tensor block_diag_batching_rule(TensorList tensors) { Tensor stack_batching_rule(TensorList tensors, int64_t dim) { if (!participatesInCurrentLevel(tensors)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return at::stack(tensors, dim); } auto physical_views = MultiBatchVmapTransform::logicalToPhysical(tensors); @@ -700,14 +726,17 @@ Tensor stack_batching_rule(TensorList tensors, int64_t dim) { Tensor new_empty_strided_batching_rule( const Tensor& self, - IntArrayRef size, - IntArrayRef stride, + SymIntArrayRef sym_size, + SymIntArrayRef sym_stride, optional dtype, optional layout, optional device, optional pin_memory) { + + auto size = c10::asIntArrayRefSlow(sym_size); + auto stride = c10::asIntArrayRefSlow(sym_stride); if (!participatesInCurrentLevel(self)) { - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); return self.new_empty_strided( size, stride, dtype, layout, device, pin_memory); } @@ -763,25 +792,17 @@ Tensor new_empty_strided_batching_rule( return physical_view.getPhysicalToLogicalMap().apply(result); } -bool BatchedTensor_is_leaf(const Tensor& self) { - if (torch::autograd::impl::get_autograd_meta(self)) { - return torch::autograd::impl::get_autograd_meta(self)->grad_fn_ == nullptr; - } else { - return true; - } -} - Tensor& BatchedTensor_requires_grad_(Tensor& self, bool requires_grad) { self.set_requires_grad(requires_grad); return self; } -TORCH_LIBRARY_IMPL(_, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(_, FuncTorchBatched, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&batchedTensorForLoopFallback>()); } -TORCH_LIBRARY_IMPL(aten, FT_BATCHED_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchBatched, m) { // still legacy b/c teturns multiple tensors m.impl("tensor_split.sections", tensor_split_sections_batching_rule); m.impl("tensor_split.indices", tensor_split_indices_batching_rule); diff --git a/functorch/functorch/csrc/LegacyVmapTransforms.cpp b/aten/src/ATen/functorch/LegacyVmapTransforms.cpp similarity index 88% rename from functorch/functorch/csrc/LegacyVmapTransforms.cpp rename to aten/src/ATen/functorch/LegacyVmapTransforms.cpp index 3b57bd35e52e..682169a52622 100644 --- a/functorch/functorch/csrc/LegacyVmapTransforms.cpp +++ b/aten/src/ATen/functorch/LegacyVmapTransforms.cpp @@ -4,8 +4,8 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
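To make the shape bookkeeping of the rewritten cat_batching_rule above concrete, here is a small standalone sketch (illustrative only, invented names, no ATen dependency) of what happens when one input is batched at the current level and one is not:

// --- illustrative sketch: cat batching rule shape bookkeeping (not patch content) ---
#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  const int64_t B = 4;                       // vmapped batch size
  std::vector<int64_t> batched{B, 2, 3};     // physical shape, bdim already moved to front
  std::vector<int64_t> plain{2, 3};          // not batched at the current level

  // ensure_has_bdim: the unbatched input gets a size-B leading dim so shapes line up.
  std::vector<int64_t> plain_expanded{B, plain[0], plain[1]};

  // A logical cat over dim 0 becomes a physical cat over dim 1, since a bdim now sits at dim 0.
  const int64_t logical_dim = 0;
  const int64_t new_dim = logical_dim + 1;
  assert(new_dim == 1);

  // at::cat along new_dim would produce physical shape (B, 2 + 2, 3) = (B, 4, 3) ...
  std::vector<int64_t> result{B, batched[1] + plain_expanded[1], 3};
  assert(result == (std::vector<int64_t>{B, 4, 3}));
  // ... which is then re-wrapped with makeBatched(result, /*bdim=*/0, current level).
  return 0;
}
// --- end sketch ---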
-#include -#include +#include +#include #include #include @@ -76,6 +76,15 @@ VmapDimVector VmapPhysicalView::getPhysicalShape(IntArrayRef logical_shape) cons return result; } +SymDimVector VmapPhysicalView::getPhysicalShape(c10::SymIntArrayRef logical_shape) const { + SymDimVector result; + result.reserve(logical_shape.size() + numBatchDims()); + auto tensor_sizes = tensor_.sym_sizes(); + result.insert(result.end(), tensor_sizes.begin(), tensor_sizes.begin() + numBatchDims()); + result.insert(result.end(), logical_shape.begin(), logical_shape.end()); + return result; +} + static std::tuple computeFrontBatchDimsFromLevels(std::bitset levels_bitset) { int64_t level = 0; int64_t dim = 0; @@ -109,7 +118,7 @@ static Tensor moveDimToFrontAndExpand(Tensor tensor, optional dim, int6 // 4. Expand each physical tensor so that they have output batch size equal // to `batch_sizes` VmapPhysicalViewVec -MultiBatchVmapTransform::logicalToPhysical(TensorList logical_tensors) { +MultiBatchVmapTransform::logicalToPhysical(ITensorListRef logical_tensors) { auto cur_level = maybeCurrentDynamicLayer().value().layerId(); auto bdim_size = -1; @@ -134,12 +143,12 @@ MultiBatchVmapTransform::logicalToPhysical(TensorList logical_tensors) { auto* batched = maybeGetBatchedImpl(logical_tensor); if (!batched || (batched->level() != cur_level)) { // Unsqueeze dim 0, expand it to the correct shape - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto value = moveDimToFrontAndExpand(logical_tensor, {}, bdim_size); result.emplace_back(std::move(value), levels); continue; } - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto physical = batched->value(); auto value = moveDimToFrontAndExpand(physical, batched->bdim(), bdim_size); result.emplace_back(std::move(value), levels); @@ -189,12 +198,12 @@ VmapPhysicalViewVec BroadcastingVmapTransform::logicalToPhysical(TensorList logi auto* batched = maybeGetBatchedImpl(logical_tensor); if (!batched || (batched->level() != cur_level)) { // Unsqueeze dim 0, expand it to the correct shape - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto value = moveDimToFrontAndUnsqueeze(logical_tensor, {}, max_example_dim); result.emplace_back(std::move(value), levels); continue; } - c10::impl::ExcludeDispatchKeyGuard guard(kBatchedKey); + c10::impl::ExcludeDispatchKeyGuard guard(DispatchKey::FuncTorchBatched); auto physical = batched->value(); auto value = moveDimToFrontAndUnsqueeze(physical, batched->bdim(), max_example_dim); result.emplace_back(std::move(value), levels); diff --git a/functorch/functorch/csrc/LegacyVmapTransforms.h b/aten/src/ATen/functorch/LegacyVmapTransforms.h similarity index 95% rename from functorch/functorch/csrc/LegacyVmapTransforms.h rename to aten/src/ATen/functorch/LegacyVmapTransforms.h index 443c4e867de2..5fc05b6c8038 100644 --- a/functorch/functorch/csrc/LegacyVmapTransforms.h +++ b/aten/src/ATen/functorch/LegacyVmapTransforms.h @@ -6,8 +6,8 @@ #pragma once -#include -#include +#include +#include namespace at { namespace functorch { @@ -62,9 +62,9 @@ using VmapDimVector = SmallVector; // permutes all of the batch dims to the front of the tensor, aligns // and expands the batch dims to match each other (according to their `level`), // and returns a VmapPhysicalView on the tensor(s). 
-struct FUNCTORCH_API MultiBatchVmapTransform { +struct TORCH_API MultiBatchVmapTransform { static VmapPhysicalView logicalToPhysical(const Tensor& logical_tensor); - static VmapPhysicalViewVec logicalToPhysical(TensorList logical_tensors); + static VmapPhysicalViewVec logicalToPhysical(ITensorListRef logical_tensors); }; // VmapTransform for operators that broadcast all inputs. @@ -86,7 +86,7 @@ struct FUNCTORCH_API MultiBatchVmapTransform { // actually *need* to return a tensor of size (1, 2) for the second tensor // because the broadcasting operation takes care of that for us, but we do // it anyways to keep things simple. -struct FUNCTORCH_API BroadcastingVmapTransform { +struct TORCH_API BroadcastingVmapTransform { static VmapPhysicalViewVec logicalToPhysical(TensorList logical_tensors); }; @@ -118,7 +118,7 @@ struct VmapPhysicalToLogicalMap; // ^ // | // levels: 012345 -struct FUNCTORCH_API VmapPhysicalView { +struct TORCH_API VmapPhysicalView { VmapPhysicalView(Tensor&& tensor, std::bitset levels) : levels_(levels), tensor_(tensor) { // TORCH_INTERNAL_ASSERT(!isBatchedTensor(tensor)); @@ -146,6 +146,7 @@ struct FUNCTORCH_API VmapPhysicalView { // Maps a logical shape to a physical shape by pre-pending the batch // sizes to the logical shape. VmapDimVector getPhysicalShape(IntArrayRef logical_shape) const; + SymDimVector getPhysicalShape(c10::SymIntArrayRef logical_shape) const; int64_t numBatchDims() const; @@ -160,7 +161,7 @@ struct FUNCTORCH_API VmapPhysicalView { // to a logical one (BatchedTensor). It holds some levels that are used to do the // mapping and assumes that the batch dimensions in the physical tensor all // occur at the front of the tensor. -struct FUNCTORCH_API VmapPhysicalToLogicalMap { +struct TORCH_API VmapPhysicalToLogicalMap { VmapPhysicalToLogicalMap(std::bitset levels): levels_(levels) {} // Maps a physical tensor to a new logical tensor (BatchedTensor). diff --git a/aten/src/ATen/functorch/Macros.h b/aten/src/ATen/functorch/Macros.h new file mode 100644 index 000000000000..eb0a763261bf --- /dev/null +++ b/aten/src/ATen/functorch/Macros.h @@ -0,0 +1,3 @@ +#pragma once + +#define SINGLE_ARG(...) __VA_ARGS__ diff --git a/functorch/functorch/csrc/PlumbingHelper.cpp b/aten/src/ATen/functorch/PlumbingHelper.cpp similarity index 91% rename from functorch/functorch/csrc/PlumbingHelper.cpp rename to aten/src/ATen/functorch/PlumbingHelper.cpp index e75fb82a3864..5dd01d0abbcb 100644 --- a/functorch/functorch/csrc/PlumbingHelper.cpp +++ b/aten/src/ATen/functorch/PlumbingHelper.cpp @@ -4,9 +4,9 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. -#include -#include -#include +#include +#include +#include namespace at { namespace functorch { @@ -50,7 +50,7 @@ bool isBatchedAtLevel(const c10::optional& maybe_tensor, int64_t level) return isBatchedAtLevel(*maybe_tensor, level); } -bool isBatchedAtLevel(TensorList tensors, int64_t level) { +bool isBatchedAtLevel(ITensorListRef tensors, int64_t level) { for (const auto& tensor : tensors) { if (isBatchedAtLevel(tensor, level)) { return true; diff --git a/aten/src/ATen/functorch/PlumbingHelper.h b/aten/src/ATen/functorch/PlumbingHelper.h new file mode 100644 index 000000000000..9eb486a6eefa --- /dev/null +++ b/aten/src/ATen/functorch/PlumbingHelper.h @@ -0,0 +1,61 @@ +// Copyright (c) Facebook, Inc. and its affiliates. +// All rights reserved. 
+// +// This source code is licensed under the BSD-style license found in the +// LICENSE file in the root directory of this source tree. +#pragma once +#include +#include +#include + +// NOTE: [vmap plumbing] +// +// Here's how "batching rules" work. +// - we register kernels to the Batched key +// - these kernels have the same signatures as the original operators. +// For example, at::sin(Tensor self) accepts a Tensor, and the batched kernel +// must also accept a Tensor +// - However, it is more natural for users to write a batching rule like the +// following: sin_batch_rule(Tensor self, optional self_bdim) +// - There is some codegenerated layer (the "plumbing") that wraps the user +// defined batching rule (e.g. sin_batch_rule) in a kernel that can be +// registered to the Batched key. +// +// The plumbing is responsible for wrapping a batching rule into a form that may +// be registered as the kernel for the batched key. + +namespace at { namespace functorch { + +// Create a BatchedTensor given a tensor, bdim, and level +TORCH_API Tensor makeBatched(const Tensor& tensor, optional bdim, int64_t level); + +// Given a Tensor that may or may not be a BatchedTensor, unwrap it. +// If `tensor` is not a BatchedTensor, or is a BatchedTensor but the level +// doesn't match, then this returns (tensor, nullopt). +// Otherwise, it returns (unwrap(tensor), bdim). +TORCH_API std::tuple> unwrapTensorAtLevel(const Tensor& tensor, int64_t level); + +// Creates a vector of BatchedTensor +TORCH_API std::vector makeBatchedVector(const std::vector& tensors, optional bdim, int64_t level); + +// Returns True if ANY tensor in tensors is batched at level +TORCH_API bool isBatchedAtLevel(ITensorListRef tensors, int64_t level); +TORCH_API bool isBatchedAtLevel(const c10::List> maybe_tensors, int64_t level); +TORCH_API bool isBatchedAtLevel(const Tensor& tensor, int64_t level); +TORCH_API bool isBatchedAtLevel(const c10::optional& maybe_tensor, int64_t level); + +// Convenience helper. Returns true if any tensor is batched at level +TORCH_API bool areAnyBatchedAtLevel(ArrayRef> maybe_tensors, int64_t level); + +inline bool ivalueParticipatesInCurrentLevel(const IValue& ivalue) { + if (ivalue.isTensor()) { + auto maybe_level = maybeCurrentDynamicLayer(); + TORCH_INTERNAL_ASSERT(maybe_level.has_value()); + auto current_level = maybe_level->layerId(); + return isBatchedAtLevel(ivalue.toTensor(), current_level); + } + // TODO: should really check this + return false; +} + +}} diff --git a/functorch/functorch/csrc/PyTorchOperatorHacks.cpp b/aten/src/ATen/functorch/PyTorchOperatorHacks.cpp similarity index 95% rename from functorch/functorch/csrc/PyTorchOperatorHacks.cpp rename to aten/src/ATen/functorch/PyTorchOperatorHacks.cpp index 0bde1f53d254..9f76253f81fc 100644 --- a/functorch/functorch/csrc/PyTorchOperatorHacks.cpp +++ b/aten/src/ATen/functorch/PyTorchOperatorHacks.cpp @@ -1,26 +1,28 @@ -#include -#include +#include #include #include #include -#include -#include +#include +#include #include #include #include #include #include +#include namespace at { namespace functorch { -// TODO: all of these should be fixed in a more blessed way. In particular, -// it is bad if any of these go out-of-sync with the implementations in -// pytorch/pytorch. +// NOTE: [functorch's PyTorch Operator Hacks] // // This file contains hacks for composite PyTorch operators that are problematic. // For example, the composite op might have in-place operations, // or call data_ptr. 
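A simplified sketch of the plumbing idea from NOTE: [vmap plumbing] above. Everything below is a stand-in for illustration (a scalar "Tensor" and toy bodies for sin_batch_rule, unwrapTensorAtLevel and makeBatched); it is not the real ATen/functorch API. The point is that the generated plumbing kernel keeps the operator's signature while delegating to a user-written batch rule.

#include <cmath>
#include <cstdint>
#include <optional>
#include <utility>

// Stand-in "Tensor": a scalar value plus optional batch-dim and level metadata.
struct Tensor { double v = 0.0; std::optional<int64_t> bdim; int64_t level = -1; };

// User-written batching rule: operates on the unwrapped value and the position
// of its batch dimension (nullopt if the input was not batched).
std::pair<Tensor, std::optional<int64_t>> sin_batch_rule(
    const Tensor& self, std::optional<int64_t> self_bdim) {
  return {Tensor{std::sin(self.v)}, self_bdim};
}

// Toy versions of the helpers declared in PlumbingHelper.h.
std::pair<Tensor, std::optional<int64_t>> unwrapTensorAtLevel(const Tensor& t, int64_t level) {
  if (t.level == level) return {Tensor{t.v}, t.bdim};
  return {t, std::nullopt};
}
Tensor makeBatched(Tensor t, std::optional<int64_t> bdim, int64_t level) {
  t.bdim = bdim;
  t.level = level;
  return t;
}

// The "plumbing" kernel has the operator's signature, so it could be registered
// to the batched dispatch key: unwrap the input, call the rule, re-wrap the result.
Tensor sin_plumbing(const Tensor& self, int64_t cur_level) {
  auto [value, bdim] = unwrapTensorAtLevel(self, cur_level);
  auto [out, out_bdim] = sin_batch_rule(value, bdim);
  return makeBatched(out, out_bdim, cur_level);
}

int main() {
  sin_plumbing(Tensor{1.0, 0, 3}, /*cur_level=*/3);
}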
We have some idea of how to fix these things in the long term -// (e.g. functionalization for the in-place operations). +// e.g., upstream the changes to PyTorch. +// +// TODO: all of these should be fixed in a more blessed way. In particular, +// it is bad if any of these go out-of-sync with the implementations in +// pytorch/pytorch. // TODO: upstream into core Tensor index_select_backward_hack(const Tensor& grad, IntArrayRef self_sizes, int64_t dim, const Tensor& index) { @@ -79,8 +81,8 @@ Tensor linear_hack(const Tensor& input, const Tensor& weight, const c10::optiona return at::mkldnn_linear(input, weight, *bias); } #if defined(C10_MOBILE) - if (xnnpack::use_linear(input, weight, *bias)) { - return xnnpack::linear(input, weight, *bias); + if (at::native::xnnpack::use_linear(input, weight, *bias)) { + return at::native::xnnpack::linear(input, weight, *bias); } #endif if (input.dim() == 2 && bias->defined()) { @@ -288,7 +290,7 @@ Tensor& feature_alpha_dropout_(Tensor& input, double p, bool train) { } // dropout_hack -TORCH_LIBRARY_IMPL(aten, FT_DYNAMIC_LAYER_FRONT_MODE_KEY, m) { +TORCH_LIBRARY_IMPL(aten, FuncTorchDynamicLayerFrontMode, m) { m.impl("index_select_backward", index_select_backward_hack); m.impl("linear", linear_hack); m.impl("binary_cross_entropy_with_logits", binary_cross_entropy_with_logits_hack); diff --git a/functorch/functorch/csrc/TensorWrapper.cpp b/aten/src/ATen/functorch/TensorWrapper.cpp similarity index 89% rename from functorch/functorch/csrc/TensorWrapper.cpp rename to aten/src/ATen/functorch/TensorWrapper.cpp index 054be6495c37..afd79943051e 100644 --- a/functorch/functorch/csrc/TensorWrapper.cpp +++ b/aten/src/ATen/functorch/TensorWrapper.cpp @@ -4,9 +4,9 @@ // This source code is licensed under the BSD-style license found in the // LICENSE file in the root directory of this source tree. 
-#include -#include -#include +#include +#include +#include #include #include @@ -62,7 +62,7 @@ c10::intrusive_ptr makeTensorWrapperPtr(const Tensor& tensor, int auto keys_to_propagate = kKeysToPropagateToWrapper | DispatchKeySet({ DispatchKey::AutogradCPU, DispatchKey::AutogradCUDA, DispatchKey::AutogradXLA}); auto key_set = getKeysToPropagateToWrapper(tensor, keys_to_propagate); - key_set = key_set.add(kGradWrapperKey); + key_set = key_set.add(DispatchKey::FuncTorchGradWrapper); if (should_be_alive) { return c10::make_intrusive(key_set, tensor, level, getLifeHandleForLevel(level)); } else { @@ -70,7 +70,7 @@ c10::intrusive_ptr makeTensorWrapperPtr(const Tensor& tensor, int } } -Tensor makeTensorWrapper(const Tensor& tensor, int64_t level) { +Tensor makeTensorWrapper(const Tensor& tensor, int64_t level, bool is_immutable) { auto wrapped = maybeGetTensorWrapper(tensor); if (wrapped) { TORCH_INTERNAL_ASSERT(wrapped->level() < level); @@ -79,10 +79,10 @@ Tensor makeTensorWrapper(const Tensor& tensor, int64_t level) { auto keys_to_propagate = kKeysToPropagateToWrapper | DispatchKeySet({ DispatchKey::AutogradCPU, DispatchKey::AutogradCUDA, DispatchKey::AutogradXLA}); auto key_set = getKeysToPropagateToWrapper(tensor, keys_to_propagate); - key_set = key_set.add(kGradWrapperKey); + key_set = key_set.add(DispatchKey::FuncTorchGradWrapper); auto life_handle = getLifeHandleForLevel(level); - auto result = at::detail::make_tensor(key_set, tensor, level, std::move(life_handle)); - TORCH_INTERNAL_ASSERT(result.key_set().has(kGradWrapperKey)); + auto result = at::detail::make_tensor(key_set, tensor, level, std::move(life_handle), is_immutable); + TORCH_INTERNAL_ASSERT(result.key_set().has(DispatchKey::FuncTorchGradWrapper)); return result; } @@ -121,10 +121,12 @@ TensorWrapper::TensorWrapper( Tensor value, int64_t level, std::shared_ptr is_alive, + bool is_immutable, bool use_value_sizes_strides) : TensorImpl(key_set, value.dtype(), value.device()) , value_(std::move(value)) , level_(level) + , is_immutable_(is_immutable) , is_alive_(std::move(is_alive)) { TORCH_INTERNAL_ASSERT(value_.defined()); @@ -154,7 +156,7 @@ const char* TensorWrapper::tensorimpl_type_name() const { TensorWrapper* maybeGetTensorWrapper(const Tensor& tensor) { - if (!tensor.key_set().has(kGradWrapperKey)) { + if (!tensor.key_set().has(DispatchKey::FuncTorchGradWrapper)) { return nullptr; } return (TensorWrapper*)(tensor.unsafeGetTensorImpl()); @@ -184,7 +186,7 @@ void dead_tensor_wrapper_fallback(const c10::OperatorHandle& op, torch::jit::Sta // TensorWrapper backend fallback: Unwrap and fallthrough. -TORCH_LIBRARY_IMPL(_, FT_GRAD_WRAPPER_KEY, m) { +TORCH_LIBRARY_IMPL(_, FuncTorchGradWrapper, m) { m.fallback(torch::CppFunction::makeFromBoxedFunction<&dead_tensor_wrapper_fallback>()); } diff --git a/aten/src/ATen/functorch/TensorWrapper.h b/aten/src/ATen/functorch/TensorWrapper.h new file mode 100644 index 000000000000..25da91fd88e8 --- /dev/null +++ b/aten/src/ATen/functorch/TensorWrapper.h @@ -0,0 +1,97 @@ +// Copyright (c) Facebook, Inc. and its affiliates. +// All rights reserved. +// +// This source code is licensed under the BSD-style license found in the +// LICENSE file in the root directory of this source tree. + +#pragma once + +#include +#include + +namespace at { +namespace functorch { + +// NOTE: [functorch's TensorWrapper] +// +// Taking better suggestions for a name. TensorWrapper is the wrapper Tensor +// Subclass for functorch's grad-based transforms (grad, vjp, jvp). 
It is +// analogous to how vmap uses BatchedTensor as the wrapper Tensor subclass. +// +// If you're familiar with the Tensor-Variable merge, TensorWrapper is effectively +// another Variable. +// +// Consider grad(grad(torch.sin))(x). This wraps `x` as TensorWrapper(TensorWrapper(x)). +// The reason why is so that each TensorWrapper can hold its own AutogradMeta and +// participate in a **separate** autograd graph. +// +// There are alternative designs we could have chosen (e.g. each grad transform +// stores a weak map of Tensor -> AutogradMeta); the benefit of the TensorWrapper +// design is that we can re-use existing VariableType kernels (i.e. Autograd kernels) +// without much modification. Since a TensorWrapper looks like a regular Tensor, +// the VariableType kernel can pull out the AutogradMeta struct from where it +// expects and extend the autograd graph + +struct TORCH_API TensorWrapper : public c10::TensorImpl { + explicit TensorWrapper( + c10::DispatchKeySet key_set, + Tensor value, + int64_t level, + std::shared_ptr is_alive, + bool is_immutable = false, // if true, this came from an operation that aliases an immutable tensor + bool use_value_sizes_strides = true); + + // Override a bunch of methods inherited from TensorImpl to return error messages + void set_size(int64_t dim, int64_t new_size) override; + void set_stride(int64_t dim, int64_t new_stride) override; + void set_storage_offset(int64_t storage_offset) override; + + void refreshMetadata(); + + const Tensor& value() const { + return value_; + } + optional level() const { + if (is_alive()) { + return level_; + } + return {}; + } + bool is_immutable() const { + return is_immutable_; + } + bool is_alive() const; + + // Overrides necessary for autograd + c10::intrusive_ptr shallow_copy_and_detach( + const c10::VariableVersion& version_counter, + bool allow_tensor_metadata_change) const override; + c10::intrusive_ptr shallow_copy_and_detach( + c10::VariableVersion&& version_counter, + bool allow_tensor_metadata_change) const override; + void shallow_copy_from(const c10::intrusive_ptr& impl) override; + + private: + const char* tensorimpl_type_name() const override; + Tensor value_; + int64_t level_; + bool is_immutable_; + + // TensorWrapper receives a boolean flag on whether or not the Grad Interpreter + // that created it is still alive or not. + // If the Grad Interpreter is no longer alive then it attempts to behave like + // a regular Tensor. + // + // When we exit the level, this wrapper may be marked as "not alive". 
+ // Wrappers that are not alive: + // 1) May still have autograd metadata on them + // 2) Forward dispatches to the underlying value() + std::shared_ptr is_alive_; +}; + +TORCH_API Tensor makeTensorWrapper(const Tensor& tensor, int64_t level, bool is_immutable=false); +TORCH_API TensorWrapper* maybeGetTensorWrapper(const Tensor& tensor); +TORCH_API void dumpTensor(std::ostream & ss, const Tensor& tensor); +TORCH_API void dumpTensorCout(const Tensor& tensor); +} +} // namespace at diff --git a/functorch/functorch/csrc/VmapInterpreter.cpp b/aten/src/ATen/functorch/VmapInterpreter.cpp similarity index 68% rename from functorch/functorch/csrc/VmapInterpreter.cpp rename to aten/src/ATen/functorch/VmapInterpreter.cpp index a8f0283aa3b7..a7db8f13a031 100644 --- a/functorch/functorch/csrc/VmapInterpreter.cpp +++ b/aten/src/ATen/functorch/VmapInterpreter.cpp @@ -1,5 +1,5 @@ -#include -#include +#include +#include namespace at { namespace functorch { @@ -7,13 +7,14 @@ void VmapInterpreterPtr::processImpl( const c10::OperatorHandle& op, torch::jit::Stack* stack) { DispatchKeySet exclude = keysToExcludeWhenEnteringDynamicLayer(TransformType::Vmap); - setup_dispatch_key_tls(exclude, DispatchKeySet(kVmapModeKey)); + setup_dispatch_key_tls(exclude, DispatchKeySet(DispatchKey::FuncTorchVmapMode)); op.callBoxed(stack); } void VmapInterpreterPtr::sendToNextInterpreterImpl( const c10::OperatorHandle& op, - torch::jit::Stack* stack) { + torch::jit::Stack* stack, + bool grad_special_case) { // Re-dispatch if (getDynamicLayerStack().size() == 0) { sanityCheckStack(op, stack); diff --git a/functorch/functorch/csrc/VmapInterpreter.h b/aten/src/ATen/functorch/VmapInterpreter.h similarity index 76% rename from functorch/functorch/csrc/VmapInterpreter.h rename to aten/src/ATen/functorch/VmapInterpreter.h index 084cea956b28..2e4e6fff212f 100644 --- a/functorch/functorch/csrc/VmapInterpreter.h +++ b/aten/src/ATen/functorch/VmapInterpreter.h @@ -1,14 +1,17 @@ #pragma once -#include +#include namespace at { namespace functorch { +// This is the interpreter that handles the functionalize() transform. +// See NOTE: [functorch interpreter stack] for more details. + struct VmapInterpreterPtr { explicit VmapInterpreterPtr(const Interpreter* base): base_(base) { TORCH_INTERNAL_ASSERT(base->key() == TransformType::Vmap); } TransformType key() const { return base_->key(); } int64_t level() const { return base_->level(); } void processImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); - void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack); + void sendToNextInterpreterImpl(const c10::OperatorHandle& op, torch::jit::Stack* stack, bool grad_special_case); int64_t batchSize() const { return c10::get(base_->meta()).batchSize_; } diff --git a/functorch/functorch/csrc/VmapModeRegistrations.cpp b/aten/src/ATen/functorch/VmapModeRegistrations.cpp similarity index 83% rename from functorch/functorch/csrc/VmapModeRegistrations.cpp rename to aten/src/ATen/functorch/VmapModeRegistrations.cpp index 922b06e93db4..53c8c01ee7c7 100644 --- a/functorch/functorch/csrc/VmapModeRegistrations.cpp +++ b/aten/src/ATen/functorch/VmapModeRegistrations.cpp @@ -6,13 +6,17 @@ #include #include -#include -#include -#include -#include -#include +#include +#include +#include +#include #include +// functorch's vmap has two Dispatch Keys that implement it: +// FuncTorchBatched and FuncTorchVmapMode. 
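A hedged sketch of one way such error-out registrations can be expressed (this is not the actual content of VmapModeRegistrations.cpp): a boxed kernel that raises an error is installed for the FuncTorchVmapMode key, so an unsupported operator reaching that key fails with a clear message. The kernel name and message are illustrative; the TORCH_LIBRARY_IMPL / makeFromBoxedFunction pattern mirrors the registrations elsewhere in this patch and assumes the PyTorch headers are available.

#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/library.h>

namespace {
// Boxed kernel: works for any operator schema, and simply errors out.
void vmap_mode_error_fallback(const c10::OperatorHandle& op, torch::jit::Stack*) {
  TORCH_CHECK(false, op.schema().name(),
              ": not supported inside vmap (illustrative error message)");
}
} // namespace

TORCH_LIBRARY_IMPL(_, FuncTorchVmapMode, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&vmap_mode_error_fallback>());
}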
This file contains registrations for +// FuncTorchVmapMode -- these registrations are to error out on operations +// that we don't support on regular Tensors. + namespace at { namespace functorch { diff --git a/aten/src/ATen/jit_macros.h b/aten/src/ATen/jit_macros.h index ca765f03afbf..9af826549021 100644 --- a/aten/src/ATen/jit_macros.h +++ b/aten/src/ATen/jit_macros.h @@ -3,12 +3,5 @@ #include // AT_USE_JITERATOR(), controls whether we jit some elementwise kernels -// Currently unsupported on ROCm GPUs -#if !AT_ROCM_ENABLED() #define AT_USE_JITERATOR() true #define jiterator_stringify(...) std::string(#__VA_ARGS__); -#else -#define AT_USE_JITERATOR() false -#define jiterator_stringify(...) \ - static_assert(false, "Jiterator is not supported on ROCm"); -#endif // USE_ROCM diff --git a/aten/src/ATen/jiterator_macros.h b/aten/src/ATen/jiterator_macros.h index 63a7dfa2eb96..3aa4c7ebb0af 100644 --- a/aten/src/ATen/jiterator_macros.h +++ b/aten/src/ATen/jiterator_macros.h @@ -25,8 +25,8 @@ // These `,`s confuse the preprocessor into thinking we are passing // multiple arguments to the macro. #define jiterator_code(...) __VA_ARGS__ -#if defined(__CUDACC__) -// CPU and CUDA case +#if defined(__CUDACC__) || defined(__HIPCC__) +// CPU and CUDA and ROCm case #define stringify_code(...) #__VA_ARGS__ #define jiterator_also_stringify_as(code, str_name) \ code /* define the function */ \ diff --git a/aten/src/ATen/miopen/Descriptors.h b/aten/src/ATen/miopen/Descriptors.h index a376b30315f7..ba7f232c8fd7 100644 --- a/aten/src/ATen/miopen/Descriptors.h +++ b/aten/src/ATen/miopen/Descriptors.h @@ -3,7 +3,7 @@ #include #include -#include +#include #include namespace at { namespace native { diff --git a/aten/src/ATen/miopen/Utils.h b/aten/src/ATen/miopen/Utils.h index 68ec5bafeebb..a0ec83d976bc 100644 --- a/aten/src/ATen/miopen/Utils.h +++ b/aten/src/ATen/miopen/Utils.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/aten/src/ATen/mkl/SparseBlas.cpp b/aten/src/ATen/mkl/SparseBlas.cpp index 1ad464b8d3a3..ac4bbf65f311 100644 --- a/aten/src/ATen/mkl/SparseBlas.cpp +++ b/aten/src/ATen/mkl/SparseBlas.cpp @@ -1,7 +1,7 @@ /* Provides the implementations of MKL Sparse BLAS function templates. 
*/ - +#define TORCH_ASSERT_NO_OPERATORS #include #include diff --git a/aten/src/ATen/mps/EmptyTensor.cpp b/aten/src/ATen/mps/EmptyTensor.cpp index 759aef741ade..1642f4aeddd1 100644 --- a/aten/src/ATen/mps/EmptyTensor.cpp +++ b/aten/src/ATen/mps/EmptyTensor.cpp @@ -5,6 +5,7 @@ #include #include #include +#include #include #include diff --git a/aten/src/ATen/mps/IndexKernels.h b/aten/src/ATen/mps/IndexKernels.h new file mode 100644 index 000000000000..df22c616baac --- /dev/null +++ b/aten/src/ATen/mps/IndexKernels.h @@ -0,0 +1,181 @@ +#pragma once + +namespace at { +namespace mps { + +static const char * indexing_metal_shaders = R"INDEX_METAL( +#include +#include + +using namespace metal; + +constant uint32_t num_indices [[function_constant(0)]]; + +struct IndexAB { + // Allow up to 16 indices + metal::array indexArray [[ id(0) ]]; +}; + +template +kernel void index_select( + constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]) { + constant int64_t * index_sizes = (constant int64_t *)indexSizes; + constant int64_t * index_strides = (constant int64_t *)indexStrides; + int64_t offset = 0; + for (uint32_t i = 0; i < num_indices; i++) { + int64_t index = ((constant int64_t*)(indexAB.indexArray[i]))[offsets[thread_index].z / sizeof(int64_t)]; + if (index < 0) { + index += index_sizes[i]; + } + offset += index * index_strides[i]; + } + device T * out = (device T*)((device char*)outputData + offsets[thread_index].x); + constant T * in = (constant T*)((constant char*)inputData + offsets[thread_index].y + offset); + *out = *in; +} + +template +kernel void index_put( + constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]) { + + constant int64_t * index_sizes = (constant int64_t *)indexSizes; + constant int64_t * index_strides = (constant int64_t *)indexStrides; + int64_t offset = 0; + for (uint32_t i = 0; i < num_indices; i++) { + int64_t index = ((constant int64_t*)(indexAB.indexArray[i]))[offsets[thread_index].z / sizeof(int64_t)]; + if (index < 0) { + index += index_sizes[i]; + } + offset += index * index_strides[i]; + } + device T * out = (device T*)((device char*)outputData + offsets[thread_index].x + offset); + constant T * in = (constant T*)((constant char*)inputData + offsets[thread_index].y); + *out = *in; +} + +#define REGISTER_INDEX_OP(DTYPE_SIZE, DTYPE, INDEX_OP_TYPE) \ +template \ +[[host_name("index_" #INDEX_OP_TYPE "_" #DTYPE_SIZE)]] \ +kernel void index_ ## INDEX_OP_TYPE( \ + constant IndexAB & indexAB [[buffer(0)]], \ + constant void * indexSizes [[buffer(1)]], \ + constant void * indexStrides [[buffer(2)]], \ + constant uint3 * offsets [[buffer(3)]], \ + constant void * inputData [[buffer(4)]], \ + device void * outputData [[buffer(5)]], \ + uint thread_index [[thread_position_in_grid]]); + +#define REGISTER_INDEX_OP_ALL_DTYPES(INDEX_OP_TYPE) \ + REGISTER_INDEX_OP(8bit, char, INDEX_OP_TYPE); \ + REGISTER_INDEX_OP(16bit, short, INDEX_OP_TYPE); \ + REGISTER_INDEX_OP(32bit, int, INDEX_OP_TYPE); \ + REGISTER_INDEX_OP(64bit, long, INDEX_OP_TYPE); + +REGISTER_INDEX_OP_ALL_DTYPES(select); 
+REGISTER_INDEX_OP_ALL_DTYPES(put); + +kernel void kernel_index_offsets(constant packed_uint3 * strides [[buffer(0)]], + device uint3 * data_offsets [[buffer(1)]], + constant uint * iter_shape [[buffer(2)]], + constant uint & num_dimensions [[buffer(3)]], + constant uint & num_offsets [[buffer(4)]], + uint thread_index [[thread_position_in_grid]]) { + uint32_t idx = thread_index; + for (uint32_t dim = 0; dim < num_dimensions; dim++) { + uint32_t remainder = idx % iter_shape[dim]; + idx /= iter_shape[dim]; + + for (uint32_t offset = 0; offset < num_offsets; offset++) + data_offsets[thread_index][offset] += remainder * strides[dim][offset]; + } +} + +template +kernel void index_put_accumulate_native_dtypes(constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]) { + constant int64_t * index_sizes = (constant int64_t *)indexSizes; + constant int64_t * index_strides = (constant int64_t *)indexStrides; + int64_t offset = 0; + for (uint32_t i = 0; i < num_indices; i++) { + int64_t index = ((constant int64_t*)(indexAB.indexArray[i]))[offsets[thread_index].z / sizeof(int64_t)]; + if (index < 0) { + index += index_sizes[i]; + } + offset += index * index_strides[i]; + } + device T * out = (device T*)((device char*)outputData + offsets[thread_index].x + offset); + constant E * in = (constant E*)((constant char*)inputData + offsets[thread_index].y); + atomic_fetch_add_explicit(out, *in, memory_order_relaxed); +} + +template +__attribute__((__always_inline__)) void atomic_fetch_add_relaxed(device void * addr, T value) { + device atomic_uint* uintAddr = (device atomic_uint*)addr; + uint expected = atomic_load_explicit(uintAddr, memory_order_relaxed); + T updated = as_type(expected) + value; + while (!atomic_compare_exchange_weak_explicit(uintAddr, &expected, as_type(updated), memory_order_relaxed, memory_order_relaxed)) { + updated = as_type(expected) + value; + } +} + +template +kernel void atomic_index_put_accumulate(constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]) { + constant int64_t * index_sizes = (constant int64_t *)indexSizes; + constant int64_t * index_strides = (constant int64_t *)indexStrides; + int64_t offset = 0; + for (uint32_t i = 0; i < num_indices; i++) { + int64_t index = ((constant int64_t*)(indexAB.indexArray[i]))[offsets[thread_index].z / sizeof(int64_t)]; + if (index < 0) { + index += index_sizes[i]; + } + offset += index * index_strides[i]; + } + device void * out = (device void*)((device char*)outputData + offsets[thread_index].x + offset); + constant T * in = (constant T*)((constant char*)inputData + offsets[thread_index].y); + atomic_fetch_add_relaxed(out, *in); +} + +template +[[host_name("index_put_accumulate_32bit_float")]] +kernel void atomic_index_put_accumulate(constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]); 
+template +[[host_name("index_put_accumulate_32bit_int")]] +kernel void index_put_accumulate_native_dtypes(constant IndexAB & indexAB [[buffer(0)]], + constant void * indexSizes [[buffer(1)]], + constant void * indexStrides [[buffer(2)]], + constant uint3 * offsets [[buffer(3)]], + constant void * inputData [[buffer(4)]], + device void * outputData [[buffer(5)]], + uint thread_index [[thread_position_in_grid]]); +)INDEX_METAL"; +} +} diff --git a/aten/src/ATen/mps/MPSAllocator.h b/aten/src/ATen/mps/MPSAllocator.h index ee8712d227ce..d739e8956d81 100644 --- a/aten/src/ATen/mps/MPSAllocator.h +++ b/aten/src/ATen/mps/MPSAllocator.h @@ -1,24 +1,11 @@ // Copyright © 2022 Apple Inc. -#include -#include -#include -#include -#include - -#include +#include #include #include #include -#include #include - -#ifdef __OBJC__ -#include -#include -#include -#include -#endif +#include // this implementation is based on CUDACachingAllocator. // It utilizes Metal Heaps to improve the performance with buffer allocation. @@ -47,16 +34,32 @@ namespace HeapAllocator { #define MB(x) round_page(x * 1048576UL) -static const size_t kMaxSmallAlloc = MB(1); // largest "small" allocation is 1 MiB -static const size_t kMinLargeAlloc = MB(10); // allocations between 1 and 10 MiB may use kLargeHeap -static const size_t kSmallHeap = MB(8); // "small" allocations are packed in 8 MiB heaps -static const size_t kLargeHeap = MB(32); // "large" allocations may be packed in 32 MiB heaps -static const size_t kRoundLarge = MB(2); // round up large allocations to 2 MiB - -// TODO: check the caching performance of write-combined mode -constexpr MTLResourceOptions kCPUCacheMode = MTLResourceOptionCPUCacheModeDefault; -constexpr MTLResourceOptions kPrivateResourceOptions = kCPUCacheMode | MTLResourceStorageModePrivate; -constexpr MTLResourceOptions kSharedResourceOptions = kCPUCacheMode | MTLResourceStorageModeShared; +static const size_t kMaxSmallAlloc = MB(1); // largest "small" allocation is 1 MiB +static const size_t kMinLargeAlloc = MB(10); // allocations between 1 and 10 MiB may use kLargeHeap +static const size_t kRoundLarge = MB(2); // round up large allocations to 2 MiB +static const size_t kSmallHeap = MB(8); // "small" allocations are packed in 8 MiB heaps +static const size_t kLargeHeap = MB(32); // "large" allocations may be packed in 32 MiB heaps +static const size_t kXLargeHeapD = MB(128); // "extra large" allocations on Discrete devices may be packed in 128 MiB heaps +static const size_t kXLargeHeapU = MB(1024); // "extra large" allocations on Unified devices may be packed in 1 GiB heaps + +// buffer pools could be customized with a combination of usage flags +enum UsageFlags : uint32_t { + PRIVATE = 0, + SMALL = (1 << 0), // small heaps have sizes of kSmallHeap, and large ones kLargeHeap + SHARED = (1 << 1), // shared pools allocated on devices with unified memory; otherwise, private between host/device + MANAGED = (1 << 2), // managed storage mode + HAZARD = (1 << 3), // enables Automatic Hazard Tracking for the resources allocated on the pool + SCALAR = (1 << 4), // used to import CPU scalar values to GPU and use them in MPS Stream +}; +// debug verbosity flags +enum DebugVerbosity : uint32_t { + SILENT = 0, + PROFILING = (1 << 0), // print generic profiling data for total system memory usage + ALLOCATIONS = (1 << 1), // print buffer allocations + RECYCLES = (1 << 2), // print buffer recycling + RELEASES = (1 << 3), // print buffer releases + LARGE_ONLY = (1 << 4), // only log large buffer pool transactions +}; 
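The DebugVerbosity values above are bit flags, so several logging categories can be enabled at once through the PYTORCH_DEBUG_MPS_ALLOCATOR environment variable (parsed later in this patch with strtol, base 0). A minimal standalone sketch of that parse-and-test pattern; the enum values mirror the ones above and everything else is illustrative.

#include <cstdint>
#include <cstdlib>
#include <iostream>

enum DebugVerbosity : uint32_t {
  SILENT = 0,
  PROFILING = (1 << 0),
  ALLOCATIONS = (1 << 1),
  RECYCLES = (1 << 2),
  RELEASES = (1 << 3),
  LARGE_ONLY = (1 << 4),
};

int main() {
  const char* s = std::getenv("PYTORCH_DEBUG_MPS_ALLOCATOR");
  // base 0 accepts decimal, octal and hex, e.g. "3" or "0x3" for PROFILING|ALLOCATIONS
  uint32_t verbosity = s ? static_cast<uint32_t>(std::strtol(s, nullptr, 0)) : SILENT;
  if (verbosity & ALLOCATIONS) std::cout << "would log buffer allocations\n";
  if (verbosity & RELEASES)    std::cout << "would log buffer releases\n";
}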
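Stepping back to the Metal index kernels above: the atomic_fetch_add_relaxed helper emulates a float atomic add by looping on a 32-bit compare-exchange over the value's bit pattern. Below is a host-side C++ sketch of the same technique, with std::atomic<uint32_t> standing in for device atomic_uint and memcpy standing in for as_type reinterpretation; the function names are illustrative.

#include <atomic>
#include <cstdint>
#include <cstring>

static float bits_to_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof(f)); return f; }
static uint32_t float_to_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof(u)); return u; }

// Add `value` to the float stored in `slot`, retrying until no other thread
// has modified the slot between the load and the exchange.
void atomic_add_float(std::atomic<uint32_t>& slot, float value) {
  uint32_t expected = slot.load(std::memory_order_relaxed);
  uint32_t desired = float_to_bits(bits_to_float(expected) + value);
  while (!slot.compare_exchange_weak(expected, desired,
                                     std::memory_order_relaxed,
                                     std::memory_order_relaxed)) {
    // compare_exchange refreshed `expected` with the current value; recompute.
    desired = float_to_bits(bits_to_float(expected) + value);
  }
}

int main() {
  std::atomic<uint32_t> slot{float_to_bits(1.5f)};
  atomic_add_float(slot, 2.25f);
  return bits_to_float(slot.load()) == 3.75f ? 0 : 1;  // 0 on success
}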
struct HeapBlock; @@ -70,11 +73,16 @@ struct BufferBlock bool in_use; HeapBlock* heap; id_t buf_id; + // counter to candidate least recently used buffers for garbage collection + uint32_t gc_count; + uint32_t use_count; + // counter to assign unique ids to buffer blocks + static uint64_t buffer_counter; BufferBlock(size_t Size, size_t RequestedSize = 0, const id Buffer = nullptr, - HeapBlock* Heap = nullptr, id_t BufID = 0) : + HeapBlock* Heap = nullptr) : buffer(Buffer), size(Size), requested_size(RequestedSize), - in_use(false), heap(Heap), buf_id(BufID) { } + in_use(false), heap(Heap), buf_id(++buffer_counter), gc_count(0), use_count(0) { } static bool Comparator(const BufferBlock* a, const BufferBlock* b) { return (a->size != b->size) ? a->size < b->size : (uintptr_t)a->buffer < (uintptr_t)b->buffer; @@ -83,10 +91,28 @@ struct BufferBlock assert(((Alignment - 1) & Alignment) == 0); return ((Size + Alignment - 1) & ~(Alignment - 1)); } + uint32_t retainCount() const { return [buffer retainCount]; } }; typedef bool (*BufferComparison)(const BufferBlock*, const BufferBlock*); struct BufferPool; +struct AllocParams +{ + AllocParams(size_t Alloc_Size, size_t Requested_Size, BufferPool* Pool) : + search_key(Alloc_Size), pool(Pool), buffer_block(nullptr), + requested_size(Requested_Size), has_memory_pressure(false) { } + size_t size() const { return search_key.size; } + + BufferBlock search_key; + BufferPool* pool; + BufferBlock* buffer_block; + size_t requested_size; + // true if we exceed the low watermark limit. In this case + // we apply strategies to relieve the pressure before allocation. + bool has_memory_pressure; + // true if we're allocating on a unified memory device + bool has_unified_memory; +}; struct HeapBlock { @@ -94,37 +120,68 @@ struct HeapBlock struct { size_t total, available; } size; BufferPool* pool; unsigned int n_buffers; + id_t heap_id; + // indicates if we split this heap to sub-allocate 'several' buffers (otherwise single buffer) + bool is_split; + // counter to assign unique ids to heap blocks + static uint64_t heap_counter; HeapBlock(size_t Size, const id Heap = nullptr, BufferPool *Pool = nullptr) : - heap(Heap), size({.total = Size, .available = Size}), pool(Pool), n_buffers(0) { } + heap(Heap), size({.total = Size, .available = Size}), pool(Pool), + n_buffers(0), heap_id(++heap_counter), is_split(true) { } + + static MTLResourceOptions getOptions(uint32_t usage) { + // TODO: check the caching performance of write-combined mode + MTLResourceOptions options = MTLResourceCPUCacheModeDefaultCache; + + if (usage & UsageFlags::MANAGED) + options |= MTLResourceStorageModeManaged; + else if (usage & UsageFlags::SHARED) + options |= MTLResourceStorageModeShared; + else + options |= MTLResourceStorageModePrivate; - static MTLResourceOptions getOptions(bool SharedStorage = false) { return SharedStorage ? kSharedResourceOptions : kPrivateResourceOptions; } + options |= (usage & UsageFlags::HAZARD) ? MTLResourceHazardTrackingModeTracked : MTLResourceHazardTrackingModeUntracked; + + return options; + } - static id createMTLHeap(id device, size_t size, bool is_shared) { - id heap = nil; + static HeapBlock* createHeapBlock(AllocParams& params, id device, uint32_t usage) { + HeapBlock *heapBlock = nullptr; + bool is_split = true; + const size_t size = params.size(); MTLHeapDescriptor *d = [MTLHeapDescriptor new]; if (d) { + const size_t kXLargeHeap = params.has_unified_memory ? 
kXLargeHeapU : kXLargeHeapD; if (size <= kMaxSmallAlloc) { d.size = kSmallHeap; } else if (size < kMinLargeAlloc) { d.size = kLargeHeap; + } else if (size < kXLargeHeap / 2 && !params.has_memory_pressure) { + d.size = kXLargeHeap; } else { d.size = kRoundLarge * ((size + kRoundLarge - 1) / kRoundLarge); + is_split = false; } - d.storageMode = is_shared ? MTLStorageModeShared : MTLStorageModePrivate; + d.storageMode = (usage & UsageFlags::SHARED) ? MTLStorageModeShared : MTLStorageModePrivate; d.cpuCacheMode = MTLCPUCacheModeDefaultCache; // this automatically handles Metal buffer access synchronizations at the // cost of slightly lower performance. - d.hazardTrackingMode = MTLHazardTrackingModeTracked; - d.resourceOptions = getOptions(is_shared) | (MTLHazardTrackingModeTracked << MTLResourceHazardTrackingModeShift); + d.hazardTrackingMode = (usage & UsageFlags::HAZARD) ? MTLHazardTrackingModeTracked : MTLHazardTrackingModeUntracked; + d.resourceOptions = getOptions(usage); d.type = MTLHeapTypeAutomatic; - heap = [device newHeapWithDescriptor: d]; + id heap = [device newHeapWithDescriptor: d]; if (heap) { [heap setPurgeableState:MTLPurgeableStateNonVolatile]; + const size_t heap_size = heapAvailableSize(heap); + heapBlock = new HeapBlock(heap_size, heap, params.pool); + if (heapBlock) { + heapBlock->is_split = is_split; + } } [d release]; } - return heap; + return heapBlock; } static bool Comparator(const HeapBlock* a, const HeapBlock* b) { return a->size.available < b->size.available; @@ -132,82 +189,106 @@ struct HeapBlock static NSUInteger heapAvailableSize(id heap, size_t Alignment = vm_page_size) { return [heap maxAvailableSizeWithAlignment:Alignment]; } - id newMTLBuffer(size_t length, bool is_shared) { - id buf = [heap newBufferWithLength:length options:getOptions(is_shared)]; + id newMTLBuffer(size_t length, uint32_t usage) { + id buf = [heap newBufferWithLength:length options:getOptions(usage)]; if (buf) { - size.available = heapAvailableSize(heap); + updateAvailableSize(); n_buffers++; } return buf; } - void releaseMTLBuffer(id buffer) { + // returns the retainCount before releasing the buffer + uint32_t releaseMTLBuffer(id& buffer) { + const uint32_t retainCount = [buffer retainCount]; [buffer release]; - size.available = heapAvailableSize(heap); + buffer = nil; + updateAvailableSize(); n_buffers--; + return retainCount; } - void releaseMTLHeap() { + // returns the retainCount before releasing the heap + uint32_t releaseMTLHeap() { + const uint32_t retainCount = [heap retainCount]; TORCH_INTERNAL_ASSERT(!n_buffers); // assert if heap isn't empty + [heap setPurgeableState:MTLPurgeableStateEmpty]; [heap release]; + heap = nil; size.available = 0; + return retainCount; } + uint32_t retainCount() const { return [heap retainCount]; } + void updateAvailableSize() { size.available = heapAvailableSize(heap); } }; typedef bool (*HeapComparison)(const HeapBlock*, const HeapBlock*); struct BufferPool { - BufferPool(const id Device, bool Small, bool Shared) : - device(Device), is_small(Small), is_shared(Shared), - heaps(HeapBlock::Comparator), buffers(BufferBlock::Comparator) { } + BufferPool(const id Device, uint32_t Usage) : + device(Device), usage(Usage), n_buffers(0), allocated_size(0), available_size(0), + heaps(HeapBlock::Comparator), buffers(BufferBlock::Comparator) { } const id device; - // small heaps have sizes of kSmallHeap, and large ones kLargeHeap - const bool is_small; - // private pools allocated on device memory; otherwise, shared between host/device - const bool is_shared; 
+ // usage flags to customize the pool for various purposes (see UsageFlags enum) + const uint32_t usage; + // total number of buffers in the pool + uint32_t n_buffers; + // total allocations size on this pool + size_t allocated_size; + // total memory available in the pool + size_t available_size; // list of heaps ordered by their "available" (not total) memory size std::set heaps; // list of only "available" buffers in the pool (i.e., buffers not in-use) std::set buffers; -}; - -struct AllocParams -{ - AllocParams(size_t Alloc_Size, size_t Requested_Size, BufferPool* Pool) : - search_key(Alloc_Size), pool(Pool), - buffer_block(nullptr), requested_size(Requested_Size) {} - size_t size() const { return search_key.size; } - - BufferBlock search_key; - BufferPool* pool; - BufferBlock* buffer_block; - size_t requested_size; + // list of heaps pending size update + std::unordered_set heaps_pending_update; }; class MPSHeapAllocatorImpl { public: explicit MPSHeapAllocatorImpl() : - m_device(at::mps::MPSDevice::getInstance()->device()), - m_large_pool_shared(m_device, false, true), m_large_pool_private(m_device, false, false), - m_small_pool_shared(m_device, true , true), m_small_pool_private(m_device, true , false), - m_total_allocated_memory(0), m_max_buffer_size([m_device maxBufferLength]), - m_set_fraction(false), m_enable_debug_info(false) { } + m_device(at::mps::MPSDevice::getInstance()->device()), + m_large_pool_shared (m_device, UsageFlags::SHARED | UsageFlags::HAZARD), + m_large_pool_private(m_device, UsageFlags::PRIVATE | UsageFlags::HAZARD), + m_small_pool_shared (m_device, UsageFlags::SMALL | UsageFlags::SHARED | UsageFlags::HAZARD), + m_small_pool_private(m_device, UsageFlags::SMALL | UsageFlags::PRIVATE | UsageFlags::HAZARD), + // no Hazard Tracking required for the Scalar pool (synchronized manually) + m_scalar_pool(m_device, UsageFlags::SMALL | UsageFlags::SHARED | UsageFlags::SCALAR), + m_total_allocated_memory(0), m_max_buffer_size([m_device maxBufferLength]), + m_stream(getDefaultMPSStream()) + { + init_allocator(); + } // interface exposed to at::Allocator - id Malloc(size_t size, bool sharedStorage); - void Free(void* ptr); - void EmptyCache(); + id malloc(size_t size, uint32_t usage); + void free(void* ptr); + void emptyCache(); + // interface exposed to internal MPS operations bool isSharedBuffer(void* ptr); ssize_t getRequestedBufferSize(void* ptr); void setBufferShape(void* ptr, const IntArrayRef& shape); IntArrayRef getBufferShape(void* ptr); - + id allocScalarBufferWithValue(void* value, size_t size); + // this indicates how far (in Megabytes) the current total allocations are from the + // low watermark limit which is used to detect if we're under memory pressure + // This returns zero if we've reached the low watermark limit + ssize_t getLowWatermarkValue(); + + bool getDebugVerbosity() const { return m_debug_verbosity; } + size_t getMaxTotalAllowedSize() const { return m_max_total_allowed_size; } + size_t getLowWatermarkLimit() const { return m_low_watermark_limit; } inline id Device() const { return m_device; } - void enable_debug_info() { m_enable_debug_info = true; } - bool debug_info_enabled() const { return m_enable_debug_info; } - void set_shared_storage_mode(bool useSharedStorage); private: + // (see m_high_watermark_ratio for description) + constexpr static double default_high_watermark_ratio = 0.0; + // (see m_low_watermark_ratio for description) + // on unified memory, we could allocate beyond the recommendedMaxWorkingSetSize + constexpr static double 
default_low_watermark_ratio_unified = 1.5; + constexpr static double default_low_watermark_ratio_discrete = 1.0; + const id m_device; std::mutex m_mutex; // allocated buffers by device pointer @@ -216,40 +297,69 @@ class MPSHeapAllocatorImpl BufferPool m_large_pool_shared, m_large_pool_private; // unallocated cached buffers 1 MB or smaller BufferPool m_small_pool_shared, m_small_pool_private; + // small cached buffers to import scalar values into MPS stream + BufferPool m_scalar_pool; // total memory allocated by HeapAllocator size_t m_total_allocated_memory; // max buffer size allowed by Metal size_t m_max_buffer_size; - // sets a soft upper bound to limit the total allocations - bool m_set_fraction; - // use "PYTORCH_DEBUG_MPS_ALLOCATOR" env-var to enable debug info - bool m_enable_debug_info; - - HeapBlock* get_free_heap(AllocParams& p); - bool get_free_buffer(AllocParams& p); + // maximum total size allowed to be allocated + size_t m_max_total_allowed_size; + // high watermark ratio is a hard limit for the total allowed allocations (between 0 and 1) + // 0 means unlimited (would spill to disk or system failure if OOM) + // 1 is maximum allowed by device.recommendedMaxWorkingSetSize + // (e.g., value 0.95 means we allocate up to 95% of total memory; beyond that allocations fail) + double m_high_watermark_ratio; + // low watermark ratio is a soft limit to attempt limiting memory allocations up to the lower watermark + // level by garbage collection or committing command buffers more frequently (a.k.a, adaptive commit). + // Value between 0 to m_high_watermark_ratio (setting 0.0 disables adaptive commit and garbage collection) + // (e.g., value 0.9 means we 'attempt' to limit allocations up to 90% of total memory) + double m_low_watermark_ratio; + // low watermark size limit (in Bytes) at the time we initialize the allocator + size_t m_low_watermark_limit; + // use "PYTORCH_DEBUG_MPS_ALLOCATOR" env-var to set debug verbosity + uint32_t m_debug_verbosity; + // default MPS stream + MPSStream* m_stream; + + void init_allocator(); + HeapBlock* get_free_heap(AllocParams& params); + bool get_free_buffer(AllocParams& params); BufferBlock* get_allocated_buffer_block(void* ptr); - bool alloc_buffer(AllocParams& p); + BufferBlock* alloc_buffer_block(size_t size, uint32_t usage); + bool alloc_buffer(AllocParams& params); void free_buffer(BufferBlock* buffer_block); - void release_buffer(BufferBlock* buffer_block, bool remove_empty_heap = true); + // returns true if the container heap is also released + bool release_buffer(BufferBlock* buffer_block, bool remove_empty_heap = true); void release_buffers(BufferPool& pool); - bool release_available_cached_buffers(const AllocParams& p); + bool release_available_cached_buffers(AllocParams& params); bool release_cached_buffers(); - void trigger_memory_callbacks(BufferBlock* buffer_block, IMpsAllocatorCallback::EventType event); - - BufferPool& get_pool(size_t Size, bool useShared) { - return Size <= kMaxSmallAlloc ? (useShared ? m_small_pool_shared : m_small_pool_private) : - (useShared ? m_large_pool_shared : m_large_pool_private); + // free unused cached blocks to reclaim GPU memory if memory pressure is high + void garbage_collect_cached_buffers(AllocParams& params); + + BufferPool& get_pool(size_t Size, uint32_t usage) { + if (usage & UsageFlags::SCALAR) + return m_scalar_pool; + return Size <= kMaxSmallAlloc ? ((usage & UsageFlags::SHARED) ? m_small_pool_shared : m_small_pool_private) : + ((usage & UsageFlags::SHARED) ? 
m_large_pool_shared : m_large_pool_private); } - size_t get_allocation_size(size_t Length, bool useShared) { + size_t get_allocation_size(size_t Length, uint32_t usage) const { MTLSizeAndAlign sizeAlign = [m_device heapBufferSizeAndAlignWithLength:Length - options:HeapBlock::getOptions(useShared)]; + options:HeapBlock::getOptions(usage)]; return BufferBlock::alignUp(sizeAlign.size, sizeAlign.align); } - // TODO: make this configurable - static size_t max_split_size() { return std::numeric_limits::max(); } // maximum size of device memory available for allocation in current process - size_t max_available_size() const { return [m_device recommendedMaxWorkingSetSize] - [m_device currentAllocatedSize]; } + size_t max_device_size() const { return [m_device recommendedMaxWorkingSetSize]; } + // there are implicit allocations from MPS backend, so we need to query the 'device' for + // total allocated size instead of manually tracking in MPSAllocator + size_t current_allocated_size() const { return [m_device currentAllocatedSize]; } + + void trigger_memory_callbacks(BufferBlock* buffer_block, IMpsAllocatorCallback::EventType event) const { + for (const auto& name : MPSAllocatorCallbacksRegistry()->Keys()) { + MPSAllocatorCallbacksRegistry()->Create(name)->executeMPSAllocatorCallback(buffer_block->buffer, event); + } + } // TODO: make a common function to do size unit conversions in PyTorch. static std::string format_size(uint64_t size) { @@ -266,5 +376,16 @@ class MPSHeapAllocatorImpl } // namespace HeapAllocator +// interface exposed to internal MPS operations + +// get the requested non-aligned size of an MTL buffer +ssize_t get_requested_buffer_size(void* ptr); +// retrieve the shape of a base tensor from a view tensor +IntArrayRef get_buffer_shape(void* ptr); +// set the shape of a base tensor from a view tensor +void set_buffer_shape(void* ptr, const IntArrayRef& shape); +// allocate a buffer from a specialized pool to import CPU scalars into GPU +DataPtr allocate_scalar_buffer(void* value, size_t size); + } // namespace mps } // namespace at diff --git a/aten/src/ATen/mps/MPSAllocator.mm b/aten/src/ATen/mps/MPSAllocator.mm index 2433acbc050b..a40ddd7992a2 100644 --- a/aten/src/ATen/mps/MPSAllocator.mm +++ b/aten/src/ATen/mps/MPSAllocator.mm @@ -4,6 +4,7 @@ #include #include #include +#include namespace at { namespace mps { @@ -12,129 +13,237 @@ namespace HeapAllocator { -HeapBlock* MPSHeapAllocatorImpl::get_free_heap(AllocParams& p) +uint64_t BufferBlock::buffer_counter = 0; +uint64_t HeapBlock::heap_counter = 0; + +void MPSHeapAllocatorImpl::init_allocator() +{ + // debug verbosity flags (see DebugVerbosity enum) + static const char *verbosity_str = getenv("PYTORCH_DEBUG_MPS_ALLOCATOR"); + m_debug_verbosity = verbosity_str ? strtol(verbosity_str, nullptr, 0) : DebugVerbosity::SILENT; + + // on unified memory, we set the allowed upper bound to twice the size of recommendedMaxWorkingSetSize. + const double high_watermark_upper_bound = m_device.hasUnifiedMemory ? 2.0 : 1.0; + + static const char *high_watermark_ratio_str = getenv("PYTORCH_MPS_HIGH_WATERMARK_RATIO"); + m_high_watermark_ratio = high_watermark_ratio_str ? strtod(high_watermark_ratio_str, nullptr) : default_high_watermark_ratio; + TORCH_CHECK(m_high_watermark_ratio >= 0.0 && m_high_watermark_ratio <= high_watermark_upper_bound, + "invalid high watermark ratio ", m_high_watermark_ratio); + + m_max_total_allowed_size = (m_high_watermark_ratio == 0.0) ? 
std::numeric_limits::max() : + static_cast(m_high_watermark_ratio * (double)max_device_size()); + // used for comparison with lower_watermark_ratio + const double high_watermark_limit = m_high_watermark_ratio == 0.0 ? high_watermark_upper_bound : m_high_watermark_ratio; + const double default_low_watermark_ratio = m_device.hasUnifiedMemory ? default_low_watermark_ratio_unified : + default_low_watermark_ratio_discrete; + static const char *low_watermark_ratio_str = getenv("PYTORCH_MPS_LOW_WATERMARK_RATIO"); + m_low_watermark_ratio = low_watermark_ratio_str ? strtod(low_watermark_ratio_str, nullptr) : default_low_watermark_ratio; + TORCH_CHECK(m_low_watermark_ratio >= 0.0 && m_low_watermark_ratio <= high_watermark_limit, + "invalid low watermark ratio ", m_low_watermark_ratio); + // we use this to detect if there's memory pressure + m_low_watermark_limit = (m_low_watermark_ratio == 0.0) ? std::numeric_limits::max() : + static_cast(m_low_watermark_ratio * (double)max_device_size()); +} + +HeapBlock* MPSHeapAllocatorImpl::get_free_heap(AllocParams& params) { - BufferPool *pool = p.pool; - HeapBlock *heapBlock = nullptr; - HeapBlock search_key(p.size()); - - auto it = pool->heaps.lower_bound(&search_key); - if (it == pool->heaps.end()) { - id heap = HeapBlock::createMTLHeap(pool->device, p.size(), pool->is_shared); - if (heap) { - size_t heap_size = HeapBlock::heapAvailableSize(heap); - heapBlock = new HeapBlock(heap_size, heap, pool); - - if (debug_info_enabled()) { - static unsigned int heap_counter = 0; + BufferPool& pool = *params.pool; + HeapBlock *heap_block = nullptr; + HeapBlock search_key(params.size()); + + auto it = pool.heaps.lower_bound(&search_key); + if (it == pool.heaps.end()) { + heap_block = HeapBlock::createHeapBlock(params, pool.device, pool.usage); + if (heap_block) { + if (m_debug_verbosity & DebugVerbosity::ALLOCATIONS) { std::cerr << "\nAllocated " - << (pool->is_small ? "small " : "large ") - << (pool->is_shared ? "shared " : "private ") - << "heap of size " << format_size(heap_size) - << " (#heaps: " << (++heap_counter) - << ", free memory: " << format_size(max_available_size()) << ")\n"; + << ((pool.usage & UsageFlags::SHARED) ? "shared " : "private ") + << " heap #" << heap_block->heap_id + << " of size " << format_size(heap_block->size.total) + << " (#heaps: " << (pool.heaps.size() + 1) + << ", current allocated: " << format_size(current_allocated_size()) << ")\n"; } } } else { - heapBlock = *it; + heap_block = *it; // remove and re-insert heap in the set later after a buffer is created. // this ensures updating the order of heaps based on their new available sizes - pool->heaps.erase(it); + pool.heaps.erase(it); } - return heapBlock; + return heap_block; } -bool MPSHeapAllocatorImpl::alloc_buffer(AllocParams& p) +bool MPSHeapAllocatorImpl::alloc_buffer(AllocParams& params) { - if (m_set_fraction && m_total_allocated_memory + p.size() > max_available_size()) + if (m_max_total_allowed_size != std::numeric_limits::max() && + current_allocated_size() + params.size() > m_max_total_allowed_size) return false; - HeapBlock *heap = get_free_heap(p); + HeapBlock *heap = get_free_heap(params); if (!heap) return false; // this will cause releasing pool buffers to free up memory - id buffer = heap->newMTLBuffer(p.size(), p.pool->is_shared); + BufferPool& pool = *params.pool; + + id buffer = heap->newMTLBuffer(params.size(), pool.usage); // this should never happen as the backing memory (i.e., heap) was allocated successfully. 
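As a note on the watermark arithmetic set up in init_allocator() above: a ratio of 0 disables the corresponding limit, otherwise the limit is the ratio times the device's recommended working-set size. A minimal standalone sketch of that computation, where device_max stands in for [m_device recommendedMaxWorkingSetSize] and the 8 GiB figure is only an example value.

#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>

// ratio == 0 means "unlimited"; otherwise scale the device's working-set size.
size_t watermark_limit(double ratio, size_t device_max) {
  if (ratio == 0.0) return std::numeric_limits<size_t>::max();
  return static_cast<size_t>(ratio * static_cast<double>(device_max));
}

int main() {
  const char* s = std::getenv("PYTORCH_MPS_HIGH_WATERMARK_RATIO");
  double high_ratio = s ? std::strtod(s, nullptr) : 0.0;  // 0.0: no upper limit
  size_t limit = watermark_limit(high_ratio, /*device_max=*/8ull << 30);
  (void)limit;  // allocations beyond `limit` would take the OOM path described below
  return 0;
}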
TORCH_INTERNAL_ASSERT(buffer); // insert heap after a buffer was created on it to update the order of heap's set - p.pool->heaps.insert(heap); - p.buffer_block = new BufferBlock(p.size(), p.requested_size, buffer, heap, m_allocated_buffers.size() + 1); - m_allocated_buffers[p.buffer_block->buffer] = p.buffer_block; - m_total_allocated_memory += p.size(); - - if (debug_info_enabled()) { + pool.heaps.insert(heap); + params.buffer_block = new BufferBlock(params.size(), params.requested_size, buffer, heap); + m_allocated_buffers[params.buffer_block->buffer] = params.buffer_block; + m_total_allocated_memory += params.size(); + pool.allocated_size += params.size(); + pool.n_buffers++; + + if ((m_debug_verbosity & DebugVerbosity::ALLOCATIONS) && + (!(m_debug_verbosity & DebugVerbosity::LARGE_ONLY) || !(pool.usage & UsageFlags::SMALL))) { std::cerr << "Allocated " - << (p.pool->is_shared ? "shared" : "private") - << " buffer #" << p.buffer_block->buf_id - << " of size " << format_size(p.size()) - << " at " << p.buffer_block->buffer - << " (requested size: " << format_size(p.requested_size) - << ", heap size: " << format_size(heap->size.available) - << ", total allocated: " << format_size(m_total_allocated_memory) << ")\n"; + << ((params.pool->usage & UsageFlags::SHARED) ? "shared" : "private") + << ((params.pool->usage & UsageFlags::SCALAR) ? " scalar" : "") + << " buffer #" << params.buffer_block->buf_id + << " of size " << format_size(params.size()) + << " at " << params.buffer_block->buffer + << " from heap #" << heap->heap_id + << " (requested: " << format_size(params.requested_size) + << ", heap: " << format_size(heap->size.available) + << ", total: " << format_size(m_total_allocated_memory) << ")\n"; } return true; } -bool MPSHeapAllocatorImpl::get_free_buffer(AllocParams& p) +bool MPSHeapAllocatorImpl::get_free_buffer(AllocParams& params) { - BufferPool& pool = *p.pool; - auto it = pool.buffers.lower_bound(&p.search_key); - if (it == pool.buffers.end()) - return false; - // do not return an oversized buffer for a large request - // allow oversized buffer size to be rounded up but within a limit - if ((p.size() < max_split_size() && (*it)->size >= max_split_size()) || - ((p.size() >= max_split_size()) && ((*it)->size >= p.size() + kLargeHeap))) + // this helps to monitor "implicit" allocations from MPS backend and to prevent OOM and system failure. + if (m_high_watermark_ratio > 0.0 && current_allocated_size() + params.size() > m_max_total_allowed_size) return false; - p.buffer_block = *it; - pool.buffers.erase(it); - if (debug_info_enabled()) { + BufferPool& pool = *params.pool; + // track buffer reuse intervals only on large pool when low watermark limit is enabled. + if (m_low_watermark_ratio > 0.0 && !(pool.usage & UsageFlags::SMALL)) { + for (auto& b : pool.buffers) { + ++b->gc_count; + } + } + auto it = pool.buffers.lower_bound(¶ms.search_key); + if (it != pool.buffers.end()) { + BufferBlock* buffer_block = *it; + + // the logic in here is simple: keep reusing existing heaps capacity as long as possible (by splitting + // or releasing oversize buffers, if required), and avoid 'new' heap allocations as much as possible. 
+ if (buffer_block->size <= params.size() + kLargeHeap) { + // return the existing buffer if it already fits the requested size (i.e., not oversize) + params.buffer_block = buffer_block; + } else { + HeapBlock search_key(params.size()); + // if there's an 'existing' heap with enough capacity, then don't + // return the oversize buffer and sub-allocate from that existing heap. + if (pool.heaps.lower_bound(&search_key) != pool.heaps.end()) { + params.buffer_block = nullptr; + } else if (buffer_block->retainCount() <= 1) { + // otherwise if buffer is releasable immediately, we make room by releasing the + // buffer and reuse the new space within its heap container for the new smaller buffer allocation + release_buffer(buffer_block, false); + // this will skip unnecessary garbage collection as we'll reuse the newly released space + params.has_memory_pressure = false; + } else if (params.has_memory_pressure) { + // the oversized buffer is busy and not reusable at the moment. So release it (and potentially its heap container) + // in allocator, and ARC will later free up its backing memory when the busy command buffer finishes. + release_buffer(buffer_block, true); + } else { + // only if there's no memory pressure, we'll reuse the oversized buffer + params.buffer_block = buffer_block; + } + } + } + + if (!params.buffer_block) + return false; // this will make allocator to allocate a new buffer + + pool.buffers.erase(params.buffer_block); + params.buffer_block->gc_count = 0; + pool.available_size -= params.buffer_block->size; + + if ((m_debug_verbosity & DebugVerbosity::RECYCLES) && + (!(m_debug_verbosity & DebugVerbosity::LARGE_ONLY) || !(pool.usage & UsageFlags::SMALL))) { std::cerr << "Reusing " - << (p.pool->is_shared ? "shared" : "private") - << " buffer #" << p.buffer_block->buf_id - << " of size " << format_size(p.buffer_block->size) - << " at " << p.buffer_block->buffer - << " (requested size: " << format_size(p.requested_size) << ")\n"; + << ((params.pool->usage & UsageFlags::SHARED) ? "shared" : "private") + << ((params.pool->usage & UsageFlags::SCALAR) ? " scalar" : "") + << " buffer #" << params.buffer_block->buf_id + << " of size " << format_size(params.buffer_block->size) + << " at " << params.buffer_block->buffer + << " (requested: " << format_size(params.requested_size) + << ", use#: " << params.buffer_block->use_count + 1 + << ", retain#: " << params.buffer_block->retainCount() << ")\n"; } return true; } -id MPSHeapAllocatorImpl::Malloc(size_t size, bool sharedStorage) +BufferBlock* MPSHeapAllocatorImpl::alloc_buffer_block(size_t size, uint32_t usage) { TORCH_CHECK(size < m_max_buffer_size, "Invalid buffer size: ", format_size(size)); - std::lock_guard lock(m_mutex); - - size_t alloc_size = get_allocation_size(size, sharedStorage); - auto& pool = get_pool(alloc_size, sharedStorage); + size_t alloc_size = get_allocation_size(size, usage); + auto& pool = get_pool(alloc_size, usage); AllocParams params(alloc_size, size, &pool); + // we care about memory pressure if only we're allocating large buffers when the + // low watermark limit has been reached + params.has_memory_pressure = !(pool.usage & UsageFlags::SMALL) && getLowWatermarkValue() <= 0; + params.has_unified_memory = m_device.hasUnifiedMemory; + + // first, try to get a block from the existing pool. 
+ bool block_found = get_free_buffer(params); + if (!block_found) { + // do garbage collection if memory pressure is high and there's enough memory in pool + if (params.has_memory_pressure && alloc_size < pool.available_size) { + garbage_collect_cached_buffers(params); + } - bool block_found = - // Search pool - get_free_buffer(params) || - // Attempt allocate - alloc_buffer(params) || - // Free enough available cached blocks to satisfy alloc and retry alloc. - (release_available_cached_buffers(params) && alloc_buffer(params)) || - // Free all non-split cached buffers and retry alloc. - (release_cached_buffers() && alloc_buffer(params)); + block_found = + // Attempt allocate + alloc_buffer(params) || + // Free enough available cached blocks to satisfy alloc and retry alloc. + (release_available_cached_buffers(params) && alloc_buffer(params)) || + // Free all cached buffers and retry alloc. + (release_cached_buffers() && alloc_buffer(params)); + } BufferBlock* buffer_block = params.buffer_block; - TORCH_INTERNAL_ASSERT(block_found && buffer_block); + + // the OOM could be triggered if: + // 1- the High Watermark limit has been reached (if enabled) + // 2- ran out of device memory, or the memory fragmentation is so high that a contiguous + // chunk of requested size couldn't be found. + if (!block_found || !buffer_block) { + if (m_high_watermark_ratio > 0.0) { + TORCH_CHECK(false, "MPS backend out of memory (MPS allocated: ", format_size(m_total_allocated_memory), + ", other allocations: ", format_size(current_allocated_size() - m_total_allocated_memory), + ", max allowed: ", format_size(m_max_total_allowed_size), "). Tried to allocate ", format_size(alloc_size), + " on ", ((pool.usage & UsageFlags::SHARED) ? "shared" : "private"), + " pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)."); + } else { + TORCH_CHECK(false, "MPS backend out of memory (MPS allocated: ", format_size(m_total_allocated_memory), + ", other allocations: ", format_size(current_allocated_size() - m_total_allocated_memory), + "). Tried to allocate ", format_size(alloc_size), + " on ", ((pool.usage & UsageFlags::SHARED) ? "shared" : "private"), " pool."); + } + } buffer_block->in_use = true; - return buffer_block->buffer; + buffer_block->use_count++; + + return buffer_block; } void MPSHeapAllocatorImpl::free_buffer(BufferBlock* buffer_block) { TORCH_INTERNAL_ASSERT(buffer_block->in_use); - trigger_memory_callbacks(buffer_block, IMpsAllocatorCallback::EventType::FREED); - buffer_block->in_use = false; - buffer_block->shape.clear(); // reset shape - BufferPool *pool = buffer_block->heap->pool; + + BufferPool& pool = *buffer_block->heap->pool; // Makes sure the BufferBlock* isn't already present in the pool we're freeing it back into. 
- TORCH_INTERNAL_ASSERT(pool->buffers.insert(buffer_block).second); + TORCH_INTERNAL_ASSERT(pool.buffers.insert(buffer_block).second); + pool.available_size += buffer_block->size; + buffer_block->shape.clear(); // reset shape + buffer_block->in_use = false; } BufferBlock* MPSHeapAllocatorImpl::get_allocated_buffer_block(void* ptr) @@ -146,105 +255,77 @@ return it->second; } -void MPSHeapAllocatorImpl::trigger_memory_callbacks(BufferBlock* buffer_block, IMpsAllocatorCallback::EventType event) { - for (const auto& name : MPSAllocatorCallbacksRegistry()->Keys()) { - MPSAllocatorCallbacksRegistry()->Create(name)->executeMPSAllocatorCallback(buffer_block->buffer, event); - } -} - -bool MPSHeapAllocatorImpl::isSharedBuffer(void* ptr) -{ - std::lock_guard lock(m_mutex); - - BufferBlock *buffer_block = get_allocated_buffer_block(ptr); - // it's OK for the buffer_block to not exist yet - return buffer_block && buffer_block->heap->pool->is_shared; -} - -ssize_t MPSHeapAllocatorImpl::getRequestedBufferSize(void* ptr) -{ - std::lock_guard lock(m_mutex); - - BufferBlock *buffer_block = get_allocated_buffer_block(ptr); - if (buffer_block) - return (ssize_t) buffer_block->requested_size; - // this indicates the passed buffer pointer wasn't found - return -1; -} - -void MPSHeapAllocatorImpl::setBufferShape(void* ptr, const IntArrayRef& shape) -{ - std::lock_guard lock(m_mutex); - - BufferBlock *buffer_block = get_allocated_buffer_block(ptr); - TORCH_INTERNAL_ASSERT(buffer_block, "failed to find the buffer ", ptr); - // note that the IntArrayRef doesn't own the underlying data, and the backing - // memory for shape data must persist as long as the buffer is in use. - // So we need to copy to vector. - buffer_block->shape = shape.vec(); -} - -IntArrayRef MPSHeapAllocatorImpl::getBufferShape(void* ptr) -{ - std::lock_guard lock(m_mutex); - - BufferBlock *buffer_block = get_allocated_buffer_block(ptr); - if (buffer_block && buffer_block->shape.size() > 0) - return IntArrayRef{buffer_block->shape}; - - return IntArrayRef(); -} - -void MPSHeapAllocatorImpl::Free(void* ptr) +bool MPSHeapAllocatorImpl::release_buffer(BufferBlock* buffer_block, bool remove_empty_heap) { - std::lock_guard lock(m_mutex); - - BufferBlock *buffer_block = get_allocated_buffer_block(ptr); - TORCH_INTERNAL_ASSERT(buffer_block); - free_buffer(buffer_block); -} - -void MPSHeapAllocatorImpl::EmptyCache() -{ - std::lock_guard lock(m_mutex); - release_cached_buffers(); -} - -void MPSHeapAllocatorImpl::release_buffer(BufferBlock* buffer_block, bool remove_empty_heap) -{ - trigger_memory_callbacks(buffer_block, IMpsAllocatorCallback::EventType::RELEASED); - - HeapBlock *heap = buffer_block->heap; - BufferPool *pool = heap->pool; + HeapBlock *heap_block = buffer_block->heap; + BufferPool& pool = *heap_block->pool; m_total_allocated_memory -= buffer_block->size; + pool.allocated_size -= buffer_block->size; + pool.available_size -= buffer_block->size; m_allocated_buffers.erase(buffer_block->buffer); - pool->buffers.erase(buffer_block); + pool.buffers.erase(buffer_block); + pool.n_buffers--; // will re-insert later to keep the heaps list sorted based on heap's new available size (if heap not empty) - pool->heaps.erase(heap); - heap->releaseMTLBuffer(buffer_block->buffer); - if (debug_info_enabled()) { + pool.heaps.erase(heap_block); + uint32_t retainCount = heap_block->releaseMTLBuffer(buffer_block->buffer); + + if ((m_debug_verbosity & DebugVerbosity::RELEASES) && + (!(m_debug_verbosity & DebugVerbosity::LARGE_ONLY) || !(pool.usage & 
UsageFlags::SMALL))) { std::cerr << "Released buffer #" << buffer_block->buf_id << " of size " << format_size(buffer_block->size) - << " (heap size: " << format_size(heap->size.available) - << ", total allocated: " << format_size(m_total_allocated_memory) << ")\n"; - + << " from heap #" << heap_block->heap_id + << " (heap size: " << format_size(heap_block->size.available) + << ", use#: " << buffer_block->use_count + << ", retain#: " << retainCount + << ", gc#: " << buffer_block->gc_count << ")\n"; } delete buffer_block; - if (remove_empty_heap && heap->n_buffers == 0) { - heap->releaseMTLHeap(); - if (debug_info_enabled()) { - std::cerr << "Released heap of size " << format_size(heap->size.total) - << " (free memory: " << format_size(max_available_size()) << ")\n"; + if (remove_empty_heap && heap_block->n_buffers == 0) { + pool.heaps_pending_update.erase(heap_block); + retainCount = heap_block->releaseMTLHeap(); + if (m_debug_verbosity & DebugVerbosity::RELEASES) { + std::cerr << "Released heap #" << heap_block->heap_id + << " of size " << format_size(heap_block->size.total) + << " (current allocated: " << format_size(current_allocated_size()) + << ", retain#: " << retainCount << ")\n"; } - delete heap; + delete heap_block; + return true; } else { - pool->heaps.insert(heap); + pool.heaps.insert(heap_block); + // if heap wasn't released and its released buffer is still busy in command buffer, the available + // size of the heap cannot be updated and we should defer updating until command buffer finishes. + if (retainCount > 1) { + pool.heaps_pending_update.insert(heap_block); + m_mutex.unlock(); + m_stream->addCompletedHandler(^(id ) { + std::lock_guard lock(m_mutex); + // check if the heap block still exists + if (pool.heaps_pending_update.find(heap_block) != pool.heaps_pending_update.end()) { + pool.heaps_pending_update.erase(heap_block); + pool.heaps.erase(heap_block); + heap_block->updateAvailableSize(); + pool.heaps.insert(heap_block); + } + }); + m_mutex.lock(); + } } + return false; } void MPSHeapAllocatorImpl::release_buffers(BufferPool& pool) { + if ((m_debug_verbosity & DebugVerbosity::PROFILING) && pool.n_buffers > 0) { + std::cerr << "Releasing " << pool.n_buffers + << " buffers from " + << ((pool.usage & UsageFlags::SMALL ) ? "small " : "large ") + << ((pool.usage & UsageFlags::SHARED) ? "shared" : "private") + << ((pool.usage & UsageFlags::SCALAR) ? " scalar" : "") + << " pool (total size: " << format_size(pool.allocated_size) + << ", free buffers: " << pool.buffers.size() << ")\n"; + } auto it = pool.buffers.begin(); while (it != pool.buffers.end()) { BufferBlock* buffer_block = *it; @@ -253,20 +334,18 @@ } } -bool MPSHeapAllocatorImpl::release_available_cached_buffers(const AllocParams& p) +bool MPSHeapAllocatorImpl::release_available_cached_buffers(AllocParams& params) { - BufferPool& pool = *p.pool; + BufferPool& pool = *params.pool; - if (max_split_size() == std::numeric_limits::max() || pool.buffers.empty()) + if (pool.buffers.empty()) return false; - BufferBlock key = p.search_key; - key.size = (key.size < max_split_size()) ? 
max_split_size() : key.size; - auto it = pool.buffers.lower_bound(&key); + auto it = pool.buffers.lower_bound(¶ms.search_key); if (it == pool.buffers.end()) { size_t totalReleased = 0; --it; - while ((totalReleased < key.size) && ((*it)->size >= max_split_size())) { + while (totalReleased < params.search_key.size) { auto cur = it; totalReleased += (*it)->size; if (it != pool.buffers.begin()) { @@ -277,7 +356,7 @@ break; } } - if (totalReleased < key.size) + if (totalReleased < params.search_key.size) return false; } else { release_buffer(*it); @@ -287,14 +366,179 @@ bool MPSHeapAllocatorImpl::release_cached_buffers() { + if (m_debug_verbosity >= DebugVerbosity::PROFILING) { + std::cerr << "Releasing buffer pools (MPS allocated: " << format_size(m_total_allocated_memory) + << ", other allocations: " << format_size(current_allocated_size() - m_total_allocated_memory) << ")\n"; + } + // before releasing the buffers make sure the command buffer has finished. + // we need to release the lock temporarily as synchronizing may cause deadlock with completion handlers. + m_mutex.unlock(); + m_stream->synchronize(SyncType::COMMIT_AND_WAIT); + m_mutex.lock(); // Free all cached blocks to system allocator release_buffers(m_large_pool_private); release_buffers(m_large_pool_shared); release_buffers(m_small_pool_private); release_buffers(m_small_pool_shared); + release_buffers(m_scalar_pool); return true; } +void MPSHeapAllocatorImpl::garbage_collect_cached_buffers(AllocParams& params) +{ + TORCH_INTERNAL_ASSERT(current_allocated_size() >= m_low_watermark_limit); + // attempt to collect garbage until we reach below low watermark limit + const auto target_size = current_allocated_size() - m_low_watermark_limit; + const BufferPool& pool = *params.pool; + // calculate the total age of the free-able blocks. We'll use it later to get the average age threshold. + double total_age = 0.0; + unsigned int freeable_block_count = 0, freed_count = 0; + size_t gc_reclaimed = 0; + + for (auto& b : pool.buffers) { + if (b->retainCount() <= 1) { + total_age += b->gc_count; + ++freeable_block_count; + } + } + if (freeable_block_count == 0) { + return; + } + // repeat GC until we reach reclaim > target size. + bool block_freed = true; + while (gc_reclaimed < target_size && block_freed && freeable_block_count > 0) { + // free blocks exceeding this age threshold first. + double age_threshold = total_age / freeable_block_count; + // stop iteration if we can no longer free a block. + block_freed = false; + // free blocks of > avg age. Stop garbage collection if we reach below the + // low watermark limit since re-allocation or fragmentation could be costly. + auto it = pool.buffers.begin(); + while (it != pool.buffers.end() && gc_reclaimed < target_size) { + BufferBlock* buffer_block = *it++; + if (buffer_block->gc_count >= age_threshold && buffer_block->retainCount() <= 1) { + block_freed = true; + gc_reclaimed += buffer_block->size; + total_age -= buffer_block->gc_count; + freeable_block_count--; + freed_count++; + release_buffer(buffer_block, !buffer_block->heap->is_split); + } + } + } + if (m_debug_verbosity & DebugVerbosity::RELEASES) { + std::cerr << "Garbage collected " << freed_count + << " buffers from large " + << ((pool.usage & UsageFlags::SHARED) ? 
"shared" : "private") + << " pool (total reclaimed: " << format_size(gc_reclaimed) + << ", #buffers: " << pool.buffers.size() << ")\n"; + } +} + +// public interface to MPSAllocator +id MPSHeapAllocatorImpl::malloc(size_t size, uint32_t usage) +{ + std::lock_guard lock(m_mutex); + + BufferBlock* buffer_block = alloc_buffer_block(size, usage); + return buffer_block ? buffer_block->buffer : nullptr; +} + +bool MPSHeapAllocatorImpl::isSharedBuffer(void* ptr) +{ + std::lock_guard lock(m_mutex); + + BufferBlock *buffer_block = get_allocated_buffer_block(ptr); + // it's OK for the buffer_block to not exist yet + return buffer_block && (buffer_block->heap->pool->usage & UsageFlags::SHARED); +} + +id MPSHeapAllocatorImpl::allocScalarBufferWithValue(void* value, size_t size) +{ + BufferBlock* buffer_block = nullptr; + { + std::lock_guard lock(m_mutex); + + buffer_block = alloc_buffer_block(size, UsageFlags::SCALAR); + if (!buffer_block) + return nullptr; + } + // buffer is out of the pool, so no mutex lock is needed + memcpy([buffer_block->buffer contents], value, size); + return buffer_block->buffer; +} + +ssize_t MPSHeapAllocatorImpl::getRequestedBufferSize(void* ptr) +{ + std::lock_guard lock(m_mutex); + + BufferBlock *buffer_block = get_allocated_buffer_block(ptr); + if (buffer_block) + return (ssize_t) buffer_block->requested_size; + // -1 indicates the passed buffer pointer wasn't found + return -1; +} + +void MPSHeapAllocatorImpl::setBufferShape(void* ptr, const IntArrayRef& shape) +{ + std::lock_guard lock(m_mutex); + + BufferBlock *buffer_block = get_allocated_buffer_block(ptr); + TORCH_INTERNAL_ASSERT(buffer_block, "failed to find the buffer ", ptr); + // note that the IntArrayRef doesn't own the underlying data, and the backing + // memory for shape data must persist as long as the buffer is in use. + // So we need to copy to vector. 
+ buffer_block->shape = shape.vec(); +} + +IntArrayRef MPSHeapAllocatorImpl::getBufferShape(void* ptr) +{ + std::lock_guard lock(m_mutex); + + BufferBlock *buffer_block = get_allocated_buffer_block(ptr); + if (buffer_block && buffer_block->shape.size() > 0) + return IntArrayRef{buffer_block->shape}; + + return IntArrayRef(); +} + +void MPSHeapAllocatorImpl::free(void* ptr) +{ + BufferBlock *buffer_block = nullptr; + { + std::lock_guard lock(m_mutex); + + buffer_block = get_allocated_buffer_block(ptr); + TORCH_INTERNAL_ASSERT(buffer_block); + const BufferPool& pool = *buffer_block->heap->pool; + if (!(pool.usage & UsageFlags::SCALAR)) { + free_buffer(buffer_block); + return; + } + } + // we sync the scalar pool manually with a completion handler at the time the buffer is + // freed, when the MPSScalar instance goes out of scope + m_stream->addCompletedHandler(^(id ) { + std::lock_guard lock(m_mutex); + free_buffer(buffer_block); + }); +} + +void MPSHeapAllocatorImpl::emptyCache() +{ + std::lock_guard lock(m_mutex); + release_cached_buffers(); +} + +ssize_t MPSHeapAllocatorImpl::getLowWatermarkValue() +{ + // check if low watermark limit is disabled + if (m_low_watermark_ratio == 0.0) + return std::numeric_limits::max(); + // current_allocated_size could exceed m_low_watermark_limit (e.g., when swapping to disk) + return std::max(0, (ssize_t)(m_low_watermark_limit - current_allocated_size()) / 1048576L); +} + +} // namespace HeapAllocator // Use "at::mps::GetMPSAllocator()" to acquire a handle to MPS Allocator @@ -308,66 +552,66 @@ // MPS allocator struct to be registered with Pytorch struct TORCH_API MPSAllocator final : public at::Allocator { public: - explicit MPSAllocator(bool useSharedStorage) : - m_has_unified_memory(_getAllocImpl().Device().hasUnifiedMemory), m_use_shared_storage(useSharedStorage) + explicit MPSAllocator(uint32_t Usage) : + m_has_unified_memory(_getAllocImpl().Device().hasUnifiedMemory), m_usage(Usage) { - const bool enable_debug_info = isEnvVarEnabled("PYTORCH_DEBUG_MPS_ALLOCATOR"); - if (enable_debug_info) { - _getAllocImpl().enable_debug_info(); - if (!m_use_shared_storage || m_has_unified_memory) { + if (_getAllocImpl().getDebugVerbosity()) { + if (!(m_usage & HeapAllocator::UsageFlags::SHARED) || m_has_unified_memory) { + const size_t max_total_allowed_size = _getAllocImpl().getMaxTotalAllowedSize(); + const size_t low_watermark_limit = _getAllocImpl().getLowWatermarkLimit(); std::cerr << "Initializing " - << (useSharedStorage ? "shared" : "private") + << ((m_usage & HeapAllocator::UsageFlags::SHARED) ? "shared" : "private") << " heap allocator on " << (m_has_unified_memory ? "unified" : "discrete") << " device memory of size " - << _getAllocImpl().Device().recommendedMaxWorkingSetSize / 1048576UL << " MB\n"; + << _getAllocImpl().Device().recommendedMaxWorkingSetSize / 1048576UL << " MB" + << " (max allowed: " + << (max_total_allowed_size == std::numeric_limits::max() ? "unlimited" : + (to_string(max_total_allowed_size / 1048576UL) + " MB")) + << ", low watermark: " + << (low_watermark_limit == std::numeric_limits::max() ? "unlimited" : + (to_string(low_watermark_limit / 1048576UL) + " MB")) << ")\n"; } } } ~MPSAllocator() override { - _getAllocImpl().EmptyCache(); + _getAllocImpl().emptyCache(); } DataPtr allocate(const size_t nbytes) const override { - __block id buf = nbytes > 0 ? _getAllocImpl().Malloc(nbytes, m_use_shared_storage) : nullptr; + __block id buf = nbytes > 0 ?
_getAllocImpl().malloc(nbytes, m_usage) : nullptr; + return { buf, buf, &Delete, at::Device(at::DeviceType::MPS, 0)}; + } + + DataPtr allocate_scalar_buffer(void *value, size_t size) const { + id buf = _getAllocImpl().allocScalarBufferWithValue(value, size); return { buf, buf, &Delete, at::Device(at::DeviceType::MPS, 0)}; } DeleterFnPtr raw_deleter() const override { return &Delete; } bool is_shared(void* ptr) const { return _getAllocImpl().isSharedBuffer(ptr); } - bool is_shared_storge_supported() const { return m_has_unified_memory; } + bool is_shared_storage_supported() const { return m_has_unified_memory; } private: bool m_has_unified_memory; - // use shared buffers on unified memory - bool m_use_shared_storage; + uint32_t m_usage; static void Delete(void* ptr) { if (ptr) { - _getAllocImpl().Free(ptr); - } - } - - static bool isEnvVarEnabled(const char *envvar) { - const char *e = getenv(envvar); - if (e) { - char *t = (char*) e; - long val = strtol(e, &t, 0); - return (t != e && val != 0); + _getAllocImpl().free(ptr); } - return false; } }; namespace { MPSAllocator& _getSharedAllocator() { - static MPSAllocator s_mps_shared_alloc(true); + static MPSAllocator s_mps_shared_alloc(HeapAllocator::UsageFlags::SHARED); return s_mps_shared_alloc; } MPSAllocator& _getPrivateAllocator() { - static mps::MPSAllocator s_mps_private_alloc(false); + static MPSAllocator s_mps_private_alloc(HeapAllocator::UsageFlags::PRIVATE); return s_mps_private_alloc; } } // anonymous namespace @@ -375,14 +619,14 @@ static bool isEnvVarEnabled(const char *envvar) { at::Allocator* getMPSSharedAllocator() { auto& sa = _getSharedAllocator(); - if (sa.is_shared_storge_supported()) { + if (sa.is_shared_storage_supported()) { return &sa; } return nullptr; } -at::Allocator* getMPSStaticAllocator() { +at::Allocator* getMPSPrivateAllocator() { return &_getPrivateAllocator(); } @@ -397,7 +641,15 @@ void set_buffer_shape(void* ptr, const IntArrayRef& shape) { IntArrayRef get_buffer_shape(void* ptr) { return _getAllocImpl().getBufferShape(ptr); -}; +} + +DataPtr allocate_scalar_buffer(void *value, size_t size) { + return _getPrivateAllocator().allocate_scalar_buffer(value, size); +} + +uint32_t get_adaptive_commit_threshold() { + return _getAllocImpl().getLowWatermarkValue(); +} } // namespace mps diff --git a/aten/src/ATen/mps/MPSDevice.h b/aten/src/ATen/mps/MPSDevice.h index d957c5440a06..48e1904346c1 100644 --- a/aten/src/ATen/mps/MPSDevice.h +++ b/aten/src/ATen/mps/MPSDevice.h @@ -11,9 +11,15 @@ #include #include typedef id MTLDevice_t; +typedef id MTLLibrary_t; +typedef id MTLFunction_t; +typedef MTLFunctionConstantValues* MTLFunctionConstantValues_t; #else typedef void* MTLDevice; typedef void* MTLDevice_t; +typedef void* MTLLibrary_t; +typedef void* MTLFunction_t; +typedef void* MTLFunctionConstantValues_t; #endif using namespace std; @@ -47,16 +53,25 @@ class TORCH_API MPSDevice { MTLDevice_t device() { return _mtl_device; } + /** + * Returns whether running on Ventura or newer + */ + bool isMacOS13Plus() const; + + MTLFunction_t metalIndexingFunction(const std::string &kernel, MTLFunctionConstantValues_t constantValues); ~MPSDevice(); private: static MPSDevice* _device; MTLDevice_t _mtl_device; + bool _macos13plus; + MTLLibrary_t _mtl_indexing_library; MPSDevice(); }; TORCH_API bool is_available(); +TORCH_API bool is_macos_13_or_newer(); TORCH_API at::Allocator* GetMPSAllocator(bool useSharedAllocator = false); diff --git a/aten/src/ATen/mps/MPSDevice.mm b/aten/src/ATen/mps/MPSDevice.mm index 
277510066649..c11621b3f354 100644 --- a/aten/src/ATen/mps/MPSDevice.mm +++ b/aten/src/ATen/mps/MPSDevice.mm @@ -3,6 +3,7 @@ #include #include +#include namespace at { namespace mps { @@ -10,6 +11,15 @@ static std::unique_ptr mps_device; static c10::once_flag mpsdev_init; +static inline MTLLanguageVersion getMetalLanguageVersion(const id& device) { + // MPS Advanced Indexing needs at least Metal 2.0 (support for Argument Buffers and function constants) + // host_name attribute needs at least Metal 2.2 + MTLLanguageVersion languageVersion = MTLLanguageVersion2_2; + + TORCH_CHECK([device supportsFamily:MTLGPUFamilyMac2], "Missing Metal support for MTLGPUFamilyMac2"); + return languageVersion; +} + MPSDevice* MPSDevice::getInstance() { c10::call_once(mpsdev_init, [] { mps_device = std::unique_ptr(new MPSDevice()); @@ -17,16 +27,46 @@ return mps_device.get(); } +id MPSDevice::metalIndexingFunction(const std::string& kernel, MTLFunctionConstantValues* constantValues) { + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(_mtl_device); + NSError* error = nil; + if (!_mtl_indexing_library) { + MTLCompileOptions *options = [MTLCompileOptions new]; + [options setLanguageVersion: getMetalLanguageVersion(_mtl_device)]; + [options setFastMathEnabled: YES]; + _mtl_indexing_library = [_mtl_device newLibraryWithSource: [NSString stringWithCString: mps::indexing_metal_shaders encoding:NSASCIIStringEncoding] + options: options + error: &error]; + TORCH_CHECK(_mtl_indexing_library, "Failed to create indexing library, error: ", [[error description] UTF8String]); + } + + id indexFunction = nil; + if (constantValues) { + indexFunction = [[_mtl_indexing_library newFunctionWithName: [NSString stringWithUTF8String: kernel.c_str()] + constantValues: constantValues + error: &error] autorelease]; + } else { + indexFunction = [[_mtl_indexing_library newFunctionWithName: [NSString stringWithUTF8String: kernel.c_str()]] autorelease]; + } + + TORCH_CHECK(indexFunction, "Failed to create specialized function state object: ", kernel, ", error: ", [[error description] UTF8String]); + + return indexFunction; +} + MPSDevice::~MPSDevice() { [_mtl_device release]; + [_mtl_indexing_library release]; _mtl_device = nil; + _mtl_indexing_library = nil; } -MPSDevice::MPSDevice(): _mtl_device(nil) { +MPSDevice::MPSDevice(): _mtl_device(nil), _mtl_indexing_library(nil) { // Check that MacOS 12.3+ version of MPS framework is available // Create the MPSGraph and check method introduced in 12.3+ // which is used by MPS backend. id mpsCD = NSClassFromString(@"MPSGraph"); + _macos13plus = [mpsCD instancesRespondToSelector:@selector(cumulativeSumWithTensor:axis:name:)] == YES; if ([mpsCD instancesRespondToSelector:@selector(LSTMWithSourceTensor: recurrentWeight: inputWeight: @@ -37,6 +77,7 @@ name:)] == NO) { return; } + NSArray* devices = [MTLCopyAllDevices() autorelease]; for (unsigned long i = 0 ; i < [devices count] ; i++) { id device = devices[i]; @@ -45,18 +86,27 @@ break; } } - assert(_mtl_device); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(_mtl_device); + +} + +bool MPSDevice::isMacOS13Plus() const { + return _macos13plus; } at::Allocator* getMPSSharedAllocator(); -at::Allocator* getMPSStaticAllocator(); +at::Allocator* getMPSPrivateAllocator(); at::Allocator* GetMPSAllocator(bool useSharedAllocator) { - return useSharedAllocator ? getMPSSharedAllocator() : getMPSStaticAllocator(); + return useSharedAllocator ? 
getMPSSharedAllocator() : getMPSPrivateAllocator(); } bool is_available() { return MPSDevice::getInstance()->device() != nil; } +bool is_macos_13_or_newer() { + return MPSDevice::getInstance()->isMacOS13Plus(); +} + } // namespace mps } // namespace at diff --git a/aten/src/ATen/mps/MPSFallback.mm b/aten/src/ATen/mps/MPSFallback.mm index 7e6be9c772b9..f1c0dbbacdca 100644 --- a/aten/src/ATen/mps/MPSFallback.mm +++ b/aten/src/ATen/mps/MPSFallback.mm @@ -14,7 +14,7 @@ void mps_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) void mps_error_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) { - TORCH_CHECK_NOT_IMPLEMENTED(false, "The operator '", op.schema().operator_name(), "' is not current implemented ", + TORCH_CHECK_NOT_IMPLEMENTED(false, "The operator '", op.schema().operator_name(), "' is not currently implemented ", "for the MPS device. If you want this op to be added in priority during the prototype ", "phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. ", "As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` ", @@ -22,6 +22,20 @@ void mps_error_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) "on MPS.") } + +// This dispatch should never be called for tensors on MPS, but it is frequently called +// if one of the tensors is on the CPU +Tensor slow_conv2d_forward_mps( + const Tensor &self, + const Tensor &weight, + IntArrayRef kernel_size, + const c10::optional &bias, + IntArrayRef stride, + IntArrayRef padding) { + TORCH_CHECK(self.device() == weight.device(), __func__, ": input(device='", self.device(), "') and weight(device='", weight.device(), "') must be on the same device"); + TORCH_INTERNAL_ASSERT(false, __func__, " should not be called for both tensors on MPS device"); +} + TORCH_LIBRARY_IMPL(_, MPS, m) { static const char *enable_mps_fallback = getenv("PYTORCH_ENABLE_MPS_FALLBACK"); if (!enable_mps_fallback || std::stoi(enable_mps_fallback) == 0) { @@ -35,7 +49,6 @@ void mps_error_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) // These ops are not supported via MPS backend currently, and we fallback to run on CPU. // For the rest of unsupported ops the user needs to pass 'PYTORCH_ENABLE_MPS_FALLBACK=1' // to fallback on CPU, otherwise we will error out.
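The runtime check added in MPSDevice.h/.mm above (is_macos_13_or_newer) is meant to let kernels branch on OS capability. A minimal illustrative call site, not taken from this patch, might look like:

    // Illustrative only: branching on the new at::mps::is_macos_13_or_newer() helper
    if (at::mps::is_macos_13_or_newer()) {
      // take a native MPSGraph path that is only available on macOS 13 (Ventura) and newer
    } else {
      // fall back to a composite implementation or the CPU fallback path
    }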
- m.impl("bitwise_not.out", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); m.impl("bitwise_left_shift.Tensor_out", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); m.impl("bitwise_right_shift.Tensor_out", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); m.impl("embedding_renorm_", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); @@ -49,7 +62,7 @@ void mps_error_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) m.impl("linalg_vector_norm", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); m.impl("sgn.out", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); m.impl("nonzero", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); - m.impl("masked_select", torch::CppFunction::makeFromBoxedFunction<&mps_fallback>()); + m.impl("_slow_conv2d_forward", slow_conv2d_forward_mps); } } // namespace at diff --git a/aten/src/ATen/mps/MPSGuardImpl.h b/aten/src/ATen/mps/MPSGuardImpl.h index 27d32bf652e7..b6002497d223 100644 --- a/aten/src/ATen/mps/MPSGuardImpl.h +++ b/aten/src/ATen/mps/MPSGuardImpl.h @@ -109,12 +109,12 @@ struct TORCH_API MPSGuardImpl final : public c10::impl::DeviceGuardImplInterface struct OptionalMPSGuard { explicit OptionalMPSGuard() : guard_() {} - explicit OptionalMPSGuard(optional device_opt) + explicit OptionalMPSGuard(c10::optional device_opt) : guard_(device_opt) {} /// Set the current MPS device to the passed device index, if it is not /// nullopt - explicit OptionalMPSGuard(optional device_index_opt) + explicit OptionalMPSGuard(c10::optional device_index_opt) : guard_(device_index_opt) {} // Copy is not allowed @@ -144,14 +144,14 @@ struct OptionalMPSGuard { /// Returns the device that was set immediately prior to initialization of the /// guard, or nullopt if the guard is uninitialized. - optional original_device() const { + c10::optional original_device() const { return guard_.original_device(); } /// Returns the most recent device that was set using this device guard, /// either from construction, or via set_device, if the guard is initialized, /// or nullopt if the guard is uninitialized. 
- optional current_device() const { + c10::optional current_device() const { return guard_.current_device(); } diff --git a/aten/src/ATen/mps/MPSGuardImpl.mm b/aten/src/ATen/mps/MPSGuardImpl.mm index 2aedeccf82cb..787ef4cae7cd 100644 --- a/aten/src/ATen/mps/MPSGuardImpl.mm +++ b/aten/src/ATen/mps/MPSGuardImpl.mm @@ -35,7 +35,7 @@ auto mps_event = static_cast(*event); MPSStream mps_stream{stream}; - mps_event->recordEvent(&mps_stream); + mps_event->recordEvent(true); } void MPSGuardImpl::block( @@ -45,7 +45,7 @@ auto mps_event = static_cast(event); MPSStream mps_stream{stream}; - mps_event->waitForEvent(&mps_stream); + mps_event->waitForEvent(true); } bool MPSGuardImpl::queryEvent(void* event) const { diff --git a/aten/src/ATen/mps/MPSStream.h b/aten/src/ATen/mps/MPSStream.h index d4e6172954da..afd4d53e1cdd 100644 --- a/aten/src/ATen/mps/MPSStream.h +++ b/aten/src/ATen/mps/MPSStream.h @@ -43,6 +43,7 @@ enum class SyncType { COMMIT, // commit and flush the command buffer COMMIT_AND_WAIT, // flush and wait for command buffer execution to finish COMMIT_AND_CONTINUE,// commit and continue with a new underlying command buffer + COMMIT_ADAPTIVE, // commit adaptively based on available memory }; class TORCH_API MPSStream @@ -70,6 +71,7 @@ class TORCH_API MPSStream size_t length, size_t srcOffset, size_t dstOffset, bool non_blocking); void flush(); void executeMPSGraph(MPSGraph* mpsGraph, NSDictionary* feeds, NSDictionary* results, SyncType syncType = SyncType::NONE); + void addCompletedHandler(MTLCommandBufferHandler block); /// Get the MPS device index that this stream is associated with. c10::DeviceIndex device_index() const { return _stream.device_index(); } @@ -125,21 +127,27 @@ class TORCH_API MPSStreamImpl struct TORCH_API MPSEvent { - MPSEvent(); - // MPSEvent(id device); - + // for a new instance of MPSEvent, sometimes we want an empty shell and don't + // necessarily want to create events or listeners. So we defer initialization + // until we actually use the event (e.g., record, notify, etc.) 
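To make the deferred-initialization idea above concrete, here is a usage sketch of the reworked event API; it is illustrative only and assumes nothing beyond the member functions declared in the lines that follow.

    // Illustrative usage only (not part of the header):
    MPSEvent ev;            // deferInitialization defaults to true: no Metal objects created yet
    ev.recordEvent(true);   // lazily initializes, encodes a signal, and commits the stream
    ev.waitForEvent();      // encodes a wait for the last recorded signal value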
+ MPSEvent(bool deferInitialization = true); ~MPSEvent(); MTLSharedEvent_t event() const {return _event; } - void recordEvent(MPSStream *stream); - void waitForEvent(MPSStream *queue); // waits on the cpu - bool queryEvent(); - uint64_t getCurrentValue() { return _currentValue; } - void setCurrentValue(uint64_t currValue) { _currentValue = currValue; } + void recordEvent(bool syncEvent = false); + void waitForEvent(bool syncEvent = false); // waits on the cpu + void notifyEvent(MTLSharedEventNotificationBlock block); + bool queryEvent() const; + uint64_t getCurrentValue() const { return _signalCounter; } + void setCurrentValue(uint64_t currValue) { _signalCounter = currValue; } private: - bool _isRecorded = false; - uint64_t _currentValue = 0; + bool is_initialized; + uint64_t _signalCounter; + MPSStream* _stream; MTLSharedEvent_t _event; + MTLSharedEventListener* _listener; + + void initialize(); }; typedef MPSEvent* mpsEvent_t; diff --git a/aten/src/ATen/mps/MPSStream.mm b/aten/src/ATen/mps/MPSStream.mm index 948d5723cad9..04115fc268c7 100644 --- a/aten/src/ATen/mps/MPSStream.mm +++ b/aten/src/ATen/mps/MPSStream.mm @@ -5,7 +5,10 @@ namespace at { namespace mps { -#define USE_MPSCOMMANDBUFFER 1 +#define USE_COMMIT_AND_CONTINUE 1 + +// the frequency that we commit the command buffer calculated based on low watermark ratio in MPSAllocator +uint32_t get_adaptive_commit_threshold(); //----------------------------------------------------------------- // MPSStream @@ -47,6 +50,16 @@ case SyncType::COMMIT: flush(); break; + case SyncType::COMMIT_ADAPTIVE: + // the adaptive commit only commits if we hit the low watermark memory threshold + if (get_adaptive_commit_threshold() <= 1) { +#if USE_COMMIT_AND_CONTINUE + commitAndContinue(); +#else + flush(); +#endif + } + break; case SyncType::COMMIT_AND_WAIT: commitAndWait(); break; @@ -57,7 +70,7 @@ } void MPSStream::commit(bool doFlush) { -#if USE_MPSCOMMANDBUFFER +#if USE_COMMIT_AND_CONTINUE [commandBuffer() commitAndContinue]; #else if (doFlush) { @@ -96,6 +109,14 @@ [_commandBuffer release]; } +void MPSStream::addCompletedHandler(MTLCommandBufferHandler block) { + dispatch_sync(_serialQueue, ^() { + @autoreleasepool { + [commandBuffer() addCompletedHandler:block]; + } + }); +} + void MPSStream::fill(id buffer, uint8_t value, size_t length, size_t offset, SyncType syncType) { TORCH_INTERNAL_ASSERT(length >= offset); @@ -138,7 +159,7 @@ void MPSStream::executeMPSGraph(MPSGraph* mpsGraph, NSDictionary* feeds, NSDictionary* results, SyncType syncType) { dispatch_sync(_serialQueue, ^() { -#if USE_MPSCOMMANDBUFFER +#if USE_COMMIT_AND_CONTINUE [mpsGraph encodeToCommandBuffer:commandBuffer() feeds:feeds targetOperations:nil @@ -185,40 +206,73 @@ // MPSEvent //----------------------------------------------------------------- -MPSEvent::MPSEvent() { - _event = [MPSDevice::getInstance()->device() newSharedEvent]; +MPSEvent::MPSEvent(bool deferInitialization) : + is_initialized(false), _signalCounter(0), _stream(nil), _event(nil), _listener(nil) { + if (!deferInitialization) { + initialize(); + } } MPSEvent::~MPSEvent() { - [_event release]; - _event = nil; -} - -void MPSEvent::recordEvent(MPSStream* stream) { - @autoreleasepool { - _isRecorded = true; - dispatch_sync(stream->queue(), ^() { - @autoreleasepool { - id commandBuffer = stream->commandBuffer(); - [commandBuffer encodeSignalEvent:_event value:_currentValue]; - stream->commit(true); - } - }); + if (_event) { + [_event release]; + _event = nil; + } + if (_listener) { + [_listener release]; + 
_listener = nil; } } -void MPSEvent::waitForEvent(MPSStream* stream) { - dispatch_sync(stream->queue(), ^() { +void MPSEvent::initialize() { + _stream = getDefaultMPSStream(); + _event = [_stream->device() newSharedEvent]; + _listener = [[MTLSharedEventListener alloc] init]; + is_initialized = true; +} + +void MPSEvent::recordEvent(bool syncEvent) { + if (!is_initialized) + initialize(); + + dispatch_sync(_stream->queue(), ^() { + @autoreleasepool { + ++_signalCounter; + id commandBuffer = _stream->commandBuffer(); + [commandBuffer encodeSignalEvent:_event value:_signalCounter]; + if (syncEvent) + _stream->synchronize(SyncType::COMMIT); + } + }); +} + +void MPSEvent::waitForEvent(bool syncEvent) { + TORCH_INTERNAL_ASSERT(is_initialized); + dispatch_sync(_stream->queue(), ^() { + @autoreleasepool { + id commandBuffer = _stream->commandBuffer(); + [commandBuffer encodeWaitForEvent:_event value:_signalCounter]; + if (syncEvent) + _stream->synchronize(SyncType::COMMIT); + } + }); +} + +void MPSEvent::notifyEvent(MTLSharedEventNotificationBlock block) +{ + if (!is_initialized) + initialize(); + dispatch_sync(_stream->queue(), ^() { @autoreleasepool { - id commandBuffer = stream->commandBuffer(); - [commandBuffer encodeWaitForEvent:_event value:_currentValue]; - stream->commit(false); + ++_signalCounter; + [_event notifyListener:_listener atValue:_signalCounter block:block]; } }); } -bool MPSEvent::queryEvent() { - return !_isRecorded || (_event.signaledValue >= _currentValue); +bool MPSEvent::queryEvent() const { + // return false if not recorded or signaled yet + return _signalCounter && (_event.signaledValue >= _signalCounter); } } // namespace mps diff --git a/aten/src/ATen/native/Activation.cpp b/aten/src/ATen/native/Activation.cpp index 97f504b85dd1..bef09e81a5ea 100644 --- a/aten/src/ATen/native/Activation.cpp +++ b/aten/src/ATen/native/Activation.cpp @@ -1,11 +1,12 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include +#include #include -#include -#include +#include +#include #include +#include #if defined(C10_MOBILE) && defined(USE_XNNPACK) #include #endif @@ -17,6 +18,63 @@ #include #endif +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { // computes `result = self <= threshold ? 
value : other` @@ -314,7 +372,7 @@ bool use_mkldnn(const Tensor& input) { if (!at::globalContext().userEnabledMkldnn()) { return false; } - if (!input.is_contiguous() || input.numel() == 1) { + if (!input.is_contiguous() || input.numel() <= 1) { return false; } return (input.is_mkldnn()) || // input is mkldnn Tensor @@ -599,10 +657,12 @@ Tensor rrelu_with_noise_backward( } Tensor rrelu(const Tensor & self, const Scalar& lower, const Scalar& upper, bool training, c10::optional generator) { + TORCH_CHECK(lower.to() <= upper.to(), "Lower bound should be less than or equal to the upper bound") return at::rrelu_with_noise(self, at::empty_like(self, LEGACY_CONTIGUOUS_MEMORY_FORMAT), lower, upper, training, generator); } Tensor & rrelu_(Tensor & self, const Scalar& lower, const Scalar& upper, bool training, c10::optional generator) { + TORCH_CHECK(lower.to() <= upper.to(), "Lower bound should be less than or equal to the upper bound") return at::rrelu_with_noise_(self, at::empty_like(self, LEGACY_CONTIGUOUS_MEMORY_FORMAT), lower, upper, training, generator); } @@ -639,7 +699,7 @@ Tensor prelu_cpu(const Tensor& self, const Tensor& weight_) { auto as_nd = [&](const Tensor& t) { TORCH_CHECK( t.dim() == 1 || t.dim() == 0, - "prelu: Expected `weight` to be a scalar or 1D tensor, but got ndim = ", t.dim()); + "prelu: Expected `weight` to be a scalar or 1D tensor, but got: ndim = ", t.dim()); if (ndim >= 2) { sizes[1] = t.dim() == 1 ? t.size(0) : 1; strides[1] = t.dim() == 1 ? t.stride(0) : 0; diff --git a/aten/src/ATen/native/Activation.h b/aten/src/ATen/native/Activation.h index ba2dbc0768e8..64f6c6a6dceb 100644 --- a/aten/src/ATen/native/Activation.h +++ b/aten/src/ATen/native/Activation.h @@ -1,6 +1,8 @@ #pragma once #include +#include +#include namespace c10 { class Scalar; diff --git a/aten/src/ATen/native/AdaptiveAveragePooling.cpp b/aten/src/ATen/native/AdaptiveAveragePooling.cpp index cf4321a1d2d6..b612ef009b65 100644 --- a/aten/src/ATen/native/AdaptiveAveragePooling.cpp +++ b/aten/src/ATen/native/AdaptiveAveragePooling.cpp @@ -1,9 +1,21 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -16,16 +28,16 @@ namespace { IntArrayRef output_size) { TORCH_CHECK(output_size.size() == 2, "adaptive_avg_pool2d: output_size must be 2"); - int64_t ndim = input.ndimension(); - for (const auto i : c10::irange(1, ndim)) { + int64_t ndim = input.dim(); + TORCH_CHECK((ndim == 3 || ndim == 4), + "adaptive_avg_pool2d(): Expected 3D or 4D tensor, but got ", input.sizes()); + for (const auto i : {-2, -1}) { TORCH_CHECK(input.size(i) > 0, "adaptive_avg_pool2d(): Expected input to have non-zero size for non-batch dimensions, " - "but input has sizes ", input.sizes(), " with dimension ", i, " being " + "but input has sizes ", input.sizes(), " with dimension ", i + ndim, " being " "empty"); } - TORCH_CHECK((ndim == 3 || ndim == 4), - "adaptive_avg_pool2d(): Expected 3D or 4D tensor, but got ", input.sizes()); TORCH_CHECK(input.dtype() == output.dtype(), "expected dtype ", input.dtype(), " for `output` but got dtype ", output.dtype()); @@ -95,7 +107,7 @@ namespace { return output; } - Tensor adaptive_avg_pool2d(at::Tensor const& input, IntArrayRef output_size) { + Tensor adaptive_avg_pool2d_symint(at::Tensor const& input, SymIntArrayRef output_size) { TORCH_CHECK(output_size.size() == 2, 
"adaptive_avg_pool2d: output_size must be 2"); TORCH_CHECK( (output_size[0] >= 0 && output_size[1] >= 0), @@ -103,10 +115,10 @@ namespace { "but received {", output_size[0], ", ", output_size[1], "}"); if (input.is_mkldnn()) { - return at::mkldnn_adaptive_avg_pool2d(input, output_size); + return at::mkldnn_adaptive_avg_pool2d(input, c10::asIntArrayRefSlow(output_size)); } - if (!input.is_quantized() && output_size[0] == 1 && output_size[1] == 1) { + if (!input.is_quantized() && output_size[0] == 1 && output_size[1] == 1 && !input.is_xpu()) { // in this case, adaptive pooling is just computing mean over hw // dimensions, which can be done more efficiently #if defined(C10_MOBILE) && defined(USE_XNNPACK) @@ -118,13 +130,13 @@ namespace { Tensor out = input.mean({-1, -2}, /* keepdim = */ true); if (input.suggest_memory_format() == at::MemoryFormat::ChannelsLast) { // assert ndim == 4, since ndim = 3 doesn't give channels_last - const int n = input.size(0); - const int c = input.size(1); - out.as_strided_({n, c, 1, 1}, {c, 1, c, c}); + const auto n = input.sym_size(0); + const auto c = input.sym_size(1); + out.as_strided__symint({n, c, 1, 1}, {c, 1, c, c}); } return out; } else { - return _adaptive_avg_pool2d(input, output_size); + return _adaptive_avg_pool2d_symint(input, output_size); } } diff --git a/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp b/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp index 64b7014b1def..a0a02ca53160 100644 --- a/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp +++ b/aten/src/ATen/native/AdaptiveAveragePooling3d.cpp @@ -1,21 +1,35 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { -inline int start_index(int a, int b, int c) { +inline int64_t start_index(int64_t a, int64_t b, int64_t c) { // NOLINTNEXTLINE(cppcoreguidelines-narrowing-conversions,bugprone-narrowing-conversions) - return (int)std::floor((float)(a * c) / b); + return (a / b) * c + ((a % b) * c) / b; } -inline int end_index(int a, int b, int c) { +inline int64_t end_index(int64_t a, int64_t b, int64_t c) { // NOLINTNEXTLINE(cppcoreguidelines-narrowing-conversions,bugprone-narrowing-conversions) - return (int)std::ceil((float)((a + 1) * c) / b); + return 1 + ((a + 1) * c - 1) / b; } template @@ -299,20 +313,20 @@ Tensor adaptive_avg_pool3d_cpu(Tensor const& input, IntArrayRef output_size) { return output; } -Tensor adaptive_avg_pool3d(Tensor const& input, IntArrayRef output_size) { +Tensor adaptive_avg_pool3d_symint(Tensor const& input, SymIntArrayRef output_size) { TORCH_CHECK(output_size.size() == 3, "adaptive_avg_pool3d: output_size must be 3"); TORCH_CHECK( (output_size[0] >= 0 && output_size[1] >= 0 && output_size[2] >= 0), "adaptive_avg_pool2d: elements of output_size must be greater than or equal to 0 ", "but received {", output_size[0], ", ", output_size[1], ",", output_size[2], "}"); - if (output_size[0] == 1 && output_size[1] == 1 && output_size[2] == 1) { + if (output_size[0] == 1 && output_size[1] == 1 && output_size[2] == 1 && !input.is_xpu()) { // in this case, adaptive pooling is just computing mean over hw // dimensions, which can be done more efficiently Tensor out = input.mean({-1, -2, -3}, /* keepdim = */ true); return out; } else { - return _adaptive_avg_pool3d(input, output_size); + return _adaptive_avg_pool3d_symint(input, output_size); } 
} diff --git a/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp b/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp index 61e53f52f7b1..8f9c7ce274eb 100644 --- a/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp +++ b/aten/src/ATen/native/AdaptiveMaxPooling2d.cpp @@ -1,8 +1,15 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { namespace meta { diff --git a/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp b/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp index 5b9904e02249..ecfc151f0710 100644 --- a/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp +++ b/aten/src/ATen/native/AdaptiveMaxPooling3d.cpp @@ -1,9 +1,16 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif namespace at { namespace meta { @@ -66,12 +73,12 @@ namespace native { namespace { -inline int start_index(int a, int b, int c) { - return (int)std::floor((float)(a * c) / b); +inline int64_t start_index(int64_t a, int64_t b, int64_t c) { + return (a / b) * c + ((a % b) * c) / b; } -inline int end_index(int a, int b, int c) { - return (int)std::ceil((float)((a + 1) * c) / b); +inline int64_t end_index(int64_t a, int64_t b, int64_t c) { + return 1 + ((a + 1) * c - 1) / b; } // #define START_IND(a,b,c) a * c / b diff --git a/aten/src/ATen/native/AdaptivePooling.h b/aten/src/ATen/native/AdaptivePooling.h index 68fb08a5f397..6f6e49e195f4 100644 --- a/aten/src/ATen/native/AdaptivePooling.h +++ b/aten/src/ATen/native/AdaptivePooling.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include namespace at { @@ -19,11 +20,11 @@ DECLARE_DISPATCH(adaptive_max_pooling_fn, adaptive_max_pool2d_kernel); DECLARE_DISPATCH(adaptive_max_pooling_backward_fn, adaptive_max_pool2d_backward_kernel); static inline int64_t start_index(int64_t a, int64_t b, int64_t c) { - return (int64_t)std::floor((float)(a * c) / b); + return (a / b) * c + ((a % b) * c) / b; } static inline int64_t end_index(int64_t a, int64_t b, int64_t c) { - return (int64_t)std::ceil((float)((a + 1) * c) / b); + return 1 + ((a + 1) * c - 1) / b; } }} // namespace at::native diff --git a/aten/src/ATen/native/AffineGridGenerator.cpp b/aten/src/ATen/native/AffineGridGenerator.cpp index fc5b22324eaa..fe2c2d4aaa2b 100644 --- a/aten/src/ATen/native/AffineGridGenerator.cpp +++ b/aten/src/ATen/native/AffineGridGenerator.cpp @@ -1,5 +1,17 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/AutogradComposite.cpp b/aten/src/ATen/native/AutogradComposite.cpp index 08f38ce249bb..c4573d5be918 100644 --- a/aten/src/ATen/native/AutogradComposite.cpp +++ b/aten/src/ATen/native/AutogradComposite.cpp @@ -1,6 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/AveragePool2d.cpp b/aten/src/ATen/native/AveragePool2d.cpp index 1a3b88a62e12..441a320b7df2 100644 --- a/aten/src/ATen/native/AveragePool2d.cpp +++ b/aten/src/ATen/native/AveragePool2d.cpp @@ -1,7 +1,15 @@ -#include -#include +#define 
TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { diff --git a/aten/src/ATen/native/AveragePool3d.cpp b/aten/src/ATen/native/AveragePool3d.cpp index 1c4724eb038d..a31292ea2167 100644 --- a/aten/src/ATen/native/AveragePool3d.cpp +++ b/aten/src/ATen/native/AveragePool3d.cpp @@ -1,10 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { diff --git a/aten/src/ATen/native/BatchLinearAlgebra.cpp b/aten/src/ATen/native/BatchLinearAlgebra.cpp index b2dc974f5a3b..9800ab3a5a57 100644 --- a/aten/src/ATen/native/BatchLinearAlgebra.cpp +++ b/aten/src/ATen/native/BatchLinearAlgebra.cpp @@ -1,21 +1,124 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include -#include -#include +#include +#include #include #include #include #include -#include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + // First the required LAPACK implementations are registered here. 
// A comment above the registered LAPACK routine suggest which batched // linear algebra function uses that routine @@ -27,12 +130,6 @@ extern "C" void cgetrf_(int *m, int *n, std::complex *a, int *lda, int *i extern "C" void dgetrf_(int *m, int *n, double *a, int *lda, int *ipiv, int *info); extern "C" void sgetrf_(int *m, int *n, float *a, int *lda, int *ipiv, int *info); -// getri -extern "C" void zgetri_(int *n, std::complex *a, int *lda, int *ipiv, std::complex *work, int *lwork, int *info); -extern "C" void cgetri_(int *n, std::complex *a, int *lda, int *ipiv, std::complex *work, int *lwork, int *info); -extern "C" void dgetri_(int *n, double *a, int *lda, int *ipiv, double *work, int *lwork, int *info); -extern "C" void sgetri_(int *n, float *a, int *lda, int *ipiv, float *work, int *lwork, int *info); - // potrs extern "C" void zpotrs_(char *uplo, int *n, int *nrhs, std::complex *a, int *lda, std::complex *b, int *ldb, int *info); extern "C" void cpotrs_(char *uplo, int *n, int *nrhs, std::complex *a, int *lda, std::complex *b, int *ldb, int *info); @@ -454,6 +551,18 @@ TORCH_META_FUNC(_linalg_solve_ex)(const Tensor& A, set_output_contiguous(3, shape.slice(0, ndim - 2), A.options().dtype(kInt)); } +TORCH_META_FUNC(linalg_inv_ex)(const Tensor& A, bool check_errors) { + at::native::squareCheckInputs(A, "linalg.inv"); + at::native::checkFloatingOrComplex(A, "linalg.inv", /*allow_low_precision_dtypes*/false); + + auto shape = A.sizes(); + + auto result_strides = at::native::batched_matrix_contiguous_strides(shape, /*f-contig*=*/true); + set_output_strided(0, shape, result_strides, A.options(), {}); + set_output_contiguous( + 1, shape.slice(0, shape.size() - 2), A.options().dtype(ScalarType::Int)); // info +} + TORCH_META_FUNC(linalg_lu_factor_ex)(const Tensor& A, bool pivot, bool check_errors) { TORCH_CHECK(A.dim() >= 2, "torch.lu_factor: Expected tensor with 2 or more dimensions. 
Got size: ", A.sizes(), " instead"); @@ -682,31 +791,12 @@ namespace native { // Define the per-batch functions to be used in the main implementation of the batched // linear algebra operations -template -void lapackGetri(int n, scalar_t *a, int lda, int *ipiv, scalar_t *work, int lwork, int *info); - template void lapackCholeskySolve(char uplo, int n, int nrhs, scalar_t *a, int lda, scalar_t *b, int ldb, int *info); template void lapackSymeig(char jobz, char uplo, int n, scalar_t *a, int lda, value_t *w, scalar_t *work, int lwork, value_t *rwork, int *info); -template<> void lapackGetri>(int n, c10::complex *a, int lda, int *ipiv, c10::complex *work, int lwork, int *info) { - zgetri_(&n, reinterpret_cast*>(a), &lda, ipiv, reinterpret_cast*>(work), &lwork, info); -} - -template<> void lapackGetri>(int n, c10::complex *a, int lda, int *ipiv, c10::complex *work, int lwork, int *info) { - cgetri_(&n, reinterpret_cast*>(a), &lda, ipiv, reinterpret_cast*>(work), &lwork, info); -} - -template<> void lapackGetri(int n, double *a, int lda, int *ipiv, double *work, int lwork, int *info) { - dgetri_(&n, a, &lda, ipiv, work, &lwork, info); -} - -template<> void lapackGetri(int n, float *a, int lda, int *ipiv, float *work, int lwork, int *info) { - sgetri_(&n, a, &lda, ipiv, work, &lwork, info); -} - template<> void lapackLu>(int m, int n, c10::complex *a, int lda, int *ipiv, int *info) { zgetrf_(&m, &n, reinterpret_cast*>(a), &lda, ipiv, info); } @@ -1508,228 +1598,51 @@ void _linalg_check_errors( TORCH_INTERNAL_ASSERT(false); } -bool _requires_fw_or_bw_grad(const Tensor& input) { +// If an input requires fw or bw grad then we need to go down a different +// (slower) path to ensure that the gradients are computable. +// That is what `_may_require_fw_or_bw_grad` is helpful for. +// +// Why is there a isTensorSubclassLike check here? +// Without it, this function can lead to composite compliance problems, which +// may lead to bugs in functorch, where a Tensor Subclass that doesn't +// require grad may wrap a Tensor subclass that requires grad. +bool _may_require_fw_or_bw_grad(const Tensor& input) { return ((at::GradMode::is_enabled() && input.requires_grad()) - || input._fw_grad(/*level */ 0).defined()); -} - -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ inverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -/* -Computes the inverse of n-by-n matrix 'self' -This is an in-place routine, it overwrites the content of 'self'. -'infos_lu' and 'infos_getri' are int Tensors containing error codes for each matrix in the batched input. -'infos_lu' is for holding lapackLU errors, and 'infos_getri' is for holding lapackGetri errors. -For more information see LAPACK's documentation for GETRI and GETRF routines. 
-*/ -template -static void apply_inverse(Tensor& self, Tensor& infos_lu, Tensor& infos_getri) { -#if !AT_BUILD_WITH_LAPACK() - AT_ERROR("inverse: LAPACK library not found in compilation"); -#else - using value_t = typename c10::scalar_value_type::type; - auto self_data = self.data_ptr(); - auto self_matrix_stride = matrixStride(self); - auto batch_size = batchCount(self); - auto n = self.size(-2); - auto lda = std::max(1, n); - - auto ipiv = at::empty({lda}, self.options().dtype(kInt)); - auto ipiv_data = ipiv.data_ptr(); - auto infos_lu_data = infos_lu.data_ptr(); - auto infos_getri_data = infos_getri.data_ptr(); - - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - int info; - // Run once, first to get the optimum work size - // Since we deal with batches of matrices with the same dimensions, doing this outside - // the loop saves (batch_size - 1) workspace queries which would provide the same result - // and (batch_size - 1) calls to allocate and deallocate workspace using at::empty() - int lwork = -1; - scalar_t wkopt; - lapackGetri(n, self_data, lda, ipiv_data, &wkopt, lwork, &info); - lwork = std::max(1, real_impl(wkopt)); - Tensor work = at::empty({lwork}, self.options()); - auto work_data = work.data_ptr(); - - for (const auto i : c10::irange(batch_size)) { - scalar_t* self_working_ptr = &self_data[i * self_matrix_stride]; - int* info_lu_working_ptr = &infos_lu_data[i]; - lapackLu(n, n, self_working_ptr, lda, ipiv_data, info_lu_working_ptr); - - // now compute the actual inverse - int* info_getri_working_ptr = &infos_getri_data[i]; - lapackGetri(n, self_working_ptr, lda, ipiv_data, work_data, lwork, info_getri_working_ptr); - } -#endif -} - -Tensor inverse(const Tensor &self) { - if (self.numel() == 0) { - return at::empty_like(self); - } - return at::linalg_inv(self); -} - -Tensor& inverse_out(const Tensor &self, Tensor &result) { - at::linalg_inv_out(result, self); - return result; -} - -// This is a type dispatching helper function for 'apply_inverse' -Tensor& _linalg_inv_out_helper_cpu(Tensor &result, Tensor& infos_lu, Tensor& infos_getri) { - // This function calculates the inverse matrix in-place - // result should be in column major order and contain matrices to invert - // the content of result is overwritten by 'apply_inverse' - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "linalg_inv_out_cpu", [&]{ - apply_inverse(result, infos_lu, infos_getri); - }); - return result; + || input._fw_grad(/*level */ 0).defined() + || isTensorSubclassLike(input)); } -// Computes the inverse matrix of 'input', it is saved to 'result' in-place -// LAPACK/MAGMA/cuSOLVER error codes are saved in 'infos' tensors, they are not checked here -static Tensor& linalg_inv_out_info(Tensor& result, Tensor& infos_lu, Tensor& infos_getri, const Tensor& input) { - squareCheckInputs(input, "linalg.inv"); - checkSameDevice("linalg.inv", result, input); - checkLinalgCompatibleDtype("linalg.inv", result, input); - - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_lu.scalar_type() == kInt); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_getri.scalar_type() == kInt); - - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_lu.device() == input.device()); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_getri.device() == input.device()); - - bool result_input_same_type = (result.scalar_type() == input.scalar_type()); - bool result_equal_expected_shape = result.sizes().equals(input.sizes()); - bool is_batched_column_major = false; - if (result.dim() >= 2) { - is_batched_column_major = result.mT().is_contiguous(); - } - - // if 
result is not empty and not in batched column major format - bool copy_needed = (result.numel() != 0 && !is_batched_column_major); - copy_needed |= !result_input_same_type; // or result does not have the same dtype as input - copy_needed |= (result.numel() != 0 && !result_equal_expected_shape); // or result does not have the expected shape - // we have to allocate a temporary tensor - - // similar conditions for infos_lu and infos_getri tensors - auto expected_info_shape = IntArrayRef(input.sizes().cbegin(), input.sizes().cend() - 2); // input.shape[:-2] - copy_needed |= (infos_lu.numel() != 0 && !infos_lu.is_contiguous()); - copy_needed |= (infos_lu.numel() != 0 && !(infos_lu.sizes().equals(expected_info_shape))); - - copy_needed |= (infos_getri.numel() != 0 && !infos_getri.is_contiguous()); - copy_needed |= (infos_getri.numel() != 0 && !(infos_getri.sizes().equals(expected_info_shape))); - - if (copy_needed) { - Tensor result_tmp = at::empty(input.sizes(), input.options()); - result_tmp.transpose_(-2, -1); - Tensor infos_lu_tmp = at::zeros({expected_info_shape}, input.options().dtype(kInt)); - Tensor infos_getri_tmp = at::zeros({expected_info_shape}, input.options().dtype(kInt)); - - result_tmp = linalg_inv_out_info(result_tmp, infos_lu_tmp, infos_getri_tmp, input); - - at::native::resize_output(result, result_tmp.sizes()); - result.copy_(result_tmp); - at::native::resize_output(infos_lu, infos_lu_tmp.sizes()); - infos_lu.copy_(infos_lu_tmp); - at::native::resize_output(infos_getri, infos_getri_tmp.sizes()); - infos_getri.copy_(infos_getri_tmp); - return result; - } - // else use result's storage directly - - // if result has no elements we can modify it - if (result.numel() == 0) { - at::native::resize_as_(result, input.mT(), MemoryFormat::Contiguous); - result.transpose_(-2, -1); - } - - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.sizes().equals(input.sizes())); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.scalar_type() == input.scalar_type()); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.device() == input.device()); - - // result tensor must be in batched column major order (Fortran contiguous) - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.mT().is_contiguous()); - - // if info has no elements we can modify it - if (infos_lu.numel() == 0) { - infos_lu.resize_(expected_info_shape); - infos_lu.fill_(0); - } - if (infos_getri.numel() == 0) { - infos_getri.resize_(expected_info_shape); - infos_getri.fill_(0); +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ linalg.inv ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +TORCH_IMPL_FUNC(linalg_inv_ex_out)(const Tensor& A, bool check_errors, const Tensor& result, const Tensor& info) { + // Fill result with the identity + result.zero_(); + result.diagonal(0, -2, -1).fill_(1.); + at::linalg_solve_ex_out(const_cast(result), const_cast(info), A, result, /*left*/true); + if (check_errors) { + at::_linalg_check_errors(info, "linalg.inv_ex", A.dim() == 2); } - - // info tensors must be contiguous - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_lu.is_contiguous()); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_lu.sizes().equals(expected_info_shape)); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_getri.is_contiguous()); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(infos_getri.sizes().equals(expected_info_shape)); - - // _linalg_inv_out_helper_ (apply_inverse) performs calculations in-place and result must be a copy of input - result.copy_(input); - - // TODO: Replace this helper with DECLARE/DEFINE_DISPATCH - result = at::_linalg_inv_out_helper_(result, infos_lu, infos_getri); - return result; } -// Computes the 
inverse matrix of 'input', it is saved to 'result' in-place -Tensor& linalg_inv_out(const Tensor &input, Tensor &result) { - auto info_shape = IntArrayRef(input.sizes().cbegin(), input.sizes().cend() - 2); // input.shape[:-2] - auto infos_lu = at::zeros({info_shape}, input.options().dtype(kInt)); - auto infos_getri = at::zeros({info_shape}, input.options().dtype(kInt)); - result = linalg_inv_out_info(result, infos_lu, infos_getri, input); - - // Now check LAPACK/MAGMA/cuSOLVER error codes - at::_linalg_check_errors(infos_lu, "linalg.inv", result.dim() == 2); - at::_linalg_check_errors(infos_getri, "linalg.inv", result.dim() == 2); +Tensor& linalg_inv_out(const Tensor& A, Tensor& result) { + auto info = at::empty({0}, A.options().dtype(kInt)); + at::linalg_inv_ex_out(result, info, A); + at::_linalg_check_errors(info, "linalg.inv", A.dim() == 2); return result; } -// Computes the inverse matrix of 'input' -Tensor linalg_inv(const Tensor &input) { +Tensor linalg_inv(const Tensor& A) { Tensor result, info; - std::tie(result, info) = at::linalg_inv_ex(input, /*check_errors=*/false); - - // we pass check_errors=false above and do the check here - // so that the name of the function is correct in the error message - at::_linalg_check_errors(info, "torch.linalg.inv", input.dim() == 2); + std::tie(result, info) = at::linalg_inv_ex(A); + at::_linalg_check_errors(info, "linalg.inv", A.dim() == 2); return result; } -std::tuple linalg_inv_ex_out(const Tensor& input, bool check_errors, Tensor& inverse, Tensor& info) { - squareCheckInputs(input, "linalg.inv_ex"); - ScalarType info_output_type = ScalarType::Int; - TORCH_CHECK( - info.scalar_type() == info_output_type, - "torch.linalg.inv_ex: ", - "Expected info to have ", info_output_type, " dtype, but got info with dtype ", info.scalar_type()); - - // provided `info` tensor is used to save the information about the LU decomposition of `input` - // in addition current implementation requires a separate tensor - // for saving the information about the inversion process after the LU decomposition - auto expected_info_shape = IntArrayRef(input.sizes().cbegin(), input.sizes().cend() - 2); // input.shape[:-2] - auto info_inversion = at::zeros({expected_info_shape}, input.options().dtype(kInt)); - - linalg_inv_out_info(inverse, info, info_inversion, input); - - if (check_errors) { - at::_linalg_check_errors(info, "torch.linalg.inv_ex", input.dim() == 2); - } - - return std::tuple(inverse, info); +Tensor& inverse_out(const Tensor& A, Tensor& result) { + return at::linalg_inv_out(result, A); } -std::tuple linalg_inv_ex(const Tensor& input, bool check_errors) { - squareCheckInputs(input, "linalg.inv_ex"); - Tensor inverse = at::empty(input.sizes(), input.options(), MemoryFormat::Contiguous); - inverse.transpose_(-2, -1); // make `inverse` tensor with batched column major format - auto info_shape = IntArrayRef(input.sizes().cbegin(), input.sizes().cend() - 2); // input.shape[:-2] - Tensor info = at::zeros({info_shape}, input.options().dtype(kInt)); - std::tie(inverse, info) = at::native::linalg_inv_ex_out(input, check_errors, inverse, info); - return std::make_tuple(inverse, info); +Tensor inverse(const Tensor& A) { + return at::linalg_inv(A); } // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cholesky_solve ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -2001,6 +1914,7 @@ TORCH_IMPL_FUNC(_linalg_solve_ex_out)(const Tensor& A, // Possible optimization: Compute the LU factorization of A^T if A is contiguous // Then we solve A^T X = B with adjoint=True // This saves a copy as A doesn't need 
to be copied into an F-contig matrix in lu_factor + // This optimization makes functorch's batching rule difficult. See NOTE [ solve_ex Batch Rule Contiguity ] const bool use_A_T = A.is_contiguous() && !A.is_complex(); at::linalg_lu_factor_ex_out(const_cast(LU), const_cast(pivots), @@ -2204,7 +2118,7 @@ TORCH_IMPL_FUNC(lu_unpack_out)(const Tensor& LU, .add_owned_input(pivots.contiguous()) .build(); - unpack_pivots_stub(pivots.device().type(), iter, std::min(m, n)); + unpack_pivots_stub(pivots.device().type(), iter, std::min(m, n), m); // Transform the permutation into a permutation matrix P.zero_(); @@ -2756,6 +2670,10 @@ Tensor& ormqr_out(const Tensor& input, const Tensor& tau, const Tensor& other, b left_size_condition, "] must be equal to input.shape[-2]"); + TORCH_CHECK( + tau.size(-1) <= input.size(-1), + "torch.ormqr: tau.shape[-1] must be less than or equal to input.shape[-1]"); + TORCH_CHECK( input.dim() - tau.dim() == 1, "torch.ormqr: ", @@ -2886,9 +2804,8 @@ std::tuple linalg_eigh_out(const Tensor& A, c10::string_view u Tensor linalg_eigvalsh(const Tensor& A, c10::string_view uplo) { - // See [Note: svdvals_compute_uv] for the condition in compute_v return std::get<0>(at::_linalg_eigh(A, uplo, - /*comptue_v=*/_requires_fw_or_bw_grad(A) || isTensorSubclassLike(A))); + /*compute_v=*/_may_require_fw_or_bw_grad(A))); } Tensor& linalg_eigvalsh_out(const Tensor& A, c10::string_view uplo, Tensor& L) { @@ -3346,7 +3263,7 @@ Tensor& linalg_eigvals_out(const Tensor& input, Tensor& values) { Tensor linalg_eigvals(const Tensor& input) { // if input requires grad we must compute the eigenvectors to make this function differentiable // the eigenvectors are not exposed to the user - if (_requires_fw_or_bw_grad(input)) { + if (_may_require_fw_or_bw_grad(input)) { return std::get<0>(at::linalg_eig(input)); } @@ -3358,66 +3275,6 @@ Tensor linalg_eigvals(const Tensor& input) { return values; } -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ eig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -DEFINE_DISPATCH(eig_stub); - -std::tuple eig_out(const Tensor& self, bool eigenvectors, Tensor& e, Tensor& v) { - TORCH_WARN_ONCE( - "torch.eig is deprecated in favor of torch.linalg.eig and will be removed in a future ", - "PyTorch release.\n", - "torch.linalg.eig returns complex tensors of dtype cfloat or cdouble rather than real tensors ", - "mimicking complex tensors.\n", - "L, _ = torch.eig(A)\n", - "should be replaced with\n", - "L_complex = torch.linalg.eigvals(A)\n", - "and\n", - "L, V = torch.eig(A, eigenvectors=True)\n", - "should be replaced with\n", - "L_complex, V_complex = torch.linalg.eig(A)" - ); - TORCH_CHECK(self.dim() == 2, "input should be 2 dimensional"); - TORCH_CHECK(self.size(0) == self.size(1), "input should be square"); - TORCH_CHECK(self.isfinite().all().item(), "input should not contain infs or NaNs"); - checkSameDevice("torch.eig", e, self, "eigenvalues"); - checkLinalgCompatibleDtype("torch.eig", e, self, "eigenvalues"); - if (eigenvectors) { - checkSameDevice("torch.eig", v, self, "eigenvectors"); - checkLinalgCompatibleDtype("torch.eig", v, self, "eigenvectors"); - } - int64_t n = self.size(-1); - - if (isComplexType(at::typeMetaToScalarType(self.dtype()))) { - at::native::resize_output(e, {n}); - } else { - at::native::resize_output(e, {n, 2}); - } - if (eigenvectors) { - at::native::resize_output(v, self.sizes()); - } - - // optimization: if self is empty, we can immediately return the empty - // tensors, instead of getting empty tensors from eig_helper - if (self.numel() == 0) { - return 
std::tuple(e, v); - } - - Tensor vals_, vecs_; - std::tie(vals_, vecs_) = eig_stub(self.device().type(), self, eigenvectors); - e.copy_(vals_); - if (eigenvectors) { - v.copy_(vecs_); - } - return std::tuple(e, v); -} - -std::tuple eig(const Tensor& self, bool eigenvectors) { - Tensor e = at::empty({0}, self.options()); - Tensor v = at::empty({0}, self.options()); - at::eig_out(e, v, self, eigenvectors); - return std::tuple(e, v); -} - // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ linalg_svd ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /* torch.svd, implemented in terms of torch.linalg.svd. There are two main @@ -3516,12 +3373,8 @@ Tensor& linalg_svdvals_out(const Tensor& A, c10::optional driv } Tensor linalg_svdvals(const Tensor& A, c10::optional driver) { - // [Note: svdvals_compute_uv] - // NB: Why do we need isTensorSubclassLike check for linalg_svdvals but not linalg_eigvals? - // svdvals is decomposed at the vmap level in functorch so A can be a BatchedTensor wrapping - // a TensorWrapper requiring fw or bw grad. return std::get<1>(at::_linalg_svd(A, /*full_matrices=*/false, - /*comptue_uv=*/_requires_fw_or_bw_grad(A) || isTensorSubclassLike(A), + /*compute_uv=*/_may_require_fw_or_bw_grad(A), /*driver=*/driver)); } @@ -3766,14 +3619,16 @@ static void linalg_lstsq_out_info( at::sum_out(residuals, raw_residuals, /*dim=*/-2, /*keepdim=*/false, /*dtype*/real_dtype); } } - solution = solution.narrow(/*dim=*/-2, /*start=*/0, /*length*/n); + auto solution_view = solution.narrow(/*dim=*/-2, /*start=*/0, /*length*/n); + // manually restride original + solution.set_(solution.storage(), solution_view.storage_offset(), solution_view.sizes(), solution_view.strides()); if (m == 0) { solution.zero_(); } // for 1-dimensional 'other', we need to squeeze the solution after "apply_lstsq" if (vector_case) { - solution = solution.squeeze_(-1); + solution.squeeze_(-1); } } @@ -3987,106 +3842,6 @@ std::tuple linalg_lstsq( return std::make_tuple(solution, residuals, rank, singular_values); } -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ legacy_lstsq ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -// This wraps Lapack's gels routine, which uses a QR or LQ factorization to -// solve any linear system, minimizing ||A.X - B|| -// A & B must be fortran-contiguous matrixes. -// On exit, A is overwritten with the QR/LQ factorization of input A -// B is overwritten with the solution vectors -template -static void apply_lstsq(const Tensor& B, const Tensor& A) { -#if !AT_BUILD_WITH_LAPACK() - TORCH_INTERNAL_ASSERT(false, "lstsq: LAPACK library not found in compilation"); -#else - - int m, n, nrhs, lda, ldb, info, lwork; - scalar_t wkopt = 0.0; - lwork = -1; // work length - m = A.size(0); - n = A.size(1); - nrhs = B.size(1); - info = 0; - lda = m; - ldb = (m > n) ? 
m : n; - - auto B_data = B.data_ptr(); - auto A_data = A.data_ptr(); - - // get info how much space is needed - lapackGels('N', m, n, nrhs, A_data, lda, B_data, ldb, &wkopt, lwork, &info); - - lwork = static_cast(wkopt); - Tensor work_tensor = at::empty({lwork}, A.scalar_type()); - auto work = work_tensor.data_ptr(); - - lapackGels('N', m, n, nrhs, A_data, lda, B_data, ldb, work, lwork, &info); - - TORCH_CHECK( - info >= 0, - "Lapack Error in gels : Illegal argument ", -info); - TORCH_CHECK( - info == 0, - "Lapack Error in gels: The ", info, "-th diagonal element of the ", - "triangular factor of A is zero"); -#endif -} - -std::tuple legacy_lstsq(const Tensor& B, const Tensor& A) { - TORCH_WARN_ONCE( - "torch.lstsq is deprecated in favor of torch.linalg.lstsq and will be removed in a future PyTorch release.\n", - "torch.linalg.lstsq has reversed arguments and does not return the QR decomposition in " - "the returned tuple (although it returns other information about the problem).\n", - "To get the qr decomposition consider using torch.linalg.qr.\n", - "The returned solution in torch.lstsq stored the residuals of the solution in the ", - "last m - n columns of the returned value whenever m > n. In torch.linalg.lstsq, the ", - "residuals in the field 'residuals' of the returned named tuple.\n", - "The unpacking of the solution, as in\n", - "X, _ = torch.lstsq(B, A).solution[:A.size(1)]\n", - "should be replaced with\n", - "X = torch.linalg.lstsq(A, B).solution"); - - TORCH_CHECK(A.scalar_type() == B.scalar_type(), "Exepected A and B dtypes to match but found ", - A.scalar_type(), " and ", B.scalar_type()); - TORCH_CHECK(A.dim() == 2, "Expected A to have 2 dimensions, but got ", A.dim()); - TORCH_CHECK(A.numel() != 0, "A should not be empty"); - TORCH_CHECK(B.dim() == 1 || B.dim() == 2, "Expected B to have 1 or 2 " - "dimensions, but got ", B.dim()); - TORCH_CHECK(B.numel() != 0, "B should not be empty"); - TORCH_CHECK(A.size(0) == B.size(0), "Expected A and B to have same size " - "at dim 0, but A has ", A.size(0), " rows and B has ", B.size(0), " rows"); - - const auto a_sizes = A.sizes(); - const auto ldb = std::max(a_sizes[0], a_sizes[1]); - - auto A_working = cloneBatchedColumnMajor(A); - auto B_working = copyBatchedColumnMajor(B.dim() == 1 ? 
B.unsqueeze(1) : B, ldb); - - AT_DISPATCH_FLOATING_TYPES(B.scalar_type(), "lstsq_cpu", [&] { - apply_lstsq(B_working, A_working); - }); - - return std::tuple(B_working, A_working); -} - -std::tuple legacy_lstsq_out( - const Tensor& B, const Tensor& A, Tensor& B_out, Tensor& A_out) { - const auto dtype = A.scalar_type(); - TORCH_CHECK(B.scalar_type() == dtype, "exepected A and B dtypes to match but found ", - A.scalar_type(), " and ", B.scalar_type()); - TORCH_CHECK(A_out.scalar_type() == dtype, "A_out to have scalar type ", dtype, - " but found", A_out.scalar_type()); - TORCH_CHECK(B_out.scalar_type() == dtype, "A_out to have scalar type ", dtype, - " but found", B_out.scalar_type()); - Tensor A_tmp, B_tmp; - std::tie(B_tmp, A_tmp) = native::legacy_lstsq(B, A); - resize_output(A_out, A_tmp.sizes()); - A_out.copy_(A_tmp); - resize_output(B_out, B_tmp.sizes()); - B_out.copy_(B_tmp); - return std::tuple(B_out, A_out); -} - DEFINE_DISPATCH(ldl_factor_stub); TORCH_IMPL_FUNC(linalg_ldl_factor_ex_out) diff --git a/aten/src/ATen/native/BatchLinearAlgebra.h b/aten/src/ATen/native/BatchLinearAlgebra.h index 531595f3544e..955b83b3855a 100644 --- a/aten/src/ATen/native/BatchLinearAlgebra.h +++ b/aten/src/ATen/native/BatchLinearAlgebra.h @@ -231,10 +231,6 @@ using cholesky_inverse_fn = Tensor& (*)(Tensor& /*result*/, Tensor& /*infos*/, b DECLARE_DISPATCH(cholesky_inverse_fn, cholesky_inverse_stub); -using eig_fn = std::tuple (*)(const Tensor&, bool&); - -DECLARE_DISPATCH(eig_fn, eig_stub); - using linalg_eig_fn = void (*)(Tensor& /*eigenvalues*/, Tensor& /*eigenvectors*/, Tensor& /*infos*/, const Tensor& /*input*/, bool /*compute_eigenvectors*/); DECLARE_DISPATCH(linalg_eig_fn, linalg_eig_stub); @@ -284,7 +280,8 @@ DECLARE_DISPATCH(lu_factor_fn, lu_factor_stub); using unpack_pivots_fn = void(*)( TensorIterator& iter, - const int64_t dim_size); + const int64_t dim_size, + const int64_t max_pivot); DECLARE_DISPATCH(unpack_pivots_fn, unpack_pivots_stub); using lu_solve_fn = void (*)( diff --git a/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp b/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp index 5b18dbe2d5fa..e53d8cd2d38f 100644 --- a/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp +++ b/aten/src/ATen/native/BatchLinearAlgebraKernel.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -7,6 +9,14 @@ #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace { @@ -127,87 +137,6 @@ Tensor& cholesky_inverse_kernel_impl(Tensor& result, Tensor& infos, bool upper) return result; } -template -void apply_eig(const Tensor& self, bool eigenvectors, Tensor& vals_, Tensor& vecs_, int* info_ptr) { -#if !AT_BUILD_WITH_LAPACK() - TORCH_CHECK(false, "Calling torch.eig on a CPU tensor requires compiling ", - "PyTorch with LAPACK. Please use PyTorch built with LAPACK support."); -#else - using value_t = typename c10::scalar_value_type::type; - - char jobvr = eigenvectors ? 'V' : 'N'; - int64_t n = self.size(-1); - auto self_data = self.data_ptr(); - - auto vals_data = vals_.data_ptr(); - scalar_t* wr = vals_data; - - scalar_t* vecs_data = eigenvectors ? vecs_.data_ptr() : nullptr; - // NOLINTNEXTLINE(cppcoreguidelines-narrowing-conversions,bugprone-narrowing-conversions) - int ldvr = eigenvectors ? 
n : 1; - - Tensor rwork; - value_t* rwork_data = nullptr; - if (self.is_complex()) { - ScalarType real_dtype = toRealValueType(typeMetaToScalarType(self.dtype())); - rwork = at::empty({n*2}, self.options().dtype(real_dtype)); - rwork_data = rwork.data_ptr(); - } - - if (n > 0) { - // call lapackEig once to get the optimal size for work data - scalar_t wkopt; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - lapackEig('N', jobvr, n, self_data, n, wr, - nullptr, 1, vecs_data, ldvr, &wkopt, -1, rwork_data, info_ptr); - int lwork = std::max(1, real_impl(wkopt)); - - // call again to do the actual work - Tensor work = at::empty({lwork}, self.dtype()); - lapackEig('N', jobvr, n, self_data, n, wr, - nullptr, 1, vecs_data, ldvr, work.data_ptr(), lwork, rwork_data, info_ptr); - } -#endif -} - -std::tuple eig_kernel_impl(const Tensor& self, bool& eigenvectors) { - int64_t n = self.size(-1); - // lapackEig function expects the input to be column major, or stride {1, n}, - // so we must set the stride manually since the default stride for tensors is - // row major, {n, 1} - Tensor self_ = at::empty_strided( - {n, n}, - {1, n}, - at::TensorOptions(self.dtype())); - self_.copy_(self); - - auto options = self.options().memory_format(LEGACY_CONTIGUOUS_MEMORY_FORMAT); - - // the API is slightly different for the complex vs real case: if the input - // is complex, eigenvals will be a vector of complex. If the input is real, - // eigenvals will be a (n, 2) matrix containing the real and imaginary parts - // in each column - Tensor vals_; - if (self.is_complex()) { - vals_ = at::empty({n}, options); - } else { - vals_ = at::empty_strided({n, 2}, {1, n}, options); - } - Tensor vecs_ = eigenvectors - ? at::empty_strided({n, n}, {1, n}, options) - : Tensor(); - - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - auto infos = at::zeros({}, self.options().dtype(kInt)); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(self.scalar_type(), "eig_cpu", [&]{ - apply_eig(self_, eigenvectors, vals_, vecs_, infos.data_ptr()); - }); - // NOLINTNEXTLINE(clang-analyzer-core.CallAndMessage) - at::_linalg_check_errors(infos, "eig", /*is_matrix*/true); - - return std::tuple(vals_, vecs_); -} - /* Computes the eigenvalues and eigenvectors of n-by-n matrix 'input'. This is an in-place routine, content of 'input', 'values', 'vectors' is overwritten. @@ -522,15 +451,6 @@ Tensor& orgqr_kernel_impl(Tensor& result, const Tensor& tau) { return result; } -// we use `enum class LapackLstsqDriverType` as keys in an unordered_map. -// Clang5 and Gcc5 do not support std::hash for enum classes, hence -// we provide our own hash function. -struct LapackLstsqDriverTypeHash { - std::size_t operator()(const LapackLstsqDriverType& driver_type) const { - return static_cast(driver_type); - } -}; - /* Solves a least squares problem. That is minimizing ||B - A X||. 
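[Illustrative note, not part of the patch: the apply_lstsq kernels touched in the next hunk back torch.linalg.lstsq, the problem described in the comment just above, i.e. minimizing ||B - A X||. A minimal Python sketch of the user-facing call, assuming a build where torch.linalg is available:]

import torch

A = torch.randn(5, 3)              # m x n, overdetermined (m >= n)
B = torch.randn(5, 2)              # m x k right-hand sides

out = torch.linalg.lstsq(A, B)     # named tuple: solution, residuals, rank, singular_values
X = out.solution                   # n x k minimizer of ||A @ X - B||

# at the minimizer the residual is orthogonal to the column space of A
print(torch.allclose(A.T @ (A @ X - B), torch.zeros(3, 2), atol=1e-4))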
@@ -561,7 +481,7 @@ void apply_lstsq(const Tensor& A, Tensor& B, Tensor& rank, Tensor& singular_valu auto lapack_func = lapackLstsq; static auto driver_type_to_func - = std::unordered_map({ + = std::unordered_map({ {driver_t::Gels, lapackLstsq}, {driver_t::Gelsy, lapackLstsq}, {driver_t::Gelsd, lapackLstsq}, @@ -1072,6 +992,15 @@ void apply_lu_solve(const Tensor& LU, const Tensor& pivots, const Tensor& B, Tra // This is a type dispatching helper function for 'apply_lu_solve' void lu_solve_kernel(const Tensor& LU, const Tensor& pivots, const Tensor& B, TransposeType trans) { + // Lapack will write into unrelated memory if pivots are not in the right range so we do + // some simple sanity checks here for the CPU version + TORCH_CHECK(pivots.gt(0).all().item(), + "Pivots given to lu_solve must all be greater or equal to 1. " + "Did you properly pass the result of lu_factor?"); + TORCH_CHECK(pivots.le(LU.size(-2)).all().item(), + "Pivots given to lu_solve must all be smaller or equal to LU.size(-2). " + "Did you properly pass the result of lu_factor?"); + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(LU.scalar_type(), "linalg.lu_solve_cpu", [&]{ apply_lu_solve(LU, pivots, B, trans); }); @@ -1157,7 +1086,7 @@ void svd_kernel(const Tensor& A, }); } -void unpack_pivots_cpu_kernel(TensorIterator& iter, const int64_t dim_size) { +void unpack_pivots_cpu_kernel(TensorIterator& iter, const int64_t dim_size, const int64_t max_pivot) { if (iter.numel() == 0) { return; } @@ -1173,9 +1102,13 @@ void unpack_pivots_cpu_kernel(TensorIterator& iter, const int64_t dim_size) { const auto pivots_data = reinterpret_cast(pivots_ptr); for (const auto i : c10::irange(dim_size)) { + auto new_idx = pivots_data[i] - 1; + TORCH_CHECK(new_idx >= 0 && new_idx < max_pivot, + "pivots passed to lu_unpack must be between 1 and LU.size(-2) inclusive." 
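[Aside, not part of the diff: the TORCH_CHECKs added in this file guard lu_solve and lu_unpack against pivots outside the valid 1-based range, which previously could let LAPACK touch unrelated memory. A short Python sketch of the intended calling pattern, where pivots always come from lu_factor:]

import torch

A = torch.randn(4, 4)
LU, pivots = torch.linalg.lu_factor(A)    # pivots are 1-based, each in [1, A.size(-2)]

b = torch.randn(4, 1)
x = torch.linalg.lu_solve(LU, pivots, b)  # reuse the factorization to solve A x = b
print(torch.allclose(A @ x, b, atol=1e-4))

P, L, U = torch.lu_unpack(LU, pivots)     # hand-rolled or corrupted pivots now raise instead
print(torch.allclose(P @ L @ U, A, atol=1e-4))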
+ "Did you properly pass the result of lu_factor?"); std::swap( perm_data[i], - perm_data[pivots_data[i] - 1] + perm_data[new_idx] ); } @@ -1200,12 +1133,6 @@ REGISTER_AVX2_DISPATCH(cholesky_inverse_stub, &cholesky_inverse_kernel_impl); REGISTER_VSX_DISPATCH(cholesky_inverse_stub, &cholesky_inverse_kernel_impl); REGISTER_ZVECTOR_DISPATCH(cholesky_inverse_stub, &cholesky_inverse_kernel_impl); -REGISTER_ARCH_DISPATCH(eig_stub, DEFAULT, &eig_kernel_impl); -REGISTER_AVX512_DISPATCH(eig_stub, &eig_kernel_impl); -REGISTER_AVX2_DISPATCH(eig_stub, &eig_kernel_impl); -REGISTER_VSX_DISPATCH(eig_stub, &eig_kernel_impl); -REGISTER_ZVECTOR_DISPATCH(eig_stub, &eig_kernel_impl); - REGISTER_ARCH_DISPATCH(linalg_eig_stub, DEFAULT, &linalg_eig_kernel); REGISTER_AVX512_DISPATCH(linalg_eig_stub, &linalg_eig_kernel); REGISTER_AVX2_DISPATCH(linalg_eig_stub, &linalg_eig_kernel); diff --git a/aten/src/ATen/native/Batching.cpp b/aten/src/ATen/native/Batching.cpp index 109499f9cb17..b50b6201b7a2 100644 --- a/aten/src/ATen/native/Batching.cpp +++ b/aten/src/ATen/native/Batching.cpp @@ -1,3 +1,4 @@ +#include #include #include #include diff --git a/aten/src/ATen/native/BinaryOps.cpp b/aten/src/ATen/native/BinaryOps.cpp index 807170026a21..e0815b786d17 100644 --- a/aten/src/ATen/native/BinaryOps.cpp +++ b/aten/src/ATen/native/BinaryOps.cpp @@ -1,15 +1,149 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include -#include -#include +#include +#include +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include -#include -#include -#include -#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { @@ -880,7 +1014,8 @@ Tensor mul_zerotensor(const Tensor& self, const Tensor& other) { auto out_device = correct_out_device(self, other); // hack to use the TensorIterator to get the correct broadcasting and type promotion logic auto device_ = Device(DeviceType::Meta); - auto meta_out = at::redispatch::mul(c10::DispatchKeySet(at::DispatchKey::Meta), self.to(device_), other.to(device_)); + constexpr c10::DispatchKeySet meta_dks(at::DispatchKey::Meta); + auto meta_out = at::_ops::mul_Tensor::redispatch(meta_dks, self.to(device_), other.to(device_)); return at::_efficientzerotensor(meta_out.sizes(), 
meta_out.options().device(out_device)); } @@ -888,7 +1023,8 @@ Tensor div_zerotensor(const Tensor& self, const Tensor& other) { auto out_device = correct_out_device(self, other); // hack to use the TensorIterator to get the correct broadcasting and type promotion logic auto device_ = Device(DeviceType::Meta); - auto meta_out = at::redispatch::div(c10::DispatchKeySet(at::DispatchKey::Meta), self.to(device_), other.to(device_)); + constexpr c10::DispatchKeySet meta_dks(at::DispatchKey::Meta); + auto meta_out = at::_ops::div_Tensor::redispatch(meta_dks, self.to(device_), other.to(device_)); if (self._is_zerotensor()) { if (other._is_zerotensor()) { @@ -916,7 +1052,9 @@ Tensor maybe_add_maybe_sub(const Tensor& self, const Tensor& other, const Scalar auto out_device = correct_out_device(self, other); // hack to use the TensorIterator to get the correct broadcasting and type promotion logic auto device_ = Device(DeviceType::Meta); - auto meta_out = at::redispatch::add(c10::DispatchKeySet(at::DispatchKey::Meta), self.to(device_), other.to(device_)); + constexpr c10::DispatchKeySet meta_dks(at::DispatchKey::Meta); + auto meta_out = at::_ops::add_Tensor::redispatch( + meta_dks, self.to(device_), other.to(device_), alpha); auto get_out_like = [&] (const Tensor& tensor) { @@ -951,7 +1089,7 @@ Tensor linalg_cross_zerotensor( // hack to use the TensorIterator to get the correct broadcasting and type // promotion logic (see add_zerotensor) auto device = Device(DeviceType::Meta); - auto meta_out = at::redispatch::linalg_cross( + auto meta_out = at::_ops::linalg_cross::redispatch( c10::DispatchKeySet(at::DispatchKey::Meta), input.to(device), other.to(device), diff --git a/aten/src/ATen/native/Blas.cpp b/aten/src/ATen/native/Blas.cpp index 0e9f62d9a3f1..deda705d0887 100644 --- a/aten/src/ATen/native/Blas.cpp +++ b/aten/src/ATen/native/Blas.cpp @@ -1,12 +1,31 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { TORCH_META_FUNC(addmv)(const Tensor &self, const Tensor &mat, const Tensor &vec, const Scalar& beta, const Scalar& alpha) { diff --git a/aten/src/ATen/native/BlasKernel.cpp b/aten/src/ATen/native/BlasKernel.cpp index 9cf1f995f3ca..87182b3514df 100644 --- a/aten/src/ATen/native/BlasKernel.cpp +++ b/aten/src/ATen/native/BlasKernel.cpp @@ -1,8 +1,12 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include +#include #include +#include #include +#include +#include #if AT_BUILD_WITH_BLAS() extern "C" double ddot_(int *n, double *x, int *incx, double *y, int *incy); diff --git a/aten/src/ATen/native/Bucketization.cpp b/aten/src/ATen/native/Bucketization.cpp index 15d30c137d5b..7b53a31c5be7 100644 --- a/aten/src/ATen/native/Bucketization.cpp +++ b/aten/src/ATen/native/Bucketization.cpp @@ -1,10 +1,17 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + /* Implement a numpy like searchsorted and a TF like bucketize function running on cpu * * - torch.searchsorted(sorted_sequence, values, right=False, side='left', out_int32=False, sorter=None) diff --git a/aten/src/ATen/native/CPUBlas.cpp b/aten/src/ATen/native/CPUBlas.cpp index 13593a337949..b78e57fc63d6 
100644 --- a/aten/src/ATen/native/CPUBlas.cpp +++ b/aten/src/ATen/native/CPUBlas.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include diff --git a/aten/src/ATen/native/CPUFallback.cpp b/aten/src/ATen/native/CPUFallback.cpp index 5199fb8acc78..985ee15a5a99 100644 --- a/aten/src/ATen/native/CPUFallback.cpp +++ b/aten/src/ATen/native/CPUFallback.cpp @@ -1,13 +1,19 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include #include -#include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/CPUFallback.h b/aten/src/ATen/native/CPUFallback.h index 91f1f08c1184..2d4dfc98aa06 100644 --- a/aten/src/ATen/native/CPUFallback.h +++ b/aten/src/ATen/native/CPUFallback.h @@ -15,27 +15,21 @@ TORCH_API void cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* st // This is a helper function that backends can use to directly call their boxed CPU fallback // TODO: update and add a usage example after https://github.com/pytorch/pytorch/pull/58092 lands. -template +template struct _call_fallback_fn final {}; -template -struct _call_fallback_fn final { - static_assert(std::is_same::return_type>::value, - "Return type mismatch"); - static_assert(std::is_same, typename guts::infer_function_traits_t::parameter_types>::value, - "Parameter types mismatch"); - - static ReturnType call(ParameterTypes... args) { +template +struct _call_fallback_fn final { + static ReturnType call(typename c10::maybe_keep_symint::type... args) { auto op = c10::Dispatcher::singleton() // TODO: figure out how to make compiler happy without dynamic casts .findSchemaOrThrow((const char*) Op::name, (const char*) Op::overload_name) //.findSchemaOrThrow("a", "b") - .typed(); - return c10::impl::BoxedKernelWrapper::call( + .typed::type...)>(); + return c10::impl::BoxedKernelWrapper::type...)>::call( c10::BoxedKernel::makeFromFunction(), op, c10::DispatchKeySet(), // we know that the cpu_fallback doesn't use the dispatch keyset. - //std::forward(args...) // TODO: get std::forward<> to work args... 
); @@ -43,7 +37,10 @@ struct _call_fallback_fn final { }; template -using call_fallback_fn = _call_fallback_fn; +using call_fallback_fn_symint = _call_fallback_fn; + +template +using call_fallback_fn = _call_fallback_fn; } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ChanelShuffle.cpp b/aten/src/ATen/native/ChanelShuffle.cpp index 7def359e7056..a4f9f2bfe864 100644 --- a/aten/src/ATen/native/ChanelShuffle.cpp +++ b/aten/src/ATen/native/ChanelShuffle.cpp @@ -1,14 +1,23 @@ -#include - +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include #if defined(C10_MOBILE) && defined(USE_XNNPACK) #include #endif #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/Col2Im.cpp b/aten/src/ATen/native/Col2Im.cpp index f1e08a887c84..5ce747e9c7a7 100644 --- a/aten/src/ATen/native/Col2Im.cpp +++ b/aten/src/ATen/native/Col2Im.cpp @@ -1,12 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + // Note [im2col/col2im output padding] // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // Our implementations of im2col and col2im take both the input height/width as @@ -135,7 +144,6 @@ static void col2im_out_cpu_template( int64_t n_output_plane = n_input_plane / (kernel_width * kernel_height); output.resize_({batch_size, n_output_plane, output_height, output_width}); - output.zero_(); AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kBFloat16, kHalf, input.scalar_type(), "col2im_out_cpu", [&] { @@ -179,18 +187,6 @@ static void col2im_out_cpu_template( }); } -void col2im_backward_out_cpu_template( - Tensor& grad_input, - const Tensor& grad_output, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - // im2col_out_cpu checks size of kernel_size, dilation, padding and stride - at::native::im2col_out_cpu( - grad_output, kernel_size, dilation, padding, stride, grad_input); -} - } // namespace Tensor& col2im_out_cpu(const Tensor& input, @@ -219,29 +215,5 @@ Tensor col2im_cpu( return output; } -Tensor& col2im_backward_out_cpu(const Tensor& grad_output, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride, - Tensor& grad_input) { - col2im_backward_out_cpu_template( - grad_input, grad_output, kernel_size, dilation, padding, stride); - return grad_input; -} - -Tensor col2im_backward_cpu( - const Tensor& grad_output, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - Tensor grad_input = at::empty_like(grad_output, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - - col2im_backward_out_cpu_template( - grad_input, grad_output, kernel_size, dilation, padding, stride); - return grad_input; -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ComparisonUtils.cpp b/aten/src/ATen/native/ComparisonUtils.cpp new file mode 100644 index 000000000000..c16c361c3442 --- /dev/null +++ b/aten/src/ATen/native/ComparisonUtils.cpp @@ -0,0 +1,32 @@ +#include +#include +#include +#include +#include + +namespace at { + +class Tensor; + +namespace native { + +template +void _assert_match(const O& original, const C& compared, const std::string& name) { + if (compared) { + bool equal = (original == compared.value()); + if (!equal) { + std::stringstream msg; 
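[Aside on the Col2Im.cpp removal above, illustration only: col2im_backward_out_cpu was a thin wrapper that simply called im2col, reflecting that the two transforms are each other's backward. The public Python wrappers are fold/unfold; a minimal sketch:]

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)

cols = F.unfold(x, kernel_size=3)                     # im2col: (1, 3*3*3, 6*6) patch columns
y = F.fold(cols, output_size=(8, 8), kernel_size=3)   # col2im: scatter-add patches back

# fold(unfold(x)) weights each pixel by the number of patches covering it,
# and autograd routes fold's backward through unfold (and vice versa)
y.sum().backward()
print(x.grad.shape)                                   # torch.Size([1, 3, 8, 8])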
+ msg << "Tensor " << name << " mismatch!"; + AT_ASSERT(equal, msg.str()); + } + } +} + +void _assert_tensor_metadata(at::Tensor const& tensor, at::OptionalIntArrayRef sizes, at::OptionalIntArrayRef strides, c10::optional dtype) { + _assert_match(tensor.sizes(), sizes, "sizes"); + _assert_match(tensor.strides(), strides, "strides"); + _assert_match(tensor.dtype(), dtype, "dtype"); +} + +} +} // namespace at::native diff --git a/aten/src/ATen/native/ComplexHelper.h b/aten/src/ATen/native/ComplexHelper.h index 88668d13145c..9533115a7066 100644 --- a/aten/src/ATen/native/ComplexHelper.h +++ b/aten/src/ATen/native/ComplexHelper.h @@ -1,8 +1,15 @@ #pragma once -#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + // WARNING: this header contains non-inline functions and should be only // included from ONE cpp file @@ -11,19 +18,18 @@ namespace at { namespace native { // View tensor with new dtype, storage offset, sizes and strides inline Tensor view_tensor( const Tensor &tensor, ScalarType dtype, - int64_t offset, IntArrayRef sizes, IntArrayRef strides) { + c10::SymInt offset, SymIntArrayRef sizes, SymIntArrayRef strides) { Storage storage = tensor.storage(); auto key_set = tensor.key_set().remove(DispatchKey::Conjugate); auto new_tensor = detail::make_tensor( c10::TensorImpl::VIEW, std::move(storage), key_set, scalarTypeToTypeMeta(dtype)); auto * impl = new_tensor.unsafeGetTensorImpl(); - impl->set_storage_offset(offset); - impl->set_sizes_and_strides(sizes, strides); + impl->set_sizes_and_strides(sizes, strides, offset); return new_tensor; } -inline DimVector computeStrideForViewAsReal(IntArrayRef oldstride) { - DimVector res(oldstride.size() + 1); +inline SymDimVector computeStrideForViewAsReal(SymIntArrayRef oldstride) { + SymDimVector res(oldstride.size() + 1); for (const auto i : c10::irange(oldstride.size())) { res[i] = oldstride[i] * 2; } @@ -33,13 +39,13 @@ inline DimVector computeStrideForViewAsReal(IntArrayRef oldstride) { Tensor _view_as_real_physical(const Tensor& self) { TORCH_CHECK(self.is_complex(), "view_as_real is only supported for complex tensors"); - auto old_sizes = self.sizes(); - DimVector new_sizes(old_sizes.size() + 1); + auto old_sizes = self.sym_sizes(); + SymDimVector new_sizes(old_sizes.size() + 1); std::copy(old_sizes.begin(), old_sizes.end(), new_sizes.begin()); // last dimension will always have two elements containing the real and imag vals new_sizes.back() = 2; - auto new_strides = computeStrideForViewAsReal(self.strides()); - auto new_storage_offset = 2 * self.storage_offset(); + auto new_strides = computeStrideForViewAsReal(self.sym_strides()); + auto new_storage_offset = self.sym_storage_offset() * 2; const auto float_type = c10::toRealValueType(self.scalar_type()); auto real_tensor = view_tensor(self, float_type, new_storage_offset, new_sizes, new_strides); return real_tensor; @@ -53,11 +59,11 @@ Tensor view_as_real(const Tensor& self) { return _view_as_real_physical(self); } -inline DimVector computeStrideForViewAsComplex(IntArrayRef oldstride) { +inline SymDimVector computeStrideForViewAsComplex(SymIntArrayRef oldstride) { const int64_t dim = oldstride.size(); TORCH_CHECK(oldstride[dim-1] == 1, "Tensor must have a last dimension with stride 1"); - DimVector res(dim - 1); + SymDimVector res(dim - 1); for (const auto i : c10::irange(res.size())) { TORCH_CHECK(oldstride[i] % 2 == 0, "Tensor must have a stride divisible by 2 for all but last dimension"); res[i] = oldstride[i] / 2; @@ -72,16 +78,16 @@ 
Tensor view_as_complex(const Tensor& self) { self.scalar_type() == kFloat || self.scalar_type() == kDouble || self.scalar_type() == kHalf, "view_as_complex is only supported for half, float and double tensors, but got a tensor of scalar type: ", self.scalar_type()); - auto old_sizes = self.sizes(); + auto old_sizes = self.sym_sizes(); TORCH_CHECK(old_sizes.size() != 0, "Input tensor must have one or more dimensions"); TORCH_CHECK(old_sizes[old_sizes.size()-1] == 2, "Tensor must have a last dimension of size 2"); - DimVector new_sizes(old_sizes.begin(), old_sizes.end() - 1); + SymDimVector new_sizes(old_sizes.begin(), old_sizes.end() - 1); - const auto new_strides = computeStrideForViewAsComplex(self.strides()); + const auto new_strides = computeStrideForViewAsComplex(self.sym_strides()); const auto complex_type = c10::toComplexType(self.scalar_type()); - TORCH_CHECK(self.storage_offset() % 2 == 0, "Tensor must have a storage_offset divisible by 2"); - const auto new_storage_offset = self.storage_offset() / 2; + TORCH_CHECK(self.sym_storage_offset() % 2 == 0, "Tensor must have a storage_offset divisible by 2"); + const auto new_storage_offset = self.sym_storage_offset() / 2; return view_tensor(self, complex_type, new_storage_offset, new_sizes, new_strides); } diff --git a/aten/src/ATen/native/ConvUtils.h b/aten/src/ATen/native/ConvUtils.h index 8493deba7b33..880ce0c2af54 100644 --- a/aten/src/ATen/native/ConvUtils.h +++ b/aten/src/ATen/native/ConvUtils.h @@ -80,40 +80,7 @@ static inline bool cudnnv8_use_heur_mode_b() { return cudnnv8_heuristic_mode_b; } -// NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) -struct ConvParams { - std::vector stride; - std::vector padding; - std::vector dilation; - bool transposed; - std::vector output_padding; - int groups; - bool benchmark; - bool deterministic; - bool cudnn_enabled; - bool allow_tf32; - - bool is_strided() const; - bool is_dilated() const; - bool is_padded() const; - bool is_output_padding_neg() const; - bool is_output_padding_big() const; - bool is_padding_neg() const; - bool is_stride_nonpos() const; - void view1d_as_2d(); - bool use_cpu_depthwise3x3_winograd(const at::Tensor& input, const at::Tensor& weight) const; - bool needs_64bit_indexing_no_split(const at::Tensor& input, const at::Tensor& weight) const; - bool use_cudnn(const at::Tensor& input, const at::Tensor& weight) const; - bool use_cudnn_depthwise(const at::Tensor& input, const at::Tensor& weight) const; - bool use_miopen(const at::Tensor& input, const at::Tensor& weight, bool bias_defined) const; - bool use_mkldnn(const at::Tensor& input, const at::Tensor& weight) const; - bool use_nnpack(const at::Tensor& input, const at::Tensor& weight) const; - bool use_xnnpack(const at::Tensor& input, const at::Tensor& weight, - const at::OptionalIntArrayRef bias_sizes_opt) const; - bool use_mps(const at::Tensor& input, const at::Tensor& weight) const; - bool is_depthwise(const at::Tensor& input, const at::Tensor& weight) const; -}; - +// Keep in sync with py::enum_ in Module.cpp enum class ConvBackend { CudaDepthwise2d, CudaDepthwise3d, @@ -139,24 +106,16 @@ enum class ConvBackend { MpsTranspose, }; -// Function to select the convolution backend based on the inputs and params. -// This overload is used within the convolution internals but not exposed to python. -// NB: The forward pass provides a bias tensor while the backward pass provides -// a bool indicating whether the bias is defined. This is done to save memory by -// avoiding saving the full bias tensor for backward. 
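[Aside on the ComplexHelper.h hunk above, illustration only: view_as_real/view_as_complex now carry SymInt sizes, strides and storage offset, but the invariants they enforce are unchanged; a last dimension of size 2 with stride 1, strides divisible by 2 elsewhere, and an even storage offset. A small Python sketch of the user-visible behaviour:]

import torch

x = torch.randn(4, 2)                      # last dim of size 2, last stride 1
c = torch.view_as_complex(x)               # complex64 view, shape (4,), no copy
r = torch.view_as_real(c)                  # back to shape (4, 2)
print(c.dtype, r.shape, x.data_ptr() == r.data_ptr())

try:
    torch.view_as_complex(torch.randn(2, 4).t())   # transposed: last-dim stride != 1
except RuntimeError as err:
    print("rejected:", err)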
-TORCH_API ConvBackend select_conv_backend( - const Tensor& input, - const Tensor& weight, - const at::OptionalIntArrayRef bias_sizes_opt, - const bool need_backward, - const ConvParams& params); - // Overload for selecting the convolution backend from the full set of convolution inputs. // This overload is exposed to python for testing, etc. TORCH_API ConvBackend select_conv_backend( const Tensor& input, const Tensor& weight, const c10::optional& bias_opt, - IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, - bool transposed, IntArrayRef output_padding, int64_t groups); + IntArrayRef stride, SymIntArrayRef padding, IntArrayRef dilation, + bool transposed, SymIntArrayRef output_padding, int64_t groups, const at::OptionalSymIntArrayRef bias_sizes_opt); + +TORCH_API at::MemoryFormat _determine_backend_memory_format(const Tensor& input, + const Tensor& weight, + const ConvBackend backend); // --------------------------------------------------------------------- // @@ -227,7 +186,7 @@ static void convolution_shape_check( // Input checkDimRange(c, input, 3, 6 /* exclusive */); - checkSize(c, input, input_channels_dim, weight->size(1) * groups); + checkSize_symint(c, input, input_channels_dim, weight->size(1) * groups); // Weight checkSameDim(c, input, weight); @@ -241,15 +200,16 @@ static void convolution_shape_check( // as conv_output_size loses information; this is why conv_input_size // takes an extra output_padding argument to resolve the ambiguity. -static inline std::vector conv_output_size( - IntArrayRef input_size, IntArrayRef weight_size, - IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation = IntArrayRef() +template +static inline std::vector _conv_output_size( + ArrayRef input_size, ArrayRef weight_size, + ArrayRef padding, IntArrayRef stride, IntArrayRef dilation = IntArrayRef() ) { // ASSERT(input_size.size() > 2) // ASSERT(input_size.size() == weight_size.size()) bool has_dilation = dilation.size() > 0; auto dim = input_size.size(); - std::vector output_size(dim); + std::vector output_size(dim); output_size[0] = input_size[input_batch_size_dim]; output_size[1] = weight_size[weight_output_channels_dim]; for (const auto d : c10::irange(2, dim)) { @@ -260,40 +220,84 @@ static inline std::vector conv_output_size( return output_size; } -static inline std::vector conv_input_size( - IntArrayRef output_size, IntArrayRef weight_size, - IntArrayRef padding, IntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups +static inline std::vector conv_output_size( + IntArrayRef input_size, IntArrayRef weight_size, + IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation = IntArrayRef() +) { + return _conv_output_size(input_size, weight_size, padding, stride, dilation); +} + +static inline std::vector conv_output_size( + SymIntArrayRef input_size, SymIntArrayRef weight_size, + SymIntArrayRef padding, IntArrayRef stride, IntArrayRef dilation = IntArrayRef() +) { + return _conv_output_size(input_size, weight_size, padding, stride, dilation); +} + +template +std::vector _conv_input_size( + ArrayRef output_size, ArrayRef weight_size, + ArrayRef padding, ArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups ) { // ASSERT(output_size.size() > 2) // ASSERT(output_size.size() == weight_size.size()) auto dim = output_size.size(); - std::vector input_size(dim); + std::vector input_size(dim); input_size[0] = output_size[output_batch_size_dim]; input_size[1] = weight_size[weight_input_channels_dim] * groups; for (const auto d 
: c10::irange(2, dim)) { - int kernel = dilation[d - 2] * (weight_size[d] - 1) + 1; - input_size[d] = (output_size[d] - 1) * stride[d - 2] - (2 * padding[d - 2]) + + auto kernel = (weight_size[d] - 1) * dilation[d - 2] + 1; + input_size[d] = (output_size[d] - 1) * stride[d - 2] - (padding[d - 2] * 2) + kernel + output_padding[d - 2]; } return input_size; } -static inline std::vector conv_weight_size( - IntArrayRef input_size, IntArrayRef output_size, +static inline std::vector conv_input_size( + SymIntArrayRef output_size, SymIntArrayRef weight_size, + SymIntArrayRef padding, SymIntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups +) { + return _conv_input_size(output_size, weight_size, padding, output_padding, stride, dilation, groups); +} + +static inline std::vector conv_input_size( + IntArrayRef output_size, IntArrayRef weight_size, IntArrayRef padding, IntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups +) { + return _conv_input_size(output_size, weight_size, padding, output_padding, stride, dilation, groups); +} + +template +std::vector _conv_weight_size( + ArrayRef input_size, ArrayRef output_size, + ArrayRef padding, ArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups ) { auto dim = input_size.size(); - std::vector weight_size(dim); + std::vector weight_size(dim); weight_size[0] = output_size[1]; weight_size[1] = input_size[1] / groups; for (const auto d : c10::irange(2, dim)) { - int kernel = input_size[d] - (output_size[d] - 1) * stride[d - 2] - + 2 * padding[d - 2] - output_padding[d - 2]; + auto kernel = input_size[d] - (output_size[d] - 1) * stride[d - 2] + + padding[d - 2] * 2 - output_padding[d - 2]; weight_size[d] = (kernel - 1) / dilation[d - 2] + 1; } return weight_size; } +static inline std::vector conv_weight_size( + SymIntArrayRef input_size, SymIntArrayRef output_size, + SymIntArrayRef padding, SymIntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups +) { + return _conv_weight_size(input_size, output_size, padding, output_padding, stride, dilation, groups); +} + +static inline std::vector conv_weight_size( + IntArrayRef input_size, IntArrayRef output_size, + IntArrayRef padding, IntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups +) { + return _conv_weight_size(input_size, output_size, padding, output_padding, stride, dilation, groups); +} + static inline Tensor reshape_bias(int64_t dim, const Tensor& bias) { std::vector shape(dim, 1); shape[1] = -1; diff --git a/aten/src/ATen/native/Convolution.cpp b/aten/src/ATen/native/Convolution.cpp index 9f2d8efbd618..edb51a5c837d 100644 --- a/aten/src/ATen/native/Convolution.cpp +++ b/aten/src/ATen/native/Convolution.cpp @@ -1,20 +1,25 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include #include #include #include #include #include -#include #include #include - -#include #include - #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #if AT_NNPACK_ENABLED() #include #endif @@ -23,311 +28,70 @@ #include #endif -constexpr int MIOPEN_DIM_MAX = 5; - -namespace at { namespace native { - -DEFINE_DISPATCH(conv_depthwise2d_backward_stub); -DEFINE_DISPATCH(conv_depthwise3d_backward_stub); -DEFINE_DISPATCH(cudnn_convolution_backward_stub); -DEFINE_DISPATCH(cudnn_convolution_transpose_backward_stub); -DEFINE_DISPATCH(slow_conv_transpose3d_backward_stub); 
-DEFINE_DISPATCH(convolution_depthwise3x3_winograd_stub); -DEFINE_DISPATCH(miopen_convolution_backward_stub); -DEFINE_DISPATCH(miopen_convolution_transpose_backward_stub); -DEFINE_DISPATCH(miopen_depthwise_convolution_backward_stub); -DEFINE_DISPATCH(mkldnn_convolution_backward_stub); -DEFINE_DISPATCH(slow_conv_dilated2d_backward_stub); -DEFINE_DISPATCH(slow_conv_dilated3d_backward_stub); -DEFINE_DISPATCH(slow_conv_transpose2d_backward_stub); -REGISTER_NO_CPU_DISPATCH(conv_depthwise2d_backward_stub); -REGISTER_NO_CPU_DISPATCH(conv_depthwise3d_backward_stub); -REGISTER_NO_CPU_DISPATCH(cudnn_convolution_backward_stub); -REGISTER_NO_CPU_DISPATCH(cudnn_convolution_transpose_backward_stub); -REGISTER_NO_CPU_DISPATCH(miopen_convolution_backward_stub); -REGISTER_NO_CPU_DISPATCH(miopen_convolution_transpose_backward_stub); -REGISTER_NO_CPU_DISPATCH(miopen_depthwise_convolution_backward_stub); - -std::ostream& operator<<(std::ostream & out, const ConvParams& params) { - out << "ConvParams {" - << " stride = " << IntArrayRef{params.stride} - << " padding = " << IntArrayRef{params.padding} - << " dilation = " << IntArrayRef{params.dilation} - << " transposed = " << params.transposed - << " output_padding = " << IntArrayRef{params.output_padding} - << " groups = " << params.groups - << " benchmark = " << params.benchmark - << " deterministic = " << params.deterministic - << " cudnn_enabled = " << params.cudnn_enabled - << " allow_tf32 = " << params.allow_tf32 - << "}"; - return out; -} - -auto ConvParams::is_strided() const -> bool { - bool is_strided = false; - for (auto s : stride) { - is_strided |= (s != 1); - } - return is_strided; -} - -auto ConvParams::is_dilated() const -> bool { - bool is_dilated = false; - for (auto d : dilation) { - is_dilated |= (d != 1); - } - return is_dilated; -} - -auto ConvParams::is_padded() const -> bool { - bool is_padded = false; - for (auto p : padding) { - is_padded |= (p != 0); - } - return is_padded; -} - -auto ConvParams::is_output_padding_neg() const -> bool { - bool is_non_neg = false; - for (auto p : output_padding) { - is_non_neg |= (p < 0); - } - return is_non_neg; -} - -auto ConvParams::is_output_padding_big() const -> bool { - bool is_big = false; - for (auto i: c10::irange(output_padding.size())) { - is_big |= (output_padding[i] >= stride[i]); - } - return is_big; -} - -auto ConvParams::is_padding_neg() const -> bool { - bool is_non_neg = false; - for (auto p : padding) { - is_non_neg |= (p < 0); - } - return is_non_neg; -} - -auto ConvParams::is_stride_nonpos() const -> bool { - bool is_nonpos = false; - for (auto s : stride) { - is_nonpos |= (s <= 0); - } - return is_nonpos; -} - -auto ConvParams::view1d_as_2d() -> void { - if (stride.size() == 1) { - stride.insert(stride.begin(), 1); - padding.insert(padding.begin(), 0); - dilation.insert(dilation.begin(), 1); - output_padding.insert(output_padding.begin(), 0); - } -} - -auto ConvParams::use_cpu_depthwise3x3_winograd( - const at::Tensor& input, - const at::Tensor& weight) const -> bool { -#if defined(__ARM_NEON__) - // Currently only 3x3 depthwise convolutions on tensors of float are supported. 
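[For orientation, not part of the diff: the fast path whose conditions are listed here (guarded by __ARM_NEON__) targets depthwise convolutions with a 3x3 float kernel, i.e. groups equal to the number of input channels so the weight's second dimension is 1, with no stride, dilation or transposition. A minimal Python example of a convolution that satisfies these conditions:]

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3,
                 padding=1, groups=8, bias=False)      # depthwise 3x3

x = torch.randn(1, 8, 32, 32)                          # contiguous CPU float input
y = conv(x)
print(y.shape)                # torch.Size([1, 8, 32, 32])
print(conv.weight.shape)      # torch.Size([8, 1, 3, 3]): weight.size(1) == 1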
- return (input.ndimension() == 4) && - (input.size(1) == groups) && - (weight.ndimension() == 4 ) && - (weight.size(0) % input.size(1) == 0) && - (weight.size(1) == 1) && - (weight.size(2) == 3) && - (weight.size(3) == 3) && - (input.device().is_cpu()) && - (input.scalar_type() == at::kFloat) && - input.is_contiguous() && - (weight.device().is_cpu()) && - (weight.scalar_type() == at::kFloat) && - weight.is_contiguous() && - !is_strided() && - !is_dilated() && - !transposed; -#else - return false; -#endif -} - -auto ConvParams::needs_64bit_indexing_no_split(const at::Tensor& input, const at::Tensor& weight) const -> bool { - constexpr int64_t int_max = std::numeric_limits::max(); - int64_t numel_input = input.numel(); - // empty input - if (numel_input == 0) { - return false; - } - // input size can not be reduced to the range of int by splitting the batch dim - int64_t n = input.size(0); - if (numel_input / n > int_max) { - return true; - } - // output size can not be reduced to the range of int by splitting the batch dim - int64_t outsize = 1; - if (transposed) { - std::vector o = conv_input_size(input.sizes(), weight.sizes(), padding, output_padding, stride, dilation, groups); - outsize = c10::multiply_integers(o.begin() + 1, o.end()); - } else { - std::vector o = conv_output_size(input.sizes(), weight.sizes(), padding, stride, dilation); - outsize = c10::multiply_integers(o.begin() + 1, o.end()); - } - return outsize > int_max; -} - -auto ConvParams::use_cudnn(const at::Tensor& input, const at::Tensor& weight) const -> bool { - -// Note [Mobile check segfaults] -// cudnn and miopen are guaranteed not to be on mobile, and T102591915 / T110194934 suggest -// that maybe the compiledWithCuDNN() check sometimes segfaults (though I can't imagine how) -#if !defined(C10_MOBILE) - if (needs_64bit_indexing_no_split(input, weight)) { - return false; - } - if (!detail::getCUDAHooks().compiledWithCuDNN()) { - return false; - } - if (!input.is_cuda() || !cudnn_enabled) { - return false; - } - if (input.scalar_type() == at::kBFloat16 || weight.scalar_type() == at::kBFloat16) { - if (!(detail::getCUDAHooks().supportsBFloat16ConvolutionWithCuDNNv8() && at::native::cudnnv8_enabled_check_debug())) { - return false; - } - } - if (cudnn_conv_suggest_memory_format(input, weight) == at::MemoryFormat::Contiguous) { - // bypass dilation checks for channels_last convolution - if (deterministic && is_dilated()) { - // cudnn doesn't support deterministic dilated convolution fully yet - return false; - } - if (is_dilated()) { - return detail::getCUDAHooks().supportsDilatedConvolutionWithCuDNN() && !is_output_padding_big(); - } - } - return !is_output_padding_big(); -#else - return false; -#endif -} - -auto ConvParams::use_mps( const at::Tensor& input, const at::Tensor& weight) const -> bool { - // These checks need to be expanded. Currently we have very limited set of - // checks for MPS. 
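[Context note, not part of the diff: several of the backend heuristics in this region consume user-visible global switches; the ConvParams fields benchmark, deterministic, cudnn_enabled and allow_tf32 correspond to the torch.backends.cudnn flags, and use_mkldnn additionally respects the MKLDNN toggle. The Python-level knobs, for reference:]

import torch

torch.backends.cudnn.enabled = True          # -> cudnn_enabled
torch.backends.cudnn.benchmark = True        # -> benchmark (autotune per input shape)
torch.backends.cudnn.deterministic = False   # -> deterministic
torch.backends.cudnn.allow_tf32 = True       # -> allow_tf32 (TF32 on Ampere and newer)
torch.backends.mkldnn.enabled = True         # gates the use_mkldnn() CPU path

print(torch.backends.cudnn.is_available())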
-#ifdef USE_MPS - if (needs_64bit_indexing_no_split(input, weight)) { - return false; - } - if (!input.is_mps()) { - return false; - } - return true; +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include #else - return false; +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #endif -} - -auto ConvParams::use_miopen(const at::Tensor& input, const at::Tensor& weight, bool bias_defined) const -> bool { - if (needs_64bit_indexing_no_split(input, weight)) { - return false; - } - return ((input.scalar_type() == at::kFloat) || (input.scalar_type() == at::kHalf) || (input.scalar_type() == at::kBFloat16)) - && detail::getCUDAHooks().compiledWithMIOpen() - && input.is_cuda() - && input.dim() <= MIOPEN_DIM_MAX - && !(groups > 1 && is_dilated()) // MIOpen currently does not support dilation with groups of size > 1 - && !(input.scalar_type() == at::kBFloat16 && bias_defined) // MIOpen currently doesn't support bias with bfloat16 - && cudnn_enabled - ; -} - -auto ConvParams::use_mkldnn(const at::Tensor& input, const at::Tensor& weight) const -> bool { -#if AT_MKLDNN_ENABLED() - if (!at::globalContext().userEnabledMkldnn()) { - return false; - } - if (input.device().is_cpu() && input.scalar_type() == kBFloat16 && mkldnn_bf16_device_check()) { - return true; - } - return (input.is_mkldnn()) || // input is mkldnn Tensor - (input.device().is_cpu() && - input.scalar_type() == kFloat && // only on CPU Float Tensors - !transposed && // or transposed tensors - // For 1x1 filters, MKLDNN is faster than THNN when multi-threaded, - // but THNN is faster when single-threaded. 
- (is_strided() || is_dilated() || input.size(0) >= 16 || - weight.size(-1) != 1 || weight.size(-2) != 1 || at::get_num_threads() > 1) && - (groups > 1 - || (weight.size(-1) > 3 && weight.size(-2) > 3) - || input.size(0) > 1 - || input.size(0)*input.size(1)*input.size(2)*input.size(3) > 20480) // for some case, native is faster - ); -#endif - return false; -} - -auto ConvParams::use_nnpack(const at::Tensor& input, const at::Tensor& weight) const -> bool { -#if AT_NNPACK_ENABLED() - return at::_nnpack_available() && - input.device().is_cpu() && - input.scalar_type() == kFloat && // only on CPU Float Tensors - !is_dilated() && // or dilation - !transposed && // or transposed tensors - input.ndimension() == 4 && // must be in NCHW format - weight.ndimension() == 4 && - (weight.size(2) < 17) && (weight.size(3) < 17) // NNPACK only supports kernels up to 16x16 -#if !defined(C10_MOBILE) - && input.size(0) >= 16 // ensure large enough batch size to ensure perf, tuneable -#endif - ; -#endif - return false; -} - -auto ConvParams::use_xnnpack( - const at::Tensor& input, - const at::Tensor& weight, - const at::OptionalIntArrayRef bias_sizes_opt) const -> bool { -#if defined(C10_MOBILE) - if (!transposed) { - return (input.size(1) == groups) && - xnnpack::use_convolution2d( - input, - weight, - bias_sizes_opt, - padding, - stride, - dilation, - groups, - transposed); - } -#endif - return false; -} +constexpr int MIOPEN_DIM_MAX = 5; -// We currently only have depthwise support for the case where groups == -// nInputPlane and nInputPlane == nOutputPlane (the latter due to the lack of -// a depthwise multiplier) -auto ConvParams::is_depthwise( - const at::Tensor& input, const at::Tensor& weight) const -> bool { - return input.is_cuda() && - !transposed && - (input.ndimension() == 4 || input.ndimension() == 5) && - input.size(1) == groups && - groups > 1 && // no point if there is only a single group - weight.size(0) % input.size(1) == 0; // output channels must be a multiple of input channels -} +namespace at { namespace native { // Check workload to activate fast depthwise FP16 cudnn conv kernels +template bool check_cudnn_depthwise_workload(const at::Tensor& input, int stride) { - int w = input.size(3); // same as h - int ch = input.size(1); - int bs = input.size(0); + auto w = at::symint::size(input, 3); // same as h + auto ch = at::symint::size(input, 1); + auto bs = at::symint::size(input, 0); if (stride==1) { if (w >= 7) { // All batch sizes and nb_channels @@ -446,27 +210,28 @@ bool check_cudnn_depthwise_workload(const at::Tensor& input, int stride) { } // simplified version for cudnn 8.2 and above +template bool check_cudnn_depthwise_workload_with_filter(const at::Tensor& input, int stride, const at::Tensor& weight) { // 1D conv - if(input.size(2) == 1 && stride == 1){ + if(at::symint::size(input, 2) == 1 && stride == 1){ return true; } // 2d conv // only square filters - if (weight.size(2) != weight.size(3)) return false; - int filter = weight.size(3); + if (at::symint::size(weight, 2) != at::symint::size(weight, 3)) return false; + auto filter = at::symint::size(weight, 3); // only 1/3/5 filter if (filter != 1 && filter != 3 && filter != 5) return false; // we don't enforce square input but only check width to reduce heuristic space - if (input.size(3) < 7) return false; // min width 7 - int w = input.size(3); + if (at::symint::size(input, 3) < 7) return false; // min width 7 + auto w = at::symint::size(input, 3); // only 1/2 stride, use cudnn for all stride 1 if (stride == 1) return true; if 
(stride != 2) return false; - int ch = input.size(1); - int bs = input.size(0); + auto ch = at::symint::size(input, 1); + auto bs = at::symint::size(input, 0); // special case since bs1 show good perf in lots of cases if (bs == 1) { if (filter == 1 && w <= 28) return true; @@ -480,54 +245,390 @@ bool check_cudnn_depthwise_workload_with_filter(const at::Tensor& input, int str return false; } -// Use cudnn for FP16 depthwise convolutions -auto ConvParams::use_cudnn_depthwise( - const at::Tensor& input, const at::Tensor& weight) const -> bool { - if (cudnn_conv_suggest_memory_format(input, weight) != at::MemoryFormat::Contiguous && use_cudnn(input, weight)) { - // always use cudnn_depthwise for channels_last format - return true; + +bool xnnpack_use_convolution2d( + const Tensor& input, + const Tensor& weight, + const at::OptionalIntArrayRef bias_sizes_opt, + const IntArrayRef padding, + const IntArrayRef stride, + const IntArrayRef dilation, + const int64_t groups, + const bool transposed) { + return xnnpack::use_convolution2d(input, weight, bias_sizes_opt, padding, stride, dilation, groups, transposed); +} + +bool xnnpack_use_convolution2d( + const Tensor& input, + const Tensor& weight, + const at::OptionalSymIntArrayRef bias_sizes_opt, + const SymIntArrayRef padding, + const IntArrayRef stride, + const IntArrayRef dilation, + const int64_t groups, + const bool transposed) { + // Never use xnnpack for symbolic tracing + return false; +} + +// NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) +// This struct is templated so that we can run backend selection in a dynamic +// shapes context; all of the real kernel selection in eager mode runs with +// int64_t +template +struct ConvParams { + std::vector stride; + std::vector padding; + std::vector dilation; + bool transposed; + std::vector output_padding; + int groups; + bool benchmark; + bool deterministic; + bool cudnn_enabled; + bool allow_tf32; + + bool is_strided() const { + bool is_strided = false; + for (auto s : stride) { + is_strided |= (s != 1); + } + return is_strided; } - if (detail::getCUDAHooks().supportsDepthwiseConvolutionWithCuDNN()) { - long cudnn_version = detail::getCUDAHooks().versionCuDNN(); - if (cudnn_version >= 8200) { - bool kernel_cond = (use_cudnn(input, weight) && + + bool is_dilated() const { + bool is_dilated = false; + for (auto d : dilation) { + is_dilated |= (d != 1); + } + return is_dilated; + } + + bool is_padded() const { + bool is_padded = false; + for (auto p : padding) { + is_padded |= (p != 0); + } + return is_padded; + } + + bool is_output_padding_neg() const { + bool is_non_neg = false; + for (auto p : output_padding) { + is_non_neg |= (p < 0); + } + return is_non_neg; + } + + bool is_output_padding_big() const { + bool is_big = false; + for (auto i: c10::irange(output_padding.size())) { + is_big |= (output_padding[i] >= stride[i]); + } + return is_big; + } + + bool is_padding_neg() const { + bool is_non_neg = false; + for (auto p : padding) { + is_non_neg |= (p < 0); + } + return is_non_neg; + } + + bool is_stride_nonpos() const { + bool is_nonpos = false; + for (auto s : stride) { + is_nonpos |= (s <= 0); + } + return is_nonpos; + } + + void view1d_as_2d() { + if (stride.size() == 1) { + stride.insert(stride.begin(), 1); + padding.insert(padding.begin(), 0); + dilation.insert(dilation.begin(), 1); + output_padding.insert(output_padding.begin(), 0); + } + } + + bool use_cpu_depthwise3x3_winograd(const at::Tensor& input, const at::Tensor& weight, const c10::optional& bias) const { +#if 
defined(__ARM_NEON__) + // Currently only 3x3 depthwise convolutions on tensors of float are supported. + return (input.ndimension() == 4) && + (at::symint::size(input, 1) == groups) && + (weight.ndimension() == 4 ) && + (at::symint::size(weight, 0) % at::symint::size(input, 1) == 0) && + (at::symint::size(weight, 1) == 1) && + (at::symint::size(weight, 2) == 3) && + (at::symint::size(weight, 3) == 3) && + (input.device().is_cpu()) && + (input.scalar_type() == at::kFloat) && + input.is_contiguous() && + (weight.device().is_cpu()) && + (weight.scalar_type() == at::kFloat) && + weight.is_contiguous() && + (!bias.has_value() || bias->is_contiguous()) && + !is_strided() && + !is_dilated() && + !transposed; +#else + return false; +#endif + } + + bool needs_64bit_indexing_no_split(const at::Tensor& input, const at::Tensor& weight) const { + constexpr int64_t int_max = std::numeric_limits::max(); + auto numel_input = at::symint::numel(input); + // empty input + if (numel_input == 0) { + return false; + } + // input size can not be reduced to the range of int by splitting the batch dim + auto n = at::symint::size(input, 0); + if (numel_input / n > int_max) { + return true; + } + // output size can not be reduced to the range of int by splitting the batch dim + T outsize = 1; + if (transposed) { + auto o = conv_input_size(at::symint::sizes(input), at::symint::sizes(weight), padding, output_padding, stride, dilation, groups); + outsize = c10::multiply_integers(o.begin() + 1, o.end()); + } else { + auto o = conv_output_size(at::symint::sizes(input), at::symint::sizes(weight), padding, stride, dilation); + outsize = c10::multiply_integers(o.begin() + 1, o.end()); + } + return outsize > int_max; + } + + bool use_cudnn(const at::Tensor& input, const at::Tensor& weight) const { + // Note [Mobile check segfaults] + // cudnn and miopen are guaranteed not to be on mobile, and T102591915 / T110194934 suggest + // that maybe the compiledWithCuDNN() check sometimes segfaults (though I can't imagine how) +#if !defined(C10_MOBILE) + if (needs_64bit_indexing_no_split(input, weight)) { + return false; + } + if (!detail::getCUDAHooks().compiledWithCuDNN()) { + return false; + } + if (!input.is_cuda() || !cudnn_enabled) { + return false; + } + if (input.scalar_type() == at::kBFloat16 || weight.scalar_type() == at::kBFloat16) { + if (!(detail::getCUDAHooks().supportsBFloat16ConvolutionWithCuDNNv8() && at::native::cudnnv8_enabled_check_debug())) { + return false; + } + } + if (cudnn_conv_suggest_memory_format(input, weight) == at::MemoryFormat::Contiguous) { + // bypass dilation checks for channels_last convolution + if (deterministic && is_dilated()) { + // cudnn doesn't support deterministic dilated convolution fully yet + return false; + } + if (is_dilated()) { + return detail::getCUDAHooks().supportsDilatedConvolutionWithCuDNN() && !is_output_padding_big(); + } + } + return !is_output_padding_big(); +#else + return false; +#endif + } + + // Use cudnn for FP16 depthwise convolutions + bool use_cudnn_depthwise(const at::Tensor& input, const at::Tensor& weight) const { + if (cudnn_conv_suggest_memory_format(input, weight) != at::MemoryFormat::Contiguous && use_cudnn(input, weight)) { + // always use cudnn_depthwise for channels_last format + return true; + } + if (detail::getCUDAHooks().supportsDepthwiseConvolutionWithCuDNN()) { + long cudnn_version = detail::getCUDAHooks().versionCuDNN(); + if (cudnn_version >= 8200) { + bool kernel_cond = (use_cudnn(input, weight) && + input.scalar_type() == kHalf && // only for 
FP16 + weight.scalar_type() == kHalf && + is_depthwise(input, weight) && + input.ndimension() == 4 && // TODO: 5-D contiguous depthwise is not supported yet, need benchmarks + !is_dilated() && // no dilation supported + (stride[0] == stride[1] || at::symint::size(input, 2) == 1) && // square or 1d + at::symint::size(input, 1) >= 32); // min 32 channels supported) + if (kernel_cond) { + return check_cudnn_depthwise_workload_with_filter(input, stride[1], weight); + } + } + // keep (7600 <= cudnn < 8200) code unchanged + bool kernel_cond = (cudnn_version >= 7600 && + use_cudnn(input, weight) && input.scalar_type() == kHalf && // only for FP16 weight.scalar_type() == kHalf && is_depthwise(input, weight) && input.ndimension() == 4 && // TODO: 5-D contiguous depthwise is not supported yet, need benchmarks + at::symint::size(weight, 2) == at::symint::size(weight, 3) && // only square kernels + at::symint::size(input, 2) >= 7 && // min width/height 7 !is_dilated() && // no dilation supported - (stride[0] == stride[1] || input.size(2) == 1) && // square or 1d - input.size(1) >= 32); // min 32 channels supported) + stride[0] == stride[1] && // equal strides + ((at::symint::size(weight, 3) == 3) || (at::symint::size(weight, 3) == 1)) && + at::symint::size(input, 1) >= 32); // min 32 channels supported) if (kernel_cond) { - return check_cudnn_depthwise_workload_with_filter(input, stride[1], weight); + return check_cudnn_depthwise_workload(input, stride[0]); + } else { + return false; } - } - // keep (7600 <= cudnn < 8200) code unchanged - bool kernel_cond = (cudnn_version >= 7600 && - use_cudnn(input, weight) && - input.scalar_type() == kHalf && // only for FP16 - weight.scalar_type() == kHalf && - is_depthwise(input, weight) && - input.ndimension() == 4 && // TODO: 5-D contiguous depthwise is not supported yet, need benchmarks - weight.size(2) == weight.size(3) && // only square kernels - input.size(2) >= 7 && // min width/height 7 - !is_dilated() && // no dilation supported - stride[0] == stride[1] && // equal strides - ((weight.size(3) == 3) || (weight.size(3) == 1)) && - input.size(1) >= 32); // min 32 channels supported) - if (kernel_cond) { - return check_cudnn_depthwise_workload(input, stride[0]); } else { return false; } - } else { + } + + bool use_miopen(const at::Tensor& input, const at::Tensor& weight, bool bias_defined) const { + if (needs_64bit_indexing_no_split(input, weight)) { + return false; + } + return ((input.scalar_type() == at::kFloat) || (input.scalar_type() == at::kHalf) || (input.scalar_type() == at::kBFloat16)) + && detail::getCUDAHooks().compiledWithMIOpen() + && input.is_cuda() + && input.dim() <= MIOPEN_DIM_MAX + && !(groups > 1 && is_dilated()) // MIOpen currently does not support dilation with groups of size > 1 + && !(input.scalar_type() == at::kBFloat16 && bias_defined) // MIOpen currently doesn't support bias with bfloat16 + && cudnn_enabled + ; + } + bool use_mkldnn(const at::Tensor& input, const at::Tensor& weight) const { +#if AT_MKLDNN_ENABLED() + if (!at::globalContext().userEnabledMkldnn()) { + return false; + } + if (input.device().is_cpu() && input.scalar_type() == kBFloat16 && mkldnn_bf16_device_check()) { + return true; + } + return (input.is_mkldnn()) || // input is mkldnn Tensor + (input.device().is_cpu() && + input.scalar_type() == kFloat && // only on CPU Float Tensors + !transposed && // or transposed tensors + // For 1x1 filters, MKLDNN is faster than THNN when multi-threaded, + // but THNN is faster when single-threaded. 
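
// Aside: the needs_64bit_indexing_no_split() helper above asks whether the convolution's
// input or output could overflow 32-bit indexing even after splitting off the batch
// dimension. A rough standalone sketch of that arithmetic follows; it is illustrative only
// (plain C++ with made-up names, non-transposed case only; the transposed case would use the
// inverse size formula), not the actual ATen conv_output_size / multiply_integers helpers.
#include <cstdint>
#include <limits>
#include <vector>

// Standard convolution output-size formula for one spatial dimension.
int64_t conv_out_dim(int64_t in, int64_t kernel, int64_t pad, int64_t stride, int64_t dilation) {
  return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

// True if either the input or the output, with the batch dimension stripped, has more
// elements than fit in a signed 32-bit int (so 64-bit indexing would be required).
bool needs_64bit_indexing_sketch(const std::vector<int64_t>& input_sizes,   // N, C, d1, d2, ...
                                 const std::vector<int64_t>& weight_sizes,  // OC, C/g, k1, k2, ...
                                 const std::vector<int64_t>& padding,
                                 const std::vector<int64_t>& stride,
                                 const std::vector<int64_t>& dilation) {
  constexpr int64_t int_max = std::numeric_limits<int32_t>::max();
  int64_t in_per_sample = 1;
  for (size_t i = 1; i < input_sizes.size(); ++i) in_per_sample *= input_sizes[i];
  if (in_per_sample > int_max) return true;
  int64_t out_per_sample = weight_sizes[0];  // output channels
  for (size_t i = 2; i < input_sizes.size(); ++i) {
    out_per_sample *= conv_out_dim(input_sizes[i], weight_sizes[i],
                                   padding[i - 2], stride[i - 2], dilation[i - 2]);
  }
  return out_per_sample > int_max;
}
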
+ (is_strided() || is_dilated() || at::symint::size(input, 0) >= 16 || + at::symint::size(weight, -1) != 1 || at::symint::size(weight, -2) != 1 || at::get_num_threads() > 1) && + (groups > 1 + || (at::symint::size(weight, -1) > 3 && at::symint::size(weight, -2) > 3) + || at::symint::size(input, 0) > 1 + || at::symint::size(input, 0)*at::symint::size(input, 1)*at::symint::size(input, 2)*at::symint::size(input, 3) > 20480) // for some case, native is faster + ); + +#endif + return false; + } + bool use_nnpack(const at::Tensor& input, const at::Tensor& weight) const { +#if AT_NNPACK_ENABLED() + return at::_nnpack_available() && + input.device().is_cpu() && + input.scalar_type() == kFloat && // only on CPU Float Tensors + !is_dilated() && // or dilation + !transposed && // or transposed tensors + input.ndimension() == 4 && // must be in NCHW format + weight.ndimension() == 4 && + (at::symint::size(weight, 2) < 17) && (at::symint::size(weight, 3) < 17) // NNPACK only supports kernels up to 16x16 +#if !defined(C10_MOBILE) + && at::symint::size(input, 0) >= 16 // ensure large enough batch size to ensure perf, tuneable +#endif + ; +#endif + return false; + } + bool use_xnnpack(const at::Tensor& input, const at::Tensor& weight, + const at::OptionalArrayRef bias_sizes_opt) const { +#if defined(C10_MOBILE) + if (!transposed) { + // NB: for the call here, it MATTERS that we are templated. If you + // untemplate this to always use SymInt, the function + // xnnpack_use_convolution2d will always return false + return (at::symint::size(input, 1) == groups) && + xnnpack_use_convolution2d( + input, + weight, + bias_sizes_opt, + padding, + stride, + dilation, + groups, + transposed); + } +#endif + return false; + } + + bool use_mps(const at::Tensor& input, const at::Tensor& weight) const { + // These checks need to be expanded. Currently we have very limited set of + // checks for MPS. 
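
// Aside: the use_xnnpack() member above relies on ordinary C++ overload resolution. When
// ConvParams is instantiated with int64_t, the concrete-shape overload of
// xnnpack_use_convolution2d is chosen; when it is instantiated with SymInt, the SymInt
// overload is chosen and unconditionally returns false, so XNNPACK is never selected during
// symbolic tracing. A minimal sketch of that pattern with purely illustrative types
// (FakeSymInt and Params are stand-ins, not the real ATen classes):
#include <iostream>
#include <vector>

struct FakeSymInt {};  // stand-in for a symbolic integer type

bool backend_available(const std::vector<long>& /*padding*/) {
  return true;   // concrete shapes: ask the real backend
}
bool backend_available(const std::vector<FakeSymInt>& /*padding*/) {
  return false;  // symbolic shapes: opt out at compile time via overloading
}

template <typename T>
struct Params {
  std::vector<T> padding;
  bool use_backend() const { return backend_available(padding); }
};

int main() {
  std::cout << Params<long>{{0, 0}}.use_backend() << "\n";         // prints 1
  std::cout << Params<FakeSymInt>{{{}, {}}}.use_backend() << "\n"; // prints 0
}
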
+#ifdef USE_MPS + if (needs_64bit_indexing_no_split(input, weight)) { + return false; + } + if (!input.is_mps()) { + return false; + } + return true; +#else return false; +#endif } + + // We currently only have depthwise support for the case where groups == + // nInputPlane and nInputPlane == nOutputPlane (the latter due to the lack of + // a depthwise multiplier) + bool is_depthwise(const at::Tensor& input, const at::Tensor& weight) const { + return input.is_cuda() && + !transposed && + (input.ndimension() == 4 || input.ndimension() == 5) && + at::symint::size(input, 1) == groups && + groups > 1 && // no point if there is only a single group + at::symint::size(weight, 0) % at::symint::size(input, 1) == 0; // output channels must be a multiple of input channels + } +}; + +DEFINE_DISPATCH(conv_depthwise2d_backward_stub); +DEFINE_DISPATCH(conv_depthwise3d_backward_stub); +DEFINE_DISPATCH(cudnn_convolution_backward_stub); +DEFINE_DISPATCH(cudnn_convolution_transpose_backward_stub); +DEFINE_DISPATCH(slow_conv_transpose3d_backward_stub); +DEFINE_DISPATCH(convolution_depthwise3x3_winograd_stub); +DEFINE_DISPATCH(miopen_convolution_backward_stub); +DEFINE_DISPATCH(miopen_convolution_transpose_backward_stub); +DEFINE_DISPATCH(miopen_depthwise_convolution_backward_stub); +DEFINE_DISPATCH(mkldnn_convolution_backward_stub); +DEFINE_DISPATCH(slow_conv_dilated2d_backward_stub); +DEFINE_DISPATCH(slow_conv_dilated3d_backward_stub); +DEFINE_DISPATCH(slow_conv_transpose2d_backward_stub); +REGISTER_NO_CPU_DISPATCH(conv_depthwise2d_backward_stub); +REGISTER_NO_CPU_DISPATCH(conv_depthwise3d_backward_stub); +REGISTER_NO_CPU_DISPATCH(cudnn_convolution_backward_stub); +REGISTER_NO_CPU_DISPATCH(cudnn_convolution_transpose_backward_stub); +REGISTER_NO_CPU_DISPATCH(miopen_convolution_backward_stub); +REGISTER_NO_CPU_DISPATCH(miopen_convolution_transpose_backward_stub); +REGISTER_NO_CPU_DISPATCH(miopen_depthwise_convolution_backward_stub); + +template +std::ostream& operator<<(std::ostream & out, const ConvParams& params) { + out << "ConvParams {" + << " stride = " << IntArrayRef{params.stride} + << " padding = " << ArrayRef{params.padding} + << " dilation = " << IntArrayRef{params.dilation} + << " transposed = " << params.transposed + << " output_padding = " << ArrayRef{params.output_padding} + << " groups = " << params.groups + << " benchmark = " << params.benchmark + << " deterministic = " << params.deterministic + << " cudnn_enabled = " << params.cudnn_enabled + << " allow_tf32 = " << params.allow_tf32 + << "}"; + return out; } +template static void check_shape_forward(const at::Tensor& input, - const c10::IntArrayRef& weight_sizes, const at::Tensor& bias, - const ConvParams& params) { + const c10::ArrayRef& weight_sizes, const at::Tensor& bias, + const ConvParams& params) { int64_t k = input.ndimension(); int64_t weight_dim = weight_sizes.size(); int64_t groups = params.groups; @@ -542,7 +643,7 @@ static void check_shape_forward(const at::Tensor& input, TORCH_CHECK(weight_dim == k, "Expected ", weight_dim, "-dimensional input for ", weight_dim, "-dimensional weight ", weight_sizes, ", but got ", k, "-dimensional input of size ", - input.sizes(), " instead"); + at::symint::sizes(input), " instead"); TORCH_CHECK(weight_sizes[0] >= groups, "Given groups=", groups, ", expected weight to be at least ", groups, " at dimension 0, but got weight of size ", weight_sizes, " instead"); @@ -552,23 +653,23 @@ static void check_shape_forward(const at::Tensor& input, "] instead"); if (!transposed) { - std::vector input_shape; 
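
// Aside: is_depthwise() above recognizes the case groups == input channels, with the number
// of output channels a multiple of the input channels (the "depth multiplier"). For
// reference, a naive CPU sketch of that computation (stride 1, no padding, NCHW layout,
// float) is below; it is purely illustrative and not the kernel ATen actually dispatches to.
#include <cstddef>
#include <iostream>
#include <vector>

// input: [C, H, W] flattened, weight: [C*M, 1, K, K] flattened, output: [C*M, H-K+1, W-K+1].
std::vector<float> depthwise_conv2d_naive(const std::vector<float>& input,
                                          const std::vector<float>& weight,
                                          int C, int H, int W, int M, int K) {
  const int OH = H - K + 1, OW = W - K + 1;
  std::vector<float> out(static_cast<size_t>(C) * M * OH * OW, 0.f);
  for (int c = 0; c < C; ++c)            // each input channel...
    for (int m = 0; m < M; ++m)          // ...feeds only its own M output channels
      for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow) {
          float acc = 0.f;
          for (int kh = 0; kh < K; ++kh)
            for (int kw = 0; kw < K; ++kw)
              acc += input[(c * H + oh + kh) * W + ow + kw] *
                     weight[((c * M + m) * K + kh) * K + kw];
          out[((c * M + m) * OH + oh) * OW + ow] = acc;
        }
  return out;
}

int main() {
  std::vector<float> in(1 * 5 * 5, 1.f), w(1 * 1 * 3 * 3, 1.f);
  auto out = depthwise_conv2d_naive(in, w, /*C=*/1, /*H=*/5, /*W=*/5, /*M=*/1, /*K=*/3);
  std::cout << out[0] << "\n";  // each output sums a 3x3 window of ones -> 9
}
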
- std::vector kernel_shape; + std::vector input_shape; + std::vector kernel_shape; bool kernel_size_correct = true; - TORCH_CHECK(input.size(1) == (weight_sizes[1] * groups), + TORCH_CHECK(at::symint::size(input, 1) == (weight_sizes[1] * groups), "Given groups=", groups, ", weight of size ", weight_sizes, - ", expected input", input.sizes(), " to have ", - (weight_sizes[1] * groups), " channels, but got ", input.size(1), + ", expected input", at::symint::sizes(input), " to have ", + (weight_sizes[1] * groups), " channels, but got ", at::symint::size(input, 1), " channels instead"); - TORCH_CHECK(!bias.defined() || (bias.ndimension() == 1 && bias.size(0) == weight_sizes[0]), + TORCH_CHECK(!bias.defined() || (bias.ndimension() == 1 && at::symint::size(bias, 0) == weight_sizes[0]), "Given weight of size ", weight_sizes, ", expected bias to be 1-dimensional with ", weight_sizes[0], " elements", - ", but got bias of size ", bias.sizes(), " instead"); + ", but got bias of size ", at::symint::sizes(bias), " instead"); for (const auto i : c10::irange(2, k)) { - input_shape.push_back(input.size(i) + 2 * padding[i-2]); + input_shape.push_back(at::symint::size(input, i) + 2 * padding[i-2]); // log new kernel size considering dilation kernel_shape.push_back(dilation[i-2] * (weight_sizes[i]-1) + 1); if (input_shape.back() < kernel_shape.back()) { @@ -594,22 +695,23 @@ static void check_shape_forward(const at::Tensor& input, "Kernel size: (", kernel_ss.str(), "). Kernel size can't be greater than actual input size"); } } else { // transposed - TORCH_CHECK(input.size(1) == weight_sizes[0], + TORCH_CHECK(at::symint::size(input, 1) == weight_sizes[0], "Given transposed=", transposed, ", weight of size ", weight_sizes, - ", expected input", input.sizes(), " to have ", weight_sizes[0], - " channels, but got ", input.size(1), " channels instead"); - TORCH_CHECK(!bias.defined() || (bias.ndimension() == 1 && bias.size(0) == weight_sizes[1] * groups), + ", expected input", at::symint::sizes(input), " to have ", weight_sizes[0], + " channels, but got ", at::symint::size(input, 1), " channels instead"); + TORCH_CHECK(!bias.defined() || (bias.ndimension() == 1 && at::symint::size(bias, 0) == weight_sizes[1] * groups), "Given transposed=", transposed, ", weight of size ", weight_sizes, ", expected bias to be 1-dimensional with ", weight_sizes[1] * groups, " elements", - ", but got bias of size ", bias.sizes(), " instead"); + ", but got bias of size ", at::symint::sizes(bias), " instead"); } } +template static void check_shape_backward( const at::Tensor& input, - const c10::IntArrayRef& weight_sizes, - const ConvParams& params) { - check_shape_forward(input, weight_sizes, /*bias=*/ Tensor(), params); + const c10::ArrayRef& weight_sizes, + const ConvParams& params) { + check_shape_forward(input, weight_sizes, /*bias=*/ Tensor(), params); } // Given an input tensor and an expected number of spatial dimensions, checks that the @@ -713,6 +815,7 @@ at::Tensor complex_convolution( IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, + bool transposed, IntArrayRef output_padding, int64_t groups) { check_input_same_type_as_parameters(input, weight, bias); @@ -730,15 +833,15 @@ at::Tensor complex_convolution( // conv(W, x, b) = a - b + i(c - a - b) Tensor a, b, c; if (!bias.defined()) { - a = at::convolution(i_r, w_r, bias, stride, padding, dilation, false, output_padding, groups); - b = at::convolution(i_i, w_i, bias, stride, padding, dilation, false, output_padding, groups); - c = at::convolution(i_r + i_i, w_r + 
w_i, bias, stride, padding, dilation, false, output_padding, groups); + a = at::convolution(i_r, w_r, bias, stride, padding, dilation, transposed, output_padding, groups); + b = at::convolution(i_i, w_i, bias, stride, padding, dilation, transposed, output_padding, groups); + c = at::convolution(i_r + i_i, w_r + w_i, bias, stride, padding, dilation, transposed, output_padding, groups); } else { Tensor b_r, b_i; std::tie(b_r, b_i) = complex_to_real(bias.resolve_conj()); - a = at::convolution(i_r, w_r, b_r, stride, padding, dilation, false, output_padding, groups); - b = at::convolution(i_i, w_i, Tensor(), stride, padding, dilation, false, output_padding, groups); - c = at::convolution(i_r + i_i, w_r + w_i, b_r + b_i, stride, padding, dilation, false, output_padding, groups); + a = at::convolution(i_r, w_r, b_r, stride, padding, dilation, transposed, output_padding, groups); + b = at::convolution(i_i, w_i, Tensor(), stride, padding, dilation, transposed, output_padding, groups); + c = at::convolution(i_r + i_i, w_r + w_i, b_r + b_i, stride, padding, dilation, transposed, output_padding, groups); } auto i = c10::Scalar(c10::complex(0, 1)); @@ -791,7 +894,7 @@ at::Tensor conv1d( std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 1, "conv1d"); Tensor output; if (at::isComplexType(input_.scalar_type())) { - output = complex_convolution(input, weight, bias, stride, padding, dilation, {0}, groups); + output = complex_convolution(input, weight, bias, stride, padding, dilation, false, {0}, groups); } else { output = at::convolution(input, weight, bias, stride, padding, dilation, false, {0}, groups); } @@ -805,12 +908,20 @@ at::Tensor conv2d( c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); const Tensor& bias = *bias_maybe_owned; + TORCH_CHECK( + !bias.defined() || bias.dtype() == input_.dtype(), + "Input type (", + input_.dtype().name(), + ") and bias type (", + bias.dtype().name(), + ") should be the same"); + Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 2, "conv2d"); Tensor output; if (at::isComplexType(input_.scalar_type())) { - output = complex_convolution(input, weight, bias, stride, padding, dilation, {{0, 0}}, groups); + output = complex_convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0}}, groups); } else { output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0}}, groups); } @@ -829,7 +940,7 @@ at::Tensor conv3d( std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 3, "conv3d"); Tensor output; if (at::isComplexType(input_.scalar_type())) { - output = complex_convolution(input, weight, bias, stride, padding, dilation, {{0, 0, 0}}, groups); + output = complex_convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0, 0}}, groups); } else { output = at::convolution(input, weight, bias, stride, padding, dilation, false, {{0, 0, 0}}, groups); } @@ -844,8 +955,8 @@ static Tensor convolution_same( auto k = weight.dim(); TORCH_CHECK(k > 2, "weight should have at least three dimensions"); auto dim = static_cast(k - 2); - auto weight_sizes = weight.sizes(); - auto input_sizes = input.sizes(); + auto weight_sizes = weight.sym_sizes(); + auto input_sizes = input.sym_sizes(); TORCH_CHECK(k == input.dim(), "Expected ", k, "-dimensional input for ", k, "-dimensional weight", weight_sizes, ", but got ", @@ -860,7 +971,7 @@ static Tensor convolution_same( } // Calculate the correct padding - DimVector padding_l, padding_r; 
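
// Aside: complex_convolution() above uses the three-multiplication (Gauss/Karatsuba) trick
// noted in its comment, conv(W, x) = (a - b) + i(c - a - b) with a = conv(x_r, W_r),
// b = conv(x_i, W_i), c = conv(x_r + x_i, W_r + W_i). Because convolution is bilinear, the
// identity can be checked on scalars; the small program below does exactly that and is only
// an illustration of the algebra, not of the ATen code path.
#include <complex>
#include <iostream>

int main() {
  double xr = 1.5, xi = -2.0, wr = 0.25, wi = 3.0;
  double a = xr * wr;               // plays the role of conv(x_real, W_real)
  double b = xi * wi;               // plays the role of conv(x_imag, W_imag)
  double c = (xr + xi) * (wr + wi); // plays the role of conv(x_real + x_imag, W_real + W_imag)
  std::complex<double> via_trick(a - b, c - a - b);
  std::complex<double> direct = std::complex<double>(xr, xi) * std::complex<double>(wr, wi);
  std::cout << via_trick << " == " << direct << "\n";  // the two agree
}
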
+ SymDimVector padding_l, padding_r; bool symmetric_padding = true; for (auto i: c10::irange(dim)) { auto s = stride.size() == 1 ? stride[0] : stride[i]; @@ -876,14 +987,14 @@ static Tensor convolution_same( if (symmetric_padding) { // All backends handle symmetric padding natively - DimVector output_padding(static_cast(dim)); - return at::convolution(input, weight, bias, stride, padding_l, dilation, + SymDimVector output_padding(static_cast(dim)); + return at::convolution_symint(input, weight, bias, stride, padding_l, dilation, false, output_padding, groups); } TORCH_WARN_ONCE("Using padding='same' with even kernel lengths and odd dilation may" " require a zero-padded copy of the input be created"); - SmallVector pad_nd(static_cast(2 * dim)); + SmallVector pad_nd(static_cast(2 * dim)); for (auto i: c10::irange(dim)) { // Apply padding by the difference, leaving only a symmetric padding auto delta_pad = padding_r[i] - padding_l[i]; @@ -895,10 +1006,10 @@ static Tensor convolution_same( padding_l[i] = padding_r[i]; } } - auto padded_input = at::constant_pad_nd(input, pad_nd, 0); - DimVector output_padding(static_cast(dim)); - return at::convolution(padded_input, weight, bias, stride, padding_l, - dilation, false, output_padding, groups); + auto padded_input = at::constant_pad_nd_symint(input, pad_nd, 0); + SymDimVector output_padding(static_cast(dim)); + return at::convolution_symint(padded_input, weight, bias, stride, padding_l, + dilation, false, output_padding, groups); } Tensor _convolution_mode( @@ -979,8 +1090,14 @@ at::Tensor conv_transpose1d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 1, "conv_transpose1d"); - auto output = at::convolution( + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution( + input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } else { + output = at::convolution( input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } return is_batched ? output : output.squeeze(0); } @@ -994,8 +1111,14 @@ at::Tensor conv_transpose2d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 2, "conv_transpose2d"); - auto output = at::convolution( + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution( input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } else { + output = at::convolution( + input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } return is_batched ? output : output.squeeze(0); } @@ -1009,8 +1132,14 @@ at::Tensor conv_transpose3d( Tensor input; bool is_batched; std::tie(input, is_batched) = batchify(input_, /*num_spatial_dims=*/ 3, "conv_transpose3d"); - auto output = at::convolution( + Tensor output; + if (at::isComplexType(input_.scalar_type())) { + output = complex_convolution( + input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } else { + output = at::convolution( input, weight, bias, stride, padding, dilation, true, output_padding, groups); + } return is_batched ? output : output.squeeze(0); } @@ -1040,61 +1169,25 @@ at::Tensor convolution_overrideable( TORCH_CHECK_NOT_IMPLEMENTED(false, "convolution_overrideable not implemented. 
You are likely triggering this with tensor backend other than CPU/CUDA/MKLDNN, if this is intended, please use TORCH_LIBRARY_IMPL to override this function "); } -// Selects a backend for convolution based on the inputs and params. -ConvBackend select_conv_backend( - const Tensor& input_r, const Tensor& weight_r, const c10::optional& bias_opt, - IntArrayRef stride_, IntArrayRef padding_, IntArrayRef dilation_, - bool transposed_, IntArrayRef output_padding_, int64_t groups_) { - c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); - const Tensor& bias = *bias_maybe_owned; - - auto& ctx = at::globalContext(); - auto k = weight_r.ndimension(); - int64_t dim = k - 2; - ConvParams params; - params.stride = expand_param_if_needed(stride_, "stride", dim); - params.padding = expand_param_if_needed(padding_, "padding", dim); - params.dilation = expand_param_if_needed(dilation_, "dilation", dim); - params.transposed = transposed_; - params.output_padding = expand_param_if_needed(output_padding_, "output_padding", dim); - params.groups = groups_; - params.benchmark = ctx.benchmarkCuDNN(); - params.deterministic = ctx.deterministicCuDNN() || ctx.deterministicAlgorithms(); - params.cudnn_enabled = ctx.userEnabledCuDNN(); - params.allow_tf32 = ctx.allowTF32CuDNN(); - - auto input = input_r; - auto weight = weight_r; - check_shape_forward(input, weight.sizes(), bias, params); - - // Expand 1d -> 2d. - // This is only done for backends that don't natively support 1d spatial input. - if (k == 3 && !input.is_mkldnn() && !input.is_xpu()) { - // avoid accidentally going through NHWC for permuted 3d input. - input = input.contiguous(); - params.view1d_as_2d(); - input = view4d(input); - weight = view4d(weight); - } - - auto bias_sizes_opt = bias.defined() ? c10::optional(bias.sizes()) : c10::nullopt; - bool need_backward = GradMode::is_enabled() && - (input.requires_grad() || weight.requires_grad() || (bias.defined() && bias.requires_grad())); - return select_conv_backend(input, weight, bias_sizes_opt, need_backward, params); -} - -ConvBackend select_conv_backend( +// Function to select the convolution backend based on the inputs and params. +// This overload is used within the convolution internals but not exposed to python. +// NB: The forward pass provides a bias tensor while the backward pass provides +// a bool indicating whether the bias is defined. This is done to save memory by +// avoiding saving the full bias tensor for backward. +template +ConvBackend _select_conv_backend( const Tensor& input, const Tensor& weight, - const at::OptionalIntArrayRef bias_sizes_opt, + const c10::optional& bias, + const at::OptionalArrayRef bias_sizes_opt, const bool need_backward, - const ConvParams& params) { + const ConvParams& params) { // don't send empty inputs through backends - if (input.size(0) == 0 || input.size(1) == 0) { + if (at::symint::size(input, 0) == 0 || at::symint::size(input, 1) == 0) { return input.is_mkldnn() ? ConvBackend::MkldnnEmpty : ConvBackend::Empty; - } else if (input.numel() == 0) { - TORCH_CHECK(false, "Only zero batch or zero channel inputs are supported, but got input shape: ", input.sizes()); + } else if (at::symint::numel(input) == 0) { + TORCH_CHECK(false, "Only zero batch or zero channel inputs are supported, but got input shape: ", at::symint::sizes(input)); } if (params.is_depthwise(input, weight)) { @@ -1130,7 +1223,7 @@ ConvBackend select_conv_backend( // option for NHWC. 
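
// Aside: the convolution_same() changes above move the padding computation to SymInt. As a
// reminder of the arithmetic involved (the exact helper lines are in unchanged context not
// shown in this hunk): 'same' padding for a unit-stride convolution distributes a total of
// dilation * (kernel - 1) zeros across the two sides of each spatial dimension; when that
// total is odd the split is asymmetric, and the input is first zero-padded by the difference
// so the backend only sees symmetric padding. A small sketch with illustrative names:
#include <cstdint>
#include <utility>

// Returns {padding_left, padding_right} for one spatial dimension, stride assumed to be 1
// (PyTorch rejects padding='same' for strided convolutions).
std::pair<int64_t, int64_t> same_padding_1d(int64_t kernel, int64_t dilation) {
  const int64_t total = dilation * (kernel - 1);
  const int64_t left = total / 2;       // what the convolution call receives
  const int64_t right = total - left;   // if != left, the extra zeros are padded explicitly
  return {left, right};
}
// e.g. kernel=3, dilation=1 -> {1, 1} (symmetric); kernel=4, dilation=1 -> {1, 2} (asymmetric).
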
return ConvBackend::Xnnpack2d; // 3x3 depthwith convolutions implementation is inference only - } else if (!need_backward && params.use_cpu_depthwise3x3_winograd(input, weight)) { + } else if (!need_backward && params.use_cpu_depthwise3x3_winograd(input, weight, bias)) { return ConvBackend::Winograd3x3Depthwise; } else if ( !params.transposed && (input.ndimension() == 5) && @@ -1186,12 +1279,65 @@ ConvBackend select_conv_backend( AT_ERROR("unsupported ConvNd parameters"); } +// Selects a backend for convolution based on the inputs and params. +ConvBackend select_conv_backend( + const Tensor& input_r, const Tensor& weight_r, const c10::optional& bias_opt, + IntArrayRef stride_, SymIntArrayRef padding_, IntArrayRef dilation_, + bool transposed_, SymIntArrayRef output_padding_, int64_t groups_, const at::OptionalSymIntArrayRef bias_sizes_opt) { + c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + + auto& ctx = at::globalContext(); + auto k = weight_r.ndimension(); + int64_t dim = k - 2; + ConvParams params; + params.stride = expand_param_if_needed(stride_, "stride", dim); + params.padding = expand_param_if_needed(padding_, "padding", dim); + params.dilation = expand_param_if_needed(dilation_, "dilation", dim); + params.transposed = transposed_; + params.output_padding = expand_param_if_needed(output_padding_, "output_padding", dim); + params.groups = groups_; + params.benchmark = ctx.benchmarkCuDNN(); + params.deterministic = ctx.deterministicCuDNN() || ctx.deterministicAlgorithms(); + params.cudnn_enabled = ctx.userEnabledCuDNN(); + params.allow_tf32 = ctx.allowTF32CuDNN(); + + auto input = input_r; + auto weight = weight_r; + check_shape_forward(input, weight.sym_sizes(), bias, params); + + // Expand 1d -> 2d. + // This is only done for backends that don't natively support 1d spatial input. + if (k == 3 && !input.is_mkldnn() && !input.is_xpu()) { + // avoid accidentally going through NHWC for permuted 3d input. + input = input.contiguous(); + params.view1d_as_2d(); + input = view4d(input); + weight = view4d(weight); + } + + auto bias_sizes = bias.defined() ? c10::optional(bias.sym_sizes()) : bias_sizes_opt; + bool need_backward = GradMode::is_enabled() && + (input.requires_grad() || weight.requires_grad() || (bias.defined() && bias.requires_grad())); + return _select_conv_backend(input, weight, bias, bias_sizes, need_backward, params); +} + +// For BC reasons, have a copy that does not require bias_opt +ConvBackend select_conv_backend( + const Tensor& input, + const Tensor& weight, + const at::OptionalIntArrayRef bias_sizes_opt, + const bool need_backward, + const ConvParams& params) { + return _select_conv_backend(input, weight, {}, bias_sizes_opt, need_backward, params); +} + at::Tensor _convolution_nogroup_backend( const Tensor& input, const Tensor& weight, const Tensor& bias, const ConvBackend backend, - const ConvParams& params) { + const ConvParams& params) { auto kernel_size = weight.sizes().slice(2); switch(backend) { case ConvBackend::NnpackSpatial: @@ -1222,7 +1368,7 @@ at::Tensor _convolution_nogroup_backend( static inline std::vector calc_output_size( const Tensor& input, const Tensor& weight, - const ConvParams& params) { + const ConvParams& params) { std::vector output_size = params.transposed ? 
conv_input_size(input.sizes(), weight.sizes(), params.padding, params.output_padding, params.stride, params.dilation, params.groups) : @@ -1277,6 +1423,13 @@ static inline at::MemoryFormat determine_backend_memory_format( return backend_memory_format; } +at::MemoryFormat _determine_backend_memory_format( + const Tensor& input, + const Tensor& weight, + const ConvBackend backend) { + return determine_backend_memory_format(input, weight, backend); +} + at::Tensor _convolution( const Tensor& input_r, const Tensor& weight_r, const c10::optional& bias_r_opt, IntArrayRef stride_, IntArrayRef padding_, IntArrayRef dilation_, @@ -1294,8 +1447,9 @@ at::Tensor _convolution( int64_t dim = k - 2; TORCH_CHECK(dim > 0, "weight should have at least three dimensions"); + TORCH_CHECK(groups_ > 0, "non-positive groups is not supported"); - ConvParams params; + ConvParams params; params.stride = expand_param_if_needed(stride_, "stride", dim); params.padding = expand_param_if_needed(padding_, "padding", dim); params.dilation = expand_param_if_needed(dilation_, "dilation", dim); @@ -1323,7 +1477,7 @@ at::Tensor _convolution( auto bias_sizes_opt = bias.defined() ? c10::optional(bias.sizes()) : c10::nullopt; bool need_backward = GradMode::is_enabled() && (input.requires_grad() || weight.requires_grad() || (bias.defined() && bias.requires_grad())); - ConvBackend backend = select_conv_backend(input, weight, bias_sizes_opt, need_backward, params); + ConvBackend backend = _select_conv_backend(input, weight, bias, c10::OptionalIntArrayRef(bias_sizes_opt), need_backward, params); at::MemoryFormat backend_memory_format = determine_backend_memory_format(input, weight, backend); // Call the backend. @@ -1358,7 +1512,19 @@ at::Tensor _convolution( break; case ConvBackend::Empty: { - auto weight_view = at::_unsafe_view(weight, -1); + Tensor weight_view; + // Use permute and clone to avoid at::_unsafe_view(weight, -1) failure for non-contiguous cases where + // view size is not compatible with input tensor's size and stride. + if(weight.is_contiguous()) { + weight_view = at::_unsafe_view(weight, -1); + } else if (weight.is_contiguous(at::MemoryFormat::ChannelsLast)) { + weight_view = at::_unsafe_view(at::permute(weight, {0, 2, 3, 1}), -1); + } else if (weight.is_contiguous(at::MemoryFormat::ChannelsLast3d)) { + weight_view = at::_unsafe_view(at::permute(weight, {0, 2, 3, 4, 1}), -1); + } else { + weight_view = at::_unsafe_view(weight.clone(at::MemoryFormat::Contiguous), -1); + } + output = (input.size(1) == 0) ? 
(input.view(-1) * weight_view) : (input * weight_view[0]); if (bias.defined()) { output.add_(bias[0]); @@ -1536,7 +1702,7 @@ std::tuple _convolution_double_backward( const c10::option auto weight = weight_r; int64_t dim = weight.ndimension() - 2; - ConvParams params; + ConvParams params; params.stride = expand_param_if_needed(stride_, "stride", dim); params.padding = expand_param_if_needed(padding_, "padding", dim); params.dilation = expand_param_if_needed(dilation_, "dilation", dim); @@ -1599,7 +1765,7 @@ std::tuple _convolution_double_backward( const c10::option if (ggI.defined()) { // Modified params with correct padding - ConvParams gw_conv_params(params); + ConvParams gw_conv_params(params); // Disable groups as they are handled separately auto groups = gw_conv_params.groups; @@ -1668,7 +1834,7 @@ std::tuple _convolution_double_backward( const c10::option Tensor gI; if (input.numel() != 0) { if (ggW.defined()) { - ConvParams gi_conv_params(params); + ConvParams gi_conv_params(params); gi_conv_params.transposed = !params.transposed; if (params.transposed) { @@ -1724,7 +1890,7 @@ std::tuple _convolution_backward_nogroup_bac const Tensor& weight, const std::array output_mask, const ConvBackend backend, - const ConvParams& params) { + const ConvParams& params) { auto kernel_size = weight.sizes().slice(2); switch(backend) { case ConvBackend::Slow2d: @@ -1789,7 +1955,7 @@ std::tuple convolution_backward( TORCH_CHECK(dim > 0, "weight should have at least three dimensions"); auto& ctx = at::globalContext(); - ConvParams params; + ConvParams params; params.stride = expand_param_if_needed(stride, "stride", dim); params.padding = expand_param_if_needed(padding, "padding", dim); params.dilation = expand_param_if_needed(dilation, "dilation", dim); diff --git a/aten/src/ATen/native/ConvolutionMM2d.cpp b/aten/src/ATen/native/ConvolutionMM2d.cpp index d93166a1e343..eb4deee26945 100644 --- a/aten/src/ATen/native/ConvolutionMM2d.cpp +++ b/aten/src/ATen/native/ConvolutionMM2d.cpp @@ -1,15 +1,26 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include -#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/ConvolutionMM3d.cpp b/aten/src/ATen/native/ConvolutionMM3d.cpp index 98dce11f48d4..3569a9a55d8e 100644 --- a/aten/src/ATen/native/ConvolutionMM3d.cpp +++ b/aten/src/ATen/native/ConvolutionMM3d.cpp @@ -1,14 +1,26 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include -#include #include #include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + constexpr int64_t CONV3D_GRAIN_SALT = 20; namespace at { diff --git a/aten/src/ATen/native/ConvolutionMM3d.h b/aten/src/ATen/native/ConvolutionMM3d.h index 9567b5d928c1..b87674672d1d 100644 --- a/aten/src/ATen/native/ConvolutionMM3d.h +++ b/aten/src/ATen/native/ConvolutionMM3d.h @@ -1,4 +1,4 @@ -#include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/ConvolutionTBC.cpp b/aten/src/ATen/native/ConvolutionTBC.cpp index c90577822218..38aa7b85ca5f 100644 --- a/aten/src/ATen/native/ConvolutionTBC.cpp +++ b/aten/src/ATen/native/ConvolutionTBC.cpp @@ -1,8 +1,18 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include 
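
// Aside: the ConvBackend::Empty fix above avoids _unsafe_view(weight, -1) on non-contiguous
// weights because a flat view is only a reinterpretation of memory when the tensor is
// contiguous in its logical dimension order; channels-last storage of an NCHW weight is not,
// so the code permutes to NHWC (whose logical order matches memory order) before flattening,
// or clones to contiguous otherwise. A plain-C++ illustration of that contiguity condition
// (sizes/strides chosen by hand, not real tensors):
#include <cstdint>
#include <iostream>
#include <vector>

bool is_contiguous(const std::vector<int64_t>& sizes, const std::vector<int64_t>& strides) {
  int64_t expected = 1;
  for (int i = static_cast<int>(sizes.size()) - 1; i >= 0; --i) {
    if (sizes[i] != 1 && strides[i] != expected) return false;
    expected *= sizes[i];
  }
  return true;
}

int main() {
  // N=1, C=2, H=3, W=4 stored channels-last: strides are (H*W*C, 1, W*C, C).
  std::vector<int64_t> sizes{1, 2, 3, 4}, cl_strides{24, 1, 8, 2};
  std::cout << is_contiguous(sizes, cl_strides) << "\n";        // 0: a flat view would reorder data
  // The same storage seen through the permuted NHWC logical order is contiguous.
  std::vector<int64_t> nhwc_sizes{1, 3, 4, 2}, nhwc_strides{24, 8, 2, 1};
  std::cout << is_contiguous(nhwc_sizes, nhwc_strides) << "\n"; // 1: safe to flatten after permute
}
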
+#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/Copy.cpp b/aten/src/ATen/native/Copy.cpp index d4b5c74c3bf3..0c99943eb0cb 100644 --- a/aten/src/ATen/native/Copy.cpp +++ b/aten/src/ATen/native/Copy.cpp @@ -1,21 +1,29 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include #include -#include -#include +#include #include #include #include #include #include #include -#include #include #include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif #ifdef USE_FBGEMM #include @@ -116,12 +124,17 @@ static Tensor & copy_impl(Tensor & self, const Tensor & src, bool non_blocking) // 1. Memory Format for source and destination tensors is contiguous. // 2. Device for both the source and destination tensor is CPU. // 3. dtype conversion between FP32->FP16 and FP16->FP32. + // This checks that self.sizes() == src.sizes() because this code path doesn't + // support broadcasting. This also guards against out of bounds memory access + // when copying, see fbgemm::Float16ToFloat_ref. + // https://github.com/pytorch/pytorch/issues/88543 #ifdef USE_FBGEMM if (((self.dtype() == at::kFloat && src.dtype() == at::kHalf) || (self.dtype() == at::kHalf && src.dtype() == at::kFloat)) && (self.device().is_cpu() && src.device().is_cpu()) && ((self.is_contiguous() && src.is_contiguous()) || - (self.is_non_overlapping_and_dense() && self.strides() == src.strides()))) { + (self.is_non_overlapping_and_dense() && self.strides() == src.strides())) && + (self.sizes() == src.sizes())) { if (src.dtype() == at::kFloat && self.dtype() == at::kHalf) { auto* output_ptr = reinterpret_cast(self.data_ptr()); @@ -212,6 +225,18 @@ static Tensor & copy_impl(Tensor & self, const Tensor & src, bool non_blocking) return at::metal::metal_copy_(self, src); } + // Exit early if self and src are views of the same data + const bool is_same_data = ( + self.is_alias_of(src) && + self.storage_offset() == src.storage_offset() && + self.strides().equals(src.strides()) && + self.sizes().equals(src.sizes()) && + self.scalar_type() == src.scalar_type() + ); + if (is_same_data) { + return self; + } + auto iter = TensorIteratorConfig() .add_output(self) @@ -253,27 +278,39 @@ static Tensor & copy_impl(Tensor & self, const Tensor & src, bool non_blocking) return self; } +// NB: cribbed from https://github.com/pytorch/pytorch/pull/88198 +at::Tensor clone_preserve_strides(const at::Tensor& self) { + TORCH_INTERNAL_ASSERT(self.has_storage()); + // In cases where the input tensor has internal memory overlap, we cannot actually + // preserve the strides/storage_offset of the input tensor, because + // *_scatter ops will try to copy_() into the cloned tensor. + // However, this should **never** show up in functionalized user code; + // most aten ops that try to mutate a tensor with internal memory overlap would error anyway. + // + // The one place that this does come up is in autograd - if there's a select_scatter + // in the forward, then autograd will generate one for the backward. + // If the input to the select_scatter is grad_output, then this could be an expanded tensor + // with internal overlap. 
+ //if (at::has_internal_overlap(self) == at::MemOverlap::Yes) { + // return self.clone(); + //} + auto dtype_size = self.dtype().itemsize(); + auto nbytes = self.storage().sym_nbytes(); + TORCH_INTERNAL_ASSERT(nbytes % dtype_size == 0); + auto numel = nbytes / dtype_size; + auto self_full_size = self.as_strided_symint({numel}, {1}, 0); + auto clone = self_full_size.clone(); + auto out = clone.as_strided_symint(self.sym_sizes(), self.sym_strides(), self.sym_storage_offset()); + return out; +} + Tensor copy(const Tensor& self, const Tensor& src, bool non_blocking) { // copy() is the "functional" form of copy_(). It exists so we can properly functionalize copy_(), but: // (1) It isn't exposed to the frontend (no python bindings) // (2) It isn't exposed to the backend (it's a composite, that decomposes into to() and expand_as() calls. - // Note: This implementation doesn't currently preserve the strides of `self`. - // That might be fine for functorch (which already doesn't preserve strides in vmap), - // but it's worth looking into whether or not this implementation will be problematic for LazyTensor/XLA. - auto intermediate = src.to(self, non_blocking); - // We can't use expand() here. Why? - // The contract for copy_() is that the output tensor has the same amount of storage as the original tensor. - // e.g. This should work: - // a = torch.ones(4, 4) - // b = torch.ones(1, 4) - // c = torch.ones(4, 4) - // torch.ops.aten.copy(a, b).add_(c) - // We don't want to emit an extra copy every time though, so we only do it if the shapes are different. - if (self.sizes() != intermediate.sizes()) { - return at::expand_copy(intermediate, self.sizes()); - } else { - return intermediate; - } + auto r = clone_preserve_strides(self); + r.copy_(src, non_blocking); + return r; } Tensor& copy_(Tensor& self, const Tensor& src, bool non_blocking) { diff --git a/aten/src/ATen/native/Correlation.cpp b/aten/src/ATen/native/Correlation.cpp index 0bd27195df76..9aca753c78ca 100644 --- a/aten/src/ATen/native/Correlation.cpp +++ b/aten/src/ATen/native/Correlation.cpp @@ -1,5 +1,23 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace native { @@ -47,7 +65,7 @@ Tensor cov( " != ", num_observations); TORCH_CHECK( - num_observations == 0 || w.min().ge(0).item(), + num_observations == 0 || at::is_scalar_tensor_true(w.min().ge(0)), "cov(): fweights cannot be negative"); } @@ -70,7 +88,7 @@ Tensor cov( " != ", num_observations); TORCH_CHECK( - num_observations == 0 || aw.min().ge(0).item(), + num_observations == 0 || at::is_scalar_tensor_true(aw.min().ge(0)), "cov(): aweights cannot be negative"); w = w.defined() ? w * aw : aw; } @@ -81,7 +99,7 @@ Tensor cov( : at::scalar_tensor(num_observations, in.options().dtype(kLong)); TORCH_CHECK( - !w.defined() || w_sum.ne(0).item(), + !w.defined() || at::is_scalar_tensor_true(w_sum.ne(0)), "cov(): weights sum to zero, can't be normalized"); const auto avg = (w.defined() ? 
in * w : in).sum(OBSERVATIONS_DIM) / w_sum; @@ -95,7 +113,7 @@ Tensor cov( norm_factor = w_sum - correction; } - if (norm_factor.le(0).item()) { + if (at::is_scalar_tensor_true(norm_factor.le(0))) { TORCH_WARN("cov(): degrees of freedom is <= 0"); norm_factor.zero_(); } @@ -121,7 +139,7 @@ Tensor corrcoef(const Tensor& self) { } // normalize covariance - const auto d = c.diag(); + const auto d = c.diagonal(); const auto stddev = at::sqrt(d.is_complex() ? at::real(d) : d); c = c / stddev.view({-1, 1}); c = c / stddev.view({1, -1}); diff --git a/aten/src/ATen/native/Cross.cpp b/aten/src/ATen/native/Cross.cpp index 4b3e43da1147..6c40001703c8 100644 --- a/aten/src/ATen/native/Cross.cpp +++ b/aten/src/ATen/native/Cross.cpp @@ -1,25 +1,39 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include +#include +#include #include +#include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif namespace at { namespace meta { -TORCH_PRECOMPUTE_META_FUNC(linalg_cross) -(const Tensor & input, const Tensor & other, const int64_t dimension) { - auto out_size = infer_size(input.sizes(), other.sizes()); - Tensor input_broadcasted = input.expand(out_size); - Tensor other_broadcasted = other.expand(out_size); +TORCH_META_FUNC(linalg_cross) +(const Tensor & input, const Tensor & other, int64_t dim) { + auto x_d = input.dim(); + auto y_d = other.dim(); + // This is to avoid things like + // linalg.cross(torch.randn(2, 3), torch.randn(5, 2, 3), dim=2) + TORCH_CHECK(x_d == y_d, "linalg.cross: inputs must have the same number of dimensions."); + TORCH_CHECK(input.size(dim) == 3 && other.size(dim) == 3, "linalg.cross: inputs dimension ", dim, " must have length 3. Got ", input.size(dim), " and ", other.size(dim)); - int64_t dim = maybe_wrap_dim(dimension, input.dim()); // default dim = -1 - TORCH_CHECK(input_broadcasted.size(dim) == 3, "dimension ", dimension, " does not have size 3"); + // Broadcast the batch dimension of input and other. 
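
// Aside: clone_preserve_strides() above copies a tensor while keeping its original
// sizes/strides/storage offset by cloning the whole underlying storage as a flat tensor and
// re-striding the clone. A rough sketch of the same idea using the plain (non-SymInt) ATen
// calls; it is illustrative only and omits the internal-overlap caveats discussed in the
// comments above.
#include <ATen/ATen.h>

at::Tensor clone_preserve_strides_sketch(const at::Tensor& self) {
  const auto itemsize = self.dtype().itemsize();
  const int64_t numel = static_cast<int64_t>(self.storage().nbytes() / itemsize);
  // View the full storage as a flat 1-D tensor, clone it, then restride the clone with the
  // original geometry so copy_() into it behaves like copy_() into the original.
  auto full = self.as_strided({numel}, {1}, 0);
  auto cloned = full.clone();
  return cloned.as_strided(self.sizes(), self.strides(), self.storage_offset());
}
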
+ // Since the non-batch dimensions agree, this is the same as broadcast all the inputs + auto out_size = infer_size(input.sizes(), other.sizes()); set_output_raw_strided(0, out_size, {}, input.options()); - return TORCH_PRECOMPUTE_STRUCT(linalg_cross)().set_dim(dim); } } @@ -56,8 +70,9 @@ Tensor & cross_out(const Tensor & input, const Tensor & other, const c10::option TORCH_IMPL_FUNC(linalg_cross_out) -(const Tensor & input, const Tensor & other, const int64_t dim, const Tensor & out) { - auto out_size = infer_size(input.sizes(), other.sizes()); +(const Tensor & input, const Tensor & other, int64_t dim, const Tensor & out) { + dim = maybe_wrap_dim(dim, input.dim()); + auto out_size = out.sizes(); Tensor input_broadcasted = input.expand(out_size); Tensor other_broadcasted = other.expand(out_size); diff --git a/aten/src/ATen/native/DilatedMaxPool2d.cpp b/aten/src/ATen/native/DilatedMaxPool2d.cpp index c9e980e44ab7..576e28866cbc 100644 --- a/aten/src/ATen/native/DilatedMaxPool2d.cpp +++ b/aten/src/ATen/native/DilatedMaxPool2d.cpp @@ -1,8 +1,18 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif namespace at { namespace meta { diff --git a/aten/src/ATen/native/DilatedMaxPool3d.cpp b/aten/src/ATen/native/DilatedMaxPool3d.cpp index 57fa6f9ea691..643943160556 100644 --- a/aten/src/ATen/native/DilatedMaxPool3d.cpp +++ b/aten/src/ATen/native/DilatedMaxPool3d.cpp @@ -1,11 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/DispatchStub.cpp b/aten/src/ATen/native/DispatchStub.cpp index a91448c3da72..52f73cfce43a 100644 --- a/aten/src/ATen/native/DispatchStub.cpp +++ b/aten/src/ATen/native/DispatchStub.cpp @@ -1,6 +1,8 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include +#include #include #include diff --git a/aten/src/ATen/native/DispatchStub.h b/aten/src/ATen/native/DispatchStub.h index bcbf41fd9d0f..9394442fe754 100644 --- a/aten/src/ATen/native/DispatchStub.h +++ b/aten/src/ATen/native/DispatchStub.h @@ -1,11 +1,10 @@ #pragma once -#include -#include -#include +#include +#include -#include #include +#include // Implements instruction set specific function dispatch. 
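
// Aside: the linalg_cross meta function above now requires both inputs to have the same
// number of dimensions and the selected dim to have length 3, then broadcasts the remaining
// batch dimensions. The per-slice computation is the ordinary 3-vector cross product,
// sketched here for a single slice (standalone illustration, not the ATen kernel):
#include <array>
#include <iostream>

std::array<double, 3> cross3(const std::array<double, 3>& a, const std::array<double, 3>& b) {
  return {a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]};
}

int main() {
  auto c = cross3({1, 0, 0}, {0, 1, 0});
  std::cout << c[0] << " " << c[1] << " " << c[2] << "\n";  // 0 0 1
}
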
// diff --git a/aten/src/ATen/native/Distance.cpp b/aten/src/ATen/native/Distance.cpp index 8d23e10b1719..17be4a468751 100644 --- a/aten/src/ATen/native/Distance.cpp +++ b/aten/src/ATen/native/Distance.cpp @@ -1,11 +1,39 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { DEFINE_DISPATCH(pdist_forward_stub); diff --git a/aten/src/ATen/native/DistributionTemplates.h b/aten/src/ATen/native/DistributionTemplates.h index 15e2be8c8f27..2132407df80f 100644 --- a/aten/src/ATen/native/DistributionTemplates.h +++ b/aten/src/ATen/native/DistributionTemplates.h @@ -1,8 +1,9 @@ #pragma once -#include +#include #include #include +#include #include #include #include @@ -12,6 +13,15 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace templates { diff --git a/aten/src/ATen/native/Distributions.cpp b/aten/src/ATen/native/Distributions.cpp index 962c01061442..43305efdd885 100644 --- a/aten/src/ATen/native/Distributions.cpp +++ b/aten/src/ATen/native/Distributions.cpp @@ -1,24 +1,48 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include -#include +#include +#include #include #include #include -#include #include #include #include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include // NOLINTNEXTLINE(modernize-deprecated-headers) @@ -581,14 +605,14 @@ Tensor& multinomial_out(const Tensor& self, return result; } - // Fast-path for no replacement. + // Fast-path for no replacement or if only one sample is drawn. // Reference: // https://github.com/pytorch/pytorch/issues/11931#issuecomment-625882503 // Half is not supported on CPU. TORCH_CHECK( !(self.device().is_cpu() && self.scalar_type() == ScalarType::Half), "multinomial is not implemented for half on CPU"); - if (!with_replacement) { + if (!with_replacement || n_sample == 1) { // Sanity checks on `self`. 
auto is_valid = ((self.max() < INFINITY) & (self.min() >= 0)).item(); TORCH_CHECK( diff --git a/aten/src/ATen/native/Dropout.cpp b/aten/src/ATen/native/Dropout.cpp index 36e1b92ad1bd..2903fac4f504 100644 --- a/aten/src/ATen/native/Dropout.cpp +++ b/aten/src/ATen/native/Dropout.cpp @@ -1,8 +1,25 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -12,9 +29,9 @@ template using Ctype = typename std::conditional::type; Tensor make_feature_noise(const Tensor& input) { - auto input_sizes = input.sizes(); + auto input_sizes = input.sym_sizes(); TORCH_CHECK(input.dim() >= 2, "Feature dropout requires at least 2 dimensions in the input"); - std::vector sizes; + c10::SymDimVector sizes; sizes.reserve(input.dim()); sizes.push_back(input_sizes[0]); sizes.push_back(input_sizes[1]); @@ -22,11 +39,11 @@ Tensor make_feature_noise(const Tensor& input) { (void)i; //Suppress unused variable warning sizes.push_back(1); } - return input.new_empty(sizes); + return input.new_empty_symint(sizes); } bool is_fused_kernel_acceptable(const Tensor& input, double p) { - return (input.is_cuda() || input.is_xpu() || input.is_lazy()) && p > 0 && p < 1 && input.numel() > 0; + return (input.is_cuda() || input.is_xpu() || input.is_lazy()) && p > 0 && p < 1 && input.sym_numel() > 0; } // NB: sure, we could have used different overloads here, but I would feel insecure @@ -46,7 +63,7 @@ Tensor multiply(const Tensor& input, const Tensor& noise) { template Ctype _dropout_impl(T& input, double p, bool train) { TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p); - if (p == 0 || !train || input.numel() == 0) { + if (p == 0 || !train || input.sym_numel() == 0) { return input; } @@ -109,7 +126,7 @@ native_dropout_cpu(const Tensor& input, double p, c10::optional train) { return std::make_tuple(output, mask); } -Tensor native_dropout_backward_cpu(const Tensor& grad, const Tensor& mask, double scale) { +Tensor native_dropout_backward(const Tensor& grad, const Tensor& mask, double scale) { Tensor result = grad * mask * scale; return result; } @@ -117,7 +134,10 @@ Tensor native_dropout_backward_cpu(const Tensor& grad, const Tensor& mask, doubl Tensor dropout(const Tensor& input, double p, bool train) { auto result = [&]() { NoNamesGuard guard; - if (train && is_fused_kernel_acceptable(input, p)) { + // TODO: we can remove this is_nested() code smell in the future + // if we find a way to support _dropout for nested tensor + // e.g. make it an op (at::_dropout) to use dispatcher? 
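
// Aside: the multinomial fast path referenced above (issue #11931) is based on the
// exponential-race / Gumbel-top-k idea: draw one independent Exponential(1) variable per
// category, divide each weight by it, and keep the indices of the k largest keys, which
// yields weighted samples without replacement. The sketch below is a plain-C++ illustration
// of that trick, not the ATen kernel itself.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

std::vector<size_t> sample_without_replacement(const std::vector<double>& weights,
                                               size_t k, std::mt19937& gen) {
  std::exponential_distribution<double> exp_dist(1.0);
  std::vector<double> keys(weights.size());
  for (size_t i = 0; i < weights.size(); ++i)
    keys[i] = weights[i] / exp_dist(gen);  // larger weight -> larger key on average
  std::vector<size_t> idx(weights.size());
  std::iota(idx.begin(), idx.end(), size_t{0});
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](size_t a, size_t b) { return keys[a] > keys[b]; });
  idx.resize(k);
  return idx;
}

int main() {
  std::mt19937 gen(0);
  for (size_t i : sample_without_replacement({0.1, 0.2, 0.3, 0.4}, 2, gen))
    std::cout << i << " ";
  std::cout << "\n";
}
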
+ if (input.is_nested() || (train && is_fused_kernel_acceptable(input, p))) { return std::get<0>(at::native_dropout(input, p, train)); } return _dropout(input, p, train); diff --git a/aten/src/ATen/native/Embedding.cpp b/aten/src/ATen/native/Embedding.cpp index cac0cbe7130f..4c37325c4817 100644 --- a/aten/src/ATen/native/Embedding.cpp +++ b/aten/src/ATen/native/Embedding.cpp @@ -1,20 +1,40 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include -#include #include namespace at { namespace native { -Tensor embedding(const Tensor & weight, const Tensor & indices, - int64_t padding_idx, bool scale_grad_by_freq, bool sparse) { +Tensor embedding_symint(const Tensor & weight, const Tensor & indices, + c10::SymInt padding_idx, bool scale_grad_by_freq, bool sparse) { TORCH_CHECK(weight.dim() == 2, "'weight' must be 2-D"); auto indices_arg = TensorArg(indices, "indices", 1); checkScalarTypes("embedding", indices_arg, {kLong, kInt}); @@ -24,23 +44,30 @@ Tensor embedding(const Tensor & weight, const Tensor & indices, return weight.index_select(0, indices); } - auto size = indices.sizes().vec(); - for (auto d : weight.sizes().slice(1)) { + auto size = indices.sym_sizes().vec(); + for (auto d : weight.sym_sizes().slice(1)) { size.push_back(d); } - return weight.index_select(0, indices.reshape(-1)).view(size); + return weight.index_select(0, indices.reshape(-1)).view_symint(size); } -Tensor embedding_backward( - const Tensor & grad, const Tensor & indices, int64_t num_weights, - int64_t padding_idx, bool scale_grad_by_freq, bool sparse) { +Tensor embedding_backward_symint( + const Tensor & grad, const Tensor & indices, c10::SymInt num_weights, + c10::SymInt padding_idx, bool scale_grad_by_freq, bool sparse) { if (sparse) { + // TODO: if we teach sparse tensor how to propagate symints, the guard + // here is not strictly necessary. However, we think it is fine as is + // because num weights is derived from a parameter and therefore + // typically not varying. 
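
// Aside: embedding_symint() above reduces an N-dimensional index tensor to the 1-D case by
// flattening the indices, gathering the corresponding rows of the weight matrix with
// index_select, and viewing the result back to indices.sizes() + [embedding_dim]. A
// plain-C++ sketch of that row gather (no padding_idx or sparse handling; names are mine):
#include <cstdint>
#include <vector>

// weight: [num_embeddings, dim] flattened row-major; returns [indices.size(), dim].
std::vector<float> embedding_lookup(const std::vector<float>& weight, int64_t dim,
                                    const std::vector<int64_t>& indices) {
  std::vector<float> out;
  out.reserve(indices.size() * dim);
  for (int64_t idx : indices) {
    const float* row = weight.data() + idx * dim;
    out.insert(out.end(), row, row + dim);  // copy one embedding row per index
  }
  return out;
}
// e.g. embedding_lookup(weight, 8, {3, 0, 3}) returns rows 3, 0, 3 of an [N, 8] table.
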
return at::embedding_sparse_backward( - grad, indices, num_weights, padding_idx, scale_grad_by_freq); + grad, indices, + num_weights.guard_int(__FILE__, __LINE__), + padding_idx.guard_int(__FILE__, __LINE__), + scale_grad_by_freq); } else { - return at::embedding_dense_backward( - grad, indices, num_weights, padding_idx, scale_grad_by_freq); + return at::embedding_dense_backward_symint( + grad, indices, num_weights, padding_idx, scale_grad_by_freq); } } @@ -60,25 +87,25 @@ Tensor embedding_sparse_backward( Tensor indices = indices_; Tensor grad = grad_; if (padding_idx != -1) { - torch::List> c({indices != padding_idx}); + c10::List> c({indices != padding_idx}); indices = indices.index(c); grad = grad.index(c); } - int64_t num_features = grad_.size(-1); - auto weight_size = std::array{{ num_weights, num_features }}; + auto num_features = grad_.sym_size(-1); + auto weight_size = std::array{{ num_weights, num_features }}; auto dense_options = grad.options(); // check if all our grad come from padding_idx - if (grad.numel() == 0) { - return at::_sparse_coo_tensor_unsafe(at::empty({1, 0}, indices_.options().dtype(kLong)), - at::empty({0, num_features}, dense_options), + if (grad.sym_numel() == 0) { + return at::_sparse_coo_tensor_unsafe_symint(at::empty({1, 0}, indices_.options().dtype(kLong)), + at::empty_symint({c10::SymInt(0), num_features}, dense_options), weight_size); } auto index = indices.reshape({1, -1}); - auto values = grad.reshape({-1, num_features}); - return at::_sparse_coo_tensor_unsafe(index.to(kLong), values, weight_size); + auto values = grad.reshape_symint({c10::SymInt(-1), num_features}); + return at::_sparse_coo_tensor_unsafe_symint(index.to(kLong), values, weight_size); } Tensor embedding_dense_backward_cpu( diff --git a/aten/src/ATen/native/EmbeddingBag.cpp b/aten/src/ATen/native/EmbeddingBag.cpp index 17094bf9082d..21404947b3db 100644 --- a/aten/src/ATen/native/EmbeddingBag.cpp +++ b/aten/src/ATen/native/EmbeddingBag.cpp @@ -1,13 +1,16 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include +#include #include +#include #include #include #include +#include #include +#include #ifdef USE_FBGEMM #include @@ -18,12 +21,32 @@ #include #include -#include -#include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace { const int MODE_SUM = 0; @@ -1245,8 +1268,6 @@ void _embedding_bag_cpu_out( fbgemm_kernel_cache); } -// Assumes all input tensors are contiguous. -// See NOTE [ embedding_bag Native Functions ] in native_functions.yaml for details Tensor _embedding_bag_backward(const Tensor &grad, const Tensor &indices_, const Tensor &offsets_, const Tensor &offset2bag, @@ -1256,6 +1277,21 @@ Tensor _embedding_bag_backward(const Tensor &grad, const Tensor &indices_, bool scale_grad_by_freq, int64_t mode, bool sparse, const c10::optional& per_sample_weights_opt, int64_t padding_idx) { + return at::native::_embedding_bag_backward_symint( + grad, indices_, offsets_, offset2bag, bag_size_, max_indices_, num_weights, scale_grad_by_freq, mode, sparse, per_sample_weights_opt, padding_idx); +} + +// Assumes all input tensors are contiguous. 
+// See NOTE [ embedding_bag Native Functions ] in native_functions.yaml for details +Tensor _embedding_bag_backward_symint(const Tensor &grad, const Tensor &indices_, + const Tensor &offsets_, + const Tensor &offset2bag, + const Tensor &bag_size_, + const Tensor &max_indices_, + c10::SymInt num_weights, + bool scale_grad_by_freq, int64_t mode, + bool sparse, const c10::optional& per_sample_weights_opt, + int64_t padding_idx) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned per_sample_weights_maybe_owned = at::borrow_from_optional_tensor(per_sample_weights_opt); const Tensor& per_sample_weights = *per_sample_weights_maybe_owned; @@ -1271,7 +1307,7 @@ Tensor _embedding_bag_backward(const Tensor &grad, const Tensor &indices_, checkContiguous("embedding_bag", offsets_arg); Tensor offset2bag_; - if (indices.numel() != 0 && offset2bag.numel() == 0) { + if (indices.sym_numel() != 0 && offset2bag.sym_numel() == 0) { offset2bag_ = offsets.new_zeros( {indices.size(0) + 1}, offsets.options()); // offset2bag = [0 0 0 0 0] @@ -1292,11 +1328,11 @@ Tensor _embedding_bag_backward(const Tensor &grad, const Tensor &indices_, } if (sparse) { - return at::_embedding_bag_sparse_backward( + return at::_embedding_bag_sparse_backward_symint( grad, indices, offsets, offset2bag_, bag_size_, num_weights, scale_grad_by_freq, mode, per_sample_weights, padding_idx); } else { - return at::_embedding_bag_dense_backward( + return at::_embedding_bag_dense_backward_symint( grad, indices, offset2bag_, bag_size_, max_indices_, num_weights, scale_grad_by_freq, mode, per_sample_weights, padding_idx); } @@ -1606,7 +1642,16 @@ Tensor _embedding_bag_per_sample_weights_backward_cpu( Tensor _embedding_bag_sparse_backward( const Tensor &grad_, const Tensor &indices, const Tensor &offsets, - const Tensor &offset2bag, const Tensor &bag_size_, int64_t num_weights, + const Tensor &offset2bag, const Tensor &bag_size_, SymInt num_weights, + bool scale_grad_by_freq, int64_t mode, const c10::optional& per_sample_weights_opt, + int64_t padding_idx) { + return at::native::_embedding_bag_sparse_backward_symint(grad_, indices, offsets, offset2bag, bag_size_, num_weights, + scale_grad_by_freq, mode, per_sample_weights_opt, padding_idx); +} + +Tensor _embedding_bag_sparse_backward_symint( + const Tensor &grad_, const Tensor &indices, const Tensor &offsets, + const Tensor &offset2bag, const Tensor &bag_size_, SymInt num_weights, bool scale_grad_by_freq, int64_t mode, const c10::optional& per_sample_weights_opt, int64_t padding_idx) { // See [Note: hacky wrapper removal for optional tensor] @@ -1628,7 +1673,7 @@ Tensor _embedding_bag_sparse_backward( AT_ASSERT(mode == MODE_SUM); index_grad.mul_(per_sample_weights.unsqueeze(1)); } - return native::embedding_backward(index_grad, indices, num_weights, padding_idx, + return native::embedding_backward_symint(index_grad, indices, num_weights, padding_idx, scale_grad_by_freq, true); } } diff --git a/aten/src/ATen/native/EmbeddingBag.h b/aten/src/ATen/native/EmbeddingBag.h index 6600c661d46a..9d44fa688b2b 100644 --- a/aten/src/ATen/native/EmbeddingBag.h +++ b/aten/src/ATen/native/EmbeddingBag.h @@ -1,4 +1,5 @@ -#include +#include +#include #include #ifdef USE_FBGEMM diff --git a/aten/src/ATen/native/Fill.cpp b/aten/src/ATen/native/Fill.cpp index 4952aaa91a05..ac3e2bb2cbd6 100644 --- a/aten/src/ATen/native/Fill.cpp +++ b/aten/src/ATen/native/Fill.cpp @@ -1,13 +1,24 @@ // Functions that fill Tensors with constants. 
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include -#include #include -#include -#include +#include +#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/ForeachOpsKernels.cpp b/aten/src/ATen/native/ForeachOpsKernels.cpp index f5665be248e4..4b6ef9196f99 100644 --- a/aten/src/ATen/native/ForeachOpsKernels.cpp +++ b/aten/src/ATen/native/ForeachOpsKernels.cpp @@ -1,7 +1,57 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { #define FOREACH_BINARY_OP_SCALAR(OP) \ @@ -146,7 +196,30 @@ void foreach_tensor_##OP##_scalarlist_slow_(TensorList input, TensorList tensors for(const auto i : c10::irange(input.size())) { \ input[i].OP##_(tensors1[i], tensors2[i], scalars[i]); \ } \ -} \ +} + +#define FOREACH_POINTWISE_OP_TENSOR(OP) \ + std::vector foreach_tensor_##OP##_tensor_slow( \ + TensorList input, \ + TensorList tensors1, \ + TensorList tensors2, \ + const Tensor& scalars_) { \ + auto scalars = convert_tensor_to_scalar_list(scalars_, input.size()); \ + check_foreach_api_restrictions(input, tensors1, tensors2, scalars); \ + return foreach_tensor_##OP##_scalarlist_slow( \ + input, tensors1, tensors2, scalars); \ + } \ + \ + void foreach_tensor_##OP##_tensor_slow_( \ + TensorList input, \ + TensorList tensors1, \ + TensorList tensors2, \ + const Tensor& scalars_) { \ + auto scalars = convert_tensor_to_scalar_list(scalars_, input.size()); \ + check_foreach_api_restrictions(input, tensors1, tensors2, scalars); \ + foreach_tensor_##OP##_scalarlist_slow_( \ + input, tensors1, tensors2, scalars); \ + } FOREACH_BINARY_OP_LIST_ALPHA(add); FOREACH_BINARY_OP_LIST_ALPHA(sub); @@ -199,6 +272,9 @@ FOREACH_POINTWISE_OP_SCALAR(addcmul); FOREACH_POINTWISE_OP_SCALARLIST(addcdiv); FOREACH_POINTWISE_OP_SCALARLIST(addcmul); +FOREACH_POINTWISE_OP_TENSOR(addcdiv); +FOREACH_POINTWISE_OP_TENSOR(addcmul); + // NOTE(crcrpar): It didn't seem feasible to use `self[i]` as both the first and the last // arguments of `maximum_out` and `minimum_out` so I tentatively embarrassingly get and copy // the result to `self[i]`. 
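The ForeachOpsKernels.cpp hunk above adds _foreach_addcdiv/_foreach_addcmul overloads that accept the per-tensor scalars as a single 1-D tensor: the new FOREACH_POINTWISE_OP_TENSOR macro converts that tensor into a list of c10::Scalar and falls through to the existing scalarlist slow path. Below is a minimal illustrative sketch of that conversion step, not part of the patch; the name unpack_scalars is hypothetical, and the real convert_tensor_to_scalar_list (added to ForeachUtils.h in the next hunk) dispatches on dtype and reads the raw data pointer rather than calling item() per element.

#include <torch/torch.h>
#include <vector>

// Hypothetical helper: turn a 1-D CPU tensor holding one scalar per input
// tensor into a std::vector<c10::Scalar> so the existing scalar-list code
// paths can be reused unchanged.
std::vector<c10::Scalar> unpack_scalars(const torch::Tensor& scalars,
                                        int64_t expect_length) {
  TORCH_CHECK(scalars.device().is_cpu(), "expected scalars on CPU");
  TORCH_CHECK(scalars.dim() == 1 && scalars.size(0) == expect_length,
              "expected a 1-D tensor with one scalar per input tensor");
  std::vector<c10::Scalar> out;
  out.reserve(expect_length);
  for (int64_t i = 0; i < expect_length; ++i) {
    // item() materializes element i as a host-side c10::Scalar.
    out.emplace_back(scalars[i].item());
  }
  return out;
}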
diff --git a/aten/src/ATen/native/ForeachUtils.h b/aten/src/ATen/native/ForeachUtils.h index 033052f401f6..0166d040863c 100644 --- a/aten/src/ATen/native/ForeachUtils.h +++ b/aten/src/ATen/native/ForeachUtils.h @@ -2,6 +2,7 @@ #include #include +#include #ifndef AT_PER_OPERATOR_HEADERS #include @@ -123,6 +124,45 @@ bool check_fast_path_restrictions( return true; } +std::vector convert_tensor_to_scalar_list( + const Tensor& scalarList_, + int64_t expect_length) { + std::vector scalarList; + TORCH_CHECK( + scalarList_.device() == c10::kCPU, + "Expected scalars to be on CPU, got ", + scalarList_.device(), + " instead."); + TORCH_CHECK( + scalarList_.is_contiguous(), "Expected scalars to be contiguous."); + TORCH_CHECK( + scalarList_.dim() == 1, + "Expected packed scalar Tensor to be of dimension 1. Got ", + scalarList_.dim(), + " instead."); + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( + kComplexHalf, + kHalf, + kBool, + kBFloat16, + scalarList_.scalar_type(), + "convert_tensor_to_scalar_list", + [&]() { + const scalar_t* scalar_data = scalarList_.data_ptr(); + TORCH_CHECK( + (expect_length == scalarList_.size(0)), + "Expected length of scalars to match input of length ", + expect_length, + " but got ", + scalarList_.size(0), + " instead."); + for (int64_t i = 0; i < scalarList_.size(0); i++) { + scalarList.push_back(c10::Scalar(scalar_data[i])); + } + }); + return scalarList; +} + bool can_use_fast_route(ArrayRef tensorLists, ArrayRef scalarList = {}, bool does_op_promote_integer_inputs_to_float = false) { diff --git a/aten/src/ATen/native/FractionalMaxPool2d.cpp b/aten/src/ATen/native/FractionalMaxPool2d.cpp index b4f8207af042..82512c83f433 100644 --- a/aten/src/ATen/native/FractionalMaxPool2d.cpp +++ b/aten/src/ATen/native/FractionalMaxPool2d.cpp @@ -1,8 +1,18 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/FractionalMaxPool3d.cpp b/aten/src/ATen/native/FractionalMaxPool3d.cpp index 11769545090f..5890026872a8 100644 --- a/aten/src/ATen/native/FractionalMaxPool3d.cpp +++ b/aten/src/ATen/native/FractionalMaxPool3d.cpp @@ -1,10 +1,20 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/GatedLinearUnit.cpp b/aten/src/ATen/native/GatedLinearUnit.cpp index b7b20e1c32f1..0bbfc74f99a7 100644 --- a/aten/src/ATen/native/GatedLinearUnit.cpp +++ b/aten/src/ATen/native/GatedLinearUnit.cpp @@ -1,7 +1,22 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { diff --git a/aten/src/ATen/native/GridSampler.cpp b/aten/src/ATen/native/GridSampler.cpp index 8b0440610226..586c1cab40d1 100644 --- a/aten/src/ATen/native/GridSampler.cpp +++ b/aten/src/ATen/native/GridSampler.cpp @@ -1,17 +1,35 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include -#include -#include +#include +#include #include -#include -#include -#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include 
+#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { using at::native::detail::GridSamplerInterpolation; diff --git a/aten/src/ATen/native/GridSamplerUtils.h b/aten/src/ATen/native/GridSamplerUtils.h index 0b6f29de8c42..7c22fedfe94e 100644 --- a/aten/src/ATen/native/GridSamplerUtils.h +++ b/aten/src/ATen/native/GridSamplerUtils.h @@ -101,7 +101,7 @@ bool cond_cudnn_grid_sampler( at::native::canUse32BitIndexMath(input) && at::native::canUse32BitIndexMath(grid) && input.dim() == 4 && - input.size(1) <= 1024); + input.sym_size(1) <= 1024); } } // anonymous namespace diff --git a/aten/src/ATen/native/Histogram.cpp b/aten/src/ATen/native/Histogram.cpp index c3a007f2c2dc..89ede6bea35c 100644 --- a/aten/src/ATen/native/Histogram.cpp +++ b/aten/src/ATen/native/Histogram.cpp @@ -1,10 +1,28 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include diff --git a/aten/src/ATen/native/Histogram.h b/aten/src/ATen/native/Histogram.h index 9df0aafafc18..3305cc5e315f 100644 --- a/aten/src/ATen/native/Histogram.h +++ b/aten/src/ATen/native/Histogram.h @@ -3,8 +3,6 @@ #include #include -#include - namespace at { namespace native { using histogramdd_fn = void(*)(const Tensor&, const c10::optional&, bool, Tensor&, const TensorList&); diff --git a/aten/src/ATen/native/Im2Col.cpp b/aten/src/ATen/native/Im2Col.cpp index c4b05bc18b56..416e77e9ff19 100644 --- a/aten/src/ATen/native/Im2Col.cpp +++ b/aten/src/ATen/native/Im2Col.cpp @@ -1,12 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { namespace { @@ -85,7 +94,6 @@ static void im2col_out_cpu_template( int64_t output_length = output_height * output_width; output.resize_({batch_size, n_output_plane, output_length}); - output.zero_(); AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kBFloat16, kHalf, input.scalar_type(), "im2col_out_cpu", [&] { @@ -120,29 +128,6 @@ static void im2col_out_cpu_template( }); } -static void im2col_backward_out_cpu_template( - Tensor& grad_input, - const Tensor& grad_output, - IntArrayRef input_size, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - TORCH_CHECK( - input_size.size() == 2, - "It is expected input_size equals to 2, but got size ", - input_size.size()); - // col2im_out_cpu checks size of kernel_size, dilation, padding and stride - at::native::col2im_out_cpu( - grad_output, - input_size, - kernel_size, - dilation, - padding, - stride, - grad_input); -} - } // namespace Tensor& im2col_out_cpu(const Tensor& input, @@ -169,43 +154,5 @@ Tensor im2col_cpu( return output; } -Tensor& im2col_backward_out_cpu(const Tensor& grad_output, - IntArrayRef input_size, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride, - Tensor& grad_input) { - im2col_backward_out_cpu_template( - grad_input, - grad_output, - input_size, - kernel_size, - dilation, - padding, - stride); - return grad_input; -} - -Tensor im2col_backward_cpu( - const Tensor& grad_output, - IntArrayRef input_size, - 
IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - Tensor grad_input = at::empty_like(grad_output, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - - im2col_backward_out_cpu_template( - grad_input, - grad_output, - input_size, - kernel_size, - dilation, - padding, - stride); - return grad_input; -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/IndexKernel.h b/aten/src/ATen/native/IndexKernel.h index 41b4efc5f441..a54343d510a8 100644 --- a/aten/src/ATen/native/IndexKernel.h +++ b/aten/src/ATen/native/IndexKernel.h @@ -1,5 +1,6 @@ #pragma once #include +#include namespace at { class Tensor; diff --git a/aten/src/ATen/native/IndexingUtils.cpp b/aten/src/ATen/native/IndexingUtils.cpp index e91eff03ab85..2dba1972ce57 100644 --- a/aten/src/ATen/native/IndexingUtils.cpp +++ b/aten/src/ATen/native/IndexingUtils.cpp @@ -1,9 +1,10 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include namespace at { namespace native { bool canUse32BitIndexMath(const TensorBase& t, int64_t max_elem) { - int64_t elements = t.numel(); + auto elements = t.sym_numel(); if (elements >= max_elem) { return false; } @@ -11,16 +12,16 @@ bool canUse32BitIndexMath(const TensorBase& t, int64_t max_elem) { return max_elem > 0; } - int64_t offset = 0; - int64_t linearId = elements - 1; + c10::SymInt offset = 0; + auto linearId = elements - 1; // NOTE: Assumes all strides are positive, which is true for now // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) for (int i = t.dim() - 1; i >= 0; --i) { - int64_t curDimIndex = linearId % t.size(i); - int64_t curDimOffset = curDimIndex * t.stride(i); + auto curDimIndex = linearId % t.sym_size(i); + auto curDimOffset = curDimIndex * t.sym_stride(i); offset += curDimOffset; - linearId /= t.size(i); + linearId /= t.sym_size(i); } if (offset >= max_elem) { diff --git a/aten/src/ATen/native/IndexingUtils.h b/aten/src/ATen/native/IndexingUtils.h index 500df7966d8e..a99b3817c275 100644 --- a/aten/src/ATen/native/IndexingUtils.h +++ b/aten/src/ATen/native/IndexingUtils.h @@ -48,12 +48,18 @@ static C10_UNUSED std::vector expandTensors(const Tensor & self, IOptTen return result; } -static C10_UNUSED void checkIndexTensorTypes(IOptTensorListRef indices) { +static C10_UNUSED void checkIndexTensorTypes(IOptTensorListRef indices, bool allow_int=false) { for (const auto& tensor : indices) { if (tensor.has_value() && tensor->defined()) { auto scalarType = tensor->scalar_type(); - if (scalarType != kLong && scalarType != kByte && scalarType != kBool) { - TORCH_CHECK_INDEX(false, "tensors used as indices must be long, byte or bool tensors"); + if (allow_int) { + if (scalarType != kLong && scalarType != kByte && scalarType != kBool && scalarType != kInt) { + TORCH_CHECK_INDEX(false, "tensors used as indices must be long, int, byte or bool tensors"); + } + } else { + if (scalarType != kLong && scalarType != kByte && scalarType != kBool) { + TORCH_CHECK_INDEX(false, "tensors used as indices must be long, byte or bool tensors"); + } } } } diff --git a/aten/src/ATen/native/Integration.cpp b/aten/src/ATen/native/Integration.cpp index 7ca01bae18a5..09e444476d1f 100644 --- a/aten/src/ATen/native/Integration.cpp +++ b/aten/src/ATen/native/Integration.cpp @@ -1,12 +1,23 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include 
+#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/Itertools.cpp b/aten/src/ATen/native/Itertools.cpp index 265b05054b0a..8d6ff506a43f 100644 --- a/aten/src/ATen/native/Itertools.cpp +++ b/aten/src/ATen/native/Itertools.cpp @@ -1,5 +1,20 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif #include diff --git a/aten/src/ATen/native/Lerp.cpp b/aten/src/ATen/native/Lerp.cpp index bfac91a881ae..2e67dec35033 100644 --- a/aten/src/ATen/native/Lerp.cpp +++ b/aten/src/ATen/native/Lerp.cpp @@ -1,5 +1,14 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include +#endif namespace at { namespace meta { diff --git a/aten/src/ATen/native/Lerp.h b/aten/src/ATen/native/Lerp.h index f24032f5e38d..c1784ae16f31 100644 --- a/aten/src/ATen/native/Lerp.h +++ b/aten/src/ATen/native/Lerp.h @@ -1,12 +1,39 @@ #pragma once #include +#include #include #include namespace at { namespace native { +template +C10_HOST_DEVICE C10_ALWAYS_INLINE bool is_lerp_weight_small(scalar_t weight) { + return std::abs(weight) < scalar_t(0.5); +} +template +C10_HOST_DEVICE C10_ALWAYS_INLINE bool is_lerp_weight_small(c10::complex weight) { + // Avoid the sqrt in abs(weight) + return (weight.real() * weight.real() + weight.imag() * weight.imag()) < scalar_t(0.25); +} + +template +C10_HOST_DEVICE C10_ALWAYS_INLINE scalar_t lerp(scalar_t self_, scalar_t end_, weight_t weight_) { + using opmath_t = at::opmath_type; + using opmath_weight_t = at::opmath_type; + + opmath_t self = self_; + opmath_t end = end_; + opmath_weight_t weight = weight_; + + // Conditional for better numeric. This has been discussed in + // https://github.com/pytorch/pytorch/pull/18871 + return is_lerp_weight_small(weight) + ? self + weight * (end - self) + : end - (end - self) * (opmath_t(1) - weight); +} + using lerp_fn_scalar = void (*)( at::TensorIteratorBase& iter, const Scalar& weight); diff --git a/aten/src/ATen/native/Linear.cpp b/aten/src/ATen/native/Linear.cpp index a002369fc547..591289a726ac 100644 --- a/aten/src/ATen/native/Linear.cpp +++ b/aten/src/ATen/native/Linear.cpp @@ -1,16 +1,36 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include -#include +#include +#include #include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include -#include #include #include #include @@ -26,9 +46,6 @@ Tensor linear(const Tensor& input, const Tensor& weight, const c10::optionaldefined() && input.is_contiguous()) { - // Also hit the fused path for contiguous 3D input. - const auto input_sizes = input.sizes(); - const auto result = at::addmm(*bias, input.view({input_sizes[0] * input_sizes[1], input_sizes[2]}), weight.t()); - return result.view({input_sizes[0], input_sizes[1], result.size(1)}); + if (input.dim() == 3 && bias->defined() && input.is_contiguous() && + !input.is_xla()) { + // Also hit the fused path for contiguous 3D input, if not using xla + // backend. Reshaping/flattening has some performance implications on xla. 
+ const auto input_sizes = input.sym_sizes(); + const auto result = at::addmm(*bias, input.view_symint({input_sizes[0] * input_sizes[1], input_sizes[2]}), weight.t()); + return result.view_symint({input_sizes[0], input_sizes[1], result.sym_size(1)}); } auto output = at::matmul(input, weight.t()); if (bias->defined()) { @@ -86,51 +105,52 @@ static Tensor sumproduct_pair(const Tensor& left_, const Tensor& right_, IntArra return at::mul(left_, right_); int64_t dim = left_.dim(); auto sum_dims = at::dim_list_to_bitset(sum_dims_, dim); - // dimensions that will be part of the output (i.e. not summed over) in three vectors - // dims in lro appear in left, right and output, similarly lo: left and output, ro: right and output + // dimensions that will be part of the output (i.e. not summed over) in three vectors: + // dims in lro appear in left, right and output, similarly, lo: left and output, ro: right and output // also the sizes are kept track of for reshaping std::vector lro, lo, ro; - int64_t lro_size = 1, lo_size = 1, ro_size = 1, sum_size = 1; + SymInt lro_size = 1, lo_size = 1, ro_size = 1, sum_size = 1; Tensor left = left_; Tensor right = right_; for (const auto i : c10::irange(dim)) { - auto sl = left.size(i)>1; - auto sr = right.size(i)>1; + auto sl = left.sym_size(i)!=1; + auto sr = right.sym_size(i)!=1; if (sum_dims[i]) { // first dimensions that will be summed over after multiplication if (sl && sr) { // dimensions nontrivially in both left and right must be of the same size - TORCH_CHECK(left.size(i)==right.size(i), "non-broadcast dimensions must match"); - sum_size *= left.size(i); + TORCH_CHECK(left.sym_size(i)==right.sym_size(i), "non-broadcast dimensions must match"); + sum_size *= left.sym_size(i); } else if (sl) { // if it is only in one of left and right, we can sum right away left = left.sum(i, true); } else if (sr) { right = right.sum(i, true); } - } else if (sl && sr) { // now deal with dimensions dimensions that will be in the output + } else if (sl && sr) { // now deal with dimensions that will be in the output // dimensions nontrivially in both left and right must be of the same size - TORCH_CHECK(left.size(i)==right.size(i), "non-broadcast dimensions must match"); + TORCH_CHECK(left.sym_size(i)==right.sym_size(i), "non-broadcast dimensions must match"); lro.push_back(i); - lro_size *= left.size(i); + lro_size *= left.sym_size(i); } else if (sl) { // keep track of dimensions appearing only once lo.push_back(i); - lo_size *= left.size(i); + lo_size *= left.sym_size(i); } else { ro.push_back(i); - ro_size *= right.size(i); + ro_size *= right.sym_size(i); } } // we now work with the following permutations / shapes. 
// the pipeline is permute inputs -> reshape inputs -> batch matrix mul -> reshape(view) output -> permute output - // output: "lro, lo, 1-for-summed-dims, ro" with orgiginal shape dimensions + // output: "lro, lo, 1-for-summed-dims, ro" with original shape dimensions // left: "lro, lo, summed" permuted with lpermutation and the three flattened // right: "lro, summed, ro" permuted with rpermutation and the three flattened // then the permuted output is a view of bmm(left, right) // finally, opermutation reverts the permutation to the original order of dimensions - std::vector out_size; - // NOLINTNEXTLINE(performance-inefficient-vector-operation) - for (auto& d : lro) out_size.push_back(left.size(d)); - for (auto& d : lo) out_size.push_back(left.size(d)); - for (auto& d : sum_dims_) { out_size.push_back(1); (void)(d); }; // avoid warining about not using d - for (auto& d : ro) out_size.push_back(right.size(d)); + auto out_num_dim = lro.size() + lo.size() + sum_dims_.size() + ro.size(); + std::vector out_size; + out_size.reserve(out_num_dim); + for (auto& d : lro) out_size.push_back(left.sym_size(d)); + for (auto& d : lo) out_size.push_back(left.sym_size(d)); + for (auto& d : sum_dims_) { out_size.push_back(1); (void)(d); }; // avoid warning about not using d + for (auto& d : ro) out_size.push_back(right.sym_size(d)); std::vector lpermutation(lro); lpermutation.insert(lpermutation.end(), lo.begin(), lo.end()); @@ -142,7 +162,7 @@ static Tensor sumproduct_pair(const Tensor& left_, const Tensor& right_, IntArra rpermutation.insert(rpermutation.end(), ro.begin(), ro.end()); rpermutation.insert(rpermutation.end(), lo.begin(), lo.end()); - std::vector opermutation(lro.size()+lo.size()+sum_dims_.size()+ro.size(), -1); + std::vector opermutation(out_num_dim, -1); { int64_t i = 0; @@ -161,16 +181,15 @@ static Tensor sumproduct_pair(const Tensor& left_, const Tensor& right_, IntArra } // now we can execute the operations above - left = left.permute(lpermutation).reshape({lro_size, lo_size, sum_size}); - right = right.permute(rpermutation).reshape({lro_size, sum_size, ro_size}); + left = left.permute(lpermutation).reshape_symint({lro_size, lo_size, sum_size}); + right = right.permute(rpermutation).reshape_symint({lro_size, sum_size, ro_size}); Tensor result = at::bmm(left, right); - result = result.view(out_size).permute(opermutation); + result = result.view_symint(out_size).permute(opermutation); // finally squeeze summed dimensions if desired if (! keepdim) { auto sizes = result.sizes().vec(); - // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) - for (int i = dim-1; i>=0; i--) { + for (auto i = dim-1; i>=0; i--) { if (sum_dims[i]) { sizes.erase(sizes.begin() + i); } @@ -180,47 +199,55 @@ static Tensor sumproduct_pair(const Tensor& left_, const Tensor& right_, IntArra return result; } -namespace { - -bool einsum_check_label(unsigned char label) { - return std::isalpha(label); -} - -uint8_t einsum_label_to_index(unsigned char label) { - constexpr uint8_t NUM_OF_LETTERS = 'z' - 'a' + 1; - return std::isupper(label) ? label - 'A' : NUM_OF_LETTERS + (label - 'a'); -} - -unsigned char einsum_index_to_label(uint8_t index) { - constexpr uint8_t NUM_OF_LETTERS = 'z' - 'a' + 1; - return index < NUM_OF_LETTERS ? index + 'A' : index - NUM_OF_LETTERS + 'a'; -} - -} // namespace - -// There are roughly three parts to compute einsum: +// There are roughly three parts to computing einsum: // 1. Parse equation to extract the labels for each input operand and output // 2. 
Unsqueeze missing dimensions from input operands and permute to align them // 3. Compute result by multiplying input operands and summing contraction -// dimensions We do the last part by reducing to bmm. -Tensor einsum(c10::string_view equation, TensorList operands) { +// dimensions. We do the last part by reducing to bmm. +// If a path is specified, we reduce in the order specified by the path, else we +// default to going left => right. The path is a list of indices processed the same +// way as opt-einsum: https://optimized-einsum.readthedocs.io/en/stable/path_finding.html#format-of-the-path +Tensor einsum(c10::string_view equation, TensorList operands, at::OptionalIntArrayRef path) { TORCH_CHECK(!operands.empty(), "einsum(): must provide at least one operand"); + const auto num_ops = operands.size(); + + if (path.has_value()) { + const auto path_size = num_ops == 1 ? 1 : (num_ops - 1) * 2; + TORCH_CHECK( + path->size() == path_size, + "einsum(): expected contraction path given in path parameter to have size ", + path_size, + " but got ", + path->size()); + } + + // Labels must be in range [A-Za-z] + constexpr uint8_t NUM_OF_LETTERS = 'z' - 'a' + 1; + constexpr uint8_t TOTAL_LABELS = NUM_OF_LETTERS * 2; // Code used to identify ELLIPSIS ("...") - constexpr uint8_t ELLIPSIS = 52; + constexpr uint8_t ELLIPSIS = TOTAL_LABELS; + + // Convert label in [A-Za-z] to subscript in [0, TOTAL_LABELS) + auto label_to_subscript = [=](unsigned char label) -> uint8_t { + return std::isupper(label) ? label - 'A' : label - 'a' + NUM_OF_LETTERS; + }; + + // Convert subscript in [0, TOTAL_LABELS) to label in [A-Za-z] + auto subscript_to_label = [=](uint8_t s) -> unsigned char { + return s < NUM_OF_LETTERS ? s + 'A' : s + 'a' - NUM_OF_LETTERS; + }; // Find arrow (->) to split equation into lhs and rhs const auto arrow_pos = equation.find("->"); const auto lhs = equation.substr(0, arrow_pos); - const auto num_ops = operands.size(); - // Convert labels for input operands into an index in [0, 52) and store // them in op_labels for each operand along with ELLIPSIS if present. 
std::vector> op_labels(num_ops); - bool found_ell = false; + bool ell_in_input = false; std::size_t curr_op = 0; - for (auto i = decltype(lhs.length()){0}; i < lhs.length(); ++i) { + for (std::size_t i = 0; i < lhs.length(); ++i) { const unsigned char label = lhs[i]; switch (label) { case ' ': @@ -230,7 +257,7 @@ Tensor einsum(c10::string_view equation, TensorList operands) { case '.': TORCH_CHECK( // Only one ellipsis per operand can be given - !found_ell, + !ell_in_input, "einsum(): found \'.\' for operand ", curr_op, " for which an ellipsis was already found"); @@ -241,7 +268,7 @@ Tensor einsum(c10::string_view equation, TensorList operands) { curr_op, " that is not part of any ellipsis"); op_labels[curr_op].push_back(ELLIPSIS); - found_ell = true; + ell_in_input = true; break; case ',': @@ -250,17 +277,17 @@ Tensor einsum(c10::string_view equation, TensorList operands) { TORCH_CHECK( curr_op < num_ops, "einsum(): fewer operands were provided than specified in the equation"); - found_ell = false; + ell_in_input = false; break; default: // Parse label TORCH_CHECK( - einsum_check_label(label), + std::isalpha(label), "einsum(): invalid subscript given at index ", i, " in the equation string, subscripts must be in [a-zA-Z]"); - op_labels[curr_op].push_back(einsum_label_to_index(label)); + op_labels[curr_op].push_back(label_to_subscript(label)); } } @@ -268,8 +295,6 @@ Tensor einsum(c10::string_view equation, TensorList operands) { curr_op == num_ops - 1, "einsum(): more operands were provided than specified in the equation"); - // Labels must be within [a-zA-Z]. - constexpr uint8_t TOTAL_LABELS = 52; std::vector label_count(TOTAL_LABELS, 0); // The maximum number of dimensions covered by any ellipsis, needed when @@ -318,12 +343,13 @@ Tensor einsum(c10::string_view equation, TensorList operands) { // Start index of ellipsis dimensions in the permuted shape int64_t ell_index = 0; - found_ell = false; + bool ell_in_output = false; if (arrow_pos == std::string::npos) { // Implicit output is ellipsis (...) + labels seen only once perm_index = ell_num_dim; - found_ell = true; + // ell_in_output is used to stop us from reducing ellipses dims later + ell_in_output = true; for (const auto label : c10::irange(TOTAL_LABELS)) { if (label_count[label] == 1) { label_perm_index[label] = perm_index++; @@ -332,7 +358,7 @@ Tensor einsum(c10::string_view equation, TensorList operands) { } else { // Parse explicit output const auto rhs = equation.substr(arrow_pos + 2); - for (auto i = decltype(rhs.length()){0}; i < rhs.length(); ++i) { + for (std::size_t i = 0; i < rhs.length(); ++i) { const unsigned char label = rhs[i]; switch (label) { case ' ': @@ -342,7 +368,7 @@ Tensor einsum(c10::string_view equation, TensorList operands) { case '.': TORCH_CHECK( // There can only be one ellipsis in the output - !found_ell, + !ell_in_output, "einsum(): found \'.\' for output but an ellipsis (...) 
was already found"); TORCH_CHECK( // Ensure ellipsis is correct @@ -350,16 +376,16 @@ Tensor einsum(c10::string_view equation, TensorList operands) { "einsum(): found \'.\' for output that is not part of any ellipsis (...)"); ell_index = perm_index; perm_index += ell_num_dim; - found_ell = true; + ell_in_output = true; break; default: TORCH_CHECK( - einsum_check_label(label), + std::isalpha(label), "einsum(): invalid subscript given at index ", - lhs.size() + 2 + i, + lhs.size() + 2 + i, " in the equation string, subscripts must be in [a-zA-Z]"); - const auto index = einsum_label_to_index(label); + const auto index = label_to_subscript(label); TORCH_CHECK( // Ensure label appeared at least once for some input operand and at // most once for the output @@ -374,11 +400,11 @@ Tensor einsum(c10::string_view equation, TensorList operands) { } } - // Save output size before adding contraction dims (dims to sum out) - const int64_t out_size = perm_index; + // Save number of dimensions in output before adding contraction dims (dims to sum out) + const int64_t out_num_dim = perm_index; // If ellipsis is not part of the output, add to contraction dimensions - if (!found_ell) { + if (!ell_in_output) { ell_index = perm_index; perm_index += ell_num_dim; } @@ -390,144 +416,171 @@ Tensor einsum(c10::string_view equation, TensorList operands) { } } - // Here we unsqueeze missing dimensions to make all operands have the same - // number of dimensions. We take diagonals for repeated labels within the - // same operand. Finally we permute the operands to align dimensions as - // per the perm_out_index we computed above. - std::vector permuted_operands; - for (const auto i: c10::irange(num_ops)) { - std::vector perm_shape(perm_index, -1); - std::vector label_dim(TOTAL_LABELS, -1); - Tensor operand = operands[i]; - const auto labels = op_labels[i]; - const auto original_sizes = operand.sizes(); - - int64_t j = 0; - for (const auto& label : labels) { - if (label == ELLIPSIS) { - // Add missing dimensions covered by the ellipsis - const auto num_missing_dim = - ell_num_dim - (original_sizes.size() - labels.size() + 1); - for (const auto k : c10::irange(num_missing_dim)) { - (void)k; //Suppress unused warning - operand = operand.unsqueeze(j); + // Next: we check the sizes, take diagonals for repeated labels, unsqueeze + // missing dimensions so all operands have the same dimensions and permute + // the operands to align the dimensions following the indices computed above. + // We also count how many operands have dimension with size != 1 for each + // label used to identify which dimensions can be contracted. 
+ std::vector label_size(TOTAL_LABELS, 1); + std::vector ell_sizes(ell_num_dim, 1); + std::vector dim_counts(perm_index, 0); + std::deque ops; + for (const auto i : irange(num_ops)) { + auto op = operands[i]; + std::vector permutation(perm_index, -1); + std::int64_t dim = 0; + for (const auto s : op_labels[i]) { + if (s == ELLIPSIS) { + // Iterate over each dimension covered by ellipsis + const auto ndim = operands[i].ndimension() - (static_cast(op_labels[i].size()) - 1); + for (auto j = ell_num_dim - ndim; j < ell_num_dim; ++j) { + if (op.sym_size(dim) != 1) { + // Update ellipsis size + TORCH_CHECK( + ell_sizes[j] == 1 || ell_sizes[j] == op.sym_size(dim), + "einsum(): dimension ", + dim, + " covered by ellipsis in operand ", + i, + "has size ", + op.size(dim), + " which does not broadcast with previously seen ellipsis with size ", + ell_sizes[j], + " for the respective dimension"); + ell_sizes[j] = op.sym_size(dim); + ++dim_counts[ell_index + j]; + } + permutation[ell_index + j] = dim++; } - for (const auto k : c10::irange(ell_num_dim)) { - perm_shape[ell_index + k] = j++; + } else if (permutation[label_perm_index[s]] == -1) { + if (op.sym_size(dim) != 1) { + // Update subscript + TORCH_CHECK( + label_size[s] == 1 || label_size[s] == op.sym_size(dim), + "einsum(): subscript ", + subscript_to_label(s), + " has size ", + op.sym_size(dim), + " for operand ", + i, + " which does not broadcast with previously seen size ", + label_size[s]); + label_size[s] = op.sym_size(dim); + ++dim_counts[label_perm_index[s]]; } - } else if (label_dim[label] != -1) { + permutation[label_perm_index[s]] = dim++; + } else { // Repeated label, take diagonal - const auto dim = label_dim[label]; + const auto prev_dim = permutation[label_perm_index[s]]; TORCH_CHECK( - operand.size(j) == operand.size(dim), + op.sym_size(dim) == op.sym_size(prev_dim), "einsum(): subscript ", - einsum_index_to_label(label), + subscript_to_label(s), " is repeated for operand ", i, " but the sizes don't match, ", - operand.size(j), + op.sym_size(dim), " != ", - operand.size(dim)); - operand = operand.diagonal(0, dim, j).movedim(-1, dim); - } else { - // Lookup output index for label - label_dim[label] = j; - perm_shape[label_perm_index[label]] = j++; + op.sym_size(prev_dim)); + op = op.diagonal(0, prev_dim, dim).movedim(-1, prev_dim); } } // Add dimensions for missing labels - for (int64_t& index : perm_shape) { - if (index == -1) { - operand = operand.unsqueeze(-1); - index = j++; + for (auto& val : permutation) { + if (val == -1) { + op = op.unsqueeze(dim); + val = dim++; } } - - permuted_operands.push_back(operand.permute(perm_shape)); + ops.emplace_back(op.permute(permutation)); } - // Check if operands broadcast and keep track of last operand with - // dimension size != 1 for optimizing reductions - std::vector dim_last_op(perm_index, 0); - bool has_zero_size_dim = false; - for (const auto dim : c10::irange(perm_index)) { - auto broadcast_size = permuted_operands[0].size(dim); - for (const auto i: c10::irange(1, num_ops)) { - const auto dim_size = permuted_operands[i].size(dim); - if (broadcast_size != dim_size && broadcast_size != 1 && dim_size != 1) { - std::ostringstream msg; - msg << "einsum(): operands do not broadcast with remapped shapes [original->remapped]:"; - for (const auto j: c10::irange(num_ops)) { - msg << " " << operands[j].sizes() << "->" - << permuted_operands[j].sizes(); - } - TORCH_CHECK(false, msg.str()); - } - if (dim_size != 1) { - broadcast_size = dim_size; - dim_last_op[dim] = i; - } - } - has_zero_size_dim 
|= broadcast_size == 0; - } + const auto contract_path = path.value_or(std::vector{}); + auto it = contract_path.begin(); - // Compute result - Tensor result = permuted_operands[0]; - - // Fast path for when an operand has zero sized dim - if (has_zero_size_dim) { - std::vector out_shape(out_size); - for (const auto i : c10::irange(out_size)) { - out_shape[i] = permuted_operands[dim_last_op[i]].size(i); - } - return at::zeros(out_shape, result.options()); - } + // Contract + while (ops.size() > 1) { + int64_t i = 0; + int64_t j = 1; - // Sum out or squeeze dimensions that are size 1 for all later operands - int64_t dim = out_size; - for (int64_t i = dim; i < perm_index; ++i, ++dim) { - if (dim_last_op[i] == 0) { - if (result.size(dim) == 1) { - result = result.squeeze(dim--); - } else { - result = result.sum(dim--); + if (path.has_value()) { + i = *it++; + j = *it++; + if (j < i) { + std::swap(i, j); } + + TORCH_CHECK( + i != j && i >= 0 && j < static_cast(ops.size()), + "einsum(): invalid contraction (", + i, + ", ", + j, + i == j ? ") cannot contract an operand with itself" + : ") operand index is out of bounds"); } - } - for (const auto i: c10::irange(1, num_ops)) { - Tensor operand = permuted_operands[i]; - std::vector sum_dims; + auto a = ops[i]; + auto b = ops[j]; + ops.erase(ops.begin() + j); + ops.erase(ops.begin() + i); - // Sum out or squeeze dimensions that are size 1 for all later operands - dim = out_size; - for (int64_t j = dim; j < perm_index; ++j, ++dim) { - if (dim_last_op[j] < i) { - operand = operand.squeeze(dim); - --dim; - } else if (dim_last_op[j] == i) { - if (result.size(dim) == 1) { - operand = operand.sum(dim); - result = result.squeeze(dim); - --dim; - } else { + // Collect dimensions that can be summed now + std::vector sum_dims; + SmallVector a_dims_to_sum; + SmallVector b_dims_to_sum; + for (auto dim = out_num_dim; dim < perm_index; ++dim) { + if (a.sym_size(dim) != 1 && b.sym_size(dim) != 1) { + if (--dim_counts[dim] == 1) { sum_dims.push_back(dim); + dim_counts[dim] = 0; + } + } else if (dim_counts[dim] == 1) { + if (a.sym_size(dim) != 1) { + a_dims_to_sum.push_back(dim); + dim_counts[dim] = 0; + } else if (b.sym_size(dim) != 1) { + b_dims_to_sum.push_back(dim); + dim_counts[dim] = 0; } } } - // Multiply tensors and sum out dimensions in sum_dims - if (sum_dims.empty()) { - result = result.mul(operand); - } else if (sum_dims.size() == result.sizes().size()) { - result = result.flatten().dot(operand.flatten()); + // Sum multiple dims at a time to minimize the number of kernel calls to sum + if (!a_dims_to_sum.empty()) { + a = a.sum(a_dims_to_sum, true); + } + if (!b_dims_to_sum.empty()) { + b = b.sum(b_dims_to_sum, true); + } + + if (path.has_value()) { + ops.emplace_back(sumproduct_pair(a, b, sum_dims, true)); } else { - result = sumproduct_pair(result, operand, sum_dims, false); + ops.emplace_front(sumproduct_pair(a, b, sum_dims, true)); } } - return result; + // Sum out contraction dims + if (perm_index - out_num_dim > 0) { + // if there were ops to contract, we would have already done so + // in the previous loop and all the dims to sum are now 1 + // NB: use view instead of squeeze (or sum) for faster (mps) performance + if (num_ops > 1) { + auto sizes = ops[0].sym_sizes().vec(); + for (auto dim = perm_index - 1; dim >= out_num_dim; --dim) { + sizes.erase(sizes.begin() + dim); + } + return ops[0].view_symint(sizes); + } else { + std::vector sum_dims(perm_index - out_num_dim); + std::iota(sum_dims.begin(), sum_dims.end(), out_num_dim); + return 
ops[0].sum(sum_dims); + } + } + + return ops[0]; } // _trilinear computes a trilinear einstein sum with an unrolled dimension diff --git a/aten/src/ATen/native/LinearAlgebra.cpp b/aten/src/ATen/native/LinearAlgebra.cpp index 529c6b5ef9ca..7e47170cd72e 100644 --- a/aten/src/ATen/native/LinearAlgebra.cpp +++ b/aten/src/ATen/native/LinearAlgebra.cpp @@ -1,27 +1,132 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include #include #include -#include #include #include #include #include #include -#include -#include #include +#include +#include +#include #include -#include #include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include @@ -48,6 +153,8 @@ namespace detail { namespace meta { #define ADDMM_META() \ + TORCH_CHECK(self.scalar_type() == mat2.scalar_type(), "self and mat2 must have the same dtype"); \ + TORCH_CHECK(mat1.scalar_type() == mat2.scalar_type(), "mat1 and mat2 must have the same dtype"); \ TORCH_CHECK(mat1.dim() == 2, "mat1 must be a matrix, got ", mat1.dim(), "-D tensor"); \ TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix, got ", mat2.dim(), "-D tensor"); \ TORCH_CHECK( \ @@ -55,7 +162,7 @@ namespace meta { mat1.sizes()[0], "x", mat1.sizes()[1], " and ", mat2.sizes()[0], "x", mat2.sizes()[1], ")"); \ \ auto names = at::namedinference::propagate_names_for_addmm(mat1, mat2, self); \ - set_output_raw_strided(0, {mat1.sizes()[0], mat2.sizes()[1]}, {}, self.options(), names); + set_output_raw_strided(0, {mat1.sizes()[0], mat2.sizes()[1]}, {}, mat1.options(), names); TORCH_META_FUNC(addmm)(const Tensor& self, const Tensor& mat1, const Tensor& mat2, const Scalar& beta, const Scalar& alpha) { ADDMM_META(); @@ -711,24 +818,6 @@ Tensor linalg_matrix_rank(const Tensor& input, double tol, bool hermitian) { return matrix_rank_impl(input, atol_tensor, rtol_tensor, hermitian, result); } -Tensor matrix_rank(const Tensor& self, double tol, bool symmetric) { - TORCH_WARN_ONCE( - "torch.matrix_rank is deprecated in favor of torch.linalg.matrix_rank", - "and will be removed in a future PyTorch release. The parameter 'symmetric' was ", - "renamed in torch.linalg.matrix_rank to 'hermitian'." - ); - return at::linalg_matrix_rank(self, tol, symmetric); -} - -Tensor matrix_rank(const Tensor& self, bool symmetric) { - TORCH_WARN_ONCE( - "torch.matrix_rank is deprecated in favor of torch.linalg.matrix_rank", - "and will be removed in a future PyTorch release. 
The parameter 'symmetric' was ", - "renamed in torch.linalg.matrix_rank to 'hermitian'." - ); - return at::linalg_matrix_rank(self, 0.0, c10::nullopt, symmetric); -} - // multi_dot helper functions namespace { @@ -788,7 +877,7 @@ std::vector> matrix_chain_order(TensorList tensors) { /** * @brief Recursively multiplies the tensors i...j using the given order * - * @param tensors matrices to multiply togther + * @param tensors matrices to multiply together * @param order optimal chain multiplication order from #matrix_chain_order * @param i index of first tensor to be multiplied * @param j index of last tensor to be multiplied @@ -2285,8 +2374,7 @@ void compute_T18_scale_square( for (const auto i : c10::irange(mexp_scaled.size(0))) { auto s_val = s_cpu.select(0, i).template item(); auto mexp = mexp_scaled.select(0, i); - for (const auto p : c10::irange(s_val)) { - (void)p; //Suppress unused variable warning + for (const auto p C10_UNUSED : c10::irange(s_val)) { mexp = at::matmul(mexp, mexp); } mexp_out.select(0, i).copy_(mexp); @@ -2682,7 +2770,7 @@ Tensor& linalg_norm_out(const Tensor& X, c10::string_view ord, OptionalIntArrayR //////////////////////////////////////////////////////////////////////////////// // Frobenius Norm // -// Just used in linalg.norm. It should not be removed. // +// Just used in torch..norm. It should not be removed. // //////////////////////////////////////////////////////////////////////////////// Tensor frobenius_norm(const Tensor& self) { @@ -2728,7 +2816,7 @@ Tensor &frobenius_norm_out(const Tensor& self, //////////////////////////////////////////////////////////////////////////////// // Nuclear Norm // -// Just used in linalg.norm. It should not be removed. // +// Just used in torch.norm. It should not be removed. // //////////////////////////////////////////////////////////////////////////////// Tensor nuclear_norm(const Tensor& self, bool keepdim) { diff --git a/aten/src/ATen/native/LinearAlgebraUtils.h b/aten/src/ATen/native/LinearAlgebraUtils.h index cbeb49fe81c6..351bc33f6590 100644 --- a/aten/src/ATen/native/LinearAlgebraUtils.h +++ b/aten/src/ATen/native/LinearAlgebraUtils.h @@ -241,8 +241,7 @@ void batch_iterator_with_broadcasting(const Tensor& a, const Tensor& b, const fu auto* b_batch_idx_ptr = data[0]; auto* a_batch_idx_ptr = data[1]; - for (const auto elem : c10::irange(nelems)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED : c10::irange(nelems)) { auto b_curr_linear_batch_idx = *reinterpret_cast(b_batch_idx_ptr); auto a_curr_linear_batch_idx = *reinterpret_cast(a_batch_idx_ptr); diff --git a/aten/src/ATen/native/Loss.cpp b/aten/src/ATen/native/Loss.cpp index b5b7acb8ede2..78b7d7023620 100644 --- a/aten/src/ATen/native/Loss.cpp +++ b/aten/src/ATen/native/Loss.cpp @@ -1,15 +1,62 @@ -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include +#include +#include +#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + constexpr float EPSILON = 1e-12; namespace { @@ -157,15 +204,17 @@ Tensor 
triplet_margin_loss(const Tensor& anchor, const Tensor& positive, const T auto n_dim = negative.dim(); TORCH_CHECK( a_dim == p_dim && p_dim == n_dim, - "All inputs should have same dimension but got ", - a_dim, - "D, ", - p_dim, - "D and ", - n_dim, - "D inputs.") + "The anchor, positive, and negative tensors are expected to have " + "the same number of dimensions, but got: anchor ", a_dim, "D, " + "positive ", p_dim, "D, and negative ", n_dim, "D inputs") + auto dist_pos = at::pairwise_distance(anchor, positive, p, eps); auto dist_neg = at::pairwise_distance(anchor, negative, p, eps); + // The distance swap is described in the paper "Learning shallow + // convolutional feature descriptors with triplet losses" by V. Balntas, E. + // Riba et al. If True, and if the positive example is closer to the + // negative example than the anchor is, swaps the positive example and the + // anchor in the loss computation. if (swap) { auto dist_swap = at::pairwise_distance(positive, negative, p, eps); dist_neg = at::min(dist_neg, dist_swap); @@ -189,9 +238,22 @@ Tensor margin_ranking_loss(const Tensor& input1, const Tensor& input2, const Ten Tensor kl_div(const Tensor& input, const Tensor& target, int64_t reduction, bool log_target) { TORCH_CHECK(!input.is_complex() && !target.is_complex(), - "kl_div: Complex inputs not supported.") - auto output = log_target ? at::exp(target) * (target - input) - : target * (at::log(target) - input); + "kl_div: Complex inputs not supported."); + TORCH_CHECK(!at::isIntegralType(input.scalar_type(), /*include_bool*/true) && + !at::isIntegralType(target.scalar_type(), /*include_bool*/true), + "kl_div: Integral inputs not supported."); + Tensor output; + if (log_target) { + output = at::exp(target) * (target - input); + } else { + if (input.is_mps() || target.is_mps()) { + // MPS fallback, as MPS does not currently implement xlogy. + // MPS will give the wrong results at `target[i] = 0` + output = target * (at::log(target) - input); + } else { + output = at::xlogy(target, target) - target * input; + } + } return apply_loss_reduction(output, reduction); } diff --git a/aten/src/ATen/native/LossCTC.cpp b/aten/src/ATen/native/LossCTC.cpp index 344c7269b0f2..dcfad968cad7 100644 --- a/aten/src/ATen/native/LossCTC.cpp +++ b/aten/src/ATen/native/LossCTC.cpp @@ -5,15 +5,36 @@ // 1. Graves et al: http://www.cs.toronto.edu/~graves/icml_2006.pdf // We use the equations from above link, but note that [1] has 1-based indexing and we (of course) use 0-based. 
// Graves et al call the probabilities y, we use log_probs (also calling them inputs) +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include +#include #include #include -#include +#include +#include #include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif -#include #include namespace at { @@ -355,6 +376,18 @@ std::tuple ctc_loss_cpu(const Tensor& log_probs, const Tensor& t }); } +std::tuple ctc_loss_tensor(const Tensor& log_probs, const Tensor& targets, const Tensor& input_lengths, const Tensor& target_lengths, int64_t BLANK, bool zero_infinity) { + TORCH_CHECK(isIntegralType(input_lengths.scalar_type(), /*includeBool=*/false), "input_lengths must be integral"); + TORCH_CHECK(isIntegralType(target_lengths.scalar_type(), /*includeBool=*/false), "target_lengths must be integral"); + + Tensor ilc = input_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + Tensor tlc = target_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + IntArrayRef il(ilc.data_ptr(), ilc.numel()); + IntArrayRef tl(tlc.data_ptr(), tlc.numel()); + + return at::_ctc_loss(log_probs, targets, il, tl, BLANK, zero_infinity); +} + Tensor ctc_loss_backward_cpu(const Tensor& grad, const Tensor& log_probs, const Tensor& targets, IntArrayRef input_lengths, IntArrayRef target_lengths, const Tensor& neg_log_likelihood, const Tensor& log_alpha, int64_t BLANK, bool zero_infinity) { return AT_DISPATCH_FLOATING_TYPES(log_probs.scalar_type(), "ctc_loss_backward_cpu", [&] { @@ -366,10 +399,47 @@ Tensor ctc_loss_backward_cpu(const Tensor& grad, const Tensor& log_probs, const }); } +Tensor ctc_loss_backward_tensor( + const Tensor& grad, + const Tensor& log_probs, + const Tensor& targets, + const Tensor& input_lengths, + const Tensor& target_lengths, + const Tensor& neg_log_likelihood, + const Tensor& log_alpha, + int64_t BLANK, + bool zero_infinity) { + TORCH_CHECK( + isIntegralType(input_lengths.scalar_type(), /*includeBool=*/false), + "input_lengths must be integral"); + TORCH_CHECK(isIntegralType(target_lengths.scalar_type(), /*includeBool=*/false), "target_lengths must be integral"); + + Tensor ilc = input_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + Tensor tlc = target_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + IntArrayRef il(ilc.data_ptr(), ilc.numel()); + IntArrayRef tl(tlc.data_ptr(), tlc.numel()); + return at::_ctc_loss_backward(grad, log_probs, targets, il, tl, neg_log_likelihood, log_alpha, BLANK, zero_infinity); +} + +namespace { + +Tensor get_clamped_target_length( + IntArrayRef target_lengths, + const TensorOptions& options) { + return at::tensor(target_lengths, options).clamp_min(1); +} + +Tensor get_clamped_target_length( + Tensor target_lengths, + const TensorOptions& options) { + return target_lengths.clamp_min(1); +} + // this wrapper function dispatches to the native and cudnn implementations and hides the alpha/grad from the user (by just returning the loss) // the gradient is implemented for _cudnn_ctc_loss (just in derivatives.yaml) and _ctc_loss and this function has automatic gradients // it also handles the reduction if desired -Tensor ctc_loss(const Tensor& log_probs_, const Tensor& targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t BLANK, int64_t reduction, bool zero_infinity) { +template +Tensor ctc_loss_impl(const Tensor& log_probs_, const Tensor& targets, LengthsType input_lengths, 
LengthsType target_lengths, int64_t BLANK, int64_t reduction, bool zero_infinity) { auto is_batched = log_probs_.dim() == 3; Tensor log_probs = is_batched ? log_probs_ : log_probs_.unsqueeze(1); bool use_cudnn = @@ -397,8 +467,7 @@ Tensor ctc_loss(const Tensor& log_probs_, const Tensor& targets, IntArrayRef inp } } if (reduction == at::Reduction::Mean) { - auto target_lengths_t = - at::tensor(target_lengths, res.options()).clamp_min(1); + auto target_lengths_t = get_clamped_target_length(target_lengths, res.options()); return (res / target_lengths_t).mean(); } else if (reduction == at::Reduction::Sum) { return res.sum(); @@ -406,8 +475,22 @@ Tensor ctc_loss(const Tensor& log_probs_, const Tensor& targets, IntArrayRef inp return is_batched ? res : res.squeeze(0); } +} // namespace + +Tensor ctc_loss(const Tensor& log_probs_, const Tensor& targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t BLANK, int64_t reduction, bool zero_infinity) { + return ctc_loss_impl(log_probs_, targets, input_lengths, target_lengths, BLANK, reduction, zero_infinity); +} + // Convenience function accepting Tensors Tensor ctc_loss(const Tensor& log_probs, const Tensor& targets, const Tensor& input_lengths, const Tensor& target_lengths, int64_t BLANK, int64_t reduction, bool zero_infinity) { + if (at::areAnyTensorSubclassLike( + {log_probs, targets, input_lengths, target_lengths})) { + // Composite Compliant path for TensorSubclasses + return ctc_loss_impl(log_probs, targets, input_lengths, target_lengths, BLANK, reduction, zero_infinity); + } + + // Fast path (which accesses data_ptr) and less operator dispatches for + // regular tensors TORCH_CHECK(isIntegralType(input_lengths.scalar_type(), /*includeBool=*/false), "input_lengths must be integral"); TORCH_CHECK(isIntegralType(target_lengths.scalar_type(), /*includeBool=*/false), "target_lengths must be integral"); diff --git a/aten/src/ATen/native/LossMulti.h b/aten/src/ATen/native/LossMulti.h index 54736bcc123b..148615e7e14f 100644 --- a/aten/src/ATen/native/LossMulti.h +++ b/aten/src/ATen/native/LossMulti.h @@ -1,8 +1,8 @@ -#include -#include -#include - #pragma once +#include +#include +#include +#include namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/LossMultiLabelMargin.cpp b/aten/src/ATen/native/LossMultiLabelMargin.cpp index f59de5c8817a..26d7a748df8d 100644 --- a/aten/src/ATen/native/LossMultiLabelMargin.cpp +++ b/aten/src/ATen/native/LossMultiLabelMargin.cpp @@ -1,10 +1,23 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/LossMultiMargin.cpp b/aten/src/ATen/native/LossMultiMargin.cpp index c7ab53f1d211..110520cf8f95 100644 --- a/aten/src/ATen/native/LossMultiMargin.cpp +++ b/aten/src/ATen/native/LossMultiMargin.cpp @@ -1,9 +1,19 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/LossNLL.cpp b/aten/src/ATen/native/LossNLL.cpp index 1eb630538b80..28fc60508ab1 100644 --- a/aten/src/ATen/native/LossNLL.cpp +++ b/aten/src/ATen/native/LossNLL.cpp @@ -1,13 +1,32 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS 
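The `ctc_loss_impl` template above lets the `IntArrayRef` and `Tensor` length overloads share one body; the part worth calling out is the `Reduction::Mean` path, which divides each sample's loss by its target length (clamped to at least 1 via `get_clamped_target_length`) before averaging over the batch. A small sketch of just that arithmetic, with made-up values:

```cpp
// Mean reduction as in ctc_loss_impl: per-sample losses are normalized by
// clamp_min(target_length, 1) and then averaged. Values are illustrative only.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto per_sample_loss = torch::tensor({12.0, 30.0, 0.0});
  auto target_lengths  = torch::tensor({4, 10, 0});          // a raw 0 would divide by zero

  auto clamped   = target_lengths.clamp_min(1);              // get_clamped_target_length(...)
  auto mean_loss = (per_sample_loss / clamped).mean();       // (3 + 3 + 0) / 3 = 2
  std::cout << mean_loss.item<double>() << "\n";
}
```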
+#include #include #include +#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include @@ -530,11 +549,11 @@ Tensor cross_entropy_loss_label_smoothing( const Tensor& target, const Tensor& weight, int64_t reduction, - int64_t ignore_index, + c10::SymInt ignore_index, double label_smoothing) { auto class_dim = self.dim() == 1 ? 0 : 1; auto input = at::log_softmax(self, class_dim, self.scalar_type()); - auto nllloss = at::nll_loss_nd(input, target, weight, reduction, ignore_index); + auto nllloss = at::nll_loss_nd_symint(input, target, weight, reduction, ignore_index); auto n_classes = input.size(class_dim); @@ -577,15 +596,15 @@ Tensor cross_entropy_loss_label_smoothing( return (1 - label_smoothing) * nllloss + ret * (label_smoothing / n_classes); } -Tensor cross_entropy_loss( +Tensor cross_entropy_loss_symint( const Tensor& self, const Tensor& target, const c10::optional& weight, int64_t reduction, - int64_t ignore_index, + c10::SymInt ignore_index, double label_smoothing) { Tensor ret; - if (self.sizes() == target.sizes()) { + if (self.sym_sizes() == target.sym_sizes()) { // Assume soft targets when input and target shapes are the same TORCH_CHECK(at::isFloatingType(target.scalar_type()), "Expected floating point type for target with class probabilities, got ", target.scalar_type()); @@ -604,7 +623,7 @@ Tensor cross_entropy_loss( ret = cross_entropy_loss_label_smoothing(self, target, weight_, reduction, ignore_index, label_smoothing); } else { auto class_dim = self.dim() == 1 ? 0 : 1; - ret = at::nll_loss_nd( + ret = at::nll_loss_nd_symint( at::log_softmax(self, class_dim, self.scalar_type()), target, weight, @@ -623,32 +642,41 @@ Tensor & nll_loss_out(const Tensor & self, const Tensor & target, const c10::opt return std::get<0>(at::nll_loss_forward_out(output, total_weight, self, target, weight, reduction, ignore_index)); } +Tensor nll_loss_symint(const Tensor & self, const Tensor & target, const c10::optional& weight_opt, int64_t reduction, c10::SymInt ignore_index) { + // See [Note: hacky wrapper removal for optional tensor] + c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); + const Tensor& weight = *weight_maybe_owned; + + return std::get<0>(at::nll_loss_forward_symint(self, target, weight, reduction, ignore_index)); +} + +// Duplicate of above code for non-symbolic ints. Kept for BC purposes and to minimize breakages. 
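For reference, the blend at the end of `cross_entropy_loss_label_smoothing` amounts to `(1 - eps) * NLL(target class) + eps * mean_over_classes(-log p)` when no class weight or ignore_index is involved. A hedged sketch restating that arithmetic on a random batch (mean reduction assumed; this does not reproduce the full op):

```cpp
// Label-smoothing blend: (1 - eps) * nll + eps * mean(-log p) over classes.
// Shapes and values are made up for illustration (mean reduction, no class weights).
#include <torch/torch.h>
#include <iostream>

int main() {
  double eps = 0.1;
  auto logits = torch::randn({8, 5});                           // batch = 8, classes = 5
  auto target = torch::randint(0, 5, {8}, torch::kLong);

  auto logp   = torch::log_softmax(logits, /*dim=*/1);
  auto nll    = torch::nll_loss(logp, target);                  // -log p[target], mean over batch
  auto smooth = (-logp).mean();                                 // mean of -log p over batch and classes

  auto blended = (1 - eps) * nll + eps * smooth;
  std::cout << blended.item<float>() << "\n";
}
```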
Tensor nll_loss(const Tensor & self, const Tensor & target, const c10::optional& weight_opt, int64_t reduction, int64_t ignore_index) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); const Tensor& weight = *weight_maybe_owned; - return std::get<0>(at::nll_loss_forward(self, target, weight, reduction, ignore_index)); + return std::get<0>(at::nll_loss_forward_symint(self, target, weight, reduction, ignore_index)); } -Tensor nll_loss_nd( +Tensor nll_loss_nd_symint( const Tensor& self, const Tensor& target, const c10::optional& weight, int64_t reduction, - int64_t ignore_index) { + c10::SymInt ignore_index) { if (self.dim() < 1) { TORCH_CHECK_VALUE( false, "Expected 1 or more dimensions (got ", self.dim(), ")"); } - if (self.dim() != 1 && self.sizes()[0] != target.sizes()[0]) { + if (self.dim() != 1 && self.sym_sizes()[0] != target.sym_sizes()[0]) { TORCH_CHECK_VALUE( false, "Expected input batch_size (", - self.sizes()[0], + self.sym_sizes()[0], ") to match target batch_size (", - target.sizes()[0], + target.sym_sizes()[0], ")."); } @@ -656,42 +684,42 @@ Tensor nll_loss_nd( Tensor input_ = self; Tensor target_ = target; if (input_.dim() == 1 || input_.dim() == 2) { - ret = at::nll_loss(input_, target_, weight, reduction, ignore_index); + ret = at::nll_loss_symint(input_, target_, weight, reduction, ignore_index); } else if (input_.dim() == 4) { - ret = at::nll_loss2d(input_, target_, weight, reduction, ignore_index); + ret = at::nll_loss2d_symint(input_, target_, weight, reduction, ignore_index); } else { // dim == 3 or dim > 4 - auto n = input_.sizes()[0]; - auto c = input_.sizes()[1]; - auto out_size = input_.sizes().slice(2).vec(); + auto n = input_.sym_sizes()[0]; + auto c = input_.sym_sizes()[1]; + auto out_size = input_.sym_sizes().slice(2).vec(); out_size.insert(out_size.begin(), n); - if (target_.sizes().slice(1) != input_.sizes().slice(2)) { + if (target_.sym_sizes().slice(1) != input_.sym_sizes().slice(2)) { TORCH_CHECK( false, "Expected target size ", - IntArrayRef(out_size), + SymIntArrayRef(out_size), ", got ", - target_.sizes()); + target_.sym_sizes()); } input_ = input_.contiguous(); target_ = target_.contiguous(); // support empty batches, see #15870 if (input_.numel() > 0) { - input_ = input_.view({n, c, 1, -1}); + input_ = input_.view_symint({n, c, 1, -1}); } else { - input_ = input_.view({n, c, 0, 0}); + input_ = input_.view_symint({n, c, 0, 0}); } if (target_.numel() > 0) { - target_ = target_.view({n, 1, -1}); + target_ = target_.view_symint({n, 1, -1}); } else { - target_ = target_.view({n, 0, 0}); + target_ = target_.view_symint({n, 0, 0}); } if (reduction != Reduction::None) { - ret = at::nll_loss2d(input_, target_, weight, reduction, ignore_index); + ret = at::nll_loss2d_symint(input_, target_, weight, reduction, ignore_index); } else { auto out = - at::nll_loss2d(input_, target_, weight, reduction, ignore_index); - ret = out.view(out_size); + at::nll_loss2d_symint(input_, target_, weight, reduction, ignore_index); + ret = out.view_symint(out_size); } } return ret; diff --git a/aten/src/ATen/native/LossNLL2d.cpp b/aten/src/ATen/native/LossNLL2d.cpp index d7ebf65231f1..aee22ce3edeb 100644 --- a/aten/src/ATen/native/LossNLL2d.cpp +++ b/aten/src/ATen/native/LossNLL2d.cpp @@ -1,12 +1,23 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else 
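The dim == 3 / dim > 4 branch above reuses `nll_loss2d` by flattening all trailing spatial dimensions into one: `(N, C, d1, d2, ...)` becomes `(N, C, 1, prod(d_i))` and the target becomes `(N, 1, prod(d_i))`. A sketch of that reshape on a made-up 5-D input, cross-checked against an equivalent 2-D formulation:

```cpp
// nll_loss_nd's reshape for higher-dimensional inputs, illustrated on a 5-D tensor.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto input  = torch::log_softmax(torch::randn({2, 3, 4, 5, 6}), /*dim=*/1);  // (N, C, d1, d2, d3)
  auto target = torch::randint(0, 3, {2, 4, 5, 6}, torch::kLong);

  auto n = input.size(0), c = input.size(1);
  auto input4d  = input.contiguous().view({n, c, 1, -1});     // (N, C, 1, d1*d2*d3)
  auto target3d = target.contiguous().view({n, 1, -1});       // (N, 1, d1*d2*d3)
  auto via_2d   = torch::nll_loss2d(input4d, target3d);

  // Same value as folding every position into the batch dimension.
  auto direct = torch::nll_loss(input.flatten(2).transpose(1, 2).reshape({-1, c}),
                                target.reshape({-1}));
  std::cout << via_2d.item<float>() << " vs " << direct.item<float>() << "\n";
}
```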
+#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -473,12 +484,21 @@ Tensor & nll_loss2d_out(const Tensor & self, const Tensor & target, const c10::o return std::get<0>(at::nll_loss2d_forward_out(output, total_weight, self, target, weight, reduction, ignore_index)); } +Tensor nll_loss2d_symint(const Tensor & self, const Tensor & target, const c10::optional& weight_opt, int64_t reduction, c10::SymInt ignore_index) { + // See [Note: hacky wrapper removal for optional tensor] + c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); + const Tensor& weight = *weight_maybe_owned; + + return std::get<0>(at::nll_loss2d_forward_symint(self, target, weight, reduction, ignore_index)); +} + +// Duplicate of above code for non-symbolic ints. Kept for BC purposes and to minimize breakages. Tensor nll_loss2d(const Tensor & self, const Tensor & target, const c10::optional& weight_opt, int64_t reduction, int64_t ignore_index) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); const Tensor& weight = *weight_maybe_owned; - return std::get<0>(at::nll_loss2d_forward(self, target, weight, reduction, ignore_index)); + return std::get<0>(at::nll_loss2d_forward_symint(self, target, weight, reduction, ignore_index)); } } // namespace native diff --git a/aten/src/ATen/native/MathBitFallThroughLists.h b/aten/src/ATen/native/MathBitFallThroughLists.h index 025c25bcbe7b..97b0854d82d0 100644 --- a/aten/src/ATen/native/MathBitFallThroughLists.h +++ b/aten/src/ATen/native/MathBitFallThroughLists.h @@ -54,7 +54,6 @@ namespace at { #define TENSOR_UTILITIES_AND_CONSTRUCTORS(m) \ m.impl("empty_like", torch::CppFunction::makeFallthrough()); \ m.impl("empty.memory_format", torch::CppFunction::makeFallthrough()); \ - m.impl("empty.SymInt", torch::CppFunction::makeFallthrough()); \ m.impl("empty.out", torch::CppFunction::makeFallthrough()); \ m.impl("empty_strided", torch::CppFunction::makeFallthrough()); \ m.impl("full_like", torch::CppFunction::makeFallthrough()); \ diff --git a/aten/src/ATen/native/MathBitsFallback.h b/aten/src/ATen/native/MathBitsFallback.h index 4e9c2d9e98b1..84e72aa724d0 100644 --- a/aten/src/ATen/native/MathBitsFallback.h +++ b/aten/src/ATen/native/MathBitsFallback.h @@ -1,12 +1,17 @@ -#include +#include #include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { // This fallback should only be used for operations that are self inverse and have a corresponding tensor diff --git a/aten/src/ATen/native/MaxPooling.cpp b/aten/src/ATen/native/MaxPooling.cpp index 3e615d7cf071..e809c75ba21d 100644 --- a/aten/src/ATen/native/MaxPooling.cpp +++ b/aten/src/ATen/native/MaxPooling.cpp @@ -1,10 +1,22 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -26,19 +38,19 @@ Tensor max_pool1d_impl( "max_pool1d() Expected 2D or 3D input tensor, but got ", self.sizes()); TORCH_CHECK( kernel_size.size() == 1, - "max_pool1d() kernel_size must be an int or int list of size 1 but got size ", + "max_pool1d() kernel_size must be an int, list of ints or tuple of ints of size 1 but got size ", kernel_size.size()); TORCH_CHECK( stride.size() == 0 
|| stride.size() == 1, - "max_pool1d() stride must be None, an int or int list of size 1 but got size ", + "max_pool1d() stride must be None, an int, list of ints, or tuple of ints of size 1 but got size ", stride.size()); TORCH_CHECK( padding.size() == 1, - "max_pool1d() padding must be an int or int list of size 1 but got size ", + "max_pool1d() padding must be an int, list of ints, or tuple of ints of size 1 but got size ", padding.size()); TORCH_CHECK( dilation.size() == 1, - "max_pool1d() dilation must be an int or int list of size 1 but got size ", + "max_pool1d() dilation must be an int, list of ints or tuple of ints of size 1 but got size ", dilation.size()); // If stride=None then set it to kernel_size @@ -97,13 +109,22 @@ Tensor max_pool1d( IntArrayRef padding, IntArrayRef dilation, bool ceil_mode) { + + auto ndim = self.ndimension(); + TORCH_CHECK( + (ndim == 2 && self.size(0) != 0 && self.size(1) != 0) || + (ndim == 3 && self.size(1) != 0 && self.size(2) != 0), + "max_pool1d: Expected 2D or 3D (batch mode) tensor with optional 0 dim batch size for input, but got:", + self.sizes()); + if (self.is_quantized()) { return at::quantized_max_pool1d( self, kernel_size, stride, padding, dilation, ceil_mode); } if ((self.requires_grad() && at::GradMode::is_enabled()) || self._fw_grad(/*level */ 0).defined() || - !self.device().is_cpu()) { + !self.device().is_cpu() || + isTensorSubclassLike(self)) { // Needs indices for grad and with_indices defines CUDA dispatch return std::get<0>(at::max_pool1d_with_indices( self, kernel_size, stride, padding, dilation, ceil_mode)); diff --git a/aten/src/ATen/native/MaxUnpooling.cpp b/aten/src/ATen/native/MaxUnpooling.cpp index 27d4e1a93c81..adab802d65cd 100644 --- a/aten/src/ATen/native/MaxUnpooling.cpp +++ b/aten/src/ATen/native/MaxUnpooling.cpp @@ -1,8 +1,17 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { @@ -11,6 +20,10 @@ Tensor& max_unpooling2d_forward_out_cpu( const Tensor& indices_, IntArrayRef output_size, Tensor& output) { + // See Note [Writing Nondeterministic Operations] + // Nondeterministic with duplicate indices + at::globalContext().alertNotDeterministic("max_unpooling2d_forward_out"); + auto oheight = output_size[0]; auto owidth = output_size[1]; TORCH_CHECK( @@ -149,6 +162,10 @@ Tensor& max_unpooling3d_forward_out_cpu(const Tensor& self_, IntArrayRef stride, IntArrayRef padding, Tensor& output) { + // See Note [Writing Nondeterministic Operations] + // Nondeterministic with duplicate indices + at::globalContext().alertNotDeterministic("max_unpooling3d_forward_out"); + TORCH_CHECK(output.is_contiguous(), "output must be contiguous"); int64_t oT = output_size[0]; int64_t oH = output_size[1]; diff --git a/aten/src/ATen/native/Memory.cpp b/aten/src/ATen/native/Memory.cpp index df6949b2d7d9..2b66f0893393 100644 --- a/aten/src/ATen/native/Memory.cpp +++ b/aten/src/ATen/native/Memory.cpp @@ -1,6 +1,17 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/MetaTensor.cpp b/aten/src/ATen/native/MetaTensor.cpp index 0b3bb3e04c7b..5ebe52ec4a81 100644 --- a/aten/src/ATen/native/MetaTensor.cpp +++ b/aten/src/ATen/native/MetaTensor.cpp @@ -12,19 +12,7 @@ namespace at { 
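Two user-visible effects of the `max_pool1d` hunk above: inputs whose non-batch dimensions are empty are now rejected up front, and anything that needs autograd (or is a tensor subclass or a non-CPU tensor) is routed through `max_pool1d_with_indices`, which returns the same values plus indices. A rough sketch of both, assuming the patched behavior:

```cpp
// max_pool1d vs. max_pool1d_with_indices, plus the new empty-dimension check.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::randn({2, 4, 10}, torch::requires_grad());     // (N, C, L)

  auto pooled = torch::max_pool1d(x, /*kernel_size=*/{3}, /*stride=*/{2});
  auto [values, indices] = torch::max_pool1d_with_indices(x, {3}, {2});
  std::cout << torch::allclose(pooled, values) << "\n";          // 1: same values either way

  try {
    auto bad = torch::empty({2, 0, 10});                         // empty channel dimension
    torch::max_pool1d(bad, {3});
  } catch (const c10::Error&) {
    std::cout << "rejected: empty non-batch dimension\n";        // new TORCH_CHECK above
  }
}
```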
namespace native { -Tensor empty_meta( - IntArrayRef size, - c10::optional dtype_opt, - c10::optional layout_opt, - c10::optional device_opt, - c10::optional pin_memory_opt, - c10::optional memory_format_opt -) { - return at::detail::empty_meta( - size, dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); -} - -Tensor empty_symint_meta( +Tensor empty_meta_symint( SymIntArrayRef size, c10::optional dtype_opt, c10::optional layout_opt, @@ -41,6 +29,7 @@ Tensor empty_symint_meta( size, dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); } +// Kept only for BC with XLA Tensor empty_strided_meta( IntArrayRef size, IntArrayRef stride, @@ -49,7 +38,18 @@ Tensor empty_strided_meta( c10::optional device_opt, c10::optional pin_memory_opt ) { - return at::detail::empty_strided_meta( + return empty_strided_meta_symint(c10::fromIntArrayRefSlow(size), c10::fromIntArrayRefSlow(stride), dtype_opt, layout_opt, device_opt, pin_memory_opt); +} + +Tensor empty_strided_meta_symint( + SymIntArrayRef size, + SymIntArrayRef stride, + c10::optional dtype_opt, + c10::optional layout_opt, + c10::optional device_opt, + c10::optional pin_memory_opt +) { + return at::detail::empty_strided_symint_meta( size, stride, dtype_opt, layout_opt, device_opt, pin_memory_opt); } diff --git a/aten/src/ATen/native/NNPACK.cpp b/aten/src/ATen/native/NNPACK.cpp index 3df0a0623e43..4fb40a17d026 100644 --- a/aten/src/ATen/native/NNPACK.cpp +++ b/aten/src/ATen/native/NNPACK.cpp @@ -1,10 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + #if !AT_NNPACK_ENABLED() namespace at { @@ -198,8 +209,8 @@ Tensor _nnpack_spatial_convolution( .height = (size_t)output.size(2), }; const nnp_size output_subsample = { - .width = stride[1], - .height = stride[0], + .width = static_cast(stride[1]), + .height = static_cast(stride[0]), }; const auto input_ = input.contiguous(); diff --git a/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp b/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp index ea604c426c3b..a9cf36a004f4 100644 --- a/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp +++ b/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp @@ -1,5 +1,5 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include @@ -8,6 +8,17 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp b/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp index 3d34091fd036..cf60f56f9df4 100644 --- a/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp +++ b/aten/src/ATen/native/NaiveConvolutionTranspose3d.cpp @@ -1,11 +1,23 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/NaiveDilatedConvolution.cpp b/aten/src/ATen/native/NaiveDilatedConvolution.cpp index fa7b30f5977e..827bf204b093 100644 --- a/aten/src/ATen/native/NaiveDilatedConvolution.cpp +++ b/aten/src/ATen/native/NaiveDilatedConvolution.cpp @@ -1,14 +1,25 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include 
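The `empty_strided_meta` change above shows the general backward-compatibility shim for sym-intified ops: the old `IntArrayRef` entry point is kept (external backends such as XLA still call it) and simply converts its arguments with `c10::fromIntArrayRefSlow` before forwarding to the `*_symint` variant. A sketch of the pattern with hypothetical helper names (`my_empty`, `my_empty_symint`); only `fromIntArrayRefSlow` and `at::empty_symint` are taken from the hunk:

```cpp
// BC-shim pattern: keep the IntArrayRef signature, forward to the SymInt one.
#include <ATen/ATen.h>
#include <iostream>

// Hypothetical "new" implementation taking symbolic sizes.
static at::Tensor my_empty_symint(c10::SymIntArrayRef size) {
  return at::empty_symint(size, at::kFloat);
}

// Hypothetical old-style wrapper kept for backward compatibility.
static at::Tensor my_empty(at::IntArrayRef size) {
  return my_empty_symint(c10::fromIntArrayRefSlow(size));
}

int main() {
  std::cout << my_empty({2, 3}).sizes() << "\n";   // [2, 3]
}
```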
#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/NamedTensor.cpp b/aten/src/ATen/native/NamedTensor.cpp index d725c26a1463..6ee2f095b6d0 100644 --- a/aten/src/ATen/native/NamedTensor.cpp +++ b/aten/src/ATen/native/NamedTensor.cpp @@ -1,8 +1,30 @@ -#include -#include - +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/NegateFallback.cpp b/aten/src/ATen/native/NegateFallback.cpp index a2b134a91e40..0a34b4f4331d 100644 --- a/aten/src/ATen/native/NegateFallback.cpp +++ b/aten/src/ATen/native/NegateFallback.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include diff --git a/aten/src/ATen/native/NonSymbolicBC.h b/aten/src/ATen/native/NonSymbolicBC.h new file mode 100644 index 000000000000..0b942efb52c3 --- /dev/null +++ b/aten/src/ATen/native/NonSymbolicBC.h @@ -0,0 +1,27 @@ +#pragma once +#include +#include +#include + +namespace at { +namespace native { +// This file contains non-symbolic signatures for ops that we have sym-intified the signature of. +// However, in certain cases (such as static runtime), we call the native versions of the ops directly. +// In those cases, we will duplicate the signature here with non-symbolic ints, and also duplicate the C++ implementation. +TORCH_API at::Tensor reshape(const at::Tensor& self, at::IntArrayRef proposed_shape); +TORCH_API at::Tensor narrow(const at::Tensor& self, int64_t dim, int64_t start, int64_t length); +TORCH_API at::Tensor _sparse_coo_tensor_unsafe(const at::Tensor & indices, const at::Tensor & values, at::IntArrayRef size, c10::optional dtype=c10::nullopt, c10::optional layout=c10::nullopt, c10::optional device=c10::nullopt, c10::optional pin_memory=c10::nullopt); +TORCH_API at::Tensor nll_loss(const at::Tensor & self, const at::Tensor & target, const c10::optional& weight_opt, int64_t reduction, int64_t ignore_index); +TORCH_API at::Tensor nll_loss2d(const at::Tensor & self, const at::Tensor & target, const c10::optional& weight_opt, int64_t reduction, int64_t ignore_index); +// The below ops don't get a duplicated C++ implementation. +// They are backward ops, which make them very unlikely to be called directly +// by external code (at::native::trace_backward). +// They get their own declaration for BC purposes however. 
+TORCH_API at::Tensor _embedding_bag_backward(const at::Tensor & grad, const at::Tensor & indices, const at::Tensor & offsets, const at::Tensor & offset2bag, const at::Tensor & bag_size, const at::Tensor & maximum_indices, int64_t num_weights, bool scale_grad_by_freq, int64_t mode, bool sparse, const c10::optional & per_sample_weights, int64_t padding_idx=-1); +TORCH_API at::Tensor _embedding_bag_sparse_backward(const at::Tensor & grad, const at::Tensor & indices, const at::Tensor & offsets, const at::Tensor & offset2bag, const at::Tensor & bag_size, int64_t num_weights, bool scale_grad_by_freq, int64_t mode, const c10::optional & per_sample_weights, int64_t padding_idx=-1); +TORCH_API at::Tensor value_selecting_reduction_backward(const at::Tensor & grad, int64_t dim, const at::Tensor & indices, at::IntArrayRef sizes, bool keepdim); +TORCH_API at::Tensor trace_backward(const at::Tensor & grad, at::IntArrayRef sizes); +TORCH_API at::Tensor index_select_backward(const at::Tensor & grad, at::IntArrayRef self_sizes, int64_t dim, const at::Tensor & index); +TORCH_API at::Tensor select(const at::Tensor& self, int64_t dim, int64_t index); +TORCH_API std::vector tensor_split(const Tensor& self, IntArrayRef indices, int64_t dim); +}} diff --git a/aten/src/ATen/native/Normalization.cpp b/aten/src/ATen/native/Normalization.cpp index e5373cac4ad2..ab9094d9b598 100644 --- a/aten/src/ATen/native/Normalization.cpp +++ b/aten/src/ATen/native/Normalization.cpp @@ -1,19 +1,55 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include -#include #include +#include +#include +#include +#include +#include +#include +#include #include -#include #include #include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include +#include static const int MIOPEN_DIM_MAX = 5; @@ -41,14 +77,14 @@ DEFINE_DISPATCH(batch_norm_cpu_backward_stub); DEFINE_DISPATCH(renorm_scale_factor_stub); namespace { - void check_dims_match_num_input_features(const char* arg_name, int64_t expected, int64_t actual){ + void check_dims_match_num_input_features(const char* arg_name, SymInt expected, SymInt actual){ TORCH_CHECK(actual == expected, arg_name, " should contain ", expected, " elements not ", actual); } - static inline Tensor repeat_if_defined(const Tensor& t, int64_t repeat) { + static inline Tensor repeat_if_defined(const Tensor& t, SymInt repeat) { if (t.defined()) { - return t.repeat(repeat); + return t.repeat_symint(repeat); } return t; } @@ -88,17 +124,17 @@ std::tuple batch_norm_cpu_transform_input_template( const Tensor& input, const Tensor& weight, const Tensor& bias, const Tensor& save_mean /* optional */, const Tensor& save_invstd /* optional */, const Tensor& running_mean /* optional */, const Tensor& running_var /* optional */, - bool train, double eps) { + bool train, double eps, Tensor& output) { bool all_contiguous = is_contiguous(input) - && (!weight.defined() || weight.is_contiguous()) - && (!bias.defined() || bias.is_contiguous()) - && running_mean.is_contiguous() - && running_var.is_contiguous(); + && is_contiguous(output) + && (!weight.defined() || weight.is_contiguous()) + && (!bias.defined() || bias.is_contiguous()) + && running_mean.is_contiguous() + && running_var.is_contiguous(); // 
inference contiguous path if (all_contiguous) { - Tensor output = at::empty_like(input, suggest_memory_format_contig(input)); batch_norm_cpu_stub(kCPU, output, input, weight, bias, save_mean, save_invstd, running_mean, running_var, train, eps); return std::make_tuple(output, save_mean, save_invstd); @@ -130,7 +166,6 @@ std::tuple batch_norm_cpu_transform_input_template( auto b = bias.defined() ? as_nd(bias) : at::detail::scalar_tensor_static(0, dtype, kCPU); - Tensor output = at::empty_like(input, input.suggest_memory_format()); auto iter = TensorIteratorConfig() .add_output(output) .add_input(input) @@ -141,8 +176,7 @@ std::tuple batch_norm_cpu_transform_input_template( .check_all_same_dtype(false) .promote_inputs_to_common_dtype(false) .build(); - - cpu_kernel(iter, [=](scalar_t input, param_t mean, param_t invstd, param_t weight, param_t bias) { + cpu_kernel(iter, [=](scalar_t input, param_t mean, param_t invstd, param_t weight, param_t bias) -> scalar_t { return ((input - mean) * invstd) * weight + bias; }); return std::make_tuple(output, save_mean, save_invstd); @@ -151,30 +185,17 @@ std::tuple batch_norm_cpu_transform_input_template( template class VarTransform> std::tuple batch_norm_cpu_update_stats_template( const Tensor& input, const Tensor& running_mean, const Tensor& running_var, - double momentum, double eps) { + double momentum, double eps, Tensor& save_mean, Tensor& save_var_transform) { using accscalar_t = at::acc_type; int64_t n_input = input.size(1); int64_t n = input.numel() / n_input; - const int64_t ndim = input.dim(); - - // Reduce all dimensions except dim=1 - DimVector reduce_dims(ndim - 1); - reduce_dims[0] = 0; - for (const auto i : c10::irange(2, ndim)) { - reduce_dims[i - 1] = i; - } bool all_contiguous = is_contiguous(input); const bool mixed_type = !std::is_same::value; const auto dtype = mixed_type ? kFloat : input.scalar_type(); - // For contiguous case, leave 'mean' computation to kernel - Tensor save_mean = all_contiguous - ? 
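The running-statistics update above is the usual exponential moving average, with the variance folded in as the unbiased estimate `var_sum / (n - 1)`; `momentum_` is just `momentum` cast to the accumulation type. A plain-scalar sketch of one channel's update:

```cpp
// One channel's running-stat update, as in batch_norm_cpu_update_stats_template.
#include <iostream>

int main() {
  double momentum = 0.1;
  double running_mean = 0.0, running_var = 1.0;   // state carried between batches
  double batch_mean = 2.0;
  double var_sum = 27.0;                          // sum of squared deviations this batch
  long   n = 10;                                  // reduced elements per channel

  double unbiased_var = var_sum / (n - 1);        // 3.0
  running_mean = momentum * batch_mean + (1 - momentum) * running_mean;   // 0.2
  running_var  = momentum * unbiased_var + (1 - momentum) * running_var;  // 1.2

  std::cout << running_mean << " " << running_var << "\n";
}
```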
at::empty({n_input}, input.options().dtype(dtype)) - : at::mean(input, /*dim=*/reduce_dims, /*keepdim=*/false, dtype); - Tensor save_var_transform = at::empty({n_input}, input.options().dtype(dtype)); auto save_mean_a = save_mean.accessor(); auto save_var_transform_a = save_var_transform.accessor(); @@ -186,6 +207,7 @@ std::tuple batch_norm_cpu_update_stats_template( auto _var_sum = at::empty({n_input}, input.options().dtype(dtype)); auto _mean_a = _mean.accessor(); auto _var_sum_a = _var_sum.accessor(); + auto momentum_ = static_cast(momentum); batch_norm_cpu_collect_stats_stub(kCPU, _mean, _var_sum, input); @@ -195,11 +217,11 @@ std::tuple batch_norm_cpu_update_stats_template( save_var_transform_a[f] = VarTransform{}(_var_sum_a[f] / n, eps); if (running_mean.defined()) { - running_mean_a[f] = momentum * _mean_a[f] + (1 - momentum) * running_mean_a[f]; + running_mean_a[f] = momentum_ * _mean_a[f] + (1 - momentum_) * running_mean_a[f]; } if (running_var.defined()) { - accscalar_t unbiased_var = _var_sum_a[f] / (n - 1); - running_var_a[f] = momentum * unbiased_var + (1 - momentum) * running_var_a[f]; + accscalar_t unbiased_var = _var_sum_a[f] / (n - 1); + running_var_a[f] = momentum_ * unbiased_var + (1 - momentum_) * running_var_a[f]; } } }); @@ -243,6 +265,25 @@ std::tuple batch_norm_cpu_update_stats_template( return std::make_tuple(save_mean, save_var_transform); } +template class VarTransform> +std::tuple batch_norm_cpu_update_stats_template( + const Tensor& input, const Tensor& running_mean, const Tensor& running_var, + double momentum, double eps) { + int64_t n_input = input.size(1); + const int64_t ndim = input.dim(); + DimVector reduce_dims(ndim - 1); + reduce_dims[0] = 0; + for (const auto i : c10::irange(2, ndim)) { + reduce_dims[i - 1] = i; + } + + const bool mixed_type = !std::is_same::value; + const auto dtype = mixed_type ? kFloat : input.scalar_type(); + Tensor save_mean = is_contiguous(input) ? 
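The new allocating overload above builds `reduce_dims` as every dimension except dim 1, so that `at::mean` yields one value per channel when the input is not contiguous (the contiguous case leaves the mean to the kernel and only pre-allocates). A small sketch of that reduction:

```cpp
// Per-channel mean via reduce_dims = all dims except the channel dim (dim 1).
#include <torch/torch.h>
#include <iostream>

int main() {
  auto input = torch::randn({8, 3, 16, 16});       // (N, C, H, W)

  const int64_t ndim = input.dim();
  std::vector<int64_t> reduce_dims(ndim - 1);
  reduce_dims[0] = 0;
  for (int64_t i = 2; i < ndim; ++i) {
    reduce_dims[i - 1] = i;                        // {0, 2, 3}
  }

  auto per_channel_mean = at::mean(input, reduce_dims, /*keepdim=*/false);
  std::cout << per_channel_mean.sizes() << "\n";   // [3]
}
```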
at::empty({n_input}, input.options().dtype(dtype)) : at::mean(input, /*dim=*/reduce_dims, /*keepdim=*/false, dtype); + Tensor save_var_transform = at::empty({n_input}, input.options().dtype(dtype)); + return batch_norm_cpu_update_stats_template(input, running_mean, running_var, momentum, eps, save_mean, save_var_transform); +} + template std::tuple batch_norm_backward_cpu_template( const Tensor& grad_out_, const Tensor& input, const Tensor& weight, @@ -442,14 +483,14 @@ std::tuple _batch_norm_impl_index( const Tensor& running_mean = c10::value_or_else(running_mean_opt, [] {return Tensor();}); const Tensor& running_var = c10::value_or_else(running_var_opt, [] {return Tensor();}); - auto num_features = input.sizes()[1]; + auto num_features = input.sym_sizes()[1]; - if (input.numel() == 0) { + if (input.sym_numel() == 0) { Tensor reserve = at::empty({0}, input.options().dtype(kByte)); auto options = input.options().dtype( at::toAccumulateType(input.scalar_type(), /*is_cuda=*/input.is_cuda())); - auto save_mean = at::empty({num_features}, options); - auto save_invstd = at::empty({num_features}, options); + auto save_mean = at::empty_symint(c10::SymIntArrayRef({num_features}), options); + auto save_invstd = at::empty_symint(c10::SymIntArrayRef({num_features}), options); // don't return view of input, don't return empty tensor because it will break gradient chain auto out = input.clone(); @@ -460,20 +501,20 @@ std::tuple _batch_norm_impl_index( } if (running_mean.defined()) { - check_dims_match_num_input_features("running_mean", num_features, running_mean.numel()); + check_dims_match_num_input_features("running_mean", num_features, running_mean.sym_numel()); } else if (!training) { AT_ERROR("running_mean must be defined in evaluation mode"); } if (running_var.defined()) { - check_dims_match_num_input_features("running_var", num_features, running_var.numel()); + check_dims_match_num_input_features("running_var", num_features, running_var.sym_numel()); } else if (!training) { AT_ERROR("running_var must be defined in evaluation mode"); } if (weight.defined()) { - check_dims_match_num_input_features("weight", num_features, weight.numel()); + check_dims_match_num_input_features("weight", num_features, weight.sym_numel()); } if (bias.defined()) { - check_dims_match_num_input_features("bias", num_features, bias.numel()); + check_dims_match_num_input_features("bias", num_features, bias.sym_numel()); } const bool use_cudnn = ( @@ -485,12 +526,12 @@ std::tuple _batch_norm_impl_index( && ((running_mean.defined() && running_var.defined()) || (!running_mean.defined() && !running_var.defined() && training)) && (input.dim() >= 3) - && ((input.size(0) <= 880801 && training) // spatial, training - ||(input.size(0) <= 65535 && !training)) //spatial, eval + && ((input.sym_size(0) <= 880801 && training) // spatial, training + ||(input.sym_size(0) <= 65535 && !training)) //spatial, eval && detail::getCUDAHooks().compiledWithCuDNN() && eps >= detail::getCUDAHooks().batchnormMinEpsilonCuDNN() && cudnn_enabled && detail::getCUDAHooks().versionCuDNN() >= 5110L - && input.numel() < std::numeric_limits::max() // some cuDNN kernels have 32-bit indexing limitations + && input.sym_numel() < std::numeric_limits::max() // some cuDNN kernels have 32-bit indexing limitations ); if (use_cudnn) { @@ -523,7 +564,7 @@ std::tuple _batch_norm_impl_index( && cudnn_enabled ); - if (use_miopen) { + if (use_miopen && input.suggest_memory_format() != MemoryFormat::ChannelsLast && input.suggest_memory_format() != 
MemoryFormat::ChannelsLast3d) { return std::tuple_cat( at::miopen_batch_norm( input.contiguous(), weight.contiguous(), bias.contiguous(), @@ -609,32 +650,32 @@ Tensor instance_norm( const Tensor& running_mean = c10::value_or_else(running_mean_opt, [] {return Tensor();}); const Tensor& running_var = c10::value_or_else(running_var_opt, [] {return Tensor();}); - TORCH_CHECK(use_input_stats || (running_mean.defined() && running_var.defined()), + TORCH_CHECK(use_input_stats || (running_mean.defined() && running_var.defined()), "Expected running_mean and running_var to be defined when use_input_stats is false"); - std::vector shape = input.sizes().vec(); - int64_t b = input.size(0); - int64_t c = input.size(1); + std::vector shape = input.sym_sizes().vec(); + SymInt b = input.sym_size(0); + SymInt c = input.sym_size(1); shape[1] = b * c; - shape[0] = 1; + shape[0] = SymInt(1); Tensor weight_ = repeat_if_defined(weight, b); Tensor bias_ = repeat_if_defined(bias, b); Tensor running_mean_ = repeat_if_defined(running_mean, b); Tensor running_var_ = repeat_if_defined(running_var, b); - auto input_reshaped = input.contiguous().view(shape); + auto input_reshaped = input.contiguous().view_symint(shape); auto out = at::batch_norm(input_reshaped, weight_, bias_, running_mean_, running_var_, use_input_stats, momentum, eps, cudnn_enabled); // we alias running_mean and running_var because they are const but we want to modify their data if (running_mean.defined()) { - at::alias(running_mean).copy_(running_mean_.view({ b, c }).mean(0, false)); + at::alias(running_mean).copy_(running_mean_.view_symint({ b, c }).mean(0, false)); } if (running_var.defined()) { - at::alias(running_var).copy_(running_var_.view({ b, c }).mean(0, false)); + at::alias(running_var).copy_(running_var_.view_symint({ b, c }).mean(0, false)); } - return out.view(input.sizes()); + return out.view_symint(input.sym_sizes()); } std::tuple batch_norm_update_stats_cpu( @@ -655,8 +696,8 @@ std::tuple batch_norm_update_stats_cpu( }); } -std::tuple batch_norm_cpu(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, - bool train, double momentum, double eps) { +std::tuple batch_norm_cpu_out(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, + bool train, double momentum, double eps, Tensor& out, Tensor& save_mean, Tensor& save_var) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); const Tensor& weight = *weight_maybe_owned; @@ -664,33 +705,112 @@ std::tuple batch_norm_cpu(const Tensor& self, const c10: const Tensor& running_mean = c10::value_or_else(running_mean_opt, [] {return Tensor();}); const Tensor& running_var = c10::value_or_else(running_var_opt, [] {return Tensor();}); - checkBackend("batch_norm_cpu", {self, weight, bias, running_mean, running_var}, Backend::CPU); + checkBackend("batch_norm_cpu_out", {self, weight, bias, running_mean, running_var}, Backend::CPU); + // Resize out + at::native::resize_output(out, self.sizes()); const bool mixed_type = is_mixed_type(self, weight, bias, running_mean, running_var); - return AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, self.scalar_type(), "batch_norm", [&] { + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, self.scalar_type(), "batch_norm", [&] { if (mixed_type) { 
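The `instance_norm` hunk above (now sym-intified) is a useful reminder of how the op is computed: the batch dimension is folded into the channel dimension, `batch_norm` runs on the `(1, N*C, ...)` view, and the result is viewed back to the input shape. A sketch of that equivalence, assuming a libtorch build:

```cpp
// instance_norm as batch_norm over a (1, N*C, H, W) view of the input.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::randn({4, 3, 8, 8});             // (N, C, H, W)
  int64_t n = x.size(0), c = x.size(1);

  auto folded = x.contiguous().view({1, n * c, 8, 8});
  auto out = torch::batch_norm(folded, /*weight=*/{}, /*bias=*/{},
                               /*running_mean=*/{}, /*running_var=*/{},
                               /*training=*/true, /*momentum=*/0.1, /*eps=*/1e-5,
                               /*cudnn_enabled=*/false)
                 .view(x.sizes());

  auto reference = torch::instance_norm(x, {}, {}, {}, {},
                                        /*use_input_stats=*/true, 0.1, 1e-5,
                                        /*cudnn_enabled=*/false);
  std::cout << torch::allclose(out, reference, /*rtol=*/1e-4, /*atol=*/1e-5) << "\n";  // 1
}
```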
check_mixed_data_type(self, weight, bias, running_mean, running_var); if (!train) { - auto save_mean = at::empty({0}, self.options().dtype(kFloat)); - auto save_var = at::empty({0}, self.options().dtype(kFloat)); - return batch_norm_cpu_transform_input_template(self, weight, bias, save_mean, save_var, running_mean, running_var, train, eps); + return batch_norm_cpu_transform_input_template(self, weight, bias, save_mean, save_var, running_mean, running_var, train, eps, out); } else { - auto save_stats = batch_norm_cpu_update_stats_template(self, running_mean, running_var, momentum, eps); - return batch_norm_cpu_transform_input_template(self, weight, bias, std::get<0>(save_stats), std::get<1>(save_stats), running_mean, running_var, train, eps); + // Resize save_mean and save_var + at::native::resize_output(save_mean, {self.size(1)}); + at::native::resize_output(save_var, {self.size(1)}); + auto save_stats = batch_norm_cpu_update_stats_template(self, running_mean, running_var, momentum, eps, save_mean, save_var); + return batch_norm_cpu_transform_input_template(self, weight, bias, std::get<0>(save_stats), std::get<1>(save_stats), running_mean, running_var, train, eps, out); } } else { if (!train) { - auto save_mean = at::empty({0}, self.options()); - auto save_var = at::empty({0}, self.options()); - return batch_norm_cpu_transform_input_template(self, weight, bias, save_mean, save_var, running_mean, running_var, train, eps); + return batch_norm_cpu_transform_input_template(self, weight, bias, save_mean, save_var, running_mean, running_var, train, eps, out); } else { - auto save_stats = batch_norm_cpu_update_stats_template(self, running_mean, running_var, momentum, eps); - return batch_norm_cpu_transform_input_template(self, weight, bias, std::get<0>(save_stats), std::get<1>(save_stats), running_mean, running_var, train, eps); + // Resize save_mean and save_var + at::native::resize_output(save_mean, {self.size(1)}); + at::native::resize_output(save_var, {self.size(1)}); + auto save_stats = batch_norm_cpu_update_stats_template(self, running_mean, running_var, momentum, eps, save_mean, save_var); + return batch_norm_cpu_transform_input_template(self, weight, bias, std::get<0>(save_stats), std::get<1>(save_stats), running_mean, running_var, train, eps, out); } } }); + + return std::tuple(out, save_mean, save_var); } +std::tuple batch_norm_cpu(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, + bool train, double momentum, double eps) { + // See [Note: hacky wrapper removal for optional tensor] + c10::MaybeOwned weight_maybe_owned = at::borrow_from_optional_tensor(weight_opt); + const Tensor& weight = *weight_maybe_owned; + const Tensor& bias = c10::value_or_else(bias_opt, [] {return Tensor();}); + const Tensor& running_mean = c10::value_or_else(running_mean_opt, [] {return Tensor();}); + const Tensor& running_var = c10::value_or_else(running_var_opt, [] {return Tensor();}); + + checkBackend("batch_norm_cpu", {self, weight, bias, running_mean, running_var}, Backend::CPU); + + // Prepare output tensor + const bool all_contiguous = is_contiguous(self) + && (!weight.defined() || weight.is_contiguous()) + && (!bias.defined() || bias.is_contiguous()) + && running_mean.is_contiguous() + && running_var.is_contiguous(); + Tensor output = at::empty_like(self, all_contiguous ? 
suggest_memory_format_contig(self) : self.suggest_memory_format()); + + // Prepare save_mean and save_var + Tensor save_var; + Tensor save_mean; + const bool mixed_type = is_mixed_type(self, weight, bias, running_mean, running_var); + const int64_t ndim = self.dim(); + DimVector reduce_dims(ndim - 1); + reduce_dims[0] = 0; + for (const auto i : c10::irange(2, ndim)) { + reduce_dims[i - 1] = i; + } + if (mixed_type) { + if (!train) { + save_mean = at::empty({0}, self.options().dtype(kFloat)); + save_var = at::empty({0}, self.options().dtype(kFloat)); + } else { + save_mean = is_contiguous(self) ? at::empty({self.size(1)}, self.options().dtype(kFloat)) : at::mean(self, /*dim=*/reduce_dims, /*keepdim=*/false, kFloat); + save_var = at::empty({self.size(1)}, self.options().dtype(kFloat)); + } + } else { + if (!train) { + save_mean = at::empty({0}, self.options()); + save_var = at::empty({0}, self.options()); + } else { + save_mean = is_contiguous(self) ? at::empty({self.size(1)}, self.options()) : at::mean(self, /*dim=*/reduce_dims, /*keepdim=*/false); + save_var = at::empty({self.size(1)}, self.options()); + } + } + return batch_norm_cpu_out(self, weight_opt, bias_opt, running_mean_opt, running_var_opt, train, momentum, eps, output, save_mean, save_var); +} + + +std::tuple _batch_norm_legit_cpu( + const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, + Tensor& running_mean, Tensor& running_var, bool train, double momentum, double eps) { + return batch_norm_cpu(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, eps); +} + +std::tuple _batch_norm_legit_no_stats_cpu( + const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, + bool train, double momentum, double eps) { + return batch_norm_cpu(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, eps); +} + + +std::tuple _batch_norm_legit_cpu_out(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, Tensor& running_mean, Tensor& running_var, bool train, double momentum, double eps, Tensor& out, Tensor& save_mean, Tensor& save_var) { + return batch_norm_cpu_out(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, eps, out, save_mean, save_var); +} + + +std::tuple _batch_norm_legit_no_stats_cpu_out(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, bool train, double momentum, double eps, Tensor& out, Tensor& save_mean, Tensor& save_var) { + return batch_norm_cpu_out(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, eps, out, save_mean, save_var); +} + + std::tuple batch_norm_backward_cpu(const Tensor& grad_out, const Tensor& self, const c10::optional& weight_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, const c10::optional& save_mean_opt, const c10::optional& save_invstd_opt, bool train, double eps, std::array grad_input_mask) { // See [Note: hacky wrapper removal for optional tensor] diff --git a/aten/src/ATen/native/Onehot.cpp b/aten/src/ATen/native/Onehot.cpp index a0c061062174..41b7a6961863 100644 --- a/aten/src/ATen/native/Onehot.cpp +++ b/aten/src/ATen/native/Onehot.cpp @@ -1,4 +1,14 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/PackedSequence.cpp b/aten/src/ATen/native/PackedSequence.cpp index ec997d86aa1b..19b12b081960 100644 --- 
a/aten/src/ATen/native/PackedSequence.cpp +++ b/aten/src/ATen/native/PackedSequence.cpp @@ -1,5 +1,20 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif #include @@ -96,18 +111,20 @@ std::tuple _pack_padded_sequence(const Tensor& _input, const Ten // `grad` could be on arbitrary device and of arbitrary dtype, but `_batch_sizes` // is guaranteed to be a CPU int64 tensor. // See NOTE [ device and dtype of a PackedSequence ] -Tensor _pack_padded_sequence_backward(const Tensor& grad, at::IntArrayRef input_size, const Tensor& _batch_sizes, bool batch_first) { - std::vector input_size_after_t = input_size.vec(); +Tensor _pack_padded_sequence_backward_symint(const Tensor& grad, c10::SymIntArrayRef input_size, const Tensor& _batch_sizes, bool batch_first) { + std::vector input_size_after_t = input_size.vec(); if (batch_first) { TORCH_CHECK(input_size.size() >= 2); std::swap(input_size_after_t[0], input_size_after_t[1]); } - auto grad_input = at::zeros(input_size_after_t, grad.options()); + auto grad_input = at::zeros_symint(input_size_after_t, grad.options()); auto batch_sizes_t = _batch_sizes.contiguous(); checkLongTensor(batch_sizes_t); int64_t offset = 0; - int64_t max_seq_len = batch_sizes_t.size(0); + // NOTE: this op advertises as CompositeImplicitAutograd, but uses data_ptr(). + // we should fix this. + auto max_seq_len = batch_sizes_t.size(0); int64_t * batch_sizes = batch_sizes_t.data_ptr(); for (const auto i : c10::irange(max_seq_len)) { grad_input[i].slice(0, 0, batch_sizes[i]).copy_(grad.slice(0, offset, offset + batch_sizes[i])); diff --git a/aten/src/ATen/native/PadNd.cpp b/aten/src/ATen/native/PadNd.cpp index 9510b17de002..9421d537717c 100644 --- a/aten/src/ATen/native/PadNd.cpp +++ b/aten/src/ATen/native/PadNd.cpp @@ -1,8 +1,29 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { Tensor constant_pad_nd(const Tensor& self, IntArrayRef pad, const Scalar& value) { @@ -85,13 +106,13 @@ Tensor constant_pad_nd(const Tensor& self, IntArrayRef pad, const Scalar& value) return output; } -Tensor _pad_circular(const Tensor &self, IntArrayRef padding) { - const auto in_shape = self.sizes(); +Tensor _pad_circular_symint(const Tensor &self, c10::SymIntArrayRef padding) { + const auto in_shape = self.sym_sizes(); const auto ndim = static_cast(in_shape.size()) - 2; TORCH_CHECK(padding.size() + 4 == in_shape.size() * 2, "Invalid padding size, expected ", ndim * 2, " but got ", padding.size()); - DimVector out_shape(in_shape.size()); + c10::SymDimVector out_shape(in_shape.size()); out_shape[0] = in_shape[0]; out_shape[1] = in_shape[1]; @@ -110,18 +131,18 @@ Tensor _pad_circular(const Tensor &self, IntArrayRef padding) { "Negative padding value is resulting in an empty dimension"); } - auto out = self.new_empty(out_shape, self.options()); + auto out = self.new_empty_symint(out_shape, self.options()); // Put original array into the padded array Tensor out_slice = out; Tensor in_slice = self; - constexpr int64_t zero = 0; + const SymInt zero = 0; for (const auto i : c10::irange(ndim)) { const auto dim = ndim - i + 1; const auto pad_l = padding[2*i + 
0]; const auto pad_r = padding[2*i + 1]; - out_slice = out_slice.slice(dim, std::max(pad_l, zero), out_shape[dim] - std::max(pad_r, zero)); - in_slice = in_slice.slice(dim, std::max(-pad_l, zero), in_shape[dim] - std::max(-pad_r, zero)); + out_slice = out_slice.slice_symint(dim, std::max(pad_l, zero), out_shape[dim] - std::max(pad_r, zero)); + in_slice = in_slice.slice_symint(dim, std::max(-pad_l, zero), in_shape[dim] - std::max(-pad_r, zero)); } out_slice.copy_(in_slice); @@ -137,16 +158,16 @@ Tensor _pad_circular(const Tensor &self, IntArrayRef padding) { const auto pad_r = padding[2*i + 1]; if (pad_l > 0) { - out_slice = out.slice(dim, 0, pad_l); - in_slice = out.slice(dim, + out_slice = out.slice_symint(dim, 0, pad_l); + in_slice = out.slice_symint(dim, out_shape[dim] - pad_l - std::max(pad_r, zero), out_shape[dim] - std::max(pad_r, zero)); out_slice.copy_(in_slice); } if (pad_r > 0) { - out_slice = out.slice(dim, out_shape[dim] - pad_r, out_shape[dim]); - in_slice = out.slice(dim, std::max(pad_l, zero), std::max(pad_l, zero) + pad_r); + out_slice = out.slice_symint(dim, out_shape[dim] - pad_r, out_shape[dim]); + in_slice = out.slice_symint(dim, std::max(pad_l, zero), std::max(pad_l, zero) + pad_r); out_slice.copy_(in_slice); } } @@ -154,14 +175,14 @@ Tensor _pad_circular(const Tensor &self, IntArrayRef padding) { return out; } -Tensor _pad_enum(const Tensor &self, IntArrayRef pad, int64_t mode_int, c10::optional value) { +Tensor _pad_enum_symint(const Tensor &self, c10::SymIntArrayRef pad, int64_t mode_int, c10::optional value) { const auto input_dim = self.dim(); TORCH_CHECK(pad.size() % 2 == 0, "Padding length must be divisible by 2"); TORCH_CHECK(static_cast(pad.size()) <= input_dim * 2, "Padding length too large"); auto mode = static_cast(mode_int); if (mode == at::padding_mode::constant) { - return at::constant_pad_nd(self, pad, value.value_or(0.0)); + return at::constant_pad_nd_symint(self, pad, value.value_or(0.0)); } TORCH_CHECK(!value.has_value() || *value == 0, "Padding mode \"", padding_mode_string(mode), @@ -169,23 +190,23 @@ Tensor _pad_enum(const Tensor &self, IntArrayRef pad, int64_t mode_int, c10::opt if (pad.size() == 2 && (input_dim == 2 || input_dim == 3)) { switch (mode) { - case at::padding_mode::reflect: return at::reflection_pad1d(self, pad); - case at::padding_mode::replicate: return at::replication_pad1d(self, pad); - case at::padding_mode::circular: return at::_pad_circular(self, pad); + case at::padding_mode::reflect: return at::reflection_pad1d_symint(self, pad); + case at::padding_mode::replicate: return at::replication_pad1d_symint(self, pad); + case at::padding_mode::circular: return at::_pad_circular_symint(self, pad); default: {} } } else if(pad.size() == 4 && (input_dim == 3 || input_dim == 4)) { switch (mode) { - case at::padding_mode::reflect: return at::reflection_pad2d(self, pad); - case at::padding_mode::replicate: return at::replication_pad2d(self, pad); - case at::padding_mode::circular: return at::_pad_circular(self, pad); + case at::padding_mode::reflect: return at::reflection_pad2d_symint(self, pad); + case at::padding_mode::replicate: return at::replication_pad2d_symint(self, pad); + case at::padding_mode::circular: return at::_pad_circular_symint(self, pad); default: {} } } else if (pad.size() == 6 && (input_dim == 4 || input_dim == 5)) { switch (mode) { - case at::padding_mode::reflect: return at::reflection_pad3d(self, pad); - case at::padding_mode::replicate: return at::replication_pad3d(self, pad); - case at::padding_mode::circular: 
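For readers less familiar with the circular path being sym-intified here: each `(pad_l, pad_r)` pair applies to one dimension, starting from the last, and the padded values wrap around from the opposite end of that dimension. A tiny usage sketch, assuming the `at::pad` overload that takes a mode string (the op whose native implementation is renamed to `pad_symint` below):

```cpp
// Circular padding wraps values around; the last pad pair applies to the last dim.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::arange(1, 6, torch::kFloat).view({1, 1, 5});   // values 1..5, shape (N, C, L)

  // pad_l = 2, pad_r = 1 on the last dimension.
  auto y = torch::pad(x, {2, 1}, "circular");
  std::cout << y << "\n";   // expected values: 4 5 1 2 3 4 5 1
}
```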
return at::_pad_circular(self, pad); + case at::padding_mode::reflect: return at::reflection_pad3d_symint(self, pad); + case at::padding_mode::replicate: return at::replication_pad3d_symint(self, pad); + case at::padding_mode::circular: return at::_pad_circular_symint(self, pad); default: {} } } @@ -193,7 +214,7 @@ Tensor _pad_enum(const Tensor &self, IntArrayRef pad, int64_t mode_int, c10::opt "Only 2D, 3D, 4D, 5D padding with non-constant padding are supported for now"); } -Tensor pad(const Tensor &self, IntArrayRef pad, c10::string_view mode, c10::optional value) { +Tensor pad_symint(const Tensor &self, c10::SymIntArrayRef pad, c10::string_view mode, c10::optional value) { const auto mode_enum = [&] { if (mode == "reflect") { return at::padding_mode::reflect; @@ -207,7 +228,7 @@ Tensor pad(const Tensor &self, IntArrayRef pad, c10::string_view mode, c10::opti C10_THROW_ERROR(NotImplementedError, c10::str("Unrecognised padding mode ", mode)); }(); - return at::native::_pad_enum(self, pad, static_cast(mode_enum), value); + return at::native::_pad_enum_symint(self, pad, static_cast(mode_enum), value); } }} // namespace at::native diff --git a/aten/src/ATen/native/PadNd.h b/aten/src/ATen/native/PadNd.h deleted file mode 100644 index 37f59acb8a4c..000000000000 --- a/aten/src/ATen/native/PadNd.h +++ /dev/null @@ -1,22 +0,0 @@ -#pragma once - -namespace at { - -enum class padding_mode { - reflect, - replicate, - circular, - constant, -}; - -static inline c10::string_view padding_mode_string(padding_mode m) { - switch (m) { - case padding_mode::reflect: return "reflect"; - case padding_mode::replicate: return "replicate"; - case padding_mode::circular: return "circular"; - case padding_mode::constant: return "constant"; - } - TORCH_CHECK(false, "Invalid padding mode (", static_cast(m), ")"); -} - -} // namespace at diff --git a/aten/src/ATen/native/PixelShuffle.cpp b/aten/src/ATen/native/PixelShuffle.cpp index 41547a10f5fd..e535909a7342 100644 --- a/aten/src/ATen/native/PixelShuffle.cpp +++ b/aten/src/ATen/native/PixelShuffle.cpp @@ -1,9 +1,21 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include -#include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + +#include +#include +#include namespace at { namespace native { @@ -52,6 +64,11 @@ Tensor pixel_shuffle_cpu(const Tensor& self, int64_t upscale_factor) { auto output = at::empty({0}, self.options()); auto memory_format = self.suggest_memory_format(); output.resize_(output_sizes, memory_format); + + if (output.numel() == 0) { + return output; + } + auto input = self.contiguous(memory_format); pixel_shuffle_kernel(kCPU, output, input, upscale_factor); @@ -61,6 +78,10 @@ Tensor pixel_shuffle_cpu(const Tensor& self, int64_t upscale_factor) { Tensor pixel_unshuffle_cpu(const Tensor& self, int64_t downscale_factor) { check_pixel_unshuffle_shapes(self, downscale_factor); + if (self.numel() == 0) { + return self.clone(); + } + // Format: (B1, ..., Bn), C, H, W std::vector output_sizes(self.sizes().begin(), self.sizes().end() - 3); output_sizes.insert(output_sizes.end(), @@ -71,6 +92,11 @@ Tensor pixel_unshuffle_cpu(const Tensor& self, int64_t downscale_factor) { auto output = at::empty({0}, self.options()); auto memory_format = self.suggest_memory_format(); output.resize_(output_sizes, memory_format); + + if (output.numel() == 0) { + return output; + } + auto input = self.contiguous(memory_format); pixel_unshuffle_kernel(kCPU, output, input, downscale_factor); @@ -114,7 
+140,8 @@ Tensor math_pixel_shuffle(const Tensor& self, int64_t upscale_factor) { std::vector final_shape(self.sizes().begin(), self_sizes_batch_end); final_shape.insert(final_shape.end(), {oc, oh, ow}); - return input_permuted.reshape(final_shape); + // pixel_shuffle expects to *never* return an alias of the input. + return input_permuted.clone(at::MemoryFormat::Contiguous).view(final_shape); } Tensor math_pixel_unshuffle(const Tensor& self, int64_t downscale_factor) { @@ -154,7 +181,8 @@ Tensor math_pixel_unshuffle(const Tensor& self, int64_t downscale_factor) { std::vector final_shape(self.sizes().begin(), self_sizes_batch_end); final_shape.insert(final_shape.end(), {oc, oh, ow}); - return input_permuted.reshape(final_shape); + // pixel_unshuffle expects to *never* return an alias of the input. + return input_permuted.clone(at::MemoryFormat::Contiguous).view(final_shape); } DEFINE_DISPATCH(pixel_shuffle_kernel); diff --git a/aten/src/ATen/native/PointwiseOps.cpp b/aten/src/ATen/native/PointwiseOps.cpp index a99bc959eb95..8259135ce14a 100644 --- a/aten/src/ATen/native/PointwiseOps.cpp +++ b/aten/src/ATen/native/PointwiseOps.cpp @@ -1,12 +1,17 @@ // Ternary and higher-order pointwise operations +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include -#include +#include +#include +#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif namespace at { namespace meta { diff --git a/aten/src/ATen/native/Pool.h b/aten/src/ATen/native/Pool.h index 0f3885524a79..0ff4490086b7 100644 --- a/aten/src/ATen/native/Pool.h +++ b/aten/src/ATen/native/Pool.h @@ -58,21 +58,27 @@ template static inline T pooling_output_shape( T inputSize, T kernelSize, T pad, T stride, T dilation, bool ceil_mode) { TORCH_CHECK(stride != 0, "stride should not be zero"); + TORCH_CHECK(pad >= 0, + "pad must be non-negative, but got pad: ", pad); + TORCH_CHECK(pad <= kernelSize / 2, + "pad should be at most half of kernel size, but got pad=", + pad, " and kernel_size=", kernelSize) return pooling_output_shape_pad_lr( inputSize, kernelSize, pad, pad, stride, dilation, ceil_mode); } -inline std::pair pooling_same_mode_padding_lr( - int64_t inputSize, int64_t kernelSize, int64_t stride, int64_t dilation) { +template +std::pair _pooling_same_mode_padding_lr( + T inputSize, T kernelSize, int64_t stride, int64_t dilation) { // NOTE: with strides, the output shape is ceil(inputSize/stride) - auto total_padding = dilation * (kernelSize - 1); + auto total_padding = T(dilation) * (kernelSize - 1); // Prefer symmetric padding if possible if (stride > 2 && (total_padding % 2 == 1)) { // The floor in the output size calculation gives us a little wiggle room auto wiggle_room = inputSize % stride - 1; if (wiggle_room > 0) { - --total_padding; + total_padding = total_padding - 1; } } @@ -80,6 +86,15 @@ inline std::pair pooling_same_mode_padding_lr( return {left, total_padding - left}; } +inline std::pair pooling_same_mode_padding_lr( + int64_t inputSize, int64_t kernelSize, int64_t stride, int64_t dilation) { + return _pooling_same_mode_padding_lr(inputSize, kernelSize, stride, dilation); +} + +inline std::pair pooling_same_mode_padding_lr( + c10::SymInt inputSize, c10::SymInt kernelSize, int64_t stride, int64_t dilation) { + return _pooling_same_mode_padding_lr(inputSize, kernelSize, stride, dilation); +} // AveragePool2d/DilatedMaxPool2d (forward) static inline void @@ -211,10 +226,20 @@ pool3d_shape_check( TORCH_CHECK(ndim == 4 || ndim == 5, fn_name, ": Expected 4D or 5D 
tensor for input, but got: ", input.sizes()); - for (const auto i : c10::irange(1, ndim)) { - TORCH_CHECK(input.size(i) > 0, - fn_name, "Expected input to have non-zero size for non-batch dimensions, but got", - input.sizes(), " with dimension ", i, " being empty."); + for (const auto i : c10::irange(ndim)) { + if (ndim == 5 && i == 0) { + // size of batch-dim can be 0. + continue; + } + TORCH_CHECK( + input.size(i) > 0, + fn_name, + ": Expected input's non-batch dimensions to have positive length," + " but input has a shape of ", + input.sizes(), + " and non-batch dimension ", + input.size(i), + " has length zero!") } if (check_input_size) { // AveragePool3d diff --git a/aten/src/ATen/native/Pooling.cpp b/aten/src/ATen/native/Pooling.cpp index 724c53fdd0c0..fcbe741ab0ea 100644 --- a/aten/src/ATen/native/Pooling.cpp +++ b/aten/src/ATen/native/Pooling.cpp @@ -1,12 +1,31 @@ -#include - -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { namespace native { diff --git a/aten/src/ATen/native/Pow.cpp b/aten/src/ATen/native/Pow.cpp index 4326853a8165..7050524acebf 100644 --- a/aten/src/ATen/native/Pow.cpp +++ b/aten/src/ATen/native/Pow.cpp @@ -1,11 +1,20 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace meta { diff --git a/aten/src/ATen/native/QuantizedLinear.cpp b/aten/src/ATen/native/QuantizedLinear.cpp index af7643ec18b6..002bb1adc438 100644 --- a/aten/src/ATen/native/QuantizedLinear.cpp +++ b/aten/src/ATen/native/QuantizedLinear.cpp @@ -1,20 +1,28 @@ -#include -#include -#include -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include +#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #ifdef USE_FBGEMM diff --git a/aten/src/ATen/native/README.md b/aten/src/ATen/native/README.md index 043e93e332a6..651b21ae0186 100644 --- a/aten/src/ATen/native/README.md +++ b/aten/src/ATen/native/README.md @@ -47,10 +47,9 @@ signature. if one argument is a `FloatTensor`, all other arguments are checked to be `FloatTensor`s). `Tensor` or `Tensor?` must sometimes be annotated to indicate aliasing and mutability. - In general annotations can be defined via the following four situations: - - `Tensor(a)` - `a` is a set of Tensors that may alias to the same data. + In general annotations can be defined via the following situations: + - `Tensor(a)` - `a` is a set of Tensors that may alias to the same data. The set could have a size of one. - `Tensor(a!)` - members of `a` may be written to thus mutating the underlying data. - - `Tensor!` - shorthand for Tensor(fresh\_identifier!) - `Tensor(a! -> a|b)` - Tensor is in set `a`, written to, and after the write is in set `a` AND `b`. For more details on when and why this needs to happen, please see the section on annotations. - `Tensor[]`. 
A `Tensor[]` argument translates into a C++ argument of type `ArrayRef` @@ -445,7 +444,7 @@ By default, ATen code generation will generate device check, which will ensure all the tensor parameters passed to kernel are on the same device. -However, in some cases, checking the device is unncessary, because, +However, in some cases, checking the device is unnecessary, because, e.g., you call a function allows to work on multiple devices. In that case, code generation of the device check can be disabled by adding `device_check: NoCheck` to your function definition. @@ -476,6 +475,28 @@ as `Tensor &`, which 1) allowed changing which `TensorImpl` the `Tensor` itself was not necessary to allow the underlying data to change. (This was like using `T * const` when we wanted `const T*`.) +### `autogen` + +``` +- func: my_op_(Tensor(a!) self) -> Tensor(a!) +... + autogen: my_op, my_op.out +``` + +`autogen` keyword is being used to specify which native function the codegen system should generate +implementations for. +* For an in-place variant of a native function (op name ends with an `_`), we will generate a functional +variant and an out= variant. +* If a functional variant is given, we generate an out= variant. +* We don't support `autogen` for view ops, ops that bypass the dispatcher as well as composite ops. + +We also generate kernels for generated ops, which merely copy and return the result from the base ops. +These generated kernels can be found in `/aten/src/ATen/CompositeViewCopyKernels.cpp`. + +Also notice that for new operators being added to `native_functions.yaml`, if they satisfy the requirements +mentioned above, they should include `autogen` keyword, since functionalization depends on it. We will +enforce this in codegen. + ## Writing an implementation in C++ @@ -534,7 +555,7 @@ Here're steps to follow to decide the right dispatch keyword: Note: to support training, you're required to write a formula in derivatives.yaml since your backend implementations don't support autograd. - - Yes: you're likely calling other `at::` ops in the implemetation. Go to step 2. + - Yes: you're likely calling other `at::` ops in the implementation. Go to step 2. 2. Think about training: does your kernel support autograd? [check autograd support](#will-your-function-be-automatically-differentiable) - Yes: in other words, you're providing a `CompositeImplicitAutograd` kernel which supports both inference and autograd. @@ -588,7 +609,7 @@ It shows for a certain operator, what the computed dispatch table looks like aft 4. TODO: AutogradCPUOrCUDA Note that in native_functions.yaml you can mix using backend keywords and alias keywords above for one op: - - direct registration to backend always has higher precendence than alias + - direct registration to backend always has higher precedence than alias - DO NOT provide multiple alias keywords to the same op: alias keywords have precedence `CompositeExplicitAutograd > CompositeImplicitAutograd`, e.g. adding both `CompositeImplicitAutograd` and `CompositeExplicitAutograd` kernels for one op will completely ignore `CompositeImplicitAutograd` kernel for both inference and training. Thus this will trigger an error when native_functions.yaml is parsed. @@ -606,7 +627,8 @@ the torch._C._nn (marked with `python_module: nn`), torch._C._fft (marked with `python_module: fft`), torch._C._linalg (marked with `python_module: linalg`) objects, torch._C._sparse (marked with `python_module: sparse`) objects, -or torch._C._special (marked with `python_module: special`) objects. 
+torch._C._special (marked with `python_module: special`) objects, +or torch._C._nested (marked with `python_module: nested`) objects. ### Undefined tensor conventions diff --git a/aten/src/ATen/native/RNN.cpp b/aten/src/ATen/native/RNN.cpp index e40caef80e3c..52efc6929f54 100644 --- a/aten/src/ATen/native/RNN.cpp +++ b/aten/src/ATen/native/RNN.cpp @@ -1,8 +1,10 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include +#include +#include +#include +#include #include #include #include @@ -10,6 +12,46 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + int register_linear_params(); namespace at { namespace native { @@ -624,20 +666,20 @@ tpair_of hidden_slice(const tpair_of& t, int64_t start, int64_t // It's a struct only because functional programming in C++ is a pain, and it's easier // to pass around "vtable pointers" than actual function pointers. -void check_rnn_cell_forward_input(const Tensor& input, int64_t input_size) { +void check_rnn_cell_forward_input(const Tensor& input, c10::SymInt input_size) { TORCH_CHECK( - input.size(1) == input_size, - "input has inconsistent input_size: got ", input.size(1), " expected ", input_size); + input.sym_size(1) == input_size, + "input has inconsistent input_size: got ", input.sym_size(1), " expected ", input_size); } -void check_rnn_cell_forward_hidden(const Tensor& input, const Tensor& hx, int64_t hidden_size, int64_t hidden_label) { +void check_rnn_cell_forward_hidden(const Tensor& input, const Tensor& hx, c10::SymInt hidden_size, c10::SymInt hidden_label) { TORCH_CHECK( - input.size(0) == hx.size(0), - "Input batch size ", input.size(0), " doesn't match hidden", hidden_label, " batch size ", hx.size(0)); + input.sym_size(0) == hx.sym_size(0), + "Input batch size ", input.sym_size(0), " doesn't match hidden", hidden_label, " batch size ", hx.sym_size(0)); TORCH_CHECK( - hx.size(1) == hidden_size, - "hidden", hidden_label, " has inconsistent hidden_size: got ", hx.size(1), ", expected ", hidden_size); + hx.sym_size(1) == hidden_size, + "hidden", hidden_label, " has inconsistent hidden_size: got ", hx.sym_size(1), ", expected ", hidden_size); } template @@ -717,7 +759,7 @@ struct GRUCell : Cell { const hidden_type& hidden, const cell_params& params, bool pre_compute_input = false) const override { - if (input.is_cuda()) { + if (input.is_cuda() || input.is_xpu()) { TORCH_CHECK(!pre_compute_input); auto igates = params.matmul_ih(input); auto hgates = params.matmul_hh(hidden); @@ -1465,8 +1507,8 @@ std::tuple lstm_cell( const Tensor& b_hh = c10::value_or_else(b_hh_opt, [] {return Tensor();}); TORCH_CHECK(hx.size() == 2, "lstm_cell expects two hidden states"); - check_rnn_cell_forward_input(input, w_ih.size(1)); - auto hidden_size = w_hh.size(1); + check_rnn_cell_forward_input(input, w_ih.sym_size(1)); + auto hidden_size = w_hh.sym_size(1); check_rnn_cell_forward_hidden(input, hx[0], hidden_size, 0); check_rnn_cell_forward_hidden(input, hx[1], hidden_size, 0); static at::Tensor undefined; diff --git a/aten/src/ATen/native/RNN.h b/aten/src/ATen/native/RNN.h index 2bdb9becf4fa..50aaa0a29c2b 100644 --- a/aten/src/ATen/native/RNN.h +++ 
b/aten/src/ATen/native/RNN.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include namespace at { namespace native { diff --git a/aten/src/ATen/native/RangeFactories.cpp b/aten/src/ATen/native/RangeFactories.cpp index 038da93456ed..408bf0a27e6f 100644 --- a/aten/src/ATen/native/RangeFactories.cpp +++ b/aten/src/ATen/native/RangeFactories.cpp @@ -1,13 +1,23 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include #include -#include #include -#include +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/ReduceAllOps.cpp b/aten/src/ATen/native/ReduceAllOps.cpp index 31764734b67a..e1d51a1666af 100644 --- a/aten/src/ATen/native/ReduceAllOps.cpp +++ b/aten/src/ATen/native/ReduceAllOps.cpp @@ -1,8 +1,21 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace native { @@ -34,9 +47,16 @@ Tensor max(const Tensor &self) { } Tensor& max_unary_out(const Tensor &self, Tensor& out) { - Tensor tmp_output = at::max(self); - at::native::resize_output(out, tmp_output.sizes()); - out.copy_(tmp_output); + // First check if the devices match (CPU vs GPU) + TORCH_CHECK(self.device() == out.device()); + + TORCH_CHECK(canCast( + typeMetaToScalarType(self.dtype()), + typeMetaToScalarType(out.dtype()))); + + at::native::resize_output(out, {}); + + max_all_stub(self.device().type(), out, self.contiguous()); return out; } diff --git a/aten/src/ATen/native/ReduceOps.cpp b/aten/src/ATen/native/ReduceOps.cpp index 71fb7d94c4be..2fe5eee4a286 100644 --- a/aten/src/ATen/native/ReduceOps.cpp +++ b/aten/src/ATen/native/ReduceOps.cpp @@ -1,21 +1,114 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include -#include -#include +#include #include #include #include +#include +#include +#include #include #include -#include -#include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include @@ -24,9 +117,7 @@ #include #include #include -#include #include -#include #include namespace at { @@ -390,7 +481,6 @@ template void impl_func_cum_ops( const Tensor& self, int64_t dim, - c10::optional dtype, const Tensor& result, Stub& stub) { NoNamesGuard guard; @@ -409,7 +499,7 @@ TORCH_IMPL_FUNC(cumsum_out) int64_t dim, c10::optional dtype, const Tensor& result) { - impl_func_cum_ops(self, dim, dtype, result, cumsum_stub); + 
impl_func_cum_ops(self, dim, result, cumsum_stub); } TORCH_IMPL_FUNC(cumprod_out) @@ -417,7 +507,7 @@ TORCH_IMPL_FUNC(cumprod_out) int64_t dim, c10::optional dtype, const Tensor& result) { - impl_func_cum_ops(self, dim, dtype, result, cumprod_stub); + impl_func_cum_ops(self, dim, result, cumprod_stub); } Tensor reversed_cumsum(const Tensor& w, int64_t dim) { @@ -527,18 +617,22 @@ Tensor cumprod_backward(const Tensor& grad, const Tensor& input, int64_t dim, co auto input_conj = input.conj(); auto output_conj = output.conj(); + // For Composite Compliance, we always choose the slower but composite compliant path. + bool are_inputs_tensors_sublcass = areAnyTensorSubclassLike({input, grad, output}); + const auto w = output_conj * grad; const auto is_zero = input == 0; - if (!(is_zero.any().item())) { - return reversed_cumsum(w, dim).div(input_conj); + if (!are_inputs_tensors_sublcass) { + if (is_zero.any().item() == 0) { + return reversed_cumsum(w, dim).div(input_conj); + } } // If we are not computing a second order gradient, we can use an // O(n) implementation. The derivative of this implementation is _not_ // the second derivative of cumprod. As such, we fallback to a less efficient // O(n^2) implementation when at::GradMode::is_enabled(). - Tensor grad_input = at::zeros(input.sizes(), grad.options()); - if (!at::GradMode::is_enabled()) { + if (!at::GradMode::is_enabled() && !are_inputs_tensors_sublcass) { // n.b. This could probably be implemented much faster with a kernel // From here on we need to use some mask gymnastics to @@ -556,6 +650,7 @@ Tensor cumprod_backward(const Tensor& grad, const Tensor& input, int64_t dim, co // zeros_like(indices).scatter_(dim, indices, 1.) & cumsum == 1 // Note that the logic_and with cumsum == 1 accounts // for the case when there is no first zero + Tensor grad_input = at::zeros(input.sizes(), grad.options()); const auto cumsum = is_zero.cumsum(dim); // case k < z1 @@ -592,6 +687,7 @@ Tensor cumprod_backward(const Tensor& grad, const Tensor& input, int64_t dim, co .mul_(at::gather(output_conj, dim, (first_zero_index - 1).relu_()) .masked_fill_(first_zero_index == 0, 1.)) .masked_select(first_zero_mask)); + return grad_input; } else { // GradMode::enabled() /* If the input is nonzero, we need to calculate the dy_j / dx_k @@ -614,6 +710,15 @@ Tensor cumprod_backward(const Tensor& grad, const Tensor& input, int64_t dim, co dy_j / dx_k = 0, which is done right after the assert. */ + Tensor grad_input; + // For Composite Compliance, we will use + // at::stack on the grad slices, hence the vector. + std::vector grad_inputs; + if (are_inputs_tensors_sublcass) { + grad_inputs.reserve(dim_size); + } else { + grad_input = at::zeros(input.sizes(), grad.options()); + } auto ones_size = input.sizes().vec(); ones_size[dim] = 1; const Tensor ones = at::ones({1}, grad.options()).expand(ones_size); @@ -638,11 +743,16 @@ Tensor cumprod_backward(const Tensor& grad, const Tensor& input, int64_t dim, co // dim_size - k TORCH_CHECK(omitted_products.size(dim) == dim_size - k); - grad_input.select(dim, k).copy_( - at::sum(grad.slice(dim, k) * omitted_products,dim)); + auto grad_slice = at::sum(grad.slice(dim, k) * omitted_products, dim); + if (are_inputs_tensors_sublcass) { + grad_inputs.push_back(grad_slice); + } else { + grad_input.select(dim, k).copy_(grad_slice); + } } + + return are_inputs_tensors_sublcass ? at::stack(grad_inputs, dim) : grad_input; } - return grad_input; } // Implement std::is_nan for MSVC. 
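The cumprod_backward hunks above switch to a Composite Compliance path: when any of input/grad/output is a Tensor subclass, the kernel avoids data-dependent `.item()` checks and in-place writes into a preallocated `grad_input`, and instead collects per-slice gradients and `at::stack`s them along `dim`. Below is a minimal editorial sketch of that accumulation pattern only; it is not part of the patch, and the per-slice computation is replaced by a trivial copy purely for illustration.

```
// Minimal sketch (not from the patch) of the subclass-safe accumulation
// pattern used by cumprod_backward above. The real per-slice gradient
// computation is replaced here by a plain copy of the grad slice.
#include <ATen/ATen.h>
#include <vector>

at::Tensor accumulate_grad(const at::Tensor& grad,
                           int64_t dim,
                           bool inputs_are_subclasses) {
  const int64_t dim_size = grad.size(dim);
  if (!inputs_are_subclasses) {
    // Fast path: preallocate the result and write each slice in place.
    at::Tensor grad_input = at::zeros(grad.sizes(), grad.options());
    for (int64_t k = 0; k < dim_size; ++k) {
      // stand-in for the real slice gradient
      grad_input.select(dim, k).copy_(grad.select(dim, k));
    }
    return grad_input;
  }
  // Composite-compliant path: no in-place mutation of a freshly created
  // tensor; gather the slices and stack them along `dim` at the end.
  std::vector<at::Tensor> slices;
  slices.reserve(dim_size);
  for (int64_t k = 0; k < dim_size; ++k) {
    slices.push_back(grad.select(dim, k));  // stand-in for the real slice gradient
  }
  return at::stack(slices, dim);
}
```

Stacking `dim_size` slices (each missing `dim`) along `dim` reproduces the original shape, which is why the out-of-place path can return `at::stack(grad_inputs, dim)` in place of the preallocated `grad_input`.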
@@ -1079,10 +1189,6 @@ Tensor sum(const Tensor& self, DimnameList dim, bool keepdim, c10::optional opt_dtype) { - return at::sum(input_t, c10::asIntArrayRefSlow(dim), keepdim, opt_dtype); -} - Tensor& sum_out(const Tensor& self, DimnameList dim, bool keepdim, optional opt_dtype, Tensor& result) { return at::sum_out(result, self, dimnames_to_positions(self, dim), keepdim, opt_dtype); @@ -1447,7 +1553,7 @@ inline void allany_impl( if (self.numel() == 0) { result.fill_(identity); } else if (self.numel() == 1) { - result.fill_(self.item().toBool()); + result.copy_(self.view_as(result).to(at::kBool)); } else { auto iter = get_allany_iter(self, result, dims, keepdim); stub(iter.device_type(), iter); @@ -1977,9 +2083,6 @@ bool cpu_equal(const Tensor& self, const Tensor& other) { at::NoNamesGuard guard; TORCH_CHECK(self.device() == other.device(), "Cannot compare two tensors on " "different devices. Got: ", self.device(), " and ", other.device()); - TORCH_CHECK(self.dtype() == other.dtype(), - "Expected object of scalar type ", self.dtype(), " but got scalar type ", - other.dtype(), " for argument 'other'"); if (!self.is_same_size(other)) { return false; } @@ -2012,14 +2115,19 @@ bool cpu_equal(const Tensor& self, const Tensor& other) { return result.load(); } +Tensor value_selecting_reduction_backward(const Tensor& grad, int64_t dim, const Tensor& indices, at::IntArrayRef sizes, bool keepdim) { + return at::native::value_selecting_reduction_backward_symint(grad, dim, indices, c10::fromIntArrayRefSlow(sizes), keepdim); +} + + // max(dim), min(dim), topk(dim), mode(dim), are examples of reduction // functions that select values. value_selecting_reduction_backward is the // backward function for those operators; it propagates the grad to the // specific value locations referred to at `indices`. -Tensor value_selecting_reduction_backward(const Tensor& grad, int64_t dim, const Tensor& indices, IntArrayRef sizes, bool keepdim) { +Tensor value_selecting_reduction_backward_symint(const Tensor& grad, int64_t dim, const Tensor& indices, c10::SymIntArrayRef sizes, bool keepdim) { auto inplace_scatter_if_not_tensor_subclass = [&](const Tensor& grad_out, const Tensor& indices_) { - auto grad_in = at::zeros(sizes, grad_out.options()); + auto grad_in = at::zeros_symint(sizes, grad_out.options()); if (areAnyTensorSubclassLike({grad, indices})) { return grad_in.scatter(dim, indices_, grad_out); } @@ -2038,5 +2146,9 @@ Tensor sum_csr(const Tensor &self, c10::optional dtype) { return self.values().sum(dtype); } +Tensor sum_coo(const Tensor &self, c10::optional dtype) { + return self._values().sum(dtype); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ReduceOpsUtils.h b/aten/src/ATen/native/ReduceOpsUtils.h index 9db9802ea788..2b46eb683f1c 100644 --- a/aten/src/ATen/native/ReduceOpsUtils.h +++ b/aten/src/ATen/native/ReduceOpsUtils.h @@ -102,7 +102,7 @@ static inline void check_scalar_type_device_layout_equal(const Tensor& out, cons OPTION_TYPE_EQUALITY_CHECK(layout, out.options(), self.options()); } -static inline Tensor integer_upcast(const Tensor& self, optional dtype) { +static inline Tensor integer_upcast(const Tensor& self, c10::optional dtype) { ScalarType scalarType = self.scalar_type(); ScalarType upcast_scalarType = dtype.value_or(at::isIntegralType(scalarType, /*includeBool=*/true) ? 
ScalarType::Long : scalarType); return self.toType(upcast_scalarType); diff --git a/aten/src/ATen/native/ReflectionPad.cpp b/aten/src/ATen/native/ReflectionPad.cpp index db744cc95eb0..3a6ad683d045 100644 --- a/aten/src/ATen/native/ReflectionPad.cpp +++ b/aten/src/ATen/native/ReflectionPad.cpp @@ -1,9 +1,26 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -965,8 +982,8 @@ TORCH_IMPL_FUNC(reflection_pad3d_out_cpu) auto input = input_.contiguous(); if (batch_mode) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kHalf, input.scalar_type(), "reflection_pad3d_cpu", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_cpu", [&] { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); auto nbatch = input.size(0); @@ -986,8 +1003,8 @@ TORCH_IMPL_FUNC(reflection_pad3d_out_cpu) pad_front); }); } else { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kHalf, input.scalar_type(), "reflection_pad3d_cpu", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_cpu", [&] { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); reflection_pad3d_out_frame( @@ -1043,8 +1060,8 @@ TORCH_IMPL_FUNC(reflection_pad3d_backward_out_cpu)(const Tensor& grad_output, grad_input.zero_(); if (batch_mode) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kHalf, input.scalar_type(), "reflection_pad3d_backward_cpu", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_backward_cpu", [&] { reflection_pad3d_backward_out_loop( grad_input.data_ptr(), grad_output_.data_ptr(), @@ -1061,8 +1078,8 @@ TORCH_IMPL_FUNC(reflection_pad3d_backward_out_cpu)(const Tensor& grad_output, pad_front); }); } else { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kHalf, input.scalar_type(), "reflection_pad3d_backward_cpu", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_backward_cpu", [&] { reflection_pad3d_backward_out_frame( grad_input.data_ptr(), grad_output_.data_ptr(), diff --git a/aten/src/ATen/native/Repeat.cpp b/aten/src/ATen/native/Repeat.cpp index b6e5c04f7702..c8c4e134929f 100644 --- a/aten/src/ATen/native/Repeat.cpp +++ b/aten/src/ATen/native/Repeat.cpp @@ -1,8 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + template static void compute_cpu( index_t* repeat_ptr, @@ -64,11 +75,11 @@ Tensor repeat_interleave( } Tensor repeats_ = repeats; - if (repeats.dim() == 0 || (repeats.dim() == 1 && repeats.size(0) == 1)) { - repeats_ = repeats.reshape({1}).expand({input.size(dim.value())}); + if (repeats.dim() == 0 || (repeats.dim() == 1 && repeats.sym_size(0) == 1)) { + repeats_ = repeats.reshape({1}).expand_symint({input.sym_size(dim.value())}); } else if (repeats.dim() == 1) { TORCH_CHECK( - repeats.size(0) == input.size(dim.value()), + repeats.sym_size(0) == input.sym_size(dim.value()), "repeats must have the same size as input along dim") } else { AT_ERROR("repeats must be 0-dim or 1-dim tensor"); @@ -91,10 +102,17 @@ Tensor repeat_interleave( 
int64_t repeats, c10::optional dim, c10::optional output_size) { - at::Tensor repeats_ = - at::empty(1, self.options().dtype(at::kLong)).fill_(repeats); + at::Tensor repeats_ = at::empty(1, self.options().dtype(at::kLong)).fill_(repeats); return at::native::repeat_interleave(self, repeats_, dim, output_size); } +Tensor repeat_interleave_symint( + const Tensor& self, + c10::SymInt repeats, + c10::optional dim, + c10::optional output_size) { + return at::native::repeat_interleave(self, repeats.guard_int(__FILE__, __LINE__), dim, output_size); + } + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/ReplicationPadding.cpp b/aten/src/ATen/native/ReplicationPadding.cpp index 40fdb788a4ff..d0a4ea919acb 100644 --- a/aten/src/ATen/native/ReplicationPadding.cpp +++ b/aten/src/ATen/native/ReplicationPadding.cpp @@ -1,9 +1,24 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { diff --git a/aten/src/ATen/native/Resize.cpp b/aten/src/ATen/native/Resize.cpp index 08286f3983cc..bd47a25e6960 100644 --- a/aten/src/ATen/native/Resize.cpp +++ b/aten/src/ATen/native/Resize.cpp @@ -1,9 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/Resize.h b/aten/src/ATen/native/Resize.h index c6fe2b3d2146..0bed4232695a 100644 --- a/aten/src/ATen/native/Resize.h +++ b/aten/src/ATen/native/Resize.h @@ -83,20 +83,30 @@ inline TensorImpl* resize_impl_cpu_( return self; } +template +T maybe_convert_symint(c10::SymInt) = delete; + +template <> +inline c10::SymInt maybe_convert_symint(c10::SymInt x) { return x; } + +template <> +inline int64_t maybe_convert_symint(c10::SymInt x) { return x.expect_int(); } + +template static inline void checkInBoundsForStorage( - IntArrayRef size, - IntArrayRef stride, - int64_t storage_offset, + ArrayRef size, + ArrayRef stride, + T storage_offset, const caffe2::TypeMeta data_type, const Storage& new_storage) { - int64_t storage_size_bytes = + T storage_size_bytes = at::detail::computeStorageNbytes(size, stride, data_type.itemsize()); - int64_t storage_offset_bytes = storage_offset * data_type.itemsize(); + T storage_offset_bytes = storage_offset * data_type.itemsize(); if (storage_size_bytes == 0) { // NB: (a tensor with arbitrary 0 dims)'s storage can have any numel. 
return; } - int64_t new_storage_size_bytes = new_storage.nbytes(); + T new_storage_size_bytes = maybe_convert_symint(new_storage.sym_nbytes()); TORCH_CHECK( storage_size_bytes + storage_offset_bytes <= new_storage_size_bytes, "setStorage: sizes ", @@ -114,8 +124,9 @@ static inline void checkInBoundsForStorage( new_storage_size_bytes); } -static inline void checkSetStorage(Tensor& result, Storage storage, int64_t storage_offset, - IntArrayRef size, IntArrayRef stride) { +template +static inline void checkSetStorage(Tensor& result, Storage storage, T storage_offset, + ArrayRef size, ArrayRef stride) { // FIXME: stride should be optional if (stride.data()) { TORCH_CHECK(size.size() == stride.size(), "unequal size length (", size.size(), @@ -151,11 +162,12 @@ static inline void checkSetStorage(Tensor& result, Storage storage, int64_t stor * Set self's sizes, strides, and storage_offset. * (size, stride, storage_offset) must be in bounds for self's storage. */ +template inline void setStrided( const Tensor& self, - IntArrayRef size, - IntArrayRef stride, - int64_t storage_offset) { + ArrayRef size, + ArrayRef stride, + T storage_offset) { TORCH_CHECK(size.size() == stride.size(), "mismatch in length of strides and shape"); for (auto val : stride) { TORCH_CHECK(val >= 0, @@ -169,13 +181,7 @@ inline void setStrided( /* storage offset */ TORCH_CHECK(storage_offset >= 0, "Tensor: invalid storage offset ", storage_offset); - self_->set_storage_offset(storage_offset); - - /* size and stride */ - if (self_->sizes() == size && self_->strides() == stride) { - return; - } - self_->set_sizes_and_strides(size, stride); + self_->set_sizes_and_strides(size, stride, c10::make_optional(storage_offset)); } }} diff --git a/aten/src/ATen/native/ResizeCommon.h b/aten/src/ATen/native/ResizeCommon.h index e814a71c89a8..1de4d74b3af6 100644 --- a/aten/src/ATen/native/ResizeCommon.h +++ b/aten/src/ATen/native/ResizeCommon.h @@ -6,11 +6,12 @@ namespace at { namespace native { -inline int64_t storage_size_for(IntArrayRef size, IntArrayRef stride) { +template +inline T storage_size_for(ArrayRef size, ArrayRef stride) { TORCH_INTERNAL_ASSERT_DEBUG_ONLY(size.size() == stride.size(), "storage_size_for(size, stride) requires that size and stride ", "have the same size as a precondition."); - int64_t storage_size = 1; + T storage_size = 1; for (const auto dim : c10::irange(size.size())) { if (size[dim] == 0) { storage_size = 0; diff --git a/aten/src/ATen/native/RowwisePrune.cpp b/aten/src/ATen/native/RowwisePrune.cpp index 40ae2215cbcc..c27707c4d307 100644 --- a/aten/src/ATen/native/RowwisePrune.cpp +++ b/aten/src/ATen/native/RowwisePrune.cpp @@ -1,8 +1,17 @@ // Copyright 2004-present Facebook. All Rights Reserved. 
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/Scalar.cpp b/aten/src/ATen/native/Scalar.cpp index 7342c4806d44..f8932ea03bb2 100644 --- a/aten/src/ATen/native/Scalar.cpp +++ b/aten/src/ATen/native/Scalar.cpp @@ -1,5 +1,15 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/SegmentReduce.cpp b/aten/src/ATen/native/SegmentReduce.cpp index 3e562b7cf859..1e5e28dab86b 100644 --- a/aten/src/ATen/native/SegmentReduce.cpp +++ b/aten/src/ATen/native/SegmentReduce.cpp @@ -1,10 +1,23 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/SobolEngineOps.cpp b/aten/src/ATen/native/SobolEngineOps.cpp index 48366976a2e7..187faeba16a7 100644 --- a/aten/src/ATen/native/SobolEngineOps.cpp +++ b/aten/src/ATen/native/SobolEngineOps.cpp @@ -1,11 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/SobolEngineOpsUtils.cpp b/aten/src/ATen/native/SobolEngineOpsUtils.cpp index ef7cbb1faae9..709d5c06d3c9 100644 --- a/aten/src/ATen/native/SobolEngineOpsUtils.cpp +++ b/aten/src/ATen/native/SobolEngineOpsUtils.cpp @@ -1,4 +1,5 @@ /// This file contains tensor-agnostic SoboleEngine constants +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include /* diff --git a/aten/src/ATen/native/SobolEngineOpsUtils.h b/aten/src/ATen/native/SobolEngineOpsUtils.h index d3d7a362f2e8..495a43ed8a7c 100644 --- a/aten/src/ATen/native/SobolEngineOpsUtils.h +++ b/aten/src/ATen/native/SobolEngineOpsUtils.h @@ -1,6 +1,14 @@ /// This file contains some tensor-agnostic operations to be used in the /// core functions of the `SobolEngine` -#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/SoftMax.cpp b/aten/src/ATen/native/SoftMax.cpp index 21a94d5ed923..0332f57e9e23 100644 --- a/aten/src/ATen/native/SoftMax.cpp +++ b/aten/src/ATen/native/SoftMax.cpp @@ -1,13 +1,36 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include #include #include #include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include @@ -137,10 +160,8 @@ void host_softmax( if (MaskedSoftMax) { TORCH_CHECK(mask_type_.has_value(), "Mask Type should be defined"); int64_t mask_type = mask_type_.value(); - TORCH_CHECK((mask_type == 0) || (mask_type == 1), "Mask Type should be 0 (src_mask) or 1 (src_key_padding_mask)"); - - // TODO: Add support for TxT src_mask - TORCH_CHECK(mask_type != 0, "src_mask not currently supported 
on CPU"); + // If mask_type == 2, then mask_.sizes() must equal input_.sizes() + TORCH_CHECK((mask_type == 0) || (mask_type == 1) || (mask_type == 2), "Mask Type should be 0 (src_mask) or 1 (src_key_padding_mask), or 2 (default_mask)"); } int64_t outer_size = 1; @@ -170,8 +191,22 @@ void host_softmax( output_data_base + outer_idx * outer_stride + inner_idx; bool* mask_data = nullptr; if (MaskedSoftMax) { - mask_data = mask_data_base + outer_idx * outer_stride + inner_idx; - } + // Process mask differently depending on the type: + // For a generic mask of mask_type == 2, mask shape is the same as the input shape, + // so indexing is the same. + auto mask_outer_idx = outer_idx; + if (mask_type_ == 0) { + // Optimized case: attention mask of shape LxL + // outer_idx goes over BxHxL, mask_outer_idx goes over L. + mask_outer_idx = outer_idx % input.size(2); + } else if (mask_type_ == 1) { + // Optimized case: padding mask of shape BxL + // outer_idx goes over BxHxL, mask_outer_idx goes over B. + mask_outer_idx = outer_idx / (input.size(1) * input.size(2)); + } + + mask_data = mask_data_base + mask_outer_idx * outer_stride + inner_idx; + }; // Calc max in softmax dim bool is_meaningful_max = false; @@ -553,15 +588,48 @@ Tensor log_softmax(const Tensor& self, Dimname dim, optional dtype) } Tensor masked_softmax_cpu(const Tensor& input_, const Tensor& mask_, const c10::optional dim_, const c10::optional mask_type_) { - TORCH_CHECK( - input_.sizes() == mask_.sizes(), "Mask shape should match input shape"); + + auto mask = mask_.contiguous(); + auto mask_type = mask_type_; // Mask type might get transformed below + TORCH_CHECK( mask_.scalar_type() == ScalarType::Bool, "Mask should be a boolean tensor"); + if ((mask.dim() != 2) || (input_.dim() != 4)) { + // Mask types 0 and 1 are only allowed for 2D masks and 4D inputs + mask_type = 2; + } + + if (mask_type == 2) { + TORCH_CHECK(input_.sizes() == mask.sizes(), + "For mask_type == 2 mask shape should match input shape") + } else if (mask_type == 1) { + // Padding mask of shape (B, L) + TORCH_CHECK((input_.sizes()[0] == mask.sizes()[0]) && (input_.sizes()[2] == mask.sizes()[1]), + "For mask_type == 1 mask shape should be (B, L)"); + if (dim_ != input_.dim() - 1) { + // We only process padding mask in the optimized way if softmax is applied along the last dimesion, + // otherwise we need to expand the mask into a generic 4D one + mask = mask_.view({input_.sizes()[0], 1, 1, input_.sizes()[2]}); + mask = mask.expand(input_.sizes()).contiguous(); + mask_type = 2; + } + } else if (mask_type == 0) { + // Attention mask of shape (L, L) + TORCH_CHECK((mask.dim() == 2) && (input_.sizes()[2] == mask.sizes()[0]) && (input_.sizes()[2] == mask.sizes()[1]), + "For mask_type == 0 mask shape should be (L, L)"); + if (dim_ != input_.dim() - 1) { + // We only process attention mask in a optimized way if softmax is applied along the last dimesion, + // otherwise we need to expand the mask into a generic 4D one + mask = mask.view({1, 1, input_.sizes()[2], input_.sizes()[2]}); + mask = mask.expand(input_.sizes()).contiguous(); + mask_type = 2; + } + } + Tensor output = at::empty_like(input_, input_.options()); auto input = input_.contiguous(); - auto mask = mask_.contiguous(); int64_t dim = dim_.has_value() ? 
dim_.value() : input.dim() - 1; dim = maybe_wrap_dim(dim, input_.dim()); @@ -575,7 +643,7 @@ Tensor masked_softmax_cpu(const Tensor& input_, const Tensor& mask_, const c10:: scalar_t, false /* LogSoftMax */, true /* MaskedSoftMax */>( - output, input, dim, mask.data_ptr(), mask_type_); + output, input, dim, mask.data_ptr(), mask_type); }); return output; } diff --git a/aten/src/ATen/native/Sorting.cpp b/aten/src/ATen/native/Sorting.cpp index fb4bdd87b7a7..3b50d7744aa2 100644 --- a/aten/src/ATen/native/Sorting.cpp +++ b/aten/src/ATen/native/Sorting.cpp @@ -1,8 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include #include #include +#include +#include +#include +#include +#include #include #include #include @@ -11,6 +19,32 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { @@ -227,7 +261,7 @@ Tensor quantile_compute( // synchronizing an accelerator with the CPU if (self.device().is_cpu()) { auto all_q_in_range = q.ge(0).logical_and_(q.le(1)).all(); - TORCH_CHECK(at::equal(all_q_in_range, all_q_in_range.new_ones({})), + TORCH_CHECK(at::is_scalar_tensor_true(all_q_in_range), "quantile() q values must be in the range [0, 1]"); } diff --git a/aten/src/ATen/native/SpectralOps.cpp b/aten/src/ATen/native/SpectralOps.cpp index d6389608a9e3..124c2d06d9e8 100644 --- a/aten/src/ATen/native/SpectralOps.cpp +++ b/aten/src/ATen/native/SpectralOps.cpp @@ -1,15 +1,67 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include -#include -#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include -#include -#include namespace at { namespace native { @@ -147,7 +199,7 @@ Tensor fft_c2r(c10::string_view function_name, " expects a floating point output tensor, but got ", out.scalar_type()); input = promote_tensor_fft(input, /*require_complex=*/true); const auto input_dim = input.dim(); - const auto dim = maybe_wrap_dim(unwrapped_dim, input_dim); + const auto dim = maybe_wrap_dim(unwrapped_dim, input_dim, /*wrap_scalar=*/false); const auto n = n_opt.value_or(2*(input.sizes()[dim] - 1)); TORCH_CHECK(n >= 1, "Invalid number of data points (", n, ") specified"); if (n_opt) { @@ -156,7 +208,7 @@ Tensor fft_c2r(c10::string_view function_name, const auto norm = norm_from_string(norm_str, forward); if (forward) { // FIXME: _fft does not support complex_output=false with inverse=false - input = at::conj(input); + input = input.conj(); } return fft_c2r_maybe_out( function_name, out, input, dim, static_cast(norm), n); @@ -173,7 +225,7 @@ Tensor fft_r2c(c10::string_view function_name, " expects a complex output tensor, but got ", out.scalar_type()); input = promote_tensor_fft(input); const auto input_dim = input.dim(); - const auto dim = 
maybe_wrap_dim(unwrapped_dim, input_dim); + const auto dim = maybe_wrap_dim(unwrapped_dim, input_dim, /*wrap_scalar=*/false); const auto n = n_opt.value_or(input.sizes()[dim]); TORCH_CHECK(n >= 1, "Invalid number of data points (", n, ") specified"); if (n_opt) { @@ -191,7 +243,7 @@ Tensor fft_r2c(c10::string_view function_name, if (!forward) { // FIXME: _fft_r2c doesn't support native r2c IFFT - return out.defined() ? at::conj_physical_out(out, ret) : at::conj(ret); + return out.defined() ? at::conj_physical_out(out, ret) : ret.conj(); } else { return ret; } @@ -205,7 +257,7 @@ Tensor fft_c2c(c10::string_view function_name, TORCH_CHECK(input.is_complex(), function_name, " expects a complex input tensor, but got ", input.scalar_type()); const auto input_dim = input.dim(); - const auto dim = maybe_wrap_dim(unwrapped_dim, input_dim); + const auto dim = maybe_wrap_dim(unwrapped_dim, input_dim, /*wrap_scalar=*/false); const auto n = n_opt.value_or(input.sizes()[dim]); TORCH_CHECK(n >= 1, "Invalid number of data points (", n, ") specified"); if (n_opt) { @@ -232,7 +284,7 @@ ShapeAndDims canonicalize_fft_shape_and_dim_args( if (dim) { ret.dim.resize(dim->size()); std::copy(dim->begin(), dim->end(), ret.dim.begin()); - maybe_wrap_dims(ret.dim, input_dim); + maybe_wrap_dims(ret.dim, input_dim, /*wrap_scalars=*/false); // Check dims are unique DimVector copy = ret.dim; @@ -520,7 +572,7 @@ static Tensor fft_hfftn_impl( } const auto last_dim = desc.dim.back(); - tmp = at::conj(tmp); + tmp = tmp.conj(); return fft_c2r_maybe_out(fname, out, tmp, last_dim, norm, last_dim_size); } @@ -558,7 +610,7 @@ static Tensor fft_ihfftn_impl( const auto last_dim = desc.dim.back(); auto tmp = at::_fft_r2c(x, last_dim, norm, /*onesided=*/true); if (desc.dim.size() == 1) { - return out.defined() ? at::conj_physical_out(tmp, out) : at::conj(tmp); + return out.defined() ? at::conj_physical_out(tmp, out) : tmp.conj(); } tmp = at::conj_physical(tmp); @@ -698,7 +750,7 @@ DimVector default_alldims(const Tensor& self, at::OptionalIntArrayRef dim_opt) { IntArrayRef dim_unwrapped = *dim_opt; dim.resize(dim_unwrapped.size()); for (const auto i : c10::irange(dim.size())) { - dim[i] = maybe_wrap_dim(dim_unwrapped[i], self.dim()); + dim[i] = maybe_wrap_dim(dim_unwrapped[i], self.dim(), /*wrap_scalars=*/false); } } else { dim.resize(self.dim()); @@ -796,20 +848,17 @@ Tensor stft(const Tensor& self, const int64_t n_fft, const optional hop const bool return_complex = return_complexOpt.value_or( self.is_complex() || (window.defined() && window.is_complex())); if (!return_complex) { - if (!return_complexOpt.has_value()) { - TORCH_WARN_ONCE( - "stft will soon require the return_complex parameter be given for real inputs, " - "and will further require that return_complex=True in a future PyTorch release." - ); - } + TORCH_CHECK(return_complexOpt.has_value(), + "stft requires the return_complex parameter be given for real inputs, " + "and will further require that return_complex=True in a future PyTorch release."); - // TORCH_WARN_ONCE( - // "stft with return_complex=False is deprecated. In a future pytorch " - // "release, stft will return complex tensors for all inputs, and " - // "return_complex=False will raise an error.\n" - // "Note: you can still call torch.view_as_real on the complex output to " - // "recover the old return format."); + TORCH_WARN_ONCE( + "stft with return_complex=False is deprecated. 
In a future pytorch " + "release, stft will return complex tensors for all inputs, and " + "return_complex=False will raise an error.\n" + "Note: you can still call torch.view_as_real on the complex output to " + "recover the old return format."); } if (!at::isFloatingType(self.scalar_type()) && !at::isComplexType(self.scalar_type())) { @@ -973,12 +1022,10 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho const auto hop_length = hop_lengthOpt.value_or(n_fft >> 2); const auto win_length = win_lengthOpt.value_or(n_fft); - if (!self.is_complex()) { - TORCH_WARN_ONCE( - "istft will require a complex-valued input tensor in a future PyTorch release. " - "Matching the output from stft with return_complex=True. "); - } - Tensor input = self.is_complex() ? self.is_conj() ? at::view_as_real(self.resolve_conj()) : at::view_as_real(self) : self; + TORCH_CHECK(self.is_complex(), + "istft requires a complex-valued input tensor matching the " + "output from stft with return_complex=True."); + Tensor input = at::view_as_real(self.resolve_conj()); const auto input_dim = input.dim(); const auto n_frames = input.size(-2); const auto fft_size = input.size(-3); @@ -1006,13 +1053,13 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho if (onesided) { if (n_fft / 2 + 1 != fft_size) { std::ostringstream ss; - REPR(ss) << ": expected the frequency dimension (3rd to the last) of the input tensor to match n_fft / 2 + 1 when onsided=True, but got " << fft_size; + REPR(ss) << ": expected the frequency dimension (3rd to the last) of the input tensor to match n_fft / 2 + 1 when onesided=True, but got " << fft_size; AT_ERROR(ss.str()); } } else { if (n_fft != fft_size) { std::ostringstream ss; - REPR(ss) << ": expected the frequency dimension (3rd to the last) of the input tensor to match n_fft when onsided=False, but got " << fft_size; + REPR(ss) << ": expected the frequency dimension (3rd to the last) of the input tensor to match n_fft when onesided=False, but got " << fft_size; AT_ERROR(ss.str()); } } @@ -1048,7 +1095,7 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho input = input.unsqueeze(0); } - input = as_complex(input.transpose(1, 2)); // size: (channel, n_frames, fft_size, 2) + input = as_complex(input.transpose(1, 2)); // size: (channel, n_frames, fft_size) const fft_norm_mode norm = normalized ? 
fft_norm_mode::by_root_n : fft_norm_mode::by_n; if (return_complex) { @@ -1065,26 +1112,23 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho TORCH_INTERNAL_ASSERT(input.size(2) == n_fft); Tensor y_tmp = input * window_tmp.view({1, 1, n_fft}); // size: (channel, n_frames, n_fft) - y_tmp = y_tmp.transpose(1, 2); // size: (channel, n_fft, frame) - - Tensor y = at::col2im(y_tmp, - /*output_size*/ {1, (n_frames - 1) * hop_length + n_fft}, - /*kernel_size*/ {1, n_fft}, - /*dilation*/ {1, 1}, - /*padding*/ {0, 0}, - /*stride*/ {1, hop_length} - ).squeeze(2); - window_tmp = window_tmp.pow(2).view({n_fft, 1}).repeat({1, n_frames}).unsqueeze(0); // size: (1, n_fft, n_frames) - Tensor window_envelop = at::col2im(window_tmp, - /*output_size*/ {1, (n_frames - 1) * hop_length + n_fft}, - /*kernel_size*/ {1, n_fft}, - /*dilation*/ {1, 1}, - /*padding*/ {0, 0}, - /*stride*/ {1, hop_length} - ).squeeze(2); // size: (1, 1, expected_output_signal_len) - - TORCH_INTERNAL_ASSERT(expected_output_signal_len == y.size(2)); - TORCH_INTERNAL_ASSERT(expected_output_signal_len == window_envelop.size(2)); + + Tensor y = at::unfold_backward( + y_tmp, + /*input_sizes=*/{y_tmp.size(0), expected_output_signal_len}, + /*dim=*/1, + /*size=*/n_fft, + /*step=*/hop_length); + window_tmp = window_tmp.pow(2).expand({1, n_frames, n_fft}); // size: (1, n_frames, n_fft) + Tensor window_envelop = at::unfold_backward( + window_tmp, + /*input_sizes=*/{1, expected_output_signal_len}, + /*dim=*/1, + /*size=*/n_fft, + /*step=*/hop_length); // size: (1, expected_output_signal_len) + + TORCH_INTERNAL_ASSERT(expected_output_signal_len == y.size(1)); + TORCH_INTERNAL_ASSERT(expected_output_signal_len == window_envelop.size(1)); // We need to trim the front padding away if centered const auto start = center ? 
n_fft / 2 : 0; @@ -1098,16 +1142,16 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho return expected_output_signal_len; }(); - y = y.slice(2, start, end, 1); - window_envelop = window_envelop.slice(2, start, end, 1); - const auto window_envelop_lowest = window_envelop.abs().min().item().toDouble(); - if (window_envelop_lowest < 1e-11) { + y = y.slice(1, start, end, 1); + window_envelop = window_envelop.slice(1, start, end, 1); + const auto window_envelop_lowest = window_envelop.abs().min().lt(1e-11); + if (at::is_scalar_tensor_true(window_envelop_lowest)) { std::ostringstream ss; REPR(ss) << "window overlap add min: " << window_envelop_lowest; AT_ERROR(ss.str()); } - y = (y / window_envelop).squeeze(1); // size: (channel, expected_output_signal_len) + y = (y / window_envelop); // size: (channel, expected_output_signal_len) if (input_dim == 3) { y = y.squeeze(0); } @@ -1121,7 +1165,7 @@ Tensor istft(const Tensor& self, const int64_t n_fft, const optional ho } return y; - #undef REPR +#undef REPR } Tensor istft(const Tensor& self, const int64_t n_fft, const optional hop_lengthOpt, @@ -1138,7 +1182,7 @@ void _fft_fill_with_conjugate_symmetry_(const Tensor& input, IntArrayRef dim_) { const auto input_strides = input.strides(); TORCH_CHECK(dim_.size() > 0); DimVector dim(dim_.begin(), dim_.end()); - at::maybe_wrap_dims(dim, input_strides.size()); + at::maybe_wrap_dims(dim, input_strides.size(), /*wrap_scalars=*/false); if (input.numel() == 0 || input_sizes[dim.back()] <= 2) { return; // No elements need writing diff --git a/aten/src/ATen/native/SpmmReduce.cpp b/aten/src/ATen/native/SpmmReduce.cpp deleted file mode 100644 index cdbce3fe4b36..000000000000 --- a/aten/src/ATen/native/SpmmReduce.cpp +++ /dev/null @@ -1,32 +0,0 @@ -#include -#include -#include - -namespace at { namespace native { - -Tensor spmm_sum_cpu( - const Tensor& rowptr, - const Tensor& col, - const c10::optional& optional_value, - const Tensor& mat) { - TORCH_CHECK(rowptr.dim() == 1); - TORCH_CHECK(col.dim() == 1); - if (optional_value.has_value()) { - TORCH_CHECK(optional_value.value().dim() == 1); - TORCH_CHECK(optional_value.value().size(0) == col.size(0)); - } - TORCH_CHECK(mat.dim() >= 2); - - Tensor other = mat.contiguous(); - - auto sizes = other.sizes().vec(); - sizes[other.dim() - 2] = rowptr.numel() - 1; - Tensor result = at::empty(sizes, other.options()); - spmm_sum_stub(kCPU, result, rowptr, col, optional_value, other); - - return result; -} - -DEFINE_DISPATCH(spmm_sum_stub); - -}} // at::native diff --git a/aten/src/ATen/native/SpmmReduce.h b/aten/src/ATen/native/SpmmReduce.h deleted file mode 100644 index ac34bf0090de..000000000000 --- a/aten/src/ATen/native/SpmmReduce.h +++ /dev/null @@ -1,12 +0,0 @@ -#pragma once - -#include -#include - -namespace at { namespace native { - -using spmm_sum_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const c10::optional&, const Tensor&); -DECLARE_DISPATCH(spmm_sum_fn, spmm_sum_stub); - -}} // at::native - diff --git a/aten/src/ATen/native/SummaryOps.cpp b/aten/src/ATen/native/SummaryOps.cpp index cf86225460ea..ae0b38c96efa 100644 --- a/aten/src/ATen/native/SummaryOps.cpp +++ b/aten/src/ATen/native/SummaryOps.cpp @@ -1,10 +1,17 @@ // Returns the frequency of elements of input non-negative integer tensor. 
+#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include +#include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { namespace native { @@ -20,15 +27,15 @@ Tensor _bincount_cpu_template( AT_ERROR("minlength should be >= 0"); } if (self.dim() == 1 && self.numel() == 0) { - return native::zeros({minlength}, kLong); + return at::zeros({minlength}, kLong); } if (self.dim() != 1 || *self.min().data_ptr() < 0) { AT_ERROR("bincount only supports 1-d non-negative integral inputs."); } bool has_weights = weights.defined(); - if (has_weights && weights.size(0) != self.size(0)) { - AT_ERROR("input and weights should have the same length"); + if (has_weights && (weights.dim() != 1 || weights.size(0) != self.size(0))) { + AT_ERROR("weights should be 1-d and have the same length as input"); } Tensor output; @@ -38,7 +45,7 @@ Tensor _bincount_cpu_template( const input_t* self_p = self.data_ptr(); if (has_weights) { - output = native::zeros( + output = at::zeros( {nbins}, optTypeMetaToScalarType(weights.options().dtype_opt()), weights.options().layout_opt(), @@ -50,7 +57,7 @@ Tensor _bincount_cpu_template( output_p[self_p[i]] += weights_p[i]; } } else { - output = native::zeros({nbins}, kLong); + output = at::zeros({nbins}, kLong); int64_t* output_p = output.data_ptr(); for (const auto i : c10::irange(self_size)) { output_p[self_p[i]] += 1L; diff --git a/aten/src/ATen/native/TensorAdvancedIndexing.cpp b/aten/src/ATen/native/TensorAdvancedIndexing.cpp index 951d9eeb18fa..7d23413c6560 100644 --- a/aten/src/ATen/native/TensorAdvancedIndexing.cpp +++ b/aten/src/ATen/native/TensorAdvancedIndexing.cpp @@ -47,31 +47,93 @@ // ...) // // where & and * represent the C-style address-of and indirection operations. 
+// #define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include -#include -#include +#include +#include +#include +#include #include #include -#include -#include -#include +#include +#include +#include +#include +#include +#include +#include #include #include #include #include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include #include -#include #include #include @@ -416,6 +478,7 @@ DEFINE_DISPATCH(put_stub); DEFINE_DISPATCH(take_stub); DEFINE_DISPATCH(masked_fill_stub); REGISTER_NO_CPU_DISPATCH(index_put_with_sort_stub); +REGISTER_NO_CPU_DISPATCH(index_put_with_sort_quantized_stub); DEFINE_DISPATCH(masked_select_serial_stub); DEFINE_DISPATCH(masked_select_stub); DEFINE_DISPATCH(masked_scatter_stub); @@ -428,6 +491,10 @@ DEFINE_DISPATCH(scatter_reduce_stub); DEFINE_DISPATCH(scatter_scalar_reduce_stub); DEFINE_DISPATCH(scatter_reduce_two_stub); +DEFINE_DISPATCH(scatter_add_expanded_index_stub); +DEFINE_DISPATCH(scatter_reduce_expanded_index_stub); +DEFINE_DISPATCH(gather_expanded_index_stub); + static bool all_strides_match(TensorList tensors) { TORCH_CHECK(tensors.size() >= 1); auto strides = tensors[0].strides(); @@ -521,9 +588,9 @@ AdvancedIndex::AdvancedIndex(const Tensor& src, TensorList indices_list) } } - // For CUDA tensors, force all index tensors to have the same striding to - // simplify the CUDA kernel. - if (indices.size() >= 2 && this->src.device().type() == kCUDA) { + // For CUDA/MPS tensors, force all index tensors to have the same striding to + // simplify the CUDA/MPS kernel. 
+ if (indices.size() >= 2 && (this->src.device().type() == kCUDA || this->src.device().type() == kMPS)) { if (!all_strides_match(indices)) { for (auto & indice : indices) { indice = indice.contiguous(); @@ -1095,8 +1162,6 @@ Tensor & index_select_out_cpu_(const Tensor & self, int64_t dim, const Tensor & TORCH_CHECK(index.scalar_type() == ScalarType::Long || index.scalar_type() == ScalarType::Int, "index_select(): Expected dtype int32 or int64 for index"); TORCH_CHECK(self.scalar_type() == result.scalar_type(), "index_select(): self and result must have the same scalar type"); - TORCH_CHECK(dim == 0 || dim < self.dim(), - "index_select(): Indexing dim ", dim, " is out of bounds of tensor"); at::assert_no_internal_overlap(result); at::assert_no_overlap(result, self); at::assert_no_overlap(result, index); @@ -1258,13 +1323,17 @@ Tensor index_select_quantized_cpu_(const Tensor & self, int64_t dim, const Tenso return at::native::index_select_out_cpu_(self, dim, index, result); } -Tensor index_select_backward(const Tensor& grad, IntArrayRef self_sizes, int64_t dim, const Tensor& index) { +Tensor index_select_backward(const Tensor& grad, at::IntArrayRef self_sizes, int64_t dim, const Tensor& index) { + return at::native::index_select_backward_symint(grad, c10::fromIntArrayRefSlow(self_sizes), dim, index); +} + +Tensor index_select_backward_symint(const Tensor& grad, c10::SymIntArrayRef self_sizes, int64_t dim, const Tensor& index) { // for composite compliance, use out-of-place variant of // `index_add` if index tensor is a Tensor Subclass. if (isTensorSubclassLike(index)) { - return grad.new_zeros(self_sizes, grad.options()).index_add(dim, index, grad); + return grad.new_zeros_symint(self_sizes, grad.options()).index_add(dim, index, grad); } - return grad.new_zeros(self_sizes, grad.options()).index_add_(dim, index, grad); + return grad.new_zeros_symint(self_sizes, grad.options()).index_add_(dim, index, grad); } Tensor & index_fill_(Tensor & self, int64_t dim, const Tensor & index, const Scalar& source) { @@ -1359,14 +1428,18 @@ TORCH_IMPL_FUNC(gather_out) (const Tensor& self, int64_t dim, const Tensor& index, bool sparse_grad, const Tensor& result) { if (index.numel() == 0) return; dim = at::maybe_wrap_dim(dim, self.dim()); - gather_stub(result.device().type(), result, self, dim, index); + if (can_use_expanded_index_path(result, dim, index, self)) { + gather_expanded_index_stub(result.device().type(), result, self, index); + } else { + gather_stub(result.device().type(), result, self, dim, index); + } } Tensor gather_backward(const Tensor& grad, const Tensor& self, int64_t dim, const Tensor& index, bool sparse_grad) { if (sparse_grad) { return at::_gather_sparse_backward(self, dim, index, grad); } - auto result = grad.new_zeros(self.sizes()); + auto result = grad.new_zeros_symint(self.sym_sizes()); // for composite compliance, use out-of-place variant of // `scatter_add` if index tensor is a Tensor Subclass. 
if (isTensorSubclassLike(index)) { @@ -1504,18 +1577,107 @@ TORCH_IMPL_FUNC(scatter_add) if (index.numel() == 0) return; - if (globalContext().deterministicAlgorithms() && self.device().type() == DeviceType::CUDA && self.dim() == 1) { - TORCH_CHECK(index.dim() == 1 && src.dim() == 1, "index and src should be 1D tensors when self is a 1D tensor, " - "but their dims are ", index.dim(), " and ", src.dim(), ", respectively"); - TORCH_CHECK(index.numel() == src.numel(), "index and src should have same number of elements for 1D tensors, " - "but got ", index.numel(), " versus ", src.numel()); - TORCH_CHECK(dim == 0, "dim should be zero for 1D self tensor, but got ", dim); - torch::List> indices; - indices.reserve(1); - indices.push_back(index); - mut_out.index_put_(indices, src, true); + // See Note [Enabling Deterministic Operations] + // Avoid gpuAtomicAdd for CUDA if deterministic mode is turned on + if (globalContext().deterministicAlgorithms() && self.device().type() == DeviceType::CUDA) { + if (self.dim() == 1) { + // TODO: Pretty sure these checks can be removed, since they're done in + // `scatter_meta_impl`, which I think is always called before this + TORCH_CHECK(index.dim() == 1 && src.dim() == 1, "index and src should be 1D tensors when self is a 1D tensor, " + "but their dims are ", index.dim(), " and ", src.dim(), ", respectively"); + TORCH_CHECK(index.numel() == src.numel(), "index and src should have same number of elements for 1D tensors, " + "but got ", index.numel(), " versus ", src.numel()); + TORCH_CHECK(dim == 0, "dim should be zero for 1D self tensor, but got ", dim); + torch::List> indices; + indices.reserve(1); + indices.push_back(index); + mut_out.index_put_(indices, src, true); + } else { + Tensor mut_out_contig = mut_out.contiguous(); + + auto index_coords_sizes = index.sizes().vec(); + index_coords_sizes.push_back(self.dim()); + auto index_coords = at::empty( + index_coords_sizes, + at::TensorOptions().dtype(at::ScalarType::Long).device(self.device())); + + for (int64_t dim_other = 0; dim_other < self.dim(); dim_other++) { + if (dim_other == dim) { + continue; + } + auto dim_coord_vals = at::arange( + index.size(dim_other), + at::TensorOptions().device(self.device())); + + for (int64_t dim_unsqueeze = 0; dim_unsqueeze < self.dim() - 1; dim_unsqueeze++) { + dim_coord_vals = dim_coord_vals.unsqueeze((dim_unsqueeze >= dim_other) ? -1 : 0); + } + + auto view_sizes = index.sizes().vec(); + view_sizes.push_back(1); + auto view_strides = index_coords.strides().vec(); + view_strides[self.dim()] = self.dim(); + + at::as_strided( + index_coords, + view_sizes, + view_strides, + dim_other + ).copy_(dim_coord_vals.unsqueeze(-1)); + } + + auto view_sizes = index.sizes().vec(); + view_sizes.push_back(1); + auto view_strides = index_coords.strides().vec(); + view_strides[self.dim()] = self.dim(); + + at::as_strided( + index_coords, + view_sizes, + view_strides, + dim + ).copy_(index.unsqueeze(-1)); + + Tensor index_coords_flat = index_coords.flatten(0, -2); + + // Copy mut_out_contig's strides into a tensor + // TODO: Is there a utility function that already does this? 
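The 1-D branch above rewrites scatter_add as an accumulating index_put_; a small self-contained sketch of that equivalence (illustrative values, public API only, run on CPU for simplicity):

#include <iostream>
#include <torch/torch.h>

int main() {
  auto self  = torch::zeros({5});
  auto index = torch::tensor({0, 1, 1, 4}, torch::kLong);
  auto src   = torch::tensor({1.f, 2.f, 3.f, 4.f});
  // Reference result via scatter_add_.
  auto a = self.clone().scatter_add_(0, index, src);
  // Same accumulation expressed as index_put_ with accumulate=true,
  // which is what the deterministic CUDA path above relies on.
  c10::List<c10::optional<torch::Tensor>> indices;
  indices.push_back(index);
  auto b = self.clone().index_put_(indices, src, /*accumulate=*/true);
  std::cout << a << "\n" << b << "\n";  // both: [1, 5, 0, 0, 4]
}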
+ IntArrayRef mut_out_contig_strides = mut_out_contig.strides(); + Tensor coord_strides = at::empty( + {mut_out_contig.dim()}, + TensorOptions().dtype(at::ScalarType::Long).device(at::kCPU)); + std::memcpy( + coord_strides.data_ptr(), + mut_out_contig_strides.data(), + coord_strides.nbytes()); + coord_strides = coord_strides.to(mut_out_contig.device()); + + // `index_flat` contains the 1-D indices corresponding with the + // flattened `mut_out` + Tensor index_flat = (index_coords_flat * coord_strides).sum({-1}); + Tensor mut_out_flat = mut_out_contig.flatten(); + Tensor src_flat = at::as_strided( + src, + index.sizes(), + src.strides() + ).flatten(); + + torch::List> indices; + indices.reserve(1); + indices.push_back(index_flat); + + mut_out_flat.index_put_(indices, src_flat, true); + + if (!mut_out.is_contiguous()) { + mut_out.copy_(mut_out_flat.reshape(mut_out.sizes())); + } + } } else { - scatter_add_stub(self.device().type(), mut_out, dim, index, src); + if (can_use_expanded_index_path(mut_out, dim, index, src)) { + scatter_add_expanded_index_stub(self.device().type(), mut_out, index, src); + } else { + scatter_add_stub(self.device().type(), mut_out, dim, index, src); + } } } @@ -1530,13 +1692,27 @@ TORCH_IMPL_FUNC(scatter_reduce_two) // See issue https://github.com/pytorch/pytorch/issues/74770 TORCH_WARN_ONCE("scatter_reduce() is in beta and the API may change at any time."); + dim = at::maybe_wrap_dim(dim, self.dim()); + auto mut_out = const_cast(out); + + if (!self.is_same(mut_out)) { + mut_out.copy_(self); + } + + const auto op = meta::get_operator_enum(reduce, true); + + if (can_use_expanded_index_path(mut_out, dim, index, src)) { + scatter_reduce_expanded_index_stub(self.device().type(), mut_out, index, src, op, include_self); + return; + } + scatter_impl(self, dim, index, src, out, scatter_reduce_two_stub, scatter_stub, reduce, include_self); - if (meta::get_operator_enum(reduce, true) == SCATTER_GATHER_OP::REDUCE_MEAN) { + if (op == SCATTER_GATHER_OP::REDUCE_MEAN) { auto ones = at::ones_like(src); auto count = include_self ? 
at::ones_like(out) : at::zeros_like(out); count.scatter_add_(dim, index, ones); diff --git a/aten/src/ATen/native/TensorAdvancedIndexing.h b/aten/src/ATen/native/TensorAdvancedIndexing.h index a0c282d550e4..01ae7edf036a 100644 --- a/aten/src/ATen/native/TensorAdvancedIndexing.h +++ b/aten/src/ATen/native/TensorAdvancedIndexing.h @@ -5,6 +5,7 @@ #include #include #include +#include namespace at { struct TensorIterator; @@ -15,7 +16,7 @@ namespace at { namespace native { enum class SCATTER_GATHER_OP: uint8_t {REDUCE_ADD, REDUCE_MULTIPLY, REDUCE_MAXIMUM, REDUCE_MINIMUM, REDUCE_MEAN}; using index_put_with_sort_fn = void(*)(Tensor &, const c10::List> &, const Tensor &, bool accumulate, bool unsafe); - +using index_put_with_sort_quantized_fn = void(*)(Tensor& self, const c10::List>& indices, const Tensor& value, double scale, int zero_point, bool unsafe); using gather_fn = void (*)(const Tensor & result, const Tensor & self, int64_t dim, const Tensor & index); using scatter_fn = void(*)(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src); using scatter_fill_fn = void(*)(const Tensor& self, int64_t dim, const Tensor& index, const Scalar& src); @@ -28,7 +29,7 @@ using scatter_reduce_two_fn = void(*)(const Tensor& self, const int64_t dim, con const Tensor& src, const SCATTER_GATHER_OP& reduce); DECLARE_DISPATCH(index_put_with_sort_fn, index_put_with_sort_stub); - +DECLARE_DISPATCH(index_put_with_sort_quantized_fn, index_put_with_sort_quantized_stub); DECLARE_DISPATCH(gather_fn, gather_stub); DECLARE_DISPATCH(scatter_fn, scatter_stub); DECLARE_DISPATCH(scatter_fill_fn, scatter_fill_stub); @@ -39,4 +40,50 @@ DECLARE_DISPATCH(scatter_reduce_two_fn, scatter_reduce_two_stub); TORCH_API Tensor& index_out(Tensor& result, const Tensor & self, const c10::List>& indices); +// fast paths for GNN usage +template +bool can_use_expanded_index_path(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src) { + if (!self.device().is_cpu()) { return false; } + + const auto st = self.scalar_type(); + if (!(st == ScalarType::Float || st == ScalarType::Double || st == ScalarType::BFloat16)) { return false; } + + if (!is_radix_sort_available()) { return false; } + + // skip when having empty tensor + if (self.numel() == 0 || index.numel() == 0 || src.numel() == 0) { return false; } + + // skip when having scalar tensor + if (self.ndimension() == 0 || index.ndimension() == 0 || src.ndimension() == 0) { return false; } + + if (is_scatter_like) { + // using `spmm` for scatter would require sorting on index, + // this is only perf beneficial when the inner dimension, aka, `channels` + // is big enough. + constexpr int64_t threshold = 16; + if (index.numel() / index.size(0) < threshold) { return false; } + } + + // usually the expanded index has stride on the first dimension to be 1, + // and strides on other dims to be 0 or 1, e.g. 
+ // shape [108365, 16]; strides [1, 0] + // shape [13264, 1, 7]; strides [1, 1, 0] + auto index_strides = index.strides().vec(); + bool is_index_expanded = index_strides[0] == 1; + for (const auto dim : c10::irange(1, index_strides.size())) { + if (index_strides[dim] > 1) { is_index_expanded = false; } + } + + // index is expanded + return dim == 0 && is_index_expanded && src.is_contiguous() && self.is_contiguous(); +} + +using scatter_add_expanded_index_fn = void(*)(const Tensor&, const Tensor&, const Tensor&); +using scatter_reduce_expanded_index_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const SCATTER_GATHER_OP& reduce, bool); +using gather_expanded_index_fn = void (*)(const Tensor&, const Tensor&, const Tensor&); + +DECLARE_DISPATCH(scatter_add_expanded_index_fn, scatter_add_expanded_index_stub); +DECLARE_DISPATCH(scatter_reduce_expanded_index_fn, scatter_reduce_expanded_index_stub); +DECLARE_DISPATCH(gather_expanded_index_fn, gather_expanded_index_stub); + }} // namespace at::native diff --git a/aten/src/ATen/native/TensorAdvancedIndexingUtils.h b/aten/src/ATen/native/TensorAdvancedIndexingUtils.h index 8ffff8b6e912..0c0db4b83f35 100644 --- a/aten/src/ATen/native/TensorAdvancedIndexingUtils.h +++ b/aten/src/ATen/native/TensorAdvancedIndexingUtils.h @@ -1,5 +1,5 @@ #pragma once -#include +#include #include #include @@ -57,7 +57,7 @@ const Tensor& value){ } static AdvancedIndex make_info(Tensor self, IOptTensorListRef orig) { - checkIndexTensorTypes(orig); + checkIndexTensorTypes(orig, /*allow_int*/ true); // first expand BoolTensor (masks) or ByteTensor (masks) into 1 or more LongTensors auto indices = expandTensors(self, orig); // next broadcast all index tensors together @@ -82,6 +82,12 @@ static AdvancedIndex make_info(Tensor self, IOptTensorListRef orig) { indice = indice.to(self.device()); } } + for (auto & indice : indices) { + if (indice.defined() && indice.dtype() == at::kInt) { + indice = indice.to(at::kLong); + } + } + return AdvancedIndex(self, indices); } diff --git a/aten/src/ATen/native/TensorCompare.cpp b/aten/src/ATen/native/TensorCompare.cpp index 1ce3e32377d8..5d3ee7d98d80 100644 --- a/aten/src/ATen/native/TensorCompare.cpp +++ b/aten/src/ATen/native/TensorCompare.cpp @@ -1,19 +1,73 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include -#include -#include -#include +#include +#include +#include +#include +#include +#include #include +#include #include #include -#include -#include -#include #include -#include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace meta { @@ -399,8 +453,19 @@ static void isin_sorting( } } +template +Device out_device(Args&... 
inps){ + for (const auto& i : {inps...}){ + if (i.device() != at::kCPU) { + return i.device(); + } + } + return at::kCPU; +} + + Tensor& where_self_out(const Tensor& condition, const Tensor& self, const Tensor& other, Tensor& out) { - Tensor self_, other_; + Tensor self_, other_, condition_; if (self.dtype() != other.dtype()) { auto result_type = at::native::result_type(self, other); self_ = self.to(result_type); @@ -409,16 +474,30 @@ Tensor& where_self_out(const Tensor& condition, const Tensor& self, const Tensor self_ = self; other_ = other; } + auto device = out_device(condition, self_, other_); + condition_ = condition; + if (device != at::kCPU) { // allow CPU scalars on non-cpu device + if (condition.device() != device && condition.ndimension() == 0) { + condition_ = condition.to(device); + } + if (self_.device() != device && self_.ndimension() == 0) { + self_ = self_.to(device); + } + if (other_.device() != device && other_.ndimension() == 0) { + other_ = other_.to(device); + } + } if (condition.scalar_type() == ScalarType::Byte) { TORCH_WARN_ONCE("where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead."); } else { TORCH_CHECK(condition.scalar_type() == ScalarType::Bool, "where expected condition to be a boolean tensor, but got a tensor with dtype ", condition.scalar_type()); } - Tensor cond_bool = condition.scalar_type() == ScalarType::Byte ? condition.to(ScalarType::Bool) : condition; + condition_ = condition_.scalar_type() == ScalarType::Byte ? condition_.to(ScalarType::Bool) : condition_; + // if there's still a device mismatch, let tensoriterator error out with it auto iter = at::TensorIteratorConfig() .check_all_same_dtype(false) .add_output(out) - .add_input(cond_bool) + .add_input(condition_) .add_input(self_) .add_input(other_) .build(); @@ -426,9 +505,11 @@ Tensor& where_self_out(const Tensor& condition, const Tensor& self, const Tensor return out; } + Tensor where(const Tensor& condition, const Tensor& self, const Tensor& other) { + auto device = out_device(condition, self, other); auto result_type = at::native::result_type(self, other); - Tensor ret = at::empty({0}, self.options().dtype(result_type)); + Tensor ret = at::empty({0}, self.options().dtype(result_type).device(device)); at::native::where_self_out(condition, self, other, ret); return ret; } diff --git a/aten/src/ATen/native/TensorConversions.cpp b/aten/src/ATen/native/TensorConversions.cpp index 819516f67397..96275bde8299 100644 --- a/aten/src/ATen/native/TensorConversions.cpp +++ b/aten/src/ATen/native/TensorConversions.cpp @@ -1,16 +1,206 @@ +// #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include #include +#include #include +#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + +#include #include +#include #include +#include #include +#include namespace at { namespace native { +namespace { +// dense_to_sparse_{csr,bsr,csc,bsc} common helpers + +// Preparation fo the N-D dense -> sparse compressed conversion. 
+// The N-D input is converted to 3-D (single batch dim) where we check that the +// product of batch dims is nonzero and for each batch the sparse matrix +// contained within has the same number of non-zero elements. +// The batches are joined along the compressed axis. The generation of indices +// for this matrix can be performed in a single step followed by a single step +// conversion to restore the batch dimension. +void dense_to_sparse_compressed_prepare_check_mask_values_batched( + const Layout& target_layout, + Tensor& values, + Tensor& mask, + const int64_t& n_batch_dim) { + if (n_batch_dim > 1) { + // For inputs with more than 1 batch dim we flatten them out. + // Input shape (b0, b1 ..., bn, r, c) -> (b0 * b1 * ... * bn, r ,c) + values = values.flatten(0, n_batch_dim - 1); + mask = mask.flatten(0, n_batch_dim - 1); + } + + // For informative messaging form the name of the function + // to_sparse_{csr,csc,bsr,bsc}. + TORCH_CHECK( + mask.size(0) > 0, + "to_sparse_", + // We want the message to match the function name so generate the + // lowercase acronym for the layout + sparse_csr::layoutToString(target_layout, false, true), + ": Expected product of batch dimensions to be non-zero."); + + // Compute the number of non-zero elements in the first batch, expand to full + // size + auto nse_per_batch = mask.select(0, 0).sum().expand(mask.size(0)); + TORCH_CHECK( + mask.sum({-2, -1}).equal(nse_per_batch), + "Expect the same number of specified elements per batch."); + + // We need to join batches into a matrix increasing the length of the + // compressed axis. This allows us to create indices for a compressed matrix + // and de-batch them later (two kernels). Otherwise we would have to create + // indices for each batch individually requiring n_batch kernels. For csr/bsr, + // we already have the batch dim adjacent to the compressed axis and can + // flatten them together. For csc/bsc, we need to transpose first. + // For BSR/CSR (b, r, c) -> (b*r, c) + // For BSC/CSC (b, c, r) -> (r, b*c) + AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS( + target_layout, + "dense_to_sparse_compressed", + [&]() { + values = values.flatten(0, 1); + mask = mask.flatten(0, 1); + }, + [&]() { + values = values.transpose(0, 1).flatten(1, 2); + mask = mask.transpose(0, 1).flatten(1, 2); + }); +} + +// This function unfolds the compressed indices of a compressed sparse matrix +// into a batched compressed sparse tensor. +// This is analogous to an unflatten-like operation: +// unflatten(0, {b, r}) for csr/bsr with input shape (r*b, c) +// (output shape (b, r, c)) +// unflatten(1, {b, c}).transpose(0,1) for csc/bsc with input shape (r, c*b) +// (output shape (r, b, c) unflatten, (b, r, c) unflatten + transpose) +// This only operates on the compressed indices as the plain indices and values +// can be manipulated as described above without special handling. +// It is a prerequisite for the conversion that the sparsity pattern is sane for +// the batched shape. That is each batch has the same number of nonzero +// elements. +Tensor compressed_to_batched_compressed_indices( + const Tensor& compressed_in, + const int64_t& n_batch, + bool out_int32) { + auto n_compressed_per_batch = (compressed_in.size(0) - 1) / n_batch; + ScalarType out_type = out_int32 ? 
ScalarType::Int : ScalarType::Long; + auto batched_out = at::zeros( + {n_batch, n_compressed_per_batch + 1}, + compressed_in.options().dtype(out_type)); + + // If the compressed dimension has length zero there is 1 element in each + // batch and it is zero we already have this result formed + if (n_compressed_per_batch > 0) { + // Slice the compressed indices ignoring the leading 0 element and reshape + // to n-batch rows + auto trailing_slice = + compressed_in.slice(0, 1, c10::nullopt, 1).reshape({n_batch, -1}); + // Slice the compressed indices again selecting the elements corresponding + // to the batch boundary. The values here will be increasing multiples of + // nnz per batch. Reshape to n-batch rows (1 col) for broadcasting. + // This is equivalent to arange(n_batch) * nnz_per_batch with the same + // reshape + auto offsets = compressed_in.slice(0, 0, -1, n_compressed_per_batch) + .reshape({n_batch, -1}); + // Subtracting the offsets from each row of the reshaped compressed indices + // gives us the compressed indices within the batch. The leading element of + // each row is not computed as it is always zero. We copy into the view on + // the output buffer. + batched_out.narrow(-1, 1, n_compressed_per_batch) + .copy_(trailing_slice - offsets); + } + return batched_out; +} + +// After generating member tensors for sparse_compressed matrix, if the target +// shape is N-D we must reform the batch dimensions. +// Single kernel is used to restore one batch dimension in the compressed +// indices. From there full batch shape is restored by reshape. No special +// handling is needed for restoring batch dimensions of the values or +// plain_indices it can be done with reshape/unflatten. +void reshape_2d_sparse_compressed_members_to_nd_batched( + const IntArrayRef full_sizes, + const int64_t& n_batch_dim, + Tensor& compressed_indices, + Tensor& plain_indices, + Tensor& values) { + auto batch_shape = full_sizes.slice(0, n_batch_dim); + auto n_batch = std::accumulate( + batch_shape.begin(), batch_shape.end(), 1, std::multiplies()); + // NOTE: using this conversion requires the nnz per batch is the same for all + // batches that will be formed. We ensured this was the case on the way in so + // it is safe to use this conversion. + compressed_indices = compressed_to_batched_compressed_indices( + compressed_indices, n_batch, /*out_int32*/ false); + + // We can infer the last dim of the reshape targets, it will be nnz or + // nrow/ncol+1 depending on the layout and member tensor targeted. + auto batchsize_infer_last = DimVector(batch_shape); + batchsize_infer_last.push_back(-1); + + // -1 will be nnz per batch + plain_indices = plain_indices.reshape(batchsize_infer_last); + // -1 will be ncols (bsc,csc) or nrows (bsr,csr) + 1 + compressed_indices = compressed_indices.reshape(batchsize_infer_last); + // -1 will be nnz (per batch). + // Note: Unflatten rather than reshape as it will work + // for both blocked and unblocked layouts. reshape works for unblocked layouts + // only + values = values.unflatten(0, batchsize_infer_last); +} +} // namespace + // Take a Device that may not have device_index set (i.e., having it as -1 // representing the current device) and return the corresponding Device // according to the actual device at the time of this function call. 
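A standalone numeric sketch of the de-batching performed by compressed_to_batched_compressed_indices above, re-derived with public ops on illustrative data (two batches of two rows, four nonzeros each); this is a hypothetical helper, not the internal implementation:

#include <iostream>
#include <torch/torch.h>

int main() {
  int64_t n_batch = 2;
  auto joined = torch::tensor({0, 2, 4, 6, 8}, torch::kLong);  // crow of the joined (b*r, c) matrix
  auto per_batch = (joined.size(0) - 1) / n_batch;             // compressed entries per batch
  auto out = torch::zeros({n_batch, per_batch + 1}, torch::kLong);
  auto trailing = joined.slice(0, 1).reshape({n_batch, -1});            // drop the leading 0
  auto offsets  = joined.slice(0, 0, -1, per_batch).reshape({n_batch, -1});  // nnz offset per batch
  // Subtracting the per-batch offset recovers each batch's own crow index.
  out.narrow(-1, 1, per_batch).copy_(trailing - offsets);
  std::cout << out << "\n";  // [[0, 2, 4], [0, 2, 4]]
}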
No-op @@ -54,48 +244,52 @@ Tensor _to_copy( // memory_format is handled separately due to MemoryFormat::Preserve logic options = self.options().merge_in(options).memory_format(c10::nullopt); auto memory_format = optional_memory_format.value_or(MemoryFormat::Preserve); + // TODO: Use the dispatcher for this. // Currently there are unenumerated extensibility issues preventing this. - if (self.is_sparse_csr()) { - TORCH_CHECK( - memory_format == MemoryFormat::Preserve, - "sparse_csr only supports memory format Preserve, but got ", - memory_format, - " instead."); - - auto new_values = at::native::to( - self.values(), - dtype, - c10::kStrided, // values are strided - device, - pin_memory, - non_blocking, - true, // force copy since we're in _to_copy - memory_format); - - auto new_crow_indices = at::native::to( - self.crow_indices(), - self.crow_indices().scalar_type(), // indices are integral - c10::kStrided, // indices are strided - device, - pin_memory, - non_blocking, - true, // force copy since we're in _to_copy - memory_format); - - auto new_col_indices = at::native::to( - self.col_indices(), - self.col_indices().scalar_type(), // indices are integral - c10::kStrided, // indices are strided - device, - pin_memory, - non_blocking, - true, // force copy since we're in _to_copy - memory_format); - - return at::native::_sparse_csr_tensor_unsafe( - new_crow_indices, - new_col_indices, + if (at::sparse_csr::is_sparse_compressed(self)) { + TORCH_CHECK( + memory_format == MemoryFormat::Preserve, + "to(options): ", at::sparse_csr::layoutToString(self.layout()), + " only supports memory format Preserve, but got ", memory_format, + " instead."); + + Tensor compressed_indices, plain_indices; + std::tie(compressed_indices, plain_indices) = at::sparse_csr::getCompressedPlainIndices(self); + + const auto new_values = at::native::to( + self.values(), + dtype, + c10::kStrided, + device, + pin_memory, + non_blocking, + true, // force copy since we are in _to_copy + memory_format); + + const auto new_compressed_indices = at::native::to( + compressed_indices, + compressed_indices.scalar_type(), + c10::kStrided, + device, + pin_memory, + non_blocking, + true, // force copy since we are in _to_copy + memory_format); + + const auto new_plain_indices = at::native::to( + plain_indices, + plain_indices.scalar_type(), + c10::kStrided, + device, + pin_memory, + non_blocking, + true, // force copy since we are in _to_copy + memory_format); + + return at::native::_sparse_compressed_tensor_unsafe( + new_compressed_indices, + new_plain_indices, new_values, self.sizes(), new_values.scalar_type(), @@ -309,6 +503,15 @@ Tensor to_dense_backward(const Tensor& grad, const Tensor& input_) { auto input = input_.coalesce(); return grad.sparse_mask(input); } + if (at::sparse_csr::is_sparse_compressed(input_)) { + // TODO: implement sparse_compressed_mask + switch(input_.layout()) { + case kSparseCsr: return grad.sparse_mask(input_.to_sparse()).to_sparse_csr(); + case kSparseCsc: return grad.sparse_mask(input_.to_sparse()).to_sparse_csc(); + // BSR and BSC should be handled via implement sparse_compressed_mask + default: ; // fall back to unsupported input layout error + } + } if (input_.layout() == c10::kMkldnn) { return grad.to_mkldnn(input_.scalar_type()); } @@ -329,7 +532,8 @@ Tensor to_dense(const Tensor& tensor, c10::optional dtype) { } if (tensor.layout() == c10::kSparseCsr || tensor.layout() == c10::kSparseCsc || - tensor.layout() == c10::kSparseBsr) { + tensor.layout() == c10::kSparseBsr || + tensor.layout() == 
c10::kSparseBsc) { return tensor._to_dense(dtype); } if (tensor.layout() == c10::kMkldnn) { @@ -358,6 +562,14 @@ Tensor sparse_compressed_to_dense( TORCH_CHECK( !dtype.has_value(), "dtype argument is not supported by sparse_csr_to_dense"); + + // Guard upfront against hybrid tensors (causes segfault) + auto batch_ndim = sparse_csr::numBatchDimensions(self); + + TORCH_CHECK( + (self.dim() - batch_ndim) == 2, + "sparse_compressed_to_dense: Hybrid tensors are not supported"); + if (self.layout() == kSparseCsr) { Tensor dst = at::zeros(self.sizes(), self.options().layout(kStrided)); return dst.add_(self); @@ -384,26 +596,28 @@ Tensor sparse_compressed_to_dense( dst_transposed.add_(to_transposed_csr); return dst_transposed.transpose(batch_ndim, batch_ndim + 1); } - if (self.layout() == kSparseBsr) { - auto crow_indices = self.crow_indices(); - auto col_indices = self.col_indices(); + if (self.layout() == kSparseBsr || self.layout() == kSparseBsc) { + Tensor compressed_indices; + Tensor plain_indices; + std::tie(compressed_indices, plain_indices) = + sparse_csr::getCompressedPlainIndices(self); + auto values = self.values(); Tensor dense = at::zeros(self.sizes(), self.options().layout(kStrided)); if (self.dim() == 2) { // Pad shape so we can treat 2-d like batched, we will squeeze out the // phantom batch dim at the end - crow_indices = crow_indices.unsqueeze(0); - col_indices = col_indices.unsqueeze(0); - values = values.unsqueeze(0); - dense = dense.unsqueeze(0); + compressed_indices.unsqueeze_(0); + plain_indices.unsqueeze_(0); + values = values.unsqueeze_(0); + dense = dense.unsqueeze_(0); } if (self.dim() > 3) { // Flatten batch dims - auto n_batch_dim = self.dim() - 2; - crow_indices = crow_indices.flatten(0, n_batch_dim - 1); - col_indices = col_indices.flatten(0, n_batch_dim - 1); - values = values.flatten(0, n_batch_dim - 1); - dense = dense.flatten(0, n_batch_dim - 1); + compressed_indices = compressed_indices.flatten(0, batch_ndim - 1); + plain_indices = plain_indices.flatten(0, batch_ndim - 1); + values = values.flatten(0, batch_ndim - 1); + dense = dense.flatten(0, batch_ndim - 1); } // At this point everything has 3d shape either the batch dim was inserted, @@ -419,7 +633,10 @@ Tensor sparse_compressed_to_dense( dense = dense.reshape({n_batch, -1, values.size(-2), values.size(-1)}); for (auto batch : c10::irange(n_batch)) { Tensor batch_indices = at::_convert_indices_from_csr_to_coo( - crow_indices[batch], col_indices[batch], false, false); + compressed_indices[batch], + plain_indices[batch], + false, + self.layout() == kSparseBsc); auto batch_row_indices = batch_indices.select(0, 0); auto batch_col_indices = batch_indices.select(0, 1); auto offsets = batch_col_indices + @@ -557,16 +774,6 @@ Tensor view_dtype(const Tensor& self, ScalarType dtype) { return new_tensor; } -// Sparse layout conversions Start - -Tensor dense_to_sparse_csr(const Tensor& self) { - return self.to_sparse().to_sparse_csr(); -} - -Tensor dense_to_sparse_csc(const Tensor& self) { - return self.to_sparse().to_sparse_csc(); -} - Tensor _tile_tensor(const Tensor& self, IntArrayRef blocksize) { // This code turns a matrix into a sequence of blocks // @@ -641,6 +848,83 @@ std::pair _not_zero_mask_to_col_row_indices( return std::pair(col_indices, row_indices); } +// Sparse layout conversions Start + +Tensor dense_to_sparse_csr(const Tensor& self) { + auto n_batch_dim = self.dim() - 2; + auto values = self; + auto not_zero_mask = self != 0; + + if (n_batch_dim > 0) { + 
dense_to_sparse_compressed_prepare_check_mask_values_batched( + Layout::SparseCsr, values, not_zero_mask, n_batch_dim); + } + + Tensor col_indices; + Tensor row_indices; + std::tie(col_indices, row_indices) = _not_zero_mask_to_col_row_indices( + not_zero_mask, at::kLong, not_zero_mask.device()); + Tensor crow_indices = at::_convert_indices_from_coo_to_csr( + row_indices, not_zero_mask.size(0), false /*out_int32*/); + { + auto mask_indices = _mask_to_indices(not_zero_mask.flatten()); + values = values.flatten().index_select(0, mask_indices); + } + + if (n_batch_dim > 0) { + reshape_2d_sparse_compressed_members_to_nd_batched( + self.sizes(), n_batch_dim, crow_indices, col_indices, values); + } + return at::native::_sparse_csr_tensor_unsafe( + crow_indices, + col_indices, + values, + self.sizes(), + values.scalar_type(), + c10::kSparseCsr, + values.device()); +} + +Tensor dense_to_sparse_csc(const Tensor& self) { + auto n_batch_dim = self.dim() - 2; + auto values = self; + auto not_zero_mask = self != 0; + + if (n_batch_dim > 0) { + dense_to_sparse_compressed_prepare_check_mask_values_batched( + Layout::SparseCsc, values, not_zero_mask, n_batch_dim); + } + + Tensor col_indices; + Tensor row_indices; + // Compressed col indices are the same as the row indices of the transpose! + std::tie(row_indices, col_indices) = _not_zero_mask_to_col_row_indices( + not_zero_mask.transpose(1, 0), at::kLong, not_zero_mask.device()); + Tensor ccol_indices = at::_convert_indices_from_coo_to_csr( + col_indices, not_zero_mask.size(-1), false /*out_int32*/); + { + // We need to transpose the mask and values before flattening so the nnz dim + // will run in col-major order. + values = values.transpose(0, 1).flatten(); + auto mask_indices = + _mask_to_indices(not_zero_mask.transpose(0, 1).flatten()); + values = values.index_select(0, mask_indices); + } + + if (n_batch_dim > 0) { + reshape_2d_sparse_compressed_members_to_nd_batched( + self.sizes(), n_batch_dim, ccol_indices, row_indices, values); + } + return at::native::_sparse_csc_tensor_unsafe( + ccol_indices, + row_indices, + values, + self.sizes(), + values.scalar_type(), + c10::kSparseCsc, + values.device()); +} + Tensor dense_to_sparse_bsr(const Tensor& self, IntArrayRef blocksize) { TORCH_CHECK( blocksize[0] > 0 && blocksize[1] > 0, @@ -659,92 +943,37 @@ Tensor dense_to_sparse_bsr(const Tensor& self, IntArrayRef blocksize) { " needs to be divisible by blocksize[1] ", blocksize[1]); - auto block_size_0 = self.size(-2) / blocksize[0]; auto n_batch_dim = self.dim() - 2; auto values = _batch_tile_tensor(self, blocksize); auto not_zero_mask = _batch_tile_tensor((self != 0), blocksize); - // Find tiles that have at least 1 non-zero value in them. - not_zero_mask = not_zero_mask.any(-1).any(-1); + auto mask_shape = DimVector(not_zero_mask.sizes().slice(0, n_batch_dim + 2)); + // Can't use -1 here one of sparse/batch dims may be zero + mask_shape.push_back(blocksize[0] * blocksize[1]); + not_zero_mask = not_zero_mask.view(mask_shape).any(-1); if (n_batch_dim > 0) { - // for 3D input the mask is already flat along the batch dims, avoid - // creating unnessesary view - if (n_batch_dim > 1) { - // flatten out the batch dims for N-D input - not_zero_mask = not_zero_mask.flatten(0, n_batch_dim - 1); - } - TORCH_CHECK( - not_zero_mask.size(0) > 0, - "to_sparse_bsr: Expected product of batch dimensions to be non-zero."); - - // If the input is ND we assert that the same sparsity pattern - // is used across matrices. 
That means the same number of materialized - // values and *at the same location*. - // This requirement is not included in Pearu's blog post on BSR invariants. - // He specifically states that different batches may have different sparsity - // patterns as long as the number of specified elements is the same for all - // batches. - - auto not_zero_mask_0 = not_zero_mask.select(0, 0); - auto nse_per_batch = not_zero_mask_0.sum().repeat(not_zero_mask.size(0)); - TORCH_CHECK( - not_zero_mask.sum({-2, -1}).equal(nse_per_batch), - "Expect the same number of specified elements per batch."); + dense_to_sparse_compressed_prepare_check_mask_values_batched( + Layout::SparseBsr, values, not_zero_mask, n_batch_dim); } Tensor col_indices; Tensor row_indices; std::tie(col_indices, row_indices) = _not_zero_mask_to_col_row_indices( not_zero_mask, at::kLong, not_zero_mask.device()); - Tensor crow_indices; - if (n_batch_dim > 0) { - // reshape to put the (flattened) batch dims back in - col_indices = col_indices.reshape({not_zero_mask.size(0), -1}); - row_indices = row_indices.reshape({not_zero_mask.size(0), -1}); - crow_indices = at::empty( - {not_zero_mask.size(0), block_size_0 + 1}, col_indices.options()); - // For each batch compute crow_indices - for (auto batch : c10::irange(not_zero_mask.size(0))) { - Tensor batch_crow_indices = crow_indices[batch]; - at::_convert_indices_from_coo_to_csr_out( - batch_crow_indices, - row_indices[batch], - block_size_0, - false /* out_int32 */); - } - // At this point, we have constructed col_indices and crow_indices - // such that they are 2d with dim0 of length B = product(batchdims). We can - // now reshape them to the correct shapes. - auto batch_shape = self.sizes().slice(0, n_batch_dim); - crow_indices = crow_indices.unflatten(0, batch_shape); - col_indices = col_indices.unflatten(0, batch_shape); - - // Mask is also leading dim B, but we can't masked select wit it (see below) - // unless it is flat, then we can partially faltten values, index it along - // and unfold the result to batchdims + (nnz(per batch), ) - auto batch_sizes_nnz = DimVector(batch_shape); - batch_sizes_nnz.push_back(-1); // we can infer nnz - not_zero_mask = not_zero_mask.flatten(); - // TODO: masked_select does not support some form of broadcasting, so we're - // using the mask to construct indices that are then passed into - // index_select. This isn't ideal. - values = values.flatten(0, -3) - .index_select(0, _mask_to_indices(not_zero_mask)) - .unflatten(0, batch_sizes_nnz); + Tensor crow_indices = at::_convert_indices_from_coo_to_csr( + row_indices, not_zero_mask.size(0), false /*out_int32*/); - } else { - crow_indices = at::_convert_indices_from_coo_to_csr( - row_indices.view({-1}), block_size_0, false /* out_int32 */); - not_zero_mask = not_zero_mask.reshape({-1}); - // TODO: masked_select does not support some form of broadcasting, so we're - // using the mask to construct indices that are then passed into - // index_select. This isn't ideal. 
- values = values.reshape({-1, values.size(-2), values.size(-1)}) - .index_select(0, _mask_to_indices(not_zero_mask)); + { + auto mask_indices = _mask_to_indices(not_zero_mask.flatten()); + values = values.flatten(0, -3).index_select(0, mask_indices); } + if (n_batch_dim > 0) { + reshape_2d_sparse_compressed_members_to_nd_batched( + self.sizes(), n_batch_dim, crow_indices, col_indices, values); + } return at::native::_sparse_bsr_tensor_unsafe( crow_indices, col_indices, @@ -756,64 +985,319 @@ Tensor dense_to_sparse_bsr(const Tensor& self, IntArrayRef blocksize) { } Tensor dense_to_sparse_bsc(const Tensor& self, IntArrayRef blocksize) { - AT_ERROR( - "Conversion from ", self.layout(), " to SparseBsc is currently not supported."); - return self; + TORCH_CHECK( + blocksize[0] > 0 && blocksize[1] > 0, + "blocksize needs to be non zero, but got ", + blocksize); + TORCH_CHECK( + self.size(-2) % blocksize[0] == 0, + "Tensor size(-2) ", + self.size(-2), + " needs to be divisible by blocksize[0] ", + blocksize[0]); + TORCH_CHECK( + self.size(-1) % blocksize[1] == 0, + "Tensor size(-1) ", + self.size(-1), + " needs to be divisible by blocksize[1] ", + blocksize[1]); + auto n_batch_dim = self.dim() - 2; + auto is_batched = n_batch_dim > 0; + auto values = _batch_tile_tensor(self, blocksize); + auto not_zero_mask = _batch_tile_tensor((self != 0), blocksize); + auto mask_shape = DimVector(not_zero_mask.sizes().slice(0, n_batch_dim + 2)); + // Can't use -1 here one of sparse/batch dims may be zero + mask_shape.push_back(blocksize[0] * blocksize[1]); + not_zero_mask = not_zero_mask.view(mask_shape).any(-1); + + if (is_batched) { + dense_to_sparse_compressed_prepare_check_mask_values_batched( + Layout::SparseBsc, values, not_zero_mask, n_batch_dim); + } + + Tensor col_indices; + Tensor row_indices; + // Compressed col indices are the same as the row indices of the transpose! + std::tie(row_indices, col_indices) = _not_zero_mask_to_col_row_indices( + not_zero_mask.transpose(1, 0), at::kLong, not_zero_mask.device()); + // This only works if the col_indices vector is in ascending order. + Tensor ccol_indices = at::_convert_indices_from_coo_to_csr( + col_indices, not_zero_mask.size(-1), false /*out_int32*/); + { + // We need the block-values in col major order, but blocks themselves to + // remain in row-major order, so we transpose the leading two dims, leaving + // the trailing two dims as is. + values = values.transpose(0, 1).flatten(0, -3); + // The mask must transpose as well to index it correctly. 
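Assuming this patch is applied (dense -> BSC previously raised an error), the conversion being implemented here can be exercised from user code roughly as follows; the blocksize and values are illustrative only:

#include <iostream>
#include <torch/torch.h>

int main() {
  auto dense = torch::tensor({{1.0, 0.0, 0.0, 0.0},
                              {2.0, 3.0, 0.0, 0.0},
                              {0.0, 0.0, 0.0, 4.0},
                              {0.0, 0.0, 5.0, 6.0}});
  auto bsr = dense.to_sparse_bsr({2, 2});
  auto bsc = dense.to_sparse_bsc({2, 2});  // enabled by this change
  // Both keep the same two non-zero 2x2 blocks; only the compressed axis differs.
  std::cout << bsr.values().sizes() << " " << bsc.values().sizes() << "\n";
}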
+ auto mask_indices = + _mask_to_indices(not_zero_mask.transpose(0, 1).flatten()); + values = values.index_select(0, mask_indices); + } + if (is_batched) { + reshape_2d_sparse_compressed_members_to_nd_batched( + self.sizes(), n_batch_dim, ccol_indices, row_indices, values); + } + + return at::native::_sparse_bsc_tensor_unsafe( + ccol_indices, + row_indices, + values, + self.sizes(), + values.scalar_type(), + c10::kSparseBsc, + values.device()); +} + +void _check_blocksize_matches( + const Tensor& self, + c10::optional blocksize_opt, + const std::string& name) { + if (blocksize_opt.has_value()) { + const auto blocksize = *blocksize_opt; + const auto self_values = self.values(); + const auto self_blocksize = at::DimVector({self_values.size(-2), self_values.size(-1)}); + TORCH_CHECK(self_blocksize == blocksize, + name, "(): the provided blocksize does not match the blocksize of the to be converted tensor, ", + "got (", blocksize[0], ", ", blocksize[1], ") ", + "but expected (", self_blocksize[0], ", ", self_blocksize[1], ")."); + } +} + +Tensor sparse_compressed_clone( + const Tensor& self, + c10::optional blocksize, + const std::string& name) { + _check_blocksize_matches(self, blocksize, name); + // Just returning self doesn't work + // RuntimeError: t.use_count() <= 1 INTERNAL ASSERT FAILED at + // "../torch/csrc/autograd/autograd_not_implemented_fallback.cpp":152, + // please report a bug to PyTorch. + const auto layout = self.layout(); + Tensor compressed_indices, plain_indices; + std::tie(compressed_indices, plain_indices) = at::sparse_csr::getCompressedPlainIndices(self); + auto values = self.values(); + return _sparse_compressed_tensor_unsafe( + compressed_indices, + plain_indices, + values, + self.sizes(), + values.scalar_type(), + layout, + values.device()); +} + +Tensor sparse_compressed_to_flipped( + const Tensor& self, + c10::optional blocksize, + const std::string& name) { + _check_blocksize_matches(self, blocksize, name); + + const auto layout = self.layout(); + // NOTE: errors on non-compressed sparse layouts. + const auto flipped_layout = at::sparse_csr::flip_compressed_layout(layout); + + // Suppose compressed_indices represent rows of an input in either + // CSR or BSR sparse compressed format. + // In order to convert a batched CSR/BSR index into a batched CSC/BSC index + // we perform the following steps: + // 1. Convert a sparse compressed index representing batches of matrices of + // shape (b, r, c) to a sparse compressed index that represents a single + // matrix of shape (b * r, c). + // 2. Turn the compressed indices of the matrix of shape (b * r, c) into + // COO indices. + // 3. Map these COO indices into the COO indices of a matrix of shape (r, b * c) + // such that if A is a matrix of shape (b * r, c) and B is a matrix of shape + // (r, b * c) such that + // A[(k * r):(k * r + r), :] = B[:, (k * c):(k * c + c)] for all k in arange(b), + // then A[i, j] = B[i', j']. + // This is equivalent to finding indices that match values of matrices + // tiled vertically to values of the same matrices tiled horizontally. + // 4. Convert the COO indices to the CSC/BSC indices and form the output. + // + // NOTE: the reason behind vertical/horizontal tiling is to be able to transform + // indices over all matrices in the batch in a single kernel call, since + // all the existing coo <-> compressed indices conversion methods assume + // a single matrix. + // + // CSC/BSC inputs are handled in a similar fashion with a "transposed" argument. 
+ // See the comments below for detailed explanations on how exactly each step + // is performed. + + Tensor compressed_indices, plain_indices; + std::tie(compressed_indices, plain_indices) = at::sparse_csr::getCompressedPlainIndices(self); + auto values = self.values(); + const auto nnz = plain_indices.size(-1); + + const auto n_batches = compressed_indices.dim() - 1; + auto n_batches_nonzero = n_batches; + // Insert fake batch dim for simplicity + if (!n_batches) { + n_batches_nonzero = 1; + compressed_indices.unsqueeze_(0); + plain_indices.unsqueeze_(0); + values.unsqueeze_(0); + } + + // NOTE: these sparse_dims are true sparse dims only for CSR/CSC inputs. + // And for BSR/BSC these are / . + // In other words, sparse_dims stores ranges of valid indices in the row/col dims. + const auto sparse_dims = [&]() -> at::DimVector { + auto sparse_dims = at::DimVector(self.sizes().slice(n_batches, 2)); + if (layout == at::kSparseBsr || layout == at::kSparseBsc) { + std::array blocksize = {values.size(-2), values.size(-1)}; + sparse_dims[0] /= blocksize[0]; + sparse_dims[1] /= blocksize[1]; + } + return sparse_dims; + }(); + + // batch_sizes_nonempty stores at least one, potentially fake, batch dimension. + // rebatch_sizes_nonempty is equivalent to batch_sizes_nonempty.push_back(-1), + // and is used to unflatten batch dimensions from a dimension of size + // (batch_numel * dim_size,) for some dim_size. + const auto batch_sizes_nonempty = at::DimVector(plain_indices.sizes().slice(0, n_batches_nonzero)); + auto rebatch_sizes_nonempty = at::DimVector(batch_sizes_nonempty); + rebatch_sizes_nonempty.push_back(-1); + const auto batch_numel_nonzero = std::accumulate( + batch_sizes_nonempty.begin(), + batch_sizes_nonempty.begin() + n_batches_nonzero, + 1, + std::multiplies()); + + // Equivalent to (arange(batch_numel_nonzero).mul_(nnz)).reshape(batch_sizes_nonempty). + // We just compute it differently to use `add` kernel in place of `mul` for better + // performance. + const auto batch_nnz_offset = [&]() -> Tensor { + const auto wrapped_nnz = at::tensor({nnz}, compressed_indices.options()); + const auto offset = wrapped_nnz + .expand({batch_numel_nonzero}) + .cumsum(-1).sub_(wrapped_nnz) + .reshape(batch_sizes_nonempty); + return offset; + }(); + + // Step 1 for CSR/BSR inputs: + // Convert a sparse compressed index representing batches of matrices of + // shape (b, r, c) to a sparse compressed index that represents a single + // matrix of shape (b * r, c). + // The algorithm is identical for CSC/BSC inputs, with the batch dimensions + // flattened in the "transposed" dimension. + const auto compressed_indices_2d = [&]() -> Tensor { + // Extract offsets only relevant for the first :-1 elements in a row/col. + const auto compressed_offsets = compressed_indices.slice(-1, 0, -1); + // batch_offsets offsets each individual matrix row/col offsets by the total + // sum of nnz's of all the matrices with the smaller batch index. + const auto batch_offsets = batch_nnz_offset + .unsqueeze(-1).expand_as(compressed_offsets); + // compressed_offsets + batch_offsets creates an offset vector for a 2d matrix + // that is stored in a compressed sparse format. 
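A small numeric sketch of the step-3 remapping described in the overview above, i.e. mapping COO indices of a (b*r, c) matrix onto the equivalent (r, b*c) matrix so all batches can be handled in one kernel call (standalone arithmetic with illustrative sizes, not the internal kernel):

#include <iostream>
#include <torch/torch.h>

int main() {
  int64_t r = 2, c = 3;                                // per-batch sparse dims
  auto i = torch::tensor({0, 1, 2, 3}, torch::kLong);  // rows in (b*r, c)
  auto j = torch::tensor({0, 2, 1, 0}, torch::kLong);  // cols in (b*r, c)
  auto b  = i.div(r, "trunc");                         // batch index of each entry
  auto i2 = i.fmod(r);                                 // row within the batch
  auto j2 = j + b * c;                                 // column shifted by batch offset
  std::cout << i2 << "\n" << j2 << "\n";               // [0,1,0,1] and [0,2,4,3]
}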
+ const auto compressed_offsets_2d = compressed_offsets.add(batch_offsets).reshape({-1}); + const auto offsets_len = compressed_offsets_2d.numel(); + auto res = at::empty({offsets_len + 1}, compressed_indices.options()); + res.slice(-1, 0, -1).copy_(compressed_offsets_2d); + // By appending nnz * batch_numel_nonzero to (compressed_offsets + batch_offsets) + // a compressed index of a 2d matrix is formed. + res.slice(-1, -1).fill_(nnz * batch_numel_nonzero); + return res; + }(); + // More involved for compressed indices, but pretty easy for plain_indices and values: + // just squash batch dimensions. + const auto plain_indices_2d = plain_indices.flatten(0, n_batches_nonzero); + // NOTE: values are not 2d! They just represent values of a sparse compressed 2d matrix. + const auto values_2d = values.flatten(0, n_batches_nonzero); + + const auto is_out_int32 = compressed_indices.scalar_type() == ScalarType::Int; + + // Step 2 & 3: + // + // Turn the compressed indices of the matrix of shape (b * r, c) into COO indices. + // + // Map these COO indices into the COO indices of a matrix of shape (r, b * c) + // such that if A is a matrix of shape (b * r, c) and B is a matrix of shape + // (r, b * c) such that + // A[(k * r):(k * r + r), :] = B[:, (k * c):(k * c + c)] for all k in arange(b), + // then A[i, j] = B[i', j']. + // This is equivalent to finding indices that match values of matrices + // tiled vertically to values of the same matrices tiled horizontally. + + // coo <-> sparse index conversions assume CSR/BSR inputs. + // To CSC/BSC inputs these indices will appear "transposed". + const auto is_transposed_indices = layout == at::kSparseCsc || layout == at::kSparseBsc; + const auto coo_indices_2d_transposed = [&]() -> Tensor { + const auto coo_indices_2d = _convert_indices_from_csr_to_coo( + compressed_indices_2d, + plain_indices_2d, + is_out_int32, + /*transpose=*/true); // Flip rows/cols for convenience. + // Convert COO indices of (b * r, c) to (r, b * c). + // It is a map (i, j) -> { + // b = i // r + // i' = i % r + // j' = j + b * c + // return (i', j') + // } + // NOTE: we used transposed=true above! + auto i = coo_indices_2d.select(0, 1); + auto j = coo_indices_2d.select(0, 0); + auto b = i.div(is_transposed_indices ? sparse_dims[1] : sparse_dims[0], "trunc"); + // Modify i, j in-place. + i.fmod_(is_transposed_indices ? sparse_dims[1] : sparse_dims[0]); + j.add_(b * (is_transposed_indices ? sparse_dims[0] : sparse_dims[1])); + return coo_indices_2d; + }(); + + // Step 4: + // Convert the COO indices to the CSC/BSC indices and form the output. + // We need to sort COO indices along the "tranposed" dim to satisfy the + // invariant of sorted plain indices. + // Hash coo indices by converting 2d indices to linear offsets with + // more "weight" (aka stride) placed on the "transposed" dimension. + const auto coo_indices_2d_transposed_hashed = at::sparse::flatten_indices( + coo_indices_2d_transposed, + is_transposed_indices ? 
at::DimVector({sparse_dims[0], sparse_dims[1] * batch_numel_nonzero}) + : at::DimVector({sparse_dims[1], sparse_dims[0] * batch_numel_nonzero})); + const auto hash_argsort = std::get<1>(coo_indices_2d_transposed_hashed.sort()); + const auto coo_indices_2d_transposed_sorted = coo_indices_2d_transposed.index_select(1, hash_argsort); + + const auto new_compressed_indices_coo_2d = coo_indices_2d_transposed_sorted.select(0, 0); + const auto new_plain_indices_2d = coo_indices_2d_transposed_sorted.select(0, 1); + const auto new_values_2d = values_2d.index_select(0, hash_argsort); + + auto new_compressed_indices = compressed_to_batched_compressed_indices( + _convert_indices_from_coo_to_csr( + new_compressed_indices_coo_2d, + is_transposed_indices + ? batch_numel_nonzero * sparse_dims[0] + : batch_numel_nonzero * sparse_dims[1], + is_out_int32), + batch_numel_nonzero, + is_out_int32) + .unflatten(0, batch_sizes_nonempty); + auto new_plain_indices = new_plain_indices_2d.unflatten(0, rebatch_sizes_nonempty); + auto new_values = new_values_2d.unflatten(0, rebatch_sizes_nonempty); + // Kill fake batch dim if it was inserted. + if (!n_batches) { + new_compressed_indices.squeeze_(0); + new_plain_indices.squeeze_(0); + new_values.squeeze_(0); + } + + return _sparse_compressed_tensor_unsafe( + new_compressed_indices, + new_plain_indices, + new_values, + self.sizes(), + new_values.scalar_type(), + flipped_layout, + new_values.device()); } Tensor sparse_compressed_to_sparse_csr(const Tensor& self) { if (self.layout() == kSparseCsc) { - TORCH_CHECK( - self.dim() == 2, - "Expected self to be of dimension 2, but got ", - self.dim(), - "."); - auto sizes = self.sizes(); - auto ccol_indices = self.ccol_indices(); - auto row_indices = self.row_indices(); - auto values = self.values(); - - // convert CSC indices to COO indices and swap its rows - const bool out_int32 = ccol_indices.scalar_type() == ScalarType::Int; - Tensor indices_transposed = _convert_indices_from_csr_to_coo( - ccol_indices, row_indices, out_int32, true); - - // sort transposed indices - auto indices_scalar = - at::sparse::flatten_indices(indices_transposed, {sizes[0], sizes[1]}); - auto indicesPermutation = std::get<1>(indices_scalar.sort(0)); - auto indices_transposed_sorted = - indices_transposed.index_select(1, indicesPermutation); - - // construct a CSR tensor - auto new_row_indices = indices_transposed_sorted.select(0, 0); - auto new_col_indices = indices_transposed_sorted.select(0, 1); - auto new_values = values.index_select(0, indicesPermutation); - Tensor new_crow_indices = - _convert_indices_from_coo_to_csr(new_row_indices, sizes[0], out_int32); - - return _sparse_csr_tensor_unsafe( - new_crow_indices, - new_col_indices, - new_values, - {sizes[0], sizes[1]}, - new_values.scalar_type(), - c10::kSparseCsr, - new_values.device()); + return sparse_compressed_to_flipped(self, c10::nullopt, "to_sparse_csr"); } if (self.layout() == kSparseCsr) { - // Just returning self doesn't work - // RuntimeError: t.use_count() <= 1 INTERNAL ASSERT FAILED at - // "../torch/csrc/autograd/autograd_not_implemented_fallback.cpp":152, - // please report a bug to PyTorch. 
aten::to_sparse_csr - return at::native::_sparse_csr_tensor_unsafe( - self.crow_indices(), - self.col_indices(), - self.values(), - self.sizes(), - self.scalar_type(), - c10::kSparseCsr, - self.device()); + return sparse_compressed_clone(self, c10::nullopt, "to_sparse_csr"); } AT_ERROR( "sparse_compressed_to_sparse_csr expected SparseCsr or SparseCsc layout but got ", @@ -1150,59 +1634,73 @@ Tensor _csr_to_block_csr_cpu(const Tensor& self, IntArrayRef blocksize) { } Tensor sparse_compressed_to_sparse_bsr(const Tensor& self, IntArrayRef blocksize) { - TORCH_CHECK( - self.is_sparse_csr(), - "Can only convert CSR to SparseBsr, but got ", - self.layout(), - " instead."); - Tensor self_values = self.values(); - Tensor self_crow_indices = self.crow_indices(); - Tensor self_col_indices = self.col_indices(); - Tensor cpu_result = _csr_to_block_csr_cpu( - _sparse_csr_tensor_unsafe( - self_crow_indices.cpu(), - self_col_indices.cpu(), - self_values.cpu(), - self.sizes(), - self_values.scalar_type(), - self.layout(), - self_values.device()), - blocksize); - Tensor result_values = cpu_result.values().to(self_values.options()); - Tensor result_crow_indices = - cpu_result.crow_indices().to(self_crow_indices.options()); - Tensor result_col_indices = - cpu_result.col_indices().to(self_col_indices.options()); - return at::native::_sparse_bsr_tensor_unsafe( - result_crow_indices, - result_col_indices, - result_values, - self.sizes(), - result_values.scalar_type(), - c10::kSparseBsr, - result_values.device()); + if (self.layout() == kSparseBsc) { + return sparse_compressed_to_flipped(self, blocksize, "to_sparse_bsr"); + } + if (self.layout() == kSparseBsr) { + return sparse_compressed_clone(self, blocksize, "to_sparse_bsr"); + } + if (self.layout() == kSparseCsr) { + TORCH_CHECK(self.dim() == 2, + "to_sparse_bsr(): conversion from Csr to Bsr is only possible for 2d inputs, ", + "but got input of dimension ", self.dim(), " instead."); + Tensor self_values = self.values(); + Tensor self_crow_indices = self.crow_indices(); + Tensor self_col_indices = self.col_indices(); + Tensor cpu_result = _csr_to_block_csr_cpu( + _sparse_csr_tensor_unsafe( + self_crow_indices.cpu(), + self_col_indices.cpu(), + self_values.cpu(), + self.sizes(), + self_values.scalar_type(), + self.layout(), + at::kCPU), + blocksize); + Tensor result_values = cpu_result.values().to(self_values.options()); + Tensor result_crow_indices = + cpu_result.crow_indices().to(self_crow_indices.options()); + Tensor result_col_indices = + cpu_result.col_indices().to(self_col_indices.options()); + return at::native::_sparse_bsr_tensor_unsafe( + result_crow_indices, + result_col_indices, + result_values, + self.sizes(), + result_values.scalar_type(), + c10::kSparseBsr, + result_values.device()); + } + AT_ERROR( + "sparse_compressed_to_sparse_bsr expected SparseCsr, SparseBsr or SparseBsc layout but got ", + self.layout()); + return self; } Tensor sparse_compressed_to_sparse_bsc(const Tensor& self, IntArrayRef blocksize) { + if (self.layout() == kSparseBsr) { + return sparse_compressed_to_flipped(self, blocksize, "to_sparse_bsr"); + } + if (self.layout() == kSparseBsc) { + return sparse_compressed_clone(self, blocksize, "to_sparse_bsr"); + } AT_ERROR( - "Conversion from ", self.layout(), " to SparseBsc is currently not supported."); + "sparse_compressed_to_sparse_bsc expected SparseBsr or SparseBsc layout but got ", + self.layout()); return self; } Tensor sparse_compressed_to_sparse_csc(const Tensor& self) { + if (self.layout() == kSparseCsr) { + return 
sparse_compressed_to_flipped(self, c10::nullopt, "to_sparse_csc"); + } if (self.layout() == kSparseCsc) { - // Based on to_sparse_csr just returning self doesn't work - return _sparse_csc_tensor_unsafe( - self.ccol_indices(), - self.row_indices(), - self.values(), - self.sizes(), - self.scalar_type(), - c10::kSparseCsc, - self.device()); + return sparse_compressed_clone(self, c10::nullopt, "to_sparse_csc"); } AT_ERROR( - "Conversion from ", self.layout(), " to SparseCsc is currently not supported."); + "sparse_compressed_to_sparse_csc expected SparseCsr or SparseCsc layout but got ", + self.layout()); + return self; } Tensor sparse_compressed_to_sparse(const Tensor& self, int64_t sparse_dim) { @@ -1239,7 +1737,7 @@ Tensor sparse_compressed_to_sparse(const Tensor& self) { // Sparse layout conversions End Tensor to_meta(const Tensor& tensor) { - auto out = at::native::empty_strided_meta(tensor.sizes(), tensor.strides(), \ + auto out = at::native::empty_strided_meta_symint(tensor.sym_sizes(), tensor.sym_strides(), \ /*dtype=*/c10::make_optional(tensor.scalar_type()), /*layout=*/c10::make_optional(tensor.layout()), \ /*device=*/c10::make_optional(c10::Device(c10::kMeta)), /*pin_memory=*/c10::nullopt); // needs to handle wrapped numbers, so dtype promotion works properly. @@ -1255,13 +1753,13 @@ c10::optional to_meta(const c10::optional& tensor) { return c10::nullopt; } -std::vector to_meta(const at::TensorList& t_list) { +std::vector to_meta(at::ITensorListRef t_list) { std::vector outs; outs.reserve(t_list.size()); - for (const auto& i : c10::irange(t_list.size())) { - outs.push_back(to_meta(t_list[i])); + for (const auto& tensor : t_list) { + outs.push_back(to_meta(tensor)); } return outs; } - -}} // namespace at::native +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/TensorConversions.h b/aten/src/ATen/native/TensorConversions.h index 75a01ea0e755..8ec21a75dcac 100644 --- a/aten/src/ATen/native/TensorConversions.h +++ b/aten/src/ATen/native/TensorConversions.h @@ -19,7 +19,7 @@ bool to_will_alias( Tensor to_meta(const Tensor& tensor); c10::optional to_meta(const c10::optional& tensor); -std::vector to_meta(const at::TensorList& t_list); +std::vector to_meta(at::ITensorListRef t_list); } // namespace native } // namespace at diff --git a/aten/src/ATen/native/TensorDimApply.h b/aten/src/ATen/native/TensorDimApply.h index ad9ca857eeab..e75cd40caf48 100644 --- a/aten/src/ATen/native/TensorDimApply.h +++ b/aten/src/ATen/native/TensorDimApply.h @@ -1,4 +1,5 @@ -#include +#pragma once +#include #include namespace at { diff --git a/aten/src/ATen/native/TensorFactories.cpp b/aten/src/ATen/native/TensorFactories.cpp index 230f7964658d..7245cb77b1c5 100644 --- a/aten/src/ATen/native/TensorFactories.cpp +++ b/aten/src/ATen/native/TensorFactories.cpp @@ -1,31 +1,99 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#include #include #include #include +#include #include #include -#include #include +#include +#include +#include #include -#include -#include -#include -#include #include -#include #include #include -#include -#include +#include + #ifndef AT_PER_OPERATOR_HEADERS #include +#include #else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include 
+#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #endif #include -#include -#include #include #include #include @@ -186,12 +254,7 @@ Tensor empty_cpu(IntArrayRef size, c10::optional dtype_opt, c10::opt return at::detail::empty_cpu(size, dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); } -Tensor empty_symint_cpu(c10::SymIntArrayRef size, c10::optional dtype_opt, c10::optional layout_opt, - c10::optional device_opt, c10::optional pin_memory_opt, c10::optional memory_format_opt) { - return at::native::empty_cpu(c10::asIntArrayRefSlow(size), dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); -} - -Tensor empty( +Tensor empty_names( IntArrayRef size, c10::optional names, c10::optional dtype, @@ -262,12 +325,6 @@ Tensor empty_like( // See [Note: hacky wrapper removal for TensorOptions] TensorOptions options_ = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); - - TORCH_CHECK( - !(options_.has_memory_format() && optional_memory_format.has_value()), - "Cannot set memory_format both in TensorOptions and explicit argument; please delete " - "the redundant setter."); - TensorOptions options = self.options() .merge_in(options_) @@ -388,17 +445,6 @@ Tensor empty_like_quantized( } } -Tensor new_empty( - const Tensor& self, - IntArrayRef size, - c10::optional dtype_opt, - c10::optional layout_opt, - c10::optional device_opt, - c10::optional pin_memory_opt - ) { - return self.new_empty_symint(c10::SymIntArrayRef::fromIntArrayRef(size), dtype_opt, layout_opt, device_opt, pin_memory_opt); -} - Tensor new_empty_symint( const Tensor& self, SymIntArrayRef size, @@ -414,10 +460,10 @@ Tensor new_empty_symint( return at::empty_symint(size, dtype, layout, device, pin_memory, c10::nullopt); } -Tensor new_empty_strided( +Tensor new_empty_strided_symint( const Tensor& self, - IntArrayRef size, - IntArrayRef stride, + c10::SymIntArrayRef size, + c10::SymIntArrayRef stride, c10::optional dtype, c10::optional layout, c10::optional device, @@ -426,7 +472,7 @@ Tensor new_empty_strided( // See [Note: hacky wrapper removal for TensorOptions] TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); - return at::empty_strided(size, stride, self.options().merge_in(options)); + return at::empty_strided_symint(size, stride, self.options().merge_in(options)); } // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ eye ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1023,12 +1069,12 @@ Tensor tril_indices_cpu( // // 3. sequential RAM + transpose: create an n X 2 Tensor, fill the Tensor // sequentially, and then transpose it. 
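// A small worked example of the fill loop below (sizes are illustrative):
//   rows == 3, cols == 4, offset == 0  ->  tril_size == 6
//   result_data[0..6)  (row indices): 0 1 1 2 2 2
//   result_data[6..12) (col indices): 0 0 1 0 1 2
// i.e. the {2, tril_size} buffer stores all row indices first and all column
// indices second, so interpreting it as a 2 x tril_size tensor needs no copy.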
- AT_DISPATCH_ALL_TYPES_AND(kBFloat16, result.scalar_type(), "tril_indices", [&]() -> void { + AT_DISPATCH_INDEX_TYPES(result.scalar_type(), "tril_indices", [&]() -> void { // fill the Tensor with correct values - scalar_t* result_data = result.data_ptr(); + index_t* result_data = result.data_ptr(); int64_t i = 0; - scalar_t r = std::max(0, -offset), c = 0; + index_t r = std::max(0, -offset), c = 0; while (i < tril_size) { result_data[i] = r; result_data[tril_size + i++] = c; @@ -1061,14 +1107,14 @@ Tensor triu_indices_cpu( // create an empty Tensor with correct size auto result = at::native::empty_cpu({2, triu_size}, dtype_opt, layout_opt, device_opt, pin_memory_opt); - AT_DISPATCH_ALL_TYPES_AND(kBFloat16, result.scalar_type(), "triu_indices", [&]() -> void { + AT_DISPATCH_INDEX_TYPES(result.scalar_type(), "triu_indices", [&]() -> void { // fill the Tensor with correct values - scalar_t* result_data = result.data_ptr(); + index_t* result_data = result.data_ptr(); int64_t i = 0; // not typing std::max with scalar_t as it could be an unsigned type // NOTE: no need to check if the returned value of std::max overflows - // scalar_t, as i and triu_size act as a guard. - scalar_t c = std::max(0, offset), r = 0; + // index_t, as i and triu_size act as a guard. + index_t c = std::max(0, offset), r = 0; while (i < triu_size) { result_data[i] = r; result_data[triu_size + i++] = c; @@ -1090,14 +1136,6 @@ Tensor triu_indices_cpu( // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ zeros ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Tensor zeros(IntArrayRef size, - c10::optional dtype, - c10::optional layout, - c10::optional device, - c10::optional pin_memory) { - return at::zeros_symint(c10::SymIntArrayRef::fromIntArrayRef(size), dtype, layout, device, pin_memory); -} - Tensor zeros_symint(SymIntArrayRef size, c10::optional dtype, c10::optional layout, @@ -1123,8 +1161,16 @@ Tensor _efficientzerotensor(IntArrayRef size, return out; } +Tensor& zeros_sparse_out(IntArrayRef size, Tensor& result) { + result.sparse_resize_and_clear_(size, size.size(), 0.); + return result; +} + Tensor& zeros_out(IntArrayRef size, Tensor& result) { if (result.is_sparse()) { + // TODO: I think this branch should be dead, but we don't have an easy + // way to cover all sparse kernels with zeros_sparse_out, so retain this + // for now result.sparse_resize_and_clear_(size, size.size(), 0.); return result; } else { @@ -1495,7 +1541,7 @@ Tensor clone(const Tensor& src, c10::optional optional_memory if (memory_format == MemoryFormat::Preserve) { if (src.is_non_overlapping_and_dense()) { // Copy all strides, this is marginally faster than calling empty_like - self = at::empty_strided(src.sizes(), src.strides(), src.options()); + self = at::empty_strided_symint(src.sym_sizes(), src.sym_strides(), src.options()); } else { self = at::empty_like(src); } diff --git a/aten/src/ATen/native/TensorFactories.h b/aten/src/ATen/native/TensorFactories.h index 35e058df4b3a..2c0665518a9e 100644 --- a/aten/src/ATen/native/TensorFactories.h +++ b/aten/src/ATen/native/TensorFactories.h @@ -1,10 +1,9 @@ #pragma once #include -#include +#include +#include #include -#include -#include #ifndef AT_PER_OPERATOR_HEADERS #include diff --git a/aten/src/ATen/native/TensorIteratorReduce.cpp b/aten/src/ATen/native/TensorIteratorReduce.cpp index ea772bfe7e64..606a44222687 100644 --- a/aten/src/ATen/native/TensorIteratorReduce.cpp +++ b/aten/src/ATen/native/TensorIteratorReduce.cpp @@ -1,11 +1,14 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include -#include 
-#include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include /// Contains the implementation of parallel reductions in TensorIterator. diff --git a/aten/src/ATen/native/TensorProperties.cpp b/aten/src/ATen/native/TensorProperties.cpp index f509f5982d96..e37dbf56cc81 100644 --- a/aten/src/ATen/native/TensorProperties.cpp +++ b/aten/src/ATen/native/TensorProperties.cpp @@ -1,12 +1,27 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif -#include #include + namespace at { namespace native { @@ -22,8 +37,8 @@ bool nested_is_same_size(const Tensor& self, const Tensor& other) { "nested. While Other ", other.is_nested()? "is " : "is not ", "nested.") - const auto self_nt_size = get_nested_size_tensor(self); - const auto other_nt_size = get_nested_size_tensor(other); + const auto self_nt_size = _nested_tensor_size(self); + const auto other_nt_size = _nested_tensor_size(other); return at::equal(self_nt_size, other_nt_size); } int64_t size(const Tensor& self, int64_t dim) { @@ -54,7 +69,7 @@ bool cudnn_is_acceptable(const TensorBase& self) { // tensors. Maybe some cuDNN functions actually support empty tensors, but // native/THNN kernels shouldn't be much slower because the output is also // likely empty. - if (self.numel() == 0) return false; + if (self.sym_numel() == 0) return false; // NB: In the old Python code, there was also a test to see if the // cuDNN library was actually dynamically linked or not. I'm not // sure if we can actually test this. diff --git a/aten/src/ATen/native/TensorShape.cpp b/aten/src/ATen/native/TensorShape.cpp index 6eab75417476..f2ee31fe0bcd 100644 --- a/aten/src/ATen/native/TensorShape.cpp +++ b/aten/src/ATen/native/TensorShape.cpp @@ -1,33 +1,215 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include +#include #include -#include +#include #include #include #include #include +#include +#include +#include +#include #include #include #include +#include #include #include -#include #include +#include #include -#include #include -#include +#include #include -#include -#include -#include #include -#include #include #include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include 
+#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif +#include #include #include +#include #include -#include namespace at { namespace meta { @@ -56,7 +238,7 @@ inline c10::MemoryFormat cat_compute_output_memory_format(const MaterializedITen return format.value(); } -TORCH_PRECOMPUTE_META_FUNC(cat)(ITensorListRef tensors, int64_t dim) { +TORCH_PRECOMPUTE_META_FUNC(cat)(const ITensorListRef& tensors, int64_t dim) { // previously, size [0] tensors were the only possible empty tensors; thus, it wasn't possible // to cat empty tensors unless all the other tensors were 1-dimensional, so we allowed these tensors // to be "skipped". We maintain this behavior for backwards compatibility, but only for this specific @@ -64,10 +246,10 @@ TORCH_PRECOMPUTE_META_FUNC(cat)(ITensorListRef tensors, int64_t dim) { auto materialized = tensors.materialize(); cat_check_no_zero_dim(materialized); - dim = at::legacy_cat_wrap_dim(dim, tensors); + dim = at::legacy_cat_wrap_dim(dim, materialized); // Checking names before the actual dimensions. - auto maybe_outnames = namedinference::compute_cat_outnames(tensors); + auto maybe_outnames = namedinference::compute_cat_outnames(materialized); TORCH_CHECK( materialized.size() > 0, "torch.cat(): expected a non-empty list of Tensors"); @@ -123,11 +305,11 @@ TORCH_PRECOMPUTE_META_FUNC(cat)(ITensorListRef tensors, int64_t dim) { size_t size_at_dim = 0; for (const auto i : c10::irange(materialized.size())) { const Tensor& t = materialized[i]; + all_same_dtype = all_same_dtype && out_dtype == t.scalar_type(); if (!at::native::cat_should_skip_tensor(t)) { at::native::check_cat_shape_except_dim(materialized[valid], t, dim, i); size_at_dim += t.size(dim); all_contiguous = all_contiguous && t.is_contiguous(memory_format); - all_same_dtype = all_same_dtype && out_dtype == t.scalar_type(); all_same_sizes_and_stride = all_same_sizes_and_stride && t.sizes() == materialized[valid].get().sizes() && t.strides() == materialized[valid].get().strides(); @@ -202,9 +384,48 @@ Tensor& set_storage_cpu_(Tensor& result, Storage storage, int64_t storage_offset return result; } -Tensor& set_(Tensor& result, const Tensor& storage, int64_t storage_offset, IntArrayRef size, IntArrayRef stride) { +Tensor& set_storage_meta__symint(Tensor& result, Storage storage, c10::SymInt storage_offset, c10::SymIntArrayRef size, c10::SymIntArrayRef stride) { + checkSetStorage(result, storage, storage_offset, size, stride); + + c10::SymDimVector contiguous_strides; + if (stride.data() == nullptr) { + // TODO: dedupe this with empty() symbolic logic + int64_t dim = size.size(); + contiguous_strides.resize(dim); + if (dim > 0) { + const auto last_idx = dim - 1; + contiguous_strides.at(last_idx) = 1; + for (auto i = last_idx - 1; i >= 0; --i) { + // TODO: max with 1 + contiguous_strides.at(i) = contiguous_strides.at(i+1) * size.at(i+1); + } + } + stride = contiguous_strides; + } + + // Run this before storage setting so we can access numel + 
result.unsafeGetTensorImpl()->set_sizes_and_strides(size, stride, storage_offset); + + // Matches maybe_resize_storage_cpu no-numel behavior + if (result.sym_numel() != 0) { + // maybe_resize_storage_cpu can handle no storage exists at all but + // that should never be the case here + TORCH_INTERNAL_ASSERT(storage); + TORCH_CHECK(storage.resizable(), "Trying to resize storage that is not resizable"); + // All meta data pointers are the same, so we don't have to "re" allocate + // it. TODO: Actually this might not quite be correct if we use special + // pointers to track whether or not fake cuda tensors are pinned or not + const auto itemsize = result.dtype().itemsize(); + c10::SymInt size_bytes = at::detail::computeStorageNbytes( + size, stride, itemsize, storage_offset); + storage.set_nbytes(std::move(size_bytes)); + } + return result; +} + +Tensor& set__symint(Tensor& result, const Tensor& storage, c10::SymInt storage_offset, c10::SymIntArrayRef size, c10::SymIntArrayRef stride) { TORCH_CHECK(storage.is_contiguous(), "passed in tensor to be used as storage must be contiguous"); - return result.set_(storage.storage(), storage_offset + storage.storage_offset(), size, stride); + return result.set__symint(storage.storage(), storage_offset + storage.sym_storage_offset(), size, stride); } Tensor& set_tensor_(Tensor& result, const Tensor& source) { @@ -300,7 +521,7 @@ Tensor sparse_broadcast_to(const Tensor& self, IntArrayRef size) { new_values_size[0] = new_indices_size[1]; Tensor new_values = values.expand(broadcast_dense_sizes).repeat_interleave(nnz_factor, 0); - Tensor new_indices = at::native::new_empty(indices, new_indices_size); + Tensor new_indices = indices.new_empty(new_indices_size); if (broadcast_sizes.size()>0) { // ones(broadcast_sizes).nonzero() is equivalent to // product(map(arange, broadcast_sizes)) but avoids creating @@ -318,8 +539,8 @@ Tensor sparse_broadcast_to(const Tensor& self, IntArrayRef size) { return at::sparse_coo_tensor(new_indices, new_values, size)._coalesced_(is_coalesced); } -Tensor broadcast_to(const Tensor& self, IntArrayRef size) { - return self.expand(size); +Tensor broadcast_to_symint(const Tensor& self, SymIntArrayRef size) { + return self.expand_symint(size); } std::vector broadcast_tensors(TensorList tensors) { @@ -327,7 +548,7 @@ std::vector broadcast_tensors(TensorList tensors) { } TORCH_IMPL_FUNC(cat_out_cpu) -(ITensorListRef tensors, +(const ITensorListRef& tensors, int64_t dim, int64_t valid, bool all_contiguous, @@ -428,6 +649,23 @@ Tensor concat(TensorList tensors, int64_t dim) { return at::cat(tensors, dim); } +// torch.concatenate, alias for torch.cat +Tensor& concatenate_out(TensorList tensors, Dimname dim, Tensor& result) { + return at::cat_out(result, tensors, dimname_to_position(tensors[0], dim)); +} + +Tensor concatenate(TensorList tensors, Dimname dim) { + return at::cat(tensors, dimname_to_position(tensors[0], dim)); +} + +Tensor& concatenate_out(TensorList tensors, int64_t dim, Tensor & result) { + return at::cat_out(result, tensors, dim); +} + +Tensor concatenate(TensorList tensors, int64_t dim) { + return at::cat(tensors, dim); +} + static bool sizes_match_except(IntArrayRef s1, IntArrayRef s2, int64_t dim_except /* should already be wrapped */) { if (s1.size() != s2.size()) { return false; @@ -458,16 +696,16 @@ static void check_cat_sparse_dims(Tensor const &t, ", but tensor at position ", pos, " has ", t.sparse_dim(), ", ", t.dense_dim(), "."); } -static Tensor cat_sparse_impl(TensorList tensors, int64_t dim) { +static Tensor 
cat_sparse_impl(const MaterializedITensorListRef& tensors, int64_t dim) { std::vector indices; std::vector values; - int64_t wrapped = maybe_wrap_dim(dim, tensors[0].dim()); - int64_t sparse_dim = tensors[0].sparse_dim(); - int64_t dense_dim = tensors[0].dense_dim(); - IntArrayRef sizes = tensors[0].sizes(); + int64_t wrapped = maybe_wrap_dim(dim, tensors[0].get().dim()); + int64_t sparse_dim = tensors[0].get().sparse_dim(); + int64_t dense_dim = tensors[0].get().dense_dim(); + IntArrayRef sizes = tensors[0].get().sizes(); if (wrapped < sparse_dim) { for (const auto i : c10::irange(tensors.size())) { - auto const &t = tensors[i]; + const Tensor& t = tensors[i]; check_cat_sparse_dims(t, i, sizes, wrapped, sparse_dim, dense_dim); indices.push_back(t._indices()); values.push_back(t._values()); @@ -486,7 +724,7 @@ static Tensor cat_sparse_impl(TensorList tensors, int64_t dim) { int64_t col = 0; int64_t cumulative_offset = 0; for (const auto i : c10::irange(tensors.size())) { - auto const &t = tensors[i]; + const Tensor& t = tensors[i]; int64_t this_piece_size = t._nnz(); // cumulative_offset is zero for the first piece, so // don't waste time doing this operation unless i > 0. @@ -502,10 +740,10 @@ static Tensor cat_sparse_impl(TensorList tensors, int64_t dim) { idxs, vals, sizes_copy, - optTypeMetaToScalarType(tensors[0].options().dtype_opt()), - tensors[0].options().layout_opt(), - tensors[0].options().device_opt(), - tensors[0].options().pinned_memory_opt()); + optTypeMetaToScalarType(tensors[0].get().options().dtype_opt()), + tensors[0].get().options().layout_opt(), + tensors[0].get().options().device_opt(), + tensors[0].get().options().pinned_memory_opt()); } else { // Catting along a dense dimension requires us to create new values. @@ -527,29 +765,33 @@ static Tensor cat_sparse_impl(TensorList tensors, int64_t dim) { // The dimension in each tensor's values object that corresponds to the overall dimension along which we're catting. int64_t values_dim = wrapped - sparse_dim + 1; // The final size along the catted dimension. - const int64_t total_size = std::accumulate(tensors.begin(), tensors.end(), static_cast(0), [values_dim](int64_t l, Tensor const &r) { - return l + r._values().size(values_dim); - }); - auto zeros_sizes = tensors[0]._values().sizes().vec(); + const int64_t total_size = std::accumulate( + tensors.begin(), + tensors.end(), + static_cast(0), + [values_dim](int64_t l, const Tensor& r) { + return l + r._values().size(values_dim); + }); + auto zeros_sizes = tensors[0].get()._values().sizes().vec(); int64_t cumulative_size = 0; std::vector vals_pieces; std::vector idxs_pieces; for (const auto i : c10::irange(tensors.size())) { - auto const &t = tensors[i]; + const Tensor& t = tensors[i]; check_cat_sparse_dims(t, i, sizes, wrapped, sparse_dim, dense_dim); // dimension 0 of values corresponds to the number of values, // rather than to any logical dimension of the sparse tensor. 
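// Illustration of the padding scheme below (shapes are hypothetical):
//   catting along a dense dim with sparse_dim == 1, dense_dim == 1, so
//   values_dim == wrapped - sparse_dim + 1 == 1;
//   t0._values(): (nnz0, 2), t1._values(): (nnz1, 3)  ->  total_size == 5
//   t0's piece becomes cat({zeros(nnz0, 0), vals0, zeros(nnz0, 3)}, 1)
//   t1's piece becomes cat({zeros(nnz1, 2), vals1, zeros(nnz1, 0)}, 1)
// so every piece is total_size wide and only its own column range is nonzero.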
zeros_sizes[0] = t._values().size(0); zeros_sizes[values_dim] = cumulative_size; cumulative_size += t._values().size(values_dim); - auto z1 = native::zeros( + auto z1 = at::zeros( zeros_sizes, optTypeMetaToScalarType(t._values().options().dtype_opt()), t._values().options().layout_opt(), t._values().options().device_opt(), t._values().options().pinned_memory_opt()); zeros_sizes[values_dim] = total_size - cumulative_size; - auto z2 = native::zeros( + auto z2 = at::zeros( zeros_sizes, optTypeMetaToScalarType(t._values().options().dtype_opt()), t._values().options().layout_opt(), @@ -565,16 +807,17 @@ static Tensor cat_sparse_impl(TensorList tensors, int64_t dim) { at::cat(idxs_pieces, 1), at::cat(vals_pieces), sizes_copy, - optTypeMetaToScalarType(tensors[0].options().dtype_opt()), - tensors[0].options().layout_opt(), - tensors[0].options().device_opt(), - tensors[0].options().pinned_memory_opt()); + optTypeMetaToScalarType(tensors[0].get().options().dtype_opt()), + tensors[0].get().options().layout_opt(), + tensors[0].get().options().device_opt(), + tensors[0].get().options().pinned_memory_opt()); } } -Tensor cat_sparse(TensorList tensors, int64_t dim) { - auto maybe_outnames = namedinference::compute_cat_outnames(tensors); - auto result = cat_sparse_impl(tensors, at::legacy_cat_wrap_dim(dim, tensors)); +Tensor cat_sparse(const ITensorListRef& tensors, int64_t dim) { + auto materialized = tensors.materialize(); + auto maybe_outnames = namedinference::compute_cat_outnames(materialized); + auto result = cat_sparse_impl(materialized, at::legacy_cat_wrap_dim(dim, materialized)); namedinference::propagate_names_if_nonempty(result, maybe_outnames); return result; } @@ -660,54 +903,66 @@ std::vector chunk(const Tensor& self, int64_t chunks, int64_t dim) { TORCH_CHECK(chunks > 0, "chunk expects `chunks` to be greater than 0, got: ", chunks); - const auto dim_size = self.size(dim); - int64_t split_size = (dim_size + chunks - 1) / chunks; + const auto dim_size = self.sym_size(dim); + auto split_size = (dim_size + chunks - 1) / chunks; // We need to call split_with_sizes in the case where split_size and dimension size are 0, because // a call to split would discard the number of chunks (because we can have an arbitrary number of // 0-sized chunks adding up to 0). So, call split_with_sizes with the correct number of chunks, // eventually we will do this for all cases. 
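// A worked case of the branch below (numbers are illustrative):
//   self.sym_size(dim) == 0 and chunks == 3 give split_size == 0, so a plain
//   split(0, dim) could not preserve the requested chunk count; calling
//   split_with_sizes with {0, 0, 0} instead returns exactly 3 empty tensors.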
if (split_size == 0 && dim_size == 0) { - std::vector split_sizes(chunks, split_size); + std::vector split_sizes(chunks, split_size); split_sizes[chunks - 1] = split_size - (split_size * chunks - dim_size); - return self.split_with_sizes(split_sizes, dim); + return self.split_with_sizes_symint(split_sizes, dim); } else { - return self.split(split_size, dim); + return self.split_symint(split_size, dim); } } -std::vector tensor_split(const Tensor& self, int64_t sections, int64_t dim) { +std::vector tensor_split_sections_symint(const Tensor& self, c10::SymInt sym_sections, int64_t dim) { TORCH_CHECK(self.dim() > 0, "tensor_split expected at least a 1-dimensional tensor, but got a tensor with ", self.dim()," dims"); int64_t dim_ = maybe_wrap_dim(dim, self.dim()); + // NB: intentional, sections specifies number of output tensors, which + // cannot be polymorphic + int64_t sections = sym_sections.guard_int(__FILE__, __LINE__); TORCH_CHECK(sections > 0, "number of sections must be larger than 0, got ", sections); - const auto dim_size = self.size(dim_); + const auto dim_size = self.sym_size(dim_); std::vector splits(sections); - int64_t min_split_size = dim_size / sections; - int64_t num_splits_one_extra = dim_size % sections; - int64_t start_idx = 0; + auto min_split_size = dim_size / sections; + auto num_splits_one_extra = dim_size % sections; + c10::SymInt start_idx = 0; for (const auto split_idx : c10::irange(sections)) { - int64_t split_size = (split_idx < num_splits_one_extra) ? (min_split_size + 1) : min_split_size; - splits[split_idx] = at::slice(self, dim_, start_idx, start_idx + split_size); + auto split_size = (num_splits_one_extra > split_idx) ? (min_split_size + 1) : min_split_size; + splits[split_idx] = at::slice_symint(self, dim_, start_idx, start_idx + split_size); start_idx += split_size; } return splits; } -std::vector tensor_split(const Tensor& self, IntArrayRef indices, int64_t dim) { +template +std::vector _tensor_split_indices(const Tensor& self, ArrayRef indices, int64_t dim) { TORCH_CHECK(self.dim() > 0, "tensor_split expected at least a 1-dimensional tensor, but got a tensor with ", self.dim()," dims"); int64_t dim_ = maybe_wrap_dim(dim, self.dim()); int64_t num_indices = indices.size(); std::vector splits(num_indices + 1); - int64_t start_idx = 0; + T start_idx(0); for (const auto split_idx : c10::irange(num_indices)) { - int64_t end_idx = indices[split_idx]; - splits[split_idx] = at::slice(self, dim_, start_idx, end_idx); + auto end_idx = indices[split_idx]; + splits[split_idx] = at::symint::slice(self, dim_, start_idx, end_idx); start_idx = end_idx; } - splits[num_indices] = at::slice(self, dim_, start_idx, self.size(dim_)); + splits[num_indices] = at::symint::slice(self, dim_, start_idx, at::symint::size(self, dim_)); return splits; } +std::vector tensor_split(const Tensor& self, IntArrayRef indices, int64_t dim) { + return _tensor_split_indices(self, indices, dim); +} + +std::vector tensor_split_indices_symint(const Tensor& self, SymIntArrayRef indices, int64_t dim) { + return _tensor_split_indices(self, indices, dim); +} + std::vector tensor_split(const Tensor& self, const Tensor& tensor_indices_or_sections, int64_t dim) { TORCH_CHECK(self.dim() > 0, "tensor_split expected at least a 1-dimensional tensor, but got a tensor with ", self.dim()," dims"); auto split_device = tensor_indices_or_sections.device(); @@ -843,12 +1098,7 @@ Tensor diag_embed(const Tensor& self, int64_t offset, int64_t dim1_, int64_t dim return result; } -Tensor expand_symint(const Tensor& self, 
c10::SymIntArrayRef packed_size, bool implicit) { - auto size = asIntArrayRefSlow(packed_size); - return self.expand(size, implicit); -} - -Tensor expand(const Tensor& self, IntArrayRef size, bool /*unused*/) { +Tensor expand(const Tensor& self, c10::IntArrayRef size, bool /*unused*/) { TORCH_CHECK(size.size() >= (size_t)self.dim(), "expand(", self.toString(), "{", self.sizes(), "}, size=", size, "): the number of sizes provided (", size.size(), ") ", @@ -864,7 +1114,7 @@ Tensor expand(const Tensor& self, IntArrayRef size, bool /*unused*/) { } Tensor expand_as(const Tensor& self, const Tensor& other) { - return self.expand(other.sizes()); + return self.expand_symint(other.sym_sizes()); } Tensor sum_to_size(const Tensor& self, IntArrayRef size) { @@ -884,6 +1134,7 @@ Tensor make_qtensor(const Tensor& self, IntArrayRef size, IntArrayRef stride, Qu } Tensor as_strided_tensorimpl(const Tensor& self, IntArrayRef size, IntArrayRef stride, optional storage_offset_) { + TORCH_INTERNAL_ASSERT(!self.is_mps(), "as_strided_tensorimpl does not work with MPS; call self.as_strided(...) instead"); auto storage_offset = storage_offset_.value_or(self.storage_offset()); auto result = at::detail::make_tensor( c10::TensorImpl::VIEW, Storage(self.storage()), self.key_set(), self.dtype()); @@ -891,6 +1142,22 @@ Tensor as_strided_tensorimpl(const Tensor& self, IntArrayRef size, IntArrayRef s return result; } +Tensor as_strided_tensorimpl_meta(const Tensor& self, IntArrayRef size, IntArrayRef stride, optional storage_offset_) { + auto storage_offset = storage_offset_.value_or(self.storage_offset()); + auto result = at::detail::make_tensor( + c10::TensorImpl::VIEW, Storage(self.storage()), self.key_set(), self.dtype()); + setStrided(result, size, stride, storage_offset); + return result; +} + +Tensor as_strided_tensorimpl_meta_symint(const Tensor& self, SymIntArrayRef sym_size, SymIntArrayRef sym_stride, optional sym_storage_offset_) { + auto sym_storage_offset = sym_storage_offset_.value_or(self.sym_storage_offset()); + auto result = at::detail::make_tensor( + c10::TensorImpl::VIEW, Storage(self.storage()), self.key_set(), self.dtype()); + setStrided(result, sym_size, sym_stride, sym_storage_offset); + return result; +} + Tensor as_strided_qtensorimpl(const Tensor& self, IntArrayRef size, IntArrayRef stride, optional storage_offset_) { auto storage_offset = storage_offset_.value_or(self.storage_offset()); auto quantizer = get_qtensorimpl(self)->quantizer(); @@ -921,20 +1188,18 @@ Tensor as_strided_qtensorimpl(const Tensor& self, IntArrayRef size, IntArrayRef return result; } -const Tensor &as_strided_(const Tensor& self, IntArrayRef size, IntArrayRef stride, optional storage_offset_) { - auto storage_offset = storage_offset_.value_or(self.storage_offset()); +const Tensor &as_strided__symint(const Tensor& self, SymIntArrayRef size, SymIntArrayRef stride, optional storage_offset_) { + auto storage_offset = storage_offset_.value_or(self.sym_storage_offset()); setStrided(self, size, stride, storage_offset); return self; } -Tensor narrow_copy_symint(const Tensor& self, int64_t dim, int64_t start, SymInt sym_length) { - return self.narrow_copy(dim, start, sym_length.expect_int()); -} - Tensor narrow_copy_dense(const Tensor& self, int64_t dim, int64_t start, int64_t length) { return self.narrow(dim, start, length).clone(at::MemoryFormat::Contiguous); } +// Should just use narrow_copy_out, but this API is used internally at Meta: +// https://github.com/pytorch/pytorch/pull/87045#issuecomment-1309353561 Tensor 
narrow_copy_dense_cpu(const Tensor& self, int64_t dim, int64_t start, int64_t length){ auto output = at::empty_like(self); return narrow_copy_dense_cpu_out(self, dim, start, length, output); @@ -944,9 +1209,10 @@ Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_ int64_t allDim = self.dim(); int64_t end = start+length; TORCH_CHECK(allDim > 0, "narrow() cannot be applied to a 0-dim tensor."); + TORCH_CHECK(length >= 0, "narrow(): length must be non-negative."); TORCH_CHECK(dim >= 0 && dim < allDim, "Dimension ", dim, " out of range. Expecting 0 <= dim < ", allDim, "."); - TORCH_CHECK(start >= 0 && length >= 0 && end <= self.size(dim), + TORCH_CHECK(start >= 0 && end <= self.size(dim), "Invalid range to narrow. range(start, start+length) must be a subset of range(0, ", self.size(dim), ").") Tensor indices = self._indices(); int64_t sparse_dim = self.sparse_dim(); @@ -974,6 +1240,8 @@ Tensor narrow_copy_sparse(const Tensor& self, int64_t dim, int64_t start, int64_ return newTensor._coalesced_(self.is_coalesced()); } +// Should just use narrow_copy_out, but this API is used internally at Meta: +// https://github.com/pytorch/pytorch/pull/87045#issuecomment-1309353561 Tensor& narrow_copy_dense_cpu_out( const Tensor& self, int64_t dim, int64_t start, int64_t length, Tensor& output ) { @@ -1057,20 +1325,35 @@ Tensor& narrow_copy_dense_cpu_out( Tensor narrow(const Tensor& self, int64_t dim, int64_t start, int64_t length) { TORCH_CHECK(self.dim() > 0, "narrow() cannot be applied to a 0-dim tensor."); + TORCH_CHECK(length >= 0, "narrow(): length must be non-negative."); auto cur_size = self.size(dim); if (start != cur_size) { // start being the end is valid, but not a valid dim specification. start = maybe_wrap_dim(start, cur_size); } - TORCH_CHECK(length >= 0 && start <= cur_size - length, + TORCH_CHECK(start <= cur_size - length, "start (", start, ") + length (", length, ") exceeds dimension size (", cur_size, ")."); return at::slice(self, dim, start, start + length, 1); } -Tensor narrow(const Tensor& self, int64_t dim, const Tensor& start, int64_t length) { +Tensor narrow_symint(const Tensor& self, int64_t dim, SymInt start, SymInt length) { + TORCH_CHECK(self.dim() > 0, "narrow() cannot be applied to a 0-dim tensor."); + TORCH_CHECK(length >= 0, "narrow(): length must be non-negative."); + auto cur_size = self.sym_size(dim); + if (start != cur_size) { // start being the end is valid, but not a valid dim specification. + start = maybe_wrap_dim(start, cur_size); + } + TORCH_CHECK(start <= cur_size - length, + "start (", start, ") + length (", length, ") exceeds dimension size (", cur_size, ")."); + return at::slice_symint(self, dim, start, start + length, 1); +} + +// This overload exists purely for XLA, because they wanted to pass in "symbolic" +// start via Tensor. 
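// Usage sketch (values are hypothetical): eager backends just read the scalar
// out of the 0-dim tensor, e.g.
//   auto start = at::scalar_tensor(2, at::kLong);
//   auto y = at::narrow(x, /*dim=*/0, start, /*length=*/3);  // like x.narrow(0, 2, 3)
// while a tracing backend such as XLA can keep `start` as a data-dependent value.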
+Tensor narrow_tensor_symint(const Tensor& self, int64_t dim, const Tensor& start, SymInt length) { TORCH_CHECK(start.dim() == 0 && isIntegralType(start.scalar_type(), /*includeBool=*/false), "start must be an 0-dim integral Tensor."); int64_t st = start.item(); - return at::narrow(self, dim, st, length); + return at::narrow_symint(self, dim, c10::SymInt(st), length); } std::tuple> @@ -1261,18 +1544,65 @@ Tensor alias_with_sizes_and_strides( return self_; } -Tensor reshape(const Tensor& self, IntArrayRef proposed_shape) { - // reshape has special autograd logic since it sometimes returns a view but sometimes does not - // we have to intercept here instead of using dispatcher - // otherwise we will see "autograd still running" kind of error in inference mode: - // * if we create a tensor in inference mode scope, - // then pass it to a inference mode decorated function, - // everything is fine - // * but if we create the input tensor not with inference mode, - // then errors like "Cannot set version_counter for inference tensor" arise - if (self.is_nested()) { - return at::_reshape_nested(self, proposed_shape); +Tensor reshape_symint(const Tensor& self, c10::SymIntArrayRef proposed_shape) { + if (self.is_sparse()) { + AT_ERROR("reshape is not implemented for sparse tensors"); + } + c10::SymDimVector shape = infer_size_dv(proposed_shape, self.sym_numel()); + + if (self.is_mkldnn()) { + return at::_mkldnn_reshape(self, c10::asIntArrayRefSlow(shape)); + } + + // `computeStride` returns the proper strides to use if this + // `reshape` can be just a view. + auto stride = at::detail::computeStride(self.sym_sizes(), self.sym_strides(), shape); + + // NB: Even though we have viewable geometry and the target strides here, + // we do not just call `as_strided` on `self` because the backward + // for `as_strided` is not as efficient as that of `view` (since the + // former is meant to handle general cases). + // + // Similarly we don't call `view` because it duplicates some of the work + // we've already done, and instead call our internal/private operator + // `_reshape_alias` that essentially does the same thing as `view` and + // `as_strided` without any of the extra overhead. + if (stride.has_value()) { + // Temporary check to revert to the old behavior/view in cases where the + // device is not supported (e.g. for XLA the operation is not supported + // so we use `view` instead). + // + // We need to do the checks here instead of in `native_functions.yaml` + // to preserve backwards compatibility. + if (!self.is_xla() && !self.is_lazy() && !self.is_ipu() && !at::isTensorSubclassLike(self)) { + return self._reshape_alias_symint(shape, stride.value()); + } else { + return self.view_symint(shape); + } + } + return at::_unsafe_view_symint(self.clone(at::MemoryFormat::Contiguous), shape); +} + +Tensor _reshape_copy_symint(const Tensor& self, c10::SymIntArrayRef proposed_shape) { + if (self.is_sparse()) { + TORCH_CHECK(0, "_reshape_copy is not implemented for sparse tensors"); + } + c10::SymDimVector shape = infer_size_dv(proposed_shape, self.sym_numel()); + + if (self.is_mkldnn()) { + TORCH_CHECK(0, "_reshape_copy not implemented for mkldnn tensors"); + } + + if (self.is_contiguous()) { + return self.view_symint(shape).clone(at::MemoryFormat::Contiguous); + } else { + return at::_unsafe_view_symint(self.clone(at::MemoryFormat::Contiguous), shape); } +} + +// Duplicate of above code for non-symbolic ints. Kept for BC purposes and to +// minimize breakages. 
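// Illustration of the view-vs-copy decision made above (shapes are illustrative):
//   a contiguous (2, 3) tensor reshaped to (3, 2): computeStride yields {2, 1},
//     so the result is a cheap _reshape_alias over the same storage;
//   a (3, 2) tensor obtained by transposing a contiguous (2, 3) one, reshaped
//     to (6): its strides {1, 3} admit no flat view, computeStride returns
//     nullopt, and the input is cloned contiguously and _unsafe_view'd instead.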
+Tensor reshape(const Tensor& self, IntArrayRef proposed_shape) { if (self.is_sparse()) { AT_ERROR("reshape is not implemented for sparse tensors"); } @@ -1399,13 +1729,21 @@ QuantizerPtr create_subtensor_quantizer(const Tensor& self, bool is_select, int6 } Tensor select(const Tensor& self, int64_t dim, int64_t index) { + return at::select_symint(self, dim, c10::SymInt{index}); +} + +Tensor select(const Tensor& self, Dimname dim, int64_t index) { + return at::select_symint(self, dimname_to_position(self, dim), c10::SymInt{index}); +} + +Tensor select_symint(const Tensor& self, int64_t dim, c10::SymInt index) { int64_t ndim = self.dim(); if (ndim == 0) { TORCH_CHECK_INDEX(false, "select() cannot be applied to a 0-dim tensor."); } dim = maybe_wrap_dim(dim, ndim); - auto size = self.size(dim); - if (index < -size || index >= size) { + auto size = self.sym_sizes()[dim]; + if (size < -index || size <= index) { if (self.has_names() && self.names()[dim] != Dimname::wildcard()) { TORCH_CHECK_INDEX(false, "select(): index ", index, " out of range for tensor of size ", self.sizes(), " at dimension ", self.names()[dim]); @@ -1417,32 +1755,37 @@ Tensor select(const Tensor& self, int64_t dim, int64_t index) { index += size; } if (self.is_sparse()) { - return select_sparse(self, dim, index); + return select_sparse(self, dim, index.guard_int(__FILE__, __LINE__)); } - DimVector sizes(self.sizes().begin(), self.sizes().end()); - DimVector strides(self.strides().begin(), self.strides().end()); - auto storage_offset = self.storage_offset() + index * strides[dim]; - sizes.erase(sizes.begin() + dim); - strides.erase(strides.begin() + dim); Tensor result; if (self.is_quantized()) { - auto quantizer = create_subtensor_quantizer(self, true, index, index + 1, dim, 1); + auto local_index = index.guard_int(__FILE__, __LINE__); + + DimVector sizes(self.sizes().begin(), self.sizes().end()); + DimVector strides(self.strides().begin(), self.strides().end()); + auto storage_offset = self.storage_offset() + local_index * strides[dim]; + sizes.erase(sizes.begin() + dim); + strides.erase(strides.begin() + dim); + + auto quantizer = create_subtensor_quantizer(self, true, local_index, local_index + 1, dim, 1); result = as_strided_qtensorimpl(self, sizes, strides, storage_offset, quantizer); } else { - result = self.as_strided(sizes, strides, storage_offset); + std::vector sizes(self.sym_sizes().begin(), self.sym_sizes().end()); + std::vector strides(self.sym_strides().begin(), self.sym_strides().end()); + auto storage_offset = self.sym_storage_offset() + index * strides[dim]; + sizes.erase(sizes.begin() + dim); + strides.erase(strides.begin() + dim); + + result = self.as_strided_symint(sizes, strides, storage_offset); } namedinference::propagate_names_except(result, self, {dim}); return result; } -Tensor select(const Tensor& self, Dimname dim, int64_t index) { - return at::select(self, dimname_to_position(self, dim), index); -} - -Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) { - auto grad_input = at::zeros(input_sizes, grad.options()); - grad_input.select(dim, index).copy_(grad); +Tensor select_backward_symint(const Tensor& grad, c10::SymIntArrayRef input_sizes, int64_t dim, c10::SymInt index) { + auto grad_input = at::zeros_symint(input_sizes, grad.options()); + grad_input.select_symint(dim, index).copy_(grad); return grad_input; } @@ -2095,10 +2438,6 @@ Tensor slice( // TODO: support negative strides TORCH_CHECK(step > 0, "slice step must be positive"); - // INT64_MAX 
stands for default value. - if (start_val == INT64_MAX) { - start_val = 0; - } if (start_val < 0) { start_val += sizes[dim]; } @@ -2125,6 +2464,10 @@ Tensor slice( auto quantizer = create_subtensor_quantizer(self, false, start_val, end_val, dim, step); result = as_strided_qtensorimpl(self, sizes, strides, storage_offset, quantizer); } else { + // NB: it is extremely important to perform a redispatch here for + // the MPS backend; if you call directly to as_strided_tensorimpl, + // the necessary metadata for MPS will not get setup and you will + // get silently wrong results result = self.as_strided(sizes, strides, storage_offset); } namedinference::propagate_names(result, self); @@ -2149,8 +2492,8 @@ std::vector split(const Tensor& self, int64_t split_size, int64_t dim) { return splits; } -std::vector split(const Tensor& self, IntArrayRef sizes, int64_t dim) { - return at::split_with_sizes(self, sizes, dim); +std::vector split_symint(const Tensor& self, c10::SymIntArrayRef sizes, int64_t dim) { + return at::split_with_sizes_symint(self, sizes, dim); } std::vector unsafe_split(const Tensor& self, int64_t split_size, int64_t dim) { @@ -2199,7 +2542,7 @@ std::vector split_with_sizes(const Tensor& self, IntArrayRef split_sizes TORCH_CHECK(length >= 0, "split_with_sizes expects split_sizes have only non-negative ", "entries, but got split_sizes=", split_sizes); - splits.push_back(self.narrow(dim, start_idx, length)); + splits.push_back(at::native::slice(self, dim, start_idx, start_idx + length, 1)); start_idx += length; } TORCH_CHECK(start_idx == dim_size, @@ -2521,6 +2864,132 @@ Tensor & transpose_(Tensor & self, int64_t dim0, int64_t dim1) { return self; } +namespace { +// Transpose implementation for sparse compressed layouts +// NB: We assume that dim1,dim0 have already been wrapped +static inline Tensor sparse_compressed_transpose( + const Tensor& self, + int64_t dim0, + int64_t dim1) { + auto compressed_inds = AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "compressed_inds", + [&self]() { return self.crow_indices(); }, + [&self]() { return self.ccol_indices(); }); + + auto plain_inds = AT_DISPATCH_ROW_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "plain_inds", + [&self]() { return self.col_indices(); }, + [&self]() { return self.row_indices(); }); + + const auto n_batch_dim = compressed_inds.dim() - 1; + const auto n_dense_dim = self.dim() - n_batch_dim - 2; + + // In theory it works, but missing to_dense coverage to test + TORCH_CHECK( + n_dense_dim == 0, + "transpose(): hybrid sparse compressed tensors with dense dimensions are not supported"); + + // Classify transpose "type" + enum class TransposeDim : uint8_t { Batch, Sparse, Dense }; + auto classify_dim = [&n_batch_dim](const int64_t dim) { + if (dim < n_batch_dim) { + return TransposeDim::Batch; + } else if (dim > n_batch_dim + 1) { + return TransposeDim::Dense; + } else { + return TransposeDim::Sparse; + } + }; + + const auto transpose_type = classify_dim(dim0); + { + auto dim_type_name = [](const TransposeDim dim) { + switch (dim) { + case TransposeDim::Batch: + return "Batch"; + case TransposeDim::Dense: + return "Dense"; + case TransposeDim::Sparse: + return "Sparse"; + default: + TORCH_INTERNAL_ASSERT( + false, + "Impossible TransposeDim value: ", + static_cast>(dim)); + } + }; + const auto dim1_type = classify_dim(dim1); + TORCH_CHECK( + dim1_type == transpose_type, + "transpose(): can only transpose dimensions of the same type (Batch, Sparse, Dense), got ", + dim0, + "(", + dim_type_name(transpose_type), + 
")", + " and ", + dim1, + "(", + dim_type_name(dim1_type), + ")"); + } + + // We have validated everything, early exit for equal dims (no effect) + if (dim0 == dim1) { + return self.clone(); + } + + auto result_sizes = DimVector(self.sizes()); + std::swap(result_sizes[dim0], result_sizes[dim1]); + Tensor result_vals; + auto result_layout = self.layout(); + + if (transpose_type == TransposeDim::Batch) { + compressed_inds = compressed_inds.transpose(dim0, dim1).contiguous(); + plain_inds = plain_inds.transpose(dim0, dim1).contiguous(); + result_vals = self.values().transpose(dim0, dim1).contiguous(); + + } else if (transpose_type == TransposeDim::Dense) { + // NB: This code should work, but is untestable due to lack of support for + // dense dimensions in to_dense. The Debug assert is present to emphasize + // the fact that the block should not be possible to hit this code block + TORCH_INTERNAL_ASSERT( + false, "transpose(): Shouldn't have reached this point"); + result_vals = AT_DISPATCH_PLAIN_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "sparse_transpose", + // un-blocked: 2 sparse dims map to single nnz dim, so dense dim0/1 are + // one position left + [&]() { return self.values().transpose(dim0 - 1, dim1 - 1); }, + // blocked: 2 sparse dims map to 3 (nnz, ) + blocksize dims, so dense + // dim0/1 are one position right + [&]() { return self.values().transpose(dim0 + 1, dim1 + 1); }); + } else /*if (transpose_type == TransposeDim::Sparse) */ { + // Flip the layout + result_layout = sparse_csr::flip_compressed_layout(self.layout()); + result_vals = AT_DISPATCH_PLAIN_SPARSE_COMPRESSED_LAYOUTS( + self.layout(), + "sparse_transpose", + // un-blocked: no change to values, layout is flipped. + [&]() { return self.values(); }, + // blocked: the blocks are nested under the sparse dims so they must be + // transposed as well. + [&]() { + return self.values().transpose(-2 - n_dense_dim, -1 - n_dense_dim); + }); + } + return at::native::_sparse_compressed_tensor_unsafe( + compressed_inds, + plain_inds, + result_vals, + result_sizes, + self.scalar_type(), + result_layout, + self.device()); +} +} // namespace + Tensor transpose(const Tensor & self, int64_t dim0, int64_t dim1) { auto ndims = self.dim(); dim0 = maybe_wrap_dim(dim0, ndims); @@ -2533,45 +3002,25 @@ Tensor transpose(const Tensor & self, int64_t dim0, int64_t dim1) { Tensor self_clone = self.clone(); return sparse_transpose_(self_clone, dim0, dim1); } - TORCH_CHECK(!(self.layout() == kSparseBsr || self.layout() == kSparseBsc), - "Transposition of tensors with ", self.layout(), " layout is currently not supported."); - - // Transpose of a tensor is a view operation. 
- if (dim0 == dim1) { - return self; + if (self.layout() == kSparseBsr || self.layout() == kSparseCsr || + self.layout() == kSparseBsc || self.layout() == kSparseCsc) { + return sparse_compressed_transpose(self, dim0, dim1); } if (self.is_mkldnn()) { return at::_mkldnn_transpose(self, dim0, dim1); } - DimVector sizes(self.sizes().begin(), self.sizes().end()); - std::swap(sizes[dim0], sizes[dim1]); - - if (self.layout() == kSparseCsr) { - TORCH_CHECK(self.dim() == 2, "Transposition for layout ", self.layout(), " is only supported for 2D inputs.") - return at::native::_sparse_csc_tensor_unsafe( - self.crow_indices(), - self.col_indices(), - self.values(), - sizes, - self.scalar_type(), - c10::kSparseCsc, - self.device()); - } - if (self.layout() == kSparseCsc) { - return at::native::_sparse_csr_tensor_unsafe( - self.ccol_indices(), - self.row_indices(), - self.values(), - sizes, - self.scalar_type(), - c10::kSparseCsr, - self.device()); + // Transpose of a tensor is a view operation. + if (dim0 == dim1) { + return self.alias(); } - DimVector strides(self.strides().begin(), self.strides().end()); + + SymDimVector sizes(self.sym_sizes().begin(), self.sym_sizes().end()); + std::swap(sizes[dim0], sizes[dim1]); + SymDimVector strides(self.sym_strides().begin(), self.sym_strides().end()); std::swap(strides[dim0], strides[dim1]); - auto result = self.as_strided(sizes, strides); + auto result = self.as_strided_symint(sizes, strides); propagate_transposed_names(result, self, dim0, dim1); return result; } @@ -2599,30 +3048,30 @@ Tensor & t_(Tensor & self) { return self.transpose_(0, self.dim() < 2 ? 0 : 1); } -std::tuple +std::tuple inferSqueezeGeometry(const Tensor &tensor) { - DimVector sizes; - DimVector strides; + SymDimVector sizes; + SymDimVector strides; for(const auto d : c10::irange(tensor.dim())) { - if(tensor.sizes()[d] != 1) { - sizes.push_back(tensor.sizes()[d]); - strides.push_back(tensor.strides()[d]); + if(tensor.sym_sizes()[d] != 1) { + sizes.push_back(tensor.sym_sizes()[d]); + strides.push_back(tensor.sym_strides()[d]); } } return std::make_tuple(std::move(sizes), std::move(strides)); } -std::tuple +std::tuple inferSqueezeGeometry(const Tensor& tensor, int64_t dim) { - DimVector sizes; - DimVector strides; + SymDimVector sizes; + SymDimVector strides; for(const auto d : c10::irange(tensor.dim())) { - if(d != dim || tensor.sizes()[dim] != 1) { - sizes.push_back(tensor.sizes()[d]); - strides.push_back(tensor.strides()[d]); + if(d != dim || tensor.sym_sizes()[dim] != 1) { + sizes.push_back(tensor.sym_sizes()[d]); + strides.push_back(tensor.sym_strides()[d]); } } return std::make_tuple(std::move(sizes), std::move(strides)); @@ -2652,14 +3101,14 @@ inferUnsqueezeGeometry(const Tensor& tensor, int64_t dim) { // dim is present if squeezing a single dimension and absent if squeezing all dimensions Tensor squeeze_qtensor(const Tensor& self, c10::optional dim) { auto quantizer = get_qtensorimpl(self)->quantizer(); - DimVector sizes; - DimVector strides; + SymDimVector sizes; + SymDimVector strides; std::tie(sizes, strides) = dim.has_value() ? inferSqueezeGeometry(self, dim.value()) : inferSqueezeGeometry(self); if (quantizer->qscheme() == QScheme::PER_CHANNEL_AFFINE) { const auto* per_channel_quantizer = static_cast(quantizer.get()); auto axis = per_channel_quantizer->axis(); int64_t shift = 0; - integer_range dims = dim.has_value() ? integer_range{dim.value(), dim.value() + 1} : c10::irange(self.dim()); + integer_range dims = dim.has_value() ? 
integer_range{dim.value(), dim.value() + 1} : c10::irange(0, self.dim()); for (const auto d : dims) { if (self.sizes()[d] == 1) { TORCH_CHECK(axis != d, "Squeeze is only possible on non-axis dimension for Per-Channel Quantized Tensors."); @@ -2674,7 +3123,9 @@ Tensor squeeze_qtensor(const Tensor& self, c10::optional dim) { axis, quantizer->scalar_type()); } - auto result = make_qtensor(self, sizes, strides, quantizer); + // TODO: quantized Tensor support for SymInt needs to be added but basic building blocs + // are missing for now. + auto result = make_qtensor(self, c10::asIntArrayRefSlow(sizes), c10::asIntArrayRefSlow(strides), quantizer); if (dim.has_value()) { namedinference::propagate_names_except(result, self, {dim.value()}); } else { @@ -2687,7 +3138,7 @@ Tensor squeeze_qtensor(const Tensor& self, c10::optional dim) { Tensor squeeze(const Tensor& self) { auto g = inferSqueezeGeometry(self); - at::Tensor result = self.as_strided(std::get<0>(g), std::get<1>(g)); + at::Tensor result = self.as_strided_symint(std::get<0>(g), std::get<1>(g)); auto maybe_outnames = namedinference::compute_squeeze_outnames(self); namedinference::propagate_names_if_nonempty(result, maybe_outnames); return result; @@ -2703,11 +3154,11 @@ Tensor squeeze_quantized(const Tensor& self) { Tensor squeeze(const Tensor& self, int64_t dim) { int64_t dims = self.dim(); dim = maybe_wrap_dim(dim, dims); - if (dims == 0 || self.sizes()[dim] != 1) { - return self.as_strided(self.sizes(), self.strides()); + if (dims == 0 || self.sym_sizes()[dim] != 1) { + return self.as_strided_symint(self.sym_sizes(), self.sym_strides()); } auto g = inferSqueezeGeometry(self, dim); - auto result = self.as_strided(std::get<0>(g), std::get<1>(g)); + auto result = self.as_strided_symint(std::get<0>(g), std::get<1>(g)); namedinference::propagate_names_except(result, self, {dim}); return result; } @@ -2720,7 +3171,7 @@ Tensor squeeze_quantized(const Tensor& self, int64_t dim) { Tensor & squeeze_(Tensor& self) { auto g = inferSqueezeGeometry(self); - self.as_strided_(std::get<0>(g), std::get<1>(g)); + self.as_strided__symint(std::get<0>(g), std::get<1>(g)); return self; } @@ -2728,12 +3179,12 @@ Tensor & squeeze_(Tensor& self, int64_t dim) { int64_t dims = self.dim(); dim = maybe_wrap_dim(dim, self.dim()); - if (dims == 0 || self.sizes()[dim] != 1) { - self.as_strided_(self.sizes(), self.strides()); + if (dims == 0 || self.sym_sizes()[dim] != 1) { + self.as_strided__symint(self.sym_sizes(), self.sym_strides()); return self; } auto g = inferSqueezeGeometry(self, dim); - self.as_strided_(std::get<0>(g), std::get<1>(g)); + self.as_strided__symint(std::get<0>(g), std::get<1>(g)); return self; } @@ -2782,7 +3233,7 @@ Tensor unsqueeze_sparse(Tensor const &self, int64_t dim) { if (dim <= sparse_dim) { auto new_indices = at::cat( {indices.narrow(0, 0, dim), - native::zeros( + at::zeros( {1, indices.size(1)}, kLong, indices.options().layout_opt(), @@ -2839,18 +3290,18 @@ Tensor flatten(const Tensor& self, int64_t start_dim, int64_t end_dim) { // of freedom we don't want; for example, consider shape [0, 1, 3, 0], with start_dim=1, end_dim=2. // It's clear we want result shape [0, 3, 0] but passing [0, -1, 0] to infer_size means the -1 // can take on any value and satisfy the constraints. 
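// A worked instance of the example above:
//   self.sizes() == [0, 1, 3, 0], start_dim == 1, end_dim == 2
//   slice_numel == 1 * 3 == 3 (product over dims 1..2), so the target shape is
//   [0, 3, 0]; using -1 for the flattened dim would let infer_size pick any
//   value there, because the total number of elements is 0.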
- auto slice_numel = c10::multiply_integers(self.sizes().slice(start_dim, end_dim - start_dim + 1)); - std::vector shape; + auto slice_numel = c10::multiply_integers(self.sym_sizes().slice(start_dim, end_dim - start_dim + 1)); + std::vector shape; shape.reserve(self.dim() - end_dim + start_dim); for (const auto i : c10::irange(start_dim)) { - shape.push_back(self.sizes()[i]); + shape.push_back(self.sym_sizes()[i]); } shape.push_back(slice_numel); for (const auto i : c10::irange(end_dim + 1, self.dim())) { - shape.push_back(self.sizes()[i]); + shape.push_back(self.sym_sizes()[i]); } - return native::reshape(self, shape); + return native::reshape_symint(self, shape); } Tensor flatten(const Tensor& self, int64_t start_dim, int64_t end_dim, Dimname out_dim) { @@ -3119,17 +3570,12 @@ Tensor adjoint(const Tensor &self) { } Tensor view(const Tensor& self, - IntArrayRef size) { + at::IntArrayRef size) { return view_impl(self, size); } -Tensor view_symint(const Tensor& self, - c10::SymIntArrayRef size) { - return self.view(c10::asIntArrayRefSlow(size)); -} - Tensor alias(const Tensor& self) { - return alias_with_sizes_and_strides(self, self.sizes(), self.strides()); + return alias_with_sizes_and_strides(self, self.sizes(), self.strides()); } Tensor detach(const Tensor& self) { @@ -3142,107 +3588,54 @@ Tensor detach(const Tensor& self) { /*allow_tensor_metadata_change=*/false)); } -Tensor unfold(const Tensor& self, int64_t dimension, int64_t size, int64_t step) { - // some special handling to deal with allow dimension == 0 when self.dim() == 0 - dimension = at::maybe_wrap_dim(dimension, self.dim(), /*wrap_scalar=*/true); +Tensor unfold(const Tensor& self, int64_t d, int64_t size, int64_t step) { + // some special handling to deal with allow d == 0 when self.dim() == 0 + auto ndim = self.dim(); + d = at::maybe_wrap_dim(d, ndim, /*wrap_scalar=*/true); - const auto sizes = self.sizes(); - const auto strides = self.strides(); - int64_t max_size = self.dim() == 0 ? 1 : sizes[dimension]; - TORCH_CHECK(size <= max_size, "maximum size for tensor at dimension ", dimension, + auto sizes = self.sizes().vec(); + auto strides = self.strides().vec(); + int64_t max_size = self.dim() == 0 ? 1 : sizes[d]; + TORCH_CHECK(size <= max_size, "maximum size for tensor at dimension ", d, " is ", max_size, " but size is ", size); TORCH_CHECK(step > 0, "step is ", step, " but must be > 0"); - - DimVector new_size(self.dim() + 1); - DimVector new_stride(self.dim() + 1); - - new_size[self.dim()] = size; - new_stride[self.dim()] = self.dim() == 0 ? 1 : strides[dimension]; - for(const auto d : c10::irange(self.dim())) { - const auto self_size = sizes[d]; - const auto self_stride = strides[d]; - if(d == dimension) { - new_size[d] = (self_size - size) / step + 1; - new_stride[d] = step*self_stride; - } else { - new_size[d] = self_size; - new_stride[d] = self_stride; - } + sizes.push_back(size); + strides.push_back(self.dim() == 0 ? 
1 : strides[d]); + // The if handles the self.dim() == 0 case + if (d < ndim) { + sizes[d] = (sizes[d] - size) / step + 1; + strides[d] *= step; } - - return self.as_strided(new_size, new_stride); + return self.as_strided(sizes, strides); } -template -void apply_diag(Tensor& result, const Tensor& self, int64_t dimension) { - TORCH_CHECK(self.dim() == 1 || self.dim() == 2, "matrix or a vector expected"); - - auto self_data = self.data_ptr(); - if (self.dim() == 1) { - auto self_size = self.size(0); - auto self_stride = self.stride(0); - int64_t sz = self_size + std::abs(dimension); - - at::native::resize_output(result, {sz, sz}); - result.zero_(); - auto r_data = result.data_ptr(); - auto r_stride_0 = result.stride(0); - auto r_stride_1 = result.stride(1); - r_data += (dimension >= 0 ? dimension*r_stride_1 : -dimension*r_stride_0); - - for (const auto i : c10::irange(self_size)) { - r_data[i * (r_stride_0 + r_stride_1)] = self_data[i * self_stride]; - } +Tensor diag(const Tensor& self, int64_t offset) { + auto ndim = self.dim(); + TORCH_CHECK(ndim == 1 || ndim == 2, "diag(): Supports 1D or 2D tensors. Got ", self.dim(), "D"); + if (ndim == 1) { + return at::diag_embed(self, offset); } else { - auto self_stride_0 = self.stride(0); - auto self_stride_1 = self.stride(1); - - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - int64_t sz; - if (dimension >= 0) { - sz = std::min(self.size(0), self.size(1) - dimension); - } else { - sz = std::min(self.size(0) + dimension, self.size(1)); - } - - at::native::resize_output(result, {sz}); - result.zero_(); - auto r_data = result.data_ptr(); - auto r_stride_0 = result.stride(0); - self_data += (dimension >= 0 ? dimension * self_stride_1 : -dimension * self_stride_0); - for (const auto i : c10::irange(sz)) { - r_data[i * r_stride_0] = self_data[i * (self_stride_0 + self_stride_1)]; - } + // We return a copy of the diagonal + return at::diagonal_copy(self, offset); } } -Tensor diag(const Tensor& self, int64_t dimension) { - Tensor result = at::empty({0}, self.options()); - at::diag_out(result, self, dimension); - return result; -} - -Tensor& diag_cpu_out(const Tensor& self, int64_t dimension, Tensor &result) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(kBFloat16, kBool, self.scalar_type(), "diag", [&] { - apply_diag(result, self, dimension); - }); - return result; -} - -Tensor diag_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t diagonal) { - auto ndimension = input_sizes.size(); - AT_ASSERT(ndimension == 1 || ndimension == 2); - - if (ndimension == 1 || input_sizes[0] == input_sizes[1]) { - return grad.diag(diagonal); +Tensor& diag_out(const Tensor& self, int64_t offset, Tensor& out) { + auto ndim = self.dim(); + TORCH_CHECK(ndim == 1 || ndim == 2, "Supports 1D or 2D tensors. 
Got ", self.dim(), "D"); + if (ndim == 1) { + TORCH_CHECK( + canCast(self.scalar_type(), out.scalar_type()), + "diag: result type ", self.scalar_type(), " can't be cast to the desired out= type ", + out.scalar_type()); + return at::diag_embed_out(out, self, offset); + } else { + return at::diagonal_copy_out(out, self, offset); } - - // Input was a matrix but was not square - return at::diagonal_backward(grad, input_sizes, diagonal, 0, 1); } -Tensor diagonal_backward(const Tensor & grad, IntArrayRef input_sizes, int64_t offset, int64_t dim1, int64_t dim2) { - auto grad_input = at::zeros(input_sizes, grad.options()); +Tensor diagonal_backward_symint(const Tensor & grad, SymIntArrayRef input_sizes, int64_t offset, int64_t dim1, int64_t dim2) { + auto grad_input = at::zeros_symint(input_sizes, grad.options()); auto diag = grad_input.diagonal(offset, dim1, dim2); diag.copy_(grad); return grad_input; @@ -3250,7 +3643,7 @@ Tensor diagonal_backward(const Tensor & grad, IntArrayRef input_sizes, int64_t o Tensor movedim(const Tensor& self, IntArrayRef src, IntArrayRef dst) { TORCH_CHECK(src.size() == dst.size(), "movedim: Invalid source or destination dims: source (", - src, " dims ) should contain the same number of dims as destination (", dst, " dims)"); + src, " dims) should contain the same number of dims as destination (", dst, " dims)"); size_t self_dim = self.dim(); DimVector normalized_src(src.size()); @@ -3399,9 +3792,9 @@ at::Tensor slice_scatter(const at::Tensor& self, const at::Tensor& src, int64_t slice.copy_(src); return output; } -at::Tensor select_scatter(const at::Tensor& self, const at::Tensor& src, int64_t dim, int64_t index) { +at::Tensor select_scatter_symint(const at::Tensor& self, const at::Tensor& src, int64_t dim, c10::SymInt index) { auto output = self.clone(); - auto slice = output.select(dim, index); + auto slice = output.select_symint(dim, index); TORCH_CHECK(slice.sizes() == src.sizes(), "expected src to have a size equal to the slice of self. src size = ", src.sizes(), ", slice size = ", slice.sizes()); slice.copy_(src); return output; @@ -3413,12 +3806,12 @@ at::Tensor diagonal_scatter(const at::Tensor& self, const at::Tensor& src, int64 slice.copy_(src); return output; } -at::Tensor as_strided_scatter(const at::Tensor& self, const at::Tensor& src, at::IntArrayRef size, at::IntArrayRef stride, c10::optional storage_offset) { +at::Tensor as_strided_scatter_symint(const at::Tensor& self, const at::Tensor& src, at::SymIntArrayRef size, at::SymIntArrayRef stride, c10::optional storage_offset) { // See Note [as_strided_scatter backward support] TORCH_INTERNAL_ASSERT(!self.requires_grad() || self.is_contiguous(), "as_strided_scatter is currently only supported for contiguous inputs"); auto output = self.clone(); - auto slice = output.as_strided(size, stride, storage_offset); - TORCH_CHECK(slice.sizes() == src.sizes(), "expected src to have a size equal to the slice of self. src size = ", src.sizes(), ", slice size = ", slice.sizes()); + auto slice = output.as_strided_symint(size, stride, storage_offset); + TORCH_CHECK(slice.sym_sizes() == src.sym_sizes(), "expected src to have a size equal to the slice of self. 
src size = ", src.sym_sizes(), ", slice size = ", slice.sym_sizes()); slice.copy_(src); return output; } @@ -3477,8 +3870,8 @@ at::Tensor& _neg_view_copy_out(const at::Tensor & self, at::Tensor & out) { } -at::Tensor& as_strided_copy_out(const at::Tensor & self, at::IntArrayRef size, at::IntArrayRef stride, c10::optional storage_offset, at::Tensor & out) { - auto tmp = self.as_strided(size, stride, storage_offset); +at::Tensor& as_strided_copy_out_symint(const at::Tensor & self, at::SymIntArrayRef size, at::SymIntArrayRef stride, c10::optional storage_offset, at::Tensor & out) { + auto tmp = self.as_strided_symint(size, stride, storage_offset); out.copy_(tmp); return out; } @@ -3492,8 +3885,16 @@ at::Tensor& _sparse_broadcast_to_copy_out(const at::Tensor & self, at::IntArrayR at::Tensor& diagonal_copy_out(const at::Tensor & self, int64_t offset, int64_t dim1, int64_t dim2, at::Tensor & out) { - auto tmp = self.diagonal(offset, dim1, dim2); - out.copy_(tmp); + TORCH_CHECK( + out.device() == self.device(), + "diagonal_copy: Expected out and self tensors to be on the same device, but got ", + "out on ", out.device(), " and self on ", self.device()); + auto result = self.diagonal(offset, dim1, dim2); + at::native::resize_output(out, result.sizes()); + TORCH_CHECK( + canCast(result.scalar_type(), out.scalar_type()), + "diagonal_copy: result type ", result.scalar_type(), " can't be cast to the desired out= type ", out.scalar_type()); + out.copy_(result); return out; } @@ -3505,8 +3906,8 @@ at::Tensor& expand_copy_SymInt_out(const at::Tensor & self, c10::SymIntArrayRef } -at::Tensor& expand_copy_out(const at::Tensor & self, at::IntArrayRef size, bool implicit, at::Tensor & out) { - auto tmp = self.expand(size, implicit); +at::Tensor& expand_copy_out_symint(const at::Tensor & self, at::SymIntArrayRef size, bool implicit, at::Tensor & out) { + auto tmp = self.expand_symint(size, implicit); out.copy_(tmp); return out; } @@ -3533,8 +3934,8 @@ at::Tensor& _reshape_alias_copy_out(const at::Tensor & self, at::IntArrayRef siz } -at::Tensor& select_copy_int_out(const at::Tensor & self, int64_t dim, int64_t index, at::Tensor & out) { - auto tmp = self.select(dim, index); +at::Tensor& select_copy_symint_out(const at::Tensor & self, int64_t dim, c10::SymInt index, at::Tensor & out) { + auto tmp = self.select_symint(dim, index); out.copy_(tmp); return out; } @@ -3661,8 +4062,8 @@ void unbind_copy_int_out(const at::Tensor & self, int64_t dim, at::TensorList o } -at::Tensor& view_copy_out(const at::Tensor & self, at::IntArrayRef size, at::Tensor & out) { - auto tmp = self.view(size); +at::Tensor& view_copy_out_symint(const at::Tensor & self, at::SymIntArrayRef size, at::Tensor & out) { + auto tmp = self.view_symint(size); out.copy_(tmp); return out; } @@ -3688,5 +4089,13 @@ at::Tensor& alias_copy_out(const at::Tensor & self, at::Tensor & out) { return out; } +int64_t sparse_dim_strided(const at::Tensor& self) { + return 0; +} + +int64_t dense_dim_strided(const at::Tensor& self) { + return self.dim(); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/TensorShape.h b/aten/src/ATen/native/TensorShape.h index bb296b5ae5bc..60e2533e9b53 100644 --- a/aten/src/ATen/native/TensorShape.h +++ b/aten/src/ATen/native/TensorShape.h @@ -1,6 +1,7 @@ #pragma once #include #include +#include namespace at { namespace native { @@ -26,11 +27,12 @@ inline void check_cat_shape_except_dim(const Tensor & first, const Tensor & seco } } -inline void check_cat_no_zero_dim(at::ArrayRef tensors) { - for(const 
auto i : c10::irange(tensors.size())) { - auto& t = tensors[i]; +inline void check_cat_no_zero_dim(const MaterializedITensorListRef& tensors) { + int64_t i = 0; + for(const Tensor& t : tensors) { TORCH_CHECK(t.dim() > 0, "zero-dimensional tensor (at position ", i, ") cannot be concatenated"); + i++; } } @@ -51,11 +53,4 @@ inline int64_t get_num_splits(const Tensor& self, int64_t split_size, int64_t di return num_splits; } -/// -/// For more information, see -/// https://pytorch.org/docs/master/generated/torch.Tensor.unfold.html#torch.Tensor.unfold -/// - -Tensor unfold(const Tensor& self, int64_t dimension, int64_t size, int64_t step); - }} // namespace at::native diff --git a/aten/src/ATen/native/TensorTransformations.cpp b/aten/src/ATen/native/TensorTransformations.cpp index f0e2c0f02caa..028b05e66930 100644 --- a/aten/src/ATen/native/TensorTransformations.cpp +++ b/aten/src/ATen/native/TensorTransformations.cpp @@ -1,14 +1,31 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include // for flip_stub -#include -#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/TestOps.cpp b/aten/src/ATen/native/TestOps.cpp index a8c30f5c3ba6..f36765436991 100644 --- a/aten/src/ATen/native/TestOps.cpp +++ b/aten/src/ATen/native/TestOps.cpp @@ -1,10 +1,25 @@ // Copyright 2004-present Facebook. All Rights Reserved. +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include +#include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/TriangularOps.cpp b/aten/src/ATen/native/TriangularOps.cpp index d5f408a74f1b..59d2b8a0d224 100644 --- a/aten/src/ATen/native/TriangularOps.cpp +++ b/aten/src/ATen/native/TriangularOps.cpp @@ -1,22 +1,34 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { TORCH_META_FUNC(tril)(const Tensor& self, int64_t k) { + TORCH_CHECK(self.dim() >= 2, "tril: input tensor must have at least 2 dimensions") set_output_raw_strided(0, self.sizes(), {}, self.options()); } TORCH_META_FUNC(triu)(const Tensor& self, int64_t k) { + TORCH_CHECK(self.dim() >= 2, "triu: input tensor must have at least 2 dimensions") set_output_raw_strided(0, self.sizes(), {}, self.options()); } @@ -168,12 +180,16 @@ TORCH_IMPL_FUNC(triu_cpu)(const Tensor& self, int64_t k, const Tensor &result) { compute_triu_tril(self, k, result); } -Tensor trace_backward(const Tensor& grad, IntArrayRef sizes) { +Tensor trace_backward(const Tensor& grad, at::IntArrayRef sizes) { + return at::native::trace_backward_symint(grad, c10::fromIntArrayRefSlow(sizes)); +} + +Tensor trace_backward_symint(const Tensor& grad, c10::SymIntArrayRef sizes) { if (sizes.size() != 2) { throw std::runtime_error("expected matrix input"); } - auto grad_input = at::zeros(sizes[0] * sizes[1], grad.options()); + auto grad_input = at::zeros_symint(sizes[0] * sizes[1], grad.options()); auto indices = at::arange(0, grad_input.numel(), sizes[1] + 1, 
grad.options().dtype(at::kLong)); // for composite compliance, use out-of-place variant of // `index_fill` if grad tensor is a Tensor Subclass. @@ -182,7 +198,7 @@ Tensor trace_backward(const Tensor& grad, IntArrayRef sizes) { } else { grad_input.index_fill_(0, indices, grad); } - return grad_input.view(sizes); + return grad_input.view_symint(sizes); } } // namespace native diff --git a/aten/src/ATen/native/TriangularOpsUtils.h b/aten/src/ATen/native/TriangularOpsUtils.h index c5bce42ed3fd..e380a510bdde 100644 --- a/aten/src/ATen/native/TriangularOpsUtils.h +++ b/aten/src/ATen/native/TriangularOpsUtils.h @@ -1,4 +1,4 @@ -#include +#include #include namespace at { diff --git a/aten/src/ATen/native/TypeProperties.cpp b/aten/src/ATen/native/TypeProperties.cpp index feceb75631ce..36354c133a98 100644 --- a/aten/src/ATen/native/TypeProperties.cpp +++ b/aten/src/ATen/native/TypeProperties.cpp @@ -1,8 +1,26 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/UnaryOps.cpp b/aten/src/ATen/native/UnaryOps.cpp index 160955a01350..845610ce373e 100644 --- a/aten/src/ATen/native/UnaryOps.cpp +++ b/aten/src/ATen/native/UnaryOps.cpp @@ -1,26 +1,174 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include +#include +#include +#include +#include +#include #include -#include -#include -#include #include #include -#include -#include #include -#include -#include -#include -#include -#include +#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + +#include namespace at { @@ -157,6 +305,21 @@ TORCH_IMPL_FUNC(func_out) (const Tensor& self, const Tensor& result) { \ func_stub(device_type(), *this); \ } +// This macro is as optional as the one above. 
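// (Concretely: for integral inputs these ops are the identity, so the
// structured out-kernel can copy instead of calling the stub. Illustrative
// standalone sketch assuming ATen headers, not the macro expansion itself:)
#include <ATen/ATen.h>

at::Tensor ceil_for_any_dtype(const at::Tensor& self) {
  if (c10::isIntegralType(self.scalar_type(), /*includeBool=*/false)) {
    return self.clone();   // identity for integer dtypes, see gh-70918
  }
  return at::ceil(self);   // floating-point path still rounds up
}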
torch.(ceil|floor|round|trunc) are no-ops for integers +// See gh-70918 +#define CREATE_UNARY_TORCH_IMPL_INTEGER_NO_OP_FUNC(func_out, func_stub) \ +TORCH_IMPL_FUNC(func_out) (const Tensor& self, const Tensor& result) { \ + if (c10::isIntegralType(self.scalar_type(), /*includeBool=*/false)) { \ + result.copy_(self); \ + } else { \ + func_stub(device_type(), *this); \ + } \ +} +CREATE_UNARY_TORCH_IMPL_INTEGER_NO_OP_FUNC(ceil_out, ceil_stub) +CREATE_UNARY_TORCH_IMPL_INTEGER_NO_OP_FUNC(floor_out, floor_stub) +CREATE_UNARY_TORCH_IMPL_INTEGER_NO_OP_FUNC(round_out, round_stub) +CREATE_UNARY_TORCH_IMPL_INTEGER_NO_OP_FUNC(trunc_out, trunc_stub) + CREATE_UNARY_TORCH_IMPL_FUNC(acos_out, acos_stub) CREATE_UNARY_TORCH_IMPL_FUNC(acosh_out, acosh_stub) CREATE_UNARY_TORCH_IMPL_FUNC(asin_out, asin_stub) @@ -164,7 +327,6 @@ CREATE_UNARY_TORCH_IMPL_FUNC(asinh_out, asinh_stub) CREATE_UNARY_TORCH_IMPL_FUNC(atan_out, atan_stub) CREATE_UNARY_TORCH_IMPL_FUNC(atanh_out, atanh_stub) CREATE_UNARY_TORCH_IMPL_FUNC(bitwise_not_out, bitwise_not_stub) -CREATE_UNARY_TORCH_IMPL_FUNC(ceil_out, ceil_stub) CREATE_UNARY_TORCH_IMPL_FUNC(cos_out, cos_stub) CREATE_UNARY_TORCH_IMPL_FUNC(cosh_out, cosh_stub) CREATE_UNARY_TORCH_IMPL_FUNC(digamma_out, digamma_stub) @@ -174,7 +336,6 @@ CREATE_UNARY_TORCH_IMPL_FUNC(erfinv_out, erfinv_stub) CREATE_UNARY_TORCH_IMPL_FUNC(exp_out, exp_stub) CREATE_UNARY_TORCH_IMPL_FUNC(exp2_out, exp2_stub) CREATE_UNARY_TORCH_IMPL_FUNC(expm1_out, expm1_stub) -CREATE_UNARY_TORCH_IMPL_FUNC(floor_out, floor_stub) CREATE_UNARY_TORCH_IMPL_FUNC(frac_out, frac_stub) CREATE_UNARY_TORCH_IMPL_FUNC(i0_out, i0_stub) CREATE_UNARY_TORCH_IMPL_FUNC(lgamma_out, lgamma_stub) @@ -184,7 +345,6 @@ CREATE_UNARY_TORCH_IMPL_FUNC(log1p_out, log1p_stub) CREATE_UNARY_TORCH_IMPL_FUNC(log2_out, log2_stub) CREATE_UNARY_TORCH_IMPL_FUNC(neg_out, neg_stub) CREATE_UNARY_TORCH_IMPL_FUNC(reciprocal_out, reciprocal_stub) -CREATE_UNARY_TORCH_IMPL_FUNC(round_out, round_stub) CREATE_UNARY_TORCH_IMPL_FUNC(rsqrt_out, rsqrt_stub) CREATE_UNARY_TORCH_IMPL_FUNC(sigmoid_out, sigmoid_stub) CREATE_UNARY_TORCH_IMPL_FUNC(sign_out, sign_stub) @@ -201,7 +361,6 @@ CREATE_UNARY_TORCH_IMPL_FUNC(special_log_ndtr_out, special_log_ndtr_stub) CREATE_UNARY_TORCH_IMPL_FUNC(sqrt_out, sqrt_stub) CREATE_UNARY_TORCH_IMPL_FUNC(tan_out, tan_stub) CREATE_UNARY_TORCH_IMPL_FUNC(tanh_out, tanh_stub) -CREATE_UNARY_TORCH_IMPL_FUNC(trunc_out, trunc_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_airy_ai_out, special_airy_ai_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_bessel_j0_out, special_bessel_j0_stub) CREATE_UNARY_TORCH_IMPL_FUNC(special_bessel_j1_out, special_bessel_j1_stub) @@ -723,8 +882,7 @@ constexpr double QUARTER = 0.25; } static inline void mvlgamma_check(const Tensor& self, int64_t p) { - TORCH_CHECK((self > HALF * (p - 1)).all().item(), - "All elements must be greater than (p-1)/2"); + TORCH_CHECK(self.scalar_type() != kBool, "The input tensor may not be a boolean tensor."); TORCH_CHECK(p >= 1, "p has to be greater than or equal to 1"); } diff --git a/aten/src/ATen/native/Unfold2d.cpp b/aten/src/ATen/native/Unfold2d.cpp index 0a3b760a33fd..60bbc8a77712 100644 --- a/aten/src/ATen/native/Unfold2d.cpp +++ b/aten/src/ATen/native/Unfold2d.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include namespace at { namespace native { diff --git a/aten/src/ATen/native/Unfold3d.cpp b/aten/src/ATen/native/Unfold3d.cpp index 3495f92dc3ce..1a2d0ea2ae1f 100644 --- a/aten/src/ATen/native/Unfold3d.cpp +++ b/aten/src/ATen/native/Unfold3d.cpp @@ -1,5 +1,7 @@ -#include +#define 
TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include diff --git a/aten/src/ATen/native/Unfold3d.h b/aten/src/ATen/native/Unfold3d.h index 51eb89f9b810..e9b5a34a8d10 100644 --- a/aten/src/ATen/native/Unfold3d.h +++ b/aten/src/ATen/native/Unfold3d.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/UnfoldBackward.cpp b/aten/src/ATen/native/UnfoldBackward.cpp index 10bee80cea23..494143232116 100644 --- a/aten/src/ATen/native/UnfoldBackward.cpp +++ b/aten/src/ATen/native/UnfoldBackward.cpp @@ -5,6 +5,7 @@ #include #include #else +#include #include #include #endif @@ -21,6 +22,11 @@ Tensor unfold_backward( int64_t step ) { auto grad_input = at::zeros(input_sizes, grad.options()); + if (step >= size) { + auto gI_unfolded = grad_input.unfold(dim, size, step); + gI_unfolded.copy_(grad); + return grad_input; + } unfold_backward_stub( grad.device().type(), diff --git a/aten/src/ATen/native/UnfoldBackward.h b/aten/src/ATen/native/UnfoldBackward.h index 1f6c8fa1b289..f8099167361c 100644 --- a/aten/src/ATen/native/UnfoldBackward.h +++ b/aten/src/ATen/native/UnfoldBackward.h @@ -1,10 +1,9 @@ #pragma once #include -#include +#include #include -#include -#include +#include #ifndef AT_PER_OPERATOR_HEADERS #include @@ -108,79 +107,6 @@ static C10_UNUSED TensorIterator _make_unfold_backward_iter_over_grad_out( return iter; } -static C10_UNUSED TensorIterator _make_unfold_backward_iter_over_grad_in( - Tensor& grad_out, - const Tensor& grad_in, - int64_t dim, - int64_t /*size*/, - int64_t /*step*/ -) { - dim = maybe_wrap_dim(dim, grad_out.dim()); - // last dim stores the folds - auto last_dim = maybe_wrap_dim(-1, grad_in.dim()); - - auto grad_in_dim = ensure_nonempty_dim(grad_in.dim()); - auto grad_in_dim_size = ensure_nonempty_size(grad_in, dim); - auto grad_in_last_dim_size = ensure_nonempty_size(grad_in, last_dim); - - /* prepare grad_out for TensorIterator { */ - auto grad_out_restrided = grad_out.unsqueeze(-1); - - auto grad_out_strides = ensure_nonempty_vec(grad_out_restrided.strides().vec()); - auto grad_out_sizes = ensure_nonempty_vec(grad_out_restrided.sizes().vec()); - - grad_out_strides[dim] = 0; - grad_out_strides[last_dim] = 0; - - grad_out_sizes[dim] = grad_in_dim_size; - grad_out_sizes[last_dim] = grad_in_last_dim_size; - - grad_out_restrided = grad_out_restrided.as_strided(grad_out_sizes, grad_out_strides); - /* } */ - - // for each element grad_out[i_1,...,i_dim,...,i_last_dim] - // we have to know i_dim and i_last_dim. 
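// Sketch (assumes ATen headers) of why the step >= size fast path added to
// unfold_backward above is valid: non-overlapping windows mean every input
// element lands in at most one window, so no accumulation is needed and the
// unfolded view of a zero-filled grad_input can simply receive a copy of grad.
#include <ATen/ATen.h>

at::Tensor unfold_backward_nonoverlapping(
    const at::Tensor& grad, at::IntArrayRef input_sizes,
    int64_t dim, int64_t size, int64_t step) {
  TORCH_CHECK(step >= size, "shortcut only valid for non-overlapping windows");
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.unfold(dim, size, step).copy_(grad);  // disjoint views, plain scatter
  return grad_input;
}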
- // This information is stored in Tensors - // idx_dim and idx_last_dim - /* prepare idx_dim and idx_last_dim for TensorIterator { */ - auto idx_dim = at::arange( - 0, grad_in_dim_size, grad_in.options().dtype(at::kLong) - ); - - auto idx_dim_strides = std::vector(grad_in_dim, 0); - auto idx_dim_sizes = std::vector(grad_in_dim, 1); - - idx_dim_strides[dim] = 1; - idx_dim_sizes[dim] = grad_in_dim_size; - - auto idx_dim_restrided = idx_dim.as_strided(idx_dim_sizes, idx_dim_strides); - - auto idx_last_dim = at::arange( - 0, grad_in_last_dim_size, grad_in.options().dtype(at::kLong) - ); - - auto idx_last_dim_strides = std::vector(grad_in_dim, 0); - auto idx_last_dim_sizes = std::vector(grad_in_dim, 1); - - idx_last_dim_strides[last_dim] = 1; - idx_last_dim_sizes[last_dim] = grad_in_last_dim_size; - - auto idx_last_dim_restrided = idx_last_dim.as_strided(idx_last_dim_sizes, idx_last_dim_strides); - /* } */ - - auto iter = TensorIteratorConfig() - .set_check_mem_overlap(false) - .check_all_same_dtype(false) - .resize_outputs(false) - .add_owned_output(grad_out_restrided) - .add_owned_input(grad_in) - .add_owned_input(idx_dim_restrided) - .add_owned_input(idx_last_dim_restrided) - .build(); - - return iter; -} - } }} // namespace at::native diff --git a/aten/src/ATen/native/Unique.cpp b/aten/src/ATen/native/Unique.cpp index f418611e0864..92b48c9f388c 100644 --- a/aten/src/ATen/native/Unique.cpp +++ b/aten/src/ATen/native/Unique.cpp @@ -1,8 +1,27 @@ // Returns unique elements of input tensor. +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include +#include #include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif #include #include diff --git a/aten/src/ATen/native/UpSample.cpp b/aten/src/ATen/native/UpSample.cpp index db75b7e99fdb..1a6af7526030 100644 --- a/aten/src/ATen/native/UpSample.cpp +++ b/aten/src/ATen/native/UpSample.cpp @@ -1,4 +1,5 @@ // Copyright 2004-present Facebook. All Rights Reserved. +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include diff --git a/aten/src/ATen/native/UpSample.h b/aten/src/ATen/native/UpSample.h index 6b248352de6a..144b5921eed3 100644 --- a/aten/src/ATen/native/UpSample.h +++ b/aten/src/ATen/native/UpSample.h @@ -2,11 +2,11 @@ #include -#include +#include #include +#include #include - /** * Note [compute_scales_value] * Note [area_pixel_compute_scale] @@ -266,15 +266,13 @@ static inline scalar_t area_pixel_compute_scale( bool align_corners, const c10::optional scale) { // see Note [area_pixel_compute_scale] - if(align_corners){ + if(align_corners) { if(output_size > 1) { return static_cast(input_size - 1) / (output_size - 1); - } - else { + } else { return static_cast(0); } - } - else{ + } else { return compute_scales_value(scale, input_size, output_size); } } @@ -288,7 +286,8 @@ static inline scalar_t area_pixel_compute_source_index( if (align_corners) { return scale * dst_index; } else { - scalar_t src_idx = scale * (dst_index + 0.5) - 0.5; + scalar_t src_idx = scale * (dst_index + static_cast(0.5)) - + static_cast(0.5); // [Note] Follow Opencv resize logic: // We allow negative src_idx here and later will use // dx = src_idx - floorf(src_idx) @@ -301,7 +300,8 @@ static inline scalar_t area_pixel_compute_source_index( // where we should and then remove this cubic flag. // This matters in cubic mode, as we might need [-1, 0, 1, 2] // to interpolate and the weights can be affected. 
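// Plain-C++ restatement (a sketch, not the templated ATen helper) of the
// source-index mapping used here: with align_corners the endpoints map
// exactly, otherwise pixel centres are aligned via the half-pixel offset.
#include <cstdint>

float area_pixel_source_index_sketch(float scale, int64_t dst_index,
                                     bool align_corners, bool cubic) {
  if (align_corners) {
    return scale * static_cast<float>(dst_index);
  }
  const float src_idx = scale * (static_cast<float>(dst_index) + 0.5f) - 0.5f;
  // negative indices are clamped to 0 unless cubic interpolation needs them
  return (!cubic && src_idx < 0.f) ? 0.f : src_idx;
}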
- return (!cubic && src_idx < 0) ? scalar_t(0) : src_idx; + return (!cubic && src_idx < static_cast(0)) ? scalar_t(0) + : src_idx; } } @@ -445,8 +445,10 @@ static inline void compute_source_index_and_lambda( lambda0 = static_cast(1); lambda1 = static_cast(0); } else { - const scalar_t real_input_index = area_pixel_compute_source_index( - ratio, output_index, align_corners, /*cubic=*/false); + using opmath_t = at::opmath_type; + const auto real_input_index = + area_pixel_compute_source_index( + ratio, output_index, align_corners, /*cubic=*/false); input_index0 = static_cast(real_input_index); int64_t offset = (input_index0 < input_size - 1) ? 1 : 0; input_index1 = input_index0 + offset; diff --git a/aten/src/ATen/native/UpSampleBicubic2d.cpp b/aten/src/ATen/native/UpSampleBicubic2d.cpp index 5dd1b370b217..035bea562954 100644 --- a/aten/src/ATen/native/UpSampleBicubic2d.cpp +++ b/aten/src/ATen/native/UpSampleBicubic2d.cpp @@ -1,8 +1,24 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -109,8 +125,7 @@ static void upsample_bicubic2d_backward_out_frame( for (const auto output_x : c10::irange(output_width)) { scalar_t* in = &idata[output_y * input_width + output_x]; scalar_t* out = &odata[output_y * output_width + output_x]; - for (const auto c : c10::irange(channels)) { - (void)c; //Suppress unused variable warning + for (const auto c C10_UNUSED : c10::irange(channels)) { in[0] = out[0]; in += input_width * input_height; out += output_width * output_height; @@ -146,8 +161,7 @@ static void upsample_bicubic2d_backward_out_frame( get_cubic_upsample_coefficients(x_coeffs, t_x); get_cubic_upsample_coefficients(y_coeffs, t_y); - for (const auto c : c10::irange(channels)) { - (void)c; //Suppress unused variable warning + for (const auto c C10_UNUSED : c10::irange(channels)) { scalar_t out_value = out[output_y * output_width + output_x]; for (const auto i : c10::irange(4)) { @@ -273,18 +287,6 @@ Tensor upsample_bicubic2d( return at::upsample_bicubic2d(input, osize, align_corners, scale_h, scale_w); } -Tensor upsample_bicubic2d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::upsample_bicubic2d_backward(grad_output, osize, input_size, align_corners, scale_h, scale_w); -} - Tensor _upsample_bicubic2d_aa( const Tensor& input, at::OptionalIntArrayRef output_size, @@ -296,18 +298,6 @@ Tensor _upsample_bicubic2d_aa( return at::_upsample_bicubic2d_aa(input, osize, align_corners, scale_h, scale_w); } -Tensor _upsample_bicubic2d_aa_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::_upsample_bicubic2d_aa_backward(grad_output, osize, input_size, align_corners, scale_h, scale_w); -} - DEFINE_DISPATCH(upsample_bicubic2d_kernel); 
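// Hedged sketch of the linear-interpolation index/weight computation that
// compute_source_index_and_lambda (shown above) performs once the source
// index has been produced. The real helper is templated on scalar_t and
// accumulates in at::opmath_type; the weight naming here is assumed.
#include <cstdint>

void source_index_and_lambda_sketch(float real_input_index, int64_t input_size,
                                    int64_t& index0, int64_t& index1,
                                    float& lambda0, float& lambda1) {
  index0 = static_cast<int64_t>(real_input_index);
  const int64_t offset = (index0 < input_size - 1) ? 1 : 0;  // stay in bounds
  index1 = index0 + offset;
  lambda1 = real_input_index - static_cast<float>(index0);   // weight of index1
  lambda0 = 1.0f - lambda1;                                   // weight of index0
}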
DEFINE_DISPATCH(_upsample_bicubic2d_aa_kernel); DEFINE_DISPATCH(_upsample_bicubic2d_aa_backward_kernel); diff --git a/aten/src/ATen/native/UpSampleBilinear2d.cpp b/aten/src/ATen/native/UpSampleBilinear2d.cpp index 527555a066ab..5d91e93e016d 100644 --- a/aten/src/ATen/native/UpSampleBilinear2d.cpp +++ b/aten/src/ATen/native/UpSampleBilinear2d.cpp @@ -1,11 +1,26 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include -#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -154,18 +169,6 @@ Tensor upsample_bilinear2d( return at::upsample_bilinear2d(input, osize, align_corners, scale_h, scale_w); } -Tensor upsample_bilinear2d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::upsample_bilinear2d_backward(grad_output, osize, input_size, align_corners, scale_h, scale_w); -} - Tensor _upsample_bilinear2d_aa( const Tensor& input, at::OptionalIntArrayRef output_size, @@ -177,18 +180,6 @@ Tensor _upsample_bilinear2d_aa( return at::_upsample_bilinear2d_aa(input, osize, align_corners, scale_h, scale_w); } -Tensor _upsample_bilinear2d_aa_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::_upsample_bilinear2d_aa_backward(grad_output, osize, input_size, align_corners, scale_h, scale_w); -} - DEFINE_DISPATCH(upsample_bilinear2d_kernel); DEFINE_DISPATCH(upsample_bilinear2d_backward_kernel); DEFINE_DISPATCH(_upsample_bilinear2d_aa_kernel); diff --git a/aten/src/ATen/native/UpSampleLinear1d.cpp b/aten/src/ATen/native/UpSampleLinear1d.cpp index b100450c2b6a..aed082b68563 100644 --- a/aten/src/ATen/native/UpSampleLinear1d.cpp +++ b/aten/src/ATen/native/UpSampleLinear1d.cpp @@ -1,10 +1,22 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include -#include +#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -87,17 +99,6 @@ Tensor upsample_linear1d( return at::upsample_linear1d(input, osize, align_corners, scale_w); } -Tensor upsample_linear1d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_w = get_scale_value(scale_factors, 0); - return at::upsample_linear1d_backward(grad_output, osize, input_size, align_corners, scale_w); -} - DEFINE_DISPATCH(upsample_linear1d_kernel); DEFINE_DISPATCH(upsample_linear1d_backward_kernel); diff --git a/aten/src/ATen/native/UpSampleNearest1d.cpp 
b/aten/src/ATen/native/UpSampleNearest1d.cpp index 83121ed3be45..1bdbda8f66c4 100644 --- a/aten/src/ATen/native/UpSampleNearest1d.cpp +++ b/aten/src/ATen/native/UpSampleNearest1d.cpp @@ -1,7 +1,23 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -125,26 +141,6 @@ Tensor _upsample_nearest_exact1d( return at::_upsample_nearest_exact1d(input, osize, scale_w); } -Tensor upsample_nearest1d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_w = get_scale_value(scale_factors, 0); - return at::upsample_nearest1d_backward(grad_output, osize, input_size, scale_w); -} - -Tensor _upsample_nearest_exact1d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_w = get_scale_value(scale_factors, 0); - return at::_upsample_nearest_exact1d_backward(grad_output, osize, input_size, scale_w); -} - DEFINE_DISPATCH(upsample_nearest1d_kernel); DEFINE_DISPATCH(_upsample_nearest_exact1d_kernel); DEFINE_DISPATCH(upsample_nearest1d_backward_kernel); diff --git a/aten/src/ATen/native/UpSampleNearest2d.cpp b/aten/src/ATen/native/UpSampleNearest2d.cpp index ee5dce4a02ef..65e20b78f868 100644 --- a/aten/src/ATen/native/UpSampleNearest2d.cpp +++ b/aten/src/ATen/native/UpSampleNearest2d.cpp @@ -1,9 +1,24 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -152,28 +167,6 @@ Tensor _upsample_nearest_exact2d( return at::_upsample_nearest_exact2d(input, osize, scale_h, scale_w); } -Tensor upsample_nearest2d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::upsample_nearest2d_backward(grad_output, osize, input_size, scale_h, scale_w); -} - -Tensor _upsample_nearest_exact2d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_h = get_scale_value(scale_factors, 0); - auto scale_w = get_scale_value(scale_factors, 1); - return at::_upsample_nearest_exact2d_backward(grad_output, osize, input_size, scale_h, scale_w); -} - DEFINE_DISPATCH(upsample_nearest2d_kernel); DEFINE_DISPATCH(_upsample_nearest_exact2d_kernel); DEFINE_DISPATCH(upsample_nearest2d_backward_kernel); diff --git a/aten/src/ATen/native/UpSampleNearest3d.cpp b/aten/src/ATen/native/UpSampleNearest3d.cpp index 0e4040980ae2..27ca6745655c 100644 --- a/aten/src/ATen/native/UpSampleNearest3d.cpp +++ b/aten/src/ATen/native/UpSampleNearest3d.cpp @@ -1,8 +1,23 @@ -#include -#include +#define 
TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -147,7 +162,7 @@ TORCH_IMPL_FUNC(_upsample_nearest_exact3d_backward_out_cpu) ( using at::native::upsample::compute_output_size; using at::native::upsample::get_scale_value; -Tensor upsample_nearest3d_cpu( +Tensor upsample_nearest3d( const Tensor& input, at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { @@ -158,7 +173,7 @@ Tensor upsample_nearest3d_cpu( return at::upsample_nearest3d(input, osize, scale_d, scale_h, scale_w); } -Tensor _upsample_nearest_exact3d_cpu( +Tensor _upsample_nearest_exact3d( const Tensor& input, at::OptionalIntArrayRef output_size, c10::optional> scale_factors) { @@ -169,31 +184,6 @@ Tensor _upsample_nearest_exact3d_cpu( return at::_upsample_nearest_exact3d(input, osize, scale_d, scale_h, scale_w); } -// when structured kernels can handle QuantizedCPU, update these overloads to be CompositeExplicitAutograd -Tensor upsample_nearest3d_backward_cpu( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::upsample_nearest3d_backward(grad_output, osize, input_size, scale_d, scale_h, scale_w); -} - -Tensor _upsample_nearest_exact3d_backward_cpu( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::_upsample_nearest_exact3d_backward(grad_output, osize, input_size, scale_d, scale_h, scale_w); -} - DEFINE_DISPATCH(upsample_nearest3d_kernel); DEFINE_DISPATCH(_upsample_nearest_exact3d_kernel); DEFINE_DISPATCH(upsample_nearest3d_backward_kernel); diff --git a/aten/src/ATen/native/UpSampleTrilinear3d.cpp b/aten/src/ATen/native/UpSampleTrilinear3d.cpp index 73fffbe5afe7..1bf9c8f6cb4e 100644 --- a/aten/src/ATen/native/UpSampleTrilinear3d.cpp +++ b/aten/src/ATen/native/UpSampleTrilinear3d.cpp @@ -1,11 +1,22 @@ // Adapted from interp.cpp from Caffe util by Pauline Luc // Originally developed by George Papandreou +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS -#include -#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace meta { @@ -100,19 +111,6 @@ Tensor upsample_trilinear3d( return at::upsample_trilinear3d(input, osize, align_corners, scale_d, scale_h, scale_w); } -Tensor upsample_trilinear3d_backward( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - bool align_corners, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::upsample_trilinear3d_backward(grad_output, osize, input_size, 
align_corners, scale_d, scale_h, scale_w); -} - DEFINE_DISPATCH(upsample_trilinear3d_kernel); DEFINE_DISPATCH(upsample_trilinear3d_backward_kernel); diff --git a/aten/src/ATen/native/VariableMethodStubs.cpp b/aten/src/ATen/native/VariableMethodStubs.cpp index ce5432e677af..6191717930ae 100644 --- a/aten/src/ATen/native/VariableMethodStubs.cpp +++ b/aten/src/ATen/native/VariableMethodStubs.cpp @@ -1,5 +1,23 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif // The stubs in here are used by dynamic dispatch. It just redirects everything // to the Tensor method we manually bind in TensorBody.h. diff --git a/aten/src/ATen/native/WeightNorm.cpp b/aten/src/ATen/native/WeightNorm.cpp index bf258d80a0fb..8291120f1960 100644 --- a/aten/src/ATen/native/WeightNorm.cpp +++ b/aten/src/ATen/native/WeightNorm.cpp @@ -1,11 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include #include -#include -#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/ao_sparse/library.cpp b/aten/src/ATen/native/ao_sparse/library.cpp index 0c0042c6b143..1a284726e93f 100644 --- a/aten/src/ATen/native/ao_sparse/library.cpp +++ b/aten/src/ATen/native/ao_sparse/library.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp index 2f1d8a3e7be9..cdbfda3c71bb 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/fbgemm_utils.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/packed_params.h b/aten/src/ATen/native/ao_sparse/quantized/cpu/packed_params.h index 57ebba85a063..1ca66bf536a7 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/packed_params.h +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/packed_params.h @@ -11,7 +11,7 @@ namespace sparse { using LinearPackedSerializationType = std::tuple, std::vector>; -#define SPARSE_LINEAR_PACKED_PARAM_SERIALIZATION_VERSION 1 +#define SPARSE_LINEAR_PACKED_PARAM_SERIALIZATION_VERSION 2 using BCSRSerializationType = std::tuple< @@ -22,8 +22,8 @@ using BCSRSerializationType = at::Tensor, // Weight Scales (single element vector if per-tensor) (float) at::Tensor, // Wrapper for Weight Zero Points (single element vector if per-tensor) (int8_t) bool, // Quantization Scheme (true: per tensor, false: per channel) - at::Tensor, // Wrapper for Row Block Indices (int32_t) - at::Tensor, // Wrapper for Column Block Indices (int32_t) + at::Tensor, // Wrapper for Row Block Indices (int8_t, int16_t, or int32_t) + at::Tensor, // Wrapper for Column Block Indices (int8_t, int16_t, or int32_t) at::Tensor, // Wrapper for Non-Zero Weight Values, each +128 (uint8_t) int64_t, // Number of Output Channels int64_t // Number of Input Channels diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp index 12046dde22f9..de053b353758 100644 --- 
a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp @@ -1,4 +1,5 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include @@ -7,6 +8,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace ao { namespace sparse { diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp index 24d24eee66ec..d367dbe01103 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_deserialize.cpp @@ -11,6 +11,7 @@ namespace ao { namespace sparse { namespace { +const int64_t serialization_version_index = 0; const int64_t bias_index = 1; const int64_t out_features_block_size_index = 2; const int64_t in_features_block_size_index = 3; @@ -127,16 +128,25 @@ c10::intrusive_ptr PackedLinearWeight::deserialize( return static_cast(static_cast(v) - 128); }); + const at::Tensor row_block_indices = + std::get(serialized); + const at::Tensor col_block_indices = + std::get(serialized); // Unpack as non backend specific untiled BCSR then pack as Fbgemm tiled BCSR // because untiled Fbgemm BCSR currently doesn't exist unpack_bcsr( reinterpret_cast(weight_origin.data_ptr()), - ao::sparse::BCSR( - std::move(weight_values), - unwrap_vector( - std::get(serialized)), // Row Indices - unwrap_vector( - std::get(serialized))), // Col Indices + AT_DISPATCH_INTEGRAL_TYPES( + row_block_indices.scalar_type(), + "packed_linear_weight_fbgemm_setup_bcsr", + [&] { + return ao::sparse::BCSR( + std::move(weight_values), + unwrap_vector( + std::get(serialized)), + unwrap_vector( + std::get(serialized))); + }), output_channels, input_channels, out_features_block_size, @@ -160,6 +170,28 @@ c10::intrusive_ptr PackedLinearWeightQnnp::deserialize( return c10::make_intrusive(serialized); } +template +struct UnsignedIndicesTypeTrait { + static_assert( + sizeof(INDICES_DTYPE) == 0, + "Invalid dtype for UnsignedIndicesTypeTrait"); +}; + +template <> +struct UnsignedIndicesTypeTrait { + using t = uint32_t; +}; + +template <> +struct UnsignedIndicesTypeTrait { + using t = uint16_t; +}; + +template <> +struct UnsignedIndicesTypeTrait { + using t = uint8_t; +}; + // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) PackedLinearWeightQnnp::PackedLinearWeightQnnp( const BCSRSerializationType& serialized) @@ -173,6 +205,17 @@ PackedLinearWeightQnnp::PackedLinearWeightQnnp( : c10::kPerChannelAffine), output_channels_(std::get(serialized)), input_channels_(std::get(serialized)) { + const int64_t serialization_version = + std::get(serialized); + TORCH_CHECK( + serialization_version <= SPARSE_LINEAR_PACKED_PARAM_SERIALIZATION_VERSION, + "Attempted to deserialize sparse qlinear packed params with an ", + "incompatible serialization version (", + serialization_version, + " > ", + SPARSE_LINEAR_PACKED_PARAM_SERIALIZATION_VERSION, + ")"); + if (orig_bias_.has_value()) { bias_ = orig_bias_.value(); @@ -242,15 +285,35 @@ PackedLinearWeightQnnp::PackedLinearWeightQnnp( std::get(serialized); deserialized_bcsr_weight_values_ = std::get(serialized); - bcsr_matrix_ = qnnpack::generateBlockCSRMatrix( - (uint32_t*)deserialized_bcsr_col_block_indices_.data_ptr(), - (uint32_t*)deserialized_bcsr_row_block_indices_.data_ptr(), - deserialized_bcsr_weight_values_.data_ptr(), - deserialized_bcsr_col_block_indices_.numel(), - 
deserialized_bcsr_row_block_indices_.numel(), - deserialized_bcsr_weight_values_.numel(), - out_features_block_size_, - in_features_block_size_); +#define AT_DISPATCH_CASE_BCSR_INDICES_TYPES(...) \ + AT_DISPATCH_CASE(at::ScalarType::Char, __VA_ARGS__) \ + AT_DISPATCH_CASE(at::ScalarType::Int, __VA_ARGS__) \ + AT_DISPATCH_CASE(at::ScalarType::Short, __VA_ARGS__) + +#define AT_DISPATCH_BCSR_INDICES_TYPES(TYPE, NAME, ...) \ + AT_DISPATCH_SWITCH( \ + TYPE, NAME, AT_DISPATCH_CASE_BCSR_INDICES_TYPES(__VA_ARGS__)) + + bcsr_matrix_ = AT_DISPATCH_BCSR_INDICES_TYPES( + deserialized_bcsr_row_block_indices_.scalar_type(), + "packed_linear_weight_qnnp_setup_bcsr", + [&] { + using unsigned_t = UnsignedIndicesTypeTrait::t; + return qnnpack::generateBlockCSRMatrix( + reinterpret_cast( + deserialized_bcsr_col_block_indices_.data_ptr()), + reinterpret_cast( + deserialized_bcsr_row_block_indices_.data_ptr()), + deserialized_bcsr_weight_values_.data_ptr(), + deserialized_bcsr_col_block_indices_.numel(), + deserialized_bcsr_row_block_indices_.numel(), + deserialized_bcsr_weight_values_.numel(), + out_features_block_size_, + in_features_block_size_); + }); + +#undef AT_DISPATCH_CASE_BCSR_INDICES_TYPES +#undef AT_DISPATCH_BCSR_INDICES_TYPES } #endif // USE_PYTORCH_QNNPACK diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp index bd6f92c97c5e..64cab80790a9 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_dynamic.cpp @@ -1,4 +1,5 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include @@ -10,6 +11,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace ao { namespace sparse { @@ -37,7 +45,7 @@ at::Tensor PackedLinearWeightQnnp::apply_dynamic_impl( const auto cols_input = static_cast(input.size(input.dim() - 1)); TORCH_CHECK( cols_input == input_channels_, - "quantized_sparse_lienar: Input tensor's last and weight tensor's" + "quantized_sparse_linear: Input tensor's last and weight tensor's" " second dimension must match."); // On empty input, no output data will be generated, @@ -75,11 +83,12 @@ at::Tensor PackedLinearWeightQnnp::apply_dynamic_impl( output_channels_, q_input_contig.q_zero_point(), w_zero_points_.data(), - bcsr_matrix_->col_indices.data(), - bcsr_matrix_->row_values.data(), + bcsr_matrix_->col_indices_data_ptr(), + bcsr_matrix_->row_values_data_ptr(), bcsr_matrix_->values.data(), bcsr_matrix_->row_block_size, /* out_features_block_size */ bcsr_matrix_->col_block_size, /* in_features_block_size */ + bcsr_matrix_->indices_dtype, 0, /* output zero point: not used */ std::numeric_limits::min(), std::numeric_limits::max(), diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp index 616ed9011e0c..83aaf810edd7 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include @@ -7,6 +9,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + #include namespace ao { @@ -190,7 +199,7 @@ PackedLinearWeightQnnp::PackedLinearWeightQnnp( for (const auto i : c10::irange(wt_numel)) { qnnp_w_data[i] = 
static_cast(w_data[i] + 128); } - bcsr_matrix_ = qnnpack::generateBlockCSRMatrix( + bcsr_matrix_ = qnnpack::generateBlockCSRMatrix( reinterpret_cast(qnnp_w_data), output_channels_, input_channels_, diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp index cacb2815a2a3..7fd0cb25ff20 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_serialize.cpp @@ -195,6 +195,37 @@ BCSRSerializationType PackedLinearWeightQnnp::serialize() { TORCH_CHECK(false, "Unsupported quantization scheme."); } + at::Tensor wrapped_row_values; + at::Tensor wrapped_col_indices; + + const uint32_t max_index = bcsr_matrix_->max_index(); + + if (max_index <= std::numeric_limits::max()) { + // Cast from uint8_t range to int8_t + wrapped_row_values = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->row_values, c10::kChar); }); + wrapped_col_indices = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->col_indices, c10::kChar); }); + } else if (max_index <= std::numeric_limits::max()) { + // Cast from uint16_t range to int16_t + wrapped_row_values = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->row_values, c10::kShort); }); + wrapped_col_indices = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->col_indices, c10::kShort); }); + } else { + // Cast from uint32_t range to int32_t + wrapped_row_values = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->row_values, c10::kInt); }); + wrapped_col_indices = QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE( + bcsr_matrix_, + { return wrap_vector(typed_bcsr->col_indices, c10::kInt); }); + } + return BCSRSerializationType( SPARSE_LINEAR_PACKED_PARAM_SERIALIZATION_VERSION, orig_bias_, @@ -203,10 +234,8 @@ BCSRSerializationType PackedLinearWeightQnnp::serialize() { std::move(w_scales_compact), std::move(w_zero_points_compact), (q_scheme_ == c10::kPerTensorAffine), - wrap_vector( - bcsr_matrix_->row_values, c10::kInt), // Casting from uint32_t to int - wrap_vector( - bcsr_matrix_->col_indices, c10::kInt), // Casting from uint32_t to int + wrapped_row_values, + wrapped_col_indices, wrap_vector(bcsr_matrix_->values, c10::kByte), output_channels_, input_channels_); diff --git a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp index c10cc40af4a2..14cf9521a4cd 100644 --- a/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp +++ b/aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp @@ -1,10 +1,20 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif + namespace ao { namespace sparse { int register_linear_params(); diff --git a/aten/src/ATen/native/cpu/Activation.cpp b/aten/src/ATen/native/cpu/Activation.cpp index 6f3eac783ccd..728ea62f1898 100644 --- a/aten/src/ATen/native/cpu/Activation.cpp +++ b/aten/src/ATen/native/cpu/Activation.cpp @@ -623,7 +623,25 @@ void shrink_backward_kernel(TensorIteratorBase& iter, const Scalar& lambd) { } void hardtanh_backward_kernel(TensorIterator& iter, const Scalar& min, const Scalar& max) { - 
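// Sketch (plain C++, hypothetical helper) of the index-width selection the
// BCSR serialization code above performs: pick the narrowest signed dtype
// that can hold the largest block index so serialized row/column indices
// stay compact (kChar, kShort, or kInt in the real code).
#include <cstdint>
#include <limits>

enum class BcsrIndexDType { Int8, Int16, Int32 };

BcsrIndexDType pick_index_dtype(uint32_t max_index) {
  if (max_index <= std::numeric_limits<uint8_t>::max()) {
    return BcsrIndexDType::Int8;   // serialized as kChar
  }
  if (max_index <= std::numeric_limits<uint16_t>::max()) {
    return BcsrIndexDType::Int16;  // serialized as kShort
  }
  return BcsrIndexDType::Int32;    // serialized as kInt
}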
AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardshrink_backward_cpu", [&] { + if (iter.dtype() == kBFloat16) { + auto min_val = min.to(); + auto max_val = max.to(); + cpu_kernel_vec( + iter, + [=](BFloat16 grad_val, BFloat16 self_val) -> BFloat16 { + return (float(self_val) <= min_val || float(self_val) >= max_val) ? BFloat16(0) : grad_val; + }, + [=](Vectorized grad_val, Vectorized self_val) -> Vectorized { + Vectorized grad_val0, grad_val1, self_val0, self_val1; + std::tie(grad_val0, grad_val1) = convert_bfloat16_float(grad_val); + std::tie(self_val0, self_val1) = convert_bfloat16_float(self_val); + return convert_float_bfloat16( + ((self_val0 > min_val) & (self_val0 < max_val)) & grad_val0, + ((self_val1 > min_val) & (self_val1 < max_val)) & grad_val1 + ); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "hardshrink_backward_cpu", [&] { auto min_val = min.to(); auto max_val = max.to(); cpu_kernel_vec( @@ -635,6 +653,7 @@ void hardtanh_backward_kernel(TensorIterator& iter, const Scalar& min, const Sca return ((self_val > min_val) & (self_val < max_val)) & grad_val; }); }); + } } void hardswish_kernel(TensorIterator& iter) { @@ -1035,8 +1054,23 @@ void glu_backward_kernel(TensorIterator& iter) { } void silu_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kBFloat16, iter.dtype(), "silu_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + const Vectorized kOneVec(1.0f); + cpu_kernel_vec( + iter, + [](BFloat16 x) -> BFloat16 { + return float(x) / (1.0f + std::exp(-float(x))); + }, + [kOneVec](Vectorized x_vec) -> Vectorized { + Vectorized x_vec0, x_vec1; + std::tie(x_vec0, x_vec1) = convert_bfloat16_float(x_vec); + return convert_float_bfloat16( + x_vec0 / (kOneVec + x_vec0.neg().exp()), + x_vec1 / (kOneVec + x_vec1.neg().exp())); + }); + } else { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES( + iter.dtype(), "silu_cpu", [&]() { const Vectorized kOneVec(scalar_t(1)); cpu_kernel_vec( iter, @@ -1047,11 +1081,34 @@ void silu_kernel(TensorIteratorBase& iter) { return x_vec / (kOneVec + x_vec.neg().exp()); }); }); + } } void silu_backward_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kBFloat16, iter.dtype(), "silu_backward_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + const Vectorized kOneVec(1.0f); + cpu_kernel_vec( + iter, + [](BFloat16 dy, BFloat16 x) -> BFloat16 { + const float sigmoid = + 1.0f / (1.0f + std::exp(-float(x))); + return dy * sigmoid * (1.0f + x * (1.0f - sigmoid)); + }, + [kOneVec](Vectorized dy_vec, Vectorized x_vec) -> Vectorized { + Vectorized x_vec0, x_vec1, dy_vec0, dy_vec1; + std::tie(x_vec0, x_vec1) = convert_bfloat16_float(x_vec); + std::tie(dy_vec0, dy_vec1) = convert_bfloat16_float(dy_vec); + const Vectorized sigmoid0 = + kOneVec / (kOneVec + x_vec0.neg().exp()); + const Vectorized sigmoid1 = + kOneVec / (kOneVec + x_vec1.neg().exp()); + return convert_float_bfloat16( + dy_vec0 * sigmoid0 * (kOneVec + x_vec0 * (kOneVec - sigmoid0)), + dy_vec1 * sigmoid1 * (kOneVec + x_vec1 * (kOneVec - sigmoid1))); + }); + } else { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES( + iter.dtype(), "silu_backward_cpu", [&]() { const Vectorized kOneVec(scalar_t(1)); cpu_kernel_vec( iter, @@ -1066,10 +1123,26 @@ void silu_backward_kernel(TensorIteratorBase& iter) { return dy_vec * sigmoid * (kOneVec + x_vec * (kOneVec - sigmoid)); }); }); + } } void mish_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "mish_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + cpu_kernel_vec( + iter, + 
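// The bfloat16 branches added in this hunk (hardtanh_backward, silu,
// silu_backward, mish) share one pattern: widen to float, do the arithmetic
// in float, and narrow back to bfloat16 only when storing, since bf16 lacks
// the precision for the intermediates. Scalar-only sketch; the real kernels
// also supply a Vectorized<float> path via convert_bfloat16_float /
// convert_float_bfloat16.
#include <c10/util/BFloat16.h>
#include <cmath>

c10::BFloat16 silu_bf16_sketch(c10::BFloat16 x) {
  const float xf = static_cast<float>(x);        // widen
  const float yf = xf / (1.0f + std::exp(-xf));  // compute in float
  return c10::BFloat16(yf);                      // narrow on store
}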
[](BFloat16 x) -> BFloat16{ + return static_cast(float(x) * std::tanh(std::log1p(std::exp(float(x))))); + }, + [](Vectorized x_vec) -> Vectorized { + Vectorized x_vec0, x_vec1; + std::tie(x_vec0, x_vec1) = convert_bfloat16_float(x_vec); + return convert_float_bfloat16( + x_vec0 * x_vec0.exp().log1p().tanh(), + x_vec1 * x_vec1.exp().log1p().tanh() + ); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "mish_cpu", [&]() { using Vec = Vectorized; cpu_kernel_vec( iter, @@ -1080,10 +1153,36 @@ void mish_kernel(TensorIteratorBase& iter) { return x_vec * x_vec.exp().log1p().tanh(); }); }); + } } void mish_backward_kernel(TensorIterator& iter) { - AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "mish_backward_cpu", [&]() { + if (iter.dtype() == kBFloat16) { + using Vec = Vectorized; + const Vec kOneVec(1.0f); + cpu_kernel_vec( + iter, + [](BFloat16 dy, BFloat16 x) -> BFloat16 { + const float sigmoid = + 1.0f / (1.0f + std::exp(-float(x))); + const float tanh_softplus = std::tanh(std::log1p(std::exp(float(x)))); + return dy * (tanh_softplus + x * sigmoid * (1.0f - tanh_softplus * tanh_softplus)); + }, + [kOneVec](Vectorized dy_vec, Vectorized x_vec) -> Vectorized { + Vectorized x_vec0, x_vec1, dy_vec0, dy_vec1; + std::tie(x_vec0, x_vec1) = convert_bfloat16_float(x_vec); + std::tie(dy_vec0, dy_vec1) = convert_bfloat16_float(dy_vec); + const Vec sigmoid0 = kOneVec / (kOneVec + x_vec0.neg().exp()); + const Vec sigmoid1 = kOneVec / (kOneVec + x_vec1.neg().exp()); + const Vec tanh_softplus0 = x_vec0.exp().log1p().tanh(); + const Vec tanh_softplus1 = x_vec1.exp().log1p().tanh(); + return convert_float_bfloat16( + dy_vec0 * (tanh_softplus0 + x_vec0 * sigmoid0 * (kOneVec - tanh_softplus0 * tanh_softplus0)), + dy_vec1 * (tanh_softplus1 + x_vec1 * sigmoid1 * (kOneVec - tanh_softplus1 * tanh_softplus1)) + ); + }); + } else { + AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "mish_backward_cpu", [&]() { using Vec = Vectorized; const Vec kOneVec(scalar_t(1)); cpu_kernel_vec( @@ -1100,6 +1199,7 @@ void mish_backward_kernel(TensorIterator& iter) { return dy_vec * (tanh_softplus + x_vec * sigmoid * (kOneVec - tanh_softplus * tanh_softplus)); }); }); + } } void prelu_cpu_kernel(TensorIterator& iter) { diff --git a/aten/src/ATen/native/cpu/AtomicAddFloat.h b/aten/src/ATen/native/cpu/AtomicAddFloat.h index db96e1760de5..5b24ee4821c4 100644 --- a/aten/src/ATen/native/cpu/AtomicAddFloat.h +++ b/aten/src/ATen/native/cpu/AtomicAddFloat.h @@ -1,7 +1,7 @@ #ifndef ATOMIC_ADD_FLOAT #define ATOMIC_ADD_FLOAT -#if (defined(__x86_64__) || defined(__i386__)) +#if (defined(__x86_64__) || defined(__i386__) || defined(__aarch64__)) #include #else #define _mm_pause() @@ -24,7 +24,11 @@ static inline void cpu_atomic_add_float(float* dst, float fvalue) unsigned* old_intV = (unsigned*)(&old_value.intV); while (!std::atomic_compare_exchange_strong(dst_intV, old_intV, new_value.intV)) { +#ifdef __aarch64__ + __asm__ __volatile__("yield;" : : : "memory"); +#else _mm_pause(); +#endif old_value.floatV = *dst; new_value.floatV = old_value.floatV + fvalue; } diff --git a/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp b/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp index 2c9ac5ac15b6..9b5f442ef02c 100644 --- a/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp @@ -68,8 +68,8 @@ void mul_kernel(TensorIteratorBase& iter) { } else { AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2(kBFloat16, kHalf, iter.dtype(), "mul_cpu", [&]() { cpu_kernel_vec(iter, - [=](scalar_t a, scalar_t b) -> scalar_t { return a * b; }, - 
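// Sketch of the compare-and-swap retry loop that cpu_atomic_add_float above
// relies on, simplified to operate on a std::atomic<unsigned> holding the
// float's bit pattern rather than casting a raw float*. The aarch64 change
// above only swaps the _mm_pause() spin hint for a `yield` instruction.
#include <atomic>
#include <cstring>

void atomic_add_float_bits(std::atomic<unsigned>& dst_bits, float fvalue) {
  unsigned old_bits = dst_bits.load(std::memory_order_relaxed);
  for (;;) {
    float old_val;
    std::memcpy(&old_val, &old_bits, sizeof(old_val));
    const float new_val = old_val + fvalue;
    unsigned new_bits;
    std::memcpy(&new_bits, &new_val, sizeof(new_bits));
    // On failure, old_bits is refreshed with the current stored value and we retry.
    if (dst_bits.compare_exchange_weak(old_bits, new_bits)) {
      return;
    }
  }
}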
[=](Vectorized a, Vectorized b) { + [=](scalar_t a, scalar_t b) __ubsan_ignore_undefined__ -> scalar_t { return a * b; }, + [=](Vectorized a, Vectorized b) __ubsan_ignore_undefined__ { return a * b; }); }); @@ -314,10 +314,13 @@ void bitwise_xor_kernel(TensorIteratorBase& iter) { void lshift_kernel(TensorIteratorBase& iter) { AT_DISPATCH_INTEGRAL_TYPES(iter.dtype(), "lshift_cpu", [&]() { - cpu_kernel(iter, - [](scalar_t a, scalar_t b) -> scalar_t { - return static_cast>(a) << b; - }); + cpu_kernel_vec(iter, + [](scalar_t a, scalar_t b) -> scalar_t { + return static_cast>(a) << b; + }, + [](Vectorized a, Vectorized b) { + return a << b; + }); }); } @@ -380,10 +383,13 @@ void logical_xor_kernel(TensorIterator& iter) { void rshift_kernel(TensorIteratorBase& iter) { AT_DISPATCH_INTEGRAL_TYPES(iter.dtype(), "rshift_cpu", [&]() { - cpu_kernel(iter, - [](scalar_t a, scalar_t b) -> scalar_t { - return a >> b; - }); + cpu_kernel_vec(iter, + [](scalar_t a, scalar_t b) -> scalar_t { + return a >> b; + }, + [](Vectorized a, Vectorized b) { + return a >> b; + }); }); } diff --git a/aten/src/ATen/native/cpu/BlasKernel.cpp b/aten/src/ATen/native/cpu/BlasKernel.cpp index cf12c392f868..7a27b152edf7 100644 --- a/aten/src/ATen/native/cpu/BlasKernel.cpp +++ b/aten/src/ATen/native/cpu/BlasKernel.cpp @@ -2,6 +2,7 @@ #include #include #include +#include namespace at { namespace native { @@ -30,6 +31,29 @@ void scale_(int64_t m, int64_t n, opmath_t alpha, scalar_t *a, int64_t lda) { } } +template +auto sum(int64_t N, Func f) { + constexpr int ilp_factor = 4; + using acc_t = decltype(f(0)); + + // Calculate independent partial sums then add together at the end + std::array partial_sums{}; + + int64_t i = 0; + for (; i + ilp_factor <= N; i += ilp_factor) { + c10::ForcedUnroll{}([&](int k) { + partial_sums[k] += f(i + k); + }); + } + for (; i < N; ++i) { + partial_sums[0] += f(i); + } + for (int k = 1; k < ilp_factor; ++k) { + partial_sums[0] += partial_sums[k]; + } + return partial_sums[0]; +} + template void gemm_notrans_( @@ -73,15 +97,15 @@ void gemm_transa_( for (const auto i : c10::irange(m)) { const scalar_t *b_ = b; for (const auto j : c10::irange(n)) { - opmath_t sum = 0; - for (const auto l : c10::irange(k)) { - sum += static_cast(a_[l]) * static_cast(b_[l]); - } + const auto dot = sum(k, [&](int64_t l) -> opmath_t { + return static_cast(a_[l]) * static_cast(b_[l]); + }); b_ += ldb; - if (beta == scalar_t(0)) - c[j*ldc+i] = alpha*sum; - else - c[j*ldc+i] = beta*c[j*ldc+i]+alpha*sum; + if (beta == opmath_t(0)) { + c[j*ldc+i] = alpha*dot; + } else { + c[j*ldc+i] = beta*c[j*ldc+i]+alpha*dot; + } } a_ += lda; } @@ -124,26 +148,19 @@ void gemm_transab_( const scalar_t *b, int64_t ldb, opmath_t beta, scalar_t *c, int64_t ldc) { - // c *= beta - scale_(m, n, beta, c, ldc); - - // c += alpha * (a.T @ b.T) + // c = beta * c + alpha * (a.T @ b.T) for (const auto i : c10::irange(m)) { for (const auto j : c10::irange(n)) { - int64_t l_k = k / 4; - for (const auto l_l : c10::irange(l_k)) { - c[j * ldc + i] += a[i * lda + l_l * 4 + 0] // - * (b[(l_l * 4 + 0) * ldb + j] * alpha); - c[j * ldc + i] += a[i * lda + l_l * 4 + 1] // - * (b[(l_l * 4 + 1) * ldb + j] * alpha); - c[j * ldc + i] += a[i * lda + l_l * 4 + 2] // - * (b[(l_l * 4 + 2) * ldb + j] * alpha); - c[j * ldc + i] += a[i * lda + l_l * 4 + 3] // - * (b[(l_l * 4 + 3) * ldb + j] * alpha); + const auto dot = sum(k, [&](int64_t l) -> opmath_t { + return static_cast(a[i * lda + l]) * + static_cast(b[l * ldb + j]); + }); + + if (beta == opmath_t(0)) { + c[j * ldc + i] 
= alpha * dot; + } else { + c[j * ldc + i] = beta * c[j * ldc + i] + alpha * dot; } - int64_t l = l_k * 4; - for (; l < k; l++) - c[j * ldc + i] += a[i * lda + l] * (b[l * ldb + j] * alpha); } } } diff --git a/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp b/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp index 769c9028e7b0..57bd4f3badc0 100644 --- a/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp +++ b/aten/src/ATen/native/cpu/ChannelShuffleKernel.cpp @@ -1,8 +1,10 @@ -#include +#define TORCH_ASSERT_NO_OPERATORS +#include + +#include #include #include #include -#include #include #include @@ -12,8 +14,8 @@ namespace { template void cpu_channel_shuffle( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t groups) { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); @@ -57,8 +59,8 @@ void cpu_channel_shuffle( template void cpu_channel_shuffle_cl( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t groups) { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); @@ -83,8 +85,8 @@ void cpu_channel_shuffle_cl( } void channel_shuffle_kernel_impl( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t groups) { switch (input.suggest_memory_format()) { case at::MemoryFormat::Contiguous: { diff --git a/aten/src/ATen/native/cpu/ChannelShuffleKernel.h b/aten/src/ATen/native/cpu/ChannelShuffleKernel.h index 939a6c4b172d..10e592cf59eb 100644 --- a/aten/src/ATen/native/cpu/ChannelShuffleKernel.h +++ b/aten/src/ATen/native/cpu/ChannelShuffleKernel.h @@ -1,12 +1,14 @@ -#include -#include +#pragma once #include +#include -#pragma once +namespace at { +class TensorBase; +} namespace at { namespace native { -using channel_shuffle_fn = void(*)(Tensor&, const Tensor&, int64_t); +using channel_shuffle_fn = void(*)(TensorBase&, const TensorBase&, int64_t); DECLARE_DISPATCH(channel_shuffle_fn, channel_shuffle_kernel); }} // at::native diff --git a/aten/src/ATen/native/cpu/CopyKernel.cpp b/aten/src/ATen/native/cpu/CopyKernel.cpp index de1841d989c3..c6411efd77cd 100644 --- a/aten/src/ATen/native/cpu/CopyKernel.cpp +++ b/aten/src/ATen/native/cpu/CopyKernel.cpp @@ -13,9 +13,6 @@ namespace native { inline namespace CPU_CAPABILITY { void neg_kernel(TensorIteratorBase &iter); void conj_kernel(TensorIteratorBase &iter); -} // namespace CPU_CAPABILITY - -namespace { void float_bfloat16_copy_kernel(TensorIteratorBase &iter, bool requires_neg) { auto strides_out = iter.strides(0); @@ -52,8 +49,7 @@ void float_bfloat16_copy_kernel(TensorIteratorBase &iter, bool requires_neg) { std::copy_n(base, 2, data.data()); const int64_t *outer_strides = &strides[2]; - for (const auto it : c10::irange(size1)) { - (void)it; + for (const auto it C10_UNUSED : c10::irange(size1)) { Vecd dst_s; if (strides_in[0] == 0) { dst_s = Vecd(dest_t(*((scalar_t*)data[1]))); @@ -122,8 +118,7 @@ void float_bfloat16_copy_kernel(TensorIteratorBase &iter, bool requires_neg) { std::copy_n(base, 2, data.data()); const int64_t *outer_strides = &strides[2]; - for (const auto it : c10::irange(size1)) { - (void)it; + for (const auto it C10_UNUSED : c10::irange(size1)) { Vecd dst_s; if (strides_in[0] == 0) { dst_s = Vecd(dest_t(*((scalar_t*)data[1]))); @@ -246,22 +241,20 @@ void copy_kernel(TensorIterator& iter, bool /*non_blocking*/) { AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(ScalarType::ComplexHalf, ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, dtype, "copy_", [&] { using 
dest_t = scalar_t; AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(ScalarType::ComplexHalf, ScalarType::Half, ScalarType::Bool, ScalarType::BFloat16, iter.dtype(1), "copy_", [&] { - // Note (@zasdfgbnm): - // - // The code below can not be simplified as - // cpu_kernel(iter, c10::static_cast_with_inter_type::apply); - // - // because this would force the compiler to instantiate the inline function and generate a function call in the loop - // instead of inlining it, making all the optimizations like vectorization impossible. - // You can verify this by looking the the symbols of `libtorch_cpu.so`: - // - // readelf -Ws libtorch_cpu.so | grep static_cast_with_inter_type - // - // If done correctly, the above command should have no output. - // - // See: https://github.com/pytorch/pytorch/issues/31271 - cpu_kernel(iter, [](scalar_t src) -> dest_t { - return c10::static_cast_with_inter_type::apply(src); }); + if (iter.has_contiguous_first_dim()) { + TORCH_INTERNAL_ASSERT(iter.ninputs() == 1); + TORCH_INTERNAL_ASSERT(iter.noutputs() == 1); + + iter.for_each([](char **data, const int64_t *strides, int64_t size) { + auto src = reinterpret_cast(data[1]); + auto dst = reinterpret_cast(data[0]); + at::vec::convert(src, dst, size); + }); + } else { + cpu_kernel(iter, [](scalar_t x) -> dest_t { + return c10::convert(x); + }); + } }); }); @@ -274,7 +267,7 @@ void copy_kernel(TensorIterator& iter, bool /*non_blocking*/) { } } -} // anonymous namespace +} // namespace CPU_CAPABILITY REGISTER_DISPATCH(copy_stub, ©_kernel); diff --git a/aten/src/ATen/native/cpu/CopyKernel.h b/aten/src/ATen/native/cpu/CopyKernel.h new file mode 100644 index 000000000000..9d2affd6101a --- /dev/null +++ b/aten/src/ATen/native/cpu/CopyKernel.h @@ -0,0 +1,12 @@ +#pragma once + +namespace at { +struct TensorIteratorBase; + +namespace native { +inline namespace CPU_CAPABILITY { + +void direct_copy_kernel(TensorIteratorBase &iter); +void copy_kernel(TensorIterator& iter, bool /*non_blocking*/); + +}}} // namespace at::native::CPU_CAPABILITY diff --git a/aten/src/ATen/native/cpu/DepthwiseConvKernel.h b/aten/src/ATen/native/cpu/DepthwiseConvKernel.h index 56956b443386..80970074b8e6 100644 --- a/aten/src/ATen/native/cpu/DepthwiseConvKernel.h +++ b/aten/src/ATen/native/cpu/DepthwiseConvKernel.h @@ -1,6 +1,7 @@ #pragma once #include +#include /* Depthwise 3x3 Winograd convolution operator @@ -12,7 +13,7 @@ class Tensor; namespace native { using convolution_depthwise3x3_winograd_fn = - Tensor (*)(const Tensor &, const Tensor &, const Tensor &,IntArrayRef, IntArrayRef, int64_t); + Tensor (*)(const Tensor &, const Tensor &, const Tensor &, IntArrayRef, IntArrayRef, int64_t); DECLARE_DISPATCH(convolution_depthwise3x3_winograd_fn, convolution_depthwise3x3_winograd_stub); diff --git a/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp b/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp index 98404005c551..9f88a23c8e36 100644 --- a/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/DistanceOpsKernel.cpp @@ -394,8 +394,7 @@ struct Dist { const scalar_t * t1_end = t1 + l1_size; const scalar_t * t2_end = t2 + l2_size; - for (const auto l : c10::irange(d)) { - (void)l; //Suppress unused variable warning + for (const auto l C10_UNUSED : c10::irange(d)) { for (; t1 != t1_end; t1 += m, res += m) { const Vec vec_t1 = Vec::loadu(t1, count); Vec res_vec = Vec::loadu(res, count); diff --git a/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp b/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp index 0f4d4b607717..de3be1587e56 
100644 --- a/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp +++ b/aten/src/ATen/native/cpu/FunctionOfAMatrixUtilsKernel.cpp @@ -30,8 +30,7 @@ void _compute_linear_combination_cpu_kernel( auto* RESTRICT in_ptr = data[1]; auto* RESTRICT coeff_ptr = data[2]; - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED : c10::irange(n)) { auto* RESTRICT out_data = reinterpret_cast(out_ptr); auto* RESTRICT in_data = reinterpret_cast(in_ptr); using primitive_t = typename scalar_value_type::type; diff --git a/aten/src/ATen/native/cpu/HistogramKernel.cpp b/aten/src/ATen/native/cpu/HistogramKernel.cpp index 6d6b4a749fb2..83011aa2e9a7 100644 --- a/aten/src/ATen/native/cpu/HistogramKernel.cpp +++ b/aten/src/ATen/native/cpu/HistogramKernel.cpp @@ -148,8 +148,8 @@ void histogramdd_cpu_contiguous(Tensor& hist, const TensorList& bin_edges, for (const auto dim : c10::irange(D)) { const input_t elt = accessor_in[i][dim]; - // Skips elements which fall outside the specified bins - if (elt < leftmost_edge[dim] || rightmost_edge[dim] < elt) { + // Skips elements which fall outside the specified bins and NaN elements + if (!(elt >= leftmost_edge[dim] && elt <= rightmost_edge[dim])) { skip_elt = true; break; } @@ -166,8 +166,8 @@ void histogramdd_cpu_contiguous(Tensor& hist, const TensorList& bin_edges, * the appropriate bin via simple division. */ pos = static_cast((elt - leftmost_edge[dim]) - / (rightmost_edge[dim] - leftmost_edge[dim]) - * (num_bin_edges[dim] - 1)); + * (num_bin_edges[dim] - 1) + / (rightmost_edge[dim] - leftmost_edge[dim])); /* Ensures consistency with bin_edges by checking the bins to the left and right * of the selected position. Necessary for cases in which an element very close diff --git a/aten/src/ATen/native/cpu/IndexKernel.cpp b/aten/src/ATen/native/cpu/IndexKernel.cpp index be8e1a0a7315..81e135d1e749 100644 --- a/aten/src/ATen/native/cpu/IndexKernel.cpp +++ b/aten/src/ATen/native/cpu/IndexKernel.cpp @@ -74,8 +74,7 @@ void cpu_take_put_kernel( auto loop = [&](char** data, const int64_t* strides, int64_t n) { auto* iterated_data_bytes = data[0]; auto* index_data_bytes = data[1]; - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED : c10::irange(n)) { auto idx = *reinterpret_cast(index_data_bytes); auto& iterated = *reinterpret_cast(iterated_data_bytes); @@ -192,8 +191,7 @@ void index_fill_kernel( auto handle_nonzero_idx_stride = [&](char** data, const int64_t* strides, int64_t n) { auto* self_data_bytes = data[0]; auto* index_data_bytes = data[1]; - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED : c10::irange(n)) { auto* self_data = reinterpret_cast(self_data_bytes); auto idx = *reinterpret_cast(index_data_bytes); TORCH_CHECK_INDEX(idx >= -self_dim_size && idx < self_dim_size, @@ -219,8 +217,7 @@ void index_fill_kernel( if (idx < 0) { idx += self_dim_size; } - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED: c10::irange(n)) { auto* self_data = reinterpret_cast(self_data_bytes); self_data[idx * self_dim_stride] = fill_val; @@ -253,8 +250,7 @@ void index_copy_kernel( auto* self_data_bytes = data[0]; auto* index_data_bytes = data[1]; auto* source_data_bytes = data[2]; - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem 
C10_UNUSED : c10::irange(n)) { auto* self_data = reinterpret_cast(self_data_bytes); auto idx = *reinterpret_cast(index_data_bytes); auto* source_data = reinterpret_cast(source_data_bytes); @@ -277,8 +273,7 @@ void index_copy_kernel( TORCH_CHECK_INDEX(idx >= 0 && idx < self_dim_size, "index_copy_(): index ", idx, " is out of bounds for dimension ", dim, " with size ", self_dim_size); - for (const auto elem : c10::irange(n)) { - (void)elem; //Suppress unused variable warning + for (const auto elem C10_UNUSED : c10::irange(n)) { auto* self_data = reinterpret_cast(self_data_bytes); auto* source_data = reinterpret_cast(source_data_bytes); @@ -462,6 +457,75 @@ void masked_select_kernel(TensorIterator& iter, int64_t result_stride) { }); } + +template +void cpu_hflip_vec(at::TensorIterator& iter) { + + auto loop2d = [&](char** base, const int64_t *strides, int64_t size0, int64_t size1) { + + static constexpr int ntensors = 3; + std::array data_arr; + std::copy_n(base, ntensors, data_arr.data()); + const int64_t *outer_strides = &strides[ntensors]; + + using Vec = Vectorized; + + constexpr auto stride = sizeof(scalar_t); + TORCH_INTERNAL_ASSERT(stride == -strides[0] && stride == strides[1]); + + for (const auto j C10_UNUSED : c10::irange(size1)) { + + // vectorized loop with negative stride for output + char** C10_RESTRICT data_ = data_arr.data(); + int64_t n = size0; + + char* C10_RESTRICT data[ntensors]; + for (const auto arg : c10::irange(ntensors)) { + data[arg] = data_[arg]; + } + + int64_t i = 0; + + // data[0] unaligned pre-pass + int64_t offset = (j * n + (n - i - Vec::size())) % 32; + offset = (offset >= n) ? n : offset; + for (; i < offset; i++) { + scalar_t* out_ptr = (scalar_t*)(data[0] - i * stride); + *out_ptr = *(scalar_t *)(data[1] + i * stride); + } + // Empirically found that it is faster to process 3 data items together vs 2 or 4 + for (; i <= n - 3 * Vec::size(); i += 3 * Vec::size()) { + auto out1 = Vec::loadu(data[1] + i * stride); + auto out2 = Vec::loadu(data[1] + (i + Vec::size()) * stride); + auto out3 = Vec::loadu(data[1] + (i + 2 * Vec::size()) * stride); + // flip the vector: 1234 -> 4321 + out1 = flip(out1); + out2 = flip(out2); + out3 = flip(out3); + out1.store(data[0] - (i + Vec::size() - 1) * stride); + out2.store(data[0] - (i + 2 * Vec::size() - 1) * stride); + out3.store(data[0] - (i + 3 * Vec::size() - 1) * stride); + } + if (i < n) { + for (; i < n; i++) { + scalar_t* out_ptr = (scalar_t*)(data[0] - i * stride); + *out_ptr = *(scalar_t *)(data[1] + i * stride); + } + } + + // advance: + for (const auto arg : c10::irange(data_arr.size())) { + data_arr[arg] += outer_strides[arg]; + } + } + }; + + int64_t grain_size = at::internal::GRAIN_SIZE; + iter.for_each(loop2d, grain_size); + iter.cast_outputs(); +} + + void flip_kernel(TensorIterator& iter, const bool quantized) { if (quantized) { AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(iter.dtype(), "flip_quantized_cpu", @@ -471,6 +535,29 @@ void flip_kernel(TensorIterator& iter, const bool quantized) { }); }); } else { + // Special case: horizontal flip with vectorization and input is contiguous + // Context: horizontal flip leads to strides[0] < 0 and + // thus is_contiguous condition is not satisfied and non-vectorized code path is taken. 
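For reference, the scalar operation that the cpu_hflip_vec fast path above vectorizes can be sketched in isolation as follows (a standalone illustration with hypothetical names, not code from this patch): the output row is written back-to-front through a negative stride while the input row is read contiguously, which is exactly what the non-vectorized pre/post loops in the hunk do element by element.

    #include <cstdint>

    // Hypothetical reference helper: flip one row horizontally.
    // out_last points at the last element of the output row, so writing
    // out_last[-i] walks backwards (negative stride) while the input is
    // read forwards from in_first.
    template <typename scalar_t>
    void hflip_row_reference(scalar_t* out_last, const scalar_t* in_first, int64_t n) {
      for (int64_t i = 0; i < n; ++i) {
        out_last[-i] = in_first[i];
      }
    }
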
+ auto output_strides = iter.strides(0); + auto input_strides = iter.strides(1); + if (iter.ndim() > 0 && output_strides[0] < 0 && input_strides[0] == iter.element_size(1)) { + auto iter_dtype = iter.dtype(); + if (iter_dtype == kByte) { + return cpu_hflip_vec(iter); + } else if (iter_dtype == kFloat) { + return cpu_hflip_vec(iter); + } else if (iter_dtype == kInt) { + return cpu_hflip_vec(iter); + } else if (iter_dtype == kShort) { + return cpu_hflip_vec(iter); + } else if (iter_dtype == kLong) { + return cpu_hflip_vec(iter); + } else if (iter_dtype == kDouble) { + return cpu_hflip_vec(iter); + } + // other dtypes are handled below with cpu_kernel_vec + } + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kHalf, kBFloat16, iter.dtype(), "flip_cpu", [&iter] { cpu_kernel_vec(iter, [](scalar_t a, scalar_t /*dummy input*/) -> scalar_t { diff --git a/aten/src/ATen/native/cpu/LerpKernel.cpp b/aten/src/ATen/native/cpu/LerpKernel.cpp index 28b2cde664ab..afff85370acd 100644 --- a/aten/src/ATen/native/cpu/LerpKernel.cpp +++ b/aten/src/ATen/native/cpu/LerpKernel.cpp @@ -4,35 +4,127 @@ #include #include +#include + namespace at { namespace native { namespace { +template +Vectorized is_lerp_weight_small(Vectorized weight) { + static_assert(!c10::is_complex::value, ""); + return weight.abs() < Vectorized(0.5); +} + +// is_lerp_weight_small doesn't work for complex because z.abs() returns a +// complex vector which can't be compared. Either implement it with z.abs_2_(), +// or fallback to the scalar function. +#if !(defined(CPU_CAPABILITY_DEFAULT) || defined(_MSC_VER)) +template +Vectorized> is_lerp_weight_small(Vectorized> weight) { + using vec_reg_t = decltype(weight.abs_2_()); + vec_reg_t mask = Vectorized(weight.abs_2_()) < Vectorized(0.25); + return Vectorized>(mask); +} +#else +template +Vectorized lerp_vec_map(Vectorized start, Vectorized end, Vectorized weight) { + using vec_t = Vectorized; + __at_align__ scalar_t start_arr[vec_t::size()]; + __at_align__ scalar_t end_arr[vec_t::size()]; + __at_align__ scalar_t weight_arr[vec_t::size()]; + __at_align__ scalar_t result_arr[vec_t::size()]; + + start.store(start_arr); + end.store(end_arr); + weight.store(weight_arr); + + for (auto i : c10::irange(vec_t::size())) { + result_arr[i] = lerp(start_arr[i], end_arr[i], weight_arr[i]); + } + return vec_t::loadu(result_arr); +} + +template +Vectorized> lerp_vec(Vectorized> start, Vectorized> end, Vectorized> weight) { + return lerp_vec_map(start, end, weight); +} +#endif + +template +Vectorized lerp_vec(Vectorized start, Vectorized end, Vectorized weight) { + using vec_t = Vectorized; + auto mask = is_lerp_weight_small(weight); + auto coeff = vec_t::blendv(weight - vec_t(1), weight, mask); + auto base = vec_t::blendv(end, start, mask); + return vec::fmadd(coeff, end - start, base); +} + void lerp_scalar_kernel(at::TensorIteratorBase& iter, const Scalar& weight) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(iter.common_dtype(), "lerp_kernel_scalar", [&] { - using value_t = typename c10::scalar_value_type::type; - scalar_t weight_val = weight.to(); - at::native::cpu_kernel( - iter, - [weight_val](scalar_t self_val, scalar_t end_val) { - return (zabs(weight_val) < 0.5) - ? 
self_val + weight_val * (end_val - self_val) - : end_val - (end_val - self_val) * (scalar_t(1) - weight_val); - }); - }); + if (iter.common_dtype() == kBFloat16) { + using bVec = Vectorized; + using fVec = Vectorized; + float weight_val = weight.to(); + auto weight_vec = fVec(weight_val); + at::native::cpu_kernel_vec( + iter, + [weight_val](BFloat16 self_val, BFloat16 end_val) -> BFloat16 { + return lerp(self_val, end_val, weight_val); + }, + [=](bVec self_vec, bVec end_vec) -> bVec { + fVec self_vec0, self_vec1, end_vec0, end_vec1; + std::tie(self_vec0, self_vec1) = convert_bfloat16_float(self_vec); + std::tie(end_vec0, end_vec1) = convert_bfloat16_float(end_vec); + auto result0 = lerp_vec(self_vec0, end_vec0, weight_vec); + auto result1 = lerp_vec(self_vec1, end_vec1, weight_vec); + return convert_float_bfloat16(result0, result1); + }); + } else { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(iter.common_dtype(), "lerp_kernel_scalar", [&] { + auto weight_val = weight.to(); + at::native::cpu_kernel_vec( + iter, + [weight_val](scalar_t self_val, scalar_t end_val) { + return lerp(self_val, end_val, weight_val); + }, + [weight_val](Vectorized self, Vectorized end) { + const Vectorized weight(weight_val); + return lerp_vec(self, end, weight); + }); + }); + } } void lerp_tensor_kernel(at::TensorIteratorBase& iter) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(iter.common_dtype(), "lerp_kernel_tensor", [&] { - using value_t = typename c10::scalar_value_type::type; - at::native::cpu_kernel( - iter, - [](scalar_t self_val, scalar_t end_val, scalar_t weight_val) { - return (zabs(weight_val) < 0.5) - ? self_val + weight_val * (end_val - self_val) - : end_val - (end_val - self_val) * (scalar_t(1) - weight_val); - }); - }); + if (iter.common_dtype() == kBFloat16) { + using bVec = Vectorized; + using fVec = Vectorized; + at::native::cpu_kernel_vec( + iter, + [=](BFloat16 self_val, BFloat16 end_val, BFloat16 weight_val) -> BFloat16 { + return lerp(self_val, end_val, weight_val); + }, + [=](bVec self_vec, bVec end_vec, bVec weight_vec) -> bVec { + fVec self_vec0, self_vec1, end_vec0, end_vec1, weight_vec0, weight_vec1; + std::tie(self_vec0, self_vec1) = convert_bfloat16_float(self_vec); + std::tie(end_vec0, end_vec1) = convert_bfloat16_float(end_vec); + std::tie(weight_vec0, weight_vec1) = convert_bfloat16_float(weight_vec); + auto result0 = lerp_vec(self_vec0, end_vec0, weight_vec0); + auto result1 = lerp_vec(self_vec1, end_vec1, weight_vec1); + return convert_float_bfloat16(result0, result1); + }); + } else { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(iter.common_dtype(), "lerp_kernel_tensor", [&] { + at::native::cpu_kernel_vec( + iter, + [](scalar_t self_val, scalar_t end_val, scalar_t weight_val) { + return lerp(self_val, end_val, weight_val); + }, + [](Vectorized self_val, Vectorized end_val, Vectorized weight_val) { + return lerp_vec(self_val, end_val, weight_val); + }); + }); + } } } // anonymous namespace diff --git a/aten/src/ATen/native/cpu/Loops.h b/aten/src/ATen/native/cpu/Loops.h index 2558736ddc0f..8e76cca50f01 100644 --- a/aten/src/ATen/native/cpu/Loops.h +++ b/aten/src/ATen/native/cpu/Loops.h @@ -269,8 +269,7 @@ struct VectorizedLoop2d { const int64_t *outer_strides = &strides[ntensors]; if (is_contiguous(strides)) { - for (const auto i : c10::irange(size1)) { - (void)i; + for (const auto i C10_UNUSED : c10::irange(size1)) { vectorized_loop(data.data(), size0, 0, op, vop); advance(data, outer_strides); } @@ -278,14 +277,12 @@ struct VectorizedLoop2d { using Indices = std::make_index_sequence; 
unroll_contiguous_scalar_checks(strides, Indices{}, [&](size_t idx) { if (idx) { - for (const auto i : c10::irange(size1)) { - (void)i; + for (const auto i C10_UNUSED : c10::irange(size1)) { vectorized_loop(data.data(), size0, idx, op, vop); advance(data, outer_strides); } } else { - for (const auto i : c10::irange(size1)) { - (void)i; + for (const auto i C10_UNUSED : c10::irange(size1)) { basic_loop(data.data(), strides, 0, size0, op); advance(data, outer_strides); } diff --git a/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp b/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp index aedd845fee89..0045edd2feaf 100644 --- a/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp +++ b/aten/src/ATen/native/cpu/PixelShuffleKernel.cpp @@ -1,8 +1,10 @@ -#include +#define TORCH_ASSERT_NO_OPERATORS +#include + +#include #include #include #include -#include #include #include @@ -12,8 +14,8 @@ namespace { template void cpu_pixel_shuffle( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t upscale_factor) { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); @@ -52,8 +54,8 @@ void cpu_pixel_shuffle( template void cpu_pixel_shuffle_channels_last( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t upscale_factor) { TORCH_CHECK(input.ndimension() == 4, "pixel shuffle with channels last format supports tensors with 4 dims"); @@ -110,8 +112,8 @@ void cpu_pixel_shuffle_channels_last( template void cpu_pixel_unshuffle( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t downscale_factor) { auto input_data = input.data_ptr(); auto output_data = output.data_ptr(); @@ -151,8 +153,8 @@ void cpu_pixel_unshuffle( template void cpu_pixel_unshuffle_channels_last( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t downscale_factor) { TORCH_CHECK(input.ndimension() == 4, "pixel unshuffle with channels last format supports tensors with 4 dims"); @@ -192,8 +194,8 @@ void cpu_pixel_unshuffle_channels_last( } void pixel_shuffle_kernel_impl( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t upscale_factor) { switch (input.suggest_memory_format()) { case at::MemoryFormat::Contiguous: { @@ -216,8 +218,8 @@ void pixel_shuffle_kernel_impl( } void pixel_unshuffle_kernel_impl( - Tensor& output, - const Tensor& input, + TensorBase& output, + const TensorBase& input, int64_t downscale_factor) { switch (input.suggest_memory_format()) { case at::MemoryFormat::Contiguous: { diff --git a/aten/src/ATen/native/cpu/PixelShuffleKernel.h b/aten/src/ATen/native/cpu/PixelShuffleKernel.h index f7234edf0e60..c015e674a24c 100644 --- a/aten/src/ATen/native/cpu/PixelShuffleKernel.h +++ b/aten/src/ATen/native/cpu/PixelShuffleKernel.h @@ -1,12 +1,13 @@ -#include -#include +#pragma once #include -#pragma once +namespace at { +class TensorBase; +} namespace at { namespace native { -using pixel_shuffle_fn = void(*)(Tensor&, const Tensor&, int64_t); +using pixel_shuffle_fn = void(*)(TensorBase&, const TensorBase&, int64_t); DECLARE_DISPATCH(pixel_shuffle_fn, pixel_shuffle_kernel); DECLARE_DISPATCH(pixel_shuffle_fn, pixel_unshuffle_kernel); diff --git a/aten/src/ATen/native/cpu/README.md b/aten/src/ATen/native/cpu/README.md index ab2f9d3d0260..2cf6fa0a1332 100644 --- a/aten/src/ATen/native/cpu/README.md +++ b/aten/src/ATen/native/cpu/README.md @@ -64,7 +64,7 @@ within 256bit & 512bits registers. 
vec defines various operators such as As an example `ReduceOpsKernel.cpp` implements a generic `kernel_` that reduces an entire array using a given associative binary operation such as +. -More explicity, calling `kernel_` with template argument `std::plus` will cause +More explicitly, calling `kernel_` with template argument `std::plus` will cause it to sum up the entire array into a single value. `ReduceOpsKernel.cpp` uses the `CPU_CAPABILITY_*` macros to "know" under which @@ -73,7 +73,7 @@ generic code, which will be compiled under multipled compilation settings. `../ReduceOps.cpp` now includes the header `ReduceOpsKernel.h`, which contains a generic definition of `sumImplAll`. This function allows the user to reduce -over a dimension or all dimensions. The appropiate capability is chosen at +over a dimension or all dimensions. The appropriate capability is chosen at runtime using cpuinfo. If the current platform has AVX2, `sumImpl` will be set to `sumImplAll`. diff --git a/aten/src/ATen/native/cpu/Reduce.h b/aten/src/ATen/native/cpu/Reduce.h index 8fe94699503b..fdb1c0d1a0fc 100644 --- a/aten/src/ATen/native/cpu/Reduce.h +++ b/aten/src/ATen/native/cpu/Reduce.h @@ -69,8 +69,7 @@ static inline void vectorized_reduction(char** data, int64_t n, int64_t stride, template static inline void UNARY_OUTER_LOOP(char* data[2], const int64_t strides[2], int64_t n, F f) { - for (const auto j : c10::irange(n)) { - (void)j; //Suppress unused variable warning + for (const auto j C10_UNUSED : c10::irange(n)) { f(); data[0] += strides[0]; data[1] += strides[1]; diff --git a/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp b/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp index 52e18faf737d..bbf45ba2ecd0 100644 --- a/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp @@ -61,8 +61,7 @@ static inline void cpu_cum_base_kernel(const Tensor& result, auto* result_data_bytes = data[0]; const auto* self_data_bytes = data[1]; - for (const auto i : c10::irange(n)) { - (void)i; //Suppress unused variable warning + for (const auto i C10_UNUSED : c10::irange(n)) { f( (scalar_t*)result_data_bytes, result_dim_stride, (scalar_t*)self_data_bytes, self_dim_stride, init_val @@ -185,7 +184,7 @@ static void prod_kernel_impl(TensorIterator& iter) { // NOLINTNEXTLINE(bugprone-argument-comment) /*identity=*/1); } else { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX(iter.dtype(), "prod_cpu", [&] { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND(kBFloat16, iter.dtype(), "prod_out_cpu", [&] { binary_kernel_reduce_vec( iter, [=](scalar_t a, scalar_t b) @@ -334,20 +333,9 @@ static void and_kernel_impl(TensorIterator& iter) { binary_kernel_reduce_vec( iter, [=](uint8_t a, uint8_t b) -> uint8_t { return (a && b) ? 1 : 0; }, -#if defined(CPU_CAPABILITY_ZVECTOR) [=](Vectorized a, Vectorized b) { return a & b; }, -#else - [=](Vectorized a, Vectorized b) { - Vectorized c = Vectorized(); - - for (decltype(c.size()) i = 0; i != Vectorized::size(); i++) { - c[i] = (a[i] && b[i]) ? 1 : 0; - } - return c; - }, -#endif /*ident=*/true); } else { binary_kernel_reduce_vec( @@ -381,20 +369,9 @@ static void or_kernel_impl(TensorIterator& iter) { binary_kernel_reduce_vec( iter, [=](uint8_t a, uint8_t b) -> uint8_t { return (a || b) ? 1 : 0; }, -#if defined(CPU_CAPABILITY_ZVECTOR) [=](Vectorized a, Vectorized b) { return a | b; }, -#else - [=](Vectorized a, Vectorized b) { - Vectorized c = Vectorized(); - - for (decltype(c.size()) i = 0; i != Vectorized::size(); i++) { - c[i] = (a[i] || b[i]) ? 
1 : 0; - } - return c; - }, -#endif /*ident=*/false); } else { binary_kernel_reduce_vec( diff --git a/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp b/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp index 8a157cee7522..6321fb6349e5 100644 --- a/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp +++ b/aten/src/ATen/native/cpu/ScatterGatherKernel.cpp @@ -10,7 +10,6 @@ #include #include #include -#include #include namespace at { namespace native { @@ -184,8 +183,7 @@ struct cpu_scatter_gather_base_kernel { // vs dim-TensorIterator loop order depending on // whether dim is the last dimension if (dim== self.dim() - 1) { - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { // dim loop is a separate code block // for better performance _cpu_scatter_gather_dim_loop()( @@ -202,8 +200,7 @@ struct cpu_scatter_gather_base_kernel { for (const auto i : c10::irange(index_dim_size)) { auto* self_data = self_data_bytes; auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { int64_t idx_dim = *(int64_t*)index_data; // we are not putting idx_dim in the error message because it disables // loop optimization in clang-7 @@ -268,8 +265,7 @@ struct cpu_scatter_gather_base_kernel { // vs dim-TensorIterator loop order depending on // whether dim is the last dimension if (dim== self.dim() - 1) { - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { // dim loop is a separate code block // for better performance _cpu_scatter_gather_dim_loop()( @@ -290,8 +286,7 @@ struct cpu_scatter_gather_base_kernel { auto* self_data = self_data_bytes; auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); auto* src_data = src_data_bytes; - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { int64_t idx_dim = *(int64_t*)index_data; // we are not putting idx_dim in the error message because it disables // loop optimization in clang-7 @@ -357,8 +352,7 @@ struct cpu_scatter_gather_base_kernel { // vs dim-TensorIterator loop order depending on // whether dim is the last dimension if (dim== self.dim() - 1) { - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { // dim loop is a separate code block // for better performance _cpu_scatter_gather_dim_loop()( @@ -379,8 +373,7 @@ struct cpu_scatter_gather_base_kernel { auto* self_data = self_data_bytes; auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); auto* src_data = src_data_bytes; - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { int64_t idx_dim = *(int64_t*)index_data; // we are not putting idx_dim in the error message because it disables // loop optimization in clang-7 @@ -446,8 +439,7 @@ struct cpu_scatter_gather_base_kernel { // vs dim-TensorIterator loop order depending on // whether dim is the last dimension if (dim== self.dim() - 1) { - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { // dim 
loop is a separate code block // for better performance _cpu_scatter_gather_dim_loop()( @@ -468,8 +460,7 @@ struct cpu_scatter_gather_base_kernel { auto* self_data = self_data_bytes; auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); auto* src_data = src_data_bytes; - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { int64_t idx_dim = *(int64_t*)index_data; // we are not putting idx_dim in the error message because it disables // loop optimization in clang-7 @@ -535,8 +526,7 @@ struct cpu_scatter_gather_base_kernel { // vs dim-TensorIterator loop order depending on // whether dim is the last dimension if (dim== self.dim() - 1) { - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { // dim loop is a separate code block // for better performance _cpu_scatter_gather_dim_loop()( @@ -557,8 +547,7 @@ struct cpu_scatter_gather_base_kernel { auto* self_data = self_data_bytes; auto* index_data = (char*)((int64_t*)index_data_bytes + i * index_dim_stride); auto* src_data = src_data_bytes; - for (const auto nelem : c10::irange(n)) { - (void)nelem; //Suppress unused variable warning + for (const auto nelem C10_UNUSED : c10::irange(n)) { int64_t idx_dim = *(int64_t*)index_data; // we are not putting idx_dim in the error message because it disables // loop optimization in clang-7 @@ -584,13 +573,55 @@ struct cpu_scatter_gather_base_kernel { } }; +template +inline void init(scalar_t* ptr, int64_t size, bool include_self) { + if (!include_self) { + using acc_t = vec::vec_scalar_t; + using Vec = vec::Vectorized; + + acc_t val; + if (reduce == SCATTER_GATHER_OP::REDUCE_ADD || + reduce == SCATTER_GATHER_OP::REDUCE_MEAN) { + val = static_cast(0); + } else if (reduce == SCATTER_GATHER_OP::REDUCE_MULTIPLY) { + val = static_cast(1); + } else if (reduce == SCATTER_GATHER_OP::REDUCE_MAXIMUM) { + val = std::numeric_limits::lowest(); + } else { + val = std::numeric_limits::max(); + } + vec::map( + [val](Vec x) { return Vec(val); }, + ptr, + ptr, + size); + } +} + +template +inline vec_t update(const vec_t& x, const vec_t& y) { + if (reduce == SCATTER_GATHER_OP::REDUCE_ADD || + reduce == SCATTER_GATHER_OP::REDUCE_MEAN) { + return x + y; + } else if (reduce == SCATTER_GATHER_OP::REDUCE_MULTIPLY) { + return x * y; + } else if (reduce == SCATTER_GATHER_OP::REDUCE_MAXIMUM) { + return vec::maximum(x, y); + } else { + return vec::minimum(x, y); + } +} + // Note [scatter reduce optimization] // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // -// 1. initiative: optimize scatter_reduce optimization on PyG -// `scatter_add` is extensively used on 'message passing' when -// aggregating info. The `index` tensor is extended which means -// the aggregation is on rowwise. +// 1. initiative: optimize `scatter_reduce` on classic PyG use-case: +// `scatter_reduce` is extensively used on 'message passing' when +// aggregating info. +// +// Typically, `self` will 2D tensor and `index` is a 1D extended/broadcasted +// tensor, which means that the aggregation is on rowwise and we can vectorize +// on the inner dimensions. // // 2. 
implementation: map `scatter_reduce` to `spmm` reduce // in the shape of `[M, N]` * `[N, K]`, where: @@ -604,8 +635,8 @@ struct cpu_scatter_gather_base_kernel { // // step 2: spmm reduce, parallel on M and vectorize on K // -template -void cpu_scatter_add_contig_kernel(const Tensor& self, const Tensor& index, const Tensor& src) { +template +void cpu_scatter_reduce_expanded_index(const Tensor& self, const Tensor& index, const Tensor& src, bool include_self) { int64_t* index_data = index.data_ptr(); scalar_t* self_data = self.data_ptr(); scalar_t* src_data = src.data_ptr(); @@ -624,9 +655,9 @@ void cpu_scatter_add_contig_kernel(const Tensor& self, const Tensor& index, cons for (const auto i : c10::irange(begin, end)) { int64_t index = index_data[i]; TORCH_CHECK(index >= 0 && index < index_upper_bound, - "index ", index, - " is out of bounds for dimension ", 0, - " with size ", index_upper_bound); + "index ", index, + " is out of bounds for dimension ", 0, + " with size ", index_upper_bound); keys[i] = index; values[i] = i; } @@ -689,25 +720,110 @@ void cpu_scatter_add_contig_kernel(const Tensor& self, const Tensor& index, cons int64_t off_start = row_index_offset[m]; int64_t off_end = row_index_offset[m + 1]; scalar_t* self_ptr = self_data + row * K; + + // reinit rows in `self` if needed + init(self_ptr, K, include_self); + for (const auto n : c10::irange(off_start, off_end)) { int64_t col = sorted_col_index_values[n]; scalar_t* src_ptr = src_data + col * K; vec::map2( - [](Vec x, Vec y) { return x + y; }, + [](Vec x, Vec y) { return update(x, y); }, self_ptr, self_ptr, src_ptr, K); } + + if (reduce == SCATTER_GATHER_OP::REDUCE_MEAN) { + int64_t count = include_self ? 1 : 0; + count += off_end - off_start; + if (count != 0) { + vec::map( + [count](Vec x) { return x / Vec(count); }, + self_ptr, + self_ptr, + K); + } + } } }); } -void scatter_add_config(const Tensor& self, const Tensor& index, const Tensor& src) { - AT_DISPATCH_ALL_TYPES_AND3( - ScalarType::Bool, ScalarType::Half, ScalarType::BFloat16, self.scalar_type(), - "scatter_add_contig", [&] { - cpu_scatter_add_contig_kernel(self, index, src); +template +void cpu_gather_expanded_index_kernel(const Tensor& result, const Tensor& index, const Tensor& self) { + int64_t* index_data = index.data_ptr(); + scalar_t* result_data = result.data_ptr(); + scalar_t* self_data = self.data_ptr(); + + const int64_t M = ensure_nonempty_size(result, 0); + const int64_t N = ensure_nonempty_size(self, 0); + const int64_t K = index.numel() / M; + + const int64_t index_upper_bound = N; + + using Vec = vec::Vectorized; + int64_t grain_size = std::max((int64_t) 1, at::internal::GRAIN_SIZE / K); + at::parallel_for(0, M, grain_size, [&](int64_t begin, int64_t end) { + for (const auto m : c10::irange(begin, end)) { + scalar_t* result_ptr = result_data + m * K; + int64_t index = index_data[m]; + TORCH_CHECK(index >= 0 && index < index_upper_bound, + "index ", index, + " is out of bounds for dimension ", 0, + " with size ", index_upper_bound); + scalar_t* self_ptr = self_data + index * K; + int64_t d = 0; + for (; d < K - (K % Vec::size()); d += Vec::size()) { + Vec out_vec = Vec::loadu(self_ptr + d); + out_vec.store(result_ptr + d); + } + #if !defined(_MSC_VER) && !defined(COMPILING_FOR_MIN_SIZE) + # pragma unroll + #endif + for (; d < K; d++) { + result_ptr[d] = self_ptr[d]; + } + } + }); +} + +void scatter_add_expanded_index_kernel(const Tensor& self, const Tensor& index, const Tensor& src) { + AT_DISPATCH_FLOATING_TYPES_AND( + ScalarType::BFloat16, 
self.scalar_type(), "scatter_add_expanded_index", [&] { + cpu_scatter_reduce_expanded_index(self, index, src, /*include_self*/true); + }); +} + +void scatter_reduce_expanded_index_kernel( + const Tensor& self, const Tensor& index, const Tensor& src, + const SCATTER_GATHER_OP& reduce, bool include_self) { + AT_DISPATCH_FLOATING_TYPES_AND( + ScalarType::BFloat16, self.scalar_type(), "scatter_reduce_expanded_index", [&] { + switch (reduce) { + case SCATTER_GATHER_OP::REDUCE_ADD : + cpu_scatter_reduce_expanded_index(self, index, src, include_self); + break; + case SCATTER_GATHER_OP::REDUCE_MULTIPLY : + cpu_scatter_reduce_expanded_index(self, index, src, include_self); + break; + case SCATTER_GATHER_OP::REDUCE_MAXIMUM : + cpu_scatter_reduce_expanded_index(self, index, src, include_self); + break; + case SCATTER_GATHER_OP::REDUCE_MINIMUM : + cpu_scatter_reduce_expanded_index(self, index, src, include_self); + break; + case SCATTER_GATHER_OP::REDUCE_MEAN : + cpu_scatter_reduce_expanded_index(self, index, src, include_self); + break; + } + }); +} + +void gather_expanded_index_kernel(const Tensor& result, const Tensor& self, const Tensor& index) { + AT_DISPATCH_FLOATING_TYPES_AND( + ScalarType::BFloat16, self.scalar_type(), "gather_expanded_index", [&] { + cpu_gather_expanded_index_kernel(result, index, self); }); } @@ -727,25 +843,10 @@ void scatter_fill_cpu_kernel(const Tensor& self, int64_t dim, const Tensor& inde self, dim, index, value, "scatter_fill_cpu_", tensor_assign); } -inline bool is_fast_path_scatter(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src) { -#if AT_PARALLEL_OPENMP - //TODO: add optimization when inner_size is 1 - // currently inner_size == 1 will go sequetial - if (index.numel() == index.size(0)) { return false; } - return dim == 0 && index.stride(dim) == 1 && src.is_contiguous() && self.is_contiguous(); -#else - return false; -#endif -} - void scatter_add_cpu_kernel(const Tensor& self, int64_t dim, const Tensor& index, const Tensor& src) { - if (is_fast_path_scatter(self, dim, index, src)) { - scatter_add_config(self, index, src); - } else { - cpu_scatter_gather_base_kernel<>()( - self, dim, index, src, - "scatter_add_", reduce_add); - } + cpu_scatter_gather_base_kernel<>()( + self, dim, index, src, + "scatter_add_", reduce_add); } void scatter_reduce_cpu_kernel(const Tensor& self, const int64_t dim, const Tensor& index, @@ -816,4 +917,9 @@ REGISTER_DISPATCH(scatter_reduce_stub, &scatter_reduce_cpu_kernel); REGISTER_DISPATCH(scatter_scalar_reduce_stub, &scatter_scalar_reduce_cpu_kernel); REGISTER_DISPATCH(scatter_reduce_two_stub, &scatter_reduce_two_cpu_kernel); +// fast paths for GNN usage +REGISTER_DISPATCH(scatter_add_expanded_index_stub, &scatter_add_expanded_index_kernel); +REGISTER_DISPATCH(scatter_reduce_expanded_index_stub, &scatter_reduce_expanded_index_kernel); +REGISTER_DISPATCH(gather_expanded_index_stub, &gather_expanded_index_kernel); + }} // namespace at::native diff --git a/aten/src/ATen/native/cpu/SortingKernel.cpp b/aten/src/ATen/native/cpu/SortingKernel.cpp index fdbecbb65cdf..66c9c3b68c8a 100644 --- a/aten/src/ATen/native/cpu/SortingKernel.cpp +++ b/aten/src/ATen/native/cpu/SortingKernel.cpp @@ -45,8 +45,7 @@ void _dim_apply( return; } - for (const auto i : c10::irange(n)) { - (void)i; //Suppress unused variable warning + for (const auto i C10_UNUSED : c10::irange(n)) { f( reinterpret_cast(values_data_bytes), values_dim_stride, diff --git a/aten/src/ATen/native/cpu/SparseFactories.cpp b/aten/src/ATen/native/cpu/SparseFactories.cpp 
index 0b0f73e1844c..1fb33c7e3713 100644 --- a/aten/src/ATen/native/cpu/SparseFactories.cpp +++ b/aten/src/ATen/native/cpu/SparseFactories.cpp @@ -1,35 +1,25 @@ +#define TORCH_ASSERT_NO_OPERATORS +#include + #include -#include -#include -#include #include -#include -#include +#include #include -#include -#include -#include +#include #include -#ifndef AT_PER_OPERATOR_HEADERS -#include -#include -#else -#include -#endif - namespace at { namespace native { -using namespace at::sparse; namespace { void _spdiags_kernel_cpu( TensorIterator& iter, - const Tensor& diagonals, - Tensor& values, - Tensor& indices) { - auto* row_index_write_ptr = indices[0].data_ptr(); - auto* col_index_write_ptr = indices[1].data_ptr(); + const TensorBase& diagonals, + TensorBase& values, + TensorBase& indices) { + auto* row_index_write_ptr = indices.data_ptr(); + auto* col_index_write_ptr = row_index_write_ptr + indices.stride(0); + const int64_t diagonals_index_stride = diagonals.stride(0); const int64_t diagonals_read_stride = diagonals.stride(1); AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( at::ScalarType::BFloat16, @@ -39,7 +29,9 @@ void _spdiags_kernel_cpu( diagonals.scalar_type(), "spdiags_cpu", [&] { - auto* values_write_ptr = values.data_ptr(); + auto* const values_write_ptr = values.data_ptr(); + const auto* const diagonals_ptr = diagonals.data_ptr(); + cpu_kernel( iter, [&](int64_t diag_index, @@ -52,8 +44,9 @@ void _spdiags_kernel_cpu( auto* vals_start = values_write_ptr + out_offset; const int64_t first_col = std::max(diag_offset, 0); const int64_t first_row = first_col - diag_offset; - auto* data_read = diagonals[diag_index].data_ptr() + - first_col * diagonals_read_stride; + auto* data_read = (diagonals_ptr + + diagonals_index_stride * diag_index + + first_col * diagonals_read_stride); for (int64_t i = 0; i < n_out; ++i) { rows_start[i] = first_row + i; cols_start[i] = first_col + i; diff --git a/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp b/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp index 74854855ff83..cba47abcd3e4 100644 --- a/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp +++ b/aten/src/ATen/native/cpu/SpmmReduceKernel.cpp @@ -1,150 +1,601 @@ #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include - +#include #include -#include #include #include #include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + namespace at { namespace native { namespace { -template -void spmm_sum_kernel_impl( - const Tensor& result, - const Tensor& rowptr, - const Tensor& col, - const c10::optional& optional_value, - const Tensor& mat) { - - scalar_t* result_data = result.data_ptr(); - int64_t* rowptr_data = rowptr.data_ptr(); - int64_t* col_data = col.data_ptr(); - scalar_t* value_data = has_optional_value ? 
optional_value.value().data_ptr() : nullptr; - scalar_t* mat_data = mat.data_ptr(); - - int64_t M = rowptr.numel() - 1; - int64_t N = mat.size(-2); - int64_t K = mat.size(-1); - int64_t B = mat.numel() / (N * K); - - // directly parallel on `B * M` may lead to load imbalance, +template +struct Reducer { + static inline void init(scalar_t* ptr, int64_t size) { + using acc_t = vec::vec_scalar_t; + using Vec = vec::Vectorized; + + acc_t val; + if (reduce == SPMM_MAX) { + val = std::numeric_limits::lowest(); + } else if (reduce == SPMM_MIN) { + val = std::numeric_limits::max(); + } else { + return; + } + + vec::map( + [val](Vec x) { return Vec(val); }, + ptr, + ptr, + size); + } + + static inline void update(scalar_t& out, const scalar_t data) { + if (reduce == SPMM_SUM || reduce == SPMM_MEAN) { + out += data; + } else if (reduce == SPMM_MAX) { + out = std::max(out, data); + } else { + out = std::min(out, data); + } + } + + static inline void update( + vec::Vectorized& out_vec, + const vec::Vectorized& data_vec) { + if (reduce == SPMM_SUM || reduce == SPMM_MEAN) { + out_vec += data_vec; + } else if (reduce == SPMM_MAX) { + out_vec = vec::maximum(out_vec, data_vec); + } else { + out_vec = vec::minimum(out_vec, data_vec); + } + } +}; + +template +void spmm_reduce_kernel_impl( + const Tensor& out, + const Tensor& crow_indices_, + const Tensor& col_indices_, + const Tensor& values_, + const Tensor& weight_) { + + int64_t nnz = values_.numel(); + if (nnz == 0) { + return; + } + + auto crow_indices = crow_indices_.contiguous(); + auto col_indices = col_indices_.contiguous(); + auto values = values_.contiguous(); + auto weight = weight_.contiguous(); + + scalar_t* out_data = out.data_ptr(); + index_t* csr_data = crow_indices.data_ptr(); + index_t* col_data = col_indices.data_ptr(); + scalar_t* val_data = values.data_ptr(); + scalar_t* weight_data = weight.data_ptr(); + + int64_t M = crow_indices.numel() - 1; + int64_t K = weight.size(-1); + + // directly parallel on `M` may lead to load imbalance, // statically determine thread partition here to average payload // for each thread. int num_threads = at::get_num_threads(); - std::vector thread_splits(num_threads + 1, B * M); - int64_t thread_averge_payload = (rowptr_data[M] - rowptr_data[0]) / num_threads; + std::vector thread_splits(num_threads + 1, M); + + int64_t thread_averge_payload = std::max((int64_t)1, divup(nnz, num_threads)); thread_splits[0] = 0; int64_t sum = 0; int64_t t = 1; for (const auto m : c10::irange(M)) { - int64_t row_start = rowptr_data[m]; - int64_t row_end = rowptr_data[m + 1]; + int64_t row_start = csr_data[m]; + int64_t row_end = csr_data[m + 1]; sum += row_end - row_start; if (sum > t * thread_averge_payload) { - thread_splits[t] = B * m; + thread_splits[t] = m; t++; } } // need to restore the last index, // due to rounding error when calculating `thread_averge_payload`. 
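For reference, the payload-balanced thread partitioning used above can be sketched in isolation as follows (hypothetical standalone helper, not code from this patch): rows are split so that each thread receives roughly nnz / num_threads stored non-zeros rather than M / num_threads rows, which evens out the work when row lengths vary widely.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical sketch of the static row partitioning: given CSR row
    // pointers crow[0..M], return num_threads + 1 split points such that
    // thread t processes rows [splits[t], splits[t + 1]).
    std::vector<int64_t> split_rows_by_payload(const int64_t* crow, int64_t M, int num_threads) {
      std::vector<int64_t> splits(num_threads + 1, M);
      const int64_t nnz = crow[M] - crow[0];
      // average payload per thread, rounded up (ceil division)
      const int64_t target = std::max<int64_t>(1, (nnz + num_threads - 1) / num_threads);
      splits[0] = 0;
      int64_t seen = 0;
      int t = 1;
      for (int64_t m = 0; m < M; ++m) {
        seen += crow[m + 1] - crow[m];
        if (t < num_threads && seen > t * target) {
          splits[t++] = m;
        }
      }
      // restore the last boundary; rounding in `target` can leave it unset
      splits[num_threads] = M;
      return splits;
    }
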
- thread_splits[num_threads] = B * M; + thread_splits[num_threads] = M; - // TODO: add bfloat16 support here using Vec = vec::Vectorized; at::parallel_for(0, num_threads, 1, [&](int64_t cbegin, int64_t cend) { int tid = at::get_thread_num(); int64_t begin = thread_splits[tid]; int64_t end = thread_splits[tid + 1]; - int64_t row_start, row_end, b, m, c; - for (const auto i : c10::irange(begin, end)) { - b = i / M; - m = i % M; - row_start = rowptr_data[m]; - row_end = rowptr_data[m + 1]; + int64_t row_start, row_end, c; + for (const auto m : c10::irange(begin, end)) { + row_start = csr_data[m]; + row_end = csr_data[m + 1]; - scalar_t* result_ptr = result_data + i * K; + scalar_t* out_ptr = out_data + m * K; constexpr int64_t kVecSize = Vec::size(); constexpr int64_t kVLEN = kVecSize * 4; constexpr int64_t CHUNK_SIZE = 16; - // init the output lane - vec::map([](Vec x) { return Vec(0); }, result_ptr, result_ptr, K); + // reinit the output row for reduce type 'max' and 'min' + int64_t count = row_end - row_start; + if (count != 0) { + Reducer::init(out_ptr, K); + } // blocking on rowwise to reduce write memory bandwidth for (int64_t e0 = row_start; e0 < row_end; e0 += CHUNK_SIZE) { int64_t e1 = std::min(e0 + CHUNK_SIZE, row_end); - // unrolling by 4 int64_t k = 0; for (; k < K - (K % kVLEN); k += kVLEN) { - Vec out_vec0 = Vec::loadu(result_ptr + k); - Vec out_vec1 = Vec::loadu(result_ptr + k + kVecSize); - Vec out_vec2 = Vec::loadu(result_ptr + k + kVecSize * 2); - Vec out_vec3 = Vec::loadu(result_ptr + k + kVecSize * 3); + Vec out_vec0 = Vec::loadu(out_ptr + k); + Vec out_vec1 = Vec::loadu(out_ptr + k + kVecSize); + Vec out_vec2 = Vec::loadu(out_ptr + k + kVecSize * 2); + Vec out_vec3 = Vec::loadu(out_ptr + k + kVecSize * 3); for (const auto e : c10::irange(e0, e1)) { c = col_data[e]; - scalar_t val = has_optional_value ? value_data[e] : scalar_t(1); - scalar_t* mat_ptr = mat_data + b * N * K + c * K + k; + scalar_t val = val_data[e]; + scalar_t* weight_ptr = weight_data + c * K + k; - out_vec0 += Vec::loadu(mat_ptr) * Vec(val); - out_vec1 += Vec::loadu(mat_ptr + kVecSize) * Vec(val); - out_vec2 += Vec::loadu(mat_ptr + kVecSize * 2) * Vec(val); - out_vec3 += Vec::loadu(mat_ptr + kVecSize * 3) * Vec(val); + Reducer::update(out_vec0, Vec::loadu(weight_ptr) * Vec(val)); + Reducer::update(out_vec1, Vec::loadu(weight_ptr + kVecSize) * Vec(val)); + Reducer::update(out_vec2, Vec::loadu(weight_ptr + kVecSize * 2) * Vec(val)); + Reducer::update(out_vec3, Vec::loadu(weight_ptr + kVecSize * 3) * Vec(val)); } - out_vec0.store(result_ptr + k); - out_vec1.store(result_ptr + k + kVecSize); - out_vec2.store(result_ptr + k + kVecSize * 2); - out_vec3.store(result_ptr + k + kVecSize * 3); + out_vec0.store(out_ptr + k); + out_vec1.store(out_ptr + k + kVecSize); + out_vec2.store(out_ptr + k + kVecSize * 2); + out_vec3.store(out_ptr + k + kVecSize * 3); } for (; k < K - (K % Vec::size()); k += Vec::size()) { - Vec out_vec = Vec::loadu(result_ptr + k); + Vec out_vec = Vec::loadu(out_ptr + k); for (const auto e : c10::irange(e0, e1)) { c = col_data[e]; - scalar_t val = has_optional_value ? 
value_data[e] : scalar_t(1); - scalar_t* mat_ptr = mat_data + b * N * K + c * K; - out_vec += Vec::loadu(mat_ptr + k) * Vec(val); + scalar_t val = val_data[e]; + scalar_t* weight_ptr = weight_data + c * K; + Reducer::update(out_vec, Vec::loadu(weight_ptr + k) * Vec(val)); } - out_vec.store(result_ptr + k); + out_vec.store(out_ptr + k); } for (; k < K; k++) { - scalar_t out_val = result_ptr[k]; + scalar_t out_val = out_ptr[k]; for (const auto e : c10::irange(e0, e1)) { c = col_data[e]; - scalar_t val = has_optional_value ? value_data[e] : scalar_t(1); - scalar_t* mat_ptr = mat_data + b * N * K + c * K; - out_val += mat_ptr[k] * val; + scalar_t val = val_data[e]; + scalar_t* weight_ptr = weight_data + c * K; + Reducer::update(out_val, weight_ptr[k] * val); } - result_ptr[k] = out_val; + out_ptr[k] = out_val; + } + } + + if (reduce == SPMM_MEAN && count != 0) { + int64_t k = 0; + for (; k < K - (K % Vec::size()); k += Vec::size()) { + Vec out_vec = Vec::loadu(out_ptr + k); + out_vec /= Vec(count); + out_vec.store(out_ptr + k); + } + for (; k < K; k++) { + out_ptr[k] /= count; } } } }); } -void spmm_sum_kernel( - const Tensor& result, - const Tensor& rowptr, - const Tensor& col, - const c10::optional& optional_value, - const Tensor& mat) { - AT_DISPATCH_FLOATING_TYPES(result.scalar_type(), "spmm_sum_kernel", [&]() { - if (optional_value.has_value()) { - spmm_sum_kernel_impl(result, rowptr, col, optional_value, mat); - } else { - spmm_sum_kernel_impl(result, rowptr, col, optional_value, mat); +template +inline void update(scalar_t *val, scalar_t new_val, index_t *arg, index_t new_arg) { + if ((reduce == SPMM_MIN && new_val < *val) || + (reduce == SPMM_MAX && new_val > *val)) { + *val = new_val; + *arg = new_arg; + } +} + +template +void spmm_reduce_arg_kernel_impl( + const Tensor& out, + const Tensor& arg_out, + const Tensor& crow_indices_, + const Tensor& col_indices_, + const Tensor& values_, + const Tensor& weight_) { + + TORCH_CHECK(reduce == SPMM_MAX || reduce == SPMM_MIN); + int64_t nnz = values_.numel(); + if (nnz == 0) { + return; + } + + auto crow_indices = crow_indices_.contiguous(); + auto col_indices = col_indices_.contiguous(); + auto values = values_.contiguous(); + auto weight = weight_.contiguous(); + + scalar_t* out_data = out.data_ptr(); + index_t* arg_out_data = arg_out.data_ptr(); + index_t* csr_data = crow_indices.data_ptr(); + index_t* col_data = col_indices.data_ptr(); + scalar_t* val_data = values.data_ptr(); + scalar_t* weight_data = weight.data_ptr(); + + int64_t M = crow_indices.numel() - 1; + int64_t K = weight.size(-1); + + at::parallel_for(0, M, 1, [&](int64_t begin, int64_t end) { + int64_t row_start, row_end, c; + for (const auto m : c10::irange(begin, end)) { + row_start = csr_data[m]; + row_end = csr_data[m + 1]; + + scalar_t* out_ptr = out_data + m * K; + index_t* arg_out_ptr = arg_out_data + m * K; + + int64_t count = row_end - row_start; + if (count != 0) { + Reducer::init(out_ptr, K); + for (const auto e : c10::irange(row_start, row_end)) { + c = col_data[e]; + scalar_t val = val_data[e]; + + scalar_t* weight_ptr = weight_data + c * K; + for (const auto k : c10::irange(K)) { + update( + &out_ptr[k], val * weight_ptr[k], &arg_out_ptr[k], index_t(e)); + }; + } + } + } + }); +} + +template +void spmm_reduce_backward_input_kernel_impl( + const Tensor& grad_input, + const Tensor& grad_out_, + const Tensor& crow_indices_, + const Tensor& col_indices_, + const Tensor& weight_, + const Tensor& row_indices_) { + + int64_t nnz = grad_input._nnz(); + if (nnz == 0) { + 
return; + } + + auto grad_out = grad_out_.contiguous(); + auto crow_indices = crow_indices_.contiguous(); + auto col_indices = col_indices_.contiguous(); + auto weight = weight_.contiguous(); + auto row_indices = row_indices_.contiguous(); + + scalar_t* grad_values_data = grad_input.values().data_ptr(); + scalar_t* grad_out_data = grad_out.data_ptr(); + index_t* crow_data = crow_indices.data_ptr(); + index_t* col_data = col_indices.data_ptr(); + scalar_t* weight_data = weight.data_ptr(); + index_t* row_data = row_indices.data_ptr(); + + int64_t K = grad_out.size(1); + + using Vec = vec::Vectorized>; + at::parallel_for(0, nnz, 1, [&](int64_t begin, int64_t end) { + for (const auto i : c10::irange(begin, end)) { + index_t row = row_data[i], col = col_data[i]; + + scalar_t val = vec::map2_reduce_all( + [](Vec x, Vec y) { return x * y; }, + [](Vec x, Vec y) { return x + y; }, + weight_data + col * K, + grad_out_data + row * K, + K); + + if (reduce == SPMM_MEAN) { + index_t row_start = crow_data[row], row_end = crow_data[row + 1]; + val /= std::max((index_t)1, row_end - row_start); + } + + grad_values_data[i] = val; + } + }); +} + +// backward for reduce type 'max' or 'min' +template +void spmm_reduce_backward_input_arg_kernel_impl( + const Tensor& grad_input, + const Tensor& grad_out_, + const Tensor& col_indices_, + const Tensor& weight_, + const Tensor& arg_out_) { + + int64_t nnz = grad_input._nnz(); + if (nnz == 0) { + return; + } + + auto grad_out = grad_out_.contiguous(); + auto col_indices = col_indices_.contiguous(); + auto weight = weight_.contiguous(); + auto arg_out = arg_out_.contiguous(); + + scalar_t* grad_values_data = grad_input.values().data_ptr(); + scalar_t* grad_out_data = grad_out.data_ptr(); + index_t* col_data = col_indices.data_ptr(); + scalar_t* weight_data = weight.data_ptr(); + index_t* arg_out_data = arg_out.data_ptr(); + + int64_t M = grad_out.size(0); + int64_t K = grad_out.size(1); + auto grad = at::empty({M, K}, grad_out.options()); + scalar_t* grad_data = grad.data_ptr(); + + at::parallel_for(0, M, 1, [&](int64_t begin, int64_t end) { + for (const auto m : c10::irange(begin, end)) { + scalar_t* grad_out_ptr = grad_out_data + m * K; + scalar_t* grad_ptr = grad_data + m * K; + index_t* arg_out_ptr = arg_out_data + m * K; + + for (const auto k : c10::irange(K)) { + if (arg_out_ptr[k] == index_t(nnz)) { + grad_ptr[k] = scalar_t(0); + } else { + // collect weight at max/min indices + index_t col = col_data[arg_out_data[m * K + k]]; + grad_ptr[k] = weight_data[col * K + k] * grad_out_ptr[k]; + } + } + } + }); + + // scatter_add, consider to parallel this with atomic + for (const auto i : c10::irange(M * K)) { + index_t ind = arg_out_data[i]; + if (ind != index_t(nnz)) { + grad_values_data[ind] += grad_data[i]; } + } +} + +template +void spmm_reduce_update_values_kernel_impl( + const Tensor& updated_values, + const Tensor& values_, + const Tensor& crow_indices_, + const Tensor& row_indices_) { + + int64_t nnz = values_.numel(); + if (nnz == 0) { + return; + } + + auto values = values_.contiguous(); + auto crow_indices = crow_indices_.contiguous(); + auto row_indices = row_indices_.contiguous(); + + scalar_t* updated_values_data = updated_values.data_ptr(); + scalar_t* values_data = values.data_ptr(); + index_t* crow_data = crow_indices.data_ptr(); + index_t* row_data = row_indices.data_ptr(); + + at::parallel_for(0, nnz, 1, [&](int64_t begin, int64_t end) { + for (const auto i : c10::irange(begin, end)) { + index_t row = row_data[i]; + index_t row_start = 
crow_data[row], row_end = crow_data[row + 1]; + updated_values_data[i] = values_data[i] / std::max((index_t)1, row_end - row_start); + } + }); +} + +template +void spmm_reduce_backward_weight_arg_kernel_impl( + const Tensor& grad_weight, + const Tensor& grad_out_, + const Tensor& col_indices_, + const Tensor& values_, + const Tensor& arg_out_) { + + int64_t nnz = values_.numel(); + if (nnz == 0) { + return; + } + + auto grad_out = grad_out_.contiguous(); + auto col_indices = col_indices_.contiguous(); + auto values = values_.contiguous(); + auto arg_out = arg_out_.contiguous(); + + scalar_t* grad_weight_data = grad_weight.data_ptr(); + scalar_t* grad_out_data = grad_out.data_ptr(); + index_t* col_data = col_indices.data_ptr(); + scalar_t* values_data = values.data_ptr(); + index_t* arg_out_data = arg_out.data_ptr(); + + int64_t M = grad_out.size(0); + int64_t K = grad_out.size(1); + auto grad = at::empty({M, K}, grad_out.options()); + scalar_t* grad_data = grad.data_ptr(); + + at::parallel_for(0, M, 1, [&](int64_t begin, int64_t end) { + for (const auto m : c10::irange(begin, end)) { + scalar_t* grad_out_ptr = grad_out_data + m * K; + scalar_t* grad_ptr = grad_data + m * K; + index_t* arg_out_ptr = arg_out_data + m * K; + + for (const auto k : c10::irange(K)) { + if (arg_out_ptr[k] == index_t(nnz)) { + grad_ptr[k] = scalar_t(0); + } else { + grad_ptr[k] = values_data[arg_out_ptr[k]] * grad_out_ptr[k]; + } + } + } + }); + + // scatter_add, consider to parallel this with atomic + for (const auto m : c10::irange(M)) { + for (const auto k : c10::irange(K)) { + index_t ind = arg_out_data[m * K + k]; + if (ind != index_t(nnz)) { + index_t col = col_data[ind]; + grad_weight_data[col * K + k] += grad_data[m * K + k]; + } + } + } +} + +void spmm_reduce_kernel( + const Tensor& out, + const Tensor& crow_indices, + const Tensor& col_indices, + const Tensor& values, + const Tensor& weight, + SPMM_REDUCE_OP reduce_op) { + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, values.scalar_type(), "spmm_reduce_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "spmm_reduce_indices", [&]() { + AT_DISPATCH_REDUCTION_TYPES(reduce_op, [&]() { + spmm_reduce_kernel_impl( + out, crow_indices, col_indices, values, weight); + }); + }); + }); +} + +void spmm_reduce_arg_kernel( + const Tensor& out, + const Tensor& arg_out, + const Tensor& crow_indices, + const Tensor& col_indices, + const Tensor& values, + const Tensor& weight, + SPMM_REDUCE_OP reduce_op) { + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, values.scalar_type(), "spmm_reduce_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "spmm_reduce_indices", [&]() { + AT_DISPATCH_REDUCTION_TYPES(reduce_op, [&]() { + spmm_reduce_arg_kernel_impl( + out, arg_out, crow_indices, col_indices, values, weight); + }); + }); + }); +} + +void spmm_reduce_backward_input_kernel( + const Tensor& grad_input, + const Tensor& grad_out, + const Tensor& crow_indices, + const Tensor& col_indices, + const Tensor& weight, + const Tensor& row_indices, + SPMM_REDUCE_OP reduce_op) { + TORCH_CHECK(reduce_op == SPMM_SUM || reduce_op == SPMM_MEAN); + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, weight.scalar_type(), "spmm_reduce_backward_input_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "spmm_reduce_backward_input_indices", [&]() { + AT_DISPATCH_REDUCTION_TYPES(reduce_op, [&]() { + spmm_reduce_backward_input_kernel_impl( + grad_input, grad_out, crow_indices, col_indices, weight, row_indices); + }); + }); + }); +} 
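For reference, the reduction semantics implemented by the forward spmm_reduce kernels above (and differentiated by the backward kernels) can be sketched as a small standalone function. This is an illustrative sketch only: csr_spmm_reduce_reference is a made-up name, and the empty-row behaviour shown here is an assumption, not taken from the patch.

// Minimal reference sketch (not part of this patch).
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

enum class Reduce { Sum, Mean, Max, Min };

// (crow, col, val) describe an M x N CSR matrix, weight is N x K row-major,
// out is M x K row-major: out[m][k] = reduce over nonzeros e of row m of
// val[e] * weight[col[e]][k].
void csr_spmm_reduce_reference(
    std::vector<float>& out, Reduce op,
    const std::vector<int64_t>& crow, const std::vector<int64_t>& col,
    const std::vector<float>& val, const std::vector<float>& weight,
    int64_t M, int64_t K) {
  for (int64_t m = 0; m < M; ++m) {
    const int64_t start = crow[m], end = crow[m + 1];
    for (int64_t k = 0; k < K; ++k) {
      // Identity element for the chosen reduction.
      float acc = (op == Reduce::Max) ? -std::numeric_limits<float>::infinity()
                : (op == Reduce::Min) ? std::numeric_limits<float>::infinity()
                                      : 0.f;
      for (int64_t e = start; e < end; ++e) {
        const float v = val[e] * weight[col[e] * K + k];
        acc = (op == Reduce::Max) ? std::max(acc, v)
            : (op == Reduce::Min) ? std::min(acc, v)
                                  : acc + v;
      }
      if (op == Reduce::Mean && end > start) acc /= float(end - start);
      if (end == start) acc = 0.f;  // assumption: empty rows yield 0
      out[m * K + k] = acc;
    }
  }
}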
+ +void spmm_reduce_backward_input_arg_kernel( + const Tensor& grad_input, + const Tensor& grad_out, + const Tensor& col_indices, + const Tensor& weight, + const Tensor& arg_out, + SPMM_REDUCE_OP reduce_op) { + TORCH_CHECK(reduce_op == SPMM_MAX || reduce_op == SPMM_MIN); + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, weight.scalar_type(), "spmm_reduce_backward_input_arg_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "spmm_reduce_backward_input_arg_indices", [&]() { + spmm_reduce_backward_input_arg_kernel_impl( + grad_input, grad_out, col_indices, weight, arg_out); + }); + }); +} + +void spmm_reduce_update_values_kernel( + const Tensor& updated_values, + const Tensor& values, + const Tensor& crow_indices, + const Tensor& row_indices) { + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, values.scalar_type(), "spmm_reduce_update_values_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "spmm_reduce_update_values_indices", [&]() { + spmm_reduce_update_values_kernel_impl( + updated_values, values, crow_indices, row_indices); + }); + }); +} + +void spmm_reduce_backward_weight_kernel( + const Tensor& grad_weight, + const Tensor& grad_out, + const Tensor& crow_indices, + const Tensor& values, + const Tensor& row_indices, + const Tensor& ccol_indices, + const Tensor& csr2csc, + SPMM_REDUCE_OP reduce_op) { + TORCH_CHECK(reduce_op == SPMM_SUM || reduce_op == SPMM_MEAN); + // need to permute row_indices to CSC order + auto row = row_indices.index_select(0, csr2csc); + + Tensor val; + if (reduce_op == SPMM_MEAN) { + // for reduce type "mean", need to update the values + // with rowcount for each of the nonzero element. + Tensor updated_values = at::empty(values.sizes(), values.options()); + spmm_reduce_update_values_kernel(updated_values, values, crow_indices, row_indices); + val = updated_values.index_select(0, csr2csc); + } else { + val = values.index_select(0, csr2csc); + } + + if (reduce_op == SPMM_SUM || reduce_op == SPMM_MEAN) { + spmm_reduce_kernel(grad_weight, ccol_indices, row, val, grad_out, SPMM_SUM); + } +} + +void spmm_reduce_backward_weight_arg_kernel( + const Tensor& grad_weight, + const Tensor& grad_out, + const Tensor& col_indices, + const Tensor& values, + const Tensor& arg_out, + SPMM_REDUCE_OP reduce_op) { + TORCH_CHECK(reduce_op == SPMM_MAX || reduce_op == SPMM_MIN); + AT_DISPATCH_FLOATING_TYPES_AND(ScalarType::BFloat16, values.scalar_type(), "spmm_reduce_backward_weight_arg_kernel", [&]() { + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "spmm_reduce_backward_weight_arg_indices", [&]() { + spmm_reduce_backward_weight_arg_kernel_impl( + grad_weight, grad_out, col_indices, values, arg_out); + }); }); } } // anonymous namespace -REGISTER_DISPATCH(spmm_sum_stub, &spmm_sum_kernel); +REGISTER_DISPATCH(spmm_reduce_stub, &spmm_reduce_kernel); +REGISTER_DISPATCH(spmm_reduce_arg_stub, &spmm_reduce_arg_kernel); +REGISTER_DISPATCH(spmm_reduce_backward_input_stub, &spmm_reduce_backward_input_kernel); +REGISTER_DISPATCH(spmm_reduce_backward_input_arg_stub, &spmm_reduce_backward_input_arg_kernel); +REGISTER_DISPATCH(spmm_reduce_backward_weight_stub, &spmm_reduce_backward_weight_kernel); +REGISTER_DISPATCH(spmm_reduce_backward_weight_arg_stub, &spmm_reduce_backward_weight_arg_kernel); }} // at::native diff --git a/aten/src/ATen/native/cpu/SpmmReduceKernel.h b/aten/src/ATen/native/cpu/SpmmReduceKernel.h new file mode 100644 index 000000000000..cbd26cfbf4ba --- /dev/null +++ b/aten/src/ATen/native/cpu/SpmmReduceKernel.h @@ -0,0 +1,45 @@ 
+#pragma once + +#include +#include + +namespace at { namespace native { + +enum SPMM_REDUCE_OP {SPMM_SUM, SPMM_MAX, SPMM_MIN, SPMM_MEAN}; + +using spmm_reduce_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, SPMM_REDUCE_OP op); +using spmm_reduce_arg_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, SPMM_REDUCE_OP op); +using spmm_reduce_backward_input_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, SPMM_REDUCE_OP op); +using spmm_reduce_backward_input_arg_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, SPMM_REDUCE_OP op); +using spmm_reduce_backward_weight_fn = void(*)(const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, SPMM_REDUCE_OP op); + +DECLARE_DISPATCH(spmm_reduce_fn, spmm_reduce_stub); +DECLARE_DISPATCH(spmm_reduce_arg_fn, spmm_reduce_arg_stub); +DECLARE_DISPATCH(spmm_reduce_backward_input_fn, spmm_reduce_backward_input_stub); +DECLARE_DISPATCH(spmm_reduce_backward_input_arg_fn, spmm_reduce_backward_input_arg_stub); +DECLARE_DISPATCH(spmm_reduce_backward_weight_fn, spmm_reduce_backward_weight_stub); +DECLARE_DISPATCH(spmm_reduce_backward_input_arg_fn, spmm_reduce_backward_weight_arg_stub); + +#define AT_DISPATCH_REDUCTION_TYPES(op, ...) \ + [&] { \ + switch (op) { \ + case SPMM_SUM: { \ + static constexpr SPMM_REDUCE_OP reduce = SPMM_SUM; \ + return __VA_ARGS__(); \ + } \ + case SPMM_MEAN: { \ + static constexpr SPMM_REDUCE_OP reduce = SPMM_MEAN; \ + return __VA_ARGS__(); \ + } \ + case SPMM_MIN: { \ + static constexpr SPMM_REDUCE_OP reduce = SPMM_MIN; \ + return __VA_ARGS__(); \ + } \ + case SPMM_MAX: { \ + static constexpr SPMM_REDUCE_OP reduce = SPMM_MAX; \ + return __VA_ARGS__(); \ + } \ + } \ + }() + +}} // at::native diff --git a/aten/src/ATen/native/cpu/TensorCompareKernel.cpp b/aten/src/ATen/native/cpu/TensorCompareKernel.cpp index 903fef2f0331..1547249b7018 100644 --- a/aten/src/ATen/native/cpu/TensorCompareKernel.cpp +++ b/aten/src/ATen/native/cpu/TensorCompareKernel.cpp @@ -83,8 +83,7 @@ static inline void compare_base_kernel(const Tensor& result1, const Tensor& resu auto* result1_data_bytes = data[0]; auto* result2_data_bytes = data[1]; const auto* self_data_bytes = data[2]; - for (const auto i : c10::irange(n)) { - (void)i; //Suppress unused variable warning + for (const auto i C10_UNUSED : c10::irange(n)) { f((scalar_t*)result1_data_bytes, (scalar_t_2*)result2_data_bytes, (scalar_t*)self_data_bytes, @@ -245,8 +244,7 @@ static void mode_kernel_impl( std::vector> elements(self_dim_size); - for (const auto k : c10::irange(n)) { - (void)k; //Suppress unused variable warning + for (const auto k C10_UNUSED : c10::irange(n)) { scalar_t* values_data = (scalar_t*)values_data_bytes; int64_t* indices_data = (int64_t*)indices_data_bytes; const scalar_t* self_data = (scalar_t*)self_data_bytes; diff --git a/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp b/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp index a53587e56da4..8a0534fd3da5 100644 --- a/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp +++ b/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -203,13 +204,18 @@ static void angle_kernel(TensorIteratorBase& iter) { // NB: Ignores the negative bit on tensors void conj_kernel(TensorIteratorBase& iter) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( - kBool, kBFloat16, kHalf, 
kComplexHalf, iter.common_dtype(), "conj_cpu", [&]() { - cpu_kernel_vec( - iter, - [=](scalar_t a) -> scalar_t { return conj_impl(a); }, - [=](Vectorized a) { return a.conj(); }); - }); + AT_DISPATCH_SWITCH(iter.common_dtype(), "conj_cpu", + AT_DISPATCH_CASE_ALL_TYPES_AND3(kBool, kBFloat16, kHalf, [&] { + // conj is a no-op for non-complex types + direct_copy_kernel(iter); + }) + AT_DISPATCH_CASE_COMPLEX_TYPES_AND(kComplexHalf, [&] { + cpu_kernel_vec( + iter, + [=](scalar_t a) -> scalar_t { return conj_impl(a); }, + [=](Vectorized a) { return a.conj(); }); + }) + ); } static void bitwise_not_kernel(TensorIteratorBase& iter) { diff --git a/aten/src/ATen/native/cpu/Unfold2d.cpp b/aten/src/ATen/native/cpu/Unfold2d.cpp index 9bfa9ac8c6ab..fae56c7ebc2b 100644 --- a/aten/src/ATen/native/cpu/Unfold2d.cpp +++ b/aten/src/ATen/native/cpu/Unfold2d.cpp @@ -354,8 +354,7 @@ static void unfolded2d_copy_channels_last( int64_t x = 0; data_index_init(start, y, output_height, x, output_width); - for (const auto k : c10::irange(start, end)) { - (void)k; // Suppress unused variable warning + for (const auto k C10_UNUSED: c10::irange(start, end)) { scalar_t* dst = finput_data + y * output_width * kH * kW * n_input_plane + x * kH * kW * n_input_plane; scalar_t* src = input_data; diff --git a/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp b/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp index 8cfe6674906e..aa5dfb014380 100644 --- a/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp +++ b/aten/src/ATen/native/cpu/UnfoldBackwardKernel.cpp @@ -1,5 +1,6 @@ #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #include #include #include @@ -65,8 +66,7 @@ void _unfold_backward_internal_kernel( int64_t grad_in_dim_stride, int64_t grad_in_last_dim_stride, int64_t grad_in_dim_size, - int64_t grad_out_dim_stride, - bool is_step_ge_size + int64_t grad_out_dim_stride ) { if (iter.numel() == 0) { return; @@ -77,55 +77,32 @@ void _unfold_backward_internal_kernel( auto* RESTRICT grad_in_ptr = data[1]; auto* RESTRICT idx_dim_ptr = data[2]; - if (is_step_ge_size) { - auto* RESTRICT idx_last_dim_ptr = data[3]; + for (const auto elem C10_UNUSED : c10::irange(nelems)) { + auto* RESTRICT grad_out_data = reinterpret_cast(grad_out_ptr); + auto* RESTRICT grad_in_data = reinterpret_cast(grad_in_ptr); - for (const auto elem : c10::irange(nelems)) { - (void)elem; //Suppress unused variable warning - auto* RESTRICT grad_out_data = reinterpret_cast(grad_out_ptr); - auto* RESTRICT grad_in_data = reinterpret_cast(grad_in_ptr); + auto idx_dim = *reinterpret_cast(idx_dim_ptr); - auto idx_dim = *reinterpret_cast(idx_dim_ptr); - auto idx_last_dim = *reinterpret_cast(idx_last_dim_ptr); + // left_fold potentially intersecting with idx_dim + // is either (idx_dim - size) / step or the next integer. + int64_t left_fold_idx = (idx_dim > size) ? (idx_dim - size) / step : 0; + if (!(left_fold_idx * step <= idx_dim && idx_dim < left_fold_idx * step + size)) { + ++left_fold_idx; + } - auto grad_out_idx_dim = idx_dim * step + idx_last_dim; - grad_out_data[grad_out_idx_dim * grad_out_dim_stride] = *grad_in_data; + auto right_fold_idx = idx_dim / step; + right_fold_idx = (right_fold_idx >= grad_in_dim_size) + ? 
(grad_in_dim_size - 1) : right_fold_idx; - grad_out_ptr += strides[0]; - grad_in_ptr += strides[1]; - idx_dim_ptr += strides[2]; - idx_last_dim_ptr += strides[3]; - } - } - else { - for (const auto elem : c10::irange(nelems)) { - (void)elem; //Suppress unused variable warning - auto* RESTRICT grad_out_data = reinterpret_cast(grad_out_ptr); - auto* RESTRICT grad_in_data = reinterpret_cast(grad_in_ptr); - - auto idx_dim = *reinterpret_cast(idx_dim_ptr); - - // left_fold potentially intersecting with idx_dim - // is either (idx_dim - size) / step or the next integer. - int64_t left_fold_idx = (idx_dim > size) ? (idx_dim - size) / step : 0; - if (!(left_fold_idx * step <= idx_dim && idx_dim < left_fold_idx * step + size)) { - ++left_fold_idx; - } - - auto right_fold_idx = idx_dim / step; - right_fold_idx = (right_fold_idx >= grad_in_dim_size) - ? (grad_in_dim_size - 1) : right_fold_idx; - - for (auto fold_idx = left_fold_idx; fold_idx <= right_fold_idx; ++fold_idx) { - auto idx_last_dim = idx_dim - fold_idx * step; - *grad_out_data += grad_in_data[fold_idx * grad_in_dim_stride - + idx_last_dim * grad_in_last_dim_stride]; - } - - grad_out_ptr += strides[0]; - grad_in_ptr += strides[1]; - idx_dim_ptr += strides[2]; + for (auto fold_idx = left_fold_idx; fold_idx <= right_fold_idx; ++fold_idx) { + auto idx_last_dim = idx_dim - fold_idx * step; + *grad_out_data += grad_in_data[fold_idx * grad_in_dim_stride + + idx_last_dim * grad_in_last_dim_stride]; } + + grad_out_ptr += strides[0]; + grad_in_ptr += strides[1]; + idx_dim_ptr += strides[2]; } }; @@ -149,16 +126,8 @@ void unfold_backward_cpu_kernel( auto grad_out_dim_stride = ensure_nonempty_stride(grad_out, dim); - auto is_step_ge_size = (step >= size); - - TensorIterator iter = - is_step_ge_size ? - _make_unfold_backward_iter_over_grad_in( - grad_out, grad_in, dim, size, step - ) : - _make_unfold_backward_iter_over_grad_out( - grad_out, grad_in, dim, size, step - ); + TensorIterator iter = _make_unfold_backward_iter_over_grad_out( + grad_out, grad_in, dim, size, step); AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, @@ -171,8 +140,7 @@ void unfold_backward_cpu_kernel( grad_in_dim_stride, grad_in_last_dim_stride, grad_in_dim_size, - grad_out_dim_stride, - is_step_ge_size + grad_out_dim_stride ); } ); diff --git a/aten/src/ATen/native/cpu/UpSampleKernel.cpp b/aten/src/ATen/native/cpu/UpSampleKernel.cpp index cfc931862372..8d418c264504 100644 --- a/aten/src/ATen/native/cpu/UpSampleKernel.cpp +++ b/aten/src/ATen/native/cpu/UpSampleKernel.cpp @@ -471,12 +471,12 @@ void cpu_upsample_linear_channels_last( TORCH_CHECK(channels > 0, "expected input and output channels greater than 0 but got ", channels); int64_t output_slice_size = output_depth * output_height * output_width * channels; - using accscalar_t = at::acc_type; + using opmath_t = at::opmath_type; using Vec = vec::Vectorized; auto loop2d = [&](int64_t begin, int64_t end) { - const scalar_t height_scale = area_pixel_compute_scale( + const auto height_scale = area_pixel_compute_scale( input_height, output_height, align_corners, scales[0]); - const scalar_t width_scale = area_pixel_compute_scale( + const auto width_scale = area_pixel_compute_scale( input_width, output_width, align_corners, scales[1]); auto input_indexr = [=](int64_t n, int64_t h, int64_t w) { @@ -486,7 +486,7 @@ void cpu_upsample_linear_channels_last( // NOLINTNEXTLINE(cppcoreguidelines-init-variables) int64_t ih0, ih1, iw0, iw1; - scalar_t h0lambda, h1lambda, w0lambda, 
w1lambda; + opmath_t h0lambda, h1lambda, w0lambda, w1lambda; for (const auto n : c10::irange(begin, end)) { for (const auto oh : c10::irange(output_height)) { compute_source_index_and_lambda( @@ -501,10 +501,10 @@ void cpu_upsample_linear_channels_last( scalar_t* i01 = input_indexr(n, ih0, iw1); scalar_t* i10 = input_indexr(n, ih1, iw0); scalar_t* i11 = input_indexr(n, ih1, iw1); - accscalar_t w00 = h0lambda * w0lambda; - accscalar_t w01 = h0lambda * w1lambda; - accscalar_t w10 = h1lambda * w0lambda; - accscalar_t w11 = h1lambda * w1lambda; + opmath_t w00 = h0lambda * w0lambda; + opmath_t w01 = h0lambda * w1lambda; + opmath_t w10 = h1lambda * w0lambda; + opmath_t w11 = h1lambda * w1lambda; int64_t size = channels; int64_t d = 0; @@ -521,11 +521,11 @@ void cpu_upsample_linear_channels_last( }; auto loop3d = [&](int64_t begin, int64_t end) { - const scalar_t depth_scale = area_pixel_compute_scale( + const auto depth_scale = area_pixel_compute_scale( input_depth, output_depth, align_corners, scales[0]); - const scalar_t height_scale = area_pixel_compute_scale( + const auto height_scale = area_pixel_compute_scale( input_height, output_height, align_corners, scales[1]); - const scalar_t width_scale = area_pixel_compute_scale( + const auto width_scale = area_pixel_compute_scale( input_width, output_width, align_corners, scales[2]); auto input_indexr = [=](int64_t n, int64_t d, int64_t h, int64_t w) { @@ -536,7 +536,7 @@ void cpu_upsample_linear_channels_last( // NOLINTNEXTLINE(cppcoreguidelines-init-variables) int64_t id0, id1, ih0, ih1, iw0, iw1; - scalar_t d0lambda, d1lambda, h0lambda, h1lambda, w0lambda, w1lambda; + opmath_t d0lambda, d1lambda, h0lambda, h1lambda, w0lambda, w1lambda; for (const auto n : c10::irange(begin, end)) { for (const auto od : c10::irange(output_depth)) { compute_source_index_and_lambda( @@ -559,14 +559,14 @@ void cpu_upsample_linear_channels_last( scalar_t* i101 = input_indexr(n, id1, ih0, iw1); scalar_t* i110 = input_indexr(n, id1, ih1, iw0); scalar_t* i111 = input_indexr(n, id1, ih1, iw1); - accscalar_t w000 = d0lambda * h0lambda * w0lambda; - accscalar_t w001 = d0lambda * h0lambda * w1lambda; - accscalar_t w010 = d0lambda * h1lambda * w0lambda; - accscalar_t w011 = d0lambda * h1lambda * w1lambda; - accscalar_t w100 = d1lambda * h0lambda * w0lambda; - accscalar_t w101 = d1lambda * h0lambda * w1lambda; - accscalar_t w110 = d1lambda * h1lambda * w0lambda; - accscalar_t w111 = d1lambda * h1lambda * w1lambda; + opmath_t w000 = d0lambda * h0lambda * w0lambda; + opmath_t w001 = d0lambda * h0lambda * w1lambda; + opmath_t w010 = d0lambda * h1lambda * w0lambda; + opmath_t w011 = d0lambda * h1lambda * w1lambda; + opmath_t w100 = d1lambda * h0lambda * w0lambda; + opmath_t w101 = d1lambda * h0lambda * w1lambda; + opmath_t w110 = d1lambda * h1lambda * w0lambda; + opmath_t w111 = d1lambda * h1lambda * w1lambda; int64_t size = channels; int64_t d = 0; @@ -613,8 +613,7 @@ struct HelperInterpBase { auto new_shape = std::vector(ndims, 1); new_shape[reshape_dim] = output_size; - for (const auto j : c10::irange(interp_size)) { - (void)j; //Suppress unused variable warning + for (const auto j C10_UNUSED : c10::irange(interp_size)) { output.emplace_back(empty(new_shape, CPU(c10::CppTypeToScalarType()))); output.emplace_back(empty(new_shape, CPU(output_type))); } @@ -735,8 +734,7 @@ struct HelperInterpNearest : public HelperInterpBase { auto new_shape = std::vector(ndims, 1); new_shape[reshape_dim] = output_size; - for (const auto j : c10::irange(interp_size)) { - (void)j; //Suppress 
unused variable warning + for (const auto j C10_UNUSED : c10::irange(interp_size)) { output.emplace_back(empty(new_shape, CPU(c10::CppTypeToScalarType()))); // Defines weights for consistency, but not used output.emplace_back(at::ones(new_shape, CPU(output_type))); @@ -767,7 +765,6 @@ struct HelperInterpNearest : public HelperInterpBase { AT_DISPATCH_FLOATING_TYPES_AND( ScalarType::BFloat16, scalar_type, "compute_indices_weights_nearest", [&] { - scalar_t scale = area_pixel_compute_scale(input_size, output_size, align_corners, opt_scale); auto input_index_ptr = output[0].data_ptr(); @@ -778,10 +775,11 @@ struct HelperInterpNearest : public HelperInterpBase { // index_f32 = (output_index) * scale // input_index = floor(index_f32) // Same as OpenCV INTER_NEAREST - + using opmath_t = at::opmath_type; for (const auto i : c10::irange(output_size)) { - const scalar_t real_input_index = area_pixel_compute_source_index( - scale, i, /*align_corners=*/true, /*cubic=*/false); + const auto real_input_index = + area_pixel_compute_source_index( + scale, i, /*align_corners=*/true, /*cubic=*/false); input_index = static_cast(floorf(real_input_index)); input_index_ptr[i] = static_cast(std::min(input_index, input_size - 1)) * stride; } @@ -818,7 +816,6 @@ struct HelperInterpNearestExact : public HelperInterpNearest { AT_DISPATCH_FLOATING_TYPES( scalar_type, "compute_indices_weights_nearest", [&] { - scalar_t scale = area_pixel_compute_scale(input_size, output_size, align_corners, opt_scale); auto input_index_ptr = output[0].data_ptr(); @@ -829,10 +826,11 @@ struct HelperInterpNearestExact : public HelperInterpNearest { // index_f32 = (output_index + 0.5) * scale - 0.5 // input_index = round(index_f32) // Same as Pillow and Scikit-Image/Scipy ndi.zoom - + using opmath_t = at::opmath_type; for (const auto i : c10::irange(output_size)) { - const scalar_t real_input_index = area_pixel_compute_source_index( - scale, i, /*align_corners=*/align_corners, /*cubic=*/false); + const auto real_input_index = + area_pixel_compute_source_index( + scale, i, /*align_corners=*/align_corners, /*cubic=*/false); input_index = static_cast(floorf(real_input_index + 0.5)); input_index_ptr[i] = static_cast(std::min(input_index, input_size - 1)) * stride; } @@ -865,10 +863,8 @@ struct HelperInterpLinear : public HelperInterpBase { std::vector output; HelperInterpLinear::init_indices_weights( scalar_type, output, output_size, ndims, reshape_dim, HelperInterpLinear::interp_size); - AT_DISPATCH_FLOATING_TYPES_AND( ScalarType::BFloat16, scalar_type, "compute_indices_weights_linear", [&] { - scalar_t scale = area_pixel_compute_scale(input_size, output_size, align_corners, opt_scale); auto input_index0_ptr = output[0].data_ptr(); @@ -970,7 +966,6 @@ struct HelperInterpCubic : public HelperInterpBase { AT_DISPATCH_FLOATING_TYPES_AND( ScalarType::BFloat16, scalar_type, "compute_indices_weights_cubic", [&] { - scalar_t scale = area_pixel_compute_scale(input_size, output_size, align_corners, opt_scale); int64_t input_index; @@ -980,11 +975,11 @@ struct HelperInterpCubic : public HelperInterpBase { int64_t * idx_ptr; scalar_t * wt_ptr; - + using opmath_t = at::opmath_type; for (const auto i : c10::irange(output_size)) { - - const scalar_t real_input_index = area_pixel_compute_source_index( - scale, i, align_corners, /*cubic=*/true); + const auto real_input_index = + area_pixel_compute_source_index( + scale, i, align_corners, /*cubic=*/true); input_index = static_cast(floorf(real_input_index)); get_cubic_upsample_coefficients(coeffs, 
real_input_index - input_index); @@ -1184,7 +1179,6 @@ void _separable_upsample_generic_Nd_kernel_impl_single_dim( int interp_size = F::interp_size; auto input_scalar_type = input.scalar_type(); - if (interp_size == 1 && input_scalar_type == at::ScalarType::Byte) { // nearest also supports uint8 tensor, but we have to use float // with compute_indices_weights @@ -1266,12 +1260,26 @@ void _upsample_nearest_exact1d_kernel_impl( output, input, false, {scales_w}); } +int _use_vectorized_kernel_cond( + const Tensor& output, + const Tensor& input) { + // This condition is used to know whether we should dispatch to a vectorized + // kernel, or to the more general upsample_generic_Nd_kernel_impl(). For now, + // the vectorized kernels are only optimized for channels_last and when C >= 4 + // (shape = NCHW). For a very wide range of use-cases (typically image or mask + // resizing where we have C < 4), using upsample_generic_Nd_kernel_impl() is + // actually faster. On top of that, benchmarks showed that this also depends on + // the *output* size (output_H + output_W), for both upsampling and + // downsampling. The current 128 threshold was determined through benchmarks. + return ((input.is_contiguous(at::MemoryFormat::ChannelsLast)) && (input.size(-3) > 3)) || ((output.size(-2) + output.size(-1)) <= 128); +} + void upsample_nearest2d_kernel_impl( const Tensor& output, const Tensor& input, c10::optional scales_h, c10::optional scales_w) { - if (input.is_contiguous(at::MemoryFormat::ChannelsLast)) { + if (_use_vectorized_kernel_cond(output, input)) { AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Byte, at::ScalarType::BFloat16, input.scalar_type(), "upsample_nearest2d_channels_last", [&] { cpu_upsample_nearest_channels_last(output, input, {scales_h, scales_w}); @@ -1287,7 +1295,7 @@ void _upsample_nearest_exact2d_kernel_impl( const Tensor& input, c10::optional scales_h, c10::optional scales_w) { - if (input.is_contiguous(at::MemoryFormat::ChannelsLast)) { + if (_use_vectorized_kernel_cond(output, input)) { AT_DISPATCH_FLOATING_TYPES_AND(at::ScalarType::Byte, input.scalar_type(), "upsample_nearest2d_channels_last", [&] { cpu_upsample_nearest_channels_last(output, input, {scales_h, scales_w}); }); @@ -1346,8 +1354,12 @@ void upsample_bilinear2d_kernel_impl( c10::optional scales_h, c10::optional scales_w) { - // Temporarily dispatch to original channels last implementation - if (input.is_contiguous(at::MemoryFormat::ChannelsLast)) { + // See note above about _use_vectorized_kernel_cond(output, input). The extra cond is present + // because benchmarks showed that with only 1 thread, images (C == 3) were + // slightly faster with the vectorized kernel than with the generic one. + // That's not the case for masks though (C == 1), which strongly benefit from + // using the generic kernel.
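+ // Quick recap of the dispatch rule (descriptive only, no extra logic): the
+ // vectorized channels_last kernel is taken when (a) the input is channels_last
+ // contiguous with C > 3, or (b) output_H + output_W <= 128, or (c) for
+ // bilinear only, when running single-threaded with C == 3. For example, a
+ // channels_last (N, 64, 224, 224) input satisfies (a), while a C == 1 mask
+ // upsampled to 1024 x 1024 satisfies none of them and uses the generic kernel.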
+ if ((_use_vectorized_kernel_cond(output, input)) || (at::get_num_threads() == 1 && input.size(-3) == 3)) { AT_DISPATCH_FLOATING_TYPES_AND(at::ScalarType::BFloat16, input.scalar_type(), "upsample_bilinear2d_channels_last", [&] { cpu_upsample_linear_channels_last(output, input, align_corners, {scales_h, scales_w}); }); diff --git a/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp b/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp index a26cef72bb10..c73e0249dee8 100644 --- a/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp +++ b/aten/src/ATen/native/cpu/UpSampleMoreKernel.cpp @@ -441,9 +441,9 @@ void cpu_upsample_linear_backward_channels_last( int64_t input_width = input_sizes[ndim - 1]; int64_t output_width = output_sizes[ndim - 1]; - using accscalar_t = at::acc_type; + using opmath_t = at::opmath_type; using Vec = vec::Vectorized; - auto acc = [](scalar_t* gin, scalar_t* gout, accscalar_t w, int64_t size) { + auto acc = [](scalar_t* gin, scalar_t* gout, opmath_t w, int64_t size) { int64_t d = 0; for (; d < size - (size % Vec::size()); d += Vec::size()) { Vec gin_vec = Vec::loadu(gin + d) + Vec(w) * Vec::loadu(gout + d); diff --git a/aten/src/ATen/native/cpu/WeightNormKernel.cpp b/aten/src/ATen/native/cpu/WeightNormKernel.cpp index 9dc6b5285805..8ab7226d2127 100644 --- a/aten/src/ATen/native/cpu/WeightNormKernel.cpp +++ b/aten/src/ATen/native/cpu/WeightNormKernel.cpp @@ -1,6 +1,8 @@ -#include +#define TORCH_ASSERT_NO_OPERATORS +#include #include +#include #include #include #include @@ -13,10 +15,10 @@ namespace { template void weight_norm_first_dim_kernel( - Tensor& w, - Tensor& norm, - const Tensor& v, - const Tensor& g, + TensorBase& w, + TensorBase& norm, + const TensorBase& v, + const TensorBase& g, int64_t M, int64_t N) { const auto v_data = v.data_ptr(); const auto g_data = g.data_ptr(); @@ -121,10 +123,10 @@ inline void apply_norm_per_row( template void weight_norm_last_dim_kernel( - Tensor& w, - Tensor& norm, - const Tensor& v, - const Tensor& g, + TensorBase& w, + TensorBase& norm, + const TensorBase& v, + const TensorBase& g, int64_t M, int64_t N) { const auto v_data = v.data_ptr(); const auto g_data = g.data_ptr(); @@ -132,7 +134,7 @@ void weight_norm_last_dim_kernel( auto norm_data = norm.data_ptr(); int num_threads = at::get_num_threads(); - Tensor buffer = at::empty({num_threads, N}, norm.options()).zero_(); + TensorBase buffer = at::detail::empty_cpu({num_threads, N}, norm.options()).zero_(); auto buffer_data = buffer.data_ptr(); // vertical parallel reduction @@ -173,12 +175,12 @@ void weight_norm_last_dim_kernel( template void weight_norm_backward_first_dim_kernel( - Tensor& grad_v, - Tensor& grad_g, - const Tensor& grad_w, - const Tensor& saved_v, - const Tensor& saved_g, - const Tensor& saved_norm, + TensorBase& grad_v, + TensorBase& grad_g, + const TensorBase& grad_w, + const TensorBase& saved_v, + const TensorBase& saved_g, + const TensorBase& saved_norm, int64_t M, int64_t N) { const auto grad_w_data = grad_w.data_ptr(); const auto saved_v_data = saved_v.data_ptr(); @@ -314,12 +316,12 @@ inline void apply_per_row_backward( template void weight_norm_backward_last_dim_kernel( - Tensor& grad_v, - Tensor& grad_g, - const Tensor& grad_w, - const Tensor& saved_v, - const Tensor& saved_g, - const Tensor& saved_norm, + TensorBase& grad_v, + TensorBase& grad_g, + const TensorBase& grad_w, + const TensorBase& saved_v, + const TensorBase& saved_g, + const TensorBase& saved_norm, int64_t M, int64_t N) { const auto grad_w_data = grad_w.data_ptr(); const auto saved_v_data = 
saved_v.data_ptr(); @@ -335,7 +337,7 @@ void weight_norm_backward_last_dim_kernel( // int num_threads = at::get_num_threads(); int K = std::max(3, num_threads); - Tensor buffer = at::empty({K, N}, saved_norm.options()).zero_(); + TensorBase buffer = at::detail::empty_cpu({K, N}, saved_norm.options()).zero_(); auto buffer_data = buffer.data_ptr(); // vertical parallel reduction @@ -391,10 +393,10 @@ void weight_norm_backward_last_dim_kernel( } void weight_norm_kernel( - Tensor& w, - Tensor& norm, - const Tensor& v, - const Tensor& g, + TensorBase& w, + TensorBase& norm, + const TensorBase& v, + const TensorBase& g, int64_t dim) { TORCH_INTERNAL_ASSERT(dim == 0 || dim == v.dim() - 1, "fused kernels can only be applied for first or last dim"); @@ -414,12 +416,12 @@ void weight_norm_kernel( } void weight_norm_backward_kernel( - Tensor& grad_v, - Tensor& grad_g, - const Tensor& grad_w, - const Tensor& saved_v, - const Tensor& saved_g, - const Tensor& saved_norm, + TensorBase& grad_v, + TensorBase& grad_g, + const TensorBase& grad_w, + const TensorBase& saved_v, + const TensorBase& saved_g, + const TensorBase& saved_norm, int64_t dim) { TORCH_INTERNAL_ASSERT(dim == 0 || dim == saved_v.dim() - 1, "fused kernels can only be applied for first or last dim"); diff --git a/aten/src/ATen/native/cpu/WeightNormKernel.h b/aten/src/ATen/native/cpu/WeightNormKernel.h index 1f5ad65b52d9..6e1f3ec3b029 100644 --- a/aten/src/ATen/native/cpu/WeightNormKernel.h +++ b/aten/src/ATen/native/cpu/WeightNormKernel.h @@ -1,13 +1,18 @@ #pragma once - -#include #include +#include + +namespace at { +class TensorBase; +} namespace at { namespace native { -using weight_norm_fn = void(*)(Tensor&, Tensor&, const Tensor&, const Tensor&, int64_t); +using weight_norm_fn = void(*)( + TensorBase&, TensorBase&, const TensorBase&, const TensorBase&, int64_t); using weight_norm_backward_fn = void(*)( - Tensor&, Tensor&, const Tensor&, const Tensor&, const Tensor&, const Tensor&, int64_t); + TensorBase&, TensorBase&, const TensorBase&, const TensorBase&, + const TensorBase&, const TensorBase&, int64_t); DECLARE_DISPATCH(weight_norm_fn, weight_norm_stub); DECLARE_DISPATCH(weight_norm_backward_fn, weight_norm_backward_stub); diff --git a/aten/src/ATen/native/cpu/radix_sort.h b/aten/src/ATen/native/cpu/radix_sort.h index ad94f2e06e91..2b0657ee6986 100644 --- a/aten/src/ATen/native/cpu/radix_sort.h +++ b/aten/src/ATen/native/cpu/radix_sort.h @@ -5,6 +5,8 @@ namespace at { namespace native { +bool inline is_radix_sort_available() { return false; } + template std::pair radix_sort_parallel( K* inp_key_buf, @@ -21,6 +23,7 @@ std::pair radix_sort_parallel( #else #include +#include namespace at { namespace native { @@ -31,7 +34,7 @@ namespace { // // Copied from fbgemm implementation here: // https://github.com/pytorch/FBGEMM/blob/main/fbgemm_gpu/src/cpu_utils.cpp -// +// // `radix_sort_parallel` is only available when ATen is compiled with OpenMP, // since the algorithm requires sync between omp threads, which can not be perfectly // mapped to `at::parallel_for` at the current stage. 
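As context for the pass-count computation changed in the next hunk (where __builtin_clz is replaced by llvm::countLeadingZeros), here is a minimal sketch of how the number of 8-bit radix passes follows from the largest key value; count_bits_portable and radix_passes_for are illustrative names, not part of the patch.

#include <cstdint>

// Illustrative helper (not part of the patch): number of significant bits in v.
inline int count_bits_portable(uint64_t v) {
  int bits = 0;
  while (v != 0) {
    ++bits;
    v >>= 1;
  }
  return bits;
}

// Each radix pass consumes one byte of the key (256 histogram buckets), so the
// pass count is ceil(num_bits / 8); a max_value of 0 needs no sorting at all.
inline unsigned int radix_passes_for(uint64_t max_value) {
  const int num_bits = count_bits_portable(max_value);
  return static_cast<unsigned int>((num_bits + 7) / 8);
}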
@@ -132,8 +135,11 @@ void radix_sort_kernel( } } } + } // namespace +bool inline is_radix_sort_available() { return true; } + template std::pair radix_sort_parallel( K* inp_key_buf, @@ -143,12 +149,16 @@ std::pair radix_sort_parallel( int64_t elements_count, int64_t max_value) { int maxthreads = omp_get_max_threads(); - alignas(64) int histogram[RDX_HIST_SIZE * maxthreads]; - alignas(64) int histogram_ps[RDX_HIST_SIZE * maxthreads + 1]; + std::unique_ptr histogram_tmp(new int[RDX_HIST_SIZE * maxthreads]); + std::unique_ptr histogram_ps_tmp(new int[RDX_HIST_SIZE * maxthreads + 1]); + int* histogram = histogram_tmp.get(); + int* histogram_ps = histogram_ps_tmp.get(); if (max_value == 0) { return std::make_pair(inp_key_buf, inp_value_buf); } - int num_bits = sizeof(K) * 8 - __builtin_clz(max_value); + + // __builtin_clz is not portable + int num_bits = sizeof(K) * 8 - llvm::countLeadingZeros(static_cast>(max_value)); unsigned int num_passes = (num_bits + 7) / 8; #pragma omp parallel diff --git a/aten/src/ATen/native/cuda/Activation.cpp b/aten/src/ATen/native/cuda/Activation.cpp index 4360f8b5c3ef..31926b353b4a 100644 --- a/aten/src/ATen/native/cuda/Activation.cpp +++ b/aten/src/ATen/native/cuda/Activation.cpp @@ -114,7 +114,7 @@ Tensor prelu_cuda(const Tensor& self, const Tensor& weight_) { Tensor result = at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); TORCH_CHECK(weight_dim == 0 || weight_dim == 1, - "prelu: Expected `weight` to be a scalar or 1D tensor, but got ndim = ", + "prelu: Expected `weight` to be a scalar or 1D tensor, but got: ndim = ", weight_dim); // case1: shared weight for all channels diff --git a/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu b/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu index 55b0d3322e04..42c10fb6eb29 100644 --- a/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu +++ b/aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu @@ -23,8 +23,8 @@ #include #include -#define START_IND(a,b,c) (int)std::floor((float)(a * c) / b) -#define END_IND(a,b,c) (int)std::ceil((float)((a + 1) * c) / b) +#define START_IND(a,b,c) ((int64_t)((a / b) * c + ((a % b) * c) / b)) +#define END_IND(a,b,c) (1 + ((int64_t)(a + 1) * c - 1) / b) #define START_IND_INT(a,b,c) ((a * c) / b) #define END_IND_INT(a,b,c) (((a + 1) * c + b - 1) / b) @@ -442,10 +442,14 @@ namespace { output_arg{ output, "output", 2 }; checkAllSameGPU(__func__, {input_arg, output_arg}); - for (int64_t i = 1; i < input.ndimension(); i++) { + TORCH_CHECK(output_size.size() == 2, "adaptive_avg_pool2d: output_size must be 2"); + int64_t ndim = input.dim(); + TORCH_CHECK((ndim == 3 || ndim == 4), + "adaptive_avg_pool2d(): Expected 3D or 4D tensor, but got ", input.sizes()); + for (const auto i : {-2, -1}) { TORCH_CHECK(input.size(i) > 0, "adaptive_avg_pool2d(): Expected input to have non-zero size for non-batch dimensions, " - "but input has sizes ", input.sizes(), " with dimension ", i, " being " + "but input has sizes ", input.sizes(), " with dimension ", i + ndim, " being " "empty"); } @@ -538,9 +542,6 @@ namespace { break; } case at::MemoryFormat::Contiguous: { - TORCH_CHECK((input.ndimension() == 3 || input.ndimension() == 4), - "adaptive_avg_pool2d(): Expected 3D or 4D tensor, but got ", - input.sizes()); int64_t grid_x = input.size(-3); if (input.ndimension() == 4) { input_ = input.contiguous(); diff --git a/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu b/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu index ec71b37015fb..6e43e382ddfc 100644 --- 
a/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu +++ b/aten/src/ATen/native/cuda/AdaptiveAveragePooling3d.cu @@ -28,12 +28,12 @@ namespace native { namespace { -__device__ inline int start_index(int a, int b, int c) { - return (int)std::floor((float)(a * c) / b); +__device__ inline int64_t start_index(int64_t a, int64_t b, int64_t c) { + return (a / b) * c + ((a % b) * c) / b; } -__device__ inline int end_index(int a, int b, int c) { - return (int)std::ceil((float)((a + 1) * c) / b); +__device__ inline int64_t end_index(int64_t a, int64_t b, int64_t c) { + return 1 + ((a + 1) * c - 1) / b; } // 5d tensor B x D x T x H x W diff --git a/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu b/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu index 5b46fb9c34a5..4903fdacc8cb 100644 --- a/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu +++ b/aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu @@ -28,12 +28,12 @@ namespace native { namespace { -__device__ inline int start_index(int a, int b, int c) { - return (int)std::floor((float)(a * c) / b); +__device__ inline int64_t start_index(int64_t a, int64_t b, int64_t c) { + return (a / b) * c + ((a % b) * c) / b; } -__device__ inline int end_index(int a, int b, int c) { - return (int)std::ceil((float)((a + 1) * c) / b); +__device__ inline int64_t end_index(int64_t a, int64_t b, int64_t c) { + return 1 + ((a + 1) * c - 1) / b; } // 4d tensor B x D x H x W diff --git a/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu b/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu index baafc6c56d46..4694d73b3a02 100644 --- a/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu +++ b/aten/src/ATen/native/cuda/AdaptiveMaxPooling3d.cu @@ -28,12 +28,12 @@ namespace native { namespace { -__device__ inline int start_index(int a, int b, int c) { - return (int)std::floor((float)(a * c) / b); +__device__ inline int64_t start_index(int64_t a, int64_t b, int64_t c) { + return (a / b) * c + ((a % b) * c) / b; } -__device__ inline int end_index(int a, int b, int c) { - return (int)std::ceil((float)((a + 1) * c) / b); +__device__ inline int64_t end_index(int64_t a, int64_t b, int64_t c) { + return 1 + ((a + 1) * c - 1) / b; } // 5d tensor B x D x T x H x W diff --git a/aten/src/ATen/native/cuda/AveragePool2d.cu b/aten/src/ATen/native/cuda/AveragePool2d.cu index 55632014a0de..46e96e902981 100644 --- a/aten/src/ATen/native/cuda/AveragePool2d.cu +++ b/aten/src/ATen/native/cuda/AveragePool2d.cu @@ -32,8 +32,8 @@ __device__ inline int max(int a, int b) { template __global__ void avg_pool2d_out_cuda_frame(const int nthreads, - const scalar_t* const bottom_data, const int channels, - const int height, const int width, const int pooled_height, + const scalar_t* const bottom_data, const int64_t channels, + const int64_t height, const int64_t width, const int64_t pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, scalar_t* const top_data, const int divisor_override, @@ -81,8 +81,8 @@ __global__ void avg_pool2d_out_cuda_frame(const int nthreads, template __global__ void avg_pool2d_out_cuda_frame_nhwc(const int nthreads, - const scalar_t* const bottom_data, const int channels, - const int height, const int width, const int pooled_height, + const scalar_t* const bottom_data, const int64_t channels, + const int64_t height, const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, 
const int pad_w, scalar_t* const top_data, const int divisor_override, @@ -130,8 +130,8 @@ __global__ void avg_pool2d_out_cuda_frame_nhwc(const int nthreads, template __global__ void avg_pool2d_backward_out_cuda_frame(const int nthreads, const scalar_t* const top_diff, - const int channels, const int height, - const int width, const int pooled_height, const int pooled_width, + const int64_t channels, const int64_t height, + const int64_t width, const int64_t pooled_height, const int64_t pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, scalar_t* const bottom_diff, const int divisor_override, @@ -187,8 +187,8 @@ __global__ void avg_pool2d_backward_out_cuda_frame(const int nthreads, const sca template __global__ void avg_pool2d_backward_out_cuda_frame_nhwc(const int nthreads, const scalar_t* const top_diff, - const int channels, const int height, - const int width, const int pooled_height, const int pooled_width, + const int64_t channels, const int64_t height, + const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, scalar_t* const bottom_diff, const int divisor_override, diff --git a/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu b/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu index e69674412c79..cc6046c003e4 100644 --- a/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu +++ b/aten/src/ATen/native/cuda/BinaryLogicalOpsKernels.cu @@ -18,7 +18,7 @@ void logical_and_kernel_cuda(TensorIterator& iter) { #if AT_USE_JITERATOR() static const auto logical_and_string = jiterator_stringify( template - T logical_and_kernel(T a, T b) { + bool logical_and_kernel(T a, T b) { return a && b; } ); // logical_and_string @@ -48,24 +48,76 @@ void logical_and_kernel_cuda(TensorIterator& iter) { } } +const char logical_or_name[] = "logical_or_kernel"; void logical_or_kernel_cuda(TensorIterator& iter) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kHalf, kBool, ScalarType::BFloat16, - iter.common_dtype(), "logical_or_cuda", [&]() { + auto dtype = iter.common_dtype(); + if (at::isComplexType(dtype)) { +#if AT_USE_JITERATOR() + static const auto logical_or_string = jiterator_stringify( + template + bool logical_or_kernel(T a, T b) { + return a || b; + } + ); // logical_or_string + AT_DISPATCH_COMPLEX_TYPES(dtype, "logical_or_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/ logical_or_name, + /*return_dtype=*/ scalar_t, + /*common_dtype=*/ scalar_t, + /*arity=*/ 2>(iter, logical_or_string); + }); +#else + AT_DISPATCH_COMPLEX_TYPES(dtype, "logical_or_cuda", [&]() { + gpu_kernel_with_scalars(iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> bool { + return a || b; + }); + }); +#endif + } else { + AT_DISPATCH_ALL_TYPES_AND3(kHalf, kBool, ScalarType::BFloat16, + dtype, "logical_or_cuda", [&]() { opmath_symmetric_gpu_kernel_with_scalars( iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> bool { return a || b; }); }); + } } +const char logical_xor_name[] = "logical_xor_kernel"; void logical_xor_kernel_cuda(TensorIterator& iter) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kHalf, kBool, ScalarType::BFloat16, - iter.common_dtype(), "logical_xor_cuda", [&]() { + auto dtype = iter.common_dtype(); + if (at::isComplexType(dtype)) { +#if AT_USE_JITERATOR() + static const auto logical_xor_string = jiterator_stringify( + template + bool logical_xor_kernel(T a, T b) { + return bool(a) != bool(b); + } + ); + 
AT_DISPATCH_COMPLEX_TYPES(dtype, "logical_xor_cuda", [&]() { + jitted_gpu_kernel< + /*name=*/ logical_xor_name, + /*return_dtype=*/ scalar_t, + /*common_dtype=*/ scalar_t, + /*arity=*/ 2>(iter, logical_xor_string); + }); // logical_xor_string +#else + AT_DISPATCH_COMPLEX_TYPES(dtype, "logical_xor_cuda", [&]() { + gpu_kernel_with_scalars(iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> bool { + return bool(a) != bool(b); + }); + }); +#endif + } else { + AT_DISPATCH_ALL_TYPES_AND3(kHalf, kBool, ScalarType::BFloat16, + dtype, "logical_xor_cuda", [&]() { opmath_symmetric_gpu_kernel_with_scalars( iter, []GPU_LAMBDA(scalar_t a, scalar_t b) -> bool { return bool(a) != bool(b); }); }); + } } REGISTER_DISPATCH(logical_and_stub, &logical_and_kernel_cuda); diff --git a/aten/src/ATen/native/cuda/Bucketization.cu b/aten/src/ATen/native/cuda/Bucketization.cu index 2a3d5730d786..21c582216628 100644 --- a/aten/src/ATen/native/cuda/Bucketization.cu +++ b/aten/src/ATen/native/cuda/Bucketization.cu @@ -10,7 +10,6 @@ #include #include #else -#include #include #include #include @@ -191,11 +190,6 @@ Tensor searchsorted_cuda( return result; } -// See [Note about _torch_cuda_cu_linker_symbol_op and torch_cuda_cu] in native_functions.yaml -Tensor _torch_cuda_cu_linker_symbol_op_cuda(const Tensor& self) { - return self; -} - Tensor searchsorted_cuda( const Tensor& sorted_sequence, const Scalar& self, diff --git a/aten/src/ATen/native/cuda/Col2Im.cu b/aten/src/ATen/native/cuda/Col2Im.cu index 5cb825a2e70b..53eb2df3013e 100644 --- a/aten/src/ATen/native/cuda/Col2Im.cu +++ b/aten/src/ATen/native/cuda/Col2Im.cu @@ -16,7 +16,6 @@ #include #else #include -#include #include #include #endif @@ -99,17 +98,13 @@ void col2im_out_cuda_template( int64_t batch_size = input.size(0); int64_t n_input_plane = input.size(1); int64_t n_output_plane = n_input_plane / (kernel_width * kernel_height); + int64_t input_batch_stride = input.stride(0); output.resize_({batch_size, n_output_plane, output_height, output_width}); - output.zero_(); + int64_t output_batch_stride = output.stride(0); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "col2im_out_cuda", [&] { - using accscalar_t = at::acc_type; - - Tensor input_n; - Tensor output_n; - int64_t height_col = (output_height + 2 * pad_height - (dilation_height * (kernel_height - 1) + 1)) / stride_height + @@ -119,28 +114,26 @@ void col2im_out_cuda_template( stride_width + 1; - for (int64_t elt = 0; elt < batch_size; elt++) { - input_n = input.select(0, elt); - output_n = output.select(0, elt); - - col2im( - at::cuda::getCurrentCUDAStream(), - input_n.data_ptr(), - n_output_plane, - output_height, - output_width, - height_col, - width_col, - kernel_height, - kernel_width, - pad_height, - pad_width, - stride_height, - stride_width, - dilation_height, - dilation_width, - output_n.data_ptr()); - } + col2im_batched( + at::cuda::getCurrentCUDAStream(), + input.data_ptr(), + input_batch_stride, + batch_size, + n_output_plane, + output_height, + output_width, + height_col, + width_col, + kernel_height, + kernel_width, + pad_height, + pad_width, + stride_height, + stride_width, + dilation_height, + dilation_width, + output.data_ptr(), + output_batch_stride); if (!batched_input) { output.resize_({n_output_plane, output_height, output_width}); @@ -148,18 +141,6 @@ void col2im_out_cuda_template( }); } -void col2im_backward_out_cuda_template( - Tensor& grad_input, - const Tensor& grad_output, - IntArrayRef kernel_size, 
- IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - // im2col_out_cuda checks size of kernel_size, dilation, padding and stride - at::native::im2col_out_cuda( - grad_output, kernel_size, dilation, padding, stride, grad_input); -} - } // namespace Tensor& col2im_out_cuda(const Tensor& input, @@ -188,29 +169,5 @@ Tensor col2im_cuda( return output; } -Tensor& col2im_backward_out_cuda(const Tensor& grad_output, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride, - Tensor& grad_input) { - col2im_backward_out_cuda_template( - grad_input, grad_output, kernel_size, dilation, padding, stride); - return grad_input; -} - -Tensor col2im_backward_cuda( - const Tensor& grad_output, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - Tensor grad_input = at::empty_like(grad_output, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - - col2im_backward_out_cuda_template( - grad_input, grad_output, kernel_size, dilation, padding, stride); - return grad_input; -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/cuda/Copy.cu b/aten/src/ATen/native/cuda/Copy.cu index 4fb647e329d3..564ecf1c1291 100644 --- a/aten/src/ATen/native/cuda/Copy.cu +++ b/aten/src/ATen/native/cuda/Copy.cu @@ -6,7 +6,6 @@ #include #include #include -#include #include #include #include @@ -17,13 +16,15 @@ #include #endif +#include +#include + namespace at { namespace native { void neg_kernel_cuda(TensorIteratorBase &iter); void conj_kernel_cuda(TensorIteratorBase &iter); -namespace { void direct_copy_kernel_cuda(TensorIteratorBase &iter) { ScalarType dtype = iter.dtype(0); if (isQIntType(dtype)) { @@ -43,12 +44,13 @@ void neg_conj_kernel_cuda(TensorIteratorBase &iter) { gpu_kernel(iter, [] GPU_LAMBDA(scalar_t x) { return -std::conj(x); }); }); } -} // namespace (anonymous) using namespace at::cuda; // device-to-device copy, does type conversion -void copy_device_to_device(TensorIterator& iter, bool non_blocking) { +void copy_device_to_device(TensorIterator& iter, + bool non_blocking, + bool p2p_enabled) { int64_t numel = iter.numel(); // We can memcpy the memory if both tensors have the same type AND both @@ -89,11 +91,28 @@ void copy_device_to_device(TensorIterator& iter, bool non_blocking) { void *src = iter.data_ptr(1); size_t size = numel * iter.element_size(0); if (src != dst || src_device != dst_device) { - // Perform the copy - AT_CUDA_CHECK(cudaMemcpyAsync( - dst, src, size, - cudaMemcpyDeviceToDevice, - copy_stream)); + // Due to bizarre cuda driver intricacies, copies of + // cudaMallocAsynced memory between devices that aren't + // peer-to-peer-capable need "cudaMemcpyPeerAsync". 
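+ // In short (descriptive summary of the branch below): cudaMemcpyPeerAsync is
+ // only chosen for cross-device copies when the caching allocator reports that
+ // pool-specific peer access is required and peer-to-peer access has not
+ // already been enabled; all other device-to-device copies keep using the
+ // plain cudaMemcpyAsync path.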
+#ifdef USE_ROCM + bool needs_pool_specific_peer_access = false; +#else + bool needs_pool_specific_peer_access = CUDACachingAllocator::get()->needsPoolSpecificPeerAccess(); +#endif + bool needs_MemcpyPeer = (src_device != dst_device && + needs_pool_specific_peer_access && + !p2p_enabled); + if (needs_MemcpyPeer) { + AT_CUDA_CHECK(cudaMemcpyPeerAsync( + dst, dst_device.index(), + src, src_device.index(), + size, copy_stream)); + } else { + AT_CUDA_CHECK(cudaMemcpyAsync( + dst, src, size, + cudaMemcpyDeviceToDevice, + copy_stream)); + } } } else { if (same_neg) { @@ -207,7 +226,7 @@ static void copy_kernel_cuda(TensorIterator& iter, bool non_blocking) { // Copy on GPU (or between GPUs) if (dst_device.is_cuda() && src_device.is_cuda()) { - copy_device_to_device(iter, non_blocking); + copy_device_to_device(iter, non_blocking, p2p_enabled); return; } diff --git a/aten/src/ATen/native/cuda/Copy.h b/aten/src/ATen/native/cuda/Copy.h new file mode 100644 index 000000000000..5639567d6666 --- /dev/null +++ b/aten/src/ATen/native/cuda/Copy.h @@ -0,0 +1,10 @@ +#pragma once + +namespace at { +struct TensorIteratorBase; + +namespace native { + +void direct_copy_kernel_cuda(TensorIteratorBase &iter); + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/CumminmaxKernel.cu b/aten/src/ATen/native/cuda/CumminmaxKernel.cu new file mode 100644 index 000000000000..ea73273e2d4b --- /dev/null +++ b/aten/src/ATen/native/cuda/CumminmaxKernel.cu @@ -0,0 +1,29 @@ +#define TORCH_ASSERT_NO_OPERATORS +#include +#include + +#include +#include + +#include +#include + +namespace at { namespace native { + +void launch_cummax_cuda_kernel(const TensorBase& self, const TensorBase& values, const TensorBase& indices, int64_t dim) { + AT_DISPATCH_ALL_TYPES_AND3(at::ScalarType::Bool, at::ScalarType::Half, at::ScalarType::BFloat16, + self.scalar_type(), "cummax_cuda", [&]() { + scalar_t init = self.is_floating_point() ? (-1*std::numeric_limits::infinity()) : std::numeric_limits::lowest(); + scan_dim_with_indices(self, values, indices, dim, init, std::greater_equal()); + }); +} + +void launch_cummin_cuda_kernel(const TensorBase& self, const TensorBase& values, const TensorBase& indices, int64_t dim) { + AT_DISPATCH_ALL_TYPES_AND3(at::ScalarType::Bool, at::ScalarType::Half, at::ScalarType::BFloat16, + self.scalar_type(), "cummin_cuda", [&]() { + scalar_t init = self.is_floating_point() ? 
std::numeric_limits::infinity() : std::numeric_limits::max(); + scan_dim_with_indices(self, values, indices, dim, init, std::less_equal()); + }); +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/CumprodKernel.cu b/aten/src/ATen/native/cuda/CumprodKernel.cu new file mode 100644 index 000000000000..d1f3233abb13 --- /dev/null +++ b/aten/src/ATen/native/cuda/CumprodKernel.cu @@ -0,0 +1,23 @@ +#define TORCH_ASSERT_NO_OPERATORS +#include +#include + +#include +#include + +namespace at { namespace native { + +void launch_cumprod_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( + ScalarType::Half, ScalarType::BFloat16, self.scalar_type(), "cumprod_cuda", [&]() { + scalar_t init = 1; + scan_dim( + self, + result, + dim, + init, + std::multiplies()); + }); +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/CumsumKernel.cu b/aten/src/ATen/native/cuda/CumsumKernel.cu new file mode 100644 index 000000000000..85866b3f0f32 --- /dev/null +++ b/aten/src/ATen/native/cuda/CumsumKernel.cu @@ -0,0 +1,25 @@ +#define TORCH_ASSERT_NO_OPERATORS +#include +#include + +#include +#include + +namespace at { namespace native { + +void launch_cumsum_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( + ScalarType::Half, ScalarType::BFloat16, + self.scalar_type(), "cumsum_cuda", + [&]() { + scalar_t init = 0; + scan_dim( + self, + result, + dim, + init, + std::plus()); + }); +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/DepthwiseConv2d.cu b/aten/src/ATen/native/cuda/DepthwiseConv2d.cu index 8f0f9b99903a..20748837bbaf 100644 --- a/aten/src/ATen/native/cuda/DepthwiseConv2d.cu +++ b/aten/src/ATen/native/cuda/DepthwiseConv2d.cu @@ -236,7 +236,6 @@ __global__ void conv_depthwise2d_grad_weight_kernel( } } } - __syncthreads(); // At this point each thread in the block has a local gradient, which we need to // accumulate prior to writing the global value diff --git a/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu b/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu index 05a201147241..728c3144f083 100644 --- a/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu +++ b/aten/src/ATen/native/cuda/DilatedMaxPool2d.cu @@ -44,8 +44,8 @@ static __device__ inline int p_end(int size, int pad, int pooled_size, int strid // kernels borrowed from Caffe template __global__ void max_pool_forward_nchw(const int nthreads, const scalar_t* bottom_data, - const int channels, const int height, - const int width, const int pooled_height, const int pooled_width, + const int64_t channels, const int64_t height, + const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, const int dilation_h, const int dilation_w, scalar_t* top_data, @@ -83,8 +83,8 @@ __global__ void max_pool_forward_nchw(const int nthreads, const scalar_t* bottom template C10_LAUNCH_BOUNDS_1(CUDA_MAX_THREADS) __global__ void max_pool_forward_nhwc(const scalar_t* bottom_data, const int nbatch, - const int channels, const int height, - const int width, const int pooled_height, const int pooled_width, + const int64_t channels, const int64_t height, + const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, const int dilation_h, const int dilation_w, 
@@ -176,8 +176,8 @@ C10_LAUNCH_BOUNDS_2(BLOCK_THREADS, 4) C10_LAUNCH_BOUNDS_2(BLOCK_THREADS, 8) #endif __global__ void max_pool_backward_nchw(const scalar_t* top_diff, - const int64_t* top_mask, const int num, const int channels, - const int height, const int width, const int pooled_height, + const int64_t* top_mask, const int num, const int64_t channels, + const int64_t height, const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, const int dilation_h, const int dilation_w, @@ -209,8 +209,8 @@ __global__ void max_pool_backward_nchw(const scalar_t* top_diff, template C10_LAUNCH_BOUNDS_1(CUDA_MAX_THREADS) __global__ void max_pool_backward_nhwc(const scalar_t* top_diff, - const int64_t* top_mask, const int nbatch, const int channels, - const int height, const int width, const int pooled_height, + const int64_t* top_mask, const int nbatch, const int64_t channels, + const int64_t height, const int64_t width, const int pooled_height, const int pooled_width, const int kernel_h, const int kernel_w, const int stride_h, const int stride_w, const int pad_h, const int pad_w, const int dilation_h, const int dilation_w, @@ -242,9 +242,9 @@ __global__ void max_pool_backward_nhwc(const scalar_t* top_diff, int iH = (height + gridDim.z-1) / gridDim.z; int iW = (width + gridDim.y-1) / gridDim.y; int istartH = threadIdx.z + blockIdx.z*iH; - int iendH = ::min(istartH+iH, height); + int iendH = ::min(static_cast(istartH)+iH, height); int istartW = threadIdx.y + blockIdx.y*iW; - int iendW = ::min(istartW+iW, width); + int iendW = ::min(static_cast(istartW)+iW, width); for (int ih = istartH; ih < iendH; ih+=blockDim.z) { int phstart = p_start(ih, pad_h, kernel_h, dilation_h, stride_h); @@ -423,14 +423,14 @@ IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, bool ceil_mode, -const Tensor& indices, +const Tensor& indices_, const Tensor& gradInput) { NoNamesGuard guard; TensorArg gradInput_arg{ gradInput, "gradInput", 1 }; TensorArg gradOutput_arg{ gradOutput_, "gradOutput_", 2 }; TensorArg input_arg{ input_, "input_", 3 }; - TensorArg indices_arg{ indices, "indices", 4 }; + TensorArg indices_arg{ indices_, "indices", 4 }; checkAllSameGPU(__func__, {gradInput_arg, gradOutput_arg, input_arg, indices_arg}); @@ -474,6 +474,8 @@ const Tensor& gradInput) { const int64_t out_stride_h = gradOutput.stride(-2); const int64_t out_stride_w = gradOutput.stride(-1); + const Tensor indices = indices_.contiguous(memory_format); + gradInput.zero_(); int64_t count = input.numel(); diff --git a/aten/src/ATen/native/cuda/DistanceKernel.cu b/aten/src/ATen/native/cuda/DistanceKernel.cu index a9130bd3e808..2ae4cd592e6b 100644 --- a/aten/src/ATen/native/cuda/DistanceKernel.cu +++ b/aten/src/ATen/native/cuda/DistanceKernel.cu @@ -6,6 +6,8 @@ #include #include +#include +#include #include #ifndef AT_PER_OPERATOR_HEADERS @@ -21,20 +23,7 @@ namespace at { namespace native { namespace { -static const int forward_threads = 256; - -template -static __forceinline__ __device__ scalar_t device_sqrt(scalar_t val); - -template <> -__forceinline__ __device__ float device_sqrt(float val) { - return ::sqrtf(val); -} - -template <> -__forceinline__ __device__ double device_sqrt(double val) { - return ::sqrt(val); -} +constexpr int kCUDANumThreads = 256; template struct dists { @@ -92,27 +81,16 @@ struct dists { }; template -__device__ static inline scalar_t reduce_agg(scalar_t agg) { - for (int offset = warpSize / 
2; offset > 0; offset /= 2) { - F::agg(agg, WARP_SHFL_DOWN(agg, offset)); - } - - __shared__ scalar_t shared[forward_threads]; - int lane = threadIdx.x % warpSize; - int warp_id = threadIdx.x / warpSize; - if (lane == 0) { - shared[warp_id] = agg; - } +struct DistReduceOp { + __forceinline__ __device__ scalar_t combine(scalar_t a, scalar_t b) const { + F::agg(a, b); + return a; + } - __syncthreads(); - agg = (threadIdx.x < blockDim.x / warpSize) ? shared[lane] : 0.0; - if (warp_id == 0) { - for (int offset = blockDim.x / warpSize / 2; offset > 0; offset /= 2) { - F::agg(agg, WARP_SHFL_DOWN(agg, offset)); + __forceinline__ __device__ scalar_t warp_shfl_down(scalar_t data, int offset) const { + return WARP_SHFL_DOWN(data, offset); } - } - return agg; -} +}; template __global__ static void pdist_kernel_cuda_impl(scalar_t * result, const scalar_t * self, const int64_t n, const int64_t m, const scalar_t p, @@ -133,7 +111,9 @@ __global__ static void pdist_kernel_cuda_impl(scalar_t * result, const scalar_t F::inc(agg, std::abs(*a - *b), p); } - agg = reduce_agg(agg); + __shared__ scalar_t agg_smem[kCUDANumThreads]; + scalar_t agg_init{0.0}; + agg = cuda_utils::BlockReduce(agg, DistReduceOp{}, agg_init, agg_smem); if (threadIdx.x == 0) { result[k] = F::finish(agg, p); } @@ -222,7 +202,9 @@ __global__ static void cdist_kernel_cuda_impl(scalar_t * result, const scalar_t for (; a < end; a += stride, b += stride) { F::inc(agg, std::abs(*a - *b), p); } - agg = reduce_agg(agg); + __shared__ scalar_t agg_smem[kCUDANumThreads]; + scalar_t agg_init{0.0}; + agg = cuda_utils::BlockReduce(agg, DistReduceOp{}, agg_init, agg_smem); if (threadIdx.x == 0) { result[blockIdx.x] = F::finish(agg, p); } @@ -236,31 +218,27 @@ void cdist_kernel_impl(Tensor& result, const Tensor& x1, const Tensor& x2, doubl const int64_t l1_size = r1 * m; const int64_t l2_size = r2 * m; const dim3 grid(result.numel()); - const dim3 block(forward_threads); + const dim3 block(kCUDANumThreads); AT_DISPATCH_FLOATING_TYPES(x1.scalar_type(), "cdist_cuda", [&] { + auto impl_fptr = cdist_kernel_cuda_impl::p>; if (p == 0.0) { - cdist_kernel_cuda_impl::zero><<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_kernel_cuda_impl::zero>; } else if (p == 1.0) { - cdist_kernel_cuda_impl::one><<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_kernel_cuda_impl::one>; } else if (p == 2.0) { - cdist_kernel_cuda_impl::two><<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_kernel_cuda_impl::two>; } else if (std::isinf(p)) { - cdist_kernel_cuda_impl::inf><<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } else { - cdist_kernel_cuda_impl::p><<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_kernel_cuda_impl::inf>; } + impl_fptr<<>>(result.data_ptr(), x1.data_ptr(), x2.data_ptr(), p, r2, m, r_size, l1_size, l2_size); + C10_CUDA_KERNEL_LAUNCH_CHECK(); }); } void pdist_forward_kernel_impl(Tensor& result, const Tensor& self, double p) { const dim3 grid(result.numel()); - const dim3 block(forward_threads); + const dim3 block(kCUDANumThreads); int64_t n = self.size(0); int64_t m = self.size(1); // 
https://github.com/pytorch/pytorch/issues/15511 demonstrated we need to do @@ -269,22 +247,18 @@ void pdist_forward_kernel_impl(Tensor& result, const Tensor& self, double p) { const double n2_squared_minus_1 = n2 * n2 - 1; AT_DISPATCH_FLOATING_TYPES(self.scalar_type(), "pdist_cuda", [&] { + auto impl_fptr = pdist_kernel_cuda_impl::p>; if (p == 0.0) { - pdist_kernel_cuda_impl::zero><<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_kernel_cuda_impl::zero>; } else if (p == 1.0) { - pdist_kernel_cuda_impl::one><<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_kernel_cuda_impl::one>; } else if (p == 2.0) { - pdist_kernel_cuda_impl::two><<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_kernel_cuda_impl::two>; } else if (std::isinf(p)) { - pdist_kernel_cuda_impl::inf><<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } else { - pdist_kernel_cuda_impl::p><<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_kernel_cuda_impl::inf>; } + impl_fptr<<>>(result.data_ptr(), self.data_ptr(), n, m, p, n2, n2_squared_minus_1); + C10_CUDA_KERNEL_LAUNCH_CHECK(); }); } @@ -311,22 +285,18 @@ void pdist_backward_kernel_impl(Tensor& result, const Tensor& grad, const Tensor Tensor buffer = at::empty({n - 1, result.size(0), result.size(1)}, result.options()); AT_DISPATCH_FLOATING_TYPES(self.scalar_type(), "pdist_cuda_backward", [&] { + auto impl_fptr = pdist_backward_kernel_cuda_impl::p>; if (p == 1.0) { - pdist_backward_kernel_cuda_impl::one><<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_backward_kernel_cuda_impl::one>; } else if (p < 2.0) { - pdist_backward_kernel_cuda_impl::lt_two><<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_backward_kernel_cuda_impl::lt_two>; } else if (p == 2.0) { - pdist_backward_kernel_cuda_impl::two><<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_backward_kernel_cuda_impl::two>; } else if (std::isinf(p)) { - pdist_backward_kernel_cuda_impl::inf><<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } else { - pdist_backward_kernel_cuda_impl::p><<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = pdist_backward_kernel_cuda_impl::inf>; } + impl_fptr<<>>(buffer.data_ptr(), grad.data_ptr(), self.data_ptr(), dist.data_ptr(), grad.stride(0), n, m, dist.numel(), p, n2, n2_squared_minus_1); + C10_CUDA_KERNEL_LAUNCH_CHECK(); }); at::sum_out(result, buffer, 0); @@ -364,32 +334,20 @@ void cdist_backward_kernel_impl(Tensor& result, const Tensor& grad, const Tensor Tensor buffer = at::empty({batch, r2, r1, m}, result.options()); 
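The pdist/cdist changes above (and the cdist backward block continuing just below) all follow the same refactor: select a kernel function pointer from the runtime value of p, then launch once, instead of repeating the launch and C10_CUDA_KERNEL_LAUNCH_CHECK in every branch. A small host-side sketch of the pattern, with plain functions standing in for the __global__ kernels:

#include <cmath>

// Each policy provides the per-element accumulation rule, mirroring the dists<> structs.
struct one_norm { static double inc(double agg, double diff, double)   { return agg + diff; } };
struct two_norm { static double inc(double agg, double diff, double)   { return agg + diff * diff; } };
struct p_norm   { static double inc(double agg, double diff, double p) { return agg + std::pow(diff, p); } };

template <typename F>
void accumulate_impl(const double* diffs, int n, double p, double* out) {
  double agg = 0.0;
  for (int i = 0; i < n; ++i) {
    agg = F::inc(agg, std::abs(diffs[i]), p);
  }
  *out = agg;
}

void accumulate(const double* diffs, int n, double p, double* out) {
  // Default to the general implementation, specialize for common p, call exactly once.
  auto impl_fptr = accumulate_impl<p_norm>;
  if (p == 1.0) {
    impl_fptr = accumulate_impl<one_norm>;
  } else if (p == 2.0) {
    impl_fptr = accumulate_impl<two_norm>;
  }
  impl_fptr(diffs, n, p, out);
}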
AT_DISPATCH_FLOATING_TYPES(result.scalar_type(), "cdist_cuda_backward", [&] { + auto impl_fptr = cdist_backward_kernel_cuda_impl::p>; if (p == 1.0) { - cdist_backward_kernel_cuda_impl::one><<>>(buffer.data_ptr(), - grad.data_ptr(), x1.data_ptr(), x2.data_ptr(), dist.data_ptr(), - p, r1, r2, m, count, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_backward_kernel_cuda_impl::one>; } else if (p < 2.0) { - cdist_backward_kernel_cuda_impl::lt_two><<>>(buffer.data_ptr(), - grad.data_ptr(), x1.data_ptr(), x2.data_ptr(), dist.data_ptr(), - p, r1, r2, m, count, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_backward_kernel_cuda_impl::lt_two>; } else if (p == 2.0) { - cdist_backward_kernel_cuda_impl::two><<>>(buffer.data_ptr(), - grad.data_ptr(), x1.data_ptr(), x2.data_ptr(), dist.data_ptr(), - p, r1, r2, m, count, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); + impl_fptr = cdist_backward_kernel_cuda_impl::two>; } else if (std::isinf(p)) { - cdist_backward_kernel_cuda_impl::inf><<>>(buffer.data_ptr(), - grad.data_ptr(), x1.data_ptr(), x2.data_ptr(), dist.data_ptr(), - p, r1, r2, m, count, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } else { - cdist_backward_kernel_cuda_impl::p><<>>(buffer.data_ptr(), + impl_fptr = cdist_backward_kernel_cuda_impl::inf>; + } + impl_fptr<<>>(buffer.data_ptr(), grad.data_ptr(), x1.data_ptr(), x2.data_ptr(), dist.data_ptr(), p, r1, r2, m, count, r_size, l1_size, l2_size); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } + C10_CUDA_KERNEL_LAUNCH_CHECK(); }); at::sum_out(result, buffer, 1); diff --git a/aten/src/ATen/native/cuda/Distributions.cu b/aten/src/ATen/native/cuda/Distributions.cu index 717ad4d985d4..f45d745eb418 100644 --- a/aten/src/ATen/native/cuda/Distributions.cu +++ b/aten/src/ATen/native/cuda/Distributions.cu @@ -47,6 +47,7 @@ void poisson_cuda_kernel( at::PhiloxCudaState philox_args) { auto functor = [philox_args] __device__( scalar_t & ret_val, const scalar_t& lambda) { + CUDA_KERNEL_ASSERT(lambda >= 0 && "invalid Poisson rate, expected rate to be non-negative"); auto seeds = at::cuda::philox::unpack(philox_args); curandStatePhilox4_32_10_t state; curand_init(std::get<0>(seeds), diff --git a/aten/src/ATen/native/cuda/EmbeddingBag.cu b/aten/src/ATen/native/cuda/EmbeddingBag.cu index 7ac3a7151b79..2cd76cbe34d1 100644 --- a/aten/src/ATen/native/cuda/EmbeddingBag.cu +++ b/aten/src/ATen/native/cuda/EmbeddingBag.cu @@ -26,6 +26,7 @@ #include #include #include +#include #include @@ -457,14 +458,6 @@ Tensor _embedding_bag_dense_backward_cuda(const Tensor &grad_, const Tensor &ind } } -template -__inline__ __device__ -static scalar_t warpReduceSum(scalar_t val) { - for (int offset = C10_WARP_SIZE/2; offset > 0; offset /= 2) - val += WARP_SHFL_DOWN(val, offset); - return val; -} - template __global__ static void _embedding_bag_per_sample_weights_backward_kernel( const scalar_t* grad, int64_t grad_stride0, int64_t grad_stride1, @@ -495,7 +488,7 @@ __global__ static void _embedding_bag_per_sample_weights_backward_kernel( weight[weight_stride0 * embedding_idx + weight_stride1 * feature_idx]; } } - result = warpReduceSum(result); + result = cuda_utils::WarpReduceSum(result); if (thread_in_warp == 0) { output[sample_idx] = result; } diff --git a/aten/src/ATen/native/cuda/ForeachFunctors.cuh b/aten/src/ATen/native/cuda/ForeachFunctors.cuh index 8a16534cec3f..a72c33ac6960 100644 --- a/aten/src/ATen/native/cuda/ForeachFunctors.cuh +++ b/aten/src/ATen/native/cuda/ForeachFunctors.cuh 
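Both DistanceKernel.cu and EmbeddingBag.cu above drop their hand-rolled reductions (reduce_agg, warpReduceSum) in favor of the shared cuda_utils::BlockReduce / cuda_utils::WarpReduceSum helpers. The core idea those helpers build on is a warp-level shuffle reduction; a sketch of just that step, assuming a full warp of active threads (the real utilities additionally combine per-warp results through shared memory):

// Device-side sketch only: fold values across a warp with shuffles, no shared memory needed.
__device__ float warp_sum_sketch(float val) {
  // After each step, lane i also holds the contribution of lane i + offset;
  // after log2(warpSize) steps, lane 0 holds the sum of the whole warp.
  for (int offset = warpSize / 2; offset > 0; offset /= 2) {
    val += __shfl_down_sync(0xffffffff, val, offset);
  }
  return val;
}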
@@ -47,6 +47,25 @@ __device__ bool init_args( return all_aligned; } +template +__device__ bool init_args( + T** args, + FusedOptimizerTensorListMetadata& tl, + int chunk_idx, + int chunk_size, + int tensor_loc) { + bool all_aligned = true; + for (int i = 0; i < depth; i++) { + args[i] = (T*)tl.addresses[i][tensor_loc]; + args[i] += chunk_idx * chunk_size; + + if (!is_aligned(args[i])) { + all_aligned = false; + } + } + return all_aligned; +} + template __device__ void load_args(T r_args[][kILP], T** args, int i_start, int chunk_size, int n) { #pragma unroll diff --git a/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu b/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu index 3b04b68b0f39..27b3d77ad4d6 100644 --- a/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu +++ b/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu @@ -160,10 +160,45 @@ void foreach_tensor_##NAME##_scalarlist_cuda_(TensorList input, TensorList tenso foreach_pointwise_op_(input, tensors1, tensors2, scalars); \ } +#define FOREACH_POINTWISE_OP_TENSOR(NAME, OP) \ + std::vector foreach_tensor_##NAME##_tensor_cuda( \ + TensorList input, \ + TensorList tensors1, \ + TensorList tensors2, \ + const Tensor& scalars_) { \ + auto scalars = convert_tensor_to_scalar_list(scalars_, input.size()); \ + check_foreach_api_restrictions(input, tensors1, tensors2, scalars); \ + if (!can_use_fast_route({input, tensors1, tensors2}) || \ + has_integral_tensor(input, /* includeBool */ true)) { \ + return at::native::foreach_tensor_##NAME##_scalarlist_slow( \ + input, tensors1, tensors2, scalars); \ + } \ + \ + return foreach_pointwise_op(input, tensors1, tensors2, scalars); \ + } \ + \ + void foreach_tensor_##NAME##_tensor_cuda_( \ + TensorList input, \ + TensorList tensors1, \ + TensorList tensors2, \ + const Tensor& scalars_) { \ + auto scalars = convert_tensor_to_scalar_list(scalars_, input.size()); \ + check_foreach_api_restrictions(input, tensors1, tensors2, scalars); \ + if (!can_use_fast_route({input, tensors1, tensors2}, scalars) || \ + has_integral_tensor(input, /* includeBool */ true)) { \ + return at::native::foreach_tensor_##NAME##_scalarlist_slow_( \ + input, tensors1, tensors2, scalars); \ + } \ + \ + foreach_pointwise_op_(input, tensors1, tensors2, scalars); \ + } + FOREACH_POINTWISE_OP_SCALAR(addcmul, std::multiplies); FOREACH_POINTWISE_OP_SCALAR(addcdiv, std::divides); FOREACH_POINTWISE_OP_SCALARLIST(addcmul, std::multiplies); FOREACH_POINTWISE_OP_SCALARLIST(addcdiv, std::divides); +FOREACH_POINTWISE_OP_TENSOR(addcdiv, std::divides); +FOREACH_POINTWISE_OP_TENSOR(addcmul, std::multiplies); // Why bool tensors are pushed to slowpath? 
diff --git a/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu b/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu index 46ea4eadf1fe..24db8776cd49 100644 --- a/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu +++ b/aten/src/ATen/native/cuda/FractionalMaxPool2d.cu @@ -185,10 +185,10 @@ TORCH_IMPL_FUNC(fractional_max_pool2d_out_cuda) ( AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "fractional_max_pool2d_out_cuda_frame", [&] { - auto devInput = input_.packed_accessor(); - auto devOutput = output_.packed_accessor(); - auto devIndices = indices_.packed_accessor(); - auto devSamples = randomSamples.packed_accessor(); + auto devInput = input_.packed_accessor64(); + auto devOutput = output_.packed_accessor64(); + auto devIndices = indices_.packed_accessor64(); + auto devSamples = randomSamples.packed_accessor64(); fractional_max_pool2d_out_cuda_frame <<>>( devOutput, devIndices, devInput, devSamples, @@ -253,12 +253,12 @@ TORCH_IMPL_FUNC(fractional_max_pool2d_backward_cuda)( gradInput_.size(0)); dim3 block(outputPlaneSize > 128 ? 128 : outputPlaneSize); - auto devIndices = indices_.packed_accessor(); + auto devIndices = indices_.packed_accessor64(); AT_DISPATCH_FLOATING_TYPES_AND_HALF(gradOutput.scalar_type(), "fractional_max_pool2d_backward_out_cuda_frame", [&] { - auto devGradInput = gradInput_.packed_accessor(); - auto devGradOutput = gradOutput_.packed_accessor(); + auto devGradInput = gradInput_.packed_accessor64(); + auto devGradOutput = gradOutput_.packed_accessor64(); fractional_max_pool2d_backward_out_cuda_frame <<>>( devGradInput, devGradOutput, devIndices); diff --git a/aten/src/ATen/native/cuda/FusedAdamKernel.cu b/aten/src/ATen/native/cuda/FusedAdamKernel.cu new file mode 100644 index 000000000000..d35c44df219c --- /dev/null +++ b/aten/src/ATen/native/cuda/FusedAdamKernel.cu @@ -0,0 +1,45 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include +#include + + +namespace at { namespace native { + +// note(crcrpar): To observe the CI rules, i.e. 20 minutes per file to compile, defensively split instantiations into _impl files. +// this is only for CUDA 11.3 for which it took about 20 minutes and 28 minutes in my workstation and CI, respectively. +// As a data point, it took about 20 seconds for CUDA 11.7 installed in my environment. +// See https://github.com/pytorch/pytorch/pull/81705 for details. 
+void _fused_adam_kernel_cuda_( + at::TensorList params, + at::TensorList grads, + at::TensorList exp_avgs, + at::TensorList exp_avg_sqs, + at::TensorList max_exp_avg_sqs, + at::TensorList state_steps, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool amsgrad, + const bool maximize, + const c10::optional& grad_scale, + const c10::optional& found_inf +) { + if (amsgrad) { + TORCH_CHECK( + at::native::check_fast_path_restrictions({params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs}), + "params, grads, exp_avgs, exp_avg_sqs, and max_exp_avg_sqs must have same dtype, device, and layout"); + _fused_adam_cuda_impl_(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, lr, beta1, beta2, weight_decay, eps, amsgrad, maximize, grad_scale, found_inf); + } else { + TORCH_CHECK( + at::native::check_fast_path_restrictions({params, grads, exp_avgs, exp_avg_sqs}), + "params, grads, exp_avgs, and exp_avg_sqs must have same dtype, device, and layout"); + _fused_adam_cuda_impl_(params, grads, exp_avgs, exp_avg_sqs, state_steps, lr, beta1, beta2, weight_decay, eps, amsgrad, maximize, grad_scale, found_inf); + } +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/GridSampler.cu b/aten/src/ATen/native/cuda/GridSampler.cu index bfc3d86b8ab9..8aae51499e03 100644 --- a/aten/src/ATen/native/cuda/GridSampler.cu +++ b/aten/src/ATen/native/cuda/GridSampler.cu @@ -96,8 +96,8 @@ namespace { } } } else if (interpolation_mode == GridSamplerInterpolation::Nearest) { - index_t ix_nearest = static_cast(::round(ix)); - index_t iy_nearest = static_cast(::round(iy)); + index_t ix_nearest = static_cast(::nearbyint(ix)); + index_t iy_nearest = static_cast(::nearbyint(iy)); // assign nearest neighor pixel value to output pixel auto inp_ptr_NC = input.data + n * inp_sN; diff --git a/aten/src/ATen/native/cuda/Im2Col.cu b/aten/src/ATen/native/cuda/Im2Col.cu index 89b2a1879b4b..a209aa276463 100644 --- a/aten/src/ATen/native/cuda/Im2Col.cu +++ b/aten/src/ATen/native/cuda/Im2Col.cu @@ -18,7 +18,6 @@ #include #include #include -#include #endif namespace at { @@ -103,10 +102,9 @@ static void im2col_out_cuda_template( int64_t output_length = output_height * output_width; output.resize_({batch_size, n_output_plane, output_length}); - output.zero_(); // Launch kernel - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "im2col_out_cuda", [&] { Tensor input_n; Tensor output_n; @@ -140,29 +138,6 @@ static void im2col_out_cuda_template( }); } -static void im2col_backward_out_cuda_template( - Tensor& grad_input, - const Tensor& grad_output, - IntArrayRef input_size, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - TORCH_CHECK( - input_size.size() == 2, - "It is expected input_size equals to 2, but got size ", - input_size.size()); - // col2im_out_cuda checks size of kernel_size, dilation, padding and stride - at::native::col2im_out_cuda( - grad_output, - input_size, - kernel_size, - dilation, - padding, - stride, - grad_input); -} - } // namespace Tensor& im2col_out_cuda(const Tensor& input, @@ -188,42 +163,5 @@ Tensor im2col_cuda( return output; } -Tensor& im2col_backward_out_cuda(const Tensor& grad_output, - IntArrayRef input_size, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride, - Tensor& grad_input) { - im2col_backward_out_cuda_template( - grad_input, 
- grad_output, - input_size, - kernel_size, - dilation, - padding, - stride); - return grad_input; -} - -Tensor im2col_backward_cuda( - const Tensor& grad_output, - IntArrayRef input_size, - IntArrayRef kernel_size, - IntArrayRef dilation, - IntArrayRef padding, - IntArrayRef stride) { - Tensor grad_input = at::empty_like(grad_output, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - im2col_backward_out_cuda_template( - grad_input, - grad_output, - input_size, - kernel_size, - dilation, - padding, - stride); - return grad_input; -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/cuda/IndexKernel.cu b/aten/src/ATen/native/cuda/IndexKernel.cu index dee39b40e91e..f23c2dc3b387 100644 --- a/aten/src/ATen/native/cuda/IndexKernel.cu +++ b/aten/src/ATen/native/cuda/IndexKernel.cu @@ -12,6 +12,7 @@ #include #include #include +#include #include @@ -239,6 +240,21 @@ static void index_put_kernel(TensorIterator& iter, IntArrayRef index_size, IntAr }); } +void index_put_kernel_quantized_cuda(TensorIterator& iter, IntArrayRef index_size, IntArrayRef index_stride, bool accumulate, double scale, int zero_point) { + TORCH_CHECK(!accumulate, "index_put does not support accumulate=true"); + AT_DISPATCH_QINT_AND_SUB_BYTE_TYPES(iter.dtype(), "index_put", [&] { + constexpr int64_t qmin = std::numeric_limits::min(); + constexpr int64_t qmax = std::numeric_limits::max(); + float inv_scale = 1.0f / static_cast(scale); + + gpu_index_kernel(iter, index_size, index_stride, [inv_scale, zero_point, qmin, qmax]C10_DEVICE(char* out_data, char* in_data, int64_t offset) { + int64_t qvalue = static_cast(zero_point + nearbyintf(*(float*)in_data * inv_scale)); + qvalue = min(max(qvalue, qmin), qmax); + *(scalar_t*)(out_data + offset) = static_cast(qvalue); + }); + }); +} + template void cuda_take_put_kernel( TensorIterator& iter, @@ -451,4 +467,6 @@ REGISTER_DISPATCH(put_stub, &put_kernel); REGISTER_DISPATCH(take_stub, &take_kernel); REGISTER_DISPATCH(flip_stub, &flip_kernel); +REGISTER_CUDA_DISPATCH(index_put_kernel_quantized_stub, &index_put_kernel_quantized_cuda); + }} // namespace at::native diff --git a/aten/src/ATen/native/cuda/Indexing.cu b/aten/src/ATen/native/cuda/Indexing.cu index 6ea88069ca2e..9140e2ada8a3 100644 --- a/aten/src/ATen/native/cuda/Indexing.cu +++ b/aten/src/ATen/native/cuda/Indexing.cu @@ -1,6 +1,7 @@ #define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include +#include #include #include @@ -35,13 +36,13 @@ #include #include #include +#include #include #include namespace { - template __global__ void indexing_backward_kernel( int64_t* sorted_indices, int64_t* indices, scalar_t* grad_output, scalar_t* grad_weight, @@ -120,6 +121,65 @@ __global__ void indexing_backward_kernel( } } +template +__global__ void indexing_backward_kernel_quantized( + int64_t* sorted_indices, int64_t* indices, float* grad_output, scalar_t* grad_weight, + int64_t numel, int64_t stride, int64_t stride_before, int64_t outer_dim, + float inv_scale, int zero_point, int64_t qmin, int64_t qmax) { + + // This implementation is adopted from indexing_backward_kernel above. 
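index_put_kernel_quantized_cuda above and indexing_backward_kernel_quantized (whose body continues below) both write quantized values with the same affine step: q = clamp(round(x / scale) + zero_point, qmin, qmax), with the division precomputed as a multiplication by inv_scale. A host-side sketch of that single step (illustrative only):

#include <algorithm>
#include <cmath>
#include <cstdint>

int64_t quantize_value_sketch(float x, float scale, int zero_point, int64_t qmin, int64_t qmax) {
  const float inv_scale = 1.0f / scale;                       // multiply instead of divide in the hot loop
  const int64_t q = zero_point + static_cast<int64_t>(std::nearbyint(x * inv_scale));
  return std::min(std::max(q, qmin), qmax);                   // clamp to the representable range
}

// Example: scale = 0.5, zero_point = 10, qmin = -128, qmax = 127
//   quantize_value_sketch(3.2f, 0.5f, 10, -128, 127) == 16   (3.2 / 0.5 rounds to 6, plus 10)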
+ using opmath_t = at::opmath_type; + for (int64_t z = blockIdx.z; z < outer_dim; z += gridDim.z){ + int64_t idx = blockIdx.x * blockDim.y + threadIdx.y; + if (idx < numel + && (idx == 0 || sorted_indices[idx] != sorted_indices[idx - 1])){ + do { + int64_t start_feature = threadIdx.x + blockIdx.y * blockDim.x * SZ; + // we only keep the last duplicate index so skip those before it + if ((idx < numel - 1) && sorted_indices[idx] == sorted_indices[idx + 1]) { + idx++; + continue; + } + const int64_t weight_row = ((int64_t) sorted_indices[idx]) * stride + z * stride_before; + const int64_t grad_row = ((int64_t) indices[idx]) * stride + z * numel * stride; + const opmath_t scale = (opmath_t)1.0; + + opmath_t gradient[SZ]; + opmath_t weight[SZ]; + + while (start_feature < stride) { + #pragma unroll + for (int ii = 0; ii < SZ; ii++) { + int64_t feature_dim = start_feature + ii * C10_WARP_SIZE; + if (feature_dim < stride) { + gradient[ii] = static_cast(grad_output[grad_row + feature_dim]); + } + } + + #pragma unroll + for (int ii = 0; ii < SZ; ii++) { + weight[ii] = gradient[ii] * scale; + } + + #pragma unroll + for (int ii = 0; ii < SZ; ii++) { + int64_t feature_dim = start_feature + ii * C10_WARP_SIZE; + if (feature_dim < stride) { + // we do quantization here + int64_t qvalue = static_cast(zero_point + nearbyintf(weight[ii]* inv_scale)); + qvalue = min(max(qvalue, qmin), qmax); + grad_weight[weight_row + feature_dim] = static_cast(qvalue); + } + } + start_feature += gridDim.y * blockDim.x * SZ; + } + + idx++; + } while (idx < numel && sorted_indices[idx] == sorted_indices[idx - 1]); + } + } +} + } @@ -231,9 +291,14 @@ computeLinearIndex(const Tensor & src, TensorList indices, bool check_range) { static std::tuple> makeLinearIndex(Tensor self, IOptTensorListRef orig, bool check_range) { - checkIndexTensorTypes(orig); + checkIndexTensorTypes(orig, /*allow_int*/true); // first expand BoolTensor (masks) or ByteTensor (masks) into 1 or more LongTensors auto indices = expandTensors(self, orig); + for (auto & i : indices) { + if (i.defined() && i.dtype() == at::kInt) { + i = i.to(at::kLong); + } + } // next broadcast all index tensors together indices = expand_outplace(indices); // add missing null Tensors so that it matches self.dim() @@ -357,6 +422,106 @@ void index_put_with_sort_kernel(Tensor & self, const c10::List>& indices, const Tensor & value, double scale, int zero_point, bool unsafe) { + if (indices.size() > (size_t)self.dim()) { + TORCH_CHECK_INDEX(false, "too many indices for tensor of dimension ", self.dim(), " (got ", indices.size(), ")"); + } + bool self_contiguous = self.is_contiguous(); + auto self_ = self_contiguous ? 
self : self.contiguous(); + Tensor linearIndex, src, expandedValue = value; + int64_t nElemBefore, strideBefore, sliceSize; + std::vector inversePerm; + std::tie(linearIndex, src, nElemBefore, strideBefore, sliceSize, inversePerm) = makeLinearIndex(self_, indices, !unsafe); + int64_t num_indices = linearIndex.numel(); + + if (expandedValue.numel() < num_indices * nElemBefore * sliceSize) { + auto expanded_size = at::DimVector(expandedValue.sizes()); + auto size1 = expandedValue.sizes(); + auto size2 = linearIndex.sizes(); + if (are_expandable(size1, size2)) { + expanded_size = infer_size_dimvector(size1, size2); + } + if (nElemBefore > 1) { + expanded_size.insert(expanded_size.begin(), nElemBefore); + } + expandedValue = expandedValue.expand(expanded_size); + } + expandedValue = expandedValue.contiguous(); + + if (num_indices > 0 && sliceSize > 0) { + const bool permuted = !src.is_contiguous(); + auto src_ = permuted ? src.contiguous() : src; + linearIndex = linearIndex.reshape(-1); + auto sorted_indices = at::empty_like(linearIndex, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + auto orig_indices = at::empty_like(linearIndex, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + const cudaStream_t stream = at::cuda::getCurrentCUDAStream(); + + linearIndex.divide_(sliceSize, "trunc"); + + // cub on CUDA <= 11.2 have a bug that for small sizes + // cub's sort can be much slower than thrust's merge sort + // this bug is fixed in CUDA 11.3 +#if (defined(CUDA_VERSION) && CUDA_VERSION < 11030) || defined(USE_ROCM) + if (num_indices < 50000) { + index_put_with_sort_kernel_thrust_helper(linearIndex, orig_indices, sorted_indices, num_indices); + } else +#endif + { + // Sort the inputs into sorted with the corresponding indices + auto range = at::arange(num_indices, linearIndex.options()); + // linearIndex can not be negative, and we take advantage of this + // fact to sort on less bits for better performance. 
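The comment above is the reason the sort can be cheaper: the keys are non-negative and bounded by the largest flattened index divided by sliceSize, so the radix sort just below is told to process only the bits that can actually be set (the nbits argument obtained from cuda::cub::get_num_bits). A sketch of that bit-count computation:

#include <cstdint>

// Smallest number of bits needed to represent every key in [0, max_key].
int num_significant_bits_sketch(uint64_t max_key) {
  int nbits = 1;                          // even max_key == 0 still needs one bit
  while ((max_key >> nbits) != 0) {
    ++nbits;
  }
  return nbits;                           // e.g. max_key = 1000 -> 10 bits instead of 64
}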
+ int64_t nbits = cuda::cub::get_num_bits(largestIndex(self_) / sliceSize); + cuda::cub::radix_sort_pairs( + linearIndex.data_ptr(), sorted_indices.data_ptr(), + range.data_ptr(), orig_indices.data_ptr(), + num_indices, false, 0, nbits); + } + + TORCH_INTERNAL_ASSERT( + linearIndex.numel()*sliceSize*nElemBefore == expandedValue.numel(), + "number of flattened indices did not match number of elements in the value tensor: ", + linearIndex.numel()*sliceSize*nElemBefore, " vs ", expandedValue.numel()); + const int UNROLL = 4; + const int indices_per_block = 4; + const int warp_size = at::cuda::warp_size(); + dim3 grid(ceil_div(num_indices, (int64_t) indices_per_block), + std::min(at::cuda::getCurrentDeviceProperties()->maxGridSize[1], ceil_div(sliceSize, (int64_t) (warp_size*UNROLL))), + std::min(std::max(1,nElemBefore), at::cuda::getCurrentDeviceProperties()->maxGridSize[2])); + dim3 block(warp_size, indices_per_block); + + AT_DISPATCH_QINT_TYPES( + src.scalar_type(), "indexing_backward_quantized", [&] { + constexpr int64_t qmin = std::numeric_limits::min(); + constexpr int64_t qmax = std::numeric_limits::max(); + float inv_scale = 1.0f / static_cast(scale); + + indexing_backward_kernel_quantized<<>>( + sorted_indices.data_ptr(), + orig_indices.data_ptr(), + expandedValue.data_ptr(), + src_.data_ptr(), + num_indices, + sliceSize, + strideBefore, + nElemBefore, + inv_scale, + zero_point, + qmin, + qmax); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + }); + + if (permuted) { + self.copy_(src_.permute(inversePerm)); + } else if (!self_contiguous) { + self.copy_(self_); + } + } +} + +REGISTER_CUDA_DISPATCH(index_put_with_sort_quantized_stub, &index_put_with_sort_quantized); } //anonymous @@ -1215,6 +1380,35 @@ void masked_fill_kernel(TensorIterator& iter, const Scalar& value) { }); } +template +void cuda_masked_fill_kernel_quantized(TensorIterator& iter, scalar_t quantized_val) { + gpu_kernel( + iter, [quantized_val] GPU_LAMBDA(scalar_t self, mask_t mask) -> scalar_t { + if (mask) { + return quantized_val; + } + return self; + }); +} + +void masked_fill_kernel_quantized(TensorIterator& iter, const Scalar& value, double scale, int zero_point) { + AT_DISPATCH_QINT_TYPES( + iter.common_dtype(), "masked_fill_", [&]() { + float float_val = value.to(); + const auto quantized_val = quantize_val(scale, zero_point, float_val); + auto mask_dtype = iter.input_dtype(0); + + if (mask_dtype == at::ScalarType::Bool) { + cuda_masked_fill_kernel_quantized(iter, quantized_val); + } + else { + cuda_masked_fill_kernel_quantized(iter, quantized_val); + } + }); +} + +REGISTER_CUDA_DISPATCH(masked_fill_kernel_quantized_stub, &masked_fill_kernel_quantized); + } // anonymous namespace Tensor & masked_fill__cuda(Tensor& self, const Tensor & mask, const Scalar& value) { diff --git a/aten/src/ATen/native/cuda/JitLoops.cuh b/aten/src/ATen/native/cuda/JitLoops.cuh index bb37a6acc2e1..6f350c550ce9 100644 --- a/aten/src/ATen/native/cuda/JitLoops.cuh +++ b/aten/src/ATen/native/cuda/JitLoops.cuh @@ -12,11 +12,7 @@ #include -#if !AT_ROCM_ENABLED() #include -#else -#error Jiterator not supported on ROCm -#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/cuda/KernelUtils.cuh b/aten/src/ATen/native/cuda/KernelUtils.cuh index 1e36e2db74d5..d2e956d1a3e4 100644 --- a/aten/src/ATen/native/cuda/KernelUtils.cuh +++ b/aten/src/ATen/native/cuda/KernelUtils.cuh @@ -1,6 +1,10 @@ #pragma once #include +#if !(defined(USE_ROCM) || ((defined(CUDA_VERSION) && CUDA_VERSION < 11000) || (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800)))) 
+#include +#endif + namespace at { namespace native { @@ -66,7 +70,49 @@ __device__ __forceinline__ void fastSpecializedAtomicAdd( template < typename scalar_t, typename index_t, - typename std::enable_if::value>::type* = + typename std::enable_if::value>::type* = + nullptr> +__device__ __forceinline__ void fastSpecializedAtomicAdd( + scalar_t* tensor, + index_t index, + const index_t numel, + scalar_t value) { +#if ( \ + (defined(USE_ROCM)) || \ + (defined(CUDA_VERSION) && (CUDA_VERSION < 11000)) || \ + (defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 800))) + gpuAtomicAddNoReturn( + reinterpret_cast(tensor) + index, + static_cast(value)); +#else + // Accounts for the chance tensor falls on an odd 16 bit alignment (ie, not 32 bit aligned) + __nv_bfloat16* target_addr = reinterpret_cast<__nv_bfloat16*>(tensor + index); + bool low_byte = (reinterpret_cast(target_addr) % sizeof(__nv_bfloat162) == 0); + + if (low_byte && index < (numel - 1)) { + __nv_bfloat162 value2; + value2.x = *reinterpret_cast<__nv_bfloat16*>(&value); + value2.y = __int2bfloat16_rz(0); + atomicAdd(reinterpret_cast<__nv_bfloat162*>(target_addr), value2); + + } else if (!low_byte && index > 0) { + __nv_bfloat162 value2; + value2.x = __int2bfloat16_rz(0); + value2.y = *reinterpret_cast<__nv_bfloat16*>(&value); + atomicAdd(reinterpret_cast<__nv_bfloat162*>(target_addr - 1), value2); + + } else { + atomicAdd( + reinterpret_cast<__nv_bfloat16*>(tensor) + index, *reinterpret_cast<__nv_bfloat16*>(&value)); + } +#endif +} + + +template < + typename scalar_t, + typename index_t, + typename std::enable_if::value && !std::is_same::value >::type* = nullptr> __device__ __forceinline__ void fastSpecializedAtomicAdd( scalar_t* tensor, diff --git a/aten/src/ATen/native/cuda/Lerp.cu b/aten/src/ATen/native/cuda/Lerp.cu index ac1f2ba379b5..c1adb5b6fc03 100644 --- a/aten/src/ATen/native/cuda/Lerp.cu +++ b/aten/src/ATen/native/cuda/Lerp.cu @@ -14,23 +14,13 @@ void lerp_tensor_kernel(at::TensorIteratorBase& iter) { at::ScalarType::Half, at::ScalarType::BFloat16, iter.common_dtype(), "lerp_cuda", [&] { - using opmath_t = at::opmath_type; at::native::gpu_kernel( iter, [] GPU_LAMBDA( scalar_t self_val, scalar_t end_val, scalar_t weight_val) -> scalar_t { - opmath_t self_val_f = self_val; - opmath_t end_val_f = end_val; - opmath_t weight_val_f = weight_val; - // Conditional for better numeric. This has been discussed in - // https://github.com/pytorch/pytorch/pull/18871 - return (std::abs(weight_val_f) < 0.5) - ? self_val_f + weight_val_f * (end_val_f - self_val_f) - : end_val_f - - (end_val_f - self_val_f) * - (opmath_t{1} - weight_val_f); + return lerp(self_val, end_val, weight_val); }); }); } @@ -44,14 +34,7 @@ void lerp_scalar_kernel(at::TensorIteratorBase& iter, const c10::Scalar& weight) auto weight_val = weight.to(); at::native::gpu_kernel( iter, [=] GPU_LAMBDA(scalar_t self_val, scalar_t end_val) { - opmath_t self_val_f = self_val; - opmath_t end_val_f = end_val; - // Conditional for better numeric. This has been discussed in - // https://github.com/pytorch/pytorch/pull/18871 - return (std::abs(weight_val) < 0.5) - ? 
self_val_f + weight_val * (end_val_f - self_val_f) - : end_val_f - - (end_val_f - self_val_f) * (opmath_t{1} - weight_val); + return lerp(self_val, end_val, weight_val); }); }); } diff --git a/aten/src/ATen/native/cuda/LinearAlgebra.cu b/aten/src/ATen/native/cuda/LinearAlgebra.cu index 280a5046ef06..ae6901a361af 100644 --- a/aten/src/ATen/native/cuda/LinearAlgebra.cu +++ b/aten/src/ATen/native/cuda/LinearAlgebra.cu @@ -101,14 +101,14 @@ static void _launch_kernel(int total_n_elems, func_t f) { C10_CUDA_KERNEL_LAUNCH_CHECK(); } -void unpack_pivots_cuda_kernel(TensorIterator& iter, const int64_t dim_size) { +void unpack_pivots_cuda_kernel(TensorIterator& iter, const int64_t dim_size, const int64_t max_pivot) { if (iter.numel() == 0) { return; } if (!iter.can_use_32bit_indexing()) { for (auto& sub_iter : iter.with_32bit_indexing()) { - unpack_pivots_cuda_kernel(sub_iter, dim_size); + unpack_pivots_cuda_kernel(sub_iter, dim_size, max_pivot); } return; } diff --git a/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp b/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp index 913e30b77c0f..655090d28e63 100644 --- a/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp +++ b/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp @@ -34,9 +34,7 @@ namespace native { #if defined(BUILD_LAZY_CUDA_LINALG) namespace { cuda::detail::LinalgDispatch disp = {_symeig_helper_cuda, - _cholesky_solve_helper_cuda, - legacy_lstsq_cuda, - _linalg_inv_out_helper_cuda}; + _cholesky_solve_helper_cuda}; at::DynamicLibrary& getTorchLinalgLibrary() { static at::DynamicLibrary lib("libtorch_cuda_linalg.so", nullptr, true); @@ -94,11 +92,6 @@ void lazy_linalg_eigh_kernel(const Tensor& eigenvalues, const Tensor& eigenvecto linalg_eigh_stub(DeviceType::CUDA, eigenvalues, eigenvectors, infos, upper, compute_eigenvectors); } -std::tuple lazy_eig_kernel(const Tensor& self, bool& eigenvectors) { - loadLazyTorchLinalgLibrary(); - return eig_stub(DeviceType::CUDA, self, eigenvectors); -} - void lazy_linalg_eig_kernel(Tensor& eigenvalues, Tensor& eigenvectors, Tensor& infos, const Tensor& input, bool compute_eigenvectors) { getTorchLinalgLibrary(); linalg_eig_stub(DeviceType::CUDA, eigenvalues, eigenvectors, infos, input, compute_eigenvectors); @@ -156,7 +149,6 @@ REGISTER_CUDA_DISPATCH(orgqr_stub, &lazy_orgqr_kernel); REGISTER_CUDA_DISPATCH(ormqr_stub, &lazy_ormqr_kernel); REGISTER_CUDA_DISPATCH(geqrf_stub, &lazy_geqrf_kernel); REGISTER_CUDA_DISPATCH(linalg_eigh_stub, &lazy_linalg_eigh_kernel); -REGISTER_CUDA_DISPATCH(eig_stub, &lazy_eig_kernel); REGISTER_CUDA_DISPATCH(linalg_eig_stub, &lazy_linalg_eig_kernel); REGISTER_CUDA_DISPATCH(svd_stub, &lazy_svd_kernel) REGISTER_CUDA_DISPATCH(lu_solve_stub, &lazy_lu_solve); @@ -177,18 +169,6 @@ void registerLinalgDispatch(const LinalgDispatch& disp_) { } }} //namespace cuda::detail -Tensor& _linalg_inv_out_helper_cuda(Tensor &result, Tensor& infos_lu, Tensor& infos_getri) { - getTorchLinalgLibrary(); - TORCH_CHECK(disp.inv_out_helper != _linalg_inv_out_helper_cuda, "Can't find _linalg_inv_out_helper_cuda"); - return disp.inv_out_helper(result, infos_lu, infos_getri); -} - -std::tuple legacy_lstsq_cuda(const Tensor &B, const Tensor &A) { - getTorchLinalgLibrary(); - TORCH_CHECK(disp.legacy_lstsq != legacy_lstsq_cuda, "Can't find legacy_lstsq_cuda"); - return disp.legacy_lstsq(B, A); -} - Tensor _cholesky_solve_helper_cuda(const Tensor& self, const Tensor& A, bool upper) { getTorchLinalgLibrary(); TORCH_CHECK(disp.cholesky_solve_helper != _cholesky_solve_helper_cuda, "Can't find 
_cholesky_solve_helper_cuda"); @@ -203,22 +183,4 @@ std::tuple _symeig_helper_cuda(const Tensor& self, bool eigenvec #endif /*defined(BUILD_LAZY_CUDA_LINALG)*/ -std::tuple legacy_lstsq_out_cuda( - const Tensor& B, const Tensor& A, Tensor& B_out, Tensor& A_out) { - const auto dtype = A.scalar_type(); - TORCH_CHECK(B.scalar_type() == dtype, "exepected A and B dtypes to match but found ", - A.scalar_type(), " and ", B.scalar_type()); - TORCH_CHECK(A_out.scalar_type() == dtype, "A_out to have scalar type ", dtype, - " but found", A_out.scalar_type()); - TORCH_CHECK(B_out.scalar_type() == dtype, "A_out to have scalar type ", dtype, - " but found", B_out.scalar_type()); - Tensor A_tmp, B_tmp; - std::tie(B_tmp, A_tmp) = native::legacy_lstsq_cuda(B, A); - resize_output(A_out, A_tmp.sizes()); - A_out.copy_(A_tmp); - resize_output(B_out, B_tmp.sizes()); - B_out.copy_(B_tmp); - return std::tuple(B_out, A_out); -} - }} // namespace at::native diff --git a/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu b/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu new file mode 100644 index 000000000000..28b3236caa2d --- /dev/null +++ b/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu @@ -0,0 +1,37 @@ +#define TORCH_ASSERT_NO_OPERATORS +#include +#include +#include + +#include +#include + +#include +#include + +namespace at { namespace native { + +void launch_logcumsumexp_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { + AT_DISPATCH_FLOATING_TYPES_AND2( + ScalarType::Half, ScalarType::BFloat16, + self.scalar_type(), "logcumsumexp_cuda", + [&]() { + using opmath_t = at::opmath_type; + scalar_t init = -std::numeric_limits::infinity(); + auto log_add_exp = [] C10_HOST_DEVICE (const scalar_t x_, const scalar_t y_) -> scalar_t { + const opmath_t x{x_}, y{y_}; + auto min = at::_isnan(y) ? y : std::min(x, y); //std::min returns first arg if one of the args is nan + auto max = at::_isnan(y) ? y : std::max(x, y); //std::max returns first arg if one of the args is nan + if (min != max || ::isfinite(min)) { + // nan will be propagated here + return ::log1p(std::exp(min - max)) + max; + } else { + // special case to correctly handle infinite inputs + return x; + } + }; + scan_dim(self, result, dim, init, log_add_exp); + }); +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/Loss.cu b/aten/src/ATen/native/cuda/Loss.cu index fcb3229198ab..f1cda14a16a2 100644 --- a/aten/src/ATen/native/cuda/Loss.cu +++ b/aten/src/ATen/native/cuda/Loss.cu @@ -152,6 +152,7 @@ namespace { constexpr int NLL_LOSS_THREADS = 32; +// NOTE(crcrpar): `Byte` support was added for https://github.com/pytorch/pytorch/issues/59765. #define AT_DISPATCH_NLL_LOSS_INDEX_TYPES(TYPE, NAME, ...) 
\ AT_DISPATCH_SWITCH(TYPE, NAME, \ AT_PRIVATE_CASE_TYPE_USING_HINT(at::ScalarType::Byte, index_t, __VA_ARGS__) \ @@ -164,10 +165,10 @@ __global__ void nll_loss_forward_no_reduce_cuda_kernel( index_t* target, scalar_t* output, scalar_t* weights, - int n_classes, - int ignore_index) { + int64_t n_classes, + int64_t ignore_index) { CUDA_KERNEL_LOOP(index, batch_size) { - int cur_target = target[index]; + index_t cur_target = target[index]; if (cur_target == ignore_index) { output[index] = static_cast(0); continue; @@ -187,12 +188,12 @@ __global__ void nll_loss_forward_reduce_cuda_kernel_1d( index_t* target, scalar_t* weights, bool size_average, - int n_classes, + int64_t n_classes, int64_t ignore_index) { CUDA_KERNEL_ASSERT(threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0); - int t = static_cast(*target); - if (t != static_cast(ignore_index)) { + const index_t t = *target; + if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); const auto cur_weight = weights != nullptr ? weights[t] : scalar_t{1}; *total_weight = cur_weight; @@ -223,9 +224,9 @@ __global__ void nll_loss_forward_reduce_cuda_kernel_2d( index_t* target, scalar_t* weights, bool size_average, - int nframe, - int ndim, - int n_classes, + int64_t nframe, + int64_t ndim, + int64_t n_classes, int64_t ignore_index) { // NOLINTNEXTLINE(cppcoreguidelines-init-variables) __shared__ accscalar_t sh_inputs[NLL_LOSS_THREADS], @@ -234,8 +235,8 @@ __global__ void nll_loss_forward_reduce_cuda_kernel_2d( sh_inputs[threadIdx.x] = static_cast(0); acc_weight[threadIdx.x] = static_cast(0); for (int i = threadIdx.x; i < nframe; i += NLL_LOSS_THREADS) { - int t = target[i]; - if (t != static_cast(ignore_index)) { + index_t t = target[i]; + if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); scalar_t cur_weight = weights != nullptr ? weights[t] : static_cast(1); @@ -400,11 +401,11 @@ __global__ void nll_loss_backward_no_reduce_cuda_kernel( PackedTensorAccessor64 grad_output, PackedTensorAccessor64 grad_input, scalar_t *weights, - int n_classes, - int ignore_index) { + int64_t n_classes, + int64_t ignore_index) { CUDA_KERNEL_LOOP(index, batch_size) { - int cur_target = target[index]; + index_t cur_target = target[index]; if (cur_target == ignore_index) { continue; } @@ -422,19 +423,21 @@ __global__ void nll_loss_backward_reduce_cuda_kernel_1d( index_t *target, scalar_t *total_weight, bool size_average, - int n_classes, + int64_t n_classes, int64_t ignore_index ) { - int t = static_cast(*target); - if (t != static_cast(ignore_index)) { + const index_t t = *target; + if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); - const auto grad = -(size_average ? *grad_output / *total_weight - : *grad_output); - grad_input[t] = weights != nullptr ? weights[t] * grad - : grad; + const auto grad = -(size_average ? *grad_output / *total_weight : *grad_output); + grad_input[t] = weights != nullptr ? weights[t] * grad : grad; } } +template struct bwd_index_type { using type = T; }; +template<> struct bwd_index_type { using type = int; }; +template<> struct bwd_index_type { using type = uint64_t; }; + template __global__ void nll_loss_backward_reduce_cuda_kernel_2d( scalar_t* grad_input, @@ -445,17 +448,20 @@ __global__ void nll_loss_backward_reduce_cuda_kernel_2d( bool size_average, int nframe, int ndim, - int n_classes, + int64_t n_classes, int64_t ignore_index) { + using bwd_index_t = typename bwd_index_type::type; const auto grad = -(size_average ? 
*grad_output / *total_weight : *grad_output); for (int i = threadIdx.x; i < nframe; i += NLL_LOSS_THREADS) { - int t = target[i]; - if (t != static_cast(ignore_index)) { + const index_t t = target[i]; + if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); - grad_input[i * ndim + t] = weights != nullptr ? weights[t] * grad - : grad; + // NOTE(crcrpar): this index could overflow in int64_t as `t` itself can be close to the max. + const bwd_index_t index = static_cast(i) * ndim + t; + CUDA_KERNEL_ASSERT(index >= 0); + grad_input[index] = weights != nullptr ? weights[t] * grad : grad; } } } @@ -504,8 +510,7 @@ void nll_loss_backward_out_cuda_template( target.data_ptr(), grad_output.packed_accessor64(), grad_input.packed_accessor64(), - weight.defined() ? weight_.data_ptr() - : nullptr, + weight.defined() ? weight_.data_ptr() : nullptr, n_classes, ignore_index); C10_CUDA_KERNEL_LAUNCH_CHECK(); diff --git a/aten/src/ATen/native/cuda/MaxUnpooling.cu b/aten/src/ATen/native/cuda/MaxUnpooling.cu index 9c24c4ea8edc..ba1a7eb1f5cb 100644 --- a/aten/src/ATen/native/cuda/MaxUnpooling.cu +++ b/aten/src/ATen/native/cuda/MaxUnpooling.cu @@ -118,6 +118,10 @@ Tensor& max_unpooling2d_forward_out_cuda(const Tensor& self_, const Tensor& indices_, IntArrayRef output_size, Tensor& output) { + // See Note [Writing Nondeterministic Operations] + // Nondeterministic with duplicate indices + at::globalContext().alertNotDeterministic("max_unpooling2d_forward_out"); + TORCH_CHECK(output.is_contiguous(), "output must be contiguous"); TORCH_CHECK( indices_.scalar_type() == at::ScalarType::Long, @@ -291,6 +295,10 @@ Tensor& max_unpooling3d_forward_out_cuda(const Tensor& self_, IntArrayRef stride, IntArrayRef padding, Tensor& output) { + // See Note [Writing Nondeterministic Operations] + // Nondeterministic with duplicate indices + at::globalContext().alertNotDeterministic("max_unpooling3d_forward_out"); + TORCH_CHECK(output.is_contiguous(), "output must be contiguous"); max_unpooling3d_shape_check( self_, Tensor(), indices_, output_size, stride, padding, "max_unpooling3d_forward_out_cuda()"); diff --git a/aten/src/ATen/native/cuda/MultiMarginLoss.cu b/aten/src/ATen/native/cuda/MultiMarginLoss.cu index 15e6d1e9dc0c..26f21cfa59a2 100644 --- a/aten/src/ATen/native/cuda/MultiMarginLoss.cu +++ b/aten/src/ATen/native/cuda/MultiMarginLoss.cu @@ -31,6 +31,7 @@ __global__ void MultiMarginLoss_forward_kernel( scalar_t *input_k = input + k*dim; scalar_t *output_k = output + k; int target_k = static_cast(target[k]); + CUDA_KERNEL_ASSERT(target_k >= 0 && target_k < dim && "target index is out of bounds"); scalar_t input_target_k = input_k[target_k]; int i_start = threadIdx.x; diff --git a/aten/src/ATen/native/cuda/MultiTensorApply.cuh b/aten/src/ATen/native/cuda/MultiTensorApply.cuh index 29675695e013..a74144974a48 100644 --- a/aten/src/ATen/native/cuda/MultiTensorApply.cuh +++ b/aten/src/ATen/native/cuda/MultiTensorApply.cuh @@ -24,6 +24,7 @@ __device__ __forceinline__ void load_store(T* dst, T* src, int dst_offset, int s ((LT*)dst)[dst_offset] = ((LT*)src)[src_offset]; } +// TODO(crcrpar): Add `n>5` for `low prec params & their higher prec copy` // TensorListMetadata has to be < 4KB - the limit for kernel launch argument static constexpr int depth_to_max_tensors[5] = {110, 64, 48, 36, 30}; static constexpr int depth_to_max_blocks[5] = {320, 320, 320, 320, 320}; @@ -38,6 +39,18 @@ template struct TensorListMetadata int start_tensor_this_launch; }; +// NOTE(crcrpar): This is a conservative resolution to handle 
`state_steps` +// whose each element is `at::Tensor` of 1 element representing the number of `step`s called so far. +template struct FusedOptimizerTensorListMetadata +{ + void* addresses[n][depth_to_max_tensors[n-1]]; + int numel_for_tensor[depth_to_max_tensors[n-1]]; + void* state_steps_addresses[depth_to_max_tensors_scalarlist[n-1]]; + unsigned char block_to_tensor[depth_to_max_blocks[n-1]]; + int block_to_chunk[depth_to_max_blocks[n-1]]; + int start_tensor_this_launch; +}; + template struct TensorListScalarListMetadata { void* addresses[n][depth_to_max_tensors_scalarlist[n-1]]; @@ -184,6 +197,61 @@ void multi_tensor_apply( } } } +} + +template +void multi_tensor_apply_for_fused_optimizer( + std::vector>& tensor_lists, + at::TensorList state_steps, + T callable, + ArgTypes... args) { + TORCH_CHECK(tensor_lists.size() == depth, "Number of tensor lists has to match the depth"); + const auto num_tensors = tensor_lists[0].size(); + FusedOptimizerTensorListMetadata tensorListMeta; + + int loc_block_info = 0; + int loc_tensor_info = 0; + for (const auto & tensor_index : c10::irange(num_tensors)) { + tensorListMeta.state_steps_addresses[loc_tensor_info] = state_steps[tensor_index].data_ptr(); + tensorListMeta.numel_for_tensor[loc_tensor_info] = tensor_lists[0][tensor_index].numel(); + for (const auto & d : c10::irange(depth)) { + tensorListMeta.addresses[d][loc_tensor_info] = tensor_lists[d][tensor_index].data_ptr(); } + loc_tensor_info++; + + const auto chunks = (tensor_lists[0][tensor_index].numel() + kChunkSize - 1) / kChunkSize; + for (const auto & chunk : c10::irange(chunks)) { + tensorListMeta.block_to_tensor[loc_block_info] = loc_tensor_info - 1; + tensorListMeta.block_to_chunk[loc_block_info] = chunk; + loc_block_info++; + + const auto tensor_full = (loc_tensor_info == depth_to_max_tensors[depth - 1] && chunk == chunks - 1); + const auto blocks_full = loc_block_info == depth_to_max_blocks[depth - 1]; + const auto last_chunk = (tensor_index == num_tensors - 1 && chunk == chunks - 1); + + if (tensor_full || blocks_full || last_chunk) { + multi_tensor_apply_kernel<<>>( + tensorListMeta, + callable, + args...); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + + // Reset. 
+ loc_block_info = 0; + if (chunk == chunks - 1) { + loc_tensor_info = 0; + } else { + tensorListMeta.numel_for_tensor[0] = tensorListMeta.numel_for_tensor[loc_tensor_info - 1]; + tensorListMeta.state_steps_addresses[0] = tensorListMeta.state_steps_addresses[loc_tensor_info - 1]; + for (const auto & d : c10::irange(depth)) { + tensorListMeta.addresses[d][0] = tensorListMeta.addresses[d][loc_tensor_info - 1]; + } + loc_tensor_info = 1; + } + } + } + } +} + } // namespace }} // at::native diff --git a/aten/src/ATen/native/cuda/MultinomialKernel.cu b/aten/src/ATen/native/cuda/MultinomialKernel.cu index de8e8404ac2d..c8473245604c 100644 --- a/aten/src/ATen/native/cuda/MultinomialKernel.cu +++ b/aten/src/ATen/native/cuda/MultinomialKernel.cu @@ -80,7 +80,7 @@ void renormRows(Tensor& t) { int64_t cols = t.size(1); auto props = at::cuda::getCurrentDeviceProperties(); - CUDA_KERNEL_ASSERT(props != NULL); + TORCH_CHECK(props != nullptr); int numSM = props->multiProcessorCount; const int64_t maxThreads = std::min( props->maxThreadsPerBlock, cuda_utils::kCUDABlockReduceMaxThreads); @@ -342,7 +342,7 @@ void multinomial_with_replacement_kernel_impl( AT_DISPATCH_FLOATING_TYPES_AND_HALF(self_v.scalar_type(), "multinomial_kernel_cuda", [&] { using accscalar_t = at::acc_type; auto props = at::cuda::getCurrentDeviceProperties(); - CUDA_KERNEL_ASSERT(props != NULL); + TORCH_CHECK(props != nullptr); int numSM = props->multiProcessorCount; int maxThreads = props->maxThreadsPerBlock; int maxShared = props->sharedMemPerBlock; diff --git a/aten/src/ATen/native/cuda/NLLLoss2d.cu b/aten/src/ATen/native/cuda/NLLLoss2d.cu index 2246c836f3dc..d3f128462529 100644 --- a/aten/src/ATen/native/cuda/NLLLoss2d.cu +++ b/aten/src/ATen/native/cuda/NLLLoss2d.cu @@ -44,6 +44,7 @@ inline scalar_t* optional_data(const Tensor& source) { using at::cuda::detail::CUDA_NUM_THREADS; using at::cuda::detail::GET_BLOCKS; +// TODO(crcrpar): Think about introducing `canUse32BitIndexMath` and choose int or int64_t for `target`. template C10_LAUNCH_BOUNDS_1(CUDA_NUM_THREADS) __global__ void nll_loss2d_forward_no_reduce_kernel( @@ -98,11 +99,13 @@ __global__ void nll_loss2d_forward_kernel( for (int i = (blockIdx.x % blocks_per_sample) * blockDim.x + threadIdx.x; i < map_nelem; i += step) { - int t = target[toffset + i]; + int64_t t = target[toffset + i]; if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); cur_weight = weight != nullptr ? weight[t] : static_cast(1); - input_sum -= input[ioffset + i + map_nelem * t] * cur_weight; + const auto input_index = ioffset + i + map_nelem * t; + CUDA_KERNEL_ASSERT(input_index >= 0); + input_sum -= input[input_index] * cur_weight; acc_weight += cur_weight; } } @@ -185,9 +188,11 @@ __global__ void nll_loss2d_backward_kernel( for (int i = (blockIdx.x % blocks_per_sample) * blockDim.x + threadIdx.x; i < map_nelem; i += step) { - int t = (int)target_thread[i]; + const int64_t t = target_thread[i]; if (t != ignore_index) { CUDA_KERNEL_ASSERT(t >= 0 && t < n_classes); + const auto grad_input_index = i + map_nelem * t; + CUDA_KERNEL_ASSERT(grad_input_index >= 0); grad_input_thread[i + map_nelem * t] = weights != nullptr ? 
weights[t] * grad : grad; } @@ -268,9 +273,9 @@ void nll_loss2d_forward_out_cuda_template( 0, at::cuda::getCurrentCUDAStream()>>>( count, - input.packed_accessor(), - target.packed_accessor(), - output.packed_accessor(), + input.packed_accessor64(), + target.packed_accessor64(), + output.packed_accessor64(), optional_data(weight_), ignore_index); C10_CUDA_KERNEL_LAUNCH_CHECK(); @@ -403,9 +408,9 @@ void nll_loss2d_backward_out_cuda_template( 0, at::cuda::getCurrentCUDAStream()>>>( count, - target.packed_accessor(), - grad_output.packed_accessor(), - grad_input.packed_accessor(), + target.packed_accessor64(), + grad_output.packed_accessor64(), + grad_input.packed_accessor64(), optional_data(weight_), ignore_index); C10_CUDA_KERNEL_LAUNCH_CHECK(); diff --git a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu index d34de0f156bd..0ed107f2db19 100644 --- a/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu +++ b/aten/src/ATen/native/cuda/NaiveConvolutionTranspose3d.cu @@ -176,7 +176,7 @@ void slow_conv_transpose3d_out_cuda_template( const Tensor& input_, const Tensor& weight_, IntArrayRef kernel_size, - const Tensor& bias, + const Tensor& bias_, IntArrayRef stride, IntArrayRef padding, IntArrayRef output_padding, @@ -226,7 +226,7 @@ void slow_conv_transpose3d_out_cuda_template( int n_output_plane = weight_.size(1); TensorArg input_arg{input_, "input", 1}, output_arg{output, "output", 2}, - weight_arg{weight_, "weight", 3}, bias_arg{bias, "bias", 4}; + weight_arg{weight_, "weight", 3}, bias_arg{bias_, "bias", 4}; checkAllSameGPU( "slow_conv_transpose3d_out_cuda", @@ -236,7 +236,7 @@ void slow_conv_transpose3d_out_cuda_template( input_, Tensor(), weight_, - bias, + bias_, kernel_depth, kernel_width, kernel_height, @@ -254,12 +254,9 @@ void slow_conv_transpose3d_out_cuda_template( output_padding_height, 0); - TORCH_CHECK( - !bias.defined() || bias.is_contiguous(), - "bias tensor has to be contiguous"); - Tensor input = input_.contiguous(); Tensor weight = weight_.contiguous(); + Tensor bias = bias_.defined() ? bias_.contiguous() : bias_; int is_batch = false; if (input.dim() == 4) { diff --git a/aten/src/ATen/native/cuda/Normalization.cu b/aten/src/ATen/native/cuda/Normalization.cu index 3b27ebfc7d92..a8eff154c350 100644 --- a/aten/src/ATen/native/cuda/Normalization.cu +++ b/aten/src/ATen/native/cuda/Normalization.cu @@ -48,8 +48,11 @@ bool is_mixed_type(const Tensor& input, const Args&... 
parameters) { } inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) { - return (self.is_contiguous(at::MemoryFormat::ChannelsLast) || - (self.is_contiguous() && self.strides()[1] == 1)); + return ( + self.is_contiguous(at::MemoryFormat::ChannelsLast) || + self.is_contiguous(at::MemoryFormat::ChannelsLast3d) || + (self.is_contiguous() && self.strides()[1] == 1) + ); } enum class Impl { @@ -470,6 +473,22 @@ std::tuple batch_norm_cuda(const Tensor& self, const c10 return std::make_tuple(output, save_mean, save_invstd); } +std::tuple _batch_norm_legit_cuda(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, Tensor& running_mean, Tensor& running_var, bool train, double momentum, double epsilon) { + return batch_norm_cuda(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, epsilon); +} + +std::tuple _batch_norm_legit_no_stats_cuda(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, bool train, double momentum, double epsilon) { + return batch_norm_cuda(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, epsilon); +} + +std::tuple _batch_norm_legit_cuda_out(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, Tensor& running_mean, Tensor& running_var, bool train, double momentum, double epsilon, Tensor& output, Tensor& save_mean, Tensor& save_invstd) { + return batch_norm_cuda_out(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, epsilon, output, save_mean, save_invstd); +} + +std::tuple _batch_norm_legit_no_stats_cuda_out(const Tensor& self, const c10::optional& weight_opt, const c10::optional& bias_opt, bool train, double momentum, double epsilon, Tensor& output, Tensor& save_mean, Tensor& save_invstd) { + return batch_norm_cuda_out(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, epsilon, output, save_mean, save_invstd); +} + std::tuple batch_norm_backward_cuda(const Tensor& grad_out, const Tensor& input, const c10::optional& weight_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, const c10::optional& save_mean_opt, const c10::optional& save_invstd_opt, bool train, double epsilon, std::array grad_input_mask) { // See [Note: hacky wrapper removal for optional tensor] c10::MaybeOwned weight = at::borrow_from_optional_tensor(weight_opt); diff --git a/aten/src/ATen/native/cuda/Normalization.cuh b/aten/src/ATen/native/cuda/Normalization.cuh index a9b11e76db68..cc79284fea4d 100644 --- a/aten/src/ATen/native/cuda/Normalization.cuh +++ b/aten/src/ATen/native/cuda/Normalization.cuh @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -60,26 +61,10 @@ struct Float2 { v2 += a.v2; return *this; } -}; - -template -struct SumOp { - __device__ SumOp(const PTA& t) : tensor(t) {} - __device__ __forceinline__ accscalar_t operator()(int batch, int plane, int n) { - return static_cast(tensor[batch][plane][n]); - } - const PTA& tensor; -}; - -template -struct VarOp { - __device__ VarOp(accscalar_t m, const PTA& t) : mean(m), tensor(t) {} - __device__ __forceinline__ accscalar_t operator()(int batch, int plane, int n) { - accscalar_t val = tensor[batch][plane][n]; - return (val - mean) * (val - mean); + __device__ friend Float2 operator+(Float2 a, const Float2& b) { + a += b; + return a; } - const accscalar_t mean; - const PTA& tensor; }; template @@ -96,21 +81,25 @@ struct GradOp { const PTA& grad_output; }; -// Sum across all threads within a warp -template -static 
__device__ __forceinline__ T warpSum(T val) { - for (int i = 0; i < getMSB(C10_WARP_SIZE); ++i) { - val += WARP_SHFL_XOR(val, 1 << i, C10_WARP_SIZE); - } - return val; -} +template +struct SumReduceOp { + __device__ __forceinline__ acc_t combine(acc_t a, acc_t b) const { return a + b; } + + __device__ __forceinline__ acc_t warp_shfl_down(acc_t data, int offset) const { + return WARP_SHFL_DOWN(data, offset); + } +}; template -static __device__ __forceinline__ Float2 warpSum(Float2 value) { - value.v1 = warpSum(value.v1); - value.v2 = warpSum(value.v2); - return value; -} +struct SumReduceOp> { + using acc_t = Float2; + + __device__ __forceinline__ acc_t combine(acc_t a, acc_t b) const { return a + b; } + + __device__ __forceinline__ acc_t warp_shfl_down(acc_t data, int offset) const { + return {WARP_SHFL_DOWN(data.v1, offset), WARP_SHFL_DOWN(data.v2, offset)}; + } +}; // Sum across (batch, x/y/z) applying Op() pointwise // this works by first having each thread sum it's part @@ -130,37 +119,13 @@ __device__ scalar_t reduce(Op op, PTA tensor, int plane) { sum += op(batch, plane, x); } } - - // first warpSum to get one value per thread to - // one value per warp - sum = warpSum(sum); - - // this writes each warps item into shared memory - // there are at most C10_WARP_SIZE items left because - // there are at most C10_WARP_SIZE**2 threads at the beginning __shared__ scalar_t shared[C10_WARP_SIZE]; - __syncthreads(); - int tid = threadIdx.x + threadIdx.y * blockDim.x; - if (tid % C10_WARP_SIZE == 0) { - shared[tid / C10_WARP_SIZE] = sum; - } - if (tid >= blockDim.x * blockDim.y / C10_WARP_SIZE && tid < C10_WARP_SIZE) { - // zero out the other entries in shared - shared[tid] = (scalar_t)0; - } - __syncthreads(); - // now have a second warpSum to reduce the intermediate values - // from shared memory to a single number. The very first - // thread writes it to shared memory. - - if (tid / C10_WARP_SIZE == 0) { - sum = warpSum(shared[tid]); - if (tid == 0) { + SumReduceOp reduce_op; + sum = cuda_utils::BlockReduce, cuda_utils::Block2D>(sum, reduce_op, 0, shared); + if (threadIdx.x == 0 && threadIdx.y == 0) { shared[0] = sum; - } } __syncthreads(); - // Everyone picks it up, should be broadcast into the whole grad_input return shared[0]; } diff --git a/aten/src/ATen/native/cuda/Pow.cuh b/aten/src/ATen/native/cuda/Pow.cuh new file mode 100644 index 000000000000..9530b0ede274 --- /dev/null +++ b/aten/src/ATen/native/cuda/Pow.cuh @@ -0,0 +1,58 @@ +#pragma once +#include +#include + +namespace at { namespace native { + +namespace { + + +// SFINAE doesn't work well with NVCC under Windows for math functions like pow and sqrt. +// So we need to define the functions with the explicit function signatures. 
+// As for pow, the following signatures are defined as the device function: +// pow(float, int) +// pow(double, int) +// pow(float, float) +// pow(double, double) +#ifdef _MSC_VER +// Functions for pow +// pow for at::Half +static inline __host__ __device__ at::Half pow_(at::Half base, at::Half exp) { + return static_cast(std::pow(static_cast(base), static_cast(exp))); +} +// pow for at::BFloat16 +static inline __host__ __device__ at::BFloat16 pow_(at::BFloat16 base, at::BFloat16 exp) { + return static_cast(std::pow(static_cast(base), static_cast(exp))); +} +// pow (floating, floating/int) +template +static inline __host__ __device__ typename std::enable_if::value && (std::is_same::value || std::is_same::value), Base_type>::type + pow_(Base_type base, Exp_type exp) { + return std::pow(base, exp); +} +// pow (Otherwise) +template +static inline __host__ __device__ typename std::enable_if::value && !std::is_same::value, Base_type>::type + pow_(Base_type base, Exp_type exp) { + return static_cast(std::pow(static_cast(base), static_cast(exp))); +} +#else +template +static inline __host__ __device__ Base_type pow_(Base_type base, Exp_type exp) { + return ::pow(base, exp); +} +#endif + +template +static inline __host__ __device__ std::enable_if_t::value, T> pow_( + T base, T exp) { + return at::native::powi(base, exp); +} + +template +static inline __host__ __device__ c10::complex pow_(c10::complex base, c10::complex exp) { + return c10_complex_math::pow(base, exp); +} + +} // namespace +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/PowKernel.cu b/aten/src/ATen/native/cuda/PowKernel.cu index a1e453455d1b..30868f27d609 100644 --- a/aten/src/ATen/native/cuda/PowKernel.cu +++ b/aten/src/ATen/native/cuda/PowKernel.cu @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -17,54 +18,6 @@ void reciprocal_kernel_cuda(TensorIteratorBase& iter); namespace { - -// SFINAE doesn't work well with NVCC under Windows for math functions like pow and sqrt. -// So we need to define the functions with the explicit function signatures. 
-// As for pow, the following signatures are defined as the device function: -// pow(float, int) -// pow(double, int) -// pow(float, float) -// pow(double, double) -#ifdef _MSC_VER -// Functions for pow -// pow for at::Half -static inline __host__ __device__ at::Half pow_(at::Half base, at::Half exp) { - return static_cast(std::pow(static_cast(base), static_cast(exp))); -} -// pow for at::BFloat16 -static inline __host__ __device__ at::BFloat16 pow_(at::BFloat16 base, at::BFloat16 exp) { - return static_cast(std::pow(static_cast(base), static_cast(exp))); -} -// pow (floating, floating/int) -template -static inline __host__ __device__ typename std::enable_if::value && (std::is_same::value || std::is_same::value), Base_type>::type - pow_(Base_type base, Exp_type exp) { - return std::pow(base, exp); -} -// pow (Otherwise) -template -static inline __host__ __device__ typename std::enable_if::value && !std::is_same::value, Base_type>::type - pow_(Base_type base, Exp_type exp) { - return static_cast(std::pow(static_cast(base), static_cast(exp))); -} -#else -template -static inline __host__ __device__ Base_type pow_(Base_type base, Exp_type exp) { - return ::pow(base, exp); -} -#endif - -template -static inline __host__ __device__ std::enable_if_t::value, T> pow_( - T base, T exp) { - return at::native::powi(base, exp); -} - -template -static inline __host__ __device__ c10::complex pow_(c10::complex base, c10::complex exp) { - return c10_complex_math::pow(base, exp); -} - void pow_tensor_scalar_kernel(TensorIteratorBase& iter, const Scalar& exp_scalar); template diff --git a/aten/src/ATen/native/cuda/Reduce.cuh b/aten/src/ATen/native/cuda/Reduce.cuh index 34e99ae57a59..0b3e4a622487 100644 --- a/aten/src/ATen/native/cuda/Reduce.cuh +++ b/aten/src/ATen/native/cuda/Reduce.cuh @@ -1135,8 +1135,23 @@ inline void gpu_reduce_kernel(TensorIterator& iter, const ops_t& ops, ident_t id using traits = function_traits; using arg_t = typename traits::template arg<0>::type; + // at::Half/at::ComplexHalf overflows easily as it's range is very small. + // So when scalar_t and out_scalar_t are at::Half/at::ComplexHalf, we + // set can_accumulate_in_output to False. + static constexpr bool is_inp_out_type_half_or_chalf = + (std::is_same::value && + std::is_same::value) || + (std::is_same, scalar_t>::value && + std::is_same, out_scalar_t>::value); + // at::BFloat16 has lower precision and can lead to rounding errors. + // So when scalar_t and out_scalar_t are at::BFloat16, we + // set can_accumulate_in_output to False. + static constexpr bool is_inp_out_type_bfloat16 = + (std::is_same::value && + std::is_same::value); static constexpr bool can_accumulate_in_output = - std::is_convertible::value; + std::is_convertible::value && + !(is_inp_out_type_half_or_chalf || is_inp_out_type_bfloat16); bool can_use_32bit_indexing = iter.can_use_32bit_indexing(); std::unique_ptr owned_buf_ptr; @@ -1227,9 +1242,23 @@ inline void jitted_gpu_reduce_kernel(TensorIterator& iter, const std::string& fu //TODO - this will be different for more complicated reductions, but for now reductions using //func_wrapper all have arg_t = opmath using arg_t = at::opmath_type; + // at::Half/at::ComplexHalf overflows easily as it's range is very small. + // So when scalar_t and out_scalar_t are at::Half/at::ComplexHalf, we + // set can_accumulate_in_output to False. 
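The two Reduce.cuh hunks here tighten can_accumulate_in_output: besides requiring that arg_t converts to the output type, accumulation in the output buffer is now disabled when input and output are both at::Half/at::ComplexHalf (tiny range, easy overflow) or both at::BFloat16 (coarse rounding), and the jitted path drops its static_assert accordingly. A minimal standalone sketch of the trait logic, with placeholder Half/BFloat16 types and the complex-half case omitted:

    #include <type_traits>

    struct Half {};      // stand-in for at::Half (assumed placeholder, not the ATen type)
    struct BFloat16 {};  // stand-in for at::BFloat16

    // Accumulating partial results directly in the output is only safe when the
    // accumulator type converts to the output type and the in/out pair is not a
    // narrow floating-point type that would overflow or round during accumulation.
    template <typename scalar_t, typename out_scalar_t, typename arg_t>
    constexpr bool can_accumulate_in_output() {
      constexpr bool narrow_pair =
          (std::is_same<scalar_t, Half>::value &&
           std::is_same<out_scalar_t, Half>::value) ||
          (std::is_same<scalar_t, BFloat16>::value &&
           std::is_same<out_scalar_t, BFloat16>::value);
      return std::is_convertible<arg_t, out_scalar_t>::value && !narrow_pair;
    }

    static_assert(can_accumulate_in_output<float, float, float>(),
                  "wide types may accumulate in the output buffer");
    static_assert(!can_accumulate_in_output<BFloat16, BFloat16, float>(),
                  "bf16 reductions keep accumulation in arg_t");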
+ static constexpr bool is_inp_out_type_half_or_chalf = + (std::is_same::value && + std::is_same::value) || + (std::is_same, scalar_t>::value && + std::is_same, out_scalar_t>::value); + // at::BFloat16 has lower precision and can lead to rounding errors. + // So when scalar_t and out_scalar_t are at::BFloat16, we + // set can_accumulate_in_output to False. + static constexpr bool is_inp_out_type_bfloat16 = + (std::is_same::value && + std::is_same::value); static constexpr bool can_accumulate_in_output = - std::is_convertible::value; - static_assert(can_accumulate_in_output == true, "unsupported arg_t for jitted reduction"); + std::is_convertible::value && + !(is_inp_out_type_half_or_chalf || is_inp_out_type_bfloat16); bool can_use_32bit_indexing = iter.can_use_32bit_indexing(); std::unique_ptr owned_buf_ptr; diff --git a/aten/src/ATen/native/cuda/ReflectionPad.cu b/aten/src/ATen/native/cuda/ReflectionPad.cu index 33f71368ca10..5380b0fef5f2 100644 --- a/aten/src/ATen/native/cuda/ReflectionPad.cu +++ b/aten/src/ATen/native/cuda/ReflectionPad.cu @@ -335,7 +335,7 @@ void reflection_pad2d_out_template( int64_t size_y = nplane; int64_t size_z = nbatch; - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "reflection_pad2d_out_template", [&] { for (int64_t block_y = 0; block_y < size_y; block_y += 65535) { @@ -407,7 +407,7 @@ void reflection_pad2d_backward_out_template( int64_t size_y = nplane; int64_t size_z = nbatch; - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "reflection_pad2d_backward_out_template", [&] { for (int64_t block_y = 0; block_y < size_y; block_y += 65535) { @@ -463,8 +463,8 @@ TORCH_IMPL_FUNC(reflection_pad1d_out_cuda) Tensor input = input_.contiguous(); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1( - kHalf, input.scalar_type(), "reflection_pad1d_out_template", [&] { + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2( + kHalf, kBFloat16, input.scalar_type(), "reflection_pad1d_out_template", [&] { reflection_pad1d_out_kernel<<< grid_size, block_size, @@ -520,7 +520,7 @@ TORCH_IMPL_FUNC(reflection_pad1d_backward_out_cuda)(const Tensor& grad_output_, dim3 block_size(output_w > 256 ? 
256 : output_w); dim3 grid_size((int) ::ceil(output_w / 256.0), nplane, nbatch); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, grad_input.scalar_type(), "reflection_pad1d_backward_out_cuda", [&] { reflection_pad1d_backward_out_kernel<<< grid_size, block_size, 0, at::cuda::getCurrentCUDAStream()>>>( @@ -589,7 +589,7 @@ TORCH_IMPL_FUNC(reflection_pad3d_out_cuda) ( auto input = input_.contiguous(); bool batch_mode = (input.dim() == 5); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_out_cuda", [&] { auto input_inner = input; auto output_inner = output; @@ -641,7 +641,7 @@ TORCH_IMPL_FUNC(reflection_pad3d_backward_out_cuda) ( int64_t pad_top = padding[2]; int64_t pad_front = padding[4]; - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND1(kHalf, + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2(kHalf, kBFloat16, input.scalar_type(), "reflection_pad3d_backward_out_cuda", [&] { auto grad_input_ = grad_input; auto grad_output_ = grad_output; diff --git a/aten/src/ATen/native/cuda/RreluWithNoise.cu b/aten/src/ATen/native/cuda/RreluWithNoise.cu index 762098ab7770..c97cd15e1e85 100644 --- a/aten/src/ATen/native/cuda/RreluWithNoise.cu +++ b/aten/src/ATen/native/cuda/RreluWithNoise.cu @@ -17,7 +17,7 @@ namespace at { namespace native { template -#if __CUDA_ARCH__ >= 350 || defined __HIP_PLATFORM_HCC__ +#if __CUDA_ARCH__ >= 350 || defined USE_ROCM C10_LAUNCH_BOUNDS_2(256, 4) #endif __global__ void rrelu_with_noise_cuda_kernel( diff --git a/aten/src/ATen/native/cuda/ScanKernels.cu b/aten/src/ATen/native/cuda/ScanUtils.cuh similarity index 84% rename from aten/src/ATen/native/cuda/ScanKernels.cu rename to aten/src/ATen/native/cuda/ScanUtils.cuh index 44982208c086..ba27a245172b 100644 --- a/aten/src/ATen/native/cuda/ScanKernels.cu +++ b/aten/src/ATen/native/cuda/ScanUtils.cuh @@ -1,18 +1,15 @@ -#define TORCH_ASSERT_NO_OPERATORS -#include -#include -#include -#include -#include +#pragma once #include -#include -#include - +#include #include +#include -#include +#include +#include +#include -namespace at { namespace native { +namespace at { +namespace native { template constexpr inline integer ceil_div(integer n, integer m) { @@ -158,7 +155,7 @@ __global__ void tensor_kernel_scan_outer_dim_with_indices(scalar_t *self_, scala } } -void check_fits_in_unsigned(int64_t val, const char* name) { +inline void check_fits_in_unsigned(int64_t val, const char* name) { constexpr auto umax = std::numeric_limits::max(); TORCH_CHECK( val >= 0 && val <= umax, name, " must fit in a 32-bit uint32_t value"); @@ -224,22 +221,6 @@ void scan_dim_with_indices(const TensorBase& self, const TensorBase& values, con } } -void launch_cummax_cuda_kernel(const TensorBase& self, const TensorBase& values, const TensorBase& indices, int64_t dim) { - AT_DISPATCH_ALL_TYPES_AND3(at::ScalarType::Bool, at::ScalarType::Half, at::ScalarType::BFloat16, - self.scalar_type(), "cummax_cuda", [&]() { - scalar_t init = self.is_floating_point() ? 
(-1*std::numeric_limits::infinity()) : std::numeric_limits::lowest(); - scan_dim_with_indices(self, values, indices, dim, init, std::greater_equal()); - }); -} - -void launch_cummin_cuda_kernel(const TensorBase& self, const TensorBase& values, const TensorBase& indices, int64_t dim) { - AT_DISPATCH_ALL_TYPES_AND3(at::ScalarType::Bool, at::ScalarType::Half, at::ScalarType::BFloat16, - self.scalar_type(), "cummin_cuda", [&]() { - scalar_t init = self.is_floating_point() ? std::numeric_limits::infinity() : std::numeric_limits::max(); - scan_dim_with_indices(self, values, indices, dim, init, std::less_equal()); - }); -} - // TODO: The implementation of `tensor_kernel_scan_outer_dim` and // `tensor_kernel_scan_innermost_dim` is similar to // `tensor_kernel_scan_outer_dim_with_indices` @@ -468,54 +449,4 @@ void scan_dim(const TensorBase& self, const TensorBase& result, } } -void launch_logcumsumexp_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { - AT_DISPATCH_FLOATING_TYPES_AND2( - ScalarType::Half, ScalarType::BFloat16, - self.scalar_type(), "logcumsumexp_cuda", - [&]() { - using accscalar_t = acc_type; - scalar_t init = -std::numeric_limits::infinity(); - auto log_add_exp = [] C10_HOST_DEVICE (const scalar_t x, const scalar_t y) -> scalar_t { - scalar_t min = at::_isnan(y) ? y : std::min(x,y); //std::min returns first arg if one of the args is nan - scalar_t max = at::_isnan(y) ? y : std::max(x,y); //std::max returns first arg if one of the args is nan - if (min != max || ::isfinite(static_cast(min))) { - // nan will be propagated here - return ::log1p(std::exp(min - max)) + max; - } else { - // special case to correctly handle infinite inputs - return x; - } - }; - scan_dim(self, result, dim, init, log_add_exp); - }); -} - -void launch_cumsum_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( - ScalarType::Half, ScalarType::BFloat16, - self.scalar_type(), "cumsum_cuda", - [&]() { - scalar_t init = 0; - scan_dim( - self, - result, - dim, - init, - std::plus()); - }); -} - -void launch_cumprod_cuda_kernel(const TensorBase& result, const TensorBase& self, int64_t dim) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND2( - ScalarType::Half, ScalarType::BFloat16, self.scalar_type(), "cumprod_cuda", [&]() { - scalar_t init = 1; - scan_dim( - self, - result, - dim, - init, - std::multiplies()); - }); -} - -}} // namespace at::native +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/Shape.cu b/aten/src/ATen/native/cuda/Shape.cu index 08605cf4ed1b..389515eac1e6 100644 --- a/aten/src/ATen/native/cuda/Shape.cu +++ b/aten/src/ATen/native/cuda/Shape.cu @@ -252,7 +252,7 @@ void parallel_cat(const Tensor &out, const MaterializedITensorListRef& inputs, i } // namespace TORCH_IMPL_FUNC(cat_out_cuda) -(ITensorListRef tensors, +(const ITensorListRef& tensors, int64_t dim, int64_t valid, bool all_contiguous, diff --git a/aten/src/ATen/native/cuda/SoftMax.cu b/aten/src/ATen/native/cuda/SoftMax.cu index c53276e619be..6df916caaa85 100644 --- a/aten/src/ATen/native/cuda/SoftMax.cu +++ b/aten/src/ATen/native/cuda/SoftMax.cu @@ -636,8 +636,8 @@ cunn_SoftMaxForward(outscalar_t *output, scalar_t *input, int classes) // forward pointers to batch[blockIdx.x] // each block handles a sample in the mini-batch - input += blockIdx.x * classes; - output += blockIdx.x * classes; + input += static_cast(blockIdx.x) * classes; + output += static_cast(blockIdx.x) * classes; const int shift = ((uint64_t)input) % ALIGN_BYTES / 
sizeof(scalar_t); const int output_shift = ((uint64_t)output) % ALIGN_BYTES / sizeof(outscalar_t); @@ -672,9 +672,9 @@ cunn_SoftMaxBackward(scalar_t *gradInput, outscalar_t *output, outscalar_t *grad extern __shared__ unsigned char smem[]; auto sdata = reinterpret_cast(smem); - gradInput += blockIdx.x * classes; - output += blockIdx.x * classes; - gradOutput += blockIdx.x * classes; + gradInput += static_cast(blockIdx.x) * classes; + output += static_cast(blockIdx.x) * classes; + gradOutput += static_cast(blockIdx.x) * classes; const int shift = ((uint64_t)gradInput) % ALIGN_BYTES / sizeof(scalar_t); const int output_shift = ((uint64_t)output) % ALIGN_BYTES / sizeof(outscalar_t); @@ -963,7 +963,7 @@ Tensor masked_softmax_cuda(const Tensor& input_, const Tensor& mask_, const c10: TORCH_CHECK(mask_type_.has_value(), "Mask Type should be defined"); int64_t mask_type = mask_type_.value(); - TORCH_CHECK((mask_type == 0) || (mask_type == 1), "Mask Type should be 0 (src_mask) or 1 (src_key_padding_mask)"); + TORCH_CHECK((mask_type == 0) || (mask_type == 1) || (mask_type == 2), "Mask Type should be 0 (src_mask), 1 (src_key_padding_mask), or 2 (default_mask)"); // If input is [B, H, T, T] and mask is [B, T] // we have special fast kernel @@ -975,6 +975,7 @@ Tensor masked_softmax_cuda(const Tensor& input_, const Tensor& mask_, const c10: // TODO We should have special fast kernel for TxT mask as well // mask_type == 0 => mask_ is a src_mask bool is_TxT_mask = (mask_type == 0) && input_.dim() == 4 && mask_.dim() == 2 && input_.size(3) == mask_.size(1) && input_.size(2) == mask_.size(0) && mask_.size(0) == mask_.size(1); + // If mask_type == 2, then mask_.sizes() must equal input_.sizes() TORCH_CHECK(mask_.sizes() == input_.sizes() || is_BxT_mask || is_TxT_mask, "Mask shape should match input. mask: ", mask_.sizes(), " input: ", input_.sizes()); auto input = input_.dim() == 0 ? input_.view(1) : input_; @@ -992,7 +993,9 @@ Tensor masked_softmax_cuda(const Tensor& input_, const Tensor& mask_, const c10: // 4) dim == input.dim() - 1 // Otherwise, we fallback to vanilla softmax (where we do not support transformer_mask since converting the mask is expensive) if (softmax_elements > 1024 || softmax_elements * input.element_size() > 4096 || !mask.is_contiguous() || dim < input.dim()-1) { - TORCH_CHECK(mask.sizes() == input.sizes(), "Mask shape should match input shape; transformer_mask is not supported in the fallback case."); + if (is_BxT_mask) { + mask = mask.view({mask_.size(0), 1, 1, mask_.size(1)}).expand(input.sizes()); + } AT_DISPATCH_FLOATING_TYPES_AND2( ScalarType::Half, ScalarType::BFloat16, @@ -1061,7 +1064,7 @@ Tensor masked_softmax_backward_cuda( auto grad = grad_.contiguous(); auto output = output_.contiguous(); auto mask = mask_.contiguous(); - int64_t dim = dim_.has_value() ? dim_.value() : output.dim() - 1; + int64_t dim = dim_.has_value() ? maybe_wrap_dim(dim_.value(), output.dim()) : output.dim() - 1; grad = grad.dim() == 0 ? grad.view(1) : grad; mask = mask.dim() == 0 ? 
mask.view(1) : mask; diff --git a/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu b/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu new file mode 100644 index 000000000000..d34e0c62e6ab --- /dev/null +++ b/aten/src/ATen/native/cuda/SparseBinaryOpIntersectionKernel.cu @@ -0,0 +1,150 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include +#include + +namespace at { +namespace native { + +namespace { + +template +struct CUDAKernelLauncher { + static void launch(TensorIteratorBase& iter, const func_t& f) { + gpu_kernel(iter, f); + } +}; + +struct MulOp { + template + static FUNCAPI scalar_t apply(scalar_t a, scalar_t b) { + return a * b; + } +}; + +template <> +FUNCAPI bool MulOp::apply(bool a, bool b) { + return a && b; +} + +template +C10_LAUNCH_BOUNDS_2(nt, vt) +__global__ void apply_kernel(int n, loop_t loop) { + constexpr int nv = nt * vt; + int idx = nv * blockIdx.x + threadIdx.x; + + #pragma unroll + for (int i = 0; i < vt; ++i) { + if (idx < n) { + loop(idx); + idx += nt; + } + } +} + +template +void launch_kernel(int64_t n, const loop_t& loop) { + TORCH_INTERNAL_ASSERT(0 <= n && n <= std::numeric_limits::max()); + if (!n) { + return; + } + + const dim3 block(nt); + const dim3 grid((n + block.x * vt - 1) / (block.x * vt)); + const auto stream = at::cuda::getCurrentCUDAStream(); + apply_kernel<<>>(n, loop); + C10_CUDA_KERNEL_LAUNCH_CHECK(); +} + +template +void binary_op_intersection_kernel( + TensorIterator& iter, + int64_t lhs_nnz_stride, + int64_t rhs_nnz_stride) { + if (!iter.can_use_32bit_indexing()) { + for (auto& sub_iter : iter.with_32bit_indexing()) { + binary_op_intersection_kernel( + sub_iter, lhs_nnz_stride, rhs_nnz_stride); + } + return; + } + + auto* RESTRICT ptr_res_values_bytes = reinterpret_cast(iter.data_ptr(0)); + const auto* RESTRICT ptr_lhs_values_bytes = reinterpret_cast(iter.data_ptr(1)); + const auto* RESTRICT ptr_lhs_select_idx_bytes = reinterpret_cast(iter.data_ptr(2)); + const auto* RESTRICT ptr_rhs_values_bytes = reinterpret_cast(iter.data_ptr(3)); + const auto* RESTRICT ptr_rhs_select_idx_bytes = reinterpret_cast(iter.data_ptr(4)); + + auto offset_calc = make_offset_calculator<5>(iter); + auto loop = [=] FUNCAPI (int i) { + auto offsets = offset_calc.get(i); + + auto* RESTRICT ptr_res_values = reinterpret_cast(ptr_res_values_bytes + offsets[0]); + const auto* RESTRICT ptr_lhs_values = reinterpret_cast(ptr_lhs_values_bytes + offsets[1]); + const auto lhs_nnz_idx = *reinterpret_cast(ptr_lhs_select_idx_bytes + offsets[2]); + const auto* RESTRICT ptr_rhs_values = reinterpret_cast(ptr_rhs_values_bytes + offsets[3]); + const auto rhs_nnz_idx = *reinterpret_cast(ptr_rhs_select_idx_bytes + offsets[4]); + + *ptr_res_values = binary_op_t::apply( + *(ptr_lhs_values + lhs_nnz_idx * lhs_nnz_stride), + *(ptr_rhs_values + rhs_nnz_idx * rhs_nnz_stride)); + }; + + launch_kernel(iter.numel(), loop); +} + + +template +struct CUDAValueSelectionIntersectionKernel { + static Tensor apply( + const Tensor& lhs_values, + const Tensor& lhs_select_idx, + const Tensor& rhs_values, + const Tensor& rhs_select_idx) { + auto iter = make_value_selection_intersection_iter( + lhs_values, + lhs_select_idx, + rhs_values, + rhs_select_idx); + auto res_values = iter.tensor(0); + + // If res_values is empty, we can return it right away. + // Otherwise floating point issues with OffsetCalculator. 
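binary_op_intersection_kernel above gathers, for every output element, one value row from each operand via a per-element nnz index and applies binary_op_t (MulOp here, with a bool specialization using logical and). A host-side reference of that gather-and-apply step, assuming flat float buffers rather than the offset-calculator/byte-pointer plumbing used by the kernel:

    #include <cstdint>

    // Host-side reference of the gather-and-apply step (multiplication case).
    // The CUDA kernel does the same per element, but through an offset
    // calculator over byte pointers and a templated binary_op_t::apply.
    void mul_intersection_reference(
        float* res_values,
        const float* lhs_values, const int64_t* lhs_select_idx, int64_t lhs_nnz_stride,
        const float* rhs_values, const int64_t* rhs_select_idx, int64_t rhs_nnz_stride,
        int64_t n) {
      for (int64_t i = 0; i < n; ++i) {
        const int64_t lhs_nnz = lhs_select_idx[i];
        const int64_t rhs_nnz = rhs_select_idx[i];
        res_values[i] = lhs_values[lhs_nnz * lhs_nnz_stride] *
                        rhs_values[rhs_nnz * rhs_nnz_stride];
      }
    }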
+ if (!res_values.numel()) { + return res_values; + } + + const auto lhs_nnz_stride = lhs_values.stride(0); + const auto rhs_nnz_stride = rhs_values.stride(0); + + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( + ScalarType::Bool, ScalarType::Half, ScalarType::BFloat16, res_values.scalar_type(), + "binary_op_intersection_cpu", [&] { + AT_DISPATCH_INDEX_TYPES(lhs_select_idx.scalar_type(), + "binary_op_intersection_cpu", [&] { + binary_op_intersection_kernel( + iter, lhs_nnz_stride, rhs_nnz_stride); + }); + }); + + return res_values; + } +}; + +void mul_sparse_sparse_out_cuda_kernel( + Tensor& result, + const Tensor& x, + const Tensor& y) { + using CUDAValueSelectionMulKernel = CUDAValueSelectionIntersectionKernel; + _sparse_binary_op_intersection_kernel_out( + result, x, y + ); +} + +} + +REGISTER_CUDA_DISPATCH(mul_sparse_sparse_out_stub, &mul_sparse_sparse_out_cuda_kernel); + +}} diff --git a/aten/src/ATen/native/cuda/SummaryOps.cu b/aten/src/ATen/native/cuda/SummaryOps.cu index 5476682d7c4d..3383c38ac9ac 100644 --- a/aten/src/ATen/native/cuda/SummaryOps.cu +++ b/aten/src/ATen/native/cuda/SummaryOps.cu @@ -15,7 +15,7 @@ #include #include #include -#include +#include #endif namespace at { @@ -271,7 +271,7 @@ bool CUDA_tensor_histogram( detail::TensorInfo pInfo(nullptr, 0, {}, {}); Tensor partial_output; if (memType == CUDAHistogramMemoryType::MULTI_BLOCK) { - partial_output = native::zeros( + partial_output = at::zeros( {grid.x, nbins}, optTypeMetaToScalarType(a.options().dtype_opt()), a.options().layout_opt(), @@ -313,7 +313,7 @@ Tensor _bincount_cuda_template( AT_ERROR("minlength should be >= 0"); } if (self.dim() == 1 && self.numel() == 0) { - return native::zeros( + return at::zeros( {minlength}, kLong, c10::nullopt /* layout */, @@ -327,8 +327,8 @@ Tensor _bincount_cuda_template( } bool has_weights = weights.defined(); - if (has_weights && weights.size(0) != self.size(0)) { - AT_ERROR("input and weights should have the same length"); + if (has_weights && (weights.dim() != 1 || weights.size(0) != self.size(0))) { + AT_ERROR("weights should be 1-d and have the same length as input"); } const int64_t nbins = @@ -342,7 +342,7 @@ Tensor _bincount_cuda_template( // alloc output counter on GPU Tensor output; if (has_weights) { - output = native::zeros( + output = at::zeros( {nbins}, optTypeMetaToScalarType(weights.options().dtype_opt()), weights.options().layout_opt(), @@ -351,7 +351,7 @@ Tensor _bincount_cuda_template( cuda::CUDA_tensor_histogram( output, self, weights, nbins, minvalue, maxvalue); } else { - output = native::zeros( + output = at::zeros( {nbins}, kLong, c10::nullopt /* layout */, @@ -373,7 +373,7 @@ Tensor _histc_cuda_template( if (nbins <= 0) { AT_ERROR("bins must be > 0"); } - Tensor output = native::zeros( + Tensor output = at::zeros( {nbins}, self.scalar_type(), c10::nullopt /* layout */, diff --git a/aten/src/ATen/native/cuda/TensorFactories.cu b/aten/src/ATen/native/cuda/TensorFactories.cu index 6e05908b2cce..e880b21d650d 100644 --- a/aten/src/ATen/native/cuda/TensorFactories.cu +++ b/aten/src/ATen/native/cuda/TensorFactories.cu @@ -55,10 +55,6 @@ Tensor empty_cuda(IntArrayRef size, c10::optional dtype_opt, c10::op return at::detail::empty_cuda(size, dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); } -Tensor empty_symint_cuda(c10::SymIntArrayRef size, c10::optional dtype_opt, c10::optional layout_opt, c10::optional device_opt, c10::optional pin_memory_opt, c10::optional memory_format_opt) { - return at::native::empty_cuda(asIntArrayRefSlow(size), 
dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); -} - Tensor _efficientzerotensor_cuda(IntArrayRef size, c10::optional dtype, c10::optional layout, @@ -294,10 +290,10 @@ Tensor tril_indices_cuda( cuda::getApplyGrid(tril_size, dim_grid, tensor.get_device()), "unable to get dim grid"); - AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, tensor.scalar_type(), "tril_indices_cuda", [&] { + AT_DISPATCH_INDEX_TYPES(tensor.scalar_type(), "tril_indices_cuda", [&] { tril_indices_kernel<<< dim_grid, dim_block, 0, at::cuda::getCurrentCUDAStream()>>>( - tensor.data_ptr(), + tensor.data_ptr(), trapezoid_row_offset, m_first_row, col, @@ -372,10 +368,10 @@ Tensor triu_indices_cuda( cuda::getApplyGrid(triu_size, dim_grid, tensor.get_device()), "unable to get dim grid"); - AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, tensor.scalar_type(), "triu_indices_cuda", [&] { + AT_DISPATCH_INDEX_TYPES(tensor.scalar_type(), "triu_indices_cuda", [&] { triu_indices_kernel<<< dim_grid, dim_block, 0, at::cuda::getCurrentCUDAStream()>>>( - tensor.data_ptr(), + tensor.data_ptr(), std::max(0, offset), m_first_row, col, diff --git a/aten/src/ATen/native/cuda/TriangularOps.cu b/aten/src/ATen/native/cuda/TriangularOps.cu index f87d821f396c..a079ec684988 100644 --- a/aten/src/ATen/native/cuda/TriangularOps.cu +++ b/aten/src/ATen/native/cuda/TriangularOps.cu @@ -102,137 +102,9 @@ TORCH_IMPL_FUNC(triu_cuda)(const Tensor& self, int64_t k, const Tensor &result) } } -// Copy the kth diagonal of a matrix B to a vector A. -template -C10_LAUNCH_BOUNDS_1(1024) -__global__ void copy_from_diagonal_kernel( - scalar_t* a, - scalar_t* b, - std::ptrdiff_t start, - std::ptrdiff_t size, - std::ptrdiff_t strideSum, - std::ptrdiff_t strideA) { - for (std::ptrdiff_t linearIndex = blockIdx.x * blockDim.x + threadIdx.x; - linearIndex < size; - linearIndex += gridDim.x * blockDim.x) { - const std::ptrdiff_t bOffset = start + strideSum * linearIndex; - a[strideA * linearIndex] = b[bOffset]; - } -} - -// Copy vector B to the kth diagonal of a matrix A -template -C10_LAUNCH_BOUNDS_1(1024) -__global__ void copy_to_diagonal_kernel( - scalar_t* a, - scalar_t* b, - std::ptrdiff_t start, - std::ptrdiff_t size, - std::ptrdiff_t strideSum, - std::ptrdiff_t strideB) { - for (std::ptrdiff_t linearIndex = blockIdx.x * blockDim.x + threadIdx.x; - linearIndex < size; - linearIndex += gridDim.x * blockDim.x) { - const std::ptrdiff_t aOffset = start + strideSum * linearIndex; - a[aOffset] = b[strideB * linearIndex]; - } -} - -template -Tensor& apply_diag(Tensor& result, const Tensor& self, int64_t dimension) { - TORCH_CHECK( - self.dim() == 1 || self.dim() == 2, "matrix or a vector expected"); - - TensorArg result_arg{result, "result", 1}; - TensorArg self_arg{self, "self", 2}; - checkAllSameGPU(__func__, {result_arg, self_arg}); - checkSameType(__func__, result_arg, self_arg); - - int nDimension = self.dim(); - if (nDimension == 2) { - auto self_stride_0 = self.stride(0); - auto self_stride_1 = self.stride(1); - - int sz; - if (dimension > 0) { - sz = std::min(self.size(0), self.size(1) - dimension); - } else { - sz = std::min(self.size(0) + dimension, self.size(1)); - } - - at::native::resize_output(result, {sz}); - if (sz > 0) { - at::assert_no_internal_overlap(result); - auto result_stride = result.stride(0); - const dim3 threads(std::min( - int(sz), - int(at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock))); - const dim3 grid( - std::min(int(1024), ceil_div(int(sz), int(threads.x)))); - auto start = - (dimension >= 0 ? 
dimension * self_stride_1 - : -dimension * self_stride_0); - - // Kernel Launch - copy_from_diagonal_kernel - <<>>( - result.data_ptr(), - self.data_ptr(), - start, - sz, - self_stride_0 + self_stride_1, - result_stride); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } - } else { - auto n_elems = self.numel(); - auto sz = (dimension > 0) ? n_elems + dimension : n_elems - dimension; - auto self_stride = self.stride(0); - at::native::resize_output(result, {sz, sz}); - result.zero_(); - if (sz > 0) { - at::assert_no_internal_overlap(result); - auto result_stride_0 = result.stride(0); - auto result_stride_1 = result.stride(1); - const dim3 threads(std::min( - int(sz), at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock)); - const dim3 grid( - std::min(int(1024), ceil_div(int(sz), int(threads.x)))); - auto start = - (dimension >= 0 ? dimension * result_stride_1 - : -dimension * result_stride_0); - - // Kernel Launch - copy_to_diagonal_kernel - <<>>( - result.data_ptr(), - self.data_ptr(), - start, - n_elems, - result_stride_0 + result_stride_1, - self_stride); - C10_CUDA_KERNEL_LAUNCH_CHECK(); - } - } - - return result; -} - -Tensor& diag_cuda_out(const Tensor& self, int64_t dimension, Tensor& result) { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4( - kComplexHalf, ScalarType::Half, ScalarType::BFloat16, ScalarType::Bool, - self.scalar_type(), "diag_cuda", - [&] { - apply_diag(result, self, dimension); - }); - return result; -} - Tensor trace_cuda(const Tensor& self) { TORCH_CHECK(self.dim() == 2, "expected a matrix"); - int dimension = 0; - auto result = at::diag(self, dimension); - return result.sum(); + return self.diagonal().sum(); } } // namespace native diff --git a/aten/src/ATen/native/cuda/UnaryComplexKernels.cu b/aten/src/ATen/native/cuda/UnaryComplexKernels.cu index 0589c3ba4f0d..a04194b1117e 100644 --- a/aten/src/ATen/native/cuda/UnaryComplexKernels.cu +++ b/aten/src/ATen/native/cuda/UnaryComplexKernels.cu @@ -1,6 +1,7 @@ #define TORCH_ASSERT_NO_OPERATORS #include #include +#include #include #include #include @@ -58,22 +59,10 @@ void angle_kernel_cuda(TensorIteratorBase& iter) { } } -// We manually overload conj because std::conj does not work types other than c10::complex. 
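In the UnaryComplexKernels.cu hunk that starts here, the conj_wrapper overload pair is folded into an AT_DISPATCH_SWITCH: real, bool and integral dtypes become a direct copy, complex dtypes apply std::conj, and complex-half keeps its jiterator path. An illustrative host-side equivalent of the per-element dispatch (not the ATen macros):

    #include <complex>
    #include <type_traits>

    // Per-element behaviour of the rewritten conj kernel: identity for
    // non-complex values, std::conj for complex values. Host-side sketch only.
    template <typename scalar_t>
    scalar_t conj_reference(scalar_t v) {
      if constexpr (std::is_same<scalar_t, std::complex<float>>::value ||
                    std::is_same<scalar_t, std::complex<double>>::value) {
        return std::conj(v);
      } else {
        return v;  // conj is a no-op for real, bool and integral types
      }
    }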
-template -__host__ __device__ static inline scalar_t conj_wrapper(scalar_t v) { - return v; -} - -template -__host__ __device__ static inline c10::complex conj_wrapper(c10::complex v) { - return std::conj(v); -} - // NB: Ignores the negative bit on tensors const char conj_name[] = "conj_kernel"; void conj_kernel_cuda(TensorIteratorBase& iter) { - auto common_dtype = iter.common_dtype(); - if (common_dtype == kComplexHalf) { + auto conj_chalf = [&] { using scalar_t = c10::complex; #if AT_USE_JITERATOR() static const auto conj_string = jiterator_stringify( @@ -85,17 +74,23 @@ void conj_kernel_cuda(TensorIteratorBase& iter) { jitted_gpu_kernel(iter, conj_string); #else gpu_kernel(iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { - return conj_wrapper(a); + return std::conj(a); }); #endif - } else { - AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( - kBool, kBFloat16, kHalf, iter.common_dtype(), "conj_cuda", [&]() { - gpu_kernel(iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { - return conj_wrapper(a); - }); - }); - } + }; + + AT_DISPATCH_SWITCH(iter.common_dtype(), "conj_cuda", + AT_DISPATCH_CASE_ALL_TYPES_AND3(kBool, kBFloat16, kHalf, [&] { + // Conj is a no-op for non-complex types + direct_copy_kernel_cuda(iter); + }) + AT_DISPATCH_CASE_COMPLEX_TYPES([&] { + gpu_kernel(iter, [] GPU_LAMBDA(scalar_t a) -> scalar_t { + return std::conj(a); + }); + }) + AT_DISPATCH_CASE(kComplexHalf, conj_chalf) + ); } REGISTER_DISPATCH(angle_stub, &angle_kernel_cuda); diff --git a/aten/src/ATen/native/cuda/UnaryFractionKernels.cu b/aten/src/ATen/native/cuda/UnaryFractionKernels.cu index 87aa784b7d5d..ae4d4a01aa00 100644 --- a/aten/src/ATen/native/cuda/UnaryFractionKernels.cu +++ b/aten/src/ATen/native/cuda/UnaryFractionKernels.cu @@ -122,7 +122,7 @@ __host__ __device__ static inline c10::complex nearbyint_wrapper(c10::com } #pragma push -#pragma diag_suppress 177 // Function was declared but never referenced +#pragma nv_diag_suppress 177 // Function was declared but never referenced __host__ __device__ static inline c10::complex nearbyint_wrapper(c10::complex a) { return c10::complex(::nearbyint(static_cast(a.real())), ::nearbyint(static_cast(a.imag()))); } diff --git a/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu b/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu index 0cb0d9f238cf..2481fd602896 100644 --- a/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu +++ b/aten/src/ATen/native/cuda/UnarySpecialOpsKernel.cu @@ -151,7 +151,9 @@ void sigmoid_kernel_cuda(TensorIteratorBase& iter) { } else { AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, common_dtype, "sigmoid_cuda", [&]() { gpu_kernel(iter, []GPU_LAMBDA(scalar_t a) -> scalar_t { - return scalar_t{1} / (scalar_t{1} + std::exp(-a)); + using opmath_t = at::opmath_type; + const auto one = opmath_t{1}; + return static_cast(one/(one + std::exp(-opmath_t{a}))); }); }); } @@ -179,8 +181,9 @@ void sinc_kernel_cuda(TensorIteratorBase& iter) { return scalar_t(1); } else { // NVCC says constexpr var is not accessible from device - scalar_t product = c10::detail::pi() * a; - return std::sin(product) / product; + using opmath_t = at::opmath_type; + opmath_t product = c10::detail::pi() * opmath_t{a}; + return static_cast(std::sin(product) / product); } }); }); diff --git a/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu b/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu index 90f5238d0180..d75de2a6e90f 100644 --- a/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu +++ b/aten/src/ATen/native/cuda/UnfoldBackwardKernel.cu @@ -1,6 +1,7 @@ 
#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #include #include #include @@ -57,8 +58,7 @@ void _unfold_backward_internal_kernel( int64_t grad_in_dim_stride, int64_t grad_in_last_dim_stride, int64_t grad_in_dim_size, - int64_t grad_out_dim_stride, - bool is_step_ge_size + int64_t grad_out_dim_stride ) { if (iter.numel() == 0) { return; @@ -73,8 +73,7 @@ void _unfold_backward_internal_kernel( grad_in_dim_stride, grad_in_last_dim_stride, grad_in_dim_size, - grad_out_dim_stride, - is_step_ge_size + grad_out_dim_stride ); } return; @@ -84,63 +83,39 @@ void _unfold_backward_internal_kernel( char* __restrict__ grad_in_ptr = reinterpret_cast(iter.data_ptr(1)); char* __restrict__ idx_dim_ptr = reinterpret_cast(iter.data_ptr(2)); - if (is_step_ge_size) { - char* __restrict__ idx_last_dim_ptr = reinterpret_cast(iter.data_ptr(3)); + auto offset_calc = make_offset_calculator<3>(iter); - auto offset_calc = make_offset_calculator<4>(iter); + // The algorithm is: for each index in grad_out find + // the elements contributing to it and sum them up. + // Note: the algorithm does not require any synchronization. + auto loop = [=]C10_DEVICE(int i) { + auto offsets = offset_calc.get(i); - // this loop simply copies the data - // from proper places in grad_out to grad_in - auto loop = [=]C10_DEVICE(int i) { - auto offsets = offset_calc.get(i); + auto* __restrict__ grad_out_data = reinterpret_cast(grad_out_ptr + offsets[0]); + auto* __restrict__ grad_in_data = reinterpret_cast(grad_in_ptr + offsets[1]); - auto* __restrict__ grad_out_data = reinterpret_cast(grad_out_ptr + offsets[0]); - auto* __restrict__ grad_in_data = reinterpret_cast(grad_in_ptr + offsets[1]); + auto idx_dim = *reinterpret_cast(idx_dim_ptr + offsets[2]); - auto idx_dim = *reinterpret_cast(idx_dim_ptr + offsets[2]); - auto idx_last_dim = *reinterpret_cast(idx_last_dim_ptr + offsets[3]); - - auto grad_out_idx_dim = idx_dim * step + idx_last_dim; - grad_out_data[grad_out_idx_dim * grad_out_dim_stride] = *grad_in_data; - }; - - _launch_unfold_backward_kernel(iter.numel(), loop); - } - else { - auto offset_calc = make_offset_calculator<3>(iter); - - // The algorithm is: for each index in grad_out find - // the elements contributing to it and sum them up. - // Note: the algorithm does not require any synchronization. - auto loop = [=]C10_DEVICE(int i) { - auto offsets = offset_calc.get(i); - - auto* __restrict__ grad_out_data = reinterpret_cast(grad_out_ptr + offsets[0]); - auto* __restrict__ grad_in_data = reinterpret_cast(grad_in_ptr + offsets[1]); - - auto idx_dim = *reinterpret_cast(idx_dim_ptr + offsets[2]); - - // left_fold potentially intersecting with idx_dim - // is either (idx_dim - size) / step or the next integer. - int64_t left_fold_idx = (idx_dim > size) ? (idx_dim - size) / step : 0; - if (!(left_fold_idx * step <= idx_dim && idx_dim < left_fold_idx * step + size)) { - ++left_fold_idx; - } + // left_fold potentially intersecting with idx_dim + // is either (idx_dim - size) / step or the next integer. + int64_t left_fold_idx = (idx_dim > size) ? (idx_dim - size) / step : 0; + if (!(left_fold_idx * step <= idx_dim && idx_dim < left_fold_idx * step + size)) { + ++left_fold_idx; + } - auto right_fold_idx = idx_dim / step; - right_fold_idx = (right_fold_idx >= grad_in_dim_size) ? - (grad_in_dim_size - 1) : right_fold_idx; + auto right_fold_idx = idx_dim / step; + right_fold_idx = (right_fold_idx >= grad_in_dim_size) ? 
+ (grad_in_dim_size - 1) : right_fold_idx; - for (auto fold_idx = left_fold_idx; fold_idx <= right_fold_idx; ++fold_idx) { - auto idx_last_dim = idx_dim - fold_idx * step; - *grad_out_data += grad_in_data[fold_idx * grad_in_dim_stride - + idx_last_dim * grad_in_last_dim_stride]; - } + for (auto fold_idx = left_fold_idx; fold_idx <= right_fold_idx; ++fold_idx) { + auto idx_last_dim = idx_dim - fold_idx * step; + *grad_out_data += grad_in_data[fold_idx * grad_in_dim_stride + + idx_last_dim * grad_in_last_dim_stride]; + } - }; + }; - _launch_unfold_backward_kernel(iter.numel(), loop); - } + _launch_unfold_backward_kernel(iter.numel(), loop); } void unfold_backward_cuda_kernel( @@ -160,16 +135,8 @@ void unfold_backward_cuda_kernel( auto grad_out_dim_stride = ensure_nonempty_stride(grad_out, dim); - auto is_step_ge_size = (step >= size); - - TensorIterator iter = - is_step_ge_size ? - _make_unfold_backward_iter_over_grad_in( - grad_out, grad_in, dim, size, step - ) : - _make_unfold_backward_iter_over_grad_out( - grad_out, grad_in, dim, size, step - ); + TensorIterator iter = _make_unfold_backward_iter_over_grad_out( + grad_out, grad_in, dim, size, step); AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, @@ -182,8 +149,7 @@ void unfold_backward_cuda_kernel( grad_in_dim_stride, grad_in_last_dim_stride, grad_in_dim_size, - grad_out_dim_stride, - is_step_ge_size + grad_out_dim_stride ); } ); diff --git a/aten/src/ATen/native/cuda/UpSampleNearest2d.cu b/aten/src/ATen/native/cuda/UpSampleNearest2d.cu index 8aa4f68aeda6..f223655daca1 100644 --- a/aten/src/ATen/native/cuda/UpSampleNearest2d.cu +++ b/aten/src/ATen/native/cuda/UpSampleNearest2d.cu @@ -94,13 +94,13 @@ __global__ void upsample_nearest2d_nhwc_out_frame( float width_scale, const size_t out_numel) { - const int index = blockIdx.x * blockDim.x + threadIdx.x; + const int64_t index = blockIdx.x * blockDim.x + threadIdx.x; if (index < out_numel) { - const int c = index % channels; - const int w2 = (index / channels) % width2; - const int h2 = (index / channels / width2) % height2; - const int n = index / channels / width2 / height2; + const auto c = index % channels; + const auto w2 = (index / channels) % width2; + const auto h2 = (index / channels / width2) % height2; + const auto n = index / channels / width2 / height2; const size_t h1 = height1 == height2 ? h2 : nn_compute_source_index_fn(height_scale, h2, height1); const size_t w1 = width1 == width2 ? 
w2 : nn_compute_source_index_fn(width_scale, w2, width1); @@ -240,13 +240,13 @@ static void upsample_nearest2d_out_cuda_template( output.is_contiguous(memory_format)) { at::Tensor input = input_.contiguous(at::MemoryFormat::ChannelsLast); - TORCH_CHECK(input.numel() < std::numeric_limits::max(), - "upsample_nearest_nhwc only supports input tensors with less than INT_MAX elements"); - TORCH_CHECK(output.numel() < std::numeric_limits::max(), - "upsample_nearest_nhwc only supports output tensors with less than INT_MAX elements"); + TORCH_CHECK(input.numel() < std::numeric_limits::max(), + "upsample_nearest_nhwc only supports input tensors with less than 2^63 - 1 elements"); + TORCH_CHECK(output.numel() < std::numeric_limits::max(), + "upsample_nearest_nhwc only supports output tensors with less than 2^63 - 1 elements"); - const int num_kernels = output.numel(); - const int num_threads = std::min(at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock, 1024); + const int64_t num_kernels = output.numel(); + const int64_t num_threads = std::min(at::cuda::getCurrentDeviceProperties()->maxThreadsPerBlock, 1024); AT_DISPATCH_FLOATING_TYPES_AND2(ScalarType::Half, ScalarType::Byte, input.scalar_type(), "upsample_nearest2d_nhwc_out_frame", [&] { const scalar_t* idata = input.data_ptr(); diff --git a/aten/src/ATen/native/cuda/UpSampleNearest3d.cu b/aten/src/ATen/native/cuda/UpSampleNearest3d.cu index 1a4afa012d78..58f14ad491a6 100644 --- a/aten/src/ATen/native/cuda/UpSampleNearest3d.cu +++ b/aten/src/ATen/native/cuda/UpSampleNearest3d.cu @@ -337,52 +337,5 @@ TORCH_IMPL_FUNC(_upsample_nearest_exact3d_backward_out_cuda) ( using at::native::upsample::compute_output_size; using at::native::upsample_cuda::get_scale_value; -Tensor upsample_nearest3d_cuda( - const Tensor& input, - at::OptionalIntArrayRef output_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input.sizes(), output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::upsample_nearest3d(input, osize, scale_d, scale_h, scale_w); -} - -Tensor _upsample_nearest_exact3d_cuda( - const Tensor& input, - at::OptionalIntArrayRef output_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input.sizes(), output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::_upsample_nearest_exact3d(input, osize, scale_d, scale_h, scale_w); -} - -// when structured kernels can handle QuantizedCPU, update these overloads to be CompositeExplicitAutograd -Tensor upsample_nearest3d_backward_cuda( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::upsample_nearest3d_backward(grad_output, osize, input_size, scale_d, scale_h, scale_w); -} - -Tensor _upsample_nearest_exact3d_backward_cuda( - const Tensor& grad_output, - at::OptionalIntArrayRef output_size, - IntArrayRef input_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input_size, output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 
0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return at::_upsample_nearest_exact3d_backward(grad_output, osize, input_size, scale_d, scale_h, scale_w); -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/cuda/block_reduce.cuh b/aten/src/ATen/native/cuda/block_reduce.cuh index e01cd0b060f5..fa75c71f8aca 100644 --- a/aten/src/ATen/native/cuda/block_reduce.cuh +++ b/aten/src/ATen/native/cuda/block_reduce.cuh @@ -29,24 +29,43 @@ __inline__ __device__ T WarpReduceSum(T val) { return val; } +struct Block1D { + static __forceinline__ __device__ int Tid() { return threadIdx.x; } + + static __forceinline__ __device__ int Warps() { + return blockDim.x / C10_WARP_SIZE; + } +}; + +struct Block2D { + static __forceinline__ __device__ int Tid() { + return threadIdx.x + threadIdx.y * blockDim.x; + } + + static __forceinline__ __device__ int Warps() { + return blockDim.x * blockDim.y / C10_WARP_SIZE; + } +}; + // Sums `val` across all threads in a block. // +// Warning: the return value is only valid for thread 0. // Assumptions: -// - Thread blocks are an 1D set of threads (indexed with `threadIdx.x` only) // - The size of each block should be a multiple of `C10_WARP_SIZE` // - `shared` should be a pointer to shared memory with size of, at least, // `sizeof(T) * number_of_warps` -template +template __inline__ __device__ T BlockReduceSum(T val, T* shared) { - const int lid = threadIdx.x % C10_WARP_SIZE; - const int wid = threadIdx.x / C10_WARP_SIZE; + const int tid = B::Tid(); + const int lid = tid % C10_WARP_SIZE; + const int wid = tid / C10_WARP_SIZE; val = WarpReduceSum(val); - __syncthreads(); + __syncthreads(); // prevent races when BlockReduces are called in a row. if (lid == 0) { shared[wid] = val; } __syncthreads(); - val = (threadIdx.x < blockDim.x / C10_WARP_SIZE) ? shared[lid] : T(0); + val = (tid < B::Warps()) ? shared[lid] : T(0); if (wid == 0) { val = WarpReduceSum(val); } @@ -62,19 +81,19 @@ __inline__ __device__ T WarpReduce(T val, const ReduceOp& op) { return val; } -template +template __inline__ __device__ T BlockReduce(T val, const ReduceOp& op, const T& identity_element, T* shared) { - const int lid = threadIdx.x % C10_WARP_SIZE; - const int wid = threadIdx.x / C10_WARP_SIZE; + const int tid = B::Tid(); + const int lid = tid % C10_WARP_SIZE; + const int wid = tid / C10_WARP_SIZE; val = WarpReduce(val, op); - __syncthreads(); + __syncthreads(); // prevent races when BlockReduces are called in a row. if (lid == 0) { shared[wid] = val; } __syncthreads(); - val = (threadIdx.x < blockDim.x / C10_WARP_SIZE) ? shared[lid] - : identity_element; + val = (tid < B::Warps()) ? 
shared[lid] : identity_element; if (wid == 0) { val = WarpReduce(val, op); } diff --git a/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu b/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu new file mode 100644 index 000000000000..f394899e24bd --- /dev/null +++ b/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cu @@ -0,0 +1,52 @@ +#include + +#include +#include +#include +#include +#include + +namespace at { namespace native { + +void _fused_adam_cuda_impl_( + at::TensorList params, + at::TensorList grads, + at::TensorList exp_avgs, + at::TensorList exp_avg_sqs, + at::TensorList max_exp_avg_sqs, + at::TensorList state_steps, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool amsgrad, + const bool maximize, + const c10::optional& grad_scale, + const c10::optional& found_inf +) { + std::vector> tensor_lists{ + params.vec(), grads.vec(), exp_avgs.vec(), exp_avg_sqs.vec(), max_exp_avg_sqs.vec() }; + + float* grad_scale_ptr = grad_scale.has_value() ? grad_scale->data_ptr() : nullptr; + float* found_inf_ptr = found_inf.has_value() ? found_inf->data_ptr() : nullptr; + + AT_DISPATCH_FLOATING_TYPES_AND2(kHalf, kBFloat16, params[0].scalar_type(), + "fused_adam_kernel_cuda", [&]() { + multi_tensor_apply_for_fused_optimizer<5>( + tensor_lists, + state_steps, + FusedAdamMathFunctor(), + lr, + beta1, + beta2, + weight_decay, + eps, + maximize, + /* amsgrad */true, + grad_scale_ptr, + found_inf_ptr); + }); +} + +} } // namespace at::native diff --git a/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cuh b/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cuh new file mode 100644 index 000000000000..46e893e564d9 --- /dev/null +++ b/aten/src/ATen/native/cuda/fused_adam_amsgrad_impl.cuh @@ -0,0 +1,24 @@ +#pragma once +#include + +namespace at { namespace native { + +void _fused_adam_cuda_impl_( + at::TensorList params, + at::TensorList grads, + at::TensorList exp_avgs, + at::TensorList exp_avg_sqs, + at::TensorList max_exp_avg_sqs, + at::TensorList state_steps, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool amsgrad, + const bool maximize, + const c10::optional& grad_scale, + const c10::optional& found_inf +); + +} } // namespace at::native diff --git a/aten/src/ATen/native/cuda/fused_adam_impl.cu b/aten/src/ATen/native/cuda/fused_adam_impl.cu new file mode 100644 index 000000000000..3674f83b20b1 --- /dev/null +++ b/aten/src/ATen/native/cuda/fused_adam_impl.cu @@ -0,0 +1,51 @@ +#include + +#include +#include +#include +#include +#include + +namespace at { namespace native { + +void _fused_adam_cuda_impl_( + at::TensorList params, + at::TensorList grads, + at::TensorList exp_avgs, + at::TensorList exp_avg_sqs, + at::TensorList state_steps, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool amsgrad, + const bool maximize, + const c10::optional& grad_scale, + const c10::optional& found_inf +) { + std::vector> tensor_lists{ + params.vec(), grads.vec(), exp_avgs.vec(), exp_avg_sqs.vec() }; + + float* grad_scale_ptr = grad_scale.has_value() ? grad_scale->data_ptr() : nullptr; + float* found_inf_ptr = found_inf.has_value() ? 
found_inf->data_ptr() : nullptr; + + AT_DISPATCH_FLOATING_TYPES_AND2(kHalf, kBFloat16, params[0].scalar_type(), + "fused_adam_kernel_cuda", [&]() { + multi_tensor_apply_for_fused_optimizer<4>( + tensor_lists, + state_steps, + FusedAdamMathFunctor(), + lr, + beta1, + beta2, + weight_decay, + eps, + maximize, + /* amsgrad */false, + grad_scale_ptr, + found_inf_ptr); + }); +} + +} } // namespace at::native diff --git a/aten/src/ATen/native/cuda/fused_adam_impl.cuh b/aten/src/ATen/native/cuda/fused_adam_impl.cuh new file mode 100644 index 000000000000..a76ba566970f --- /dev/null +++ b/aten/src/ATen/native/cuda/fused_adam_impl.cuh @@ -0,0 +1,23 @@ +#pragma once +#include + +namespace at { namespace native { + +void _fused_adam_cuda_impl_( + at::TensorList params, + at::TensorList grads, + at::TensorList exp_avgs, + at::TensorList exp_avg_sqs, + at::TensorList state_steps, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool amsgrad, + const bool maximize, + const c10::optional& grad_scale, + const c10::optional& found_inf +); + +} } // namespace at::native diff --git a/aten/src/ATen/native/cuda/fused_adam_utils.cuh b/aten/src/ATen/native/cuda/fused_adam_utils.cuh new file mode 100644 index 000000000000..8d7c410915c1 --- /dev/null +++ b/aten/src/ATen/native/cuda/fused_adam_utils.cuh @@ -0,0 +1,166 @@ +#pragma once +#include +#include +#include +#include + + +namespace at { namespace native { + +namespace { + +constexpr uint8_t kParamIdx = 0; +constexpr uint8_t kGradIdx = 1; +constexpr uint8_t kExpAvgIdx = 2; +constexpr uint8_t kExpAvgSqIdx = 3; +constexpr uint8_t kMaxExpAvgSqIdx = 4; + +template +C10_DEVICE __forceinline__ void adam_math( + scalar_type r_args[depth][kILP], + const float* step_count, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool maximize, + const bool amsgrad, + const float* grad_scale_ptr, + const float* found_inf_ptr +) { +#pragma unroll + for (int ii = 0; ii < kILP; ii++) { + // Load values. + opmath_t param = static_cast(r_args[kParamIdx][ii]); + opmath_t grad = static_cast(r_args[kGradIdx][ii]); + if (grad_scale_ptr) { + grad /= (static_cast(*grad_scale_ptr)); + } + const opmath_t grad_to_store = grad; + if (maximize) { + grad = -grad; + } + opmath_t exp_avg = static_cast(r_args[kExpAvgIdx][ii]); + opmath_t exp_avg_sq = static_cast(r_args[kExpAvgSqIdx][ii]); + opmath_t max_exp_avg_sq; + if (amsgrad) { + max_exp_avg_sq = static_cast(r_args[kMaxExpAvgSqIdx][ii]); + } + + // Update param, grad, 1st and 2nd order momentum. + if (weight_decay != 0) { + grad += param * weight_decay; + } + // todo(crcrpar): use lerp + // ref: https://developer.nvidia.com/blog/lerp-faster-cuda/ + exp_avg = beta1 * exp_avg + (1 - beta1) * grad; + exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad; + + if (amsgrad) { + max_exp_avg_sq = std::max(max_exp_avg_sq, exp_avg_sq); + } + + const opmath_t bias_correction1 = 1 - at::native::pow_(beta1, *step_count); + const opmath_t bias_correction2 = 1 - at::native::pow_(beta2, *step_count); + + const opmath_t step_size = lr / bias_correction1; + + const opmath_t bias_correction2_sqrt = std::sqrt(bias_correction2); + + opmath_t denom; + if (amsgrad) { + denom = (std::sqrt(max_exp_avg_sq) / bias_correction2_sqrt) + eps; + } else { + denom = (std::sqrt(exp_avg_sq) / bias_correction2_sqrt) + eps; + } + + param -= step_size * exp_avg / denom; + + // Store results. 
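adam_math above is the standard Adam update with bias correction (the bias-correction terms reuse the pow_ helpers factored into Pow.cuh earlier in this patch), plus optional weight decay, maximize, AMSGrad and AMP grad unscaling. Leaving those options out, a host-side reference of one scalar step looks roughly like:

    #include <cmath>

    // One scalar Adam step, mirroring the math in adam_math (double throughout;
    // maximize/AMSGrad/grad-scale handling omitted).
    void adam_step_reference(
        double& param, double grad, double& exp_avg, double& exp_avg_sq,
        double step, double lr, double beta1, double beta2,
        double weight_decay, double eps) {
      if (weight_decay != 0) {
        grad += param * weight_decay;
      }
      exp_avg = beta1 * exp_avg + (1 - beta1) * grad;
      exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad;
      const double bias_correction1 = 1 - std::pow(beta1, step);
      const double bias_correction2 = 1 - std::pow(beta2, step);
      const double step_size = lr / bias_correction1;
      const double denom = std::sqrt(exp_avg_sq) / std::sqrt(bias_correction2) + eps;
      param -= step_size * exp_avg / denom;
    }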
+ r_args[kParamIdx][ii] = param; + if (grad_scale_ptr) { + r_args[kGradIdx][ii] = grad_to_store; + } + r_args[kExpAvgIdx][ii] = exp_avg; + r_args[kExpAvgSqIdx][ii] = exp_avg_sq; + if (amsgrad) { + r_args[kMaxExpAvgSqIdx][ii] = max_exp_avg_sq; + } + } +} + +// [note: Conditional Gradient Store when `optimizer.step` is called by GradScaler] +// When a user is training their model(s) with an FP16 AMP recipe, +// parameter updates are done via `grad_scaler.step(optimizer)` instead of `optimizer.step()`. +// For most optimizers, GradScaler unscales gradients on behalf of those optimizers. +// Also, before `.step`, it makes sure that all the gradients involved are finite, which incurs a device sync. +// On the other hand, fused optimizers set their member variable of `_step_supports_amp_scaling` to `True` +// in order to remove the device sync above. This means that fused optimizers have to have +// their CUDA kernels (a) unscale gradients and (b) skip parameter updates accordingly. +// To be functionally on par with `torch.optim` optimizers and `_multi_tensor` ones, +// the kernel below writes out gradients only when `grad_scale_ptr != nullptr. +template +struct FusedAdamMathFunctor { + static_assert(depth == 4 || depth == 5, "depth of 4 for Adam, depth of 5 for Adam with AMSGrad."); + using opmath_t = at::opmath_type; + C10_DEVICE __forceinline__ void operator()( + int chunk_size, + FusedOptimizerTensorListMetadata& tl, + const double lr, + const double beta1, + const double beta2, + const double weight_decay, + const double eps, + const bool maximize, + const bool amsgrad, + const float* grad_scale_ptr, + const float* found_inf_ptr + ) { + int tensor_loc = tl.block_to_tensor[blockIdx.x]; + int chunk_idx = tl.block_to_chunk[blockIdx.x]; + int n = tl.numel_for_tensor[tensor_loc]; + + if (found_inf_ptr && *found_inf_ptr == 1) { + return; + } + float *step_count = reinterpret_cast(tl.state_steps_addresses[tensor_loc]); + + scalar_type* args[depth]; + const bool all_aligned{init_args(args, tl, chunk_idx, chunk_size, tensor_loc)}; + n -= chunk_idx * chunk_size; + scalar_type r_args[depth][kILP]; + + if ((n % kILP == 0) && (chunk_size % kILP == 0) && all_aligned) { + for (int i_start = threadIdx.x; i_start * kILP < n && i_start * kILP < chunk_size; i_start += blockDim.x) { +#pragma unroll + for (int i = 0; i < depth; i++) { + load_store(r_args[i], args[i], 0, i_start); + } + adam_math( + r_args, step_count, lr, beta1, beta2, weight_decay, eps, maximize, amsgrad, grad_scale_ptr, found_inf_ptr); +#pragma unroll + for (int i = 0; i < depth; i++) { + if (i != kGradIdx || grad_scale_ptr) { + load_store(args[i], r_args[i], i_start, 0); + } + } + } + } else { + for (int i_start = 0; i_start < n && i_start < chunk_size; i_start += blockDim.x * kILP) { + load_args(r_args, args, i_start, chunk_size, n); + adam_math( + r_args, step_count, lr, beta1, beta2, weight_decay, eps, maximize, amsgrad, grad_scale_ptr, found_inf_ptr); +#pragma unroll + for (int i = 0; i < depth; i++) { + if (i != kGradIdx || grad_scale_ptr) { + store_args(args[i], r_args[i], i_start, chunk_size, n); + } + } + } + } + } +}; +} // namespace + +}} // namespace at::native diff --git a/aten/src/ATen/native/cuda/im2col.cuh b/aten/src/ATen/native/cuda/im2col.cuh index 391b8c6d83af..06eef13208c6 100644 --- a/aten/src/ATen/native/cuda/im2col.cuh +++ b/aten/src/ATen/native/cuda/im2col.cuh @@ -1,9 +1,8 @@ #pragma once +#include #include #include -#include -#include #include @@ -103,6 +102,60 @@ void im2col( C10_CUDA_KERNEL_LAUNCH_CHECK(); } 
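// col2im is the adjoint of im2col: every entry of the column buffer is added back
// into the image pixel it was originally read from, so overlapping patches
// accumulate. The col2im_device helper added below computes this as a per-pixel
// gather, which avoids atomics. For orientation only, a rough CPU reference in the
// scatter formulation -- a sketch with made-up names, not code from this tree:
#include <algorithm>
#include <cstdint>

void col2im_cpu_reference(const float* data_col, int64_t channels,
                          int64_t height, int64_t width,
                          int64_t kernel_h, int64_t kernel_w,
                          int64_t pad_h, int64_t pad_w,
                          int64_t stride_h, int64_t stride_w,
                          int64_t dilation_h, int64_t dilation_w,
                          float* data_im) {
  const int64_t height_col =
      (height + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1;
  const int64_t width_col =
      (width + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w + 1;
  std::fill(data_im, data_im + channels * height * width, 0.f);
  // Scatter every column entry back into the image; overlapping patches accumulate.
  for (int64_t c = 0; c < channels * kernel_h * kernel_w; ++c) {
    const int64_t w_off = c % kernel_w;
    const int64_t h_off = (c / kernel_w) % kernel_h;
    const int64_t c_im = c / (kernel_h * kernel_w);
    for (int64_t h = 0; h < height_col; ++h) {
      for (int64_t w = 0; w < width_col; ++w) {
        const int64_t h_im = h * stride_h - pad_h + h_off * dilation_h;
        const int64_t w_im = w * stride_w - pad_w + w_off * dilation_w;
        if (h_im >= 0 && h_im < height && w_im >= 0 && w_im < width) {
          data_im[(c_im * height + h_im) * width + w_im] +=
              data_col[(c * height_col + h) * width_col + w];
        }
      }
    }
  }
}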
+template +__forceinline__ __device__ void col2im_device( + const int64_t index, + const dt* data_col, + const int64_t height, + const int64_t width, + const int64_t channels, + const int64_t kernel_h, + const int64_t kernel_w, + const int64_t pad_height, + const int64_t pad_width, + const int64_t stride_height, + const int64_t stride_width, + const int64_t dilation_height, + const int64_t dilation_width, + const int64_t height_col, + const int64_t width_col, + dt* data_im) { + accT val = static_cast(0); + const int64_t w_im = index % width + pad_width; + const int64_t h_im = (index / width) % height + pad_height; + const int64_t c_im = index / (width * height); + int64_t kernel_extent_w = (kernel_w - 1) * dilation_width + 1; + int64_t kernel_extent_h = (kernel_h - 1) * dilation_height + 1; + // compute the start and end of the output + const int64_t w_col_start = (w_im < kernel_extent_w) + ? 0 + : (w_im - kernel_extent_w) / stride_width + 1; + const int64_t w_col_end = ::min(w_im / stride_width + 1, width_col); + const int64_t h_col_start = (h_im < kernel_extent_h) + ? 0 + : (h_im - kernel_extent_h) / stride_height + 1; + const int64_t h_col_end = ::min(h_im / stride_height + 1, height_col); + + // TODO: use LCM of stride and dilation to avoid unnecessary loops + for (int64_t h_col = h_col_start; h_col < h_col_end; h_col += 1) { + for (int64_t w_col = w_col_start; w_col < w_col_end; w_col += 1) { + int64_t h_k = (h_im - h_col * stride_height); + int64_t w_k = (w_im - w_col * stride_width); + if (h_k % dilation_height == 0 && w_k % dilation_width == 0) { + h_k /= dilation_height; + w_k /= dilation_width; + int64_t data_col_index = + (((c_im * kernel_h + h_k) * kernel_w + w_k) * height_col + + h_col) * + width_col + + w_col; + val += data_col[data_col_index]; + } + } + } + data_im[index] = static_cast
(val); +} + template C10_LAUNCH_BOUNDS_1(512) __global__ void col2im_kernel( @@ -123,40 +176,23 @@ __global__ void col2im_kernel( const int64_t width_col, dt* data_im) { CUDA_KERNEL_LOOP(index, n) { - accT val = static_cast(0); - const int64_t w_im = index % width + pad_width; - const int64_t h_im = (index / width) % height + pad_height; - const int64_t c_im = index / (width * height); - int64_t kernel_extent_w = (kernel_w - 1) * dilation_width + 1; - int64_t kernel_extent_h = (kernel_h - 1) * dilation_height + 1; - // compute the start and end of the output - const int64_t w_col_start = (w_im < kernel_extent_w) - ? 0 - : (w_im - kernel_extent_w) / stride_width + 1; - const int64_t w_col_end = ::min(w_im / stride_width + 1, width_col); - const int64_t h_col_start = (h_im < kernel_extent_h) - ? 0 - : (h_im - kernel_extent_h) / stride_height + 1; - const int64_t h_col_end = ::min(h_im / stride_height + 1, height_col); - - // TODO: use LCM of stride and dilation to avoid unnecessary loops - for (int64_t h_col = h_col_start; h_col < h_col_end; h_col += 1) { - for (int64_t w_col = w_col_start; w_col < w_col_end; w_col += 1) { - int64_t h_k = (h_im - h_col * stride_height); - int64_t w_k = (w_im - w_col * stride_width); - if (h_k % dilation_height == 0 && w_k % dilation_width == 0) { - h_k /= dilation_height; - w_k /= dilation_width; - int64_t data_col_index = - (((c_im * kernel_h + h_k) * kernel_w + w_k) * height_col + - h_col) * - width_col + - w_col; - val += data_col[data_col_index]; - } - } - } - data_im[index] = static_cast
(val); + col2im_device( + index, + data_col, + height, + width, + channels, + kernel_h, + kernel_w, + pad_height, + pad_width, + stride_height, + stride_width, + dilation_height, + dilation_width, + height_col, + width_col, + data_im); } } @@ -203,5 +239,107 @@ void col2im( C10_CUDA_KERNEL_LAUNCH_CHECK(); } +template +C10_LAUNCH_BOUNDS_1(512) +__global__ void col2im_batched_kernel( + const int64_t n, + const dt* data_col, + const int64_t col_batch_stride, + const int64_t nbatch, + const int64_t height, + const int64_t width, + const int64_t channels, + const int64_t kernel_h, + const int64_t kernel_w, + const int64_t pad_height, + const int64_t pad_width, + const int64_t stride_height, + const int64_t stride_width, + const int64_t dilation_height, + const int64_t dilation_width, + const int64_t height_col, + const int64_t width_col, + dt* data_im, + const int64_t im_batch_stride) { + using accT = at::acc_type; + const auto im_numel = n * nbatch; + + CUDA_KERNEL_LOOP_TYPE(index, im_numel, int64_t) { + const auto ibatch = index / n; + const auto slice_index = index % n; + + col2im_device( + slice_index, + data_col + ibatch * col_batch_stride, + height, + width, + channels, + kernel_h, + kernel_w, + pad_height, + pad_width, + stride_height, + stride_width, + dilation_height, + dilation_width, + height_col, + width_col, + data_im + ibatch * im_batch_stride); + } +} + +template +void col2im_batched( + cudaStream_t stream, + const dt* data_col, + const int64_t col_batch_stride, + const int64_t nbatch, + const int64_t channels, + const int64_t height, + const int64_t width, + const int64_t height_col, + const int64_t width_col, + const int64_t patch_height, + const int64_t patch_width, + const int64_t pad_height, + const int64_t pad_width, + const int64_t stride_height, + const int64_t stride_width, + const int64_t dilation_height, + const int64_t dilation_width, + dt* data_im, + const int64_t im_batch_stride) { + const int64_t num_kernels = channels * height * width; + const int64_t output_numel = nbatch * num_kernels; + if (output_numel == 0) { + return; // No work to do + } + + // To avoid involving atomic operations, we will launch one kernel per + // bottom dimension, and then in the kernel add up the top dimensions. + // CUDA_NUM_THREADS = 1024 + col2im_batched_kernel<<>>( + num_kernels, + data_col, + col_batch_stride, + nbatch, + height, + width, + channels, + patch_height, + patch_width, + pad_height, + pad_width, + stride_height, + stride_width, + dilation_height, + dilation_width, + height_col, + width_col, + data_im, + im_batch_stride); + C10_CUDA_KERNEL_LAUNCH_CHECK(); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/cuda/jit_utils.cpp b/aten/src/ATen/native/cuda/jit_utils.cpp index 673ea9f476e4..a1266fb1b504 100644 --- a/aten/src/ATen/native/cuda/jit_utils.cpp +++ b/aten/src/ATen/native/cuda/jit_utils.cpp @@ -3,10 +3,10 @@ #include #include #include -#include #include #include #include +#include #include #include #include @@ -40,7 +40,148 @@ namespace at { namespace cuda { namespace jit { +// hiprtc already includes some traits, so this removes duplicate definitions of +// integral_constant, is_same, is_integral, enable_if, is_floating_point, is_arithmetic. +// Copied from aten/src/ATen/cuda/llvm_basic.cpp, then modified as above. +// If not compiling for ROCm, return the original get_traits_string(). 
+std::string get_traits_string_but_hiprtc_safe() { +#ifdef USE_ROCM + return R"ESCAPE( +namespace std { + +template +_Tp&& __declval(int); +template +_Tp __declval(long); +template +decltype(__declval<_Tp>(0)) declval() noexcept; + +template struct remove_const {typedef _Tp type;}; +template struct remove_const {typedef _Tp type;}; +template using remove_const_t = typename remove_const<_Tp>::type; + +template struct remove_volatile {typedef _Tp type;}; +template struct remove_volatile {typedef _Tp type;}; +template using remove_volatile_t = typename remove_volatile<_Tp>::type; + +template struct remove_cv +{typedef typename remove_volatile::type>::type type;}; +template using remove_cv_t = typename remove_cv<_Tp>::type; + +template struct __libcpp_is_floating_point : public false_type {}; +template <> struct __libcpp_is_floating_point : public true_type {}; +template <> struct __libcpp_is_floating_point : public true_type {}; +template <> struct __libcpp_is_floating_point : public true_type {}; + +template +inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value; + +template +struct __numeric_type +{ + static void __test(...); + static float __test(float); + static double __test(char); + static double __test(int); + static double __test(unsigned); + static double __test(long); + static double __test(unsigned long); + static double __test(long long); + static double __test(unsigned long long); + static double __test(double); + static long double __test(long double); + + typedef decltype(__test(declval<_Tp>())) type; + static const bool value = !is_same::value; +}; + +template <> +struct __numeric_type +{ + static const bool value = true; +}; + +// __promote + +template ::value && + __numeric_type<_A2>::value && + __numeric_type<_A3>::value> +class __promote_imp +{ +public: + static const bool value = false; +}; + +template +class __promote_imp<_A1, _A2, _A3, true> +{ +private: + typedef typename __promote_imp<_A1>::type __type1; + typedef typename __promote_imp<_A2>::type __type2; + typedef typename __promote_imp<_A3>::type __type3; +public: + typedef decltype(__type1() + __type2() + __type3()) type; + static const bool value = true; +}; + +template +class __promote_imp<_A1, _A2, void, true> +{ +private: + typedef typename __promote_imp<_A1>::type __type1; + typedef typename __promote_imp<_A2>::type __type2; +public: + typedef decltype(__type1() + __type2()) type; + static const bool value = true; +}; + +template +class __promote_imp<_A1, void, void, true> +{ +public: + typedef typename __numeric_type<_A1>::type type; + static const bool value = true; +}; + +template +class __promote : public __promote_imp<_A1, _A2, _A3> {}; + +} // namespace std +)ESCAPE"; +#else + return get_traits_string(); +#endif +} + +#ifdef USE_ROCM +const std::string jit_preamble = R"ESCAPE( +#pragma clang force_cuda_host_device begin +)ESCAPE"; +const std::string jit_epilogue = R"ESCAPE( +#pragma clang force_cuda_host_device end +)ESCAPE"; +#else +const std::string jit_preamble; +const std::string jit_epilogue; +#endif + const std::string jit_common_types = R"ESCAPE( + #ifdef __HIPCC__ + #define ERROR_UNSUPPORTED_CAST ; + // corresponds to aten/src/ATen/native/cuda/thread_constants.h + #define CUDA_OR_ROCM_NUM_THREADS 256 + // corresponds to aten/src/ATen/cuda/detail/OffsetCalculator.cuh + #define MAX_DIMS 16 + #ifndef __forceinline__ + #define __forceinline__ inline __attribute__((always_inline)) + #endif + #else + //TODO use _assert_fail, because assert is disabled in non-debug builds + #define 
ERROR_UNSUPPORTED_CAST assert(false); + #define CUDA_OR_ROCM_NUM_THREADS 128 + #define MAX_DIMS 25 + #endif #define POS_INFINITY __int_as_float(0x7f800000) #define INFINITY POS_INFINITY #define NEG_INFINITY __int_as_float(0xff800000) @@ -54,11 +195,9 @@ const std::string jit_common_types = R"ESCAPE( static_assert(sizeof(int64_t) == 8, "expected size does not match"); static_assert(sizeof(uint32_t) == 4, "expected size does not match"); static_assert(sizeof(int8_t) == 1, "expected size does not match"); - constexpr int num_threads = 128; + constexpr int num_threads = CUDA_OR_ROCM_NUM_THREADS; constexpr int thread_work_size = 4; // TODO: make template substitution once we decide where those vars live constexpr int block_work_size = thread_work_size * num_threads; - //TODO use _assert_fail, because assert is disabled in non-debug builds - #define ERROR_UNSUPPORTED_CAST assert(false); ${traits_string} ${cmath_string} @@ -146,15 +285,22 @@ struct alignas(2) Half { Half() = default; inline __host__ __device__ Half(float value){ +#ifdef __HIPCC__ + x = __half_as_short(__float2half(value)); +#else asm("{ cvt.rn.f16.f32 %0, %1;}\n" : "=h"(x) : "f"(value)); +#endif } inline __host__ __device__ operator float() const{ +#ifdef __HIPCC__ + return __half2float(*reinterpret_cast(&x)); +#else float val; asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(x)); // do we need const cast here? //asm("{ cvt.f32.f16 %0, %1;}\n" : "=f"(val) : "h"(__HALF_TO_CUS(x))); return val; +#endif } - }; } )ESCAPE"; @@ -201,9 +347,18 @@ struct alignas(2) BFloat16 { } inline __host__ __device__ operator float() const{ +#ifdef __HIPCC__ + union + { + uint32_t int32; + float fp32; + } u = {uint32_t(x) << 16}; + return u.fp32; +#else float val; asm("{ mov.b32 %0, {0,%1};}\n" : "=f"(val) : "h"(x)); //do we need const cast here? 
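// The HIP branch above widens bfloat16 by placing the 16 stored bits into the
// upper half of an IEEE-754 binary32 word. A host-side sketch of the same
// conversion (illustrative only; memcpy is used instead of the union to stay
// well-defined outside device code):
#include <cstdint>
#include <cstring>

inline float bf16_bits_to_float(uint16_t x) {
  const uint32_t bits = static_cast<uint32_t>(x) << 16;  // bf16 = top 16 bits of f32
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}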
return val; +#endif } }; @@ -450,7 +605,7 @@ const std::string offset_calc_template = R"ESCAPE( } #pragma unroll - for (int dim = 0; dim < 25; ++dim) { + for (int dim = 0; dim < MAX_DIMS; ++dim) { if (dim == dims) { break; } @@ -469,9 +624,9 @@ const std::string offset_calc_template = R"ESCAPE( } int dims; - IntDivider sizes_[25]; + IntDivider sizes_[MAX_DIMS]; // NOTE: this approach will not support nInputs == 0 - ${index_type} strides_[25][NARGS]; + ${index_type} strides_[MAX_DIMS][NARGS]; }; @@ -501,7 +656,7 @@ const std::string jit_code_template = R"ESCAPE( int idx = blockIdx.x; int remaining = numel - block_work_size * idx; - auto thread_idx = threadIdx.x; + int thread_idx = threadIdx.x; #pragma unroll for (int j = 0; j < thread_work_size; j++){ @@ -592,7 +747,7 @@ const std::string jit_vectorized_code_template = R"ESCAPE( constexpr int vec_size = ${vec_size}; using scalar_t = ${scalar_type}; int remaining = N - block_work_size * blockIdx.x; - auto thread_idx = threadIdx.x; + int thread_idx = threadIdx.x; int idx = blockIdx.x; ${declare_load_arrays} ${declare_store_arrays} @@ -651,6 +806,49 @@ const std::string jit_vectorized_code_template = R"ESCAPE( } )ESCAPE"; +static void replace_all(std::string& s, const std::string& to_replace, const std::string& replace_with) { + std::ostringstream oss; + std::size_t pos = 0; + std::size_t prev_pos = pos; + + while (true) { + prev_pos = pos; + pos = s.find(to_replace, pos); + if (pos == std::string::npos) + break; + oss << s.substr(prev_pos, pos - prev_pos); + oss << replace_with; + pos += to_replace.size(); + } + + oss << s.substr(prev_pos); + s = oss.str(); +} + +// hipify replaces certain device math functions, e.g., std::max -> ::max +// See torch/utils/hipify/cuda_to_hip_mappings.py. +// Replace them back. Search for " ::" to avoid duplicate replacements. +static std::string unhipify_math_functions(const std::string &original) { + static std::vector> mappings = { + {" std::max", " ::max"}, + {" std::min", " ::min"}, + {" std::ceil", " ::ceil"}, + {" std::floor", " ::floor"}, + {" std::exp", " ::exp"}, + {" std::log", " ::log"}, + {" std::pow", " ::pow"}, + {" std::fabs", " ::fabs"}, + {" std::fmod", " ::fmod"}, + {" std::remainder", " ::remainder"}, + {" std::frexp", " ::frexp"} + }; + std::string ret = original; + for (const auto& mapping : mappings) { + replace_all(ret, mapping.second, mapping.first); + } + return ret; +} + // The following is copied from fused_kernel.cpp // TODO: refactor codegenOutputQuery into its own file // that can be included by both files @@ -668,7 +866,12 @@ void codegenOutputQuery( int& nvrtc_major, int& nvrtc_minor, bool& compile_to_sass) { - +#ifdef USE_ROCM + AT_CUDA_NVRTC_CHECK(nvrtc().nvrtcVersion(&nvrtc_major, &nvrtc_minor)); + cuda_major = prop->major; + cuda_minor = prop->minor; + compile_to_sass = false; +#else AT_CUDA_NVRTC_CHECK(nvrtc().nvrtcVersion(&nvrtc_major, &nvrtc_minor)); TORCH_CHECK( nvrtc_major >= 6, "NVRTC versions less than 6 are not supported. Is: ", nvrtc_major); @@ -690,6 +893,8 @@ void codegenOutputQuery( max_dev_version = CUDAVersion(7, 5); } else if (nvrtc_version == CUDAVersion(11, 0)) { // 11.0 supports 3-8.0 max_dev_version = CUDAVersion(8, 0); + } else if (nvrtc_major == 11 && nvrtc_minor < 8) { + max_dev_version = CUDAVersion(8, 6); } else { // If the driver version is unknown (i.e. 
newer than this code) // assume the driver supports this device @@ -711,6 +916,7 @@ void codegenOutputQuery( // compile to sass is not allowed prior to CUDA 11.1 compile_to_sass = false; #endif +#endif } // TODO: another copy paste from jit, refactor so it's usable from both @@ -722,7 +928,7 @@ void __inline__ initializeCudaContext() { AT_CUDA_DRIVER_CHECK(at::globalContext().getNVRTC().cuCtxGetCurrent(&pctx)); if (!pctx) { std::unique_lock cudaFreeMutexLock( - *(c10::cuda::CUDACachingAllocator::getFreeMutex())); + *(c10::cuda::getFreeMutex())); cudaFree(nullptr); } } @@ -764,7 +970,7 @@ constexpr int thread_work_size = THREAD_WORK_SIZE; std::string generate_code( int nInputs, int nOutputs, - const std::string& func, + const std::string& func_, const std::string& name, const std::string& f_inputs_type, const std::string& compute_type, @@ -776,6 +982,7 @@ std::string generate_code( bool vectorized, int vec_size, bool return_by_ref) { + std::string func = func_; at::jit::TemplateEnv env; env.s("index_type", "unsigned int"); @@ -887,11 +1094,16 @@ std::string generate_code( f_inputs_type == "std::complex" || result_type == "std::complex" || f_inputs_type == "std::complex" || result_type == "std::complex") { // complex depends on complex and Half dtypes. - env.s("traits_string", get_traits_string()); + env.s("traits_string", get_traits_string_but_hiprtc_safe()); env.s("complex_body_string", get_complex_body_string()); env.s("complex_math_string", get_complex_math_string()); +#ifdef USE_ROCM + // unhipify math functions, but only if std::complex is used. + func = unhipify_math_functions(func); + env.s("functor", func); +#endif } else if (dynamic_casting) { - env.s("traits_string", get_traits_string()); + env.s("traits_string", get_traits_string_but_hiprtc_safe()); env.s("complex_body_string", get_complex_body_string()); env.s("complex_math_string", ""); } else { @@ -948,7 +1160,8 @@ std::string generate_code( } env.s("store_outputs", store_outputs.str()); - static auto cuda_template = at::jit::CodeTemplate(jit_common_types + offset_calc_template + jit_code_template); + static auto cuda_template = at::jit::CodeTemplate( + jit_preamble + jit_common_types + offset_calc_template + jit_code_template + jit_epilogue); const auto code = cuda_template.format(env); return code; } @@ -1014,7 +1227,8 @@ std::string generate_code( } env.s("store_unrolled_outputs", store_unrolled_outputs.str()); - static auto cuda_template = at::jit::CodeTemplate(jit_common_types + jit_vectorized_code_template); + static auto cuda_template = at::jit::CodeTemplate( + jit_preamble + jit_common_types + jit_vectorized_code_template + jit_epilogue); const auto code = cuda_template.format(env); return code; } @@ -1114,7 +1328,7 @@ std::string generate_reduction_code( std::string generate_reduction_code( int nOutputs, - const std::string& func, + const std::string& func_, const std::string& name, const int vt0, const std::string& f_inputs_type, @@ -1124,6 +1338,7 @@ std::string generate_reduction_code( bool vectorized, int vec_size, int max_threads_codegen) { + std::string func = func_; at::jit::TemplateEnv env; env.s("index_type", "unsigned int"); env.s("scalar_type", f_inputs_type); @@ -1149,10 +1364,14 @@ std::string generate_reduction_code( f_inputs_type == "std::complex" || f_inputs_type == "std::complex" ) { // complex depends on complex and Half dtypes. 
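// The ROCm-only lines added in this hunk call unhipify_math_functions on the user
// functor (as generate_code does above), turning the hipified " ::max" / " ::pow"
// spellings back into " std::max" / " std::pow" so the complex math headers
// resolve. A standalone sketch of that substitution (illustrative only -- the
// helpers above are file-static, so the replacement loop is re-implemented here):
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
  std::string functor = "T combine(T a, T b) { return ::max(a, ::pow(b, T{2})); }";
  const std::vector<std::pair<std::string, std::string>> mappings = {
      {" ::max", " std::max"}, {" ::pow", " std::pow"}};
  for (const auto& m : mappings) {
    std::size_t pos = 0;
    while ((pos = functor.find(m.first, pos)) != std::string::npos) {
      functor.replace(pos, m.first.size(), m.second);
      pos += m.second.size();
    }
  }
  std::cout << functor << "\n";
  // prints: T combine(T a, T b) { return std::max(a, std::pow(b, T{2})); }
}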
- env.s("traits_string", get_traits_string()); + env.s("traits_string", get_traits_string_but_hiprtc_safe()); env.s("complex_body_string", get_complex_body_string()); env.s("complex_math_string", get_complex_math_string()); env.s("complex", std::to_string(1)); +#ifdef USE_ROCM + // unhipify math functions, but only if std::complex is used. + func = unhipify_math_functions(func); +#endif } else { env.s("traits_string", ""); env.s("complex_body_string", ""); @@ -1168,7 +1387,7 @@ std::string generate_reduction_code( env.s("functor", func); env.s("output_vec_size", std::to_string(vec_size)); static auto cuda_template = at::jit::CodeTemplate( - jit_common_types + offset_calc_template + get_reduction_template()); + jit_preamble + jit_common_types + offset_calc_template + get_reduction_template() + jit_epilogue); const auto code = cuda_template.format(env); return code; } @@ -1312,6 +1531,9 @@ NvrtcFunction jit_pwise_function( AT_CUDA_NVRTC_CHECK(nvrtc.nvrtcCreateProgram( &program, code.c_str(), nullptr, 0, nullptr, nullptr)); +#ifdef USE_ROCM + std::vector args = {"--std=c++14"}; +#else // Constructs nvrtc build arguments // CUDA 11.1 allows going directly to SASS (sm_) instead of PTX (compute_) // which gives better backwards compatibility to work on older driver, @@ -1326,6 +1548,7 @@ NvrtcFunction jit_pwise_function( // NOLINTNEXTLINE(cppcoreguidelines-init-variables) std::vector args = { "--std=c++14", compute.c_str(), "-default-device"}; +#endif #ifndef NDEBUG // Add line info to generated kernels diff --git a/aten/src/ATen/native/cuda/jit_utils.h b/aten/src/ATen/native/cuda/jit_utils.h index 13aa723db275..8206f67316e1 100644 --- a/aten/src/ATen/native/cuda/jit_utils.h +++ b/aten/src/ATen/native/cuda/jit_utils.h @@ -8,7 +8,6 @@ #include #include #include -#include namespace at { namespace cuda { namespace jit { diff --git a/aten/src/ATen/native/cuda/layer_norm_kernel.cu b/aten/src/ATen/native/cuda/layer_norm_kernel.cu index 96d700c761eb..693524818fb4 100644 --- a/aten/src/ATen/native/cuda/layer_norm_kernel.cu +++ b/aten/src/ATen/native/cuda/layer_norm_kernel.cu @@ -25,6 +25,8 @@ #endif #include +#include + namespace at { namespace native { @@ -33,6 +35,7 @@ namespace { constexpr int kCUDANumThreads = 256; constexpr int kColwiseReduceTileSize = 32; +constexpr unsigned int kWarpSize = 32; constexpr int vec_size = 4; //we could make it dependent on dtype, but that would lead to different results between float and low-p types // aligned vector generates vectorized load/store on CUDA (copy-pasted from MemoryAccess.cuh) @@ -555,8 +558,108 @@ __global__ void GammaBetaBackwardCUDAKernel1( } } +template +__global__ void GammaBetaBackwardCUDAKernel_32x32( + int64_t M, + int64_t N, + const T* dY, + const T* X, + const T_ACC* mean, + const T_ACC* rstd, + T* dg, + T* db) { + alignas(sizeof(double)) extern __shared__ char s_data1[]; + T_ACC* s_data_typed = reinterpret_cast(&s_data1); + T_ACC* s_dg; + T_ACC* s_db; + + T_ACC dg_sum = 0; + T_ACC db_sum = 0; + + const int64_t j = blockIdx.x * blockDim.x + threadIdx.x; + + if (j < N) { + constexpr int unroll_factor = 8; + int laneId = threadIdx.x & 0x1f; + + T_ACC mean_reg, mean_reg_tmp; + T_ACC rstd_reg, rstd_reg_tmp; + T dY_reg; + T X_reg; + + // Main loop + int bcounter; + for (bcounter = 0; bcounter < M / (blockDim.y * unroll_factor); + bcounter++) { + int offset = (bcounter * blockDim.y + threadIdx.y) * unroll_factor; + + if (laneId < unroll_factor) { + mean_reg_tmp = mean[offset + laneId]; + rstd_reg_tmp = rstd[offset + laneId]; + } +#if 
!defined(USE_ROCM) + // Volta and newer architectures allow lane divergence within a warp. + __syncwarp(); +#endif + + #pragma unroll + for (int ii = 0; ii < unroll_factor; ++ii) { + dY_reg = dY[(offset + ii) * N + j]; + X_reg = X[(offset + ii) * N + j]; + mean_reg = WARP_SHFL(mean_reg_tmp, ii, kWarpSize); + rstd_reg = WARP_SHFL(rstd_reg_tmp, ii, kWarpSize); + dg_sum += dY_reg * (X_reg - mean_reg) * rstd_reg; + db_sum += dY_reg; + } + } + + // Remainder loop + int offset = (bcounter * blockDim.y + threadIdx.y) * unroll_factor; + for (int ii = 0; ii < unroll_factor; ii++) { + if ((offset + ii) < M) { + mean_reg = mean[offset + ii]; + rstd_reg = rstd[offset + ii]; + dY_reg = dY[(offset + ii) * N + j]; + X_reg = X[(offset + ii) * N + j]; + dg_sum += dY_reg * (X_reg - mean_reg) * rstd_reg; + db_sum += dY_reg; + } + } + + // This kernel uses a block of (32 x 32) and gets called when M; N + // divide by 32. We can use warp shuffles for the final reduction + // step. This removes 4 shmem loads and stores with their + // corresponding __syncthreads() + + // This greatly reduces bank conflicts at the expense of a little + // extra shared memory. It does not impact occupancy + int padded_bx = (1 + blockDim.x); + s_dg = s_data_typed; + s_db = s_data_typed + (padded_bx * blockDim.y); + s_dg[threadIdx.y * padded_bx + threadIdx.x] = dg_sum; + s_db[threadIdx.y * padded_bx + threadIdx.x] = db_sum; + __syncthreads(); + + // Load transposed so that a warp holds an entire column + T_ACC reg_dg = s_dg[threadIdx.x * padded_bx + threadIdx.y]; + T_ACC reg_db = s_db[threadIdx.x * padded_bx + threadIdx.y]; + for (int delta = 16; delta >= 1; delta /= 2) { + reg_dg += WARP_SHFL_XOR(reg_dg, delta, kWarpSize); + reg_db += WARP_SHFL_XOR(reg_db, delta, kWarpSize); + } + if (threadIdx.x == 0) { + const int64_t j = blockIdx.x * blockDim.x + threadIdx.y; + if (dg) { + dg[j] = reg_dg; + } + if (db) { + db[j] = reg_db; + } + } + } +} template __global__ void GammaBetaBackwardCUDAKernel( @@ -569,66 +672,75 @@ __global__ void GammaBetaBackwardCUDAKernel( T* dg, T* db) { alignas(sizeof(double)) extern __shared__ char s_data1[]; - T_ACC * s_data_typed = reinterpret_cast(&s_data1); + T_ACC* s_data_typed = reinterpret_cast(&s_data1); + T_ACC* s_dg; + T_ACC* s_db; + const int64_t j = blockIdx.x * blockDim.x + threadIdx.x; - constexpr int unroll = 8; - T dYs[unroll]; - T Xs[unroll]; - T_ACC * means = s_data_typed; - T_ACC * rstds = s_data_typed + unroll * blockDim.y; + T_ACC dg_sum = 0; T_ACC db_sum = 0; + if (j < N) { + constexpr int unroll_factor = 8; + + T_ACC mean_reg; + T_ACC rstd_reg; + T dY_reg; + T X_reg; + + // Main Loop int bcounter; - for (bcounter = 0; bcounter < M/(blockDim.y * unroll); bcounter++){ - int offset = (bcounter * blockDim.y + threadIdx.y) * unroll; - #pragma unroll - for (int ii=0; ii=1; offset /= 2){ + + for (int offset = blockDim.y / 2; offset >= 1; offset /= 2) { if (threadIdx.y < offset) { - s_data_typed[threadIdx.y * blockDim.x + threadIdx.x] += s_data_typed[(threadIdx.y + offset) * blockDim.x + threadIdx.x]; - s_data_typed[blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x] += - s_data_typed[blockDim.x * blockDim.y + (threadIdx.y + offset) * blockDim.x + threadIdx.x]; - } + s_dg[threadIdx.y * blockDim.x + threadIdx.x] += + s_dg[(threadIdx.y + offset) * blockDim.x + threadIdx.x]; + s_db[threadIdx.y * blockDim.x + threadIdx.x] += + s_db[(threadIdx.y + offset) * blockDim.x + threadIdx.x]; + } __syncthreads(); } + if (threadIdx.y == 0) { if (dg) { - dg[j] = s_data_typed[threadIdx.x]; + dg[j] 
= s_dg[threadIdx.x]; } if (db) { - db[j] = s_data_typed[threadIdx.x + blockDim.x * blockDim.y]; + db[j] = s_db[threadIdx.x]; } } } @@ -684,7 +796,7 @@ void LayerNormKernelImplInternal( auto can_vectorize = [&](const T * ptr, int alignment){uint64_t addr = reinterpret_cast(ptr); return addr % alignment == 0;}; constexpr int num_vec_elems = vec_size; constexpr int alignment = num_vec_elems * sizeof(T); - if ((std::is_same::value || std::is_same::value) && + if ((std::is_same::value || std::is_same::value || std::is_same::value) && N <= 1ULL << std::numeric_limits::digits && N % num_vec_elems == 0 && can_vectorize(X_data, alignment) && can_vectorize(Y_data, alignment)) { launch_vectorized_layer_norm_kernel(static_cast(N), M, eps, X_data, gamma_data, beta_data, Y_data, mean_data, rstd_data); @@ -722,6 +834,201 @@ void LayerNormKernelImpl( }); } +template __device__ +void cuLoadWriteStridedInputs( + const int i1_block, + const int thr_load_row_off, + const int thr_load_col_off, + const int i2_off, + const int row_stride, + T_ACC* warp_buf1, + T_ACC* warp_buf2, + const T* input, + const T* dout, + const int i1_end, + const int64_t N, + const T_ACC* __restrict__ mean, + const T_ACC* __restrict__ rstd) +{ + int i1 = i1_block+thr_load_row_off; + if (i1 < i1_end) { + T curr_mean = mean[i1]; + T curr_rstd = rstd[i1]; + for (int k = 0; k < blockDim.y; ++k) { + int i2 = i2_off + k; + int load_idx = i1*N+i2; + int write_idx = thr_load_row_off*row_stride+thr_load_col_off+k; + if (i2(input[load_idx]); + T curr_dout = static_cast(dout[load_idx]); + warp_buf1[write_idx] = curr_dout; + warp_buf2[write_idx] = curr_dout * (curr_input - curr_mean) * curr_rstd; + } else { + warp_buf1[write_idx] = T(0); + warp_buf2[write_idx] = T(0); + } + } + } else { + for (int k = 0; k < blockDim.y; ++k) { + int write_idx = thr_load_row_off*row_stride+thr_load_col_off+k; + warp_buf1[write_idx] = T(0); + warp_buf2[write_idx] = T(0); + } + } +} + +template __device__ +void cuLoadAddStridedInputs( + const int i1_block, + const int thr_load_row_off, + const int thr_load_col_off, + const int i2_off, + const int row_stride, + T_ACC* warp_buf1, + T_ACC* warp_buf2, + const T* input, + const T* dout, + const int i1_end, + const int64_t N, + const T_ACC* __restrict__ mean, + const T_ACC* __restrict__ rstd) +{ + int i1 = i1_block+thr_load_row_off; + if (i1 < i1_end) { + T_ACC curr_mean = mean[i1]; + T_ACC curr_rstd = rstd[i1]; + for (int k = 0; k < blockDim.y; ++k) { + int i2 = i2_off + k; + int load_idx = i1*N+i2; + int write_idx = thr_load_row_off*row_stride+thr_load_col_off+k; + if (i2(input[load_idx]); + T_ACC curr_dout = static_cast(dout[load_idx]); + warp_buf1[write_idx] += curr_dout; + warp_buf2[write_idx] += curr_dout * (curr_input - curr_mean) * curr_rstd; + } + } + } +} + +template __global__ +void cuComputePartGradGammaBeta( + const T* __restrict__ dout, + const T* __restrict__ input, + const int64_t M, + const int64_t N, + const T_ACC* __restrict__ mean, + const T_ACC* __restrict__ rstd, + T_ACC* part_grad_gamma, + T_ACC* part_grad_beta) +{ + const int numsegs_M = (M+blockDim.y*blockDim.y-1) / (blockDim.y*blockDim.y); + const int segs_per_block = (numsegs_M + gridDim.y - 1) / gridDim.y; + const int i1_beg = blockIdx.y * segs_per_block * blockDim.y*blockDim.y; + const int i1_beg_plus_one = (blockIdx.y+1) * segs_per_block * blockDim.y*blockDim.y; + const int i1_end = i1_beg_plus_one < M ? 
i1_beg_plus_one : M; + const int row_stride = blockDim.x+1; + const int thr_load_col_off = (threadIdx.x*blockDim.y)&(blockDim.x-1); + const int thr_load_row_off = (threadIdx.x*blockDim.y)/blockDim.x + threadIdx.y*blockDim.y; + const int i2_off = blockIdx.x * blockDim.x + thr_load_col_off; + alignas(sizeof(double)) extern __shared__ char shared[]; + T_ACC * buf = reinterpret_cast(&shared); // buf has at least blockDim.x * blockDim.y * blockDim.y + (blockDim.y - 1)*(blockDim.x/blockDim.y) elements + T_ACC* warp_buf1 = (T_ACC*)buf; + T_ACC* warp_buf2 = warp_buf1 + blockDim.y * blockDim.y * row_stride; + // compute partial sums from strided inputs + // do this to increase number of loads in flight + cuLoadWriteStridedInputs(i1_beg,thr_load_row_off,thr_load_col_off,i2_off,row_stride,warp_buf1,warp_buf2,input,dout,i1_end,N,mean,rstd); + for (int i1_block = i1_beg+blockDim.y*blockDim.y; i1_block < i1_end; i1_block+=blockDim.y*blockDim.y) { + cuLoadAddStridedInputs(i1_block,thr_load_row_off,thr_load_col_off,i2_off,row_stride,warp_buf1,warp_buf2,input,dout,i1_end,N,mean,rstd); + } + __syncthreads(); + // inter-warp reductions + // sum within each warp + T_ACC acc1 = T_ACC(0); + T_ACC acc2 = T_ACC(0); + for (int k = 0; k < blockDim.y; ++k) { + int row1 = threadIdx.y + k*blockDim.y; + int idx1 = row1*row_stride + threadIdx.x; + acc1 += warp_buf1[idx1]; + acc2 += warp_buf2[idx1]; + } + warp_buf1[threadIdx.y*row_stride+threadIdx.x] = acc1; + warp_buf2[threadIdx.y*row_stride+threadIdx.x] = acc2; + __syncthreads(); + // sum all warps + for (int offset = blockDim.y/2; offset > 1; offset /= 2) { + if (threadIdx.y < offset) { + int row1 = threadIdx.y; + int row2 = threadIdx.y + offset; + int idx1 = row1*row_stride + threadIdx.x; + int idx2 = row2*row_stride + threadIdx.x; + warp_buf1[idx1] += warp_buf1[idx2]; + warp_buf2[idx1] += warp_buf2[idx2]; + } + __syncthreads(); + } + int i2 = blockIdx.x * blockDim.x + threadIdx.x; + if (threadIdx.y == 0 && i2 < N) { + int row1 = threadIdx.y; + int row2 = threadIdx.y + 1; + int idx1 = row1*row_stride + threadIdx.x; + int idx2 = row2*row_stride + threadIdx.x; + part_grad_beta[blockIdx.y*N+i2] = warp_buf1[idx1] + warp_buf1[idx2]; + part_grad_gamma[blockIdx.y*N+i2] = warp_buf2[idx1] + warp_buf2[idx2]; + } +} + +template __global__ +void cuComputeGradGammaBeta( + const T_ACC* part_grad_gamma, + const T_ACC* part_grad_beta, + const int part_size, + const int64_t M, + const int64_t N, + T* grad_gamma, + T* grad_beta) +{ + // sum partial gradients for gamma and beta + alignas(sizeof(double)) extern __shared__ char shared[]; + T_ACC * buf = reinterpret_cast(&shared); + int i2 = blockIdx.x * blockDim.x + threadIdx.x; + if (i2 < N) { + // each warp does sequential reductions until reduced part_size is num_warps + int num_warp_reductions = part_size / blockDim.y; + T_ACC sum_gamma = T_ACC(0); + T_ACC sum_beta = T_ACC(0); + const T_ACC* part_grad_gamma_ptr = part_grad_gamma + threadIdx.y * num_warp_reductions * N + i2; + const T_ACC* part_grad_beta_ptr = part_grad_beta + threadIdx.y * num_warp_reductions * N + i2; + for (int warp_offset = 0; warp_offset < num_warp_reductions; ++warp_offset) { + sum_gamma += part_grad_gamma_ptr[warp_offset*N]; + sum_beta += part_grad_beta_ptr[warp_offset*N]; + } + // inter-warp reductions + const int nbsize3 = blockDim.x * blockDim.y / 2; + for (int offset = blockDim.y/2; offset >= 1; offset /= 2) { + // top half write to shared memory + if (threadIdx.y >= offset && threadIdx.y < 2*offset) { + const int write_idx = (threadIdx.y - offset) * 
blockDim.x + threadIdx.x; + buf[write_idx] = sum_gamma; + buf[write_idx+nbsize3] = sum_beta; + } + __syncthreads(); + // bottom half sums + if (threadIdx.y < offset) { + const int read_idx = threadIdx.y * blockDim.x + threadIdx.x; + sum_gamma += buf[read_idx]; + sum_beta += buf[read_idx+nbsize3]; + } + __syncthreads(); + } + // write out fully summed gradients + if (threadIdx.y == 0) { + grad_gamma[i2] = sum_gamma; + grad_beta[i2] = sum_beta; + } + } +} + template void LayerNormBackwardKernelImplInternal( const Tensor& dY, @@ -750,8 +1057,8 @@ void LayerNormBackwardKernelImplInternal( gamma.defined() ? gamma.template data_ptr() : nullptr; T* dX_data = dX->defined() ? dX->template data_ptr() : nullptr; cudaStream_t cuda_stream = at::cuda::getCurrentCUDAStream(); + const int warp_size = at::cuda::warp_size(); if (dX_data != nullptr) { - const int warp_size = at::cuda::warp_size(); const dim3 blocks(M); int nshared = (num_threads()/warp_size) * sizeof(T_ACC); layer_norm_grad_input_kernel<<>>(dY_data, @@ -763,7 +1070,8 @@ void LayerNormBackwardKernelImplInternal( T* dgamma_data = dgamma->defined() ? dgamma->template data_ptr() : nullptr; T* dbeta_data = dbeta->defined() ? dbeta->template data_ptr() : nullptr; - if (M < 512) { + + if (M < 128) { // For small batch size, do colwise reduce directly. const int64_t B = (N + kCUDANumThreads - 1) / kCUDANumThreads; GammaBetaBackwardSimpleCUDAKernel @@ -778,19 +1086,77 @@ void LayerNormBackwardKernelImplInternal( dbeta_data); C10_CUDA_KERNEL_LAUNCH_CHECK(); } else { - dim3 threads{16, 32}; - int blocks = (N + threads.x-1)/threads.x; - GammaBetaBackwardCUDAKernel - <<>>( - M, - N, - dY_data, - X_data, - mean_data, - rstd_data, - dgamma_data, - dbeta_data); +#if defined(USE_ROCM) + // For small batch size, do colwise reduce directly. + const int part_size = warp_size; + const dim3 threads2(warp_size, 4, 1); + const dim3 blocks2((N + threads2.x - 1) / threads2.x, part_size, 1); + const int nshared2_a = 2 * sizeof(T_ACC) * threads2.y * threads2.y * (threads2.x + 1); + const int nshared2_b = threads2.x * threads2.y * sizeof(T_ACC); + const int nshared2 = nshared2_a > nshared2_b ? nshared2_a : nshared2_b; + + const auto part_grad_dtype = at::toAccumulateType(X.scalar_type(), true); + Tensor part_grad_gamma = at::empty({part_size,N}, gamma.options().dtype(part_grad_dtype)); + Tensor part_grad_beta = at::native::empty_like(part_grad_gamma); + cuComputePartGradGammaBeta<<>>( + dY_data, + X_data, + M,N, + mean_data, + rstd_data, + part_grad_gamma.template data_ptr(), + part_grad_beta.template data_ptr()); C10_CUDA_KERNEL_LAUNCH_CHECK(); + + const dim3 threads3(warp_size, 8, 1); // Optimization for ROCm + const dim3 blocks3((N + threads2.x - 1) / threads2.x, 1, 1); + const int nshared3 = threads3.x * threads3.y * sizeof(T); + cuComputeGradGammaBeta<<>>( + part_grad_gamma.template data_ptr(), + part_grad_beta.template data_ptr(), + part_size, + M,N, + dgamma_data, + dbeta_data); + C10_CUDA_KERNEL_LAUNCH_CHECK(); +#else + if ((M % kWarpSize == 0) && (N % kWarpSize == 0)) { + // This implementation relies on warp primitives and requires that M and N divide + // exactly to warp size. + dim3 threads{kWarpSize, kWarpSize}; + int blocks = (N + threads.x - 1) / threads.x; + + // If M and N divide by 32, we can use warp shuffles for the final reduction. That requires + // transposing values in shared memory, so we apply a padding to reduce bank conflicts. 
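// The loop inside GammaBetaBackwardCUDAKernel_32x32 reduces each transposed column
// with a butterfly of XOR shuffles. A minimal standalone sketch of that pattern
// (illustrative; assumes a CUDA build where WARP_SHFL_XOR maps to __shfl_xor_sync
// and the warp size is 32):
__device__ float warp_butterfly_sum(float v) {
  for (int delta = 16; delta >= 1; delta /= 2) {
    v += __shfl_xor_sync(0xffffffff, v, delta, 32);  // exchange with lane ^ delta
  }
  return v;  // every lane now holds the sum over the full warp
}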
+ size_t shmem_sz = 2 * sizeof(T_ACC) * (threads.x + 1) * threads.y; + GammaBetaBackwardCUDAKernel_32x32 + <<>>( + M, + N, + dY_data, + X_data, + mean_data, + rstd_data, + dgamma_data, + dbeta_data); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + } else { + dim3 threads{16, 32}; + int blocks = (N + threads.x - 1) / threads.x; + size_t shmem_sz = 2 * sizeof(T_ACC) * threads.x * threads.y; + GammaBetaBackwardCUDAKernel + <<>>( + M, + N, + dY_data, + X_data, + mean_data, + rstd_data, + dgamma_data, + dbeta_data); + C10_CUDA_KERNEL_LAUNCH_CHECK(); + } +#endif } } } diff --git a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp index 320c799f23bc..22de5012f11f 100644 --- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp +++ b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp @@ -24,7 +24,6 @@ #include #else #include -#include #include #include #include @@ -33,7 +32,6 @@ #include #include #include -#include #include #include #endif @@ -115,20 +113,6 @@ void magmaLuNoPivBatched( magma_int_t m, magma_int_t n, scalar_t** dA_array, magma_int_t ldda, magma_int_t* info_array, magma_int_t batchsize, const MAGMAQueue& magma_queue); -template -inline magma_int_t magmaGetriOptimalBlocksize(magma_int_t n); - -template -void magmaGetri( - magma_int_t n, scalar_t* dA, magma_int_t ldda, magma_int_t* ipiv, scalar_t* dwork, - magma_int_t lwork, magma_int_t* info); - -template -void magmaGetriBatched( - magma_int_t n, scalar_t** dA_array, magma_int_t ldda, - magma_int_t** ipiv_array, scalar_t** dinvA_array, magma_int_t lddia, - magma_int_t* info_array, magma_int_t batchsize, const MAGMAQueue& magma_queue); - template void magmaCholeskySolve( magma_uplo_t uplo, magma_int_t n, magma_int_t nrhs, scalar_t* dA, magma_int_t ldda, @@ -400,154 +384,6 @@ void magmaLuNoPivBatched>( AT_CUDA_CHECK(cudaGetLastError()); } -template<> -inline magma_int_t magmaGetriOptimalBlocksize(magma_int_t n) { - return magma_get_dgetri_nb(n); -} - -template<> -inline magma_int_t magmaGetriOptimalBlocksize(magma_int_t n) { - return magma_get_sgetri_nb(n); -} - -template <> -inline magma_int_t magmaGetriOptimalBlocksize>( - magma_int_t n) { - return magma_get_zgetri_nb(n); -} - -template <> -inline magma_int_t magmaGetriOptimalBlocksize>( - magma_int_t n) { - return magma_get_cgetri_nb(n); -} - -template<> -void magmaGetri( - magma_int_t n, double* dA, magma_int_t ldda, magma_int_t* ipiv, double* dwork, - magma_int_t lwork, magma_int_t* info) { - MagmaStreamSyncGuard guard; - magma_dgetri_gpu(n, dA, ldda, ipiv, dwork, lwork, info); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template<> -void magmaGetri( - magma_int_t n, float* dA, magma_int_t ldda, magma_int_t* ipiv, float* dwork, - magma_int_t lwork, magma_int_t* info) { - MagmaStreamSyncGuard guard; - magma_sgetri_gpu(n, dA, ldda, ipiv, dwork, lwork, info); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template <> -void magmaGetri>( - magma_int_t n, - c10::complex* dA, - magma_int_t ldda, - magma_int_t* ipiv, - c10::complex* dwork, - magma_int_t lwork, - magma_int_t* info) { - MagmaStreamSyncGuard guard; - magma_zgetri_gpu( - n, - reinterpret_cast(dA), - ldda, - ipiv, - reinterpret_cast(dwork), - lwork, - info); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template <> -void magmaGetri>( - magma_int_t n, - c10::complex* dA, - magma_int_t ldda, - magma_int_t* ipiv, - c10::complex* dwork, - magma_int_t lwork, - magma_int_t* info) { - MagmaStreamSyncGuard guard; - magma_cgetri_gpu( - n, - reinterpret_cast(dA), - ldda, - ipiv, - 
reinterpret_cast(dwork), - lwork, - info); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template<> -void magmaGetriBatched( - magma_int_t n, double** dA_array, magma_int_t ldda, - magma_int_t** ipiv_array, double** dinvA_array, magma_int_t lddia, - magma_int_t* info_array, magma_int_t batchsize, const MAGMAQueue& magma_queue) { - magma_dgetri_outofplace_batched(n, dA_array, ldda, ipiv_array, dinvA_array, lddia, info_array, batchsize, magma_queue.get_queue()); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template<> -void magmaGetriBatched( - magma_int_t n, float** dA_array, magma_int_t ldda, - magma_int_t** ipiv_array, float** dinvA_array, magma_int_t lddia, - magma_int_t* info_array, magma_int_t batchsize, const MAGMAQueue& magma_queue) { - magma_sgetri_outofplace_batched(n, dA_array, ldda, ipiv_array, dinvA_array, lddia, info_array, batchsize, magma_queue.get_queue()); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template <> -void magmaGetriBatched>( - magma_int_t n, - c10::complex** dA_array, - magma_int_t ldda, - magma_int_t** ipiv_array, - c10::complex** dinvA_array, - magma_int_t lddia, - magma_int_t* info_array, - magma_int_t batchsize, - const MAGMAQueue& magma_queue) { - magma_zgetri_outofplace_batched( - n, - reinterpret_cast(dA_array), - ldda, - ipiv_array, - reinterpret_cast(dinvA_array), - lddia, - info_array, - batchsize, - magma_queue.get_queue()); - AT_CUDA_CHECK(cudaGetLastError()); -} - -template <> -void magmaGetriBatched>( - magma_int_t n, - c10::complex** dA_array, - magma_int_t ldda, - magma_int_t** ipiv_array, - c10::complex** dinvA_array, - magma_int_t lddia, - magma_int_t* info_array, - magma_int_t batchsize, - const MAGMAQueue& magma_queue) { - magma_cgetri_outofplace_batched( - n, - reinterpret_cast(dA_array), - ldda, - ipiv_array, - reinterpret_cast(dinvA_array), - lddia, - info_array, - batchsize, - magma_queue.get_queue()); - AT_CUDA_CHECK(cudaGetLastError()); -} - template<> void magmaCholeskySolve( magma_uplo_t uplo, magma_int_t n, magma_int_t nrhs, double* dA, magma_int_t ldda, @@ -1319,156 +1155,6 @@ void ldl_solve_kernel( REGISTER_CUDA_DISPATCH(ldl_factor_stub, &ldl_factor_kernel) REGISTER_CUDA_DISPATCH(ldl_solve_stub, &ldl_solve_kernel) -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ inverse ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -/* -Computes the inverse of n-by-n matrix 'self', it is saved to 'self_inv'. -'infos' is an int Tensor containing error codes for each matrix in the batched input. -'infos_lu' is for holding magmaLU errors, and 'infos_getri' is for holding magmaGetri errors -For more information see MAGMA's documentation for GETRI and GETRF routines. -*/ -template -static void apply_batched_inverse(Tensor& self, Tensor& self_inv, Tensor& infos_lu, Tensor& infos_getri) { -#if !AT_MAGMA_ENABLED() -AT_ERROR("inverse: MAGMA library not found in " - "compilation. 
Please rebuild with MAGMA."); -#else - auto self_data = self.data_ptr(); - auto self_mat_stride = matrixStride(self); - auto self_inv_data = self_inv.data_ptr(); - auto self_inv_mat_stride = matrixStride(self_inv); - - auto infos_lu_data = infos_lu.data_ptr(); - auto infos_getri_data = infos_getri.data_ptr(); - - magma_int_t batch_size = magma_int_cast(batchCount(self), "batchCount"); - // MAGMA does not work with batch_size == 0, let's return early in this case - if (batch_size == 0) { - return; - } - - magma_int_t n = magma_int_cast(self.size(-2), "self.size(-2)"); - magma_int_t lda = std::max(1, n); - - magma_int_t* ipiv_data; - magma_int_t** ipiv_array; - scalar_t** self_array; - scalar_t** self_inv_array; - - ALLOCATE_ARRAY(ipiv_data, magma_int_t, batch_size * lda); - ALLOCATE_ARRAY(ipiv_array, magma_int_t*, batch_size); - ALLOCATE_ARRAY(self_array, scalar_t*, batch_size); - ALLOCATE_ARRAY(self_inv_array, scalar_t*, batch_size); - - // Set up the created arrays - for (int64_t i = 0; i < batch_size; i++) { - self_array[i] = &self_data[i * self_mat_stride]; - self_inv_array[i] = &self_inv_data[i * self_inv_mat_stride]; - ipiv_array[i] = &ipiv_data[i * n]; - } - // magmaLuBatched leaves ipiv_data values unwritten for singular matrices. - // Initialize to avoid memory access violations inside magma kernels (gh-51930). - std::fill_n(ipiv_data, batch_size * n, 1); - - MAGMAQueue magma_queue(self.get_device()); - magmaLuBatched( - n, n, self_array, lda, ipiv_array, infos_lu_data, - batch_size, magma_queue); - - constexpr int64_t batch_limit = 65535; - // Compute as many batches of 65535 possible - // The number of "mini"-batches are floor(batch_size / batch_limit) - // and these cover floor(batch_size / batch_limit) * batch_limit matrix solves - int64_t mini_batches = batch_size / batch_limit, mini_idx; - for (mini_idx = 0; mini_idx < mini_batches * batch_limit; mini_idx += batch_limit) { - scalar_t** self_array_cur = &self_array[mini_idx]; - scalar_t** self_inv_array_cur = &self_inv_array[mini_idx]; - magma_int_t** ipiv_array_cur = &ipiv_array[mini_idx]; - magma_int_t* info_array_cur_getri = &infos_getri_data[mini_idx]; - - magmaGetriBatched( - n, self_array_cur, lda, ipiv_array_cur, self_inv_array_cur, - lda, info_array_cur_getri, batch_limit, magma_queue); - } - - // Compute whatever is left = batch_size - floor(batch_size / batch_limit) * batch_limit - // which concisely is equal to batch_size % batch_limit - if (batch_size % batch_limit != 0) { - magmaGetriBatched( - n, &self_array[mini_idx], lda, &ipiv_array[mini_idx], &self_inv_array[mini_idx], - lda, &infos_getri_data[mini_idx], batch_size % batch_limit, magma_queue); - } -#endif -} - -template -static void apply_single_inverse(Tensor& self, Tensor& info_lu, Tensor& info_getri) { -#if !AT_MAGMA_ENABLED() -AT_ERROR("inverse: MAGMA library not found in " - "compilation. 
Please rebuild with MAGMA."); -#else - auto self_data = self.data_ptr(); - magma_int_t n = magma_int_cast(self.size(-2), "self.size(-2)"); - magma_int_t lda = std::max(1, n); - magma_int_t lwork = n * magmaGetriOptimalBlocksize(n); - - // magmaLu and magmaGetri requires info argument to live on CPU - // but info_lu and info_getri tensors are on the same device as self - magma_int_t info_lu_cpu = 0; - magma_int_t info_getri_cpu = 0; - - Tensor ipiv = at::empty({lda}, at::kInt); - Tensor dwork = at::empty({lwork}, self.options()); - magmaLu(n, n, self_data, lda, ipiv.data_ptr(), &info_lu_cpu); - magmaGetri( - n, self_data, lda, ipiv.data_ptr(), dwork.data_ptr(), lwork, &info_getri_cpu); - info_lu.fill_(info_lu_cpu); - info_getri.fill_(info_getri_cpu); -#endif -} - - -// This is a type dispatching helper function for 'apply_batched_inverse' and 'singleCheckErrors' -Tensor& _linalg_inv_out_helper_cuda_legacy(Tensor& result, Tensor& infos_lu, Tensor& infos_getri) { - // assuming result is in column major order and contains the matrices to invert - if (result.dim() > 2) { - auto input_working_copy = cloneBatchedColumnMajor(result); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "linalg_inv_out_cuda", [&]{ - apply_batched_inverse( - input_working_copy, result, infos_lu, infos_getri); - }); - } else { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "linalg_inv_out_cuda", [&]{ - apply_single_inverse(result, infos_lu, infos_getri); - }); - } - return result; -} - -// This is a MAGMA/cuSOLVER dispatching helper function -Tensor& _linalg_inv_out_helper_cuda(Tensor &result, Tensor& infos_lu, Tensor& infos_getri) { - // This function calculates the inverse matrix in-place - // result should be in column major order and contain matrices to invert -#ifdef USE_CUSOLVER - auto preferred_backend = at::globalContext().linalgPreferredBackend(); - switch (preferred_backend) { - case at::LinalgBackend::Cusolver: - return _linalg_inv_out_helper_cuda_lib(result, infos_lu, infos_getri); // cusolver or cublas - case at::LinalgBackend::Magma: - return _linalg_inv_out_helper_cuda_legacy(result, infos_lu, infos_getri); // magma-cuda - default: - if (batchCount(result) <= 2 || !use_magma_) { - return _linalg_inv_out_helper_cuda_lib(result, infos_lu, infos_getri); // cusolver or cublas - } else { - return _linalg_inv_out_helper_cuda_legacy(result, infos_lu, infos_getri); // magma-cuda - } - } -#else - return _linalg_inv_out_helper_cuda_legacy(result, infos_lu, infos_getri); // magma-cuda -#endif - return result; -} - // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cholesky_solve ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ template @@ -1874,6 +1560,9 @@ static void apply_lu_factor_batched_magma(const Tensor& input, const Tensor& piv input_array[i] = &input_data[i * input_matrix_stride]; } + // needed to run lu tests in parallel, see https://github.com/pytorch/pytorch/issues/82894 for examples + // of failures + c10::cuda::device_synchronize(); MAGMAQueue magma_queue(input.get_device()); if (compute_pivots) { @@ -1928,7 +1617,12 @@ static void lu_factor(const Tensor& input, const Tensor& pivots, const Tensor& i const auto preferred_backend = at::globalContext().linalgPreferredBackend(); #ifdef USE_CUSOLVER const auto lu_factor_cusolver = [batch_size, m, n](const Tensor& input, const Tensor& pivots, const Tensor& infos, bool compute_pivots) { - if (batch_size == 1 || m != n || m >= 512) { + // In CUDA 10.2, lu_factor_looped_cusolver does not finish the computations when the input + // matrix is exactly singular. 
The returned pivots contain garbage. This breaks linalg.det + // Now, batched_cublas does not handle rectangular matrices, so we still dispatch to + // looped_cusolver even if m != n. + constexpr bool looped_correct = CUSOLVER_VERSION >= 11100; + if (m != n || (looped_correct && (batch_size == 1 || m >= 512))) { lu_factor_looped_cusolver(input, pivots, infos, compute_pivots); } else { lu_factor_batched_cublas(input, pivots, infos, compute_pivots); @@ -2344,96 +2038,6 @@ void linalg_eigh_kernel(const Tensor& eigenvalues, const Tensor& eigenvectors, c REGISTER_CUDA_DISPATCH(linalg_eigh_stub, &linalg_eigh_kernel); -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ eig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -// magmaEig uses a hybrid CPU-GPU algorithm, which takes and return CPU -// memory. So, we accept a GPU tensor, copy it to CPU memory, and later copy -// the returned values from CPU to GPU. See also magmaSymeig, which uses a -// similar approach. - -template -static void apply_eig(const Tensor& self, bool eigenvectors, Tensor& out_eigvals, Tensor& out_eigvecs, - int* info_ptr) { -#if !AT_MAGMA_ENABLED() -TORCH_CHECK(false, "Calling torch.eig on a CUDA tensor requires compiling PyTorch with MAGMA. " - "Either transfer the tensor to the CPU before calling torch.eig or recompile with MAGMA."); -#else - TORCH_INTERNAL_ASSERT(self.device() == at::kCPU, "Internal error: apply_eig needs a CPU tensor"); - using value_t = typename c10::scalar_value_type::type; - magma_vec_t jobvr = eigenvectors ? MagmaVec : MagmaNoVec; - magma_int_t n = magma_int_cast(self.size(-1), "n"); - auto self_data = self.data_ptr(); - - auto out_eigvals_data = out_eigvals.data_ptr(); - scalar_t *wr = out_eigvals_data; - - scalar_t *vr_data = NULL; - magma_int_t ldvr = 1; - if (jobvr == MagmaVec) - { - vr_data = out_eigvecs.data_ptr(); - ldvr = n; - } - - value_t *rwork_data = nullptr; - if (isComplexType(at::typeMetaToScalarType(self.dtype()))) { - ALLOCATE_ARRAY(rwork_data, value_t, n*2); - } - - if (n > 0) { - // call magmaEig once to get the optimal size of work_data - scalar_t wkopt; - magma_int_t info; - magmaEig(MagmaNoVec, jobvr, n, self_data, n, wr, NULL, 1, vr_data, ldvr, &wkopt, -1, rwork_data, &info); - magma_int_t lwork = static_cast(real_impl(wkopt)); - - // call it a 2nd time to to the actual work - scalar_t *work_data = nullptr; - ALLOCATE_ARRAY(work_data, scalar_t, lwork); - magmaEig(MagmaNoVec, jobvr, n, self_data, n, wr, NULL, 1, vr_data, ldvr, work_data, lwork, rwork_data, &info); - *info_ptr = info; - } -#endif -} - -/* - * Internal helper; like eig_cuda but: - * 1. assume that self is a square matrix of side "n" - * 2. return CPU tensors (because this is what magmaEig returns), which will be copied to GPU memory - * by the caller - */ -std::tuple eig_kernel_impl(const Tensor& self, bool& eigenvectors) { - int64_t n = self.size(-1); - // copy self to pinned CPU memory - auto self_working_copy = at::empty_strided( - {n, n}, // square matrix - {1, n}, // column-ordered, as magmaEig expects - at::TensorOptions(at::kCPU).dtype(self.dtype()).pinned_memory(true)); - self_working_copy.copy_(self); - - // tensors holding the results. We use empty_strided to make them column-ordered - auto options = self.options().device(at::kCPU).memory_format(LEGACY_CONTIGUOUS_MEMORY_FORMAT); - Tensor out_eigvals; - if (isComplexType(at::typeMetaToScalarType(self.dtype()))) { - out_eigvals = at::empty({n}, options); - } else { - out_eigvals = at::empty_strided({n, 2}, {1, n}, options); - } - auto out_eigvecs = eigenvectors - ? 
at::empty_strided({n, n}, {1, n}, options) - : Tensor(); - - auto infos = at::zeros({}, self_working_copy.options().dtype(kInt)); - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(self.scalar_type(), "eig_cuda", [&]{ - apply_eig(self_working_copy, eigenvectors, out_eigvals, out_eigvecs, infos.data_ptr()); - }); - at::_linalg_check_errors(infos, "eig", /*is_matrix*/true); - - return std::tuple(out_eigvals, out_eigvecs); -} - -REGISTER_CUDA_DISPATCH(eig_stub, &eig_kernel_impl); - // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ linalg_eig ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /* @@ -2869,7 +2473,7 @@ static void lu_solve_kernel(const Tensor& LU, const Tensor& pivots, const Tensor .add_output(perm) .add_input(*pivots_) .build(); - unpack_pivots_stub(pivots_->device().type(), iter, n); + unpack_pivots_stub(pivots_->device().type(), iter, n, n); if (trans == TransposeType::NoTranspose) { // Get the inverse permutation @@ -3189,73 +2793,12 @@ void lstsq_kernel(const Tensor& a, Tensor& b, Tensor& /*rank*/, Tensor& /*singul REGISTER_CUDA_DISPATCH(lstsq_stub, &lstsq_kernel); -// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ legacy_lstsq ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -std::tuple legacy_lstsq_cuda(const Tensor &B, const Tensor &A) { - TORCH_WARN_ONCE( - "torch.lstsq is deprecated in favor of torch.linalg.lstsq and will be removed in a future PyTorch release.\n", - "torch.linalg.lstsq has reversed arguments and does not return the QR decomposition in " - "the returned tuple (although it returns other information about the problem).\n", - "To get the qr decomposition consider using torch.linalg.qr.\n", - "The returned solution in torch.lstsq stored the residuals of the solution in the ", - "last m - n columns of the returned value whenever m > n. In torch.linalg.lstsq, the ", - "residuals in the field 'residuals' of the returned named tuple.\n", - "The unpacking of the solution, as in\n", - "X, _ = torch.lstsq(B, A).solution[:A.size(1)]\n", - "should be replaced with\n", - "X = torch.linalg.lstsq(A, B).solution" - ); - -#if !AT_MAGMA_ENABLED() - TORCH_CHECK(false, "solve: MAGMA library not found in " - "compilation. Please rebuild with MAGMA."); -#else - const auto dtype = A.scalar_type(); - TORCH_CHECK(B.scalar_type() == dtype, "exepected A and B dtypes to match but found ", - dtype, " and ", B.scalar_type()); - TORCH_CHECK(A.numel() > 0 && A.dim() == 2, "A should be (non-empty) 2 dimensional"); - TORCH_CHECK(B.numel() > 0 && B.dim() == 2, "B should be (non-empty) 2 dimensional"); - auto a_sizes = A.sizes(); - auto b_sizes = B.sizes(); - TORCH_CHECK(a_sizes[0] == b_sizes[0], "Expected A and b to have same size " - "at dim 0, but A has ", a_sizes[0], " rows and B has ", b_sizes[0], " rows"); - TORCH_CHECK(a_sizes[0] >= a_sizes[1], "Expected A with shape (m x n) to have " - "m >= n. 
The case for m < n is not implemented yet."); - - Tensor A_working = cloneBatchedColumnMajor(A); - Tensor B_working = cloneBatchedColumnMajor(B); - - int64_t m = a_sizes[0]; - int64_t n = a_sizes[1]; - int64_t nrhs = b_sizes[1]; - - int info; - AT_DISPATCH_FLOATING_TYPES(A.scalar_type(), "legacy_lstsq_cuda", [&] { - scalar_t *a_data = A_working.data_ptr(); - scalar_t *b_data = B_working.data_ptr(); - scalar_t wkopt; - magmaGels(MagmaNoTrans, m, n, nrhs, a_data, m, b_data, m, &wkopt, -1, &info); - - const auto hwork_size = static_cast(wkopt); - scalar_t *hwork = nullptr; - ALLOCATE_ARRAY(hwork, scalar_t, hwork_size); - - magmaGels(MagmaNoTrans, m, n, nrhs, a_data, m, b_data, m, hwork, hwork_size, &info); - }); - - TORCH_CHECK(info == 0, "MAGMA gels : Argument %d : illegal value", -info); - return std::tuple(B_working, A_working); -#endif // AT_MAGMA_ENABLED() -} - #if defined(BUILD_LAZY_CUDA_LINALG) struct DispatchInitializer { DispatchInitializer() { cuda::detail::LinalgDispatch disp{ _symeig_helper_cuda, - _cholesky_solve_helper_cuda, - legacy_lstsq_cuda, - _linalg_inv_out_helper_cuda}; + _cholesky_solve_helper_cuda}; cuda::detail::registerLinalgDispatch(disp); }; } initializer; diff --git a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp index d3109d866a59..89c1246a32d1 100644 --- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp +++ b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp @@ -237,6 +237,9 @@ void apply_ldl_solve_cusolver( auto pivots_ = pivots.to(kLong); auto pivots_data = pivots_.data_ptr(); + // needed to run ldl_solve tests in parallel + // see https://github.com/pytorch/pytorch/issues/82894 for examples of failures + c10::cuda::device_synchronize(); auto handle = at::cuda::getCurrentCUDASolverDnHandle(); auto datatype = at::cuda::solver::get_cusolver_datatype(); size_t worksize_device = 0; @@ -471,101 +474,6 @@ inline static Tensor column_major_identity_matrix_like(const Tensor& self) { return at::ones(size_slice, self.options()).diag_embed().mT(); } -template -inline static void _apply_single_inverse_helper(scalar_t* self_ptr, scalar_t* self_inv_ptr, int* ipiv_ptr, int* info_getrf_ptr, int* info_getrs_ptr, int n, int lda) { - // self_inv_ptr should already be an identity matrix - - auto handle = at::cuda::getCurrentCUDASolverDnHandle(); - at::cuda::solver::getrf(handle, n, n, self_ptr, lda, ipiv_ptr, info_getrf_ptr); - at::cuda::solver::getrs(handle, n, n, self_ptr, lda, ipiv_ptr, self_inv_ptr, lda, info_getrs_ptr, CUBLAS_OP_N); -} - -template -static void apply_batched_inverse_lib(Tensor& self, Tensor& self_inv, Tensor& infos_getrf, Tensor& infos_getrs) { - const int batch_size = cuda_int_cast(batchCount(self), "batchCount"); - const int n = cuda_int_cast(self.size(-2), "self.size(-2)"); - const int lda = std::max(1, n); - - auto self_data = self.data_ptr(); - auto self_mat_stride = matrixStride(self); - auto self_inv_data = self_inv.data_ptr(); - auto self_inv_mat_stride = matrixStride(self_inv); - - auto infos_getrf_data = infos_getrf.data_ptr(); - auto infos_getrs_data = infos_getrs.data_ptr(); - - auto& allocator = *::c10::cuda::CUDACachingAllocator::get(); - - // Heuristic: For small batch size or large matrix size, we use for-loop to iterate over the batches instead of - // calling the batched cublas routine. 
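Both the inverse path being removed here and the lu_factor change earlier in this file choose between a per-matrix loop and a batched kernel. A minimal stand-alone sketch of the lu_factor predicate added above, assuming only the thresholds visible in the diff (the Backend enum and choose_lu_backend name are illustrative, not ATen symbols):

#include <cstdint>

// Sketch of the heuristic: rectangular matrices must use the looped cuSOLVER path
// (batched cuBLAS only handles square inputs), and when CUSOLVER_VERSION >= 11100
// the looped path is also preferred for a single matrix or for large matrices.
enum class Backend { LoopedCusolver, BatchedCublas };

constexpr long kCusolverVersion = 11100;  // illustrative stand-in for CUSOLVER_VERSION

Backend choose_lu_backend(std::int64_t m, std::int64_t n, std::int64_t batch_size) {
  constexpr bool looped_correct = kCusolverVersion >= 11100;
  if (m != n || (looped_correct && (batch_size == 1 || m >= 512))) {
    return Backend::LoopedCusolver;
  }
  return Backend::BatchedCublas;
}
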
- if (batch_size <= 8 || /* batch_size > 8 && */ n >= 512) { - for (int64_t i = 0; i < batch_size; i++) { - auto dataPtr = allocator.allocate(sizeof(int) * lda); - int* pivot = reinterpret_cast(dataPtr.get()); - - int* infos_getrf_working_ptr = &infos_getrf_data[i]; - int* infos_getrs_working_ptr = &infos_getrs_data[i]; - - _apply_single_inverse_helper( - &self_data[i * self_mat_stride], &self_inv_data[i * self_inv_mat_stride], pivot, infos_getrf_working_ptr, infos_getrs_working_ptr, n, lda); - } - } else { - // cublas batched kernels require input be "device array of device pointers" - Tensor self_array = at::arange( - reinterpret_cast(self_data), - reinterpret_cast(&self_data[(batch_size-1) * self_mat_stride]) + 1, - static_cast(self_mat_stride * sizeof(scalar_t)), self.options().dtype(at::kLong)); - Tensor self_inv_array = at::arange( - reinterpret_cast(self_inv_data), - reinterpret_cast(&self_inv_data[(batch_size-1) * self_inv_mat_stride]) + 1, - static_cast(self_inv_mat_stride * sizeof(scalar_t)), self.options().dtype(at::kLong)); - - auto dataPtr = allocator.allocate(sizeof(int)*batch_size*lda); - int* ipiv_array = reinterpret_cast(dataPtr.get()); - - at::cuda::blas::getrfBatched(n, reinterpret_cast(self_array.data_ptr()), lda, - ipiv_array, infos_getrf_data, batch_size); - - at::cuda::blas::getriBatched(n, reinterpret_cast(self_array.data_ptr()), lda, - ipiv_array, reinterpret_cast(self_inv_array.data_ptr()), lda, infos_getrs_data, batch_size); - } -} - -template -static void apply_single_inverse_lib(const Tensor& self, Tensor& self_inv, Tensor& infos_getrf, Tensor& infos_getrs) { - int n = cuda_int_cast(self.size(-2), "self.size(-2)"); - int lda = std::max(1, n); - - Tensor ipiv = at::empty({lda}, self.options().dtype(at::kInt)); - - _apply_single_inverse_helper( - self.data_ptr(), self_inv.data_ptr(), ipiv.data_ptr(), infos_getrf.data_ptr(), infos_getrs.data_ptr(), n, lda); -} - -// This is a type dispatching helper function for 'apply_batched_inverse_lib' and 'apply_single_inverse_lib' -Tensor& _linalg_inv_out_helper_cuda_lib(Tensor& result, Tensor& infos_getrf, Tensor& infos_getrs) { - // assuming result is in column major order and contains the matrices to invert - Tensor input_working_copy = cloneBatchedColumnMajor(result); - - // for getrf + getrs (cusolver path) - // result should be filled with identity matrices - result.zero_(); - result.diagonal(/*offset=*/0, /*dim1=*/-2, /*dim2=*/-1).fill_(1); - - if (result.dim() > 2) { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "linalg_inv_out_cuda", [&]{ - apply_batched_inverse_lib( - input_working_copy, result, infos_getrf, infos_getrs); - }); - } else { - AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(result.scalar_type(), "linalg_inv_out_cuda", [&]{ - apply_single_inverse_lib(input_working_copy, result, infos_getrf, infos_getrs); - }); - } - - return result; -} - // call cusolver gesvd function to calculate svd template inline static void apply_svd_cusolver_gesvd(const Tensor& A, const Tensor& U, const Tensor& S, const Tensor& V, @@ -748,23 +656,21 @@ inline static void apply_svd_cusolver_gesvdjBatched(const Tensor& A, const Tenso using value_t = typename c10::scalar_value_type::type; int m = cuda_int_cast(A.size(-2), "m"); int n = cuda_int_cast(A.size(-1), "n"); - int k = std::min(m, n); int batchsize = cuda_int_cast(batchCount(A), "batch size"); + int lda = A.stride(-1); + int ldu = compute_uv ? U.stride(-1) : m; + int ldv = compute_uv ? 
V.stride(-1) : n; // Need to pass allocated memory to the function, otherwise it fails auto& allocator = *::c10::cuda::CUDACachingAllocator::get(); - auto dataPtr_U = !compute_uv ? allocator.allocate(sizeof(scalar_t) * batchsize * m * k) : c10::DataPtr{}; - auto dataPtr_V = !compute_uv ? allocator.allocate(sizeof(scalar_t) * batchsize * n * k) : c10::DataPtr{}; + auto dataPtr_U = !compute_uv ? allocator.allocate(sizeof(scalar_t) * batchsize * m * ldu) : c10::DataPtr{}; + auto dataPtr_V = !compute_uv ? allocator.allocate(sizeof(scalar_t) * batchsize * n * ldv) : c10::DataPtr{}; auto A_data = A.data_ptr(); auto U_data = compute_uv ? U.data_ptr() : reinterpret_cast(dataPtr_U.get()); auto S_data = S.data_ptr(); auto V_data = compute_uv ? V.data_ptr() : reinterpret_cast(dataPtr_V.get()); - int lda = A.stride(-1); - int ldu = compute_uv ? U.stride(-1) : m; - int ldv = compute_uv ? V.stride(-1) : n; - TORCH_INTERNAL_ASSERT(m <= 32 && n <= 32, "gesvdjBatched requires both matrix dimensions not greater than 32, but got " "m = ", m, " n = ", n); @@ -787,10 +693,42 @@ inline static void apply_svd_cusolver_gesvdjBatched(const Tensor& A, const Tenso TORCH_CUSOLVER_CHECK(cusolverDnDestroyGesvdjInfo(gesvdj_params)); } -inline static void svd_cusolver_gesvdjBatched(const Tensor& A, const Tensor& U, const Tensor& S, const Tensor& V, const Tensor& infos, bool compute_uv) { +inline static void svd_cusolver_gesvdjBatched(const Tensor& A, const Tensor& U, const Tensor& S, const Tensor& V, const Tensor& infos, bool full_matrices, bool compute_uv) { + auto m = A.size(-2); + auto n = A.size(-1); + auto k = std::min(m, n); + // The kernel assumes full_matrices == true + // If full_matrices == false and m != n, we create auxiliary tensors of the right size and copy the results back + auto U_ = U; + auto V_ = V; + if (compute_uv && !full_matrices) { + auto sizes = A.sizes().vec(); + if (m > n) { + // Size of U with full_matrices == True + sizes.end()[-1] = m; + // U, V should be a batch of Fortran contiguous arrays + U_ = U.new_empty(sizes).mT(); + } else if (m < n) { + // Size of V with full_matrices == True + sizes.end()[-2] = n; + V_ = V.new_empty(sizes).mT(); + } + } + // Here U_ and V_ are batches of F-contig square matrices + AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(A.scalar_type(), "svd_cuda_gesvdjBatched", [&] { - apply_svd_cusolver_gesvdjBatched(A, U, S, V, infos, compute_uv); + apply_svd_cusolver_gesvdjBatched(A, U_, S, V_, infos, compute_uv); }); + + // Copy the result back if we created any new matrix + if (compute_uv && !full_matrices) { + if (!U_.is_alias_of(U)) { + U.copy_(U_.narrow(-1, 0, k)); + } + if (!V_.is_alias_of(V)) { + V.copy_(V_.narrow(-1, 0, k)); + } + } } template @@ -924,21 +862,23 @@ void svd_cusolver(const Tensor& A, const Tensor& V, const Tensor& info) { // Here U and V are F-contig whenever they are defined (i.e. 
whenever compute_uv=true) - const auto batch_size = batchCount(A); const auto m = A.size(-2); const auto n = A.size(-1); const auto k = std::min(m, n); static const char* check_svd_doc = "Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html"; - // The default heuristic is to use gesvdj driver + // The default heuristic is to use the gesvdj driver const auto driver_v = driver.value_or("gesvdj"); if (driver_v == "gesvd") { svd_cusolver_gesvd(A, U, S, V, info, full_matrices, compute_uv); } else if (driver_v == "gesvdj") { - if (m <= 32 && n <= 32 && batch_size > 1 && (full_matrices || m == n)) { - svd_cusolver_gesvdjBatched(cloneBatchedColumnMajor(A), U, S, V, info, compute_uv); + // See the benchmarks in + // https://github.com/pytorch/pytorch/pull/88502#issuecomment-1303860789 + // The m <= 32 && n <= 32 restrictions come from the limitations of the cusolver backend. See the cusolver docs + if (m <= 32 && n <= 32) { + svd_cusolver_gesvdjBatched(cloneBatchedColumnMajor(A), U, S, V, info, full_matrices, compute_uv); } else { // gesvdj driver may be numerically unstable for large sized matrix svd_cusolver_gesvdj(cloneBatchedColumnMajor(A), U, S, V, info, full_matrices, compute_uv); diff --git a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.h b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.h index adee8cc9eb4e..532919e83ebd 100644 --- a/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.h +++ b/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.h @@ -59,10 +59,6 @@ void lu_solve_batched_cublas(const Tensor& LU, const Tensor& pivots, const Tenso #ifdef USE_CUSOLVER -// entrance of calculations of `inverse` using cusolver getrf + getrs, cublas getrfBatched + getriBatched -Tensor _inverse_helper_cuda_lib(const Tensor& self); -Tensor& _linalg_inv_out_helper_cuda_lib(Tensor& result, Tensor& infos_getrf, Tensor& infos_getrs); - // entrance of calculations of `svd` using cusolver gesvdj and gesvdjBatched void svd_cusolver(const Tensor& A, const bool full_matrices, const bool compute_uv, const c10::optional& driver, const Tensor& U, const Tensor& S, const Tensor& V, const Tensor& info); @@ -90,8 +86,6 @@ namespace cuda { namespace detail { struct LinalgDispatch { std::tuple (*symeig_helper)(const Tensor& self, bool eigenvectors, bool upper); Tensor (*cholesky_solve_helper)(const Tensor& self, const Tensor& A, bool upper); - std::tuple (*legacy_lstsq)(const Tensor &B, const Tensor &A); - Tensor& (*inv_out_helper)(Tensor &result, Tensor& infos_lu, Tensor& infos_getri); }; C10_EXPORT void registerLinalgDispatch(const LinalgDispatch&); }} // namespace cuda::detail diff --git a/aten/src/ATen/native/cuda/reduction_template.cuh b/aten/src/ATen/native/cuda/reduction_template.cuh index 4d9d559d8ec8..a38edb538256 100644 --- a/aten/src/ATen/native/cuda/reduction_template.cuh +++ b/aten/src/ATen/native/cuda/reduction_template.cuh @@ -4,11 +4,22 @@ namespace cuda { const std::string reduction_template_0 = R"ESCAPE( #define C10_HOST_DEVICE __host__ __device__ #define C10_DEVICE __device__ + #if defined(__clang__) && defined(__HIP__) + #ifndef __forceinline__ + #define __forceinline__ inline __attribute__((always_inline)) + #endif + // until ROCm support for kernel asserts is restored + #define assert(expr) (static_cast(0)) + #endif template __device__ __forceinline__ T WARP_SHFL_DOWN(T value, unsigned int delta, int width = warpSize, unsigned int mask = 0xffffffff) { + #if defined(__clang__) && defined(__HIP__) + return __shfl_down(value, delta, width); + #else 
return __shfl_down_sync(mask, value, delta, width); + #endif } @@ -17,8 +28,13 @@ const std::string reduction_template_0 = R"ESCAPE( __device__ __forceinline__ std::complex WARP_SHFL_DOWN(std::complex value, unsigned int delta, int width = warpSize, unsigned int mask = 0xffffffff) { return std::complex( + #if defined(__clang__) && defined(__HIP__) + __shfl_down(value.real(), delta, width), + __shfl_down(value.imag(), delta, width)); + #else __shfl_down_sync(mask, value.real(), delta, width), __shfl_down_sync(mask, value.imag(), delta, width)); + #endif } #endif diff --git a/aten/src/ATen/native/cuda/vol2col.cuh b/aten/src/ATen/native/cuda/vol2col.cuh index 7ab719bc819e..51dbe1c74405 100644 --- a/aten/src/ATen/native/cuda/vol2col.cuh +++ b/aten/src/ATen/native/cuda/vol2col.cuh @@ -15,7 +15,7 @@ using namespace at::cuda::detail; // Kernel for fast unfold+copy on volumes template __global__ void vol2col_kernel( - const int n, + const int64_t n, const T* data_vol, const int depth, const int height, @@ -37,16 +37,16 @@ __global__ void vol2col_kernel( const int width_col, T* data_col) { CUDA_KERNEL_LOOP(index, n) { - int w_out = index % width_col; + auto w_out = index % width_col; index /= width_col; - int h_out = index % height_col; + auto h_out = index % height_col; index /= height_col; - int t_out = index % depth_col; - int channel_in = index / depth_col; - int channel_out = channel_in * ksize_t * ksize_h * ksize_w; - int t_in = t_out * stride_t - pad_t; - int h_in = h_out * stride_h - pad_h; - int w_in = w_out * stride_w - pad_w; + auto t_out = index % depth_col; + auto channel_in = index / depth_col; + auto channel_out = channel_in * ksize_t * ksize_h * ksize_w; + auto t_in = t_out * stride_t - pad_t; + auto h_in = h_out * stride_h - pad_h; + auto w_in = w_out * stride_w - pad_w; data_col += ((channel_out * depth_col + t_out) * height_col + h_out) * width_col + w_out; @@ -54,9 +54,9 @@ __global__ void vol2col_kernel( for (int i = 0; i < ksize_t; ++i) { for (int j = 0; j < ksize_h; ++j) { for (int k = 0; k < ksize_w; ++k) { - int t = t_in + i * dilation_t; - int h = h_in + j * dilation_h; - int w = w_in + k * dilation_w; + auto t = t_in + i * dilation_t; + auto h = h_in + j * dilation_h; + auto w = w_in + k * dilation_w; *data_col = (t >= 0 && h >= 0 && w >= 0 && t < depth && h < height && w < width) ? 
data_vol @@ -126,7 +126,7 @@ void vol2col( template __global__ void vol2im_kernel( - const unsigned n, + const int64_t n, const T* data_col, const unsigned depth, const unsigned height, @@ -150,30 +150,30 @@ __global__ void vol2im_kernel( T* data_vol) { CUDA_KERNEL_LOOP(index, n) { accT val = static_cast(0); - const unsigned w_im = index % width + pad_w; - const unsigned h_im = (index / width) % height + pad_h; - const unsigned t_im = (index / width / height) % depth + pad_t; - const unsigned c_im = index / (width * height * depth); - unsigned kernel_extent_w = (kernel_w - 1) * dilation_w + 1; - unsigned kernel_extent_h = (kernel_h - 1) * dilation_h + 1; - unsigned kernel_extent_t = (kernel_t - 1) * dilation_t + 1; + const auto w_im = index % width + pad_w; + const auto h_im = (index / width) % height + pad_h; + const auto t_im = (index / width / height) % depth + pad_t; + const auto c_im = index / (width * height * depth); + auto kernel_extent_w = (kernel_w - 1) * dilation_w + 1; + auto kernel_extent_h = (kernel_h - 1) * dilation_h + 1; + auto kernel_extent_t = (kernel_t - 1) * dilation_t + 1; // compute the start and end of the output - const unsigned w_col_start = + const auto w_col_start = (w_im < kernel_extent_w) ? 0 : (w_im - kernel_extent_w) / stride_w + 1; - const unsigned w_col_end = std::min(w_im / stride_w + 1, width_col); - const unsigned h_col_start = + const auto w_col_end = std::min(w_im / stride_w + 1, width_col); + const auto h_col_start = (h_im < kernel_extent_h) ? 0 : (h_im - kernel_extent_h) / stride_h + 1; - const unsigned h_col_end = std::min(h_im / stride_h + 1, height_col); - const unsigned t_col_start = + const auto h_col_end = std::min(h_im / stride_h + 1, height_col); + const auto t_col_start = (t_im < kernel_extent_t) ? 0 : (t_im - kernel_extent_t) / stride_t + 1; - const unsigned t_col_end = std::min(t_im / stride_t + 1, depth_col); + const auto t_col_end = std::min(t_im / stride_t + 1, depth_col); // TODO: use LCM of stride and dilation to avoid unnecessary loops for (unsigned t_col = t_col_start; t_col < t_col_end; t_col += 1) { for (unsigned h_col = h_col_start; h_col < h_col_end; h_col += 1) { for (unsigned w_col = w_col_start; w_col < w_col_end; w_col += 1) { - unsigned t_k = (t_im - t_col * stride_t); - unsigned h_k = (h_im - h_col * stride_h); - unsigned w_k = (w_im - w_col * stride_w); + uint64_t t_k = (t_im - t_col * stride_t); + uint64_t h_k = (h_im - h_col * stride_h); + uint64_t w_k = (w_im - w_col * stride_w); if (t_k % dilation_t == 0 && h_k % dilation_h == 0 && w_k % dilation_w == 0) { t_k /= dilation_t; diff --git a/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp b/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp index 50fc37ba76da..bfc7184e9303 100644 --- a/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp +++ b/aten/src/ATen/native/cudnn/AffineGridGenerator.cpp @@ -1,8 +1,17 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #if !AT_CUDNN_ENABLED() namespace at { namespace native { diff --git a/aten/src/ATen/native/cudnn/BatchNorm.cpp b/aten/src/ATen/native/cudnn/BatchNorm.cpp index 1c70aa353b51..f1f275e63885 100644 --- a/aten/src/ATen/native/cudnn/BatchNorm.cpp +++ b/aten/src/ATen/native/cudnn/BatchNorm.cpp @@ -1,5 +1,5 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include @@ -32,6 +32,16 @@ std::tuple cudnn_batch_norm_backward( #include +#ifndef 
AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp b/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp index 0474b1bf1448..feb679d57d78 100644 --- a/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp +++ b/aten/src/ATen/native/cudnn/ConvPlaceholders.cpp @@ -1,6 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include // for the definition of AT_CUDNN_ENABLED -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/cudnn/ConvShared.cpp b/aten/src/ATen/native/cudnn/ConvShared.cpp index 9f921faf0320..2c4d77c6f617 100644 --- a/aten/src/ATen/native/cudnn/ConvShared.cpp +++ b/aten/src/ATen/native/cudnn/ConvShared.cpp @@ -1,12 +1,30 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include // for the definition of AT_CUDNN_ENABLED +#include #include +#include +#include +#include #include #if AT_CUDNN_ENABLED() #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + // NOTE [cuDNN API version] // // ConvPlaceholders.cpp contains placeholder implementation of cudnn @@ -436,7 +454,7 @@ Tensor cudnn_convolution_relu( bool allow_tf32 = ctx.allowTF32CuDNN(); auto _bias = bias_t.has_value() ? bias_t.value() - : at::native::zeros( + : at::zeros( {output_t.size(1)}, optTypeMetaToScalarType(output_t.options().dtype_opt()), output_t.options().layout_opt(), @@ -514,7 +532,7 @@ Tensor cudnn_convolution_add_relu( auto _alpha = alpha.has_value() ? alpha.value().to() : 1.0; auto _bias = bias_t.has_value() ? bias_t.value() - : at::native::zeros( + : at::zeros( {output_t.size(1)}, optTypeMetaToScalarType(output_t.options().dtype_opt()), output_t.options().layout_opt(), diff --git a/aten/src/ATen/native/cudnn/ConvShared.h b/aten/src/ATen/native/cudnn/ConvShared.h index fbcf667f40fc..9a576de285ce 100644 --- a/aten/src/ATen/native/cudnn/ConvShared.h +++ b/aten/src/ATen/native/cudnn/ConvShared.h @@ -1,4 +1,5 @@ -#include +#pragma once +#include #include #include diff --git a/aten/src/ATen/native/cudnn/Conv_v7.cpp b/aten/src/ATen/native/cudnn/Conv_v7.cpp index 63968fd2072f..f5c7af79a740 100644 --- a/aten/src/ATen/native/cudnn/Conv_v7.cpp +++ b/aten/src/ATen/native/cudnn/Conv_v7.cpp @@ -1,15 +1,21 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include // for the definition of AT_CUDNN_ENABLED #if AT_CUDNN_ENABLED() #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#endif #include #include -#include -#include -#include -#include #include #include #include @@ -199,10 +205,11 @@ size_t getMaxWorkspaceSize( { size_t max_ws_size = 0; size_t max_block_size = 0; - size_t tmp_bytes = 0; // Only used for filling pointer parameters that aren't used later const auto device = c10::cuda::current_device(); - c10::cuda::CUDACachingAllocator::cacheInfo(device, &tmp_bytes, &max_block_size); + // For the native allocator, retrieves the size of the largest unused block. + // For cudaMallocAsync, see c10/cuda/CUDAMallocAsync.cpp:cacheInfo for details. 
+ c10::cuda::CUDACachingAllocator::cacheInfo(device, &max_block_size); for (const auto i : c10::irange(n_algo)) { cudnnStatus_t err; diff --git a/aten/src/ATen/native/cudnn/Conv_v8.cpp b/aten/src/ATen/native/cudnn/Conv_v8.cpp index 2ad8d4ffe37c..11fe5be8298e 100644 --- a/aten/src/ATen/native/cudnn/Conv_v8.cpp +++ b/aten/src/ATen/native/cudnn/Conv_v8.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include // for the definition of AT_CUDNN_ENABLED #if AT_CUDNN_ENABLED() @@ -10,7 +11,7 @@ #include #include #include -#include +#include #include #include #include @@ -26,6 +27,12 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { namespace { @@ -47,9 +54,12 @@ uint8_t getAlignment(const Tensor &t) { return alignment; } -cudnn_frontend::Tensor getTensorDescriptorWithTypeVirtual(const Tensor &t, const int64_t id, const uint8_t alignment, const cudnnDataType_t dataType, const bool _virtual) { +cudnn_frontend::Tensor getTensorDescriptorWithTypeVirtual(const Tensor &t, const int64_t id, const uint8_t alignment, const cudnnDataType_t dataType, const at::MemoryFormat memory_format, const bool _virtual) { auto sizes = t.sizes(); auto strides = t.strides(); + bool channels_last = memory_format == at::MemoryFormat::ChannelsLast || + memory_format == at::MemoryFormat::ChannelsLast3d; + fixSizeOneDimStride(sizes.size(), &sizes[0], (int64_t *) &strides[0], channels_last); auto r = cudnn_frontend::TensorBuilder() .setDim(sizes.size(), sizes.data()) .setStrides(strides.size(), strides.data()) @@ -61,8 +71,8 @@ cudnn_frontend::Tensor getTensorDescriptorWithTypeVirtual(const Tensor &t, const return r; } -cudnn_frontend::Tensor getTensorDescriptor(const Tensor &t, const int64_t id, const uint8_t alignment) { - return getTensorDescriptorWithTypeVirtual(t, id, alignment, getCudnnDataType(t), false); +cudnn_frontend::Tensor getTensorDescriptor(const Tensor &t, const int64_t id, const uint8_t alignment, const at::MemoryFormat memory_format) { + return getTensorDescriptorWithTypeVirtual(t, id, alignment, getCudnnDataType(t), memory_format, false); } cudnn_frontend::ConvDesc_v8 getConvDescriptor(cudnnDataType_t dataType, IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, const at::ScalarType scalar_type) { @@ -152,7 +162,8 @@ BenchmarkCache benchmark_cache_fus // would not be a POD anymore. 
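The context line above ("would not be a POD anymore") refers to the benchmark-cache key: setCacheKey below zeroes it with memset and the cache compares it bytewise, so every field, including the newly keyed memory format, has to keep the struct trivially copyable. A hedged, self-contained illustration of that constraint (ToyKey and its helpers are invented for this sketch, not ATen code):

#include <cstdint>
#include <cstring>

// A cache key that is compared as raw bytes must stay a POD: memset fixes the
// padding bytes and memcmp then gives a deterministic equality, which would break
// if the struct held std::string or other non-trivially-copyable members.
struct ToyKey {
  std::int64_t input_sizes[4];
  std::int32_t memory_format;  // stored as a plain integral value
};

ToyKey make_key() {
  ToyKey key;
  std::memset(&key, 0, sizeof(key));  // zero padding so bytewise comparison is well defined
  return key;
}

bool keys_equal(const ToyKey& a, const ToyKey& b) {
  return std::memcmp(&a, &b, sizeof(ToyKey)) == 0;
}
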
void setCacheKey(CacheKey& key, const cudnnBackendDescriptorType_t operation, const Tensor& y, const Tensor& x, const Tensor& w, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation, int64_t groups, bool deterministic, bool allow_tf32) { memset(&key, 0, sizeof(key)); - setConvolutionParams(&key.params, x, w, padding, stride, dilation, groups, deterministic, allow_tf32, x.suggest_memory_format()); + at::MemoryFormat memory_format = cudnn_conv_suggest_memory_format(x, w); + setConvolutionParams(&key.params, x, w, padding, stride, dilation, groups, deterministic, allow_tf32, memory_format); key.operation = operation; key.x_alignment = getAlignment(x); key.y_alignment = getAlignment(y); @@ -161,7 +172,8 @@ void setCacheKey(CacheKey& key, const cudnnBackendDescriptorType_t operation, co void setCacheKeyFused(CacheKeyFused& key, const Tensor& y, const Tensor& x, const Tensor& w, const Tensor& z, const Tensor& b, const float alpha, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation, int64_t groups, bool deterministic, bool allow_tf32) { memset(&key, 0, sizeof(key)); - setConvolutionParams(&key.params, x, w, padding, stride, dilation, groups, deterministic, allow_tf32, x.suggest_memory_format()); + at::MemoryFormat memory_format = cudnn_conv_suggest_memory_format(x, w); + setConvolutionParams(&key.params, x, w, padding, stride, dilation, groups, deterministic, allow_tf32, memory_format); key.x_alignment = getAlignment(x); key.y_alignment = getAlignment(y); key.w_alignment = getAlignment(w); @@ -200,9 +212,9 @@ void run_conv_plan_fused(cudnnHandle_t handle, const Tensor& x, const Tensor& y, auto build_opgraph(const cudnnHandle_t handle, const cudnnBackendDescriptorType_t desc, const Tensor& x, const Tensor& y, const Tensor& w, const CacheKey& key, const IntArrayRef padding, const IntArrayRef stride, const IntArrayRef dilation) { auto op = cudnn_frontend::OperationBuilder(desc) - .setxDesc(getTensorDescriptor(x, 'x', key.x_alignment)) - .setyDesc(getTensorDescriptor(y, 'y', key.y_alignment)) - .setwDesc(getTensorDescriptor(w, 'w', key.w_alignment)) + .setxDesc(getTensorDescriptor(x, 'x', key.x_alignment, key.params.memory_format)) + .setyDesc(getTensorDescriptor(y, 'y', key.y_alignment, key.params.memory_format)) + .setwDesc(getTensorDescriptor(w, 'w', key.w_alignment, key.params.memory_format)) .setcDesc(getConvDescriptor(key.params.dataType, padding, stride, dilation, x.scalar_type())) .build(); std::array ops = {&op}; @@ -232,33 +244,33 @@ auto build_opgraph_fused(const cudnnHandle_t handle, const Tensor & x, const Ten const float alpha1 = 1.0; const float alpha2 = alpha; auto conv_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_CONVOLUTION_FORWARD_DESCRIPTOR) - .setxDesc(getTensorDescriptor(x, 'x', key.x_alignment)) + .setxDesc(getTensorDescriptor(x, 'x', key.x_alignment, key.params.memory_format)) // virtual output of conv - .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'C', key.y_alignment, precision, true)) - .setwDesc(getTensorDescriptor(w, 'w', key.w_alignment)) + .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'C', key.y_alignment, precision, key.params.memory_format, true)) + .setwDesc(getTensorDescriptor(w, 'w', key.w_alignment, key.params.memory_format)) .setAlpha(alpha1) .setcDesc(convDesc) .build(); auto add_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(conv_op.getOutputTensor()) - .setbDesc(getTensorDescriptor(z, 'z', key.z_alignment)) + 
.setbDesc(getTensorDescriptor(z, 'z', key.z_alignment, key.params.memory_format)) // another virtual output (of add) - .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'A', key.y_alignment, precision, true)) + .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'A', key.y_alignment, precision, key.params.memory_format, true)) .setpwDesc(addDesc) .setAlpha(alpha1) .setAlpha2(alpha2) .build(); auto add_bias_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(add_op.getOutputTensor()) - .setbDesc(getTensorDescriptor(b, 'b', key.b_alignment)) + .setbDesc(getTensorDescriptor(b, 'b', key.b_alignment, key.params.memory_format)) // another virtual output (of add bias) - .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'B', key.y_alignment, precision, true)) + .setyDesc(getTensorDescriptorWithTypeVirtual(y, 'B', key.y_alignment, precision, key.params.memory_format, true)) .setpwDesc(addBiasDesc) .build(); auto act_op = cudnn_frontend::OperationBuilder(CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR) .setxDesc(add_bias_op.getOutputTensor()) // final output is in original datatype - .setyDesc(getTensorDescriptor(y, 'y', key.y_alignment)) + .setyDesc(getTensorDescriptor(y, 'y', key.y_alignment, key.params.memory_format)) .setpwDesc(actDesc) .build(); std::array ops = {&conv_op, &add_op, &add_bias_op, &act_op}; @@ -300,8 +312,7 @@ size_t get_available_workspace() { int device; C10_CUDA_CHECK(cudaGetDevice(&device)); size_t max_block_size = 0; - size_t tmp_bytes = 0; // Only used for filling pointer parameters that aren't used later - c10::cuda::CUDACachingAllocator::cacheInfo(device, &tmp_bytes, &max_block_size); + c10::cuda::CUDACachingAllocator::cacheInfo(device, &max_block_size); return max_block_size; } @@ -654,7 +665,7 @@ void raw_cudnn_convolution_add_relu_out( bool allow_tf32) { if (output.numel() == 0) { return; } if (at::native::cudnnv8_enabled_check_debug()) { - auto bias_ = bias.view({1, bias.numel(), 1, 1}); + auto bias_ = input.ndimension() == 4 ? 
bias.view({1, bias.numel(), 1, 1}) : bias.view({1, bias.numel(), 1, 1, 1}); run_fused_conv(input, output, weight, z, bias_, alpha, stride, padding, dilation, groups, benchmark, deterministic, allow_tf32); diff --git a/aten/src/ATen/native/cudnn/GridSampler.cpp b/aten/src/ATen/native/cudnn/GridSampler.cpp index b22d25cbff97..8697b89c399a 100644 --- a/aten/src/ATen/native/cudnn/GridSampler.cpp +++ b/aten/src/ATen/native/cudnn/GridSampler.cpp @@ -1,9 +1,18 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #if !AT_CUDNN_ENABLED() namespace at { namespace native { diff --git a/aten/src/ATen/native/cudnn/LossCTC.cpp b/aten/src/ATen/native/cudnn/LossCTC.cpp index 37c5277428b7..a741816424a7 100644 --- a/aten/src/ATen/native/cudnn/LossCTC.cpp +++ b/aten/src/ATen/native/cudnn/LossCTC.cpp @@ -1,11 +1,23 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #if AT_CUDNN_ENABLED() #include #endif +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + #if (!AT_CUDNN_ENABLED()) || (CUDNN_VERSION < 7600) namespace at { namespace native { @@ -21,10 +33,30 @@ bool _use_cudnn_ctc_loss( return false; } +bool _use_cudnn_ctc_loss_tensor( + const Tensor& log_probs, + const Tensor& targets, + const Tensor& input_lengths, + const Tensor& target_lengths, + int64_t BLANK) { + return false; +} + std::tuple _cudnn_ctc_loss(const Tensor& log_probs, const Tensor& targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t BLANK, bool deterministic, bool zero_infinity) { AT_ERROR("cudnn_ctc_loss: ATen not compiled with cuDNN >= 7 support"); } +std::tuple _cudnn_ctc_loss_tensor( + const Tensor& log_probs, + const Tensor& targets, + const Tensor& input_lengths, + const Tensor& target_lengths, + int64_t BLANK, + bool deterministic, + bool zero_infinity) { + AT_ERROR("cudnn_ctc_loss: ATen not compiled with cuDNN >= 7 support"); +} + }} #else // AT_CUDNN_ENABLED @@ -68,6 +100,20 @@ bool _use_cudnn_ctc_loss( return use_cudnn; } +bool _use_cudnn_ctc_loss_tensor( + const Tensor& log_probs, + const Tensor& targets, + const Tensor& input_lengths, + const Tensor& target_lengths, + int64_t BLANK) { + Tensor ilc = input_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + Tensor tlc = target_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + IntArrayRef il(ilc.data_ptr(), ilc.numel()); + IntArrayRef tl(tlc.data_ptr(), tlc.numel()); + return at::_use_cudnn_ctc_loss( + log_probs, targets, il, tl, BLANK); +} + std::tuple _cudnn_ctc_loss(const Tensor& log_probs_t, const Tensor& targets_t, IntArrayRef input_lengths_, IntArrayRef target_lengths_, int64_t BLANK, bool deterministic, bool zero_infinity) { (void)zero_infinity; // only used for backward const CheckedFrom c = "cudnn_ctc_loss"; @@ -138,6 +184,21 @@ std::tuple _cudnn_ctc_loss(const Tensor& log_probs_t, const Tens return std::make_tuple(costs, grad); } +std::tuple _cudnn_ctc_loss_tensor( + const Tensor& log_probs, + const Tensor& targets, + const Tensor& input_lengths, + const Tensor& target_lengths, + int64_t BLANK, + bool deterministic, + bool zero_infinity) { + Tensor ilc = input_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + Tensor tlc = target_lengths.to(Device(at::kCPU), at::kLong).contiguous(); + IntArrayRef il(ilc.data_ptr(), ilc.numel()); + IntArrayRef tl(tlc.data_ptr(), 
tlc.numel()); + return at::_cudnn_ctc_loss( + log_probs, targets, il, tl, BLANK, deterministic, zero_infinity); +} }} // namespace at::native diff --git a/aten/src/ATen/native/cudnn/RNN.cpp b/aten/src/ATen/native/cudnn/RNN.cpp index 29430b38e74e..426243392b6f 100644 --- a/aten/src/ATen/native/cudnn/RNN.cpp +++ b/aten/src/ATen/native/cudnn/RNN.cpp @@ -1,18 +1,32 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include #include -#include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #if !AT_CUDNN_ENABLED() namespace at { namespace native { @@ -56,7 +70,7 @@ Tensor _cudnn_init_dropout_state(double dropout, bool train, int64_t dropout_see c10::optional device, c10::optional pin_memory) { // See [Note: hacky wrapper removal for TensorOptions] - TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); + TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); AT_ERROR("_cudnn_init_dropout_state: ATen not compiled with cuDNN support"); } diff --git a/aten/src/ATen/native/group_norm.cpp b/aten/src/ATen/native/group_norm.cpp index db1d82f84fef..22ff9ea5f0e8 100644 --- a/aten/src/ATen/native/group_norm.cpp +++ b/aten/src/ATen/native/group_norm.cpp @@ -1,26 +1,37 @@ -#include -#include -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include #include -#include #include #include namespace at { + namespace native { +template void check_group_norm_inputs( const Tensor& input, const Tensor& weight, const Tensor& bias, - int64_t C, + T C, int64_t num_groups) { TORCH_CHECK( num_groups > 0, @@ -34,14 +45,14 @@ void check_group_norm_inputs( "num_groups=", num_groups); TORCH_CHECK( - !weight.defined() || (weight.dim() == 1 && weight.numel() == C), + !weight.defined() || (weight.dim() == 1 && at::symint::numel(weight) == C), "Expected weight to be a vector of size equal to the number of ", "channels in input, but got weight of shape ", weight.sizes(), " and input of shape ", input.sizes()); TORCH_CHECK( - !bias.defined() || (bias.dim() == 1 && bias.numel() == C), + !bias.defined() || (bias.dim() == 1 && at::symint::numel(bias) == C), "Expected bias to be a vector of size equal to the number of ", "channels in input, but got bias of shape ", weight.sizes(), @@ -162,24 +173,24 @@ Tensor group_norm( const Tensor& weight = *weight_maybe_owned; const Tensor& bias = c10::value_or_else(bias_opt, [] { return Tensor(); }); - const int64_t N = input.size(0); - const int64_t C = input.size(1); + const auto N = input.sym_size(0); + const auto C = input.sym_size(1); check_group_norm_inputs(input, weight, bias, C, num_groups); - const auto input_shape = input.sizes(); - const int64_t HxW = - c10::multiply_integers(input_shape.cbegin() + 2, input_shape.cend()); + const auto input_shape = input.sym_sizes(); + const auto HxW = + c10::multiply_integers(input_shape.slice(2)); const Tensor kEmpty; auto memory_format = input.suggest_memory_format(); - const auto& X = input.device().is_cpu() ? + const auto& X = input.device().is_cpu() || input.device().is_xpu() ? 
input.contiguous(memory_format) : input.contiguous(); const auto& gamma = weight.defined() ? weight.contiguous() : kEmpty; const auto& beta = bias.defined() ? bias.contiguous() : kEmpty; - TORCH_CHECK(!gamma.defined() || gamma.numel() == C); - TORCH_CHECK(!beta.defined() || beta.numel() == C); + TORCH_CHECK(!gamma.defined() || gamma.sym_numel() == C); + TORCH_CHECK(!beta.defined() || beta.sym_numel() == C); return std::get<0>( - at::native_group_norm(X, gamma, beta, N, C, HxW, num_groups, eps)); + at::native_group_norm_symint(X, gamma, beta, N, C, HxW, num_groups, eps)); } DEFINE_DISPATCH(GroupNormKernel); @@ -224,8 +235,10 @@ std::tuple math_group_norm( } else if (bias.defined()) { out = out.add(bias.view(affine_param_shape)); } - at::Tensor mean = std::get<1>(outputs).view({N, group}); - at::Tensor rstd = std::get<2>(outputs).view({N, group}); + // convert mean/std to have the same dtype as input. + // This follows the same behavior as the CPU and CUDA kernels. + at::Tensor mean = std::get<1>(outputs).to(c10::TensorOptions().dtype(input.scalar_type())).view({N, group}); + at::Tensor rstd = std::get<2>(outputs).to(c10::TensorOptions().dtype(input.scalar_type())).view({N, group}); return std::make_tuple(out, mean, rstd); } } // namespace native diff --git a/aten/src/ATen/native/im2col.h b/aten/src/ATen/native/im2col.h index c3daed3d4ffc..ecbb7ab0b35d 100644 --- a/aten/src/ATen/native/im2col.h +++ b/aten/src/ATen/native/im2col.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/aten/src/ATen/native/im2col_shape_check.h b/aten/src/ATen/native/im2col_shape_check.h index 45fc96ea8443..d6c95465da26 100644 --- a/aten/src/ATen/native/im2col_shape_check.h +++ b/aten/src/ATen/native/im2col_shape_check.h @@ -1,6 +1,7 @@ #pragma once #include #include +#include namespace at { namespace native { @@ -36,6 +37,13 @@ static inline void col2im_shape_check( dilation_height, " dilation_width: ", dilation_width); + TORCH_CHECK( + pad_width >= 0 && pad_height >= 0, + "padding should be non-negative, but got pad_height: ", + pad_height, + " pad_width: ", + pad_width); + int64_t ndim = input.ndimension(); // allow dim=0 only the batch dimension. 
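The TORCH_CHECK added just above rejects negative padding in col2im up front instead of letting it surface later as a confusing output-size error. A hedged stand-alone sketch of the same validation, using plain exceptions instead of TORCH_CHECK (the function name is illustrative):

#include <cstdint>
#include <stdexcept>
#include <string>

// Fail fast on negative padding, mirroring the message format of the new check.
void check_padding_nonnegative(std::int64_t pad_height, std::int64_t pad_width) {
  if (pad_width < 0 || pad_height < 0) {
    throw std::invalid_argument(
        "padding should be non-negative, but got pad_height: " + std::to_string(pad_height) +
        " pad_width: " + std::to_string(pad_width));
  }
}
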
@@ -218,7 +226,7 @@ static inline void im2col_shape_check( output_height, ", ", output_width, - "), which is too small (non-positive)."); + "), but its components must be at least one."); } } diff --git a/aten/src/ATen/native/layer_norm.cpp b/aten/src/ATen/native/layer_norm.cpp index e4ea0ac8fe21..8269a4d3af9e 100644 --- a/aten/src/ATen/native/layer_norm.cpp +++ b/aten/src/ATen/native/layer_norm.cpp @@ -1,17 +1,26 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include -#include -#include +#include #include #include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif #include -#include -#include #include #include @@ -37,8 +46,7 @@ void layer_norm_with_mean_rstd_out( for (const auto idx : c10::irange(axis)) { stat_shape.emplace_back(input_shape[idx]); } - for (const auto idx : c10::irange(axis, input.dim())) { - (void)idx; // Suppress unused variable warning + for (const auto idx C10_UNUSED : c10::irange(axis, input.dim())) { stat_shape.emplace_back(1); } @@ -167,9 +175,9 @@ std::tuple layer_norm_backward_cpu( return std::make_tuple(std::move(dX), std::move(dgamma), std::move(dbeta)); } -Tensor layer_norm( +Tensor layer_norm_symint( const Tensor& input, - IntArrayRef normalized_shape, const c10::optional& weight_opt /* optional */, const c10::optional& bias_opt /* optional */, + c10::SymIntArrayRef normalized_shape, const c10::optional& weight_opt /* optional */, const c10::optional& bias_opt /* optional */, double eps, bool /* cudnn_enable, deprecated */) { // See [Note: hacky wrapper removal for optional tensor] @@ -178,8 +186,7 @@ Tensor layer_norm( c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); const Tensor& bias = *bias_maybe_owned; - - return std::get<0>(at::native_layer_norm(input, normalized_shape, weight, bias, eps)); + return std::get<0>(at::native_layer_norm_symint(input, normalized_shape, weight, bias, eps)); } DEFINE_DISPATCH(LayerNormKernel); @@ -216,7 +223,7 @@ std::tuple math_native_layer_norm( at::empty_like(input, c10::TensorOptions().dtype(result_type)) ); } - at::Tensor input_reshaped = input.view({1, M, -1}); + at::Tensor input_reshaped = input.reshape({1, M, -1}); // Unlike Batch Normalization, which applies scalar scale and bias for each // entire channel/plane with the affine option, Layer Normalization applies // per-element scale and bias. E.g. 
For input {N, C, H, W}, weight for @@ -239,8 +246,7 @@ std::tuple math_native_layer_norm( for (const auto idx : c10::irange(axis)) { stat_shape.push_back(input_shape[idx]); } - for (const auto idx : c10::irange(axis, input.dim())) { - (void)idx; // Suppress unused variable + for (const auto idx C10_UNUSED : c10::irange(axis, input.dim())) { stat_shape.push_back(1); } mean = mean.view(stat_shape); diff --git a/aten/src/ATen/native/metal/MetalAten.mm b/aten/src/ATen/native/metal/MetalAten.mm index c1c34217c374..f100f473f055 100644 --- a/aten/src/ATen/native/metal/MetalAten.mm +++ b/aten/src/ATen/native/metal/MetalAten.mm @@ -70,12 +70,13 @@ #pragma mark - ATen Ops Tensor empty( - IntArrayRef size, + c10::SymIntArrayRef sym_size, optional dtype, optional layout, optional device, optional pin_memory, c10::optional memory_format) { + auto size = c10::asIntArrayRefSlow(sym_size); TORCH_CHECK( !pin_memory.has_value(), "'pin_memory' argument is incompatible with Metal tensor"); diff --git a/aten/src/ATen/native/metal/MetalContext.mm b/aten/src/ATen/native/metal/MetalContext.mm index 51423f59785a..c9571757f246 100644 --- a/aten/src/ATen/native/metal/MetalContext.mm +++ b/aten/src/ATen/native/metal/MetalContext.mm @@ -89,7 +89,7 @@ - (BOOL)available { constants { TORCH_CHECK(_library, "Failed to load Metal shaders"); std::string kernelStr = kernel; - for (auto i = 0; i < constants.count; ++i) { + for (NSUInteger i = 0; i < constants.count; ++i) { kernelStr += "_" + std::string([constants[i] stringValue].UTF8String); } std::lock_guard g(_pipelineCacheMutex); @@ -100,7 +100,7 @@ - (BOOL)available { MTLFunctionConstantValues* constantValues = [MTLFunctionConstantValues new]; NSUInteger ushortArgIndex = 0; NSUInteger floatArgIndex = 12; - for (auto i = 0; i < constants.count; ++i) { + for (NSUInteger i = 0; i < constants.count; ++i) { NSNumber* constant = constants[i]; const char* type = constant.objCType; if (strcmp(type, @encode(NSUInteger)) == 0 || diff --git a/aten/src/ATen/native/metal/MetalConvParams.h b/aten/src/ATen/native/metal/MetalConvParams.h index f4cc1a2c5fa8..7b0bfc9670a1 100644 --- a/aten/src/ATen/native/metal/MetalConvParams.h +++ b/aten/src/ATen/native/metal/MetalConvParams.h @@ -22,7 +22,7 @@ struct Conv2DParams final { } bool isDepthwise() const { - // Currently, only channel multipler of 1 is supported + // Currently, only channel multiplier of 1 is supported // i.e. 
inputFeatureChannels == outputFeatureChannels return G > 1 && IC == 1 && OC == G && OC == C; } diff --git a/aten/src/ATen/native/metal/MetalTensorImpl.h b/aten/src/ATen/native/metal/MetalTensorImpl.h index 799f7ef3bd11..2fb87b2f4f89 100644 --- a/aten/src/ATen/native/metal/MetalTensorImpl.h +++ b/aten/src/ATen/native/metal/MetalTensorImpl.h @@ -31,6 +31,10 @@ struct TORCH_API MetalTensorImpl : public OpaqueTensorImpl { return strides_; } + c10::SymIntArrayRef sym_strides_custom() const override { + return c10::fromIntArrayRefKnownNonNegative(strides_); + } + bool is_contiguous_custom(c10::MemoryFormat memory_format) const override { return true; } diff --git a/aten/src/ATen/native/metal/mpscnn/MPSCNNConvOp.mm b/aten/src/ATen/native/metal/mpscnn/MPSCNNConvOp.mm index adf9e1b75c2d..bf4136aed5db 100644 --- a/aten/src/ATen/native/metal/mpscnn/MPSCNNConvOp.mm +++ b/aten/src/ATen/native/metal/mpscnn/MPSCNNConvOp.mm @@ -75,10 +75,10 @@ + (MPSCNNConvOp*)conv2d:(const Conv2DParams&)params using namespace at::native::metal::mpscnn; TORCH_CHECK( params.DX == params.DY == 1, "Dilated convolution is not supported yet."); - const int64_t oC = params.OC; - const int64_t iC = params.C; - const int64_t kH = params.KH; - const int64_t kW = params.KW; + const NSUInteger oC = params.OC; + const NSUInteger iC = params.C; + const NSUInteger kH = params.KH; + const NSUInteger kW = params.KW; MPSCNNNeuron* neuron = at::native::metal::neuron(t); MPSCNNConvolutionDescriptor* desc = nil; if (params.isDepthwise()) { @@ -149,7 +149,7 @@ + (MPSCNNConvOp*)conv2d:(const Conv2DParams&)params offset.z = 0; [conv setOffset:offset]; - TORCH_CHECK(conv.inputFeatureChannels == params.IC * params.G); + TORCH_CHECK(static_cast(conv.inputFeatureChannels) == params.IC * params.G); TORCH_CHECK(oC % conv.groups == 0); TORCH_CHECK(conv.outputFeatureChannels == oC); TORCH_CHECK(conv.kernelWidth == kW); diff --git a/aten/src/ATen/native/metal/mpscnn/MPSImageWrapper.mm b/aten/src/ATen/native/metal/mpscnn/MPSImageWrapper.mm index d5a9632d26c9..14c98f99cff0 100644 --- a/aten/src/ATen/native/metal/mpscnn/MPSImageWrapper.mm +++ b/aten/src/ATen/native/metal/mpscnn/MPSImageWrapper.mm @@ -23,6 +23,9 @@ + (instancetype)newWithMPSImageWrapper:(MPSImageWrapper*)wrapper { - (void)dealloc { _imageWrapper = nullptr; +#if !__has_feature(objc_arc) + [super dealloc]; +#endif } - (void)beginSynchronization { diff --git a/aten/src/ATen/native/metal/ops/MetalConcat.mm b/aten/src/ATen/native/metal/ops/MetalConcat.mm index c43bf055fa2e..8c28568d3101 100644 --- a/aten/src/ATen/native/metal/ops/MetalConcat.mm +++ b/aten/src/ATen/native/metal/ops/MetalConcat.mm @@ -16,13 +16,11 @@ namespace native { namespace metal { -Tensor cat_batch(const TensorList tensors, MetalTensorImplStorage& mt) { - at::Tensor tensor = tensors[0]; +Tensor cat_batch(const Tensor& tensor, const ITensorListRef& tensors, MetalTensorImplStorage& mt) { MetalCommandBuffer* commandBuffer = getCommandBuffer(tensor); MPSImage* Y = mt.texture()->image(); ushort cat_dim4_pointer = 0; - for (int i = 0; i < tensors.size(); ++i) { - const auto& t = tensors[i]; + for (const auto& t : tensors) { MPSImage* X = imageFromTensor(t); MetalCommandBuffer* Xcb = getCommandBuffer(t); TORCH_CHECK( @@ -55,8 +53,7 @@ Tensor cat_batch(const TensorList tensors, MetalTensorImplStorage& mt) { return output; } -Tensor cat_feature(const TensorList tensors, MetalTensorImplStorage& mt) { - at::Tensor tensor = tensors[0]; +Tensor cat_feature(const Tensor& tensor, const ITensorListRef& tensors, MetalTensorImplStorage& mt) { 
MetalCommandBuffer* commandBuffer = getCommandBuffer(tensor); MPSImage* Y = mt.texture()->image(); ushort channel_offset = 0; @@ -68,9 +65,9 @@ Tensor cat_feature(const TensorList tensors, MetalTensorImplStorage& mt) { tt.texture()->allocateTemporaryStorage(temp_size, commandBuffer); MPSImage* T = tt.texture()->image(); - for (int i = 0; i < tensors.size(); ++i) { - MPSImage* X = imageFromTensor(tensors[i]); - MetalCommandBuffer* Xcb = getCommandBuffer(tensors[i]); + for (const auto& t : tensors) { + MPSImage* X = imageFromTensor(t); + MetalCommandBuffer* Xcb = getCommandBuffer(t); TORCH_CHECK( [commandBuffer isEqual:Xcb], @"inputs have different Metal command buffers"); @@ -165,15 +162,15 @@ Tensor cat_feature(const TensorList tensors, MetalTensorImplStorage& mt) { return output; } -Tensor cat(const TensorList tensors, int64_t dim) { +Tensor cat(const ITensorListRef& tensors, int64_t dim) { TORCH_CHECK( dim == 0 || dim == 1, "Metal cat is implemented only for batch dimension"); int64_t cat_dim_size = 0; - at::Tensor tensor = tensors[0]; + TORCH_CHECK(!tensors.empty(), "cat expected a non-empty list of Tensor"); + at::Tensor tensor = *tensors.begin(); MetalCommandBuffer* commandBuffer = getCommandBuffer(tensor); - for (int i = 0; i < tensors.size(); ++i) { - const auto& t = tensors[i]; + for (const auto& t : tensors) { TORCH_CHECK(t.dim() == 4, "Metal cat expects 4 dimensional inputs"); TORCH_CHECK(t.is_metal(), "Metal cat expects metal tensors"); @@ -197,9 +194,9 @@ Tensor cat(const TensorList tensors, int64_t dim) { mt.texture()->allocateTemporaryStorage(result_size, commandBuffer); if (dim == 1) { - return cat_feature(tensors, mt); + return cat_feature(tensor, tensors, mt); } - return cat_batch(tensors, mt); + return cat_batch(tensor, tensors, mt); } TORCH_LIBRARY_IMPL(aten, Metal, m) { diff --git a/aten/src/ATen/native/metal/ops/MetalConvolution.mm b/aten/src/ATen/native/metal/ops/MetalConvolution.mm index 2e1503f67076..46295abefae9 100644 --- a/aten/src/ATen/native/metal/ops/MetalConvolution.mm +++ b/aten/src/ATen/native/metal/ops/MetalConvolution.mm @@ -106,7 +106,9 @@ Tensor conv2d_prepack_run( } // namespace prepack TORCH_LIBRARY_IMPL(aten, Metal, m) { - m.impl(TORCH_SELECTIVE_NAME("aten::conv2d"), TORCH_FN(conv2d)); + // NB: this didn't actually do anything; need to generalize this to + // work for general convolution and register to aten::convolution + // m.impl(TORCH_SELECTIVE_NAME("aten::conv2d"), TORCH_FN(conv2d)); }; TORCH_LIBRARY_IMPL(metal_prepack, Metal, m) { diff --git a/aten/src/ATen/native/metal/ops/MetalHardshrink.mm b/aten/src/ATen/native/metal/ops/MetalHardshrink.mm index 972768070407..4de506cb6526 100644 --- a/aten/src/ATen/native/metal/ops/MetalHardshrink.mm +++ b/aten/src/ATen/native/metal/ops/MetalHardshrink.mm @@ -15,6 +15,8 @@ using MetalTensorImpl = at::MetalTensorImpl; +// NB: this is currently unused, but I've left it because in principle +// it's useful Tensor& hardshrink_(Tensor& input, const at::Scalar& lambda=0.5) { float l = lambda.toFloat(); MPSImage* X = imageFromTensor(input); @@ -84,7 +86,6 @@ Tensor hardshrink(const at::Tensor& input, const at::Scalar& lambda=0.5) { } TORCH_LIBRARY_IMPL(aten, Metal, m) { - m.impl(TORCH_SELECTIVE_NAME("aten::hardshrink_"), TORCH_FN(hardshrink_)); m.impl(TORCH_SELECTIVE_NAME("aten::hardshrink"), TORCH_FN(hardshrink)); }; diff --git a/aten/src/ATen/native/metal/ops/MetalPadding.mm b/aten/src/ATen/native/metal/ops/MetalPadding.mm index 4edd4a04bbde..748fa8f4b653 100644 --- 
a/aten/src/ATen/native/metal/ops/MetalPadding.mm +++ b/aten/src/ATen/native/metal/ops/MetalPadding.mm @@ -35,7 +35,7 @@ Tensor reflection_pad2d(const Tensor& input, IntArrayRef padding) { } std::vector output_size(input_dim); - for (size_t d = 0; d < input_dim; ++d) { + for (int d = 0; d < input_dim; ++d) { if (d == input_dim - 1) { output_size[d] = input_size[d] + pad_right + pad_left; } diff --git a/aten/src/ATen/native/metal/ops/MetalReshape.mm b/aten/src/ATen/native/metal/ops/MetalReshape.mm index 37842ee3be59..551e336a9be1 100644 --- a/aten/src/ATen/native/metal/ops/MetalReshape.mm +++ b/aten/src/ATen/native/metal/ops/MetalReshape.mm @@ -16,7 +16,8 @@ namespace metal { API_AVAILABLE(ios(11.0), macos(10.13)) -Tensor view(const Tensor& input, IntArrayRef size) { +Tensor view(const Tensor& input, c10::SymIntArrayRef sym_size) { + auto size = c10::asIntArrayRefSlow(sym_size); TORCH_CHECK(input.is_metal()); auto inferred_size = at::infer_size(size, input.numel()); auto stride = @@ -63,7 +64,7 @@ Tensor view(const Tensor& input, IntArrayRef size) { Tensor reshape(const Tensor& input, IntArrayRef shape) { TORCH_CHECK(input.is_metal()); - return view(input, shape); + return view(input, c10::fromIntArrayRefSlow(shape)); } Tensor flatten_using_ints( diff --git a/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp b/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp index 28e20e90b299..91f3f01764da 100644 --- a/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp +++ b/aten/src/ATen/native/miopen/BatchNorm_miopen.cpp @@ -1,7 +1,16 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + // TODO: Remove the condition on AT_ROCM_ENABLED entirely, // don't build this file as part of CPU build. #include diff --git a/aten/src/ATen/native/miopen/Conv_miopen.cpp b/aten/src/ATen/native/miopen/Conv_miopen.cpp index 61eb209d5adc..677a711ce7a6 100644 --- a/aten/src/ATen/native/miopen/Conv_miopen.cpp +++ b/aten/src/ATen/native/miopen/Conv_miopen.cpp @@ -1,8 +1,25 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + // TODO: Remove the condition on AT_ROCM_ENABLED entirely, // don't build this file as part of CPU build. 
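Several files in this patch (the cuDNN, MIOpen, and normalization sources above and below) receive the same include restructuring. The concrete header names are not legible in this hunk, so the following is a hedged reconstruction of the usual pattern rather than the exact lines of any one file:

#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
#include <ATen/core/Tensor.h>

#ifndef AT_PER_OPERATOR_HEADERS
// monolithic headers: every operator declaration pulled into the translation unit
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
// per-operator headers: include only the ops this file actually calls, which keeps
// rebuilds small; the two below are illustrative examples
#include <ATen/ops/empty.h>
#include <ATen/ops/zeros.h>
#endif

The TORCH_ASSERT_ONLY_METHOD_OPERATORS define at the top of each file is meant to enforce that only method-style operator calls or per-operator headers are used, so an accidental dependency on the monolithic headers fails at compile time.
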
#include @@ -102,6 +119,20 @@ std::tuple miopen_depthwise_convolution_backwa AT_ERROR("miopen_depthwise_convolution_backward: ATen not compiled with MIOpen support"); } + +at::Tensor miopen_convolution_add_relu( + const at::Tensor& input, const at::Tensor& weight, const at::Tensor& z, + const c10::optional& alpha, const c10::optional& bias, IntArrayRef stride, + IntArrayRef padding, IntArrayRef dilation, int64_t groups) { + AT_ERROR("miopen_convolution_add_relu: ATen not compiled with MIOpen support"); +} + +at::Tensor miopen_convolution_relu( + const at::Tensor& input, const at::Tensor& weight, const c10::optional& bias, + IntArrayRef stride, IntArrayRef padding, IntArrayRef dilation, int64_t groups) { + AT_ERROR("miopen_convolution_relu: ATen not compiled with MIOpen support"); +} + }} #else // AT_ROCM_ENABLED @@ -1449,6 +1480,219 @@ Tensor miopen_convolution_transpose( return output_t; } +// MIOpen fused convolution bias activation forward +void raw_miopen_convolution_relu_out( + const Tensor& output, + const Tensor& input, + const Tensor& weight, + const Tensor& bias, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation, + int64_t groups, + bool benchmark, + bool deterministic) { + + auto dataType = getMiopenDataType(input); + miopenConvolutionMode_t c_mode = miopenConvolution; + + ConvolutionArgs args{ input, output, weight }; + args.handle = getMiopenHandle(); + setConvolutionParams(&args.params, args.handle, input, weight, padding, stride, dilation, groups, deterministic); + args.idesc.set(input); + args.wdesc.set(weight, input.suggest_memory_format(), 0); + args.odesc.set(output); + args.cdesc.set(dataType, c_mode, input.dim() - 2, args.params.padding, args.params.stride, args.params.dilation, args.params.groups); + + TensorDescriptor bdesc; + bdesc.set(bias.expand({1, bias.size(0)}), output.dim()); + + // Create the fusion plan + miopenFusionPlanDescriptor_t fusePlanDesc; + miopenFusionOpDescriptor_t convoOp; + miopenFusionOpDescriptor_t biasOp; + miopenFusionOpDescriptor_t activOp; + MIOPEN_CHECK(miopenCreateFusionPlan(&fusePlanDesc, miopenVerticalFusion, args.idesc.desc())); + MIOPEN_CHECK(miopenCreateOpConvForward(fusePlanDesc, &convoOp, args.cdesc.desc(), args.wdesc.desc())); + MIOPEN_CHECK(miopenCreateOpBiasForward(fusePlanDesc, &biasOp, bdesc.desc())); + MIOPEN_CHECK(miopenCreateOpActivationForward(fusePlanDesc, &activOp, miopenActivationRELU)); + + // compile fusion plan + MIOPEN_CHECK(miopenCompileFusionPlan(args.handle, fusePlanDesc)); + + // Set the Args + float alpha = static_cast(1); + float beta = static_cast(0); + float activ_alpha = static_cast(0); + float activ_beta = static_cast(0); + float activ_gamma = static_cast(0); + miopenOperatorArgs_t fusionArgs; + MIOPEN_CHECK(miopenCreateOperatorArgs(&fusionArgs)); + MIOPEN_CHECK(miopenSetOpArgsConvForward(fusionArgs, convoOp, &alpha, &beta, weight.data_ptr())); + MIOPEN_CHECK(miopenSetOpArgsBiasForward(fusionArgs, biasOp, &alpha, &beta, bias.data_ptr())); + MIOPEN_CHECK(miopenSetOpArgsActivForward(fusionArgs, activOp, &alpha, &beta, activ_alpha, activ_beta, activ_gamma)); + + miopenExecuteFusionPlan(args.handle, fusePlanDesc, args.idesc.desc(), input.data_ptr(), args.odesc.desc(), output.data_ptr(), fusionArgs); + + // Cleanup + miopenDestroyFusionPlan(fusePlanDesc); +} + +static at::Tensor self_or_new_memory_format(at::Tensor& self, at::MemoryFormat memory_format) { + if (self.is_contiguous(memory_format)) { + return self; + } + return at::empty_like(self, self.options(), memory_format); +} + +Tensor 
miopen_convolution_add_relu( + const Tensor& input, + const Tensor& weight, + const Tensor& z, + const c10::optional& alpha, + const c10::optional& bias, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation, + int64_t groups) { + + // MIOpen does not support fusion of add, the alpha2 * z step of the below cuDNN function: + // y = act ( alpha1 * conv(x) + alpha2 * z + bias ) + + auto memory_format = input.suggest_memory_format(); + + auto& ctx = at::globalContext(); + bool benchmark = ctx.benchmarkCuDNN(); + + TensorArg input_arg { input, "input", 1 }, + weight_arg { weight, "weight", 2 }; + auto output = miopen_convolution_forward( + "miopen_convolution_add_relu", + input_arg, + weight_arg, + padding, + stride, + dilation, + groups, + benchmark, + false // deterministic + ); + + auto contig_output = self_or_new_memory_format(output, memory_format); + + if (!output.is_same(contig_output)) { + contig_output.copy_(output); + } + + auto _alpha = alpha.has_value() ? alpha.value().to() : 1.0; + auto _bias = bias.has_value() + ? bias.value() + : at::zeros( + {contig_output.size(1)}, + optTypeMetaToScalarType(contig_output.options().dtype_opt()), + contig_output.options().layout_opt(), + contig_output.options().device_opt(), + contig_output.options().pinned_memory_opt()); + + at::Tensor alpha_mul_z_add_bias = at::native::reshape_bias(input.dim(), _bias).add(z, _alpha); + contig_output.add_(alpha_mul_z_add_bias); + contig_output.relu_(); + + return contig_output; +} + +Tensor miopen_convolution_relu( + const Tensor& input, + const Tensor& weight, + const c10::optional& bias, + IntArrayRef stride, + IntArrayRef padding, + IntArrayRef dilation, + int64_t groups) { + + auto memory_format = input.suggest_memory_format(); + + auto& ctx = at::globalContext(); + bool benchmark = ctx.benchmarkCuDNN(); + + // MIOpen currently only supports MemoryFormat::Contiguous and fp32 and 2d + if (input.suggest_memory_format() == at::MemoryFormat::Contiguous + && input.scalar_type() == at::kFloat + && input.ndimension() == 4) { + + // FuseFrozenConvAddRelu performs some tensor shape checking + Tensor output_t = at::detail::empty_cuda( + conv_output_size( + input.sizes(), weight.sizes(), padding, stride, dilation), + input.options().memory_format(input.suggest_memory_format())); + if (output_t.numel() == 0) { + return output_t; + } + + auto _bias = bias.has_value() + ? bias.value() + : at::zeros( + {output_t.size(1)}, + optTypeMetaToScalarType(output_t.options().dtype_opt()), + output_t.options().layout_opt(), + output_t.options().device_opt(), + output_t.options().pinned_memory_opt()); + + raw_miopen_convolution_relu_out( + output_t, + input, + weight, + _bias, + stride, + padding, + dilation, + groups, + benchmark, // benchmark + false // deterministic + ); + + return output_t; + } + else { + // fallback + + TensorArg input_arg { input, "input", 1 }, + weight_arg { weight, "weight", 2 }; + auto output = miopen_convolution_forward( + "miopen_convolution_relu", + input_arg, + weight_arg, + padding, + stride, + dilation, + groups, + benchmark, + false // deterministic + ); + + auto contig_output = self_or_new_memory_format(output, memory_format); + + if (!output.is_same(contig_output)) { + contig_output.copy_(output); + } + + auto _bias = bias.has_value() + ? 
bias.value() + : at::zeros( + {contig_output.size(1)}, + optTypeMetaToScalarType(contig_output.options().dtype_opt()), + contig_output.options().layout_opt(), + contig_output.options().device_opt(), + contig_output.options().pinned_memory_opt()); + + at::Tensor reshaped_bias = at::native::reshape_bias(input.dim(), _bias); + contig_output.add_(reshaped_bias); + contig_output.relu_(); + + return contig_output; + } +} + REGISTER_CUDA_DISPATCH(miopen_convolution_backward_stub, &miopen_convolution_backward); REGISTER_CUDA_DISPATCH(miopen_convolution_transpose_backward_stub, &miopen_convolution_transpose_backward); REGISTER_CUDA_DISPATCH(miopen_depthwise_convolution_backward_stub, &miopen_depthwise_convolution_backward); diff --git a/aten/src/ATen/native/miopen/RNN_miopen.cpp b/aten/src/ATen/native/miopen/RNN_miopen.cpp index b5a63dd803d1..b61794487a41 100644 --- a/aten/src/ATen/native/miopen/RNN_miopen.cpp +++ b/aten/src/ATen/native/miopen/RNN_miopen.cpp @@ -1,15 +1,28 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include #include #include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #if !AT_ROCM_ENABLED() namespace at { namespace native { diff --git a/aten/src/ATen/native/mkl/LinearAlgebra.cpp b/aten/src/ATen/native/mkl/LinearAlgebra.cpp index 2790f1e8b3f2..a47afe97648f 100644 --- a/aten/src/ATen/native/mkl/LinearAlgebra.cpp +++ b/aten/src/ATen/native/mkl/LinearAlgebra.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_NO_OPERATORS #include #include diff --git a/aten/src/ATen/native/mkl/LinearAlgebra.h b/aten/src/ATen/native/mkl/LinearAlgebra.h index a536c193524e..a3bbd8285320 100644 --- a/aten/src/ATen/native/mkl/LinearAlgebra.h +++ b/aten/src/ATen/native/mkl/LinearAlgebra.h @@ -1,5 +1,6 @@ -#include +#pragma once #include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/mkl/SparseBlasImpl.cpp b/aten/src/ATen/native/mkl/SparseBlasImpl.cpp index 3e1f7a5771a1..a2ed1af23795 100644 --- a/aten/src/ATen/native/mkl/SparseBlasImpl.cpp +++ b/aten/src/ATen/native/mkl/SparseBlasImpl.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include @@ -351,30 +352,132 @@ void addmm_out_sparse_csr( const Scalar& beta, const Scalar& alpha, const Tensor& result) { - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(mat1.dim() == 2 && mat2.dim() == 2 && result.dim() == 2); - if ((mat1.layout() == kSparseCsr || mat1.layout() == kSparseBsr) && - mat2.layout() == kStrided && result.layout() == kStrided) { - return addmm_dense_result(mat1, mat2, beta, alpha, result); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY( + mat1.dim() == 2 && mat2.dim() == 2 && result.dim() == 2); + TORCH_INTERNAL_ASSERT( + !((mat1.layout() == kStrided) && (mat2.layout() == kStrided) && + (result.layout() == kStrided)), + "Expected at least one sparse input"); + + // Layout checks are nested mat1, mat2, result + // Conditions are ordered strided, csr, csc, bsr, bsc. + // Valid combinations terminate in a return + // Invalid combinations are omitted and will fall though to the TORCH check + // generating an informative error message + if (mat1.layout() == kStrided) { + if (mat2.layout() == kSparseCsr) { + if (result.layout() == kStrided) { + // TODO: Add native CSC support via cuSPARSE if supported. 
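        // Editor's note (explanatory addition, not in the original change):
        // this branch computes result^T = beta * result^T + alpha * (mat2^T @ mat1^T),
        // relying on (A @ B)^T == B^T @ A^T so that the sparse operand lands in
        // the first argument slot addmm_dense_result expects. Transposing and
        // re-converting the CSR operand with to_sparse_csr() is the expensive
        // part, while result.transpose(0, 1) is only a view, so the values are
        // written straight into result.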
+ return addmm_dense_result( + mat2.transpose(0, 1).to_sparse_csr(), + mat1.transpose(0, 1), + beta, + alpha, + result.transpose(0, 1)); + } + } + if (mat2.layout() == kSparseCsc) { + if (result.layout() == kStrided) { + return addmm_dense_result( + mat2.transpose(-2, -1), + mat1.transpose(-2, -1), + beta, + alpha, + result.transpose(-2, -1)); + } + } + if (mat2.layout() == kSparseBsc) { + if (result.layout() == kStrided) { + return addmm_dense_result( + mat2.transpose(-2, -1), + mat1.transpose(-2, -1), + beta, + alpha, + result.transpose(-2, -1)); + } + } } - if (mat1.layout() == kStrided && mat2.is_sparse_csr() && - result.layout() == kStrided) { - // TODO: Use MKL's transposition flags instead of this costly conversion to - // CSR - return addmm_dense_result( - mat2.transpose(0, 1).to_sparse_csr(), - mat1.transpose(0, 1), - beta, - alpha, - result.transpose(0, 1)); + if (mat1.layout() == kSparseCsr) { + if (mat2.layout() == kStrided) { + if (result.layout() == kStrided) { + return addmm_dense_result(mat1, mat2, beta, alpha, result); + } + } + if (mat2.layout() == kSparseCsr) { + if (result.layout() == kStrided) { + return addmm_sparse_input_dense_result(mat1, mat2, beta, alpha, result); + } + if (result.layout() == kSparseCsr) { + return addmm_sparse_result(mat1, mat2, beta, alpha, result); + } + } + if (mat2.layout() == kSparseCsc) { + if (result.layout() == kStrided) { + // TODO: CSR @ CSC kernel would be very fast due to format alignment + return addmm_sparse_input_dense_result( + mat1, mat2.to_sparse_csr(), beta, alpha, result); + } + if (result.layout() == kSparseCsr) { + // TODO: CSR @ CSC kernel would be very fast due to format alignment + return addmm_sparse_result( + mat1, mat2.to_sparse_csr(), beta, alpha, result); + } + } } - if (mat1.is_sparse_csr() && mat2.is_sparse_csr() && result.layout() == kStrided) { - return addmm_sparse_input_dense_result(mat1, mat2, beta, alpha, result); + if (mat1.layout() == kSparseCsc) { + if (mat2.layout() == kStrided) { + if (result.layout() == kStrided) { + // TODO: avoid csc->csr conversion with native csc support + return addmm_dense_result( + mat1.to_sparse_csr(), mat2, beta, alpha, result); + } + } + if (mat2.layout() == kSparseCsr) { + if (result.layout() == kSparseCsr) { + // TODO: avoid csc->csr conversion with native csc support + return addmm_sparse_result( + mat1.to_sparse_csr(), mat2, beta, alpha, result); + } + } + if (mat2.layout() == kSparseCsc) { + if (result.layout() == kStrided) { + return addmm_sparse_input_dense_result( + mat2.transpose(-2, -1), + mat1.transpose(-2, -1), + beta, + alpha, + result.transpose(-2, -1)); + } + if (result.layout() == kSparseCsr) { + // TODO avoid csc->csr + return addmm_sparse_result( + mat1.to_sparse_csr(), mat2.to_sparse_csr(), beta, alpha, result); + } + if (result.layout() == kSparseCsc) { + return addmm_sparse_result( + mat2.transpose(-2, -1), + mat1.transpose(-2, -1), + beta, + alpha, + result.transpose(-2, -1)); + } + } } - if (mat1.is_sparse_csr() && mat2.is_sparse_csr() && result.is_sparse_csr()) { - return addmm_sparse_result(mat1, mat2, beta, alpha, result); + if (mat1.layout() == kSparseBsr) { + if (mat2.layout() == kStrided) { + if (result.layout() == kStrided) { + return addmm_dense_result(mat1, mat2, beta, alpha, result); + } + } } - TORCH_CHECK(false, "addmm: computation on CPU is not implemented for ", - result.layout(), " + ", mat1.layout(), " @ ", mat2.layout()); + TORCH_CHECK( + false, + "addmm: computation on CPU is not implemented for ", + result.layout(), + " + ", + 
mat1.layout(), + " @ ", + mat2.layout()); } /* diff --git a/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp b/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp index bf84d583dbde..8081de65facf 100644 --- a/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp +++ b/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include // Don't compile with MKL for MSVC/macos since linking the sparse MKL routines diff --git a/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.h b/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.h index 74f3c62215fd..480282e3b3ed 100644 --- a/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.h +++ b/aten/src/ATen/native/mkl/SparseCsrLinearAlgebra.h @@ -1,4 +1,5 @@ -#include +#pragma once +#include #include namespace at { diff --git a/aten/src/ATen/native/mkl/SpectralOps.cpp b/aten/src/ATen/native/mkl/SpectralOps.cpp index 470c3a48e5e0..cb00ce99d82e 100644 --- a/aten/src/ATen/native/mkl/SpectralOps.cpp +++ b/aten/src/ATen/native/mkl/SpectralOps.cpp @@ -1,13 +1,25 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + #if AT_MKL_ENABLED() || AT_POCKETFFT_ENABLED() #include +#include namespace at { namespace native { // In real-to-complex transform, MKL FFT only fills half of the values due to @@ -313,16 +325,9 @@ Tensor _fft_c2c_mkl(const Tensor& self, IntArrayRef dim, int64_t normalization, }} #elif AT_MKL_ENABLED() -#include -#include #include -#include -#include - -#include #include -#include #include #include diff --git a/aten/src/ATen/native/mkldnn/BinaryOps.cpp b/aten/src/ATen/native/mkldnn/BinaryOps.cpp index b842c425a919..3b68c60a9d68 100644 --- a/aten/src/ATen/native/mkldnn/BinaryOps.cpp +++ b/aten/src/ATen/native/mkldnn/BinaryOps.cpp @@ -1,7 +1,15 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include +#include +#include +#endif #if !AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/Conv.cpp b/aten/src/ATen/native/mkldnn/Conv.cpp index 0096a1cda674..3d8188c003e1 100644 --- a/aten/src/ATen/native/mkldnn/Conv.cpp +++ b/aten/src/ATen/native/mkldnn/Conv.cpp @@ -1,7 +1,20 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include -#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif #if !AT_MKLDNN_ENABLED() @@ -39,7 +52,6 @@ REGISTER_NO_CPU_DISPATCH(mkldnn_convolution_backward_stub); #include #include -#include namespace at { namespace native { @@ -155,41 +167,34 @@ static void check_shape_forward(const Tensor& input, // but weight/bias and grad_weight/grad_bias are always CPU tensor. // -Tensor mkldnn_convolution( - const Tensor& input, - const Tensor& weight, - const c10::optional& bias_opt, - IntArrayRef padding, +static inline at::MemoryFormat mkldnn_convolution_memory_format(int64_t dims, bool is_channels_last) { + auto memory_format = at::MemoryFormat::Contiguous; + if (is_channels_last) { + memory_format = dims == 4 ? 
at::MemoryFormat::ChannelsLast : at::MemoryFormat::ChannelsLast3d; + } + return memory_format; +} + +void _mkldnn_convolution_out ( + const Tensor& input_t, + const Tensor& weight_t, + const Tensor& bias, + std::vector& output_sizes, + ideep::tensor& y, IntArrayRef stride, IntArrayRef dilation, - int64_t groups) { - // See [Note: hacky wrapper removal for optional tensor] - c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); - const Tensor& bias = *bias_maybe_owned; - - if (input.scalar_type() == ScalarType::BFloat16) { - TORCH_CHECK(mkldnn_bf16_device_check(), - "mkldnn_convolution: bf16 path needs the cpu support avx512bw, avx512vl and avx512dq"); - } - - check_shape_forward(input, weight, bias, padding, stride, dilation, groups); - - bool is_channels_last = input.suggest_memory_format() == at::MemoryFormat::ChannelsLast; - - auto output_sizes = conv_output_size(input.sizes(), weight.sizes(), padding, stride, dilation); - auto output = at::empty({0}, input.options()); - + IntArrayRef padding, + int64_t groups, + bool is_channels_last, + const ideep::attr_t& op_attr) { + auto memory_format = mkldnn_convolution_memory_format(input_t.ndimension(), is_channels_last); + auto input = input_t.is_mkldnn() ? input_t : input_t.contiguous(memory_format); + auto weight = weight_t.is_mkldnn() ? weight_t : weight_t.contiguous(memory_format); const ideep::tensor x = itensor_from_tensor(input); const ideep::tensor w = itensor_from_tensor(weight); - - ideep::tensor y; - if (is_channels_last) { - output.resize_(output_sizes, input.suggest_memory_format()); - y = itensor_from_tensor(output); - } if (bias.defined()) { const ideep::tensor b = itensor_from_tensor(bias); - ideep::convolution_forward::compute( + ideep::convolution_forward::compute_v3( x, w, b, @@ -199,9 +204,11 @@ Tensor mkldnn_convolution( {dilation.begin(), dilation.end()}, {padding.begin(), padding.end()}, {padding.begin(), padding.end()}, - groups); + groups, + is_channels_last, + op_attr); } else { - ideep::convolution_forward::compute( + ideep::convolution_forward::compute_v3( x, w, {output_sizes.cbegin(), output_sizes.cend()}, @@ -210,24 +217,392 @@ Tensor mkldnn_convolution( {dilation.begin(), dilation.end()}, {padding.begin(), padding.end()}, {padding.begin(), padding.end()}, - groups); + groups, + is_channels_last, + op_attr); + } +} + +Tensor _mkldnn_convolution( + const Tensor& input_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + bool use_channels_last, + c10::string_view attr = "none", + torch::List> scalars = + torch::List>(), + c10::optional algorithm = c10::nullopt) { + ideep::attr_t op_attr = ideep::attr_t(); + if (attr != "none") { + auto it = fusion_unary_attr_map().find(attr); + TORCH_CHECK( + it != fusion_unary_attr_map().end(), "Fusion behavior undefined."); + op_attr = it->second(scalars, algorithm); + } + // See [Note: hacky wrapper removal for optional tensor] + c10::MaybeOwned bias_maybe_owned = at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + + if (input_t.scalar_type() == ScalarType::BFloat16) { + TORCH_CHECK(mkldnn_bf16_device_check(), + "mkldnn_convolution: bf16 path needs the cpu support avx512bw, avx512vl and avx512dq"); } - if (input.is_mkldnn()) { - return MKLDNNTensor(y, input.options()); - } else if (!is_channels_last) { - return mkldnn_to_dense(MKLDNNTensor(y, input.options())); + check_shape_forward(input_t, weight_t, bias, padding, stride, 
dilation, groups); + + auto memory_format = + mkldnn_convolution_memory_format(input_t.ndimension(), use_channels_last); + + auto output_sizes = conv_output_size(input_t.sizes(), weight_t.sizes(), padding, stride, dilation); + auto output = at::empty({0}, input_t.options()); + ideep::tensor y; + if (use_channels_last) { + output.resize_(output_sizes, memory_format); + y = itensor_from_tensor(output); + } + _mkldnn_convolution_out( + input_t, + weight_t, + bias, + output_sizes, + y, + stride, + dilation, + padding, + groups, + use_channels_last, + op_attr); + + if (input_t.is_mkldnn()) { + return MKLDNNTensor(y, input_t.options()); + } else if (!use_channels_last) { + return mkldnn_to_dense(MKLDNNTensor(y, input_t.options())); } else { TORCH_INTERNAL_ASSERT(y.get_desc().is_nhwc()); return output; } } +Tensor mkldnn_convolution( + const Tensor& input_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups) { + bool use_channels_last = mkldnn_conv_use_channels_last(input_t, weight_t); + return _mkldnn_convolution( + input_t, + weight_t, + bias_opt, + padding, + stride, + dilation, + groups, + use_channels_last); +} + +Tensor mkldnn_convolution_pointwise( + const Tensor& input_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + c10::string_view attr, + torch::List> scalars, + c10::optional algorithm) { + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + bool use_channels_last = + weight_t.is_mkldnn() || mkldnn_conv_use_channels_last(input_t, weight_t); + return _mkldnn_convolution( + input_t, + weight_t, + bias_opt, + padding, + stride, + dilation, + groups, + use_channels_last, + attr, + scalars, + algorithm); +} + +// Fuse convolution+binary_op+unary_op for good performance, which doing such +// operation: output=unary_op(binary_op(conv(input_t, ...), other_t, alpha)). +// The binary_attr means which binary_op is, it can be "add", or +// other binary operation. the unary_attr means which unary_op is, +// it can be "relu" or other unary operation, if it is none, meaning that +// there doesn't have a unary post op. unary_scalars and unary_algorithm +// are the parameters of the unary op, such as "hardtanh" has scalar parameters, +// "gelu" has algorithm parameters. +Tensor mkldnn_convolution_pointwise_binary( + const Tensor& input_t, + const Tensor& other_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + c10::string_view binary_attr, + c10::optional alpha, + c10::optional unary_attr, + torch::List> unary_scalars, + c10::optional unary_algorithm) { + TORCH_CHECK( + input_t.ndimension() == 4 || input_t.ndimension() == 5, + "mkldnn_convolution_pointwise_binary: currently only support 2d and 3d") + TORCH_CHECK( + !alpha.has_value() || alpha.value().to() == 1.0, + "mkldnn_convolution_pointwise_binary: the alpha value should be none or 1.0"); + + c10::MaybeOwned bias_maybe_owned = + at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + + // Make sure inputs have same type(device, layout, dtype), device is cpu and + // dtype is float or bfloat16. 
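  // Editor's aside (equivalence note, not part of the original patch): in eager
  // terms this operator computes
  //   output = unary_op(binary_op(at::convolution(input_t, weight_t, bias, ...), other_t)),
  // so binary_attr == "add" with unary_attr == "relu" matches
  //   at::relu(at::convolution(input_t, weight_t, bias, stride, padding, dilation,
  //                            /*transposed=*/false, /*output_padding=*/0, groups) + other_t),
  // which is exactly what the non-fused fallback further down produces via
  // at::native::add_relu_.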
+ check_mkldnn_binary_fusion_inputs(input_t, other_t, weight_t, bias); + + check_shape_forward( + input_t, weight_t, bias, padding, stride, dilation, groups); + + auto output_sizes = conv_output_size( + input_t.sizes(), weight_t.sizes(), padding, stride, dilation); + // TODO: support broadcast binary fusion. + TORCH_CHECK( + output_sizes == other_t.sizes(), + "Binary Fusion's inputs should have same shape"); + // Only calling fusion path for channels_last path. + // TODO: OneDNN doesn't optimize well for groups > 1 case, it will be enabled + // at next OneDNN release. + bool use_channels_last = + weight_t.is_mkldnn() || mkldnn_conv_use_channels_last(input_t, weight_t); + bool can_be_fused = groups == 1 && use_channels_last; + + c10::string_view unary_attr_value = "none"; + ideep::algorithm unary_alg; + if (unary_attr.has_value()) { + auto it_unary = fusion_unary_alg_map().find(unary_attr.value()); + // Now, we only support conv+binary+relu. + TORCH_CHECK( + it_unary != fusion_unary_alg_map().end(), + "Unary Fusion behavior undefined."); + unary_attr_value = unary_attr.value(); + unary_alg = it_unary->second; + } + auto it_binary = fusion_binary_alg_map().find(binary_attr); + TORCH_CHECK( + it_binary != fusion_binary_alg_map().end(), + "Binary Fusion behavior undefined."); + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + if (can_be_fused) { + auto memory_format = + mkldnn_convolution_memory_format(input_t.ndimension(), true); + auto input = input_t.contiguous(memory_format); + auto weight = + weight_t.is_mkldnn() ? weight_t : weight_t.contiguous(memory_format); + auto other = other_t.contiguous(memory_format); + auto output = at::empty_like(other); + const ideep::tensor x = itensor_from_tensor(input); + const ideep::tensor w = itensor_from_tensor(weight); + const ideep::tensor z = itensor_from_tensor(other); + ideep::tensor y = itensor_from_tensor(output); + auto output_size = other.sizes().vec(); + ideep::tag format_tag = ideep::tag::nhwc; + if (input_t.ndimension() == 5) { + format_tag = ideep::tag::ndhwc; + } + auto other_desc = ideep::tensor::desc( + output_size, get_mkldnn_dtype(weight.scalar_type()), format_tag); + + ideep::attr_t op_attr; + ideep::post_ops po; + po.append_binary(it_binary->second, other_desc); + if (unary_attr_value != "none") { + po.append_eltwise(1.0, unary_alg, 0.f, 0.f); + } + op_attr.set_post_ops(po); + + if (bias.defined()) { + const ideep::tensor b = itensor_from_tensor(bias); + ideep::convolution_forward::compute_binary( + x, + z, + w, + b, + output_size, + y, + {stride.begin(), stride.end()}, + {dilation.begin(), dilation.end()}, + {padding.begin(), padding.end()}, + {padding.begin(), padding.end()}, + groups, + /* is_channels_last */ true, + op_attr); + } else { + ideep::convolution_forward::compute_binary( + x, + z, + w, + output_size, + y, + {stride.begin(), stride.end()}, + {dilation.begin(), dilation.end()}, + {padding.begin(), padding.end()}, + {padding.begin(), padding.end()}, + groups, + /* is_channels_last */ true, + op_attr); + } + return output; + } else { + // Fallback case, if inputs are not channels last or have different dtype, + // OneDNN fusion may have performance regression. 
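  // Editor's aside (not part of the original patch): the fused branch above
  // expresses the extra work as oneDNN post-ops, a binary post-op on the conv
  // destination (add/sub/mul/div against other_t's descriptor), optionally
  // followed by an eltwise ReLU. The append_eltwise arguments appear to be
  // (scale, algorithm, alpha, beta), inferred from the call sites in this file
  // rather than from ideep documentation, so (1.0, relu, 0.f, 0.f) is a plain
  // ReLU with no clamping.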
+ Tensor output; + if (weight_t.is_mkldnn()) { + output = _mkldnn_convolution( + input_t, weight_t, bias, padding, stride, dilation, groups, true); + } else { + output = at::convolution( + input_t, weight_t, bias, stride, padding, dilation, false, 0, groups); + } + if (binary_attr == "add" && unary_attr_value != "none") { + output = at::native::add_relu_(output, other_t); + return output; + } + if (binary_attr == "add") { + output.add_(other_t); + } else if (binary_attr == "sub") { + output.sub_(other_t); + } else if (binary_attr == "mul") { + output.mul_(other_t); + } else { + output.div_(other_t); + } + if (unary_attr_value != "none") { + output.relu_(); + } + return output; + } +} + +// Fuse convolution+binary_op+unary_op for good performance, which doing +// such operation: other_t=unary_op(binary_op(conv(input_t, ...), other_t, +// alpha)). The binary_attr means which binary_op is, it can be "add", or other +// binary operation. the unary_attr means which unary_op is, it can be "relu" or +// other unary operation, if it is none, meaning that there doesn't have a unary +// post op. unary_scalars and unary_algorithm are the parameters of the unary +// op, such as "hardtanh" has scalar parameters "gelu" has algorithm parameters. + +Tensor& mkldnn_convolution_pointwise_binary_( + const Tensor& input_t, + Tensor& other_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + c10::string_view binary_attr, + c10::optional alpha, + c10::optional unary_attr, + torch::List> unary_scalars, + c10::optional unary_algorithm) { + // other_t += convolution(...), other_t = unary(other_t) + TORCH_CHECK( + input_t.ndimension() == 4 || input_t.ndimension() == 5, + "mkldnn_convolution_add_: currently only support 2d and 3d") + TORCH_CHECK( + binary_attr == "add", + "mkldnn_convolution_pointwise_binary_: only support binary op fusion") + TORCH_CHECK( + !alpha.has_value() || alpha.value().to() == 1.0, + "mkldnn_convolution_pointwise_binary: the alpha value for the binary op should be none(meaning 1.0) or 1.0"); + TORCH_CHECK( + !unary_attr.has_value() || unary_attr.value() == "relu", + "mkldnn_convolution_pointwise_binary: only support none or relu unary op fusion after binary op"); + + c10::MaybeOwned bias_maybe_owned = + at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + + // Make sure inputs have same type(device, layout, dtype), device is cpu and + // dtype is float or bfloat16. + check_mkldnn_binary_fusion_inputs(input_t, other_t, weight_t, bias); + + check_shape_forward( + input_t, weight_t, bias, padding, stride, dilation, groups); + + auto output_sizes = conv_output_size( + input_t.sizes(), weight_t.sizes(), padding, stride, dilation); + TORCH_CHECK( + output_sizes == other_t.sizes(), + "Add Fusion's inputs should have same shape"); + // Only calling fusion path for channels_last path and the output is contiguous tensor(channels_last). 
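  // Editor's aside (not part of the original patch): this in-place variant
  // accumulates into other_t, i.e. other_t = relu(other_t + conv(...)) when the
  // "relu" unary post-op is requested and other_t += conv(...) otherwise. On
  // the fused path below that choice maps to ideep::attr_t::residual(), which
  // as used here bundles sum and ReLU post-ops, versus ideep::attr_t::fuse_sum()
  // for the sum alone; the non-fused path reaches the same result with
  // at::native::add_relu_ and Tensor::add_.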
+ bool can_be_fused = (weight_t.is_mkldnn() || + mkldnn_conv_use_channels_last(input_t, weight_t)) && + (other_t.is_contiguous(at::MemoryFormat::ChannelsLast) || + other_t.is_contiguous(at::MemoryFormat::ChannelsLast3d)); + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + if (can_be_fused) { + ideep::tensor y = itensor_from_tensor(other_t); + ideep::attr_t op_attr; + if (unary_attr.has_value()) { + op_attr = ideep::attr_t::residual(); + } else { + op_attr = ideep::attr_t::fuse_sum(); + } + _mkldnn_convolution_out( + input_t, + weight_t, + bias, + output_sizes, + y, + stride, + dilation, + padding, + groups, + true, + op_attr); + } else { + // Fallback case, if inputs are not channels last or have different dtype, + // OneDNN fusion may have performance regression. + Tensor output; + if (weight_t.is_mkldnn()) { + output = _mkldnn_convolution( + input_t, weight_t, bias, padding, stride, dilation, groups, true); + } else { + output = at::convolution( + input_t, weight_t, bias, stride, padding, dilation, false, 0, groups); + } + if (unary_attr.has_value()) { + other_t = at::native::add_relu_(other_t, output); + } else { + other_t.add_(output); + } + } + return other_t; +} + Tensor mkldnn_convolution_backward_input( - IntArrayRef input_size, const Tensor& grad_output, const Tensor& weight, - IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups, bool bias_defined) -{ - bool is_channels_last = grad_output.suggest_memory_format() == at::MemoryFormat::ChannelsLast; + IntArrayRef input_size, + const Tensor& grad_output, + const Tensor& weight, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + bool bias_defined, + bool is_channels_last) { auto grad_input = at::empty({0}, grad_output.options()); auto grad_y = itensor_from_tensor(grad_output); @@ -235,10 +610,11 @@ Tensor mkldnn_convolution_backward_input( ideep::tensor grad_x; if (is_channels_last) { - grad_input.resize_(input_size, grad_output.suggest_memory_format()); + auto memory_format = mkldnn_convolution_memory_format(grad_output.ndimension(), is_channels_last); + grad_input.resize_(input_size, memory_format); grad_x = itensor_from_tensor(grad_input); } - ideep::convolution_backward_data::compute( + ideep::convolution_backward_data::compute_v2( grad_y, w, input_size.vec(), @@ -247,7 +623,8 @@ Tensor mkldnn_convolution_backward_input( dilation.vec(), padding.vec(), padding.vec(), - groups); + groups, + is_channels_last); if (grad_output.is_mkldnn()) { return MKLDNNTensor(grad_x, grad_output.options()); @@ -260,17 +637,21 @@ Tensor mkldnn_convolution_backward_input( } std::tuple mkldnn_convolution_backward_weights( - IntArrayRef weight_size, const Tensor& grad_output, const Tensor& input, - IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups, bool bias_defined) -{ - bool is_channels_last = grad_output.suggest_memory_format() == at::MemoryFormat::ChannelsLast; - + IntArrayRef weight_size, + const Tensor& grad_output, + const Tensor& input, + IntArrayRef padding, + IntArrayRef stride, + IntArrayRef dilation, + int64_t groups, + bool bias_defined, + bool is_channels_last) { const ideep::tensor grad_y = itensor_from_tensor(grad_output); const ideep::tensor x = itensor_from_tensor(input); ideep::tensor grad_w, grad_b; if (bias_defined) { - ideep::convolution_backward_weights::compute( + ideep::convolution_backward_weights::compute_v2( x, grad_y, weight_size.vec(), @@ -280,9 +661,10 @@ std::tuple mkldnn_convolution_backward_weights( 
dilation.vec(), padding.vec(), padding.vec(), - groups); + groups, + is_channels_last); } else { - ideep::convolution_backward_weights::compute( + ideep::convolution_backward_weights::compute_v2( x, grad_y, weight_size.vec(), @@ -291,7 +673,8 @@ std::tuple mkldnn_convolution_backward_weights( dilation.vec(), padding.vec(), padding.vec(), - groups); + groups, + is_channels_last); } if (!is_channels_last) { @@ -306,20 +689,23 @@ std::tuple mkldnn_convolution_backward_weights( } std::tuple mkldnn_convolution_backward( - const Tensor& input, const Tensor& grad_output_t, const Tensor& weight, + const Tensor& input_t, const Tensor& grad_output_t, const Tensor& weight_t, IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups, std::array output_mask) { - auto memory_format = input.suggest_memory_format(); + bool is_channels_last = mkldnn_conv_use_channels_last(input_t, weight_t); + auto memory_format = mkldnn_convolution_memory_format(input_t.ndimension(), is_channels_last); Tensor grad_output = grad_output_t.is_mkldnn() ? grad_output_t : grad_output_t.contiguous(memory_format); + Tensor input = input_t.is_mkldnn() ? input_t : input_t.contiguous(memory_format); + Tensor weight = weight_t.is_mkldnn() ? weight_t : weight_t.contiguous(memory_format); Tensor grad_input, grad_weight, grad_bias; if (output_mask[0]) { grad_input = mkldnn_convolution_backward_input( - input.sizes(), grad_output, weight, padding, stride, dilation, groups, output_mask[2]); + input.sizes(), grad_output, weight, padding, stride, dilation, groups, output_mask[2], is_channels_last); } if (output_mask[1] || output_mask[2]) { std::tie(grad_weight, grad_bias) = mkldnn_convolution_backward_weights( - weight.sizes(), grad_output, input, padding, stride, dilation, groups, output_mask[2]); + weight.sizes(), grad_output, input, padding, stride, dilation, groups, output_mask[2], is_channels_last); } return std::make_tuple(grad_input, grad_weight, grad_bias); @@ -327,6 +713,29 @@ std::tuple mkldnn_convolution_backward( REGISTER_ALL_CPU_DISPATCH(mkldnn_convolution_backward_stub, &mkldnn_convolution_backward); +TORCH_LIBRARY_IMPL(mkldnn, CPU, m) { + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise"), + TORCH_FN(mkldnn_convolution_pointwise)); + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise.binary"), + TORCH_FN(mkldnn_convolution_pointwise_binary)); + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise_.binary"), + TORCH_FN(mkldnn_convolution_pointwise_binary_)); +} + +TORCH_LIBRARY_IMPL(mkldnn, MkldnnCPU, m) { + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise"), + TORCH_FN(mkldnn_convolution_pointwise)); + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise.binary"), + TORCH_FN(mkldnn_convolution_pointwise_binary)); + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_convolution_pointwise_.binary"), + TORCH_FN(mkldnn_convolution_pointwise_binary_)); +} }} // namespace at::native #endif diff --git a/aten/src/ATen/native/mkldnn/Copy.cpp b/aten/src/ATen/native/mkldnn/Copy.cpp index eb45bad99264..088353f06b45 100644 --- a/aten/src/ATen/native/mkldnn/Copy.cpp +++ b/aten/src/ATen/native/mkldnn/Copy.cpp @@ -1,6 +1,12 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include +#endif #if !AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/Gelu.cpp b/aten/src/ATen/native/mkldnn/Gelu.cpp index 1d2a67251513..71ab0b92f545 100644 --- a/aten/src/ATen/native/mkldnn/Gelu.cpp +++ 
b/aten/src/ATen/native/mkldnn/Gelu.cpp @@ -1,8 +1,15 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + #if !AT_MKLDNN_ENABLED() namespace at { namespace native { diff --git a/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp b/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp index b99527480bf2..97f8f8951959 100644 --- a/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp +++ b/aten/src/ATen/native/mkldnn/IDeepRegistration.cpp @@ -1,5 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/Linear.cpp b/aten/src/ATen/native/mkldnn/Linear.cpp index 0138190de78a..894e54eefb1c 100644 --- a/aten/src/ATen/native/mkldnn/Linear.cpp +++ b/aten/src/ATen/native/mkldnn/Linear.cpp @@ -1,6 +1,23 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include #include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif #if !AT_MKLDNN_ENABLED() @@ -160,7 +177,260 @@ std::tuple mkldnn_linear_backward( return std::tuple{grad_input, grad_weight, grad_bias}; } +Tensor mkldnn_linear_pointwise( + const Tensor& input_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + c10::string_view attr, + torch::List> scalars, + c10::optional algorithm) { + auto input = input_t.contiguous(); + auto input_size = input.sizes(); + + const int64_t dim = input.dim(); + auto input_reshaped = + dim == 2 ? input : input.reshape({-1, input.size(input.dim() - 1)}); + + std::vector output_size(input_size.begin(), input_size.end() - 1); + output_size.push_back(weight_t.size(0)); + auto output = at::empty(output_size, input.options()); + + if (dim != 2) { + std::vector output_size_reshaped = {input_reshaped.size(0), + weight_t.size(0)}; + output = output.reshape(output_size_reshaped); + } + + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + ideep::tensor mkldnn_output = itensor_from_tensor(output); + + c10::MaybeOwned bias_maybe_owned = + at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + + const ideep::tensor mkldnn_input = itensor_view_from_dense(input_reshaped); + + c10::optional mkldnn_bias{c10::nullopt}; + if (bias.defined()) { + mkldnn_bias = itensor_from_tensor(bias); + } + const ideep::tensor w = itensor_from_tensor(weight_t); + + auto it = fusion_unary_attr_map().find(attr); + TORCH_CHECK( + it != fusion_unary_attr_map().end(), "Fusion behavior undefined."); + ideep::attr_t op_attr = it->second(scalars, algorithm); + + if (mkldnn_bias.has_value()) { + ideep::inner_product_forward::compute( + mkldnn_input, + w, + mkldnn_bias.value(), + mkldnn_output, + ideep::scale_t(), + ideep::scale_t(), + ideep::scale_t(), + op_attr); + } else { + ideep::inner_product_forward::compute( + mkldnn_input, + w, + mkldnn_output, + ideep::scale_t(), + ideep::scale_t(), + ideep::scale_t(), + op_attr); + } + + if (dim != 2) { + output = output.reshape(output_size); + } + + return output; +} + +Tensor mkldnn_linear_pointwise_binary( + const Tensor& input_t, + const Tensor& other_t, + const Tensor& weight_t, + const c10::optional& bias_opt, + c10::string_view attr) { + c10::MaybeOwned bias_maybe_owned = + at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + // Make sure inputs have same type(device, layout, 
dtype), device is cpu and + // dtype is float or bfloat16. + check_mkldnn_binary_fusion_inputs(input_t, other_t, weight_t, bias); + + auto input = input_t.contiguous(); + + auto it_binary = fusion_binary_alg_map().find(attr); + TORCH_CHECK( + it_binary != fusion_binary_alg_map().end(), "Fusion behavior undefined."); + + auto input_size = input.sizes(); + + const int64_t dim = input.dim(); + auto input_reshaped = + dim == 2 ? input : input.reshape({-1, input.size(input.dim() - 1)}); + + std::vector output_size(input_size.begin(), input_size.end() - 1); + output_size.push_back(weight_t.size(0)); + auto output = at::empty(output_size, input.options()); + auto other_reshaped = other_t.contiguous(); + + if (dim != 2) { + std::vector output_size_reshaped = { + input_reshaped.size(0), weight_t.size(0)}; + output = output.reshape(output_size_reshaped); + other_reshaped = other_reshaped.reshape(output_size_reshaped); + } + + TORCH_CHECK( + output.sizes() == other_reshaped.sizes(), + "linear_binary_run expects the size of output and other tensor to be the same"); + + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + ideep::tensor mkldnn_output = itensor_from_tensor(output); + const ideep::tensor mkldnn_other = itensor_from_tensor(other_reshaped); + const ideep::tensor mkldnn_input = itensor_view_from_dense(input_reshaped); + + c10::optional mkldnn_bias{c10::nullopt}; + if (bias.defined()) { + mkldnn_bias = itensor_from_tensor(bias); + } + const ideep::tensor w = itensor_from_tensor(weight_t); + + auto other_desc = mkldnn_other.get_desc(); + auto op_attr = ideep::attr_t::fuse_binary(it_binary->second, other_desc); + + if (mkldnn_bias.has_value()) { + ideep::inner_product_forward::compute_binary( + mkldnn_input, + mkldnn_other, + w, + mkldnn_bias.value(), + mkldnn_output, + op_attr); + } else { + ideep::inner_product_forward::compute_binary( + mkldnn_input, mkldnn_other, w, mkldnn_output, op_attr); + } + + if (dim != 2) { + output = output.reshape(output_size); + } + + return output; +} + +TORCH_LIBRARY_IMPL(mkldnn, CPU, m) { + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_linear_pointwise"), + TORCH_FN(mkldnn_linear_pointwise)); + m.impl( + TORCH_SELECTIVE_NAME("mkldnn::_linear_pointwise.binary"), + TORCH_FN(mkldnn_linear_pointwise_binary)); +} + } // namespace native } // namespace at #endif // AT_MKLDNN_ENABLED + +#if AT_MKL_ENABLED() && AT_MKLDNN_ENABLED() +#include + +namespace at { +namespace native { + +Tensor mkl_linear( + const Tensor& self, + const Tensor& mkl_weight_t, + const Tensor& origin_weight_t, + const c10::optional& bias_opt, + const int64_t prepack_batch_size) { + c10::MaybeOwned bias_maybe_owned = + at::borrow_from_optional_tensor(bias_opt); + const Tensor& bias = *bias_maybe_owned; + TORCH_CHECK( + self.options().type_equal(origin_weight_t.options()), + "Input type (", + self.toString(), + ") and weight type (", + origin_weight_t.toString(), + ") should be the same"); + TORCH_CHECK( + !bias.defined() || (self.options().type_equal(bias.options())), + "Input type (", + self.toString(), + ") and bias type (", + bias.toString(), + ") should be the same"); + TORCH_CHECK( + mkl_weight_t.scalar_type() == origin_weight_t.scalar_type() && + origin_weight_t.scalar_type() == kFloat, + "mkl_linear: weight dtype should be float"); + + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + auto input_size = self.sizes(); + std::vector output_size(input_size.begin(), input_size.end() - 1); + output_size.push_back(origin_weight_t.size(0)); + auto output = 
at::empty(output_size, self.options()); + int64_t M = self.numel() / self.size(self.dim() - 1); + if (M == prepack_batch_size && mkl_weight_t.is_mkldnn()) { + auto self_ = self.is_contiguous() ? self : self.contiguous(); + auto K = origin_weight_t.size(1); + auto N = origin_weight_t.size(0); + const ideep::tensor& w = itensor_from_mkldnn(mkl_weight_t); + auto in_ptr = self_.data_ptr(); + auto weight_ptr = (float*)(w.get_data_handle()); + auto out_ptr = output.data_ptr(); + if (bias.defined()) { + auto bias_ = bias.is_contiguous() ? bias : bias.contiguous(); + auto bias_ptr = bias_.data_ptr(); +#ifdef _OPENMP +#if (_OPENMP >= 201307) +#pragma omp parallel for simd schedule( \ + static) if (omp_get_max_threads() > 1 && !omp_in_parallel()) +#else +#pragma omp parallel for schedule( \ + static) if (omp_get_max_threads() > 1 && !omp_in_parallel()) +#endif +#endif + for (int64_t i = 0; i < M; ++i) { + memcpy(out_ptr + i * N, bias_ptr, sizeof(float) * N); + } + } + cblas_sgemm_compute( + CblasRowMajor, + CblasNoTrans, + CblasPacked, + M, + N, + K, + in_ptr, + K, + weight_ptr, + K, + bias.defined() ? 1.f : 0.f, + out_ptr, + N); + } else { + output = at::linear_out(output, self, origin_weight_t, bias_opt); + } + return output; +} + +TORCH_LIBRARY_IMPL(mkl, CPU, m) { + m.impl(TORCH_SELECTIVE_NAME("mkl::_mkl_linear"), TORCH_FN(mkl_linear)); +} + +TORCH_LIBRARY_IMPL(mkl, MkldnnCPU, m) { + m.impl(TORCH_SELECTIVE_NAME("mkl::_mkl_linear"), TORCH_FN(mkl_linear)); +} + +} // namespace native +} // namespace at + +#endif // AT_MKL_ENABLED && AT_MKLDNN_ENABLED diff --git a/aten/src/ATen/native/mkldnn/MKLDNNCommon.h b/aten/src/ATen/native/mkldnn/MKLDNNCommon.h index a86d1c4b722c..783195361570 100644 --- a/aten/src/ATen/native/mkldnn/MKLDNNCommon.h +++ b/aten/src/ATen/native/mkldnn/MKLDNNCommon.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp b/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp index fbfb329a5e93..d643fae22ca2 100644 --- a/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp +++ b/aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp @@ -1,9 +1,23 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include +#include #include #include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { @@ -75,7 +89,8 @@ Tensor mkldnn_reorder_conv2d_weight( IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, - int64_t groups) { + int64_t groups, + c10::OptionalArrayRef input_size) { if (self.scalar_type() == ScalarType::BFloat16) { TORCH_CHECK(mkldnn_bf16_device_check(), "mkldnn_reorder_conv2d_weight: bf16 path needs the cpu support avx512bw, avx512vl and avx512dq"); @@ -93,16 +108,28 @@ Tensor mkldnn_reorder_conv2d_weight( w.reshape({wdims[0] * wdims[1], wdims[2], wdims[3], wdims[4]}); } - auto desc = - ideep::convolution_forward::expected_weights_desc( - w.get_dims(), - w.get_data_type(), - {stride.begin(), stride.end()}, - {padding.begin(), padding.end()}, - {padding.begin(), padding.end()}, - {dilation.begin(), dilation.end()}, - groups, - ideep::algorithm::convolution_direct); + ideep::dims src_dims = ideep::dims(); + bool is_channels_last = false; + if (input_size.has_value()) { + src_dims = input_size.value().vec(); + // if has input size, we always use channels last. 
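  // Editor's aside (not part of the original patch): in mkl_linear above, the
  // prepacked fast path only triggers when the runtime row count M
  // (numel / last dimension) equals the batch size the weight was packed for.
  // The bias is first memcpy'd into every output row, and cblas_sgemm_compute
  // is then called with beta = 1.0f so the GEMM accumulates onto it,
  // out = input @ weight^T + bias; without a bias, beta = 0.0f simply
  // overwrites the buffer. CblasPacked marks the B operand as the buffer
  // produced by cblas_sgemm_pack (see mkl_reorder_linear_weight later in this
  // patch). Otherwise the code falls back to at::linear_out.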
+ is_channels_last = true; + } + + auto desc = ideep::convolution_forward::expected_weights_desc( + w.get_dims(), + w.get_data_type(), + {stride.begin(), stride.end()}, + {padding.begin(), padding.end()}, + {padding.begin(), padding.end()}, + {dilation.begin(), dilation.end()}, + groups, + ideep::algorithm::convolution_direct, + ideep::prop_kind::forward, + w.get_data_type(), + src_dims, + ideep::attr_t(), + is_channels_last); ideep::tensor result; result.init(desc); result.feed_from(w); @@ -156,7 +183,8 @@ Tensor mkldnn_reorder_conv2d_weight( IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, - int64_t groups) { + int64_t groups, + c10::OptionalArrayRef input_size) { TORCH_CHECK(false, "mkldnn_reorder_conv2d_weight: MKL-DNN build is disabled"); } @@ -171,4 +199,48 @@ Tensor mkldnn_reorder_conv3d_weight( #endif // AT_MKLDNN_ENABLED() +#if AT_MKL_ENABLED() && AT_MKLDNN_ENABLED() +#include + +Tensor mkl_reorder_linear_weight( + const Tensor& weight, + const int64_t batch_size) { + TORCH_CHECK( + weight.scalar_type() == ScalarType::Float, + "reorder_linear_weight: weight's dtype should be float"); + c10::impl::ExcludeDispatchKeyGuard edkg(c10::autograd_dispatch_keyset); + auto M = batch_size; + auto N = weight.size(0); + auto K = weight.size(1); + int64_t pack_size = + (int64_t)(cblas_sgemm_pack_get_size(CblasBMatrix, M, N, K) / sizeof(float) + 1); + auto packed_weight = empty_mkldnn( + {pack_size, 1}, + weight.scalar_type(), + weight.options().layout_opt(), + weight.options().device_opt(), + weight.options().pinned_memory_opt()); + ideep::tensor& mkl_weight = itensor_from_mkldnn(packed_weight); + ideep::tensor& orig_w = itensor_from_mkldnn(weight); + cblas_sgemm_pack( + CblasRowMajor, + CblasBMatrix, + CblasTrans, + M, + N, + K, + 1.0f, + (float*)(orig_w.get_data_handle()), + K, + (float*)(mkl_weight.get_data_handle())); + return packed_weight; +} + +TORCH_LIBRARY_IMPL(mkl, MkldnnCPU, m) { + m.impl( + TORCH_SELECTIVE_NAME("mkl::_mkl_reorder_linear_weight"), + TORCH_FN(mkl_reorder_linear_weight)); +} + +#endif // AT_MKL_ENABLED && AT_MKLDNN_ENABLED }} diff --git a/aten/src/ATen/native/mkldnn/Matmul.cpp b/aten/src/ATen/native/mkldnn/Matmul.cpp index 9b07dbfcee5f..383d29659230 100644 --- a/aten/src/ATen/native/mkldnn/Matmul.cpp +++ b/aten/src/ATen/native/mkldnn/Matmul.cpp @@ -1,7 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include #include + #if !AT_MKLDNN_ENABLED() namespace at { @@ -127,11 +129,24 @@ void mkldnn_matmul( (mat1.dim() == 2 && mat2.dim() == 1) || // aten::mv (mat1.dim() == 1 && mat2.dim() == 1), // aten::dot "mkldnn_matmul: unsupported dims for mat and mat2"); - TORCH_CHECK(mat1.scalar_type() == at::kBFloat16 && - mat2.scalar_type() == at::kBFloat16 && - result.scalar_type() == at::kBFloat16, "mkldnn_matmul: only enabled for bf16 path"); + TORCH_CHECK(mkldnn_bf16_device_check(), - "mkldnn_matmul: mkldnn_matmul bf16 path needs the cpu support avx512bw, avx512vl and avx512dq"); + "mkldnn_matmul: mkldnn_matmul bf16 path needs the cpu support avx512bw, avx512vl and avx512dq, or AWS Graviton3"); + +#if defined(__aarch64__) + if (mkldnn_bf16_device_check_arm()) { + //onednn fastmath mode can leverage bf16 HW even for the fp32 input, e.g. 
Arm Neoverse V1 + //so, don't restrict the mkldnn_matmul only for bf16 inputs, allow it for float as well + TORCH_CHECK((mat1.scalar_type() == mat2.scalar_type()) && (mat1.scalar_type() == result.scalar_type()) && + ((mat1.scalar_type() == at::kFloat) || (mat1.scalar_type() == at::kBFloat16)), + "mkldnn_matmul: only enabled for fp32 and bf16 path"); + } else +#endif + { + TORCH_CHECK(mat1.scalar_type() == at::kBFloat16 && + mat2.scalar_type() == at::kBFloat16 && + result.scalar_type() == at::kBFloat16, "mkldnn_matmul: only enabled for bf16 path"); + } auto mat1_unsqueezed = mat1.dim() == 1 ? mat1.unsqueeze(0) : mat1; auto mat2_unsqueezed = mat2.dim() == 1 ? mat2.unsqueeze(1) : mat2; @@ -209,14 +224,29 @@ bool use_mkldnn_bf16_matmul( const Tensor& mat1, const Tensor& mat2, const Tensor& result) { - return ( - use_mkldnn_bf16_matmul() && - mat1.scalar_type() == kBFloat16 && - mat2.scalar_type() == kBFloat16 && - (!result.defined() || result.scalar_type() == kBFloat16) && - mat1.numel() != 0 && - mat2.numel() != 0 && - checksize(mat1, mat2)); +#if defined(__aarch64__) + if (mkldnn_bf16_device_check_arm()) { + //onednn fastmath mode can leverage bf16 HW even for the fp32 input, e.g. Arm Neoverse V1 + //so, don't restrict the mkldnn_matmul only for bf16 inputs, allow it for float as well + return ( + use_mkldnn_bf16_matmul() && + (mat1.scalar_type() == mat2.scalar_type()) && (!result.defined() || (mat1.scalar_type() == result.scalar_type())) && + ((mat1.scalar_type() == kFloat) || (mat1.scalar_type() == kBFloat16)) && + mat1.numel() != 0 && + mat2.numel() != 0 && + checksize(mat1, mat2)); + } else +#endif + { + return ( + use_mkldnn_bf16_matmul() && + mat1.scalar_type() == kBFloat16 && + mat2.scalar_type() == kBFloat16 && + (!result.defined() || result.scalar_type() == kBFloat16) && + mat1.numel() != 0 && + mat2.numel() != 0 && + checksize(mat1, mat2)); + } } } // namespace native diff --git a/aten/src/ATen/native/mkldnn/Matmul.h b/aten/src/ATen/native/mkldnn/Matmul.h index 63426714933b..999cae99a7e0 100644 --- a/aten/src/ATen/native/mkldnn/Matmul.h +++ b/aten/src/ATen/native/mkldnn/Matmul.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include // For TransposeType diff --git a/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp b/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp index 71d23a34425e..c12db6d6b7e9 100644 --- a/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp +++ b/aten/src/ATen/native/mkldnn/MkldnnTensorMath.cpp @@ -1,10 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #if !AT_MKLDNN_ENABLED() namespace at { diff --git a/aten/src/ATen/native/mkldnn/Normalization.cpp b/aten/src/ATen/native/mkldnn/Normalization.cpp index 83750ae51263..d0171865fac6 100644 --- a/aten/src/ATen/native/mkldnn/Normalization.cpp +++ b/aten/src/ATen/native/mkldnn/Normalization.cpp @@ -1,8 +1,17 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif + #if !AT_MKLDNN_ENABLED() namespace at { @@ -32,6 +41,23 @@ std::tuple mkldnn_layer_norm_last_index_weight_bias_f32( TORCH_CHECK(false, "mkldnn_layer_norm_last_index_weight_bias_f32: ATen not compiled with MKLDNN support"); } +std::tuple _mkldnn_batch_norm_legit( + const Tensor& input, const c10::optional& weight_opt, const c10::optional& bias_opt, Tensor& running_mean, Tensor& 
running_var, + bool train, + double momentum, + double eps) { + TORCH_CHECK(false, "_mkldnn_batch_norm_legit: ATen not compiled with MKLDNN support"); +} + + +std::tuple _mkldnn_batch_norm_legit_no_stats( + const Tensor& input, const c10::optional& weight_opt, const c10::optional& bias_opt, + bool train, + double momentum, + double eps) { + TORCH_CHECK(false, "_mkldnn_batch_norm_legit_no_stats: ATen not compiled with MKLDNN support"); +} + } // namespace native } // namespace at @@ -164,6 +190,25 @@ std::tuple mkldnn_batch_norm( } } + +std::tuple _mkldnn_batch_norm_legit( + const Tensor& input, const c10::optional& weight_opt, const c10::optional& bias_opt, Tensor& running_mean, Tensor& running_var, + bool train, + double momentum, + double eps) { + return mkldnn_batch_norm(input, weight_opt, bias_opt, running_mean, running_var, train, momentum, eps); +} + + +std::tuple _mkldnn_batch_norm_legit_no_stats( + const Tensor& input, const c10::optional& weight_opt, const c10::optional& bias_opt, + bool train, + double momentum, + double eps) { + return mkldnn_batch_norm(input, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, eps); +} + + std::tuple mkldnn_batch_norm_backward(const Tensor& grad_output, const Tensor& input, const c10::optional& weight_opt, const c10::optional& running_mean_opt, const c10::optional& running_var_opt, const c10::optional& save_mean_opt, const c10::optional& save_invstd_opt, bool train, diff --git a/aten/src/ATen/native/mkldnn/Pooling.cpp b/aten/src/ATen/native/mkldnn/Pooling.cpp index 80cfa2efcc10..30ff49f49dd3 100644 --- a/aten/src/ATen/native/mkldnn/Pooling.cpp +++ b/aten/src/ATen/native/mkldnn/Pooling.cpp @@ -1,12 +1,28 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #if !AT_MKLDNN_ENABLED() @@ -502,7 +518,7 @@ Tensor mkldnn_adaptive_avg_pool2d( /*padding*/ {0, 0}, /*dilation*/ {1, 1}, /*ceil_mode*/ false, - /*algo*/ ideep::algorithm::pooling_avg); + /*algo*/ ideep::algorithm::pooling_avg_exclude_padding); } Tensor& mkldnn_adaptive_avg_pool2d_out_stub(const Tensor& input, diff --git a/aten/src/ATen/native/mkldnn/Prelu.cpp b/aten/src/ATen/native/mkldnn/Prelu.cpp index acc78211d83c..dc7d239da7b6 100644 --- a/aten/src/ATen/native/mkldnn/Prelu.cpp +++ b/aten/src/ATen/native/mkldnn/Prelu.cpp @@ -17,7 +17,7 @@ std::tuple mkldnn_prelu_backward(const Tensor& grad_output, cons }} -#else // AT_MKLDNN_EBABLED +#else // AT_MKLDNN_ENABLED #include #include @@ -76,4 +76,4 @@ std::tuple mkldnn_prelu_backward(const Tensor& grad_output, cons } }} -#endif // AT_MKLDNN_EBABLED +#endif // AT_MKLDNN_ENABLED diff --git a/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp b/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp index 44447441f6da..8841d65a2e78 100644 --- a/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp +++ b/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp @@ -34,6 +34,17 @@ TORCH_LIBRARY(mkldnn, m) { // NOLINTNEXTLINE(performance-move-const-arg,cppcoreguidelines-avoid-magic-numbers) std::move(std::get<7>(state))); }); + + m.def(TORCH_SELECTIVE_SCHEMA( + "mkldnn::_linear_pointwise(Tensor X, Tensor W, Tensor? B, str attr, Scalar?[] scalars, str? 
algorithm) -> Tensor Y")); + m.def(TORCH_SELECTIVE_SCHEMA( + "mkldnn::_linear_pointwise.binary(Tensor X, Tensor other, Tensor W, Tensor? B, str attr) -> Tensor Y")); + m.def(TORCH_SELECTIVE_SCHEMA( + "mkldnn::_convolution_pointwise(Tensor X, Tensor W, Tensor? B, int[] padding, int[] stride, int[] dilation, int groups, str attr, Scalar?[] scalars, str? algorithm) -> Tensor Y")); + m.def(TORCH_SELECTIVE_SCHEMA( + "mkldnn::_convolution_pointwise.binary(Tensor X, Tensor other, Tensor W, Tensor? B, int[] padding, int[] stride, int[] dilation, int groups, str binary_attr, Scalar? alpha, str? unary_attr, Scalar?[] unary_scalars, str? unary_algorithm) -> Tensor Y")); + m.def(TORCH_SELECTIVE_SCHEMA( + "mkldnn::_convolution_pointwise_.binary(Tensor X, Tensor(a!) other, Tensor W, Tensor? B, int[] padding, int[] stride, int[] dilation, int groups, str binary_attr, Scalar? alpha, str? unary_attr, Scalar?[] unary_scalars, str? unary_algorithm) -> Tensor(a!) Y")); } TORCH_LIBRARY(mkldnn_prepacked, m) { @@ -58,3 +69,22 @@ TORCH_LIBRARY_IMPL(mkldnn_prepacked, CPU, m) { } // namespace at #endif // AT_MKLDNN_ENABLED() + +#if AT_MKL_ENABLED() && AT_MKLDNN_ENABLED() + +namespace at { +namespace native { +namespace mkl { + +TORCH_LIBRARY(mkl, m) { + m.def(TORCH_SELECTIVE_SCHEMA( + "mkl::_mkl_reorder_linear_weight(Tensor X, int batch_size) -> Tensor")); + m.def(TORCH_SELECTIVE_SCHEMA( + "mkl::_mkl_linear(Tensor X, Tensor MKL_W, Tensor ORI_W, Tensor? B, int batch_size) -> Tensor")); +} + +} // namespace mkl +} // namespace native +} // namespace at + +#endif // AT_MKL_ENABLED && AT_MKLDNN_ENABLED diff --git a/aten/src/ATen/native/mkldnn/Relu.cpp b/aten/src/ATen/native/mkldnn/Relu.cpp index 517fa6aa6444..ace99f7706e0 100644 --- a/aten/src/ATen/native/mkldnn/Relu.cpp +++ b/aten/src/ATen/native/mkldnn/Relu.cpp @@ -1,7 +1,13 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include // for mkldnn_relu, mkldnn_... 
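// Editor's note on the schemas registered above (not part of the original
// patch): "Tensor?" marks an optional tensor argument, "Scalar?[]" a list of
// optional scalars, and "Tensor(a!) other" declares that the operator writes
// `other` in place and returns a tensor aliasing it. That is why the in-place
// fusion op mkldnn::_convolution_pointwise_.binary returns Tensor(a!) while
// the out-of-place variants return a fresh Tensor Y.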
+#include // for mkldnn_relu_backward +#endif #if !AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/SoftMax.cpp b/aten/src/ATen/native/mkldnn/SoftMax.cpp index 743584544ef9..d49643ee1ad3 100644 --- a/aten/src/ATen/native/mkldnn/SoftMax.cpp +++ b/aten/src/ATen/native/mkldnn/SoftMax.cpp @@ -1,6 +1,12 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include // for mkldnn_softmax +#endif #if !AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/TensorFactories.cpp b/aten/src/ATen/native/mkldnn/TensorFactories.cpp index a944d4db19b6..65a22aa74ed5 100644 --- a/aten/src/ATen/native/mkldnn/TensorFactories.cpp +++ b/aten/src/ATen/native/mkldnn/TensorFactories.cpp @@ -1,10 +1,14 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -namespace at { namespace native { +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif -Tensor empty_symint_mkldnn(c10::SymIntArrayRef sizes, c10::optional dtype, c10::optional layout, c10::optional device, c10::optional pin_memory, c10::optional optional_memory_format) { - return at::native::empty_mkldnn(c10::asIntArrayRefSlow(sizes), dtype, layout, device, pin_memory, optional_memory_format); -} +namespace at { namespace native { #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/TensorShape.cpp b/aten/src/ATen/native/mkldnn/TensorShape.cpp index ec3c58eda77f..1e54aae9d660 100644 --- a/aten/src/ATen/native/mkldnn/TensorShape.cpp +++ b/aten/src/ATen/native/mkldnn/TensorShape.cpp @@ -1,9 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include -#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif + #if !AT_MKLDNN_ENABLED() namespace at { @@ -69,6 +79,9 @@ Tensor mkldnn_clone(const Tensor& self, c10::optional optiona } Tensor mkldnn_transpose(const Tensor& self, int64_t dim0, int64_t dim1) { + auto ndims = self.dim(); + dim0 = maybe_wrap_dim(dim0, ndims); + dim1 = maybe_wrap_dim(dim1, ndims); const ideep::tensor& x = itensor_from_mkldnn(self); ideep::tensor y; std::vector axes(x.ndims()); diff --git a/aten/src/ATen/native/mkldnn/UnaryOps.cpp b/aten/src/ATen/native/mkldnn/UnaryOps.cpp index f4a1a76c69b1..7f57d99ac176 100644 --- a/aten/src/ATen/native/mkldnn/UnaryOps.cpp +++ b/aten/src/ATen/native/mkldnn/UnaryOps.cpp @@ -1,6 +1,13 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include // for mkldnn_sigmoid, mkldnn_... 
+#include // for mkldnn_tanh, mkldnn_tanh_ +#endif #if !AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/mkldnn/Utils.cpp b/aten/src/ATen/native/mkldnn/Utils.cpp index 62aeee407808..2c9bcc016e47 100644 --- a/aten/src/ATen/native/mkldnn/Utils.cpp +++ b/aten/src/ATen/native/mkldnn/Utils.cpp @@ -1,3 +1,4 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include @@ -32,4 +33,136 @@ std::vector pool_output_sizes( return output_size; } +void check_mkldnn_binary_fusion_inputs( + const Tensor& input, + const Tensor& other, + const Tensor& weight, + const Tensor& bias) { + if (!weight.is_mkldnn()) { + TORCH_CHECK( + input.options().type_equal(weight.options()), + "Input type (", + input.toString(), + ") and weight type (", + weight.toString(), + ") should be the same"); + } else { + TORCH_CHECK( + input.scalar_type() == input.scalar_type(), + "mkldnn pointwise binary: input dtype and weight dtype should be the same"); + } + TORCH_CHECK( + input.options().type_equal(other.options()), + "Input type (", + input.toString(), + ") and other type (", + other.toString(), + ") should be the same"); + TORCH_CHECK( + !bias.defined() || (input.options().type_equal(bias.options())), + "Input type (", + input.toString(), + ") and bias type (", + bias.toString(), + ") should be the same"); + TORCH_CHECK( + input.device().is_cpu(), + "mkldnn pointwise binary fusion: input's device should be CPU"); + TORCH_CHECK( + input.scalar_type() == ScalarType::Float || + input.scalar_type() == ScalarType::BFloat16, + "mkldnn pointwise binary: input's dtype should be float or bfloat16"); + if (input.scalar_type() == ScalarType::BFloat16) { + TORCH_CHECK( + mkldnn_bf16_device_check(), + "mkldnn pointwise binary: bf16 path needs the cpu support avx512bw, avx512vl and avx512dq"); + } +} + +#if AT_MKLDNN_ENABLED() + +#define ATTR_FUNC(NAME) \ + [](torch::List> scalars, \ + c10::optional algorithm) { \ + return ideep::attr_t::fuse_##NAME(); \ + } + +AttrFunction attr_func_leaky_relu = + [](torch::List> scalars, + c10::optional algorithm) { + TORCH_CHECK( + scalars.size() == 1 && + scalars[0].get().toOptional().has_value(), + "leaky_relu is expected to have one scalar input: negative_slope"); + auto alpha_value = + scalars[0].get().toOptional().value().to(); + return ideep::attr_t::fuse_relu(1.0, alpha_value); + }; + +AttrFunction attr_func_hardtanh = + [](torch::List> scalars, + c10::optional algorithm) { + TORCH_CHECK( + scalars.size() == 2 && + scalars[0].get().toOptional().has_value() && + scalars[1].get().toOptional().has_value(), + "hardtanh is expected to have two scalar input: min_val and max_val"); + + auto lower_bound_value = + scalars[0].get().toOptional().value().to(); + auto upper_bound_value = + scalars[1].get().toOptional().value().to(); + return ideep::attr_t::fuse_clamp(lower_bound_value, upper_bound_value); + }; + +AttrFunction attr_func_gelu = [](torch::List> scalars, + c10::optional algorithm) { + TORCH_CHECK( + algorithm.has_value(), + "gelu is expected to have one str input: algorithm"); + dnnl::algorithm gelu_type; + if (algorithm.value() == "none") { + gelu_type = dnnl::algorithm::eltwise_gelu_erf; + } else if (algorithm.value() == "tanh") { + gelu_type = dnnl::algorithm::eltwise_gelu_tanh; + } else { + TORCH_INTERNAL_ASSERT( + false, "Unsupported gelu algorithm: ", algorithm.value()); + } + + return ideep::attr_t::fuse_gelu(1.0, 0.f, 0.f, gelu_type); +}; + +const std::map& fusion_unary_attr_map() { + static const std::map fusion_attr_map{ + {"relu", ATTR_FUNC(relu)}, + {"sigmoid", 
ATTR_FUNC(sigmoid)}, + {"tanh", ATTR_FUNC(tanh)}, + {"swish", ATTR_FUNC(swish)}, + {"hardswish", ATTR_FUNC(hardswish)}, + {"leaky_relu", attr_func_leaky_relu}, + {"hardtanh", attr_func_hardtanh}, + {"gelu", attr_func_gelu}, + }; + return fusion_attr_map; +}; + +const std::map& fusion_unary_alg_map() { + static const std::map fusion_attr_map{ + {"relu", {ideep::algorithm::eltwise_relu}}, + }; + return fusion_attr_map; +}; + +const std::map& fusion_binary_alg_map() { + static const std::map fusion_attr_map{ + {"add", {ideep::algorithm::binary_add}}, + {"sub", {ideep::algorithm::binary_sub}}, + {"mul", {ideep::algorithm::binary_mul}}, + {"div", {ideep::algorithm::binary_div}}, + }; + return fusion_attr_map; +}; + +#endif // AT_MKLDNN_ENABLED() }} diff --git a/aten/src/ATen/native/mkldnn/Utils.h b/aten/src/ATen/native/mkldnn/Utils.h index a27b842be04b..a25be13c46da 100644 --- a/aten/src/ATen/native/mkldnn/Utils.h +++ b/aten/src/ATen/native/mkldnn/Utils.h @@ -1,11 +1,15 @@ #pragma once -#include +#include +#include +#include #include -#include -#include #include +#include +#if AT_MKLDNN_ENABLED() +#include +#endif // AT_MKLDNN_ENABLED() namespace at { namespace native { @@ -22,11 +26,41 @@ std::vector pool_output_sizes( IntArrayRef padding_r, IntArrayRef dilation, bool ceil_mode); + +void check_mkldnn_binary_fusion_inputs( + const Tensor& input, + const Tensor& other, + const Tensor& weight, + const Tensor& bias); + +#if AT_MKLDNN_ENABLED() + +using AttrFunction = std::function>, + c10::optional)>; + +const std::map& fusion_unary_attr_map(); + +const std::map& fusion_unary_alg_map(); + +const std::map& fusion_binary_alg_map(); + +#endif // AT_MKLDNN_ENABLED() }; inline bool mkldnn_bf16_device_check() { - return cpuinfo_initialize() && cpuinfo_has_x86_avx512bw() - && cpuinfo_has_x86_avx512vl() && cpuinfo_has_x86_avx512dq(); + return cpuinfo_initialize() && ((cpuinfo_has_x86_avx512bw() + && cpuinfo_has_x86_avx512vl() && cpuinfo_has_x86_avx512dq()) || (cpuinfo_has_arm_bf16())); +} + +#if defined(__aarch64__) +inline bool mkldnn_bf16_device_check_arm() { + return (cpuinfo_initialize() && cpuinfo_has_arm_bf16()); +} +#else +constexpr bool mkldnn_bf16_device_check_arm() { + return false; } +#endif } diff --git a/aten/src/ATen/native/mps/Copy.h b/aten/src/ATen/native/mps/Copy.h index 1a4465e73538..4ffa73d039ad 100644 --- a/aten/src/ATen/native/mps/Copy.h +++ b/aten/src/ATen/native/mps/Copy.h @@ -1,20 +1,7 @@ // Copyright © 2022 Apple Inc. 
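
Stepping back to the fusion tables added in mkldnn/Utils.{h,cpp} above: each entry of fusion_unary_attr_map() is an AttrFunction that turns an attr name plus optional scalars/algorithm into an ideep post-op attribute. The sketch below shows how a pointwise kernel is expected to consume the map. It assumes the template parameters stripped by the diff rendering are List<optional<Scalar>> keyed lookups with an optional<string_view> algorithm, and the helper name is the editor's own, not from the patch.

#include <ATen/native/mkldnn/Utils.h>
#include <ATen/core/List.h>

// Hypothetical lookup helper; the real callers live in the pointwise kernels.
ideep::attr_t unary_post_op_from_attr(
    c10::string_view attr,
    torch::List<c10::optional<at::Scalar>> scalars,
    c10::optional<c10::string_view> algorithm) {
  const auto& table = at::native::fusion_unary_attr_map();
  auto it = table.find(attr);
  TORCH_CHECK(it != table.end(), "unsupported unary post-op: ", attr);
  // e.g. attr == "leaky_relu" expects one scalar (negative_slope),
  //      attr == "gelu" expects algorithm to be "none" or "tanh".
  return it->second(scalars, algorithm);
}
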
#pragma once -#include - -#include -#include -#include -#include -#include -#include - -#ifdef __OBJC__ -#include -#include -#include -#endif +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/mps/MPSGraphVenturaOps.h b/aten/src/ATen/native/mps/MPSGraphVenturaOps.h new file mode 100644 index 000000000000..b77db66795cf --- /dev/null +++ b/aten/src/ATen/native/mps/MPSGraphVenturaOps.h @@ -0,0 +1,17 @@ +#pragma once +#include + +// TODO: Remove me when moved to MacOS 13 +@interface MPSGraph (VenturaOps) +- (MPSGraphTensor *)cumulativeSumWithTensor:(MPSGraphTensor *)tensor + axis:(NSInteger)axis + name:(NSString *)name; + +- (MPSGraphTensor *)sortWithTensor:(MPSGraphTensor *)tensor + axis:(NSInteger)axis + name:(NSString *)name; + +- (MPSGraphTensor *)argSortWithTensor:(MPSGraphTensor *)tensor + axis:(NSInteger)axis + name:(NSString *)name; +@end diff --git a/aten/src/ATen/native/mps/OperationUtils.h b/aten/src/ATen/native/mps/OperationUtils.h index 32aede7fc5e0..93b014124339 100644 --- a/aten/src/ATen/native/mps/OperationUtils.h +++ b/aten/src/ATen/native/mps/OperationUtils.h @@ -37,6 +37,20 @@ struct TORCH_CUDA_CPP_API MPSGeneratorImpl : public c10::GeneratorImpl { const Generator& getDefaultMPSGenerator(); +struct MPSScalar { + id getMTLBuffer() const { return __builtin_bit_cast(id, buffer.get()); } + + size_t size = 0; + ScalarType type = ScalarType::Undefined; + c10::DataPtr buffer; // stores MTLBuffer (frees buffer if MPSScalar instance goes out of scope) + union { + float f; // MPS doesn't support 'double' + at::Half h; + int64_t i; + bool b; + } value {}; +}; + void runMPSGraph( MPSStream* mpsStream, MPSGraph* mpsGraph, @@ -45,10 +59,10 @@ void runMPSGraph( MPSDataType getMPSDataType(ScalarType scalar_type); MPSDataType getMPSScalarType(ScalarType scalar_type); +MPSScalar getMPSScalar(const Scalar& scalar, ScalarType type); std::string getMPSTypeString(ScalarType scalar_type); std::string getMPSShapeString(MPSShape* shape); std::string getTensorsStringKey(const TensorList& tensors, bool use_scalar_value = false); -double getMPSScalarValue(const Tensor& t); std::string getArrayRefString(const IntArrayRef s); // use has_storage() on the returned tensor to determine if src actually is a view Tensor gatherViewTensor(const at::Tensor& src, at::Tensor& dst); @@ -87,7 +101,7 @@ void resize_tensor(Tensor* output); MPSGraphTensor* trunc_tensor(MPSGraph* mpsGraph, MPSGraphTensor* inputTensor); MPSGraphTensor* castMPSTensor(MPSGraph *mpsGraph, MPSGraphTensor* tensor, ScalarType toType); MPSGraphTensorData *getMPSGraphTensorData(MPSGraph* mpsGraph, MPSStream* mpsStream, const Tensor& tensor); -MPSGraphTensorData* getMPSGraphTensorFromScalar(MPSStream* mpsStream, const Scalar& scalar, MPSDataType dataType); +MPSGraphTensorData* getMPSGraphTensorFromScalar(MPSStream* mpsStream, MPSScalar& scalar); MPSGraph* make_mps_graph(); void printTensorNDArray(const Tensor& t); @@ -95,6 +109,7 @@ void printTensorNDArray(const Tensor& t); MPSGraphTensor* mpsGraphUnrankedPlaceHolder(MPSGraph *mpsGraph, MPSDataType dataType); MPSGraphTensor* mpsGraphRankedPlaceHolder(MPSGraph *mpsGraph, MPSDataType dataType, MPSShape* mpsShape); MPSGraphTensor* mpsGraphRankedPlaceHolder(MPSGraph *mpsGraph, const Tensor& tensor); +MPSGraphTensor* mpsGraphScalarPlaceHolder(MPSGraph *mpsGraph, MPSDataType dataType); MPSGraphTensor* mpsGraphScalarPlaceHolder(MPSGraph *mpsGraph, const Scalar& scalar); string get_mem_format_string(c10::MemoryFormat memory_format); @@ -190,6 +205,11 @@ struct MPSGraphCache 
return result; } + template + inline T* CreateCachedGraphAs(const std::string& key, CreateCachedGraphBlock createCacheBlock, void* view_ptr = nullptr) { + return static_cast(CreateCachedGraph(key, createCacheBlock, view_ptr)); + } + MPSCachedGraph* LookUp(const std::string& key) const { __block MPSCachedGraph* result = nullptr; @@ -244,6 +264,7 @@ struct MPSGraphCache }; + } // namespace mps } // namespace native } // namespace at diff --git a/aten/src/ATen/native/mps/OperationUtils.mm b/aten/src/ATen/native/mps/OperationUtils.mm index 65c30a0b39ed..f41484b27b14 100644 --- a/aten/src/ATen/native/mps/OperationUtils.mm +++ b/aten/src/ATen/native/mps/OperationUtils.mm @@ -12,6 +12,7 @@ this->set_current_seed(random); return random; } + uint64_t MPSGeneratorImpl::current_seed() const { return seed_; } @@ -61,7 +62,7 @@ } void runMPSGraph(MPSStream* mpsStream, MPSGraph* mpsGraph, NSDictionary* feeds, NSDictionary* results) { - mpsStream->executeMPSGraph(mpsGraph, feeds, results); + mpsStream->executeMPSGraph(mpsGraph, feeds, results, SyncType::COMMIT_ADAPTIVE); } MPSDataType getMPSDataType(ScalarType scalar_type) { @@ -163,7 +164,7 @@ MPSDataType getMPSScalarType(ScalarType scalar_type) { str += getMPSTypeString(tensor.scalar_type()) + "["; // if tensor is a scalar if (tensor.dim() == 0) { - str += (use_scalar_value ? std::to_string(getMPSScalarValue(tensor)) : "Scalar"); + str += (use_scalar_value ? std::to_string(tensor.item().to()) : "Scalar"); } else { const NSString* ns_shape_key = [[getMPSShape(tensor) valueForKey:@"description"] componentsJoinedByString:@","]; str += std::string(ns_shape_key.UTF8String); @@ -176,26 +177,8 @@ MPSDataType getMPSScalarType(ScalarType scalar_type) { return str; } -double getMPSScalarValue(const Tensor& t) { - assert (t.dim() == 0); // only applicable for scalar types - auto other_value = t.item(); - return other_value.to(); -} - MPSShape* getMPSShape(const Tensor& t) { - const int sz = t.dim(); - const int sz_ = (sz > 0) ? sz : 1; - - NSNumber* numbers[sz_]; - - for (int i = 0; i < sz_; i++) - { - NSInteger sz_i = (i < sz) ? t.size(i) : 1; - - NSNumber* number = [NSNumber numberWithInteger:sz_i]; - numbers[i] = number; - } - return [NSArray arrayWithObjects:numbers count:sz_]; + return getMPSShape(t.sizes()); } MPSShape* getMPSShape(c10::MaybeOwned t) { @@ -207,16 +190,14 @@ MPSDataType getMPSScalarType(ScalarType scalar_type) { const int sz = sizes.size(); const int sz_ = (sz > 0) ? sz : 1; - NSNumber* numbers[sz_]; + std::vector numbers(sz_); - for (int i = 0; i < sz_; i++) - { + for (int i = 0; i < sz_; i++) { NSInteger sz_i = (i < sz) ? 
sizes[i] : 1; - NSNumber* number = [NSNumber numberWithInteger:sz_i]; numbers[i] = number; } - return [NSArray arrayWithObjects:numbers count:sz_]; + return [NSArray arrayWithObjects:numbers.data() count:numbers.size()]; } void printTensorNDArray(const Tensor& t) { @@ -250,9 +231,9 @@ void printTensorNDArray(const Tensor& t) { // use "_tensor" from Placeholder to retain view's output during its usage in other ops _tensor = gatherViewTensor(src, emptyShell); if (!_tensor.has_storage()) { - // if we cannot gather, we make the the tensor contiguous implicitly, and keep + // if we cannot gather, we make the tensor contiguous implicitly, and keep // it in placeholder to be able to retrieve it when we return from constructor - _tensor = src.contiguous(); + _tensor = src.clone(MemoryFormat::Contiguous); } srcBuf = getMTLBufferStorage(_tensor); } @@ -297,46 +278,36 @@ void printTensorNDArray(const Tensor& t) { return result; } -MPSGraphTensorData* getMPSGraphTensorFromScalar(MPSStream* mpsStream, const Scalar& scalar, MPSDataType dataType) { - union { - float f; // MPS doesn't support 'double' - at::Half h; - int64_t i; - bool b; - } v; - switch (dataType) { - case MPSDataTypeFloat32: - v.f = scalar.to(); - break; - case MPSDataTypeFloat16: - v.h = scalar.to(); - break; - case MPSDataTypeInt64: - v.i = scalar.to(); - break; - case MPSDataTypeInt32: - v.i = scalar.to(); - break; - case MPSDataTypeInt16: - v.i = scalar.to(); - break; - case MPSDataTypeInt8: - v.i = scalar.to(); - break; - case MPSDataTypeUInt8: - v.i = scalar.to(); - break; - case MPSDataTypeBool: - v.b = scalar.to(); - break; +MPSScalar getMPSScalar(const Scalar& scalar, ScalarType type) { + switch (type) { + case ScalarType::Double: + case ScalarType::Float: return {.value.f = scalar.to() , .size = sizeof(float) , .type = type}; + case ScalarType::Half: return {.value.h = scalar.to(), .size = sizeof(short) , .type = type}; + case ScalarType::Long: return {.value.i = scalar.to() , .size = sizeof(int64_t), .type = type}; + case ScalarType::Int: return {.value.i = scalar.to() , .size = sizeof(int32_t), .type = type}; + case ScalarType::Short: return {.value.i = scalar.to() , .size = sizeof(int16_t), .type = type}; + case ScalarType::Char: return {.value.i = scalar.to() , .size = sizeof(int8_t) , .type = type}; + case ScalarType::Byte: return {.value.i = scalar.to() , .size = sizeof(uint8_t), .type = type}; + case ScalarType::Bool: return {.value.b = scalar.to() , .size = sizeof(bool) , .type = type}; default: - TORCH_INTERNAL_ASSERT(false, "Unsupported scalar type on MPS backend.") + TORCH_INTERNAL_ASSERT(false, "Unsupported scalar type '", type, "' on MPS backend."); } +} - MPSNDArrayDescriptor *tensorDesc = [MPSNDArrayDescriptor descriptorWithDataType:dataType shape:@[@1]]; - MPSNDArray *tensorNDArray = [[[MPSNDArray alloc] initWithDevice:mpsStream->device() descriptor:tensorDesc] autorelease]; - [tensorNDArray writeBytes:&v strideBytes:nil]; - MPSGraphTensorData* result = [[[MPSGraphTensorData alloc] initWithMPSNDArray:tensorNDArray] autorelease]; +MPSGraphTensorData* getMPSGraphTensorFromScalar(MPSStream* mpsStream, MPSScalar& scalar) { + MPSGraphTensorData *result = nullptr; + // Scalar pools are only supported on devices with unified memory + if (mpsStream->device().hasUnifiedMemory) { + scalar.buffer = at::mps::allocate_scalar_buffer(&scalar.value, scalar.size); + result = [[[MPSGraphTensorData alloc] initWithMTLBuffer: scalar.getMTLBuffer() + shape: @[@1] + dataType: getMPSScalarType(scalar.type)] autorelease]; + } else { + 
MPSNDArrayDescriptor *tensorDesc = [MPSNDArrayDescriptor descriptorWithDataType:getMPSScalarType(scalar.type) shape:@[@1]]; + MPSNDArray *tensorNDArray = [[[MPSNDArray alloc] initWithDevice:mpsStream->device() descriptor:tensorDesc] autorelease]; + [tensorNDArray writeBytes:&scalar.value strideBytes:nil]; + result = [[[MPSGraphTensorData alloc] initWithMPSNDArray:tensorNDArray] autorelease]; + } return result; } @@ -368,6 +339,12 @@ void resize_tensor(Tensor* output) { name:nil]; } +MPSGraphTensor* mpsGraphScalarPlaceHolder(MPSGraph *mpsGraph, MPSDataType dataType) { + return [mpsGraph placeholderWithShape:@[@1] + dataType:dataType + name:nil]; +} + MPSGraphTensor* mpsGraphScalarPlaceHolder(MPSGraph *mpsGraph, const Scalar& scalar) { return [mpsGraph placeholderWithShape:@[@1] dataType:getMPSScalarType(scalar.type()) @@ -411,4 +388,4 @@ void executeMPSAllocatorCallback(void* ptr, EventType event) override { } } // namespace mps } // namespace native -} // namespace at +} // namespace at \ No newline at end of file diff --git a/aten/src/ATen/native/mps/TensorFactory.cpp b/aten/src/ATen/native/mps/TensorFactory.cpp index d280da4d9c65..2f4c04024536 100644 --- a/aten/src/ATen/native/mps/TensorFactory.cpp +++ b/aten/src/ATen/native/mps/TensorFactory.cpp @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -71,17 +72,6 @@ Tensor empty_mps( return at::detail::empty_mps(size, dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); } -Tensor empty_symint_mps( - c10::SymIntArrayRef size, - c10::optional dtype_opt, - c10::optional layout_opt, - c10::optional device_opt, - c10::optional pin_memory_opt, - c10::optional memory_format_opt) { - - return at::native::empty_mps(c10::asIntArrayRefSlow(size), dtype_opt, layout_opt, device_opt, pin_memory_opt, memory_format_opt); -} - Tensor empty_strided_mps( IntArrayRef size, IntArrayRef stride, diff --git a/aten/src/ATen/native/mps/operations/Activation.mm b/aten/src/ATen/native/mps/operations/Activation.mm index e929a41be2ce..618a00f33787 100644 --- a/aten/src/ATen/native/mps/operations/Activation.mm +++ b/aten/src/ATen/native/mps/operations/Activation.mm @@ -777,16 +777,17 @@ Tensor relu_mps(const Tensor& self) { MPSGraphTensor* normcdf (MPSGraph* mpsGraph, MPSGraphTensor *inputTensor) { // (1.0f + erf(x*SQRT1_2)) * 0.5f * x; + auto dataType = [inputTensor dataType]; const float SQRT1_2 = 0.707106781186547524400844362104849039f; - MPSGraphTensor *sqrt1_2 = [mpsGraph constantWithScalar:SQRT1_2 - shape:@[@1] - dataType:MPSDataTypeFloat32]; - MPSGraphTensor *onef = [mpsGraph constantWithScalar:1.0f - shape:@[@1] - dataType:MPSDataTypeFloat32]; - MPSGraphTensor *halff = [mpsGraph constantWithScalar:0.5f - shape:@[@1] - dataType:MPSDataTypeFloat32]; + MPSGraphTensor *sqrt1_2 = [mpsGraph constantWithScalar: SQRT1_2 + shape: @[@1] + dataType: dataType]; + MPSGraphTensor *onef = [mpsGraph constantWithScalar: 1.0f + shape: @[@1] + dataType: dataType]; + MPSGraphTensor *halff = [mpsGraph constantWithScalar: 0.5f + shape: @[@1] + dataType: dataType]; MPSGraphTensor *erfTensor = [mpsGraph multiplicationWithPrimaryTensor: inputTensor secondaryTensor: sqrt1_2 @@ -807,6 +808,7 @@ Tensor relu_mps(const Tensor& self) { ) { using namespace mps; TORCH_CHECK(output.is_mps()); + TORCH_CHECK(c10::isFloatingType(self.scalar_type()), "GELU is only implemented for floating types"); // Empty output if(output.numel() == 0) @@ -899,6 +901,7 @@ Tensor relu_mps(const Tensor& self) { CachedGraph *newCachedGraph = nil; @autoreleasepool { + 
auto dataType = getMPSDataType(self.scalar_type()); MPSGraph* mpsGraph = make_mps_graph(); newCachedGraph = new CachedGraph(mpsGraph); @@ -906,15 +909,15 @@ Tensor relu_mps(const Tensor& self) { getMPSDataType(grad.scalar_type()), getMPSShape(grad)); MPSGraphTensor* inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, - getMPSDataType(self.scalar_type()), + dataType, getMPSShape(self)); MPSGraphTensor* cdf = normcdf(mpsGraph, inputTensor); - MPSGraphTensor *halff = [mpsGraph constantWithScalar:-0.5f - shape:@[@1] - dataType:MPSDataTypeFloat32]; - MPSGraphTensor *betaf = [mpsGraph constantWithScalar:kBeta - shape:@[@1] - dataType:MPSDataTypeFloat32]; + MPSGraphTensor *halff = [mpsGraph constantWithScalar: -0.5f + shape: @[@1] + dataType: dataType]; + MPSGraphTensor *betaf = [mpsGraph constantWithScalar :kBeta + shape :@[@1] + dataType:dataType]; MPSGraphTensor *pdfMul = [mpsGraph squareWithTensor : inputTensor name : nil]; pdfMul = [mpsGraph multiplicationWithPrimaryTensor : pdfMul @@ -1456,19 +1459,20 @@ Tensor glu_backward_mps (const Tensor& grad_output, if(result.numel() == 0) return; - auto beta_f = beta.to(); - struct CachedGraph : public MPSCachedGraph { CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} MPSGraphTensor *inputTensor_ = nil; MPSGraphTensor *betaTensor_ = nil; + MPSGraphTensor *thresholdTensor_ = nil; MPSGraphTensor *outputTensor_ = nil; }; MPSGraphCache* cache_ = MPSGraphCache::getInstance(); MPSStream* stream = getCurrentMPSStream(); + MPSScalar beta_scalar = getMPSScalar(beta, ScalarType::Float); + MPSScalar threshold_scalar = getMPSScalar(threshold, ScalarType::Float); @autoreleasepool { string key = "softplus_out_mps:" + getTensorsStringKey({self}); @@ -1484,7 +1488,9 @@ Tensor glu_backward_mps (const Tensor& grad_output, newCachedGraph = new CachedGraph(mpsGraph); MPSGraphTensor* inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, self); - MPSGraphTensor* betaTensor = mpsGraphScalarPlaceHolder(mpsGraph, beta); + MPSGraphTensor* betaTensor = mpsGraphScalarPlaceHolder(mpsGraph, getMPSDataType(ScalarType::Float)); + + MPSGraphTensor* thresholdTensor = mpsGraphScalarPlaceHolder(mpsGraph, getMPSDataType(ScalarType::Float)); MPSGraphTensor* reluTensor = [mpsGraph reLUWithTensor:inputTensor name:nil]; @@ -1497,9 +1503,6 @@ Tensor glu_backward_mps (const Tensor& grad_output, MPSGraphTensor* bxTensor = [mpsGraph multiplicationWithPrimaryTensor:inputTensor secondaryTensor:betaTensor name:nil]; - MPSGraphTensor* thresholdTensor = [mpsGraph constantWithScalar:threshold.to() - shape:@[@1] - dataType:getMPSDataType(self.scalar_type())]; MPSGraphTensor* predicateTensor = [mpsGraph greaterThanWithPrimaryTensor:bxTensor secondaryTensor:thresholdTensor name:nil]; @@ -1522,6 +1525,7 @@ Tensor glu_backward_mps (const Tensor& grad_output, newCachedGraph->inputTensor_ = inputTensor; newCachedGraph->betaTensor_ = betaTensor; + newCachedGraph->thresholdTensor_ = thresholdTensor; newCachedGraph->outputTensor_ = outputTensor; } return newCachedGraph; @@ -1534,7 +1538,8 @@ Tensor glu_backward_mps (const Tensor& grad_output, // Create dictionary of inputs and outputs NSDictionary* feeds = @{ selfPlaceholder.getMPSGraphTensor() : selfPlaceholder.getMPSGraphTensorData(), - cachedGraph->betaTensor_ : getMPSGraphTensorFromScalar(stream, beta_f, MPSDataTypeFloat32) + cachedGraph->betaTensor_ : getMPSGraphTensorFromScalar(stream, beta_scalar), + cachedGraph->thresholdTensor_ : getMPSGraphTensorFromScalar(stream, threshold_scalar), }; NSDictionary* results = @{ outputPlaceholder.getMPSGraphTensor() 
: outputPlaceholder.getMPSGraphTensorData() @@ -1557,7 +1562,8 @@ Tensor glu_backward_mps (const Tensor& grad_output, if(grad_input.numel() == 0) return; - auto beta_f = beta.to(); + MPSScalar beta_scalar = getMPSScalar(beta, ScalarType::Float); + MPSScalar threshold_scalar = getMPSScalar(threshold, ScalarType::Float); struct CachedGraph : public MPSCachedGraph { @@ -1565,6 +1571,7 @@ Tensor glu_backward_mps (const Tensor& grad_output, MPSGraphTensor *gradOutputTensor_ = nil; MPSGraphTensor *inputTensor_ = nil; MPSGraphTensor *betaTensor_ = nil; + MPSGraphTensor *thresholdTensor_ = nil; MPSGraphTensor *outputTensor_ = nil; }; @@ -1588,7 +1595,9 @@ Tensor glu_backward_mps (const Tensor& grad_output, MPSGraphTensor* inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, self); - MPSGraphTensor* betaTensor = mpsGraphScalarPlaceHolder(mpsGraph, beta); + MPSGraphTensor* betaTensor = mpsGraphScalarPlaceHolder(mpsGraph, getMPSScalarType(ScalarType::Float)); + + MPSGraphTensor* thresholdTensor = mpsGraphScalarPlaceHolder(mpsGraph, getMPSScalarType(ScalarType::Float)); MPSGraphTensor* unitTensor = [mpsGraph constantWithScalar:1.0 shape:@[@1] @@ -1607,9 +1616,6 @@ Tensor glu_backward_mps (const Tensor& grad_output, rTensor = [mpsGraph divisionWithPrimaryTensor:rTensor secondaryTensor:unitExpBxTensor name:nil]; - MPSGraphTensor* thresholdTensor = [mpsGraph constantWithScalar:threshold.to() - shape:@[@1] - dataType:getMPSDataType(self.scalar_type())]; MPSGraphTensor* predicateTensor = [mpsGraph greaterThanWithPrimaryTensor:bxTensor secondaryTensor:thresholdTensor name:nil]; @@ -1621,6 +1627,7 @@ Tensor glu_backward_mps (const Tensor& grad_output, newCachedGraph->gradOutputTensor_ = gradOutputTensor; newCachedGraph->inputTensor_ = inputTensor; newCachedGraph->betaTensor_ = betaTensor; + newCachedGraph->thresholdTensor_ = thresholdTensor; newCachedGraph->outputTensor_ = outputTensor; } return newCachedGraph; @@ -1635,7 +1642,8 @@ Tensor glu_backward_mps (const Tensor& grad_output, NSDictionary* feeds = @{ gradOutputPlaceholder.getMPSGraphTensor() : gradOutputPlaceholder.getMPSGraphTensorData(), selfPlaceholder.getMPSGraphTensor() : selfPlaceholder.getMPSGraphTensorData(), - cachedGraph->betaTensor_ : getMPSGraphTensorFromScalar(stream, beta_f, MPSDataTypeFloat32) + cachedGraph->betaTensor_ : getMPSGraphTensorFromScalar(stream, beta_scalar), + cachedGraph->thresholdTensor_ : getMPSGraphTensorFromScalar(stream, threshold_scalar), }; NSDictionary* results = @{ gradInputPlaceholder.getMPSGraphTensor() : gradInputPlaceholder.getMPSGraphTensorData() @@ -2194,5 +2202,257 @@ Tensor prelu_mps(const Tensor& self, const Tensor& weight_) { return grad_input; } +Tensor& hardswish_out_mps(const Tensor& self, Tensor& output) { + using namespace mps; + using CachedGraph = MPSUnaryCachedGraph; + + TORCH_CHECK(self.is_mps()); + + if (output.numel() == 0) { + return output; + } + + MPSGraphCache* cache_ = MPSGraphCache::getInstance(); + + MPSStream* stream = at::mps::getCurrentMPSStream(); + + @autoreleasepool { + string key = "hardswish_out_mps" + getTensorsStringKey({self}); + CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + if (!cachedGraph) { + MPSCachedGraph* tmpCachedGraph = + cache_->CreateCachedGraph(key, ^MPSCachedGraph*() { + CachedGraph* newCachedGraph = nil; + @autoreleasepool { + MPSGraph* mpsGraph = make_mps_graph(); + newCachedGraph = new CachedGraph(mpsGraph); + MPSGraphTensor* inputTensor = + mpsGraphRankedPlaceHolder(mpsGraph, self); + + MPSGraphTensor* zeroTensor = [mpsGraph + 
constantWithScalar:0.0f + shape:@[ @1 ] + dataType:getMPSDataType(self.scalar_type())]; + + MPSGraphTensor* threeTensor = [mpsGraph + constantWithScalar:3.0f + shape:@[ @1 ] + dataType:getMPSDataType(self.scalar_type())]; + + MPSGraphTensor* negativeThreeTensor = [mpsGraph + constantWithScalar:-3.0f + shape:@[ @1 ] + dataType:getMPSDataType(self.scalar_type())]; + + MPSGraphTensor* sixTensor = [mpsGraph + constantWithScalar:6.0f + shape:@[ @1 ] + dataType:getMPSDataType(self.scalar_type())]; + + MPSGraphTensor* lessThanMinPredicateTensor = [mpsGraph + lessThanOrEqualToWithPrimaryTensor:inputTensor + secondaryTensor:negativeThreeTensor + name:nil]; + + MPSGraphTensor* lessThanMaxPredicateTensor = + [mpsGraph lessThanWithPrimaryTensor:inputTensor + secondaryTensor:threeTensor + name:nil]; + + MPSGraphTensor* inputPlusThreeTensor = + [mpsGraph additionWithPrimaryTensor:inputTensor + secondaryTensor:threeTensor + name:nil]; + + MPSGraphTensor* inputDivSixTensor = + [mpsGraph divisionWithPrimaryTensor:inputPlusThreeTensor + secondaryTensor:sixTensor + name:nil]; + + MPSGraphTensor* weightedTensor = + [mpsGraph multiplicationWithPrimaryTensor:inputTensor + secondaryTensor:inputDivSixTensor + name:nil]; + + MPSGraphTensor* tempTensor = + [mpsGraph selectWithPredicateTensor:lessThanMaxPredicateTensor + truePredicateTensor:weightedTensor + falsePredicateTensor:inputTensor + name:nil]; + + MPSGraphTensor* outputTensor = + [mpsGraph selectWithPredicateTensor:lessThanMinPredicateTensor + truePredicateTensor:zeroTensor + falsePredicateTensor:tempTensor + name:nil]; + newCachedGraph->inputTensor_ = inputTensor; + newCachedGraph->outputTensor_ = outputTensor; + } + return newCachedGraph; + }); + cachedGraph = static_cast(tmpCachedGraph); + } + Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); + Placeholder outputPlaceholder = + Placeholder(cachedGraph->outputTensor_, output); + + // Create dictionary of inputs and outputs + NSDictionary* feeds = @{ + selfPlaceholder.getMPSGraphTensor() : + selfPlaceholder.getMPSGraphTensorData() + }; + + NSDictionary* results = @{ + outputPlaceholder.getMPSGraphTensor() : + outputPlaceholder.getMPSGraphTensorData() + }; + + runMPSGraph(stream, cachedGraph->graph(), feeds, results); + } + return output; +} + +Tensor hardswish_mps(const Tensor& self) { + using namespace mps; + Tensor output = at::empty_like(self, self.suggest_memory_format()); + + return hardswish_out_mps(self, output); +} + +Tensor& hardswish_mps_(Tensor& self) { + using namespace mps; + Tensor& output = self; + + return hardswish_out_mps(self, output); +} + +Tensor hardswish_backward_mps(const Tensor& grad_output, const Tensor& self) { + using namespace mps; + + if (grad_output.numel() == 0) { + return grad_output; + } + + Tensor grad_input = at::empty_like(self, self.suggest_memory_format()); + + struct CachedGraph : public MPSCachedGraph { + CachedGraph(MPSGraph* graph) : MPSCachedGraph(graph) {} + MPSGraphTensor* gradOutputTensor_ = nil; + MPSGraphTensor* inputTensor_ = nil; + MPSGraphTensor* gradInputTensor_ = nil; + }; + + MPSGraphCache* cache_ = MPSGraphCache::getInstance(); + + MPSStream* stream = at::mps::getCurrentMPSStream(); + + @autoreleasepool { + string key = "hardswish_backward_mps" + getTensorsStringKey({self}); + CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + if (!cachedGraph) { + MPSCachedGraph* tmpCachedGraph = + cache_->CreateCachedGraph(key, ^MPSCachedGraph*() { + CachedGraph* newCachedGraph = nil; + @autoreleasepool { + MPSGraph* mpsGraph = 
make_mps_graph(); + newCachedGraph = new CachedGraph(mpsGraph); + MPSGraphTensor* gradOutputTensor = + mpsGraphRankedPlaceHolder(mpsGraph, grad_output); + MPSGraphTensor* inputTensor = + mpsGraphRankedPlaceHolder(mpsGraph, self); + + MPSGraphTensor* zeroTensor = [mpsGraph + constantWithScalar:0.0f + shape:@[ @1 ] + dataType:getMPSDataType(grad_output.scalar_type())]; + + MPSGraphTensor* unitTensor = [mpsGraph + constantWithScalar:1.0f + shape:@[ @1 ] + dataType:getMPSDataType(grad_output.scalar_type())]; + + MPSGraphTensor* threeTensor = [mpsGraph + constantWithScalar:3.0f + shape:@[ @1 ] + dataType:getMPSDataType(grad_output.scalar_type())]; + + MPSGraphTensor* negativeThreeTensor = [mpsGraph + constantWithScalar:-3.0f + shape:@[ @1 ] + dataType:getMPSDataType(grad_output.scalar_type())]; + + MPSGraphTensor* halfTensor = [mpsGraph + constantWithScalar:0.5f + shape:@[ @1 ] + dataType:getMPSDataType(grad_output.scalar_type())]; + + MPSGraphTensor* tempTensor = + [mpsGraph divisionWithPrimaryTensor:inputTensor + secondaryTensor:threeTensor + name:nil]; + + MPSGraphTensor* weightedTensor = + [mpsGraph additionWithPrimaryTensor:tempTensor + secondaryTensor:halfTensor + name:nil]; + + MPSGraphTensor* lessThanMinPredicateTensor = [mpsGraph + lessThanOrEqualToWithPrimaryTensor:inputTensor + secondaryTensor:negativeThreeTensor + name:nil]; + + MPSGraphTensor* lessThanMaxPredicateTensor = + [mpsGraph lessThanWithPrimaryTensor:inputTensor + secondaryTensor:threeTensor + name:nil]; + + MPSGraphTensor* lessThanMaxGradTensor = + [mpsGraph selectWithPredicateTensor:lessThanMaxPredicateTensor + truePredicateTensor:weightedTensor + falsePredicateTensor:unitTensor + name:nil]; + + MPSGraphTensor* gradTensor = + [mpsGraph selectWithPredicateTensor:lessThanMinPredicateTensor + truePredicateTensor:zeroTensor + falsePredicateTensor:lessThanMaxGradTensor + name:nil]; + MPSGraphTensor* gradInputTensor = + [mpsGraph multiplicationWithPrimaryTensor:gradTensor + secondaryTensor:gradOutputTensor + name:nil]; + + newCachedGraph->gradOutputTensor_ = gradOutputTensor; + newCachedGraph->inputTensor_ = inputTensor; + newCachedGraph->gradInputTensor_ = gradInputTensor; + } + return newCachedGraph; + }); + cachedGraph = static_cast(tmpCachedGraph); + } + + Placeholder gradOutputPlaceholder = + Placeholder(cachedGraph->gradOutputTensor_, grad_output); + Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); + Placeholder gradInputPlaceholder = + Placeholder(cachedGraph->gradInputTensor_, grad_input); + + // Create dictionary of inputs and outputs + NSDictionary* feeds = @{ + gradOutputPlaceholder.getMPSGraphTensor() : + gradOutputPlaceholder.getMPSGraphTensorData(), + selfPlaceholder.getMPSGraphTensor() : + selfPlaceholder.getMPSGraphTensorData() + }; + + NSDictionary* results = @{ + gradInputPlaceholder.getMPSGraphTensor() : + gradInputPlaceholder.getMPSGraphTensorData() + }; + + runMPSGraph(stream, cachedGraph->graph(), feeds, results); + } + return grad_input; +} } // namespace native } // namespace at diff --git a/aten/src/ATen/native/mps/operations/AdaptivePooling.mm b/aten/src/ATen/native/mps/operations/AdaptivePooling.mm index 1d58de2902cf..e13deb805bb6 100644 --- a/aten/src/ATen/native/mps/operations/AdaptivePooling.mm +++ b/aten/src/ATen/native/mps/operations/AdaptivePooling.mm @@ -19,11 +19,27 @@ int64_t &strideH, int64_t &strideW, int64_t &kernel_sizeH, int64_t &kernel_sizeW) { - strideH = (int64_t) (isizeH / osizeH); - strideW = (int64_t) (isizeW / osizeW); + TORCH_CHECK((isizeH >= osizeH && 
isizeW >= osizeW) || (isizeH <= osizeH && isizeW <= osizeW), + "Adaptive pool MPS: Input height and width must both be greather than or equal to, or lesser than, output height and width") + + TORCH_CHECK((!(isizeH <= osizeH && isizeW <= osizeW) || (osizeH % isizeH == 0 && osizeW % isizeW == 0)), + "Adaptive pool MPS: If output is larger than input, output sizes must be multiples of input sizes") + + if(isizeH >= osizeH) { + strideH = (int64_t) (isizeH / osizeH); + strideW = (int64_t) (isizeW / osizeW); + + kernel_sizeH = isizeH - (osizeH-1) * strideH; + kernel_sizeW = isizeW - (osizeW-1) * strideW; + } + else { + strideH = (int64_t) (osizeH / isizeH); + strideW = (int64_t) (osizeW / isizeW); + + kernel_sizeH = osizeH - (isizeH-1) * strideH; + kernel_sizeW = osizeW - (isizeW-1) * strideW; + } - kernel_sizeH = isizeH - (osizeH-1) * strideH; - kernel_sizeW = isizeW - (osizeW-1) * strideW; } // Adaptive average pooling @@ -71,13 +87,33 @@ strideH, strideW, kernel_sizeH, kernel_sizeW); - output = at::avg_pool2d(input, - IntArrayRef({kernel_sizeH, kernel_sizeW}), - IntArrayRef({strideH, strideW}), - IntArrayRef({0, 0}), - false, - true, - c10::nullopt); + if(isizeH >= osizeH) { + output = at::avg_pool2d(input, + IntArrayRef({kernel_sizeH, kernel_sizeW}), + IntArrayRef({strideH, strideW}), + IntArrayRef({0, 0}), + false, + true, + c10::nullopt); + } else { + Tensor phony_grad = at::ones_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + auto input_sizes = input.sizes(); + std::vector phony_shape{input_sizes.begin(), input_sizes.end() -2}; + phony_shape.push_back(output_size[0]); + phony_shape.push_back(output_size[1]); + phony_grad.resize_(IntArrayRef(phony_shape)); + output = at::avg_pool2d_backward(input, + phony_grad, + IntArrayRef({kernel_sizeH, kernel_sizeW}), + IntArrayRef({strideH, strideW}), + IntArrayRef({0, 0}), + false, + true, + c10::nullopt); + // Multiply output by kernel size + output = at::mul(output, kernel_sizeH*kernel_sizeW); + } + return output; } @@ -138,15 +174,27 @@ strideH, strideW, kernel_sizeH, kernel_sizeW); auto gradInput = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - if (gradInput.numel() != 0) - gradInput = at::avg_pool2d_backward(gradOutput, - input, - IntArrayRef({kernel_sizeH, kernel_sizeW}), - IntArrayRef({strideH, strideW}), - IntArrayRef({0, 0}), - false, - true, - c10::nullopt); + if (gradInput.numel() != 0) { + if(isizeH >= osizeH) { + gradInput = at::avg_pool2d_backward(gradOutput, + input, + IntArrayRef({kernel_sizeH, kernel_sizeW}), + IntArrayRef({strideH, strideW}), + IntArrayRef({0, 0}), + false, + true, + c10::nullopt); + } else { + gradInput = at::avg_pool2d(gradOutput, + IntArrayRef({kernel_sizeH, kernel_sizeW}), + IntArrayRef({strideH, strideW}), + IntArrayRef({0, 0}), + false, + true, + c10::nullopt); + gradInput = at::mul(gradInput, kernel_sizeH*kernel_sizeW); + } + } return gradInput; diff --git a/aten/src/ATen/native/mps/operations/BinaryOps.mm b/aten/src/ATen/native/mps/operations/BinaryOps.mm index b619307ef8aa..a246bb0c50f0 100644 --- a/aten/src/ATen/native/mps/operations/BinaryOps.mm +++ b/aten/src/ATen/native/mps/operations/BinaryOps.mm @@ -27,10 +27,6 @@ void binaryOpTensor(const Tensor& self, const Tensor& other, const Scalar& alpha, const Tensor& output_, std::string op_name, BinaryOpBlock binaryBlock) { - // it's possible to receive empty tensors here - if (self.numel() == 0 || other.numel() == 0) { - return; - } MPSStream* mpsStream = getCurrentMPSStream(); const bool is_self_scalar = self.dim() == 0; @@ -41,6 +37,11 @@ void 
binaryOpTensor(const Tensor& self, const Tensor& other, const Scalar& alpha output_.resize_(new_size); } + // it's possible to receive empty tensors here + if (self.numel() == 0 || other.numel() == 0) { + return; + } + Tensor output = output_; bool needsCopyToOutput = false; @@ -72,16 +73,37 @@ void binaryOpTensor(const Tensor& self, const Tensor& other, const Scalar& alpha // this type inference is only required at the time of graph creation const ScalarType common_dtype = c10::promoteTypes(self.scalar_type(), other.scalar_type()); - if (self.scalar_type() != common_dtype) { - primaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->primaryTensor, common_dtype); + + // Condition - + // 1. Division operation + // 2. Inputs are not float + bool div_condition = op_name.rfind("div", 0) == 0 + && (!(common_dtype == ScalarType::Float || common_dtype == ScalarType::Half)); + + auto compute_type = ScalarType::Float; + + if(div_condition) { + + if(output_.scalar_type() == ScalarType::Float || output_.scalar_type() == ScalarType::Half) + compute_type = output_.scalar_type(); + + primaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->primaryTensor, compute_type); + secondaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->secondaryTensor, compute_type); } - if (other.scalar_type() != common_dtype) { - secondaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->secondaryTensor, common_dtype); + else { + if (self.scalar_type() != common_dtype) { + primaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->primaryTensor, common_dtype); + } + if (other.scalar_type() != common_dtype) { + secondaryCastTensor = castMPSTensor(mpsGraph, newCachedGraph->secondaryTensor, common_dtype); + } } newCachedGraph->outputTensor = binaryBlock(newCachedGraph, primaryCastTensor, secondaryCastTensor); // Cast output tensor to an expected type if needed, which addresses discrepancy when int64 scalar is added to int32 tensor // Output tensor should have been promoted but it remains an int32 tensor - if (output_.scalar_type() != common_dtype) { + + if ((div_condition && compute_type != output_.scalar_type()) || + output_.scalar_type() != common_dtype) { newCachedGraph->outputTensor = castMPSTensor(mpsGraph, newCachedGraph->outputTensor, output_.scalar_type()); } } @@ -93,22 +115,29 @@ void binaryOpTensor(const Tensor& self, const Tensor& other, const Scalar& alpha NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; Placeholder selfPlaceholder; Placeholder otherPlaceholder; + MPSScalar self_scalar; + MPSScalar other_scalar; + MPSScalar alpha_scalar; if (is_self_scalar && !self.is_mps()) { - feeds[cachedGraph->primaryTensor] = getMPSGraphTensorFromScalar(mpsStream, self.item(), getMPSScalarType(self.scalar_type())); + self_scalar = getMPSScalar(self.item(), self.scalar_type()); + feeds[cachedGraph->primaryTensor] = getMPSGraphTensorFromScalar(mpsStream, self_scalar); } else { selfPlaceholder = Placeholder(cachedGraph->primaryTensor, self); feeds[selfPlaceholder.getMPSGraphTensor()] = selfPlaceholder.getMPSGraphTensorData(); } if (is_other_scalar && !other.is_mps()) { - feeds[cachedGraph->secondaryTensor] = getMPSGraphTensorFromScalar(mpsStream, other.item(), getMPSScalarType(other.scalar_type())); + other_scalar = getMPSScalar(other.item(), other.scalar_type()); + feeds[cachedGraph->secondaryTensor] = getMPSGraphTensorFromScalar(mpsStream, other_scalar); } else { otherPlaceholder = Placeholder(cachedGraph->secondaryTensor, other); feeds[otherPlaceholder.getMPSGraphTensor()] = 
otherPlaceholder.getMPSGraphTensorData(); } + // 'cachedGraph->alphaTensor' is not nil only if add_sub_template() was called with an alpha value != 1.0 if (cachedGraph->alphaTensor) { - feeds[cachedGraph->alphaTensor] = getMPSGraphTensorFromScalar(mpsStream, alpha, getMPSScalarType(other.scalar_type())); + alpha_scalar = getMPSScalar(alpha, other.scalar_type()); + feeds[cachedGraph->alphaTensor] = getMPSGraphTensorFromScalar(mpsStream, alpha_scalar); } Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, needsCopyToOutput ? output : output_); @@ -138,7 +167,11 @@ void div_mode_template(const Tensor& self, const Tensor& other, MPSGraphTensor* divTensor = [mpsGraph divisionWithPrimaryTensor:primaryCastTensor secondaryTensor:secondaryCastTensor name:nil]; - if (!rounding_mode.has_value()) { + // Rounding is a no-op for integral types, and also a reasonable workaround + // For MPSGraph bug on Apple Silicon, that throws `Function floorOp_i64 was not found in the library` + // See https://github.com/pytorch/pytorch/issues/84995 + bool isFloatOutput = ([divTensor dataType] & MPSDataTypeFloatBit) != 0; + if (!rounding_mode.has_value() || !isFloatOutput) { return divTensor; } else if (*rounding_mode == "trunc") { return trunc_tensor(mpsGraph, divTensor); @@ -203,6 +236,9 @@ void add_sub_template(const Tensor& self, const Tensor& other, const Scalar& alp #define CREATE_MPS_STRUCTURED_BINARY_OP_FUNC(func_out, func_stub, other_type) \ TORCH_IMPL_FUNC(func_out) (const Tensor& self, const other_type& other, const Tensor& output) { \ + TORCH_CHECK(!(self.scalar_type() == ScalarType::Long && \ + (std::string(#func_stub) == "power" || std::string(#func_stub) == "atan2")), \ + "MPS does not support ", #func_stub, " op with int64 input") \ mps::binaryOp##other_type(self, other, Scalar(1.0), output, #func_stub, \ ^BinaryOpFn(cachedGraph, primaryCastTensor, secondaryCastTensor) { \ MPSGraph* mpsGraph = cachedGraph->graph(); \ diff --git a/aten/src/ATen/native/mps/operations/BitwiseBinaryOps.mm b/aten/src/ATen/native/mps/operations/BitwiseOps.mm similarity index 79% rename from aten/src/ATen/native/mps/operations/BitwiseBinaryOps.mm rename to aten/src/ATen/native/mps/operations/BitwiseOps.mm index 62d25c3e97d9..5b57693296b1 100644 --- a/aten/src/ATen/native/mps/operations/BitwiseBinaryOps.mm +++ b/aten/src/ATen/native/mps/operations/BitwiseOps.mm @@ -73,6 +73,15 @@ kernel void bitwise_xor_scalar(constant uint& length [[buffer(0)]], out[offset] = a[offset] ^ b; }} +kernel void bitwise_not(constant uint& length [[buffer(0)]], + device {0} *out [[buffer(1)]], + device {1} *a [[buffer(2)]], + uint offset [[thread_position_in_grid]]) {{ + if (offset >= length) {{ + return; + }} + out[offset] = ~a[offset]; +}} )METAL"; @@ -113,8 +122,10 @@ kernel void bitwise_xor_scalar(constant uint& length [[buffer(0)]], return it->second; } NSError *error = nil; + MTLCompileOptions *options = [[MTLCompileOptions new] autorelease]; + [options setLanguageVersion: MTLLanguageVersion2_3]; auto rc = [device newLibraryWithSource:[NSString stringWithUTF8String:fmt::format(BITWISE_OPS_TEMPLATE, t1, t2, t3).c_str()] - options:nil + options:options error:&error]; TORCH_CHECK(rc != nil && error == nil, "Failed to compile library: ", [[error localizedDescription] UTF8String]); libMap[key] = rc; @@ -161,6 +172,9 @@ void handle_tensor_tensor_binary_op(const at::Tensor& self, const at::Tensor& ot getMetalType(other), kernel_name); uint32_t length = output.numel(); + if (length == 0) { + return; + } 
dispatch_sync(stream->queue(), ^(){ id buffer = stream->commandBuffer(); id commandEncoder = [buffer computeCommandEncoder]; @@ -191,6 +205,9 @@ void handle_tensor_scalar_binary_op(const at::Tensor& self, const at::Scalar& ot kernel_name); uint64_t sval = other.to(); uint32_t length = output.numel(); + if (length == 0) { + return; + } dispatch_sync(stream->queue(), ^(){ id buffer = stream->commandBuffer(); id commandEncoder = [buffer computeCommandEncoder]; @@ -262,12 +279,69 @@ void handle_tensor_scalar_binary_op(const at::Tensor& self, const at::Scalar& ot return _bitwise_op_out_mps(self, other, output, "xor"); } +at::Tensor& bitwise_not_out_mps (const at::Tensor& self, at::Tensor& output_) { + // Handle boolean tensor using logical not + if (self.scalar_type() == c10::ScalarType::Bool) { + return at::native::logical_not_out_mps(self, output_); + } + + at::Tensor output = output_; + bool needs_output_copy = false; + + at::native::resize_output(output, self.sizes()); + if (!output.is_contiguous()) { + output = output.contiguous(); + needs_output_copy = true; + } + if (self.dim() == 0) { + if (self.scalar_type() == c10::ScalarType::Byte) { + // Unsigned types need a special handling to keep result of operation in 0..255 output + output.fill_(c10::Scalar(static_cast(~self.item()))); + } else { + output.fill_(c10::Scalar(~self.item())); + } + return output_; + } + uint32_t length = output.numel(); + if (length == 0) { + return output_; + } + using namespace at::mps; + MPSStream* stream = getCurrentMPSStream(); + id cplState = getCPLState(MPSDevice::getInstance()->device(), + getMetalType(output), + getMetalType(self), + getMetalType(self), + "bitwise_not"); + dispatch_sync(stream->queue(), ^(){ + id buffer = stream->commandBuffer(); + id commandEncoder = [buffer computeCommandEncoder]; + + id outBuf = __builtin_bit_cast(id, output.storage().data()); + id selfBuf = __builtin_bit_cast(id, self.storage().data()); + + [commandEncoder pushDebugGroup:@"Dispatch bitwise_not kernel"]; + [commandEncoder setComputePipelineState:cplState]; + [commandEncoder setBytes:&length length:sizeof(length) atIndex:0]; + [commandEncoder setBuffer:outBuf offset:output.storage_offset()*output.itemsize() atIndex:1]; + [commandEncoder setBuffer:selfBuf offset:self.storage_offset()*self.itemsize() atIndex:2]; + dispatch1DJob(commandEncoder, cplState, length); + [commandEncoder endEncoding]; + stream->commit(true); + }); + if (needs_output_copy) { + output_.copy_(output); + } + return output_; +} + TORCH_LIBRARY_IMPL(aten, MPS, m) { m.impl("bitwise_and.Tensor_out", bitwise_and_out_mps); m.impl("bitwise_or.Tensor_out", bitwise_or_out_mps); m.impl("bitwise_xor.Tensor_out", bitwise_xor_out_mps); + m.impl("bitwise_not.out", bitwise_not_out_mps); } } // anonymous namespace diff --git a/aten/src/ATen/native/mps/operations/Blas.mm b/aten/src/ATen/native/mps/operations/Blas.mm index 7ab34ac31401..20a3ec5eb6db 100644 --- a/aten/src/ATen/native/mps/operations/Blas.mm +++ b/aten/src/ATen/native/mps/operations/Blas.mm @@ -51,13 +51,36 @@ Tensor dot_mps( MPSGraphTensor *selfTensor = mps::mpsGraphRankedPlaceHolder(mpsGraph, self); MPSGraphTensor *otherTensor = mps::mpsGraphRankedPlaceHolder(mpsGraph, other); - MPSGraphTensor *dot = [mpsGraph multiplicationWithPrimaryTensor: selfTensor - secondaryTensor: otherTensor + MPSGraphTensor *castSelf = nil; + MPSGraphTensor *castOther = nil; + + if(self.scalar_type() == ScalarType::Short || self.scalar_type() == ScalarType::Byte + || self.scalar_type() == ScalarType::Char) { + castSelf = 
[mpsGraph castTensor:selfTensor + toType:MPSDataTypeInt32 + name:@"castSelfTensor"]; + castOther = [mpsGraph castTensor:otherTensor + toType:MPSDataTypeInt32 + name:@"castOtherTensor"]; + } else { + castSelf = selfTensor; + castOther = otherTensor; + } + + MPSGraphTensor *dot = [mpsGraph multiplicationWithPrimaryTensor: castSelf + secondaryTensor: castOther name: @"multiplication"]; MPSGraphTensor *dotProductTensor = [mpsGraph reductionSumWithTensor: dot axes: nil name: @"dotProduct"]; + + if(self.scalar_type() == ScalarType::Short || self.scalar_type() == ScalarType::Byte + || self.scalar_type() == ScalarType::Char) + dotProductTensor = [mpsGraph castTensor:dotProductTensor + toType:getMPSDataType(self.scalar_type()) + name:@"castDotProductTensor"]; + newCachedGraph->selfTensor_ = selfTensor; newCachedGraph->otherTensor_ = otherTensor; newCachedGraph->outputTensor_ = dotProductTensor; diff --git a/aten/src/ATen/native/mps/operations/ConstantOps.mm b/aten/src/ATen/native/mps/operations/ConstantOps.mm index 0cfd7ccc2ff5..a5ddd82a229e 100644 --- a/aten/src/ATen/native/mps/operations/ConstantOps.mm +++ b/aten/src/ATen/native/mps/operations/ConstantOps.mm @@ -35,11 +35,15 @@ MPSGraph *mpsGraph = make_mps_graph(); newCachedGraph = new CachedGraph(mpsGraph); auto isBool = self.scalar_type() == c10::ScalarType::Bool; - auto dataType = (!isBool) ? getMPSScalarType(self.scalar_type()) : MPSDataTypeInt8; + auto isUInt8 = self.scalar_type() == c10::ScalarType::Byte; + auto dataType = !isUInt8 ? !isBool ? getMPSScalarType(self.scalar_type()) : MPSDataTypeInt8 : MPSDataTypeUInt32; // constantWithScalar does not work for boolTypes on MacOS-12.[34] // workaround by filing it as int8 tensor and than casting to bool // See https://github.com/pytorch/pytorch/issues/82427 - MPSGraphTensor* inputTensor = [mpsGraph constantWithScalar:value.toDouble() + // constantWithScalar does not work for UInt8 Types on MacOS-12.[34]/Ventura preview + // workaround by filing it as uint32 tensor and than casting to uint8 + // See https://github.com/pytorch/pytorch/issues/83692 + MPSGraphTensor* inputTensor = [mpsGraph constantWithScalar: value.toDouble() shape:getMPSShape(self) dataType:dataType]; MPSGraphTensor* outputTensor = [mpsGraph identityWithTensor:inputTensor @@ -49,6 +53,11 @@ toType:MPSDataTypeBool name:@"constWithBool-workaround"]; } + if (isUInt8) { + outputTensor = [mpsGraph castTensor:outputTensor + toType:MPSDataTypeUInt8 + name:@"constWithUInt8-workaround"]; + } newCachedGraph->outputTensor_ = outputTensor; } diff --git a/aten/src/ATen/native/mps/operations/Convolution.mm b/aten/src/ATen/native/mps/operations/Convolution.mm index 2c74dcf07667..88bad9a5872a 100644 --- a/aten/src/ATen/native/mps/operations/Convolution.mm +++ b/aten/src/ATen/native/mps/operations/Convolution.mm @@ -39,6 +39,19 @@ void fill_conv_desc(MPSGraphConvolution2DOpDescriptor* descriptor_, descriptor_.groups = groups; } +static +MPSShape* get_mps_conv_shape(const Tensor& tensor, bool is_channels_last) { + if (is_channels_last) { + const auto tensorSizes = tensor.sizes(); + const NSUInteger N = tensorSizes[0]; + const NSUInteger C = tensorSizes[1]; + const NSUInteger H = tensorSizes[2]; + const NSUInteger W = tensorSizes[3]; + return @[@(N), @(H), @(W), @(C)]; + } + return at::native::mps::getMPSShape(tensor); +} + Tensor _mps_convolution( const Tensor& input_t, const Tensor& weight_t, @@ -47,6 +60,8 @@ Tensor _mps_convolution( IntArrayRef stride, IntArrayRef dilation, int64_t groups) { + TORCH_CHECK(input_t.dim() < 5, "Conv3D is not 
supported on MPS"); + namespace native_mps = at::native::mps; CheckedFrom c = "mps_convolution"; TensorArg input { input_t, "input", 1 }, @@ -124,19 +139,7 @@ Tensor _mps_convolution( + mps::getTensorsStringKey({input_t, weight_t}) + ":" + to_string(bias_defined) + ":" + bias_shape_key; CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); - MPSShape* inputShape = nil; - - if (is_channels_last) { - const auto inputSizes = input_t.sizes(); - const NSUInteger N = inputSizes[0]; - const NSUInteger C = inputSizes[1]; - const NSUInteger H = inputSizes[2]; - const NSUInteger W = inputSizes[3]; - inputShape = @[@(N), @(H), @(W), @(C)]; - } else { - inputShape = native_mps::getMPSShape(input_t); - } - + MPSShape* inputShape = get_mps_conv_shape(input_t, is_channels_last); if(!cachedGraph) { native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { @@ -147,8 +150,8 @@ Tensor _mps_convolution( newCachedGraph = new CachedGraph(mpsGraph); MPSGraphConvolution2DOpDescriptor *descriptor_ = [[MPSGraphConvolution2DOpDescriptor new] autorelease]; - fill_conv_desc(descriptor_, stride[0], stride[1], - dilation[0], dilation[1], + fill_conv_desc(descriptor_, stride[1], stride[0], + dilation[1], dilation[0], padding[1], padding[0], memory_format, groups); @@ -229,7 +232,7 @@ Tensor mps_convolution_backward_input( c10::nullopt, kMPS, c10::nullopt, - memory_format); + c10::nullopt); // Avoid "grad_input" when this is being used as transposed convolution TensorArg grad_input{ grad_input_t, "result", 0 }; @@ -264,9 +267,7 @@ Tensor mps_convolution_backward_input( } MPSShape* mps_input_shape = getMPSShape(input_size); - NSString* ns_shape_key = [[mps_input_shape valueForKey:@"description"] componentsJoinedByString:@","]; - string key = "mps_convolution_backward_input:" + to_string(stride[0]) + ":" + to_string(stride[1]) + ":" + to_string(dilation[0]) + ":" + to_string(dilation[1]) + ":" + to_string(padding[0]) + ":" + to_string(padding[1]) + ":" @@ -285,8 +286,8 @@ Tensor mps_convolution_backward_input( newCachedGraph = new CachedGraph(mpsGraph); MPSGraphConvolution2DOpDescriptor *descriptor_ = [[MPSGraphConvolution2DOpDescriptor new] autorelease]; - fill_conv_desc(descriptor_, stride[0], stride[1], - dilation[0], dilation[1], + fill_conv_desc(descriptor_, stride[1], stride[0], + dilation[1], dilation[0], padding[1], padding[0], memory_format, groups); @@ -333,6 +334,9 @@ Tensor mps_convolution_backward_weights( using namespace mps; CheckedFrom c = "mps_convolution_backward_weights"; auto memory_format = input_t.suggest_memory_format(); + bool is_channels_last = (memory_format == at::MemoryFormat::ChannelsLast); + MPSShape* inputShape = get_mps_conv_shape(input_t, is_channels_last); + MPSShape* gradOutputShape = get_mps_conv_shape(grad_output_t, is_channels_last); // For uniformity with everything else, although it seems grad_weight // would be unambiguous too. 
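
One recurring fix in Convolution.mm above is the argument order passed to fill_conv_desc: PyTorch stores stride/dilation as {height, width}, while the calls now pass stride[1]/dilation[1] before stride[0]/dilation[0], which suggests the descriptor wants its X (width) component first. A tiny sketch of that reordering; the parameter convention is the editor's reading of the swap, not something stated in the patch.

#include <array>
#include <c10/util/ArrayRef.h>

// Reorder a PyTorch-style {H, W} pair into the (X, Y) order the 2D conv
// descriptor appears to take. Purely illustrative; the real code simply
// swaps the indices at each fill_conv_desc() call site.
std::array<int64_t, 2> hw_to_xy(c10::ArrayRef<int64_t> hw) {
  return {/*x=*/hw[1], /*y=*/hw[0]};
}
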
@@ -342,7 +346,7 @@ Tensor mps_convolution_backward_weights( checkAllSameType(c, {grad_output, input}); checkAllSameGPU(c, {grad_output, input}); - auto grad_weight_t = at::empty(weight_size, grad_output_t.options(), memory_format); + auto grad_weight_t = at::empty(weight_size, grad_output_t.options(), c10::nullopt); TensorArg grad_weight{ grad_weight_t, "result", 0 }; convolution_shape_check(c, input, grad_weight, grad_output, padding, stride, dilation, groups); @@ -375,9 +379,7 @@ Tensor mps_convolution_backward_weights( } MPSShape* mps_weight_shape = getMPSShape(weight_size); - NSString* ns_shape_key = [[mps_weight_shape valueForKey:@"description"] componentsJoinedByString:@","]; - string key = "mps_convolution_backward_weights:" + to_string(stride[0]) + ":" + to_string(stride[1]) + ":" + to_string(dilation[0]) + ":" + to_string(dilation[1]) + ":" + to_string(padding[0]) + ":" + to_string(padding[1]) + ":" @@ -396,13 +398,13 @@ Tensor mps_convolution_backward_weights( newCachedGraph = new CachedGraph(mpsGraph); MPSGraphConvolution2DOpDescriptor *descriptor_ = [[MPSGraphConvolution2DOpDescriptor new] autorelease]; - fill_conv_desc(descriptor_, stride[0], stride[1], - dilation[0], dilation[1], + fill_conv_desc(descriptor_, stride[1], stride[0], + dilation[1], dilation[0], padding[1], padding[0], memory_format, groups); - MPSGraphTensor* gradOutputTensor = native_mps::mpsGraphRankedPlaceHolder(mpsGraph, grad_output_t); - MPSGraphTensor* inputTensor = native_mps::mpsGraphRankedPlaceHolder(mpsGraph, input_t); + MPSGraphTensor* gradOutputTensor = native_mps::mpsGraphRankedPlaceHolder(mpsGraph, native_mps::getMPSScalarType(grad_output_t.scalar_type()), gradOutputShape); + MPSGraphTensor* inputTensor = native_mps::mpsGraphRankedPlaceHolder(mpsGraph, native_mps::getMPSScalarType(input_t.scalar_type()), inputShape); MPSGraphTensor* gradWeightTensor = [mpsGraph convolution2DWeightsGradientWithIncomingGradientTensor:gradOutputTensor sourceTensor:inputTensor @@ -419,8 +421,8 @@ Tensor mps_convolution_backward_weights( cachedGraph = static_cast(tmpCachedGraph); } - auto gradOutputPlaceholder = Placeholder(cachedGraph->gradOutputTensor_, grad_output_t); - auto inputPlaceholder = Placeholder(cachedGraph->inputTensor_, input_t); + auto gradOutputPlaceholder = Placeholder(cachedGraph->gradOutputTensor_, grad_output_t, gradOutputShape); + auto inputPlaceholder = Placeholder(cachedGraph->inputTensor_, input_t, inputShape); auto outputPlaceholder = Placeholder(cachedGraph->gradWeightTensor_, grad_weight_t); NSDictionary *feeds = @{ diff --git a/aten/src/ATen/native/mps/operations/Copy.mm b/aten/src/ATen/native/mps/operations/Copy.mm index 3c2ab0d6c2f8..99183d21030c 100644 --- a/aten/src/ATen/native/mps/operations/Copy.mm +++ b/aten/src/ATen/native/mps/operations/Copy.mm @@ -33,10 +33,29 @@ return (void*)alignedAddress; } +/** + * Computes number of elements one needs to transfer to preserve all the elements + */ +size_t compute_strided_size(const at::Tensor& t) { + size_t rc = 1; + if (t.numel() == 0) { + return 0; + } + for(const auto i: c10::irange(t.dim())) { + assert(t.size(i) > 0); + rc += (t.size(i) - 1) * t.stride(i); + } + return rc; +} + +bool is_strided_contiguous(const at::Tensor& t) { + return compute_strided_size(t) == t.numel(); +} + // Copy sourceBuffer into destBuffer, casting sourceBuffer to src.scalar_type(). // The shapes and dtypes are taken from dst and src, but their storage pointers are not used. 
void copy_cast_mps(at::Tensor& dst, const at::Tensor& src, - id destBuffer, id sourceBuffer) { + id destBuffer, id sourceBuffer, bool non_blocking = true) { using namespace mps; struct CachedGraph : public MPSCachedGraph @@ -84,6 +103,8 @@ void copy_cast_mps(at::Tensor& dst, const at::Tensor& src, NSDictionary* feeds = @{cachedGraph->inputTensor_: srcData}; NSDictionary* results = @{cachedGraph->outputTensor_: dstData}; runMPSGraph(stream, cachedGraph->graph(), feeds, results); + if (!non_blocking) + stream->synchronize(SyncType::COMMIT_AND_WAIT); } } @@ -113,38 +134,51 @@ void copy_cast_mps(at::Tensor& dst, const at::Tensor& src, src = src_; } id sourceBuffer = getMTLBufferStorage(src); - size_t src_total_size = src_.is_view() ? at::detail::computeStorageNbytesContiguous(src.sizes(), src.element_size(), src.storage_offset()) : - src.nbytes(); - size_t size_to_copy = src.nbytes(); - - // In case of dtype change, first convert src inplace - if (src_.dtype() != dst_.dtype()) { - copy_cast_mps(dst, src, sourceBuffer, sourceBuffer); - // Use the element size of dst to calculate the total size after casting - size_to_copy = (size_to_copy / src.element_size()) * dst.element_size(); - } - - // If there's anything wrong with source, we shouldn't return dst_ silently and must error out. - TORCH_INTERNAL_ASSERT(sourceBuffer && size_to_copy > 0); - TORCH_INTERNAL_ASSERT(src_total_size >= storage_byte_offset); - TORCH_INTERNAL_ASSERT(dst.nbytes() >= (dst.storage_offset() * dst.element_size())); + size_t dst_tensor_nbytes = dst.nbytes(); @autoreleasepool { MTLResourceOptions options = MTLResourceOptionCPUCacheModeDefault | MTLResourceStorageModeShared; NSUInteger alignedLength = 0; void* host_dst = dst.storage().data(); - void* alignedPtr = pageAlignedBlockPtr(host_dst, (NSUInteger)src_total_size, &alignedLength); + void* alignedPtr = pageAlignedBlockPtr(host_dst, (NSUInteger)dst_tensor_nbytes, &alignedLength); + NSUInteger destOffset = (uintptr_t(host_dst) - uintptr_t(alignedPtr)); + // 4 bytes alignment required on macos for blits. + TORCH_INTERNAL_ASSERT(destOffset % 4 == 0, "Unaligned blit request"); + id destBuffer = [device newBufferWithBytesNoCopy:alignedPtr length:alignedLength options:options deallocator:nil]; - NSUInteger destOffset = uintptr_t(host_dst) - uintptr_t(alignedPtr); - // 4 bytes alignment required on macos for blits. - TORCH_INTERNAL_ASSERT(destOffset % 4 == 0, "Unaligned blit request"); + id tmpBuffer = sourceBuffer; + Tensor tmp; + bool needsBlit = true; + if (src_.dtype() != dst.dtype()) { + if (destOffset == 0 && storage_byte_offset == 0) { + // Return the casted tensor directly if there's no destination offset + needsBlit = false; + tmpBuffer = destBuffer; + } else if (src.element_size() < dst.element_size()) { + tmp = at::native::empty_mps(dst.sizes(), dst.scalar_type(), c10::nullopt, kMPS); + tmpBuffer = getMTLBufferStorage(tmp); + } + } - stream->copy_and_sync(sourceBuffer, destBuffer, size_to_copy, storage_byte_offset, destOffset, non_blocking); - [destBuffer release]; + size_t size_to_copy = src.nbytes(); + // In case of dtype change, first convert src inplace + if (src_.dtype() != dst.dtype()) { + copy_cast_mps(dst, src, tmpBuffer, sourceBuffer, non_blocking); + } + + if (needsBlit) { + size_to_copy = (size_to_copy / src.element_size()) * dst.element_size(); + + // If there's anything wrong with source, we shouldn't return dst_ silently and must error out. 
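The MPS-to-CPU path above wraps the destination host memory in a no-copy Metal buffer, which requires a page-aligned base address, and then blits into it at destOffset, which the patch asserts is 4-byte aligned. A hedged sketch of the page-alignment arithmetic behind pageAlignedBlockPtr, using only POSIX calls (the body is illustrative, not the patch's implementation):

    #include <cstddef>
    #include <cstdint>
    #include <unistd.h>   // sysconf

    // Round a host pointer down to the start of its page and round the length
    // up so [aligned, aligned + len) still covers the original block; the
    // caller then blits at offset (ptr - aligned).
    void* page_aligned_block(const void* ptr, size_t size, size_t* aligned_len) {
      const uintptr_t page = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
      const uintptr_t addr = reinterpret_cast<uintptr_t>(ptr);
      const uintptr_t aligned = addr & ~(page - 1);               // page start
      const uintptr_t end = (addr + size + page - 1) & ~(page - 1);
      *aligned_len = static_cast<size_t>(end - aligned);
      return reinterpret_cast<void*>(aligned);
    }

The destination offset used for the blit is then addr - aligned, which is what the 4-byte alignment check above guards.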
+ TORCH_INTERNAL_ASSERT(sourceBuffer && dst_tensor_nbytes > 0); + + stream->copy_and_sync(tmpBuffer, destBuffer, size_to_copy, storage_byte_offset, destOffset, non_blocking); + [destBuffer release]; + } } if (!dst.is_same(dst_)) { dst_.copy_(dst, non_blocking); @@ -153,55 +187,60 @@ void copy_cast_mps(at::Tensor& dst, const at::Tensor& src, return dst_; } -static at::Tensor& copy_to_mps_(at::Tensor& dst_, const at::Tensor& src_, bool non_blocking) +// Copies tensor from cpu to mps backed by identical strided-contiguous data +static void copy_to_mps_stride_contig(at::Tensor& dst, const at::Tensor& src, bool non_blocking) { MPSStream* stream = getCurrentMPSStream(); - Tensor dst; - Tensor src; - id device = MPSDevice::getInstance()->device(); - auto dst_byte_offset = dst_.storage_offset() * dst_.itemsize(); - id destBuffer = getMTLBufferStorage(dst_); - uint64_t src_total_size = 0; - - if (src_.is_view()) { - src = src_.to(dst_.dtype()).expand_as(dst_).contiguous(); - // Get the actual size of a View (takes into account the storage offset) - // For View tensors, the storage offset can be bigger than what's being reported by nbytes - src_total_size = at::detail::computeStorageNbytesContiguous(src.sizes(), src.element_size(), src.storage_offset()); - } else { - src = src_; - if (src.dtype() != dst_.dtype()) { - // In case of dtype change, perform conversion on source device - src = src.to(dst_.dtype()); - } - src_total_size = src.nbytes(); - } - + auto dst_byte_offset = dst.storage_offset() * dst.itemsize(); + auto src_byte_offset = src.storage_offset() * src.itemsize(); + id destBuffer = getMTLBufferStorage(dst); const size_t size_to_copy = src.nbytes(); - const void* host_src = src.storage().data(); - TORCH_INTERNAL_ASSERT(src_total_size >= (src.storage_offset() * src.element_size())); - TORCH_INTERNAL_ASSERT(dst_.nbytes() >= dst_byte_offset); + const void* host_src = static_cast(src.storage().data()) + src_byte_offset; + + TORCH_INTERNAL_ASSERT(src.dtype() == dst.dtype() && src.strides() == dst.strides() && is_strided_contiguous(src)); - NSUInteger sourceOffset = 0; @autoreleasepool { MTLResourceOptions options = MTLResourceOptionCPUCacheModeDefault | MTLResourceStorageModeShared; NSUInteger alignedLength = 0; + NSUInteger sourceOffset = 0; - void* alignedPtr = pageAlignedBlockPtr(host_src, (NSUInteger)src_total_size, &alignedLength); + void* alignedPtr = pageAlignedBlockPtr(host_src, (NSUInteger)size_to_copy, &alignedLength); id sourceBuffer = [device newBufferWithBytesNoCopy:alignedPtr length:alignedLength options:options deallocator:nil]; sourceOffset = uintptr_t(host_src) - uintptr_t(alignedPtr); - if (src_.is_view() || !src_.is_contiguous()) - sourceOffset += src_.storage_offset() * src_.itemsize(); stream->copy_and_sync(sourceBuffer, destBuffer, size_to_copy, sourceOffset, dst_byte_offset, non_blocking); [sourceBuffer release]; } +} - return dst_; +static at::Tensor& copy_to_mps_(at::Tensor& dst_, const at::Tensor& src_, bool non_blocking) +{ + // Typecast to dst_ if needed and expand, which is a no-op + Tensor src = (src_.dtype() != dst_.dtype() ? 
src_.to(dst_.dtype()) : src_).expand_as(dst_); + + // If src is not contiguously strided it must be cloned + // It does not mean that tensor is contiguous, but rather + // that it could be represented as 1d view + if (!is_strided_contiguous(src)) { + src = src.clone(); + TORCH_INTERNAL_ASSERT(is_strided_contiguous(src)); + } + Tensor dst = dst_; + bool needs_copy = false; + // If src and dst_ strides do not match, it means that + // either dst_ is not representable as 1d view or its stride order is different + // in that case create an empty storage like src, copy it to device and then do + // reshaping on the device + if (src.strides() != dst_.strides()) { + needs_copy = true; + dst = at::empty_like(src, at::device(at::kMPS)); + } + copy_to_mps_stride_contig(dst, src, non_blocking && !needs_copy); + return needs_copy? dst_.copy_(dst) : dst_; } void copy_blit_mps(void* dst, const void* src, size_t size) { @@ -235,17 +274,29 @@ void copy_blit_mps(void* dst, const void* src, size_t size) { } else { src = src_; } + id destBuffer = getMTLBufferStorage(dst_); + id sourceBuffer = getMTLBufferStorage(src); + // Scatter to `dst` if the memory is not contiguous // If the memory is not contiguous, it means that the tensor has strides and we would not be // able to do the copy using a single blit if (!dst_.is_contiguous()) { - return scatterViewTensor(src, dst_); + Tensor tmp; + if (src.dtype() != dst_.dtype()) { + id tmpBuffer = sourceBuffer; + if (src.element_size() < dst_.element_size()) { + tmp = at::native::empty_mps(dst_.sizes(), dst_.scalar_type(), c10::nullopt, kMPS); + tmpBuffer = getMTLBufferStorage(tmp); + } + + copy_cast_mps(dst_, src, tmpBuffer, sourceBuffer); + } + + return scatterViewTensor((src.dtype() != dst_.dtype() && tmp.has_storage()) ? tmp : src, dst_); } src._set_conj(src_.is_conj()); src._set_neg(src_.is_neg()); - id destBuffer = getMTLBufferStorage(dst_); - id sourceBuffer = getMTLBufferStorage(src); const size_t src_size = src.nbytes(); if (src.dtype() == dst_.dtype()) { MPSStream* stream = getCurrentMPSStream(); diff --git a/aten/src/ATen/native/mps/operations/Distributions.mm b/aten/src/ATen/native/mps/operations/Distributions.mm index 999b1cc79d5b..99d01c6825b3 100644 --- a/aten/src/ATen/native/mps/operations/Distributions.mm +++ b/aten/src/ATen/native/mps/operations/Distributions.mm @@ -3,429 +3,302 @@ #include #include #include +#include namespace at { namespace native { +namespace mps { -Tensor& uniform_mps_(Tensor& input, double from, double to, c10::optional gen_) +struct RandomCachedGraph : public MPSCachedGraph { - using namespace mps; - - if (input.numel() == 0) { - return input; + RandomCachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) { + // initialize Philox state values (only required once when graph is created) + const auto seed = c10::detail::getNonDeterministicRandom(); + const auto subsequence = c10::detail::getNonDeterministicRandom(); + philoxState = at::Philox4_32(seed, subsequence); + // the two last state values are the Philox keys which are initialized once only + stateValues[5] = static_cast(seed); + stateValues[6] = static_cast(seed >> 32); + } + // Only relevant for multinomial + MPSGraphTensor *probTensor = nil; + MPSGraphTensor *resultTensor = nil; + MPSGraphTensor *stateTensor = nil; + // used for Normal distributions only + MPSGraphTensor *meanTensor = nil, *stdTensor = nil; + // we initialize and keep the philox's state in the graph. This would + // guarantee producing new random values each time the same graph is reused. 
+ at::Philox4_32 philoxState; + std::array stateValues = {1}; + + void updatePhiloxCounters() { + // calling philoxState() would call operator() of philox_engine class to + // get each of the four newly generated counter values (see PhiloxRNGEngine.h). + for (int i = 1; i <= 4; i++) + stateValues[i] = philoxState(); + } +}; + +typedef MPSGraphTensor* (^RandomOpBlock)(RandomCachedGraph*, MPSGraphTensor*); +#define RandomOpFn(graph, randomTensor) MPSGraphTensor* (mps::RandomCachedGraph* graph, MPSGraphTensor* randomTensor) + +// for Uniform distributions with scalar from (val1) and to (val2) intervals +// for Normal distributions with scalar mean (val1) and std (val2) values +template +Tensor& random_mps_impl(Tensor& self, scalar_t val1, scalar_t val2, + const c10::optional& mean_opt, + const c10::optional& std_opt, + MPSGraphRandomDistribution distribution, + std::string op_name, RandomOpBlock randomBlock) +{ + if (self.numel() == 0) { + return self; } - double delta = (to - from); - AT_DISPATCH_FLOATING_TYPES(input.scalar_type(), "check_uniform_bounds", [&] { - const auto min = static_cast(std::numeric_limits::lowest()); - const auto max = static_cast(std::numeric_limits::max()); - TORCH_CHECK(from <= to, "uniform_ expects to return a [from, to) range, but found from=", from, " > to=", to); - TORCH_CHECK((to - from) <= std::numeric_limits::max(), - "uniform_ expects to-from <= std::numeric_limits<", toString(input.scalar_type()), - ">::max(), but found to=", to, " and from=", from, - " which result in to-from to exceed the limit"); - from = std::min(std::max(from, min), max); - to = std::max(std::min(to, max), min); - }); - - struct CachedGraph : public MPSCachedGraph - { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} - MPSGraphTensor *outputTensor_ = nil; - }; - MPSGraphCache* cache_ = MPSGraphCache::getInstance(); - MPSStream* stream = getCurrentMPSStream(); - uint64_t seed_ = c10::detail::getNonDeterministicRandom(true); @autoreleasepool { - MPSShape* input_shape = getMPSShape(input); - string key = "uniform_mps_" + getTensorsStringKey(input) + ":" + to_string(from) + ":" + to_string(to) + ":" + to_string(seed_); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); - - if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + string key = op_name + getTensorsStringKey({self}) + ":" + to_string(val1) + ":" + to_string(val2); + auto cachedGraph = cache_->LookUpAs(key); - CachedGraph *newCachedGraph = nil; + if (!cachedGraph) { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { + RandomCachedGraph *newCachedGraph = nil; @autoreleasepool { MPSGraph* mpsGraph = make_mps_graph(); - newCachedGraph = new CachedGraph(mpsGraph); - - // TODO: right now taking the default seed. 
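RandomCachedGraph keeps a 7-element int32 state for the graph's Philox-based RNG: the diff initializes element 0 to 1, refreshes elements 1 through 4 (the counter words) from the Philox engine in updatePhiloxCounters before every run, and stores the two seed-derived key words in elements 5 and 6. A minimal sketch of that bookkeeping without any MPS types (std::mt19937 is only a stand-in for at::Philox4_32 here):

    #include <array>
    #include <cstdint>
    #include <random>

    // Stand-in for the cached graph's RNG state: [flag, c0..c3, key_lo, key_hi].
    struct PhiloxStateSketch {
      std::array<int32_t, 7> state{1};   // element 0 = 1, the rest start at zero
      std::mt19937 engine;               // stand-in for the Philox engine

      explicit PhiloxStateSketch(uint64_t seed) : engine(static_cast<uint32_t>(seed)) {
        state[5] = static_cast<int32_t>(seed);         // key, low word
        state[6] = static_cast<int32_t>(seed >> 32);   // key, high word
      }

      // Called before every run of the same cached graph so it keeps
      // producing fresh random values.
      void update_counters() {
        for (int i = 1; i <= 4; ++i)
          state[i] = static_cast<int32_t>(engine());
      }
    };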
Extend it to be extracted from the - // MPSGenerator - MPSGraphTensor* randomTensor = [mpsGraph randomUniformTensorWithShape:input_shape - seed:seed_ - name:nil]; - MPSGraphTensor* deltaTensor = [mpsGraph constantWithScalar:delta - shape:input_shape - dataType:MPSDataTypeFloat32]; - MPSGraphTensor* fromTensor = [mpsGraph constantWithScalar:from - shape:input_shape - dataType:MPSDataTypeFloat32]; - MPSGraphTensor* mulTensor = [mpsGraph multiplicationWithPrimaryTensor:randomTensor - secondaryTensor:deltaTensor - name:nil]; - MPSGraphTensor* outputTensor = [mpsGraph additionWithPrimaryTensor:mulTensor - secondaryTensor:fromTensor - name:nil]; - newCachedGraph->outputTensor_ = outputTensor; - + newCachedGraph = new RandomCachedGraph(mpsGraph); + newCachedGraph->stateTensor = mpsGraphRankedPlaceHolder(mpsGraph, MPSDataTypeInt32, @[@7]); + + // FP16, FP32 and Int32 are the only data types supported for distributions on MPS backend. + const MPSDataType inputDataType = [&] { + // only for random_mps, we pass interval range of type int64_t + if (std::is_same::value) + return MPSDataTypeInt32; + else + return (self.scalar_type() == ScalarType::Half) ? MPSDataTypeFloat16 : MPSDataTypeFloat32; + }(); + const MPSDataType outputDataType = (std::is_same::value) ? MPSDataTypeBool : inputDataType; + + MPSGraphRandomOpDescriptor *desc = [MPSGraphRandomOpDescriptor descriptorWithDistribution: distribution + dataType: inputDataType]; + if (distribution == MPSGraphRandomDistributionUniform) { + if (inputDataType == MPSDataTypeInt32) { + desc.minInteger = static_cast(val1); + desc.maxInteger = static_cast(val2); + } else { + desc.min = static_cast(val1); + desc.max = static_cast(val2); + } + } else if (distribution == MPSGraphRandomDistributionNormal) { + desc.mean = static_cast(val1); + desc.standardDeviation = static_cast(val2); + } + // we don't use the output state tensor from the MPSGraph API as it requires reading back from GPU to CPU. + // Instead, we keep the Philox state in the cached graph and use the PyTorch's philox_engine to maintain + // the counters, and feed them to the graph manually + NSArray *resultTensors = [mpsGraph randomTensorWithShape: getMPSShape(self) + descriptor: desc + stateTensor: newCachedGraph->stateTensor + name: nil]; + newCachedGraph->resultTensor = randomBlock ? randomBlock(newCachedGraph, resultTensors[0]) : resultTensors[0]; + // results will be cast if self's scalar type isn't directly supported by MPS backend. 
+ if (getMPSDataType(self.scalar_type()) != outputDataType) + newCachedGraph->resultTensor = castMPSTensor(mpsGraph, newCachedGraph->resultTensor, self.scalar_type()); } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); + } + // update the Philox state values on each run of the same graph + cachedGraph->updatePhiloxCounters(); + // feed the updated state values to the graph + MPSNDArrayDescriptor *stateDesc = [MPSNDArrayDescriptor descriptorWithDataType: MPSDataTypeInt32 shape: @[@7]]; + MPSNDArray *stateNDArray = [[[MPSNDArray alloc] initWithDevice: stream->device() descriptor: stateDesc] autorelease]; + [stateNDArray writeBytes: &cachedGraph->stateValues[0] strideBytes: nil]; + MPSGraphTensorData* stateTensorData = [[[MPSGraphTensorData alloc] initWithMPSNDArray: stateNDArray] autorelease]; + + Placeholder meanPlaceholder, stdPlaceholder; + NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; + feeds[cachedGraph->stateTensor] = stateTensorData; + + if (cachedGraph->stdTensor) { + const Tensor& stdTensor = *(at::borrow_from_optional_tensor(std_opt)); + stdPlaceholder = Placeholder(cachedGraph->stdTensor, stdTensor); + feeds[stdPlaceholder.getMPSGraphTensor()] = stdPlaceholder.getMPSGraphTensorData(); + } + if (cachedGraph->meanTensor) { + const Tensor& meanTensor = *(at::borrow_from_optional_tensor(mean_opt)); + meanPlaceholder = Placeholder(cachedGraph->meanTensor, meanTensor); + feeds[meanPlaceholder.getMPSGraphTensor()] = meanPlaceholder.getMPSGraphTensorData(); } - auto outputPlaceholder = Placeholder(cachedGraph->outputTensor_, input); - NSDictionary *feeds = nil; - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() + Placeholder outputPlaceholder = Placeholder(cachedGraph->resultTensor, self); + NSDictionary *results = @{ + outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData(), }; runMPSGraph(stream, cachedGraph->graph(), feeds, results); + } + + return self; +} +Tensor& normal_mps_impl(Tensor& self, double mean_s, double std_s, + const c10::optional& mean_opt, + const c10::optional& std_opt, + std::string op_name) +{ + const Tensor& std_t = *(at::borrow_from_optional_tensor(std_opt)); + const Tensor& mean_t = *(at::borrow_from_optional_tensor(mean_opt)); + + TORCH_CHECK(std_s >= 0.0, op_name, " expects std >= 0.0, but found std=", std_s); + if (std_t.defined()) { + TORCH_CHECK(!std_t.is_complex(), op_name, " expects standard deviation to be non-complex"); + if (mean_t.defined()) + TORCH_CHECK(mean_t.numel() == std_t.numel(), op_name, ": mean and std must have same number of elements") } - return input; + RandomOpBlock random_op_block = ^RandomOpFn(cachedGraph, randomTensor) { + MPSGraph* mpsGraph = cachedGraph->graph(); + MPSGraphTensor* resultTensor = randomTensor; + + if (std_t.defined()) { + cachedGraph->stdTensor = mpsGraphRankedPlaceHolder(mpsGraph, std_t); + resultTensor = [mpsGraph multiplicationWithPrimaryTensor: randomTensor + secondaryTensor: cachedGraph->stdTensor + name: nil]; + } + if (mean_t.defined()) { + cachedGraph->meanTensor = mpsGraphRankedPlaceHolder(mpsGraph, mean_t); + return [mpsGraph additionWithPrimaryTensor: resultTensor + secondaryTensor: cachedGraph->meanTensor + name: nil]; + } + return resultTensor; + }; + return random_mps_impl(self, mean_s, std_s, mean_opt, std_opt, + MPSGraphRandomDistributionNormal, + op_name + getTensorsStringKey({mean_t, std_t}), random_op_block); + +} + +Tensor& bernoulli_mps_impl(Tensor& self, const 
Tensor& prob_t, std::string op_name) +{ + TORCH_CHECK(prob_t.is_same_size(self), op_name, ": probability and self tensor should be of the same shape") + + RandomOpBlock random_op_block = ^RandomOpFn(cachedGraph, randomTensor) { + MPSGraph* mpsGraph = cachedGraph->graph(); + cachedGraph->stdTensor = mpsGraphRankedPlaceHolder(mpsGraph, prob_t); + return [mpsGraph lessThanWithPrimaryTensor: randomTensor + secondaryTensor: cachedGraph->stdTensor + name: nil]; + }; + // Bernoulli generates binary output so we use bool type + return mps::random_mps_impl(self, 0.0, 1.0, c10::nullopt, prob_t, + MPSGraphRandomDistributionUniform, + op_name + getTensorsStringKey({prob_t}), random_op_block); +} + +} // namespace mps + +Tensor& uniform_mps_(Tensor& self, double from, double to, c10::optional gen) { + AT_DISPATCH_FLOATING_TYPES_AND_HALF(self.scalar_type(), "check_uniform_bounds", [&] { + const auto min = static_cast(std::numeric_limits::lowest()); + const auto max = static_cast(std::numeric_limits::max()); + TORCH_CHECK(from <= to, "uniform_ expects to return a [from, to) range, but found from=", from, " > to=", to); + TORCH_CHECK((to - from) <= std::numeric_limits::max(), + "uniform_ expects to-from <= std::numeric_limits<", toString(self.scalar_type()), + ">::max(), but found to=", to, " and from=", from, + " which result in to-from to exceed the limit"); + from = std::min(std::max(from, min), max); + to = std::max(std::min(to, max), min); + }); + + return mps::random_mps_impl(self, from, to, c10::nullopt, c10::nullopt, + MPSGraphRandomDistributionUniform, __func__, nullptr); } Tensor& normal_mps_(Tensor& self, double mean, double std, c10::optional gen) { - if (self.numel() == 0) - return self; - TORCH_CHECK(std >= 0.0, "normal_mps_ expects std >= 0.0, but found std=", std); - - Tensor mean_t = empty_mps( - self.sizes(), - self.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - mean_t.fill_(mean); - - Tensor std_t = empty_mps( - self.sizes(), - self.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - std_t.fill_(std); - - return normal_mps_out(mean_t, std_t, gen, self); + return mps::normal_mps_impl(self, mean, std, c10::nullopt, c10::nullopt, __func__); } Tensor normal_mps(const Tensor& mean, double std, c10::optional gen) { - Tensor output = empty_mps( - mean.sizes(), - mean.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - - Tensor std_t = empty_mps( - output.sizes(), - output.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - std_t.fill_(std); - - return normal_mps_out(mean, std_t, gen, output); + Tensor self = empty_mps(mean.sizes(), mean.scalar_type(), c10::nullopt, kMPS); + return mps::normal_mps_impl(self, 0.0, std, mean, c10::nullopt, __func__); } Tensor normal_mps(double mean, const Tensor& std, c10::optional gen) { - Tensor output = empty_mps( - std.sizes(), - std.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - - Tensor mean_t = empty_mps( - output.sizes(), - output.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - mean_t.fill_(mean); - - return normal_mps_out(mean_t, std, gen, output); + Tensor self = empty_mps(std.sizes(), std.scalar_type(), c10::nullopt, kMPS); + // when there's no tensor-type mean, we cannot pass scalar mean value due to the order of + // multiply/add ops in random computation. So we create a mean tensor instead. 
+ Tensor mean_t = at::full_like(self, Scalar(mean)); + return mps::normal_mps_impl(self, 0.0, 1.0, mean_t, std, __func__); } Tensor normal_mps(const Tensor& mean, const Tensor& std, c10::optional gen) { auto shape = at::infer_size(mean.sizes(), std.sizes()); - - Tensor output = empty_mps( - shape, - mean.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - - return normal_mps_out(mean, std, gen, output); + Tensor self = empty_mps(shape, mean.scalar_type(), c10::nullopt, kMPS); + return mps::normal_mps_impl(self, 0.0, 1.0, mean, std, __func__); } -Tensor& normal_mps_out(const Tensor& mean, double std, c10::optional gen, Tensor& output) { - TORCH_CHECK(std >= 0.0, "normal_mps_out expects std >= 0.0, but found std=", std); - - Tensor std_t = empty_mps( - output.sizes(), - output.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - std_t.fill_(std); - - return normal_mps_out(mean, std_t, gen, output); - +Tensor& normal_mps_out(const Tensor& mean, double std, c10::optional gen, Tensor& self) { + return mps::normal_mps_impl(self, 0.0, std, mean, c10::nullopt, __func__); } -Tensor& normal_mps_out(double mean, const Tensor& std, c10::optional gen, Tensor& output) { - Tensor mean_t = empty_mps( - output.sizes(), - output.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - mean_t.fill_(mean); - - return normal_mps_out(mean_t, std, gen, output); - +Tensor& normal_mps_out(double mean, const Tensor& std, c10::optional gen, Tensor& self) { + // when there's no tensor-type mean, we cannot pass scalar mean value due to the order of + // multiply/add ops in random computation. So we create a mean tensor instead. + Tensor mean_t = at::full_like(self, Scalar(mean)); + return mps::normal_mps_impl(self, 0.0, 1.0, mean_t, std, __func__); } -Tensor& normal_mps_out(const Tensor& mean, const Tensor& std, c10::optional gen, Tensor& output) { - TORCH_CHECK(!std.is_complex(), "normal expects standard deviation to be non-complex"); - // Check that mean and std have same number of elements +Tensor& normal_mps_out(const Tensor& mean, const Tensor& std, c10::optional gen, Tensor& self) { TORCH_CHECK(mean.numel() == std.numel(), "normal_mps_out: mean and std must have same number of elements") - - using namespace mps; - - struct CachedGraph : public MPSCachedGraph - { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} - MPSGraphTensor* outputTensor_ = nil; - MPSGraphTensor* meanTensor_ = nil; - MPSGraphTensor* stdTensor_ = nil; - }; - - MPSGraphCache* cache_ = MPSGraphCache::getInstance(); - - MPSStream* stream = getCurrentMPSStream(); - uint64_t seed_ = c10::detail::getNonDeterministicRandom(true); - - @autoreleasepool { - MPSShape* input_shape = getMPSShape(output); - string key = "normal_mps_out:" + getMPSShapeString(input_shape) + ":" + getMPSTypeString(output.scalar_type()) + ":" + to_string(seed_); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); - - if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { - - CachedGraph *newCachedGraph = nil; - - @autoreleasepool { - MPSGraph* mpsGraph = make_mps_graph(); - newCachedGraph = new CachedGraph(mpsGraph); - - MPSGraphRandomOpDescriptor* desc = [[MPSGraphRandomOpDescriptor new] autorelease]; - desc.distribution = MPSGraphRandomDistributionNormal; - desc.dataType = getMPSDataType(output.scalar_type()); - desc.mean = 0.0; - desc.standardDeviation = 1.0; - - MPSGraphTensor* meanTensor = mpsGraphRankedPlaceHolder(mpsGraph, 
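The normal path composes the result as an affine transform of a unit normal draw, result = z * std + mean with z ~ N(0, 1), which is why the scalar-mean overloads above materialize the mean as a full_like tensor whenever std is a tensor: the per-element multiply happens first, so the scalar mean can no longer be folded into the descriptor. A tiny CPU-side sketch of that transform (illustrative only):

    #include <random>

    // Affine transform applied to a unit-normal sample: z * std + mean.
    double normal_affine_sketch(double mean, double std, std::mt19937& gen) {
      std::normal_distribution<double> unit(0.0, 1.0);
      return unit(gen) * std + mean;
    }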
getMPSDataType(output.scalar_type()), input_shape); - MPSGraphTensor* stdTensor = mpsGraphRankedPlaceHolder(mpsGraph, getMPSDataType(output.scalar_type()), input_shape); - - // TODO: right now taking the default seed. Extend it to be extracted from the - // MPSGenerator - MPSGraphTensor* randomTensor = [mpsGraph randomTensorWithShape:input_shape - descriptor:desc - seed:seed_ - name:nil]; - MPSGraphTensor* scaleTensor = [mpsGraph multiplicationWithPrimaryTensor:randomTensor - secondaryTensor:stdTensor - name:nil]; - MPSGraphTensor* outputTensor = [mpsGraph additionWithPrimaryTensor:scaleTensor - secondaryTensor:meanTensor - name:nil]; - newCachedGraph->meanTensor_ = meanTensor; - newCachedGraph->stdTensor_ = stdTensor; - newCachedGraph->outputTensor_ = outputTensor; - - } - return newCachedGraph; - }); - cachedGraph = static_cast(tmpCachedGraph); - } - - auto meanPlaceholder = Placeholder(cachedGraph->meanTensor_, mean); - auto stdPlaceholder = Placeholder(cachedGraph->stdTensor_, std); - auto outputPlaceholder = Placeholder(cachedGraph->outputTensor_, output); - NSDictionary *feeds = @{ - meanPlaceholder.getMPSGraphTensor() : meanPlaceholder.getMPSGraphTensorData(), - stdPlaceholder.getMPSGraphTensor() : stdPlaceholder.getMPSGraphTensorData() - }; - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() - }; - - runMPSGraph(stream, cachedGraph->graph(), feeds, results); - - } - - return output; + return mps::normal_mps_impl(self, 0.0, 1.0, mean, std, __func__); } Tensor& bernoulli_out_mps(const Tensor& p_, c10::optional gen, Tensor& result) { result.resize_(p_.sizes()); - return bernoulli_mps_(result, p_, gen); + return mps::bernoulli_mps_impl(result, p_, __func__); } Tensor& bernoulli_mps_(Tensor& self, double p, c10::optional gen) { - TORCH_CHECK(0 <= p && p <= 1, "bernoulli_mps_ expects p to be in [0, 1], but got p=", p); - Tensor p_t = empty_mps( - self.sizes(), - self.scalar_type(), - c10::nullopt, - kMPS, - c10::nullopt, - c10::nullopt); - p_t.fill_(p); - - return bernoulli_mps_(self, p_t, gen); + TORCH_CHECK(0.0 <= p && p <= 1.0, "bernoulli_mps_ expects p to be in [0, 1], but got p=", p); + Tensor prob_t = at::full_like(self, Scalar(p)); + return mps::bernoulli_mps_impl(self, prob_t, __func__); } Tensor& bernoulli_mps_(Tensor& self, const Tensor& p_, c10::optional gen) { - TORCH_CHECK(self.is_same_size(p_), "bernoulli_mps_: probability and self tensor should be of the same shape") - - using namespace mps; - - MPSStream* stream = getCurrentMPSStream(); - uint64_t seed_ = c10::detail::getNonDeterministicRandom(true); - - @autoreleasepool { - MPSShape* input_shape = getMPSShape(self); - - auto mps_dtype = getMPSDataType(p_.scalar_type()); - - MPSGraph* mpsGraph = make_mps_graph(); - - MPSGraphTensor* probTensor = mpsGraphRankedPlaceHolder(mpsGraph, mps_dtype, input_shape); - - // TODO: right now taking the default seed. 
Extend it to be extracted from the - // MPSGenerator - MPSGraphTensor* randomTensor = [mpsGraph randomUniformTensorWithShape:input_shape - seed:seed_ - name:nil]; - MPSGraphTensor* outputTensor = [mpsGraph lessThanWithPrimaryTensor:randomTensor - secondaryTensor:probTensor - name:nil]; - - auto probPlaceholder = Placeholder(probTensor, p_); - auto outputPlaceholder = Placeholder(outputTensor, self); - NSDictionary *feeds = @{ - probPlaceholder.getMPSGraphTensor() : probPlaceholder.getMPSGraphTensorData(), - }; - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() - }; - - runMPSGraph(stream, mpsGraph, feeds, results); - } - - return self; - + return mps::bernoulli_mps_impl(self, p_, __func__); } -// Taken from ATen/native/DistributionTemplates.h -#define CHECK_OUT_OF_BOUNDS(var, name, min, max, dtype) \ - TORCH_CHECK(var >= min && var <= max, name , " is out of bounds for ", dtype); \ - -#define WARN_OUT_OF_BOUNDS(var, name, digits, dtype) \ - if (var < -(1LL << digits) || var > (1LL << digits)) { \ - TORCH_WARN(name , " is out of bounds [-(2^", digits, "), 2^", digits, "]. ", \ - "Due to precision limitations ", dtype, " can support discrete uniform distribution only within this range. ", \ - "This warning will become an error in version 1.7 release, please fix the code in advance"); \ - } - -// Modified from ATen/native/DistributionTemplates.h -static void check_from_to_in_range(int64_t from, int64_t to_inc, ScalarType scalar_type) { - const auto dtype = scalarTypeToTypeMeta(scalar_type); - if (isFloatingType(scalar_type)) { - AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, scalar_type, "check_random_fp_bounds", [&] { - const auto min = static_cast(std::numeric_limits::lowest()); - const auto max = static_cast(std::numeric_limits::max()); - CHECK_OUT_OF_BOUNDS(from, "from", min, max, dtype); - CHECK_OUT_OF_BOUNDS(to_inc, "to - 1", min, max, dtype); - - constexpr auto digits = std::numeric_limits::digits; - WARN_OUT_OF_BOUNDS(from, "from", digits, dtype); - WARN_OUT_OF_BOUNDS(to_inc, "to - 1", digits, dtype); - }); - } else if (isIntegralType(scalar_type, /*includeBool=*/true)) { - AT_DISPATCH_INTEGRAL_TYPES_AND(at::ScalarType::Bool, scalar_type, "check_random_integral_bounds", [&]() { - const auto min = static_cast(std::numeric_limits::lowest()); - const auto max = static_cast(std::numeric_limits::max()); - CHECK_OUT_OF_BOUNDS(from, "from", min, max, dtype); - CHECK_OUT_OF_BOUNDS(to_inc, "to - 1", min, max, dtype); - }); - } else { - TORCH_CHECK(false, "check_random_bounds handles only integral, floating-point and boolean types"); - } -} - - // random_.from -Tensor& random_mps_ - (Tensor& self, - int64_t from, - optional to_opt, - c10::optional gen) { - - using namespace mps; - - MPSStream* stream = getCurrentMPSStream(); - uint64_t seed_ = c10::detail::getNonDeterministicRandom(true); - +Tensor& random_mps_(Tensor& self, int64_t from, c10::optional to_opt, c10::optional gen) { auto input_dtype = self.scalar_type(); + int64_t to = 0; - int64_t to; - - if(to_opt.has_value()) { + if (to_opt.has_value()) { // [from, to) to = *to_opt; TORCH_CHECK(from < to, "random_mps_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to); if (isFloatingType(input_dtype)) { - // TODO: what is "random_update_from_to"? 
AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input_dtype, "random_update_from_to", [&] { from = templates::update_from(from); to = templates::update_to(to); TORCH_CHECK(from < to, "random_mps_ expects 'from' casted to dtype to be less than 'to' casted to dtype, but got from=", from, " >= to=", to); }); - check_from_to_in_range(from, to - 1, input_dtype); + templates::check_from_to_in_range(from, to - 1, self.dtype()); } - } - else if (from != std::numeric_limits::lowest()) { + } else if (from != std::numeric_limits::lowest()) { // [from, std::numeric_limits::max()] - to = 0; - if(isFloatingType(input_dtype)) { + if (isFloatingType(input_dtype)) { AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16, input_dtype, "random_from_to_range_calc", [&] { constexpr int64_t scalar_t_max = static_cast(1) << std::numeric_limits::digits; to = scalar_t_max > std::numeric_limits::max() ? std::numeric_limits::max() : static_cast(scalar_t_max); from = templates::update_from(from); TORCH_CHECK(from < to, "random_mps_ expects 'from' casted to dtype to be less than or equal to 'to' casted to dtype, but got from=", from, " > to=", to); }); - } - else if(isIntegralType(input_dtype, /*includeBool=*/true)) { + } else if (isIntegralType(input_dtype, /*includeBool=*/true)) { AT_DISPATCH_INTEGRAL_TYPES_AND(at::ScalarType::Bool, input_dtype, "random_from_to_range_calc", [&] { if (std::is_same::value) { to = static_cast(true); @@ -437,124 +310,294 @@ static void check_from_to_in_range(int64_t from, int64_t to_inc, ScalarType scal else { TORCH_CHECK(false, "random_mps_ handles only integral, floating-point and boolean types"); } - check_from_to_in_range(from, to, input_dtype); + templates::check_from_to_in_range(from, to, self.dtype()); } else { // [std::numeric_limits::lowest(), std::numeric_limits::max()] // range = 2^64 - // TODO - how to implement this? + // TODO - should we error out in case max is beyond MPS limit (INT32_MAX)? TORCH_CHECK(false, "random_mps_ currently does not handle the lowest() -> max() range"); - - } - - @autoreleasepool { - MPSShape* input_shape = getMPSShape(self); - - MPSGraph* mpsGraph = make_mps_graph(); - - MPSGraphRandomOpDescriptor* descriptor = [MPSGraphRandomOpDescriptor descriptorWithDistribution:MPSGraphRandomDistributionUniform - dataType:MPSDataTypeInt32]; - descriptor.minInteger = from; - descriptor.maxInteger = to - 1; - - // TODO: right now taking the default seed. 
Extend it to be extracted from the - // MPSGenerator - MPSGraphTensor* randomTensor = [mpsGraph randomTensorWithShape:input_shape - descriptor:descriptor - seed:seed_ - name:nil]; - - MPSGraphTensor* outputTensor = nil; - - if(input_dtype != ScalarType::Int) - outputTensor = [mpsGraph castTensor:randomTensor - toType:getMPSDataType(input_dtype) - name:@"outputTensor"]; - else - outputTensor = randomTensor; - - auto outputPlaceholder = Placeholder(outputTensor, self); - NSDictionary *feeds = nil; - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() - }; - - runMPSGraph(stream, mpsGraph, feeds, results); } - return self; - + return mps::random_mps_impl(self, from, to - 1, c10::nullopt, c10::nullopt, + MPSGraphRandomDistributionUniform, __func__, nullptr); } -Tensor& random_mps_ - (Tensor& self, - int64_t to, - c10::optional gen) { - +Tensor& random_mps_(Tensor& self, int64_t to, c10::optional gen) { return random_mps_(self, 0, to, gen); } // Exponential distribution - Tensor& exponential_mps_(Tensor& self, double lambda, c10::optional gen) { + TORCH_CHECK(lambda > 0, "exponential_mps_: lambda must be greater than zero") - using namespace mps; + mps::RandomOpBlock random_op_block = ^RandomOpFn(cachedGraph, randomTensor) { + MPSGraph* mpsGraph = cachedGraph->graph(); + MPSGraphTensor* unitTensor = [mpsGraph constantWithScalar: 1.0f + dataType: randomTensor.dataType]; + MPSGraphTensor* minusLambdaTensor = [mpsGraph constantWithScalar: -lambda + dataType: randomTensor.dataType]; + MPSGraphTensor* subtractTensor = [mpsGraph subtractionWithPrimaryTensor: unitTensor + secondaryTensor: randomTensor + name: nil]; + MPSGraphTensor* logTensor = [mpsGraph logarithmWithTensor: subtractTensor + name: nil]; + return [mpsGraph divisionWithPrimaryTensor: logTensor + secondaryTensor: minusLambdaTensor + name: nil]; + }; + return mps::random_mps_impl(self, 0.0, 1.0, c10::nullopt, c10::nullopt, + MPSGraphRandomDistributionUniform, + "exponential_mps_:" + std::to_string(lambda), random_op_block); +} - if (self.numel() == 0) { - return self; - } +Tensor& multinomial_with_replacement_mps_kernel( + const Tensor& self, + const int64_t n_sample, + c10::optional generator, + Tensor& result) { - TORCH_CHECK(lambda > 0, "exponential_mps_: lambda must be greater than zero") + using namespace mps; - struct CachedGraph : public MPSCachedGraph - { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} - MPSGraphTensor *outputTensor_ = nil; - }; + int inputSize = self.dim(); + int numDist = + inputSize == 1 ? 1 : self.size(0); + int numCategories = + inputSize == 1 ? self.size(0) : self.size(1); + + // Restructure data for 2d + auto self_v = inputSize == 1 ? self.view({numDist, numCategories}) : self; + auto result_v = inputSize == 1 ? result.view({numDist, n_sample}) : result; MPSStream* stream = getCurrentMPSStream(); - uint64_t seed_ = c10::detail::getNonDeterministicRandom(true); + MPSGraphCache* cache_ = MPSGraphCache::getInstance(); @autoreleasepool { - MPSShape* self_shape = getMPSShape(self); - - MPSGraph* mpsGraph = make_mps_graph(); - // TODO: right now taking the default seed. 
Extend it to be extracted from the - // MPSGenerator - MPSGraphTensor* randomTensor = [mpsGraph randomUniformTensorWithShape:self_shape - seed:seed_ - name:nil]; - MPSGraphTensor* unitTensor = [mpsGraph constantWithScalar:1.0f - dataType:MPSDataTypeFloat32]; - MPSGraphTensor* minusLambdaTensor = [mpsGraph constantWithScalar:-lambda - dataType:MPSDataTypeFloat32]; - MPSGraphTensor* subtractTensor = [mpsGraph subtractionWithPrimaryTensor:unitTensor - secondaryTensor:randomTensor - name:nil]; - MPSGraphTensor* logTensor = [mpsGraph logarithmWithTensor:subtractTensor - name:nil]; - MPSGraphTensor* outputTensor = [mpsGraph divisionWithPrimaryTensor:logTensor - secondaryTensor:minusLambdaTensor + string key = "multinomial_with_replacement:" + getTensorsStringKey({self}) + ":" + to_string(n_sample); + auto cachedGraph = cache_->LookUpAs(key); + if (!cachedGraph) { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { + RandomCachedGraph *newCachedGraph = nil; + @autoreleasepool { + MPSShape* prob_shape = getMPSShape(self_v); + MPSGraph* mpsGraph = make_mps_graph(); + newCachedGraph = new RandomCachedGraph(mpsGraph); + newCachedGraph->stateTensor = mpsGraphRankedPlaceHolder(mpsGraph, MPSDataTypeInt32, @[@7]); + + auto prob_dtype = getMPSDataType(self_v.scalar_type()); + + // This is probability weights + newCachedGraph->probTensor = mpsGraphRankedPlaceHolder(mpsGraph, getMPSDataType(self_v.scalar_type()), prob_shape); + + MPSGraphTensor *sumProbs = [mpsGraph reductionSumWithTensor:newCachedGraph->probTensor + axis:-1 + name:nil]; + + MPSGraphTensor *normalizedProbs = [mpsGraph divisionWithPrimaryTensor:newCachedGraph->probTensor + secondaryTensor:sumProbs + name:nil]; + + auto ns_numCategories = [NSNumber numberWithInt:numCategories]; + auto ns_numDist = [NSNumber numberWithInt:numDist]; + auto ns_n_sample = [NSNumber numberWithInt:n_sample]; + + MPSGraphTensor *ones = [mpsGraph constantWithScalar:1.0f + shape:@[ns_numCategories, ns_numCategories] + dataType:prob_dtype]; + auto zeroTensor = [mpsGraph constantWithScalar: 0.0f + dataType: MPSDataTypeInt32]; + auto minusOneTensor = [mpsGraph constantWithScalar: -1.0f + dataType: MPSDataTypeInt32]; + + MPSGraphTensor *upperTriangle = [mpsGraph bandPartWithTensor:ones + numLowerTensor:zeroTensor + numUpperTensor:minusOneTensor name:nil]; + MPSGraphTensor *upperProbRange = [mpsGraph matrixMultiplicationWithPrimaryTensor:normalizedProbs + secondaryTensor:upperTriangle + name:nil]; - if(getMPSDataType(self.scalar_type()) != MPSDataTypeFloat32) - outputTensor = [mpsGraph castTensor:outputTensor - toType:getMPSDataType(self.scalar_type()) - name:@"output"]; + MPSGraphTensor *lowerProbRange = [mpsGraph subtractionWithPrimaryTensor:upperProbRange + secondaryTensor:normalizedProbs + name:nil]; - auto outputPlaceholder = Placeholder(outputTensor, self); - NSDictionary *feeds = nil; + upperProbRange = [mpsGraph reshapeTensor:upperProbRange + withShape:@[ns_numDist, @1, ns_numCategories] + name:nil]; + lowerProbRange = [mpsGraph reshapeTensor:lowerProbRange + withShape:@[ns_numDist, @1, ns_numCategories] + name:nil]; + + MPSGraphRandomOpDescriptor *descriptor = [MPSGraphRandomOpDescriptor descriptorWithDistribution:MPSGraphRandomDistributionUniform + dataType:prob_dtype]; + NSArray *generatorTensors = [mpsGraph randomTensorWithShape:@[ns_numDist, ns_n_sample, @1] + descriptor:descriptor + stateTensor:newCachedGraph->stateTensor + name:nil]; + MPSGraphTensor *randomTensor = generatorTensors[0]; + + auto broadcastShape = @[ns_numDist 
,ns_n_sample, ns_numCategories]; + int broadcastShapeVals[3] = {numDist, static_cast(n_sample), numCategories}; + MPSGraphTensor *broadcastShapeTensor = [mpsGraph constantWithData:[NSData dataWithBytes:broadcastShapeVals length:sizeof(int) * broadcastShape.count] + shape:@[[NSNumber numberWithUnsignedInteger:broadcastShape.count]] + dataType:MPSDataTypeUInt32]; + + MPSGraphTensor *samplesTensor = [mpsGraph broadcastTensor:randomTensor + toShape:broadcastShape + name:nil]; + MPSGraphTensor *sampleAbove = [mpsGraph greaterThanWithPrimaryTensor:samplesTensor + secondaryTensor:lowerProbRange + name:nil]; + MPSGraphTensor *sampleBelow = [mpsGraph lessThanWithPrimaryTensor:samplesTensor + secondaryTensor:upperProbRange + name:nil]; + MPSGraphTensor *sampleWithin = [mpsGraph logicalANDWithPrimaryTensor:sampleAbove + secondaryTensor:sampleBelow + name:nil]; + MPSGraphTensor *sampleMask = [mpsGraph castTensor:sampleWithin + toType:MPSDataTypeInt32 + name:@"sampleMask"]; + MPSGraphTensor *categoriesTensor = [mpsGraph coordinateAlongAxis:-1 + withShapeTensor:broadcastShapeTensor + name:nil]; + MPSGraphTensor *binnedSamplesTensor = [mpsGraph multiplicationWithPrimaryTensor:categoriesTensor + secondaryTensor:sampleMask + name:nil]; + MPSGraphTensor *reducedTensor = [mpsGraph reductionSumWithTensor:binnedSamplesTensor + axis:-1 + name:nil]; + MPSGraphTensor *reshapeTensor = [mpsGraph reshapeTensor:reducedTensor + withShape:@[ns_numDist ,ns_n_sample] + name:nil]; + newCachedGraph->resultTensor = [mpsGraph castTensor:reshapeTensor + toType:getMPSDataType(result.scalar_type()) + name:@"resultTensor"]; + } + return newCachedGraph; + }); + } + // update the Philox state values on each run of the same graph + cachedGraph->updatePhiloxCounters(); + // feed the updated state values to the graph + MPSNDArrayDescriptor *stateDesc = [MPSNDArrayDescriptor descriptorWithDataType: MPSDataTypeInt32 shape: @[@7]]; + MPSNDArray *stateNDArray = [[[MPSNDArray alloc] initWithDevice: stream->device() descriptor: stateDesc] autorelease]; + [stateNDArray writeBytes: &cachedGraph->stateValues[0] strideBytes: nil]; + MPSGraphTensorData* stateTensorData = [[[MPSGraphTensorData alloc] initWithMPSNDArray: stateNDArray] autorelease]; + + auto probPlaceholder = Placeholder(cachedGraph->probTensor, self_v); + auto outputPlaceholder = Placeholder(cachedGraph->resultTensor, result_v); + NSDictionary *feeds = @{ + cachedGraph->stateTensor : stateTensorData, + probPlaceholder.getMPSGraphTensor() : probPlaceholder.getMPSGraphTensorData() + }; NSDictionary* results = @{ outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() }; - runMPSGraph(stream, mpsGraph, feeds, results); + runMPSGraph(stream, cachedGraph->graph(), feeds, results); + } + + return result; + +} +/* The largest consecutive integer representable in float32 (2^24) */ +constexpr int64_t FLOAT32_MAX_CONSECUTIVE_INT = 1 << (FLT_MANT_DIG); + +Tensor& multinomial_out_mps(const Tensor& self, + int64_t n_sample, + bool with_replacement, + c10::optional gen, + Tensor& result) { + + TORCH_CHECK( + result.device() == self.device(), + "multinomial arguments must have the same device"); + TORCH_CHECK( + self.dim() > 0 && self.dim() <= 2, "prob_dist must be 1 or 2 dim"); + TORCH_CHECK( + at::isFloatingType(self.scalar_type()), + "multinomial only supports floating-point dtypes for input, got: ", + self.scalar_type()); + TORCH_CHECK(result.scalar_type() == ScalarType::Long, + "multinomial expects Long tensor out, got: ", result.scalar_type()); + 
TORCH_CHECK(n_sample > 0, "cannot sample n_sample <= 0 samples"); + int64_t n_categories = self.size(-1); + TORCH_CHECK(with_replacement || (n_sample <= n_categories), + "cannot sample n_sample > prob_dist.size(-1) samples without replacement"); + // Since the index tensor is float, numCategories cannot exceed max + // float integer precision + TORCH_CHECK( + n_categories <= FLOAT32_MAX_CONSECUTIVE_INT, + "number of categories cannot exceed 2^24"); + + if (self.dim() == 1) { + result.resize_({n_sample}); + } else { + const int64_t n_dist = self.size(0); + result.resize_({n_dist, n_sample}); + } + if (result.numel() == 0) { + return result; } - return self; + // Fast-path for no replacement (or if only one sample draw). + // Reference: + // https://github.com/pytorch/pytorch/issues/11931#issuecomment-625882503 + if (!with_replacement || n_sample == 1) { + // Sanity checks on `self`. + auto is_valid = ((self.max() < INFINITY) & (self.min() >= 0)).item(); + TORCH_CHECK( + is_valid.to(), + "probability tensor contains either `inf`, `nan` or element < 0"); + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + bool zero_prob_condition; + if (self.dim() == 1){ + zero_prob_condition = (self.sum() == 0).item().to(); + } else { + zero_prob_condition = (self.sum(1) == 0).sum().item().to(); + } + TORCH_CHECK( + !zero_prob_condition, + "invalid multinomial distribution (sum of probabilities <= 0)"); + + // The algorithm is from gumbel softmax. + // s = argmax( logp - log(-log(eps)) ) where eps ~ U(0, 1) + // Here we can apply exp to the formula which will not affect result of + // argmax or topk. Then we have + // s = argmax( p / (-log(eps)) ) where eps ~ U(0, 1). + // We can also simplify the formula above by + // s = argmax( p / q ) where q ~ Exp(1) + Tensor q = at::empty_like(self).exponential_(1, gen); + // In theory the probability to generate 0 from exponential distribution is + // 0. However, on CUDA side there is a protection to avoid 0s, but on CPU + // side, there is a very low probability to generate 0 from + // exponential. The probability is about 2^(-DBL_MANT_DIG). We just + // ignore it here, but there may be some risk to get invalid output on CPU. 
+ at::div_out(q, self, q); + if (n_sample == 1) { + at::argmax_out(result, q, /*dim=*/-1, /*keepdim=*/true); + } else { + Tensor vals = at::empty(result.sizes(), self.options()); + at::topk_out(vals, result, q, n_sample); + } + return result; + } + + result = multinomial_with_replacement_mps_kernel(const_cast(self), n_sample, gen, result); + + return result; +} +Tensor multinomial_mps( + const Tensor& self, + int64_t n_sample, + bool with_replacement, + c10::optional gen) { + Tensor result = at::empty({0}, self.options().dtype(kLong)); + multinomial_out_mps(self, n_sample, with_replacement, gen, result); + return result; } } // namespace native diff --git a/aten/src/ATen/native/mps/operations/Eye.mm b/aten/src/ATen/native/mps/operations/Eye.mm index 45b3fdf68b07..6b72c0686caa 100644 --- a/aten/src/ATen/native/mps/operations/Eye.mm +++ b/aten/src/ATen/native/mps/operations/Eye.mm @@ -70,9 +70,9 @@ @autoreleasepool { // A key is used to identify the MPSGraph which was created once, and can be reused if the parameters, data types etc match the earlier created MPSGraph string key = "eye_out_mps:" + getTensorsStringKey({result}); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -94,7 +94,6 @@ } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } // Create placeholders which use the keys of the CachedGraph to create inputs and outputs of the operation diff --git a/aten/src/ATen/native/mps/operations/Indexing.h b/aten/src/ATen/native/mps/operations/Indexing.h new file mode 100644 index 000000000000..e769a7121d50 --- /dev/null +++ b/aten/src/ATen/native/mps/operations/Indexing.h @@ -0,0 +1,39 @@ +// Copyright © 2022 Apple Inc. + +#include +#include +#include +#include +#include +#include +#include +#include + +using namespace at::mps; + +namespace at { +namespace native { +namespace mps { + +std::string getBitSizeString(ScalarType scalar_type) { + size_t scalarBitSize = c10::elementSize(scalar_type) * 8; + TORCH_CHECK(scalarBitSize <= 64, "Unsupported data type: ", getMPSTypeString(scalar_type)); + return std::to_string(scalarBitSize) + "bit"; + +} + +std::string getIndexFunctionName(ScalarType scalar_type, bool index_select, bool accumulate) { + std::string indexFunction = index_select ? "index_select_" : + (accumulate && (scalar_type != kBool)) ? "index_put_accumulate_" : "index_put_"; + + indexFunction += getBitSizeString(scalar_type); + if (accumulate) { + TORCH_CHECK(scalar_type == ScalarType::Float || scalar_type == ScalarType::Int, "Unsupported data type for accumulate case: ", getMPSTypeString(scalar_type)); + string dtypeString = (scalar_type == ScalarType::Float) ? "_float" : "_int"; + indexFunction += dtypeString; + } + return indexFunction; +} +} +} +} diff --git a/aten/src/ATen/native/mps/operations/Indexing.mm b/aten/src/ATen/native/mps/operations/Indexing.mm index 7acb2fdba422..78e93fc99175 100644 --- a/aten/src/ATen/native/mps/operations/Indexing.mm +++ b/aten/src/ATen/native/mps/operations/Indexing.mm @@ -1,5 +1,4 @@ // Copyright © 2022 Apple Inc. 
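The fast path quoted above relies on the exponential-race form of the Gumbel-max trick spelled out in the comment: drawing q_i ~ Exp(1) independently and taking argmax(p_i / q_i) selects index i with probability p_i / sum(p), so a single argmax (or topk for several draws without replacement) replaces an explicit CDF search. A minimal CPU sketch of one draw, assuming a non-negative weight vector with positive sum (names are illustrative):

    #include <cstddef>
    #include <random>
    #include <vector>

    // One multinomial draw via the exponential-race identity:
    // argmax_i (p_i / q_i) with q_i ~ Exp(1) picks i with probability p_i / sum(p).
    size_t multinomial_one_sample_sketch(const std::vector<double>& weights,
                                         std::mt19937& gen) {
      std::exponential_distribution<double> exp1(1.0);
      size_t best = 0;
      double best_ratio = -1.0;
      for (size_t i = 0; i < weights.size(); ++i) {
        const double ratio = weights[i] / exp1(gen);
        if (ratio > best_ratio) {
          best_ratio = ratio;
          best = i;
        }
      }
      return best;
    }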
- #include #include #include @@ -12,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -20,6 +20,7 @@ #include #include #include +#include #ifdef __OBJC__ #include @@ -28,6 +29,199 @@ namespace at { namespace native { +static +bool dispatchIndexKernel(TensorIteratorBase& iter, + IntArrayRef index_size, + IntArrayRef index_stride, + bool index_select, + bool accumulate) { + using namespace mps; + + if (iter.numel() == 0) + return true; + + const Tensor& inputTensor = iter.tensor(1); + Tensor outputTensor = iter.tensor(0); + id inputBuffer = getMTLBufferStorage(inputTensor); + id outputBuffer = getMTLBufferStorage(outputTensor); + MPSStream* mpsStream = getCurrentMPSStream(); + id device = MPSDevice::getInstance()->device(); + + dispatch_sync(mpsStream->queue(), ^(){ + @autoreleasepool { + NSError* error = nil; + constexpr uint32_t nOffsets = 3; + const int64_t num_indices = index_size.size(); + const uint32_t numThreads = iter.numel(); + const uint32_t nDim = iter.ndim(); + const IntArrayRef& iterShape = iter.shape(); + std::vector iterShapeData(iterShape.size()); + std::vector> strides(nDim); + + for (const auto i: c10::irange(iterShape.size())) { + TORCH_CHECK(i <= UINT32_MAX); + iterShapeData[i] = (uint32_t)(iterShape[i]); + } + + for (const auto i: c10::irange(nDim)) { + for (const auto offset: c10::irange(nOffsets)) { + strides[i][offset] = iter.strides(offset)[i]; + } + } + + MTLSize gridSize = MTLSizeMake(numThreads, 1, 1); + id commandBuffer = mpsStream->commandBuffer(); + id computeEncoder = [commandBuffer computeCommandEncoder]; + id kernelDataOffsetsFunction = MPSDevice::getInstance()->metalIndexingFunction("kernel_index_offsets", nil); + id kernelDataOffsetsPSO = [[device newComputePipelineStateWithFunction: kernelDataOffsetsFunction + error: &error] autorelease]; + id kernelDataOffsets = [[device newBufferWithLength: numThreads * sizeof(simd_uint3) + options: 0] autorelease]; + TORCH_CHECK(kernelDataOffsetsPSO, "Failed to created pipeline state object, error: ", [[error description] UTF8String]); + + [computeEncoder setComputePipelineState:kernelDataOffsetsPSO]; + [computeEncoder setBytes:strides.data() length:sizeof(uint32_t) * nDim * nOffsets atIndex:0]; + [computeEncoder setBuffer:kernelDataOffsets offset:0 atIndex:1]; + [computeEncoder setBytes:iterShapeData.data() length:sizeof(uint32_t) * iterShape.size() atIndex:2]; + [computeEncoder setBytes:&nDim length:sizeof(uint32_t) atIndex:3]; + [computeEncoder setBytes:&nOffsets length:sizeof(uint32_t) atIndex:4]; + + NSUInteger kernelOffsetsTGSize = kernelDataOffsetsPSO.maxTotalThreadsPerThreadgroup; + if (kernelOffsetsTGSize > numThreads) + kernelOffsetsTGSize = numThreads; + + MTLSize kernelOffsetsThreadGroupSize = MTLSizeMake(kernelOffsetsTGSize, 1, 1); + [computeEncoder dispatchThreads: gridSize + threadsPerThreadgroup: kernelOffsetsThreadGroupSize]; + + MTLFunctionConstantValues* constantValues = [[MTLFunctionConstantValues new] autorelease]; + [constantValues setConstantValue: &num_indices type:MTLDataTypeUInt atIndex:0]; + + std::string indexFunction = getIndexFunctionName(inputTensor.scalar_type(), index_select, accumulate); + id indexKernelFunction = MPSDevice::getInstance()->metalIndexingFunction(indexFunction, constantValues); + id argumentEncoder = [[indexKernelFunction newArgumentEncoderWithBufferIndex:0] autorelease]; + NSUInteger argumentBufferLength = argumentEncoder.encodedLength; + id indexAB = [[device newBufferWithLength:argumentBufferLength options:0] autorelease]; + 
[argumentEncoder setArgumentBuffer:indexAB offset:0]; + + for (uint32_t idx = 0; idx < num_indices; idx++) { + const Tensor& indexTensor = iter.tensor(idx+2); + [argumentEncoder setBuffer: getMTLBufferStorage(indexTensor) + offset: indexTensor.storage_offset() * indexTensor.element_size() + atIndex: idx]; + TORCH_CHECK(indexTensor.scalar_type() == ScalarType::Long, "index(): Expected dtype int64 for Index"); + } + + // FIXME: PSO needs to be cached + id indexSelectPSO = [[device newComputePipelineStateWithFunction: indexKernelFunction + error: &error] autorelease]; + TORCH_CHECK(indexSelectPSO, "Failed to created pipeline state object, error: ", [[error description] UTF8String]); + + for (uint32_t idx = 0; idx < num_indices; idx++) { + const Tensor& indexTensor = iter.tensor(idx+2); + [computeEncoder useResource:getMTLBufferStorage(indexTensor) usage:MTLResourceUsageRead]; + } + + [computeEncoder setComputePipelineState:indexSelectPSO]; + [computeEncoder setBuffer:indexAB offset:0 atIndex:0]; + [computeEncoder setBytes:index_size.data() length:sizeof(index_size[0]) * index_size.size() atIndex:1]; + [computeEncoder setBytes:index_stride.data() length:sizeof(index_stride[0]) * index_stride.size() atIndex:2]; + [computeEncoder setBuffer:kernelDataOffsets offset:0 atIndex:3]; + [computeEncoder setBuffer:inputBuffer offset:inputTensor.storage_offset() * inputTensor.element_size() atIndex:4]; + [computeEncoder setBuffer:outputBuffer offset:outputTensor.storage_offset() * outputTensor.element_size() atIndex:5]; + + NSUInteger tgSize = indexSelectPSO.maxTotalThreadsPerThreadgroup; + if (tgSize > numThreads) + tgSize = numThreads; + + MTLSize threadGroupSize = MTLSizeMake(tgSize, 1, 1); + [computeEncoder dispatchThreads: gridSize + threadsPerThreadgroup: threadGroupSize]; + + [computeEncoder endEncoding]; + mpsStream->commit(true); + } + }); + + return true; +} + +static void validateInputData(const TensorIteratorBase& iter, IntArrayRef index_size, IntArrayRef index_stride, const std::string& op, bool accumulate) { + using namespace mps; + + int64_t num_indices = index_size.size(); + TORCH_CHECK(num_indices <= 16, "Current limit allows up to 16 indices to be used in MPS indexing kernels"); + + AT_ASSERT(num_indices == index_stride.size()); + AT_ASSERT(num_indices == iter.ntensors() - 2); + const Tensor& inputTensor = iter.tensor(1); + + if (accumulate) { + // No atomic support for the rest of dtypes + TORCH_CHECK(inputTensor.scalar_type() == ScalarType::Float || + inputTensor.scalar_type() == ScalarType::Int || + inputTensor.scalar_type() == ScalarType::Bool); + } else { + TORCH_CHECK(c10::isIntegralType(inputTensor.scalar_type(), /*includesBool=*/true) || + inputTensor.scalar_type() == ScalarType::Float || + inputTensor.scalar_type() == ScalarType::Half, + getMPSTypeString(inputTensor.scalar_type()) + std::string(" not supported for index.Tensor_out")); + } +} + +void index_kernel_mps(TensorIteratorBase& iter, IntArrayRef index_size, IntArrayRef index_stride) { + using namespace mps; + @autoreleasepool { + validateInputData(iter, index_size, index_stride, "index.Tensor_out", /*accumulate=*/false); + dispatchIndexKernel(iter, index_size, index_stride, /*index_select=*/true, /*accumulate=*/false); + } +} + +void index_put_kernel_mps(TensorIterator& iter, IntArrayRef index_size, IntArrayRef index_stride, bool accumulate) { + using namespace mps; + @autoreleasepool { + validateInputData(iter, index_size, index_stride, "index_put_impl", accumulate); + dispatchIndexKernel(iter, index_size, index_stride, 
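dispatchIndexKernel first runs the small kernel_index_offsets compute kernel, which turns each linear thread id into per-operand byte offsets using the iterator shape and the three packed stride arrays, and the index_select / index_put kernel then consumes those offsets through the argument buffer set up above. A plain C++ sketch of the offset arithmetic that step performs for a single thread and the three operands (output, input, indices); names and layout here are illustrative:

    #include <array>
    #include <cstdint>
    #include <vector>

    // Decompose a linear thread index into per-dimension coordinates using the
    // iterator shape, and accumulate each operand's byte offset from its
    // per-dimension strides (strides[dim][operand], as packed in the patch).
    std::array<uint32_t, 3> data_offsets_sketch(
        uint32_t thread_index,
        const std::vector<uint32_t>& shape,
        const std::vector<std::array<uint32_t, 3>>& strides) {
      std::array<uint32_t, 3> offsets{0, 0, 0};
      uint32_t remaining = thread_index;
      for (size_t dim = 0; dim < shape.size(); ++dim) {
        const uint32_t coord = remaining % shape[dim];
        remaining /= shape[dim];
        for (size_t op = 0; op < 3; ++op)
          offsets[op] += coord * strides[dim][op];
      }
      return offsets;
    }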
/*index_select=*/false, accumulate); + } +} + +static Tensor & masked_select_out_mps_impl(Tensor & result, const Tensor & self, const Tensor & mask) { + NoNamesGuard guard; + + TORCH_CHECK(mask.scalar_type() == ScalarType::Byte || mask.scalar_type() == ScalarType::Bool, + "masked_select: expected BoolTensor or ByteTensor for mask"); + TORCH_CHECK(self.scalar_type() == result.scalar_type(), + "masked_select(): self and result must have the same scalar type"); + + auto mask_temp = (mask.dim() == 0) + ? c10::MaybeOwned::owned(mask.unsqueeze(0)) + : c10::MaybeOwned::borrowed(mask); + auto self_temp = (self.dim() == 0) + ? c10::MaybeOwned::owned(self.unsqueeze(0)) + : c10::MaybeOwned::borrowed(self); + + // Cannot reassign to mask_temp and self_temp here! if they are + // owning and expand_outplace returns a borrow, the returned borrow + // would dangle. + auto mask_self_expanded = expand_outplace(*mask_temp, *self_temp); + at::index_out( + result, *std::get<1>(mask_self_expanded), + c10::List>({*std::move(std::get<0>(mask_self_expanded))})); + + return result; +} + +Tensor masked_select_mps(const Tensor & self, const Tensor & mask) { + namedinference::compute_broadcast_outnames(self, mask); + Tensor result = at::empty({0}, self.options()); + return masked_select_out_mps_impl(result, self, mask); +} + +Tensor & masked_select_out_mps(const Tensor & self, const Tensor & mask, Tensor & result) { + namedinference::compute_broadcast_outnames(self, mask); + return masked_select_out_mps_impl(result, self, mask); +} + Tensor flip_mps(const Tensor& self, IntArrayRef dims) { using namespace mps; @@ -42,7 +236,7 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { auto total_dims = self.dim(); // It wraps the dims and checks that there are no repeated dims auto flip_dims_b = at::dim_list_to_bitset(dims, total_dims); - NSMutableArray * ns_dims = [NSMutableArray new]; + NSMutableArray * ns_dims = [[NSMutableArray new] autorelease]; for (const auto i : c10::irange(total_dims)) { if(flip_dims_b[i] && self.size(i) > 1 && self.stride(i) != 0) { @@ -58,12 +252,7 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { MPSStream* stream = getCurrentMPSStream(); - struct CachedGraph : public MPSCachedGraph - { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} - MPSGraphTensor* inputTensor_ = nil; - MPSGraphTensor* outputTensor_ = nil; - }; + using CachedGraph = mps::MPSUnaryCachedGraph; MPSGraphCache* cache_ = MPSGraphCache::getInstance(); @@ -71,9 +260,9 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { NSString* ns_dims_key = [[ns_dims valueForKey:@"description"] componentsJoinedByString:@","]; // A key is used to identify the MPSGraph which was created once, and can be reused if the parameters, data types etc match the earlier created MPSGraph string key = "flip_mps:" + getTensorsStringKey({self}) + ":" + string([ns_dims_key UTF8String]); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + auto cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -90,7 +279,6 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } // Create placeholders which use the keys of the CachedGraph to create inputs and outputs of the operation @@ -147,10 +335,10 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { 
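A note on the MPS indexing kernels and the masked_select path added above: index_kernel_mps and index_put_kernel_mps are registered through index_stub / index_put_stub, and masked_select_mps is routed through at::index_out after expanding the mask against self. The following is a minimal Python-level sketch of the behaviour these kernels are intended to back; it is illustrative only and assumes an MPS-capable build where these registrations are active.

import torch

if torch.backends.mps.is_available():
    dev = torch.device("mps")
    x = torch.arange(6.0, device=dev).reshape(2, 3)
    mask = x > 2.0                                    # Bool mask, broadcastable against x
    print(torch.masked_select(x, mask))               # 1-D tensor of the selected elements
    rows = torch.tensor([1], device=dev)              # advanced indices must be int64
    cols = torch.tensor([0, 2], device=dev)
    x.index_put_((rows, cols), torch.tensor([9.0, 9.0], device=dev))  # writes x[1, 0] and x[1, 2]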
@autoreleasepool { string key = "index_add_mps_out" + getTensorsStringKey({self, index, source}) + ":" + std::to_string(dim); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @autoreleasepool { @@ -178,19 +366,19 @@ Tensor flip_mps(const Tensor& self, IntArrayRef dims) { } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); Placeholder indexPlaceholder = Placeholder(cachedGraph->indexTensor_, index); Placeholder sourcePlaceholder = Placeholder(cachedGraph->sourceTensor_, source); Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor_, result); + MPSScalar alpha_scalar = getMPSScalar(alpha_f, source.scalar_type()); NSDictionary* feeds = @{ selfPlaceholder.getMPSGraphTensor() : selfPlaceholder.getMPSGraphTensorData(), indexPlaceholder.getMPSGraphTensor() : indexPlaceholder.getMPSGraphTensorData(), sourcePlaceholder.getMPSGraphTensor() : sourcePlaceholder.getMPSGraphTensorData(), - cachedGraph->alphaTensor_ : getMPSGraphTensorFromScalar(stream, alpha_f, MPSDataTypeFloat32) + cachedGraph->alphaTensor_ : getMPSGraphTensorFromScalar(stream, alpha_scalar), }; NSDictionary* results = @{ outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() @@ -265,10 +453,10 @@ Tensor index_select_mps(const Tensor & self, @autoreleasepool { string key = "index_select_out_mps" + getTensorsStringKey({self, index}) + ":" + std::to_string(dim); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @autoreleasepool { @@ -290,7 +478,6 @@ Tensor index_select_mps(const Tensor & self, } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); @@ -335,9 +522,9 @@ Tensor index_select_mps(const Tensor & self, MPSStream* stream = getCurrentMPSStream(); @autoreleasepool { string key = "masked_fill" + getTensorsStringKey({self, mask}) + ":" + std::to_string(value.toDouble()); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -371,7 +558,6 @@ Tensor index_select_mps(const Tensor & self, } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); @@ -420,7 +606,7 @@ Tensor embedding_dense_backward_mps( int64_t D = incoming_gradient_shape[num_incoming_gradient_dims - 1]; c10::SmallVector outgoing_gradient_shape{num_weights, D}; Tensor outgoing_gradient = at::native::empty_mps( - IntArrayRef(outgoing_gradient_shape.data(), outgoing_gradient_shape.size()), + IntArrayRef(outgoing_gradient_shape), grad_.scalar_type(), c10::nullopt, kMPS, @@ -435,10 +621,10 @@ Tensor 
embedding_dense_backward_mps( @autoreleasepool { string key = "edb_mps:" + native_mps::getMPSTypeString(grad_.scalar_type()) + ":indices" + std::to_string(num_indices_dims) + ":num_weights" + std::to_string(num_weights) + ":padding_idx" + std::to_string(padding_idx) + ":scaled" + std::to_string(scale_grad_by_freq); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); // Initialize once if configuration not found in cache if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -450,17 +636,20 @@ Tensor embedding_dense_backward_mps( MPSGraphTensor* indicesTensor = native_mps::mpsGraphUnrankedPlaceHolder(mpsGraph, native_mps::getMPSDataType(indices.scalar_type())); - MPSGraphTensor *reshapedIndicesTensor = [mpsGraph expandDimsOfTensor:indicesTensor - axes:@[@-1] - name:nil]; + MPSGraphTensor* reshapedIndicesTensor = indicesTensor; - MPSGraphTensor *outgoingGradTensor; - outgoingGradTensor = [mpsGraph scatterNDWithUpdatesTensor:incomingGradTensor - indicesTensor:reshapedIndicesTensor - shape:native_mps::getMPSShape(IntArrayRef(outgoing_gradient_shape.data(), outgoing_gradient_shape.size())) - batchDimensions:0 - mode:MPSGraphScatterModeAdd - name:@"edb"]; + if (num_indices_dims != 0) { + reshapedIndicesTensor = [mpsGraph expandDimsOfTensor: indicesTensor + axes: @[@-1] + name: nil]; + } + + auto outgoingGradTensor = [mpsGraph scatterNDWithUpdatesTensor: incomingGradTensor + indicesTensor: reshapedIndicesTensor + shape: native_mps::getMPSShape(IntArrayRef(outgoing_gradient_shape)) + batchDimensions: 0 + mode: MPSGraphScatterModeAdd + name: @"edb"]; newCachedGraph->incomingGradTensor_ = incomingGradTensor; newCachedGraph->indicesTensor_ = indicesTensor; @@ -469,7 +658,6 @@ Tensor embedding_dense_backward_mps( } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } auto incomingGradPlaceholder = native_mps::Placeholder(cachedGraph->incomingGradTensor_, grad_); auto indicesPlaceholder = native_mps::Placeholder(cachedGraph->indicesTensor_, indices); @@ -494,5 +682,7 @@ Tensor embedding_dense_backward_mps( return masked_fill__mps(self, mask, value.item()); } -} -} +REGISTER_DISPATCH(index_stub, &index_kernel_mps); +REGISTER_DISPATCH(index_put_stub, &index_put_kernel_mps); +} // native +} // at diff --git a/aten/src/ATen/native/mps/operations/Linear.mm b/aten/src/ATen/native/mps/operations/Linear.mm index 34b933d44461..ddaa6ce97963 100644 --- a/aten/src/ATen/native/mps/operations/Linear.mm +++ b/aten/src/ATen/native/mps/operations/Linear.mm @@ -18,13 +18,15 @@ Tensor _mps_linear( const Tensor& input, - const Tensor& weight, + const Tensor& weight_arg, const c10::optional& bias_opt) { // wT = transpose(weight); // y=x*wT+b using namespace mps; + auto weight = (weight_arg.dim() == 1) ? 
weight_arg.view({1, weight_arg.size(0)}) : weight_arg; + TORCH_CHECK(input.scalar_type() == ScalarType::Double || input.scalar_type() == ScalarType::Float || input.scalar_type() == ScalarType::Half, "MPS device does not support linear for non-float inputs"); @@ -150,7 +152,15 @@ Tensor _mps_linear( mps::runMPSGraph(stream, cachedGraph->graph(), feeds, results); } - return output; + // Shave off '1' present at the end of the shape + if(weight_arg.dim() == 1) { + // Number of elements in new output shape + auto output_sizes = output.sizes(); + std::vector out_shape(output_sizes.begin(), output_sizes.end()-1); + return output.view(IntArrayRef(out_shape)); + } + else + return output; } Tensor _mps_linear_backward_input( @@ -361,10 +371,10 @@ Tensor _mps_linear_backward_input( const Tensor& weight, std::array output_mask) { Tensor grad_input, grad_weight, grad_bias; if (output_mask[0]) { - grad_input = at::_mps_linear_backward_input(input.sizes(), grad_output, weight); + grad_input = _mps_linear_backward_input(input.sizes(), grad_output, weight); } if (output_mask[1] || output_mask[2]) { - std::tie(grad_weight, grad_bias) = at::_mps_linear_backward_weights(grad_output, input, weight, output_mask[2]); + std::tie(grad_weight, grad_bias) = _mps_linear_backward_weights(grad_output, input, weight, output_mask[2]); } return std::tuple{grad_input, grad_weight, grad_bias}; } diff --git a/aten/src/ATen/native/mps/operations/LossOps.mm b/aten/src/ATen/native/mps/operations/LossOps.mm index cc112265a3a8..3430af0434de 100644 --- a/aten/src/ATen/native/mps/operations/LossOps.mm +++ b/aten/src/ATen/native/mps/operations/LossOps.mm @@ -455,7 +455,7 @@ void nllnd_loss_backward_impl( auto totalWeightPlaceholder = Placeholder(cachedGraph->totalWeightTensor_, total_weight); auto gradInputPlaceholder = Placeholder(cachedGraph->gradInputTensor_, grad_input); - NSMutableDictionary* feeds = [[NSMutableDictionary alloc] initWithCapacity: 4]; + NSMutableDictionary* feeds = [[[NSMutableDictionary alloc] initWithCapacity: 4] autorelease]; feeds[inputPlaceholder.getMPSGraphTensor()] = inputPlaceholder.getMPSGraphTensorData(); feeds[targetPlaceholder.getMPSGraphTensor()] = targetPlaceholder.getMPSGraphTensorData(); feeds[totalWeightPlaceholder.getMPSGraphTensor()] = totalWeightPlaceholder.getMPSGraphTensorData(); @@ -697,7 +697,7 @@ void nllnd_loss_backward_impl( Placeholder totalWeightsPlaceholder = Placeholder(cachedGraph->totalWeightTensor_, total_weight); // Create dictionary of inputs and outputs - NSMutableDictionary* feeds = [[NSMutableDictionary alloc] initWithCapacity: 4]; + NSMutableDictionary* feeds = [[[NSMutableDictionary alloc] initWithCapacity: 4] autorelease]; feeds[selfPlaceholder.getMPSGraphTensor()] = selfPlaceholder.getMPSGraphTensorData(); feeds[targetPlaceholder.getMPSGraphTensor()] = targetPlaceholder.getMPSGraphTensorData(); feeds[batchSizePlaceholder.getMPSGraphTensor()] = batchSizePlaceholder.getMPSGraphTensorData(); diff --git a/aten/src/ATen/native/mps/operations/Normalization.mm b/aten/src/ATen/native/mps/operations/Normalization.mm index 2e026b9acb46..49f1e0538463 100644 --- a/aten/src/ATen/native/mps/operations/Normalization.mm +++ b/aten/src/ATen/native/mps/operations/Normalization.mm @@ -411,6 +411,54 @@ Check if running mean exists (maybe do this check before making graph) return std::make_tuple(output, save_mean, save_var); } +std::tuple _batch_norm_legit_mps + (const Tensor& self, + const c10::optional& weight_opt, + const c10::optional& bias_opt, + Tensor& running_mean, + Tensor& 
running_var, + bool train, + double momentum, + double epsilon) { + + return batch_norm_mps(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, epsilon); +} + +std::tuple _batch_norm_legit_no_stats_mps + (const Tensor& self, + const c10::optional& weight_opt, + const c10::optional& bias_opt, + bool train, + double momentum, + double epsilon) { + + return batch_norm_mps(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, epsilon); +} + +std::tuple _batch_norm_legit_mps_out + (const Tensor& self, + const c10::optional& weight_opt, + const c10::optional& bias_opt, + Tensor& running_mean, + Tensor& running_var, + bool train, double momentum, double epsilon, + Tensor& output, + Tensor& save_mean, + Tensor& save_var) { + return batch_norm_mps_out(self, weight_opt, bias_opt, running_mean, running_var, train, momentum, epsilon, output, save_mean, save_var); +} + +std::tuple _batch_norm_legit_no_stats_mps_out + (const Tensor& self, + const c10::optional& weight_opt, + const c10::optional& bias_opt, + bool train, double momentum, double epsilon, + Tensor& output, + Tensor& save_mean, + Tensor& save_var) { + return batch_norm_mps_out(self, weight_opt, bias_opt, Tensor(), Tensor(), train, momentum, epsilon, output, save_mean, save_var); +} + string get_mem_string(c10::MemoryFormat memory_format) { string mem_format_key; switch(memory_format) { @@ -823,7 +871,7 @@ string get_mem_string(c10::MemoryFormat memory_format) { const int normalized_ndim = normalized_shape.size(); // NOLINTNEXTLINE(bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions) const int axis = input_ndim - normalized_ndim; - at::Tensor input_reshaped = input.view({1, M, -1}); + at::Tensor input_reshaped = input.reshape({1, M, -1}); // Unlike Batch Normalization, which applies scalar scale and bias for each // entire channel/plane with the affine option, Layer Normalization applies // per-element scale and bias. E.g. For input {N, C, H, W}, weight for diff --git a/aten/src/ATen/native/mps/operations/Pad.mm b/aten/src/ATen/native/mps/operations/Pad.mm new file mode 100644 index 000000000000..63a26e66288b --- /dev/null +++ b/aten/src/ATen/native/mps/operations/Pad.mm @@ -0,0 +1,306 @@ +// Copyright © 2022 Apple Inc. + +#include +#include + +namespace at { +namespace native { +namespace mps { + +// Pad operations (1D/2D/3D forward and backward) +Tensor& pad_out_template(Tensor &output, const Tensor &input_, IntArrayRef padding, + const c10::optional& grad_output_opt, + MPSGraphPaddingMode mode, double constantValue, const string op_name) +{ + const int padding_size = (int) padding.size(); + const int padding_dim = padding_size / 2; // either 1D, 2D, or 3D + + TORCH_CHECK(padding_size == 2 || padding_size == 4 || padding_size == 6, + "invalid padding argument of size ", padding_size); + + const Tensor& grad_output_ = *(at::borrow_from_optional_tensor(grad_output_opt)); + const bool is_backward_pass = grad_output_.defined(); + + int64_t nbatch = 1; + int64_t ndims = input_.ndimension(); + // number of input dims with ConstantPad could be less than 2 + int dim_w = ndims > 1 ? 
padding_dim : 0; + int dim_h = padding_dim - 1; + int dim_d = padding_dim - 2; + int dim_slices = 0; + + if (!is_backward_pass && ndims > 1) { + bool valid_dims = input_.size(1) != 0 && input_.size(padding_dim) != 0; + TORCH_CHECK((ndims == 1 + padding_dim && valid_dims) || + (ndims == 2 + padding_dim && valid_dims && input_.size(1 + padding_dim) != 0), + "3D or 4D (batch mode) tensor expected for input, but got: ", input_); + } + + if (ndims == 2 + padding_dim) { + nbatch = input_.size(0); + dim_w++; + dim_h++; + dim_d++; + dim_slices++; + } + + int64_t pad_l = padding[0]; + int64_t pad_r = padding[1]; + int64_t pad_t = padding_dim > 1 ? padding[2] : 0; + int64_t pad_b = padding_dim > 1 ? padding[3] : 0; + int64_t pad_front = padding_dim > 2 ? padding[4] : 0; + int64_t pad_back = padding_dim > 2 ? padding[5] : 0; + + int64_t nplane = input_.size(dim_slices); + int64_t input_w = input_.size(dim_w); + int64_t output_w = input_w + pad_l + pad_r; + int64_t input_h = padding_dim > 1 ? input_.size(dim_h) : 0; + int64_t output_h = padding_dim > 1 ? input_h + pad_t + pad_b : 0; + int64_t input_d = padding_dim > 2 ? input_.size(dim_d) : 0; + int64_t output_d = padding_dim > 2 ? input_d + pad_front + pad_back : 0; + + Tensor grad_output, input = input_; + + if (!is_backward_pass) { + TORCH_CHECK(pad_l < input_w && pad_r < input_w, + "Argument #4: Padding size should be less than the corresponding " + "input dimension, but got: padding (", pad_l, ", ", pad_r, + ") at dimension ", dim_w, " of input ", ndims); + + if (padding_dim > 1) { + TORCH_CHECK(pad_t < input_h && pad_b < input_h, + "Argument #6: Padding size should be less than the corresponding " + "input dimension, but got: padding (", pad_t, ", ", pad_b, + ") at dimension ", dim_h, " of input ", ndims); + } + TORCH_CHECK(output_w >= 1 || output_h >= padding_dim - 1, + "input (H: ", input_h, ", W: ", input_w, ") is too small. Calculated " + "output H: ", output_h, " W: ", output_w); + + if (ndims == 1 + padding_dim) { + if (padding_dim == 3) + output.resize_({nplane, output_d, output_h, output_w}); + else if (padding_dim == 2) + output.resize_({nplane, output_h, output_w}); + else + output.resize_({nplane, output_w}); + } else { + if (padding_dim == 3) + output.resize_({nbatch, nplane, output_d, output_h, output_w}); + else if (padding_dim == 2) + output.resize_({nbatch, nplane, output_h, output_w}); + else if (ndims > 1) + output.resize_({nbatch, nplane, output_w}); + else + output.resize_({output_w}); + } + if (output.numel() == 0 || input_.numel() == 0) + return output; + input = input_.contiguous(); + } else { + TORCH_CHECK(output_w == grad_output_.size(dim_w), + "gradOutput width unexpected. Expected: ", output_w, ", Got: ", grad_output_.size(dim_w)); + if (padding_dim > 1) { + TORCH_CHECK(output_h == grad_output_.size(dim_h), + "gradOutput height unexpected. 
Expected: ", output_h, ", Got: ", grad_output_.size(dim_h)); + } + grad_output = grad_output_.contiguous(); + } + + std::vector leftPadVec(ndims, @(0)); + std::vector rightPadVec(ndims, @(0)); + leftPadVec [ndims - 1] = @(pad_l); + rightPadVec[ndims - 1] = @(pad_r); + if (padding_dim >= 2) { + leftPadVec [ndims - 2] = @(pad_t); + rightPadVec[ndims - 2] = @(pad_b); + } + if (padding_dim >= 3) { + leftPadVec [ndims - 3] = @(pad_front); + rightPadVec[ndims - 3] = @(pad_back); + } + MPSShape *leftPadding = [NSArray arrayWithObjects:leftPadVec.data() count:ndims]; + MPSShape *rightPadding = [NSArray arrayWithObjects:rightPadVec.data() count:ndims]; + + struct CachedGraph : public MPSCachedGraph { + CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) { } + MPSGraphTensor *inputTensor = nil, *outputTensor = nil; + MPSGraphTensor *gradOutputTensor = nil; + }; + MPSGraphCache* cache_ = MPSGraphCache::getInstance(); + + @autoreleasepool { + string key = op_name + getTensorsStringKey({input, grad_output}) + + ":L" + to_string(pad_l) + ":R" + to_string(pad_r) + + ":T" + to_string(pad_t) + ":B" + to_string(pad_b) + + ":F" + to_string(pad_front) + ":K" + to_string(pad_back); + + CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + if(!cachedGraph) { + cachedGraph = static_cast(cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + CachedGraph *newCachedGraph = nil; + @autoreleasepool { + MPSGraph* mpsGraph = make_mps_graph(); + newCachedGraph = new CachedGraph(mpsGraph); + newCachedGraph->inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, input); + if (!is_backward_pass) { + newCachedGraph->outputTensor = [mpsGraph padTensor:newCachedGraph->inputTensor + withPaddingMode:mode + leftPadding:leftPadding + rightPadding:rightPadding + constantValue:constantValue + name:nil]; + } else { + newCachedGraph->gradOutputTensor = mpsGraphRankedPlaceHolder(mpsGraph, grad_output); + newCachedGraph->outputTensor = [mpsGraph padGradientWithIncomingGradientTensor:newCachedGraph->gradOutputTensor + sourceTensor:newCachedGraph->inputTensor + paddingMode:mode + leftPadding:leftPadding + rightPadding:rightPadding + name:nil]; + } + } + return newCachedGraph; + })); + } + Placeholder inputPlaceholder = Placeholder(cachedGraph->inputTensor, input); + Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, output); + + NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; + feeds[inputPlaceholder.getMPSGraphTensor()] = inputPlaceholder.getMPSGraphTensorData(); + if (is_backward_pass) { + Placeholder gradOutputPlaceholder = Placeholder(cachedGraph->gradOutputTensor, grad_output); + feeds[gradOutputPlaceholder.getMPSGraphTensor()] = gradOutputPlaceholder.getMPSGraphTensorData(); + } + NSDictionary* results = @{ + outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() + }; + runMPSGraph(getCurrentMPSStream(), cachedGraph->graph(), feeds, results); + } + return output; +} +} // namespace mps + +// 1D Reflection and Replication Padding +TORCH_IMPL_FUNC(reflection_pad1d_out_mps) +(const Tensor& input, IntArrayRef padding, const Tensor& output) +{ + mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, + MPSGraphPaddingModeReflect, 0.0, "reflection_pad1d_out_mps"); +} + +TORCH_IMPL_FUNC(reflection_pad1d_backward_out_mps) +(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, + 
MPSGraphPaddingModeReflect, 0.0, "reflection_pad1d_backward_out_mps"); +} + +TORCH_IMPL_FUNC(replication_pad1d_out_mps) +(const Tensor& input, IntArrayRef padding, const Tensor& output) +{ + mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, + MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad1d_out_mps"); +} + +TORCH_IMPL_FUNC(replication_pad1d_backward_out_mps) +(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, + MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad1d_backward_out_mps"); +} + +// 2D Reflection and Replication Padding +Tensor& reflection_pad2d_out_mps(const Tensor& input, IntArrayRef padding, Tensor& output) +{ + return mps::pad_out_template(output, input, padding, c10::nullopt, MPSGraphPaddingModeReflect, 0.0, __func__); +} + +Tensor reflection_pad2d_mps(const Tensor& input, IntArrayRef padding) +{ + Tensor output = at::empty({0}, input.options()); + return mps::pad_out_template(output, input, padding, c10::nullopt, MPSGraphPaddingModeReflect, 0.0, __func__); +} + +Tensor& reflection_pad2d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeReflect, 0.0, __func__); +} + +Tensor reflection_pad2d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding) +{ + auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeReflect, 0.0, __func__); +} + +TORCH_IMPL_FUNC(replication_pad2d_out_mps) +(const Tensor& input, IntArrayRef padding, const Tensor& output) +{ + mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, + MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad2d_out_mps"); +} + +Tensor& replication_pad2d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); +} + +Tensor replication_pad2d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding) +{ + auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); +} + +// 3D Reflection and Replication Padding +TORCH_IMPL_FUNC(reflection_pad3d_out_mps) +(const Tensor& input, IntArrayRef padding, const Tensor& output) +{ + mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, + MPSGraphPaddingModeReflect, 0.0, "reflection_pad3d_out_mps"); +} + +TORCH_IMPL_FUNC(reflection_pad3d_backward_out_mps) +(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, + MPSGraphPaddingModeReflect, 0.0, "reflection_pad3d_backward_out_mps"); +} + +TORCH_IMPL_FUNC(replication_pad3d_out_mps) +(const Tensor& input, IntArrayRef padding, const Tensor& output) +{ + mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, + MPSGraphPaddingModeClampToEdge, 0.0, 
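A note on the padding support added in Pad.mm above: pad_out_template services 1D/2D/3D reflection and replication padding (forward and backward) as well as constant padding, and constant padding of more than 3 dimensions falls back to the generic implementation with a one-time warning. A short Python sketch of the corresponding user-facing calls, illustrative only and assuming these kernels are dispatched for the "mps" device:

import torch
import torch.nn.functional as F

if torch.backends.mps.is_available():
    x = torch.randn(1, 2, 4, 4, device="mps")
    F.pad(x, (1, 1, 1, 1), mode="reflect")               # reflection_pad2d
    F.pad(x, (1, 1, 1, 1), mode="replicate")             # replication_pad2d
    F.pad(x, (1, 1, 1, 1), mode="constant", value=0.5)   # constant_pad_nd, 2 padded dims
    y = torch.randn(1, 1, 2, 2, 2, 2, device="mps")
    F.pad(y, (1, 1) * 4, mode="constant", value=0.0)     # > 3 padded dims: warns once, falls back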
"replication_pad3d_out_mps"); +} + +Tensor& replication_pad3d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) +{ + grad_input.resize_as_(input).zero_(); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); +} + +Tensor replication_pad3d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding) +{ + auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); + return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); +} + +// backward pass is exlicitly handled in autograd by negating the "pad" argument +Tensor constant_pad_nd_mps(const Tensor& self, IntArrayRef pad, const Scalar& value) +{ + if (pad.size() > 6) { + TORCH_WARN_ONCE("MPS: The constant padding of more than 3 dimensions is not currently supported natively. ", + "It uses View Ops default implementation to run. This may have performance implications."); + return at::native::constant_pad_nd(self, pad, value); + } + Tensor output = at::empty({0}, self.options()); + return mps::pad_out_template(output, self, pad, c10::nullopt, MPSGraphPaddingModeConstant, value.toDouble(), __func__); +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/mps/operations/PointwiseOps.mm b/aten/src/ATen/native/mps/operations/PointwiseOps.mm index 261749bd269f..8da6b94dd856 100644 --- a/aten/src/ATen/native/mps/operations/PointwiseOps.mm +++ b/aten/src/ATen/native/mps/operations/PointwiseOps.mm @@ -36,10 +36,10 @@ @autoreleasepool { string key = op_name + getTensorsStringKey({self, tensor1, tensor2}, false); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph * () { CachedGraph* newCachedGraph = nil; @autoreleasepool { @@ -72,7 +72,6 @@ } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } // Inputs as placeholders @@ -80,13 +79,14 @@ Placeholder tensor1Placeholder = Placeholder(cachedGraph->firstTensor, tensor1); Placeholder tensor2Placeholder = Placeholder(cachedGraph->secondTensor, tensor2); Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, output); + MPSScalar value_scalar = getMPSScalar(value_opt, self.scalar_type()); // Create dictionary of inputs and outputs NSDictionary* feeds = @{ selfPlaceholder.getMPSGraphTensor() : selfPlaceholder.getMPSGraphTensorData(), tensor1Placeholder.getMPSGraphTensor() : tensor1Placeholder.getMPSGraphTensorData(), tensor2Placeholder.getMPSGraphTensor() : tensor2Placeholder.getMPSGraphTensorData(), - cachedGraph->valueTensor : getMPSGraphTensorFromScalar(mpsStream, value_opt, getMPSScalarType(self.scalar_type())), + cachedGraph->valueTensor : getMPSGraphTensorFromScalar(mpsStream, value_scalar), }; NSDictionary* results = @{ diff --git a/aten/src/ATen/native/mps/operations/RangeFactories.mm b/aten/src/ATen/native/mps/operations/RangeFactories.mm index 2d97e01fea13..403ae4748f0f 100644 --- a/aten/src/ATen/native/mps/operations/RangeFactories.mm +++ b/aten/src/ATen/native/mps/operations/RangeFactories.mm @@ -95,7 +95,7 @@ auto stream = getCurrentMPSStream(); auto mpsDataType = getMPSDataType(result.scalar_type()); @autoreleasepool { - string key = "arange_mps_out:" + getTensorsStringKey({result}) + 
":" + to_string(size); + string key = "arange_mps_out" + getTensorsStringKey({result}) + ":" + to_string(size); auto cachedGraph = static_cast(cache_->LookUp(key)); if (!cachedGraph) { auto *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph *() { @@ -106,8 +106,10 @@ } Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, r); NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; - feeds[cachedGraph->startTensor] = getMPSGraphTensorFromScalar(stream, start, mpsDataType); - feeds[cachedGraph->multiplyTensor] = getMPSGraphTensorFromScalar(stream, Scalar(step), mpsDataType); + MPSScalar startScalar = getMPSScalar(start, result.scalar_type()); + feeds[cachedGraph->startTensor] = getMPSGraphTensorFromScalar(stream, startScalar); + MPSScalar stepScalar = getMPSScalar(step, result.scalar_type()); + feeds[cachedGraph->multiplyTensor] = getMPSGraphTensorFromScalar(stream, stepScalar); NSDictionary* results = @{ outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() @@ -167,13 +169,16 @@ } NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; - auto multiplyScalar = (end.to() - start.to()) / ((double)steps - 1.0f); + auto multiply = (end.to() - start.to()) / ((double)steps - 1.0f); Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, r); // Create dictionary of inputs and outputs - feeds[cachedGraph->startTensor] = getMPSGraphTensorFromScalar(stream, start, MPSDataTypeFloat32); - feeds[cachedGraph->endTensor] = getMPSGraphTensorFromScalar(stream, end, MPSDataTypeFloat32); - feeds[cachedGraph->multiplyTensor] = getMPSGraphTensorFromScalar(stream, Scalar(multiplyScalar), MPSDataTypeFloat32); + MPSScalar startScalar = getMPSScalar(start, ScalarType::Float); + feeds[cachedGraph->startTensor] = getMPSGraphTensorFromScalar(stream, startScalar); + MPSScalar endScalar = getMPSScalar(end, ScalarType::Float); + feeds[cachedGraph->endTensor] = getMPSGraphTensorFromScalar(stream, endScalar); + MPSScalar multiplyScalar = getMPSScalar(multiply, ScalarType::Float); + feeds[cachedGraph->multiplyTensor] = getMPSGraphTensorFromScalar(stream, multiplyScalar); NSDictionary* results = @{ outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() diff --git a/aten/src/ATen/native/mps/operations/ReduceOps.mm b/aten/src/ATen/native/mps/operations/ReduceOps.mm index d6e510a06e32..d905107b8ffd 100644 --- a/aten/src/ATen/native/mps/operations/ReduceOps.mm +++ b/aten/src/ATen/native/mps/operations/ReduceOps.mm @@ -9,6 +9,7 @@ #include #include #include +#include namespace at { namespace native { @@ -26,7 +27,8 @@ SUM, PROD, MEAN, - COUNT_NONZERO + COUNT_NONZERO, + TRACE }; @@ -138,7 +140,7 @@ void set_axes_and_shapes(const Tensor& input_t, } void reduction_out_mps - (const Tensor& input_t, + (const Tensor& input_tensor, OptionalIntArrayRef opt_dim, bool keepdim, c10::optional dtype, @@ -146,6 +148,8 @@ void set_axes_and_shapes(const Tensor& input_t, MPSReductionType reduction_type, const std::string& func_name) { + auto input_t = (input_tensor.sizes().size() == 0) ? 
input_tensor.view({1}) : input_tensor; + IntArrayRef input_shape = input_t.sizes(); if (opt_dim.has_value()) { @@ -183,7 +187,7 @@ void set_axes_and_shapes(const Tensor& input_t, auto cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -236,6 +240,14 @@ void set_axes_and_shapes(const Tensor& input_t, castOutputTensor = [mpsGraph reductionMinimumWithTensor:inputTensor axes:axes name:nil]; + } else if(reduction_type == MPSReductionType::TRACE) { + MPSGraphTensor *bandPartWithTensor = [mpsGraph bandPartWithTensor:inputTensor + numLower:0 + numUpper:0 + name:nil]; + castOutputTensor = [mpsGraph reductionSumWithTensor:bandPartWithTensor + axes:@[@0, @1] + name:nil]; } MPSGraphTensor* outputTensor = nil; @@ -252,15 +264,15 @@ void set_axes_and_shapes(const Tensor& input_t, } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } auto inputPlaceholder = native_mps::Placeholder(); - if(apparent_input_shape) + if (apparent_input_shape) { inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t, apparent_input_shape); - else + } else { inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); + } auto outputPlaceholder = native_mps::Placeholder(cachedGraph->outputTensor_, output_t, apparent_output_shape); NSDictionary *feeds = @{ inputPlaceholder.getMPSGraphTensor() : inputPlaceholder.getMPSGraphTensorData(), @@ -284,6 +296,26 @@ void set_axes_and_shapes(const Tensor& input_t, reduction_out_mps(input_t, opt_dim, keepdim, dtype, output_t, MPSReductionType::SUM, "sum_out_mps"); } +Tensor trace_mps_out(const Tensor& self) { + + Tensor output_t = at::native::empty_mps( + {}, + self.scalar_type(), + c10::nullopt, + kMPS, + c10::nullopt, + c10::nullopt); + + std::vector dims(self.dim()); + std::iota(dims.begin(), dims.end(), 0); + + reduction_out_mps(self, IntArrayRef(dims), false, c10::nullopt, const_cast(output_t), MPSReductionType::TRACE, "trace_mps_out"); + + return output_t; + + +} + TORCH_IMPL_FUNC(prod_out_mps) (const Tensor& input_t, int64_t dim, @@ -299,7 +331,7 @@ void set_axes_and_shapes(const Tensor& input_t, // Taken from ReduceOps.cpp inline ScalarType get_dtype_from_self( const Tensor& self, - const optional& dtype, + const c10::optional& dtype, bool promote_integers) { if (dtype.has_value()) { return dtype.value(); @@ -331,12 +363,8 @@ inline ScalarType get_dtype_from_self( Tensor prod_mps(const Tensor &self, c10::optional opt_dtype) { - auto num_dims = self.dim(); - - int64_t dims[num_dims]; - - for(int i = 0; i < num_dims; i++) - dims[i] = i; + std::vector dims(self.dim()); + std::iota(dims.begin(), dims.end(), 0); Tensor output_t = at::native::empty_mps( {}, @@ -346,7 +374,7 @@ Tensor prod_mps(const Tensor &self, c10::optional opt_dtype) { c10::nullopt, c10::nullopt); - reduction_out_mps(self, IntArrayRef(dims, num_dims), false, opt_dtype, const_cast(output_t), MPSReductionType::PROD, "prod_mps"); + reduction_out_mps(self, IntArrayRef(dims), false, opt_dtype, const_cast(output_t), MPSReductionType::PROD, "prod_mps"); return output_t; } @@ -360,13 +388,13 @@ Tensor count_nonzero_mps(const Tensor& self, IntArrayRef dims){ set_axes_and_shapes(self, dims, axes, apparent_input_shape, apparent_output_shape, output_shape); - int64_t* raw_output_shape = (int64_t *)malloc([output_shape count] * 
sizeof(int64_t)); - for(int i=0; i < [output_shape count]; i++) { + std::vector raw_output_shape([output_shape count]); + for(auto i: c10::irange(raw_output_shape.size())) { raw_output_shape[i] = [output_shape[i] longValue]; } Tensor output_t = at::native::empty_mps( - IntArrayRef(raw_output_shape, [output_shape count]), + IntArrayRef(raw_output_shape), ScalarType::Long, c10::nullopt, kMPS, @@ -375,8 +403,6 @@ Tensor count_nonzero_mps(const Tensor& self, IntArrayRef dims){ reduction_out_mps(self, dims, false, self.scalar_type(), const_cast(output_t), MPSReductionType::COUNT_NONZERO, "count_nonzero_mps"); - free(raw_output_shape); - return output_t; } @@ -391,14 +417,17 @@ Tensor count_nonzero_mps(const Tensor& self, IntArrayRef dims){ } TORCH_IMPL_FUNC(norm_out_mps) -(const Tensor& input_t, +(const Tensor& input_tensor, const OptionalScalarRef opt_p, IntArrayRef dim, bool keepdim, const Tensor& output_t) { - if (input_t.numel() == 0) + if (input_tensor.numel() == 0) return; + + auto input_t = (input_tensor.sizes().size() == 0) ? input_tensor.view({1}) : input_tensor; + IntArrayRef input_shape = input_t.sizes(); for(int i = 0; i < dim.size(); i++) { @@ -452,7 +481,7 @@ Tensor count_nonzero_mps(const Tensor& self, IntArrayRef dims){ auto cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -522,7 +551,6 @@ Tensor count_nonzero_mps(const Tensor& self, IntArrayRef dims){ } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } auto inputPlaceholder = native_mps::Placeholder(); @@ -584,7 +612,7 @@ Tensor std_var_common_impl_mps( NSMutableArray *axes = nil; NSMutableArray *apparent_output_shape = nil; NSMutableArray *apparent_input_shape = nil; - int64_t* output_shape = nil; + std::vector output_shape; if ((!keepdim && !use_dim) || (!keepdim && use_dim && dim_value.size() <= 0)) { @@ -624,7 +652,6 @@ Tensor std_var_common_impl_mps( axes); num_output_dims = (num_input_dims >= num_reduce_dims) ? (num_input_dims - num_reduce_dims) : 0; //num_input_dims; - output_shape = (int64_t *)malloc(num_output_dims * sizeof(int64_t)); unsigned int curr_i = 0; for (int i = 0; i < num_input_dims; i++) @@ -639,13 +666,17 @@ Tensor std_var_common_impl_mps( } } if (found) continue; - output_shape[curr_i] = input_shape[i]; + output_shape.push_back(input_shape[i]); curr_i += 1; + // End loop when output shape is filled + if (curr_i == num_output_dims) + break; } for(int i = 0; i < num_reduce_dims; i++) { - correction_n *= input_shape[dim_value[i]]; + auto wrap_dim = maybe_wrap_dim(dim_value[i], input_shape.size()); + correction_n *= input_shape[wrap_dim]; } // (3, 4, 5) --> (3, 5) } @@ -662,10 +693,9 @@ Tensor std_var_common_impl_mps( input_shape, axes); num_output_dims = num_input_dims; - output_shape = (int64_t *)malloc(num_output_dims * sizeof(int64_t)); for (int i = 0; i < num_input_dims; i++) { - output_shape[i] = (int64_t) 1; + output_shape.push_back((int64_t) 1); correction_n *= input_shape[i]; } // scalar --> vector case [[1.0034567]] @@ -685,21 +715,22 @@ Tensor std_var_common_impl_mps( axes); num_output_dims = num_input_dims;//(num_input_dims >= num_reduce_dims) ? 
(num_input_dims - num_reduce_dims) : 0; - output_shape = (int64_t *)malloc(num_output_dims * sizeof(int64_t)); for(int i = 0; i < num_reduce_dims; i++) { - correction_n *= input_shape[dim_value[i]]; + auto wrap_dim = maybe_wrap_dim(dim_value[i], input_shape.size()); + correction_n *= input_shape[wrap_dim]; } for (int i = 0; i < num_input_dims; i++) { - output_shape[i] = [apparent_output_shape[i] longValue]; + output_shape.push_back([apparent_output_shape[i] longValue]); } } + Tensor output_t = at::native::empty_mps( - IntArrayRef(output_shape, num_output_dims), + IntArrayRef(output_shape.data(), num_output_dims), input_t.scalar_type(), c10::nullopt, kMPS, @@ -726,7 +757,7 @@ Tensor std_var_common_impl_mps( auto cachedGraph = cache_->LookUpAs(key); // Initialize once if configuration not found in cache if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -761,7 +792,6 @@ Tensor std_var_common_impl_mps( } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } auto inputPlaceholder = native_mps::Placeholder(); @@ -784,7 +814,7 @@ Tensor std_var_common_impl_mps( }; native_mps::runMPSGraph(stream, cachedGraph->graph(), feeds, results); } - free(output_shape); + return output_t; } @@ -844,7 +874,7 @@ Tensor std_mps( CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @autoreleasepool { @@ -884,7 +914,6 @@ Tensor std_mps( } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); @@ -919,7 +948,7 @@ Tensor std_mps( CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -960,7 +989,6 @@ Tensor std_mps( } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); @@ -1015,7 +1043,7 @@ Tensor std_mps( CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @autoreleasepool { @@ -1055,7 +1083,6 @@ Tensor std_mps( } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); @@ -1090,7 +1117,7 @@ Tensor std_mps( CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -1131,7 +1158,6 @@ Tensor std_mps( } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } auto inputPlaceholder = 
native_mps::Placeholder(cachedGraph->inputTensor_, input_t); @@ -1183,7 +1209,7 @@ Tensor std_mps( CachedGraph* cachedGraph = cache_->LookUpAs(key); // Initialize once if configuration not found in cache if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -1210,7 +1236,6 @@ Tensor std_mps( } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t, apparent_input_shape); @@ -1294,10 +1319,10 @@ Tensor min_mps(const Tensor& input_t) { @autoreleasepool { string key = func_name + ":" + to_string(dim_) + ":" + native_mps::getMPSTypeString(input_t.scalar_type()); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -1347,7 +1372,6 @@ Tensor min_mps(const Tensor& input_t) { } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); @@ -1461,7 +1485,7 @@ Tensor min_mps(const Tensor& input_t) { CachedGraph* cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ native_mps::MPSCachedGraph * () { CachedGraph *newCachedGraph = nil; @@ -1502,7 +1526,6 @@ Tensor min_mps(const Tensor& input_t) { } return newCachedGraph; }); - cachedGraph = static_cast(tmpCachedGraph); } native_mps::Placeholder inputPlaceholder = native_mps::Placeholder(); @@ -1565,8 +1588,8 @@ Tensor min_mps(const Tensor& input_t) { // Use this if keepdim is false int64_t num_output_dims = num_input_dims - 1; - int64_t* malloc_apparent_out_shape = (int64_t *)malloc(num_input_dims * sizeof(int64_t)); - int64_t* malloc_out_shape = (int64_t *)malloc(num_output_dims * sizeof(int64_t)); + std::vector vec_apparent_out_shape(num_input_dims); + std::vector vec_out_shape(num_output_dims); apparent_out_shape = [NSMutableArray arrayWithCapacity:num_input_dims]; // Counter for shape when keepdim is false @@ -1574,12 +1597,12 @@ Tensor min_mps(const Tensor& input_t) { for(int i = 0; i < num_input_dims; i++) { if(dim_ == i) { apparent_out_shape[i] = @1; - malloc_apparent_out_shape[i] = 1; + vec_apparent_out_shape[i] = 1; } else { apparent_out_shape[i] = [NSNumber numberWithInt:input_shape[i]]; - malloc_apparent_out_shape[i] = input_shape[i]; - malloc_out_shape[out_i] = input_shape[i]; + vec_apparent_out_shape[i] = input_shape[i]; + vec_out_shape[out_i] = input_shape[i]; out_i++; } } @@ -1588,30 +1611,29 @@ Tensor min_mps(const Tensor& input_t) { Tensor indices_t; if(!keepdim) { output_t = at::native::empty_mps( - IntArrayRef(malloc_out_shape, num_output_dims), + IntArrayRef(vec_out_shape), input_t.scalar_type(), c10::nullopt, kMPS, c10::nullopt, c10::nullopt); indices_t = at::native::empty_mps( - IntArrayRef(malloc_out_shape, num_output_dims), + IntArrayRef(vec_out_shape), ScalarType::Long, c10::nullopt, kMPS, c10::nullopt, c10::nullopt); - } - else { + } else { output_t = 
at::native::empty_mps( - IntArrayRef(malloc_apparent_out_shape, num_input_dims), + IntArrayRef(vec_apparent_out_shape), input_t.scalar_type(), c10::nullopt, kMPS, c10::nullopt, c10::nullopt); indices_t = at::native::empty_mps( - IntArrayRef(malloc_apparent_out_shape, num_input_dims), + IntArrayRef(vec_apparent_out_shape), ScalarType::Long, c10::nullopt, kMPS, @@ -1620,15 +1642,11 @@ Tensor min_mps(const Tensor& input_t) { } if (output_t.numel() == 0 || input_t.numel() == 0) { - free(malloc_out_shape); - free(malloc_apparent_out_shape); return std::tuple{output_t, indices_t}; } min_max_out_mps(input_t, dim, keepdim, output_t, indices_t, reduction_type, func_name); - free(malloc_out_shape); - free(malloc_apparent_out_shape); return std::tuple{output_t, indices_t}; } @@ -1650,5 +1668,319 @@ Tensor min_mps(const Tensor& input_t) { return min_max_mps(input_t, dim, keepdim, MPSReductionType::MIN, "min_mps"); } +// Median of entire tensor into scalar result +Tensor median_mps(const Tensor& input_t) { + + if(!is_macos_13_or_newer()){ + TORCH_WARN_ONCE("MPS: median op is supported natively starting from macOS 13.0. ", + "Falling back on CPU. This may have performance implications."); + return at::median(input_t.to("cpu")); + } + + TORCH_INTERNAL_ASSERT(input_t.scalar_type() != ScalarType::Long, "median not supported for Long dtype on MPS"); + + namespace native_mps = at::native::mps; + using CachedGraph = native_mps::MPSUnaryCachedGraph; + + native_mps::MPSGraphCache* cache_ = native_mps::MPSGraphCache::getInstance(); + + IntArrayRef input_shape = input_t.sizes(); + int64_t num_input_dims = input_shape.size(); + + // calculate total no. of elements in the input tensor to reduce it to one dimension + NSMutableArray *apparent_input_shape = [NSMutableArray arrayWithCapacity:1]; + int64_t num_in_elements = 1; + for(int i = 0; i < num_input_dims; i++) { + num_in_elements *= input_shape[i]; + } + + apparent_input_shape[0] = [NSNumber numberWithInt:num_in_elements]; + + Tensor output_t = at::native::empty_mps({}, input_t.scalar_type(), c10::nullopt, kMPS, c10::nullopt, c10::nullopt); + + if (output_t.numel() == 0 || num_in_elements == 0) { + return output_t; + } + + @autoreleasepool { + string key = "median_mps:"+ mps::getMPSTypeString(input_t.scalar_type()) + mps::getTensorsStringKey(input_t); + CachedGraph* cachedGraph = cache_->LookUpAs(key); + // Initialize once if configuration not found in cache + if(!cachedGraph) { + native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + + CachedGraph *newCachedGraph = nil; + + @autoreleasepool { + MPSGraph* mpsGraph = native_mps::make_mps_graph(); + newCachedGraph = new CachedGraph(mpsGraph); + + MPSGraphTensor* inputTensor = native_mps::mpsGraphRankedPlaceHolder(mpsGraph, input_t); + + MPSGraphTensor* outputTensor = nil; + + MPSGraphTensor * reshapedTensor = [mpsGraph reshapeTensor:inputTensor + withShape:@[@-1] + name:nil]; + MPSGraphTensor * sortedTensor = [mpsGraph + sortWithTensor:reshapedTensor + axis:((NSUInteger) (int)0) + name:nil]; + + outputTensor = [mpsGraph sliceTensor:sortedTensor + dimension:0 + start:((NSUInteger) (int)((num_in_elements+1)/2 ) - 1) + length:1 + name:nil]; + + newCachedGraph->inputTensor_ = inputTensor; + newCachedGraph->outputTensor_ = outputTensor; + } + return newCachedGraph; + }); + cachedGraph = static_cast(tmpCachedGraph); + } + + auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); + auto outputPlaceholder =
native_mps::Placeholder(cachedGraph->outputTensor_, output_t, @[@1]); + + NSDictionary *feeds = @{ + inputPlaceholder.getMPSGraphTensor() : inputPlaceholder.getMPSGraphTensorData(), + }; + + NSDictionary *results = @{ + outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() + }; + + native_mps::runMPSGraph(getCurrentMPSStream(), cachedGraph->graph(), feeds, results); + } + + return output_t; +} + + +void median_out_mps + (const Tensor& input_t, + int64_t dim, + bool keepdim, + const Tensor& output_t, + const Tensor& indices_t, + const std::string& func_name) { + + namespace native_mps = at::native::mps; + + if (output_t.numel() == 0) { + return; + } + if (input_t.numel() == 1 && input_t.dim() == 0) { + output_t.fill_(input_t); + indices_t.fill_(0); + return; + } + + // Derive from MPSCachedGraph + struct CachedGraph : public native_mps::MPSCachedGraph + { + CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} + MPSGraphTensor *inputTensor_ = nil; + MPSGraphTensor *outputTensor_ = nil; + MPSGraphTensor *indicesTensor_ = nil; + }; + + native_mps::MPSGraphCache* cache_ = native_mps::MPSGraphCache::getInstance(); + + int64_t dim_ = maybe_wrap_dim(dim, input_t.dim()); + + // Calculate the output shape according to keepdim=True + // If there is no dim argument, the input shape is flattened + IntArrayRef input_shape = input_t.sizes(); + int64_t num_input_dims = input_shape.size(); + NSMutableArray *apparent_out_shape = nil; + + apparent_out_shape = [NSMutableArray arrayWithCapacity:num_input_dims]; + for(int i = 0; i < num_input_dims; i++) { + if(dim_ == i) + apparent_out_shape[i] = @1; + else + apparent_out_shape[i] = [NSNumber numberWithInt:input_shape[i]]; + } + int dim_total_elements = input_shape[dim_]; + + auto stream = at::mps::getCurrentMPSStream(); + + @autoreleasepool { + string key = func_name + ":" + to_string(dim_) + ":" + native_mps::getMPSTypeString(input_t.scalar_type()); + CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); + + if(!cachedGraph) { + native_mps::MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ native_mps::MPSCachedGraph * () { + + CachedGraph *newCachedGraph = nil; + + @autoreleasepool { + MPSGraph* mpsGraph = native_mps::make_mps_graph(); + newCachedGraph = new CachedGraph(mpsGraph); + + MPSGraphTensor* inputTensor = native_mps::mpsGraphUnrankedPlaceHolder(mpsGraph, native_mps::getMPSDataType(input_t.scalar_type())); + MPSGraphTensor* outputTensor = nil; + MPSGraphTensor * sortedTensor = [mpsGraph + sortWithTensor:inputTensor + axis:((NSUInteger) (int)dim_) + name:nil]; + + outputTensor = [mpsGraph sliceTensor:sortedTensor + dimension:dim_ + start:((NSUInteger) (int)((dim_total_elements+1)/2 ) - 1) + length:1 + name:nil]; + MPSGraphTensor* argreduceOutTensor = nil; + argreduceOutTensor = [mpsGraph argSortWithTensor:inputTensor + axis:(NSInteger)dim_ + name:@"argmax_out"]; + MPSGraphTensor* argOutputTensor = [mpsGraph sliceTensor:argreduceOutTensor + dimension:dim_ + start:((NSUInteger) (int)((dim_total_elements+1)/2 ) - 1) + length:1 + name:nil]; + + newCachedGraph->inputTensor_ = inputTensor; + newCachedGraph->outputTensor_ = outputTensor; + newCachedGraph->indicesTensor_ = argOutputTensor; + } + return newCachedGraph; + }); + cachedGraph = static_cast(tmpCachedGraph); + } + + auto inputPlaceholder = native_mps::Placeholder(cachedGraph->inputTensor_, input_t); + auto outputPlaceholder = native_mps::Placeholder(cachedGraph->outputTensor_, output_t, apparent_out_shape); + auto indicesPlaceholder = 
native_mps::Placeholder(cachedGraph->indicesTensor_, indices_t, apparent_out_shape); + + NSDictionary *feeds = @{ + inputPlaceholder.getMPSGraphTensor() : inputPlaceholder.getMPSGraphTensorData(), + }; + + NSDictionary *results = @{ + outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData(), + indicesPlaceholder.getMPSGraphTensor() : indicesPlaceholder.getMPSGraphTensorData() + }; + + native_mps::runMPSGraph(stream, cachedGraph->graph(), feeds, results); + + } + +} + +// in case mps sortWithTensor do not supported on macOS +std::tuple median_from_cpu( + const Tensor& self, + int64_t dim, + bool keepdim, Tensor & valuesI, Tensor & indicesI, IntArrayRef vec_out_shape, IntArrayRef vec_apparent_out_shape) { + // Tensor a = at::median(self.to("cpu")); + Tensor values; + Tensor indices; + if (!keepdim){ + values = at::empty({vec_out_shape}, self.options()); + indices = at::empty({vec_out_shape}, self.options().dtype(kLong)); + + } + else{ + values = at::empty({vec_apparent_out_shape}, self.options()); + indices = at::empty({vec_apparent_out_shape}, self.options().dtype(kLong)); + } + at::median_out(values, indices, self, dim, keepdim); + + valuesI.copy_(values); + indicesI.copy_(indices); + return std::forward_as_tuple(valuesI, indicesI); +} + +TORCH_API ::std::tuple median_out_mps + (const at::Tensor & input_t, + int64_t dim, + bool keepdim, + at::Tensor & values, + at::Tensor & indices){ + + TORCH_INTERNAL_ASSERT(input_t.scalar_type() != ScalarType::Long, "median not supported for Long dtype on MPS"); + + namespace native_mps = at::native::mps; + int64_t dim_ = maybe_wrap_dim(dim, input_t.dim()); + native::zero_numel_check_dims(input_t, dim_, "max()"); + + // Calculate the output shape according to keepdim=True + // If there is no dim argument, the input shape is flattened + IntArrayRef input_shape = input_t.sizes(); + int64_t num_input_dims = input_shape.size(); + NSMutableArray *apparent_out_shape = nil; + // Use this if keepdim is false + int64_t num_output_dims = num_input_dims - 1; + + std::vector vec_apparent_out_shape(num_input_dims); + std::vector vec_out_shape(num_output_dims); + + apparent_out_shape = [NSMutableArray arrayWithCapacity:num_input_dims]; + // Counter for shape when keepdim is false + int out_i = 0; + for(int i = 0; i < num_input_dims; i++) { + if(dim_ == i) { + apparent_out_shape[i] = @1; + vec_apparent_out_shape[i] = 1; + } + else { + apparent_out_shape[i] = [NSNumber numberWithInt:input_shape[i]]; + vec_apparent_out_shape[i] = input_shape[i]; + vec_out_shape[out_i] = input_shape[i]; + out_i++; + } + } + + if(!keepdim) { + values = at::native::empty_mps( + IntArrayRef(vec_out_shape), + input_t.scalar_type(), + c10::nullopt, + kMPS, + c10::nullopt, + c10::nullopt); + indices = at::native::empty_mps( + IntArrayRef(vec_out_shape), + ScalarType::Long, + c10::nullopt, + kMPS, + c10::nullopt, + c10::nullopt); + } else { + values = at::native::empty_mps( + IntArrayRef(vec_apparent_out_shape), + input_t.scalar_type(), + c10::nullopt, + kMPS, + c10::nullopt, + c10::nullopt); + indices = at::native::empty_mps( + IntArrayRef(vec_apparent_out_shape), + ScalarType::Long, + c10::nullopt, + kMPS, + c10::nullopt, + c10::nullopt); + } + + if (values.numel() == 0 || input_t.numel() == 0) { + return std::tuple{values, indices}; + } + + if(!is_macos_13_or_newer()){ + TORCH_WARN_ONCE("MPS: median op is supported natively starting from macOS 13.0.", + "Falling back on CPU. 
This may have performance implications."); + return median_from_cpu(input_t.to("cpu"), dim, keepdim, values, indices, IntArrayRef(vec_out_shape),IntArrayRef(vec_apparent_out_shape) ); + } + + median_out_mps(input_t, dim, keepdim, values, indices, "median_out_mps"); + + return std::tuple{values, indices}; +} + } // native } // at diff --git a/aten/src/ATen/native/mps/operations/Repeat.mm b/aten/src/ATen/native/mps/operations/Repeat.mm index 53bcddf405cc..8b6b709da642 100644 --- a/aten/src/ATen/native/mps/operations/Repeat.mm +++ b/aten/src/ATen/native/mps/operations/Repeat.mm @@ -108,16 +108,17 @@ Tensor repeat_mps(const Tensor& self, IntArrayRef repeats) { num_repeat_dims); // Set output shape - int64_t output_shape[num_repeat_dims]; + std::vector output_shape(num_repeat_dims); bool zero_tensor = false; - for(int i = 0; i < num_repeat_dims; i++) { + for(auto i : c10::irange(num_repeat_dims)) { output_shape[i] = repeats[i] * [apparent_input_shape[i] intValue]; - if(output_shape[i] == 0) + if(output_shape[i] == 0) { zero_tensor = true; + } } Tensor output = at::native::empty_mps( - IntArrayRef(output_shape, num_repeat_dims), + IntArrayRef(output_shape), self.scalar_type(), c10::nullopt, kMPS, diff --git a/aten/src/ATen/native/mps/operations/RnnOps.mm b/aten/src/ATen/native/mps/operations/RnnOps.mm index f15e842b54b2..23a59a19fdd2 100644 --- a/aten/src/ATen/native/mps/operations/RnnOps.mm +++ b/aten/src/ATen/native/mps/operations/RnnOps.mm @@ -193,7 +193,7 @@ Placeholder recurrentKernelWeight; Placeholder bias; Placeholder recurrentBias; - NSMutableDictionary *feeds = [[NSMutableDictionary alloc] init]; + NSMutableDictionary *feeds = [[[NSMutableDictionary alloc] init] autorelease]; for (size_t i = 0; i < num_layers; i+=1) { kernelWeight = Placeholder([kernelWeightsList objectAtIndex:i], kernel_weights[i]); recurrentKernelWeight = Placeholder([recurrentKernelWeightsList objectAtIndex:i], recurrent_kernel_weights[i]); @@ -425,7 +425,7 @@ Placeholder gradientHyPlaceholder = Placeholder(cachedGraph->inputTensors_[6], grad_hy); Placeholder gradientCyPlaceholder = Placeholder(cachedGraph->inputTensors_[7], grad_cy); - NSMutableDictionary *feeds = [[NSMutableDictionary alloc] init]; + NSMutableDictionary *feeds = [[[NSMutableDictionary alloc] init] autorelease]; [feeds setObject:gradientPlaceholder.getMPSGraphTensorData() forKey:gradientPlaceholder.getMPSGraphTensor()]; [feeds setObject:gradientHyPlaceholder.getMPSGraphTensorData() forKey:gradientHyPlaceholder.getMPSGraphTensor()]; [feeds setObject:gradientCyPlaceholder.getMPSGraphTensorData() forKey:gradientCyPlaceholder.getMPSGraphTensor()]; @@ -469,7 +469,7 @@ std::vector grad_hx = {grad_state, grad_cell_state}; - NSMutableDictionary *results = [[NSMutableDictionary alloc] init]; + NSMutableDictionary *results = [[[NSMutableDictionary alloc] init] autorelease]; NSMutableArray *gradOutputArray = cachedGraph->gradOutput_; NSMutableArray *gradRecWeightsArray = cachedGraph->gradRecWeights_; NSMutableArray *gradWeightsArray = cachedGraph->gradWeights_; diff --git a/aten/src/ATen/native/mps/operations/ScatterGather.mm b/aten/src/ATen/native/mps/operations/ScatterGather.mm index c4943d1242d9..cf8d8a1fef7e 100644 --- a/aten/src/ATen/native/mps/operations/ScatterGather.mm +++ b/aten/src/ATen/native/mps/operations/ScatterGather.mm @@ -15,7 +15,7 @@ namespace native { TORCH_IMPL_FUNC(gather_out_mps) -(const Tensor & self, +(const Tensor & self_arg, int64_t dim, const Tensor & index, bool sparse_grad, @@ -24,6 +24,8 @@ using namespace mps; MPSStream* stream
= getCurrentMPSStream(); + auto self = self_arg.dim() == 0 ? self_arg.view({1}) : self_arg; + dim = at::maybe_wrap_dim(dim, self.dim()); TORCH_CHECK(!sparse_grad, "sparse_grad not supported in MPS yet") @@ -150,7 +152,7 @@ } void scatter_mps_general -(const Tensor& self, +(const Tensor& self_arg, int64_t dim, const Tensor& index, const Tensor& src, @@ -161,6 +163,8 @@ using namespace mps; MPSStream* stream = getCurrentMPSStream(); + auto self = self_arg.dim() == 0 ? self_arg.view({1}) : self_arg; + dim = at::maybe_wrap_dim(dim, self.dim()); TORCH_CHECK(index.scalar_type() == ScalarType::Long || index.scalar_type() == ScalarType::Int, "index_select(): Expected dtype int32 or int64 for index"); @@ -358,13 +362,13 @@ // 2. Flatten the values // 3. Scatter into input with add mode - int shape_data[num_input_dims]; + std::vector shape_data(num_input_dims); for(int i = 0; i < num_input_dims; i++) { shape_data[i] = {[scatterInputShape[i] intValue]}; } - MPSGraphTensor* scatterInputShapeTensor = [mpsGraph constantWithData:[NSData dataWithBytes:shape_data length:num_input_dims * sizeof(int)] + MPSGraphTensor* scatterInputShapeTensor = [mpsGraph constantWithData:[NSData dataWithBytes:shape_data.data() length:num_input_dims * sizeof(int)] shape:@[[NSNumber numberWithInt:num_input_dims]] dataType:MPSDataTypeInt32]; diff --git a/aten/src/ATen/native/mps/operations/Shape.mm b/aten/src/ATen/native/mps/operations/Shape.mm index 6bb918061c89..f491f2ff823a 100644 --- a/aten/src/ATen/native/mps/operations/Shape.mm +++ b/aten/src/ATen/native/mps/operations/Shape.mm @@ -16,288 +16,6 @@ namespace at { namespace native { -namespace mps { - -// Pad operations (1D/2D/3D forward and backward) -Tensor& pad_out_template(Tensor &output, const Tensor &input_, IntArrayRef padding, - const c10::optional& grad_output_opt, - MPSGraphPaddingMode mode, double constantValue, const string op_name) -{ - const int padding_size = (int) padding.size(); - const int padding_dim = padding_size / 2; // either 1D, 2D, or 3D - - TORCH_CHECK(padding_size == 2 || padding_size == 4 || padding_size == 6, - "invalid padding argument of size ", padding_size); - - const Tensor& grad_output_ = *(at::borrow_from_optional_tensor(grad_output_opt)); - const bool is_backward_pass = grad_output_.defined(); - - int dim_w = padding_dim, dim_h = padding_dim - 1, dim_d = padding_dim - 2, dim_slices = 0; - int64_t nbatch = 1, ndims = input_.ndimension(); - - if (!is_backward_pass) { - bool valid_dims = input_.size(1) != 0 && input_.size(padding_dim) != 0; - TORCH_CHECK((ndims == 1 + padding_dim && valid_dims) || - (ndims == 2 + padding_dim && valid_dims && input_.size(1 + padding_dim) != 0), - "3D or 4D (batch mode) tensor expected for input, but got: ", input_); - } - - if (ndims == 2 + padding_dim) { - nbatch = input_.size(0); - dim_w++; - dim_h++; - dim_d++; - dim_slices++; - } - - int64_t pad_l = padding[0]; - int64_t pad_r = padding[1]; - int64_t pad_t = padding_dim > 1 ? padding[2] : 0; - int64_t pad_b = padding_dim > 1 ? padding[3] : 0; - int64_t pad_front = padding_dim > 2 ? padding[4] : 0; - int64_t pad_back = padding_dim > 2 ? padding[5] : 0; - - int64_t nplane = input_.size(dim_slices); - int64_t input_w = input_.size(dim_w); - int64_t output_w = input_w + pad_l + pad_r; - int64_t input_h = padding_dim > 1 ? input_.size(dim_h) : 0; - int64_t output_h = padding_dim > 1 ? input_h + pad_t + pad_b : 0; - int64_t input_d = padding_dim > 2 ? input_.size(dim_d) : 0; - int64_t output_d = padding_dim > 2 ? 
input_d + pad_front + pad_back : 0; - - Tensor grad_output, input = input_; - - if (!is_backward_pass) { - TORCH_CHECK(pad_l < input_w && pad_r < input_w, - "Argument #4: Padding size should be less than the corresponding " - "input dimension, but got: padding (", pad_l, ", ", pad_r, - ") at dimension ", dim_w, " of input ", ndims); - - if (padding_dim > 1) { - TORCH_CHECK(pad_t < input_h && pad_b < input_h, - "Argument #6: Padding size should be less than the corresponding " - "input dimension, but got: padding (", pad_t, ", ", pad_b, - ") at dimension ", dim_h, " of input ", ndims); - } - TORCH_CHECK(output_w >= 1 || output_h >= padding_dim - 1, - "input (H: ", input_h, ", W: ", input_w, ") is too small. Calculated " - "output H: ", output_h, " W: ", output_w); - - if (ndims == 1 + padding_dim) { - if (padding_dim == 3) - output.resize_({nplane, output_d, output_h, output_w}); - else if (padding_dim == 2) - output.resize_({nplane, output_h, output_w}); - else - output.resize_({nplane, output_w}); - } else { - if (padding_dim == 3) - output.resize_({nbatch, nplane, output_d, output_h, output_w}); - else if (padding_dim == 2) - output.resize_({nbatch, nplane, output_h, output_w}); - else - output.resize_({nbatch, nplane, output_w}); - } - if (output.numel() == 0 || input_.numel() == 0) - return output; - input = input_.contiguous(); - } else { - TORCH_CHECK(output_w == grad_output_.size(dim_w), - "gradOutput width unexpected. Expected: ", output_w, ", Got: ", grad_output_.size(dim_w)); - if (padding_dim > 1) { - TORCH_CHECK(output_h == grad_output_.size(dim_h), - "gradOutput height unexpected. Expected: ", output_h, ", Got: ", grad_output_.size(dim_h)); - } - grad_output = grad_output_.contiguous(); - } - - const int64_t input_dim = input.dim(); - MPSShape *leftPadding = nullptr, *rightPadding = nullptr; - if (padding_dim == 3) { - leftPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_front), @(pad_t), @(pad_l) } count:input_dim]; - rightPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_back), @(pad_b), @(pad_r) } count:input_dim]; - } else if (padding_dim == 2) { - leftPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_t), @(pad_l) } count:input_dim]; - rightPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_b), @(pad_r) } count:input_dim]; - } else if (padding_dim == 1) { - leftPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_l) } count:input_dim]; - rightPadding = [NSArray arrayWithObjects:(const NSNumber*[]){ @(0), @(0), @(pad_r) } count:input_dim]; - } - - struct CachedGraph : public MPSCachedGraph { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) { } - MPSGraphTensor *inputTensor = nil, *outputTensor = nil; - MPSGraphTensor *gradOutputTensor = nil; - }; - MPSGraphCache* cache_ = MPSGraphCache::getInstance(); - - @autoreleasepool { - string key = op_name + getTensorsStringKey({input, grad_output}) + - ":L" + to_string(pad_l) + ":R" + to_string(pad_r) + - ":T" + to_string(pad_t) + ":B" + to_string(pad_b) + - ":F" + to_string(pad_front) + ":K" + to_string(pad_back); - - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); - if(!cachedGraph) { - cachedGraph = static_cast(cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { - CachedGraph *newCachedGraph = nil; - @autoreleasepool { - MPSGraph* mpsGraph = make_mps_graph(); - newCachedGraph = new CachedGraph(mpsGraph); - newCachedGraph->inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, input); - if 
(!is_backward_pass) { - newCachedGraph->outputTensor = [mpsGraph padTensor:newCachedGraph->inputTensor - withPaddingMode:mode - leftPadding:leftPadding - rightPadding:rightPadding - constantValue:constantValue - name:nil]; - } else { - newCachedGraph->gradOutputTensor = mpsGraphRankedPlaceHolder(mpsGraph, grad_output); - newCachedGraph->outputTensor = [mpsGraph padGradientWithIncomingGradientTensor:newCachedGraph->gradOutputTensor - sourceTensor:newCachedGraph->inputTensor - paddingMode:mode - leftPadding:leftPadding - rightPadding:rightPadding - name:nil]; - } - } - return newCachedGraph; - })); - } - Placeholder inputPlaceholder = Placeholder(cachedGraph->inputTensor, input); - Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor, output); - - NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; - feeds[inputPlaceholder.getMPSGraphTensor()] = inputPlaceholder.getMPSGraphTensorData(); - if (is_backward_pass) { - Placeholder gradOutputPlaceholder = Placeholder(cachedGraph->gradOutputTensor, grad_output); - feeds[gradOutputPlaceholder.getMPSGraphTensor()] = gradOutputPlaceholder.getMPSGraphTensorData(); - } - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() - }; - runMPSGraph(getCurrentMPSStream(), cachedGraph->graph(), feeds, results); - } - return output; -} -} // namespace mps - -// 1D Reflection and Replication Padding -TORCH_IMPL_FUNC(reflection_pad1d_out_mps) -(const Tensor& input, IntArrayRef padding, const Tensor& output) -{ - mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, - MPSGraphPaddingModeReflect, 0.0, "reflection_pad1d_out_mps"); -} - -TORCH_IMPL_FUNC(reflection_pad1d_backward_out_mps) -(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, - MPSGraphPaddingModeReflect, 0.0, "reflection_pad1d_backward_out_mps"); -} - -TORCH_IMPL_FUNC(replication_pad1d_out_mps) -(const Tensor& input, IntArrayRef padding, const Tensor& output) -{ - mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, - MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad1d_out_mps"); -} - -TORCH_IMPL_FUNC(replication_pad1d_backward_out_mps) -(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, - MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad1d_backward_out_mps"); -} - -// 2D Reflection and Replication Padding -Tensor& reflection_pad2d_out_mps(const Tensor& input, IntArrayRef padding, Tensor& output) -{ - return mps::pad_out_template(output, input, padding, c10::nullopt, MPSGraphPaddingModeReflect, 0.0, __func__); -} - -Tensor reflection_pad2d_mps(const Tensor& input, IntArrayRef padding) -{ - Tensor output = at::empty({0}, input.options()); - return mps::pad_out_template(output, input, padding, c10::nullopt, MPSGraphPaddingModeReflect, 0.0, __func__); -} - -Tensor& reflection_pad2d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeReflect, 0.0, __func__); -} - -Tensor reflection_pad2d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef 
padding) -{ - auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeReflect, 0.0, __func__); -} - -TORCH_IMPL_FUNC(replication_pad2d_out_mps) -(const Tensor& input, IntArrayRef padding, const Tensor& output) -{ - mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, - MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad2d_out_mps"); -} - -Tensor& replication_pad2d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); -} - -Tensor replication_pad2d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding) -{ - auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); -} - -// 3D Reflection and Replication Padding -TORCH_IMPL_FUNC(reflection_pad3d_out_mps) -(const Tensor& input, IntArrayRef padding, const Tensor& output) -{ - mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, - MPSGraphPaddingModeReflect, 0.0, "reflection_pad3d_out_mps"); -} - -TORCH_IMPL_FUNC(reflection_pad3d_backward_out_mps) -(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, const Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - mps::pad_out_template(const_cast(grad_input), input, padding, grad_output, - MPSGraphPaddingModeReflect, 0.0, "reflection_pad3d_backward_out_mps"); -} - -TORCH_IMPL_FUNC(replication_pad3d_out_mps) -(const Tensor& input, IntArrayRef padding, const Tensor& output) -{ - mps::pad_out_template(const_cast(output), input, padding, c10::nullopt, - MPSGraphPaddingModeClampToEdge, 0.0, "replication_pad3d_out_mps"); -} - -Tensor& replication_pad3d_backward_out_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding, Tensor& grad_input) -{ - grad_input.resize_as_(input).zero_(); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); -} - -Tensor replication_pad3d_backward_mps(const Tensor& grad_output, const Tensor& input, IntArrayRef padding) -{ - auto grad_input = at::zeros_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT); - return mps::pad_out_template(grad_input, input, padding, grad_output, MPSGraphPaddingModeClampToEdge, 0.0, __func__); -} - -// backward pass is exlicitly handled in autograd by negating the "pad" argument -Tensor constant_pad_nd_mps(const Tensor& self, IntArrayRef pad, const Scalar& value) -{ - Tensor output = at::empty({0}, self.options()); - return mps::pad_out_template(output, self, pad, c10::nullopt, MPSGraphPaddingModeConstant, value.toDouble(), __func__); -} // topk TORCH_IMPL_FUNC(topk_out_mps) @@ -499,7 +217,7 @@ void check_shape_except_dim(const Tensor &first, const Tensor &second, //} TORCH_IMPL_FUNC(cat_out_mps) - (ITensorListRef inputs, + (const ITensorListRef& inputs, int64_t dimension, int64_t valid, bool all_contiguous, @@ -521,7 +239,7 @@ void check_shape_except_dim(const Tensor &first, const Tensor &second, idx++; } - dimension = legacy_cat_wrap_dim(dimension, inputs); + dimension = legacy_cat_wrap_dim(dimension, materialized_inputs); // previously, size [0] tensors were the only possible empty tensors; thus, it // wasn't possible to cat empty 
tensors unless all the other tensors were @@ -671,8 +389,8 @@ void check_shape_except_dim(const Tensor &first, const Tensor &second, // Create placeholders auto len_tensor_array = inputs.size() - skipped_tensor_indices.size(); - MPSGraphTensor* inputMPSGraphTensors[len_tensor_array]; - MPSGraphTensor* castInputMPSGraphTensors[len_tensor_array]; + std::vector inputMPSGraphTensors(len_tensor_array); + std::vector castInputMPSGraphTensors(len_tensor_array); int graph_tensor_idx = 0; for(const Tensor* tensor : input_tensors) { @@ -693,7 +411,7 @@ void check_shape_except_dim(const Tensor &first, const Tensor &second, graph_tensor_idx++; } - auto inputTensorsArray = [NSArray arrayWithObjects:castInputMPSGraphTensors + auto inputTensorsArray = [NSArray arrayWithObjects:castInputMPSGraphTensors.data() count:len_tensor_array]; // Use concatTensors to concatenate MPSGraphTensor* outputTensor = [mpsGraph concatTensors:inputTensorsArray diff --git a/aten/src/ATen/native/mps/operations/TensorCompare.mm b/aten/src/ATen/native/mps/operations/TensorCompare.mm index fb3b93a602f1..44d19e99c2f6 100644 --- a/aten/src/ATen/native/mps/operations/TensorCompare.mm +++ b/aten/src/ATen/native/mps/operations/TensorCompare.mm @@ -37,6 +37,47 @@ void clamp_mps_graph(CachedGraph* cachedGraph, const Tensor& input_tensor) } } +void check_min_max_dims(const OptionalTensorRef clamp_opt, + const Tensor& input_t, + string op_name) { + + if(!clamp_opt->is_same_size(input_t)) { + + auto num_clamp_dims = clamp_opt->dim(); + auto num_input_dims = input_t.dim(); + + auto clamp_shape = clamp_opt->sizes(); + auto input_shape = input_t.sizes(); + + TORCH_CHECK(num_clamp_dims <= num_input_dims, op_name + ": clamp tensor number of dims must not be greater than that of input tensor") + + for(int i = 0; i < num_clamp_dims; i++) + // One of the indices is allowed to be 1; will be handled by broadcast + TORCH_CHECK(clamp_shape[num_clamp_dims-1-i] == input_shape[num_input_dims-1-i] || + clamp_shape[num_clamp_dims-1-i] == 1 || + input_shape[num_input_dims-1-i] == 1, + op_name + ": clamp tensor trailing shape must match input tensor") + + } +} + +void fill_new_shape(int64_t num_input_dims, + int64_t num_clamp_dims, + int64_t *new_shape, + IntArrayRef clamp_shape) { + + // Extend the shape with ones to the left + int clamp_idx = 0; + for(int i = 0; i < num_input_dims; i++) { + if(i < num_input_dims - num_clamp_dims) + new_shape[i] = 1; + else { + new_shape[i] = clamp_shape[clamp_idx]; + clamp_idx++; + } + } +} + void clamp_tensor_out_mps(const Tensor& input_t, const OptionalTensorRef min_opt, const OptionalTensorRef max_opt, @@ -48,17 +89,54 @@ void clamp_tensor_out_mps(const Tensor& input_t, TORCH_CHECK(has_min || has_max, op_name + ": either min, max or both tensors must be defined") if (has_min) - TORCH_CHECK(min_opt->is_same_size(input_t), op_name + ": min and input tensors must be of the same shape") + check_min_max_dims(min_opt, input_t, op_name); + if (has_max) - TORCH_CHECK(max_opt->is_same_size(input_t), op_name + ": max and input tensors must be of the same shape") + check_min_max_dims(max_opt, input_t, op_name); if (output_t.numel() == 0) return; + IntArrayRef new_min_shape; + IntArrayRef new_max_shape; + + auto num_min_dims = min_opt->dim(); + auto num_max_dims = max_opt->dim(); + auto num_input_dims = input_t.dim(); + + std::vector new_min_arr(num_input_dims); + std::vector new_max_arr(num_input_dims); + + if(has_min && num_min_dims < num_input_dims) { + fill_new_shape(num_input_dims, num_min_dims, new_min_arr.data(), 
min_opt->sizes()); + new_min_shape = IntArrayRef(new_min_arr); + } + + if(has_max && num_max_dims < num_input_dims) { + fill_new_shape(num_input_dims, num_max_dims, new_max_arr.data(), max_opt->sizes()); + new_max_shape = IntArrayRef(new_max_arr); + } + + Tensor min_opt_tensor; + Tensor max_opt_tensor; + + if(has_min) { + min_opt_tensor = (num_min_dims < num_input_dims) ? (*min_opt).view(new_min_shape) : *min_opt; + } + if(has_max) { + max_opt_tensor = (num_max_dims < num_input_dims) ? (*max_opt).view(new_max_shape) : *max_opt; + } + @autoreleasepool { // the optional min/max refs could affect how we build the cached graph + + auto tensor_key = has_min ? (has_max ? getTensorsStringKey({input_t, min_opt_tensor, max_opt_tensor}) + : getTensorsStringKey({input_t, min_opt_tensor})) + : (has_max ? getTensorsStringKey({input_t, max_opt_tensor}) + : getTensorsStringKey({input_t})); + string key = op_name + (has_min ? "_min" : "") + (has_max ? "_max" : "") - + "_tensor" + getTensorsStringKey({input_t}); + + "_tensor" + tensor_key; MPSGraphCache* cache_ = MPSGraphCache::getInstance(); CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); @@ -71,9 +149,9 @@ void clamp_tensor_out_mps(const Tensor& input_t, newCachedGraph = new CachedGraph(mpsGraph); if (has_min) - newCachedGraph->minTensor = mpsGraphRankedPlaceHolder(mpsGraph, *min_opt); + newCachedGraph->minTensor = mpsGraphRankedPlaceHolder(mpsGraph, min_opt_tensor); if (has_max) - newCachedGraph->maxTensor = mpsGraphRankedPlaceHolder(mpsGraph, *max_opt); + newCachedGraph->maxTensor = mpsGraphRankedPlaceHolder(mpsGraph, max_opt_tensor); clamp_mps_graph(newCachedGraph, input_t); } @@ -88,11 +166,11 @@ void clamp_tensor_out_mps(const Tensor& input_t, NSMutableDictionary *feeds = [[NSMutableDictionary new] autorelease]; feeds[inputPlaceholder.getMPSGraphTensor()] = inputPlaceholder.getMPSGraphTensorData(); if (has_min) { - auto minPlaceholder = Placeholder(cachedGraph->minTensor, *min_opt); + auto minPlaceholder = Placeholder(cachedGraph->minTensor, min_opt_tensor); feeds[minPlaceholder.getMPSGraphTensor()] = minPlaceholder.getMPSGraphTensorData(); } if (has_max) { - auto maxPlaceholder = Placeholder(cachedGraph->maxTensor, *max_opt); + auto maxPlaceholder = Placeholder(cachedGraph->maxTensor, max_opt_tensor); feeds[maxPlaceholder.getMPSGraphTensor()] = maxPlaceholder.getMPSGraphTensorData(); } @@ -302,29 +380,33 @@ Tensor where_mps(const Tensor& condition, const Tensor& self, const Tensor& other) { - bool cond_zero_shape = (condition.dim() == 0); - bool self_zero_shape = (self.dim() == 0); - bool other_zero_shape = (other.dim() == 0); - auto max_dim = std::max(condition.dim(), std::max(self.dim(), other.dim())); - auto sum_dims = condition.dim() + self.dim() + other.dim(); + // How many leading dimensions do we broadcast across for each Tensor? + int cond_num_implicit_ones = (max_dim - condition.dim()); + int self_num_implicit_ones = (max_dim - self.dim()); + int other_num_implicit_ones = (max_dim - other.dim()); - TORCH_CHECK(max_dim == 0 || !(sum_dims % max_dim), "All inputs of where should have same/compatible number of dims") - - int64_t out_arr[max_dim]; + std::vector out_arr(max_dim); // Broadcasted output shape for(int i = 0; i < max_dim; i++) { - int64_t cond_num = cond_zero_shape ? 0 : condition.size(i); - int64_t self_num = self_zero_shape ? 0 : self.size(i); - int64_t other_num = other_zero_shape ? 
0 : other.size(i); + // Use up the leading broadcast dimensions for each Tensor, then continue from the start of the "actual" shape + int64_t cond_idx = i < cond_num_implicit_ones ? 1 : (condition.size(i - cond_num_implicit_ones)); + int64_t self_idx = i < self_num_implicit_ones ? 1 : (self.size(i - self_num_implicit_ones)); + int64_t other_idx = i < other_num_implicit_ones ? 1 : (other.size(i - other_num_implicit_ones)); + + auto max_idx = std::max({cond_idx, self_idx, other_idx}); + + TORCH_CHECK(cond_idx == max_idx || cond_idx == 1 || (cond_idx == 0 && max_idx == 1), i, "'th index ", cond_idx, " of condition tensor does not match the other tensors") + TORCH_CHECK(self_idx == max_idx || self_idx == 1 || (self_idx == 0 && max_idx == 1), i, "'th index ", self_idx, " of x tensor does not match the other tensors") + TORCH_CHECK(other_idx == max_idx || other_idx == 1 || (other_idx == 0 && max_idx == 1), i, "'th index ", other_idx, " of x tensor does not match the other tensors") - out_arr[i] = std::max(cond_num, std::max(self_num, other_num)); + out_arr[i] = (cond_idx == 0 || self_idx == 0 || other_idx == 0) ? 0 : max_idx; } - Tensor ret = empty_mps(IntArrayRef(out_arr, max_dim), + Tensor ret = empty_mps(IntArrayRef(out_arr), self.scalar_type(), c10::nullopt, kMPS, diff --git a/aten/src/ATen/native/mps/operations/TriangularOps.mm b/aten/src/ATen/native/mps/operations/TriangularOps.mm index fb6e1c52ba49..c27670796499 100644 --- a/aten/src/ATen/native/mps/operations/TriangularOps.mm +++ b/aten/src/ATen/native/mps/operations/TriangularOps.mm @@ -172,197 +172,5 @@ } -Tensor& diag_mps_out(const Tensor& self, - int64_t diagonal, - Tensor &output) { - - // Do checks, resize output - IntArrayRef input_size = self.sizes(); - auto num_input_dims = input_size.size(); - // Input can only be 1D or 2D - TORCH_CHECK(num_input_dims == 1 || num_input_dims == 2, - "diag_mps_out: Input tensor must be 1D or 2D") - - if(num_input_dims == 1) { - auto n = input_size[0]; - if(diagonal > 0) - n += diagonal; - else if(diagonal < 0) - n -= diagonal; - - output.resize_({n, n}); - } - else if(num_input_dims == 2) { - auto num_diag_elements = std::min(input_size[0], input_size[1]); - if(diagonal > 0) { - TORCH_CHECK(input_size[1] - diagonal > 0, "Matrix not big enough for requested diagonal") - num_diag_elements = std::min(input_size[0], input_size[1] - diagonal); - } - else if(diagonal < 0) { - TORCH_CHECK(input_size[0] + diagonal > 0, "Matrix not big enough for requested diagonal") - num_diag_elements = std::min(input_size[0] + diagonal, input_size[1]); - } - - output.resize_({num_diag_elements}); - } - - using namespace mps; - MPSStream* stream = getCurrentMPSStream(); - - // Derive from MPSCachedGraph - struct CachedGraph : public MPSCachedGraph - { - CachedGraph(MPSGraph *graph) : MPSCachedGraph(graph) {} - MPSGraphTensor *inputTensor_ = nil; - MPSGraphTensor *outputTensor_ = nil; - }; - - MPSGraphCache* cache_ = MPSGraphCache::getInstance(); - - @autoreleasepool { - - MPSShape* input_shape = getMPSShape(self); - MPSShape* output_shape = getMPSShape(output); - NSNumber* num_input_cols = nil; - NSNumber* num_output_cols = nil; - NSMutableArray* flat_input_shape = nil; - NSMutableArray* flat_output_shape = nil; - if(num_input_dims == 1) { - num_output_cols = output_shape[1]; - flat_output_shape = [NSMutableArray arrayWithCapacity:1]; - flat_output_shape[0] = [NSNumber numberWithInt:[output_shape[0] intValue] * [output_shape[1] intValue]]; - } - else if(num_input_dims == 2) { - num_input_cols = input_shape[1]; - 
flat_input_shape = [NSMutableArray arrayWithCapacity:1]; - flat_input_shape[0] = [NSNumber numberWithInt:[input_shape[0] intValue] * [input_shape[1] intValue]]; - } - NSString* ns_shape_key = [[input_shape valueForKey:@"description"] componentsJoinedByString:@","]; - string key = "diag_mps_out:" + getMPSTypeString(self.scalar_type()) + ":" + std::to_string(diagonal) - + ":" + string([ns_shape_key UTF8String]); - CachedGraph* cachedGraph = static_cast(cache_->LookUp(key)); - - if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph * () { - CachedGraph *newCachedGraph = nil; - - @autoreleasepool { - MPSGraph* mpsGraph = make_mps_graph(); - newCachedGraph = new CachedGraph(mpsGraph); - - // TODO: Accept this as the flat version in 2D case - MPSGraphTensor* inputTensor = nil; - if(num_input_dims == 1) - inputTensor = mpsGraphUnrankedPlaceHolder(mpsGraph, getMPSDataType(self.scalar_type())); - else - inputTensor = mpsGraphRankedPlaceHolder(mpsGraph, getMPSDataType(self.scalar_type()), flat_input_shape); - - MPSGraphTensor* outputTensor = nil; - - MPSGraphTensor* zeroTensor = [mpsGraph constantWithScalar:0 - dataType:MPSDataTypeInt32]; - MPSGraphTensor* numDiagElementsRange = nil; - MPSGraphTensor* diagOffset = nil; - MPSGraphTensor* rowMultiplier = nil; - MPSGraphTensor* rowIndices = nil; - MPSGraphTensor* colIndices = nil; - MPSGraphTensor* indicesTensor = nil; - - if(num_input_dims == 1) { - int shape_data[1] = {[input_shape[0] intValue]}; - MPSGraphTensor* inputShapeTensor = [mpsGraph constantWithData:[NSData dataWithBytes:shape_data length:sizeof(int)] - shape:@[@1] - dataType:MPSDataTypeInt32]; - numDiagElementsRange = [mpsGraph coordinateAlongAxisTensor: zeroTensor - withShapeTensor: inputShapeTensor - name: nil]; - diagOffset = [mpsGraph constantWithScalar:diagonal - dataType:MPSDataTypeInt32]; - rowMultiplier = [mpsGraph constantWithScalar:[num_output_cols intValue] - dataType:MPSDataTypeInt32]; - } - else { - int shape_data[1] = {[output_shape[0] intValue]}; - MPSGraphTensor* outputShapeTensor = [mpsGraph constantWithData:[NSData dataWithBytes:shape_data length:sizeof(int)] - shape:@[@1] - dataType:MPSDataTypeInt32]; - numDiagElementsRange = [mpsGraph coordinateAlongAxisTensor: zeroTensor - withShapeTensor: outputShapeTensor - name: nil]; - diagOffset = [mpsGraph constantWithScalar:diagonal - dataType:MPSDataTypeInt32]; - rowMultiplier = [mpsGraph constantWithScalar:[num_input_cols intValue] - dataType:MPSDataTypeInt32]; - } - - if(diagonal >= 0) { - rowIndices = numDiagElementsRange; - colIndices = [mpsGraph additionWithPrimaryTensor:numDiagElementsRange - secondaryTensor:diagOffset - name:nil]; - } - else { - rowIndices = [mpsGraph subtractionWithPrimaryTensor:numDiagElementsRange - secondaryTensor:diagOffset - name:nil];; - colIndices = numDiagElementsRange; - } - - indicesTensor = [mpsGraph multiplicationWithPrimaryTensor:rowIndices - secondaryTensor:rowMultiplier - name:nil]; - indicesTensor = [mpsGraph additionWithPrimaryTensor:indicesTensor - secondaryTensor:colIndices - name:nil]; - - if(num_input_dims == 1) { - // TODO: Scatter mode doesn't matter, so what should I set it to be? 
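As a rough, minimal sketch (not taken from the patch itself) of the user-visible behavior the TensorCompare.mm hunks above are aiming at, namely clamp() accepting min/max tensors that broadcast against the input and where() treating missing leading dimensions as implicit 1s, assuming a macOS build of this branch with the MPS backend available and purely illustrative tensor values:

import torch

x  = torch.randn(2, 3, 4, device="mps")
lo = torch.full((4,), -0.5, device="mps")    # fewer dims than x: trailing dims must match or be 1
hi = torch.full((3, 4), 0.5, device="mps")
y  = torch.clamp(x, min=lo, max=hi)          # previously min/max had to be the same size as x

cond = torch.tensor([True, False, True, False], device="mps")  # shape (4,)
a = torch.randn(2, 3, 4, device="mps")
b = torch.zeros(1, device="mps")                               # broadcast everywhere
z = torch.where(cond, a, b)                                    # leading dims become implicit 1s -> shape (2, 3, 4)
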
- outputTensor = [mpsGraph scatterWithUpdatesTensor:inputTensor - indicesTensor:indicesTensor - shape:flat_output_shape - axis:0 - mode:MPSGraphScatterModeAdd - name:nil]; - outputTensor = [mpsGraph reshapeTensor:outputTensor - withShape:output_shape - name:nil]; - } - else if(num_input_dims == 2) { - outputTensor = [mpsGraph gatherWithUpdatesTensor:inputTensor - indicesTensor:indicesTensor - axis:0 - batchDimensions:0 - name:nil]; - } - - newCachedGraph->inputTensor_ = inputTensor; - newCachedGraph->outputTensor_ = outputTensor; - } - return newCachedGraph; - }); - cachedGraph = static_cast(tmpCachedGraph); - } - - Placeholder selfPlaceholder = Placeholder(); - if(num_input_dims == 1) - selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); - else - selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self, flat_input_shape); - - Placeholder outputPlaceholder = Placeholder(cachedGraph->outputTensor_, output); - - NSDictionary* feeds = @{ - selfPlaceholder.getMPSGraphTensor() : selfPlaceholder.getMPSGraphTensorData() - }; - NSDictionary* results = @{ - outputPlaceholder.getMPSGraphTensor() : outputPlaceholder.getMPSGraphTensorData() - }; - - runMPSGraph(stream, cachedGraph->graph(), feeds, results); - } - - return output; -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/mps/operations/UnaryOps.mm b/aten/src/ATen/native/mps/operations/UnaryOps.mm index 2231a66fb3ac..3d641d3af82c 100644 --- a/aten/src/ATen/native/mps/operations/UnaryOps.mm +++ b/aten/src/ATen/native/mps/operations/UnaryOps.mm @@ -5,6 +5,7 @@ #include #include #include +#include #include namespace at { @@ -12,24 +13,29 @@ namespace mps { typedef MPSGraphTensor* (^UnaryOpBlock)(MPSGraph*, MPSGraphTensor*); +using is_noop_p = std::function; -void unary_op(const Tensor& self, const Tensor& output, std::string op_name, UnaryOpBlock unaryBlock) + +bool is_empty_tensor(const Tensor& self) { + return self.numel() == 0; +} + +void unary_op(const Tensor& self, const Tensor& output, std::string op_name, UnaryOpBlock unaryBlock, is_noop_p is_noop = is_empty_tensor) { - TORCH_CHECK_TYPE(self.scalar_type() != ScalarType::Long, "Operation '", op_name, "()' does not support input type 'int64' in MPS backend."); if (!output.is_same_size(self)) { output.resize_(self.sizes()); } - // Empty tensor is noop - if (self.numel() == 0) { + if (is_noop(self)) { + output.copy_(self); return; } MPSGraphCache* cache_ = MPSGraphCache::getInstance(); @autoreleasepool { - string key = op_name + getTensorsStringKey({self}, /*use_scalar_value*/ false); + string key = op_name + getTensorsStringKey({self, output}, /*use_scalar_value*/ false); auto cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph* () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph* () { MPSUnaryCachedGraph *newCachedGraph = nil; @autoreleasepool { MPSGraph* mpsGraph = make_mps_graph(); @@ -44,7 +50,6 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); @@ -61,6 +66,14 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una MPSGraphTensor* trunc_tensor(MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + // Rounding is a no-op for integral types, and also a reasonable workaround + // For MPSGraph bug on Apple Silicon, that throws `Function 
floorOp_i64 was not found in the library` + // See https://github.com/pytorch/pytorch/issues/84995 + bool isFloatInput = ([inputTensor dataType] & MPSDataTypeFloatBit) != 0; + if (!isFloatInput) { + return inputTensor; + } + MPSGraphTensor* zeroTensor = [mpsGraph constantWithScalar:0.0 dataType:inputTensor.dataType]; MPSGraphTensor* predicateTensor = [mpsGraph lessThanWithPrimaryTensor:inputTensor @@ -80,6 +93,51 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una { return mps::trunc_tensor(mpsGraph, inputTensor); }); } +TORCH_IMPL_FUNC(signbit_out_mps) (const Tensor& self, const Tensor& output) { + mps::unary_op(self, output, "signbit_out_mps", + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + MPSGraphTensor* output; + // signbit is not implemented for int64 type. + // workaround for `Function signbitOp_i64 was not found in the library` + if ([inputTensor dataType] == MPSDataTypeInt64) { + MPSGraphTensor* zeroTensor = [mpsGraph constantWithScalar:0.0 dataType:inputTensor.dataType]; + output = [mpsGraph lessThanWithPrimaryTensor:inputTensor + secondaryTensor:zeroTensor + name:nil]; + } else { + output = [mpsGraph signbitWithTensor: inputTensor name: nil]; + } + return mps::castMPSTensor(mpsGraph, output, ScalarType::Bool); + }); +} + +TORCH_IMPL_FUNC(sign_out_mps) (const Tensor& self, const Tensor& output) { + mps::unary_op(self, output, "sign_out_mps", + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + // Sign op is not implemented in MPS as of MacOS13.0 beta, so simulate it using clamp + if ([inputTensor dataType] == MPSDataTypeInt64) { + return [mpsGraph clampWithTensor:inputTensor + minValueTensor:[mpsGraph constantWithScalar:-1 dataType:MPSDataTypeInt64] + maxValueTensor:[mpsGraph constantWithScalar:1 dataType:MPSDataTypeInt64] + name: nil]; + } + return [mpsGraph signWithTensor: inputTensor name: nil]; + }); +} + +#define CREATE_MPS_STRUCTURED_UNARY_ROUNDING_TORCH_IMPL_FUNC(func_out, func_stub) \ +TORCH_IMPL_FUNC(func_out) (const Tensor& self, const Tensor& output) { \ + mps::unary_op(self, output, #func_out, \ + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) \ + { return [mpsGraph func_stub##WithTensor:inputTensor name:nil]; }, \ + [](const Tensor& t) -> bool { \ + return t.numel() == 0 || isIntegralType(t.scalar_type()); \ + }); \ +} +CREATE_MPS_STRUCTURED_UNARY_ROUNDING_TORCH_IMPL_FUNC(ceil_out_mps, ceil) +CREATE_MPS_STRUCTURED_UNARY_ROUNDING_TORCH_IMPL_FUNC(floor_out_mps, floor) +CREATE_MPS_STRUCTURED_UNARY_ROUNDING_TORCH_IMPL_FUNC(round_out_mps, round) + #define CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(func_out, func_stub) \ TORCH_IMPL_FUNC(func_out) (const Tensor& self, const Tensor& output) { \ mps::unary_op(self, output, #func_out, \ @@ -101,14 +159,10 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(reciprocal_out_mps, reciprocal) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(sqrt_out_mps, squareRoot) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(rsqrt_out_mps, reverseSquareRoot) -CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(sign_out_mps, sign) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(neg_out_mps, negative) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(log_out_mps, logarithm) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(log10_out_mps, logarithmBase10) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(log2_out_mps, logarithmBase2) -CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(ceil_out_mps, ceil) 
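Similarly, a minimal sketch of the behavior targeted by the UnaryOps.mm changes above (sign/signbit emulation for int64 inputs and the new no-op path for rounding ops on integral inputs); this again assumes an MPS-enabled macOS build of this branch, and the commented results are expected values rather than captured output:

import torch

i = torch.tensor([-2, 0, 3], dtype=torch.int64, device="mps")
torch.sign(i)     # int64 path is emulated with clamp(-1, 1): tensor([-1, 0, 1])
torch.signbit(i)  # int64 path is emulated with x < 0:        tensor([True, False, False])

f = torch.tensor([-1.5, 0.25, 2.0], device="mps")
torch.floor(f)    # floating inputs go through the regular MPSGraph floor
torch.floor(i)    # integral inputs hit the new is_noop predicate and are simply copied to the output
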
-CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(floor_out_mps, floor) -CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(round_out_mps, round) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(erf_out_mps, erf) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(sin_out_mps, sin) CREATE_MPS_STRUCTURED_UNARY_TORCH_IMPL_FUNC(cos_out_mps, cos) @@ -144,7 +198,7 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una auto cachedGraph = cache_->LookUpAs(key); if(!cachedGraph) { - MPSCachedGraph *tmpCachedGraph = cache_->CreateCachedGraph(key, ^ MPSCachedGraph* () { + cachedGraph = cache_->CreateCachedGraphAs(key, ^ MPSCachedGraph* () { MPSUnaryCachedGraph *newCachedGraph = nil; @autoreleasepool { MPSGraph* mpsGraph = make_mps_graph(); @@ -161,7 +215,6 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una } return newCachedGraph; }); - cachedGraph = tmpCachedGraph->as(); } Placeholder selfPlaceholder = Placeholder(cachedGraph->inputTensor_, self); @@ -176,5 +229,70 @@ void unary_op(const Tensor& self, const Tensor& output, std::string op_name, Una } } +TORCH_IMPL_FUNC(frac_out_mps) (const Tensor& self, const Tensor& output) { + TORCH_CHECK(isFloatingType(self.scalar_type()), "frac_out_mps is only implemented for floating types"); + mps::unary_op(self, output, "frac_out_mps", + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + auto zeroTensor = [mpsGraph constantWithScalar:0.0 + dataType:inputTensor.dataType]; + auto predicateTensor = [mpsGraph lessThanWithPrimaryTensor:inputTensor + secondaryTensor:zeroTensor + name:nil]; + auto truncTensor = [mpsGraph selectWithPredicateTensor:predicateTensor + truePredicateTensor:[mpsGraph ceilWithTensor :inputTensor name:nil] + falsePredicateTensor:[mpsGraph floorWithTensor:inputTensor name:nil] + name:nil]; + return [mpsGraph subtractionWithPrimaryTensor:inputTensor + secondaryTensor:truncTensor + name: nil]; + }); +} + +TORCH_IMPL_FUNC(expm1_out_mps) (const Tensor& self, const Tensor& output) { + mps::unary_op(self, output, "expm1_out_mps", + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + MPSGraphTensor* oneTensor = [mpsGraph constantWithScalar:1.0 + shape:@[@1] + dataType:inputTensor.dataType]; + MPSGraphTensor* ePowTensor = [mpsGraph exponentWithTensor:inputTensor + name:nil]; + return [mpsGraph subtractionWithPrimaryTensor:ePowTensor + secondaryTensor:oneTensor + name: nil]; + }); +} + + + +TORCH_IMPL_FUNC(cumsum_out_mps) +(const Tensor& self, + int64_t dim, + c10::optional dtype, + const Tensor& result) { + TORCH_CHECK(dim >=0 && dim < std::max(1LL, self.ndimension()), "Expected dim to be between 0 and ", self.ndimension(), " but got ", dim); + if (!is_macos_13_or_newer()) { + TORCH_WARN_ONCE("torch.cumsum supported by MPS on MacOS 13+, please upgrade"); + auto cpu_result = self.to(at::Device(kCPU)).cumsum(dim, dtype); + at::_copy_from_and_resize(cpu_result, result); + return; + } + auto input = dtype.has_value() ? 
self.to(dtype.value()) : self; + mps::unary_op(input, result, "cumsum_out_mp" + std::to_string(dim), + ^ MPSGraphTensor* (MPSGraph* mpsGraph, MPSGraphTensor* inputTensor) { + // cumsum is horribly broken for int8, int16 and as chances for overflow is pretty high, cast to int32 + if (isIntegralType(input.scalar_type()) && input.scalar_type() !=ScalarType::Int) { + inputTensor = mps::castMPSTensor(mpsGraph, inputTensor, result.scalar_type()); + } + auto rc = [mpsGraph cumulativeSumWithTensor: inputTensor + axis: dim + name: nil]; + if (result.scalar_type()!= input.scalar_type() || + (isIntegralType(input.scalar_type()) && input.scalar_type() !=ScalarType::Int)) { + return mps::castMPSTensor(mpsGraph, rc, result.scalar_type()); + } + return rc; + }); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/mps/operations/View.mm b/aten/src/ATen/native/mps/operations/View.mm index a8a55b21d246..0e35c7b2f642 100644 --- a/aten/src/ATen/native/mps/operations/View.mm +++ b/aten/src/ATen/native/mps/operations/View.mm @@ -2,19 +2,9 @@ #include #include +#include namespace at { - -// these are from MPSAllocator -namespace mps { - // to check the requested non-aligned size of an MTL buffer - ssize_t get_requested_buffer_size(void* ptr); - // to retrieve the shape of a base tensor from a view tensor - IntArrayRef get_buffer_shape(void* ptr); - // to set the shape of a base tensor from a view tensor - void set_buffer_shape(void* ptr, const IntArrayRef& shape); -} - namespace native { namespace mps { @@ -62,9 +52,13 @@ shape: getMPSShape(src.numel()) dataType: inputType] autorelease]; } - feeds[cachedGraph->storageOffsetTensor] = getMPSGraphTensorFromScalar(stream, Scalar(storage_offset), MPSDataTypeInt32); + MPSScalar storageOffsetScalar = getMPSScalar(storage_offset, ScalarType::Int); + feeds[cachedGraph->storageOffsetTensor] = getMPSGraphTensorFromScalar(stream, storageOffsetScalar); + + std::vector strideScalars(sizes.size()); for (int i = 0; i < sizes.size(); i++) { - feeds[cachedGraph->strideTensors[i]] = getMPSGraphTensorFromScalar(stream, Scalar(strides[i]), MPSDataTypeInt32); + strideScalars[i] = getMPSScalar(strides[i], ScalarType::Int); + feeds[cachedGraph->strideTensors[i]] = getMPSGraphTensorFromScalar(stream, strideScalars[i]); } // Workaround for MPSShaderLibrary bug // TODO: Remove once https://github.com/pytorch/pytorch/issues/82305 is resolved @@ -79,7 +73,7 @@ cachedGraph->outputTensor : outputTensorData }; stream->executeMPSGraph(cachedGraph->graph(), feeds, results, - requires_sync ? SyncType::COMMIT : SyncType::NONE); + requires_sync ? SyncType::COMMIT : SyncType::COMMIT_ADAPTIVE); } return output; } @@ -144,7 +138,7 @@ withShape: @[@-1] name: nil]; if (needsScatter) { - MPSGraphTensor* scatteredTensor = [mpsGraph scatterAlongAxis: 0 + MPSGraphTensor* scatteredTensor = [mpsGraph scatterAlongAxis: (NSInteger) 0 withDataTensor: reshapedInputTensor updatesTensor: cachedGraph->updatesTensor indicesTensor: reshapedIndicesTensor @@ -201,7 +195,9 @@ // IntArrayRef wouldn't own the data, so we use a static storage static const int64_t shape_1d = 1; // self.sizes().size() could be zero - base_shape = self.sizes().size() ? self.sizes() : IntArrayRef(&shape_1d, 1); + base_shape = self.sizes().size() ? self.sizes() : + self.is_view() ? 
self._base().sizes() : IntArrayRef(&shape_1d, 1); + // base_shape will be retained in MPSAllocator until buffer gets recycled if (self.storage().data()) set_buffer_shape(self.storage().data(), base_shape); @@ -232,7 +228,7 @@ newCachedGraph->strideTensors.push_back(mpsGraphRankedPlaceHolder(mpsGraph, MPSDataTypeInt32, @[@1])); } if (needsScatter) { - newCachedGraph->updatesTensor = mpsGraphUnrankedPlaceHolder(mpsGraph, getMPSDataType(self.scalar_type())); + newCachedGraph->updatesTensor = mpsGraphUnrankedPlaceHolder(mpsGraph, inputType); } newCachedGraph->outputTensor = chainViewOperation(newCachedGraph, size, stride, storage_offset, base_shape, needsScatter, needsBoolCast); } @@ -278,7 +274,7 @@ Tensor gatherViewTensor(const at::Tensor& src, at::Tensor& dst) } // namespace mps // implementation of as_strided() op -Tensor as_strided_tensorimpl_mps(const Tensor& self, IntArrayRef size, IntArrayRef stride, optional storage_offset_) +Tensor as_strided_tensorimpl_mps(const Tensor& self, IntArrayRef size, IntArrayRef stride, c10::optional storage_offset_) { auto storage_offset = storage_offset_.value_or(self.storage_offset()); auto result = detail::make_tensor(c10::TensorImpl::VIEW, Storage(self.storage()), self.key_set(), self.dtype()); diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml index 7c43a7b25eef..e8d2b884c6d2 100644 --- a/aten/src/ATen/native/native_functions.yaml +++ b/aten/src/ATen/native/native_functions.yaml @@ -170,6 +170,9 @@ CPU: _assert_async_cpu CUDA: _assert_async_cuda + +- func: _assert_tensor_metadata(Tensor a, int[]? size=None, int[]? stride=None, ScalarType? dtype=None) -> () + - func: refine_names(Tensor(a) self, Dimname[] names) -> Tensor(a) variants: method @@ -178,20 +181,30 @@ dispatch: CUDA: _use_cudnn_ctc_loss +- func: _use_cudnn_ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank) -> bool + device_check: NoCheck # Tensor arguments allowed to be on different devices, see also _cudnn_ctc_loss + dispatch: + CUDA: _use_cudnn_ctc_loss_tensor + - func: _cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor) device_check: NoCheck # log_probs is expected to be on CUDA while targets is expected to be on CPU dispatch: CUDA: _cudnn_ctc_loss autogen: _cudnn_ctc_loss.out +- func: _cudnn_ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank, bool deterministic, bool zero_infinity) -> (Tensor, Tensor) + device_check: NoCheck # log_probs is expected to be on CUDA while targets is expected to be on CPU + dispatch: + CUDA: _cudnn_ctc_loss_tensor + - func: _use_cudnn_rnn_flatten_weight() -> bool -- func: _cudnn_rnn_flatten_weight(Tensor[] weight_arr, int weight_stride0, int input_size, int mode, int hidden_size, int proj_size, int num_layers, bool batch_first, bool bidirectional) -> Tensor +- func: _cudnn_rnn_flatten_weight(Tensor[] weight_arr, int weight_stride0, SymInt input_size, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, bool bidirectional) -> Tensor dispatch: CUDA: _cudnn_rnn_flatten_weight autogen: _cudnn_rnn_flatten_weight.out -- func: _cudnn_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor? weight_buf, Tensor hx, Tensor? 
cx, int mode, int hidden_size, int proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) +- func: _cudnn_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor? weight_buf, Tensor hx, Tensor? cx, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, SymInt[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) # rnn_tanh may or may not redispatch to _cudnn_rnn based on algorithm and build. Thus it might hit dispatch or kernel device check. # Disable dispatch time device check for consistent behavior. device_check: NoCheck @@ -199,7 +212,7 @@ CUDA: _cudnn_rnn autogen: _cudnn_rnn.out -- func: _cudnn_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, int hidden_size, int proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) +- func: _cudnn_rnn_backward(Tensor input, Tensor[] weight, int weight_stride0, Tensor weight_buf, Tensor hx, Tensor? cx, Tensor output, Tensor? grad_output, Tensor? grad_hy, Tensor? grad_cy, int mode, SymInt hidden_size, SymInt proj_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, SymInt[] batch_sizes, Tensor? dropout_state, Tensor reserve, bool[4] output_mask) -> (Tensor, Tensor, Tensor, Tensor[]) dispatch: CUDA: _cudnn_rnn_backward autogen: _cudnn_rnn_backward.out @@ -230,12 +243,13 @@ dispatch: CPU: native_dropout_cpu CUDA: native_dropout_cuda - tags: nondeterministic_seeded + NestedTensorCPU, NestedTensorCUDA: native_dropout_nested + tags: nondeterministic_seeded, canonical autogen: native_dropout.out - func: native_dropout_backward(Tensor grad_output, Tensor mask, float scale) -> Tensor dispatch: - CPU: native_dropout_backward_cpu + CPU, NestedTensorCPU, NestedTensorCUDA: native_dropout_backward CUDA: native_dropout_backward_cuda autogen: native_dropout_backward.out @@ -252,27 +266,28 @@ - func: _shape_as_tensor(Tensor self) -> Tensor - func: dropout(Tensor input, float p, bool train) -> Tensor - dispatch: - CompositeImplicitAutograd: dropout - NestedTensorCPU, NestedTensorCUDA: dropout_nested tags: nondeterministic_seeded - func: dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) - dispatch: - CompositeImplicitAutograd: dropout_ - NestedTensorCPU, NestedTensorCUDA: dropout_nested_ + tags: nondeterministic_seeded - func: feature_dropout(Tensor input, float p, bool train) -> Tensor + tags: nondeterministic_seeded - func: feature_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded - func: alpha_dropout(Tensor input, float p, bool train) -> Tensor + tags: nondeterministic_seeded - func: alpha_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) + tags: nondeterministic_seeded - func: feature_alpha_dropout(Tensor input, float p, bool train) -> Tensor + tags: nondeterministic_seeded - func: feature_alpha_dropout_(Tensor(a!) self, float p, bool train) -> Tensor(a!) 
+ tags: nondeterministic_seeded - func: abs(Tensor self) -> Tensor device_check: NoCheck # TensorIterator @@ -281,6 +296,7 @@ CompositeExplicitAutograd: abs SparseCPU, SparseCUDA: abs_sparse SparseCsrCPU, SparseCsrCUDA: abs_sparse_csr + tags: canonical - func: abs_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -475,6 +491,7 @@ MkldnnCPU: mkldnn_add ZeroTensor: add_zerotensor NestedTensorCPU, NestedTensorCUDA: NestedTensor_add_Tensor + tags: canonical - func: add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -533,6 +550,7 @@ variants: function, method dispatch: CompositeExplicitAutograd: add + tags: canonical - func: add_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -650,6 +668,7 @@ dispatch: CompositeExplicitAutograd: arange cpp_no_default_args: ['step'] + tags: canonical - func: arange.out(Scalar end, *, Tensor(a!) out) -> Tensor(a!) dispatch: @@ -779,23 +798,24 @@ - func: arctanh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) -- func: as_strided(Tensor(a) self, int[] size, int[] stride, int? storage_offset=None) -> Tensor(a) +- func: as_strided(Tensor(a) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a) variants: function, method dispatch: - ZeroTensor, CPU, CUDA, Meta: as_strided_tensorimpl + ZeroTensor, CPU, CUDA: as_strided_tensorimpl + Meta: as_strided_tensorimpl_meta_symint MPS: as_strided_tensorimpl_mps QuantizedCPU, QuantizedCUDA: as_strided_qtensorimpl device_check: NoCheck device_guard: False -- func: as_strided_(Tensor(a!) self, int[] size, int[] stride, int? storage_offset=None) -> Tensor(a!) +- func: as_strided_(Tensor(a!) self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor(a!) use_const_ref_for_mutable_tensors: True variants: function, method device_check: NoCheck device_guard: False tags: inplace_view dispatch: - CompositeExplicitAutogradNonFunctional: as_strided_ + CompositeExplicitAutogradNonFunctional: as_strided__symint - func: asin(Tensor self) -> Tensor device_check: NoCheck # TensorIterator @@ -933,6 +953,7 @@ - func: bernoulli.out(Tensor self, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: function + tags: nondeterministic_seeded dispatch: CPU, CUDA: bernoulli_out MPS: bernoulli_out_mps @@ -940,6 +961,7 @@ - func: bernoulli_.Tensor(Tensor(a!) self, Tensor p, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: method + tags: nondeterministic_seeded dispatch: CPU, CUDA: bernoulli_ MPS: bernoulli_mps_ @@ -948,6 +970,7 @@ - func: bernoulli_.float(Tensor(a!) self, float p=0.5, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: method + tags: nondeterministic_seeded dispatch: CPU, CUDA: bernoulli_ MPS: bernoulli_mps_ @@ -962,6 +985,8 @@ device_check: NoCheck # TensorIterator variants: function, method tags: nondeterministic_seeded + dispatch: + CompositeExplicitAutogradNonFunctional: bernoulli - func: bilinear(Tensor input1, Tensor input2, Tensor weight, Tensor? bias=None) -> Tensor @@ -1018,6 +1043,7 @@ device_check: NoCheck # TensorIterator structured_delegate: bitwise_not.out variants: function, method + tags: canonical - func: bitwise_not_(Tensor(a!) self) -> Tensor(a!) 
device_check: NoCheck # TensorIterator @@ -1150,7 +1176,9 @@ dispatch: SparseCPU: bmm_sparse_cpu SparseCUDA: bmm_sparse_cuda - NestedTensorCPU, NestedTensorCUDA: bmm_nested + NestedTensorCPU: bmm_nested + NestedTensorCUDA: bmm_nested_cuda + tags: canonical - func: bmm.out(Tensor self, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -1167,8 +1195,10 @@ device_check: NoCheck device_guard: False -- func: broadcast_to(Tensor(a) self, int[] size) -> Tensor(a) +- func: broadcast_to(Tensor(a) self, SymInt[] size) -> Tensor(a) variants: function, method + dispatch: + CompositeImplicitAutograd: broadcast_to_symint - func: _sparse_broadcast_to(Tensor(a) self, int[] size) -> Tensor(a) variants: function @@ -1180,6 +1210,7 @@ dispatch: SparseCPU, SparseCUDA: cat_sparse QuantizedCPU: cat_quantized_cpu + tags: canonical - func: cat.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -1204,6 +1235,15 @@ - func: concat.names_out(Tensor[] tensors, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) +# alias for torch.cat +- func: concatenate(Tensor[] tensors, int dim=0) -> Tensor + +- func: concatenate.out(Tensor[] tensors, int dim=0, *, Tensor(a!) out) -> Tensor(a!) + +- func: concatenate.names(Tensor[] tensors, Dimname dim) -> Tensor + +- func: concatenate.names_out(Tensor[] tensors, Dimname dim, *, Tensor(a!) out) -> Tensor(a!) + - func: block_diag(Tensor[] tensors) -> Tensor variants: function dispatch: @@ -1252,12 +1292,19 @@ variants: function, method device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: chunk + NestedTensorCPU, NestedTensorCUDA: chunk_nested_tensor -- func: tensor_split.sections(Tensor(a -> *) self, int sections, int dim=0) -> Tensor(a)[] +- func: tensor_split.sections(Tensor(a -> *) self, SymInt sections, int dim=0) -> Tensor(a)[] variants: function, method + dispatch: + CompositeImplicitAutograd: tensor_split_sections_symint -- func: tensor_split.indices(Tensor(a -> *) self, int[] indices, int dim=0) -> Tensor(a)[] +- func: tensor_split.indices(Tensor(a -> *) self, SymInt[] indices, int dim=0) -> Tensor(a)[] variants: function, method + dispatch: + CompositeImplicitAutograd: tensor_split_indices_symint - func: tensor_split.tensor_indices_or_sections(Tensor(a -> *) self, Tensor tensor_indices_or_sections, int dim=0) -> Tensor(a)[] variants: function, method @@ -1269,6 +1316,7 @@ structured_delegate: clamp.out dispatch: QuantizedCPU: clamp_quantized_cpu + tags: canonical - func: clamp.Tensor(Tensor self, Tensor? min=None, Tensor? max=None) -> Tensor variants: function, method @@ -1411,26 +1459,29 @@ dispatch: CPU, CUDA: polar_out -- func: constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> Tensor +- func: constant_pad_nd(Tensor self, SymInt[] pad, Scalar value=0) -> Tensor variants: function dispatch: CompositeExplicitAutograd: constant_pad_nd MPS: constant_pad_nd_mps autogen: constant_pad_nd.out + tags: canonical - func: contiguous(Tensor(a) self, *, MemoryFormat memory_format=contiguous_format) -> Tensor(a) variants: method manual_cpp_binding: True -- func: convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> Tensor +- func: convolution(Tensor input, Tensor weight, Tensor? 
bias, int[] stride, SymInt[] padding, int[] dilation, bool transposed, SymInt[] output_padding, int groups) -> Tensor dispatch: CompositeExplicitAutograd: convolution autogen: convolution.out + tags: canonical -- func: convolution_backward(Tensor grad_output, Tensor input, Tensor weight, int[]? bias_sizes, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) +- func: convolution_backward(Tensor grad_output, Tensor input, Tensor weight, SymInt[]? bias_sizes, int[] stride, SymInt[] padding, int[] dilation, bool transposed, SymInt[] output_padding, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) dispatch: CompositeExplicitAutograd, CUDA: convolution_backward autogen: convolution_backward.out + tags: canonical - func: convolution_overrideable(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups) -> Tensor dispatch: @@ -1442,7 +1493,7 @@ CompositeExplicitAutograd: convolution_backward_overrideable autogen: convolution_backward_overrideable.out -- func: _convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> Tensor +- func: _convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, SymInt[] padding, int[] dilation, bool transposed, SymInt[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> Tensor dispatch: CompositeExplicitAutograd: _convolution autogen: _convolution.out @@ -1451,7 +1502,7 @@ - func: _convolution_mode(Tensor input, Tensor weight, Tensor? bias, int[] stride, str padding, int[] dilation, int groups) -> Tensor -- func: _convolution_double_backward(Tensor? ggI, Tensor? ggW, Tensor? ggb, Tensor gO, Tensor weight, Tensor self, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) +- func: _convolution_double_backward(Tensor? ggI, Tensor? ggW, Tensor? ggb, Tensor gO, Tensor weight, Tensor self, int[] stride, SymInt[] padding, int[] dilation, bool transposed, SymInt[] output_padding, int groups, bool[3] output_mask) -> (Tensor, Tensor, Tensor) - func: conv1d(Tensor input, Tensor weight, Tensor? bias=None, int[1] stride=1, int[1] padding=0, int[1] dilation=1, int groups=1) -> Tensor @@ -1484,6 +1535,8 @@ - func: copy(Tensor self, Tensor src, bool non_blocking=False) -> Tensor variants: function + dispatch: + CompositeExplicitAutogradNonFunctional: copy - func: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!) variants: method @@ -1494,6 +1547,7 @@ SparseCPU, SparseCUDA: copy_sparse_wrapper_ CompositeExplicitAutograd: copy_ SparseCsrCPU, SparseCsrCUDA: copy_sparse_compressed_ + NestedTensorCPU, NestedTensorCUDA: copy_nested_ autogen: copy.out - func: _copy_from(Tensor self, Tensor dst, bool non_blocking=False) -> Tensor @@ -1726,6 +1780,7 @@ device_check: NoCheck # TensorIterator dispatch: CPU, CUDA: cumsum_out + MPS: cumsum_out_mps - func: cumsum.dimname(Tensor self, Dimname dim, *, ScalarType? 
dtype=None) -> Tensor device_check: NoCheck # TensorIterator @@ -1751,6 +1806,13 @@ CPU: ctc_loss_cpu CUDA: ctc_loss_gpu autogen: _ctc_loss.out + tags: dynamic_output_shape # the shape of second output is data dependent + +- func: _ctc_loss.Tensor(Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, int blank=0, bool zero_infinity=False) -> (Tensor, Tensor) + dispatch: + CPU, CUDA: ctc_loss_tensor + autogen: _ctc_loss.Tensor_out + tags: dynamic_output_shape # the shape of second output is data dependent - func: _ctc_loss_backward(Tensor grad, Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, Tensor neg_log_likelihood, Tensor log_alpha, int blank, bool zero_infinity=False) -> Tensor dispatch: @@ -1758,10 +1820,14 @@ CUDA: ctc_loss_backward_gpu autogen: _ctc_loss_backward.out +- func: _ctc_loss_backward.Tensor(Tensor grad, Tensor log_probs, Tensor targets, Tensor input_lengths, Tensor target_lengths, Tensor neg_log_likelihood, Tensor log_alpha, int blank, bool zero_infinity=False) -> Tensor + dispatch: + CPU, CUDA: ctc_loss_backward_tensor + - func: diag_embed(Tensor self, int offset=0, int dim1=-2, int dim2=-1) -> Tensor variants: function, method dispatch: - CompositeExplicitAutograd: diag_embed + CompositeExplicitAutogradNonFunctional: diag_embed autogen: diag_embed.out - func: diagflat(Tensor self, int offset=0) -> Tensor @@ -1779,12 +1845,12 @@ - func: diagonal.Dimname(Tensor(a) self, *, Dimname outdim, Dimname dim1, Dimname dim2, int offset=0) -> Tensor(a) variants: function, method -- func: diagonal_backward(Tensor grad_output, int[] input_sizes, int offset, int dim1, int dim2) -> Tensor +- func: diagonal_backward(Tensor grad_output, SymInt[] input_sizes, int offset, int dim1, int dim2) -> Tensor variants: function device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutograd: diagonal_backward + CompositeExplicitAutograd: diagonal_backward_symint autogen: diagonal_backward.out - func: fill_diagonal_(Tensor(a!) self, Scalar fill_value, bool wrap=False) -> Tensor(a!) @@ -1824,6 +1890,8 @@ dispatch: SparseCPU, SparseCUDA: div_sparse ZeroTensor: div_zerotensor + NestedTensorCPU, NestedTensorCUDA: NestedTensor_div_Tensor + tags: canonical - func: div_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -1870,6 +1938,8 @@ variants: function, method dispatch: CompositeExplicitAutograd: div + NestedTensorCPU, NestedTensorCUDA: NestedTensor_div_Scalar + tags: canonical - func: div_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -1959,22 +2029,25 @@ dispatch: CompositeExplicitAutograd: vdot_out -- func: einsum(str equation, Tensor[] tensors) -> Tensor +- func: einsum(str equation, Tensor[] tensors, *, int[]? 
path=None) -> Tensor -- func: embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> Tensor +- func: embedding(Tensor weight, Tensor indices, SymInt padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> Tensor dispatch: - CompositeExplicitAutograd: embedding + CompositeExplicitAutograd: embedding_symint NestedTensorCPU, NestedTensorCUDA: NestedTensor_embedding autogen: embedding.out -- func: embedding_backward(Tensor grad, Tensor indices, int num_weights, int padding_idx, bool scale_grad_by_freq, bool sparse) -> Tensor +- func: embedding_backward(Tensor grad, Tensor indices, SymInt num_weights, SymInt padding_idx, bool scale_grad_by_freq, bool sparse) -> Tensor + dispatch: + CompositeImplicitAutograd: embedding_backward_symint -- func: embedding_dense_backward(Tensor grad_output, Tensor indices, int num_weights, int padding_idx, bool scale_grad_by_freq) -> Tensor +- func: embedding_dense_backward(Tensor grad_output, Tensor indices, SymInt num_weights, SymInt padding_idx, bool scale_grad_by_freq) -> Tensor dispatch: CPU: embedding_dense_backward_cpu CUDA: embedding_dense_backward_cuda MPS: embedding_dense_backward_mps autogen: embedding_dense_backward.out + tags: canonical - func: embedding_renorm_(Tensor(a!) self, Tensor indices, float max_norm, float norm_type) -> Tensor(a!) dispatch: @@ -2021,11 +2094,15 @@ CUDA: _embedding_bag_cuda autogen: _embedding_bag.out -- func: _embedding_bag_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, int num_weights, bool scale_grad_by_freq, int mode, bool sparse, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor +- func: _embedding_bag_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, bool sparse, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor + dispatch: + CompositeImplicitAutograd: _embedding_bag_backward_symint -- func: _embedding_bag_sparse_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, int num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor +- func: _embedding_bag_sparse_backward(Tensor grad, Tensor indices, Tensor offsets, Tensor offset2bag, Tensor bag_size, SymInt num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor + dispatch: + CompositeImplicitAutograd: _embedding_bag_sparse_backward_symint -- func: _embedding_bag_dense_backward(Tensor grad, Tensor indices, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, int num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor +- func: _embedding_bag_dense_backward(Tensor grad, Tensor indices, Tensor offset2bag, Tensor bag_size, Tensor maximum_indices, SymInt num_weights, bool scale_grad_by_freq, int mode, Tensor? per_sample_weights, int padding_idx=-1) -> Tensor dispatch: CPU: _embedding_bag_dense_backward_cpu CUDA: _embedding_bag_dense_backward_cuda @@ -2041,60 +2118,35 @@ device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutograd: empty + CompositeExplicitAutograd: empty_names autogen: empty.names_out -- func: empty.memory_format(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? 
memory_format=None) -> Tensor +- func: empty.memory_format(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor dispatch: CPU: empty_cpu CUDA: empty_cuda MPS: empty_mps - Meta: empty_meta + Meta: empty_meta_symint MkldnnCPU: empty_mkldnn SparseCPU, SparseCUDA, SparseMeta: empty_sparse SparseCsrCPU, SparseCsrCUDA: empty_sparse_compressed QuantizedCPU, QuantizedCUDA, QuantizedMeta: empty_unknown_quantized -# all calls to empty() in python used to go through the symint overload -# even if all arguments were concerete integers. -# adding symint overloads of kernels for every dispatch key allowed us -# to skip redispatching to `empty.memory_format` and hit backend kernels directly -# we recently updated signature parsing to dispath `empty()` calls in python -# to `empty.SymInt` iff there's is a symint node argument -# hopefully, we could simplify this entry soon -- func: empty.SymInt(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor - dispatch: - CPU: empty_symint_cpu - CUDA: empty_symint_cuda - MPS: empty_symint_mps - Meta: empty_symint_meta - MkldnnCPU: empty_symint_mkldnn - SparseCPU, SparseCUDA, SparseMeta: empty_symint_sparse - SparseCsrCPU, SparseCsrCUDA: empty_symint_sparse_compressed - QuantizedCPU, QuantizedCUDA: empty_symint_unknown_quantized - autogen: empty.SymInt_out - # We do not make new_empty a composite that calls into new_empty_strided, as the strided version # is significantly more difficult to implement by different backends -- func: new_empty(Tensor self, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor - variants: method - dispatch: - CompositeExplicitAutograd: new_empty - autogen: new_empty.out - -- func: new_empty.SymInt(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: new_empty(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: method dispatch: CompositeExplicitAutograd: new_empty_symint - autogen: new_empty.SymInt_out + autogen: new_empty.out -- func: new_empty_strided(Tensor self, int[] size, int[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: new_empty_strided(Tensor self, SymInt[] size, SymInt[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: method dispatch: - CompositeExplicitAutogradNonFunctional: new_empty_strided + CompositeExplicitAutogradNonFunctional: new_empty_strided_symint autogen: new_empty_strided.out -- func: new_full(Tensor self, int[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: new_full(Tensor self, SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: method dispatch: # NB: Although this composite mutates on the inside, it is @@ -2102,7 +2154,7 @@ CompositeExplicitAutograd: new_full autogen: new_full.out -- func: new_zeros(Tensor self, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor +- func: new_zeros(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: method dispatch: # NB: Although this composite mutates on the inside, it is @@ -2110,7 +2162,7 @@ CompositeExplicitAutograd: new_zeros autogen: new_zeros.out -- func: new_ones(Tensor self, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: new_ones(Tensor self, SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: method dispatch: # NB: Although this composite mutates on the inside, it is @@ -2134,7 +2186,7 @@ QuantizedCPU, QuantizedCUDA: empty_per_channel_affine_quantized autogen: _empty_per_channel_affine_quantized.out -- func: resize_(Tensor(a!) self, int[] size, *, MemoryFormat? memory_format=None) -> Tensor(a!) +- func: resize_(Tensor(a!) self, SymInt[] size, *, MemoryFormat? memory_format=None) -> Tensor(a!) use_const_ref_for_mutable_tensors: True variants: method device_check: NoCheck @@ -2165,7 +2217,7 @@ QuantizedCPU, QuantizedCUDA: empty_quantized autogen: empty_quantized.out -- func: empty.out(int[] size, *, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!) +- func: empty.out(SymInt[] size, *, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck device_guard: False @@ -2177,14 +2229,15 @@ QuantizedCPU, QuantizedCUDA: empty_like_quantized SparseCPU, SparseCUDA, SparseMeta: empty_like_sparse_coo SparseCsrCPU, SparseCsrCUDA: empty_like_sparse_csr + NestedTensorCPU, NestedTensorCUDA: empty_like_nested autogen: empty_like.out -- func: empty_strided(int[] size, int[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: empty_strided(SymInt[] size, SymInt[] stride, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor dispatch: CPU: empty_strided_cpu CUDA: empty_strided_cuda MPS: empty_strided_mps - Meta: empty_strided_meta + Meta: empty_strided_meta_symint QuantizedCPU, QuantizedCUDA: empty_strided_unknown_quantized autogen: empty_strided.out @@ -2195,6 +2248,7 @@ dispatch: SparseCPU, SparseCUDA: erf_sparse SparseCsrCPU, SparseCsrCUDA: erf_sparse_csr + tags: canonical - func: erf_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -2235,6 +2289,7 @@ device_check: NoCheck # TensorIterator structured_delegate: exp.out variants: function, method + tags: canonical - func: exp_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -2286,22 +2341,17 @@ structured_inherits: TensorIteratorBase dispatch: CPU, CUDA: expm1_out + MPS: expm1_out_mps SparseCPU, SparseCUDA: expm1_sparse_out SparseCsrCPU, SparseCsrCUDA: expm1_sparse_csr_out -- func: expand.SymInt(Tensor(a) self, SymInt[] size, *, bool implicit=False) -> Tensor(a) - variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. - device_check: NoCheck - device_guard: False - dispatch: - CompositeExplicitAutograd: expand_symint - -- func: expand(Tensor(a) self, int[] size, *, bool implicit=False) -> Tensor(a) +- func: expand(Tensor(a) self, SymInt[] size, *, bool implicit=False) -> Tensor(a) variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. 
device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: expand + tags: canonical - func: expand_as(Tensor(a) self, Tensor other) -> Tensor(a) variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. @@ -2351,6 +2401,7 @@ variants: function dispatch: CompositeExplicitAutograd: fill + tags: canonical - func: fill.Tensor(Tensor self, Tensor value) -> Tensor variants: function @@ -2366,6 +2417,7 @@ QuantizedCPU, QuantizedCUDA: fill_quantized_ Meta: fill_meta_ SparseCsrCPU, SparseCsrCUDA: fill_sparse_csr_ + NestedTensorCPU, NestedTensorCUDA: fill_nested_ autogen: fill.Scalar_out - func: fill_.Tensor(Tensor(a!) self, Tensor value) -> Tensor(a!) @@ -2376,6 +2428,7 @@ MPS: fill_tensor_mps_ QuantizedCPU, QuantizedCUDA: fill_quantized_ Meta: fill_meta_ + NestedTensorCPU, NestedTensorCUDA: fill_nested_ autogen: fill.Tensor_out - func: floor(Tensor self) -> Tensor @@ -2436,11 +2489,17 @@ device_check: NoCheck # TensorIterator structured_delegate: frac.out variants: function, method + dispatch: + SparseCPU, SparseCUDA: frac_sparse + SparseCsrCPU, SparseCsrCUDA: frac_sparse_csr - func: frac_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator structured_delegate: frac.out variants: function, method + dispatch: + SparseCPU, SparseCUDA: frac_sparse_ + SparseCsrCPU, SparseCsrCUDA: frac_sparse_csr_ - func: frac.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -2448,6 +2507,9 @@ structured_inherits: TensorIteratorBase dispatch: CPU, CUDA: frac_out + MPS: frac_out_mps + SparseCPU, SparseCUDA: frac_sparse_out + SparseCsrCPU, SparseCsrCUDA: frac_sparse_csr_out - func: full.names(int[] size, Scalar fill_value, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor device_check: NoCheck @@ -2456,11 +2518,11 @@ CompositeExplicitAutograd: full autogen: full.names_out -- func: full(int[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: full(SymInt[] size, Scalar fill_value, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor dispatch: CompositeExplicitAutograd: full -- func: full.out(int[] size, Scalar fill_value, *, Tensor(a!) out) -> Tensor(a!) +- func: full.out(SymInt[] size, Scalar fill_value, *, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: full_out @@ -2528,6 +2590,7 @@ CPU, QuantizedCPU: grid_sampler_2d_cpu CUDA: grid_sampler_2d_cuda autogen: grid_sampler_2d.out + tags: canonical # `grid_sampler_2d_backward` takes in `output_mask` to optimize performance for # the case where `input` doesn't require gradient. Gradient for `grid` is always @@ -2610,16 +2673,18 @@ - func: group_norm(Tensor input, int num_groups, Tensor? weight=None, Tensor? bias=None, float eps=1e-05, bool cudnn_enabled=True) -> Tensor -- func: native_group_norm(Tensor input, Tensor? weight, Tensor? bias, int N, int C, int HxW, int group, float eps) -> (Tensor, Tensor, Tensor) +- func: native_group_norm(Tensor input, Tensor? weight, Tensor? bias, SymInt N, SymInt C, SymInt HxW, int group, float eps) -> (Tensor, Tensor, Tensor) dispatch: CPU, CUDA: native_group_norm CompositeExplicitAutograd: math_group_norm autogen: native_group_norm.out + tags: canonical -- func: native_group_norm_backward(Tensor grad_out, Tensor input, Tensor mean, Tensor rstd, Tensor? 
weight, int N, int C, int HxW, int group, bool[3] output_mask) -> (Tensor, Tensor, Tensor) +- func: native_group_norm_backward(Tensor grad_out, Tensor input, Tensor mean, Tensor rstd, Tensor? weight, SymInt N, SymInt C, SymInt HxW, int group, bool[3] output_mask) -> (Tensor, Tensor, Tensor) dispatch: CPU, CUDA: native_group_norm_backward autogen: native_group_norm_backward.out + tags: canonical # Real to complex forward FFT - func: _fft_r2c(Tensor self, int[] dim, int normalization, bool onesided) -> Tensor @@ -2648,13 +2713,13 @@ CUDA: _fft_c2r_cufft_out # Standard complex to complex FFT (forward or backward) -- func: _fft_c2c(Tensor self, int[] dim, int normalization, bool forward) -> Tensor +- func: _fft_c2c(Tensor self, SymInt[] dim, int normalization, bool forward) -> Tensor variants: function dispatch: CPU: _fft_c2c_mkl CUDA: _fft_c2c_cufft -- func: _fft_c2c.out(Tensor self, int[] dim, int normalization, bool forward, *, Tensor(a!) out) -> Tensor(a!) +- func: _fft_c2c.out(Tensor self, SymInt[] dim, int normalization, bool forward, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: CPU: _fft_c2c_mkl_out @@ -2694,7 +2759,7 @@ precomputed: - indices -> DimVector sizes, DimVector strides dispatch: - CPU, CUDA: index_out + CPU, CUDA, MPS: index_out - func: index_copy.out(Tensor self, int dim, Tensor index, Tensor source, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -2740,22 +2805,14 @@ device_check: NoCheck # TensorIterator variants: function dispatch: - CPU, CUDA: _index_put_impl_ + CPU, CUDA, MPS: _index_put_impl_ QuantizedCPU: _index_put_impl_quantized_cpu_ + QuantizedCUDA: _index_put_impl_quantized_cuda_ autogen: _index_put_impl, _index_put_impl.out - func: instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> Tensor variants: function -- func: inverse(Tensor self) -> Tensor - variants: function, method - dispatch: - CompositeExplicitAutograd: inverse - -- func: inverse.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) - dispatch: - CompositeExplicitAutograd: inverse_out - - func: isclose(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False) -> Tensor variants: function, method @@ -2881,22 +2938,27 @@ - func: kthvalue.dimname_out(Tensor self, int k, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) -- func: layer_norm(Tensor input, int[] normalized_shape, Tensor? weight=None, Tensor? bias=None, float eps=1e-05, bool cudnn_enable=True) -> Tensor +- func: layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight=None, Tensor? bias=None, float eps=1e-05, bool cudnn_enable=True) -> Tensor + dispatch: + CompositeImplicitAutograd: layer_norm_symint -- func: native_layer_norm(Tensor input, int[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor) +- func: native_layer_norm(Tensor input, SymInt[] normalized_shape, Tensor? weight, Tensor? bias, float eps) -> (Tensor, Tensor, Tensor) dispatch: CPU: layer_norm_cpu CUDA: layer_norm_cuda MPS: layer_norm_mps CompositeExplicitAutograd: math_native_layer_norm + NestedTensorCPU, NestedTensorCUDA: nested_layer_norm autogen: native_layer_norm.out + tags: canonical -- func: native_layer_norm_backward(Tensor grad_out, Tensor input, int[] normalized_shape, Tensor mean, Tensor rstd, Tensor? weight, Tensor? 
bias, bool[3] output_mask) -> (Tensor, Tensor, Tensor) +- func: native_layer_norm_backward(Tensor grad_out, Tensor input, SymInt[] normalized_shape, Tensor mean, Tensor rstd, Tensor? weight, Tensor? bias, bool[3] output_mask) -> (Tensor, Tensor, Tensor) dispatch: CPU: layer_norm_backward_cpu CUDA: layer_norm_backward_cuda MPS: layer_norm_backward_mps autogen: native_layer_norm_backward.out + tags: canonical - func: nan_to_num(Tensor self, float? nan=None, float? posinf=None, float? neginf=None) -> Tensor variants: function, method @@ -2920,10 +2982,12 @@ dispatch: CompositeImplicitAutograd: linear NestedTensorCPU, NestedTensorCUDA: nested_linear + MPS: _mps_linear - func: linear_backward(Tensor self, Tensor grad_output, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor) dispatch: NestedTensorCPU, NestedTensorCUDA: nested_linear_backward + MPS: mps_linear_backward autogen: linear_backward.out - func: linear.out(Tensor input, Tensor weight, Tensor? bias=None, *, Tensor(a!) out) -> Tensor(a!) @@ -2931,15 +2995,6 @@ dispatch: CompositeExplicitAutograd: linear_out -# TODO: Add this function to MPS dispatch key so that we avoid declaring it in -# native_functions.yaml -# https://github.com/pytorch/pytorch/issues/77394 -- func: _mps_linear(Tensor self, Tensor weight, Tensor? bias=None) -> Tensor - python_module: nn - dispatch: - MPS: _mps_linear - autogen: _mps_linear.out - - func: mkldnn_linear(Tensor self, Tensor weight, Tensor? bias=None) -> Tensor python_module: nn dispatch: @@ -2961,21 +3016,6 @@ MkldnnCPU: mkldnn_linear_backward autogen: mkldnn_linear_backward.out -- func: _mps_linear_backward_input(int[] input_size, Tensor grad_output, Tensor weight) -> Tensor - dispatch: - MPS: _mps_linear_backward_input - autogen: _mps_linear_backward_input.out - -- func: _mps_linear_backward_weights(Tensor grad_output, Tensor input, Tensor weight, bool bias_defined) -> (Tensor, Tensor) - dispatch: - MPS: _mps_linear_backward_weights - autogen: _mps_linear_backward_weights.out - -- func: mps_linear_backward(Tensor self, Tensor grad_output, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor) - dispatch: - MPS: mps_linear_backward - autogen: mps_linear_backward.out - - func: fbgemm_linear_int8_weight_fp32_activation(Tensor input, Tensor weight, Tensor packed, Tensor col_offsets, Scalar weight_scale, Scalar weight_zero_point, Tensor bias) -> Tensor - func: fbgemm_linear_int8_weight(Tensor input, Tensor weight, Tensor packed, Tensor col_offsets, Scalar weight_scale, Scalar weight_zero_point, Tensor bias) -> Tensor @@ -3014,6 +3054,7 @@ device_check: NoCheck # TensorIterator structured_delegate: log.out variants: function, method + tags: canonical - func: log_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -3185,6 +3226,7 @@ - func: _log_softmax(Tensor self, int dim, bool half_to_float) -> Tensor structured_delegate: _log_softmax.out + tags: canonical - func: _log_softmax.out(Tensor self, int dim, bool half_to_float, *, Tensor(a!) out) -> Tensor(a!) 
structured: True @@ -3264,10 +3306,6 @@ CompositeImplicitAutograd: matmul_out NestedTensorCPU, NestedTensorCUDA: matmul_out_nested -- func: matrix_rank.tol(Tensor self, float tol, bool symmetric=False) -> Tensor - -- func: matrix_rank(Tensor self, bool symmetric=False) -> Tensor - # Alias to linalg.matrix_power - func: matrix_power(Tensor self, int n) -> Tensor variants: function, method @@ -3319,6 +3357,7 @@ variants: function, method dispatch: QuantizedCPU, QuantizedCUDA: qmax + tags: canonical - func: max.dim_max(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) max, Tensor(b!) max_values) -> (Tensor(a!) values, Tensor(b!) indices) device_check: NoCheck # TensorIterator @@ -3336,14 +3375,17 @@ - func: max.names_dim_max(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) max, Tensor(b!) max_values) -> (Tensor(a!) values, Tensor(b!) indices) device_check: NoCheck # TensorIterator -- func: value_selecting_reduction_backward(Tensor grad, int dim, Tensor indices, int[] sizes, bool keepdim) -> Tensor +- func: value_selecting_reduction_backward(Tensor grad, int dim, Tensor indices, SymInt[] sizes, bool keepdim) -> Tensor variants: function device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: value_selecting_reduction_backward_symint - func: amax(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor variants: function, method structured_delegate: amax.out + tags: canonical - func: amax.out(Tensor self, int[1] dim=[], bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -3425,6 +3467,7 @@ variants: function, method dispatch: QuantizedCPU: mean_quantized_cpu + tags: canonical - func: mean.out(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) structured: True @@ -3453,6 +3496,7 @@ dispatch: CPU: median_cpu CUDA: median_cuda + MPS: median_mps autogen: median.out - func: median.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) @@ -3464,6 +3508,7 @@ dispatch: CPU: median_out_cpu CUDA: median_out_cuda + MPS: median_out_mps - func: median.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) variants: function, method @@ -3498,6 +3543,7 @@ variants: function, method dispatch: QuantizedCPU, QuantizedCUDA: qmin + tags: canonical - func: min.dim_min(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) min, Tensor(b!) min_indices) -> (Tensor(a!) values, Tensor(b!) indices) device_check: NoCheck # TensorIterator @@ -3518,6 +3564,7 @@ - func: amin(Tensor self, int[1] dim=[], bool keepdim=False) -> Tensor variants: function, method structured_delegate: amin.out + tags: canonical - func: amin.out(Tensor self, int[1] dim=[], bool keepdim=False, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -3538,7 +3585,7 @@ MPS: mps_convolution_backward autogen: mps_convolution_backward.out -- func: mkldnn_convolution(Tensor self, Tensor weight, Tensor? bias, int[] padding, int[] stride, int[] dilation, int groups) -> Tensor +- func: mkldnn_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, int[] stride, int[] dilation, int groups) -> Tensor dispatch: CompositeExplicitAutograd: mkldnn_convolution autogen: mkldnn_convolution.out @@ -3553,21 +3600,29 @@ CUDA: miopen_batch_norm_backward autogen: miopen_batch_norm_backward.out -- func: miopen_convolution(Tensor self, Tensor weight, Tensor? 
bias, int[] padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor +- func: miopen_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor dispatch: CUDA: miopen_convolution autogen: miopen_convolution.out -- func: miopen_convolution_transpose(Tensor self, Tensor weight, Tensor? bias, int[] padding, int[] output_padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor +- func: miopen_convolution_transpose(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, SymInt[] output_padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor dispatch: CUDA: miopen_convolution_transpose autogen: miopen_convolution_transpose.out -- func: miopen_depthwise_convolution(Tensor self, Tensor weight, Tensor? bias, int[] padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor +- func: miopen_depthwise_convolution(Tensor self, Tensor weight, Tensor? bias, SymInt[] padding, int[] stride, int[] dilation, int groups, bool benchmark, bool deterministic) -> Tensor dispatch: CUDA: miopen_depthwise_convolution autogen: miopen_depthwise_convolution.out +- func: miopen_convolution_relu(Tensor self, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, int groups) -> Tensor + dispatch: + CUDA: miopen_convolution_relu + +- func: miopen_convolution_add_relu(Tensor self, Tensor weight, Tensor z, Scalar? alpha, Tensor? bias, int[] stride, int[] padding, int[] dilation, int groups) -> Tensor + dispatch: + CUDA: miopen_convolution_add_relu + - func: miopen_rnn(Tensor input, Tensor[] weight, int weight_stride0, Tensor hx, Tensor? cx, int mode, int hidden_size, int num_layers, bool batch_first, float dropout, bool train, bool bidirectional, int[] batch_sizes, Tensor? dropout_state) -> (Tensor, Tensor, Tensor, Tensor, Tensor) dispatch: CUDA: miopen_rnn @@ -3584,6 +3639,7 @@ dispatch: SparseCPU, SparseCUDA: _sparse_mm SparseCsrCPU, SparseCsrCUDA: _sparse_csr_mm + tags: canonical - func: mm.out(Tensor self, Tensor mat2, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -3609,11 +3665,6 @@ SparseCUDA: sparse_mask_helper_cuda autogen: _sparse_mask_helper.out -- func: spmm_sum(Tensor rowptr, Tensor col, Tensor? optional_value, Tensor mat) -> Tensor - variants: function - dispatch: - CPU: spmm_sum_cpu - - func: mode(Tensor self, int dim=-1, bool keepdim=False) -> (Tensor values, Tensor indices) variants: function, method dispatch: @@ -3638,6 +3689,7 @@ MkldnnCPU: mkldnn_mul ZeroTensor: mul_zerotensor NestedTensorCPU, NestedTensorCUDA: NestedTensor_mul_Tensor + tags: canonical - func: mul_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -3669,6 +3721,7 @@ CompositeExplicitAutograd: mul SparseCsrCPU, SparseCsrCUDA: mul_scalar_sparse_csr NestedTensorCPU, NestedTensorCUDA: NestedTensor_mul_Scalar + tags: canonical - func: mul_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) 
device_check: NoCheck # TensorIterator @@ -3720,33 +3773,31 @@ dispatch: CompositeExplicitAutograd: mvlgamma_ -- func: narrow_copy(Tensor self, int dim, int start, int length) -> Tensor +- func: narrow_copy(Tensor self, int dim, SymInt start, SymInt length) -> Tensor variants: function, method dispatch: CPU: narrow_copy_dense_cpu SparseCPU, SparseCUDA: narrow_copy_sparse - CompositeExplicitAutogradNonFunctional: narrow_copy_dense + CompositeExplicitAutogradNonFunctional: narrow_copy_dense_symint tags: view_copy -- func: narrow_copy.SymInt(Tensor self, int dim, int start, SymInt length) -> Tensor - variants: function, method - dispatch: - CompositeExplicitAutograd: narrow_copy_symint - autogen: narrow_copy.SymInt_out - -- func: narrow_copy.out(Tensor self, int dim, int start, int length, *, Tensor(a!) out) -> Tensor(a!) +- func: narrow_copy.out(Tensor self, int dim, SymInt start, SymInt length, *, Tensor(a!) out) -> Tensor(a!) dispatch: CPU: narrow_copy_dense_cpu_out -- func: narrow(Tensor(a) self, int dim, int start, int length) -> Tensor(a) +- func: narrow(Tensor(a) self, int dim, SymInt start, SymInt length) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: narrow_symint -- func: narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> Tensor(a) +- func: narrow.Tensor(Tensor(a) self, int dim, Tensor start, SymInt length) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: narrow_tensor_symint - func: native_batch_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) dispatch: @@ -3754,11 +3805,42 @@ CUDA: batch_norm_cuda MPS: batch_norm_mps MkldnnCPU: mkldnn_batch_norm + tags: canonical - func: native_batch_norm.out(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, *, Tensor(a!) out, Tensor(b!) save_mean, Tensor(c!) save_invstd) -> (Tensor(a!), Tensor(b!), Tensor(c!)) dispatch: CUDA: batch_norm_cuda_out MPS: batch_norm_mps_out + CPU: batch_norm_cpu_out + +# TODO: In 2 weeks, we should make native_batch_norm composite implicit so that this correct schema percolates correctly through our dispatching +- func: _native_batch_norm_legit(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) + dispatch: + CPU: _batch_norm_legit_cpu + CUDA: _batch_norm_legit_cuda + MPS: _batch_norm_legit_mps + MkldnnCPU: _mkldnn_batch_norm_legit + autogen: _native_batch_norm_legit_functional + +- func: _native_batch_norm_legit.out(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, bool training, float momentum, float eps, *, Tensor(d!) out, Tensor(e!) save_mean, Tensor(f!) save_invstd) -> (Tensor(d!), Tensor(e!), Tensor(f!)) + dispatch: + CPU: _batch_norm_legit_cpu_out + CUDA: _batch_norm_legit_cuda_out + MPS: _batch_norm_legit_mps_out + +- func: _native_batch_norm_legit.no_stats(Tensor input, Tensor? weight, Tensor? 
bias, bool training, float momentum, float eps) -> (Tensor, Tensor, Tensor) + dispatch: + CPU: _batch_norm_legit_no_stats_cpu + CUDA: _batch_norm_legit_no_stats_cuda + MPS: _batch_norm_legit_no_stats_mps + MkldnnCPU: _mkldnn_batch_norm_legit_no_stats + tags: canonical + +- func: _native_batch_norm_legit.no_stats_out(Tensor input, Tensor? weight, Tensor? bias, bool training, float momentum, float eps, *, Tensor(a!) out, Tensor(b!) save_mean, Tensor(c!) save_invstd) -> (Tensor(a!), Tensor(b!), Tensor(c!)) + dispatch: + CPU: _batch_norm_legit_no_stats_cpu_out + CUDA: _batch_norm_legit_no_stats_cuda_out + MPS: _batch_norm_legit_no_stats_mps_out - func: batch_norm_stats(Tensor input, float eps) -> (Tensor, Tensor) dispatch: @@ -3812,7 +3894,7 @@ - func: _nnpack_available() -> bool -- func: _nnpack_spatial_convolution(Tensor input, Tensor weight, Tensor? bias, int[2] padding, int[2] stride=1) -> Tensor +- func: _nnpack_spatial_convolution(Tensor input, Tensor weight, Tensor? bias, SymInt[2] padding, int[2] stride=1) -> Tensor variants: function dispatch: CompositeExplicitAutograd: _nnpack_spatial_convolution @@ -3825,11 +3907,11 @@ CompositeExplicitAutograd: ones autogen: ones.names_out -- func: ones(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: ones(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor dispatch: CompositeExplicitAutograd: ones -- func: ones.out(int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: ones.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: ones_out @@ -3838,6 +3920,7 @@ # NB: Although this composite mutates on the inside, it is # non-differentiable so NonFunctional doesn't apply CompositeExplicitAutograd: ones_like + NestedTensorCPU, NestedTensorCUDA: ones_like autogen: ones_like.out - func: pairwise_distance(Tensor x1, Tensor x2, float p=2, float eps=1e-06, bool keepdim=False) -> Tensor @@ -3880,6 +3963,7 @@ CompositeExplicitAutograd: permute MPS: permute_mps SparseCPU, SparseCUDA: permute_sparse_coo + tags: canonical - func: movedim.intlist(Tensor(a) self, int[] source, int[] destination) -> Tensor(a) variants: function, method @@ -3971,66 +4055,81 @@ variants: function, method dispatch: CompositeExplicitAutograd: rad2deg + SparseCPU, SparseCUDA: rad2deg_sparse SparseCsrCPU, SparseCsrCUDA: rad2deg_sparse_csr - func: rad2deg_(Tensor(a!) self) -> Tensor(a!) variants: function, method dispatch: CompositeExplicitAutograd: rad2deg_ + SparseCPU, SparseCUDA: rad2deg_sparse_ SparseCsrCPU, SparseCsrCUDA: rad2deg_sparse_csr_ - func: rad2deg.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: rad2deg_out + SparseCPU, SparseCUDA: rad2deg_sparse_out SparseCsrCPU, SparseCsrCUDA: rad2deg_sparse_csr_out - func: deg2rad(Tensor self) -> Tensor variants: function, method dispatch: CompositeExplicitAutograd: deg2rad + SparseCPU, SparseCUDA: deg2rad_sparse + SparseCsrCPU, SparseCsrCUDA: deg2rad_sparse_csr - func: deg2rad_(Tensor(a!) self) -> Tensor(a!) variants: function, method dispatch: CompositeExplicitAutograd: deg2rad_ + SparseCPU, SparseCUDA: deg2rad_sparse_ + SparseCsrCPU, SparseCsrCUDA: deg2rad_sparse_csr_ - func: deg2rad.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: deg2rad_out + SparseCPU, SparseCUDA: deg2rad_sparse_out + SparseCsrCPU, SparseCsrCUDA: deg2rad_sparse_csr_out - func: scalar_tensor(Scalar s, *, ScalarType? 
dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor dispatch: CompositeExplicitAutograd: scalar_tensor autogen: scalar_tensor.out + tags: canonical -- func: rand.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: rand.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: rand autogen: rand.names_out + tags: nondeterministic_seeded -- func: rand.generator_with_names(int[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: rand.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor device_check: NoCheck device_guard: False + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: rand autogen: rand.generator_with_names_out -- func: rand(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: rand(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: rand -- func: rand.generator(int[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: rand.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: rand -- func: rand.out(int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: rand.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: rand_out -- func: rand.generator_out(int[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) +- func: rand.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded - func: rand_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor tags: nondeterministic_seeded @@ -4040,37 +4139,43 @@ CompositeExplicitAutograd: rand_like autogen: rand_like.out -- func: randint(int high, int[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randint(int high, SymInt[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint -- func: randint.generator(int high, int[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randint.generator(int high, SymInt[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint -- func: randint.low(int low, int high, int[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor +- func: randint.low(int low, int high, SymInt[] size, *, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint -- func: randint.low_generator(int low, int high, int[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randint.low_generator(int low, int high, SymInt[] size, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint -- func: randint.out(int high, int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: randint.out(int high, SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint_out -- func: randint.generator_out(int high, int[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) +- func: randint.generator_out(int high, SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint_out -- func: randint.low_out(int low, int high, int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: randint.low_out(int low, int high, SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint_out -- func: randint.low_generator_out(int low, int high, int[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) +- func: randint.low_generator_out(int low, int high, SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randint_out @@ -4090,32 +4195,37 @@ CompositeExplicitAutograd: randint_like autogen: randint_like.low_dtype_out -- func: randn(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randn(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randn -- func: randn.generator(int[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randn.generator(SymInt[] size, *, Generator? generator, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randn -- func: randn.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randn.names(SymInt[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: randn autogen: randn.names_out -- func: randn.generator_with_names(int[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: randn.generator_with_names(SymInt[] size, *, Generator? generator, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor + tags: nondeterministic_seeded device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: randn autogen: randn.generator_with_names_out -- func: randn.out(int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: randn.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded -- func: randn.generator_out(int[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) +- func: randn.generator_out(SymInt[] size, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded - func: randn_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor tags: nondeterministic_seeded @@ -4131,14 +4241,17 @@ CompositeExplicitAutograd: randperm - func: randperm.generator(int n, *, Generator? generator, ScalarType? dtype=long, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randperm - func: randperm.out(int n, *, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: randperm_out - func: randperm.generator_out(int n, *, Generator? generator, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CPU: randperm_out_cpu CUDA: randperm_out_cuda @@ -4168,6 +4281,7 @@ device_check: NoCheck # TensorIterator structured_delegate: reciprocal.out variants: function, method + tags: canonical - func: reciprocal_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -4189,6 +4303,8 @@ dispatch: SparseCPU, SparseCUDA: neg_sparse SparseCsrCPU, SparseCsrCUDA: neg_sparse_csr + NestedTensorCPU, NestedTensorCUDA: NestedTensor_neg + tags: canonical - func: neg_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -4197,6 +4313,7 @@ dispatch: SparseCPU, SparseCUDA: neg_sparse_ SparseCsrCPU, SparseCsrCUDA: neg_sparse_csr_ + NestedTensorCPU, NestedTensorCUDA: NestedTensor_neg_ - func: neg.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -4217,12 +4334,13 @@ - func: negative.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) -- func: repeat(Tensor self, int[] repeats) -> Tensor +- func: repeat(Tensor self, SymInt[] repeats) -> Tensor variants: method # This is method-only to match the previous tensor API. In the future we could make this a function too. dispatch: CompositeExplicitAutograd: repeat MPS: repeat_mps autogen: repeat.out + tags: canonical - func: repeat_interleave.Tensor(Tensor repeats, *, int? output_size=None) -> Tensor variants: function @@ -4235,28 +4353,28 @@ - func: repeat_interleave.self_Tensor(Tensor self, Tensor repeats, int? dim=None, *, int? output_size=None) -> Tensor variants: function, method -- func: repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> Tensor +- func: repeat_interleave.self_int(Tensor self, SymInt repeats, int? dim=None, *, int? 
output_size=None) -> Tensor variants: function, method + dispatch: + CompositeImplicitAutograd: repeat_interleave_symint -- func: reshape(Tensor(a) self, int[] shape) -> Tensor(a) +- func: reshape(Tensor(a) self, SymInt[] shape) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False - -- func: _reshape_nested(Tensor self, int[] shape) -> Tensor dispatch: - NestedTensorCPU, NestedTensorCUDA: _reshape_nested - autogen: _reshape_nested.out + CompositeImplicitAutograd: reshape_symint + CompositeImplicitAutogradNestedTensor: reshape_nested -- func: _reshape_nested_backward(Tensor self, Tensor grad) -> Tensor +- func: _reshape_copy(Tensor self, SymInt[] size) -> Tensor + variants: function dispatch: - NestedTensorCPU, NestedTensorCUDA: _reshape_nested_backward - autogen: _reshape_nested_backward.out + CompositeExplicitAutograd: _reshape_copy_symint # NOTE [ _reshape_alias ] is meant to be used in the implementation of reshape. # They are not user-facing, hence the leading underscore. Please don't use it # anywhere else. -- func: _reshape_alias(Tensor(a) self, int[] size, int[] stride) -> Tensor(a) +- func: _reshape_alias(Tensor(a) self, SymInt[] size, SymInt[] stride) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False @@ -4275,6 +4393,9 @@ variants: method device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: reshape_as + CompositeImplicitAutogradNestedTensor: reshape_as_nested - func: round(Tensor self) -> Tensor device_check: NoCheck # TensorIterator @@ -4326,6 +4447,7 @@ tags: nondeterministic_seeded - func: rrelu_(Tensor(a!) self, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor(a!) + tags: nondeterministic_seeded device_check: NoCheck # TensorIterator - func: relu(Tensor self) -> Tensor @@ -4336,7 +4458,11 @@ MPS: relu_mps MkldnnCPU: mkldnn_relu QuantizedCPU: relu_quantized_cpu + QuantizedCUDA: relu_quantized_cuda NestedTensorCPU, NestedTensorCUDA: NestedTensor_relu + SparseCPU, SparseCUDA: relu_sparse + SparseCsrCPU, SparseCsrCUDA: relu_sparse_csr + tags: canonical - func: relu_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -4346,7 +4472,10 @@ MPS: relu_mps_ MkldnnCPU: mkldnn_relu_ QuantizedCPU: relu_quantized_cpu_ + QuantizedCUDA: relu_quantized_cuda_ NestedTensorCPU, NestedTensorCUDA: NestedTensor_relu_ + SparseCPU, SparseCUDA: relu_sparse_ + SparseCsrCPU, SparseCsrCUDA: relu_sparse_csr_ autogen: relu.out - func: relu6(Tensor self) -> Tensor @@ -4400,6 +4529,7 @@ QuantizedCPU: gelu_quantized_cpu QuantizedCUDA: gelu_quantized_cuda NestedTensorCPU, NestedTensorCUDA: NestedTensor_gelu + tags: canonical - func: gelu_backward.grad_input(Tensor grad_output, Tensor self, *, str approximate='none', Tensor(a!) grad_input) -> Tensor(a!) structured: True @@ -4448,6 +4578,7 @@ device_check: NoCheck # TensorIterator structured_delegate: rsqrt.out variants: function, method + tags: canonical - func: rsqrt_(Tensor(a!) self) -> Tensor(a!) 
device_check: NoCheck # TensorIterator @@ -4467,23 +4598,30 @@ device_check: NoCheck device_guard: False -- func: select.int(Tensor(a) self, int dim, int index) -> Tensor(a) +- func: select.int(Tensor(a) self, int dim, SymInt index) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutograd: select + CompositeExplicitAutograd: select_symint SparseCsrCPU, SparseCsrCUDA: select_sparse_csr NestedTensorCPU, NestedTensorCUDA: select_nested -- func: select_backward(Tensor grad_output, int[] input_sizes, int dim, int index) -> Tensor +- func: select_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt index) -> Tensor variants: function device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutogradNonFunctional: select_backward + CompositeExplicitAutogradNonFunctional: select_backward_symint autogen: select_backward.out +- func: _nested_select_backward(Tensor grad_output, Tensor self, int dim, SymInt index) -> Tensor + variants: function + device_check: NoCheck + device_guard: False + dispatch: + NestedTensorCPU, NestedTensorCUDA: _nested_select_backward_symint + - func: selu(Tensor self) -> Tensor device_check: NoCheck # TensorIterator @@ -4559,6 +4697,7 @@ dispatch: QuantizedCPU: sigmoid_quantized_cpu MkldnnCPU: mkldnn_sigmoid + tags: canonical - func: sigmoid_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -4670,6 +4809,7 @@ variants: function, method dispatch: CompositeExplicitAutograd: detach + NestedTensorCPU, NestedTensorCUDA: detach # Like `detach()`, but modifies this `Variable` in-place. This method may # only be called on non-view `Variable`s. You can use `is_view()` to check @@ -4691,14 +4831,18 @@ device_check: NoCheck device_guard: False -- func: slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> Tensor(a) +- func: slice.Tensor(Tensor(a) self, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1) -> Tensor(a) variants: function, method device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: slice + tags: canonical + +# NOTE: The implementation of split_with_sizes bypasses the dispatcher to call this; undo +# that if adding specific implementations here! -- func: slice_backward(Tensor grad_output, int[] input_sizes, int dim, int start, int end, int step) -> Tensor +- func: slice_backward(Tensor grad_output, SymInt[] input_sizes, int dim, SymInt start, SymInt end, SymInt step) -> Tensor variants: function device_check: NoCheck device_guard: False @@ -4706,20 +4850,21 @@ CompositeExplicitAutograd: slice_backward autogen: slice_backward.out -- func: slice_scatter(Tensor self, Tensor src, int dim=0, int? start=None, int? end=None, int step=1) -> Tensor +- func: slice_scatter(Tensor self, Tensor src, int dim=0, SymInt? start=None, SymInt? 
end=None, SymInt step=1) -> Tensor variants: function, method device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: slice_scatter autogen: slice_scatter.out + tags: canonical -- func: select_scatter(Tensor self, Tensor src, int dim, int index) -> Tensor +- func: select_scatter(Tensor self, Tensor src, int dim, SymInt index) -> Tensor variants: function, method device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutograd: select_scatter + CompositeExplicitAutograd: select_scatter_symint autogen: select_scatter.out - func: diagonal_scatter(Tensor self, Tensor src, int offset=0, int dim1=0, int dim2=1) -> Tensor @@ -4730,12 +4875,12 @@ CompositeExplicitAutograd: diagonal_scatter autogen: diagonal_scatter.out -- func: as_strided_scatter(Tensor self, Tensor src, int[] size, int[] stride, int? storage_offset=None) -> Tensor +- func: as_strided_scatter(Tensor self, Tensor src, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None) -> Tensor variants: function, method device_check: NoCheck device_guard: False dispatch: - CompositeExplicitAutograd: as_strided_scatter + CompositeExplicitAutograd: as_strided_scatter_symint autogen: as_strided_scatter.out - func: smm(Tensor self, Tensor mat2) -> Tensor @@ -4758,6 +4903,7 @@ dispatch: MkldnnCPU: mkldnn_softmax NestedTensorCPU, NestedTensorCUDA: softmax_nested + tags: canonical - func: _softmax.out(Tensor self, int dim, bool half_to_float, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -4778,7 +4924,7 @@ CUDA: softmax_backward_cuda_out MPS: softmax_backward_mps_out -- func: unsafe_split.Tensor(Tensor self, int split_size, int dim=0) -> Tensor[] +- func: unsafe_split.Tensor(Tensor self, SymInt split_size, int dim=0) -> Tensor[] variants: function, method device_check: NoCheck device_guard: False @@ -4786,18 +4932,20 @@ CompositeExplicitAutograd: unsafe_split autogen: unsafe_split.Tensor_out -- func: split.Tensor(Tensor(a -> *) self, int split_size, int dim=0) -> Tensor(a)[] +- func: split.Tensor(Tensor(a -> *) self, SymInt split_size, int dim=0) -> Tensor(a)[] variants: function, method device_check: NoCheck device_guard: False dispatch: CompositeExplicitAutograd: split -- func: split.sizes(Tensor(a -> *) self, int[] split_size, int dim=0) -> Tensor(a)[] +- func: split.sizes(Tensor(a -> *) self, SymInt[] split_size, int dim=0) -> Tensor(a)[] variants: function, method device_guard: False + dispatch: + CompositeImplicitAutograd: split_symint -- func: unsafe_split_with_sizes(Tensor self, int[] split_sizes, int dim=0) -> Tensor[] +- func: unsafe_split_with_sizes(Tensor self, SymInt[] split_sizes, int dim=0) -> Tensor[] variants: function, method device_check: NoCheck device_guard: False @@ -4805,7 +4953,7 @@ CompositeExplicitAutograd: unsafe_split_with_sizes autogen: unsafe_split_with_sizes.out -- func: split_with_sizes(Tensor(a -> *) self, int[] split_sizes, int dim=0) -> Tensor(a)[] +- func: split_with_sizes(Tensor(a -> *) self, SymInt[] split_sizes, int dim=0) -> Tensor(a)[] variants: function, method device_check: NoCheck device_guard: False @@ -4837,6 +4985,7 @@ dispatch: CompositeExplicitAutograd: squeeze QuantizedCPU, QuantizedCUDA: squeeze_quantized + NestedTensorCPU, NestedTensorCUDA: squeeze_nested - func: squeeze.dim(Tensor(a) self, int dim) -> Tensor(a) variants: function, method @@ -4845,6 +4994,8 @@ dispatch: CompositeExplicitAutograd: squeeze QuantizedCPU, QuantizedCUDA: squeeze_quantized + NestedTensorCPU, NestedTensorCUDA: squeeze_dim_nested + tags: canonical - func: 
squeeze.dimname(Tensor(a) self, Dimname dim) -> Tensor(a) variants: function, method @@ -4940,22 +5091,17 @@ variants: function, method dispatch: CompositeExplicitAutograd: sum + SparseCPU, SparseCUDA: sum_coo SparseCsrCPU, SparseCsrCUDA: sum_csr autogen: sum.out -- func: sum.SymInt(Tensor self, SymInt[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor - device_check: NoCheck # TensorIterator - variants: function, method - dispatch: - CompositeExplicitAutograd: sum_symint - autogen: sum.SymInt_out - - func: sum.dim_IntList(Tensor self, int[1]? dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor structured_delegate: sum.IntList_out device_check: NoCheck # TensorIterator variants: function, method dispatch: NestedTensorCPU: NestedTensor_sum_dim_CPU + tags: canonical - func: sum.dim_DimnameList(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor device_check: NoCheck # TensorIterator @@ -4971,6 +5117,11 @@ - func: sum.DimnameList_out(Tensor self, Dimname[1] dim, bool keepdim=False, *, ScalarType? dtype=None, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator +# TODO: this function will be replaced once nested expand semantics have been settled on +- func: _nested_sum_backward(Tensor grad, Tensor self, int[1]? dim, bool keepdim=False) -> Tensor + dispatch: + NestedTensorCPU: _nested_sum_backward_cpu + - func: nansum(Tensor self, int[1]? dim=None, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor variants: function, method dispatch: @@ -4992,6 +5143,7 @@ dispatch: SparseCPU, SparseCUDA: sqrt_sparse SparseCsrCPU, SparseCsrCUDA: sqrt_sparse_csr + tags: canonical - func: sqrt_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5161,6 +5313,8 @@ MkldnnCPU: mkldnn_tanh SparseCPU, SparseCUDA: tanh_sparse SparseCsrCPU, SparseCsrCUDA: tanh_sparse_csr + NestedTensorCPU, NestedTensorCUDA: NestedTensor_tanh + tags: canonical - func: tanh_(Tensor(a!) self) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5170,6 +5324,7 @@ MkldnnCPU: mkldnn_tanh_ SparseCPU, SparseCUDA: tanh_sparse_ SparseCsrCPU, SparseCsrCUDA: tanh_sparse_csr_ + NestedTensorCPU, NestedTensorCUDA: NestedTensor_tanh_ - func: tanh.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5216,12 +5371,16 @@ dispatch: CPU, CUDA: threshold_backward_out MPS: threshold_backward_out_mps + SparseCPU, SparseCUDA: threshold_backward_sparse_out + SparseCsrCPU, SparseCsrCUDA: threshold_backward_sparse_compressed_out - func: threshold_backward(Tensor grad_output, Tensor self, Scalar threshold) -> Tensor variants: function structured_delegate: threshold_backward.grad_input dispatch: MkldnnCPU: mkldnn_relu_backward + SparseCPU, SparseCUDA: threshold_backward_sparse + SparseCsrCPU, SparseCsrCUDA: threshold_backward_sparse_compressed - func: tile(Tensor self, int[] dims) -> Tensor variants: function, method @@ -5324,12 +5483,24 @@ CUDA: nested_from_padded_cuda autogen: _nested_from_padded.out +# These private functions are temporary. 
They will be updated/deleted when nested tensors switch to using SymInts for their metadata representation - func: _nested_tensor_size(Tensor self) -> Tensor variants: method dispatch: - NestedTensorCPU, NestedTensorCUDA: NestedTensor_get_nested_size_tensor + NestedTensorCPU, NestedTensorCUDA: _nested_tensor_size autogen: _nested_tensor_size.out +- func: _nested_tensor_strides(Tensor self) -> Tensor + variants: method + dispatch: + NestedTensorCPU, NestedTensorCUDA: _nested_tensor_strides + autogen: _nested_tensor_strides.out + +- func: _nested_tensor_offsets(Tensor self) -> int[] + variants: method + dispatch: + NestedTensorCPU, NestedTensorCUDA: _nested_tensor_offsets + # _nested_from_padded is not usable from Python, so # _nested_from_padded_and_nested_example is available for testing. - func: _nested_from_padded_and_nested_example(Tensor padded, Tensor nt_example) -> Tensor @@ -5337,6 +5508,22 @@ NestedTensorCPU, NestedTensorCUDA: NestedTensor_from_padded_and_nested_example autogen: _nested_from_padded_and_nested_example.out +# The input arguments' types to this functions are temporary. When nested tensors switch to using SymInts for their metadata representation +# this will need to be updated +- func: _nested_view_from_buffer(Tensor(a) self, Tensor nested_size, Tensor nested_strides, int[] offsets) -> Tensor(a) + variants: function + device_check: NoCheck + dispatch: + CPU, CUDA: _nested_view_from_buffer + +- func: _nested_view_from_buffer_copy(Tensor self, Tensor nested_size, Tensor nested_strides, int[] offsets) -> Tensor + variants: function + device_check: NoCheck + tags: view_copy + dispatch: + CompositeExplicitAutogradNonFunctional: _nested_view_from_buffer_copy + autogen: _nested_view_from_buffer_copy.out + - func: _trilinear(Tensor i1, Tensor i2, Tensor i3, int[] expand1, int[] expand2, int[] expand3, int[] sumdim, int unroll_dim=1) -> Tensor dispatch: # calls unsqueeze @@ -5429,7 +5616,7 @@ tags: dynamic_output_shape autogen: _unique2.out -- func: _unsafe_view(Tensor self, int[] size) -> Tensor +- func: _unsafe_view(Tensor self, SymInt[] size) -> Tensor dispatch: CompositeExplicitAutograd: _unsafe_view autogen: _unsafe_view.out @@ -5442,6 +5629,8 @@ CompositeExplicitAutograd: unsqueeze SparseCPU, SparseCUDA: unsqueeze_sparse QuantizedCPU, QuantizedCUDA: unsqueeze_quantized + NestedTensorCPU, NestedTensorCUDA: unsqueeze_nested + tags: canonical - func: unsqueeze_(Tensor(a!) self, int dim) -> Tensor(a!) variants: method @@ -5460,6 +5649,7 @@ - func: var.dim(Tensor self, int[1]? dim, bool unbiased=True, bool keepdim=False) -> Tensor device_check: NoCheck # TensorIterator variants: function, method + tags: canonical - func: var.correction(Tensor self, int[1]? dim, *, int? correction, bool keepdim=False) -> Tensor device_check: NoCheck # TensorIterator @@ -5525,6 +5715,7 @@ dispatch: CPU, CUDA: where MPS: where_mps + tags: canonical - func: where.self_out(Tensor condition, Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5583,19 +5774,14 @@ CUDA: _efficientzerotensor_cuda autogen: _efficientzerotensor.out -- func: zeros(int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor - dispatch: - CompositeExplicitAutograd: zeros - -- func: zeros.SymInt(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: zeros(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? 
pin_memory=None) -> Tensor dispatch: CompositeExplicitAutograd: zeros_symint - autogen: zeros.SymInt_out -- func: zeros.out(int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: zeros.out(SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: zeros_out - SparseCPU, SparseCUDA, SparseMeta: zeros_out + SparseCPU, SparseCUDA, SparseMeta: zeros_sparse_out - func: zeros_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor dispatch: @@ -5626,6 +5812,7 @@ autogen: _dirichlet_grad.out - func: _sample_dirichlet(Tensor self, Generator? generator=None) -> Tensor + tags: nondeterministic_seeded variants: function dispatch: CPU: _s_dirichlet_cpu @@ -5842,6 +6029,7 @@ QuantizedCPU, QuantizedCUDA: quantized_clone NestedTensorCPU, NestedTensorCUDA: clone_nested autogen: clone.out + tags: canonical - func: positive(Tensor(a) self) -> Tensor(a) variants: function, method @@ -5858,7 +6046,7 @@ variants: function, method dispatch: SparseCPU, SparseCUDA: resize_as_sparse_ - SparseCsrCPU, SparseCsrCUDA: resize_as_sparse_csr_ + SparseCsrCPU, SparseCsrCUDA: resize_as_sparse_compressed_ autogen: resize_as_sparse, resize_as_sparse.out - func: zero_(Tensor(a!) self) -> Tensor(a!) @@ -5889,6 +6077,7 @@ dispatch: SparseCPU, SparseCUDA: sub_sparse ZeroTensor: sub_zerotensor + tags: canonical - func: sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5903,6 +6092,7 @@ variants: function, method dispatch: CompositeExplicitAutograd: sub + tags: canonical - func: sub_.Scalar(Tensor(a!) self, Scalar other, Scalar alpha=1) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -5979,6 +6169,19 @@ SparseCsrCUDA: sparse_sampled_addmm_sparse_csr_cuda SparseCsrCPU: sparse_sampled_addmm_sparse_csr_cpu +- func: spmm_reduce(Tensor input, Tensor weight, str reduce, Tensor? row_indices=None, Tensor? ccol_indices=None, Tensor? csr2csc=None) -> Tensor + python_module: sparse + +- func: _spmm_reduce(Tensor input, Tensor weight, str reduce, Tensor? row_indices=None, Tensor? ccol_indices=None, Tensor? csr2csc=None) -> (Tensor, Tensor) + python_module: sparse + dispatch: + SparseCsrCPU: _spmm_reduce_sparse_csr_cpu + +- func: _spmm_reduce_backward(Tensor input, Tensor grad_out, Tensor weight, str reduce, Tensor arg_out, Tensor row_indices, Tensor ccol_indices, Tensor csr2csc, bool[2] output_mask) -> (Tensor, Tensor) + python_module: sparse + dispatch: + SparseCsrCPU: _spmm_reduce_backward_sparse_csr_cpu + - func: addmm.out(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!) structured: True dispatch: @@ -5997,6 +6200,7 @@ SparseCPU: addmm_sparse_dense_cpu SparseCUDA: addmm_sparse_dense_cuda SparseCsrCPU, SparseCsrCUDA: addmm_sparse_compressed_dense + tags: canonical - func: addmm_(Tensor(a!) self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor(a!) structured_delegate: addmm.out @@ -6155,7 +6359,9 @@ - func: sparse_coo_tensor.indices_size(Tensor indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor -- func: _sparse_coo_tensor_unsafe(Tensor indices, Tensor values, int[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: _sparse_coo_tensor_unsafe(Tensor indices, Tensor values, SymInt[] size, *, ScalarType? dtype=None, Layout? 
layout=None, Device? device=None, bool? pin_memory=None) -> Tensor + dispatch: + CompositeImplicitAutograd: _sparse_coo_tensor_unsafe_symint - func: _validate_sparse_coo_tensor_args(Tensor indices, Tensor values, int[] size) -> () @@ -6170,9 +6376,9 @@ SparseCPU, SparseCUDA, SparseMeta, Meta: new_with_dims_sparse autogen: _sparse_coo_tensor_with_dims.out -- func: _sparse_coo_tensor_with_dims_and_tensors(int sparse_dim, int dense_dim, int[] size, Tensor indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor +- func: _sparse_coo_tensor_with_dims_and_tensors(int sparse_dim, int dense_dim, SymInt[] size, Tensor indices, Tensor values, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=False) -> Tensor dispatch: - SparseCPU, SparseCUDA, SparseMeta, Meta: new_with_dims_and_tensor_sparse + SparseCPU, SparseCUDA, SparseMeta, Meta: new_with_dims_and_tensor_sparse_symint autogen: _sparse_coo_tensor_with_dims_and_tensors.out - func: sparse_resize_(Tensor(a!) self, int[] size, int sparse_dim, int dense_dim) -> Tensor(a!) @@ -6217,6 +6423,7 @@ - func: sparse_dim(Tensor self) -> int variants: method dispatch: + CPU, CUDA: sparse_dim_strided SparseCPU, SparseCUDA, SparseMeta: sparse_dim_sparse SparseCsrCPU, SparseCsrCUDA: sparse_dim_sparse_csr device_check: NoCheck @@ -6233,6 +6440,7 @@ - func: dense_dim(Tensor self) -> int variants: method dispatch: + CPU, CUDA: dense_dim_strided SparseCPU, SparseCUDA, SparseMeta: dense_dim_sparse SparseCsrCPU, SparseCsrCUDA: dense_dim_sparse_csr device_check: NoCheck @@ -6312,6 +6520,7 @@ dispatch: SparseCPU, SparseCUDA, SparseMeta: values_sparse SparseCsrCPU, SparseCsrCUDA: values_sparse_csr + NestedTensorCPU, NestedTensorCUDA: values_nested device_check: NoCheck device_guard: False @@ -6360,11 +6569,12 @@ SparseCPU, SparseCUDA: copy_sparse_ autogen: copy_sparse_to_sparse, copy_sparse_to_sparse.out +# By adding the AutogradNestedTensor this makes this function CompositeImplicit-like for nested tensors - func: unbind.int(Tensor(a -> *) self, int dim=0) -> Tensor(a)[] variants: function, method dispatch: CompositeExplicitAutograd: unbind - NestedTensorCPU, NestedTensorCUDA: NestedTensor_unbind + CompositeImplicitAutogradNestedTensor: NestedTensor_unbind - func: unbind.Dimname(Tensor(a -> *) self, Dimname dim) -> Tensor(a)[] variants: function, method @@ -6421,7 +6631,7 @@ CPU: dense_to_mkldnn autogen: to_mkldnn.out -- func: mkldnn_reorder_conv2d_weight(Tensor self, int[2] padding=0, int[2] stride=1, int[2] dilation=1, int groups=1) -> Tensor +- func: mkldnn_reorder_conv2d_weight(Tensor self, int[2] padding=0, int[2] stride=1, int[2] dilation=1, int groups=1, int[]? 
input_size=None) -> Tensor variants: function python_module: nn dispatch: @@ -6563,6 +6773,8 @@ - func: _fake_quantize_learnable_per_tensor_affine_backward(Tensor grad, Tensor self, Tensor scale, Tensor zero_point, int quant_min, int quant_max, float grad_factor=1.0) -> (Tensor, Tensor, Tensor) variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_tensor_affine_backward - func: fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> Tensor device_check: NoCheck # TensorIterator @@ -6585,6 +6797,8 @@ - func: _fake_quantize_learnable_per_channel_affine_backward(Tensor grad, Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max, float grad_factor=1.0) -> (Tensor, Tensor, Tensor) variants: function + dispatch: + CPU, CUDA: _fake_quantize_learnable_per_channel_affine_backward - func: fused_moving_avg_obs_fake_quant(Tensor self, Tensor observer_on, Tensor fake_quant_on, Tensor(a!) running_min, Tensor(b!) running_max, Tensor(c!) scale, Tensor(d!) zero_point, float averaging_const, int quant_min, int quant_max, int ch_axis, bool per_row_fake_quant=False, bool symmetric_quant=False) -> Tensor variants: function @@ -6617,7 +6831,9 @@ device_guard: False dispatch: CompositeExplicitAutograd: _to_copy + NestedTensorCPU, NestedTensorCUDA: _to_copy_nested autogen: _to_copy.out + tags: canonical # to(Device) must not exist because all constructors of Device also works for # TensorOptions. Otherwise, an ambiguity error is thrown. @@ -6785,7 +7001,9 @@ CompositeExplicitAutograd: _pack_padded_sequence autogen: _pack_padded_sequence.out -- func: _pack_padded_sequence_backward(Tensor grad, int[] input_size, Tensor batch_sizes, bool batch_first) -> Tensor +- func: _pack_padded_sequence_backward(Tensor grad, SymInt[] input_size, Tensor batch_sizes, bool batch_first) -> Tensor + dispatch: + CompositeImplicitAutograd: _pack_padded_sequence_backward_symint - func: _pad_packed_sequence(Tensor data, Tensor batch_sizes, bool batch_first, Scalar padding_value, int total_length) -> (Tensor, Tensor) @@ -6799,21 +7017,24 @@ CPU, CUDA, Meta, MPS: set_ autogen: set.source_Storage, set.source_Storage_out -- func: set_.source_Storage_storage_offset(Tensor(a!) self, Storage source, int storage_offset, int[] size, int[] stride=[]) -> Tensor(a!) +- func: set_.source_Storage_storage_offset(Tensor(a!) self, Storage source, SymInt storage_offset, SymInt[] size, SymInt[] stride=[]) -> Tensor(a!) variants: method device_check: NoCheck device_guard: False dispatch: - CPU, Meta: set_storage_cpu_ + CPU: set_storage_cpu_ + Meta: set_storage_meta__symint CUDA: set_storage_cuda_ MPS: set_storage_mps_ QuantizedCPU, QuantizedCUDA: set_storage_quantized_ autogen: set.source_Storage_storage_offset, set.source_Storage_storage_offset_out -- func: set_.source_Tensor_storage_offset(Tensor(a!) self, Tensor source, int storage_offset, int[] size, int[] stride=[]) -> Tensor(a!) +- func: set_.source_Tensor_storage_offset(Tensor(a!) self, Tensor source, SymInt storage_offset, SymInt[] size, SymInt[] stride=[]) -> Tensor(a!) variants: method device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: set__symint - func: set_.source_Tensor(Tensor(a!) self, Tensor source) -> Tensor(a!) 
variants: method @@ -6855,7 +7076,7 @@ - func: lift_fresh_copy(Tensor self) -> Tensor tags: view_copy dispatch: - CompositeExplicitAutograd: lift_fresh_copy + CompositeExplicitAutogradNonFunctional: lift_fresh_copy autogen: lift_fresh_copy.out - func: is_set_to(Tensor self, Tensor tensor) -> bool @@ -6872,6 +7093,7 @@ CPU: masked_fill__cpu CUDA: masked_fill__cuda QuantizedCPU: masked_fill__quantized_cpu + QuantizedCUDA: masked_fill__quantized_cuda MPS: masked_fill__mps autogen: masked_fill.Scalar_out @@ -6888,6 +7110,7 @@ CPU: masked_fill__cpu CUDA: masked_fill__cuda QuantizedCPU: masked_fill__quantized_cpu + QuantizedCUDA: masked_fill__quantized_cuda MPS: masked_fill__mps autogen: masked_fill.Tensor_out @@ -6921,21 +7144,15 @@ CPU: masked_softmax_backward_cpu autogen: _masked_softmax_backward.out -- func: view.SymInt(Tensor(a) self, SymInt[] size) -> Tensor(a) - variants: method - device_check: NoCheck - device_guard: False - dispatch: - CompositeExplicitAutograd: view_symint - MkldnnCPU: mkldnn_view_symint - -- func: view(Tensor(a) self, int[] size) -> Tensor(a) +- func: view(Tensor(a) self, SymInt[] size) -> Tensor(a) variants: method device_check: NoCheck device_guard: False dispatch: - ZeroTensor, CPU, CUDA, Meta, QuantizedCPU, QuantizedCUDA, MPS: view + ZeroTensor, Meta, CPU, CUDA, QuantizedCPU, QuantizedCUDA, MPS: view MkldnnCPU: mkldnn_view + NestedTensorCPU, NestedTensorCUDA: view_nested + tags: canonical # Warning: If you want to change the name or overload name of this # operator, you might also want to change the `isBlockListedSchema` @@ -7111,6 +7328,7 @@ - func: scatter_add(Tensor self, int dim, Tensor index, Tensor src) -> Tensor structured_delegate: scatter_add.out variants: function, method + tags: canonical - func: scatter_add_(Tensor(a!) self, int dim, Tensor index, Tensor src) -> Tensor(a!) structured_delegate: scatter_add.out @@ -7181,6 +7399,7 @@ device_check: NoCheck # TensorIterator variants: method, function structured_delegate: bitwise_and.Tensor_out + tags: canonical - func: bitwise_and_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -7236,6 +7455,7 @@ device_check: NoCheck # TensorIterator variants: method, function structured_delegate: bitwise_or.Tensor_out + tags: canonical - func: bitwise_or_.Scalar(Tensor(a!) self, Scalar other) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -7496,6 +7716,7 @@ - func: random_.from(Tensor(a!) self, int from, int? to, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator variants: method + tags: nondeterministic_seeded dispatch: CPU, CUDA: random_ Meta: random_meta_ @@ -7504,6 +7725,7 @@ - func: random_.to(Tensor(a!) self, int to, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: random_ @@ -7513,6 +7735,7 @@ - func: random_(Tensor(a!) self, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: random_ @@ -7521,6 +7744,7 @@ - func: uniform_(Tensor(a!) self, float from=0, float to=1, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: uniform_ @@ -7531,12 +7755,14 @@ - func: cauchy_(Tensor(a!) self, float median=0, float sigma=1, *, Generator? generator=None) -> Tensor(a!) 
device_check: NoCheck # TensorIterator variants: method + tags: nondeterministic_seeded dispatch: CPU, CUDA: cauchy_ autogen: cauchy, cauchy.out - func: log_normal_(Tensor(a!) self, float mean=1, float std=2, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: log_normal_ @@ -7544,6 +7770,7 @@ - func: exponential_(Tensor(a!) self, float lambd=1, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: exponential_ @@ -7552,6 +7779,7 @@ - func: geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: geometric_ @@ -7560,20 +7788,9 @@ autogen: geometric, geometric.out - func: diag.out(Tensor self, int diagonal=0, *, Tensor(a!) out) -> Tensor(a!) - dispatch: - CPU: diag_cpu_out - CUDA: diag_cuda_out - MPS: diag_mps_out - func: diag(Tensor self, int diagonal=0) -> Tensor variants: method, function - dispatch: - CompositeExplicitAutograd: diag - -- func: diag_backward(Tensor grad, int[] input_sizes, int diagonal) -> Tensor - variants: function - device_check: NoCheck - device_guard: False - func: cross.out(Tensor self, Tensor other, int? dim=None, *, Tensor(a!) out) -> Tensor(a!) @@ -7619,12 +7836,15 @@ dispatch: CPU: trace_cpu CUDA: trace_cuda + MPS: trace_mps_out autogen: trace.out -- func: trace_backward(Tensor grad, int[] sizes) -> Tensor +- func: trace_backward(Tensor grad, SymInt[] sizes) -> Tensor variants: function device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: trace_backward_symint - func: ne.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7700,6 +7920,7 @@ variants: method, function dispatch: QuantizedCPU: eq_quantized_cpu + tags: canonical - func: eq.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7732,6 +7953,7 @@ variants: method, function dispatch: QuantizedCPU: ge_quantized_cpu + tags: canonical - func: ge.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7791,6 +8013,7 @@ variants: method, function dispatch: QuantizedCPU: le_quantized_cpu + tags: canonical - func: le.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7850,6 +8073,7 @@ variants: method, function dispatch: QuantizedCPU: gt_quantized_cpu + tags: canonical - func: gt.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7909,6 +8133,7 @@ variants: method, function dispatch: QuantizedCPU: lt_quantized_cpu + tags: canonical - func: lt.Tensor_out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -7983,21 +8208,25 @@ SparseCPU: index_select_sparse_cpu SparseCUDA: index_select_sparse_cuda MPS: index_select_mps + tags: canonical - func: index_select.dimname_out(Tensor self, Dimname dim, Tensor index, *, Tensor(a!) out) -> Tensor(a!) 
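# --- Illustrative aside (hedged sketch, not part of the native_functions.yaml diff) ---
# The hunk above tags index_select as `canonical`, and the hunk that follows moves
# index_select_backward to SymInt[] self_sizes with a CompositeImplicitAutograd
# symint kernel. A minimal sketch of the unchanged eager-mode call, assuming only
# the public torch API and arbitrary example shapes:
#
#     import torch
#
#     x = torch.arange(12.0).reshape(3, 4)            # 3x4 input
#     idx = torch.tensor([0, 2])                       # rows to keep
#     rows = torch.index_select(x, dim=0, index=idx)   # shape (2, 4)
#
# The SymInt migration changes how sizes are carried through the dispatcher and the
# backward schema; the Python-facing signature shown above stays the same.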
- func: index_select.dimname(Tensor self, Dimname dim, Tensor index) -> Tensor variants: method, function -- func: index_select_backward(Tensor grad, int[] self_sizes, int dim, Tensor index) -> Tensor +- func: index_select_backward(Tensor grad, SymInt[] self_sizes, int dim, Tensor index) -> Tensor variants: function device_check: NoCheck device_guard: False + dispatch: + CompositeImplicitAutograd: index_select_backward_symint - func: masked_select.out(Tensor self, Tensor mask, *, Tensor(a!) out) -> Tensor(a!) dispatch: CPU: masked_select_out_cpu CUDA: masked_select_out_cuda + MPS: masked_select_out_mps tags: dynamic_output_shape - func: masked_select(Tensor self, Tensor mask) -> Tensor @@ -8005,6 +8234,7 @@ dispatch: CPU: masked_select_cpu CUDA: masked_select_cuda + MPS: masked_select_mps tags: dynamic_output_shape - func: masked_select_backward(Tensor grad, Tensor input, Tensor mask) -> Tensor @@ -8023,7 +8253,7 @@ dispatch: CPU: nonzero_cpu CUDA: nonzero_cuda - tags: dynamic_output_shape + tags: dynamic_output_shape, canonical - func: nonzero_numpy(Tensor self) -> Tensor[] variants: method, function @@ -8041,6 +8271,7 @@ - func: gather(Tensor self, int dim, Tensor index, *, bool sparse_grad=False) -> Tensor variants: method, function structured_delegate: gather.out + tags: canonical - func: gather_backward(Tensor grad, Tensor self, int dim, Tensor index, bool sparse_grad) -> Tensor variants: function @@ -8090,19 +8321,10 @@ device_check: NoCheck # TensorIterator variants: method -- func: cross_entropy_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100, float label_smoothing=0.0) -> Tensor +- func: cross_entropy_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, float label_smoothing=0.0) -> Tensor python_module: nn - -- func: lstsq.X(Tensor self, Tensor A, *, Tensor(a!) X, Tensor(b!) qr) -> (Tensor(a!) solution, Tensor(b!) QR) dispatch: - CPU: legacy_lstsq_out - CUDA: legacy_lstsq_out_cuda - -- func: lstsq(Tensor self, Tensor A) -> (Tensor solution, Tensor QR) - variants: method, function - dispatch: - CPU: legacy_lstsq - CUDA: legacy_lstsq_cuda + CompositeImplicitAutograd: cross_entropy_loss_symint - func: triangular_solve.X(Tensor self, Tensor A, bool upper=True, bool transpose=False, bool unitriangular=False, *, Tensor(a!) X, Tensor(b!) M) -> (Tensor(a!) solution, Tensor(b!) cloned_coefficient) structured: True @@ -8149,15 +8371,6 @@ CUDA: _symeig_helper_cuda autogen: _symeig_helper.out -- func: eig.e(Tensor self, bool eigenvectors=False, *, Tensor(a!) e, Tensor(b!) v) -> (Tensor(a!) eigenvalues, Tensor(b!) eigenvectors) - dispatch: - CompositeExplicitAutograd: eig_out - -- func: eig(Tensor self, bool eigenvectors=False) -> (Tensor eigenvalues, Tensor eigenvectors) - variants: method, function - dispatch: - CompositeExplicitAutograd: eig - - func: svd.U(Tensor self, bool some=True, bool compute_uv=True, *, Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) -> (Tensor(a!) U, Tensor(b!) S, Tensor(c!) V) - func: svd(Tensor self, bool some=True, bool compute_uv=True) -> (Tensor U, Tensor S, Tensor V) @@ -8271,13 +8484,16 @@ # TODO: remove dispatch section when porting TH CUDA to ATen - func: multinomial.out(Tensor self, int num_samples, bool replacement=False, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CPU, CUDA: multinomial_out + MPS: multinomial_out_mps - func: multinomial(Tensor self, int num_samples, bool replacement=False, *, Generator? 
generator=None) -> Tensor variants: method, function dispatch: CPU, CUDA: multinomial + MPS: multinomial_mps tags: nondeterministic_seeded - func: lgamma.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) @@ -8405,6 +8621,7 @@ dispatch: CPU: signbit_out CUDA: signbit_out + MPS: signbit_out_mps SparseCPU, SparseCUDA: signbit_sparse_out SparseCsrCPU, SparseCsrCUDA: signbit_sparse_csr_out @@ -8681,13 +8898,6 @@ MPS: max_mps QuantizedCPU: max_quantized_cpu -# Not to be confused with binary op `max.out`. Commented because of failed CI -# FIXME: enable this -#- func: max.unary_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) -# device_check: NoCheck # TensorIterator -# dispatch: -# CompositeExplicitAutograd: max_unary_out - - func: fmax(Tensor self, Tensor other) -> Tensor structured_delegate: fmax.out device_check: NoCheck # TensorIterator @@ -8704,6 +8914,7 @@ structured_delegate: maximum.out device_check: NoCheck # TensorIterator variants: method, function + tags: canonical - func: maximum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -8722,10 +8933,17 @@ - func: max.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) device_check: NoCheck # TensorIterator +- func: max.unary_out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) + device_check: NoCheck # TensorIterator + dispatch: + CPU, CUDA: max_unary_out + QuantizedCPU: max_quantized_unary_out + - func: minimum(Tensor self, Tensor other) -> Tensor structured_delegate: minimum.out device_check: NoCheck # TensorIterator variants: method, function + tags: canonical - func: minimum.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!) structured: True @@ -8878,7 +9096,7 @@ CPU, CUDA, Meta: unfold QuantizedCPU, QuantizedCUDA: unfold -- func: unfold_backward(Tensor grad_in, int[] input_sizes, int dim, int size, int step) -> Tensor +- func: unfold_backward(Tensor grad_in, SymInt[] input_sizes, int dim, int size, int step) -> Tensor variants: function dispatch: CPU, CUDA: unfold_backward @@ -8931,6 +9149,7 @@ variants: function, method dispatch: SparseCPU, SparseCUDA: pow_sparse_scalar + tags: canonical - func: pow_.Scalar(Tensor(a!) self, Scalar exponent) -> Tensor(a!) device_check: NoCheck # TensorIterator @@ -8964,6 +9183,7 @@ - func: normal_(Tensor(a!) self, float mean=0, float std=1, *, Generator? generator=None) -> Tensor(a!) device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded variants: method dispatch: CPU, CUDA: normal_ @@ -8977,10 +9197,12 @@ # but we can't due to overload ambiguity with normal.Tensor_float. - func: normal_functional(Tensor self, float mean=0, float std=1, *, Generator? generator=None) -> Tensor device_check: NoCheck # TensorIterator + tags: nondeterministic_seeded dispatch: CompositeExplicitAutograd: normal_functional - func: normal.Tensor_float_out(Tensor mean, float std=1, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) + tags: nondeterministic_seeded dispatch: CPU, CUDA: normal_out MPS: normal_mps_out @@ -8998,6 +9220,7 @@ CPU, CUDA: normal_out Meta: normal_out_meta MPS: normal_mps_out + tags: nondeterministic_seeded - func: normal.float_Tensor(float mean, Tensor std, *, Generator? generator=None) -> Tensor dispatch: @@ -9011,6 +9234,7 @@ CPU, CUDA: normal_out Meta: normal_out_meta MPS: normal_mps_out + tags: nondeterministic_seeded - func: normal.Tensor_Tensor(Tensor mean, Tensor std, *, Generator? 
generator=None) -> Tensor dispatch: @@ -9019,13 +9243,15 @@ Meta: normal_meta tags: nondeterministic_seeded -- func: normal.float_float(float mean, float std, int[] size, *, Generator? generator=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: normal.float_float(float mean, float std, SymInt[] size, *, Generator? generator=None, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor dispatch: CompositeExplicitAutograd: normal + tags: nondeterministic_seeded -- func: normal.float_float_out(float mean, float std, int[] size, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) +- func: normal.float_float_out(float mean, float std, SymInt[] size, *, Generator? generator=None, Tensor(a!) out) -> Tensor(a!) dispatch: CompositeExplicitAutograd: normal_out + tags: nondeterministic_seeded - func: alias(Tensor(a) self) -> Tensor(a) variants: method, function @@ -9689,6 +9915,14 @@ CUDA: foreach_tensor_addcdiv_scalarlist_cuda_ autogen: _foreach_addcdiv.ScalarList_out +- func: _foreach_addcdiv_.Tensor(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CPU: foreach_tensor_addcdiv_tensor_slow_ + CUDA: foreach_tensor_addcdiv_tensor_cuda_ + autogen: _foreach_addcdiv.Tensor_out + - func: _foreach_addcmul_.ScalarList(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> () device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices variants: function @@ -9697,6 +9931,14 @@ CUDA: foreach_tensor_addcmul_scalarlist_cuda_ autogen: _foreach_addcmul.ScalarList_out +- func: _foreach_addcmul_.Tensor(Tensor(a!)[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> () + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CPU: foreach_tensor_addcmul_tensor_slow_ + CUDA: foreach_tensor_addcmul_tensor_cuda_ + autogen: _foreach_addcmul.Tensor_out + - func: _foreach_addcdiv.Scalar(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar value=1) -> Tensor[] device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices variants: function @@ -9718,6 +9960,13 @@ CPU: foreach_tensor_addcdiv_scalarlist_slow CUDA: foreach_tensor_addcdiv_scalarlist_cuda +- func: _foreach_addcdiv.Tensor(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CPU: foreach_tensor_addcdiv_tensor_slow + CUDA: foreach_tensor_addcdiv_tensor_cuda + - func: _foreach_addcmul.ScalarList(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Scalar[] scalars) -> Tensor[] device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices variants: function @@ -9725,6 +9974,13 @@ CPU: foreach_tensor_addcmul_scalarlist_slow CUDA: foreach_tensor_addcmul_scalarlist_cuda +- func: _foreach_addcmul.Tensor(Tensor[] self, Tensor[] tensor1, Tensor[] tensor2, Tensor scalars) -> Tensor[] + device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices + variants: function + dispatch: + CPU: foreach_tensor_addcmul_tensor_slow + CUDA: foreach_tensor_addcmul_tensor_cuda + - func: 
_foreach_maximum.List(Tensor[] self, Tensor[] other) -> Tensor[] device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices variants: function @@ -9784,17 +10040,6 @@ CPU: searchsorted_cpu CUDA: searchsorted_cuda -# [Note about _torch_cuda_cu_linker_symbol_op and torch_cuda_cu] -# This is a DUMMY function to force the linking against torch_cuda_cu on Windows. -# Otherwise, the Windows linker will optimize and not include torch_cuda_cu even when we -# want it to be included. This is similar to what we do with warp_size for torch_cuda_cpp, -# described as the solution to this issue: https://github.com/pytorch/pytorch/issues/31611 -# This op should NOT be used or exposed or edited or else Windows builds (with BUILD_SPLIT_CUDA) will break. -- func: _torch_cuda_cu_linker_symbol_op(Tensor self) -> Tensor - dispatch: - CUDA: _torch_cuda_cu_linker_symbol_op_cuda - autogen: _torch_cuda_cu_linker_symbol_op.out - - func: searchsorted.Tensor_out(Tensor sorted_sequence, Tensor self, *, bool out_int32=False, bool right=False, str? side=None, Tensor? sorter=None, Tensor(a!) out) -> Tensor(a!) dispatch: CPU: searchsorted_out_cpu @@ -9909,16 +10154,20 @@ CPU: multilabel_margin_loss_backward_cpu CUDA: multilabel_margin_loss_backward_cuda -- func: nll_loss.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!) +- func: nll_loss.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!) python_module: nn -- func: nll_loss_nd(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100) -> Tensor +- func: nll_loss_nd(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor python_module: nn + dispatch: + CompositeImplicitAutograd: nll_loss_nd_symint -- func: nll_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100) -> Tensor +- func: nll_loss(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor python_module: nn + dispatch: + CompositeImplicitAutograd: nll_loss_symint -- func: nll_loss_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!)) +- func: nll_loss_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!)) python_module: nn structured: True dispatch: @@ -9926,11 +10175,11 @@ CUDA: nll_loss_forward_out_cuda MPS: nll_loss_forward_out_mps -- func: nll_loss_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight) +- func: nll_loss_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight) python_module: nn structured_delegate: nll_loss_forward.output -- func: nll_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: nll_loss_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!) 
python_module: nn structured: True dispatch: @@ -9938,38 +10187,40 @@ CUDA: nll_loss_backward_out_cuda MPS: nll_loss_backward_out_mps -- func: nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor +- func: nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor python_module: nn structured_delegate: nll_loss_backward.grad_input -- func: nll_loss2d.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!) +- func: nll_loss2d.out(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100, *, Tensor(a!) out) -> Tensor(a!) python_module: nn -- func: nll_loss2d(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, int ignore_index=-100) -> Tensor +- func: nll_loss2d(Tensor self, Tensor target, Tensor? weight=None, int reduction=Mean, SymInt ignore_index=-100) -> Tensor python_module: nn + dispatch: + CompositeImplicitAutograd: nll_loss2d_symint -- func: nll_loss2d_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!)) +- func: nll_loss2d_forward.output(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, *, Tensor(a!) output, Tensor(b!) total_weight) -> (Tensor(a!), Tensor(b!)) python_module: nn dispatch: CPU: nll_loss2d_forward_out_cpu CUDA: nll_loss2d_forward_out_cuda MPS: nll_loss2d_forward_out_mps -- func: nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index) -> (Tensor output, Tensor total_weight) +- func: nll_loss2d_forward(Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index) -> (Tensor output, Tensor total_weight) python_module: nn dispatch: CPU: nll_loss2d_forward_cpu CUDA: nll_loss2d_forward_cuda MPS: nll_loss2d_forward_mps -- func: nll_loss2d_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: nll_loss2d_backward.grad_input(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn dispatch: CPU: nll_loss2d_backward_out_cpu CUDA: nll_loss2d_backward_out_cuda MPS: nll_loss2d_backward_out_mps -- func: nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor +- func: nll_loss2d_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, SymInt ignore_index, Tensor total_weight) -> Tensor python_module: nn dispatch: CPU: nll_loss2d_backward_cpu @@ -10160,6 +10411,7 @@ dispatch: CPU, CUDA, MPS: hardtanh QuantizedCPU: hardtanh_quantized_cpu + tags: canonical - func: hardtanh_backward.grad_input(Tensor grad_output, Tensor self, Scalar min_val, Scalar max_val, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn @@ -10185,23 +10437,27 @@ python_module: nn dispatch: CPU, CUDA: hardswish_out + MPS: hardswish_out_mps - func: hardswish(Tensor self) -> Tensor device_check: NoCheck # TensorIterator python_module: nn dispatch: CPU, CUDA: hardswish + MPS: hardswish_mps - func: hardswish_(Tensor(a!) self) -> Tensor(a!) 
device_check: NoCheck # TensorIterator python_module: nn dispatch: CPU, CUDA: hardswish_ + MPS: hardswish_mps_ - func: hardswish_backward(Tensor grad_output, Tensor self) -> Tensor python_module: nn dispatch: CPU, CUDA: hardswish_backward + MPS: hardswish_backward_mps autogen: hardswish_backward.out - func: leaky_relu.out(Tensor self, Scalar negative_slope=0.01, *, Tensor(a!) out) -> Tensor(a!) @@ -10220,6 +10476,7 @@ python_module: nn dispatch: QuantizedCPU: leaky_relu_quantized_cpu + tags: canonical - func: leaky_relu_backward.grad_input(Tensor grad_output, Tensor self, Scalar negative_slope, bool self_is_result, *, Tensor(a!) grad_input) -> Tensor(a!) structured: True @@ -10276,6 +10533,7 @@ - func: rrelu_with_noise.out(Tensor self, Tensor noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn + tags: nondeterministic_seeded dispatch: CPU: rrelu_with_noise_out_cpu CUDA: rrelu_with_noise_out_cuda @@ -10295,6 +10553,7 @@ - func: rrelu_with_noise_(Tensor(a!) self, Tensor noise, Scalar lower=0.125, Scalar upper=0.3333333333333333, bool training=False, Generator? generator=None) -> Tensor(a!) python_module: nn + tags: nondeterministic_seeded dispatch: CPU: rrelu_with_noise_cpu_ CUDA: rrelu_with_noise_cuda_ @@ -10349,7 +10608,7 @@ structured_delegate: softshrink_backward.grad_input python_module: nn -- func: adaptive_avg_pool2d.out(Tensor self, int[2] output_size, *, Tensor(a!) out) -> Tensor(a!) +- func: adaptive_avg_pool2d.out(Tensor self, SymInt[2] output_size, *, Tensor(a!) out) -> Tensor(a!) python_module: nn dispatch: CPU: adaptive_avg_pool2d_out_cpu @@ -10357,8 +10616,10 @@ MPS: adaptive_avg_pool2d_out_mps MkldnnCPU: mkldnn_adaptive_avg_pool2d_out_stub -- func: adaptive_avg_pool2d(Tensor self, int[2] output_size) -> Tensor +- func: adaptive_avg_pool2d(Tensor self, SymInt[2] output_size) -> Tensor python_module: nn + dispatch: + CompositeImplicitAutograd: adaptive_avg_pool2d_symint - func: mkldnn_adaptive_avg_pool2d(Tensor self, int[2] output_size) -> Tensor dispatch: @@ -10373,7 +10634,7 @@ MkldnnCPU: mkldnn_adaptive_avg_pool2d_backward autogen: mkldnn_adaptive_avg_pool2d_backward.out -- func: _adaptive_avg_pool2d(Tensor self, int[2] output_size) -> Tensor +- func: _adaptive_avg_pool2d(Tensor self, SymInt[2] output_size) -> Tensor dispatch: CPU: adaptive_avg_pool2d_cpu CUDA: adaptive_avg_pool2d_cuda @@ -10381,6 +10642,7 @@ QuantizedCPU: adaptive_avg_pool2d_quantized_cpu QuantizedCUDA: adaptive_avg_pool2d_quantized_cuda autogen: _adaptive_avg_pool2d.out + tags: canonical - func: _adaptive_avg_pool2d_backward(Tensor grad_output, Tensor self) -> Tensor python_module: nn @@ -10389,18 +10651,21 @@ CUDA: adaptive_avg_pool2d_backward_cuda MPS: adaptive_avg_pool2d_backward_mps autogen: _adaptive_avg_pool2d_backward.out + tags: canonical -- func: adaptive_avg_pool3d.out(Tensor self, int[3] output_size, *, Tensor(a!) out) -> Tensor(a!) +- func: adaptive_avg_pool3d.out(Tensor self, SymInt[3] output_size, *, Tensor(a!) out) -> Tensor(a!) 
python_module: nn dispatch: CPU: adaptive_avg_pool3d_out_cpu CUDA: adaptive_avg_pool3d_out_cuda QuantizedCPU: adaptive_avg_pool3d_out_quantized_cpu -- func: adaptive_avg_pool3d(Tensor self, int[3] output_size) -> Tensor +- func: adaptive_avg_pool3d(Tensor self, SymInt[3] output_size) -> Tensor python_module: nn + dispatch: + CompositeImplicitAutograd: adaptive_avg_pool3d_symint -- func: _adaptive_avg_pool3d(Tensor self, int[3] output_size) -> Tensor +- func: _adaptive_avg_pool3d(Tensor self, SymInt[3] output_size) -> Tensor dispatch: CPU: adaptive_avg_pool3d_cpu CUDA: adaptive_avg_pool3d_cuda @@ -10489,6 +10754,7 @@ dispatch: MkldnnCPU: mkldnn_avg_pool2d QuantizedCPU: avg_pool2d_quantized_cpu + tags: canonical - func: avg_pool2d_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, bool ceil_mode, bool count_include_pad, int? divisor_override, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn @@ -10504,6 +10770,7 @@ structured_delegate: avg_pool2d_backward.grad_input dispatch: MkldnnCPU: mkldnn_avg_pool2d_backward + tags: canonical - func: avg_pool3d.out(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn @@ -10600,6 +10867,7 @@ - func: max_pool2d_with_indices(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=0, int[2] dilation=1, bool ceil_mode=False) -> (Tensor, Tensor) python_module: nn structured_delegate: max_pool2d_with_indices.out + tags: canonical - func: max_pool2d_with_indices_backward.grad_input(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn @@ -10612,6 +10880,7 @@ - func: max_pool2d_with_indices_backward(Tensor grad_output, Tensor self, int[2] kernel_size, int[2] stride, int[2] padding, int[2] dilation, bool ceil_mode, Tensor indices) -> Tensor python_module: nn structured_delegate: max_pool2d_with_indices_backward.grad_input + tags: canonical # Return: (Tensor output, Tensor indices) - func: max_pool3d_with_indices.out(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=0, int[3] dilation=1, bool ceil_mode=False, *, Tensor(a!) out, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!)) @@ -10663,7 +10932,7 @@ CPU: max_unpooling3d_forward_cpu CUDA: max_unpooling3d_forward_cuda -- func: reflection_pad1d.out(Tensor self, int[2] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: reflection_pad1d.out(Tensor self, SymInt[2] padding, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10672,11 +10941,11 @@ CUDA: reflection_pad1d_out_cuda MPS: reflection_pad1d_out_mps -- func: reflection_pad1d(Tensor self, int[2] padding) -> Tensor +- func: reflection_pad1d(Tensor self, SymInt[2] padding) -> Tensor python_module: nn structured_delegate: reflection_pad1d.out -- func: reflection_pad1d_backward.grad_input(Tensor grad_output, Tensor self, int[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: reflection_pad1d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) 
python_module: nn structured: True dispatch: @@ -10684,18 +10953,18 @@ CUDA: reflection_pad1d_backward_out_cuda MPS: reflection_pad1d_backward_out_mps -- func: reflection_pad1d_backward(Tensor grad_output, Tensor self, int[2] padding) -> Tensor +- func: reflection_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor python_module: nn structured_delegate: reflection_pad1d_backward.grad_input -- func: reflection_pad2d.out(Tensor self, int[4] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: reflection_pad2d.out(Tensor self, SymInt[4] padding, *, Tensor(a!) out) -> Tensor(a!) python_module: nn dispatch: CPU, QuantizedCPU: reflection_pad2d_out_cpu CUDA: reflection_pad2d_out_cuda MPS: reflection_pad2d_out_mps -- func: reflection_pad2d(Tensor self, int[4] padding) -> Tensor +- func: reflection_pad2d(Tensor self, SymInt[4] padding) -> Tensor python_module: nn dispatch: CPU: reflection_pad2d_cpu @@ -10703,21 +10972,21 @@ CUDA: reflection_pad2d_cuda MPS: reflection_pad2d_mps -- func: reflection_pad2d_backward.grad_input(Tensor grad_output, Tensor self, int[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: reflection_pad2d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn dispatch: CPU: reflection_pad2d_backward_out_cpu CUDA: reflection_pad2d_backward_out_cuda MPS: reflection_pad2d_backward_out_mps -- func: reflection_pad2d_backward(Tensor grad_output, Tensor self, int[4] padding) -> Tensor +- func: reflection_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor python_module: nn dispatch: CPU: reflection_pad2d_backward_cpu CUDA: reflection_pad2d_backward_cuda MPS: reflection_pad2d_backward_mps -- func: reflection_pad3d.out(Tensor self, int[6] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: reflection_pad3d.out(Tensor self, SymInt[6] padding, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10725,11 +10994,11 @@ CUDA: reflection_pad3d_out_cuda MPS: reflection_pad3d_out_mps -- func: reflection_pad3d(Tensor self, int[6] padding) -> Tensor +- func: reflection_pad3d(Tensor self, SymInt[6] padding) -> Tensor python_module: nn structured_delegate: reflection_pad3d.out -- func: reflection_pad3d_backward.grad_input(Tensor grad_output, Tensor self, int[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: reflection_pad3d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10737,11 +11006,11 @@ CUDA: reflection_pad3d_backward_out_cuda MPS: reflection_pad3d_backward_out_mps -- func: reflection_pad3d_backward(Tensor grad_output, Tensor self, int[6] padding) -> Tensor +- func: reflection_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor python_module: nn structured_delegate: reflection_pad3d_backward.grad_input -- func: replication_pad1d.out(Tensor self, int[2] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: replication_pad1d.out(Tensor self, SymInt[2] padding, *, Tensor(a!) out) -> Tensor(a!) 
python_module: nn structured: True dispatch: @@ -10749,11 +11018,11 @@ CUDA: replication_pad1d_out_cuda MPS: replication_pad1d_out_mps -- func: replication_pad1d(Tensor self, int[2] padding) -> Tensor +- func: replication_pad1d(Tensor self, SymInt[2] padding) -> Tensor python_module: nn structured_delegate: replication_pad1d.out -- func: replication_pad1d_backward.grad_input(Tensor grad_output, Tensor self, int[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: replication_pad1d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[2] padding, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10761,11 +11030,11 @@ CUDA: replication_pad1d_backward_out_cuda MPS: replication_pad1d_backward_out_mps -- func: replication_pad1d_backward(Tensor grad_output, Tensor self, int[2] padding) -> Tensor +- func: replication_pad1d_backward(Tensor grad_output, Tensor self, SymInt[2] padding) -> Tensor python_module: nn structured_delegate: replication_pad1d_backward.grad_input -- func: replication_pad2d.out(Tensor self, int[4] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: replication_pad2d.out(Tensor self, SymInt[4] padding, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10773,25 +11042,25 @@ CUDA: replication_pad2d_out_cuda MPS: replication_pad2d_out_mps -- func: replication_pad2d(Tensor self, int[4] padding) -> Tensor +- func: replication_pad2d(Tensor self, SymInt[4] padding) -> Tensor python_module: nn structured_delegate: replication_pad2d.out -- func: replication_pad2d_backward.grad_input(Tensor grad_output, Tensor self, int[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: replication_pad2d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[4] padding, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn dispatch: CPU: replication_pad2d_backward_out_cpu CUDA: replication_pad2d_backward_out_cuda MPS: replication_pad2d_backward_out_mps -- func: replication_pad2d_backward(Tensor grad_output, Tensor self, int[4] padding) -> Tensor +- func: replication_pad2d_backward(Tensor grad_output, Tensor self, SymInt[4] padding) -> Tensor python_module: nn dispatch: CPU: replication_pad2d_backward_cpu CUDA: replication_pad2d_backward_cuda MPS: replication_pad2d_backward_mps -- func: replication_pad3d.out(Tensor self, int[6] padding, *, Tensor(a!) out) -> Tensor(a!) +- func: replication_pad3d.out(Tensor self, SymInt[6] padding, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -10799,207 +11068,113 @@ CUDA: replication_pad3d_out_cuda MPS: replication_pad3d_out_mps -- func: replication_pad3d(Tensor self, int[6] padding) -> Tensor +- func: replication_pad3d(Tensor self, SymInt[6] padding) -> Tensor python_module: nn structured_delegate: replication_pad3d.out -- func: replication_pad3d_backward.grad_input(Tensor grad_output, Tensor self, int[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: replication_pad3d_backward.grad_input(Tensor grad_output, Tensor self, SymInt[6] padding, *, Tensor(a!) grad_input) -> Tensor(a!) 
python_module: nn dispatch: CPU: replication_pad3d_backward_out_cpu CUDA: replication_pad3d_backward_out_cuda MPS: replication_pad3d_backward_out_mps -- func: replication_pad3d_backward(Tensor grad_output, Tensor self, int[6] padding) -> Tensor +- func: replication_pad3d_backward(Tensor grad_output, Tensor self, SymInt[6] padding) -> Tensor python_module: nn dispatch: CPU: replication_pad3d_backward_cpu CUDA: replication_pad3d_backward_cuda MPS: replication_pad3d_backward_mps -- func: _pad_circular(Tensor self, int[] pad) -> Tensor - python_module: nn - -- func: _pad_enum(Tensor self, int[] pad, int mode, float? value=None) -> Tensor - python_module: nn - -- func: pad(Tensor self, int[] pad, str mode="constant", float? value=None) -> Tensor - python_module: nn - -- func: upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: _pad_circular(Tensor self, SymInt[] pad) -> Tensor python_module: nn dispatch: - CompositeExplicitAutograd: upsample_linear1d - autogen: upsample_linear1d.vec_out + CompositeImplicitAutograd: _pad_circular_symint -- func: upsample_linear1d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: _pad_enum(Tensor self, SymInt[] pad, int mode, float? value=None) -> Tensor python_module: nn dispatch: - CompositeExplicitAutograd: upsample_linear1d_backward - autogen: upsample_linear1d_backward.vec_out + CompositeImplicitAutograd: _pad_enum_symint -- func: upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: pad(Tensor self, SymInt[] pad, str mode="constant", float? value=None) -> Tensor python_module: nn dispatch: - CompositeExplicitAutograd: upsample_bilinear2d - autogen: upsample_bilinear2d.vec_out + CompositeImplicitAutograd: pad_symint -- func: upsample_bilinear2d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: upsample_linear1d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_bilinear2d_backward - autogen: upsample_bilinear2d_backward.vec_out + autogen: upsample_linear1d.vec_out -- func: _upsample_bilinear2d_aa.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: upsample_bilinear2d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_bilinear2d_aa - autogen: _upsample_bilinear2d_aa.vec_out + autogen: upsample_bilinear2d.vec_out + tags: canonical -- func: _upsample_bilinear2d_aa_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: _upsample_bilinear2d_aa.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_bilinear2d_aa_backward - autogen: _upsample_bilinear2d_aa_backward.vec_out + autogen: _upsample_bilinear2d_aa.vec_out -- func: upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: upsample_trilinear3d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? 
scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_trilinear3d autogen: upsample_trilinear3d.vec_out -- func: upsample_trilinear3d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: upsample_bicubic2d.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_trilinear3d_backward - autogen: upsample_trilinear3d_backward.vec_out - -- func: upsample_bicubic2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_bicubic2d autogen: upsample_bicubic2d.vec_out -- func: upsample_bicubic2d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: _upsample_bicubic2d_aa.vec(Tensor input, SymInt[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_bicubic2d_backward - autogen: upsample_bicubic2d_backward.vec_out - -- func: _upsample_bicubic2d_aa.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_bicubic2d_aa autogen: _upsample_bicubic2d_aa.vec_out -- func: _upsample_bicubic2d_aa_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, bool align_corners, float[]? scale_factors) -> Tensor +- func: upsample_nearest1d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_bicubic2d_aa_backward - autogen: _upsample_bicubic2d_aa_backward.vec_out - -- func: upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_nearest1d autogen: upsample_nearest1d.vec_out -- func: _upsample_nearest_exact1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor +- func: _upsample_nearest_exact1d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_nearest_exact1d autogen: _upsample_nearest_exact1d.vec_out -- func: upsample_nearest1d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_nearest1d_backward - autogen: upsample_nearest1d_backward.vec_out - -- func: _upsample_nearest_exact1d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_nearest_exact1d_backward - autogen: _upsample_nearest_exact1d_backward.vec_out - -- func: upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor +- func: upsample_nearest2d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_nearest2d autogen: upsample_nearest2d.vec_out + tags: canonical -- func: _upsample_nearest_exact2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor +- func: _upsample_nearest_exact2d.vec(Tensor input, SymInt[]? output_size, float[]? 
scale_factors) -> Tensor python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_nearest_exact2d autogen: _upsample_nearest_exact2d.vec_out -- func: upsample_nearest2d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: upsample_nearest2d_backward - autogen: upsample_nearest2d_backward.vec_out - -- func: _upsample_nearest_exact2d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CompositeExplicitAutograd: _upsample_nearest_exact2d_backward - autogen: _upsample_nearest_exact2d_backward.vec_out - -- func: upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor +- func: upsample_nearest3d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CPU: upsample_nearest3d_cpu - CUDA: upsample_nearest3d_cuda - QuantizedCPU: upsample_nearest3d_quantized_cpu autogen: upsample_nearest3d.vec_out -- func: _upsample_nearest_exact3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor +- func: _upsample_nearest_exact3d.vec(Tensor input, SymInt[]? output_size, float[]? scale_factors) -> Tensor python_module: nn - dispatch: - CPU: _upsample_nearest_exact3d_cpu - CUDA: _upsample_nearest_exact3d_cuda - QuantizedCPU: _upsample_nearest_exact3d_quantized_cpu autogen: _upsample_nearest_exact3d.vec_out -- func: upsample_nearest3d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CPU: upsample_nearest3d_backward_cpu - CUDA: upsample_nearest3d_backward_cuda - autogen: upsample_nearest3d_backward.vec_out - -- func: _upsample_nearest_exact3d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor - python_module: nn - dispatch: - CPU: _upsample_nearest_exact3d_backward_cpu - CUDA: _upsample_nearest_exact3d_backward_cuda - autogen: _upsample_nearest_exact3d_backward.vec_out - # NOTE: all of the non-"vec" upsample overloads are only kept for backward compatibility. -- func: upsample_linear1d.out(Tensor self, int[1] output_size, bool align_corners, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_linear1d.out(Tensor self, SymInt[1] output_size, bool align_corners, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_linear1d_out_cpu CUDA: upsample_linear1d_out_cuda -- func: upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> Tensor +- func: upsample_linear1d(Tensor self, SymInt[1] output_size, bool align_corners, float? scales=None) -> Tensor python_module: nn structured_delegate: upsample_linear1d.out -- func: upsample_linear1d_backward.grad_input(Tensor grad_output, int[1] output_size, int[3] input_size, bool align_corners, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_linear1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, bool align_corners, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_linear1d_backward_out_cpu CUDA: upsample_linear1d_backward_out_cuda -- func: upsample_linear1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, bool align_corners, float? 
scales=None) -> Tensor +- func: upsample_linear1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, bool align_corners, float? scales=None) -> Tensor python_module: nn structured_delegate: upsample_linear1d_backward.grad_input -- func: upsample_bilinear2d.out(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_bilinear2d.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11007,13 +11182,13 @@ CUDA: upsample_bilinear2d_out_cuda MPS: upsample_bilinear2d_out_mps -- func: upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_bilinear2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_bilinear2d.out dispatch: QuantizedCPU: upsample_bilinear2d_quantized_cpu -- func: upsample_bilinear2d_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_bilinear2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11021,99 +11196,99 @@ CUDA: upsample_bilinear2d_backward_out_cuda MPS: upsample_bilinear2d_backward_out_mps -- func: upsample_bilinear2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_bilinear2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_bilinear2d_backward.grad_input -- func: _upsample_bilinear2d_aa.out(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: _upsample_bilinear2d_aa.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_bilinear2d_aa_out_cpu CUDA: _upsample_bilinear2d_aa_out_cuda -- func: _upsample_bilinear2d_aa(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_bilinear2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_bilinear2d_aa.out -- func: _upsample_bilinear2d_aa_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: _upsample_bilinear2d_aa_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) 
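# NOTE [SymInt sizes in these schemas]: the size-like arguments in this block move from
# int[] to SymInt[] so that symbolic shapes can flow through tracing without being forced
# to concrete integers. Where a dispatch entry in this file points at a `*_symint` kernel,
# the C++ kernel is expected to take c10::SymIntArrayRef rather than IntArrayRef. A
# minimal, purely illustrative sketch (the function name is hypothetical and not part of
# this patch):
#
#   at::Tensor pad_like_symint_sketch(const at::Tensor& self, c10::SymIntArrayRef pad) {
#     TORCH_CHECK(pad.size() % 2 == 0, "padding length must be even");
#     return self;  // a real kernel would compute output sizes with SymInt arithmetic
#   }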
python_module: nn structured: True dispatch: CPU: _upsample_bilinear2d_aa_backward_out_cpu CUDA: _upsample_bilinear2d_aa_backward_out_cuda -- func: _upsample_bilinear2d_aa_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_bilinear2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_bilinear2d_aa_backward.grad_input -- func: upsample_bicubic2d.out(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_bicubic2d.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_bicubic2d_out_cpu CUDA: upsample_bicubic2d_out_cuda -- func: upsample_bicubic2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_bicubic2d(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_bicubic2d.out -- func: upsample_bicubic2d_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_bicubic2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_bicubic2d_backward_out_cpu CUDA: upsample_bicubic2d_backward_out_cuda -- func: upsample_bicubic2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_bicubic2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_bicubic2d_backward.grad_input -- func: _upsample_bicubic2d_aa.out(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: _upsample_bicubic2d_aa.out(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_bicubic2d_aa_out_cpu CUDA: _upsample_bicubic2d_aa_out_cuda -- func: _upsample_bicubic2d_aa(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_bicubic2d_aa(Tensor self, SymInt[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_bicubic2d_aa.out -- func: _upsample_bicubic2d_aa_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: _upsample_bicubic2d_aa_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None, *, Tensor(a!) 
grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_bicubic2d_aa_backward_out_cpu CUDA: _upsample_bicubic2d_aa_backward_out_cuda -- func: _upsample_bicubic2d_aa_backward(Tensor grad_output, int[2] output_size, int[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_bicubic2d_aa_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_bicubic2d_aa_backward.grad_input -- func: upsample_trilinear3d.out(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_trilinear3d.out(Tensor self, SymInt[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_trilinear3d_out_cpu CUDA: upsample_trilinear3d_out_cuda -- func: upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_trilinear3d(Tensor self, SymInt[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_trilinear3d.out -- func: upsample_trilinear3d_backward.grad_input(Tensor grad_output, int[3] output_size, int[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_trilinear3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_trilinear3d_backward_out_cpu CUDA: upsample_trilinear3d_backward_out_cuda -- func: upsample_trilinear3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_trilinear3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_trilinear3d_backward.grad_input -- func: upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_nearest1d.out(Tensor self, SymInt[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11121,44 +11296,44 @@ CUDA: upsample_nearest1d_out_cuda MPS: upsample_nearest1d_out_mps -- func: _upsample_nearest_exact1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) +- func: _upsample_nearest_exact1d.out(Tensor self, SymInt[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_nearest_exact1d_out_cpu CUDA: _upsample_nearest_exact1d_out_cuda -- func: upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> Tensor +- func: upsample_nearest1d(Tensor self, SymInt[1] output_size, float? 
scales=None) -> Tensor python_module: nn structured_delegate: upsample_nearest1d.out -- func: _upsample_nearest_exact1d(Tensor self, int[1] output_size, float? scales=None) -> Tensor +- func: _upsample_nearest_exact1d(Tensor self, SymInt[1] output_size, float? scales=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact1d.out -- func: upsample_nearest1d_backward.grad_input(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_nearest1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_nearest1d_backward_out_cpu CUDA: upsample_nearest1d_backward_out_cuda -- func: _upsample_nearest_exact1d_backward.grad_input(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: _upsample_nearest_exact1d_backward.grad_input(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_nearest_exact1d_backward_out_cpu CUDA: _upsample_nearest_exact1d_backward_out_cuda -- func: upsample_nearest1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None) -> Tensor +- func: upsample_nearest1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor python_module: nn structured_delegate: upsample_nearest1d_backward.grad_input -- func: _upsample_nearest_exact1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None) -> Tensor +- func: _upsample_nearest_exact1d_backward(Tensor grad_output, SymInt[1] output_size, SymInt[3] input_size, float? scales=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact1d_backward.grad_input -- func: upsample_nearest2d.out(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_nearest2d.out(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11166,7 +11341,7 @@ CUDA: upsample_nearest2d_out_cuda MPS: upsample_nearest2d_out_mps -- func: _upsample_nearest_exact2d.out(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: _upsample_nearest_exact2d.out(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11174,19 +11349,19 @@ CUDA: _upsample_nearest_exact2d_out_cuda MPS: _upsample_nearest_exact2d_out_mps -- func: upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_nearest2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_nearest2d.out dispatch: QuantizedCPU: upsample_nearest2d_quantized_cpu -- func: _upsample_nearest_exact2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_nearest_exact2d(Tensor self, SymInt[2] output_size, float? scales_h=None, float? 
scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact2d.out dispatch: QuantizedCPU: _upsample_nearest_exact2d_quantized_cpu -- func: upsample_nearest2d_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_nearest2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11194,7 +11369,7 @@ CUDA: upsample_nearest2d_backward_out_cuda MPS: upsample_nearest2d_backward_out_mps -- func: _upsample_nearest_exact2d_backward.grad_input(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: _upsample_nearest_exact2d_backward.grad_input(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: @@ -11202,59 +11377,59 @@ CUDA: _upsample_nearest_exact2d_backward_out_cuda MPS: _upsample_nearest_exact2d_backward_out_mps -- func: upsample_nearest2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_nearest2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_nearest2d_backward.grad_input -- func: _upsample_nearest_exact2d_backward(Tensor grad_output, int[2] output_size, int[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_nearest_exact2d_backward(Tensor grad_output, SymInt[2] output_size, SymInt[4] input_size, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact2d_backward.grad_input -- func: upsample_nearest3d.out(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: upsample_nearest3d.out(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_nearest3d_out_cpu CUDA: upsample_nearest3d_out_cuda -- func: _upsample_nearest_exact3d.out(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) +- func: _upsample_nearest_exact3d.out(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_nearest_exact3d_out_cpu CUDA: _upsample_nearest_exact3d_out_cuda -- func: upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_nearest3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_nearest3d.out dispatch: QuantizedCPU: upsample_nearest3d_quantized_cpu -- func: _upsample_nearest_exact3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? 
scales_w=None) -> Tensor +- func: _upsample_nearest_exact3d(Tensor self, SymInt[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact3d.out dispatch: QuantizedCPU: _upsample_nearest_exact3d_quantized_cpu -- func: upsample_nearest3d_backward.grad_input(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: upsample_nearest3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: upsample_nearest3d_backward_out_cpu CUDA: upsample_nearest3d_backward_out_cuda -- func: _upsample_nearest_exact3d_backward.grad_input(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) +- func: _upsample_nearest_exact3d_backward.grad_input(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None, *, Tensor(a!) grad_input) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: _upsample_nearest_exact3d_backward_out_cpu CUDA: _upsample_nearest_exact3d_backward_out_cuda -- func: upsample_nearest3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor +- func: upsample_nearest3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: upsample_nearest3d_backward.grad_input -- func: _upsample_nearest_exact3d_backward(Tensor grad_output, int[3] output_size, int[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor +- func: _upsample_nearest_exact3d_backward(Tensor grad_output, SymInt[3] output_size, SymInt[5] input_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> Tensor python_module: nn structured_delegate: _upsample_nearest_exact3d_backward.grad_input @@ -11311,24 +11486,24 @@ # these are the same thing, but we give them different prefixes to # make the operational distinction clear. -- func: slow_conv_transpose2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, int[2] output_padding=0, int[2] dilation=1, *, Tensor(a!) out) -> Tensor(a!) +- func: slow_conv_transpose2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, int[2] dilation=1, *, Tensor(a!) out) -> Tensor(a!) python_module: nn structured: True dispatch: CPU: slow_conv_transpose2d_structured_cpu CUDA: slow_conv_transpose2d_structured_cuda -- func: slow_conv_transpose2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, int[2] output_padding=0, int[2] dilation=1) -> Tensor +- func: slow_conv_transpose2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? 
bias=None, int[2] stride=1, SymInt[2] padding=0, SymInt[2] output_padding=0, int[2] dilation=1) -> Tensor python_module: nn structured_delegate: slow_conv_transpose2d.out -- func: slow_conv_transpose3d.out(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, int[3] padding=0, int[3] output_padding=0, int[3] dilation=1, *, Tensor(a!) out) -> Tensor(a!) +- func: slow_conv_transpose3d.out(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, int[3] dilation=1, *, Tensor(a!) out) -> Tensor(a!) python_module: nn dispatch: CPU: slow_conv_transpose3d_out_cpu CUDA: slow_conv_transpose3d_out_cuda -- func: slow_conv_transpose3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, int[3] padding=0, int[3] output_padding=0, int[3] dilation=1) -> Tensor +- func: slow_conv_transpose3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, SymInt[3] padding=0, SymInt[3] output_padding=0, int[3] dilation=1) -> Tensor python_module: nn dispatch: CPU: slow_conv_transpose3d_cpu @@ -11365,76 +11540,65 @@ CUDA: slow_conv2d_backward_cuda autogen: _slow_conv2d_backward.output_mask_out -- func: _conv_depthwise2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias, int[2] stride, int[2] padding, int[2] dilation, *, Tensor(a!) out) -> Tensor(a!) +- func: _conv_depthwise2d.out(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias, int[2] stride, SymInt[2] padding, int[2] dilation, *, Tensor(a!) out) -> Tensor(a!) use_const_ref_for_mutable_tensors: True python_module: nn dispatch: CUDA: conv_depthwise2d_cuda_out -- func: _conv_depthwise2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias, int[2] stride, int[2] padding, int[2] dilation) -> Tensor +- func: _conv_depthwise2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias, int[2] stride, SymInt[2] padding, int[2] dilation) -> Tensor python_module: nn dispatch: CUDA: conv_depthwise2d_cuda -- func: conv_depthwise3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias, int[3] stride, int[3] padding, int[3] dilation) -> Tensor +- func: conv_depthwise3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias, int[3] stride, SymInt[3] padding, int[3] dilation) -> Tensor python_module: nn dispatch: CUDA: conv_depthwise3d_cuda autogen: conv_depthwise3d.out -- func: slow_conv3d.out(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, int[3] padding=0, *, Tensor(a!) out) -> Tensor(a!) +- func: slow_conv3d.out(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, SymInt[3] padding=0, *, Tensor(a!) out) -> Tensor(a!) python_module: nn -- func: slow_conv3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, int[3] padding=0) -> Tensor +- func: slow_conv3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, SymInt[3] padding=0) -> Tensor python_module: nn -- func: slow_conv3d_forward.output(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias, int[3] stride, int[3] padding, *, Tensor(a!) output) -> Tensor(a!) +- func: slow_conv3d_forward.output(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias, int[3] stride, SymInt[3] padding, *, Tensor(a!) output) -> Tensor(a!) python_module: nn dispatch: CPU: slow_conv3d_forward_out_cpu -- func: slow_conv3d_forward(Tensor self, Tensor weight, int[3] kernel_size, Tensor? 
bias, int[3] stride, int[3] padding) -> Tensor +- func: slow_conv3d_forward(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias, int[3] stride, SymInt[3] padding) -> Tensor python_module: nn dispatch: CPU: slow_conv3d_forward_cpu -- func: slow_conv_dilated2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, int[2] padding=0, int[2] dilation=1) -> Tensor +- func: slow_conv_dilated2d(Tensor self, Tensor weight, int[2] kernel_size, Tensor? bias=None, int[2] stride=1, SymInt[2] padding=0, int[2] dilation=1) -> Tensor python_module: nn dispatch: CPU: slow_conv_dilated2d_cpu CUDA: slow_conv_dilated2d_cuda autogen: slow_conv_dilated2d.out -- func: slow_conv_dilated3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, int[3] padding=0, int[3] dilation=1) -> Tensor +- func: slow_conv_dilated3d(Tensor self, Tensor weight, int[3] kernel_size, Tensor? bias=None, int[3] stride=1, SymInt[3] padding=0, int[3] dilation=1) -> Tensor python_module: nn dispatch: CPU: slow_conv_dilated3d_cpu CUDA: slow_conv_dilated3d_cuda autogen: slow_conv_dilated3d.out -- func: col2im.out(Tensor self, int[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) out) -> Tensor(a!) +- func: col2im.out(Tensor self, SymInt[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) out) -> Tensor(a!) python_module: nn dispatch: CPU: col2im_out_cpu CUDA: col2im_out_cuda -- func: col2im(Tensor self, int[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor +- func: col2im(Tensor self, SymInt[2] output_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor python_module: nn dispatch: CPU: col2im_cpu CUDA: col2im_cuda - -- func: col2im_backward.grad_input(Tensor grad_output, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) grad_input) -> Tensor(a!) - python_module: nn - dispatch: - CPU: col2im_backward_out_cpu - CUDA: col2im_backward_out_cuda - -- func: col2im_backward(Tensor grad_output, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor - python_module: nn - dispatch: - CPU: col2im_backward_cpu - CUDA: col2im_backward_cuda + tags: canonical - func: column_stack(Tensor[] tensors) -> Tensor @@ -11452,18 +11616,6 @@ CPU: im2col_cpu CUDA: im2col_cuda -- func: im2col_backward.grad_input(Tensor grad_output, int[2] input_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride, *, Tensor(a!) grad_input) -> Tensor(a!) - python_module: nn - dispatch: - CPU: im2col_backward_out_cpu - CUDA: im2col_backward_out_cuda - -- func: im2col_backward(Tensor grad_output, int[2] input_size, int[2] kernel_size, int[2] dilation, int[2] padding, int[2] stride) -> Tensor - python_module: nn - dispatch: - CPU: im2col_backward_cpu - CUDA: im2col_backward_cuda - - func: isfinite(Tensor self) -> Tensor variants: function, method device_check: NoCheck @@ -12130,8 +12282,6 @@ - func: linalg_cross.out(Tensor self, Tensor other, *, int dim=-1, Tensor(a!) out) -> Tensor(a!) python_module: linalg structured: True - precomputed: - - dim -> int dim dispatch: CPU, CUDA: linalg_cross_out @@ -12343,34 +12493,26 @@ dispatch: CPU, CUDA: linalg_householder_product_out -- func: _linalg_inv_out_helper_(Tensor(a!) self, Tensor(b!) infos_lu, Tensor(c!) infos_getri) -> Tensor(a!) 
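# NOTE [linalg_inv via structured linalg_inv_ex]: the private helper removed here is
# superseded by making linalg_inv_ex a structured op (see the entries just below), with
# its argument renamed from `self` to `A`. A minimal, purely illustrative C++ call
# against the resulting API (not part of this patch):
#
#   auto [inverse, info] = at::linalg_inv_ex(A, /*check_errors=*/false);
#   // `info` is nonzero for batch elements where the factorization failed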
- variants: function - dispatch: - CPU: _linalg_inv_out_helper_cpu - CUDA: _linalg_inv_out_helper_cuda - autogen: _linalg_inv_out_helper, _linalg_inv_out_helper.out - -- func: linalg_inv_ex(Tensor self, *, bool check_errors=False) -> (Tensor inverse, Tensor info) +- func: linalg_inv_ex(Tensor A, *, bool check_errors=False) -> (Tensor inverse, Tensor info) python_module: linalg - variants: function - dispatch: - # calls transpose_ - CompositeExplicitAutogradNonFunctional: linalg_inv_ex + structured_delegate: linalg_inv_ex.inverse -- func: linalg_inv_ex.inverse(Tensor self, *, bool check_errors=False, Tensor(a!) inverse, Tensor(b!) info) -> (Tensor(a!) inverse, Tensor(b!) info) +- func: linalg_inv_ex.inverse(Tensor A, *, bool check_errors=False, Tensor(a!) inverse, Tensor(b!) info) -> (Tensor(a!) inverse, Tensor(b!) info) python_module: linalg - variants: function + structured: True dispatch: - # calls transpose_ - CompositeExplicitAutogradNonFunctional: linalg_inv_ex_out + CPU, CUDA: linalg_inv_ex_out -- func: linalg_inv(Tensor self) -> Tensor +- func: linalg_inv(Tensor A) -> Tensor python_module: linalg - variants: function -- func: linalg_inv.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) +- func: linalg_inv.out(Tensor A, *, Tensor(a!) out) -> Tensor(a!) python_module: linalg - variants: function + +- func: inverse(Tensor self) -> Tensor + variants: function, method + +- func: inverse.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) - func: inner(Tensor self, Tensor other) -> Tensor variants: function, method @@ -12603,6 +12745,17 @@ - func: linalg_multi_dot.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!) python_module: linalg +## Functions related to the `torch.nested` namespace +# Note [nested namespace binding] +# Functions in the nested python module should have their names start with +# "nested_" underscore and be bound to the desired Python name in +# torch/nested/__init__.py, and the desired C++ name in torch/csrc/api/include/torch/nested.h. +# The "nested_" names should be hidden from the user and not documented. + +- func: nested_to_padded_tensor(Tensor self, float padding, int[]? output_size=None) -> Tensor + python_module: nested + variants: function + ## Functions that are only for testing # It is undocumented and should not be used outside of tests. - func: _test_serialization_subcmul(Tensor self, Tensor other, Scalar alpha=1) -> Tensor @@ -12698,11 +12851,11 @@ variants: function python_module: nn -- func: nested_tensor(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor +- func: _nested_tensor_from_tensor_list(Tensor[] list, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor variants: function dispatch: - CompositeExplicitAutograd: nested_tensor - autogen: nested_tensor.out + CompositeExplicitAutograd: _nested_tensor_from_tensor_list + autogen: _nested_tensor_from_tensor_list.out - func: _fw_primal_copy(Tensor self, int level) -> Tensor variants: function @@ -12740,10 +12893,10 @@ CompositeExplicitAutogradNonFunctional: _neg_view_copy tags: view_copy -- func: as_strided_copy(Tensor self, int[] size, int[] stride, int? storage_offset=None) -> Tensor +- func: as_strided_copy(Tensor self, SymInt[] size, SymInt[] stride, SymInt? 
storage_offset=None) -> Tensor variants: function dispatch: - CompositeExplicitAutogradNonFunctional: as_strided_copy + CompositeExplicitAutogradNonFunctional: as_strided_copy_symint tags: view_copy - func: _sparse_broadcast_to_copy(Tensor self, int[] size) -> Tensor @@ -12758,16 +12911,10 @@ CompositeExplicitAutogradNonFunctional: diagonal_copy tags: view_copy -- func: expand_copy(Tensor self, int[] size, *, bool implicit=False) -> Tensor - variants: function - dispatch: - CompositeExplicitAutogradNonFunctional: expand_copy - tags: view_copy - -- func: expand_copy.SymInt(Tensor self, SymInt[] size, *, bool implicit=False) -> Tensor +- func: expand_copy(Tensor self, SymInt[] size, *, bool implicit=False) -> Tensor variants: function dispatch: - CompositeExplicitAutograd: expand_copy_SymInt + CompositeExplicitAutogradNonFunctional: expand_copy_symint tags: view_copy - func: permute_copy(Tensor self, int[] dims) -> Tensor @@ -12776,16 +12923,16 @@ CompositeExplicitAutogradNonFunctional: permute_copy tags: view_copy -- func: _reshape_alias_copy(Tensor self, int[] size, int[] stride) -> Tensor +- func: _reshape_alias_copy(Tensor self, SymInt[] size, SymInt[] stride) -> Tensor variants: function dispatch: - CompositeExplicitAutogradNonFunctional: _reshape_alias_copy + CompositeExplicitAutogradNonFunctional: _reshape_alias_copy_symint tags: view_copy -- func: select_copy.int(Tensor self, int dim, int index) -> Tensor +- func: select_copy.int(Tensor self, int dim, SymInt index) -> Tensor variants: function dispatch: - CompositeExplicitAutogradNonFunctional: select_copy_int + CompositeExplicitAutogradNonFunctional: select_copy_symint tags: view_copy - func: detach_copy(Tensor self) -> Tensor @@ -12794,22 +12941,22 @@ CompositeExplicitAutogradNonFunctional: detach_copy tags: view_copy -- func: slice_copy.Tensor(Tensor self, int dim=0, int? start=None, int? end=None, int step=1) -> Tensor +- func: slice_copy.Tensor(Tensor self, int dim=0, SymInt? start=None, SymInt? 
end=None, SymInt step=1) -> Tensor variants: function dispatch: - CompositeExplicitAutogradNonFunctional: slice_copy_Tensor + CompositeExplicitAutogradNonFunctional: slice_copy_Tensor_symint tags: view_copy -- func: split_copy.Tensor(Tensor self, int split_size, int dim=0) -> Tensor[] +- func: split_copy.Tensor(Tensor self, SymInt split_size, int dim=0) -> Tensor[] variants: function dispatch: - CompositeExplicitAutogradNonFunctional: split_copy_Tensor + CompositeExplicitAutogradNonFunctional: split_copy_Tensor_symint tags: view_copy -- func: split_with_sizes_copy(Tensor self, int[] split_sizes, int dim=0) -> Tensor[] +- func: split_with_sizes_copy(Tensor self, SymInt[] split_sizes, int dim=0) -> Tensor[] variants: function dispatch: - CompositeExplicitAutogradNonFunctional: split_with_sizes_copy + CompositeExplicitAutogradNonFunctional: split_with_sizes_copy_symint tags: view_copy - func: squeeze_copy(Tensor self) -> Tensor @@ -12881,14 +13028,14 @@ - func: ccol_indices_copy(Tensor self) -> Tensor variants: function dispatch: - CompositeExplicitAutograd: ccol_indices_copy + CompositeExplicitAutogradNonFunctional: ccol_indices_copy tags: view_copy autogen: ccol_indices_copy.out - func: row_indices_copy(Tensor self) -> Tensor variants: function dispatch: - CompositeExplicitAutograd: row_indices_copy + CompositeExplicitAutogradNonFunctional: row_indices_copy tags: view_copy autogen: row_indices_copy.out @@ -12898,10 +13045,10 @@ CompositeExplicitAutogradNonFunctional: unbind_copy_int tags: view_copy -- func: view_copy(Tensor self, int[] size) -> Tensor +- func: view_copy(Tensor self, SymInt[] size) -> Tensor variants: function dispatch: - CompositeExplicitAutogradNonFunctional: view_copy + CompositeExplicitAutogradNonFunctional: view_copy_symint tags: view_copy - func: view_copy.dtype(Tensor self, ScalarType dtype) -> Tensor @@ -12958,18 +13105,10 @@ CompositeExplicitAutograd: _neg_view_copy_out -- func: view_copy.SymInt(Tensor self, SymInt[] size) -> Tensor +- func: as_strided_copy.out(Tensor self, SymInt[] size, SymInt[] stride, SymInt? storage_offset=None, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: - CompositeExplicitAutograd: view_copy_SymInt - tags: view_copy - autogen: view_copy.SymInt_out - - -- func: as_strided_copy.out(Tensor self, int[] size, int[] stride, int? storage_offset=None, *, Tensor(a!) out) -> Tensor(a!) - variants: function - dispatch: - CompositeExplicitAutograd: as_strided_copy_out + CompositeExplicitAutograd: as_strided_copy_out_symint - func: _sparse_broadcast_to_copy.out(Tensor self, int[] size, *, Tensor(a!) out) -> Tensor(a!) @@ -12984,16 +13123,10 @@ CompositeExplicitAutograd: diagonal_copy_out -- func: expand_copy.SymInt_out(Tensor self, SymInt[] size, *, bool implicit=False, Tensor(a!) out) -> Tensor(a!) +- func: expand_copy.out(Tensor self, SymInt[] size, *, bool implicit=False, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: - CompositeExplicitAutograd: expand_copy_SymInt_out - - -- func: expand_copy.out(Tensor self, int[] size, *, bool implicit=False, Tensor(a!) out) -> Tensor(a!) - variants: function - dispatch: - CompositeExplicitAutograd: expand_copy_out + CompositeExplicitAutograd: expand_copy_out_symint - func: permute_copy.out(Tensor self, int[] dims, *, Tensor(a!) out) -> Tensor(a!) @@ -13002,16 +13135,16 @@ CompositeExplicitAutograd: permute_copy_out -- func: _reshape_alias_copy.out(Tensor self, int[] size, int[] stride, *, Tensor(a!) out) -> Tensor(a!) 
+- func: _reshape_alias_copy.out(Tensor self, SymInt[] size, SymInt[] stride, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: CompositeExplicitAutograd: _reshape_alias_copy_out -- func: select_copy.int_out(Tensor self, int dim, int index, *, Tensor(a!) out) -> Tensor(a!) +- func: select_copy.int_out(Tensor self, int dim, SymInt index, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: - CompositeExplicitAutograd: select_copy_int_out + CompositeExplicitAutograd: select_copy_symint_out - func: detach_copy.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!) @@ -13020,19 +13153,19 @@ CompositeExplicitAutograd: detach_copy_out -- func: slice_copy.Tensor_out(Tensor self, int dim=0, int? start=None, int? end=None, int step=1, *, Tensor(a!) out) -> Tensor(a!) +- func: slice_copy.Tensor_out(Tensor self, int dim=0, SymInt? start=None, SymInt? end=None, SymInt step=1, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: CompositeExplicitAutograd: slice_copy_Tensor_out -- func: split_copy.Tensor_out(Tensor self, int split_size, int dim=0, *, Tensor(a!)[] out) -> () +- func: split_copy.Tensor_out(Tensor self, SymInt split_size, int dim=0, *, Tensor(a!)[] out) -> () variants: function dispatch: CompositeExplicitAutograd: split_copy_Tensor_out -- func: split_with_sizes_copy.out(Tensor self, int[] split_sizes, int dim=0, *, Tensor(a!)[] out) -> () +- func: split_with_sizes_copy.out(Tensor self, SymInt[] split_sizes, int dim=0, *, Tensor(a!)[] out) -> () variants: function dispatch: CompositeExplicitAutograd: split_with_sizes_copy_out @@ -13110,10 +13243,10 @@ CompositeExplicitAutograd: unbind_copy_int_out -- func: view_copy.out(Tensor self, int[] size, *, Tensor(a!) out) -> Tensor(a!) +- func: view_copy.out(Tensor self, SymInt[] size, *, Tensor(a!) out) -> Tensor(a!) variants: function dispatch: - CompositeExplicitAutograd: view_copy_out + CompositeExplicitAutograd: view_copy_out_symint - func: view_copy.dtype_out(Tensor self, ScalarType dtype, *, Tensor(a!) out) -> Tensor(a!) @@ -13133,18 +13266,17 @@ dispatch: CompositeExplicitAutograd: alias_copy_out -- func: to_padded_tensor(Tensor self, float padding, int[]? output_size=None) -> Tensor +- func: to_padded_tensor(Tensor self, float padding, SymInt[]? output_size=None) -> Tensor variants: method dispatch: NestedTensorCPU: NestedTensor_to_padded_tensor_generic NestedTensorCUDA: NestedTensor_to_padded_tensor_cuda autogen: to_padded_tensor.out -- func: _nested_tensor_layer_norm(Tensor self, Tensor? weight, Tensor? bias, float eps) -> Tensor - variants: method +- func: _nested_tensor_softmax_with_shape(Tensor self, Tensor query) -> Tensor dispatch: - NestedTensorCPU, NestedTensorCUDA: NestedTensor_layer_norm - autogen: _nested_tensor_layer_norm.out + NestedTensorCPU: NestedTensor_softmax_dropout + NestedTensorCUDA: NestedTensor_softmax_dropout_cuda # Apparently, putting "forward" in the name will cause Python bindings to be skipped, so "fwd" it is. - func: _transformer_encoder_layer_fwd(Tensor src, int embed_dim, int num_heads, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, bool use_gelu, bool norm_first, float eps, Tensor norm_weight_1, Tensor norm_bias_1, Tensor norm_weight_2, Tensor norm_bias_2, Tensor ffn_weight_1, Tensor ffn_bias_1, Tensor ffn_weight_2, Tensor ffn_bias_2, Tensor? mask=None, int? 
mask_type=None) -> Tensor @@ -13156,11 +13288,53 @@ - func: _native_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None, bool need_weights=True, bool average_attn_weights=True, int? mask_type=None) -> (Tensor, Tensor) variants: function dispatch: - CPU, CUDA, NestedTensorCPU, NestedTensorCUDA: native_multi_head_attention + CPU, NestedTensorCPU: native_multi_head_attention_cpu + CUDA, NestedTensorCUDA: native_multi_head_attention_cuda autogen: _native_multi_head_attention.out -- func: _scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=True, bool is_causal=False) -> (Tensor, Tensor) +- func: _scaled_dot_product_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=False, bool is_causal=False) -> (Tensor, Tensor) + python_module: nn variants: function + autogen: _scaled_dot_product_attention.out + +- func: _fused_sdp_choice(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=False, bool is_causal=False) -> int + dispatch: + CPU, NestedTensorCPU, Meta: _fused_sdp_choice_cpp + CUDA, NestedTensorCUDA: _fused_sdp_choice_cuda + +- func: _scaled_dot_product_attention_math(Tensor query, Tensor key, Tensor value, Tensor? attn_mask=None, float dropout_p=0.0, bool need_attn_weights=False, bool is_causal=False) -> (Tensor, Tensor) + variants: function + +- func: _scaled_dot_product_flash_attention(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool return_softmax=False, bool is_causal=False) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _scaled_dot_product_flash_attention_cuda + NestedTensorCUDA: _scaled_dot_product_flash_attention_nestedtensor_cuda + +- func: _scaled_dot_product_efficient_attention(Tensor query, Tensor key, Tensor value, bool compute_log_sumexp, bool is_causal=False) -> (Tensor, Tensor) + dispatch: + CUDA: _scaled_dot_product_efficient_attention_cuda + NestedTensorCUDA: _scaled_dot_product_efficient_attention_nestedtensor_cuda + +- func: _scaled_dot_product_efficient_attention_backward(Tensor grad_out_, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, bool is_causal=False) -> (Tensor, Tensor, Tensor) + dispatch: + CUDA: _scaled_dot_product_efficient_attention_backward_cuda + +# Returns output, softmax_logsumexp, softmax +- func: _flash_attention_forward(Tensor query, Tensor key, Tensor value, Tensor cum_seq_q, Tensor cum_seq_k, int max_q, int max_k, bool return_softmax, float dropout_p, bool is_causal) -> (Tensor, Tensor, Tensor) + variants: function + dispatch: + CUDA: _flash_attention_forward + +# Returns output, logsumexp if compute_logsumexp +- func: _efficient_attention_forward(Tensor query, Tensor key, Tensor value, Tensor? cu_seqlens_q, Tensor? cu_seqlens_k, int? 
max_seqlen_q, bool compute_log_sumexp=False, bool causal=False) -> (Tensor, Tensor) + variants: function + dispatch: + CUDA: _efficient_attention_forward + +- func: _efficient_attention_backward(Tensor grad_out_, Tensor query, Tensor key, Tensor value, Tensor out, Tensor logsumexp, bool is_causal=False) -> (Tensor, Tensor, Tensor) + variants: function + dispatch: + CUDA: _efficient_attention_backward - func: _triton_scaled_dot_attention(Tensor q, Tensor k, Tensor v, float dropout_p=0.0) -> Tensor variants: function @@ -13792,3 +13966,11 @@ dispatch: CPU: foobar autogen: _foobar.out + +# Fused Optimizer CUDA kernels. +- func: _fused_adam_(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] exp_avgs, Tensor(d!)[] exp_avg_sqs, Tensor(e!)[] max_exp_avg_sqs, Tensor[] state_steps, *, float lr, float beta1, float beta2, float weight_decay, float eps, bool amsgrad, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> () + # Unlike "foreach" functions, lists of tensors should be guaranteed to be on the same device (for now). + variants: function + dispatch: + CUDA: _fused_adam_kernel_cuda_ + autogen: _fused_adam, _fused_adam.out diff --git a/aten/src/ATen/native/nested/NestedTensorAliases.cpp b/aten/src/ATen/native/nested/NestedTensorAliases.cpp new file mode 100644 index 000000000000..c5785297be4f --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorAliases.cpp @@ -0,0 +1,15 @@ +#include + +namespace at { +namespace native { + +// alias for to_padded_tensor in nested namespace +Tensor nested_to_padded_tensor( + const Tensor& t, + double padding, + OptionalIntArrayRef output_size) { + return t.to_padded_tensor(padding, output_size); +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorBackward.cpp b/aten/src/ATen/native/nested/NestedTensorBackward.cpp index 39016bd85e5f..51a4210a56ae 100644 --- a/aten/src/ATen/native/nested/NestedTensorBackward.cpp +++ b/aten/src/ATen/native/nested/NestedTensorBackward.cpp @@ -8,12 +8,12 @@ #include #include #include -#include +#include namespace at { namespace native { -// See Note [nested tensor matmul] TODO in NestedTensorMath.cpp +// See Note [nested tensor matmul] in NestedTensorMath.cpp std::tuple matmul_backward_nested( const Tensor& grad, const Tensor& self, @@ -66,23 +66,6 @@ std::tuple nested_linear_backward( return std::tuple{grad_input, grad_weight, grad_bias}; } -Tensor _reshape_nested_backward(const Tensor& self, const Tensor& grad) { - auto self_ptr = get_nested_tensor_impl(self); - // TODO: this is to reproduce self_ptr->opt_sizes_ - // if an accessor is provided in the future, can replace this - std::vector sizes; - for (int64_t i = 0; i < self_ptr->dim(); i++) { - c10::optional opt_size = self_ptr->opt_size(i); - if (opt_size.has_value()) { - sizes.push_back(*opt_size); - } - else { - sizes.push_back(-1); - } - } - return grad.reshape(sizes); -} - Tensor nested_softmax_backward( const Tensor& grad, const Tensor& output, @@ -123,6 +106,68 @@ Tensor nested_softmax_backward( input_dtype); } return grad_output; + +} + +// Rudimentary sum backward assuming the conditions in #82387 +Tensor _nested_sum_backward_cpu( + const Tensor& grad, + const Tensor& nested_self, + OptionalIntArrayRef opt_dims, + bool keepdim) { + auto nt_self = get_nested_tensor_impl(nested_self); + auto nt_grad = get_nested_tensor_impl(grad); + const Tensor& grad_buffer = nt_grad->get_buffer(); + const Tensor& self_buffer = nt_self->get_buffer(); + auto grad_sizes = nt_grad->get_nested_size_tensor(); + auto 
self_sizes = nt_self->get_nested_size_tensor(); + int64_t ntensors = nt_self->size(0); + const Tensor& self_grad_buffer = self_buffer.new_empty(self_buffer.sizes()); + + auto num_segments = at::prod(grad_sizes, -1); + auto segment_lengths = self_sizes.select(1, -1); + + // This logic assumes for now that + // (1) all the gradient nested tensors are contiguous + // (2) the gradient nested tensors are stored contiguously in the buffer + AT_DISPATCH_ALL_TYPES_AND2( + ScalarType::Half, ScalarType::BFloat16, self_grad_buffer.scalar_type(), "nested_sum_dim_cpu", [&]() { + auto* self_grad_data = self_grad_buffer.data_ptr(); + const auto* output_grad_data = grad_buffer.data_ptr(); + int64_t out_idx = 0, in_idx = 0; + for (const auto i : c10::irange(ntensors)) { + int64_t segments = num_segments[i].item(); + int64_t segment_length = segment_lengths[i].item(); + for (auto j = 0; j < segments; j++) { + scalar_t output_grad = output_grad_data[out_idx]; + for (auto k = 0; k < segment_length; k++) { + self_grad_data[in_idx] = output_grad; + in_idx += 1; + } + out_idx += 1; + } + } + }); + + return wrap_buffer(self_grad_buffer, self_sizes); + +} + + +Tensor _nested_select_backward_symint( + const Tensor& grad, + const Tensor& nested_self, + int64_t dim, + c10::SymInt index) { + auto nt_self = get_nested_tensor_impl(nested_self); + const Tensor& self_buffer = nt_self->get_buffer(); + const auto self_sizes = nt_self->get_nested_size_tensor(); + const Tensor& self_grad_buffer = self_buffer.new_zeros(self_buffer.sizes()); + + auto nt_grad = wrap_buffer(self_grad_buffer, self_sizes); + nt_grad.select_symint(dim, index).copy_(grad); + + return nt_grad; } } // namespace native diff --git a/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp b/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp new file mode 100644 index 000000000000..215252f91d6d --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorBinaryOps.cpp @@ -0,0 +1,247 @@ +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace at { +namespace native { + +DEFINE_DISPATCH(nested_dense_elementwise_stub); +REGISTER_NO_CPU_DISPATCH(nested_dense_elementwise_stub); + +std::pair +get_elementwise_nested_tensor_impl( + const Tensor& self, + const Tensor& other, + const std::string& op_name) { + if (self.is_nested() && !(other.is_nested())) { + TORCH_CHECK( + false, + "Expected both self and other to be nested, but got a nested self and non-nested other"); + } else if (!(self.is_nested()) && other.is_nested()) { + TORCH_CHECK( + false, + "Expected both self and other to be nested, but got a non-nested self and nested other"); + } else if (!(self.is_nested()) || !(other.is_nested())) { + TORCH_CHECK( + false, + "Expected both self and other to be nested, but got a non-nested self and non-nested other"); + } + + auto self_ptr = get_nested_tensor_impl(self); + auto other_ptr = get_nested_tensor_impl(other); + + TORCH_CHECK( + self.dim() == other.dim(), + op_name, + " does not support broadcasting when given a NestedTensor"); + TORCH_CHECK( + at::equal( + self_ptr->get_nested_size_tensor(), + other_ptr->get_nested_size_tensor()), + op_name, + " does not support broadcasting when given a NestedTensor"); + TORCH_CHECK( + at::equal( + self_ptr->get_nested_stride_tensor(), + other_ptr->get_nested_stride_tensor()), + op_name, + " requires strides to match when given NestedTensors"); + auto self_offsets = self_ptr->get_storage_offsets(); + auto 
other_offsets = other_ptr->get_storage_offsets(); + bool offsets_match = true; + for (size_t i = 0; i < self_offsets.size(); i++) { + offsets_match = offsets_match && (self_offsets[i] == other_offsets[i]); + } + TORCH_CHECK( + offsets_match, + op_name, + " requires offsets to match when given NestedTensors"); + return std::make_pair(self_ptr, other_ptr); +} + +template +Tensor NestedTensor_elementwise_Tensor( + const Tensor& self, + const Tensor& other, + const std::string& op_name, + Func f) { + // self is a scalar + if (!self.is_nested() && self.dim() == 0 && self.numel() == 1) { + auto other_impl = get_nested_tensor_impl(other); + return wrap_buffer( + f(self, other_impl->get_unsafe_storage_as_tensor()), + other_impl->get_nested_size_tensor().clone(), + other_impl->get_nested_stride_tensor().clone(), + other_impl->get_storage_offsets() + ); + } + // other is a scalar + if (!other.is_nested() && other.dim() == 0 && other.numel() == 1) { + auto self_impl = get_nested_tensor_impl(self); + return wrap_buffer( + f(self_impl->get_unsafe_storage_as_tensor(), other), + self_impl->get_nested_size_tensor().clone(), + self_impl->get_nested_stride_tensor().clone(), + self_impl->get_storage_offsets() + ); + } + // special case when other is dense + if (self.is_nested() && !other.is_nested()) { + // check for the [B, *, D], [B, 1, D] esuhm case + // TODO: this if statement is ugly and hopefully we will remove this in the near future + auto self_ptr = get_nested_tensor_impl(self); + if (self_ptr->dim() == 3 && + other.dim() == 3 && + self_ptr->size(0) == other.size(0) && + other.size(1) == 1 && + self_ptr->opt_size(2).has_value() && + self_ptr->opt_size(2).value() == other.size(2) && + self.is_cuda() && + other.is_cuda()) { + if (!nested_tensor_impl_is_contiguous(self_ptr)) { + self_ptr = get_nested_tensor_impl(self.contiguous()); + } + const auto self_buffer = self_ptr->get_buffer(); + const auto self_sizes = self_ptr->get_nested_size_tensor(); + auto result_buffer = at::empty_like(self_buffer); + auto result = wrap_buffer(result_buffer, self_sizes); + if (op_name == "add") { + nested_dense_elementwise_stub(self.device().type(), result, self, other, NESTED_DENSE_OP::ADD); + } else if (op_name == "mul") { + nested_dense_elementwise_stub(self.device().type(), result, self, other, NESTED_DENSE_OP::MUL); + } else { + TORCH_CHECK(false, "Unsupported nested dense elementwise op"); + } + return result; + } + TORCH_CHECK(false, "Expected both self and other to be nested, but got a nested self and non-nested other."); + } + + NestedTensorImpl* self_impl = nullptr; + NestedTensorImpl* other_impl = nullptr; + std::tie(self_impl, other_impl) = + get_elementwise_nested_tensor_impl(self, other, op_name); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(self_impl); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(other_impl); + return wrap_buffer( + f(self_impl->get_unsafe_storage_as_tensor(), + other_impl->get_unsafe_storage_as_tensor()), + self_impl->get_nested_size_tensor(), + self_impl->get_nested_stride_tensor(), + self_impl->get_storage_offsets()); +} + +Tensor NestedTensor_add_Tensor( + const Tensor& self, + const Tensor& other, + const Scalar& alpha) { + return NestedTensor_elementwise_Tensor( + self, other, "add", [alpha](const Tensor& b1, const Tensor& b2) { + return at::add(b1, b2, alpha); + }); +} + +Tensor NestedTensor_mul_Tensor(const Tensor& self, const Tensor& other) { + return NestedTensor_elementwise_Tensor( + self, other, "mul", [](const Tensor& b1, const Tensor& b2) { + return at::mul(b1, b2); + }); +} + +// Only usable on 
the C++ side; scalars are converted to tensors coming from Python. +Tensor NestedTensor_mul_Scalar(const Tensor& self, const Scalar& other) { + return NestedTensor_mul_Tensor(self, wrapped_scalar_tensor(other)); +} + +Tensor NestedTensor_div_Tensor(const Tensor& self, const Tensor& other) { + return NestedTensor_elementwise_Tensor( + self, other, "div", [](const Tensor& b1, const Tensor& b2) { + return at::div(b1, b2); + }); +} + +// Only usable on the C++ side; scalars are converted to tensors coming from Python. +Tensor NestedTensor_div_Scalar(const Tensor& self, const Scalar& other) { + return NestedTensor_div_Tensor(self, wrapped_scalar_tensor(other)); +} + +template +Tensor& NestedTensor_elementwise__Tensor( + Tensor& self, + const Tensor& other, + const std::string& op_name, + Func f) { + // self is a scalar + if (!self.is_nested() && self.dim() == 0 && self.numel() == 1) { + auto other_impl = get_nested_tensor_impl(other); + f(self, other_impl->get_buffer()); + return self; + } + // other is a scalar + if (!other.is_nested() && other.dim() == 0 && other.numel() == 1) { + auto self_impl = get_nested_tensor_impl(self); + f(self_impl->get_buffer(), other); + return self; + } + NestedTensorImpl* self_impl = nullptr; + NestedTensorImpl* other_impl = nullptr; + std::tie(self_impl, other_impl) = + get_elementwise_nested_tensor_impl(self, other, op_name); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(self_impl); + TORCH_INTERNAL_ASSERT_DEBUG_ONLY(other_impl); + const auto& nt_self = *self_impl; + const auto& nt_other = *other_impl; + f(nt_self.get_buffer().view({-1}), nt_other.get_buffer().view({-1})); + return self; +} + +Tensor& NestedTensor_add__Tensor( + Tensor& self, + const Tensor& other, + const Scalar& alpha) { + return NestedTensor_elementwise__Tensor( + self, other, "add_", [alpha](const Tensor& b1, const Tensor& b2) { + return b1.add_(b2, alpha); + }); +} + +Tensor& NestedTensor_mul__Tensor(Tensor& self, const Tensor& other) { + return NestedTensor_elementwise__Tensor( + self, other, "mul_", [](const Tensor& b1, const Tensor& b2) { + return b1.mul_(b2); + }); +} + +// Only usable on the C++ side; scalars are converted to tensors coming from Python. 
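For orientation, the core invariant that get_elementwise_nested_tensor_impl enforces for the binary ops above can be sketched in plain C++ over the flattened metadata; the struct and helper below are illustrative stand-ins under assumed names, not part of the patch.

#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-in for the per-constituent metadata a nested tensor carries:
// one row of sizes per constituent plus that constituent's offset into the flat buffer.
struct NestedMeta {
  std::vector<std::vector<int64_t>> sizes;
  std::vector<int64_t> storage_offsets;
};

// Mirrors the checks above: nested elementwise ops require identical nested sizes
// and identical storage offsets; broadcasting across constituents is not supported.
inline void check_elementwise_compatible(const NestedMeta& a,
                                         const NestedMeta& b,
                                         const std::string& op_name) {
  if (a.sizes != b.sizes) {
    throw std::runtime_error(op_name + " does not support broadcasting when given a NestedTensor");
  }
  if (a.storage_offsets != b.storage_offsets) {
    throw std::runtime_error(op_name + " requires offsets to match when given NestedTensors");
  }
}

Once that invariant holds, NestedTensor_elementwise_Tensor simply applies f to the two flat storage tensors and re-wraps the result with self's nested sizes, strides, and offsets, as shown above.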
+Tensor& NestedTensor_mul__Scalar(Tensor& self, const Scalar& other) { + return NestedTensor_mul__Tensor(self, wrapped_scalar_tensor(other)); +} + +Tensor& fill_nested_(Tensor& self, const Scalar& value) { + const auto& self_buf = get_nested_tensor_impl(self)->get_buffer(); + self_buf.fill_(value); + return self; +} + +Tensor& fill_nested_(Tensor& self, const Tensor& value) { + const auto& self_buf = get_nested_tensor_impl(self)->get_buffer(); + self_buf.fill_(value); + return self; +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorBinaryOps.h b/aten/src/ATen/native/nested/NestedTensorBinaryOps.h new file mode 100644 index 000000000000..51eeaf291911 --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorBinaryOps.h @@ -0,0 +1,16 @@ +#pragma once + +#include +#include + +namespace at { +namespace native { + +enum class NESTED_DENSE_OP: uint8_t {ADD, MUL}; + +using nested_dense_elementwise_fn = void (*)(Tensor& result, const Tensor & self, const Tensor & other, const NESTED_DENSE_OP& op); + +DECLARE_DISPATCH(nested_dense_elementwise_fn, nested_dense_elementwise_stub); + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorFactories.cpp b/aten/src/ATen/native/nested/NestedTensorFactories.cpp new file mode 100644 index 000000000000..b45fbb24880c --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorFactories.cpp @@ -0,0 +1,125 @@ +#include +#include +#include + +namespace at { +namespace native { + +TensorOptions verify_empty_parameters( + const at::Tensor& self, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory, + c10::optional optional_memory_format) { + TensorOptions options_ = + TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory( + pin_memory); + + TORCH_CHECK( + !(options_.has_memory_format() && optional_memory_format.has_value()), + "Cannot set memory_format both in TensorOptions and explicit argument; please delete " + "the redundant setter."); + TensorOptions options = self.options().merge_in(options_).merge_memory_format( + optional_memory_format); + + auto memory_format = + options_.memory_format_opt().value_or(MemoryFormat::Preserve); + TORCH_CHECK( + memory_format == MemoryFormat::Preserve, + "empty_like_nested only supports memory format Preserve, but got ", + memory_format, + " instead."); + + TORCH_CHECK( + self.is_contiguous(), + "empty_like only supports contiguous memory format for Nested Tensors"); + + TORCH_CHECK( + !(options.layout() != kStrided && optional_memory_format.has_value()), + "memory format option is only supported by strided tensors"); + return options; +} + +Tensor empty_like_nested( + const Tensor& self, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory, + c10::optional optional_memory_format) { + auto options = verify_empty_parameters( + self, dtype, layout, device, pin_memory, optional_memory_format); + auto self_nt = get_nested_tensor_impl(self); + Tensor new_buffer = at::empty_like(self_nt->get_buffer(), options); + auto nested_size = self_nt->get_nested_size_tensor().clone(); + auto nested_strides = self_nt->get_nested_stride_tensor().clone(); + auto offsets = std::vector(self_nt->get_storage_offsets()); + auto tensor = detail::make_tensor_base( + new_buffer, nested_size, nested_strides, std::move(offsets)); + return tensor; +} + +// Take a Device that may not have device_index set (i.e., having it as -1 +// representing the current 
device) and return the corresponding Device +// according to the actual device at the time of this function call. No-op +// if the device_index is set. +static inline Device ensure_has_index(Device device) { + if (device.is_cpu() || device.has_index()) { + return device; + } + const c10::impl::DeviceGuardImplInterface* impl = + c10::impl::getDeviceGuardImpl(device.type()); + return impl->getDevice(); +} + +Tensor _to_copy_nested( + const Tensor& self, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory, + bool non_blocking, + c10::optional optional_memory_format) { + TORCH_CHECK( + !layout.has_value() || self.layout() == layout.value(), + "to(options) doesn't support converting to a different layout, " + "but got self.layout being ", + self.layout(), + " and options.layout set as ", + layout.value()); + auto options = + TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory( + pin_memory); + + if (options.has_device()) { + options = options.device(ensure_has_index(options.device())); + } + // memory_format is handled separately due to MemoryFormat::Preserve logic + options = self.options().merge_in(options).memory_format(c10::nullopt); + auto memory_format = optional_memory_format.value_or(MemoryFormat::Preserve); + + bool pin_out = + (non_blocking && self.is_cuda() && options.device().is_cpu() && + (options.layout() == c10::kStrided)); + + Tensor r; + r = at::empty_like(self, dtype, layout, device, pin_out, memory_format); + get_nested_tensor_impl(r)->get_buffer().copy_( + get_nested_tensor_impl(self)->get_buffer(), non_blocking); + return r; +} + +Tensor& copy_nested_(Tensor& self, const Tensor& src, bool non_blocking) { + const auto* nt_self = get_nested_tensor_impl(self); + const auto* nt_src = get_nested_tensor_impl(src); + TORCH_CHECK( + at::equal( + nt_self->get_nested_size_tensor(), nt_src->get_nested_size_tensor()), + "copy_ only supports tensors that are the same size for Nested implementations"); + nt_self->get_buffer().copy_(nt_src->get_buffer(), non_blocking); + return self; +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorFactories.h b/aten/src/ATen/native/nested/NestedTensorFactories.h new file mode 100644 index 000000000000..51123f0fc119 --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorFactories.h @@ -0,0 +1,7 @@ +#pragma once + +namespace at { +namespace native { + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorMath.cpp b/aten/src/ATen/native/nested/NestedTensorMath.cpp index 486173a1aa67..5842c3b8b217 100644 --- a/aten/src/ATen/native/nested/NestedTensorMath.cpp +++ b/aten/src/ATen/native/nested/NestedTensorMath.cpp @@ -1,42 +1,23 @@ #include -#include #include -#include -#include -#include -#include +#include +#include +#include #include -#include -#include +#include +#include +#include +#include +#include +#include +#include + +#include namespace at { namespace native { - namespace { -template -Tensor map_nt(const Tensor& nt, Func f) { - auto* nt_impl = get_nested_tensor_impl(nt); - const auto& sizes = nt_impl->get_nested_size_tensor(); - return at::detail::make_tensor(f(nt_impl->get_buffer()), sizes); -} - -c10::optional maybe_get_consistent_last_dim_of_nested_tensor( - const NestedTensorImpl& nt) { - const auto& sizes = nt.get_nested_size_tensor(); - // The last entry in every row of sizes must be the same. 
- const auto& last_dims = sizes.select(1, -1); - const auto last_dims_accessor = last_dims.packed_accessor64(); - // REVIEW: this can't be the most efficient and concise way to - // write this check, can it? - const auto last_dim_value = last_dims_accessor[0]; - for (const auto i : c10::irange(1, last_dims.numel())) { - if (last_dims_accessor[i] != last_dim_value) { - return c10::nullopt; - } - } - return last_dim_value; -} int64_t num_bytes(IntArrayRef sizes) { // 0-dim Tensors have torch.Size of .size() 0, but carry 1 memory. @@ -53,26 +34,6 @@ int64_t num_bytes(IntArrayRef sizes) { return result; } -std::vector NestedTensor_get_max_size_from_size_tensor(const Tensor& sizes) { - if (sizes.dim() == 0) { - return {}; - } - const auto sizes_ptr = sizes.data_ptr(); - const auto sizes_size_0 = sizes.sizes()[0]; - const auto sizes_size_1 = sizes.sizes()[1]; - TORCH_INTERNAL_ASSERT(sizes_size_1 > 0); - std::vector results(sizes_size_1, 0); - for (const auto ii : c10::irange(sizes_size_0)) { - for (const auto jj : c10::irange(sizes_size_1)) { - auto val = sizes_ptr[ii * sizes_size_1 + jj]; - if (results[jj] < val) { - results[jj] = val; - } - } - } - return results; -} - Tensor pad_tensor_to_shape( const Tensor& t, IntArrayRef goal_shape, @@ -111,40 +72,17 @@ std::vector NestedTensor_unbind( if (ntensors == 0) { return result_tensors; } - const at::Tensor& buffer = self_ptr->get_buffer(); + // This returns a differentiable view of self as a regular tensor + auto buffer = self.values(); std::vector sizes = NestedTensor_get_sizes(self_ptr), strides = NestedTensor_get_strides(self_ptr); - const std::vector& offsets = self_ptr->get_offsets(); + const std::vector& offsets = self_ptr->get_storage_offsets(); for (const int64_t i: c10::irange(ntensors)){ result_tensors[i] = buffer.as_strided(sizes[i], strides[i], offsets[i]); } return result_tensors; } -Tensor& NestedTensor_relu_(Tensor& self) { - auto buffer = get_nested_tensor_impl(self)->get_buffer(); - at::relu_(buffer); - return self; -} - -Tensor NestedTensor_relu(const Tensor& self) { - return map_nt(self, at::relu); -} - -Tensor& NestedTensor_gelu_(Tensor& self, c10::string_view approximate) { - auto buffer = get_nested_tensor_impl(self)->get_buffer(); - at::gelu_(buffer, approximate); - return self; -} - -Tensor NestedTensor_gelu(const Tensor& self, c10::string_view approximate) { - return map_nt( - self, - [approximate](const Tensor& buffer) { - return at::gelu(buffer, approximate); - }); -} - Tensor NestedTensor_nested_tensor_from_mask(const Tensor& t, const Tensor& mask, bool mask_check) { TORCH_CHECK(mask.scalar_type() == at::ScalarType::Bool, "Expected mask to be of ScalarType Bool, but got ", mask.scalar_type(), " instead."); TORCH_CHECK(mask.dim() == 2, "Padding mask should be 2D"); @@ -197,7 +135,7 @@ bool NestedTensor_nested_tensor_from_mask_left_aligned(const Tensor& t, const Te return sizes.equal(nums); } -Tensor nested_tensor( +Tensor _nested_tensor_from_tensor_list( TensorList list, c10::optional dtype, c10::optional layout, @@ -229,21 +167,58 @@ Tensor nested_tensor( pin_memory); } -int64_t get_consistent_last_dim_of_nested_tensor(const NestedTensorImpl& nt) { - auto result = maybe_get_consistent_last_dim_of_nested_tensor(nt); +C10_ALWAYS_INLINE std::pair _check_nested_layer_norm_inputs( + const NestedTensorImpl& input, + IntArrayRef normalized_shape, + const Tensor& weight /* optional */, + const Tensor& bias /* optional */) { + + const size_t normalized_ndim = normalized_shape.size(); TORCH_CHECK( - result.has_value(), - "all 
tensors in NestedTensor must have the same trailing dim for Matmul but got ", - nt.get_nested_size_tensor().select(1, -1)); - return *result; -} + normalized_ndim >= 1, + "Expected normalized_shape to be at least 1-dimensional, i.e., ", + "containing at least one element, but got normalized_shape = ", + normalized_shape); + TORCH_CHECK( + !weight.defined() || weight.sizes().equals(normalized_shape), + "Expected weight to be of same shape as normalized_shape, but got ", + "weight of shape ", + weight.sizes(), + " and normalized_shape = ", + normalized_shape); + TORCH_CHECK( + !bias.defined() || bias.sizes().equals(normalized_shape), + "Expected bias to be of same shape as normalized_shape, but got ", + "bias of shape ", + bias.sizes(), + " and normalized_shape = ", + normalized_shape); + + // Check that the normalized_shape has the exact same sizes as the last dimensions from the NestedTensor input + // Also, compute M and N considering the idiosyncracies of NestedTensors + int64_t N = 1; + for (const auto i: c10::irange(normalized_ndim)) { + TORCH_CHECK( + input.opt_size(-normalized_ndim + i) != c10::nullopt, + "normalized_shape extends into irregular dimensions for the nested tensor" + ); + TORCH_CHECK( + normalized_shape[i] == *input.opt_size(-normalized_ndim + i), + "The shape at dimension ", + i, + "of normalized_shape doesn't match the input" + ); + N *= normalized_shape[i]; + } + + const int64_t M = input.numel() / N; -std::vector NestedTensor_get_max_size(const NestedTensorImpl& nt) { - return NestedTensor_get_max_size_from_size_tensor(nt.get_nested_size_tensor()); + return std::make_pair(M, N); } -Tensor NestedTensor_layer_norm( +std::tuple nested_layer_norm( const Tensor& input, + IntArrayRef normalized_shape, const c10::optional& weight_opt, const c10::optional& bias_opt, double eps) { @@ -255,8 +230,9 @@ Tensor NestedTensor_layer_norm( auto* nt_input = get_nested_tensor_impl(input); TORCH_CHECK(nested_tensor_impl_is_contiguous(nt_input)); const auto& input_buffer = nt_input->get_buffer(); - const auto last_dim = get_consistent_last_dim_of_nested_tensor(*nt_input); - const auto valid_word_num = input_buffer.numel() / last_dim; + auto M_N = _check_nested_layer_norm_inputs(*nt_input, normalized_shape, weight, bias); + auto M = M_N.first; + auto N = M_N.second; const auto weight_contig = weight.expect_contiguous(); const auto bias_contig = bias.expect_contiguous(); auto output_buffer = at::native::empty_like( @@ -271,21 +247,24 @@ Tensor NestedTensor_layer_norm( auto acc_type = at::toAccumulateType(input_buffer.scalar_type(), true); options = options.dtype(acc_type); } - Tensor mean = at::empty({valid_word_num}, options); - Tensor rstd = at::empty({valid_word_num}, options); + Tensor mean = at::empty({M}, options); + Tensor rstd = at::empty({M}, options); LayerNormKernel( input_buffer.is_cuda() ? 
kCUDA : kCPU, input_buffer, *weight_contig, *bias_contig, - valid_word_num, - last_dim, + M, + N, eps, &output_buffer, &mean, &rstd); - return at::detail::make_tensor( - std::move(output_buffer), nt_input->get_nested_size_tensor()); + return std::make_tuple( + wrap_buffer(output_buffer, nt_input->get_nested_size_tensor()), + mean, + rstd + ); } Tensor NestedTensor_from_padded_and_nested_example( @@ -441,157 +420,6 @@ Tensor NestedTensor_embedding( result_buffer.reshape({-1}), std::move(new_sizes)); } -std::pair -get_elementwise_nested_tensor_impl( - const Tensor& self, - const Tensor& other, - const std::string& op_name) { - if (self.is_nested() && !(other.is_nested())) { - TORCH_CHECK( - false, - "Expected both self and other to be nested, but got a nested self and non-nested other"); - } else if (!(self.is_nested()) && other.is_nested()) { - TORCH_CHECK( - false, - "Expected both self and other to be nested, but got a non-nested self and nested other"); - } else if (!(self.is_nested()) || !(other.is_nested())) { - TORCH_CHECK( - false, - "Expected both self and other to be nested, but got a non-nested self and non-nested other"); - } - - auto self_ptr = get_nested_tensor_impl(self); - auto other_ptr = get_nested_tensor_impl(other); - - TORCH_CHECK( - self.dim() == other.dim(), - op_name, - " does not support broadcasting when given a NestedTensor"); - TORCH_CHECK( - at::equal( - self_ptr->get_nested_size_tensor(), - other_ptr->get_nested_size_tensor()), - op_name, - " does not support broadcasting when given a NestedTensor"); - TORCH_CHECK( - nested_tensor_impl_is_contiguous(self_ptr) && - nested_tensor_impl_is_contiguous(other_ptr), - op_name, - " does not support non-contiguous NestedTensor inputs"); - return std::make_pair(self_ptr, other_ptr); -} - -template -Tensor NestedTensor_elementwise_Tensor( - const Tensor& self, - const Tensor& other, - const std::string& op_name, - Func f) { - // self is a scalar - if (!self.is_nested() && self.dim() == 0 && self.numel() == 1) { - auto other_impl = get_nested_tensor_impl(other); - return wrap_buffer( - f(self, other_impl->get_buffer()), - other_impl->get_nested_size_tensor().clone() - ); - } - // other is a scalar - if (!other.is_nested() && other.dim() == 0 && other.numel() == 1) { - auto self_impl = get_nested_tensor_impl(self); - return wrap_buffer( - f(self_impl->get_buffer(), other), - self_impl->get_nested_size_tensor().clone() - ); - } - NestedTensorImpl* self_impl = nullptr; - NestedTensorImpl* other_impl = nullptr; - std::tie(self_impl, other_impl) = - get_elementwise_nested_tensor_impl(self, other, op_name); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(self_impl); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(other_impl); - const auto& nt_self = *self_impl; - const auto& nt_other = *other_impl; - const auto& self_sizes = nt_self.get_nested_size_tensor(); - return wrap_buffer( - f(nt_self.get_buffer().reshape({-1}), - nt_other.get_buffer().reshape({-1})), - self_sizes); -} - -Tensor NestedTensor_add_Tensor( - const Tensor& self, - const Tensor& other, - const Scalar& alpha) { - return NestedTensor_elementwise_Tensor( - self, other, "add", [alpha](const Tensor& b1, const Tensor& b2) { - return at::add(b1, b2, alpha); - }); -} - -Tensor NestedTensor_mul_Tensor(const Tensor& self, const Tensor& other) { - return NestedTensor_elementwise_Tensor( - self, other, "mul", [](const Tensor& b1, const Tensor& b2) { - return at::mul(b1, b2); - }); -} - -// Only usable on the C++ side; scalars are converted to tensors coming from Python. 
-Tensor NestedTensor_mul_Scalar(const Tensor& self, const Scalar& other) { - return NestedTensor_mul_Tensor(self, wrapped_scalar_tensor(other)); -} - -template -Tensor& NestedTensor_elementwise__Tensor( - Tensor& self, - const Tensor& other, - const std::string& op_name, - Func f) { - // self is a scalar - if (!self.is_nested() && self.dim() == 0 && self.numel() == 1) { - auto other_impl = get_nested_tensor_impl(other); - f(self, other_impl->get_buffer()); - return self; - } - // other is a scalar - if (!other.is_nested() && other.dim() == 0 && other.numel() == 1) { - auto self_impl = get_nested_tensor_impl(self); - f(self_impl->get_buffer(), other); - return self; - } - NestedTensorImpl* self_impl = nullptr; - NestedTensorImpl* other_impl = nullptr; - std::tie(self_impl, other_impl) = - get_elementwise_nested_tensor_impl(self, other, op_name); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(self_impl); - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(other_impl); - const auto& nt_self = *self_impl; - const auto& nt_other = *other_impl; - f(nt_self.get_buffer().view({-1}), nt_other.get_buffer().view({-1})); - return self; -} - -Tensor& NestedTensor_add__Tensor( - Tensor& self, - const Tensor& other, - const Scalar& alpha) { - return NestedTensor_elementwise__Tensor( - self, other, "add_", [alpha](const Tensor& b1, const Tensor& b2) { - return b1.add_(b2, alpha); - }); -} - -Tensor& NestedTensor_mul__Tensor(Tensor& self, const Tensor& other) { - return NestedTensor_elementwise__Tensor( - self, other, "mul_", [](const Tensor& b1, const Tensor& b2) { - return b1.mul_(b2); - }); -} - -// Only usable on the C++ side; scalars are converted to tensors coming from Python. -Tensor& NestedTensor_mul__Scalar(Tensor& self, const Scalar& other) { - return NestedTensor_mul__Tensor(self, wrapped_scalar_tensor(other)); -} - // Very rudimentary sum_dim for prototyping with torch_scatter.segment_reduce. Tensor NestedTensor_sum_dim_CPU( const Tensor& self, @@ -666,61 +494,117 @@ Tensor NestedTensor_sum_dim_CPU( Tensor select_nested(const Tensor& self, int64_t dim, int64_t index) { auto self_ptr = get_nested_tensor_impl(self); + std::vector sizes = NestedTensor_get_sizes(self_ptr), + strides = NestedTensor_get_strides(self_ptr); + const std::vector& offsets = self_ptr->get_storage_offsets(); + const at::Tensor& buffer = self_ptr->get_unsafe_storage_as_tensor(); int64_t positive_dim = at::maybe_wrap_dim(dim, self_ptr->dim()); - TORCH_CHECK( - positive_dim == 0, - "NestedTensor can only be selected along dimension 0 ", - "got dimension ", dim, " instead." - ); int64_t ntensors = self_ptr->size(0); - TORCH_CHECK_INDEX( - index >= -ntensors && index < ntensors, - "index ", index, - " is out of bounds for dimension 0 with size ", ntensors); - int64_t positive_index = index < 0 ? index + ntensors : index; - const at::Tensor& buffer = self_ptr->get_buffer(); - std::vector sizes = NestedTensor_get_sizes(self_ptr), - strides = NestedTensor_get_strides(self_ptr); - const std::vector& offsets = self_ptr->get_offsets(); - return buffer.as_strided(sizes[positive_index], strides[positive_index], offsets[positive_index]); + TORCH_CHECK_INDEX(ntensors > 0, "You can only select when the NT is not empty."); + int64_t ndims = static_cast(sizes[0].size()); + if (positive_dim == 0) { + TORCH_CHECK_INDEX( + index >= -ntensors && index < ntensors, + "index ", + index, + " is out of bounds for dimension 0 with size ", + ntensors); + int64_t positive_index = index < 0 ? 
index + ntensors : index; + return buffer.as_strided( + sizes[positive_index], + strides[positive_index], + offsets[positive_index]); + } else { + auto new_sizes = at::empty({ntensors, ndims-1}, TensorOptions().dtype(kLong)); + auto new_strides = at::empty({ntensors, ndims-1}, TensorOptions().dtype(kLong)); + auto new_offsets = std::vector(offsets); + std::vector tensor_slices(ntensors); + for (int64_t i : c10::irange(ntensors)) { + int64_t *size_ptr = new_sizes[i].data_ptr(); + int64_t *stride_ptr = new_strides[i].data_ptr(); + + int64_t dim_idx = 0; + for (int64_t j : c10::irange(ndims)) { + if (j != dim - 1) { + size_ptr[dim_idx] = sizes[i][j]; + stride_ptr[dim_idx] = strides[i][j]; + ++dim_idx; + } else { + TORCH_CHECK_INDEX( + index >= 0 && index < sizes[i][j], + "index ", + index, + " is out of bounds for dimension ", + j, + " of the ", + i, + "th constituent tensor with size ", + sizes[i][j]); + new_offsets[i] = offsets[i] + index * strides[i][j]; + } + } + } + return create_nested_view_tensor(self, new_sizes, new_strides, std::move(new_offsets)); + } + } Tensor clone_nested( const Tensor& self, c10::optional optional_memory_format) { - auto memory_format = optional_memory_format.value_or(MemoryFormat::Preserve); - TORCH_CHECK( - memory_format == MemoryFormat::Preserve, - "clone_nested only supports memory format Preserve, but got ", - memory_format, - " instead."); - // TODO: The size doesn't necessarily need to be cloned, but it is more - // conservative. This is something we could revisit once we land a more - // efficient implementation of nested_size_tensor_. - return wrap_buffer( - get_buffer(self).clone(), get_nested_size_tensor(self).clone()); -} - -at::Tensor NestedTensor_get_nested_size_tensor(const at::Tensor& self){ - return get_nested_size_tensor(self); + auto memory_format = optional_memory_format.value_or(c10::MemoryFormat::Preserve); + auto self_ptr = get_nested_tensor_impl(self); + if (memory_format == c10::MemoryFormat::Preserve || + (memory_format == c10::MemoryFormat::Contiguous && self.is_contiguous())) { + const Tensor& buffer = self_ptr->get_unsafe_storage_as_tensor(), + sizemat = self_ptr->get_nested_size_tensor(), + stridemat = self_ptr->get_nested_stride_tensor(); + const std::vector& offsets = self_ptr->get_storage_offsets(); + // TODO: The size and the stride do not necessarily need to be cloned, + // but it is more conservative. + // This is something we could revisit once we land a more + // efficient implementation of nested_size_tensor_ and nested_stride_tensor. 
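The non-batch branch of select_nested above reduces to per-constituent stride arithmetic: drop one size/stride column and advance the storage offset. A rough standalone sketch of that bookkeeping for a single constituent, with hypothetical names, is:

#include <cstdint>
#include <vector>

// One constituent's view metadata into the shared flat buffer.
struct ConstituentView {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;
  int64_t offset;
};

// Select `index` along constituent-local dimension `d` (the nested dim minus 1):
// the selected dimension disappears and the offset moves by index * strides[d],
// mirroring new_offsets[i] = offsets[i] + index * strides[i][j] above.
inline ConstituentView select_constituent(const ConstituentView& in,
                                          int64_t d,
                                          int64_t index) {
  ConstituentView out;
  out.offset = in.offset + index * in.strides[d];
  for (int64_t j = 0; j < static_cast<int64_t>(in.sizes.size()); ++j) {
    if (j != d) {
      out.sizes.push_back(in.sizes[j]);
      out.strides.push_back(in.strides[j]);
    }
  }
  return out;
}

For dim == 0 the patch instead returns buffer.as_strided(sizes[index], strides[index], offsets[index]) directly, since selecting along the batch dimension just picks one constituent.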
+ return wrap_buffer(buffer.clone(), sizemat.clone(), stridemat.clone(), std::vector(offsets)); + } + // actually, memory format is contiguous and self is noncontiguous + else if (memory_format == c10::MemoryFormat::Contiguous) { + const Tensor& self_buffer = self_ptr->get_unsafe_storage_as_tensor(), + sizemat = self_ptr->get_nested_size_tensor(); + Tensor output_buffer = at::empty(self.numel(), self_buffer.options()); + Tensor output = wrap_buffer(output_buffer, sizemat); + std::vector self_unbind = self.unbind(), + output_unbind = output.unbind(); + for (const int64_t i: c10::irange(self_ptr->size(0))) { + output_unbind[i].copy_(self_unbind[i]); + } + return output; + } else { + TORCH_CHECK( + false, + "Nested tensor clone supports Preserve and Contiguous memory formats, called clone with memory format: ", + memory_format); + } } -Tensor dropout_nested(const Tensor& input, double p, bool train) { +std::tuple native_dropout_nested(const Tensor& input, double p, c10::optional train) { auto input_ptr = get_nested_tensor_impl(input); - const Tensor& input_buffer = input_ptr->get_buffer(), + const Tensor& input_buffer = input_ptr-> get_unsafe_storage_as_tensor(), & sizemat = input_ptr->get_nested_size_tensor(), & stridemat = input_ptr->get_nested_stride_tensor(); - const std::vector& offsets = input_ptr->get_offsets(); - Tensor output_buffer = at::dropout(input_buffer, p, train); + const std::vector& offsets = input_ptr->get_storage_offsets(); + Tensor output_buffer, mask_buffer; + if (input_buffer.numel() == 0) { + output_buffer = input_buffer.clone(); + mask_buffer = input_buffer.clone(); + } + else { + std::tie(output_buffer, mask_buffer) = at::native_dropout(input_buffer, p, train); + } // regular tensor dropout reuses input size and stride // i.e. if input is not contiguous, then output is also discontiguous - return wrap_buffer(output_buffer, sizemat.clone(), stridemat.clone(), offsets); -} - -Tensor& dropout_nested_(Tensor& input, double p, bool train) { - Tensor input_buffer = get_buffer(input); - at::dropout_(input_buffer, p, train); - return input; + Tensor output = wrap_buffer(output_buffer, sizemat.clone(), stridemat.clone(), std::vector(offsets)), + mask = wrap_buffer(mask_buffer, sizemat.clone(), stridemat.clone(), std::vector(offsets)); + return std::make_tuple(output, mask); } Tensor softmax_nested( @@ -737,7 +621,10 @@ Tensor softmax_nested( positive_dim >= 1, "Cannot apply softmax across nested dimension 0"); // create a contiguous output - const Tensor& buffer = input_ptr->get_buffer(), + // TODO We would ideally use a empty_like here, but that is not supported + // for nested tensors yet. Since we are only using the buffer for the options + // and size it is okay to use unsafe_storage_as_tensor here. 
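The Contiguous branch of clone_nested above compacts a possibly noncontiguous nested layout into one densely packed buffer by copying constituent by constituent. A minimal standalone sketch of that compaction for 2-D constituents, using illustrative names that are not part of the patch, is:

#include <cstdint>
#include <vector>

// Pack strided 2-D constituents from `src` into a new contiguous buffer.
// sizes[i] = {rows, cols}, strides[i] = {row_stride, col_stride}, and offsets[i]
// is the start of constituent i in `src`, mirroring the unbind-and-copy loop above.
inline std::vector<float> pack_contiguous(
    const std::vector<float>& src,
    const std::vector<std::vector<int64_t>>& sizes,
    const std::vector<std::vector<int64_t>>& strides,
    const std::vector<int64_t>& offsets) {
  std::vector<float> dst;
  for (size_t i = 0; i < sizes.size(); ++i) {
    const int64_t rows = sizes[i][0], cols = sizes[i][1];
    const int64_t rs = strides[i][0], cs = strides[i][1];
    for (int64_t r = 0; r < rows; ++r) {
      for (int64_t c = 0; c < cols; ++c) {
        dst.push_back(src[offsets[i] + r * rs + c * cs]);
      }
    }
  }
  return dst;  // each constituent now has contiguous strides {cols, 1}
}

The patch itself does the same thing at the Tensor level, unbinding both the source and the freshly allocated output and calling copy_ per constituent rather than indexing manually.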
+ const Tensor& buffer = input_ptr->get_unsafe_storage_as_tensor(), & sizemat = input_ptr->get_nested_size_tensor(); Tensor output_buffer = buffer.new_empty(buffer.sizes()); Tensor output = wrap_buffer(output_buffer, sizemat.clone()); @@ -758,224 +645,6 @@ Tensor softmax_nested( return output; } -Tensor bmm_nested(const Tensor& self, const Tensor& mat2) { - if (self.is_nested() && !mat2.is_nested()) { - AT_ERROR("Expected both to be nested, but got a nested self and non-nested other"); - } - else if (!self.is_nested() && mat2.is_nested()) { - AT_ERROR("Expected both to be nested, but got a non-nested self and nested other"); - } - // dispatcher should have guaranteed that at least one is nested - auto self_ptr = get_nested_tensor_impl(self); - auto mat2_ptr = get_nested_tensor_impl(mat2); - TORCH_CHECK(self_ptr->dim() == 3, "batch1 must be a 3D tensor"); - TORCH_CHECK(mat2_ptr->dim() == 3, "batch2 must be a 3D tensor"); - int64_t ntensors = self_ptr->size(0), - ntensors2 = mat2_ptr->size(0); - TORCH_CHECK(ntensors == ntensors2, - "Expected size for the 1st dimension of batch2 tensor to be: ", ntensors, - " but got: ", ntensors2, "."); - const Tensor& self_buffer = self_ptr->get_buffer(), - & mat2_buffer = mat2_ptr->get_buffer(); - std::vector self_sizes = NestedTensor_get_sizes(self_ptr), - mat2_sizes = NestedTensor_get_sizes(mat2_ptr), - self_strides = NestedTensor_get_strides(self_ptr), - mat2_strides = NestedTensor_get_strides(mat2_ptr); - const std::vector& self_offsets = self_ptr->get_offsets(), - & mat2_offsets = mat2_ptr->get_offsets(); - // create a contiguous output - int64_t out_numel = 0; - const Tensor& self_sizemat = self_ptr->get_nested_size_tensor(); - Tensor out_sizemat = self_sizemat.new_empty(self_sizemat.sizes()); - int64_t* out_sizemat_ptr = out_sizemat.data_ptr(); - for (int64_t i = 0; i < ntensors; i++) { - const IntArrayRef& self_shape = self_sizes[i], - & mat2_shape = mat2_sizes[i]; - const int64_t& self_size0 = self_shape[0], & self_size1 = self_shape[1], - & mat2_size0 = mat2_shape[0], & mat2_size1 = mat2_shape[1]; - TORCH_CHECK(self_size1 == mat2_size0, - i, "-th nested matrices in batch cannot be multiplied (", - self_size0, "x", self_size1, " and ", - mat2_size0, "x", mat2_size1, ")"); - out_sizemat_ptr[0] = self_size0; - out_sizemat_ptr[1] = mat2_size1; - out_sizemat_ptr += 2; - out_numel += self_size0 * mat2_size1; - } - Tensor out_buffer = self_buffer.new_empty(out_numel); - Tensor output = wrap_buffer(out_buffer, out_sizemat); - // call tensor mm - // TODO: `padding nested tensor -> bmm -> remove padding` may be more efficient - // until we have specialized nested tensor bmm kernel - // useful resource: `aten/src/ATen/native/cpu/LinearAlgebra.cpp/bmm_out_or_baddbmm_` - // `aten/src/ATen/native/cuda/Blas.cpp/baddbmm_out_cuda_impl` - std::vector output_unbind = output.unbind(); - for (int64_t i = 0; i < ntensors; i++) { - at::mm_out(output_unbind[i], - self_buffer.as_strided(self_sizes[i], self_strides[i], self_offsets[i]), - mat2_buffer.as_strided(mat2_sizes[i], mat2_strides[i], mat2_offsets[i])); - } - return output; -} - -// utilities support `matmul_nested` -namespace { -// Args: -// self_sizes: the sizes of `self` in `matmul_nested` -// mat2_sizes: the sizes of `mat2` in `matmul_nested` -// buffer_op: the options for new buffer -// sizemat_op: the options for new size matrix -// Returns: -// the batch size of each input underlying tensor, i.e. 
the product of batch-dimension sizes -// the empty output nested tensor -inline std::tuple, Tensor> -matmul_nested_helper( - const std::vector& self_sizes, - const std::vector& mat2_sizes, - const c10::TensorOptions& buffer_op, - const c10::TensorOptions& sizemat_op) { - int64_t ntensors = self_sizes.size(), - ndims = self_sizes[0].size(); - std::vector batch_sizes(ntensors, 1); - Tensor sizemat = at::empty({ntensors, ndims}, sizemat_op); - int64_t* sizemat_ptr = sizemat.data_ptr(); - int64_t numel = 0; - for (int64_t i = 0; i < ntensors; i++) { - const IntArrayRef& self_size = self_sizes[i], - & mat2_size = mat2_sizes[i]; - int64_t& batch_size = batch_sizes[i]; - // batch dimensions - for (int64_t j = 0; j < ndims - 2; j++) { - const int64_t& self_sizej = self_size[j], - & mat2_sizej = mat2_size[j]; - TORCH_CHECK( - self_sizej == mat2_sizej, - "matmul: For nested tensors, no broadcasting is currently performed: ", - i, "-th nested matrices in batch at dimension ", j + 1, - " have mismatching sizes ", self_sizej, " and ", mat2_sizej); - sizemat_ptr[j] = self_sizej; - batch_size *= sizemat_ptr[j]; - } - // matrix multiplication dimensions - const int64_t& self_size0 = self_size[ndims - 2], & self_size1 = self_size[ndims - 1], - & mat2_size0 = mat2_size[ndims - 2], & mat2_size1 = mat2_size[ndims - 1]; - TORCH_CHECK( - self_size1 == mat2_size0, - "matmul: ", - i, "-th nested matrices in batch cannot be multiplied (", - self_size0, "x", self_size1, " and ", - mat2_size0, "x", mat2_size1, ")"); - sizemat_ptr[ndims - 2] = self_size0; - sizemat_ptr[ndims - 1] = mat2_size1; - sizemat_ptr += ndims; - numel += batch_size * self_size0 * mat2_size1; - } - Tensor buffer = at::empty(numel, buffer_op); - Tensor output = wrap_buffer(buffer, sizemat); - return std::make_tuple(batch_sizes, output); -} -} - -// Note [nested tensor matmul] -// This is really a generalized batched matmul dedicated to nested tensors, -// where `self` and `mat2` have same number (>= 3) of dimensions. -// The last 2 dimensions will be considered as matrix dimensions, -// so they should be matrix-multiplicable. -// The leading dimensions are considered as batch dimensions, -// and since nested tensor does not support broadcasting for now, -// for each batch dimension `self` and `mat2` must have same size. -// TODO: Should make full matmul semantics support some day -Tensor matmul_nested(const Tensor& self, const Tensor& mat2) { - if (self.is_nested() && !mat2.is_nested()) { - AT_ERROR("Expected both to be nested, but got a nested self and non-nested other"); - } - else if (!self.is_nested() && mat2.is_nested()) { - AT_ERROR("Expected both to be nested, but got a non-nested self and nested other"); - } - // dispatcher should have guaranteed that at least one is nested - auto self_ptr = get_nested_tensor_impl(self), - mat2_ptr = get_nested_tensor_impl(mat2); - int64_t self_dim = self_ptr->dim(), - mat2_dim = mat2_ptr->dim(); - TORCH_CHECK( - self_dim >= 3, - "matmul: For nested tensors, only inputs with >= 3 dims are currently supported. 1st input has rank: ", - self_dim); - TORCH_CHECK( - mat2_dim >= 3, - "matmul: For nested tensors, only inputs with >= 3 dims are currently supported. 
2nd input has rank: ", - mat2_dim); - TORCH_CHECK(self_dim == mat2_dim, "matmul: both inputs must have same rank"); - int64_t ntensors = self_ptr->size(0), - ntensors2 = mat2_ptr->size(0); - TORCH_CHECK(ntensors == ntensors2, - "matmul: Expected size for the 1st dimension of 2nd input tensor to be: ", ntensors, - " but got: ", ntensors2, "."); - const Tensor& self_buffer = self_ptr->get_buffer(), - & mat2_buffer = mat2_ptr->get_buffer(); - std::vector self_sizes = NestedTensor_get_sizes(self_ptr), - mat2_sizes = NestedTensor_get_sizes(mat2_ptr), - self_strides = NestedTensor_get_strides(self_ptr), - mat2_strides = NestedTensor_get_strides(mat2_ptr); - const std::vector& self_offsets = self_ptr->get_offsets(), - & mat2_offsets = mat2_ptr->get_offsets(); - // create a contiguous output - std::vector batch_sizes; - Tensor output; - std::tie(batch_sizes, output) = matmul_nested_helper( - self_sizes, mat2_sizes, self_buffer.options(), self_ptr->get_nested_size_tensor().options()); - // call tensor matmul - // TODO: `padding nested tensor -> bmm -> remove padding` may be more efficient - // until we have specialized nested tensor bmm kernel - // useful resource: `aten/src/ATen/native/cpu/LinearAlgebra.cpp/bmm_out_or_baddbmm_` - // `aten/src/ATen/native/cuda/Blas.cpp/baddbmm_out_cuda_impl` - std::vector output_unbind = output.unbind(); - for (int64_t i = 0; i < ntensors; i++) { - const IntArrayRef& self_size = self_sizes[i], - & mat2_size = mat2_sizes[i]; - const int64_t& batch_size = batch_sizes[i]; - if (batch_size == 1) { - at::mm_out( - output_unbind[i], - self_buffer.as_strided(self_size, self_strides[i], self_offsets[i]), - mat2_buffer.as_strided(mat2_size, mat2_strides[i], mat2_offsets[i]) - ); - } - else { - at::bmm_out( - output_unbind[i], - self_buffer.as_strided(self_size, self_strides[i], self_offsets[i]) - .reshape({batch_size, self_size[self_dim - 1 - 2], self_size[self_dim - 1 - 1]}), - mat2_buffer.as_strided(mat2_size, mat2_strides[i], mat2_offsets[i]) - .reshape({batch_size, mat2_size[self_dim - 1 - 2], mat2_size[self_dim - 1 - 1]}) - ); - } - } - return output; -} - -Tensor& matmul_out_nested(const Tensor& tensor1, const Tensor& tensor2, Tensor& result) { - // TODO: this is a very quick and dirty implementation - // should improve it to avoid the intermediate memory usage - Tensor function_result = at::matmul(tensor1, tensor2); - auto function_result_ptr = get_nested_tensor_impl(function_result); - // TODO: this is to reproduce function_result_ptr->opt_sizes_ - // if an accessor is provided in the future, can replace this - std::vector sizes; - for (int64_t i = 0; i < function_result_ptr->dim(); i++) { - c10::optional opt_size = function_result_ptr->opt_size(i); - if (opt_size.has_value()) { - sizes.push_back(*opt_size); - } - else { - sizes.push_back(-1); - } - } - result.reshape(sizes); - result.copy_(function_result); - return result; -} - Tensor transpose_nested(const Tensor& self, int64_t dim0, int64_t dim1) { auto self_ptr = get_nested_tensor_impl(self); // check input dimensions @@ -1001,10 +670,77 @@ Tensor transpose_nested(const Tensor& self, int64_t dim0, int64_t dim1) { // create transposed `sizemat` and `stridemat` Tensor sizemat_transposed = at::index_select(sizemat, 1, column_indices), stridemat_transposed = at::index_select(stridemat, 1, column_indices); - return wrap_buffer(self_ptr->get_buffer(), sizemat_transposed, stridemat_transposed, self_ptr->get_offsets()); + return create_nested_view_tensor( + self, sizemat_transposed, stridemat_transposed, 
std::vector(self_ptr->get_storage_offsets())); } -// utilities supporting `_reshape_nested` +Tensor squeeze_nested(const Tensor& self) { + TORCH_CHECK(false, + "squeeze(): For nested tensors, squeeze without the dim argument is not supported ", + "at the moment, however you can use squeeze(Tensor self, int dim) instead ", + "if you need this feature, please open an issue on github describing your use case."); + return self; +} + +Tensor squeeze_dim_nested(const Tensor& self, int64_t dim) { + auto self_ptr = get_nested_tensor_impl(self); + int64_t ndim = self_ptr->dim(); + int64_t wrapped_dim = at::maybe_wrap_dim(dim, ndim); + TORCH_CHECK(wrapped_dim > 0, + "squeeze(): For nested tensors, squeezing dimension 0 is not supported at the moment ", + "if you need this feature, please open an issue on github describing your use case."); + const Tensor& sizemat = self_ptr->get_nested_size_tensor(); + const Tensor& stridemat = self_ptr->get_nested_stride_tensor(); + // if tensor.size(dim) != 1 torch.squeeze will return the result, we do the same here + c10::optional size_dim = self_ptr->opt_size(dim); + if (!(size_dim.has_value() && size_dim.value() == 1)) { + // detach to avoid triggering throw_error_if_base_and_tensor_are_same + return self.detach(); + } + // if ndim == 2 and we pass the above if statement we should have a + // nested tensor of singleton tensors + TORCH_CHECK(ndim != 2, + "squeeze(): For nested tensors, squeezing a nested tensor of singleton tensors is not ", + "supported at the moment, if you need this feature, please open an issue on github", + "describing your use case."); + auto column_indices = sizemat.new_empty(ndim - 2); + int64_t* column_indices_ptr = column_indices.data_ptr(); + std::iota(column_indices_ptr, column_indices_ptr + wrapped_dim - 1, 0); + std::iota(column_indices_ptr + wrapped_dim - 1, column_indices_ptr + ndim - 2, wrapped_dim); + auto sizemat_squeezed = at::index_select(sizemat, 1, column_indices); + auto stridemat_squeezed = at::index_select(stridemat, 1, column_indices); + return create_nested_view_tensor( + self, sizemat_squeezed, stridemat_squeezed, std::vector(self_ptr->get_storage_offsets())); +} + +Tensor unsqueeze_nested(const Tensor& self, int64_t dim) { + auto self_ptr = get_nested_tensor_impl(self); + int64_t ndim = self_ptr->dim(); + int64_t wrapped_dim = at::maybe_wrap_dim(dim, ndim + 1); + TORCH_CHECK(wrapped_dim > 0, + "unsqueeze(): For nested tensors, unsqueezing dimension 0 is not supported at the moment ", + "if you need this feature, please open an issue on github describing your use case."); + const Tensor& sizemat = self_ptr->get_nested_size_tensor(); + const Tensor& stridemat = self_ptr->get_nested_stride_tensor(); + auto mat_dim = wrapped_dim - 1; + Tensor new_size = sizemat.new_ones({sizemat.size(0), 1}); + Tensor sizemat_unsqueezed = at::cat({sizemat.slice(1, 0, mat_dim), + new_size, + sizemat.slice(1, mat_dim, ndim)}, 1); + Tensor new_stride; + if (wrapped_dim == ndim) { + new_stride = stridemat.new_ones({stridemat.size(0), 1}); + } else { + new_stride = (stridemat.select(1, mat_dim - 1) * sizemat.select(1, mat_dim - 1)).unsqueeze(-1); + } + Tensor stridemat_unsqueezed = at::cat({stridemat.slice(1, 0, mat_dim), + new_stride, + stridemat.slice(1, mat_dim, ndim)}, 1); + return create_nested_view_tensor( + self, sizemat_unsqueezed, stridemat_unsqueezed, std::vector(self_ptr->get_storage_offsets())); +} + +// utilities supporting `view_nested` and `reshape_nested` namespace { // Args: // sizes: the sizes of original nested tensor @@ 
-1012,10 +748,10 @@ namespace { // proposed_shape: user proposed new shape // op: the options for new size and stride matrices // Returns: -// whether reshape as view is possible (i.e. old buffer can be reused) +// whether viewable // size matrix after reshape -// stride matrix after reshape (not fully populated if reshape as view is impossible) -inline std::tuple NestedTensor_reshape_size_stride( +// stride matrix after reshape (not fully populated if not viewable) +inline std::tuple NestedTensor_compute_size_stride( const std::vector& sizes, const std::vector& strides, const IntArrayRef& proposed_shape, @@ -1023,7 +759,7 @@ inline std::tuple NestedTensor_reshape_size_stride( int64_t ntensors = sizes.size(), ndims_underlying = sizes[0].size(), ndims_underlying_reshaped = proposed_shape.size() - 1; - bool reshape_as_view = true; + bool viewable = true; Tensor sizemat_reshaped = at::empty({ntensors, ndims_underlying_reshaped}, op), stridemat_reshaped = at::empty({ntensors, ndims_underlying_reshaped}, op); int64_t* sizemat_reshaped_ptr = sizemat_reshaped.data_ptr(), @@ -1033,21 +769,31 @@ inline std::tuple NestedTensor_reshape_size_stride( & stride = strides[itensor]; // compute reshaped size std::vector size_reshaped_vector(proposed_shape.begin() + 1, proposed_shape.end()); + // only allow one pre-existing dimension to have proposed shape == -1 + int64_t infer_index_old = -1; // some negative sizes remain to be infered if (ndims_underlying < ndims_underlying_reshaped) { + int64_t numel = 1, numel_reshaped = 1; // replace negative sizes for old dimensions with old sizes for (int64_t idim = 0; idim < ndims_underlying; idim++) { int64_t& size_reshaped = size_reshaped_vector[idim]; TORCH_CHECK(size_reshaped >= -1, "invalid shape dimension ", size_reshaped); if (size_reshaped == -1) { + TORCH_CHECK(infer_index_old == -1, "only one dimension can be inferred"); size_reshaped = size[idim]; + infer_index_old = idim; } + numel *= size[idim]; + numel_reshaped *= size_reshaped; } // infer negative size for new dimension int64_t infer_index = -1; for (int64_t idim = ndims_underlying; idim < ndims_underlying_reshaped; idim++) { const int64_t& size_reshaped = size_reshaped_vector[idim]; - if (size_reshaped == -1) { + if (size_reshaped >= 0) { + numel_reshaped *= size_reshaped; + } + else if (size_reshaped == -1) { if (infer_index > -1) { throw std::runtime_error("only one dimension can be inferred"); } @@ -1055,22 +801,36 @@ inline std::tuple NestedTensor_reshape_size_stride( infer_index = idim; } } - else if (size_reshaped < 0) { + else { AT_ERROR("invalid shape dimension ", size_reshaped); } } - // See Note [inference and inheritance semantics] + // See Note [Special size rule for nested tensor] TORCH_CHECK(infer_index == -1, "nested tensor does not infer shape"); + TORCH_CHECK( + numel == numel_reshaped, + "shape '", proposed_shape, "' ", + "is invalid for input of size ", numel); } // all negative sizes can be replaced else { + int64_t numel = 1, numel_reshaped = 1; for (int64_t idim = 0; idim < ndims_underlying_reshaped; idim++) { int64_t& size_reshaped = size_reshaped_vector[idim]; TORCH_CHECK(size_reshaped >= -1, "invalid shape dimension ", size_reshaped); if (size_reshaped == -1) { size_reshaped = size[idim]; } + numel *= size[idim]; + numel_reshaped *= size_reshaped; + } + for (int64_t idim = ndims_underlying_reshaped; idim < ndims_underlying; idim++) { + numel *= size[idim]; } + TORCH_CHECK( + numel == numel_reshaped, + "shape '", proposed_shape, "' ", + "is invalid for input of size ", numel); 
} IntArrayRef size_reshaped(size_reshaped_vector); // compute reshaped stride @@ -1088,7 +848,7 @@ inline std::tuple NestedTensor_reshape_size_stride( } // reshape as view is impossible else { - reshape_as_view = false; + viewable = false; // fill reshaped size into sizemat for (int64_t idim = 0; idim < ndims_underlying_reshaped; idim++) { sizemat_reshaped_ptr[idim] = size_reshaped[idim]; @@ -1096,42 +856,104 @@ inline std::tuple NestedTensor_reshape_size_stride( sizemat_reshaped_ptr += ndims_underlying_reshaped; } } - return std::make_tuple(reshape_as_view, sizemat_reshaped, stridemat_reshaped); + return std::make_tuple(viewable, sizemat_reshaped, stridemat_reshaped); } +} // namespace -// Args: -// nt_reshaped: the reshaped nested tensor to receive copies -// buffer: the original nested tensor buffer -// sizes: the original nested tensor sizes (may have gone through collapsing or splitting) -// strides: the original nested tensor strides (may have gone through collapsing or splitting) -// offsets: the original nested tensor offsets (may have gone through collapsing or splitting) -inline void NestedTensor_reshape_copy( - Tensor& nt_reshaped, +// Note [Special size rule for nested tensor] +// Instead of infering size, -1 means "inherit the old size", so: +// * negative size is legal for a ragged dimension +// * however, we only allow one -1 +// In principle we could still infer a dimension, +// we are designing a better semantics to include both inheritance and inference +Tensor view_nested(const Tensor& self, IntArrayRef proposed_shape) { + TORCH_CHECK( + proposed_shape.size() > 0, + "shape '[]' is invalid for a nested tensor"); + auto self_ptr = get_nested_tensor_impl(self); + // basic information before reshaping + int64_t ntensors = self_ptr->size(0); + TORCH_CHECK( + ntensors > 0, + "empty nested tensor cannot be reshaped"); + // basic information after reshaping + int64_t ntensors_reshaped = proposed_shape[0]; + TORCH_CHECK( + ntensors == ntensors_reshaped, + "view: For now nested view cannot change or infer the implicit batch dimension"); + std::vector sizes = NestedTensor_get_sizes(self_ptr), + strides = NestedTensor_get_strides(self_ptr); + // reshaping underlying tensor dimensions does not change offset + // determine reshaped size and stride + const Tensor& sizemat = self_ptr->get_nested_size_tensor(); + bool viewable; + Tensor sizemat_reshaped, stridemat_reshaped; + std::tie(viewable, sizemat_reshaped, stridemat_reshaped) = NestedTensor_compute_size_stride( + sizes, strides, proposed_shape, sizemat.options()); + TORCH_CHECK( + viewable, + "view size is not compatible with input tensor's size and stride " + "(at least one dimension spans across two contiguous subspaces). " + "Use .reshape(...) 
instead."); + return create_nested_view_tensor(self, sizemat_reshaped, stridemat_reshaped, std::vector(self_ptr->get_storage_offsets())); +} + /** + * Create a buffer tensor that is a view of self + * + * This serves as the boundary between nested and non nested tensor + * view conversions + * + * @return Returns a new non nested tensor that + * aliases the same storage as self + */ +Tensor values_nested(const Tensor& self) { + TORCH_INTERNAL_ASSERT(self.is_nested(), "Can only create a buffer from Nested Tensor"); + auto* nt_self = get_nested_tensor_impl(self); + return nt_self->get_unsafe_storage_as_tensor(); +} + +/** + * Create a nested tensor that is a view of a buffer + * + * This serves as the boundary between non nested tensor and nested + * view conversions + * + * @return Returns a nested tensor that + * aliases the same storage as buffer + */ +Tensor _nested_view_from_buffer( const Tensor& buffer, - const std::vector& sizes, - const std::vector& strides, - const std::vector& offsets) { - auto nt_reshaped_ptr = get_nested_tensor_impl(nt_reshaped); - const Tensor& buffer_reshaped = nt_reshaped_ptr->get_buffer(); - std::vector sizes_reshaped = NestedTensor_get_sizes(nt_reshaped_ptr), - strides_reshaped = NestedTensor_get_strides(nt_reshaped_ptr); - const std::vector& offsets_reshaped = nt_reshaped_ptr->get_offsets(); - for (int64_t i = 0; i < nt_reshaped_ptr->size(0); i++) { - buffer_reshaped.as_strided(sizes_reshaped[i], strides_reshaped[i], offsets_reshaped[i]).copy_( - // TODO: can we avoid allocating new memory for `buffer...reshape` - // I did not find anything like reshape_out - buffer.as_strided(sizes[i], strides[i], offsets[i]).reshape(sizes_reshaped[i])); - } + const Tensor& nested_size_tensor, + const Tensor& nested_stride_tensor, + IntArrayRef offsets) { + TORCH_INTERNAL_ASSERT( + !buffer.is_nested(), + "Can only a create Nested Tensor from a normal tensor buffer"); + TORCH_INTERNAL_ASSERT(buffer.dim() == 1, "The input buffer must be flat"); + TORCH_INTERNAL_ASSERT(nested_size_tensor.dim() == 2, "Expected the nested size tensor to be two dimensional."); + uint64_t num_elements_nested_size = at::prod(nested_size_tensor, 1).sum().item(); + uint64_t buffer_storage_size = buffer.storage().nbytes()/buffer.dtype().itemsize(); + TORCH_INTERNAL_ASSERT( + buffer_storage_size == num_elements_nested_size, + "The number of elements in the buffer must equal the nested tensor size but buffer size: ", + buffer_storage_size, + " and nested tensor size: ", + num_elements_nested_size, + "."); + + TORCH_INTERNAL_ASSERT(nested_stride_tensor.dim() == 2, "Expected the nested stride tensor to be two dimensional."); + TORCH_INTERNAL_ASSERT(nested_size_tensor.size(0) == nested_stride_tensor.size(0), "Expected the first dimension of nested size and nested stride tensor to be equal."); + TORCH_INTERNAL_ASSERT(nested_stride_tensor.size(0) == (int64_t)offsets.size(), "Expected the first dimension of nested stride tensor to equal the length of offsets."); + return at::detail::make_tensor( + c10::TensorImpl::VIEW, + buffer, + nested_size_tensor, + nested_stride_tensor, + std::vector(offsets.begin(), offsets.end())); } -} // namespace -// Special rules for reshape(nested tensor): -// 1. Only 1 regular dimension can be collapsed with -// or splitted from the implicit batch dimension -// 2. 
Instead of infering size, -1 means "inherit the old size", so: -// * negative size is legal for a ragged dimension -// * multiple sizes can be -1 -Tensor _reshape_nested(const Tensor& self, IntArrayRef proposed_shape) { +// See Note [Special size rule for nested tensor] +Tensor reshape_nested(const Tensor& self, IntArrayRef proposed_shape) { TORCH_CHECK( proposed_shape.size() > 0, "shape '[]' is invalid for a nested tensor"); @@ -1142,38 +964,43 @@ Tensor _reshape_nested(const Tensor& self, IntArrayRef proposed_shape) { ntensors > 0, "empty nested tensor cannot be reshaped"); // basic information after reshaping - int64_t ntensors_reshaped; - if (proposed_shape[0] >= 0) { - ntensors_reshaped = proposed_shape[0]; - } - else if (proposed_shape[0] == -1) { - ntensors_reshaped = ntensors; - } - else { - AT_ERROR("invalid shape dimension ", proposed_shape[0]); - } + int64_t ntensors_reshaped = proposed_shape[0]; TORCH_CHECK( ntensors == ntensors_reshaped, - "for now reshape cannot change the implicit batch dimension"); + "reshape: For now nested reshape cannot change or infer the implicit batch dimension"); std::vector sizes = NestedTensor_get_sizes(self_ptr), strides = NestedTensor_get_strides(self_ptr); - const std::vector& offsets = self_ptr->get_offsets(); // reshaping underlying tensor dimensions does not change offset // determine reshaped size and stride - const Tensor& buffer = self_ptr->get_buffer(), - & sizemat = self_ptr->get_nested_size_tensor(); - bool reshape_as_view; + const Tensor& sizemat = self_ptr->get_nested_size_tensor(); + bool viewable{false}; Tensor sizemat_reshaped, stridemat_reshaped; - std::tie(reshape_as_view, sizemat_reshaped, stridemat_reshaped) = NestedTensor_reshape_size_stride( + std::tie(viewable, sizemat_reshaped, stridemat_reshaped) = NestedTensor_compute_size_stride( sizes, strides, proposed_shape, sizemat.options()); - if (reshape_as_view) { - return wrap_buffer(buffer, sizemat_reshaped, stridemat_reshaped, offsets); + if (viewable) { + return self.view(proposed_shape); } - Tensor buffer_reshaped = buffer.new_empty(buffer.sizes()); - Tensor output = wrap_buffer(buffer_reshaped, sizemat_reshaped); - NestedTensor_reshape_copy(output, - buffer, sizes, strides, offsets); - return output; + else { + return self.clone(at::MemoryFormat::Contiguous).view(proposed_shape); + } +} + +Tensor reshape_as_nested(const Tensor& self, const Tensor& other) { + auto other_ptr = get_nested_tensor_impl(other); + // TODO: this is to reproduce other_ptr->opt_sizes_ + // if an accessor is provided in the future, can replace this + std::vector sizes; + for (int64_t i = 0; i < other_ptr->dim(); i++) { + c10::optional opt_size = other_ptr->opt_size(i); + if (opt_size.has_value()) { + sizes.push_back(*opt_size); + } + else { + sizes.push_back(-1); + } + } + // reshape with other.opt_sizes_ + return self.reshape(sizes); } } // namespace native diff --git a/aten/src/ATen/native/nested/NestedTensorMath.h b/aten/src/ATen/native/nested/NestedTensorMath.h index 11b94b65a4e5..954fa807f183 100644 --- a/aten/src/ATen/native/nested/NestedTensorMath.h +++ b/aten/src/ATen/native/nested/NestedTensorMath.h @@ -1,261 +1,22 @@ #pragma once -#include -#include +#include #include - -#include +#include namespace at { namespace native { -struct NestedTensorImpl; - -// TODO: cache this and only do it once per NestedTensor -int64_t get_consistent_last_dim_of_nested_tensor(const NestedTensorImpl& nt); - -inline at::Tensor wrap_buffer(at::Tensor buffer, at::Tensor nested_size_tensor) { - 
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(buffer.is_contiguous(), "Given buffer must be contiguous."); - return at::detail::make_tensor( - std::move(buffer), std::move(nested_size_tensor)); -} - -inline at::Tensor wrap_buffer( - at::Tensor buffer, at::Tensor nested_size_tensor, - at::Tensor nested_stride_tensor, const std::vector& offsets) { - TORCH_INTERNAL_ASSERT_DEBUG_ONLY(buffer.is_contiguous(), "Given buffer must be contiguous."); - return at::detail::make_tensor( - std::move(buffer), std::move(nested_size_tensor), - std::move(nested_stride_tensor), offsets); -} - -inline at::Tensor get_buffer(const at::Tensor& tensor) { - return get_nested_tensor_impl(tensor)->get_buffer(); -} - -// The sizes of the underlying tensors -inline std::vector NestedTensor_get_sizes(const NestedTensorImpl* self_ptr) { - int64_t ntensors = self_ptr->size(0); - std::vector sizes(ntensors); - if (ntensors == 0) { - return sizes; - } - const Tensor& sizemat = self_ptr->get_nested_size_tensor(); - int64_t orig_dim = sizemat.size(1); - // nesting scalars has empty sizes - if (orig_dim == 0) { - return sizes; - } - const int64_t* sizemat_ptr = sizemat.data_ptr(); - - for(const auto i: c10::irange(ntensors)){ - sizes[i] = IntArrayRef(sizemat_ptr, sizemat_ptr + orig_dim); - sizemat_ptr += orig_dim; - } - return sizes; -} - -inline std::vector NestedTensor_get_sizes(const at::Tensor& self) { - const NestedTensorImpl* self_ptr = get_nested_tensor_impl(self); - return NestedTensor_get_sizes(self_ptr); -} - -// The strides of the underlying tensors -inline std::vector NestedTensor_get_strides(const NestedTensorImpl* self_ptr) { - int64_t ntensors = self_ptr->size(0); - std::vector strides(ntensors); - if (ntensors == 0) { - return strides; - } - const Tensor& stridemat = self_ptr->get_nested_stride_tensor(); - int64_t orig_dim = stridemat.size(1); - // nesting scalars has empty strides - if (orig_dim == 0) { - return strides; - } - const int64_t* stridemat_ptr = stridemat.data_ptr(); - for(const auto i: c10::irange(ntensors)) { - strides[i] = IntArrayRef(stridemat_ptr, stridemat_ptr + orig_dim); - stridemat_ptr += orig_dim; - } - return strides; -} - -inline std::vector NestedTensor_get_strides(const at::Tensor& self) { - const NestedTensorImpl* self_ptr = get_nested_tensor_impl(self); - return NestedTensor_get_strides(self_ptr); -} - -TORCH_API std::vector NestedTensor_get_max_size( - const NestedTensorImpl& nt); TORCH_API Tensor NestedTensor_to_padded_tensor_generic( const Tensor& t, double padding, OptionalIntArrayRef output_size); -namespace impl { - -template -struct NestedNode { - NestedNode() = delete; - explicit NestedNode(std::vector&& children) - : _is_leaf(false), _children(children) {} - explicit NestedNode(TensorList children) - : _is_leaf(false), _children(children.vec()) {} - // NestedNode(NestedNode&) = delete; - // NestedNode(const NestedNode&) = delete; - // NestedNode& operator=(NestedNode) = delete; - explicit NestedNode(T payload) : _is_leaf(true), _payload(payload) {} - inline bool is_leaf() const { - return _is_leaf; - } - inline size_t degree() const { - return _children.size(); - } - inline const std::vector unbind() const { - return _children; - } - inline T children(size_t i) const { - return _children[i]; - } - inline const T& payload() const { - return _payload; - } - inline T& payload() { - return _payload; - } - - private: - bool _is_leaf; - std::vector _children; - T _payload; -}; - -using TensorNode = NestedNode; - -template -class _map; - -template -class _map> { - public: - static A 
function_one( - F&& fn, - const Args&... nested_node) { - return std::forward(fn)(nested_node...); - } - // NOTE: We must move F to avoid copying objects if it is a lambda with - // captures. - static NestedNode function( - F&& fn, - const NestedNode&... nested_node) { - size_t degree = 0; - bool all_leaf = true; - c10::guts::tuple_map( - std::forward_as_tuple(nested_node...), [&all_leaf, °ree](auto n) { - all_leaf = all_leaf && (n.is_leaf()); - if (degree > 1 && n.degree() > 1) { - TORCH_CHECK(degree == n.degree(), "NestedNodes must match in degree."); - } - if (n.degree() > degree) { - degree = n.degree(); - } - return nullptr; - }); - // All NestedNodes just wrap regular objects. - if (all_leaf) { - return NestedNode(std::forward(fn)(nested_node.payload()...)); - } - // Some NestedNodes wrap regular Tensors, some NestedTensors and some other types. - std::vector result; - for (size_t i = 0; i < degree; i++) { - std::tuple children = c10::guts::tuple_map( - std::forward_as_tuple(nested_node...), [&i](auto a) { - static_assert( - c10::guts::is_instantiation_of::value, - "Internal error."); - // Broadcast regular arguments across NestedTensor constituents. - // This could be a Tensor, integer or anything else really. - if (a.is_leaf()) { - return a.payload(); - } - // Broadcast NestedTensors with one constituent. - if (a.degree() == 1 && !a.is_leaf()) { - return a.children(0); - } - TORCH_CHECK(a.degree() > 0, "Internal assert."); - return a.children(i); - }); - c10::guts::apply( - [&result, &fn](Args... filtered) { - result.emplace_back(function_one(std::forward(fn), filtered...)); - }, - std::move(children)); - } - return NestedNode(std::move(result)); - } -}; - -// TODO: Add static assert to verify lambda arguments match nested_node types -template -static inline NestedNode< - typename c10::guts::infer_function_traits::type::return_type> -map(F&& fn, const NestedNode&... nested_node) { - return _map< - F, - typename c10::guts::infer_function_traits::type::return_type, - typename c10::guts::infer_function_traits::type::parameter_types>:: - function(std::forward(fn), nested_node...); -} - -inline TensorNode get_nested_tensor_structure(at::Tensor tensor) { - if (get_nested_tensor_impl_or_null(tensor) == nullptr) { - return TensorNode(std::move(tensor)); - } - return TensorNode(tensor.unbind()); -} - -inline Tensor wrap_tensor_node( - TensorNode tensor_node, - c10::optional dtype, - c10::optional layout, - c10::optional device, - c10::optional pin_memory) { - TORCH_CHECK( - !tensor_node.is_leaf(), "Expected TensorNode to wrap a list of Tensors."); - TensorOptions options_ = - TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory( - pin_memory); - if (tensor_node.degree() == 0) { - return wrap_buffer(ones({0}, dtype, layout, device), ones({})); - } - std::vector sizes; - std::vector flat_tensors; - for (const auto i : c10::irange(tensor_node.degree())) { - flat_tensors.push_back( - tensor_node.children(i).reshape(-1).contiguous()); - sizes.push_back( - tensor(c10::IntArrayRef(tensor_node.children(i).sizes()))); - } - - TensorOptions options = flat_tensors[0].options().merge_in(options_); - - return wrap_buffer( - at::cat(flat_tensors).to(options), at::native::stack(sizes)); -} - -} // namespace impl - -// This function is meant to ease rapid operator coverage for -// NestedTensor kernels. It is not meant to be efficient. Use it judiciously. -template -inline at::Tensor map_nested_tensor(F&& fn, A... 
a) { - return wrap_tensor_node( - impl::map(std::forward(fn), impl::get_nested_tensor_structure(a)...), - c10::nullopt, - c10::nullopt, - c10::nullopt, - c10::nullopt); +template +Tensor map_nt(const Tensor& nt, Func f) { + auto* nt_impl = get_nested_tensor_impl(nt); + const auto& sizes = nt_impl->get_nested_size_tensor(); + return at::detail::make_tensor(f(nt_impl->get_buffer()), sizes); } } // namespace native diff --git a/aten/src/ATen/native/nested/NestedTensorMatmul.cpp b/aten/src/ATen/native/nested/NestedTensorMatmul.cpp new file mode 100644 index 000000000000..c8cfa124330d --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorMatmul.cpp @@ -0,0 +1,352 @@ +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace at { +namespace native { + +Tensor bmm_nested(const Tensor& self, const Tensor& mat2) { + if (self.is_nested() && !mat2.is_nested()) { + AT_ERROR("Expected both to be nested, but got a nested self and non-nested other"); + } + else if (!self.is_nested() && mat2.is_nested()) { + AT_ERROR("Expected both to be nested, but got a non-nested self and nested other"); + } + // dispatcher should have guaranteed that at least one is nested + auto self_ptr = get_nested_tensor_impl(self); + auto mat2_ptr = get_nested_tensor_impl(mat2); + TORCH_CHECK(self_ptr->dim() == 3, "batch1 must be a 3D tensor"); + TORCH_CHECK(mat2_ptr->dim() == 3, "batch2 must be a 3D tensor"); + int64_t ntensors = self_ptr->size(0), + ntensors2 = mat2_ptr->size(0); + TORCH_CHECK(ntensors == ntensors2, + "Expected size for the 1st dimension of batch2 tensor to be: ", ntensors, + " but got: ", ntensors2, "."); + const Tensor& self_buffer = self_ptr->get_unsafe_storage_as_tensor(), + & mat2_buffer = mat2_ptr->get_unsafe_storage_as_tensor(); + std::vector self_sizes = NestedTensor_get_sizes(self_ptr), + mat2_sizes = NestedTensor_get_sizes(mat2_ptr), + self_strides = NestedTensor_get_strides(self_ptr), + mat2_strides = NestedTensor_get_strides(mat2_ptr); + const std::vector& self_offsets = self_ptr->get_storage_offsets(), + & mat2_offsets = mat2_ptr->get_storage_offsets(); + // create a contiguous output + int64_t out_numel = 0; + const Tensor& self_sizemat = self_ptr->get_nested_size_tensor(); + Tensor out_sizemat = self_sizemat.new_empty(self_sizemat.sizes()); + int64_t* out_sizemat_ptr = out_sizemat.data_ptr(); + for (int64_t i = 0; i < ntensors; i++) { + const IntArrayRef& self_shape = self_sizes[i], + & mat2_shape = mat2_sizes[i]; + const int64_t& self_size0 = self_shape[0], & self_size1 = self_shape[1], + & mat2_size0 = mat2_shape[0], & mat2_size1 = mat2_shape[1]; + TORCH_CHECK(self_size1 == mat2_size0, + i, "-th nested matrices in batch cannot be multiplied (", + self_size0, "x", self_size1, " and ", + mat2_size0, "x", mat2_size1, ")"); + out_sizemat_ptr[0] = self_size0; + out_sizemat_ptr[1] = mat2_size1; + out_sizemat_ptr += 2; + out_numel += self_size0 * mat2_size1; + } + Tensor out_buffer = self_buffer.new_empty(out_numel); + Tensor output = wrap_buffer(out_buffer, out_sizemat); + // call tensor mm + // TODO: `padding nested tensor -> bmm -> remove padding` may be more efficient + // until we have specialized nested tensor bmm kernel + // useful resource: `aten/src/ATen/native/cpu/LinearAlgebra.cpp/bmm_out_or_baddbmm_` + // `aten/src/ATen/native/cuda/Blas.cpp/baddbmm_out_cuda_impl` + std::vector output_unbind = output.unbind(); + for (int64_t i = 0; i < ntensors; i++) { + at::mm_out(output_unbind[i], + 
self_buffer.as_strided(self_sizes[i], self_strides[i], self_offsets[i]), + mat2_buffer.as_strided(mat2_sizes[i], mat2_strides[i], mat2_offsets[i])); + } + return output; +} + +// utilities support `matmul_nested` +namespace { +// Args: +// self_sizes: the sizes of `self` in `matmul_nested` +// mat2_sizes: the sizes of `mat2` in `matmul_nested` +// buffer_op: the options for new buffer +// sizemat_op: the options for new size matrix +// Returns: +// the batch size of each input underlying tensor, i.e. the product of batch-dimension sizes +// the empty output nested tensor +inline std::tuple, Tensor> +matmul_nested_helper( + const std::vector& self_sizes, + const std::vector& mat2_sizes, + const c10::TensorOptions& buffer_op, + const c10::TensorOptions& sizemat_op) { + int64_t ntensors = self_sizes.size(), + ndims = self_sizes[0].size(); + std::vector batch_sizes(ntensors, 1); + Tensor sizemat = at::empty({ntensors, ndims}, sizemat_op); + int64_t* sizemat_ptr = sizemat.data_ptr(); + int64_t numel = 0; + for (int64_t i = 0; i < ntensors; i++) { + const IntArrayRef& self_size = self_sizes[i], + & mat2_size = mat2_sizes[i]; + int64_t& batch_size = batch_sizes[i]; + // batch dimensions + for (int64_t j = 0; j < ndims - 2; j++) { + const int64_t& self_sizej = self_size[j], + & mat2_sizej = mat2_size[j]; + TORCH_CHECK( + self_sizej == mat2_sizej, + "matmul: For nested tensors, no broadcasting is currently performed: ", + i, "-th nested matrices in batch at dimension ", j + 1, + " have mismatching sizes ", self_sizej, " and ", mat2_sizej); + sizemat_ptr[j] = self_sizej; + batch_size *= sizemat_ptr[j]; + } + // matrix multiplication dimensions + const int64_t& self_size0 = self_size[ndims - 2], & self_size1 = self_size[ndims - 1], + & mat2_size0 = mat2_size[ndims - 2], & mat2_size1 = mat2_size[ndims - 1]; + TORCH_CHECK( + self_size1 == mat2_size0, + "matmul: ", + i, "-th nested matrices in batch cannot be multiplied (", + self_size0, "x", self_size1, " and ", + mat2_size0, "x", mat2_size1, ")"); + sizemat_ptr[ndims - 2] = self_size0; + sizemat_ptr[ndims - 1] = mat2_size1; + sizemat_ptr += ndims; + numel += batch_size * self_size0 * mat2_size1; + } + Tensor buffer = at::empty(numel, buffer_op); + Tensor output = wrap_buffer(buffer, sizemat); + return std::make_tuple(batch_sizes, output); +} +} + +Tensor matmul_with_bmm_nested(const Tensor& self, const Tensor& mat2) { + // Tensor self = self_.contiguous(); + // Tensor mat2 = mat2_.contiguous(); + // self [N, n_heads, *, head_dim] + // mat2 [N, n_heads, head_dim, *] + const auto self_ptr = get_nested_tensor_impl(self); + const auto mat2_ptr = get_nested_tensor_impl(mat2); + // metadata for self + std::vector self_sizes = NestedTensor_get_sizes(self_ptr); + std::vector self_strides = NestedTensor_get_strides(self_ptr); + std::vector self_offsets = self_ptr->get_storage_offsets(); + auto opt = self_ptr->get_nested_size_tensor().options(); + + // metadata for mat2 + std::vector mat2_sizes = NestedTensor_get_sizes(mat2_ptr); + std::vector mat2_strides = NestedTensor_get_strides(mat2_ptr); + std::vector mat2_offsets = mat2_ptr->get_storage_offsets(); + auto opt2 = mat2_ptr->get_nested_size_tensor().options(); + + int64_t N = self_sizes.size(); + int64_t n_heads = self_sizes[0][0]; + + // viewed metadata for self + auto self_new_sizes = at::empty({N * n_heads, 2}, opt); + int64_t* self_new_sizes_ptr = self_new_sizes.data_ptr(); + + auto self_new_strides = at::empty({N * n_heads, 2}, opt); + int64_t* self_new_strides_ptr = self_new_strides.data_ptr(); + 
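// The loop below collapses the first two dimensions without copying data:
// constituent i of self has shape (n_heads, seq_i, head_dim), and head j of
// constituent i becomes its own 2-D (seq_i x head_dim) entry whose storage
// offset is self_offsets[i] + j * self_stride_i[0]. The same is done for mat2,
// producing N * n_heads rows of sizes/strides/offsets for the two views.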
std::vector self_new_offsets; + + // viewed metadata for mat2 + auto mat2_new_sizes = at::empty({N * n_heads, 2}, opt2); + int64_t* mat2_new_sizes_ptr = mat2_new_sizes.data_ptr(); + + auto mat2_new_strides = at::empty({N * n_heads, 2}, opt2); + int64_t* mat2_new_strides_ptr = mat2_new_strides.data_ptr(); + std::vector mat2_new_offsets; + + for (int64_t i = 0; i < N; i++) { + const IntArrayRef& self_size_i = self_sizes[i]; + const IntArrayRef& self_stride_i = self_strides[i]; + int64_t self_offset = self_offsets[i]; + + const IntArrayRef& mat2_size_i = mat2_sizes[i]; + const IntArrayRef& mat2_stride_i = mat2_strides[i]; + int64_t mat2_offset = mat2_offsets[i]; + for (int64_t j = 0; j < n_heads; j++) { + auto idx = (i * n_heads + j) * 2; + self_new_sizes_ptr[idx] = self_size_i[1]; + self_new_sizes_ptr[idx + 1] = self_size_i[2]; + self_new_strides_ptr[idx] = self_stride_i[1]; + self_new_strides_ptr[idx + 1] = self_stride_i[2]; + self_new_offsets.push_back(self_offset); + self_offset += self_stride_i[0]; + + mat2_new_sizes_ptr[idx] = mat2_size_i[1]; + mat2_new_sizes_ptr[idx + 1] = mat2_size_i[2]; + mat2_new_strides_ptr[idx] = mat2_stride_i[1]; + mat2_new_strides_ptr[idx + 1] = mat2_stride_i[2]; + mat2_new_offsets.push_back(mat2_offset); + mat2_offset += mat2_stride_i[0]; + } + } + + + // view self as [N * n_heads, *, head_dim] (collapse first 2 dims) + auto viewed_self = create_nested_view_tensor( + self, self_new_sizes, self_new_strides, std::vector(self_new_offsets)); + + // view mat2 as [N * n_heads, head_dim, *] (collapse first 2_dims) + auto viewed_mat2 = create_nested_view_tensor( + mat2, mat2_new_sizes, mat2_new_strides, std::vector(mat2_new_offsets)); + + // output [N * n_heads, *, *] + auto bmm_output = at::bmm(viewed_self, viewed_mat2); + + // generate metadata for viewing output as [N, n_heads, *, *] + // output of bmm should be contiguous so stride calculations should hold + auto out_new_sizes = at::empty({N, 3}, opt); + auto out_new_strides = at::empty({N, 3}, opt); + std::vector out_new_offsets; + + int64_t* out_new_sizes_ptr = out_new_sizes.data_ptr(); + int64_t* out_new_strides_ptr = out_new_strides.data_ptr(); + + int64_t out_offset = 0; + for (int64_t i = 0; i < N; i++) { + out_new_offsets.push_back(out_offset); + const IntArrayRef& self_size_i = self_sizes[i]; + const IntArrayRef& mat2_size_i = mat2_sizes[i]; + auto idx = i * 3; + out_new_sizes_ptr[idx] = n_heads; + out_new_sizes_ptr[idx + 1] = self_size_i[1]; + out_new_sizes_ptr[idx + 2] = mat2_size_i[2]; + out_new_strides_ptr[idx] = self_size_i[1] * mat2_size_i[2]; + out_new_strides_ptr[idx + 1] = mat2_size_i[2]; + out_new_strides_ptr[idx + 2] = 1; + out_offset += n_heads * (self_size_i[1] * mat2_size_i[2]); + } + + auto viewed_out = create_nested_view_tensor( + bmm_output, out_new_sizes, out_new_strides, std::vector(out_new_offsets)); + + return viewed_out; + +} + +// Note [nested tensor matmul] +// This is really a generalized batched matmul dedicated to nested tensors, +// where `self` and `mat2` have same number (>= 3) of dimensions. +// The last 2 dimensions will be considered as matrix dimensions, +// so they should be matrix-multiplicable. +// The leading dimensions are considered as batch dimensions, +// and since nested tensor does not support broadcasting for now, +// for each batch dimension `self` and `mat2` must have same size. 
+// TODO: Should make full matmul semantics support some day +Tensor matmul_nested(const Tensor& self, const Tensor& mat2) { + if (self.is_nested() && !mat2.is_nested()) { + AT_ERROR("Expected both to be nested, but got a nested self and non-nested other"); + } + else if (!self.is_nested() && mat2.is_nested()) { + AT_ERROR("Expected both to be nested, but got a non-nested self and nested other"); + } + // to_padded_tensor only supports contiguous inputs + auto self_contig = self.contiguous(); + auto mat2_contig = mat2.contiguous(); + // dispatcher should have guaranteed that at least one is nested + const auto self_ptr = get_nested_tensor_impl(self_contig); + const auto mat2_ptr = get_nested_tensor_impl(mat2_contig); + int64_t self_dim = self_ptr->dim(), + mat2_dim = mat2_ptr->dim(); + TORCH_CHECK( + self_dim >= 3, + "matmul: For nested tensors, only inputs with >= 3 dims are currently supported. 1st input has rank: ", + self_dim); + TORCH_CHECK( + mat2_dim >= 3, + "matmul: For nested tensors, only inputs with >= 3 dims are currently supported. 2nd input has rank: ", + mat2_dim); + TORCH_CHECK(self_dim == mat2_dim, "matmul: both inputs must have the same rank"); + int64_t ntensors = self_ptr->size(0), + ntensors2 = mat2_ptr->size(0); + TORCH_CHECK(ntensors == ntensors2, + "matmul: Expected size for the 1st dimension of 2nd input tensor to be: ", ntensors, + " but got: ", ntensors2, "."); + // Ensure batch dimensions have the same sizes (no broadcasting). + const auto& self_sizes = self_ptr->get_nested_size_tensor(); + const auto& mat2_sizes = mat2_ptr->get_nested_size_tensor(); + const auto& self_batch_sizes = self_sizes.narrow(1, 0, self_dim-3); + const auto& mat2_batch_sizes = mat2_sizes.narrow(1, 0, mat2_dim-3); + TORCH_CHECK(at::equal(self_batch_sizes, mat2_batch_sizes), + "matmul: For nested tensors, batch dimensions must have the same sizes, ", + "no broadcasting is currently performed. 
Got batch shapes for self ", + self_batch_sizes, + " and batch shapes for mat2 ", + mat2_batch_sizes); + // Ensure last dim of self and second last dim of mat2 have the same size + const auto& self_dim_size = self_sizes.select(1, -1); + const auto& mat2_dim_size = mat2_sizes.select(1, -2); + TORCH_CHECK(at::equal(self_dim_size, mat2_dim_size), + "matmul: Nested tensors cannot be matrix multiplied, last dimension of self has sizes", + self_dim_size, + "second last dimension of mat2 has sizes", + mat2_dim_size); + + // use bmm inference-only fast path for [N, n_heads, *, head_dim] [N, n_heads, head_dim, *] + if (self.is_cuda() && + self_dim == 4 && self.is_contiguous() && + mat2_dim == 4 && mat2.is_contiguous() && + !(GradMode::is_enabled() && (self.requires_grad() || mat2.requires_grad()))) { + auto n_heads = self_sizes.select(0, 1).select(0, 0).item(); + auto self_first_dim_n_heads = at::all(self_sizes.select(1, 0) == n_heads).item(); + auto mat2_first_dim_n_heads = at::all(mat2_sizes.select(1, 0) == n_heads).item(); + if (self_first_dim_n_heads && mat2_first_dim_n_heads) { + return matmul_with_bmm_nested(self, mat2); + } + } + + // Construct output size from input sizes + Tensor output_sizes = self_sizes.clone(); + // The last entry in every row of output_sizes should be last column of mat2_sizes + output_sizes.index_put_({at::indexing::Slice(), -1}, mat2_sizes.select(1, -1).clone()); + + auto self_padded = self_contig.to_padded_tensor(0.); + auto mat2_padded = mat2_contig.to_padded_tensor(0.); + auto output_padded = at::matmul(self_padded, mat2_padded); + auto output_nested = nested_from_padded_generic(output_padded, output_sizes); + return output_nested; +} + +Tensor& matmul_out_nested(const Tensor& tensor1, const Tensor& tensor2, Tensor& result) { + // TODO: this is a very quick and dirty implementation + // should improve it to avoid the intermediate memory usage + Tensor function_result = at::matmul(tensor1, tensor2); + auto function_result_ptr = get_nested_tensor_impl(function_result); + // TODO: this is to reproduce function_result_ptr->opt_sizes_ + // if an accessor is provided in the future, can replace this + std::vector sizes; + for (int64_t i = 0; i < function_result_ptr->dim(); i++) { + c10::optional opt_size = function_result_ptr->opt_size(i); + if (opt_size.has_value()) { + sizes.push_back(*opt_size); + } + else { + sizes.push_back(-1); + } + } + result.reshape(sizes); + result.copy_(function_result); + return result; +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp b/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp index d33decc22433..95c762ccc8ed 100644 --- a/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp +++ b/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.cpp @@ -3,9 +3,11 @@ #include #include #include -#include +#include + #include #include +#include namespace at { namespace native { @@ -138,44 +140,56 @@ Tensor NestedTensor_add_NestedTensor_in_place( return self; } -void NestedTensor_softmax_dropout(const Tensor& query, Tensor& attn_scores) { +Tensor NestedTensor_softmax_dropout(const Tensor& self, const Tensor& query) { const auto* query_nt = get_nested_tensor_impl_or_null(query); TORCH_INTERNAL_ASSERT(query_nt != nullptr); TORCH_INTERNAL_ASSERT(nested_tensor_impl_is_contiguous(query_nt)); const Tensor& sizes = query_nt->get_nested_size_tensor(); const auto num_tensors = sizes.sizes()[0]; - const auto max_seq_len = attn_scores.sizes()[2]; + + 
auto output = at::empty_like(self,{}, at::MemoryFormat::Contiguous); + TORCH_INTERNAL_ASSERT(output.is_contiguous()); + + const auto max_seq_len = self.sizes()[2]; for (int64_t i = 0; i < num_tensors; i++) { auto seq_len = sizes.index({i, 0}).item(); - auto subseq = attn_scores.index( + auto subseq = self.index( {i, indexing::Slice(), indexing::Slice(0, seq_len), indexing::Slice(0, seq_len)}); auto subscores = at::softmax(subseq, subseq.dim() - 1); - attn_scores.index_put_( + output.index_put_( {i, indexing::Slice(), indexing::Slice(0, seq_len), indexing::Slice(0, seq_len)}, subscores); - attn_scores.index_put_( + output.index_put_( {i, indexing::Slice(), indexing::Slice(0, seq_len), indexing::Slice(seq_len, max_seq_len)}, 0); - attn_scores.index_put_( + output.index_put_( {i, indexing::Slice(), indexing::Slice(seq_len, max_seq_len), indexing::Slice(0, max_seq_len)}, 0); } + return output; } +Tensor NestedTensor_softmax_dropout_cuda(const Tensor& self, const Tensor& query) { + c10::optional attn_mask; + + attn_mask = NestedTensor_to_mask(query, 2, self.size(2)); + attn_mask = attn_mask->to(query.device(), /*non-blocking=*/true); + return _masked_softmax(self, *attn_mask, self.dim() - 1, /*mask type */ 1 ); // NestedTensor_to_mask produces a BxT mask +} Tensor NestedTensor_batch_offsets_from_size_tensor( const Tensor& sizes, @@ -196,8 +210,10 @@ Tensor NestedTensor_batch_offsets_from_size_tensor( return offsets; } + Tensor NestedTensor_to_mask(const Tensor& nt, c10::optional mask_dim, c10::optional mask_dim_length) { auto* nt_impl = get_nested_tensor_impl(nt); + TORCH_CHECK(nested_tensor_impl_is_contiguous(nt_impl), "to_mask only works on contiguous NestedTensors."); TORCH_CHECK( !mask_dim || *mask_dim < nt.dim(), "Requested mask dimension ", @@ -229,5 +245,6 @@ Tensor NestedTensor_to_mask(const Tensor& nt, c10::optional mask_dim, c } return result; } + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.h b/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.h index 96ecfe91c3dd..0f623f896d0f 100644 --- a/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.h +++ b/aten/src/ATen/native/nested/NestedTensorTransformerFunctions.h @@ -50,8 +50,6 @@ Tensor NestedTensor_from_padded_tensor_cpu( const Tensor& padded, const NestedTensorImpl& nt); -void NestedTensor_softmax_dropout(const Tensor& query, Tensor& attn_scores); - Tensor NestedTensor_to_mask(const Tensor& nt, c10::optional mask_dim, c10::optional mask_dim_length); template @@ -85,5 +83,21 @@ void add_padding_kernelLauncher( const std::vector& output_sizes, const int batch_size, const int output_batch_size); + +TORCH_API Tensor flash_attention_helper( + const Tensor& query, + const Tensor& key, + const Tensor& value, + double dropout_p, + bool need_atten_weights, + bool is_causal); + +TORCH_API std::tuple mem_efficient_helper_nested_unpacked( + const Tensor& query, + const Tensor& key, + const Tensor& value, + double dropout_p, + bool need_atten_weights, + bool is_causal); } // namespace native } // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp b/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp new file mode 100644 index 000000000000..6be7239775ea --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorUnaryOps.cpp @@ -0,0 +1,74 @@ +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace at { +namespace native { + +Tensor& 
NestedTensor_relu_(Tensor& self) { + auto self_ptr = get_nested_tensor_impl(self); + check_numel_equals_buffer_size(self_ptr); + auto buffer = self_ptr->get_buffer(); + at::relu_(buffer); + return self; +} + +Tensor NestedTensor_relu(const Tensor& self) { + return map_nt(self, at::relu); +} + +Tensor& NestedTensor_gelu_(Tensor& self, c10::string_view approximate) { + auto self_ptr = get_nested_tensor_impl(self); + check_numel_equals_buffer_size(self_ptr); + auto buffer = self_ptr->get_buffer(); + at::gelu_(buffer, approximate); + return self; +} + +Tensor NestedTensor_gelu(const Tensor& self, c10::string_view approximate) { + return map_nt( + self, + [approximate](const Tensor& buffer) { + return at::gelu(buffer, approximate); + }); +} + +Tensor& NestedTensor_tanh_(Tensor& self) { + auto self_ptr = get_nested_tensor_impl(self); + check_numel_equals_buffer_size(self_ptr); + auto buffer = self_ptr->get_buffer(); + at::tanh_(buffer); + return self; +} + +Tensor NestedTensor_tanh(const Tensor& self) { + return map_nt(self, at::tanh); +} + +Tensor& NestedTensor_neg_(Tensor& self) { + auto self_ptr = get_nested_tensor_impl(self); + check_numel_equals_buffer_size(self_ptr); + auto buffer = self_ptr->get_buffer(); + at::neg_(buffer); + return self; +} + +Tensor NestedTensor_neg(const Tensor& self) { + return map_nt(self, at::neg); +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorUtils.cpp b/aten/src/ATen/native/nested/NestedTensorUtils.cpp new file mode 100644 index 000000000000..50ca7db6cb6b --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorUtils.cpp @@ -0,0 +1,112 @@ +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif + +namespace at { +namespace native { + +/** + * Thin wrapper around get_nested_size_tensor that is registered as a native function + * + * @return The nested tensors' size tensor. + */ +at::Tensor _nested_tensor_size(const at::Tensor& self) { + return get_nested_size_tensor(self); +} + +at::Tensor _nested_tensor_strides(const at::Tensor& self){ + return get_nested_tensor_impl(self) -> get_nested_stride_tensor(); +} +std::vector _nested_tensor_offsets(const at::Tensor& self){ + return get_nested_tensor_impl(self) -> get_storage_offsets(); +} + +// Helper functions for getting information about a nested tensor's shape. 
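// The nested size tensor is an (ntensors x ndim-1) int64 matrix whose i-th row
// holds the shape of constituent i. As an illustrative example, nesting tensors
// of shape 3x4 and 5x4 gives the size matrix [[3, 4], [5, 4]]; the column-wise
// maximum computed below is {5, 4}, i.e. the smallest padded shape that can
// hold every constituent.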
+std::vector NestedTensor_get_max_size_from_size_tensor( + const Tensor& sizes) { + if (sizes.dim() == 0) { + return {}; + } + const auto sizes_ptr = sizes.data_ptr(); + const auto sizes_size_0 = sizes.sizes()[0]; + const auto sizes_size_1 = sizes.sizes()[1]; + TORCH_INTERNAL_ASSERT(sizes_size_1 > 0); + std::vector results(sizes_size_1, 0); + for (const auto ii : c10::irange(sizes_size_0)) { + for (const auto jj : c10::irange(sizes_size_1)) { + auto val = sizes_ptr[ii * sizes_size_1 + jj]; + if (results[jj] < val) { + results[jj] = val; + } + } + } + return results; +} + +std::vector NestedTensor_get_max_size(const NestedTensorImpl& nt) { + return NestedTensor_get_max_size_from_size_tensor( + nt.get_nested_size_tensor()); +} + +int64_t get_consistent_last_dim_of_nested_tensor(const NestedTensorImpl& nt) { + c10::optional last_dim = nt.opt_size(-1); + TORCH_CHECK( + last_dim != c10::nullopt, + "Expected all tensors in nested tensor to have the same trailing dimension, instead last dimension equals: ", + nt.get_nested_size_tensor().select(1, -1)); + return *last_dim; +} + +std::vector chunk_nested_tensor(const Tensor& self, int64_t chunks, int64_t dim) { + int64_t ndim = self.dim(); + if (ndim == 0) { + TORCH_CHECK_INDEX(false, "chunk() cannot be applied to a 0-dim tensor."); + } + dim = maybe_wrap_dim(dim, ndim); + TORCH_CHECK(self.dim() - 1 == dim, + "Chunk for nested tensors is currently only supported for the last dimension."); + TORCH_CHECK(chunks > 0,"chunk expects `chunks` to be greater than 0, got: ", chunks); + TORCH_CHECK(self.is_contiguous(), "chunk expects `self` to be contiguous."); + auto self_impl = get_nested_tensor_impl(self); + const int64_t last_dim_size = get_consistent_last_dim_of_nested_tensor(*self_impl); + TORCH_CHECK(last_dim_size % chunks == 0, + "Chunk for nested tensors is only supported for nested tensors with trailing dimension divisible by chunks, got: ", + last_dim_size, " % ", chunks, " != 0"); + int64_t n_tensors = self.size(0); + int64_t split_size = last_dim_size / chunks; + std::vector splits(chunks); + const auto& sizes = self_impl->get_nested_size_tensor(); + const auto& strides = self_impl->get_nested_stride_tensor(); + const std::vector& offsets = self_impl->get_storage_offsets(); + // Account for the implicit batch dim + --dim; + int64_t tensor_dim = sizes.size(1); + for (const auto split_idx : c10::irange(chunks)) { + auto new_sizes = sizes.clone() ; + auto new_strides = strides.clone(); + // This copys offsets so we are safe to move + auto new_offsets = std::vector(offsets); + int64_t *size_ptr = new_sizes.data_ptr(); + // Get start val for each split + int64_t start_val = split_idx * split_size; + for (int64_t i : c10::irange(n_tensors)) { + const int64_t index = i * tensor_dim + dim; + new_offsets[i] = offsets[i] + start_val; + size_ptr[index] = split_size; + } + splits[split_idx] = create_nested_view_tensor(self, new_sizes, new_strides, std::move(new_offsets)); + } + return splits; +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/NestedTensorUtils.h b/aten/src/ATen/native/nested/NestedTensorUtils.h new file mode 100644 index 000000000000..6590db9116e0 --- /dev/null +++ b/aten/src/ATen/native/nested/NestedTensorUtils.h @@ -0,0 +1,423 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS + +#include +#include +#else +#include +#include +#include +#include +#include +#include +#endif + +#include + +namespace at { +namespace native 
{ +struct NestedTensorImpl; + +// The following functions are used to construct nested tensors from buffers and +// metadata. + +inline at::Tensor wrap_buffer( + at::Tensor buffer, + at::Tensor nested_size_tensor) { + TORCH_INTERNAL_ASSERT_DEBUG_ONLY( + buffer.is_contiguous(), "Given buffer must be contiguous."); + return at::detail::make_tensor( + std::move(buffer), std::move(nested_size_tensor)); +} + +inline at::Tensor wrap_buffer( + at::Tensor buffer, + at::Tensor nested_size_tensor, + at::Tensor nested_stride_tensor, + std::vector&& offsets) { + TORCH_INTERNAL_ASSERT_DEBUG_ONLY( + buffer.is_contiguous(), "Given buffer must be contiguous."); + return at::detail::make_tensor( + std::move(buffer), + std::move(nested_size_tensor), + std::move(nested_stride_tensor), + std::move(offsets)); +} + +inline at::Tensor wrap_buffer( + at::Tensor buffer, + at::Tensor nested_size_tensor, + at::Tensor nested_stride_tensor, + const std::vector& offsets) { + std::vector offsets_copy(offsets); + return wrap_buffer( + buffer, + nested_size_tensor, + nested_stride_tensor, + std::move(offsets_copy)); +} + +inline at::Tensor get_buffer(const at::Tensor& tensor) { + return get_nested_tensor_impl(tensor)->get_buffer(); +} + +/** + * Create a new nested tensor that is a view of a base nested tensor + * + * create_view_tensor calls a specialized constructor that copys the + * the keys from base onto the new view tensor being created. + * The storage is shared between the base and the returned view tensor + * + * All callers of this helper must: + * - Only return a view of the input + * - Must be explicit and define a derivative + * + * @param base Base tensor to construct view from. + * @param nested_size_tensor View tensors' sizes. + * @param nested_stride_tensor View tensors' strides. + * @param offsets View tensors' offsets. + * @return A newly constructed view tensor + */ +inline at::Tensor create_nested_view_tensor( + const at::Tensor& base, + at::Tensor nested_size_tensor, + at::Tensor nested_stride_tensor, + std::vector&& offsets) { + TORCH_INTERNAL_ASSERT( + base.is_nested(), + "This function can only be used to create nested tensor views"); + TORCH_INTERNAL_ASSERT( + c10::impl::tls_local_dispatch_key_set().excluded_.has( + c10::DispatchKey::AutogradFunctionality), + "Creating a non differentiable nested tensor view in a CompositeImplicit function is not allowed."); + return at::detail::make_tensor( + c10::TensorImpl::VIEW, + base, + nested_size_tensor, + nested_stride_tensor, + std::move(offsets)); +} +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +// Helper functions for getting information about a nested tensor's shape. 
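// get_consistent_last_dim_of_nested_tensor (implemented in NestedTensorUtils.cpp)
// returns the trailing dimension only when it is identical across all
// constituents: for example, nesting a 3x7 and a 5x7 tensor gives a consistent
// last dimension of 7, while nesting 3x7 and 5x6 throws. chunk_nested_tensor
// relies on this invariant.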
+ +int64_t get_consistent_last_dim_of_nested_tensor(const NestedTensorImpl& nt); + +// The sizes of the underlying tensors +inline std::vector NestedTensor_get_sizes( + const NestedTensorImpl* self_ptr) { + int64_t ntensors = self_ptr->size(0); + std::vector sizes(ntensors); + if (ntensors == 0) { + return sizes; + } + const Tensor& sizemat = self_ptr->get_nested_size_tensor(); + int64_t orig_dim = sizemat.size(1); + // nesting scalars has empty sizes + if (orig_dim == 0) { + return sizes; + } + const int64_t* sizemat_ptr = sizemat.data_ptr(); + + for (const auto i : c10::irange(ntensors)) { + sizes[i] = IntArrayRef(sizemat_ptr, sizemat_ptr + orig_dim); + sizemat_ptr += orig_dim; + } + return sizes; +} + +TORCH_API std::vector NestedTensor_get_max_size( + const NestedTensorImpl& nt); + +std::vector NestedTensor_get_max_size_from_size_tensor( + const Tensor& sizes); + +inline std::vector NestedTensor_get_sizes(const at::Tensor& self) { + const NestedTensorImpl* self_ptr = get_nested_tensor_impl(self); + return NestedTensor_get_sizes(self_ptr); +} +// The strides of the underlying tensors +inline std::vector NestedTensor_get_strides( + const NestedTensorImpl* self_ptr) { + int64_t ntensors = self_ptr->size(0); + std::vector strides(ntensors); + if (ntensors == 0) { + return strides; + } + const Tensor& stridemat = self_ptr->get_nested_stride_tensor(); + int64_t orig_dim = stridemat.size(1); + // nesting scalars has empty strides + if (orig_dim == 0) { + return strides; + } + const int64_t* stridemat_ptr = stridemat.data_ptr(); + for (const auto i : c10::irange(ntensors)) { + strides[i] = IntArrayRef(stridemat_ptr, stridemat_ptr + orig_dim); + stridemat_ptr += orig_dim; + } + return strides; +} + +inline std::vector NestedTensor_get_strides( + const at::Tensor& self) { + const NestedTensorImpl* self_ptr = get_nested_tensor_impl(self); + return NestedTensor_get_strides(self_ptr); +} + +inline void check_numel_equals_buffer_size(const at::Tensor& self) { + auto self_impl = get_nested_tensor_impl(self); + TORCH_CHECK( + self.numel() == self_impl->get_buffer_size(), + "Number of elements in nested tensor must match number of elements in buffer."); +} + +inline void check_numel_equals_buffer_size(const NestedTensorImpl* self_ptr) { + TORCH_CHECK( + self_ptr->numel() == self_ptr->get_buffer_size(), + "Number of elements in nested tensor must match number of elements in buffer."); +} +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +// Data structures and functions for generically applying a function on a nested +// tensor. 
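// The NestedNode/map machinery below applies a function to each constituent and
// rewraps the results into a new nested tensor. A minimal usage sketch (the op
// name is hypothetical, not part of this patch):
//
//   Tensor NestedTensor_cos(const Tensor& self) {
//     return map_nested_tensor(
//         [](const Tensor& t) { return t.cos(); }, self);
//   }
//
// map_nested_tensor unbinds self into its constituents, applies the lambda to
// each, and re-packs the outputs with wrap_tensor_node; the unary ops above use
// map_nt instead, which applies the function once to the underlying buffer.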
+namespace impl { + +template +struct NestedNode { + NestedNode() = delete; + explicit NestedNode(std::vector&& children) + : _is_leaf(false), _children(children) {} + explicit NestedNode(TensorList children) + : _is_leaf(false), _children(children.vec()) {} + // NestedNode(NestedNode&) = delete; + // NestedNode(const NestedNode&) = delete; + // NestedNode& operator=(NestedNode) = delete; + explicit NestedNode(T payload) : _is_leaf(true), _payload(payload) {} + inline bool is_leaf() const { + return _is_leaf; + } + inline size_t degree() const { + return _children.size(); + } + inline const std::vector unbind() const { + return _children; + } + inline T children(size_t i) const { + return _children[i]; + } + inline const T& payload() const { + return _payload; + } + inline T& payload() { + return _payload; + } + + private: + bool _is_leaf; + std::vector _children; + T _payload; +}; + +using TensorNode = NestedNode; + +template +class _map; + +template +class _map> { + public: + static A function_one(F&& fn, const Args&... nested_node) { + return std::forward(fn)(nested_node...); + } + // NOTE: We must move F to avoid copying objects if it is a lambda with + // captures. + static NestedNode function( + F&& fn, + const NestedNode&... nested_node) { + size_t degree = 0; + bool all_leaf = true; + c10::guts::tuple_map( + std::forward_as_tuple(nested_node...), [&all_leaf, °ree](auto n) { + all_leaf = all_leaf && (n.is_leaf()); + if (degree > 1 && n.degree() > 1) { + TORCH_CHECK( + degree == n.degree(), "NestedNodes must match in degree."); + } + if (n.degree() > degree) { + degree = n.degree(); + } + return nullptr; + }); + // All NestedNodes just wrap regular objects. + if (all_leaf) { + return NestedNode(std::forward(fn)(nested_node.payload()...)); + } + // Some NestedNodes wrap regular Tensors, some NestedTensors and some other + // types. + std::vector result; + for (size_t i = 0; i < degree; i++) { + std::tuple children = c10::guts::tuple_map( + std::forward_as_tuple(nested_node...), [&i](auto a) { + static_assert( + c10::guts::is_instantiation_of::value, + "Internal error."); + // Broadcast regular arguments across NestedTensor constituents. + // This could be a Tensor, integer or anything else really. + if (a.is_leaf()) { + return a.payload(); + } + // Broadcast NestedTensors with one constituent. + if (a.degree() == 1 && !a.is_leaf()) { + return a.children(0); + } + TORCH_CHECK(a.degree() > 0, "Internal assert."); + return a.children(i); + }); + c10::guts::apply( + [&result, &fn](Args... filtered) { + result.emplace_back(function_one(std::forward(fn), filtered...)); + }, + std::move(children)); + } + return NestedNode(std::move(result)); + } +}; + +// TODO: Add static assert to verify lambda arguments match nested_node types +template +static inline NestedNode< + typename c10::guts::infer_function_traits::type::return_type> +map(F&& fn, const NestedNode&... 
nested_node) { + return _map< + F, + typename c10::guts::infer_function_traits::type::return_type, + typename c10::guts::infer_function_traits::type::parameter_types>:: + function(std::forward(fn), nested_node...); +} + +inline TensorNode get_nested_tensor_structure(at::Tensor tensor) { + if (get_nested_tensor_impl_or_null(tensor) == nullptr) { + return TensorNode(std::move(tensor)); + } + return TensorNode(tensor.unbind()); +} + +inline Tensor wrap_tensor_node( + TensorNode tensor_node, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory) { + TORCH_CHECK( + !tensor_node.is_leaf(), "Expected TensorNode to wrap a list of Tensors."); + TensorOptions options_ = + TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory( + pin_memory); + if (tensor_node.degree() == 0) { + return wrap_buffer(ones({0}, dtype, layout, device), ones({})); + } + + // Fast path: if all tensors are on CPU, have contiguous memory, and the same + // dtype, copying can be done much faster. + bool all_tensors_cpu = true; + bool all_tensors_contiguous = true; + bool all_tensors_same_dtype = true; + auto first_dtype = tensor_node.children(0).dtype(); + std::vector start_offsets(tensor_node.degree()); + start_offsets[0] = 0; + long total_size = 0; + for (const auto i : c10::irange(tensor_node.degree())) { + all_tensors_cpu = all_tensors_cpu && tensor_node.children(i).is_cpu(); + all_tensors_contiguous = + all_tensors_contiguous && tensor_node.children(i).is_contiguous(); + all_tensors_same_dtype = all_tensors_same_dtype && + (first_dtype == tensor_node.children(i).dtype()); + if (!(all_tensors_cpu && all_tensors_contiguous && + all_tensors_same_dtype)) { + break; + } + if (i > 0) { + start_offsets[i] = + start_offsets[i - 1] + tensor_node.children(i - 1).numel(); + } + total_size += tensor_node.children(i).numel(); + } + + TensorOptions options; + Tensor nt_buffer, nt_sizes; + if (all_tensors_cpu && all_tensors_contiguous && all_tensors_same_dtype) { + nt_buffer = at::empty({total_size}, tensor_node.children(0).options()); + nt_sizes = at::empty( + {static_cast(tensor_node.degree()), + static_cast(tensor_node.children(0).sizes().size())}, + TensorOptions().dtype(kLong)); + AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3( + at::ScalarType::Half, + at::ScalarType::Bool, + at::ScalarType::BFloat16, + c10::typeMetaToScalarType(first_dtype), + "create_nt_buffer", + [&]() { + at::parallel_for( + 0, tensor_node.degree(), 1, [&](int64_t begin, int64_t end) { + for (int64_t i = begin; i < end; ++i) { + // Only try copying memory if there is more than 0 elements + // for a certain tensor + if (tensor_node.children(i).numel() > 0) { + memcpy( + nt_buffer.data_ptr() + start_offsets[i], + tensor_node.children(i).data_ptr(), + tensor_node.children(i).numel() * sizeof(scalar_t)); + } + } + }); + }); + long sizes_offset = 0; + for (size_t i = 0; i < tensor_node.degree(); ++i) { + auto tensor_sizes = tensor_node.children(i).sizes(); + for (size_t j = 0; j < tensor_sizes.size(); ++j) { + nt_sizes.data_ptr()[sizes_offset++] = tensor_sizes[j]; + } + } + options = nt_buffer.options().merge_in(options_); + } else { // Slow path + std::vector flat_tensors; + std::vector sizes; + for (const auto i : c10::irange(tensor_node.degree())) { + flat_tensors.push_back(tensor_node.children(i).reshape(-1).contiguous()); + sizes.push_back( + tensor(c10::IntArrayRef(tensor_node.children(i).sizes()))); + } + options = flat_tensors[0].options().merge_in(options_); + nt_buffer = at::cat(flat_tensors); + 
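// Slow path, taken when the constituents are not all contiguous CPU tensors of
// a single dtype: each constituent was flattened to 1-D and concatenated into
// one buffer above; its shape is recorded as a row of the nested size matrix,
// which is stacked below and handed to wrap_buffer together with the buffer.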
nt_sizes = at::native::stack(sizes); + } + + return wrap_buffer(nt_buffer.to(options), nt_sizes); +} + +} // namespace impl + +// This function is meant to ease rapid operator coverage for +// NestedTensor kernels. It is not meant to be efficient. Use it judiciously. +template +inline at::Tensor map_nested_tensor(F&& fn, A... a) { + return wrap_tensor_node( + impl::map(std::forward(fn), impl::get_nested_tensor_structure(a)...), + c10::nullopt, + c10::nullopt, + c10::nullopt, + c10::nullopt); +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu b/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu new file mode 100644 index 000000000000..678e62f5a81c --- /dev/null +++ b/aten/src/ATen/native/nested/cuda/NestedTensorBinaryOps.cu @@ -0,0 +1,120 @@ +#include + +#include + +#include +#include + +#include +#include +#include +#include +#include + +#include +#include + + +#include + +#define BLOCK_DIM 256 + +namespace at { +namespace native { + + +// only for nested [B, *, D], dense [B, 1, D] +template +__global__ void op_dense_esuhm( + const T* input, + const T* dense, + T* output, + int64_t embedding_dim, + const int64_t* offsets, + const func_t& f) +{ + // each batch is handled by a block + const int64_t batch_idx = blockIdx.x; + const int64_t grain_size = blockDim.x; + const int64_t tid = threadIdx.x; + const int64_t range = offsets[batch_idx + 1] - offsets[batch_idx]; + // each thread handles (embedding_dim // grain_size + (embedding_dim % grain_size <= tid)) elems + // of the dense embedding + for (int64_t idx = tid; idx < embedding_dim; idx += grain_size) { + const T dense_elem = dense[batch_idx * embedding_dim + idx]; + for (int64_t nested_idx = idx; nested_idx < range; nested_idx += embedding_dim) { + output[offsets[batch_idx] + nested_idx] = f(input[offsets[batch_idx] + nested_idx], dense_elem); + } + } +} + +template +void nested_op_dense_kernelLauncher( + const T* input, // [sum(*) x embedding_dim] + const T* dense, // [batch_size x embedding_dim] + T* output, // [sum(*) x embedding_dim] + int64_t batch_size, + int64_t embedding_dim, + const int64_t* input_offsets, // [batch_size] + func_t f) +{ + dim3 grid; + grid.x = batch_size; + const auto stream = at::cuda::getDefaultCUDAStream(); + + op_dense_esuhm<<>>( + input, + dense, + output, + embedding_dim, + input_offsets, + f); +} + +template +void _nested_op_dense_esuhm_kernel(Tensor& result, const Tensor& self, const Tensor& other, func_t f) { + auto self_ptr = get_nested_tensor_impl(self); + auto result_ptr = get_nested_tensor_impl(result); + + const auto self_buffer = self_ptr->get_buffer(); + const auto offsets = self_ptr->get_storage_offsets(); + const auto batch_size = other.size(0); + const auto embedding_size = other.size(2); + + auto result_buffer = result_ptr->get_buffer(); + auto result_offsets = at::cat({at::tensor(offsets), at::tensor(self_ptr->numel())}); + result_offsets = result_offsets.to(kCUDA); + + const scalar_t* self_data_ptr = self_buffer.data_ptr(); + const scalar_t* other_data_ptr = other.data_ptr(); + scalar_t* result_data_ptr = result_buffer.data_ptr(); + int64_t* result_offsets_ptr = result_offsets.data_ptr(); + + nested_op_dense_kernelLauncher( + self_data_ptr, + other_data_ptr, + result_data_ptr, + batch_size, + embedding_size, + result_offsets_ptr, + f); +} + +void _nested_op_dense_esuhm_cuda(Tensor& result, const Tensor& self, const Tensor& other, const NESTED_DENSE_OP& op) { + AT_DISPATCH_ALL_TYPES_AND2( + ScalarType::Half, 
ScalarType::BFloat16, self.scalar_type(), "_nested_op_dense_esuhm", [&]() { + switch (op) { + case NESTED_DENSE_OP::ADD : + _nested_op_dense_esuhm_kernel(result, self, other, [] __host__ __device__ (scalar_t a, scalar_t b) -> scalar_t { return a + b; }); + break; + case NESTED_DENSE_OP::MUL : + _nested_op_dense_esuhm_kernel(result, self, other, [] __host__ __device__ (scalar_t a, scalar_t b) -> scalar_t { return a * b; }); + break; + } + }); +} + +REGISTER_CUDA_DISPATCH(nested_dense_elementwise_stub, &_nested_op_dense_esuhm_cuda); + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu b/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu new file mode 100644 index 000000000000..22cf38f85020 --- /dev/null +++ b/aten/src/ATen/native/nested/cuda/NestedTensorMatmul.cu @@ -0,0 +1,416 @@ +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include + +#ifndef USE_ROCM +#ifndef _WIN32 +#include +#include +#include +#endif +#endif + +#include + +#define BLOCK_DIM 256 +#define GRID_DIM_Y 16 + +namespace at { +namespace native { + +#ifndef USE_ROCM +#ifndef _WIN32 +namespace { + +template < + typename scalar_t, + unsigned int kPad, + typename LayoutA, + typename LayoutB, + typename OpClass, + typename Arch, + typename ThreadBlockShape, + typename WarpShape, + typename InstructionShape> +void gemm_grouped_cuda_internal( + const std::vector& lda, + const std::vector& ldb, + const std::vector& ldd, + const std::vector& aptr, + const std::vector& bptr, + const std::vector& dptr, + const std::vector& gemm_sizes, + const int problem_count, + at::Device& device) { + using Element = scalar_t; + using ElementAcc = float; + + using GemmConfiguration = + typename cutlass::gemm::device::DefaultGemmConfiguration< + OpClass, + Arch, + Element, + Element, + Element, + ElementAcc>; + + using GemmKernel = typename cutlass::gemm::kernel::DefaultGemmGrouped< + Element, + LayoutA, + cutlass::ComplexTransform::kNone, + kPad, + Element, + LayoutB, + cutlass::ComplexTransform::kNone, + kPad, + Element, + cutlass::layout::RowMajor, + ElementAcc, + OpClass, + Arch, + ThreadBlockShape, + WarpShape, + InstructionShape, + typename GemmConfiguration::EpilogueOutputOp, + cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle, + GemmConfiguration::kStages>::GemmKernel; + + using GemmGrouped = typename cutlass::gemm::device::GemmGrouped; + using EpilogueOutputOp = typename GemmGrouped::GemmKernel::Epilogue::OutputOp; + typename EpilogueOutputOp::Params epilogue_op(/*alpha*/ 1, /*beta*/ 0); + + const int64_t gemm_coord_size = + problem_count * ((int64_t)sizeof(cutlass::gemm::GemmCoord)); + // Number of gmm args not including *problem_sizes + at::Tensor gmm_args = at::empty( + {problem_count * 6 + gemm_coord_size}, + at::TensorOptions().dtype(at::kLong).pinned_memory(true)); + + // Obtain pointers for each argument (on host) + int64_t* lda_data = gmm_args.data_ptr(); // Base pointer + int64_t* ldb_data = lda_data + problem_count; + int64_t* ldd_data = lda_data + 2 * problem_count; + int64_t* ptr_a_data = lda_data + 3 * problem_count; + int64_t* ptr_b_data = lda_data + 4 * problem_count; + int64_t* ptr_d_data = lda_data + 5 * problem_count; + cutlass::gemm::GemmCoord* problem_sizes_data = + reinterpret_cast(lda_data + 6 * problem_count); + + // Set arguments into gmm_args from input args + for (int i = 0; i < problem_count; ++i) { + problem_sizes_data[i] = gemm_sizes[i]; + 
lda_data[i] = lda[i]; + ldb_data[i] = ldb[i]; + ldd_data[i] = ldd[i]; + ptr_a_data[i] = reinterpret_cast(aptr[i]); + ptr_b_data[i] = reinterpret_cast(bptr[i]); + ptr_d_data[i] = reinterpret_cast(dptr[i]); + } + const int threadblock_count = + GemmGrouped::sufficient(problem_sizes_data, problem_count); + + // Transfer arguments to GPU + gmm_args = gmm_args.to(device, true); + + // Obtain pointers for each of arguments (on GPU) + lda_data = gmm_args.data_ptr(); // Base pointer + ldb_data = lda_data + problem_count; + ldd_data = lda_data + 2 * problem_count; + ptr_a_data = lda_data + 3 * problem_count; + ptr_b_data = lda_data + 4 * problem_count; + ptr_d_data = lda_data + 5 * problem_count; + problem_sizes_data = + reinterpret_cast(lda_data + 6 * problem_count); + + // Create GemmGrouped::Arguments using the arguments prepared above + typename GemmGrouped::Arguments args( + problem_sizes_data, + problem_count, + threadblock_count, + epilogue_op, + reinterpret_cast(ptr_a_data), + reinterpret_cast(ptr_b_data), + reinterpret_cast(ptr_d_data), + reinterpret_cast(ptr_d_data), + lda_data, + ldb_data, + ldd_data, + ldd_data); + + GemmGrouped gemm; + cutlass::Status status = + gemm.initialize(args, nullptr, at::cuda::getCurrentCUDAStream()); + TORCH_CHECK( + status != cutlass::Status::kErrorWorkspaceNull, + "Failed to initialize CUTLASS Grouped GEMM kernel due to workspace."); + TORCH_CHECK( + status != cutlass::Status::kErrorInternal, + "Failed to initialize CUTLASS Grouped GEMM kernel due to internal error."); + TORCH_CHECK( + status == cutlass::Status::kSuccess, + "Failed to initialize CUTLASS Grouped GEMM kernel."); + + // Run CUTLASS group GEMM + status = gemm.run(at::cuda::getCurrentCUDAStream()); + TORCH_CHECK( + status == cutlass::Status::kSuccess, + "Failed to run CUTLASS Grouped GEMM kernel."); + + C10_CUDA_KERNEL_LAUNCH_CHECK(); +} + +template +bool group_gemm_dispatch( + at::Device device, + const std::vector& aptr, + const std::vector& bptr, + const std::vector& dptr, + const std::vector& lda, + const std::vector& ldb, + const std::vector& ldd, + std::vector gemm_sizes, + int64_t ntensors) { + return false; +} + +template <> +bool group_gemm_dispatch( + at::Device device, + const std::vector& aptr, + const std::vector& bptr, + const std::vector& dptr, + const std::vector& lda, + const std::vector& ldb, + const std::vector& ldd, + std::vector gemm_sizes, + int64_t ntensors) { + + gemm_grouped_cuda_internal< + float, + 1, + cutlass::layout::RowMajor, + cutlass::layout::RowMajor, + cutlass::arch::OpClassSimt, + cutlass::arch::Sm80, + cutlass::gemm::GemmShape<128, 128, 8>, + cutlass::gemm::GemmShape<64, 32, 8>, + cutlass::gemm::GemmShape<1, 1, 1>>( + lda, ldb, ldd, aptr, bptr, dptr, gemm_sizes, ntensors, device); + return true; +} + +template <> +bool group_gemm_dispatch( + at::Device device, + const std::vector& aptr_, + const std::vector& bptr_, + const std::vector& dptr_, + const std::vector& lda, + const std::vector& ldb, + const std::vector& ldd, + std::vector gemm_sizes, + int64_t ntensors) { + + // Check alignment + bool all_pad_8 = true; + for (int i = 0; i < ntensors; i++) { + all_pad_8 = all_pad_8 && (gemm_sizes[i].n() % 8 == 0); + all_pad_8 = all_pad_8 && (gemm_sizes[i].k() % 8 == 0); + + // Not sure if this is a requirement, on the safe side + all_pad_8 = all_pad_8 && (lda[i] % 8 == 0); + all_pad_8 = all_pad_8 && (ldb[i] % 8 == 0); + all_pad_8 = all_pad_8 && (ldd[i] % 8 == 0); + } + + std::vector aptr; + std::vector bptr; + std::vector dptr; + for (int64_t i = 0; i < ntensors; 
i++) { + aptr.push_back(reinterpret_cast(aptr_[i])); + bptr.push_back(reinterpret_cast(bptr_[i])); + dptr.push_back(reinterpret_cast(dptr_[i])); + } + if (all_pad_8) { + gemm_grouped_cuda_internal< + cutlass::half_t, + 8, + cutlass::layout::RowMajor, + cutlass::layout::RowMajor, + cutlass::arch::OpClassTensorOp, + cutlass::arch::Sm80, + cutlass::gemm::GemmShape<128, 128, 32>, + cutlass::gemm::GemmShape<64, 64, 32>, + cutlass::gemm::GemmShape<16, 8, 16>>( + lda, ldb, ldd, aptr, bptr, dptr, gemm_sizes, ntensors, device); + return true; + } else { + gemm_grouped_cuda_internal< + cutlass::half_t, + 1, + cutlass::layout::RowMajor, + cutlass::layout::RowMajor, + cutlass::arch::OpClassSimt, + cutlass::arch::Sm80, + cutlass::gemm::GemmShape<128, 128, 8>, + cutlass::gemm::GemmShape<64, 32, 8>, + cutlass::gemm::GemmShape<1, 1, 1>>( + lda, ldb, ldd, aptr, bptr, dptr, gemm_sizes, ntensors, device); + return true; + } + // Did not perform GEMM + return false; +} + +} // namespace + +#endif +#endif + +Tensor bmm_nested_cuda(const Tensor& self, const Tensor& mat2) { + if (self.is_nested() && !mat2.is_nested()) { + AT_ERROR( + "Expected both to be nested, but got a nested self and non-nested other"); + } else if (!self.is_nested() && mat2.is_nested()) { + AT_ERROR( + "Expected both to be nested, but got a non-nested self and nested other"); + } + // dispatcher should have guaranteed that at least one is nested + auto self_ptr = get_nested_tensor_impl(self); + auto mat2_ptr = get_nested_tensor_impl(mat2); + TORCH_CHECK(self_ptr->dim() == 3, "batch1 must be a 3D tensor"); + TORCH_CHECK(mat2_ptr->dim() == 3, "batch2 must be a 3D tensor"); + int64_t ntensors = self_ptr->size(0), ntensors2 = mat2_ptr->size(0); + TORCH_CHECK( + ntensors == ntensors2, + "Expected size for the 1st dimension of batch2 tensor to be: ", + ntensors, + " but got: ", + ntensors2, + "."); + + // create a contiguous output + const Tensor& self_sizemat = self_ptr->get_nested_size_tensor(); + Tensor out_sizemat = self_sizemat.new_empty(self_sizemat.sizes()); + int64_t* out_sizemat_ptr = out_sizemat.data_ptr(); + + std::vector self_sizes = NestedTensor_get_sizes(self_ptr); + std::vector mat2_sizes = NestedTensor_get_sizes(mat2_ptr); + + int64_t out_numel = 0; + for (int64_t i = 0; i < ntensors; i++) { + const IntArrayRef &self_shape = self_sizes[i], &mat2_shape = mat2_sizes[i]; + const int64_t &self_size0 = self_shape[0], &self_size1 = self_shape[1], + &mat2_size0 = mat2_shape[0], &mat2_size1 = mat2_shape[1]; + TORCH_CHECK( + self_size1 == mat2_size0, + i, + "-th nested matrices in batch cannot be multiplied (", + self_size0, + "x", + self_size1, + " and ", + mat2_size0, + "x", + mat2_size1, + ")"); + out_sizemat_ptr[0] = self_size0; + out_sizemat_ptr[1] = mat2_size1; + out_sizemat_ptr += 2; + out_numel += self_size0 * mat2_size1; + } + const Tensor &self_buffer = self_ptr->get_unsafe_storage_as_tensor(); + const Tensor &mat2_buffer = mat2_ptr->get_unsafe_storage_as_tensor(); + Tensor out_buffer = self_buffer.new_empty(out_numel); + Tensor output = wrap_buffer(out_buffer, out_sizemat); + auto out_ptr = get_nested_tensor_impl(output); + + std::vector self_strides = NestedTensor_get_strides(self_ptr); + std::vector mat2_strides = NestedTensor_get_strides(mat2_ptr); + const std::vector& self_offsets = self_ptr->get_storage_offsets(); + const std::vector& mat2_offsets = mat2_ptr->get_storage_offsets(); + const std::vector& out_offsets = out_ptr->get_storage_offsets(); + +#ifndef USE_ROCM +#ifndef _WIN32 + bool success = false; + 
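// The dispatch block below attempts to run all ntensors matrix multiplications
// as one CUTLASS grouped GEMM (fp32 via a SIMT kernel; fp16 via tensor cores
// when every problem size and leading dimension is a multiple of 8, otherwise
// SIMT). The fast path is only taken when both inputs are contiguous, every
// constituent is row-major (innermost stride 1), and the device is sm8x; in all
// other cases, or on ROCm/Windows builds, the code falls back to a
// per-constituent at::mm_out over strided views of the buffers.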
AT_DISPATCH_FLOATING_TYPES_AND_HALF( + self.scalar_type(), "group_gemm_dispatch", [&] { + std::vector aptr(ntensors); + std::vector bptr(ntensors); + std::vector dptr(ntensors); + std::vector lda(ntensors); + std::vector ldb(ntensors); + std::vector ldd(ntensors); + std::vector gemm_sizes; + bool all_row_major = true; + for (int64_t i = 0; i < ntensors; i++) { + const IntArrayRef& self_shape = self_sizes[i]; + const IntArrayRef& mat2_shape = mat2_sizes[i]; + const int64_t &self_size0 = self_shape[0]; + const int64_t &self_size1 = self_shape[1]; + const int64_t &mat2_size0 = mat2_shape[0]; + const int64_t &mat2_size1 = mat2_shape[1]; + gemm_sizes.push_back( + cutlass::gemm::GemmCoord(self_size0, mat2_size1, self_size1)); + aptr[i] = self_buffer.data_ptr() + self_offsets[i]; + bptr[i] = mat2_buffer.data_ptr() + mat2_offsets[i]; + dptr[i] = out_buffer.data_ptr() + out_offsets[i]; + all_row_major = all_row_major && (self_strides[i][1] == 1); + all_row_major = all_row_major && (mat2_strides[i][1] == 1); + lda[i] = self_strides[i][0]; + ldb[i] = mat2_strides[i][0]; + ldd[i] = mat2_size1; + } + auto dprops = at::cuda::getCurrentDeviceProperties(); + bool is_sm8x = dprops->major == 8 && dprops->minor >= 0; + if (all_row_major && + self.is_contiguous() && + mat2.is_contiguous() && + is_sm8x) { + success = group_gemm_dispatch( + output.device(), + aptr, + bptr, + dptr, + lda, + ldb, + ldd, + gemm_sizes, + ntensors); + } + }); + if (success) { + return output; + } +#endif +#endif + + std::vector output_unbind = output.unbind(); + for (int64_t i = 0; i < ntensors; i++) { + at::mm_out( + output_unbind[i], + self_buffer.as_strided(self_sizes[i], self_strides[i], self_offsets[i]), + mat2_buffer.as_strided( + mat2_sizes[i], mat2_strides[i], mat2_offsets[i])); + } + return output; +} + +} // namespace native +} // namespace at diff --git a/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp b/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp index d89e5c5763d7..9c72454560d3 100644 --- a/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp +++ b/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cpp @@ -1,7 +1,11 @@ +#include +#include #include +#include #include #include +#include #ifndef AT_PER_OPERATOR_HEADERS #include @@ -9,9 +13,13 @@ #include #endif +#include #include #include +#include +#include +#include namespace at { namespace native { namespace { @@ -36,16 +44,16 @@ Tensor nested_from_padded_cuda( const Tensor& sizes, bool do_transform_0213) { if (padded.dim() > 1 && padded.dim() < 5) { + // Instead of erroring call the generic version + if(!(padded.dim() == 4 && do_transform_0213) && !(padded.dim() == 3 && !do_transform_0213)){ + return at::native::nested_from_padded_generic(padded, sizes, do_transform_0213); + } if (padded.dtype() != kFloat && padded.dtype() != kHalf) { TORCH_WARN_ONCE( "nested_from_padded CUDA kernels only support fp32/fp16; falling " "back to slower generic kernel"); return at::native::nested_from_padded_generic(padded, sizes, do_transform_0213); } - TORCH_CHECK( - (padded.dim() == 4 && do_transform_0213) || - (padded.dim() == 3 && !do_transform_0213), - "padded tensor size error"); Tensor target_offsets = NestedTensor_batch_offsets_from_size_tensor(sizes, 0); Tensor padded_sizes_tensor = at::tensor(padded.sizes()); @@ -60,45 +68,46 @@ Tensor nested_from_padded_cuda( auto input_size_ptr = output_size_ptr + target_size_sizes.numel(); auto offsets_ptr = input_size_ptr + padded_sizes_tensor.numel(); + Tensor 
padded_contiguous = padded.contiguous(); if (padded.dtype() == kFloat) { if (do_transform_0213) { remove_padding_transform0213_kernelLauncher( - padded.data_ptr(), + padded_contiguous.data_ptr(), output.data_ptr(), offsets_ptr, input_size_ptr, output_size_ptr, - padded.dim() - 2, - padded.sizes()[0]); + padded_contiguous.dim() - 2, + padded_contiguous.sizes()[0]); } else { remove_padding_kernelLauncher( - padded.data_ptr(), + padded_contiguous.data_ptr(), output.data_ptr(), offsets_ptr, input_size_ptr, output_size_ptr, - padded.dim() - 1, - padded.sizes()[0]); + padded_contiguous.dim() - 1, + padded_contiguous.sizes()[0]); } } else if (padded.dtype() == kHalf) { if (do_transform_0213) { remove_padding_transform0213_kernelLauncher( - padded.data_ptr(), + padded_contiguous.data_ptr(), output.data_ptr(), offsets_ptr, input_size_ptr, output_size_ptr, - padded.dim() - 2, - padded.sizes()[0]); + padded_contiguous.dim() - 2, + padded_contiguous.sizes()[0]); } else { remove_padding_kernelLauncher( - padded.data_ptr(), + padded_contiguous.data_ptr(), output.data_ptr(), offsets_ptr, input_size_ptr, output_size_ptr, - padded.dim() - 1, - padded.sizes()[0]); + padded_contiguous.dim() - 1, + padded_contiguous.sizes()[0]); } } else { AT_ERROR("Only support fp32/fp16 for padded input"); @@ -143,8 +152,8 @@ Tensor NestedTensor_to_padded_tensor_cuda( if (t_dim == 3 && nt_input->opt_size(2) && (*nt_input->opt_size(2) > 0) && !(output_size.has_value())) { Tensor nt_sizes = nt_input->get_nested_size_tensor(); - Tensor sizes_dim1 = at::native::narrow(nt_sizes, 1, 0, 1); - Tensor sizes_dim2 = at::native::narrow(nt_sizes, 1, 1, 1); + Tensor sizes_dim1 = at::native::narrow_symint(nt_sizes, 1, 0, 1); + Tensor sizes_dim2 = at::native::narrow_symint(nt_sizes, 1, 1, 1); Tensor result = at::detail::make_tensor( nt_input->get_buffer(), sizes_dim1 * sizes_dim2[0]); TORCH_INTERNAL_ASSERT_DEBUG_ONLY(result.dim() == 2); @@ -204,5 +213,342 @@ Tensor NestedTensor_to_padded_tensor_cuda( } return NestedTensor_to_padded_tensor_generic(t, padding, output_size); } + +namespace{ + +/** + * This function is used to calculate two pieces of metadata that are needed + * for use with flash-attention and efficient_attention kernels. They are the + * cumulative sequence_length over a batch of sequences and the maximum sequence + * length. 
+ * + * @return A tuple of the cumulative sequence lengths, the maximum sequence length, + * and the sum of all sequence lengths (the last element of the cumulative sequence lengths) + */ +std::tuple cumulative_and_max_seq_len(Tensor qkv) { + TORCH_CHECK( + qkv.is_nested(), + "QKV must be nested for flash cumulative_seq_len calculation.") + auto* nt_impl = get_nested_tensor_impl(qkv); + const auto& sizes = nt_impl->get_nested_size_tensor(); + auto size_tensor_stride = sizes.stride(0); + + const int64_t batch_size = qkv.size(0); + auto cumulative_seqlen = at::zeros( + {batch_size + 1}, TensorOptions().device(at::kCPU).dtype(at::kInt)); + + auto* sizes_ptr = sizes.data_ptr(); + auto* cumulative_seqlen_ptr = cumulative_seqlen.data_ptr(); + + int32_t sum = 0; + int64_t max_seqlen = -1; + cumulative_seqlen_ptr[0] = sum; + for (const auto i : c10::irange(batch_size)) { + // Calculate the cumulative sum of the sequence lengths + auto current_seq_len = sizes_ptr[(i * size_tensor_stride)]; + sum += current_seq_len; + cumulative_seqlen_ptr[i + 1] = sum; + + // Find the max element while we traverse + max_seqlen = std::max(max_seqlen, current_seq_len); + } + // Send to GPU; this is a pretty lightweight calculation for normal batch sizes, + // but maybe it needs to be done on the GPU + cumulative_seqlen = cumulative_seqlen.to(TensorOptions().device(at::kCUDA)); + return std::tuple{cumulative_seqlen, max_seqlen, sum}; +} + +/** + * This function checks if a nested tensor is valid for + * use with the flash-attention and efficient_attention kernels without + * needing to call contiguous on the nested tensor input. + * It checks that the storage offsets' adjacent differences are a constant multiple + * of the numel of the previous tensor in the nested tensor and that the strides are monotonically decreasing. + * This check is done after calling transpose on the nested tensor. 
+ * + * @return A boolean indicating whether contiguous needs to be called on the input + */ +bool is_safe_to_get_storage_as_tensor(const NestedTensorImpl* tensor) { + const auto& tensor_offsets = tensor->get_storage_offsets(); + const Tensor& tensor_sizes = tensor->get_nested_size_tensor(); + const Tensor& tensor_strides = tensor->get_nested_stride_tensor(); + + const int64_t n_tensors = tensor_strides.size(0); + const int64_t n_dims = tensor_strides.size(1); + + if (n_tensors <= 1) { + return true; + } + + int64_t* previous_tensor_stride = tensor_strides.data_ptr(); + // Check initially that they are in strictly descending order + for (int i{1}; i < n_dims; i++) { + if (previous_tensor_stride[i - 1] <= previous_tensor_stride[i]) { + return false; + } + } + // Check that each tensor i in the nested tensor has the same strides + auto tensor_stride_0 = tensor_strides.stride(0); + + for (int i{1}; i < n_tensors; i++) { + for (const int64_t j : c10::irange(n_dims)) { + if (previous_tensor_stride[j] != + previous_tensor_stride[i * tensor_stride_0 + j]) { + return false; + } + } + } + // Check that the offset deltas are a constant multiple of the previous numels + const int64_t* tensor_size_ptr = tensor_sizes.data_ptr(); + const int64_t* tensor_stride_ptr = tensor_strides.data_ptr(); + + int64_t numel_0 = (tensor_size_ptr[0] * tensor_stride_ptr[0]); + TORCH_INTERNAL_ASSERT(numel_0 > 0, "numels must be positive!"); + + int64_t offset_constant = (tensor_offsets[1] - tensor_offsets[0]) / numel_0; + for (int64_t i = 2; i < n_tensors; i++) { + // TODO: When 0 seq_len nested tensors are allowed we need to guard against this + int64_t previous_numel = tensor_size_ptr[(i - 1) * tensor_stride_0] * tensor_stride_ptr[(i - 1) * tensor_stride_0]; + TORCH_INTERNAL_ASSERT(previous_numel > 0, "numels must be positive!"); + int64_t current_offset_constant = (tensor_offsets[i] - tensor_offsets[i - 1]) / previous_numel; + if (current_offset_constant != offset_constant) { + return false; + } + } + // Congrats you made it! 
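// Worked example of the offset check above (hypothetical layout): three components
// whose size[0] * stride[0] is 6 each, stored at offsets {0, 6, 12}, give
// offset_constant = (6 - 0) / 6 = 1 and (12 - 6) / 6 = 1, so the buffer can be
// viewed directly with as_strided. Offsets {0, 6, 18} would give
// (18 - 6) / 6 = 2 != 1, and the caller falls back to contiguous().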
+ return true; +} + +} // namespace + +std::tuple _scaled_dot_product_flash_attention_nestedtensor_cuda( + const Tensor& query, + const Tensor& key, + const Tensor& value, + double dropout_p, + bool return_softmax, + bool is_causal) { + TORCH_CHECK(false, "There are currently cuda memory errors being returned from this path.") + // Query (Batch x Num_heads x {Q_seq_len} x Dim_per_head) + // Key (Batch x Num_heads x {KV_seq_len} x Dim_per_head) + // Value (Batch x Num_heads x {KV_seq_len} x Dim_per_head) + const int64_t num_heads = query.size(1); + const int64_t head_dim = query.size(3); + + // Query -> Query (Batch x {Q_seq_len} x Num_heads x Dim_per_head) + // Key -> Key (Batch x {KV_seq_len} x Num_heads x Dim_per_head) + // Value -> Value (Batch x {KV_seq_len} x Num_heads x Dim_per_head) + Tensor q_t = query.transpose(1, 2).contiguous(); + Tensor k_t = key.transpose(1, 2).contiguous(); + Tensor v_t = value.transpose(1, 2).contiguous(); + + // K and V have to have the same Nnz, should probably torch_check + // assume in order to not iterate over v + + auto cumulative_and_max_q = cumulative_and_max_seq_len(q_t); + auto cumulative_and_max_k = cumulative_and_max_seq_len(k_t); + + Tensor cumulative_sequence_length_q = std::get<0>(cumulative_and_max_q); + Tensor cumulative_sequence_length_k = std::get<0>(cumulative_and_max_k); + + const int64_t max_seqlen_batch_q = std::get<1>(cumulative_and_max_q); + const int64_t max_seqlen_batch_k = std::get<1>(cumulative_and_max_k); + + const int64_t Nnz_q = cumulative_sequence_length_q[-1].item(); + const int64_t Nnz_kv = cumulative_sequence_length_k[-1].item(); + + auto query_buffer_reshaped = + get_buffer(q_t).view({Nnz_q, num_heads, head_dim}); + auto key_buffer_reshaped = + get_buffer(k_t).view({Nnz_kv, num_heads, head_dim}); + auto value_buffer_reshaped = + get_buffer(v_t).view({Nnz_kv, num_heads, head_dim}); + + auto attention_and_lse_and_softmax = + at::_flash_attention_forward( + query_buffer_reshaped, + key_buffer_reshaped, + value_buffer_reshaped, + cumulative_sequence_length_q, + cumulative_sequence_length_k, + max_seqlen_batch_q, + max_seqlen_batch_k, + return_softmax, + dropout_p, + is_causal); + // Reshape output to convert nnz to batch_size and seq_len + Tensor attention = std::get<0>(attention_and_lse_and_softmax); + attention = wrap_buffer(attention.view(-1), get_nested_size_tensor(q_t).clone()).transpose(1,2); + return std::tie(attention, std::get<1>(attention_and_lse_and_softmax), std::get<2>(attention_and_lse_and_softmax)); +} + +std::tuple _scaled_dot_product_efficient_attention_nestedtensor_cuda( + const Tensor& query, + const Tensor& key, + const Tensor& value, + bool compute_log_sumexp, + bool is_causal) { + // Query (Batch x Num_heads x {Q_seq_len} x Dim_per_head) + // Key (Batch x Num_heads x {KV_seq_len} x Dim_per_head) + // Value (Batch x Num_heads x {KV_seq_len} x Dim_per_head) + const int64_t num_heads = query.size(1); + const int64_t head_dim = query.size(3); + + Tensor q_t = query.transpose(1, 2); + Tensor k_t = key.transpose(1, 2); + Tensor v_t = value.transpose(1, 2); + + auto cumulative_and_max_q_and_nnz_q = cumulative_and_max_seq_len(q_t); + auto cumulative_and_max_k_and_nnz_k = cumulative_and_max_seq_len(k_t); + + // K and V have to have the same Nnz, should probably torch_check + // assume in order to not iterate over v + + Tensor cumulative_sequence_length_q = std::get<0>(cumulative_and_max_q_and_nnz_q); + Tensor cumulative_sequence_length_k = std::get<0>(cumulative_and_max_k_and_nnz_k); + + const int64_t 
max_seqlen_batch_q = std::get<1>(cumulative_and_max_q_and_nnz_q); + + const int64_t Nnz_q = std::get<2>(cumulative_and_max_q_and_nnz_q); + const int64_t Nnz_kv = std::get<2>(cumulative_and_max_k_and_nnz_k); + + Tensor query_buffer_reshaped; + Tensor key_buffer_reshaped; + Tensor value_buffer_reshaped; + + const auto* query_impl = get_nested_tensor_impl(q_t); + const auto* key_impl = get_nested_tensor_impl(k_t); + const auto* value_impl = get_nested_tensor_impl(v_t); + + // If the physical layout of the NestedTensor's storage + // is not: batch, {seq_len}, num_heads, head_dim then we need + // to call contiguous + if (!q_t.is_contiguous() && !is_safe_to_get_storage_as_tensor(query_impl)) { + q_t = q_t.contiguous(); + query_impl = get_nested_tensor_impl(q_t); + } + if (!k_t.is_contiguous() && !is_safe_to_get_storage_as_tensor(key_impl)) { + k_t = k_t.contiguous(); + key_impl = get_nested_tensor_impl(k_t); + } + if (!v_t.is_contiguous() && !is_safe_to_get_storage_as_tensor(value_impl)) { + v_t = v_t.contiguous(); + value_impl = get_nested_tensor_impl(v_t); + } + + Tensor q_storage_as_tensor = + get_nested_tensor_impl(q_t)->get_unsafe_storage_as_tensor(); + Tensor k_storage_as_tensor = + get_nested_tensor_impl(k_t)->get_unsafe_storage_as_tensor(); + Tensor v_storage_as_tensor = + get_nested_tensor_impl(v_t)->get_unsafe_storage_as_tensor(); + + auto query_stride_tensor = query_impl->get_nested_stride_tensor(); + auto key_stride_tensor = key_impl->get_nested_stride_tensor(); + auto value_stride_tensor = value_impl->get_nested_stride_tensor(); + + const int64_t head_dim_stride = 1; + + const int64_t* q_strides = query_stride_tensor.data_ptr(); + const int64_t nnz_q_stride = q_strides[0]; + const int64_t head_q_stride = q_strides[1]; + + const int64_t* k_strides = key_stride_tensor.data_ptr(); + const int64_t nnz_k_stride = k_strides[0]; + const int64_t head_k_stride = k_strides[1]; + + const int64_t* v_strides = value_stride_tensor.data_ptr(); + const int64_t nnz_v_stride = v_strides[0]; + const int64_t head_v_stride = v_strides[1]; + + query_buffer_reshaped = q_storage_as_tensor.as_strided( + {Nnz_q, num_heads, head_dim}, + {nnz_q_stride, head_q_stride, head_dim_stride}, + query_impl->get_storage_offsets()[0]); + key_buffer_reshaped = k_storage_as_tensor.as_strided( + {Nnz_kv, num_heads, head_dim}, + {nnz_k_stride, head_k_stride, head_dim_stride}, + key_impl->get_storage_offsets()[0]); + value_buffer_reshaped = v_storage_as_tensor.as_strided( + {Nnz_kv, num_heads, head_dim}, + {nnz_v_stride, head_v_stride, head_dim_stride}, + value_impl->get_storage_offsets()[0]); + std::tuple attention_and_logsumexp= + at::_efficient_attention_forward( + query_buffer_reshaped.unsqueeze(0), + key_buffer_reshaped.unsqueeze(0), + value_buffer_reshaped.unsqueeze(0), + cumulative_sequence_length_q, + cumulative_sequence_length_k, + max_seqlen_batch_q, + compute_log_sumexp, + is_causal); + // Reshape output to convert nnz to batch_size and seq_len + Tensor attention = std::get<0>(attention_and_logsumexp); + attention = + wrap_buffer(attention.view(-1), get_nested_size_tensor(q_t).clone()) + .transpose(1, 2); + return std::tie(attention, std::get<1>(attention_and_logsumexp)); +} + +Tensor flash_attention_helper( + const Tensor& query, + const Tensor& key, + const Tensor& value, + double dropout_p, + bool need_atten_weights, + bool is_causal) { + // Query is of size (batch_size x ragged_seq_len x (3 or 1) x n_heads x + // head_did + int64_t head_dim{query.size(-1)}; + int64_t num_heads{query.size(-2)}; + + auto 
cumulative_and_max_q_and_nnz_q = cumulative_and_max_seq_len(query); + Tensor cumulative_sequence_length_q = std::get<0>(cumulative_and_max_q_and_nnz_q); + int64_t max_seqlen_batch_q = std::get<1>(cumulative_and_max_q_and_nnz_q); + + TORCH_CHECK( + query.is_same(key) && query.is_same(value), + "Key and Value must be the same tensor"); + + int64_t Nnz_q = std::get<2>(cumulative_and_max_q_and_nnz_q); + + // For the packed case we need to set the output size for dim 2 to 1 + auto atten_size = get_nested_size_tensor(query).clone(); + atten_size.index({at::indexing::Slice(), 1}) = 1; + + auto qkv_buffer_reshaped = get_buffer(query) + .view({Nnz_q, 3, num_heads, head_dim}) + .transpose(0, 1) + .contiguous(); + + auto q = qkv_buffer_reshaped[0]; + auto k = qkv_buffer_reshaped[1]; + auto v = qkv_buffer_reshaped[2]; + + TORCH_CHECK(q.is_contiguous()); + TORCH_CHECK(k.is_contiguous()); + TORCH_CHECK(v.is_contiguous()); + + // If we are passing in query, key, value all the same tensors then we have + // packed them into one tensor and need to slice for flash attention + Tensor attention = + std::get<0>(at::_flash_attention_forward( + q, + k, + v, + cumulative_sequence_length_q, + cumulative_sequence_length_q, + max_seqlen_batch_q, + max_seqlen_batch_q, + false /*return_softmax*/, + dropout_p, + is_causal)); + // Output of flash_attention is a regular tensor; let's wrap it back up to + // form a nested tensor + + return wrap_buffer(attention.view(-1), atten_size); +} + } // namespace native } // namespace at diff --git a/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu b/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu index e8eb164bf4e7..56cac2a89803 100644 --- a/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu +++ b/aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctions.cu @@ -15,6 +15,17 @@ #include #include +#include + +#ifndef USE_ROCM +#ifndef _WIN32 +#include +#include +#include +#endif +#endif + +#include #define BLOCK_DIM 256 #define GRID_DIM_Y 16 @@ -146,7 +157,7 @@ void remove_padding_kernelLauncher( dim3 grid; grid.x = batch_size; grid.y = GRID_DIM_Y; - at::cuda::CUDAStream stream = at::cuda::getDefaultCUDAStream(); + at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); if (output_dim == 2) { remove_padding_2<<>>( input, @@ -180,7 +191,7 @@ void remove_padding_transform0213_kernelLauncher( dim3 grid; grid.x = batch_size; grid.y = GRID_DIM_Y; - at::cuda::CUDAStream stream = at::cuda::getDefaultCUDAStream(); + at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); TORCH_CHECK( output_dim == 2, "remove padding transform0213 only support output dim == 2"); @@ -338,7 +349,8 @@ __global__ void add_padding_3( const int i0 = i / (output_sizes_2 * output_sizes_3); const int i1 = (i % (output_sizes_2 * output_sizes_3)) / output_sizes_3; const int i2 = i % output_sizes_3; - if (batch_id < batch_size && i0 < sizes_i[0] && i1 < sizes_i[1] && i2 < sizes_i[2]) { + if (batch_id < batch_size && i0 < sizes_i[0] && i1 < sizes_i[1] && + i2 < sizes_i[2]) { const int offset = offsets[batch_id]; const int input_offset = offset + i0 * (sizes_i[1] * sizes_i[2]) + i1 * sizes_i[2] + i2; @@ -352,7 +364,8 @@ const int i0 = i / (output_sizes_2 * output_sizes_3); const int i1 = (i % (output_sizes_2 * output_sizes_3)) / output_sizes_3; const int i2 = i % output_sizes_3; - if (batch_id < batch_size && i0 < sizes_i[0] && 
i1 < sizes_i[1] && + i2 < sizes_i[2]) { const int offset = offsets[batch_id]; const int input_offset = offset + i0 * (sizes_i[1] * sizes_i[2]) + i1 * sizes_i[2] + i2; @@ -374,7 +387,7 @@ void add_padding_kernelLauncher( const std::vector& output_sizes, const int batch_size, const int output_batch_size) { - at::cuda::CUDAStream stream = at::cuda::getDefaultCUDAStream(); + at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream(); dim3 grid; grid.x = output_batch_size; grid.y = GRID_DIM_Y; diff --git a/aten/src/ATen/native/prim_native_functions.cpp b/aten/src/ATen/native/prim_native_functions.cpp index 8f82345c1905..4e79c112d7fc 100644 --- a/aten/src/ATen/native/prim_native_functions.cpp +++ b/aten/src/ATen/native/prim_native_functions.cpp @@ -1,4 +1,11 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/AffineQuantizer.cpp b/aten/src/ATen/native/quantized/AffineQuantizer.cpp index e2fa8f65adc6..dbda6ebd5f90 100644 --- a/aten/src/ATen/native/quantized/AffineQuantizer.cpp +++ b/aten/src/ATen/native/quantized/AffineQuantizer.cpp @@ -97,6 +97,21 @@ void checkSameSize( " only works with Tensors with the same shape"); } +void checkPerChannelParamsSize( + const Tensor& rtensor, + int64_t axis, + const Tensor& scales, + const Tensor& zero_points +) { + int64_t channel = rtensor.size(axis); + TORCH_CHECK( + channel == int64_t(scales.numel()), + "length of scales must equal to channel, expected ", channel, " got, ", scales.numel()); + TORCH_CHECK( + channel == int64_t(zero_points.numel()), + "length of zero_points must equal to channel expected ", channel, " got, ", zero_points.numel()); +} + } // anonymous namespace Tensor& quantize_tensor_per_tensor_affine( @@ -156,13 +171,7 @@ Tensor& quantize_tensor_per_channel_affine( "Expected: [0, ", rtensor.dim(), ")"); - int64_t channel = rtensor.size(axis); - TORCH_CHECK( - channel == int64_t(scales.numel()), - "length of scales must equal to channel"); - TORCH_CHECK( - channel == int64_t(zero_points.numel()), - "length of zero_points must equal to channel"); + checkPerChannelParamsSize(rtensor, axis, scales, zero_points); quantize_tensor_per_channel_affine_stub( rtensor.device().type(), rtensor, qtensor, scales, zero_points, axis); @@ -195,13 +204,7 @@ Tensor& quantize_tensor_per_channel_float_qparams( "Expected: [0, ", rtensor.dim(), ")"); - int64_t channel = rtensor.size(axis); - TORCH_CHECK( - channel == int64_t(scales.numel()), - "length of scales must equal to channel"); - TORCH_CHECK( - channel == int64_t(zero_points.numel()), - "length of zero_points must equal to channel"); + checkPerChannelParamsSize(rtensor, axis, scales, zero_points); quantize_tensor_per_channel_float_qparams_stub( rtensor.device().type(), rtensor, qtensor, scales, zero_points, axis); @@ -260,13 +263,7 @@ Tensor& dequantize_tensor_per_channel_affine( " Expected: [0, ", qtensor.dim(), ")"); - int64_t channel = qtensor.size(axis); - TORCH_CHECK( - channel == int64_t(scales.numel()), - "length of scales must equal to channel"); - TORCH_CHECK( - channel == int64_t(zero_points.numel()), - "length of zero_points must equal to channel"); + checkPerChannelParamsSize(rtensor, axis, scales, zero_points); dequantize_tensor_per_channel_affine_stub( qtensor.device().type(), qtensor, rtensor, scales, zero_points, axis); @@ -297,13 +294,7 @@ Tensor& dequantize_tensor_per_channel_float_qparams( " Expected: [0, ", qtensor.dim(), 
")"); - int64_t channel = qtensor.size(axis); - TORCH_CHECK( - channel == int64_t(scales.numel()), - "length of scales must equal to channel"); - TORCH_CHECK( - channel == int64_t(zero_points.numel()), - "length of zero_points must equal to channel"); + checkPerChannelParamsSize(rtensor, axis, scales, zero_points); dequantize_tensor_per_channel_float_qparams_stub( qtensor.device().type(), qtensor, rtensor, scales, zero_points, axis); diff --git a/aten/src/ATen/native/quantized/AffineQuantizer.h b/aten/src/ATen/native/quantized/AffineQuantizer.h index cd39e3424066..1ff342a643c3 100644 --- a/aten/src/ATen/native/quantized/AffineQuantizer.h +++ b/aten/src/ATen/native/quantized/AffineQuantizer.h @@ -1,6 +1,7 @@ #pragma once -#include +#include +#include #include #include diff --git a/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp b/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp index e40f8ef1fdb0..5d02d9e04ed7 100644 --- a/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp +++ b/aten/src/ATen/native/quantized/AffineQuantizerBase.cpp @@ -71,6 +71,33 @@ void quantize_vec( (float)scale, (int32_t)zero_point, precision}); } +#if defined(__ARM_NEON__) || defined(__aarch64__) +// For use when compiling FBGEMM on aarch64 but still supporting x86 +// intrinsics via simde +template +T quantize_val_arm( + const float scale, + const int32_t zero_point, + const float value) { + constexpr int32_t qmin = std::numeric_limits::min(); + constexpr int32_t qmax = std::numeric_limits::max(); + float inv_scale = 1.0f / scale; + auto r = zero_point + static_cast(std::nearbyint(value * inv_scale)); + r = std::max(r, qmin); + r = std::min(r, qmax); + return static_cast(r); +} + +template uint8_t quantize_val_arm( + const float scale, + const int32_t zero_point, + const float value); +template int8_t quantize_val_arm( + const float scale, + const int32_t zero_point, + const float value); +#endif + template inline float dequantize_val(double scale, int64_t zero_point, T value) { // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) diff --git a/aten/src/ATen/native/quantized/FakeQuantAffine.h b/aten/src/ATen/native/quantized/FakeQuantAffine.h index 3b1dbf608c13..1fb7cfbb0e72 100644 --- a/aten/src/ATen/native/quantized/FakeQuantAffine.h +++ b/aten/src/ATen/native/quantized/FakeQuantAffine.h @@ -1,6 +1,7 @@ #pragma once -#include +#include +#include #include namespace at { diff --git a/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp b/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp index 700b3b14b180..aac039f0e03e 100644 --- a/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp +++ b/aten/src/ATen/native/quantized/FakeQuantPerTensorAffine.cpp @@ -122,10 +122,10 @@ Tensor fake_quantize_per_tensor_affine_cachemask_backward( const Tensor& dY, const Tensor& mask) { TORCH_CHECK(mask.scalar_type() == ScalarType::Bool); - TORCH_CHECK(mask.numel() == dY.numel(), + TORCH_CHECK(mask.sym_numel() == dY.sym_numel(), "`mask` and `dY` are not the same size: ", - "`mask` is size ", mask.numel(), " and `dY` is size ", dY.numel()); - if (dY.numel() <= 0) { + "`mask` is size ", mask.sym_numel(), " and `dY` is size ", dY.sym_numel()); + if (dY.sym_numel() <= 0) { return dY; } // Note: no additional kernels needed, since mask is pre-computed diff --git a/aten/src/ATen/native/quantized/IndexKernel.h b/aten/src/ATen/native/quantized/IndexKernel.h index 69f12472bea0..0e240b5a8e9a 100644 --- a/aten/src/ATen/native/quantized/IndexKernel.h +++ b/aten/src/ATen/native/quantized/IndexKernel.h @@ 
-5,9 +5,10 @@ namespace at { namespace native { using masked_fill_kernel_quantized_fn = void(*)(TensorIterator& iter, const Scalar& value, double scale, int zero_point); using index_put_kernel_quantized_fn = void(*)(TensorIterator& iter, IntArrayRef index_size, IntArrayRef index_stride, bool accumulate, double scale, int zero_point); + DECLARE_DISPATCH(masked_fill_kernel_quantized_fn, masked_fill_kernel_quantized_stub); DECLARE_DISPATCH(index_put_kernel_quantized_fn, index_put_kernel_quantized_stub); -// TODO: implement index_put_kernel_quantized_cuda in cuda/IndexKernel.cu and put CUDA kernel in a stub + } // native } // at diff --git a/aten/src/ATen/native/quantized/PackedParams.h b/aten/src/ATen/native/quantized/PackedParams.h index 64d8ec840c46..179fcce23dfe 100644 --- a/aten/src/ATen/native/quantized/PackedParams.h +++ b/aten/src/ATen/native/quantized/PackedParams.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include struct LinearPackedParamsBase : public torch::jit::CustomClassHolder { diff --git a/aten/src/ATen/native/quantized/QTensor.cpp b/aten/src/ATen/native/quantized/QTensor.cpp index b2737f578fbf..b3ff8bd8b327 100644 --- a/aten/src/ATen/native/quantized/QTensor.cpp +++ b/aten/src/ATen/native/quantized/QTensor.cpp @@ -330,6 +330,12 @@ std::tuple choose_qparams_optimized( const double ratio, int64_t bit_width) { + if (numel < 0 || numel > input_tensor.numel()) { + TORCH_CHECK(false, "numel is out of the bound of input tensor"); + } + + TORCH_CHECK(numel <= input_tensor.numel(), "numel ", numel, + " greater than input_tensor.numel() ", input_tensor.numel()); const float* input_row = input_tensor.data_ptr(); float xmin = *std::min_element(input_row, input_row + numel); float xmax = *std::max_element(input_row, input_row + numel); diff --git a/aten/src/ATen/native/quantized/README.md b/aten/src/ATen/native/quantized/README.md index 62c4a8a1f9e1..f042881a8ceb 100644 --- a/aten/src/ATen/native/quantized/README.md +++ b/aten/src/ATen/native/quantized/README.md @@ -171,7 +171,8 @@ def quantized_xand(qa, qb): return ops.quantized.xand(qa, qb) ``` -**Note:** If writing new pytorch functions that use quantized kernels, it is strongly encouraged to place them in the `torch/nn/quantized/functional.py`. +**Note:** If writing new pytorch functions that use quantized kernels, +it is strongly encouraged to place them in the `torch/ao/nn/quantized/functional.py`. 
### C++ diff --git a/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp b/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp index 904a8942eed9..4bfa5acaa263 100644 --- a/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp +++ b/aten/src/ATen/native/quantized/TensorAdvancedIndexing.cpp @@ -5,11 +5,13 @@ #include #include #include +#include namespace at { namespace native { DEFINE_DISPATCH(masked_fill_kernel_quantized_stub); DEFINE_DISPATCH(index_put_kernel_quantized_stub); +DEFINE_DISPATCH(index_put_with_sort_quantized_stub); namespace { static TensorIterator make_index_put_iterator(const AdvancedIndex& info, const Tensor& value) { @@ -76,6 +78,51 @@ Tensor & masked_fill__quantized_cpu(Tensor& self, const Tensor & mask, const Ten return self; } +Tensor & masked_fill_impl_quantized_cuda(Tensor& self, const Tensor & mask, const Scalar& value) { + TORCH_CHECK(self.device() == mask.device(), "expected self and mask to be on the same device, but got mask on ", + mask.device(), " and self on ", self.device()); + TORCH_CHECK(mask.scalar_type() == kByte || mask.scalar_type() == kBool, + "expected mask dtype to be Bool but got ", mask.scalar_type()); + TORCH_CHECK(self.qscheme() == c10::kPerTensorAffine, "masked_fill__quantized_cpu for quantized tensors is currently only supported for per tensor quantized tensors"); + + auto maybe_outnames = namedinference::broadcast_to_outnames(self, mask, "masked_fill_"); + + if (at::has_internal_overlap(self) == MemOverlap::Yes) { + TORCH_WARN( + "Use of masked_fill_ on expanded tensors is deprecated. " + "Please clone() the tensor before performing this operation. " + "This also applies to advanced indexing e.g. tensor[mask] = scalar"); + } + at::assert_no_partial_overlap(self, mask); + + c10::MaybeOwned b_mask = expand_inplace(self, mask, "masked_fill_"); + + auto iter = TensorIteratorConfig() + .set_check_mem_overlap(false) + .check_all_same_dtype(false) + .resize_outputs(false) + .add_output(self) + .add_input(self) + .add_input(*b_mask) + .build(); + + masked_fill_kernel_quantized_stub(iter.device_type(), iter, value, self.q_scale(), self.q_zero_point()); + namedinference::propagate_names_if_nonempty(self, maybe_outnames); + return self; +} + +Tensor & masked_fill__quantized_cuda(Tensor& self, const Tensor & mask, const Scalar& value) { + TORCH_CHECK(!self.device().is_cpu(), "masked_fill_: Expected inputs to be on same device") + return masked_fill_impl_quantized_cuda(self, mask, value); +} + +Tensor & masked_fill__quantized_cuda(Tensor& self, const Tensor & mask, const Tensor & value) { + TORCH_CHECK(value.dim() == 0, "masked_fill_ only supports a 0-dimensional value tensor, but got tensor " + "with ", value.dim(), " dimension(s)."); + TORCH_CHECK(!self.device().is_cpu(), "masked_fill_: Expected inputs to be on same device") + return masked_fill_impl_quantized_cuda(self, mask, value.item()); +} + Tensor& _index_put_impl_quantized_cpu_(Tensor & self, const torch::List>& indices, const Tensor & value, const bool accumulate, const bool unsafe) { TORCH_CHECK_INDEX(indices.size() <= (size_t)self.dim(), "too many indices for tensor of dimension ", self.dim(), " (got ", indices.size(), ")"); TORCH_CHECK(!value.is_quantized(), "Value argument for quantized input_put should not be quantized"); @@ -112,5 +159,49 @@ Tensor& _index_put_impl_quantized_cpu_(Tensor & self, const torch::List>& indices, const Tensor & value, const bool accumulate, const bool unsafe) { + TORCH_CHECK_INDEX(indices.size() <= (size_t)self.dim(), "too many indices for tensor 
of dimension ", self.dim(), " (got ", indices.size(), ")"); + TORCH_CHECK(!value.is_quantized(), "Value argument for quantized input_put should not be quantized"); + TORCH_CHECK(self.qscheme() == c10::kPerTensorAffine, "index_put for quantized tensors is currently only supported for per tensor quantized tensors"); + TORCH_CHECK(!accumulate, "index_put for quantized tensors is currently only supported for accumulate=False"); + + if (at::has_internal_overlap(self) == MemOverlap::Yes) { + TORCH_WARN( + "Use of index_put_ on expanded tensors is deprecated. " + "Please clone() the tensor before performing this operation. " + "This also applies to advanced indexing e.g. tensor[indices] = tensor"); + } + + auto masked_fill_dispatch = canDispatchToMaskedFill(self, indices, value); + if (std::get<0>(masked_fill_dispatch)) { + return self.masked_fill_(std::get<1>(masked_fill_dispatch), value.item()); + } + + auto value_ = value; + if (value.device() != self.device() && value.numel() == 1 && value.dim() == 0) { + value_ = value.to(self.device()); + } + TORCH_CHECK(value.device() == self.device(), "expected device ", self.device(), " but got device ", value.device(), " for value tensor"); + + at::assert_no_overlap(self, value); + // NOLINTNEXTLINE(performance-implicit-conversion-in-loop) + for (const c10::optional& index: indices) { + if (index.has_value()) { + at::assert_no_overlap(self, *index); + } + } + + // See Note [Enabling Deterministic Operations] + if (self.device().type() == DeviceType::CUDA && globalContext().deterministicAlgorithms()) { + index_put_with_sort_quantized_stub(self.device().type(), self, indices, value_, self.q_scale(), self.q_zero_point(), unsafe); + return self; + } + + auto info = make_info(self, indices); + auto iter = make_index_put_iterator(info, value_); + index_put_kernel_quantized_stub(iter.device_type(), iter, info.indexed_sizes, info.indexed_strides, accumulate, self.q_scale(), self.q_zero_point()); + return self; +} + } } diff --git a/aten/src/ATen/native/quantized/TensorCompare.cpp b/aten/src/ATen/native/quantized/TensorCompare.cpp index 08a104257f4e..747f8bfe4d30 100644 --- a/aten/src/ATen/native/quantized/TensorCompare.cpp +++ b/aten/src/ATen/native/quantized/TensorCompare.cpp @@ -14,6 +14,19 @@ Tensor max_quantized_cpu(const Tensor& self) { return std::get<0>(self.reshape({-1}).max(/*dim=*/0)); } +Tensor& max_quantized_unary_out(const Tensor& self, Tensor& out) { + // TODO this implementation is inefficient for now. 
+ TORCH_CHECK(self.device() == out.device()); + + TORCH_CHECK(canCast( + typeMetaToScalarType(self.dtype()), + typeMetaToScalarType(out.dtype()))); + Tensor temp = max_quantized_cpu(self); + at::native::resize_output(out, temp.sizes()); + out.copy_(temp); + return out; +} + Tensor min_quantized_cpu(const Tensor& self) { return std::get<0>(self.reshape({-1}).min(/*dim=*/0)); } diff --git a/aten/src/ATen/native/quantized/TensorFactories.cpp b/aten/src/ATen/native/quantized/TensorFactories.cpp index 66c48f4ce752..aa0fef5df9dc 100644 --- a/aten/src/ATen/native/quantized/TensorFactories.cpp +++ b/aten/src/ATen/native/quantized/TensorFactories.cpp @@ -66,16 +66,6 @@ Tensor empty_per_channel_affine_quantized( quantizer); } -Tensor empty_symint_unknown_quantized( - c10::SymIntArrayRef size, - c10::optional dtype, - c10::optional layout, - c10::optional device, - c10::optional pin_memory, - c10::optional optional_memory_format) { - return at::native::empty_unknown_quantized(c10::asIntArrayRefSlow(size), dtype, layout, device, pin_memory, optional_memory_format); -} - Tensor empty_unknown_quantized( IntArrayRef size, c10::optional dtype, diff --git a/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp b/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp index 4d6e0f79db95..1317817902cf 100644 --- a/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp +++ b/aten/src/ATen/native/quantized/cpu/AdaptiveAveragePooling.cpp @@ -1,8 +1,20 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp b/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp index 264707c25a8f..bb72a2010ca3 100644 --- a/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp +++ b/aten/src/ATen/native/quantized/cpu/AveragePool2d.cpp @@ -1,5 +1,7 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include #include @@ -7,6 +9,14 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp b/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp index 35580bfc50d8..93534b70c2c0 100644 --- a/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp +++ b/aten/src/ATen/native/quantized/cpu/AveragePool3d.cpp @@ -1,17 +1,21 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include -#include -#include -#include #include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp b/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp index 1e3f1d3ddb0f..58a7036bdd7e 100644 --- a/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp +++ b/aten/src/ATen/native/quantized/cpu/BinaryOps.cpp @@ -1,8 +1,9 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include #include -#include -#include -#include #include #include #include @@ -10,6 +11,18 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + +#include + namespace at { namespace native { @@ -23,10 +36,10 @@ namespace { inline void 
check_inputs(const Tensor& qa, const Tensor& qb) { TORCH_CHECK( qa.qscheme() == kPerTensorAffine, - "Only per tensor quantization is suported in Add."); + "Only per tensor quantization is supported in Add."); TORCH_CHECK( qa.qscheme() == qb.qscheme(), - "Both inputs to Add must have the same quantization shceme."); + "Both inputs to Add must have the same quantization scheme."); TORCH_CHECK( qa.scalar_type() == qb.scalar_type(), "Add operands should have same data type."); diff --git a/aten/src/ATen/native/quantized/cpu/BinaryOps.h b/aten/src/ATen/native/quantized/cpu/BinaryOps.h index ada78c59f95c..cf86a13c139a 100644 --- a/aten/src/ATen/native/quantized/cpu/BinaryOps.h +++ b/aten/src/ATen/native/quantized/cpu/BinaryOps.h @@ -1,4 +1,4 @@ -#include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp b/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp index e0b455a7300b..bb42b4edbe7a 100644 --- a/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp +++ b/aten/src/ATen/native/quantized/cpu/ChannelShuffle.cpp @@ -1,16 +1,16 @@ -#include -#include -#include -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include -#include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/EmbeddingPackedParams.h b/aten/src/ATen/native/quantized/cpu/EmbeddingPackedParams.h index 945c8edf7c75..140b716df269 100644 --- a/aten/src/ATen/native/quantized/cpu/EmbeddingPackedParams.h +++ b/aten/src/ATen/native/quantized/cpu/EmbeddingPackedParams.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include struct EmbeddingPackedParamsBase : public torch::jit::CustomClassHolder { diff --git a/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp b/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp index b3735ddb236d..9867a8f48a9e 100644 --- a/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp +++ b/aten/src/ATen/native/quantized/cpu/IntReprQuant.cpp @@ -1,10 +1,20 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp b/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp index 89465a8c5208..c9387eb0ebb1 100644 --- a/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp +++ b/aten/src/ATen/native/quantized/cpu/LinearUnpackImpl.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -7,6 +9,16 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + int register_linear_params(); #ifdef USE_FBGEMM diff --git a/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp b/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp index a321de08b994..a0a4342c4e00 100644 --- a/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp +++ b/aten/src/ATen/native/quantized/cpu/MakePerTensorQuantizedTensor.cpp @@ -1,7 +1,14 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include +#include #include + +#ifndef AT_PER_OPERATOR_HEADERS #include +#else +#include +#endif namespace at { namespace native { diff --git 
a/aten/src/ATen/native/quantized/cpu/Normalization.cpp b/aten/src/ATen/native/quantized/cpu/Normalization.cpp index a5be594bdf39..2918c6530538 100644 --- a/aten/src/ATen/native/quantized/cpu/Normalization.cpp +++ b/aten/src/ATen/native/quantized/cpu/Normalization.cpp @@ -1,12 +1,20 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include -#include namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/OnednnUtils.h b/aten/src/ATen/native/quantized/cpu/OnednnUtils.h index 6ad70356b3e0..533d83361f05 100644 --- a/aten/src/ATen/native/quantized/cpu/OnednnUtils.h +++ b/aten/src/ATen/native/quantized/cpu/OnednnUtils.h @@ -4,8 +4,168 @@ #if AT_MKLDNN_ENABLED() #include #include -#include -#include +#include +#include + +#include + +using PrimitiveCacheKey = std::tuple< + double, // input_scale + int64_t, // input_zero_point + std::vector, // input_shape + double, // output_scale + int64_t, // output_zero_point + int64_t>; // OMP_number_of_threads + +enum CacheKeyIndex { + InputScale, + InputZeroPoint, + InputShape, + OutputScale, + OutputZeroPoint, + NumOfThreads, +}; + +// Base class of primitive cache +struct PrimitiveCache { + PrimitiveCacheKey key; + + bool hit(const PrimitiveCacheKey& key) { + return this->key == key; + } +}; + +using LinearParams = ideep::matmul_forward_params; +using Conv = dnnl::convolution_forward; +using ConvDesc = dnnl::convolution_forward::primitive_desc; +using ConvParams = ideep::convolution_forward_params; +using Deconv = dnnl::deconvolution_forward; +using DeconvDesc = dnnl::deconvolution_forward::primitive_desc; +using DeconvParams = ideep::deconv_forward_params; + +struct LinearPrimitiveCache : PrimitiveCache { + LinearPrimitiveCache() {} + + LinearPrimitiveCache( + const PrimitiveCacheKey& key, + const LinearParams& param) { + this->key = key; + this->param = param; + } + + LinearPrimitiveCache( + const PrimitiveCacheKey& key, + const LinearParams& param, + const ideep::tensor& bias) { + this->key = key; + this->param = param; + if (!bias.is_empty()) { + expected_bias = + bias.reorder_if_differ_in(param.pd.bias_desc(), param.bias_attr); + } + } + + LinearParams param; + ideep::tensor expected_bias; + + // For dynamic qlinear, scale and zero point + // are set at execution time. So we only need to compare + // the rest part of key. 
+ bool hit_dynamic(const PrimitiveCacheKey& new_key) { + auto cached_input_shape = std::get(this->key); + auto new_input_shape = std::get(new_key); + return ( + cached_input_shape == new_input_shape && + std::get(this->key) == std::get(new_key)); + } + + LinearParams& get_param() { + return param; + } + + ideep::tensor& get_expected_bias() { + return expected_bias; + } +}; + +struct ConvPrimitiveCache : PrimitiveCache { + ConvPrimitiveCache() {} + + ConvPrimitiveCache(const PrimitiveCacheKey& key, + const ConvDesc& conv_desc, + const ideep::tensor& bias, + const ideep::attr_t bias_attr) { + this->key = key; + this->primitive_desc = conv_desc; + this->primitive = Conv(this->primitive_desc); + // Construct tensor of input zero point + ideep::tensor::desc input_zp_desc = {{1}, ideep::data_type::s32, {1}}; + this->input_zp_tensor.init(input_zp_desc, ideep::engine::cpu_engine()); + auto zp_data_ptr = reinterpret_cast(this->input_zp_tensor.get_data_handle()); + zp_data_ptr[0] = std::get(key); + // Construct expected bias + this->expected_bias = bias.reorder_if_differ_in(conv_desc.bias_desc(), bias_attr); + } + + ConvDesc primitive_desc; + Conv primitive; + ideep::tensor input_zp_tensor; + ideep::tensor expected_bias; + + inline ConvDesc& get_primitive_desc() { + return primitive_desc; + } + + inline Conv& get_primitive() { + return primitive; + } + + inline ideep::tensor& get_src_zp_tensor() { + return input_zp_tensor; + } + + inline ideep::tensor& get_bias() { + return expected_bias; + } +}; + +struct DeconvPrimitiveCache : PrimitiveCache { + DeconvPrimitiveCache() {} + + DeconvPrimitiveCache(const PrimitiveCacheKey& key, + const DeconvDesc& deconv_desc, + const ideep::tensor& bias, + const ideep::attr_t bias_attr, + const ideep::tensor& input_zero_point) { + this->key = key; + this->primitive_desc = deconv_desc; + this->primitive = Deconv(this->primitive_desc); + this->input_zp_tensor = std::move(input_zero_point); + // Construct expected bias + this->expected_bias = bias.reorder_if_differ_in(deconv_desc.bias_desc(), bias_attr); + } + + DeconvDesc primitive_desc; + Deconv primitive; + ideep::tensor input_zp_tensor; + ideep::tensor expected_bias; + + inline DeconvDesc& get_primitive_desc() { + return primitive_desc; + } + + inline Deconv& get_primitive() { + return primitive; + } + + inline ideep::tensor& get_src_zp_tensor() { + return input_zp_tensor; + } + + inline ideep::tensor& get_bias() { + return expected_bias; + } +}; struct PackedLinearWeightsOnednn : public LinearPackedParamsBase { PackedLinearWeightsOnednn( @@ -16,7 +176,9 @@ struct PackedLinearWeightsOnednn : public LinearPackedParamsBase { : weight_(std::move(weight)), bias_(std::move(bias)), orig_weight_(std::move(orig_weight)), - orig_bias_(std::move(orig_bias)) {} + orig_bias_(std::move(orig_bias)) { + cache_initialized_flag = std::make_unique(); + } std::unique_ptr weight_; c10::optional bias_; at::Tensor orig_weight_; @@ -45,6 +207,9 @@ struct PackedLinearWeightsOnednn : public LinearPackedParamsBase { c10::optional bias); private: + LinearPrimitiveCache prim_cache; + std::unique_ptr cache_initialized_flag; + template at::Tensor apply_impl( at::Tensor input, @@ -53,6 +218,10 @@ struct PackedLinearWeightsOnednn : public LinearPackedParamsBase { template at::Tensor apply_dynamic_impl(at::Tensor input, bool reduce_range=false); + + LinearPrimitiveCache& get_cache() { + return prim_cache; + } }; template @@ -68,16 +237,18 @@ struct PackedConvWeightsOnednn : public ConvPackedParamsBase { torch::List dilation, int64_t groups, 
uint8_t transpose) - : weight_(std::move(weight)), - bias_(std::move(bias)), - orig_weight_(std::move(orig_weight)), - orig_bias_(std::move(orig_bias)), - stride_(std::move(stride)), - padding_(std::move(padding)), - output_padding_(std::move(output_padding)), - dilation_(std::move(dilation)), - groups_(groups), - transpose_(transpose) {} + : weight_(std::move(weight)), + bias_(std::move(bias)), + orig_weight_(std::move(orig_weight)), + orig_bias_(std::move(orig_bias)), + stride_(std::move(stride)), + padding_(std::move(padding)), + output_padding_(std::move(output_padding)), + dilation_(std::move(dilation)), + groups_(groups), + transpose_(transpose) { + cache_initialized_flag = std::make_unique(); + } std::unique_ptr weight_; c10::optional bias_; @@ -141,11 +312,90 @@ struct PackedConvWeightsOnednn : public ConvPackedParamsBase { } private: + ConvPrimitiveCache conv_prim_cache; + DeconvPrimitiveCache deconv_prim_cache; + std::unique_ptr cache_initialized_flag; + template at::Tensor apply_impl( const at::Tensor& input, double output_scale, int64_t output_zero_point); + + ConvPrimitiveCache& get_conv_cache() { + assert(!transpose()); + return conv_prim_cache; + } + + DeconvPrimitiveCache& get_deconv_cache() { + assert(transpose()); + return deconv_prim_cache; + } }; +namespace onednn_utils { + +// Try to reorder tensor to expected desc at runtime +// Do it in a `try...catch...` manner to avoid oneDNN's errors +// TODO: Move it to third_party/ideep +static void try_reorder( + ideep::tensor& t, + const ideep::tensor::desc&& desc, + ideep::scale_t scales) { + if (t.get_desc() != desc) { + try { + t = t.reorder_if_differ_in(desc); + } catch (...) { + ideep::tensor&& plain = t.to_public(nullptr, t.get_data_type()); + t = plain.reorder_if_differ_in(desc); + } + t.set_scale(scales); + } +} + +// ONEDNN requires symmetric quantization of weight +// Use this util function to check. +static bool is_weight_symmetric_quant( + const at::Tensor& weight, + bool is_transposed_conv) { + bool is_symmetric = true; + const auto qtype = weight.qscheme(); + if (qtype == c10::kPerTensorAffine) { + is_symmetric &= (weight.q_zero_point() == 0); + } else if (qtype == c10::kPerChannelAffine) { + if (is_transposed_conv) { + // This case is currently not supported in PyTorch + // but we do not want to raise an error in this util function. + is_symmetric = false; + } else { + auto output_channels = weight.size(0); + for (int i = 0; i < output_channels; ++i) { + auto zp = weight.q_per_channel_zero_points()[i].item(); + is_symmetric &= (zp == 0); + } + } + } else { + // This case is currently not supported in PyTorch + // but we do not want to raise an error in this util function. 
+ is_symmetric = false; + } + return is_symmetric; +} + +// Check if onednn should be used w.r.t fbgemm +static bool should_use_onednn_quant( + const at::Tensor& weight, + bool is_transposed_conv, + int groups, + torch::List output_padding) { + bool vnni_available = cpuinfo_has_x86_avx512vnni(); + bool w_sym_quant = + is_weight_symmetric_quant(weight, is_transposed_conv); + bool opad_all_zero = + std::all_of(output_padding.begin(), output_padding.end(), [](int i) { return i==0; }); + return vnni_available && (groups <= 100) && w_sym_quant && opad_all_zero; +} + +} // onednn_utils + #endif // #if AT_MKLDNN_ENABLED() diff --git a/aten/src/ATen/native/quantized/cpu/Pooling.cpp b/aten/src/ATen/native/quantized/cpu/Pooling.cpp index 16ee1c566e3b..0153dd68d735 100644 --- a/aten/src/ATen/native/quantized/cpu/Pooling.cpp +++ b/aten/src/ATen/native/quantized/cpu/Pooling.cpp @@ -1,10 +1,10 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include #include -#include -#include #include #include #include @@ -12,6 +12,17 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h b/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h index 799d159114c7..9c6c721657cb 100644 --- a/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h +++ b/aten/src/ATen/native/quantized/cpu/QnnpackUtils.h @@ -1,7 +1,7 @@ #pragma once #ifdef USE_PYTORCH_QNNPACK -#include +#include #include #include #include @@ -9,6 +9,12 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include struct QnnpackOperatorDeleter { @@ -266,8 +272,9 @@ struct PackedConvWeightsQnnp : public ConvPackedParamsBase { void* zero_buffer = malloc(zero_size); if (zero_buffer == nullptr) { pytorch_qnnp_delete_operator(convolution); - pytorch_qnnp_log_error( - "failed to allocate %zu bytes for zero padding", zero_size); + TORCH_INTERNAL_ASSERT( + false, "failed to allocate %zu bytes for zero padding", + zero_size); } // Need to set to input zero point // memset(zero_buffer, input_zero_point, zero_size); diff --git a/aten/src/ATen/native/quantized/cpu/QuantUtils.h b/aten/src/ATen/native/quantized/cpu/QuantUtils.h index f53efab900be..85bcaa1a69fd 100644 --- a/aten/src/ATen/native/quantized/cpu/QuantUtils.h +++ b/aten/src/ATen/native/quantized/cpu/QuantUtils.h @@ -1,10 +1,21 @@ #pragma once -#include +#include +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + namespace quant_utils { namespace { float RawUint16ToFp16(unsigned short value) { diff --git a/aten/src/ATen/native/quantized/cpu/QuantizedOps.h b/aten/src/ATen/native/quantized/cpu/QuantizedOps.h index 506f0e46e573..8cba2f8cdd94 100644 --- a/aten/src/ATen/native/quantized/cpu/QuantizedOps.h +++ b/aten/src/ATen/native/quantized/cpu/QuantizedOps.h @@ -1,7 +1,10 @@ -#include +#pragma once +#include +#include +#include +#include #include #include -#include namespace at { namespace native { @@ -143,7 +146,7 @@ using qupsample_bilinear2d_fn = void (*)( c10::optional scales_w); using qcat_nhwc_fn = Tensor (*)( - const c10::List& qxs, + const MaterializedITensorListRef& qxs, int64_t dim, double scale, int64_t zero_point); diff --git a/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp b/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp index e7f78b29bbf0..c2d18693b9ea 
100644 --- a/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp +++ b/aten/src/ATen/native/quantized/cpu/ReduceOps.cpp @@ -1,11 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include // for _empty_affine_q... +#include // for mean +#include // for mean_out_quanti... +#include // for quantize_per_te... +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/Sorting.cpp b/aten/src/ATen/native/quantized/cpu/Sorting.cpp index 7419f6f7e617..9389261ac1e8 100644 --- a/aten/src/ATen/native/quantized/cpu/Sorting.cpp +++ b/aten/src/ATen/native/quantized/cpu/Sorting.cpp @@ -1,13 +1,17 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include -#include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp b/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp index 05a5a4521938..97799b3b8d42 100644 --- a/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp +++ b/aten/src/ATen/native/quantized/cpu/TensorOperators.cpp @@ -1,11 +1,29 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/TensorShape.cpp b/aten/src/ATen/native/quantized/cpu/TensorShape.cpp index 172ad041a610..b4b519020246 100644 --- a/aten/src/ATen/native/quantized/cpu/TensorShape.cpp +++ b/aten/src/ATen/native/quantized/cpu/TensorShape.cpp @@ -1,13 +1,27 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include +#include #include #include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + #include #include @@ -19,7 +33,7 @@ DEFINE_DISPATCH(qcat_relu_nhwc_stub); namespace { -bool is_cat_nhwc_fast_path(const c10::List& qxs, int dim) { +bool is_cat_nhwc_fast_path(const MaterializedITensorListRef& qxs, int64_t dim) { TORCH_CHECK(qxs.size() > 0); bool is_fast_path = dim == 1; // NOLINTNEXTLINE(performance-implicit-conversion-in-loop) @@ -35,21 +49,21 @@ bool is_valid_quantization_scheme(const Tensor& t) { return (qtype == kPerTensorAffine) || (qtype == kPerTensorSymmetric); } -bool all_inputs_sharing_qparams(TensorList qxs) { +bool all_inputs_sharing_qparams(const MaterializedITensorListRef& qxs) { bool is_valid = true; for (const auto i : c10::irange(1, qxs.size())) { - is_valid |= qxs[0].is_quantized(); - is_valid |= qxs[i].is_quantized() == qxs[0].is_quantized(); - is_valid |= qxs[i].qscheme() == qxs[0].qscheme(); - is_valid |= qxs[i].dtype() == qxs[0].dtype(); - if (qxs[0].qscheme() == kPerTensorAffine) { - is_valid |= qxs[i].q_scale() == qxs[0].q_scale(); - is_valid |= qxs[i].q_zero_point() == qxs[0].q_zero_point(); - } else if (qxs[0].qscheme() == kPerChannelAffine) { - is_valid |= qxs[i].q_per_channel_scales().equal(qxs[0].q_per_channel_scales()); - is_valid |= 
qxs[i].q_per_channel_zero_points().equal(qxs[0].q_per_channel_zero_points()); + is_valid |= qxs[0].get().is_quantized(); + is_valid |= qxs[i].get().is_quantized() == qxs[0].get().is_quantized(); + is_valid |= qxs[i].get().qscheme() == qxs[0].get().qscheme(); + is_valid |= qxs[i].get().dtype() == qxs[0].get().dtype(); + if (qxs[0].get().qscheme() == kPerTensorAffine) { + is_valid |= qxs[i].get().q_scale() == qxs[0].get().q_scale(); + is_valid |= qxs[i].get().q_zero_point() == qxs[0].get().q_zero_point(); + } else if (qxs[0].get().qscheme() == kPerChannelAffine) { + is_valid |= qxs[i].get().q_per_channel_scales().equal(qxs[0].get().q_per_channel_scales()); + is_valid |= qxs[i].get().q_per_channel_zero_points().equal(qxs[0].get().q_per_channel_zero_points()); } else { - TORCH_CHECK(false, "Unrecognized qscheme:", toString(qxs[0].qscheme())); + TORCH_CHECK(false, "Unrecognized qscheme:", toString(qxs[0].get().qscheme())); } } return is_valid; @@ -61,7 +75,7 @@ bool all_inputs_sharing_qparams(TensorList qxs) { */ template Tensor quantized_cat_impl( - const c10::List& qxs, + const MaterializedITensorListRef& qxs, int64_t dim, double scale, int64_t zero_point) { @@ -73,8 +87,8 @@ Tensor quantized_cat_impl( } } - const auto x_dtype = qxs.get(0).scalar_type(); - const auto x_qscheme = qxs.get(0).qscheme(); + const auto x_dtype = qxs[0].get().scalar_type(); + const auto x_qscheme = qxs[0].get().qscheme(); std::vector xs; xs.reserve(qxs.size()); // NOLINTNEXTLINE(performance-implicit-conversion-in-loop) @@ -99,6 +113,15 @@ Tensor quantized_cat_impl( return qy; } +template +Tensor quantized_cat_impl( + ITensorListRef qxs, + int64_t dim, + double scale, + int64_t zero_point) { + return quantized_cat_impl(qxs.materialize(), dim, scale, zero_point); +} + template Tensor qcat( const c10::List& qxs, @@ -134,28 +157,29 @@ TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) { m.impl(TORCH_SELECTIVE_NAME("quantized::cat_relu_out"), TORCH_FN(qcat_out)); } -Tensor cat_quantized_cpu(TensorList qxs, int64_t dim) { - TORCH_CHECK(is_valid_quantization_scheme(qxs[0]), +Tensor cat_quantized_cpu(const ITensorListRef& qxs, int64_t dim) { + auto materialized = qxs.materialize(); + TORCH_CHECK(is_valid_quantization_scheme(materialized[0]), "Only per-tensor quantization is supported in 'cat'!"); TORCH_CHECK( - all_inputs_sharing_qparams(qxs), + all_inputs_sharing_qparams(materialized), "All inputs should share the same quantization parameters."); - check_cat_no_zero_dim(qxs); - dim = legacy_cat_wrap_dim(dim, qxs); - double _scale = qxs[0].q_scale(); - int64_t _zero_point = qxs[0].q_zero_point(); - return quantized_cat_impl(c10::List(qxs), dim, _scale, _zero_point); + check_cat_no_zero_dim(materialized); + dim = legacy_cat_wrap_dim(dim, materialized); + double _scale = materialized[0].get().q_scale(); + int64_t _zero_point = materialized[0].get().q_zero_point(); + return quantized_cat_impl(materialized, dim, _scale, _zero_point); } -Tensor& cat_out_quantized_cpu(TensorList qxs, int64_t dim, Tensor& out) { - TORCH_CHECK(is_valid_quantization_scheme(qxs[0]), +Tensor& cat_out_quantized_cpu(const ITensorListRef& qxs, int64_t dim, Tensor& out) { + auto materialized = qxs.materialize(); + TORCH_CHECK(is_valid_quantization_scheme(materialized[0]), "Only per-tensor quantization is supported in 'cat'!") TORCH_CHECK(is_valid_quantization_scheme(out), "Only per-tensor quantization is supported in 'cat'!") - check_cat_no_zero_dim(qxs); - dim = legacy_cat_wrap_dim(dim, qxs); - auto out_ = quantized_cat_impl(c10::List(qxs), dim, 
out.q_scale(), - out.q_zero_point()); + check_cat_no_zero_dim(materialized); + dim = legacy_cat_wrap_dim(dim, materialized); + auto out_ = quantized_cat_impl(qxs, dim, out.q_scale(), out.q_zero_point()); at::native::copy_(out, out_, /*non_blocking=*/false); return out; } diff --git a/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp b/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp index ff8800228435..ac0fb23eb4c3 100644 --- a/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp +++ b/aten/src/ATen/native/quantized/cpu/UpSampleBilinear2d.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -6,10 +8,15 @@ #include #include -#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include -#include namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp b/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp index 9f8b065576df..abe6dfd22586 100644 --- a/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp +++ b/aten/src/ATen/native/quantized/cpu/UpSampleNearest2d.cpp @@ -1,11 +1,19 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include -#include -#include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include #include diff --git a/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp b/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp index ba723d707ee9..4b4c63eb7c3d 100644 --- a/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp +++ b/aten/src/ATen/native/quantized/cpu/UpSampleNearest3d.cpp @@ -1,8 +1,16 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include -#include -#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif #include @@ -230,27 +238,5 @@ Tensor _upsample_nearest_exact3d_quantized_cpu( input, osize, scale_d, scale_h, scale_w); } -Tensor upsample_nearest3d_quantized_cpu( - const Tensor& input, - at::OptionalIntArrayRef output_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input.sizes(), output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return upsample_nearest3d_quantized_cpu(input, osize, scale_d, scale_h, scale_w); -} - -Tensor _upsample_nearest_exact3d_quantized_cpu( - const Tensor& input, - at::OptionalIntArrayRef output_size, - c10::optional> scale_factors) { - auto osize = compute_output_size(input.sizes(), output_size, scale_factors); - auto scale_d = get_scale_value(scale_factors, 0); - auto scale_h = get_scale_value(scale_factors, 1); - auto scale_w = get_scale_value(scale_factors, 2); - return _upsample_nearest_exact3d_quantized_cpu(input, osize, scale_d, scale_h, scale_w); -} - } // namespace native } // namespace at diff --git a/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h b/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h index 78f325263f4f..12e4fbbf1e76 100644 --- a/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h +++ b/aten/src/ATen/native/quantized/cpu/XnnpackUtils.h @@ -3,7 +3,7 @@ #ifdef USE_XNNPACK #include -#include +#include #include using xnnpack_operator = at::native::xnnpack::Operator; diff --git a/aten/src/ATen/native/quantized/cpu/conv_serialization.h 
b/aten/src/ATen/native/quantized/cpu/conv_serialization.h index 9e4edb8f9a88..e9d833c9fc22 100644 --- a/aten/src/ATen/native/quantized/cpu/conv_serialization.h +++ b/aten/src/ATen/native/quantized/cpu/conv_serialization.h @@ -1,11 +1,19 @@ #pragma once -#include +#include #include #include #include #include #include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include @@ -330,6 +338,37 @@ c10::intrusive_ptr> deserialize_conv( auto& ctx = at::globalContext(); +#ifdef USE_FBGEMM + if (ctx.qEngine() == at::QEngine::X86) { +#if AT_MKLDNN_ENABLED() + bool use_onednn = onednn_utils::should_use_onednn_quant( + weight.value(), transpose, groups, output_padding); + if (use_onednn) { + return PackedConvWeightsOnednn::prepack( + weight.value(), + bias, + stride, + padding, + output_padding, + dilation, + groups, + transpose + ); + } +#endif + return PackedConvWeight::prepack( + weight.value(), + bias, + stride, + padding, + output_padding, + dilation, + groups, + transpose + ); + } // x86 +#endif + #ifdef USE_FBGEMM if (ctx.qEngine() == at::QEngine::FBGEMM) { return PackedConvWeight::prepack( diff --git a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp index 33d8bd88b858..8af21bbc7df8 100644 --- a/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp +++ b/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp @@ -1,4 +1,10 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include +#include +#include #include #include #include @@ -14,6 +20,12 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + int register_linear_params(); int register_embedding_params(); @@ -445,7 +457,8 @@ int register_linear_params() { bias = std::move(std::get<1>(state)); #ifdef USE_FBGEMM - if (at::globalContext().qEngine() == at::QEngine::FBGEMM) { + if (at::globalContext().qEngine() == at::QEngine::FBGEMM || + at::globalContext().qEngine() == at::QEngine::X86) { if (weight.scalar_type() == at::kQInt8) { return PackedLinearWeight::prepack( std::move(weight), std::move(bias)); @@ -547,6 +560,7 @@ int register_embedding_params() { return PackedEmbeddingBagWeight::prepack(weight); }) .def("bit_rate", &EmbeddingPackedParamsBase::bit_rate) + .def("unpack", &EmbeddingPackedParamsBase::unpack) .def("version", &EmbeddingPackedParamsBase::version); return 0; diff --git a/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp b/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp index 5fd73c58ed33..77c60141b065 100644 --- a/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp +++ b/aten/src/ATen/native/quantized/cpu/fused_obs_fake_quant.cpp @@ -1,9 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#include +#endif + #ifdef USE_FBGEMM #include #endif @@ -221,7 +236,7 @@ at::Tensor fused_moving_avg_obs_fake_quant( const int64_t ch_axis, bool per_row_fake_quant, bool symmetric_quant) { - if (self.numel() == 0) { + if (self.sym_numel() == 0) { return self.clone(); } const auto res = at::_fused_moving_avg_obs_fq_helper( diff --git a/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp b/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp index b4a524566605..82fb217e46fa 100644 --- a/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp +++ 
b/aten/src/ATen/native/quantized/cpu/init_qnnpack.cpp @@ -1,8 +1,7 @@ #ifdef USE_PYTORCH_QNNPACK #include -#include -#include +#include #include #include diff --git a/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp b/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp index d1293cc29f27..a1f8f0d7c245 100644 --- a/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp +++ b/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -15,6 +17,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + #include #ifdef USE_FBGEMM #include @@ -47,7 +56,7 @@ void check_tensor_memory_format(const Tensor& ref, const Tensor& other) { template Tensor qcat_nhwc_kernel( - const c10::List& qxs, + const MaterializedITensorListRef& qxs, int64_t dim, double scale, int64_t zero_point) { @@ -110,7 +119,7 @@ Tensor qcat_nhwc_kernel( c10::nullopt); // N, H, and W are explicitly captured here because there's a bug in GCC5 - // which causes an internal compiler error if they're not + // and clang5 which causes an internal compiler error if they're not AT_DISPATCH_QINT_TYPES(output.scalar_type(), "qcat_nhwc", [&, N, H, W]() { using Vec = Vectorized; at::parallel_for(0, N * H * W, 0, [&](int64_t begin, int64_t end) { @@ -2747,18 +2756,26 @@ void quantized_normalize_kernel( dq = (dq - layer_mean_div_scale_xVec) * gamma_p_vec + beta_vec; - qVec::quantize(dqXVec, y_scale, y_zp, y_inv_scale) - .store(Y_ptr + vecStartIdx); } + qVec::quantize(dqXVec, y_scale, y_zp, y_inv_scale) + .store(Y_ptr + vecStartIdx); } - for (int64_t remIdx = chEndIdx - kNonVecRemInChannel; - remIdx < chEndIdx; - remIdx++) { - auto qXVal = X_ptr[remIdx]; - float dqXVal = at::native::dequantize_val(x_fake_scale, x_zp, qXVal); - float dqY = - (dqXVal - layer_mean_div_scale_x) * gamma_p + beta; - Y_ptr[remIdx] = at::native::quantize_val(y_scale, y_zp, dqY); + + // Remainder + if (kNonVecRemInChannel > 0) { + int64_t remIdx = chEndIdx - kNonVecRemInChannel; + auto qXVec = qVec::loadu(X_ptr + remIdx, kNonVecRemInChannel); + auto dqXVec = qXVec.dequantize(x_fake_scale_vec, x_zp_vec, + x_fake_scale_zp_neg_premul_vec); + int validDqvecLen = (kNonVecRemInChannel - 1) / fVec::size() + 1; + for (int i = 0; i < validDqvecLen; ++i) { + auto &dq = dqXVec[i]; + dq = + (dq - layer_mean_div_scale_xVec) * + gamma_p_vec + beta_vec; + } + qVec::quantize(dqXVec, y_scale, y_zp, y_inv_scale) + .store(Y_ptr + remIdx, kNonVecRemInChannel); } } // chIdx @@ -3782,8 +3799,8 @@ void quantize_tensor_per_channel_impl( // channels_last contig. // If axis = 0 and channels_last contig, implementation for channels // first (NCHW) works. 
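The quantized_normalize_kernel hunk above replaces the scalar element-by-element remainder loop with a single counted (partial) vector load, rescale, and store over the tail channels. A minimal standalone sketch of that tail-handling idea, using a hypothetical fixed-width Vec helper in place of ATen's Vectorized types:

#include <algorithm>
#include <cstdint>

// Hypothetical fixed-width "vector" supporting counted loads/stores,
// standing in for at::vec::Vectorized in this sketch.
struct Vec {
  static constexpr int64_t size() { return 8; }
  float lane[8];
  static Vec loadu(const float* p, int64_t count = size()) {
    Vec v{};
    std::copy(p, p + count, v.lane);
    return v;
  }
  void store(float* p, int64_t count = size()) const {
    std::copy(lane, lane + count, p);
  }
};

// Scale a buffer: full vectors first, then one partial op for the tail,
// instead of falling back to a per-element scalar loop.
void scale_buffer(const float* x, float* y, int64_t n, float alpha) {
  int64_t i = 0;
  for (; i + Vec::size() <= n; i += Vec::size()) {
    Vec v = Vec::loadu(x + i);
    for (auto& l : v.lane) l *= alpha;
    v.store(y + i);
  }
  const int64_t rem = n - i;          // remainder, 0 <= rem < Vec::size()
  if (rem > 0) {
    Vec v = Vec::loadu(x + i, rem);   // counted load touches only rem elements
    for (int64_t k = 0; k < rem; ++k) v.lane[k] *= alpha;
    v.store(y + i, rem);              // counted store writes only rem elements
  }
}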
- for (const auto b : c10::irange(batches)) { - for (const auto e : c10::irange(elements_per_channel)) { + for (const auto b C10_UNUSED : c10::irange(batches)) { + for (const auto e C10_UNUSED : c10::irange(elements_per_channel)) { uint32_t c = 0; while (c + 8 < channels) { const int16x8_t vzero_point = vld1q_s16(&zero_points_int16t[c]); @@ -3813,8 +3830,8 @@ void quantize_tensor_per_channel_impl( } } } else { - for (const auto b : c10::irange(batches)) { - for (const auto c : c10::irange(channels)) { + for (const auto b C10_UNUSED : c10::irange(batches)) { + for (const auto c C10_UNUSED : c10::irange(channels)) { uint32_t e = 0; const int16x8_t vzero_point = vdupq_n_s16(zero_points_int16t[c]); const float32x4_t vinv_scale = vdupq_n_f32(inv_scales[c]); diff --git a/aten/src/ATen/native/quantized/cpu/qclamp.cpp b/aten/src/ATen/native/quantized/cpu/qclamp.cpp index 21570fd436ea..10f8c4bd7d23 100644 --- a/aten/src/ATen/native/quantized/cpu/qclamp.cpp +++ b/aten/src/ATen/native/quantized/cpu/qclamp.cpp @@ -1,15 +1,24 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include -#include -#include +#include #include #include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/qconv.cpp b/aten/src/ATen/native/quantized/cpu/qconv.cpp index 873d983a4820..b6fa57b9e3ed 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv.cpp @@ -1,9 +1,13 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include #include #include -#include +#include +#include +#include #include +#include #include #include #include @@ -15,6 +19,19 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace { @@ -113,7 +130,7 @@ at::SmallVector MakeDeConvOutputShape( ", output padding: ", output_padding[idx], ", dilation: ", dilation[idx]) TORCH_CHECK(output_shape[idx + 2] < kReasonableMaxDim, - "Output dimension is beyound reasonable maximum for ", idx, + "Output dimension is beyond reasonable maximum for ", idx, " axis;" " kernel: ", kernel[idx], ", stride: ", stride[idx], @@ -1227,49 +1244,98 @@ at::Tensor PackedConvWeightsOnednn::apply_impl( const ideep::dims& dilates = dilation().vec(); const ideep::dims& padding_l = padding().vec(); const ideep::dims& padding_r = padding().vec(); - const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/act.q_scale()); // Scales of ONEDNN and PyTorch are reciprocal + double input_scale = act.q_scale(); + int64_t input_zp = act.q_zero_point(); + // Scales of ONEDNN and PyTorch are reciprocal + const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/input_scale); const ideep::scale_t& weights_scales = weights.get_scale(); - const ideep::scale_t& dst_scales = ideep::scale_t(weights_scales.size(), 1.0/output_scale); // Scales of ONEDNN and PyTorch are reciprocal - const ideep::zero_point_t src_zero_points = ideep::zero_point_t(1, act.q_zero_point()); + int64_t scale_size = weights_scales.size(); + double inv_output_scale = 1.0/output_scale; + const ideep::zero_point_t src_zero_points = ideep::zero_point_t(1, input_zp); const ideep::zero_point_t dst_zero_points = ideep::zero_point_t(1, output_zero_point); ideep::attr_t op_attr = kReluFused ? 
ideep::attr_t::fuse_relu() : ideep::attr_t(); - op_attr.set_zero_points(DNNL_ARG_SRC, ideep::utils::tensor_zp_mask(1), {DNNL_RUNTIME_S32_VAL}); // runtime src zero point - if (with_bias) { - // Bias might be modified outside (e.g. by quantization bias correction). - // If so, update the prepacked bias as well. - if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { - bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); - } - const auto& b = bias_.value(); - if (transpose()) { + // Since src zero point is unknown, set runtime value here + op_attr.set_zero_points(DNNL_ARG_SRC, ideep::utils::tensor_zp_mask(1), {DNNL_RUNTIME_S32_VAL}); + + // Bias might be modified outside (e.g. by quantization bias correction). + // If so, update the prepacked bias as well. + if (with_bias && bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { + bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); + } + const auto& b = with_bias ? bias_.value() : ideep::tensor(); + int num_threads = at::get_num_threads(); + if (transpose()) { + // Primitive cache is initialized when called for the first time + // and won't be updated afterwards. + PrimitiveCacheKey cache_key = std::make_tuple( + input_scale, input_zp, src_dims, output_scale, output_zero_point, num_threads); + c10::call_once(*cache_initialized_flag, [&](){ + DeconvParams params; + ideep::convolution_transpose_forward::prepare( + params, src, weights, b, dst_dims, dst, + strides, padding_l, padding_r, dilates, groups(), + src_scales, weights_scales, ideep::scale_t(scale_size, inv_output_scale), + src_zero_points, dst_zero_points, op_attr, + dnnl::algorithm::deconvolution_direct, + dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + get_deconv_cache() = DeconvPrimitiveCache( + cache_key, params.pd, b, params.bias_attr, params.input_zero_point); + onednn_utils::try_reorder( + weights, (ideep::tensor::desc)params.pd.weights_desc(), weights_scales); + }); + if (get_deconv_cache().hit(cache_key)) { + Deconv& primitive = get_deconv_cache().get_primitive(); + DeconvDesc& pd = get_deconv_cache().get_primitive_desc(); + auto& src_zp_tensor = get_deconv_cache().get_src_zp_tensor(); + auto& expected_bias = get_deconv_cache().get_bias(); + ideep::convolution_transpose_forward::compute( + pd, primitive, src, weights, expected_bias, dst, src_zp_tensor, groups()); + } else { ideep::convolution_transpose_forward::compute_v2( src, weights, b, dst_dims, dst, strides, padding_l, padding_r, dilates, - groups(), src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, - op_attr, dnnl::algorithm::deconvolution_direct, dnnl::prop_kind::forward_inference, - ideep::u8s8, ideep::engine::cpu_engine()); - } else { - ideep::convolution_forward::compute_v2( - src, weights, b, dst_dims, dst, - strides, dilates, padding_l, padding_r, groups(), - src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, - op_attr, dnnl::algorithm::convolution_direct, dnnl::prop_kind::forward_inference, + groups(), src_scales, weights_scales, + ideep::scale_t(scale_size, inv_output_scale), + src_zero_points, dst_zero_points, op_attr, + dnnl::algorithm::deconvolution_direct, + dnnl::prop_kind::forward_inference, ideep::u8s8, ideep::engine::cpu_engine()); } - } else { - if (transpose()) { - ideep::convolution_transpose_forward::compute_v2( - src, weights, dst_dims, dst, - strides, padding_l, padding_r, dilates, - groups(), src_scales, weights_scales, dst_scales, 
src_zero_points, dst_zero_points, - op_attr, dnnl::algorithm::deconvolution_direct, dnnl::prop_kind::forward_inference, - ideep::u8s8, ideep::engine::cpu_engine()); + } else { // not transposed + PrimitiveCacheKey cache_key = std::make_tuple( + input_scale, input_zp, src_dims, output_scale, output_zero_point, num_threads); + c10::call_once(*cache_initialized_flag, [&](){ + src.set_zero_point(src_zero_points); + dst.set_zero_point(dst_zero_points); + ConvParams params; + ideep::convolution_forward::prepare( + params, src, weights, b, dst_dims, dst, + strides, dilates, padding_l, padding_r, groups(), + src_scales, weights_scales, ideep::scale_t(scale_size, inv_output_scale), + op_attr, dnnl::algorithm::convolution_direct, + dnnl::prop_kind::forward_inference, + ideep::u8s8, ideep::engine::cpu_engine()); + get_conv_cache() = ConvPrimitiveCache(cache_key, params.pd, b, params.bias_attr); + onednn_utils::try_reorder( + weights, (ideep::tensor::desc)params.pd.weights_desc(), weights_scales); + }); + // If hit, use cached data. If miss, fall back to normal path. + if (get_conv_cache().hit(cache_key)) { + ConvDesc& pd = get_conv_cache().get_primitive_desc(); + Conv& primitive = get_conv_cache().get_primitive(); + auto& src_zp_tensor = get_conv_cache().get_src_zp_tensor(); + auto& expected_bias = get_conv_cache().get_bias(); + ideep::convolution_forward::compute( + pd, primitive, src, weights, expected_bias, dst, src_zp_tensor, groups()); } else { ideep::convolution_forward::compute_v2( - src, weights, dst_dims, dst, + src, weights, b, dst_dims, dst, strides, dilates, padding_l, padding_r, groups(), - src_scales, weights_scales, dst_scales, src_zero_points, dst_zero_points, - op_attr, dnnl::algorithm::convolution_direct, dnnl::prop_kind::forward_inference, + src_scales, weights_scales, ideep::scale_t(scale_size, inv_output_scale), + src_zero_points, dst_zero_points, op_attr, + dnnl::algorithm::convolution_direct, + dnnl::prop_kind::forward_inference, ideep::u8s8, ideep::engine::cpu_engine()); } } diff --git a/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp b/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp index f4783484aaf8..26a2855a0fbb 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp @@ -1,8 +1,8 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include -#include -#include +#include +#include #include #include #include @@ -11,9 +11,15 @@ #include #include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include // for dequantize +#include +#endif + #ifdef USE_FBGEMM template diff --git a/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp index fd31c2e70883..9d2f1a96c31b 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp @@ -1,16 +1,24 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include +#include +#include +#include #include #include #include #include #include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include #ifdef USE_FBGEMM @@ -437,7 +445,7 @@ c10::intrusive_ptr> PackedConvWeightsOnednn< exp_wgt.init(w_desc); exp_wgt.set_scale(wgt_scales); // Also for feed_from() exp_wgt.feed_from(wgt, transpose); // expect wgt to be in [OC IC KH KW] format - ideep::tensor * packed_weight_p = new ideep::tensor(exp_wgt); + ideep::tensor * packed_weight_p = new 
ideep::tensor(std::move(exp_wgt)); packed_weight_p->set_scale(wgt_scales); packed_weight_p->set_zero_point(wgt_zero_points); std::unique_ptr weight_ptr(packed_weight_p); @@ -521,6 +529,21 @@ class QConvPackWeightInt8 final { int64_t groups, bool transpose) { auto& ctx = at::globalContext(); +#ifdef USE_FBGEMM + if (ctx.qEngine() == at::QEngine::X86) { +#if AT_MKLDNN_ENABLED() + bool use_onednn = onednn_utils::should_use_onednn_quant( + weight, transpose, groups, output_padding); + if (use_onednn) { + return PackedConvWeightsOnednn::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, transpose); + } +#endif + return PackedConvWeight::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, transpose); + } // x86 +#endif // defined(USE_FBGEMM) || AT_MKLDNN_ENABLED() + #ifdef USE_FBGEMM if (ctx.qEngine() == at::QEngine::FBGEMM) { return PackedConvWeight::prepack( @@ -598,6 +621,25 @@ class QConv1dPackWeightInt8 final { padding = quant_utils::MakeArgForConv1d(padding, 0); output_padding = quant_utils::MakeArgForConv1d(output_padding, 0); dilation = quant_utils::MakeArgForConv1d(dilation, 1); + +#ifdef USE_FBGEMM + if (ctx.qEngine() == at::QEngine::X86) { +#if AT_MKLDNN_ENABLED() + bool use_onednn = onednn_utils::should_use_onednn_quant( + weight, transpose, groups, output_padding); + if (use_onednn) { + return PackedConvWeightsOnednn<2>::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, + transpose); + } +#endif + return PackedConvWeight<2>::prepack( + weight, bias, stride, padding, output_padding, dilation, groups, + transpose); + + } // x86 +#endif + #ifdef USE_FBGEMM if (ctx.qEngine() == at::QEngine::FBGEMM) { return PackedConvWeight<2>::prepack( diff --git a/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp b/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp index ad32d9b16a20..8af8d62f2f8a 100644 --- a/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp +++ b/aten/src/ATen/native/quantized/cpu/qconv_unpack_impl.cpp @@ -126,7 +126,7 @@ template std::tuple> PackedConvWeightsOnednn< kSpatialDim>::unpack() { return std::tuple>( - orig_weight_, orig_bias_); + orig_weight_.clone(), orig_bias_); } template std::tuple> PackedConvWeightsOnednn< diff --git a/aten/src/ATen/native/quantized/cpu/qelu.cpp b/aten/src/ATen/native/quantized/cpu/qelu.cpp index ba921efcc91e..f8b66781f2e9 100644 --- a/aten/src/ATen/native/quantized/cpu/qelu.cpp +++ b/aten/src/ATen/native/quantized/cpu/qelu.cpp @@ -1,9 +1,15 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp b/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp index ac6cce628064..e2703bb93fb4 100644 --- a/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp +++ b/aten/src/ATen/native/quantized/cpu/qembeddingbag.cpp @@ -1,4 +1,5 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include @@ -9,10 +10,20 @@ #endif #include +#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + int register_embedding_params(); namespace { diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag.h b/aten/src/ATen/native/quantized/cpu/qembeddingbag.h index 301b025322a3..86ed0f530f9c 100644 --- a/aten/src/ATen/native/quantized/cpu/qembeddingbag.h +++ 
b/aten/src/ATen/native/quantized/cpu/qembeddingbag.h @@ -1,4 +1,6 @@ -#include +#pragma once +#include +#include namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp index 748e89fc182d..dab19e0908e3 100644 --- a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.cpp @@ -1,12 +1,24 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include +#include #include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include int register_embedding_params(); @@ -254,9 +266,10 @@ Tensor& qembeddingbag_byte_prepack_out(Tensor& output, const Tensor& weight) { } #else - const auto weight_data = weight_contig->scalar_type() == at::ScalarType::Half - ? weight_contig->to(at::ScalarType::Float).data_ptr() - : weight_contig->data_ptr(); + const Tensor& float_weight = weight_contig->scalar_type() == at::ScalarType::Half + ? weight_contig->to(at::ScalarType::Float) + : *weight_contig; + const auto weight_data = float_weight.data_ptr(); constexpr float kEpsilon = 1e-8f; for (auto row : c10::irange(embedding_rows)) { const float* input_row = weight_data + row * embedding_cols; diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.h b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.h index c52cbae4f2c8..a18ec214ebad 100644 --- a/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.h +++ b/aten/src/ATen/native/quantized/cpu/qembeddingbag_prepack.h @@ -1,7 +1,7 @@ -#include +#pragma once +#include -namespace at { -namespace native { +namespace at { namespace native { Tensor& qembeddingbag_byte_prepack_out(Tensor& output, const Tensor& weight); diff --git a/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp b/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp index 68e7c4fdaca2..d0c62d686135 100644 --- a/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qembeddingbag_unpack.cpp @@ -1,10 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + int register_embedding_params(); at::Tensor PackedEmbeddingBagWeight::unpack() { diff --git a/aten/src/ATen/native/quantized/cpu/qgelu.cpp b/aten/src/ATen/native/quantized/cpu/qgelu.cpp index 05901b556e47..f9a3c32343df 100644 --- a/aten/src/ATen/native/quantized/cpu/qgelu.cpp +++ b/aten/src/ATen/native/quantized/cpu/qgelu.cpp @@ -1,15 +1,12 @@ -#include -#include -#include -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include -#include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp b/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp index 6059671eb067..aa37e51e7ea1 100644 --- a/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp +++ b/aten/src/ATen/native/quantized/cpu/qhardsigmoid.cpp @@ -1,12 +1,19 @@ -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + 
#include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/qhardswish.cpp b/aten/src/ATen/native/quantized/cpu/qhardswish.cpp index 7f2431de86ec..bf4e0d988295 100644 --- a/aten/src/ATen/native/quantized/cpu/qhardswish.cpp +++ b/aten/src/ATen/native/quantized/cpu/qhardswish.cpp @@ -1,12 +1,18 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/qlinear.cpp b/aten/src/ATen/native/quantized/cpu/qlinear.cpp index 0e51b9867607..111b5eb5f139 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear.cpp @@ -1,6 +1,8 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include +#include #include #include #include @@ -8,9 +10,20 @@ #include #include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include // for _empty_affine_q... +#include // for empty_affine_qu... +#include // for empty +#include // for quantize_per_ch... +#include // for quantize_per_te... +#include +#endif + #include #include @@ -629,10 +642,13 @@ at::Tensor PackedLinearWeightsOnednn::apply_impl( ideep::attr_t op_attr = ReluFused ? ideep::attr_t::fuse_relu() : ideep::attr_t(); ideep::tensor x(input_desc, input_contig->data_ptr()); auto dst_dims = {M, N}; - const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/input.q_scale()); + double input_scale = input.q_scale(); + int64_t input_zero_point = input.q_zero_point(); + const ideep::scale_t& src_scales = ideep::scale_t(1, 1.0/input_scale); const ideep::scale_t& weights_scales = w.get_scale(); - const ideep::scale_t& dst_scales = ideep::scale_t(1, 1.0/output_scale); // Scales of ONEDNN and PyTorch are reciprocal - const ideep::zero_point_t& src_zero_point = ideep::zero_point_t(1, input.q_zero_point()); + // Scales of ONEDNN and PyTorch are reciprocal + const ideep::scale_t& dst_scales = ideep::scale_t(1, 1.0/output_scale); + const ideep::zero_point_t& src_zero_point = ideep::zero_point_t(1, input_zero_point); const ideep::zero_point_t& dst_zero_point = ideep::zero_point_t(1, output_zero_point); // Compute: Use ideep::matmul_forward to support asymmetric quantization // Allocate output Tensor @@ -644,20 +660,39 @@ at::Tensor PackedLinearWeightsOnednn::apply_impl( if (output.numel() == 0) { return output; } - ideep::tensor y({dst_dims, ideep::tensor::data_type::u8, {output.strides().cbegin(), output.strides().cend()}}, + ideep::tensor y({dst_dims, ideep::tensor::data_type::u8, + {output.strides().cbegin(), output.strides().cend()}}, output.data_ptr()); - if (bias_.has_value()) { + bool with_bias = bias_.has_value(); + if (with_bias) { // Bias might be modified outside (e.g. by quantization bias correction). // If so, update the prepacked bias as well. if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); } - const auto& b = bias_.value(); - ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, src_scales, weights_scales, dst_scales, - src_zero_point, dst_zero_point, op_attr); + } + const auto& b = with_bias ? bias_.value() : ideep::tensor(); + // Primitive cache is initialized when called for the first time + // and won't be updated afterwards. 
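The qlinear.cpp hunk continuing just below adds a one-shot primitive cache: the expensive oneDNN prepare step runs once under c10::call_once, keyed by input scale/zero point, input dims, output quantization parameters, and thread count, and later calls reuse the prepared primitive only while the key still matches. A condensed sketch of that caching pattern with the ideep specifics abstracted away (CacheKey, Primitive, and prepare_primitive are stand-ins, not the real API):

#include <cstdint>
#include <mutex>
#include <tuple>
#include <vector>

using CacheKey = std::tuple<double /*scale*/, int64_t /*zero_point*/,
                            std::vector<int64_t> /*dims*/, int /*threads*/>;

struct Primitive { /* prepared kernel state would live here */ };

// Stand-in for the expensive ideep::*_forward::prepare() call.
Primitive prepare_primitive(const CacheKey&) { return Primitive{}; }

struct PrimitiveCache {
  CacheKey key;
  Primitive prim;
  bool hit(const CacheKey& k) const { return k == key; }
};

// Mirrors the structure of apply_impl: prepare once, reuse on key match,
// otherwise fall back to the uncached path (the compute_v2 analogue).
void run(const CacheKey& key, std::once_flag& flag, PrimitiveCache& cache) {
  std::call_once(flag, [&] { cache = PrimitiveCache{key, prepare_primitive(key)}; });
  if (cache.hit(key)) {
    // fast path: execute the cached primitive
  } else {
    // slow path: build and run a primitive for this call only
  }
}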
+ int num_threads = at::get_num_threads(); + PrimitiveCacheKey cache_key = std::make_tuple( + input_scale, input_zero_point, input_dims, output_scale, output_zero_point, num_threads); + c10::call_once(*cache_initialized_flag, [&](){ + LinearParams params; + ideep::matmul_forward::prepare( + params, x, w, b, y, 1.0f, 1.0f, + src_scales, weights_scales, dst_scales, + src_zero_point, dst_zero_point, op_attr); + get_cache() = LinearPrimitiveCache(cache_key, params); + onednn_utils::try_reorder( + w, (ideep::tensor::desc)params.pd.weights_desc(), weights_scales); + }); + if (get_cache().hit(cache_key)) { + LinearParams& params = get_cache().get_param(); + ideep::matmul_forward::compute(params, x, w, b, y); } else { - ideep::matmul_forward::compute_v2(x, w, y, 1.0f, 1.0f, src_scales, weights_scales, dst_scales, - src_zero_point, dst_zero_point, op_attr); + ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, src_scales, weights_scales, + dst_scales, src_zero_point, dst_zero_point, op_attr); } auto out_sizes = input.sizes().vec(); out_sizes.back() = N; diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp index df529a6612f9..537d0f492f8f 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp @@ -1,6 +1,7 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include #include #include #include @@ -9,7 +10,14 @@ #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#endif #include @@ -236,7 +244,7 @@ at::Tensor PackedLinearWeightsQnnp::apply_dynamic_impl( at::Tensor input, bool reduce_range) { if (reduce_range) { - TORCH_WARN("Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release."); + TORCH_WARN_ONCE("Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release."); } using at::Tensor; @@ -415,14 +423,21 @@ at::Tensor& PackedLinearWeightFp16::apply_dynamic_impl( // Resize output Tensor output.resize_(output_sizes); - // Call the fp16 gemm interface - fbgemm::cblas_gemm_compute( - fbgemm::matrix_op_t::NoTranspose, - M, - input_ptr, - packed_weight_fp16, - 0.0f, - output.data_ptr()); + int num_tasks = at::get_num_threads(); + at::parallel_for(0, num_tasks, 1, [&](int64_t begin, int64_t end) { + for (const auto task_id : c10::irange(begin, end)) { + // Call the fp16 gemm interface + fbgemm::cblas_gemm_compute( + /*transa=*/fbgemm::matrix_op_t::NoTranspose, + /*m=*/static_cast(M), + /*A=*/input_ptr, + /*Bp=*/packed_weight_fp16, + /*beta=*/0.0f, + /*C=*/output.data_ptr(), + /*thread_id=*/static_cast(task_id), + /*num_threads=*/num_tasks); + } + }); // Add bias term if (bias_.has_value()) { @@ -496,10 +511,21 @@ at::Tensor PackedLinearWeightsOnednn::apply_dynamic_impl( x.init(input_desc, input_contig.data_ptr()); // Find quantization parameters float x_max = 0, x_min = 0; - if (input.numel() > 0) { - x_min = input_contig.min().item(); - x_max = input_contig.max().item(); +#ifdef USE_FBGEMM + // Use FBGEMM's FindMinMax if available since it's faster + fbgemm::FindMinMax( + /*m=*/input_contig.data_ptr(), + /*min=*/&x_min, + /*max=*/&x_max, + /*len=*/input.numel()); +#else + if (input_contig.numel() > 0) { + Tensor t_min, t_max; + std::tie(t_min, t_max) = at::aminmax(input_contig); + x_max = t_max.item(); + x_min = t_min.item(); } +#endif const int 
precision = 8; auto q_params = quant_utils::ChooseQuantizationParams( /*min=*/x_min, @@ -524,18 +550,37 @@ at::Tensor PackedLinearWeightsOnednn::apply_dynamic_impl( ideep::tensor y({dst_dims, ideep::tensor::data_type::f32, {output.strides().cbegin(), output.strides().cend()}}, output.data_ptr()); - if (bias_.has_value()) { + bool with_bias = bias_.has_value(); + if (with_bias) { // Bias might be modified outside (e.g. by quantization bias correction). // If so, update the prepacked bias as well. if (bias_.value().get_data_handle() != orig_bias_.value().data_ptr()) { bias_.value().init(bias_.value().get_desc(), orig_bias_.value().data_ptr()); } - const ideep::tensor b = bias_.value(); - ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, - src_scales, weights_scales, ideep::scale_t(), - src_zero_point, ideep::zero_point_t(), op_attr); + } + const auto& b = with_bias ? bias_.value() : ideep::tensor(); + // Primitive cache is initialized when called for the first time + // and won't be updated afterwards. + int num_threads = at::get_num_threads(); + PrimitiveCacheKey cache_key = std::make_tuple( + q_params.scale, q_params.zero_point, input_dims, 1.0, 0, num_threads); + c10::call_once(*cache_initialized_flag, [&](){ + LinearParams params; + ideep::matmul_forward::prepare( + params, x, w, b, y, 1.0f, 1.0f, + src_scales, weights_scales, ideep::scale_t(), + src_zero_point, ideep::zero_point_t(), op_attr); + get_cache() = LinearPrimitiveCache(cache_key, params); + onednn_utils::try_reorder( + w, (ideep::tensor::desc)params.pd.weights_desc(), weights_scales); + }); + if (get_cache().hit_dynamic(cache_key)) { + LinearParams& params = get_cache().get_param(); + ideep::matmul_forward::compute_dynamic( + params, x, w, b, y, 1.0f, 1.0f, src_scales, weights_scales, + ideep::scale_t(), src_zero_point, ideep::zero_point_t()); } else { - ideep::matmul_forward::compute_v2(x, w, y, 1.0f, 1.0f, + ideep::matmul_forward::compute_v2(x, w, b, y, 1.0f, 1.0f, src_scales, weights_scales, ideep::scale_t(), src_zero_point, ideep::zero_point_t(), op_attr); } diff --git a/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp b/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp index b4f0f4c41f41..36523bbd1b9b 100644 --- a/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp +++ b/aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -9,9 +11,19 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif + #include #include +#include #include int register_linear_params(); @@ -238,7 +250,7 @@ c10::intrusive_ptr PackedLinearWeightsOnednn::prepack( dnnl::memory::data_type::u8); ideep::tensor exp_wgt(w_desc); exp_wgt.feed_from(wgt); - ideep::tensor * packed_weight_p = new ideep::tensor(exp_wgt); + ideep::tensor * packed_weight_p = new ideep::tensor(std::move(exp_wgt)); packed_weight_p->set_scale(wgt_scales); packed_weight_p->set_zero_point(wgt_zero_points); std::unique_ptr weight_ptr(packed_weight_p); @@ -288,7 +300,8 @@ class QLinearPackWeightInt8 final { auto& ctx = at::globalContext(); #ifdef USE_FBGEMM - if (ctx.qEngine() == at::QEngine::FBGEMM) { + if (ctx.qEngine() == at::QEngine::FBGEMM || + ctx.qEngine() == at::QEngine::X86) { return PackedLinearWeight::prepack(std::move(weight), std::move(bias)); } #endif @@ -320,7 +333,8 @@ class QLinearPackWeightFp16 final { // temporarily convert weight back to fp32, needs to be fixed // 
after fbgemm fixes the interface for their prepacking op (take fp16 input0 weight = weight.to(ScalarType::Float); - if (ctx.qEngine() == at::QEngine::FBGEMM) { + if (ctx.qEngine() == at::QEngine::FBGEMM || + ctx.qEngine() == at::QEngine::X86) { return PackedLinearWeightFp16::prepack( std::move(weight), std::move(bias)); } diff --git a/aten/src/ATen/native/quantized/cpu/qmatmul.cpp b/aten/src/ATen/native/quantized/cpu/qmatmul.cpp index c1e5041a5734..4da714e0bcf0 100644 --- a/aten/src/ATen/native/quantized/cpu/qmatmul.cpp +++ b/aten/src/ATen/native/quantized/cpu/qmatmul.cpp @@ -21,7 +21,7 @@ inline void check_inputs(const Tensor& qa, const Tensor& qb) { "MatMul operands should have same data type."); TORCH_CHECK( qa.qscheme() == kPerTensorAffine || qa.qscheme() == kPerTensorSymmetric, - "Only per-tensor quantization is suported in Matmul."); + "Only per-tensor quantization is supported in Matmul."); TORCH_CHECK( qa.qscheme() == qb.qscheme(), "Both inputs to Matmul must have the same quantization scheme."); @@ -45,7 +45,7 @@ Tensor qmatmul( " and ", b_num_dims, " provided)"); TORCH_CHECK( num_dims >= 2, - "Quantized Matmul currently only suports operands which are at least 2-dimensional. (", + "Quantized Matmul currently only supports operands which are at least 2-dimensional. (", num_dims, " provided)"); const int64_t m = qa.size(num_dims - 2); diff --git a/aten/src/ATen/native/quantized/cpu/qmul.cpp b/aten/src/ATen/native/quantized/cpu/qmul.cpp index 7015df9ea654..aa6ad0e724f5 100644 --- a/aten/src/ATen/native/quantized/cpu/qmul.cpp +++ b/aten/src/ATen/native/quantized/cpu/qmul.cpp @@ -1,9 +1,28 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include #include #include +#include +#include +#include #include +#include +#include #include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#endif #include @@ -21,7 +40,7 @@ inline void check_inputs(const Tensor& qa, const Tensor& qb) { TORCH_CHECK(qa.scalar_type() == qb.scalar_type(), "Mul operands should have same data type."); TORCH_CHECK(qa.qscheme() == qb.qscheme(), - "Both inputs to Mul must have the same quantization shceme."); + "Both inputs to Mul must have the same quantization scheme."); } // Note: out is assumed to be the same size as self and other. 
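Several prepack hunks above route the new X86 engine through the existing FBGEMM packing path, and the conv variants first consult onednn_utils::should_use_onednn_quant to decide whether to prepack for oneDNN instead. A schematic sketch of that selection order (the enum and helper below are simplified placeholders for at::QEngine and the real onednn_utils check):

#include <stdexcept>

enum class QEngine { FBGEMM, QNNPACK, ONEDNN, X86 };
enum class Backend { Fbgemm, Onednn, Qnnpack };

// Placeholder for onednn_utils::should_use_onednn_quant(); the real helper
// inspects the weight, transpose flag, groups, and output padding.
bool should_use_onednn_quant() { return false; }

// Roughly the decision order used by the patched prepack entry points:
// X86 first (may fall through to oneDNN for conv), then the engine-specific paths.
Backend pick_backend(QEngine engine) {
  if (engine == QEngine::X86) {
    return should_use_onednn_quant() ? Backend::Onednn : Backend::Fbgemm;
  }
  if (engine == QEngine::FBGEMM) {
    return Backend::Fbgemm;
  }
  if (engine == QEngine::QNNPACK) {
    return Backend::Qnnpack;
  }
  if (engine == QEngine::ONEDNN) {
    return Backend::Onednn;
  }
  throw std::invalid_argument("unsupported quantized engine");
}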
@@ -37,6 +56,124 @@ Tensor _mul_out(Tensor& out, const Tensor& self, const Tensor& other) { return out; } +#ifdef USE_XNNPACK +template +Tensor _mul_out_xnnpack( + const Tensor& self, + const Tensor& other, + double output_scale, + int64_t output_zero_point) { + using underlying_t = typename scalar_t::underlying; + + const string func_name = "xnnp_mul()"; + TORCH_CHECK(self.ndimension() > 0, func_name, ": Got empty input tensor."); + TORCH_CHECK( + at::native::xnnpack::available(), func_name, ": XNNPACK is not available") + + // using qa memory format for qb to allow xnnpack kernel to flatten all the + // dims + auto qa_mem_format = self.suggest_memory_format(); + Tensor self_contig = self.contiguous(qa_mem_format); + Tensor other_contig = other.contiguous(qa_mem_format); + + Tensor out = at::native::empty_affine_quantized( + at::infer_size_dimvector(self_contig.sizes(), other_contig.sizes()), + self.scalar_type(), + c10::nullopt /* layout */, + kCPU, + c10::nullopt /* pin_memory */, + output_scale, + output_zero_point, + qa_mem_format); + + if (self_contig.size(0) == 0) { + return out; + } + + int64_t self_zero_point = self_contig.q_zero_point(); + double self_scale = self_contig.q_scale(); + int64_t other_zero_point = other_contig.q_zero_point(); + double other_scale = other_contig.q_scale(); + + int64_t output_min = std::numeric_limits::min(); + int64_t output_max = std::numeric_limits::max(); + + if(ReLUFused) { + /* + * FIXME: use acticationLimits() + * With , MSVC runs into "error C3862: indetifier activationLimits not + * found". + */ + constexpr int64_t qmin = std::numeric_limits::min(); + constexpr int64_t qmax = std::numeric_limits::max(); + int64_t qvalue = static_cast(output_zero_point); + qvalue = std::max(qvalue, qmin); + output_min = static_cast(std::min(qvalue, qmax)); + } + + xnn_operator_t xnnp_op = nullptr; + xnnpack_operator xnnp_qmul_operator; + + // create xnnpack multiply operator ... 
+ auto status = xnn_create_multiply_nd_qs8( + self_zero_point, + self_scale, + other_zero_point, + other_scale, + static_cast(output_zero_point), + static_cast(output_scale), + output_min, + output_max, + 0, + &xnnp_op); + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn create operator failed(", + status, + ")!"); + xnnp_qmul_operator = xnnpack_operator(xnnp_op); + + + const auto self_shape = xnnp_utils::get_mem_format_aware_shape(self_contig); + const auto other_shape = xnnp_utils::get_mem_format_aware_shape(other_contig); + + // set up operator + status = xnn_setup_multiply_nd_qs8( + xnnp_qmul_operator.get(), + self_shape.size(), + self_shape.data(), + other_shape.size(), + other_shape.data(), + reinterpret_cast(self_contig.data_ptr()), + reinterpret_cast(other_contig.data_ptr()), + reinterpret_cast(out.data_ptr()), + caffe2::pthreadpool_()); + + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn setup operator failed(", + status, + ")!"); + + // Run the operator + status = xnn_run_operator( + xnnp_qmul_operator.get(), /* xnn_operator_t op */ + caffe2::pthreadpool_()); /* pthreadpool_t threadpool */ + TORCH_CHECK( + status == xnn_status_success, + func_name, + ": xnn run operator failed(", + status, + ")"); + + return out; +} + +#endif // use XNNPACK + template Tensor _mul_scalar_out(Tensor& out, const Tensor& self, const Scalar& other) { int64_t self_zero_point = self.q_zero_point(); @@ -100,19 +237,27 @@ Tensor _mul_scalar_out(Tensor& out, const Tensor& self, const Scalar& other) { }); return out; -} + } template class QMul final { public: static Tensor run(Tensor qa, Tensor qb, double scale, int64_t zero_point) { check_inputs(qa, qb); +#ifdef USE_XNNPACK + int64_t q_max = std::numeric_limits::max(); + if (zero_point < q_max && qa.scalar_type() == kQInt8) { + return _mul_out_xnnpack(qa, qb, scale, zero_point); + } +#endif // USE_XNNPACK + auto qc = at::_empty_affine_quantized( infer_size_dimvector(qa.sizes(), qb.sizes()), at::device(kCPU).dtype(qa.scalar_type()), scale, zero_point, qa.suggest_memory_format()); + return _mul_out(qc, qa, qb); } }; @@ -169,7 +314,7 @@ class QMulScalarTensor final { static Tensor run(Tensor qa, Tensor b) { TORCH_CHECK(qa.qscheme() == kPerTensorAffine || qa.qscheme() == kPerTensorSymmetric, - "Only per tensor quantization is suported in Mul."); + "Only per tensor quantization is supported in Mul."); auto qc = at::empty_like(qa, qa.suggest_memory_format()); return _mul_scalar_out(qc, qa, b.item()); } diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt b/aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt index 2c9ec7aa1e3a..8b5b82453a95 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/CMakeLists.txt @@ -175,7 +175,6 @@ set(PYTORCH_QNNPACK_EXEC_SRCS src/deconv-run.cc src/fc-run.cc src/fc-dynamic-run.cc - src/pack_block_sparse.cc src/indirection.c src/operator-run.c) diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/bench/q8gemm_sparse.cc b/aten/src/ATen/native/quantized/cpu/qnnpack/bench/q8gemm_sparse.cc index cb45912ed152..eabf62fe9410 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/bench/q8gemm_sparse.cc +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/bench/q8gemm_sparse.cc @@ -254,14 +254,13 @@ class Q8GEMMSparse : public benchmark::Fixture { colBlockSize(), sparsity(), kernel_zero_points.data()); - bcsr_matrix_ = - qnnpack::generateBlockCSRMatrix( - k_.data(), - nc(), - kc(), - rowBlockSize(), - 
colBlockSize(), - kernel_zero_points.data()); + bcsr_matrix_ = qnnpack::generateBlockCSRMatrix( + k_.data(), + nc(), + kc(), + rowBlockSize(), + colBlockSize(), + kernel_zero_points.data()); std::vector dequantization_scales(num_zero_points_kernel, 0.75f); c_.resize(mc() * nc()); std::fill(c_.begin(), c_.end(), 0xA5); @@ -466,13 +465,14 @@ BENCHMARK_TEMPLATE_DEFINE_F(Q8GEMMSparse_Op, 4x8c1x4_prepacked__aarch32_neon, 4, for (uint32_t n = 0, channel_offset = 0; n < nc(); n += nr(), channel_offset += nr()) { const uint32_t nrr = min(nc() - n, nr()); - pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon( + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w32__aarch32_neon( mrr, nrr, a_packed.data() + (m >> 2) * (k_blocks << 2) * mr(), bcsr_matrix_->values.data(), - bcsr_matrix_->row_values.data() + n, - bcsr_matrix_->col_indices.data(), + static_cast(bcsr_matrix_->row_values_data_ptr()) + + n, + static_cast(bcsr_matrix_->col_indices_data_ptr()), b() + n, c() + m * nc() + n, nc(), @@ -512,13 +512,14 @@ BENCHMARK_TEMPLATE_DEFINE_F(Q8GEMMSparse_Op, 4x8c8x1_prepacked__aarch32_neon, 4, for (uint32_t n = 0, channel_offset = 0; n < nc(); n += nr(), channel_offset += nr()) { const uint32_t nrr = min(nc() - n, nr()); - pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon( + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w32__aarch32_neon( mrr, nrr, a_packed.data() + (m >> 2) * (k_blocks << 2) * mr(), bcsr_matrix_->values.data(), - bcsr_matrix_->row_values.data() + (n >> 3), - bcsr_matrix_->col_indices.data(), + static_cast(bcsr_matrix_->row_values_data_ptr()) + + (n >> 3), + static_cast(bcsr_matrix_->col_indices_data_ptr()), b() + n, c() + m * nc() + n, nc(), @@ -585,13 +586,13 @@ BENCHMARK_TEMPLATE_DEFINE_F(Q8GEMMSparse_Op, 8x8c1x4_prepacked__aarch64_neon, 8, for (uint32_t n = 0, channel_offset = 0; n < nc(); n += nr(), channel_offset += nr()) { const uint32_t nrr = min(nc() - n, nr()); - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon( + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w32__aarch64_neon( mrr, nrr, a_packed.data() + (m >> 3) * (k_blocks << 2) * mr(), bcsr_matrix_->values.data(), - bcsr_matrix_->row_values.data(), - bcsr_matrix_->col_indices.data(), + static_cast(bcsr_matrix_->row_values_data_ptr()), + static_cast(bcsr_matrix_->col_indices_data_ptr()), b() + n, c() + m * nc() + n, nc(), @@ -630,13 +631,13 @@ BENCHMARK_TEMPLATE_DEFINE_F(Q8GEMMSparse_Op, 8x8c8x1_prepacked__aarch64_neon, 8, for (uint32_t n = 0, channel_offset = 0; n < nc(); n += nr(), channel_offset += nr()) { const uint32_t nrr = min(nc() - n, nr()); - pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon( + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w32__aarch64_neon( mrr, nrr, a_packed.data() + (m >> 3) * (k_blocks << 2) * mr(), bcsr_matrix_->values.data(), - bcsr_matrix_->row_values.data(), - bcsr_matrix_->col_indices.data(), + static_cast(bcsr_matrix_->row_values_data_ptr()), + static_cast(bcsr_matrix_->col_indices_data_ptr()), b() + n, c() + m * nc() + n, nc(), diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/buckbuild.bzl b/aten/src/ATen/native/quantized/cpu/qnnpack/buckbuild.bzl index 5c1c316678e1..f981cce9726d 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/buckbuild.bzl +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/buckbuild.bzl @@ -272,7 +272,6 @@ def define_qnnpack(third_party, labels = []): "src/max-pooling.c", "src/operator-delete.c", "src/operator-run.c", - "src/pack_block_sparse.cc", "src/sigmoid.c", "src/softargmax.c", "src/tanh.c", 
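The q8gemm_sparse benchmark updates above stop reading the row_values/col_indices members directly and instead go through virtual row_values_data_ptr()/col_indices_data_ptr() accessors returning const void*, which the caller casts back to the index width it packed with. A stripped-down sketch of that type-erased accessor pattern, independent of the real qnnpack::BCSRMatrix declared in the pack_block_sparse.h hunk further below:

#include <cstdint>
#include <vector>

// Type-erased base: callers that only need raw pointers (e.g. the sparse GEMM
// ukernels) can stay unaware of the concrete index width.
struct IndexedMatrix {
  virtual ~IndexedMatrix() = default;
  virtual const void* row_values_data_ptr() const = 0;
  virtual const void* col_indices_data_ptr() const = 0;
};

// Concrete storage, templated on the index dtype (uint8_t/uint16_t/uint32_t).
template <typename IndexT>
struct TypedIndexedMatrix : IndexedMatrix {
  std::vector<IndexT> row_values;
  std::vector<IndexT> col_indices;
  const void* row_values_data_ptr() const override { return row_values.data(); }
  const void* col_indices_data_ptr() const override { return col_indices.data(); }
};

// Caller side, as in the updated benchmarks: cast back to the width the matrix
// is (here: assumed to be) packed with.
const uint32_t* row_values_w32(const IndexedMatrix& m) {
  return static_cast<const uint32_t*>(m.row_values_data_ptr());
}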
diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake b/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake index 4a86d641e412..66b2232b5925 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/cmake/DownloadGoogleTest.cmake @@ -11,7 +11,7 @@ project(googletest-download NONE) include(ExternalProject) ExternalProject_Add(googletest URL https://github.com/google/googletest/archive/release-1.10.0.zip - URL_HASH SHA256=f3ed3b58511efd272eb074a3a6d6fb79d7c2e6a0e374323d1e6bcbcc1ef141bf + URL_HASH SHA256=94c634d499558a76fa649edb13721dce6e98fb1e7018dfaeba3cd7a083945e91 SOURCE_DIR "${CONFU_DEPENDENCIES_SOURCE_DIR}/googletest" BINARY_DIR "${CONFU_DEPENDENCIES_BINARY_DIR}/googletest" CONFIGURE_COMMAND "" diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/CMakeLists.txt b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/CMakeLists.txt index f19d6c61f33f..e763e4e3ba93 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/CMakeLists.txt +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/CMakeLists.txt @@ -63,7 +63,7 @@ set_target_properties(clog PROPERTIES C_EXTENSIONS NO) CLOG_TARGET_RUNTIME_LIBRARY(clog) set_target_properties(clog PROPERTIES PUBLIC_HEADER include/clog.h) -target_include_directories(clog BEFORE PUBLIC include) +target_include_directories(clog PUBLIC $ $) if(CLOG_LOG_TO_STDIO) target_compile_definitions(clog PRIVATE CLOG_LOG_TO_STDIO=1) else() @@ -73,7 +73,10 @@ if(ANDROID AND NOT CLOG_LOG_TO_STDIO) target_link_libraries(clog PRIVATE log) endif() +add_library(cpuinfo::clog ALIAS clog) + install(TARGETS clog + EXPORT cpuinfo-targets LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}" ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}" PUBLIC_HEADER DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}") diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake index 4a86d641e412..66b2232b5925 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/cmake/DownloadGoogleTest.cmake @@ -11,7 +11,7 @@ project(googletest-download NONE) include(ExternalProject) ExternalProject_Add(googletest URL https://github.com/google/googletest/archive/release-1.10.0.zip - URL_HASH SHA256=f3ed3b58511efd272eb074a3a6d6fb79d7c2e6a0e374323d1e6bcbcc1ef141bf + URL_HASH SHA256=94c634d499558a76fa649edb13721dce6e98fb1e7018dfaeba3cd7a083945e91 SOURCE_DIR "${CONFU_DEPENDENCIES_SOURCE_DIR}/googletest" BINARY_DIR "${CONFU_DEPENDENCIES_BINARY_DIR}/googletest" CONFIGURE_COMMAND "" diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/include/pack_block_sparse.h b/aten/src/ATen/native/quantized/cpu/qnnpack/include/pack_block_sparse.h index bfaa19e564b4..4770b30638ad 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/include/pack_block_sparse.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/include/pack_block_sparse.h @@ -15,9 +15,14 @@ #ifndef _WIN32 #include #endif +#include #include #include +#ifdef QNNPACK_BCSRMATRIX_DEBUG +#include +#endif // QNNPACK_BCSRMATRIX_DEBUG + namespace qnnpack { template @@ -70,13 +75,20 @@ struct OwnedOrBorrowedVector { owned(false) {} }; -typedef struct BCSRMatrix { - OwnedOrBorrowedVector col_indices; - OwnedOrBorrowedVector row_values; +struct BCSRMatrix { OwnedOrBorrowedVector values; uint32_t col_block_size; 
// input features block size uint32_t row_block_size; // output features block size - void print() const; + enum pytorch_qnnp_sparse_matrix_indices_dtype indices_dtype; + virtual ~BCSRMatrix() = default; + // Return void for the data ptrs because it doesn't require knowing the + // underlying TypedBCSRMatrix indices dtype and that's how it's passed + // into the qnnpack fully connected sparse op + virtual const void* col_indices_data_ptr() const = 0; + virtual const void* row_values_data_ptr() const = 0; +#ifdef QNNPACK_BCSRMATRIX_DEBUG + virtual void print() const = 0; +#endif // QNNPACK_BCSRMATRIX_DEBUG /* * Unpack from BCSR to Dense * - Each value and zero point converted to int8_t by subtracting 128 @@ -84,29 +96,288 @@ typedef struct BCSRMatrix { * - dst should be able to hold num_rows * num_cols elements * - zero_points should hold num_rows zero points */ + virtual void unpack( + int8_t* dst, + const int64_t num_rows, + const int64_t num_cols, + const uint8_t* zero_points) const = 0; + virtual uint32_t max_index() const = 0; +}; + +template +struct TypedBCSRMatrix : BCSRMatrix { + OwnedOrBorrowedVector col_indices; + OwnedOrBorrowedVector row_values; + TypedBCSRMatrix(); + const void* col_indices_data_ptr() const override; + const void* row_values_data_ptr() const override; +#ifdef QNNPACK_BCSRMATRIX_DEBUG + void print() const override; +#endif // QNNPACK_BCSRMATRIX_DEBUG void unpack( int8_t* dst, const int64_t num_rows, const int64_t num_cols, - const uint8_t* zero_points) const; -} BCSRMatrix; + const uint8_t* zero_points) const override; + uint32_t max_index() const override; + + ~TypedBCSRMatrix() override = default; +}; +template std::unique_ptr generateBlockCSRMatrix( const uint8_t* a, const size_t N, const size_t K, const uint32_t row_block_size, const uint32_t col_block_size, - const uint8_t* zero_points); + const uint8_t* zero_points) { + assert(K > 0); + std::unique_ptr> bcsr_mat = + std::make_unique>(); + auto& row_values = bcsr_mat->row_values.vector(); + auto& col_indices = bcsr_mat->col_indices.vector(); + auto& values = bcsr_mat->values.vector(); + + const uint32_t num_row_blocks = (N + row_block_size - 1) / row_block_size; + // K must be > 0 + const uint32_t num_col_blocks = (K + col_block_size - 1) / col_block_size; + row_values.reserve(num_row_blocks); + uint32_t num_nnz_blocks{0}; + row_values.push_back(num_nnz_blocks); + for (uint32_t i = 0; i < num_row_blocks; ++i) { + for (uint32_t j = 0; j < num_col_blocks; ++j) { + bool block_zero{true}; + for (uint32_t ib = 0; ib < row_block_size; ++ib) { + uint32_t row_index = i * row_block_size + ib; + if PYTORCH_QNNP_UNLIKELY(row_index >= N) { + break; + } + for (uint32_t jb = 0; jb < col_block_size; ++jb) { + uint32_t col_index = j * col_block_size + jb; + if PYTORCH_QNNP_UNLIKELY(col_index >= K) { + goto block_scanned; + } + if (*(a + row_index * K + col_index) != zero_points[row_index]) { + block_zero = false; + goto block_scanned; + } + } + } +block_scanned: + if (!block_zero) { + col_indices.push_back(j); + num_nnz_blocks++; + for (uint32_t ib = 0; ib < row_block_size; ++ib) { + uint32_t row_index = i * row_block_size + ib; + if PYTORCH_QNNP_UNLIKELY(row_index >= N) { + for (; row_index < (num_row_blocks * row_block_size); row_index++) { + for (uint32_t jb = 0; jb < col_block_size; ++jb) { + values.push_back(zero_points[N-1]); + } + } + break; + } + for (uint32_t jb = 0; jb < col_block_size; ++jb) { + uint32_t col_index = j * col_block_size + jb; + if PYTORCH_QNNP_UNLIKELY(col_index >= K) { + 
values.push_back(zero_points[row_index]); + } else { + uint8_t val = *(a + row_index * K + col_index); + values.push_back(val); + } + } + } + } + } + row_values.push_back(num_nnz_blocks); + } + bcsr_mat->row_block_size = row_block_size; + bcsr_mat->col_block_size = col_block_size; + return bcsr_mat; +} + +template std::unique_ptr generateBlockCSRMatrix( - uint32_t* col_indices, - uint32_t* row_values, + INDICES_DTYPE* col_indices, + INDICES_DTYPE* row_values, uint8_t* values, const int64_t col_indices_size, const int64_t row_values_size, const int64_t values_size, const int64_t row_block_size, - const int64_t col_block_size); + const int64_t col_block_size) { + std::unique_ptr> bcsr_mat = + std::make_unique>(); + bcsr_mat->col_indices = + OwnedOrBorrowedVector(col_indices, col_indices_size); + bcsr_mat->row_values = + OwnedOrBorrowedVector(row_values, row_values_size); + bcsr_mat->values = OwnedOrBorrowedVector(values, values_size); + bcsr_mat->row_block_size = row_block_size; + bcsr_mat->col_block_size = col_block_size; + return bcsr_mat; +} + +template +struct IndicesDtypeEnumTrait { + static_assert( + sizeof(INDICES_DTYPE) == 0, + "Invalid dtype for IndicesDtypeEnumTrait"); +}; + +template <> +struct IndicesDtypeEnumTrait { + const static pytorch_qnnp_sparse_matrix_indices_dtype dtype = + pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t; +}; + +template <> +struct IndicesDtypeEnumTrait { + const static pytorch_qnnp_sparse_matrix_indices_dtype dtype = + pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t; +}; + +template <> +struct IndicesDtypeEnumTrait { + const static pytorch_qnnp_sparse_matrix_indices_dtype dtype = + pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t; +}; + +template +TypedBCSRMatrix::TypedBCSRMatrix() { + indices_dtype = IndicesDtypeEnumTrait::dtype; +} + +template +const void* TypedBCSRMatrix::col_indices_data_ptr() const { + return static_cast(col_indices.data()); +} + +template +const void* TypedBCSRMatrix::row_values_data_ptr() const { + return static_cast(row_values.data()); +} + +#ifdef QNNPACK_BCSRMATRIX_DEBUG +template +void TypedBCSRMatrix::print() const { + std::cout << "row block size:" << row_block_size << std::endl; + std::cout << "col block size:" << col_block_size << std::endl; + std::cout << "row ptr\n"; + std::cout + << "indices dtype: uint" + << static_cast< + std::underlying_type_t>( + indices_dtype) + << "_t" << std::endl; + for (uint32_t i = 0; i < row_values.size(); i++) { + std::cout << (uint32_t)row_values[i] << ", "; + } + std::cout << std::endl; + std::cout << "col indices\n"; + for (uint32_t i = 0; i < col_indices.size(); i++) { + std::cout << (uint32_t)col_indices[i] << ", "; + } + std::cout << std::endl; + std::cout << "Actual values\n"; + for (uint32_t i = 0; i < values.size(); i++) { + std::cout << (uint32_t)values[i] << ", "; + } + std::cout << std::endl; +} +#endif // QNNPACK_BCSRMATRIX_DEBUG + +template +void TypedBCSRMatrix::unpack( + int8_t* dst, + const int64_t num_rows, + const int64_t num_cols, + const uint8_t* zero_points) const { + for (int64_t i = 0; i < num_rows; i++) { + memset( + dst + i * num_cols, + static_cast(static_cast(zero_points[i]) - 128), + num_cols * sizeof(int8_t)); + } + + const int64_t num_block_rows = static_cast(row_values.size()) - 1; + const int64_t block_size = (int64_t)row_block_size * col_block_size; + int64_t weight_values_num = 0; + for (int64_t block_row_num = 0; block_row_num < num_block_rows; + block_row_num++) { + const int64_t num_blocks_in_current_block_row = + row_values[block_row_num + 1] - 
row_values[block_row_num]; + for (int64_t k = 0; k < num_blocks_in_current_block_row; + k++) { // iterate over each block in the row + const int64_t block_start_row_num = block_row_num * row_block_size; + const int64_t block_start_col_num = + (int64_t)(col_indices[weight_values_num / block_size]) * + col_block_size; + for (int64_t l = 0; l < block_size; + l++) { // iterate over each value in the block + const int64_t row_num = block_start_row_num + l / col_block_size; + const int64_t col_num = block_start_col_num + l % col_block_size; + if (row_num < num_rows && col_num < num_cols) { + dst[row_num * num_cols + col_num] = static_cast( + static_cast(values[weight_values_num]) - 128); + } + weight_values_num++; + } + } + } +} + +template +uint32_t TypedBCSRMatrix::max_index() const { + return static_cast(std::max( + *std::max_element( + row_values.data(), row_values.data() + row_values.size()), + *std::max_element( + col_indices.data(), col_indices.data() + col_indices.size()))); +} + +/** + * Given a BCSRMatrix (bcsr_) and a block of code enclosed in { } + * (dispatch_body), run the block of code with the following in scope + * 1) The BCSRMatrix's underlying TypedBCSRMatrix, called typed_bcsr + * 2) The TypedBCSRMatrix's indices data type, called INDICES_DTYPE + */ +#define QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE(bcsr_, dispatch_body) \ + [&bcsr = bcsr_]() { \ + switch (bcsr->indices_dtype) { \ + case pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t: { \ + using INDICES_DTYPE = uint32_t; \ + const qnnpack::TypedBCSRMatrix* typed_bcsr = \ + static_cast*>( \ + bcsr.get()); \ + return [&typed_bcsr]() dispatch_body(); \ + } \ + case pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t: { \ + using INDICES_DTYPE = uint16_t; \ + const qnnpack::TypedBCSRMatrix* typed_bcsr = \ + static_cast*>( \ + bcsr.get()); \ + return [&typed_bcsr]() dispatch_body(); \ + } \ + case pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t: { \ + using INDICES_DTYPE = uint8_t; \ + const qnnpack::TypedBCSRMatrix* typed_bcsr = \ + static_cast*>( \ + bcsr.get()); \ + return [&typed_bcsr]() dispatch_body(); \ + } \ + case pytorch_qnnp_sparse_matrix_indices_dtype_invalid: { \ + assert(false); \ + } \ + } \ + /* Throw exception to avoid the following errors: */ \ + /* - "non-void lambda does not return a value in all control paths" */ \ + /* - "control reaches end of non-void function" */ \ + /* Throwing exception from within invalid case alone does not fix these */ \ + throw std::invalid_argument( \ + "Invalid indices dtype in QNNPACK_BCSRMATRIX_DISPATCH_INDICES_DTYPE"); \ + }() } // namespace qnnpack diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/include/pytorch_qnnpack.h b/aten/src/ATen/native/quantized/cpu/qnnpack/include/pytorch_qnnpack.h index 07666ea09605..c518104153e5 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/include/pytorch_qnnpack.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/include/pytorch_qnnpack.h @@ -32,6 +32,13 @@ enum pytorch_qnnp_status { pytorch_qnnp_status_out_of_memory = 5, }; +enum pytorch_qnnp_sparse_matrix_indices_dtype { + pytorch_qnnp_sparse_matrix_indices_dtype_invalid = 0, + pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t = 8, + pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t = 16, + pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t = 32, +}; + enum pytorch_qnnp_status pytorch_qnnp_initialize(void); enum pytorch_qnnp_status pytorch_qnnp_deinitialize(void); @@ -168,11 +175,12 @@ enum pytorch_qnnp_status pytorch_qnnp_create_fully_connected_sparse_dq_nc_q8( size_t 
output_channels, uint8_t input_zero_point, const uint8_t* kernel_zero_points, - const uint32_t* kernel_col_indices, - const uint32_t* kernel_row_values, + const void* kernel_col_indices, + const void* kernel_row_values, const uint8_t* kernel_values, const uint32_t kernel_row_block_size, const uint32_t kernel_col_block_size, + enum pytorch_qnnp_sparse_matrix_indices_dtype kernel_indices_dtype, uint8_t output_zero_point, uint8_t output_min, uint8_t output_max, diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/fully-connected-sparse.c b/aten/src/ATen/native/quantized/cpu/qnnpack/src/fully-connected-sparse.c index 4feadadf9796..71226ab5250e 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/fully-connected-sparse.c +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/fully-connected-sparse.c @@ -26,11 +26,12 @@ enum pytorch_qnnp_status pytorch_qnnp_create_fully_connected_sparse_dq_nc_q8( size_t output_channels, uint8_t input_zero_point, const uint8_t* kernel_zero_points, - const uint32_t* kernel_col_indices, - const uint32_t* kernel_row_values, + const void* kernel_col_indices, + const void* kernel_row_values, const uint8_t* kernel_values, const uint32_t kernel_row_block_size, const uint32_t kernel_col_block_size, + enum pytorch_qnnp_sparse_matrix_indices_dtype kernel_indices_dtype, uint8_t output_zero_point, uint8_t output_min, uint8_t output_max, @@ -77,8 +78,34 @@ enum pytorch_qnnp_status pytorch_qnnp_create_fully_connected_sparse_dq_nc_q8( goto error; } } - fully_connected->sparse_matrix.col_indices = kernel_col_indices; - fully_connected->sparse_matrix.row_values = kernel_row_values; + + fully_connected->sparse_matrix.indices_dtype = kernel_indices_dtype; + switch (kernel_indices_dtype) { + case pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t: + fully_connected->sparse_matrix.col_indices_w32 = + (const uint32_t*)kernel_col_indices; + fully_connected->sparse_matrix.row_values_w32 = + (const uint32_t*)kernel_row_values; + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t: + fully_connected->sparse_matrix.col_indices_w16 = + (const uint16_t*)kernel_col_indices; + fully_connected->sparse_matrix.row_values_w16 = + (const uint16_t*)kernel_row_values; + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t: + fully_connected->sparse_matrix.col_indices_w8 = + (const uint8_t*)kernel_col_indices; + fully_connected->sparse_matrix.row_values_w8 = + (const uint8_t*)kernel_row_values; + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_invalid: + status = pytorch_qnnp_status_invalid_parameter; + pytorch_qnnp_log_error( + "Invalid indices dtype specified for qnnpack fully connected sparse"); + goto error; + } + fully_connected->sparse_matrix.values = kernel_values; fully_connected->sparse_matrix.row_block_size = kernel_row_block_size; fully_connected->sparse_matrix.col_block_size = kernel_col_block_size; diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/init.c b/aten/src/ATen/native/quantized/cpu/qnnpack/src/init.c index 8768349d8587..b2ea18c669c6 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/init.c +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/init.c @@ -61,7 +61,9 @@ static void init(void) { }; pytorch_qnnp_params.q8gemm_sparse_c1x4 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon, + .packedA_w32_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w32__aarch32_neon, + .packedA_w16_gemm_dq = 
pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w16__aarch32_neon, + .packedA_w8_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w8__aarch32_neon, .packA = pytorch_q8gemm_sparse_packA_ukernel_4x4__aarch32_neon, .mr = 4, .nr = 8, @@ -73,7 +75,9 @@ static void init(void) { }; pytorch_qnnp_params.q8gemm_sparse_c8x1 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon, + .packedA_w32_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w32__aarch32_neon, + .packedA_w16_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w16__aarch32_neon, + .packedA_w8_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w8__aarch32_neon, .packA = pytorch_q8gemm_sparse_packA_ukernel_4x4__aarch32_neon, .mr = 4, .nr = 8, @@ -169,7 +173,9 @@ static void init(void) { #elif CPUINFO_ARCH_ARM64 pytorch_qnnp_params.q8gemm_sparse_c1x4 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon, + .packedA_w32_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w32__aarch64_neon, + .packedA_w16_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w16__aarch64_neon, + .packedA_w8_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w8__aarch64_neon, .packA = pytorch_q8gemm_sparse_packA_ukernel_8x4__aarch64_neon, .mr = 8, .nr = 8, @@ -181,7 +187,9 @@ static void init(void) { }; pytorch_qnnp_params.q8gemm_sparse_c8x1 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon, + .packedA_w32_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w32__aarch64_neon, + .packedA_w16_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w16__aarch64_neon, + .packedA_w8_gemm_dq = pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w8__aarch64_neon, .packA = pytorch_q8gemm_sparse_packA_ukernel_8x4__aarch64_neon, .mr = 8, .nr = 8, @@ -265,7 +273,9 @@ static void init(void) { }; pytorch_qnnp_params.q8gemm_sparse_c1x4 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2, + .packedA_w32_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2, + .packedA_w16_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2, + .packedA_w8_gemm_dq = pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2, .packA = pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, .mr = 8, .nr = 4, @@ -277,7 +287,9 @@ static void init(void) { }; pytorch_qnnp_params.q8gemm_sparse_c8x1 = (struct pytorch_q8gemm_sparse_parameters){ .gemm_dq = NULL, - .packedA_gemm_dq = NULL, + .packedA_w32_gemm_dq = NULL, + .packedA_w16_gemm_dq = NULL, + .packedA_w8_gemm_dq = NULL, .packA = NULL, .mr = 4, .nr = 8, diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/operator-run.c b/aten/src/ATen/native/quantized/cpu/qnnpack/src/operator-run.c index b1757ebb7ec9..a9a8858fe2b1 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/operator-run.c +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/operator-run.c @@ -128,14 +128,28 @@ struct q8gemm_prepackA_sparse_dq_context { size_t a_packed_stride; size_t log2_mr; size_t log2_row_block_size; - const uint32_t* kernel_col_indices; - const uint32_t* kernel_row_values; + union { + const uint32_t* kernel_col_indices_w32; + const uint16_t* kernel_col_indices_w16; + const uint8_t* kernel_col_indices_w8; + 
}; + union { + const uint32_t* kernel_row_values_w32; + const uint16_t* kernel_row_values_w16; + const uint8_t* kernel_row_values_w8; + }; + enum pytorch_qnnp_sparse_matrix_indices_dtype kernel_indices_dtype; const uint8_t* kernel_values; const float* bias; float* c; // can be float or uint8)t size_t c_stride; struct pytorch_qnnp_conv_dynamic_quantization_params quantization_params; - const pytorch_q8gemm_dq_sparse_packedA_ukernel_function ukernel; + union { + // Not const because assigned after context is initialized + pytorch_q8gemm_dq_sparse_packedA_w32_ukernel_function ukernel_w32; + pytorch_q8gemm_dq_sparse_packedA_w16_ukernel_function ukernel_w16; + pytorch_q8gemm_dq_sparse_packedA_w8_ukernel_function ukernel_w8; + }; const pytorch_q8gemm_sparse_packA_ukernel_function prepack_ukernel; }; @@ -172,26 +186,66 @@ static void compute_q8gemm_prepacked_sparse_dq( size_t pixel_range, size_t mr_block_size, size_t nr_block_size) { - const uint8_t* restrict a_packed = context->a_packed; const size_t mr_packed_block_start = ((mr_block_start >> context->log2_mr) * context->a_packed_stride); - float* restrict c = (float*)context->c; + const uint8_t* restrict a_packed = context->a_packed + mr_packed_block_start; const size_t c_stride = context->c_stride; - - size_t output_channel_index = nr_block_start; - context->ukernel( - mr_block_size, - nr_block_size, - a_packed + mr_packed_block_start, - context->kernel_values, - context->kernel_row_values + - (nr_block_start >> context->log2_row_block_size), - context->kernel_col_indices, - context->bias + nr_block_start, - c + mr_block_start * c_stride + nr_block_start, - c_stride, - output_channel_index, - &context->quantization_params); + float* restrict c = + ((float*)context->c) + mr_block_start * c_stride + nr_block_start; + const size_t kernel_row_values_shift = + nr_block_start >> context->log2_row_block_size; + const float* bias = context->bias + nr_block_start; + const size_t output_channel_index = nr_block_start; + + switch (context->kernel_indices_dtype) { + case pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t: + context->ukernel_w32( + mr_block_size, + nr_block_size, + a_packed, + context->kernel_values, + context->kernel_row_values_w32 + kernel_row_values_shift, + context->kernel_col_indices_w32, + bias, + c, + c_stride, + output_channel_index, + &context->quantization_params); + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t: + context->ukernel_w16( + mr_block_size, + nr_block_size, + a_packed, + context->kernel_values, + context->kernel_row_values_w16 + kernel_row_values_shift, + context->kernel_col_indices_w16, + bias, + c, + c_stride, + output_channel_index, + &context->quantization_params); + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t: + context->ukernel_w8( + mr_block_size, + nr_block_size, + a_packed, + context->kernel_values, + context->kernel_row_values_w8 + kernel_row_values_shift, + context->kernel_col_indices_w8, + bias, + c, + c_stride, + output_channel_index, + &context->quantization_params); + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_invalid: + pytorch_qnnp_log_error( + "Invalid indices dtype specified for " + "operator-run compute_q8gemm_prepacked_sparse_dq"); + assert(false); + } } struct q8sum_rows_context { @@ -1094,7 +1148,8 @@ enum pytorch_qnnp_status pytorch_qnnp_run_operator( const size_t group_output_channels = op->group_output_channels; uint32_t mr, log2_mr, nr, kr, log2_row_block_size; pytorch_q8gemm_sparse_packA_ukernel_function prepack_kernel; - 
pytorch_q8gemm_dq_sparse_packedA_ukernel_function compute_kernel; + struct pytorch_q8gemm_sparse_parameters* pytorch_q8gemm_sparse_params = + NULL; // used to assign ukernel if (op->sparse_matrix.row_block_size == 1 && op->sparse_matrix.col_block_size == 4) { mr = pytorch_qnnp_params.q8gemm_sparse_c1x4.mr; @@ -1102,9 +1157,8 @@ enum pytorch_qnnp_status pytorch_qnnp_run_operator( log2_row_block_size = 0; nr = pytorch_qnnp_params.q8gemm_sparse_c1x4.nr; kr = pytorch_qnnp_params.q8gemm_sparse_c1x4.kr; - compute_kernel = - pytorch_qnnp_params.q8gemm_sparse_c1x4.packedA_gemm_dq; prepack_kernel = pytorch_qnnp_params.q8gemm_sparse_c1x4.packA; + pytorch_q8gemm_sparse_params = &pytorch_qnnp_params.q8gemm_sparse_c1x4; } else if (op->sparse_matrix.row_block_size == 8 && op->sparse_matrix.col_block_size == 1) { mr = pytorch_qnnp_params.q8gemm_sparse_c8x1.mr; @@ -1112,9 +1166,8 @@ enum pytorch_qnnp_status pytorch_qnnp_run_operator( log2_row_block_size = 3; nr = pytorch_qnnp_params.q8gemm_sparse_c8x1.nr; kr = pytorch_qnnp_params.q8gemm_sparse_c8x1.kr; - compute_kernel = - pytorch_qnnp_params.q8gemm_sparse_c8x1.packedA_gemm_dq; prepack_kernel = pytorch_qnnp_params.q8gemm_sparse_c8x1.packA; + pytorch_q8gemm_sparse_params = &pytorch_qnnp_params.q8gemm_sparse_c8x1; } else { return pytorch_qnnp_status_invalid_parameter; } @@ -1132,24 +1185,56 @@ enum pytorch_qnnp_status pytorch_qnnp_run_operator( } struct q8gemm_prepackA_sparse_dq_context - q8gemm_prepack_sparse_dq_context = { - .k = group_input_channels, - .a = op->input, - .a_stride = op->input_pixel_stride, - .a_packed = op->prepacked_a, - .a_packed_stride = k_stride * mr, - .log2_mr = log2_mr, - .log2_row_block_size = log2_row_block_size, - .kernel_col_indices = op->sparse_matrix.col_indices, - .kernel_row_values = op->sparse_matrix.row_values, - .kernel_values = op->sparse_matrix.values, - .bias = (const float*)op->bias, - .c = (float*)op->output, - .c_stride = op->output_pixel_stride, - .quantization_params = op->dynamic_conv_quantization_params, - .ukernel = compute_kernel, - .prepack_ukernel = prepack_kernel, - }; + q8gemm_prepack_sparse_dq_context = { + .k = group_input_channels, + .a = op->input, + .a_stride = op->input_pixel_stride, + .a_packed = op->prepacked_a, + .a_packed_stride = k_stride * mr, + .log2_mr = log2_mr, + .log2_row_block_size = log2_row_block_size, + .kernel_indices_dtype = op->sparse_matrix.indices_dtype, + .kernel_values = op->sparse_matrix.values, + .bias = (const float*)op->bias, + .c = (float*)op->output, + .c_stride = op->output_pixel_stride, + .quantization_params = op->dynamic_conv_quantization_params, + .prepack_ukernel = prepack_kernel, + // kernel_col_indices, kernel_row_values, and ukernel assigned + // below + }; + + switch (q8gemm_prepack_sparse_dq_context.kernel_indices_dtype) { + case pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t: + q8gemm_prepack_sparse_dq_context.kernel_col_indices_w32 = + op->sparse_matrix.col_indices_w32; + q8gemm_prepack_sparse_dq_context.kernel_row_values_w32 = + op->sparse_matrix.row_values_w32; + q8gemm_prepack_sparse_dq_context.ukernel_w32 = + pytorch_q8gemm_sparse_params->packedA_w32_gemm_dq; + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t: + q8gemm_prepack_sparse_dq_context.kernel_col_indices_w16 = + op->sparse_matrix.col_indices_w16; + q8gemm_prepack_sparse_dq_context.kernel_row_values_w16 = + op->sparse_matrix.row_values_w16; + q8gemm_prepack_sparse_dq_context.ukernel_w16 = + pytorch_q8gemm_sparse_params->packedA_w16_gemm_dq; + break; + case 
pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t: + q8gemm_prepack_sparse_dq_context.kernel_col_indices_w8 = + op->sparse_matrix.col_indices_w8; + q8gemm_prepack_sparse_dq_context.kernel_row_values_w8 = + op->sparse_matrix.row_values_w8; + q8gemm_prepack_sparse_dq_context.ukernel_w8 = + pytorch_q8gemm_sparse_params->packedA_w8_gemm_dq; + break; + case pytorch_qnnp_sparse_matrix_indices_dtype_invalid: + pytorch_qnnp_log_error( + "Invalid indices dtype specified for " + "operator-run pytorch_qnnp_ukernel_type_gemm_prepackA_sparse_dq"); + return pytorch_qnnp_status_invalid_parameter; + } // This batch size is not the actual batch size of the op // The batch size is modified in fully-connected-sparse.c diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/pack_block_sparse.cc b/aten/src/ATen/native/quantized/cpu/qnnpack/src/pack_block_sparse.cc deleted file mode 100644 index c837f55cda85..000000000000 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/pack_block_sparse.cc +++ /dev/null @@ -1,170 +0,0 @@ -/* - * Copyright (c) Facebook, Inc. and its affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. - */ -#include -#include -#include -#include -#include - -#include - -namespace qnnpack { -std::unique_ptr generateBlockCSRMatrix( - const uint8_t* a, - const size_t N, - const size_t K, - const uint32_t row_block_size, - const uint32_t col_block_size, - const uint8_t* zero_points) { - assert(K > 0); - std::unique_ptr bcsr_mat_ptr = std::make_unique(); - auto& bcsr_mat = *bcsr_mat_ptr; - auto& row_values = bcsr_mat.row_values.vector(); - auto& col_indices = bcsr_mat.col_indices.vector(); - auto& values = bcsr_mat.values.vector(); - - const uint32_t num_row_blocks = (N + row_block_size - 1) / row_block_size; - // K must be > 0 - const uint32_t num_col_blocks = (K + col_block_size - 1) / col_block_size; - - row_values.reserve(num_row_blocks); - uint32_t num_nnz_blocks{0}; - row_values.push_back(num_nnz_blocks); - for (uint32_t i = 0; i < num_row_blocks; ++i) { - for (uint32_t j = 0; j < num_col_blocks; ++j) { - bool block_zero{true}; - for (uint32_t ib = 0; ib < row_block_size; ++ib) { - uint32_t row_index = i * row_block_size + ib; - if PYTORCH_QNNP_UNLIKELY(row_index >= N) { - break; - } - for (uint32_t jb = 0; jb < col_block_size; ++jb) { - uint32_t col_index = j * col_block_size + jb; - if PYTORCH_QNNP_UNLIKELY(col_index >= K) { - goto block_scanned; - } - if (*(a + row_index * K + col_index) != zero_points[row_index]) { - block_zero = false; - goto block_scanned; - } - } - } -block_scanned: - if (!block_zero) { - col_indices.push_back(j); - num_nnz_blocks++; - for (uint32_t ib = 0; ib < row_block_size; ++ib) { - uint32_t row_index = i * row_block_size + ib; - if PYTORCH_QNNP_UNLIKELY(row_index >= N) { - for (; row_index < (num_row_blocks * row_block_size); row_index++) { - for (uint32_t jb = 0; jb < col_block_size; ++jb) { - values.push_back(zero_points[N-1]); - } - } - break; - } - for (uint32_t jb = 0; jb < col_block_size; ++jb) { - uint32_t col_index = j * col_block_size + jb; - if PYTORCH_QNNP_UNLIKELY(col_index >= K) { - values.push_back(zero_points[row_index]); - } else { - uint8_t val = *(a + row_index * K + col_index); - values.push_back(val); - } - } - } - } - } - row_values.push_back(num_nnz_blocks); - } - bcsr_mat.row_block_size = row_block_size; - bcsr_mat.col_block_size = col_block_size; - return bcsr_mat_ptr; -} - -std::unique_ptr generateBlockCSRMatrix( 
- uint32_t* col_indices, - uint32_t* row_values, - uint8_t* values, - const int64_t col_indices_size, - const int64_t row_values_size, - const int64_t values_size, - const int64_t row_block_size, - const int64_t col_block_size) { - std::unique_ptr bcsr_mat_ptr = std::make_unique(); - BCSRMatrix& bcsr_mat = *bcsr_mat_ptr; - bcsr_mat.col_indices = - OwnedOrBorrowedVector(col_indices, col_indices_size); - bcsr_mat.row_values = - OwnedOrBorrowedVector(row_values, row_values_size); - bcsr_mat.values = OwnedOrBorrowedVector(values, values_size); - bcsr_mat.row_block_size = row_block_size; - bcsr_mat.col_block_size = col_block_size; - return bcsr_mat_ptr; -} - -void BCSRMatrix::print() const { - std::cout << "row block size:" << row_block_size << std::endl; - std::cout << "col block size:" << col_block_size << std::endl; - std::cout << "row ptr\n"; - for (int i = 0; i < row_values.size(); i++) { - std::cout << row_values[i] << ", "; - } - std::cout << std::endl; - std::cout << "col indices\n"; - for (int i = 0; i < col_indices.size(); i++) { - std::cout << col_indices[i] << ", "; - } - std::cout << std::endl; - std::cout << "Actual values\n"; - for (int i = 0; i < values.size(); i++) { - std::cout << (uint32_t)values[i] << ", "; - } - std::cout << std::endl; -} - -void BCSRMatrix::unpack( - int8_t* dst, - const int64_t num_rows, - const int64_t num_cols, - const uint8_t* zero_points) const { - for (int64_t i = 0; i < num_rows; i++) { - memset( - dst + i * num_cols, - static_cast(static_cast(zero_points[i]) - 128), - num_cols * sizeof(int8_t)); - } - - const int64_t num_block_rows = static_cast(row_values.size()) - 1; - const int64_t block_size = (int64_t)row_block_size * col_block_size; - int64_t weight_values_num = 0; - for (int64_t block_row_num = 0; block_row_num < num_block_rows; - block_row_num++) { - const int64_t num_blocks_in_current_block_row = - row_values[block_row_num + 1] - row_values[block_row_num]; - for (int64_t k = 0; k < num_blocks_in_current_block_row; - k++) { // iterate over each block in the row - const int64_t block_start_row_num = block_row_num * row_block_size; - const int64_t block_start_col_num = - (int64_t)(col_indices[weight_values_num / block_size]) * - col_block_size; - for (int64_t l = 0; l < block_size; - l++) { // iterate over each value in the block - const int64_t row_num = block_start_row_num + l / col_block_size; - const int64_t col_num = block_start_col_num + l % col_block_size; - if (row_num < num_rows && col_num < num_cols) { - dst[row_num * num_cols + col_num] = static_cast( - static_cast(values[weight_values_num]) - 128); - } - weight_values_num++; - } - } - } -} - -} // namsepace qnnpack diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/4x4c2-sse2.c b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/4x4c2-sse2.c index 0b2da5a62bed..398496e08115 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/4x4c2-sse2.c +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/4x4c2-sse2.c @@ -327,14 +327,15 @@ void pytorch_q8gemm_ukernel_4x4c2__sse2( (uint32_t)_mm_cvtsi128_si32(_mm_unpackhi_epi32(vout, vout)); *((uint32_t*)c3) = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(vout, 12)); } else { + typedef PYTORCH_QNNP_UNALIGNED uint16_t unaligned_uint16_t; if (nr >= 2) { - *((uint16_t*)c0) = (uint16_t)_mm_extract_epi16(vout, 0); + *((unaligned_uint16_t*)c0) = (uint16_t)_mm_extract_epi16(vout, 0); c0 += 2; - *((uint16_t*)c1) = (uint16_t)_mm_extract_epi16(vout, 2); + *((unaligned_uint16_t*)c1) = (uint16_t)_mm_extract_epi16(vout, 
2); c1 += 2; - *((uint16_t*)c2) = (uint16_t)_mm_extract_epi16(vout, 4); + *((unaligned_uint16_t*)c2) = (uint16_t)_mm_extract_epi16(vout, 4); c2 += 2; - *((uint16_t*)c3) = (uint16_t)_mm_extract_epi16(vout, 6); + *((unaligned_uint16_t*)c3) = (uint16_t)_mm_extract_epi16(vout, 6); c3 += 2; vout = _mm_srli_epi32(vout, 16); nr -= 2; diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c1x4-dq-packedA-aarch32-neon.S b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c1x4-dq-packedA-aarch32-neon.S index 1d545734f6d4..5b796bb2563c 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c1x4-dq-packedA-aarch32-neon.S +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c1x4-dq-packedA-aarch32-neon.S @@ -9,6 +9,12 @@ #include #include +#ifndef __APPLE__ +#define NDEF_APPLE_SYMBOLS .arch armv7-a; .fpu neon +#else +#define NDEF_APPLE_SYMBOLS +#endif + # r0 mr # r1 nr # r2 packed_a @@ -60,7 +66,397 @@ # |----------------| # -# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon( +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_row_ptr, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +#define MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_4X8_PACKEDA__AARCH32_NEON(W_INDEX_DTYPE_NUM_BITS, W_INDEX_DTYPE_NUM_BYTES_ARG, W_INDEX_DTYPE_LOG_NUM_BYTES_ARG, LOAD_INDEX_INSTRUCTION) ;\ + BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon ;\ + .arm ;\ + NDEF_APPLE_SYMBOLS ;\ + ;\ + PUSH {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + VPUSH {d8-d15} ;\ + ;\ + /* Store nr in r11 as well for late user. */ ;\ + MOV r11, r1 ;\ + /* Load output channel index */ ;\ + LDR r5, [sp, 120] ;\ + /* Load quantization params */ ;\ + /* - r7 = quantization_params */ ;\ + LDR r7, [sp, 124] ;\ + /* Load input_zero_point */ ;\ + VLD1.8 {d16[]}, [r7] ;\ + ADD r7, r7, 4 ;\ + /* Load pointer to per channel zero points array */ ;\ + LDR r4, [r7] ;\ + /* Add output_channel_index to the b_zero_point pointer */ ;\ + ADD r4, r4, r5 ;\ + ;\ + /* We enter the loop if r1 is atleast 1. */ ;\ + /* r1 = r1 - 1 will happen in the epilogue */ ;\ + /* of the loop */ ;\ + CMP r1, 1 ;\ + BLO _7_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + /* Load w_row_ptr + n */ ;\ + LDR r5, [sp, 100] ;\ + /* r7 = blocks_id_ptr */ ;\ + LDR r7, [sp, 104] ;\ + ;\ + .p2align 5 ;\ + _0_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + VEOR q10, q10, q10 ;\ + VLD1.8 {d17[]}, [r4]! ;\ + /* ip = w_row_ptr[n], lr = w_row_ptr[n+1] */ ;\ + /* r5 = r5 + W_INDEX_DTYPE_NUM_BYTES_ARG to point to next n */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r5], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + LOAD_INDEX_INSTRUCTION lr, [r5] ;\ + /* r6 = temp_packed_w = packed_w + w_row_ptr[n] * 4 */ ;\ + /* This points to the first block of nonzero value */ ;\ + /* for the nth row. 
*/ ;\ + ADD r6, r3, ip, LSL #2 ;\ + /* r9 = temp_w_block_ids_ptr = w_block_ids_ptr (r7) + w_row_ptr[n] */ ;\ + /* LSL for when elements are >1 byte */ ;\ + /* (4 bytes: LSL #2, 2 bytes: LSL #1, 1 byte: LSL #0) */ ;\ + /* This points to the block id of the first block */ ;\ + /* It should contain lr - ip number of block ids */ ;\ + ADD r9, r7, ip, LSL W_INDEX_DTYPE_LOG_NUM_BYTES_ARG ;\ + /* r8 = num_blocks that needs to be processed */ ;\ + SUB r8, lr, ip ;\ + SUBS r8, r8, 2 ;\ + BLO _1_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + k_loop_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + /* Load 2 non zero blocks of weights. Each block = 1x4. */ ;\ + VLD1.8 {d0}, [r6]! ;\ + ;\ + /* ip = block_id_ptr[0] */ ;\ + /* lr = block_id_ptr[1] */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r9], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + LOAD_INDEX_INSTRUCTION lr, [r9], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + ;\ + /* Add offset to r2 */ ;\ + /* Shift by 4 because each packed block is a block of 4x4 */ ;\ + /* which 16 bytes */ ;\ + ADD r10, r2, ip, LSL #4 ;\ + /* q9 = vxb */ ;\ + VSUBL.U8 q0, d0, d17 ;\ + ;\ + /* d2, d3 = 4x4 transposed */ ;\ + VLD1.8 {d2}, [r10]! ;\ + VLD1.8 {d3}, [r10] ;\ + ;\ + ADD r10, r2, lr, LSL #4 ;\ + ;\ + VSUBL.U8 q4, d2, d16 /* vxa0_t */ ;\ + ;\ + /* d4, d5 = next 4x4 transposed */ ;\ + VLD1.8 {d4}, [r10]! ;\ + VLD1.8 {d5}, [r10] ;\ + ;\ + VSUBL.U8 q5, d3, d16 /* vxa1_t */ ;\ + VSUBL.U8 q6, d4, d16 /* vxa4_t */ ;\ + VSUBL.U8 q7, d5, d16 /* vxa5_t */ ;\ + ;\ + /* q4, q5 = 4x4 block (16 values each of 16 bits) */ ;\ + /* q6, q7 = 4x4 block (16 values each of 16 bits) */ ;\ + ;\ + VMLAL.S16 q10, d8, d0[0] ;\ + VMLAL.S16 q10, d9, d0[1] ;\ + VMLAL.S16 q10, d10, d0[2] ;\ + VMLAL.S16 q10, d11, d0[3] ;\ + VMLAL.S16 q10, d12, d1[0] ;\ + VMLAL.S16 q10, d13, d1[1] ;\ + VMLAL.S16 q10, d14, d1[2] ;\ + VMLAL.S16 q10, d15, d1[3] ;\ + ;\ + SUBS r8, r8, 2 ;\ + ;\ + BHS k_loop_w##W_INDEX_DTYPE_NUM_BITS ;\ + _1_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r8, -2 ;\ + BEQ _2_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + /* Load last nonzero block */ ;\ + /* For this we will load 4 8 bit values as one 32 bit value */ ;\ + VLD1.32 {d0[]}, [r6]! ;\ + /* q9 = vxb */ ;\ + VSUBL.U8 q0, d0, d17 ;\ + ;\ + /* ip = block_id_ptr[0] */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r9] ;\ + ;\ + /* Add offset to r2 */ ;\ + /* Shift by 4 because each packed block is a block of 4x4 */ ;\ + /* which 16 bytes */ ;\ + ADD r10, r2, ip, LSL #4 ;\ + ;\ + VLD1.8 {d2}, [r10]! ;\ + VLD1.8 {d3}, [r10] ;\ + ;\ + VSUBL.U8 q4, d2, d16 /* vxa0_t */ ;\ + VSUBL.U8 q5, d3, d16 /* vxa1_t */ ;\ + ;\ + VMLAL.S16 q10, d8, d0[0] ;\ + VMLAL.S16 q10, d9, d0[1] ;\ + VMLAL.S16 q10, d10, d0[2] ;\ + VMLAL.S16 q10, d11, d0[3] ;\ + ;\ + .p2align 4 ;\ + _2_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + /* Store result on stack */ ;\ + ;\ + /* -12 because TOS - 4, TOS - 8, and TOS - 12, store mr, nr and pointer to weight zp */ ;\ + /* + 128 bytes of buffer when nr = 1 */ ;\ + /* This is needed because after processing all nrs we will */ ;\ + /* load 128 bytes from stack. This is for q10, q11 for max nr of 4 */ ;\ + /* Thus we will load accumulators back in q0, q1, q2, q3, q4, q5, q6, q7 */ ;\ + /* When nr < 4, extra q values will be fetched from stack which may overlap */ ;\ + /* with other parts of stack storing local variables. To avoid that we just */ ;\ + /* create a buffer of 128 bytes inbetween to make sure pointer increment */ ;\ + /* never produces address that is beyond the stack frame of this function. 
*/ ;\ + SUB r9, sp, 140 ;\ + /* Each iteration produce 4 values each of 4 bytes */ ;\ + /* Thus 4 x 4 = 16 bytes 2^4 */ ;\ + /* In this implementation, first value will be stored at */ ;\ + /* 1st value: sp - 12 - r1 * 16 */ ;\ + /* 2nd value: sp - 12 - (r1 - 1) * 16 */ ;\ + /* and so on. */ ;\ + SUB r9, r9, r1, LSL #4 ;\ + VST1.32 {q10}, [r9] ;\ + ;\ + /* Check if nr >=1 */ ;\ + SUBS r1, r1, 1 ;\ + BHI _0_w##W_INDEX_DTYPE_NUM_BITS ;\ + _3_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + /* First load all the accumulators from stack */ ;\ + /* Load nr */ ;\ + SUB r9, sp, 140 ;\ + SUB r9, r9, r11, LSL #4 ;\ + /* Now load q8-q15 */ ;\ + /* This is 8x4 block (nrxmr) */ ;\ + /* We will transpose this to 4x8 (mrxnr) */ ;\ + /* q8, q12 : x00, x10, x20, x30; x04, x14, x24, x34 */ ;\ + /* q9, q13 : x01, x11, x21, x31; x05, x15, x25, x35 */ ;\ + /* q10, q14 : x02, x12, x22, x32; x06, x16, x26, x36 */ ;\ + /* q11, q15 : x03, x13, x23, x33; x07, x17, x27, x37 */ ;\ + VLD1.32 {q8}, [r9]! ;\ + VLD1.32 {q9}, [r9]! ;\ + VLD1.32 {q10}, [r9]! ;\ + VLD1.32 {q11}, [r9]! ;\ + VLD1.32 {q12}, [r9]! ;\ + VLD1.32 {q13}, [r9]! ;\ + VLD1.32 {q14}, [r9]! ;\ + VLD1.32 {q15}, [r9] ;\ + ;\ + /*# Now transpose q8-11 */ ;\ + /* VTRN.32 q8, q9 */ ;\ + /* VTRN.32 q10, q11 */ ;\ + /* q8 : X00, x01, x20, x21 */ ;\ + /* q9 : X10, x11, x30, x31 */ ;\ + /* q10: X02, x03, x22, x23 */ ;\ + /* q11: X12, x13, x32, x33 */ ;\ + /* VSWP d16, d17 */ ;\ + /* q8 : x20, x21, x00, x01 */ ;\ + /* VEXT.32 q6, q8, q10, 2 */ ;\ + /* q6 : x00, x01, x02, x03 */ ;\ + /* VEXT.32 q10, q10, q8, 2 */ ;\ + /* q10: x22, x23, x20, x21 */ ;\ + /* VSWP d20, d21 */ ;\ + /* VMOV q8, q6 */ ;\ + /* q8 : X00, x01, x02, x03 */ ;\ + /* q10: x20, x21, x22, x23 */ ;\ + /* VSWP d18, d19 */ ;\ + /* q9 : x30, x31, x10, x11 */ ;\ + /* VEXT.32 q6, q9, q11, 2 */ ;\ + /* q6 : x10, x11, x12, x13 */ ;\ + /* VEXT.32 q11, q11, q9, 2 */ ;\ + /* q11: x32, x33, x30, x31 */ ;\ + /* VSWP d22, d23 */ ;\ + /* VMOV q9, q6 */ ;\ + /* q9 : x10, x11, x12, x13 */ ;\ + /* q11: x30, x31, x32, x33 */ ;\ + /* Thus we have */ ;\ + /* q8 : X00, x01, x02, x03 */ ;\ + /* q9 : X10, x11, x12, x13 */ ;\ + /* q10: X20, x21, x22, x23 */ ;\ + /* q11: X30, x31, x32, x33 */ ;\ + /* Now we can do the same for q4-q7 */ ;\ + /* q12: X04, X05, X06, X07 */ ;\ + /* q13: X14, X15, X16, X17 */ ;\ + /* q14: X24, X25, X26, X27 */ ;\ + /* q15: X34, X35, X36, X37 */ ;\ + ;\ + VTRN.32 q8, q9 ;\ + VTRN.32 q10, q11 ;\ + VSWP d16, d17 ;\ + VEXT.32 q6, q8, q10, 2 ;\ + VEXT.32 q10, q10, q8, 2 ;\ + VSWP d20, d21 ;\ + VMOV q8, q6 ;\ + VSWP d18, d19 ;\ + VEXT.32 q6, q9, q11, 2 ;\ + VEXT.32 q11, q11, q9, 2 ;\ + VSWP d22, d23 ;\ + VMOV q9, q6 ;\ + ;\ + VTRN.32 q12, q13 ;\ + VTRN.32 q14, q15 ;\ + VSWP d24, d25 ;\ + VEXT.32 q6, q12, q14, 2 ;\ + VEXT.32 q14, q14, q12, 2 ;\ + VSWP d28, d29 ;\ + VMOV q12, q6 ;\ + VSWP d26, d27 ;\ + VEXT.32 q6, q13, q15, 2 ;\ + VEXT.32 q15, q15, q13, 2 ;\ + VSWP d30, d31 ;\ + VMOV q13, q6 ;\ + ;\ + /* Load output channel index */ ;\ + LDR r5, [sp, 120] ;\ + /* Load quantization params */ ;\ + /* - r7 = quantization_params */ ;\ + LDR r7, [sp, 124] ;\ + ADD r7, r7, 8 ;\ + /* Load pointer to per channel requant scale */ ;\ + LDR r7, [r7] ;\ + /* Now r7 has the base_addr + offset for multipliers */ ;\ + ADD r7, r7, r5, LSL #2 ;\ + ;\ + LDR r6, [sp, 108] ;\ + /* Load q6: vmultiplier_c0123 */ ;\ + VLD1.32 {d12, d13}, [r7]! ;\ + /* Load q7: vmultiplier_c4567 */ ;\ + VLD1.32 {d14, d15}, [r7] ;\ + VCVT.F32.S32 q8, q8 ;\ + VCVT.F32.S32 q9, q9 ;\ + VCVT.F32.S32 q10, q10 ;\ + VLD1.32 {q0}, [r6]! 
;\ + VLD1.32 {q1}, [r6] ;\ + ;\ + VCVT.F32.S32 q11, q11 ;\ + VCVT.F32.S32 q12, q12 ;\ + VCVT.F32.S32 q13, q13 ;\ + VCVT.F32.S32 q14, q14 ;\ + VCVT.F32.S32 q15, q15 ;\ + ;\ + VMUL.F32 q8, q8, q6 ;\ + VMUL.F32 q9, q9, q6 ;\ + VMUL.F32 q10, q10, q6 ;\ + VMUL.F32 q11, q11, q6 ;\ + VMUL.F32 q12, q12, q7 ;\ + VMUL.F32 q13, q13, q7 ;\ + VMUL.F32 q14, q14, q7 ;\ + VMUL.F32 q15, q15, q7 ;\ + ;\ + VADD.F32 q8, q8, q0 ;\ + VADD.F32 q9, q9, q0 ;\ + VADD.F32 q10, q10, q0 ;\ + VADD.F32 q11, q11, q0 ;\ + VADD.F32 q12, q12, q1 ;\ + VADD.F32 q13, q13, q1 ;\ + VADD.F32 q14, q14, q1 ;\ + VADD.F32 q15, q15, q1 ;\ + ;\ + /* Load c, c_stride: */ ;\ + /* - r1 = c */ ;\ + /* - r9 = c_stride */ ;\ + LDR r1, [sp, 112] ;\ + LDR r9, [sp, 116] ;\ + LSL r9, r9, 2 ;\ + ;\ + /* r1 = c0 = c pointer */ ;\ + ;\ + CMP r0, 2 ;\ + /* r2 = c1 */ ;\ + ADD r2, r1, r9 ;\ + MOVLO r2, r1 ;\ + ;\ + /* r3 = c2 */ ;\ + ADD r3, r2, r9 ;\ + MOVLS r3, r2 ;\ + ;\ + CMP r0, 4 ;\ + /* r4 = c3 */ ;\ + ADD r4, r3, r9 ;\ + MOVNE r4, r3 ;\ + ;\ + CMP r11, 8 ;\ + BNE _4_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {q8}, [r1]! ;\ + VST1.32 {q9}, [r2]! ;\ + VST1.32 {q10}, [r3]! ;\ + VST1.32 {q11}, [r4]! ;\ + VST1.32 {q12}, [r1] ;\ + VST1.32 {q13}, [r2] ;\ + VST1.32 {q14}, [r3] ;\ + VST1.32 {q15}, [r4] ;\ + ;\ + VPOP {d8-d15} ;\ + POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + BX lr ;\ + ;\ + .p2align 3 ;\ + _4_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r11, 4 ;\ + BLO _5_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {q8}, [r1]! ;\ + VST1.32 {q9}, [r2]! ;\ + VST1.32 {q10}, [r3]! ;\ + VST1.32 {q11}, [r4]! ;\ + ;\ + SUB r11, 4 ;\ + ;\ + VMOV.32 q8, q12 ;\ + VMOV.32 q9, q13 ;\ + VMOV.32 q10, q14 ;\ + VMOV.32 q11, q15 ;\ + ;\ + _5_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r11, 2 ;\ + BLO _6_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {d16}, [r1]! ;\ + VST1.32 {d18}, [r2]! ;\ + VST1.32 {d20}, [r3]! ;\ + VST1.32 {d22}, [r4]! ;\ + ;\ + SUB r11, 2 ;\ + ;\ + VEXT.32 q8, q8, 2 ;\ + VEXT.32 q9, q9, 2 ;\ + VEXT.32 q10, q10, 2 ;\ + VEXT.32 q11, q11, 2 ;\ + ;\ + _6_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + TEQ r11, 0 ;\ + BEQ _7_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {d16[0]}, [r1] ;\ + VST1.32 {d18[0]}, [r2] ;\ + VST1.32 {d20[0]}, [r3] ;\ + VST1.32 {d22[0]}, [r4] ;\ + ;\ + _7_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + VPOP {d8-d15} ;\ + POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + BX lr ;\ + ;\ + END_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon + +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w32__aarch32_neon( # size_t mr, # size_t nr, # const uint8_t* a_packed, @@ -72,385 +468,39 @@ # size_t c_stride, # size_t output_channel_index, # const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) -BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon - .arm -#ifndef __APPLE__ - .arch armv7-a - .fpu neon -#endif - - PUSH {r4, r5, r6, r7, r8, r9, r10, r11, lr} - VPUSH {d8-d15} - - # Store nr in r11 as well for late user. - MOV r11, r1 - # Load output channel index - LDR r5, [sp, 120] - # Load quantization params - # - r7 = quantization_params - LDR r7, [sp, 124] - # Load input_zero_point - VLD1.8 {d16[]}, [r7] - ADD r7, r7, 4 - # Load pointer to per channel zero points array - LDR r4, [r7] - # Add output_channel_index to the b_zero_point pointer - ADD r4, r4, r5 - - # We enter the loop if r1 is atleast 1. 
- # r1 = r1 - 1 will happen in the epilogue - # of the loop - CMP r1, 1 - BLO 7f - - # Load w_row_ptr + n - LDR r5, [sp, 100] - # r7 = blocks_id_ptr - LDR r7, [sp, 104] - - .p2align 5 -0: - VEOR q10, q10, q10 - VLD1.8 {d17[]}, [r4]! - # ip = w_row_ptr[n], lr = w_row_ptr[n+1] - # r5 = r5 + 4 to point to next n - LDR ip, [r5], #4 - LDR lr, [r5] - # r6 = temp_packed_w = packed_w + w_row_ptr[n] * 4 - # This points to the first block of nonzero value - # for the nth row. - ADD r6, r3, ip, LSL #2 - # r9 = temp_w_block_ids_ptr = w_block_ids_ptr (r7) + w_row_ptr[n] - # LSL2 because each element is 4 bytes - # This points to the block id of the first block - # It should contain lr - ip number of block ids - ADD r9, r7, ip, LSL #2 - # r8 = num_blocks that needs to be processed - SUB r8, lr, ip - SUBS r8, r8, 2 - BLO 1f - -k_loop: - # Load 2 non zero blocks of weights. Each block = 1x4. - VLD1.8 {d0}, [r6]! - - #ip = block_id_ptr[0] - #lr = block_id_ptr[1] - LDR ip, [r9], #4 - LDR lr, [r9], #4 - - # Add offset to r2 - # Shift by 4 because each packed block is a block of 4x4 - # which 16 bytes - ADD r10, r2, ip, LSL #4 - # q9 = vxb - VSUBL.U8 q0, d0, d17 - - # d2, d3 = 4x4 transposed - VLD1.8 {d2}, [r10]! - VLD1.8 {d3}, [r10] - - ADD r10, r2, lr, LSL #4 - - VSUBL.U8 q4, d2, d16 // vxa0_t - - # d4, d5 = next 4x4 transposed - VLD1.8 {d4}, [r10]! - VLD1.8 {d5}, [r10] - - VSUBL.U8 q5, d3, d16 // vxa1_t - VSUBL.U8 q6, d4, d16 // vxa4_t - VSUBL.U8 q7, d5, d16 // vxa5_t - - # q4, q5 = 4x4 block (16 values each of 16 bits) - # q6, q7 = 4x4 block (16 values each of 16 bits) - - VMLAL.S16 q10, d8, d0[0] - VMLAL.S16 q10, d9, d0[1] - VMLAL.S16 q10, d10, d0[2] - VMLAL.S16 q10, d11, d0[3] - VMLAL.S16 q10, d12, d1[0] - VMLAL.S16 q10, d13, d1[1] - VMLAL.S16 q10, d14, d1[2] - VMLAL.S16 q10, d15, d1[3] - - SUBS r8, r8, 2 - - BHS k_loop -1: - CMP r8, -2 - BEQ 2f - - # Load last nonzero block - # For this we will load 4 8 bit values as one 32 bit value - VLD1.32 {d0[]}, [r6]! - # q9 = vxb - VSUBL.U8 q0, d0, d17 - - #ip = block_id_ptr[0] - LDR ip, [r9] - - # Add offset to r2 - # Shift by 4 because each packed block is a block of 4x4 - # which 16 bytes - ADD r10, r2, ip, LSL #4 - - VLD1.8 {d2}, [r10]! - VLD1.8 {d3}, [r10] - - VSUBL.U8 q4, d2, d16 // vxa0_t - VSUBL.U8 q5, d3, d16 // vxa1_t - - VMLAL.S16 q10, d8, d0[0] - VMLAL.S16 q10, d9, d0[1] - VMLAL.S16 q10, d10, d0[2] - VMLAL.S16 q10, d11, d0[3] - - .p2align 4 -2: - # Store result on stack - - # -12 because TOS - 4, TOS - 8, and TOS - 12, store mr, nr and pointer to weight zp - # + 128 bytes of buffer when nr = 1 - # This is needed because after processing all nrs we will - # load 128 bytes from stack. This is for q10, q11 for max nr of 4 - # Thus we will load accumulators back in q0, q1, q2, q3, q4, q5, q6, q7 - # When nr < 4, extra q values will be fetched from stack which may overlap - # with other parts of stack storing local variables. To avoid that we just - # create a buffer of 128 bytes inbetween to make sure pointer increment - # never produces address that is beyond the stack frame of this function. - SUB r9, sp, 140 - # Each iteration produce 4 values each of 4 bytes - # Thus 4 x 4 = 16 bytes 2^4 - # In this implementation, first value will be stored at - # 1st value: sp - 12 - r1 * 16 - # 2nd value: sp - 12 - (r1 - 1) * 16 - # and so on. 
- SUB r9, r9, r1, LSL #4 - VST1.32 {q10}, [r9] - - # Check if nr >=1 - SUBS r1, r1, 1 - BHI 0b -3: - # First load all the accumulators from stack - # Load nr - SUB r9, sp, 140 - SUB r9, r9, r11, LSL #4 - # Now load q8-q15 - # This is 8x4 block (nrxmr) - # We will transpose this to 4x8 (mrxnr) - # q8, q12 : x00, x10, x20, x30; x04, x14, x24, x34 - # q9, q13 : x01, x11, x21, x31; x05, x15, x25, x35 - # q10, q14 : x02, x12, x22, x32; x06, x16, x26, x36 - # q11, q15 : x03, x13, x23, x33; x07, x17, x27, x37 - VLD1.32 {q8}, [r9]! - VLD1.32 {q9}, [r9]! - VLD1.32 {q10}, [r9]! - VLD1.32 {q11}, [r9]! - VLD1.32 {q12}, [r9]! - VLD1.32 {q13}, [r9]! - VLD1.32 {q14}, [r9]! - VLD1.32 {q15}, [r9] - - ## Now transpose q8-11 - # VTRN.32 q8, q9 - # VTRN.32 q10, q11 - # q8 : X00, x01, x20, x21 - # q9 : X10, x11, x30, x31 - # q10: X02, x03, x22, x23 - # q11: X12, x13, x32, x33 - # VSWP d16, d17 - # q8 : x20, x21, x00, x01 - # VEXT.32 q6, q8, q10, 2 - # q6 : x00, x01, x02, x03 - # VEXT.32 q10, q10, q8, 2 - # q10: x22, x23, x20, x21 - # VSWP d20, d21 - # VMOV q8, q6 - # q8 : X00, x01, x02, x03 - # q10: x20, x21, x22, x23 - # VSWP d18, d19 - # q9 : x30, x31, x10, x11 - # VEXT.32 q6, q9, q11, 2 - # q6 : x10, x11, x12, x13 - # VEXT.32 q11, q11, q9, 2 - # q11: x32, x33, x30, x31 - # VSWP d22, d23 - # VMOV q9, q6 - # q9 : x10, x11, x12, x13 - # q11: x30, x31, x32, x33 - # Thus we have - # q8 : X00, x01, x02, x03 - # q9 : X10, x11, x12, x13 - # q10: X20, x21, x22, x23 - # q11: X30, x31, x32, x33 - # Now we can do the same for q4-q7 - # q12: X04, X05, X06, X07 - # q13: X14, X15, X16, X17 - # q14: X24, X25, X26, X27 - # q15: X34, X35, X36, X37 - - VTRN.32 q8, q9 - VTRN.32 q10, q11 - VSWP d16, d17 - VEXT.32 q6, q8, q10, 2 - VEXT.32 q10, q10, q8, 2 - VSWP d20, d21 - VMOV q8, q6 - VSWP d18, d19 - VEXT.32 q6, q9, q11, 2 - VEXT.32 q11, q11, q9, 2 - VSWP d22, d23 - VMOV q9, q6 - - VTRN.32 q12, q13 - VTRN.32 q14, q15 - VSWP d24, d25 - VEXT.32 q6, q12, q14, 2 - VEXT.32 q14, q14, q12, 2 - VSWP d28, d29 - VMOV q12, q6 - VSWP d26, d27 - VEXT.32 q6, q13, q15, 2 - VEXT.32 q15, q15, q13, 2 - VSWP d30, d31 - VMOV q13, q6 +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_4X8_PACKEDA__AARCH32_NEON(32, #4, #2, LDR) - # Load output channel index - LDR r5, [sp, 120] - # Load quantization params - # - r7 = quantization_params - LDR r7, [sp, 124] - ADD r7, r7, 8 - # Load pointer to per channel requant scale - LDR r7, [r7] - # Now r7 has the base_addr + offset for multipliers - ADD r7, r7, r5, LSL #2 - - LDR r6, [sp, 108] - # Load q6: vmultiplier_c0123 - VLD1.32 {d12, d13}, [r7]! - # Load q7: vmultiplier_c4567 - VLD1.32 {d14, d15}, [r7] - VCVT.F32.S32 q8, q8 - VCVT.F32.S32 q9, q9 - VCVT.F32.S32 q10, q10 - VLD1.32 {q0}, [r6]! 
- VLD1.32 {q1}, [r6] - - VCVT.F32.S32 q11, q11 - VCVT.F32.S32 q12, q12 - VCVT.F32.S32 q13, q13 - VCVT.F32.S32 q14, q14 - VCVT.F32.S32 q15, q15 - - VMUL.F32 q8, q8, q6 - VMUL.F32 q9, q9, q6 - VMUL.F32 q10, q10, q6 - VMUL.F32 q11, q11, q6 - VMUL.F32 q12, q12, q7 - VMUL.F32 q13, q13, q7 - VMUL.F32 q14, q14, q7 - VMUL.F32 q15, q15, q7 - - VADD.F32 q8, q8, q0 - VADD.F32 q9, q9, q0 - VADD.F32 q10, q10, q0 - VADD.F32 q11, q11, q0 - VADD.F32 q12, q12, q1 - VADD.F32 q13, q13, q1 - VADD.F32 q14, q14, q1 - VADD.F32 q15, q15, q1 - - # Load c, c_stride: - # - r1 = c - # - r9 = c_stride - LDR r1, [sp, 112] - LDR r9, [sp, 116] - LSL r9, r9, 2 - - # r1 = c0 = c pointer - - CMP r0, 2 - # r2 = c1 - ADD r2, r1, r9 - MOVLO r2, r1 - - # r3 = c2 - ADD r3, r2, r9 - MOVLS r3, r2 - - CMP r0, 4 - # r4 = c3 - ADD r4, r3, r9 - MOVNE r4, r3 - - CMP r11, 8 - BNE 4f - - VST1.32 {q8}, [r1]! - VST1.32 {q9}, [r2]! - VST1.32 {q10}, [r3]! - VST1.32 {q11}, [r4]! - VST1.32 {q12}, [r1] - VST1.32 {q13}, [r2] - VST1.32 {q14}, [r3] - VST1.32 {q15}, [r4] - - VPOP {d8-d15} - POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} - BX lr - - .p2align 3 -4: - CMP r11, 4 - BLO 5f - - VST1.32 {q8}, [r1]! - VST1.32 {q9}, [r2]! - VST1.32 {q10}, [r3]! - VST1.32 {q11}, [r4]! - - SUB r11, 4 - - VMOV.32 q8, q12 - VMOV.32 q9, q13 - VMOV.32 q10, q14 - VMOV.32 q11, q15 - -5: - CMP r11, 2 - BLO 6f - - VST1.32 {d16}, [r1]! - VST1.32 {d18}, [r2]! - VST1.32 {d20}, [r3]! - VST1.32 {d22}, [r4]! - - SUB r11, 2 - - VEXT.32 q8, q8, 2 - VEXT.32 q9, q9, 2 - VEXT.32 q10, q10, 2 - VEXT.32 q11, q11, 2 - -6: - TEQ r11, 0 - BEQ 7f - - VST1.32 {d16[0]}, [r1] - VST1.32 {d18[0]}, [r2] - VST1.32 {d20[0]}, [r3] - VST1.32 {d22[0]}, [r4] - -7: - VPOP {d8-d15} - POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} - BX lr +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w16__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint16_t* w_row_ptr, +# const uint16_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_4X8_PACKEDA__AARCH32_NEON(16, #2, #1, LDRH) -END_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w8__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint8_t* w_row_ptr, +# const uint8_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_4X8_PACKEDA__AARCH32_NEON(8, #1, #0, LDRB) #ifdef __ELF__ .section ".note.GNU-stack","",%progbits #endif + +#undef NDEF_APPLE_SYMBOLS +#undef MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_4X8_PACKEDA__AARCH32_NEON diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c8x1-dq-packedA-aarch32-neon.S b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c8x1-dq-packedA-aarch32-neon.S index 109307d082d1..dd829f80e373 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c8x1-dq-packedA-aarch32-neon.S +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/4x8c8x1-dq-packedA-aarch32-neon.S @@ -9,6 +9,12 @@ #include #include +#ifndef __APPLE__ +#define NDEF_APPLE_SYMBOLS .arch armv7-a; 
.fpu neon +#else +#define NDEF_APPLE_SYMBOLS +#endif + # r0 mr # r1 nr # r2 packed_a @@ -60,7 +66,306 @@ # |----------------| # -# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon( +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_row_ptr, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +#define MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_4X8_PACKEDA__AARCH32_NEON(W_INDEX_DTYPE_NUM_BITS, W_INDEX_DTYPE_NUM_BYTES_ARG, W_INDEX_DTYPE_LOG_NUM_BYTES_ARG, LOAD_INDEX_INSTRUCTION) ;\ + BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon ;\ + .arm ;\ + NDEF_APPLE_SYMBOLS ;\ + ;\ + PUSH {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + VPUSH {d8-d15} ;\ + ;\ + /* Store nr in r11 as well for late user. */ ;\ + MOV r11, r1 ;\ + /* Load output channel index */ ;\ + LDR r5, [sp, 120] ;\ + /* Load quantization params */ ;\ + /* - r7 = quantization_params */ ;\ + LDR r7, [sp, 124] ;\ + /* Load input_zero_point */ ;\ + VLD1.8 {d14[]}, [r7] ;\ + ADD r7, r7, 4 ;\ + /* Load pointer to per channel zero points array */ ;\ + LDR r4, [r7] ;\ + /* Add output_channel_index to the b_zero_point pointer */ ;\ + ADD r4, r4, r5 ;\ + ;\ + /* Load w_row_ptr + n */ ;\ + LDR r5, [sp, 100] ;\ + /* r7 = blocks_id_ptr */ ;\ + LDR r7, [sp, 104] ;\ + ;\ + VEOR q8, q8, q8 ;\ + VEOR q9, q9, q9 ;\ + VEOR q10, q10, q10 ;\ + VEOR q11, q11, q11 ;\ + VEOR q12, q12, q12 ;\ + VEOR q13, q13, q13 ;\ + VEOR q14, q14, q14 ;\ + VEOR q15, q15, q15 ;\ + VLD1.8 {d15}, [r4] ;\ + /* ip = w_row_ptr[n], lr = w_row_ptr[n+1] */ ;\ + /* r5 = r5 + W_INDEX_DTYPE_NUM_BYTES_ARG to point to next n */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r5], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + LOAD_INDEX_INSTRUCTION lr, [r5] ;\ + /* r6 = temp_packed_w = packed_w + w_row_ptr[n] * 8 */ ;\ + /* * 8 because each block contains 8 values */ ;\ + /* This points to the first block of nonzero value */ ;\ + /* for the nth row. */ ;\ + ADD r6, r3, ip, LSL #3 ;\ + /* r9 = temp_w_block_ids_ptr = w_block_ids_ptr (r7) + w_row_ptr[n] */ ;\ + /* LSL for when elements are >1 byte */ ;\ + /* (4 bytes: LSL #2, 2 bytes: LSL #1, 1 byte: LSL #0) */ ;\ + /* This points to the col block id of the first block */ ;\ + /* It should contain lr - ip number of block ids */ ;\ + /* Note that in this kernel sparsity pattern is 8x1. */ ;\ + /* Thus each block contains only 1 k as opposed to */ ;\ + /* 1x4 where each block contains 4 k. */ ;\ + ADD r9, r7, ip, LSL W_INDEX_DTYPE_LOG_NUM_BYTES_ARG ;\ + /* r8 = num_blocks that needs to be processed */ ;\ + SUB r8, lr, ip ;\ + SUBS r8, r8, 2 ;\ + BLO _1_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + .p2align 5 ;\ + k_loop_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + /* Load 2 non zero blocks of weights. Each block = 8x1. */ ;\ + VLD1.8 {d0}, [r6]! ;\ + VLD1.8 {d2}, [r6]! 
;\ + ;\ + /* ip = block_id_ptr[0] */ ;\ + /* lr = block_id_ptr[1] */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r9], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + LOAD_INDEX_INSTRUCTION lr, [r9], W_INDEX_DTYPE_NUM_BYTES_ARG ;\ + ;\ + /* Add offset to r2 */ ;\ + /* Shift by 4 because each packed block is a block of 4x1 */ ;\ + /* which 4 bytes */ ;\ + ADD r10, r2, ip, LSL #2 ;\ + /* q9 = vxb */ ;\ + VSUBL.U8 q0, d0, d15 ;\ + VSUBL.U8 q1, d2, d15 ;\ + ;\ + /* d4 = 4x1 transposed */ ;\ + VLD1.32 {d4[]}, [r10] ;\ + ;\ + ADD r10, r2, lr, LSL #2 ;\ + ;\ + VSUBL.U8 q2, d4, d14 /* vxa0_t */ ;\ + ;\ + /* d5 = next 4x1 transposed */ ;\ + VLD1.32 {d6[]}, [r10] ;\ + ;\ + VSUBL.U8 q3, d6, d14 /* vxa1_t */ ;\ + ;\ + /* q0 = d0, d1 = 8x1 block of weight for k */ ;\ + /* q1 = d2, d3 = 8x1 block of weight for k + 1 */ ;\ + /* q2's d4 = 4x1 block of activation for k */ ;\ + /* q3's d6 = 4x1 block of activation for k + 1 */ ;\ + ;\ + /* Generate 4x8 block as two 4x4 blocks */ ;\ + ;\ + VMLAL.S16 q8, d0, d4[0] ;\ + VMLAL.S16 q9, d1, d4[0] ;\ + VMLAL.S16 q10, d0, d4[1] ;\ + VMLAL.S16 q11, d1, d4[1] ;\ + VMLAL.S16 q12, d0, d4[2] ;\ + VMLAL.S16 q13, d1, d4[2] ;\ + VMLAL.S16 q14, d0, d4[3] ;\ + VMLAL.S16 q15, d1, d4[3] ;\ + ;\ + VMLAL.S16 q8, d2, d6[0] ;\ + VMLAL.S16 q9, d3, d6[0] ;\ + VMLAL.S16 q10, d2, d6[1] ;\ + VMLAL.S16 q11, d3, d6[1] ;\ + VMLAL.S16 q12, d2, d6[2] ;\ + VMLAL.S16 q13, d3, d6[2] ;\ + VMLAL.S16 q14, d2, d6[3] ;\ + VMLAL.S16 q15, d3, d6[3] ;\ + ;\ + SUBS r8, r8, 2 ;\ + ;\ + BHS k_loop_w##W_INDEX_DTYPE_NUM_BITS ;\ + _1_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r8, -2 ;\ + BEQ _3_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + /* Load last nonzero block */ ;\ + /* For this we will load 4 8 bit values as one 32 bit value */ ;\ + VLD1.8 {d0}, [r6] ;\ + /* q9 = vxb */ ;\ + VSUBL.U8 q0, d0, d15 ;\ + ;\ + /* ip = block_id_ptr[0] */ ;\ + LOAD_INDEX_INSTRUCTION ip, [r9] ;\ + ;\ + /* Add offset to r2 */ ;\ + /* Shift by 4 because each packed block is a block of 4x1 */ ;\ + /* which 4 bytes */ ;\ + ADD r10, r2, ip, LSL #2 ;\ + ;\ + VLD1.32 {d4[]}, [r10]! ;\ + ;\ + VSUBL.U8 q2, d4, d14 /* vxa0_t */ ;\ + ;\ + VMLAL.S16 q8, d0, d4[0] ;\ + VMLAL.S16 q9, d1, d4[0] ;\ + VMLAL.S16 q10, d0, d4[1] ;\ + VMLAL.S16 q11, d1, d4[1] ;\ + VMLAL.S16 q12, d0, d4[2] ;\ + VMLAL.S16 q13, d1, d4[2] ;\ + VMLAL.S16 q14, d0, d4[3] ;\ + VMLAL.S16 q15, d1, d4[3] ;\ + ;\ + ;\ + .p2align 4 ;\ + _3_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + /* Load output channel index */ ;\ + LDR r5, [sp, 120] ;\ + /* Load quantization params */ ;\ + /* - r7 = quantization_params */ ;\ + LDR r7, [sp, 124] ;\ + ADD r7, r7, 8 ;\ + /* Load pointer to per channel requant scale */ ;\ + LDR r7, [r7] ;\ + /* Now r7 has the base_addr + offset for multipliers */ ;\ + ADD r7, r7, r5, LSL #2 ;\ + ;\ + LDR r6, [sp, 108] ;\ + /* Load q6: vmultiplier_c0123 */ ;\ + VLD1.32 {d12, d13}, [r7]! ;\ + /* Load q7: vmultiplier_c4567 */ ;\ + VLD1.32 {d14, d15}, [r7] ;\ + VCVT.F32.S32 q8, q8 ;\ + VCVT.F32.S32 q9, q9 ;\ + VCVT.F32.S32 q10, q10 ;\ + VLD1.32 {q0}, [r6]! 
;\ + VLD1.32 {q1}, [r6] ;\ + ;\ + VCVT.F32.S32 q11, q11 ;\ + VCVT.F32.S32 q12, q12 ;\ + VCVT.F32.S32 q13, q13 ;\ + VCVT.F32.S32 q14, q14 ;\ + VCVT.F32.S32 q15, q15 ;\ + ;\ + VMUL.F32 q8, q8, q6 ;\ + VMUL.F32 q9, q9, q7 ;\ + VMUL.F32 q10, q10, q6 ;\ + VMUL.F32 q11, q11, q7 ;\ + VMUL.F32 q12, q12, q6 ;\ + VMUL.F32 q13, q13, q7 ;\ + VMUL.F32 q14, q14, q6 ;\ + VMUL.F32 q15, q15, q7 ;\ + ;\ + VADD.F32 q8, q8, q0 ;\ + VADD.F32 q9, q9, q1 ;\ + VADD.F32 q10, q10, q0 ;\ + VADD.F32 q11, q11, q1 ;\ + VADD.F32 q12, q12, q0 ;\ + VADD.F32 q13, q13, q1 ;\ + VADD.F32 q14, q14, q0 ;\ + VADD.F32 q15, q15, q1 ;\ + ;\ + /* Load c, c_stride: */ ;\ + /* - r1 = c */ ;\ + /* - r9 = c_stride */ ;\ + LDR r1, [sp, 112] ;\ + LDR r9, [sp, 116] ;\ + LSL r9, r9, 2 ;\ + ;\ + /* r1 = c0 = c pointer */ ;\ + ;\ + CMP r0, 2 ;\ + /* r2 = c1 */ ;\ + ADD r2, r1, r9 ;\ + MOVLO r2, r1 ;\ + ;\ + /* r3 = c2 */ ;\ + ADD r3, r2, r9 ;\ + MOVLS r3, r2 ;\ + ;\ + CMP r0, 4 ;\ + /* r4 = c3 */ ;\ + ADD r4, r3, r9 ;\ + MOVNE r4, r3 ;\ + ;\ + CMP r11, 8 ;\ + BNE _4_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {q8}, [r1]! ;\ + VST1.32 {q10}, [r2]! ;\ + VST1.32 {q12}, [r3]! ;\ + VST1.32 {q14}, [r4]! ;\ + VST1.32 {q9}, [r1] ;\ + VST1.32 {q11}, [r2] ;\ + VST1.32 {q13}, [r3] ;\ + VST1.32 {q15}, [r4] ;\ + ;\ + VPOP {d8-d15} ;\ + POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + BX lr ;\ + ;\ + .p2align 3 ;\ + _4_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r11, 4 ;\ + BLO _5_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {q8}, [r1]! ;\ + VST1.32 {q10}, [r2]! ;\ + VST1.32 {q12}, [r3]! ;\ + VST1.32 {q14}, [r4]! ;\ + ;\ + SUB r11, 4 ;\ + ;\ + VMOV.32 q8, q9 ;\ + VMOV.32 q10, q11 ;\ + VMOV.32 q12, q13 ;\ + VMOV.32 q14, q15 ;\ + ;\ + _5_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + CMP r11, 2 ;\ + BLO _6_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {d16}, [r1]! ;\ + VST1.32 {d20}, [r2]! ;\ + VST1.32 {d24}, [r3]! ;\ + VST1.32 {d28}, [r4]! ;\ + ;\ + SUB r11, 2 ;\ + ;\ + VEXT.32 q8, q8, 2 ;\ + VEXT.32 q10, q10, 2 ;\ + VEXT.32 q12, q12, 2 ;\ + VEXT.32 q14, q14, 2 ;\ + ;\ + _6_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + TEQ r11, 0 ;\ + BEQ _7_w##W_INDEX_DTYPE_NUM_BITS ;\ + ;\ + VST1.32 {d16[0]}, [r1] ;\ + VST1.32 {d20[0]}, [r2] ;\ + VST1.32 {d24[0]}, [r3] ;\ + VST1.32 {d28[0]}, [r4] ;\ + ;\ + _7_w##W_INDEX_DTYPE_NUM_BITS##: ;\ + VPOP {d8-d15} ;\ + POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} ;\ + BX lr ;\ + ;\ + END_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch32_neon + +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w32__aarch32_neon( # size_t mr, # size_t nr, # const uint8_t* a_packed, @@ -72,294 +377,39 @@ # size_t c_stride, # size_t output_channel_index, # const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) -BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon - .arm -#ifndef __APPLE__ - .arch armv7-a - .fpu neon -#endif - - PUSH {r4, r5, r6, r7, r8, r9, r10, r11, lr} - VPUSH {d8-d15} - - # Store nr in r11 as well for late user. 
- MOV r11, r1 - # Load output channel index - LDR r5, [sp, 120] - # Load quantization params - # - r7 = quantization_params - LDR r7, [sp, 124] - # Load input_zero_point - VLD1.8 {d14[]}, [r7] - ADD r7, r7, 4 - # Load pointer to per channel zero points array - LDR r4, [r7] - # Add output_channel_index to the b_zero_point pointer - ADD r4, r4, r5 - - # Load w_row_ptr + n - LDR r5, [sp, 100] - # r7 = blocks_id_ptr - LDR r7, [sp, 104] - - VEOR q8, q8, q8 - VEOR q9, q9, q9 - VEOR q10, q10, q10 - VEOR q11, q11, q11 - VEOR q12, q12, q12 - VEOR q13, q13, q13 - VEOR q14, q14, q14 - VEOR q15, q15, q15 - VLD1.8 {d15}, [r4] - # ip = w_row_ptr[n], lr = w_row_ptr[n+1] - # r5 = r5 + 4 to point to next n - LDR ip, [r5], #4 - LDR lr, [r5] - # r6 = temp_packed_w = packed_w + w_row_ptr[n] * 8 - # * 8 because each block contains 8 values - # This points to the first block of nonzero value - # for the nth row. - ADD r6, r3, ip, LSL #3 - # r9 = temp_w_block_ids_ptr = w_block_ids_ptr (r7) + w_row_ptr[n] - # LSL2 because each element is 4 bytes because blocks ids are uint32_t pointer - # This points to the col block id of the first block - # It should contain lr - ip number of block ids - # Note that in this kernel sparsity pattern is 8x1. - # Thus each block contains only 1 k as opposed to - # 1x4 where each block contains 4 k. - ADD r9, r7, ip, LSL #2 - # r8 = num_blocks that needs to be processed - SUB r8, lr, ip - SUBS r8, r8, 2 - BLO 1f - - .p2align 5 -k_loop: - # Load 2 non zero blocks of weights. Each block = 8x1. - VLD1.8 {d0}, [r6]! - VLD1.8 {d2}, [r6]! - - #ip = block_id_ptr[0] - #lr = block_id_ptr[1] - LDR ip, [r9], #4 - LDR lr, [r9], #4 - - # Add offset to r2 - # Shift by 4 because each packed block is a block of 4x1 - # which 4 bytes - ADD r10, r2, ip, LSL #2 - # q9 = vxb - VSUBL.U8 q0, d0, d15 - VSUBL.U8 q1, d2, d15 - - # d4 = 4x1 transposed - VLD1.32 {d4[]}, [r10] - - ADD r10, r2, lr, LSL #2 - - VSUBL.U8 q2, d4, d14 // vxa0_t - - # d5 = next 4x1 transposed - VLD1.32 {d6[]}, [r10] - - VSUBL.U8 q3, d6, d14 // vxa1_t - - # q0 = d0, d1 = 8x1 block of weight for k - # q1 = d2, d3 = 8x1 block of weight for k + 1 - # q2's d4 = 4x1 block of activation for k - # q3's d6 = 4x1 block of activation for k + 1 - - # Generate 4x8 block as two 4x4 blocks - - VMLAL.S16 q8, d0, d4[0] - VMLAL.S16 q9, d1, d4[0] - VMLAL.S16 q10, d0, d4[1] - VMLAL.S16 q11, d1, d4[1] - VMLAL.S16 q12, d0, d4[2] - VMLAL.S16 q13, d1, d4[2] - VMLAL.S16 q14, d0, d4[3] - VMLAL.S16 q15, d1, d4[3] - - VMLAL.S16 q8, d2, d6[0] - VMLAL.S16 q9, d3, d6[0] - VMLAL.S16 q10, d2, d6[1] - VMLAL.S16 q11, d3, d6[1] - VMLAL.S16 q12, d2, d6[2] - VMLAL.S16 q13, d3, d6[2] - VMLAL.S16 q14, d2, d6[3] - VMLAL.S16 q15, d3, d6[3] - - SUBS r8, r8, 2 - - BHS k_loop -1: - CMP r8, -2 - BEQ 3f - - # Load last nonzero block - # For this we will load 4 8 bit values as one 32 bit value - VLD1.8 {d0}, [r6] - # q9 = vxb - VSUBL.U8 q0, d0, d15 - - #ip = block_id_ptr[0] - LDR ip, [r9] - - # Add offset to r2 - # Shift by 4 because each packed block is a block of 4x1 - # which 4 bytes - ADD r10, r2, ip, LSL #2 - - VLD1.32 {d4[]}, [r10]! 
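The w_row_ptr / w_block_ids_ptr bookkeeping above is a block-CSR scheme: for a given row, the nonzero weight blocks occupy the index range [w_row_ptr[n], w_row_ptr[n+1]) in packed_w, and w_block_ids_ptr records which column block each of them came from. A rough scalar C sketch of that traversal, under the assumption of 8 stored values per block as in this 8x1 kernel (names are illustrative):

#include <stddef.h>
#include <stdint.h>

/* Illustrative block-CSR traversal matching the w_row_ptr / w_block_ids_ptr
 * scheme: row_ptr bounds the nonzero blocks of one block-row, block_ids
 * names the column block each of them belongs to.  BLOCK_VALUES is the
 * number of weight values stored per nonzero block (8 for the 8x1 kernel). */
enum { BLOCK_VALUES = 8 };

static void visit_row_blocks(
    uint32_t row,                    /* block-row being processed           */
    const uint32_t* row_ptr,         /* length: num_block_rows + 1          */
    const uint32_t* block_ids,       /* column block id per nonzero block   */
    const uint8_t* packed_w,         /* nonzero blocks, stored contiguously */
    void (*mac)(uint32_t col_block, const uint8_t* w_block)) {
  const uint32_t begin = row_ptr[row];
  const uint32_t end = row_ptr[row + 1];   /* end - begin = num_blocks      */
  for (uint32_t j = begin; j < end; j++) {
    /* Hand the packed weights of this block, plus the activation column
     * block they must be multiplied with, to the multiply-accumulate step. */
    mac(block_ids[j], packed_w + (size_t)j * BLOCK_VALUES);
  }
}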
- - VSUBL.U8 q2, d4, d14 // vxa0_t - - VMLAL.S16 q8, d0, d4[0] - VMLAL.S16 q9, d1, d4[0] - VMLAL.S16 q10, d0, d4[1] - VMLAL.S16 q11, d1, d4[1] - VMLAL.S16 q12, d0, d4[2] - VMLAL.S16 q13, d1, d4[2] - VMLAL.S16 q14, d0, d4[3] - VMLAL.S16 q15, d1, d4[3] - - - .p2align 4 -3: - # Load output channel index - LDR r5, [sp, 120] - # Load quantization params - # - r7 = quantization_params - LDR r7, [sp, 124] - ADD r7, r7, 8 - # Load pointer to per channel requant scale - LDR r7, [r7] - # Now r7 has the base_addr + offset for multipliers - ADD r7, r7, r5, LSL #2 - - LDR r6, [sp, 108] - # Load q6: vmultiplier_c0123 - VLD1.32 {d12, d13}, [r7]! - # Load q7: vmultiplier_c4567 - VLD1.32 {d14, d15}, [r7] - VCVT.F32.S32 q8, q8 - VCVT.F32.S32 q9, q9 - VCVT.F32.S32 q10, q10 - VLD1.32 {q0}, [r6]! - VLD1.32 {q1}, [r6] +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_4X8_PACKEDA__AARCH32_NEON(32, #4, #2, LDR) - VCVT.F32.S32 q11, q11 - VCVT.F32.S32 q12, q12 - VCVT.F32.S32 q13, q13 - VCVT.F32.S32 q14, q14 - VCVT.F32.S32 q15, q15 - - VMUL.F32 q8, q8, q6 - VMUL.F32 q9, q9, q7 - VMUL.F32 q10, q10, q6 - VMUL.F32 q11, q11, q7 - VMUL.F32 q12, q12, q6 - VMUL.F32 q13, q13, q7 - VMUL.F32 q14, q14, q6 - VMUL.F32 q15, q15, q7 - - VADD.F32 q8, q8, q0 - VADD.F32 q9, q9, q1 - VADD.F32 q10, q10, q0 - VADD.F32 q11, q11, q1 - VADD.F32 q12, q12, q0 - VADD.F32 q13, q13, q1 - VADD.F32 q14, q14, q0 - VADD.F32 q15, q15, q1 - - # Load c, c_stride: - # - r1 = c - # - r9 = c_stride - LDR r1, [sp, 112] - LDR r9, [sp, 116] - LSL r9, r9, 2 - - # r1 = c0 = c pointer - - CMP r0, 2 - # r2 = c1 - ADD r2, r1, r9 - MOVLO r2, r1 - - # r3 = c2 - ADD r3, r2, r9 - MOVLS r3, r2 - - CMP r0, 4 - # r4 = c3 - ADD r4, r3, r9 - MOVNE r4, r3 - - CMP r11, 8 - BNE 4f - - VST1.32 {q8}, [r1]! - VST1.32 {q10}, [r2]! - VST1.32 {q12}, [r3]! - VST1.32 {q14}, [r4]! - VST1.32 {q9}, [r1] - VST1.32 {q11}, [r2] - VST1.32 {q13}, [r3] - VST1.32 {q15}, [r4] - - VPOP {d8-d15} - POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} - BX lr - - .p2align 3 -4: - CMP r11, 4 - BLO 5f - - VST1.32 {q8}, [r1]! - VST1.32 {q10}, [r2]! - VST1.32 {q12}, [r3]! - VST1.32 {q14}, [r4]! - - SUB r11, 4 - - VMOV.32 q8, q9 - VMOV.32 q10, q11 - VMOV.32 q12, q13 - VMOV.32 q14, q15 - -5: - CMP r11, 2 - BLO 6f - - VST1.32 {d16}, [r1]! - VST1.32 {d20}, [r2]! - VST1.32 {d24}, [r3]! - VST1.32 {d28}, [r4]! 
- - SUB r11, 2 - - VEXT.32 q8, q8, 2 - VEXT.32 q10, q10, 2 - VEXT.32 q12, q12, 2 - VEXT.32 q14, q14, 2 - -6: - TEQ r11, 0 - BEQ 7f - - VST1.32 {d16[0]}, [r1] - VST1.32 {d20[0]}, [r2] - VST1.32 {d24[0]}, [r3] - VST1.32 {d28[0]}, [r4] - -7: - VPOP {d8-d15} - POP {r4, r5, r6, r7, r8, r9, r10, r11, lr} - BX lr +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w16__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint16_t* w_row_ptr, +# const uint16_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_4X8_PACKEDA__AARCH32_NEON(16, #2, #1, LDRH) -END_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w8__aarch32_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint8_t* w_row_ptr, +# const uint8_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_4X8_PACKEDA__AARCH32_NEON(8, #1, #0, LDRB) #ifdef __ELF__ .section ".note.GNU-stack","",%progbits #endif + +#undef NDEF_APPLE_SYMBOLS +#undef MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_4X8_PACKEDA__AARCH32_NEON diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c index 98376e3d2cdb..768574ba6f51 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c @@ -1,434 +1,17 @@ -/* - * Copyright (c) Facebook, Inc. and its affiliates. - * All rights reserved. - * - * This source code is licensed under the BSD-style license found in the - * LICENSE file in the root directory of this source tree. 
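The w32, w16 and w8 instantiations above differ only in the width of the CSR indices: w_row_ptr and w_block_ids_ptr are read as uint32_t, uint16_t or uint8_t, which is why the macro takes LDR, LDRH or LDRB together with the matching byte and shift arguments. A hedged C sketch of how a packer might choose the narrowest usable width; the selection rule below is an assumption for illustration, not something this patch specifies:

#include <stdint.h>

/* Width of the CSR index type needed to address `max_index` entries.
 * Narrower indices shrink w_row_ptr / w_block_ids_ptr and the memory
 * traffic of the sparse kernels; the kernel variant (w8/w16/w32) has to
 * match the width the data was packed with. */
static int w_index_bits(uint32_t max_index) {
  if (max_index <= UINT8_MAX) {
    return 8;   /* ..._w8__...  variants: LDRB, 1-byte indices */
  } else if (max_index <= UINT16_MAX) {
    return 16;  /* ..._w16__... variants: LDRH, 2-byte indices */
  }
  return 32;    /* ..._w32__... variants: LDR,  4-byte indices */
}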
- */ - -#include - -#include -#include - -#include "8x4c1x4-packed-sse2.h" - -#define CONVERT_TO_FP_AND_TRANSPOSE(a, b, c, d, t_a, t_b, t_c, t_d) \ - a_ps = _mm_cvtepi32_ps(a); \ - b_ps = _mm_cvtepi32_ps(b); \ - c_ps = _mm_cvtepi32_ps(c); \ - d_ps = _mm_cvtepi32_ps(d); \ - tmp0 = _mm_shuffle_ps(a_ps, b_ps, _MM_SHUFFLE(1, 0, 1, 0)); \ - tmp1 = _mm_shuffle_ps(a_ps, b_ps, _MM_SHUFFLE(3, 2, 3, 2)); \ - tmp2 = _mm_shuffle_ps(c_ps, d_ps, _MM_SHUFFLE(1, 0, 1, 0)); \ - tmp3 = _mm_shuffle_ps(c_ps, d_ps, _MM_SHUFFLE(3, 2, 3, 2)); \ - t_a = _mm_shuffle_ps(tmp0, tmp2, _MM_SHUFFLE(2, 0, 2, 0)); \ - t_b = _mm_shuffle_ps(tmp0, tmp2, _MM_SHUFFLE(3, 1, 3, 1)); \ - t_c = _mm_shuffle_ps(tmp1, tmp3, _MM_SHUFFLE(2, 0, 2, 0)); \ - t_d = _mm_shuffle_ps(tmp1, tmp3, _MM_SHUFFLE(3, 1, 3, 1)); - -void pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2( - size_t mr, - size_t nr, - const uint8_t* a_packed, - const uint8_t* packed_w, - const uint32_t* w_row_ptr, - const uint32_t* w_block_ids_ptr, - const float* b, - float* c, - size_t c_stride, - size_t output_channel_index, - const struct pytorch_qnnp_conv_dynamic_quantization_params - quantization_params[RESTRICT_STATIC 1]) { - - const __m128i va_zero_point = _mm_set1_epi16(quantization_params->input_zero_point); - const __m128 vbias = _mm_load_ps(b); - const __m128i vzero = _mm_setzero_si128(); - - // Packed A format. - // 8kx4m blocks for alls blocks given 4 rows (4m) are placed in contiguous memory. - // Original A - // --------- K ----------- -- (K + 4 - 1) / 4 -- - // | | | | - // | | (M + 8 - 1)/8 | - // | | Packed | | - // M | => |-------------------| - // | | Thus Packed A has (K + 4 - 1)/4 * (M + 8 -1)/8 blocks - // | | - // |---------------------| - // - // Each 8 x 4 blocks is transposed and stored. - // Each of the (K + 4 - 1)/4 blocks for a given group of 8 m blocks - // are stored adjacent in memory - // Thus, each block: - // |----8m-----|----8m-----| - // 4k | | ..... - // |-----------|-----------| - // This locality helps in loading 8kx8m blocks of activations - // Note when M is not multiple of 8, the rest can contain arbitrary - // data in packed A as we will not be writing those out. - // This wil be taken care by just copying the appropriate valid data - - __m128i vacc_low[4]; - __m128i vacc_high[4]; - const __m128 vmultiplier = - _mm_loadu_ps(&quantization_params->multipliers[output_channel_index]); - for (int32_t n = 0; n < nr; n++) { - vacc_low[n] = _mm_setzero_si128(); - vacc_high[n] = _mm_setzero_si128(); - const int16_t b_zero_point = - (int16_t)(uint16_t)quantization_params->kernel_zero_points[ - output_channel_index + n]; - - int32_t num_blocks = w_row_ptr[n+1] - w_row_ptr[n]; - // Offset into compressed values. - // w_row_ptr[0] is the block offset in the compressed values. - // Where the corresponding row of the weight matrix starts. - const uint8_t* temp_packed_w = packed_w + w_row_ptr[n] * COL_BLOCK_SIZE; - // Similarly w_row_ptr[0] is also the block offset where - // corresponding row's block column ids start. - // Per row # of block column ids = # of block values - const uint32_t* temp_w_block_ids_ptr = w_block_ids_ptr + w_row_ptr[n]; - while (num_blocks > 1) { - // Load two 1x4 uint8 blocks 2 ints - const uint8_t* b_ptr = temp_packed_w; - // This is not perf optimal since this will result in - // register spills. 
We probably should work with output block - // of 1x4 instead of 1x8 - // But doing is this way because mostly this how we will - // do it for ARM and this reference code helps establish - // the baseline for functional correctness. - const int16_t b_0 = (int16_t)((uint16_t)(b_ptr[0])); - const int16_t b_1 = (int16_t)((uint16_t)(b_ptr[1])); - const int16_t b_2 = (int16_t)((uint16_t)(b_ptr[2])); - const int16_t b_3 = (int16_t)((uint16_t)(b_ptr[3])); - const int16_t b_4 = (int16_t)((uint16_t)(b_ptr[4])); - const int16_t b_5 = (int16_t)((uint16_t)(b_ptr[5])); - const int16_t b_6 = (int16_t)((uint16_t)(b_ptr[6])); - const int16_t b_7 = (int16_t)((uint16_t)(b_ptr[7])); - // Now we will load 8kx1(broadcast 8) weight values - const __m128i vxb0 = _mm_set1_epi16((b_0 - b_zero_point)); - const __m128i vxb1 = _mm_set1_epi16((b_1 - b_zero_point)); - const __m128i vxb2 = _mm_set1_epi16((b_2 - b_zero_point)); - const __m128i vxb3 = _mm_set1_epi16((b_3 - b_zero_point)); - const __m128i vxb4 = _mm_set1_epi16((b_4 - b_zero_point)); - const __m128i vxb5 = _mm_set1_epi16((b_5 - b_zero_point)); - const __m128i vxb6 = _mm_set1_epi16((b_6 - b_zero_point)); - const __m128i vxb7 = _mm_set1_epi16((b_7 - b_zero_point)); - - // Load activation blocks. In this kernel we assume - // a mat is already transposed. K x M - // 1. Load 8 1x8 registers = 8k x 8m - - // Load column id of the first 1x4 block - int32_t col_block_id_0 = temp_w_block_ids_ptr[0]; - // Load column id of the second 1x4 block - int32_t col_block_id_1 = temp_w_block_ids_ptr[1]; - const __m128i va0 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 0)); - const __m128i va1 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 1)); - const __m128i va2 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 2)); - const __m128i va3 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 3)); - const __m128i va4 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 0)); - const __m128i va5 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 1)); - const __m128i va6 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 2)); - const __m128i va7 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 3)); - - const __m128i vxa0 = - sub_zero_point(_mm_unpacklo_epi8(va0, vzero), va_zero_point); - const __m128i vxa1 = - sub_zero_point(_mm_unpacklo_epi8(va1, vzero), va_zero_point); - const __m128i vxa2 = - sub_zero_point(_mm_unpacklo_epi8(va2, vzero), va_zero_point); - const __m128i vxa3 = - sub_zero_point(_mm_unpacklo_epi8(va3, vzero), va_zero_point); - const __m128i vxa4 = - sub_zero_point(_mm_unpacklo_epi8(va4, vzero), va_zero_point); - const __m128i vxa5 = - sub_zero_point(_mm_unpacklo_epi8(va5, vzero), va_zero_point); - const __m128i vxa6 = - sub_zero_point(_mm_unpacklo_epi8(va6, vzero), va_zero_point); - const __m128i vxa7 = - sub_zero_point(_mm_unpacklo_epi8(va7, vzero), va_zero_point); - - // acc += a0 * b0; - __m128i vacc_low_16bits = _mm_mullo_epi16(vxa0, vxb0); - __m128i vacc_high_16bits = _mm_mulhi_epi16(vxa0, vxb0); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += 
a1 * b1; - vacc_low_16bits = _mm_mullo_epi16(vxa1, vxb1); - vacc_high_16bits = _mm_mulhi_epi16(vxa1, vxb1); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a2 * b2; - vacc_low_16bits = _mm_mullo_epi16(vxa2, vxb2); - vacc_high_16bits = _mm_mulhi_epi16(vxa2, vxb2); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a3 * b3; - vacc_low_16bits = _mm_mullo_epi16(vxa3, vxb3); - vacc_high_16bits = _mm_mulhi_epi16(vxa3, vxb3); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a4 * b4; - vacc_low_16bits = _mm_mullo_epi16(vxa4, vxb4); - vacc_high_16bits = _mm_mulhi_epi16(vxa4, vxb4); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a5 * b5; - vacc_low_16bits = _mm_mullo_epi16(vxa5, vxb5); - vacc_high_16bits = _mm_mulhi_epi16(vxa5, vxb5); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a6 * b6; - vacc_low_16bits = _mm_mullo_epi16(vxa6, vxb6); - vacc_high_16bits = _mm_mulhi_epi16(vxa6, vxb6); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a7 * b7; - vacc_low_16bits = _mm_mullo_epi16(vxa7, vxb7); - vacc_high_16bits = _mm_mulhi_epi16(vxa7, vxb7); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - - // Now we have 1x8 m acculated 32 bit values in vacc_low[n](4) and vacc_high[n](4) - - temp_packed_w = temp_packed_w + COL_BLOCK_SIZE * 2; - temp_w_block_ids_ptr += 2; - num_blocks -= 2; - } - if (num_blocks > 0) { - // Load two 1x4 uint8 blocks 2 ints - const uint8_t* b_ptr = temp_packed_w; - const int16_t b_0 = (int16_t)((uint16_t)(b_ptr[0])); - const int16_t b_1 = (int16_t)((uint16_t)(b_ptr[1])); - const int16_t b_2 = (int16_t)((uint16_t)(b_ptr[2])); - const int16_t b_3 = (int16_t)((uint16_t)(b_ptr[3])); - // Now we will load 8kx1(broadcast 8) weight values - const __m128i vxb0 = _mm_set1_epi16((b_0 - b_zero_point)); - const __m128i vxb1 = _mm_set1_epi16((b_1 - b_zero_point)); - const __m128i vxb2 = _mm_set1_epi16((b_2 - b_zero_point)); - const __m128i vxb3 = _mm_set1_epi16((b_3 - b_zero_point)); - - // Then load transformed weight blocks - // 1. Load 4 1x8 registers = 4k x 8m - // Thus have 4x8 (4k x 8m) activations a0, a1, a2, a3 - // Each a containing 8 m values. 
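Each of the acc += a_i * b_i steps in this SSE2 code combines _mm_mullo_epi16 and _mm_mulhi_epi16 with an unpack to widen the signed 16-bit products to 32 bits before adding them to the accumulators. The single-lane scalar equivalent, as a small sketch:

#include <stdint.h>

/* Scalar equivalent of one SSE2 lane of the accumulation step:
 * mullo/mulhi produce the low and high 16 bits of the signed 16x16
 * product, the unpack step splices them back into a 32-bit lane, and
 * the result is added to the accumulator. */
static int32_t mac_lane(int32_t acc, int16_t xa, int16_t xb) {
  const int32_t product = (int32_t)xa * (int32_t)xb;             /* exact 16x16 -> 32 */
  const uint16_t lo = (uint16_t)((uint32_t)product & 0xFFFFu);   /* _mm_mullo_epi16   */
  const uint16_t hi = (uint16_t)((uint32_t)product >> 16);       /* _mm_mulhi_epi16   */
  const int32_t widened = (int32_t)((uint32_t)lo | ((uint32_t)hi << 16));
  return acc + widened;                                          /* _mm_add_epi32     */
}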
- - // Load column id of the first 1x4 block - int32_t col_block_id_0 = temp_w_block_ids_ptr[0]; - const __m128i va0 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 0)); - const __m128i va1 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 1)); - const __m128i va2 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 2)); - const __m128i va3 = - _mm_loadl_epi64((const __m128i*) (a_packed + - col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 3)); - const __m128i vxa0 = - sub_zero_point(_mm_unpacklo_epi8(va0, vzero), va_zero_point); - const __m128i vxa1 = - sub_zero_point(_mm_unpacklo_epi8(va1, vzero), va_zero_point); - const __m128i vxa2 = - sub_zero_point(_mm_unpacklo_epi8(va2, vzero), va_zero_point); - const __m128i vxa3 = - sub_zero_point(_mm_unpacklo_epi8(va3, vzero), va_zero_point); - - // acc += a0 * b0; - __m128i vacc_low_16bits = _mm_mullo_epi16(vxa0, vxb0); - __m128i vacc_high_16bits = _mm_mulhi_epi16(vxa0, vxb0); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a1 * b1; - vacc_low_16bits = _mm_mullo_epi16(vxa1, vxb1); - vacc_high_16bits = _mm_mulhi_epi16(vxa1, vxb1); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a2 * b2; - vacc_low_16bits = _mm_mullo_epi16(vxa2, vxb2); - vacc_high_16bits = _mm_mulhi_epi16(vxa2, vxb2); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - // acc += a3 * b3; - vacc_low_16bits = _mm_mullo_epi16(vxa3, vxb3); - vacc_high_16bits = _mm_mulhi_epi16(vxa3, vxb3); - vacc_low[n] = _mm_add_epi32(vacc_low[n], - _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); - vacc_high[n] = _mm_add_epi32(vacc_high[n], - _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); - - // Now we have 1x8 m acculated 32 bit values in vacc_low[n](4) and vacc_high[n](4) - } - } - - __m128 vout[8]; - __m128 a_ps, b_ps, c_ps, d_ps, tmp0, tmp1, tmp2, tmp3; - - // Transform low half of 4x8 result - // That is 4x4 block (4n x 4m) - // Convert to FP and transpose: 4m x 4n - CONVERT_TO_FP_AND_TRANSPOSE(vacc_low[0], - vacc_low[1], - vacc_low[2], - vacc_low[3], - vout[0], - vout[1], - vout[2], - vout[3]) - CONVERT_TO_FP_AND_TRANSPOSE(vacc_high[0], - vacc_high[1], - vacc_high[2], - vacc_high[3], - vout[4], - vout[5], - vout[6], - vout[7]) - - vout[0] = _mm_mul_ps(vmultiplier, vout[0]); - vout[1] = _mm_mul_ps(vmultiplier, vout[1]); - vout[2] = _mm_mul_ps(vmultiplier, vout[2]); - vout[3] = _mm_mul_ps(vmultiplier, vout[3]); - vout[4] = _mm_mul_ps(vmultiplier, vout[4]); - vout[5] = _mm_mul_ps(vmultiplier, vout[5]); - vout[6] = _mm_mul_ps(vmultiplier, vout[6]); - vout[7] = _mm_mul_ps(vmultiplier, vout[7]); - - vout[0] = _mm_add_ps(vout[0], vbias); - vout[1] = _mm_add_ps(vout[1], vbias); - vout[2] = _mm_add_ps(vout[2], vbias); - vout[3] = _mm_add_ps(vout[3], vbias); - vout[4] = _mm_add_ps(vout[4], vbias); - vout[5] = _mm_add_ps(vout[5], vbias); - vout[6] = _mm_add_ps(vout[6], vbias); - vout[7] = _mm_add_ps(vout[7], vbias); - - float* c0 = c; - float* c1 = c0 + c_stride; - if (mr < 2) { - c1 
= c0; - vout[1] = vout[0]; - } - float* c2 = c1 + c_stride; - if (mr < 3) { - c2 = c0; - vout[2] = vout[0]; - } - float* c3 = c2 + c_stride; - if (mr < 4) { - c3 = c0; - vout[3] = vout[0]; - } - float* c4 = c3 + c_stride; - if (mr < 5) { - c4 = c0; - vout[4] = vout[0]; - } - float* c5 = c4 + c_stride; - if (mr < 6) { - c5 = c0; - vout[5] = vout[0]; - } - float* c6 = c5 + c_stride; - if (mr < 7) { - c6 = c0; - vout[6] = vout[0]; - } - float* c7 = c6 + c_stride; - if (mr < 8) { - c7 = c0; - vout[7] = vout[0]; - } - - if (nr == 4) { - _mm_storeu_ps(c0, vout[0]); - _mm_storeu_ps(c1, vout[1]); - _mm_storeu_ps(c2, vout[2]); - _mm_storeu_ps(c3, vout[3]); - _mm_storeu_ps(c4, vout[4]); - _mm_storeu_ps(c5, vout[5]); - _mm_storeu_ps(c6, vout[6]); - _mm_storeu_ps(c7, vout[7]); - } else { - if (nr >= 2) { - _mm_storel_pi((__m64*)c0, vout[0]); - _mm_storel_pi((__m64*)c1, vout[1]); - _mm_storel_pi((__m64*)c2, vout[2]); - _mm_storel_pi((__m64*)c3, vout[3]); - _mm_storel_pi((__m64*)c4, vout[4]); - _mm_storel_pi((__m64*)c5, vout[5]); - _mm_storel_pi((__m64*)c6, vout[6]); - _mm_storel_pi((__m64*)c7, vout[7]); - - nr -= 2; - - c0 += 2; - c1 += 2; - c2 += 2; - c3 += 2; - c4 += 2; - c5 += 2; - c6 += 2; - c7 += 2; - vout[0] = _mm_shuffle_ps(vout[0], vout[0], _MM_SHUFFLE(2, 2, 2, 2)); - vout[1] = _mm_shuffle_ps(vout[1], vout[1], _MM_SHUFFLE(2, 2, 2, 2)); - vout[2] = _mm_shuffle_ps(vout[2], vout[2], _MM_SHUFFLE(2, 2, 2, 2)); - vout[3] = _mm_shuffle_ps(vout[3], vout[3], _MM_SHUFFLE(2, 2, 2, 2)); - vout[4] = _mm_shuffle_ps(vout[4], vout[4], _MM_SHUFFLE(2, 2, 2, 2)); - vout[5] = _mm_shuffle_ps(vout[5], vout[5], _MM_SHUFFLE(2, 2, 2, 2)); - vout[6] = _mm_shuffle_ps(vout[6], vout[6], _MM_SHUFFLE(2, 2, 2, 2)); - vout[7] = _mm_shuffle_ps(vout[7], vout[7], _MM_SHUFFLE(2, 2, 2, 2)); - } - if (nr != 0) { - *c0 = _mm_cvtss_f32(vout[0]); - *c1 = _mm_cvtss_f32(vout[1]); - *c2 = _mm_cvtss_f32(vout[2]); - *c3 = _mm_cvtss_f32(vout[3]); - *c4 = _mm_cvtss_f32(vout[4]); - *c5 = _mm_cvtss_f32(vout[5]); - *c6 = _mm_cvtss_f32(vout[6]); - *c7 = _mm_cvtss_f32(vout[7]); - } - } -} +#define KERNEL_NAME pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2 +#define W_INDEX_DTYPE uint32_t +#include "8x4c1x4-dq-packedA-sse2.h" +#undef KERNEL_NAME +#undef W_INDEX_DTYPE + +#define KERNEL_NAME pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2 +#define W_INDEX_DTYPE uint16_t +#include "8x4c1x4-dq-packedA-sse2.h" +#undef KERNEL_NAME +#undef W_INDEX_DTYPE + +#define KERNEL_NAME pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2 +#define W_INDEX_DTYPE uint8_t +#include "8x4c1x4-dq-packedA-sse2.h" +#undef KERNEL_NAME +#undef W_INDEX_DTYPE diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.h b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.h new file mode 100644 index 000000000000..5503d6718172 --- /dev/null +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.h @@ -0,0 +1,435 @@ +/* + * Copyright (c) Facebook, Inc. and its affiliates. + * All rights reserved. + * + * This source code is licensed under the BSD-style license found in the + * LICENSE file in the root directory of this source tree. 
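After this rewrite the .c file only instantiates the shared body: it defines KERNEL_NAME and W_INDEX_DTYPE, includes 8x4c1x4-dq-packedA-sse2.h, and undefines both, once per index width. A minimal sketch of that macro-plus-include template pattern with generic names (not the QNNPACK files):

/* ---- sum_impl.h : shared body, parameterized by KERNEL_NAME / INDEX_T ---- */
#include <stddef.h>

INDEX_T KERNEL_NAME(const INDEX_T* values, size_t n) {
  INDEX_T total = 0;
  for (size_t i = 0; i < n; i++) {
    total += values[i];   /* same body, different element type per instantiation */
  }
  return total;
}

/* ---- sums.c : one instantiation per element width, mirroring the
 *      KERNEL_NAME / W_INDEX_DTYPE + #include scheme used in the patch ---- */
#include <stdint.h>

#define KERNEL_NAME sum_u32
#define INDEX_T uint32_t
#include "sum_impl.h"
#undef KERNEL_NAME
#undef INDEX_T

#define KERNEL_NAME sum_u16
#define INDEX_T uint16_t
#include "sum_impl.h"
#undef KERNEL_NAME
#undef INDEX_T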
+ */ + +#include + +#include +#include + +#include "8x4c1x4-packed-sse2.h" + +#define CONVERT_TO_FP_AND_TRANSPOSE(a, b, c, d, t_a, t_b, t_c, t_d) \ + a_ps = _mm_cvtepi32_ps(a); \ + b_ps = _mm_cvtepi32_ps(b); \ + c_ps = _mm_cvtepi32_ps(c); \ + d_ps = _mm_cvtepi32_ps(d); \ + tmp0 = _mm_shuffle_ps(a_ps, b_ps, _MM_SHUFFLE(1, 0, 1, 0)); \ + tmp1 = _mm_shuffle_ps(a_ps, b_ps, _MM_SHUFFLE(3, 2, 3, 2)); \ + tmp2 = _mm_shuffle_ps(c_ps, d_ps, _MM_SHUFFLE(1, 0, 1, 0)); \ + tmp3 = _mm_shuffle_ps(c_ps, d_ps, _MM_SHUFFLE(3, 2, 3, 2)); \ + t_a = _mm_shuffle_ps(tmp0, tmp2, _MM_SHUFFLE(2, 0, 2, 0)); \ + t_b = _mm_shuffle_ps(tmp0, tmp2, _MM_SHUFFLE(3, 1, 3, 1)); \ + t_c = _mm_shuffle_ps(tmp1, tmp3, _MM_SHUFFLE(2, 0, 2, 0)); \ + t_d = _mm_shuffle_ps(tmp1, tmp3, _MM_SHUFFLE(3, 1, 3, 1)); + +// KERNEL_NAME and W_INDEX_DTYPE macros are defined in +// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x4c1x4-dq-packedA-sse2.c +void KERNEL_NAME( + size_t mr, + size_t nr, + const uint8_t* a_packed, + const uint8_t* packed_w, + const W_INDEX_DTYPE* w_row_ptr, + const W_INDEX_DTYPE* w_block_ids_ptr, + const float* b, + float* c, + size_t c_stride, + size_t output_channel_index, + const struct pytorch_qnnp_conv_dynamic_quantization_params + quantization_params[RESTRICT_STATIC 1]) { + const __m128i va_zero_point = _mm_set1_epi16(quantization_params->input_zero_point); + const __m128 vbias = _mm_load_ps(b); + const __m128i vzero = _mm_setzero_si128(); + + // Packed A format. + // 8kx4m blocks for alls blocks given 4 rows (4m) are placed in contiguous memory. + // Original A + // --------- K ----------- -- (K + 4 - 1) / 4 -- + // | | | | + // | | (M + 8 - 1)/8 | + // | | Packed | | + // M | => |-------------------| + // | | Thus Packed A has (K + 4 - 1)/4 * (M + 8 -1)/8 blocks + // | | + // |---------------------| + // + // Each 8 x 4 blocks is transposed and stored. + // Each of the (K + 4 - 1)/4 blocks for a given group of 8 m blocks + // are stored adjacent in memory + // Thus, each block: + // |----8m-----|----8m-----| + // 4k | | ..... + // |-----------|-----------| + // This locality helps in loading 8kx8m blocks of activations + // Note when M is not multiple of 8, the rest can contain arbitrary + // data in packed A as we will not be writing those out. + // This wil be taken care by just copying the appropriate valid data + + __m128i vacc_low[4]; + __m128i vacc_high[4]; + const __m128 vmultiplier = + _mm_loadu_ps(&quantization_params->multipliers[output_channel_index]); + for (int32_t n = 0; n < nr; n++) { + vacc_low[n] = _mm_setzero_si128(); + vacc_high[n] = _mm_setzero_si128(); + const int16_t b_zero_point = + (int16_t)(uint16_t)quantization_params->kernel_zero_points[ + output_channel_index + n]; + + int32_t num_blocks = w_row_ptr[n+1] - w_row_ptr[n]; + // Offset into compressed values. + // w_row_ptr[0] is the block offset in the compressed values. + // Where the corresponding row of the weight matrix starts. + const uint8_t* temp_packed_w = packed_w + w_row_ptr[n] * COL_BLOCK_SIZE; + // Similarly w_row_ptr[0] is also the block offset where + // corresponding row's block column ids start. + // Per row # of block column ids = # of block values + const W_INDEX_DTYPE* temp_w_block_ids_ptr = w_block_ids_ptr + w_row_ptr[n]; + while (num_blocks > 1) { + // Load two 1x4 uint8 blocks 2 ints + const uint8_t* b_ptr = temp_packed_w; + // This is not perf optimal since this will result in + // register spills. 
We probably should work with output block + // of 1x4 instead of 1x8 + // But doing is this way because mostly this how we will + // do it for ARM and this reference code helps establish + // the baseline for functional correctness. + const int16_t b_0 = (int16_t)((uint16_t)(b_ptr[0])); + const int16_t b_1 = (int16_t)((uint16_t)(b_ptr[1])); + const int16_t b_2 = (int16_t)((uint16_t)(b_ptr[2])); + const int16_t b_3 = (int16_t)((uint16_t)(b_ptr[3])); + const int16_t b_4 = (int16_t)((uint16_t)(b_ptr[4])); + const int16_t b_5 = (int16_t)((uint16_t)(b_ptr[5])); + const int16_t b_6 = (int16_t)((uint16_t)(b_ptr[6])); + const int16_t b_7 = (int16_t)((uint16_t)(b_ptr[7])); + // Now we will load 8kx1(broadcast 8) weight values + const __m128i vxb0 = _mm_set1_epi16((b_0 - b_zero_point)); + const __m128i vxb1 = _mm_set1_epi16((b_1 - b_zero_point)); + const __m128i vxb2 = _mm_set1_epi16((b_2 - b_zero_point)); + const __m128i vxb3 = _mm_set1_epi16((b_3 - b_zero_point)); + const __m128i vxb4 = _mm_set1_epi16((b_4 - b_zero_point)); + const __m128i vxb5 = _mm_set1_epi16((b_5 - b_zero_point)); + const __m128i vxb6 = _mm_set1_epi16((b_6 - b_zero_point)); + const __m128i vxb7 = _mm_set1_epi16((b_7 - b_zero_point)); + + // Load activation blocks. In this kernel we assume + // a mat is already transposed. K x M + // 1. Load 8 1x8 registers = 8k x 8m + + // Load column id of the first 1x4 block + int32_t col_block_id_0 = temp_w_block_ids_ptr[0]; + // Load column id of the second 1x4 block + int32_t col_block_id_1 = temp_w_block_ids_ptr[1]; + const __m128i va0 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 0)); + const __m128i va1 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 1)); + const __m128i va2 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 2)); + const __m128i va3 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 3)); + const __m128i va4 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 0)); + const __m128i va5 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 1)); + const __m128i va6 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 2)); + const __m128i va7 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_1 * PACKED_A_BLOCK_SIZE + MR * 3)); + + const __m128i vxa0 = + sub_zero_point(_mm_unpacklo_epi8(va0, vzero), va_zero_point); + const __m128i vxa1 = + sub_zero_point(_mm_unpacklo_epi8(va1, vzero), va_zero_point); + const __m128i vxa2 = + sub_zero_point(_mm_unpacklo_epi8(va2, vzero), va_zero_point); + const __m128i vxa3 = + sub_zero_point(_mm_unpacklo_epi8(va3, vzero), va_zero_point); + const __m128i vxa4 = + sub_zero_point(_mm_unpacklo_epi8(va4, vzero), va_zero_point); + const __m128i vxa5 = + sub_zero_point(_mm_unpacklo_epi8(va5, vzero), va_zero_point); + const __m128i vxa6 = + sub_zero_point(_mm_unpacklo_epi8(va6, vzero), va_zero_point); + const __m128i vxa7 = + sub_zero_point(_mm_unpacklo_epi8(va7, vzero), va_zero_point); + + // acc += a0 * b0; + __m128i vacc_low_16bits = _mm_mullo_epi16(vxa0, vxb0); + __m128i vacc_high_16bits = _mm_mulhi_epi16(vxa0, vxb0); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += 
a1 * b1; + vacc_low_16bits = _mm_mullo_epi16(vxa1, vxb1); + vacc_high_16bits = _mm_mulhi_epi16(vxa1, vxb1); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a2 * b2; + vacc_low_16bits = _mm_mullo_epi16(vxa2, vxb2); + vacc_high_16bits = _mm_mulhi_epi16(vxa2, vxb2); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a3 * b3; + vacc_low_16bits = _mm_mullo_epi16(vxa3, vxb3); + vacc_high_16bits = _mm_mulhi_epi16(vxa3, vxb3); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a4 * b4; + vacc_low_16bits = _mm_mullo_epi16(vxa4, vxb4); + vacc_high_16bits = _mm_mulhi_epi16(vxa4, vxb4); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a5 * b5; + vacc_low_16bits = _mm_mullo_epi16(vxa5, vxb5); + vacc_high_16bits = _mm_mulhi_epi16(vxa5, vxb5); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a6 * b6; + vacc_low_16bits = _mm_mullo_epi16(vxa6, vxb6); + vacc_high_16bits = _mm_mulhi_epi16(vxa6, vxb6); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a7 * b7; + vacc_low_16bits = _mm_mullo_epi16(vxa7, vxb7); + vacc_high_16bits = _mm_mulhi_epi16(vxa7, vxb7); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + + // Now we have 1x8 m acculated 32 bit values in vacc_low[n](4) and vacc_high[n](4) + + temp_packed_w = temp_packed_w + COL_BLOCK_SIZE * 2; + temp_w_block_ids_ptr += 2; + num_blocks -= 2; + } + if (num_blocks > 0) { + // Load two 1x4 uint8 blocks 2 ints + const uint8_t* b_ptr = temp_packed_w; + const int16_t b_0 = (int16_t)((uint16_t)(b_ptr[0])); + const int16_t b_1 = (int16_t)((uint16_t)(b_ptr[1])); + const int16_t b_2 = (int16_t)((uint16_t)(b_ptr[2])); + const int16_t b_3 = (int16_t)((uint16_t)(b_ptr[3])); + // Now we will load 8kx1(broadcast 8) weight values + const __m128i vxb0 = _mm_set1_epi16((b_0 - b_zero_point)); + const __m128i vxb1 = _mm_set1_epi16((b_1 - b_zero_point)); + const __m128i vxb2 = _mm_set1_epi16((b_2 - b_zero_point)); + const __m128i vxb3 = _mm_set1_epi16((b_3 - b_zero_point)); + + // Then load transformed weight blocks + // 1. Load 4 1x8 registers = 4k x 8m + // Thus have 4x8 (4k x 8m) activations a0, a1, a2, a3 + // Each a containing 8 m values. 
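The va0..va7 loads index packed A by column block: within one block the eight m values of a k line are contiguous and successive k lines are MR bytes apart, which is why the load address is a_packed + col_block_id * PACKED_A_BLOCK_SIZE + MR * k_sub. A small C sketch of that addressing, assuming PACKED_A_BLOCK_SIZE equals MR * 4 as the 4k x 8m layout comment suggests (the real constants live in 8x4c1x4-packed-sse2.h, so treat these values as illustrative):

#include <stddef.h>
#include <stdint.h>

enum {
  MR_SKETCH = 8,                                           /* 8 m values per k line      */
  KBLOCK_SKETCH = 4,                                       /* 4 k lines per packed block */
  PACKED_A_BLOCK_SIZE_SKETCH = MR_SKETCH * KBLOCK_SKETCH   /* bytes per packed block     */
};

/* Returns the packed-A element holding activation (k, m), illustrating the
 * a_packed + col_block_id * PACKED_A_BLOCK_SIZE + MR * k_sub loads above. */
static uint8_t packed_a_at(const uint8_t* a_packed, size_t k, size_t m) {
  const size_t col_block_id = k / KBLOCK_SKETCH;   /* which 4k x 8m block     */
  const size_t k_sub = k % KBLOCK_SKETCH;          /* k line inside the block */
  return a_packed[col_block_id * PACKED_A_BLOCK_SIZE_SKETCH
                  + k_sub * MR_SKETCH + m];        /* m runs contiguously     */
}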
+ + // Load column id of the first 1x4 block + int32_t col_block_id_0 = temp_w_block_ids_ptr[0]; + const __m128i va0 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 0)); + const __m128i va1 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 1)); + const __m128i va2 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 2)); + const __m128i va3 = + _mm_loadl_epi64((const __m128i*) (a_packed + + col_block_id_0 * PACKED_A_BLOCK_SIZE + MR * 3)); + const __m128i vxa0 = + sub_zero_point(_mm_unpacklo_epi8(va0, vzero), va_zero_point); + const __m128i vxa1 = + sub_zero_point(_mm_unpacklo_epi8(va1, vzero), va_zero_point); + const __m128i vxa2 = + sub_zero_point(_mm_unpacklo_epi8(va2, vzero), va_zero_point); + const __m128i vxa3 = + sub_zero_point(_mm_unpacklo_epi8(va3, vzero), va_zero_point); + + // acc += a0 * b0; + __m128i vacc_low_16bits = _mm_mullo_epi16(vxa0, vxb0); + __m128i vacc_high_16bits = _mm_mulhi_epi16(vxa0, vxb0); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a1 * b1; + vacc_low_16bits = _mm_mullo_epi16(vxa1, vxb1); + vacc_high_16bits = _mm_mulhi_epi16(vxa1, vxb1); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a2 * b2; + vacc_low_16bits = _mm_mullo_epi16(vxa2, vxb2); + vacc_high_16bits = _mm_mulhi_epi16(vxa2, vxb2); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + // acc += a3 * b3; + vacc_low_16bits = _mm_mullo_epi16(vxa3, vxb3); + vacc_high_16bits = _mm_mulhi_epi16(vxa3, vxb3); + vacc_low[n] = _mm_add_epi32(vacc_low[n], + _mm_unpacklo_epi16(vacc_low_16bits, vacc_high_16bits)); + vacc_high[n] = _mm_add_epi32(vacc_high[n], + _mm_unpackhi_epi16(vacc_low_16bits, vacc_high_16bits)); + + // Now we have 1x8 m acculated 32 bit values in vacc_low[n](4) and vacc_high[n](4) + } + } + + __m128 vout[8]; + __m128 a_ps, b_ps, c_ps, d_ps, tmp0, tmp1, tmp2, tmp3; + + // Transform low half of 4x8 result + // That is 4x4 block (4n x 4m) + // Convert to FP and transpose: 4m x 4n + CONVERT_TO_FP_AND_TRANSPOSE(vacc_low[0], + vacc_low[1], + vacc_low[2], + vacc_low[3], + vout[0], + vout[1], + vout[2], + vout[3]) + CONVERT_TO_FP_AND_TRANSPOSE(vacc_high[0], + vacc_high[1], + vacc_high[2], + vacc_high[3], + vout[4], + vout[5], + vout[6], + vout[7]) + + vout[0] = _mm_mul_ps(vmultiplier, vout[0]); + vout[1] = _mm_mul_ps(vmultiplier, vout[1]); + vout[2] = _mm_mul_ps(vmultiplier, vout[2]); + vout[3] = _mm_mul_ps(vmultiplier, vout[3]); + vout[4] = _mm_mul_ps(vmultiplier, vout[4]); + vout[5] = _mm_mul_ps(vmultiplier, vout[5]); + vout[6] = _mm_mul_ps(vmultiplier, vout[6]); + vout[7] = _mm_mul_ps(vmultiplier, vout[7]); + + vout[0] = _mm_add_ps(vout[0], vbias); + vout[1] = _mm_add_ps(vout[1], vbias); + vout[2] = _mm_add_ps(vout[2], vbias); + vout[3] = _mm_add_ps(vout[3], vbias); + vout[4] = _mm_add_ps(vout[4], vbias); + vout[5] = _mm_add_ps(vout[5], vbias); + vout[6] = _mm_add_ps(vout[6], vbias); + vout[7] = _mm_add_ps(vout[7], vbias); + + float* c0 = c; + float* c1 = c0 + c_stride; + if (mr < 2) { + c1 
= c0; + vout[1] = vout[0]; + } + float* c2 = c1 + c_stride; + if (mr < 3) { + c2 = c0; + vout[2] = vout[0]; + } + float* c3 = c2 + c_stride; + if (mr < 4) { + c3 = c0; + vout[3] = vout[0]; + } + float* c4 = c3 + c_stride; + if (mr < 5) { + c4 = c0; + vout[4] = vout[0]; + } + float* c5 = c4 + c_stride; + if (mr < 6) { + c5 = c0; + vout[5] = vout[0]; + } + float* c6 = c5 + c_stride; + if (mr < 7) { + c6 = c0; + vout[6] = vout[0]; + } + float* c7 = c6 + c_stride; + if (mr < 8) { + c7 = c0; + vout[7] = vout[0]; + } + + if (nr == 4) { + _mm_storeu_ps(c0, vout[0]); + _mm_storeu_ps(c1, vout[1]); + _mm_storeu_ps(c2, vout[2]); + _mm_storeu_ps(c3, vout[3]); + _mm_storeu_ps(c4, vout[4]); + _mm_storeu_ps(c5, vout[5]); + _mm_storeu_ps(c6, vout[6]); + _mm_storeu_ps(c7, vout[7]); + } else { + if (nr >= 2) { + _mm_storel_pi((__m64*)c0, vout[0]); + _mm_storel_pi((__m64*)c1, vout[1]); + _mm_storel_pi((__m64*)c2, vout[2]); + _mm_storel_pi((__m64*)c3, vout[3]); + _mm_storel_pi((__m64*)c4, vout[4]); + _mm_storel_pi((__m64*)c5, vout[5]); + _mm_storel_pi((__m64*)c6, vout[6]); + _mm_storel_pi((__m64*)c7, vout[7]); + + nr -= 2; + + c0 += 2; + c1 += 2; + c2 += 2; + c3 += 2; + c4 += 2; + c5 += 2; + c6 += 2; + c7 += 2; + vout[0] = _mm_shuffle_ps(vout[0], vout[0], _MM_SHUFFLE(2, 2, 2, 2)); + vout[1] = _mm_shuffle_ps(vout[1], vout[1], _MM_SHUFFLE(2, 2, 2, 2)); + vout[2] = _mm_shuffle_ps(vout[2], vout[2], _MM_SHUFFLE(2, 2, 2, 2)); + vout[3] = _mm_shuffle_ps(vout[3], vout[3], _MM_SHUFFLE(2, 2, 2, 2)); + vout[4] = _mm_shuffle_ps(vout[4], vout[4], _MM_SHUFFLE(2, 2, 2, 2)); + vout[5] = _mm_shuffle_ps(vout[5], vout[5], _MM_SHUFFLE(2, 2, 2, 2)); + vout[6] = _mm_shuffle_ps(vout[6], vout[6], _MM_SHUFFLE(2, 2, 2, 2)); + vout[7] = _mm_shuffle_ps(vout[7], vout[7], _MM_SHUFFLE(2, 2, 2, 2)); + } + if (nr != 0) { + *c0 = _mm_cvtss_f32(vout[0]); + *c1 = _mm_cvtss_f32(vout[1]); + *c2 = _mm_cvtss_f32(vout[2]); + *c3 = _mm_cvtss_f32(vout[3]); + *c4 = _mm_cvtss_f32(vout[4]); + *c5 = _mm_cvtss_f32(vout[5]); + *c6 = _mm_cvtss_f32(vout[6]); + *c7 = _mm_cvtss_f32(vout[7]); + } + } +} diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S index 375581ec3fec..aca408e89757 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c1x4-dq-packedA-aarch64-neon.S @@ -8,6 +8,24 @@ #include +#ifndef IGNORE_CODE_ALIGN_DIRECTIVES +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 .p2align 5 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 .p2align 4 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 .p2align 3 +#else +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 +#endif + +# Macro for separating instructions. 
For most builds, ; can be used, but for +# ARM64 + Mach, ; begins a comment, and %% is used to separate instructions +#if defined(__MACH__) +#define XX %% +#else +#define XX ; +#endif + .macro TRANSPOSE_4X4_S32 vin0, vin1, vin2, vin3, temp0, temp1, temp2, temp3 TRN1 \temp0\().4s, \vin0\().4s, \vin1\().4s TRN2 \temp1\().4s, \vin0\().4s, \vin1\().4s @@ -30,7 +48,460 @@ # |params | 16 # |-----------| -# void pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon( +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_row_ptr, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +#define MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_8X8_PACKEDA__AARCH64_NEON(W_INDEX_DTYPE_NUM_BITS, W_INDEX_DTYPE_NUM_BYTES_ARG, W_INDEX_DTYPE_LOG_NUM_BYTES_ARG, LOAD_INDEX_INSTRUCTION) XX\ + BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon XX\ + XX\ + STP d15, d14, [sp, -16] XX\ + STP d13, d12, [sp, -32] XX\ + STP d11, d10, [sp, -48] XX\ + STP d9, d8, [sp, -64] XX\ + XX\ + MOV x11, x1 XX\ + /* Load output channel index */ XX\ + LDR x10, [sp, 8] XX\ + /* Load params */ XX\ + LDR x8, [sp, 16] XX\ + XX\ + /* Load a_zero_point */ XX\ + LD1R {v24.8b}, [x8] XX\ + ADD x8, x8, 8 XX\ + XX\ + /* Load pointer to per channel zero points array */ XX\ + LDR x17, [x8], 8 XX\ + XX\ + /* Load pointer to per channel multiplier */ XX\ + LDR x13, [x8] XX\ + XX\ + /* Add offset to the base pointer */ XX\ + ADD x17, x17, x10 XX\ + /* Mul by 4 to get byte offset for multiplier */ XX\ + LSL x10, x10, 2 XX\ + /* Add offset to the base pointer for multiplier */ XX\ + ADD x13, x13, x10 XX\ + XX\ + /* Load b_zero_point */ XX\ + LD1 {v25.8b}, [x17] XX\ + /* Load multiplier c0123 */ XX\ + LD1 {v26.4s}, [x13], 16 XX\ + /* Load multiplier c4567 */ XX\ + LD1 {v30.4s}, [x13] XX\ + XX\ + EOR x12, x12, x12 XX\ + EOR x13, x13, x13 XX\ + XX\ + CMP x1, 1 XX\ + B.LO _7_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 XX\ + _0_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* v8 := zero */ XX\ + EOR v8.16b, v8.16b, v8.16b XX\ + /* v9 := zero */ XX\ + EOR v9.16b, v9.16b, v9.16b XX\ + XX\ + DUP v29.8b, v25.b[0] XX\ + /* w12 = w_row_ptr[n], x13 = w_row_ptr[n+1] */ XX\ + /* x4 = x4 + W_INDEX_DTYPE_NUM_BYTES_ARG to point to next n */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x4], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + LOAD_INDEX_INSTRUCTION w13, [x4] XX\ + /* x10 = temp_packed_w = packed_w + w_row_ptr[n] * 4 */ XX\ + /* This points to the first block of nonzero value */ XX\ + /* for the nth row. 
*/ XX\ + ADD x10, x3, x12, LSL #2 XX\ + /* x9 = temp_w_block_ids_ptr = w_block_ids_ptr (x5) + w_row_ptr[n] */ XX\ + /* LSL for when elements are >1 byte */ XX\ + /* (4 bytes: LSL #2, 2 bytes: LSL #1, 1 byte: LSL #0) */ XX\ + /* This points to the block id of the first block */ XX\ + /* It should contain x13 - x12 number of block ids */ XX\ + ADD x9, x5, x12, LSL W_INDEX_DTYPE_LOG_NUM_BYTES_ARG XX\ + /* x8 = num_blocks that needs to be processed */ XX\ + SUB x8, x13, x12 XX\ + SUBS x8, x8, 2 XX\ + B.LO _1_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + k_loop_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* b0-7 (channel 0) */ XX\ + LD1 {v10.8b}, [x10], 8 XX\ + USUBL v10.8h, v10.8b, v29.8b XX\ + XX\ + /* x12 = block_id_ptr[0] */ XX\ + /* x13 = block_id_ptr[1] */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x9], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + LOAD_INDEX_INSTRUCTION w13, [x9], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + /* Add offset to x2 */ XX\ + /* Shift by 5 because each packed block is a block of 8x4 */ XX\ + /* which 32 bytes */ XX\ + ADD x16, x2, x12, LSL #5 XX\ + ADD x17, x2, x13, LSL #5 XX\ + XX\ + LD1 {v0.8b}, [x16], 8 XX\ + LD1 {v1.8b}, [x16], 8 XX\ + LD1 {v2.8b}, [x16], 8 XX\ + LD1 {v3.8b}, [x16] XX\ + LD1 {v4.8b}, [x17], 8 XX\ + LD1 {v5.8b}, [x17], 8 XX\ + LD1 {v6.8b}, [x17], 8 XX\ + LD1 {v7.8b}, [x17] XX\ + XX\ + USUBL v0.8h, v0.8b, v24.8b XX\ + USUBL v1.8h, v1.8b, v24.8b XX\ + USUBL v2.8h, v2.8b, v24.8b XX\ + USUBL v3.8h, v3.8b, v24.8b XX\ + USUBL v4.8h, v4.8b, v24.8b XX\ + USUBL v5.8h, v5.8b, v24.8b XX\ + USUBL v6.8h, v6.8b, v24.8b XX\ + USUBL v7.8h, v7.8b, v24.8b XX\ + XX\ + SMLAL v8.4s, v0.4h, v10.h[0] XX\ + SMLAL2 v9.4s, v0.8h, v10.h[0] XX\ + SMLAL v8.4s, v1.4h, v10.h[1] XX\ + SMLAL2 v9.4s, v1.8h, v10.h[1] XX\ + SMLAL v8.4s, v2.4h, v10.h[2] XX\ + SMLAL2 v9.4s, v2.8h, v10.h[2] XX\ + SMLAL v8.4s, v3.4h, v10.h[3] XX\ + SMLAL2 v9.4s, v3.8h, v10.h[3] XX\ + SMLAL v8.4s, v4.4h, v10.h[4] XX\ + SMLAL2 v9.4s, v4.8h, v10.h[4] XX\ + SMLAL v8.4s, v5.4h, v10.h[5] XX\ + SMLAL2 v9.4s, v5.8h, v10.h[5] XX\ + SMLAL v8.4s, v6.4h, v10.h[6] XX\ + SMLAL2 v9.4s, v6.8h, v10.h[6] XX\ + SUBS x8, x8, 2 XX\ + SMLAL v8.4s, v7.4h, v10.h[7] XX\ + SMLAL2 v9.4s, v7.8h, v10.h[7] XX\ + XX\ + XX\ + B.HS k_loop_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + _1_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x8, -2 XX\ + B.EQ _2_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + /* b0-7 (channel 0) */ XX\ + LD1R {v10.4s}, [x10] XX\ + USUBL v10.8h, v10.8b, v29.8b XX\ + XX\ + /* x12 = block_id_ptr[0] */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x9] XX\ + /* Add offset to x2 */ XX\ + /* Shift by 5 because each packed block is a block of 8x4 */ XX\ + /* which 32 bytes */ XX\ + ADD x16, x2, x12, LSL #5 XX\ + XX\ + LD1 {v0.8b}, [x16], 8 XX\ + LD1 {v1.8b}, [x16], 8 XX\ + LD1 {v2.8b}, [x16], 8 XX\ + LD1 {v3.8b}, [x16] XX\ + XX\ + USUBL v0.8h, v0.8b, v24.8b XX\ + USUBL v1.8h, v1.8b, v24.8b XX\ + USUBL v2.8h, v2.8b, v24.8b XX\ + USUBL v3.8h, v3.8b, v24.8b XX\ + XX\ + SMLAL v8.4s, v0.4h, v10.h[0] XX\ + SMLAL2 v9.4s, v0.8h, v10.h[0] XX\ + SMLAL v8.4s, v1.4h, v10.h[1] XX\ + SMLAL2 v9.4s, v1.8h, v10.h[1] XX\ + SMLAL v8.4s, v2.4h, v10.h[2] XX\ + SMLAL2 v9.4s, v2.8h, v10.h[2] XX\ + SMLAL v8.4s, v3.4h, v10.h[3] XX\ + SMLAL2 v9.4s, v3.8h, v10.h[3] XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 XX\ + _2_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* Store result on stack */ XX\ + XX\ + /* -64 because all d8-d15 are on stack */ XX\ + /* + 256 bytes of buffer when nr = 1 */ XX\ + /* 256 because we are doing 8x8 block with each value being 4 bytes */ XX\ + /* Thus 64 * 4 = 256 */ XX\ + /* 256 + 64 = 320 */ 
XX\ + /* This is needed because after processing all nrs we will */ XX\ + /* load 256 bytes from stack. */ XX\ + /* Thus we will load accumulators back in v8, v9, v10, v11, v12, v13, v14, v15 */ XX\ + /* v16, v17, v18, v19, v20, v21, v22, v23 */ XX\ + /* When nr < 8, say nr = 1, extra v values will be fetched from stack which may overlap */ XX\ + /* with other parts of stack storing local variables. To avoid that we just */ XX\ + /* create a buffer of 256 bytes inbetween to make sure pointer increment */ XX\ + /* never produces address that is beyond the stack frame of this function. */ XX\ + SUB x9, sp, 320 XX\ + /* Each iteration produce 8 values each of 4 bytes */ XX\ + /* Thus 8 x 4 = 32 bytes 2^5 */ XX\ + /* In this implementation, first value will be stored at */ XX\ + /* 1st value: sp - 64 - r1 * 32 */ XX\ + /* 2nd value: sp - 12 - (r1 - 1) * 32 */ XX\ + /* and so on. */ XX\ + SUB x9, x9, x1, LSL #5 XX\ + ST1 {v8.4s}, [x9], 16 XX\ + ST1 {v9.4s}, [x9] XX\ + XX\ + /* Shift zero point vector by 8 to load */ XX\ + /* zero point of the next channel */ XX\ + SRI v25.2d, v25.2d, #8 XX\ + /* Check if nr >=1 */ XX\ + SUBS x1, x1, 1 XX\ + BHI _0_w##W_INDEX_DTYPE_NUM_BITS XX\ + _3_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* First load all the accumulators from stack */ XX\ + /* Load nr */ XX\ + SUB x9, sp, 320 XX\ + SUB x9, x9, x11, LSL #5 XX\ + /* Now load v8-v15 */ XX\ + /* This is 8x4 block (nrxmr) */ XX\ + /* We will transpose this to 4x8 (mrxnr) */ XX\ + /* v8, v9 : x00, x10, x20, x30; x40, x50, x60, x70 */ XX\ + /* v10, v11 : x01, x11, x21, x31; x41, x51, x61, x71 */ XX\ + /* v12, v13 : x02, x12, x22, x32; x42, x52, x62, x72 */ XX\ + /* v14, v15 : x03, x13, x23, x33; x43, x53, x63, x73 */ XX\ + /* */ XX\ + /* v16, v17 : x04, x14, x24, x34; x44, x54, x64, x74 */ XX\ + /* v18, v19 : x05, x15, x25, x35; x45, x55, x65, x75 */ XX\ + /* v20, v21 : x06, x16, x26, x36; x46, x56, x66, x76 */ XX\ + /* v22, v23 : x07, x17, x27, x37; x47, x57, x67, x77 */ XX\ + LD1 {v8.4s}, [x9], 16 XX\ + LD1 {v9.4s}, [x9], 16 XX\ + LD1 {v10.4s}, [x9], 16 XX\ + LD1 {v11.4s}, [x9], 16 XX\ + LD1 {v12.4s}, [x9], 16 XX\ + LD1 {v13.4s}, [x9], 16 XX\ + LD1 {v14.4s}, [x9], 16 XX\ + LD1 {v15.4s}, [x9], 16 XX\ + LD1 {v16.4s}, [x9], 16 XX\ + LD1 {v17.4s}, [x9], 16 XX\ + LD1 {v18.4s}, [x9], 16 XX\ + LD1 {v19.4s}, [x9], 16 XX\ + LD1 {v20.4s}, [x9], 16 XX\ + LD1 {v21.4s}, [x9], 16 XX\ + LD1 {v22.4s}, [x9], 16 XX\ + LD1 {v23.4s}, [x9] XX\ + XX\ + /* We can tranpose one 4x4 block using macro */ XX\ + /* TRANSPOSE_4X4_S32 v8, v10, v12, v14, v0, v1, v2, v3 */ XX\ + /* After this we have */ XX\ + /* v8 : x00, x01, x02, x03 */ XX\ + /* v10 : x10, x11, x12, x13 */ XX\ + /* v12 : x20, x21, x22, x23 */ XX\ + /* v14 : x30, x31, x32, x33 */ XX\ + /* Then using */ XX\ + /* TRANSPOSE_4X4_S32 v16, v18, v20, v22, v4, v5, v6, v7 */ XX\ + /* We get */ XX\ + /* v16 : x04, x05, x06, x07 */ XX\ + /* v18 : x14, x15, x16, x17 */ XX\ + /* v20 : x24, x25, x26, x27 */ XX\ + /* v22 : x34, x35, x36, x37 */ XX\ + /* Similarly we can transpose other two 4x4 blocks and we get */ XX\ + /* tranposed 8x8 */ XX\ + XX\ + TRANSPOSE_4X4_S32 v8, v10, v12, v14, v0, v1, v2, v3 XX\ + TRANSPOSE_4X4_S32 v16, v18, v20, v22, v4, v5, v6, v7 XX\ + TRANSPOSE_4X4_S32 v9, v11, v13, v15, v0, v1, v2, v3 XX\ + TRANSPOSE_4X4_S32 v17, v19, v21, v23, v4, v5, v6, v7 XX\ + XX\ + /* row 0: v8, v16 */ XX\ + /* row 1: v10, v18 */ XX\ + /* row 2: v12, v20 */ XX\ + /* row 3: v14, v22 */ XX\ + /* row 4: v9, v17 */ XX\ + /* row 5: v11, v19 */ XX\ + /* row 6: v13, v21 */ XX\ + /* row 7: v15, v23 */ 
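The four TRANSPOSE_4X4_S32 invocations turn the per-channel accumulator layout reloaded from the stack into row-major output rows, two 4x4 tiles at a time. A plain C sketch of the 4x4 int32 transpose the macro implements with TRN and ZIP instructions:

#include <stdint.h>

/* Scalar equivalent of TRANSPOSE_4X4_S32: out[i][j] = in[j][i].
 * Applying it to both 4x4 tiles of a register pair yields one transposed
 * half of the 8x8 result. */
static void transpose_4x4_s32(const int32_t in[4][4], int32_t out[4][4]) {
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      out[i][j] = in[j][i];
    }
  }
}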
XX\ + XX\ + /* Load c_stride & params */ XX\ + LDR x16, [sp] XX\ + LSL x16, x16, 2 XX\ + LD1 {v24.4s}, [x6], 16 XX\ + LD1 {v25.4s}, [x6] XX\ + XX\ + SCVTF v8.4s, v8.4s XX\ + SCVTF v9.4s, v9.4s XX\ + SCVTF v10.4s, v10.4s XX\ + SCVTF v11.4s, v11.4s XX\ + SCVTF v12.4s, v12.4s XX\ + SCVTF v13.4s, v13.4s XX\ + SCVTF v14.4s, v14.4s XX\ + SCVTF v15.4s, v15.4s XX\ + SCVTF v16.4s, v16.4s XX\ + SCVTF v17.4s, v17.4s XX\ + SCVTF v18.4s, v18.4s XX\ + SCVTF v19.4s, v19.4s XX\ + SCVTF v20.4s, v20.4s XX\ + SCVTF v21.4s, v21.4s XX\ + SCVTF v22.4s, v22.4s XX\ + SCVTF v23.4s, v23.4s XX\ + XX\ + FMUL v8.4s, v8.4s, v26.4s XX\ + FMUL v16.4s, v16.4s, v30.4s XX\ + FMUL v10.4s, v10.4s, v26.4s XX\ + FMUL v18.4s, v18.4s, v30.4s XX\ + FMUL v12.4s, v12.4s, v26.4s XX\ + FMUL v20.4s, v20.4s, v30.4s XX\ + FMUL v14.4s, v14.4s, v26.4s XX\ + FMUL v22.4s, v22.4s, v30.4s XX\ + FMUL v9.4s, v9.4s, v26.4s XX\ + FMUL v17.4s, v17.4s, v30.4s XX\ + FMUL v11.4s, v11.4s, v26.4s XX\ + FMUL v19.4s, v19.4s, v30.4s XX\ + FMUL v13.4s, v13.4s, v26.4s XX\ + FMUL v21.4s, v21.4s, v30.4s XX\ + FMUL v15.4s, v15.4s, v26.4s XX\ + FMUL v23.4s, v23.4s, v30.4s XX\ + XX\ + FADD v8.4s, v8.4s, v24.4s XX\ + FADD v16.4s, v16.4s, v25.4s XX\ + FADD v10.4s, v10.4s, v24.4s XX\ + FADD v18.4s, v18.4s, v25.4s XX\ + FADD v12.4s, v12.4s, v24.4s XX\ + FADD v20.4s, v20.4s, v25.4s XX\ + FADD v14.4s, v14.4s, v24.4s XX\ + FADD v22.4s, v22.4s, v25.4s XX\ + FADD v9.4s, v9.4s, v24.4s XX\ + FADD v17.4s, v17.4s, v25.4s XX\ + FADD v11.4s, v11.4s, v24.4s XX\ + FADD v19.4s, v19.4s, v25.4s XX\ + FADD v13.4s, v13.4s, v24.4s XX\ + FADD v21.4s, v21.4s, v25.4s XX\ + FADD v15.4s, v15.4s, v24.4s XX\ + FADD v23.4s, v23.4s, v25.4s XX\ + XX\ + /* Compute c0-c7 */ XX\ + XX\ + ADD x9, x7, x16 XX\ + CMP x0, 2 XX\ + CSEL x9, x7, x9, LO XX\ + XX\ + ADD x10, x9, x16 XX\ + CSEL x10, x9, x10, LS XX\ + XX\ + ADD x8, x10, x16 XX\ + CMP x0, 4 XX\ + CSEL x8, x10, x8, LO XX\ + XX\ + ADD x12, x8, x16 XX\ + CSEL x12, x8, x12, LS XX\ + XX\ + ADD x13, x12, x16 XX\ + CMP x0, 6 XX\ + CSEL x13, x12, x13, LO XX\ + XX\ + ADD x14, x13, x16 XX\ + CSEL x14, x13, x14, LS XX\ + XX\ + ADD x15, x14, x16 XX\ + CMP x0, 8 XX\ + CSEL x15, x14, x15, NE XX\ + XX\ + CMP x11, 8 XX\ + B.NE _4_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.4s}, [x7], 16 XX\ + ST1 {v16.4s}, [x7] XX\ + ST1 {v10.4s}, [x9], 16 XX\ + ST1 {v18.4s}, [x9] XX\ + ST1 {v12.4s}, [x10], 16 XX\ + ST1 {v20.4s}, [x10] XX\ + ST1 {v14.4s}, [x8], 16 XX\ + ST1 {v22.4s}, [x8] XX\ + ST1 {v9.4s}, [x12], 16 XX\ + ST1 {v17.4s}, [x12] XX\ + ST1 {v11.4s}, [x13], 16 XX\ + ST1 {v19.4s}, [x13] XX\ + ST1 {v13.4s}, [x14], 16 XX\ + ST1 {v21.4s}, [x14] XX\ + ST1 {v15.4s}, [x15], 16 XX\ + ST1 {v23.4s}, [x15] XX\ + XX\ + LDP d9, d8, [sp, -64] XX\ + LDP d11, d10, [sp, -48] XX\ + LDP d13, d12, [sp, -32] XX\ + LDP d15, d14, [sp, -16] XX\ + XX\ + RET XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 XX\ + _4_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 4 XX\ + B.LO _5_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.4s}, [x7], 16 XX\ + ST1 {v10.4s}, [x9], 16 XX\ + ST1 {v12.4s}, [x10], 16 XX\ + ST1 {v14.4s}, [x8], 16 XX\ + ST1 {v9.4s}, [x12], 16 XX\ + ST1 {v11.4s}, [x13], 16 XX\ + ST1 {v13.4s}, [x14], 16 XX\ + ST1 {v15.4s}, [x15], 16 XX\ + XX\ + SUB x11, x11, 4 XX\ + XX\ + MOV v8.16b, v16.16b XX\ + MOV v10.16b, v18.16b XX\ + MOV v12.16b, v20.16b XX\ + MOV v14.16b, v22.16b XX\ + MOV v9.16b, v17.16b XX\ + MOV v11.16b, v19.16b XX\ + MOV v13.16b, v21.16b XX\ + MOV v15.16b, v23.16b XX\ + XX\ + _5_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 2 XX\ + B.LO _6_w##W_INDEX_DTYPE_NUM_BITS XX\ 
+ XX\ + ST1 {v8.2s}, [x7], 8 XX\ + ST1 {v10.2s}, [x9], 8 XX\ + ST1 {v12.2s}, [x10], 8 XX\ + ST1 {v14.2s}, [x8], 8 XX\ + ST1 {v9.2s}, [x12], 8 XX\ + ST1 {v11.2s}, [x13], 8 XX\ + ST1 {v13.2s}, [x14], 8 XX\ + ST1 {v15.2s}, [x15], 8 XX\ + XX\ + SUB x11, x11, 2 XX\ + XX\ + EXT v8.16b, v8.16b, v8.16b, 8 XX\ + EXT v10.16b, v10.16b, v10.16b, 8 XX\ + EXT v12.16b, v12.16b, v12.16b, 8 XX\ + EXT v14.16b, v14.16b, v14.16b, 8 XX\ + EXT v9.16b, v9.16b, v9.16b, 8 XX\ + EXT v11.16b, v11.16b, v11.16b, 8 XX\ + EXT v13.16b, v13.16b, v13.16b, 8 XX\ + EXT v15.16b, v15.16b, v15.16b, 8 XX\ + XX\ + _6_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 1 XX\ + B.LO _7_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.s}[0], [x7] XX\ + ST1 {v10.s}[0], [x9] XX\ + ST1 {v12.s}[0], [x10] XX\ + ST1 {v14.s}[0], [x8] XX\ + ST1 {v9.s}[0], [x12] XX\ + ST1 {v11.s}[0], [x13] XX\ + ST1 {v13.s}[0], [x14] XX\ + ST1 {v15.s}[0], [x15] XX\ + XX\ + _7_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + LDP d9, d8, [sp, -64] XX\ + LDP d11, d10, [sp, -48] XX\ + LDP d13, d12, [sp, -32] XX\ + LDP d15, d14, [sp, -16] XX\ + XX\ + RET XX\ + XX\ + END_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon + +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w32__aarch64_neon( # size_t mr, # size_t nr, # const uint8_t* a_packed, @@ -42,451 +513,42 @@ # size_t c_stride, # size_t output_channel_index, # const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) -BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon - - STP d15, d14, [sp, -16] - STP d13, d12, [sp, -32] - STP d11, d10, [sp, -48] - STP d9, d8, [sp, -64] - - MOV x11, x1 - # Load output channel index - LDR x10, [sp, 8] - # Load params - LDR x8, [sp, 16] - - # Load a_zero_point - LD1R {v24.8b}, [x8] - ADD x8, x8, 8 - - # Load pointer to per channel zero points array - LDR x17, [x8], 8 - - # Load pointer to per channel multiplier - LDR x13, [x8] - - # Add offset to the base pointer - ADD x17, x17, x10 - # Mul by 4 to get byte offset for multiplier - LSL x10, x10, 2 - # Add offset to the base pointer for multiplier - ADD x13, x13, x10 - - # Load b_zero_point - LD1 {v25.8b}, [x17] - # Load multiplier c0123 - LD1 {v26.4s}, [x13], 16 - # Load multiplier c4567 - LD1 {v30.4s}, [x13] - - EOR x12, x12, x12 - EOR x13, x13, x13 - - CMP x1, 1 - B.LO 7f - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - .p2align 5 -#endif -0: - # v8 := zero - EOR v8.16b, v8.16b, v8.16b - # v9 := zero - EOR v9.16b, v9.16b, v9.16b - - DUP v29.8b, v25.b[0] - # w12 = w_row_ptr[n], x13 = w_row_ptr[n+1] - # x4 = x4 + 4 to point to next n - LDR w12, [x4], #4 - LDR w13, [x4] - # x10 = temp_packed_w = packed_w + w_row_ptr[n] * 4 - # This points to the first block of nonzero value - # for the nth row. 
- ADD x10, x3, x12, LSL #2 - # x9 = temp_w_block_ids_ptr = w_block_ids_ptr (x5) + w_row_ptr[n] - # LSL2 because each element is 4 bytes - # This points to the block id of the first block - # It should contain x13 - x12 number of block ids - ADD x9, x5, x12, LSL #2 - # x8 = num_blocks that needs to be processed - SUB x8, x13, x12 - SUBS x8, x8, 2 - B.LO 1f - -k_loop: - // b0-7 (channel 0) - LD1 {v10.8b}, [x10], 8 - USUBL v10.8h, v10.8b, v29.8b - - #x12 = block_id_ptr[0] - #x13 = block_id_ptr[1] - LDR w12, [x9], #4 - LDR w13, [x9], #4 - # Add offset to x2 - # Shift by 5 because each packed block is a block of 8x4 - # which 32 bytes - ADD x16, x2, x12, LSL #5 - ADD x17, x2, x13, LSL #5 - - LD1 {v0.8b}, [x16], 8 - LD1 {v1.8b}, [x16], 8 - LD1 {v2.8b}, [x16], 8 - LD1 {v3.8b}, [x16] - LD1 {v4.8b}, [x17], 8 - LD1 {v5.8b}, [x17], 8 - LD1 {v6.8b}, [x17], 8 - LD1 {v7.8b}, [x17] - - USUBL v0.8h, v0.8b, v24.8b - USUBL v1.8h, v1.8b, v24.8b - USUBL v2.8h, v2.8b, v24.8b - USUBL v3.8h, v3.8b, v24.8b - USUBL v4.8h, v4.8b, v24.8b - USUBL v5.8h, v5.8b, v24.8b - USUBL v6.8h, v6.8b, v24.8b - USUBL v7.8h, v7.8b, v24.8b - - SMLAL v8.4s, v0.4h, v10.h[0] - SMLAL2 v9.4s, v0.8h, v10.h[0] - SMLAL v8.4s, v1.4h, v10.h[1] - SMLAL2 v9.4s, v1.8h, v10.h[1] - SMLAL v8.4s, v2.4h, v10.h[2] - SMLAL2 v9.4s, v2.8h, v10.h[2] - SMLAL v8.4s, v3.4h, v10.h[3] - SMLAL2 v9.4s, v3.8h, v10.h[3] - SMLAL v8.4s, v4.4h, v10.h[4] - SMLAL2 v9.4s, v4.8h, v10.h[4] - SMLAL v8.4s, v5.4h, v10.h[5] - SMLAL2 v9.4s, v5.8h, v10.h[5] - SMLAL v8.4s, v6.4h, v10.h[6] - SMLAL2 v9.4s, v6.8h, v10.h[6] - SUBS x8, x8, 2 - SMLAL v8.4s, v7.4h, v10.h[7] - SMLAL2 v9.4s, v7.8h, v10.h[7] - - - B.HS k_loop - -1: - CMP x8, -2 - B.EQ 2f - - // b0-7 (channel 0) - LD1R {v10.4s}, [x10] - USUBL v10.8h, v10.8b, v29.8b - - #x12 = block_id_ptr[0] - LDR w12, [x9] - # Add offset to x2 - # Shift by 5 because each packed block is a block of 8x4 - # which 32 bytes - ADD x16, x2, x12, LSL #5 - - LD1 {v0.8b}, [x16], 8 - LD1 {v1.8b}, [x16], 8 - LD1 {v2.8b}, [x16], 8 - LD1 {v3.8b}, [x16] - - USUBL v0.8h, v0.8b, v24.8b - USUBL v1.8h, v1.8b, v24.8b - USUBL v2.8h, v2.8b, v24.8b - USUBL v3.8h, v3.8b, v24.8b - - SMLAL v8.4s, v0.4h, v10.h[0] - SMLAL2 v9.4s, v0.8h, v10.h[0] - SMLAL v8.4s, v1.4h, v10.h[1] - SMLAL2 v9.4s, v1.8h, v10.h[1] - SMLAL v8.4s, v2.4h, v10.h[2] - SMLAL2 v9.4s, v2.8h, v10.h[2] - SMLAL v8.4s, v3.4h, v10.h[3] - SMLAL2 v9.4s, v3.8h, v10.h[3] - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - .p2align 4 -#endif -2: - # Store result on stack - - # -64 because all d8-d15 are on stack - # + 256 bytes of buffer when nr = 1 - # 256 because we are doing 8x8 block with each value being 4 bytes - # Thus 64 * 4 = 256 - # 256 + 64 = 320 - # This is needed because after processing all nrs we will - # load 256 bytes from stack. - # Thus we will load accumulators back in v8, v9, v10, v11, v12, v13, v14, v15 - # v16, v17, v18, v19, v20, v21, v22, v23 - # When nr < 8, say nr = 1, extra v values will be fetched from stack which may overlap - # with other parts of stack storing local variables. To avoid that we just - # create a buffer of 256 bytes inbetween to make sure pointer increment - # never produces address that is beyond the stack frame of this function. - SUB x9, sp, 320 - # Each iteration produce 8 values each of 4 bytes - # Thus 8 x 4 = 32 bytes 2^5 - # In this implementation, first value will be stored at - # 1st value: sp - 64 - r1 * 32 - # 2nd value: sp - 12 - (r1 - 1) * 32 - # and so on. 
- SUB x9, x9, x1, LSL #5 - ST1 {v8.4s}, [x9], 16 - ST1 {v9.4s}, [x9] - - # Shift zero point vector by 8 to load - # zero point of the next channel - SRI v25.2d, v25.2d, #8 - # Check if nr >=1 - SUBS x1, x1, 1 - BHI 0b -3: - # First load all the accumulators from stack - # Load nr - SUB x9, sp, 320 - SUB x9, x9, x11, LSL #5 - # Now load v8-v15 - # This is 8x4 block (nrxmr) - # We will transpose this to 4x8 (mrxnr) - # v8, v9 : x00, x10, x20, x30; x40, x50, x60, x70 - # v10, v11 : x01, x11, x21, x31; x41, x51, x61, x71 - # v12, v13 : x02, x12, x22, x32; x42, x52, x62, x72 - # v14, v15 : x03, x13, x23, x33; x43, x53, x63, x73 - # - # v16, v17 : x04, x14, x24, x34; x44, x54, x64, x74 - # v18, v19 : x05, x15, x25, x35; x45, x55, x65, x75 - # v20, v21 : x06, x16, x26, x36; x46, x56, x66, x76 - # v22, v23 : x07, x17, x27, x37; x47, x57, x67, x77 - LD1 {v8.4s}, [x9], 16 - LD1 {v9.4s}, [x9], 16 - LD1 {v10.4s}, [x9], 16 - LD1 {v11.4s}, [x9], 16 - LD1 {v12.4s}, [x9], 16 - LD1 {v13.4s}, [x9], 16 - LD1 {v14.4s}, [x9], 16 - LD1 {v15.4s}, [x9], 16 - LD1 {v16.4s}, [x9], 16 - LD1 {v17.4s}, [x9], 16 - LD1 {v18.4s}, [x9], 16 - LD1 {v19.4s}, [x9], 16 - LD1 {v20.4s}, [x9], 16 - LD1 {v21.4s}, [x9], 16 - LD1 {v22.4s}, [x9], 16 - LD1 {v23.4s}, [x9] - - # We can tranpose one 4x4 block using macro - # TRANSPOSE_4X4_S32 v8, v10, v12, v14, v0, v1, v2, v3 - # After this we have - # v8 : x00, x01, x02, x03 - # v10 : x10, x11, x12, x13 - # v12 : x20, x21, x22, x23 - # v14 : x30, x31, x32, x33 - # Then using - # TRANSPOSE_4X4_S32 v16, v18, v20, v22, v4, v5, v6, v7 - # We get - # v16 : x04, x05, x06, x07 - # v18 : x14, x15, x16, x17 - # v20 : x24, x25, x26, x27 - # v22 : x34, x35, x36, x37 - # Similarly we can transpose other two 4x4 blocks and we get - # tranposed 8x8 - - TRANSPOSE_4X4_S32 v8, v10, v12, v14, v0, v1, v2, v3 - TRANSPOSE_4X4_S32 v16, v18, v20, v22, v4, v5, v6, v7 - TRANSPOSE_4X4_S32 v9, v11, v13, v15, v0, v1, v2, v3 - TRANSPOSE_4X4_S32 v17, v19, v21, v23, v4, v5, v6, v7 - - # row 0: v8, v16 - # row 1: v10, v18 - # row 2: v12, v20 - # row 3: v14, v22 - # row 4: v9, v17 - # row 5: v11, v19 - # row 6: v13, v21 - # row 7: v15, v23 +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_8X8_PACKEDA__AARCH64_NEON(32, #4, #2, LDR) - # Load c_stride & params - LDR x16, [sp] - LSL x16, x16, 2 - LD1 {v24.4s}, [x6], 16 - LD1 {v25.4s}, [x6] - - SCVTF v8.4s, v8.4s - SCVTF v9.4s, v9.4s - SCVTF v10.4s, v10.4s - SCVTF v11.4s, v11.4s - SCVTF v12.4s, v12.4s - SCVTF v13.4s, v13.4s - SCVTF v14.4s, v14.4s - SCVTF v15.4s, v15.4s - SCVTF v16.4s, v16.4s - SCVTF v17.4s, v17.4s - SCVTF v18.4s, v18.4s - SCVTF v19.4s, v19.4s - SCVTF v20.4s, v20.4s - SCVTF v21.4s, v21.4s - SCVTF v22.4s, v22.4s - SCVTF v23.4s, v23.4s - - FMUL v8.4s, v8.4s, v26.4s - FMUL v16.4s, v16.4s, v30.4s - FMUL v10.4s, v10.4s, v26.4s - FMUL v18.4s, v18.4s, v30.4s - FMUL v12.4s, v12.4s, v26.4s - FMUL v20.4s, v20.4s, v30.4s - FMUL v14.4s, v14.4s, v26.4s - FMUL v22.4s, v22.4s, v30.4s - FMUL v9.4s, v9.4s, v26.4s - FMUL v17.4s, v17.4s, v30.4s - FMUL v11.4s, v11.4s, v26.4s - FMUL v19.4s, v19.4s, v30.4s - FMUL v13.4s, v13.4s, v26.4s - FMUL v21.4s, v21.4s, v30.4s - FMUL v15.4s, v15.4s, v26.4s - FMUL v23.4s, v23.4s, v30.4s - - FADD v8.4s, v8.4s, v24.4s - FADD v16.4s, v16.4s, v25.4s - FADD v10.4s, v10.4s, v24.4s - FADD v18.4s, v18.4s, v25.4s - FADD v12.4s, v12.4s, v24.4s - FADD v20.4s, v20.4s, v25.4s - FADD v14.4s, v14.4s, v24.4s - FADD v22.4s, v22.4s, v25.4s - FADD v9.4s, v9.4s, v24.4s - FADD v17.4s, v17.4s, v25.4s - FADD v11.4s, v11.4s, v24.4s - FADD v19.4s, v19.4s, v25.4s - FADD 
v13.4s, v13.4s, v24.4s - FADD v21.4s, v21.4s, v25.4s - FADD v15.4s, v15.4s, v24.4s - FADD v23.4s, v23.4s, v25.4s - - // Compute c0-c7 - - ADD x9, x7, x16 - CMP x0, 2 - CSEL x9, x7, x9, LO - - ADD x10, x9, x16 - CSEL x10, x9, x10, LS - - ADD x8, x10, x16 - CMP x0, 4 - CSEL x8, x10, x8, LO - - ADD x12, x8, x16 - CSEL x12, x8, x12, LS - - ADD x13, x12, x16 - CMP x0, 6 - CSEL x13, x12, x13, LO - - ADD x14, x13, x16 - CSEL x14, x13, x14, LS - - ADD x15, x14, x16 - CMP x0, 8 - CSEL x15, x14, x15, NE - - CMP x11, 8 - B.NE 4f - - ST1 {v8.4s}, [x7], 16 - ST1 {v16.4s}, [x7] - ST1 {v10.4s}, [x9], 16 - ST1 {v18.4s}, [x9] - ST1 {v12.4s}, [x10], 16 - ST1 {v20.4s}, [x10] - ST1 {v14.4s}, [x8], 16 - ST1 {v22.4s}, [x8] - ST1 {v9.4s}, [x12], 16 - ST1 {v17.4s}, [x12] - ST1 {v11.4s}, [x13], 16 - ST1 {v19.4s}, [x13] - ST1 {v13.4s}, [x14], 16 - ST1 {v21.4s}, [x14] - ST1 {v15.4s}, [x15], 16 - ST1 {v23.4s}, [x15] - - LDP d9, d8, [sp, -64] - LDP d11, d10, [sp, -48] - LDP d13, d12, [sp, -32] - LDP d15, d14, [sp, -16] - - RET - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - .p2align 3 -#endif -4: - CMP x11, 4 - B.LO 5f - - ST1 {v8.4s}, [x7], 16 - ST1 {v10.4s}, [x9], 16 - ST1 {v12.4s}, [x10], 16 - ST1 {v14.4s}, [x8], 16 - ST1 {v9.4s}, [x12], 16 - ST1 {v11.4s}, [x13], 16 - ST1 {v13.4s}, [x14], 16 - ST1 {v15.4s}, [x15], 16 - - SUB x11, x11, 4 - - MOV v8.16b, v16.16b - MOV v10.16b, v18.16b - MOV v12.16b, v20.16b - MOV v14.16b, v22.16b - MOV v9.16b, v17.16b - MOV v11.16b, v19.16b - MOV v13.16b, v21.16b - MOV v15.16b, v23.16b - -5: - CMP x11, 2 - B.LO 6f - - ST1 {v8.2s}, [x7], 8 - ST1 {v10.2s}, [x9], 8 - ST1 {v12.2s}, [x10], 8 - ST1 {v14.2s}, [x8], 8 - ST1 {v9.2s}, [x12], 8 - ST1 {v11.2s}, [x13], 8 - ST1 {v13.2s}, [x14], 8 - ST1 {v15.2s}, [x15], 8 - - SUB x11, x11, 2 - - EXT v8.16b, v8.16b, v8.16b, 8 - EXT v10.16b, v10.16b, v10.16b, 8 - EXT v12.16b, v12.16b, v12.16b, 8 - EXT v14.16b, v14.16b, v14.16b, 8 - EXT v9.16b, v9.16b, v9.16b, 8 - EXT v11.16b, v11.16b, v11.16b, 8 - EXT v13.16b, v13.16b, v13.16b, 8 - EXT v15.16b, v15.16b, v15.16b, 8 - -6: - CMP x11, 1 - B.LO 7f - - ST1 {v8.s}[0], [x7] - ST1 {v10.s}[0], [x9] - ST1 {v12.s}[0], [x10] - ST1 {v14.s}[0], [x8] - ST1 {v9.s}[0], [x12] - ST1 {v11.s}[0], [x13] - ST1 {v13.s}[0], [x14] - ST1 {v15.s}[0], [x15] - -7: - LDP d9, d8, [sp, -64] - LDP d11, d10, [sp, -48] - LDP d13, d12, [sp, -32] - LDP d15, d14, [sp, -16] - - RET +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w16__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint16_t* w_row_ptr, +# const uint16_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_8X8_PACKEDA__AARCH64_NEON(16, #2, #1, LDRH) -END_FUNCTION pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon +# void pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w8__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint8_t* w_row_ptr, +# const uint8_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_8X8_PACKEDA__AARCH64_NEON(8, #1, #0, LDRB) #ifdef __ELF__ .section ".note.GNU-stack","",%progbits #endif + +#undef 
NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 +#undef NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 +#undef NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 +#undef MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_1X4_UKERNEL_8X8_PACKEDA__AARCH64_NEON +#undef XX diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S index 5bb470b2521b..2ba033c57c83 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm_sparse/8x8c8x1-dq-packedA-aarch64-neon.S @@ -8,6 +8,24 @@ #include +#ifndef IGNORE_CODE_ALIGN_DIRECTIVES +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 .p2align 5 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 .p2align 4 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 .p2align 3 +#else +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 +#define NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 +#endif + +# Macro for separating instructions. For most builds, ; can be used, but for +# ARM64 + Mach, ; begins a comment, and %% is used to separate instructions +#if defined(__MACH__) +#define XX %% +#else +#define XX ; +#endif + # params # c_stride @@ -19,7 +37,389 @@ # |params | 16 # |-----------| -# void pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon( +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_row_ptr, +# const uint##W_INDEX_DTYPE_NUM_BITS##_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +#define MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_8X8_PACKEDA__AARCH64_NEON(W_INDEX_DTYPE_NUM_BITS, W_INDEX_DTYPE_NUM_BYTES_ARG, W_INDEX_DTYPE_LOG_NUM_BYTES_ARG, LOAD_INDEX_INSTRUCTION) XX\ + BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon XX\ + XX\ + STP d15, d14, [sp, -16] XX\ + STP d13, d12, [sp, -32] XX\ + STP d11, d10, [sp, -48] XX\ + STP d9, d8, [sp, -64] XX\ + XX\ + MOV x11, x1 XX\ + /* Load output channel index */ XX\ + LDR x10, [sp, 8] XX\ + /* Load params */ XX\ + LDR x8, [sp, 16] XX\ + XX\ + /* Load a_zero_point */ XX\ + LD1R {v24.8b}, [x8] XX\ + ADD x8, x8, 8 XX\ + XX\ + /* Load pointer to per channel zero points array */ XX\ + LDR x17, [x8], 8 XX\ + XX\ + /* Load pointer to per channel multiplier */ XX\ + LDR x13, [x8] XX\ + XX\ + /* Add offset to the base pointer */ XX\ + ADD x17, x17, x10 XX\ + /* Mul by 4 to get byte offset for multiplier */ XX\ + LSL x10, x10, 2 XX\ + /* Add offset to the base pointer for multiplier */ XX\ + ADD x13, x13, x10 XX\ + XX\ + /* Load b_zero_point */ XX\ + LD1 {v25.8b}, [x17] XX\ + /* Load multiplier c0123 */ XX\ + LD1 {v26.4s}, [x13], 16 XX\ + /* Load multiplier c4567 */ XX\ + LD1 {v30.4s}, [x13] XX\ + XX\ + EOR x12, x12, x12 XX\ + EOR x13, x13, x13 XX\ + XX\ + EOR v8.16b, v8.16b, v8.16b XX\ + EOR v9.16b, v9.16b, v9.16b XX\ + EOR v10.16b, v10.16b, v10.16b XX\ + EOR v11.16b, v11.16b, v11.16b XX\ + EOR v12.16b, v12.16b, v12.16b XX\ + EOR v13.16b, v13.16b, v13.16b XX\ + EOR v14.16b, v14.16b, v14.16b XX\ + EOR v15.16b, v15.16b, v15.16b XX\ + EOR v16.16b, 
v16.16b, v16.16b XX\ + EOR v17.16b, v17.16b, v17.16b XX\ + EOR v18.16b, v18.16b, v18.16b XX\ + EOR v19.16b, v19.16b, v19.16b XX\ + EOR v20.16b, v20.16b, v20.16b XX\ + EOR v21.16b, v21.16b, v21.16b XX\ + EOR v22.16b, v22.16b, v22.16b XX\ + EOR v23.16b, v23.16b, v23.16b XX\ + XX\ + /* w12 = w_row_ptr[n], x13 = w_row_ptr[n+1] */ XX\ + /* x4 = x4 + W_INDEX_DTYPE_NUM_BYTES_ARG to point to next n */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x4], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + LOAD_INDEX_INSTRUCTION w13, [x4] XX\ + /* x10 = temp_packed_w = packed_w + w_row_ptr[n] * 8 */ XX\ + /* This points to the first block of nonzero value */ XX\ + /* for the nth row. */ XX\ + ADD x10, x3, x12, LSL #3 XX\ + /* x9 = temp_w_block_ids_ptr = w_block_ids_ptr (x5) + w_row_ptr[n] */ XX\ + /* LSL for when elements are >1 byte */ XX\ + /* (4 bytes: LSL #2, 2 bytes: LSL #1, 1 byte: LSL #0) */ XX\ + /* This points to the block id of the first block */ XX\ + /* It should contain x13 - x12 number of block ids */ XX\ + ADD x9, x5, x12, LSL W_INDEX_DTYPE_LOG_NUM_BYTES_ARG XX\ + /* x8 = num_blocks that needs to be processed */ XX\ + SUB x8, x13, x12 XX\ + SUBS x8, x8, 2 XX\ + B.LO _1_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 XX\ + k_loop_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* k_loop processes two k values */ XX\ + /* Load two 8x1 blocks */ XX\ + LD1 {v0.8b}, [x10], 8 XX\ + LD1 {v1.8b}, [x10], 8 XX\ + USUBL v0.8h, v0.8b, v25.8b XX\ + USUBL v1.8h, v1.8b, v25.8b XX\ + XX\ + /* x12 = block_id_ptr[0] */ XX\ + /* x13 = block_id_ptr[1] */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x9], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + LOAD_INDEX_INSTRUCTION w13, [x9], W_INDEX_DTYPE_NUM_BYTES_ARG XX\ + /* Add offset to x2 */ XX\ + /* Shift by 3 because each packed block is a block of 8x1 */ XX\ + /* which 8 bytes */ XX\ + ADD x16, x2, x12, LSL #3 XX\ + ADD x17, x2, x13, LSL #3 XX\ + XX\ + /* Load two 8x1 blocks of activation */ XX\ + /* First 8x1 for first channel */ XX\ + /* second 8x1 for next channel */ XX\ + LD1 {v2.8b}, [x16] XX\ + LD1 {v3.8b}, [x17] XX\ + XX\ + USUBL v2.8h, v2.8b, v24.8b XX\ + USUBL v3.8h, v3.8b, v24.8b XX\ + XX\ + /* First channel */ XX\ + SMLAL v8.4s, v0.4h, v2.h[0] XX\ + SMLAL2 v9.4s, v0.8h, v2.h[0] XX\ + SMLAL v10.4s, v0.4h, v2.h[1] XX\ + SMLAL2 v11.4s, v0.8h, v2.h[1] XX\ + SMLAL v12.4s, v0.4h, v2.h[2] XX\ + SMLAL2 v13.4s, v0.8h, v2.h[2] XX\ + SMLAL v14.4s, v0.4h, v2.h[3] XX\ + SMLAL2 v15.4s, v0.8h, v2.h[3] XX\ + SMLAL v16.4s, v0.4h, v2.h[4] XX\ + SMLAL2 v17.4s, v0.8h, v2.h[4] XX\ + SMLAL v18.4s, v0.4h, v2.h[5] XX\ + SMLAL2 v19.4s, v0.8h, v2.h[5] XX\ + SMLAL v20.4s, v0.4h, v2.h[6] XX\ + SMLAL2 v21.4s, v0.8h, v2.h[6] XX\ + SMLAL v22.4s, v0.4h, v2.h[7] XX\ + SMLAL2 v23.4s, v0.8h, v2.h[7] XX\ + XX\ + SUBS x8, x8, 2 XX\ + /* Second channel */ XX\ + SMLAL v8.4s, v1.4h, v3.h[0] XX\ + SMLAL2 v9.4s, v1.8h, v3.h[0] XX\ + SMLAL v10.4s, v1.4h, v3.h[1] XX\ + SMLAL2 v11.4s, v1.8h, v3.h[1] XX\ + SMLAL v12.4s, v1.4h, v3.h[2] XX\ + SMLAL2 v13.4s, v1.8h, v3.h[2] XX\ + SMLAL v14.4s, v1.4h, v3.h[3] XX\ + SMLAL2 v15.4s, v1.8h, v3.h[3] XX\ + SMLAL v16.4s, v1.4h, v3.h[4] XX\ + SMLAL2 v17.4s, v1.8h, v3.h[4] XX\ + SMLAL v18.4s, v1.4h, v3.h[5] XX\ + SMLAL2 v19.4s, v1.8h, v3.h[5] XX\ + SMLAL v20.4s, v1.4h, v3.h[6] XX\ + SMLAL2 v21.4s, v1.8h, v3.h[6] XX\ + SMLAL v22.4s, v1.4h, v3.h[7] XX\ + SMLAL2 v23.4s, v1.8h, v3.h[7] XX\ + XX\ + B.HS k_loop_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + _1_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x8, -2 XX\ + B.EQ _3_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + LD1 {v0.8b}, [x10] XX\ + USUBL v0.8h, 
v0.8b, v25.8b XX\ + XX\ + /* x12 = block_id_ptr[0] */ XX\ + LOAD_INDEX_INSTRUCTION w12, [x9] XX\ + /* Add offset to x2 */ XX\ + ADD x16, x2, x12, LSL #3 XX\ + XX\ + LD1 {v2.8b}, [x16] XX\ + USUBL v2.8h, v2.8b, v24.8b XX\ + XX\ + SMLAL v8.4s, v0.4h, v2.h[0] XX\ + SMLAL2 v9.4s, v0.8h, v2.h[0] XX\ + SMLAL v10.4s, v0.4h, v2.h[1] XX\ + SMLAL2 v11.4s, v0.8h, v2.h[1] XX\ + SMLAL v12.4s, v0.4h, v2.h[2] XX\ + SMLAL2 v13.4s, v0.8h, v2.h[2] XX\ + SMLAL v14.4s, v0.4h, v2.h[3] XX\ + SMLAL2 v15.4s, v0.8h, v2.h[3] XX\ + SMLAL v16.4s, v0.4h, v2.h[4] XX\ + SMLAL2 v17.4s, v0.8h, v2.h[4] XX\ + SMLAL v18.4s, v0.4h, v2.h[5] XX\ + SMLAL2 v19.4s, v0.8h, v2.h[5] XX\ + SMLAL v20.4s, v0.4h, v2.h[6] XX\ + SMLAL2 v21.4s, v0.8h, v2.h[6] XX\ + SMLAL v22.4s, v0.4h, v2.h[7] XX\ + SMLAL2 v23.4s, v0.8h, v2.h[7] XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 XX\ + _3_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + /* row 0: v8, v9 */ XX\ + /* row 1: v10, v11 */ XX\ + /* row 2: v12, v13 */ XX\ + /* row 3: v14, v15 */ XX\ + /* row 4: v16, v17 */ XX\ + /* row 5: v18, v19 */ XX\ + /* row 6: v20, v21 */ XX\ + /* row 7: v22, v23 */ XX\ + XX\ + /* Load c_stride & params */ XX\ + LDR x16, [sp] XX\ + LSL x16, x16, 2 XX\ + LD1 {v24.4s}, [x6], 16 XX\ + LD1 {v25.4s}, [x6] XX\ + XX\ + SCVTF v8.4s, v8.4s XX\ + SCVTF v9.4s, v9.4s XX\ + SCVTF v10.4s, v10.4s XX\ + SCVTF v11.4s, v11.4s XX\ + SCVTF v12.4s, v12.4s XX\ + SCVTF v13.4s, v13.4s XX\ + SCVTF v14.4s, v14.4s XX\ + SCVTF v15.4s, v15.4s XX\ + SCVTF v16.4s, v16.4s XX\ + SCVTF v17.4s, v17.4s XX\ + SCVTF v18.4s, v18.4s XX\ + SCVTF v19.4s, v19.4s XX\ + SCVTF v20.4s, v20.4s XX\ + SCVTF v21.4s, v21.4s XX\ + SCVTF v22.4s, v22.4s XX\ + SCVTF v23.4s, v23.4s XX\ + XX\ + FMUL v8.4s, v8.4s, v26.4s XX\ + FMUL v9.4s, v9.4s, v30.4s XX\ + FMUL v10.4s, v10.4s, v26.4s XX\ + FMUL v11.4s, v11.4s, v30.4s XX\ + FMUL v12.4s, v12.4s, v26.4s XX\ + FMUL v13.4s, v13.4s, v30.4s XX\ + FMUL v14.4s, v14.4s, v26.4s XX\ + FMUL v15.4s, v15.4s, v30.4s XX\ + FMUL v16.4s, v16.4s, v26.4s XX\ + FMUL v17.4s, v17.4s, v30.4s XX\ + FMUL v18.4s, v18.4s, v26.4s XX\ + FMUL v19.4s, v19.4s, v30.4s XX\ + FMUL v20.4s, v20.4s, v26.4s XX\ + FMUL v21.4s, v21.4s, v30.4s XX\ + FMUL v22.4s, v22.4s, v26.4s XX\ + FMUL v23.4s, v23.4s, v30.4s XX\ + XX\ + FADD v8.4s, v8.4s, v24.4s XX\ + FADD v9.4s, v9.4s, v25.4s XX\ + FADD v10.4s, v10.4s, v24.4s XX\ + FADD v11.4s, v11.4s, v25.4s XX\ + FADD v12.4s, v12.4s, v24.4s XX\ + FADD v13.4s, v13.4s, v25.4s XX\ + FADD v14.4s, v14.4s, v24.4s XX\ + FADD v15.4s, v15.4s, v25.4s XX\ + FADD v16.4s, v16.4s, v24.4s XX\ + FADD v17.4s, v17.4s, v25.4s XX\ + FADD v18.4s, v18.4s, v24.4s XX\ + FADD v19.4s, v19.4s, v25.4s XX\ + FADD v20.4s, v20.4s, v24.4s XX\ + FADD v21.4s, v21.4s, v25.4s XX\ + FADD v22.4s, v22.4s, v24.4s XX\ + FADD v23.4s, v23.4s, v25.4s XX\ + XX\ + /* Compute c0-c7 */ XX\ + XX\ + ADD x9, x7, x16 XX\ + CMP x0, 2 XX\ + CSEL x9, x7, x9, LO XX\ + XX\ + ADD x10, x9, x16 XX\ + CSEL x10, x9, x10, LS XX\ + XX\ + ADD x8, x10, x16 XX\ + CMP x0, 4 XX\ + CSEL x8, x10, x8, LO XX\ + XX\ + ADD x12, x8, x16 XX\ + CSEL x12, x8, x12, LS XX\ + XX\ + ADD x13, x12, x16 XX\ + CMP x0, 6 XX\ + CSEL x13, x12, x13, LO XX\ + XX\ + ADD x14, x13, x16 XX\ + CSEL x14, x13, x14, LS XX\ + XX\ + ADD x15, x14, x16 XX\ + CMP x0, 8 XX\ + CSEL x15, x14, x15, NE XX\ + XX\ + CMP x11, 8 XX\ + B.NE _4_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.4s}, [x7], 16 XX\ + ST1 {v9.4s}, [x7] XX\ + ST1 {v10.4s}, [x9], 16 XX\ + ST1 {v11.4s}, [x9] XX\ + ST1 {v12.4s}, [x10], 16 XX\ + ST1 {v13.4s}, [x10] XX\ + ST1 {v14.4s}, [x8], 16 XX\ + ST1 {v15.4s}, [x8] XX\ 
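Per element, the USUBL/SMLAL pairs in the k_loop above (and in the single-block remainder path) compute a zero-point-corrected multiply-accumulate: each 8x1 weight block is corrected by the per-output-channel kernel zero points in v25, each packed activation tile by the activation zero point in v24, and the products land in the sixteen int32 accumulators. A scalar C++ sketch, illustrative only, with descriptive stand-in names:

#include <cstdint>

// What one nonzero 8x1 block contributes to the 8x8 accumulator tile.
void accumulate_8x1_block(
    int32_t acc[8][8],              // v8-v23, indexed here as acc[m][n]
    const uint8_t w_block[8],       // one 8x1 block from packed_w (8 output channels, one k)
    const uint8_t a_block[8],       // matching packed-A tile (8 rows m, same k)
    const uint8_t w_zero_point[8],  // per-output-channel zero points (v25)
    uint8_t a_zero_point) {         // activation zero point (v24)
  for (int m = 0; m < 8; ++m) {
    const int32_t a = static_cast<int32_t>(a_block[m]) - a_zero_point;        // USUBL v2, v24
    for (int n = 0; n < 8; ++n) {
      const int32_t w = static_cast<int32_t>(w_block[n]) - w_zero_point[n];   // USUBL v0, v25
      acc[m][n] += w * a;                                                     // SMLAL / SMLAL2
    }
  }
}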
+ ST1 {v16.4s}, [x12], 16 XX\ + ST1 {v17.4s}, [x12] XX\ + ST1 {v18.4s}, [x13], 16 XX\ + ST1 {v19.4s}, [x13] XX\ + ST1 {v20.4s}, [x14], 16 XX\ + ST1 {v21.4s}, [x14] XX\ + ST1 {v22.4s}, [x15], 16 XX\ + ST1 {v23.4s}, [x15] XX\ + XX\ + LDP d9, d8, [sp, -64] XX\ + LDP d11, d10, [sp, -48] XX\ + LDP d13, d12, [sp, -32] XX\ + LDP d15, d14, [sp, -16] XX\ + XX\ + RET XX\ + XX\ + NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 XX\ + _4_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 4 XX\ + B.LO _5_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.4s}, [x7], 16 XX\ + ST1 {v10.4s}, [x9], 16 XX\ + ST1 {v12.4s}, [x10], 16 XX\ + ST1 {v14.4s}, [x8], 16 XX\ + ST1 {v16.4s}, [x12], 16 XX\ + ST1 {v18.4s}, [x13], 16 XX\ + ST1 {v20.4s}, [x14], 16 XX\ + ST1 {v22.4s}, [x15], 16 XX\ + XX\ + SUB x11, x11, 4 XX\ + XX\ + MOV v8.16b, v9.16b XX\ + MOV v10.16b, v11.16b XX\ + MOV v12.16b, v13.16b XX\ + MOV v14.16b, v15.16b XX\ + MOV v16.16b, v17.16b XX\ + MOV v18.16b, v19.16b XX\ + MOV v20.16b, v21.16b XX\ + MOV v22.16b, v23.16b XX\ + XX\ + _5_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 2 XX\ + B.LO _6_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.2s}, [x7], 8 XX\ + ST1 {v10.2s}, [x9], 8 XX\ + ST1 {v12.2s}, [x10], 8 XX\ + ST1 {v14.2s}, [x8], 8 XX\ + ST1 {v16.2s}, [x12], 8 XX\ + ST1 {v18.2s}, [x13], 8 XX\ + ST1 {v20.2s}, [x14], 8 XX\ + ST1 {v22.2s}, [x15], 8 XX\ + XX\ + SUB x11, x11, 2 XX\ + XX\ + EXT v8.16b, v8.16b, v8.16b, 8 XX\ + EXT v10.16b, v10.16b, v10.16b, 8 XX\ + EXT v12.16b, v12.16b, v12.16b, 8 XX\ + EXT v14.16b, v14.16b, v14.16b, 8 XX\ + EXT v16.16b, v16.16b, v16.16b, 8 XX\ + EXT v18.16b, v18.16b, v18.16b, 8 XX\ + EXT v20.16b, v20.16b, v20.16b, 8 XX\ + EXT v22.16b, v22.16b, v22.16b, 8 XX\ + XX\ + _6_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + CMP x11, 1 XX\ + B.LO _7_w##W_INDEX_DTYPE_NUM_BITS XX\ + XX\ + ST1 {v8.s}[0], [x7] XX\ + ST1 {v10.s}[0], [x9] XX\ + ST1 {v12.s}[0], [x10] XX\ + ST1 {v14.s}[0], [x8] XX\ + ST1 {v16.s}[0], [x12] XX\ + ST1 {v18.s}[0], [x13] XX\ + ST1 {v20.s}[0], [x14] XX\ + ST1 {v22.s}[0], [x15] XX\ + XX\ + _7_w##W_INDEX_DTYPE_NUM_BITS##: XX\ + LDP d9, d8, [sp, -64] XX\ + LDP d11, d10, [sp, -48] XX\ + LDP d13, d12, [sp, -32] XX\ + LDP d15, d14, [sp, -16] XX\ + XX\ + RET XX\ + XX\ + END_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w##W_INDEX_DTYPE_NUM_BITS##__aarch64_neon + +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w32__aarch64_neon( # size_t mr, # size_t nr, # const uint8_t* a_packed, @@ -31,380 +431,42 @@ # size_t c_stride, # size_t output_channel_index, # const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) -BEGIN_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon - - STP d15, d14, [sp, -16] - STP d13, d12, [sp, -32] - STP d11, d10, [sp, -48] - STP d9, d8, [sp, -64] - - MOV x11, x1 - # Load output channel index - LDR x10, [sp, 8] - # Load params - LDR x8, [sp, 16] - - # Load a_zero_point - LD1R {v24.8b}, [x8] - ADD x8, x8, 8 - - # Load pointer to per channel zero points array - LDR x17, [x8], 8 - - # Load pointer to per channel multiplier - LDR x13, [x8] - - # Add offset to the base pointer - ADD x17, x17, x10 - # Mul by 4 to get byte offset for multiplier - LSL x10, x10, 2 - # Add offset to the base pointer for multiplier - ADD x13, x13, x10 - - # Load b_zero_point - LD1 {v25.8b}, [x17] - # Load multiplier c0123 - LD1 {v26.4s}, [x13], 16 - # Load multiplier c4567 - LD1 {v30.4s}, [x13] - - EOR x12, x12, x12 - EOR x13, x13, x13 - - EOR v8.16b, v8.16b, v8.16b - EOR v9.16b, v9.16b, v9.16b - EOR v10.16b, v10.16b, v10.16b - 
EOR v11.16b, v11.16b, v11.16b - EOR v12.16b, v12.16b, v12.16b - EOR v13.16b, v13.16b, v13.16b - EOR v14.16b, v14.16b, v14.16b - EOR v15.16b, v15.16b, v15.16b - EOR v16.16b, v16.16b, v16.16b - EOR v17.16b, v17.16b, v17.16b - EOR v18.16b, v18.16b, v18.16b - EOR v19.16b, v19.16b, v19.16b - EOR v20.16b, v20.16b, v20.16b - EOR v21.16b, v21.16b, v21.16b - EOR v22.16b, v22.16b, v22.16b - EOR v23.16b, v23.16b, v23.16b - - # w12 = w_row_ptr[n], x13 = w_row_ptr[n+1] - # x4 = x4 + 4 to point to next n - LDR w12, [x4], #4 - LDR w13, [x4] - # x10 = temp_packed_w = packed_w + w_row_ptr[n] * 8 - # This points to the first block of nonzero value - # for the nth row. - ADD x10, x3, x12, LSL #3 - # x9 = temp_w_block_ids_ptr = w_block_ids_ptr (x5) + w_row_ptr[n] - # LSL2 because each element is 4 bytes - # This points to the block id of the first block - # It should contain x13 - x12 number of block ids - ADD x9, x5, x12, LSL #2 - # x8 = num_blocks that needs to be processed - SUB x8, x13, x12 - SUBS x8, x8, 2 - B.LO 1f - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - .p2align 5 -#endif -k_loop: - # k_loop processes two k values - # Load two 8x1 blocks - LD1 {v0.8b}, [x10], 8 - LD1 {v1.8b}, [x10], 8 - USUBL v0.8h, v0.8b, v25.8b - USUBL v1.8h, v1.8b, v25.8b - - #x12 = block_id_ptr[0] - #x13 = block_id_ptr[1] - LDR w12, [x9], #4 - LDR w13, [x9], #4 - # Add offset to x2 - # Shift by 3 because each packed block is a block of 8x1 - # which 8 bytes - ADD x16, x2, x12, LSL #3 - ADD x17, x2, x13, LSL #3 - - # Load two 8x1 blocks of activation - # First 8x1 for first channel - # second 8x1 for next channel - LD1 {v2.8b}, [x16] - LD1 {v3.8b}, [x17] - - USUBL v2.8h, v2.8b, v24.8b - USUBL v3.8h, v3.8b, v24.8b - - # First channel - SMLAL v8.4s, v0.4h, v2.h[0] - SMLAL2 v9.4s, v0.8h, v2.h[0] - SMLAL v10.4s, v0.4h, v2.h[1] - SMLAL2 v11.4s, v0.8h, v2.h[1] - SMLAL v12.4s, v0.4h, v2.h[2] - SMLAL2 v13.4s, v0.8h, v2.h[2] - SMLAL v14.4s, v0.4h, v2.h[3] - SMLAL2 v15.4s, v0.8h, v2.h[3] - SMLAL v16.4s, v0.4h, v2.h[4] - SMLAL2 v17.4s, v0.8h, v2.h[4] - SMLAL v18.4s, v0.4h, v2.h[5] - SMLAL2 v19.4s, v0.8h, v2.h[5] - SMLAL v20.4s, v0.4h, v2.h[6] - SMLAL2 v21.4s, v0.8h, v2.h[6] - SMLAL v22.4s, v0.4h, v2.h[7] - SMLAL2 v23.4s, v0.8h, v2.h[7] - - SUBS x8, x8, 2 - # Second channel - SMLAL v8.4s, v1.4h, v3.h[0] - SMLAL2 v9.4s, v1.8h, v3.h[0] - SMLAL v10.4s, v1.4h, v3.h[1] - SMLAL2 v11.4s, v1.8h, v3.h[1] - SMLAL v12.4s, v1.4h, v3.h[2] - SMLAL2 v13.4s, v1.8h, v3.h[2] - SMLAL v14.4s, v1.4h, v3.h[3] - SMLAL2 v15.4s, v1.8h, v3.h[3] - SMLAL v16.4s, v1.4h, v3.h[4] - SMLAL2 v17.4s, v1.8h, v3.h[4] - SMLAL v18.4s, v1.4h, v3.h[5] - SMLAL2 v19.4s, v1.8h, v3.h[5] - SMLAL v20.4s, v1.4h, v3.h[6] - SMLAL2 v21.4s, v1.8h, v3.h[6] - SMLAL v22.4s, v1.4h, v3.h[7] - SMLAL2 v23.4s, v1.8h, v3.h[7] - - B.HS k_loop - -1: - CMP x8, -2 - B.EQ 3f - - LD1 {v0.8b}, [x10] - USUBL v0.8h, v0.8b, v25.8b - - #x12 = block_id_ptr[0] - LDR w12, [x9] - # Add offset to x2 - ADD x16, x2, x12, LSL #3 - - LD1 {v2.8b}, [x16] - USUBL v2.8h, v2.8b, v24.8b - - SMLAL v8.4s, v0.4h, v2.h[0] - SMLAL2 v9.4s, v0.8h, v2.h[0] - SMLAL v10.4s, v0.4h, v2.h[1] - SMLAL2 v11.4s, v0.8h, v2.h[1] - SMLAL v12.4s, v0.4h, v2.h[2] - SMLAL2 v13.4s, v0.8h, v2.h[2] - SMLAL v14.4s, v0.4h, v2.h[3] - SMLAL2 v15.4s, v0.8h, v2.h[3] - SMLAL v16.4s, v0.4h, v2.h[4] - SMLAL2 v17.4s, v0.8h, v2.h[4] - SMLAL v18.4s, v0.4h, v2.h[5] - SMLAL2 v19.4s, v0.8h, v2.h[5] - SMLAL v20.4s, v0.4h, v2.h[6] - SMLAL2 v21.4s, v0.8h, v2.h[6] - SMLAL v22.4s, v0.4h, v2.h[7] - SMLAL2 v23.4s, v0.8h, v2.h[7] - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - 
.p2align 4 -#endif -3: - # row 0: v8, v9 - # row 1: v10, v11 - # row 2: v12, v13 - # row 3: v14, v15 - # row 4: v16, v17 - # row 5: v18, v19 - # row 6: v20, v21 - # row 7: v22, v23 - - # Load c_stride & params - LDR x16, [sp] - LSL x16, x16, 2 - LD1 {v24.4s}, [x6], 16 - LD1 {v25.4s}, [x6] - - SCVTF v8.4s, v8.4s - SCVTF v9.4s, v9.4s - SCVTF v10.4s, v10.4s - SCVTF v11.4s, v11.4s - SCVTF v12.4s, v12.4s - SCVTF v13.4s, v13.4s - SCVTF v14.4s, v14.4s - SCVTF v15.4s, v15.4s - SCVTF v16.4s, v16.4s - SCVTF v17.4s, v17.4s - SCVTF v18.4s, v18.4s - SCVTF v19.4s, v19.4s - SCVTF v20.4s, v20.4s - SCVTF v21.4s, v21.4s - SCVTF v22.4s, v22.4s - SCVTF v23.4s, v23.4s - - FMUL v8.4s, v8.4s, v26.4s - FMUL v9.4s, v9.4s, v30.4s - FMUL v10.4s, v10.4s, v26.4s - FMUL v11.4s, v11.4s, v30.4s - FMUL v12.4s, v12.4s, v26.4s - FMUL v13.4s, v13.4s, v30.4s - FMUL v14.4s, v14.4s, v26.4s - FMUL v15.4s, v15.4s, v30.4s - FMUL v16.4s, v16.4s, v26.4s - FMUL v17.4s, v17.4s, v30.4s - FMUL v18.4s, v18.4s, v26.4s - FMUL v19.4s, v19.4s, v30.4s - FMUL v20.4s, v20.4s, v26.4s - FMUL v21.4s, v21.4s, v30.4s - FMUL v22.4s, v22.4s, v26.4s - FMUL v23.4s, v23.4s, v30.4s - - FADD v8.4s, v8.4s, v24.4s - FADD v9.4s, v9.4s, v25.4s - FADD v10.4s, v10.4s, v24.4s - FADD v11.4s, v11.4s, v25.4s - FADD v12.4s, v12.4s, v24.4s - FADD v13.4s, v13.4s, v25.4s - FADD v14.4s, v14.4s, v24.4s - FADD v15.4s, v15.4s, v25.4s - FADD v16.4s, v16.4s, v24.4s - FADD v17.4s, v17.4s, v25.4s - FADD v18.4s, v18.4s, v24.4s - FADD v19.4s, v19.4s, v25.4s - FADD v20.4s, v20.4s, v24.4s - FADD v21.4s, v21.4s, v25.4s - FADD v22.4s, v22.4s, v24.4s - FADD v23.4s, v23.4s, v25.4s +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_8X8_PACKEDA__AARCH64_NEON(32, #4, #2, LDR) - // Compute c0-c7 - - ADD x9, x7, x16 - CMP x0, 2 - CSEL x9, x7, x9, LO - - ADD x10, x9, x16 - CSEL x10, x9, x10, LS - - ADD x8, x10, x16 - CMP x0, 4 - CSEL x8, x10, x8, LO - - ADD x12, x8, x16 - CSEL x12, x8, x12, LS - - ADD x13, x12, x16 - CMP x0, 6 - CSEL x13, x12, x13, LO - - ADD x14, x13, x16 - CSEL x14, x13, x14, LS - - ADD x15, x14, x16 - CMP x0, 8 - CSEL x15, x14, x15, NE - - CMP x11, 8 - B.NE 4f - - ST1 {v8.4s}, [x7], 16 - ST1 {v9.4s}, [x7] - ST1 {v10.4s}, [x9], 16 - ST1 {v11.4s}, [x9] - ST1 {v12.4s}, [x10], 16 - ST1 {v13.4s}, [x10] - ST1 {v14.4s}, [x8], 16 - ST1 {v15.4s}, [x8] - ST1 {v16.4s}, [x12], 16 - ST1 {v17.4s}, [x12] - ST1 {v18.4s}, [x13], 16 - ST1 {v19.4s}, [x13] - ST1 {v20.4s}, [x14], 16 - ST1 {v21.4s}, [x14] - ST1 {v22.4s}, [x15], 16 - ST1 {v23.4s}, [x15] - - LDP d9, d8, [sp, -64] - LDP d11, d10, [sp, -48] - LDP d13, d12, [sp, -32] - LDP d15, d14, [sp, -16] - - RET - -#ifndef IGNORE_CODE_ALIGN_DIRECTIVES - .p2align 3 -#endif -4: - CMP x11, 4 - B.LO 5f - - ST1 {v8.4s}, [x7], 16 - ST1 {v10.4s}, [x9], 16 - ST1 {v12.4s}, [x10], 16 - ST1 {v14.4s}, [x8], 16 - ST1 {v16.4s}, [x12], 16 - ST1 {v18.4s}, [x13], 16 - ST1 {v20.4s}, [x14], 16 - ST1 {v22.4s}, [x15], 16 - - SUB x11, x11, 4 - - MOV v8.16b, v9.16b - MOV v10.16b, v11.16b - MOV v12.16b, v13.16b - MOV v14.16b, v15.16b - MOV v16.16b, v17.16b - MOV v18.16b, v19.16b - MOV v20.16b, v21.16b - MOV v22.16b, v23.16b - -5: - CMP x11, 2 - B.LO 6f - - ST1 {v8.2s}, [x7], 8 - ST1 {v10.2s}, [x9], 8 - ST1 {v12.2s}, [x10], 8 - ST1 {v14.2s}, [x8], 8 - ST1 {v16.2s}, [x12], 8 - ST1 {v18.2s}, [x13], 8 - ST1 {v20.2s}, [x14], 8 - ST1 {v22.2s}, [x15], 8 - - SUB x11, x11, 2 - - EXT v8.16b, v8.16b, v8.16b, 8 - EXT v10.16b, v10.16b, v10.16b, 8 - EXT v12.16b, v12.16b, v12.16b, 8 - EXT v14.16b, v14.16b, v14.16b, 8 - EXT v16.16b, v16.16b, v16.16b, 8 - EXT v18.16b, v18.16b, v18.16b, 8 - 
EXT v20.16b, v20.16b, v20.16b, 8 - EXT v22.16b, v22.16b, v22.16b, 8 - -6: - CMP x11, 1 - B.LO 7f - - ST1 {v8.s}[0], [x7] - ST1 {v10.s}[0], [x9] - ST1 {v12.s}[0], [x10] - ST1 {v14.s}[0], [x8] - ST1 {v16.s}[0], [x12] - ST1 {v18.s}[0], [x13] - ST1 {v20.s}[0], [x14] - ST1 {v22.s}[0], [x15] - -7: - LDP d9, d8, [sp, -64] - LDP d11, d10, [sp, -48] - LDP d13, d12, [sp, -32] - LDP d15, d14, [sp, -16] - - RET +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w16__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint16_t* w_row_ptr, +# const uint16_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_8X8_PACKEDA__AARCH64_NEON(16, #2, #1, LDRH) -END_FUNCTION pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon +# void pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w8__aarch64_neon( +# size_t mr, +# size_t nr, +# const uint8_t* a_packed, +# const uint8_t* packed_w, +# const uint8_t* w_row_ptr, +# const uint8_t* w_block_ids_ptr, +# const float* b, +# uint8_t* restrict c, +# size_t c_stride, +# size_t output_channel_index, +# const union pytorch_qnnp_conv_dynamic_quantization_params quantization_params[restrict static 1]) +MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_8X8_PACKEDA__AARCH64_NEON(8, #1, #0, LDRB) #ifdef __ELF__ .section ".note.GNU-stack","",%progbits #endif + +#undef NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_5 +#undef NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_4 +#undef NDEF_IGNORE_CODE_ALIGN_DIRECTIVES_P2ALIGN_3 +#undef MAKE_PYTORCH_Q8GEMM_DQ_SPARSE_8X1_UKERNEL_8X8_PACKEDA__AARCH64_NEON +#undef XX diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/common.h b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/common.h index 14bcc01d21ed..fbfaa85904c7 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/common.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/common.h @@ -80,3 +80,15 @@ #if defined(_MSC_VER) #define __builtin_prefetch #endif + +#if defined(__GNUC__) + #define PYTORCH_QNNP_UNALIGNED __attribute__((__aligned__(1))) +#elif defined(_MSC_VER) + #if defined(_M_IX86) + #define PYTORCH_QNNP_UNALIGNED + #else + #define PYTORCH_QNNP_UNALIGNED __unaligned + #endif +#else + #error "Platform-specific implementation of PYTORCH_QNNP_UNALIGNED required" +#endif diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/operator.h b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/operator.h index 44e702a7e412..a6e2dbe24f81 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/operator.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/operator.h @@ -38,11 +38,20 @@ enum pytorch_qnnp_ukernel_type { }; typedef struct { - const uint32_t* col_indices; - const uint32_t* row_values; + union { + const uint32_t* col_indices_w32; + const uint16_t* col_indices_w16; + const uint8_t* col_indices_w8; + }; + union { + const uint32_t* row_values_w32; + const uint16_t* row_values_w16; + const uint8_t* row_values_w8; + }; const uint8_t* values; uint32_t row_block_size; uint32_t col_block_size; + enum pytorch_qnnp_sparse_matrix_indices_dtype indices_dtype; } sparse_matrix_t; struct pytorch_qnnp_operator { diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/params.h b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/params.h 
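The sparse_matrix_t change above replaces the fixed uint32_t col_indices/row_values pointers with width-tagged union members plus an indices_dtype tag, so consumers have to dispatch on the tag before touching the arrays. A hypothetical helper sketching that dispatch is shown below; it is not code from this patch, and the ..._uint16_t/_uint8_t enumerator spellings are assumed by analogy with pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t, which the updated testers pass further down.

// Hypothetical dispatch helper over the new width-tagged union (sketch only).
static inline const void* sparse_matrix_col_indices(const sparse_matrix_t* m) {
  switch (m->indices_dtype) {
    case pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t:
      return m->col_indices_w32;
    case pytorch_qnnp_sparse_matrix_indices_dtype_uint16_t:  // assumed spelling
      return m->col_indices_w16;
    case pytorch_qnnp_sparse_matrix_indices_dtype_uint8_t:   // assumed spelling
      return m->col_indices_w8;
    default:
      return nullptr;  // invalid or uninitialized tag
  }
}

The same tag presumably decides which of the packedA_w32/w16/w8_gemm_dq ukernel pointers added to pytorch_q8gemm_sparse_parameters gets invoked at run time.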
index 1fb607e3f195..04536dafcef9 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/params.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/params.h @@ -331,7 +331,7 @@ typedef void (*pytorch_q8gemm_dq_sparse_ukernel_function)( size_t output_channel_index, const struct pytorch_qnnp_conv_dynamic_quantization_params* quantization_params); -typedef void (*pytorch_q8gemm_dq_sparse_packedA_ukernel_function)( +typedef void (*pytorch_q8gemm_dq_sparse_packedA_w32_ukernel_function)( size_t mr, size_t nr, const uint8_t* a_packed, @@ -344,6 +344,32 @@ typedef void (*pytorch_q8gemm_dq_sparse_packedA_ukernel_function)( size_t output_channel_index, const struct pytorch_qnnp_conv_dynamic_quantization_params* quantization_params); +typedef void (*pytorch_q8gemm_dq_sparse_packedA_w16_ukernel_function)( + size_t mr, + size_t nr, + const uint8_t* a_packed, + const uint8_t* packed_w, + const uint16_t* w_row_ptr, + const uint16_t* w_block_ids_ptr, + const float* bias, + float* c, + size_t c_stride, + size_t output_channel_index, + const struct pytorch_qnnp_conv_dynamic_quantization_params* quantization_params); + +typedef void (*pytorch_q8gemm_dq_sparse_packedA_w8_ukernel_function)( + size_t mr, + size_t nr, + const uint8_t* a_packed, + const uint8_t* packed_w, + const uint8_t* w_row_ptr, + const uint8_t* w_block_ids_ptr, + const float* bias, + float* c, + size_t c_stride, + size_t output_channel_index, + const struct pytorch_qnnp_conv_dynamic_quantization_params* quantization_params); + typedef void (*pytorch_q8gemm_sparse_packA_ukernel_function)( const size_t mr, const size_t K, @@ -545,7 +571,11 @@ struct pytorch_q8conv_parameters { struct pytorch_q8gemm_sparse_parameters { pytorch_q8gemm_dq_sparse_ukernel_function gemm_dq; - pytorch_q8gemm_dq_sparse_packedA_ukernel_function packedA_gemm_dq; + // w32, w16, and w8 refer to variants of the kernel which use uint32_t, + // uint16_t, and uint8_t datatype for row values/col indices respectively + pytorch_q8gemm_dq_sparse_packedA_w32_ukernel_function packedA_w32_gemm_dq; + pytorch_q8gemm_dq_sparse_packedA_w16_ukernel_function packedA_w16_gemm_dq; + pytorch_q8gemm_dq_sparse_packedA_w8_ukernel_function packedA_w8_gemm_dq; pytorch_q8gemm_sparse_packA_ukernel_function packA; uint8_t mr; uint8_t nr; diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/q8gemm_sparse.h b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/q8gemm_sparse.h index 572b7cfe54a7..a4079f9bde0b 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/q8gemm_sparse.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/src/qnnpack/q8gemm_sparse.h @@ -61,32 +61,72 @@ DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_UKERNEL_FUNCTION(pytorch_q8ge DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_UKERNEL_FUNCTION(pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4__aarch64_neon) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_UKERNEL_FUNCTION(pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4__sse2) -#define DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION(fn_name) \ - PYTORCH_QNNP_INTERNAL void fn_name( \ - size_t mr, \ - size_t nr, \ - const uint8_t* a_packed, \ - const uint8_t* packed_w, \ - const uint32_t* w_row_ptr, \ - const uint32_t* w_block_ids_ptr, \ - const float* b, \ - float* c, \ - size_t c_stride, \ - size_t output_channel_index, \ - const struct pytorch_qnnp_conv_dynamic_quantization_params* quantization_params); - +#define DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( \ + fn_name, 
w_index_dtype) \ + PYTORCH_QNNP_INTERNAL void fn_name( \ + size_t mr, \ + size_t nr, \ + const uint8_t* a_packed, \ + const uint8_t* packed_w, \ + const w_index_dtype* w_row_ptr, \ + const w_index_dtype* w_block_ids_ptr, \ + const float* b, \ + float* c, \ + size_t c_stride, \ + size_t output_channel_index, \ + const struct pytorch_qnnp_conv_dynamic_quantization_params* \ + quantization_params); + +// w32, w16, and w8 refer to variants of the kernel which use uint32_t, +// uint16_t, and uint8_t datatype for row values/col indices respectively +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w32__aarch32_neon, + uint32_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w16__aarch32_neon, + uint16_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w8__aarch32_neon, + uint8_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w32__aarch32_neon, + uint32_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w16__aarch32_neon, + uint16_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w8__aarch32_neon, + uint8_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__aarch32_neon, + uint32_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w32__aarch64_neon, + uint32_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w16__aarch64_neon, + uint16_t) +DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w8__aarch64_neon, + uint8_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon) + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w32__aarch64_neon, + uint32_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon) + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w16__aarch64_neon, + uint16_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__aarch32_neon) + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w8__aarch64_neon, + uint8_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon) + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2, + uint32_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon) + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2, + uint16_t) DECLARE_PYTORCH_Q8GEMM_DYNAMIC_QUANTIZATION_SPARSE_PACKEDA_UKERNEL_FUNCTION( - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2) + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2, + uint8_t) #define DECLARE_PYTORCH_Q8GEMM_PARSE_PACKA_UKERNEL_FUNCTION(fn_name) \ PYTORCH_QNNP_INTERNAL void 
fn_name( \ diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-sparse-operator-tester.h b/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-sparse-operator-tester.h index 575c0a17bceb..b1338df41f18 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-sparse-operator-tester.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/test/fully-connected-sparse-operator-tester.h @@ -241,13 +241,13 @@ class FullyConnectedSparseOperatorTester { } while (max_elem == min_elem); std::unique_ptr bcsr_matrix = - qnnpack::generateBlockCSRMatrix( - kernel.data(), - outputChannels(), - inputChannels(), - rowBlockSize(), - colBlockSize(), - kernelZeroPoints.data()); + qnnpack::generateBlockCSRMatrix( + kernel.data(), + outputChannels(), + inputChannels(), + rowBlockSize(), + colBlockSize(), + kernelZeroPoints.data()); std::fill(output.begin(), output.end(), 0xA5); std::fill(output_dynamic.begin(), output_dynamic.end(), 0.0f); @@ -320,11 +320,12 @@ class FullyConnectedSparseOperatorTester { outputChannels(), inputZeroPoint, kernelZeroPoints.data(), - bcsr_matrix->col_indices.data(), - bcsr_matrix->row_values.data(), + bcsr_matrix->col_indices_data_ptr(), + bcsr_matrix->row_values_data_ptr(), bcsr_matrix->values.data(), bcsr_matrix->row_block_size, bcsr_matrix->col_block_size, + pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t, outputZeroPoint, qmin(), qmax(), @@ -441,13 +442,13 @@ class FullyConnectedSparseOperatorTester { min_elem = *std::min_element(kernel.cbegin(), kernel.cend()); } while (max_elem == min_elem); std::unique_ptr bcsr_matrix = - qnnpack::generateBlockCSRMatrix( - kernel.data(), - outputChannels(), - inputChannels(), - rowBlockSize(), - colBlockSize(), - kernelZeroPoints.data()); + qnnpack::generateBlockCSRMatrix( + kernel.data(), + outputChannels(), + inputChannels(), + rowBlockSize(), + colBlockSize(), + kernelZeroPoints.data()); std::fill(output.begin(), output.end(), 0xA5); std::fill(output_dynamic.begin(), output_dynamic.end(), 0.0f); @@ -520,11 +521,12 @@ class FullyConnectedSparseOperatorTester { outputChannels(), inputZeroPoint, kernelZeroPoints.data(), - bcsr_matrix->col_indices.data(), - bcsr_matrix->row_values.data(), + bcsr_matrix->col_indices_data_ptr(), + bcsr_matrix->row_values_data_ptr(), bcsr_matrix->values.data(), bcsr_matrix->row_block_size, bcsr_matrix->col_block_size, + pytorch_qnnp_sparse_matrix_indices_dtype_uint32_t, outputZeroPoint, qmin(), qmax(), diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/test/gemm-block-sparse-microkernel-tester.h b/aten/src/ATen/native/quantized/cpu/qnnpack/test/gemm-block-sparse-microkernel-tester.h index 25e7bb670653..53eb9ed33830 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/test/gemm-block-sparse-microkernel-tester.h +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/test/gemm-block-sparse-microkernel-tester.h @@ -279,7 +279,7 @@ class GemmBlockSparseMicrokernelTester { } while (max_elem == min_elem); std::unique_ptr bcsr_matrix = - qnnpack::generateBlockCSRMatrix( + qnnpack::generateBlockCSRMatrix( b.data(), n(), k(), @@ -332,8 +332,8 @@ class GemmBlockSparseMicrokernelTester { aPtr, aStride() * sizeof(uint8_t), bcsr_matrix->values.data(), - bcsr_matrix->row_values.data(), - bcsr_matrix->col_indices.data(), + static_cast(bcsr_matrix->row_values_data_ptr()), + static_cast(bcsr_matrix->col_indices_data_ptr()), bias.data(), c.data(), cStride(), @@ -355,9 +355,10 @@ class GemmBlockSparseMicrokernelTester { } } + template void test_packed( 
pytorch_q8gemm_sparse_packA_ukernel_function packa, - pytorch_q8gemm_dq_sparse_packedA_ukernel_function qgemm) const { + GEMM_UKERNEL_DTYPE qgemm) const { ASSERT_LE(m(), mr()); ASSERT_LE(n(), nr()); @@ -405,13 +406,13 @@ class GemmBlockSparseMicrokernelTester { min_elem = *std::min_element(b.cbegin(), b.cend()); } while (max_elem == min_elem); std::unique_ptr bcsr_matrix = - qnnpack::generateBlockCSRMatrix( - b.data(), - n(), - k(), - rowBlockSize(), - colBlockSize(), - kernel_zero_points.data()); + qnnpack::generateBlockCSRMatrix( + b.data(), + n(), + k(), + rowBlockSize(), + colBlockSize(), + kernel_zero_points.data()); ASSERT_NE( *std::max_element(a.cbegin(), a.cend()), @@ -465,8 +466,10 @@ class GemmBlockSparseMicrokernelTester { n(), a_packed.data(), bcsr_matrix->values.data(), - bcsr_matrix->row_values.data(), - bcsr_matrix->col_indices.data(), + static_cast( + bcsr_matrix->row_values_data_ptr()), + static_cast( + bcsr_matrix->col_indices_data_ptr()), bias.data(), c.data(), cStride(), diff --git a/aten/src/ATen/native/quantized/cpu/qnnpack/test/q8gemm_sparse.cc b/aten/src/ATen/native/quantized/cpu/qnnpack/test/q8gemm_sparse.cc index 42467e2d2952..49f970c1dabc 100644 --- a/aten/src/ATen/native/quantized/cpu/qnnpack/test/q8gemm_sparse.cc +++ b/aten/src/ATen/native/quantized/cpu/qnnpack/test/q8gemm_sparse.cc @@ -16,25 +16,31 @@ #define TEST_PACKED_ROW_BLOCK_SIZEXCOL_BLOCK_SIZE_SPARSE_OP(MR, \ NR, row_block_size, col_block_size, \ - prepacking_kernel, compute_kernel) \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4) { \ + prepacking_kernel, compute_kernel_w32, compute_kernel_w16, compute_kernel_w8) \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(3) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(3) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_strided_a) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_strided_a) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -42,15 +48,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(3) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aStride(37) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aStride(37); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_strided_c) { \ +TEST(Q8GEMM__##MR ## x ##NR ## 
c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_strided_c) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -58,47 +70,65 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(3) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .cStride(17) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .cStride(17); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_qmin128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_qmin128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(3) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmin(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(3) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmin(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_qmax128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_qmax128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(3) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmax(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(3) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmax(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_azp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_azp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -106,15 +136,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(3) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ 
\ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_bzp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_bzp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -122,15 +158,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(3) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_4_nozp) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_4_nozp) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -139,30 +181,42 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ .aZeroPoint(0) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(5) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(5) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_strided_a) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_strided_a) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -170,15 +224,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(5) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aStride(37) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aStride(37); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + 
compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_strided_c) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_strided_c) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -186,47 +246,65 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(5) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .cStride(17) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .cStride(17); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_qmin128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_qmin128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(5) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmin(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(5) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmin(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_qmax128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_qmax128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(5) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmax(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(5) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmax(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_azp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_azp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -234,15 +312,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(5) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - 
compute_kernel); \ + .aZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_bzp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_bzp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -250,15 +334,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(5) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_lt_8_nozp) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_lt_8_nozp) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -267,30 +357,42 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ .aZeroPoint(0) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(8) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(8) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_strided_a) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_strided_a) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -298,15 +400,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(8) \ .rowBlockSize(row_block_size) \ 
.colBlockSize(col_block_size) \ - .aStride(37) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aStride(37); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_strided_c) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_strided_c) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -314,47 +422,65 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(8) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .cStride(17) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .cStride(17); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_qmin128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_qmin128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(8) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmin(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(8) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmin(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_qmax128) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_qmax128) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(8) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .qmax(128) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(8) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size) \ + .qmax(128); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_azp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_azp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ 
.nr(NR) \ .m(MR) \ @@ -362,15 +488,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(8) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_bzp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_bzp0) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -378,15 +510,21 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(8) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_eq_8_nozp) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_eq_8_nozp) { \ TEST_REQUIRES_ARM_NEON; \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -395,33 +533,45 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ .aZeroPoint(0) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(k) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .test_packed( \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(k) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ prepacking_kernel, \ - compute_kernel); \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_strided_a) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_strided_a) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; 
k++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -429,17 +579,23 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aStride(37) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aStride(37); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_strided_c) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_strided_c) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -447,17 +603,23 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .cStride(17) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .cStride(17); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_azp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_azp0) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -465,17 +627,23 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .aZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_bzp0) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_bzp0) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -483,17 +651,23 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x 
##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_nozp) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_nozp) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -502,19 +676,25 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ .aZeroPoint(0) \ - .bZeroPoint(0) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .bZeroPoint(0); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_gt_8_subtile) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_gt_8_subtile) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 9; k < 16; k++) { \ for (uint32_t m = 1; m <= MR; m++) { \ for (uint32_t n = 1; n <= NR; n++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(m) \ @@ -522,36 +702,48 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .iterations(3) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .iterations(3); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_div_8) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_div_8) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 16; k < 128; k += 8) { \ - GemmBlockSparseMicrokernelTester() \ - .mr(MR) \ - .nr(NR) \ - .m(MR) \ - .n(NR) \ - .k(k) \ - .rowBlockSize(row_block_size) \ - .colBlockSize(col_block_size) \ - .test_packed( \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ + .mr(MR) \ + .nr(NR) \ + .m(MR) \ + .n(NR) \ + .k(k) \ + .rowBlockSize(row_block_size) \ + .colBlockSize(col_block_size); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ prepacking_kernel, \ - compute_kernel); \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_div_8_strided_a) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_div_8_strided_a) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 16; k < 128; k += 8) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -559,17 +751,23 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .aStride(171) \ - .test_packed( \ - prepacking_kernel, 
\ - compute_kernel); \ + .aStride(171); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_div_8_strided_c) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_div_8_strided_c) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 16; k < 128; k += 8) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(MR) \ @@ -577,19 +775,25 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .cStride(17) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .cStride(17); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ \ -TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH3232_NEON, packedA_k_div_8_subtile) { \ +TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARCH32_NEON, packedA_k_div_8_subtile) { \ TEST_REQUIRES_ARM_NEON; \ for (size_t k = 16; k < 128; k += 24) { \ for (uint32_t m = 1; m <= MR; m++) { \ for (uint32_t n = 1; n <= NR; n++) { \ - GemmBlockSparseMicrokernelTester() \ + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() \ .mr(MR) \ .nr(NR) \ .m(m) \ @@ -597,33 +801,43 @@ TEST(Q8GEMM__##MR ## x ##NR ## c##row_block_size ## x ##col_block_size ## __AARC .k(k) \ .rowBlockSize(row_block_size) \ .colBlockSize(col_block_size) \ - .iterations(3) \ - .test_packed( \ - prepacking_kernel, \ - compute_kernel); \ + .iterations(3); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w32); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w16); \ + tester.test_packed( \ + prepacking_kernel, \ + compute_kernel_w8); \ } \ } \ } \ } -#define TEST_PACKED_1x4_SPARSE_OP(MR, NR, prepacking_kernel, compute_kernel) \ +#define TEST_PACKED_1x4_SPARSE_OP(MR, NR, prepacking_kernel, compute_kernel_w32, compute_kernel_w16, compute_kernel_w8) \ TEST_PACKED_ROW_BLOCK_SIZEXCOL_BLOCK_SIZE_SPARSE_OP(MR, \ - NR, 1, 4, prepacking_kernel, compute_kernel) -#define TEST_PACKED_8x1_SPARSE_OP(MR, NR, prepacking_kernel, compute_kernel) \ + NR, 1, 4, prepacking_kernel, compute_kernel_w32, compute_kernel_w16, compute_kernel_w8) +#define TEST_PACKED_8x1_SPARSE_OP(MR, NR, prepacking_kernel, compute_kernel_w32, compute_kernel_w16, compute_kernel_w8) \ TEST_PACKED_ROW_BLOCK_SIZEXCOL_BLOCK_SIZE_SPARSE_OP(MR, \ - NR, 8, 1, prepacking_kernel, compute_kernel) + NR, 8, 1, prepacking_kernel, compute_kernel_w32, compute_kernel_w16, compute_kernel_w8) #if CPUINFO_ARCH_ARM TEST_PACKED_1x4_SPARSE_OP( 4, 8, pytorch_q8gemm_sparse_packA_ukernel_4x4__aarch32_neon, - pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA__aarch32_neon) + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w32__aarch32_neon, + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w16__aarch32_neon, + pytorch_q8gemm_dq_sparse_1x4_ukernel_4x8_packedA_w8__aarch32_neon) TEST_PACKED_8x1_SPARSE_OP( 4, 8, pytorch_q8gemm_sparse_packA_ukernel_4x4__aarch32_neon, - 
pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA__aarch32_neon) + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w32__aarch32_neon, + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w16__aarch32_neon, + pytorch_q8gemm_dq_sparse_8x1_ukernel_4x8_packedA_w8__aarch32_neon) #endif @@ -633,12 +847,16 @@ TEST_PACKED_1x4_SPARSE_OP( 8, 8, pytorch_q8gemm_sparse_packA_ukernel_8x4__aarch64_neon, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA__aarch64_neon) + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w32__aarch64_neon, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w16__aarch64_neon, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x8_packedA_w8__aarch64_neon) TEST_PACKED_8x1_SPARSE_OP( 8, 8, pytorch_q8gemm_sparse_packA_ukernel_8x4__aarch64_neon, - pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA__aarch64_neon) + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w32__aarch64_neon, + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w16__aarch64_neon, + pytorch_q8gemm_dq_sparse_8x1_ukernel_8x8_packedA_w8__aarch64_neon) #endif @@ -646,367 +864,613 @@ TEST_PACKED_8x1_SPARSE_OP( TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(3).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(3); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_strided_a) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(3) - .aStride(37) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aStride(37); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_strided_c) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(3) - .cStride(17) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .cStride(17); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_qmin128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(3).qmin(128).test_packed( + 
GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(3) + .qmin(128); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_qmax128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(3).qmax(128).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(3) + .qmax(128); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_azp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(3) - .aZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_bzp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(3) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_4_nozp) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(3) .aZeroPoint(0) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + 
pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(5).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(5); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_strided_a) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(5) - .aStride(37) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aStride(37); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_strided_c) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(5) - .cStride(17) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .cStride(17); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_qmin128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(5).qmin(128).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(5) + .qmin(128); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_qmax128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(5).qmax(128).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(5) + .qmax(128); + 
tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_azp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(5) - .aZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_bzp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(5) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_lt_8_nozp) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(5) .aZeroPoint(0) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(8).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(8); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } 
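The hunks above and below repeat one mechanical refactor: the tester used to be a temporary whose builder chain ended in a single test_packed() call, and it is now bound to a named variable so the same configuration can be replayed against the three _w32/_w16/_w8 compute-kernel variants. The following minimal, self-contained sketch illustrates that pattern only; MiniTester, its setters, and the kernel names here are hypothetical stand-ins, not the actual GemmBlockSparseMicrokernelTester or QNNPACK kernel APIs.

    #include <cstdio>
    #include <cstddef>

    typedef void (*KernelFn)();

    // Hypothetical stand-ins for the prepacking and compute microkernels.
    void prepack_kernel()     { std::puts("  prepack"); }
    void compute_kernel_w32() { std::puts("  compute (w32)"); }
    void compute_kernel_w16() { std::puts("  compute (w16)"); }
    void compute_kernel_w8()  { std::puts("  compute (w8)"); }

    class MiniTester {
     public:
      MiniTester& k(std::size_t value)       { k_ = value; return *this; }
      MiniTester& aStride(std::size_t value) { a_stride_ = value; return *this; }
      // Runs one prepack/compute pair against the stored configuration.
      void test_packed(KernelFn prepack, KernelFn compute) const {
        std::printf("k=%zu aStride=%zu:\n", k_, a_stride_);
        prepack();
        compute();
      }
     private:
      std::size_t k_ = 8;
      std::size_t a_stride_ = 0;
    };

    int main() {
      // Old shape of the tests: the builder chain ends in a single
      // test_packed() call, so covering another kernel variant means
      // repeating the whole chain.
      MiniTester().k(8).aStride(37).test_packed(prepack_kernel, compute_kernel_w32);

      // New shape: bind the configured tester to a name once, then replay it
      // against every compute-kernel variant.
      MiniTester tester = MiniTester().k(8).aStride(37);
      const KernelFn variants[] = {compute_kernel_w32, compute_kernel_w16, compute_kernel_w8};
      for (KernelFn compute : variants) {
        tester.test_packed(prepack_kernel, compute);
      }
      return 0;
    }

Binding the builder result to an lvalue is what makes the reuse possible: in this sketch test_packed() returns void, mirroring how the diff never chains past it, so a chained temporary can only ever drive one kernel.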
TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_strided_a) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(8) - .aStride(37) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aStride(37); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_strided_c) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(8) - .cStride(17) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .cStride(17); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_qmin128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(8).qmin(128).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(8) + .qmin(128); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_qmax128) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(8).qmax(128).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(8) + .qmax(128); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_azp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(8) - .aZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + 
pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_bzp0) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(8) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_eq_8_nozp) { TEST_REQUIRES_X86_SSE2; - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(8) .aZeroPoint(0) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8) { TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(k).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(k); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_strided_a) { TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .aStride(37) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aStride(37); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_strided_c) { 
TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .cStride(17) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .cStride(17); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_azp0) { TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .aZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_bzp0) { TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_nozp) { TEST_REQUIRES_X86_SSE2; for (size_t k = 9; k < 16; k++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) .aZeroPoint(0) - .bZeroPoint(0) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .bZeroPoint(0); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } @@ -1015,16 +1479,22 @@ TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_subtile) { for (size_t k = 9; k < 16; k++) { for (uint32_t m = 1; m <= 8; m++) { for (uint32_t n = 1; n <= 4; n++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) 
.m(m) .n(n) .k(k) - .iterations(3) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .iterations(3); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } } @@ -1033,41 +1503,65 @@ TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_gt_8_subtile) { TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_div_8) { TEST_REQUIRES_X86_SSE2; for (size_t k = 16; k < 128; k += 8) { - GemmBlockSparseMicrokernelTester().mr(8).nr(4).m(8).n(4).k(k).test_packed( + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() + .mr(8) + .nr(4) + .m(8) + .n(4) + .k(k); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_div_8_strided_a) { TEST_REQUIRES_X86_SSE2; for (size_t k = 16; k < 128; k += 8) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .aStride(171) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .aStride(171); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_div_8_strided_c) { TEST_REQUIRES_X86_SSE2; for (size_t k = 16; k < 128; k += 8) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(8) .n(4) .k(k) - .cStride(17) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .cStride(17); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } @@ -1076,16 +1570,22 @@ TEST(Q8GEMM_8x4c1x4__SSE2, packedA_k_div_8_subtile) { for (size_t k = 16; k < 128; k += 24) { for (uint32_t m = 1; m <= 8; m++) { for (uint32_t n = 1; n <= 4; n++) { - GemmBlockSparseMicrokernelTester() + GemmBlockSparseMicrokernelTester tester = GemmBlockSparseMicrokernelTester() .mr(8) .nr(4) .m(m) .n(n) .k(k) - .iterations(3) - .test_packed( - pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, - 
pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA__sse2); + .iterations(3); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w32__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w16__sse2); + tester.test_packed( + pytorch_q8gemm_sparse_packA_ukernel_8x4__sse2, + pytorch_q8gemm_dq_sparse_1x4_ukernel_8x4_packedA_w8__sse2); } } } diff --git a/aten/src/ATen/native/quantized/cpu/qnormalization.cpp b/aten/src/ATen/native/quantized/cpu/qnormalization.cpp index ddfbad8917f7..f9b94ec4e49d 100644 --- a/aten/src/ATen/native/quantized/cpu/qnormalization.cpp +++ b/aten/src/ATen/native/quantized/cpu/qnormalization.cpp @@ -1,11 +1,16 @@ -#include +#include #include #include -#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + #include #include diff --git a/aten/src/ATen/native/quantized/cpu/qrelu.cpp b/aten/src/ATen/native/quantized/cpu/qrelu.cpp index e4ca887fb674..fcdfb0e9260c 100644 --- a/aten/src/ATen/native/quantized/cpu/qrelu.cpp +++ b/aten/src/ATen/native/quantized/cpu/qrelu.cpp @@ -1,6 +1,8 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include +#include #include #include #include @@ -10,6 +12,19 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp b/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp index 354590e211c7..862d2bad49dd 100644 --- a/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp +++ b/aten/src/ATen/native/quantized/cpu/qsigmoid.cpp @@ -1,15 +1,22 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include -#include -#include -#include #include #include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cpu/qtanh.cpp b/aten/src/ATen/native/quantized/cpu/qtanh.cpp index fde8f41630df..5dc3e759ede1 100644 --- a/aten/src/ATen/native/quantized/cpu/qtanh.cpp +++ b/aten/src/ATen/native/quantized/cpu/qtanh.cpp @@ -1,16 +1,19 @@ -#include -#include -#include -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include #include #include -#include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cpu/qthreshold.cpp b/aten/src/ATen/native/quantized/cpu/qthreshold.cpp index 6c1f10356d98..c2b03638c0ea 100644 --- a/aten/src/ATen/native/quantized/cpu/qthreshold.cpp +++ b/aten/src/ATen/native/quantized/cpu/qthreshold.cpp @@ -1,9 +1,16 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cuda/Activation.cpp b/aten/src/ATen/native/quantized/cuda/Activation.cpp index 3154c2a75dd0..3a9e400fa81b 100644 --- a/aten/src/ATen/native/quantized/cuda/Activation.cpp +++ b/aten/src/ATen/native/quantized/cuda/Activation.cpp @@ -1,5 +1,6 @@ #include #include +#include namespace at { namespace native { @@ -17,5 +18,13 @@ 
Tensor gelu_quantized_cuda(const Tensor& qx, c10::string_view approximate) { return at::quantize_per_tensor(result_fp32, qx.q_scale(), qx.q_zero_point(), qx.scalar_type()); } +Tensor relu_quantized_cuda(const Tensor& self) { + auto zero_point = self.q_zero_point(); + auto int_repr = self.int_repr(); + auto mask = (int_repr > zero_point); + const auto relu_int_repr = at::where(mask, int_repr, zero_point); + return at::_make_per_tensor_quantized_tensor(relu_int_repr, self.q_scale(), zero_point); +} + } // namespace at::native } // namespace at diff --git a/aten/src/ATen/native/quantized/cuda/Activation.cu b/aten/src/ATen/native/quantized/cuda/Activation.cu new file mode 100644 index 000000000000..9e3e3ba13ea6 --- /dev/null +++ b/aten/src/ATen/native/quantized/cuda/Activation.cu @@ -0,0 +1,21 @@ +#include +#include +#include + +namespace at { +namespace native { + +Tensor& relu_quantized_cuda_(Tensor& self) { + const auto zero_point = self.q_zero_point(); + AT_DISPATCH_QINT_TYPES( + self.scalar_type(), "qrelu_cuda", [&]() { + auto iter = TensorIterator::unary_op(self, self); + gpu_kernel(iter, [zero_point] GPU_LAMBDA(scalar_t value) -> scalar_t { + return scalar_t(std::max(value.val_, zero_point)); + }); + }); + return self; +} + +} // namespace at::native +} // namespace at diff --git a/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu b/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu index 6f251fc33502..c60dc57f9226 100644 --- a/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu +++ b/aten/src/ATen/native/quantized/cuda/AffineQuantizer.cu @@ -1,10 +1,20 @@ -#include -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { namespace { diff --git a/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu b/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu index 55b0b0d4f36d..0580c47b8c62 100644 --- a/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu +++ b/aten/src/ATen/native/quantized/cuda/EmbeddingBag.cu @@ -1,4 +1,6 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include #include #include @@ -7,6 +9,16 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu b/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu index e85622b3d4fa..3d340a303afb 100644 --- a/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu +++ b/aten/src/ATen/native/quantized/cuda/FakeQuantizeCore.cu @@ -1,7 +1,7 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include #include -#include #include #include #include diff --git a/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu b/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu index a448a7cca215..d75a10c0db89 100644 --- a/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu +++ b/aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu @@ -1,9 +1,21 @@ -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#include +#include +#include +#include +#endif + #include namespace at { diff --git a/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu 
b/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu index 497b94d020f3..082244ca0c85 100644 --- a/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu +++ b/aten/src/ATen/native/quantized/cuda/IntReprQuant.cu @@ -1,8 +1,17 @@ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include #include -#include -#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu b/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu index 82fc77735a94..ce5a54ceec16 100644 --- a/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu +++ b/aten/src/ATen/native/quantized/cuda/MakePerTensorQuantizedTensor.cu @@ -1,7 +1,20 @@ -#include -#include +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS +#include +#include +#include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + namespace at { namespace native { diff --git a/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp b/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp index dce78c4bb294..fbb46b4b0174 100644 --- a/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp +++ b/aten/src/ATen/native/quantized/cudnn/BinaryOps.cpp @@ -18,6 +18,13 @@ #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#endif + #include namespace at { @@ -64,10 +71,10 @@ std::unordered_map> PackedConvWeightCudnn< int64_t groups, bool transpose) { // TODO: need to check out to implement groups for conv operator in Conv.cpp - TORCH_CHECK(groups == 1, "Quantized cudnn conv2d is currenty limited to groups = 1; received groups =", groups); + TORCH_CHECK(groups == 1, "Quantized cudnn conv2d is currently limited to groups = 1; received groups =", groups); TORCH_CHECK(weight.qscheme() == c10::kPerTensorAffine, "Unsupported qscheme: ", toString(weight.qscheme())); TORCH_CHECK( kSpatialDim == 2, // 1D is packed as 2d, hence we don't need other checks diff --git a/aten/src/ATen/native/quantized/cudnn/utils.h b/aten/src/ATen/native/quantized/cudnn/utils.h index 1a58e8f38456..4e5f663efa16 100644 --- a/aten/src/ATen/native/quantized/cudnn/utils.h +++ b/aten/src/ATen/native/quantized/cudnn/utils.h @@ -19,6 +19,12 @@ This file contains some of the auxiliary functions used by both Conv.cpp & Linea #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#endif + struct PackedLinearWeightCudnn : public LinearPackedParamsBase { PackedLinearWeightCudnn( at::Tensor orig_weight, @@ -207,7 +213,11 @@ uint8_t getAlignment(const at::Tensor &t) { // alignment are in bytes uint8_t alignment = 1; uintptr_t address = reinterpret_cast(t.data_ptr()); - while (address % alignment == 0 && alignment < 16) alignment *= 2; + for (; alignment < 16; alignment *= 2) { + if (address % (alignment * 2)) { + return alignment; + } + } return alignment; } diff --git a/aten/src/ATen/native/quantized/qconv_unpack.cpp b/aten/src/ATen/native/quantized/qconv_unpack.cpp index 41f4754e8f1b..90e210ebe227 100644 --- a/aten/src/ATen/native/quantized/qconv_unpack.cpp +++ b/aten/src/ATen/native/quantized/qconv_unpack.cpp @@ -7,9 +7,12 @@ The implementations for the unpack functions can be found in /cpu/qconv_unpack_i and /cudnn/ConvUnpackImpl.cpp, for cudnn. 
*/ +#define TORCH_ASSERT_ONLY_METHOD_OPERATORS #include -#include +#include +#include +#include #include #include #include @@ -17,6 +20,15 @@ and /cudnn/ConvUnpackImpl.cpp, for cudnn. #include #include +#ifndef AT_PER_OPERATOR_HEADERS +#include +#else +#include +#include +#include +#endif + + namespace at { namespace native { namespace { @@ -36,7 +48,8 @@ class QConvUnpackWeightsInt8 final { auto& ctx = at::globalContext(); #ifdef USE_FBGEMM - if (ctx.qEngine() == at::QEngine::FBGEMM) { + if (ctx.qEngine() == at::QEngine::FBGEMM || + ctx.qEngine() == at::QEngine::X86) { return packed_weight->unpack(); } #endif @@ -72,7 +85,8 @@ class QConv1dUnpackWeightsInt8 final { at::Tensor weight; c10::optional bias; #ifdef USE_FBGEMM - if (ctx.qEngine() == at::QEngine::FBGEMM) { + if (ctx.qEngine() == at::QEngine::FBGEMM || + ctx.qEngine() == at::QEngine::X86) { std::tie(weight, bias) = packed_weight->unpack(); weight = weight.squeeze_(quant_utils::kConv1dSqueezeDim + 2); return std::tuple>(weight, bias); diff --git a/aten/src/ATen/native/sparse/Macros.h b/aten/src/ATen/native/sparse/Macros.h new file mode 100644 index 000000000000..7dac5b04e6f8 --- /dev/null +++ b/aten/src/ATen/native/sparse/Macros.h @@ -0,0 +1,19 @@ +#pragma once + +#if defined(__CUDACC__) || defined(__HIPCC__) +#define GPUCC +#define FUNCAPI __host__ __device__ +#define INLINE __forceinline__ +#else +#define FUNCAPI +#define INLINE inline +#endif + +#if defined(_WIN32) || defined(_WIN64) +// Temporarily disable __restrict on Windows, +// as it turns out not all MSVC versions are aware of it. +// #define RESTRICT __restrict +#define RESTRICT +#else +#define RESTRICT __restrict__ +#endif diff --git a/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionCommon.h b/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionCommon.h new file mode 100644 index 000000000000..08ba4de68cac --- /dev/null +++ b/aten/src/ATen/native/sparse/SparseBinaryOpIntersectionCommon.h @@ -0,0 +1,585 @@ +#pragma once + +#include +#include +#include +#include +#include +#include + +#ifndef AT_PER_OPERATOR_HEADERS +#include +#include +#else +#include +#include +#include +#include +#include +#endif + +#ifdef GPUCC +#define NAME "sparse_binary_op_intersection_cuda" +#else +#define NAME "sparse_binary_op_intersection_cpu" +#endif + +#define CALL(...) __VA_ARGS__(); +#define EXPAND(b, n, ...) \ + if (b) { \ + using index_t ## n = int32_t; \ + __VA_ARGS__ \ + } \ + else { \ + using index_t ## n = int64_t; \ + __VA_ARGS__ \ + } +#define BOOL_TO_INDEX_TYPE1(b0, ...) \ + EXPAND(b0, 0, CALL(__VA_ARGS__)) +#define BOOL_TO_INDEX_TYPE2(b1, b0, ...) \ + EXPAND(b1, 1, BOOL_TO_INDEX_TYPE1(b0, __VA_ARGS__)) +#define BOOL_TO_INDEX_TYPE3(b2, b1, b0, ...) \ + EXPAND(b2, 2, BOOL_TO_INDEX_TYPE2(b1, b0, __VA_ARGS__)) + +namespace at { +namespace native { + +namespace { + +using at::sparse::get_sparse_impl; + +// ForwardIt: only legacy random access iterator is supported. +template +static FUNCAPI INLINE +ForwardIt find_bound(ForwardIt first, ForwardIt last, const T& value) { + ForwardIt RESTRICT it; + typename std::iterator_traits::difference_type count, step; + // NOTE: std::distance(first, last) compiles but produces wrong results on CUDA, + // so only legacy random access iterators are safe in this code. + count = last - first; + + while (count > 0) { + it = first; + step = count / 2; + // avoiding std::advance(it, step), + // although it does work unlike std::distance on CUDA. + it += step; + // The decision which separates finding a lower bound vs an upper bound. 
+ // Note that a lower bound is a value at *it with the smallest index + // such that *it >= value if such value exists, or last if it does not. + // Similarly, an upper bound is a value at *it with the smallest index + // such that *it > value if such value exists, or last if it does not. + // If is_lower = true and *it < value, then we know that *it and values + // preceding *it cannot contain a lower bound, so we adjust the initial iterator range + // from [first, first + count] to [first + step + 1, first + count - (step + 1)], + // where +1 skips the element at which we have just evaluated *it < value. + // Similar logic holds when is_lower = false. + if (is_lower ? *it < value : value >= *it) { + first = ++it; + count -= step + 1; + } + else { + count = step; + } + } + return first; +} + +template