Add ROCm5.2/AMDGPU support for PyTorch 1.10 #1

Closed
wants to merge 59 commits

Conversation

WBobby
Owner

@WBobby WBobby commented Jul 14, 2022

Fixes #ISSUE_NUMBER

malfet and others added 30 commits September 21, 2021 16:16
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap (see the sketch below).
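
For illustration, a minimal sketch of that kind of change, assuming the internal entry point is `torch._vmap_internals.vmap` (a private module, not a stable API):

```
import torch
from torch._vmap_internals import vmap  # internal API; subject to change

# batched dot product over the leading dimension, without an explicit loop
x = torch.randn(8, 5)
y = torch.randn(8, 5)
batched_dot = vmap(torch.dot)  # maps torch.dot over dim 0 of both inputs
print(batched_dot(x, y).shape)  # torch.Size([8])
```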

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap), and also waiting on CI.
…rch#65835)

Summary:
Pull Request resolved: pytorch#65721

Closes: pytorch#65696

The bug was introduced in pytorch#55861, and it causes a 100X slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208

Co-authored-by: Yi Wang <wayi@fb.com>
…on for IterableWrapper (pytorch#65220) (pytorch#65924)

Summary:
Pull Request resolved: pytorch#65220

Fixes pytorch#65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy the wrapped iterable within each iterator, so in-place modification cannot yield different data per epoch (see the sketch below)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (avoid using `dp`, which is a module reference)
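
A minimal sketch of the deepcopy-per-iterator behavior described above, assuming the `IterableWrapper` exposed under `torch.utils.data.datapipes.iter`:

```
from torch.utils.data.datapipes.iter import IterableWrapper

data = [[1], [2], [3]]
dp = IterableWrapper(data)  # each iterator works on a deepcopy of `data`
for item in dp:
    item.append(0)          # in-place modification of the yielded elements
print(list(dp))             # a fresh iterator gets a fresh copy: [[1], [2], [3]]
```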

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
…rch#65979)

Summary:
Pull Request resolved: pytorch#65934

See pytorch#65931; this was a suggested remediation on the linked issue.

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738

Co-authored-by: Michael Suo <suo@fb.com>
Summary:
Fixes pytorch#66030

Pull Request resolved: pytorch#66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
…ytorch#66155)

Summary:
Reported by cloudhan in pytorch#64733 (comment)

Fixes regression introduced by pytorch@047e682

cc malfet seemethere

Pull Request resolved: pytorch#65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
Compare operator list against RC1 build rather than against nightly
Summary:
Fixes pytorch#65988

Pull Request resolved: pytorch#66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260

Co-authored-by: Tao Xu <taox@fb.com>
…torch#65932)

* Unify the output pathname of archive reader and extractor (pytorch#65424)

Summary:
Pull Request resolved: pytorch#65424

This PR is a re-implementation of https://github.com/facebookexternal/torchdata/pull/93
The same PR has landed in torchdata as https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65

* [DataPipe] add deprecation warnings for DataPipes that will solely exist in TorchData (pytorch#65827)

Summary: Pull Request resolved: pytorch#65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02

* [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (pytorch#65877)

Summary:
Pull Request resolved: pytorch#65877

Fixes pytorch#65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4

Co-authored-by: Erjia Guan <erjia@fb.com>
…ytorch#65495) (pytorch#65755)

* Added option to update parameters using state_dict in AveragedModel (pytorch#65495)

Summary:
While implementing EMA in torchvision (pytorch/vision#4381, which extends AveragedModel), update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the EMA class (pytorch/vision#4406). This PR handles that scenario, removing the need for the custom update_parameters() implementation.

Discussion: pytorch/vision#4406 (review)
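
For context, a minimal sketch of the EMA-style usage this enables, assuming the option landed as the `use_buffers` flag on `torch.optim.swa_utils.AveragedModel`:

```
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.BatchNorm1d(10))

# EMA with decay 0.9; use_buffers=True also averages buffers (e.g. BatchNorm
# running stats) via state_dict(), which plain parameter averaging missed
ema = AveragedModel(
    model,
    avg_fn=lambda avg, new, num_averaged: 0.9 * avg + 0.1 * new,
    use_buffers=True,
)

for _ in range(5):
    ...  # optimizer step on `model` goes here
    ema.update_parameters(model)
```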

Pull Request resolved: pytorch#65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b)

* Added validation of mode parameter in AveragedModel (pytorch#65921)

Summary:
Discussion: pytorch#65495 (comment)

Pull Request resolved: pytorch#65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc)
…65926)

Summary:
Pull Request resolved: pytorch#63646

Fixes pytorch#63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (pytorch#61702) (pytorch#64370)

Summary:
Pull Request resolved: pytorch#64370

As of now, the "_retain_param_name" parameter has no description on the PyTorch docs website. According to the code, this argument determines whether we keep the original parameter names of the PyTorch model in the final ONNX graph. If it is False, those original parameter names are replaced with a series of integers starting from 1.

Since setting numbers as parameter names makes no sense to users, we remove this argument from the torch.onnx.export() function to improve the user experience of calling it.

This PR still keeps the argument in torch.onnx.export() for backward compatibility, while all backend logic has been changed to behave as if _retain_param_name were set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (pytorch#61712) (pytorch#64371)

Summary:
Pull Request resolved: pytorch#64371

As of now, the "strip_doc_string" parameter is described as below:

strip_doc_string (bool, default True): do not include the field `doc_string` from the exported model. Otherwise the field will mention the source code locations for `model`.

This is usually useless to users who want to transform a PyTorch model into an ONNX one. Only when the user wants to debug the export process could these source-code locations provide benefits.

To make the export() function friendlier by providing fewer parameters, we combined "strip_doc_string" into the "verbose" parameter. If a user sets verbose to True, it means they need some log information for debugging the export process, which is similar to the purpose of the strip_doc_string parameter.

But the usages of these two arguments are opposite: setting verbose to True means we want to print log information to help debugging, which means strip_doc_string should be False. This is how we replace strip_doc_string with the verbose argument in this PR.

This PR still keeps strip_doc_string in the torch.onnx.export() function for backward compatibility, while its behavior has been folded into the verbose argument.
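
A minimal sketch of the resulting call, using a stand-in model:

```
import torch

model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)

# verbose=True now also keeps the doc_string field (source-code locations)
# in the exported graph, subsuming the old strip_doc_string=False
torch.onnx.export(model, dummy_input, "model.onnx", verbose=True)
```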

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] minor doc improvements and cleanup (pytorch#62514) (pytorch#64373)

Summary:
Pull Request resolved: pytorch#64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3

* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (pytorch#62815) (pytorch#64380)

Summary:
Pull Request resolved: pytorch#64380

* `example_outputs` used to determine the type and shape of the outputs without tracing the execution of the model, and it had to be provided when exporting a ScriptModule or ScriptFunction with the export() function.

* Since we can work out `example_outputs` internally instead of requiring the user to provide it, we deprecated this argument in the export() function to improve the user experience of calling it (see the sketch below).
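
A minimal sketch of exporting a ScriptModule without the deprecated argument, using a stand-in model:

```
import torch

scripted = torch.jit.script(torch.nn.Linear(4, 2))
x = torch.randn(1, 4)

# example_outputs no longer needs to be passed; the output type and shape
# are worked out internally during export
torch.onnx.export(scripted, x, "scripted.onnx")
```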

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (pytorch#62257) (pytorch#64382)

Summary:
Pull Request resolved: pytorch#64382

* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files rather than in the ONNX model file itself.

* This PR marks the parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, `use_external_data_format = True` is applied automatically.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* fix clang-tidy error introduced by pytorch#64382 (pytorch#65977)

Summary: Pull Request resolved: pytorch#65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
* fix cosine similarity dimensionality check

* fix shapes in the doc
…rch#66629)

Summary:
Fixes pytorch#66353

Fixes #{issue number}

Pull Request resolved: pytorch#66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
…ix .tolist() for conjugated and negated tensors (pytorch#66082) (pytorch#66576)

Summary:
Pull Request resolved: pytorch#66082

Fixes pytorch#66024 pytorch#65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
pytorch#66642)

* Disable .numpy() and .tolist() for tensor subclasses subclasses and fix .tolist() for conjugated and negated tensors (pytorch#66082)

Summary:
Pull Request resolved: pytorch#66082

Fixes pytorch#66024 pytorch#65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
* Handle shared memory cases in MathBitFallback (pytorch#63602)

Summary:
Pull Request resolved: pytorch#63602

This PR fixes the case when a read and a write are performed on memory shared between mutable and (or) non-mutable arguments. Example:
```
import torch

a = torch.tensor([1 + 1j])
b = a.conj()
b.add_(a)  # should return tensor([2]) but returns tensor([2-2j])
```

The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments, which can be a problem, as shown above, when other input arguments share memory with the mutable argument(s).
This PR fixes the issue by:
1. First scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this point we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before (a Python-level sketch of this aliasing check follows below).
3. Looking through the `mutable_inputs` vector (which contains only mutable input tensors with the conj bit set to `True`) and in-place conjugating each of its entries.
4. Doing the computation.
5. Re-conjugating the mutable argument tensors.
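
A Python-level sketch of the aliasing check in step 2, using storage identity as a crude stand-in for the C++ overlap helper:

```
import torch

def may_alias(t1, t2):
    # stand-in for the C++ check: tensors backed by the same storage may overlap
    return t1.storage().data_ptr() == t2.storage().data_ptr()

a = torch.tensor([1 + 1j])
b = a.conj()            # lazy conjugate view over the same storage
print(may_alias(a, b))  # True: the non-mutable arg must be cloned rather than
                        # conjugated in place
```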

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes pytorch#59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b

* fix lint (pytorch#66572)

Summary: Pull Request resolved: pytorch#66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
Co-authored-by: Michael Suo <suo@fb.com>
…ytorch#66662)

Summary:
Pull Request resolved: pytorch#66182

closes pytorch#63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border at most 60 chars in length and left-justifies the header

NOTE: you HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because Python exception propagation does NOT work out of the box across IPC, hence the extra record annotation).
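
A minimal sketch of the required annotation (the body of `main` is hypothetical):

```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    # training entry point; with @record, an uncaught exception is written to
    # the error file and surfaces in the report below
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```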

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
…ytorch#53177)"

- This reverts commit a0d1e70.
- Reverting this commit, since it is causing a regression for detectron2
- Please check SWDEV-304968 for more info.
* Add amdgpu repos for rocm install

* Correct ROCm version
jithunnair-amd and others added 22 commits February 7, 2022 18:12
Signed-off-by: Wang, Yanyao <yanyao.wang@amd.com>

Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
* Various hipify-related fixes:
1. Fix JIT path of building PyTorch extensions
2. Use absolute paths for all files to allow for absolute paths in
includes/ignores
3. Limit hipification to build_dir for non-JIT path
4. Ignore ROCm/PyTorch headers during hipification of
header_include_dirs for JIT path
5. Update hipify output with clearer status
6. Don't include files ignored by hipify in output
7. Define HIP flags in cflags for JIT path as well
8. Ensure includes and ignores are passed in as absolute paths for pytorch build; explicitly require relative paths for certain helper functions
Summary:
This reverts commit 9e8016d.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: pytorch#68008

Reviewed By: H-Huang

Differential Revision: D32254779

Pulled By: ngimel

fbshipit-source-id: 38ec415199f62a1e58000abe3e34ac91898a94ae
Co-authored-by: Wang, Yanyao <yanyao.wang@amd.com>
Revert d5ca53c (pytorch#46097). The changes only affect ROCm. This reverts a work-around for a compiler performance issue that is no longer needed.

`python -m pt.cat_test --tag_filter all --device cuda`

```

OLD Forward Execution Time (us) : 48.833
NEW Forward Execution Time (us) : 8.318

OLD Forward Execution Time (us) : 54.508
NEW Forward Execution Time (us) : 23.824

OLD Forward Execution Time (us) : 52.117
NEW Forward Execution Time (us) : 14.942

OLD Forward Execution Time (us) : 98.790
NEW Forward Execution Time (us) : 74.334

OLD Forward Execution Time (us) : 102.063
NEW Forward Execution Time (us) : 76.008

OLD Forward Execution Time (us) : 167.786
NEW Forward Execution Time (us) : 123.679

OLD Forward Execution Time (us) : 98.320
NEW Forward Execution Time (us) : 67.436

OLD Forward Execution Time (us) : 91.484
NEW Forward Execution Time (us) : 59.230

OLD Forward Execution Time (us) : 109.569
NEW Forward Execution Time (us) : 76.557

OLD Forward Execution Time (us) : 106.603
NEW Forward Execution Time (us) : 87.635

OLD Forward Execution Time (us) : 106.693
NEW Forward Execution Time (us) : 88.902

OLD Forward Execution Time (us) : 110.881
NEW Forward Execution Time (us) : 94.361

OLD Forward Execution Time (us) : 122.925
NEW Forward Execution Time (us) : 123.046

OLD Forward Execution Time (us) : 272.442
NEW Forward Execution Time (us) : 271.932

OLD Forward Execution Time (us) : 457.329
NEW Forward Execution Time (us) : 456.767

OLD Forward Execution Time (us) : 117.688
NEW Forward Execution Time (us) : 87.133

OLD Forward Execution Time (us) : 873.764
NEW Forward Execution Time (us) : 865.075

OLD Forward Execution Time (us) : 1746.831
NEW Forward Execution Time (us) : 1730.252

OLD Forward Execution Time (us) : 2619.303
NEW Forward Execution Time (us) : 2598.717

OLD Forward Execution Time (us) : 52.063
NEW Forward Execution Time (us) : 7.904

OLD Forward Execution Time (us) : 52.275
NEW Forward Execution Time (us) : 8.118

OLD Forward Execution Time (us) : 51.896
NEW Forward Execution Time (us) : 7.938

OLD Forward Execution Time (us) : 51.745
NEW Forward Execution Time (us) : 7.922

OLD Forward Execution Time (us) : 52.575
NEW Forward Execution Time (us) : 13.299

OLD Forward Execution Time (us) : 52.090
NEW Forward Execution Time (us) : 8.015
```
Pull Request resolved: pytorch#74129
Approved by: https://github.com/ngimel
Properly import LooseVersion (pytorch#69904)

Summary:
This fixes regression introduced by pytorch#57040

Somehow importing `distutils` from `setuptools` caused an import of
`distutils.version`, which is not a documented dependency and got
changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0).
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```
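
The robust pattern, sketched, is to import the submodule explicitly instead of relying on setuptools having re-imported it:

```
from distutils.version import LooseVersion  # explicit submodule import

assert LooseVersion("1.9.1") < LooseVersion("1.10.0")
```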

Pull Request resolved: pytorch#69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
…evert_ncclAllToAll

Deactivate ncclAllToAll
Fixes nightly libtorch builds. As of ROCm 5.1.x, all *.cmake files are under /opt/rocm/lib/cmake/<package> instead of /opt/rocm/<package>/lib/cmake.
Pull Request resolved: pytorch#77087
Approved by: https://github.com/seemethere
@WBobby WBobby closed this Jul 14, 2022
@WBobby
Owner Author

WBobby commented Jul 14, 2022

test

WBobby pushed a commit that referenced this pull request Aug 18, 2022
…78136) (pytorch#78204)

This prevents `import torch` from accidentally crashing on machines with no Metal devices

Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq   %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw   $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq   $0x8, %rsp
    0x10fd9f538 <+456>: popq   %rbx
(lldb) disassemble
 ...
    0x10fd9f514 <+420>: movq   0xf19ad15(%rip), %rsi     ; "maxBufferLength"
    0x10fd9f51b <+427>: movq   %r14, %rdi
    0x10fd9f51e <+430>: callq  *0xeaa326c(%rip)          ; (void *)0x00007fff7202be40: objc_msgSend
```

which corresponds to the `[m_device maxBufferLength]` call, where `m_device` is not initialized in
https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171

Pull Request resolved: pytorch#78136
Approved by: https://github.com/seemethere

Co-authored-by: Nikita Shulga <nshulga@fb.com>
@WBobby WBobby deleted the release/1.10 branch September 29, 2022 15:17
WBobby pushed a commit that referenced this pull request Jan 3, 2023
…78136)

WBobby pushed a commit that referenced this pull request Jan 3, 2023
… of libtorch_python (pytorch#78028)

Summary:
This moves torch::class_<WorkerInfo> into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work toward getting torch::deploy to load an unmodified copy of libtorch. Current RPC is incompatible due to duplicate registrations.

```
unknown file: Failure
C++ exception with description "Exception Caught inside torch::deploy embedded library:
Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once.
Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: [0x7f3ba45cfea1]
frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy)
frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy)
frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy)
frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg)
frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg)
frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy)
frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy)
frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy)
frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy)
frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy)
frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy)
frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy)
frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy)
frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy)
frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy)
frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy)
frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy)
frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy)
frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy)
frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy)
frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy)
frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy)
frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy)
frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6)
frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6)
frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy)
```

Test Plan: CI

Differential Revision: D36564258

Pull Request resolved: pytorch#78028
Approved by: https://github.com/rohan-varma
WBobby pushed a commit that referenced this pull request Jan 3, 2023
…ytorch#78276)

Fixes ROCm#325
**Summary**: Currently, the pytorchbot only allows rebasing onto the master branch. These modifications add functionality for rebasing onto the 'viable/strict' branch of pytorch/pytorch by adding a flag to the comment.
**Test Plan:** Tested manually on a personal fork (swang392#1), and included a test case in test_tryrebase.py that checks that rebasing onto the viable/strict branch succeeds.
Pull Request resolved: pytorch#78276
Approved by: https://github.com/clee2000, https://github.com/janeyx99
WBobby pushed a commit that referenced this pull request Jan 3, 2023
… to conform with non-quantized counterpart filenames

Summary:
Names of analogous files in the quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes
all files in the quantized dir (and its sub-directories) to pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the below error (renaming `qlinear_unpack.cpp` also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined-behavior errors and will revisit renaming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: pytorch#77037

Approved by: https://github.com/jerryzh168