
[DO NOT MERGE][Unity][Merge] Test merge main to unity #14416

Closed
wants to merge 29 commits

Conversation

csullivan
Contributor

No description provided.

cblmemo and others added 28 commits March 20, 2023 18:02
Introducing MemHammer

Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Fix data type and add minimal reproducible test.

Co-authored-by: Sunghyun Park <spark@octoml.ai>
Goes to the latest revision in the same major version (22.12.0). 

23.1.0 is released but it involves some style changes so we would need to reformat the entire codebase. 

I need 22.12.0 to properly process some files touched in this PR:

#14167

where black cannot parse the files with the current version, but can with the updated version.
This patch just exposes an existing analysis API, IsOutputBlock, to
Python. Since many schedule primitives have conditions on output blocks,
this API is useful while scheduling.
* [BugFix] Support rewrite_once when the number of callbacks > 1

* callbacks_map -> done, swapping false and true

---------

Co-authored-by: Bin Li <binli1@amd.com>
Short desc
This change allows me to compile and tune models for Hexagon directly from my macOS laptop without fully switching to a Linux environment.

List of changes

Replace the local linker call with a call into a Docker container that has the Hexagon SDK. The linker is the only SDK tool used by TVM during compilation.
Enhanced search for ADB: not only in PATH, but also in ANDROID_HOME, ANDROID_SDK_ROOT, and the default SDK installation directory. macOS does not make it easy to change the default PATH environment variable for UI applications launched from the dock, so adb is not available to IDEs by default.
Motivation
Some engineers would like to keep working in a comfortable macOS environment even when they have to work with Hexagon devices. At the moment there is no official Hexagon SDK for macOS. The alternatives are: fully switch to a remote Linux machine, use a local Linux virtual machine, or port the required Hexagon SDK functionality to macOS. The last option is the most preferable to me.

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
# Motivation
Currently, we lack a schedule primitive to change the data type of an allocated buffer (e.g. one created via `cache_read`/`cache_write`), and thus we cannot perform type conversion while loading data from global to shared memory.

This PR adds a new schedule primitive `set_dtype` that follows the interface of `set_scope` and allows users to customize the allocated buffers' data type.

# Example
Before running `set_dtype`:
```python
@T.prim_func
def before_set_dtype(
    A: T.Buffer((128, 128), "float32"), C: T.Buffer((128, 128), "float32")
) -> None:
    B = T.alloc_buffer((128, 128), dtype="float32")

    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj] * 2.0
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = B[vi, vj] + 1.0
```
then we perform the `set_dtype` schedule:
```python
sch = tir.Schedule(before_set_dtype)
sch.set_dtype("B", buffer_index=0, dtype="float16")
print(sch.mod["main"].script())
```
we get transformed code:
```python
@T.prim_func
def after_set_dtype(
    A: T.Buffer((128, 128), "float32"), C: T.Buffer((128, 128), "float32")
) -> None:
    B = T.alloc_buffer((128, 128), dtype="float16")

    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = T.cast(A[vi, vj] * 2.0, "float16")
    for i, j in T.grid(128, 128):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = T.cast(B[vi, vj], "float32") + 1.0
```
where data type conversions are inserted automatically.

# Other Usage
Using the combination of `cache_read` + `set_dtype` can help us load data from the memory hierarchy while converting data to the desired type.
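As an illustration, the before/after behavior of this transformation can be mimicked in plain NumPy (a sketch only, not TVM API; the function names here are made up):

```python
import numpy as np

def compute_f32(A):
    # All-float32 pipeline, as in before_set_dtype.
    B = A * np.float32(2.0)
    return B + np.float32(1.0)

def compute_f16_intermediate(A):
    # Intermediate buffer B stored as float16, as in after_set_dtype;
    # casts sit at the producer and consumer, mirroring the inserted T.cast calls.
    B = (A * np.float32(2.0)).astype(np.float16)
    return B.astype(np.float32) + np.float32(1.0)

A = np.random.rand(128, 128).astype(np.float32)
# Results agree up to float16 rounding error on the intermediate buffer.
assert np.allclose(compute_f32(A), compute_f16_intermediate(A), rtol=1e-2, atol=1e-2)
```

The point of the sketch is that only the storage dtype of `B` changes; the schedule primitive inserts the conversions so the observable result stays (approximately) the same.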
This PR adds test harness for mlperftiny submission using microTVM.
* [TIR][Utility] More flexible tir::Substitute arguments

Previously, the `tir::Substitute` method had overloads that supported
a few ways of providing the variable map (e.g. `const
Map<Var,PrimExpr>&`, `std::unordered_map<const VarNode*, PrimExpr>&`,
etc.), delegating out to the overload that uses
`std::function<Optional<PrimExpr>(const Var&)>`.  However, the types
supported for the variable map depended on the type being
substituted (e.g. only supporting `const Map<Var,PrimExpr>&` with
substituting into a `Array<Range>`), which would be unexpected to new
developers.

This PR makes the `tir::Substitute` utility more uniform in the
arguments that it accepts.

* For any type that is supported by `tir::Substitute`, `Array<T>` is
  also supported.

* Any variable mapping type can be used with any substitution type.
  All variable mapping types are normalized to
  `std::function<Optional<PrimExpr>(const Var&)>`.

* For `Map` and `std::unordered_map` arguments, the value type may be
  any subclass of `PrimExpr` (e.g. `Map<Var, Var>` instead of
  `Map<Var, PrimExpr>`).  Previously, the calling scope needed to
  either construct a temporary map that returned `PrimExpr`, or to use
  a broader value type in the map than otherwise required.

The initial and primary goal was to allow a `Map<Var, Var>` to be used
as an argument to `tir::Substitute`, rather than a `Map<Var,
PrimExpr>`; making the utility more general was more
straightforward than adding multiple overloads specifically for
`Map<Var, Var>`.

* fix linting errors

* Fix copy/paste error in docstring
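The normalization described above can be pictured with a small Python sketch over a toy expression type (hypothetical names; the real utility is C++ and operates on TIR). Every accepted variable-map form is converted to a single lookup-function form before substitution, mirroring the normalization to `std::function<Optional<PrimExpr>(const Var&)>`:

```python
from typing import Callable, Optional, Union

# Toy expression: either a variable name or a tuple ('add', lhs, rhs).
Expr = Union[str, tuple]
VarMap = Union[dict, Callable[[str], Optional[Expr]]]

def as_lookup(vmap: VarMap) -> Callable[[str], Optional[Expr]]:
    """Normalize any accepted map type to the single function form."""
    if callable(vmap):
        return vmap
    return lambda var: vmap.get(var)

def substitute(expr: Expr, vmap: VarMap) -> Expr:
    """Replace variables for which the lookup returns a value."""
    lookup = as_lookup(vmap)
    if isinstance(expr, str):
        replacement = lookup(expr)
        return expr if replacement is None else replacement
    op, lhs, rhs = expr
    return (op, substitute(lhs, vmap), substitute(rhs, vmap))

# Both a dict and a function can drive the same substitution.
e = ("add", "x", ("add", "y", "x"))
assert substitute(e, {"x": "a"}) == ("add", "a", ("add", "y", "a"))
assert substitute(e, lambda v: "b" if v == "y" else None) == ("add", "x", ("add", "b", "x"))
```

Because every caller path funnels through one lookup form, any map whose values are a subtype of the expression type works without constructing a temporary widened map, which is the uniformity the commit describes.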
…14376)

* [Hexagon] Allow scalar tensors to have null shape during allocation.

* Add unit test for new case (scalar and null shape)
Previously, while TVMScript introduced a new scope for other
contexts (e.g. `for`, `while`, `with`, etc.), the `if` and `else`
blocks did not introduce a new scope.  This caused erroneous parsing
errors if the `if` and `else` blocks each contained a variable with
the same name.  Adding a `self.var_table.with_frame()` context resolves
this issue.
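The fix can be pictured with a toy scoped symbol table (a simplified sketch loosely modeled on the parser's `var_table`, not the actual TVMScript implementation): each `with_frame()` pushes a scope, so a name defined in the `if` branch does not leak into the `else` branch.

```python
from contextlib import contextmanager

class VarTable:
    """Toy scoped symbol table: a stack of name->value frames."""
    def __init__(self):
        self.frames = [{}]

    @contextmanager
    def with_frame(self):
        # Push a fresh scope on entry, pop it on exit.
        self.frames.append({})
        try:
            yield
        finally:
            self.frames.pop()

    def define(self, name, value):
        self.frames[-1][name] = value

    def lookup(self, name):
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        return None

vt = VarTable()
with vt.with_frame():      # if-branch scope
    vt.define("tmp", "if_value")
    assert vt.lookup("tmp") == "if_value"
with vt.with_frame():      # else-branch scope: no leakage from the if branch
    assert vt.lookup("tmp") is None
    vt.define("tmp", "else_value")
    assert vt.lookup("tmp") == "else_value"
```

Without the per-branch frame, the second definition of `tmp` would collide with the first, which is exactly the spurious parse error the commit fixes.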
* initial basis

* Generated all the tile sizes

* is this all you need?

* linting

lint

move schedule rule to own file

lint p2

layout transform fixings

* forgot to forward arg

* fix tests

* reduce search space

* lint

* schedule rule documentation

* add a note

* fix wording

* handle implicit reshape case v1

* clean up comments

* address comments

* testing harness

* more progress on testing harness

* fix case where shape changes in mod

* inline after schedule generation to help analysis

* proper autoinlining INTO layout transform block to maintain extents

* clean up

* reindex for introducing cache block

* reorganize testing

* more cleanup

* remove forced false

* use the proper dispatcher

* update test, make default schedule rule None

* linting

* fix mypy errors

* clean up

* manual test cases

* manual tests

* add comment, fix improper implicit reshape handling

* fix

* remove extra comments

* more lints

* refactor

* remove extraneous check

* lint again :/

* remove unneeded newline

* remove leading spaces
`Load` and `Store` were deprecated in [RFC0039](https://github.com/apache/tvm-rfcs/blob/main/rfcs/0039-buffer-physical-layout.md). It has been several releases since the deprecation and they are no longer used in the codebase, so now it is time to remove them.
This PR fixes the TVMScript printer to print `T.bool(True)` or `T.bool(False)` for `IntImm(dtype=bool)`. So the parser is able to parse it back.
Updated the error message to state which PrimFunc has a malformed
pipeline annotation, the blocks found in that primfunc, and the
pipeline annotation found.
Previously, the legalization was only handled by propagating the dtype
of the indices to the transformed indices.  As a result, output
indices whose value did not depend on the input index would be left
with the incorrect dtype.
If a variable is undefined within a PrimFunc, include the name of the
PrimFunc in the error message.  Otherwise, in an IRModule with multiple
functions, it may be unclear which PrimFunc is malformed.
…s as attr (#14379)

* not estimating the flops when there is a default estimated flops as attr

* add unittests

* lint fix

* make unittest simpler
Enable USE_MICRO for macOS and Windows CI builds to ensure that building the code doesn't break for those platforms.
* [HEX] Enhanced vector lanes for some intrinsics

* fix pylint

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>

* fix lint 2

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>

* Fix typo

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>

---------

Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
Upgrading ethos-u-vela pip version to 3.7.0.
This PR adds unittest for schedule primitive read_at and write_at.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
…ed to Ethos-U (#13212)

Added an option to tvmc and Ethos-U for printing, to the console or to a file, which operators from the initial graph are offloaded to Ethos-U and which aren't. It produces line-by-line output of the initial model IR, indicating which operations are ported to Ethos-U.

The compiler option "--target-ethos-u-dump_npu_functions_coverage" has been replaced by the more generic "--dump-offloads" with the same meaning.


## Usage
```
# output to console:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --dump-offloads=- \
    ........

# output to file:
tvmc compile --target=ethos-u,cmsis-nn,c \
    --dump-offloads=<file path> \
    ........
```

## Example output:


...
Total number of operators and distribution by targets
Total: 211
target1: 198
target2: 10
generic: 3

'target1        <-     target2.qnn_conv2d'
'target1        <-          %0 = qnn.conv2d(%tfl.quantize, %v_param_1, ...'
'target1        <-          %1 = nn.bias_add(%0, %v_param_2, axis=3);'
'target1        <-          %2 = qnn.requantize(%1, meta[relay.Constant]...'
'target2        <-     target2.reshape'
'target2        <-          %3 = reshape(%2, newshape=[1, 1001]);'
'generic        <-     %4 = nn.pad(%3, -128f, pad_width=[[0, 0], [1, 1]...'
...
This PR refactors BF16Legalize to enable more f32 computations.
We also split the BF16Legalize into two steps.

- BF16ComputeLegalize changes all computation to f32 while keeping
  the external BF16 storages.
- BF16StorageLegalize changes all storage to u16.

Now BF16 kernels accept tvm.nd.array inputs that are created with the bfloat16 type.
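The storage/compute split can be illustrated with a NumPy sketch (illustration only; the actual passes operate on TIR, and this sketch truncates rather than reproducing whatever rounding the pass implements): bfloat16 values are stored as uint16 holding the upper 16 bits of a float32, and computation happens after widening back to float32.

```python
import numpy as np

def f32_to_bf16_storage(x: np.ndarray) -> np.ndarray:
    """Truncate float32 to bfloat16, stored as uint16 (upper 16 bits)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def bf16_storage_to_f32(u: np.ndarray) -> np.ndarray:
    """Widen the uint16 storage back to float32 for computation."""
    return (u.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, 3.14159, -2.5], dtype=np.float32)
stored = f32_to_bf16_storage(x)          # u16 storage, as after BF16StorageLegalize
computed = bf16_storage_to_f32(stored)   # f32 compute, as after BF16ComputeLegalize
# bfloat16 keeps ~8 bits of mantissa, so values agree to ~1e-2 relative error.
assert np.allclose(x, computed, rtol=1e-2)
```

This is why the two passes compose cleanly: compute legalization only ever sees float32, while storage legalization only ever sees the uint16 bit pattern.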
…ise op (#14408)

[MetaSchedule][Hexagon] Improve vectorization for standalone elementwise ops

Motivation:
It was found that for standalone elementwise operations (add, sub, etc.)
MetaScheduler generates code with poor performance due to a lack of vector
code for some input tensor shapes. The current implementation is unable to
vectorize if the innermost loop's extent is not a multiple of the vector
length.

What was done:
Core changes: check the current loop nest; if all loops are "simple",
i.e. loops without annotations, bindings, or reduce axes, then do the
following:
 1) Fuse all loops into a single one.
 2) Split this new loop into 2 parts: inner and outer, where the split
    factor for the inner loop is equal to the 'max_vectorize_extent'
    MetaScheduler parameter.
 3) Parallelize the outer loop and vectorize the inner loop.
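The fuse/split arithmetic in steps 1-2 can be modeled in plain Python (a sketch only, not the MetaSchedule API; `max_vectorize_extent=128` below is an assumed value for illustration):

```python
from math import prod

def fuse_and_split(extents, max_vectorize_extent):
    """Model steps 1-2: fuse all simple loops into one loop, then split it
    so the inner loop has extent max_vectorize_extent (ceil-divided outer)."""
    fused = prod(extents)                  # step 1: fused loop extent
    inner = max_vectorize_extent           # step 2: fixed inner split factor
    outer = (fused + inner - 1) // inner   # outer extent covers the remainder
    return outer, inner

# Using the measured shape (1, 8, 56, 56, 32): fused extent is 802816.
outer, inner = fuse_and_split((1, 8, 56, 56, 32), 128)
assert inner == 128
assert outer * inner >= 1 * 8 * 56 * 56 * 32
# In step 3 the outer loop would be parallelized and the inner vectorized.
```

Because the inner extent is always exactly `max_vectorize_extent` regardless of the original shape, the vectorizer no longer depends on the innermost loop extent being a multiple of the vector length.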

Performance measurement:
Measurement was done on a Qualcomm Snapdragon 888. As expected, cases 1
and 2 got a significant performance boost; 3 and 4 are unchanged.

| N | op      | Dtype | Shape            | Before fix, ms | After fix, ms | Speedup |
|---|---------|-------|------------------|----------------|---------------|---------|
| 1 | add     | uint8 | 1, 8, 56, 56, 32 | 1.264          | 0.167         | 7.5x    |
| 2 | qnn.add | uint8 | 1, 8, 56, 56, 32 | 2.213          | 0.336         | 6.6x    |
| 3 | add     | int32 | 1, 8, 56, 56, 32 | 0.161          | 0.150         | 1.07x   |
| 4 | seq*    | uint8 | 1, 64, 56, 56    | 2.634          | 2.679         | 0.98x   |

seq* - test of the ops sequence: qnn.conv2d + bias_add + qnn.requantize,
       weights shape = [256, 64, 1, 1]
@tvm-bot
Collaborator

tvm-bot commented Mar 28, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

Previously, the error message from `FindPrimFuncFrame`,
`FindBlockFrame`, and `FindIfFrame` stated that they could not find
the requested frame when the top-most frame did not match the
requested type.  This error message could be misinterpreted by a user
as stating that the frame didn't exist at all.

This commit updates the error message to distinguish between the case
of a missing frame (e.g. `T.reads()` occurring outside of any
`T.block()` frame) and a frame not being top-most (e.g. `T.reads()`
occurring inside a `T.block()`, but inside an `if` conditional instead
of the top of the block).
@csullivan csullivan closed this Mar 28, 2023