
[Caffe Frontend] supporting group > 1 cases for Deconv op #8125

Closed
wants to merge 141 commits

Conversation

zotanika (Contributor)

  • Handling group > 1 cases, assuming group == output channels
  • Decomposed into Relay split, conv2d_transpose, and multi-level concatenate ops.
  • Added some test cases.
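For reference, the decomposition idea can be sketched in plain NumPy (illustrative only: `deconv2d_single` and `grouped_deconv` are hypothetical names rather than the Relay ops, and only the stride-1, no-padding, group == channels case is shown):

```python
import numpy as np

def deconv2d_single(x, w):
    # Naive 2D transposed convolution, stride 1, no padding:
    # each input pixel scatters a scaled copy of the kernel.
    h, wd = x.shape
    kh, kw = w.shape
    out = np.zeros((h + kh - 1, wd + kw - 1))
    for i in range(h):
        for j in range(wd):
            out[i:i + kh, j:j + kw] += x[i, j] * w
    return out

def grouped_deconv(x, w):
    # group == channels: split the input along the channel axis,
    # deconvolve each group independently, then concatenate the
    # per-group results back along the channel axis.
    return np.stack([deconv2d_single(x[c], w[c]) for c in range(x.shape[0])])
```

This mirrors the split / conv2d_transpose / concatenate structure the PR emits in Relay.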

zotanika and others added 18 commits May 11, 2021 16:50
- adding more test cases; handling '0 < axis < num_axes - 1' case to give the result equivalent to Caffe framework
- skipping Relay multiplication if coeff is 1

Signed-off-by: zotanika <zotanika@gmail.com>
* Handling group > 1 cases, assuming group == output channels
* Decomposed into Relay split, transposed conv, and multi-level concatenation.
* Added some test cases.

Signed-off-by: zotanika <zotanika@gmail.com>
* [TVMC] Add support for the MLF to 'compile' command

Add support for the Model Library Format (MLF) to 'tvmc' so users can
output compilation artifacts to a MLF archive passing the new flag
'--output-format mlf'. For instance:

$ python3 -m tvm.driver.tvmc compile ./sine_model.tflite --target="c" --output sine.tar --output-format mlf

will generate a sine.tar archive serialized according to the MLF.

Since the MLF is currently meant to be used only on micro targets, an
error is generated if one tries to run an MLF outside a micro context.

The micro context does not exist yet but will be later introduced as
part of the [RFC] "TVMC: Add support for µTVM".

That commit also adds 3 pytest tests to test tvmc + MLF.

Finally, it also fixes some missing periods in the 'compile' command
help sections and renames export_format to output_format so there is
no confusion with flag '--dump-code', which contains "formats to export"
in its help section.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>

* Fix missing importorskip in the import_package test

Fix missing importorskip() in the import_package test allowing the
test in question to be skipped when 'tflite' is not installed in the
test environment, otherwise the test will fail with:

[...]
>       archive_path = exported_tvmc_package.package_path
E       AttributeError: 'str' object has no attribute 'package_path'
Added handling of CallNode objects created via packed function
invocation, plus test cases.

Change-Id: I5374abc59a3b0f79f27364c45f1a5789536df940
This PR is part of the TensorIR upstreaming effort (apache#7527), stage M2a.

In this PR, we implemented ScheduleError, an error reporting mechanism for schedule primitives to report user-facing error messages, with the ability to render the offending TIR in TVM script syntax.

This set of APIs allows future improvement of error location rendering, e.g. more colorful rendering mechanisms like synr does.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Fix typos and format in comments

Fix typos and format in comments about the registry manager of
packed functions.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>

* Fix lint

No more than 100 characters per line is allowed.
Fix typo in a comment about AOT executor.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
* [Vulkan] Enable instance/device extensions

- Vulkan requires that extensions be explicitly enabled if used.
  Explicitly list out which extensions are required (currently none)
  and which are optional.

* [Vulkan] Extract device information from vulkan API.

- Based on vkGetPhysicalDeviceProperties and
  vkGetPhysicalDeviceFeatures, determine which Vulkan capabilities are
  supported, pack into a Target.

* [Vulkan] Query instance-supported apiVersion before creating instance

- Previously, vkCreateInstance was always called requesting Vulkan 1.0;
  now the instance-supported apiVersion is queried before creating the
  instance.

* [Vulkan] Moved options for dedicated allocation and push descriptors to environment variables

- Query support for dedicated allocation and push descriptors along
  with the rest of the device support.  Move the options to disable
  their use from compile-time variables to environment variables
  `TVM_VULKAN_DISABLE_PUSH_DESCRIPTOR` and
  `TVM_VULKAN_DISABLE_DEDICATED_ALLOCATION`.

* [Vulkan] Move option for vulkan validation layers to environment variable

- Moved to enable faster use as a debug tool.  If
  `TVM_VULKAN_ENABLE_VALIDATION_LAYERS` is a non-empty string,
  validation layers will be enabled.

* [Vulkan] Explicitly enable vulkan features in device creation

- Vulkan requires that features be explicitly enabled before use.  For
  each feature that the device supports and a shader might use,
  declare it in the call to `vkCreateDevice`.

* [Vulkan] Avoid repeated queries for device attributes.

- Implement `VulkanDeviceAPI::GetAttr` based on the per-device values
  stored in the Target.  This pulls all logic for querying device
  parameters into a single location.

* [Vulkan] Implement "from_device" flag for the vulkan target.

- With the number of device capabilities that may or may not be
  supported by a vulkan driver, it can be tedious to input them all.
  Specifying "-from_device=0" now indicates that any unspecified values
  should be read from the device.

* [Vulkan][Codegen] Read vulkan device capabilities/limits from Target

- Previously, the codegen assumed that all device features were
  present.  Now, the codegen reads device capabilities from the
  Target, and throws an error if codegen would require use of an
  unsupported feature.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
This flag causes cuBLAS to use tensor cores for all operations. With
f32 or f64 operations, this leads to a loss of accuracy.
* Add fast_softmax support in fast_math pass

* Lintfix

* Update
@tqchen (Member)

tqchen commented May 26, 2021

@FrozenGene Please help to manage this PR

wyc-ruiker and others added 4 commits May 26, 2021 16:00
Co-authored-by: wangyucheng <wangyucheng@sensetime.com>
Change-Id: I927b43df95a8db8b042bc3cf2a1f23739d102b9d

Currently, on linux platforms, only checks for cuda install directory
in /usr/local/cuda/include.  The `nvidia-cuda-dev` package of Ubuntu
20.04 installs at /usr/include, so it would be good to check that
location as well.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
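The lookup order described above can be sketched as follows (a hypothetical illustration, not the actual detection code):

```python
import os

# Candidate include directories, checked in order: the conventional
# CUDA install location first, then the distro package location used
# by Ubuntu's nvidia-cuda-dev.
CANDIDATE_DIRS = ["/usr/local/cuda/include", "/usr/include"]

def find_cuda_include(candidates=CANDIDATE_DIRS):
    # Return the first directory that actually contains the CUDA headers.
    for d in candidates:
        if os.path.exists(os.path.join(d, "cuda_runtime.h")):
            return d
    return None
```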
@FrozenGene (Member)

@zotanika Do you mind splitting this into two PRs? One for Deconv, another for reduction.

mehrdadh and others added 5 commits May 28, 2021 10:13
* initial

* remove compare

* temp fix

* debugging

* hack

* hack for testing

* both test pass

* cleanup

* fix tests and tutorials

* restructure

* cleanup

* cleanup

* fix check files

* fixed for physical devices

* address comments

* reduce nrf stack size

* update sample url

* format
This commit pins the black version to provide stability.
It is expected that the pinned version will be moved forward periodically.

Change-Id: Ied866bff85a1a832959bc1d4673a7fdec68128a7
* [IR][Pass][Instrument] Pass instrument framework

This commit provides utilities to instrument passes:
  1. Add a new namespace tvm.instrument
  2. Introduce PassInstrument and PassInstrumentor to PassContext

     Example
     -------
    passes_mem = #... Impl of memory instrument
    passes_time = tvm.instrument.PassesTimeInstrument()

    with tvm.transform.PassContext(
        pass_instrumentor=PassInstrumentor([passes_mem, passes_time])):

        tvm.relay.build(mod, 'llvm')

        passes_mem.render()
        passes_time.render()

  3. Integrate existing PassContext::Trace() and timing profile

* [IR][Pass][Instrument] Fix python test_pass_manager.py

* Fix comment

* Fix lint

* Fix test_pass_annotation

* Fix test_pass_annotation.py

* Fix lint

* Fix test_pass_annotation.py

* Fix test_pass_annotation.py

* Fix review comments

* Fix tutorial use_pass_infra.py

* Fix review comments

* Fix review comments

* Fix typo

* Fix review comments

* Fix review comments

* Fix unittest error: test_cow_pass

* Fix unittest error

* Add more test cases for exceptions

* Fix nit

* Doc override_instruments()

* Fix review comments

* Fix lint

* Fix EnterContext exception behavior
…nality. (apache#8157)

This is in preparation for additional refactoring.  Functions are
organized to group similar functionality together, to minimize the
amount of file-to-file transfers needed later.  The main
divisions are between VulkanDeviceAPI,
VulkanModuleNode/VulkanWrappedFunc, VulkanThreadEntry, and
VulkanContext.

Other than minimal renaming of private functions and addition of some
comments, this commit should have zero changes to the functions
definitions themselves, only to their arrangement within the
src/runtime/vulkan directory.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
zotanika and others added 27 commits June 10, 2021 16:11
- adding more test cases; handling '0 < axis < num_axes - 1' case to give the result equivalent to Caffe framework
- skipping Relay multiplication if coeff is 1

Signed-off-by: zotanika <zotanika@gmail.com>
- Generate valid LLVM IR.
- Set proper alignment on the constant variables.
This helps in debugging, as the function name, arguments, and
docstrings show the function name from the source code instead of the
wrapper function (e.g.
`<function tvm.topi.cuda.dense.dense_small_batch(cfg, data, weight, bias=None, out_dtype=None)>`
instead of
`<function tvm.autotvm.task.topi_integration.register_topi_compute.<locals>._decorate.<locals>.wrapper(*args, **kwargs)>`.)

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
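The improvement can be illustrated with `functools.wraps`, the standard way to make a wrapper report the wrapped function's name and docstring (the `register` decorator below is a hypothetical stand-in, not TVM's actual `register_topi_compute`):

```python
import functools

def register(func):
    # Without functools.wraps, the registered function would show up
    # as "wrapper(*args, **kwargs)" in debuggers and help().
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@register
def dense_small_batch(cfg, data, weight, bias=None, out_dtype=None):
    """Example compute function."""
    return None
```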
Reduced the default number of threads in reduction kernels for Metal.
Default code generation produced thread blocks of size 32x32x1, so the
number of threads per threadgroup was 1024 (32 * 32 * 1). Sometimes the
device doesn't have enough resources for this, in which case we get an
exception that the block size is greater than the value of
maxTotalThreadsPerThreadgroup.
To prevent this situation we decrease the default number of threads. With
this fix every model should work with the default codegen, and auto-tuning
or auto-scheduling will select the optimal number of threads.
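The resource constraint can be sketched numerically (an illustrative helper, not the actual Metal codegen logic; the halving strategy is an assumption):

```python
def clamp_block(bx, by, bz, max_threads):
    # Shrink the largest block dimension until the total number of
    # threads per threadgroup fits under the device limit.
    while bx * by * bz > max_threads:
        if bx >= by and bx >= bz:
            bx //= 2
        elif by >= bz:
            by //= 2
        else:
            bz //= 2
    return bx, by, bz
```

A 32x32x1 block (1024 threads) exceeds a maxTotalThreadsPerThreadgroup of 512 and would be halved to 16x32x1.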
…pache#8230)

Currently board-specific config files (boards/*.conf) are not
copied from the Zephyr project dir to the destination build dir, so
as a consequence the per-board configs are not used when building
the runtime libraries, like libcommon. Hence, for instance, it's
currently not possible to set CONFIG_FPU per board, since it only
takes effect when set in the generic 'prj.conf' config file.

This commit fixes it by copying to the build dir (to each lib
dir) the proper .conf for the selected target board. For example,
if target 'qemu_x86' is selected 'qemu_x86.conf' is copied to
the boards/ dir inside the lib dirs, so Zephyr build system can
find it and combine it with configs found in the generic 'prj.conf'.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
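The fix described above amounts to something like the following (a hypothetical sketch; the real change lives in the microTVM Zephyr build scripts):

```python
import pathlib
import shutil

def copy_board_conf(zephyr_dir, lib_build_dir, board):
    # Copy the per-board config (e.g. boards/qemu_x86.conf) into a
    # boards/ dir inside the lib build dir, so the Zephyr build system
    # can find it and combine it with the generic prj.conf.
    src = pathlib.Path(zephyr_dir) / "boards" / f"{board}.conf"
    if not src.exists():
        return False
    dst = pathlib.Path(lib_build_dir) / "boards"
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst / src.name)
    return True
```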
* Fixed the destruction order of tflite::Interpreter and EdgeTPUContext

* Fixed include omission

* Formatted
…DeviceAPI (apache#8196)

* [Vulkan][Refactor] Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice

- Implemented ThreadMap, a container for per-thread objects.  Unlike
  dmlc::ThreadLocalStore, ThreadMap is intended for use as a
  non-static thread-specific lookup.

- Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated
  all uses.

* [Vulkan][Refactor] Pulled VulkanBuffer allocation/deallocation into constructor/destructor.

- VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates,
  and deallocates on destruction.

- VulkanHostVisibleBuffer owns a VulkanBuffer, and additionally calls
  vkUnmapMemory on destruction.

* [Vulkan][Refactor] Move the VulkanStagingBuffer to be owned by the VulkanDevice

- Previously, was owned by VulkanThreadEntry, so any use required
  looking up both the thread entry and the device.  Now,
  thread-specific lookup is handled in the VulkanDevice class.

* [Vulkan][Refactor] Move ownership of per-thread uniform buffer to VulkanDevice

- Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so
  any use required looking up both the thread entry and the device.
  Now, thread-specific lookup is handled in the VulkanDevice class.

* [Vulkan][Refactor] Moved ownership of per-thread workspace pool to VulkanDeviceAPI

- Previously, the WorkspacePool was owned by VulkanThreadEntry, and
  required a lookup from VulkanDeviceAPI::AllocWorkspace.  As a
  result, non-global VulkanDeviceAPI instances would interact with each
  other.

* [Vulkan][Refactor] Moved ownership of per-thread active device id to VulkanDeviceAPI

- Previously, the active device was owned by VulkanThreadEntry, so
  lookups to multiple global variables were required.  Now, everything
  goes from the VulkanDeviceAPI.

- Removed VulkanThreadEntry, as all functionality has been moved to
  either VulkanDevice or VulkanDeviceAPI.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* [BYOC][ACL] Prevent dilated pooling

Added a check preventing avg_pool2d and max_pool2d from being
scheduled for execution via the ACL* runtime if a dilation other than
(1, 1) is provided, as ACL does not currently support the dilation
attribute in pooling layers.

*ACL stands for "Compute Library for the Arm® Architecture"

Change-Id: If8f65d3a154e09f880bec73dd756d9f985a20ff2

* linter

Change-Id: If91809350786e69f59596301e0cbd3def6815cd0
…he#7858)

- Replaced capabilities header file with api calls introduced by the 20.11 ethosn driver stack release.
  - Removed 20.08 driver stack support and updated all affected code.
* num of cores

* add target list

* extension

* qemu

* fix

* comments

* add qemu to setup build

* fix

* add mps2 test

* merge fix

* add commit option

* add log

* fix

* fix zephyr init

* rename

* fix zephyr init

* uncomment

* fixed qemu install

* cleanup

* version

* add commit option

* fixed qemu install

* add docker import

* cleanup

* fix

* cleanup

* fix

* fix zephyr path

* fix

* fix

* address comments

* fix test

* fix

* add wait

* comments

* changed test to script

* add checks

* fix zephyr

* Revert "add wait"

This reverts commit 70f3c7d.

* address comments
…adcast (apache#8250)

* Allow cblas batch_matmul implicit bcast

* Add cblas batch_matmul bcast when batch_a=1
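The broadcasting semantics can be illustrated with NumPy's `matmul`, which applies the same implicit batch broadcast (an illustrative analogue only, not the cblas implementation):

```python
import numpy as np

# batch_a = 1 is broadcast against batch_b = 5: the single matrix in
# `a` is multiplied with every matrix in the batch `b`.
a = np.arange(12, dtype=float).reshape(1, 3, 4)
b = np.arange(40, dtype=float).reshape(5, 4, 2)
c = np.matmul(a, b)  # shape (5, 3, 2)
```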
The micro TVM page was moved during a recent docs update. This
patch moves the top-level index to the former location.
…che#8245)

* [CI] [ComputeLibrary] Use pre-built binaries instead of compiled

Pre-built Compute Library binaries are now downloaded (credits to @leandorn)
instead of being compiled on-site.

Change-Id: I9fd66ce02141813f02382b95351a382ccf775584

* Added Apache 2.0 License

Change-Id: I3c2af1a86984f81c4ee9408925af9c51510a978f
@zotanika zotanika closed this Jun 15, 2021
@zotanika zotanika deleted the frontend-caffe-deconv branch June 15, 2021 05:34
@zotanika (Contributor, Author)

reopened #8260 on a clean branch
