
Adding prod & prod_dim #1173

Closed · wants to merge 12 commits

Conversation

unrenormalizable (Contributor)

Pull Request Template

NOTE:

  1. The following things still need to be done; I'm creating this PR to get guidance on them. I point these out in their respective places in the comments section of this PR.
  2. tch, ndarray & candle don't support the prod_axis required for this implementation. QUESTION: should I implement it in the burn layer with some gymnastics, or leave it as panic!() like, e.g., burn-candle's int_div_scalar?
  3. There is a large copy-paste from the sum implementation.
  4. I was expecting to write some more directed tests, especially in burn-wgpu, but couldn't find tests for the other ops. Am I missing something, or do we expect the ops to be tested by the higher layers?

Checklist

  • Confirmed that run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Related Issues/PRs

n/a

Changes

Adds prod & prod_dim in preparation for adding the ReduceProd ONNX op. Honestly a mostly useless check-in; I'm doing it to get familiar with the codebase + for fun.

Testing

run-checks all

@@ -311,6 +311,14 @@ impl<F: FloatCandleElement, I: IntCandleElement> IntTensorOps<Self> for Candle<F
CandleTensor::new(tensor.tensor.sum_keepdim(dim).unwrap())
}

fn int_prod<const D: usize>(tensor: IntTensor<Self, D>) -> IntTensor<Self, 1> {
todo!();
Contributor Author

tch, ndarray & candle don't support the prod_axis required for this implementation.

QUESTION: should I implement it in the burn layer with some gymnastics, or leave it as panic!() like, e.g., burn-candle's int_div_scalar?

Member

I think it's rather bad to offer an operation in the API if it's gonna fail on 75% of our backends. I think if you can work out some default implementation in the burn layer it would be cool, otherwise we should not even offer the operation.
Candle's int_div_scalar is an exception that we should fix eventually.

Contributor Author

I agree. I will file issues on tch, ndarray and candle and implement a default.
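
For reference, here is one way a burn-layer default could look conceptually. This is a minimal standalone sketch over a plain row-major buffer, with a made-up function name and signature; it is not burn's tensor API, just an illustration of the indexing a default prod_dim would need.

```rust
// Hypothetical sketch (not burn's API): product-reduce a row-major buffer
// `data` with shape `shape` along axis `dim`, keeping that axis with size 1.
fn prod_dim_naive(data: &[i64], shape: &[usize], dim: usize) -> (Vec<i64>, Vec<usize>) {
    let outer: usize = shape[..dim].iter().product();
    let axis = shape[dim];
    let inner: usize = shape[dim + 1..].iter().product();

    // Start from the multiplicative identity and fold the reduced axis in.
    let mut out = vec![1i64; outer * inner];
    for o in 0..outer {
        for a in 0..axis {
            for i in 0..inner {
                out[o * inner + i] *= data[(o * axis + a) * inner + i];
            }
        }
    }

    let mut out_shape = shape.to_vec();
    out_shape[dim] = 1;
    (out, out_shape)
}
```

A real default would presumably be assembled from existing backend primitives rather than host-side loops, but the shape/indexing bookkeeping is the same idea.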

@@ -0,0 +1,116 @@
use burn_compute::tune::{AutotuneOperation, AutotuneOperationSet};
Contributor Author

This is a large copy-paste from the sum implementation.

What is the recommendation here? Parametrize the core implementation and call it with sum / prod arguments?

Member

I guess we'll want to refactor this so that all reduce operations share some core. The list of reduction operations is growing (there's also PR #1136 coming up with more autotuned reduce), so we can't afford to have this many duplicates. For now you can keep it as is, but I'll open an issue for refactoring this.
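
One possible shape for that shared core, as a hedged sketch (the names here are invented for illustration and are not from this PR or from #1136): parametrize the reduction by its identity element and update step, then instantiate it for sum and prod.

```rust
// Hypothetical sketch of a shared reduce core parametrized over the operation.
trait ReduceOp {
    const INITIAL: f32;
    fn update(acc: f32, value: f32) -> f32;
}

struct SumReduce;
impl ReduceOp for SumReduce {
    const INITIAL: f32 = 0.0;
    fn update(acc: f32, value: f32) -> f32 {
        acc + value
    }
}

struct ProdReduce;
impl ReduceOp for ProdReduce {
    const INITIAL: f32 = 1.0;
    fn update(acc: f32, value: f32) -> f32 {
        acc * value
    }
}

// Everything else (kernel templates, autotune sets, dispatch) would be written
// once against `R: ReduceOp` instead of being copy-pasted per operation.
fn reduce<R: ReduceOp>(values: &[f32]) -> f32 {
    values.iter().fold(R::INITIAL, |acc, &v| R::update(acc, v))
}
```

For example, `reduce::<ProdReduce>(&[2.0, 3.0, 4.0])` yields 24.0, while `reduce::<SumReduce>` on the same input yields 9.0.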

@unrenormalizable unrenormalizable changed the base branch from main to book/guide January 25, 2024 04:37
@unrenormalizable unrenormalizable changed the base branch from book/guide to main January 25, 2024 04:37
@louisfd (Member) left a comment


Hi @unrenormalizable
Thanks for the PR draft, it's looking good. See my comments below.

(WORKGROUP_DEFAULT * WORKGROUP_DEFAULT).to_string(),
)
.register("initial", 0.0.to_string())
.register("update", "shared_memory[local_id] += value; ")
Member

I guess you'll want initial to be 1.0 and the update to be *=
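
Concretely, assuming the builder calls are the same ones shown in the diff above, the suggested change would look something like this fragment (a sketch, not a tested patch):

```rust
// Product reduction: multiplicative identity, multiplicative update.
.register("initial", 1.0.to_string())
.register("update", "shared_memory[local_id] *= value; ")
```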


workgroupBarrier();

if id_local == 0u {
    var prod = {{ elem }}(0);
Member

You should start at 1.0 or it won't compute much ;)


let ones = B::ones(shape, &B::device(&grad));
let grad = B::prod_dim(grad, dim);

B::mul(ones, grad)
Member

Why multiply by one?


unary::<B, D, D, _>(ops.parents, ops.node, grads, |grad| {
let ones = B::ones(shape, &B::device(&grad));
let grad = B::prod_dim(grad, dim);
Member

As an example, I think the derivative of prod_dim([a, b, c, d], 0) should be
[bcd, acd, abd, abc], then multiplied by the unchanged grad. So you'll need to register the original input as state. Don't hesitate to call me out if you think I'm wrong; I haven't thought about it for too long.

Contributor Author

Yeah, my bad. I will add unit tests for this.
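
For reference, a minimal plain-Rust sketch of the gradient described above (illustrative name and signature, not burn's autodiff API): each input element receives the product of all the other elements, scaled by the upstream gradient.

```rust
// Hypothetical sketch: backward of y = prod(x) for a 1-D input,
// i.e. dy/dx[i] = product of all x[j] with j != i, times the upstream grad.
fn prod_backward_naive(x: &[f64], upstream_grad: f64) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let others: f64 = x
                .iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(_, &v)| v)
                .product();
            others * upstream_grad
        })
        .collect()
}
```

For x = [a, b, c, d] this produces [bcd, acd, abd, abc] scaled by the upstream gradient, matching the example above, and it is why the original input has to be kept as state.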

@antimora (Collaborator)

Great! Let me know when you start working on the ONNX part.

@unrenormalizable (Contributor Author) commented Jan 30, 2024

Great! Let me know when you start working on the ONNX part.

Will do, and my apologies, this is taking way longer than expected. I overestimated my learning abilities 🤣

@antimora (Collaborator)

No worries. We are here to help.

@antimora added the feature label on Jan 31, 2024
# Conflicts:
#	burn-autodiff/src/ops/tensor.rs
#	burn-candle/src/lib.rs
#	burn-candle/src/ops/tensor.rs
#	burn-fusion/src/ops/float.rs
#	burn-fusion/src/stream/operation.rs
#	burn-ndarray/src/ops/tensor.rs
#	burn-tch/src/ops/tensor.rs
#	burn-wgpu/src/ops/float_ops.rs
@unrenormalizable (Contributor Author)

Folks, I am going to pause the prod/prod_dim work until the required dependencies are available.

The main blocker is that cumprod is required to implement the autodiff/grad component for prod. This follows the PyTorch implementation, which is a reasonable approach (see the sketch after the list below).

In addition, I have filed the prod_axis requirement on huggingface/candle/1620 & rust-ndarray/ndarray/1351.

The following items from this PR will be taken forward in a separate PR:

  1. reduce_dim_sum.wgsl edge-case fix
  2. reduce_dim_shared_memory.wgsl initialization issue
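
For context on why cumprod is the blocker: the per-element "product of everything except x[i]" needed for the gradient can be formed without division (and therefore without special-casing zeros) from an exclusive prefix cumulative product and an exclusive suffix cumulative product. A hedged plain-Rust sketch of that idea (not the PyTorch or burn code):

```rust
// Hypothetical sketch: prod backward via cumulative products, no division.
// grad[i] = prefix[i] * suffix[i] * upstream, where
//   prefix[i] = x[0] * ... * x[i-1]   (exclusive prefix product)
//   suffix[i] = x[i+1] * ... * x[n-1] (exclusive suffix product)
fn prod_backward_cumprod(x: &[f64], upstream_grad: f64) -> Vec<f64> {
    let n = x.len();
    let mut grad = vec![0.0; n];

    // Exclusive prefix products.
    let mut prefixes = Vec::with_capacity(n);
    let mut prefix = 1.0;
    for &v in x {
        prefixes.push(prefix);
        prefix *= v;
    }

    // Exclusive suffix products, combined with the prefixes on the fly.
    let mut suffix = 1.0;
    for i in (0..n).rev() {
        grad[i] = prefixes[i] * suffix * upstream_grad;
        suffix *= x[i];
    }

    grad
}
```

On a tensor backend, both passes map onto a cumprod primitive (forward and reversed), which is the dependency being waited on.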
