Add more nvFuser debug information #387
Comments
An nvFuser fusion region in the Thunder trace is represented as a BoundSymbol. You can get all of the fusion bound symbols with:
nvfuser_symbols = [bsym for bsym in trace.bound_symbols if bsym.sym.name.startswith("nvFusion")]
There's also lightning-thunder/thunder/examine/__init__.py, line 207 in dd42bb3.
Here's an example session that uses the bound symbol info to retrieve information on the inputs:
In [1]: import torch
In [2]: import thunder
In [3]: @thunder.jit
   ...: def func(x):
   ...:     t1 = thunder.prims.var(x, (0, 1), correction=1)
   ...:     t2 = thunder.prims.add(t1, t1)
   ...:     return t2
   ...:
In [4]: x = torch.randn(512, 512, device="cuda")
In [5]: out = func(x)
In [6]: thunder.last_traces(func)[-1]
Out[6]:
# Constructed by Delete Last Used (took 0 milliseconds)
import torch
from thunder.executors.torchex import no_autocast
@torch.no_grad()
@no_autocast
def computation(x):
  # x: "cuda:0 f32[512, 512]"
  [t2] = nvFusion0(x)
    # t1 = prims.var(x, (0, 1), correction=1)  # t1: "cuda:0 f32[]"
    # t2 = prims.add(t1, t1)  # t2: "cuda:0 f32[]"
  del x
  return t2
In [7]: import thunder.examine
In [8]: thunder.examine.get_fusion_symbols(thunder.last_traces(func)[-1])
Out[8]:
[[t2] = nvFusion0(x)
   # t1 = prims.var(x, (0, 1), correction=1)  # t1: "cuda:0 f32[]"
   # t2 = prims.add(t1, t1)  # t2: "cuda:0 f32[]"]
In [9]: trace = thunder.last_traces(func)[-1]
In [10]: nvfuser_symbols = [bsym for bsym in trace.bound_symbols if bsym.sym.name.startswith("nvFusion")]
In [11]: nvfuser_symbols
Out[11]:
[[t2] = nvFusion0(x)
   # t1 = prims.var(x, (0, 1), correction=1)  # t1: "cuda:0 f32[]"
   # t2 = prims.add(t1, t1)  # t2: "cuda:0 f32[]"]
In [12]: nvfuser_symbols[0].args
Out[12]: (x,)
In [13]: nvfuser_symbols[0].args[0].shape
Out[13]: (512, 512)
In [14]: nvfuser_symbols[0].args[0].dtype
Out[14]: float32
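The steps in the session above can be wrapped in a small helper. This is only a sketch: `fusion_input_info` is a hypothetical name, not part of Thunder, and it assumes that each fusion argument exposes `name`, `shape`, `dtype` and `device` attributes the way the proxy in the session exposes `shape` and `dtype`. Stand-in objects are used so the snippet runs without a GPU; with Thunder installed you would pass `thunder.last_traces(func)[-1]` instead.

```python
from types import SimpleNamespace


def fusion_input_info(trace, prefix="nvFusion"):
    """Collect per-fusion input metadata from a trace-like object.

    Hypothetical helper: relies only on `trace.bound_symbols`,
    `bsym.sym.name` and `bsym.args`, as used in the session above.
    """
    info = {}
    for bsym in trace.bound_symbols:
        # Keep only the bound symbols that represent fusion regions.
        if not bsym.sym.name.startswith(prefix):
            continue
        info[bsym.sym.name] = [
            {
                # getattr with defaults so non-tensor args do not raise.
                "name": getattr(arg, "name", repr(arg)),
                "shape": tuple(getattr(arg, "shape", ())),
                "dtype": getattr(arg, "dtype", None),
                "device": getattr(arg, "device", None),
            }
            for arg in bsym.args
        ]
    return info


# Stand-ins that mimic the attributes used above (not real Thunder objects).
x = SimpleNamespace(name="x", shape=(512, 512), dtype="float32", device="cuda:0")
fake_trace = SimpleNamespace(
    bound_symbols=[
        SimpleNamespace(sym=SimpleNamespace(name="nvFusion0"), args=(x,)),
        SimpleNamespace(sym=SimpleNamespace(name="python_del"), args=(x,)),
    ]
)

summary = fusion_input_info(fake_trace)
print(summary)
```

Filtering on the `nvFusion` name prefix mirrors the list comprehension in the session; the prefix is a parameter so the same loop could be pointed at other symbol names.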
Given the information above, what mechanism are you planning to add?
Thanks for the comment! To see the mechanism I came up with before you had a chance to comment, please check out the linked PR #388. With this added context, I'll look into how I can reuse the examine mechanism in my PR and update this issue.
After further inspection @IvanYashchuk I still think these are two slightly different things. In the PR the output is ready-to-run Python code for the fusion, while the method you explained gives similar information but is missing the strides and the code to run the fusions. However, I agree with you that it might be better to move the code from that PR to the Even better, I think I can get the information about the inputs from the trace using your technique, eliminating the need to modify the
What is the goal? Let's think of debugging scenarios. Here are the examples I could come up with:
That's all specific to nvFuser as a FusingExecutor. How can this be extended to run any slice of a trace that involves any FusingExecutor and/or OperatorExecutor ops?
🚀 Feature
For debugging purposes, I would like to be able to quickly retrieve input information for a fusion.
Motivation
When debugging performance issues, an important aspect is being able to narrow the search space quickly and efficiently.
Pitch
At the time of writing, debugging nvFuser fusions is already very approachable, as it is possible to retrieve fusion definitions from the trace. It is, however, not trivial to retrieve the input information. Therefore, to improve the experience even further, I propose adding a mechanism to retrieve input information for a fusion definition.
Alternatives
Additional context
This idea came up while starting to work on #205, where an idea by @kevinstephano is to show how to dump the debug information for specific fusions and execute them in a notebook. This issue works toward that goal.
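For the notebook workflow, the input metadata could be turned into a line of Python source that recreates a matching tensor. The sketch below is an illustration, not the code from the PR: `contiguous_strides` and `render_input` are hypothetical helpers, the strides are assumed to be row-major contiguous because the trace-based approach does not report them, and `torch.randn` is only appropriate for floating-point dtypes.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        # Size-0 and size-1 dims do not shrink the step below 1.
        step *= max(dim, 1)
    return tuple(reversed(strides))


def render_input(name, shape, dtype="float32", device="cuda:0"):
    """Render one line of Python that builds a random input tensor.

    Assumption: the input is contiguous; real strides would have to come
    from the actual tensors, since the trace does not carry them.
    """
    shape = tuple(shape)
    strides = contiguous_strides(shape)
    return (
        f"{name} = torch.randn({shape!r}, dtype=torch.{dtype}, device={device!r})"
        f".as_strided({shape!r}, {strides!r})"
    )


line = render_input("x", (512, 512))
print(line)
```

The rendered line can be pasted into a notebook next to the fusion definition, after `import torch`.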
cc @carmocca @apaz-cli @Borda @tfogal