Add MoE layer example #303
Super exciting! Really looking forward to discussing this in more detail at a design review!
Do we have any broader ideas for how this fits into the strategy for handling dynamic and data-dependent shapes? I was under the impression that this was just something we were completely incapable of doing with the way that we're modeling traces.
@@ -1048,6 +1053,7 @@ def find_producer_symbols(trace: TraceCtx, proxies: Sequence[Proxy], stop_proxie
(__b = ltorch.sub(x, y)
# __b = prims.sub(x, y),)
"""
stop_proxies = filter(lambda x: isinstance(x, Proxy), stop_proxies)
Suggested change:
- stop_proxies = filter(lambda x: isinstance(x, Proxy), stop_proxies)
+ stop_proxies = tuple(filter(lambda x: isinstance(x, Proxy), stop_proxies))
Vaguely remember that I have run into
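One plausible motivation for wrapping `filter` in `tuple` is that `filter` returns a one-shot iterator, so a second membership test or loop over `stop_proxies` silently sees nothing. The sketch below only illustrates that Python behavior. The `Proxy` class here is a hypothetical stand-in, not Thunder's actual class, and the helper names are made up for this example.

```python
from typing import Any, Iterable


class Proxy:
    """Hypothetical stand-in for a proxy type; not Thunder's real Proxy."""


def keep_proxies_lazy(items: Iterable[Any]):
    # filter() returns a lazy iterator that can be consumed only once.
    return filter(lambda x: isinstance(x, Proxy), items)


def keep_proxies_tuple(items: Iterable[Any]) -> tuple:
    # A tuple can be iterated and searched any number of times.
    return tuple(filter(lambda x: isinstance(x, Proxy), items))


p, q = Proxy(), Proxy()
items = [p, 1, "a", q]

lazy = keep_proxies_lazy(items)
first_check = p in lazy    # finds p, consuming the iterator up to and including p
second_check = p in lazy   # p is gone; the rest of the iterator is consumed looking for it

eager = keep_proxies_tuple(items)
eager_first = p in eager
eager_second = p in eager  # same answer on every lookup

print(first_check, second_check, eager_first, eager_second)
```

With the lazy version the two lookups disagree, while the tuple gives the same answer each time, which matters if the function checks `stop_proxies` inside a loop.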
DRAFT MODE TO PREVENT MERGES. The approach and code are ready for experimentation and review.
The main result of this PR is that Thunder can run a variant of the MoE layer from LitGPT. There are three modifications:
- Indexing with `None` (need to create an issue). The workaround is to use `unsqueeze` instead of `None` when indexing.
- The in-place update (`+=`) is replaced with `index_add`.
The main missing operator is `nonzero(x, as_tuple=True)`. The problem with this operator is that its output shape is unknown at compile time and dynamic at runtime. I tried using `NumberProxy` with `None`, `NumberProxy` with a custom int subclass as its value, and a custom int subclass directly, but a simple `-1` in the shape worked best.
The forward pass worked with just 14ce097. The backward pass required more special handling of `-1`.
Currently, `index_add`, `index_select`, and `topk` are not fused with any of Thunder's fusing executors.
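To make the rewrites described above concrete, here is a minimal sketch in plain PyTorch (no Thunder involved). It is not the LitGPT MoE layer: the tensor sizes, variable names, and the "expert" computation are assumptions chosen only for illustration. It shows that `unsqueeze` matches `None` indexing, that the length of the `nonzero(..., as_tuple=True)` output depends on the data, and that `index_add` matches an indexed `+=` when the indices contain no duplicates.

```python
import torch

torch.manual_seed(0)
n_tokens, dim, n_experts, top_k = 6, 4, 3, 2  # illustrative sizes (assumed)

x = torch.randn(n_tokens, dim)
router_logits = torch.randn(n_tokens, n_experts)
# Per-token routing weights and chosen expert ids, both of shape (n_tokens, top_k).
probs, indices = torch.topk(router_logits.softmax(dim=-1), top_k)

# 1) Indexing with None vs. unsqueeze: both add a trailing dimension of size 1.
w_none = probs[:, 0][:, None]
w_unsq = probs[:, 0].unsqueeze(-1)

# 2) nonzero(as_tuple=True): the result length equals the number of True
#    entries in the mask, so it is only known once the data is available.
mask = indices == 0  # which (token, slot) pairs selected expert 0
token_idx, slot_idx = torch.nonzero(mask, as_tuple=True)

# 3) Indexed in-place "+=" vs. out-of-place index_add.
#    The doubling below is a stand-in for an expert MLP (assumption).
expert_out = torch.index_select(x, 0, token_idx) * 2.0
weighted = probs[token_idx, slot_idx].unsqueeze(-1) * expert_out

y_inplace = torch.zeros_like(x)
y_inplace[token_idx] += weighted

y_functional = torch.zeros_like(x).index_add(0, token_idx, weighted)

print(torch.equal(w_none, w_unsq))
print(token_idx.shape[0] == int(mask.sum()))
print(torch.allclose(y_inplace, y_functional))
```

The two accumulation forms agree here because `topk` returns distinct expert ids per token, so `token_idx` has no repeated entries for a single expert. With repeated indices, `index_add` accumulates every contribution, whereas an indexed `+=` does not.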