Myia currently cannot handle aliased tensors in data structures. This issue can crop up in the PyTorch frontend, in code like this:
import torch

class LinearSeq(torch.nn.Module):
    def __init__(self, a, b):
        super(LinearSeq, self).__init__()
        self.lin = torch.nn.Linear(a, b)
        # self.seq[0] is the same module object as self.lin (an alias, not a copy)
        self.seq = torch.nn.Sequential(self.lin)

    def forward(self, x):
        return self.seq(x)
The problem is that Myia sees both self.lin and self.seq[0], but it treats them as different parameters rather than as the same parameter. Thus, if forward only uses self.seq, the gradient with respect to self.lin is zero, and the update is applied to seq but not to lin. Furthermore, if both seq and lin are used, they accumulate gradients separately and their values diverge.
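To make the divergence concrete, here is a minimal sketch in plain Python, using a single scalar weight in place of a tensor. It is not Myia or PyTorch code: the names `loss_grad`, `step_unaliased`, `step_aliased`, the squared-error loss, and the SGD update are all assumptions chosen for illustration.

```python
# Toy scalar model: one weight reachable through two paths, "lin" and "seq0".
# All names are hypothetical; this only illustrates the failure mode.

def loss_grad(w, x, target):
    """Gradient of 0.5 * (w * x - target) ** 2 with respect to w."""
    return (w * x - target) * x


def step_unaliased(params, x, target, lr):
    """One SGD step that treats "lin" and "seq0" as independent parameters.

    The forward pass only reads "seq0", so "lin" gets a zero gradient.
    """
    grads = {"lin": 0.0, "seq0": loss_grad(params["seq0"], x, target)}
    return {name: params[name] - lr * grads[name] for name in params}


def step_aliased(params, x, target, lr):
    """One SGD step that treats both paths as a single parameter.

    Gradients from every path are summed and one update is shared.
    """
    total = 0.0 + loss_grad(params["seq0"], x, target)
    new_w = params["seq0"] - lr * total
    return {"lin": new_w, "seq0": new_w}


start = {"lin": 1.0, "seq0": 1.0}
split = step_unaliased(start, x=2.0, target=6.0, lr=0.1)
shared = step_aliased(start, x=2.0, target=6.0, lr=0.1)
```

After one step, `split["lin"]` is unchanged while `split["seq0"]` has moved, so the two supposedly identical weights no longer agree; in `shared` they stay equal.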
This is a difficult problem, and if we handle it, I believe it would be best to consider the aliasing patterns statically (by which I mean specializing graphs with respect to aliasing patterns). The fact that two tensors in opposite corners of a data structure may be aliased seems particularly difficult to deal with, but maybe we can get away with supporting only a few simple patterns.
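As a rough sketch of what an "aliasing pattern" could look like, the following plain Python groups the paths in a nested structure that point at the same leaf object. The helper `aliasing_pattern` and the `FakeTensor` stand-in are hypothetical and not part of Myia; the assumption is that such a pattern is hashable and could therefore serve as part of a specialization key.

```python
class FakeTensor:
    """Stand-in for a tensor; only object identity matters here."""


def aliasing_pattern(struct):
    """Return groups of paths that refer to the same leaf object.

    Walks dicts, lists and tuples; anything else is a leaf. Leaves are
    compared by identity (id), so this is meant for tensor-like objects,
    not for small ints or strings that Python may share internally.
    """
    paths_by_id = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + (key,))
        elif isinstance(node, (list, tuple)):
            for index, child in enumerate(node):
                walk(child, path + (index,))
        else:
            paths_by_id.setdefault(id(node), []).append(path)

    walk(struct, ())
    # Keep only leaves reachable through more than one path, in traversal order.
    return tuple(tuple(paths) for paths in paths_by_id.values() if len(paths) > 1)


w, b = FakeTensor(), FakeTensor()
aliased = {"lin": {"weight": w, "bias": b}, "seq": [{"weight": w, "bias": b}]}
distinct = {"lin": {"weight": FakeTensor()}, "seq": [{"weight": FakeTensor()}]}

aliased_pattern = aliasing_pattern(aliased)
distinct_pattern = aliasing_pattern(distinct)
```

Here `aliased_pattern` pairs `("lin", "weight")` with `("seq", 0, "weight")` (and likewise for the bias), while `distinct_pattern` is empty. Two inputs with the same structure but different patterns would then be compiled as different specializations.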
So the question is, how do we deal with this?