
Using tensor.untyped_storage() in InvertibleCheckpoint class methods. #6926

Open · wants to merge 10 commits into base: master
Conversation

@drivanov (Contributor) commented Jan 9, 2024

Description

As suggested in the following warning:

tests/python/pytorch/nn/test_nn.py::test_group_rev_res[idtype0]
  /usr/local/lib/python3.10/dist-packages/dgl/nn/pytorch/conv/grouprevres.py:35: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
    inputs[1].storage().resize_(0)

the use of tensor.storage() has been replaced by tensor.untyped_storage().
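In essence, the change is a one-line substitution in grouprevres.py. A version-tolerant variant of it (a sketch, not the actual diff — release_storage is a hypothetical helper name) would fall back to the deprecated call on older PyTorch builds that lack Tensor.untyped_storage():

```python
import torch

def release_storage(t: torch.Tensor) -> None:
    """Free a tensor's underlying storage in place, preferring the
    non-deprecated untyped_storage() API when the running torch has it."""
    if hasattr(t, "untyped_storage"):   # newer PyTorch releases
        t.untyped_storage().resize_(0)
    else:                               # older PyTorch: deprecated API
        t.storage().resize_(0)

# Stand-in for clearing the input node features, as in InvertibleCheckpoint:
h = torch.randn(5, 32)
release_storage(h)
```

Resizing the storage to zero frees the memory while leaving the tensor's shape metadata intact, which is what the invertible checkpoint relies on to reconstruct the input later.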

Checklist

Please feel free to remove inapplicable items for your PR.

  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small. Read the Google eng practice (a CL is equivalent to a PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation may be exempted).
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change.

Changes

@dgl-bot (Collaborator) commented Jan 9, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 9, 2024

Commit ID: 007ea93

Build ID: 1

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jan 10, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 10, 2024

Commit ID: ef5d372

Build ID: 2

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@czkkkkkk self-requested a review January 11, 2024 02:38
@dgl-bot (Collaborator) commented Jan 17, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 17, 2024

Commit ID: 4283dcb

Build ID: 3

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jan 19, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 19, 2024

Commit ID: 6a74b84

Build ID: 4

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jan 22, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 22, 2024

Commit ID: 7cedd70

Build ID: 5

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jan 29, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Jan 29, 2024

Commit ID: 9ea5525

Build ID: 6

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Feb 1, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Feb 1, 2024

Commit ID: c1f8acd

Build ID: 7

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs (Collaborator) commented:

@dgl-bot

@frozenbugs self-requested a review February 9, 2024 01:53
@dgl-bot (Collaborator) commented Feb 9, 2024

Commit ID: 3bb0556fea621da60fd9498b83e20f63d71e0962

Build ID: 8

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Feb 9, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Feb 9, 2024

Commit ID: bb9d2da

Build ID: 9

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs (Collaborator) commented:

================================== FAILURES ===================================
_________________________ test_group_rev_res[idtype0] _________________________

idtype = torch.int32

    @parametrize_idtype
    def test_group_rev_res(idtype):
        dev = F.ctx()
    
        num_nodes = 5
        num_edges = 20
        feats = 32
        groups = 2
        g = dgl.rand_graph(num_nodes, num_edges).to(dev)
        h = th.randn(num_nodes, feats).to(dev)
        conv = nn.GraphConv(feats // groups, feats // groups)
        model = nn.GroupRevRes(conv, groups).to(dev)
>       result = model(g, h)

tests\python\pytorch\nn\test_nn.py:2287: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py:1194: in _call_impl
    return forward_call(*input, **kwargs)
python\dgl\nn\pytorch\conv\grouprevres.py:254: in forward
    *(args + tuple([p for p in self.parameters() if p.requires_grad]))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ctx = <torch.autograd.function.InvertibleCheckpointBackward object at 0x00000052832EAE58>
fn = <bound method GroupRevRes._forward of GroupRevRes(
  (gnn_modules): ModuleList(
    (0): GraphConv(in=16, out=16, normalization=both, activation=None)
    (1): GraphConv(in=16, out=16, normalization=both, activation=None)
  )
)>
fn_inverse = <bound method GroupRevRes._inverse of GroupRevRes(
  (gnn_modules): ModuleList(
    (0): GraphConv(in=16, out=16, normalization=both, activation=None)
    (1): GraphConv(in=16, out=16, normalization=both, activation=None)
  )
)>
num_inputs = 2
inputs_and_weights = (Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[ 0.5982, -1.6816, -0.5572, ...ameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       requires_grad=True))
inputs = (Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[ 0.5982, -1.6816, -0.5572, ... 1.5027,  2.4316, -1.7318,  1.3843,
         -1.2294,  0.1610,  0.0136,  1.2388, -2.0080, -0.7917,  1.5043, -0.5614]]))
x = [Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[ 0.5982, -1.6816, -0.5572, ... 1.5027,  2.4316, -1.7318,  1.3843,
         -1.2294,  0.1610,  0.0136,  1.2388, -2.0080, -0.7917,  1.5043, -0.5614]])]
element = tensor([[ 0.5982, -1.6816, -0.5572, -1.2908, -1.7179, -1.3323,  0.8583, -0.6617,
          1.5634, -2.0408,  0.0528, -...  1.5027,  2.4316, -1.7318,  1.3843,
         -1.2294,  0.1610,  0.0136,  1.2388, -2.0080, -0.7917,  1.5043, -0.5614]])

    @staticmethod
    def forward(ctx, fn, fn_inverse, num_inputs, *inputs_and_weights):
        ctx.fn = fn
        ctx.fn_inverse = fn_inverse
        ctx.weights = inputs_and_weights[num_inputs:]
        inputs = inputs_and_weights[:num_inputs]
        ctx.input_requires_grad = []
    
        with torch.no_grad():
            # Make a detached copy, which shares the storage
            x = []
            for element in inputs:
                if isinstance(element, torch.Tensor):
                    x.append(element.detach())
                    ctx.input_requires_grad.append(element.requires_grad)
                else:
                    x.append(element)
                    ctx.input_requires_grad.append(None)
            # Detach the output, which then allows discarding the intermediary results
            outputs = ctx.fn(*x).detach_()
    
        # clear memory of input node features
>       inputs[1].untyped_storage().resize_(0)
E       AttributeError: 'Tensor' object has no attribute 'untyped_storage'

python\dgl\nn\pytorch\conv\grouprevres.py:35: AttributeError
_________________________ test_group_rev_res[idtype1] _________________________

idtype = torch.int64

    @parametrize_idtype
    def test_group_rev_res(idtype):
        dev = F.ctx()
    
        num_nodes = 5
        num_edges = 20
        feats = 32
        groups = 2
        g = dgl.rand_graph(num_nodes, num_edges).to(dev)
        h = th.randn(num_nodes, feats).to(dev)
        conv = nn.GraphConv(feats // groups, feats // groups)
        model = nn.GroupRevRes(conv, groups).to(dev)
>       result = model(g, h)

tests\python\pytorch\nn\test_nn.py:2287: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py:1194: in _call_impl
    return forward_call(*input, **kwargs)
python\dgl\nn\pytorch\conv\grouprevres.py:254: in forward
    *(args + tuple([p for p in self.parameters() if p.requires_grad]))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ctx = <torch.autograd.function.InvertibleCheckpointBackward object at 0x000000528B7445E8>
fn = <bound method GroupRevRes._forward of GroupRevRes(
  (gnn_modules): ModuleList(
    (0): GraphConv(in=16, out=16, normalization=both, activation=None)
    (1): GraphConv(in=16, out=16, normalization=both, activation=None)
  )
)>
fn_inverse = <bound method GroupRevRes._inverse of GroupRevRes(
  (gnn_modules): ModuleList(
    (0): GraphConv(in=16, out=16, normalization=both, activation=None)
    (1): GraphConv(in=16, out=16, normalization=both, activation=None)
  )
)>
num_inputs = 2
inputs_and_weights = (Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[-2.0861e-01,  1.5159e+00, -...ameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       requires_grad=True))
inputs = (Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[-2.0861e-01,  1.5159e+00, -...75e-01,
         -1.4968e-02,  1.2952e+00, -1.2937e+00,  9.9673e-01, -1.9580e-01,
          1.3495e+00,  1.5232e+00]]))
x = [Graph(num_nodes=5, num_edges=20,
      ndata_schemes={}
      edata_schemes={}), tensor([[-2.0861e-01,  1.5159e+00, -...75e-01,
         -1.4968e-02,  1.2952e+00, -1.2937e+00,  9.9673e-01, -1.9580e-01,
          1.3495e+00,  1.5232e+00]])]
element = tensor([[-2.0861e-01,  1.5159e+00, -1.7376e+00, -7.2614e-01,  9.7599e-01,
          3.5593e-01,  4.2687e-01, -1.5182e-...275e-01,
         -1.4968e-02,  1.2952e+00, -1.2937e+00,  9.9673e-01, -1.9580e-01,
          1.3495e+00,  1.5232e+00]])

    @staticmethod
    def forward(ctx, fn, fn_inverse, num_inputs, *inputs_and_weights):
        ctx.fn = fn
        ctx.fn_inverse = fn_inverse
        ctx.weights = inputs_and_weights[num_inputs:]
        inputs = inputs_and_weights[:num_inputs]
        ctx.input_requires_grad = []
    
        with torch.no_grad():
            # Make a detached copy, which shares the storage
            x = []
            for element in inputs:
                if isinstance(element, torch.Tensor):
                    x.append(element.detach())
                    ctx.input_requires_grad.append(element.requires_grad)
                else:
                    x.append(element)
                    ctx.input_requires_grad.append(None)
            # Detach the output, which then allows discarding the intermediary results
            outputs = ctx.fn(*x).detach_()
    
        # clear memory of input node features
>       inputs[1].untyped_storage().resize_(0)
E       AttributeError: 'Tensor' object has no attribute 'untyped_storage'

@drivanov

@dgl-bot (Collaborator) commented Feb 21, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Feb 21, 2024

Commit ID: 77473a7

Build ID: 10

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link

@frozenbugs (Collaborator) commented:

@dgl-bot

@dgl-bot (Collaborator) commented Feb 22, 2024

Commit ID: b2dbdd4effbff8fabf9218588bfe91205309d116

Build ID: 11

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

@drivanov (Contributor, Author) commented Feb 26, 2024

@frozenbugs: Sorry, I have no idea what is causing this problem. This is what I see in the debugger:

(Pdb) l
 31                 # Detach the output, which then allows discarding the intermediary results
 32                 outputs = ctx.fn(*x).detach_()
 33  
 34             # clear memory of input node features
 35             import pdb; pdb.set_trace()
 36  ->         inputs[1].untyped_storage().resize_(0)
 37  
 38             # store for backward pass
 39             ctx.inputs = [inputs]
 40             ctx.outputs = [outputs]
 41  
(Pdb) type(inputs[1])
<class 'torch.Tensor'>
(Pdb) hasattr(inputs[1], "untyped_storage")
True
(Pdb) 

The only reason I can think of is a difference in Python versions. You are using Python 3.7:

C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py:1194: in _call_impl
    return forward_call(*input, **kwargs)

and in our container we are using Python 3.10:

(Pdb) u
> /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1511)_wrapped_call_impl()
-> return self._call_impl(*args, **kwargs)

BTW, perhaps the torch versions are also different. This is what we are using:

root@ea1fb332897e:/opt/dgl/qa/L0_python_unittests# pip list | grep torch
pytorch-quantization      2.1.2
torch                     2.3.0a0+ebedce2
torch-tensorrt            2.3.0a0
torchdata                 0.7.1a0
torchmetrics              1.3.1
torchtext                 0.17.0a0
torchvision               0.18.0a0
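One way to test the version hypothesis (a diagnostic sketch using only public torch APIs; the exact torch version that introduced Tensor.untyped_storage() is an assumption here, so the code probes for the attribute rather than comparing version strings) would be to run this on both CI images:

```python
import torch

# Probe which storage API the installed torch exposes.
# Tensor.untyped_storage() only exists in newer releases, which would
# explain the Win64 CI failure if that image ships an older torch.
print("torch", torch.__version__)

t = torch.empty(1)
has_untyped = hasattr(t, "untyped_storage")
print("untyped_storage available:", has_untyped)

# Guarding the call keeps the patch working in both environments:
storage = t.untyped_storage() if has_untyped else t.storage()
storage.resize_(0)
```

If the Win64 image reports `untyped_storage available: False`, the guarded call above would let the PR pass there without reintroducing the deprecation warning on newer builds.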

@frozenbugs (Collaborator) commented:

I am not sure; if this is not urgent, let's table it.

@dgl-bot (Collaborator) commented Mar 7, 2024

Not authorized to trigger CI. Please ask core developer to help trigger via issuing comment:

  • @dgl-bot

@dgl-bot (Collaborator) commented Mar 7, 2024

Commit ID: c259d1b

Build ID: 12

Status: ❌ CI test failed in Stage [Authentication].

Report path: link

Full logs path: link
