Result accuracy is different with PyTorch #6077

Open
xiaomaofeng opened this issue Apr 11, 2024 · 2 comments

xiaomaofeng commented Apr 11, 2024

System information

  • OS platform and distribution: Linux Ubuntu 22.10
  • ONNX version: 1.16.0
  • ONNX Runtime version: 1.17.1
  • PyTorch version: 1.13.1

My sample code contains only an FFN. I computed the output from both the PyTorch (.pth) model and the exported ONNX model:

import torch
import torch.onnx
from torch import nn
import onnxruntime
from torch.nn import functional as F
import numpy as np


class MLP(nn.Module):

    """ Very simple multi-layer perceptron (also called FFN)"""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.num_layers = num_layers
        h = [hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]))

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x)

        return x


class SimpleModel(nn.Module):

    def __init__(self, in_channels, out_channels):
        super(SimpleModel, self).__init__()
        self.mask_mlp_embed = MLP(out_channels, out_channels, out_channels, 3)


    def forward(self, x):
        x = self.mask_mlp_embed(x)
        return x



if __name__ == '__main__':

    in_channels = 256
    out_channels = 256

    model = SimpleModel(in_channels, out_channels).cuda()
    dummy_input = torch.tensor(np.random.uniform(-50, 50, size=[1, in_channels, 256]).astype(np.float32)).cuda()

    model.eval()
    output = model(dummy_input)

    torch.onnx.export(model, 
                      dummy_input, 
                      "simple_model.onnx", 
                      export_params=True,
                      opset_version=14, 
                      do_constant_folding=False, 
                      input_names=['input'], 
                      output_names=['output']) 

    exproviders = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    sess_options = onnxruntime.SessionOptions()
    sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    onnx_model = onnxruntime.InferenceSession('simple_model.onnx', sess_options, providers=exproviders)
    ort_inputs = {'input': dummy_input.detach().cpu().numpy()}
    ort_outputs = onnx_model.run(None, ort_inputs)

Then I computed np.mean of each output:

(I deleted the original screenshot because it contained a mistake; please disregard it.)

You can see the precision difference between the two models. Why does this happen? The size of the accuracy gap also changes with the input.
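One possible contributing factor, offered as an assumption rather than a diagnosis: float32 rounding error is roughly proportional to the size of the values being summed, so inputs drawn from uniform(-50, 50) give much larger absolute differences than inputs near 1, even when the relative error stays near float32 precision. The numpy sketch below evaluates one 256x256 Linear + ReLU (weights and bias drawn from the same range PyTorch uses by default for nn.Linear) in float32 and again in float64 from the same float32 inputs. It does not run PyTorch or ONNX Runtime, and linear_relu_error is a hypothetical helper.

```python
import numpy as np


def linear_relu_error(scale, dim=256, seed=0):
    """Max |float32 - float64| for one Linear + ReLU, and that value divided by
    the largest output magnitude. Inputs are uniform in [-scale, scale]."""
    rng = np.random.default_rng(seed)
    bound = 1.0 / np.sqrt(dim)  # same range as PyTorch's default nn.Linear init
    w = rng.uniform(-bound, bound, size=(dim, dim)).astype(np.float32)
    b = rng.uniform(-bound, bound, size=dim).astype(np.float32)
    x = rng.uniform(-scale, scale, size=(dim, dim)).astype(np.float32)

    # Same float32 inputs, evaluated once in float32 and once in float64.
    out32 = np.maximum(x @ w.T + b, 0)
    ref = np.maximum(x.astype(np.float64) @ w.astype(np.float64).T + b.astype(np.float64), 0)

    max_abs = float(np.abs(out32 - ref).max())
    return max_abs, max_abs / float(np.abs(ref).max())


for scale in (1, 50, 1000):
    max_abs, rel = linear_relu_error(scale)
    print(f"scale={scale:>4}: max abs error {max_abs:.2e}, relative {rel:.2e}")
```

In this sketch the absolute error grows roughly in proportion to scale while the relative error stays small, which is consistent with the gap changing with the input; it does not by itself establish that this is the whole explanation.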

In this SimpleModel the results differ by only 0.001, but my other model, a Mask2Former, still shows this problem.
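Comparing np.mean of the two outputs can be misleading: differences of opposite sign cancel in the mean, and an absolute gap of 0.001 says little without knowing how large the outputs are. A minimal sketch of an element-wise comparison follows. report_diff is a hypothetical helper, and its rtol/atol defaults are illustrative assumptions, not values taken from PyTorch or ONNX Runtime. With the script above it would be called as report_diff(output.detach().cpu().numpy(), ort_outputs[0]); the synthetic arrays here only stand in for those outputs.

```python
import numpy as np


def report_diff(ref, test, rtol=1e-3, atol=1e-5):
    """Element-wise comparison of two outputs (tolerances are illustrative)."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    abs_diff = np.abs(ref - test)
    scale = np.abs(ref).max()
    return {
        # What the issue compared: difference of the means (errors can cancel here).
        "mean_diff": float(abs(ref.mean() - test.mean())),
        # Worst single-element disagreement.
        "max_abs_diff": float(abs_diff.max()),
        # The same, relative to the largest output value.
        "max_abs_diff_rel": float(abs_diff.max() / scale) if scale > 0 else 0.0,
        # True if every element satisfies |test - ref| <= atol + rtol * |ref|.
        "allclose": bool(np.allclose(test, ref, rtol=rtol, atol=atol)),
    }


# Synthetic stand-ins for the PyTorch and ONNX Runtime outputs.
rng = np.random.default_rng(0)
ref = rng.uniform(-500, 500, size=(1, 256, 256)).astype(np.float32)
# Relative noise of about 1e-6, assumed here as a stand-in for accumulated float32 rounding.
test = (ref * (1 + rng.normal(0.0, 1e-6, size=ref.shape))).astype(np.float32)
print(report_diff(ref, test))
```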
Detailed test (for testing only):

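from torch import nn
from torch.nn import functional as F


class MLP(nn.Module):
    """ Very simple multi-layer perceptron (also called FFN)"""

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.num_layers = num_layers
        h = [hidden_dim] * (num_layers - 1)
        # Test-only: a single 256 -> 256 linear layer instead of the ModuleList
        self.liner = nn.Linear(256, 256)

    def forward(self, x):
        a = self.liner(x)   # output of the linear layer
        b = F.relu(a)       # linear layer followed by ReLU
        relu = F.relu(x)    # ReLU applied directly to the input
        # a, b and relu are computed for inspection only; the input x is returned unchanged
        return x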

I reran this several times; you can see that the values change slightly after passing through the linear layer (self.liner).
Each additional nn.Linear() makes the accuracy problem worse.
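The observation that each extra nn.Linear() widens the gap is consistent with ordinary float32 rounding accumulating from layer to layer, although that alone does not rule out a bug in either runtime. The numpy sketch below measures the relative error (Frobenius norm) of a float32 stack of Linear + ReLU layers against the same stack evaluated in float64. Several simplifications are assumptions: biases are omitted, every layer uses ReLU, and the He-style weight initialisation is chosen only so that activations neither vanish nor explode with depth. stacked_linear_error is a hypothetical helper.

```python
import numpy as np


def stacked_linear_error(depth, dim=256, seed=0):
    """Relative error (Frobenius norm) of `depth` float32 Linear + ReLU layers
    against the same computation carried out in float64."""
    rng = np.random.default_rng(seed)
    # He-style init (an assumption) keeps activation magnitudes roughly constant.
    weights = [rng.normal(0.0, np.sqrt(2.0 / dim), size=(dim, dim)).astype(np.float32)
               for _ in range(depth)]
    x32 = rng.uniform(-50, 50, size=(dim, dim)).astype(np.float32)
    x64 = x32.astype(np.float64)  # identical starting values, wider arithmetic
    for w in weights:
        x32 = np.maximum(x32 @ w.T, 0)                      # float32 path
        x64 = np.maximum(x64 @ w.astype(np.float64).T, 0)   # float64 path
    return float(np.linalg.norm(x32 - x64) / np.linalg.norm(x64))


for depth in (1, 2, 4, 8):
    print(f"{depth} layer(s): relative error {stacked_linear_error(depth):.2e}")
```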

The attached zip contains a tensor I saved: x.pt

@justinchuby (Contributor)

Is 1.13.1 the PyTorch version being used? If so, please test with torch>2.0. It would also be helpful to upload the resulting ONNX model.

@gramalingam (Contributor)

It would also help to open this issue in the PyTorch repo. It could be an issue in the PyTorch-to-ONNX exporter, in the onnxruntime implementation, or a mismatch in the ONNX op spec.

But it is a bit odd, since the model uses only linear layers and ReLU, both of which should be very well tested by now in all three components.
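To narrow down which component is responsible, one option is to recompute the MLP in float64 and check whether the PyTorch output, the ONNX Runtime output, or both drift from that reference. The numpy sketch below is hedged accordingly: mlp_forward and deviation_from_reference are hypothetical helpers, and with the real model the weights and biases would come from model.state_dict() (keys such as mask_mlp_embed.layers.0.weight and mask_mlp_embed.layers.0.bias), converted with .cpu().numpy(). Random parameters stand in for them here.

```python
import numpy as np


def mlp_forward(x, weights, biases, dtype=np.float64):
    """Re-implementation of MLP.forward from the issue: Linear + ReLU for every
    layer except the last, computed in the given dtype."""
    h = np.asarray(x, dtype=dtype)
    for i, (w, b) in enumerate(zip(weights, biases)):
        # nn.Linear computes x @ W^T + b
        h = h @ np.asarray(w, dtype=dtype).T + np.asarray(b, dtype=dtype)
        if i < len(weights) - 1:
            h = np.maximum(h, 0)
    return h


def deviation_from_reference(ref, **outputs):
    """Largest absolute deviation of each named output from the float64 reference."""
    return {name: float(np.abs(np.asarray(out, dtype=np.float64) - ref).max())
            for name, out in outputs.items()}


# Random float32 parameters stand in for model.state_dict(); 3 layers of 256 -> 256.
rng = np.random.default_rng(0)
dim, bound = 256, 1 / np.sqrt(256)
weights = [rng.uniform(-bound, bound, size=(dim, dim)).astype(np.float32) for _ in range(3)]
biases = [rng.uniform(-bound, bound, size=dim).astype(np.float32) for _ in range(3)]
x = rng.uniform(-50, 50, size=(1, dim, dim)).astype(np.float32)

ref = mlp_forward(x, weights, biases)                 # float64 reference
out32 = mlp_forward(x, weights, biases, np.float32)   # a plain float32 forward pass
print(deviation_from_reference(ref, float32=out32))
```

With real outputs the call would look like deviation_from_reference(ref, torch=output.detach().cpu().numpy(), ort=ort_outputs[0]). If both deviate from the reference by similar amounts, the gap between them is likely ordinary float32 rounding; if one deviates far more, that runtime is the one to investigate.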
