[DO NOT UNPIN] ORT 1.18.0 Release Candidates available for testing #20558

Open
sophies927 opened this issue May 4, 2024 · 15 comments
Labels
api:Java issues related to the Java API ep:CUDA issues related to the CUDA execution provider ep:DML issues related to the DirectML execution provider ep:TensorRT issues related to TensorRT execution provider platform:web issues related to ONNX Runtime web; typically submitted using template release:1.18.0

Comments

@sophies927
Contributor

sophies927 commented May 4, 2024

ORT 1.18 will be released soon, and release candidate builds are available now for testing. If you encounter issues, please report them by responding to this issue.

Release branch: rel-1.18.0
Release manager: @yihonglyu

Python Whls:
  CPU: 1.18.0.dev20240430005
  GPU: 1.18.0.dev20240430005
Nuget:
  CPU: 1.18.0-dev-20240501-0627-204f1f59b9
  GPU (CUDA/TRT): 1.18.0-dev-20240430-2214-204f1f59b9
  DirectML: 1.18.0-dev-20240501-0503-204f1f59b9
  WindowsAI: 1.18.0-dev-20240430-1128-204f1f5
NPM:
  onnxruntime-node: 1.18.0-dev.20240430-204f1f59b9
  onnxruntime-react-native: 1.18.0-dev.20240430-204f1f59b9
  onnxruntime-web: 1.18.0-dev.20240430-204f1f59b9
Maven (Java):
  CPU: 1.18.0-rc1
  GPU: 1.18.0-rc1
@sophies927 sophies927 added ep:DML issues related to the DirectML execution provider api:Java issues related to the Java API ep:CUDA issues related to the CUDA execution provider ep:TensorRT issues related to TensorRT execution provider platform:web issues related to ONNX Runtime web; typically submitted using template release:1.18.0 labels May 4, 2024
@sophies927 sophies927 pinned this issue May 4, 2024
@tianleiwu
Contributor

tianleiwu commented May 6, 2024

Is there a GPU Python package for CUDA 12?

@HectorSVC HectorSVC unpinned this issue May 6, 2024
@sophies927 sophies927 pinned this issue May 8, 2024
@sophies927 sophies927 changed the title ORT 1.18.0 Release Candidates available for testing [DO NOT UNPIN] ORT 1.18.0 Release Candidates available for testing May 8, 2024
@sophies927
Contributor Author

OrtApi last entry from 1.18 is not protected by the usual static_assert

Thanks for catching that!

@yihonglyu would you mind looking into this?

@yihonglyu
Contributor

OrtApi last entry from 1.18 is not protected by the usual static_assert

@Hopobcn @sophies927 A fix is here #20671
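
For readers unfamiliar with the guard being discussed, here is a minimal sketch of the general technique: pinning the slot index of the last entry of each API version with a `static_assert`, so that inserting or reordering entries fails at compile time. `DemoApi`, its members, and `DEMO_API_SLOT` are hypothetical names invented for illustration; this is not the real `OrtApi` table or the code from the linked fix.

```cpp
#include <cstddef>

// Hypothetical stand-in for a versioned C API table; NOT the real OrtApi.
struct DemoApi {
  // Entries that shipped in (hypothetical) version 1.
  void (*CreateThing)();
  void (*ReleaseThing)();
  // Entry added in (hypothetical) version 2.
  void (*RunThing)();
};

// Slot index of a member, assuming the table holds only function pointers
// that are the same size as void*.
#define DEMO_API_SLOT(member) (offsetof(DemoApi, member) / sizeof(void*))

// One guard per version, pinned to that version's last entry. Adding or
// moving an entry above a guarded one changes the slot index and breaks
// the build instead of silently breaking binary compatibility.
static_assert(DEMO_API_SLOT(ReleaseThing) == 1, "Size of version 1 API cannot change");
static_assert(DEMO_API_SLOT(RunThing) == 2, "Size of version 2 API cannot change");
```

Under this scheme, a release whose last entry has no guard can still have entries inserted before it without any compile-time signal, which is presumably the gap reported above.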

@JulienTheron

Hi, I just found out that ScatterND nodes are now placed on CPU. They were placed on GPU before.

See below:

2024-05-13 15:19:36.112  [16248]  DEBUG  Node placements
2024-05-13 15:19:36.112  [16248]  DEBUG   Node(s) placed on [CPUExecutionProvider]. Number of nodes: 6
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND)
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND_1)
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND_2)
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND_3)
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND_4)
2024-05-13 15:19:36.112  [16248]  DEBUG    ScatterND (/ScatterND_5)
2024-05-13 15:19:36.112  [16248]  DEBUG   Node(s) placed on [TensorrtExecutionProvider]. Number of nodes: 23

With ORT 1.17.3:

2024-05-13 15:38:07.967  [  576]  DEBUG  Node placements
2024-05-13 15:38:07.967  [  576]  DEBUG   All nodes placed on [TensorrtExecutionProvider]. Number of nodes: 1

Is it because of the upgrade to TensorRT 10?
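
When comparing placements across builds, it can help to tally the summary lines instead of reading them by eye. The following is a minimal sketch that assumes the log format shown above; `CountNodesPerProvider` is a hypothetical helper, not part of ONNX Runtime.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ONNX Runtime): sums the node counts that
// the "Node placements" log section reports for each execution provider.
std::map<std::string, std::size_t> CountNodesPerProvider(const std::string& log) {
  // Matches both "Node(s) placed on [X]. Number of nodes: N" and
  // "All nodes placed on [X]. Number of nodes: N".
  static const std::regex summary(
      R"(placed on \[([^\]]+)\]\. Number of nodes: (\d+))");
  std::map<std::string, std::size_t> counts;
  std::istringstream lines(log);
  std::string line;
  std::smatch match;
  while (std::getline(lines, line)) {
    if (std::regex_search(line, match, summary)) {
      counts[match[1].str()] += std::stoul(match[2].str());
    }
  }
  return counts;
}
```

Feeding it the two logs above would give a non-zero entry for `CPUExecutionProvider` only in the 1.18.0 case, which is a quick way to flag an unexpected CPU fallback.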

@sophies927
Contributor Author

@jywu-msft

@jywu-msft
Member

+@chilo-ms , @yf711

@jywu-msft
Member

jywu-msft commented May 13, 2024

@JulienTheron can you provide us a repro test case or point us to a public model that can be used to investigate further? Are you also updating TensorRT version from 8.6 to 10.0 along with the ORT version update? +@chilo-ms

@JulienTheron

Yes, I used the default TensorRT version for each ORT release: 8.6 for 1.17.3 and 10 for 1.18.0.
I will see if I can find a public model to repro the issue with.

@JulienTheron

Here is a version of our model that we can share:
model.zip

To repro, just create a TensorRT session like this:

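#include <onnxruntime_cxx_api.h>

// Verbose logging so that session creation reports the node placements.
const Ort::Env env(ORT_LOGGING_LEVEL_VERBOSE);

Ort::SessionOptions options;
OrtTensorRTProviderOptionsV2* trtOptions = nullptr;
// The C API returns an OrtStatus*; ThrowOnError converts a failure into an exception.
Ort::ThrowOnError(Ort::GetApi().CreateTensorRTProviderOptions(&trtOptions));
// Register the TensorRT EP with default provider options.
options.AppendExecutionProvider_TensorRT_V2(*trtOptions);
Ort::GetApi().ReleaseTensorRTProviderOptions(trtOptions);

// The wide-string path is the Windows form; other platforms take a narrow string.
Ort::Session session(env, L"model.onnx", options);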

Thanks.

@sophies927
Contributor Author

Is there a GPU Python package for CUDA 12?

There will be one for the release. We had to make some updates to simplify publishing the CUDA 12 packages (see this PR), which is why they weren't included in the first round of release candidates.

@JulienTheron

JulienTheron commented May 15, 2024

Here's a follow-up to the TensorRT issue I mentioned.
I edited this entry:

onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/eb43908b02a296ea0594432f06e9d3fac288d672.zip;94d07871810a36a5bc70a1def5c50504101c9bd1

and replaced it with:

onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/bacfaaa951653cd4e72efe727a543567cb38f7de.zip;26434329612e804164ab7baa6ae629ada56c1b26

I then built with TensorRT 8.6.

The issue is now gone. I haven't quite figured out why ScatterND is not placed on the TensorRT EP with TRT 10, but I guess the issue has nothing to do with ONNX Runtime itself?

@edgchen1
Contributor

#20715

@Djdefrag

20713

@sophies927
Contributor Author

#20715

Thanks for calling this out! This appears to be an issue with the model input caused by an ONNX update, rather than an issue with ORT 1.18.

8 participants