Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: C++ runtime on Windows #2806

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Conversation

HolyWu
Copy link
Contributor

@HolyWu HolyWu commented May 2, 2024

Description

Fix and enable C++ runtime on Windows.

Fixes #2247
Fixes #2371
Fixes #2484

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR in so that relevant reviewers are notified

@HolyWu
Copy link
Contributor Author

HolyWu commented May 4, 2024

Some observations about the failed tests.

@github-actions github-actions bot added the component: tests Issues re: Tests label May 4, 2024
Copy link
Collaborator

@gs-olive gs-olive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and thank you very much for the contribution. Added a few questions/comments. The comments on the test cases + added fixes look good and switching np.dtype("i") to np.int32 for that case is fine to include in the converter.

.github/workflows/build-test-windows.yml Outdated Show resolved Hide resolved
.github/workflows/build_wheels_windows.yml Outdated Show resolved Hide resolved
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels May 7, 2024
@github-actions github-actions bot removed component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels May 8, 2024
@gs-olive gs-olive added the ciflow/binaries/all Build for all Python Versions label May 17, 2024
Copy link

pytorch-bot bot commented May 17, 2024

No ciflow labels are configured for this repo.
For information on how to enable CIFlow bot see this wiki

@gs-olive
Copy link
Collaborator

gs-olive commented May 22, 2024

Hi @HolyWu - I wanted to share a few notes on the C++ runtime on Windows after some local testing:

  • When compiling locally, I had to upgrade the Bazel version to 6.3.2 so it could find cl and other C++ WIndows dependencies. Still, it fails during the ninja build. Have you seen the following error before?
../core/conversion/conversion.h(5): fatal error C1083: Cannot open include file: 'NvInfer.h': No such file or directory
  • I then tried downloading the artifact from the GHA job and installing, which worked.

Based on the GHA-installed package, I ran the CI-failing tests, for which I have the following results:

  • test_bert_base_uncased for both export and compile fails with transformers==4.41.0, but passes with transformers==4.39.3. Could you try fixing this version in the .yml?
  • test_linear_3_multi_dim_matrix + test_linear_4_multi_dim_matrix - these pass/fail nondeterministically for me on Windows. They fail more often and with larger variance than they do on Linux, but failures on this test have been seen on both systems. Needs further investigation on my end.
  • test_bert_base_uncased in the Torchscript path causes a Windows fatal exception on my machine as well - this should be skipped on Windows only while it is investigated.
    • The above is also resolved by using transformers==4.39.3 on my machine

@HolyWu HolyWu force-pushed the windows_cpp_runtime branch 2 times, most recently from 7a1d667 to f1c1c49 Compare May 22, 2024 13:39
@HolyWu
Copy link
Contributor Author

HolyWu commented May 22, 2024

Thanks for the notes. I have fixed the version of transformers and also add a missing test.

As for the error you encountered during ninja build, I haven't seen the exact same error. But I suspect you probably didn't run python setup.py bdist_wheel under x64 Native Tools Command Prompt for VS 2022 environment. The VS installer should have created the shortcut in your Start Menu. Or you can manually execute vcvars64.bat located in C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build. Now before running python setup.py bdist_wheel, you have to set DISTUTILS_USE_SDK env var first by executing set DISTUTILS_USE_SDK=1 (or you can add this env var to system permanently).

@gs-olive
Copy link
Collaborator

Thanks for the suggestions. I also added set INCLUDE={TENSORRT_PATH}, which enabled finding NvInfer.h, but now it shows errors for other .h files such as the following. I am not sure why the x64 Native Tools Command Prompt for VS 2022 does not appear to use the Windows PATH to source the .h files.

Python310\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory

@HolyWu
Copy link
Contributor Author

HolyWu commented May 23, 2024

PATH is only used for searching executable files, not headers or libraries. It's a bit weird that you had to manually specify include path for TensorRT, since it should have been set here and the directory should be there after bazel had completed building. Nevertheless, INCLUDE already has prefined paths under Command Prompt for VS 2022, so set INCLUDE={TENSORRT_PATH} overwrites them. You should set INCLUDE={TENSORRT_PATH};%INCLUDE% to append new path to existing one. You can echo %INCLUDE% to verify it.

@gs-olive
Copy link
Collaborator

gs-olive commented May 24, 2024

Hi @HolyWu - thanks for the suggestion. Ultimately, the issue was that the repo directory was not its original name, and so the NvInfer.h was not included in the setup.py. That has now been fixed, and it compiles to completion with bazel 6.3.2 (version 6.2.1 still says it cannot find cl and other MSVC utilities, even though they're available in that command prompt).

For some reason when importing the built package however, it shows OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed and the error seems to be sourced from torch_python.dll!THPGenerator_initDefaultGenerator. I'm not sure what is causing this issue, but I wanted to check if there might be any other .dlls I am missing.

The line causing the issue is where the library is registered with PyTorch and ultimately loaded, here:

  File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 114, in <module>
    _register_with_torch()
  File "c:\torch_trt_fork\py\torch_tensorrt\__init__.py", line 107, in _register_with_torch
    torch.ops.load_library(linked_file_full_path)

@HolyWu
Copy link
Contributor Author

HolyWu commented May 24, 2024

I have no clue. torchtrt.dll does not have a dependency on torch_python.dll, since torch_python.lib does not get used during linking torchtrt.dll. Can you upload your built wheel?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/binaries/all Build for all Python Versions cla signed component: api [Python] Issues re: Python API component: build system Issues re: Build system component: tests Issues re: Tests
Projects
None yet
3 participants