Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows build is failing on Gitlab CI/CD #24

Closed
CMCDragonkai opened this issue May 18, 2023 · 8 comments
Closed

Windows build is failing on Gitlab CI/CD #24

CMCDragonkai opened this issue May 18, 2023 · 8 comments
Assignees
Labels
bug Something isn't working r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@CMCDragonkai
Copy link
Member

Describe the bug

First discovered in #7 (comment), and posted upstream cloudflare/boring#121.

The windows build of the quic native module uses the MSVC toolchain for rust. Everything is working until it hits something involving boring-sys crate. This crate is for the tls stack, and the -sys packages always involves deeper compiler tools. In this case, being something that uses C/C++ and even the assembler.

The error being:

  thread 'main' panicked at '"enum_(unnamed_at_deps/boringssl/src/include\\openssl/err_h_291_1)" is not a valid Ident', C:\Users\gitlab_runner\.cargo\registry\src\github.com-1ecc6299db9ec823\proc-macro2-1.0.56\src\fallback.rs:811:9
  stack backtrace:
     0: std::panicking::begin_panic_handler
               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library\std\src\panicking.rs:575
     1: core::panicking::panic_fmt
               at /rustc/2c8cc343237b8f7d5a3c3703e3a87f2eb2c54a74/library\core\src\panicking.rs:64
     2: proc_macro2::fallback::is_ident_continue
     3: <proc_macro2::fallback::Group as core::fmt::Debug>::fmt
     4: proc_macro2::fallback::Ident::new
     5: proc_macro2::imp::Ident::new
     6: proc_macro2::Ident::new
     7: bindgen::ir::context::BindgenContext::rust_ident_raw
     8: bindgen::ir::context::BindgenContext::rust_ident
     9: <bindgen::ir::enum_ty::Enum as bindgen::codegen::CodeGenerator>::codegen
    10: <bindgen::ir::ty::Type as bindgen::codegen::CodeGenerator>::codegen
    11: <bindgen::ir::item::Item as bindgen::codegen::CodeGenerator>::codegen
    12: <bindgen::ir::module::Module as bindgen::codegen::CodeGenerator>::codegen::{{closure}}
    13: <bindgen::ir::module::Module as bindgen::codegen::CodeGenerator>::codegen
    14: <bindgen::ir::item::Item as bindgen::codegen::CodeGenerator>::codegen
    15: bindgen::codegen::codegen::{{closure}}
    16: bindgen::ir::context::BindgenContext::gen
    17: bindgen::codegen::codegen
    18: <bindgen::BindgenOptions as core::default::Default>::default
    19: bindgen::Builder::generate
    20: <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::next
    21: core::ops::function::FnOnce::call_once{{vtable.shim}}
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Right now there's no clues on what to do here... except perhaps to try and match the GitHub action CI that is being used on the https://github.com/cloudflare/boring project. In particular their CI is actually building something and then testing it on Windows.

The last successful build is: https://github.com/cloudflare/boring/actions/runs/4949623099/jobs/8852046264.

Somethings to investigate:

  1. We're using a much later version of LLVM/clang. They are using:
    Installing LLVM and Clang 11.0 (11.0.1)...
    Downloading and extracting 'https://github.com/llvm/llvm-project/releases/download/llvmorg-11.0.1/LLVM-11.0.1-win64.exe'...
    
  2. Consider matching the NASM versions too

If all else fails, we could try cross-compiling everything. Furthermore I haven't tested this on our local windows computer, that may provide more clues and this may be something specific to the Gitlab windows CI machine.

To Reproduce

  1. Reproduced here: https://gitlab.com/MatrixAI/open-source/js-quic/-/jobs/4300087852

Expected behavior

Should just build like macos and linux atm.

Additional context

@CMCDragonkai CMCDragonkai added the bug Something isn't working label May 18, 2023
@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 9, 2023
@CMCDragonkai
Copy link
Member Author

Currently the .gitlab-ci.yml has the build:windows disabled as a dependency to build:prerelease and release:distribution.

When the windows build finally works, these 2 should be re-enabled so that they are part of the distribution jobs.

@CMCDragonkai
Copy link
Member Author

Refer to this commit: 896e86f

When the windows jobs work, reverse the changes to the windows jobs.

@CMCDragonkai
Copy link
Member Author

Asking chatgpt about this: https://chat.openai.com/share/34ecb6cd-1251-4ab2-b797-92d860e034f1

Seems like idea is that the binding generator might be choking on an unnamed enum.

C/C++ can have unnamed enums. The file it is referring to might be this: https://github.com/google/boringssl/blob/master/include/openssl/err.h#L304

image

The next step should be to inspect how quiche upstream project builds their binaries or tests on windows, and trace their CI process to see if they do something special with boringssl or something else.

@CMCDragonkai
Copy link
Member Author

Error is different with boring v3, less detail this time. cloudflare/boring#121 (comment)

@CMCDragonkai
Copy link
Member Author

Ok so apparently there could be an issue with windows-sdk-10.

Basically the windows SDK is not installed by default in Gitlab CI machines: https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/gcp/windows-containers/blob/main/cookbooks/preinstalled-software/README.md

Or maybe it is... since by default we can see that the build logs does this:

  -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.17763.

Anyway the build logs complain about a missing stdalign.h, and according to cloudflare/boring#121 (comment), only 10.0.20348.0 supports this.

Now on chocolatey, it's a bit confusing when looking for windows SDK packages.

Basically it seems Windows has some confusing version numbers. But right now noiseratio packages the ones correctly: https://community.chocolatey.org/profiles/noseratio

The current latest package is https://community.chocolatey.org/packages/windows-sdk-11-version-22H2-all. So I'm going to try this afterwards.

There's also some environment variables that may be relevant:

    LIBCLANG_PATH: "C:\\Program Files\\LLVM\\bin"
    CMAKE_SYSTEM_VERSION: "10.0.20348.0"

However CMAKE_SYSTEM_VERSION is supposed to default to the latest SDK version anyway, so it might just work fine.

The LIBCLANG_PATH can be needed for certain clang operations.

@CMCDragonkai
Copy link
Member Author

So it actually built without CMAKE_SYSTEM_VERSION https://gitlab.com/MatrixAI/open-source/js-quic/-/jobs/5305440152.

So I think I'll now try the latest SDK version too.

@CMCDragonkai
Copy link
Member Author

Great windows build no longer fails, however we have errors coming from other sources. Will solve this in #69.

@CMCDragonkai
Copy link
Member Author

@amydevs maybe important for future windows related builds #24 (comment)

@CMCDragonkai CMCDragonkai self-assigned this Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices
Development

No branches or pull requests

1 participant