Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] subsystem.Popen() should be called with a clean env #16118

Open
paulharris opened this issue Apr 22, 2024 · 14 comments
Open

[bug] subsystem.Popen() should be called with a clean env #16118

paulharris opened this issue Apr 22, 2024 · 14 comments
Assignees

Comments

@paulharris
Copy link
Contributor

paulharris commented Apr 22, 2024

Describe the bug

OS: Windows
Compiler: msvc-19 2022
Conan version: 2.2.3

Conan profile:

Profile host:
[settings]
arch=x86_64
build_type=Debug
compiler=msvc
compiler.cppstd=20
compiler.runtime=dynamic
compiler.runtime_type=Debug
compiler.version=193
os=Windows
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

Profile build:
[settings]
arch=x86_64
build_type=Release
compiler=msvc
compiler.cppstd=20
compiler.runtime=dynamic
compiler.runtime_type=Release
compiler.version=193
os=Windows
[conf]
tools.cmake.cmaketoolchain:generator=Ninja

There are many problems caused by unclean environments during the build step.
I get a variety of different behaviours depending on if I use CMD, or MSVC-CMD, or my own msys (git-sdk + pacman), or my msys was started after loading the MSVC vcvars environment.

This is especially visible when running autotools builds, which uses the msys environment during the build.

Example: ICU would not build in some of those environments: conan-io/conan-center-index#23679

Moving straight on from that, a nicer example is libvpx, which loudly trips over the !ExitCode environment variable (making this easier to reason about).

When running conan from within my msvc+msys shell, the build ends up running in an environment that has many layers:

  • Standard Windows environment + whatever extra env variables in my user profile.
  • MSVC vcvars developer environment
  • My msys environment starts with bash --login
  • IMPORTANT PART number 1 bash --login introduces more variables (in particular !ExitCode=00000001)
  • . devenv/Scripts/activate introduces more items into the environment.
  • conan create is executed ...
  • (At this point, Conan runs Python's Popen() with conan's build environment bat files, in a CMD/DOS environment)
  • Conan's python converts existing environ key names to UPPERCASE, which ends up in the Popen() environment - https://docs.python.org/3/library/os.html#os.environ
  • IMPORTANT PART number 2 The !ExitCode env variable is now !EXITCODE=00000001
  • Popen() starts up, runs the conan build-env bat files, which eventually starts msys2 with bash --login.
  • IMPORTANT PART number 3 now we have a SECOND variable with the same equivalent key name: !ExitCode=00000000 ... interestingly, the value is different (0, not 1), and somehow it ended up in the environment instead of replacing the existing value.

Crazy, but true. This is what (part of) the environment looks like at this point:

COMMONPROGRAMFILES(X86)=C:\Program Files (x86)\Common Files
!C:=C:\Program Files (x86)\Microsoft Visual Studio\Installer
!ExitCode=00000000
!EXITCODE=00000001
PROGRAMFILES(X86)=C:\Program Files (x86)
SHELL=/usr/bin/bash

Finally, conan can execute autotools make, which executes and then fails with:

     3>C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(75
       5,5): error MSB6001: Invalid command line switch for "CL.exe". System.ArgumentException: Item has already been a
       dded. Key in dictionary: '!EXITCODE'  Key being added: '!ExitCode' [M:\cocache4\cache\b\libvp414895aa790c4\b\bui
       ld-debug\vpxrc.vcxproj]

Note that this is a really nice example, but plenty of other packages will change behaviour (sometimes non-fatally) with unclean environments.
For example, libvpx WILL build with a slightly different environment, but it won't produce debug libraries as requested! So the package installs with errors, only headers and no libs.

NOTE: To debug this, I added a breakpoint() to libvpx's recipe, before autotools.make(), then hand-edited the conan/bat files in the cache, adding a bunch of printouts of the environment.

How to reproduce it

  • Open MSVC Developer command prompt.
  • Run your local msys2 bash, this is how I do it: K:\git-sdk-64\bin\bash --login
  • Check env for our nice test variable: env | grep -i exit ----> !ExitCode=00000001
  • cd to your local conan-center clone, and do a conan-create:
    • cd /k/conan-center/recipes/libvpx/all
    • conan create . --version 1.13.1 --user test --channel wacky-env -pr:h the-host-profile -pr:b the-build-profile --build=missing
  • Observe lots of red errors.
paulharris added a commit to paulharris/conan that referenced this issue Apr 22, 2024
Do not allow the user's shell environment to leak into the build
environment. Instead, create a clean environment with just white-listed
environment keys.
@memsharded
Copy link
Member

Hi @paulharris

I am not sure this is a bug, this is how Conan and recipes in ConanCenter are designed to work:

  • It is generally assumed that Conan runs within an environment without VS vcvars activated, this is, not within a VS prompt either. VS doesn't allow to activate another environment on top of it, so that means that Conan wouldn't be allow to honor settings as compiler.version
  • It is generally assumed that Conan runs in a non msys2 environment, recipes in ConanCenter have win_bash=True and tool_requires(msys2...) to inject automatically the msys2 bash path.

If this is not what you want, there are some ways to inhibit some of this behavior:

  • tools.microsoft.bash:active will assume that it is running inside bash, and then not force the bash -c wrapper
  • tools.microsoft.msbuild:installation_path="" as empty string will disable the generator VCVars and it will not create and acivate a new VS prompts or its variables.

@memsharded memsharded self-assigned this Apr 22, 2024
@paulharris
Copy link
Contributor Author

I think the old conan1 had to be run in a vcvars shell, or at least that was how I was running it.
I'm happy to avoid running in a vcvars shell going forward.

It seems to work fine from within msys shell. To be clear, I run in there so I can use the other msys tools. I don't want to use my msys as part of the build chain.

So I want conan2 to create its own little bubble of a tightly controlled and repeatable environment each time it runs. I thought that was what conanbuild.bat was for...

Otherwise, is everyone expected to run a CI build farm or build in docker or something?
I haven't seen anything in the docs about how you should ensure your PATH and environment is clean before you run conan.

How are other people doing this? How exactly do you spin up a clean environment that doesn't involve building on a VM or container? And if this is what we should be doing as standard, shouldn't it be in the tutorials?

@valgur
Copy link
Contributor

valgur commented Apr 22, 2024

This would be a good opt-in conan conf option to have, I think.

@memsharded
Copy link
Member

I think the old conan1 had to be run in a vcvars shell, or at least that was how I was running it.
I'm happy to avoid running in a vcvars shell going forward.

I am a Windows user, and I didn't have to run in a vcvars shell in Conan 1 either. Most modern Conan generators in Conan 1 have been generating/calling VCVars to create the conan_vcvars environment activation when the build system is not enough (CMake with VS generators doesn't need it, but it does when using Ninja). In any case Conan can provide the activator when needed.

So I want conan2 to create its own little bubble of a tightly controlled and repeatable environment each time it runs. I thought that was what conanbuild.bat was for...

conanbuild.bat injects the environment from tool_requires for recipes at build time. But it is not a bubble, it doesn't provide full isolation of the system. It adds/appends/prepends/applies/unset the tool_requires definitions, and also those coming from profiles [buildenv].

It is the recipes or the profiles that can alter the environment the way they want, they can even unset defined environment variables. But it is not the responsibility of Conan to provide isolation from the system, because actually a lot of users rely on the system environment, and use it together with Conan. An extremely common situation is Conan adding tool_requires("cmake/..."), but having the compiler defined in the system, like with CC/CXX and other environment variables. If Conan would remove all environment this wouldn't work, and it will force users to always define everything in Conan profiles, and that is not the intention or the best default either.

How are other people doing this? How exactly do you spin up a clean environment that doesn't involve building on a VM or container? And if this is what we should be doing as standard, shouldn't it be in the tutorials?

No, it shouldn't be necessarily the default or standard Conan behavior. C++ devs have been working without isolated "virtualenvs" since always, and they are used to often rely, at least partially on the environment.

@memsharded
Copy link
Member

This would be a good opt-in conan conf option to have, I think.

I am not sure this works as a "feature" to opt-in, and it sounds to me like the typical rabbit hole, where users need more and more configurability because the implemented behavior is not good as there are too many different use cases, like: "Hey, why are you unsetting the variable "Program Files", I need it for my recipes to work, can you add a new conf that allows to exclude the "Program Files" from the unsetting?"

I think having a tool-require that defines a buildenv_info that unsets everything you want to unset would be just a very few lines and if you tool_requires it before the other tool-requires, I think it can work.

@paulharris
Copy link
Contributor Author

Firstly, thanks for engaging in this debate. I think it is important, at least if all it does is educate me.

I didn't run this through a chatgpt politeness filter, I'm just trying to lay out the arguments for and against so please just read it as the Arguments for the Opposed. I look forward to your rebuttal.


The tool-require approach feels like a hack for only me, when I'm sure other people have this same problem.
And how can I guarantee it will run before anything else?
I literally want it to run before any of the conan buildenv bat files run.


An extremely common situation is Conan adding tool_requires("cmake/..."), but having the compiler defined in the system, like with CC/CXX and other environment variables.

For this specific example, cmake should be defined in tool_requires (as you say),
and CC/CXX should be defined by the profile...
this is [conf] tools.build:compiler_executables

Any other environ variables should be in the profile, that is the job of the profile.
If there are environ vars required for just one recipe, there is a place in the recipe for that.

And when I need the environ to be different for a release build, then that is why there is more than one profile.


I think the whole point of conan builds is to have repeatable builds. There was even a blog post about deterministic builds. https://blog.conan.io/2019/09/02/Deterministic-builds-with-C-C++.html
At no point does it raise the problem of the shell environment that conan is running in. I had always assumed that when conan runs, it sets up the whole environment.

IMO it would be a lot more reliable and predictable for conan to start with a blank environment, and add things IN explicitly.
I say this as a recipe author too. I don't want everything to work on my computer, and then build differently on the CI or someone else's computer. This is exactly what was happening to me with libvpx. Proof-of-conan built fine, but it wouldn't build on my computer, due to extra stuff in the environment.

I don't even use libvpx directly, but now I've had to spend time hacking on that recipe understanding what is going on, when it was something in the environment unrelated to conan or the recipe.

I also want repeatability.
I don't want my builds to change just because this week I installed some bit of software and it added something into the environment. That way leads to insanity. Often the change in output won't become apparent until later (eg small change in the library behaviour), so it is hard to tell what (and when) it changed.


The PR of mine only has whitelisted items because I found nothing would work at all in a Popen call with a completely blank env. So I added what looked like the most basic environment elements that you would get in a blank machine.

And it worked brilliantly. If something extra is needed, I can add it in the profile or recipe.

I would go one step further and remove the added path to the python executable (my second commit in the PR).

# FROM MY PR #
        # Also get the path to conan's python, as some packages assume access to python
        # eg dav1d
        # This will also give access to ninja.exe and whatever else is installed in the devenv
        python_path = os.path.dirname(sys.executable)
        newenv['PATH'] = f'C:\\WINDOWS\\system32;C:\\WINDOWS;{python_path}'

Note that "giving access to ninja" is a negative, not a positive. The Ninja exe should be specified by conan, not whatever is in my PATH today.

The dav1d recipe requires python, and it got it out of the environment's PATH.
I see that as a recipe bug. If a recipe requires python, then, WHICH python does it require? That should be a build-requirement for the recipe, right? Like cmake.


I didn't make the same changes for Linux (yet) because I'm currently in Windows mode.
But I feel we should also clear out the environment for Linux and Mac too.

@paulharris
Copy link
Contributor Author

I should mention that I don't think the PR is finished. eg running Git.clone() fails without the shell's env, but I would like to see a more controlled environment.

@paulharris
Copy link
Contributor Author

In one of my recipes, I execute git and some other commands in the source() method.
It uses self.run() and Git.clone(), which (according to the docs) will execute conanbuild.sh as well as any tool-requires scripts required.

However,
source() is executed before there is a generator folder. That means we can't define the environment for source() to run commands, and we can't use tool-requires within source().

eg, I was not able to run envvars.save_script() within source().

@paulharris
Copy link
Contributor Author

The with env.vars(self).apply(): pattern also won't work with this PR.
We would have to clear the environment at the start, before running conan.
I don't think this can be done via tool_requires.

@paulharris
Copy link
Contributor Author

I've adjusted my approach and moved the environment-cleaner to the start of the conan process, otherwise EnvVars.apply() doesn't work - as @memsharded would've guessed.

To make it work with EnvVars.apply would likely cause too much change in the conan code, so now I'm looking at clearing the env while calling conan.

@paulharris
Copy link
Contributor Author

I'm not sure this would deal with all the issues, ie shouldn't conanbuild.bat be clearing stuff out so that cmake-executions and ninja-builds will also get a clean environment?

@memsharded
Copy link
Member

Thanks very much for your feedback.

I am closing #16119, based on the comments in #16119 (review). This is not considered a bug, but this is totally the intended and expected behavior of Conan. If users want to enforce a "curated" set of environment variables, this is a responsibility that is outside of Conan scope, and specific to every user, so this is not something that Conan can implement by default.

@paulharris
Copy link
Contributor Author

I think the biggest problem was !ExitCode, which is later mangled into !EXITCODE by python's environment dict.
When a second !ExitCode is added due to a second msys layer, MSVC's tools freak out.

I understand why it can't be merged, I'll see if I can find a workaround along the lines of what you've suggested above.

@memsharded
Copy link
Member

I think the biggest problem was !ExitCode, which is later mangled into !EXITCODE by python's environment dict.
When a second !ExitCode is added due to a second msys layer, MSVC's tools freak out.

There was a recent fix for Conan 2.3, that avoids some internal uppercasing that Conan was doing in Windows for all env-vars, check #16122, or try running from the develop2 branch to see if this improves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants