Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/environment propagate #8534

Merged
merged 40 commits into from Mar 23, 2021

Conversation

memsharded
Copy link
Member

@memsharded memsharded commented Feb 19, 2021

Changelog: Feature: New environment management, with correct value capturing and restoring on deactivate
Docs: conan-io/docs#2060

New approach to environment (env-vars) management:

VirtualEnv

  • New generator that will generate buildenv.xxx and runenv.xxx
  • Collects info from dependencies visiting the graph, not by the accumulated deps_env_info
  • It also collects information from profiles
  • It automatically collects runtime information based on cpp_info definition: exes, bin-paths, lib_paths to define PATH, LD_LIBRARY_PATH, DYLD_LIBRARY_PATH envvars

ConanfileDependencies

  • Entry point in each recipe to visit its dependencies
  • Access to dependencies conanfiles is restricted by a "read-only" conanfilewrapper

CMakeGen

  • Proof of concept of an aggregated meta-generator that includes CMakeToolchain, CMakeDeps and VirtualEnv
  • CMake build helper will integrate calls to VirtualEnv generated files

Copy link
Contributor

@lasote lasote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall idea looks pretty good. I miss some more unit tests mostly and we need to try this with more pieces (toolchains etc) and real recipes to confirm that it fits.

conans/test/unittests/tools/env/test_env.py Outdated Show resolved Hide resolved
conan/tools/env/environment.py Outdated Show resolved Hide resolved

def test_profile():
myprofile = textwrap.dedent("""
# define
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to specify a separator in the profile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it seems the syntax will be weird, especially to designate whitespaces

@memsharded
Copy link
Member Author

memsharded commented Feb 24, 2021

Please @lasote check the changes in 5f323b1:

  • Proposes CMakeGen integration that aggregates everything: CMakeToolchain, CMakeDeps, VirtualEnv
  • CMake will use the buildenv environment if existing, to inject env-vars

@lasote
Copy link
Contributor

lasote commented Feb 24, 2021

@memsharded sublime. Love it.

@solvingj
Copy link
Contributor

solvingj commented Feb 25, 2021

Excellent work! I really now appreciate the cleaner separation and management between the environment variables for build and run. I think the CMakeGen meta-wrapper makes sense now that we've had some time for the other parts to evolve.

I will use this opportunity to point out that the CMake helper would have been a good candidate to use the alternative generic/python .env strategy I proposed offline (provide a third option, de-coupled from user shells):

  • generate generic run.env and build.env files with pure key/value pairs
  • provide a conan command which can read and use these env files:
    • conan exec --withenv=run.env cmake build ...
    • this would just be a super-simple wrapper for tools.with_env(env): and os.subprocess('whatever')
  • use this mechanism in CMake helper instead of calling the .bat/.sh scripts

I understand it's a bit too ambitious and un-proven as a concept internally yet, so it's just a suggestion for future. But, I'm confident it's a great idea to provide this option. I definitely think it would be a better internal implementation for our own python code such as this (and every other future build system wrapper). But also, consider the vast number of other users trying to orchestrate Conan and these environments with their own python code, or cmake scripts, etc. It's really unnecessary and has significant disadvantages to have to go through the shell scripts with to get to the environment variables like this:

https://github.com/conan-io/conan/pull/8534/files#diff-56b36c23589a66b0bd803c545703684c74400cbb7bd8538b8949ed84613e3fc7R22-R27

def environment_wrap_command(filename, cmd):
    if filename.endswith(".bat"):
        return "{} && {}".format(filename, cmd)
    elif filename.endswith(".sh"):
        return 'bash -c ". {} && {}"'.format(filename, cmd.replace('"', r'\"'))
    raise Exception("Unsupported environment file type {}".format(filename))

The fact that we're not even providing any option to produce generic .env files that can be automated with custom scripts/languages feels like an obvious thing we could change and improve. Whether or not we provide the wrapper command could be a separate discussion.

That was the only thing I would suggest for fundamental change, perhaps in the next iteration. Even without that, this is a huge improvement and feels like we're closer to having this part really mastered.

@memsharded
Copy link
Member Author

Thanks @solvingj for the review.

There are a few things that prevent to have a nice solution based on generic .env files. Basically the assumption generate generic run.env and build.env files with pure key/value pairs doesn't hold:

  • The composition logic of environments results in complicated patterns to resolve. One environment should be able to do something like NEW_VAR = Val1 Val2 <Previous NEW_VAR> Val3 Val4, as a result of appends and prepends of information. Different shell scripts should now be able to parse those lines, interpret and correctly evaluate them.
  • There is no unique placeholder for representing the current/previous value of the environment variable. While it is very easy to encode such as $VAR, %VAR%, achieving that when the variable name is a variable itself is at least challenging, because the shell and batch evaluation is often based on expansion and not pure variable evaluation.
  • Paths need to be handled, because the path separator is different between OSs, but also different for different subsystems. So it cannot be replaced in the .env file, and some mechanism to identify "this is a path", and concatenate them is necessary. Now the parsing becomes more complex because paths with path spaces, and need to quote/unquote things, which makes things more complicated and fragile.

Note that the shell/batch script is still mandatory, we cannot go without it. After a conan install it must be possible to use native tools, and be able to build without calling any other Conan command again. If we don't provide the shell scripts, then a conan build is mandatory again, and forbids many other user flows and tools.

Said that, I think that generating serialized xxxx.env files from the Environment class is an easy thing, assuming they will be deserialized and used by Conan only, and not by the shell scripts. But having our code rely on that functionality might end in non-working user shell scripts, (because Conan internal .env management will eventually diverge from the scripts, which is exactly what we are trying to achieve with the new build system integration paradigm). So at the moment I prefer to focus on one way to do this thing, and make sure that it works reliably, robustly, and with low cognitive overhead, and there will be time to consider UX enhancements like this one.

Copy link
Contributor

@ytimenkov ytimenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand better (I always confuse build vs host, to me it's host vs target 😊):

  1. build requirement has bin_dirs / lib_dirs (e.g. a tool cmake, flatc) -> ends up in buildenv (PATH / LD_LIBRARY_PATH)
  2. package has bin_dirs / lib_dirs and it's application -> ends up in runenv when I do conan install <ref> -f VirtualEnv
  3. Package is a (host) requirement and has bin_dirs / lib_dirs -> ends up in runenv.

correct? So for development I activate only buildenv and get my tools.

But then buildenv_info / runenv_info are less clear:

  1. a package is build requirement: where do its build / run infos go?
  2. a package is a (host) requirement: ditto, where do explicit build/run envs end up?

Now the most unclear thing is env_info.PYTHONPATH. We use python packages and they can be both build requirement (as a plug in for build tool) and ordinary (a test runner when package is consumed).
Naturally I would expect to use buildenv in first case and runenv in the second. What are the needed changes for my setup? Should those recipes be changed to set both build_env_info and run_env_info properties?

Another thing to confirm: requirements from profile are always build ones, right? Asking because we have Conan packages with gcc and python and they're part of profile. However when running tests I would like to have them: python for test runner and gcov to process coverage (thus python should be in PATH) so I have to activate both build and run environments, right?

The last part is a bit tricky with lockfiles: I can't specify profile when using lockfile, so I have to create a simple consumer file with single dependency. Then (as in item 3) my bin_dirs / lib_dirs should end up in runenv, right?

pass


def environment_wrap_command(filename, cmd):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be made somehow extensible (even maybe a jinja template)? bash may not be present everywhere (well, maybe sh is required for POSIX).
What about powershell on Windows?
Also for linux sh will do for running from toolchain, but support for other shells should be possible for user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the current scripts are using bash syntax, it is not just calling it, the scripts themselves should be adapted for other shells. If it is possible to make the xxxenv.sh scripts shell generic, that would be great.

The important bit I needed specific bash features if for capturing the environment for the deactivate_xxx.sh script.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bash requirement has been removed. Powershell support will also be added.

if filename.endswith(".bat"):
return "{} && {}".format(filename, cmd)
elif filename.endswith(".sh"):
return 'bash -c ". {} && {}"'.format(filename, cmd.replace('"', r'\"'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an escaping hell (backslashes need to be escaped as well for example, maybe spaces...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our tests run by default with spaces in paths, but indeed we should check it. We are adding more real tests that will hopefully cover these cases.

self._values[name] = value + [_PathSep] + [_EnvVarPlaceHolder]

def save_bat(self, filename, generate_deactivate=True, pathsep=os.pathsep):
deactivate = textwrap.dedent("""\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point about Jinja: if you have those templates then you can just go through the list of templates and generate each one. This will be customizeable and easily extensible. For example add setting:
virtualenv.templates=buildenv.sh.j2,runenv.sh.j2,buildenv.fish.j2
(Also modules / blocks can be loaded from config file location).

I rewrote my generator in Jinja. The template is arguably easier to read but python code is significantly simpler without all if'ed logic.

Another benefit of such user-extensibility is that it would be easier to write plain .env files as @solvingj asked and which could be used in IDE (at least many VS Code plugins start supporting .env files: python, C++ debugger).

Also doing this as a template doesn't mean immediate decision to publish as a user-facing contract, but still (I believe) will reduce complexity of python code.

The only trick is to carefully handle this for pyinstaller / zip-friendlyness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree it would be great to provide templates for this, and provide user extensibility. At this point my priority is to define the new paradigm:

  • How generators access dependencies information via a visitor (ConanFileDependencies)
  • The new separation of build and run envs
  • The explicit generation of environment scripts for handling env-vars instead of doing everything dynamically in Conan (and thus not allowing many user flows)
  • A clear and explicit mechanism to define env-var operations: append, prepend, clear, define, and how these compose for multiple levels (profile, build-requires, toolchains...)

Still the problem of .env files is not fully clear. In Conan we clearly have the user demand for those different operations: append, prepend, define, unset, and those .env files cannot do that. Also, in Conan users might need to concatenate more than one environment definition (for example the definition coming from profiles and build-requires, and another definition coming from a toolchain), that both will need to perform different operations like appending. Those .env files cannot do this either, up to my knowledge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still the problem of .env files is not fully clear.

At least in VS Code many plugins can use .env files instead of having settings. For example, python extension appends PYTHONPATH from it to its internal search path, C++ extension has similar option in debug configuration (for example to run program with modified LD_LIBRARY_PATH).

It may not even be possible to run IDE in such an activated environment, because, for example for workspace each folder (repo / project) has own settings / environment.

I'm not suggesting adding this by default, but some users may find it useful to customize and produce own files, even if they for example capture complete variables at the install time.

@memsharded
Copy link
Member Author

The mapping of explicitly define env-vars in buildenv_info and runenv_info should be straightforward, each one goes to its respective file

a package is build requirement: where do its build / run infos go?

A build requirement should only define buildenv_info, as by definition the build-requirement should not exist in runtime.

a package is a (host) requirement: ditto, where do explicit build/run envs end up?

buildenv_info will end in buildenv.xxx and runenv_info ends in runenv.xxx

@memsharded
Copy link
Member Author

Now the most unclear thing is env_info.PYTHONPATH. We use python packages and they can be both build requirement (as a plug in for build tool) and ordinary (a test runner when package is consumed).
Naturally I would expect to use buildenv in first case and runenv in the second. What are the needed changes for my setup? Should those recipes be changed to set both build_env_info and run_env_info properties?

Yes, I guess so. Maybe if the tests are only intended to run in the build() method, it could be considered a "buildenv" only. But it might be perfectly possible that different python utilities are used for the build and other for the tests, so having them in each build/run info, would make sense, wouldn't it? I am not sure if clear from the code, but it would be enough to define one Environment and assign it to both.

Another thing to confirm: requirements from profile are always build ones, right? Asking because we have Conan packages with gcc and python and they're part of profile. However when running tests I would like to have them: python for test runner and gcov to process coverage (thus python should be in PATH) so I have to activate both build and run environments, right?

The profiles will also contain [buildenv] and [runenv] both for build and host context while building. So far the [env] was only build one, but with this new proposal is possible to also define vars for the runenv, which might be useful for many users.

The last part is a bit tricky with lockfiles: I can't specify profile when using lockfile, so I have to create a simple consumer file with single dependency. Then (as in item 3) my bin_dirs / lib_dirs should end up in runenv, right?

This is not very clear, lockfiles contain a copy of the profiles, shouldn't require any specific setup?

@memsharded
Copy link
Member Author

@ytimenkov

I am experimenting with removing the environment capture, and it works without bash, which I agree would be better. I'll check if there is a possibility to capture the "deactivate_xxx.sh" script in pure shell without using bash, and at the moment, I am making the generation of "deactivate_xxxx.xxx" scripts opt-in, instead of opt-out.

Copy link
Contributor

@ytimenkov ytimenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if the tests are only intended to run in the build() method, it could be considered a "buildenv" only.

We run tests separately. So job1 builds conan package, uploads it and publishes a lockfile as an artifact. Next jobs take this artifact, do echo -e "[requires]\n$REF" > conanfile.txt && conan install . -g virtualenv -g virtualrunenv -o pkgname:testing=True and then like python3 test.py which is in the path and package-specific so it's the purpose of Conan package to provide right info to run particular tests.

However this is slightly irrelevant: PYTHONPATH is defined explicitly and I can put it to both run_env and build_env. This part is straight-forward.

I think my primary question was about autorun_environment (new function in this PR) and where do PATH and library paths end up for requirements and build requirements (including ones coming from profiles).

self._values[name] = value + [_PathSep] + [_EnvVarPlaceHolder]

def save_bat(self, filename, generate_deactivate=True, pathsep=os.pathsep):
deactivate = textwrap.dedent("""\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still the problem of .env files is not fully clear.

At least in VS Code many plugins can use .env files instead of having settings. For example, python extension appends PYTHONPATH from it to its internal search path, C++ extension has similar option in debug configuration (for example to run program with modified LD_LIBRARY_PATH).

It may not even be possible to run IDE in such an activated environment, because, for example for workspace each folder (repo / project) has own settings / environment.

I'm not suggesting adding this by default, but some users may find it useful to customize and produce own files, even if they for example capture complete variables at the install time.

@lasote
Copy link
Contributor

lasote commented Mar 16, 2021

For me is very important that everyone understand that we are building now the foundations of the Conan 2.0 and only a complete model will allow us to build on top of it the N layers to complete a successful Conan 2.0. So my vote is a big YES to merge this.
If later we realize that we want to revert it, we can perfectly do it, it is just an addition that it is not breaking anything in the Conan 1.X. I think it doesn't make any sense to keep this blocked and discussing if we can live without it when the real question is: Is this a better model? and the answer is, yes, it represents better the reality than the previous one. The rest of the considerations are just UX.

@jgsogo
Copy link
Contributor

jgsogo commented Mar 16, 2021

No, it doesn't represent the reality better. It is not about UX. It is about the model. It is adding an artificial environment that is not needed at all (we are waiting for an example, we have +750 recipes in ConanCenter and no one of them requires it). Without a minimal example, it is as arbitrary as adding self.sandboxenv_info or self.testenv_info.

@lasote
Copy link
Contributor

lasote commented Mar 16, 2021

Without a minimal example, it is as arbitrary as adding self.sandboxenv_info or self.testenv_info.

Multiple examples of variables that should exist at run but not on build already provided here

No, it doesn't represent the reality better. It is not about UX. It is about the model. It is adding an artificial environment that is not needed at all (we are waiting for an example, we have +750 recipes in ConanCenter and no one of them requires it).

"Not needed" is different from "useless". Can the recipes work without this feature, sure! they work! (even with vcpkg :P ) Can these variables exist at build time? sure, they can! but they shouldn't. That's what I mean by a better model.

@jgsogo
Copy link
Contributor

jgsogo commented Mar 16, 2021

Without a minimal example, it is as arbitrary as adding self.sandboxenv_info or self.testenv_info.

Multiple examples of variables that should exist at run but not on build already provided here

All those variables are needed when running as a build-requires and when running as standalone application. Why do you think you don't need those variables when using those packages as build-requires?

@lasote
Copy link
Contributor

lasote commented Mar 16, 2021

Why do you think you don't need those variables when using those packages as build-requires?

I don't think that, a better model that doesn't mix things up allows to manage it nicely:

#8534 (comment)

https://github.com/conan-io/conan/pull/8534/files#diff-5a2b3f42b8880cba4179072199deb461111c266409a3ce051e22e3bb1816084fR30-R32

@Psy-Kai
Copy link
Contributor

Psy-Kai commented Mar 16, 2021

@lasote do you talk about the need of something like

def package_info(self):
    self.buildenv_info.FOO = "Bar"
    self.runenv_info.HELLO = "World"

or something other? Because setting a runenv_info for a build-requires seems very odd (and vice versa). How would you explain the user that a build-requires package leaks information into the run-context? And how does a "normal" requires package leaks info into the build-context? That is very hard to explain. Mixing this things will lead to more complex package definitions.

Setting a single self.env_info and let the (build-)requires context decide how to propagate makes it simpler to understand and to use. And I cannot imagine an example where you would like to leak a build-context env variable into the run-context and vice versa. If the user want to mix these variables he sure can do manually in his conanfile (or by extracting them from conanbuildinfo.json). But I think that this will be a very very rare case.

@SSE4
Copy link
Contributor

SSE4 commented Mar 17, 2021

@jgsogo to be honest, I don't think ICU_DATA and friends are good examples.
first off, ICU_DATA is used as by executables, as by libraries, so it should be always set, basically.
second, it's needed in both contexts, as I may run x86 (build) ICU binaries:

$ conan install . --profile:build x86 --profile:host arm -g virtualbuildenv && ./activate_build.sh icurun <args>

but at the same time I may run ARM (host) binaries:

$ conan install . --profile:build x86 --profile:host arm -g virtualrunenv && ./activate_run.sh <myemu> icurun <args>

I need ICU_DATA to be set in both cases, for certain.
meanwhile, if I have in ICU conanfile something like:

    self.buildenv_info.ICU_DATA= "Foo"
    self.runenv_info.ICU_DATA= "Bar"

I don't see how it helps, it only complicates the things, introducing additional challenges in UX.
now I not only have build and host (run) environments, but I need to distinguish if I want to use something as build tool, or just regular run (there might be some conceptual or ideological differences, but still, both are just launching executables in the end).
so I technically can have 4 situations now:

  • I want to BUILD with icurun in BUILD context (launch x86 icurun binaries, but as a part of build process or as a build tool)
  • I want to RUN icurun in BUILDcontext (launch x86 icurun binaries, for other purposes)
  • I want to BUILD with icurun in HOST context (launch ARM icurun binaries, but as a part of build process or as a build tool)
  • I want to RUN icurun in HOST context (launch ARM icurun binaries, for other purposes)

IMO this is too much and super confusing.

@memsharded
Copy link
Member Author

@Psy-Kai

There are recipes that declare 2 different types of environment variables, and there are examples of both usages in ConanCenter. I am not even talking about build-requires, just a regular recipe containing a library as regular requires, in the “host” context:

Environment variables for build-time, most common example is XXX_ROOT

self.env_info.MYLIB_ROOT = self.package_folder.replace("\\", "/")

This one is an environment variable for building against it, more specifically for CMake find_package functionality to find it: https://cmake.org/cmake/help/latest/envvar/PackageName_ROOT.html. There are a few packages in ConanCenter using this approach.

This variable must exist in the conanbuildenv environment script, so CMake finds it when building.

On the other hand, there are env-vars for runtime, typically something like:

self.env_info.MYLIB_DATADIR = os.path.join(bindir, "datadir")

This is also something that can be found in ConanCenter recipes. This is the location needed at application runtime, to locate the necessary resources to run applications linking with this library. This information must exist in the conanrunenv environment script, so when the application is run, it will find that data directory at runtime.

The necessary representation for this is something like:

self.buildenv_info.MYLIB_ROOT = self.package_folder.replace("\\", "/")
self.runenv_info.MYLIB_DATADIR = os.path.join(bindir, "datadir")

Otherwise it will be impossible to distinguish the right target environment script for each one.

@memsharded
Copy link
Member Author

Seeing all the info, I have decided to move the PR forward as-is, there are a few pieces here that we need to keep moving forward. Even if it adds extra-model, it seems to exist real use cases that requires this flexibility, so I am keeping it. This part will not be documented at this moment, and we will keep evolving and investigating what is necessary and what not as we evolve the graph model. If we realize later that a single definition of environment is good enough, we will revert it.

Thanks all for the feedback!

@ytimenkov
Copy link
Contributor

Is it possible to keep all 3 instead of manual combination? env_info, runenv_info and buildenv_info In the recipe's package_info() and then use env_info + runenv_info or env_info + buildenv_info when generating? This will allow recipes to opt-in into differentiated behavior where needed so we can see later the demand for this feature.
If this is also used only with new generators there shouldn't be compatibility issues.

@jgsogo
Copy link
Contributor

jgsogo commented Mar 17, 2021

Thanks for the example. Now I understand the motivation after that proposal. This is something people have been doing with self.user_info: the consumer knows the libraries it requires and decides to use (or not) the variables provided by its requirements in host via self.user_info_host (legacy self.deps_user_info).

We can change our minds and start recommending this approach with environment variables, no problem. 👍


Questions in order to help users in future issues:

  • When the same variable comes from host context and build context, which one takes precedence? How it affects any future approach based on wrappers/shims to execute build-requires?

  • What is the plan for these new environment variables coming from host? Are they available before or after evaluating the method package_info() of the build-requires?

  • From my comment here, if I understand it correctly, the environment gathered from a build-requires is the self.buildenv_info from the build-requires plus the self.runenv_info from the requirements of the build-requires. Can a consumer decide that it wants the self.buildenv_info from the requirements of the build-requires?

    Example: a user that implements an SDK with a package that requires other tools like cmake, pkg-config, AndroidNDK,... those would be requires of the sdk-package. For sure, when that user declares in a consumer that he build-requires that sdk-package, he will need the self.buildenv_info from all of them. Is this by default and my comment was wrong? Is there a way to opt-in for this behavior?

    Note.- This is the scenario for the emsdk recipe (with python and nodejs packages).

And then, there are lots of details, but for me at the moment, these are some important points that will help me to make up my mind. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
New cross building model
  
Awaiting triage
Development

Successfully merging this pull request may close these issues.

None yet

7 participants