Elizabeth's Conceptual Framework for Environments

Some feedback from Todd:

I actually like this overview a lot, with some caveats:
- Thinking about specs as functional expressions is exactly the language that Nix and Guix use, though they take a much finer-grained approach to hashing, and I generally find that they don't offer the same level of templating that Spack offers. They build a fixed stack based on the current state of package files, and IMHO you can't swap things as easily or consistently; there's no concretization step where you can let custom policies handle decisions. I think this is an advantage of Spack.
- I think the terminology could use some work and might throw people off (maybe this is Spack’s fault).
- I am not so sure what the right way to manage conflicts within environments is.
  - Should environments even allow conflicting packages?

[@citibeth] Yes, environments should allow conflicting packages. If they didn't, they would be semantically equivalent to a single spec, because you can always create a "bundle package" that includes a bunch of other packages. Conflicting packages are common, and no problem at all, as long as things aren't linked together. If I create an environment that involves emacs, git and py-numpy, I don't really care if those three have conflicting dependencies.

Conceptual Framework

Let's think of a fully concretized Spack spec like an expression in a functional language. The formal parameters of the function are its dependencies (which will be bound to other function expressions) and its variants (which will be bound to single values). There's an additional constraint that each function can appear only once in an expression, after common subexpression elimination has been applied. Let's not push this analogy too far; it is useful only to the extent that it yields insight.
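To make the analogy concrete, here is a toy model; the class and function names are hypothetical and are not Spack's internal API. A concretized spec is an immutable node whose variants are bound to values and whose dependencies are bound to other nodes. Because structurally equal nodes compare equal, common subexpression elimination reduces to set membership, and the "each function appears once" rule becomes a check that every package name maps to a single distinct node.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Spec:
    """Toy concretized spec: a package 'function' applied to its arguments.

    variants are bound to single values; deps are bound to other Spec
    expressions. Frozen + tuples make it hashable, so structurally equal
    subexpressions compare equal (which is all CSE needs here).
    """
    name: str
    version: str
    variants: Tuple[Tuple[str, str], ...] = ()
    deps: Tuple["Spec", ...] = ()


def distinct_nodes_by_name(spec: Spec) -> Dict[str, Set[Spec]]:
    """Walk the expression, merging identical subexpressions."""
    seen: Dict[str, Set[Spec]] = {}
    stack = [spec]
    while stack:
        node = stack.pop()
        bucket = seen.setdefault(node.name, set())
        if node in bucket:
            continue  # identical subexpression already visited
        bucket.add(node)
        stack.extend(node.deps)
    return seen


def each_package_once(spec: Spec) -> bool:
    """True if, after CSE, no package name is bound to two different nodes."""
    return all(len(nodes) == 1 for nodes in distinct_nodes_by_name(spec).values())


zlib = Spec("zlib", "1.2.11")
libpng = Spec("libpng", "1.6.37", deps=(zlib,))
app_ok = Spec("app", "1.0", deps=(libpng, zlib))                    # zlib is shared
app_bad = Spec("app", "1.0", deps=(libpng, Spec("zlib", "1.2.8")))  # two different zlibs
```

`app_bad` fails the check because `app` and `libpng` call `zlib` with different arguments. That is exactly the kind of conflict forbidden inside one spec but, as argued above, acceptable across an environment.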

Spack can be thought of as coming in two parts: (a) the concretizer, and (b) the installer. The Spack installer takes a fully concretized spec and produces a tree on disk corresponding to that spec. With a few avoidable exceptions, this operation is stateless and free of side effects: installing the same spec later will yield the same result. In this way, the installer is boring.
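A toy illustration of the "boring," stateless installer: if the install location is computed purely from the concretized spec, installing the same spec twice targets the same place. The hashing scheme and the `install_prefix` layout below are assumptions for illustration, not Spack's actual DAG hash or directory layout.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Spec:
    name: str
    version: str
    variants: Tuple[Tuple[str, str], ...] = ()
    deps: Tuple["Spec", ...] = ()


def spec_hash(spec: Spec) -> str:
    """Deterministic hash over the spec and, recursively, its dependencies."""
    payload = {
        "name": spec.name,
        "version": spec.version,
        "variants": sorted(spec.variants),
        "deps": sorted(spec_hash(d) for d in spec.deps),
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:8]


def install_prefix(root: str, spec: Spec) -> str:
    """Install location is a pure function of (root, spec): no other state is read."""
    return f"{root}/{spec.name}-{spec.version}-{spec_hash(spec)}"
```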

When thinking about environments, most of the stuff we argue about most passionately happens in the concretizer, not the installer. Moreover, all the algorithms and features described in this WG can be achieved without actually installing anything (unless you want to USE an environment, of course). So let's focus on the concretizer: how it is used to generate packages for installation and, later, for use.

Spack Environment: Definition

Let's define a Spack environment as a set of fully concretized specs, plus possibly some annotation information (below). These specs don't have to be compatible, i.e. they can have conflicting dependencies. (If specs in an environment were constrained to be compatible, there would be no point in defining an environment; a single spec can already be used to declare a set of compatible specs.) Spack already works implicitly with environments in some settings. For example:

- Spack today implicitly defines what I'll call the global environment: the set of all packages ever installed by Spack. That environment includes annotations detailing which packages were installed explicitly vs. implicitly. Unfortunately, the global environment is not very semantically meaningful if one is working on more than one project, or has installed more than one generation of Spack software into it.
- spack install + spack activate creates an environment, and then links the sub-packages of that environment into the main package. This is convenient, but it breaks the side-effect-free nature of Spack. A safer, more powerful way to do the same thing would be nice.
- Any time somebody builds a package and then runs spack module load more than once, an environment is implied.

Environments are independent of the install procedure. Conceptually, environments may be created in two ways:

- Constructed piece by piece, interleaved with installation: as each package is installed, it is added to the environment. This is how the global environment is currently constructed.
- Constructed completely independently of installation. For example, a set of spack spec commands implies an environment without any related installation. In principle, an environment could be created and concretized first, and installed later.
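A minimal sketch of the definition and the two construction routes, using hypothetical names (`Environment`, `install_and_record`, `environment_from_specs`) that are not part of Spack. The environment is an ordered collection of concretized specs with per-spec annotations such as explicit vs. implicit installation; two specs that depend on different versions of the same package can coexist in it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Spec:
    name: str
    version: str
    deps: Tuple["Spec", ...] = ()


@dataclass
class Environment:
    """Concretized specs plus annotations; specs need not be mutually compatible."""
    specs: List[Spec] = field(default_factory=list)
    annotations: Dict[Spec, Dict[str, str]] = field(default_factory=dict)

    def add(self, spec: Spec, **notes: str) -> None:
        if spec not in self.annotations:
            self.specs.append(spec)
            self.annotations[spec] = {}
        self.annotations[spec].update(notes)


def install_and_record(env: Environment, spec: Spec, installed: List[Spec],
                       explicit: bool = True) -> None:
    """Way 1: grow the environment as a side effect of installing (dependencies first)."""
    for dep in spec.deps:
        install_and_record(env, dep, installed, explicit=False)
    if spec not in env.annotations:
        installed.append(spec)  # stand-in for actually running the installer
        env.add(spec, installed="explicit" if explicit else "implicit")
    elif explicit:
        env.add(spec, installed="explicit")  # an implicit dep later requested by name


def environment_from_specs(specs: Iterable[Spec]) -> Environment:
    """Way 2: build the environment from concretized specs; nothing is installed."""
    env = Environment()
    for spec in specs:
        env.add(spec)
    return env
```

With `emacs` depending on `zlib@1.2.11` and `git` on `zlib@1.2.8`, both routes produce a valid environment; only the first one touches the (stand-in) installer.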

Envspec: Definition

An envspec is a set of non-concretized specs supplied by the user: Spack can produce an environment from an envspec by concretizing each spec in it. The concretization function is "interesting," in that it is complex, highly configurable, and not necessarily stateless. The following things affect concretization:

- The version of Spack.
- The list of available versions in a package.
- packages.yaml.
- (Under proposal) The set of concretized specs in the global environment. (Note: this should probably be modified to apply to any environment, not just the global environment.)
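A toy concretizer that makes each of these inputs an explicit argument, so it is easy to see why the same envspec can concretize differently over time. Everything here is a simplified assumption, not Spack's real concretizer: versions only (no variants or dependencies), `preferred` standing in for packages.yaml, and `reuse` standing in for the proposed "existing environment" input. The version of Spack corresponds to the code of `concretize` itself.

```python
from typing import List, Mapping, Optional, Tuple

AbstractSpec = Tuple[str, Optional[str]]  # (package, requested version or None)
ConcreteSpec = Tuple[str, str]            # (package, chosen version)


def _version_key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def concretize(
    envspec: List[AbstractSpec],
    available: Mapping[str, List[str]],          # versions listed in each package file
    preferred: Mapping[str, str],                # stand-in for packages.yaml
    reuse: Optional[List[ConcreteSpec]] = None,  # the proposed "existing environment" input
) -> List[ConcreteSpec]:
    """Concretize each abstract spec independently; the result depends on every input."""
    reusable = dict(reuse or [])
    result: List[ConcreteSpec] = []
    for name, requested in envspec:
        candidates = available[name]
        if requested is not None:
            if requested not in candidates:
                raise ValueError(f"{name}@{requested} is not available")
            version = requested
        elif reusable.get(name) in candidates:
            version = reusable[name]
        elif preferred.get(name) in candidates:
            version = preferred[name]
        else:
            version = max(candidates, key=_version_key)
        result.append((name, version))
    return result
```

Adding a new version to a package file changes the output for an unchanged envspec, which is one concrete way concretization fails to be stateless.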

Using Environments

The process of using Spack can therefore be distilled as follows:

1. The user creates one or more envspecs.
2. Spack concretizes those envspecs to create one or more environments, all of which are subsets of the global environment.

Once an environment is created, the following operations are possible (and currently done, in one way or another):

1. Install: install each spec in the environment.
2. Generate modules: generate a module file for each spec in the environment. (This is affected by modules.yaml and probably also config.yaml.)
3. Generate a module load script.
4. Create a Spack view.
5. spack activate. This breaks the stateless nature of spack install, so it would be better to replace spack activate with a more general use of Spack views.
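Because each of these operations consumes only the environment plus its own configuration, they can be written as independent functions. A minimal sketch of the module-load-script operation follows; the `(name, version, hash)` tuple and the `naming` mapping are hypothetical stand-ins for a concretized spec and a modules.yaml setting, not their real formats. Nothing here requires the specs to be installed.

```python
from typing import List, Mapping, Tuple

ConcreteSpec = Tuple[str, str, str]  # (name, version, hash): a simplified stand-in


def module_name(spec: ConcreteSpec, naming: Mapping[str, str]) -> str:
    """Name a module from a spec; `naming` stands in for a modules.yaml setting."""
    name, version, spec_hash = spec
    return naming["format"].format(name=name, version=version, hash=spec_hash)


def module_load_script(env: List[ConcreteSpec], naming: Mapping[str, str]) -> str:
    """Emit one `module load` line per spec, in environment order."""
    return "".join(f"module load {module_name(s, naming)}\n" for s in env)
```

Rendering the same environment with two naming schemes yields two independent sets of module names, e.g. `zlib/1.2.11-abc1234` vs. `zlib/1.2.11`.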

Note that there are two wrinkles in (3) and (4):

- Specs might conflict; therefore, the concretized specs in an environment need some kind of precedence or ordering over each other. This is usually handled by treating an envspec as a list, not a set.
- In some cases, a spec's dependencies should be followed recursively when generating the module loads or the view; in other cases, not. I have concluded that this choice should be encoded in the package itself, not specified by the user in the environment. But maybe it should be overridable in the envspec.
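Both wrinkles can be sketched together for views. In this toy model (all names hypothetical), the envspec order decides which spec owns a conflicting path in the view, with earlier entries winning, and a per-package flag `recursive_in_view` decides whether dependencies are linked too, with an optional envspec-level `override`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Spec:
    name: str
    version: str
    files: Tuple[str, ...] = ()        # paths the spec would link into a view
    deps: Tuple["Spec", ...] = ()
    recursive_in_view: bool = True     # hypothetical per-package default


def _expand(spec: Spec, override: Optional[bool]) -> List[Spec]:
    """The spec, plus its dependencies if the package (or an override) asks for that."""
    follow = spec.recursive_in_view if override is None else override
    out = [spec]
    if follow:
        for dep in spec.deps:
            out.extend(_expand(dep, override))
    return out


def plan_view(envspec_order: List[Spec], override: Optional[bool] = None) -> Dict[str, str]:
    """Map each view path to the 'name@version' that owns it; earlier entries win."""
    owner: Dict[str, str] = {}
    for root in envspec_order:
        for spec in _expand(root, override):
            for path in spec.files:
                owner.setdefault(path, f"{spec.name}@{spec.version}")
    return owner
```

Listing `python@3.7.0` before `python@2.7.15` makes `bin/python` point at 3.7.0; swapping the order flips it. The same ordering rule would apply to a module load script.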

Flexibility

The above section outlines the basic operations that should be available on Spack environments; it does not specify how those operations are to be presented in a UI. We should strive for a UI that supports multiple different ways of working. For example:

- Envspecs should be creatable either explicitly (by writing a file) or implicitly (by setting a global "current envspec" and then running a series of spack commands; this is how Python virtualenvs work).
- Users who don't want to know about environments should default to a global environment that works pretty much the same way Spack works now.
- The different operations on environments should be separable from each other. Users should be able to create an environment (explicitly or implicitly) without installing it (which is slow). They should be able to install without generating modules. They should be able to generate modules explicitly post-install (or even without installing), or even generate two different sets of modules for the same environment using two different modules.yaml configurations. They should be able to create views or module load scripts from an environment as needed, etc.
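A minimal sketch of the bookkeeping behind the explicit and implicit routes, assuming a hypothetical `EnvspecRegistry` and a made-up one-spec-per-line file format (neither is an existing Spack feature). Commands record abstract specs into whichever envspec is current; users who never activate one keep writing to a default global envspec, which matches how Spack behaves today.

```python
from typing import Dict, List


class EnvspecRegistry:
    """Hypothetical bookkeeping for explicit and implicit envspecs."""

    GLOBAL = "global"

    def __init__(self) -> None:
        self.envspecs: Dict[str, List[str]] = {self.GLOBAL: []}
        self.current = self.GLOBAL  # users who never activate anything stay here

    def activate(self, name: str) -> None:
        """Implicit route: later commands record into this envspec."""
        self.envspecs.setdefault(name, [])
        self.current = name

    def deactivate(self) -> None:
        self.current = self.GLOBAL

    def record(self, abstract_spec: str) -> None:
        """Called by a command such as an install; order is kept as precedence."""
        envspec = self.envspecs[self.current]
        if abstract_spec not in envspec:
            envspec.append(abstract_spec)

    def load_explicit(self, name: str, text: str) -> None:
        """Explicit route: one abstract spec per line, '#' comments (a made-up format)."""
        self.envspecs[name] = [
            line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]
```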

Obsolete Specs, Spack Upgrades

Assuming the Spack code doesn't change, fully concretized specs can be re-installed at any time (since installation is stateless). HOWEVER... a Spack upgrade could cause some fully concretized specs to no longer work (i.e. they become obsolete). This would happen if a package changes its function signature, i.e. the dependencies or variants it uses.

A spec could also be made obsolete if its hash code changes. This would happen if:

- The version specified in the spec has been removed.
- The set of dependencies or variants has changed.
- Spack has been changed to upgrade all hash codes.
- Spack decides to hash the entire package source, and a flake8 change is committed to it.

Efforts at hash stability can reduce how often specs become obsolete, and thus the rate of perceived unnecessary rebuilding. This can be achieved by:

- Consciously avoiding, delaying, or bunching changes that affect hashes globally, and bumping the major Spack version number accordingly.
- Making hash codes depend only on the package function signature, plus a user-defined package version number that can be incremented when the packager decides a hash needs to change.
- Ensuring the hash does NOT depend on the list of available versions, since this is updated frequently.
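A sketch of the signature-plus-revision hashing proposed above. The package record (a plain dict with `name`, `revision`, `versions`, and `source` keys) is a hypothetical stand-in for a package file, and `revision` is the packager-controlled number. The point is what is left out: adding a version or making a style-only edit to the source does not move existing hashes, while bumping the revision or changing a variant does.

```python
import hashlib
import json
from typing import Any, Dict, List


def signature_hash(package: Dict[str, Any], version: str,
                   variants: Dict[str, str], dep_hashes: List[str]) -> str:
    """Hash the package's 'function signature' plus a packager-controlled revision.

    Intentionally ignored: package["versions"] (the list of available versions)
    and package["source"] (the package file text), so adding a version or making
    a style-only edit leaves existing hashes alone.
    """
    payload = {
        "name": package["name"],
        "revision": package["revision"],  # bumped by hand when a hash must change
        "version": version,
        "variants": sorted(variants.items()),
        "deps": sorted(dep_hashes),
    }
    return hashlib.sha256(json.dumps(payload).encode()).hexdigest()[:12]
```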

Whatever the reason, obsolete specs are unavoidable, so they must be managed. The "right" thing to do when an environment is found to contain obsolete specs is to regenerate them by re-concretizing the corresponding specs from the original envspec. (This might fail as well; in that case, user intervention is required to fix the envspec.)

Therefore, it will be advantageous for an environment to carry with it the original envspec used to generate that environment.
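A minimal sketch of that repair loop, assuming an environment that stores its original envspec alongside the concretized results. `is_obsolete` and `concretize` are hypothetical callables supplied by the caller (for example, "is this hash still produced by the current Spack?" and "concretize this abstract spec now"). Only obsolete or missing entries are re-concretized; if concretization raises, the user has to fix the envspec.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Environment:
    envspec: List[str]        # original abstract specs, kept with the environment
    concrete: Dict[str, str]  # abstract spec -> hash of its concretized spec


def refresh(env: Environment,
            is_obsolete: Callable[[str], bool],
            concretize: Callable[[str], str]) -> List[str]:
    """Re-concretize only obsolete (or missing) entries; return which ones were redone.

    If `concretize` raises, the exception propagates (entries refreshed before
    the failure stay updated) and the envspec itself needs fixing by the user.
    """
    redone: List[str] = []
    for abstract in env.envspec:
        old = env.concrete.get(abstract)
        if old is None or is_obsolete(old):
            env.concrete[abstract] = concretize(abstract)
            redone.append(abstract)
    return redone
```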
