Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a SAT solver to work out the dependency tree #139

Open
mattfarina opened this issue Apr 19, 2021 · 7 comments
Open

Use a SAT solver to work out the dependency tree #139

mattfarina opened this issue Apr 19, 2021 · 7 comments
Projects

Comments

@mattfarina
Copy link
Collaborator

No description provided.

@mattfarina mattfarina added this to To do in Shit It via automation Apr 19, 2021
@dragonchaser dragonchaser moved this from To do to In progress in Shit It May 11, 2021
@dragonchaser dragonchaser self-assigned this May 11, 2021
@viccuad viccuad self-assigned this May 26, 2021
@viccuad
Copy link
Member

viccuad commented May 26, 2021

Our problem

The problem space is defined by a set of packages that depend on each other in various ways. A package is comprised of:

  • An implicit chart.
  • URI (repo + name)
  • semver ranges (e.g: ~, ^). They get collapsed into a specific semver when deployed.
  • Kubernetes namespace. Either where to install it, or where are they installed, if already installed.
  • Defined relations to other packages (Depends, Optional-Depends, Provides, Conflicts).

Also, not considered now but could be considered in the future:

  • charts containing only CRDs. Given how CRD objects behave on remove and update in the cluster, this may mean to never uninstall them in normal operations, and pinning versions for them to not get updated automatically.
  • Values.yaml could be added to the package definition. Changing values.yaml and redeploying the chart means having a package that behaves differently (in the same way that ABI changes of usual Linux OS package makes them compatible or not with other packages).
  • Pinned installed packages. Those pinned packages may not be uninstalled/changed in version.
  • Automatically installed vs manually installed. This allows for autoremoving packages once nothing depends on them.

Given a set of installed packages in the system, and a set of packages that we want to be installed, we want to know if there's a set of solutions to it, and which one is better. Or, we want to know if there is no solution, and why. This is a Satisfabiality problem that is NP-hard.

Some of these features of packages are a non-issue for resolving the dependency problem: namespaces are just part of what defines a package, pinned and manually installed packages just create a new hard constraint to be added, etc.

Pure SAT, MaxSAT, Pseudo-Boolean SAT problem?

In the past, dependency problems have been solved either by custom heuristics or by using a SAT solver and pruning results with ad-hoc heuristics after the resolution.
Nowadays, more and more dependency problems are solved either by reducing the problem space (eg: go modules) or by using a SAT solver, with or without heuristics to help reduce the problem space.

In our case, we have several strategies we want to apply to our problem, some of which may be selected at run time. Given the amount of strategies and the constraints arising from our package definition, we will define our problem as a Pseudo-Boolean SAT one. This allows to specify weights on our statements, to implement our strategies, and allows for multicriteria optimization.

Also, since the expected number of charts installed in a cluster is going to be "low" (in the order of 10^1 instead of 10^3 or more that is normally seen for these problems), we may try to obtain more optimized solutions in a reasonable time.
In addition, some solvers have the ability to return sub-optimal solutions as soon as they find them, without waiting for the optimal solution calculation.

How does an upgrade operation work in a pure SAT dependency problem?

On traditional Linux OS dependency problems, the satisfiability of the upgrades is checked after the fact that packages have entered the repositories. This is normally accomplished by running tools against the repositories. These can be SAT solvers, specific scheduled jobs (e.g: debcheck & paper, debgraph in Debian), CI jobs for default installations, CI jobs for upgrades between different distribution releases, etc, in addition to normal QA work. This is is also complemented by gating the package set through several repositories. E.g: OpenSUSE Factory -> Tumbleweed -> Leap, or Debian Unstable -> Testing -> Stable. On the distribution receiving new packages (OpenSUSE Factory, Debian Unstable), transient uninstallability problems are expected. These problems are increasingly exposed when the same package cannot be installed in several architectures. On the more stable distributions (Opensuse Leap, Debian Stable), it is expected that all or most of the installability problems have been found.

In Hypper's case, we want the possibility to depend on packages (Helm charts) outside of our curated repositories. Therefore, we would benefit from knowing if a system can be upgraded or not, regardless of our ability on curating the repositories. Hence, we use a SAT solver directly when performing those upgrades.

Strategies

  • Minimize package removal.
  • Minimize number of package charts.
  • Maximize freshness of packages. By assigning a distance between package versions, we can optimize for those that have the smaller distance. Doing a full upgrade of the system means selecting to maximize freshness of all installed packages, without asking for installing new packages by default (but allowing for packages to be pulled into the installed list).
  • Minimize install of optional packages.
  • Maximize install of optional packages.

Not considered right now:

  • Minimize changes to installed packages (version bumps, values.yaml, CRD charts).

Encoding the Pseudo-Boolean Optimization problem

@viccuad
Copy link
Member

viccuad commented May 26, 2021

Existing solver libraries in Golang that we considered

We want a MAXSAT/Pseudo-Boolean SAT solver library with ample feature support, good docs, good API, a
thriving community, verbose output, in Golang, and with a compatible license
that can be statically linked.
We want a solver that is used in the field, updated continously with latest SAT
improvements from SAT competitions.
We want a solver that handles optimization problems (hence, a pseudo-boolean SAT solver).

Gini https://github.com/IRIFrance/gini

1 big known consumer, github.com/operator-framework/operator-lifecycle-manager.
Only pure SAT CNF problems.

Luet https://github.com/mudler/luet https://github.com/mudler/luet

GPLv3.
It defines a pure SAT problem (with constraints and literals inspired by OPIUM) (see https://github.com/mudler/luet/blob/master/pkg/solver/solver.go#L37-L54).
It uses the pure SAT solving of gophersat, and, if there's some failure, it uses qlearning to recover (see https://github.com/mudler/luet/blob/master/pkg/solver/resolver.go).
See also https://luet-lab.github.io/docs/docs/concepts/overview/constraints/.

go-sat https://github.com/mitchellh/go-sat

Pure SAT solver.
Last developed 4 years ago.
No known consumers.

saturday https://github.com/cespare/saturday

Pure SAT solver.

dpll https://github.com/bmatsuo/dpll

Pure SAT solver.

Go dep https://github.com/golang/dep/tree/master/gps

Deprecated, too tightly coupled with go modules.

vgo https://github.com/golang/go

Uses non-SAT solver heuristics:
https://research.swtch.com/vgo-principles
https://research.swtch.com/vgo-mvs

Non exported API, all under src/cmd/go/internal/{mod*,mvs}.

Selected implementation

Gophersat https://github.com/crillab/gophersat

No big known consumers, but several medium consumers (see
https://github.com/search?q=gophersat&type=code).
Solves pseudo-boolean, MAXSAT problems (e.g: efficiency problems).
Provides sub-optimal solutions to PB/MaxSAT problems as soon as it finds them, without needing to wait for the optimal solution. We can then decide how much we want to wait for a solution.
Last development Nov 2020.
Announcement email 2017: https://groups.google.com/g/golang-nuts/c/SE1dNC8N46o/m/0drzp9jdAQAJ.

@mattfarina
Copy link
Collaborator Author

How we handle upgrading shared dependencies is something we should ask the charts folks in rancher. They may have some insight. The same goes for any of our assumptions. I pinged some folks on this case.

@mattfarina
Copy link
Collaborator Author

Minimize changes to installed packages (version bumps, values.yaml, CRD charts).

We should minimize changes to the installed packages. If a shared dependency is already present at an acceptable version we should continue running it at that version.

@viccuad
Copy link
Member

viccuad commented May 28, 2021

We should minimize changes to the installed packages. If a shared dependency is already present at an acceptable version we should continue running it at that version.

For version bumps, that's easily doable, and should come for free: a "package" is defined by its version and chart, so asking for not bumping packages is asking to not install specific packages. For changes to values (and redeployments), we may need to include values to the package definition.

@viccuad
Copy link
Member

viccuad commented May 28, 2021

Opened draft PR to be able to argue better than here in the comments: #171.

@viccuad viccuad moved this from In progress to Done in Shit It Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

3 participants