Proof-carrying code: change range facts to be more general. #7965

Draft · wants to merge 5 commits into main
Conversation

@cfallin cfallin commented Feb 20, 2024


Previously, the proof-carrying code (PCC) mechanism optionally described
each value in the program with at most one fact of the form:

- A static range (min, max are `u64`s);
- A dynamic range (min, max are symbolic expressions: sums of global
  values or SSA values and offsets)

This worked well enough in filetests that exercised PCC for static and
dynamic Wasm memory cases: the fact language was expressive enough to
describe all the invariants.

However, as soon as the optimizer started to combine different accesses
together -- for example, by sharing one `select_spectre_guard` across
multiple accesses -- it quickly became apparent that we might need to
describe both a static *and* dynamic range for one value. With the
existing system, the new test in `pcc_memory.rs` here fails to check,
producing `Conflict` facts.

To make the fact language more expressive, I worked through a series of
increasingly general variants until finding one that seems to work:

- First, a variant with combined static *and* dynamic ranges: e.g.,
  `range(0, 0xffff, 0, gv1)` means that a value is within the given
  static range *and* also less than or equal to `gv1`. This allows the
  intersection of dynamic and static facts to work, but has a lot of
  weird edge-cases because it's like two analyses glued together in a
  product type; we really want to cross-compare against the two
  sometimes, e.g. if we know static range facts about the symbolic
  expressions and want to apply those elsewhere. It also led to weird
  logic due to redundancy: the expressions could also be constants (no
  "base value") and so we handled the constant-value case twice. It
  seemed that actually the two worlds should be merged completely.

- Next, a variant with *only* `Expr`s, and two cases for a range:
  `Exact` (with one or more expressions that are known to be equivalent
  to the value) and `Inclusive`, with `min` and `max` *lists*. In both
  cases we want lists because we may know that a value is, for example,
  less than both `v1` and `gv2`; both are needed to prove different
  things, and the relative order of the two is not known so it cannot be
  simplified.

  This was almost right; it fell apart only when working out
  `apply_inequality` where it became apparent that we need to sometimes
  state that a value is exactly equal to some expressions *and* less
  than others (e.g., exactly `v1` and also in a 32-bit range).

  Aside from that it was also a bit awkward to have a four-case (or
  three-case for commutative) breakdown in all ops: exact+exact,
  exact+inclusive, inclusive+inclusive.

- Finally, the variant in this PR: every range is described by three
  lists, the `min`, `equal` and `max` sets of expressions.

The way this works is: for any value for which we have a fact, we
collect lower and upper bounds, and also expressions we know it's
equivalent to. As in an egraph, we don't drop facts or "simplify" along
the way, because any of these bits may be useful later. However, we
don't explode in memory or analysis time, because the sizes are bounded
by the stated facts: we locally derive the "maximum fact" for the result
of (say) an addition, check whether it implies the stated fact on the
actual result, and then keep only that stated fact.

The value described by these sets is within the interval that is the
intersection of all combinations of min/max values; this makes
`intersect` quite simple (union the sets of bounds, and the equalities,
because it must be both). Some of the other ops and tests -- `union`,
and especially "is this value in the range" or "does this range imply
this other range" -- are a little intricate, but I think correct. To
pick a random example: an expression is within a range if we can prove
that it is greater than or equal to all lower bounds, and vice-versa for
upper bounds; OR if it is exactly equal to one of the equality bounds.
Equality is structural on `Expr`s (i.e., the default derived `Eq` is
valid) because they are not redundant: there is only one way to express
`v1+1`, and we can never prove that `v1 == v2` within the context of one
expression.
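As a concrete (if simplified) illustration of the scheme above, here is a hypothetical sketch of the three-list range and its `intersect`, using a stand-in `Expr` and plain `Vec`s rather than the actual Cranelift types:

```rust
// Stand-in expression type; the real Cranelift `Expr` is a base value
// (SSA value or global value) plus an offset.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Expr {
    Value(u32),  // SSA value vN
    Global(u32), // global value gvN
    Const(u64),  // integer constant
}

#[derive(Clone, Debug, Default)]
struct ValueRange {
    min: Vec<Expr>,   // value is >= every expression here
    max: Vec<Expr>,   // value is <= every expression here
    equal: Vec<Expr>, // value is == every expression here
}

impl ValueRange {
    /// Intersection: the described value must satisfy *both* ranges,
    /// so we simply union the bound sets (deduplicating to keep them
    /// small).
    fn intersect(&self, other: &ValueRange) -> ValueRange {
        let merge = |a: &[Expr], b: &[Expr]| {
            let mut out = a.to_vec();
            for e in b {
                if !out.contains(e) {
                    out.push(e.clone());
                }
            }
            out
        };
        ValueRange {
            min: merge(&self.min, &other.min),
            max: merge(&self.max, &other.max),
            equal: merge(&self.equal, &other.equal),
        }
    }

    /// Membership via the equality set: an expression is trivially
    /// within the range if it is structurally equal to one of `equal`.
    fn equals(&self, e: &Expr) -> bool {
        self.equal.contains(e)
    }
}
```

The `union` of two ranges and the full bound-against-bound comparisons are deliberately omitted here; those need the partial ordering on `Expr`s, which is where most of the real intricacy lives.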

I will likely write up a bunch more in the top doc-comment and
throughout the code; this is meant to get the working system in first.
(I'm also happy to do this as part of this PR if preferred.)

There are also some ideas for performance improvement if needed, e.g. by
interning `ValueRange`s and then memoizing the operations
(`intersect(range2, range5) = range7` in a lookup table). I haven't
measured perf yet.

I also haven't fuzzed this yet but will do so and then submit any
required bugfixes separately. Hopefully we can get this turned on soon!
@fitzgen fitzgen left a comment (Member)
Overall LGTM, just a few questions and general suspicion flagging.

But most importantly: shouldn't there be a new clif filetest or two for the GVN behavior that warranted this fact representation change?

cranelift/codegen/src/ir/pcc.rs — review thread (outdated, resolved)
Comment on lines +149 to +166
pub struct ValueRange {
    /// Lower bounds (inclusive). The list specifies a set of bounds;
    /// the concrete value is greater than or equal to *all* of these
    /// bounds. If the list is empty, then there is no lower bound.
    pub min: SmallVec<[Expr; 1]>,
    /// Upper bounds (inclusive). The list specifies a set of bounds;
    /// the concrete value is less than or equal to *all* of these
    /// bounds. If the list is empty, then there is no upper bound.
    pub max: SmallVec<[Expr; 1]>,
    /// Equalities. The list specifies a set of values, all of which
    /// are known to be equal to the value described by this range.
    /// Note that if this is non-empty, the range's "size" (cardinality
    /// of the set of possible values) is exactly one value; but we may
    /// not know a concrete constant, and it is still useful to carry
    /// around the lower/upper bounds to enable further comparisons to
    /// be resolved.
    pub equal: SmallVec<[Expr; 1]>,
}
fitzgen (Member):
I guess let's just get something working first and worry about performance later, but having a bunch of smallvecs is a little concerning. In general, we try to keep all the guts of the data in DataFlowGraph be Copy and use things like cranelift_entity::EntityList to manage non-POD data within. But yeah, we can come back to this in the future.

cfallin (Member, author):
Indeed! I had actually started down that road, but both managing the efficient refs-to-list-arena representation and getting the logic right at the same time was getting to be a bit too much; so I wanted to get it working first. As mentioned at the end of the commit message I do think we can do something at a high level: intern ValueRanges and then memoize operations over them (which addresses the quadratic bit below too).

cranelift/codegen/src/ir/pcc.rs — several more review threads (resolved)
Comment on lines +631 to +640
let min = self
.min
.iter()
.filter(|&e| {
self.min
.iter()
.all(|other| e == other || !Expr::le(other, e))
})
.cloned()
.collect::<SmallVec<[Expr; 1]>>();
fitzgen (Member):
The O(n^2) loop is a little scary here but I guess in practice this is sized at most by the number of unique values we GVN together? Which seems like it should generally be relatively small? But also something that is ~unbounded and user-controllable 😬

But I guess they aren't controlling the actual facts used during a wasm load and the inputs to an i32.load instruction are basically any i32 value. So I think this is maybe okay?

Seems like something we should think about and have a comment for calming readers down.

cfallin (Member, author):
So, this is a very interesting point: I was able to control the number of equal expressions by writing a bunch of addresses that simplify to the same value. (E.g., loads from p and p + (4-4).) Lower and upper bounds on the other hand are not user-controllable as the dynamic ones arise from memories and any given access only touches one memory (and static ones are simplified so that only one lower and one upper remain). Great catch and thank you for thinking of this!

The need for "equal" constraints arises from the transitivity -- we need to represent the values v1 and gv1 when we see icmp gte v1, gv1 -- but I suspect we might be able to get away with at most one. This does mean we lose information sometimes but...

(A possible sketch of a strategy: when we merge, always keep the "lesser" expression per the arbitrary lexicographical ordering; as long as we're consistent, we keep the comparison info for v1 vs. gv1 but not v2 vs. gv1, across all related facts, and all related facts should merge if we have two loads of addresses v1 and v2 that ended up merging during opt.)
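The canonicalization rule sketched in the parenthetical above could look something like the following (hypothetical `Expr` stand-in, not the real Cranelift type):

```rust
// Stand-in expression type with a derived lexicographic ordering
// (variant order, then payload), standing in for an arbitrary but
// consistent total order on expressions.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Expr {
    Value(u32),  // SSA value vN
    Global(u32), // global value gvN
}

// Reduce a set of known-equal expressions to a single canonical
// representative: always the least per the ordering. Because the
// choice is consistent, facts about values that GVN later merges
// (e.g. v1 and v2) pick the same representative and still line up.
fn canonical(equal: &[Expr]) -> Option<Expr> {
    equal.iter().min().cloned()
}
```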

fitzgen (Member):
> (A possible sketch of a strategy: when we merge, always keep the "lesser" expression per the arbitrary lexicographical ordering; as long as we're consistent, we keep the comparison info for v1 vs. gv1 but not v2 vs. gv1, across all related facts, and all related facts should merge if we have two loads of addresses v1 and v2 that ended up merging during opt.)

This sounds very reasonable to me.

Comment on lines +1128 to +1134
/// Does this fact describe an exact expression?
pub fn as_expr(&self) -> Option<&Expr> {
    match self {
        Fact::Range {
            range: ValueRange { equal, .. },
            ..
        } => equal.first(),
        _ => None,
    }
}
fitzgen (Member):
Interesting that we can return any of equal and be correct here. Does it matter? How are callers using this method? Does it make sense to tweak what is being asked here?

github-actions bot added labels — cranelift (Issues related to the Cranelift code generator), cranelift:area:machinst, cranelift:area:aarch64, cranelift:area:x64, cranelift:wasm — Feb 21, 2024
- aarch64: support comparisons in the opposite direction, and
  comparisons against constants. (Arose when testing with constant
  addresses.)

- Machine-independent code: support merging Def and Compare facts by
  breaking ties toward the lowest value-numbers.
cfallin commented Feb 21, 2024

With some more testing I've found the equality mechanism to be a little fragile (specifically around constants, as well as the above-mentioned quadratic representation for multiple merged SSA vals), and I need to go have a think; so I'm gonna switch this to "draft" mode.

@cfallin cfallin marked this pull request as draft February 21, 2024 07:23
cfallin commented Feb 23, 2024

After a little more thought around both the specific last issues seen here, and the efficiency in general of the larger facts, I had the above-mentioned think, and now I have a thought on a new approach that I'd like to document here before (unfortunately) having to context-switch away for a bit.

To recap, the current state is:

- `main` today has PCC that works fine for static checks (memory with sufficiently large guard regions); the range facts are compact `Copy` structs that describe statically-known integer bounds.

- The difficulties arise when modeling symbolic expressions and comparisons between them that arise with dynamic checks -- specifically, tying together a compare with a conditional-select -- when this interacts either with optimizations merging things together or with constant values.

In my view, these difficulties are symptoms of trying to represent a "slice" of the overall worldview (partial order relation between program values) in facts on individual values. That is, from the perspective of an individual value, we have to carry around its inequalities with a bunch of other values in case we need them. Furthermore, because facts are attached to values, when constants are involved, we can lose track of things: a constant value can be rematerialized or incorporated in a bunch of different places and does not have a stable identity like an SSA value v1 does.

So, IMHO a reasonable solution might be to build a "whole worldview" data structure, and share it across the function. One can see this as taking this PR a bit further: this PR lets a fact on a single value talk about multiple relations (so single facts go from atomic tidbits to slices of the whole world); instead let's build "the whole world" and then point into it.

To be more concrete, I envision a "partial order relation DAG" with nodes that represent sets of equal values, either symbolic (SSA values, global values) or concrete (integer constants), and edges that represent less-than-or-equal relations. Edge labels represent known "distance" (e.g. 5 on edge from x to y means x + 5 <= y), and edges can also be labeled with predicates that make them true (value v1, the result of an icmp, implies v2 <= v3).

The idea is that we can point to any given node in this DAG from a fact on a value in the IR; these DAG nodes don't actually need to track the set of equal values (that's implicit in the set of all nodes that point to them); and we can do a union-find-like thing to efficiently answer less-than queries. Basically we can implement le(node1, node2, Option<pred>) -> Option<distance> where we get a Some(min_distance) if there is some path from one node to another (enabling edges labeled under pred if present), where min_distance is the sum of edge labels. Then we can compress the path by adding a direct edge (transitivity!) to make further queries more efficient. I suspect this will lead to a fairly efficient overall check of a function body, because the same comparisons will be done over and over. (E.g., different offsets from a base pointer into a struct, all less than the guard-region size.)
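A minimal sketch of such a DAG and its `le` query with path compression might look like the following. All names and types here are hypothetical; predicate-labeled edges are omitted for brevity, and where multiple paths exist this sketch takes the largest (strongest) provable distance, a choice the description above does not pin down:

```rust
use std::collections::HashMap;

type Node = u32;

#[derive(Default)]
struct OrderDag {
    // edges[a] holds (b, k) pairs meaning a + k <= b.
    edges: HashMap<Node, Vec<(Node, u64)>>,
}

impl OrderDag {
    fn add_le(&mut self, a: Node, dist: u64, b: Node) {
        self.edges.entry(a).or_default().push((b, dist));
    }

    /// Is there a path a -> b? Returns the largest total distance d
    /// along any path (i.e., the strongest provable `a + d <= b`),
    /// and compresses the path by recording a direct edge so that
    /// repeated queries (the common case) are cheap.
    fn le(&mut self, a: Node, b: Node) -> Option<u64> {
        let best = self.search(a, b, &mut Vec::new())?;
        self.add_le(a, best, b); // path compression via transitivity
        Some(best)
    }

    fn search(&self, a: Node, b: Node, seen: &mut Vec<Node>) -> Option<u64> {
        if a == b {
            return Some(0);
        }
        if seen.contains(&a) {
            return None; // guard against revisiting a node
        }
        seen.push(a);
        let mut best: Option<u64> = None;
        for &(next, k) in self.edges.get(&a).into_iter().flatten() {
            if let Some(rest) = self.search(next, b, seen) {
                let total = k.saturating_add(rest);
                best = Some(best.map_or(total, |x| x.max(total)));
            }
        }
        seen.pop();
        best
    }
}
```

For example, recording `x + 5 <= y` and `y + 3 <= z` lets `le(x, z)` answer `Some(8)` by transitivity, after which a direct `x + 8 <= z` edge is cached.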

As mentioned I need to context-switch away for a bit but I'll try to get back to this at some point!
