Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EPIC: determinism with maps #13039

Closed
tac0turtle opened this issue Aug 25, 2022 · 13 comments
Closed

EPIC: determinism with maps #13039

tac0turtle opened this issue Aug 25, 2022 · 13 comments
Labels
T:Epic Epics

Comments

@tac0turtle
Copy link
Member

tac0turtle commented Aug 25, 2022

Summary

As mentioned in CosmWasm/wasmd#941 non-determinism in maps is a concern for users of cosmwasm and others. There are places the Cosmos-SDK uses maps in many places we should aim to replace this with a deterministic sort with maps or something like a btree.

Problem Definition

Maps are non-deterministic if handled incorrectly. Cosmos-sdk has a few places where maps could end in non-determinism

Proposal

Replace maps with a deterministic ordering data structure or sort for maps where needed.

cc @ValarDragon @ethanfrey

@tac0turtle tac0turtle added the T:Epic Epics label Aug 25, 2022
@aaronc
Copy link
Member

aaronc commented Aug 25, 2022

I would strongly advocate for a generic ordered map so we don't suffer the performance hit of sorting maps all over the place. We just need to make sure we choose an implementation which is well tested - might make sense to bring something into our source tree even.

@SpicyLemon
Copy link
Collaborator

I'd personally prefer to not have an added object layer. It adds complexity and makes things less portable.

In my opinion, a better solution would be a linter that complains/fails when range is used on a map. With the addition of generics to go, it's now easy to get a sorted list of keys, which is usually the annoying part when fixing a map loop. I'd much rather keep things as maps and just change range myMap to range getSortedKeys(myMap). You get to keep all the handling/syntax of a map and only address the problem where it's a problem; you're not changing things to an alternative data structures just because you might want to loop over it.

@aaronc
Copy link
Member

aaronc commented Aug 25, 2022

With the addition of generics to go, it's now easy to get a sorted list of keys, which is usually the annoying part when fixing a map loop. I'd much rather keep things as maps and just change range myMap to range getSortedKeys(myMap).

The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.

@SpicyLemon
Copy link
Collaborator

The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.

If you've got a map that is primarily used in loops, then I agree, a sorted data structure is called for. But if you only ever loop on it once or in sporadic instances (where its primary purpose is lookup), just sort the keys when you want to loop.

It'd be annoying if all maps were changed to a non-native data structure just in the off chance that they're needed in a loop.

@ValarDragon
Copy link
Contributor

ValarDragon commented Aug 29, 2022

uhh, I promise from live benchmarks, that sorting in application data OR map insertions is not showing up in any performance problems right now. I really think that its misguided to focus on performance here

The only place that sorting shows as taking forever is in the CacheKV store, which is a different problem where the entire design needs to be fixed.

I don;t really care about the decision of either a generics based BTree, or a Hashmap wrapper that sorts for iteration / has deterministic serialization (cref #12816 ). If the design is done right, this should really not be state machine breaking to change later, as they can have the same serialization/deserialization from proto. (Up to gas parameters, which we can just make the same)

@kocubinski
Copy link
Contributor

a better solution would be a linter that fails when range is used on a map

It should fail. It already complains, but this apparently isn't loud enough.

keep things as maps and just change range myMap to range getSortedKeys(myMap)

This is the best design to me as it's barely a design at all. If we see performance problems with map iteration outside of store/cachekv we can revisit.

The only place that sorting shows as taking forever is in the CacheKV store, which is a different problem where the entire design needs to be fixed.

A recent performance fix #12885 improved cache sorting significantly, but from what I've seen the current cache impl could be transitioned to a single BTree backed map instead of separating sorted/unsorted. This is smaller scope than the entire rewrite talked about in #12986. What do you think?

deterministic serialization

Serialization seems orthogonal to the problem at hand, am I missing something? If we always sort keys prior to iterating the form of the serialized data is irrelevant, provided we're OK with the runtime sorting performance hit.

@alexanderbez
Copy link
Contributor

From what I'm gathering and my two cents, it sounds like using generics over ranges is the most straightforward approach. BTrees being a second alternative, but might be overkill for most situations.

@08d2
Copy link
Contributor

08d2 commented Aug 31, 2022

@marbar3778

Maps are non-deterministic if handled incorrectly. Cosmos-sdk has a few places where maps could end in non-determinism

The Go map type is always non-deterministic:

The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next.

https://go.dev/ref/spec#RangeClause

Code that needs deterministic iteration order of key-value associative containers can't use the map type, at least not directly. Of course, the map type is built-in and ubiquitous in Go code, so it's not realistic to suggest a project can "replace" it with a different type.

If you want a map with ordered keys, you have basically two options. As suggested, you can use a separate type (like a BTree-based container) locally, and perform conversions to and from Go maps at API boundaries. But this is arduous, error-prone, and the impact of allocations on the GC can easily become significant.

The better option is almost always to use the regular map type, and when you want deterministic range order, extract and sort the keys, e.g.

func print(m map[string]int) {
    keys := make([]string, 0, len(m))
    for k := range m { keys = append(keys, k) }
    sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

    for _, k := range keys {
        v := m[k]
        // ...
    }
}

Generics can make this easier, for sure.


@aaronc

The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one.

@ValarDragon is correct in observing that the performance cost of this O(n) range clause is almost certainly statistically zero relative to other performance inefficiencies in Cosmos, and not something worth designing around.

@robert-zaremba
Copy link
Collaborator

Anther potential source of problem is when we return a map in a query. This must be prohibited as well.

@alexanderbez
Copy link
Contributor

We don't need to prohibit that per se, since we're going with the annotation approach -- queries that are deemed safe for state machine use, will be annotated as such.

@robert-zaremba
Copy link
Collaborator

I would prefer to make such prohibition for the sake of completeness.

@alexanderbez
Copy link
Contributor

I would prefer to make such prohibition for the sake of completeness.

That's just not practical. It's not going to happen.

@odeke-em
Copy link
Collaborator

odeke-em commented Dec 7, 2023

So myself and team at Orijtech spent time and we did implement in 2022 a gosec static analyzer https://github.com/cosmos/gosec that would run on all cosmos-sdk code and it helped identify a bunch of map iterations which we sent in various PRs like #13554 but that got closed without merged.

However the pass was later deemed to be noisy and disabled per cosmos/gosec@bf28a33

It could be re-enabled if folks see fit or more work done on the pass, but the tools do exist and as we saw it might have been nice to discuss but in reality doesn't seem like a priority that was cared about. Should we be closing this issue otherwise?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T:Epic Epics
Projects
Status: 🥳 Done
Development

No branches or pull requests

9 participants