perf: add cache to address codec #20122

Open
wants to merge 17 commits into
base: main
from
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -61,6 +61,7 @@ Every module contains its own CHANGELOG.md. Please refer to the module you are i

### Improvements

* (codec) [#20122](https://github.com/cosmos/cosmos-sdk/pull/20122) Added a cache to address codec.
* (types) [#19869](https://github.com/cosmos/cosmos-sdk/pull/19869) Removed `Any` type from `codec/types` and replaced it with an alias for `cosmos/gogoproto/types/any`.
* (server) [#19854](https://github.com/cosmos/cosmos-sdk/pull/19854) Add customizability to start command.
* Add `StartCmdOptions` in `server.AddCommands` instead of `servertypes.ModuleInitFlags`. To set custom flags set them in the `StartCmdOptions` struct on the `AddFlags` field.
108 changes: 106 additions & 2 deletions codec/address/bech32_codec.go
@@ -3,23 +3,92 @@ package address
import (
"errors"
"strings"
"sync"
"sync/atomic"

"github.com/hashicorp/golang-lru/simplelru"

"cosmossdk.io/core/address"
errorsmod "cosmossdk.io/errors"

"github.com/cosmos/cosmos-sdk/internal/conv"
sdkAddress "github.com/cosmos/cosmos-sdk/types/address"
"github.com/cosmos/cosmos-sdk/types/bech32"
sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)

const (
// TODO: ideally sdk.GetBech32PrefixValAddr("") should be used but currently there's a cyclical import.
// Once globals are deleted the cyclical import won't happen.
suffixValAddr = "valoper"
suffixConsAddr = "valcons"
)

// cache variables
var (
accAddrMu sync.Mutex
accAddrCache *simplelru.LRU
consAddrMu sync.Mutex
consAddrCache *simplelru.LRU
valAddrMu sync.Mutex
valAddrCache *simplelru.LRU

isCachingEnabled atomic.Bool
)

func init() {
var err error
isCachingEnabled.Store(true)
Contributor:

nit: this is always true

Contributor Author:

Yeah, I followed the same style as in address.go, although I don't think it makes much sense.


// in total the cache size is 61k entries. Key is 32 bytes and value is around 50-70 bytes.
// That will make around 92 * 61k * 2 (LRU) bytes ~ 11 MB
if accAddrCache, err = simplelru.NewLRU(60000, nil); err != nil {
Contributor:

personal preference: it would be great to have this configurable, so people can choose whether to trade memory for CPU time.

Contributor Author:

Agreed. I'm not sure what the best approach is to solve this, as LRUs are currently shared between codecs and defined as globals.

panic(err)
}
if consAddrCache, err = simplelru.NewLRU(500, nil); err != nil {
panic(err)
}
if valAddrCache, err = simplelru.NewLRU(500, nil); err != nil {
panic(err)
}
Contributor:

🤔 do we need 3 different cache types? I was wondering if this is for sharding or pinning maybe? The LRU should keep the frequently used ones in cache anyway? WDYT?

Contributor Author:

The bytes of the addresses are the same, but there are three possible outputs (AccAddress, ValAddress, ConsAddress). I see three options here:

  1. Use three different caches.
  2. Add the prefix of the address codec to the key.
  3. Store a map in the LRU with prefix as keys (e.g., map["cosmos"] = "cosmos1dr3...")

I've chosen the first option to maintain consistency with how things were done before.

cosmos-sdk/types/address.go, lines 100 to 111 in 8e60f3b:

var (
// AccAddress.String() is expensive and if unoptimized dominantly showed up in profiles,
// yet has no mechanisms to trivially cache the result given that AccAddress is a []byte type.
accAddrMu sync.Mutex
accAddrCache *simplelru.LRU
consAddrMu sync.Mutex
consAddrCache *simplelru.LRU
valAddrMu sync.Mutex
valAddrCache *simplelru.LRU
isCachingEnabled atomic.Bool
)

}
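As a sketch of the reviewer's configurability request, the three sizes hard-coded in `init` could be grouped into one struct. `CacheSizes`, `DefaultCacheSizes`, `Total` and `EstimateBytes` are hypothetical names, not part of the SDK; the defaults copy the 60000/500/500 values from `init`, and `EstimateBytes` only reproduces the rough formula from the comment there (entries * bytes per entry * 2), assuming about 92 bytes per entry.

```go
package main

import "fmt"

// CacheSizes is a hypothetical knob for the three per-type LRUs.
type CacheSizes struct {
	Acc, Val, Cons int
}

// DefaultCacheSizes mirrors the values hard-coded in init().
var DefaultCacheSizes = CacheSizes{Acc: 60000, Val: 500, Cons: 500}

// Total returns the number of entries across all three caches.
func (s CacheSizes) Total() int { return s.Acc + s.Val + s.Cons }

// EstimateBytes reproduces the back-of-the-envelope estimate from the
// comment in init(): entries * bytesPerEntry * 2.
func (s CacheSizes) EstimateBytes(bytesPerEntry int) int {
	return s.Total() * bytesPerEntry * 2
}

func main() {
	d := DefaultCacheSizes
	fmt.Println(d.Total(), d.EstimateBytes(92)) // prints: 61000 11224000
}
```

This reproduces the ~11 MB figure (11,224,000 bytes). Threading such a value into `NewBech32Codec` would still require moving the LRUs out of package-level globals, which is the obstacle the author mentions.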

type Bech32Codec struct {
Bech32Prefix string
}

-var _ address.Codec = &Bech32Codec{}
type cachedBech32Codec struct {
codec Bech32Codec
mu *sync.Mutex
cache *simplelru.LRU
}

var (
_ address.Codec = &Bech32Codec{}
_ address.Codec = &cachedBech32Codec{}
)

func NewBech32Codec(prefix string) address.Codec {
-return Bech32Codec{prefix}
ac := Bech32Codec{prefix}
if !isCachingEnabled.Load() {
return ac
}

lru := accAddrCache
mu := &accAddrMu
if strings.HasSuffix(prefix, suffixValAddr) {
lru = valAddrCache
mu = &valAddrMu
} else if strings.HasSuffix(prefix, suffixConsAddr) {
lru = consAddrCache
mu = &consAddrMu
}

return cachedBech32Codec{
codec: ac,
cache: lru,
mu: mu,
}
}
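The cache selection in `NewBech32Codec` looks only at the suffix of the prefix. The minimal sketch below isolates that branch; `cacheKind` is a hypothetical helper that returns a label instead of an LRU, and the suffix constants are copied from the file. It makes one consequence explicit: any prefix without a `valoper` or `valcons` suffix, such as both `cosmos` and `osmo`, lands in the same account cache.

```go
package main

import (
	"fmt"
	"strings"
)

// Suffixes copied from the constants in bech32_codec.go.
const (
	suffixValAddr  = "valoper"
	suffixConsAddr = "valcons"
)

// cacheKind mirrors the branch in NewBech32Codec: validator and consensus
// prefixes are detected by suffix; everything else uses the account cache.
func cacheKind(prefix string) string {
	switch {
	case strings.HasSuffix(prefix, suffixValAddr):
		return "val"
	case strings.HasSuffix(prefix, suffixConsAddr):
		return "cons"
	default:
		return "acc"
	}
}

func main() {
	for _, p := range []string{"cosmos", "cosmosvaloper", "cosmosvalcons", "osmo"} {
		fmt.Println(p, "->", cacheKind(p))
	}
}
```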

// StringToBytes encodes text to bytes
@@ -61,3 +130,38 @@ func (bc Bech32Codec) BytesToString(bz []byte) (string, error) {

return text, nil
}

func (cbc cachedBech32Codec) BytesToString(bz []byte) (string, error) {
key := conv.UnsafeBytesToStr(bz)
cbc.mu.Lock()
defer cbc.mu.Unlock()

if addr, ok := cbc.cache.Get(key); ok {
Contributor:

Not sure if this is possible, but if the raw bytes happen to match an existing bech32 address string, it could cause trouble. Adding key prefixes, or not sharing the same cache, may solve this.

Contributor Author:

See first comment. Currently, there are three different caches, but adding key prefixes may be a solution if we prefer just one.

return addr.(string), nil
}

addr, err := cbc.codec.BytesToString(bz)
if err != nil {
return "", err
}
cbc.cache.Add(string(bz), addr) // copy: key shares memory with bz, which the caller may reuse

return addr, nil
}
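`conv.UnsafeBytesToStr` converts without copying, so the resulting string shares memory with `bz`. That is fine for a lookup, but a key stored with `Add` stays tied to the caller's slice, and `addressStringCaller` in the test file overwrites its buffer on every iteration. A minimal sketch of the cache-aside pattern with a copied key, assuming a hypothetical `memoCodec` where a plain map stands in for the LRU and hex stands in for bech32:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"sync"
)

// memoCodec is a hypothetical stand-in for cachedBech32Codec.
type memoCodec struct {
	prefix string
	mu     sync.Mutex
	cache  map[string]string
	calls  int // number of real encodings, to observe cache hits
}

func newMemoCodec(prefix string) *memoCodec {
	return &memoCodec{prefix: prefix, cache: make(map[string]string)}
}

func (c *memoCodec) BytesToString(bz []byte) string {
	c.mu.Lock()
	defer c.mu.Unlock()

	// string(bz) copies the bytes, so later writes to bz by the caller
	// cannot change a key that is already stored in the cache.
	key := string(bz)
	if s, ok := c.cache[key]; ok {
		return s
	}
	c.calls++
	s := c.prefix + "1" + hex.EncodeToString(bz)
	c.cache[key] = s
	return s
}

func main() {
	c := newMemoCodec("cosmos")
	bz := []byte{1, 2, 3}
	first := c.BytesToString(bz)
	bz[0] = 9 // the caller reuses its buffer
	again := c.BytesToString([]byte{1, 2, 3})
	fmt.Println(first == again, c.calls) // prints: true 1
}
```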
Contributor:

Can there be more than 1 bech32 prefix per type? The cache is global and may return wrong strings for the byte representation.

Contributor Author:

Most of the time there will be just one bech32 prefix, but there may be use cases with more than one, for example someone who wants to develop a multi-chain client. In those cases this cache may be a problem.

I'm now more inclined to add the prefix to the key, or not to share LRUs between codecs at all.

🤔🤔🤔

Contributor (@alpe, May 7, 2024):

The problem is BytesToString for an address, as the same bytes may not map to a unique address string. Adding a cache key prefix unique to each cachedBech32Codec instance to the byte key would prevent this. You can keep the global cache, but it would also require a global index from the bech32 prefix to the cache key prefix, so that NewBech32Codec(prefix string) resolves to the same cache key prefix.
The alternative is a cache per cachedBech32Codec, as you pointed out. This gives better isolation, but it comes at the cost of extra memory, which can be a lot. You would still need a global index, but from the bech32 prefix to the cache now.
A third alternative is caching only the default bech32 prefixes defined in Config. With this approach you won't need indexes or cache key prefixes, and I would assume it covers the majority of use cases on a chain.
Nevertheless, passing these defaults to the cache on initialization adds complexity.
I have a slight preference for option 3, but option 1 can also work well.

Contributor Author:

> a global index for the bech32 prefix to cache prefix as well so that NewBech32Codec(prefix string) resolve to the same cache key prefix.

I think the prefix defined on each codec should be sufficient: we can concatenate the prefix to the bytes and use that as the key.

type Bech32Codec struct {
Bech32Prefix string
}

E.g. if two address codecs are defined with the prefixes cosmos and osmo, the LRU table would look like this:

| Key | Value |
|---|---|
| cosmos010101 | cosmos1g3... |
| osmo010101 | osmo1vt... |

Contributor:

This would work as well. 👍 Keys get bigger but they are unique
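On the prefix-plus-bytes key: plain concatenation is ambiguous when one prefix is a prefix of another, since `cosmos` + `0x01` and `cosmo` + `s 0x01` produce the same key. Bech32 human-readable parts are restricted to printable ASCII (33 to 126), so a `0x00` separator cannot occur inside a prefix and the first `0x00` always ends it. A sketch under that assumption, with hypothetical helper names `cacheKey` and `naiveKey`:

```go
package main

import "fmt"

// cacheKey builds a single-cache key from a bech32 prefix and raw bytes.
// 0x00 never appears in a bech32 HRP, so it unambiguously ends the prefix.
func cacheKey(prefix string, bz []byte) string {
	return prefix + "\x00" + string(bz)
}

// naiveKey is plain concatenation, kept for comparison.
func naiveKey(prefix string, bz []byte) string {
	return prefix + string(bz)
}

func main() {
	a := []byte{0x01}
	b := []byte{'s', 0x01}
	fmt.Println(naiveKey("cosmos", a) == naiveKey("cosmo", b)) // true: collision
	fmt.Println(cacheKey("cosmos", a) == cacheKey("cosmo", b)) // false
}
```

Because bech32 strings never contain `0x00` either, such keys also cannot collide with the text keys that `StringToBytes` stores in the same LRU, which addresses the earlier concern about raw bytes matching an existing bech32 address.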


func (cbc cachedBech32Codec) StringToBytes(text string) ([]byte, error) {
cbc.mu.Lock()
defer cbc.mu.Unlock()

if addr, ok := cbc.cache.Get(text); ok {
return addr.([]byte), nil
}

addr, err := cbc.codec.StringToBytes(text)
if err != nil {
return nil, err
Contributor:

🤔 It may be worth caching failures, too. Some benchmarks would be interesting.

}
cbc.cache.Add(text, addr)

return addr, nil
}
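Following the suggestion to cache failures, one option is to store the error next to the decoded bytes, so repeated invalid input is rejected without decoding it again. In this sketch `negCachingDecoder` is hypothetical, a `prefix + "1" + hex` format stands in for bech32, and an unbounded map stands in for the LRU. In practice failed lookups should go into a bounded cache; otherwise arbitrary invalid strings could grow memory without limit. The sketch also returns a copy of the cached slice, since the `StringToBytes` above hands the cached `[]byte` itself to callers.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
	"sync"
)

// decodeResult stores either the decoded bytes or the decoding error.
type decodeResult struct {
	bz  []byte
	err error
}

// negCachingDecoder caches successes and failures alike.
type negCachingDecoder struct {
	prefix string
	mu     sync.Mutex
	cache  map[string]decodeResult
	calls  int // number of real decode attempts
}

// decode is a toy stand-in for bech32 decoding.
func (d *negCachingDecoder) decode(text string) ([]byte, error) {
	rest, ok := strings.CutPrefix(text, d.prefix+"1")
	if !ok {
		return nil, errors.New("invalid prefix")
	}
	return hex.DecodeString(rest)
}

func (d *negCachingDecoder) StringToBytes(text string) ([]byte, error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	r, ok := d.cache[text]
	if !ok {
		d.calls++
		bz, err := d.decode(text)
		r = decodeResult{bz: bz, err: err}
		d.cache[text] = r // failures are cached too
	}
	if r.err != nil {
		return nil, r.err
	}
	// Return a copy so callers cannot mutate the cached slice.
	return append([]byte(nil), r.bz...), nil
}

func main() {
	d := &negCachingDecoder{prefix: "cosmos", cache: map[string]decodeResult{}}
	for i := 0; i < 3; i++ {
		_, err := d.StringToBytes("osmo1zz")
		fmt.Println(err != nil)
	}
	fmt.Println(d.calls)
}
```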
Contributor:

Optimize locking strategy in cachedBech32Codec methods.

The current implementation locks the entire method, which could lead to performance bottlenecks. Consider using a more granular locking strategy or other synchronization techniques like sync.Map which is optimized for the use case where the entry for a given key is only written once but read many times.
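One caveat on the locking suggestion: `simplelru.LRU.Get` updates recency (it moves the entry to the front of the eviction list), so it mutates the cache and cannot simply be moved under a `sync.RWMutex` read lock. `sync.Map` avoids locking on reads but never evicts, so memory grows with the number of distinct addresses. A minimal sketch of that trade-off, with a hypothetical `syncMapCodec` and hex standing in for bech32:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"sync"
	"sync/atomic"
)

// syncMapCodec trades eviction for lock-free reads: entries are never removed.
type syncMapCodec struct {
	prefix string
	cache  sync.Map // string(bytes) -> encoded string
	calls  atomic.Int64
}

func (c *syncMapCodec) BytesToString(bz []byte) string {
	if s, ok := c.cache.Load(string(bz)); ok {
		return s.(string)
	}
	c.calls.Add(1)
	s := c.prefix + "1" + hex.EncodeToString(bz)
	// LoadOrStore keeps the first stored value if two goroutines race here.
	actual, _ := c.cache.LoadOrStore(string(bz), s)
	return actual.(string)
}

func main() {
	c := &syncMapCodec{prefix: "cosmos"}
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				_ = c.BytesToString([]byte{byte(i % 16)})
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.BytesToString([]byte{0x0a}), c.calls.Load() >= 16) // prints: cosmos10a true
}
```

Whether lock-free reads outweigh losing eviction would need the benchmarks requested earlier in the review.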

115 changes: 115 additions & 0 deletions codec/address/bech32_codec_test.go
@@ -0,0 +1,115 @@
package address

import (
"encoding/binary"
"testing"
"time"

"github.com/hashicorp/golang-lru/simplelru"
"gotest.tools/v3/assert"

"cosmossdk.io/core/address"

"github.com/cosmos/cosmos-sdk/internal/conv"
)

func TestNewBech32Codec(t *testing.T) {
tests := []struct {
name string
prefix string
lru *simplelru.LRU
address string
}{
{
name: "create accounts cached bech32 codec",
prefix: "cosmos",
lru: accAddrCache,
address: "cosmos1p8s0p6gqc6c9gt77lgr2qqujz49huhu6a80smx",
},
{
name: "create validator cached bech32 codec",
prefix: "cosmosvaloper",
lru: valAddrCache,
address: "cosmosvaloper1sjllsnramtg3ewxqwwrwjxfgc4n4ef9u2lcnj0",
},
{
name: "create consensus cached bech32 codec",
prefix: "cosmosvalcons",
lru: consAddrCache,
address: "cosmosvalcons1ntk8eualewuprz0gamh8hnvcem2nrcdsgz563h",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
assert.Equal(t, tt.lru.Len(), 0)
ac := NewBech32Codec(tt.prefix)
cached, ok := ac.(cachedBech32Codec)
assert.Assert(t, ok)
assert.Equal(t, cached.cache, tt.lru)

addr, err := ac.StringToBytes(tt.address)
assert.NilError(t, err)
assert.Equal(t, tt.lru.Len(), 1)

cachedAddr, ok := tt.lru.Get(tt.address)
assert.Assert(t, ok)
assert.DeepEqual(t, addr, cachedAddr)

accAddr, err := ac.BytesToString(addr)
assert.NilError(t, err)
assert.Equal(t, tt.lru.Len(), 2)

cachedStrAddr, ok := tt.lru.Get(conv.UnsafeBytesToStr(addr))
assert.Assert(t, ok)
assert.DeepEqual(t, accAddr, cachedStrAddr)
})
}
}

func TestBech32CodecRace(t *testing.T) {
ac := NewBech32Codec("cosmos")

workers := 4
done := make(chan bool, workers)
cancel := make(chan bool)

for i := byte(1); i <= 2; i++ { // workers which loop over the first 100 addresses
go addressStringCaller(t, ac, i, 100, cancel, done)
}

for i := byte(1); i <= 2; i++ { // workers which cycle through 1e6 addresses
go addressStringCaller(t, ac, i, 1000000, cancel, done)
}

<-time.After(time.Millisecond * 30)
close(cancel)

// cleanup
for i := 0; i < workers; i++ {
<-done
}
}

// generates 5-byte addresses starting with `prefix` and calls ac.BytesToString on them
func addressStringCaller(t *testing.T, ac address.Codec, prefix byte, max uint32, cancel chan bool, done chan<- bool) {
t.Helper()

bz := make([]byte, 5) // prefix + 4 bytes for uint
bz[0] = prefix
for i := uint32(0); ; i++ {
if i >= max {
i = 0
}
select {
case <-cancel:
done <- true
return
default:
binary.BigEndian.PutUint32(bz[1:], i)
str, err := ac.BytesToString(bz)
assert.NilError(t, err)
assert.Assert(t, str != "")
}

}
}
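For reference, the buffer built by `addressStringCaller` is one prefix byte followed by the loop counter as a big-endian `uint32`, so each prefix byte selects its own range of 5-byte addresses. `testAddr` below is a hypothetical helper reproducing that layout:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// testAddr builds the 5-byte address used by addressStringCaller:
// one prefix byte followed by a big-endian uint32 counter.
func testAddr(prefix byte, i uint32) []byte {
	bz := make([]byte, 5)
	bz[0] = prefix
	binary.BigEndian.PutUint32(bz[1:], i)
	return bz
}

func main() {
	fmt.Println(testAddr(1, 258)) // prints: [1 0 0 1 2]
}
```

The workers with `max` 100 therefore revisit the same 100 addresses per prefix byte and should mostly hit the cache, while the 1e6 workers range over far more addresses than the 60000-entry account cache holds, so over a long enough run they would force evictions.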
8 changes: 2 additions & 6 deletions simapp/app.go
@@ -202,12 +202,8 @@ func NewSimApp(
interfaceRegistry, _ := types.NewInterfaceRegistryWithOptions(types.InterfaceRegistryOptions{
ProtoFiles: proto.HybridResolver,
SigningOptions: signing.Options{
-AddressCodec: address.Bech32Codec{
-Bech32Prefix: sdk.GetConfig().GetBech32AccountAddrPrefix(),
-},
-ValidatorAddressCodec: address.Bech32Codec{
-Bech32Prefix: sdk.GetConfig().GetBech32ValidatorAddrPrefix(),
-},
AddressCodec: address.NewBech32Codec(sdk.GetConfig().GetBech32AccountAddrPrefix()),
ValidatorAddressCodec: address.NewBech32Codec(sdk.GetConfig().GetBech32ValidatorAddrPrefix()),
},
})
appCodec := codec.NewProtoCodec(interfaceRegistry)