Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inspect: add inspect mode for debugging crashed tendermint node #6785

Merged
merged 48 commits into from Aug 24, 2021

Conversation

williambanfield
Copy link
Contributor

@williambanfield williambanfield commented Jul 29, 2021

EDIT: Updated, see comment below

This change adds a sketch of the Debug mode.

This change adds a Debug struct to the node package. This Debug struct is intended to be created and started by a command in the cmd directory. The Debug struct runs the RPC server on the data directories: both the state store and the block store.

This change required a good deal of refactoring. Namely, a new rpc.go file was added to the node package. This file encapsulates functions for starting RPC servers used by nodes. A potential additional change is to further factor this code into shared code in the rpc package.

Minor API tweaks were also made that seemed appropriate such as the mechanism for fetching routes from the rpc/core package.

Additional work is required to register the Debug service as a command in the cmd directory but I am looking for feedback on if this direction seems appropriate before diving much further.

closes: #5908

rpc/jsonrpc/server/http_server.go Show resolved Hide resolved
rpc/jsonrpc/server/http_server.go Outdated Show resolved Hide resolved
node/setup.go Outdated Show resolved Hide resolved
rpc/core/routes.go Outdated Show resolved Hide resolved
@williambanfield williambanfield marked this pull request as draft July 29, 2021 23:55
rpc/core/routes.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 30, 2021

Codecov Report

Merging #6785 (3a858ca) into master (d7c3a8f) will decrease coverage by 0.21%.
The diff coverage is 51.30%.

@@            Coverage Diff             @@
##           master    #6785      +/-   ##
==========================================
- Coverage   62.88%   62.66%   -0.22%     
==========================================
  Files         307      309       +2     
  Lines       40464    40564     +100     
==========================================
- Hits        25447    25421      -26     
- Misses      13231    13343     +112     
- Partials     1786     1800      +14     
Impacted Files Coverage Δ
cmd/tendermint/commands/gen_node_key.go 0.00% <ø> (ø)
cmd/tendermint/commands/gen_validator.go 18.75% <ø> (ø)
cmd/tendermint/commands/probe_upnp.go 0.00% <ø> (ø)
cmd/tendermint/commands/replay.go 0.00% <ø> (ø)
cmd/tendermint/commands/reset_priv_validator.go 8.33% <ø> (ø)
cmd/tendermint/commands/root.go 50.00% <ø> (+5.88%) ⬆️
cmd/tendermint/commands/show_node_id.go 0.00% <ø> (ø)
cmd/tendermint/commands/show_validator.go 0.00% <ø> (ø)
rpc/core/env.go 47.36% <ø> (ø)
cmd/tendermint/commands/inspect.go 19.14% <19.14%> (ø)
... and 24 more

@lgtm-com
Copy link

lgtm-com bot commented Jul 30, 2021

This pull request introduces 1 alert when merging e2a6522 into 9a2a7d4 - view on LGTM.com

new alerts:

  • 1 for Useless assignment to local variable

rpc/jsonrpc/server/http_server.go Show resolved Hide resolved
store/store.go Outdated Show resolved Hide resolved
state/mocks/block_store.go Outdated Show resolved Hide resolved
rpc/core/routes.go Outdated Show resolved Hide resolved
node/debug_test.go Outdated Show resolved Hide resolved
node/debug.go Outdated Show resolved Hide resolved
node/rpc.go Outdated Show resolved Hide resolved
@tac0turtle
Copy link
Contributor

I like the approach here and the clean up!!

node/debug.go Outdated Show resolved Hide resolved
node/debug.go Outdated Show resolved Hide resolved
internal/consensus/replay_test.go Outdated Show resolved Hide resolved
state/services.go Outdated Show resolved Hide resolved
store/store.go Outdated Show resolved Hide resolved
types/event_bus.go Outdated Show resolved Hide resolved
node/debug.go Outdated Show resolved Hide resolved
node/node_test.go Outdated Show resolved Hide resolved
node/rpc.go Outdated Show resolved Hide resolved
node/rpc.go Outdated Show resolved Hide resolved
node/rpc.go Outdated Show resolved Hide resolved
rpc/core/routes.go Outdated Show resolved Hide resolved
rpc/core/routes.go Outdated Show resolved Hide resolved
rpc/core/routes.go Outdated Show resolved Hide resolved
Copy link
Contributor

@cmwaters cmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Work!

I think just some minor touch ups / linting is required before this can be merged.

As a high level question, what would happen if I had a node running and then I inspected it at the same time. Would it work as expected or error? As a guess, this would depend on whether the db supports additional read-only connections right?

inspect/inspect.go Outdated Show resolved Hide resolved
inspect/inspect_test.go Outdated Show resolved Hide resolved
cmd/tendermint/commands/inspect.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@williambanfield williambanfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Work!

I think just some minor touch ups / linting is required before this can be merged.

As a high level question, what would happen if I had a node running and then I inspected it at the same time. Would it work as expected or error? As a guess, this would depend on whether the db supports additional read-only connections right?

Yeah, this would be reliant on how the DB manages the files for the storage. I'm not yet sure users will want to run this at the same time as a node. The node already provides RPC so this would be somewhat redundant. I tried running a node of each DB type alongside the corresponding inspect command one by one.

When trying this right now:

Error that is reported by goleveldb:

./build/tendermint inspect
ERROR: failed to initialize database: resource temporarily unavailable

Error that is reported by cleveldb:

./build/tendermint inspect --db-backend cleveldb
ERROR: failed to initialize database: IO error: lock /home/william/.tendermint/data/blockstore.db/LOCK: Resource temporarily unavailable

badgerdb:

 ./build/tendermint inspect --db-backend badgerdb
ERROR: failed to initialize database: Cannot acquire directory lock on "/home/william/.tendermint/data/blockstore".  Another process is using this Badger database.: resource temporarily unavailable

boltdb hangs trying to initialize the db connection

rocksdb:

./build/tendermint inspect --db-backend rocksdb
ERROR: failed to initialize database: IO error: While lock file: /home/william/.tendermint/data/blockstore.db/LOCK: Resource temporarily unavailable

badgerdb:

./build/tendermint inspect --db-backend badgerdb
ERROR: failed to initialize database: Cannot acquire directory lock on "/home/william/.tendermint/data/blockstore".  Another process is using this Badger database.: resource temporarily unavailable

@williambanfield williambanfield added the S:automerge Automatically merge PR when requirements pass label Aug 24, 2021
@williambanfield williambanfield changed the title node: add inspect mode for debugging crashed tendermint node inspect: add inspect mode for debugging crashed tendermint node Aug 24, 2021
@mergify mergify bot merged commit bc2b529 into master Aug 24, 2021
@mergify mergify bot deleted the wb/issue-5908 branch August 24, 2021 18:12
require.NoError(t, d.Run(ctx))
}()
// FIXME: used to induce context switch.
// Determine more deterministic method for prompting a context switch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I also filed #6858 to track this more generally.

@tac0turtle tac0turtle mentioned this pull request Jul 27, 2022
39 tasks
@williambanfield williambanfield restored the wb/issue-5908 branch September 9, 2022 16:13
williambanfield added a commit that referenced this pull request Nov 1, 2022
EDIT: Updated, see [comment below]( #6785 (comment))

This change adds a sketch of the `Debug` mode.

This change adds a `Debug` struct to the node package. This `Debug` struct is intended to be created and started by a command in the `cmd` directory. The `Debug` struct runs the RPC server on the data directories: both the state store and the block store.

This change required a good deal of refactoring. Namely, a new `rpc.go` file was added to the `node` package. This file encapsulates functions for starting RPC servers used by nodes. A potential additional change is to further factor this code into shared code _in_ the `rpc` package.

Minor API tweaks were also made that seemed appropriate such as the mechanism for fetching routes from the `rpc/core` package.

Additional work is required to register the `Debug` service as a command in the `cmd` directory but I am looking for feedback on if this direction seems appropriate before diving much further.

closes: #5908
williambanfield added a commit that referenced this pull request Dec 7, 2022
resurrect the inspect command from #6785

Co-authored-by: Sam Kleinman <garen@tychoish.com>
Co-authored-by: Thane Thomson <connect@thanethomson.com>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S:automerge Automatically merge PR when requirements pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rpc: debug mode
5 participants