Rollup stops posting to DA layer after a while #1520
Comments
it seems to start after this throws. perhaps it's not properly recovering from
|
How long is a while? I noticed that between batches posting on Polaris, the interval went from every block or so, to every minute, to every few minutes, eventually lagging 15 minutes between batch postings. During that time rollkit was still producing blocks, some 1000 extra blocks, with no warnings or errors. Then, after posting the last batch, the chain crashed after running for around one hour with the same error that @mycodecrafting is seeing:
5:49PM ERR failed to prepare proposal err="invalid timestamp, parent 1706914174 given 1706914174" height=3573 module=server
I originally suggested to @mycodecrafting to change the block time to 2s, but I don't think changing the block time will do anything but slow down the rate at which the chain fails. IMO it will inevitably hit the point where it lags so much, based on the number of blobs being posted, that it can't fit them into a PFB, and the tx gets stuck in the mempool, causing account sequence errors and eventually failed txs. (namespace on celenium; to see the different batch postings, click the "messages" tab)
root cause
5:49PM ERR failed to build payload err="invalid timestamp, parent 1706914174 given 1706914174" module=server
5:49PM ERR failed to prepare proposal err="invalid timestamp, parent 1706914174 given 1706914174" height=3573 module=server
5:49PM ERR failed to process proposal err="failed to find envelope in proposal" hash=2A4784028005D12EBD413A5BCDAC7B470DFF4B801D74D202BD0813B54E9C953A height=3573 module=server
panic: error while processing the proposal: <nil>
goroutine 356 [running]:
github.com/rollkit/rollkit/block.(*Manager).publishBlock(0x140006a92c0, {0x1049062e0, 0x14001454730?})
github.com/rollkit/rollkit@v0.11.19/block/manager.go:732 +0xbd8
github.com/rollkit/rollkit/block.(*Manager).AggregationLoop(0x140006a92c0, {0x1049062e0, 0x14001454730}, 0x0)
github.com/rollkit/rollkit@v0.11.19/block/manager.go:274 +0x1d0
github.com/rollkit/rollkit/node.(*FullNode).OnStart.func1()
github.com/rollkit/rollkit@v0.11.19/node/full.go:379 +0x30
github.com/rollkit/rollkit/types.(*ThreadManager).Go.func1()
github.com/rollkit/rollkit@v0.11.19/types/threadmanager.go:26 +0x58
created by github.com/rollkit/rollkit/types.(*ThreadManager).Go in goroutine 1
github.com/rollkit/rollkit@v0.11.19/types/threadmanager.go:24 +0x7c
make: *** [start] Error 2
full logs & version info
versions i am using:
polaris logs: https://app.warp.dev/block/rgCrw9OZY4TPapRK2gkyth |
testing with gm world rollup
i decided to leave the same test running overnight with gm world, using default versions in docs, v0.11.19 rollkit, v28.1.0 ignite
results
|
Just attempted to manually test using this commit of celestia-da. I got this error output from Rollkit
The chain progresses despite the DA submission failure. We need to stop block production if pendingBlocks start to pile up past a certain number. We also need to solve all the underlying errors with DA submission |
including another example of poor dev ex relating to this problem:
|
on the initial
did you already have a chain running previously and the transaction is stuck? it looks like either the old transaction finally failed or it was overridden by a new one with higher fee? |
FWIW, when I was testing out a new version of Polaris with v0.12.10 of celestia-da, I did not encounter the same issues with blocks piling up. Here is the issue linked with testing setup and logs for reference. I am testing a GM rollup now with v0.12.10 of celestia-da. will report back here edit: i jinxed this. |
I'm hitting the same bugs on this.
setup
results
7 batch postings in, 564 rollup blocks out of 2000 total blocks have been posted. Since the start of the chain, the rollup node has been giving errors:
first error - timed out waiting to be included in a block
second error - tx already in mempool
third error - 30 attempts failed
fourth error - incorrect account sequence
ultimately, the chain stopped posting at block 564 on the rollup and has been stuck in an incorrect account sequence error since block 873
logs |
testing again
off the jump, i hit errors too: 1. timeout -> 2. tx already in mempool -> 3. failed after 30 attempts -> 4. incorrect account sequence 💀
update
a blob went through, 500+ blocks after it should have on the rollup
800+ blocks in, 15 blocks have been posted to DA
second blob has been posted 17 minutes later |
With #1535 we're handling common DA congestion errors like
Most of these errors originate from insufficient gas price. We're now adding an increased timeout and retrying the transaction based on the error message. We can improve this by estimating gas price from third party providers after celestiaorg/celestia-app#3114 is implemented. Until then we support dynamic gas price retargeting by using the |
Version of Git SHA
Observed the error on #1424, however it probably occurs on main as well.
System OS
Mac OS
Steps to reproduce it
Start a gm rollup, non-lazy aggregation, let it run for several hours. Eventually, the `successfully submitted Rollkit block to DA layer` messages stop appearing in the logs, but blocks still seem to get gossiped, stored, and indexed.
Expected result
`successfully submitted Rollkit block to DA layer` log messages keep up with the chain
Actual result
rollup seems to stop posting blocks to DA.
Notes
If it's failing silently, we need to ensure that the sequencer stops producing blocks, or else it risks progressing too far for DA to keep up.
address the underlying cause if possible, and don't fail silently