I'm running the pre-built indexer (v2.15.3) on an AWS EC2 instance together with an archival node, and in the last 48h I've had the same issue twice: the indexer stops pushing data to the Postgres database (Timescale Cloud, so fully managed) with the following error:
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 27670983 but next round to account is 27670982","level":"error","msg":"block 27670983 import failed","time":"2023-03-16T08:02:45Z"}
The command I use to run the indexer is: algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand --postgres "$TIMESCALE_PROD"
The node itself is still running properly at that point; the output of goal node status -w 1000 is:
Last committed block: 27681419
Time since last block: 0.6s
Sync Time: 0.0s
Last consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Next consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Round for next consensus protocol: 27681420
Next consensus protocol supported: true
Last Catchpoint: 27680000#NA63SDQJD63NR3QPNC2NXYV6FPUJWHNJY6DDAMGURQ76CT2MYUUQ
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=
When I restart the indexer, all I get is the prompt to re-initialise the ledger:
{"error":"MakeProcessorWithLedgerInit() err: InitializeLedger() simple catchup err: RunMigration() err: MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized","level":"error","msg":"blockprocessor.MakeProcessor() err MakeProcessorWithLedgerInit() err: InitializeLedger() simple catchup err: RunMigration() err: MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized","time":"2023-03-16T08:06:41Z"}
Currently the only way I know to get it up and running again is by clearing out the indexer's data directory and starting sync again from the nearest catchpoint: algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand --postgres "$TIMESCALE_PROD" --catchpoint "27670000#74HTMMCL63E74B43FLS3LHHQRMDO54HTF6FKC2JZK3K3PXNY6ZYQ"
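For reference, the catchpoint round in those labels can be derived from the failed round: the labels in this report (27670000, 27680000) suggest mainnet catchpoints are published every 10,000 rounds. A small helper sketch under that assumption; the hash part of the label still has to be looked up separately (e.g. from goal node status):

```python
# Sketch: derive the nearest preceding catchpoint *round* for a failed round.
# Assumes catchpoint labels appear every 10,000 rounds, which matches the
# labels in this report (27670000, 27680000); the label's hash suffix must
# still be obtained separately.
CATCHPOINT_INTERVAL = 10_000

def nearest_catchpoint_round(failed_round: int) -> int:
    """Round down to the latest catchpoint round at or before failed_round."""
    return (failed_round // CATCHPOINT_INTERVAL) * CATCHPOINT_INTERVAL

print(nearest_catchpoint_round(27670983))  # -> 27670000
```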
Is this a known issue? Can I somehow make the indexer more robust to catch these kinds of issues?
As this issue seems to be entirely indexer-related (unless I'm missing something here), I thought it would be good to discuss it here. We specifically use the provided indexer so we don't have to write our own code and can rely on the stability it provides out of the box, so I'm looking forward to solving this!
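Until the root cause is found, one workaround (not a fix) is a watchdog that scans the indexer's JSON log for the two error signatures shown above and flags when the reset-and-catchup recovery is needed. A minimal sketch; the error substrings are taken from the logs in this report, and wiring it up to actually trigger recovery is left as an assumption:

```python
import json

# Error substrings taken from the logs in this report.
ROUND_MISMATCH = "but next round to account is"
LEDGER_AHEAD = "the ledger cache is ahead of the required round"

def needs_reset(log_line: str) -> bool:
    """Return True if a JSON log line shows one of the two failure modes
    above, i.e. the data directory likely has to be cleared and sync
    restarted from a catchpoint."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False
    if entry.get("level") != "error":
        return False
    err = entry.get("error", "")
    return ROUND_MISMATCH in err or LEDGER_AHEAD in err

# Example with the first error line from this report:
line = ('{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: '
        'attemptTx() err: AddBlock() adding block round 27670983 but next '
        'round to account is 27670982","level":"error",'
        '"msg":"block 27670983 import failed","time":"2023-03-16T08:02:45Z"}')
print(needs_reset(line))  # -> True
```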
I believe this sort of thing may happen if you have multiple Indexer writers running at the same time. Resetting the data directory is the right way to recover.
Normally I have just a single writer in a single process, so I'm not really sure this is the case here. If it is, then the problem seems to exist within this version of the indexer.
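If the multiple-writers hypothesis is worth ruling out, the simplest check is to count running indexer processes on the host. A sketch that parses ps output; the "algorand-indexer daemon" match is an assumption and should be adapted to the actual command line in use:

```python
import subprocess
from typing import Optional

def count_indexer_writers(ps_output: Optional[str] = None) -> int:
    """Count processes whose command line invokes `algorand-indexer daemon`.
    More than one suggests concurrent writers against the same database."""
    if ps_output is None:
        # Capture the full argument list of every process on the host.
        ps_output = subprocess.run(
            ["ps", "-eo", "args"], capture_output=True, text=True, check=True
        ).stdout
    return sum(
        1 for proc in ps_output.splitlines()
        if "algorand-indexer daemon" in proc
    )

# Example against a captured listing (two writers -> misconfiguration):
sample = """COMMAND
algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand
algorand-indexer daemon --data-dir /tmp/other -d /var/lib/algorand
/usr/bin/bash
"""
print(count_indexer_writers(sample))  # -> 2
```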
Our environment
3.14.2.stable
Steps to reproduce
Unknown, but it seems to have been happening a lot over the past week.