Running the pre-built indexer generates 'next round to account' error after a while without apparent reason #1507

Closed
MatteusDeloge opened this issue Mar 16, 2023 · 4 comments

@MatteusDeloge

Description

I'm running the pre-built indexer (v2.15.3) on an AWS EC2 instance together with an archival node, and in the last 48h I've had the same issue twice: the indexer stops pushing data to the Postgres database (Timescale Cloud, so fully managed) with the following error:

{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 27670983 but next round to account is 27670982","level":"error","msg":"block 27670983 import failed","time":"2023-03-16T08:02:45Z"}
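To catch this failure automatically rather than noticing it when the database stops updating, the log line can be parsed for the round mismatch. The following is only a minimal sketch: the function name `parse_round_mismatch` and the regular expression are my own, written against the single log line quoted above, and other indexer versions may word the message differently.

```python
import json
import re

# Pattern written against the one error message quoted in this issue (assumption:
# the wording is stable enough to match on).
ROUND_MISMATCH = re.compile(
    r"adding block round (\d+) but next round to account is (\d+)"
)


def parse_round_mismatch(log_line):
    """Return (block_round, next_round_to_account) for a mismatch line, else None."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    match = ROUND_MISMATCH.search(str(record.get("error", "")))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    '{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: '
    "attemptTx() err: AddBlock() adding block round 27670983 but next round "
    'to account is 27670982","level":"error",'
    '"msg":"block 27670983 import failed","time":"2023-03-16T08:02:45Z"}'
)
print(parse_round_mismatch(line))  # prints (27670983, 27670982)
```

Here the indexer tried to add block 27670983 while the database still expected 27670982, i.e. it was one round ahead of what the database was ready to account.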

The command I use to run the indexer is: algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand --postgres "$TIMESCALE_PROD"

The node itself is still running properly at that point; the output of goal node status -w 1000 is:

Last committed block: 27681419
Time since last block: 0.6s
Sync Time: 0.0s
Last consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Next consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Round for next consensus protocol: 27681420
Next consensus protocol supported: true
Last Catchpoint: 27680000#NA63SDQJD63NR3QPNC2NXYV6FPUJWHNJY6DDAMGURQ76CT2MYUUQ
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=
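For monitoring, the status output above can be turned into a dictionary and compared with the round at which the indexer stalled. This is a rough sketch, assuming the output keeps the "Key: value" layout shown here; `parse_node_status` is a hypothetical helper, not part of goal or the indexer.

```python
def parse_node_status(status_text):
    """Parse 'Key: value' lines (as printed by `goal node status`) into a dict."""
    fields = {}
    for line in status_text.splitlines():
        # Split on the first ": " only, so URL values keep their own colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the status output in this issue.
status_text = """\
Last committed block: 27681419
Time since last block: 0.6s
Sync Time: 0.0s
Genesis ID: mainnet-v1.0
"""

fields = parse_node_status(status_text)
node_round = int(fields["Last committed block"])
stalled_round = 27670983  # block round from the indexer error quoted earlier
rounds_behind = node_round - stalled_round
print(node_round, rounds_behind)
```

The status was captured some time after the error, so the gap shows how far the indexer had fallen behind by then, not the lag at the moment of failure.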

When I restart the indexer, all I get is the prompt to re-initialise the ledger:

{"error":"MakeProcessorWithLedgerInit() err: InitializeLedger() simple catchup err: RunMigration() err: MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized","level":"error","msg":"blockprocessor.MakeProcessor() err MakeProcessorWithLedgerInit() err: InitializeLedger() simple catchup err: RunMigration() err: MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized","time":"2023-03-16T08:06:41Z"}

Currently the only way I know to get it up and running again is by clearing out the indexer's data directory and starting sync again from the nearest catchpoint: algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand --postgres "$TIMESCALE_PROD" --catchpoint "27670000#74HTMMCL63E74B43FLS3LHHQRMDO54HTF6FKC2JZK3K3PXNY6ZYQ"
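The recovery command above could be assembled by a small script, for instance as part of a supervised restart. The sketch below only builds the argument list, using the flags that appear in this issue; it does not clear the data directory or start anything. The helper names are hypothetical, and the check that the catchpoint round must not be ahead of the database's next round is my reading of the "ledger cache is ahead of the required round" error, not something confirmed in this thread.

```python
def parse_catchpoint(label):
    """Split a catchpoint label of the form '<round>#<hash>' into (round, hash)."""
    round_text, sep, digest = label.partition("#")
    if not sep or not round_text.isdigit() or not digest:
        raise ValueError(f"not a catchpoint label: {label!r}")
    return int(round_text), digest


def build_restart_command(data_dir, algod_dir, postgres_conn, catchpoint, next_round):
    """Build the argv for restarting the indexer from a catchpoint.

    next_round is the round the database expects next (assumption: a catchpoint
    beyond it would put the ledger cache ahead of the database again).
    """
    catchpoint_round, _ = parse_catchpoint(catchpoint)
    if catchpoint_round > next_round:
        raise ValueError(
            f"catchpoint round {catchpoint_round} is ahead of next round {next_round}"
        )
    return [
        "algorand-indexer", "daemon",
        "--data-dir", data_dir,
        "-d", algod_dir,
        "--postgres", postgres_conn,
        "--catchpoint", catchpoint,
    ]


command = build_restart_command(
    "/home/ubuntu/indexerdata",
    "/var/lib/algorand",
    "postgres-connection-string",  # placeholder for the value of $TIMESCALE_PROD
    "27670000#74HTMMCL63E74B43FLS3LHHQRMDO54HTF6FKC2JZK3K3PXNY6ZYQ",
    next_round=27670982,
)
print(" ".join(command))
```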

Is this a known issue? Can I somehow make the indexer more robust to catch these kinds of issues?

As this issue seems to be entirely indexer-related (unless I'm missing something), I thought it would be good to discuss it here. We use the provided indexer specifically so that we don't have to write our own code and can rely on its out-of-the-box stability, so I'm looking forward to solving this!

Our environment

  • Software version: 3.14.2.stable
  • Node status: see above
  • Indexer version: 2.15.3
  • Server: AWS EC2 c5.large running Ubuntu 20.04
  • Postgres: Timescale Cloud (v2.10.0) running Postgres v14.7

Steps to reproduce

Unknown, but it has been happening a lot over the past week.

MatteusDeloge added the new-bug label (Bug report that needs triage) on Mar 16, 2023
@winder
Contributor

winder commented Mar 23, 2023

I believe this sort of thing may happen if you have multiple Indexer writers running at the same time. Resetting the data directory is the right way to recover.
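One crude way to check for a second writer on the same host is to look for more than one indexer daemon in the process list. The sketch below assumes lines in the "PID COMMAND" layout produced by something like `ps -eo pid,args` on Linux; the sample lines are invented for illustration, and `find_indexer_daemons` is a hypothetical helper. It cannot tell readers from writers, and it will not see an indexer on another machine writing to the same database.

```python
def find_indexer_daemons(ps_lines):
    """Return the PIDs of processes that look like `algorand-indexer daemon`."""
    pids = []
    for line in ps_lines:
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skips the header row and malformed lines
        pid, argv = int(parts[0]), parts[1:]
        executable = argv[0].rsplit("/", 1)[-1]  # drop any leading path
        if executable == "algorand-indexer" and "daemon" in argv[1:]:
            pids.append(pid)
    return pids


# Hypothetical process listing, not taken from the reporter's machine.
sample = [
    "  PID COMMAND",
    " 1201 algorand-indexer daemon --data-dir /home/ubuntu/indexerdata -d /var/lib/algorand",
    " 1388 /usr/bin/algorand-indexer daemon --data-dir /tmp/other",
    " 1410 grep algorand-indexer",
]

daemons = find_indexer_daemons(sample)
if len(daemons) > 1:
    print("more than one indexer daemon running:", daemons)
```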

winder added the Team Lamprey label and removed the new-bug label (Bug report that needs triage) on Mar 23, 2023
@MatteusDeloge
Author

Normally I have just a single writer in a single process, so I'm not sure that is the case here. If it is, then the problem seems to exist within this version of the indexer.

@shiqizng
Contributor

shiqizng commented May 16, 2023

I'm unable to reproduce the error with the v2.15.3 indexer. I recommend switching to Conduit, where you won't get this error.

@gmalouf
Contributor

gmalouf commented May 23, 2024

Indexer 2.x was retired in 2023.

gmalouf closed this as completed on May 23, 2024