
Reached max retries querying for block #1402

Closed
gaia opened this issue Feb 20, 2024 · 10 comments

gaia commented Feb 20, 2024

On v2.5.1 (and at least also v2.5.0), I have synced private Osmosis and Celestia RPCs being queried, but I'm seeing Reached max retries querying for block even for very recent blocks (and the nodes retain at least 2 weeks of history before pruning).

gaia (Author) commented Feb 29, 2024

The issue is the same on https://github.com/cosmos/relayer/releases/tag/v2.5.2

jtieri (Member) commented Feb 29, 2024

Hey, thanks for opening the issue!

We aren't seeing the same behavior in our infra. Is there anything unique about your setup, perhaps a load balancer or the need to use port forwarding, etc.? Could you also try configuring some of the publicly available endpoints from the chain registry, to rule out this being an issue with your nodes?
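(For reference, one way to pull chain configs, including public RPC endpoints, from the cosmos/chain-registry is rly chains add; a minimal sketch, assuming rly v2.x and that the registry names below match the chains you want.)

```sh
# Fetch chain configs (including public RPC endpoints) from the chain registry.
# The chain names are assumptions; check the registry for the exact identifiers.
rly chains add celestia osmosis cosmoshub

# Inspect what ended up in the local config.
rly chains list
```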

gaia (Author) commented Feb 29, 2024

No load balancers. I'm running:

[screenshot of the setup omitted]

I tried using all external RPCs, and I still get the same error, but as before only on celestia<>osmosis (while cosmoshub<>osmosis works fine, using the same Osmosis RPC).

Reached max retries querying for block, skipping {"chain_name": "celestia", "chain_id": "celestia", "height": 893372}
warn Reached max retries querying for block, skipping {"chain_name": "osmosis", "chain_id": "osmosis-1", "height": 14058148}

Note the recent block heights. The error is intermittent: it is not shown sequentially for every single block. After a while, it starts to happen only on Celestia (local or 3rd-party RPC).

Does rly establish a connection that requires an inbound port, or websockets? It's behind NAT at the router and NAT at the hypervisor (LXC/LXD).

PS: I can establish a websocket connection to a 3rd party just fine using websocat.

Would you mind giving me the exact query it is trying to do so that I can try it manually?

jtieri self-assigned this Apr 23, 2024
jtieri (Member) commented Apr 23, 2024

The log you shared has chain_name as celestia, so it would seem the Celestia RPC is the problematic one here. When you start the relayer, are you using the debug flag -d? Mostly asking to see if there are details related to the error that are going unseen. I do remember an issue someone reported where the relayer was unable to sync with Celestia and it turned out to be some configuration on the node, see #1383.
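(A minimal sketch of starting the relayer with debug output via the -d flag mentioned above; the path name is a hypothetical placeholder from your local config.)

```sh
# Start the relayer on a specific path with debug output enabled.
# "celestia-osmosis" is a hypothetical path name; substitute your own.
rly start celestia-osmosis -d
```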

The relayer does not use websockets; it just makes RPC calls to the configured node.

If I'm not mistaken, the logs you are seeing are related to the block_results RPC endpoint.
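(To try the query by hand: block_results is a standard CometBFT RPC endpoint, so a sketch like the following should work, assuming your node's RPC listens on localhost:26657 and using the Celestia height from the log above as a placeholder.)

```sh
# Manually query the block_results endpoint the relayer depends on.
# localhost:26657 and the height are placeholders; point this at your own node.
curl -s "http://localhost:26657/block_results?height=893372" | head -c 500
```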

gaia (Author) commented Apr 24, 2024

Thanks, I will look into it again.

PS: port forwarding IS in use. There is NAT at the router to the public IP and also at the LAN IP of the host (since the relayer runs in an LXC container).

jtieri (Member) commented Apr 30, 2024

Let me know what you turn up!

I'm thinking this is possibly related to some silent error on the Celestia node that is only being logged at the debug level, which could stem from some configuration value specific to Celestia. From what you described, I don't think there is necessarily anything wrong with your relayer/node setup. Perhaps @agouin can take a peek at this and confirm that the system configuration you are using is fine?

gaia (Author) commented Apr 30, 2024

I will run rly again in the near future and report back; for now I am running Hermes.
You can, however, use our RPC node for testing. I can send you some TIA.

jtieri (Member) commented May 6, 2024

> I will run rly again in the near future and report back; for now I am running Hermes. You can, however, use our RPC node for testing. I can send you some TIA.

Appreciate it! Yeah, if you want to share your node I would be happy to try debugging this a bit when I have some extra cycles.

gaia (Author) commented May 7, 2024

> > I will run rly again in the near future and report back; for now I am running Hermes. You can, however, use our RPC node for testing. I can send you some TIA.
>
> Appreciate it! Yeah, if you want to share your node I would be happy to try debugging this a bit when I have some extra cycles.

Happy to share. Send me a DM on Twitter (@wholesum); you are @Ethereal0ne, right?

jtieri (Member) commented May 21, 2024

The team did some debugging with your Celestia node on the Celestia<>Osmosis path, and it turns out the node is currently configured to discard ABCI responses, which the relayer needs in order to work properly.

The same issue was described in #1383; the solution is to go into the node's config and set discard_abci_responses = false. After that, rly should have no problem connecting to the node and successfully relaying IBC packets.
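(For reference, a sketch of where the setting lives: in recent CometBFT releases it is under the [storage] section of the node's config.toml. The home directory below is a hypothetical path; adjust it to your node.)

```sh
# Verify the setting in the node's CometBFT config.toml.
# ~/.celestia-app is a hypothetical node home; adjust to your setup.
grep "discard_abci_responses" ~/.celestia-app/config/config.toml
# Should print: discard_abci_responses = false
# Restart the node after changing the value.
```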

jtieri closed this as completed May 21, 2024