Infrequent block pileup #29575

Closed
riposteX opened this issue Apr 18, 2024 · 3 comments

@riposteX

System information

Geth version: v1.13.14
CL client & version: teku@24.1.3
OS & Version: Linux

Expected behaviour

Geth receives/processes blocks in a timely manner.

Actual behaviour

I run a number of geth/teku nodes and have recently noticed an infrequent pattern (on the order of once a day) across all of them: geth receives a burst of roughly six blocks at about the same time, the oldest of which is, naturally, about 72s (six slots) stale.

It could of course just be a temporary network issue, but I keep seeing the same ~6-block burst across multiple machines in multiple locations.

It could also be a Teku issue, but that seems unlikely given the logs below.

Steps to reproduce the behaviour

I've added some custom logging in forkchoiceUpdated():

// Block is not canonical, set head.

unixNow := uint64(time.Now().Unix())
blockHeader := block.Header()
if blockHeader.Time+8 < unixNow {
	fmt.Println("FCU received late block:", blockHeader.Number, unixNow-blockHeader.Time)
}
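
As an aside: fmt.Println writes without a timestamp, which makes the output harder to line up against the Teku log further down. A sketch of the same check routed through geth's structured logger (assuming the usual github.com/ethereum/go-ethereum/log import is available in the file being patched) would look like:

// Same check as above, but emitted with a timestamp through geth's logger.
// Note: warn-level lines are only printed at --verbosity 2 or higher.
unixNow := uint64(time.Now().Unix())
blockHeader := block.Header()
if blockHeader.Time+8 < unixNow {
	log.Warn("FCU received late block", "number", blockHeader.Number, "delay", unixNow-blockHeader.Time)
}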

With the fmt.Println check in place, yesterday I got this output:

FCU received late block: 19680123 63
FCU received late block: 19680124 52
FCU received late block: 19680125 40
FCU received late block: 19680126 28
FCU received late block: 19680127 16

and corresponding Teku logs:

01:01:51.130 INFO  - Slot Event  *** Slot: 8882707, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 64
01:01:56.514 WARN  - Execution Client request timed out. Make sure the Execution Client is online and can respond to requests.
01:01:56.516 WARN  - Late Block Import *** Block: 719951b2cd1546b28744883f37736b44aac68a8db826a050eba30b0862a6ca17 (8882707) proposer 425691 arrival 1500ms, gossip_validation +7ms, pre-state_retrieved +4ms, processed +146ms, data_availability_checked +0ms, execution_payload_result_received +7857ms, begin_importing +0ms, completed +2ms
01:02:03.105 INFO  - Slot Event  *** Slot: 8882708, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 63
01:02:15.065 INFO  - Slot Event  *** Slot: 8882709, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 63
01:02:15.094 INFO  - Execution Client is responding to requests again after a previous failure
01:02:18.523 WARN  - Execution Client request timed out. Make sure the Execution Client is online and can respond to requests.
01:02:27.058 INFO  - Slot Event  *** Slot: 8882710, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 70
01:02:39.051 INFO  - Slot Event  *** Slot: 8882711, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 70
01:02:50.552 INFO  - Execution Client is responding to requests again after a previous failure
01:02:51.110 INFO  - Slot Event  *** Slot: 8882712, Block:                                                        ... empty, Justified: 277583, Finalized: 277582, Peers: 69
01:03:03.347 INFO  - Slot Event  *** Slot: 8882713, Block: e9a5299cdef23abf523c27235560777a176a614341703654aec7b3eab504f99e, Justified: 277583, Finalized: 277582, Peers: 69

I'm interpreting this as something infrequently hanging geth for about a minute.
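
(Supporting arithmetic: the delays above drop by roughly 12s, i.e. one mainnet slot, from block to block: 63, 52, 40, 28, 16. So these are five consecutive slots' worth of blocks, all applied in a single burst once whatever was stalling geth cleared.)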

Some other recent incidents occurred at blocks 19678501 and 19675309. The signature is always a pileup of about six blocks on geth and a late block message on Teku. Those late block messages were a bit different from the one above, though:

12:51:51.989 WARN  - Late Block Import *** Block: b9b41da780df72f2b0d81e7e54820cfad87cf25107ba63fde3e147606bdb76d6 (8877857) proposer 241202 arrival 972ms, gossip_validation +6ms, pre-state_retrieved +10ms, processed +223ms, execution_payload_result_received +0ms, begin_importing +3778ms, completed +0ms

and

23:34:51.509 WARN  - Late Block Import *** Block: 23b5b12c8680fe980ee2f8bf8455b43b837181b9988fcf74d5bd73130b457057 (8881072) proposer 37541 arrival 488ms, gossip_validation +7ms, pre-state_retrieved +12ms, processed +163ms, execution_payload_result_received +0ms, begin_importing +3837ms, completed +2ms

I was running an older (and less verbose) version of Teku at the time; Lucas Saldanha on the Teku Discord told me that both of these blocks were late because of blob data unavailability.

@karalabe
Member

Could you share some logs from Geth when this happens?

@riposteX
Author

I usually run with --verbosity 1, so there are no past logs. I'll try to capture the next occurrence.
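
(For reference: geth's --verbosity levels run from 0 to 5, with 3 (info) as the default, so error-only logging at 1 drops the block-import and engine-API lines that would help here. It should also be possible to raise the level on a running node without a restart via the attached console, e.g. debug.verbosity(3).)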

@riposteX
Author

riposteX commented May 7, 2024

This seems to have stopped on its own; I haven't seen the characteristic 5-6 block pattern for a while now. I'll keep an eye out and reopen if it returns.

riposteX closed this as completed May 7, 2024