Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing events when performing catchup with catchupModePageSize > catchupModeBlockGap #198

Open
peterbroadhurst opened this issue Feb 4, 2022 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@peterbroadhurst
Copy link
Contributor

peterbroadhurst commented Feb 4, 2022

This is problematic for FireFly, which subscribes from Block 0, and uses a PATCH on startup to ensure the configuration of the event stream is always up to date.

[2022-02-04T02:57:47.779Z]  INFO --> PATCH /eventstreams/es-2259ebbd-0f87-419d-4463-29876cde5be6
[2022-02-04T02:57:47.780Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: Update event stream
[2022-02-04T02:57:47.780Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: Notified of an ongoing stream update, existing batch processor
[2022-02-04T02:57:47.780Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: Notified of an ongoing stream update, existing event poller
[2022-02-04T02:57:47.780Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: Notified of an ongoing stream update, not waiting for new events
[2022-02-04T02:57:47.800Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Uninstalled filter. ok=true (%!s(<nil>))
[2022-02-04T02:57:47.800Z]  INFO <-- PATCH /eventstreams/es-2259ebbd-0f87-419d-4463-29876cde5be6 [200]
[2022-02-04T02:57:47.801Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): initial block height for event stream (from block): 0
[2022-02-04T02:57:47.807Z]  INFO --> GET /subscriptions
[2022-02-04T02:57:47.807Z]  INFO <-- GET /subscriptions [200]
[2022-02-04T02:57:47.819Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): catchup mode. Blocks 0 -> 4999
[2022-02-04T02:57:47.822Z]  INFO WS/8d27544e-a7ce-460d-6442-0621118b27dc: Connected
[2022-02-04T02:57:47.891Z]  INFO WS/16721966-36b1-45bc-48ff-4816badd5556: Connected
[2022-02-04T02:57:48.850Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): initial block height for event stream (from block): 0
[2022-02-04T02:57:48.872Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): catchup mode. Blocks 5000 -> 9999
[2022-02-04T02:57:49.911Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): initial block height for event stream (from block): 0

Then once it catches up, something weird happens after the first batch:

[2022-02-04T03:05:50.176Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): catchup mode. Blocks 435000 -> 439999
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=436683 TxIndex=0x0
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=436721 TxIndex=0x0
[2022-02-04T03:05:50.226Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: New batch length 1
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=436764 TxIndex=0x0
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=436835 TxIndex=0x0
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=436916 TxIndex=0x0
[2022-02-04T03:05:50.226Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): Dispatching event. Address=0x4D2752586bff01E1F598B71FA56e9AC8A11582D8 BlockNumber=437041 TxIndex=0x0
[2022-02-04T03:05:50.727Z]  INFO es-2259ebbd-0f87-419d-4463-29876cde5be6: Batch 4 initiated with 6 events. FirstBlock=436683 LastBlock=437041
[2022-02-04T03:05:50.727Z]  WARN Cleared out suprious ack (could be from previous disonnect). err=FFEC100205: WebSocket '8d27544e-a7ce-460d-6442-0621118b27dc' closed
[2022-02-04T03:05:51.154Z]  INFO Attempt batch 4 complete. ok=true
[2022-02-04T03:05:51.227Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): initial block height for event stream (from block): 0
[2022-02-04T03:05:51.263Z]  INFO sb-e1f61740-2dc7-4e1f-68ad-f0ac55e4e663:BatchPin(address,uint256,string,bytes32,bytes32,string,bytes32[]): created filter from block 440000: 0xdde5b88258480118d701b810b2534400 - {Addresses:[0x4D2752586bff01E1F598B71FA56e9AC8A11582D8] Topics:[[0x805721bc246bccc732581be0c0aa2dd8f7ec93e97ba4b307be84428c98b0a12f]]}

See the second " initial block height for event stream (from block): 0" and then no further events.

There was an event it should have hit at block 437521 - which is after the 437041 event it dispatched as part of the gap fill, but before the 440000 where it set the filter.

@peterbroadhurst peterbroadhurst added the bug Something isn't working label Feb 4, 2022
@peterbroadhurst peterbroadhurst self-assigned this Feb 4, 2022
@peterbroadhurst
Copy link
Contributor Author

This is affect v0.12.0 of FireFly, which shipped this change: hyperledger/firefly@c070903

@peterbroadhurst
Copy link
Contributor Author

It seems to be the case that as well as resetting, the event stream does not receive events after the initial batch that are found during the rewind.

@peterbroadhurst
Copy link
Contributor Author

peterbroadhurst commented Feb 4, 2022

Ok - found the root cause here.

It's subtle:

  • The catchupModePageSize was set to be GREATER THAN the catchupModeBlockGap
  • This meant our logic at the point we caught up missed some events
  • It also seems to have meant we didn't write a checkpoint, which is why we are resetting to 0 - so the original subject of this issue is probably erroneous

@peterbroadhurst peterbroadhurst changed the title A PATCH to the event stream is resetting all subscriptions owned by that event stream to their initial state Missing events when performing catchup with catchupModePageSize > catchupModeBlockGap Feb 4, 2022
@peterbroadhurst
Copy link
Contributor Author

Note I found we did still have a problem with writing incremental HWM checkpoint while doing gap based recovery:
#199 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant