[FR] Bulk download between recent checkpoints? #2098

Open
woodruffw opened this issue Apr 24, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@woodruffw
Member

I had this idea while playing around with my own monitoring tool, curious to hear what the Rekor folks think 🙂 -- if you think it's too complicated or otherwise not worth the effort please close!

Description

Right now, a real-time log monitor might have an event loop like this:

  1. Persist the last observed checkpoint
  2. Wait until a new checkpoint appears
  3. Audit all entries in the range [old, new)
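One iteration of the loop above can be sketched roughly as follows. This is only an illustration: it assumes a checkpoint can be reduced to a tree size, and `latest_size`, `fetch`, and `audit` are hypothetical callables supplied by the monitor, not Rekor APIs.

```python
from typing import Callable, Iterable


def monitor_step(
    last_size: int,
    latest_size: Callable[[], int],
    fetch: Callable[[int, int], Iterable[dict]],
    audit: Callable[[dict], None],
) -> int:
    """Run one loop iteration and return the checkpoint size to persist."""
    new_size = latest_size()
    if new_size <= last_size:
        # No new checkpoint yet; the caller waits and retries.
        return last_size
    # Audit every entry in the half-open range [old, new).
    for entry in fetch(last_size, new_size):
        audit(entry)
    return new_size
```

The caller would persist the returned value before sleeping, so a restart resumes from the last audited checkpoint.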

To do (3), the monitor calls /api/v1/log/entries/retrieve repeatedly for ranges of indices in [old, new), with each call handling at most 10 indices. Typical checkpoint ranges currently include a few hundred entries, meaning that the retrieval loop takes a decent amount of time (and that monitoring requires more fallible network round-trips than strictly necessary).
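To make the round-trip count concrete, here is a small sketch that only builds the request bodies for a span, without sending anything. The `logIndexes` field name is my understanding of the retrieve request schema and should be treated as an assumption; the limit of 10 comes from the description above.

```python
import json
from typing import Iterator

MAX_PER_CALL = 10  # per-call limit described above


def retrieve_payloads(old: int, new: int, limit: int = MAX_PER_CALL) -> Iterator[str]:
    """Yield one JSON request body per call needed to cover [old, new)."""
    for start in range(old, new, limit):
        indexes = list(range(start, min(start + limit, new)))
        # "logIndexes" is an assumed field name for the request body.
        yield json.dumps({"logIndexes": indexes})


bodies = list(retrieve_payloads(1000, 1250))
print(len(bodies))  # 25 round trips for a 250-entry span
```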

My proposal: For the last N checkpoints (pick N to balance size tradeoffs), Rekor could bundle the entries between adjacent checkpoints into single payloads. These payloads could then be made available via an endpoint like /api/v1/log/entries/retrieve/by-checkpoints (or similar), where the request to that endpoint specifies the checkpoint span.
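On the server side, the bookkeeping for this could look something like the sketch below. It is not how Rekor stores anything today; `RecentBundles` is a hypothetical in-memory structure that keeps one payload per adjacent checkpoint pair and evicts the oldest once more than N spans are held.

```python
from collections import OrderedDict
from typing import Optional


class RecentBundles:
    """Keep bundled entries for the last `capacity` checkpoint spans."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._bundles = OrderedDict()  # (old, new) -> list of entries

    def publish(self, old: int, new: int, entries: list) -> None:
        """Record the bundle for the span [old, new) at checkpoint time."""
        self._bundles[(old, new)] = entries
        while len(self._bundles) > self.capacity:
            self._bundles.popitem(last=False)  # evict the oldest span

    def get(self, old: int, new: int) -> Optional[list]:
        """Return the bundle for exactly this span, or None if not retained."""
        return self._bundles.get((old, new))
```

A handler for the proposed endpoint would call `get` with the requested span and return the whole payload in one response when it is present.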

Pros:

  • In the "happy" case, this would reduce the order of monitor network requests to Rekor from O(N) to O(1), making the monitor faster and reducing pressure on Rekor (this may not be significant anyways)

Cons:

  • Additional storage requirements on Rekor's side, along with a small amount of server complexity
  • In the "sad" case (where a monitor is catching up or missed a checkpoint for whatever reason), the network request order degrades back to O(N). This could be addressed through an even more clever "windowing" approach (where Rekor bundles the entire last N checkpointed entries into one giant payload and offers ranges over it), but this is even more complicated.
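The "happy" and "sad" cases above can be compared with a rough request-count model. It assumes a bundle is only usable when the monitor asks for exactly a span the server still retains, and that anything else falls back to the existing 10-per-call retrieval; `requests_needed` and `bundled_spans` are names made up for this sketch.

```python
import math


def requests_needed(old: int, new: int, bundled_spans: set, limit: int = 10) -> int:
    """Estimate round trips needed to fetch entries in [old, new)."""
    if new <= old:
        return 0
    if (old, new) in bundled_spans:
        return 1  # happy case: one bundled payload covers the span
    # Sad case: fall back to batched retrieval, `limit` indices per call.
    return math.ceil((new - old) / limit)


spans = {(100, 350), (350, 600)}
print(requests_needed(100, 350, spans))  # 1
print(requests_needed(100, 600, spans))  # 50, the monitor missed checkpoint 350
```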

TL;DR: Rekor could bundle ranges between pairs of recent checkpoints to accelerate a common monitor retrieval pattern. This would reduce network traffic and improve monitor performance, at the cost of some additional storage and server complexity.

@woodruffw woodruffw added the enhancement New feature or request label Apr 24, 2024
@haydentherapper
Contributor

Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?

I had started implementation on this a while ago but didn't get a chance to finish. The only thing to deal with is deciding whether the index you're querying by is the "global" log index, meaning you need to handle cross-shard lookups, or the shard-specific index, meaning you need to specify a tree ID too. I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.
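To illustrate the difference between the two options, here is a sketch of what a cross-shard lookup involves. It assumes the global index is formed by laying the shards end to end in order; the `Shard` type, the tree IDs, and the shard lengths are all invented for the example.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    tree_id: str
    length: int


def resolve(shards: list, global_index: int) -> tuple:
    """Map a global log index to (tree_id, shard-specific index)."""
    if global_index < 0:
        raise IndexError("global index must be non-negative")
    offset = 0
    for shard in shards:
        if global_index < offset + shard.length:
            return shard.tree_id, global_index - offset
        offset += shard.length
    raise IndexError("global index is past the end of the log")


def split_range(shards: list, start: int, end: int) -> list:
    """Split global [start, end) into per-shard (tree_id, lo, hi) pieces."""
    pieces = []
    offset = 0
    for shard in shards:
        lo = max(start, offset)
        hi = min(end, offset + shard.length)
        if lo < hi:
            pieces.append((shard.tree_id, lo - offset, hi - offset))
        offset += shard.length
    return pieces


shards = [Shard("tree-a", 100), Shard("tree-b", 50)]
print(split_range(shards, 95, 105))
# [('tree-a', 95, 100), ('tree-b', 0, 5)]
```

With a shard-specific API, the client supplies the tree ID and local range directly, so the server can skip this mapping and never needs to stitch results across shards.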

@woodruffw
Member Author

> Could this instead be a general purpose batch retrieval API, rather than specifically for checkpointing?

I think so, yeah! I emphasized checkpointing above because it's what I was looking at for my hacky monitor, but I see no reason why it needs to be constrained to that 🙂

> I would prefer the latter, though it does make the API look different than the other APIs that are shard-agnostic.

That makes sense to me -- my 0.02c is that I don't mind a slightly more complicated/shard-aware client side API if the retrieval performance is worth it!
