Opportunities to batch / avoid truncate and delete shards operations, triggered via chitchat. #4908

Open
fulmicoton opened this issue Apr 25, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@fulmicoton
Contributor

No description provided.

@fulmicoton fulmicoton added the enhancement New feature or request label Apr 25, 2024
@fulmicoton
Contributor Author

Delete shard only happens via chitchat. Nothing is done if the shard is no longer part of the control plane model, so there should not be any redundancy in this operation.
Batching the delete shards requests could, however, relieve the metastore of some load:
right now, when a few indexers restart, they close their shards, which in turn end up being fully consumed and deleted, all at the same time. This has been observed to be one of the most frequent queries on project airmail.
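
To make the batching idea concrete, here is a minimal sketch of a batcher that coalesces individual shard deletions into one request per (index, source) pair. All type and method names (`SourceKey`, `DeleteShardsBatch`, `DeleteShardsBatcher`) are hypothetical and are not taken from the codebase. When and how often `flush` would be called is left open.

```rust
use std::collections::BTreeMap;

/// Hypothetical key identifying the source a shard belongs to.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SourceKey {
    pub index_uid: String,
    pub source_id: String,
}

/// Hypothetical batched request: one per source, carrying many shard IDs.
#[derive(Debug, PartialEq, Eq)]
pub struct DeleteShardsBatch {
    pub key: SourceKey,
    pub shard_ids: Vec<String>,
}

/// Accumulates shard deletions and emits one request per source on flush.
#[derive(Default)]
pub struct DeleteShardsBatcher {
    pending: BTreeMap<SourceKey, Vec<String>>,
}

impl DeleteShardsBatcher {
    /// Queues a shard for deletion. A shard already queued for the same
    /// source is ignored, so a repeated notification costs nothing.
    pub fn push(&mut self, key: SourceKey, shard_id: String) {
        let shard_ids = self.pending.entry(key).or_default();
        if !shard_ids.contains(&shard_id) {
            shard_ids.push(shard_id);
        }
    }

    /// Drains everything queued so far into one batch per source,
    /// leaving the batcher empty.
    pub fn flush(&mut self) -> Vec<DeleteShardsBatch> {
        std::mem::take(&mut self.pending)
            .into_iter()
            .map(|(key, shard_ids)| DeleteShardsBatch { key, shard_ids })
            .collect()
    }
}
```

With something like this, a burst of deletions following an indexer restart would reach the metastore as one request per source rather than one per shard.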

Truncation can happen via chitchat or upon reception of a gRPC request.
Truncate entails writing something in the mrecordlog.
This is not an expensive operation per se (there is no fsync involved for instance). We also properly check that the queue is trailing behind before applying the truncation.

Because the gRPC path does not update the shardpositions model, we actually often attempt the same truncation twice, and this is where the check helps.

The check (and the truncation), however, requires acquiring the write lock. Maybe we could make this operation cheaper by doing the check under the partial lock, by updating the shardpositions model on gRPC, or possibly by somehow batching truncations.
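
As an illustration of the first option, here is a minimal sketch of a check done under a shared lock, so that the exclusive lock is only taken when the truncation would actually change something. It uses a plain `std::sync::RwLock` as a stand-in for the partial/full lock, which is an assumption. `Queues`, `QueueState` and their methods are hypothetical names, and the write to the mrecordlog is reduced to updating a position in a map.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical per-queue state: the position each queue was last truncated to.
#[derive(Default)]
pub struct QueueState {
    truncated_up_to: HashMap<String, u64>,
}

impl QueueState {
    /// True if the queue is already truncated at or beyond `up_to`.
    fn is_noop(&self, queue_id: &str, up_to: u64) -> bool {
        self.truncated_up_to
            .get(queue_id)
            .map_or(false, |&position| position >= up_to)
    }
}

#[derive(Default)]
pub struct Queues {
    state: RwLock<QueueState>,
}

impl Queues {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true if a truncation was actually applied.
    pub fn truncate(&self, queue_id: &str, up_to: u64) -> bool {
        // Cheap check under the shared lock: redundant requests stop here
        // and never contend for the exclusive lock.
        if self.state.read().unwrap().is_noop(queue_id, up_to) {
            return false;
        }
        let mut state = self.state.write().unwrap();
        // Re-check: another caller may have truncated between the two locks.
        if state.is_noop(queue_id, up_to) {
            return false;
        }
        state.truncated_up_to.insert(queue_id.to_string(), up_to);
        true
    }

    /// Position the queue was last truncated to, if any.
    pub fn truncated_position(&self, queue_id: &str) -> Option<u64> {
        self.state.read().unwrap().truncated_up_to.get(queue_id).copied()
    }
}
```

The second check under the exclusive lock is needed for correctness. The first one is only an optimization for the case where the same truncation arrives twice.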

@fulmicoton
Contributor Author

The delete operation seems to be using the index correctly.

Delete on shards  (cost=0.41..8.43 rows=1 width=6)
   ->  Index Scan using shards_pkey on shards  (cost=0.41..8.43 rows=1 width=6)
         Index Cond: (((index_uid)::text = 'simian_chico_8976363586344670227:01HWA32X693SVC1NVBCC2T0ND5'::text) AND ((source_id)::text = '_ingest-source'::text) AND ((shard_id)::text = ANY ('{01HWA3QH2X5E3TJEXBB4NEJH48}'::text[])))

@fulmicoton
Contributor Author

fulmicoton commented Apr 25, 2024

Upon deletion of an indexer, I see 317 "deleting shards" logs.
Each node hosts about 150 shards, so that is roughly twice as many as expected.

Investigating a specific shard ID does show a single line. The second half is probably "rebalancing" kicking in.
