Graphman copy indexing optimizations #5140

Open
leoyvens opened this issue Jan 15, 2024 · 2 comments
Assignees
Labels
area/graphman ops Relevant to Graph Node Ops

Comments

@leoyvens
Collaborator

To reduce the size of a DB shard without downtime, using graphman to copy deployments out is a good option. But it needs a few optimizations to be usable en masse, including around index creation. This issue tracks two of those optimizations:

  1. Create the indexes based on the index creation DDL of the source, not based on the default.
  2. Create the BTree indexes after the copy is done, and only the GiST afterwards.
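
A minimal sketch of item 1, assuming the source's index DDL is read as text from Postgres' `pg_indexes.indexdef` column and only the schema needs to change for the destination. The function name `retarget_index_def`, the `sgd42`/`sgd99` schema names, and the sample index name are illustrative, not graphman's actual implementation:

```python
import re

# Shape of a pg_indexes.indexdef value, e.g.
#   CREATE INDEX attr_1_0_token_id ON sgd42.token USING btree (id)
# Simplification: quoted identifiers and "ON ONLY" are not handled.
INDEX_DEF = re.compile(
    r"^CREATE (?P<unique>UNIQUE )?INDEX (?P<name>\S+) "
    r"ON (?P<schema>\w+)\.(?P<table>\w+) "
    r"USING (?P<method>\w+) (?P<rest>.*)$"
)


def retarget_index_def(indexdef, dst_schema):
    """Rewrite a source index definition so it targets dst_schema."""
    m = INDEX_DEF.match(indexdef)
    if m is None:
        raise ValueError(f"unrecognized index definition: {indexdef}")
    unique = m.group("unique") or ""
    # Keep name, table, method, columns and any WHERE clause as on the source.
    return (
        f"CREATE {unique}INDEX {m['name']} "
        f"ON {dst_schema}.{m['table']} USING {m['method']} {m['rest']}"
    )


if __name__ == "__main__":
    src = "CREATE INDEX attr_1_0_token_id ON sgd42.token USING btree (id)"
    print(retarget_index_def(src, "sgd99"))
```

Working from the source's DDL rather than the defaults means indexes that were dropped on the source stay dropped, and manually added ones carry over.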

@incrypto32 will mentor @itsjerryokolo on this task.

@leoyvens leoyvens added area/graphman ops Relevant to Graph Node Ops labels Jan 15, 2024
@leoyvens
Collaborator Author

We can also apply this for grafts, and copy over custom statistics in addition to the indexes.

@fordN fordN assigned zorancv and unassigned incrypto32 and itsjerryokolo Apr 9, 2024
@lutter
Collaborator

lutter commented Apr 11, 2024

Here's my take on this:

  • In general, we should defer creating BTree attribute indexes on all subgraphs until the subgraph is synced. Ideally, we would kick off index creation so that the indexes are ready right when the subgraph becomes synced and not much earlier; as a first step, it's OK to create them once the subgraph is synced.
  • All other indexes (not BTree or not an attribute index) should be created when the subgraph is first deployed, as creating them later can be very, very slow, with bad consequences for the rest of the system.
  • For copies and grafts, we do not want to create the indexes that we create by default for new subgraphs. Instead, we should look at the source of the graft/copy and recreate all the indexes that actually exist on the source.
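
The three rules above could be sketched roughly as follows. `IndexSpec`, `Phase`, `creation_phase` and `indexes_for_deployment` are hypothetical names made up for illustration; this is not graph-node's data model, only one way to read the policy:

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    AT_DEPLOY = "at_deploy"      # create when the subgraph is first deployed
    WHEN_SYNCED = "when_synced"  # defer until the subgraph is synced


@dataclass(frozen=True)
class IndexSpec:
    name: str
    method: str         # e.g. "btree", "gist"
    is_attribute: bool  # True for an index on an entity attribute


def creation_phase(index):
    """Only BTree attribute indexes are deferred; everything else is created up front."""
    if index.method.lower() == "btree" and index.is_attribute:
        return Phase.WHEN_SYNCED
    return Phase.AT_DEPLOY


def indexes_for_deployment(default_indexes, source_indexes=None):
    """Group index names by phase.

    For a copy or graft (source_indexes given), mirror the indexes that
    exist on the source instead of using the defaults.
    """
    chosen = default_indexes if source_indexes is None else source_indexes
    plan = {phase: [] for phase in Phase}
    for index in chosen:
        plan[creation_phase(index)].append(index.name)
    return plan
```

For a new subgraph, call `indexes_for_deployment(defaults)`; for a copy or graft, pass the indexes found on the source as the second argument so the defaults are ignored.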


5 participants