Speed up iterateEntities by sending callbacks to a promise queue #546

Open
ndowmon opened this issue Sep 8, 2021 · 1 comment

ndowmon commented Sep 8, 2021

Recently, we encountered an integration in which a single step that calls jobState.iterateEntities() took multiple hours to execute. Because this step sequentially reads from disk, waits on network calls, and writes to disk, we could likely speed up such steps significantly by sending the iteratee callbacks to a promise queue, along these lines:

  export async function iterateEntityTypeIndex<T extends Entity = Entity>({
    type,
    iteratee,
  }: IterateIndexInput<T>) {
    const path = buildIndexDirectoryPath({
      collectionType: 'entities',
      type,
    });
  
+   const queue = new PQueue({ concurrency: 5 }); 
    await walkDirectory({
      path,
      iteratee: async (input) => {
        const object = await readGraphObjectFile<FlushedEntityData>(input);
        if (isObjectFlushedEntityData(object)) {
          for (const entity of object.entities as T[]) {
-           await iteratee(entity);
+           void queue.add(async () => await iteratee(entity));
          }
        }
      },
    });
+   await queue.onIdle();
  }
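
One thing to watch in the diff above: `void queue.add(...)` discards the promise that `add` returns, so a rejection from `iteratee` would go unhandled, and nothing stops the queue from growing while the directory walk keeps reading files. The following is a minimal, dependency-free sketch of the same idea with a bounded number of calls in flight and the first error rethrown to the caller. The helper name `forEachConcurrent` is hypothetical and is not part of the SDK or of p-queue.

```typescript
// Hypothetical helper (not from the SDK): run `iteratee` over `items`
// with at most `concurrency` calls in flight at any time.
async function forEachConcurrent<T>(
  items: Iterable<T>,
  concurrency: number,
  iteratee: (item: T) => Promise<void>,
): Promise<void> {
  const inFlight = new Set<Promise<void>>();
  let failed = false;
  let firstError: unknown;

  for (const item of items) {
    // Stop scheduling new work once any call has failed.
    if (failed) break;

    // Start the call in a microtask so sync throws are captured too.
    const task: Promise<void> = Promise.resolve()
      .then(() => iteratee(item))
      .catch((err: unknown) => {
        // Remember only the first failure; it is rethrown at the end.
        if (!failed) {
          failed = true;
          firstError = err;
        }
      })
      .finally(() => {
        inFlight.delete(task);
      });
    inFlight.add(task);

    // Backpressure: wait for a free slot before taking the next item.
    if (inFlight.size >= concurrency) {
      await Promise.race(inFlight);
    }
  }

  // Drain whatever is still running, then surface the first error.
  await Promise.all(inFlight);
  if (failed) throw firstError;
}
```

Under these assumptions, the inner loop of iterateEntityTypeIndex could call `await forEachConcurrent(object.entities as T[], 5, iteratee)`. That bounds concurrency per file rather than across the whole walk, which is a different trade-off from the shared queue in the diff.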

VDubber commented May 26, 2022

This should be benchmarked before any changes are made.
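
As a starting point for such a benchmark, here is a rough timing sketch that compares sequential and concurrent execution of a simulated I/O-bound callback. Everything in it is an assumption for illustration: `simulatedIteratee`, `runSequential`, `runInBatches`, and `timeIt` are hypothetical names, and a `setTimeout` delay stands in for the real disk and network work, so the numbers it prints say nothing about an actual integration.

```typescript
// Hypothetical stand-in for an iteratee that waits on I/O for `delayMs`.
function simulatedIteratee(delayMs: number): () => Promise<void> {
  return () => new Promise<void>((resolve) => setTimeout(resolve, delayMs));
}

// Run `count` calls one after another, as the current code does.
async function runSequential(count: number, work: () => Promise<void>): Promise<void> {
  for (let i = 0; i < count; i++) {
    await work();
  }
}

// Run `count` calls in batches of at most `concurrency` at a time.
async function runInBatches(
  count: number,
  concurrency: number,
  work: () => Promise<void>,
): Promise<void> {
  for (let i = 0; i < count; i += concurrency) {
    const size = Math.min(concurrency, count - i);
    await Promise.all(Array.from({ length: size }, () => work()));
  }
}

// Measure wall-clock time of `run` in milliseconds and log it.
async function timeIt(label: string, run: () => Promise<void>): Promise<number> {
  const start = Date.now();
  await run();
  const elapsed = Date.now() - start;
  console.log(`${label}: ${elapsed} ms`);
  return elapsed;
}
```

A real benchmark would replace `simulatedIteratee` with an actual step's iteratee and run against representative on-disk entity data, since the gain depends on how much of the step's time is spent waiting on I/O.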
