[dont merge to main yet] Don't flush after step completes #977
base: main
Conversation
packages/integration-sdk-runtime/src/execution/dependencyGraph.ts
  name: IntegrationErrorEventName.UnexpectedError,
  description: 'Upload to persister failed',
});
// How can we fail gracefully here?
Today, if an upload fails, we fail that step. But that's really an incorrect decision, since the upload usually contains data from other steps as well. Should we just fail the job if an upload fails? Note this is not the only place we upload, but we can change the behavior there too.
await this.lockOperation(async () => {
  const entitiesByStep = this.localGraphObjectStore.collectEntitiesByStep();
  let entitiesToUpload: Entity[] = [];
  for (const [stepId, entities] of entitiesByStep) {
Why did we move this from pMap to a plain function?
There might be no reason to divide the uploads we do; there may be a point to doing it for writes to disk. Say a flushed batch of 6MB of entities gets into this function: do we want to generate 100 uploads if those entities come from 100 different steps?
That makes sense. In that case, we may want to alter the way this whole flushing process is done in the future; we could just remove the concept of steps here. That can be future work, though.
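To illustrate the idea discussed above, here is a minimal sketch of merging the per-step buffers into size-capped batches instead of issuing one upload per step. This is not the SDK's actual implementation; the `Entity` shape, the `sizeBytes` field, and `batchEntities` are all hypothetical names for illustration.

```typescript
// Hypothetical sketch: flatten a Map of step -> entities into size-capped
// batches, so one flush produces a handful of uploads rather than one per step.
interface Entity {
  _key: string;
  sizeBytes: number; // assumed pre-computed serialized size
}

function batchEntities(
  entitiesByStep: Map<string, Entity[]>,
  maxBatchBytes: number,
): Entity[][] {
  const batches: Entity[][] = [];
  let current: Entity[] = [];
  let currentBytes = 0;

  for (const entities of entitiesByStep.values()) {
    for (const entity of entities) {
      // Start a new batch when adding this entity would exceed the cap.
      if (
        currentBytes + entity.sizeBytes > maxBatchBytes &&
        current.length > 0
      ) {
        batches.push(current);
        current = [];
        currentBytes = 0;
      }
      current.push(entity);
      currentBytes += entity.sizeBytes;
    }
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

With this shape, a 6MB flush spanning 100 steps collapses into a single upload (or a few, if the cap is smaller than the flushed total), rather than 100 step-sized uploads.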
Looks good to me. Let's make an alpha version.
The motivation for this PR is to avoid flushing every time we complete a step. There should only be two times we flush to disk/upload:
Also, I don't think it's optimal to divide the data we upload by step; that would generate far more unnecessary uploads.
Tried it on dev, on the jupiterone-integration-dev instance: we went from 432 uploads in a single job to 3.