
zstandard compression for parameters and results #5995

Open · wants to merge 72 commits into master
Conversation

@guzzijones (Contributor)
@guzzijones guzzijones commented Jun 22, 2023

why

Mitigate (as much as possible) "document too large" Mongo exceptions. 16 MB is the MongoDB document size limit.

done

  1. zstandard compression for parameters and results
  2. remove the embedded liveaction doc in the executions db and replace it with the liveaction id string. This change can potentially halve the time to write the liveaction to the database, since the liveaction is no longer written twice.
  3. added traceback info to exception handling for the action execution engine and the workflow engine. This should make it easier to debug when a workflow fails. In my experience, invalid orjson or unterminated yaql strings cause these exceptions.
  4. this will require a data migration for the executionDB model, as the liveaction parameter is now a string instead of an embedded document
  5. added a config option to turn off zstandard compression for parameters and results in the execution db and the liveaction db

@pull-request-size pull-request-size bot added the size/XXL PR that changes 1000+ lines. You should absolutely split your PR into several. label Jun 22, 2023
@guzzijones guzzijones changed the title initial zstandard compression for parameters and results WIP: initial zstandard compression for parameters and results Jun 22, 2023
@guzzijones guzzijones changed the title WIP: initial zstandard compression for parameters and results WIP: zstandard compression for parameters and results Jun 23, 2023
@guzzijones guzzijones self-assigned this Jul 3, 2023
@guzzijones (Contributor, Author)

@amanda11 @cognifloyd would you please take a look at this and give me some feedback? I am interested in whether the direction the code is taking is acceptable.

We are now having numerous instances of Mongo documents being too large. I ran a check, and one of the larger objects we have would compress from 10 MB down to 0.5 MB using zstandard.

@cognifloyd (Member) left a comment

Minor nit.

I don't know why Circle CI is failing. I'm guessing something we rely on is no longer supported or was upgraded.

st2common/st2common/fields.py (review comment, outdated/resolved)
remove comment

Co-authored-by: Jacob Floyd <cognifloyd@gmail.com>
@guzzijones guzzijones requested review from amanda11 and removed request for amanda11 July 23, 2023 03:02
@guzzijones (Contributor, Author)

@cognifloyd The Circle CI is fixed: rabbitmq needs priv=true and redis needed to be pinned (pr). Can you merge?

@guzzijones (Contributor, Author)

I merged the packaging fixes.

@guzzijones guzzijones requested review from winem, Kami, userlocalhost, mickmcgrath13, nzlosh, m4dcoder, bishopbm1 and a team and removed request for amanda11 August 9, 2023 00:14
@Kami (Member) commented Aug 9, 2023

I think zstandard compression is a good change, provided that numbers across various types of datasets confirm it's indeed worth it / a good trade-off of the CPU cycles spent compressing and decompressing those field values.

We decided not to enable compression by default as part of #4846 to keep things simple and reduce the scope of those changes. I'm fine with enabling compression by default now, in case the numbers still back this up.

As far as this change goes:

remove the embedded liveaction doc in the executions db and replace it with the liveaction id string. This change can potentially halve the time to write the liveaction to the database, since the liveaction is no longer written twice.

This one I'm not too sure about it yet and I need to dig in and think about it some more.

I think it would be good if we split this PR into two - one for field compression changes and one for field value / foreign key change.

In the end, both changes are somewhat related, but they have vastly different implications: one was already benchmarked in the past and is backward / forward compatible; the other is not, could have large implications, and also requires a potentially expensive and breaking migration (which is not backward compatible).
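The breaking migration being discussed could take roughly the following shape. This is a hedged sketch, not the PR's actual migration code: the field names ("liveaction", "id") are assumptions, and in practice this transform would be applied once over every document in the executions collection.

```python
# Hedged sketch of the per-document transform implied by the schema change:
# an embedded liveaction document becomes a plain id string.
def migrate_execution_doc(doc):
    """Return a copy of an execution doc with the embedded liveaction
    replaced by its id string; already-migrated docs pass through unchanged."""
    live = doc.get("liveaction")
    if isinstance(live, dict) and "id" in live:
        doc = dict(doc)  # shallow copy so the caller's original is untouched
        doc["liveaction"] = str(live["id"])
    return doc
```

Until such a migration completes, readers that expect an embedded liveaction document break on migrated rows (and vice versa), which is why this half of the PR is not backward compatible.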

@Kami (Member) left a comment

Thanks for your contribution.

I think we are on the right track, but I added some comments on splitting this PR into two and potentially adding more micro benchmarks, etc.

@guzzijones (Contributor, Author)

Thanks @Kami for the review. I would be OK just leaving this as a breaking change. I don't have a lot of cycles to split it up. Thanks.

@guzzijones guzzijones requested a review from a team August 10, 2023 17:00
@guzzijones (Contributor, Author)

FWIW, we have forked over to this branch, along with removing the deep copies in orquesta. We are seeing huge speed improvements. Also, no more Mongo "document too large" errors.

3 participants