Speedup seeding #4901

Open
jorg-vr opened this issue Aug 14, 2023 — with Slack · 2 comments
Labels
chore Repository/build/dependency maintenance

Comments

Contributor

jorg-vr commented Aug 14, 2023

This is one of the main reasons for slow tests on GitHub Actions.

We can probably speed this up considerably with the trick described at https://railsnotes.xyz/blog/seed-your-database-with-the-faker-gem#fixing-our-slow-seeds-with-upsert_all-and-activerecord-import
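
To make the idea concrete: bulk-insert helpers such as `insert_all` and `upsert_all` send one multi-row INSERT instead of one statement per record, and they skip validations and callbacks. The sketch below only illustrates that batching step in plain Ruby. It is not how Rails builds its SQL, `bulk_insert_sql` is a hypothetical helper, and the `users` table and its columns are made up for the example.

```ruby
# Build one parameterised multi-row INSERT from an array of row hashes.
# Hypothetical helper: table and column names are assumed to be trusted
# identifiers, and every row is assumed to have the same keys.
def bulk_insert_sql(table, rows)
  raise ArgumentError, "rows must not be empty" if rows.empty?

  columns = rows.first.keys
  # One "(?, ?, ...)" group per row, so values travel as bind parameters.
  row_group = "(#{(['?'] * columns.size).join(', ')})"
  placeholders = rows.map { row_group }.join(', ')

  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{placeholders}"
  binds = rows.flat_map { |row| row.fetch_values(*columns) }
  [sql, binds]
end

# Made-up seed rows; the real seed data is not shown in this thread.
rows = [
  { username: "student1", permission: "student" },
  { username: "student2", permission: "student" }
]

sql, binds = bulk_insert_sql("users", rows)
puts sql
# INSERT INTO users (username, permission) VALUES (?, ?), (?, ?)
```

Two rows cost one round trip here instead of two, which is where the speedup would come from. The price is that nothing model-level runs for these rows.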


@jorg-vr jorg-vr added the chore Repository/build/dependency maintenance label Aug 14, 2023 — with Slack
@bmesuere
Member

Note that in addition to seeding, this could also be used in the application itself. For example, when creating an evaluation we do a lot of inserts, which could perhaps be combined into a single one.

@jorg-vr
Contributor Author

jorg-vr commented Aug 18, 2023

Speeding this up with bulk inserts is a lot less simple than the linked example, which is just a batch of inserts with Faker data.

I profiled the seeding script with stackprof to find the causes of the slowdown:
26% of our time is taken by gitable functions (e.g. repository cloning).
This is mostly file-system related. We could ask ourselves whether we need a 'large activity repo' in the seed.

21% of the time is taken by creating activity statuses.
A lot of that time is also spent in validations.
This could potentially be rewritten as a single query, but it will be rather complex to get right.

Next come creating most courses (13%) and the visualisation test (11%).
A significant part of this is creating series, series memberships, course memberships, etc. But since we loop over these to create submissions, much of the speedup from a collective insert_all is lost when we have to query all of them again afterwards.
Creating submissions might be a good candidate for a collective insert, but these are also rather complex objects (we would also have to handle the code and result files written to the file system). Avoiding some of the callbacks here could still provide a speedup: the callbacks I tracked from submission creation add up to at least 6.5% of total runtime.

I tried replacing student creation with one insert_all and one User.where(permissions: :student) call, and it caused a slowdown instead of a speedup.
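
For before/after comparisons like this one, a coarse wall-clock harness can be enough to check whether a change helps. This is a minimal sketch, not the stackprof profiling used above. `elapsed_seconds` and `compare` are hypothetical helpers, and the two lambdas are stand-ins, since the real seeding code is not shown in this thread.

```ruby
# Time a block with the monotonic clock and return the elapsed seconds.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Run each labelled strategy once, print its share of the total time,
# and return the raw timings keyed by label.
def compare(strategies)
  timings = strategies.transform_values { |work| elapsed_seconds(&work) }
  total = timings.values.sum
  timings.each do |label, seconds|
    share = total.zero? ? 0.0 : 100.0 * seconds / total
    puts format("%-20s %8.4fs %5.1f%%", label, seconds, share)
  end
  timings
end

# Stand-in workloads (assumption): replace the lambda bodies with the
# per-record creation and the insert_all + where variant being compared.
timings = compare(
  "per-record create" => -> { 1_000.times.map { |i| "student#{i}" } },
  "insert_all + where" => -> { Array.new(1_000) { |i| "student#{i}" } }
)
```

A single run is noisy, so repeating each strategy a few times and comparing medians would give a more trustworthy picture before drawing conclusions.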

Projects
Status: Unplanned