
Only get the latest 100,000 scenarios from db-bot #1359

Open
noracato opened this issue Sep 19, 2023 · 7 comments
Labels: Pinned (will never be marked as stale or auto-closed)

@noracato (Member) commented Sep 19, 2023

When downloading and importing a copy of the anonymized database dumps from db-bot, you can easily spend up to 10 minutes just waiting around.
This gradual growth in waiting time happened because the engine's database has grown bigger and bigger over the years. Since we only use these database dumps for development and for helping users with questions, we can opt to dump not the full database, but only the last hundred thousand scenarios or so (including special scenarios like II3050). The download (currently 3.3 GB) and the import would then be much quicker.

Right now the limit is scenarios younger than 3 months, plus scenarios that are marked with keep_compatible. But that doesn't seem to trim the dump enough.
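For illustration, a lighter dump along these lines could be built with mysqldump's `--where` option. This is only a sketch: the table and column names (`scenarios`, `created_at`, `keep_compatible`) and the database name are assumptions, not the actual db-bot setup.

```shell
#!/bin/sh
# Hypothetical sketch: dump only recent scenarios plus those marked
# keep_compatible. Names are assumed, not taken from db-bot itself.

# Cutoff date three months back (GNU date, with BSD date as fallback).
CUTOFF=$(date -d '3 months ago' '+%Y-%m-%d' 2>/dev/null \
  || date -v -3m '+%Y-%m-%d')
WHERE="created_at >= '$CUTOFF' OR keep_compatible = 1"

# The dump itself would then look something like this (echoed here
# rather than executed, since this sketch has no database to talk to):
echo mysqldump etengine scenarios --single-transaction --where="$WHERE"
```

The `--where` clause is applied per table, so related tables (inputs, attachments) would need matching filters in a real implementation.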

We could also keep the current full dump and add the light dump as an extra download option in db-bot.

What do you think @mabijkerk and @thomas-qah?

@mabijkerk (Member) commented

Good that you identified this issue @noracato. My first preference would go to the light dump option. This however depends on our use of the db-bot dump. If we do not actually need all those scenarios then it does not make sense to download them. Perhaps there are other filters we can apply that would trim the download?

Since we're having a discussion about keep_compatible next week anyway, let's include this issue in that meeting.

@mabijkerk (Member) commented

My main question @noracato is what the use is of the dumps from db-bot. Who uses them and why?

@thomas-qah (Contributor) commented Sep 26, 2023

Hope you don't mind me chiming in: I think the use cases are diverse and numerous. For example, I used the etengine production dump for the benchmark testing. It was very useful to have it!

I think that when you download such a dump from the production database you usually have a use case like that in mind: to inspect how the database is performing in various ways, but also to see how it is used; how many inputs/sliders people set on average, how many (custom) curves, etc. As a developer it is much easier to inspect and test such things on a local database. You don't want to do that on a live production database because it puts unnecessary extra load on the server and, even more importantly, it can be dangerous for data integrity and server security.

Personally, I would be in favor of the 'keep current dump and add light dump' option.

@mabijkerk (Member) commented Oct 3, 2023

Thanks for the explanation @thomas-qah. In that case I think my preference would be to set the default to scenarios up to 1 month old, but to allow users (meaning ourselves) to specify a different time limit.

  • `db ete pro download`: default, with scenarios up to 1 month old and all scenarios on keep_compatible
  • `db ete pro 3 download`: scenarios up to 3 months old and all scenarios on keep_compatible
  • `db ete pro 0 download`: only scenarios on keep_compatible (might be an interesting option to include)

Does this seem feasible to you?
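The age argument in the commands above could be mapped to a scenario filter roughly like this. A minimal sketch only: the `db ete pro` command itself is not implemented here, and the column names (`created_at`, `keep_compatible`) are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: turn an optional age-in-months argument into the
# WHERE clause for the scenario dump. Defaults to 1 month; an argument
# of 0 means "only keep_compatible scenarios".
MONTHS="${1:-1}"

if [ "$MONTHS" -eq 0 ]; then
  WHERE="keep_compatible = 1"
else
  # GNU date, with BSD date as fallback.
  CUTOFF=$(date -d "$MONTHS months ago" '+%Y-%m-%d' 2>/dev/null \
    || date -v "-${MONTHS}m" '+%Y-%m-%d')
  WHERE="created_at >= '$CUTOFF' OR keep_compatible = 1"
fi

echo "$WHERE"
```

Keeping the filter in one place like this would let all three variants share the same dump code path.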

@noracato (Member, Author) commented Oct 3, 2023

Sure. That means we'd have to save a few different dumps each night (0, 1, 2 and 3), right @thomas-qah? That increases the time the server will be very busy and will raise our Amazon bill a bit.

If we attract more users from other time zones, this could become a problem in the future, as they would be using the model at the times it is busy creating the backups. Not sure how much of a problem it actually is, but just noting it here!

@thomas-qah (Contributor) commented Oct 3, 2023

Yes, what @noracato writes is correct, but we could of course create a schedule for when each dump gets created. For example:

  • Monthly: full prod dump (e.g. a few days before or after the deploy)
  • Weekly: dump with scenarios created during the last 3 months
  • Daily: dump with scenarios created during the last month

I think this would decrease the server load significantly, also compared to now :)
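The schedule above could be expressed as a crontab fragment along these lines. The times and the `db-bot-dump` script name are illustrative assumptions, not the actual db-bot tooling.

```
# Daily at 03:00: scenarios from the last month
0 3 * * *  /usr/local/bin/db-bot-dump --months 1
# Weekly on Sunday at 04:00: scenarios from the last 3 months
0 4 * * 0  /usr/local/bin/db-bot-dump --months 3
# Monthly on the 1st at 05:00: full production dump
0 5 1 * *  /usr/local/bin/db-bot-dump --full
```

Staggering the start times keeps the heavier dumps from overlapping with the daily one.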

@github-actions bot commented Dec 3, 2023

This issue has had no activity for 60 days and will be closed in 7 days. Removing the "Stale" label or posting a comment will prevent it from being closed automatically. You can also add the "Pinned" label to ensure it isn't marked as stale in the future.

@github-actions bot added the Stale label Dec 3, 2023
@mabijkerk removed the Stale label Dec 4, 2023
@mabijkerk added the Pinned label Dec 4, 2023