Resource limits cause cluster oom kill lock #17

Open
seanlaff opened this issue Apr 29, 2020 · 2 comments

Comments

seanlaff commented Apr 29, 2020

I spin up a cluster with a memory request/limit of 6 GB. After about 10 minutes of heavy load, the Dgraph alphas get OOM-killed by Kubernetes. When the alpha pods restart, they get OOM-killed straight away, and the whole cluster stays in a broken state.

I'm guessing there's some sort of write-ahead log that Dgraph tries to resume from (on the attached persistent volumes) that is larger than the memory limit, causing it to get OOM-killed instantly?
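
For illustration, a memory request and limit like the one described above is usually expressed on the alpha container roughly as follows. This is a sketch, not the manifest from this report: it assumes "6gb" means `6Gi` and that the request and limit are set to the same value.

```yaml
# Illustrative only: not the actual manifest from this cluster.
# Assumes "6gb" means 6Gi and that request and limit are equal.
resources:
  requests:
    memory: 6Gi
  limits:
    memory: 6Gi
```

With a memory limit in place, a container that exceeds it is terminated and Kubernetes reports the reason as `OOMKilled`, which matches the restart loop described here if startup itself needs more memory than the limit allows.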

seanlaff commented Jun 4, 2020

The discussion continued here: https://discuss.dgraph.io/t/dgraph-cant-idle-without-being-oomkilled-after-large-data-ingestion/6543/60

Improvements have been made to both Badger and Dgraph since. I will run another large-scale test soon.

darkn3rd (Contributor) commented

From the discussion, this is related to dgraph-io/dgraph#5585.
