Resource limits cause cluster oom kill lock #17

seanlaff · 2020-04-29T12:22:58Z

I spin up a cluster with a memory request/limit of 6 GB. After about 10 minutes of heavy load, the Dgraph Alphas get OOM-killed by Kubernetes. When the Alpha pods restart, they get OOM-killed straight away, and the whole cluster stays in a broken state.

I'm guessing there's some sort of write-ahead log that Dgraph is trying to replay (from the attached persistent volumes) that is larger than the memory limit, causing it to be OOM-killed instantly?

seanlaff · 2020-06-04T12:42:57Z

Discussion was continued here https://discuss.dgraph.io/t/dgraph-cant-idle-without-being-oomkilled-after-large-data-ingestion/6543/60

Improvements have been made to both Badger and Dgraph since then. Will run another large-scale test soon.

darkn3rd · 2022-10-11T16:42:31Z

From the discussion, this is related to dgraph-io/dgraph#5585.
