Memory usage spikes during WAL replay to more than normal usage #6934
Your Prometheus instance either needs more RAM, or you can delete the WAL directory (you will lose some data). I don't think there have been any recent changes that would affect replay memory usage. cc @codesome |
How many targets / timeseries do you have? For only 3 segments that's A LOT of memory. |
Hi. Thanks. That's not 3 segments - I just cut everything else out. I deleted the WAL files and Prometheus started just fine. Since the WAL files are gone, this issue can no longer be reproduced, so I'm closing it. |
We are also facing the same issue. We are running 2 pods of Prometheus with a memory limit of 10 GiB and a memory request of 6 GiB. |
I'm also observing this issue. @cstyan is there anything we can do to further debug? In my case, WAL replay seems to require 2-3x as much memory as the running Prometheus instance. So while the instance happily runs at around 30 GiB of memory usage, replay uses over 50 GiB, which leads to it being OOMKilled at startup. I can't simply increase the RAM limit here, since it requires an excessive amount of memory just to start. Is that expected? It seems like it is keeping too much in memory while doing the replay. |
I have also seen the memory spike during WAL replay to almost double the consumption before shutdown (no remote write). Ideally WAL replay should not take so much memory, so I believe there is some room for improvement here. I am re-opening this issue and updating the title (maybe the snapshotting work will help here). |
I don't think that snapshotting would make a difference, because we're still putting the same data in the head one way or the other. I think we should figure out why this is happening first. |
Is there any way to disable the WAL completely? This is now reproducing on a daily basis, and the Prometheus memory spike is causing real problems for us. |
@yechiel-optimalq as a workaround you can remove the WAL folder in an init container. You will lose the data in the WAL, of course. |
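For anyone scripting this workaround, here is a minimal sketch of the command such an init container could run. The `/prometheus` path is an assumption; adjust it to wherever your TSDB data volume is mounted.

```shell
# Assumed mount point: /prometheus is the Prometheus TSDB data directory.
# Removing the WAL discards any samples not yet compacted into blocks,
# but lets Prometheus start without replaying it.
rm -rf /prometheus/wal
```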
Yes, I'm doing that, but it has become a daily thing. |
Is there any chance you could set up conprof to help take relevant memory dumps at startup? |
We are running the Helm chart; if you have instructions for setting up conprof with the Helm chart, I can try. |
This of course continues to happen for us: OOM kills due to WAL replay. If I delete the WAL, all is good. |
Hello, has anyone been able to resolve it? I keep having to delete the 'wal' folder every few weeks. |
Can we get the full logs for the first hour? |
We would need some profiling of WAL replay to find the pieces that are holding onto memory. I think this is a common occurrence. While it is in my interest to profile and investigate, I don't think I will have time for that anytime soon. |
We've been having this issue for a long time now too. Prometheus runs fine until it gets evicted and then we have to go and delete the WAL. |
I did some quick profiling of WAL replay yesterday and nothing stood out as an obvious thing to fix/optimise. I am expecting #7229 to help with this. |
I don't think it'll make a difference, we still have all the exact same data structures being built up one way or the other. |
Building a chunk from samples takes more allocations and temporary memory than simply taking encoded chunk bytes and sticking them into memory. We will know more after some benchmarks. |
Which is the same as the RAM it took when it was originally being ingested, and thus shouldn't cause any extra RAM requirements during replay. This feels like a red herring to me. |
+1 |
+1 |
We have 4 instances of the exact same version of Prometheus, and only one of them is showing this behaviour. Any clue how to debug it further? We tried deleting the WAL directory, but the problem comes back. |
Can you try the release candidate? Might be better than hoping! |
prometheus/prometheus#6934 is the core problem, and we can afford to lose a tiny bit of data during restarts by killing the wal at startup instead of giving prometheus a lot of memory to read it all in. We lose data during the WAL reading phase anyway, and the amount of memory required seems ginormous (the linked issue has folks topping out at 64G). Apparently 2.45 is supposed to help, but it's not out yet (see prometheus/prometheus#6934 (comment)). We should only do the wal deletions on prometheii that have this specific problem. I've removed the higher memory request for this, and brought it in line with our other prometheii - and it seems to work fine without any issues. I've thus also reduced the size of the core node, practically reverting 2i2c-org#2701
Issue discussion summary: I read through this issue, and here are some takeaways. |
Has anyone got any results to report from version 2.45? |
I'm trying to use 2.45. |
We are on 2.45. Still having restarts and crashes. |
Same here. |
Same here. |
Same here. We still encounter the issue. |
Is anyone going to push this forward? We suffer badly from this issue!! |
Same issue here!! |
Same issue here |
Same issue here |
So... did 2.45 fix the WAL loading memory spike? |
A number of people reported no improvement. It would be helpful to supply details, e.g. a memory profile when the process gets large. |
Sure, will keep an eye out for when the loop of death happens again. Could you please specify exactly what details you are looking for and how to get them? For context, we're running Prometheus inside k8s with the operator. Thanks! |
Prometheus (like most Go programs) has an HTTP service that lets you fetch profiles. The command will vary according to your particular setup, but basically you want to get access to Prometheus' HTTP port (perhaps via a port-forward if you are on Kubernetes) and fetch a heap profile.
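As a rough sketch, with the port reachable locally it could look like the following; the namespace, pod name, and local port 9090 are assumptions for a typical Kubernetes setup and will differ in yours.

```shell
# Hypothetical names: adjust namespace, pod name, and local port to your setup.
kubectl -n monitoring port-forward pod/prometheus-0 9090:9090 &

# Fetch the in-use heap profile from the standard Go pprof endpoint
# exposed on the Prometheus web port.
curl -s -o heap.pprof http://localhost:9090/debug/pprof/heap
```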
If the output is less than 1KB it is probably an error; look at the output and try to resolve whatever it says. The standard issue template asks for other details such as config and logs. |
I would limit the storage.tsdb.retention.size to a lower value as a temporary fix |
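For context, that is a Prometheus startup flag rather than a config-file setting. A minimal sketch of how it might be passed is below; the 50GB value and the paths are illustrative only, and note that the flag caps on-disk blocks rather than the WAL itself, so its effect on replay memory is indirect.

```shell
# Illustrative values only: cap total on-disk TSDB block size at 50GB.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.size=50GB
```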
Same issue with |
It seems I need to say it louder: adding "me too" with no details (a) emails everyone who has commented so far and (b) gets us nowhere. Please don't do this. Clicking 👍 on the top description can indicate your desire for improvement. See #6934 (comment) for further instructions. |
I hope this isn't considered a "me too" comment; we've been having this issue too, but as a fix we've dropped our … We're using a bare-bones Mimir deployment that uses about as much resources as one of our Prometheus instances. It's not ideal for our use case: we have fairly low metrics throughput requirements (around 500-600k series), so I don't see why we need a totally separate system just for long-term storage, but WAL replays were really killing us on costs! |
Prometheus v2.48.1. We experienced this error in one of our clusters and stumbled upon this issue. What we have discovered so far:
Memory limit reached with 32GB usage
I think the problematic part is that the WAL files are processed cumulatively, with more and more held in memory at once, instead of one file at a time to save memory. It seems there is no option to do this manually. |
After another day of debugging, we are drifting towards the theory that a specific type of metric could cause the break between WAL file creation and writing blocks to the TSDB. According to the documentation, this should happen at the latest after 2 hours; in our scenario it did not happen for 4 days. We saved the WAL files before deleting them and bringing Prometheus back into service. It could be an option to investigate the saved WAL files (tried with …). Next plans: |
cc: @puffitos @CerRegulus |
Hello from the bug scrub session! I would love to clarify things -- it might have been the case that, in the past, WAL replay caused a dramatic overhead that led to unexpected OOMs; e.g. if your Prometheus ran at 70% of its memory limit and the overhead was 300% more, you would be affected. There can also be more work happening at the same time, so more CPU is used during replay. If you are low on CPU in your environment (e.g. a Kubernetes CPU limit), then everything is slower, including the GC process, so memory is not released fast enough. However, I was recently benchmarking our Prometheus use at Google on recent versions and I cannot reproduce a major memory overhead during replay (looking at heap and working set); there is a 2x CPU bump, but even where a memory overhead exists, it's 1-5%. I wonder if what many users are reporting here is a symptom of a different problem, and the replay OOM only surfaces that problem. The two most common scenarios right now are: A. Your Prometheus setup scraped too many series/samples in the first place and OOMed. Obviously replay will load all those samples back, causing a classic "OOM crashloop" situation. No matter whether replay uses 1% or 200% of normal memory use, the outcome would be the same. Generally I see 3 things as the next steps: |
Summary: For this particular issue, ideally please only add a comment if case 1 occurs. To tell, you need: |
All, so we can ensure the least overhead possible 💪🏽 |
Hi, we are using Prometheus 2.48 and trying to run it with the vertical pod autoscaler, but we have a problem: each time the vertical pod autoscaler changes resources and recreates the pod, there is a big CPU/memory spike at Prometheus pod startup. This spike inflates the metrics used by the vertical pod autoscaler, creating a much higher average than would otherwise be needed, and the pod is recreated every few minutes because it cannot stabilise due to that startup peak (the spike is around 40 GB of memory and 3-4 cores, and after each restart the utilisation drops back down to about 8 GB and less than 1 core), so we cannot downscale. I also cannot configure a wider evaluation window for the vertical pod autoscaler, because then it wouldn't be able to react promptly to quite sharp changes in the number of our pods. We don't want to remove the WAL because we don't want to lose data. Also, the size of the peak seems to depend on the number of targets that were there previously (but have not been there for several hours), so it looks like something is not cleaned up? |
What did you do?
Tried to start Prometheus.
What did you expect to see?
Prometheus up & running, web interface showing up.
What did you see instead? Under which circumstances?
Prometheus runs out of RAM during the "WAL segment loaded" phase.
Environment
Debian 9
System information:
Linux 4.9.0-11-amd64 x86_64
Prometheus version:
This is what happens during startup, after roughly 10 minutes:
This is what the systemd service looks like:
Here is the RAM usage of the server (past 1 hour) - note that RAM fills up, the server runs out of RAM, the service gets killed, and it is restarted:
Please advise: how can I troubleshoot this issue further?