Loki gets OOM killed when issuing a simple query #6501
The query is simple, but it is processing all your 20 GB of data.
That's right, but you only have 4 GB available on a single node. What this statement is saying is: if your load is under 100 GiB per day, using Loki in monolithic mode shouldn't be a problem, but if it is above that, you should consider using the other modes (microservices or simple scalable). That doesn't mean you will be able to process 20 GB of logs in a single query on a 4 GB machine.
@DylanGuedes As shown in the OP screenshot, the query is limited to the last 5 minutes; I'd hardly consider that the whole 20 GB of data. 🙂 Also, after creating this issue I went ahead and used the simple scalable mode, but with way less load, ~2-4 GB/day. The pods still get OOM killed and restart on some time-limited queries. I'd expect better memory management from a tool like Loki!
Sorry, I overlooked that. Hard to say how much data you had to process, though. Which query did you use in that scenario? Was it the same query?
For the volumes you are describing I'm a little surprised to see such memory growth; however, this is a very painful query for Loki to execute. It's basically telling Loki to count every line it ingested in the last 5 minutes multiple times: for such a short duration, Grafana likely selected a really small step, so with a fixed range vector the overlapping evaluation steps re-read the same lines. A query that avoids over-processing the data would be better; also consider running it as an instant query.
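The suggested replacement query was lost in extraction; what follows is my guess at its shape, based on the common advice to aggregate with `count_over_time` and run it as a single instant query over the whole window rather than a small-step range query:

```logql
sum(count_over_time({job!=""}[5m]))
```

Executed as an instant query at the end of the window, each log line is read once rather than once per overlapping step.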
Sorry, this is all a bit confusing; I'm actually working on documentation/a blog post about this today, hopefully out soon. Back to the original issue, though: there may still be something wrong here. If you are able to pull a heap dump, that could be very helpful. Run this after the query has been running for a few seconds:
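The actual command was dropped during extraction; a plausible reconstruction, assuming Loki's HTTP server on its default port 3100 and the standard Go pprof endpoints it exposes:

```shell
# Grab a heap profile from the running Loki process (default HTTP port 3100)
curl -s http://localhost:3100/debug/pprof/heap -o heap.pprof

# Inspect it locally with the Go toolchain
go tool pprof -top heap.pprof
```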
I'm not sure if GitHub lets you upload something like that here? Another consideration: newer versions of Loki have an in-memory cache for chunks enabled by default, sized at 1 GB, so likely at least 1 GB of that memory is cached objects.
@slim-bean I'm not sure I understand what you meant by this:
I'm curious about why it should go over the same log lines multiple times. I'd think an efficient way to get log rates is to go over the log lines in the specified range vector only once and present the per-second rate as an average over that range; I believe Prometheus does it this way. Am I understanding this right? Although your suggestions around using the
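To make the step/range overlap concrete, here is a small standalone sketch (not Loki's actual code) of why a fixed range vector evaluated at many small steps revisits the same lines:

```python
# Simplified model of a range query: rate([window]) evaluated at each step.
def rate_at(timestamps, t, window):
    """Per-second rate of log lines with timestamp in (t - window, t]."""
    n = sum(1 for ts in timestamps if t - window < ts <= t)
    return n / window

# 10 log lines, one per second
lines = list(range(10))

# Evaluate rate over a 5s window at every 1s step from t=5 to t=9.
# The windows overlap, so most lines are counted in up to 5 evaluations:
# 25 line-reads in total for only 10 distinct lines.
evaluations = [rate_at(lines, t, 5) for t in range(5, 10)]
print(evaluations)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

A single instant evaluation over the full range touches each line once, which is what the comment above expected; the small-step range query is what multiplies the work.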
I tried, but it just gives me an empty response. I also tried the go command
@ElSamhaa you'd have to call the pprof/heap endpoint before your pod gets killed.
I know :)
Hi. I am facing the same issue with Loki's distributed Helm chart, Loki version 2.6.1.
Same here, using the latest Helm chart for 2.6.1 in single binary mode. After startup, Loki seems to just eat all available memory. Setting resource limits just makes it reach that point of unavailability sooner and hit an OOM.
+1
Just as a side note, this Grafana blog post solved my issue. I was using labels in Loki incorrectly.
Seems the defaults require many GBs of memory. Try lowering them if you have issues. Also: thank you Grafana Labs for this great project! 👏
I have been testing a microservices deployment of Loki 2.8.2 on a 6-node m7g.xlarge EKS cluster receiving a mere 7 GB per hour (uncompressed). After a clean start, each node has at least 10 GB of memory free, sometimes up to 15 GB. Each node has 1 querier pod, and there are no memory limits on the queriers (mainly for testing). This deployment includes the query-frontend, query-scheduler, and index-gateway microservices. One of my use cases is to support bulk downloading of data, for the moment by allowing users to use logcli directly (until Grafana itself supports bulk downloads). Ingestion is stable, but it's unfortunately easy for users to completely topple the queriers with a simple "fetch all logs in the past hour" query and a modest number of parallel workers. The higher the number, the quicker all queriers begin OOMing. For example this query, which hits about 300 streams:
Most queriers hold out for about 15 seconds there. I tried tuning the parameters mentioned by @csdaraujo above to absurd levels just to see if it would help:

```yaml
limits_config:
  max_query_length: 45d
  max_chunks_per_query: 1
  max_streams_matchers_per_query: 1
  max_query_series: 1
  max_query_parallelism: 1
  # Effectively disable query sharding
  min_sharding_lookback: 30d
```

Suffice it to say, it didn't. We need to figure out how to make Loki defensive in this situation lest it be declared not fit for purpose. We have to expect that users will very quickly figure out that increasing the number of parallel workers makes their own queries finish faster. I'll keep banging on the configuration, but meanwhile I'm hoping there might be suggestions on other knobs I could try tuning.
@jtackaberry there's a new max query read limit that you can use.
@DylanGuedes ooh, this sounds very promising. Looks like it's not in v2.8.2, so I'll build from main and report back. Thanks for the pointer!
The good news is those new parameters work as advertised. The bad news is there's no actual way to make the aforementioned logcli query work without monstrous upper limits. In hindsight I shouldn't be surprised by that, given the OOMs, but I had clearly assumed more intelligence in logcli than actually exists. For the bulk download use case, it sounds like we might need to write a custom tool that chunks up the work more intelligently: first discovering the individual streams from the stream selector, then parallelizing download/reassembly of those streams. (A custom tool, or perhaps a PR to logcli itself.) But the most important part is that the querier defends against OOM, and these new settings look extremely useful for that. Thanks again!
If you have 32 GB on the Loki read node, what would you set this var to? `max_querier_bytes_read: 16GB`?
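For reference, this limit lives under `limits_config`; a sketch with an illustrative value (note that it caps the chunk bytes a querier reads per query, which only loosely tracks resident memory, so there is no fixed node-RAM ratio to apply):

```yaml
limits_config:
  # Illustrative value only: tune from heap profiles, not from node RAM
  max_querier_bytes_read: 16GB
```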
Conversely, I can still pop the queriers within seconds even with a low value set. As a result, I haven't yet managed to convince myself that it's enough on its own. I've had some success reducing querier OOMs by:
I've not yet been able to solve this logcli scenario through config tuning -- any logcli user can still pop all the queriers for their tenant at will -- but at least queries via Grafana are much more stable than before.
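The concrete knobs discussed in this exchange (GOMEMLIMIT and `querier.max_concurrent`) can be sketched as follows; the values are illustrative, not recommendations:

```yaml
querier:
  # Fewer concurrent subqueries per querier lowers the peak working set
  max_concurrent: 4
```

together with a Go soft memory limit in the process environment set below the container limit, e.g. `GOMEMLIMIT=5GiB` for an 8 GiB pod, so the Go runtime GCs more aggressively before the kernel OOM killer gets involved.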
@jtackaberry Thank you very much for your detailed explanation; I finally found time to review your reply and understood it well. What measurement did you use to set GOMEMLIMIT? If you have 8 GiB on the VM, do you then set 6 GiB? I agree with you about reducing querier.max_concurrent to a very low value; e.g. I have 16 vcores but set it to 8, which gave me a much better run. We have about 3 read nodes per DC, 3 DCs total. What I found is that even though Loki is said to be distributed, most of my queries go to 1 node per DC (it can be any node). Do you run into the same issue? I figured if I can get the query to distribute across more nodes, I'll have a better chance of mitigating this OOM. Many thanks in advance. David
@xh63 I'm not sure yet what to recommend for GOMEMLIMIT (we have more experimentation to do), but my gut feeling is that with an 8 GB VM, assuming nothing else is on that VM but a single querier, 5 GB is probably about right. Grafana themselves have published (in one of their webinars) that they target 6 GB per querier and set a cgroups limit of 16 GB (via the Kubernetes pod limit). Granted, this webinar is a little old (it predates TSDB becoming GA and the recommended configuration), so I'm not sure it's still an accurate reflection of how Grafana runs Loki in Grafana Cloud, but I do think 8 GB as an upper bound on a single querier is probably too small if you're running Loki at any decent scale. Still working this out, but my sense is that we'll run queriers on a group of 8-core / 32 GB nodes, targeting 2 queriers per node, except, unlike Grafana, we'll set the limit at 15 GB per querier to leave room for other things running on the node (such as kubelet -- we'll be hosting in EKS -- plus query-frontend and query-scheduler). Given that, I'm anticipating we'll bump GOMEMLIMIT from 6G to something closer to 10G. To your second question, I'm not sure what you mean technically by "DC", at least as it relates to Loki. I know one can configure zone awareness on the write path, but I'm not sure how this relates to the read path. My first thought is that your query isn't able to be split/sharded enough to keep your queriers busy. You didn't mention what time range you were running the query over, but

Finally, we've recently discovered the most important thing of all when it comes to querier OOMs: avoid lz4 chunk encoding! Let me put it into perspective: we deployed a node group of 32 m7g.2xlarge (8-core / 32 GB RAM) instances in EKS for the queriers. 32 queriers, 1 querier per node, with a pod limit of 30 GB. That's an effective 960 GB of RAM for the queriers.
We observed that generating the log volume histogram over a 3-hour search range with a label selector that covered a measly 50 GB of logs -- 5% of the total RAM for queriers alone -- would still OOM multiple queriers. Profiling the queriers showed an enormous amount of memory used by the lz4 library. It may be related to pierrec/lz4#175. We changed the chunk encoding to snappy and I've never seen the queriers so well behaved. We need to collect more log data with snappy and run larger queries to be sure, but initial results are extremely promising. So, the moral of the story is: if you're using lz4 and struggling with query OOMs, change to snappy. Unfortunately your storage costs will increase by 2-3x, and that's a tough pill to swallow, but it's better than a completely unusable product.
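The encoding switch described above is an ingester setting in the Loki config; newly written chunks pick up the change while existing lz4 chunks remain as they are:

```yaml
ingester:
  # snappy trades 2-3x more storage for far lower query-time memory than lz4
  chunk_encoding: snappy
```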
I have deployed the Loki simple deployment (single binary) using the official Helm chart.
Version: 2.5.0
Describe the bug
Memory consumption easily goes up to 4 Gi, which I later defined as a resource limit for the pod. Whenever a simple query is made, even one not spanning more than minutes, memory usage climbs and the pod gets OOM killed.
To Reproduce
Steps to reproduce the behavior:
sum (rate({job!=""}[2m]))
Expected behavior
The Loki docs describe the simple (single binary) deployment as holding for loads up to 100 GiB of logs per day. The tested load is way less than that, around 20 GiB, so it should've held just fine.
Environment: