logstash v1.5.4 misleading error message when heapsize is too small #3817

Closed
TinLe opened this issue Aug 28, 2015 · 20 comments

@TinLe commented Aug 28, 2015

If Logstash is started with too small a heap, it barfs with a misleading error message. Fortunately, I had changed just one value, so I knew exactly what broke.

I had LS_HEAP_SIZE=2000m and lowered it to 1000m. When Logstash started, the log contained:

Exception in thread "|worker" Exception in thread "<elasticsearch" java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)
java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)

Changing LS_HEAP_SIZE to 1500m allows it to run again, so evidently I lowered it too much.
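For reference, the heap is set via the LS_HEAP_SIZE environment variable when launching Logstash, e.g. (the config path here is illustrative):

LS_HEAP_SIZE=1500m bin/logstash -f /etc/logstash/es1.conf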

@suyograo added the bug label Aug 28, 2015
@suyograo (Contributor)

The error message is so bizarre. Not sure why it does not throw an OOM exception.

Related #3400

@TinLe (Author) commented Aug 28, 2015

The config is pretty simple too.

input {
  elasticsearch {
    hosts => [ "lva1-app12321", "lva1-app12330", "lva1-app12340", "lva1-app12344" ]
    port => "9200"
    index => "es1_alias"
    size => 500
    scroll => "1m"
    scan => true
  }
}

filter {
  mutate {
    remove_field => ["command"]
    rename => ["host", "logstash_node"]
    gsub => [
      "http_client_ip", "-", "127.0.0.1"
    ]
  }

  date { match => [ "timestamp", "EEE MMM dd hh:mm:ss zzz YYYY", "ISO8601", "UNIX_MS" ] }

  metrics {
    meter => [ "type" ]
    add_tag => "metric"
    clear_interval => 3600  # clear metrics every 60m
    flush_interval => 60    # flush metrics every 1m
  }
}

output {
  if "metric" in [tags] {
    stdout { codec => json }
  } else {
    elasticsearch {
      host => "localhost"
      port => "9200"
      protocol => "http"
      index => "es1-%{+YYYY.MM.dd}"
    }
  }
}

@suyograo (Contributor)

@TinLe it depends on the size of your events, because the default bulk size from LS to ES is 5K! That may work when your logs are 300 bytes a pop (about 1.5 MB per bulk request), but it will quickly blow up when they're bigger. A defensive step we are taking is to reduce the default from 5K to 500 in the next release.

@TinLe You can try going back to 1 GB heap, but reduce the flush_size to 1K or something. See https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
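For example, a minimal sketch of that change against the output block above (1000 is just a starting point to tune from):

output {
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "es1-%{+YYYY.MM.dd}"
    flush_size => 1000  # cap each bulk request at 1,000 events instead of the 5,000 default
  }
}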

@TinLe (Author) commented Aug 28, 2015

@suyograo Yes, I am still getting that exception in the log, and then it seems to stop consuming, although the Logstash process is still around.

I tried a 1GB heap, no good. Reduced flush_size to 1K, no good. Increased heap to 2GB, still the exception. Now running with a 2GB heap AND flush_size reduced to 100. So far it has been stable for the last hour or so.

@suyograo (Contributor)

@TinLe interesting, what's the average size of your events/docs?

@TinLe (Author) commented Aug 28, 2015

Approx 4KB.

@suyograo (Contributor)

@TinLe another point that @talevy pointed out: you are using the elasticsearch input -- we just fixed a bug in that plugin to reduce its memory usage: logstash-plugins/logstash-input-elasticsearch#21

We'll have a version with that patch published soon. It should help.

@suyograo (Contributor)

@TinLe would you be willing to test this patch? logstash-input-elasticsearch v1.0.1 has been released with the memory consumption fix.

bin/plugin update logstash-input-elasticsearch should do it.

@TinLe (Author) commented Aug 29, 2015

Yep. That seems to have fixed it. I tested with 2GB and a flush_size of 1000; it worked well for 10 minutes. Then I tried again with 1.5GB and a flush_size of 1000, and it is still running. Memory usage seems to be lower too.

:-)

@suyograo (Contributor)

yay! nice work @talevy ! 🏆

@suyograo (Contributor)

thanks @TinLe for trying it out!

@guyboertje (Contributor)

While the issue is solved for @TinLe, there is still the unsatisfactory error message for the conditions described above.

@talevy (Contributor) commented Sep 9, 2015

@guyboertje do you have any ideas on how to catch this situation within jruby and handle it appropriately? I agree, we should catch this and supply users with a better message.

@suyograo (Contributor)

+100 on providing better feedback to users when OOM happens.

@guyboertje (Contributor)

This stackoverflow question and this JRuby issue point to the problem.

With JRuby 1.7.22 and above this particular error should not occur.

But that raises the next question: what is the exception raised in the thread that causes it to stop?

Probably java.lang.OutOfMemoryError in that thread, but which JVM memory region is out of memory? We will only know if we rescue and log it (will that even work if we are OOM?).

However, according to this JRuby issue, a bare rescue => e in JRuby does not rescue java.lang.OutOfMemoryError, and the Java docs indicate that it should not be caught. So it propagates up to the thread.
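To illustrate (a minimal JRuby-only sketch; whether the logging call would even succeed under real heap exhaustion is exactly the open question above):

require 'java'  # JRuby only

begin
  strings = []
  loop { strings << ("x" * 10_000_000) }  # deliberately exhaust the heap
rescue => e
  # Never reached for an OOM: a bare rescue only matches StandardError.
  puts "plain rescue caught #{e.class}"
rescue Java::JavaLang::OutOfMemoryError => e
  # JRuby does let us name the Java error class explicitly.
  warn "caught OOM: #{e.message}"
end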

[This stackoverflow question](http://stackoverflow.com/questions/2679330/catching-java-lang-outofmemoryerror), and this answer in particular, provides more insight.

Epic research even if I say so myself!

@jstangroome

@guyboertje

With JRuby 1.7.22 and above this particular error should not occur.

Are we likely to see an LS 1.5.5 release updated from JRuby 1.7.20 to 1.7.22, or will we need to move to LS 2.0?

@guyboertje (Contributor)

@jstangroome - LS v1.5.4 does have JRuby 1.7.22. I was (badly) stating my surprise that the original poster got this UnsupportedOperationException in LS 1.5.4 (1.7.22). I need to create an OOM-generating input, codec, filter, and output and test how we respond to those conditions.

We cannot guarantee that we will back-port to 1.5.5 any improvements made in 2.0 in this regard.

@ph see this line from the JRuby specs,
and this line and line 1485.
Because the second link shows that the raise is done on the main thread, and that what is raised could be a Ruby Exception class or a wrapped Java Error object (i.e. not a subclass of Ruby's Exception but an IRubyObject), maybe we should rescue Object => e.

WDYT?
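For concreteness, a hypothetical sketch of that suggestion (run_guarded and the simulated error are mine, not Logstash code):

require 'java'  # JRuby only

# Guard a worker's body: rescue Object => e matches Ruby exceptions AND
# wrapped Java Error objects, which a plain rescue (StandardError only) misses.
def run_guarded(name)
  yield
rescue Object => e
  warn "#{name} died with #{e.class}: #{e}"
end

# Demo: raise a wrapped Java Error inside the guarded block.
run_guarded("filterworker") do
  raise Java::JavaLang::OutOfMemoryError.new("simulated")
end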

@guyboertje (Contributor)

@jstangroome - correction to my comment above: you correctly point out here that LS 1.5.4 has JRuby 1.7.20.

@suyograo (Contributor)

@jstangroome we will be shipping 1.5.5 with JRuby 1.7.22 and other fixes soon.

@guyboertje (Contributor)

I think we have this covered now.
