
ElasticsearchContainer is not waiting for container being ready #2982

Closed
ghost opened this issue Jul 8, 2020 · 11 comments


ghost commented Jul 8, 2020

In /modules/elasticsearch/src/main/java/org/testcontainers/elasticsearch/ElasticsearchContainer.java#L55, the ElasticsearchContainer waits for either of two status codes:

.forStatusCodeMatching(response -> response == HTTP_OK || response == HTTP_UNAUTHORIZED)

But an UNAUTHORIZED response from Elasticsearch means it is not fully ready yet. As a result, a simple test that relies on the default ElasticsearchContainer without any changes and tries to use its API fails in roughly 10-20% of invocations with errors like

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=security_exception, reason=missing authentication token for REST request 

or, if one adds the default ES user credentials:

  org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=security_exception, reason=failed to authenticate user [elastic]]
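
For context, a minimal sketch of the kind of flaky test described here (a sketch, not the exact failing test; getHttpHostAddress() is the container's accessor for its mapped host:port):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

ElasticsearchContainer container = new ElasticsearchContainer();
container.start(); // may return before Elasticsearch is fully ready (the bug)

try (RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(HttpHost.create(container.getHttpHostAddress())))) {
    // Intermittently fails with the security_exception shown above
    client.info(RequestOptions.DEFAULT);
}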

The current workaround is (Scala code):

setWaitStrategy(
  getWaitStrategy.asInstanceOf[HttpWaitStrategy] // Get the current HttpWaitStrategy
    .forStatusCodeMatching(null) // Clear its StatusCodeMatching predicate.
    .forStatusCode(HTTP_OK)) // Set a single HTTP_OK StatusCode

This fixes the flaky test, and I believe it should be the default in ElasticsearchContainer.
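
For reference, a rough Java equivalent of the same workaround (a sketch assuming, as in the Scala snippet above, that passing null to forStatusCodeMatching clears the predicate and that getWaitStrategy()/setWaitStrategy are accessible from your code):

import static java.net.HttpURLConnection.HTTP_OK;

import org.testcontainers.containers.wait.strategy.HttpWaitStrategy;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

ElasticsearchContainer container = new ElasticsearchContainer();
container.setWaitStrategy(
    ((HttpWaitStrategy) container.getWaitStrategy()) // the current HttpWaitStrategy
        .forStatusCodeMatching(null)                 // clear the OK-or-401 predicate
        .forStatusCode(HTTP_OK));                    // wait for a plain 200 only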


ghost commented Jul 9, 2020

It is worth pointing out that this line was originally copy-pasted from the Couchbase module, and allowing the UNAUTHORIZED response here was not deliberate or driven by Elasticsearch implementation details.

See the conversation which accompanied the change: #826 (comment)


raynigon commented Jul 9, 2020

@lvorona I think this will not work if Elasticsearch security is enabled. The best approach would be to call the cluster health API with credentials.
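
A sketch of that suggestion, using existing HttpWaitStrategy methods (the password here is a hypothetical placeholder):

import static java.net.HttpURLConnection.HTTP_OK;

import org.testcontainers.containers.wait.strategy.HttpWaitStrategy;

// Wait on the cluster health API, authenticating as the built-in user.
new HttpWaitStrategy()
    .forPort(9200)
    .forPath("/_cluster/health")
    .withBasicCredentials("elastic", "secret") // hypothetical credentials
    .forStatusCode(HTTP_OK);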


ghost commented Jul 9, 2020

A user who enables Elasticsearch security can still override the defaults, similar to what I posted in the initial comment. This change is only meant to make the defaults work.

Currently, a user with a simple single-container test has to add extra code. It would make sense to make the defaults work out of the box and document the changes needed when security is enabled.

Perhaps I could change the health check to take security into account as well. Let me take a look.


rnorth commented Jul 13, 2020

Thanks for the PR, @slovdahl. I'm not sure if the original line was just a straight copy-and-paste from the Couchbase module (at least, that's not my interpretation of #826 (comment)).

@dadoonet, could you perhaps lend your opinion on the suggested approach in this ticket and the PR (#2983)?

slovdahl commented

Just for the record, I did not submit the PR 😄 I just added a proper cross-reference.


ghost commented Jul 14, 2020

Thank you for fixing the PR.

I have pushed one more commit that points the HttpWaitStrategy at the dedicated Elasticsearch health endpoint: /_cluster/health

It took me a while to figure out the container's behaviour, but I tested various settings, and waiting for a 200 on the dedicated health check endpoint behaved best.

Some important notes:

  • The 401 response mentioned in the ticket is not reproducible with the ES version this module uses by default (6.4.1).
  • The 401 response is reproducible with both 6.2.4 (used in our real code) and 6.8.0, a slightly more recent version; both support authentication.
  • To reproduce the issue, the system must be under noticeable load. Even then, it takes several iterations of the loop while mvn package; do echo -e "\n\nAnother one...\n\n"; done for the test to fail.
  • A test with security enabled according to the Elastic blog worked; the health check endpoint itself does not require authentication.
  • Simulating network slowness (sudo tc qdisc add dev docker0 root netem delay 500ms) may help make the issue easier to reproduce. I did it, though I'm not sure it helped much.

As a reminder, this issue came out of us tracking down the root cause of a flaky unit test. It does not fail every time, but I have a small project (a pom.xml and a single unit test) that allowed me to reproduce the issue. Let me know if you want it and I'll share the code.


ghost commented Jul 14, 2020

Actually, I was not enabling authentication properly. The cluster health check requires authentication, and I had to add the following for the test to pass:

// To properly enable security
container.withEnv("ELASTIC_PASSWORD", "secret");
container.withEnv("xpack.security.enabled", "true");

// Add the matching credentials to the wait strategy
.withBasicCredentials("elastic", "secret")

I feel like this is still an improvement over the test being flaky. I could add this snippet to the documentation (https://www.testcontainers.org/modules/elasticsearch/), but I need help figuring out which repository to update for that.
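
Putting the pieces together, the working configuration looks roughly like this (a sketch; "secret" is the placeholder password from the snippet above and must match on both sides):

import static java.net.HttpURLConnection.HTTP_OK;

import org.testcontainers.containers.wait.strategy.HttpWaitStrategy;
import org.testcontainers.elasticsearch.ElasticsearchContainer;

ElasticsearchContainer container = new ElasticsearchContainer()
    .withEnv("ELASTIC_PASSWORD", "secret")      // properly enables security...
    .withEnv("xpack.security.enabled", "true"); // ...together with this flag
container.setWaitStrategy(
    new HttpWaitStrategy()
        .forPort(9200)
        .forPath("/_cluster/health")
        .withBasicCredentials("elastic", "secret") // must match ELASTIC_PASSWORD
        .forStatusCode(HTTP_OK));
container.start();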

dadoonet commented

> I feel like this is still an improvement over the test being flaky. I could add this snippet to the documentation (https://www.testcontainers.org/modules/elasticsearch/), but I need help figuring out which repository to update for that.

Hey. I think this is hopefully what I'm bringing with #2320.


ghost commented Jul 15, 2020

Great, then the user will not have to override the wait strategy, since your container already knows whether credentials are needed.

Feel free to incorporate this change into your PR if that would make it easier for you to manage the change.


stale bot commented Oct 17, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this is a mistake, please reply to this comment to keep it open. If there isn't one already, a PR to fix or at least reproduce the problem in a test case will always help us get back on track to tackle this.

stale bot added the stale label Oct 17, 2020

ghost commented Oct 18, 2020

This has been resolved (see comments above).

ghost closed this as completed Oct 18, 2020