[Bug] IP stays in exclude list when draining fails #128

Open
otrosien opened this issue Oct 26, 2020 · 0 comments
Labels
bug Something isn't working

Comments

@otrosien
Member

Expected Behavior

In some situations ES refuses to drain a given node, usually because of allocation constraints such as a maximum number of shards per index and node. ES Operator then waits indefinitely for the draining to finish. At some point the scale-down event is superseded by a scale-up event.
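
As a rough illustration of such a constraint, the sketch below checks whether all shards of one index still fit on the remaining nodes once a node is drained. The function name and its parameters are hypothetical and not part of ES or es-operator, and it deliberately ignores every other allocation rule ES applies.

```python
def can_drain_one_node(shards: int, nodes: int, max_shards_per_node: int) -> bool:
    """Return True if all shards of one index still fit after removing one node.

    Hypothetical helper for illustration only; real ES allocation takes
    many more factors into account than this single limit.
    """
    remaining_nodes = nodes - 1
    if remaining_nodes < 1:
        # Nothing left to hold the shards.
        return False
    return shards <= remaining_nodes * max_shards_per_node


# Two shards, a limit of one shard per node, two nodes:
# draining one node would force both shards onto the remaining node.
print(can_drain_one_node(shards=2, nodes=2, max_shards_per_node=1))  # prints False
```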

When that happens, the node that was previously marked for draining should be put back into use.

Actual Behavior

Instead, the IP stays in the cluster.routing.allocation.exclude._ip setting, and the scale-up event only causes the StatefulSet to be updated, which spawns new nodes. This leaves the node commissioned but unused.
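
One possible cleanup, sketched here under assumptions, is to drop the node's IP from the exclude list when the scale-down is abandoned. The helper names are hypothetical, this is not how es-operator is implemented, and whether the setting lives under "transient" or "persistent" is an assumption. The returned dict is meant as a request body for PUT _cluster/settings, where a null value resets a setting.

```python
def remove_excluded_ip(exclude_value: str, ip: str) -> str:
    """Return the comma-separated exclude list without the given IP."""
    entries = [e.strip() for e in exclude_value.split(",") if e.strip()]
    return ",".join(e for e in entries if e != ip)


def build_settings_update(exclude_value: str, ip: str) -> dict:
    """Build a body for PUT _cluster/settings (hypothetical helper).

    Assumption: the exclude list is stored as a transient setting.
    An empty list is sent as None (JSON null), which resets the setting.
    """
    remaining = remove_excluded_ip(exclude_value, ip)
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": remaining or None
        }
    }


# Made-up IPs for illustration.
print(build_settings_update("10.2.0.15,10.2.0.16", "10.2.0.15"))
# prints {'transient': {'cluster.routing.allocation.exclude._ip': '10.2.0.16'}}
```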

Steps to Reproduce the Problem

  1. Create a cluster with two nodes (minReplicas=1, maxReplicas=2, minIndexReplicas=0) and add one index with two shards, no replicas, and "routing.allocation.total_shards_per_node: 1".
  2. Wait for es-operator to start draining the second node. The drain will fail because ES refuses to place more than one shard of the same index on the same node.
  3. Trigger a scale-out event by putting some CPU load on ES.
  4. Check :9200/_cluster/settings to see that the IP is still listed there.
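
The check in step 4 could be scripted roughly as follows. This sketch assumes the response was fetched with the flat_settings=true query parameter, so keys appear in dotted form, and it looks in both the persistent and transient sections because the issue does not say which one is used. The sample response and IP are made up for illustration.

```python
import json

SETTING = "cluster.routing.allocation.exclude._ip"


def excluded_ips(settings_response: dict) -> set:
    """Collect excluded IPs from a flat-format _cluster/settings response."""
    ips = set()
    for scope in ("persistent", "transient"):
        # A missing section or a null value is treated as an empty list.
        value = settings_response.get(scope, {}).get(SETTING) or ""
        ips.update(part.strip() for part in value.split(",") if part.strip())
    return ips


# Made-up response body, shaped like GET :9200/_cluster/settings?flat_settings=true
sample = json.loads(
    '{"persistent": {}, '
    '"transient": {"cluster.routing.allocation.exclude._ip": "10.2.0.15"}}'
)
print("10.2.0.15" in excluded_ips(sample))  # prints True
```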

Specifications

  • Version: latest
  • Platform: any
  • Subsystem: any
@mikkeloscar added the bug label Oct 27, 2020