Scale index replicas independently #13

Open
otrosien opened this issue Mar 29, 2019 · 10 comments
Labels
design Design idea, not ready for implementation

Comments

@otrosien
Member

It is usually sufficient to scale up the replicas of only one index in a group: the one with the highest traffic. The benefit is a more efficient scaling operation and fewer wasted resources, because replicas are not added for indices that may not need them.

Implementing this would require monitoring per-index or per-node CPU stats to identify the hot spot in the cluster group. The indices allocated on the hottest node are then the potential candidates for scaling out.
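As a rough illustration of the idea above (not the operator's actual logic), the hot-spot detection could be sketched as follows. The function names, node names, index names, and CPU numbers are all hypothetical; in practice the inputs would come from node stats and the shard listing of the cluster.

```python
from collections import defaultdict


def hottest_node(cpu_percent_by_node):
    """Return the name of the node with the highest CPU percentage."""
    return max(cpu_percent_by_node, key=cpu_percent_by_node.get)


def scale_out_candidates(cpu_percent_by_node, shards):
    """Return the hottest node and the indices that have shards on it.

    `shards` is an iterable of (index_name, node_name) pairs.
    Indices are ordered by how many shards they have on the hot node.
    """
    hot = hottest_node(cpu_percent_by_node)
    counts = defaultdict(int)
    for index_name, node_name in shards:
        if node_name == hot:
            counts[index_name] += 1
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return hot, ordered


# Hypothetical sample data, for illustration only.
cpu = {"data-0": 35, "data-1": 82, "data-2": 40}
shards = [
    ("orders", "data-0"),
    ("orders", "data-1"),
    ("orders", "data-1"),
    ("catalog", "data-1"),
    ("logs", "data-2"),
]
hot, candidates = scale_out_candidates(cpu, shards)
print(hot, candidates)
```

This only ranks candidates; deciding how many replicas to add per index is a separate design question that this sketch does not address.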

@otrosien added the "design" label (Design idea, not ready for implementation) on Mar 29, 2019
@amanjain97

Hi @otrosien,
Is there any plan to implement this idea?
I have two indices, but I want to scale only one of them and let the other expand its replicas automatically.
How can we achieve this?

@otrosien
Member Author

Hi @amanjain97. We first need to put effort into conceptualizing this idea. For the time being, try to stick with two EDS so that you can scale them independently.

@amanjain97

A quick workaround is to use ES node group allocation to separate the two groups and scale them independently.
I would be interested in contributing if you plan to work on this.

@otrosien
Member Author

I think this is what I meant. We have this kind of setup in our company where we're running multiple EDS stacks in the same cluster, and indices are separated into individual stacks by using node group attributes.
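For readers unfamiliar with this kind of setup, here is a minimal sketch of the index-level setting that pins an index to nodes carrying a given attribute. It relies on Elasticsearch shard allocation filtering (`index.routing.allocation.require.<attribute>`); the attribute name `group` and the value `group-a` are assumptions for illustration, not taken from this thread.

```python
import json


def allocation_filter_settings(group, attribute="group"):
    """Build an index settings body that requires a node attribute value.

    The attribute name is an assumption; it must match the custom
    attribute configured on the data nodes of the target stack.
    """
    return {f"index.routing.allocation.require.{attribute}": group}


# Body that could be sent with PUT /<index>/_settings (index name not shown).
body = allocation_filter_settings("group-a")
print(json.dumps(body, indent=2))
```

With one such setting per index, each EDS stack only hosts the indices assigned to its group and can be scaled on its own.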

First of all, does this kind of setup work for you as well? What is the main pain point? Is it cost?

@AyWa
Contributor

AyWa commented Aug 25, 2020

We have a similar problem. The main pain point for us might be cost, but running multiple EDS is working well.
I have a question about multiple EDS stacks. Sometimes the operator seems to get stuck in one operation, and then it does not operate on the other EDS:

time="2020-08-25T08:51:37Z" level=info msg="Found 1 remaining shards on es/data-4 (172.31.131.121)" endpoint="http://data-vi-vn.es.svc.cluster.local.:9200"
time="2020-08-25T08:52:03Z" level=info msg="Found 1 remaining shards on es/data-4 (172.31.131.121)" endpoint="http://data-vi-vn.es.svc.cluster.local.:9200"
time="2020-08-25T08:52:13Z" level=info msg="Scaling hint: UP" eds=data-shard-test namespace=es
time="2020-08-25T08:52:13Z" level=info msg="Updating last scaling event in EDS 'es/data-shard-test'"
time="2020-08-25T08:52:13Z" level=info msg="Updating desired scaling for EDS 'es/data-shard-test'. New desired replicas: 4. Increasing node replicas to 4."
time="2020-08-25T08:52:20Z" level=info msg="Found 1 remaining shards on es/data-4 (172.31.131.121)" endpoint="http://data-vi-vn.es.svc.cluster.local.:9200"
time="2020-08-25T08:52:38Z" level=info msg="Found 1 remaining shards on es/data-4 (172.31.131.121)" endpoint="http://data-vi-vn.es.svc.cluster.local.:9200"

The es/data-shard-test EDS was never updated, and after that no modification to any of my EDS seemed to take effect. (For example, when I checked the StatefulSet for es/data-shard-test, it was still at 2 replicas.)

I may start investigating the code, but I am wondering whether I should deploy one operator per group, or whether the operator can get stuck in an operation.

@mikkeloscar
Collaborator

> I will maybe start to investigate the code, but I am wondering if I should deploy one Operator per group? or if the operator can be stuck in an operation

It should be able to handle multiple EDS at a time, so it could indicate a bug if it's not.

@AyWa
Contributor

AyWa commented Aug 26, 2020

> It should be able to handle multiple EDS at a time, so it could indicate a bug if it's not.

It seems to happen when I set minIndexReplicas to 0 and the operator is downscaling to that level. The operation appears to be stuck and does not succeed even after a long time, and the ES master was not moving any shards. So I think the node was not excluded from the ES cluster.

I will try to reproduce it and gather more information, such as the master logs and the list of excluded nodes in the cluster.
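One way to check the exclusion list is to read the cluster settings and look at the allocation exclusion filter. The sketch below assumes the drain is done through the IP-based setting `cluster.routing.allocation.exclude._ip`, which this thread does not confirm, and that the settings were fetched with `GET /_cluster/settings?flat_settings=true`. The sample response is made up, reusing the IP from the log above.

```python
EXCLUDE_IP_KEY = "cluster.routing.allocation.exclude._ip"


def excluded_ips(cluster_settings):
    """Collect IPs excluded from allocation in a flat cluster settings response.

    Looks at both the transient and persistent scopes and splits the
    comma-separated value into individual addresses.
    """
    ips = []
    for scope in ("transient", "persistent"):
        value = cluster_settings.get(scope, {}).get(EXCLUDE_IP_KEY, "")
        ips.extend(ip.strip() for ip in value.split(",") if ip.strip())
    return ips


# Hypothetical response body, for illustration only.
sample = {
    "persistent": {},
    "transient": {EXCLUDE_IP_KEY: "172.31.131.121"},
}
print(excluded_ips(sample))
```

If the node being drained is missing from this list, the shards have no reason to move away from it.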

@otrosien
Member Author

otrosien commented Sep 2, 2020

Sometimes ES is simply not able to de-allocate shards from a node given its constraints (shards per node, availability zone awareness, etc.). When this happens, you will need to check with the allocation-explain API (https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-allocation-explain.html).
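A small sketch of how the allocation-explain output could be inspected, assuming the request body and response shape described in the linked documentation (`index`, `shard`, `primary` in the request; `node_allocation_decisions` with `deciders` in the response). The helper names and the sample response are hypothetical.

```python
def explain_request_body(index, shard, primary):
    """Build the body for GET /_cluster/allocation/explain."""
    return {"index": index, "shard": shard, "primary": primary}


def blocking_deciders(explain_response):
    """List (node_name, decider, explanation) for every NO decision."""
    blockers = []
    for node in explain_response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blockers.append(
                    (node.get("node_name"), decider.get("decider"), decider.get("explanation"))
                )
    return blockers


# Hypothetical, trimmed response for illustration only.
sample_response = {
    "node_allocation_decisions": [
        {
            "node_name": "data-1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "a copy of this shard is already allocated to this node",
                }
            ],
        }
    ]
}
for node_name, decider, explanation in blocking_deciders(sample_response):
    print(f"{node_name}: {decider}: {explanation}")
```

The decider names in the output point at the constraint that prevents the shard from moving, which is usually enough to tell whether the limit is shards per node, zone awareness, or something else.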

@amanjain97

@otrosien
Hi, we have tried a demo in dev, and it seems to work well with multiple EDS.
I would like your advice on the best strategy for hosting a multi-tenant Elasticsearch cluster, so that multiple teams can use it with some restrictions on the admin API. Could we discuss this?

@otrosien
Member Author

I personally don't have any experience running ES in a multi-tenant setup. Have you checked the security features included in the basic license? https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api.html

It may be worth asking your questions in the Elastic discussion forums.

4 participants