Index templates support #1343

Merged 3 commits on Jan 31, 2019

dbalabka
Contributor

@dbalabka dbalabka commented Dec 14, 2017

Original PR #917

Scope

This PR implements Elastica Index Templates functionality via bundle configuration. Index templates are widely used to create historical indexes:

  • storing logs (Kibana) to prevent index growing because of incorrectly mapped fields
  • metrics (Marvel) to archive or delete old information

YAML configuration files:

    index_templates:
        <name>:
            client: default
            template_name: <template name>
            template: some_index_*
            settings:
                number_of_shards: 1
                number_of_replicas: 0
            types:
                auto_suggest:
                    mappings:
                        <field name>:  <params>
                         ...

To apply template changes, we have added a new command, fos:elastica:reset-templates, with the following options:

  • --index - the index template to reset. If no index template name is specified, all templates are reset.
  • --force-delete - deletes all indexes that match the index template patterns. Be aware that a pattern may match many indexes.

You must run the following command to sync the template configuration on the ES server with the YAML configuration:

php bin/console fos:elastica:reset-templates

You can build this command into your deployment process to automate template configuration sync.

Future scope

It should be possible to create historical indexes whose names are generated on the fly by some strategy (typically based on the current date).

Example
In the following example, we have an index named user_searches_history_YYYYMM and an index template named user_searches_history_template.
The index name user_searches_history_YYYYMM contains the placeholder YYYYMM, which is replaced with the current year and month.
We do not need to set an index mapping explicitly. Instead, each added document may create a new index (if it does not exist yet) whose mapping is populated from the user_searches_history_template index template, because the pattern user_searches_history_* matches any user_searches_history_YYYYMM index.

    indexes:
        user_searches_history_YYYYMM:
            name_strategy: date
            types: ~
            ...
    index_templates:
        user_searches_history_template:
            template: user_searches_history_*
            settings:
                ...
            types:
                user_searches:
                    mappings:
                        ....

TODO

  • Cover with unit tests
  • Cover with functional tests
  • Bring the branch up to date
  • Update documentation

@XWB
Member

XWB commented Jun 8, 2018

@torinaki Are you still working on this?

@dbalabka
Contributor Author

dbalabka commented Jun 9, 2018

@XWB yes. I will try to bring this up to date next week. In any case, the code already works: we are using this implementation in production, and it is integrated into our deployment pipeline.

@dbalabka
Contributor Author

dbalabka commented Jun 9, 2018

@XWB I've updated the branch with a fresh master. I will continue the work ASAP.

@XWB
Member

XWB commented Jun 12, 2018

👍

@Destroy666x

Destroy666x commented Aug 7, 2018

@torinaki any progress? Currently this bundle is kind of unusable with Elastic 6.x: with 10 indexes that need the exact same 15 analyzers/filters, the config looks like a programmer's worst nightmare. Because of that, together with the lack of multi-index services, I will have to revert to 5.x.

@nicholasruunu

@Destroy666x You can reuse blocks and merge with yaml if that helps.

@Destroy666x

@nicholasruunu the solution is also a bit clunky, but better than the previous mess, I guess. However, it doesn't seem to work inside Symfony's .yml.
Here's my attempt:

Analysis: &analysis
    analysis:
        filter:
            ngram:
                type: 'nGram'
                min_gram: 2
                max_gram: 32
            arabic_stop:
                type: 'stop'
                stopwords: '_arabic_'
            arabic_stemmer:
                type: 'stemmer'
                language: 'arabic'
            english_stop:
                type: 'stop'
                stopwords: '_english_'
            english_stemmer:
                type: 'stemmer'
                language: 'english'
            english_possessive_stemmer:
                type: 'stemmer'
                language: 'possessive_english'
        analyzer:
            search_arabic:
                tokenizer: 'keyword'
                filter:
                    - 'lowercase'
                    - 'arabic_normalization'
            index_arabic_ngram:
                tokenizer: 'standard'
                filter:
                    - 'lowercase'
                    - 'arabic_normalization'
                    - 'ngram'
            search_arabic_ngram:
                tokenizer: 'standard'
                filter:
                    - 'lowercase'
                    - 'arabic_normalization'
            search_english:
                tokenizer: 'keyword'
                filter:
                    - 'lowercase'
            index_english_ngram:
                tokenizer: 'standard'
                filter:
                    - 'lowercase'
                    - 'ngram'
            search_english_ngram:
                tokenizer: 'standard'
                filter:
                    - 'lowercase'

Symfony complains that there is no configuration for Analysis. I tried with Defaults in case that name is a constant, but got the same thing.

@nicholasruunu

nicholasruunu commented Aug 7, 2018

@Destroy666x I think you misunderstand how it works: the only thing you add is the &<name> after the key in the original configuration whose indented content you want to 'copy'.

Then you can paste with *<name> or merge with <<: *<name>
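
A minimal sketch of what that could look like inside the bundle config itself, so that no extra top-level key is needed (the "no configuration for Analysis" complaint above suggests Symfony rejects those). The index names articles, comments and users and the anchor name shared_settings are hypothetical, the analysis block is shortened from the attempt above, and the settings.index.analysis nesting is an assumption about the bundle's layout rather than something confirmed in this thread:

```yaml
fos_elastica:
    indexes:
        # Hypothetical index names, for illustration only.
        articles:
            # The anchor goes right after the key whose value should be reused.
            settings: &shared_settings
                index:
                    analysis:
                        filter:
                            ngram:
                                type: 'nGram'
                                min_gram: 2
                                max_gram: 32
                        analyzer:
                            index_english_ngram:
                                tokenizer: 'standard'
                                filter:
                                    - 'lowercase'
                                    - 'ngram'
        comments:
            # Alias: reuse the anchored mapping unchanged.
            settings: *shared_settings
        users:
            settings:
                # Merge key: copy the anchored mapping, then add sibling keys.
                <<: *shared_settings
                number_of_replicas: 0
```

Note that the anchor and its aliases have to live in the same YAML file, and that the merge key is shallow: it copies the top-level keys of the anchored mapping and does not merge nested ones.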

@Destroy666x

Ah yes, then that's even more clunky. But it'll work as a temporary solution before this gets implemented. Thanks for the help!

@nicholasruunu

@Destroy666x np

If you want to do it in a more template-like way, I believe you could use x-vars:

x-template: &template
    whatever: you-want
    
analysis: *template

or

x-aliases:
  - &template
      whatever: you-want

analysis: *template

@dbalabka
Contributor Author

dbalabka commented Aug 7, 2018

@Destroy666x we are already using this code in production, but in our own fork with a slightly outdated version of this bundle. I will make one last push to get it working with the latest version.
FYI, the Elastica Index Templates feature was designed to make automatic index mapping creation possible during document indexing. It is usually used to store historical data (e.g. Kibana logs).
It might help in your case as well.

@dbalabka
Contributor Author

I did some refactoring of the initial implementation to make the code less coupled and BC safe. Now I'm working on tests and bug fixes.

@dbalabka
Contributor Author

dbalabka commented Jan 2, 2019

@XWB it seems that the PR is ready for review.

Also, I have noticed that the master build fails after the following changes:
520d910
You should probably allow PHP 7.3 to fail until it is fixed.

@dbalabka
Contributor Author

@XWB I have squashed commits

@dbalabka
Contributor Author

@XWB can you please take a look at this PR?

Conflicts:
    tests/Unit/DependencyInjection/ConfigurationTest.php
    tests/Unit/Elastica/ClientTest.php
@dbalabka
Contributor Author

dbalabka commented Jan 30, 2019

@XWB I've adjusted the code according to the comments. I have also added a documentation page about index templates.

@XWB XWB merged commit c064338 into FriendsOfSymfony:master Jan 31, 2019
@XWB
Member

XWB commented Jan 31, 2019

Thanks for the hard work @torinaki :)

@dbalabka
Contributor Author

@XWB thank you for your time and patience 😃

@pesseyjulien

Hi @dbalabka,

Sorry to unearth the subject, but I tried your feature and it works fine until a new object is inserted or updated in one of the indexes. Then it automatically resets the replicas to 1, whereas my parameter is set to 0.

Any idea what could be wrong on my side?

Thanks in advance for any help,
Julien

@dbalabka
Contributor Author

@pesseyjulien so if your setting value is initially 0, then it should not be a bug in the implementation. Take a look at the following discussion on the Elastic forum:

https://discuss.elastic.co/t/change-number-of-replicas-to-0-on-existing-indices/141622

There might be a relationship with the new setting auto_expand_replicas.

@pesseyjulien

pesseyjulien commented Jul 24, 2021

Nope, modifying the settings works, but then if I save an object in ES, it reverts to replica : 1

I don't understand why.

My templates :

   index_templates:
        all1:
            client: default
            template: 'events*'
            settings:
                number_of_shards: '%env(resolve:elastic_shards)%'
                number_of_replicas: '%env(resolve:elastic_replicas)%'
        all2:
            client: default
            template: 'invoices*'
            settings:
                number_of_shards: '%env(resolve:elastic_shards)%'
                number_of_replicas: '%env(resolve:elastic_replicas)%'
        all3:
            client: default
            template: 'companies*'
            settings:
                number_of_shards: '%env(resolve:elastic_shards)%'
                number_of_replicas: '%env(resolve:elastic_replicas)%'

With :

elastic_shards=1
elastic_replicas=0

What I do :

Reset templates

PUT /companies,companies_ref,events,invoices/_settings
{
 "index" : {
  "number_of_replicas":0,
  "auto_expand_replicas": false
 }
}

Result :

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   invoices      -Ij3pe0CQxO8XeeTXQ-pHw   1   0        129            0       47kb           47kb
green  open   jobs          B4VwCvw3RP2AigvTIVk1kQ   1   1        173            0    194.7kb         97.3kb
green  open   users         Ep5XjFj7TdSmSZ5Nbtz8Yg   1   1        539            0    300.9kb        162.5kb
green  open   companies     13XCzF8JSSe7j7ZFvJFz1w   1   0        294            0     38.8kb         38.8kb
green  open   offers        ArfO0gURTkqHge95hZMLQQ   1   1        131            1    251.3kb        125.6kb
green  open   companies_ref XdkXzEJoS4iWAHQK1uDPdg   1   0        499            0    115.1kb        115.1kb
green  open   events        Xb8agkceT-yWugGwsG0JBg   1   0         10     

Save an object in the index -> revert back to

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   invoices      -Ij3pe0CQxO8XeeTXQ-pHw   1   1        129            0       94kb           47kb
green  open   jobs          B4VwCvw3RP2AigvTIVk1kQ   1   1        173            0    194.7kb         97.3kb
green  open   users         Ep5XjFj7TdSmSZ5Nbtz8Yg   1   1        539            0    300.9kb        162.5kb
green  open   companies     13XCzF8JSSe7j7ZFvJFz1w   1   1        294            0     77.4kb         38.7kb
green  open   offers        ArfO0gURTkqHge95hZMLQQ   1   1        131            1    251.3kb        125.6kb
green  open   companies_ref XdkXzEJoS4iWAHQK1uDPdg   1   1        499            0    230.2kb        115.1kb
green  open   events        Xb8agkceT-yWugGwsG0JBg   1   1 

@dbalabka
Contributor Author

@pesseyjulien please verify, using the GET API, that the template settings were created correctly:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-template.html

@pesseyjulien

Looks like it

{
  "all1": {
    "order": 0,
    "index_patterns": [
      "events*"
    ],
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  }
}

Maybe the index pattern is wrong ?

Thanks,
Julien

@dbalabka
Contributor Author

@pesseyjulien so it seems that the template has been created correctly:

{
//...
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "0"
      }
    },
//...
}

So, the code works correctly. It might be related to your Elastica settings. I suspect that auto_expand_replicas might be the cause.
Be aware that number_of_replicas is a dynamic setting and can be changed after index creation.

I suggest addressing your question to the https://discuss.elastic.co community.

@pesseyjulien

Turns out it was a limitation of Bonsai (my ES provider)..

Thanks a lot for the help anyway,
Julien

@dbalabka
Contributor Author

@pesseyjulien curious how it is possible to set such limits and what is the reason?

@pesseyjulien

Well this is their answer :

To avoid Bonsai users from incurring data loss, we’ve recently made the difficult decision to not have any indices on multitenant plans go below 1 replica shard.

I only use it for my test environment, but they have a shard limit, which is why I wanted no replication. They increased my limit in response, so it's OK for now.


5 participants