shard has exceeded the maximum number of retries [1] #1630

Open
kazukiyashiro opened this issue Nov 26, 2021 · 0 comments

Hello!

For usage questions and help, see also the Discuss thread: https://discuss.elastic.co/t/curator-shard-has-exceeded-the-maximum-number-of-retries-1/290059

When Curator tries to allocate a replica shard of the shrunken index, I get this error:

{
  "index" : "example-index-2021-09-29-shrink",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-11-23T12:26:19.515Z",
    "failed_allocation_attempts" : 1,
    "details" : "failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "8r_zhRD4RDm2peWnDun_3w",
      "node_name" : "node13",
      "transport_address" : "192.168.0.162:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135291469824",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [1] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-11-23T12:26:19.515Z], failed_attempts[1], delayed=false, details[failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}

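For reference, the max_retry explanation above already names the manual workaround: the standard cluster reroute API with retry_failed. Something like the following should re-attempt the failed allocation:

POST /_cluster/reroute?retry_failed=true

But that is a manual step after every failure, which is why I'd like to raise the retry limit itself.
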
Is there a way to increase "index.allocation.max_retries" in the Curator settings?

Action file:

actions:
  1:
    action: shrink
    description: >-
      Shrink selected indices on the node with the most available space.
      Delete source index after successful shrink, then reroute the shrunk
      index with the provided parameters.
    options:
      ignore_empty_list: True
      shrink_node: DETERMINISTIC
      node_filters:
        permit_masters: True
      number_of_shards: 1
      number_of_replicas: ${REPLICA_COUNT:1}
      shrink_prefix:
      shrink_suffix: '-shrink'
      copy_aliases: True
      delete_after: True
      wait_for_active_shards: 1
      extra_settings:
        settings:
          index.codec: best_compression
      wait_for_completion: True
      wait_for_rebalance: True
      wait_interval: 9
      max_wait: -1
    filters:
     - filtertype: pattern
       kind: prefix
       value: ${INDEX_PREFIX}
     - filtertype: age
       source: name
       direction: older
       timestring: ${TIMESTAMP:'%Y-%m-%d'}
       unit: ${PERIOD:days}
       unit_count: ${PERIOD_COUNT}
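
One idea, regarding the question above: pass the setting through the action's extra_settings block, the same way index.codec is already applied. This is an untested sketch on my side, assuming Curator merges extra_settings into the shrunken target index's settings:

      extra_settings:
        settings:
          index.codec: best_compression
          index.allocation.max_retries: 5

If Curator does apply these to the target index, that would cover the replica allocation retries without any manual step.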

Curator version: 5.8.4
OS: CentOS 7

I've tried to create a template:

"shrink" : {
    "order" : 0,
    "index_patterns" : [
      "*-shrink"
    ],
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "5"
        }
      }
    }
}

But it doesn't help.
Here are the index settings after a successful shrink:

GET /example-index-shrink/_settings

{
  "example-index-shrink" : {
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "1"
        },
        "shrink" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        },
        "blocks" : {
          "write" : null
        },
        "provided_name" : "example-index-shrink",
        "creation_date" : "1637751350836",
        "number_of_replicas" : "1",
        "uuid" : "MI_wbW35R8ubkYZOySfp1g",
        "version" : {
          "created" : "6080899",
          "upgraded" : "6080899"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "initial_recovery" : {
              "_id" : "nWOPSov3TFKUunoiooVxMQ"
            },
            "require" : {
              "_name" : null
            }
          }
        },
        "number_of_shards" : "1",
        "routing_partition_size" : "1",
        "resize" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        }
      }
    }
  }
}
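
Since index.allocation.max_retries is a dynamic index setting, I assume it can at least be raised on the already-shrunken index by hand, e.g.:

PUT /example-index-shrink/_settings
{
  "index.allocation.max_retries": 5
}

But I'd prefer Curator to set it at shrink time, so the replica allocation survives slow recoveries without manual intervention.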

Thanks in advance
