[CI] DownsampleActionSingleNodeTests testCannotDownsampleWhileOtherDownsampleInProgress failing #107904

slobodanadamovic · 2024-04-25T14:11:50Z

Build scan:
https://gradle-enterprise.elastic.co/s/ygv5qoxopbkqs/tests/:x-pack:plugin:downsample:test/org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests/testCannotDownsampleWhileOtherDownsampleInProgress

Reproduction line:

./gradlew ':x-pack:plugin:downsample:test' --tests "org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests.testCannotDownsampleWhileOtherDownsampleInProgress" -Dtests.seed=FCE9B44B6379CBC3 -Dtests.locale=ar-LY -Dtests.timezone=Africa/Ceuta -Druntime.java=21 -Dtests.fips.enabled=true

Applicable branches:
main

Reproduces locally?:
No

Failure history:
Failure dashboard for org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests#testCannotDownsampleWhileOtherDownsampleInProgress

Failure excerpt:

org.elasticsearch.ElasticsearchException: downsample task [downsample-downsample-gsbskprihwaewu-0-351ms] failed

  at __randomizedtesting.SeedInfo.seed([FCE9B44B6379CBC3:DC864329F5D5E5D6]:0)
  at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:497)
  at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:489)
  at org.elasticsearch.persistent.PersistentTasksService$1.onNewClusterState(PersistentTasksService.java:195)
  at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:375)
  at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:226)
  at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:561)
  at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:548)
  at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:506)
  at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:430)
  at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  at java.lang.Thread.run(Thread.java:1583)


elasticsearchmachine · 2024-04-25T14:12:14Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es · 2024-04-26T09:14:28Z

Looking at the logs, there's a race between the two downsampling actions: the first one completes first, so the second one fails because the downsample index can no longer be written to:

[2024-04-25T15:59:36,589][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,596][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,597][INFO ][o.e.c.r.a.AllocationService] [node_s_0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[downsample-gsbskprihwaewu][0]]])." previous.health="YELLOW" reason="shards started [[downsample-gsbskprihwaewu][0]]"
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [0], took [0s]
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] completed
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms completed for shard [gsbskprihwaewu][0]
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] All downsampling tasks completed [1]
[2024-04-25T15:59:36,751][WARN ][o.e.p.PersistentTasksClusterService] [node_s_0] trying to update state on task downsample-downsample-gsbskprihwaewu-0-351ms with unexpected allocation id 14
[2024-04-25T15:59:36,759][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,764][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,776][ERROR][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] failed to populate downsample index. Failures: [{null=org.elasticsearch.cluster.block.ClusterBlockException: index [downsample-gsbskprihwaewu] blocked by: [FORBIDDEN/8/index write (api)];}]
[2024-04-25T15:59:36,777][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [535], took [0s]
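
To make the ordering in the logs easier to follow, here is a minimal, self-contained sketch that replays it deterministically. It is not the Elasticsearch code: `TargetIndex`, `runShardTask`, and `blockWrites` are hypothetical names, and the sketch assumes that the write block seen in the `ClusterBlockException` is put on the downsample index once the first run has completed. The bucket count (535) is taken from the log lines above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class DownsampleRaceSketch {

    /** Hypothetical stand-in for the target downsample index. */
    static final class TargetIndex {
        private final AtomicBoolean writeBlocked = new AtomicBoolean(false);
        private final List<String> docs = new ArrayList<>();

        /** Indexes one document, or throws if the index is write-blocked. */
        void index(String doc) {
            if (writeBlocked.get()) {
                // Message mirrors the block reported in the log excerpt.
                throw new IllegalStateException("blocked by: [FORBIDDEN/8/index write (api)]");
            }
            docs.add(doc);
        }

        /** Assumption: the first run's completion leads to a write block. */
        void blockWrites() {
            writeBlocked.set(true);
        }

        int size() {
            return docs.size();
        }
    }

    /** Writes the given number of buckets and returns how many writes failed. */
    static int runShardTask(TargetIndex target, int buckets) {
        int failed = 0;
        for (int i = 0; i < buckets; i++) {
            try {
                target.index("bucket-" + i);
            } catch (IllegalStateException e) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        TargetIndex target = new TargetIndex();

        // First run completes while the index is still writable.
        int firstFailed = runShardTask(target, 535);

        // The index becomes read-only before the second run writes anything.
        target.blockWrites();

        // Second run: every write is rejected by the block.
        int secondFailed = runShardTask(target, 535);

        System.out.println("first run failed=" + firstFailed
            + ", second run failed=" + secondFailed);
    }
}
```

Running `main` prints `first run failed=0, second run failed=535`, which matches the `failed [0]` and `failed [535]` counts in the two "successfully sent" log lines. The sketch runs the two attempts sequentially on purpose: it only shows the losing interleaving, not the concurrency itself.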

kkrik-es · 2024-04-26T13:58:01Z

@slobodanadamovic was the branch up-to-date? I submitted a fix for this in #107213, wonder if it's included.

slobodanadamovic · 2024-04-26T14:39:33Z

@kkrik-es Yes, the branch was up-to-date. I had just merged the latest changes from main before I reported it.

kkrik-es · 2024-04-29T05:30:54Z

Thanks for confirming. Lemme reopen the original bug and mark this as a duplicate of #107210.

slobodanadamovic added the :StorageEngine/Downsampling, >test-failure, and Team:StorageEngine labels Apr 25, 2024

elasticsearchmachine added the needs:risk label Apr 25, 2024

kkrik-es self-assigned this Apr 25, 2024

kkrik-es added the low-risk label and removed the needs:risk label Apr 25, 2024

kkrik-es closed this as not planned Apr 29, 2024
