[CI] DownsampleActionSingleNodeTests testCannotDownsampleWhileOtherDownsampleInProgress failing #107904

Closed
slobodanadamovic opened this issue Apr 25, 2024 · 5 comments
Assignees
Labels
low-risk An open issue or test failure that is a low risk to future releases :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine >test-failure Triaged test failures from CI

Comments

@slobodanadamovic
Contributor

Build scan:
https://gradle-enterprise.elastic.co/s/ygv5qoxopbkqs/tests/:x-pack:plugin:downsample:test/org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests/testCannotDownsampleWhileOtherDownsampleInProgress

Reproduction line:

./gradlew ':x-pack:plugin:downsample:test' --tests "org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests.testCannotDownsampleWhileOtherDownsampleInProgress" -Dtests.seed=FCE9B44B6379CBC3 -Dtests.locale=ar-LY -Dtests.timezone=Africa/Ceuta -Druntime.java=21 -Dtests.fips.enabled=true

Applicable branches:
main

Reproduces locally?:
No

Failure history:
Failure dashboard for org.elasticsearch.xpack.downsample.DownsampleActionSingleNodeTests#testCannotDownsampleWhileOtherDownsampleInProgress

Failure excerpt:

org.elasticsearch.ElasticsearchException: downsample task [downsample-downsample-gsbskprihwaewu-0-351ms] failed

  at __randomizedtesting.SeedInfo.seed([FCE9B44B6379CBC3:DC864329F5D5E5D6]:0)
  at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:497)
  at org.elasticsearch.xpack.downsample.TransportDownsampleAction$2.onResponse(TransportDownsampleAction.java:489)
  at org.elasticsearch.persistent.PersistentTasksService$1.onNewClusterState(PersistentTasksService.java:195)
  at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onNewClusterState(ClusterStateObserver.java:375)
  at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.clusterChanged(ClusterStateObserver.java:226)
  at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:561)
  at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:548)
  at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:506)
  at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:430)
  at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:155)
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:217)
  at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:183)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  at java.lang.Thread.run(Thread.java:1583)

@slobodanadamovic slobodanadamovic added :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data >test-failure Triaged test failures from CI Team:StorageEngine labels Apr 25, 2024
@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Apr 25, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es kkrik-es self-assigned this Apr 25, 2024
@kkrik-es kkrik-es added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Apr 25, 2024
@kkrik-es
Contributor

Looking at the logs, there's a race between the two downsampling actions: the first one completes first, so the second one fails because the downsample index can no longer be written to:

[2024-04-25T15:59:36,589][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,596][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,597][INFO ][o.e.c.r.a.AllocationService] [node_s_0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[downsample-gsbskprihwaewu][0]]])." previous.health="YELLOW" reason="shards started [[downsample-gsbskprihwaewu][0]]"
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [0], took [0s]
[2024-04-25T15:59:36,689][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] completed
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms completed for shard [gsbskprihwaewu][0]
[2024-04-25T15:59:36,701][INFO ][o.e.x.d.TransportDownsampleAction] [node_s_0] All downsampling tasks completed [1]
[2024-04-25T15:59:36,751][WARN ][o.e.p.PersistentTasksClusterService] [node_s_0] trying to update state on task downsample-downsample-gsbskprihwaewu-0-351ms with unexpected allocation id 14
[2024-04-25T15:59:36,759][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Downsampling task [downsample-downsample-gsbskprihwaewu-0-351ms on shard [gsbskprihwaewu][0] started
[2024-04-25T15:59:36,764][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [gsbskprihwaewu][0] processed [2270] docs, created [535] downsample buckets
[2024-04-25T15:59:36,776][ERROR][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] failed to populate downsample index. Failures: [{null=org.elasticsearch.cluster.block.ClusterBlockException: index [downsample-gsbskprihwaewu] blocked by: [FORBIDDEN/8/index write (api)];}]
[2024-04-25T15:59:36,777][INFO ][o.e.x.d.DownsampleShardIndexer] [node_s_0] Shard [[gsbskprihwaewu][0]] successfully sent [2270], received source doc [535], indexed downsampled doc [535], failed [535], took [0s]
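
To make the ordering in these logs easier to follow, here is a minimal toy model of it. This is only a sketch under stated assumptions: `ToyIndex`, `runShardTask` and `simulate` are hypothetical names invented for illustration, not Elasticsearch classes, and the write block is assumed to be applied once the first run has finished, which is what the `FORBIDDEN/8/index write (api)` error in the second run suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Elasticsearch classes.
class DownsampleRaceSketch {

    /** Hypothetical stand-in for the target downsample index. */
    static final class ToyIndex {
        private boolean writeBlocked = false;
        private int docCount = 0;

        // Rejects writes once the block is set, loosely mimicking
        // "blocked by: [FORBIDDEN/8/index write (api)]" from the logs.
        void bulkIndex(int docs) {
            if (writeBlocked) {
                throw new IllegalStateException("blocked by: index write");
            }
            docCount += docs;
        }

        // Assumption: the index becomes read-only after the first run completes.
        void addWriteBlock() {
            writeBlocked = true;
        }

        int docCount() {
            return docCount;
        }
    }

    /** One shard-level run: try to write the buckets and report the outcome. */
    static String runShardTask(String name, ToyIndex target, int buckets) {
        try {
            target.bulkIndex(buckets);
            return name + ": indexed " + buckets;
        } catch (IllegalStateException e) {
            return name + ": failed (" + e.getMessage() + ")";
        }
    }

    /** Replays the interleaving from the logs in a fixed, deterministic order. */
    static List<String> simulate() {
        ToyIndex target = new ToyIndex();
        List<String> outcomes = new ArrayList<>();
        outcomes.add(runShardTask("first", target, 535));  // first run writes 535 buckets
        target.addWriteBlock();                            // target is then write-blocked
        outcomes.add(runShardTask("second", target, 535)); // second run is rejected
        return outcomes;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

The model deliberately avoids real threads: the point is only that whichever run reaches the target index after the write block has been added will see every bulk write rejected, matching the `failed [535]` count in the last log line.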

@kkrik-es
Contributor

@slobodanadamovic was the branch up-to-date? I submitted a fix for this in #107213, and I wonder whether it's included.

@slobodanadamovic
Contributor Author

slobodanadamovic commented Apr 26, 2024

@kkrik-es Yes, the branch was up-to-date. I had just merged the latest changes from main before I reported it.

@kkrik-es
Contributor

kkrik-es commented Apr 29, 2024

Thanks for confirming. Let me reopen the original bug and mark this as a duplicate of #107210.

@kkrik-es kkrik-es closed this as not planned Won't fix, can't repro, duplicate, stale Apr 29, 2024