Fix TasksIT#testGetTaskWaitForCompletionWithoutStoringResult #108094

arteam · 2024-04-30T14:24:24Z

It seems that the failure (the missing index) has always existed in the test scenario and is supposed to be handled by TransportGetTaskAction.java: we catch IndexNotFoundException there and convert it to ResourceNotFoundException, then we catch ResourceNotFoundException and return a snapshot of the task as the response.
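A minimal sketch of that two-step translation, for illustration only. The class, the nested exception types, and the method signatures below are simplified, synchronous stand-ins (assumptions); the real TransportGetTaskAction is asynchronous and listener-based.

```java
// Illustrative sketch only: simplified stand-ins, not the actual Elasticsearch code.
class TaskLookupSketch {
    // Hypothetical stand-ins for the Elasticsearch exception types of the same name.
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Step 1: a missing .tasks index is reported as "resource not found".
    static String getFinishedTaskFromIndex(String taskId, boolean tasksIndexExists) {
        try {
            if (!tasksIndexExists) {
                throw new IndexNotFoundException(".tasks");
            }
            return "stored result of " + taskId;
        } catch (IndexNotFoundException e) {
            throw new ResourceNotFoundException("no stored result for task " + taskId, e);
        }
    }

    // Step 2: the wait-for-completion path suppresses that error and
    // falls back to the snapshot taken while the task was running.
    static String waitedForCompletion(String taskId, String snapshot, boolean tasksIndexExists) {
        try {
            return getFinishedTaskFromIndex(taskId, tasksIndexExists);
        } catch (ResourceNotFoundException e) {
            return snapshot;
        }
    }

    public static void main(String[] args) {
        // Prints "snapshot of node:1" because the index is missing.
        System.out.println(waitedForCompletion("node:1", "snapshot of node:1", false));
    }
}
```

Calling getFinishedTaskFromIndex directly, without the surrounding fallback, lets the ResourceNotFoundException propagate to the caller.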

In the stack trace, getFinishedTaskFromIndex was called from getRunningTaskFromNode, not from waitedForCompletion, because of a race between the get request and the unblock request, which are sent asynchronously. I've changed the waitForCompletionTestCase test method to unblock the task only after the get request has started waiting for the task to complete, which we detect by registering a removal listener. By doing so, we make sure we test the "wait for completion" branch while the task is running.

The part about the missing index seems to be irrelevant, since waitedForCompletion is able to suppress the error and return a snapshot of the running task, which is not possible if getFinishedTaskFromIndex gets called directly from getRunningTaskFromNode.
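The ordering that the test change enforces can be sketched with plain latches. This is an illustration under assumptions: MiniTaskManager and all names below are hypothetical and do not mirror the real TaskManager or TasksIT code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: every type and name here is hypothetical.
class UnblockAfterListenerSketch {
    // Tiny stand-in for a task manager that notifies listeners on task removal.
    static class MiniTaskManager {
        private final List<Runnable> removalListeners = new CopyOnWriteArrayList<>();

        void registerRemovalListener(Runnable listener) {
            removalListeners.add(listener);
        }

        void removeTask() {
            removalListeners.forEach(Runnable::run);
        }
    }

    // Returns true if the waiting "get" request observed the task removal.
    static boolean runScenario() {
        MiniTaskManager taskManager = new MiniTaskManager();
        CountDownLatch listenerRegistered = new CountDownLatch(1);
        CountDownLatch unblockTask = new CountDownLatch(1);
        CountDownLatch waiterNotified = new CountDownLatch(1);

        // The blocked task: finishes (and is removed) only once unblocked.
        Thread task = new Thread(() -> {
            try {
                unblockTask.await();
                taskManager.removeTask();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The "get with wait_for_completion" request: registers its listener,
        // then signals that it is now waiting.
        Thread getRequest = new Thread(() -> {
            taskManager.registerRemovalListener(waiterNotified::countDown);
            listenerRegistered.countDown();
        });

        task.start();
        getRequest.start();
        try {
            // Unblock the task only after the listener is known to be registered.
            if (!listenerRegistered.await(5, TimeUnit.SECONDS)) {
                return false;
            }
            unblockTask.countDown();
            boolean notified = waiterNotified.await(5, TimeUnit.SECONDS);
            task.join();
            getRequest.join();
            return notified;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter notified: " + runScenario());
    }
}
```

Without the await on listenerRegistered, the task could be removed before the listener exists, which corresponds to the request finding no running task.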

Resolves #107823

Make sure the `.tasks` index is created before we start testing task completion without storing its result. To achieve that, we store a fake task before we start `waitForCompletionTestCase`. Resolves #107823

elasticsearchmachine · 2024-04-30T14:24:48Z

Pinging @elastic/es-distributed (Team:Distributed)

arteam · 2024-05-02T06:29:40Z

@elasticmachine update branch

arteam · 2024-05-07T14:01:22Z

@elasticmachine update branch

henningandersen · 2024-05-10T14:33:12Z

The linked issue says that the tasks index got deleted, but that does not seem to match the resolution here? Can we find out why the tasks index was deleted too soon instead?

arteam · 2024-05-13T07:50:09Z

@henningandersen I believe the comment in the linked issue is wrong. The index was never deleted, because the test doesn't create the index. The test waits for the completion of a task, and the task only completes because we have special error handling for the case where the index doesn't exist. I guess in some cases the error handling can't figure out that the root cause was IndexNotFoundException, which should be converted to ResourceNotFoundException, which is silently ignored.

I believe we should just explicitly create the index, because testGetTaskWaitForCompletionWithoutStoringResult is supposed to test task completion, not the error handling for missing indices, which is covered by testGetTaskNotFound and testTasksGetWaitForNoTask.

henningandersen · 2024-05-15T06:54:51Z

@arteam it still smells like we might be covering up for a bug here. AFAICS, we expect the logic to work regardless of whether the index exists or not. Can you elaborate on how the test differentiates between whether the task exists or not? Since if it is within the actual tasks code, we may want to target that instead (as well as add a dedicated test for it).

DaveCTurner · 2024-05-15T14:07:31Z

On Wed, May 15, 2024 at 3:02 PM Artem Prigoda wrote:

> Started digging more deeply and the test stopped failing after #108052 got merged

I'm pretty sure #108052 had no effect here, it was a pure refactoring.

arteam · 2024-05-15T15:14:05Z

> I'm pretty sure #108052 had no effect here, it was a pure refactoring.

Sorry about that! I deleted my comment right after I realized that #108052 indeed just removed dead code. I was confused by the line numbers in the stack trace.

arteam · 2024-05-15T15:16:28Z

Still, the only way I can see the test failing is ExceptionsHelper.unwrap(e, ResourceNotFoundException.class) returning null. In fact, if I replace it with if (false), the error stack trace looks exactly like the one in the issue. Not sure how that is possible, though.
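To make the null case concrete, here is a simplified sketch of what unwrapping along a cause chain does. It is a hypothetical stand-in, not the actual ExceptionsHelper implementation, and the nested exception type is a placeholder for the real one.

```java
// Illustrative sketch only: a simplified stand-in for cause-chain unwrapping.
class UnwrapSketch {
    // Hypothetical placeholder for the Elasticsearch exception of the same name.
    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message) {
            super(message);
        }
    }

    // Walks the cause chain and returns the first throwable of the given type,
    // or null if none is found. The depth bound guards against cyclic causes.
    static Throwable unwrap(Throwable t, Class<? extends Throwable> clazz) {
        int depth = 0;
        while (t != null && depth++ < 100) {
            if (clazz.isInstance(t)) {
                return t;
            }
            t = t.getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException(new ResourceNotFoundException("task not found"));
        Throwable unrelated = new RuntimeException("some other failure");
        System.out.println(unwrap(wrapped, ResourceNotFoundException.class) != null);   // true
        System.out.println(unwrap(unrelated, ResourceNotFoundException.class) != null); // false
    }
}
```

In this sketch, a null result simply means that no ResourceNotFoundException appears anywhere in the cause chain of the failure being inspected.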

arteam · 2024-05-18T01:14:53Z

@elasticmachine update branch

henningandersen · 2024-05-19T19:49:10Z

The main problem seems to be that the test case does not find the task running, see this part of the stack trace:

      at org.elasticsearch.action.admin.cluster.node.tasks.get.TransportGetTaskAction.getRunningTaskFromNode(TransportGetTaskAction.java:140)

which is this line.

This is where the focus should go, I think. The test ran in less than 50ms, so it is not something timeout-related; more likely it is some race. I did a bit of digging but did not find it.

I do notice that the test case is a suite-scoped test, and those are sometimes disturbed by prior tests. I did not find any such evidence though, so it might be a red herring.

I notice that the test writes Test task finished on the node, so the test task was not cancelled either, since otherwise I believe it would not output that.

arteam · 2024-05-21T07:39:22Z

@elasticmachine update branch

arteam · 2024-05-21T07:54:40Z

@henningandersen That was a very good catch! getFinishedTaskFromIndex was called from getRunningTaskFromNode, not from waitedForCompletion. There indeed seems to be a race between the get request and the unblock request, which are sent asynchronously. I've changed waitForCompletionTestCase to unblock the task only after the get request has started waiting for the task to complete, which we detect by registering a removal listener. By doing so, we make sure we test the "wait for completion" branch while the task is running.

The part about the missing index seems to be irrelevant, since waitedForCompletion is able to suppress the error and return a snapshot of the running task, which is not possible if getFinishedTaskFromIndex gets called directly from getRunningTaskFromNode.

Fix TasksIT#testGetTaskWaitForCompletionWithoutStoringResult

bf3b27d

Make sure the `.tasks` index is created before we start testing task completion without storing its result. To achieve that, we store a fake task before we start `waitForCompletionTestCase`. Resolves #107823

arteam added the >test and :Distributed/Task Management labels Apr 30, 2024

elasticsearchmachine added the Team:Distributed and v8.15.0 labels Apr 30, 2024

arteam requested review from idegtiarenko and DaveCTurner April 30, 2024 17:06

Merge branch 'main' into save-fake-tasks-to-create-task-index

39fb24a

arteam requested review from idegtiarenko, volodk85 and DaveCTurner and removed request for idegtiarenko and DaveCTurner May 2, 2024 07:26

Merge branch 'main' into save-fake-tasks-to-create-task-index

88eeddc

arteam requested review from idegtiarenko, DaveCTurner, volodk85 and a team and removed request for idegtiarenko, volodk85 and DaveCTurner May 8, 2024 07:56

Merge branch 'main' into save-fake-tasks-to-create-task-index

9f5224b

arteam added 4 commits May 21, 2024 01:19

Unblock request only after we started waiting for completion

800d56c

Update comment

63396ac

Remove outdated comment

1b2bded

Revert "Fix TasksIT#testGetTaskWaitForCompletionWithoutStoringResult"

e8241a5

This reverts commit bf3b27d.

elasticmachine and others added 2 commits May 21, 2024 08:39

Merge branch 'main' into save-fake-tasks-to-create-task-index

59036cd

Make sure we register onRemovedTaskListenerRegistered before we wait for completion

f59ff4e

arteam requested a review from henningandersen May 21, 2024 07:55
