Fix TasksIT#testGetTaskWaitForCompletionWithoutStoringResult #108094
base: main
Make sure the `.tasks` index is created before we start testing task completion without storing its result. To achieve that, we store a fake task before we start `waitForCompletionTestCase`. Resolves #107823
Pinging @elastic/es-distributed (Team:Distributed)
@elasticmachine update branch
The linked issue says that the tasks index got deleted, but that does not seem to match the resolution here. Can we find out why the tasks index was deleted too soon instead?
@henningandersen I believe the comment in the linked issue is wrong. The index was never deleted, because the test doesn't create the index. The test waits for the completion of a task, and the task only completes because we have special error handling for the case where the index doesn't exist. I guess in some cases the error handling can't figure out that the root cause was [...] I believe we should just explicitly create the index, because [...]
@arteam it still smells like we might be covering up for a bug here. AFAICS, we expect the logic to work regardless of whether the index exists or not. Can you elaborate on how the test differentiates between whether the task exists or not? If it is within the actual tasks code, we may want to target that instead (as well as add a dedicated test for it).
Still, the only way I can see the test failing is [...]
The main problem seems to be that the test case does not find the task running, see this part of the stack trace:
which is this line. This is where the focus should go, I think. The test ran in less than 50ms, so it is not timeout related; more likely it is a race. I did a bit of digging but did not find it. I do notice that the test case is a suite case, and those are sometimes disturbed by prior tests. I did not find any such evidence though, so it might be a red herring. I notice that the test writes [...]
@henningandersen That was a very good catch! The part about the missing index seems to be irrelevant, since [...]
It seems that the failure (the missing index) has always existed in the test scenario, and it is supposed to be handled by TransportGetTaskAction.java. We catch `IndexNotFoundException` here and convert it to `ResourceNotFoundException`. Then we catch `ResourceNotFoundException` here and return a snapshot of the task as a response.

In the stack trace, `getFinishedTaskFromIndex` was called from `getRunningTaskFromNode`, not from `waitedForCompletion`, due to a race between creating the get request and the unblocking request, which are sent asynchronously. I've changed the `waitForCompletionTestCase` test method to unblock the task only after the request has started waiting for the task's completion, by registering a removal listener. By doing so, we make sure we test the "wait for completion" branch while the task is still running.

The part about the missing index seems to be irrelevant, since `waitedForCompletion` is able to suppress the error and return a snapshot of the running task, which is not possible if `getFinishedTaskFromIndex` gets called directly from `getRunningTaskFromNode`.

Resolves #107823
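
For illustration, here is a minimal, self-contained sketch of the ordering described in this PR: the task is unblocked only after the waiting side has registered a removal listener. It is not the actual test code. `FakeTaskRegistry`, its methods, and `runOnce` are hypothetical stand-ins written for this sketch and are not Elasticsearch APIs.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForCompletionSketch {

    /** Hypothetical stand-in for a task registry; NOT the Elasticsearch TaskManager API. */
    static final class FakeTaskRegistry {
        private final List<Runnable> removalListeners = new CopyOnWriteArrayList<>();
        private final CountDownLatch listenerRegistered = new CountDownLatch(1);

        /** Called by the "get" side once it starts waiting for the task to complete. */
        void addRemovalListener(Runnable listener) {
            removalListeners.add(listener);
            listenerRegistered.countDown();
        }

        /** Lets the test block until the waiting side is actually in place. */
        boolean awaitListenerRegistered(long timeout, TimeUnit unit) throws InterruptedException {
            return listenerRegistered.await(timeout, unit);
        }

        /** Called when the task finishes; notifies everyone waiting for completion. */
        void removeTask() {
            removalListeners.forEach(Runnable::run);
        }
    }

    public static String runOnce() {
        try {
            FakeTaskRegistry registry = new FakeTaskRegistry();
            CountDownLatch unblockTask = new CountDownLatch(1);
            CountDownLatch waiterNotified = new CountDownLatch(1);

            // The "task": stays running until the test unblocks it, then is removed.
            Thread task = new Thread(() -> {
                try {
                    unblockTask.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                registry.removeTask();
            });
            task.start();

            // The "get request": sent asynchronously, registers its listener at some later point.
            Thread getRequest = new Thread(() -> registry.addRemovalListener(waiterNotified::countDown));
            getRequest.start();

            // The ordering this sketch is about: unblock only once the listener is registered,
            // so the task cannot finish before the waiting side is in place.
            if (!registry.awaitListenerRegistered(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("listener was never registered");
            }
            unblockTask.countDown();

            boolean notified = waiterNotified.await(5, TimeUnit.SECONDS);
            task.join();
            getRequest.join();
            return notified ? "waited-for-completion" : "missed-completion";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Without the `awaitListenerRegistered` step, the unblock could win the race and the task could be gone before the waiting side registers, which is the analogue of ending up in the "task is not running" path instead of the "wait for completion" path.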