Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded eventually #4044

Open
NianLi71 opened this issue Mar 6, 2024 · 1 comment
Labels: bug (This issue is a confirmed bug.) · p2 (This is a standard priority issue) · response-requested (Waiting on additional information or feedback.) · service-api (This issue is caused by the service API, not the SDK implementation.) · stepfunctions

NianLi71 commented Mar 6, 2024

Describe the bug

My code consumes messages from an SQS standard queue, and each message triggers one call to Step Functions SendTaskSuccess. According to my logs, one message called SendTaskSuccess only once, with a valid task token, and got:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

I also saved the message ID in DynamoDB, which confirms that the message that raised TaskTimedOut is the same one that updated the DynamoDB item.
It looks as if boto3's first attempt to send the Step Functions task token failed because the token had expired, yet the SendTaskSuccess operation eventually succeeded despite the exception above, and Step Functions did receive the task token.
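For readers reconstructing the setup, a minimal sketch of such a consumer might look like the following. The reporter's actual code was not posted: the handler name and the message body schema (`taskToken` and `output` fields) are hypothetical. The Step Functions client is passed in (in Lambda it would typically be `boto3.client("stepfunctions")`) so the sketch can run without AWS.

```python
import json


def handle_sqs_batch(event, sfn_client):
    """Hypothetical Lambda handler: one SendTaskSuccess call per SQS record.

    Assumes each record body is JSON like {"taskToken": "...", "output": {...}}.
    Returns the SQS message IDs that were processed successfully.
    """
    processed = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        # SendTaskSuccess expects `output` as a JSON-encoded string.
        sfn_client.send_task_success(
            taskToken=body["taskToken"],
            output=json.dumps(body.get("output", {})),
        )
        processed.append(record["messageId"])
    return processed
```

Any exception from `send_task_success` propagates out of the handler, which is where a log line such as "Failed to process with exception: ..." would typically be produced.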

Could a retry mechanism inside boto3 be causing this?
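One mechanism that would match these symptoms (an assumption, not confirmed in this thread): the first SendTaskSuccess request reaches Step Functions and closes the task, but the response is lost (for example, on a read timeout), so the SDK retries; the retry then finds the task already closed and receives TaskTimedOut. The toy simulation below models only that sequence. `FakeStepFunctions`, `LostResponse`, `TaskTimedOut`, and `call_with_retries` are hypothetical stand-ins, not botocore internals.

```python
class LostResponse(Exception):
    """Stands in for a network error raised after the server already acted."""


class TaskTimedOut(Exception):
    """Stands in for the service's TaskTimedOut error."""


class FakeStepFunctions:
    """Toy service: each task token can be completed exactly once."""

    def __init__(self, drop_first_response=False):
        self.open_tokens = set()
        self.drop_first_response = drop_first_response
        self.requests = 0

    def send_task_success(self, token):
        self.requests += 1
        if token not in self.open_tokens:
            # Task already closed (or expired): the service rejects the call.
            raise TaskTimedOut("Provided task does not exist anymore")
        self.open_tokens.remove(token)  # the task is now closed server-side
        if self.drop_first_response and self.requests == 1:
            # The work is done, but the client never sees the reply.
            raise LostResponse("read timeout")
        return {"status": "ok"}


def call_with_retries(fn, *args, max_attempts=3):
    """Retry only on LostResponse, similar in spirit to SDK retries on network errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except LostResponse:
            if attempt == max_attempts:
                raise
```

With `drop_first_response=True`, a single logical call to `call_with_retries(svc.send_task_success, token)` raises `TaskTimedOut` even though the task was closed successfully by the first request, which mirrors the "exception, but succeeded eventually" observation.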

Expected Behavior

The first call to SendTaskSuccess should not raise:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

Current Behavior

The first call to SendTaskSuccess raised:

Failed to process with exception: An error occurred (TaskTimedOut) when calling the SendTaskSuccess operation: Task Timed Out: 'Provided task does not exist anymore'

yet the SendTaskSuccess operation eventually succeeded.

Reproduction Steps

The error is intermittent: after sending more than 10,000 requests, I saw a single TaskTimedOut exception.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

Boto3==1.34.50, BotoCore==1.34.50

Environment details (OS name and version, etc.)

AWS Lambda Python 3.9 x86_64

NianLi71 added the bug (This issue is a confirmed bug.) and needs-triage (This issue or PR still needs to be triaged.) labels Mar 6, 2024
NianLi71 changed the title from "Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded at last" to "Call StepFunction SendTaskSuccess once then got: Task Timed Out: 'Provided task does not exist anymore', but succeeded eventually" Mar 6, 2024
tim-finnigan self-assigned this May 9, 2024
tim-finnigan added the investigating (This issue is being investigated and/or work is in progress to resolve the issue.) label May 9, 2024
tim-finnigan (Contributor) commented

Hello, thanks for reaching out and for your patience here. The send_task_success method makes a call to the underlying SendTaskSuccess API, so if there's an issue with the behavior here, it's likely something we'd need to escalate to the Step Functions team.

The TaskTimedOut exception indicates:

The task token has either expired or the task associated with the token has already been closed.

As a service error, this surfaces in botocore as a ClientError (it can also be caught specifically as the client's modeled TaskTimedOut exception). Whether and how the SDK retries depends on how you have configured retries.
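A sketch of handling this on the caller side, assuming a boto3 Step Functions client: modeled service errors are exposed as `client.exceptions.<Name>` (subclasses of botocore's `ClientError`), and the error's `response["ResponseMetadata"]["RetryAttempts"]` records how many retries the SDK made for that request. Treating TaskTimedOut as "possibly already delivered" when `RetryAttempts > 0` is a heuristic assumption, not documented service behavior; the function name and its return values are hypothetical.

```python
import json


def send_task_success_checked(sfn_client, task_token, output):
    """Call SendTaskSuccess and classify a TaskTimedOut outcome.

    Returns "sent" on success. If TaskTimedOut is raised after the SDK
    already retried, returns "timed_out_after_retry" so the caller can
    verify the outcome in the execution history instead of failing.
    Any other error (or TaskTimedOut on the first attempt) propagates.
    """
    try:
        sfn_client.send_task_success(taskToken=task_token, output=json.dumps(output))
        return "sent"
    except sfn_client.exceptions.TaskTimedOut as err:
        meta = err.response.get("ResponseMetadata", {})
        if meta.get("RetryAttempts", 0) > 0:
            # An earlier attempt may have reached the service and closed the task.
            return "timed_out_after_retry"
        raise
```

Retry behavior itself is configured by passing `config=botocore.config.Config(retries={...})` to `boto3.client("stepfunctions", ...)`; logging `RetryAttempts` alongside the error is a cheap way to tell whether a given TaskTimedOut followed an SDK retry.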

If the error is random and occurs only about once in 10,000 requests, as you mentioned, it may have been caused by something transient, such as a network issue. get_execution_history may help provide more context. For us to investigate further, we need a code snippet that reproduces the issue and debug logs (enable them with boto3.set_stream_logger('')) to get more insight into what's going on.
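Following the get_execution_history suggestion, a sketch that pages through an execution's history and collects task outcome events. It assumes a boto3 Step Functions client (which provides a `get_execution_history` paginator) and that the callback task's completion is recorded as a `TaskSucceeded`, `TaskFailed`, or `TaskTimedOut` history event; the function name is hypothetical. For the requested debug logs, `boto3.set_stream_logger('')` would be called once before creating the client.

```python
def find_task_outcomes(sfn_client, execution_arn):
    """Collect (event id, event type, timestamp) for task outcome events."""
    outcomes = {"TaskSucceeded", "TaskFailed", "TaskTimedOut"}
    paginator = sfn_client.get_paginator("get_execution_history")
    found = []
    for page in paginator.paginate(executionArn=execution_arn):
        for event in page["events"]:
            if event["type"] in outcomes:
                found.append((event["id"], event["type"], event.get("timestamp")))
    return found
```

A `TaskSucceeded` event for the affected execution, despite the client-side TaskTimedOut error, would confirm the reporter's observation that the call ultimately took effect, and its timestamp can be compared against the timestamps in the debug logs.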

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. service-api This issue is caused by the service API, not the SDK implementation. p2 This is a standard priority issue stepfunctions and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels May 9, 2024