aws-sdk-sqs raises NoMethodError when messages not found #2947

Closed
t-kinoshita opened this issue Nov 20, 2023 · 14 comments · Fixed by #2948
Labels
bug This issue is a bug. needs-triage This issue or PR still needs to be triaged.

Comments

@t-kinoshita

Describe the bug

An error occasionally occurs inside QueuePoller#poll.

The error message indicates messages is nil.

Expected Behavior

No error happens.

Current Behavior

 undefined method `empty?' for nil:NilClass
  /opt/rubies/ruby-2.7.8/lib/ruby/gems/2.7.0/gems/aws-sdk-sqs-1.67.0/lib/aws-sdk-sqs/queue_poller.rb:358:in `block (2 levels) in poll'

Reproduction Steps

Needs more research

Possible Solution

No response

Additional Information/Context

The errors started on Nov 18, so something may have changed on the AWS side around that time.

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-sqs-1.67.0

Environment details (Version of Ruby, OS environment)

ruby-2.7.8 on Amazon Linux 2

@mscrivo

mscrivo commented Nov 20, 2023

We're seeing very similar issues here, but for us they started today at roughly 7:24am UTC. This is the second SQS issue we've had to deal with this weekend. First, on Friday night, Shoryuken started failing on empty receives on this line, which we had to monkey patch to check for null responses (even though the AWS SDK states that it should return an empty array), because the messages response was intermittently coming back as nil instead of []. These issues are popping up without any changes on our end, i.e. no deployments occurred when the errors started.
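For readers hitting the same error, the nil check described above can be sketched roughly as follows. This is only an illustration, not the actual Shoryuken patch: `FakeResponse` and `safe_messages` are hypothetical names, and the Struct merely stands in for a response object whose `messages` may be nil.

```ruby
# Stand-in for a receive_message response (hypothetical, not the SDK class).
FakeResponse = Struct.new(:messages)

# Hypothetical helper: returns the message list, treating nil as empty.
def safe_messages(response)
  response.messages || []
end

# With the guard in place, an empty receive no longer raises NoMethodError.
puts safe_messages(FakeResponse.new(nil)).empty?
```

Calling `empty?` directly on `response.messages` is what fails when the list comes back as nil, so the guard has to sit before that call.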

@mullermp
Contributor

Thanks for reporting this. SQS recently changed its wire protocol from Query to JSON. Are you able to reproduce with http_wire_trace: true as a client option, to see whether the service is returning messages at all? Does rolling back the gem version solve the issue?
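A minimal sketch of enabling the wire trace requested above. `http_wire_trace` and `logger` are client options; the helper name `receive_with_trace`, the default region, and the one-second wait are assumptions for illustration only.

```ruby
require 'logger'

# Client options that turn on wire-level logging of requests and responses.
WIRE_TRACE_OPTIONS = {
  http_wire_trace: true,
  logger: Logger.new($stdout)
}.freeze

# Hypothetical helper: builds a traced client and performs one receive.
# The gem is required lazily so this file loads even without it installed.
def receive_with_trace(queue_url, region: 'us-east-1')
  require 'aws-sdk-sqs'
  client = Aws::SQS::Client.new(region: region, **WIRE_TRACE_OPTIONS)
  client.receive_message(queue_url: queue_url, wait_time_seconds: 1)
end
```

Running `receive_with_trace` against an empty queue should show the raw response body in the log, which makes it possible to see whether the message list is present, empty, or missing.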

@alextwoods
Contributor

We believe this is related to the protocol change from Query to AWS JSON: eb6ac8c

The SDK is behaving correctly for the AWS JSON protocol (https://github.com/smithy-lang/smithy/blob/main/smithy-aws-protocol-tests/model/awsJson1_1), but I believe this is a change in behavior from the previous Query protocol.

@mscrivo

mscrivo commented Nov 20, 2023

I'm a bit confused. Is it not automatically using the JSON protocol with 1.67+, or is there a flag that needs to be set to enable it?

@mscrivo

mscrivo commented Nov 20, 2023

FYI downgrading to gem version 1.65 appears to have fixed the issue for us.

@mullermp
Contributor

mullermp commented Nov 20, 2023

The service accepts both formats. The older version of the gem sends requests in the old format (Query), so the service responds in that format. The new version of the gem sends requests in the new format (JSON), and the service responds in JSON too. The fix is unclear and a few SDKs are affected; we are currently deliberating the correct approach. In the meantime, please use the older version.

@mscrivo

mscrivo commented Nov 20, 2023

Thank you for the explanation. Can you explain why we saw the issue start at two seemingly random times over the weekend? Was there a slow rollout of responding with the new format on the back end? Otherwise, I would have expected this to pop up right away when we updated to the new gem.

@geeksam

geeksam commented Nov 20, 2023

I lost a good chunk of my Saturday attempting to diagnose this issue, eventually monkeypatching the same line in Shoryuken that @mscrivo linked above. Appreciate the info about downgrading to 1.65.

@mullermp
Contributor

> Thank you for the explanation. Can you explain why we saw the issue start at two seemingly random times over the weekend? Was there a slow rollout of responding with the new format on the back end? Otherwise, I would have expected this to pop up right away when we updated to the new gem.

After switching protocols, SQS started sending a body like { "Messages": [] } for empty receives, which worked in the Ruby SDK but broke other SDKs such as Java. Over last weekend, SQS deployed a change that made the value null, "fixing the issue", so the body now comes back as {}. The Ruby SDK then parsed this as nil messages.
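The difference between the two response bodies can be illustrated with plain JSON parsing. This is a rough sketch, not the SDK's parser: `messages_from` is a hypothetical helper, and the member name `Messages` is assumed from the ReceiveMessage API.

```ruby
require 'json'

# Hypothetical helper: extracts the message list from a raw JSON body.
# With default_empty: true, a missing list is treated as an empty array.
def messages_from(body, default_empty: true)
  messages = JSON.parse(body)['Messages']
  return [] if messages.nil? && default_empty

  messages
end

p messages_from('{"Messages": []}')           # list present but empty
p messages_from('{}', default_empty: false)   # member absent, parsed as nil
p messages_from('{}')                         # member absent, defaulted
```

Without the default, the second body yields nil, and any later call such as `empty?` on the result raises the NoMethodError reported in this issue.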

@mullermp
Contributor

After discussion within the greater SDK team, we concluded that the Ruby SDK's behavior of defaulting to an empty list was not correct, but we must preserve it. I have a fix out (#2948) that is pending review and protocol tests; this kind of change is considered high risk.


⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@mullermp mullermp reopened this Nov 20, 2023
@mullermp
Contributor

A fix should be shipped in the next couple of hours in core version 3.187.1.
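One way to confirm that an application has picked up the fixed core version is to compare version strings with `Gem::Version`. This is a sketch only: `core_has_fix?` is a hypothetical helper, and it assumes the fix is present in every release from 3.187.1 onward.

```ruby
# Core version named above as containing the fix.
FIXED_CORE_VERSION = Gem::Version.new('3.187.1')

# Hypothetical helper: true when the given aws-sdk-core version string
# is at or above the fixed release.
def core_has_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_CORE_VERSION
end

# In an application with the gem loaded, the installed version could be read
# with: Gem.loaded_specs['aws-sdk-core']&.version.to_s
puts core_has_fix?('3.187.0')
puts core_has_fix?('3.187.1')
```

`Gem::Version` compares segments numerically, so a plain string comparison is avoided.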

@mullermp
Contributor

The gem has been released; please upgrade and let me know if it works. I'm sorry for any trouble this caused (and hopefully my change does not cause trouble too!). The SQS protocol change was very high risk and unfortunately I'm just along for the ride.


5 participants