Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{case_clause,[]} in ssl_gen_statem:iovec_from_front/4 #7507

Closed
AnufrievDE opened this issue Jul 20, 2023 · 19 comments · Fixed by #7526
Closed

{case_clause,[]} in ssl_gen_statem:iovec_from_front/4 #7507

AnufrievDE opened this issue Jul 20, 2023 · 19 comments · Fixed by #7526
Assignees
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS
Milestone

Comments

@AnufrievDE
Copy link

Hello, I've an application that perform http get requests with hackney use. And after uplifting to OTP-25.3.2.3 (erts-13.2.2.2), the following error, which seems not preventing the main application workflow, started to appear from time to time in the logs:

2023-07-17T14:42:18.500210+00:00 error: type: {{{case_clause,[]},[{ssl_gen_statem,iovec_from_front,4,[{file,"ssl_gen_statem.erl"},{line,1647}]},{ssl_gen_statem,read_application_data_bin,8,[{file,"ssl_gen_statem.erl"},{line,1483}]},{tls_gen_connection,handle_protocol_record,3,[{file,"tls_gen_connection.erl"},{line,414}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1426}]},{tls_connection,init,1,[{file,"tls_connection.erl"},{line,160}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]},{gen_statem,call,[<0.5596.5>,{recv,-1,5000},infinity]}}

Describe the bug
I am not sure this is a bug, because this missing case clause, seems expected, according to the code. But as I understood, this crash is happend at establishing tls_connection, after receiving some (handshake related?) data from the resource. Maybe this data is bad? Would you suggest why this crash happen?

To Reproduce
I have no exact scenario to reproduce, unfortunately. Just doing get https requests for specific resource, for example.

Expected behavior
No crashes expected. I don't see any crashes at the same circumstates on otp 25.0.4 (erts-13.1.2).

Affected versions
OTP 25.3.2.3 (erts-13.2.2.2)

@AnufrievDE AnufrievDE added the bug Issue is reported as a bug label Jul 20, 2023
@u3s u3s added the team:PS Assigned to OTP team PS label Jul 21, 2023
@u3s
Copy link
Contributor

u3s commented Jul 21, 2023

thanks for report.

  • would it be possible to verify some versions in between 25.0.4-25.3.2.3 ? Ideally to reduce the version step to 1 ...
  • does it appear with any https resource? or just one specific server/resource ?

@AnufrievDE
Copy link
Author

Checked on 25.0.4 again and also on a few versions between 25.0.4-25.3.2.3. Issue is reproduced.

But after changing the https resource for requests the issue is gone. Seems resource related.

@IngelaAndin
Copy link
Contributor

This function will only be called for handling application data, so the possible handshake related data should not factor in. At the moment I have no explanation for why this could happen. Do you know if the resource causing the trouble possible send empty application data packets? This is allowed for application data and is a corner case which might factor in.

@RaimoNiskanen
Copy link
Contributor

It is a bug. The comment in the code is meant to motivate why it uses an explicit match for a list of size 1 instead of using a last clause for all lists shorter than 2, because that causes an infinite loop if the user of the queue tries to remove more data from the queue than is queued. And an infinite loop is much harder to debug than a case clause.

Nevertheless, the code that uses the queue has failed to track how many bytes that are queued, and tries do dequeue more than is queued.

I see no obvious bug with zero sized packets. They could introduce empty binaries in the queue, but the code seems to handle that. Zero sized packets are probably less common and therefore less tested, though...

@RaimoNiskanen
Copy link
Contributor

  • It is not a known bug.
  • When I re-read the buffer handling code in ssl_gen_statem did not see anything suspicious (I wrote it so I probably have a blind spot here).
  • In the version interval OTP-25.0.4..OTP-25.3.2.3 I see no changes in that buffer handling code.

So can you show us a way to reproduce this bug?
Otherwise it do not see how it will be possible to hunt it down.

@RaimoNiskanen
Copy link
Contributor

Regarding the end of your log error entry: ...{gen_statem,call,[<0.5596.5>,{recv,-1,5000},infinity]}.

That should originate from a gen_statem:call(Pid, {recv,-1,5000}), which should be what ssl_gen_statem:recv(Pid, -1, 5000) does when called from the API function ssl:recv(SslSocket, Length, Timeout). But the ssl API function should not allow passing Length = -1. And if a negative length gets into the internal function ssl_gen_statem:recv/3 it is not checked and can really mess up the buffer handling.

How does Length = -1 get into that internal API?

@RaimoNiskanen
Copy link
Contributor

The line numbers in your log error entry doesn't quite match the line numbers for OTP-25.3.2.3.
Are there patches applied, or which OTP version actually produced that stack trace?

@RaimoNiskanen
Copy link
Contributor

But the ssl API function should not allow passing Length = -1

My bad. There is no check on Length.
But I see no reason why OTP-25.0.4 should handle Length = -1 any better than OTP-25.3.2.3.

So: it seems ssl:recv/2.3 has been called with a Length = -1 which is not documented to work.
The type specification allows a negative length and the API lets it through. Both should be fixed to be not allowed.

Can you (@AnufrievDE) somehow confirm that this is what is happening e.g by tracing on ssl:recv/3?

@AnufrievDE
Copy link
Author

Hello,

There were no patches applied to OTP, I'am not sure why line numbers differs, in first report. But I see issue reproduced regularly on otp 25.3.2.3, erts 13.2.2.2, there is today's part of stacktrace (with correct line numbers, matches sources from maint-25 branch):

log/erlang.log.0.gz:4:2023-08-02T09:04:35.685232+00:00 error: State machine <0.18141.4333> terminating. Reason: {case_clause,[]}. Stack: [{ssl_gen_statem,iovec_from_front,4,[{file,"ssl_gen_statem.erl"},{line,1650}]},{ssl_gen_statem,read_application_data_bin,8,[{file,"ssl_gen_statem.erl"},{line,1486}]},{ssl_gen_statem,passive_receive,4,[{file,"ssl_gen_statem.erl"},{line,939}]},{gen_statem,loop_state_callback,11,[{file,"gen_statem.erl"},{line,1426}]},{tls_connection,init,1,[{file,"tls_connection.erl"},{line,160}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]. Last event: {{call,{<0.19100.4309>,#Ref<0.3634545696.1880883201.189252>}},{recv,-1,5000}}. State: [{data,[{"State",{connection,{state,{static_env,client,gen_tcp,tls_gen_connection,tcp,tcp_closed,tcp_error,tcp_passive,{10,105,243,179},19358,#Port<0.5952138>,#Ref<0.3634545696.1223294977.203003>,#Ref<0.3634545696.1223294977.203002>,ssl_client_session_cache_db,{ssl_crl_cache,{{#Ref<0.3634545696.1223294977.203006>,#Ref<0.3634545696.1223294977.203007>},[]}},{#Ref<0.3634545696.1223294977.203004>,#Ref<0.3634545696.1223294977.203005>},{extracted,[{decoded,{{#Ref<0.3634545696.1880883201.186528>,15446673492073852651,{rdnSequence,[[{'AttributeTypeAndValue',{2,5,4,6},"JP"}],[{'AttributeTypeAndValue',{2,5,4,10},{printableString,"secom trust systems co.,ltd."}}],[{'AttributeTypeAndValue',{2,5,4,3},{printableString,"security communication ecc rootca1"}}]]}}...

I can collect some debug traces to share, if you know where it would be helpfull to enable it.

@AnufrievDE
Copy link
Author

Sorry for the confusion. Currently I can see issue on otp 25.0.4 as well as on 25.3.2.3. Just for some reasons (some changes in my application, probably additional requests added, I'll check it...), this issue began to appear and it coincided with the OTP uplift.

@RaimoNiskanen
Copy link
Contributor

Yes. That stack trace fits like a trucker hat on OTP-25.3.2.3.

It also shows that a passive recv with length -1 has been called, which should be forbidden in the API since it breaks the internal buffer handling...

If you can find out if that actually is what happens (ssl:recv/3 has been called with Length = -1), then I can claim No Error in OTP.

@AnufrievDE
Copy link
Author

AnufrievDE commented Aug 2, 2023

If you can find out if that actually is what happens (ssl:recv/3 has been called with Length = -1), then I can claim No Error in OTP.

There is traces from ssl:recv/3. It is called with Length = -1.

(<0.5379.4317>) call ssl:recv({sslsocket,{gen_tcp,#Port<0.5972350>,tls_connection,undefined},
           [<0.7406.4333>,<0.11018.4339>]},-1,5000)

@RaimoNiskanen
Copy link
Contributor

Can you add caller info to that trace and see from where that call is made?
(In case the bad call is made from other OTP code...)

@AnufrievDE
Copy link
Author

AnufrievDE commented Aug 2, 2023

dbg:tpl(ssl, recv, 3, cx):

(<0.31924.4357>) call ssl:recv({sslsocket,{gen_tcp,#Port<0.5997548>,tls_connection,undefined
},
           [<0.28413.4369>,<0.16905.4328>]},-1,5000) ({hackney_response,
                                                       stream_body_recv,2})
...
(<0.31924.4357>) exception_from {ssl,recv,3} {exit,{{{case_clause,[]},
        [{ssl_gen_statem,iovec_from_front,4,
                         [{file,"ssl_gen_statem.erl"},{line,1650}]},
         {ssl_gen_statem,read_application_data_bin,8,
                         [{file,"ssl_gen_statem.erl"},{line,1486}]},
         {ssl_gen_statem,passive_receive,4,
                         [{file,"ssl_gen_statem.erl"},{line,939}]},
         {gen_statem,loop_state_callback,11,
                     [{file,"gen_statem.erl"},{line,1426}]},
         {tls_connection,init,1,[{file,"tls_connection.erl"},{line,160}]},
         {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]},
       {gen_statem,call,[<0.28413.4369>,{recv,-1,5000},infinity]}}}

@RaimoNiskanen
Copy link
Contributor

Seems like a bug i Hackney, then, since it is called from hackney_response:stream_body_recv/2.

I will create a pull request to make this a function clause in ssl:recv/3 and link it to this issue, which will be closed when that improvement is merged.

RaimoNiskanen added a commit that referenced this issue Aug 4, 2023
…o maint

* raimo/ssl/non_neg_integer-length/GH-7507/OTP-18700:
  Guard against negative Length
RaimoNiskanen added a commit that referenced this issue Aug 4, 2023
…o maint

* raimo/ssl/non_neg_integer-length/GH-7507/OTP-18700:
  Guard against negative Length
@RaimoNiskanen
Copy link
Contributor

Strange..., this issue did not auto close when I merged PR #7526.
Maybe because benoitc/hackney#722 is still open. Might as well...

@IngelaAndin
Copy link
Contributor

I found that the only safe way to always get it closed is to include the close/solved comment in the commit message.

@RaimoNiskanen RaimoNiskanen linked a pull request Aug 4, 2023 that will close this issue
@RaimoNiskanen
Copy link
Contributor

I saw now that a closing keyword is stated to only work on the default branch, which should be 'master' in this repo, not 'maint'...

@RaimoNiskanen RaimoNiskanen added this to the OTP-26.1 milestone Aug 8, 2023
rickard-green pushed a commit that referenced this issue Sep 7, 2023
…o maint-25

* raimo/ssl/non_neg_integer-length/GH-7507/OTP-18700:
  Guard against negative Length
@apoorv-2204
Copy link

apoorv-2204 commented Jan 19, 2024

facing same issue, don't know how much there are related edgurgel/httpoison#485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants