Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected event type at TCPLayer.relay_messages: Expected DataReceived|ConnectionClosed|TcpMessageInjected, got Start({}) #6787

Open
changsin opened this issue Apr 6, 2024 · 13 comments
Labels
kind/triage Unclassified issues

Comments

@changsin
Copy link
Contributor

changsin commented Apr 6, 2024

Problem Description

We do not have a good repro scenario but we find the stack trace like this in some of our users.
They were using Safari when encountering these issues.

[10:47:50.564][127.0.0.1:50824] mitmproxy has crashed!
Traceback (most recent call last):
  File "mitmproxy/proxy/server.py", line 381, in server_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layer.py", line 270, in handle_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 880, in _handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 936, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 776, in passthrough
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/layers/tls.py", line 489, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 153, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/tunnel.py", line 153, in event_to_child
  File "mitmproxy/proxy/layer.py", line 272, in handle_event
  File "mitmproxy/proxy/layer.py", line 142, in handle_event
  File "mitmproxy/proxy/layer.py", line 233, in __continue
  File "mitmproxy/proxy/layer.py", line 196, in __process
  File "mitmproxy/proxy/layer.py", line 289, in _handle_event
  File "mitmproxy/proxy/layer.py", line 303, in _ask
  File "mitmproxy/proxy/layer.py", line 148, in handle_event
  File "mitmproxy/proxy/utils.py", line 27, in _check_event_type
AssertionError: Unexpected event type at TCPLayer.relay_messages: Expected DataReceived|ConnectionClosed|TcpMessageInjected, got Start({}).

Steps to reproduce the behavior:

We do not have a good repro scenario but they were using Safari when encountering these issues.
I am not sure if this is related, but we were trying to block accessing certain sites with this code:

    flow.response = http.Response.make(
        403,  # Forbidden status code
        b"Blocked: Access is not allowed",
        {"Content-Type": "text/plain"}
    )

This code works fine in Chrome and most users in Safari but does not seem to work in a small number of machines because of this issue.

System Information

Paste the output of "mitmproxy --version" here.

Mitmproxy: 10.2.4
Python: 3.10.11
OpenSSL: OpenSSL 3.2.1 30 Jan 2024
Platform: macOS-14.4-arm64-arm-64bit

@changsin changsin added the kind/triage Unclassified issues label Apr 6, 2024
@changsin
Copy link
Contributor Author

changsin commented Apr 6, 2024

We do not have good repro scenarios but based on the logs, the crash is proceeded by the following events:

  • Almost 47% of the time, the crash happend when ignoreHosts list is updated
  • About 21% of the time, the crash proceeded by tls_failed_client error
  • About 22% of the time, it is proceeded by OpenSSL.SSL.WantReadError

Is there a good way to debug this problem?

@changsin
Copy link
Contributor Author

changsin commented Apr 6, 2024

Some more additional info. It looks like the error occurs in the following conditions:

  1. TLS failures or server disconnect occurs
  2. Results in either OpenSSL.SSL.WantReadError or Unexpected event type at TCPLayer.relay_messages errors

When this happened, I was using WordPress or Expedia. After the crash, the site became unresponsive for 10 minutes or so.
Here are the callstacks.

[21:50:43.774][127.0.0.1:65504] server disconnect wordpress.com:443 (192.0.78.17:443)
[21:50:43.776][127.0.0.1:65504] mitmproxy has crashed!
Traceback (most recent call last):
  File "mitmproxy/proxy/server.py", line 381, in server_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layer.py", line 270, in handle_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 880, in _handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 936, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 776, in passthrough
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/layers/tls.py", line 489, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 154, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 123, in _handle_command
  File "mitmproxy/proxy/layers/tls.py", line 435, in send_data
  File "OpenSSL/SSL.py", line 2056, in sendall
  File "OpenSSL/SSL.py", line 1810, in _raise_ssl_error
OpenSSL.SSL.WantReadError
[21:51:13.150][127.0.0.1:65522] Unable to establish TLS connection with server (connection closed). Trying to establish TLS with client anyway. If you plan to redirect requests away from this server, consider setting `connection_strategy` to `lazy` to suppress early connections.
[21:51:13.162][127.0.0.1:65522] server disconnect www.expedia.com:443 (23.52.32.154:443)
[21:51:13.165][127.0.0.1:65522] mitmproxy has crashed!
Traceback (most recent call last):
  File "mitmproxy/proxy/server.py", line 381, in server_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layer.py", line 270, in handle_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 880, in _handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 936, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 776, in passthrough
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/layers/tls.py", line 489, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 154, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 123, in _handle_command
  File "mitmproxy/proxy/layers/tls.py", line 435, in send_data
  File "OpenSSL/SSL.py", line 2056, in sendall
  File "OpenSSL/SSL.py", line 1810, in _raise_ssl_error
OpenSSL.SSL.WantReadError
[21:51:13.167][127.0.0.1:65522] mitmproxy has crashed!
Traceback (most recent call last):
  File "mitmproxy/proxy/server.py", line 381, in server_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layer.py", line 270, in handle_event
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 880, in _handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 936, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/layers/http/__init__.py", line 776, in passthrough
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/layers/tls.py", line 489, in event_to_child
  File "mitmproxy/proxy/tunnel.py", line 153, in event_to_child
  File "mitmproxy/proxy/layer.py", line 157, in handle_event
  File "mitmproxy/proxy/tunnel.py", line 98, in _handle_event
  File "mitmproxy/proxy/tunnel.py", line 153, in event_to_child
  File "mitmproxy/proxy/layer.py", line 272, in handle_event
  File "mitmproxy/proxy/layer.py", line 142, in handle_event
  File "mitmproxy/proxy/layer.py", line 233, in __continue
  File "mitmproxy/proxy/layer.py", line 196, in __process
  File "mitmproxy/proxy/layer.py", line 289, in _handle_event
  File "mitmproxy/proxy/layer.py", line 303, in _ask
  File "mitmproxy/proxy/layer.py", line 148, in handle_event
  File "mitmproxy/proxy/utils.py", line 27, in _check_event_type
AssertionError: Unexpected event type at TCPLayer.relay_messages: Expected DataReceived|ConnectionClosed|TcpMessageInjected, got Start({}).
[21:51:13.739][127.0.0.1:65524] client connect

@mhils
Copy link
Member

mhils commented Apr 6, 2024 via email

@changsin
Copy link
Contributor Author

changsin commented Apr 7, 2024

I just copied the main's server.py and rebuilt mitmproxy. Unfortunately, it still reproes. Should I apply the whole main?

@mhils
Copy link
Member

mhils commented Apr 8, 2024

Copying over server.py would have been enough here. But you can preempt the entire question by taking the current main branch in its entirety. :)

An unexpected Start event is a curious case. Can you verify if this still reproduces if you comment out this line?

loop.set_task_factory(asyncio.eager_task_factory) # type: ignore

@changsin
Copy link
Contributor Author

changsin commented Apr 9, 2024

I tried the main in its entirety but it still reproes
However, commenting out asyncio_utils.py line 84 seems to work.
We will test run for another day or so to make sure, but this seems promising so far.

Some additional info:

  • The crash seems to occur more often right after server disconnect events
  • Trying to disconnect from this particular URL: https://in.appcenter.ms/ often results in crashes

I'll provide some more useful info if I can find it.

@changsin
Copy link
Contributor Author

Unfortunately, we are still seeing the same crashing issues. The frequency seems reduced though but they are still happening with those two call stacks. It would be great if there is a way to debug the issue better.

@mhils
Copy link
Member

mhils commented Apr 10, 2024

Ideally we'd get this reproduced with --set proxy_debug -vvv, but from what I've seen so far I wouldn't be surprised if this breaks the race condition.

@changsin
Copy link
Contributor Author

I added proxy_debug through DumpMaster but it is still reproing.
The output does not seem different from what we had before. Maybe I am doing something wrong with this option?

    master_instance = dump.DumpMaster(
        opts,
        with_termlog=True,
        with_dumper=False,
    )
    master_instance.addons.add(strac_dlp)
    master_instance.addons.add(tlsconfig)
    master_instance.options.proxy_debug = True
    await master_instance.run()

@mhils
Copy link
Member

mhils commented Apr 12, 2024 via email

@changsin
Copy link
Contributor Author

How about catching SSL.WantReadError here? I will test it on my machine and let you know if it improves.

    def send_data(self, data: bytes) -> layer.CommandGenerator[None]:
        try:
            self.tls.sendall(data)
        except (SSL.ZeroReturnError, SSL.SysCallError, SSL.WantReadError): <-- added SSL.WantReadError
            # The other peer may still be trying to send data over, which we discard here.
            pass
        yield from self.tls_interact()


@mhils
Copy link
Member

mhils commented Apr 22, 2024

How about catching SSL.WantReadError here? I will test it on my machine and let you know if it improves.

I'm not really sure what to make out of the WantReadError from sendall - if that's really something that can happen for healthy connections, we probably need to change the TLS addon quite a bit. In short, if we just swallow the error for send(), we will forget that piece of data instead of retrying it.

The OpenSSL docs (https://www.openssl.org/docs/manmaster/man3/SSL_get_error.html) make it sound like send() only triggers WANT_READ if negotation is happening. Can you check whether this error is only occurring for TLS 1.2 or below? (catch WantReadError and then print details about the connection).

If this happens for TLS 1.3, my only suspicion would be that we have a situation here where the layer is already trying to transmit data before the TLS layer has finished the handshake. Maybe dump self.tunnel_state in the WantReadError handler as well and see if that is indicative of something?

@changsin
Copy link
Contributor Author

So far, the website that tend to result in crashes are all TLS1.2:

default.exp-tas.com
treatment.grammarly.com
consumer.entitlement.skype.com

Also, the repro seems quite consistent. If a crash is due to SSL.WantReadError, the scenario is:

  1. tls_failed_server
  2. add the failed domain to ignoreHosts list (this is our internal logic to mitigate certificate errors)
  3. SSL.WantReadError
  4. mitmproxy crashed

Here is an example

[07:51:52.408] tls_failed_server for default.exp-tas.com TlsData(conn=Server({'id': '\u202654a921', 'address': ('default.exp-tas.com', 443), 'peername': ('13.107.5.93', 443), 'sockname': ('192.168.45.36', 61392), 'state': <ConnectionState.CAN_WRITE: 2>, 'error': 'connection closed', 'tls': True, 'alpn_offers': [], 'sni': 'default.exp-tas.com', 'timestamp_start': 1713739912.3554618, 'timestamp_tcp_setup': 1713739912.364467}), context=Context(
  Client({'id': '\u2026c4fabc', 'address': None, 'peername': ('127.0.0.1', 61389), 'sockname': ('127.0.0.1', 8080), 'state': <ConnectionState.OPEN: 3>, 'tls': True, 'alpn_offers': [], 'sni': 'default.exp-tas.com', 'timestamp_start': 1713739912.348948}),
  Server({'id': '\u202654a921', 'address': ('default.exp-tas.com', 443), 'peername': ('13.107.5.93', 443), 'sockname': ('192.168.45.36', 61392), 'state': <ConnectionState.CAN_WRITE: 2>, 'error': 'connection closed', 'tls': True, 'alpn_offers': [], 'sni': 'default.exp-tas.com', 'timestamp_start': 1713739912.3554618, 'timestamp_tcp_setup': 1713739912.364467}),
  layers=[[HttpProxy(state: handle_event), HttpLayer(regular, conns: 2), HttpStream(id=1, passthrough), ServerTLSLayer(establishing 'default.exp-tas.com' None), ClientTLSLayer(establishing 'default.exp-tas.com' None)]]
), ssl_conn=<OpenSSL.SSL.Connection object at 0x122698790>, is_dtls=False)

[07:51:52.454] ignoreHosts list temporarily updated
[07:51:52.455][127.0.0.1:61389] Unable to establish TLS connection with server (connection closed). Trying to establish TLS with client anyway. If you plan to redirect requests away from this server, consider setting `connection_strategy` to `lazy` to suppress early connections.
[07:51:52.467][127.0.0.1:61389] server disconnect default.exp-tas.com:443 (13.107.5.93:443)
[07:51:52.469][127.0.0.1:61389] mitmproxy has crashed!
Traceback (most recent call last):
  File \"mitmproxy/proxy/server.py\", line 381, in server_event
  File \"mitmproxy/proxy/layer.py\", line 157, in handle_event
  File \"mitmproxy/proxy/layer.py\", line 270, in handle_event
  File \"mitmproxy/proxy/layer.py\", line 157, in handle_event
  File \"mitmproxy/proxy/layers/http/__init__.py\", line 880, in _handle_event
  File \"mitmproxy/proxy/layers/http/__init__.py\", line 936, in event_to_child
  File \"mitmproxy/proxy/layer.py\", line 157, in handle_event
  File \"mitmproxy/proxy/layers/http/__init__.py\", line 776, in passthrough
  File \"mitmproxy/proxy/layer.py\", line 157, in handle_event
  File \"mitmproxy/proxy/tunnel.py\", line 98, in _handle_event
  File \"mitmproxy/proxy/layers/tls.py\", line 489, in event_to_child
  File \"mitmproxy/proxy/tunnel.py\", line 154, in event_to_child
  File \"mitmproxy/proxy/tunnel.py\", line 123, in _handle_command
  File \"mitmproxy/proxy/layers/tls.py\", line 435, in send_data
  File \"OpenSSL/SSL.py\", line 2056, in sendall
  File \"OpenSSL/SSL.py\", line 1810, in _raise_ssl_error
OpenSSL.SSL.WantReadError

Does this give you any clue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/triage Unclassified issues
Projects
None yet
Development

No branches or pull requests

2 participants