
Memory Leak in case of "IncompleteMessage: connection closed before message completed" #3610

Status: Open
satyamagarwal249 opened this issue on Mar 23, 2024 · 3 comments
Labels: C-bug (Category: bug. Something is wrong. This is bad!), V-0.14.x (0.14.x versions)

satyamagarwal249 commented on Mar 23, 2024

Version
hyper 0.14.28
tokio 1.35.0
tokio-openssl 0.6.3

Description
Observing a memory increase on "IncompleteMessage" errors.

Testing scenario:

  1. Server: on an HTTP/1 request, it simply closes the connection.

server.go:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Hijack the connection to get access to the underlying TCP connection
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "Hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, _, err := hj.Hijack()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer conn.Close()

	// Close the connection immediately
	conn.Close()

	fmt.Println("Connection closed by server")
}

func main() {
	http.HandleFunc("/", handler)

	// Configure TLS with TLS 1.2
	tlsConfig := &tls.Config{
		MinVersion: tls.VersionTLS12,
	}

	server := &http.Server{
		Addr:      ":8080",
		Handler:   nil, // Default handler is used (Mux if not set)
		TLSConfig: tlsConfig,
	}

	// Start the HTTPS server with the specified TLS configuration
	err := server.ListenAndServeTLS("cert/server.crt", "cert/server.key")
	if err != nil {
		fmt.Println("Error starting server:", err)
		os.Exit(1)
	}
}

  2. Client: sends one request in a loop until success (effectively infinite, since the server never acknowledges): an HTTP PUT of 1 MB of data to the server.

Configs:
- HTTP/1 only
- pooling disabled for simplicity: max idle per pool = 0

driving_client.rs

to add...
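A minimal sketch of such a client, assuming plain hyper 0.14 over HTTP without the tokio-openssl layer (not the actual driving code, which is still to be added; address and body contents are placeholders), would look roughly like this:

// Hypothetical minimal client (not the original driving code): hyper 0.14,
// HTTP/1 over plain TCP (the original additionally used tokio-openssl),
// pooling disabled, PUT-ing a 1 MB body in a loop until success.
use hyper::{Body, Client, Method, Request};

#[tokio::main]
async fn main() {
    let client = Client::builder()
        .pool_max_idle_per_host(0) // disable pooling for simplicity
        .build_http::<Body>();

    let payload = vec![0u8; 1024 * 1024]; // 1 MB body

    loop {
        let req = Request::builder()
            .method(Method::PUT)
            .uri("http://127.0.0.1:8080/") // placeholder address
            .body(Body::from(payload.clone()))
            .unwrap();

        match client.request(req).await {
            // The test server never answers, so in practice this loops forever.
            Ok(_) => break,
            // Expect ConnectionReset (A) or IncompleteMessage (B) here.
            Err(e) => eprintln!("request failed: {e}"),
        }
    }
}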
  • The client always gets a connection reset from the server. This leads to two types of error responses in hyper:

    (A) hyper::Error(BodyWrite, Os { code: 104, kind: ConnectionReset, message: \"Connection reset by peer\" }))
    (B) hyper::Error(IncompleteMessage)
    where (A) does not seem to cause a memory leak, while (B) does.

  • I observed that:

    1. in high-bandwidth scenarios, (A) dominates;
    2. in low-bandwidth scenarios, (B) dominates.

When I ran my tests (15 minutes):

  1. In the high-bandwidth case (around 5 MB/s):

    • memory was constant: no leakage.
    • Error A count: 5242
    • Error B count: 1
    • As per hyper logs: 315 MB flushed in total.
  2. In the low-bandwidth case (around 100 KB/s):

    • memory kept increasing linearly; around 40 MB of leakage was observed.
    • Error A count: 0
    • Error B count: 5004
    • As per hyper logs: 80 MB flushed in total. The time-series graph of data flushed runs parallel to the memory increase; the leak appears to be about 50% of whatever was flushed (just trying to connect the dots...).

See memory growth of case 1 vs. case 2:
(Screenshot from 2024-03-23 16-51-34)

  • It seems something is happening inside hyper and not in my driving code. I confirmed this by enabling SSL verification so that every attempt failed at connect with
    hyper::Error(Connect, SSL(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 167772294, library: \"SSL routines\", function: \"tls_post_process_server_certificate\", reason: \"certificate verify failed\", file: \"../ssl/statem/statem_clnt.c\", line: 1883 }]))) }))
    and the client kept retrying; memory was constant: no leakage. The leak seems to occur only when some data is actually flushed.

Note:

  1. malloc_trim calls seem to clean up the leakage to a great extent (see the sketch after these notes).

  2. The above is what I reproduced locally on my laptop, where for just a 1 MB request, about 100 MB of leakage was observed within a few minutes (the graph increases continuously). I have seen a similar scenario in production, where it led to around 6 GB of memory leakage, which is the main concern.

(Screenshot from 2024-03-23 17-49-08)

  • See the above graph, where memory shot up to 6 GB for our app (ideally our app consumes at most 1 to 1.4 GB at a time). On debugging, I found we received 157310 instances of 503 errors from S3 (rate-limiting on their side). Each of our requests was around 28 KB, and when I plotted a time-series graph of the accumulated error block sizes, its growth and curvature matched the memory graph completely.

  • Note that the sudden 1 GB dip in memory at 15:54:30 is due to a malloc_trim call. The above run was on EC2, where malloc_trim cleared only 1 GB out of the leaked 5 GB, while on local runs on an Ubuntu laptop malloc_trim seems to clear all of the leakage due to error blocks.
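For reference, the malloc_trim call mentioned in note 1 can be issued from Rust through the libc crate (glibc only); this is just a sketch of the idea, not code from the affected application:

// Sketch (glibc-specific): ask the allocator to return free heap pages
// to the OS. A pad of 0 means "trim as much as possible".
fn trim_heap() {
    // `trim_heap` is a hypothetical helper name; `libc::malloc_trim` is the
    // actual glibc call exposed by the libc crate.
    unsafe {
        libc::malloc_trim(0);
    }
}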

I am still in the process of understanding the hyper workflow, so pardon me if any of the comments below are wrong:

  • Overall observation: when the server resets the connection while some data has already been flushed by the hyper client, there is a possibility of a memory leak, depending on a race around when/how hyper learns of the connection closure.

  • Error A was received by hyper during poll_write, which makes an AsyncWrite call to the network (tokio_openssl::SslStream::poll_write), where the OS reports the connection reset. This case does not seem to cause a memory/resource leak.

Logs:

2886-ts=2024-03-22T16:48:48.357019727Z TRACE  hyper::client::client: handshake complete, spawning background dispatcher task
2887-ts=2024-03-22T16:48:48.357207290Z TRACE hyper::proto::h1::conn: flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Busy }
2888-ts=2024-03-22T16:48:48.357365572Z TRACE encode_headers: hyper::proto::h1::role: Client::encode method=PUT, body=Some(Known(1049396))
2889-ts=2024-03-22T16:48:48.357446885Z TRACE hyper::proto::h1::io: buffer.flatten self.len=439 buf.len=1049396
2891-ts=2024-03-22T16:48:48.357622526Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
2893-ts=2024-03-22T16:48:48.357666519Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
2895-ts=2024-03-22T16:48:48.357705690Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
2897-ts=2024-03-22T16:48:48.357748504Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
2898-ts=2024-03-22T16:48:48.357776174Z DEBUG hyper::proto::h1::dispatch: error writing: Connection reset by peer (os error 104)
  • Error (B), in contrast, was received by hyper during poll_read, which makes an AsyncRead call to the network (tokio_openssl::SslStream::poll_read), where a 0-byte read indicates that the connection was closed while a message was still expected. This case does seem to cause a memory/resource leak. It also errored on shutdown with: error shutting down IO: Transport endpoint is not connected (os error 107)

Logs:

1928-ts=2024-03-22T16:48:46.518283727Z TRACE hyper::client::conn: client handshake Http1
1929-ts=2024-03-22T16:48:46.518358157Z TRACE hyper::client::client: handshake complete, spawning background dispatcher task
1930-ts=2024-03-22T16:48:46.518559721Z TRACE hyper::proto::h1::conn: flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Busy }
1931-ts=2024-03-22T16:48:46.518703546Z TRACE encode_headers: hyper::proto::h1::role: Client::encode method=PUT, body=Some(Known(1049396))
1932-ts=2024-03-22T16:48:46.518768090Z TRACE hyper::proto::h1::io: buffer.flatten self.len=439 buf.len=1049396
1934-ts=2024-03-22T16:48:46.518939168Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
1936-ts=2024-03-22T16:48:46.518991601Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
1938-ts=2024-03-22T16:48:46.519025458Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
1940-ts=2024-03-22T16:48:46.519061873Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
1942-ts=2024-03-22T16:48:46.519115645Z DEBUG hyper::proto::h1::io: flushed 16384 bytes
1943-ts=2024-03-22T16:48:46.519165894Z TRACE hyper::proto::h1::conn: Conn::read_head
1944-ts=2024-03-22T16:48:46.519198015Z TRACE hyper::proto::h1::io: received 0 bytes
1945-ts=2024-03-22T16:48:46.519209421Z TRACE hyper::proto::h1::io: parse eof
1946-ts=2024-03-22T16:48:46.519222649Z TRACE hyper::proto::h1::conn: State::close_read()
1947-ts=2024-03-22T16:48:46.519231806Z DEBUG hyper::proto::h1::conn: parse error (connection closed before message completed) with 0 bytes
1948-ts=2024-03-22T16:48:46.519240215Z DEBUG hyper::proto::h1::dispatch: read_head error: connection closed before message completed
1949-ts=2024-03-22T16:48:46.519262154Z TRACE hyper::proto::h1::conn: State::close_read()
1950-ts=2024-03-22T16:48:46.519271345Z TRACE hyper::proto::h1::conn: State::close_write()
1951-ts=2024-03-22T16:48:46.519295766Z DEBUG hyper::proto::h1::conn: error shutting down IO: Transport endpoint is not connected (os error 107)
1952-ts=2024-03-22T16:48:46.519380733Z DEBUG hyper::client::client: client connection error: error shutting down connection: Transport endpoint 
  • The data buffered on the closed connection is leaking somewhere in some scenarios (IncompleteMessage is one of them). I am still figuring out how it is cleaned up on a closed connection.

  • Since the errored blocks seem to accumulate in memory without bound, this issue can cause serious trouble: an application that keeps consuming unbounded memory in error scenarios may suddenly crash.

satyamagarwal249 added the C-bug (Category: bug. Something is wrong. This is bad!) label on Mar 23, 2024
satyamagarwal249 (Author) commented:

In the hyper::Error(IncompleteMessage) case, since I was seeing the shutdown error "error shutting down IO: Transport endpoint is not connected (os error 107)", I tried disabling shutdown, as below:

impl AsyncWrite for MyStream {
    ...
    // Work-around: make shutdown a no-op instead of delegating to the inner IO.
    fn poll_shutdown(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Result<(), std::io::Error>> {
        // Original implementation:
        // let this = self.project();
        // this.io.poll_shutdown(cx)
        Poll::Ready(Ok(()))
    }
}

This magically made the memory graph horizontal: no leakage. I have no clue why this is happening.

dswij (Member) commented on Mar 31, 2024

@satyamagarwal249 thank you for the write-up. Can you also post the minimal code for the client, so that we can easily reproduce this?

dswij (Member) commented on Mar 31, 2024

Also, from your last comment, it seems like it has something to do with the io you're using. Is it possible that the implementation of io in your code is causing the leakage?

dswij added the V-0.14.x (0.14.x versions) label on Mar 31, 2024