Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panics at on context close, v1.33.0 #1205

Closed
Gaudeamus opened this issue Feb 3, 2022 · 19 comments · Fixed by #1234
Closed

Panics at on context close, v1.33.0 #1205

Gaudeamus opened this issue Feb 3, 2022 · 19 comments · Fixed by #1234

Comments

@Gaudeamus
Copy link

Hello, unfortunately there are no more details except this message I could find.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x200 pc=0x901667]

goroutine 19473 [running]:
github.com/valyala/fasthttp.(*RequestCtx).Done(0x8c4639)
        /go/src/dsp/vendor/github.com/valyala/fasthttp/server.go:2691 +0x7
context.propagateCancel.func1()
        /usr/local/go/src/context/context.go:280 +0x50
created by context.propagateCancel
        /usr/local/go/src/context/context.go:278 +0x1d0
@Gaudeamus
Copy link
Author

I believe it's due to latest func (ctx *RequestCtx) reset() where ctx.s = nil is now set

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

Are you calling Done() function inside request handler or in some goroutine/task scheulder (ants, etc) where requestctx may be already released? if so, probably Done channel should be copied first into some variable inside request handler or passed to the goroutine as an argument directly. I believe it's the cost of framework reusing requestCtx internally.

@Gaudeamus
Copy link
Author

on the init of handler it goes like:

	handlerData.requestCtx = fasthttpCtx
	handlerData.ctx, handlerCtx.cancel = context.WithTimeout(fasthttpCtx, time.Millisecond*500)

on the defer of the handler there is release context like this:

	handlerData.requestCtx = nil //ensure not use context anymore
	if handlerData.cancel != nil {
		handlerData.cancel()
	}
	if handlerData.ctx != nil {
		select {
		case <-handlerData.ctx.Done():
		default:
			logs.Warning("context is not released")
		}
	}
	handlerData.cancel = nil
	handlerData.ctx = nil

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

as I understand it logging "context is not released", then panics after 500ms, you still referencing requestCtx inside context.WithTimeout, that's why it panics (handler finishes and sets ctx.s = nil, than somewhere in the context goroutine calling Done and panics). I don't like such behaiviour either. But strange since you are calling cancel, but the problem is here in the context

// context.go
go func() {
	select {
	case <-parent.Done(): // since this goroutine instantiated after the request handler is done, and parent don't have Done (ctx.s already =  nil)
		child.cancel(false, parent.Err())
	case <-child.Done():
	}
}()

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

I'll make pull request to golang to fix it, I guess it's reasonable) golang/go#50989

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

But in some cases parent.Err() will panic anyway, since server may close s.done channel, and in parallel child cancel then sometimes it might call parent.Err() when s already nil

// context.go
go func() {
	select {
	case <-done: // fixed but
		child.cancel(false, parent.Err()) // this is not, any race condition with parent (*requestCtx) can become a problem
	case <-child.Done():
	}
}()

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

I guess it's better to replace done channel inside s to context.Context (*cancelCtx) than no additional goroutine should be created on withTimeout as a result no race condition created for fasthttp.*RequestCtx

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

@Gaudeamus you can try this pull request. One of the issues of this pull request is that parent context shared across all requests handlers and all the issues of context.Context behavior...

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

Another fix for it is to utilise atomic for requestCtx.s field #1207 (won't fix requestCtx used inside context.Context might receive another server with another done channel concurrently)

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

@erikdubbelboer please review, probably you have a better idea how to avoid race condition with RequestCtx.
The problem here is that context package expects contexts to not be reused... , interesting case thanks @Gaudeamus for submitting it

@Gaudeamus
Copy link
Author

Gaudeamus commented Feb 3, 2022

just for your info - no warning is triggered, as you assumed

as I understand it logging "context is not released", then panics after 500ms

I'm currently downgrating to 1.32, will wait for next release to try. Thanx

@moredure
Copy link
Contributor

moredure commented Feb 3, 2022

Yes, true, no "context is not released" printed, because the context itself canceled, but context's WithDeadline internal goroutine which relies on (*requestCtx).Done() chan struct will be instantiated afterwards panicing with nil exception on RequestCtx s *Server field called by Done()...

@gzuidhof
Copy link

I can confirm I'm running into the same issue, when reverting back to v1.32.0 it's all good again.

@moredure
Copy link
Contributor

moredure commented Feb 16, 2022

@gzuidhof can you check #1206 it should fix this and also reduce the number of goroutines created, but probably will increase lock contention since sharing server context across all requestCtx handlers.

We certainly need to assign a bug flag to this ticket.

@erikdubbelboer
Copy link
Collaborator

Sorry I haven't had the time to look into this yet, I will hopefully this weekend.

@itzikiusa
Copy link

Hi, any news about it?

@GalvinGao
Copy link

Can confirm with github.com/valyala/fasthttp v1.33.0 // indirect, used by github.com/gofiber/fiber/v2 v2.27.0.

backend_1  | panic: runtime error: invalid memory address or nil pointer dereference
backend_1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x200 pc=0x881e47]
backend_1  |
backend_1  | goroutine 21135 [running]:
backend_1  | github.com/valyala/fasthttp.(*RequestCtx).Done(0x1097f80)
backend_1  | 	/go/pkg/mod/github.com/valyala/fasthttp@v1.33.0/server.go:2691 +0x7
backend_1  | context.propagateCancel.func1()
backend_1  | 	/usr/local/go/src/context/context.go:280 +0x50
backend_1  | created by context.propagateCancel
backend_1  | 	/usr/local/go/src/context/context.go:278 +0x1d0

erikdubbelboer added a commit that referenced this issue Mar 3, 2022
RequestCtx's are reused in a server specific pool, so no need to reset
it. This fixes a use after reset but when RequestCtx is use as context.

Fixes #1205
@erikdubbelboer
Copy link
Collaborator

How about this as a fix?: #1234

@moredure
Copy link
Contributor

moredure commented Mar 3, 2022

Should totally works as long as context pool server specific!
The only thing my solution might compete on is: parent context lock (internally on a map of contexts) vs goroutine creation per singe request.
But it's another story, and certainly needed to be benchmarked, maybe in some high load cases tons of goroutines created worser than lock contention on parent server context, memory vs lock contention, etc.

erikdubbelboer added a commit that referenced this issue Mar 4, 2022
RequestCtx's are reused in a server specific pool, so no need to reset
it. This fixes a use after reset but when RequestCtx is use as context.

Fixes #1205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants