go routine deadlock #2485

Closed

RSWilli opened this issue Aug 19, 2022 · 7 comments

RSWilli commented Aug 19, 2022

Using Vite + SvelteKit:

I switched branches in my repo while leaving Vite running, which changed a lot of files at once (thanks to sveltejs/kit#5774), and I got the deadlock below (the process dumped core).

This may not matter much in this particular case, but I'm recording it here in case it turns out to matter in others.

fatal error: all goroutines are asleep - deadlock!

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x9bc6e0?)
        runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0xc000012010?)
        sync/waitgroup.go:139 +0x52
main.runService(0x1)
        github.com/evanw/esbuild/cmd/esbuild/service.go:177 +0x43b
main.main()
        github.com/evanw/esbuild/cmd/esbuild/main.go:214 +0x1ef

goroutine 6 [chan receive]:
main.runService.func1()
        github.com/evanw/esbuild/cmd/esbuild/service.go:110 +0x5d
created by main.runService
        github.com/evanw/esbuild/cmd/esbuild/service.go:108 +0x16b

goroutine 7 [chan receive]:
main.(*serviceType).sendRequest(0xc00003e300, {0x885a20, 0xc000408030})
        github.com/evanw/esbuild/cmd/esbuild/service.go:203 +0xfa
main.runService.func2()
        github.com/evanw/esbuild/cmd/esbuild/service.go:132 +0x39
created by main.runService
        github.com/evanw/esbuild/cmd/esbuild/service.go:129 +0x26e

goroutine 7295 [chan receive]:
main.(*serviceType).sendRequest(0xc00003e300, {0x885a20, 0xc000298ab0})
        github.com/evanw/esbuild/cmd/esbuild/service.go:203 +0xfa
main.(*serviceType).convertPlugins.func2.3({{0xc000214fd0, 0x4}, {0xc000030b10, 0x2a}, {0x8e76ce, 0x4}, {0xc000030b10, 0x1b}, 0x1, {0x0, ...}})
        github.com/evanw/esbuild/cmd/esbuild/service.go:781 +0x5e5
github.com/evanw/esbuild/pkg/api.(*pluginImpl).onResolve.func1({{0xc000214fd0, 0x4}, {0xc000030b10, 0x1b}, {0x0, 0x0}, {{0xc000030b10, 0x2a}, {0x8e76ce, 0x4}, ...}, ...})
        github.com/evanw/esbuild/pkg/api/api_impl.go:1625 +0x215
github.com/evanw/esbuild/internal/bundler.RunOnResolvePlugins({0xc0006012c0?, 0x1, 0x0?}, {0x9bfe38?, 0xc000100800}, {0xc00021e000, 0xc0000105b8, 0xc0000105d0, 0xc0000801a0, 0x4, ...}, ...)
        github.com/evanw/esbuild/internal/bundler/bundler.go:742 +0x955
github.com/evanw/esbuild/internal/bundler.parseFile({{0x9c2170, 0xc00003e280}, {0xc00021e000, 0xc0000105b8, 0xc0000105d0, 0xc0000801a0, 0x4, 0xc0002abbf0}, {0x9bfe38, 0xc000100800}, ...})
        github.com/evanw/esbuild/internal/bundler/bundler.go:406 +0x2410
created by github.com/evanw/esbuild/internal/bundler.(*scanner).maybeParseFile
        github.com/evanw/esbuild/internal/bundler/bundler.go:1257 +0x9ab

goroutine 7122 [chan receive]:
github.com/evanw/esbuild/internal/bundler.(*scanner).scanAllDependencies(0xc000100c00)
        github.com/evanw/esbuild/internal/bundler/bundler.go:1656 +0x218
github.com/evanw/esbuild/internal/bundler.ScanBundle({_, _, _, _, _, _}, {_, _}, {0x9bfe38, 0xc000100800}, ...)
        github.com/evanw/esbuild/internal/bundler/bundler.go:1138 +0x7d9
github.com/evanw/esbuild/pkg/api.rebuildImpl({0x2, 0x4, 0x0, 0xc0002ab8f0, 0x1, {0x0, 0x0}, 0x0, 0x0, {0x0, ...}, ...}, ...)
        github.com/evanw/esbuild/pkg/api/api_impl.go:1066 +0x220c
github.com/evanw/esbuild/pkg/api.buildImpl({0x2, 0x4, 0x0, 0xc0002ab8f0, 0x1, {0x0, 0x0}, 0x0, 0x0, {0x0, ...}, ...})
        github.com/evanw/esbuild/pkg/api/api_impl.go:799 +0x62b
github.com/evanw/esbuild/pkg/api.Build(...)
        github.com/evanw/esbuild/pkg/api/api.go:364
main.(*serviceType).handleBuildRequest(0xc00003e300, 0x46?, 0xc0002ab410?)
        github.com/evanw/esbuild/cmd/esbuild/service.go:501 +0x978
main.(*serviceType).handleIncomingPacket(0x0?, {0xc0001f6000?, 0x0?, 0x0?})
        github.com/evanw/esbuild/cmd/esbuild/service.go:232 +0x191
main.runService.func3()
        github.com/evanw/esbuild/cmd/esbuild/service.go:163 +0x32
created by main.runService
        github.com/evanw/esbuild/cmd/esbuild/service.go:162 +0x450
[1]    11705 segmentation fault (core dumped)  npm run dev

evanw (Owner) commented Aug 19, 2022

Others have encountered something like this in the past: #1842. Unfortunately I can't reproduce this myself, and I was unable to fix it. Marking as unactionable because I likely won't be able to do anything with this report.


jfirebaugh commented Nov 1, 2022

I haven't been able to come up with a reproduction of the deadlock, but I did theorycraft a data race and confirmed it via -race. The data race happens when service.go discovers that stdin has reached EOF at the same moment that the sendPings goroutine sends a ping. In that case, service.keepAliveWaitGroup.Add(1) races with service.keepAliveWaitGroup.Wait(). This race is apparently what the documentation for func (*WaitGroup) Add is referring to when it says:

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait.

It is also discussed in this thread.
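
To make the failure mode concrete, here is a minimal standalone sketch of the pattern (my own illustration, not esbuild's actual code): one goroutine repeatedly calls Add(1) on a WaitGroup whose counter may be zero while the main goroutine may already be blocked in Wait(), which is exactly the usage the documentation forbids. Under go run -race this should eventually produce a report like the one below, although it can take many runs since the interleaving is timing-dependent; depending on timing and Go version it may instead panic with "sync: WaitGroup misuse: Add called concurrently with Wait".

package main

import (
	"sync"
	"time"
)

func main() {
	var keepAliveWaitGroup sync.WaitGroup

	// Stand-in for the sendPings goroutine: it bumps the counter without
	// knowing whether the main goroutine has already entered Wait().
	go func() {
		for {
			keepAliveWaitGroup.Add(1) // races with Wait() when the counter is zero
			time.Sleep(time.Millisecond)
			keepAliveWaitGroup.Done()
		}
	}()

	// Stand-in for runService reaching EOF on stdin: the counter may be
	// zero here, so this Wait() can race with the Add(1) above.
	time.Sleep(time.Millisecond)
	keepAliveWaitGroup.Wait()
}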

Here is the race detector report:

==================
WARNING: DATA RACE
Write at 0x00c0001382a0 by main goroutine:
  runtime.racewrite()
      <autogenerated>:1 +0x10
  main.runService()
      github.com/evanw/esbuild/cmd/esbuild/service.go:188 +0x5d4
  main.main()
      github.com/evanw/esbuild/cmd/esbuild/main.go:208 +0x1f0

Previous read at 0x00c0001382a0 by goroutine 8:
  runtime.raceread()
      <autogenerated>:1 +0x10
  main.(*serviceType).sendRequest()
      github.com/evanw/esbuild/cmd/esbuild/service.go:208 +0xec
  main.runService.func2()
      github.com/evanw/esbuild/cmd/esbuild/service.go:139 +0x40

Goroutine 8 (running) created at:
  main.runService()
      github.com/evanw/esbuild/cmd/esbuild/service.go:135 +0x3d4
  main.main()
      github.com/evanw/esbuild/cmd/esbuild/main.go:208 +0x1f0
==================

I generated this by building esbuild (at the v0.14.38 tag -- the version we are currently using) with the following patch:

diff --git a/Makefile b/Makefile
index 87093c61..6f44e7aa 100644
--- a/Makefile
+++ b/Makefile
@@ -7,7 +7,7 @@ GO_FLAGS += "-ldflags=-s -w"
 GO_FLAGS += -trimpath
 
 esbuild: version-go cmd/esbuild/*.go pkg/*/*.go internal/*/*.go go.mod
-	CGO_ENABLED=0 go build $(GO_FLAGS) ./cmd/esbuild
+	CGO_ENABLED=1 go build $(GO_FLAGS) -race ./cmd/esbuild
 
 test:
 	@$(MAKE) --no-print-directory -j6 test-common
diff --git a/cmd/esbuild/service.go b/cmd/esbuild/service.go
index 96f18ba0..cf6d53c0 100644
--- a/cmd/esbuild/service.go
+++ b/cmd/esbuild/service.go
@@ -10,6 +10,7 @@ import (
 	"fmt"
 	"io"
 	"io/ioutil"
+	"log"
 	"os"
 	"regexp"
 	"sync"
@@ -65,6 +66,7 @@ func (service *serviceType) trackActiveBuild(key int, activeBuild *activeBuild)
 		service.activeBuilds[key] = activeBuild
 
 		// This pairs with "Done()" in "decRefCount"
+		log.Print("service.keepAliveWaitGroup.Add(1) (trackActiveBuild)")
 		service.keepAliveWaitGroup.Add(1)
 	}
 }
@@ -85,6 +87,7 @@ func (service *serviceType) decRefCount(key int, activeBuild *activeBuild) {
 		service.mutex.Unlock()
 
 		// This pairs with "Add()" in "trackActiveBuild"
+		log.Print("service.keepAliveWaitGroup.Done() (decRefCount)")
 		service.keepAliveWaitGroup.Done()
 	}
 }
@@ -94,6 +97,8 @@ type outgoingPacket struct {
 }
 
 func runService(sendPings bool) {
+	log.Print("runService")
+
 	logger.API = logger.JSAPI
 
 	service := serviceType{
@@ -114,6 +119,7 @@ func runService(sendPings bool) {
 			if _, err := os.Stdout.Write(packet.bytes); err != nil {
 				os.Exit(1) // I/O error
 			}
+			log.Print("service.keepAliveWaitGroup.Done() (packet writer)")
 			service.keepAliveWaitGroup.Done() // This pairs with the "Add()" when putting stuff into "outgoingPackets"
 		}
 	}()
@@ -129,6 +135,8 @@ func runService(sendPings bool) {
 		go func() {
 			for {
 				time.Sleep(1 * time.Second)
+				log.Print("ping")
+
 				service.sendRequest(map[string]interface{}{
 					"command": "ping",
 				})
@@ -158,13 +166,16 @@ func runService(sendPings bool) {
 
 			// Clone the input and run it on another goroutine
 			clone := append([]byte{}, packet...)
+			log.Print("service.keepAliveWaitGroup.Add(1) (loop 1)")
 			service.keepAliveWaitGroup.Add(1)
 			go func() {
 				out := service.handleIncomingPacket(clone)
 				if out.bytes != nil {
+					log.Print("service.keepAliveWaitGroup.Add(1) (loop 2)")
 					service.keepAliveWaitGroup.Add(1) // The writer thread will call "Done()"
 					service.outgoingPackets <- out
 				}
+				log.Print("service.keepAliveWaitGroup.Done() (loop 3)")
 				service.keepAliveWaitGroup.Done() // This pairs with the "Add()" in the stdin thread
 			}()
 		}
@@ -174,7 +185,9 @@ func runService(sendPings bool) {
 	}
 
 	// Wait for the last response to be written to stdout
+	log.Print("service.keepAliveWaitGroup.Wait()")
 	service.keepAliveWaitGroup.Wait()
+	log.Print("exiting")
 }
 
 func (service *serviceType) sendRequest(request interface{}) interface{} {
@@ -192,6 +205,7 @@ func (service *serviceType) sendRequest(request interface{}) interface{} {
 		service.callbacks[id] = callback
 		return id
 	}()
+	log.Print("service.keepAliveWaitGroup.Add(1) (sendRequest)")
 	service.keepAliveWaitGroup.Add(1) // The writer thread will call "Done()"
 	service.outgoingPackets <- outgoingPacket{
 		bytes: encodePacket(packet{
diff --git a/lib/npm/node.ts b/lib/npm/node.ts
index eee195e6..20b646ad 100644
--- a/lib/npm/node.ts
+++ b/lib/npm/node.ts
@@ -269,6 +269,11 @@ let ensureServiceIsRunning = (): Service => {
   // Assume the service was stopped if we get an error writing to stdin
   child.stdin.on('error', afterClose);
 
+  setTimeout(() => {
+    console.log("closing stdin")
+    child.stdin.end()
+  }, 993)
+
   // Propagate errors about failure to run the executable itself
   child.on('error', afterClose);

The value 993 was chosen by trial and error to most closely synchronize closing stdin with the first ping. After running this node script in a loop for a while, I got the race detector warning:

const esbuild = require('esbuild')
esbuild.initialize({}) // spawns the long-lived esbuild service child process
setTimeout(() => process.exit(1), 1000) // exit abruptly ~1s in, right around the first ping

My hunch is that without -race, the data race can manifest as a deadlock instead, although I haven't been able to reproduce that.
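
For what it's worth, the original trace at the top of this issue is consistent with that hunch: goroutine 7 is blocked in sendRequest (service.go:203) waiting for a response that can never arrive once stdin has closed, while the main goroutine is blocked in Wait(). A minimal sketch of that shape (again my own illustration, not esbuild's code) deadlocks the same way:

package main

import "sync"

func main() {
	var keepAliveWaitGroup sync.WaitGroup
	response := make(chan struct{}) // stand-in for a pending request's callback channel

	keepAliveWaitGroup.Add(1)
	go func() {
		defer keepAliveWaitGroup.Done()
		<-response // like sendRequest at service.go:203: the reply never arrives
	}()

	// Like runService after stdin EOF: wait for in-flight work to drain.
	// The goroutine above can never call Done(), so the runtime reports
	// "fatal error: all goroutines are asleep - deadlock!".
	keepAliveWaitGroup.Wait()
}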

@yabab-dev

I don't know if it helps, but I have the same issue here with code like this:

await esbuild.build(/* ... */);
process.exit(0);

If I add a timeout before exiting, the problem is gone:

await esbuild.build(/* ... */);
setTimeout(() => process.exit(0), 100);

evanw (Owner) commented Nov 3, 2022

@jfirebaugh Thanks very much for the detailed post. I wasn't aware of this issue with WaitGroup. I'll work on removing the data race.

evanw added a commit that referenced this issue Nov 3, 2022
evanw removed the unactionable label Nov 3, 2022
evanw (Owner) commented Nov 3, 2022

Just pushed a potential fix for this issue. I'm going to consider this fixed and close this issue since there isn't enough information to verify the fix. We can reopen this issue if it's still happening after the next version of esbuild is released with the fix.

evanw (Owner) commented Dec 8, 2022

My attempt to fix this didn't work: #2727. Specifically, preventing esbuild from calling Add concurrently with Wait disabled the keep-alive ping of the parent process, which meant esbuild no longer automatically exited when the parent process disappeared.
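
The trace in the next comment hints at where the fix eventually went: Wait() there lives in internal/helpers.ThreadSafeWaitGroup and blocks on a channel receive rather than on sync.WaitGroup's semaphore. As a rough sketch of the general idea (an illustration only, not esbuild's actual implementation), a wait group built on a mutex and condition variable keeps Add() safe to call at any time, even while another goroutine is blocked in Wait():

package helpers

import "sync"

// ThreadSafeWaitGroup sketch: unlike sync.WaitGroup, Add may be called at
// any time, including while another goroutine is blocked in Wait.
type ThreadSafeWaitGroup struct {
	mutex   sync.Mutex
	cond    *sync.Cond
	counter int
}

func NewThreadSafeWaitGroup() *ThreadSafeWaitGroup {
	wg := &ThreadSafeWaitGroup{}
	wg.cond = sync.NewCond(&wg.mutex)
	return wg
}

func (wg *ThreadSafeWaitGroup) Add(delta int) {
	wg.mutex.Lock()
	defer wg.mutex.Unlock()
	wg.counter += delta
	if wg.counter == 0 {
		wg.cond.Broadcast() // wake any goroutine blocked in Wait
	}
}

func (wg *ThreadSafeWaitGroup) Done() {
	wg.Add(-1)
}

func (wg *ThreadSafeWaitGroup) Wait() {
	wg.mutex.Lock()
	defer wg.mutex.Unlock()
	for wg.counter != 0 {
		wg.cond.Wait()
	}
}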


eyalch commented Apr 21, 2023

@evanw Not sure if it's helpful, but I think I was able to reproduce this issue by using the React integration of Astro and running the astro check command.

Output

~/projects/github-vh44xg
❯ npm run astro -- check
$ astro check
12:12:33 AM [check] Checking files
12:12:34 AM [content] No content directory found. Skipping type generation.
✔  Getting diagnostics for Astro files in /home/projects/github-vh44xg/…
12:12:35 AM [diagnostics] Result (1 file): 
- 0 errors
- 0 warnings
- 0 hints


~/projects/github-vh44xg 3s
❯ fatal error: too many writes on closed pipe

goroutine 6 [running]:
runtime.throw({0x99d47, 0x1e})
        runtime/panic.go:1047 +0x3 fp=0x838ed8 sp=0x838eb0 pc=0x12250003
os.sigpipe()
        runtime/os_js.go:144 +0x2 fp=0x838ef0 sp=0x838ed8 pc=0x13b70002
os.epipecheck(...)
        os/file_unix.go:205
os.(*File).Write(0x80c020, {0x86a480, 0x21, 0x40})
        os/file.go:183 +0x2d fp=0x838f78 sp=0x838ef0 pc=0x1605002d
main.runService.func1()
        github.com/evanw/esbuild/cmd/esbuild/service.go:99 +0x7 fp=0x838fe0 sp=0x838f78 pc=0x1f040007
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x838fe8 sp=0x838fe0 pc=0x14070001
created by main.runService
        github.com/evanw/esbuild/cmd/esbuild/service.go:97 +0x1e

goroutine 1 [chan receive]:
runtime.gopark(0xb0e50, 0x832178, 0xe, 0x17, 0x2)
        runtime/proc.go:381 +0x28 fp=0x88dbc8 sp=0x88dba0 pc=0x124c0028
runtime.chanrecv(0x832120, 0x0, 0x1)
        runtime/chan.go:583 +0x7f fp=0x88dc50 sp=0x88dbc8 pc=0x106d007f
runtime.chanrecv1(0x832120, 0x0)
        runtime/chan.go:442 +0x2 fp=0x88dc78 sp=0x88dc50 pc=0x106b0002
github.com/evanw/esbuild/internal/helpers.(*ThreadSafeWaitGroup).Wait(...)
        github.com/evanw/esbuild/internal/helpers/waitgroup.go:36
main.runService.func2()
        github.com/evanw/esbuild/cmd/esbuild/service.go:114 +0x8 fp=0x88dc98 sp=0x88dc78 pc=0x1f030008
main.runService(0x1)
        github.com/evanw/esbuild/cmd/esbuild/service.go:160 +0x5c fp=0x88dde0 sp=0x88dc98 pc=0x1f01005c
main.main()
        github.com/evanw/esbuild/cmd/esbuild/main.go:236 +0x9e fp=0x88df88 sp=0x88dde0 pc=0x1efa009e
runtime.main()
        runtime/proc.go:250 +0x32 fp=0x88dfe0 sp=0x88df88 pc=0x12460032
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x88dfe8 sp=0x88dfe0 pc=0x14070001

goroutine 2 [force gc (idle)]:
runtime.gopark(0xb0fe8, 0x3aa480, 0x11, 0x14, 0x1)
        runtime/proc.go:381 +0x28 fp=0x826fb8 sp=0x826f90 pc=0x124c0028
runtime.goparkunlock(...)
        runtime/proc.go:387
runtime.forcegchelper()
        runtime/proc.go:305 +0x1f fp=0x826fe0 sp=0x826fb8 pc=0x1249001f
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x826fe8 sp=0x826fe0 pc=0x14070001
created by runtime.init.5
        runtime/proc.go:293 +0x2

goroutine 3 [GC sweep wait]:
runtime.gopark(0xb0fe8, 0x3aa800, 0xc, 0x14, 0x1)
        runtime/proc.go:381 +0x28 fp=0x827798 sp=0x827770 pc=0x124c0028
runtime.goparkunlock(...)
        runtime/proc.go:387
runtime.bgsweep(0x82a000)
        runtime/mgcsweep.go:278 +0xf fp=0x8277d0 sp=0x827798 pc=0x1179000f
runtime.gcenable.func1()
        runtime/mgc.go:178 +0x2 fp=0x8277e0 sp=0x8277d0 pc=0x110d0002
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x8277e8 sp=0x8277e0 pc=0x14070001
created by runtime.gcenable
        runtime/mgc.go:178 +0x8

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xb0fe8, 0x3aaa40, 0xd, 0x14, 0x2)
        runtime/proc.go:381 +0x28 fp=0x827f80 sp=0x827f58 pc=0x124c0028
runtime.goparkunlock(...)
        runtime/proc.go:387
runtime.(*scavengerState).park(0x3aaa40)
        runtime/mgcscavenge.go:400 +0xd fp=0x827fa8 sp=0x827f80 pc=0x1160000d
runtime.bgscavenge(0x82a000)
        runtime/mgcscavenge.go:628 +0x4 fp=0x827fd0 sp=0x827fa8 pc=0x11650004
runtime.gcenable.func2()
        runtime/mgc.go:179 +0x2 fp=0x827fe0 sp=0x827fd0 pc=0x110c0002
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x827fe8 sp=0x827fe0 pc=0x14070001
created by runtime.gcenable
        runtime/mgc.go:179 +0xe

goroutine 5 [finalizer wait]:
runtime.gopark(0xb0e88, 0x3c76f0, 0x10, 0x14, 0x1)
        runtime/proc.go:381 +0x28 fp=0x826738 sp=0x826710 pc=0x124c0028
runtime.runfinq()
        runtime/mfinal.go:193 +0x1f fp=0x8267e0 sp=0x826738 pc=0x1104001f
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x8267e8 sp=0x8267e0 pc=0x14070001
created by runtime.createfing
        runtime/mfinal.go:163 +0xd

goroutine 7 [waiting]:
runtime.gopark(0x0, 0x0, 0x0, 0x0, 0x1)
        runtime/proc.go:381 +0x28 fp=0x828f90 sp=0x828f68 pc=0x124c0028
runtime.handleEvent()
        runtime/lock_js.go:257 +0x1b fp=0x828fe0 sp=0x828f90 pc=0x10a3001b
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x828fe8 sp=0x828fe0 pc=0x14070001
created by runtime.beforeIdle
        runtime/lock_js.go:207 +0x1a

goroutine 8 [chan receive]:
runtime.gopark(0xb0e50, 0x8321d8, 0xe, 0x17, 0x2)
        runtime/proc.go:381 +0x28 fp=0x839ea0 sp=0x839e78 pc=0x124c0028
runtime.chanrecv(0x832180, 0x839f88, 0x1)
        runtime/chan.go:583 +0x7f fp=0x839f28 sp=0x839ea0 pc=0x106d007f
runtime.chanrecv1(0x832180, 0x839f88)
        runtime/chan.go:442 +0x2 fp=0x839f50 sp=0x839f28 pc=0x106b0002
main.(*serviceType).sendRequest(0x80fec0, {0x44500, 0x8ea4e0})
        github.com/evanw/esbuild/cmd/esbuild/service.go:192 +0xb fp=0x839fa0 sp=0x839f50 pc=0x1f06000b
main.runService.func3()
        github.com/evanw/esbuild/cmd/esbuild/service.go:125 +0x3 fp=0x839fe0 sp=0x839fa0 pc=0x1f020003
runtime.goexit()
        runtime/asm_wasm.s:399 +0x1 fp=0x839fe8 sp=0x839fe0 pc=0x14070001
created by main.runService
        github.com/evanw/esbuild/cmd/esbuild/service.go:122 +0x41

Copied from the StackBlitz terminal.

Steps to reproduce

  1. Create a new Astro project using the React template [1]
  2. Run npm run astro -- check.

To prove that it's the React integration that's causing this, open the astro.config.mjs file and comment out the React integration:

diff --git a/astro.config.mjs b/astro.config.mjs
--- a/astro.config.mjs
+++ b/astro.config.mjs
@@ -4,5 +4,5 @@ import react from '@astrojs/react';
 // https://astro.build/config
 export default defineConfig({
        // Enable React to support React JSX components.
-       integrations: [react()],
+       // integrations: [react()],
 });

Now the astro check command exits successfully:

~/projects/github-vh44xg
❯ npm run astro -- check
$ astro check
12:17:42 AM [check] Checking files
12:17:42 AM [content] No content directory found. Skipping type generation.
✔  Getting diagnostics for Astro files in /home/projects/github-vh44xg/…
12:17:43 AM [diagnostics] Result (1 file): 
- 0 errors
- 0 warnings
- 0 hints


~/projects/github-vh44xg 3s
❯ 

Footnotes

  [1] I was able to reproduce the issue with a local project and also via StackBlitz, so feel free to use the React template in StackBlitz.
