
fatal error: all goroutines are asleep - deadlock! #11

Open · tonyffrench opened this issue Jun 28, 2018 · 7 comments

tonyffrench commented Jun 28, 2018
DRONE_MAX_PROCS is set to 1 on each of the servers, so this should not be a concurrency issue caused by multiple jobs running on the same host.

time="2018-06-28T04:00:25Z" level=info msg="Restoring cache from /cache/xxx/yyy/master.tar"
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.restoreCache(0xc4200147c0, 0x1e, 0x5a1d20, 0x683d00, 0x5a1760, 0x683d00, 0x0, 0x0)
	/go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:65 +0x26d
github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.Cache.Restore(0x5a1d20, 0x683d00, 0x5a1760, 0x683d00, 0xc4200147c0, 0x1e, 0xc4200147e0, 0x1e, 0xc42001c0b0, 0xa)
	/go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:36 +0x81
main.(*plugin).Exec(0xc4200ad920, 0x2, 0x2)
	/go/src/github.com/drone-plugins/drone-volume-cache/plugin.go:46 +0x2b7
main.run(0xc42008c160, 0xc42008c160, 0xc420049997)
	/go/src/github.com/drone-plugins/drone-volume-cache/main.go:136 +0x406
github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli.HandleAction(0x54e000, 0x58c860, 0xc42008c160, 0xc42005c0c0, 0x0)
	/go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli/app.go:490 +0xc8
github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli.(*App).Run(0xc4200881a0, 0xc42000e190, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli/app.go:264 +0x5ee
main.main()
	/go/src/github.com/drone-plugins/drone-volume-cache/main.go:90 +0xadc

goroutine 5 [select]:
io.(*pipe).Write(0xc42007e500, 0xc4200b4000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
	/usr/local/go/src/io/pipe.go:87 +0x1e3
io.(*PipeWriter).Write(0xc42000c040, 0xc4200b4000, 0x8000, 0x8000, 0x8000, 0x0, 0x0)
	/usr/local/go/src/io/pipe.go:153 +0x4c
io.copyBuffer(0x5a1140, 0xc42000c040, 0x5a1160, 0xc42000c048, 0xc4200b4000, 0x8000, 0x8000, 0xc420035f18, 0x18, 0xc420035f10)
	/usr/local/go/src/io/io.go:402 +0x240
io.Copy(0x5a1140, 0xc42000c040, 0x5a1160, 0xc42000c048, 0x0, 0xc420035f68, 0x18)

tonyffrench (Author) commented

It's possible this was caused by a disk-space issue. Better error handling and logging would be preferable.

mvdan commented Dec 11, 2018

We ran into this, but our disk is far from full:

Filesystem      Size  Used Avail Use% Mounted on
udev            2.0G     0  2.0G   0% /dev
tmpfs           396M   41M  355M  11% /run
/dev/vda1        78G   38G   40G  49% /

The rest of our caches work fine, but this one repository keeps on crashing when restoring the cache.

mvdan commented Dec 11, 2018

I deleted the tar file from the host machine and restarted the build twice, and cache restoring is working again. Perhaps the tar file had been corrupted somehow.
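
If anyone wants to test the corruption theory before deleting anything, a small standalone Go program can walk the archive with the standard archive/tar package and report where it stops reading cleanly. This is just a sketch, not part of the plugin; the path mirrors the redacted one in the log above, and io.Discard needs Go 1.16+:

package main

import (
	"archive/tar"
	"fmt"
	"io"
	"os"
)

func main() {
	// Placeholder path, matching the redacted one from the log above.
	f, err := os.Open("/cache/xxx/yyy/master.tar")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	tr := tar.NewReader(f)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			fmt.Println("archive reads cleanly")
			return
		}
		if err != nil {
			fmt.Fprintf(os.Stderr, "bad header: %v\n", err)
			os.Exit(1)
		}
		// Drain each entry so truncation inside a file body surfaces too.
		if _, err := io.Copy(io.Discard, tr); err != nil {
			fmt.Fprintf(os.Stderr, "bad entry %q: %v\n", hdr.Name, err)
			os.Exit(1)
		}
	}
}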

mvdan commented Jan 16, 2019

We've encountered this bug two more times in the last few weeks. It stalls development, which is a tad annoying. The fact that it only happens sometimes seems to point at a race condition somewhere.

I can't upload the tar files, though, as they contain private code. One possible explanation is that we're using this plugin on a VM, and we run up to two CI jobs at a time. So perhaps two jobs could end up writing the same tar file on the host filesystem at the same time, potentially producing a bad tar file.

Still, that shouldn't explain a deadlock. And the OP is running only one concurrent job at a time.
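
For what it's worth, the trace above does suggest a mechanism: goroutine 1 is parked in a channel receive at cache.go:65 while the producer goroutine is parked inside io.Pipe's Write. That is the shape you get when the reading side of a pipe gives up early, say on a corrupt tar, without closing its end. Here is a minimal self-contained sketch of that pattern, as an illustration only, not the plugin's actual code:

package main

import "io"

func main() {
	pr, pw := io.Pipe()
	done := make(chan error)

	// Producer, analogous to localCache.Get streaming the tar into the pipe.
	go func() {
		buf := make([]byte, 32*1024)
		for {
			// io.Pipe is unbuffered: Write blocks until a reader
			// consumes all the bytes or the pipe is closed.
			if _, err := pw.Write(buf); err != nil {
				done <- err
				return
			}
		}
	}()

	// Consumer, analogous to the tar extractor: it reads one header's
	// worth of bytes, decides the archive is corrupt, and bails out
	// WITHOUT calling pr.CloseWithError, leaving the producer stuck.
	var hdr [512]byte
	_, _ = io.ReadFull(pr, hdr[:])

	// The producer can never finish its Write, so it never sends on
	// done, and the runtime reports:
	// fatal error: all goroutines are asleep - deadlock!
	<-done
}

If every exit path of the reader called pr.CloseWithError(err), the stuck Write would return that error instead, and the failure would surface as an ordinary error message rather than a deadlock.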

tboerger (Contributor) commented

But even then it shouldn't result in a deadlock. Are you sure it's not a space issue?

mvdan commented Jan 16, 2019

As said before, not even close:

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        78G   32G   46G  42% /

For completeness, here's the panic we got yesterday:

time="2019-01-16T05:25:19Z" level=info msg="Restoring cache from /cache/brankas-scraper.tar"
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.restoreCache(0xc420076040, 0x1a, 0x5a1d20, 0x683d00, 0x5a1760, 0x683d00, 0x0, 0x0)
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:65 +0x26d
github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.Cache.Restore(0x5a1d20, 0x683d00, 0x5a1760, 0x683d00, 0xc420076040, 0x1a, 0xc420070060, 0x2f, 0xc420074060, 0x10)
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:36 +0x81
main.(*plugin).Exec(0xc420459920, 0x2, 0x2)
    /go/src/github.com/drone-plugins/drone-volume-cache/plugin.go:46 +0x2b7
main.run(0xc420090160, 0xc420090160, 0xc420049997)
    /go/src/github.com/drone-plugins/drone-volume-cache/main.go:136 +0x406
github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli.HandleAction(0x54e000, 0x58c860, 0xc420090160, 0xc420088060, 0x0)
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli/app.go:490 +0xc8
github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli.(*App).Run(0xc42008a1a0, 0xc420068170, 0x1, 0x1, 0x0, 0x0)
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/urfave/cli/app.go:264 +0x5ee
main.main()
    /go/src/github.com/drone-plugins/drone-volume-cache/main.go:90 +0xadc

goroutine 18 [select]:
io.(*pipe).Write(0xc42007a500, 0xc4200bda00, 0x6600, 0x6600, 0x1a00, 0x0, 0x0)
    /usr/local/go/src/io/pipe.go:87 +0x1e3
io.(*PipeWriter).Write(0xc420078030, 0xc4200bc000, 0x8000, 0x8000, 0x8000, 0x0, 0x0)
    /usr/local/go/src/io/pipe.go:153 +0x4c
io.copyBuffer(0x5a1140, 0xc420078030, 0x5a1160, 0xc420078038, 0xc4200bc000, 0x8000, 0x8000, 0xc420030f18, 0x18, 0xc420030f10)
    /usr/local/go/src/io/io.go:402 +0x240
io.Copy(0x5a1140, 0xc420078030, 0x5a1160, 0xc420078038, 0x0, 0xc420030f68, 0x18)
    /usr/local/go/src/io/io.go:362 +0x5a
main.(*localCache).Get(0x683d00, 0xc420076040, 0x1a, 0x5a1140, 0xc420078030, 0x0, 0x0)
    /go/src/github.com/drone-plugins/drone-volume-cache/local.go:20 +0xa8
github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.restoreCache.func1(0xc420078030, 0xc4200880c0, 0x5a1d20, 0x683d00, 0xc420076040, 0x1a)
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:61 +0x88
created by github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache.restoreCache
    /go/src/github.com/drone-plugins/drone-volume-cache/vendor/github.com/drone/drone-cache-lib/cache/cache.go:58 +0x1fb

mvdan commented Jan 9, 2020

In case this is useful to others, we ended up implementing caching ourselves from scratch. All it took was a tiny Docker image and a few lines of shell:

$ cat images/alpine-tar/Dockerfile 
FROM alpine:3.11.0

# Make sure we have GNU tar too; by default only busybox's tar is installed.
RUN apk add --no-cache tar zstd

$ cat .drone.yml # a shortened example
kind: pipeline
name: default

volumes:
- name: cache
  host:
    path: /root/drone/build-cache

steps:
- name: restore-cache
  image: YOUR-REGISTRY/alpine-tar:3.11.0
  environment:
    CACHE_FILE: /cache/${DRONE_REPO_NAME}.tar.zst
  commands:
    - mkdir -p .cache/
    - if test -e $CACHE_FILE; then tar -C .cache/ -xf $CACHE_FILE; fi
  volumes:
    - name: cache
      path: /cache

[steps that use the cache at .cache/]

- name: rebuild-cache
  depends_on: [YOUR-STEPS]
  image: YOUR-REGISTRY/alpine-tar:3.11.0
  environment:
    CACHE_FILE: /cache/${DRONE_REPO_NAME}.tar.zst
    TEMP_CACHE_FILE: /cache/${DRONE_REPO_NAME}-tmp-${DRONE_BUILD_NUMBER}.tar.zst
  commands:
    - tar -C .cache/ -caf $TEMP_CACHE_FILE .
    - mv $TEMP_CACHE_FILE $CACHE_FILE # use an atomic write
  volumes:
    - name: cache
      path: /cache

This has several advantages over this plugin, or really any other "volume cache" plugin I could find:

  • It uses a .tar.zst file, so it compresses well without using much CPU; that seems like a good balance between CPU time and disk speed.
  • Its writes are atomic, so even if multiple rebuilds happen at the same time, one wins without corruption. We've also seen no deadlocks like the one in this issue in almost a year.
  • It's highly configurable by design, since you can change the few lines of shell as you wish.

We've also seen a moderate speed-up when moving from drone-volume-cache to this method, presumably due to the compression. For example, restores went down from ~25s to ~15s on average, while the decompressed size stayed the same.
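
If anyone wants the same atomic-replace trick outside of shell, it boils down to rename(2) semantics. A rough Go sketch, where the paths and the suffix scheme are made up for illustration, mirroring the build-number suffix in the YAML above:

package main

import (
	"fmt"
	"os"
)

// writeCacheAtomically writes the archive to a temporary file and then
// renames it over the final path. os.Rename is atomic when both paths
// are on the same filesystem, so a concurrent restore sees either the
// old archive or the new one, never a torn write.
func writeCacheAtomically(data []byte, finalPath, uniqueSuffix string) error {
	tmp := finalPath + "-tmp-" + uniqueSuffix // same filesystem as finalPath
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, finalPath)
}

func main() {
	// Illustrative call; in CI the suffix would be something unique per
	// build, like the build number used in the pipeline above.
	if err := writeCacheAtomically([]byte("archive bytes"), "/cache/example.tar.zst", "42"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}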
