How to quickly build and run images -- seeking help #401

Closed
mlissner opened this issue Jul 3, 2021 · 3 comments
mlissner commented Jul 3, 2021

I'm trying to do something pretty simple as quickly as possible:

  • Build images
  • Launch them with docker compose
  • Run tests against them

If I'm not using this action, I just build the images (2 minutes), launch them (30s), and run tests (2 minutes) for a total of about 5 minutes.

Using this action, I've tried a couple different things:

  1. Using the load parameter, I build the images, they get exported, and then they get run by docker compose. This works, but the export is surprisingly slow and I don't understand why it's necessary — it's not something my local machine does.

    Verdict: Works, but 11 minutes total. Nine minutes for docker stuff, and two to run tests.

  2. Using the push parameter, I can push the built images to Docker Hub, and then docker compose downloads them again from there. This works, but uploading and then downloading the images is wasteful. Weirdly, it's faster than using the load parameter.

    I also don't love this because I don't actually want to push the images, and I'd prefer not to log into Docker Hub (GitHub Secrets are no longer available to people outside your organization, so if I do this, all outside contributions fail to log into Docker Hub).

    Verdict: Works, except for outside contributors. 10 minutes total. Eight for docker stuff, two for tests. Seems to be the best option.

  3. Using the caching approach seems promising — maybe I can build the images and they'll get cached and that'll be good. I must not be understanding this properly though, because I never get cache hits. The cache key in the link is key: ${{ runner.os }}-buildx-${{ github.sha }}, so maybe this is just a cache within each run of the action, not one that lasts across runs? This is really unclear to me.

    Verdict: Works, but no clear speed improvement. Hm.

  4. I saw that there's an output type of docker. Maybe that is what build would normally do, since load seems to do some strange exporting thing? I haven't figured out how to use this, and the docs seem to say that the only output that is "available" is digest.

    Verdict: Didn't try this yet because the docs seem to say it won't work anyway, and I'm a bit exhausted.
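For reference, the load-based variant (option 1) boils down to something like this in a workflow. The tag, Dockerfile path, and test command here are illustrative, not the project's actual values:

```yaml
      - uses: docker/setup-buildx-action@v1
      - name: Build image and load it into the local Docker daemon
        uses: docker/build-push-action@v2
        with:
          file: docker/django/Dockerfile
          load: true                    # export the build result to `docker images`
          tags: myimage:latest
      - name: Launch and test
        run: |
          docker-compose up -d
          docker-compose run --rm web ./run-tests.sh   # hypothetical test entrypoint
```

The `load: true` line is what triggers the OCI export/import discussed below.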

Finally, and perhaps separately, my docker compose uses a number of images that I don't build, and I can't figure out how to cache those. That'd be nice to accomplish somehow too.

I'd love some help here. The use case (build, launch, run) seems really common, but somehow it doesn't work as simply as I'd expect.

All the various approaches to this are here: https://github.com/freelawproject/courtlistener/actions/workflows/tests.yml?query=branch%3Afaster-docker

Thanks for all you do and for any help you can provide.

crazy-max (Member) commented:

@mlissner

1. Using the `load` parameter, I build the images, they get exported, and then they get run by docker compose. This works, but the export is surprisingly slow and I don't understand why it's necessary — it's not something my local machine does.
   **Verdict:** Works, but 11 minutes total. Nine minutes for docker stuff, and two to run tests.

`load` is the right way to do it for your use case, imo.

the export is surprisingly slow

If you're talking about these steps:

    #15 exporting to oci image format
    #15 sha256:69a2560eef4d3ece902c3b5149d142e9bd132f25db0bc9e35b94201534c415d2
    #15 exporting layers
    #15 exporting layers 62.6s done
    #15 exporting manifest sha256:18318be9357452c6d157e622f07db462c8af25ebb5ea621bd90f491614235968 done
    #15 exporting config sha256:b2be8fc7e858110f93b0ceb1a471b633504d21a8fc95e540d1848ec0a5275f7e done
    #15 sending tarball
    #15 ...

    #16 importing to docker
    #16 sha256:608b79d189304045cd367eb8e79a778fe31a707ba277570f53af46adf1554ab5
    #16 DONE 30.4s

    #15 exporting to oci image format
    #15 sha256:69a2560eef4d3ece902c3b5149d142e9bd132f25db0bc9e35b94201534c415d2
    #15 sending tarball 39.9s done
    #15 DONE 102.4s

That's necessary (OCI export + import to docker). I think a more effective way than `load` would be to push to a local registry as shown here.

2\. Using the `push` parameter, I can push the built images to docker hub, then the docker compose will download them again from there. This works, but uploading then downloading the images is wasteful.

Use a local registry instead.
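A minimal sketch of the local-registry setup, following the usual pattern from the build-push-action docs (the image name is illustrative): a `registry:2` service runs on the runner, buildx is put on the host network so it can reach it, and the build pushes to `localhost:5000` instead of Docker Hub, so no credentials are needed.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    services:
      registry:
        image: registry:2
        ports:
          - 5000:5000
    steps:
      - uses: docker/setup-buildx-action@v1
        with:
          driver-opts: network=host   # let buildx reach the registry service
      - uses: docker/build-push-action@v2
        with:
          push: true
          tags: localhost:5000/myimage:latest
```

docker compose can then pull `localhost:5000/myimage:latest` directly, skipping both the OCI export/import and the round trip to Docker Hub.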

3\. Using the [caching](https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md) approach seems promising — maybe I can build the images and they'll get cached and that'll be good. I must not be understanding this properly though, because I never get cache hits. The cache key in the link is `key: ${{ runner.os }}-buildx-${{ github.sha }}`, so maybe this is just a cache _within_ each run of the action, not one that lasts across runs? This is really unclear to me.

As I can see in your workflow, you're using the same cache for two different Dockerfiles/contexts, which invalidates the cache for the next one. You should create one cache per Dockerfile/context and it should work. See also #153 (comment).

      - name: Cache Docker layers django
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-django-cache
          key: ${{ runner.os }}-buildx-django-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-django-
      - name: Cache Docker layers celery
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-celery-cache
          key: ${{ runner.os }}-buildx-celery-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-celery-
      - name: Build and push latest docker django image
        uses: docker/build-push-action@v2
        with:
          file: docker/django/Dockerfile
          tags: freelawproject/courtlistener-django:${{ github.sha }}
          cache-from: type=local,src=/tmp/.buildx-django-cache
          cache-to: type=local,dest=/tmp/.buildx-django-cache-new
      - name: Build and push latest docker celery image
        uses: docker/build-push-action@v2
        with:
          file: docker/task-server/Dockerfile
          tags: freelawproject/task-server:${{ github.sha }}
          cache-from: type=local,src=/tmp/.buildx-celery-cache
          cache-to: type=local,dest=/tmp/.buildx-celery-cache-new
      - name: Move caches
        run: |
          rm -rf /tmp/.buildx-django-cache
          mv /tmp/.buildx-django-cache-new /tmp/.buildx-django-cache
          rm -rf /tmp/.buildx-celery-cache
          mv /tmp/.buildx-celery-cache-new /tmp/.buildx-celery-cache
4\. I saw that there's an output type of `docker`. Maybe that is what `build` would normally do, since `load` seems to do some strange exporting thing? I haven't figured out how to use this, and the docs seem to say that the only output that is "available" is `digest`.

`load` is a shorthand for `--output=type=docker` and will automatically load the single-platform build result into `docker images`.
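In other words, these two invocations are equivalent (the image name is illustrative):

```shell
# Both build for the current platform and load the result into the local
# `docker images` store; neither touches a remote registry.
docker buildx build --load -t myimage:test .
docker buildx build --output=type=docker -t myimage:test .
```

So there is no separate `docker` output to opt into; it's exactly what `load` already does.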

Finally, and perhaps separately, my docker compose uses a number of images that I don't build, and I can't figure out how to cache those. That'd be nice to accomplish somehow too.

If you have some images you want available when the runner starts, you can use a self-hosted runner, or you could maybe use the GitHub Cache for that. You might also be interested in Stargz Snapshotter.
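One way the GitHub Cache idea could look, as a hedged sketch: save the third-party images to a tarball and cache it with actions/cache. The paths, cache key, and image names below are all assumptions for illustration.

```yaml
      - name: Cache third-party compose images
        uses: actions/cache@v2
        with:
          path: /tmp/compose-images.tar
          key: compose-images-${{ hashFiles('docker-compose.yml') }}
      - name: Restore images from cache
        run: |
          if [ -f /tmp/compose-images.tar ]; then
            docker load -i /tmp/compose-images.tar
          fi
      - name: Pull and save images on cache miss
        run: |
          if [ ! -f /tmp/compose-images.tar ]; then
            docker pull postgres:13   # illustrative image list
            docker pull redis:6
            docker save -o /tmp/compose-images.tar postgres:13 redis:6
          fi
```

Whether this beats pulling from Docker Hub depends on image size and registry latency; it mainly helps with rate limits and flaky pulls.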

mlissner commented Jul 7, 2021

Thanks @crazy-max, these notes are incredibly helpful. I appreciate everything you do and the time you gave to help me.

First, is there anything in this that you think is useful to the documentation? If so, I'd be happy to add it.

Second, a few notes about what I was able to do for others that might land here:

  1. I switched to using two caches. This is super messy, but it works and shaves minutes off my builds. Wonderful.

  2. After I got load working, I got the local registry working. That didn't actually help much vs using load.

  3. I didn't play with setting up a self-hosted runner. It would add expense and complexity we don't have time for unfortunately.

  4. I don't know where to begin with caching images required by docker compose and couldn't find anything on Google about it either. I'm not sure how much it'd help. Using the GitHub Cache for this would probably be faster than Docker Hub, but I can't imagine it'd be that much faster; both are just downloading things. I saw one forum post that said pulling from the GitHub Cache would be slower than pulling from Docker Hub. Seems dubious, but it's a data point.

  5. Stargz Snapshotter is magic. I put it on my list for later research. Holy cow.

Thank you again, @crazy-max. I think we can close this unless you have further comments you want to make.

crazy-max (Member) commented:

@mlissner

First, is there anything in this that you think is useful to the documentation? If so, I'd be happy to add it.

I plan to update the documentation for the new GitHub cache backend soon (see also moby/buildkit#1974 (comment)) which will drastically streamline the workflow.

I switched to using two caches. This is super messy, but it works and shaves minutes off my builds. Wonderful.

Will be less messy with the new GitHub cache backend (moby/buildkit#1974 (comment))
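With that backend, each build's cache configuration should collapse to a pair of `cache-from`/`cache-to` lines; the sketch below hedges on the exact syntax of the then-new `gha` cache exporter, using `scope` to keep the two images' caches separate:

```yaml
      - name: Build django image
        uses: docker/build-push-action@v2
        with:
          file: docker/django/Dockerfile
          cache-from: type=gha,scope=django
          cache-to: type=gha,scope=django,mode=max
```

No `actions/cache` steps, no cache-move workaround at the end of the job.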

After I got load working, I got the local registry working. That didn't actually help much vs using load.

Ok, thanks for your feedback; we'll see how we can improve that in the future. See also docker/buildx#626

I saw one forum post that said pulling from the GitHub Cache would be slower than pulling from Docker Hub. Seems dubious, but it's a data point.

I would say that it is possible, because everything in the GitHub Cache is downloaded from Azure Blob Storage as a single tarball and decompressed in the current API, whereas `docker pull` downloads each layer asynchronously.

I appreciate everything you do and the time you gave to help me.

With pleasure!
