[SPARK-39522][INFRA] Uses Docker image cache over a custom image in pyspark job #37005
Conversation
… for container based job

### What changes were proposed in this pull request?
This patch adds `GITHUB_WORKSPACE` to git's trusted `safe.directory` list for container-based jobs. There are 3 container-based jobs in the Spark infra:
- sparkr
- lint
- pyspark

### Why are the changes needed?
`git >= 2.35.2` (such as the latest dev Docker image in my case) fixes [CVE-2022-24765](https://github.blog/2022-04-12-git-security-vulnerability-announced/#cve-2022-24765), and the fix has been backported to the [latest Ubuntu git](actions/checkout#760 (comment)). `GITHUB_WORKSPACE` belongs to the Actions user (on the host), but the container user is `root` (root is required in our Spark jobs), which causes errors like:
```
fatal: unsafe repository ('/__w/spark/spark' is owned by someone else)
To add an exception for this directory, call:

	git config --global --add safe.directory /__w/spark/spark
```
The solution comes from `actions/checkout` (actions/checkout#760): add `GITHUB_WORKSPACE` to git's trusted `safe.directory` so the container user can check out successfully.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed. CI passed with the latest Ubuntu hosted runner; here is a simple e2e test for the pyspark job: #37005

Closes #37079 from Yikun/SPARK-39610.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
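The `safe.directory` fix above amounts to one extra workflow step run before any git command inside the container. A minimal sketch (the step name and placement are illustrative, not necessarily the exact lines merged in the PR):

```yaml
# Sketch: mark the checkout directory as trusted before git runs in the container.
# GITHUB_WORKSPACE is provided by the Actions runner; inside the container it
# resolves to a path like /__w/spark/spark that is owned by a different user.
- name: Add GITHUB_WORKSPACE to git trust safe.directory
  run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
```

Because `--global` config for `root` lives inside the container, this has no effect on the host runner's git configuration.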
Failure due to an unrelated error; I will retrigger after #37103 is merged.

Retriggered: https://github.com/Yikun/spark/runs/7225401037?check_suite_focus=true — the infra image works well; let's see the pyspark job, then it will be ready for review.
.github/workflows/build_and_test.yml
          -X DELETE
          -H "Accept: application/vnd.github+json"
          -H "Authorization: token ${{ secrets.GHCR_DEL }}"
          https://api.github.com/user/packages/container/${{ needs.infra-image.outputs.image_name }}
I think it's:
- a little bit flaky
- it also adds configuration complexity for developers: we have to add a PAT secret rather than use `GITHUB_TOKEN`
- it's not supported well by actions/delete-package-versions
- GitHub doesn't provide a delete-package-by-tag API
- there are some 3rd-party actions that support it, but Apache infra will say no
- keeping the image tag (no deletion) can also help with reproducing locally

So, I think `apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id }}` (a different tag for each PR) without any cleanup is better than `apache-spark-ci-image-${{ inputs.branch }}-${{ github.run_id }}` (a different package for each PR) with auto deletion. If developers want to delete the package for some reason, they can also easily delete the complete image in the GitHub page.
@HyukjinKwon So, I decided to remove `infra-image-post` and go back to using `apache-spark-ci-image:tag` without deletion.
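The per-run tag scheme settled on above can be sketched as a workflow fragment. A sketch only — the `infra-image` job name matches this PR, but the output wiring shown here is an assumption, not the merged lines:

```yaml
# Sketch: one shared package (apache-spark-ci-image), a distinct tag per
# branch + run, and no cleanup job. Old tags simply accumulate and can be
# deleted manually from the GitHub packages page if needed.
infra-image:
  name: Base image build
  runs-on: ubuntu-latest
  outputs:
    # Downstream jobs consume this URL via needs.infra-image.outputs.image_url
    image_url: ghcr.io/${{ github.repository_owner }}/apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id }}
```

Because `github.run_id` is unique per workflow run, concurrent PRs never clobber each other's image tags.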
Thanks for the explanation.
cc @HyukjinKwon @dongjoon-hyun Ready for review, thanks!
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
TODO: need a test for the schedule job before this is ready to go.
Case 1: The developer pushes a commit ==> OK (CI passed!)
- Login: `${{ github.actor }}`: `yikun`
- CI image URL: `yikun/apache-spark-ci-image:master-xxx`

Case 2: A commit is merged into apache/spark master ==> should be OK (the committer needs rights to push the apache package). According to link, this is inherited from repo rights.
- Login: `${{ github.actor }}`: such as `HyukjinKwon` (who merges the commit)
- CI image URL: `apache/apache-spark-ci-image:master-xxx`
- Local test: login, image

Case 3: Scheduled job (master) ==> should be OK (the latest user who created the schedule trigger needs rights to push the apache package), same rights as case 2.
- Login: `${{ github.actor }}`: such as `HyukjinKwon` (the latest user who created the schedule trigger will be the actor for the scheduled workflow)
- CI image URL: `apache/apache-spark-ci-image:master-xxx`
- Local test: login, image

Case 4: Scheduled job (branch-xxx) ==> OK
- Uses the branch image cache directly (`dongjoon/apache-spark-github-action-image:20220207`; in the future it is `spark/apache-spark-github-action-image-cache:branches-*`)

Case 5: The developer pushes a commit to branches ==> OK
Case 6: apache/spark merges a commit in branches ==> OK
- Uses the branch workflow; no change from this PR
- When branches are cut, it will be the same as cases 1 & 2

Do all Spark committers have rights to push the apache ghcr package (cases 2, 3)? From the cache image I can confirm @HyukjinKwon has rights, but I'm not sure whether there is any difference for other committers.
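The login step these cases exercise can be sketched in full. A hedged completion of the diff fragment quoted earlier — the `password` line is the standard `docker/login-action` pattern with `GITHUB_TOKEN`, not necessarily the exact line in the PR:

```yaml
# Sketch: authenticate to ghcr.io as whoever triggered the workflow.
# On a fork this pushes to <actor>/apache-spark-ci-image; on apache/spark
# it pushes to the apache package, so the actor needs push rights there.
- name: Login to GitHub Container Registry
  uses: docker/login-action@v2
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
```

This is what makes cases 1–3 behave differently: the image URL follows the actor's namespace while the credentials are always the run's own `GITHUB_TOKEN`.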
Yes, all other committers have the same permissions as me.
Let's get this in and see how it goes.
Merged to master.
Sure, I will monitor it over the coming days and update the results here.
        fetch-depth: 0
        repository: apache/spark
        ref: ${{ inputs.branch }}
    - name: Sync the current branch with the latest in Apache Spark
I think we don't need this step.
If a PR changes the Dockerfile and is merged, this step helps pick up the updated Dockerfile in follow-up PRs... so I think it's necessary?
oh okay you're correct!
@@ -251,13 +251,73 @@ jobs:
        name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java }}-${{ matrix.hadoop }}-${{ matrix.hive }}
        path: "**/target/unit-tests.log"

  pyspark:
  infra-image:
BTW, let's add a `name` here to make it look pretty (e.g., Base image build). Feel free to make a follow-up or a separate PR with a separate JIRA 👍
### What changes were proposed in this pull request?
Add a name for `infra-image`, clean up nits, and fix the inline format.

### Why are the changes needed?
Address comments in #37005 (comment).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes #37131 from Yikun/SPARK-39718.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Change the pyspark container from the original static image to a just-in-time image built from cache.
See also: https://docs.google.com/document/d/1_uiId-U1DODYyYZejAZeyz2OAjxcnA-xfwjynDF6vd0
This patch has 2 changes:
- Add an `infra-image` job to build the CI image from cache images

Why are the changes needed?
Help to speed up the Docker infra image build in each PR.
Does this PR introduce any user-facing change?
No
How was this patch tested?
CI passed
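The just-in-time build from cache described above could look roughly like the following workflow step. A sketch only — the tag scheme follows the discussion in this PR, but the Dockerfile path and the exact `cache-from`/`cache-to` wiring are assumptions, not the merged lines:

```yaml
# Sketch: build the CI image in each PR, reusing layers from a per-branch
# registry cache so only changed layers are rebuilt and pushed.
- name: Build and push CI image
  uses: docker/build-push-action@v3
  with:
    context: ./dev/infra/   # assumed Dockerfile location
    push: true
    tags: ghcr.io/${{ github.repository_owner }}/apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id }}
    cache-from: type=registry,ref=ghcr.io/apache/apache-spark-github-action-image-cache:${{ inputs.branch }}
    cache-to: type=inline
```

When the Dockerfile is unchanged, every layer is a cache hit and the "build" degenerates to a quick tag-and-push, which is what makes this faster than building a custom image from scratch in each PR.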