git-lfs binary not resolving in .git post-checkout hook preventing a number of notebook updates from being published #290

Open
speediedan opened this issue Oct 20, 2023 · 2 comments
Labels: bug / fix, help wanted

Comments

@speediedan
Contributor

speediedan commented Oct 20, 2023

🐛 Bug

The Lightning-AI.tutorials [publish] Azure pipeline has been failing since October 12th.

The issue was first observed a few months ago and has apparently resurfaced. The publish pipeline typically fails in the "papermill" job, at the "Git check & switch branch" step:

 publication
 Previous HEAD position was 3b4e9a0 docker: add missing `git-lfs` & readability (#284)
 Switched to branch 'publication'
 Your branch is behind 'origin/publication' by 10 commits, and can be fast-forwarded.
   (use "git pull" to update your local branch)

 This repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/post-checkout.

 ##[error]Bash exited with code '1'.
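
The message above is printed when the post-checkout hook cannot resolve `git-lfs` from the shell it runs in. As a minimal sketch, assuming the hook relies on a `command -v git-lfs` style lookup (this is what `git lfs install` typically writes, but it has not been confirmed for this pipeline's image), a helper such as the hypothetical `check_on_path` below could be called just before the checkout to show what the step's shell actually sees:

```shell
# Hypothetical helper (not part of the pipeline): report whether a binary
# resolves on the current PATH, using a `command -v` lookup.
check_on_path() {
  local bin="$1"
  local resolved
  if resolved="$(command -v "$bin" 2>/dev/null)"; then
    echo "found: $bin -> $resolved"
    return 0
  fi
  # Print PATH so a missing directory is visible in the job log.
  echo "missing: $bin (PATH=$PATH)"
  return 1
}

# Example use inside a pipeline step (left as a comment on purpose):
#   check_on_path git-lfs || echo "git-lfs is not visible to this step"
```

If the helper reports `missing` in the pipeline but `found` in a local container built from the same image, that would point at a difference in the step's environment rather than at the image itself.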

The last papermill job in the publish pipeline to succeed published an updated version of course_UvA-DL/13-contrastive-learning, triggered by c0d526ed5b4b4f27f8f4b3f5b0c4bac5b39d0d4c.

The transition sequence from the final successful job to the failing ones:

  1. c0d526ed5b4b4f27f8f4b3f5b0c4bac5b39d0d4c gets checked out from main, changes with respect to the publication branch are detected, and main is merged into publication and pushed successfully.
  2. course_UvA-DL/13-contrastive-learning gets updated, committed and pushed to origin/publication at 5497c9a8dffdfad44aacd8b50a6eabec9ba823cb successfully.
  3. The next updated notebook, lightning_examples/cifar-baseline, checks out c0d526ed5b4b4f27f8f4b3f5b0c4bac5b39d0d4c, but the attempt to switch to the publication branch (5497c9a8dffdfad44aacd8b50a6eabec9ba823cb) fails because the post-checkout hook cannot find the git-lfs command.
  4. Publishing of all subsequent notebook updates has since been blocked (the publish pipeline fails both on new commits and on re-runs of previous runs). Including:

Using the latest tutorials docker image, I've been able to replicate the entire flow, including the transitions between the precise refs and branches referenced above, but I have been unable to reproduce the failure to resolve git-lfs. Since I don't have permission to set pipeline variables, enable system diagnostics or trigger re-runs, I'm not sure I'll be able to debug much further efficiently.

IMHO, the next step in debugging would be to enable system diagnostics and trigger some new runs, as well as to add some debugging code before the problematic checkout steps. For instance, update "Git check & switch branch" with:

      - bash: |
          set +e
          git fetch --all
          echo $(PUB_BRANCH)
          git ls-remote --heads origin ${PUB_BRANCH} | grep ${PUB_BRANCH} >/dev/null
          if [ "$?" == "1" ] ; then echo "Branch doesn't exist"; exit; fi
          apt-get update -q --fix-missing && apt-get install vim strace -y
          # Use ';' rather than '&&' so the grep still runs when the checkout fails
          strace -o /git_lfs_post_checkout_hook.log -f git checkout ${PUB_BRANCH} 2>&1; grep -E "git-lfs\"|execve" /git_lfs_post_checkout_hook.log
          cat /git_lfs_post_checkout_hook.log
          git show-ref $(PUB_BRANCH)
          git pull
          set -e
        displayName: "Git check & switch branch"

If necessary, of course, one could inject an extended pause before executing this step, to allow attaching to the container and debugging inside it, and finally nail this problem once and for all!
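
A lighter-weight complement to `strace` might be Git's own tracing. The sketch below assumes only that `GIT_TRACE` is honored by the Git version in the image (it is a standard Git environment variable that accepts an absolute file path); `run_traced` and the log path are hypothetical names chosen for illustration. The trace should include the commands Git spawns, which would be expected to cover the hook invocation:

```shell
# Hypothetical wrapper (not part of the pipeline): run a command with Git's
# built-in tracing enabled, appending the trace to the given log file.
# The wrapped command's exit status is passed through unchanged.
run_traced() {
  local log="$1"
  shift
  GIT_TRACE="$log" "$@"
}

# Example use (assumes PUB_BRANCH is set, as in the pipeline step):
#   run_traced /tmp/git_trace.log git checkout "$PUB_BRANCH"
#   grep -n -E "post-checkout|git-lfs" /tmp/git_trace.log
```

Because the variable is set only for the wrapped command, the rest of the step is unaffected, and no extra packages need to be installed in the container.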

I know the Lightning team has a lot of work on its plate, so I'm just adding this issue and my observations so it doesn't get lost in the mix and to help minimize the time you need to spend debugging.

It'll be great to have notebook updates published automatically once this pipeline issue is resolved. Thanks for all your work!

@speediedan added the bug / fix and help wanted labels Oct 20, 2023
@Borda self-assigned this Oct 21, 2023
@Borda
Member

Borda commented Oct 21, 2023

This could be hard to debug from a regular PR until we fuse the test and publish stages, as proposed in #286.

The ultimate resolution may be to nuke and rebuild, as this repo was also created with this case in mind...

@speediedan
Contributor Author

#286 is a great idea, fully behind it! Nuke and rebuild might be a good idea as well.

In the meantime though, I think it might be worth including the debugging lines I added to the "Git check & switch branch" step (with or without the cat of the log), just so we have more visibility into why precisely the post-checkout hook is failing. The next merged PR could then potentially illuminate the problem before #286 or the nuke approach gets implemented. What do you think?

      - bash: |
          set +e
          git fetch --all
          echo $(PUB_BRANCH)
          git ls-remote --heads origin ${PUB_BRANCH} | grep ${PUB_BRANCH} >/dev/null
          if [ "$?" == "1" ] ; then echo "Branch doesn't exist"; exit; fi
          apt-get update -q --fix-missing && apt-get install vim strace -y
          # Use ';' rather than '&&' so the grep still runs when the checkout fails
          strace -o /git_lfs_post_checkout_hook.log -f git checkout ${PUB_BRANCH} 2>&1; grep -E "git-lfs\"|execve" /git_lfs_post_checkout_hook.log
          cat /git_lfs_post_checkout_hook.log
          git show-ref $(PUB_BRANCH)
          git pull
          set -e
        displayName: "Git check & switch branch"
