Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve pre-push file list when merges are involved #860

Open
prem-nuro opened this issue Nov 6, 2018 · 10 comments
Open

Improve pre-push file list when merges are involved #860

prem-nuro opened this issue Nov 6, 2018 · 10 comments
Labels

Comments

@prem-nuro
Copy link

If I have a feature branch based on some old commit in develop or master, and I merge in develop or master, the pre-push hooks will run over all the merge commit files. Would it be possible to add a setting to provide a known "good" branch whose changes can be ignored? This is potentially useful as you gradually roll out a new check on a codebase, or if only some people have hooks installed.

A current workaround for this is by manually doing something like git diff $(git merge-base HEAD develop) --name-only [--diff-filter=ACMR?] in custom hooks, but I'd like to do this for other hooks I haven't written. The effect of this setting would just be to shrink the list of files provided to the hook, so their code wouldn't change.

Example repo: https://github.com/prem-nuro/precommit-issue-860

@asottile
Copy link
Member

asottile commented Nov 6, 2018

Hmmm, the three dots here are supposed to avoid merge commits.

I'm also not sure:

  1. how this feature would be configured
    • top level configuration value?
    • refspec?
  2. that it's a good idea
    • even with a branch name, how would pre-commit know where to generate the diff from?
    • what if that branch doesn't exist locally?
    • what if that branch does exist locally but is out of date?
    • what if the fetched version of the remote branch is out of date?
    • what if someone is pushing to that named branch?

and I really don't think I'd want to take on that complexity for a feature that would only be used very rarely (pre-push already doesn't see all that much use as it is). It would likely be broken very easily :(

So basically: convince me this is a problem worth fixing, convince me it'll not add significant complexity / special snowflakes, convince me it can't (easily) be done in hooks themselves :)

@prem-nuro
Copy link
Author

Regarding the current use of ..., I thought you'd need --no-merges or --first-parent or something.

Good points, here's my attempt at addressing them:

top level configuration value?

One option is a per-hook configuration: ignore_branch: [develop]
Any hooks already aware of special branches don't need them, but we can add them to hooks we haven't authored. I think a simple implementation supporting one special branch or tag is good enough for most use cases. And maybe this option could only have relevant for push stage hooks.

Another option is just top level overall.

refspec?

I think something simpler would be most convenient, matching no-commit-to-branch, hence just develop.

even with a branch name, how would pre-commit know where to generate the diff from?

We can simplify and just do git diff develop...foo. The diff will grow larger as time goes on, but that's okay. We're totally ignoring the current behavior here, which AFAIK is to just do the diff on un-pushed commits.

what if that branch doesn't exist locally?

Use the remote of the current branch's upstream. If it does exist locally, use the upstream version of that branch. Error if it doesn't exist in whatever remote we use.

what if that branch does exist locally but is out of date?

I think the ... notation will do the right thing here, if we use the remote upstream instead.

what if the fetched version of the remote branch is out of date?

Use the upstream version of the branch.

what if someone is pushing to that named branch?

If pushing to the branch, do the diff with whatever is upstream like origin/develop...develop. Special care for no upstream.

Summary:

Special branch is not local or remote Pushing to special branch which only exists locally Pushing to special branch Pushing to other branch
Diff computation Error Everything develop...origin/develop origin/develop...foo
Which remote/upstream Error None required Special branch remote/upstream Special branch remote/upstream if it exists locally. Else, assume it is the same name in current branch's remote. If that doesn't exist, error.

And to reiterate, I think the use-case here is:

  • We pre-push because we want to run a lot of hooks but not on every commit since they're all work in progress until we push.
  • Everything's going to be merged into develop eventually, which has varying levels of satisfying each hook as a new one is turned on. So we really care only about the diff with develop, such that any new code will satisfy hooks. This also helps us ignore things like rebasing on develop.

@asottile
Copy link
Member

The way you describe it should already be the existing behaviour.

Since the commits in develop already exist on the remote, it should only notice the difference between those not pushed and those that you're pushing. Can you provide an example showing that not being the case?

@prem-nuro
Copy link
Author

I see. I linked an example repo above: https://github.com/prem-nuro/precommit-issue-860

master is the special branch and foo is the regular one. If you look at the entries in LOG you should be able to follow along, let me know if the entries aren't clear.

@asottile
Copy link
Member

I had trouble following your example so I created a script:

script

#!/usr/bin/env bash
set -euxo pipefail

rm -rf upstream clone

git init upstream
git -C upstream commit --allow-empty -m 'Initial commit'
git -C upstream config receive.denyCurrentBranch ignore

git clone upstream clone

cd clone
cat > .pre-commit-config.yaml <<EOF
repos:
-   repo: local
    hooks:
    -   id: echo
        name: echo
        entry: echo
        verbose: true
        language: system
        stages: [push]
EOF
git add .pre-commit-config.yaml
pre-commit install -t pre-push
git commit -m "Add push"
git push origin HEAD

touch master_only_file
git add master_only_file
git commit -m "Add master_only_file"
git push origin HEAD

git checkout origin/master^ -b bar
touch bar
git add bar
git commit -m "Add bar"
git merge origin/master --no-edit

git log --oneline --graph --decorate

: merge base
mb="$(git merge-base origin/master HEAD)"
git diff "${mb}" HEAD --name-only

: commits to push
git rev-list HEAD --topo-order --reverse --not --remotes=origin |
    xargs git show

: source vs origin calculation
first_ancestor=$(
    git rev-list HEAD --topo-order --reverse --not --remotes=origin |
    head -1
)
source="$(git rev-parse "${first_ancestor}^")"
git show "${source}"
git diff --name-only "${source}...HEAD"

git push origin HEAD

output

$ bash t.sh
+ rm -rf upstream clone
+ git init upstream
Initialized empty Git repository in /tmp/t/upstream/.git/
+ git -C upstream commit --allow-empty -m 'Initial commit'
[master (root-commit) 50bebbc] Initial commit
+ git -C upstream config receive.denyCurrentBranch ignore
+ git clone upstream clone
Cloning into 'clone'...
done.
+ cd clone
+ cat
+ git add .pre-commit-config.yaml
+ pre-commit install -t pre-push
pre-commit installed at /tmp/t/clone/.git/hooks/pre-push
+ git commit -m 'Add push'
[master 625ef71] Add push
 1 file changed, 9 insertions(+)
 create mode 100644 .pre-commit-config.yaml
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo

.pre-commit-config.yaml

Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 354 bytes | 354.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/t/upstream
   50bebbc..625ef71  HEAD -> master
+ touch master_only_file
+ git add master_only_file
+ git commit -m 'Add master_only_file'
[master bbcf4f2] Add master_only_file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 master_only_file
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo

master_only_file

Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 306 bytes | 306.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/t/upstream
   625ef71..bbcf4f2  HEAD -> master
+ git checkout 'origin/master^' -b bar
Switched to a new branch 'bar'
+ touch bar
+ git add bar
+ git commit -m 'Add bar'
[bar 90ef098] Add bar
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 bar
+ git merge origin/master --no-edit
Merge made by the 'recursive' strategy.
 master_only_file | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 master_only_file
+ git log --oneline --graph --decorate
*   aabbc0a (HEAD -> bar) Merge remote-tracking branch 'origin/master' into bar
|\  
| * bbcf4f2 (origin/master, origin/HEAD, master) Add master_only_file
* | 90ef098 Add bar
|/  
* 625ef71 Add push
* 50bebbc Initial commit
+ : merge base
++ git merge-base origin/master HEAD
+ mb=bbcf4f2e695bf186b7af76f49ec288e068ad804e
+ git diff bbcf4f2e695bf186b7af76f49ec288e068ad804e HEAD --name-only
bar
+ : commits to push
+ git rev-list HEAD --topo-order --reverse --not --remotes=origin
+ xargs git show
commit 90ef098415b3069e676bde710e9885243b02b0bc
Author: Anthony Sottile <asottile@umich.edu>
Date:   Mon Nov 12 11:20:31 2018 -0800

    Add bar

diff --git a/bar b/bar
new file mode 100644
index 0000000..e69de29

commit aabbc0a9e4f1c3b60b0d93d4fd977efb8f8bf996 (HEAD -> bar)
Merge: 90ef098 bbcf4f2
Author: Anthony Sottile <asottile@umich.edu>
Date:   Mon Nov 12 11:20:31 2018 -0800

    Merge remote-tracking branch 'origin/master' into bar

+ : source vs origin calculation
++ head -1
++ git rev-list HEAD --topo-order --reverse --not --remotes=origin
+ first_ancestor=90ef098415b3069e676bde710e9885243b02b0bc
++ git rev-parse '90ef098415b3069e676bde710e9885243b02b0bc^'
+ source=625ef7124a9b348855de5c34caa7cbced08cf011
+ git show 625ef7124a9b348855de5c34caa7cbced08cf011
commit 625ef7124a9b348855de5c34caa7cbced08cf011
Author: Anthony Sottile <asottile@umich.edu>
Date:   Mon Nov 12 11:20:30 2018 -0800

    Add push

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..79a4a6f
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,9 @@
+repos:
+-   repo: local
+    hooks:
+    -   id: echo
+        name: echo
+        entry: echo
+        verbose: true
+        language: system
+        stages: [push]
+ git diff --name-only 625ef7124a9b348855de5c34caa7cbced08cf011...HEAD
bar
master_only_file
+ git push origin HEAD
echo.....................................................................Passed
hookid: echo

bar master_only_file

Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 523 bytes | 523.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0)
To /tmp/t/upstream
 * [new branch]      HEAD -> bar

@asottile
Copy link
Member

If we do this at all, I don't want a configuration value. If you can find a way to improve the push routine or revision differencing to determine this I'd be happiest with that.

@asottile asottile changed the title Allow specifying good branch Improve pre-push file list when merges are involved Feb 21, 2019
@asottile asottile added feature and removed question labels Feb 21, 2019
@sluongng
Copy link

#2424 was marked as duplicate of this, but the workflow I reported there does not involve merges but strictly rebase + force pushes.

If we do this at all, I don't want a configuration value. If you can find a way to improve the push routine or revision differencing to determine this I'd be happiest with that.

Internally I patched pre-commit in our monorepo with this

--- a/pre_commit/commands/hook_impl.py
+++ b/pre_commit/commands/hook_impl.py
@@ -117,14 +117,16 @@ def _pre_push_ns(
         local_branch, local_sha, remote_branch, remote_sha = line.split()
         if local_sha == Z40:
             continue
-        elif remote_sha != Z40 and _rev_exists(remote_sha):
-            return _ns(
-                'pre-push', color,
-                from_ref=remote_sha, to_ref=local_sha,
-                remote_branch=remote_branch,
-                local_branch=local_branch,
-                remote_name=remote_name, remote_url=remote_url,
-            )
+        # Irrelevant to monorepo workflow
+        #
+        # elif remote_sha != Z40 and _rev_exists(remote_sha):
+        #     return _ns(
+        #         'pre-push', color,
+        #         from_ref=remote_sha, to_ref=local_sha,
+        #         remote_branch=remote_branch,
+        #         local_branch=local_branch,
+        #         remote_name=remote_name, remote_url=remote_url,
+        #     )
         else:
             # ancestors not found in remote
             ancestors = subprocess.check_output((

If current feature branch was updated because of a git rebase origin/master, we don't want our user have to pay attention to the files that was updated between origin/master@{1}..origin/master, but strictly only the files that was touched between the merge-base commit and feature-branch's HEAD.

So it makes no sense for us to run pre-commit for the diff between remote commit vs local commit in this workflow.

@sluongng
Copy link

I think https://facebook.github.io/watchman/docs/scm-query.html described this class of diff calculation problem really well.
Definitely worth a read.

@sluongng
Copy link

solutioning:

I do recognize that this problem is workflow specific. So I think a top-level configuration to enable/disable this pre-push strategy, depending on workflow of the repository, would be much appreciated here.

@Artalus
Copy link

Artalus commented Apr 25, 2024

I would also vote for #2424 to be reopened, since the problems described in these two tickets are related to two different workflows, and the issue with rebase seems more addressable to me at least.

The fix suggested by sluongng makes sense to me and I don't think it needs any additional configuration. Contrary to the git merge case described by OP, with the git rebase workflow described in #2424 one essentially "grafts" the whole branch from one part of the "trunk" to another. There are no "ambigious" commits present both in the branch and the trunk, only the branching point, the branch head, and branched-out commits inbetween them.

With the current implementation from hook_impl.py, you get git diff <old branch head>...<new branch head> - this includes commits in the branch as well as commits in the trunk, same as in OP's issue with merges. But with sluongng's change applied you would get git diff <new branching point>...<new branch head>, same as if it was a fresh branch being pushed anew without remote to track. Note that this does not address the original problem with merges, as the merge commit is not present in the trunk and the changes introduced by it are indeed new to the branch.

I am actually curious if there are any cases at all, where one would prever the original behavior when pushing after rebase.

The only downside I see with the change discussed, is that instead of using a remote_sha provided to the hook from git itself, even for pushing regular commits there would be3 git rev-... invokes to parse the whole commit tree. For the record, the slowest part is git rev-list --max-parents=0; on my below-average laptop it takes ~15s in Linux repo (>1m commits) and ~1s in VSCode repo (100k commits). Considering people seem to use pre-push to run longer hooks already, that might be okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants