From 786e11d1b08ab280ea14a28f575f77518dbb6d70 Mon Sep 17 00:00:00 2001
From: Stefan van der Walt <stefanv@berkeley.edu>
Date: Wed, 18 Jan 2023 00:02:10 -0800
Subject: [PATCH] Add (optional) pre-commit hook

- Flake8 no longer handles patches, so instead apply to entire
  files (see https://github.com/PyCQA/flake8/issues/1760)
- Rename `tools/lint_diff.py` to `tools/lint.py`
- Files can be obtained either via diffing against a branch,
  or by specifying them manually
- `tools/pre-commit-hook.sh` is a pre-commit hook that utilizes
  `tools/lint.py` but only lints staged changes
- Update contributor guidelines to point to pre-commit hook

See also the discussion at https://github.com/scipy/scipy/issues/15489
---
 HACKING.rst.txt                     | 23 +++++++----
 azure-pipelines.yml                 |  2 +-
 doc/source/dev/contributor/pep8.rst | 34 +++++++++--------
 tools/{lint_diff.ini => lint.ini}   |  0
 tools/{lint_diff.py => lint.py}     | 59 +++++++++++++++++------------
 tools/pre-commit-hook.sh            | 35 +++++++++++++++++
 6 files changed, 105 insertions(+), 48 deletions(-)
 rename tools/{lint_diff.ini => lint.ini} (100%)
 rename tools/{lint_diff.py => lint.py} (58%)
 create mode 100755 tools/pre-commit-hook.sh
diff --git a/HACKING.rst.txt b/HACKING.rst.txt
index 6c8c272045e6..cead6e10fa99 100644
--- a/HACKING.rst.txt
+++ b/HACKING.rst.txt
@@ -82,13 +82,22 @@ tests, benchmarks, and correct code style.
     as it will appear online.
 
 4. Code style
-    Uniformity of style in which code is written is important to others trying
-    to understand the code.  SciPy follows the standard Python guidelines for
-    code style, `PEP8`_.  In order to check that your code conforms to PEP8,
-    you can use the `pep8 package`_ style checker.  Most IDEs and text editors
-    have settings that can help you follow PEP8, for example by translating
-    tabs by four spaces.  Using `pyflakes`_ to check your code is also a good
-    idea. More information is available in :ref:`pep8-scipy`.
+    Uniform code style makes it easier for others to read your code.
+    SciPy follows the standard Python style guideline, `PEP8`_.
+
+    We provide a git pre-commit hook that can check each of your commits
+    for proper style. Install it (once) by running the following from
+    the root of the SciPy repository::
+
+      ln -s tools/pre-commit-hook.sh .git/hooks/pre-commit
+
+    Alternatively, you may run the linter manually::
+
+      python tools/lint.py --diff-against main
+
+    Most IDEs and text editors also have settings that can help you
+    follow PEP8, for example by translating tabs by four spaces. More
+    information is available in :ref:`pep8-scipy`.
 
 A :ref:`checklist<pr-checklist>`, including these and other requirements, is
 available at the end of the example :ref:`development-workflow`.
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
index ef832f7a7105..5cb4ba1f8e00 100644
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -156,7 +156,7 @@ stages:
         # so does not appear to be needed to fetch maintenance
         # branches (which Azure logs show being checked out
         # in Checkout stage)
-        python tools/lint_diff.py --branch origin/$(System.PullRequest.TargetBranch)
+        python tools/lint.py --diff-against origin/$(System.PullRequest.TargetBranch)
         tools/unicode-check.py
         tools/check_test_name.py
       displayName: 'Run Lint Checks'
diff --git a/doc/source/dev/contributor/pep8.rst b/doc/source/dev/contributor/pep8.rst
index e72d86b91ccc..fadb51597e34 100644
--- a/doc/source/dev/contributor/pep8.rst
+++ b/doc/source/dev/contributor/pep8.rst
@@ -19,30 +19,32 @@ compliance before pushing your code:
    Editor |rarr| Advanced Settings. This can help you fix PEP8 issues as you
    write your code.
 
--  SciPy makes use of special configuration files for linting with the
-   |flake8|_ tool.
+-  Note, however, that SciPy's |flake8|_ configuration may not match
+   that of your IDE exactly. See below on how to run the official
+   checks.
 
--  It is typically recommended to leave any existing style issues alone
-   unless they are part of the code you're already modifying.
+-  It is recommended to leave existing style issues alone
+   unless they exist in files you are already modifying.
    This practice ensures that the codebase is gradually cleaned up
    without dedicating precious review time to style-only cleanups.
-   Before sending a Pull Request, we suggest running the lint tests only
-   for the changes you've made in your feature branch. This will mimic
-   the continuous integration linting checks setup on GitHub.
-   After installing ``flake8``, you can run the following check locally
-   in the SciPy root directory to ensure your Pull Request doesn't
-   break the Continuous Integration linting tests.
 
-   ::
+-  Before sending a Pull Request, run the linter on changes made in
+   your feature branch. The checks will also be made during
+   continuous integration, but it's quicker to catch them early.
 
-      python tools/lint_diff.py
+   The easiest way to do so is to install our pre-commit hook (once)::
 
-   If you want to run the diff based lint tests only for specific files
-   or directories, please consider using the ``--files`` option.
+     ln -s tools/pre-commit-hook.sh .git/hooks/pre-commit
 
-   ::
+   This will run linting checks before each commit is made.
 
-      python tools/lint_diff.py --files scipy/odr/models.py scipy/ndimage
+   Alternatively, you can run the check manually from the SciPy root directory::
+
+      python tools/lint.py --diff-against main
+
+   You can also run the linter on specific files, using the ``--files`` option::
+
+      python tools/lint.py --files scipy/odr/models.py scipy/ndimage
 
 -  If you have existing code with a lot of PEP8 issues, consider using
    |autopep8|_ to automatically fix them before incorporating the code into
diff --git a/tools/lint_diff.ini b/tools/lint.ini
similarity index 100%
rename from tools/lint_diff.ini
rename to tools/lint.ini
diff --git a/tools/lint_diff.py b/tools/lint.py
similarity index 58%
rename from tools/lint_diff.py
rename to tools/lint.py
index 685377f34b69..aaa5b10ea01a 100755
--- a/tools/lint_diff.py
+++ b/tools/lint.py
@@ -4,9 +4,10 @@
 import subprocess
 from argparse import ArgumentParser
 
+
 CONFIG = os.path.join(
     os.path.abspath(os.path.dirname(__file__)),
-    'lint_diff.ini',
+    'lint.ini',
 )
 
 
@@ -49,48 +50,58 @@ def find_branch_point(branch):
     # If a branch split off over 1000 commits ago we will fail to find
     # the ancestor.
     raise RuntimeError(
-        'Failed to find a common ancestor in the last 1000 commits')
+        'Failed to find a common ancestor in the last 1000 commits'
+    )
 
 
-def find_diff(sha, files=None):
-    """Find the diff since the given sha."""
-    if files:
-        for file_or_dir in files:
-            msg = f"{file_or_dir} doesn't exist. Please provide a valid path."
-            assert os.path.exists(file_or_dir), msg
-    else:
-        files = ['*.py']
+def diff_files(sha):
+    """Find the diff since the given SHA."""
     res = subprocess.run(
-        ['git', 'diff', '--unified=0', sha, '--'] + files,
+        ['git', 'diff', '--name-only', '-z', sha, '--', '*.py'],
         stdout=subprocess.PIPE,
         encoding='utf-8'
     )
     res.check_returncode()
-    return res.stdout
+    return [f for f in res.stdout.split('\0') if f]
 
 
-def run_flake8(diff):
-    """Run flake8 on the given diff."""
+def run_flake8(files):
+    if not files:
+        return 0, ""
     res = subprocess.run(
-        ['flake8', '--diff', '--config', CONFIG],
-        input=diff,
+        ['flake8', '--config', CONFIG] + files,
         stdout=subprocess.PIPE,
-        encoding='utf-8',
+        encoding='utf-8'
     )
     return res.returncode, res.stdout
 
 
 def main():
     parser = ArgumentParser()
-    parser.add_argument("--branch", type=str, default='main',
-                        help="The branch to diff against")
-    parser.add_argument("--files", type=str, nargs='+', default=None,
-                        help="The files or directories to diff against")
+    # In Python 3.9, can use: argparse.BooleanOptionalAction
+    parser.add_argument("--diff-against", dest='branch',
+                        type=str, default=None,
+                        help="Diff against "
+                             "this branch and lint modified files. Use either "
+                             "`--diff-against` or `--files`, but not both.")
+    parser.add_argument("--files", nargs='*',
+                        help="Lint these files or directories; "
+                             "use **/*.py to lint all files")
+
     args = parser.parse_args()
 
-    branch_point = find_branch_point(args.branch)
-    diff = find_diff(branch_point, args.files)
-    rc, errors = run_flake8(diff)
+    if not ((args.files is None) ^ (args.branch is None)):
+        print('Specify either `--diff-against` or `--files`. Aborting.')
+        sys.exit(1)
+
+    if args.branch:
+        branch_point = find_branch_point(args.branch)
+        files = diff_files(branch_point)
+    else:
+        files = args.files
+
+    rc, errors = run_flake8(files)
+
     if errors:
         print(errors)
 
diff --git a/tools/pre-commit-hook.sh b/tools/pre-commit-hook.sh
new file mode 100755
index 000000000000..1a4e3392ccb8
--- /dev/null
+++ b/tools/pre-commit-hook.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+#
+# Pre-commit linting hook.
+#
+# Install from root of repository with:
+#
+#   ln -s tools/pre-commit-hook.sh .git/hooks/pre-commit
+
+# store names of files that were staged
+mapfile -d '' changed < <(git diff --cached --name-only -z --diff-filter=ACMR -- '*.py')
+
+# create a temporary copy of what would get committed, without unstaged modifications
+# (e.g., only certain changes in a file may have been committed)
+GIT_DIR=${GIT_DIR:-.git}
+workdir=$GIT_DIR/.pre-commit-workdir
+fakecommit=$(git commit-tree -p HEAD $(git write-tree) -m ...)
+
+if ! [[ -d $workdir ]]; then
+    git clone -qns "$GIT_DIR" "$workdir" || exit
+fi
+
+unset "${!GIT_@}"
+cd "$workdir"
+git checkout -q $fakecommit
+
+# run linter(s); if any fail, remember that, and set an overall hook failure status
+ret=0
+
+# Run lint.py from the scipy source tree
+../../tools/lint.py --files "${changed[@]}" || ret=1
+
+if [[ $ret -ne 0 ]]; then
+    echo "!! Linting failed; please make fixes, \`git add\` files, and re-commit."
+fi
+exit $ret