Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DX: Check trailing spaces in project files only #4957

Merged
merged 1 commit into from May 23, 2020

Conversation

ktomk
Copy link
Contributor

@ktomk ktomk commented May 16, 2020

(minor change despite longer description)

Previously the check_trailing_spaces.sh script was checking files that do not belong to the project, for example:

  • diverse composer.lock files
  • local development files like .idea or temporary files created

In its' previous design, the list of files to check for trailing spaces is generated by find which is not aware about different files (not) in the project. To exclude non-project files, it worked with a larger
blacklist of checking if a file is not in diverse paths, for example and most prominently:

  • -not -path "./.git/*"

which is kind of contradictory as the project is managed by git, but that folder can have a totally different name.

This blacklist is pretty verbose and also needs extra knowledge and care due to the nature of find as it operates on file-system level, not project level.

Files in the project are managed by git and git knows which files belong into the project and which not. Git on ...

  • ... repository level contains the information which project files are ignored. for example the composer.lock files.
  • ... developer or system level also knows about which files are ignored. for example the .idea folder (see "global .gitignore")

Change here is to replace find as file-system only utility with git itself, namely the git-ls-files command, for the operation to obtain the list of files to check. the find package (findutils) is not replaced as the xargs utility from it is in use.

This allows to drop the manually crafted blacklist and to specifically mark paths/files not to be checked which are (only) the test fixtures in:

  • tests/Fixtures/

The knowledge of git about the project itself and how the developer works next to the reduced list of files to check has the additional benefit of a cache and overall results in a much better performance.

@ktomk
Copy link
Contributor Author

ktomk commented May 16, 2020

How to test:

Use a differ of your choice to make the differences between the old and new behavior visible (here I took meld, diff does equally):

$ meld -- \
	<(find . -type f -not -path "./.git/*" -not -path "./dev-tools/bin/*" -not -path "./dev-tools/vendor/*" -not -path "./vendor/*" -not -path "./tests/Fixtures/*" | sort -fh) \
	<(git ls-files . ':!tests/Fixtures/*' | sed -e 's/^/.\//' | sort -fh)

Reviewing the changes should show no files added by the new behavior (right side) but instead show false-positive files from the left side removed.

@kubawerlos
Copy link
Contributor

@ktomk looks promising, can you point to 2.15 branch?

The difference is that we lose colours:
image

The first one is before and second after your changes - I don't know if we want to lose that, might be helpful for contributors to easily spot the trailing whitespaces.

And if we don't need the colour I guess some formating from the line echo "${files_with_trailing_spaces}"... could be removed.

BTW have you tried with git grep instead of git ls-files with grep?

@ktomk ktomk changed the base branch from 2.16 to 2.15 May 16, 2020 12:39
@ktomk
Copy link
Contributor Author

ktomk commented May 16, 2020

@kubawerlos colors should be back with the last fixup, for git grep I'll take a look.

@ktomk
Copy link
Contributor Author

ktomk commented May 16, 2020

@kubawerlos Yes, git-rep worked like a charm. No more execs of grep.

Trailing whitespaces detected:
 - in CHANGELOG.md at line 14
   > Changelog for v2.15.6 

Selection_074

@SpacePossum
Copy link
Contributor

Hi, thanks for all the work! You think this should also be applied to https://github.com/PHP-CS-Fixer/trailing-whitespace-checker ?

@SpacePossum SpacePossum added this to the 2.15.8 milestone May 17, 2020
@SpacePossum SpacePossum added the RTM Ready To Merge label May 18, 2020
Previously the `check_trailing_spaces.sh` script was checking files that
do not belong to the project, for example:

 - diverse `composer.lock` files
 - local development files like `.idea` or temporary files created

In its' previous design, the list of files to check for trailing spaces
is generated by `find` which is not aware about different files (not) in
the project. To exclude non-project files, it worked with a larger
blacklist of checking if a file is not in diverse paths, for example
and most prominently:

  - -not -path "./.git/*"

which is kind of contradictory as the project *is* managed by git, but
that folder can have a totally different name.

This blacklist is pretty verbose and also needs extra knowledge and care
due to the nature of `find` as it operates on file-system level, not
project level more specifically.

Files in the project are managed by `git` and `git` knows which files
belong into the project and which not. Git on ...

 - ... repository level contains the information which project files are
   ignored. for example the `composer.lock` files.
 - ... developer or system level also knows about which files are
   ignored. for example the `.idea` folder (see "global .gitignore")

Change here is to replace `find` as file-system only utility with `git`
itself, namely the `git-grep` command, for the operation to obtain
the list of files to check.

This includes the exec-call of grep, as gits' grep is a little more
powerful, too [1] *and* is compatible in the output format for the
post-processing with sed in the check-script.

This allows to drop the manually crafted blacklist and to specifically
mark paths/files not to be checked [2] which are (only) the test
fixtures in:

 - tests/Fixtures/

The knowledge of git about the project itself and how the developer
works next to the reduced list of files to check has the additional
benefit of a cache and overall results in a much better performance.

[1]: Fun with "git grep" <https://gitster.livejournal.com/27674.html>
[2]: pathspec - gitglossary <https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec>
@ktomk
Copy link
Contributor Author

ktomk commented May 18, 2020

Just squashed after review.

@SpacePossum

Hi, thanks for all the work! You think this should also be applied to https://github.com/PHP-CS-Fixer/trailing-whitespace-checker ?

I don't have it in use, no idea. The port is not straight forward at least for me as it needs also support for Subversion for which I currently don't have a system prepared for. Feel free to open an issue there if you need it there, too, then we can discuss the details of a port there. Just ping me.

Copy link
Contributor

@kubawerlos kubawerlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colours are back 👍

@SpacePossum
Copy link
Contributor

Thank you @ktomk.

@SpacePossum SpacePossum removed the RTM Ready To Merge label May 23, 2020
@SpacePossum SpacePossum merged commit ffee8fe into PHP-CS-Fixer:2.15 May 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants