Make use of xargs for Repo().ignored function #1790

Open
ericwb opened this issue Jan 8, 2024 · 1 comment


ericwb commented Jan 8, 2024

In my use case, I want to use the Repo class's ignored() function to filter a potentially large list of files. The issue is that sometimes this list of files is far too large to pass on a single command line.

The maximum size of the command-line arguments on my system is:

$ getconf ARG_MAX
1048576

However, my code is going beyond that maximum because the git repos I have chosen have lots of ignorable files.
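For reference, the same limit can be read from Python on POSIX systems. This is only a minimal sketch: the helper names `arg_max` and `estimated_argv_bytes` are made up for illustration, and the estimate counts only the argument strings, while the environment (and on some systems per-argument overhead) can also count against the limit.

```python
import os


def arg_max() -> int:
    """Return the system limit that `getconf ARG_MAX` reports (POSIX only)."""
    return os.sysconf("SC_ARG_MAX")


def estimated_argv_bytes(args: list[str]) -> int:
    """Rough size of an argument list: each argument plus its NUL terminator.

    Assumption: this ignores the environment and any pointer overhead,
    so treat it as a lower bound rather than an exact figure.
    """
    return sum(len(os.fsencode(arg)) + 1 for arg in args)


if __name__ == "__main__":
    files = ["some/path/file_%d.py" % i for i in range(100_000)]
    print("limit:   ", arg_max())
    print("estimate:", estimated_argv_bytes(["git", "check-ignore", *files]))
```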

I get this error:

Traceback (most recent call last):
  File "/Users/ericwb/workspace/bandit/.tox/py312/bin/bandit", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/ericwb/workspace/bandit/bandit/cli/main.py", line 657, in main
    b_mgr.discover_files(args.targets, args.recursive, args.excluded_paths)
  File "/Users/ericwb/workspace/bandit/bandit/core/manager.py", line 252, in discover_files
    ignore_list = repo.ignored(*files)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/repo/base.py", line 878, in ignored
    proc: str = self.git.check_ignore(*paths)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 736, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 1316, in _call_process
    return self.execute(call, **exec_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ericwb/workspace/bandit/.tox/py312/lib/python3.12/site-packages/git/cmd.py", line 988, in execute
    proc = Popen(
           ^^^^^^
  File "/Users/ericwb/.pyenv/versions/3.12.1/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/ericwb/.pyenv/versions/3.12.1/lib/python3.12/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'git'

If I were using the command line, I'd take advantage of xargs to split the argument list into chunks below the max and call git check-ignore once per chunk. For reference: https://stackoverflow.com/questions/2381241/what-is-the-subprocess-popen-max-length-of-the-args-parameter

While I can split the list in my own Python code, I believe it makes more sense for the library designed to call command lines to do it itself. Maybe it doesn't make sense for all commands, but I do believe an xargs-style approach would work well for git check-ignore.
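To make the caller-side workaround concrete, here is a rough sketch of xargs-style chunking. The names `chunk_by_bytes` and `ignored_in_chunks` and the 128 KiB default budget are my own assumptions, not anything GitPython provides; the budget is deliberately well below ARG_MAX to leave room for the environment.

```python
import os
from typing import Callable, Iterable, Iterator


def chunk_by_bytes(paths: Iterable[str], budget: int) -> Iterator[list[str]]:
    """Yield lists of paths whose encoded size stays within `budget` bytes.

    A single path larger than the budget is still yielded on its own.
    """
    chunk: list[str] = []
    used = 0
    for path in paths:
        size = len(os.fsencode(path)) + 1  # +1 for the NUL terminator
        if chunk and used + size > budget:
            yield chunk
            chunk, used = [], 0
        chunk.append(path)
        used += size
    if chunk:
        yield chunk


def ignored_in_chunks(
    check: Callable[..., list[str]],
    paths: Iterable[str],
    budget: int = 128 * 1024,  # assumed safety margin, not a measured value
) -> list[str]:
    """Call `check(*chunk)` once per chunk and concatenate the results."""
    result: list[str] = []
    for chunk in chunk_by_bytes(paths, budget):
        result.extend(check(*chunk))
    return result
```

With GitPython this would be called as `ignored_in_chunks(repo.ignored, files)`, since `repo.ignored(*paths)` returns the ignored subset of what it is given. The cost is one git process per chunk.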

Member

Byron commented Jan 9, 2024

Thanks for reporting!

I think this could be solved by providing the paths to stdin. It's probably not as trivial as it sounds as the input would have to be provided while the output is consumed to avoid deadlock due to filled pipes.

Reference

```
❯ git check-ignore -h
usage: git check-ignore [<options>] <pathname>...
   or: git check-ignore [<options>] --stdin

    -q, --quiet           suppress progress reporting
    -v, --verbose         be verbose

    --stdin               read file names from stdin
    -z                    terminate input and output records by a NUL character
    -n, --non-matching    show non-matching input paths
    --no-index            ignore index when checking
```
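As a rough illustration of the stdin idea outside of GitPython, the sketch below calls git directly through the standard library. The function names are hypothetical and this is not how GitPython implements or would implement it. It relies on `subprocess.run(..., input=...)`, which goes through `Popen.communicate()` and so writes stdin while draining stdout and stderr, which is what avoids the filled-pipe deadlock. It assumes the paths are inside the repository's work tree and that exit code 1 means "no path is ignored".

```python
import os
import subprocess


def encode_paths(paths: list[str]) -> bytes:
    """Encode paths as NUL-terminated records, as expected by `--stdin -z`."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def decode_paths(output: bytes) -> list[str]:
    """Split NUL-terminated output records back into path strings."""
    return [os.fsdecode(p) for p in output.split(b"\0") if p]


def ignored_via_stdin(repo_dir: str, paths: list[str]) -> list[str]:
    """Return the subset of `paths` that git considers ignored."""
    if not paths:
        return []
    proc = subprocess.run(
        ["git", "check-ignore", "--stdin", "-z"],
        cwd=repo_dir,
        input=encode_paths(paths),  # fed concurrently with reading the output
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # 0: at least one path is ignored; 1: none are; anything else is an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.decode(errors="replace"))
    return decode_paths(proc.stdout)
```

Because the paths travel over a pipe instead of argv, the ARG_MAX limit no longer applies, and a single git process handles the whole list.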
