Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot seek stdin on pipe #496

Merged
merged 40 commits into from Feb 18, 2022
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
272fd74
add namespaces for parent attributes
tylerwince May 13, 2019
7a5e827
pylint formatting changes
tylerwince May 13, 2019
583149a
added _Seeker for running seek on sys.stdin
tylerwince May 15, 2019
cf4f11a
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
tylerwince May 15, 2019
acd80a7
Update node_visitor.py
tylerwince May 15, 2019
edcecbe
Update general_hardcoded_password.py
tylerwince May 15, 2019
ca5cfe3
Update general_hardcoded_password.py
tylerwince May 15, 2019
6a33c6e
pep8 fixes
tylerwince May 15, 2019
91543c2
added list handling for hard fname swaps
cat-code May 15, 2019
969e816
updated manager
tylerwince May 15, 2019
6535101
maintaining list order
cat-code May 15, 2019
10c6378
Merge pull request #1 from JuanHuaXu/495-cannot-seek-stdin-on-pipe
tylerwince May 15, 2019
9f99d63
updated pep8 errors
tylerwince May 16, 2019
c9bb2b5
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
tylerwince May 27, 2019
8305291
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
tylerwince Jun 25, 2019
132e1c2
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
tylerwince Aug 2, 2019
c88dcac
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
ericwb Aug 18, 2019
ce523ba
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
tylerwince Oct 8, 2019
7b62387
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
ericwb Jan 12, 2020
c4dda4d
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
ericwb Jan 21, 2020
235acee
Merge branch 'master' into 495-cannot-seek-stdin-on-pipe
ericwb Mar 8, 2020
8038c11
Merge branch 'main' into 495-cannot-seek-stdin-on-pipe
ericwb Feb 14, 2022
fe741b4
Update manager.py
ericwb Feb 14, 2022
b035898
Update manager.py
ericwb Feb 14, 2022
f0e8316
Update manager.py
ericwb Feb 16, 2022
f0106e4
Update issue.py
ericwb Feb 16, 2022
e27cb8a
Update node_visitor.py
ericwb Feb 16, 2022
410a7cf
Update manager.py
ericwb Feb 16, 2022
0b43d58
Update issue.py
ericwb Feb 16, 2022
3f0139e
Update context.py
ericwb Feb 17, 2022
128fb4e
Update issue.py
ericwb Feb 17, 2022
217fe52
Update manager.py
ericwb Feb 17, 2022
1a9b389
Update node_visitor.py
ericwb Feb 17, 2022
4ab2d4a
Update tester.py
ericwb Feb 17, 2022
c42e501
Update issue.py
ericwb Feb 17, 2022
e583627
Update manager.py
ericwb Feb 17, 2022
86ffba3
Update context.py
ericwb Feb 17, 2022
a5e44ba
Update node_visitor.py
ericwb Feb 17, 2022
2e9b2d4
Update manager.py
ericwb Feb 17, 2022
6562a4d
Merge branch 'main' into 495-cannot-seek-stdin-on-pipe
tylerwince Feb 17, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
11 changes: 10 additions & 1 deletion bandit/core/issue.py
Expand Up @@ -3,6 +3,7 @@
#
# SPDX-License-Identifier: Apache-2.0
import linecache
import sys

from bandit.core import constants

Expand Down Expand Up @@ -171,9 +172,17 @@ def get_code(self, max_lines=3, tabbed=False):
lmin = max(1, self.lineno - max_lines // 2)
lmax = lmin + len(self.linerange) + max_lines - 1

if self.fname == "<stdin>":
sys.stdin.seek(0)
for line_num in range(1, lmin):
sys.stdin.readline()

tmplt = "%i\t%s" if tabbed else "%i %s"
for line in range(lmin, lmax):
text = linecache.getline(self.fname, line)
if self.fname == "<stdin>":
text = sys.stdin.readline()
else:
text = linecache.getline(self.fname, line)

if isinstance(text, bytes):
text = text.decode("utf-8")
Expand Down
11 changes: 8 additions & 3 deletions bandit/core/manager.py
Expand Up @@ -4,6 +4,7 @@
# SPDX-License-Identifier: Apache-2.0
import collections
import fnmatch
import io
import json
import logging
import os
Expand Down Expand Up @@ -269,7 +270,11 @@ def run_tests(self):
self._show_progress("%s.. " % count, flush=True)
try:
if fname == "-":
sys.stdin = os.fdopen(sys.stdin.fileno(), "rb", 0)
open_fd = os.fdopen(sys.stdin.fileno(), "rb", 0)
sys.stdin = io.BytesIO(open_fd.read())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm 99% certain this is going to be a problem with some integration somewhere but if we were already patching this like this, perhaps it's fine? Would it not be better to do something like:

stdin = io.BytesIO(sys.stdin.read().encode())

It's less code, and then we can pass stdin to self._parse_file instead of sys.stdin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree patching sys.stdin looks really ugly. To avoid that, I think we would need to pass a variable around in quite a few places.

new_files_list = [
"<stdin>" if x == "-" else x for x in new_files_list
]
self._parse_file("<stdin>", sys.stdin, new_files_list)
else:
with open(fname, "rb") as fdata:
Expand Down Expand Up @@ -315,8 +320,8 @@ def _parse_file(self, fname, fdata, new_files_list):
# for the line
nosec_lines = dict()
try:
fdata.seek(0)
tokens = tokenize.tokenize(fdata.readline)
buf_data = io.BytesIO(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm a bit perplexed by what we're doing here, honestly. If fdata is always seekable (which, even with your current unmodified changes it is), why are we:

  1. Loading an entire file into memory
  2. Creating a new buffer for that data
  3. Counting the lines the way we are for our metrics?
  4. Parsing nosec comments in this try block when all we really care about is the TokenError and seek shouldn't be raising that either?

I'm just worried about what happens here with a 10kloc file with 100-200 line length limits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is all a bit inefficient. It reads the file once to count the number of lines. Then reads the full file again to tokenize in order to retrieve comments like # nosec.

So this bit of code was a bit perplexing to me as well. So stdin is not a seekable file descriptor (fdata.seek(0) will fail). On line 315, the data is read to its fullest (although I question if that is the case on all platforms).

But there's probably a better way to do all this.

Copy link
Member

@ericwb ericwb Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Bandit is iterating through the given file at least 4 times.

  1. to count the lines-of-code in metrics count_locs
  2. to tokenize and identify lines marked with # nosec
  3. to ast parse the file
  4. use of linecache in issue.get_code() to reopen the file again and get the code snippet of context

Performance would probably be greatly improved on large files if we reduced to a single pass.

tokens = tokenize.tokenize(buf_data.readline)

if not self.ignore_nosec:
for toktype, tokval, (lineno, _), _, _ in tokens:
Expand Down
2 changes: 1 addition & 1 deletion bandit/core/node_visitor.py
Expand Up @@ -37,7 +37,7 @@ def __init__(self, fname, metaast, testset, debug, nosec_lines, metrics):
try:
self.namespace = b_utils.get_module_qualname_from_path(fname)
except b_utils.InvalidModulePath:
LOG.info(
LOG.warning(
"Unable to find qualified name for module: %s", self.fname
)
self.namespace = ""
Expand Down