Skip to content
This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

[BBS-126] Make bandit tests pass #177

Merged
merged 19 commits into from Dec 3, 2020
Merged

[BBS-126] Make bandit tests pass #177

merged 19 commits into from Dec 3, 2020

Conversation

FrancescoCasalegno
Copy link
Contributor

@FrancescoCasalegno FrancescoCasalegno commented Nov 30, 2020

🏴‍☠️ Passing Bandit Tests 🏴‍☠️

Several lines were changed in the code to pass various bandit tests. This is what we learned.

  • eval() is unsafe, use ast.literal_eval() instead—but it's limited to evaluation of constructs involving only standard library!
  • input() is unsafe in Python 2, but OK in Python 3. This is because in Python 3 it replaces the older raw_input() which is the safe version of Python 2.
  • assert statements should not be used in prod code, because optimized python code will remove such statements. Raise specific exceptions instead.
  • Every time you engine.execute() a SQL query with an f-string with parameters, you are subject to potential injection attacks. So use the following syntax whenever possible.

--> 😡 Bad

names = ["Smith", "Turing", "Van Gogh"]
names = "(" + ", ".join(names) + ")"
q = f"SELECT * FROM my_table WHERE first_name IN {names}"
engine.execute(q)

--> 😃 Good

import sqlalchemy.sql as sql
names = ["Smith", "Turing", "Van Gogh"]
q = sql.text("SELECT * FROM my_table WHERE first_name IN :names").bindparams(sql.bindparam("names", expanding=True))
engine.execute(q, params={"names": names})

🏴‍☠️ Bandit Evaders 🏴‍☠️

We decided for the time being to ignore 5 SQL injection errors from bandit. We ignored these lines by flagging them with a # nosec comment. The reason is that we would have needed a major code refactoring, probably introducing SqlAlchemy's ORM or something even worse.

Useful to know for the future: apparently you cannot bind (neither with the syntax above, nor with any other syntax) the name of a table or of a column in a SQL query.

More precisely, these are the errors that we decided to silence.

Test results:
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/bbsearch/database/cord_19.py:40
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
39          with engine.begin() as connection:
40              query = f"SELECT sentence_id, text FROM {sentences_table_name}"
41              df_sentences = pd.read_sql(query, connection)
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/bbsearch/database/mining_cache.py:288
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
287                 WHERE mining_model = :mining_model
288                 """
289                 self.engine.execute(
290                     sqlalchemy.sql.text(query),
291                     mining_model=model_schema["model_path"],
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/bbsearch/sql.py:338
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
337                     WHERE (article_id = {a} AND paragraph_pos_in_article = {p})
338                     """
339                     for a, p in identifiers_pars[i * batch_size : (i + 1) * batch_size]
340                 )
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/bbsearch/sql.py:345
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
344                 WHERE tt.mining_model IN {model_names}
345                 """
346                 dfs_pars.append(pd.read_sql(query_pars, engine))
347             df_pars = pd.concat(dfs_pars)
348             df_pars = df_pars.sort_values(
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/bbsearch/sql.py:595
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
594                 )
595                 """.strip()
596                 sentence_conditions.append(article_condition_query)
597     
598             # Restricted sentence IDs
--------------------------------------------------
Code scanned:
        Total lines of code: 7308
        Total lines skipped (#nosec): 0
Run metrics:
        Total issues (by severity):
                Undefined: 0.0
                Low: 0.0
                Medium: 5.0
                High: 0.0
        Total issues (by confidence):
                Undefined: 0.0
                Low: 5.0
                Medium: 0.0
                High: 0.0
Files skipped (0):

@FrancescoCasalegno FrancescoCasalegno marked this pull request as ready for review December 1, 2020 10:15
@FrancescoCasalegno FrancescoCasalegno requested review from Stannislav, jankrepl and pafonta and removed request for Stannislav December 1, 2020 10:15
src/bbsearch/entrypoints/database_entrypoint.py Outdated Show resolved Hide resolved
src/bbsearch/mining/attribute.py Show resolved Hide resolved
src/bbsearch/mining/attribute.py Outdated Show resolved Hide resolved
src/bbsearch/mining/relation.py Show resolved Hide resolved
src/bbsearch/sql.py Show resolved Hide resolved
Comment on lines +179 to +180
"article_id": int(article_id),
"paragraph_pos_in_article": int(paragraph_pos_in_article),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docstring, article_id and paragraph_pos_in_article are already of type int. Why a casting to int is added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Python doesn't have static typing, if someone passes any other type than int we could have a mess.
In particular, if we pass a np.int64 the SQL query breaks, which was indeed what was happening in our case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Was it spotted by bandit?

Anyway, should we consider... type annotations for all BBS then? ;) Or data validation frameworks for Python arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type annotations: I am not against as said in the past ;) I am still a bit annoyed that one would have to write them for each function by hand, it would be cool if we could use the numpy docstrings we already have to auto-generate type annotations (if PyCharm can do it, there must be a way I hope).

But afaik type annotations wouldn't have raised any exception here right? They are just useful to help your IDE or a developer to know which type is "expected".

For data validation frameworks: I do not know any, do you have one in mind in particular?
I feel Python's duck typing is a key feature, so unless it's really needed I would prefer avoiding type checking...

Copy link
Contributor

@pafonta pafonta Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrancescoCasalegno

auto-generate type annotations

According to what I have found, there is tooling to convert docstring types into type annotations.

type annotations wouldn't have raised any exception here right?

The IDE would have complained where the function with the body we discuss is used.
A static type checker like mypy, run by the CI, would have complained.
So, with this, the case where an exception would have been thrown would not be reached.

For data validation frameworks: I do not know any, do you have one in mind in particular?

For validating inputs, I would have pydantic in mind.

I feel Python's duck typing is a key feature, so unless it's really needed I would prefer avoiding type checking...

I agree. However, I would weight it against two other points:

  • How well are exceptions managed?
  • How critical is it if the process stops?

This being said, I think that checking docstrings types or Python type annotations would already let us be more comfortable with the runtime issues. I won't go for inputs validation.

src/bbsearch/sql.py Show resolved Hide resolved
src/bbsearch/sql.py Show resolved Hide resolved
src/bbsearch/sql.py Show resolved Hide resolved
src/bbsearch/sql.py Show resolved Hide resolved
Copy link
Contributor

@jankrepl jankrepl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice:) I learned a couple of new things:)

Copy link
Contributor

@Stannislav Stannislav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey guys, awesome job. See my comments.

src/bbsearch/entrypoints/database_entrypoint.py Outdated Show resolved Hide resolved
src/bbsearch/mining/entity.py Show resolved Hide resolved
src/bbsearch/mining/entity.py Outdated Show resolved Hide resolved
src/bbsearch/sql.py Show resolved Hide resolved
@FrancescoCasalegno FrancescoCasalegno merged commit 0f9028a into master Dec 3, 2020
@FrancescoCasalegno FrancescoCasalegno deleted the bbs_126 branch December 3, 2020 07:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants