Skip to content

Commit

Permalink
Plugin/CodeQL: Add filtering support
Browse files Browse the repository at this point in the history
Adds the ability to filter issues with filter patterns defined in
an arbitrary number of YAML files and/or at individual package level
with a section in the package CI YAML file.

Further details are available in the Readme.md file.

Signed-off-by: Michael Kubacki <michael.kubacki@microsoft.com>
  • Loading branch information
makubacki authored and kenlautner committed Oct 17, 2023
1 parent edb32d4 commit 8fa6e4f
Show file tree
Hide file tree
Showing 5 changed files with 442 additions and 4 deletions.
54 changes: 54 additions & 0 deletions .pytool/Plugin/CodeQL/CodeQlAnalyzePlugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import os
import yaml

from analyze import analyze_filter
from common import codeql_plugin

from edk2toolext import edk2_logging
Expand Down Expand Up @@ -80,6 +81,7 @@ def do_post_build(self, builder: UefiBuilder) -> int:
query_specifiers = None
package_config_file = Path(os.path.join(
self.package_path, self.package + ".ci.yaml"))
plugin_data = None
if package_config_file.is_file():
with open(package_config_file, 'r') as cf:
package_config_file_data = yaml.safe_load(cf)
Expand Down Expand Up @@ -139,6 +141,58 @@ def do_post_build(self, builder: UefiBuilder) -> int:
f"not created. Analysis cannot continue.")
return -1

filter_pattern_data = []
global_filter_file_value = builder.env.GetValue(
"STUART_CODEQL_FILTER_FILES")
if global_filter_file_value:
global_filter_files = global_filter_file_value.strip().split(',')
global_filter_files = [Path(f) for f in global_filter_files]

for global_filter_file in global_filter_files:
if global_filter_file.is_file():
with open(global_filter_file, 'r') as ff:
global_filter_file_data = yaml.safe_load(ff)
if "Filters" in global_filter_file_data:
current_pattern_data = \
global_filter_file_data["Filters"]
if type(current_pattern_data) is not list:
logging.critical(
f"CodeQL pattern data must be a list of "
f"strings. Data in "
f"{str(global_filter_file.resolve())} is "
f"invalid. CodeQL analysis is incomplete.")
return -1
filter_pattern_data += current_pattern_data
else:
logging.critical(
f"CodeQL global filter file "
f"{str(global_filter_file.resolve())} is "
f"malformed. Missing Filters section. CodeQL "
f"analysis is incomplete.")
return -1
else:
logging.critical(
f"CodeQL global filter file "
f"{str(global_filter_file.resolve())} was not found. "
f"CodeQL analysis is incomplete.")
return -1

if plugin_data and "Filters" in plugin_data:
if type(plugin_data["Filters"]) is not list:
logging.critical(
"CodeQL pattern data must be a list of strings. "
"CodeQL analysis is incomplete.")
return -1
filter_pattern_data.extend(plugin_data["Filters"])

if filter_pattern_data:
logging.info("Applying CodeQL SARIF result filters.")
analyze_filter.filter_sarif(
self.codeql_sarif_path,
self.codeql_sarif_path,
filter_pattern_data,
split_lines=False)

with open(self.codeql_sarif_path, 'r') as sf:
sarif_file_data = json.load(sf)

Expand Down
84 changes: 80 additions & 4 deletions .pytool/Plugin/CodeQL/Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,17 +32,19 @@ such users is [Local Development Tips](#local-development-tips).
1. [Database and Analysis Result Locations](#database-and-analysis-result-locations)
2. [Global Configuration](#global-configuration)
3. [Package-Specific Configuration](#package-specific-configuration)
4. [Integration Instructions](#integration-instructions)
4. [Filter Patterns](#filter-patterns)
5. [Integration Instructions](#integration-instructions)
- [Integration Step 1 - Choose Scopes](#integration-step-1---choose-scopes)
- [Scopes Available](#scopes-available)
- [Integration Step 2 - Choose CodeQL Queries](#integration-step-2---choose-codeql-queries)
- [Integration Step 3 - Determine Global Configuration Values](#integration-step-3---determine-global-configuration-values)
- [Integration Step 4 - Determine Package-Specific Configuration Values](#integration-step-4---determine-package-specific-configuration-values)
- [Integration Step 5 - Testing](#integration-step-5---testing)
- [Scopes Available](#scopes-available)
5. [High-Level Operation](#high-level-operation)
- [Integration Step 6 - Define Inclusion and Exclusion Filter Patterns](#integration-step-6---define-inclusion-and-exclusion-filter-patterns)
6. [High-Level Operation](#high-level-operation)
- [CodeQlBuildPlugin](#codeqlbuildplugin)
- [CodeQlAnalyzePlugin](#codeqlanalyzeplugin)
6. [Local Development Tips](#local-development-tips)
7. [Local Development Tips](#local-development-tips)

## Database and Analysis Result Locations

Expand Down Expand Up @@ -85,6 +87,26 @@ built by the script.
- `STUART_CODEQL_PATH` - The path to the CodeQL CLI application to use.
- `STUART_CODEQL_QUERY_SPECIFIERS` - The CodeQL CLI query specifiers to use. See [Running codeql database analyze](https://codeql.github.com/docs/codeql-cli/analyzing-databases-with-the-codeql-cli/#running-codeql-database-analyze)
for possible options.
- `STUART_CODEQL_FILTER_FILES` - The path to "filter" files that contains filter patterns as described in
[Filter Patterns](#filter-patterns).
- More than one file may be specified by separating each absolute file path with a comma.
- This might be useful to reference a global filter file from an upstream repo and also include a global filter
file for the local repo.
- Filters are concatenated in the order of files in the variable. Patterns in later files can override patterns
in earlier files.
- The file only needs to contain a list of filter pattern strings under a `"Filters"` key. For example:

```yaml
{
"Filters": [
"<pattern-line-1>",
"<pattern-line-2>"
]
}
...
```

Comments are allowed in the filter files and begin with `#` (like a normal YAML file).

## Package-Specific Configuration

Expand All @@ -98,9 +120,58 @@ the package.
"CodeQlAnalyze": {
"AuditOnly": False, # Don't fail the build if there are errors. Just log them.
"QuerySpecifiers": "" # Query specifiers to pass to CodeQL CLI.
"Filters": "" # Inclusion/exclusion filters
}
```

> _NOTE:_ If a global filter set is provided via `STUART_CODEQL_FILTER_FILES` and a package has a package-specific
> list, then the package-specific filter list (in a package CI YAML file) is appended onto the global filter list and
> may be used to override settings in the global list.
The format used to specify items in `"Filters"` is specified in [Filter Patterns](#filter-patterns).

## Filter Patterns

As you inspect results, you may want to include or exclude certain sets of results. For example, exclude some files by
file path entirely or adjust the CodeQL rule applied to a certain file. This plugin reuses logic from a popular
GitHub Action called [`filter-sarif`](https://github.com/advanced-security/filter-sarif) to allow filtering as part of
the plugin analysis process.

If any results are excluded using filters, the results are removed from the SARIF file. This allows the exclude results
seen locally to exactly match the results on the CI server.

Read the ["Patterns"](https://github.com/advanced-security/filter-sarif#patterns) section there for more details. The
patterns section is also copied below with some updates to make the information more relevant for an edk2 codebase
for convenience.

Each pattern line is of the form:

```plaintext
[+/-]<file pattern>[:<rule pattern>]
```

For example:

```yaml
-**/*Test*.c:** # exclusion pattern: remove all alerts from all test files
-**/*Test*.c # ditto, short form of the line above
+**/*.c:cpp/infiniteloop # inclusion pattern: This line has precedence over the first two
# and thus "allow lists" alerts of type "cpp/infiniteloop"
**/*.c:cpp/infiniteloop # ditto, the "+" in inclusion patterns is optional
** # allow all alerts in all files (reverses all previous lines)
```

- The path separator character in patterns is always `/`, independent of the platform the code is running on and
independent of the paths in the SARIF file.
- `*` matches any character, except a path separator
- `**` matches any character and is only allowed between path separators, e.g. `/**/file.txt`, `**/file.txt` or `**`.
NOT allowed: `**.txt`, `/etc**`
- The rule pattern is optional. If omitted, it will apply to alerts of all types.
- Subsequent lines override earlier ones. By default all alerts are included.
- If you need to use the literals `+`, `-`, `\` or `:` in your pattern, you can escape them with `\`, e.g.
`\-this/is/an/inclusion/file/pattern\:with-a-semicolon:and/a/rule/pattern/with/a/\\/backslash`. For `+` and `-`, this
is only necessary if they appear at the beginning of the pattern line.

## Integration Instructions

First, note that most CodeQL CLI operations will take a long time the first time they are run. This is due to:
Expand Down Expand Up @@ -216,6 +287,11 @@ package's CI YAML file.

Verify a `stuart_update` and `stuart_build` (or `stuart_ci_build`) command work.

### Integration Step 6 - Define Inclusion and Exclusion Filter Patterns

After reviewing the test results from Step 5, determine if you need to apply any filters as described in
[Filter Patterns](#filter-patterns).

## High-Level Operation

This section summarizes the complete CodeQL plugin flow. This is to help developers understand basic theory of
Expand Down
Empty file.
176 changes: 176 additions & 0 deletions .pytool/Plugin/CodeQL/analyze/analyze_filter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
# @file analyze_filter.py
#
# Filters results in a SARIF file.
#
# Based on code in:
# https://github.com/advanced-security/filter-sarif
#
# Specifically:
# https://github.com/advanced-security/filter-sarif/blob/main/filter_sarif.py
#
# That code is licensed under:
# Apache License
# Version 2.0, January 2004
# http://www.apache.org/licenses/
#
# View the full and complete license as provided by that repository here:
# https://github.com/advanced-security/filter-sarif/blob/main/LICENSE
#
# This file has been altered from its original form.
#
# It primarily contains modifications made to integrate with the CodeQL plugin.
#
# Copyright (c) Microsoft Corporation. All rights reserved.
# SPDX-License-Identifier: BSD-2-Clause-Patent
##

import json
import logging
import re
from os import PathLike
from typing import Iterable, List, Tuple

from analyze.globber import match


def _match_path_and_rule(
path: str, rule: str, patterns: Iterable[str]) -> bool:
"""Returns whether a given path matches a given rule.
Args:
path (str): A file path string.
rule (str): A rule file path string.
patterns (Iterable[str]): An iterable of pattern strings.
Returns:
bool: True if the path matches a rule. Otherwise, False.
"""
result = True
for s, fp, rp in patterns:
if match(rp, rule) and match(fp, path):
result = s
return result


def _parse_pattern(line: str) -> Tuple[str]:
"""Parses a given pattern line.
Args:
line (str): The line string that contains the rule.
Returns:
Tuple[str]: The parsed sign, file pattern, and rule pattern from the
line.
"""
sep_char = ':'
esc_char = '\\'
file_pattern = ''
rule_pattern = ''
seen_separator = False
sign = True

# inclusion or exclusion pattern?
u_line = line
if line:
if line[0] == '-':
sign = False
u_line = line[1:]
elif line[0] == '+':
u_line = line[1:]

i = 0
while i < len(u_line):
c = u_line[i]
i = i + 1
if c == sep_char:
if seen_separator:
raise Exception(
'Invalid pattern: "' + line + '" Contains more than one '
'separator!')
seen_separator = True
continue
elif c == esc_char:
next_c = u_line[i] if (i < len(u_line)) else None
if next_c in ['+' , '-', esc_char, sep_char]:
i = i + 1
c = next_c
if seen_separator:
rule_pattern = rule_pattern + c
else:
file_pattern = file_pattern + c

if not rule_pattern:
rule_pattern = '**'

return sign, file_pattern, rule_pattern


def filter_sarif(input_sarif: PathLike,
output_sarif: PathLike,
patterns: List[str],
split_lines: bool) -> None:
"""Filters a SARIF file with a given set of filter patterns.
Args:
input_sarif (PathLike): Input SARIF file path.
output_sarif (PathLike): Output SARIF file path.
patterns (PathLike): List of filter pattern strings.
split_lines (PathLike): Whether to split lines in individual patterns.
"""
if split_lines:
tmp = []
for p in patterns:
tmp = tmp + re.split('\r?\n', p)
patterns = tmp

patterns = [_parse_pattern(p) for p in patterns if p]

logging.debug('Given patterns:')
for s, fp, rp in patterns:
logging.debug(
'files: {file_pattern} rules: {rule_pattern} ({sign})'.format(
file_pattern=fp,
rule_pattern=rp,
sign='positive' if s else 'negative'))

with open(input_sarif, 'r') as f:
s = json.load(f)

for run in s.get('runs', []):
if run.get('results', []):
new_results = []
for r in run['results']:
if r.get('locations', []):
new_locations = []
for l in r['locations']:
# TODO: The uri field is optional. We might have to
# fetch the actual uri from "artifacts" via
# "index"
# (see https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#-linking-results-to-artifacts)
uri = l.get(
'physicalLocation', {}).get(
'artifactLocation', {}).get(
'uri', None)

# TODO: The ruleId field is optional and potentially
# ambiguous. We might have to fetch the actual
# ruleId from the rule metadata via the ruleIndex
# field.
# (see https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#rule-metadata)
ruleId = r['ruleId']

if (uri is None or
_match_path_and_rule(uri, ruleId, patterns)):
new_locations.append(l)
r['locations'] = new_locations
if new_locations:
new_results.append(r)
else:
# locations array doesn't exist or is empty, so we can't
# match on anything. Therefore, we include the result in
# the output.
new_results.append(r)
run['results'] = new_results

with open(output_sarif, 'w') as f:
json.dump(s, f, indent=2)

0 comments on commit 8fa6e4f

Please sign in to comment.