
Improve our data collection regarding bugs #2420

Open
Tracked by #2406
p-offtermatt opened this issue Apr 19, 2023 · 2 comments
Labels
S: Productivity Productivity: Developer tooling, infrastructure improvements enabling future growth

p-offtermatt commented Apr 19, 2023

TL;DR: We should introduce better categorization and labeling for bugs in Gaia and ICS, annotating each bug with its rough type and the method by which it was found. The people triaging and closing issues should keep the labels up to date, adding or removing bug tags as needed. We should also file, in the Gaia/ICS repos, all bugs we are aware of that our tests for Gaia/ICS should have caught, even if those bugs ultimately live in the code of our dependencies.

Problem

Context:

To have solid quality assurance, it is important to know how effective we are at catching bugs, and how early in our process we catch them. In the spirit of shift-left testing, bugs found late in the development process tend to be more expensive to fix, so it is worth tracking how many bugs are found at each stage. It is also useful to characterize bugs by loose categories, since different types of bugs are amenable to different types of testing and analysis: bugs in input handling can be found using out-of-the-box fuzzing tools, deep bugs may need more involved techniques like property-based or model-based testing, and protocol-level bugs could be reduced by writing testable, executable specifications. Finally, knowing how many and what types of bugs are found going forward allows us to evaluate whether new testing techniques we introduce are actually working, and whether they find the types of bugs we anticipate.

I did an analysis of historical bug reports for ICS and Gaia here: https://docs.google.com/spreadsheets/d/1_gzytyMzIT3BRzrsJdcdSJDTdCXAh-YtxudrDfwNAHw/edit?usp=sharing

For ICS, bug reports are plentiful and useful, while the data quality for Gaia is lower.

Problem Statement:

We should have some form of data collection in place to classify and gather information about errors found in the projects we steward. This should include bugs found in testnets. If a bug should have been caught by our Gaia or ICS CI pipeline but wasn't, it should be recorded as an issue in that repo and given a classification.

Closing Criterion

We have improved our workflow for categorizing and tracking bugs. We are more confident that we have a grasp of the bugs that are relevant to the projects we steward, and information about these bugs is easy to access in a systematic manner.

Proposed Solution

I propose adding two new categories of issue labels in the ICS and Gaia repos.

These issue labels are helpful because it is very easy to filter data based on them.

  • Bug:{type} to denote rough classes of bugs that could be identified by similar testing techniques. I propose the following types, based on my findings from analyzing historical bug reports in the ICS repo. Some of these labels already exist in one form or another; others do not:
    • Spec-Mismatch: To label bugs caused by mismatching implementation and specification.
      These bugs would likely be perfect candidates for model-based or property-based testing.
    • Tests: To label bugs inside our tests, i.e. a test fails even though the behavior is correct.
      These bugs should help us identify where our tests are too brittle.
    • Spec-Error: To label bugs where the specification itself has an error.
      These bugs could be avoided by having executable specifications. Importantly, if these bugs are found late, i.e. in a testnet, it should ring alarm bells, since they slipped through many layers of testing and quality assurance.
    • Input-Handling: To label bugs related to reading and parsing inputs, e.g. reading Genesis files, parsing input data from an interface, etc.
      These bugs could be perfect candidates for out-of-the-box fuzzers, and if we have many of these we should consider adding those.
    • DevOps: To label bugs inside our setup and infrastructure, e.g. bugs in Makefiles, Dockerfiles, default configurations, etc.
      These bugs may need more extensive end-to-end tests, with setups closer to the real environment, to be found.
  • Found:{source} to denote how the bug was found. I propose the following sources. Again, some of these labels already exist in one form or another:
    • security-audit-{identifier}: To identify bugs found in audits.
    • diff-test/e2e-test/integration-test/unit-test: To identify bugs that were surfaced while adding new tests of a certain type.
    • Fuzzing: To identify bugs found by fuzzing.
    • User: To identify bugs reported by users.
    • Testnet: To identify bugs found in a testnet.

More labels should be created if necessary and appropriate. Also, multiple labels can be added to one issue, e.g. an issue might be related to input-handling and a spec-mismatch.
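
As a rough sketch of how such labels make analysis easy, the script below tallies issues by type and source. The label names follow this proposal; the issue data and helper function are my own illustration (in practice the data would come from the GitHub API, e.g. via `gh issue list --json labels`):

```python
# Sketch: tallying issues by the proposed Bug:{type} and Found:{source}
# labels. Label names come from this proposal; the issue data below is
# made up purely for illustration.
from collections import Counter

BUG_TYPES = {"bug:spec-mismatch", "bug:tests", "bug:spec-error",
             "bug:input-handling", "bug:devops"}
FOUND_SOURCES = {"found:fuzzing", "found:user", "found:testnet",
                 "found:diff-test", "found:e2e-test",
                 "found:integration-test", "found:unit-test"}

def tally(issues):
    """Count how often each bug type and discovery source occurs."""
    by_type, by_source = Counter(), Counter()
    for issue in issues:
        labels = {label.lower() for label in issue["labels"]}
        by_type.update(labels & BUG_TYPES)
        by_source.update(labels & FOUND_SOURCES)
    return by_type, by_source

# Hypothetical example data; note one issue may carry several labels.
issues = [
    {"labels": ["bug", "bug:input-handling", "found:fuzzing"]},
    {"labels": ["bug", "bug:spec-mismatch", "bug:input-handling",
                "found:testnet"]},
]
by_type, by_source = tally(issues)
print(by_type["bug:input-handling"])  # 2
```

This is exactly the kind of query (how many input-handling bugs, how many found only in a testnet) that is tedious to answer today and trivial once labels are applied consistently.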

I propose to apply these labels (as far as the information is known) when triaging bugs, and to add a step in our workflow to update the labels when fixing or closing issues. This mainly means removing the bug tag if it turns out the behavior is intended, adding a bug:{type} tag when it has been identified what caused the bug, and adding a found:{source} tag if it’s known how the bug was found.

I also propose using the Gaia repo to collect bugs found in the testnet or in production on the Hub, even if those bugs are in code from our dependencies, e.g. the Cosmos SDK. For example, if a bug is present in the Gaia binary of a testnet or incentivized testnet, it should ideally have been caught by our tests before the testnet started, so we should log it with an issue. Good practice would thus be to open an issue in the Gaia repo for any bug in the Gaia binary, and to link to an issue in the appropriate other repo if the bug is in dependency code, e.g. link to an issue in the Cosmos SDK repo if a bug in Gaia was caused by a bug in the SDK. This also lets us track how error-prone our dependencies are, and which of them need special attention.

If we decide to integrate extra labels into our workflow, we should take care to a) make sure the current team knows about the workflow and what to do with bug labels, and b) make sure that new incoming team members learn about it. One way to do this would be to add a section to CONTRIBUTING.md that specifically states that the person triaging or closing an issue should take care of these labels. It could also be worth looking into bots to help with part of this, e.g. a bot that adds a "classification needed" label when an issue labelled bug is closed without classification labels. Many of the labels I am proposing already exist but haven't really been used, even when they would have been appropriate.
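
The core check such a bot would perform is small. A minimal sketch, assuming the label scheme from this proposal (the function name and the `bug:`/`found:` prefix convention are my own, not an existing bot or API):

```python
# Sketch of the bot's core check: when an issue labelled "bug" is
# closed, flag it if it is missing a bug:{type} or found:{source}
# label. Label names follow this proposal; everything else here is
# hypothetical illustration.
def needs_classification(labels):
    """Return True if a closed bug issue lacks classification labels."""
    labels = {label.lower() for label in labels}
    if "bug" not in labels:
        return False  # only issues labelled bug need classification
    has_type = any(label.startswith("bug:") for label in labels)
    has_source = any(label.startswith("found:") for label in labels)
    return not (has_type and has_source)

print(needs_classification(["bug", "found:testnet"]))            # True
print(needs_classification(["bug", "bug:tests", "found:user"]))  # False
```

In a real deployment this check would run in response to an issue-closed event (e.g. a GitHub Actions workflow on the `issues` event) and add the "classification needed" label when it returns True.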

Potential issues with the solution

Aren’t we bound to run into https://en.wikipedia.org/wiki/Survivorship_bias, since the bugs we should find are the ones that we currently miss?

This seems true, but collecting better data will also help us avoid survivorship bias, and to search where we are currently not finding bugs.

People will forget to tag issues or the tags will fall out of use over time.

That might happen, but the tags may still yield useful data until then. In addition, we do not need 100% of issues to be tagged correctly to get useful data; better data is good, even if it is not perfect.

External users shouldn’t be bothered with this.

I think it is important that the core team takes care of this; we should not expect external users to know how to classify things. They might guess a category, or they might not, but either way this is fine, since we triage issues anyway and can add the labels then.

Maybe categorizing these things by hand is pointless, if we could get LLMs to characterize issues.

If anyone has suggestions, I would be really happy to hear them. It seems there are out-of-the-box plugins that could help streamline this; see https://github.com/apps/automatic-issue-classifier

@p-offtermatt
Contributor Author

informalsystems/hermes#2981 might be related

@p-offtermatt
Contributor Author

Talked to Marius about this, and I will come up with a plan to implement this, so I'm reassigning to myself.
