
Improve our data collection regarding bugs #2420

Open
Tracked by #2406
p-offtermatt opened this issue Apr 19, 2023 · 2 comments
Labels
S: Productivity Productivity: Developer tooling, infrastructure improvements enabling future growth

p-offtermatt commented Apr 19, 2023

TL;DR: We should introduce better categorization and labeling for bugs in Gaia and ICS, annotating each bug with its rough type and the method by which it was found. The people triaging and closing issues should keep the labels up to date, adding or removing bug tags as needed. We should also file, in the Gaia/ICS repos, all bugs we are aware of that our tests for Gaia/ICS should have caught, even if those bugs ultimately live in the code of our dependencies.

Problem

Context:

To have solid quality assurance, it is important to know how effective we are at catching bugs, and how early in our process we catch them. In the spirit of shift-left testing, bugs found late in the development process tend to be more expensive to fix, so it is worth tracking how many bugs are found at each stage. It is also useful to characterize bugs by loose categories, since different types of bugs are amenable to different types of testing and analysis: bugs in input handling can be found using out-of-the-box fuzzing tools, deep bugs may need more involved techniques like property-based or model-based testing, and protocol-level bugs could be reduced by writing testable, executable specifications. Finally, knowing how many and what types of bugs are found going forward allows us to evaluate whether new testing techniques we introduce are actually working, and whether they find the types of bugs we anticipate.

I did an analysis of historical bug reports for ICS and Gaia here: https://docs.google.com/spreadsheets/d/1_gzytyMzIT3BRzrsJdcdSJDTdCXAh-YtxudrDfwNAHw/edit?usp=sharing

For ICS, bug reports are plentiful and useful, while the data quality for Gaia is lower.

Problem Statement:

We should have some form of data collection in place to classify and gather information about errors found in the projects we steward. This should include bugs found in testnets. If a bug should have been caught by our Gaia or ICS CI pipeline but wasn't, it should be recorded as an issue in that repo and given a classification.

Closing Criterion

We have improved our workflow for categorizing and tracking bugs. We are more confident that we have a grasp of the bugs that are relevant to the projects we steward, and information about these bugs is easy to access in a systematic manner.

Proposed Solution

I propose adding two new categories of issue labels in the ICS and Gaia repos.

These issue labels are helpful because it is very easy to filter data based on them.

  • Bug:{type} to denote rough classes of bugs that could be identified by similar testing techniques. I propose the following types, based on my findings from analyzing historical bug reports in the ICS repo. Some of these labels already exist in one form or another; others do not:
    • Spec-Mismatch: To label bugs caused by mismatching implementation and specification.
      These bugs would likely be perfect candidates for model-based or property-based testing.
    • Tests: To label bugs inside our tests, i.e. a test fails even though the behavior is correct.
      These bugs should help us identify where our tests are too brittle.
    • Spec-Error: To label bugs where the specification itself has an error.
      These bugs could be avoided by having executable specifications. Importantly, if these bugs are found late, i.e. in a testnet, it should ring alarm bells, since they slipped through many layers of testing and quality assurance.
    • Input-Handling: To label bugs related to reading and parsing inputs, e.g. reading Genesis files, parsing input data from an interface, etc.
      These bugs could be perfect candidates for out-of-the-box fuzzers, and if we have many of these we should consider adding those.
    • DevOps: To label bugs inside our setup and infrastructure, e.g. bugs in Makefiles, Dockerfiles, default configurations, etc.
      These bugs may need more extensive end-to-end tests, with setups closer to the real environment, to be found.
  • Found:{source} to denote how the bug was found. I propose the following sources. Again, some of these labels already exist in one form or another:
    • security-audit-{identifier}: To identify bugs found in audits.
    • diff-test/e2e-test/integration-test/unit-test: To identify bugs that were surfaced while adding new tests of a certain type.
    • Fuzzing: To identify bugs found by fuzzing.
    • User: To identify bugs reported by users.
    • Testnet: To identify bugs found in a testnet.

More labels should be created if necessary and appropriate. Also, multiple labels can be added to one issue, e.g. an issue might be related to input-handling and a spec-mismatch.
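
As a rough sketch of how such labels make analysis easy, the script below tallies issues by type and source. The label names follow this proposal; the issue data and helper function are my own illustration (in practice the data would come from the GitHub API, e.g. via `gh issue list --json labels`):

```python
# Sketch: tallying issues by the proposed Bug:{type} and Found:{source}
# labels. Label names come from this proposal; the issue data below is
# made up purely for illustration.
from collections import Counter

BUG_TYPES = {"bug:spec-mismatch", "bug:tests", "bug:spec-error",
             "bug:input-handling", "bug:devops"}
FOUND_SOURCES = {"found:fuzzing", "found:user", "found:testnet",
                 "found:diff-test", "found:e2e-test",
                 "found:integration-test", "found:unit-test"}

def tally(issues):
    """Count how often each bug type and discovery source occurs."""
    by_type, by_source = Counter(), Counter()
    for issue in issues:
        labels = {label.lower() for label in issue["labels"]}
        by_type.update(labels & BUG_TYPES)
        by_source.update(labels & FOUND_SOURCES)
    return by_type, by_source

# Hypothetical example data; note one issue may carry several labels.
issues = [
    {"labels": ["bug", "bug:input-handling", "found:fuzzing"]},
    {"labels": ["bug", "bug:spec-mismatch", "bug:input-handling",
                "found:testnet"]},
]
by_type, by_source = tally(issues)
print(by_type["bug:input-handling"])  # 2
```

This is exactly the kind of query (how many input-handling bugs, how many found only in a testnet) that is tedious to answer today and trivial once labels are applied consistently.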

I propose to apply these labels (as far as the information is known) when triaging bugs, and to add a step in our workflow to update the labels when fixing or closing issues. This mainly means removing the bug tag if it turns out the behavior is intended, adding a bug:{type} tag when it has been identified what caused the bug, and adding a found:{source} tag if it’s known how the bug was found.

I also propose using the Gaia repo to collect bugs found in the testnet or in production on the Hub, even if those bugs are in code from our dependencies, e.g. the Cosmos SDK. For example, if a bug is present in the Gaia binary of a testnet or incentivized testnet, it should ideally have been caught by our tests before the testnet started, so we should log it with an issue. Good practice would thus be to open an issue in the Gaia repo for any bug in the Gaia binary, and to link to an issue in the appropriate other repo if the bug is in dependency code, e.g. link to an issue in the Cosmos SDK repo if a bug in Gaia was caused by a bug in the SDK. This also lets us track how error-prone our dependencies are, and which of them need special attention.

If we decide to integrate extra labels into our workflow, we should take care to a) make sure the current team knows about the workflow and what to do with bug labels, and b) make sure that new incoming team members learn about it. One way to do this would be to add a section to CONTRIBUTING.md that specifically states that the person triaging or closing an issue should take care of these labels. It could also be worth looking into bots to help with part of this, e.g. a bot that adds a "classification needed" label when an issue labelled bug is closed without classification labels. Many of the labels I am proposing already exist but haven't really been used, even when they would have been appropriate.
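
The core check such a bot would perform is small. A minimal sketch, assuming the label scheme from this proposal (the function name and the `bug:`/`found:` prefix convention are my own, not an existing bot or API):

```python
# Sketch of the bot's core check: when an issue labelled "bug" is
# closed, flag it if it is missing a bug:{type} or found:{source}
# label. Label names follow this proposal; everything else here is
# hypothetical illustration.
def needs_classification(labels):
    """Return True if a closed bug issue lacks classification labels."""
    labels = {label.lower() for label in labels}
    if "bug" not in labels:
        return False  # only issues labelled bug need classification
    has_type = any(label.startswith("bug:") for label in labels)
    has_source = any(label.startswith("found:") for label in labels)
    return not (has_type and has_source)

print(needs_classification(["bug", "found:testnet"]))            # True
print(needs_classification(["bug", "bug:tests", "found:user"]))  # False
```

In a real deployment this check would run in response to an issue-closed event (e.g. a GitHub Actions workflow on the `issues` event) and add the "classification needed" label when it returns True.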

Potential issues with the solution

Aren’t we bound to run into https://en.wikipedia.org/wiki/Survivorship_bias, since the bugs we should find are the ones that we currently miss?

This seems true, but collecting better data will also help us avoid survivorship bias, and to search where we are currently not finding bugs.

People will forget to tag issues or the tags will fall out of use over time.

That might happen, but the tags may still yield useful data until then. In addition, we do not need 100% of issues to be tagged correctly to get useful data; better data is good, even if it is not perfect.

External users shouldn’t be bothered with this.

I think it is important that the core team takes care of this; we should not expect external users to know how to classify things. They might guess a category, or they might not, but either way this is fine, since we triage issues anyway and can add the labels then.

Maybe categorizing these things by hand is pointless, if we could get LLMs to characterize issues.

If anyone has suggestions, I would be really happy to hear them. It seems there are out-of-the-box plugins that could help streamline this; see https://github.com/apps/automatic-issue-classifier

@p-offtermatt
Contributor Author

informalsystems/hermes#2981 might be related

@p-offtermatt
Contributor Author

Talked to Marius about this, and I will come up with a plan to implement this, so I'm reassigning to myself.
