Code insights: Extend GraphQL schema to expose repositories that cause incomplete datapoints #62578

Open
bahrmichael opened this issue May 9, 2024 · 2 comments · May be fixed by #62756
Labels
code-insights Issues related to the Code Insights product code-insights-api An issue that relates to the code insights api


@bahrmichael
Contributor

#62295 surfaced how hard it is to debug timeouts and generic errors that lead to incomplete data points. The documentation says that the user should reduce the scope of the query, but we see that customers prefer to run queries over all of their repositories. I'm not sure how they would bisect their repositories to exclude the problematic ones.

Our GraphQL schema for TimeoutDatapointAlert and GenericIncompleteDatapointAlert contains the time of the data point and, at most, a generic reason why something went wrong.

We could help users who don't have access to logs by exposing the repository or repositories that caused the problems.

Example:

```graphql
type GenericIncompleteDatapointAlert implements IncompleteDatapointAlert {
    """
    The data point that is incomplete.
    """
    time: DateTime!

    """
    A message describing why the datapoint was marked incomplete.
    """
    reason: String!

    """
    The repositories that this data point is incomplete for.
    """
    repositories: [String!]!
}
```

As a next step we could surface this information in our UI, but exposing it in our GraphQL API should be a good start.

@bahrmichael bahrmichael added code-insights Issues related to the Code Insights product code-insights-api An issue that relates to the code insights api labels May 9, 2024
@camdencheek
Member

Seems useful!

One question: can't we have multiple reasons? In particular, wouldn't there potentially be a different reason per incomplete repository?

@bahrmichael
Contributor Author

> wouldn't there potentially be a different reason per incomplete repository?

In theory, yes. Currently we only expose timeout and generic alerts. From what I can see, the code encourages adding new types when there are errors that we want to distinguish from each other.

```go
func (g *genericIncompleteDatapointAlertResolver) Reason() string {
	switch g.point.Reason {
	default:
		return "There was an issue during data processing that caused this point to be incomplete."
	}
}
```

```go
const (
	ReasonTimeout           IncompleteReason = "timeout"
	ReasonGeneric           IncompleteReason = "generic"
	ReasonExceedsErrorLimit IncompleteReason = "exceeds-error-limit"
)
```
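If each repository were to carry its own reason, as suggested in the question, the `Reason()` switch would need one case per reason. Here is a minimal sketch of that mapping. Only the generic message comes from the existing resolver. The timeout and exceeds-error-limit messages, `reasonMessage`, and the per-repository map are hypothetical.

```go
package main

import "fmt"

// IncompleteReason mirrors the string type used by the constants in the codebase.
type IncompleteReason string

const (
	ReasonTimeout           IncompleteReason = "timeout"
	ReasonGeneric           IncompleteReason = "generic"
	ReasonExceedsErrorLimit IncompleteReason = "exceeds-error-limit"
)

const genericMessage = "There was an issue during data processing that caused this point to be incomplete."

// reasonMessage maps a stored reason to a user-facing message; unknown and
// generic reasons fall back to the existing generic text.
func reasonMessage(r IncompleteReason) string {
	switch r {
	case ReasonTimeout:
		return "The search for this data point timed out."
	case ReasonExceedsErrorLimit:
		return "Too many errors occurred while computing this data point."
	default:
		return genericMessage
	}
}

func main() {
	// Hypothetical per-repository reasons for a single data point.
	perRepo := map[string]IncompleteReason{
		"github.com/acme/monorepo": ReasonTimeout,
		"github.com/acme/api":      ReasonGeneric,
	}
	for repo, reason := range perRepo {
		fmt.Printf("%s: %s\n", repo, reasonMessage(reason))
	}
}
```

Every new reason adds a case here and, if it gets its own alert type, a new GraphQL type. That growth is the rabbit hole described below.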

I feel that changing the shape of the incomplete datapoints and their sub-fields could become a rabbit hole, and I would avoid going this route unless necessary.

An alternative that still lets users isolate the problematic repos could be an additional field next to the incomplete datapoints that just lists the repos with problems.

```graphql
"""
Status indicators for a specific series of insight data.
"""
type InsightSeriesStatus {
    ...

    """
    Data points that are flagged terminally incomplete for this series.
    """
    incompleteDatapoints: [IncompleteDatapointAlert!]!

    repositoriesWithProblems: [String!]!
}
```

Instead of a list of repo names, this could also be a more sophisticated list with error reasons and timestamps.

bahrmichael added a commit to sourcegraph/docs that referenced this issue May 21, 2024
For sourcegraph/sourcegraph#62295

This PR updates the documentation with more tips for very large
repositories.

Code Insights may run for a while and then tell the user that some
data points are incomplete. This probably happens because very large
repositories can't be processed fast enough.

In addition to this documentation update I'm working on giving users
more information about which repositories lead to incomplete datapoints:
sourcegraph/sourcegraph#62578

---

@sourcegraph/search-platform I poked a bit at the search backend while
gathering this info, and would like your input on whether it's accurate
and whether there are other improvements that could make complex
queries run faster on very large repos :)

@mike-r-mclaughlin Could you review whether this new info would be
helpful for customers? I'm planning to expose the repositories that
caused incomplete datapoints with sourcegraph/sourcegraph#62578. Then a
customer can see which repository didn't compute, pick that one,
optimize the query, and then run the big Code Insight again.
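Here is a rough sketch of the workflow from the commit message: scope the query to the failing repository for tuning, or rerun the big insight without it. It assumes Sourcegraph's `repo:` filter takes a regular expression and that `-repo:` excludes matches. The helper names and the example query are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// repoFilter builds an anchored, escaped repo: filter that matches exactly
// one repository, since repo: values are treated as regular expressions.
func repoFilter(name string) string {
	return "repo:^" + regexp.QuoteMeta(name) + "$"
}

// scopeToRepo narrows a query to a single repository for debugging.
func scopeToRepo(query, name string) string {
	return repoFilter(name) + " " + query
}

// excludeRepos appends one -repo: filter per repository to the query.
func excludeRepos(query string, names []string) string {
	parts := []string{query}
	for _, n := range names {
		parts = append(parts, "-"+repoFilter(n))
	}
	return strings.Join(parts, " ")
}

func main() {
	query := "lang:go TODO" // hypothetical insight query
	failing := []string{"github.com/acme/monorepo"}
	fmt.Println(scopeToRepo(query, failing[0])) // prints repo:^github\.com/acme/monorepo$ lang:go TODO
	fmt.Println(excludeRepos(query, failing))   // prints lang:go TODO -repo:^github\.com/acme/monorepo$
}
```

Escaping with `regexp.QuoteMeta` and anchoring with `^...$` keeps `github.com/acme/monorepo` from also matching names like `github.com/acme/monorepo-archive`.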