
Add support for total samples calculation in query stats. #173

Open · wants to merge 1 commit into main
Conversation

@sahnib commented Feb 9, 2023

This PR adds support for calculating total samples during a query, and returning this result as query statistics.

Issue: #148

Prometheus defines totalQueryableSamples as the total number of samples read out of the underlying Queryable instance, inclusive of any samples that are buffered between range steps. The test cases have been modified to compare the totalQueryableSamples value between the Prometheus engine and the Thanos promql-engine and ensure they are the same.

I am still working on ensuring the samples are calculated correctly in the distributed engine; however, filing the PR early lets us align on the interface changes.

Two methods are added to the model.VectorOperator interface (sketched below):

  1. Stats(): This method provides the query Stats after this operator has been evaluated.
  2. Drain(): Drains the operator so that all running goroutines finish. This is required to be able to calculate the total samples in the query.
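
Roughly, the extended interface would look something like the sketch below. The exact signatures, in particular the Stats() return type and whether Drain() takes a context, are illustrative here rather than the final code in the PR:

package model

import (
	"context"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/util/stats"
)

// VectorOperator with the two proposed methods added. Existing methods are
// abridged; the Stats/Drain signatures are a sketch, not the final code.
type VectorOperator interface {
	Next(ctx context.Context) ([]StepVector, error)
	Series(ctx context.Context) ([]labels.Labels, error)

	// Stats returns the query statistics collected by this operator and the
	// operators below it, once evaluation has finished.
	Stats() *stats.QuerySamples

	// Drain evaluates the operator to exhaustion so that any goroutines it
	// spawned terminate and all scanned samples are accounted for.
	Drain(ctx context.Context) error
}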

Base operators such as vectorSelector and matrixSelector calculate query samples in the Next() function. Higher-level operators rely on the operators nested inside them as needed. Some higher-level operators, such as dedup, need to recalculate total samples because the underlying operators can double count some series.

We can extend this logic to enforce the maxSamples limit at the operator level in the future.

Signed-off-by: sahnib sahnib@amazon.com

@fpetkovski (Collaborator) commented Feb 10, 2023

Thanks for the PR and for reaching out early with a draft implementation. Pointing out the change to the core VectorOperator interface is indeed a good call, and I wonder if we can implement this feature without modifying the interface.

Can we instead inject a Stats object into the vector and matrix execution operators and modify stats using functions from the atomic package? I don't think we need to worry about dedup right now since it is only used for distributed execution.
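
A minimal sketch of what that injection could look like, assuming the QuerySamples struct from prometheus/util/stats; the type and method names here are illustrative, not existing code in this repo:

package engine // placement is illustrative

import (
	"sync/atomic"

	"github.com/prometheus/prometheus/util/stats"
)

// querySamples is a shared counter built once per query and injected into the
// vector and matrix selector operators when the plan is built.
type querySamples struct {
	total int64
}

// Add is safe to call concurrently from operator goroutines.
func (q *querySamples) Add(n int64) {
	atomic.AddInt64(&q.total, n)
}

// PopulateInto copies the collected count into the Prometheus stats struct
// that the compatibility query returns to callers.
func (q *querySamples) PopulateInto(s *stats.QuerySamples) {
	s.TotalSamples = atomic.LoadInt64(&q.total)
}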

@sahnib (Author) commented Feb 10, 2023

Thanks a lot for the guidance @fpetkovski. Injecting the stats object sounds like a good idea. We would need a mechanism to collect these stats and populate them in the compatibilityQuery object.

As I understand it, we will inject the Stats object when we build the operators, and then increment it using atomic operations as we collect samples. To confirm, is the suggestion to share the same stats object between the compatibility query and the underlying operators that perform the calculation?

Another issue I observed is that concurrencyOperator does not have a Drain() or Close() method today. This results in the underlying goroutines continuing to run even after the top-level operator is finished (an example is the binary/vector.go operator). I see that you also have a TODO regarding this. Do you think we should keep the Drain() or Close() method for this functionality?

@fpetkovski (Collaborator) commented Feb 12, 2023

To confirm, is the suggestion to share the same stats object between the compatibility query and the underlying operators that perform the calculation?

That's a good question, and I am not sure what the best approach would be here. The advantage of sharing the object is keeping the code simpler, and I don't expect the atomic increments to have a significant performance impact. But yes, that would be the idea, and it's worth keeping an eye on performance regressions with the benchmarks we have in the repo.

Another issue I observed is that concurrencyOperator does not have a Drain() or Close() method today. This results in the underlying goroutines continuing to run even after the top-level operator is finished.

In theory each finished query will make sure the context is cancelled. Once the context is cancelled, all operators will terminate and I would expect memory to get released. The concurrency operator will also drain the channel before it terminates: https://github.com/thanos-community/promql-engine/blob/main/execution/exchange/concurrent.go#L93.

Have you seen issues with the current termination mechanism in practice?

@sahnib (Author) commented Feb 15, 2023

I looked further into the suggestion of incrementing stats using an atomic counter, @fpetkovski, but ran into an issue with incrementing stats in the vector and matrix operators, specifically with the stepInvariantOperator.

Prometheus's totalQueryableSamples is inclusive of any samples that are buffered between range steps. Hence, stepInvariantOperator needs to be aware of the samples produced by its next operator and multiply them by numSteps to account for samples buffered between range steps. I considered using the number of samples in u.cachedVector, but that does not work, because the samples in cachedVector can come from a noArgFunctionOperator like pi(), which should not be counted, or from a vector selector, which should be counted.

One mechanism to solve this is to also return the number of samples considered from the Next() function, e.g. Next(ctx context.Context) ([]StepVector, int64, error), and use this value in stepInvariantOperator. This changes the interface again, though it's still a smaller change than adding new methods. We will need this value in the future anyway to implement query samples per step. Let me know your thoughts.
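
For illustration, the step-invariant bookkeeping I have in mind boils down to something like the sketch below; the function and parameter names are made up for the example:

// scaledStepInvariantSamples: the inner expression of a step-invariant node is
// evaluated only once, but Prometheus attributes its samples to every output
// step, so the inner count is multiplied by the number of steps. Operands that
// decode nothing from storage, e.g. pi() via noArgFunctionOperator, must not
// be counted at all.
func scaledStepInvariantSamples(innerSamples int64, numSteps int, innerScansSamples bool) int64 {
	if !innerScansSamples {
		return 0
	}
	return innerSamples * int64(numSteps)
}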

I do wonder if this calculation will get a bit dicey when we consider distributed execution, as we will need to subtract duplicated samples in the dedup operator. I would like thanos/promql-engine to support query samples for distributed execution in the future. (I have not completely thought through the distributed path, though, and am not sure how easy it will be to support the sample calculation in distributed mode, tbh.)

In theory each finished query will make sure the context is cancelled. Once the context is cancelled, all operators will terminate and I would expect memory to get released. The concurrency operator will also drain the channel before it terminates: https://github.com/thanos-community/promql-engine/blob/main/execution/exchange/concurrent.go#L93.

In my testing, I saw that the tests were calculating lower sample values compared to vanilla Prometheus. Further debugging suggested that goroutines in concurrencyOperator were left incomplete. Cancelling the context allows the goroutines to terminate, but we miss counting some samples because evaluation is short-circuited (https://github.com/thanos-community/promql-engine/blob/main/execution/exchange/concurrent.go#L77). I was able to get around this today by repeatedly calling rhs.Next() here to drain the operator, so we might not need to add an additional Drain() method to the interface for now.
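
The workaround is essentially a small drain loop like the sketch below; the helper, its package placement, and the import path are illustrative, and it relies on the engine's convention that a nil batch from Next means the operator is exhausted:

package exchange // placement is illustrative

import (
	"context"

	"github.com/thanos-community/promql-engine/execution/model"
)

// drainOperator keeps pulling from Next until the operator is exhausted so
// that the goroutines it started run to completion and every sample they
// scan gets counted.
func drainOperator(ctx context.Context, op model.VectorOperator) error {
	for {
		batch, err := op.Next(ctx)
		if err != nil {
			return err
		}
		if batch == nil {
			return nil
		}
	}
}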

@fpetkovski (Collaborator) commented:

Thanks for the elaboration @sahnib. I am out sick this week but I will revisit this again on Monday.

@fpetkovski (Collaborator) commented:

@sahnib would you mind adding some failing tests as part of the PR so I can get a better insight into the problem? I am not familiar with how query stats work in the Prometheus engine and getting some concrete examples would help quite a bit.

@sahnib (Author) commented Feb 22, 2023

Yeah, no worries @fpetkovski. I updated the PR with changes based on atomic counters. If I exclude the changes in the step_invariant.go file (https://github.com/thanos-community/promql-engine/pull/173/files#diff-15039040719a84e49e402377c62ed1606af8f83e9d278e3e2b695b8eacfa90c2) and run TestQueriesAgainstOldEngine, the following tests fail because the total query samples do not match. Let me know if this helps.

[screenshot of failing test output: 2023-02-21-160211_604x1064_scrot]

@sahnib sahnib marked this pull request as ready for review February 23, 2023 04:09
@fpetkovski (Collaborator) commented:

I took a stab at this to get some insight into the mismatch we get with the Prometheus engine: https://github.com/thanos-community/promql-engine/compare/main...fpetkovski:engine-query-stats?expand=1. I also took a look at the documentation for the TotalSamples field, which says:

// TotalSamples represents the total number of samples scanned
// while evaluating a query.
TotalSamples int64

Looking at the test failures, I see that most of them are caused by the step invariant optimization or by a short-circuit in the binary operator, which in our case produces lower values for TotalSamples.

Here are my thoughts on it:

  • The TotalSamples calculation depends on the engine implementation. In our case we short-circuit some expressions (for example, the test case topk with expression as argument not returning any value), which is actually a good thing. Since we don't really scan samples during these optimizations, I don't think we should be overcounting just to match the output of the Prometheus engine.
  • The same would hold for the step invariant operator. In most execution plans we use a cached vector and avoid chunk decoding. Since we are not really scanning samples, I don't think we should be increasing TotalSamples just because Prometheus does it. An example here is count_over_time @ start which only decodes one sample per series. Prometheus claims it decodes the entire range, and it might in fact be doing that, but that's likely due to an inefficient optimization.
    If we really wanted the operator to also contribute to the total count, we could inspect its operand here and pass in the stats object only in cases where it should be affected (see the sketch below).
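
Concretely, the check could be as small as this sketch; the helper name is made up, and the real wiring would live in execution.go where the step invariant operator is built:

package execution // illustrative placement

import "github.com/prometheus/prometheus/promql/parser"

// stepInvariantOperandScansSamples reports whether the operand of a step
// invariant expression actually decodes samples from storage, i.e. whether
// the shared stats counter should be wired into that subtree.
func stepInvariantOperandScansSamples(expr parser.Expr) bool {
	switch expr.(type) {
	case *parser.VectorSelector, *parser.MatrixSelector:
		return true
	default:
		// e.g. pi() decodes nothing, so it should not contribute.
		return false
	}
}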

So to summarize, I would focus on making sure we count decoded samples correctly instead of trying to match what Prometheus does in this case. Let me know what you think.

@sahnib (Author) commented Mar 2, 2023

Thanks for your input @fpetkovski. I was a bit tied up last week and could not get to this.

The feedback makes sense to me. However, I looked further into Prometheus, and it seems there was a deliberate decision not to account for optimizations that happen inside the query engine. (The comment for TotalSamplesPerStep states this; see the code pasted below from Prometheus.)


	// TotalSamples represents the total number of samples scanned
	// while evaluating a query.
	TotalSamples int64

	// TotalSamplesPerStep represents the total number of samples scanned
	// per step while evaluating a query. Each step should be identical to the
	// TotalSamples when a step is run as an instant query, which means
	// we intentionally do not account for optimizations that happen inside the
	// range query engine that reduce the actual work that happens.
	TotalSamplesPerStep []int64

We can choose to diverge from Prometheus here, but I think the decision to exclude optimizations was made to allow users to rely on TotalSamples regardless of the query engine used. Let me know what you think.

@fpetkovski (Collaborator) commented:

Given that with this implementation we want to focus on performance, adding a drain method would likely go against that goal and defeat the purpose of adding various optimizations.

I am okay with counting samples from the step invariant operator if we can do it by injecting a flag when the downstream operator is a vector/matrix selector. We should be able to do that here: https://github.com/thanos-community/promql-engine/blob/main/execution/execution.go#L225-L234

@sahnib (Author) commented Mar 14, 2023

Thanks @fpetkovski. Your comments make sense to me. The PR got accidentally closed by a GitHub workflow syncing my main branch (apologies for that).

Given that with this implementation we want to focus on performance, adding a drain method would likely go against that goal and defeat the purpose of adding various optimizations.

I am aligned here. One of the primary motivations for the new engine is to improve performance. Unless we can come up with some way to add these samples without evaluation, we should skip it. I will exclude the change that runs these operators to completion, and modify the tests accordingly.

I am okay with counting samples from the step invariant operator if we can do it by injecting a flag when the downstream operator is a vector/matrix selector. We should be able to do that here: https://github.com/thanos-community/promql-engine/blob/main/execution/execution.go#L225-L234

Thanks for your input. I will try this and get back to you. This will help us avoid the interface change, so I like the idea.

@sahnib sahnib reopened this Mar 14, 2023
@GiedriusS (Member) commented:

Perhaps a simpler approach would be the second option in #106 (comment), to avoid extending the interface?

@sahnib (Author) commented Mar 23, 2023

@fpetkovski I have removed the changes to drain the operators in favor of performance. However, I tried the proposal of counting samples from the step invariant operator by injecting a flag when the downstream operator is a vector/matrix selector, which we should be able to do here: https://github.com/thanos-community/promql-engine/blob/main/execution/execution.go#L225-L234

I ran into an issue with this approach. The current mechanism to create the iterator model (https://github.com/thanos-community/promql-engine/blob/main/execution/execution.go#L231-L235) is very flexible (which is a good thing :)), but it means that the step invariant operator can have a nested vector selector inside it; an example is provided below. This results in a lot of state management just to track whether an underlying operator inside the stepInvariant operator is a vector/matrix selector, and I don't think this state is easy to manage.


=== RUN   TestInstantQuery/disableOptimizers=true/count_over_time_@_start#01/disableFallback=false
[*stepInvariantOperator]:
└──[*coalesce]:
   ├──[*concurrencyOperator(buff=2)]:
   │  └──[*matrixSelector] count_over_time({[__name__="http_requests_total"]}[5m0s] 0 mod 4)
   ├──[*concurrencyOperator(buff=2)]:
   │  └──[*matrixSelector] count_over_time({[__name__="http_requests_total"]}[5m0s] 1 mod 4)
   ├──[*concurrencyOperator(buff=2)]:
   │  └──[*matrixSelector] count_over_time({[__name__="http_requests_total"]}[5m0s] 2 mod 4)
   └──[*concurrencyOperator(buff=2)]:
      └──[*matrixSelector] count_over_time({[__name__="http_requests_total"]}[5m0s] 3 mod 4)
            --- PASS: TestInstantQuery/disableOptimizers=true/count_over_time_@_start#01/disableFallback=false (0.00s)

@GiedriusS thanks for your input. Currently, we are handling this by passing down a queryStats object into the vector/matrix selector operators. Passing around an OperatorTracker is an interesting idea, which I think is along the same lines as injecting queryStats. We can aggregate in the injected queryStats object or in the passed-around OperatorTracker. I am okay with both approaches.

In order to help us move in some direction, I have removed the changes in the stepInvariant operator for now (until we settle on a mechanism to aggregate these samples in query stats). @fpetkovski @GiedriusS let me know if this subset of changes looks good.

Would it be possible for us to jump on a call to decide between passing around an OperatorTracker and injecting queryStats, and to discuss how to count samples scanned per step in the stepInvariantOperator?

@sahnib sahnib changed the title Add support for total samples calculation in quey stats. Add support for total samples calculation in query stats. Mar 23, 2023
@sahnib sahnib force-pushed the main branch 5 times, most recently from 00c5c6d to 9fbbbe2 Compare March 27, 2023 19:16
@fpetkovski (Collaborator) commented:

@sahnib We spoke with Giedrius in Slack about this. Let's extend the interface methods by adding an OperatorTracer to Next and Series. The tracer can hold a reference to the query stats, and the matrix and vector selectors can increment those.

As we discussed during community hours, we can omit the step invariant operator for now and deal with it in a separate issue.
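
In sketch form, the direction we discussed looks roughly like this; field and method names are placeholders, not the final code:

package model // illustrative

import (
	"context"
	"sync/atomic"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/util/stats"
)

// OperatorTracer is threaded through Next and Series and holds a reference to
// the query's stats so the vector and matrix selectors can record the samples
// they scan.
type OperatorTracer struct {
	Samples *stats.QuerySamples
}

// AddSamples records n scanned samples; it is a no-op when stats collection
// is disabled for the query.
func (t *OperatorTracer) AddSamples(n int64) {
	if t == nil || t.Samples == nil {
		return
	}
	atomic.AddInt64(&t.Samples.TotalSamples, n)
}

// VectorOperator with the tracer added to the two evaluation methods.
type VectorOperator interface {
	Next(ctx context.Context, tracer *OperatorTracer) ([]StepVector, error)
	Series(ctx context.Context, tracer *OperatorTracer) ([]labels.Labels, error)
}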

@sahnib (Author) commented Apr 6, 2023

Thanks @fpetkovski, I will make these changes in the next couple of days.

@sahnib sahnib force-pushed the main branch 2 times, most recently from bb4a6ec to 10e53ba Compare April 10, 2023 18:40
@fpetkovski (Collaborator) left a review comment


Thanks, looks good to me!

@@ -299,6 +321,7 @@ loop:
}
// Values in the buffer are guaranteed to be smaller than maxt.
if t >= mint {
currSamples += 1
A reviewer (Member) commented:

We are just adding 1 to this variable but not using it anywhere? Also, what about selectExtPoints? 🤔 I'm surprised that the linter hasn't caught this 😄

@sahnib (Author) replied:

Thanks for catching this. It's a remnant of my refactoring :). I have removed it.

The actual code to count samples is abstracted out into the func (o *matrixSelector) countSamples(points []promql.Point) int64 function (called at line 156 in matrix_selector.go). This prevents duplicating the logic in both selectExtPoints and selectPoints.
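
For reference, the helper boils down to a per-window tally along these lines; this is only a sketch of the idea, not the exact implementation in matrix_selector.go:

// countSamples tallies the samples buffered for one series window; each
// promql.Point is counted as a single sample in this sketch.
func (o *matrixSelector) countSamples(points []promql.Point) int64 {
	return int64(len(points))
}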

Commit message (title truncated): …m vector operators during query execution.

The OperatorTracer is passed to the VectorOperator during Series() and Next() operations, and is a container
to aggregate o11y information. Currently, the vector selector and matrix selector operators use this tracer
for calculating query samples.
@GiedriusS (Member) commented:

Thanks for your PR, we will discuss this topic a bit more with @PradyumnaKrishna and @saswatamcode, as it closely relates to our operator tracing project, and then come back to you.

@GiedriusS (Member) commented:

We are working on exposing Explain() through the Thanos UI first, so we will get to your PR in a week or so. Even if we don't merge this as-is, it is definitely a very good example that we will start our work from!
