
[Dynatrace] Move zero percentile creation from MeterFilter to Meter creation #4782

Open · wants to merge 4 commits into 1.12.x
Conversation

@pirgeo (Contributor) commented Mar 1, 2024

resolves #4750

See the issue for more context. This is mostly a refactoring of where the percentiles are registered (in the MeterFilter vs. during Meter creation).

CC @shakuzen
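
For context, here is a minimal, hypothetical sketch of the MeterFilter-based approach referred to above. The class name and the exact merging logic are illustrative assumptions, not the registry's actual filter code; the idea is simply that a filter rewrites the DistributionStatisticConfig so that the 0th percentile is always included alongside the requested percentiles.

import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;

import java.util.stream.DoubleStream;

// Illustrative only: merge a 0th percentile into whatever percentiles were requested.
public class ZeroPercentileFilter implements MeterFilter {

    @Override
    public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
        double[] requested = config.getPercentiles();
        if (requested == null || requested.length == 0) {
            // No percentiles requested for this meter; leave the config untouched.
            return config;
        }
        double[] withZero = DoubleStream.concat(DoubleStream.of(0.0), DoubleStream.of(requested))
                .distinct()
                .sorted()
                .toArray();
        return DistributionStatisticConfig.builder()
                .percentiles(withZero)
                .build()
                .merge(config);
    }
}

A registry applies such a filter via registry.config().meterFilter(...); this PR moves the zero-percentile logic out of the filter and into the code paths where the Dynatrace meters themselves are created.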

@shakuzen (Member) left a comment:

Overall looks good, I think. You can rebase now with the changes to LTT. Are you planning to add some tests?

}

public void addMeterId(Meter.Id id) {
    metersWithArtificialZeroPercentile.add(id.getName() + ".percentile");
Member:

Why not use a Set of Id instead of this?

Contributor Author:

I mainly moved the code around and didn't actually change much, but I guess we could use a Set of Meter.Ids instead. That would then allow you to explicitly turn the export of the 0th percentile on or off for one particular Meter.Id. For example, meter.name with the tag tag=value1 would export a 0th percentile, but meter.name with tag=value2 would not. I am not sure this is really that relevant; I assume percentiles are generally specified at the metric-name level. I am happy to change it, though, if you feel it's relevant.
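
For illustration, a hedged sketch of the two bookkeeping options discussed here. The field and method names in the first half are taken from the snippet above; the concurrent set and the Meter.Id-based variant are assumptions made for this example, not code from the PR.

import io.micrometer.core.instrument.Meter;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ZeroPercentileTracking {

    // Current approach: remember the derived metric name ("<name>.percentile")
    // that received an artificial 0th percentile. Tags are not part of the key.
    private final Set<String> metersWithArtificialZeroPercentile = ConcurrentHashMap.newKeySet();

    public void addMeterId(Meter.Id id) {
        metersWithArtificialZeroPercentile.add(id.getName() + ".percentile");
    }

    // Alternative discussed above: remember the full Meter.Id, which keeps the tags,
    // so meter.name{tag=value1} and meter.name{tag=value2} can be tracked separately.
    private final Set<Meter.Id> idsWithArtificialZeroPercentile = ConcurrentHashMap.newKeySet();

    public void trackExactId(Meter.Id id) {
        idsWithArtificialZeroPercentile.add(id);
    }
}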

@pirgeo (Contributor Author) commented Mar 4, 2024

Thanks for taking a look! I rebased on the latest branch and added tests.

// For LongTaskTimer, the 0 percentile is not tracked as it doesn't clear the
// "interpolatable line" threshold defined in DefaultLongTaskTimer#takeSnapshot().
// see shouldTrackPercentilesWhenDynatraceSummaryInstrumentsNotUsed for a test
// that exports LongTaskTimer percentiles
Contributor Author:

This comment is referring to these three lines:

List<Double> percentilesAboveInterpolatableLine = percentilesRequested.stream()
.filter(p -> p * (activeTasks.size() + 1) > activeTasks.size())
.collect(Collectors.toList());

These lines drop the 0 percentile (percentilesAboveInterpolatableLine will no longer contain it).

I'm not sure if this is a bug or working as intended.

Contributor Author:

I dug a little deeper, and I think the problem lies in the fact that the 0% HistogramGauge is never registered.

  1. The LongTaskTimer is registered, with the explicit zero percentile.
  2. The histogram gauges get registered. To do so, an initial snapshot is created.
  3. At this point, the number of active tasks is 0. The LongTaskTimer is currently in the process of being registered. Thus, the calculation for percentilesAboveInterpolatableLine looks as follows:
    List<Double> percentilesAboveInterpolatableLine = percentilesRequested.stream()
    .filter(p -> p * (activeTasks.size() + 1) > activeTasks.size())
    .collect(Collectors.toList());

    Since the number of activeTasks is 0, the calculation in the filter effectively evaluates to p * (0 + 1) > 0, which is equivalent to p > 0. This, of course, is a condition that the 0 percentile (where p==0) explicitly does not fulfil. The 0 percentile is dropped here.
  4. Because the 0 percentile was dropped in the initial takeSnapshot, it will not be registered as a HistogramGauge because the percentileValues call has only percentiles > 0:
    ValueAtPercentile[] valueAtPercentiles = initialSnapshot.percentileValues();
  5. Since the 0th percentile gauge is never registered, it is also never exported.

Later, when the app is running and there are active tasks, this is caught in the if here:

if (!percentilesRequested.isEmpty() || !buckets.isEmpty()) {

However, on the initial takeSnapshot,
List<Double> youngestToOldestDurations = StreamSupport
.stream(((Iterable<SampleImpl>) activeTasks::descendingIterator).spliterator(), false)
.sequential()
.map(task -> task.duration(TimeUnit.NANOSECONDS))
.collect(Collectors.toList());
evaluates to an empty list, as there are no active tasks during registration, and the 0th percentile is never registered.
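
To make step 3 concrete, here is a small standalone sketch of what the filter evaluates to at registration time. It is not the registry code itself; activeTasks is reduced to a plain int (the real code calls activeTasks.size() on the collection of active samples).

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ZeroPercentileDropDemo {

    public static void main(String[] args) {
        List<Double> percentilesRequested = Arrays.asList(0.0, 0.5, 0.99);

        // At registration time there are no active tasks yet.
        int activeTasks = 0;

        // Same predicate as in DefaultLongTaskTimer#takeSnapshot():
        // p * (activeTasks + 1) > activeTasks, which for activeTasks == 0 is just p > 0.
        List<Double> percentilesAboveInterpolatableLine = percentilesRequested.stream()
            .filter(p -> p * (activeTasks + 1) > activeTasks)
            .collect(Collectors.toList());

        // Prints [0.5, 0.99]: the 0th percentile has been dropped, so no
        // HistogramGauge is ever registered for it.
        System.out.println(percentilesAboveInterpolatableLine);
    }
}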

Member:
We do have at least one bug in this area of the code. See #3877. I'll take a closer look tomorrow.

Member:

Just wanted to update here that I didn't get a chance to look into it more today due to some other work that needs to get done first. I'll come back to this once that's done.

Contributor Author:

Thank you. I have just very recently seen another example of what seems like a race condition caused by the LongTask ending while the snapshot is being produced. I think Dynatrace should be covered for now with the changes in #4780 (most users will use the new DynatraceLongTaskTimer), but I agree that this should be revisited at some point. Thanks for the heads up!
