Prepare v2.52.0-rc.0 release #13969

Merged
ArthurSens merged 1 commit into release-2.52 from prepare-v2.52 on Apr 30, 2024

ArthurSens (Member)

There were a few entries I wasn't sure how to announce. Please let me know if you have better suggestions :)

ArthurSens changed the base branch from main to release-2.52 on April 22, 2024 15:35

bwplotka (Member) left a comment:

Nice, a couple of suggestions, thanks!

(5 review comments on CHANGELOG.md, outdated and resolved)
ArthurSens force-pushed the prepare-v2.52 branch 6 times, most recently from 40a7376 to 6698159, on April 23, 2024 11:33

bwplotka (Member) left a comment:

Nice! We spent some time syncing with @ArthurSens, which resulted in these notes.

(5 review comments on CHANGELOG.md, outdated and resolved)

Review thread on these CHANGELOG.md lines:
* [BUGFIX] Scrape: Fix setting native histogram schema factor during scrape. #13846
* [BUGFIX] TSDB: Fix counting of histogram samples when creating WAL checkpoint stats. #13776
* [BUGFIX] TSDB: Avoid compacting empty heads. #13755
* [BUGFIX] TSDB: Count float histograms in WAL checkpoint. #13844

Reply (Member):

We could make this a feature, or merge it with the stats bugfix.

Reply (Member Author):

Thinking about this again, one is about counting in the WAL checkpoint and the other is about counting in query stats... I'm not sure it makes sense to join them into a single changelog entry.

(Review comment on CHANGELOG.md, outdated and resolved)
ArthurSens (Member Author):

/prombench v2.51.2

prombot (Contributor) commented Apr 23, 2024:

@ArthurSens is not an org member or a collaborator and cannot execute benchmarks.

Nexucis (Member) commented Apr 23, 2024:

/prombench v2.51.2

prombot (Contributor) commented Apr 23, 2024:

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-13969 and v2.51.2

After successful deployment, the benchmarking results can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.51.2

ArthurSens changed the title from "Prepare v2.52 release" to "Prepare v2.52.0-rc.0 release" on Apr 23, 2024
Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
ArthurSens (Member Author):

Small decrease in memory allocations:
[image]

I can see a very small increase in CPU, though I'm not sure whether it is significant:
[image]

I also see small increases in all query steps, but they are in the nanoseconds or very low milliseconds.

prombot (Contributor) commented Apr 26, 2024:

Benchmark tests have been running for 3 days! If this is intended, ignore this message; otherwise, you can cancel it by commenting: /prombench cancel

ArthurSens (Member Author):

/prombench cancel

prombot (Contributor) commented Apr 26, 2024:

Benchmark cancel is in progress.

ArthurSens (Member Author):

Initially, I thought the latency increases in the individual query steps would be insignificant, but looking at overall query latency I can see a significant impact on the 0.99 quantile:

[image]

Query used:

sum(prometheus_engine_query_duration_seconds{namespace="prombench-${prNumber}"}) by (prometheus, quantile)

bwplotka (Member):

I also noticed uneven traffic to the services, which explains the higher latency of the Prometheus instance from this PR during periods when it had more requests to serve than the old one. Unfortunately, this uneven query load distribution still happens, although we have already improved it a lot:

[image]

bwplotka (Member) commented Apr 30, 2024:

Unfortunately, we see a difference even at the 0.9 quantile during moments when the load was even. It seems we do indeed have a small regression in query tail latency in this release, but perhaps one we can tolerate. Many optimizations went into this release, and any of them could have caused it.

[image]

bwplotka (Member) commented Apr 30, 2024:

At the 0.99 quantile the difference was sometimes extreme. Correlating it with CPU does not explain it (maybe slightly more CPU was used, which is weird):

[image]

bwplotka (Member):

I think that one was affected by compaction (unlucky):

[image]

bwplotka (Member) left a comment:

A bit sketchy: there is a slight difference in query tail latency, but it is perhaps negligible.

Either we stop the release and try to bisect which optimization could have caused this, OR we proceed and optimize later when (if) users are actually affected, ideally during the RC phase. Happy with either option.

ArthurSens (Member Author) commented Apr 30, 2024:

Hmmm, I didn't notice the compaction happening at the same time, but compaction happens on both versions and only one of them spikes. The uneven query distribution sounds like a good explanation, though.

I feel this release candidate has been delayed enough already; let's publish it and see whether we get more feedback about slow queries.

ArthurSens merged commit f170a01 into release-2.52 on Apr 30, 2024
46 checks passed
ArthurSens deleted the prepare-v2.52 branch on April 30, 2024 12:18

5 participants