Notes
Because it's already impossible to follow my notes on the topic of continuous profiling, and for lack of a better place, I'll put the notes here.
Continuous profiling platform — debug performance issues in your code! https://github.com/pyroscope-io/pyroscope
- agent-based
- the UI is nice
- badger for storage (store the parsed profiling tree)
12 Feb 2021
Unity3D profiler https://docs.unity3d.com/Manual/Profiler.html
The Unity Profiler is a tool you can use to get performance information about your application. You can connect it to devices on your network or devices connected to your machine to test how your application runs on your intended release platform.
09 Dec
Amazon's CodeGuru Profiler https://aws.amazon.com/codeguru/faqs/
CodeGuru Profiler consists of three parts: an agent, the profiler service, and intelligent recommendations. The agent runs as an in-process thread as part of your application. It takes data from each of your service instances running the agent and sends them to the profiler service every 5 minutes, which then aggregates them. CodeGuru Profiler then publishes the profile data in interactive flame graphs that enable you to visualize the performance of your application. CodeGuru Profiler also continuously scans the profiled data and compares it against Amazon and performance engineering best practices and proactively alerts you with intelligent recommendations when performance issues are discovered.
For now, it supports Java applications only.
03 Dec
PingCap's TiDB implemented a way to store pprof data as a tree in SQL tables. See https://github.com/pingcap/tidb/pull/13009
30 Oct
Have a look at atop(1).
Interesting part:
When atop is installed, the script atop.daily is stored in the /etc/atop directory. This script takes care that atop is activated every day at midnight to write compressed binary data to the file /var/log/atop/atop_YYYYMMDD with an interval of 10 minutes. Furthermore the script removes all raw files which are older than four weeks. The script is activated via the cron daemon using the file /etc/cron.d/atop with the contents
0 0 * * * root /etc/atop/atop.daily
15 Oct
Causal profiling in Go https://groups.google.com/forum/#!topic/golang-nuts/KrGnzhd3mV8
"Performance Matters", a talk about coz, a causal profiler https://www.youtube.com/watch?v=r-TLSBdHe1A
4 Oct
clinic: a set of tools to instrument and analyse the performance of a Node.js service (https://clinicjs.org) https://www.youtube.com/watch?v=ASv8188AkVk
22 Aug
Observation
The initial idea (or at least its implementation), to parse pprof files and store them in Postgres to save storage, didn't work. Postgres adds a lot of storage overhead compared to gzipped proto files, and I failed to build an efficient query API.
As the next experiment:
- go back to storing raw pprof files with an external index
- store everything in badger, building a custom index for quick querying (refer to Jaeger's badger plugin).
https://github.com/profefe/profefe/pull/28
10 Aug
Continuous profiling of JVM services with Opsian (opsian.com) https://www.youtube.com/watch?v=3E3QZfoB57M
Some nice wordings to describe the project.
Why continuous profiling?
- legal reasons: devs don’t have access to prod
- keep historical data: postmortems, comparing normal/abnormal behaviour
- store samples in context, e.g. labelling: compare a service’s performance in different environments (cloud vs hardware)
3 Aug
During GopherCon, I talked to people from Uber and CoreOS.
Uber
They want to build an internal system for continuous profiling. They are aware of profefe and have initial scaling requirements.
CoreOS (conprof)
- conprof re-uses the service discovery (SD) mechanism and labels from Prometheus
- stores raw pprof files in a forked version of TSDB
- everything related to operations (configuration, deployment, scalability) is the same as with Prometheus
1 Aug
conprof — like profefe but pull-based, uses prometheus’s tsdb to store pprof data. https://github.com/conprof/conprof
How do they implement the list query? Do they read all pb.gz files from the storage to read the profiles' metadata? What if there are hundreds of pprof files in question: is TSDB suitable for that?
28 May
https://github.com/nokia/memory-profiler Written in Rust. Has an API that could serve as inspiration for profefe's API.
22 May
stackimpact does basically the same thing as profefe, but costs money ($1K for <30 instances) https://stackimpact.com/docs/go-profiling/
22 Mar
(UI) Cividis colour scheme. Paper https://arxiv.org/ftp/arxiv/papers/1712/1712.01662.pdf Implementation https://github.com/pnnl/cmaputil/blob/master/colormaps/cividis.lut
10 Mar
(UI) Netflix’s FlameScope to visualise profiles https://github.com/Netflix/flamescope FlameScope shows a single perf file with the time distributed across XY axes: seconds on X, µ-seconds on Y.
6 Mar
I should have called it pporf or ppoof https://twitter.com/dgryski/status/1102678660237033472
5 Mar
(UI)
Profiling data (samples) from a fleet of instances can be visualised with heatmaps. See http://www.brendangregg.com/HeatMaps/utilization.html for the idea.
Netflix’s Vector — system-level performance visualiser. Use it to refer to heatmaps usage https://medium.com/netflix-techblog/extending-vector-with-ebpf-to-inspect-host-and-container-performance-5da3af4c584b An example of a dataset of I/O events to generate a heatmap https://github.com/spiermar/vector-pmda/tree/master/BINHeatMap
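A minimal sketch of how fleet samples could be bucketed into heatmap cells, with time on X and the measured value on Y. The `Point` type, the `Heatmap` function and the example numbers are hypothetical, not taken from Vector or from profefe.

```go
package main

import "fmt"

// Point is one measurement from one instance: when it was taken and its value
// (for example, the percentage of CPU samples spent in one function).
type Point struct {
	Second int     // seconds since the start of the window
	Value  float64 // expected to lie in [0, max)
}

// Heatmap counts how many points fall into each (time, value) cell.
// cells[y][x] is the number of points in value bucket y and time bucket x.
func Heatmap(points []Point, secondsPerCol, cols, rows int, max float64) [][]int {
	cells := make([][]int, rows)
	for y := range cells {
		cells[y] = make([]int, cols)
	}
	for _, p := range points {
		if p.Second < 0 || p.Value < 0 {
			continue // before the window or below the plotted range
		}
		x := p.Second / secondsPerCol
		y := int(p.Value / max * float64(rows))
		if x >= cols || y >= rows {
			continue // after the window or above the plotted range
		}
		cells[y][x]++
	}
	return cells
}

func main() {
	// Hypothetical data: share of CPU samples in one function, per instance.
	points := []Point{{0, 12}, {30, 15}, {70, 95}}
	for _, row := range Heatmap(points, 60, 2, 10, 100) {
		fmt.Println(row)
	}
}
```

Each cell's count would then be mapped to a colour intensity when rendering.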
22 Feb
Google’s “Hipster Shop” provides examples of using Stackdriver, OpenCensus, etc. May use it as a public demo for profefe? https://github.com/GoogleCloudPlatform/microservices-demo
15 Feb
Liveprof, a tool for automatically profiling PHP and collecting and aggregating the data. Similar idea to profefe. TODO Have a look at the implementation details and the UI https://habr.com/ru/company/badoo/blog/436364/
Instana — another company that provides an agent to collect application performance data. One needs to look at their agent code https://github.com/instana/go-sensor
5 Feb 2019
Observation
The major source of storage consumption in pprof files (pb.gz) is strings (package names, file names). When continuously profiling an application, the majority of files will contain the same strings. Instead of storing raw pprof files, we could parse them and store all strings of all profiles in a dedicated table/index. That means profiles from the same build would contain only profiling data (int64s).
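A minimal sketch of that idea: one string table shared across profiles, so each stored sample carries only integers. The `StringTable`, `Sample` and `Encode` names are hypothetical; this is not profefe's actual schema and not the pprof proto layout.

```go
package main

import "fmt"

// StringTable assigns a stable integer ID to each distinct string.
type StringTable struct {
	ids     map[string]int64
	strings []string
}

func NewStringTable() *StringTable {
	return &StringTable{ids: make(map[string]int64)}
}

// Intern returns the ID of s, adding it to the table on first use.
func (t *StringTable) Intern(s string) int64 {
	if id, ok := t.ids[s]; ok {
		return id
	}
	id := int64(len(t.strings))
	t.ids[s] = id
	t.strings = append(t.strings, s)
	return id
}

// Lookup resolves an ID back to its string.
func (t *StringTable) Lookup(id int64) string { return t.strings[id] }

// Sample is a stack of interned function-name IDs plus a measured value.
type Sample struct {
	Stack []int64
	Value int64
}

// Encode converts a stack of names into a Sample that holds only integers.
func Encode(t *StringTable, stack []string, value int64) Sample {
	ids := make([]int64, len(stack))
	for i, name := range stack {
		ids[i] = t.Intern(name)
	}
	return Sample{Stack: ids, Value: value}
}

func main() {
	table := NewStringTable()
	// Two samples, possibly from different profiles of the same build.
	a := Encode(table, []string{"main.main", "net/http.Serve"}, 10)
	b := Encode(table, []string{"main.main", "runtime.mallocgc"}, 3)
	// "main.main" is stored once and referenced by ID from both samples.
	fmt.Println(a.Stack, b.Stack, len(table.strings))
}
```

In a database the table would become a dedicated strings relation, and samples would reference it by ID.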
The format of pprof proto is described in https://github.com/google/pprof
To decode a proto file with protoc:
% gunzip -k adjust_server-cpu-b16-20180618-1.pb.gz
% protoc --decode perftools.profiles.Profile \
-I ~/Workspace/src/github.com/google/pprof/proto/ \
profile.proto <adjust_server-cpu-b16-20180618-1.pb > adjust_server-cpu-b16-20180618-1.txt
See https://github.com/timescale/prometheus-postgresql-adapter — an example of an approach to effectively import telemetry data to Postgres.
21 Dec
Storage requirements
adjust_server_cpu.prof
< 200KB x 1h/15min ~= 18MB/d x 50 machines = 900MB/d ~= 1GB/d
adjust_server_heap.prof
< 1MB x 1h/15min ~= 96MB/d x 50 machines = 4800MB/d ~= 5GB/d
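The same back-of-the-envelope estimate as a small sketch, assuming decimal units (1MB = 1,000,000 bytes), one profile every 15 minutes and the upper-bound file sizes. The `dailyBytes` helper is hypothetical.

```go
package main

import "fmt"

// dailyBytes estimates how many bytes of profiles a fleet produces per day,
// given the size of one profile, profiles taken per hour, and machine count.
func dailyBytes(profileSize, perHour, machines int64) int64 {
	return profileSize * perHour * 24 * machines
}

func main() {
	const kb, mb = 1000, 1000 * 1000
	cpu := dailyBytes(200*kb, 4, 50) // 200KB CPU profile every 15 minutes
	heap := dailyBytes(1*mb, 4, 50)  // 1MB heap profile every 15 minutes
	fmt.Printf("cpu:  %d MB/d\n", cpu/mb)
	fmt.Printf("heap: %d MB/d\n", heap/mb)
}
```

With these inputs the sketch gives 960 MB/d for CPU profiles and 4800 MB/d for heap profiles. Both are upper bounds, since the real files are smaller than the assumed sizes.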
Ideas
- Leaf-first mode: reverse and regroup the stack trace (callee listed at the bottom, caller at the top). Helps to identify hot spots.
- Flame graph diff: select the “baseline” and show increase/reduction with red/blue colours (or changing hue).
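Both ideas can be sketched on top of folded stacks (one semicolon-joined stack per key, root first, mapped to a sample count). That input format is an assumption, and `Folded`, `LeafFirst` and `Diff` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Folded maps a semicolon-joined stack (root first) to its sample count.
type Folded map[string]int64

// LeafFirst reverses every stack so the callee comes first, then regroups:
// stacks that end in the same function now share a common prefix.
func LeafFirst(in Folded) Folded {
	out := make(Folded, len(in))
	for stack, n := range in {
		frames := strings.Split(stack, ";")
		for i, j := 0, len(frames)-1; i < j; i, j = i+1, j-1 {
			frames[i], frames[j] = frames[j], frames[i]
		}
		out[strings.Join(frames, ";")] += n
	}
	return out
}

// Diff returns current minus baseline per stack; positive means growth.
func Diff(baseline, current Folded) Folded {
	out := make(Folded)
	for s, n := range current {
		out[s] = n - baseline[s]
	}
	for s, n := range baseline {
		if _, ok := current[s]; !ok {
			out[s] = -n // stack disappeared entirely
		}
	}
	return out
}

func main() {
	base := Folded{"main;handle;json.Marshal": 40, "main;handle;db.Query": 60}
	cur := Folded{"main;handle;json.Marshal": 70, "main;handle;db.Query": 50}
	fmt.Println(LeafFirst(cur)["json.Marshal;handle;main"])
	fmt.Println(Diff(base, cur)["main;handle;json.Marshal"])
}
```

A renderer could then colour frames with a positive delta red and frames with a negative delta blue.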
13 May
Initial architecture from 2018
- agent sends pprof files to the collector
- collector stores the file to FS and saves the metadata to Postgres
Reference implementations:
- https://github.com/gomods/athens — has an implementation of its storage using FS/S3 + external index
- https://github.com/noxiouz/docker-distribution-postgresql — a driver for Docker Registry that stores blobs on the FS, but keeps the index in Postgres.
- https://github.com/jaegertracing/jaeger/ — push based, stores data in Cassandra.