Vladimir Varankin edited this page Feb 12, 2021 · 13 revisions

Because it's already impossible to follow my notes on the topic of (continuous) profiling, and for lack of a better place, I'll put the notes here.


Continuous profiling platform — debug performance issues in your code! https://github.com/pyroscope-io/pyroscope

  • agent-based
  • the UI is nice
  • badger for storage (stores the parsed profiling tree)

12 Feb 2021


Unity3D profiler https://docs.unity3d.com/Manual/Profiler.html

The Unity Profiler is a tool you can use to get performance information about your application. You can connect it to devices on your network or devices connected to your machine to test how your application runs on your intended release platform.

09 Dec


Amazon's CodeGuru Profiler https://aws.amazon.com/codeguru/faqs/

CodeGuru Profiler consists of three parts: an agent, the profiler service, and intelligent recommendations. The agent runs as an in-process thread as part of your application. It takes data from each of your service instances running the agent and sends them to the profiler service every 5 minutes, which then aggregates them. CodeGuru Profiler then publishes the profile data in interactive flame graphs that enable you to visualize the performance of your application. CodeGuru Profiler also continuously scans the profiled data and compares it against Amazon and performance engineering best practices and proactively alerts you with intelligent recommendations when performance issues are discovered.

As of now, it supports Java applications only.

03 Dec


PingCap's TiDB implemented a way to store pprof data as a tree in SQL tables. See https://github.com/pingcap/tidb/pull/13009

30 Oct


Have a look at atop(1). The interesting part:

When atop is installed, the script atop.daily is stored in the /etc/atop directory. This script takes care that atop is activated every day at midnight to write compressed binary data to the file /var/log/atop/atop_YYYYMMDD with an interval of 10 minutes. Furthermore the script removes all raw files which are older than four weeks. The script is activated via the cron daemon using the file /etc/cron.d/atop with the contents 0 0 * * * root /etc/atop/atop.daily

15 Oct


Causal profiling in Go https://groups.google.com/forum/#!topic/golang-nuts/KrGnzhd3mV8

“Performance Matters”, a talk about Coz, the causal profiler https://www.youtube.com/watch?v=r-TLSBdHe1A

4 Oct


clinic — a set of tools to instrument and analyse the performance of a Node.js service (https://clinicjs.org) https://www.youtube.com/watch?v=ASv8188AkVk

22 Aug


Observation

The initial idea (at least its implementation) of parsing pprof files and storing them in Postgres to save storage didn't work. Postgres adds lots of storage overhead compared to gzipped proto files, and I failed to build an efficient query API.

As the next experiment:

  • go back to storing raw pprof files plus an external index
  • store everything in badger, building a custom index for quick querying (refer to Jaeger's badger plugin)

https://github.com/profefe/profefe/pull/28

10 Aug


Continuous Profiling a JVM services with Opsian (opsian.com) https://www.youtube.com/watch?v=3E3QZfoB57M

Some nice wording for describing the project.

Why continuous profiling?

  • legal reasons: devs don’t have access to prod
  • keep historical data: postmortems, comparing normal/abnormal behaviour
  • store samples in context, e.g. labelling: compare a service’s performance in different environments (cloud vs hardware)

3 Aug


During GopherCon, I talked to people from Uber and CoreOS.

Uber: they want to build an internal system for continuous profiling. Aware of profefe. Have initial scaling requirements.

CoreOS (conprof):

  • conprof re-uses the service discovery (SD) mechanism and labels from Prometheus
  • stores raw pprof files in a forked version of TSDB
  • everything related to operations (configuration, deployment, scalability) is the same as with Prometheus

1 Aug


conprof — like profefe but pull-based; uses Prometheus’s TSDB to store pprof data. https://github.com/conprof/conprof

How do they implement the list query? Do they read all pb.gz files from the storage to get the profiles' metadata? What if there are hundreds of pprof files in question: is TSDB suitable for that?

28 May


https://github.com/nokia/memory-profiler Written in Rust. Has an API that one can use as inspiration for profefe's API.

22 May


stackimpact does basically the same thing as profefe, but costs money ($1K for <30 instances) https://stackimpact.com/docs/go-profiling/

22 Mar


(UI) Cividis color schema. Paper https://arxiv.org/ftp/arxiv/papers/1712/1712.01662.pdf Implementation https://github.com/pnnl/cmaputil/blob/master/colormaps/cividis.lut

10 Mar


(UI) Netflix’s FlameScope to visualise profiles https://github.com/Netflix/flamescope FlameScope shows a single perf file with the time distributed across XY axes: seconds on X, µ-seconds on Y.

6 Mar


I should have called it pporf or ppoof https://twitter.com/dgryski/status/1102678660237033472

5 Mar


(UI)

Profiling data (samples) from a fleet of instances can be visualised with heatmaps. See http://www.brendangregg.com/HeatMaps/utilization.html for the idea.

Netflix’s Vector — system-level performance visualiser. Use it to refer to heatmaps usage https://medium.com/netflix-techblog/extending-vector-with-ebpf-to-inspect-host-and-container-performance-5da3af4c584b An example of a dataset of I/O events to generate a heatmap https://github.com/spiermar/vector-pmda/tree/master/BINHeatMap

22 Feb


Google’s “Hipster Shop” provides examples of using Stackdriver, OpenCensus, etc. May use it as a public demo for profefe? https://github.com/GoogleCloudPlatform/microservices-demo

15 Feb


Liveprof, a tool for automatic profiling, collecting and aggregating the data for PHP. Similar idea to profefe. TODO: have a look at the implementation details and the UI https://habr.com/ru/company/badoo/blog/436364/

Instana — another company that provides an agent to collect application performance data. One needs to look at their agent code https://github.com/instana/go-sensor

5 Feb 2019


Observation

The major source of storage consumption in pprof files (pb.gz) is strings (package names, file names). When continuously profiling an application, the majority of files will contain the same strings. Instead of storing raw pprof files, we could parse them and store all strings of all profiles in a dedicated table/index, meaning that profiles from the same build will contain only the profiling data (int64s).

The format of the pprof proto is described in https://github.com/google/pprof. To decode a proto file with protoc:

% gunzip -k adjust_server-cpu-b16-20180618-1.pb.gz
% protoc --decode perftools.profiles.Profile \
  -I ~/Workspace/src/github.com/google/pprof/proto/ \
  profile.proto <adjust_server-cpu-b16-20180618-1.pb > adjust_server-cpu-b16-20180618-1.txt

See https://github.com/timescale/prometheus-postgresql-adapter — an example of an approach to efficiently import telemetry data into Postgres.

21 Dec


Storage requirements

adjust_server_cpu.prof
< 200KB per 15 min ~= 18MB/d x 50 machines = 900MB/d ~= 1GB/d

adjust_server_heap.prof
< 1MB per 15 min ~= 96MB/d x 50 machines = 4800MB/d ~= 4.8GB/d
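For a sanity check, the arithmetic above can be spelled out (assuming one profile every 15 minutes, i.e. 96 per day):

```go
package main

import "fmt"

func main() {
	const (
		profilesPerDay = 24 * 60 / 15 // one profile every 15 minutes = 96/day
		machines       = 50
	)
	cpuMB := 0.2 * profilesPerDay // ~200KB per CPU profile
	heapMB := 1.0 * profilesPerDay // ~1MB per heap profile
	fmt.Printf("cpu:  %.0fMB/d/machine, %.1fGB/d total\n", cpuMB, cpuMB*machines/1000)
	fmt.Printf("heap: %.0fMB/d/machine, %.1fGB/d total\n", heapMB, heapMB*machines/1000)
}
```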

Ideas

  • Leaf-first mode: reverse and regroup the stack traces (callee listed at the bottom, caller at the top). Helps to identify hot spots.

  • Flame graph diff: select the “baseline” and show increases/reductions with red/blue colours (or hue changes).
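The leaf-first idea can be sketched over folded stacks (a toy sketch; the `caller;...;leaf` folded format is the one used by flame graph tooling):

```go
package main

import (
	"fmt"
	"strings"
)

// leafFirst aggregates folded stacks ("caller;...;leaf" -> samples)
// by their leaf frame, so hot functions surface regardless of which
// call path reached them.
func leafFirst(samples map[string]int64) map[string]int64 {
	out := map[string]int64{}
	for stack, n := range samples {
		frames := strings.Split(stack, ";")
		out[frames[len(frames)-1]] += n
	}
	return out
}

func main() {
	hot := leafFirst(map[string]int64{
		"main;handler;json.Marshal": 30,
		"main;worker;json.Marshal":  50,
		"main;handler;db.Query":     20,
	})
	// same leaf reached via two different callers gets merged
	fmt.Println(hot["json.Marshal"]) // 80
}
```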

13 May


Initial architecture from 2018

  • agent sends pprof files to the collector
  • collector stores the file to FS and saves the metadata to Postgres

Reference implementations:
