Releases: lahmatiy/cpupro

0.5.1

11 May 01:32
  • Added transformation from parent to children for call tree nodes for .cpuprofile files if needed (#5)
  • Implemented exclusion of ending no-sample time. For certain profiles, the time from the last sample until endTime can be significant, indicating the conclusion of the profiling session or adjustments from excluding idle samples at the end. This time is now excluded from the Profiling time, which is used for computing time percentages
  • Fixed double rendering of the page after the profile data is loaded
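
The first two changes above can be pictured with a minimal sketch. It assumes a V8-style profile object with `nodes`, `startTime`, `endTime` and `timeDeltas` fields, and nodes that reference their parent through a `parent` id. The function names are hypothetical and are not taken from CPUpro's source.

```javascript
// Hypothetical helper: rebuild `children` id lists for call tree nodes
// that only carry a `parent` id (assumed input shape).
function convertParentIntoChildren(nodes) {
  const result = new Map();

  // First pass: copy every node without `parent`, with an empty children list.
  for (const { parent, ...rest } of nodes) {
    result.set(rest.id, { ...rest, children: [] });
  }

  // Second pass: register each node in its parent's children list.
  for (const node of nodes) {
    if (node.parent !== undefined && result.has(node.parent)) {
      result.get(node.parent).children.push(node.id);
    }
  }

  return [...result.values()];
}

// Hypothetical helper: profiling time without the trailing no-sample time,
// i.e. without the gap between the last sample and `endTime`.
function profilingTimeWithoutTail(profile) {
  let lastSampleTime = profile.startTime;

  for (const delta of profile.timeDeltas) {
    lastSampleTime += delta;
  }

  const tail = Math.max(0, profile.endTime - lastSampleTime);

  return profile.endTime - profile.startTime - tail;
}

const nodes = convertParentIntoChildren([
  { id: 1 },
  { id: 2, parent: 1 },
  { id: 3, parent: 1 },
  { id: 4, parent: 2 }
]);
console.log(nodes[0].children); // [ 2, 3 ]

console.log(
  profilingTimeWithoutTail({ startTime: 1000, endTime: 2000, timeDeltas: [100, 200, 300] })
); // 600
```

Time percentages would then be computed against the shortened duration rather than against `endTime - startTime`.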

0.5.0 Performance, Reworked UI, New formats, Deno

10 May 02:43

This release of CPUpro brings significant updates: performance enhancements, a redesigned user interface, and expanded format and runtime support. The time to load and process extremely large profiles has been significantly reduced, making CPUpro efficient for analyzing complex long-running scripts. The user interface has been thoroughly revamped to be more intuitive and responsive across features and views. New profile formats and support for the Deno runtime have been added, extending the tool to modern development environments.

Performance

CPUpro has been entirely re-engineered to optimize the preprocessing of profiles upon loading and for subsequent computations. This redesign enables it to handle massive profiles (exceeding 100MB) significantly faster than other tools. CPUpro is currently the best option for analyzing intense long-running scripts that generate extensive CPU profiles, such as webpack build profiles or prolonged browser sessions (that can last minutes or even tens of minutes).

The table below shows the time to load and first render profiles of varying sizes in different tools:

| Profile size | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 0.5s | 0.8s | 4.6s | 6.5s |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 1.3s | 1.6s | 10.6s | 12.4s |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 1.3s | 2.6s | 12.3s | 18.5s |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 2.8s | 11.3s | 48s | Out of memory (after 23s) |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 1.9s | 2.2s | 4.2s | Out of memory (after 30s) |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 4.6s | 8.7s | Out of memory (after 36s) | Out of memory (after 49s) |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 27.1s | Out of memory (after 57s) | Invalid string length (after 20s) | Out of memory (after 43s) |

Chrome 124 / MacBook Pro 13-inch, M1, 2020

As indicated in the table, the time is affected not only by the profile size but also by its format, the number of samples and the size of the call tree (note that some profiles contain millions of samples and nodes). Notably, the Chromium Profile, which includes extensive additional data besides the CPU profile, tends to load faster than .cpuprofile files of the same size. It is worth mentioning that some tools struggle with large profiles, hitting the heap size limit (4GB) and crashing with "Out of memory" errors, which is particularly frustrating when a lengthy load time yields no results. Thanks to the new optimizations, CPUpro avoids such pitfalls and is now capable of loading and processing even 2GB profiles.

When comparing the loading time between CPUpro versions 0.4 and 0.5, the difference may not look so impressive. The reason is that a significant portion of the time is spent on loading and parsing JSON, which remains unchanged. However, if we isolate the processing time and initial rendering, where the main optimization efforts were concentrated, the new version shows performance improvements ranging from 1.7 to 11.7 times:

| Profile size | Profile type | Load data & parse | CPUpro v0.5 (computations + render) | CPUpro v0.4 (computations + render) | Delta |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 0.3s | 0.16s | 0.52s | 3.1x |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 1.1s | 0.21s | 0.64s | 3.0x |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 0.9s | 0.37s | 1.48s | 4.0x |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 2.2s | 0.79s | 9.21s | 11.7x |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 1.9s | 0.15s | 0.24s | 1.7x |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 3.6s | 1.12s | 4.26s | 3.6x |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 22.1s | 4.98s | | |

Chrome 124 / MacBook Pro 13-inch, M1, 2020

The acceleration was achieved by switching to linear memory (TypedArrays) for representing trees and storing calculation results, despite the increased number and complexity of computations added since v0.4. The majority of the calculation algorithms are implemented using simple loops without recursion or complex branching. Experiments with WebAssembly for some calculations resulted in up to a 2x speed increase in JavaScriptCore (Safari) and SpiderMonkey (Firefox), bringing their execution times in line with V8, where performance did not change. Remarkably, the new algorithms allow V8 to optimize JavaScript execution to match the efficiency of WebAssembly, which was unexpected.
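
To illustrate the general idea of a tree stored in linear memory and processed with a plain loop, here is a minimal sketch. It is not CPUpro's actual code: the layout (a `parent` array indexed by node, with every parent placed before its children) and the function name are assumptions made for the example.

```javascript
// Illustrative sketch: a call tree stored as flat typed arrays.
// Assumption: node 0 is the root and parent[i] < i for every i > 0,
// so a single backward pass visits children before their parents.
function computeTotalTimes(parent, selfTimes) {
  // Start from self times; the constructor copies the values.
  const totalTimes = new Uint32Array(selfTimes);

  // No recursion: each node adds its accumulated time to its parent.
  for (let i = totalTimes.length - 1; i > 0; i--) {
    totalTimes[parent[i]] += totalTimes[i];
  }

  return totalTimes;
}

// Tree: 0 -> (1 -> 3), 2
const parent = new Uint32Array([0, 0, 0, 1]);
const selfTimes = new Uint32Array([1, 2, 3, 4]);

console.log(Array.from(computeTotalTimes(parent, selfTimes))); // [ 10, 6, 3, 4 ]
```

Because the data is a handful of numeric arrays rather than millions of small objects, a loop like this has a predictable memory access pattern and no per-node allocations.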

Adopting TypedArrays has drastically reduced heap memory usage. While modern browsers typically offer up to 4GB of heap space, exceeding this limit can crash the browser tab (and, accordingly, the app). CPUpro primarily uses the heap only for loading and parsing JSON and during the initial stages of data processing; after that, most data is managed using TypedArrays. These buffers, stored in what is termed "external memory", are limited only by the system's available memory, significantly lowering the risk of "Out of memory" crashes. However, there is no reason to worry about it, since CPUpro consumes memory sparingly:

| Profile size | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|
| 33MB (215k samples / 120k call tree) | V8 cpuprofile | 8MB (External: 20MB) | 97MB | 752MB | 916MB |
| 113MB (625k samples / 62k call tree) | Chromium Profile | 7MB (External: 17MB) | 61MB | 1063MB | 466MB |
| 114MB (739k samples / 446k call tree) | V8 cpuprofile | 8MB (External: 155MB) | 324MB | 1803MB | 2001MB |
| 239MB (11.6M samples / 489k call tree) | V8 cpuprofile | 12MB (External: 92MB) | 463MB | 3877MB | Out of memory |
| 277MB (127k samples / 35k call tree) | Chromium Profile | 8MB (External: 9MB) | 34MB | 488MB | Out of memory |
| 418MB (897k samples / 1.86M call tree) | V8 cpuprofile | 18MB (External: 233MB) | 1387MB | Out of memory | Out of memory |
| 2GB (7.3M samples / 7.28M call tree) | V8 cpuprofile | 22MB (External: 866MB) | Out of memory | Invalid string length | Out of memory |

Data collected after loading the profile and calling the garbage collector
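
The split between heap and external memory can also be observed outside the browser. The sketch below is a Node.js analogue, not the method used to collect the numbers above: it relies on `process.memoryUsage()`, whose `arrayBuffers` field reports memory held by ArrayBuffer backing stores, and the helper name is hypothetical.

```javascript
// Node.js sketch: allocate a typed array and report how heap usage and
// ArrayBuffer (external) memory changed. Exact numbers vary between runs.
function measureTypedArrayAllocation(length) {
  const before = process.memoryUsage();
  const data = new Float64Array(length); // backing store lives outside the JS heap
  data.fill(1); // touch the memory so it is actually committed
  const after = process.memoryUsage();

  return {
    data, // keep a reference so the buffer is not collected early
    bytes: data.byteLength,
    heapUsedDelta: after.heapUsed - before.heapUsed,
    arrayBuffersDelta: after.arrayBuffers - before.arrayBuffers
  };
}

const measurement = measureTypedArrayAllocation(1_000_000);

console.log(`typed array size:   ${measurement.bytes} bytes`);
console.log(`heapUsed delta:     ${measurement.heapUsedDelta} bytes`);
console.log(`arrayBuffers delta: ${measurement.arrayBuffersDelta} bytes`);
```

For a large array, nearly all of the growth is expected to show up in `arrayBuffers` rather than in `heapUsed`, which matches the small heap figures and larger "External" figures reported for v0.5.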

After loading the profile and performing the initial calculations, CPUpro is ready for rapid recalculation of timings and data selection on demand, e.g. when filters change. This made it possible to introduce new complex views that were previously impractical because of prolonged calculations (many seconds) and UI freezes, which broke the user experience. Most views have also been optimized to react almost instantaneously to filter changes, ensuring a seamless user experience even with large profiles.

The optimizations in speed and memory efficiency are not just about improving profile loading and UI responsiveness; they also unlock new capabilities. Notably, they are crucial for features such as profile comparison, which requires loading at least two profiles, potentially doubling both the computation time and memory usage. These challenges have been addressed, setting the stage for future enhancements including profile comparison and more.

User interface

The user interface has undergone a significant redesign. The start page now appears more compact and provides a clearer overview of how the V8 engine operates. It features a timeline categorized by work type and function clustering tables, followed by a flamechart.


Other pages have also been reworked to be more informative. Each page now includes:

  • A timeline that not only displays self time but also nested time, with the distribution of nested time by categories.
  • A new section titled "Nested Time Distribution" that offers insights into the distribution of nested time in a hierarchical format, from a package to a function.
  • A basic flamechart displaying all frames related to the current subject (category, package, module, or function) as root frames.

The timeline has been enhanced with a tooltip that provides expanded details and with the ability to select a range, which was previously missing and made it hard to focus on specific segments of work.


The Flamechart is now faster and smoother. It includes new selection capabilities and a detailed information block for the selected or zoomed frame.


The welcome page has been redesigned as well, and now offers example profiles in various formats to try.

New formats, runtimes, and registries

Support for new formats has been introduced:

  • V8 log converted into JSON with the --preprocess op...

0.4.0

21 Jan 20:41
  • Report
    • Extracted regular expressions into a separate area (regexp)
    • Fixed edge cases when scriptId is not a number
    • Added ancestor call sites on a function page
    • Added function grouping on a function page (enabled by default)
    • Added timeline split by areas on default page
    • Improved function subtree displaying
    • Fixed processing of evaled functions (call frames with evalmachine prefixes)
  • CLI:
    • Added support to load jsonxl files
  • API:
    • Profile (result of profileEnd()):
      • Renamed methods:
        • writeToFile() -> writeToFileAsync()
        • writeToFileSync() -> writeToFile()
        • writeJsonxlToFileSync() -> writeJsonxlToFile()
      • Changed writeToFileAsync(), writeToFile() and writeJsonxlToFile() methods to return a destination file path
      • Added writeReport() method as alias to report.writeToFile()
    • profileEnd().report
      • Renamed writeToFile() -> writeToFileAsync() and writeToFileSync() -> writeToFile() (however, at the moment both are sync)
      • Changed open() method to return a destination file path
    • Capture (result of profile())
      • Added onEnd(callback) method to add a callback to call once capturing is finished; the callback can take a profiling result argument
      • Added writeToFile(), writeJsonxlToFile(), writeReport() and openReport() methods to call corresponding methods once capturing is finished
    • Changed profile() to return an active capturing for a name if any instead of creating a new one
    • Changed profile() to subscribe on process exit to end profiling (process.on('exit', () => profileEnd()))
    • Added writeToFile(), writeJsonxlToFile(), writeReport() and openReport() methods that start profile() and call a corresponding method, i.e. writeReport() is the same as profile().writeReport()
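
The capture behaviour described above (profile() returning the active capturing for a name, and onEnd() callbacks receiving the profiling result) can be modelled with a small self-contained sketch. This is an illustrative model only, not cpupro's implementation: the registry, the shape of the result object and the default name are assumptions, and only the method names profile(), profileEnd() and onEnd() come from the notes above.

```javascript
// Illustrative model only; not cpupro's source code.
// Registry of active captures, keyed by name (assumed structure).
const activeCaptures = new Map();

function profile(name = 'default') {
  // Return the active capture for this name instead of creating a new one.
  if (activeCaptures.has(name)) {
    return activeCaptures.get(name);
  }

  const callbacks = [];
  const capture = {
    name,
    // Register a callback to run once capturing is finished.
    onEnd(callback) {
      callbacks.push(callback);
      return capture;
    },
    // Finish capturing and pass the result to every registered callback.
    profileEnd() {
      activeCaptures.delete(name);
      const result = { name, endedAt: Date.now() }; // placeholder result object

      for (const callback of callbacks) {
        callback(result);
      }

      return result;
    }
  };

  activeCaptures.set(name, capture);
  return capture;
}

const first = profile('build');
const second = profile('build');
console.log(first === second); // true

first.onEnd((result) => console.log(`finished: ${result.name}`));
first.profileEnd();
```

In this model, calling profile('build') again after profileEnd() starts a fresh capture, since the finished one is removed from the registry.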

0.3.0

06 Apr 17:32
  • Used jsonxl binary and gzip encodings for data when generating a report, which makes report generation much faster and the report itself much smaller (up to 50 times)
  • Added writeJsonxlToFileSync() method to profile
  • Added build/*.html and package.json to exports
  • Report
    • Bumped discoveryjs to 1.0.0-beta.73
    • Enabled embed API for integrations
    • Reworked flamechart for performance and reliability; it's a little more compact now
    • Added badges for function references
    • Updated segments timeline
    • Fixed Windows path processing
    • New page badges

0.2.1 – Boosted flame chart performance and fixes

20 Apr 10:58
  • Added count badges and tweaked numeric captions
  • Reworked flamechart view to improve performance especially on large datasets (eliminated double "renders" in some cases, a lot of unnecessary computations and other optimisations)
  • Changed behaviour in flamechart: clicking an already selected frame now selects the previously selected frame with a lower depth
  • Fixed flamechart's view height updating when stack depth is growing on zoom
  • Fixed processing of profiles when call frame scriptId is a non-numeric string
  • Bumped discoveryjs to 1.0.0-beta.65

0.2.0 – Support for Chromium profile format & flame charts

21 Feb 19:24
  • Added support for Chromium Developer Tools profile format (Trace Event Format)
  • Added flame chart on index page
  • Fixed time deltas processing
  • Fixed total time computation for areas, packages, modules and functions
  • Fixed module path processing
  • Reworked aggregations for areas, packages, modules and functions

0.1.1

08 Feb 15:33
  • Added missed bin field
  • Renamed profile recording method end() into profileEnd() for less confusion
  • Fixed a crash in viewer when an element in nodes doesn't contain a children field, e.g. when DevTools protocol is used
  • Fixed file module path normalization in viewer
  • Removed modification of startTime and endTime in recorded profile
  • Exposed createReport() method

0.1.0 – Hello world

07 Feb 22:09
  • Initial release