authors | state |
---|---|
Richard Kiene <richard.kiene@joyent.com> | draft |
SmartOS metrics are spread across multiple native libraries (e.g. kstat, prstat, zfs), and modules such as node-kstat make it possible to interact with them from Node. However, there is no common interface for instrumenting a large set of SmartOS metrics from a single place. The goal of this RFD is to define that single interface.
RFD 27 introduces the concept of a global zone Metric Agent which exposes metrics on a per container, per request basis. The Metric Agent will need to interface directly with native libraries in order to return the necessary metrics. Rather than embedding the code necessary to interact with native metrics in the Metric Agent itself, it seems preferable to provide an abstraction between the Node-to-native modules and the calling code. This should make the code more portable and easier to reuse, should a different application or agent need to instrument SmartOS.
Before deciding on a Node based module implementation, the following options were also considered but not chosen:
- Removing the need for a Node based module by implementing a C based Metric Agent and calling the native libraries directly.

  Programs written in C can take advantage of all of the mature debugging tooling SmartOS has to offer, which is nice. However, the available HTTP server libraries leave quite a bit to be desired (e.g. cumbersome to work with, incompatible licenses), and the amount of de novo work necessary is not justified for a language that by its very nature requires more work than JavaScript.

- Implementing the Metric Agent and the instrumenter in a statically typed language such as Rust.

  Rust seems like a very promising language. The downside is that Rust is not yet very mature and lacks support for the SmartOS debugging suite. Given that the instrumenter will run in the global zone, using an immature language does not seem prudent. Additionally, choosing Rust would require a significant investment to support the full suite of debugging tools SmartOS has to offer.

- Providing the Metric Agent functionality with a pluggable front end and back end instrumenter.

  On the surface a pluggable instrumenter seems nice, but in reality it should not be necessary. If the Metric Agent in RFD 27 is designed correctly, it will be the only up-stack facing piece necessary, thus eliminating the need for a pluggable front end. Similarly, OS level metrics should not be gathered by more than one agent per compute node, so the set of metrics consumers will not be diverse. Given that, it seems appropriate to keep the abstraction between the instrumenter and OS level instrumentation inside the instrumenter module itself.
The Metric Instrumenter Node module will be an abstraction on top of one or more Node Addons (e.g. node-kstat).
Consumers of the Metric Instrumenter Node module should not be required to understand the intricacies of the underlying OS metric sources. Instead, consumers should only need to know which metric(s) they would like to consume.
For example, a consumer should not need to know that they must retrieve data from `kstat zones:::nsec_user` to get aggregate user CPU usage. Instead, the Metric Instrumenter will provide methods for metric retrieval using predefined metric keys. The mapping from each key to its underlying metric will be documented with each release and programmatically discoverable via the module itself.
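
As a rough illustration of that abstraction, the key lookup might be sketched as a plain table. Everything here is hypothetical rather than the module's actual design: the `METRICS` table, the `describeMetric` and `listMetricKeys` helpers, and the unit and type given for `cpu_agg_usage` are assumptions, and only two keys are shown.

```javascript
// Hypothetical key table: consumers see only the keys, never the origins.
var METRICS = {
  // unit and type for this entry are assumptions, not taken from the RFD
  cpu_agg_usage: {
    origin: 'kstat zones:::nsec_user',
    unit: 'nanoseconds',
    type: 'counter'
  },
  net_agg_bytes_in: {
    origin: 'kstat link:::rbytes64',
    unit: 'bytes',
    base: 2,
    type: 'counter'
  }
};

// Resolve a metric key to its descriptor, Node-callback style.
function describeMetric(key, callback) {
  if (!Object.prototype.hasOwnProperty.call(METRICS, key)) {
    callback(new Error('unknown metric key: ' + key));
    return;
  }
  callback(null, METRICS[key]);
}

// List the keys a consumer may ask for.
function listMetricKeys() {
  return Object.keys(METRICS);
}
```

With a table like this, adding a metric or changing where it comes from would not alter what consumers call.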
Dynamic, or ad-hoc, instrumentation is not in scope. The goals of simplicity and proper abstraction are in conflict with dynamic instrumentation. Thankfully, nothing about the Metric Instrumenter should prevent consumers from combining it with a future Dynamic Metric Instrumenter.
- node-kstat needs to be updated so that it supports Node versions greater than 0.12.x.
- `zfs list` needs to be profiled so we understand its impact on the box when used frequently. Furthermore, a Node module should be created that calls a native library rather than synchronously calling a shell command.
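
Until such a native binding exists, one stopgap for profiling purposes might be to at least avoid blocking the event loop. The sketch below is not the native module proposed above; it shells out asynchronously with `child_process.execFile` and parses the scripted (`-H`), exact-value (`-p`) output of `zfs list`. The `parseZfsList` and `zfsUsage` names are hypothetical.

```javascript
var execFile = require('child_process').execFile;

// Parse `zfs list -Hp -o name,used,available` output: one dataset per line,
// tab-separated fields, sizes as exact byte counts.
function parseZfsList(stdout) {
  var result = {};
  stdout.split('\n').forEach(function (line) {
    if (line === '') {
      return; // skip the trailing empty line
    }
    var fields = line.split('\t');
    result[fields[0]] = {
      used: parseInt(fields[1], 10),
      available: parseInt(fields[2], 10)
    };
  });
  return result;
}

// Run the command asynchronously and hand back the parsed result.
function zfsUsage(callback) {
  execFile('zfs', ['list', '-Hp', '-o', 'name,used,available'],
    function (err, stdout) {
      if (err) {
        callback(err);
        return;
      }
      callback(null, parseZfsList(stdout));
    });
}
```

This still pays the cost of spawning a process per call, which is part of what the profiling work would need to measure.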
- `cpu_agg_usage` => `kstat zones:::nsec_user`
- `cpu_wait_time` => `kstat zones:::nsec_waitrq`
- `zfs_used` => `zfs list used`
- `zfs_available` => `zfs list available`
- `load_average` => `kstat zones:::averun_1min`
- `mem_agg_usage` => `kstat memory_cap:::rss`
- `mem_limit` => `kstat memory_cap:::physcap`
- `mem_swap` => `kstat memory_cap:::swap`
- `mem_swap_limit` => `kstat memory_cap:::swapcap`
- `net_agg_packets_in` => `kstat link:::ipackets64`
- `net_agg_packets_out` => `kstat link:::opackets64`
- `net_agg_bytes_in` => `kstat link:::rbytes64`
- `net_agg_bytes_out` => `kstat link:::obytes64`
- `time_of_day` => `gettimeofday(3C)`
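
The kstat targets in this mapping follow the `module:instance:name:statistic` form used by kstat(1M), where an empty field matches anything. A minimal sketch of how the instrumenter might split a mapping target into those parts follows; `parseOrigin` is a hypothetical helper, and how the result is handed to a kstat binding is left open.

```javascript
// Split a mapping target such as 'kstat zones:::nsec_user' into its parts.
// Non-kstat targets are returned as a source plus the remaining text.
function parseOrigin(origin) {
  var space = origin.indexOf(' ');
  if (space === -1) {
    return { source: origin, spec: '' }; // e.g. 'gettimeofday(3C)'
  }
  var source = origin.slice(0, space);
  var spec = origin.slice(space + 1);
  if (source !== 'kstat') {
    return { source: source, spec: spec }; // e.g. 'zfs list used'
  }
  // module:instance:name:statistic, with empty fields acting as wildcards
  var parts = spec.split(':');
  return {
    source: source,
    module: parts[0] || null,
    instance: parts[1] || null,
    name: parts[2] || null,
    statistic: parts[3] || null
  };
}
```

For `kstat zones:::nsec_user` this yields module `zones` and statistic `nsec_user`, with instance and name left as wildcards.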
- `getMetric(<metric_key>, function(err, metric_data))`
  - metric_data example

        {
          "origin": "kstat link:::rbytes64",
          "unit": "bytes",
          "base": 2,
          "type": "counter",
          "value": 1234
        }

  - example usage

        instrumenter.getMetric('cpu_agg_usage', function(err, metric_data) {
          if (err) {
            assert(!err);
          } else {
            // do things with metric_data here
          }
        });

- `getMetrics(<metric_keys>, function(err, metrics_data))`
  - metrics_data example

        {
          "net_agg_bytes_in": {
            "origin": "kstat link:::rbytes64",
            "unit": "bytes",
            "base": 2,
            "type": "counter",
            "value": 1234
          },
          "zfs_used": {
            "origin": "zfs list used",
            "unit": "bytes",
            "base": 2,
            "type": "counter",
            "value": 1234
          }
        }

  - example usage

        instrumenter.getMetrics(['mem_swap', 'zfs_used'], function(err, metrics_data) {
          if (err) {
            assert(!err);
          } else {
            // do things with metrics_data here
          }
        });

- `getMetricKeys(function(err, keys))`
  - keys example

        {
          "net_agg_bytes_in": {
            "origin": "kstat link:::rbytes64",
            "unit": "bytes",
            "base": 2,
            "type": "counter"
          },
          ...
        }
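
To make the callback contract concrete, here is one way `getMetrics` could be layered on top of `getMetric`. It is a sketch under assumptions, not the module's implementation: the `instrumenter` object is a stub with made-up values that calls back synchronously, and reporting only the first error is a design choice the RFD does not specify.

```javascript
// Stub standing in for the real module. Values are illustrative, and the
// callback fires synchronously, which a real native-backed reader may not do.
var instrumenter = {
  getMetric: function (key, callback) {
    var known = {
      net_agg_bytes_in: {
        origin: 'kstat link:::rbytes64',
        unit: 'bytes', base: 2, type: 'counter', value: 1234
      },
      zfs_used: {
        origin: 'zfs list used',
        unit: 'bytes', base: 2, type: 'counter', value: 1234
      }
    };
    if (!Object.prototype.hasOwnProperty.call(known, key)) {
      callback(new Error('unknown metric key: ' + key));
      return;
    }
    callback(null, known[key]);
  }
};

// Collect several metrics into one object keyed by metric key.
// The callback is invoked exactly once: with the first error, or with
// the full result set once every key has been answered.
function getMetrics(keys, callback) {
  var results = {};
  var pending = keys.length;
  var failed = false;
  if (pending === 0) {
    callback(null, results);
    return;
  }
  keys.forEach(function (key) {
    instrumenter.getMetric(key, function (err, data) {
      if (failed) {
        return; // an error was already reported
      }
      if (err) {
        failed = true;
        callback(err);
        return;
      }
      results[key] = data;
      pending -= 1;
      if (pending === 0) {
        callback(null, results);
      }
    });
  });
}
```

An alternative would be to return partial results alongside per-key errors; which behavior consumers such as the Metric Agent need is left open here.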