Contract: add heatmap format docs #507

Merged
merged 15 commits into from Nov 9, 2022
15 changes: 15 additions & 0 deletions data/contract_docs/contract.md
@@ -14,6 +14,21 @@ There are logical **_kinds_** (like Time Series Data, Numeric, Histogram, etc),

A **_data type_** definition or declaration in this framework includes both a kind and format. For example, "TimeSeriesWide" is: kind: "Time Series", format: "Wide".

* [Time series](./timeseries.md)
* [Wide](./timeseries.md#time-series-wide-format-timeserieswide)
* [Long](./timeseries.md#time-series-long-format-timeserieslong-sql-like)
* [Multi](./timeseries.md#time-series-multi-format-timeseriesmulti)
* [Numeric](./numeric.md)
* [Wide](./numeric.md#numeric-wide-format-numericwide)
* [Multi](./numeric.md#numeric-multi-format-numericmulti)
* [Long](./numeric.md#numeric-many-format-numericlong)
* [Heatmap](./heatmap.md)
* [Buckets](./heatmap.md#heatmap-buckets-heatmapbuckets)
* [Scanlines](./heatmap.md#heatmap-scanlines-heatmapscanlines)
* [Sparse](./heatmap.md#heatmap-sparse-heatmapsparse)
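
The kind + format naming above can be sketched in a few lines. This is only an illustration: the `DataType` class and its `name` property are hypothetical and not part of the contract, and the rule of joining kind and format into one identifier is inferred from the "TimeSeriesWide" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """A data type is a kind plus a format (hypothetical helper)."""

    kind: str
    format: str

    @property
    def name(self) -> str:
        # "Time Series" + "Wide" -> "TimeSeriesWide"
        return self.kind.replace(" ", "") + self.format


# The three heatmap formats listed above.
HEATMAP_TYPES = [DataType("Heatmap", f) for f in ("Buckets", "Scanlines", "Sparse")]

for data_type in HEATMAP_TYPES:
    print(data_type.name)
```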



## Dimensional Set Based Kinds

Within a data type (kind+format), there can be multiple **_items_** of data that are uniquely identified. This forms a **_set_** of data items. For example, in the numeric kind there can be a set of numbers, or, in the time series kind, a set of time series-es :-).
232 changes: 232 additions & 0 deletions data/contract_docs/heatmap.md
@@ -0,0 +1,232 @@
# Heatmap

Status: EARLY Draft/Proposal

Heatmaps are used to show the magnitude of a phenomenon as color in two dimensions. The variation in color
may give visual cues about how the phenomenon is clustered or varies over space.

Contributor:

For defining the "heatmap" kind, should we state that we exclude "geo spatial" map data from this kind? Wikipedia seems to call the kind we are describing a "grid heat map". I think it is probably okay not to use "grid" and just call it heatmap, and then in the geo case use something like "spatial heat map". But we should have a clear decision on whether geo is included in this kind (my guess is "no").

Member Author:

I think we can ignore geo for now -- we can either add a new kind more appropriate for geo, or perhaps allow the x values to be of type geo (that exists in the frontend, but not in the backend)



## Heatmap buckets (HeatmapBuckets)

The first field represents the X axis; each of the remaining fields represents a row in the heatmap.
The true numeric range of each bucket can be indicated using an "le" label. When the label is absent,
the field display is used for the bucket label.

Contributor:

"field display", metadata or field name?

Contributor (@bohandley, Nov 9, 2022):

Is field display for the bucket label the same as the label object in a field?

For example, this field display would be
{__name__="go_gc_duration_seconds_sum", instance="localhost:9095", job="prometheus"}

{
          "name": "Value",
          "type": "number",
          "typeInfo": {
            "frame": "float64"
          },
          "labels": {
            "__name__": "go_gc_duration_seconds_sum",
            "instance": "localhost:9095",
            "job": "prometheus"
          },

but this one that has le could be described just by using the le label and doesn't need the whole field display?

"labels": {
            "__name__": "prometheus_http_request_duration_seconds_bucket",
            "handler": "/",
            "instance": "localhost:9095",
            "job": "prometheus",
            "le": "+Inf"
          },

Is this accurate @leeoniya?


Example:

<table>
<tr>
<td>
<strong>Type: Time</strong>
<p>
<strong>Name: Time</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: </strong>
</p>
<p>
<strong>Labels: {"le":<em> "10"</em>}</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: </strong>
</p>
<p>
<strong>Labels: {"le":<em> "20"</em>}</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: </strong>
</p>
<p>
<strong>Labels: {"le":<em> "+Inf"</em>}</strong>
</p>
</td>
</tr>
<tr>
<td>1653416391000</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>1653416391000</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>1653416391000</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
</table>


Note: [Timeseries wide](./timeseries.md#time-series-wide-format-timeserieswide) can be used directly
as heatmap-buckets. In this case, each value field becomes a row in the heatmap.
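
As a rough sketch of how a consumer might read this format: the first field is taken as the X axis and every other field becomes one heatmap row, labelled by its "le" label when present. The frame layout (a list of dicts with `name`, `labels`, and `values`) and the function name `bucket_rows` are assumptions made for illustration, and falling back to the field name is only a stand-in for the "field display", whose exact meaning is still under discussion in this review.

```python
def bucket_rows(fields):
    """Split a HeatmapBuckets-style frame into X values and labelled heatmap rows."""
    x_field, *value_fields = fields  # the first field is the X axis
    rows = []
    for field in value_fields:
        labels = field.get("labels") or {}
        # Prefer the "le" label; otherwise fall back to the field name
        # (an assumed stand-in for the "field display").
        label = labels.get("le", field["name"])
        rows.append((label, field["values"]))
    return x_field["values"], rows


# Illustrative sample data, not taken from a real query.
frame = [
    {"name": "Time", "type": "time", "values": [1000, 2000]},
    {"name": "", "type": "number", "labels": {"le": "10"}, "values": [6, 6]},
    {"name": "", "type": "number", "labels": {"le": "20"}, "values": [7, 7]},
    {"name": "", "type": "number", "labels": {"le": "+Inf"}, "values": [8, 8]},
]

x_values, rows = bucket_rows(frame)
for label, values in rows:
    print(label, values)
```

A Timeseries wide frame without "le" labels would pass through the same function, with each value field's name used as its row label.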


## Heatmap scanlines (HeatmapScanlines)

In this format, each row in the frame indicates the value of a single cell in a heatmap.
There exists a row for every cell in the heatmap.

**Example:**

<table>
<tr>
<td>
<strong>Type: Time</strong>
<p>
<strong>Name: xMax|xMin|x</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: yMax|yMin|y</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: Count</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: Total</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: Speed</strong>
</p>
</td>
</tr>
<tr>
<td>1653416391000</td>
<td>100</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1653416391000</td>
<td>200</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>1653416391000</td>
<td>300</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>

<tr>
<td>1653416392000</td>
<td>100</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>1653416392000</td>
<td>200</td>
<td>5</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td>1653416392000</td>
<td>300</td>
<td>6</td>
<td>6</td>
<td>6</td>
</tr>
</table>

This format requires uniform cell sizing. The size of a cell is defined by the columns chosen as xMax|xMin|x and yMax|yMin|y. In the example, the Number column (yMax|yMin|y) increases by 100 from one row to the next among rows that share a Time value: each cell sits 100 higher on the y axis than the previous one, while the stacked cells keep the same location along the x axis. This produces a uniform cell size.
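
The uniformity requirement can be checked mechanically. The sketch below is one possible validation, not part of the contract: `scanline_steps` is a hypothetical helper that takes the x and y columns, confirms there is exactly one row per cell, and returns the step between adjacent cells on each axis. It compares steps exactly, so float columns would need a tolerance.

```python
def scanline_steps(xs, ys):
    """Return (x_step, y_step) for a uniform grid of cells, or raise ValueError."""
    x_levels = sorted(set(xs))
    y_levels = sorted(set(ys))
    one_row_per_cell = (
        len(xs) == len(ys)
        and len(xs) == len(x_levels) * len(y_levels)
        and len(set(zip(xs, ys))) == len(xs)
    )
    if not one_row_per_cell:
        raise ValueError("expected exactly one row for every cell")

    def uniform_step(levels):
        # Collect the distinct gaps between neighbouring levels.
        steps = {b - a for a, b in zip(levels, levels[1:])}
        if len(steps) > 1:
            raise ValueError("cell size is not uniform")
        return steps.pop() if steps else None

    return uniform_step(x_levels), uniform_step(y_levels)


# The x and y columns from the example table above.
xs = [1653416391000] * 3 + [1653416392000] * 3
ys = [100, 200, 300] * 2
print(scanline_steps(xs, ys))
```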

Note that multiple "value" fields can be included to represent multiple dimensions within the same cell.
The first value field is used in the display unless explicitly configured otherwise.

The field names for yMax|yMin|y indicate the aggregation period or the supplied values.

Contributor:

Is this case sensitive? I don't care if it is or not, just that we pick.

Contributor:

In transformations, the case sensitivity is the same. I suggest we stick with lower case y and uppercase M: yMin, yMax.

* yMax: the values are from the bucket below
* yMin: the values are from the bucket above
* y: the values are in the middle of the bucket
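
One way to read the three names is that each states which part of the cell the supplied value marks. The sketch below encodes that reading and should be treated as an interpretation rather than the specification: `cell_bounds` is a hypothetical helper, `step` is the uniform cell height, and the names are matched case-sensitively as suggested in the review thread above.

```python
def cell_bounds(field_name, value, step):
    """Return the (low, high) y range of a cell under the assumed naming rule."""
    if field_name == "yMax":
        # The value is the top edge, so the cell extends below it.
        return (value - step, value)
    if field_name == "yMin":
        # The value is the bottom edge, so the cell extends above it.
        return (value, value + step)
    if field_name == "y":
        # The value sits in the middle of the cell.
        half = step / 2
        return (value - half, value + half)
    raise ValueError(f"unsupported y field name: {field_name!r}")


for name in ("yMax", "yMin", "y"):
    print(name, cell_bounds(name, 200, 100))
```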


## Heatmap sparse (HeatmapSparse)

This format is similar to Heatmap scanlines, except that each cell is independent from its adjacent values.
Unlike scanlines, this allows resolutions to change over time. Where scanlines keep cells uniform over time, heatmap sparse allows cell sizes to vary along the x axis (Time).

Example:


<table>
<tr>
<td>
<strong>Type: Time</strong>
<p>
<strong>Name: xMin</strong>
</p>
</td>
<td>
<strong>Type: Time</strong>
<p>
<strong>Name: xMax</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: yMin</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: yMax</strong>
</p>
</td>
<td>
<strong>Type: Number</strong>
<p>
<strong>Name: Value</strong>
</p>
</td>
</tr>
<tr>
<td>1653416391000</td>
<td>1653416392000</td>
<td>100</td>
<td>200</td>
<td>1</td>
</tr>
<tr>
<td>1653416392000</td>
<td>1653416393000</td>
<td>200</td>
<td>400</td>
<td>2</td>
</tr>
</table>

* For high resolution with many gaps, this will require less space
* This format is much less optimized for fast render and lookup than the uniform "scanlines" approach
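
The space trade-off can be illustrated by converting scanline cells into sparse rows and dropping the gaps. This is a sketch under stated assumptions: `scanlines_to_sparse` is a hypothetical helper, the scanline x and y values are assumed to mark the lower edge of each cell (xMin and yMin), and a gap is assumed to be represented as `None`.

```python
def scanlines_to_sparse(xs, ys, values, x_step, y_step):
    """Build HeatmapSparse-style rows from scanline cells, skipping empty cells."""
    rows = []
    for x, y, value in zip(xs, ys, values):
        if value is None:
            continue  # a gap costs nothing in the sparse format
        rows.append(
            {
                "xMin": x,
                "xMax": x + x_step,
                "yMin": y,
                "yMax": y + y_step,
                "Value": value,
            }
        )
    return rows


sparse = scanlines_to_sparse(
    xs=[1653416391000, 1653416391000],
    ys=[100, 200],
    values=[1, None],
    x_step=1000,
    y_step=100,
)
print(sparse)
```

Each sparse row carries its own bounds, so a reader can no longer compute a cell's position from its index. This is one reason lookup is slower than with scanlines.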