data decimation transform #1707

Open
Fil opened this issue Jun 21, 2023 · 7 comments · May be fixed by #1966
Labels
enhancement New feature or request

Comments

@Fil · Contributor

Fil commented Jun 21, 2023

A transform to decimate (downsample) data by filtering the index.

Possible strategies:

In practice we probably don't need all the methods; having one by default would be enough. M4 is easy to implement.
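For reference, the M4 strategy keeps, for each pixel column, the first and last points plus the points with the minimum and maximum value. The sketch below is a minimal illustration under stated assumptions, not Plot's implementation: the function name `m4` and its `width` parameter (number of columns) are hypothetical, X is assumed numeric and finite, and the index is assumed to be in drawing (x) order.

```javascript
// Hypothetical M4 sketch: split the x extent into `width` equal columns and keep,
// per column, the first, last, min-Y and max-Y indices. Assumes numeric, finite X
// and an index already in drawing order.
function m4(index, X, Y, width) {
  if (index.length === 0) return [];
  let x0 = Infinity;
  let x1 = -Infinity;
  for (const i of index) {
    if (X[i] < x0) x0 = X[i];
    if (X[i] > x1) x1 = X[i];
  }
  const k = x1 > x0 ? width / (x1 - x0) : 0; // columns per x unit
  const buckets = new Map(); // column number -> {first, last, min, max}
  for (const i of index) {
    // Clamp so that x === x1 falls into the last column rather than column `width`.
    const col = Math.min(width - 1, Math.floor((X[i] - x0) * k));
    const b = buckets.get(col);
    if (b === undefined) {
      buckets.set(col, {first: i, last: i, min: i, max: i});
    } else {
      b.last = i;
      if (Y[i] < Y[b.min]) b.min = i;
      if (Y[i] > Y[b.max]) b.max = i;
    }
  }
  const kept = new Set();
  for (const {first, last, min, max} of buckets.values()) {
    kept.add(first).add(last).add(min).add(max);
  }
  // Filtering the original index preserves its order, so the line is drawn as before.
  return index.filter((i) => kept.has(i));
}
```

Whatever the input size, the output has at most 4 × `width` indices, which is what bounds the size of the rendered path.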

@Fil Fil added the enhancement New feature or request label Jun 21, 2023
@Fil Fil self-assigned this Jun 21, 2023
@Fil Fil changed the title "data decimation" to "data decimation transform" Jun 21, 2023
@Fil Fil mentioned this issue Nov 25, 2023
@Fil · Contributor Author

Fil commented Nov 27, 2023

A good place to apply decimation (almost transparently) is just before rendering. At that point the index has already been filtered and the X and Y values have been scaled to pixels. We can filter the index again so that the rendering looks (almost) the same, but with a lighter path and a smaller footprint.
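The idea of filtering the index once more after scaling can be sketched as a render transform. This assumes Plot's render-transform signature `(index, scales, values, dimensions, context, next)`, with `values.x` and `values.y` holding pixel coordinates; `decimateRender` is a hypothetical name, and the sketch treats the mark as a single series (no z channel).

```javascript
// Hypothetical render transform: buckets points by integer pixel column (X is already
// scaled to pixels) and keeps the first, last, min-Y and max-Y index of each column.
// Assumes a single series; a z channel would need per-series handling.
function decimateRender(index, scales, values, dimensions, context, next) {
  const {x: X, y: Y} = values;
  const columns = new Map(); // pixel column -> [first, last, min, max]
  for (const i of index) {
    const px = Math.floor(X[i]);
    const c = columns.get(px);
    if (c === undefined) {
      columns.set(px, [i, i, i, i]);
    } else {
      c[1] = i;
      if (Y[i] < Y[c[2]]) c[2] = i;
      if (Y[i] > Y[c[3]]) c[3] = i;
    }
  }
  const kept = new Set();
  for (const c of columns.values()) for (const i of c) kept.add(i);
  // Re-filter the index in its original order, then let the mark render as usual.
  return next(index.filter((i) => kept.has(i)), scales, values, dimensions, context);
}
```

Under these assumptions it would be passed as a mark option, e.g. `Plot.lineY(data, {x: "date", y: "value", render: decimateRender})`, so a chart 640 pixels wide draws at most about 4 × 640 vertices regardless of the data size.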

This notebook implements the M4 strategy on the line mark:
https://observablehq.com/@fil/fast-brush-with-line-simplification

A line chart of 10 million points, which was previously impossible to render, becomes renderable. A brush (#5) can even be added and still respond at interactive speed.

@mbostock · Member

mbostock commented Dec 27, 2023

I would love for us to do decimation automatically (and transparently) when rendering areas and lines. (Even if only for the linear curve… and maybe we can make it work for the step curve too.)

@Fil · Contributor Author

Fil commented Jan 2, 2024

I think I've solved the issue with curves (of all known types) in the prototype notebook. This now uses an extension of M4, where I add the first and last points, and for some curves the second and next-to-last points too. I'll work on a PR.
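One possible reading of this extension is sketched below: per pixel column, keep M4's first, last, min and max points and, optionally, the second and next-to-last points, since curves such as Catmull–Rom or monotone interpolation shape each segment using neighboring points. This is an illustration, not the PR's code; `m4Extended` and its `neighbors` flag are hypothetical, and X is assumed to be in pixels already.

```javascript
// Hypothetical extended M4: X is assumed to be in pixels; points are grouped by
// integer pixel column. With `neighbors` true, the second and next-to-last point of
// each column are also kept, so curve tangents near column edges change less.
function m4Extended(index, X, Y, neighbors = false) {
  const groups = new Map(); // column -> indices in that column, in index order
  for (const i of index) {
    const col = Math.floor(X[i]);
    let g = groups.get(col);
    if (g === undefined) groups.set(col, (g = []));
    g.push(i);
  }
  const kept = new Set();
  for (const g of groups.values()) {
    let min = g[0];
    let max = g[0];
    for (const i of g) {
      if (Y[i] < Y[min]) min = i;
      if (Y[i] > Y[max]) max = i;
    }
    kept.add(g[0]).add(g[g.length - 1]).add(min).add(max);
    if (neighbors && g.length > 1) kept.add(g[1]).add(g[g.length - 2]);
  }
  return index.filter((i) => kept.has(i));
}
```

The linear curve only needs the plain M4 points; the extra neighbors cost at most two more points per column.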

Fil added a commit that referenced this issue Jan 2, 2024
@Fil Fil linked a pull request Jan 2, 2024 that will close this issue
@Hvass-Labs

I have two requests:

  1. I need to smooth a time series so it doesn't look so erratic when plotted, but it is important that I keep the peaks, which the windowY transform smooths away. Would it be possible to make this new downsampling method one of the reduce options of the windowY transform, so that I could both smooth the data and keep the peaks?

  2. In another problem, I have to downsample an array in JavaScript. It would be very useful to me if you could provide direct access to the JavaScript function for this downsampling algorithm, similar to how I can compute e.g. histograms with a D3 function without actually plotting them.

Thanks!

@Fil · Contributor Author

Fil commented Jan 15, 2024

For 1, let me refer to this notebook: https://observablehq.com/@fil/time-series-topological-subsampling. This framework offers a good way to think about the problem (like formally defining what a "peak" is), and the algorithm is pretty fast. There is a link to a second notebook that uses it with Plot.

For 2, if you still want to use the M4 strategy, you could adapt the decimateIndex function I'm proposing in the PR. It works on normalized values (in pixels), with a scaling factor pixelSize that you can tweak to decide which values of the horizontal component fall into the same "bucket". For example, if X contains dates and the bucket you want is one hour, you would use pixelSize: 3_600_000 (60 × 60 × 1000 milliseconds).
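Outside of Plot, the bucketing described above can be sketched as a standalone function. `decimateByBucket` is a hypothetical helper written for this example, not the PR's decimateIndex; it only borrows the pixelSize idea: each x value is divided by pixelSize and floored, so with Dates and pixelSize = 3_600_000 every bucket is one hour.

```javascript
// Hypothetical M4-style downsampling with a configurable bucket size.
// X may hold numbers or Dates (unary + converts a Date to epoch milliseconds).
function decimateByBucket(index, X, Y, pixelSize = 1) {
  const buckets = new Map(); // bucket number -> [first, last, min, max]
  for (const i of index) {
    const key = Math.floor(+X[i] / pixelSize);
    const b = buckets.get(key);
    if (b === undefined) {
      buckets.set(key, [i, i, i, i]);
    } else {
      b[1] = i;
      if (Y[i] < Y[b[2]]) b[2] = i;
      if (Y[i] > Y[b[3]]) b[3] = i;
    }
  }
  const kept = new Set();
  for (const b of buckets.values()) for (const i of b) kept.add(i);
  return index.filter((i) => kept.has(i));
}

// Usage: three hours of per-minute readings, bucketed by hour.
const start = Date.UTC(2024, 0, 1);
const X = Array.from({length: 180}, (_, i) => new Date(start + i * 60_000));
const Y = X.map((_, i) => i % 17);
const index = X.map((_, i) => i);
const hourly = decimateByBucket(index, X, Y, 3_600_000); // 60 × 60 × 1000 ms
```

The result is a list of indices rather than values, so it can be used to pick rows from the original data (e.g. `hourly.map((i) => data[i])`) before plotting or exporting them.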

@Hvass-Labs

Thanks for the suggestions!

I have looked at your notebook and it looks great, but it is also well beyond my skill level in this field :-) So hopefully you will one day make this an easy-to-use transform like windowY that everyone can use.

Let me elaborate a bit on why I need this kind of smoothing. The time series contains, for example, 16,000 daily data points, which look rather erratic when plotted without smoothing.

I am also using your brushing/selection feature so the user can select a range of the plot and copy the data. But the extremes are fairly important in this application, so the user may be surprised if the copied data contains more extreme values than the plot shows.

That's why I think it may be good to smooth the data while keeping the extremes.

[Screenshots: smoothing using windowY (1) and (2)]

@mbostock · Member

mbostock commented Apr 5, 2024

See also AM4, described in the DashQL paper: https://arxiv.org/pdf/2306.03714.pdf

[Screenshot attached, 2024-04-05]
