Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for quantization error with fractional step #247

Merged
merged 2 commits into from Apr 3, 2022
Merged

Conversation

mbostock
Copy link
Member

@mbostock mbostock commented Apr 3, 2022

Ref. test breakage in observablehq/plot@2551afa

The new quantization approach for binning was introducing some rounding error, resulting in values being put in the wrong adjacent bin. For example, if x = 1.64, x0 = 1.2, and step = -50 (a bin width of 0.02), then quantization would produce a value of (x0 - x) * step = (1.2 - 1.64) * -50 = 21.999999999999996. When this is floored to 21, the value 1.64 would be erroneously placed in the bin representing [1.62, 1.64) rather than [1.64, 1.66).

If we switch to (x0 * step - x * step), then we get (1.2 * -50 - 1.64 * -50) = 22, producing the correct result. I’m not totally sure that this fixes the binning in all cases, but I think it might?

@mbostock mbostock requested a review from Fil April 3, 2022 02:37
@Fil
Copy link
Member

Fil commented Apr 3, 2022

Unfortunately it doesn't fix all cases, see https://observablehq.com/@d3/d3-bin-errors-247

My hunch is we should revert to 3.1.2, and create a new API for quantized bins—this API would only return a sum, and would make no promises of being exact for numbers that are "at" the limit between two bins.

@Fil
Copy link
Member

Fil commented Apr 3, 2022

Another possibility would be to test each time if the quantization "got it right", like so:

@@ -91,7 +91,9 @@ export default function bin() {
       } else if (step < 0) {
         for (i = 0; i < n; ++i) {
           if ((x = values[i]) != null && x0 <= x && x <= x1) {
-            bins[Math.floor(x0 * step - x * step)].push(data[i]);
+            let k = Math.floor(x0 * step - x * step);
+            if (bins[k].x1 <= x) k++;
+            bins[k].push(data[i]);
           }
         }
       }

and we could add the following unit test:

it("bin(data) doesn't round the values", () => {
  for (const n of [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000]) {
    assert.strictEqual(bin().thresholds(n)(ticks(1, 2, n)).filter(d => d.length !== 1).length, 0);
  }
});

@mbostock
Copy link
Member Author

mbostock commented Apr 3, 2022

Yes, this is what I was thinking (though I’m probably off by one):

if (tz[k] <= x) ++k;

But don’t we need to test both sides? Meaning, it might need to be more like this:

if (tz[k] <= x) ++k;
else if (tz[k - 1] < x) --k;

And theoretically we might need a while loop if you can be off by more than one, but I think that would only apply if the bin width is at the limit of floating point precision and we can just ignore that.

@mbostock
Copy link
Member Author

mbostock commented Apr 3, 2022

I pushed the conditional fix. It also means we can go back to using (x0 - x) * step instead of x0 * step - x * step. Which do you prefer?

@Fil
Copy link
Member

Fil commented Apr 3, 2022

this one is fine (and one multiplication less than the other)

@mbostock mbostock merged commit 57225fc into main Apr 3, 2022
@mbostock mbostock deleted the mbostock/bin-test branch April 3, 2022 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants