Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantileIndex, medianIndex #159

Merged
merged 3 commits into from Jul 3, 2022
Merged

quantileIndex, medianIndex #159

merged 3 commits into from Jul 3, 2022

Conversation

Fil
Copy link
Member

@Fil Fil commented Jun 25, 2020

closes #140

@Fil Fil requested a review from mbostock June 25, 2020 10:18
@Fil
Copy link
Member Author

Fil commented Jun 25, 2020

@EE2dev please test :-)

@EE2dev
Copy link

EE2dev commented Jun 27, 2020

Here is a test observable:
https://observablehq.com/d/254038e0365760ad

It seems for an array with an 99 (odd number of elements) the value1 and index1 should be the same as value0 and index0 but it isnt. Or am I missing something?

@Fil
Copy link
Member Author

Fil commented Jun 28, 2020

Thanks so much for checking this! In the notebook's function I hadn't paid enough attention to the case where the quantile falls exactly on an index; I always returned [i,i+1] in that case. Note that this PR is about the second approach (which I called "ex post" in the notebook).

@Fil Fil added the feature label Jul 10, 2020
@mbostock
Copy link
Member

mbostock commented Aug 23, 2020

It feels a little unusual to me that quantileIndex (and by extension medianIndex) returns a two-element array of indexes rather than a single index. I understand why that is, but I think another reasonable interpretation would be to return only the first index, and then quantileIndex would be consistent with the other index methods. And secondly, knowing both indexes isn’t enough to compute the value returned by median or quantile using the R-7 method since you’d still need to the fractional component (i - i0). So how would you feel about just returning the first index?

Something like:

function quantileIndex(values, p, valueof) {
  values = Float64Array.from(numbers(values, valueof));
  if (!(n = values.length)) return;
  if ((p = +p) <= 0 || n < 2) return minIndex(values);
  if (p >= 1) return maxIndex(values);
  var n,
      i = (n - 1) * p,
      i0 = Math.floor(i),
      order = (i, j) => ascending(values[i], values[j]);
  return max(quickselect(Uint32Array.from(values, (_, i) => i), i0, 0, n - 1, order).subarray(0, i0 + 1), order);
}

@EE2dev
Copy link

EE2dev commented Aug 25, 2020

Thanks for the clarification. I agree about the downside/ overhead having two indices.. I am fine with your suggestion.

@mbostock mbostock removed the feature label Apr 8, 2022
@Fil
Copy link
Member Author

Fil commented Jun 30, 2022

I pushed a new version that returns the left-side index

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 11fa241, I made some changes:

  • I changed the implementation of quantileIndex to match my previous suggestion, though I needed to use greatest instead of max. The previous approach was doing a linear scan after computing the quantile value, but we can compute the quantile index directly, so I think it’s preferable to do that. Performance-wise I expect them to be similar since both require making two copies of the input array.
  • We need to apply numeric coercion even if valueof is undefined; the behavior of quantileIndex should match quantile. (And even so, we should test valueof for strictly undefined rather than falsey to match the behavior of default arguments in JavaScript.)
  • Tests should use strict equality instead of loose equality wherever possible.

FWIW, here is another alternative implementation that I considered (closer to your proposal), but I prefer the one now in this PR.

function quantileIndex(values, p, valueof) {
  values = Float64Array.from(numbers(values, valueof));
  if (!(n = values.length)) return;
  if ((p = +p) <= 0 || n < 2) return minIndex(values);
  if (p >= 1) return maxIndex(values);
  var n,
      m = Math.floor((n - 1) * p),
      q = max(quickselect(values.slice(), m).subarray(0, m + 1)),
      i = -1;
  while (++i < n) {
    if (values[i] === q) {
      return i;
    }
  }
}

@mbostock mbostock merged commit 99fe7ae into main Jul 3, 2022
@mbostock mbostock deleted the quantileIndex-140 branch July 3, 2022 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

d3.medianIndex / d3.quantileIndex?
3 participants