Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto: error on underspecified reduce #1833

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

tophtucker
Copy link
Contributor

@tophtucker tophtucker commented Aug 24, 2023

Watching people use the chart cell, they often set a reducer without understanding what it’s reducing. Min of what, max of what, mean of what? Setting most reducers without a field is incoherent; even if it produces something, it’ll be misleading. But people often don’t realize they have to do something else, because they see some output, and scratch their heads trying to interpret it.

Since it’s easy to statically tell if a configuration is invalid like this, this PR throws an error in those cases. Demo: https://observablehq.com/d/932dd87a4c129e16

image

I tried to think of a more plain-English way to say this. We could do something like Error: ${channel} is "${reducer}" of what? reducer requires ${channel} field. People don’t know the word “reducer”, but they know the prepositional relationship expressed in “mean” of “height”. It’s gotta be of something.

Broken out from #1424, where we previously discussed whether "count" is really the only reducer that doesn’t need this error:

Fil:

sum does not need an input (it defaults to count); distinct works too I think, also giving count. min-index and max-index will always give 0. And then there are the x1, x2 reducers (for binX), which give the bins' bounds.

Toph:

Even though sum and distinct work as count, I've seen several people get confused by selecting "distinct" in the chart cell and seeing nothing change, so I'm still sort of inclined to throw (or warn?) on them.

Comment on lines 348 to 350
function isUnderspecifiedReduce({value, reduce}) {
return value === undefined && reduce !== undefined && reduce !== "count";
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Suggested change
function isUnderspecifiedReduce({value, reduce}) {
return value === undefined && reduce !== undefined && reduce !== "count";
}
function isUnderspecifiedReduce({value, reduce}) {
return value == null && reduce != null && /^count$/i.test(reduce);
}

This:

  • Treats null and undefined as the same for value and reduce
  • Coerces reduce to a string
  • Does case-insensitive match equivalent to the keyword helper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!! ive always wondered why you use regexes in this sort of helper, this is a good explanation

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior seems desirable.

But I wonder if we could solve this at a lower level: would it be possible for the bin and group transform to detect this instead, when you’ve specified a reducer that requires an input channel but not provided said channel? I think it would be a matter of denoting whether a reducer requires an input channel when it is declared — or perhaps conversely, when the input is optional, since count is really the exception rather than the default, and then enforcing that the input exists when the evaluator is initialized…

Oh, wait. 🤔💡 The reason this isn’t an error is because if the input channel doesn’t exist, then the data itself is fed to the reducer in lieu of the channel:

initialize(data) {
V = input === undefined ? data : valueof(data, input);
if (reducer.scope === "data") {
context = reducer.reduceIndex(range(data), V);
}
},

So, if you have an array of numbers as data, you should be able to feed that directly to the median reducer without specifying an input channel. And hence this enforcement is limiting desirable shorthand functionality. (I don’t think the chart cell allows arrays of primitives to be data sources?)

I think either we have to solve this more generally #493 where we detect that the materialized input is entirely nullish or NaN, or we do a more specific heuristic here, like bypass the check if data is an array of primitives (numbers, dates, strings, booleans, etc.).?

@mbostock
Copy link
Member

Another potential direction is issue a warning when the input is more than 50% nullish or NaN, and work on hooking up Plot’s warnings to be displayed by the chart cell (#1192, although that issue presupposes a specific solution and we may want to consider other approaches).

@tophtucker
Copy link
Contributor Author

tophtucker commented Aug 24, 2023

Oh, wait. 🤔💡 The reason this isn’t an error is because if the input channel doesn’t exist, then the data itself is fed to the reducer in lieu of the channel:

ohhh ya. makes sense. plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to:

image

so this error wouldn't limit plot.auto any more than it's already limited. though it'd add some inertia to the limit.

Another potential direction is issue a warning when the input is more than 50% nullish or NaN

is the warning approach only bc of the shorthand case? at one point i think you described to me that things should be errors when they're statically incoherent in the options, or warnings if they're about the dynamic data. for plot.auto as it stands today it feels more like an error.

(#1192, although that issue presupposes a specific solution and we may want to consider other approaches).

yeah good aside, i've rewritten that issue description to be more agnostic 👍

@mbostock
Copy link
Member

plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to

Plot.auto doesn’t support shorthand (you have to specify some options), but it does support primitive arrays:

untitled (66)

Plot.auto(d3.range(1000).map(d3.randomNormal()), {x: Plot.identity}).plot()

@tophtucker
Copy link
Contributor Author

tophtucker commented Aug 25, 2023

Ohh right right. Here's the behavior of inputless reducers on primitive arrays w/ Plot.auto today:

image

It's tantalizing because it feels like almost certainly a case of user confusion.

bypass the check if data is an array of primitives (numbers, dates, strings, booleans, etc.)

feels right to me. (you can't tell from the materialized columns, right?) what do you think, same approach as arrayIsPrimitive in stdlib? or maybe we just check the first row or something? what's the plot norm for inference… https://github.com/observablehq/stdlib/blob/main/src/table.js#L86

@tophtucker
Copy link
Contributor Author

took a stab at it throwing only if not primitive (using a simpler primitivity check than stdlib table stuff). doesn't feel great to iterate over the data once more each time 🤷

}

function isPrimitive(values) {
return isEvery(values, (d) => ["number", "boolean", "string"].includes(typeof d) || d instanceof Date);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits here.

First, isEvery requires values to be an iterable, but mark data can also be an “arrayish” value that is supported by Array.from such as {length: 10}. So, we’d need to handle that case, too. I recommend protecting the call as isIterable(data) && !isPrimitive(data) (since any data is not iterable by definition can only contain undefined, as in a sparse array, which we would not consider to be primitive).

Second, we should also treat bigints as primitive. (There’s also the symbol type to consider. I guess you could use the mode reducer on non-primitive values? But that’s obscure enough that I wouldn’t worry about it. If you’re doing these things you probably can debug the error.)

Last, I would avoid array.includes(type) in the inner loop. Unless the JavaScript engine is smart, it’ll have to allocate that array afresh for each value, and the call to array.includes is also probably slower than an inline logical expression. It’s better to just write this by hand; see isOrdinal for an example.

@@ -243,3 +243,42 @@ it("Plot.autoSpec makes a faceted heatmap", () => {
colorMode: "fill"
});
});

it.only("autoSpec rejects an underspecified reducer", () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ it.only

@tophtucker
Copy link
Contributor Author

@mbostock I think I’ve addressed all comments:

  • Check isIterable before isPrimitive
  • Treat bigint as primitive
  • Write out "or" by hand instead of array.includes
  • Fix it.only

@Fil
Copy link
Contributor

Fil commented Oct 3, 2023

Codewise this lgtm. But can we rephrase the error message a little bit? setting y reducer to "mean" requires setting y field uses "reducer" and "field", but the properties are named reduce and value. Maybe something like “a value is required when setting the reduce to "mean"”, “provide a value to reduce y to "mean"”, “missing y value to reduce with "mean"”, or just “missing value for y”.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants