Auto: error on underspecified reduce #1833
base: main
src/marks/auto.js
Outdated
function isUnderspecifiedReduce({value, reduce}) {
  return value === undefined && reduce !== undefined && reduce !== "count";
}
Suggestion:
- function isUnderspecifiedReduce({value, reduce}) {
-   return value === undefined && reduce !== undefined && reduce !== "count";
- }
+ function isUnderspecifiedReduce({value, reduce}) {
+   return value == null && reduce != null && !/^count$/i.test(reduce);
+ }
This:
- Treats null and undefined as the same for value and reduce
- Coerces reduce to a string
- Does case-insensitive match equivalent to the keyword helper
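To make the three points concrete, here is a minimal side-by-side sketch. It assumes the regex test is meant to be negated (any reducer other than count, with no value, is underspecified). The names `strictCheck` and `looseCheck` are made up for this comparison and are not part of Plot.

```javascript
// Strict version from the original diff: only undefined counts as "missing",
// and only the exact lowercase string "count" is exempt.
function strictCheck({value, reduce}) {
  return value === undefined && reduce !== undefined && reduce !== "count";
}

// Loose version (hypothetical name), assuming the regex test is negated.
// == null matches both null and undefined; RegExp.prototype.test coerces
// its argument to a string; the i flag ignores case.
function looseCheck({value, reduce}) {
  return value == null && reduce != null && !/^count$/i.test(reduce);
}

// A null value slips past the strict check but not the loose one.
console.log(strictCheck({value: null, reduce: "mean"})); // false
console.log(looseCheck({value: null, reduce: "mean"})); // true

// "COUNT" is wrongly flagged by the strict check.
console.log(strictCheck({reduce: "COUNT"})); // true
console.log(looseCheck({reduce: "COUNT"})); // false
```

The loose version also behaves sensibly when `reduce` is a function, since `test` stringifies it before matching.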
nice!! i've always wondered why you use regexes in this sort of helper, this is a good explanation
The behavior seems desirable.
But I wonder if we could solve this at a lower level: would it be possible for the bin and group transform to detect this instead, when you’ve specified a reducer that requires an input channel but not provided said channel? I think it would be a matter of denoting whether a reducer requires an input channel when it is declared — or perhaps conversely, when the input is optional, since count is really the exception rather than the default, and then enforcing that the input exists when the evaluator is initialized…
Oh, wait. 🤔💡 The reason this isn’t an error is because if the input channel doesn’t exist, then the data itself is fed to the reducer in lieu of the channel:
Lines 212 to 217 in 1ff36a3
initialize(data) {
  V = input === undefined ? data : valueof(data, input);
  if (reducer.scope === "data") {
    context = reducer.reduceIndex(range(data), V);
  }
},
So, if you have an array of numbers as data, you should be able to feed that directly to the median reducer without specifying an input channel. And hence this enforcement is limiting desirable shorthand functionality. (I don’t think the chart cell allows arrays of primitives to be data sources?)
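A rough sketch of that fallback, outside of Plot: `valueofField`, `resolveInput`, and `median` below are hypothetical stand-ins written for this comment, not Plot's or d3's implementations. It only illustrates why a primitive array works without an input channel while an array of objects silently produces nothing useful.

```javascript
// Hypothetical stand-in for Plot's valueof: reads a named field from each datum.
function valueofField(data, field) {
  return Array.from(data, (d) => d[field]);
}

// Mirrors the quoted initialize(): with no input channel,
// the data itself is handed to the reducer.
function resolveInput(data, input) {
  return input === undefined ? data : valueofField(data, input);
}

// A simple median for this sketch: coerces to number and skips NaN.
function median(values) {
  const sorted = Array.from(values, Number)
    .filter((v) => !Number.isNaN(v))
    .sort((a, b) => a - b);
  if (sorted.length === 0) return undefined;
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Primitive array, no input channel: the shorthand works.
console.log(median(resolveInput([3, 1, 4, 1, 5]))); // 3

// Array of objects with an input channel: works as expected.
const rows = [{height: 150}, {height: 170}, {height: 160}];
console.log(median(resolveInput(rows, "height"))); // 160

// Array of objects, no input channel: every object coerces to NaN.
console.log(median(resolveInput(rows))); // undefined
```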
I think either we have to solve this more generally (#493), where we detect that the materialized input is entirely nullish or NaN, or we do a more specific heuristic here, like bypassing the check if data is an array of primitives (numbers, dates, strings, booleans, etc.).
Another potential direction is to issue a warning when the input is more than 50% nullish or NaN, and work on hooking up Plot’s warnings to be displayed by the chart cell (#1192, although that issue presupposes a specific solution and we may want to consider other approaches).
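For the warning direction, a minimal sketch of the kind of check this could be. The helper names `invalidFraction` and `checkInput` and the `threshold` option are invented here, and the sketch assumes the values have already been materialized (and coerced to numbers where the reducer needs numbers); it is not how Plot's warnings are actually wired up.

```javascript
// Fraction of values that are null, undefined, or NaN.
// Number.isNaN does not coerce, so strings and objects are not counted here.
function invalidFraction(values) {
  let total = 0;
  let invalid = 0;
  for (const v of values) {
    ++total;
    if (v == null || Number.isNaN(v)) ++invalid;
  }
  return total === 0 ? 0 : invalid / total;
}

// Warns (rather than throws) when too much of the input is unusable.
// Returns true if the input looks fine, false if a warning was issued.
function checkInput(values, {threshold = 0.5} = {}) {
  const fraction = invalidFraction(values);
  if (fraction > threshold) {
    console.warn(`Warning: ${Math.round(fraction * 100)}% of input values are nullish or NaN`);
    return false;
  }
  return true;
}

checkInput([1, 2, null, 4]); // 25% invalid: no warning
checkInput([NaN, undefined, null, 4]); // 75% invalid: warns
```

A threshold of 1 would correspond to the "entirely nullish or NaN" variant.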
ohhh ya. makes sense. plot.auto doesn't support shorthand / primitive arrays, though i've always wanted it to: so this error wouldn't limit plot.auto any more than it's already limited. though it'd add some inertia to the limit.
is the warning approach only bc of the shorthand case? at one point i think you described to me that things should be errors when they're statically incoherent in the options, or warnings if they're about the dynamic data. for plot.auto as it stands today it feels more like an error.
yeah good aside, i've rewritten that issue description to be more agnostic 👍
Ohh right right. Here's the behavior of inputless reducers on primitive arrays w/ Plot.auto today: It's tantalizing because it feels like almost certainly a case of user confusion.
feels right to me. (you can't tell from the materialized columns, right?) what do you think, same approach as arrayIsPrimitive in stdlib? or maybe we just check the first row or something? what's the plot norm for inference… https://github.com/observablehq/stdlib/blob/main/src/table.js#L86
took a stab at it throwing only if not primitive (using a simpler primitivity check than stdlib table stuff). doesn't feel great to iterate over the data once more each time 🤷
src/marks/auto.js
Outdated
}

function isPrimitive(values) {
  return isEvery(values, (d) => ["number", "boolean", "string"].includes(typeof d) || d instanceof Date);
Couple nits here.
First, isEvery requires values to be an iterable, but mark data can also be an “arrayish” value that is supported by Array.from, such as {length: 10}. So, we’d need to handle that case, too. I recommend protecting the call as isIterable(data) && !isPrimitive(data)
(since any data that is not iterable can, by definition, only contain undefined, as in a sparse array, which we would not consider to be primitive).
Second, we should also treat bigints as primitive. (There’s also the symbol type to consider. I guess you could use the mode reducer on non-primitive values? But that’s obscure enough that I wouldn’t worry about it. If you’re doing these things you probably can debug the error.)
Last, I would avoid array.includes(type) in the inner loop. Unless the JavaScript engine is smart, it’ll have to allocate that array afresh for each value, and the call to array.includes is also probably slower than an inline logical expression. It’s better to just write this by hand; see isOrdinal for an example.
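Putting those three nits together, something along these lines. This is a sketch, not the final patch: `isIterable` here is a local stand-in defined for the example, and `isNonPrimitiveData` is a made-up name for the guarded expression.

```javascript
// Local stand-in: true if the value can be consumed by for...of.
function isIterable(value) {
  return value != null && typeof value[Symbol.iterator] === "function";
}

// True if every value is a number, bigint, boolean, string, or Date.
// The type test is an inline logical expression, so nothing is
// allocated per value in the loop.
function isPrimitive(values) {
  for (const d of values) {
    const type = typeof d;
    if (
      type !== "number" &&
      type !== "bigint" &&
      type !== "boolean" &&
      type !== "string" &&
      !(d instanceof Date)
    ) {
      return false;
    }
  }
  return true;
}

// The guarded call: arrayish but non-iterable data such as {length: 10}
// never reaches isPrimitive.
function isNonPrimitiveData(data) {
  return isIterable(data) && !isPrimitive(data);
}

console.log(isNonPrimitiveData([1, 2n, "a", true, new Date(0)])); // false
console.log(isNonPrimitiveData([{height: 150}])); // true
console.log(isNonPrimitiveData({length: 10})); // false
```

Note that null and undefined entries count as non-primitive here, same as in the version under review.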
test/marks/auto-test.js
Outdated
@@ -243,3 +243,42 @@ it("Plot.autoSpec makes a faceted heatmap", () => {
    colorMode: "fill"
  });
});

it.only("autoSpec rejects an underspecified reducer", () => {
☝️ it.only
@mbostock I think I’ve addressed all comments:
Codewise this lgtm. But can we rephrase the error message a little bit?
Watching people use the chart cell, they often set a reducer without understanding what it’s reducing. Min of what, max of what, mean of what? Setting most reducers without a field is incoherent; even if it produces something, it’ll be misleading. But people often don’t realize they have to do something else, because they see some output, and scratch their heads trying to interpret it.
Since it’s easy to statically tell if a configuration is invalid like this, this PR throws an error in those cases. Demo: https://observablehq.com/d/932dd87a4c129e16
I tried to think of a more plain-English way to say this. We could do something like:
Error: ${channel} is "${reducer}" of what? reducer requires ${channel} field
People don’t know the word “reducer”, but they know the prepositional relationship expressed in “mean” of “height”. It’s gotta be of something.
Broken out from #1424, where we previously discussed whether "count" is really the only reducer that doesn’t need this error:
Fil:
Toph: