Consider adding node labels for more diagnosable error messages for async errors. #585
Could the
@zolkis that's an interesting idea, can you elaborate more with some examples?
Well, IIUC the example above has the use case of annotating ops with labels, to help developers figure out what went wrong when exceptions are thrown. If that is the main use case, boilerplate code is needed to manually add a label at every single level. When the label is not specified, there is no information in the exception.

Instead of manually setting explicit labels, annotations (implicit labels) could be added automatically by the build algorithm, which knows where in the compute graph it currently is. When an exception occurs, an internal label like "step1:matmul" (standard names to be defined) could be passed. The example changes only in that the labels are not developer-injected, but internally generated (as would be specified by an eventual future algorithm).

```js
const builder = new MLGraphBuilder(context);
const A = builder.input('0', operandType);
const B = builder.input('1', operandType);
const C = builder.matmul(A, B);
const D = builder.add(A, C);
// ... keep building a complex graph
const finalOperand = builder.add(E, F);
// Build the graph.
const graph = await builder.build({'output': finalOperand});
```

> Uncaught DOMException: Model graph build error: [Operand "step1:matmul"] input dimensions XXX exceed supported limit.

IOW, the labels could be owned and attached by the implementation. The advantage is that this covers all graphs at all levels automatically; the disadvantage is the lack of developer-given labels (and no possibility to piggyback other instrumentation). However, the whole process is under the control of the implementation, with no worries about sanitizing/checking developer-injected labels. I am not even sure we must standardize the namespace for such labels, as this could be owned by the implementations (when it's only meant for human eyes) -- unless programmatic handling of that information is needed.

On the other hand, if there is experience and positive developer feedback on the label feature in WebGPU (citations needed), I have no objections to using it also in WebNN.
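As a hedged illustration of the auto-generation idea above, here is a minimal runnable sketch; the `MockBuilder` class and the `step<N>:<op>` label format are hypothetical stand-ins, since `MLGraphBuilder` is not available outside the browser and no naming scheme has been specified:

```js
// Hypothetical sketch: the build algorithm labels each op "step<N>:<opName>"
// as it records it. MockBuilder stands in for MLGraphBuilder.
class MockBuilder {
  constructor() {
    this.step = 0;
    this.labels = new Map();
  }
  _record(opName, operand) {
    this.step += 1;
    // Internal label, e.g. "step1:matmul" -- standard names to be defined.
    this.labels.set(operand, `step${this.step}:${opName}`);
    return operand;
  }
  input(name) { return this._record('input', { name }); }
  matmul(a, b) { return this._record('matmul', { a, b }); }
  add(a, b) { return this._record('add', { a, b }); }
}

const builder = new MockBuilder();
const A = builder.input('0');
const B = builder.input('1');
const C = builder.matmul(A, B);
console.log(builder.labels.get(C)); // "step3:matmul"
```

An implementation could then attach the recorded label to any error raised for that operand during the async build.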
Thanks for explaining - I like this idea. A few notes:
I support this, looks like the best way to go.
thanks @zolkis ! Auto-instrumentation would indeed be less work for developers. The thing I was concerned about is whether the system can add meaningful enough labels. The more complex a model gets, the less useful something like a step number becomes (as in your example). Imagine a transformer model: the developers would probably namespace the labels to something like: Allowing developers to specify labels, with a fallback to a system label, seems more plausible. As for WebGPU label usage, I don't have exact stats to point to, but I did consult our WebGPU team and they mentioned developers find the labels extremely useful.
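The developer-vs-fallback behavior described above can be sketched in a few lines; the helper name `effectiveLabel` and the example label strings are purely illustrative, not from the thread or the spec:

```js
// Hypothetical helper: prefer the developer-supplied label, fall back to a
// system-generated "step<N>:<op>" label when none was given.
function effectiveLabel(devLabel, step, opName) {
  return devLabel ?? `step${step}:${opName}`;
}

// A developer might namespace labels to mirror the model structure:
console.log(effectiveLabel('decoder/layer3/attention/matmul_qk', 7, 'matmul'));
// "decoder/layer3/attention/matmul_qk"

// With no developer label, the flat system label is used instead:
console.log(effectiveLabel(undefined, 7, 'matmul'));
// "step7:matmul"
```

The namespaced form carries model structure that a flat step counter cannot, which is the concern raised above.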
Right, developers can add labels - but when they don't, does it mean they don't want any other information, i.e. should we ditch the auto-generation? Or set an option for that?
For auto-generated information, we have a couple of options:

option 2 would look like:

It seems that if the user agent can add useful annotations to the error message, it can just always add them. So I'd prefer option 2?
Related, we're currently trying to diagnose some Yolo-V9 slice issues, and the lack of diagnostic info is impeding the investigation ("Failed to execute 'slice' on 'MLGraphBuilder': For dimension (0): the starting index to slice must be less than input size (2)." - cool, but which slice node?...)
I may have said this in a WG telecon, but this feels like something where a prototype implementation would help inform the spec. So if any of the Chromium contributors who are feeling the pain want to hack something in, don't wait on spec discussions and it doesn't need to be perfect! Let's iterate and learn.
This CL (5492314: "WebNN: inital implementation of Add label for mloperand", https://chromium-review.googlesource.com/c/chromium/src/+/5492314) attempts to add a label for MLOperand to report more detailed error messages.
One question about the IDL definition. If a sequence of operands were returned when invoking
Another proposal: the label could also be added to MLOperator. Any thoughts?
Agree! From our experience of debugging graphs translated from frameworks like onnxruntime, the operator/node name is very useful for matching an operator in the .onnx model with the implemented operator in WebNN backends.
We can have both of them: an operand name and an operator name.
The downside of adding it to the MLOperator is that it doesn't exist as a concept in the spec, so we would need to add the parameter to each of the builder methods. It also makes the parameter list longer. Alternative A - add to
Considering Joshua's comment, alternative A seems to fit best (to handle MLOperand's immutability vs. labels' mutability). The parameter / options dict can be optional. We also need an algorithm for generating good enough default labels (implementation-specific, but as per the comment above, we should rather standardize the namespace before devs start to parse labels and come up with various private namespaces). But if labels are used frequently, then from a developer coding perspective, I'd prefer the solution of setting the labels on separate lines, as in alternative B, since it separates code instrumentation from business logic. If we could find a means to do this correctly, I'd go with that.
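For illustration, the two alternatives being compared could look like this; a mock `matmul` stands in for the real builder method, and the `label` option name and writable member are exactly what is under discussion, not settled spec:

```js
// Mock operand factory so the sketch runs outside the browser; the real
// MLOperand and its hypothetical 'label' member are what the thread debates.
function matmul(a, b, options = {}) {
  return { op: 'matmul', label: options.label ?? '', inputs: [a, b] };
}

// Alternative A: pass the label in the builder method's options dict.
const viaOptions = matmul('A', 'B', { label: 'attention/matmul_qk' });

// Alternative B: set the label on a separate line after creation, keeping
// instrumentation apart from the graph-building logic.
const viaSetter = matmul('A', 'B');
viaSetter.label = 'attention/matmul_qk';

console.log(viaOptions.label === viaSetter.label); // true
```

Alternative B requires the label to be mutable after the operand is created, which is the immutability tension noted above.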
This CL (5528797: "WebNN: initial implementation of adding name for MLOperator", https://chromium-review.googlesource.com/c/chromium/src/+/5528797) attempts to add a label for MLOperator to report more detailed error messages.
As a follow-up of #572, we propose that platform-specific validations should be done during the async `build` step. This poses a challenge for developers: they submit a complex graph, one step within the graph fails a platform-specific check, and it's hard to trace back to the specific operand in the graph the error is about.
I propose to follow WebGPU's practice and define an MLObjectBase with a `label` field for MLOperand to extend from. The usage would be like:
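A hedged sketch of that usage, assuming a WebGPU-style writable `label` attribute; a tiny mock stands in for `MLGraphBuilder`, and all names below are illustrative rather than settled spec:

```js
// Hypothetical usage of a WebGPU-style 'label' member on MLOperand.
const builder = {
  input(name) { return { kind: 'input', name, label: '' }; },
  matmul(a, b) { return { kind: 'matmul', inputs: [a, b], label: '' }; },
};

const a = builder.input('a');
a.label = 'encoder/input_a';
const b = builder.input('b');
const c = builder.matmul(a, b);
c.label = 'encoder/attention/matmul_qk';

// A failing async build() could then surface the label, e.g.:
// DOMException: [Operand "encoder/attention/matmul_qk"] input dimensions
// exceed supported limit.
console.log(c.label);
```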
The `MLObjectBase` could also be extended by:
- `MLBuffer`, to help with debugging async buffer-related errors.
- `MLGraph`, to help with debugging async errors from chained inference.