Core generic tensor ops #3024
gramalingam started this conversation in Ideas
Background and Motivation:
One of the challenges faced in a standard like ONNX is the tradeoff between expressiveness and efficiency. This often manifests itself as a choice between generic (or low-level) ops and specialized (or high-level) ops. Generic ops (for example, Scan in ONNX) can be composed to express a far wider range of models, giving greater expressiveness. Specialized ops (for example, RNN, GRU, or LSTM), by contrast, can be implemented very efficiently by taking advantage of their specific structure.
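To make the generic style concrete, here is a minimal sketch of a running sum expressed with Scan, using the onnx Python helper API and the opset-9 form of Scan. The per-step computation travels as a body graph; swapping in a richer body is how Scan can express RNN-like recurrences:

```python
import onnx
from onnx import helper, TensorProto

# Per-step body: (state, x_t) -> (state + x_t), all scalars.
state_in = helper.make_tensor_value_info("state", TensorProto.FLOAT, [])
x_t = helper.make_tensor_value_info("x_t", TensorProto.FLOAT, [])
state_out = helper.make_tensor_value_info("new_state", TensorProto.FLOAT, [])
body = helper.make_graph(
    [helper.make_node("Add", ["state", "x_t"], ["new_state"])],
    "running_sum_body",
    [state_in, x_t],
    [state_out],
)

# Scan threads the state through X (shape [T]) one element at a time;
# "final" is the sum of all elements. A body graph computing gate
# updates instead of Add is how an RNN/GRU/LSTM cell would be expressed.
running_sum = helper.make_node(
    "Scan",
    inputs=["init", "X"],
    outputs=["final"],
    body=body,
    num_scan_inputs=1,
)
```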
ONNX addresses this by supporting both kinds of ops and linking them through the concept of functions, which define specialized ops in terms of other, lower-level or more generic, ops. A backend that has a very efficient implementation of a specialized op can exploit it directly, while other backends can rewrite the specialized op in terms of the generic ops and use their implementations as a fallback or default.
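As an illustration of the function mechanism, here is a minimal sketch using the onnx Python helper API. The domain name is hypothetical, and Softplus is chosen only as a familiar example of a specialized op whose default body can be built from generic primitives:

```python
from onnx import helper, TensorProto

# Softplus(X) = Log(Exp(X) + 1), expressed as an ONNX function.
# A backend with a fast native Softplus can match the function call;
# any other backend can inline this body as the default implementation.
softplus = helper.make_function(
    domain="custom.example",  # hypothetical domain, for illustration only
    fname="Softplus",
    inputs=["X"],
    outputs=["Y"],
    nodes=[
        helper.make_node(
            "Constant", [], ["one"],
            value=helper.make_tensor("one", TensorProto.FLOAT, [], [1.0]),
        ),
        helper.make_node("Exp", ["X"], ["exp_x"]),
        helper.make_node("Add", ["exp_x", "one"], ["exp_x_plus_one"]),
        helper.make_node("Log", ["exp_x_plus_one"], ["Y"]),
    ],
    opset_imports=[helper.make_opsetid("", 13)],
)
```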
The MLIR project takes a similar approach, organizing ops into different layers (or dialects) and using lowerings from one dialect to another.
Proposal
It would be beneficial to add a few generic tensor ops that allow us to achieve the above goals effectively. The suggested ops are:
These will be complemented by scalar ops that operate only on scalar values and return scalar values (such as Tanh, Log, addition, and subtraction). All of the above tensor ops take the form of higher-order ops that accept a scalar computation (a graph) as an attribute, as sketched below.
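For concreteness, here is a minimal sketch of this higher-order pattern, again using the onnx Python helper API. Both the op name ElementwiseMap and the domain are hypothetical stand-ins for one of the proposed generic ops; the point is that the scalar computation travels as a graph attribute, just as the body graphs of Scan and Loop do today:

```python
from onnx import helper, TensorProto

# The scalar computation is an ordinary ONNX subgraph mapping a scalar
# input to a scalar output.
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [])
scalar_tanh = helper.make_graph(
    [helper.make_node("Tanh", ["x"], ["y"])],
    "scalar_tanh",
    [x],
    [y],
)

# A hypothetical generic tensor op that applies scalar_tanh to every
# element of X. Passing the scalar graph as an attribute is what makes
# the tensor op higher-order.
node = helper.make_node(
    "ElementwiseMap",          # hypothetical op name
    inputs=["X"],
    outputs=["Y"],
    domain="custom.example",   # hypothetical domain
    body=scalar_tanh,
)
```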
In MLIR terminology, we can think of the above as two dialects: a scalar dialect (effectively at the LLVM level) and a tensor dialect. The tensor ops suggested above are slightly higher-level than the Generic op in MLIR's Linalg dialect, which is itself inspired by the common forms of loops encountered in this setting (such as parallel loops and reduction loops).
The idea behind these core generic tensor ops is to provide greater extensibility and expressiveness without compromising efficiency. These ops capture the key attributes that are exploited in an efficient implementation, while parameterizing over aspects that are less critical for such optimizations.