Typed Computer Algebra System #24844
sylee957
started this conversation in
Show and tell
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Here, I make a tutorial how to use
pyright
to build your own computer algebra system,that can use parametrically polymorphic expression trees such that you can easily make any computer algebra system you need.
It can even rely on sympy objects and compose with any other thing, while ensuring type safety.
I'd give an argument about how good type checking can get handy for verifying correct data is used in computer algebra system
(and general data processing problems)
Abstract Syntax Tree
A first step is to write computer algebra system define classes for abstract syntax tree types.
Here I use
dataclass
to derive immutability, syntactic equality, and string representation similar as how SymPy does inBasic
.Expression Trees
The basics to construct sympy expression tree is to have
Integer, Symbol
and also have symbolic operators for arithmetic
Mathematical constants can be reduced to nullary expressions trees,
without having to hassile with things like
AtomicExpr
And functions like
sin, cos, tan
can be defined as unary expression trees:And a convenient parametric type alias can be developed
Trig: TypeAlias = Sin[T] | Cos[T] | Tan[T]
So you can verify simultaneously that something is trigonometric function
However, you would have a question that the types defined above are not very general
Although we can express any 1-level addition of integers as:
And addition of integers inside addition as:
We can't define arbitrary nested addition types of integers because it would need type of something like
Add[int | Add[int] | Add[int | Add[int]] | ...]
However, it is possible to define such types by recursive type aliasing
NestedAdd: TypeAlias = Add["int | NestedAdd[int]"]
such that arbitrary nested addition types work as:
However,
x = 1
doesn't check because it is not inside the additionFor that case, the definition can be inverted as
NestedAdd: TypeAlias = int | Add["NestedAdd[int]"]
And then
x: NestedAdd = 1
worksAnd in fact, it is a good to distinguish both types in order to avoid the confusion
Expression Type
We can assemble all the expression types as a big recursive expression type
And now it is possible to type general mathematical expressions:
Structural induction
Now, it comes how to write the functions that works over such recursive types
and introduce the benefit of using that over any other approaches like object-oriented polymorphic approaches.
The biggest advantage of using recursive types with union is that it is easy to verify the correctness of the program by mathematical induction.
If the program is correct over every components (
Integer, Symbol, Add, Mul, ...
) of the union,then it is trivial to assert that the program works for every sympy expressions, without having to think about
x
?Add(x, y)
?Add(Mul(Pow, ...
?One example of sympy's basic function that can easily be tested by this is:
And 'pyright' is able to warn if you any of the cases are missing.
Other examples that this can be useful are printer functions like
latex, pycode
where you need to make sure that every sympy classes have their own printers to make sure that every sympy expressions are printable.
Generalized Expression Type
Now, there comes the problem of extending the expressions.
When you need some new symbolic expressions, let's say, hyperbolic functions, bessel functions, sum, integral, matrix, ...
We always have to edit the messy union types to make it grow bigger
And this is one way how sympy got bigger, it had to make base class (Basic, Expr) to grow bigger
and also many mathematical functions had to be defined.
And then you need to rewrite every parts of the sympy code that uses that
(like printers) to correctly handle new symbolic expression types.
However, it is technically possible to solve that problem without editing the core,
by parametrizing the recursive type aliases:
Now, everything get parametrized by
T
, and the good thing is thatT
can be substituted with anything by usersto define recursive expression types that includes
T
.For example you can extend the grammar of expressions to include hyperbolic expressions by:
Hyper: TypeAlias = Sinh[T] | Cosh[T] | Tanh[T]
MyExpr: TypeAlias = "Expr[Hyper[MyExpr]]"
and then arbitrary nested hyperbolic expressions can be defined
And
Expr[Never]
can be used to get the unparametrizedExpr
as above.For example, this passes type check
y: Expr[Never] = Sin(Add(Symbol("x"), Symbol("y")))
but this does not
y: Expr[Never] = Sinh(Add(Symbol("x"), Symbol("y")))
The way to generalize
Expr
, is to replace everything constant to everything variable.For example, even the ground types like
Integer | Symbol
can be eliminatedAnd then you are able to replace
T
with anything to achieve the generality.For example, now you find that it is absurd that we had even needed
Atom
at the first place,because it is safe to use
Expr[int | str]
and defineint
as integers,str
as symbols, and things should still work type safely.(We get original one back by using
Expr[Integer | String]
)And you can simultaneously make the program more minimal
like using
Literal['x'] | Literal['y'] | Literal['z']
if you only needx, y, z
as symbols.It is also good to develop functions that works for every
Expr[T]
because that allows generality for users,
For example it should be possible to transform
Add(dog, dog)
toMul(2, dog)
without having to change the core to deal with that object.
In this way, we can keep the core from growing indefinitely, while achieving full generality,
despite every feature requests users make.
I also note that something interesting happens when you parameterize that by
Never
For example,
x: Expr[Never] = Add(Add(), Mul())
which in fact constructs integer arithmetic by
0
and1
And things like
Trig[Never]
can't be constructed because they don't have ground typesConclusion
It is possible to perform full syntactic analysis of expressions by using recursive type aliases.
For example, you can even define mutually recursive grammar like
and verify in type-level that things like altnating structure of trigonometric and hyperbolic functions are correct
I hope that it gets handy for people who want to develop application project over sympy, or startup new projects in sympy.
who wants to keep things minimal while ensuring that things work fully general.
(and this is ensured by building correct software over parametric polymorphism)
Notes
Finitary vs Variadic Tree
There is technical reason to split finitary case and variadic case, because of the subtle difference in
tuple[T0, T1]
andtuple[T, ...]
When you use finitary product type, you can freely define positionally typed expression trees, which comes in handy.
For example, if you want exponent of power to be restricted to integer only, like$x^2, x^3$ and exclude the cases where the exponents should be too general like $x^{\sin(n)}$
Then you can use
Pow[Expr, int]
And you can safely write the code like:
without having to worry about
factorint
not working with general expressions than int.And the other reason to use finitary tuple type than variadic tuple type is that you can assert whether the
x.args
always has length2
.Before
TypeVarTuple
was introduced, it was common practice for developers to workaround likeVector2(Generic[T0, T1])
,Vector3(Generic[T0, T1, T2])
every time you need to verify positional arguments,which looks stupid 1
vs Object Oriented Recursive Data Structure
Before recursive typing was supported for type aliases (in Pyright)
The only workaround to type recursive data structures is to define classes.
For example, the following code below can assert arg-invariance 2 of
sympy.Basic
Such that
Basic[Atom]
can correctly type the arg-invariance ofsympy.Basic
However, this is not the best solution because when you want to subclass
Expr
fromBasic
and assert another arg-invariance such that everything inside
Expr
should beExpr
orAtom
Expr[Atom].args
is notAtom | Expr[Atom]
but it only inferred asAtom | Basic[Atom]
.You need to override
args
like below if you want to make sure thatExpr
has its own sub arg-invariance thanBasic
:However, if you use recursive type alias, you can express sub arg-invariance of
Expr
more conveniently as:Expr: TypeAlias = "Basic[Expr[T]]"
mypy
don't support recursive type alias at the moment,which gives the limitation of using such syntax like above
Superclassing vs Union
After union typing and recursive typing is supported,
we have two answers to achieve polymorphism
class Trig, class Sin(Trig), class Cos(Trig), class Tan(Trig)
Trig = Sin | Cos | Tan
However, I notice that there is a technical difference between them
because subclassing would mean "Infinite Union", while union typing would mean finite union.
You can think of
Trig
(used as superclass) as infinite union likeSin | Cos | Tan | ... | TrigOmega
that converges toTrig
,which can't be verified by computer.
For example, when you use
class Trig
, type checkers will fail to narrow downTrig -> Sin | Cos | Tan
even if you literally have 3 classes,
It will give up because assuming that there are infinitely many subclasses defined for them.
Footnotes
https://peps.python.org/pep-0646/#motivation ↩
https://github.com/sympy/sympy/wiki/Args-Invariant ↩
Beta Was this translation helpful? Give feedback.
All reactions