Typed Computer Algebra System #24844

sylee957 · 2023-03-04T09:25:55Z


Here is a tutorial on how to use pyright to build your own computer algebra system
on top of parametrically polymorphic expression trees, so that you can easily build whatever computer algebra system you need.
It can even reuse SymPy objects and compose them with anything else, while ensuring type safety.
I'd also like to argue that good type checking is handy for verifying that the correct data is used in a computer algebra system
(and in data processing problems in general).

Abstract Syntax Tree

The first step in writing a computer algebra system is to define classes for the abstract syntax tree types.
Here I use a dataclass to get immutability, syntactic equality and a string representation, similar to what SymPy provides in Basic.

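from dataclasses import dataclass
from typing import Generic, Never, TypeAlias, TypeVar, TypeVarTuple, Union, Unpack

# Requires Python 3.11+ for TypeVarTuple and the Generic[*Ts] syntax.
T = TypeVar("T")
Ts = TypeVarTuple("Ts")
T0 = TypeVar("T0")
T1 = TypeVar("T1")
T2 = TypeVar("T2")


@dataclass(frozen=True, init=False)
class Tree(Generic[*Ts]):
    args: tuple[*Ts]

    def __init__(self, *args: Unpack[Ts]):
        # The dataclass is frozen, so bypass its generated __setattr__ once, during construction.
        super().__setattr__("args", args)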

Expression Trees

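The basic building blocks of a SymPy expression tree are Integer and Symbol.
Because they carry a value (as in Integer(1) and Symbol('x') further down), they are defined as frozen dataclasses with a single field:

@dataclass(frozen=True)
class Integer:
    value: int

@dataclass(frozen=True)
class Symbol:
    name: str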

We also need symbolic operators for arithmetic:

class Add(Tree[*tuple[T, ...]]):
    pass

class Mul(Tree[*tuple[T, ...]]):
    pass

class Pow(Tree[T0, T1]):
    pass

Mathematical constants can be reduced to nullary expression trees,
without having to hassle with things like AtomicExpr:

class E(Tree[()]):
    pass

class Pi(Tree[()]):
    pass

And functions like sin, cos, tan can be defined as unary expression trees:

class Sin(Tree[T0]):
    pass

class Cos(Tree[T0]):
    pass

class Tan(Tree[T0]):
    pass

A convenient parametric type alias can then be defined:

Trig: TypeAlias = Sin[T] | Cos[T] | Tan[T]

so that a single annotation verifies that something is a trigonometric function:

x: Trig[int] = Sin[int](1)
x = Cos[int](1)

However, the types defined above are not very general yet.

We can express any one-level addition of integers as:

x: Add[int] = Add(1, 2, 3)

and an addition of integer additions as:

x: Add[Add[int]] = Add(Add(1, 2), Add(3, 4))

but we cannot define a type for arbitrarily nested additions of integers, because it would have to be something like
Add[int | Add[int] | Add[int | Add[int]] | ...]

However, such types can be defined with a recursive type alias:

NestedAdd: TypeAlias = Add["int | NestedAdd"]

so that arbitrarily nested additions type-check:

x: NestedAdd = Add(1, 2)
x = Add(1, Add(2, Add(3, 4)))

However, x = 1 does not type-check, because a bare integer is not wrapped in an Add.

For that case, the definition can be inverted:

NestedAdd: TypeAlias = int | Add["NestedAdd"]

and then x: NestedAdd = 1 works too.

In fact, it is a good idea to give the two types different names to avoid confusion.

Expression Type

All the expression types can be assembled into one big recursive expression type:

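# Union[...] is used because mixing classes and strings with | (e.g. Symbol | "Add[Expr]")
# raises TypeError at runtime.
Expr: TypeAlias = Union[
    Integer,
    Symbol,
    "Add[Expr]",
    "Mul[Expr]",
    "Pow[Expr, Expr]",
    E,
    Pi,
    "Trig[Expr]",
]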

And now it is possible to type general mathematical expressions:

expr: Expr = Symbol('x')
expr = Add(Integer(1), Symbol('x'))
expr = Mul(Integer(2), Sin(Symbol('x')))

Structural induction

Next is how to write functions that work over such recursive types,
and what the benefit is compared with other approaches such as object-oriented polymorphism.

The biggest advantage of recursive union types is that the correctness of a program can be verified by structural induction.
If the program is correct for every component of the union (Integer, Symbol, Add, Mul, ...), assuming that its recursive calls on the arguments are correct,
then it works for every SymPy expression, without having to ask questions like

What if it doesn't work with x?
What if it doesn't work with Add(x, y)?
What if it doesn't work with Add(Mul(Pow(...)))?

One example of a basic SymPy function that is easy to verify this way is free_symbols:

def free_symbols(expr: Expr) -> set[Symbol]:
    match expr:
        case Integer():
            return set()
        case Symbol():
            return {expr}

        case E():
            return set()
        case Pi():
            return set()

        case Add(args):
            # Start from an empty set so that Add() with no arguments also works.
            return set[Symbol]().union(*(free_symbols(x) for x in args))
        case Mul(args):
            return set[Symbol]().union(*(free_symbols(x) for x in args))
        case Pow([x, y]):
            return free_symbols(x) | free_symbols(y)

        case Sin([x]):
            return free_symbols(x)
        case Cos([x]):
            return free_symbols(x)
        case Tan([x]):
            return free_symbols(x)

Pyright can warn you if any of the cases is missing (see its reportMatchNotExhaustive setting).

Printer functions such as latex and pycode are other examples where this is useful,
because every SymPy class needs its own printer for every SymPy expression to be printable.

Generalized Expression Type

Now comes the problem of extending the expressions.

Whenever you need new symbolic expressions, say hyperbolic functions, Bessel functions, sums, integrals, matrices, ...,
you have to edit the messy union types and make them grow bigger.
This is one way SymPy grew: its base classes (Basic, Expr) had to grow,
and many mathematical functions had to be defined.

Then every part of the code that uses them
(such as the printers) has to be rewritten to handle the new symbolic expression types correctly.

However, it is technically possible to solve that problem without editing the core,
by parametrizing the recursive type aliases:

Expr: TypeAlias = (
    T
    | Integer
    | Symbol
    | "Add[Expr[T]]"
    | "Mul[Expr[T]]"
    | "Pow[Expr[T], Expr[T]]"
    | E
    | Pi
    | "Trig[Expr[T]]"
)

Now everything is parametrized by T, and the good thing is that users can substitute anything for T
to define recursive expression types that include it.

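For example, you can extend the grammar of expressions with hyperbolic functions, defined the same way as Sin, Cos and Tan:

class Sinh(Tree[T0]):
    pass

class Cosh(Tree[T0]):
    pass

class Tanh(Tree[T0]):
    pass

Hyper: TypeAlias = Sinh[T] | Cosh[T] | Tanh[T]
MyExpr: TypeAlias = "Expr[Hyper[MyExpr]]"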

and then arbitrarily nested hyperbolic expressions can be written:

x: MyExpr
x = Add(Sinh(Symbol("x")), Cosh(Symbol("y")))
x = Sinh(Add(Symbol("x"), Symbol("y")))

Expr[Never] recovers the unparametrized Expr from before.
For example, this passes the type check:
y: Expr[Never] = Sin(Add(Symbol("x"), Symbol("y")))
but this does not:
y: Expr[Never] = Sinh(Add(Symbol("x"), Symbol("y")))

The way to generalize Expr is to replace every constant part with a variable.
For example, even the ground types Integer | Symbol can be eliminated:

Expr: TypeAlias = (
    T
    | "Add[Expr[T]]"
    | "Mul[Expr[T]]"
    | "Pow[Expr[T], Expr[T]]"
)

And then you are able to replace T with anything to achieve the generality.

It then turns out that we never needed Atom in the first place:
it is safe to use Expr[int | str], with int standing for integers and str for symbols, and everything still works type-safely.
(The original version is recovered with Expr[Integer | Symbol].)

At the same time, you can make the program more minimal,
for example by using Literal['x'] | Literal['y'] | Literal['z'] if you only need x, y, z as symbols.

It is also good to write functions that work for every Expr[T],
because that gives users generality.
For example, it should be possible to transform Add(dog, dog) into Mul(2, dog)
without changing the core to deal with dog objects.
This way the core does not grow indefinitely with every feature request,
while still achieving full generality.

Something interesting also happens when you parametrize by Never.
For example, x: Expr[Never] = Add(Add(), Mul()) type-checks; since the empty sum is 0 and the empty product is 1,
Expr[Never] in fact builds integer arithmetic out of 0 and 1.
On the other hand, things like Trig[Never] cannot be constructed, because there is no ground value to put inside them.

Conclusion

It is possible to perform full syntactic analysis of expressions by using recursive type aliases.
For example, you can even define mutually recursive grammars like

MyTrig: TypeAlias = T | "Trig[MyHyper[T]]"
MyHyper: TypeAlias = T | "Hyper[MyTrig[T]]"

and verify at the type level that structures such as alternating trigonometric and hyperbolic functions are correct:

x: MyTrig[int]
x = 1
x = Sin(1)
x = Sin(Sinh(1))
x = Sin(Sinh(Sin(1)))
x = Sin(Sinh(Sin(Sinh(Sin(1)))))

I hope this is handy for people who want to develop applications on top of SymPy, or to start new projects in SymPy,
and who want to keep things minimal while ensuring that they work in full generality
(which is guaranteed by building the software on parametric polymorphism).

Notes

Finitary vs Variadic Tree

There is a technical reason to split the finitary case from the variadic case: the subtle difference between tuple[T0, T1] and tuple[T, ...].
With a finitary product type, you can give each position of an expression tree its own type, which comes in handy.

For example, you may want the exponent of a power to be restricted to integers, as in $x^2, x^3$, and exclude overly general exponents such as $x^{\sin(n)}$.

Then you can use Pow[Expr, int]

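and safely write code like:

from sympy import factorint

def pow_expand(x: Pow[Expr, int]) -> dict[int, int]:
    # x.args[1] is statically known to be an int, so it is safe to factor it.
    return factorint(x.args[1])

without worrying that factorint might receive something more general than an int.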

Another reason to prefer a finitary tuple type over a variadic one is that the type checker knows x.args always has length 2.

Before TypeVarTuple was introduced, the common workaround was to define classes like
Vector2(Generic[T0, T1]) and Vector3(Generic[T0, T1, T2]) every time positional arguments had to be verified,
which is clumsy ¹

vs Object Oriented Recursive Data Structure

Before Pyright supported recursive type aliases,
the only workaround for typing recursive data structures was to define classes.

For example, the code below can assert the arg-invariance ² of sympy.Basic:

class Basic(Generic[T]):
    # Quoted because Basic is not yet defined while its class body runs.
    args: "tuple[T | Basic[T], ...]"

so that Basic[Atom] correctly types the arg-invariance of sympy.Basic.

However, this is not the best solution when you want to subclass Expr from Basic
and assert a stricter arg-invariance, namely that everything inside an Expr is an Expr or an Atom:

class Expr(Basic[T]):
    pass

With this, the elements of Expr[Atom].args are inferred only as Atom | Basic[Atom], not as Atom | Expr[Atom].

To give Expr its own, stricter arg-invariance than Basic, you need to override args:

class Expr(Basic[T]):
    # Quoted because Expr is not yet defined while its class body runs.
    args: "tuple[T | Expr[T], ...]"

However, if you use recursive type alias, you can express sub arg-invariance of Expr more conveniently as:

Expr: TypeAlias = "Basic[Expr[T]]"

mypy doesn't support recursive type aliases at the moment,
which limits the use of syntax like the above.

Superclassing vs Union

Once union types and recursive types are supported,
there are two ways to achieve polymorphism:

Via a superclass: class Trig, class Sin(Trig), class Cos(Trig), class Tan(Trig)
Via a union: Trig = Sin | Cos | Tan

However, there is a technical difference between them:
subclassing amounts to an "infinite union", while a union type is a finite union.

You can think of Trig used as a superclass as an infinite union Sin | Cos | Tan | ... | TrigOmega that converges to Trig,
which a computer cannot check exhaustively.

For example, with class Trig, type checkers will not narrow Trig down to Sin | Cos | Tan,
even if you literally have only those 3 subclasses;
they give up because any number of further subclasses could be defined.
