Using static type checking as qa on a Flow library #3191

dyllamt · 2020-08-20T16:31:12Z

Current behavior

As a user developing Flows, I struggle with debugging flows, since each of the tasks represents delayed execution. Running a flow is the most complete test, but since flows can change the state of external systems, it would be helpful to have some qa that my flow is set-up correctly before running/registering. This is especially impactful in the situation where I have a flow library and want to implement this qa as part of a CI pipeline.

Proposed behavior

For flows with Tasks that have data dependencies between them, I could qa on the types passed between different tasks. Right now, tasks are typed to hint at the types encountered when the flow is registered. Imagine a scenario, where the types hinted at the task execution result (the result of the delayed execution). You could run mypy and ensure that the dependencies between flows have sensible types (plus your IDE will be more helpful)

Example

from typing import Any, Callable, TypeVar

import prefect
import prefect.tasks.core.function


F = TypeVar('F', bound=Callable[..., Any])


def typed_task(fn: F, **task_init_kwargs: Any) -> F:
    """Same dynamic code as `prefect.task`, but preserves function signature for debugging in a Flow context.
    """
    return prefect.tasks.core.function.FunctionTask(fn=fn, **task_init_kwargs)  # type: ignore


@typed_task
def task_one() -> str:
    return "1"


@typed_task
def task_two(arg: str) -> int:
    return int(arg)


with prefect.Flow("test_type_hints") as flow:
    result_one: str = task_one()  # typed to hint at the execution result 
    result_two: int = task_two(result_one)
flow.register()

The text was updated successfully, but these errors were encountered:

dyllamt · 2020-08-21T01:17:30Z

Update: It won't be possible to support this until PEP 612 and changes to mypy go live python/mypy#8645

jcrist · 2020-09-18T18:23:55Z

@dyllamt, those type signatures aren't 100% correct - the output of calling a task is another task, the annotations you've provided above tell mypy that the @typed_task decorated functions are just functions. While useful for the kind of type checking you're doing above, if you ever wanted to treat the output of calling one of those functions as a true task, mypy would complain.

In #3346 I fixed the task decorator to properly tell mypy that decorated functions are FunctionTask types. What I think we'd want to do to satisfy your needs is make Task types a parametrized type, then propagate that information through calls. So with your above example, calling task_one would return an instance of Task[str] rather than Task, and the generated task signature for task_two would take in a Union[str, Task[str]] rather than a str.

dyllamt · 2020-09-23T18:11:50Z

I think that's an elegant solution. When I was approaching the problem, I was just thinking about a narrow scope of looking at types within the context of flow execution. Parameterization seems like a good tack.

In my mind, a Task really is just a function, which has delayed execution properties. So I think part of the parameterization has to include it's argument signature as well as the return type (analogous to Callable). What are your thoughts on that? My proposed spec for task_two in the above example: Task[[str], int].

jcrist · 2020-09-23T18:48:16Z

This is a bit tricky, since prefect doesn't disambiguate between tasks that have been called (and represent a delayed result) and tasks that haven't been called and should be treated more like functions. The first really only needs the result parameter, while the second would need both.

In my experience I rarely see code passing around uncalled tasks as higher-order functions, so parametrizing on the call signature feels less important to me than on the call result (without higher-order functions, mypy can statically check the caller type signature on the task itself, which is already exposed). That said, perhaps the extra complication would be worth it, and would future-proof us against situations where that becomes more common.

dyllamt · 2020-09-27T18:04:47Z

Thinking it over, I feel comfortable parameterizing on just the return type. You're right - I never use higher-order functions with tasks. Generally, I have a module of pure functions, and then at the last minute in a Flow module, I'll convert to tasks:

# func.py
def add(x: float, y: float) -> float:
    return x + y

def sub(x: float, y: float) -> float:
    return x - y

# flow.py
add_task = prefect.task(add)
sub_task = prefect.task(sub)

I guess I would have to stop using the task decorator this way, since we need to hint that the task accepts Union[float, Task[float]], but I can live with that.

cicdw · 2020-10-03T03:31:58Z

We built support for tracking call signature annotations early on (e.g., https://github.com/PrefectHQ/prefect/blob/master/src/prefect/core/task.py#L744) and could probably leverage this to do some static validation independently of mypy, while possibly leveraging mypy within a task but use a higher level Prefect-specific type validation between tasks / at the Flow level that takes things like triggers into account.

github-actions · 2022-11-19T02:10:57Z

This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment.

github-actions · 2022-12-04T02:01:39Z

This issue was closed because it has been stale for 14 days with no activity. If this issue is important or you have more to add feel free to re-open it.

dyllamt added the enhancement An improvement of an existing feature label Aug 20, 2020

jcrist mentioned this issue Sep 18, 2020

A few mypy fixups #3346

Merged

3 tasks

jcrist closed this as completed in #3346 Sep 18, 2020

jcrist reopened this Sep 18, 2020

github-actions bot added the status:stale This may not be relevant anymore label Nov 19, 2022

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using static type checking as qa on a Flow library #3191

Using static type checking as qa on a Flow library #3191

dyllamt commented Aug 20, 2020

dyllamt commented Aug 21, 2020

jcrist commented Sep 18, 2020 •

edited

dyllamt commented Sep 23, 2020

jcrist commented Sep 23, 2020 •

edited

dyllamt commented Sep 27, 2020

cicdw commented Oct 3, 2020

github-actions bot commented Nov 19, 2022

github-actions bot commented Dec 4, 2022

Using static type checking as qa on a Flow library #3191

Using static type checking as qa on a Flow library #3191

Comments

dyllamt commented Aug 20, 2020

Current behavior

Proposed behavior

Example

dyllamt commented Aug 21, 2020

jcrist commented Sep 18, 2020 • edited

dyllamt commented Sep 23, 2020

jcrist commented Sep 23, 2020 • edited

dyllamt commented Sep 27, 2020

cicdw commented Oct 3, 2020

github-actions bot commented Nov 19, 2022

github-actions bot commented Dec 4, 2022

jcrist commented Sep 18, 2020 •

edited

jcrist commented Sep 23, 2020 •

edited