Add examples to the docs showing how to specify various Deployment attributes in a YAML config #5919

Closed
marvin-robot opened this issue Jun 20, 2022 · 4 comments
Labels
docs status:accepted We may work on this; we will accept work from external contributors

Comments

@marvin-robot
Member

Opened from the Prefect Public Slack Community

mzawadzki: Is there a way to specify flow_runner in the deployment? I get ValueError: Unregistered flow runner 'DockerFlowRunner' when running prefect deployment create my_deployment.yaml. My deployment looks like this:

name: test_platform_flow_first_deployment
flow_name: Data Platform Demo
flow_location: ./test_platform_flow.py
parameters:
  to_print: "Hello from first deployment!"
tags:
  - dev
flow_runner: 
  type: DockerFlowRunner
  config:
    image: viadot:orion

Unfortunately the flow_runner config is not documented anywhere so it's hard for me to say if I'm specifying it incorrectly or it's not supported at all.

anna: Can you try the same using Python? much easier than YAML:

import platform
from prefect import task, flow
from prefect import get_run_logger
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner


...


@flow
def hello_flow():
    hi = say_hi()
    print_platform_info(wait_for=[hi])


DeploymentSpec(name="dev", flow=hello_flow, flow_runner=DockerFlowRunner())


if __name__ == "__main__":
    hello_flow()

anna: afaik, YAML is pretty much only for non-Python DevOps admins

maybe you can try this?

flow_runner: DockerFlowRunner

mzawadzki: I'll check and get back to you but I'd prefer to use YAML eventually, I don't feel comfortable storing configs in executable files, I feel like analysts will eventually abuse this somehow 😅 and YAML files are very easy and safe to parse and check for policy in CI/CD.
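
Editor's sketch: the kind of CI policy check mentioned above could look roughly like the following. The field names come from the YAML earlier in this thread. The policy values (ALLOWED_RUNNER_TYPES, REQUIRED_TAGS) and the function name check_deployment_policy are hypothetical, chosen only for illustration, and the dict stands in for whatever a YAML loader would return.

```python
from typing import Any, Dict, List

# Hypothetical policy values, chosen only for illustration.
ALLOWED_RUNNER_TYPES = {"DockerFlowRunner", "SubprocessFlowRunner"}
REQUIRED_TAGS = {"dev", "prod"}


def check_deployment_policy(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable policy violations (empty if none)."""
    problems: List[str] = []

    # Fields that the deployment YAML in this thread always sets.
    for key in ("name", "flow_name", "flow_location"):
        if not config.get(key):
            problems.append(f"missing required field: {key}")

    # At least one environment tag must be present.
    tags = set(config.get("tags") or [])
    if not tags & REQUIRED_TAGS:
        problems.append(f"expected at least one of the tags {sorted(REQUIRED_TAGS)}")

    # flow_runner may be a mapping with a `type` key or a bare string.
    runner = config.get("flow_runner") or {}
    runner_type = runner.get("type") if isinstance(runner, dict) else runner
    if runner_type is not None and runner_type not in ALLOWED_RUNNER_TYPES:
        problems.append(f"flow runner type not allowed: {runner_type}")

    return problems


# Stand-in for the dict a YAML loader would produce from my_deployment.yaml.
deployment = {
    "name": "test_platform_flow_first_deployment",
    "flow_name": "Data Platform Demo",
    "flow_location": "./test_platform_flow.py",
    "parameters": {"to_print": "Hello from first deployment!"},
    "tags": ["dev"],
    "flow_runner": {"type": "DockerFlowRunner", "config": {"image": "viadot:orion"}},
}

violations = check_deployment_policy(deployment)
print(violations or "deployment config passes the policy checks")
```

In a real pipeline the dict would presumably come from a safe loader such as PyYAML's yaml.safe_load, and a non-empty list of violations would fail the build.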

anna: how would they abuse DeploymentSpec, but wouldn't abuse YAML? :thinking_face: it's the same config that gets sent to the backend

anna: But I totally understand what you mean about enforcing standards. Specifying deployments in Python lets you build an extra abstraction (a helper function) that avoids the boilerplate YAML forces on you. An example:

from prefect.deployments import DeploymentSpec
from prefect.flows import Flow
from prefect.orion.schemas.schedules import SCHEDULE_TYPES

# from prefect.flow_runners import DockerFlowRunner
from typing import Any, Dict, List
from flows.async_flow import async_flow
from flows.crypto_prices_etl import crypto_prices_etl
from flows.repo_trending_check import repo_trending_check


def set_deployment_spec(
    flow: Flow,
    deployment_name_suffix: str = "dev",
    schedule: SCHEDULE_TYPES = None,
    parameters: Dict[str, Any] = None,
    tags: List[str] = None,
) -> DeploymentSpec:
    deploy_tags = (
        [deployment_name_suffix] if tags is None else [deployment_name_suffix, *tags]
    )
    return DeploymentSpec(
        flow=flow,
        name=f"{flow.name}_{deployment_name_suffix}",
        schedule=schedule,
        tags=deploy_tags,
        parameters=parameters,
        # flow_runner=DockerFlowRunner()
    )


set_deployment_spec(async_flow)
set_deployment_spec(crypto_prices_etl)
set_deployment_spec(repo_trending_check)
set_deployment_spec(
    repo_trending_check,
    deployment_name_suffix="orion_dev",
    parameters=dict(repo="orion"),
)

anna: when every deployment uses the same default spec, creating one takes a single line that passes the flow to the helper:

set_deployment_spec(crypto_prices_etl)

anna: <@ULVA73B9P> open "Add examples to the docs showing how to specify various Deployment attributes in a YAML config"

Original thread can be found here.

@sm-Fifteen

sm-Fifteen commented Jul 7, 2022

I'm interested in this as well. This kind of "Infrastructure as config" approach where deployment configuration is expressed as files that can be tracked by version control is something I think would be beneficial in some projects.

One problem I have related to this is that not only is "DeploymentSpec" not detailed in the documentation, it's not even in the openapi.json document, so I can't even do this:

# yaml-language-server: $schema=http://localhost:4200/openapi.json#components/schemas/DeploymentSpec

name: demo-github-stuff
flow_location: ./demo_github_stuff.py
flow_name: GitHub Stars
tags:
- foo
- bar
flow_runner:
  type: subprocess
  config:
    virtualenv: "/foo/bar/.pyenv"
parameters:
    name: "Earth"
schedule:
    interval: 3600

...and have the VSCode YAML extension validate the document for me and guide me through writing it.

EDIT: It was a lucky guess on my part that the YAML extension actually parses the schema URL like it would inside a JSON schema, meaning URL fragments work as JSON pointers. That means you could actually get the document to properly validate like that just by having that comment as the first line, if DeploymentSpec was actually exposed in the OpenAPI definition.
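
Editor's sketch: to illustrate what "URL fragments work as JSON pointers" means, here is a minimal stdlib-only resolver following RFC 6901, where a non-empty pointer starts with "/". The function name resolve_fragment and the sample document (including the schema name ExampleDeployment) are hypothetical. This is not how the YAML extension is implemented, only the general idea.

```python
from typing import Any
from urllib.parse import unquote, urldefrag


def resolve_fragment(document: Any, url: str) -> Any:
    """Resolve the fragment of `url` as an RFC 6901 JSON pointer into `document`."""
    pointer = unquote(urldefrag(url).fragment)
    if pointer == "":
        return document  # no fragment: the whole document is the target
    if not pointer.startswith("/"):
        raise ValueError(f"JSON pointer must start with '/': {pointer!r}")

    node = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" means "/", then "~0" means "~".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


# Hypothetical stand-in for a parsed openapi.json document.
openapi_like = {
    "components": {
        "schemas": {
            "ExampleDeployment": {"type": "object", "required": ["name"]},
        }
    }
}

schema = resolve_fragment(
    openapi_like,
    "http://localhost:4200/openapi.json#/components/schemas/ExampleDeployment",
)
print(schema)
```

A pointer to a schema that the document does not contain raises KeyError in this sketch, which mirrors the situation described here: the fragment can only resolve if the model is actually exposed in the OpenAPI definition.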

EDIT2: DeploymentSpec being absent from the OpenAPI document is apparently just a consequence of how FastAPI only includes model schemas that are present in documented routes, which makes sense given its use case. Still, having the schema of configuration files available somewhere, possibly generated by Pydantic directly on a separate route, could prove really handy when editing these and others.

EDIT3: Just as I made my previous edit, I noticed that Prefect 2.0b8 was released this morning, with the documentation amended to say "Deployments based on DeploymentSpec are deprecated as of the Prefect 2.0b8 release", so I guess DeploymentSpec not being there may no longer be a problem after all.

@trymzet
Contributor

trymzet commented Jul 22, 2022

@sm-Fifteen So how do you know what to pass to Deployment in the YAML now? I'm thinking it has to be something like:

flow:
  name: "My flow"
  path: "bucket/path/flow.py"
...
packager:
  type: "file"
  filesystem:
    _block_document_id: "your_remote_filesystem_block_id"

But I don't have the time to debug this or trace the entire logic so I'm waiting until there is any documentation available (hopefully Wednesday :)).

@sm-Fifteen

sm-Fifteen commented Jul 22, 2022

@trymzet: Turns out the Deployment definition in the generated OpenAPI spec is the wrong one, so what I ended up doing is just running from prefect.deployments import Deployment; print(Deployment.schema_json(indent=2)) and pasting it in a file (prefect_deployment.schema.json.txt). That actually helped a lot with figuring out the actual format:

# yaml-language-server: $schema=./prefect_deployment.schema.json

name: demo-github-stuff
flow:
  path: ./demo_github_stuff.py
  name: GitHub Stars
tags:
- foo
- bar
flow_runner:
  type: subprocess
  config:
    virtualenv: "/foo/bar/.pyenv"
parameters: {}
schedule:
  interval: 300
packager:
  type: 'orion'
  serializer:
    type: 'source'

It's not quite ideal yet. flow_runner, packager and serializer are typed as base classes from which DockerPackager, OrionPackager, SourceSerializer and the like inherit, rather than as unions of the leaf classes (possibly tagged unions, which Pydantic now supports) that Prefect would recognize. As a result, the YAML schema validator can't tell from the Deployment model alone which properties make sense for each type, and the user has to look up the package-specific API reference to figure it out. I actually had to read the source to learn that virtualenv goes in flow_runner.config, not directly beneath flow_runner, unlike FilePackager with filesystem.
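
Editor's sketch: the tagged-union idea can be shown with the standard library alone. The value of type selects exactly one leaf class, so the fields allowed under config are known per type. This is neither Pydantic's nor Prefect's implementation. The class names, the parse_flow_runner helper and the "docker" tag are hypothetical, and only the virtualenv and image fields are taken from the YAML in this thread.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Type


@dataclass
class SubprocessRunnerSketch:
    # Hypothetical leaf class; `virtualenv` is the field used in the YAML above.
    virtualenv: Optional[str] = None


@dataclass
class DockerRunnerSketch:
    # Hypothetical leaf class; `image` is the field used earlier in the thread.
    image: Optional[str] = None


# The "tag" is the value of `type`; each tag maps to exactly one leaf class.
RUNNER_TYPES: Dict[str, Type[Any]] = {
    "subprocess": SubprocessRunnerSketch,
    "docker": DockerRunnerSketch,
}


def parse_flow_runner(raw: Dict[str, Any]) -> Any:
    """Pick the leaf class from `type`, then reject config keys that class lacks."""
    tag = raw.get("type")
    if tag not in RUNNER_TYPES:
        raise ValueError(f"unknown flow runner type: {tag!r}")
    leaf = RUNNER_TYPES[tag]
    config = raw.get("config") or {}
    allowed = {f.name for f in fields(leaf)}
    unknown = sorted(set(config) - allowed)
    if unknown:
        raise ValueError(f"{tag} runner does not accept: {unknown}")
    return leaf(**config)


runner = parse_flow_runner(
    {"type": "subprocess", "config": {"virtualenv": "/foo/bar/.pyenv"}}
)
print(runner)
```

Because the tag determines the leaf class, a schema generated from such a union can list the valid properties for each type, which is the information the validator is missing when the field is typed as a base class.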

As for what to put under filesystem, the doc becomes scarcer and sparser the further down you go. :/

billpalombi added the status:accepted and priority:high labels on Aug 9, 2022
@discdiver
Contributor

Deployment docs have been updated showing deployment YAML file fields and how to make Deployments via Python deployment definitions.
https://docs.prefect.io/concepts/deployments/

6 participants