Template loader improvements #82

Open
ryanrichholt opened this issue Nov 2, 2019 · 0 comments
Labels
enhancement New feature or request

ryanrichholt commented Nov 2, 2019

Proposal

Jinja2's package loader (PackageLoader) becomes usable if we require pipelines to be valid Python packages. This is a big change to pipelines, but it opens up the possibility of cross-pipeline imports.
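As a rough illustration of how PackageLoader could enable cross-pipeline imports, here is a minimal sketch. It assumes each pipeline is an importable package with a templates/ directory, and it mounts each package under a prefix with Jinja2's PrefixLoader so that one pipeline's template can {% import %} another's. The build_loader and make_demo_package helpers, the demo_common/demo_phoenix package names, and the prefix scheme are all hypothetical; the demo writes throwaway packages to a temporary directory so the example runs on its own.

```python
import os
import sys
import tempfile

from jinja2 import Environment, PackageLoader, PrefixLoader


def build_loader(prefix_to_package):
    """Mount each pipeline package's templates/ directory under a prefix.

    With {"phoenix": "demo_phoenix"}, the name "phoenix/template.jst"
    resolves to demo_phoenix/templates/template.jst.
    """
    return PrefixLoader(
        {prefix: PackageLoader(pkg, "templates") for prefix, pkg in prefix_to_package.items()}
    )


def make_demo_package(root, name, templates):
    """Write a throwaway package (hypothetical layout) with a templates/ dir."""
    tmpl_dir = os.path.join(root, name, "templates")
    os.makedirs(tmpl_dir)
    open(os.path.join(root, name, "__init__.py"), "w").close()
    for filename, text in templates.items():
        with open(os.path.join(tmpl_dir, filename), "w") as fh:
            fh.write(text)


root = tempfile.mkdtemp()
make_demo_package(root, "demo_common", {
    "macros.jst": "{% macro greet(x) %}hello {{ x }}{% endmacro %}",
})
make_demo_package(root, "demo_phoenix", {
    # Cross-pipeline import: this template pulls a macro from another package
    "template.jst": "{% import 'common/macros.jst' as m %}{{ m.greet(sample) }}",
})
sys.path.insert(0, root)  # PackageLoader imports the package, so it must be on sys.path

env = Environment(loader=build_loader({"common": "demo_common", "phoenix": "demo_phoenix"}))
rendered = env.get_template("phoenix/template.jst").render(sample="S1")
print(rendered)  # hello S1
```

PrefixLoader is only one option; a ChoiceLoader over the same PackageLoaders would instead search all pipelines without a namespace, at the cost of name collisions between pipelines.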

Current pipeline minimal requirements:

template.jst
pipeline.yaml

Proposed pipeline requirements:

__init__.py
setup.py
templates/
    template.jst

To aid in the development of pipelines, a new pipeline helper script could be developed that would create the initial boilerplate.
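A minimal sketch of such a helper, assuming it only needs to write the three files in the proposed list. Two details are assumptions beyond the proposal: setup.py is placed one level above the package directory (the conventional setuptools layout), and it declares package_data so the templates are installed with the code, plus zip_safe=False so the package is not installed as a zipped egg. The scaffold function and the file contents are hypothetical.

```python
import os
import tempfile

# Hypothetical setup.py template; {{ and }} are literal braces for str.format
SETUP_PY = '''\
from setuptools import setup

setup(
    name={dist!r},
    version="0.1.0",
    packages=[{pkg!r}],
    # Install the template files alongside the code
    package_data={{{pkg!r}: ["templates/*.jst"]}},
    # Do not install as a zipped egg, so templates stay as files on disk
    zip_safe=False,
)
'''

INIT_PY = '"""Boilerplate __init__.py for the {dist} pipeline."""\n'


def scaffold(root, dist_name):
    """Create the proposed pipeline layout under root; return the paths written."""
    pkg = dist_name.replace("-", "_")  # import names cannot contain hyphens
    files = {
        "setup.py": SETUP_PY.format(dist=dist_name, pkg=pkg),
        os.path.join(pkg, "__init__.py"): INIT_PY.format(dist=dist_name),
        os.path.join(pkg, "templates", "template.jst"): "{# Starter template for %s #}\n" % dist_name,
    }
    written = []
    for rel_path, text in files.items():
        path = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write(text)
        written.append(path)
    return written


if __name__ == "__main__":
    for path in scaffold(tempfile.mkdtemp(), "jetstream-phoenix"):
        print(path)
```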

Benefits

  • Pipelines can be managed with pip, which allows them to be distributed easily from GitHub or PyPI. For example:
pip install jetstream-phoenix
  • Pipelines can include arbitrary code in their __init__.py that sets up the configuration
    data used to render the template. This adds an incredible amount of power to pipelines:
```python
"""Standard boilerplate __init__.py created with the helper script.

Pipelines could be discovered and introspected with the plugin interface. After
discovery, the Jinja2 loader could be configured to include all of the pipelines
that are currently installed.
"""
import yaml  # PyYAML stands in for the undefined load_yaml helper
from pkg_resources import resource_filename

# Resolve relative to this package; 'jetstream-phoenix' is a distribution
# name, not an importable module name, so it cannot be passed here.
with open(resource_filename(__name__, 'pipeline.yaml')) as fh:
    manifest = yaml.safe_load(fh)

# Allowing customization here has potential for amazing things...
from magicpackage import download_database  # placeholder package from the proposal

db = 'temp/data.txt'
download_database(db)

data = {
    'foo': 'bar',
    'database': db,  # Always download the latest database to the project before rendering
}
```
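The docstring in the example mentions discovering installed pipelines through a plugin interface. One common way to do that, sketched here under assumptions, is to have each pipeline register a setuptools entry point and read them back with the standard library's importlib.metadata. The jetstream.pipelines group name and the discover_pipelines helper are hypothetical, not an existing jetstream API.

```python
import sys
from importlib.metadata import entry_points

# Hypothetical entry-point group; a pipeline's setup.py would register, e.g.:
#   setup(..., entry_points={"jetstream.pipelines": ["phoenix = jetstream_phoenix"]})
GROUP = "jetstream.pipelines"


def discover_pipelines(group=GROUP):
    """Return {pipeline name: importable module name} for every installed pipeline."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)
    else:
        # Python 3.8/3.9 return a dict of group name -> entry points
        eps = entry_points().get(group, [])
    # For "phoenix = jetstream_phoenix", ep.name is "phoenix", ep.value is "jetstream_phoenix"
    return {ep.name: ep.value for ep in eps}


print(discover_pipelines())  # {} unless some installed package registers this group
```

The resulting name-to-module mapping is the kind of input a loader built from one PackageLoader per pipeline would need.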

Drawbacks

  • As with all Python packages, supporting multiple versions of the same package in a single environment is problematic. Python scripts typically import package, not import package@v1.0. There are some features in pkg_resources and the __requires__ dunder that may allow this (I don't know much about it), but it's not easy to pull off. The typical solution to this problem is virtual environments.

    Removing the ability to have several versions of a pipeline installed (without resorting to a virtual
    environment) is probably not a very painful change at this point.

    Keep in mind that if a true import system is made available (imports work from any installed pipeline),
    there are bigger problems to solve if multiple versions were somehow made possible. This is the
    same problem every dependency system faces:

    • a requires x v1.0
    • b requires x v2.0
    • c requires a and b
    • what happens?
  • How do we deal with eggs? There may be some mechanism in setup.py to indicate that pipelines
    cannot be packaged into eggs.
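To make the a/b/c scenario concrete, here is a small sketch that walks a hypothetical dependency table and reports any package pinned to more than one version. Since a single environment can only make one copy of x importable, c cannot be satisfied there. The REQUIREMENTS table and the helper names are illustrative only, not part of any existing resolver.

```python
# Hypothetical metadata: each package maps its dependencies to an exact pin
REQUIREMENTS = {
    "a": {"x": "1.0"},
    "b": {"x": "2.0"},
    "c": {"a": None, "b": None},  # None means "any version"
}


def collect_pins(root, requirements):
    """Walk the dependency graph from root and gather every pinned version per package."""
    pins = {}
    stack, seen = [root], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for dep, version in requirements.get(name, {}).items():
            if version is not None:
                pins.setdefault(dep, set()).add(version)
            stack.append(dep)
    return pins


def conflicts(root, requirements):
    """Packages that would need more than one version installed at the same time."""
    return {dep: sorted(vs) for dep, vs in collect_pins(root, requirements).items() if len(vs) > 1}


print(conflicts("c", REQUIREMENTS))  # {'x': ['1.0', '2.0']}
```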

Conclusion

It's still going to take a lot of planning, but in principle it's a small change with some massive benefits. I would appreciate any thoughts on the idea.

ryanrichholt added the enhancement (New feature or request) label Nov 2, 2019
ryanrichholt added this to To do in Version 1.6 Nov 2, 2019
ryanrichholt added this to To do in Version 1.7 Nov 6, 2019
ryanrichholt removed this from To do in Version 1.6 Nov 6, 2019
Projects
Version 1.7
  
To do