
Releases: tgen/jetstream

v1.7.4

29 Nov 18:45
f9594f7

Jetstream v1.7.4 Release Notes

Major changes

  • Improved pipeline version parsing to use PEP 440-style versioning - development and latest have been added as aliases for the latest development and stable releases, respectively.
    • Pipelines and their versions now have a defined comparison format (e.g. __lt__ and __eq__ functions are defined), which allows the pipeline list to be sorted.
  • Improved handling of JS_PIPELINE_PATH, both within templates via the expand_vars function and within the slurm_singularity backend.

Bug fixes

  • The slurm_singularity backend has improved search functionality for finding cached images; previously, cached images were only found if the digest was explicitly defined for the task.
  • Avoid erroneously attempting to bind $JS_PIPELINE_PATH if it has not been set, e.g. if the user is simply running jetstream run without any pipeline context.

Minor changes

  • Linting-related adjustments to the slurm_singularity.py backend
  • Limited the networkx version range to exclude the 3.0 release for now

Full Changelog: v1.7.3...v1.7.4

v1.7.3

19 Sep 01:36
0823bae

Jetstream v1.7.3 Release Notes

Major changes

  • For slurm backends, the sacct pinginess has been reduced, and we now request a smaller set of fields instead of --all; this reduces load on the slurmdbd
  • The slurm_singularity backend can now submit jobs without a container definition
  • Added md5 and assignbin filters for use in templates
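
A minimal sketch of how the md5 filter might be used in a template task. The task name and the sample variables here are hypothetical, and this assumes the filter returns the MD5 hex digest of the string it is applied to (the assignbin filter is not shown):

- name: record_checksum
  cmd: |
    # hypothetical example: write the md5 of a rendered value to a file
    echo "{{ sample.fastq_path | md5 }}" > {{ sample.name }}_path.md5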

Bug fixes

  • Not all asyncio.Event(loop) usages were fixed in previous commits; this should fix the remaining cases preventing us from using Python 3.10 #144

Minor changes

  • Adjusted handling of GPU jobs for the slurm_singularity backend; we now set SINGULARITYENV_CUDA_VISIBLE_DEVICES

Ease of use updates

  • A bash completion script is available under extras/completions/jetstream.bash. It is still in development but can be used as a template by other users. It can be installed as ~/.bash_completion or in your preferred user completion script directory, e.g. ~/.local/share/bash-completion/completions/jetstream.bash
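
A one-line install from a checkout of the repository might look like this (the destination is one of the locations suggested above):

mkdir -p ~/.local/share/bash-completion/completions && cp extras/completions/jetstream.bash ~/.local/share/bash-completion/completions/jetstream.bash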

v1.7.2

12 Jan 17:33
f699486

Jetstream v1.7.2 Release Notes

Major changes

  • YAML parsing now uses the LibYAML implementation if it is available, otherwise it defaults to the PyYAML implementation - more details available in issue #143
  • Handling an issue from a downstream pipeline - #10
    • By using the task identity for files generated by the slurm_singularity backend, we avoid the potential for sample.name or any other user-supplied variable producing a task name longer than 255 characters.
  • Better containerization of the slurm_singularity backend: using --contain ensures that we don't bind /home or any other directories defined in singularity.conf unless we explicitly bind them
    • We also only use --nv if CUDA_VISIBLE_DEVICES is defined; some users were misled into thinking that the warning thrown on a non-GPU node was a job-breaking error.

Minor changes

  • Updated mash report text - #116

v1.7.1

11 Oct 19:55
eca552c

Jetstream v1.7.1 Release Notes

Bug Fixes

  • We ran into a case where we were pinging the container registry far too frequently and getting IO timeouts. Scripts generated by the slurm_singularity backend now include more extensive bash logic to use the cached singularity image if it is available; it should always be available unless the cache location is cleaned up after jetstream starts. This drastically reduces the "pinginess".

Minor changes

  • Reduced the complexity of output directory creation (makedirs) in the slurm_singularity backend.

Dev notes

  • Updated maintainer info in __init__.py

v1.7

22 Sep 17:15
64c44da

Jetstream v1.7 Release Notes

Major changes

  • A number of new backends have been added, supporting docker, singularity, and dnanexus.

  • Slurm backend(s) settings have been moved to the overall settings config, which allows for user-level slurm backend adjustments. For example,
    "NODE_FAIL" is now considered an active state, since the job should be requeued and potentially completed by slurm.

Bug Fixes

  • Fixed a deprecation-level bug, introduced in Python 3.10, relating to certain asyncio functions. We currently support Python 3.7+.

  • Version checking has been updated to use packaging.version instead of distutils.version, to be in line with PEP 440.

Dev notes

  • Updated unit test for container based backends

v1.6.2

25 Oct 21:37
5c3d4b7

Jetstream v1.6.2 Release Notes

Major changes

  • Fixed issue #131
  • Fixed parsing of account info when the cluster does not supply accounting info
  • Fixed "RuntimeError: generator ignored GeneratorExit" exception handling

Dev Notes

  • Security issues resolved from dependabot

v1.6.1

01 Feb 21:03
ad1d051

Jetstream v1.6.1 Release Notes

Major changes

  • A new option --mash-only/-m allows users to mash two workflows prior to running a
    pipeline or workflow.

Dev Notes

  • Added unit test for mash only feature

v1.6

14 Feb 04:53
22d7ef9

Jetstream v1.6 Release Notes

Major changes

  • A new option --pipeline allows for pointing directly to a pipeline directory
    instead of looking a pipeline up by name.

  • A new task directive, reset, is understood by the workflow class. Reset directives can
    be either a string or a sequence of strings. When the task is reset, it will also
    trigger a reset on any listed task name. The special value predecessors will trigger
    a reset for any direct predecessors of the task (see the sketch after this list).

  • Pipeline and project paths are now exported as environment variables by the runner.
    Here are the environment variables:
    - JS_PIPELINE_PATH
    - JS_PIPELINE_NAME
    - JS_PIPELINE_VERSION
    - JS_PROJECT_PATH

  • Three new template global functions were added: env, getenv, setenv for
    interacting with environment variables during template rendering. Details in
    docs/templates.md

  • Config file inputs via -c/--config and -C/--config-file have been improved. There
    are now options for loading plain text as a list of lines (txt file type), and also
    for loading tabular data without headers (csv-nh, tsv-nh). Tabular config data
    can now be used with -C/--config-file and will be accessible with the __config_file__
    variable inside templates (json/yaml data will still be loaded at the top level).
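
A minimal sketch of the reset directive on a made-up pair of tasks (the task names and commands are hypothetical):

- name: align_reads
  cmd: |
    echo "aligning reads"

- name: summarize
  after: align_reads
  reset: predecessors
  cmd: |
    # resetting this task also resets its direct predecessor, align_reads
    echo "summarizing alignments"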

Bug Fixes

  • Fixed bug with settings not creating correct pipelines variables in new config files.

  • Some arguments that worked with jetstream run but not jetstream pipelines now
    work for both. Arguments like -r/--render-only and -b/--build-only will work
    with the pipelines command now.

  • SlurmBackend slurm command checks are now silent instead of printing to terminal

  • Resolved single-column tsv parsing issue

Dev notes

  • Added unittests for entire pipelines and included a set of example pipelines

  • Version info is hardcoded in two places: setup.py and jetstream/__init__.py.
    Guides have been added to the dev docs for how to handle features and releases.

v1.5 Release Notes

12 Jul 22:46

Major notes

Pipelines command

Jetstream now includes the jetstream pipelines command. Pipelines are another
layer added to managing workflow templates. Since templates support
import/include/extend statements with Jinja, they can actually be modularized
across several files. The pipeline system helps organize complicated templates
with a few helpful features.

Pipelines are Jetstream templates that have been documented with version
information and added to a jetstream pipelines directory. This command allows
pipelines to be referenced by name when starting runs and automatically
includes any pipeline scripts and variables in the process.

To create a pipeline:

Add the template file(s) to a directory that is in your pipelines searchpath.
The default searchpath is your user home directory, but it can be changed in the
application settings (see jetstream settings -h)

Create a pipeline.yaml file in the directory and be sure to include the
required fields.

Pipelines allow templates to be referenced by a name and optional version

Have a template that you use all the time? Name it, document it, and then you
can use jetstream pipelines to start runs with the name. It's also
version-aware, so you can reference a specific version of the pipeline, or just
let jetstream find the latest version that you have installed.

Variables can be included in the pipeline.yaml

Pipelines can include constant data used for rendering the templates. For
example, I use the pipeline.yaml to contain the file paths to reference data
for Phoenix. This removes the need to repeat these paths throughout the
template source code, and also brings those variables under our version control
system (they used to be stored in files outside of the pipeline code).

Additional executables/scripts can be included with the pipelines

If a bin property is added to the pipeline manifest, that directory will be
prepended to the user $PATH environment variable when the pipeline is started.
It's a handy way to bundle additional scripts with a pipeline and have them
all fall under the same project for version control purposes.
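
Putting these pieces together, a pipeline.yaml might look roughly like the sketch below. The bin and constants properties are described above; the other field names and the overall layout are assumptions, so check the pipelines documentation for the exact required fields:

name: my_pipeline
version: 1.0.0
main: main.jst
bin: bin
constants:
    reference_fasta: /path/to/reference.fa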

Tasks command updates

The tasks command was reworked internally, and there were some changes to the
cli options. The general philosophy for the command now is that a set of
filters is used to select the tasks of interest by name or status. The task
names are now given as positional arguments, for example:

jetstream tasks bwa_mem_sample_A haplotypecaller_sample_A ...

These arguments allow for glob wildcard matching: *

jetstream tasks bwa_mem_sample_*

Regex is also still supported:

jetstream tasks --regex 'bwa_mem_sample_[^A]' 

Finally, tasks matching the patterns can be filtered by status:

jetstream tasks -s complete bwa_mem_sample_*

By default, this command just lists any tasks matching the name/status options.
The action options can be used to perform additional actions on those tasks.
For example, --verbose prints out a ton of information about each task matching
the query.

Template variables from command-line arguments

These options have changed (again...sorry), but this time I think they work
really well. The reason for this change is to get rid of the awkward pattern
of having to add the extra -- argument before listing any config args. Also,
this new format is much easier to parse, and results in more informative error
messages when there are problems. Here are some use case examples:

Note: config variables can be added when creating projects (they will be stored
in the project.yaml) or when the pipeline/template is run (but they will not
be saved in the project.yaml). I typically prefer to add variables when
creating projects, because it means you can always go back later and see what
was used to render the template (in addition to seeing the final values used
in the commands themselves)

Adding variables when creating a project:

Variables can be added one argument at a time

jetstream init myproject -c reference_path /path/to/reference/file

or multiple:

jetstream init myproject -c reference_path /path/to/reference/file -c email ryan@tgen.org

Variables can have a type declared (string is the default if no type is declared).
The type should be included with the key parameter, colon-separated.

jetstream init myproject -c int:threads 8 -c str:email ryan@tgen.org

Lots of variables can be loaded from files (note the upper-case C).

jetstream init myproject -C ~/myconfig.json

Loading variables from multiple files is also supported, but you'll need to
provide a name for them to be added under:

jetstream init myproject -c file:samples ~/mysamples.json -c file:patients ~/mypatients.json

Notice we used the lowercase -c/--config. Using uppercase -C/--config-file
overwrites the variable context entirely, and essentially adds the contents of
the file as "global" template variables. The lowercase -c file:... syntax
will include the variables loaded from the file under the namespace assigned
by the variable key.

JSON strings can also be loaded without saving to a file first:

jetstream init myproject -c json:names '["ryan", "bob", "fred"]' 

Projects

Projects have seen a couple of small improvements. The project.yaml and
config.yaml have been collapsed into a single file, project.yaml, with the
previous contents of that file now listed under the field __project__.
This minor change has big impacts for loading variable data: any information
about the project can now be introspected in templates with the __project__
variable.

Re-initializing projects with jetstream init command will update the
project.yaml and will also add a record to the project history where you can
track changes to that file over time.

jetstream project has been improved and will tell you what's going on with a
project.

Minor changes

  • Project jetstream/pid.lock file is now tracking pending runs on projects.
    The jetstream run command will wait to acquire this file before starting.

  • Template variables can no longer be stored in the user application settings
    file. See more details in jetstream.templates

  • Lots of unused code was removed; this really helped reduce the dependencies

  • Task identity is only computed with cmd and exec directives. This means
    changes to cpus will not automatically cause a task to be re-run. In the
    future this may be adjusted to include runtime options like container ids or
    conda envs. Related to the next note:

  • Workflow mash will always replace a task if the old version has failed. For
    example: if 99/100 tasks passed and one failed due to memory requirements, then
    when you update the mem directive, only the failed task is replaced; the
    other 99 tasks do not need to be re-run since the cmd hasn't changed.

  • input/after/before directives cannot be mappings with an re: property any
    more. Instead, use the new pattern directives after-re:, before-re:, and
    input-re:. This fixes a number of issues when creating graphs (see the sketch below).
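
A minimal sketch of the new pattern directives on a made-up task (the task name and regex are hypothetical):

- name: merge_alignments
  after-re: 'bwa_mem_sample_.*'
  cmd: |
    # runs after every task whose name matches the pattern above
    echo "merging alignments"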

1.3.0-beta release for demo

06 Feb 00:35
Pre-release

1.3.0-beta Release Notes

Major notes

New application settings system

Application settings are now handled via configuration files. There is a
detailed process for loading files and more information can be found with
the jetstream settings command. Previously some settings were taken
from environment variables, and those must now be set in your user config
file. The path where your user config file should be saved can be found
with the jetstream settings command. Here is an example config file:

# My user settings 
# Find the correct location to save this file by running "jetstream settings"
backend: slurm

pipelines:
    home: /path/to/your/pipelines/dir/

constants:
    foo: bar

Refinements to template rendering

The process for loading data used to render templates has undergone some
minor tweaks:

  1. Data should be added to a project with the jetstream init command; adding data
    after the project is initialized can be accomplished by editing the config.yaml
    in the jetstream directory or by rerunning the init command. Instead of adding
    files to the config folder of a project after creating it, just pass them
    as command args.

  2. Command args for template data must follow an empty -- argument. This
    clearly distinguishes arguments for the application from template/project data
    arguments. For example:

    # Old style
    jetstream build template.jst --variables samples.json
    

    Must now be:

    # New style
    jetstream build template.jst -- --variables samples.json
    
  3. Template data will now be loaded from the following sources in order of
    descending priority (any source will override all sources below it in this
    list):

    • Command Arguments (eg. jetstream run ... -- --str:foo bar --file:csv:samples mysamples.csv)
    • Project config file (if working in a project)
    • Pipeline manifest constants: ... section (if running a pipeline)
    • User application settings config file constants: ... section
  4. Template rendering data will NO LONGER be automatically saved into the project
    config file. This allows project config data to override pipeline data reliably.
    To debug these features, enable debug logging -l debug and look for a line like this:
    templates:162 DEBUG 2019-02-05 16:44:10:Template render context:.

Projects have slimmed down

Projects will only include the jetstream directory after initialization. All
other directories are left up to the workflow author. Projects are now
recognized by the presence of the info file: jetstream/project.yaml. Existing
projects should still work, but you will need to re-run jetstream init before
they are recognized. You can run the jetstream project command inside a project
to get info about that project (or make sure it has been initialized correctly).

Tasks commands can be used on any workflow file

jetstream project tasks has moved to jetstream tasks and will accept a -w/--workflow
argument. This makes it work with any built workflow file. If you are working
inside of a project (or the --project argument is given), the project workflow file
will be used.

New task directive: retry

Retry is a task directive that prevents a task from failing immediately. The runner will resubmit
the task for the given number of attempts before marking it failed. This state is loaded each
time the task is loaded, so it will NOT be preserved across multiple runs of the same
workflow. Here is a made-up workflow that demonstrates the directive:

- name: fails_once
  retry: 1
  cmd: |
    if [ ! -f foo.txt ]; then
      echo "File not found!"
      touch foo.txt
      exit 1
    else
      echo "File was found!"
    fi

Other changes

  • YAML error messages should be slightly more informative

  • The package includes a __main__.py file. For developers, this means it can be used with python -m jetstream

  • Tasks have a dynamic label that can be set by the runner backends. The slurm backend
    uses this feature to add the job id to logging messages regarding that task.

  • SlurmBackend will collect some account data for all tasks. This can be configured with
    settings:backends:slurm:sacct_fields

  • Tasks will include an elapsed_time state attribute for all runner backends.

  • New subcommands added: render, settings, tasks