
smrtflow

This README.md is intended to be terse notes for developers. See smrtflow.readthedocs.io for full docs.

"SMRT" refers to PacBio's sequencing technology, and "smrtflow" is the name for the common models, the Resolved Tool Contract/Tool Contract interface, command-line analysis tools, the service orchestration engine, and web services, all written in Scala. PacBio pipelines can be run using the PacBio workflow engine, pbsmrtpipe (which must be installed).

The code is organized as an SBT multi-project build.

Requires java >= 1.8.0_71 and sbt == 0.13.11
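A quick way to check the Java requirement before building; a minimal sketch. The helper name `java_at_least_8u71` is hypothetical, and it only understands the legacy `1.8.0_NN` version format; anything else is reported as not matching.

```shell
# Hypothetical helper: succeed if a legacy-format Java version string
# (e.g. "1.8.0_101") is 1.8.0 with update number 71 or later.
java_at_least_8u71() {
  case "$1" in
    1.8.0_[0-9]*)
      update="${1#1.8.0_}"          # "101" from "1.8.0_101"
      update="${update%%[!0-9]*}"   # drop suffixes such as "-b15"
      [ "$update" -ge 71 ]
      ;;
    *) return 1 ;;                  # other formats are not judged here
  esac
}

# Usage (needs java on the PATH):
#   v="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   java_at_least_8u71 "$v" || echo "java $v does not match >= 1.8.0_71" >&2
```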

Building

# clone the repo
git clone https://github.com/PacificBiosciences/smrtflow.git

# use SBT to build and run tests
sbt clean pack test

# see also `sbt` + interactive `help`, `project`, `test`, `coverage`, `run`, ...
# sbt
# > help

Build Commandline Tools

make tools
# or
sbt pack

Add tools to path

source setup-tools-env.sh
pbservice --help
fasta-to-reference --help

See the full docs for details and examples of using SL tools, such as pbservice or fasta-to-reference.

Style Guide

scalafmt with default settings.

To reformat all code (do this before opening a PR):

$> sbt scalafmt

To test whether the code is correctly formatted, run the command below. It returns a non-zero exit code if the code is not formatted consistently with the scalafmt spec. The CI tests also verify that the code is properly formatted.

$> sbt scalafmt::test
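One way to avoid pushing unformatted code is to run the check from a Git hook. The sketch below is an assumption, not part of the repo: `check_format` and the `FORMAT_CHECK_CMD` override are hypothetical names, and the default command is the one shown above.

```shell
# Hypothetical helper for a Git hook. FORMAT_CHECK_CMD is an illustrative
# override; by default the check command from this README is used.
check_format() {
  if ${FORMAT_CHECK_CMD:-sbt scalafmt::test}; then
    echo "scalafmt: OK"
  else
    echo "scalafmt: formatting differs; run 'sbt scalafmt' and commit the result" >&2
    return 1
  fi
}
```

Calling `check_format || exit 1` from an executable `.git/hooks/pre-push` script would then block the push when the check fails.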

Runtime dependencies

Running postgres

On the cluster:

module load jdk/1.8.0_71 postgresql
export PGDATA=/localdisk/scratch/$USER/pg
mkdir -p $PGDATA
# on a shared machine, choose a PGPORT that's not already in use
export PGPORT=5442
initdb
perl -pi.orig -e "s/#port\s*=\s*(\d+)/port = $PGPORT/" $PGDATA/postgresql.conf
pg_ctl -l $PGDATA/postgresql.log start
createdb smrtlinkdb
psql -d smrtlinkdb < extras/db-init.sql # for running the services, or
psql -d smrtlinkdb < extras/test-db-init.sql # for the test DB used in the *Spec.scala tests. The DB tables are dropped and the migrations are run before each Spec.
export SMRTFLOW_DB_PORT=$PGPORT

Other Custom DB values:

ENV                    Property (-D<key>=<value>)
SMRTFLOW_DB_USER       smrtflow.db.properties.user
SMRTFLOW_DB_PASSWORD   smrtflow.db.properties.password
SMRTFLOW_DB_PORT       smrtflow.db.properties.portNumber
SMRTFLOW_DB_HOST       smrtflow.db.properties.serverName
SMRTFLOW_DB_NAME       smrtflow.db.properties.databaseName

To run tests, also set:

export SMRTFLOW_TEST_DB_PORT=$PGPORT

Test DB configuration for running unit tests:

ENV                         Property (-D<key>=<value>)
SMRTFLOW_TEST_DB_USER       smrtflow.test-db.properties.user
SMRTFLOW_TEST_DB_PASSWORD   smrtflow.test-db.properties.password
SMRTFLOW_TEST_DB_PORT       smrtflow.test-db.properties.portNumber
SMRTFLOW_TEST_DB_HOST       smrtflow.test-db.properties.serverName
SMRTFLOW_TEST_DB_NAME       smrtflow.test-db.properties.databaseName
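The two tables above map each environment variable to a -D property. As an illustration of that mapping, here is a minimal sketch that prints the -D form of whichever variables are set. The helper name `db_props` is hypothetical and not part of smrtflow.

```shell
# Hypothetical helper: print one -D<key>=<value> flag per DB variable that
# is set. $1 is the variable prefix (SMRTFLOW_DB or SMRTFLOW_TEST_DB) and
# $2 is the property prefix (smrtflow.db or smrtflow.test-db).
db_props() {
  env_prefix="$1"
  prop_prefix="$2"
  for pair in USER:user PASSWORD:password PORT:portNumber HOST:serverName NAME:databaseName; do
    var="${env_prefix}_${pair%%:*}"                 # e.g. SMRTFLOW_DB_PORT
    key="${prop_prefix}.properties.${pair#*:}"      # e.g. smrtflow.db.properties.portNumber
    eval "value=\${$var:-}"                         # read the variable by name
    if [ -n "$value" ]; then
      printf '%s\n' "-D${key}=${value}"
    fi
  done
}
```

For example, with SMRTFLOW_DB_PORT=5442 set, `db_props SMRTFLOW_DB smrtflow.db` prints `-Dsmrtflow.db.properties.portNumber=5442`. Whether your sbt launcher forwards such flags to the JVM that runs the services is an assumption to verify locally.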

Services

Launching SMRT Link/Analysis Services

sbt "smrt-server-analysis/run"

Set custom port

export PB_SERVICES_PORT=9997
sbt "smrt-server-analysis/run"

See the full docs for details

See reference.conf for more configuration parameters.
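After launching, it can be handy to wait until the services respond before running anything against them. The sketch below polls with curl. The `/status` path and the helper names `status_url` and `wait_for_services` are assumptions; check the full docs for the actual endpoint.

```shell
# Build the status URL. The "/status" path is an assumption.
# $1 = host (default localhost), $2 = port (default $PB_SERVICES_PORT).
status_url() {
  printf 'http://%s:%s/status\n' "${1:-localhost}" "${2:-${PB_SERVICES_PORT:-}}"
}

# Poll the status URL once per second until it answers or the tries run out.
# $3 = number of tries (default 30).
wait_for_services() {
  url="$(status_url "$1" "$2")"
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

Usage: `wait_for_services localhost 9997 60 || echo "services did not come up" >&2`.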

Swagger Docs

The SMRT Link Analysis Services are documented using the Swagger Specification.

Validation of the swagger.json file

npm install swagger-tools # local install, so the binary lands under ./node_modules
node_modules/swagger-tools/bin/swagger-tools validate /path/to/smrtlink_swagger.json

Or add swagger-tools cli tool to $PATH and:

make validate-swagger

UI Editor

UI Editor to import and edit the swagger file from a file or URL.

PacBio Common Models

Many core data models are described using XSDs.

See Resources Dir for details.

See the Readme for generating the java classes from the XSDs.

Also see the common model (e.g., Report, ToolContract, DataStore, Pipeline, PipelineView Rules) schemas here

REPL

Interactively load the smrtflow library code and execute expressions.

sbt smrtflow/test:console
@ import java.nio.file.Paths
import java.nio.file.Paths
@ val f = "/Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml"
f: String = "/Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml"
@ val px = Paths.get(f)
px: java.nio.file.Path = /Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml
@ import com.pacbio.secondary.smrtlink.analysis.datasets.io._
import com.pacbio.secondary.smrtlink.analysis.datasets.io._

@ val sset = DataSetLoader.loadSubreadSet(px)
sset: com.pacificbiosciences.pacbiodatasets.SubreadSet = com.pacificbiosciences.pacbiodatasets.SubreadSet@62c0ff68
@ sset.getName
res5: String = "subreads-sequel"

@ println("Services Example")
Services Example

@ import akka.actor.ActorSystem
import akka.actor.ActorSystem
@ implicit val actorSystem = ActorSystem("demo")
actorSystem: ActorSystem = akka://demo
@ import com.pacbio.secondary.smrtserver.client.{AnalysisServiceAccessLayer => Sal}
import com.pacbio.secondary.smrtserver.client.{AnalysisServiceAccessLayer => Sal}
@ val sal = new Sal("smrtlink-bihourly", 8081)
sal: com.pacbio.secondary.smrtlink.client.AnalysisServiceAccessLayer = com.pacbio.secondary.smrtlink.client.AnalysisServiceAccessLayer@8639ea4
@ val fx = sal.getStatus
fx: concurrent.Future[com.pacbio.common.models.ServiceStatus] = Success(ServiceStatus(smrtlink_analysis,Services have been up for 46 minutes and 59.472 seconds.,2819472,6d87566f-3433-4d73-8953-92673cc50f80,0.1.10-c63303e,secondarytest))
@ actorSystem.shutdown

@ exit
Bye!
welcomeBanner: Some[String] = Some(Welcome to the smrtflow REPL)
import ammonite.repl._
import ammonite.ops._
res0: Any = ()
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
Type in expressions for evaluation. Or try :help.

scala> :quit

Integration testing

At a minimum, integration test analysis jobs require pbsmrtpipe to be installed (in a virtualenv) in order to run a pbsmrtpipe analysis job. Specific pipelines will have dependencies on executables, such as samtools or blasr.

  • set up a PostgreSQL 9.6.1 instance (see configuration above)
  • install pbsmrtpipe in a virtualenv
  • enable the Scala tools: make tools
  • add the tools to the path: source setup-tools-env.sh (test with which pbservice or pbservice --help)
  • fetch PacBioTestData: make PacBioTestData
  • launch services: make start-smrt-server-analysis or make start-smrt-server-analysis-jar
  • import PacBioTestData: make import-pbdata
  • import canned ReferenceSet and SubreadSet: make test-int-import-data
  • run the dev_diagnostic_stress test: make test-int-run-analysis-stress
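The make-based steps above can be chained in one script; a minimal sketch under stated assumptions: PostgreSQL and pbsmrtpipe are already set up, the services are already running in another terminal, and the shell is bash (for `source`). `run_step`, `integration_steps`, and the `DRY_RUN` switch are hypothetical names for this sketch.

```shell
# Print each step, and run it unless DRY_RUN=1 (hypothetical switch).
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run the steps in the order listed above, stopping at the first failure.
# Launching the services is left out because it is assumed to be running
# in a separate terminal.
integration_steps() {
  run_step make tools &&
  run_step source setup-tools-env.sh &&
  run_step make PacBioTestData &&
  run_step make import-pbdata &&
  run_step make test-int-import-data &&
  run_step make test-int-run-analysis-stress
}
```

Running `DRY_RUN=1; integration_steps` prints the six commands without executing them, which is a cheap way to review the order first.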

DISCLAIMER

THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.

About

SMRT Link and Analysis Server, PacBio Scala analysis tools and Common models
