secondary-analysis

This repo is the gateway to the Secondary Analysis Service, which is part of the Human Cell Atlas Data Coordination Platform. It contains the testing suites, automations, and utility scripts of the Secondary Analysis Service. It also serves as the issue tracker, hosting all of the service's tickets.

Architectural Diagram

[Figure: Secondary Analysis Tech Arch Diagram]

Other Secondary Analysis Service repos:

  • Adapter Pipelines: Contains Data Coordination Platform adapter pipelines

  • Cromwell Tools: A collection of Python clients and accessory scripts for interacting with the Cromwell workflow execution engine - a scientific workflow engine designed for simplicity and scalability

  • Falcon: Queueing system that (after launching) throttles and initiates workflows

  • Lira: Listens to storage service notifications and launches workflows

  • Pipeline Tools: Contains Data Coordination Platform adapter pipelines and associated tools

  • scTools: Tools for single cell data processing

  • Secondary Analysis Deploy: Contains the deployment configuration and scripts for the Pipeline Execution Service

  • Skylab Analysis: Analysis and benchmarking reports for standardized HCA pipelines

  • Skylab: Standardized HCA data processing pipelines

Development

File Structure Layout

.
├── operations
│   ├── big_red_button              # The "BIG RED BUTTON" scripts of Secondary Analysis
│   ├── dashboard                   # The scripts to create log-based metrics on Google Cloud
│   ├── data_cleanup                # The scripts to clean up data from the Cromwell execution buckets
│   ├── failure_analysis            # The scripts to perform failure analysis on failed workflows
│   ├── gcp_quota                   # The scripts to set up quota monitors and fetch results
│   ├── run_analysis                # The scripts to trigger analysis manually in HCA DCP through Lira
│   └── tls_cert                    # The scripts to renew TLS certs
└── tests
    ├── integration                 # The integration test suite
    ├── meteoroid                   # The "Next-Gen Data Driven" test suite which is under construction
    └── scaling                     # The scaling test suite

Code Style

The Secondary Analysis code base complies with PEP 8 and uses Black to format code. This avoids "nitpicky" comments during code review, so we can spend more time discussing the logic rather than code style.

To enable auto-formatting during development, spend a few seconds setting up pre-commit the first time you clone the repo. It's highly recommended that you install the packages within a virtualenv.

  1. Install pre-commit by running: pip install pre-commit (or simply run pip install -r requirements.txt).
  2. Run pre-commit install to install the git hook.

Once the pre-commit hook is installed in this repo, the Black linter/formatter will run automatically when you commit. Please make sure you have followed the steps above; otherwise your commits might fail the linting test!
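To confirm that the hook setup took effect, a small check like the following may help. It is only a sketch: `hook_installed` is a hypothetical helper, not part of this repo, and it assumes a standard clone where `.git` is a directory (not a worktree or submodule) and the default hooks path is in use.

```python
from pathlib import Path


def hook_installed(repo_root: str = ".") -> bool:
    """Return True if a pre-commit git hook file exists under repo_root."""
    # `pre-commit install` writes its hook script to .git/hooks/pre-commit.
    # A hook left there by another tool would also pass this check, so treat
    # the result as a hint rather than proof.
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    return hook.is_file()
```

Calling `hook_installed()` from the repo root returns `False` when the hook file is missing, in which case `pre-commit install` should be run again.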

If you want to trigger the linters and formatters manually, make sure Black and flake8 are installed in your Python environment, then run flake8 DIR1 DIR2 to lint and black DIR1 DIR2 --skip-string-normalization to format.
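The manual commands above can also be wrapped in a short script. This is a minimal sketch, not something shipped with the repo: `lint_commands` and `run_linters` are hypothetical names, and the sketch assumes `flake8` and `black` are on the `PATH` of the active environment.

```python
import subprocess
from typing import List


def lint_commands(dirs: List[str]) -> List[List[str]]:
    """Build the flake8 and Black command lines for the given directories."""
    return [
        ["flake8", *dirs],
        ["black", *dirs, "--skip-string-normalization"],
    ]


def run_linters(dirs: List[str]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in lint_commands(dirs):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop at the first failure so the linter output stays readable.
            return result.returncode
    return 0
```

For example, `run_linters(["operations", "tests"])` would run both tools over the two top-level directories shown in the file structure layout. Note that Black rewrites files in place, just as it does when run directly.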