TNBC

Code related to Shu et al., 2016.

tnbc_pipeline_scripts.py contains all Python scripts for data analysis in Shu et al., 2016.

Many of the functions and scripts here refer to functions and code from the Bradner Lab pipeline https://github.com/BradnerLab/pipeline

PREREQUISITES:

Clone this repo and cd into the TNBC folder
- all paths are set relative to this folder
- recommended that you cd into the TNBC folder to run all analysis

Please install bamliquidator https://github.com/BradnerLab/pipeline/wiki/bamliquidator
- set bamliquidator path globally as bamliquidator
- set bamliquidator_batch path globally as bamliquidator_batch
- install macs1.4.2 and add to global path as macs14 (http://liulab.dfci.harvard.edu/MACS/Download.html)
- install bowtie2, samtools, cufflinks, cuffquant, R

Clone the Bradner Lab pipeline https://github.com/BradnerLab/pipeline
- edit the paths section of pipeline_dfci.py so that it correctly points to the pipeline directory, samtools, bamliquidator, and bamliquidator_batch

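
Before running any analysis, it can help to confirm that the command-line tools listed above are visible on the global path. The following is a minimal sketch and is not part of this repo. It assumes Python 3 (for `shutil.which`; the pipeline itself may target a different Python version) and that R is invoked as `R`. The helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Executable names as given in the prerequisites above.
REQUIRED_TOOLS = [
    "bamliquidator",
    "bamliquidator_batch",
    "macs14",
    "bowtie2",
    "samtools",
    "cufflinks",
    "cuffquant",
    "R",
]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only checks that each name resolves to an executable. It does not check versions, and it does not check the paths set inside pipeline_dfci.py.
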
Data tables
- A number of data tables are included to help organize files for analysis
- Each row represents a specific dataset. The UNIQUE_ID and NAME correspond to the sample name in GEO for Series GSE63584
- The BACKGROUND column represents the control for a given dataset
- ENRICHED_REGION and ENRICHED_MACS refer to locations of peak files generated by various peak calling algorithms
- For more information consult the loadDataTable function in the pipeline_dfci.py module

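
To inspect a data table outside the pipeline, something like the sketch below may be enough. It is not the pipeline's loadDataTable function, which remains the authoritative reader. The sketch assumes the table is tab-delimited with the column names in the first row; the function names `read_data_table` and `background_pairs` are hypothetical.

```python
import csv


def read_data_table(path):
    """Read a data table into a list of dicts, one per dataset row.

    Assumption: the file is tab-delimited and its first row holds the
    column names (UNIQUE_ID, NAME, BACKGROUND, ...).
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def background_pairs(rows):
    """Map each UNIQUE_ID to whatever its BACKGROUND column contains."""
    return {row["UNIQUE_ID"]: row["BACKGROUND"] for row in rows}
```

For example, `background_pairs(read_data_table("CHIP_DATA_TABLE.txt"))` would list the control recorded for each dataset, provided the file matches the assumed layout.
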
Obtaining raw data
- raw fastqs may be obtained here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63584
- recommended that fastqs be deposited into the ./fastq folder

Generate sorted and indexed bam files for each sample.
- align raw fastqs to the hg19 genome. The bowtie2 index can be found here: ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19.tar.gz
- for alignment we used bowtie2 v2.2.1 with default parameters except for -k 1
- place bams in the ./bam folder or edit the "FILE_PATH" column in CHIP_DATA_TABLE.txt to point to the folder containing bams
- bams should be named to include the UNIQUE_ID as in CHIP_DATA_TABLE.txt. The UNIQUE_ID is derived from the GEO sample name.
- bams should be sorted and indexed, and each should have a corresponding .bai file with the same name
- NOTE: code will not work if multiple bams in the directory share the same UNIQUE_ID

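
The naming and indexing requirements above can be checked before running the analysis. The sketch below is not part of this repo. It assumes that a bam belongs to a dataset when its file name contains the UNIQUE_ID as a substring, and it accepts either sample.bam.bai or sample.bai as the index; both are assumptions about how the pipeline locates files. The function name `check_bams` is hypothetical.

```python
import os


def check_bams(bam_dir, unique_ids):
    """Return {unique_id: problem} for IDs without exactly one indexed bam."""
    bam_names = sorted(f for f in os.listdir(bam_dir) if f.endswith(".bam"))
    problems = {}
    for uid in unique_ids:
        # Assumption: a bam matches when its file name contains the ID.
        matches = [f for f in bam_names if uid in f]
        if not matches:
            problems[uid] = "no bam found"
        elif len(matches) > 1:
            problems[uid] = "multiple bams: " + ", ".join(matches)
        else:
            bam_path = os.path.join(bam_dir, matches[0])
            # Accept sample.bam.bai or sample.bai as the index file.
            candidates = [bam_path + ".bai", bam_path[:-4] + ".bai"]
            if not any(os.path.exists(p) for p in candidates):
                problems[uid] = "missing .bai index for " + matches[0]
    return problems
```

An empty result means every UNIQUE_ID has exactly one matching bam with an index next to it. The check does not verify that a bam is actually sorted.
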
Running Python analysis
- cd into the repo directory
- Run each block of code by uncommenting it and then running python ./tnbc_pipeline_scripts.py
- All paths are set relative to the TNBC folder

Running R scripts
- all R scripts are found in ./figure_code
- additional R scripts are used for expression analysis and to generate plots of ChIP-Seq data
- output figures can be found in ./figures
