Genes for Project Cognoma

This repository creates the set of genes to be used in Project Cognoma. The human subset of Entrez Gene is the basis of Cognoma genes. All genes in Cognoma should be converted to Entrez GeneIDs (using a preferred variable name of entrez_gene_id).

When encountering genes in Project Cognoma, identify which of the following approaches should be applied:

If the input genes are only in symbols, open an issue to discuss mapping options.
If the input genes contain chromosome and symbol information, use chromosome-symbol-map.tsv to map the genes to Entrez GeneIDs.
If the genes are already encoded as Entrez GeneIDs, update the GeneIDs to their most recent versions using updater.tsv and remove GeneIDs that are not in genes.tsv.
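The second and third approaches can be sketched roughly as follows. This is a minimal illustration, not the repository's own code: the column names (`chromosome`, `symbol`, `entrez_gene_id`, `old_entrez_gene_id`, `new_entrez_gene_id`) are assumptions about the layout of chromosome-symbol-map.tsv, updater.tsv, and genes.tsv, so check the actual file headers before relying on them. The gene identifiers in the demo are made up.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file handle into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def map_by_chromosome_symbol(records, map_rows):
    """Map (chromosome, symbol) pairs to Entrez GeneIDs.

    Column names are assumed, not confirmed. Pairs without a match yield None.
    """
    lookup = {
        (row["chromosome"], row["symbol"]): row["entrez_gene_id"]
        for row in map_rows
    }
    return [lookup.get((chromosome, symbol)) for chromosome, symbol in records]


def update_gene_ids(gene_ids, updater_rows, gene_rows):
    """Replace outdated GeneIDs, then drop IDs that are not in the gene table.

    Column names are assumed, not confirmed. IDs are compared as strings.
    """
    old_to_new = {
        row["old_entrez_gene_id"]: row["new_entrez_gene_id"]
        for row in updater_rows
    }
    current_ids = {row["entrez_gene_id"] for row in gene_rows}
    updated = (old_to_new.get(gene_id, gene_id) for gene_id in gene_ids)
    return [gene_id for gene_id in updated if gene_id in current_ids]


# Toy in-memory tables standing in for updater.tsv and genes.tsv.
updater_rows = read_tsv(io.StringIO(
    "old_entrez_gene_id\tnew_entrez_gene_id\n111\t222\n"
))
gene_rows = read_tsv(io.StringIO(
    "entrez_gene_id\tsymbol\n222\tGENE_A\n333\tGENE_B\n"
))

# "111" is updated to "222", "333" is kept, and "999" is removed.
print(update_gene_ids(["111", "333", "999"], updater_rows, gene_rows))
```

With real files, the `io.StringIO` objects would be replaced by open file handles. Note that this sketch keeps duplicates if two input GeneIDs resolve to the same current GeneID; whether to deduplicate depends on the dataset.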

Downloads and data

The raw (downloaded) data is stored in the download directory. versions.json contains timestamps for the raw data. The raw data is tracked in this repository because the Entrez Gene FTP site does not version or archive its files.

Created data is stored in the data directory. Applications should use the processed data rather than the raw data, if possible. Applications are strongly encouraged to use versioned (commit-hash-containing) links when accessing data from this repository.
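As an illustration of a versioned link, the sketch below builds a raw-content URL that pins a file to a specific commit. The helper name `versioned_data_url`, the commit hash placeholder, and the path `data/genes.tsv` are assumptions for the example; substitute a real commit hash from this repository's history and the path of the file you need.

```python
def versioned_data_url(commit, path, repo="cognoma/genes"):
    """Build a raw GitHub URL pinned to a commit (hypothetical helper).

    Pinning to a commit hash means the file contents cannot change
    underneath the application, unlike a link to a branch name.
    """
    return f"https://raw.githubusercontent.com/{repo}/{commit}/{path.lstrip('/')}"


# "0123abc" is a placeholder, not a real commit in the repository.
url = versioned_data_url("0123abc", "data/genes.tsv")
print(url)
```

The resulting URL can be passed to any tool that reads tab-separated files over HTTP.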

Execution

Use the following commands to run the analysis, inside the environment specified by environment.yml:

# To run the entire analysis
python 1.download.py
python 2.process.py

# To run just the data processing
python 2.process.py

In general, we don't anticipate redownloading the data frequently. If you submit a pull request to create additional datasets, please do not execute 1.download.py.
