This repository has been archived by the owner on Nov 8, 2021. It is now read-only.

Gene name aliases #37

Open
Gregory94 opened this issue Mar 11, 2021 · 1 comment

Comments

@Gregory94
Collaborator

Gregory94 commented Mar 11, 2021

There has been some confusion about gene names when comparing datasets generated with our workflow against files created by others. This is most likely caused by different naming conventions or by gene aliases (i.e. the same gene can have multiple names).

One source of differences in gene names is between the _pergene.txt files from the Kornmann lab workflow and those from our workflow. I checked the differences between the two files using this python script. It takes two _pergene.txt files as input and, for each file, creates a list of all gene names present in that file. It then finds all genes that are in the first list but not in the second, and vice versa.
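The comparison described above boils down to two set differences. Below is a minimal sketch, not the original script: it assumes (unverified) that a _pergene.txt file is tab-separated with one header line and the gene name in the first column, and the function names `gene_names` and `compare_gene_names` are hypothetical.

```python
import csv
import sys


def gene_names(lines):
    """Collect the gene names from the lines of a _pergene.txt file.

    Assumed layout (not checked against the real files): tab-separated,
    one header line, gene name in the first column.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header line
    return {row[0].strip() for row in reader if row and row[0].strip()}


def compare_gene_names(lines_a, lines_b):
    """Return (names only in A, names only in B), each as a sorted list."""
    names_a = gene_names(lines_a)
    names_b = gene_names(lines_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_pergene.py ours_pergene.txt theirs_pergene.txt
    with open(sys.argv[1], newline="") as fa, open(sys.argv[2], newline="") as fb:
        only_a, only_b = compare_gene_names(fa, fb)
    print(f"Only in {sys.argv[1]}: {len(only_a)}")
    print(f"Only in {sys.argv[2]}: {len(only_b)}")
```

Working on sets makes each difference a single operation; sorting the results just keeps the output stable for manual checking.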

I found 80 genes whose names differ between the Kornmann files and ours. After checking all of them, I noticed that the differences come either from a different naming convention (e.g. we use the standard name MRX3 whilst they use the systematic name YBL095W, two names for the same gene) or from an alias (e.g. we use BOL3 whilst they use AIM1, again both referring to the same gene).
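To see at a glance which mismatches are a naming-convention difference rather than an alias, one can check which names look like systematic ORF names such as YBL095W (Y, chromosome letter A-P, arm L/R, three digits, strand W/C, optional suffix like -A). This is a rough heuristic of our own, not part of either workflow; it only covers nuclear ORF names, so mitochondrial (Q0...) and other feature names end up in the "other" list.

```python
import re

# Heuristic pattern for S. cerevisiae nuclear ORF systematic names, e.g. YBL095W.
# Mitochondrial genes (Q0...) and other feature types are NOT covered.
SYSTEMATIC_NAME = re.compile(r"Y[A-P][LR]\d{3}[WC](?:-[A-Z])?")


def split_by_convention(names):
    """Split gene names into (systematic-looking, other), preserving order."""
    systematic, other = [], []
    for name in names:
        (systematic if SYSTEMATIC_NAME.fullmatch(name) else other).append(name)
    return systematic, other


# Example: split the names found only in the other lab's file.
systematic, other = split_by_convention(["MRX3", "YBL095W", "AIM1"])
print(systematic, other)
```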

When comparing data files from different sources that include gene names, be aware that the same gene may appear under different names.

This can be resolved using the Yeast_Protein_Names.txt file, which stores all the different names for each gene.
Alternatively, you can use the genomicfeatures_dataframe.py script, which creates a python dataframe listing, for each gene, its aliases and its names under the different naming conventions (this script also uses the Yeast_Protein_Names.txt file).
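The idea behind both options is to map every name to one canonical name per gene before comparing. Below is a minimal sketch of that step. It deliberately does not parse Yeast_Protein_Names.txt (whose exact layout is not described here); instead it takes hypothetical `alias_groups`, one list of names per gene, with the first name treated as canonical. The function names are also hypothetical.

```python
def build_alias_map(alias_groups):
    """Map every known name (upper-cased) to one canonical name per gene.

    alias_groups: iterable of name lists, one list per gene; the first name
    in each list is treated as canonical (hypothetical input format). If a
    name appears in several groups, the last group wins.
    """
    alias_map = {}
    for group in alias_groups:
        canonical = group[0]
        for name in group:
            alias_map[name.upper()] = canonical
    return alias_map


def normalise(names, alias_map):
    """Replace each name by its canonical name; unknown names are kept as-is."""
    return {alias_map.get(name.upper(), name) for name in names}


def unresolved_differences(names_a, names_b, alias_map):
    """Names that still differ after both sets are mapped to canonical names."""
    a = normalise(names_a, alias_map)
    b = normalise(names_b, alias_map)
    return sorted(a - b), sorted(b - a)


groups = [["MRX3", "YBL095W"], ["BOL3", "AIM1"]]
alias_map = build_alias_map(groups)
print(unresolved_differences({"MRX3", "BOL3"}, {"YBL095W", "AIM1"}, alias_map))
```

With the names normalised first, pairs such as MRX3/YBL095W and BOL3/AIM1 no longer show up as mismatches; whatever is left is worth checking by hand, including any alias that is shared by more than one gene.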

@Gregory94
Collaborator Author

Gregory94 commented Mar 11, 2021

Important note when using Yeast_Protein_Names.txt: the gene names and aliases in this file have received a major update.
More gene names are now present and some genes have updated aliases.
The update is on the master branch.
