Alternative Splicing #1079

Open
fanninpm opened this issue Dec 29, 2022 · 1 comment
Labels
t:feat Type: request of a new feature, functionality, enhancement

Comments

@fanninpm

I was thinking of making Nextclade datasets for the human adenoviruses, starting with hAdV-F (NCBI accession: NC_001454.1), but I ran into a snag almost right away. The expression of certain genes uses alternative splicing, but Nextclade has a hard time reconciling discontiguous coding segments that contribute to the same polypeptide, especially when an intron interrupts a codon.

For example, the E1A gene has an intron that splits the coding region in the middle of a codon.
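To illustrate the problem, here is a minimal toy sketch. The sequence and coordinates are made up for illustration and are not taken from NC_001454.1, and the helper names `splice` and `translate` are hypothetical, not Nextclade functions. It shows why translating each coding segment separately gives the wrong peptide when an intron falls inside a codon, and why the segments have to be joined first.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(genome, segments):
    """Join 1-based, inclusive (start, end) segments in the order given."""
    return "".join(genome[start - 1:end] for start, end in segments)


def translate(nucs):
    """Translate complete codons only; trailing 1-2 bases are dropped."""
    usable = len(nucs) - len(nucs) % 3
    return "".join(CODON_TABLE[nucs[i:i + 3]] for i in range(0, usable, 3))


# Toy genome (hypothetical): exon 1 ends two bases into a codon.
exon1 = "ATGGCTGA"       # ATG GCT GA|
intron = "GTAAGTTTTCAG"  # removed by splicing
exon2 = "TTGGTAA"        # |T TGG TAA
genome = exon1 + intron + exon2
segments = [(1, 8), (21, 27)]

# Correct: join the segments, then translate the joined sequence.
joined = translate(splice(genome, segments))

# Naive: translate each segment on its own and concatenate the peptides.
naive = "".join(translate(splice(genome, [seg])) for seg in segments)

# GFF3-style phase of the second segment: bases to skip to reach the
# first complete codon, given what the first segment left unfinished.
phase2 = (3 - len(exon1) % 3) % 3

print(joined, naive, phase2)
```

In this toy case the joined translation is `MADW*`, while the per-segment translation gives `MALV`: the codon `GAT` that spans the intron is lost, and the second segment is read in the wrong frame.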

@fanninpm added the labels good first issue, help wanted, needs triage, and t:feat on Dec 29, 2022
@ivan-aksamentov
Member

ivan-aksamentov commented Jan 25, 2023

@fanninpm In #1073 I am currently reimplementing genome annotation, along with the related parts of translation and analysis, to account not only for continuous genes but also for CDS and, potentially, mature protein regions (in GFF3 / Sequence Ontology lingo). This should hopefully cover your case with hAdV.

Could you please provide some examples of genome annotations in GFF3 format that are relevant to your current or future work? This will help make Nextclade more relevant and universal in terms of supported pathogens.
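For reference, GFF3 conventionally writes a discontiguous CDS as several rows that share one `ID`, each with its own phase in column 8. Below is a rough sketch of collecting such rows into per-CDS segment lists. The annotation text, the feature IDs and the function names are all hypothetical, the coordinates are a toy example rather than real hAdV data, and this is not how Nextclade itself parses GFF3. It also assumes forward-strand features only.

```python
from collections import defaultdict

# Hypothetical toy annotation: one CDS split into two segments.
GFF3_TEXT = """\
##gff-version 3
toy\texample\tgene\t1\t27\t.\t+\t.\tID=gene-toy;Name=toy
toy\texample\tCDS\t1\t8\t.\t+\t0\tID=cds-toy;Parent=gene-toy
toy\texample\tCDS\t21\t27\t.\t+\t1\tID=cds-toy;Parent=gene-toy
"""


def parse_attributes(field):
    """Parse the 9th GFF3 column ('key=value;key=value') into a dict."""
    return dict(pair.split("=", 1) for pair in field.split(";") if pair)


def collect_cds_segments(gff3_text):
    """Group CDS rows by ID into lists of (start, end, phase) tuples.

    Coordinates stay 1-based and inclusive, as in GFF3. Segments are
    sorted by start, which is only the right order on the '+' strand.
    """
    segments = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS rows
        attrs = parse_attributes(cols[8])
        segments[attrs["ID"]].append((int(cols[3]), int(cols[4]), int(cols[7])))
    return {cds_id: sorted(parts) for cds_id, parts in segments.items()}


cds_segments = collect_cds_segments(GFF3_TEXT)
print(cds_segments)
```

An example file in this shape, with the real coordinates of a spliced gene, would make it clear which segments belong to the same polypeptide.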

The git branch of #1073 has 2 additional binaries which can help to visualize how future Nextclade will see the genome annotation:

The "featuretree" binary just parses a GFF3 file and builds a tree of all features:

cargo run --bin=featuretree -- path/to/genemap.gff

The "genemap" binary shows a simplified tree of only the features that Nextclade can understand:

cargo run --bin=genemap -- path/to/genemap.gff

The consequences of this change are quite complex for Nextclade internals, so it might take some time.

@ivan-aksamentov removed the labels good first issue, help wanted, and needs triage on Jan 25, 2023