I was thinking of making Nextclade datasets for the human adenoviruses, starting with hAdV-F (NCBI accession: NC_001454.1), but I ran into a snag almost right away. The expression of certain genes uses alternative splicing, but Nextclade has a hard time reconciling discontiguous coding segments that contribute to the same polypeptide, especially when an intron interrupts a codon.
@fanninpm In #1073 I am currently reimplementing genome annotation, along with the related parts of translation and analysis, to account not only for continuous genes but also for CDS and, potentially, mature protein regions (in GFF3 / Sequence Ontology lingo). This should hopefully cover your hAdV case.
Could you please provide some example genome annotations in GFF3 format that are relevant to your current or future work? This will help make Nextclade more relevant and universal in terms of supported pathogens.
The git branch of #1073 has 2 additional binaries which can help to visualize how future Nextclade will see the genome annotation:
The "featuretree" binary just parses a GFF3 file and builds a tree of all features:
cargo run --bin=featuretree -- path/to/genemap.gff
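For readers unfamiliar with how a feature tree is derived from GFF3, here is a minimal Python sketch of the general idea: link features through their `ID` and `Parent` attributes and print the nesting. It is not the `featuretree` implementation from #1073 (that is a Rust binary, and its output format is not shown here). The function names and the toy annotation are hypothetical, with made-up coordinates rather than real adenovirus data.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse the GFF3 attributes column ("key=value;key=value") into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def build_feature_tree(gff3_text):
    """Return (roots, children), linking features via ID/Parent attributes."""
    features = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        parent = attrs.get("Parent")
        features.append({
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "id": attrs.get("ID"),
            "parents": parent.split(",") if parent else [],
        })

    known_ids = {f["id"] for f in features if f["id"]}
    roots, children = [], defaultdict(list)
    for f in features:
        linked = [p for p in f["parents"] if p in known_ids]
        if linked:
            for p in linked:
                children[p].append(f)
        else:
            roots.append(f)  # no resolvable parent: top-level feature
    return roots, children


def render(nodes, children, depth=0):
    """Render the tree as indented text lines, one line per feature."""
    lines = []
    for f in nodes:
        indent = "  " * depth
        lines.append(f"{indent}{f['type']} {f['start']}..{f['end']} ({f['strand']})")
        lines.extend(render(children.get(f["id"], []), children, depth + 1))
    return lines


# Hypothetical toy annotation: one gene whose CDS is split into two segments.
# Both CDS lines share one ID, which is how GFF3 marks a discontiguous feature.
TOY_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["toy", ".", "gene", "1", "23", ".", "+", ".", "ID=gene-toy"]),
    "\t".join(["toy", ".", "mRNA", "1", "23", ".", "+", ".", "ID=rna-toy;Parent=gene-toy"]),
    "\t".join(["toy", ".", "CDS", "1", "7", ".", "+", "0", "ID=cds-toy;Parent=rna-toy"]),
    "\t".join(["toy", ".", "CDS", "19", "23", ".", "+", "2", "ID=cds-toy;Parent=rna-toy"]),
])

roots, children = build_feature_tree(TOY_GFF3)
tree_lines = render(roots, children)
print("\n".join(tree_lines))
```

The second CDS line carries phase 2 because the first segment ends one base into a codon, so two bases of the second segment are needed to complete it.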
The "genemap" shows a simplified tree of only features that Nextclade can understand:
cargo run --bin=genemap -- path/to/genemap.gff
The consequences of this change are quite complex for Nextclade internals, so it might take some time.
For example, in hAdV-F the E1A gene has an intron that splits the coding region in the middle of a codon.
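To illustrate why a codon split by an intron matters, here is a small Python sketch on a made-up sequence (not taken from NC_001454.1, and not Nextclade code). It assumes a forward-strand CDS given as 1-based inclusive segments, as in GFF3, and compares translating the joined segments with translating each segment on its own. The helper names `splice`, `translate` and `translate_per_segment` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice(genome, segments):
    """Concatenate 1-based inclusive (start, end) segments in genome order."""
    return "".join(genome[start - 1:end] for start, end in sorted(segments))


def translate(cds):
    """Translate a nucleotide string whose length is a multiple of 3."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def translate_per_segment(genome, segments):
    """Naive approach: translate each segment alone, dropping partial codons."""
    peptides = []
    for start, end in sorted(segments):
        segment = genome[start - 1:end]
        segment = segment[:len(segment) - len(segment) % 3]
        peptides.append(translate(segment))
    return "".join(peptides)


# Toy genome: exon 1 (7 nt), an arbitrary 11 nt intron, exon 2 (5 nt).
# Exon 1 ends one base into the third codon, so that codon spans the intron.
EXON_1 = "ATGGCTG"
INTRON = "GTAAGTTTCAG"
EXON_2 = "AATGA"
genome = EXON_1 + INTRON + EXON_2
segments = [(1, 7), (19, 23)]

protein = translate(splice(genome, segments))    # join first, then translate
naive = translate_per_segment(genome, segments)  # translate each piece alone

print("joined:     ", protein)
print("per segment:", naive)
```

Joining first gives `MAE*`. The per-segment version gives `MAN`: it drops the split codon and reads the second segment in the wrong frame. A reverse-strand CDS would also need reverse complementing, which this sketch leaves out.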