Frame shift ORF7a:empty range is confusing #1042

Open
corneliusroemer opened this issue Oct 28, 2022 · 3 comments
Labels
package: nextclade, t:bug (Type: bug, error, something isn't working)

Comments

@corneliusroemer
Member

I remember we talked about this on Slack: we sometimes output ORF7a:empty range in the TSV, which is not ideal since it breaks assumptions about what that column contains.

Maybe we can at least document what it means, and ideally fix it. I think this occurs when there's a frame shift in the stop codon?
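Until this is documented or fixed, a downstream script can at least avoid crashing on the value. The following is a minimal sketch, not Nextclade's own code: it assumes the frame shift column is a comma-separated list of entries shaped like GENE:BEGIN-END, which is an assumption about the TSV layout, and the names `FrameShiftEntry` and `parse_frame_shifts` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class FrameShiftEntry(NamedTuple):
    gene: str
    codon_begin: Optional[int]  # None when the range could not be parsed
    codon_end: Optional[int]


# Assumed shape of a well-formed range: "101-122" or a single number "101".
_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_frame_shifts(cell: str) -> list[FrameShiftEntry]:
    """Parse one TSV cell, tolerating entries such as 'ORF7a:empty range'."""
    entries = []
    for item in (part.strip() for part in cell.split(",")):
        if not item:
            continue
        gene, _, range_text = item.partition(":")
        match = _RANGE.match(range_text.strip())
        if match is None:
            # Anything that is not a numeric range (e.g. "empty range")
            # is kept, but with no codon positions attached.
            entries.append(FrameShiftEntry(gene, None, None))
            continue
        begin = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else begin
        entries.append(FrameShiftEntry(gene, begin, end))
    return entries


if __name__ == "__main__":
    for entry in parse_frame_shifts("ORF1a:10-20,ORF7a:empty range"):
        print(entry)
```

Keeping the unparseable entry (with `None` positions) rather than dropping it means the affected gene is still visible to whatever consumes the parsed rows.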

Here are a few sample Genbank URLs:

OV623283.1
OV731935.1
OW351812.1
OU254502.1

empty_range.fasta.txt

@corneliusroemer added the t:bug, good first issue, help wanted, needs triage, and package: nextclade labels and removed the good first issue and help wanted labels on Oct 28, 2022
@ivan-aksamentov
Member

@ivan-aksamentov removed the needs triage label on Oct 31, 2022
@ivan-aksamentov
Member

jq '.results[0].frameShifts' nextclade.json | prettyjson
- 
  geneName:     ORF7a
  nucRel: 
    begin: 365
    end:   366
  nucAbs: 
    begin: 27758
    end:   27759
  codon: 
    begin: 122
    end:   122
  gapsLeading: 
    codon: 
      begin: 101
      end:   122
  gapsTrailing: 
    codon: 
      begin: 122
      end:   122
  codonMask: 
    begin: 101
    end:   122
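One way to read the output above: if ranges are half-open (begin inclusive, end exclusive), nucRel 365..366 covers a single nucleotide, while codon 122..122 covers nothing, which would explain the "empty range" text. The sketch below reproduces those numbers. The half-open convention and the "only whole codons count" conversion rule are assumptions chosen because they match the values shown, not a description of the actual implementation.

```python
def range_len(begin: int, end: int) -> int:
    """Length of a half-open range [begin, end)."""
    return max(0, end - begin)


def whole_codons(nuc_begin: int, nuc_end: int) -> tuple[int, int]:
    """Codons fully contained in a gene-relative nucleotide range.

    Assumed rule: round the begin up and the end down to codon boundaries.
    """
    codon_begin = (nuc_begin + 2) // 3  # ceiling division by 3
    codon_end = nuc_end // 3            # floor division by 3
    return codon_begin, codon_end


nuc_rel = (365, 366)                  # nucRel from the frameShifts entry
codon = whole_codons(*nuc_rel)

print(range_len(*nuc_rel))            # 1  (a single shifted nucleotide)
print(codon)                          # (122, 122)
print(range_len(*codon))              # 0  (no whole codon is covered)
```

Under these assumptions the frame shift is real at the nucleotide level but too short to span a full codon, so the codon range collapses to zero length.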
jq '[ .results[0].aaDeletions[] | select( .gene == "ORF7a" ) ]' nextclade.json | prettyjson
- 
  gene:             ORF7a
  refAA:            F
  codon:            100
  codonNucRange: 
    begin: 27693
    end:   27696
  refContext:       ATTTTTCTT
  queryContext:     ATTT-----
  contextNucRange: 
    begin: 27690
    end:   27699
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            L
  codon:            101
  codonNucRange: 
    begin: 27696
    end:   27699
  refContext:       TTTCTTATT
  queryContext:     T--------
  contextNucRange: 
    begin: 27693
    end:   27702
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            I
  codon:            102
  codonNucRange: 
    begin: 27699
    end:   27702
  refContext:       CTTATTGTT
  queryContext:     ---------
  contextNucRange: 
    begin: 27696
    end:   27705
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            V
  codon:            103
  codonNucRange: 
    begin: 27702
    end:   27705
  refContext:       ATTGTTGCG
  queryContext:     ---------
  contextNucRange: 
    begin: 27699
    end:   27708
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            A
  codon:            104
  codonNucRange: 
    begin: 27705
    end:   27708
  refContext:       GTTGCGGCA
  queryContext:     ---------
  contextNucRange: 
    begin: 27702
    end:   27711
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            A
  codon:            105
  codonNucRange: 
    begin: 27708
    end:   27711
  refContext:       GCGGCAATA
  queryContext:     ---------
  contextNucRange: 
    begin: 27705
    end:   27714
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            I
  codon:            106
  codonNucRange: 
    begin: 27711
    end:   27714
  refContext:       GCAATAGTG
  queryContext:     ---------
  contextNucRange: 
    begin: 27708
    end:   27717
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            V
  codon:            107
  codonNucRange: 
    begin: 27714
    end:   27717
  refContext:       ATAGTGTTT
  queryContext:     ---------
  contextNucRange: 
    begin: 27711
    end:   27720
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            F
  codon:            108
  codonNucRange: 
    begin: 27717
    end:   27720
  refContext:       GTGTTTATA
  queryContext:     ---------
  contextNucRange: 
    begin: 27714
    end:   27723
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            I
  codon:            109
  codonNucRange: 
    begin: 27720
    end:   27723
  refContext:       TTTATAACA
  queryContext:     ---------
  contextNucRange: 
    begin: 27717
    end:   27726
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            T
  codon:            110
  codonNucRange: 
    begin: 27723
    end:   27726
  refContext:       ATAACACTT
  queryContext:     ---------
  contextNucRange: 
    begin: 27720
    end:   27729
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            L
  codon:            111
  codonNucRange: 
    begin: 27726
    end:   27729
  refContext:       ACACTTTGC
  queryContext:     ---------
  contextNucRange: 
    begin: 27723
    end:   27732
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            C
  codon:            112
  codonNucRange: 
    begin: 27729
    end:   27732
  refContext:       CTTTGCTTC
  queryContext:     ---------
  contextNucRange: 
    begin: 27726
    end:   27735
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            F
  codon:            113
  codonNucRange: 
    begin: 27732
    end:   27735
  refContext:       TGCTTCACA
  queryContext:     ---------
  contextNucRange: 
    begin: 27729
    end:   27738
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            T
  codon:            114
  codonNucRange: 
    begin: 27735
    end:   27738
  refContext:       TTCACACTC
  queryContext:     ---------
  contextNucRange: 
    begin: 27732
    end:   27741
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            L
  codon:            115
  codonNucRange: 
    begin: 27738
    end:   27741
  refContext:       ACACTCAAA
  queryContext:     ---------
  contextNucRange: 
    begin: 27735
    end:   27744
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            K
  codon:            116
  codonNucRange: 
    begin: 27741
    end:   27744
  refContext:       CTCAAAAGA
  queryContext:     ---------
  contextNucRange: 
    begin: 27738
    end:   27747
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            R
  codon:            117
  codonNucRange: 
    begin: 27744
    end:   27747
  refContext:       AAAAGAAAG
  queryContext:     ---------
  contextNucRange: 
    begin: 27741
    end:   27750
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            K
  codon:            118
  codonNucRange: 
    begin: 27747
    end:   27750
  refContext:       AGAAAGACA
  queryContext:     ---------
  contextNucRange: 
    begin: 27744
    end:   27753
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            T
  codon:            119
  codonNucRange: 
    begin: 27750
    end:   27753
  refContext:       AAGACAGAA
  queryContext:     ---------
  contextNucRange: 
    begin: 27747
    end:   27756
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            E
  codon:            120
  codonNucRange: 
    begin: 27753
    end:   27756
  refContext:       ACAGAATGA
  queryContext:     --------A
  contextNucRange: 
    begin: 27750
    end:   27759
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
- 
  gene:             ORF7a
  refAA:            *
  codon:            121
  codonNucRange: 
    begin: 27756
    end:   27759
  refContext:       GAATGATTG
  queryContext:     -----ATTG
  contextNucRange: 
    begin: 27753
    end:   27762
  nucSubstitutions: 
    (empty array)
  nucDeletions: 
    - 
      start:  27694
      length: 64
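The arithmetic behind the list above can be checked directly. This sketch uses only numbers from the output (deletion start 27694, length 64, codon 100 starting at 27693, stop codon 121 ending at 27759) and assumes 0-based, half-open coordinates, which is what those numbers suggest; the variable names are made up for illustration.

```python
# Values copied from the aaDeletions output; coordinates assumed 0-based, half-open.
CODON_100_BEGIN = 27693                 # codonNucRange.begin of codon 100
GENE_BEGIN = CODON_100_BEGIN - 100 * 3  # 27393, same as nucAbs.begin - nucRel.begin
GENE_END = 27759                        # codonNucRange.end of the stop codon (121)

DEL_BEGIN, DEL_LEN = 27694, 64
DEL_END = DEL_BEGIN + DEL_LEN           # 27758 (exclusive)


def codon_of(nuc_abs: int) -> int:
    """Index of the ORF7a codon containing an absolute nucleotide position."""
    return (nuc_abs - GENE_BEGIN) // 3


first_codon = codon_of(DEL_BEGIN)       # first codon touched by the deletion
last_codon = codon_of(DEL_END - 1)      # last codon touched by the deletion
shifts_frame = DEL_LEN % 3 != 0         # a length not divisible by 3 shifts the frame
nucs_after_deletion = GENE_END - DEL_END

print(first_codon, last_codon)          # 100 121
print(shifts_frame)                     # True
print(nucs_after_deletion)              # 1
```

So the 64-nucleotide deletion touches codons 100 through 121, including two of the three stop codon nucleotides, and leaves one nucleotide of the gene after it. That single nucleotide matches the nucAbs range 27758..27759 reported for the frame shift.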
jq '[ .results[0].qc.stopCodons ]' nextclade.json | prettyjson
- 
  score:                  0
  status:                 good
  stopCodons: 
    (empty array)
  totalStopCodons:        0
  stopCodonsIgnored: 
    (empty array)
  totalStopCodonsIgnored: 0
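The stop-codon QC above reports nothing for this sequence, so a script that wants to flag these cases has to look at the frame shifts themselves. A minimal sketch, assuming the JSON layout implied by the jq queries above (`results[].frameShifts[]` with `geneName` and `codon.begin`/`codon.end`); the `seqName` key and the function name `empty_frame_shifts` are assumptions for illustration.

```python
def empty_frame_shifts(analysis: dict) -> list[tuple[str, str]]:
    """Return (sequence name, gene name) for frame shifts whose codon range is empty."""
    found = []
    for result in analysis.get("results", []):
        # "seqName" is an assumed key; fall back to a placeholder if it is absent.
        seq_name = result.get("seqName", "<unknown>")
        for frame_shift in result.get("frameShifts", []):
            codon = frame_shift["codon"]
            if codon["end"] <= codon["begin"]:  # half-open range with no codons
                found.append((seq_name, frame_shift["geneName"]))
    return found


# Inline stand-in for nextclade.json, using the values shown in this thread.
sample = {
    "results": [
        {
            "seqName": "example",
            "frameShifts": [
                {"geneName": "ORF7a", "codon": {"begin": 122, "end": 122}},
            ],
        }
    ]
}

print(empty_frame_shifts(sample))  # [('example', 'ORF7a')]
```

For a real run, load the file with `json.load(open("nextclade.json"))` and pass the resulting dict to the function.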

@ivan-aksamentov
Member

I'll leave the scientific analysis and documenting it to you, but let's see what can be improved in terms of the algorithm or user interface.
