GC content for RNA sequences #4722

Open
Lucandia opened this issue May 7, 2024 · 2 comments · May be fixed by #4727
Labels
good first issue: This should be an easy fix, suitable for beginners

Comments

Lucandia commented May 7, 2024

Setup

I am reporting a problem with:

  • Biopython version: 1.81
  • Python version: 3.9.0
  • System: macOS Sonoma

Expected behaviour

from Bio.SeqUtils import gc_fraction
gc_fraction("GGAUCUUCGGAUCU")
# expected result: 0.5

Actual behaviour

from Bio.SeqUtils import gc_fraction
gc_fraction("GGAUCUUCGGAUCU")
# result: 0.7777777777777779

What to fix

Please add U to the non-ambiguous nucleotides so that RNA sequences are supported (currently, the non-ambiguous nucleotides are ATCGSW). The previous GC content function (now deprecated) also supported RNA sequences.

peterjc (Member) commented May 8, 2024

The old function didn't explicitly handle U either:

def GC(seq):
    """Calculate G+C content (DEPRECATED).
    Use Bio.SeqUtils.gc_fraction instead.
    """
    warnings.warn(
        "GC is deprecated; please use gc_fraction instead.",
        BiopythonDeprecationWarning,
    )

    gc = sum(seq.count(x) for x in ["G", "C", "g", "c", "S", "s"])
    try:
        return gc * 100.0 / len(seq)
    except ZeroDivisionError:
        return 0.0

(See 960a5e2 which removed it)

To mimic the old behaviour, use the ignore mode rather than the default remove mode:

>>> from Bio.SeqUtils import gc_fraction
>>> gc_fraction("GGAUCUUCGGAUCU", "ignore") 
0.5
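
The gap between 0.78 and 0.5 comes down to which letters count toward the sequence length. The following is a minimal pure-Python sketch of that difference, not Biopython's actual implementation: `gc_fraction_sketch`, `GC_LETTERS`, and `UNAMBIGUOUS` are hypothetical names, and the sketch assumes that remove mode counts only ATCGSW in the length while ignore mode uses the full sequence length.

```python
# Simplified stand-in for illustration; NOT Biopython's actual implementation.
GC_LETTERS = "GCS"      # letters counted as G/C in both modes
UNAMBIGUOUS = "ATCGSW"  # letters kept in "remove" mode, as described in this issue


def gc_fraction_sketch(seq, ambiguous="remove"):
    """Return the GC fraction of seq under a simplified 'remove' or 'ignore' mode."""
    seq = seq.upper()
    gc = sum(seq.count(letter) for letter in GC_LETTERS)
    if ambiguous == "remove":
        # Only unambiguous DNA letters contribute to the length, so U is dropped.
        length = sum(seq.count(letter) for letter in UNAMBIGUOUS)
    elif ambiguous == "ignore":
        # Every letter contributes to the length, so U is kept.
        length = len(seq)
    else:
        raise ValueError(f"unsupported mode: {ambiguous!r}")
    return gc / length if length else 0.0


rna = "GGAUCUUCGGAUCU"  # 4 G, 3 C, 2 A, 5 U
removed = gc_fraction_sketch(rna, "remove")  # 7 / (7 + 2)
ignored = gc_fraction_sketch(rna, "ignore")  # 7 / 14
print(removed, ignored)
```

In remove mode the five U's vanish from the denominator, so 7 G/C letters are divided by 9 rather than 14.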

That said, explicitly handling U for RNA support makes sense. I think it would be a one line change to add U to the dictionary here:

https://github.com/biopython/biopython/blob/biopython-183/Bio/SeqUtils/__init__.py#L32

This would also need a new test case (e.g. your example). Would you like to make a pull request?
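
A rough sketch of what such a test case could check, written against a stand-in function rather than Biopython itself. `gc_fraction_with_u`, `UNAMBIGUOUS_WITH_U`, and `check_rna_example` are hypothetical names, and the sketch assumes that the fix treats U exactly like T when computing the sequence length.

```python
# Hypothetical stand-in; the real change would be made inside Biopython itself.
GC_LETTERS = "GCS"
UNAMBIGUOUS_WITH_U = "ATUCGSW"  # assumption: U is treated like T


def gc_fraction_with_u(seq):
    """Simplified 'remove'-style GC fraction that also counts U in the length."""
    seq = seq.upper()
    gc = sum(seq.count(letter) for letter in GC_LETTERS)
    length = sum(seq.count(letter) for letter in UNAMBIGUOUS_WITH_U)
    return gc / length if length else 0.0


def check_rna_example():
    """The RNA example from this issue should match its DNA equivalent."""
    rna = "GGAUCUUCGGAUCU"
    dna = rna.replace("U", "T")
    assert gc_fraction_with_u(rna) == 0.5
    assert gc_fraction_with_u(rna) == gc_fraction_with_u(dna)


check_rna_example()
```

Comparing the RNA string with its T-for-U DNA equivalent guards against the two alphabets drifting apart later.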

peterjc added the "good first issue" label May 8, 2024
Lucandia (Author) commented

Sure, I'm on it!

Lucandia linked a pull request May 13, 2024 that will close this issue