Inconsistent enzyme.search behaviour when cutting sequence falls outside sequence #4604
Do you have a one-cutter example to hand?
It would be easier for me to understand what is puzzling you with a single cutter ;)
Without having looked at the code, it seems quite obvious what's happening here. Only the "upper" strand is analyzed: to have a cut before the 2nd position (1-based), you of course need a first position, and to have a cut before position 27, you need position 27 (but not 28). The "lower" strand is obviously not taken into account. While this is not a problem for type IIp REs, which cut inside their recognition sequence, it is a problem for type IIs REs, which cut outside it. A possible solution for these enzymes would be to check whether the reverse-complement sequence gives the same result (if the recognition sequence is palindromic), or to check whether the sequence is long enough to contain the cuts on both strands. This could possibly be incorporated into the RE module. From a biological point of view, it is probably unknown whether such enzymes cut the DNA if one strand cannot be cut at all (because the DNA just ends). (Even some type IIp REs don't cut effectively when their sites are directly at the end of a DNA molecule.) So I wouldn't recommend allowing "air cuts": in your example, I wouldn't return a site if one strand is missing the neighboring base. I think the current behavior for the "left site" is correct and for the "right site" is incorrect.
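The second check suggested above (whether the sequence is long enough to contain the cuts on both strands) can be expressed as a minimum number of bases on each side of the site. Below is a minimal sketch, not Biopython code: `required_flanks` is a hypothetical helper, offsets follow REBASE-style (top/bottom) notation, and NmeDI is assumed to cut (12/7)RCCGGY(7/12). It compares the top-strand-only behaviour described here with a rule that also requires the bottom-strand cut.

```python
from typing import Optional, Tuple

# Hypothetical helper, not part of Bio.Restriction. Offsets use REBASE-style
# notation: upstream=(12, 7) means the top strand is cut 12 nt and the bottom
# strand 7 nt before the site; downstream=(7, 12) means 7 nt and 12 nt after it.
Offsets = Optional[Tuple[int, int]]


def required_flanks(upstream: Offsets, downstream: Offsets,
                    both_strands: bool) -> Tuple[int, int]:
    """Minimum number of bases needed before and after the site so that each
    counted strand cut leaves at least one base on the far side of the cut."""

    def need(offsets: Offsets) -> int:
        if offsets is None:  # the enzyme does not cut on this side
            return 0
        top, bottom = offsets
        # Top strand only (apparent current rule) or both strands (proposal);
        # a cut n nt away from the site needs those n bases plus one more.
        return (max(top, bottom) if both_strands else top) + 1

    return need(upstream), need(downstream)


# NmeDI, assumed to cut (12/7)RCCGGY(7/12)
print(required_flanks((12, 7), (7, 12), both_strands=False))  # (13, 8)
print(required_flanks((12, 7), (7, 12), both_strands=True))   # (13, 13)
```

For the sequence in this issue, which has 13 bases on each side of GCCGGC, this model predicts the asymmetry: under the top-strand-only rule a single left-hand trim already loses a cut, while the right-hand cut survives until fewer than 8 bases remain after the site. Requiring both strands makes the two sides symmetric.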
As @MarkusPiotrowski points out, this can only happen with enzymes that cut outside their recognition site. It seems that only the positions on the Watson (upper) strand are returned. What matters is not so much whether the cut is on the left or on the right of the site, but whether it occurs on the top or the bottom strand. Let me show this with three examples using BsaI, BsaXI and BspCNI. Recognition sites are shown in blue, and the cuts are marked in black. Note that there is a T among the As where the cut will occur. When that T is removed, the cut is no longer returned, as you can see from the code below:

```python
from Bio.Seq import Seq
from Bio.Restriction import BsaI, BsaXI, BspCNI

print(BsaI.search(Seq('GGTCTCATAAAA')))
print(BsaI.search(Seq('GGTCTCATAAA')))
print(BsaI.search(Seq('GGTCTCATAA')))
print(BsaI.search(Seq('GGTCTCATA')))
print(BsaI.search(Seq('GGTCTCAT')))
print(BsaI.search(Seq('GGTCTCA')))
print()
# prints
# [8]
# [8]
# [8]
# [8]
# [8]
# []
print(BsaXI.search(Seq('AAATAAAAAAAAAACAAAAACTCC')))
print(BsaXI.search(Seq('AATAAAAAAAAAACAAAAACTCC')))
print(BsaXI.search(Seq('ATAAAAAAAAAACAAAAACTCC')))
print(BsaXI.search(Seq('TAAAAAAAAAACAAAAACTCC')))
print(BsaXI.search(Seq('AAAAAAAAAACAAAAACTCC')))
print()
# prints
# [5]
# [4]
# [3]
# [2]
# []
print(BspCNI.search(Seq('CTCAGAAAAAAAAAT')))
print(BspCNI.search(Seq('CTCAGAAAAAAAAA')))
# prints
# [15]
# []
```

@MarkusPiotrowski proposes that the default behaviour should be as in the BspCNI case: a cut site is returned only if the resulting cut leaves at least one base on each strand. Conventional wisdom in primer design is to add a few extra bases beyond a restriction site, so this is probably correct. If you agree, I can implement that change and write the tests. If you know offhand where to look, I would appreciate a pointer, but I can find my way.
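The outputs above are consistent with a simple model in which a site is reported whenever its top-strand cut falls strictly inside the sequence, regardless of the bottom strand. The sketch below is a hypothetical re-implementation of that model in plain Python, not Biopython's code: `search` and `ENZYMES` are illustrative names, cut offsets are REBASE-style (BsaI GGTCTC(1/5), BsaXI (9/12)ACNNNNNCTCC, BspCNI CTCAG(9/7)), only forward-strand sites are searched, and for BsaXI only the upstream cut pair is modelled because the downstream one lies past the end of every example sequence. Setting `both_strands=True` applies the proposed rule.

```python
import re

# Hypothetical model, not Biopython's implementation. A cut "boundary" k lies
# between base k and base k+1 (1-based): 0 is the left end of the sequence and
# len(seq) the right end. Offsets are relative to the 0-based start of the site.
ENZYMES = {
    # name: (site as a regex, [(top-strand offset, bottom-strand offset), ...])
    "BsaI": ("GGTCTC", [(6 + 1, 6 + 5)]),   # GGTCTC(1/5)
    "BsaXI": ("AC.....CTCC", [(-9, -12)]),  # (9/12)ACNNNNNCTCC, upstream pair only
    "BspCNI": ("CTCAG", [(5 + 9, 5 + 7)]),  # CTCAG(9/7)
}


def search(name: str, seq: str, both_strands: bool = False) -> list[int]:
    """Return the 1-based position just after each reported top-strand cut.

    both_strands=False: only the top-strand cut must lie inside the sequence.
    both_strands=True: the proposed rule, both strand cuts must lie inside.
    Only sites on the forward strand are searched.
    """
    pattern, cuts = ENZYMES[name]
    n = len(seq)
    found = []
    for match in re.finditer(f"(?={pattern})", seq):  # allows overlapping sites
        start = match.start()
        for top, bottom in cuts:
            top_ok = 0 < start + top < n
            bottom_ok = 0 < start + bottom < n
            if top_ok and (bottom_ok or not both_strands):
                found.append(start + top + 1)
    return sorted(found)


print(search("BsaI", "GGTCTCATAA"))                                  # [8]
print(search("BsaI", "GGTCTCATAA", both_strands=True))               # []
print(search("BsaXI", "AATAAAAAAAAAACAAAAACTCC"))                    # [4]
print(search("BsaXI", "AATAAAAAAAAAACAAAAACTCC", both_strands=True)) # []
print(search("BspCNI", "CTCAGAAAAAAAAAT", both_strands=True))        # [15]
```

In this model, only the longest BsaI and BsaXI sequences keep their site under the proposed rule, while the BspCNI results do not change, because its top-strand cut is already the outer one.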
Hi @MarkusPiotrowski, @peterjc, do you agree with the proposed change? Should I go ahead?
Hello, just following up on this in case you missed the last comment.
I'm deferring to @MarkusPiotrowski on this one.
Pinging @MarkusPiotrowski
@manulera The situation is not clear-cut.
So please go ahead and implement this change. Maybe you will come across some references that strengthen one alternative or the other. In general, it is always difficult for users if Biopython behaves differently from some popular tool without a good reason. :-)
Setup
I am reporting a problem with Biopython version, Python version, and operating
system as follows:
Steps to reproduce
cc: @BjornFJohansson
Let's start with the digestion of the following sequence
TAAAAAAAAAAAAGCCGGCAAAAAAATAAAAA
with NmeDI, a restriction enzyme that makes two cuts. I am using this example to show what happens at both ends, but the same applies if you use one-cutters only. After digestion we would get three fragments. If we use the enzyme search, we get the positions (one-based) where these cuts will occur.
If we start trimming the molecule from either side, the sequence where the enzyme would cut and leave an overhang disappears, and the behaviour differs depending on whether we trim the left or the right side.
I don't know what the right thing to do is. My feeling is that the cut should probably still be found after the first trimming step on either side, but not after further trimming? This would mean accepting a cut where the cutting site lies exactly at the end of the molecule and single-stranded DNA is generated. I am not sure whether a cut would happen biologically with further trimming.
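The trimming experiment can be replayed with a small model of the two rules discussed in this thread. This is a sketch under assumptions, not Biopython's code: `nmedi_search` is a hypothetical helper, NmeDI is assumed to cut (12/7)RCCGGY(7/12), and only the literal GCCGGC site of the example sequence is searched. It trims a few bases from the left or the right and prints what each rule (top strand only / both strands) would report.

```python
# Hypothetical model, not Biopython's implementation. Assumes NmeDI cuts
# (12/7)RCCGGY(7/12) and only searches for the literal site of the example.
SEQ = "TAAAAAAAAAAAAGCCGGCAAAAAAATAAAAA"
SITE = "GCCGGC"
# (top, bottom) cut boundaries relative to the 0-based site start; boundary k
# lies between base k and base k+1 (1-based), so 0 and len(seq) are the ends.
CUTS = [(-12, -7), (len(SITE) + 7, len(SITE) + 12)]


def nmedi_search(seq: str, both_strands: bool) -> list[int]:
    """Return the 1-based position just after each reported top-strand cut."""
    start = seq.find(SITE)
    if start < 0:
        return []
    n = len(seq)
    hits = []
    for top, bottom in CUTS:
        top_ok = 0 < start + top < n
        bottom_ok = 0 < start + bottom < n
        if top_ok and (bottom_ok or not both_strands):
            hits.append(start + top + 1)
    return hits


for trim in (0, 1, 5, 6):
    left, right = SEQ[trim:], SEQ[:len(SEQ) - trim]
    print(f"trim {trim}: left {nmedi_search(left, False)} / {nmedi_search(left, True)}, "
          f"right {nmedi_search(right, False)} / {nmedi_search(right, True)}")
```

In this model, one left-hand trim removes the first cut under either rule, whereas on the right the second cut survives five trims under the top-strand-only rule but only zero trims once both strands are required.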