Inefficient indexing into `Bio.Align.substitution_matrices.Array` during `pairwise2` alignment
#4701
kamurani changed the title from "Inefficient indexing into `Align" to "Inefficient indexing into Bio.Align.substitution_matrices.Array during `pairwise2' alignment" on Apr 12, 2024
kamurani changed the title from "Inefficient indexing into Bio.Align.substitution_matrices.Array during `pairwise2' alignment" to "Inefficient indexing into Bio.Align.substitution_matrices.Array during pairwise2 alignment" on Apr 12, 2024
@kamurani Can you use the newer `Align.PairwiseAligner`?
The following runs in 0.12 seconds:
>>> from Bio import Align
>>> aligner = Align.PairwiseAligner()
>>> aligner.open_gap_score = -11
>>> aligner.extend_gap_score = -1
>>> from Bio.Align import substitution_matrices
>>> sm = substitution_matrices.load("BLOSUM62")
>>> seq1 = "MKKCTILVVASLLLVNSLLPGYGQNKIIQAQRNLNELCYNEGNDNKLYHVLNSKNGKIYNRNTVNRLLPMLRRKKNEKKNEKIERNNKLKQPPPPPNPNDPPPPNPNDPPPPNPNDPPPPNPNDPPPPNANDPPPPNANDPAPPNANDPAPPNANDPAPPNANDPAPPNANDPAPPNANDPAPPNANDPPPPNPNDPAPPQGNNNPQPQPRPQPQPQPQPQPQPQPQPQPRPQPQPQPGGNNNNKNNNNDDSYIPSAEKILEFVKQIRDSITEEWSQCNVTCGSGIRVRKRKGSNKKAEDLTLEDIDTEICKMDKCSSIFNIVSNSLGFVILLVLVFFN"
>>> seq2 = "MNYCKTTFHIFFFVLFFITIYEIKCQLRFASLGDWGKDTKGQILNAKYFKQFIKNERVTFIVSPGSNFIDGVKGLNDPAWKNLYEDVYSEEKGDMYMPFFTVLGTRDWTGNYNAQLLKGQGIYIEKNGETSIEKDADATNYPKWIMPNYWYHYFTHFTVSSGPSIVKTGHKDLAAAFIFIDTWVLSSNFPYKKIHEKAWNDLKSQLSVAKKIADFIIVVGDQPIYSSGYSRGSSYLAYYLLPLLKDAEVDLYISGHDNNMEVIEDNDMAHITCGSGSMSQGKSGMKNSKSLFFSSDIGFCVHELSNNGIVTKFVSSKKGEVIYTHKLNIKKKKTLDKVNALQHFAALPNVELTDVPSSGPMGNKDTFVRVVGTIGILIGSVIVFIGASSFLSKNMK"
>>> aligner.substitution_matrix = sm
>>> alignments = aligner.align(seq1, seq2)
>>> alignment = next(alignments)
>>> print(alignment)
Could be that providing a
System info
When using the `globalds` sequence alignment function, I notice that providing a `Bio.Align.substitution_matrices.Array` object takes much longer to produce an output than providing the same scores as a Python dictionary (i.e. the same format as the now-deprecated `from Bio.SubsMat import MatrixInfo; MatrixInfo.blosum62`).

For short sequences (i.e. < 20 AAs), there is no noticeable difference.
However, when running global alignment on sequences of length `(339, 396)`, the `dict` version runs within 2 seconds but the non-dict version runs indefinitely (several minutes before I `Ctrl+D`).

Not sure if this is expected behaviour or not; I figured I would post an issue in case there's something going on that shouldn't be.
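Since the `dict` form is the fast one in the report above, one possible workaround is to flatten the matrix into a plain dictionary once, before calling `globalds`. The sketch below is only an illustration: `to_pair_dict` and `ToyMatrix` are hypothetical names, not part of Biopython, and the toy class stands in for the real matrix so the snippet runs without Biopython installed. It assumes the matrix can be indexed by a pair of letters, as `substitution_matrices.Array` can.

```python
from itertools import product


def to_pair_dict(matrix, alphabet):
    """Flatten a letter-indexed score table into {(a, b): score}.

    `matrix[a, b]` must accept a pair of letters (hypothetical helper,
    not a Biopython function).
    """
    return {(a, b): float(matrix[a, b]) for a, b in product(alphabet, repeat=2)}


class ToyMatrix:
    """Hypothetical stand-in for a substitution matrix: +1 match, -1 mismatch."""

    def __init__(self, alphabet):
        self.alphabet = alphabet

    def __getitem__(self, key):
        a, b = key
        return 1 if a == b else -1


toy = ToyMatrix("ACGT")
scores = to_pair_dict(toy, toy.alphabet)
print(scores[("A", "A")], scores[("A", "C")], len(scores))

# With Biopython installed, the same helper could be applied like this
# (not executed here; seq1 and seq2 are the two protein sequences):
#   from Bio import pairwise2
#   from Bio.Align import substitution_matrices
#   sm = substitution_matrices.load("BLOSUM62")
#   scores = to_pair_dict(sm, sm.alphabet)
#   alignments = pairwise2.align.globalds(seq1, seq2, scores, -11, -1)
```

The conversion costs one pass over all letter pairs, after which every score lookup during alignment is an ordinary dictionary access. Whether this fully closes the gap described above is untested here.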