Allow qblast to accept lists of organisms to include/exclude #4516
…onvert the argument to a Seq.
… a SeqRecord with a string.
- organisms A dictionary that defines the organisms that will be
included/excluded in the search. The key is the name
of the organism, following the taxonomy convention
ie. "Bacteria (taxid:2)" and the value is a boolean
ie -> eg
I assume (please confirm) this works with any taxonomy node like "Bacteria" and not just leaf nodes (like a specific species)?
In my use cases, that is indeed the case. It also takes the set difference between taxonomy nodes, e.g.
{ "Bacteria (taxid:2)": False, "E. coli (taxid:562)": True }
will give you results that are bacteria but not "E. coli". This particular behavior is what I am relying on for my use case.
If you decide to go forward with these changes, I will write a few tests to validate these properties.
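To make the semantics of the example above concrete, here is a minimal sketch of how such a dictionary could be split into included and excluded organisms. It assumes that a value of True marks an organism as excluded, which is what the example implies; `split_organisms` is a hypothetical helper and not part of the PR.

```python
def split_organisms(organisms):
    """Split an organisms dict into (included, excluded) lists of names.

    Assumption: True marks an organism as excluded, False as included,
    as implied by the Bacteria / E. coli example in this thread.
    """
    included = [name for name, is_excluded in organisms.items() if not is_excluded]
    excluded = [name for name, is_excluded in organisms.items() if is_excluded]
    return included, excluded


included, excluded = split_organisms(
    {"Bacteria (taxid:2)": False, "E. coli (taxid:562)": True}
)
print(included)  # ['Bacteria (taxid:2)']
print(excluded)  # ['E. coli (taxid:562)']
```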
if ORGANISM_REGEX.match(organism) is None:
raise ValueError(
"Organisms must be specified following the taxonomy convention. ie. 'Bacteria (taxid:2)'"
ie -> eg
Is this explained in the latest qblast documentation from the NCBI?
Unfortunately, I could not find any official documentation for this functionality. I worked it out by inspecting the web requests performed by the "NCBI Blast" web page and reverse engineered the functionality. For this reason, I can understand you might not want to include the changes.
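For readers following the format check in the hunk above, a rough sketch of what such a validation could look like is given here. The pattern below is a hypothetical stand-in, since the PR's actual `ORGANISM_REGEX` is not shown in this excerpt, and `check_organisms` is an illustrative name only.

```python
import re

# Hypothetical stand-in: the PR's real ORGANISM_REGEX is not visible here.
# It accepts any non-empty name followed by " (taxid:<digits>)".
ORGANISM_REGEX = re.compile(r"^.+ \(taxid:\d+\)$")


def check_organisms(organisms):
    """Raise ValueError for any key that is not of the form 'Name (taxid:N)'."""
    for organism in organisms:
        if ORGANISM_REGEX.match(organism) is None:
            raise ValueError(
                "Organisms must be specified following the taxonomy convention, "
                f"e.g. 'Bacteria (taxid:2)'; got {organism!r}"
            )


check_organisms({"Bacteria (taxid:2)": False})  # passes silently
```

A key such as "Bacteria" without the taxid suffix would raise ValueError under this sketch.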
[X] I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.
[X] I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that continuous integration checks will be used to confirm the Biopython unit tests and style checks pass with these changes.
[X] I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
Allow qblast to accept an explicit list of organisms to include/exclude. Note that this is already possible to a limited degree by using the Entrez query. However, I have a use case that requires me to build a complex list of organisms to exclude/include and to query qblast multiple times (with similar but not identical lists of organisms). This feature makes my use case easier to implement and prevents BLAST from having to run the Entrez query, as I already provide the correct names and taxids of the organisms.
Let me know if you are open to including this in Biopython; I am happy to write tests if so.
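For comparison, the Entrez query route mentioned above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `to_entrez_query` is a hypothetical helper, True is assumed to mean "exclude" (as in the review thread example), and the sketch requires at least one included organism.

```python
import re

# Extracts the numeric taxid from a key such as "Bacteria (taxid:2)".
TAXID_RE = re.compile(r"\(taxid:(\d+)\)$")


def to_entrez_query(organisms):
    """Build an Entrez query string from an organisms dict (True = exclude)."""
    include, exclude = [], []
    for name, is_excluded in organisms.items():
        match = TAXID_RE.search(name)
        if match is None:
            raise ValueError(f"No taxid found in {name!r}")
        term = f"txid{match.group(1)}[ORGN]"
        (exclude if is_excluded else include).append(term)
    if not include:
        raise ValueError("This sketch needs at least one included organism")
    query = "(" + " OR ".join(include) + ")"
    if exclude:
        query += " NOT (" + " OR ".join(exclude) + ")"
    return query


query = to_entrez_query({"Bacteria (taxid:2)": False, "E. coli (taxid:562)": True})
print(query)  # (txid2[ORGN]) NOT (txid562[ORGN])

# The string could then be passed to the existing keyword (not executed here):
# NCBIWWW.qblast("blastn", "nt", sequence, entrez_query=query)
```

Building this string by hand for many similar organism lists is the bookkeeping the proposed `organisms` argument is meant to avoid.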