-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SSBond parsing addition - 2nd attempt #4650
base: master
Are you sure you want to change the base?
Conversation
@@ -1,7 +1,6 @@ | |||
#!/usr/bin/env python | |||
# Copyright 2004 Kristian Rother. | |||
# Revisions copyright 2004 Thomas Hamelryck. | |||
# Revisions copyright 2024 James Krieger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was removing this line deliberate, and if so why?
@@ -21,6 +20,23 @@ | |||
from Bio import File | |||
|
|||
|
|||
# Added to parse SSBOND records from the PDB header and add them back to the header dict | |||
# Eric G. Suchanek, PhD 11/25/22 | |||
def _get_ssbond(inl): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a private function (leading underscore) so we don't insist on a docstring, but the comments here might as well be transformed into a docstring. Note there are style rules (one line summary, blank line, more text etc), so do run the pre-commit checks as per https://github.com/biopython/biopython/blob/master/CONTRIBUTING.rst
ssb = _get_ssbond(tail) | ||
ssbdict = {ssbond_numb: ssb} | ||
pdbh_dict["ssbond"].update(ssbdict) | ||
# print(f'SSBOND: {tail}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the commented out print statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this records a dictionary of 4-entry tuples, but only from reading the source code can you guess what they are and even then the names are cryptic: (prox, dist, chn, chn2)
.
For a start, please expand those into full words for clarity - and then perhaps use a named tuple or a dictionary instead of a tuple?
What ever you do, I think the SSBOND parsing needs to be documented in a public docstring.
Also, this really needs a test. I suggest extending the unit test already parsing Tests/PDB/1A8O.pdb
(one SSBOND line) or better Tests/PDB/7DDO.pdb
which has seven.
Inadvertently removed due to my inexperience. Will restore and incorporate your comments. Thank you.
Cheers,
Eric
—
Eric G. Suchanek, PhD
Do good, be good, make the world a better place. It only takes ONE person to change the world.
***@***.***
… On Mar 13, 2024, at 6:46 AM, Peter Cock ***@***.***> wrote:
@peterjc requested changes on this pull request.
Currently this records a dictionary of 4-entry tuples, but only from reading the source code can you guess what they are and even then the names are cryptic: (prox, dist, chn, chn2).
For a start, please expand those into full words for clarity - and then perhaps use a named tuple or a dictionary instead of a tuple?
What ever you do, I think the SSBOND parsing needs to be documented in a public docstring.
Also, this really needs a test. I suggest extending the unit test already parsing Tests/PDB/1A8O.pdb (one SSBOND line) or better Tests/PDB/7DDO.pdb which has seven.
—
Reply to this email directly, view it on GitHub <#4650 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AA7PYGR64TAWCYFVFDR3VHTYYAVCHAVCNFSM6AAAAABEHJ5MBWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSMZTG43TKMRQGM>.
You are receiving this because you authored the thread.
|
@@ -166,7 +182,8 @@ def parse_pdb_header(infile): | |||
record_type = line[0:6] | |||
if record_type in ("ATOM ", "HETATM", "MODEL "): | |||
break | |||
header.append(line) | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the else here. The for loop will run until it his one of the record_types above and then will break (stop the for-loop).
@@ -219,6 +236,7 @@ def _parse_remark_465(line): | |||
|
|||
def _parse_pdb_header_list(header): | |||
# database fields | |||
# added ssbond 11/20/22 Eric G. Suchanek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove these comments. Git will track these changes and keep your authorship intact if anyone wishes to know who wrote this code.
@@ -333,6 +354,13 @@ def _parse_pdb_header_list(header): | |||
pdbh_dict["author"] += auth | |||
else: | |||
pdbh_dict["author"] = auth | |||
# parse SSBOND records and add them to the Header dict | |||
elif key == "SSBOND": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do something like in lines 259-261
above and use the _get_ssbond
method to directly grab the ss bond info from the header? That would make the function much easier to test (a single call) and you can do all the manipulation you need inside of it.
In other words:
pdbh_dict["ssbond"] = _get_ssbond(header)
# SSBOND 1 CYS A 26 CYS A 84 1555 1555 2.04 | ||
# CYS A 26 CYS A 84 1555 1555 2.0 | ||
# print(inl) | ||
tok = inl.split() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not use split when dealing with PDB files. The format is character-delimited, so you might run into cases where e.g. the residue number is large enough to leave no space between itself and the next field and split will fail.
See here for the description of the SSBOND records.
[x ] I hereby agree to dual licence this and any previous contributions under both
the Biopython License Agreement AND the BSD 3-Clause License.
[ x] I have read the
CONTRIBUTING.rst
file, have runpre-commit
locally, and understand that continuous integration checks will be used to
confirm the Biopython unit tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files
NEWS.rst
andCONTRIB.rst
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)
I've added SSBOND parsing to parse_pdb_header.py. This adds an SSBOND key to the PDB header structure, allowing parsing of disulfide bonds. I'm using this with https://github.com/suchanek/proteusPy . please remove the prior pull request.
Closes #...