Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSBond parsing addition - 2nd attempt #4650

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

suchanek
Copy link

@suchanek suchanek commented Mar 5, 2024

  • [x ] I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • [ x] I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

I've added SSBOND parsing to parse_pdb_header.py. This adds an SSBOND key to the PDB header structure, allowing parsing of disulfide bonds. I'm using this with https://github.com/suchanek/proteusPy . please remove the prior pull request.

Closes #...

@@ -1,7 +1,6 @@
#!/usr/bin/env python
# Copyright 2004 Kristian Rother.
# Revisions copyright 2004 Thomas Hamelryck.
# Revisions copyright 2024 James Krieger.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was removing this line deliberate, and if so why?

@@ -21,6 +20,23 @@
from Bio import File


# Added to parse SSBOND records from the PDB header and add them back to the header dict
# Eric G. Suchanek, PhD 11/25/22
def _get_ssbond(inl):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a private function (leading underscore) so we don't insist on a docstring, but the comments here might as well be transformed into a docstring. Note there are style rules (one line summary, blank line, more text etc), so do run the pre-commit checks as per https://github.com/biopython/biopython/blob/master/CONTRIBUTING.rst

ssb = _get_ssbond(tail)
ssbdict = {ssbond_numb: ssb}
pdbh_dict["ssbond"].update(ssbdict)
# print(f'SSBOND: {tail}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the commented out print statement.

Copy link
Member

@peterjc peterjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this records a dictionary of 4-entry tuples, but only from reading the source code can you guess what they are and even then the names are cryptic: (prox, dist, chn, chn2).

For a start, please expand those into full words for clarity - and then perhaps use a named tuple or a dictionary instead of a tuple?

What ever you do, I think the SSBOND parsing needs to be documented in a public docstring.

Also, this really needs a test. I suggest extending the unit test already parsing Tests/PDB/1A8O.pdb (one SSBOND line) or better Tests/PDB/7DDO.pdb which has seven.

@suchanek
Copy link
Author

suchanek commented Mar 13, 2024 via email

@@ -166,7 +182,8 @@ def parse_pdb_header(infile):
record_type = line[0:6]
if record_type in ("ATOM ", "HETATM", "MODEL "):
break
header.append(line)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the else here. The for loop will run until it his one of the record_types above and then will break (stop the for-loop).

@@ -219,6 +236,7 @@ def _parse_remark_465(line):

def _parse_pdb_header_list(header):
# database fields
# added ssbond 11/20/22 Eric G. Suchanek
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove these comments. Git will track these changes and keep your authorship intact if anyone wishes to know who wrote this code.

@@ -333,6 +354,13 @@ def _parse_pdb_header_list(header):
pdbh_dict["author"] += auth
else:
pdbh_dict["author"] = auth
# parse SSBOND records and add them to the Header dict
elif key == "SSBOND":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do something like in lines 259-261 above and use the _get_ssbond method to directly grab the ss bond info from the header? That would make the function much easier to test (a single call) and you can do all the manipulation you need inside of it.

In other words:

pdbh_dict["ssbond"] = _get_ssbond(header)

# SSBOND 1 CYS A 26 CYS A 84 1555 1555 2.04
# CYS A 26 CYS A 84 1555 1555 2.0
# print(inl)
tok = inl.split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use split when dealing with PDB files. The format is character-delimited, so you might run into cases where e.g. the residue number is large enough to leave no space between itself and the next field and split will fail.

See here for the description of the SSBOND records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants