SSBond parsing addition - 2nd attempt #4650

suchanek · 2024-03-05T15:03:39Z

[x ] I hereby agree to dual licence this and any previous contributions under both
the Biopython License Agreement AND the BSD 3-Clause License.
[ x] I have read the CONTRIBUTING.rst file, have run pre-commit
locally, and understand that continuous integration checks will be used to
confirm the Biopython unit tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files
NEWS.rst and CONTRIB.rst as part of this pull request, am listed
already, or do not wish to be listed. (This acknowledgement is optional.)

I've added SSBOND parsing to parse_pdb_header.py. This adds an SSBOND key to the PDB header structure, allowing parsing of disulfide bonds. I'm using this with https://github.com/suchanek/proteusPy . please remove the prior pull request.

Closes #...

peterjc · 2024-03-13T10:38:26Z

Bio/PDB/parse_pdb_header.py

@@ -1,7 +1,6 @@
 #!/usr/bin/env python
 # Copyright 2004 Kristian Rother.
 # Revisions copyright 2004 Thomas Hamelryck.
-# Revisions copyright 2024 James Krieger.


Was removing this line deliberate, and if so why?

peterjc · 2024-03-13T10:39:45Z

Bio/PDB/parse_pdb_header.py

@@ -21,6 +20,23 @@
 from Bio import File


+# Added to parse SSBOND records from the PDB header and add them back to the header dict
+# Eric G. Suchanek, PhD 11/25/22
+def _get_ssbond(inl):


This is a private function (leading underscore) so we don't insist on a docstring, but the comments here might as well be transformed into a docstring. Note there are style rules (one line summary, blank line, more text etc), so do run the pre-commit checks as per https://github.com/biopython/biopython/blob/master/CONTRIBUTING.rst

peterjc · 2024-03-13T10:42:18Z

Bio/PDB/parse_pdb_header.py

+            ssb = _get_ssbond(tail)
+            ssbdict = {ssbond_numb: ssb}
+            pdbh_dict["ssbond"].update(ssbdict)
+        # print(f'SSBOND: {tail}')


You can remove the commented out print statement.

peterjc

Currently this records a dictionary of 4-entry tuples, but only from reading the source code can you guess what they are and even then the names are cryptic: (prox, dist, chn, chn2).

For a start, please expand those into full words for clarity - and then perhaps use a named tuple or a dictionary instead of a tuple?

What ever you do, I think the SSBOND parsing needs to be documented in a public docstring.

Also, this really needs a test. I suggest extending the unit test already parsing Tests/PDB/1A8O.pdb (one SSBOND line) or better Tests/PDB/7DDO.pdb which has seven.

suchanek · 2024-03-13T11:43:40Z

Inadvertently removed due to my inexperience. Will restore and incorporate your comments. Thank you. Cheers, Eric — Eric G. Suchanek, PhD Do good, be good, make the world a better place. It only takes ONE person to change the world. ***@***.***

…

On Mar 13, 2024, at 6:46 AM, Peter Cock ***@***.***> wrote: @peterjc requested changes on this pull request. Currently this records a dictionary of 4-entry tuples, but only from reading the source code can you guess what they are and even then the names are cryptic: (prox, dist, chn, chn2). For a start, please expand those into full words for clarity - and then perhaps use a named tuple or a dictionary instead of a tuple? What ever you do, I think the SSBOND parsing needs to be documented in a public docstring. Also, this really needs a test. I suggest extending the unit test already parsing Tests/PDB/1A8O.pdb (one SSBOND line) or better Tests/PDB/7DDO.pdb which has seven. — Reply to this email directly, view it on GitHub <#4650 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AA7PYGR64TAWCYFVFDR3VHTYYAVCHAVCNFSM6AAAAABEHJ5MBWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSMZTG43TKMRQGM>. You are receiving this because you authored the thread.

JoaoRodrigues · 2024-03-13T20:26:09Z

Bio/PDB/parse_pdb_header.py

@@ -166,7 +182,8 @@ def parse_pdb_header(infile):
            record_type = line[0:6]
            if record_type in ("ATOM  ", "HETATM", "MODEL "):
                break
-            header.append(line)
+            else:


No need for the else here. The for loop will run until it his one of the record_types above and then will break (stop the for-loop).

JoaoRodrigues · 2024-03-13T20:26:44Z

Bio/PDB/parse_pdb_header.py

@@ -219,6 +236,7 @@ def _parse_remark_465(line):

 def _parse_pdb_header_list(header):
    # database fields
+    # added ssbond 11/20/22 Eric G. Suchanek


I'd remove these comments. Git will track these changes and keep your authorship intact if anyone wishes to know who wrote this code.

JoaoRodrigues · 2024-03-13T20:29:36Z

Bio/PDB/parse_pdb_header.py

@@ -333,6 +354,13 @@ def _parse_pdb_header_list(header):
                pdbh_dict["author"] += auth
            else:
                pdbh_dict["author"] = auth
+        # parse SSBOND records and add them to the Header dict
+        elif key == "SSBOND":


Why not do something like in lines 259-261 above and use the _get_ssbond method to directly grab the ss bond info from the header? That would make the function much easier to test (a single call) and you can do all the manipulation you need inside of it.

In other words:

pdbh_dict["ssbond"] = _get_ssbond(header)

JoaoRodrigues · 2024-03-13T20:32:21Z

Bio/PDB/parse_pdb_header.py

+    # SSBOND   1 CYS A   26    CYS A   84                          1555   1555  2.04
+    # CYS A   26    CYS A   84                          1555   1555  2.0
+    # print(inl)
+    tok = inl.split()


Please do not use split when dealing with PDB files. The format is character-delimited, so you might run into cases where e.g. the residue number is large enough to leave no space between itself and the next field and split will fail.

See here for the description of the SSBOND records.

suchanek added 2 commits March 5, 2024 09:33

added my changed parse_pdb_header.py

7a23664

_get_biomoltrans

76ba041

suchanek requested a review from JoaoRodrigues as a code owner March 5, 2024 15:03

import error fixed

481cd69

peterjc mentioned this pull request Mar 13, 2024

Add SSBond parsing to parse_pdb_header.py #4649

Closed

peterjc reviewed Mar 13, 2024

View reviewed changes

peterjc requested changes Mar 13, 2024

View reviewed changes

JoaoRodrigues requested changes Mar 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SSBond parsing addition - 2nd attempt #4650

SSBond parsing addition - 2nd attempt #4650

suchanek commented Mar 5, 2024

peterjc Mar 13, 2024

peterjc Mar 13, 2024

peterjc Mar 13, 2024

peterjc left a comment

suchanek commented Mar 13, 2024 via email

JoaoRodrigues Mar 13, 2024

JoaoRodrigues Mar 13, 2024

JoaoRodrigues Mar 13, 2024

JoaoRodrigues Mar 13, 2024

SSBond parsing addition - 2nd attempt #4650

Are you sure you want to change the base?

SSBond parsing addition - 2nd attempt #4650

Conversation

suchanek commented Mar 5, 2024

peterjc Mar 13, 2024

Choose a reason for hiding this comment

peterjc Mar 13, 2024

Choose a reason for hiding this comment

peterjc Mar 13, 2024

Choose a reason for hiding this comment

peterjc left a comment

Choose a reason for hiding this comment

suchanek commented Mar 13, 2024 via email

JoaoRodrigues Mar 13, 2024

Choose a reason for hiding this comment

JoaoRodrigues Mar 13, 2024

Choose a reason for hiding this comment

JoaoRodrigues Mar 13, 2024

Choose a reason for hiding this comment

JoaoRodrigues Mar 13, 2024

Choose a reason for hiding this comment