Non english months in pdb headers 4449 #4450

Open
wants to merge 6 commits into
base: master

exs-gbartlett

  • I hereby agree to dual licence this and any previous contributions under both
    the Biopython License Agreement AND the BSD 3-Clause License.

  • I have read the CONTRIBUTING.rst file, have run pre-commit
    locally, and understand that continuous integration checks will be used to
    confirm the Biopython unit tests and style checks pass with these changes.

  • [] I have added my name to the alphabetical contributors listings in the files
    NEWS.rst and CONTRIB.rst as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #4449

Unrecognized months in a PDB header no longer raise ValueError; they now return '0', matching the existing behaviour for a month of 'xxx'. A unit test is included.
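
The fallback described above can be illustrated with a minimal sketch. This is not the code from the pull request: `MONTHS` and `month_number` are hypothetical names, and the case-insensitive lookup is an assumption made for the example.

```python
# Hypothetical stand-in for the month lookup; not Biopython's actual code.
# Index 0 is the placeholder month "xxx", so an unknown month maps to the same value.
MONTHS = ["xxx", "jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_number(month_name: str) -> str:
    """Return the month index as a string, or '0' if the name is not recognised."""
    try:
        return str(MONTHS.index(month_name.lower()))
    except ValueError:
        # list.index raises ValueError for an unknown month such as "MAI";
        # fall back to '0' instead of letting the error propagate.
        return "0"
```

Under these assumptions, `month_number("SEP")` gives `"9"`, while both `month_number("xxx")` and `month_number("MAI")` give `"0"`.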

@peterjc
Member

peterjc commented Sep 20, 2023

Currently this silently ignores the bad month in a PDB file, using zero instead.

Unless our PDB team think otherwise, I would prefer this to consider the permissive/strict mode, and adjust the behavior accordingly.

CC @JoaoRodrigues @etal @jgreener64 (not an exhaustive list)

@jgreener64
Contributor

I don't have strong opinions here. On the one hand I don't think returning a zero will cause problems in user code. On the other hand it does say at https://www.wwpdb.org/documentation/file-format-content/format33/sect2.html#HEADER that

The verification program checks that the deposition date is a legitimate date

so I would guess that all PDB entries have a legitimate date (I haven't checked).

@peterjc
Member

peterjc commented Sep 21, 2023

I strongly suspect the test case here is from a third party tool exporting in PDB format, not a file from an official repository/mirror.

@exs-gbartlett
Author

The test case is from a PDB file generated by a third-party tool and is not a public repository file. Does Biopython only support PDB files that have been deposited at wwPDB? The month 'xxx' is tolerated in the code without strict/permissive checks.

@peterjc
Member

peterjc commented Sep 22, 2023

We try to parse "unofficial" files, but where they break the specification it is reasonable to give a warning or error.

In this case I personally would not want this to be silently parsed in strict mode.

@exs-gbartlett
Author

Updated with warning and permissive flag carried through to the _format_date function. Updated unit tests. Hopefully this is an acceptable solution.
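
As a rough illustration of a permissive flag carried into the date formatter, the sketch below raises in strict mode and warns in permissive mode. It is a hedged approximation, not the code under review: the `format_date` signature, the century cutoff at 50, the case-insensitive lookup, and the use of the generic `UserWarning` class are all assumptions. Only the message text follows the diff excerpt in this conversation.

```python
import warnings

# Index 0 is the placeholder month "xxx" (assumed layout for this sketch).
MONTHS = ["xxx", "jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def format_date(pdb_date: str, permissive: bool = True) -> str:
    """Convert a DD-MON-YY header date to YYYY-MM-DD (illustrative only)."""
    day, month_name, year = pdb_date[:2], pdb_date[3:6], int(pdb_date[7:9])
    # Assumed rule for two-digit years: 00-49 -> 20xx, 50-99 -> 19xx.
    century = 2000 if year < 50 else 1900
    try:
        month = MONTHS.index(month_name.lower())
    except ValueError:
        message = f"Non-standard month in PDB header: {month_name}."
        if not permissive:
            # Strict mode: refuse to parse a header that breaks the format.
            raise ValueError(message) from None
        # Permissive mode: report the problem and fall back to month 0.
        warnings.warn(message, UserWarning)
        month = 0
    return f"{century + year}-{month:02d}-{day}"
```

With these assumptions, `format_date("20-SEP-23")` returns `"2023-09-20"`, `format_date("20-MAI-23")` warns and returns `"2023-00-20"`, and `format_date("20-MAI-23", permissive=False)` raises `ValueError`.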

Member

@peterjc peterjc left a comment


A minor improvement aside (warning message and class), this looks good to me.

Bio/PDB/parse_pdb_header.py (outdated)
f"Non-standard month in PDB header: {month_name}."
) from None

warnings.warn(
Contributor


I still object to warnings, as by default they get printed only once in one Python session.
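
The once-only behaviour can be reproduced with the standard `warnings` module. More precisely, Python's `"default"` filter action shows a given message once per call location, while `"always"` shows every occurrence. The helper names `parse_month` and `count_recorded` below are hypothetical and exist only for this demonstration.

```python
import warnings


def parse_month(month_name: str) -> None:
    """Emit the same warning from the same line on every call."""
    warnings.warn(f"Non-standard month in PDB header: {month_name}.", UserWarning)


def count_recorded(action: str) -> int:
    """Count how many warnings get through under the given filter action."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(3):
            parse_month("MAI")  # identical text and location each time
    return len(caught)


# "default" de-duplicates repeats of the same message at the same location;
# "always" reports every occurrence.
print(count_recorded("default"), count_recorded("always"))
```

A caller who wants every bad header reported could install an `"always"` filter for the relevant warning category, at the cost of noisier output when many files share the same problem.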

Successfully merging this pull request may close these issues.

Non-English months in PDB headers
4 participants