BinaryCIF Parser #4705

Open
Will-Tyler opened this issue Apr 16, 2024 · 2 comments · May be fixed by #4707

Comments

@Will-Tyler
Contributor

Background

I plan to implement a BinaryCIF parser, and I am opening this issue as a forum for those who are interested to discuss the implementation of the parser.

BinaryCIF is a data format that stores CIF files using an efficient binary encoding (rather than a text encoding such as ASCII). BinaryCIF uses several compression methods to compress the CIF data, then encodes the compressed CIF data using a binary encoding called MessagePack. The specification of the BinaryCIF format is here.

Existing BinaryCIF Parsers

The py-mmcif repository contains a pure-Python BinaryCIF parser. The parser uses msgpack to decode the BinaryCIF data. The msgpack module returns a dictionary with the decoded CIF data. The parser then uses Python methods/generators to decompress the decoded CIF data.

Another pure-Python parser exists here and takes essentially the same approach.

Proposed Biopython Implementation

I propose taking a similar approach to the two existing BinaryCIF parsers listed above: decoding the CIF data using msgpack and decompressing the CIF data using pure Python. The msgpack package supports Python versions 3.8 and greater—the same versions that Biopython supports. The msgpack package would be an optional requirement—only required to parse BinaryCIF files.

After decoding the CIF data using msgpack, the implementation would use Python methods/generators to decompress the decoded CIF data.
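As a rough illustration of the decompression step, here is a minimal sketch of two of the encodings described in the BinaryCIF specification, run-length and delta, written as plain Python generators. The function names are hypothetical and are not taken from py-mmcif or from any Biopython code; the input is assumed to be an already-unpacked sequence of integers.

```python
from typing import Iterable, Iterator


def run_length_decode(data: Iterable[int]) -> Iterator[int]:
    """Expand a flat sequence of (value, repeat count) pairs.

    Hypothetical helper: assumes an even number of items; a trailing
    unpaired item would be silently ignored by zip().
    """
    items = iter(data)
    # zip(items, items) consumes the same iterator twice per step,
    # yielding consecutive (value, count) pairs.
    for value, count in zip(items, items):
        for _ in range(count):
            yield value


def delta_decode(data: Iterable[int], origin: int = 0) -> Iterator[int]:
    """Turn consecutive differences back into absolute values."""
    running = origin
    for diff in data:
        running += diff
        yield running


# Encodings are applied in sequence when writing a column, so decoding
# undoes them in reverse order. Here: run-length first, then delta.
atom_ids = list(delta_decode(run_length_decode([1, 5])))
```

Because both helpers are generators, they can be chained without building intermediate lists; only the final `list(...)` call materializes the column.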

Discussion

Using the msgpack package saves us the effort of writing and maintaining our own performant code to decode the MessagePack-formatted data. However, it requires users to install msgpack before they can use the BinaryCIF parser. Decompressing in pure Python lets the parser build a dictionary containing the CIF information and then build the PDB structure in much the same way as the mmCIF parser does.
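One way an optional requirement like this could be handled is to defer the import until the parser is actually used and raise a helpful error if the package is missing. The sketch below is only an assumption about how that might look; `require_optional` is a hypothetical helper, not an existing Biopython function.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency, or explain how to get it.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Install {module_name} to use {feature} "
            f"(e.g. pip install {module_name})."
        ) from exc
```

A parser could then call `require_optional("msgpack", "the BinaryCIF parser")` at the point where it first needs to decode a file, so users who never touch BinaryCIF files are not affected.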

@Will-Tyler
Contributor Author

Tagging @JoaoRodrigues (Bio.PDB maintainer)

@peterjc
Member

peterjc commented Apr 16, 2024

Handling msgpack as a soft dependency, following the approach used for the mmtf-python dependency for the MMTF format, seems very reasonable.

@Will-Tyler Will-Tyler linked a pull request Apr 23, 2024 that will close this issue