Background
I plan to implement a BinaryCIF parser, and I am opening this issue as a forum for those who are interested to discuss the implementation of the parser.
BinaryCIF is a data format that stores CIF files using an efficient binary encoding (rather than a text encoding such as ASCII). BinaryCIF uses several compression methods to compress the CIF data, then encodes the compressed CIF data using a binary encoding called MessagePack. The specification of the BinaryCIF format is here.
Existing BinaryCIF Parsers
The py-mmcif repository contains a pure-Python BinaryCIF parser. The parser uses msgpack to decode the BinaryCIF data. The msgpack module returns a dictionary with the decoded CIF data. The parser then uses Python methods/generators to decompress the decoded CIF data.
Another pure-Python parser exists here and takes essentially the same approach.
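For orientation, the dictionary that msgpack returns for a BinaryCIF file follows the nesting described in the specification: data blocks contain categories, and categories contain columns. The sketch below walks a hand-written dictionary of that shape rather than a real file. The helper name `iter_columns` and all sample values are illustrative assumptions, not code from either existing parser.

```python
def iter_columns(decoded):
    """Yield (block header, category name, column name) for every column."""
    for block in decoded["dataBlocks"]:
        for category in block["categories"]:
            for column in category["columns"]:
                yield block["header"], category["name"], column["name"]


# Hand-written stand-in for what msgpack.unpackb(raw_bytes) would return.
# Column payloads are left empty because only the nesting matters here.
decoded = {
    "version": "0.3.0",      # sample value
    "encoder": "example",    # sample value
    "dataBlocks": [
        {
            "header": "EXAMPLE",
            "categories": [
                {
                    "name": "_atom_site",
                    "rowCount": 0,
                    "columns": [
                        {"name": "id", "data": {"data": b"", "encoding": []}},
                        {"name": "Cartn_x", "data": {"data": b"", "encoding": []}},
                    ],
                }
            ],
        }
    ],
}

columns = list(iter_columns(decoded))
```

Each column's `data` entry holds the raw bytes plus the list of encodings that must be undone, which is where the pure-Python decompression step comes in.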
Proposed Biopython Implementation
I propose taking a similar approach to the two existing BinaryCIF parsers listed above: decoding the CIF data using msgpack and decompressing the CIF data using pure Python. The msgpack package supports Python versions 3.8 and greater—the same versions that Biopython supports. The msgpack package would be an optional requirement—only required to parse BinaryCIF files.
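One possible way to keep msgpack optional is to guard the import and fail only when a BinaryCIF file is actually parsed. This is a minimal sketch under that assumption; the function name `decode_bcif` and the error message are hypothetical, and the real implementation might prefer a Biopython-specific exception.

```python
try:
    import msgpack
except ImportError:
    # msgpack is optional: everything except BinaryCIF parsing still works.
    msgpack = None


def decode_bcif(raw, unpacker=msgpack):
    """Decode MessagePack bytes, or explain how to get the missing package."""
    if unpacker is None:
        raise ImportError(
            "Install msgpack to parse BinaryCIF files (pip install msgpack)."
        )
    return unpacker.unpackb(raw)
```

Passing the module in as `unpacker` is only there to make the failure path easy to exercise without uninstalling anything.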
After decoding the CIF data using msgpack, the implementation would use Python methods/generators to decompress the decoded CIF data.
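As a rough illustration of the generator-based decompression, here is a sketch of two of the encodings in the specification, run-length and delta, together with a loop that undoes a column's encodings in reverse order. It assumes the bytes have already been turned into integers (the ByteArray step is omitted), and the function names are hypothetical.

```python
def run_length_decode(values):
    """Expand (value, count) pairs, e.g. [1, 3, 2, 2] -> 1, 1, 1, 2, 2."""
    it = iter(values)
    for value, count in zip(it, it):
        for _ in range(count):
            yield value


def delta_decode(values, origin):
    """Turn stored differences back into absolute values, starting at origin."""
    running = origin
    for delta in values:
        running += delta
        yield running


def decode(values, encodings):
    """Undo encodings last-to-first, since they were applied first-to-last."""
    for encoding in reversed(encodings):
        kind = encoding["kind"]
        if kind == "RunLength":
            values = run_length_decode(values)
        elif kind == "Delta":
            values = delta_decode(values, encoding["origin"])
        else:
            raise NotImplementedError(f"encoding not covered here: {kind}")
    return list(values)


# A column that was delta-encoded and then run-length-encoded.
encodings = [{"kind": "Delta", "origin": 1}, {"kind": "RunLength"}]
serial_numbers = decode([0, 1, 1, 4], encodings)
```

Because each decoder is a generator, the stages chain lazily and the full list is only materialized once, at the end of `decode`.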
Discussion
Using the msgpack package saves us the effort of writing and maintaining our own performant code to decode the MessagePack-formatted data. The trade-off is that users must install msgpack before they can use the BinaryCIF parser. Finally, using pure Python allows the parser to build a dictionary containing the CIF information and then build the PDB structure from it, much as the mmCIF parser does.
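To make the last point concrete, the decoded columns could be flattened into a dictionary keyed by `_category.column`, the same key style the mmCIF dictionary uses. This is only a sketch: `to_flat_dict` is a hypothetical helper, the column decoder is passed in as a callable, and the sample block below carries already-decoded values instead of encoded bytes.

```python
def to_flat_dict(block, decode_column):
    """Map '_category.column' keys to the decoded values of each column."""
    flat = {}
    for category in block["categories"]:
        for column in category["columns"]:
            key = f'{category["name"]}.{column["name"]}'
            flat[key] = decode_column(column)
    return flat


# Sample block with pre-decoded values standing in for real column data.
block = {
    "header": "EXAMPLE",
    "categories": [
        {
            "name": "_atom_site",
            "columns": [
                {"name": "id", "data": ["1", "2"]},
                {"name": "label_atom_id", "data": ["N", "CA"]},
            ],
        }
    ],
}

# Here "decoding" just returns the stored list; a real decoder would
# undo the column's encodings first.
cif_dict = to_flat_dict(block, decode_column=lambda column: column["data"])
```

A dictionary in this shape could then be handed to the same structure-building code path that the mmCIF parser uses, assuming the values are converted to the types that code expects.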