Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse charmm psf file with residue insertion codes failed #2053

Closed
qiyifei1 opened this issue Aug 16, 2018 · 10 comments · Fixed by #4582
Closed

parse charmm psf file with residue insertion codes failed #2053

qiyifei1 opened this issue Aug 16, 2018 · 10 comments · Fixed by #4582

Comments

@qiyifei1
Copy link

The 0.18.0 MDAnalysis failed to parse charmm psf file when the residue id is not pure digit.

Error message:

Traceback (most recent call last):
  File "temp.py", line 1287, in <module>
    traj = MDAnalysis.Universe(psffile, dcdfile)
  File "/opt/anaconda2/lib/python2.7/site-packages/MDAnalysis/core/universe.py", line 296, in __init__
    "Error: {2}".format(self.filename, parser, err))
ValueError: Failed to construct topology from file nowater.psf with parser <class 'MDAnalysis.topology.PSFParser.PSFParser'>.
Error: invalid literal for int() with base 10: '185A'

Here are a few lines from the psf file. It is quite common to have a letter in the residue id in pdb files.

  4682 PROD     185      ASN      OD1      O       -0.550000       15.9990           0   0.00000     -0.301140E-02
  4683 PROD     185      ASN      ND2      NH2     -0.620000       14.0070           0   0.00000     -0.301140E-02
  4684 PROD     185      ASN      HD21     H        0.320000       1.00800           0   0.00000     -0.301140E-02
  4685 PROD     185      ASN      HD22     H        0.300000       1.00800           0   0.00000     -0.301140E-02
  4686 PROD     185      ASN      C        C        0.510000       12.0110           0   0.00000     -0.301140E-02
  4687 PROD     185      ASN      O        O       -0.510000       15.9990           0   0.00000     -0.301140E-02
  4688 PROD     185A     GLU      N        NH1     -0.470000       14.0070           0   0.00000     -0.301140E-02
  4689 PROD     185A     GLU      HN       H        0.310000       1.00800           0   0.00000     -0.301140E-02
  4690 PROD     185A     GLU      CA       CT1      0.700000E-01   12.0110           0   0.00000     -0.301140E-02
  4691 PROD     185A     GLU      HA       HB1      0.900000E-01   1.00800           0   0.00000     -0.301140E-02
  4692 PROD     185A     GLU      CB       CT2A    -0.180000       12.0110           0   0.00000     -0.301140E-02
  4693 PROD     185A     GLU      HB1      HA2      0.900000E-01   1.00800           0   0.00000     -0.301140E-02
  4694 PROD     185A     GLU      HB2      HA2      0.900000E-01   1.00800           0   0.00000     -0.301140E-02
  4695 PROD     185A     GLU      CG       CT2     -0.280000       12.0110           0   0.00000     -0.301140E-02
  4696 PROD     185A     GLU      HG1      HA2      0.900000E-01   1.00800           0   0.00000     -0.301140E-02
@richardjgowers
Copy link
Member

Hi @qiyifei1 , thanks for the bug report. Looks like PSF files can have insertion codes...

@orbeckst
Copy link
Member

@qiyifei1 how did you generate this PSF file?

I have never heard of insertion codes in PSF files. Can you read this file in VMD?

@qiyifei1
Copy link
Author

It was created using CHARMM-GUI which uses CHARMM to generate psf files. VMD is OK (display and selection).
Here is an example
http://protein.org.cn/uploads/step1_pdbreader.psf

@orbeckst
Copy link
Member

@qiyifei1 thanks for the example. If CHARMM-GUI produced the file then we should be able to parse it – but apparently we cannot, so this is a bug.

@ldx022
Copy link

ldx022 commented Nov 8, 2023

So has this bug been fixed? The psf topology file still cannot be loaded. The mdanalysis version of 2.6.1 is used.
The script I used is 'md = mda.Universe("../cg.psf", "./run.xtc")',but it failed and print as follows:
alueError Traceback (most recent call last)
File ~/software/miniconda3/envs/mdanalysis/lib/python3.10/site-packages/MDAnalysis/core/universe.py:110, in _topology_from_file_like(topology_file, topology_format, **kwargs)
109 with parser(topology_file) as p:
--> 110 topology = p.parse(**kwargs)
111 except (IOError, OSError) as err:
112 # There are 2 kinds of errors that might be raised here:
113 # one because the file isn't present
114 # or the permissions are bad, second when the parser fails

File ~/software/miniconda3/envs/mdanalysis/lib/python3.10/site-packages/MDAnalysis/topology/PSFParser.py:149, in PSFParser.parse(self, **kwargs)
147 next(psffile)
148 top.add_TopologyAttr(
--> 149 attr(self._parse_sec(psffile, info)))
150 except StopIteration:
151 # Reached the end of the file before we expected

File ~/software/miniconda3/envs/mdanalysis/lib/python3.10/site-packages/MDAnalysis/topology/PSFParser.py:178, in PSFParser._parse_sec(self, psffile, section_info)
177 logger.error(err)
--> 178 raise ValueError(err)
179 # Now figure out how many lines to read

ValueError: Expected section NTHETA but found 12

During handling of the above exception, another exception occurred:
...
128 "Error: {2}".format(topology_file, parser, err))
129 return topology

ValueError: Failed to construct topology from file ../cg.psf with parser <class 'MDAnalysis.topology.PSFParser.PSFParser'>.
Error: Expected section NTHETA but found 12

There are some custom residues in this psf file.However,I can load this psf file by using VMD.Is there any other solution?
Kind regards.

@orbeckst
Copy link
Member

orbeckst commented Nov 8, 2023

@ldx022 does your PSF file contain insertion codes, as in the original report, i.e. residue numbers with directly a character following?

@orbeckst orbeckst changed the title parse charmm psf file failed parse charmm psf file with residue insertion codes failed Nov 8, 2023
@ldx022
Copy link

ldx022 commented Nov 9, 2023

Thank you for your reply. For the psf file type, I only know that it is one of the topological formats.I'm not sure what insert code is, if I follow my own understanding, I have checked and there is no similar situation to the above 185A GLU.This psf file was generated using a coarse-grained program we developed. As a result, it has some custom residue types and atom types.Using vmd to load this file without any problem, I now attach the psf file(Because github upload has format restrictions, so I replaced the psf suffix with txt suffix)
cg.txt

@yuxuanzhuang
Copy link
Contributor

I recently ran into the same issue where I was unable to parse PSF files generated by psfgen in VMD.

After checking the PSF format, surprise surprise resid is a string! see the format below

https://github.com/MDAnalysis/mdanalysis/blob/d366921b0f614f455f5cc90d4024b8a5817aa9ed/package/MDAnalysis/topology/PSFParser.py#L212C26-L212C29

In VMD, resid is read as a string and then converted to an integer using the atoi function.

I believe the optimal approach would be to maintain resid as an integer in MDAnalysis while adding an attribute residstr. Maybe a good GSOC starter?

@orbeckst
Copy link
Member

orbeckst commented Mar 7, 2024

Can someone dig out the PSF specs from the CHARMM website or the source?

If it’s clear that insertion codes are defined then we can parse them separately and have an attribute “icode” do residues, IIRC we have that for PDB.

I’d rather parse the correct semantics than having a residstr attribute.

@yuxuanzhuang
Copy link
Contributor

From the latest charmm/source/io/psfres.F90 (fmt01 for no chemq; fmt01 for chemq; fmt01b for drude):

     # II,LSEGID,LRESID,LRES,TYPE(I),IAC(I),CG(I),AMASS(I),IMOVE(I),ECH(I),EHA(I)
     # PSF EXT
     fmt01= '(I10,1X,A8,1X,A8, 1X,A8,1X,A8,1X,I4,1X,2G14.6,I8)'
     fmt01a='(I10,1X,A8,1X,A8,1X,A8,1X,A8,1X,I4,1X,2G14.6,I8,2G14.6)'
     fmt01b='(I10,1X,A8,1X,A8,1X,A8,1X,A8,1X,I4,1X,2G14.6,I8,2G14.6,L1)'
     # PSF
     fmt01='(I8,1X,A4,1X,A4,1X,A4,1X,A4,1X,I4,1X,2G14.6,I8)'
     fmt01a='(I8,1X,A4,1X,A4,1X,A4,1X,A4,1X,I4,1X,2G14.6,I8,2G14.6)'
     fmt02='(I8,1X,A4,1X,A4,1X,A4,1X,A4,1X,A4,1X,2G14.6,I8)'
     fmt02a='(I8,1X,A4,1X,A4,1X,A4,1X,A4,1X,A4,1X,2G14.6,I8,2G14.6)'

LRESID -> A8 / A4

I believe CHARMM formats the resid to be left-justified and adds the insertion code at the end if present.

However, in the file generated by psfgen, the resid changes to hex (when it exceeds 99999?). It makes it difficult to differentiate the resid from the insertion code under these circumstances.

We can certainly be courteous and consider all conditions (if possible), or simply remove all non-numeric characters from the resid and parse the remainder (and adding 'residstr' as an attribute or not).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants