Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deterministic execution order in CustomIntegrator #4038

Closed
tristanic opened this issue Apr 20, 2023 · 8 comments · Fixed by #4043
Closed

Non-deterministic execution order in CustomIntegrator #4038

tristanic opened this issue Apr 20, 2023 · 8 comments · Fixed by #4043
Labels

Comments

@tristanic
Copy link
Contributor

Following on from #2489 (comment)... this may be just my misunderstanding of the documentation, but if I define a CustomIntegrator (whose purpose is to generate a moving average of the simulation coordinates) as follows:

class Smoother(CustomIntegrator):
    def __init__(self, smoothing_alpha, smoothing=True):
        super().__init__(0)
        self.addGlobalVariable('reset_smooth', 1.0)
        self.addGlobalVariable('enabled', int(smoothing))
        self.addGlobalVariable('smoothing_alpha', smoothing_alpha)
        self.addPerDofVariable('smoothed', 0)
    
        self.beginIfBlock('enabled > 0')
#### Non-deterministic order
        self.addComputePerDof('smoothed', 'select(reset_smooth, x, x*smoothing_alpha + smoothed*(1-smoothing_alpha))')
        self.addComputeGlobal('reset_smooth', '0')
#### /Non-deterministic order
        self.endBlock()
    
        self.beginIfBlock('enabled = 0')
        self.addComputePerDof('smoothed', 'x')
        self.endBlock()

... then the behaviour on startup is non-deterministic. If the order of operations goes as written, what should happen is that on first run the select function simply fills the smoothed variable with the current coordinates, then the following addComputeGlobal('reset_smooth', '0') ensures that subsequent calls use the actual smoothing function. That sometimes happens, but in other cases it seems that reset_smooth is set to zero first, so the first set of coordinates fed to the smoothing function is all zeros. This happens on the Mac OpenCL and Metal platforms - haven't tried others.

This alternative approach appears to work fully reliably:

class Smoother(CustomIntegrator):
    def __init__(self, smoothing_alpha, smoothing=True):
        super().__init__(0)
        self.addGlobalVariable('reset_smooth', 1.0)
        self.addGlobalVariable('enabled', int(smoothing))
        self.addGlobalVariable('smoothing_alpha', smoothing_alpha)
        self.addPerDofVariable('smoothed', 0)
    
        self.beginIfBlock('enabled > 0')

        self.beginIfBlock('reset_smooth>0')
        self.addComputePerDof('smoothed', 'x')
        self.addComputeGlobal('reset_smooth', '0')
        self.endBlock() # reset_smooth>0

        self.beginIfBlock('reset_smooth=0')
        self.addComputePerDof('smoothed', 'x*smoothing_alpha + smoothed*(1-smoothing_alpha)')
        self.endBlock() # reset_smooth=0

        self.endBlock() # enabled > 0
    
        self.beginIfBlock('enabled = 0')
        self.addComputePerDof('smoothed', 'x')
        self.endBlock()
@peastman
Copy link
Member

The steps should always be executed in exactly the order listed. Can you create a complete test case that demonstrates the problem you're seeing?

@tristanic
Copy link
Contributor Author

tristanic commented Apr 20, 2023 via email

@tristanic
Copy link
Contributor Author

tristanic commented Apr 21, 2023

Here you go. The Smoother is instantiated with reset_smooth=1.0 and it is only stepped once, so the output of state.getPositions().value_in_unit(nanometers) should be identical to the output of smoother.getPerDofVariableByName('smoothed'). On my Mac it's a toss-up whether the latter is correct or all zeros. For what it's worth, yesterday I was running in ChimeraX using the native arm64 build, but for this test I used the conda x86_64 environment.

integrator_test_case.zip

edit: corrected logic fail on my part (doesn't affect the bug existence, just my explanation of it).

@tristanic
Copy link
Contributor Author

Also happens in Windows on both OpenCL and CUDA platforms. Doesn't appear to affect the CPU platform.

@peastman
Copy link
Member

The fix is in #4043. It was a problem with uninitialized memory. Thanks for creating such a clear test case!

@tristanic
Copy link
Contributor Author

Awesome, thanks for tracking it down so fast! Could I lean on you for a peace-of-mind check on the alternative Smoother implementation in the OP? I haven't seen it fail so far, but now I'm worried that might just be coincidence.

@peastman
Copy link
Member

I think it's just a coincidence. That version could be affected just as easily. But there's an easy workaround. Immediately after creating the Context, call setGlobalVariable() on the integrator to set a variable. You only need to set one, and it doesn't matter which. That will ensure that deviceGlobalsAreCurrent is false. But you need to wait until after the Context is created.

@tristanic
Copy link
Contributor Author

tristanic commented Apr 24, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants