Potential bug in writing DREAM posterior to disk #313

Open
bdsegal opened this issue Jan 24, 2024 · 2 comments

bdsegal commented Jan 24, 2024

Hi all,

Thank you for writing this useful package.

While experimenting with spotpy, my colleagues and I may have encountered a bug in the implementation of the DREAM algorithm, described below. It would be great to get your eyes on this potential bug, as it might lead some users to mistakenly treat the saved values as posterior draws. Thank you in advance, and please let me know if I'm missing anything.

Potential bug: On line 353 of dream.py, within each MC iteration the parameter proposals are passed to self.postprocessing(). The postprocessing() function returns the likelihood, but it also saves parameters to disk as a side effect: on line 481 of _algorithm.py, postprocessing() writes the parameters it receives to disk. Per the documentation on line 45 of dream.py, I believe this is the same file that a user can then load afterwards. The problem is that the proposed parameters are passed to postprocessing() regardless of whether the proposal is accepted or rejected. So while the parameter values for each MC iteration are correctly stored in the dream object (see lines 390-399 of dream.py), the results saved to disk contain all parameter proposals, not necessarily draws representing the posterior (i.e. the results stored in self.bestpar).
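To make the concern concrete, here is a minimal, hypothetical sketch of the control flow described above (this is not the actual dream.py code; `propose`, `log_posterior`, and `save_to_db` are placeholders):

```python
import numpy as np

def mcmc_step(state, state_logp, propose, log_posterior, save_to_db, rng):
    """One Metropolis-style step illustrating the behaviour described above."""
    proposal = propose(state)
    proposal_logp = log_posterior(proposal)

    # The proposal is written to the database regardless of acceptance,
    # analogous to postprocessing() being called on every proposal.
    save_to_db(proposal, proposal_logp)

    # Accept/reject decision: only the returned state (what self.bestpar
    # accumulates) is a draw from the chain targeting the posterior.
    if np.log(rng.uniform()) < proposal_logp - state_logp:
        return proposal, proposal_logp  # accepted: the chain moves
    return state, state_logp            # rejected: the chain stays put
```

In this sketch the database records every proposal, whereas the sequence of returned states is the Markov chain whose stationary distribution is the posterior.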

Question: Is the above assessment correct, and if so, is that the intended behavior? I'm worried that some users may expect the values written to disk to be the posterior as opposed to all parameter proposals.

Thanks,
Brian

cc @para2x, @dlebauer, @infotroph

thouska (Owner) commented Jan 26, 2024

Hi @bdsegal,
thank you for your detailed report and for using spotpy.

The design of saving all results to disk is indeed intended. This behaviour does not affect any of the internal sampling. It also does not affect the posterior distribution, which, by the original design of the DREAM algorithm, is only generated after the algorithm has converged. At least that is my understanding after double-checking the code based on your report.

The number of runs performed after convergence can be defined by the user, and it is indeed important to use only this set number of runs afterwards if you want to analyse the posterior distribution. I hope this is understandable when following the dream tutorial? If not, I am happy to hear any suggestions for improvement.
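Roughly, the workflow looks like the sketch below (keyword names follow the dream tutorial; `my_setup`, the database name, and the numbers are placeholders for your own setup):

```python
import spotpy
from my_model import my_setup  # placeholder for your own spotpy setup class

# All model runs (i.e. all proposals) are written to the database file.
sampler = spotpy.algorithms.dream(my_setup(), dbname='DREAM_mymodel', dbformat='csv')

runs_after_convergence = 100
r_hat = sampler.sample(
    5000,                                           # total repetitions
    nChains=7,
    convergence_limit=1.2,
    runs_after_convergence=runs_after_convergence,
)

# For the posterior, keep only the runs performed after convergence,
# i.e. the tail of the saved results (adjust the slice if each chain
# contributes its own rows).
results = spotpy.analyser.load_csv_results('DREAM_mymodel')
posterior = results[-runs_after_convergence:]
```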

bdsegal (Author) commented Jan 26, 2024

Thank you @thouska for your answer and for pointing us to the tutorial. The tutorial is very helpful.

If I'm understanding correctly, it looks like the tutorial uses the post-convergence proposals to analyze the posterior, as opposed to the post-convergence results of the accept/reject decisions (i.e. the values that are added to self.bestpar). Is that right? We would like to use the latter, and it looks like we can access those values directly in self.bestpar, though please let me know if you would recommend another approach.
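For concreteness, here is a rough sketch of what we have in mind (the array layout of bestpar is an assumption for illustration; the actual shape should be checked against dream.py):

```python
import numpy as np

# Hypothetical: pool the accepted per-chain states stored by the sampler.
# Assumed layout: (nChains, nIterations, nParameters) -- verify against dream.py.
bestpar = np.asarray(sampler.bestpar)

n_keep = 100                        # e.g. iterations after convergence
tail = bestpar[:, -n_keep:, :]      # last accepted states of each chain
posterior_draws = tail.reshape(-1, tail.shape[-1])
```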

My impression is that using the results from the accept/reject decisions as opposed to the raw proposals is in line with Vrugt (2016), which you reference in the dream tutorial, though please let me know if I'm misunderstanding.

I'm also not sure it is necessary to discard all pre-convergence draws, since R-hat measures how well the chains are mixing up to the current iteration (potentially after removing the burn-in period, as is the default behavior in rstan's monitor function), though I do acknowledge that Vrugt recommends discarding the pre-convergence period as warmup. It's great that the current output of spotpy provides flexibility in how the results can be used.
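For reference, a basic (non-split) Gelman-Rubin R-hat looks roughly like the sketch below (for illustration only; rstan's monitor additionally drops the first half of each chain as warmup by default, as noted above):

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Basic Gelman-Rubin R-hat for an array shaped (n_chains, n_draws)."""
    n = chains.shape[1]                        # draws per chain
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)

# Toy example: compute R-hat on the second half of each chain only,
# mimicking the default warmup removal mentioned above.
rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))
print(gelman_rubin_rhat(chains[:, 500:]))
```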
