saltelli.sample returns several times the exact same samples #447
Comments
Hi @chahank, I believe this is expected behaviour - from memory the Sobol' sequence has some repetition initially. I do recognise that it is wasteful to do repeated runs with the same values. In the meantime, you can revert to previous behavior by providing a There are some caveats though, so do take note of the documentation: SALib/src/SALib/sample/saltelli.py Line 30 in 4012c13
|
Thanks for the quick response! I have read the documentation, but to be honest I could not understand what the description about skipping values meant, in particular how it compares to the previous implementation. The linked PR and Issue discussion did not help me there either. Is it possible to remove the duplicate samples a posteriori, or would this break the sensitivity analysis? |
You mean you're unsure of what "skipping values" actually means? The brief explanation I can offer is that:
To avoid the duplicate samples, you can skip a given number of points in the Sobol' sequence (using the The caveat is simply that ideally
As I understand it, not following the above will still produce usable results (for some value of "usable") but may take more samples than necessary (mentioned above). As for removing samples, this is not recommended. As noted above, the Sobol' sequence is deterministic so changing the samples destroys its structure, and I think the subsequent analysis likely won't be usable. I recommend skipping values to avoid the duplicates rather than filtering. |
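To illustrate what "skipping" means for a deterministic sequence, here is a minimal sketch. It uses the one-dimensional base-2 van der Corput sequence as a stand-in for a Sobol' sequence (an assumption made for brevity, this is not SALib's generator), and the helper names `van_der_corput` and `sequence` are hypothetical. Skipping simply starts reading the same sequence later, so the origin is never used and the remaining points keep their order.

```python
def van_der_corput(n: int) -> float:
    """Return the n-th point of the base-2 van der Corput sequence (n = 0 gives 0.0)."""
    value, denom = 0.0, 1.0
    while n > 0:
        denom *= 2.0
        n, bit = divmod(n, 2)   # peel off the lowest binary digit of n
        value += bit / denom    # mirror that digit around the binary point
    return value


def sequence(n_points: int, skip: int = 0) -> list:
    """Return n_points consecutive points after discarding the first `skip` points."""
    return [van_der_corput(i) for i in range(skip, skip + n_points)]


unskipped = sequence(4)        # starts at the origin
skipped = sequence(4, skip=4)  # the same sequence, read four points later

print(unskipped)
print(skipped)
```

Filtering duplicates after the fact would instead remove points from the middle of the sequence, which is the structural change the comment above advises against.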
Amazing, thanks for the detailed explanation! I would maybe suggest adding this description to the documentation at some level, it would help users to understand the changes from the previous implementation. |
Glad it was helpful, and thanks for raising that the documentation may be too cryptic for some - I guess I assumed too much when making adjustments. Please keep this issue open for now - I will close it after I adjust the documentation again for clarity. |
May I give some feedback on the latest changes to the documentation? I am still somewhat confused.
What does this mean? Also, the code gives a warning:
Again, here the question is what is this |
Hi again. Yes, I am still thinking through how to better express the suggestions. To answer your questions directly: there are, broadly speaking, two somewhat separate recommendations in the literature, as far as I am aware. One recommendation is that both An earlier perspective, the one you ask about here, is that skipping some points of the Sobol' sequence produces improved uniformity in the samples. If following the second of the two:
I take your point that the warning message is likely confusing as it requires awareness of these specific details to discern, and likely needs adjustment. Nevertheless I hope this cleared things up for you, and I'll try to come up with a better approach to explaining this in the documentation. Hope you don't mind if I ask for your input on the documentation again in the future? |
Sure, I am happy to give feedback in the future as well. Thanks for the quick responses! Maybe for a bit of context: I integrated SALib into our CLIMADA package to perform uncertainty and sensitivity analysis. CLIMADA can be used to model the impact and risk of natural catastrophes today and in the future. |
Coming back to the original question: why would the Sobol sequence produce identical samples? Is this a rounding error? I could not find any mention of repeated samples in any publications so far. |
The first point in the sequence is always the origin, so it will always produce identical values if it is not skipped (see Table 2 in [1], and a brief mention in [2]). The way Saltelli's sampler works is to cross-sample between two identical matrices. The [1] https://doi.org/10.1016/j.cpc.2010.12.039 PS: I recently became aware of CLIMADA - looks very interesting! |
Hmm, I think to avoid overloading the docs and confusing users I will simplify to outlining just one of the recommendations: that both skip_values and N be a power of 2, and that skip_values be >= N. |
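The simplified recommendation can be written as a small pre-flight check. This is only a sketch: `is_power_of_two` and `check_sampling_settings` are hypothetical helper names and not part of SALib; they encode nothing beyond the three conditions stated above.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (positive integers with exactly one set bit)."""
    return n > 0 and (n & (n - 1)) == 0


def check_sampling_settings(N: int, skip_values: int) -> list:
    """Return a list of problems; an empty list means the settings follow the recommendation."""
    problems = []
    if not is_power_of_two(N):
        problems.append(f"N={N} is not a power of 2")
    if not is_power_of_two(skip_values):
        problems.append(f"skip_values={skip_values} is not a power of 2")
    if skip_values < N:
        problems.append(f"skip_values={skip_values} is smaller than N={N}")
    return problems


print(check_sampling_settings(1024, 1024))  # follows the recommendation
print(check_sampling_settings(1000, 10))    # violates all three conditions
```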
Thanks for the references (I have not yet had the time to read them in detail). I think we might not be talking about the same thing when saying "identical samples". If I look at table 2 in [1], all of the 8 (rows in the table) 10-dimensional points are different samples. This is different from the output of the |
I think I understand you, but I also understand the confusion. The rows of Table 2 in [1] are not samples; they are points in the Sobol' sequence. The first row shows that, for a 10-dimensional problem (and in fact for any number of dimensions), all coordinates of the first point are identical (e.g., all set to 0.5). As Campolongo et al. describe (in [1]): "As in the first points of the Sobol' sequence the values of the coordinates tend to repeat (i.e. for the first point they are all equal to 0.5, for the second they are alternates couples of 0.25 and 0.75 and so on" This repetition is what causes the initial samples to be identical: "... in order to achieve different coordinates' values for the points a and b, we need to generate a quasi-random matrix of Sobol' numbers of size (R, 2k), with R > r, and discard the first few points for the auxiliary points ..." |
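A small sketch may help show why repeated coordinates turn into repeated samples. It assumes a simplified first-order cross-sampling scheme (one row from A, one row per parameter with a single coordinate swapped in from B, then the row from B); the function `cross_sample` is hypothetical and is not SALib's implementation, and the values in `later_point` are made up for illustration rather than taken from a real Sobol' sequence.

```python
def cross_sample(base_row, k):
    """Cross-sample one quasi-random row of 2*k coordinates into k + 2 model inputs."""
    a, b = list(base_row[:k]), list(base_row[k:])
    rows = [list(a)]          # the row from matrix A
    for i in range(k):
        ab = list(a)
        ab[i] = b[i]          # A with coordinate i taken from B
        rows.append(ab)
    rows.append(list(b))      # the row from matrix B
    return rows


k = 3
first_point = [0.5] * (2 * k)                                # every coordinate equal
later_point = [0.375, 0.625, 0.125, 0.875, 0.1875, 0.6875]   # illustrative values only

n_unique_first = len({tuple(r) for r in cross_sample(first_point, k)})
n_unique_later = len({tuple(r) for r in cross_sample(later_point, k)})

print(n_unique_first, "unique rows out of", k + 2)
print(n_unique_later, "unique rows out of", k + 2)
```

When every coordinate of the base point is equal, swapping a coordinate from B into A changes nothing, so all k + 2 rows coincide; once the coordinates differ, the rows are distinct.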
I am still confused as to how a (good uniform) sampling method from a continuous high-dimensional space can produce identical samples, but I must also admit that I have not yet understood fully the Sobol sequence algorithm. |
Closing - resolved in v1.4.5 |
This seems to still be happening for me. All of the outputs from my sobol sample are exactly the same, even using the recommended methods for setting a skip value. Is anyone else encountering this problem? |
@hudsonb22 could you open a new issue with an example of how you're using SALib and the results you're seeing? I can help more then. Note however that |
@ConnectedSystems Yes! It's issue #600 |
I recently upgraded to SALib 1.4.0.2 and witnessed a behaviour that looks incorrect to me. When using saltelli.sample, most of the returned samples are identical, which would mean that the model is evaluated several times with the exact same input variables. Is this really how it should be? Code from the SALib example:
Output: