Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encore uses weak random numbers #37

Open
kain88-de opened this issue Jul 5, 2017 · 2 comments
Open

Encore uses weak random numbers #37

kain88-de opened this issue Jul 5, 2017 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@kain88-de
Copy link
Member

kain88-de commented Jul 5, 2017

In ap.c the normal rand function of the c standard lib is called. In general this is a weak Linear congruential generator that is not thread save. With weak here I mean that it has a short period and the resulting numbers are correlated. Another issue is thread safety. This is a problem when the CAffinityPropagation function is called from separate python processes because we can't independently seed the RNG for each call of the function. I'm not sure how often the addition of noise is used in practice but I would recommend to remove the addition of noise in ap.c and rather do that in affinityprop.pyx with it's own numpy.random.RandomState. To allow using unique random states I would suggest the following code from scikit-learn.

# copy-pasted from scikit-learn utils/validation.py
def check_random_state(seed):
    """Turn seed into a np.random.RandomState instance

    If seed is None (or np.random), return the RandomState singleton used
    by np.random.
    If seed is an int, return a new RandomState instance seeded with seed.
    If seed is already a RandomState instance, return it.
    Otherwise raise ValueError.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)

The noise addtion in AffinityPropagation.run should then be

def run(...., noise=False, seed=None):
    if noise:
        rng = check_random_state(seed)
        matndarray += matndarray *  rng.uniform(0, 1e-16, size=matndarray.shape)

This will allow to pass a separate seed for each python process ensuring that the random numbers are not correlated. Additionally the numpy uses the Mersenne Twister, a widely accepted RNG for scientific applications.

@mtiberti if it doesn't matter that the added noise might be correlated it would be nice if you can still add a comment in the C code.

@mtiberti mtiberti self-assigned this Jul 6, 2017
@mtiberti
Copy link

mtiberti commented Jul 7, 2017

Thanks for this. Since noise is added to remove degeneracies from the similarity matrix it shouldn't be a problem if it's slightly correlated - a potentially problematic case would be when the same noise is added to elements of the similarity matrix which are identical, which I think is unlikely. However since the change you're proposing is not very time-consuming and still adds to the robustness of the code, I'm going to implement it. The only problem is that I'm about to leave for holidays and I'll be back on the 26th of July and I'll unlikely be able to work on it before then

@mtiberti
Copy link

a temporary solution would be the ability to pass a different random seed to each independent process to ensure that they are uncorrelated, as suggested in #40. The quality of RNG is not very critical for this application

@orbeckst orbeckst added the enhancement New feature or request label Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants