fetch_lfw_pairs issue #23750

Open
AnkS4 opened this issue Jun 24, 2022 · 4 comments

AnkS4 commented Jun 24, 2022

Describe the bug

Running exactly the same code on two different machines gives different results.

Steps/Code to Reproduce

from sklearn.datasets import fetch_lfw_pairs
from sklearn.utils import resample

X = fetch_lfw_pairs(
    subset="test",
    # funneled=False,
    # slice_=(slice(0, 250), slice(0, 250)),
    resize=1,
    color=True,
)

n_samples = 1
imgs, y = resample(X.pairs, X.target, n_samples=n_samples, random_state=101)

print(imgs[0][0])

Expected Results

The output should be the same on both machines.

Actual Results

Result on Machine 1:

array([[[ 77.,  77.,  77.],
        [ 94.,  88.,  76.],
        [107., 101.,  77.],
        ...,
        [159., 124.,  97.],
        [141., 110.,  84.],
        [136., 109.,  82.]],

       [[ 78.,  78.,  80.],
        [108., 101.,  91.],
        [128., 121.,  99.],
        ...,
        [159., 124.,  97.],
        [151., 120.,  94.],
        [149., 124.,  98.]],

       [[ 82.,  80.,  85.],
        [111., 104.,  96.],
        [130., 122., 103.],
        ...,
        [163., 131., 105.],
        [157., 129., 104.],
        [159., 138., 110.]],

       ...,

       [[ 27.,  27.,  27.],
        [ 27.,  27.,  27.],
        [ 24.,  24.,  24.],
        ...,
        [151., 172., 235.],
        [142., 163., 228.],
        [128., 150., 210.]],

       [[ 26.,  24.,  25.],
        [ 33.,  33.,  33.],
        [ 30.,  29.,  27.],
        ...,
        [150., 171., 236.],
        [136., 157., 222.],
        [112., 133., 194.]],

       [[ 31.,  30.,  26.],
        [ 36.,  35.,  33.],
        [ 37.,  36.,  34.],
        ...,
        [146., 167., 230.],
        [124., 145., 208.],
        [ 93., 112., 171.]]], dtype=float32)

Result on Machine 2:

array([[[0.        , 0.00392157, 0.        ],
        [0.00392157, 0.01176471, 0.04705882],
        [0.01960784, 0.07450981, 0.16470589],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.00392157]],

       [[0.        , 0.        , 0.        ],
        [0.00392157, 0.01176471, 0.05098039],
        [0.01960784, 0.07843138, 0.16862746],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.00392157]],

       [[0.        , 0.00392157, 0.        ],
        [0.00392157, 0.01568628, 0.05490196],
        [0.01568628, 0.07450981, 0.16470589],
        ...,
        [0.        , 0.        , 0.00392157],
        [0.        , 0.00392157, 0.        ],
        [0.        , 0.00392157, 0.        ]],

       ...,

       [[0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]],

       [[0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        ...,
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ],
        [0.        , 0.        , 0.        ]]], dtype=float32)

Versions

Machine 1:

System:
    python: 3.8.5 (default, Sep  4 2020, 07:30:14)  [GCC 7.3.0]
executable: /home/anks/miniconda3/envs/stealth/bin/python
   machine: Linux-5.17.9-1-MANJARO-x86_64-with-glibc2.10

Python dependencies:
          pip: 21.2.4
   setuptools: 58.0.4
      sklearn: 1.0.2
        numpy: 1.21.2
        scipy: 1.7.3
       Cython: 0.29.30
       pandas: 1.3.0
   matplotlib: 3.5.1
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

Machine 2:

System:
    python: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52)  [GCC 7.5.0]
executable: /home/aimluser/miniconda3/envs/application_env/bin/python3.8
   machine: Linux-4.4.0-042stab145.3-x86_64-with-glibc2.10

Python dependencies:
          pip: 22.1.2
   setuptools: 62.1.0
      sklearn: 1.0.2
        numpy: 1.22.3
        scipy: 1.8.0
       Cython: None
       pandas: 1.3.0
   matplotlib: 3.5.2
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True
@glemaitre
Member

Is the original X the same on both machines?
Does the architecture differ between the machines?

@AnkS4
Author

AnkS4 commented Jun 28, 2022

No, X is different on the two machines.

Both machines are x86_64.

thomasjpfan added the module:datasets and Needs Investigation labels and removed the Needs Triage label on Jun 28, 2022
@lesteve
Member

lesteve commented Jun 29, 2022

Weird. The first thing I would try is to re-download the dataset by moving the cache out of the way (mv ~/scikit_learn_data{,.bak}) and re-executing your snippet on both machines to see whether that gets rid of the issue.
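For anyone who prefers to do the cache move from Python rather than the shell, here is a minimal sketch. The helper name move_cache_aside is hypothetical, and the path assumes the default cache location ~/scikit_learn_data; if the SCIKIT_LEARN_DATA environment variable is set, or a data_home argument was passed, the cache lives elsewhere (sklearn.datasets.get_data_home() reports the directory actually in use).

```python
from pathlib import Path
import shutil


def move_cache_aside(cache_dir, suffix=".bak"):
    """Rename cache_dir to cache_dir + suffix so the next fetch re-downloads.

    Hypothetical helper, not part of scikit-learn. Returns the backup path,
    or None if there was no cache directory to move.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.exists():
        return None
    backup = cache_dir.with_name(cache_dir.name + suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(cache_dir), str(backup))
    return backup


# Assumed default location; adjust if your data home is different:
# move_cache_aside("~/scikit_learn_data")
```

After the move, the next call to fetch_lfw_pairs should download a fresh copy, and the old cache stays available under the .bak name in case it needs to be inspected.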

If you still have the issue and X differs between the machines as you said, it would be great if you could update your top post: simplify your snippet to use only fetch_lfw_pairs, without the resample afterwards, and update the snippet output for both machines.
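A possible shape for such a simplified snippet is sketched below. It keeps the fetch_lfw_pairs arguments from the original report and drops resample. The fingerprint helper and the main wrapper are hypothetical additions, not scikit-learn API; they only make the two machines easier to compare, since a short hash is easier to diff than a truncated array printout.

```python
import hashlib


def fingerprint(raw, shape, dtype):
    """Summarise an array as its shape, dtype and a short SHA-256 of its bytes.

    Hypothetical helper for comparing outputs across machines.
    """
    digest = hashlib.sha256(raw).hexdigest()
    return f"shape={tuple(shape)} dtype={dtype} sha256={digest[:16]}"


def main():
    # Imported here so the helper above can be used without scikit-learn.
    from sklearn.datasets import fetch_lfw_pairs

    X = fetch_lfw_pairs(subset="test", resize=1, color=True)
    pairs = X.pairs
    print(fingerprint(pairs.tobytes(), pairs.shape, pairs.dtype))
    # The value range shows at a glance whether pixels are on a 0-255 or 0-1 scale.
    print("min:", pairs.min(), "max:", pairs.max())
    print(pairs[0][0])


# Run main() on each machine and compare the printed lines.
```

If the fingerprint lines differ between machines, the difference is already present in the output of fetch_lfw_pairs itself and resample can be ruled out.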

@glemaitre
Member

I quickly checked on two machines (Linux and macOS) and reproduced the results of Machine 2.
