Flaky test test_dataframe_aggregations_multilevel[cov-disk-<lambda>1] #8795

Closed · jsignell opened this issue Mar 9, 2022 · 3 comments
Labels: needs attention, tests
jsignell (Member) commented Mar 9, 2022

There is a flaky test: test_dataframe_aggregations_multilevel[cov-disk-<lambda>1]

=================================== FAILURES ===================================
__________ test_dataframe_aggregations_multilevel[cov-disk-<lambda>1] __________
[gw2] linux -- Python 3.9.10 /usr/share/miniconda3/envs/test-environment/bin/python

grouper = <function <lambda> at 0x7f79a8f8da60>, agg_func = 'cov'

    @pytest.mark.parametrize(
        "grouper",
        [
            lambda df: ["a"],
            lambda df: ["a", "b"],
            lambda df: df["a"],
            lambda df: [df["a"], df["b"]],
            lambda df: [df["a"] > 2, df["b"] > 1],
        ],
    )
    def test_dataframe_aggregations_multilevel(grouper, agg_func):
        def call(g, m, **kwargs):
            return getattr(g, m)(**kwargs)
    
        pdf = pd.DataFrame(
            {
                "a": [1, 2, 6, 4, 4, 6, 4, 3, 7] * 10,
                "b": [4, 2, 7, 3, 3, 1, 1, 1, 2] * 10,
                "d": [0, 1, 2, 3, 4, 5, 6, 7, 8] * 10,
                "c": [0, 1, 2, 3, 4, 5, 6, 7, 8] * 10,
            },
            columns=["c", "b", "a", "d"],
        )
    
        ddf = dd.from_pandas(pdf, npartitions=10)
    
        # covariance only works with N+1 columns
        if agg_func not in ("cov", "corr"):
            assert_eq(
                call(pdf.groupby(grouper(pdf))["c"], agg_func),
                call(ddf.groupby(grouper(ddf))["c"], agg_func, split_every=2),
            )
    
        # not supported by pandas
        if agg_func != "nunique":
            assert_eq(
                call(pdf.groupby(grouper(pdf))[["c", "d"]], agg_func),
                call(ddf.groupby(grouper(ddf))[["c", "d"]], agg_func, split_every=2),
            )
    
            if agg_func in ("cov", "corr"):
                # there are sorting issues between pandas and chunk cov w/dask
                df = call(pdf.groupby(grouper(pdf)), agg_func).sort_index()
                cols = sorted(list(df.columns))
                df = df[cols]
>               dddf = call(ddf.groupby(grouper(ddf)), agg_func, split_every=2).compute()

dask/dataframe/tests/test_groupby.py:1138: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dask/base.py:292: in compute
    (result,) = compute(self, traverse=False, **kwargs)
dask/base.py:575: in compute
    results = schedule(dsk, keys, **kwargs)
dask/threaded.py:81: in get
    results = get_async(
dask/local.py:506: in get_async
    raise_exception(exc, tb)
dask/local.py:314: in reraise
    raise exc
dask/local.py:219: in execute_task
    result = _execute_task(task, data)
dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
dask/core.py:149: in get
    result = _execute_task(task, cache)
dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
dask/dataframe/groupby.py:449: in _cov_chunk
    mul = g.apply(_mul_cols, cols=cols).reset_index(level=-1, drop=True)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1414: in apply
    result = self._python_apply_general(f, self._selected_obj)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1460: in _python_apply_general
    return self._wrap_applied_output(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/generic.py:1007: in _wrap_applied_output
    return self._concat_objects(values, not_indexed_same=not_indexed_same)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1058: in _concat_objects
    result = concat(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/util/_decorators.py:311: in wrapper
    return func(*args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/reshape/concat.py:346: in concat
    op = _Concatenator(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/reshape/concat.py:420: in __init__
    keys = type(keys).from_tuples(clean_keys, names=keys.names)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:204: in new_meth
    return meth(self_or_cls, *args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:566: in from_tuples
    return cls.from_arrays(arrays, sortorder=sortorder, names=names)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:489: in from_arrays
    codes, levels = factorize_from_iterables(arrays)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2881: in factorize_from_iterables
    codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2881: in <genexpr>
    codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2854: in factorize_from_iterable
    cat = Categorical(values, ordered=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:451: in __init__
    dtype = CategoricalDtype(categories, dtype.ordered)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:183: in __init__
    self._finalize(categories, ordered, fastpath=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:337: in _finalize
    categories = self.validate_categories(categories, fastpath=fastpath)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:526: in validate_categories
    categories = Index._with_infer(categories, tupleize_cols=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:680: in _with_infer
    result = cls(*args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:494: in __new__
    arr = _maybe_cast_data_without_dtype(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

subarr = array([1, 2, 3, 4, 6, 7], dtype=object), cast_numeric_deprecated = True

    def _maybe_cast_data_without_dtype(
        subarr: np.ndarray, cast_numeric_deprecated: bool = True
    ) -> ArrayLike:
        """
        If we have an arraylike input but no passed dtype, try to infer
        a supported dtype.
    
        Parameters
        ----------
        subarr : np.ndarray[object]
        cast_numeric_deprecated : bool, default True
            Whether to issue a FutureWarning when inferring numeric dtypes.
    
        Returns
        -------
        np.ndarray or ExtensionArray
        """
    
        result = lib.maybe_convert_objects(
            subarr,
            convert_datetime=True,
            convert_timedelta=True,
            convert_period=True,
            convert_interval=True,
            dtype_if_all_nat=np.dtype("datetime64[ns]"),
        )
        if result.dtype.kind in ["i", "u", "f"]:
            if not cast_numeric_deprecated:
                # i.e. we started with a list, not an ndarray[object]
                return result
    
>           warnings.warn(
                "In a future version, the Index constructor will not infer numeric "
                "dtypes when passed object-dtype sequences (matching Series behavior)",
                FutureWarning,
                stacklevel=3,
            )
E           FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:7137: FutureWarning

https://github.com/dask/dask/runs/5468641701?check_suite_focus=true
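
For anyone triaging this: the failure itself is the pandas FutureWarning, which an error-on-warnings filter turns into a test error (that is what the E FutureWarning line shows). pandas 1.4 deprecated inferring numeric dtypes when the Index constructor is given an object-dtype array, and the groupby-apply inside _cov_chunk happens to hit that path while pandas builds the MultiIndex for the concatenated result. Below is a minimal sketch of the underlying deprecation, assuming pandas 1.4.x; it triggers the warning directly rather than through the real GroupBy.apply / concat code path.

# Minimal sketch of the pandas deprecation behind this failure (assumes
# pandas 1.4.x). The failing test hits it indirectly, via GroupBy.apply ->
# pd.concat building a MultiIndex from object-dtype group keys; constructing
# an Index directly from such an array goes through the same inference path.
import warnings

import numpy as np
import pandas as pd

# Mimic an error-on-warnings test configuration so the FutureWarning
# surfaces as an exception, the way it does on CI.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    keys = np.array([1, 2, 3, 4, 6, 7], dtype=object)  # values from the traceback above
    try:
        pd.Index(keys)  # pandas 1.4 infers int64 here and emits the FutureWarning
    except FutureWarning as exc:
        print(f"FutureWarning raised: {exc}")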

jsignell added the tests label on Mar 9, 2022
github-actions bot added the needs attention label on Apr 11, 2022
jrbourbeau self-assigned this on Aug 4, 2022
hendrikmakait mentioned this issue on Aug 8, 2022
phobson (Contributor) commented Oct 6, 2022

Just popping in to say that I'm seeing this manifest in a few recent PRs, e.g., https://github.com/dask/dask/actions/runs/3165861097/jobs/5155195568#step:7:22222

jrbourbeau (Member) commented

Hopefully closed via #9701

hendrikmakait (Member) commented Jan 13, 2023

The test popped up several times on CI again, though with a different parametrization: #9793 (comment). Should we reopen this or create a new issue?
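
Whichever way we track it, a possible stopgap while the root cause is investigated would be to allow this one pandas deprecation warning for the affected test via pytest.mark.filterwarnings. The snippet below is only a sketch of that idea, using a hypothetical standalone test; it is not the parametrized test from the suite and not what #9701 did.

import numpy as np
import pandas as pd
import pytest

# Hypothetical illustration: the mark downgrades just this pandas deprecation
# for one test, so a suite-wide error-on-warnings filter no longer fails it.
@pytest.mark.filterwarnings(
    "ignore:In a future version, the Index constructor:FutureWarning"
)
def test_index_inference_deprecation_is_tolerated():
    # Object-dtype keys like the ones in the traceback; pandas 1.4 warns when
    # they are turned into an Index, but the mark keeps that from becoming an
    # error.
    keys = np.array([1, 2, 3, 4, 6, 7], dtype=object)
    pd.Index(keys)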
