Flaky test test_dataframe_aggregations_multilevel[cov-disk-<lambda>1] #8795

Closed · jsignell opened this issue Mar 9, 2022 · 3 comments
Labels: needs attention, tests
jsignell (Member) commented Mar 9, 2022

There is a flaky test: test_dataframe_aggregations_multilevel[cov-disk-<lambda>1]

=================================== FAILURES ===================================
__________ test_dataframe_aggregations_multilevel[cov-disk-<lambda>1] __________
[gw2] linux -- Python 3.9.10 /usr/share/miniconda3/envs/test-environment/bin/python

grouper = <function <lambda> at 0x7f79a8f8da60>, agg_func = 'cov'

    @pytest.mark.parametrize(
        "grouper",
        [
            lambda df: ["a"],
            lambda df: ["a", "b"],
            lambda df: df["a"],
            lambda df: [df["a"], df["b"]],
            lambda df: [df["a"] > 2, df["b"] > 1],
        ],
    )
    def test_dataframe_aggregations_multilevel(grouper, agg_func):
        def call(g, m, **kwargs):
            return getattr(g, m)(**kwargs)
    
        pdf = pd.DataFrame(
            {
                "a": [1, 2, 6, 4, 4, 6, 4, 3, 7] * 10,
                "b": [4, 2, 7, 3, 3, 1, 1, 1, 2] * 10,
                "d": [0, 1, 2, 3, 4, 5, 6, 7, 8] * 10,
                "c": [0, 1, 2, 3, 4, 5, 6, 7, 8] * 10,
            },
            columns=["c", "b", "a", "d"],
        )
    
        ddf = dd.from_pandas(pdf, npartitions=10)
    
        # covariance only works with N+1 columns
        if agg_func not in ("cov", "corr"):
            assert_eq(
                call(pdf.groupby(grouper(pdf))["c"], agg_func),
                call(ddf.groupby(grouper(ddf))["c"], agg_func, split_every=2),
            )
    
        # not supported by pandas
        if agg_func != "nunique":
            assert_eq(
                call(pdf.groupby(grouper(pdf))[["c", "d"]], agg_func),
                call(ddf.groupby(grouper(ddf))[["c", "d"]], agg_func, split_every=2),
            )
    
            if agg_func in ("cov", "corr"):
                # there are sorting issues between pandas and chunk cov w/dask
                df = call(pdf.groupby(grouper(pdf)), agg_func).sort_index()
                cols = sorted(list(df.columns))
                df = df[cols]
>               dddf = call(ddf.groupby(grouper(ddf)), agg_func, split_every=2).compute()

dask/dataframe/tests/test_groupby.py:1138: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dask/base.py:292: in compute
    (result,) = compute(self, traverse=False, **kwargs)
dask/base.py:575: in compute
    results = schedule(dsk, keys, **kwargs)
dask/threaded.py:81: in get
    results = get_async(
dask/local.py:506: in get_async
    raise_exception(exc, tb)
dask/local.py:314: in reraise
    raise exc
dask/local.py:219: in execute_task
    result = _execute_task(task, data)
dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
dask/core.py:149: in get
    result = _execute_task(task, cache)
dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
dask/dataframe/groupby.py:449: in _cov_chunk
    mul = g.apply(_mul_cols, cols=cols).reset_index(level=-1, drop=True)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1414: in apply
    result = self._python_apply_general(f, self._selected_obj)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1460: in _python_apply_general
    return self._wrap_applied_output(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/generic.py:1007: in _wrap_applied_output
    return self._concat_objects(values, not_indexed_same=not_indexed_same)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1058: in _concat_objects
    result = concat(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/util/_decorators.py:311: in wrapper
    return func(*args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/reshape/concat.py:346: in concat
    op = _Concatenator(
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/reshape/concat.py:420: in __init__
    keys = type(keys).from_tuples(clean_keys, names=keys.names)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:204: in new_meth
    return meth(self_or_cls, *args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:566: in from_tuples
    return cls.from_arrays(arrays, sortorder=sortorder, names=names)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/multi.py:489: in from_arrays
    codes, levels = factorize_from_iterables(arrays)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2881: in factorize_from_iterables
    codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2881: in <genexpr>
    codes, categories = zip(*(factorize_from_iterable(it) for it in iterables))
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2854: in factorize_from_iterable
    cat = Categorical(values, ordered=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:451: in __init__
    dtype = CategoricalDtype(categories, dtype.ordered)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:183: in __init__
    self._finalize(categories, ordered, fastpath=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:337: in _finalize
    categories = self.validate_categories(categories, fastpath=fastpath)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/dtypes/dtypes.py:526: in validate_categories
    categories = Index._with_infer(categories, tupleize_cols=False)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:680: in _with_infer
    result = cls(*args, **kwargs)
/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:494: in __new__
    arr = _maybe_cast_data_without_dtype(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

subarr = array([1, 2, 3, 4, 6, 7], dtype=object), cast_numeric_deprecated = True

    def _maybe_cast_data_without_dtype(
        subarr: np.ndarray, cast_numeric_deprecated: bool = True
    ) -> ArrayLike:
        """
        If we have an arraylike input but no passed dtype, try to infer
        a supported dtype.
    
        Parameters
        ----------
        subarr : np.ndarray[object]
        cast_numeric_deprecated : bool, default True
            Whether to issue a FutureWarning when inferring numeric dtypes.
    
        Returns
        -------
        np.ndarray or ExtensionArray
        """
    
        result = lib.maybe_convert_objects(
            subarr,
            convert_datetime=True,
            convert_timedelta=True,
            convert_period=True,
            convert_interval=True,
            dtype_if_all_nat=np.dtype("datetime64[ns]"),
        )
        if result.dtype.kind in ["i", "u", "f"]:
            if not cast_numeric_deprecated:
                # i.e. we started with a list, not an ndarray[object]
                return result
    
>           warnings.warn(
                "In a future version, the Index constructor will not infer numeric "
                "dtypes when passed object-dtype sequences (matching Series behavior)",
                FutureWarning,
                stacklevel=3,
            )
E           FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

/usr/share/miniconda3/envs/test-environment/lib/python3.9/site-packages/pandas/core/indexes/base.py:7137: FutureWarning

https://github.com/dask/dask/runs/5468641701?check_suite_focus=true
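
For anyone triaging this: the failure itself is the pandas FutureWarning, which an error-on-warnings filter turns into a test error (that is what the E FutureWarning line shows). pandas 1.4 deprecated inferring numeric dtypes when the Index constructor is given an object-dtype array, and the groupby-apply inside _cov_chunk happens to hit that path while pandas builds the MultiIndex for the concatenated result. Below is a minimal sketch of the underlying deprecation, assuming pandas 1.4.x; it triggers the warning directly rather than through the real GroupBy.apply / concat code path.

# Minimal sketch of the pandas deprecation behind this failure (assumes
# pandas 1.4.x). The failing test hits it indirectly, via GroupBy.apply ->
# pd.concat building a MultiIndex from object-dtype group keys; constructing
# an Index directly from such an array goes through the same inference path.
import warnings

import numpy as np
import pandas as pd

# Mimic an error-on-warnings test configuration so the FutureWarning
# surfaces as an exception, the way it does on CI.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    keys = np.array([1, 2, 3, 4, 6, 7], dtype=object)  # values from the traceback above
    try:
        pd.Index(keys)  # pandas 1.4 infers int64 here and emits the FutureWarning
    except FutureWarning as exc:
        print(f"FutureWarning raised: {exc}")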

jsignell added the tests label on Mar 9, 2022
github-actions bot added the needs attention label on Apr 11, 2022
jrbourbeau self-assigned this on Aug 4, 2022
hendrikmakait mentioned this issue on Aug 8, 2022
phobson (Contributor) commented Oct 6, 2022

Just popping in to say that I'm seeing this manifest in a few recent PRs, e.g., https://github.com/dask/dask/actions/runs/3165861097/jobs/5155195568#step:7:22222

jrbourbeau (Member) commented

Hopefully closed via #9701

hendrikmakait (Member) commented Jan 13, 2023

The test popped up several times on CI again, though with a different parametrization: #9793 (comment). Should we reopen this or create a new issue?
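
Whichever way we track it, a possible stopgap while the root cause is investigated would be to allow this one pandas deprecation warning for the affected test via pytest.mark.filterwarnings. The snippet below is only a sketch of that idea, using a hypothetical standalone test; it is not the parametrized test from the suite and not what #9701 did.

import numpy as np
import pandas as pd
import pytest

# Hypothetical illustration: the mark downgrades just this pandas deprecation
# for one test, so a suite-wide error-on-warnings filter no longer fails it.
@pytest.mark.filterwarnings(
    "ignore:In a future version, the Index constructor:FutureWarning"
)
def test_index_inference_deprecation_is_tolerated():
    # Object-dtype keys like the ones in the traceback; pandas 1.4 warns when
    # they are turned into an Index, but the mark keeps that from becoming an
    # error.
    keys = np.array([1, 2, 3, 4, 6, 7], dtype=object)
    pd.Index(keys)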
