[BUG] MG Property Graph add_vertex_data crashes #2685

Closed
BradReesWork opened this issue Sep 12, 2022 · 5 comments

@BradReesWork
Member

Describe the bug
When I try to add vertex data to an MG Property Graph, add_vertex_data crashes with the following traceback:

Cell In [21], line 17
13 #ddf = gdf
15 print(f"read recs {start_id} to {end_id} and now adding to PG")
---> 17 pG.add_vertex_data(ddf, vertex_col_name='id', type_name='paper')
19 #print(f"PG now contains {pG.get_num_vertices()} ")
22 rec_read = end_id

File ~/anaconda3/envs/cugraph_dev/lib/python3.9/site-packages/cugraph-22.8.0a0+166.gd98ddc69-py3.9-linux-x86_64.egg/cugraph/dask/structure/mg_property_graph.py:405, in EXPERIMENTAL__MGPropertyGraph.add_vertex_data(self, dataframe, vertex_col_name, type_name, property_columns)
398 # Ensure that both the predetermined vertex ID column name and vertex
399 # type column name are present for proper merging.
400
401 # NOTE: This copies the incoming DataFrame in order to add the new
402 # columns. The copied DataFrame is then merged (another copy) and then
403 # deleted when out-of-scope.
404 tmp_df = dataframe.copy()
--> 405 tmp_df[self.vertex_col_name] = tmp_df[vertex_col_name]
406 # FIXME: handle case of a type_name column already being in tmp_df
407 tmp_df[self.type_col_name] = type_name
...
File ~/anaconda3/envs/cugraph_dev/lib/python3.9/site-packages/numpy/core/_methods.py:44, in _amin(a, axis, out, keepdims, initial, where)
42 def _amin(a, axis=None, out=None, keepdims=False,
43 initial=_NoValue, where=True):
---> 44 return umr_minimum(a, axis, None, out, keepdims, initial, where)

TypeError: '<=' not supported between instances of 'str' and 'int'

@BradReesWork added the "bug" and "? - Needs Triage" labels and removed the "? - Needs Triage" label on Sep 12, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Sep 12, 2022
@BradReesWork BradReesWork added this to the 22.10 milestone Sep 12, 2022
@BradReesWork BradReesWork removed this from Needs prioritizing in Bug Squashing Sep 12, 2022
@eriknw
Contributor

eriknw commented Sep 12, 2022

Thanks. I can reproduce. This is actually an error in dask. Here is an example that goes through a similar code path and gives the same error:

import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], 1:[5, 6]})
ddf = dd.from_pandas(df, npartitions=2)
ddf["c"] = df["a"]  # <-- gives the error you see

df.mean(axis=0)
ddf.mean(axis=0)  # <-- gives similar error

A workaround is to have all column names be the same dtype:

gdf.columns = gdf.columns.astype(str)
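
To illustrate why mixed column labels are a problem and what the workaround achieves, here is a minimal pure-Python sketch. It is only an analogy: the traceback shows the failure happening inside a numpy minimum reduction, and this sketch does not reproduce dask's internal code path. The helper names `has_mixed_label_types` and `stringify_labels` are hypothetical and exist only for this illustration.

```python
def has_mixed_label_types(columns):
    """Return True if the column labels are not all of one Python type."""
    return len({type(label) for label in columns}) > 1


def stringify_labels(columns):
    """Return the column labels converted to strings, order preserved."""
    return [str(label) for label in columns]


# Same labels as the pandas reproducer above: two strings and one int.
labels = ["a", "b", 1]

try:
    # Ordering comparisons between str and int are undefined in Python 3,
    # so taking the minimum of mixed labels raises TypeError.
    min(labels)
    comparison_failed = False
except TypeError:
    comparison_failed = True

# After the workaround, every label is a str and comparisons are well defined.
safe_labels = stringify_labels(labels)
```

A check like `has_mixed_label_types(df.columns)` could be run before handing a DataFrame to add_vertex_data on an affected dask version, so that the problem surfaces with a clearer message than the TypeError above.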

@eriknw
Contributor

eriknw commented Sep 13, 2022

Fixing in dask/dask#9485

@eriknw
Contributor

eriknw commented Sep 20, 2022

This issue can be closed.

This is fixed in dask version 2022.9.1, which was released on September 19.

@rlratzel
Contributor

We may need to pin a minimum version of dask: dask>=2022.9.1

Until then, the workaround is to use only strings for column names, with no mixing of strings, ints, etc.
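
As a rough sketch of what a runtime guard for the minimum version might look like, the snippet below compares the installed dask version against 2022.9.1 using only the standard library. The names `MIN_DASK`, `release_tuple`, `meets_minimum`, and `installed_dask_has_fix` are hypothetical, and this is not how cugraph itself enforces its dependencies. The parser also ignores any pre-release or local suffix, which is a simplification.

```python
import re
from importlib import metadata

MIN_DASK = "2022.9.1"  # first release containing the fix, per this thread


def release_tuple(version):
    """Parse the leading dotted numeric part of a version string into ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum=MIN_DASK):
    """Compare numerically, so that 2022.10.0 sorts after 2022.9.1."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_dask_has_fix():
    """Return True/False for the installed dask, or None if dask is absent."""
    try:
        return meets_minimum(metadata.version("dask"))
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here because dask uses calendar versioning: as strings, "2022.10.0" would sort before "2022.9.1".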

@rlratzel rlratzel modified the milestones: 22.10, 22.12 Oct 5, 2022
@rlratzel
Contributor

Closed via dask/dask#9485.
