Handle missing values in dataframe with category dtype. #7331
* Replace -1 in pandas/cudf initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
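For context on the first item: pandas stores a missing entry in a categorical column as the integer code -1, so the codes cannot be passed through as category values unchanged. A minimal sketch of the replacement step, using plain Python lists in place of the real arrays; `codes_to_float` and `MISSING_CODE` are hypothetical names for illustration, not functions from this PR:

```python
import math

# pandas encodes a missing categorical entry as code -1.
MISSING_CODE = -1


def codes_to_float(codes):
    """Hypothetical helper: convert integer category codes to floats,
    mapping the -1 sentinel to NaN so it is treated as missing."""
    return [math.nan if c == MISSING_CODE else float(c) for c in codes]


codes = [0, 2, -1, 1]
converted = codes_to_float(codes)
print(converted)
```

The point of mapping to NaN rather than keeping -1 is that a negative value would otherwise look like an (invalid) category instead of a missing entry.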
TODO: move the transform out of the array interface getter.
This PR adds some more tests. We have a number of different cases:
Tests in C++ now cover both DMatrix/DDM and weighted/normal cases. Missing values with cuDF and pandas are tested in Python. I didn't expect this complexity when I was creating the interface; I should have been more thorough. Also, I should expose the
Will test dask in a different PR.
Codecov Report
@@            Coverage Diff             @@
##           master    #7331      +/-   ##
==========================================
- Coverage   83.68%   83.44%   -0.24%
==========================================
  Files          13       13
  Lines        3885     3920      +35
==========================================
+ Hits         3251     3271      +20
- Misses        634      649      +15
@hcho3 Could you please take another look?
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
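The validity and category checks listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the C++ code in this PR: it assumes a value counts as missing when it is NaN or equals the user-supplied `missing` sentinel, and that a category is invalid when it is negative or non-finite. `is_valid` and `is_invalid_category` are hypothetical names.

```python
import math


def is_valid(value, missing=math.nan):
    """Assumed rule: a value is present unless it is NaN or equals `missing`."""
    if math.isnan(value):
        return False
    # When `missing` itself is NaN, only NaN values are treated as absent.
    return math.isnan(missing) or value != missing


def is_invalid_category(value):
    """Assumed rule: a category must be finite and non-negative."""
    return not math.isfinite(value) or value < 0


row = [0.0, math.nan, -1.0, 3.0]
kept = [v for v in row if is_valid(v)]               # NaN is dropped
bad = [v for v in kept if is_invalid_category(v)]    # -1.0 is flagged
kept_with_sentinel = [v for v in row if is_valid(v, missing=-1.0)]
print(kept, bad, kept_with_sentinel)
```

Using one predicate for every input path is what keeps pandas, cuDF and the sketching code in agreement about which entries are skipped; a -1 that survives the filter is then reported as an invalid category rather than silently used.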
Close #7329.
Depending on the difficulty of backporting, this can be part of the next patch release (1.5.1).