[BUG] chunked parquet reader is not factoring empty dataframes with >0 columns present
#15743
Comments
Hi @galipremsagar I have been looking into this and it just involves handling a bunch of logic. I have a couple of small questions before I implement the solution.
CC: @nvdbaranec @GregoryKimball for vis
It is always good not to throw a runtime error, but that is less pressing for me. If we can fix it, it'll be a bonus.
I would expect `has_next` to return true and `read_chunk` to return an empty table from the libcudf layer. Right now `read_chunk` does return an empty table, but it does so indefinitely while `has_next` keeps returning False.
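To illustrate why a `has_next` that returns False matters for an empty table, here is a minimal sketch in plain Python. `PreFixEmptyReader` and `read_all` are hypothetical stand-ins written for this illustration; they are not the cudf or libcudf API, and the table is modelled as a dict of column name to list. The sketch only mimics the behaviour reported in this issue.

```python
class PreFixEmptyReader:
    """Hypothetical stand-in that mimics the reported behaviour on a 0-row table."""

    def __init__(self, columns):
        self._columns = list(columns)

    def has_next(self):
        # Reported behaviour: False even though nothing has been read yet.
        return False

    def read_chunk(self):
        # Reported behaviour: an empty table every time, with no error.
        return {name: [] for name in self._columns}


def read_all(reader):
    """A typical consumer loop: keep reading while has_next() is true."""
    chunks = []
    while reader.has_next():
        chunks.append(reader.read_chunk())
    return chunks


reader = PreFixEmptyReader(["a", "b"])
chunks = read_all(reader)          # the loop body never runs
direct = reader.read_chunk()       # the columns are only visible on a direct read
```

In this model the consumer loop collects no chunks at all, so the column names of the empty table never reach the caller unless `read_chunk` is called without consulting `has_next` first.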
…read (#15757)
Fixes #15743
This PR solves two problems. First, reading an invalid (out-of-bounds) chunk via `chunked_parquet_reader::read_chunk()` no longer throws a CUDA failure or exception; it returns an empty chunk instead. Second, for empty tables, `has_next()` returns true until the first call to `chunked_parquet_reader::read_chunk()`. After that, `has_next()` returns false, but `chunked_parquet_reader::read_chunk()` keeps returning empty chunks.
Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Karthikeyan (https://github.com/karthikeyann)
URL: #15757
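The semantics described in this commit message can be sketched as a small state machine. This is a rough model in plain Python under stated assumptions, not the libcudf implementation: `ChunkedReaderModel`, its constructor arguments, and `read_all` are hypothetical names, rows are tuples, and a chunk is a dict of column name to list.

```python
class ChunkedReaderModel:
    """Hypothetical model of the post-fix has_next()/read_chunk() contract."""

    def __init__(self, columns, rows, chunk_rows):
        self._columns = list(columns)
        self._rows = list(rows)        # each row is a tuple, one value per column
        self._chunk_rows = chunk_rows  # maximum rows per chunk (assumed knob)
        self._pos = 0
        self._read_once = False

    def has_next(self):
        if not self._rows:
            # Empty table: true until the first read_chunk() call.
            return not self._read_once
        return self._pos < len(self._rows)

    def read_chunk(self):
        self._read_once = True
        # Slicing past the end yields an empty batch, so an out-of-bounds
        # read returns an empty chunk instead of raising.
        batch = self._rows[self._pos:self._pos + self._chunk_rows]
        self._pos += len(batch)
        return {name: [row[i] for row in batch]
                for i, name in enumerate(self._columns)}


def read_all(reader):
    """Consumer loop: read chunks while has_next() is true."""
    chunks = []
    while reader.has_next():
        chunks.append(reader.read_chunk())
    return chunks


empty = ChunkedReaderModel(["a", "b"], [], chunk_rows=2)
empty_chunks = read_all(empty)   # exactly one zero-row chunk with both columns

full = ChunkedReaderModel(["a", "b"], [(1, "x"), (2, "y"), (3, "z")], chunk_rows=2)
full_chunks = read_all(full)     # two chunks: two rows, then one row
```

With this contract, the usual `while has_next()` loop yields one zero-row chunk for an empty table, so the column names survive, and reading past the end is harmless in both cases.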
Describe the bug
A dataframe can have >0 columns when it has 0 rows. There are two issues at play here:
- For an empty dataframe, `has_next` returns `False`, but `read_chunk` correctly returns an empty dataframe.
- For a non-empty dataframe, once every chunk has been read, `has_next` returns `False` and `read_chunk` raises a `RuntimeError`, as expected. But in the case of empty dataframes, `has_next` returns `False` and `read_chunk` endlessly keeps returning the empty dataframe without any error.
Steps/Code to reproduce bug
Environment overview (please complete the following information)