Parquet error: Invalid offset in sparse column chunk data #8092

Closed
b4l opened this issue Nov 8, 2023 · 6 comments · Fixed by #8029
Labels
bug Something isn't working

Comments

@b4l

b4l commented Nov 8, 2023

Describe the bug

Querying a registered Parquet table sometimes fails with the error in the title. The failing query applies range predicates to multiple fields. The same query with different literals, where only one range affects the output because the others are effectively unbounded, works fine. This happens with both the SQL and DataFrame APIs.

To Reproduce

No response

Expected behavior

Query returns result.

Additional context

No response

@b4l b4l added the bug Something isn't working label Nov 8, 2023
@tustvold
Contributor

tustvold commented Nov 8, 2023

This should have been fixed by apache/arrow-rs#5036

Perhaps you could try out #8029

@b4l
Author

b4l commented Nov 8, 2023

@tustvold thanks! It is indeed working as expected with #8029. Looking forward to the release.

@alamb
Contributor

alamb commented Nov 13, 2023

FWIW we have released arrow 48.0.1 (see apache/arrow-rs#5050) so a cargo update in your project should get the fix @b4l

@alamb alamb closed this as completed Nov 13, 2023
@alamb
Contributor

alamb commented Nov 13, 2023

Please let us know if that doesn't work

@liukun4515
Contributor

We should add a comment about this to this issue or the related PR.

If users don't want to update arrow-rs or DataFusion to resolve this issue, they can work around the bug by disabling the page index: set enable_page_index to false.

cc @alamb @tustvold

@alamb
Contributor

alamb commented Dec 15, 2023

> If users don't want to update arrow-rs or DataFusion to resolve this issue, they can work around the bug by disabling the page index: set enable_page_index to false.

That is a good point

The workaround suggested by @liukun4515 can be implemented like this:

SQL

set datafusion.execution.parquet.enable_page_index = false;
0 rows in set. Query took 0.000 seconds.

Programmatically

Set ParquetOptions::enable_page_index to false.
