read_ipc won't load Feather V2 files #1488

Closed
tobi-lipede-oodle opened this issue Oct 5, 2021 · 3 comments

Comments

@tobi-lipede-oodle

Are you using Python or Rust?

Python

Which feature gates did you use?

This can be ignored by Python users.

What version of polars are you using?

0.7.15

What operating system are you using polars on?

macOS Big Sur

Describe your bug.

Following the code for read_ipc with use_pyarrow=True, it looks like DataFrame.from_arrow is called on the result of pa.feather.read_feather. Should it instead be called on the result of pa.feather.read_table? The former returns a pandas DataFrame, while the latter returns an Arrow table.

When use_pyarrow is False, the file loads, but with incorrect values.

What are the steps to reproduce the behavior?

Example

import polars as pl
import pandas as pd
import pyarrow as pa
import numpy as np

# Create a simple dataset on which we can reproduce the bug.
pd.DataFrame({
    "foo": [None, 1, 2],
    "bar": np.arange(3)
}).to_feather("test.arrow")

# Fails
pl.read_ipc("test.arrow")

# Works - Output 1
tbl = pa.feather.read_table("test.arrow")
print(pl.DataFrame.from_arrow(tbl))

# Loads, but with incorrect values - Output 2
print(pl.read_ipc("test.arrow", use_pyarrow=False))

Output 1:

shape: (3, 2)
╭──────┬─────╮
│ foo  ┆ bar │
│ ---  ┆ --- │
│ f64  ┆ i64 │
╞══════╪═════╡
│ null ┆ 0   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2    ┆ 2   │
╰──────┴─────╯

Output 2:

shape: (3, 2)
╭──────┬─────────────────────╮
│ foo  ┆ bar                 │
│ ---  ┆ ---                 │
│ f64  ┆ i64                 │
╞══════╪═════════════════════╡
│ 0.0  ┆ 24                  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ 1261641627085906436 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ null ┆ 1369095386551025664 │
╰──────┴─────────────────────╯
@ritchie46
Member

You are running a very old version of polars. Could you try to update to 0.9.12 and try again?
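
A quick way to check for this situation is to compare the installed version against a minimum. The sketch below is only an illustration: the helper names are hypothetical, the 0.9.12 threshold is simply the version suggested above, and the parser assumes plain dotted numeric versions such as 0.7.15.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.9.12' into (0, 9, 12). Assumes purely numeric dotted parts."""
    return tuple(int(part) for part in text.split("."))


def is_outdated(installed: str, minimum: str) -> bool:
    """Return True if `installed` is strictly older than `minimum`."""
    return parse_version(installed) < parse_version(minimum)


def installed_polars_version() -> Optional[str]:
    """Return the installed polars version string, or None if it is absent."""
    try:
        return version("polars")
    except PackageNotFoundError:
        return None


found = installed_polars_version()
if found is None:
    print("polars is not installed in this environment")
else:
    try:
        state = "outdated" if is_outdated(found, "0.9.12") else "recent enough"
        print(f"polars {found} is {state}")
    except ValueError:
        # Pre-release or otherwise non-numeric version strings are not handled.
        print(f"polars {found}: could not compare this version string")
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would rank "0.10.0" below "0.9.12".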

@ghuls
Collaborator

ghuls commented Oct 5, 2021

The read_feather problem was fixed quite a while ago: #623

The problem with your Feather file is that it uses compression, which was not supported in arrow-rs (recent versions of polars use arrow2, which supports compression):
apache/arrow-rs#286

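Writing the file without compression would have let that old version of polars load it:

# compression="uncompressed" turns compression off; compression=None would
# fall back to pyarrow's default (LZ4 for V2 files when available).
pd.DataFrame({
    "foo": [None, 1, 2],
    "bar": np.arange(3)
}).to_feather("test.arrow", compression="uncompressed")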

With the latest version of polars:

pd.DataFrame({
    "foo": [None, 1, 2],
    "bar": np.arange(3)
}).to_feather("test.arrow")

pl.read_ipc("test.arrow", use_pyarrow=False)

Out[1]: 
shape: (3, 2)
┌──────┬─────┐
│ foo  ┆ bar │
│ ---  ┆ --- │
│ f64  ┆ i64 │
╞══════╪═════╡
│ null ┆ 0   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1   │
├╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2    ┆ 2   │
└──────┴─────┘

@tobi-lipede-oodle
Author

You're both right - not sure how I managed to install such an old version. Will close this, thanks.
