polars.EagerPolarsDataset "calamine" engine error for reading Excel files #589
Comments
Hi @butterlyn, thanks for reporting! I suspect the error message is just unhelpful here. IIRC, the … Could you try …?
Hi @astrojuanlu, yes, I'm running …
Thanks for checking. Can this error be reproduced with https://github.com/kedro-org/kedro-starters/blob/main/spaceflights-pandas/%7B%7B%20cookiecutter.repo_name%20%7D%7D/data/01_raw/shuttles.xlsx?
Hi @astrojuanlu, I just downloaded the file and checked: yes, I'm getting the same error message.
Kedro datasets (generally) work by using …
>>> import fastexcel
>>> import fsspec
>>> import polars as pl
>>>
>>> fs = fsspec.filesystem('http')
>>> path = fs.open('https://github.com/kedro-org/kedro-viz/raw/main/demo-project/data/01_raw/shuttles.xlsx')
>>> pl.read_excel(path, engine='openpyxl')
shape: (77_096, 13)
┌───────┬───────────────────────────┬──────────────┬─────────────┬───┬──────────────────┬─────────────────────────┬──────────┬────────────┐
│ id ┆ shuttle_location ┆ shuttle_type ┆ engine_type ┆ … ┆ d_check_complete ┆ moon_clearance_complete ┆ price ┆ company_id │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ i64 │
╞═══════╪═══════════════════════════╪══════════════╪═════════════╪═══╪══════════════════╪═════════════════════════╪══════════╪════════════╡
│ 63561 ┆ Niue ┆ Type V5 ┆ Quantum ┆ … ┆ f ┆ f ┆ $1,325.0 ┆ 35029 │
│ 36260 ┆ Anguilla ┆ Type V5 ┆ Quantum ┆ … ┆ t ┆ f ┆ $1,780.0 ┆ 30292 │
│ 57015 ┆ Russian Federation ┆ Type V5 ┆ Quantum ┆ … ┆ f ┆ f ┆ $1,715.0 ┆ 19032 │
│ 14035 ┆ Barbados ┆ Type V5 ┆ Plasma ┆ … ┆ f ┆ f ┆ $4,770.0 ┆ 8238 │
│ 10036 ┆ Sao Tome and Principe ┆ Type V2 ┆ Plasma ┆ … ┆ f ┆ f ┆ $2,820.0 ┆ 30342 │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 4368 ┆ Barbados ┆ Type V5 ┆ Quantum ┆ … ┆ t ┆ f ┆ $4,107.0 ┆ 6654 │
│ 2983 ┆ Bouvet Island (Bouvetoya) ┆ Type F5 ┆ Quantum ┆ … ┆ t ┆ f ┆ $1,169.0 ┆ 8000 │
│ 69684 ┆ Micronesia ┆ Type V5 ┆ Plasma ┆ … ┆ t ┆ f ┆ $1,910.0 ┆ 14296 │
│ 21738 ┆ Uzbekistan ┆ Type V5 ┆ Plasma ┆ … ┆ t ┆ f ┆ $2,170.0 ┆ 27363 │
│ 72645 ┆ Malta ┆ Type F5 ┆ Quantum ┆ … ┆ t ┆ f ┆ $1,455.0 ┆ 12542 │
└───────┴───────────────────────────┴──────────────┴─────────────┴───┴──────────────────┴─────────────────────────┴──────────┴────────────┘
>>> pl.read_excel(path, engine='calamine')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/polars/_utils/deprecation.py", line 134, in wrapper
return function(*args, **kwargs)
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/polars/_utils/deprecation.py", line 134, in wrapper
return function(*args, **kwargs)
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/polars/io/spreadsheet/functions.py", line 253, in read_excel
return _read_spreadsheet(
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/polars/io/spreadsheet/functions.py", line 469, in _read_spreadsheet
reader_fn, parser, worksheets = _initialise_spreadsheet_parser(
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/polars/io/spreadsheet/functions.py", line 592, in _initialise_spreadsheet_parser
parser = fxl.read_excel(source, **engine_options)
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/fastexcel/__init__.py", line 202, in read_excel
return ExcelReader(_read_excel(expanduser(path)))
File "/opt/miniconda3/envs/polars-test/lib/python3.10/posixpath.py", line 232, in expanduser
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not HTTPFile
>>> fastexcel.read_excel(path)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/miniconda3/envs/polars-test/lib/python3.10/site-packages/fastexcel/__init__.py", line 202, in read_excel
return ExcelReader(_read_excel(expanduser(path)))
File "/opt/miniconda3/envs/polars-test/lib/python3.10/posixpath.py", line 232, in expanduser
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not HTTPFile
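The last frame of the traceback shows the cause: the fastexcel version in use calls os.path.expanduser(path), which starts with os.fspath(), and os.fspath() only accepts str, bytes or os.PathLike objects. A file-like object such as the fsspec HTTPFile is rejected, while the openpyxl engine happily reads from it. A minimal stdlib-only sketch reproducing the same TypeError, with io.BytesIO standing in for HTTPFile (path_only_reader and rejection_message are hypothetical names, not fastexcel API):

```python
import io
import os


def path_only_reader(source):
    """Hypothetical stand-in for a reader that, like the fastexcel version in
    the traceback, passes its argument straight to os.path.expanduser()."""
    # expanduser() begins with os.fspath(), which only accepts
    # str, bytes or os.PathLike objects.
    return os.path.expanduser(source)


def rejection_message(source):
    """Return the TypeError message if `source` is rejected, else None."""
    try:
        path_only_reader(source)
    except TypeError as exc:
        return str(exc)
    return None


# io.BytesIO plays the role of the fsspec HTTPFile from the traceback.
print(rejection_message(io.BytesIO(b"fake workbook bytes")))
# -> expected str, bytes or os.PathLike object, not BytesIO
print(rejection_message("~/shuttles.xlsx"))
# -> None (a plain path string is accepted)
```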
We can of course ask, but the Rust ecosystem has its own alternative to fsspec, called …
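Until the calamine engine accepts file-like objects, one possible workaround (an assumption, not something proposed in this thread) is to copy the remote file to a local temporary file and give the engine that path instead. A minimal stdlib-only sketch; as_local_path is a hypothetical helper, and the commented fsspec/polars usage is illustrative and not executed here:

```python
import io
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def as_local_path(file_obj, suffix=".xlsx"):
    """Hypothetical helper: copy a binary file-like object (e.g. one opened
    with fsspec) into a local temporary file and yield that file's path.

    The temporary file is removed when the ``with`` block exits.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Copy in chunks instead of reading everything into memory.
            shutil.copyfileobj(file_obj, tmp)
        yield tmp_path
    finally:
        os.remove(tmp_path)


# Illustrative usage (needs fsspec, polars and fastexcel; not run here):
#
#     import fsspec
#     import polars as pl
#
#     url = "https://github.com/kedro-org/kedro-viz/raw/main/demo-project/data/01_raw/shuttles.xlsx"
#     with fsspec.open(url, "rb") as remote, as_local_path(remote) as local:
#         df = pl.read_excel(local, engine="calamine")

if __name__ == "__main__":
    demo = io.BytesIO(b"fake workbook bytes")
    with as_local_path(demo) as local:
        print(os.path.getsize(local))  # 19
```

The trade-off is a full extra copy to local disk before parsing starts, which may eat into calamine's speed advantage for large remote files.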
Description
polars.EagerPolarsDataset cannot read Excel files using the "calamine" engine.
Context
The "calamine" engine (see: link) is significantly faster at reading Excel files than the other polars.read_excel() engines. This is a practical requirement for reading large Excel files without sacrificing performance.
Steps to Reproduce
1. kedro new
2. Add an .xlsx, .xls, or .xlsb file to the data/ directory
3. pip install -U polars[pyarrow]
4. pip install -U fastexcel (for the polars.read_excel() "calamine" engine)
5. catalog.yml: excel_input_file …
6. … as an input and run with kedro run
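A heavily simplified model of why the same file loads with one engine but not another inside Kedro. It assumes, as the REPL session above suggests, that polars.EagerPolarsDataset opens filepath through fsspec and hands the resulting file object (not the path) to polars.read_excel() together with load_args. Everything below is hypothetical: ENGINES, simplified_load, fake_open, both fake engines and the data/01_raw/shuttles.xlsx path are stand-ins, not the kedro-datasets or polars API.

```python
import io
import os


def file_object_engine(source):
    """Fake engine that reads from a file-like object, as the openpyxl and
    xlsx2csv engines evidently did in this issue."""
    return len(source.read())


def path_only_engine(source):
    """Fake engine that, like the fastexcel version in the traceback,
    insists on a filesystem path."""
    return os.fspath(source)


# Hypothetical engine registry; real polars engines are not used here.
ENGINES = {"xlsx2csv": file_object_engine, "calamine": path_only_engine}


def simplified_load(open_fn, filepath, load_args):
    """Hypothetical model of a dataset load: open `filepath` through a
    filesystem abstraction and pass the *file object*, not the path,
    to the reader selected by load_args["engine"]."""
    reader = ENGINES[load_args["engine"]]
    with open_fn(filepath) as fs_file:
        return reader(fs_file)


def fake_open(path):
    """Stands in for fsspec's fs.open(); ignores `path`, returns fake bytes."""
    return io.BytesIO(b"excel-bytes")


print(simplified_load(fake_open, "data/01_raw/shuttles.xlsx", {"engine": "xlsx2csv"}))  # 11
try:
    simplified_load(fake_open, "data/01_raw/shuttles.xlsx", {"engine": "calamine"})
except TypeError as exc:
    print(f"calamine: {exc}")
```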
Expected Result
kedro run to load the Excel file via polars.EagerPolarsDataset as per catalog.yml.
Actual Result
Error when using kedro run:
Note that when changing the polars.read_excel() engine to another one (e.g., xlsx2csv), polars.EagerPolarsDataset loads as expected.
Your Environment
- Kedro version (pip show kedro or kedro -V): 0.19.3
- Kedro plugin and version (pip show kedro-airflow): kedro-datasets 2.1.0
- Python version (python -V): 3.11.7