wip-feat: pandas as soft dependency #3384

mattijn · 2024-03-25T23:18:11Z

This PR is an attempt to make pandas a soft dependency. I hope it can be used as inspiration, as I was not able to make the types happy. I've no real idea how it should be done, but I've been trying a few things, some with success and others without.

I also tried to prioritize the DataFrameLike approach over the pandas routine, but decided against it, because using a pandas DataFrame within Altair would then require pyarrow to infer types and serialize the data. My current feeling is that using pandas itself for inference and serialization is still preferable, since that path does not depend on pyarrow.

binste · 2024-03-29T06:55:17Z

Great to get the ball rolling on this, thank you @mattijn! I did not yet have time to review but just wanted to say that I'm happy to have a look at the types once I get to it. As long as the package works, I'm optimistic that we can make mypy happy.

mattijn · 2024-03-29T07:00:32Z

Thanks @binste! No rush! Maybe something for version 5.4

binste

Just some first comments. I haven't had the chance yet to run mypy on this PR (reviewed it in the browser) but I have some ideas how to make it work which I want to try out depending on the errors it throws.

binste · 2024-04-21T14:43:54Z

altair/utils/_importers.py

+
+
+def import_pandas() -> ModuleType:
+    min_version = "0.25"


Could you add a comment in pyproject.toml, next to the pandas requirement, saying that if the pandas version is updated there, it also needs to be changed here? Although I'm realizing now that that file needs to be changed anyway to make pandas optional.

binste · 2024-04-21T14:51:42Z

altair/_magics.py

        return curried.pipe(data, data_transformers.get())
    elif isinstance(data, str):
        return {"url": data}
+    elif _is_pandas_dataframe(data):


Is my understanding correct that this line is only reached if it's an old Pandas version which does not support the dataframe interchange protocol? Else it would already stop at line 43, right?

If yes, could you add a comment about this?

binste · 2024-04-21T14:52:15Z

altair/utils/core.py

@@ -53,6 +52,11 @@ def __dataframe__(
    ) -> DfiDataFrame: ...


+def _is_pandas_dataframe(obj: Any) -> bool:


Could this function be a simple isinstance(obj, pd.DataFrame)?

mattijn · 2024-04-24

Thanks for starting to review this PR, @binste! I don't think I can do this without importing pandas first.

I tried setting up a function that I could use for some duck typing:

def instance(obj): return type(obj).__name__

But I found out that both polars and pandas use the type name DataFrame for their dataframes.

binste · 2024-04-24

Maybe I'm missing something, but couldn't we call the pandas import function you created in here? If it raises an ImportError, we know it's not a pandas DataFrame anyway.

mattijn · 2024-04-24

It's pragmatic, I admit. But it would mean an unnecessary import of pandas whenever pandas is available in the environment but the data object is something else.
I wish we could sniff the type without importing modules first.

jonmmease · 2024-04-24

Here's the optional import logic I added to plotly.py a while back: https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/_plotly_utils/optional_imports.py
If should_load is False, it won't perform the import even if the library is installed. This was used with isinstance checks, because if pandas hasn't been loaded yet, you know the object you're dealing with isn't a pandas DataFrame, even if pandas is installed.

binste · 2024-04-21T14:57:47Z

altair/utils/_importers.py

+        return pd
+    except ImportError as err:
+        raise ImportError(
+            f"Serialization of the DataFrame requires\n"


Suggested change

-            f"Serialization of the DataFrame requires\n"
+            f"Serialization of this data requires\n"

It can also be a dict, as in _data_to_csv_string in data.py. Furthermore, if it's a DataFrame, pandas is necessarily installed already.

binste · 2024-04-21T15:04:51Z

altair/utils/schemapi.py

+if TYPE_CHECKING:
+    pass
+


Suggested change

-if TYPE_CHECKING:
-    pass

Aware that it's just a wip PR, thought I'd just note it anyway :)

binste · 2024-04-21T15:13:22Z

altair/utils/schemapi.py

+class _PandasTimestamp:
+    def isoformat(self):
+        return "dummy_isoformat"  # Return a dummy ISO format string


I think this should inherit from Protocol, as a pd.Timestamp is not an instance of _PandasTimestamp. You'll then also need to add the @runtime_checkable decorator from typing. Also, we could test directly for a pandas timestamp in a function similar to _is_pandas_dataframe, to keep these approaches consistent.
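A minimal sketch of the Protocol variant suggested here, using only the standard library. `SupportsIsoformat` and `to_iso_string` are hypothetical names for this example. Note that a runtime-checkable Protocol matches any object with an `isoformat` method, so `datetime.date` and `datetime.datetime` (which pd.Timestamp subclasses) qualify as well.

```python
import datetime
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsIsoformat(Protocol):
    """Structural type for anything exposing an isoformat() method."""

    def isoformat(self) -> str: ...


def to_iso_string(value: object) -> str:
    # isinstance against a runtime-checkable Protocol only checks that the
    # method exists; it does not verify the signature or return type.
    if isinstance(value, SupportsIsoformat):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} has no isoformat() method")
```

Because the check is structural, no pandas import is needed for it to accept a pandas timestamp.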

binste · 2024-04-21T15:18:47Z

tests/utils/test_core.py

@@ -4,11 +4,11 @@

 import numpy as np
 import pandas as pd
+from pandas.api.types import infer_dtype


Let's make the tests also run without pandas installed, so that we can run the whole test suite once with pandas installed and once without. That prevents us from accidentally reintroducing a hard dependency in the future.
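One way to structure such tests, sketched here with the standard library's unittest so the example is self-contained; the test names are made up for illustration. With pytest, `pytest.importorskip("pandas")` is the usual way to get the same effect.

```python
import importlib.util
import json
import unittest

# find_spec locates the package without importing it.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None


class TestWithOptionalPandas(unittest.TestCase):
    def test_dict_data_needs_no_pandas(self):
        # Must pass in both environments.
        self.assertEqual(json.loads(json.dumps({"url": "data.csv"}))["url"], "data.csv")

    @unittest.skipUnless(HAS_PANDAS, "pandas is not installed")
    def test_infer_dtype(self):
        # Import inside the test so loading this module never needs pandas.
        from pandas.api.types import infer_dtype

        self.assertEqual(infer_dtype([1, 2, 3]), "integer")
```

The key point is that pandas imports move from module level into the tests that need them, so the module still loads in a pandas-free environment.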

mattijn added 10 commits March 24, 2024 23:44

adapt tools files

110d4cd

changes from rerun generate_schema_wrapper

f40d951

add importer for pandas

ba4b778

prioritize DataFrameLike, use the pandas importer only when needed.

21910b1

prioritze DataFrameLike, check pandas dataframe using iloc, columns a…

88a8870

…nd index attributes over type

ruff

c14d94a

ruff format

d00cd10

relocate function

bd70bf4

prioritze pd.dataframe, currently no dependency on pyarrow

8b41305

ruff

62ab14d

jonmmease mentioned this pull request Mar 26, 2024

fix: Support falling back to pandas when pyarrow is installed but too old #3387

Merged

joelostblom added the enhancement label Mar 29, 2024

