Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build(python): Update to PyO3 to 0.18.0 #6531

Merged
merged 8 commits into from
Jan 30, 2023

Conversation

jjerphan
Copy link
Contributor

Follow-up of #4794

@github-actions github-actions bot added chore Maintenance work that does not impact the user python Related to Python Polars labels Jan 28, 2023
@jjerphan jjerphan force-pushed the chore-python/update-to-py03-0.18.0 branch from 5ed32c9 to 3dfd28f Compare January 28, 2023 16:02
@stinodego stinodego changed the title chore(python): Update to py03 0.18.0 build(python): Update to PyO3 to 0.18.0 Jan 28, 2023
@stinodego stinodego removed the chore Maintenance work that does not impact the user label Jan 28, 2023
@github-actions github-actions bot added the build Changes that affect the build system or external dependencies label Jan 28, 2023
@jjerphan
Copy link
Contributor Author

I still have tests failure because NumPy arrays containing strings cannot be converted to pl.Series. I am currently trying to understand why this is the case.

@ritchie46
Copy link
Member

I believe it is related to PyO3/pyo3#2615

This blocked previous upgrade.

@ritchie46
Copy link
Member

@jjerphan maybe we can add a new constructor specialized for this case?

Or we could ask upstream what is best way to do this in latest release?

@jjerphan
Copy link
Contributor Author

jjerphan commented Jan 28, 2023

I guess we can ask upstream and in the mean time have a new constructor which has been specialized for this case with a comment indicating that such a constructor is temporary.

What do you think of this approach?

@ritchie46
Copy link
Member

Yes, I think that's a good idea.

@jjerphan
Copy link
Contributor Author

jjerphan commented Jan 28, 2023

Perfect.

I am busy now, but I should be able to have a look at it tomorrow.

I also have followed-up discussions with PyO3/pyo3#2615 (comment).

@adamreichold
Copy link
Contributor

Could someone point me to where the entry point for those failing tests is so that I can check which downcast/extraction is failing exactly? Thank you!

@ritchie46
Copy link
Member

@adamreichold I can come back to you in an hour. 👍

@adamreichold
Copy link
Contributor

Could someone point me to where the entry point for those failing tests is so that I can check which downcast/extraction is failing exactly? Thank you!

So by digging and observing that this seems to affect only NumPy arrays of strings, I suspect that this comes from PySeries::new_str which extracts Wrap<Utf8Chunked>, right? (rust-numpy does not support the <U* dtypes yet.)

@adamreichold
Copy link
Contributor

adamreichold commented Jan 28, 2023

Could someone point me to where the entry point for those failing tests is so that I can check which downcast/extraction is failing exactly? Thank you!

So by digging and observing that this seems to affect only NumPy arrays of strings, I suspect that this comes from PySeries::new_str which extracts Wrap<Utf8Chunked>, right? (rust-numpy does not support the <U* dtypes yet.)

I think I found the issue, your various impls like

impl<'a, T> FromPyObject<'a> for Wrap<ChunkedArray<T>>

use a helper function

pub(crate) fn get_pyseq(obj: &PyAny) -> PyResult<(&PySequence, usize)> {
    let seq = <PySequence as PyTryFrom>::try_from(obj)?;
    let len = seq.len()?;
    Ok((seq, len))
}

and this will indeed fail with NumPy arrays as we do not downcast those to PySequence any more after PyO3/pyo3#2477 because they do not really support the sequence API (even though they do pass PySequence_Check).

But I also think that your code does not actually use the sequence interface as both .len() and .iter() work on any PyAny. So dropping get_pyseq together with mechanical changes like

diff --git a/py-polars/src/conversion.rs b/py-polars/src/conversion.rs
index 463b65371..575d69fbd 100644
--- a/py-polars/src/conversion.rs
+++ b/py-polars/src/conversion.rs
@@ -125,10 +125,10 @@ impl<'a> FromPyObject<'a> for Wrap<BooleanChunked> {
 
 impl<'a> FromPyObject<'a> for Wrap<Utf8Chunked> {
     fn extract(obj: &'a PyAny) -> PyResult<Self> {
-        let (seq, len) = get_pyseq(obj)?;
+        let len = obj.len()?;
         let mut builder = Utf8ChunkedBuilder::new("", len, len * 25);
 
-        for res in seq.iter()? {
+        for res in obj.iter()? {
             let item = res?;
             match item.extract::<&str>() {
                 Ok(val) => builder.append_value(val),

should work.

@ritchie46
Copy link
Member

Thanks a lot @adamreichold. I believe that was the issue indeed. Will come back with a confirmation. I

jjerphan and others added 2 commits January 29, 2023 09:36
@jjerphan
Copy link
Contributor Author

It looks like default values for all optional parameters must be provided as "required arguments after an Option<_> argument are ambiguous and being phased out".

@ritchie46
Copy link
Member

ritchie46 commented Jan 29, 2023

It looks like default values for all optional parameters must be provided as "required arguments after an Option<_> argument are ambiguous and being phased out".

Yes, everything with an Option now needs a attribute defining the arguments:

#[pyo3(signature=(a, b))]
fn do(a: u64, b: Option<u64>) {}  

@jjerphan
Copy link
Contributor Author

Is there a simple way to infer default values for signatures' arguments?

@ritchie46
Copy link
Member

In polars rust side all arguments are always required. So signature should just name all arguments.

@jjerphan
Copy link
Contributor Author

jjerphan commented Jan 29, 2023

At the moment, I get this kind of warning:

warning: use of deprecated constant `pyo3::impl_::deprecations::REQUIRED_ARGUMENT_AFTER_OPTION`: required arguments after an `Option<_>` argument are ambiguous and being phased out
         = help: add a `#[pyo3(signature)]` annotation on this function to unambiguously specify the default values for all optional parameters
  --> src/batched_csv.rs:25:9
   |
25 |         chunk_size: usize,
   |         ^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default

I've tried to only specify the list of arguments' names, but I get other errors.

It seems that default values for required arguments must be provided. For instance, I get the following error:

  error: expected argument from function definition `lambda` but got argument `py`
    --> src/lazy/dsl.rs:1048:25
     |
1048 |     #[pyo3(signature = (py, lambda, window_size, weights, min_periods, center, /))]
     |                         ^^

Edit: py: Python arguments must be omitted.

@ritchie46
Copy link
Member

Mainly using the '#[pyo3(signature = ...)]' macro.

See: https://pyo3.rs/v0.18.0/migration
I might be wrong, or there might be a naming discrepancy.
@jjerphan jjerphan marked this pull request as ready for review January 29, 2023 11:06
@jjerphan
Copy link
Contributor Author

Everything should be up to date.

Please, let me know if '#[pyo3(signature = ...)]' macros are properly formatted.

Copy link
Contributor Author

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some questions regarding some of the proposed changes.

polars/polars-lazy/Cargo.toml Show resolved Hide resolved
polars/polars-lazy/polars-plan/Cargo.toml Show resolved Hide resolved
py-polars/Cargo.toml Show resolved Hide resolved
py-polars/src/lazy/dataframe.rs Show resolved Hide resolved
py-polars/src/conversion.rs Show resolved Hide resolved
@@ -230,6 +230,7 @@ def _scan_parquet(
rechunk,
_prepare_row_count_args(row_count_name, row_count_offset),
low_memory,
cloud_options=storage_options,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change wrong, or is there a naming discrepancy here?

py-polars/src/npy.rs Show resolved Hide resolved
@jjerphan jjerphan mentioned this pull request Jan 29, 2023
@ritchie46
Copy link
Member

Thanks a lot @jjerphan and @adamreichold for the help.

Let me know when it is good to go.

@jjerphan
Copy link
Contributor Author

Thank you for your guidance, @ritchie46 and @adamreichold.

I think this is mergeable.

@ritchie46 ritchie46 merged commit d43718d into pola-rs:master Jan 30, 2023
@jjerphan jjerphan deleted the chore-python/update-to-py03-0.18.0 branch January 30, 2023 09:16
cojmeister pushed a commit to cojmeister/polars that referenced this pull request Jan 30, 2023
Co-authored-by: Adam Reichold <adam.reichold@t-online.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Changes that affect the build system or external dependencies python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants