Automatically wrap `str` in a `vec![]` for `Vec<&str>` and `Vec<String>` #2500

jeertmans · 2022-07-07T11:06:35Z

EDIT

After some discussion (see below), it was proposed to raise an error when a str would otherwise be split into a Vec<T>.
Thus, only a dynamic type check is done, with PyAny::is_instance_of::<PyString>().

Old proposition

As discussed in #2342, this PR proposes a way to wrap str arguments inside a vector to avoid iterating through chars when it is not desired.

Currently, only Vec<String> works, as I could not get the Vec<&str> case to compile (that code is commented out).

This "solution" relies on the specialization feature, which is currently unstable. As such, it must be compiled with the nightly feature enabled and with the nightly channel of the Rust compiler.

Another solution, as discussed in #2342, would be to throw an error instead of wrapping in a vector.
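
To make the two options concrete, here is a minimal pure-Python model of them. It is only a sketch: `extract_wrapping` and `extract_strict` are hypothetical names that stand in for the conversion logic and are not part of PyO3.

```python
def extract_wrapping(obj):
    """Hypothetical model of the wrapping option: a lone str becomes a one-element list."""
    if isinstance(obj, str):
        return [obj]
    return list(obj)


def extract_strict(obj):
    """Hypothetical model of the error option: a lone str is rejected outright."""
    if isinstance(obj, str):
        raise ValueError("Can't extract `str` to `Vec`")
    return list(obj)


# A str is itself a sequence of one-character strings, which is the root of the problem:
print(list("echo"))                 # ['e', 'c', 'h', 'o']
print(extract_wrapping("echo"))     # ['echo']
print(extract_wrapping(["echo"]))   # ['echo']
```

With the strict variant, `extract_strict("echo")` raises instead of guessing what the caller meant, while `extract_strict(["echo"])` still returns `['echo']`.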

Demo

[package]
name = "fromstr"
# ...

[dependencies]
pyo3 = { path = "...", features = ["extension-module", "nightly"] }

use pyo3::prelude::*;

#[pyfunction]
fn print_strings(strings: Vec<String>) {
    for s in strings {
        println!("{}", s);
    }
}

#[pyfunction]
fn print_str(strings: Vec<&str>) {
    for s in strings {
        println!("{}", s);
    }
}

#[pymodule]
fn fromstr(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(print_strings, m)?)?;
    m.add_function(wrap_pyfunction!(print_str, m)?)?;
    Ok(())
}

from fromstr import print_strings, print_str

if __name__ == "__main__":
    print(1)
    print_strings("echo")
    print(2)
    print_str("echo")
    print(3)
    print_strings(["echo"])
    print(4)
    print_str(["echo"])

# Outputs:
# 1
# echo
# 2
# e
# c
# h
# o
# 3
# echo
# 4
# echo

birkenfeld · 2022-07-07T14:09:31Z

IMO it's not ok to silently change behavior when nightly is activated.

jeertmans · 2022-07-07T14:18:35Z

@birkenfeld I agree that the user should be somehow notified of this, either through a warning or some error?

davidhewitt · 2022-07-07T22:36:10Z

Gotta agree that I'm not a fan of having behavior change by a feature.

I know I copied a quote in #2342 which suggested this wrapping, but how about a simpler solution: just add a check when extracting Vec<T> that the input is not a string (and return an error if it is).

Without specialization we'd have to pay that cost even for non-text types like Vec<i32>. Though if we measure it, I'd hope it would be almost unnoticeable.

That would at least hopefully work on stable...?

jeertmans · 2022-07-08T09:21:42Z

@davidhewitt The only way I see it could be implemented on the stable channel would be to use std::any::TypeId::of::<T>(), but it requires T to have a static lifetime, which is not allowed by pyo3 due to GIL restrictions I guess.

I have tried multiple tricks, but none seemed to work.

The implementation could look something like this:

impl<'a, T> FromPyObject<'a> for Vec<T>
where
    T: FromPyObject<'a>,
{
    fn extract(obj: &'a PyAny) -> PyResult<Self> {
        let ti = TypeId::of::<T>(); // Does not work
        if (ti == TypeId::of::<String>() || ti == TypeId::of::<&'static str>())
            && obj.is_instance_of::<PyString>()?
        {
            // Raise some error;
        }
        extract_sequence(obj)
    }
}

davidhewitt · 2022-07-09T05:03:55Z

Do we need the TypeId checks? I was wondering about the simpler:

if let Ok(true) = obj.is_instance_of::<PyString>()
{
    return Err(PyValueError::new_err("Can't extract `str` to `Vec`"))
}
extract_sequence(obj)

Just depends how much that affects performance I think?

jeertmans · 2022-07-09T12:02:59Z

The idea behind that kind of type checking (either with `TypeId` or specialization) was to still allow a `str` to be transformed into, e.g., a `Vec`.
But maybe this is not desirable either, and using isinstance is sufficient in this case.
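
The difference between the two checks can be sketched in pure Python. This is a model under stated assumptions, not PyO3 code: `to_char`, `extract_blanket`, `extract_type_aware` and the `element_is_text` flag are all hypothetical, with the flag standing in for what `TypeId` or specialization would tell us about the element type.

```python
def to_char(item):
    """Hypothetical element converter: accept only one-character strings."""
    if not (isinstance(item, str) and len(item) == 1):
        raise TypeError(f"expected a single character, got {item!r}")
    return item


def extract_blanket(obj, convert):
    """Reject every str, whatever the element type (the isinstance-only check)."""
    if isinstance(obj, str):
        raise ValueError("Can't extract `str` to `Vec`")
    return [convert(item) for item in obj]


def extract_type_aware(obj, convert, element_is_text):
    """Reject a str only when the elements are themselves text."""
    if element_is_text and isinstance(obj, str):
        raise ValueError("Can't extract `str` to `Vec`")
    return [convert(item) for item in obj]


print(extract_type_aware("abc", to_char, element_is_text=False))  # ['a', 'b', 'c']
print(extract_blanket(list("abc"), to_char))                       # ['a', 'b', 'c']
```

With the blanket check, a caller who really wants the characters has to pass `list("abc")` explicitly; `extract_blanket("abc", to_char)` raises.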

jeertmans · 2022-07-11T10:18:15Z

@davidhewitt Here is a small benchmark I ran and its results:

(Never mind the print... function names; they should be count...)

pyo3/src/types/sequence.rs

impl<'a, T> FromPyObject<'a> for Vec<T>
where
    T: FromPyObject<'a>,
{
    fn extract(obj: &'a PyAny) -> PyResult<Self> {
        if let Ok(true) = obj.is_instance_of::<PyString>() {
            return Err(PyValueError::new_err("Can't extract `str` to `Vec`"));
        }
        extract_sequence(obj)
    }
}

fromstr/src/lib.rs

use pyo3::prelude::*;

#[pyfunction]
fn print_strings(strings: Vec<String>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_str(strings: Vec<&str>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_int(strings: Vec<isize>) -> usize {
    strings.len()
}

#[pyfunction]
fn print_char(strings: Vec<char>) -> usize {
    strings.len()
}

#[pymodule]
fn fromstr(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(print_strings, m)?)?;
    m.add_function(wrap_pyfunction!(print_str, m)?)?;
    m.add_function(wrap_pyfunction!(print_int, m)?)?;
    m.add_function(wrap_pyfunction!(print_char, m)?)?;
    Ok(())
}

main.py

from fromstr import *
from timeit import timeit

if __name__ == "__main__":
    number = 100000

    i = list(range(10))
    c = list("abcdefg")
    s = "some random sentence".split(" ")

    print(timeit(lambda: print_strings(s), number=number) / number)
    print(timeit(lambda: print_str(s), number=number) / number)
    print(timeit(lambda: print_char(c), number=number) / number)
    print(timeit(lambda: print_int(i), number=number) / number)

Without the `isinstance` call

(venv) ➜  fromstr python main.py
4.702104139996663e-06
3.5061912999935886e-06
5.810010350005541e-06
5.9074885100017125e-06
(venv) ➜  fromstr python main.py
4.845567360007408e-06
3.9516925000043556e-06
5.810035859994969e-06
5.726638340001955e-06

With the `isinstance` call

(venv) ➜  fromstr python main.py
4.092408500000601e-06
2.954179230000591e-06
4.588532469997517e-06
4.370560449997356e-06
(venv) ➜  fromstr python main.py
3.9268650299982255e-06
2.893349019996094e-06
4.480811690000337e-06
4.1616317100033485e-06

davidhewitt · 2022-07-12T06:53:05Z

Hmm, those timeit numbers are really hard to interpret for me.

I did some benchmarking myself, my rough conclusion is that on my machine extracting an empty Vec takes about 100ns, and this check adds an additional 10ns (so 10%).

Extracting 10 &str elements takes another 100ns. Another way of looking at this is that we pay a cost similar to the Vec being just one element larger.

Seems ok to me for correctness?
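
The arithmetic behind that estimate, spelled out with the rough figures quoted above (these are the approximate numbers from the comment, not new measurements):

```python
empty_vec_ns = 100      # extracting an empty Vec
check_ns = 10           # extra cost of the isinstance check
ten_elements_ns = 100   # extra cost of extracting 10 &str elements

per_element_ns = ten_elements_ns / 10
overhead_on_empty = check_ns / empty_vec_ns
overhead_on_ten = check_ns / (empty_vec_ns + ten_elements_ns)

print(per_element_ns)     # 10.0 -> the check costs about as much as one element
print(overhead_on_empty)  # 0.1  -> 10% on an empty Vec
print(overhead_on_ten)    # 0.05 -> 5% on a 10-element Vec
```

The relative overhead shrinks as the Vec grows, since the check is paid once per extraction rather than once per element.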

jeertmans · 2022-07-12T15:36:00Z

I have to agree that timeit may not be the best tool to quantify performance differences, especially on that scale :'-)
10% seems fair enough. If the specialization feature becomes stable one day, it would be nice to use it to gain some performance back.
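
For what it's worth, a slightly less noisy way to use timeit for this kind of comparison is to repeat the measurement and keep the minimum. This is a sketch only: `per_call_ns` is a hypothetical helper, and the builtin `len` is a stand-in for the `fromstr` functions so that the snippet runs without the extension module.

```python
from timeit import repeat


def per_call_ns(func, arg, number=100_000, rounds=5):
    """Best-of-`rounds` time per call, in nanoseconds."""
    totals = repeat(lambda: func(arg), number=number, repeat=rounds)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number * 1e9


s = "some random sentence".split(" ")

# `len` is only a stand-in; with the extension built, pass print_strings etc. instead.
print(f"{per_call_ns(len, s):.1f} ns per call")
```

Note that the lambda call itself adds overhead of the same order as the 10ns under discussion, so a Rust-side benchmark remains the more reliable measurement at this scale.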

davidhewitt · 2022-07-13T05:40:45Z

Agreed. Correctness has to be the top priority, and as new compiler functionality emerges (and new language patterns) we can refine implementations to be more efficient.

A few things this needs before it is ready to merge:

- An entry in the Changed section of the CHANGELOG.
- A test for this new error case (extracting str as Vec should indeed fail).
- To fix CI, you'll need to rebase after #2504 ("clippy: fix some warnings from beta toolchain") is merged (that PR includes a fix for the MSRV failure).

jeertmans · 2022-07-13T07:47:31Z

@davidhewitt that should all be done now :-)
Please tell me if I am missing something

davidhewitt

Looks great to me, thanks for iterating on this!

jeertmans · 2022-07-14T06:01:03Z

It was a pleasure, thank you for your time!

alex · 2022-10-13T12:44:14Z

Hi all, I realize I'm leaving this comment many months after this landed, but...

Is there a reason ValueError was chosen here? TypeError seems more appropriate to me, and I wanted to ask if there was a reason it was rejected.
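
For reference, this is how Python's own builtins draw the line between the two exceptions. The helper `error_name` is hypothetical and exists only to print the exception type.

```python
def error_name(func, arg):
    """Return the name of the exception raised by func(arg), or None."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_name(int, "abc"))  # ValueError: right type (str), unparsable value
print(error_name(len, 5))      # TypeError: an int is not a sized object at all
```

Passing a str where a list of strings is expected looks closer to the second case: the argument has the wrong type, rather than the right type with a bad value.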

davidhewitt · 2022-10-13T20:08:30Z

@alex that's a reasonable suggestion; I didn't have a strong opinion when reviewing, however TypeError probably is better. PR welcome!

If this behaviour is useful to you, would you be willing to also join the discussion in #2632? The proposal there would remove this behaviour, and I'm undecided as I see both sides of the debate, so I would value having more user input.

alex · 2022-10-13T20:11:40Z

Sure, happy to do a PR to switch to TypeError :-)

jeertmans marked this pull request as ready for review July 12, 2022 15:36

jeertmans added 3 commits July 13, 2022 09:44

FromPyObject::extract from PyString to Vec<T> now raises an error

ae5af93

Closes PyO3#2342 Refactor to only raise an error based on `isinstance`

Document changes in CHANGELOG

d3c2af1

Add unit tests

433bdd8

jeertmans force-pushed the str-specialization branch from 6b01925 to 433bdd8 Compare July 13, 2022 07:45

davidhewitt approved these changes Jul 13, 2022

davidhewitt merged commit 308ffa2 into PyO3:main Jul 13, 2022

jeertmans deleted the str-specialization branch July 13, 2022 21:05

jeertmans mentioned this pull request Jul 14, 2022

Don't accept str in FromPyObject for Vec<&str> and Vec<String> #2342

Closed

messense linked an issue Jul 14, 2022 that may be closed by this pull request

Don't accept str in FromPyObject for Vec<&str> and Vec<String> #2342

Closed

adamreichold mentioned this pull request Sep 18, 2022

RFC: Fix #2615 by relaxing the type check in extract_sequence. #2620

Closed

adamreichold added a commit that referenced this pull request Sep 21, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

e970e04

adamreichold added a commit that referenced this pull request Sep 21, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

b3f5d43

adamreichold added a commit that referenced this pull request Sep 21, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

78b4f8f

adamreichold added a commit that referenced this pull request Sep 21, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

4f7a811

adamreichold added a commit that referenced this pull request Sep 27, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

d672903

adamreichold added a commit that referenced this pull request Oct 13, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

1d44ef5

adamreichold added a commit that referenced this pull request Nov 7, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

b23508b

adamreichold added a commit that referenced this pull request Nov 9, 2022

Revert #2500, i.e. impl FromPyObject for Vec<T> will accept str.

18232ba
