Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyClass Sequence of tuples or arrays not convertible to numpy array #2392

Closed
mtreinish opened this issue May 20, 2022 · 14 comments
Closed

PyClass Sequence of tuples or arrays not convertible to numpy array #2392

mtreinish opened this issue May 20, 2022 · 14 comments
Labels

Comments

@mtreinish
Copy link
Contributor

mtreinish commented May 20, 2022

Bug Description

When creating a custom pyclass that implements the sequence protocol with pyo3 0.16.x the output type is missing something (I haven't been able to figure out exactly what) that prevents numpy from interpreting it correctly to convert into an array. This only is an issue when using the creating a sequence type using #[pymethods]. It works as expected if you use the deprecated #[pyproto] with the PySequenceProtocol trait.

Steps to Reproduce

  1. create pyo3 pyclass that implement sequence protocol on array or tuple of numeric types, minimal example:
use pyo3::prelude::*;
use pyo3::exceptions::PyIndexError;

#[pyclass(module="reproduce")]
#[derive(Clone)]
pub struct MySequenceArray {
    pub data: Vec<[usize; 2]>
}

#[pymethods]
impl MySequenceArray {
    #[new]
    fn new(data: Vec<[usize; 2]>) -> Self {
        MySequenceArray {
            data
        }
    }

    fn __len__(&self) -> PyResult<usize> {
        Ok(self.data.len())
    }

    fn __getitem__(&self, idx: isize) -> PyResult<[usize; 2]> {
        if idx as usize >= self.data.len() {
            Err(PyIndexError::new_err("out of bounds"))
        } else {
            Ok(self.data[idx as usize])
        } 
    }
}

#[pyclass(module="reproduce")]
#[derive(Clone)]
pub struct MySequenceTuple {
    pub data: Vec<(usize, usize)>
}

#[pymethods]
impl MySequenceTuple {
    #[new]
    fn new(data: Vec<(usize, usize)>) -> Self {
        MySequenceTuple {
            data
        }
    }

    fn __len__(&self) -> PyResult<usize> {
        Ok(self.data.len())
    }

    fn __getitem__(&self, idx: isize) -> PyResult<(usize, usize)> {
        if idx as usize >= self.data.len() {
            Err(PyIndexError::new_err("out of bounds"))
        } else {
            Ok(self.data[idx as usize])
        }
    }
}

#[pymodule]
fn reproduce(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_class::<MySequenceTuple>()?;
    m.add_class::<MySequenceArray>()?;
    Ok(())
}
  1. In python create instances of MySequenceArray or MySequenceTuple:
import reproduce
test_list = [[0, 1], [1, 2]]
reproduce_array = reproduce.MySequenceArray(test_list)
test_tuple = [(0, 1), (1, 2)]
reproduce_tuple = reproduce.MySequenceTuple(test_tuple)
  1. Build 2d numpy array from custom sequence objects:
import numpy as np
nd_array = np.asarray(reproduce_array, dtype=np.uintp)
nd_tuple = np.asarray(reproduce_tuple, dtype=np.uintp)

Backtrace

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'reproduce.MySequenceArray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.

Your operating system and version

Linux

Your Python version (python --version)

3.10.0

Your Rust version (rustc --version)

1.60.0

Your PyO3 version

0.16.x

How did you install python? Did you use a virtualenv?

pacman

Additional Info

This example worked as expected with the deprecated PySequenceProtocol trait and pyproto. So the pyclass code becomes:

use pyo3::prelude::*;
use pyo3::PySequenceProtocol;
use pyo3::exceptions::PyIndexError;


#[pyclass(module="reproduce")]
#[derive(Clone)]
pub struct MySequenceArray {
    pub data: Vec<[usize; 2]>
}

#[pymethods]
impl MySequenceArray {
    #[new]
    fn new(data: Vec<[usize; 2]>) -> Self {
        MySequenceArray {
            data
        }
    }
}
#[pyproto]
impl PySequenceProtocol for MySequenceArray {

    fn __len__(&self) -> PyResult<usize> {
        Ok(self.data.len())
    }

    fn __getitem__(&'p self, idx: isize) -> PyResult<[usize; 2]> {
        if idx as usize >= self.data.len() {
            Err(PyIndexError::new_err("out of bounds"))
        } else {
            Ok(self.data[idx as usize])
        }
    }
}

#[pyclass(module="reproduce")]
#[derive(Clone)]
pub struct MySequenceTuple {
    pub data: Vec<(usize, usize)>
}

#[pymethods]
impl MySequenceTuple {
    #[new]
    fn new(data: Vec<(usize, usize)>) -> Self {
        MySequenceTuple {
            data
        }
    }
}
#[pyproto]
impl PySequenceProtocol for MySequenceTuple {

    fn __len__(&self) -> PyResult<usize> {
        Ok(self.data.len())
    }

    fn __getitem__(&self, idx: isize) -> PyResult<(usize, usize)> {
        if idx as usize >= self.data.len() {
            Err(PyIndexError::new_err("out of bounds"))
        } else {
            Ok(self.data[idx as usize])
        }
    }
}

#[pymodule]
fn reproduce(_py: Python<'_>, m: &PyModule) -> PyResult<()> {
    m.add_class::<MySequenceTuple>()?;
    m.add_class::<MySequenceArray>()?;
    Ok(())
}

then running:

import reproduce
import numpy as np

test_list = [[0, 1], [1, 2]]
reproduce_array = reproduce.MySequenceArray(test_list)
test_tuple = [(0, 1), (1, 2)]
reproduce_tuple = reproduce.MySequenceTuple(test_tuple)
nd_array = np.asarray(reproduce_array, dtype=np.uintp)
nd_tuple = np.asarray(reproduce_tuple, dtype=np.uintp)
print(nd_array)
print(nd_tuple)

outputs the expected:

[[0 1]                                                         
 [1 2]]
[[0 1]
 [1 2]]
@mtreinish
Copy link
Contributor Author

As @georgios-ts pointed out in Qiskit/rustworkx#614 (comment) the underlying issue might be that the new pymethods approach is setting mp_length and pyproto is setting sq_length.

@davidhewitt
Copy link
Member

davidhewitt commented May 22, 2022

Ugh, that's frustrating.

The problem I saw with sq_length is that if it's set then when indexing (from C only) with a negative index then CPython will add the result of sq_length to it. This is a subtle behavior that I perceived as potentially surprising for users.

TBH while annoying I've been coming round to the opinion that maybe it's the lesser evil compared to not setting sq_length.

What would your opinion be? Is CPython adding the length to negative indices a problem in practice?

@ravenexp
Copy link
Contributor

TBH while annoying I've been coming round to the opinion that maybe it's the lesser evil compared to not setting sq_length.

What would your opinion be? Is CPython adding the length to negative indices a problem in practice?

I'm not the original reporter, but I've had a problem with my code not handling negative indexes correctly after migration from #[pyproto], solving which required adding several workarounds in my Rust code. On the contrary, I found the new PyO3 behavior surprising.

@birkenfeld
Copy link
Member

The "not handling correctly" just means that we aren't handling negative indices specially anymore, I think. This is intended.

@davidhewitt
Copy link
Member

I agree with @birkenfeld in that I felt having special handling for negative integers to be a footgun.

For example, imagine a sequence with length 5, and user attempts to index with -8.

If sq_len is implemented, then CPython will amend that -8 to -3 (by adding 5). This is done invisibly to the sq_item implementation. Not only would it be a bug for the sq_item implementation to amend that -3 to 2, but it's potentially done inconsistently; any other code which calls sq_item directly needs to reproduce CPython's behavior of conditionally adding sq_len to negative indices if present. CPython also doesn't adjust negative indices when trying the mp_subscript slot prior to trying sq_item.

Further, the sq_item implementation would probably not want to raise IndexError: -3 - I perceive that to be surprising to the user, given they had originally passed -8. Maybe the sq_item could correct the -3 to -8 before raising, but given the correction may not be applied consistently this won't help either.

To be honest, both having and not having sq_len have downsides, given we're now seeing the downsides of PyO3 having chosen to omit it.

My preferred solution would be to add a new type object flag to CPython to opt-out of this negative index adjustment so that implementations can handle it. That way PyO3 could return to implementing sq_len. I just haven't got around to proposing that.

@georgios-ts
Copy link

georgios-ts commented May 22, 2022

Well, this sounds indeed like a problem in CPython but from my perspective pyo3 should not try to fight that (and it didn't a few releases back). It's best to fill in sq_length and document this behavior.

@davidhewitt
Copy link
Member

While I'm leaning towards this indeed, we need to be careful because anyone who is converting indices manually like @ravenexp will end up with broken code if we start emitting sq_len. If we revert this for PyO3 0.17, we need big bold text in the CHANGELOG to make it clear users need to check and fix behavior.

@mtreinish
Copy link
Contributor Author

mtreinish commented Jun 4, 2022

I was thinking about this a bit more, maybe in the interest of compatibility we can add an explicit sequence flag to pyclass like is already there for mapping. If it's set only the sequence slots, including sq_len, get filled. That way if the user doesn't specify anything it behaves as it does today, but if you want an explicit sequence type you can specify that just like you can with a mapping type.

@davidhewitt
Copy link
Member

I think that's a reasonable plan. If anyone's got availability to write a PR for this, would be much appreciated. Otherwise I'll get to it eventually.... (Maybe in a few weeks though)

@davidhewitt
Copy link
Member

Sequence users - we're thinking to change PySequence and PyMapping types to be based on isinstance(value, collections.abc.Sequence) and isinstance(value, collections.abc.Mapping) respectively - see #2072 for more.

I was wondering if you have any opinions on this?

@davidhewitt
Copy link
Member

Implementation for #2392 (comment) now at #2567. @mtreinish are you willing to double-check that branch works for you before I merge it?

@davidhewitt
Copy link
Member

#2567 now merged, can test on main if desired. Release incoming soon, I will close this.

@ravenexp
Copy link
Contributor

I can't get #[pyclass(sequence)] to work as documented. Negative indexes are still passed to __getitem__() as is even in PyO3 v0.17.1, even though the class is tagged as a sequence and __len__() is implemented.

Am I missing something?

@davidhewitt
Copy link
Member

I think #2392 (comment) and your experience illustrates exactly why this CPython behaviour is very confusing. I will put a longer answer on #2601.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants