Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tuple/List get_item to fallible #1733

Merged
merged 1 commit into from Aug 17, 2021
Merged

Tuple/List get_item to fallible #1733

merged 1 commit into from Aug 17, 2021

Conversation

aviramha
Copy link
Member

Hey, trying to help with:
#1667
specifically:
make PyTuple::get_item and PyList::get_item fallible
I'm not quite sure about the extract thingy, any guidance or feedback will be welcome.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for kicking this off!

Another way to design this which is potentially better for migration is to deprecate .get_item() and to introduce two new methods:

  • .get() -> PyResult<&PyAny>
  • .get_unchecked() -> &PyAny, doesn't do checking and only available on #[cfg(not(Py_LIMITED_API))]

Or .get() -> Option<&PyAny>, if handling the error internally feels appropriate. I'd be open to opinions on this.

Regarding the .extract() error, you should check for uses of .get_item() inside pyo3-macros-backend. TBH this is an example why deprecating get_item might be the better idea - the compilation will still go through and you'll just get warnings of where it's used, which can be fixed instead.

src/types/tuple.rs Outdated Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
@aviramha
Copy link
Member Author

aviramha commented Jul 22, 2021

@davidhewitt
Ping!

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is taking shape!

@birkenfeld as you came up with the proposals in #1667 I'd be really interested for your opinion on this.

src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
@aviramha aviramha force-pushed the list_stuff branch 2 times, most recently from c598465 to 996b677 Compare July 22, 2021 20:43
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

I do have some suggestions..

@@ -83,6 +84,24 @@ impl PyList {
}
}

/// Gets the list item at the specified index.
pub fn get(&self, index: usize) -> PyResult<&PyAny> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider using isize rather than usize for this, if we still want negative indexing per #1667 (?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want negative indexing, but when using index (slicing). There's no real sense in implementing it at this function level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm leaning towards using usize everywhere, anyway 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should only be relevant for slicing. Python allows negative indices for single item access as well as for slices. IMO either we allow negative indices for both, or none.

src/types/list.rs Show resolved Hide resolved
src/types/list.rs Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
Copy link
Member

@birkenfeld birkenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with returning PyResult. The new name makes the API asymmetric (get vs set_item, and btw why is there is no del_item or remove...) Are we still allowed to break APIs if the result is a clear compile error?

usize vs isize needs to be discussed still.

A follow-up to this is the implementation of Index traits.

@@ -83,6 +84,24 @@ impl PyList {
}
}

/// Gets the list item at the specified index.
pub fn get(&self, index: usize) -> PyResult<&PyAny> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this should only be relevant for slicing. Python allows negative indices for single item access as well as for slices. IMO either we allow negative indices for both, or none.

let item = self.list.get_item(self.index);
if self.index < self.list.len() {
#[cfg(any(Py_LIMITED_API, PyPy))]
let item = self.list.get(self.index).expect("tuple.get failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a list here

@davidhewitt
Copy link
Member

davidhewitt commented Jul 25, 2021

The new name makes the API asymmetric (get vs set_item, and btw why is there is no del_item or remove...) Are we still allowed to break APIs if the result is a clear compile error?

Yes, we could change get_item in 0.15. I do agree we should keep consistent names; maybe it is better to have get_item and get_item_unchecked here. I think I prefer having isize because it's more similar to the Python semantics, but I could be persuaded either way. IIRC the underlying C-API only accepts unsigned integers, so we'd need to do the conversion (but that's not too bad)...

Sorry @aviramha you picked a ticket which has a lot of open design questions 🙃

I definitely think if we call it .get() then we should accept usize only.

@aviramha
Copy link
Member Author

Sorry @aviramha you picked a ticket which has a lot of open design questions 🙃

That's OK. the get_item of Python is actually usize, so it might be more confusing that it works with negative indices. I guess it's a question whether you want the API to feel like a Rust wrapper to the CPython or a rust wrapper to the Python API.
Let me know which you prefer and I'll fix the PR.

@davidhewitt
Copy link
Member

That's a fair question. I think I'm now leaning towards this matching the C API, so would vote for get_item(usize) -> PyResult<&PyAny> and unsafe get_item_unchecked(usize) -> &PyAny.

@aviramha
Copy link
Member Author

That's a fair question. I think I'm now leaning towards this matching the C API, so would vote for get_item(usize) -> PyResult<&PyAny> and unsafe get_item_unchecked(usize) -> &PyAny.

Goes without saying that I am too. How do we make this call? 🙃

@davidhewitt
Copy link
Member

If it looks like a majority of us agree, I'm happy for us to commit to this plan (cc @birkenfeld @mejrs @messense).

@birkenfeld
Copy link
Member

birkenfeld commented Jul 25, 2021

Where would isize be allowed then? This should really be designed keeping all places with indices in mind (see #1667), not decided on a case-by-case basis just looking at the C API, which is inconsistent. For example, PyList_Insert accepts negative indices.

@birkenfeld
Copy link
Member

To be clear, I'm not opposed to doing away with isize everywhere and only accepting "real" indices, Rust-std-style. I just want it to be consistent.

@aviramha
Copy link
Member Author

@birkenfeld I understand your concern but I believe that's CPython inconsistency issue. It depends on what we value more - being consistent with it or improving it.

@aviramha
Copy link
Member Author

To make myself more clear (previous comment was from mobile) - when coding in PyO3 I often use the CPython docs to understand how stuff works - having different API (i.e get_item of PyO3 accepts negative indices vs CPython not) can easily cause confusion for anyone.

@birkenfeld
Copy link
Member

birkenfeld commented Jul 25, 2021

The expectation of each PyO3 method mapping exactly to one Python C-API call is misguided, though. If you are missing info from the PyO3 docs which is integral to understanding the API in question, it is a bug and should be fixed; please report that here.

And of course, in the case we're discussing here, the documentation would need to state that negative indices are supported and how they are interpreted. (That this is not documented currently is one of the points of #1667.)

@davidhewitt
Copy link
Member

I definitely agree that we should go for all-isize, or all-usize. It's not fun as a user to have to look up argument types for every function. (And the right document source should definitely be PyO3 docs, not CPython docs.)

Here's the arguments I can see either way:


usize

Good:

  • Matches std
  • Means we can use the SliceIndex trait to allow doing list[0..5] etc

Bad:

  • PySequence and PyList_Insert APIs work correctly with negative indices, but we'd be preventing users from doing so.

isize

Good:

  • Matches python usage, e.g. list[-5]
  • We'd have to work around APIs that don't support negative indexing. It's probably as simple as using PySequence_GetItem instead of PyList_GetItem, and similar, but not sure. Some edge cases might be annoying.

Bad:

  • Supporting negative indexing presumably has a (tiny) performance overhead, because negative indices must be converted to actual positive ones.
  • To have slices with negative indices such as list[-4..-2] we'd probably need to create a PySliceIndex trait which is implemented for various Range<isize>, RangeFrom<isize>, RangeFull<isize>. Could be a lot of work.
  • PyList::get_item_unchecked would almost certainly want to take usize. Either that or it could have a simple check if idx < self.len() { idx += self.len() }, But that would again imply performance overhead, which kinda sucks in an unsafe low-level API.

An alternative... what if we made PyList take usize and PySequence take isize? That way we'd get the simplicity of usize for most usage but also give users the "full API" if they chose to treat the list as a sequence. We'd have to document this really clearly, but at least there's then a clear pattern to it.

I think this "mixed-mode" would match the C-API in every way except PyList::insert, which we would make only take usize.

@mejrs
Copy link
Member

mejrs commented Jul 25, 2021

@davidhewitt thanks for the recap

Indexing and slicing with negative numbers doesn't seem like a good idea to me:

  • For someone not coming from Python, having negative indices work when indexing/slicing is something that I feel will mostly just be surprising and unexpected.
  • For someone who is coming from Python (to Rust), I think it'll be confusing if negative indexing works somewhere (pyo3) but not in other places (the rest of the Rust ecosystem).
  • It will only be a convenience for people making Python libraries rather than the people using said libraries. After all, authors can implement negative slicing and indexing for their own types.
  • I'm not a fan of doing a bunch of math tricks in what should be a low level api.

@aviramha
Copy link
Member Author

aviramha commented Aug 7, 2021

Cool. @davidhewitt Waiting for your review :

@davidhewitt
Copy link
Member

👍 will try to give a full read within the next few days!

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing forward on this! I'm afraid there's a few bits and pieces which still need tidying up. Comments scattered below.

@davidhewitt should I change get_slice to also use usize?

Before we release 0.15 we'll need to change all these tuple / list / sequence APIs which currently take isize to take usize. This PR is already big enough, so maybe it's better to make a list of what still needs to change in #1667 and then make follow-up PRs.

CHANGELOG.md Outdated Show resolved Hide resolved
benches/bench_list.rs Show resolved Hide resolved
benches/bench_tuple.rs Show resolved Hide resolved
pyo3-macros-backend/src/from_pyobject.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
src/types/list.rs Show resolved Hide resolved
src/types/tuple.rs Outdated Show resolved Hide resolved
@aviramha aviramha force-pushed the list_stuff branch 2 times, most recently from 9880cb4 to 8cbf9f9 Compare August 13, 2021 13:08
@aviramha
Copy link
Member Author

Thanks for the review. I think I fixed all of the comments :)

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are looking reasonable to me... but CI has a segfault?

CHANGELOG.md Outdated Show resolved Hide resolved
src/types/list.rs Show resolved Hide resolved
src/types/tuple.rs Show resolved Hide resolved
src/types/list.rs Outdated Show resolved Hide resolved
@aviramha
Copy link
Member Author

Oopsie, I thought the CI failure was main's fault

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I think I've found another cause of segfault... Fingers crossed CI green after 🤞

src/types/list.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks brilliant! Thanks again for your patience and many iterations as we worked out the design for this.

I'll merge this shortly (just going to cherry-pick some fixes for a 0.14.3 bugfix release first).

@davidhewitt
Copy link
Member

(See #1796 - as soon as that release is live I'll merge this. It just makes my life easier if I try to cherry-pick other stuff and there's no more breaking changes 😉)

Copy link
Member

@birkenfeld birkenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@aviramha
Copy link
Member Author

👍 looks brilliant! Thanks again for your patience and many iterations as we worked out the design for this.

I'll merge this shortly (just going to cherry-pick some fixes for a 0.14.3 bugfix release first).

You're welcome! Glad to contribute to an awesome framework.
Where do we wish to track progress on this so I might pick up the next task?

@davidhewitt
Copy link
Member

You're welcome! Glad to contribute to an awesome framework.
Where do we wish to track progress on this so I might pick up the next task?

🚀

If you want to own this, that would be awesome! The easiest way might be to create a new issue where you can own a list of checkboxes and close them as we make progress?

@birkenfeld
Copy link
Member

Since 0.14.x is going to be in a branch, can we merge this?

@davidhewitt
Copy link
Member

Ok 👍 I'm getting more used to doing the cherry-picking onto the 0.14.3 branch now, so let's proceed with this.

I see I created a merge conflict by merging #1802, so I'll fixup this quickly now...

@davidhewitt davidhewitt force-pushed the list_stuff branch 2 times, most recently from e32641a to 33b056b Compare August 17, 2021 13:23
…`usize` indices instead of `isize`.

- `PyList` and `PyTuple`'s `get_item` now always uses the safe API. See `get_item_unchecked` for retrieving index without checks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants