Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect iterators against concurrent modification #2380

Merged
merged 8 commits into from May 31, 2022

Conversation

mejrs
Copy link
Member

@mejrs mejrs commented May 16, 2022

Fixes #2162

  • Changelog
  • More tests?
  • Does PyList need to be protected? It seems fine as is.

@mejrs mejrs added the CI-no-fail-fast If one job fails, allow the rest to keep testing label May 16, 2022
/// # Panics
///
/// If PyO3 detects that the dictionary is mutated during iteration, it will panic.
/// It is allowed to modify the values of the keys as you iterate over the dictionary, but only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know "modify the values of the keys" is copied verbatim from the C-API docs, but I think it's quite confusing.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this! 🚀

Does PyList need to be protected? It seems fine as is.

Good question. I guess it's by luck that with the current implementation we re-fetch the len() on each call to next()? I guess checking for modifications would require that and an extra check, so maybe it's ok as-is.

di_dict: *mut ffi::PyDictObject,
di_pos: ffi::Py_ssize_t,
di_used: ffi::Py_ssize_t,
marker: PhantomData<&'py PyDict>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk that the dict refcount can fall to 0 and can be accidentally released? Should this be Py<PyDict>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator borrows from the Pydict, so this should be fine.


fn into_iter(self) -> Self::IntoIter {
unsafe {
let di_dict: *mut ffi::PyDictObject = self.as_ptr().cast();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always wondered if there should be an as_raw() -> *mut ffi::PyDictObject method on PyDict (and similar for other types)? Available only on not (limited_api / PyPy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds interesting, but I'm not sure how often this is useful in practice. You only need it to access fields directly, which we don't tend to do.

#[inline]
fn next(&mut self) -> Option<Self::Item> {
unsafe {
if self.di_used != (*(self.di_dict)).ma_used {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between ma_used and PyDict_Size used below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...I read the cpython source some more: actually none. Let me reconsider the approach.

src/types/dict.rs Outdated Show resolved Hide resolved
impl PyFrozenSet {
/// Creates a new frozenset.
///
/// May panic when running out of memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know not from this PR, but is this actually true? It looks like returning PyResult means that we'll actually return PyErr containing a MemoryError if we run out of memory.

I suspect that actually the text may want to be more like PySet where we don't mention this and instead comment on the error return being caused by unhashable contents.

src/types/frozenset.rs Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, smart new implementation!

@davidhewitt
Copy link
Member

I know I already approved... Looks like this needs a CHANGELOG and then good to merge?

@mejrs
Copy link
Member Author

mejrs commented May 23, 2022

Yes, I've been busy for a bit. I'll do it soon :)

@mejrs
Copy link
Member Author

mejrs commented May 29, 2022

That should be all 🎉

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

/// These iterators use Python's C-API directly. However in certain cases, like when compiling for
/// the Limited API and PyPy, the underlying structures are opaque and that may not be possible.
/// In these cases the iterators are implemented by forwarding to [`PyIterator`].
pub mod iter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it was long overdue for these to be public!

@davidhewitt davidhewitt merged commit 4f9d3d7 into PyO3:main May 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-no-fail-fast If one job fails, allow the rest to keep testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Iteration safety for containers
3 participants