Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyMapping is inconsistent with typing.Mapping #2072

Closed
Gobot1234 opened this issue Dec 22, 2021 · 28 comments · Fixed by #2477
Closed

PyMapping is inconsistent with typing.Mapping #2072

Gobot1234 opened this issue Dec 22, 2021 · 28 comments · Fixed by #2477

Comments

@Gobot1234
Copy link
Contributor

Gobot1234 commented Dec 22, 2021

Bug Description

The conversion table for PyMapping suggests anything that is a typing.Mapping should be able to be converted to PyMapping, this however is not the case seemly list and str are also downcast-able to a PyMapping.

This causes errors when you try to access things like items on the PyMapping as you get a AttributeError str has no attribute items.

Steps to Reproduce

use pyo3::prelude::*;
use pyo3::types::PyMapping;

#[pyfunction]
fn downcast_(a: &PyAny) -> PyResult<&PyMapping> {
    Ok(a.downcast::<PyMapping>()?)
}

#[pymodule]
fn downcast(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(downcast_, m)?)?;

    Ok(())
}
>>> import typing, downcast
>>> isinstance({}, typing.Mapping)
True
>>> downcast.downcast_({})
{}
>>> isinstance("", typing.Mapping)
False
>>> downcast.downcast_("")
''

Backtrace

No response

Your operating system and version

MacOS 11.1 (20C69)

Your Python version (python --version)

3.9.3

Your Rust version (rustc --version)

rustc 1.53.0-nightly (07e0e2ec2 2021-03-24)

Your PyO3 version

0.15.1

How did you install python? Did you use a virtualenv?

venv

Additional Info

No response

@Gobot1234 Gobot1234 added the bug label Dec 22, 2021
@davidhewitt
Copy link
Member

davidhewitt commented Dec 23, 2021

Yikes, thanks for reporting.

So I'm happy to take the fall for this one. This is the C-API definition of PyMapping_Check:

.. c:function:: int PyMapping_Check(PyObject *o)

   Return ``1`` if the object provides mapping protocol or supports slicing,
   and ``0`` otherwise.  Note that it returns ``1`` for Python classes with
   a :meth:`__getitem__` method since in general case it is impossible to
   determine what type of keys it supports. This function always succeeds.

Reading this again, it's such a broad definition that it doesn't really seem practically useful. And you're right that it's very different to typing.Mapping.

On Python 3.10 and up we could potentially make use of Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE to accept the same set of types as the match keyword. That seems generally correct.

However I'm not fond of having behaviour that's different according to the Python version.

Alternatively, we could call into Python to pretty much literally do isinstance(<thing>, collections.abc.Mapping), but that seems like a waste of performance. This might be a solution for Python 3.9 and backwards if we wanted to use Py_TPFLAGS_MAPPING.

Ideas on a solution very welcome!

@Gobot1234
Copy link
Contributor Author

I like the sound of using both the tp flags and the instance checks depending on your version

@davidhewitt davidhewitt added this to the 0.16 milestone Dec 23, 2021
@davidhewitt
Copy link
Member

I think this is awkward enough of a footgun (and also new enough, having been added in 0.15) that it would be good if we could come to a decision about what to do for this in time for the 0.16 release.

All opinions very much wanted!

@birkenfeld
Copy link
Member

Would be interesting to see what C extensions do.

@Gobot1234 Gobot1234 changed the title PyAny::downcast::<PyMapping>() is inconsistent with typing.Mapping PyMapping is inconsistent with typing.Mapping Dec 23, 2021
This was referenced Dec 27, 2021
@aviramha
Copy link
Member

I think that we should consider trying to resolve this upstream in CPython. The proposed behavior (checking of tpflags/isinstance) feels like something that PyMapping_Check should do. Opinions?

@bharel
Copy link

bharel commented Dec 31, 2021

I think that we should consider trying to resolve this upstream in CPython. The proposed behavior (checking of tpflags/isinstance) feels like something that PyMapping_Check should do. Opinions?

Remember guys, PyMapping_Check in CPython is used quite frequently and is meant to be as optimized as possible, while isinstance does not only check for inheritance.

Modifying it in CPython will cause a discrepancy across versions, and since you're not actually checking a __subclasshook__, user made classes will still not be taken into account. After all, not every user made mapping inherits from the appropriate ABC.

You can try blacklisting builtins but it's a whack a mole and will cause inconsistencies.

All in all, unless you take __subclasshook__ into account, which you don't want as it's a custom Python function, it'll never be the same as isinstance.

The only possible solution is to maybe covert the ABCs and respective hook methods to C in order to optimize them, but even then it'll be much slower.

@aganders3
Copy link
Contributor

I found some seemingly relevant discussion on BPO https://bugs.python.org/issue5945

One suggestion there is to use PyMapping_Check() && !PySequence_Check() for a true mapping check. Would that work here?

@aviramha
Copy link
Member

@aganders3 Nice find! I'll try to implement it and see how it behaves.

@aviramha
Copy link
Member

See #2098

@aviramha
Copy link
Member

aviramha commented Jan 14, 2022

Maybe we need to ask for help from upstream?

(from #2098 by @davidhewitt )
I think that's the way to go.. too many quirks.

@aviramha
Copy link
Member

Created a bpo https://bugs.python.org/issue46376

@aviramha
Copy link
Member

It might decrease ergonomics, but maybe we should consider not converting mapping/sequences to the PyMapping/PySequence as the API is mostly for non-custom classes?

@bharel
Copy link

bharel commented Jan 14, 2022

Seems like __subclasshook__ is no longer relevant. Python enforces anyone inheriting or registering from the ABCs to have TP_FLAGS set accordingly. Any class not inheriting from the ABC is no longer considered a "mapping" or a "sequence", therefore without even checking tp_as_mapping, tp_as_sequence, checking the relevant tp_flags should suffice. The solution will be extremely fast and will hold all cases apart from dynamic modifications in the ABCs which are fine to not account for.

@aviramha
Copy link
Member

python/cpython#30600

@aganders3
Copy link
Contributor

aganders3 commented Jan 19, 2022

It seems the upstream route is blocked, but maybe we can take the same out they did and just "fix" it in documentation. For example add a warning that downcast to a mapping may succeed for sequences and users may want to implement further checks as-needed.

The TP_FLAGS method seems correct, but again only applies to 3.10 so perhaps could be phased in down the line or provided as an opt-in alternative somehow (e.g. under a different function name).

@aviramha
Copy link
Member

I wouldn't call it blocked. I'm still working on a proposal to send to sig-capi and in the issue.

@aganders3
Copy link
Contributor

Yeah that's fair I guess I wasn't sure if that was enough of a block to miss the 0.16 release this is targeted for. Also if it's a change going forward, does that still leave an open issue in Python <=3.10?

@Gobot1234
Copy link
Contributor Author

I think the first comment from David is still the best approach

@davidhewitt davidhewitt modified the milestones: 0.16, 0.17 Feb 26, 2022
@Gobot1234
Copy link
Contributor Author

Considering this is a wont-fix upstream, could we just implment Davids suggestion to fix this?

@aganders3
Copy link
Contributor

Considering this is a wont-fix upstream, could we just implment Davids suggestion to fix this?

I'm +1 on this and would be happy to implement as well if no one else wants to claim it.

To be clear I think the proposal was:

  • check the tpflags for Python >=3.10
  • fall back to a call into Python to do an isintance check for Python <=3.9

@davidhewitt
Copy link
Member

Yes, I agree that we need to do something (current behavior seems broken). I still think the implementation above is likely to be ok. One possible complicating factor is that the flags aren't technically in the stable API - see python/cpython#32080

@samuelcolvin I think I spotted you used PyMapping in pydantic-core - I wonder if you've seen this issue / have any thoughts about how you would want a resolved implementation to behave?

@samuelcolvin
Copy link
Contributor

Thanks so much for making me aware. I hadn't seen this because 1) my tests for mappings were not good enough 🙈 2) I trusted pyo3 too much.

Should be partially fixed in pydantic/pydantic-core#117

I would agree that your suggestion #2072 (comment) of the fast solution for 3.10 and falling back to python isinstance for <=3.9, seems like the best route.

@davidhewitt
Copy link
Member

@aganders3 if you're willing to pick this up, would be very grateful. I think the only difference to my above proposal is that the fast path checking the flags would only apply on 3.10 and up when not(Py_LIMITED_API).

@davidhewitt
Copy link
Member

I trusted pyo3 too much.

Hopefully you can continue to trust PyO3! This is the only huge footgun of a bug which I'm aware of 🤪

@aganders3
Copy link
Contributor

I'm running into a slight snag with this implementation for Python classes implemented in Rust. For 3.10+ I think we can add the Py_TPFLAGS_MAPPING if defined with #[pyclass(mapping)] but for Py_LIMITED_API and Python <3.10 I don't see a way to make the isinstance(pyclass, collections.abc.Mapping) work. It looks like #991 may be related here.

This is causing a failure in the mapping_is_not_sequence test in tests/test_mapping.rs.

Any thoughts?

@aganders3
Copy link
Contributor

aganders3 commented Jun 23, 2022

Oh wait maybe I just need to add __subclasscheck__

edit: nope, that would have to be implemented on the metaclass 😞

@davidhewitt
Copy link
Member

Hmm maybe for now you can just call collections.abc.Mapping.register() (using PyModule::import) by hand in the test before trying any mapping checks?

Agreed #991 is the proper solution, I hope that progress can be made on that eventually...

@davidhewitt
Copy link
Member

For people following this thread: in #2477 it looks like we've concluded that my proposal above isn't quite right and instead we should always do the equivalent of isinstance(pyclass, collections.abc.Mapping). If really wanted users can opt out with unsafe code to check flags and use PyMapping::try_from_unchecked with known caveats (see #2477 where @aganders3 has written an excellent summary of this).

My question to you all: we are now wondering whether to change PySequence to be based on isinstance(pyclass, collections.abc.Sequence?

My inclination is yes, the consistency would be a highly desirable property. But it would be a breaking change so would like to take some time to consider it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment