Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predictable and safe object inspection #13833

Draft
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

jasongrout
Copy link
Member

@jasongrout jasongrout commented Nov 17, 2022

In #13759, we discussed how Colab's inspection routine guards against calling an arbitrary object's __repr__() method, since in practice that can take a lot of time and/or memory, depending on how the object was implemented.

This PR experiments with including an updated/modified version of the Colab implementation (licensed Apache 2.0), and making it easy to configure the interactive shell's inspector implementation.

Use the new inspector with

ipython --InteractiveShell.inspector_class='IPython.core.oinspect_safe.SafeInspector'

CC @craigcitro

@jasongrout
Copy link
Member Author

I see a difference in the %pinfo output between the two inspectors:

class BadRepr:
    def __repr__(self):
        import time
        time.sleep(10)
        return 'a very long string'*100

Default inspector:

In [2]: %pinfo BadRepr
Init signature: BadRepr()
Docstring:      <no docstring>
Type:           type
Subclasses:     

SafeInspector:

In [2]: %pinfo BadRepr
Type:        type
String form: <class '__main__.BadRepr'>
Docstring:   <no docstring>

@jasongrout
Copy link
Member Author

Colab special-cases tensorflow objects:

        if (
            isinstance(shape, tuple)
            or hasattr(shape, "__module__")
            and isinstance(shape.__module__, str)
            and "tensorflow." in shape.__module__
        ):
            return "{} with shape {}".format(type_name, shape)

Should the IPython implementation also specialcase tensorflow? If not, can we make it easy for a subclass to have its own special cases at this point?

…ream code it is modifying for easier maintenance

oinspect._render_signature is basically a copy of inspect.Signature.__str__ with a few tweaks. However, comparing the two functions is hard since it was basically rewritten for _render_signature. This commit makes _render_signature nearly identical to inspect.Signature.__str__ so it can be easily compared in the future, with clear notes about what was changed.

There should be no behavior difference before and after this change.
…tdef

With this change, the code between the two functions is largely identical, with the change that the signature default and annotation values are replaced with versions with safe repr functions. With this change, the SafeInspector._getdef now adopts the line-breaking behavior of Inspector._getdef.
This also introduces new getstr and getlen methods that can be overridden to have customized string and len methods. For example, for a safe inspector, these may avoid calling the object methods in case it is not safe.
…ormation

This was added in Colab's implementation, and could be used to highlight the lines of the definition of an object, for example.
There should be no change in functionality from this commit
There should be no change in functionality from this commit
- simplify many of the methods
- align implementation of info() more closely with upstream oinspect info()
…ting getstr and getlen

By now, the oinspect info() implementation and the safe inspect info() implementation are basically aligned, with the upstream implementation calling out to getstr and getlen when it is formatting an object for display.
@jasongrout
Copy link
Member Author

I tried to unify the safe inspect info() and oinspect info() implementations. They mostly align. There have been many updates to oinspect info(), which I tried to reason about in the context of the safe inspection. There are still a few notes about TODO items I'm not sure about, but by in large this brings safe inspection in line with what the oinspect info currently produces. It seems that there are only a few key places where we needed to change the upstream implementation, so I put stubs in for functions the safe implementation can override. Then deleted the safe_inspect info() code. This greatly simplifies things. I spent a lot of time reading the Python inspect module code to convince myself that it was safe to rely on standard library code for more things. I tried to clearly indicate reasons behind many of the changes in the commits in this PR.

@jasongrout
Copy link
Member Author

I tried running all of our existing oinspect tests with the SafeInspector inspector, along with one other test:

class BadRepr:
    """
    A honeypot repr that should not be called
    """
    def __repr__(self):
        import time
        breakpoint()
        time.sleep(10)
        raise ValueError("DON'T CALL ME")
        return 'A VERY BAD REPR'

def test_info_badrepr():
    inspector.info(BadRepr())

and we have a problem: the Python inspect module calls the repr of an object in many different places for error messages.

The original Colab code sort of gets around this by trying to do the following before calling inspect's signature method:

  1. Get the source lines for the definition
  2. Parse the source lines for the definition
  3. Set the function body to nothing
  4. Unparse the function again, which just yields the signature, basically
  5. If this doesn't work, then call inspect.signature and hope for the best

inspect.signature is significantly more advanced (for example, it handles
However, inspect.signature is significantly more advanced. For example, it looks for a __signature__ attribute, partial methods, classes with __new__ or __init__, an inheritance hierarchy, etc.). In order to get equivalent functionality without the risk of calling an object's repr, I instead copied the parts of inspect.py that do the signature analysis and updated anywhere it calls an object's repr to instead call safe_repr. This does mean that we need to maintain a fork of core Python code, but the changes are minimal and likely easy to update.

So which approach should we use?

  1. Make a copy of parts of inspect.py and modify it to not call repr() or str() on an object, its parameters, or annotations?
  • Risk: the copy of inspect.py needs to work in all supported Python versions (I copied from Python 3.11)
  • Risk: updating the copy of inspect.py periodically needs to be done. Fortunately, the changes are quite minimal and easy to understand
  • Advantage: we get all the functionality of inspect.signature without any of the problems in calling an object's repr()
  1. Do Colab's approach of a best-effort guess at the callable's signature by parsing the source if we can find it, and if that doesn't work, just go ahead and call inspect.signature anyway and hope for the best?
  • Risk: We could still call an object's repr in many cases
  • Advantage: A lot less code to maintain and update

To experiment, I made the copy of parts of inspect.py and updated it to not call repr() or str() on the object in question or related things. It wasn't too bad, but it will be a bit of a pain to maintain going forward.

@jasongrout
Copy link
Member Author

Update: after thinking about this for a bit, I think it's untenable to carry a copy of the Python inspect.py file. I'm looking into how to back that part out and make the safe inspect lighter weight.

Another option is to go upstream to Python and ask if they can not print the repr of the object in error messages. Then we can just mostly use upstream inspect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant