Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly match package/module names in import hook #144

Merged
merged 2 commits into from Aug 22, 2020
Merged

Correctly match package/module names in import hook #144

merged 2 commits into from Aug 22, 2020

Conversation

wbolster
Copy link
Contributor

@wbolster wbolster commented Aug 14, 2020

The previously used regular expression tried to handle both exact matches and
and prefix matches in one go, using this approach:

re.compile(r'^%s\.?' % pkg)

However, this is incorrect, since the literal dot is optional in the
pattern, causing longer matches to also get included. For example, ‘foo’
should match ‘foo’ and ‘foo.bar’, but it also incorrectly matches ‘foobar’:

>>> re.compile(r'^foo\.?').match('foobar')
<_sre.SRE_Match object; span=(0, 3), match='foo'>

In practice, a command like this (using the pytest plugin as an example)
is supposed to check the ‘flask’ package and any modules below it:

pytest --typeguard-packages=flask

... but in reality it also checks other packages, such as
‘flask_sqlalchemy’ and ‘flask_redis’, if those happen to be installed.

This can be easily fixed by not using regular expression, but simple
string matching instead.

@coveralls
Copy link

coveralls commented Aug 14, 2020

Coverage Status

Coverage increased (+0.1%) to 88.593% when pulling d2230ca on wbolster:import-hook-name-matching into 01a6fc4 on agronholm:master.

The previously used regular expression tried to handle both exact
matches and and prefix matches in one go, using this approach:

    re.compile(r'^%s\.?' % pkg)

However, this is incorrect, since the literal dot is optional in the
pattern, causing longer matches to also get included. For example, ‘foo’
should match ‘foo’ and ‘foo.bar’, but it also incorrectly matches
‘foobar’:

    >>> re.compile(r'^foo\.?').match('foobar')
    <_sre.SRE_Match object; span=(0, 3), match='foo'>

In practice, a command like this (using the pytest plugin as an example)
is supposed to check the ‘flask’ package and any modules below it:

    pytest --typeguard-packages=flask

... but in reality it also checks other packages, such as
‘flask_sqlalchemy’ and ‘flask_redis’, if those happen to be installed.

This can be easily fixed by not using regular expression, but simple
string matching instead.
@agronholm
Copy link
Owner

Alternatively, we could fix the RE to be ^%s(\.|$).

@wbolster
Copy link
Contributor Author

big fan of simple code here.

the regex wasn't obviously wrong the first time. sure, it can be fixed by making it more complex. or more clever.

i am not a fan of clever. my preference is always to fix things by making them simpler. 😀

@wbolster
Copy link
Contributor Author

wbolster commented Aug 14, 2020

and to prove my point, your suggestion would still be wrong. 🙃

the string substitution can introduce wildcard characters in the regex. most notably a dot which is actually expected in submodule names.

sure, another re.escape() thrown on top would fix that.

but the end result would be even more complex/clever and hard to understand at a glance.

@agronholm
Copy link
Owner

Fair enough.

@wbolster
Copy link
Contributor Author

wbolster commented Aug 20, 2020

Just wondering, is there anything that needs to be done here before this can be merged?

It would be great if this and #143 could make it into a new release, so that it is easier to actually use this project without manually installing from a custom git branch (or maintaining a fork). No rush though.

@agronholm
Copy link
Owner

Since the test suite was passing before, it would be nice to get a test added that does not pass with the previous implementation. Can you do that?

@wbolster
Copy link
Contributor Author

wbolster commented Aug 21, 2020

sure, i pushed a commit with some tests for the should_match logic here: d2230ca. this covers the actual matching logic by checking the helper directly (grey-box testing).

i also experimented with a more black-box approach, since the above-mentioned test does not not actually check the import hook itself. the code below approaches it from a higher level, and actually checks that typeguard successfully hijacked the import (by checking for the injected import typeguard in the loaded module). in the end i didn't like it much because it's very hackish and brittle, so i abandoned the experiment. anyway, i'll just dump it here in case you're interested:

def test_package_name_with_match():
    """
    The import hook injects a ‘typeguard’ import into matching modules.
    """
    sys.modules.pop("dummymodule", None)  # unload hack
    with install_import_hook("dummymodule"):
        module = import_module("dummymodule")
        module.typeguard  # typeguard import was injected


def test_package_name_no_match_prefix():
    """
    The import hook does not hook into non-matching modules.
    """
    sys.modules.pop("dummymodule", None)  # unload hack
    with install_import_hook("dummy"):
        module = import_module("dummymodule")
        with pytest.raises(AttributeError):
            module.typeguard  # typeguard import was not injected

@agronholm agronholm merged commit 55fbcb5 into agronholm:master Aug 22, 2020
@agronholm
Copy link
Owner

Perfect. Thanks again!

@wbolster wbolster deleted the import-hook-name-matching branch September 16, 2020 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants