Correctly match package/module names in import hook #144

wbolster · 2020-08-14T17:49:13Z

The previously used regular expression tried to handle both exact matches and
prefix matches in one go, using this approach:

re.compile(r'^%s\.?' % pkg)

However, this is incorrect, since the literal dot is optional in the
pattern, causing longer matches to also get included. For example, ‘foo’
should match ‘foo’ and ‘foo.bar’, but it also incorrectly matches ‘foobar’:

>>> re.compile(r'^foo\.?').match('foobar')
<_sre.SRE_Match object; span=(0, 3), match='foo'>

In practice, a command like this (using the pytest plugin as an example)
is supposed to check the ‘flask’ package and any modules below it:

pytest --typeguard-packages=flask

... but in reality it also checks other packages, such as
‘flask_sqlalchemy’ and ‘flask_redis’, if those happen to be installed.

This can be easily fixed by using simple string matching instead of a
regular expression.
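
A minimal sketch of the string-matching idea, for illustration only. The function names `old_match` and `matches_package` are hypothetical and not taken from the patch; they just put the old pattern next to an exact-or-dotted-prefix check:

```python
import re


def old_match(pkg: str, module: str) -> bool:
    # The previous approach: the trailing dot is optional, so any name
    # that merely starts with `pkg` is accepted.
    return re.compile(r'^%s\.?' % pkg).match(module) is not None


def matches_package(pkg: str, module: str) -> bool:
    # Hypothetical helper (not the patch's code): accept the package itself
    # or anything below it, i.e. `pkg` followed by a literal dot.
    return module == pkg or module.startswith(pkg + ".")


for name in ("flask", "flask.app", "flask_sqlalchemy"):
    print(name, old_match("flask", name), matches_package("flask", name))
```

With the plain string check, 'flask_sqlalchemy' no longer counts as part of 'flask', while 'flask' and 'flask.app' still do.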

coveralls · 2020-08-14T18:06:25Z

Coverage increased (+0.1%) to 88.593% when pulling d2230ca on wbolster:import-hook-name-matching into 01a6fc4 on agronholm:master.

agronholm · 2020-08-14T22:01:06Z

Alternatively, we could fix the RE to be ^%s(\.|$).

wbolster · 2020-08-14T22:45:26Z

big fan of simple code here.

the regex wasn't obviously wrong the first time. sure, it can be fixed by making it more complex. or more clever.

i am not a fan of clever. my preference is always to fix things by making them simpler. 😀

wbolster · 2020-08-14T22:52:31Z

and to prove my point, your suggestion would still be wrong. 🙃

the string substitution can introduce wildcard characters in the regex. most notably a dot which is actually expected in submodule names.

sure, another re.escape() thrown on top would fix that.

but the end result would be even more complex/clever and hard to understand at a glance.
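
A small sketch of the wildcard problem described above, assuming the suggested pattern is built by plain string substitution. The helper names `anchored_match` and `escaped_match` are made up for this example:

```python
import re


def anchored_match(pkg: str, module: str) -> bool:
    # The suggested pattern, with `pkg` substituted verbatim: a dot inside
    # `pkg` acts as a regex wildcard instead of a literal dot.
    return re.match(r'^%s(\.|$)' % pkg, module) is not None


def escaped_match(pkg: str, module: str) -> bool:
    # Same pattern, but regex metacharacters in `pkg` are escaped first.
    return re.match(r'^%s(\.|$)' % re.escape(pkg), module) is not None


print(anchored_match("foo.bar", "fooxbar"))      # True: the dot matched 'x'
print(escaped_match("foo.bar", "fooxbar"))       # False
print(escaped_match("foo.bar", "foo.bar.baz"))   # True
```

The escaped variant behaves correctly, but it needs an anchor, a group, an alternation and an escape call to express what two string comparisons can say directly.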

agronholm · 2020-08-15T07:00:24Z

Fair enough.

wbolster · 2020-08-20T19:43:20Z

Just wondering, is there anything that needs to be done here before this can be merged?

It would be great if this and #143 could make it into a new release, so that it is easier to actually use this project without manually installing from a custom git branch (or maintaining a fork). No rush though.

agronholm · 2020-08-21T05:46:51Z

Since the test suite was passing before, it would be nice to get a test added that does not pass with the previous implementation. Can you do that?

wbolster · 2020-08-21T20:34:47Z

sure, i pushed a commit with some tests for the should_match logic here: d2230ca. this covers the actual matching logic by checking the helper directly (grey-box testing).

i also experimented with a more black-box approach, since the above-mentioned test does not actually check the import hook itself. the code below approaches it from a higher level, and actually checks that typeguard successfully hijacked the import (by checking for the injected import typeguard in the loaded module). in the end i didn't like it much because it's very hackish and brittle, so i abandoned the experiment. anyway, i'll just dump it here in case you're interested:

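import sys
from importlib import import_module

import pytest
from typeguard.importhook import install_import_hook


def test_package_name_with_match():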
    """
    The import hook injects a ‘typeguard’ import into matching modules.
    """
    sys.modules.pop("dummymodule", None)  # unload hack
    with install_import_hook("dummymodule"):
        module = import_module("dummymodule")
        module.typeguard  # typeguard import was injected


def test_package_name_no_match_prefix():
    """
    The import hook does not hook into non-matching modules.
    """
    sys.modules.pop("dummymodule", None)  # unload hack
    with install_import_hook("dummy"):
        module = import_module("dummymodule")
        with pytest.raises(AttributeError):
            module.typeguard  # typeguard import was not injected

agronholm · 2020-08-22T19:51:17Z

Perfect. Thanks again!

add test_package_name_matching() to test the pattern matching

d2230ca

agronholm merged commit 55fbcb5 into agronholm:master Aug 22, 2020

wbolster deleted the import-hook-name-matching branch September 16, 2020 09:17
