Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] add a line matcher object #12219

Open
wants to merge 78 commits into
base: master
Choose a base branch
from

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Mar 31, 2024

Here's the line matching feature.

TL;DR: Usage is

app.stderr(flavor='re').assert_match('.*my message')

for the warnings for instance. It automatically removes the colors at the beginning and searches for a line matching the above string. The 'flavor' can maybe changed to 're' by default or have a assert_re_match method but that's how you would use the object in general.

There are some options with their default values but I'm not sure which one should be the default. For now, the line matcher object for test application uses the default options but maybe I can make it with better defaults.

@chrisjsewell, I'd suggest you have a look at the tests (tests/test_testing/test_matcher.py) to see how it would be used. It's still WIP but I would like some feedback and what to expose (actually, with the current API you can more or less match whatever you want more or less however you want but it's better if you don't need to add options or if you have dedicated methods so as an extension's writer I'd be glad if you could give me some feedback on what would the functionalities that you want for sure).

In the tests I used the most precise match information (namely the Line + offset) but you can ignore the offset and simply match against strings (see the tests for Line and Block objects).

I'm still wondering how to have a "nice" error context when the match fails... well, if you don't need the regex support you can just write something like app.stderr().lines() == [...]. Maybe I can also expose a simple interface for simple matching.

@picnixz
Copy link
Member Author

picnixz commented Mar 31, 2024

FYI: some stuff from opened PRs is included inside, so the diff will be smaller once they are merged. In addition, with the current implementation, it's easy for me to make some "shortcut" functions (the main logic is essentially done, except for the nice diff with regex patterns).

@picnixz
Copy link
Member Author

picnixz commented Apr 1, 2024

So, after some investigation, it's quite hard to have a "nice" diff because of regexes. I'll probably make another PR for that one because it's much more complex than what I thought at first (I think I'll take some inspiration from difflib, though it's not on my tasklist now).

I'll wait for some feedback before coming back to that PR.

EDIT: I've found various typos in the documentation so I'll update it tomorrow.

picnixz

This comment was marked as resolved.

@picnixz
Copy link
Member Author

picnixz commented Apr 2, 2024

After considering #12216, I'll update this PR as well.

@picnixz
Copy link
Member Author

picnixz commented Apr 10, 2024

Sure! I just pushed some docs because there multiple typos and refactored a bit things. By the way, I also observed that we had a bug in autodoc thanks to that... so I fixed it (I've got 3 PRs that need to be merged before that just for the docs to be correctly rendered).

@jayaddison
Copy link
Contributor

TL;DR: Usage is

app.stderr(flavor='re').assert_match('.*my message')

Checking some of the design reasoning for the method signature:

  • A reason to prefer the caller to select a flavor is that otherwise, a matcher pattern of type str could ambiguously be either a regex or a simple string match (app.stderr.assert_match('A*')).
  • Using keyword-arguments to select the flavor (app.stderr.assert_match(string='[not a regex]') might be non-obvious, and would prevent positional single-argument usage like the example above.

@picnixz
Copy link
Member Author

picnixz commented Apr 20, 2024

A reason to prefer the caller to select a flavor is that otherwise, a matcher pattern of type str could ambiguously be either a regex or a simple string match (app.stderr.assert_match('A*')).

I don't know which flavor to have by default... I would like to say "yay, it's maybe better to have a pure string flavor" because you usually want to have an exact match (I think most of our tests have exact matching and I think people don't want to escape possible meta-characters...).

One possibility that I had in mind is just to have other methods:

  • assert_match
  • assert_equal

But I'm not very happy with the equal itself... (ideally, it's assert_that_there_is_a_line_equal_to_one_of_the_strings_or_patterns)

Using keyword-arguments to select the flavor (app.stderr.assert_match(string='[not a regex]') might be non-obvious, and would prevent positional single-argument usage like the example above.

I think assert_match(expect, flavor=...) is more natural but I'm open to suggestions. Alternatively, we could have assert_match(regex='whatever to be considered a regex', string='whatever string it should be', fnmatch='whatever fnmatch pattern') but I'm not sure if more flavors should be supported (I tried to have an interface that is flavor-agnostic as much as possible and delegate the task to converting string-like objects to re-patterns objects however you deem them fit (maybe I could even use predicate-based things because I don't think I use anything else except pattern.match)).

@jayaddison
Copy link
Contributor

I think assert_match(expect, flavor=...) is more natural but I'm open to suggestions.

I think I'd probably prefer a developer experience of:

  • assert_contains(string) to check for lines containing string.
  • assert_matches(pattern) to check for lines that match a regular expression pattern.

That is: binding the flavour to the method name. That's partly because it could help me to think about and write / read code, but also there are some second-order downstream static analysis/dependency benefits (tooling/analysts can infer to some degree that assert_contains does not enter finite automata logic originated from Sphinx, for example).

@AA-Turner
Copy link
Member

AA-Turner commented Apr 24, 2024

I am probably missing some context here -- can someone point me to a brief overview of what this PR is for? The body text of the description simply says here is the matcher!

I'm aware Bénédikt has his thesis coming up soon, so no rush, just would be interested in the background given it's a 3000 line PR.

A

@picnixz
Copy link
Member Author

picnixz commented Apr 24, 2024

I am probably missing some context here

Oh yes, I think I never created an issue for that and just rushed to the implementation as if it were a hackathon. So, here's the context: we were doing some cleanups about colorization and co and I observed that many tests always have the same logic, namely:

lines = strip_colors(app.status.getvalue()).strip().splitlines()

and then, they match the lines one by one. All those lines could be replaced by a single function, let's say app.get_status_lines() and that would be it. But since I had time on my hands (and I wanted to have something for my local dev as well), I ended up implementing a line matcher where you can do more than just matching strings by strings, but also match blocks in one go, or with regexes and so on + have a nicer diff. The original idea was based on the pytester object that I used for testing plugins.

Technically speaking, the original issue could be solved by just adding two methods to SphinxTestApp that would just give you the lines. I can do a small PR if you think the 3k lines are too much. Actually, the interface of the matcher object is flexible enough that we can gradually add more methods if needed, but I essentially built it so that the most common operations are done by default, namely "remove colors -> strip -> split lines without keeping line breaks" (well, this default sequence is only for the matcher object for the test application, because people might want to use that matcher for any other kind of string technically).

I also think that it could be useful for matching autodoc outputs. I created a class for formatting expected RST output for enumeration test cases and I thought "how nice it would be to be able to do that but for other autodoc outputs as well" so I think the matcher object can be used in conjunction with those factory classes in a more user-friendly API. At least, tests would be easier to write (and probably shorter). Also, while the pytest output is fine, sometimes it's hard to detect exactly where the error happenned in the huge diff (the line matcher object would "highlight" the erroneous blocks; for now the logic is simple enough but I intended to do it using diff-like algorithms).

I'm aware Bénédikt has his thesis coming up soon, so no rush, just would be interested in the background given it's a 3000 line PR.

Yes, I'm currently writing it !

@picnixz
Copy link
Member Author

picnixz commented Apr 24, 2024

@jayaddison

That is: binding the flavour to the method name. That's partly because it could help me to think about and write / read code, but also there are some second-order downstream static analysis/dependency benefits (tooling/analysts can infer to some degree that assert_contains does not enter finite automata logic originated from Sphinx, for example).

The argument is sound. I can make the flavor-agnostic method private and expose dispatcher methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:enhancement enhance or introduce a new feature type:tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants