Replace tests that assert on token output with auto-updatable samples #1649

oprypin · 2020-12-28T20:32:42Z

This extracts the code samples from each test that used to do assert list(lexer.get_tokens(fragment)) == tokens, puts each of them into their own file, and gives the ability to automatically update such files if there are changes to the code affecting them.

To perform such an update in-place, run pytest tests/test_lexers.py --update-goldens. ~~The functionality comes from the 'pytest-golden' plugin (disclaimer: written by me!)~~

When testing without the update flag, this just asserts that the token list does not change from the stored list.

So, the tests still serve the same purpose as before (and you can see the diffs both from pytest and in code review), just that they don't need to be manually updated. One only needs to provide the interesting code sample.

Note to reviewers:

tests/test_lexers.py is the newly added file, which is the entry point for these tests.

The conversion work was mostly mechanical, but I'll make sure to look through all of it again if the overall idea is liked.

This extracts the code samples from each test that used to do `assert list(lexer.get_tokens(fragment)) == tokens`, puts each of them into their own file, and gives the ability to *automatically* update such files if there are changes to the code affecting them. To perform such an update in-place, run `pytest tests/test_lexers.py --update-goldens`. The functionality comes from the 'pytest-golden' plugin [1]. When testing *without* the update flag, this just asserts that the token list does not change from the stored list. So, the tests still serve the same purpose as before (and you can see the diffs both from pytest and in code review), just that they don't need to be manually updated. One only needs to provide the interesting code sample. [1]: https://github.com/oprypin/pytest-golden

Anteru · 2020-12-28T20:45:32Z

Hi, that's ... a rather huge change! Before I get into the details of this, this doesn't come with a formatter for it, unless I'm missing something. Right now use can use -f testcase to generate test case code. That is rather convenient as you can take any code snippet and turn it into a test quickly.

In general I think the idea is good, but I'm not entirely sure if that's not a bit too big of a change to take right now. I'd also like to have @birkenfeld chime in.

oprypin · 2020-12-28T20:49:47Z

Haha I see, looks like I didn't realize that people normally use -f testcase to update the tests.

You're saying it doesn't come with a formatter, but a formatter would be used manually and its output would be pasted into the test file. Instead, this one does the updating itself (with one universal command), so it's not needed.

Anteru · 2021-01-17T07:43:26Z

Can you please re-run this on the latest master, looks like there's a bunch of conflicts here. Regarding the formatter, I'm not sure I'm following: Today, when someone wants to test a piece of text, they run it through pygments -f testcase, and get code they can copy/paste into a unit test file. The testcase formatter produces the tokens and the token types. How do you envision this will work once your plugin is merged? I thought they'd still use testcase to produce the testcases, just in the new, more readable format.

oprypin · 2021-01-17T07:53:03Z

Can you please re-run this on the latest master, looks like there's a bunch of conflicts here.

This initial translation is partly manual and not trivial. Not worth resolving conflicts in case it's going to sit here for a while again.

I thought they'd still use testcase to produce the testcases, just in the new, more readable format.

They add a new file that looks like

input: |
  SOURCE CODE

then run pytest --update-goldens and the file gets fully populated.

If the example file already exists but needs an update, just the latter part is necessary.

Anteru · 2021-01-17T07:57:14Z

This initial translation is partly manual and not trivial. Not worth resolving conflicts in case it's going to sit here for a while again.

Well, I do want to merge it immediately, so it's not going to sit there for a while :) I just didn't want to merge ahead of a release, as we had a few PRs in flight.

They add a new file that looks like
input: |
  SOURCE CODE
then run pytest --update-goldens and the file gets fully populated.

I see. Can you please add a few lines to the docs about this?

oprypin · 2021-01-17T08:01:21Z

I do want to merge it immediately, so it's not going to sit there for a while :)

Oh sorry then, let me do that.

Has @birkenfeld checked it out? I acknowledge that this one can be controversial :D

BTW, I do recommend trying it out for yourself locally. There are various ways (gh command line is nice), the basic way is:

git fetch origin pull/1649/head
git checkout FETCH_HEAD

And yes, I'll think of a doc as well.

Anteru · 2021-01-17T09:16:24Z

Yes, @birkenfeld is on board. Please let me know when you're ready, I want to merge this within a day of you being done so we can cleanly cut over to the new system going forward. Thanks for your help here!

oprypin · 2021-01-17T09:19:34Z

Thank you as well! I will take a detailed look over everything today and then let you know.

birkenfeld · 2021-01-17T09:34:00Z

To chime in in more detail here, after having reviewed the patch more closely now: I'm in favor of the general concept, however I'm hesitant about two things:

YAML as the file format - I intensely dislike it in general, and in particular for such trivial data as two literal strings it's overkill.
The pytest plugin, while nice, is an uncommon dependency and itself has lots of transitive dependencies. This creates packaging work for distributors, while the actual provided functionality is quite modest.

We actually already have a facility for this in the test suite; the test_examplefiles can store its output and check against that (but it's not as nice, and disabled in the code). In the course (or as a followup) of this change, I'd also like to convert the examplefiles test to this method.

So as a TLDR: excellent idea, but the implementation needs to be simplified.

oprypin · 2021-01-17T09:49:06Z

Hehe, maybe it's true. Having gone through several iterations of this in development, I lost the context that it is in fact just 2 string literals. But to be fair, another feature of YAML is being used: the comments.
I think I can extract the functionality to have no external dependency.
But please do tell me your suggestion regarding how the sample files should look.
An alternative to

# comment
input: |
  sample
# comment
tokens: |
  Aaa      aaa
  Bbb      bbb

oprypin · 2021-01-17T20:13:19Z

Not worth resolving conflicts

Well, I guess that was a good prediction

Anteru · 2021-01-17T20:24:12Z

As to the file format, it could be as simple as splitting it into two files, testcase.input and testcase.expected, and .expected would be one token per line, with comment support (given that has to adhere to a format anyways.)

oprypin · 2021-01-17T21:43:29Z

As of now, did the work purely to avoid any external deps, by re-implementing them here.

If you want a particular file format, let me know.
Regarding the suggestion about two files per case - I think that will have really bad usability.

oprypin · 2021-01-17T22:45:58Z

Pushed with new file format, one which has no invalid states.

Canonical	Canonical w/o comment	w/o input marker	w/o tokens marker	w/o markers
comment ---input--- input ---tokens--- tokens	---input--- input ---tokens--- tokens	input ---tokens--- tokens	---input--- input	input

3rd and 4th examples should never occur in practice. And of course, writing the files will be in the canonical form.

birkenfeld · 2021-01-18T19:34:02Z

Ok sure, we can squash when merging.

Anteru · 2021-01-18T20:25:59Z

Very nice, thanks a lot! That should hopefully reduce the entry barrier a bit going forward.

oprypin · 2021-02-14T16:57:42Z

I see that not everything is 100% smooth. E.g. the fact that 3f78a50 was needed. If you have any pains with this, please poke me any time :)
E.g. if this particular example keeps re-occurring, then we should make the assertion be that the file content 100% matches, not that the values from it match

Anteru requested a review from birkenfeld December 28, 2020 20:41

Anteru added this to the 2.8 milestone Jan 12, 2021

oprypin added 2 commits January 17, 2021 10:08

Merge remote-tracking branch 'origin/master' into golden

22e3e05

Fix badly migrated vbscript tests

7d14db4

oprypin added 3 commits January 17, 2021 16:05

Describe golden tests in Contributing.md

b62ac2c

Finish some forgotten items

9cc150e

Merge remote-tracking branch 'origin/master' into golden

5b9a643

oprypin force-pushed the golden branch from bcf5ec7 to 5b9a643 Compare January 17, 2021 20:12

oprypin force-pushed the golden branch 2 times, most recently from 2b90301 to 3a649d2 Compare January 17, 2021 21:33

Avoid pytest-golden and YAML

b491e53

oprypin force-pushed the golden branch from 3a649d2 to b491e53 Compare January 17, 2021 21:39

oprypin added 2 commits January 17, 2021 23:30

Simplify sample file format

f35c9b8

Rename yaml -> txt

4a8d962

Canonicalization fix

a634f41

birkenfeld merged commit f0445be into pygments:master Jan 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace tests that assert on token output with auto-updatable samples #1649

Replace tests that assert on token output with auto-updatable samples #1649

oprypin commented Dec 28, 2020 •

edited

Anteru commented Dec 28, 2020

oprypin commented Dec 28, 2020

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021 •

edited

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

birkenfeld commented Jan 17, 2021

oprypin commented Jan 17, 2021 •

edited

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

oprypin commented Jan 17, 2021

birkenfeld commented Jan 18, 2021

Anteru commented Jan 18, 2021

oprypin commented Feb 14, 2021

Replace tests that assert on token output with auto-updatable samples #1649

Replace tests that assert on token output with auto-updatable samples #1649

Conversation

oprypin commented Dec 28, 2020 • edited

Anteru commented Dec 28, 2020

oprypin commented Dec 28, 2020

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021 • edited

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

birkenfeld commented Jan 17, 2021

oprypin commented Jan 17, 2021 • edited

oprypin commented Jan 17, 2021

Anteru commented Jan 17, 2021

oprypin commented Jan 17, 2021

oprypin commented Jan 17, 2021

birkenfeld commented Jan 18, 2021

Anteru commented Jan 18, 2021

oprypin commented Feb 14, 2021

oprypin commented Dec 28, 2020 •

edited

Anteru commented Jan 17, 2021 •

edited

oprypin commented Jan 17, 2021 •

edited