Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace tests that assert on token output with auto-updatable samples #1649

Merged
merged 15 commits into from
Jan 18, 2021

Conversation

oprypin
Copy link
Contributor

@oprypin oprypin commented Dec 28, 2020

This extracts the code samples from each test that used to do assert list(lexer.get_tokens(fragment)) == tokens, puts each of them into their own file, and gives the ability to automatically update such files if there are changes to the code affecting them.

To perform such an update in-place, run pytest tests/test_lexers.py --update-goldens. The functionality comes from the 'pytest-golden' plugin (disclaimer: written by me!)

When testing without the update flag, this just asserts that the token list does not change from the stored list.

So, the tests still serve the same purpose as before (and you can see the diffs both from pytest and in code review), just that they don't need to be manually updated. One only needs to provide the interesting code sample.


Note to reviewers:

tests/test_lexers.py is the newly added file, which is the entry point for these tests.

The conversion work was mostly mechanical, but I'll make sure to look through all of it again if the overall idea is liked.

This extracts the code samples from each test that used to do `assert list(lexer.get_tokens(fragment)) == tokens`, puts each of them into their own file, and gives the ability to *automatically* update such files if there are changes to the code affecting them.

To perform such an update in-place, run `pytest tests/test_lexers.py --update-goldens`. The functionality comes from the 'pytest-golden' plugin [1].

When testing *without* the update flag, this just asserts that the token list does not change from the stored list.

So, the tests still serve the same purpose as before (and you can see the diffs both from pytest and in code review), just that they don't need to be manually updated. One only needs to provide the interesting code sample.

[1]: https://github.com/oprypin/pytest-golden
@Anteru
Copy link
Collaborator

Anteru commented Dec 28, 2020

Hi, that's ... a rather huge change! Before I get into the details of this, this doesn't come with a formatter for it, unless I'm missing something. Right now use can use -f testcase to generate test case code. That is rather convenient as you can take any code snippet and turn it into a test quickly.

In general I think the idea is good, but I'm not entirely sure if that's not a bit too big of a change to take right now. I'd also like to have @birkenfeld chime in.

@oprypin
Copy link
Contributor Author

oprypin commented Dec 28, 2020

Haha I see, looks like I didn't realize that people normally use -f testcase to update the tests.

You're saying it doesn't come with a formatter, but a formatter would be used manually and its output would be pasted into the test file. Instead, this one does the updating itself (with one universal command), so it's not needed.

@Anteru Anteru added this to the 2.8 milestone Jan 12, 2021
@Anteru
Copy link
Collaborator

Anteru commented Jan 17, 2021

Can you please re-run this on the latest master, looks like there's a bunch of conflicts here. Regarding the formatter, I'm not sure I'm following: Today, when someone wants to test a piece of text, they run it through pygments -f testcase, and get code they can copy/paste into a unit test file. The testcase formatter produces the tokens and the token types. How do you envision this will work once your plugin is merged? I thought they'd still use testcase to produce the testcases, just in the new, more readable format.

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

Can you please re-run this on the latest master, looks like there's a bunch of conflicts here.

This initial translation is partly manual and not trivial. Not worth resolving conflicts in case it's going to sit here for a while again.

I thought they'd still use testcase to produce the testcases, just in the new, more readable format.

They add a new file that looks like

input: |
  SOURCE CODE

then run pytest --update-goldens and the file gets fully populated.

If the example file already exists but needs an update, just the latter part is necessary.

@Anteru
Copy link
Collaborator

Anteru commented Jan 17, 2021

This initial translation is partly manual and not trivial. Not worth resolving conflicts in case it's going to sit here for a while again.

Well, I do want to merge it immediately, so it's not going to sit there for a while :) I just didn't want to merge ahead of a release, as we had a few PRs in flight.

They add a new file that looks like

input: |
  SOURCE CODE

then run pytest --update-goldens and the file gets fully populated.

I see. Can you please add a few lines to the docs about this?

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

I do want to merge it immediately, so it's not going to sit there for a while :)

Oh sorry then, let me do that.

Has @birkenfeld checked it out? I acknowledge that this one can be controversial :D

BTW, I do recommend trying it out for yourself locally. There are various ways (gh command line is nice), the basic way is:

git fetch origin pull/1649/head
git checkout FETCH_HEAD

And yes, I'll think of a doc as well.

@Anteru
Copy link
Collaborator

Anteru commented Jan 17, 2021

Yes, @birkenfeld is on board. Please let me know when you're ready, I want to merge this within a day of you being done so we can cleanly cut over to the new system going forward. Thanks for your help here!

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

Thank you as well! I will take a detailed look over everything today and then let you know.

@birkenfeld
Copy link
Member

To chime in in more detail here, after having reviewed the patch more closely now: I'm in favor of the general concept, however I'm hesitant about two things:

  • YAML as the file format - I intensely dislike it in general, and in particular for such trivial data as two literal strings it's overkill.
  • The pytest plugin, while nice, is an uncommon dependency and itself has lots of transitive dependencies. This creates packaging work for distributors, while the actual provided functionality is quite modest.

We actually already have a facility for this in the test suite; the test_examplefiles can store its output and check against that (but it's not as nice, and disabled in the code). In the course (or as a followup) of this change, I'd also like to convert the examplefiles test to this method.

So as a TLDR: excellent idea, but the implementation needs to be simplified.

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

Hehe, maybe it's true. Having gone through several iterations of this in development, I lost the context that it is in fact just 2 string literals. But to be fair, another feature of YAML is being used: the comments.
I think I can extract the functionality to have no external dependency.
But please do tell me your suggestion regarding how the sample files should look.
An alternative to

# comment
input: |
  sample
# comment
tokens: |
  Aaa      aaa
  Bbb      bbb

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

Not worth resolving conflicts

Well, I guess that was a good prediction

@Anteru
Copy link
Collaborator

Anteru commented Jan 17, 2021

As to the file format, it could be as simple as splitting it into two files, testcase.input and testcase.expected, and .expected would be one token per line, with comment support (given that has to adhere to a format anyways.)

@oprypin oprypin force-pushed the golden branch 2 times, most recently from 2b90301 to 3a649d2 Compare January 17, 2021 21:33
@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

As of now, did the work purely to avoid any external deps, by re-implementing them here.

If you want a particular file format, let me know.
Regarding the suggestion about two files per case - I think that will have really bad usability.

@oprypin
Copy link
Contributor Author

oprypin commented Jan 17, 2021

Pushed with new file format, one which has no invalid states.

Canonical
Canonical
w/o comment
w/o
input marker
w/o
tokens marker
w/o markers
comment
---input---
input
---tokens---
tokens

---input---
input
---tokens---
tokens


input
---tokens---
tokens

---input---
input




input


3rd and 4th examples should never occur in practice. And of course, writing the files will be in the canonical form.

@birkenfeld
Copy link
Member

Ok sure, we can squash when merging.

@birkenfeld birkenfeld merged commit f0445be into pygments:master Jan 18, 2021
@Anteru
Copy link
Collaborator

Anteru commented Jan 18, 2021

Very nice, thanks a lot! That should hopefully reduce the entry barrier a bit going forward.

@oprypin
Copy link
Contributor Author

oprypin commented Feb 14, 2021

I see that not everything is 100% smooth. E.g. the fact that 3f78a50 was needed. If you have any pains with this, please poke me any time :)
E.g. if this particular example keeps re-occurring, then we should make the assertion be that the file content 100% matches, not that the values from it match

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants