fuller support for fixtures #528
This sounds like
This is test parametrization. See #241
That sounds like
Depending on whether the setup should be done once for all tests or per test, this is either
That's not the point. The issues are as follows:
Could you elaborate on what exactly you mean by dependency resolution? Maybe give an example (even from another framework). The examples in your OP sound like a normal setup plus parametrization.
I think this does not mesh well with TAP as the reporting basis. You can either have a failed test or a passing test and then annotate that, like the
You're right, a test failure is something different from a setup failure. But in general we cannot decide whether a setup failure is persistent or intermittent. The author of the test would know. This boils down to having a
Tests generally need resources, such as a connection to a network service, or a file with dummy data. When a test requires a specific resource, it declares that dependency as a fixture. A test may need zero, one, or multiple resources, so a static setup function may be too limiting for some scenarios. Some resources may be quite elaborate, such as a virtual machine cloned from a save point.

The process of creating these resources may be error-prone and not fully deterministic. However, a failure creating a fixture is not a failure of a test. A fixture is created specifically for testing, and the test body uses the fixture to find a vulnerability in the software being tested, on the assumption that the fixture was created. Assuming a fixture is created successfully, the test itself provides valuable information about the validity of the test target.

Advanced frameworks allow not only tests to declare dependencies on fixtures, but fixtures to declare dependencies on other fixtures. When preparing a test, the framework resolves not only the proximate dependencies but also the ultimate ones, using a dependency graph much like the one used in a build process. For now, it is easiest to consider simply a many-to-many mapping of fixtures to tests, without any dependencies of one fixture on another. Fixtures are often parameterized, but test parameterization and fixtures remain separate topics.
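As a minimal sketch of that many-to-many mapping in plain bash (all function and variable names below are invented for illustration; none of this is an existing bats-core feature):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fixtures as plain bash functions, with tests
# declaring their dependencies by calling the fixture functions first.

fixture_tmpdir() {                  # fixture: a scratch directory
  FIXTURE_TMPDIR=$(mktemp -d) || return 1
}

fixture_dummy_file() {              # fixture: a file with dummy data,
  fixture_tmpdir || return 1        # itself depending on fixture_tmpdir
  FIXTURE_FILE=$FIXTURE_TMPDIR/data.txt
  printf 'dummy\n' > "$FIXTURE_FILE" || return 1
}

test_reads_dummy_data() {
  # A fixture failure aborts before the test proper; return code 2
  # marks "not attempted" as distinct from a test failure.
  fixture_dummy_file || { echo "fixture failed, test not run"; return 2; }
  grep -q dummy "$FIXTURE_FILE"     # the test proper
}

test_reads_dummy_data && echo "test passed"
```

The point of the sketch is only the separation of concerns: the fixture functions can fail independently of the test body, and the caller can tell the two apart by the return path.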
It may not be, but only in a narrow sense. A test report is useful as a complete listing of all tests that have passed versus failed. A good test set has high redundancy across many tests, not simply because redundancy offers greater certainty about validity, but because observations about which tests failed provide important information about the specific location of the problem. If a dependency fails, whether a setup routine or a fixture, then a test cannot run, but the failure gives no information about whether the test target is valid. In such cases, stopping all tests is more sensible, or if possible, resolving specifically which tests depend on the unavailable resource and running only the others.

The situation is similar to that of a theater producer inviting actors to a theater to audition for a play. The producer wants to see as many auditions as possible, but if the theater was forced to close because of an earlier fire, then the auditions may be cancelled for reasons unrelated to whether any of the actors were suitable choices for the roles. The producer should not conclude that any actor was unsuitable simply because the fire prevented the scheduled auditions.

The TAP format makes a good assumption: that someone running tests wishes to see a full report of exactly which tests failed versus passed. Even so, some situations will prevent properly running the tests, and those are the ones in which it is useful to terminate testing prematurely and not generate a full, properly formatted report.
I'm not sure I follow. I think the essential issue is giving the test author a separation between the procedure that prepares a test and the test itself. When this separation is properly enforced, the place of failure determines the best way to report. A failure in preparing the test should be reported in a way that does not suggest that the test failed (that is, that the target of the test was found to be invalid), only that the test could not be prepared. A failure in the test itself should be reported as the target of the test having been found invalid (we will assume that the test itself is valid, but that distinction is not the responsibility of the test framework).
How would you distinguish a static vs "dynamic" setup function in bash? At this very shallow level I don't see what prevents you from having a
I think this boils down to the ultimate question: do you want a setup failure to give a failed or a successful test result? The way Bats works, we map setup failures to test failures. I think this is the only right thing you can do in the general case, as you would not want a broken setup to "disable" the test and miss a breaking change. If you know that you actually just want to skip the test and not fail the test run, then you can call
Dependency management is a can of worms I'd like to avoid opening. If you have a complex system with complex dependencies, you can try to map it to functions calling each other. To encode support into bats we would need a strong use case and at least a good outline of what a solution would look like.
I was talking about parametrization as a means to make
I think I begin to grasp your underlying motivation, so let me try to rephrase to see if I got the gist of it:
Now, when resource A (e.g. a remote Server) is not available, you want to skip all tests that use this resource to avoid wasting time on running the rest of their
The issue with TAP is that we can only write one of two lines. Please give an example of what your expected output should look like in TAP for a setup/fixture failure.
"Static" meaning the same execution path of the setup logic for each test in the series. If a resource is not available, but needed for some tests, it would need to be provisioned by a
If the Olympics are canceled, which athletes do you want to be awarded the medals? I provided a different metaphor in my earlier comments.
A broken setup does disable a test. Such is an inevitable fact.
I agree, but it would be a big help to be able to do it for just one level, even if not a general graph.
Parameterization and dependencies are different subjects. The practical concerns are related, but the concepts are different. The former is about repetition of the same test under specific changes of values. The latter is about resolution of the necessary but minimal work for each single test, which is consistent for each invocation of the same test definition (before it is parameterized), but different for each test definition.
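To make the distinction concrete, a small bash sketch (all names invented) contrasting the two: parameterization repeats one test body over values, while a dependency is a resource resolved before the test runs:

```shell
#!/usr/bin/env bash
# Parameterization: one test body, many values.
check_positive() { [ "$1" -gt 0 ]; }
for value in 1 5 42; do
  check_positive "$value" && echo "PASS check_positive($value)"
done

# Dependency: a resource the test needs, prepared by a fixture function.
fixture_counter_file() {
  COUNTER_FILE=$(mktemp)
  echo 0 > "$COUNTER_FILE"
}

test_counter_starts_at_zero() {
  fixture_counter_file || return 2   # fixture failure: test not attempted
  [ "$(cat "$COUNTER_FILE")" -eq 0 ] # the test proper
}
test_counter_starts_at_zero && echo "PASS test_counter_starts_at_zero"
```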
Mostly, yes. I am not thinking particularly about wasted time as much as about completeness and correctness of reports. What I want mostly is a report that partitions tests into three categories: 1) those that passed, 2) those that failed even though the setup succeeded, and 3) those that were not attempted (because the setup failed, making any result from the test body meaningless, if not misleading, in the report).
Yes, it seems that TAP is limiting, at least the way you describe it. I am not familiar with it myself, so taking the premise that it provides no outlet to report tests that were not run due to setup failure, I believe it makes sense to offer a runtime switch on the command line. In some cases, the output will need to be processed by a robot, in which case adherence to syntax may be the overarching concern. At other times, the preference may be to break from the TAP style of output in this special case, in favor of showing a message such as the following:
I have the feeling that this discussion is going in circles and we seem to have trouble communicating effectively with each other. I'll try to summarize one last time but if this drags on longer, I'll need to pause this to attend to more pressing issues. There are two main issues:
I think these issues are mostly orthogonal to each other and can therefore be dealt with separately.
Your summary is essentially accurate, so I'm not sure what you think might be missing or unclear about my responses, other than that you might not feel persuaded by the suggestion. At any rate, I am not viewing this conversation as unsuccessful in terms of communication.
Yes, item (1) was the premise of the topic. To some degree, I agree about the orthogonality, but fulfilling item (1) allows the system to continue with some tests even while unmet dependencies prevent certain others. Thus, fully achieving the essence of item (2) depends to a certain extent on item (1). The relationship is more one of use case than of implementation, so discussing the two items together or separately is largely a matter of preference.
What would you want to see in code? Part of the issue is that much of the discussion was about reports, which are not expressed in code but interpreted in natural language. If you want an example that is real world rather than an abstract metaphor, you might consider hearing someone report, "The network services were down last night, but none of the tests showed a regression in the project code".
I understand, but the sense underlying your comments, in the current context, appears to be that the design of the entire system should be constrained by the limitations of a formatter. My suggestion is that the framework should carry the features most useful to those who use it, leaving to them any decisions about how to restrict its use to suit a chosen formatter.
A static "skip" keyword is entirely different from skipping a test based on a failed attempt to resolve a dependency, which is generally unknown beforehand. Further, it is often infeasible to insert a change to a test definition into the test environment, which may integrate directly with a revision control system.
Obviously, the formatter can only act on the information that it has available. Therefore, it is constrained by the way TAP delivers it.
Skip in Bats is a command that can be executed anywhere in setup or test code, even guarded by arbitrary branching logic. Consider this with regard to your above example:
So a network failure will generate following TAP output:
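As a rough sketch of that output, with invented test names and assuming a skip guard in setup, the TAP stream might look something like:

```
1..2
ok 1 fetch remote resource # skip network service unavailable
ok 2 upload to remote service # skip network service unavailable
```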
I'm not sure this response is closely connected to my comment. My point was that in considering more features for the framework, limitations in one formatter (or even many) for display of richer information should not be seen as a reason to exclude the features. The features have benefit independent of the formatter, and how to manage any loss of information is a decision that may vary per case.
Thank you for the helpful clarification. Even considering it, though, I'm not sure that it detracts much from the added value of the additional features under discussion.
The internal communication of bats is already a TAP stream plus some additional annotations that get filtered out by the formatters. By design, TAP prints the test outputs after the test state (
IMHO, this makes most of what you want possible with what is there already:
I think your formatter use case might be too much of a niche to put directly into bats-core.
I just skimmed the TAP specification, and discovered plenty of flexibility for extended reporting. One possible informal extension is represented by the following example:
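One informal shape such an extension could take (test and fixture names invented here) is a SKIP directive whose reason names the failed fixture, followed by TAP diagnostic lines carrying the details:

```
1..3
ok 1 client reconnects after timeout
ok 2 client retries failed upload # SKIP fixture remote_server failed
# fixture: remote_server
# error: connection refused
ok 3 client validates local config
```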
In fact, skipping is explained rather broadly, for example, through the following excerpt:
Yes, of course, I agree. In fact, any Bats testing might be written simply in pure Bash. At issue is the role of a test framework: to present metaphors and semantics that provide resilience and simplicity for the test writer, without the writer diving into boilerplate recipes. Fixtures are one of the tools that help achieve these ends.
I wrote up a small example of what I see as a possible solution for this issue in this gist. The example relies on the yet-to-be-released external formatters feature from #602.
I am still working to understand what is happening in the example. Basically, the idea is to avoid some of the limitations of TAP output by bypassing it with a custom formatting engine?
It is an example of creating a fixture hierarchy and using a custom formatter to ignore tests whose dependencies were not met. The formatter relies on the internal extended bats interface as defined in
Do I correctly understand that the structure of the fixture hierarchy is captured only by whatever call sequences are internal to the bodies of the test definitions, and that the new feature in the project represented by the example is limited to the customizable formatting?
That is correct. It is an implementation of what I described in previous comments here.
Well, it definitely adds some flexibility that might be useful as a workaround. It would still add further value, though, if the proposed enhancement were considered for future development.
To be honest, I still don't know what exactly you expect to be implemented. The formatter won't become part of bats-core. I ensured that you can load your own via #602. The fixture part is pure bash, so there is nothing to do here. If you want to suggest concrete changes, I need concrete examples of where you want to go with this.
Is it at least clear how one might imagine specific fixtures named as dependencies for particular tests (or other fixtures)?
Well, I already wrote how I would handle that: fixtures are functions, and tests call their respective fixture functions in the setup, as is done in the example. Caveat: for readability, the setup is done in the tests themselves, because we only have one setup per file. In the gist, all setup_* functions are intended as fixtures.
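A condensed sketch of that approach (function names invented here; the actual gist differs in detail), with setup_* functions serving as fixtures called at the top of each test body:

```shell
#!/usr/bin/env bash
# Every setup_* function is a fixture; each test calls the fixtures it
# needs at its top.  In a real .bats file the test body would live
# inside @test "..." { ... }; plain functions are used here so the
# sketch runs standalone.

setup_workdir() {
  WORKDIR=$(mktemp -d)
}

setup_config() {
  setup_workdir                     # a fixture depending on a fixture
  CONFIG=$WORKDIR/app.conf
  printf 'port=8080\n' > "$CONFIG"
}

test_config_has_port() {
  setup_config || return 2          # fixture failure: not attempted
  grep -q '^port=' "$CONFIG"        # the test proper
}

test_config_has_port && echo "PASS test_config_has_port"
```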
Are you referring to the example? The distinction I was making was for support for annotating the dependencies, as is available from many test frameworks. |
Yes, I am referring to the example. In my view, calling succinctly named functions is sufficient as annotation. If you find that lacking, then please provide an example where it could be improved upon with an explanation why and how it should be improved.
Explicitly calling functions may be sufficient with respect to succinctness alone, but the more essential benefit follows from the ability of the framework to derive a dependency tree from the annotations. An internal representation of tasks and dependencies facilitates clean reporting and the avoidance of redundant operations within the same test sequence. That internal representation is the matter of primary relevance. If you wish, I could consider giving an example of syntax, but the concept may be clear without one, as it follows a pattern common among many frameworks.
An example, even in another framework/language, would be much appreciated. |
It is trivial for me to provide an example from another tool, as all that is needed is a link to a reference manual. I am choosing from the pytest manual the section entitled How to use fixtures. Please note that some features described in the page may be out of scope of the current discussion, but it is important to review the subsection entitled Fixtures can request other fixtures.
Now, what exactly do you want to show there? Under the direct-call paradigm, functions can call each other too. The page discusses caching fixture results, which is not part of my example. Do you want to avoid duplicate fixture execution, e.g. in the following example?
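For concreteness, a bash sketch (all names invented) of the duplicated-fixture case, where two fixtures share a parent and a guard variable ensures the parent runs only once:

```shell
#!/usr/bin/env bash
# Two fixtures both depend on a shared parent fixture.  Without
# memoization the parent would run twice; a guard variable caches the
# fact that it has already been created.

PARENT_RUNS=0

fixture_parent() {
  [ -n "${PARENT_DONE:-}" ] && return 0   # already created: do nothing
  PARENT_RUNS=$((PARENT_RUNS + 1))        # (count runs, for illustration)
  PARENT_DONE=1
}

fixture_a() { fixture_parent; A_READY=1; }
fixture_b() { fixture_parent; B_READY=1; }

# A test needing both fixtures still provisions the parent only once.
fixture_a
fixture_b
echo "parent ran $PARENT_RUNS time(s)"
```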
An exhaustive representation of the full usefulness of fixtures is of course beyond the scope of this discussion. The general topic is presented in varied sources. The most relevant difference between fixtures as a framework feature, versus fixtures emulated by a direct function call (as in your example), is allowing the framework to detect an error in the fixture as distinct from one in the core test, and to report accordingly. A test with a failed fixture is not a failed test, only an unsuccessful test.
Well, that maps pretty directly to failures in IMHO, this does not mesh well with the way
That may well be. However, we already have the name fixture in use internally and the concept will have different features depending on what the language can offer. I am frustrated by the vagueness of all of this which leaves the details for me to figure out. My hope was that we could flesh out the minimum requirements we would need to implement for this to be useful. As is, I don't really know what we are going to do here and it is not on my priority list. You are welcome to provide a pull request for this feature but I'll have to stop here.
Yes, it may not, but the idea is to move beyond the current limitations of
Yes, we discussed this problem at great length in the original segment of this discussion. Did you review my suggestion for the TAP-compatible reporting strategy?
I wasn't aware they were still missing, but I would try to provide them, if you would allow me to try your patience just a bit further.
You mean the Using the parlance of pytest: they have passing, failing, and not-attempted tests (the last of which should fail the test run). We don't have the third option right now, and it will be hard to make that work with TAP, because TAP lacks it too.
Yes, please.
It's not ideal, but workable. The output conforms to the narrower assumptions of the basic format, but also includes richer information for ingestion by advanced engines.
For each test, allow annotation with a list of functions. The functions are run in sequence, as listed, with duplicates removed, before starting the test. If any function fails, then the test aborts. Internally, the test result falls into a third category separate from pass or fail. The requirements may be expanded; these are the bare minimum.
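A bash sketch of that bare minimum (the helper name, fixture names, and report wording are all invented for illustration):

```shell
#!/usr/bin/env bash
# Run a test preceded by an ordered list of fixture functions, with
# duplicates removed; if any fixture fails, the test is recorded as
# "not attempted" rather than "failed".

run_with_fixtures() {   # usage: run_with_fixtures test_fn fixture...
  local test_fn=$1; shift
  local seen=" " f
  for f in "$@"; do
    case $seen in *" $f "*) continue;; esac   # drop duplicate fixtures
    seen="$seen$f "
    if ! "$f"; then
      echo "NOT ATTEMPTED $test_fn (fixture $f failed)"
      return 0
    fi
  done
  if "$test_fn"; then echo "PASS $test_fn"; else echo "FAIL $test_fn"; fi
}

fix_ok()  { return 0; }
fix_bad() { return 1; }
t_one()   { return 0; }

run_with_fixtures t_one fix_ok fix_ok   # duplicate removed, test runs
run_with_fixtures t_one fix_bad         # fixture fails, test not attempted
```

The third category surfaces directly in the report line, which a formatter could map to a SKIP directive or to a custom marker as discussed above.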
Current thinking appears to point toward recursive invocation of the Bats executable in cases requiring test fixtures.
Such an approach is far more cumbersome and limiting than a typical fixtures feature in testing software.
Generally, fixtures provide at least a few benefits supporting simplicity and flexibility in test definitions:
For example, I am currently testing a system that must exhibit consistent interaction with a client across a variety of storage back ends. The tests covering the client interaction are defined, but lacking is any clear and simple way to run the same tests identically for each back end. A fixture that could reference a set of processes for generating and linking to each back end would support this need, following common design patterns in test systems.
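A hedged sketch of that scenario in bash (the back-end names and URLs are invented), where a parameterized fixture selects the back end and the same test body runs for each:

```shell
#!/usr/bin/env bash
# One test body, run once per storage back end; the fixture prepares
# the back end and the loop reports "not attempted" when it cannot.

start_backend() {                 # parameterized fixture
  case $1 in
    memory) BACKEND_URL="mem://local" ;;
    disk)   BACKEND_URL="file:///tmp/store" ;;
    *)      return 1 ;;           # unknown back end: fixture failure
  esac
}

test_client_roundtrip() {         # same test body for every back end
  [ -n "$BACKEND_URL" ]
}

for backend in memory disk; do
  if start_backend "$backend"; then
    test_client_roundtrip && echo "PASS roundtrip ($backend)"
  else
    echo "NOT ATTEMPTED roundtrip ($backend)"
  fi
done
```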
As another example, suppose ten tests each require the same setup, but at times the setup fails for reasons unrelated to a problem with the test target, for example, unavailability of a public internet resource. The current method would suggest placing the setup procedure in a shell function called setup(). If it were to fail, then it would be invoked and fail for each of the ten tests, and each would be reported as having failed. More helpful behavior would be stopping the test process after the first failure and reporting a problem with the setup operation (that is, the fixture); as such, no tests would have been run or reported as having failed versus succeeded.