
fuller support for fixtures #528

Open
brainchild0 opened this issue Dec 27, 2021 · 32 comments
Labels
Component: Bash Code (Everything regarding the bash code), Priority: Medium (Wrong or misleading documentation, broken behavior with workaround), Type: Enhancement, Waiting for Contributor Feedback (The original contributor did not yet respond to the latest request)

Comments

@brainchild0

brainchild0 commented Dec 27, 2021

Current thinking appears to point toward recursive invocation of the Bats executable in cases requiring test fixtures.

Such an approach is far more cumbersome and limiting than a typical fixtures feature in testing software.

Generally, fixtures provide several benefits that support simplicity and flexibility in test definitions, including the following:

  1. Automatic resolution of dependencies.
  2. Automatic iterative execution of the same test body for different fixture values.
  3. Support for defining processes that run during testing but are not considered part of the test.

For example, currently I am testing a system that must behave consistently toward a client across a variety of storage back ends. The tests providing the client interaction are defined, but lacking is any clear and simple way to run the same tests identically for each back end. A fixture that could reference the set of processes for generating and linking to each back end would support this need, following common design patterns in test systems.

As another example, suppose ten tests each require the same setup, but sometimes the setup fails for reasons not related to a problem with the test target, for example, non-availability of a public internet resource. The current method would suggest placing the setup procedure in a shell function called setup(). If it were to fail, then it would be invoked and fail for each of the ten tests, and each would be reported as having failed. More helpful behavior would be to stop the test process after the first failure and report a problem with the setup operation (that is, the fixture); no tests would then have been run, or reported as having failed versus succeeded.
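To make the current approach concrete, here is a minimal sketch (fetch_public_dataset and my_tool are hypothetical placeholders; setup() running before every test is standard Bats behavior):

setup() {
  # runs before every test; a single outage of the public resource
  # causes all ten tests in this file to be reported as failed
  DATASET="$(fetch_public_dataset)"   # may fail for reasons unrelated to the test target
}

@test "first of ten tests" {
  run my_tool --input "$DATASET"
  [ "$status" -eq 0 ]
}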

@martin-schulze-vireso
Member

  1. Automatic resolution of dependencies.

This sounds like setup()'s job.

  2. Automatic iterative execution of the same test body for different fixture values.

This is test parametrization. See #241

  3. Support for defining processes that run during testing but are not considered part of the test.

That sounds like setup again.

As another example, suppose ten tests each require the same setup, but sometimes the setup fails for reasons not related to a problem with the test target, for example, non-availability of a public internet resource. The current method would suggest placing the setup procedure in a shell function called setup(). If it were to fail, then it would be invoked and fail for each of the ten tests, and each would be reported as having failed. More helpful behavior would be to stop the test process after the first failure and report a problem with the setup operation (that is, the fixture); no tests would then have been run, or reported as having failed versus succeeded.

Depending on whether the setup should be done once for all tests or per test, this is either setup_file territory or #209.
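For illustration, a minimal sketch of the two existing hooks (start_shared_service and make_temp_workdir are hypothetical helpers):

setup_file() {
  # runs once before the first test in this file
  start_shared_service
}

setup() {
  # runs before every individual test
  make_temp_workdir
}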

@brainchild0
Author

Depending on whether the setup should be done once for all tests or per test, this is either setup_file territory or #209.

That's not the point.

The issues are as follows:

  • dependency resolution of fixtures as a feature of the framework
  • automatically resolving which setup tasks to perform for each test
  • whether a test is reported as failed because the setup failed, or rather whether the test is reported as simply never attempted (because setup is not part of the test)
  • whether testing stops because setup failed (which is different from stopping all tests because one test failed)

@martin-schulze-vireso
Member

martin-schulze-vireso commented Dec 29, 2021

  • dependency resolution of fixtures as a feature of the framework
  • automatically resolving which setup tasks to perform for each test

Could you elaborate on what exactly you mean by dependency resolution? Maybe give an example (even from another framework). The examples in your OP sound like a normal setup plus parametrization.

  • whether a test is reported as failed because the setup failed, or rather whether the test is reported as simply never attempted (because setup is not part of the test)

I think this does not mesh well with TAP as the reporting basis. You can either have a failed test or a passing test and then annotate that, like the # skip annotation. Currently, a setup failure will fail the test and print that the failure comes from setup, which does communicate the relevant information. Changing this would not only be a major departure from the current way things work but also does not make sense to me. A test that was skipped due to setup failures should still be counted as failed.

  • whether testing stops because setup failed (which is different from stopping all tests because one test failed)

You're right, a test failure is something different than a setup failure. But in general we cannot decide if a setup failure is persistent or intermittent. The author of the test would know. This boils down to having a --abort-on-setup-failure flag or a variable to turn that on per file or to offer a way for a test/setup to communicate to the upper layers that the file should be aborted.

@brainchild0
Author

Could you elaborate on what exactly you mean by dependency resolution?

Tests generally need resources, such as a connection to a network service, or a file with dummy data.

When a test requires a specific resource, it declares that dependency as a fixture. A test may need zero resources, one resource, or multiple resources. A static setup function may be limiting in flexibility for some scenarios.

Some resources may be quite elaborate, such as a virtual machine cloned from a save point. The processes of creating these resources may be error prone, and not fully deterministic. However, a failure creating a fixture is not a failure of a test. A fixture is created specifically for testing, and the test body uses a fixture to find a vulnerability in the software being tested, based on an assumption that the fixture is created. Assuming a fixture is created successfully, the test itself provides valuable information about the validity of the test target.

Advanced frameworks allow not only tests to declare dependencies on fixtures, but fixtures to declare dependencies on other fixtures. When preparing a test, the framework resolves not only the proximate dependencies, but also the ultimate dependencies, using a dependency graph much like the one used in a build process.

For now, it is easiest to consider simply a many-to-many mapping of fixtures to tests without any dependencies of one fixture on another. Fixtures are often parameterized, but test parameterization and testing fixtures remain separate topics.

  • whether a test is reported as failed because the setup failed, or rather whether the test is reported as simply never attempted (because setup is not part of the test)

I think this does not mesh well with TAP as the reporting basis.

It may not be, but only in a narrow sense. A test report is useful as a complete listing of all tests that have passed versus failed. A good test set has high redundancy across many tests, not simply because redundancy offers greater certainty about validity, but because observations about which tests failed provide important information about the specific location of the problem.

If a dependency fails, whether a setup routine or a fixture, then a test cannot run, but the failure gives no information about whether the test target is valid. In such cases, stopping all tests is more sensible, or if possible, resolving specifically which tests depend on the unavailable resource, and running only the others.

The situation is similar to that of a theater producer inviting actors to a theater to audition for a play. The producer wants to see as many auditions as possible, but if the theater was forced to close because of an earlier fire, then the auditions may be cancelled for reasons unrelated to whether any of the actors were suitable choices for the roles. The producer should not conclude that any actor was not suitable simply because the fire prevented the scheduled auditions.

The TAP format makes a good assumption, that someone running tests wishes to see a full report of exactly which tests failed versus passed. Even so, some situations will prevent properly running the tests, and they are the ones in which it is useful to terminate testing prematurely, and not to generate a full, properly-formatted report.

whether testing stops because setup failed (which is different from stopping all tests because one test failed)

You're right, a test failure is something different than a setup failure. But in general we cannot decide if a setup failure is persistent or intermittent.

I'm not sure I follow. I think the essential issue is giving the test author a separation between the procedure that prepares a test and the test itself. When this separation is properly enforced, the place of failure determines the best way to report. A failure in preparing the test should be reported in a way that does not suggest that the test failed (that is, that the target of the test was found to be invalid), only that the test could not be prepared. A failure in the test itself should be reported as meaning that the target of the test was found to be invalid (we will assume that the test is valid, but that distinction is not the responsibility of the test framework).

@martin-schulze-vireso
Member

Could you elaborate on what exactly you mean by dependency resolution?

Tests generally need resources, such as a connection to a network service, or a file with dummy data.

When a test requires a specific resource, it declares that dependency as a fixture. A test may need zero resources, one resource, or multiple resources. A static setup function may be limiting in flexibility for some scenarios.

How would you distinguish a static vs "dynamic" setup function in bash? At this very shallow level I don't see what prevents you from having a setup() that does the "magic" you are describing above.

Some resources may be quite elaborate, such as a virtual machine cloned from a save point. The processes of creating these resources may be error prone, and not fully deterministic. However, a failure creating a fixture is not a failure of a test. A fixture is created specifically for testing, and the test body uses a fixture to find a vulnerability in the software being tested, based on an assumption that the fixture is created. Assuming a fixture is created successfully, the test itself provides valuable information about the validity of the test target.

I think this boils down to the ultimate question: do you want a setup failure to give a failed test or a successful test result? The way Bats works, we map setup failures to test failures. I think this is the only right thing you can do in a general case, as you would not want to have a broken setup "disable" the test and miss a breaking change. If you know that you actually just want to skip the test and not fail the test run, then you can call skip in the setup().

Advanced frameworks allow not only tests to declare dependencies on fixtures, but fixtures to declare dependencies on other fixtures. When preparing a test, the framework resolves not only the proximate dependencies, but also the ultimate dependencies, using a dependency graph much like the one used in a build process.

Dependency management is a can of worms I'd like to avoid opening. If you have a complex system with complex dependencies, you can try to map it to functions calling each other. To encode support into bats we would need a strong use case and at least a good outline of what a solution would look like.

For now, it is easiest to consider simply a many-to-many mapping of fixtures to tests without any dependencies of one fixture on another. Fixtures are often parameterized, but test parameterization and testing fixtures remain separate topics.

I was talking about parametrization as a means to make setup() more dynamic.

If a dependency fails, whether a setup routine or a fixture, then a test cannot run, but the failure gives no information about whether the test target is valid. In such cases, stopping all tests is more sensible, or if possible, resolving specifically which tests depend on the unavailable resource, and running only the others.
[...]
The TAP format makes a good assumption, that someone running tests wishes to see a full report of exactly which tests failed versus passed. Even so, some situations will prevent properly running the tests, and they are the ones in which it is useful to terminate testing prematurely, and not to generate a full, properly-formatted report.

I think I begin to grasp your underlying motivation, so let me try to rephrase to see if I got the gist of it:

  • you have many tests
  • with complex/costly setups
  • with shared dependencies that may fail

Now, when resource A (e.g. a remote Server) is not available, you want to skip all tests that use this resource to avoid wasting time on running the rest of their setup() just to notice again, that A is not available. Is that right?

I think the essential issue is giving the test author a separation between the procedure that prepares a test and the test itself. When this separation is properly enforced, the place of failure determines the best way to report. A failure in preparing the test should be reported in a way that does not suggest that the test failed (that is, that the target of the test was found to be invalid), only that the test could not be prepared. A failure in the test itself should be reported as meaning that the target of the test was found to be invalid (we will assume that the test is valid, but that distinction is not the responsibility of the test framework).

The issue with TAP is, that we can only write one of two lines: ok 1 test or not ok 1 test. We can adorn that with all kinds of additional context (like # skipped) but in the end it will be marked either as a failure or a success. Currently, a setup failure will map to not ok and the message will clearly state where setup() failed.

Please give an example of what your expected output should look like in TAP for a setup/fixture failure.

@brainchild0
Author

How would you distinguish a static vs "dynamic" setup function in bash?

"Static" meaning the same execution path of the setup logic for each test in the series. If a resource is not available, but needed for some tests, it would need to provisioned by a setup() routine (because the function has one definition, and because the resource is needed by at least one test in the series), and then even the tests not needing the unavailable resource are not able to complete.

I think this boils down to the ultimate question: do you want a setup failure to give a failed test or a successful test result?

If the Olympics are canceled, which athletes do you want to be awarded the medals? I provided a different metaphor in my earlier comments.

The way Bats works, we map setup failures to test failures. I think this is the only right thing you can do in a general case, as you would not want to have a broken setup "disable" the test and miss a breaking change.

A broken setup does disable a test. Such is an inevitable fact.

Dependency management is a can of worms I'd like to avoid opening.

I agree, but it would be a big help to be able to do it for just one level, even if not a general graph.

I was talking about parametrization as a means to make setup() more dynamic.

Parameterization and dependencies are different subjects. The practical concerns are related, but the concepts are different. The former is about repetition of the same test but for specific changes of values. The latter is about resolution of the necessary but minimal work for each single test, which is consistent for each invocation of the same test definition (before it is parameterized), but different for each test definition.

  • you have many tests
  • with complex/costly setups
  • with shared dependencies that may fail

Now, when resource A (e.g. a remote Server) is not available, you want to skip all tests that use this resource to avoid wasting time

Mostly, yes. I am not thinking particularly about wasted time, as much as about completeness and correctness of reports. What I want mostly is a report that partitions tests into three categories: 1) those that passed, 2) those that failed even though the setup succeeded, and 3) those that were not attempted (because the setup failed, making any result from the test body meaningless, if not misleading, in the report).

The issue with TAP is, that we can only write one of two lines: ok 1 test or not ok 1 test.

Yes, it seems that TAP is limiting, at least the way you describe it. I am not familiar with it myself, so taking the premise that it provides no outlet to report tests that were not run due to setup failure, I believe it makes sense to offer a runtime switch on the command line.

In some cases, the output will need to be processed by a robot, in which case adherence to syntax may be the overarching concern.

At other times, the preference may be to break from the TAP-style output in this special case, in favor of showing a message such as the following:

Testing aborted: Unable to complete test setup.

@martin-schulze-vireso
Member

I have the feeling that this discussion is going in circles and we seem to have trouble communicating effectively with each other. I'll try to summarize one last time but if this drags on longer, I'll need to pause this to attend to more pressing issues.

There are two main issues:

  1. you want automatic dependency resolution for the setup phase, let's call that the fixture part
  2. you want a mechanism to filter out tests whose setup failed in the final display, let's call that the formatter part

I think these issues are mostly orthogonal to each other and can therefore be dealt with separately.

  1. I still need a good concrete example for the fixture part to see where Bats falls short and what that should look like in an ideal world, to be able to carry on a meaningful discussion. By concrete I mean code, not some prose description of real-world analogues.
  2. although TAP is at the core of Bats internal communication, it is mainly a carrier medium that has already been used for several other formats. Similarly, there are many translators from TAP to other common test result formats. The current direction of Bats is to avoid adding more formatters into core, but to work towards a stable interface that allows for third party formatters to do what you want with the report. TAP can report skipped tests, which would be the closest thing to what you want. Alternatively, you could inject custom commands into the TAP stream to enhance its output, but that would obviously tie your tests to the custom formatter. Apart from the formatting there still is the issue of return codes. If you have a test whose setup failed we can only decide to return 0 or non 0, which is usually connected with a CI job being marked as success or failure. What would you expect to report on skipped tests? In TAP, they count as success.

@brainchild0
Author

brainchild0 commented Jan 5, 2022

I have the feeling that this discussion is going in circles and we seem to have trouble communicating effectively with each other. I'll try to summarize one last time but if this drags on longer, I'll need to pause this to attend to more pressing issues.

Your summary is essentially accurate, so I'm not sure what you think might be missing or unclear about my responses, other than that you might not feel persuaded by the suggestion.

At any rate, I am not viewing this conversation as unsuccessful in terms of communication.

There are two main issues:

  1. you want automatic dependency resolution for the setup phase, let's call that the fixture part

  2. you want a mechanism to filter out tests whose setup failed in the final display, let's call that the formatter part

I think these issues are mostly orthogonal to each other and can therefore be dealt with separately.

Yes, item (1) was the premise of the topic. To some degree, I agree about the orthogonality, but fulfilling item (1) allows the system to continue with some tests, even while unmet dependencies may prevent certain others. Thus, fully achieving the essence of item (2) depends to a certain extent on item (1). The relationship is more strongly one of use case than implementation, so discussing the two items together or separately is largely a matter of preference.

I still need a good concrete example for the fixture part to see where Bats falls short and what that should look like in an ideal world, to be able to carry on a meaningful discussion. By concrete I mean code, not some prose description of real-world analogues.

What would you want to see in code? Part of the issue is that much of the discussion was about reports, which is not expressed in code but is interpreted in natural language. If you want an example that is real world but not an abstract metaphor, you might consider hearing someone report, "The network services were down last night, but none of the tests showed a regression in the project code".

although TAP is at the core of Bats internal communication, it is mainly a carrier medium that has already been used for several other formats. Similarly, there are many translators from TAP to other common test result formats. The current direction of Bats is to avoid adding more formatters into core, but to work towards a stable interface that allows for third party formatters to do what you want with the report.

I understand, but the true sense underlying your comments, within the current context, appears to express the idea that the design of the entire system be constrained by limitations of a formatter.

My suggestion is that the framework should carry features most useful to those who use it, leaving to them any decisions about how to restrict its use to suit a chosen formatter.

TAP can report skipped tests, which would be the closest thing to what you want.

A static "skip" keyword is entirely different from skipping a test based on a failed attempt to resolve a dependency, which is generally unknown beforehand. Further, it is often infeasible to insert a change to a test definition into the test environment, which may integrate directly with a revision control system.

@martin-schulze-vireso
Member

I understand, but the true sense underlying your comments, within the current context, appears to express the idea that the design of the entire system be constrained by limitations of a formatter.

My suggestion is that the framework should carry features most useful to those who use it, leaving to them any decisions about how to restrict its use to suit a chosen formatter.

Obviously, the formatter can only act on the information that it has available. Therefore, it is constrained by the way TAP delivers it.

TAP can report skipped tests, which would be the closest thing to what you want.

A static "skip" keyword is entirely different from skipping a test based on a failed attempt to resolve a dependency, which is generally unknown beforehand. Further, it is often infeasible to insert a change to a test definition into the test environment, which may integrate directly with a revision control system.

Skip in Bats is a command that can be executed anywhere in setup or test code, even guarded by arbitrary branching logic. Consider this with regard to your above example:

setup() {
  if network_is_down; then
    skip 'network is not available!'
  fi
}

@test 'test in network' {
   ...
}

So a network failure will generate the following TAP output:

1..1
ok 1 test in network # skipped: network is not available!

@brainchild0
Author

Obviously, the formatter can only act on the information that it has available. Therefore, it is constrained by the way TAP delivers it.

I'm not sure this response is closely connected to my comment. My point was that in considering more features for the framework, limitations in one formatter (or even many) for display of richer information should not be seen as a reason to exclude the features. The features have benefit independent of the formatter, and how to manage any loss of information is a decision that may vary per case.

Skip in Bats is a command that can be executed anywhere in setup or test code, even guarded by arbitrary branching logic. Consider this with regard to your above example:

Thank you for the helpful clarification. Even considering it, though, I'm not sure that it detracts much from the added value for the additional features under discussion.

@martin-schulze-vireso
Member

martin-schulze-vireso commented Jan 11, 2022

Obviously, the formatter can only act on the information that it has available. Therefore, it is constrained by the way TAP delivers it.

I'm not sure this response is closely connected to my comment. My point was that in considering more features for the framework, limitations in one formatter (or even many) for display of richer information should not be seen as a reason to exclude the features. The features have benefit independent of the formatter, and how to manage any loss of information is a decision that may vary per case.

The internal communication of bats is already a TAP stream plus some additional annotations that get filtered out by the formatters. By design, TAP prints the test outputs after the test state (ok/not ok), so outputs have to be buffered to be injected into the stream after the state is known. I did not get into the details for this request, so there might be other complications. What I wanted to say is this: TAP is at the very core of Bats' implementation, so ripping it out will not be easy. However, as long as the information you want to transport fits into this extended TAP stream, we should be fine.

Skip in Bats is a command that can be executed anywhere in setup or test code, even guarded by arbitrary branching logic. Consider this with regard to your above example:

Thank you for the helpful clarification. Even considering it, though, I'm not sure that it detracts much from the added value for the additional features under discussion.

IMHO, this makes most of what you want possible with what is there already:

  1. fixtures: each dependency has its own function that gets called from setup(); you can even nest these to achieve transitivity. If the dependency fails, call skip (and even provide a reason!) and you're done. You can call it inside functions too; it does not need to be a direct command of setup(). A sketch follows after this list.
  2. formatter: Bats is not quite there yet but it is planned that you can easily inject your own formatters. This means, you could write your own formatter that simply filters out all skipped tests and just prints a summary of how many tests were skipped. Or you could print which ones were skipped and why...
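A minimal sketch of item 1, with hypothetical require_database / require_remote_host fixture functions and a hypothetical start_database helper; skip is the real Bats command:

require_database() {
  start_database || skip 'database fixture could not be created'
}

require_remote_host() {
  require_database                              # nested fixture, resolved by a direct call
  ping -c 1 remote.example.com >/dev/null 2>&1 \
    || skip 'remote_host fixture is unavailable'
}

setup() {
  require_remote_host
}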

I think your formatter use case might be too much of a niche to put directly into bats-core.

@brainchild0
Author

brainchild0 commented Jan 11, 2022

The internal communication of bats is already a TAP stream plus some additional annotations that get filtered out by the formatters. By design, TAP prints the test outputs after the test state (ok/not ok), so outputs have to be buffered to be injected into the stream after the state is known.

I just skimmed the TAP specification, and discovered plenty of flexibility for extended reporting.

One possible informal extension is represented by the following example:

ok 2 - # SKIP DEPENDS(remote_host)

In the example, remote_host is the name of a fixture whose unavailability (indicated by DEPENDS) caused the framework to automatically skip test number 2.
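For instance, a full plan mixing outcomes under this hypothetical extension (the test names are placeholders) might read:

1..3
ok 1 - local parsing
ok 2 - # SKIP DEPENDS(remote_host)
not ok 3 - local formatting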

In fact, skipping is explained rather broadly, for example, through the following excerpt:

The harness should report the text after # SKIP\S*\s+ as a reason for skipping.


IMHO, this makes most of what you want possible with what is there already:

Yes, of course, I agree. In fact, any Bats testing might be written simply in pure Bash.

At issue is the role of a test framework, which is to present metaphors and semantics that provide resilience and simplicity for the test writer, without the writer diving into boilerplate recipes. Fixtures are one of the tools that help achieve these ends.

@martin-schulze-vireso added the Priority: NeedsTriage label May 14, 2022
@martin-schulze-vireso
Member

I wrote up a small example of what I see as a possible solution for this issue in this gist. The example relies on the yet to be released external formatters feature from #602 .

@martin-schulze-vireso added the Component: Bash Code, Priority: Medium, and Waiting for Contributor Feedback labels and removed the Priority: NeedsTriage label Jun 6, 2022
@brainchild0
Author

I am still working to understand what is happening in the example. Basically, the idea is to avoid some of the limitations of TAP output by bypassing it with a custom formatting engine?

@martin-schulze-vireso
Member

It is an example of creating a fixture hierarchy and using a custom formatter to ignore tests whose dependencies were not met. The formatter relies on the internal extended bats interface as defined in lib/bats-core/formatter.bash.

@brainchild0
Author

Do I correctly understand that the structure of the fixture hierarchy is captured only by whatever call sequences are internal to the bodies of the test definitions, and that the new feature in the project represented by the example is limited to the customizable formatting?

@martin-schulze-vireso
Member

martin-schulze-vireso commented Jun 7, 2022

That is correct. It is an implementation of what I described in previous comments here.

@brainchild0
Author

Well, it definitely adds some flexibility that might be useful as a workaround.

It would still add further value if the proposed enhancement were considered for future development.

@martin-schulze-vireso
Member

martin-schulze-vireso commented Jun 7, 2022

To be honest, I still don't know what exactly you expect to be implemented. The formatter won't become part of bats-core. I ensured that you can load your own via #602. The fixture part is pure bash, so there is nothing to do here. If you want to suggest concrete changes, I need concrete examples of where you want to go with this.

@brainchild0
Author

brainchild0 commented Jun 7, 2022

Is it at least clear how one might imagine specific fixtures named as dependencies for particular tests (or other fixtures)?

@martin-schulze-vireso
Member

martin-schulze-vireso commented Jun 7, 2022

Well, I already wrote how I would handle that: fixtures are functions and tests call their respective fixture functions in the setup, as is done in the example. Caveat: for readability, the setup is done in the tests themselves because we only have one setup per file. In the gist, all setup_* functions are intended as fixtures.
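Concretely, that pattern reduces to something like this sketch (setup_remote_host, reach_remote_host, and my_tool are hypothetical):

setup_remote_host() {
  # fixture: verify or provision the remote host, skip if unavailable
  reach_remote_host || skip 'remote_host fixture is unavailable'
}

@test "uploads to the remote host" {
  setup_remote_host   # the test names the fixtures it depends on by calling them
  run my_tool upload
  [ "$status" -eq 0 ]
}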

@brainchild0
Author

Are you referring to the example? The distinction I was making was about support for annotating the dependencies, as is available in many test frameworks.

@martin-schulze-vireso
Member

Yes, I am referring to the example. In my view, calling succinctly named functions is sufficient as annotation. If you find that lacking, then please provide an example where it could be improved upon with an explanation why and how it should be improved.

@brainchild0
Author

Explicitly calling functions may be sufficient when considered solely with respect to the concern of succinctness, but the more essential benefit follows from the ability of the framework to derive a dependency tree from the annotations. An internal representation of tasks and dependencies facilitates clean reporting and the avoidance of redundant operations within the same test sequence.

The matter of primary relevance is the concept of an internal representation of tasks and dependencies. If you wish, I could consider giving an example of syntax, but the concept may be clear without one, as it follows a pattern common among many frameworks.

@martin-schulze-vireso
Member

An example, even in another framework/language, would be much appreciated.

@brainchild0
Author

brainchild0 commented Jun 7, 2022

It is trivial for me to provide an example from another tool, as all that is needed is a link to a reference manual. I am choosing from the pytest manual the section entitled How to use fixtures.

Please note that some features described in the page may be out of scope of the current discussion, but it is important to review the subsection entitled Fixtures can request other fixtures.

@martin-schulze-vireso
Member

Please note that some features described in the page may be out of scope of the current discussion, but it is important to review the subsection entitled Fixtures can request other fixtures.

Now, what exactly do you want to show there? Under the direct call paradigm functions can call each other too. The page discusses caching fixture results, which is not part of my example. Do you want to avoid duplicate fixture execution, e.g. in the following example?

setup_db() {
    :
}

setup_server() {
    setup_db
    # some more ...
}

setup_backend() {
    setup_db
    # some more ...
}

setup() {
    # this would call setup_db twice, which might not be idempotent
    setup_server
    setup_backend
}

@test ...
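One way to avoid the duplicate execution sketched above is a guard variable (DB_READY is introduced here purely for illustration):

setup_db() {
  [ -n "${DB_READY:-}" ] && return 0   # already provisioned during this test's setup
  # ... create the database ...
  DB_READY=1
}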

@brainchild0
Author

An exhaustive representation of the full usefulness of fixtures is of course beyond the scope of this discussion. The general topic is presented in varied sources.

The most relevant difference between fixtures as a framework feature, versus emulated by a direct function call (as in your example), is allowing the framework to detect an error in the fixture as distinct from one in the core test, and to report accordingly. A test with a failed fixture is not a failed test, only an unsuccessful test.

@martin-schulze-vireso
Member

Well, that maps pretty directly to failures in setup() which are right now reported as test failures. This may be due to TAP only supporting the directives SKIP and TODO which are for succeeding/failed tests respectively. So we could only map those setup/fixture errors to TODO, since they should still make the test run fail.

IMHO, this does not mesh well with the way setup is currently defined for bats. You will get an error message that tells you that setup failed, but you won't get an easy way to filter these setup failures out, because it's just a different error message.

An exhaustive representation of the full usefulness of fixtures is of course beyond the scope of this discussion. The general topic is presented in varied sources.

That may well be. However, we already have the name fixture in use internally and the concept will have different features depending on what the language can offer. I am frustrated by the vagueness of all of this which leaves the details for me to figure out.

My hope was that we could flesh out the minimum requirements we would need to implement for this to be useful. As is, I don't really know what we are going to do here and it is not on my priority list.

You are welcome to provide a pull request for this feature but I'll have to stop here.

@brainchild0
Author

IMHO, this does not mesh well with the way setup is currently defined for bats.

Yes, it may not, but the idea is to move beyond the current limitations of setup. To be completely clear, the setup function is not a fixture.

This may be due to TAP only supporting the directives SKIP and TODO which are for succeeding/failed tests respectively.

Yes, we discussed this problem at great length in the original segment of this discussion. Did you review my suggestion for the TAP-compatible reporting strategy?

My hope was that we could flesh out the minimum requirements we would need to implement for this to be useful.

I wasn't aware they were still missing, but I would try to provide them, if you would allow me to try your patience just a bit further.

@martin-schulze-vireso
Member

martin-schulze-vireso commented Jun 8, 2022

IMHO, this does not mesh well with the way setup is currently defined for bats.

Yes, it may not, but the idea is to move beyond the current limitations of setup. To be completely clear, the setup function is not a fixture.

This may be due to TAP only supporting the directives SKIP and TODO which are for succeeding/failed tests respectively.

Yes, we discussed this problem at great length in the original segment of this discussion. Did you review my suggestion for the TAP-compatible reporting strategy?

You mean the ok ... # SKIP DEPENDS(...) idea? The problem with that is that the test counts as ok, which means if you only have passing tests and tests whose dependencies failed, you will get a passing result, at least from the internal return codes of bats. It would then be the responsibility of the TAP parser/formatter to turn that into an overall test failure, which I find dangerous.

Using the parlance of pytest: They have passing, failing and not attempted tests (that should fail the test run). We don't have the third option right now and it will be hard to make that work with TAP, because they lack it too.

My hope was that we could flesh out the minimum requirements we would need to implement for this to be useful.

I wasn't aware they were still missing, but I would try to provide them, if you would allow me to try your patience just a bit further.

Yes, please.

@brainchild0
Author

brainchild0 commented Jun 8, 2022

Using the parlance of pytest: They have passing, failing and not attempted tests (that should fail the test run). We don't have the third option right now and it will be hard to make that work with TAP, because they lack it too.

It's not ideal, but workable. The output conforms to the more narrow assumptions provided by the basic format, but also includes richer information for ingestion by advanced engines.

My hope was that we could flesh out the minimum requirements we would need to implement for this to be useful.

I would try to provide them, if you would allow me to try your patience just a bit further.

Yes, please.

For each test, allow annotation with a list of functions. The functions are run in sequence, as listed, with duplicates removed, before starting the test. If any function fails, then the test aborts. Internally, the test result is reported in a third category, separate from pass or fail.

The requirements may be expanded. These are the bare minimum.
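As a rough illustration only, and not an existing Bats feature, the listed requirements could be approximated today with a hypothetical depends_on helper (fixture names are placeholders), using skip as the closest available stand-in for the proposed third result category:

depends_on() {
  local fixture
  for fixture in "$@"; do
    # remove duplicates: run each fixture at most once per test
    case " ${_RAN_FIXTURES:-} " in *" $fixture "*) continue ;; esac
    "$fixture" || skip "fixture '$fixture' failed"
    _RAN_FIXTURES="${_RAN_FIXTURES:-} $fixture"
  done
}

@test "upload to remote host" {
  depends_on setup_db setup_server   # annotation-style dependency list
  # test body ...
}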
