Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints #1427

Closed
1 task
marcphilipp opened this issue May 21, 2018 · 45 comments

Comments

@marcphilipp
Copy link
Member

This issue originates from a dicussion in #1422 (comment).

While often it suffices to specify each set of arguments for the invocation of a parameterized test explicitly, there are cases where it's simply the combination of lists of potential values for each parameter, i.e. the cartesian product of these collections. While it's certainly already possible to compute these combinations manually, e.g. in a method referenced by @MethodSource, it would be more convenient if there were a declarative way to specify that.

Recently, @sormuras has written a sample CartesianProductProvider extension which does that by implementing TestTemplateInvocationContextProvider. However, by implementing this functionality as a ArgumentsProvider users would not have to learn a new concept and we could reuse the existing infrastructure.

One possibility would be to use fields and methods, like the Theories runner from JUnit 4.

@BaseSet
static final Set<String> strings = Set.of("foo", "bar");

@BaseSet
static Set<Integer> ints() {
    return Set.of(23, 42);
}

@ParameterizedTest
@AllCombinationsSource
void test(String a, int b) {...}

@junit-team/junit-lambda Thoughts?

Deliverables

  • Introduce new ArgumentsProvider and related annotations
@sbrannen
Copy link
Member

I think that sounds reasonable in terms of a new feature.

The hardest part will be coming up with good names for the annotations. 😉

@marcphilipp
Copy link
Member Author

Maybe, instead of introducing a bunch of new annotations, a simple API to combine sets of arguments from a @MethodSource method would be enough.

Something like

static Collection<Arguments> cartesianProduct() {
	return Arguments.builder(Set.of("foo", "bar"))
		.combineWith(Set.of(23, 42))
		.build();
}

or

static Collection<Arguments> cartesianProduct() {
	return Arguments.cartesianProduct(Set.of("foo", "bar"), Set.of(23, 42));
}

@sbrannen
Copy link
Member

True, that may prove more elegant, and it definitely avoids an explosion of annotations (which we should aim to avoid).

So I agree: let's go with a programmatic approach via @MethodSource first and see what people think of that!

@sbrannen
Copy link
Member

Though I would have Arguments.cartesianProduct() return a Set instead of a Collection, since it must be a set by definition -- right? 🤔

@marcphilipp
Copy link
Member Author

Well, a Set is a Collection... 😉

While, mathematically speaking, it must be a set, I think we should be more lenient than that. I think we should make it work with arbitrary Collections and just return a Collection or Iterable. So, for instance, if one passes a List including duplicates (according to equals()), we would include the duplicates in the resulting Collection. Does that make sense?

@sbrannen
Copy link
Member

Ummmm.... no, not really. 😇

My point was that the framework should actually require/enforce set semantics and avoid duplicated invocations of equivalent products.

Does that make sense? 😉

@marcphilipp
Copy link
Member Author

I think that's debatable because of potentially broken/slow equals implementations... 🤔

@sbrannen
Copy link
Member

Hmmmmmmmmmmmmm

@sbrannen
Copy link
Member

sbrannen commented May 21, 2018

I can't see your method signature for Arguments.cartesianProduct().

Are you proposing the following?

Collection<Arguments> Arguments.cartesianProduct(Collection<Object>... collections);

... b/c I was thinking of this:

Set<Arguments> Arguments.cartesianProduct(Set<Object>... sets);

@marcphilipp
Copy link
Member Author

I think that's debatable because of potentially broken/slow equals implementations.

I'm of course not referring to built-in Java types, but user-defined ones. E.g. I've often encountered JPA entities where equals() only compared their id field. Or, the best ever equals() implementation I've stumbled upon which unconditionally returned false. Of course, that's nonsense and people shouldn't do that. On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods so I'm not convinced we should be this strict in this case.

I can't see your method signature for Arguments.cartesianProduct().

Almost! I was thinking of this:

Collection<Arguments> Arguments.cartesianProduct(Collection<?>... collections);

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

@sbrannen
Copy link
Member

I knew what you meant regarding equals/hashCode. I was just confirming the overall signature. 😉

Oops... yeah... when I typed <Object> I of course meant <?>.

On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods

That's true, but... those duplicates would be returned by user code; whereas, Arguments.cartesianProduct() would be framework code that can be more exact.

In any case, I see your point regarding broken equals/hashCode implementations in user code. Hence, my follow-up comment below.

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

I think that would almost be a requirement to support that even if only as an opt-in feature in the builder.

@sbrannen
Copy link
Member

the best ever equals() implementation I've stumbled upon which unconditionally returned false.

That's quite a jewel of software engineering, but... I've seen something that tops it (on another level): a method in the service layer annotated with @Ignore from JUnit 4. I'll just leave it at that. 😇

@nrllewellyn
Copy link

nrllewellyn commented May 22, 2018

A couple of random (hopefully useful) notes from a JUnit user:

  • The annotation-driven approach does provide some (minor) advantages--it could produce some interesting testing options when mixed with nested tests (e.g. "all tests share parameters from the base class, but this nested class provides extra parameters"). I also feel that the annotation form is slightly more readable than the method-driven. (That said, it's not like someone couldn't just write a new ArgumentsSource to use annotations if they really wanted it--just throwing out a couple of opinions! 😄)
  • Whatever solution you decide on, I would recommend allowing the user to have multiple parameters of the same type. For example: void testAddTwoNumbers(int first, int second). I find that this is a common use case for reducers and other pieces of code that need to combine two values (or more). This ended up being a limiting factor for @ValueSource in a lot of our testing.
  • While someone is working in this codebase, you might want to fix the docs for parameterized tests--I believe that Section 3.15 was actually intended to be section 3.14.2

@sbrannen
Copy link
Member

I believe that Section 3.15 was actually intended to be section 3.14.2

Good catch!

Fixed in 5343323.

@jhorstmann
Copy link

For the programmatic approach, I would suggest that the method should return a lazy Stream<Arguments> instead of an already materialized collection.

I had a similar requirement some time ago. There was a huge number of input combinations generated by a cartesian product, but only a few of them were actually valid combinations for the test scenarios. An example would be that for the german version of a webshop, only certain payment methods and only a shipping address in germany would be valid.

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first. I can provide an implementation if needed.

@sbrannen
Copy link
Member

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

So if we truly implement it solely based on streams instead of building a collection upfront, we would then likely need to keep performance in mind in order to support large data sets.

I can provide an implementation if needed.

@jhorstmann, perhaps just sharing your ideas/intentions would be a good start.

@jbduncan
Copy link
Contributor

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

@sbrannen Guava's Sets.cartesianProduct is relatively lazy by my understanding, so I imagine it wouldn't be impossible to make a (sequential) streams version which is at least as lazy and performant. But I admit that I've not personally read anything about the performance of stream-based Cartesian sets, so I may be speaking nonsense... :)

@sbrannen
Copy link
Member

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. 😉

@sbrannen
Copy link
Member

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

@sbrannen
Copy link
Member

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first.

That sounds like a reasonable argument to me in favor of streaming results: I had not considered that a user might want to filter the Cartesian product tuples before returning them to the Jupiter TestEngine.

@sbrannen
Copy link
Member

@jbduncan, regarding possible performance issues, I think this blog is what I was recalling (or similar at least).

Quoting that blog:

this code computes the entire stream of cartesian product values before processing the first value.

I haven't verified the claim of course, but it at least sounds like the author put a lot of consideration into the presented solution.

p.s. FWIW, most solutions on Stack Overflow and similar channels suggest the use of flatMap() but don't mention any issues with performance.

@jbduncan
Copy link
Contributor

@sbrannen Cool, thank you for pointing me in the direction of that blog. I gave it a scan yesterday, and it looks promising to me!

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. 😉

@sbrannen Yeah, okay. Sorry, I wasn't terribly clear earlier: I was sub-consciously hoping that we could easily derive an implementation like Guava's Sets.cartesianProduct for this issue without needing to use the full-blown spliterator-based solution given in that blog, as I vaguely recalled that Sets.cartesianProduct's javadoc gave a citation for the algorithm its implementation was based on. But looking back at said javadoc, I realise that I was mistaken!

Thus, using the blog you linked earlier sounds like the best way forward to me.

P.S. FYI, when I was scanning through the blog, I actually recognised the JDK bug report that the author opened regarding flatMap(), for I saw a similar one being discussed on the OpenJDK mailing lists not too long ago. It turns out it's actually a duplicate of this bug, which is fixed already but only for Java 10+.

If the JUnit 5 team does use that blog as inspiration for resolving this issue, might it be worth opening another issue to test out the simpler flatMap()-based solution in the future if or when JUnit 5+ depends on Java 10+?

@jhorstmann
Copy link

For reference, my implementation is available at https://gist.github.com/jhorstmann/a7aba9947bc4926a75f6de8f69560c6e and actually looks very similar to the blog post, except it uses an iterator which is then wrapped into a spliterator. I can add any license if needed.

The cartesianProduct method returning Stream<Arguments> could already be implemented outside of junit. I'm thinking if we could make it even easier to use. Especially with a larger number of dimensions, mapping those to method arguments by index could be error prone. Especially if you'd want different subsets of dimensions for different test methods. Another approach could be to allow the different Source annotations on method parameters

void test(@CsvFileSource(...) @CsvToPerson Person person,
          @EnumSource(...) PaymentMethod paymentMethod,
          @MethodSource(...) Address shippingAddress,
          @MethodSource(...) Address billingAddress)

If I understand it correctly, the above annotations could also be used as meta-annotations, making the example even nicer. Different test methods could so easily use different sets of dimensions. Filtering of valid combinations could be done inside the method by using Assumptions.

@marcphilipp
Copy link
Member Author

Interesting idea! However, it wouldn't support multiple parameters per source at all. For starters I think we should define the cartesianProduct method in Arguments.

@jhorstmann Would you like to submit a PR with your implementation?

@matthieu-vergne
Copy link

Is there some advances on that matter? I am just looking for such a feature, actually.

@sbrannen
Copy link
Member

sbrannen commented Jan 10, 2019

Is there some advances on that matter? I am just looking for such a feature, actually.

Nothing official, but if I recall correctly, @sormuras created a proof of concept.

@sormuras ?

@sormuras
Copy link
Member

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

@marcphilipp
Copy link
Member Author

I think a simple cartesianProduct method would be easy to implement and provide a way to implement such use cases along with @MethodSource without requiring any additional dependencies.

For example, I needed sth. like that here:
https://github.com/marcphilipp/nexus-publish-plugin/blob/master/src/test/kotlin/de/marcphilipp/gradle/nexus/NexusPublishPluginTests.kt#L55-L61

@matthieu-vergne
Copy link

I have a professional case (sorry I can't share it here) where I have many tests, including a bunch to be parametrized. These parameters build on the same few sets, but they are often different combinations (different size and order) of those sets. Rather than using a different function for each of them, I would rather provide the few annotations each of them needs, with the right sets in the right order. It would be way better to read than a double method every time.

@marcphilipp
Copy link
Member Author

@matthieu-vergne Could you please explain in more detail what you mean with "a double method every time"?

@matthieu-vergne
Copy link

The test method + the argument provider method.

@rougou
Copy link

rougou commented Feb 17, 2020

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

@sbrannen
I like this approach, but would prefer cartesianProduct(Collection<?> ... collections) as suggested by @marcphilipp previously. The reason being that when writing tests in Groovy, we'd have to do:
[false, true] as Set
etc. for each parameter. With Collection we won't have to add the as Set every time.

@marcphilipp
Copy link
Member Author

I'd be okay with a more loose definition based on Collection if it doesn't make the implementation more complex.

@matthieu-vergne
Copy link

If you can go this way, an Iterable may be enough.

@marcphilipp
Copy link
Member Author

No, that's just a sample project that shows how to accomplish this in an extension. It's not an officially supported feature (yet) since it's not as extensible as we'd like it to be.

@sormuras
Copy link
Member

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

@tommai78101
Copy link

@marcphilipp Thanks for the clarification.

@scordio
Copy link
Contributor

scordio commented Jan 25, 2021

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.
Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

It seems it was not yet advertised: @Michael1993 brought the feature to life with junit-pioneer/junit-pioneer#321, and I'm proposing some additions in junit-pioneer/junit-pioneer#409 and junit-pioneer/junit-pioneer#416

@marcphilipp marcphilipp removed this from the 5.8 Backlog milestone Jun 19, 2021
@stale
Copy link

stale bot commented Jun 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. Thank you for your contribution.

@stale stale bot added the status: stale label Jun 21, 2022
@stale
Copy link

stale bot commented Jul 12, 2022

This issue has been automatically closed due to inactivity. If you have a good use case for this feature, please feel free to reopen the issue.

@erik35
Copy link

erik35 commented Dec 1, 2022

are there plans to integrate this into JUnit 5 itself?

@marcphilipp
Copy link
Member Author

There are no current plans but we may reconsider if the Pioneer extension gets widely used.

@matter-it-does
Copy link

using cartesian test and saved so much complexity

@mahozad
Copy link
Contributor

mahozad commented Sep 15, 2023

See this StackOverflow post

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests