Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints #1427

marcphilipp · 2018-05-21T08:15:03Z

This issue originates from a dicussion in #1422 (comment).

While often it suffices to specify each set of arguments for the invocation of a parameterized test explicitly, there are cases where it's simply the combination of lists of potential values for each parameter, i.e. the cartesian product of these collections. While it's certainly already possible to compute these combinations manually, e.g. in a method referenced by @MethodSource, it would be more convenient if there were a declarative way to specify that.

Recently, @sormuras has written a sample CartesianProductProvider extension which does that by implementing TestTemplateInvocationContextProvider. However, by implementing this functionality as a ArgumentsProvider users would not have to learn a new concept and we could reuse the existing infrastructure.

One possibility would be to use fields and methods, like the Theories runner from JUnit 4.

@BaseSet
static final Set<String> strings = Set.of("foo", "bar");

@BaseSet
static Set<Integer> ints() {
    return Set.of(23, 42);
}

@ParameterizedTest
@AllCombinationsSource
void test(String a, int b) {...}

@junit-team/junit-lambda Thoughts?

Deliverables

Introduce new ArgumentsProvider and related annotations

The text was updated successfully, but these errors were encountered:

sbrannen · 2018-05-21T12:24:50Z

I think that sounds reasonable in terms of a new feature.

The hardest part will be coming up with good names for the annotations. 😉

marcphilipp · 2018-05-21T12:39:32Z

Maybe, instead of introducing a bunch of new annotations, a simple API to combine sets of arguments from a @MethodSource method would be enough.

Something like

static Collection<Arguments> cartesianProduct() {
	return Arguments.builder(Set.of("foo", "bar"))
		.combineWith(Set.of(23, 42))
		.build();
}

or

static Collection<Arguments> cartesianProduct() {
	return Arguments.cartesianProduct(Set.of("foo", "bar"), Set.of(23, 42));
}

sbrannen · 2018-05-21T13:01:51Z

True, that may prove more elegant, and it definitely avoids an explosion of annotations (which we should aim to avoid).

So I agree: let's go with a programmatic approach via @MethodSource first and see what people think of that!

sbrannen · 2018-05-21T13:02:57Z

Though I would have Arguments.cartesianProduct() return a Set instead of a Collection, since it must be a set by definition -- right? 🤔

marcphilipp · 2018-05-21T13:15:26Z

Well, a Set is a Collection... 😉

While, mathematically speaking, it must be a set, I think we should be more lenient than that. I think we should make it work with arbitrary Collections and just return a Collection or Iterable. So, for instance, if one passes a List including duplicates (according to equals()), we would include the duplicates in the resulting Collection. Does that make sense?

sbrannen · 2018-05-21T13:18:20Z

Ummmm.... no, not really. 😇

My point was that the framework should actually require/enforce set semantics and avoid duplicated invocations of equivalent products.

Does that make sense? 😉

marcphilipp · 2018-05-21T13:22:49Z

I think that's debatable because of potentially broken/slow equals implementations... 🤔

sbrannen · 2018-05-21T13:23:50Z

Hmmmmmmmmmmmmm

sbrannen · 2018-05-21T13:26:56Z

I can't see your method signature for Arguments.cartesianProduct().

Are you proposing the following?

Collection<Arguments> Arguments.cartesianProduct(Collection<Object>... collections);

... b/c I was thinking of this:

Set<Arguments> Arguments.cartesianProduct(Set<Object>... sets);

marcphilipp · 2018-05-21T13:36:00Z

I think that's debatable because of potentially broken/slow equals implementations.

I'm of course not referring to built-in Java types, but user-defined ones. E.g. I've often encountered JPA entities where equals() only compared their id field. Or, the best ever equals() implementation I've stumbled upon which unconditionally returned false. Of course, that's nonsense and people shouldn't do that. On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods so I'm not convinced we should be this strict in this case.

I can't see your method signature for Arguments.cartesianProduct().

Almost! I was thinking of this:

Collection<Arguments> Arguments.cartesianProduct(Collection<?>... collections);

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

sbrannen · 2018-05-21T13:55:55Z

I knew what you meant regarding equals/hashCode. I was just confirming the overall signature. 😉

Oops... yeah... when I typed <Object> I of course meant <?>.

On the other hand, we allow duplicate invocations based on the Stream/Iterable instances returned from @MethodSource referenced methods

That's true, but... those duplicates would be returned by user code; whereas, Arguments.cartesianProduct() would be framework code that can be more exact.

In any case, I see your point regarding broken equals/hashCode implementations in user code. Hence, my follow-up comment below.

The builder may be an alternative that let's one configure whether or not duplicates should be removed.

I think that would almost be a requirement to support that even if only as an opt-in feature in the builder.

sbrannen · 2018-05-21T13:58:26Z

the best ever equals() implementation I've stumbled upon which unconditionally returned false.

That's quite a jewel of software engineering, but... I've seen something that tops it (on another level): a method in the service layer annotated with @Ignore from JUnit 4. I'll just leave it at that. 😇

nrllewellyn · 2018-05-22T01:05:31Z

A couple of random (hopefully useful) notes from a JUnit user:

The annotation-driven approach does provide some (minor) advantages--it could produce some interesting testing options when mixed with nested tests (e.g. "all tests share parameters from the base class, but this nested class provides extra parameters"). I also feel that the annotation form is slightly more readable than the method-driven. (That said, it's not like someone couldn't just write a new ArgumentsSource to use annotations if they really wanted it--just throwing out a couple of opinions! 😄)
Whatever solution you decide on, I would recommend allowing the user to have multiple parameters of the same type. For example: void testAddTwoNumbers(int first, int second). I find that this is a common use case for reducers and other pieces of code that need to combine two values (or more). This ended up being a limiting factor for @ValueSource in a lot of our testing.
While someone is working in this codebase, you might want to fix the docs for parameterized tests--I believe that Section 3.15 was actually intended to be section 3.14.2

sbrannen · 2018-05-22T11:03:47Z

I believe that Section 3.15 was actually intended to be section 3.14.2

Good catch!

Fixed in 5343323.

jhorstmann · 2018-05-22T11:50:26Z

For the programmatic approach, I would suggest that the method should return a lazy Stream<Arguments> instead of an already materialized collection.

I had a similar requirement some time ago. There was a huge number of input combinations generated by a cartesian product, but only a few of them were actually valid combinations for the test scenarios. An example would be that for the german version of a webshop, only certain payment methods and only a shipping address in germany would be valid.

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first. I can provide an implementation if needed.

sbrannen · 2018-05-22T11:58:26Z

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

So if we truly implement it solely based on streams instead of building a collection upfront, we would then likely need to keep performance in mind in order to support large data sets.

I can provide an implementation if needed.

@jhorstmann, perhaps just sharing your ideas/intentions would be a good start.

jbduncan · 2018-05-22T13:50:23Z

I seem to recall hearing/reading something about performance issues when attempting to do something like building up a Cartesian product using Java streams.

@sbrannen Guava's Sets.cartesianProduct is relatively lazy by my understanding, so I imagine it wouldn't be impossible to make a (sequential) streams version which is at least as lazy and performant. But I admit that I've not personally read anything about the performance of stream-based Cartesian sets, so I may be speaking nonsense... :)

sbrannen · 2018-05-22T16:04:39Z

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. 😉

sbrannen · 2018-05-22T16:08:57Z

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:

List<Arguments> Arguments.cartesianProduct(Set<?>... sets);

Or:

Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);

In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

sbrannen · 2018-05-22T16:10:50Z

Having this method returning a stream makes it easier to reuse it by filtering elements, without keeping all elements in memory first.

That sounds like a reasonable argument to me in favor of streaming results: I had not considered that a user might want to filter the Cartesian product tuples before returning them to the Jupiter TestEngine.

sbrannen · 2018-05-22T16:17:49Z

@jbduncan, regarding possible performance issues, I think this blog is what I was recalling (or similar at least).

Quoting that blog:

this code computes the entire stream of cartesian product values before processing the first value.

I haven't verified the claim of course, but it at least sounds like the author put a lot of consideration into the presented solution.

p.s. FWIW, most solutions on Stack Overflow and similar channels suggest the use of flatMap() but don't mention any issues with performance.

jbduncan · 2018-05-23T16:47:18Z

@sbrannen Cool, thank you for pointing me in the direction of that blog. I gave it a scan yesterday, and it looks promising to me!

Sure. I'm not saying it's not possible. I'm just saying that I assume it will be considerably more complicated (in terms of the actual implementation) if we accept multiple streams as input and stream the Cartesian product on the fly back to the Jupiter TestEngine for processing. 😉

@sbrannen Yeah, okay. Sorry, I wasn't terribly clear earlier: I was sub-consciously hoping that we could easily derive an implementation like Guava's Sets.cartesianProduct for this issue without needing to use the full-blown spliterator-based solution given in that blog, as I vaguely recalled that Sets.cartesianProduct's javadoc gave a citation for the algorithm its implementation was based on. But looking back at said javadoc, I realise that I was mistaken!

Thus, using the blog you linked earlier sounds like the best way forward to me.

P.S. FYI, when I was scanning through the blog, I actually recognised the JDK bug report that the author opened regarding flatMap(), for I saw a similar one being discussed on the OpenJDK mailing lists not too long ago. It turns out it's actually a duplicate of this bug, which is fixed already but only for Java 10+.

If the JUnit 5 team does use that blog as inspiration for resolving this issue, might it be worth opening another issue to test out the simpler flatMap()-based solution in the future if or when JUnit 5+ depends on Java 10+?

jhorstmann · 2018-05-24T12:56:12Z

For reference, my implementation is available at https://gist.github.com/jhorstmann/a7aba9947bc4926a75f6de8f69560c6e and actually looks very similar to the blog post, except it uses an iterator which is then wrapped into a spliterator. I can add any license if needed.

The cartesianProduct method returning Stream<Arguments> could already be implemented outside of junit. I'm thinking if we could make it even easier to use. Especially with a larger number of dimensions, mapping those to method arguments by index could be error prone. Especially if you'd want different subsets of dimensions for different test methods. Another approach could be to allow the different Source annotations on method parameters

void test(@CsvFileSource(...) @CsvToPerson Person person,
          @EnumSource(...) PaymentMethod paymentMethod,
          @MethodSource(...) Address shippingAddress,
          @MethodSource(...) Address billingAddress)

If I understand it correctly, the above annotations could also be used as meta-annotations, making the example even nicer. Different test methods could so easily use different sets of dimensions. Filtering of valid combinations could be done inside the method by using Assumptions.

marcphilipp · 2018-06-06T18:47:23Z

Interesting idea! However, it wouldn't support multiple parameters per source at all. For starters I think we should define the cartesianProduct method in Arguments.

@jhorstmann Would you like to submit a PR with your implementation?

matthieu-vergne · 2019-01-09T19:04:13Z

Is there some advances on that matter? I am just looking for such a feature, actually.

sbrannen · 2019-01-10T13:27:27Z

Is there some advances on that matter? I am just looking for such a feature, actually.

Nothing official, but if I recall correctly, @sormuras created a proof of concept.

@sormuras ?

sormuras · 2019-01-10T15:28:56Z

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

marcphilipp · 2019-01-12T13:48:38Z

I think a simple cartesianProduct method would be easy to implement and provide a way to implement such use cases along with @MethodSource without requiring any additional dependencies.

For example, I needed sth. like that here:
https://github.com/marcphilipp/nexus-publish-plugin/blob/master/src/test/kotlin/de/marcphilipp/gradle/nexus/NexusPublishPluginTests.kt#L55-L61

matthieu-vergne · 2019-01-12T14:25:43Z

I have a professional case (sorry I can't share it here) where I have many tests, including a bunch to be parametrized. These parameters build on the same few sets, but they are often different combinations (different size and order) of those sets. Rather than using a different function for each of them, I would rather provide the few annotations each of them needs, with the right sets in the right order. It would be way better to read than a double method every time.

marcphilipp · 2019-01-12T15:19:15Z

@matthieu-vergne Could you please explain in more detail what you mean with "a double method every time"?

matthieu-vergne · 2019-01-12T15:40:09Z

The test method + the argument provider method.

rougou · 2020-02-17T04:49:02Z

Based on internal team discussions, we are currently leaning toward one of the following programmatic APIs:
List<Arguments> Arguments.cartesianProduct(Set<?>... sets);
Or:
Stream<Arguments> Arguments.cartesianProduct(Set<?>... sets);
In other words, we aim to accept sets as input and generate the Cartesian product as an ordered collection (e.g., a List) or Stream of Argument tuples.

@sbrannen
I like this approach, but would prefer cartesianProduct(Collection<?> ... collections) as suggested by @marcphilipp previously. The reason being that when writing tests in Groovy, we'd have to do:
[false, true] as Set
etc. for each parameter. With Collection we won't have to add the as Set every time.

marcphilipp · 2020-02-17T18:56:47Z

I'd be okay with a more loose definition based on Collection if it doesn't make the implementation more complex.

matthieu-vergne · 2020-02-17T19:01:11Z

If you can go this way, an Iterable may be enough.

tommai78101 · 2020-04-24T14:36:55Z

Is this new feature experimental?

https://github.com/junit-team/junit5-samples/tree/master/junit5-jupiter-extensions/src/main/java/com/example/cartesian

I didn't see it in this list:

https://junit.org/junit5/docs/current/user-guide/#api-evolution-experimental-apis

marcphilipp · 2020-04-25T12:55:57Z

No, that's just a sample project that shows how to accomplish this in an extension. It's not an officially supported feature (yet) since it's not as extensible as we'd like it to be.

sormuras · 2020-04-25T13:07:02Z

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.

Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

tommai78101 · 2020-04-25T21:31:36Z

@marcphilipp Thanks for the clarification.

scordio · 2021-01-25T21:20:44Z

Citing myself here:

Aye. The POC wants to be integrated into JUnit Pioneer. See junit-pioneer/junit-pioneer#68 for details -- which also links back to the current POC within our JUnit 5 Jupiter Extension Samples project.
Would be cool if s/o took over to make the integration/implementation happen.

It's still up-for-grabs. ;-)

It seems it was not yet advertised: @Michael1993 brought the feature to life with junit-pioneer/junit-pioneer#321, and I'm proposing some additions in junit-pioneer/junit-pioneer#409 and junit-pioneer/junit-pioneer#416

stale · 2022-06-21T00:54:48Z

This issue has been automatically marked as stale because it has not had recent activity. Given the limited bandwidth of the team, it will be automatically closed if no further activity occurs. Thank you for your contribution.

stale · 2022-07-12T23:11:01Z

This issue has been automatically closed due to inactivity. If you have a good use case for this feature, please feel free to reopen the issue.

erik35 · 2022-12-01T14:35:29Z

are there plans to integrate this into JUnit 5 itself?

marcphilipp · 2022-12-02T08:51:19Z

There are no current plans but we may reconsider if the Pioneer extension gets widely used.

matter-it-does · 2023-01-23T16:41:56Z

using cartesian test and saved so much complexity

mahozad · 2023-09-15T05:04:53Z

See this StackOverflow post

marcphilipp added type: enhancement status: team discussion component: Jupiter theme: parameterized tests labels May 21, 2018

marcphilipp added this to the 5.3 Backlog milestone May 21, 2018

marcphilipp mentioned this issue May 21, 2018

JUnit 5 Theories #1422

Closed

7 tasks

marcphilipp removed the status: team discussion label Jun 6, 2018

BlueIce mentioned this issue May 6, 2019

Allow to define parameter source per method argument. #1870

Closed

marcphilipp mentioned this issue Sep 14, 2020

Test methods cannot combine RepeatedTest and ParameterizedTest annotations #1224

Closed

marcphilipp removed this from the 5.8 Backlog milestone Jun 19, 2021

stale bot added the status: stale label Jun 21, 2022

stale bot closed this as completed Jul 12, 2022

sormuras mentioned this issue Oct 16, 2022

Ability to combine a few argument sources in parameterized tests #3063

Closed

Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints #1427

Introduce ArgumentsProvider that computes the cartesian product of collections of datapoints #1427

Comments

marcphilipp commented May 21, 2018

Deliverables

sbrannen commented May 21, 2018

marcphilipp commented May 21, 2018

sbrannen commented May 21, 2018

sbrannen commented May 21, 2018

marcphilipp commented May 21, 2018

sbrannen commented May 21, 2018

marcphilipp commented May 21, 2018

sbrannen commented May 21, 2018

sbrannen commented May 21, 2018 • edited

marcphilipp commented May 21, 2018

sbrannen commented May 21, 2018

sbrannen commented May 21, 2018

nrllewellyn commented May 22, 2018 • edited

sbrannen commented May 22, 2018

jhorstmann commented May 22, 2018

sbrannen commented May 22, 2018

jbduncan commented May 22, 2018

sbrannen commented May 22, 2018

sbrannen commented May 22, 2018

sbrannen commented May 22, 2018

sbrannen commented May 22, 2018

jbduncan commented May 23, 2018

jhorstmann commented May 24, 2018

marcphilipp commented Jun 6, 2018

matthieu-vergne commented Jan 9, 2019

sbrannen commented Jan 10, 2019 • edited

sormuras commented Jan 10, 2019

marcphilipp commented Jan 12, 2019

matthieu-vergne commented Jan 12, 2019

marcphilipp commented Jan 12, 2019

matthieu-vergne commented Jan 12, 2019

rougou commented Feb 17, 2020 • edited

marcphilipp commented Feb 17, 2020

matthieu-vergne commented Feb 17, 2020

tommai78101 commented Apr 24, 2020

marcphilipp commented Apr 25, 2020

sormuras commented Apr 25, 2020

tommai78101 commented Apr 25, 2020

scordio commented Jan 25, 2021

stale bot commented Jun 21, 2022

stale bot commented Jul 12, 2022

erik35 commented Dec 1, 2022

marcphilipp commented Dec 2, 2022

matter-it-does commented Jan 23, 2023

mahozad commented Sep 15, 2023

sbrannen commented May 21, 2018 •

edited

nrllewellyn commented May 22, 2018 •

edited

sbrannen commented Jan 10, 2019 •

edited

rougou commented Feb 17, 2020 •

edited