Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy dataprovider loading #3191

Closed
SamMousa opened this issue Jul 4, 2018 · 14 comments
Closed

Lazy dataprovider loading #3191

SamMousa opened this issue Jul 4, 2018 · 14 comments
Assignees
Labels
type/enhancement A new idea that should be implemented

Comments

@SamMousa
Copy link

SamMousa commented Jul 4, 2018

Q A
PHPUnit version 7.1.5
PHP version 7.2
Installation Method Composer

This is not a bug, but more of a feature request / proposal.

Currently the result of a dataprovider is always traversed, meaning that a generator will be fully traversed before the first test starts.

This approach has several downsides:

  • The whole data set must fit in memory.
  • Actual testing is delayed.

Changes required would be relatively minor.

  • PHPUnit\Util\Test::getProvidedData() return type should be changed to ?Traversable.
  • Instead of testing the dataset beforehand we should test each element separately, we can easily do this by wrapping the Traversable.
    TestSuite::createTest() needs some minor changes (can't check if a Traversable is empty).

I'm open for implementing if there's agreement that this would be an improvement.

@sebastianbergmann
Copy link
Owner

Depends on #10.

@keradus
Copy link
Contributor

keradus commented Jul 4, 2018

note: currently there is a possibility to filter tests by data provider label, this shall not be broken

@epdenouden
Copy link
Contributor

epdenouden commented Mar 29, 2019

note: currently there is a possibility to filter tests by data provider label, this shall not be broken

Just to clarify, you are referring to --filter someName also acting on @dataprovider row names or is there some other filtering mechanism I am not yet aware of?

This works with the caveat that the current mechanism discovers available tests by instantiation and then simply using the TestSuite-iterator. To get filters on tests working I have to force load the data rows when a filter gets injected in a DataProviderTestSuite.

Frankly, I am not too familiar with the ins and outs of RecursiveIteratorAggregate. I've always wondered if the execution reordering could have been implemented as a fancy Iterator.

./phpunit \
  --testdox \
  --colors=always \
  --no-configuration \
  --filter true \
  DataProviderFilterTest tests/_files/DataProviderFilterTest.php

PHPUnit 8.1-g78c9461c9 by Sebastian Bergmann and contributors.

DataProviderFilter
 ✔ True with data set 0
 ✔ True with data set 1
 ✔ True with data set 2
 ✔ True with data set 3

Time: 71 ms, Memory: 6.00 MB

OK (4 tests, 4 assertions)

@keradus
Copy link
Contributor

keradus commented Jun 8, 2019

Just to clarify, you are referring to --filter someName also acting on @dataProvider row names or is there some other filtering mechanism I am not yet aware of?

example of data provider:

public function provideFoo() {
    return [
        'foo_bar' => [1, 2],
        'baz_123' => [3, 4],
    ];
}

then, you can --filter foo_bar

@epdenouden
Copy link
Contributor

@keradus Yes, this scenario works in the prototype. It did require some new logic to check if a dataprovider requires being loaded for filtering to work, but yes, this works fine!

@epdenouden
Copy link
Contributor

epdenouden commented Jun 20, 2019

Hooray! I've got some first results as of yesterday:

image

  • Prefetched is the @dataprovider implementation from v8.x
  • JIT is the experimental just-in-time loading mechanism
  • JIT+unload is the JIT combined with data unloading during the run

Benefits of improved data provider handling

  • compared to current implementation speed and resource consumption is equal or better
  • already saves between 5%-10% memory on PHPUnit's self-test suite, which is light on data sets
  • smoother resource usage and lower spikes during initialization
  • works with selectors like --testsuite, --group and --filter
  • no changes to tests or configuration required

image

Opportunities for further improvement

  • Generator row-by-row loading could further smooth load spikes
  • cache data sets when used by multiple tests
  • data providers loaded on-demand could be executed non-statically and have access to resources created by fixtures
  • steadily climbing memory usage hints at low-hanging fruit in TestResult

Testing

The effect of on-demand creation of tests based on @dataprovider rows is visible in the test counts:

.......   61 / 1010 (  6%)      .......   61 / 1827 (  3%)
.......  122 / 1010 ( 12%)      .......  122 / 1827 (  6%)
.......  183 / 1044 ( 17%)      .......  183 / 1827 ( 10%)
.......  244 / 1138 ( 21%)      .......  244 / 1827 ( 13%)
.......  305 / 1172 ( 26%)      .......  305 / 1827 ( 16%)
.......  366 / 1253 ( 29%)      .......  366 / 1827 ( 20%)
.......  427 / 1323 ( 32%)      .......  427 / 1827 ( 23%)
.......  488 / 1334 ( 36%)      .......  488 / 1827 ( 26%)
.......  549 / 1334 ( 41%)      .......  549 / 1827 ( 30%)
.......  610 / 1334 ( 45%)      .......  610 / 1827 ( 33%)
.......  671 / 1336 ( 50%)      .......  671 / 1827 ( 36%)
.......  732 / 1339 ( 54%)      .......  732 / 1827 ( 40%)
.......  793 / 1365 ( 58%)      .......  793 / 1827 ( 43%)
.......  854 / 1393 ( 61%)      .......  854 / 1827 ( 46%)
.......  915 / 1394 ( 65%)      .......  915 / 1827 ( 50%)
.......  976 / 1394 ( 70%)      .......  976 / 1827 ( 53%)
....... 1037 / 1395 ( 74%)      ....... 1037 / 1827 ( 56%)
....... 1098 / 1400 ( 78%)      ....... 1098 / 1827 ( 60%)
....... 1159 / 1400 ( 82%)      ....... 1159 / 1827 ( 63%)
....... 1220 / 1402 ( 87%)      ....... 1220 / 1827 ( 66%)
....... 1281 / 1431 ( 89%)      ....... 1281 / 1827 ( 70%)
....... 1342 / 1449 ( 92%)      ....... 1342 / 1827 ( 73%)
....... 1403 / 1466 ( 95%)      ....... 1403 / 1827 ( 76%)
....... 1464 / 1510 ( 96%)      ....... 1464 / 1827 ( 80%)
....... 1525 / 1548 ( 98%)      ....... 1525 / 1827 ( 83%)
....... 1586 / 1827 ( 86%)      ....... 1586 / 1827 ( 86%)
....... 1647 / 1827 ( 90%)      ....... 1647 / 1827 ( 90%)
....... 1708 / 1827 ( 93%)      ....... 1708 / 1827 ( 93%)
....... 1769 / 1827 ( 96%)      ....... 1769 / 1827 ( 96%)
....    1827 / 1827 (100%)      ....    1827 / 1827 (100%)

Notes

  • PHPUnit's unit self-tests only use simple and small dataproviders: no large sets, no complex datatypes, no database interaction
  • memory usage during a run of ./phpunit --testsuite unit with xdebug.trace_format=1
  • exact measurements are slightly different between environments

@epdenouden epdenouden reopened this Jun 20, 2019
@epdenouden epdenouden self-assigned this Jun 20, 2019
@keradus
Copy link
Contributor

keradus commented Jun 24, 2019

It did require some new logic to check if a dataprovider requires being loaded for filtering to work

out of curiosity, can you elaborate that logic? like, how to detect if dataprovider shall be loaded or not to filter by keys provided by dataprovider ?

@epdenouden
Copy link
Contributor

epdenouden commented Jun 24, 2019

out of curiosity, can you elaborate that logic? like, how to detect if dataprovider shall be loaded or not to filter by keys provided by dataprovider ?

Unfortunately there is no magic here. Filtering by name requires the names to be available which in turn forces us to load the data provider. This works by loading the data upon DataProviderTestSuite::injectFilter, making the data is available for the iterator filter.

Many optimizations came to mind, introducing their own edge cases. Any ideas welcome as always! I intend to build a proper implementation and your critical eye is very much appreciated.

@epdenouden
Copy link
Contributor

if @keradus orders a meal in a restaurant, does the kitchen iterate through the whole menu only to return his order?

@epdenouden
Copy link
Contributor

ping @SamMousa on-demand data providers are a possible :)

@keradus
Copy link
Contributor

keradus commented Jun 24, 2019

if @epdenouden ordered anything with salami, does the kitchen knows where to look for it, or iterate over all dishes ingredients?

problematic part is that we can filter not only by items (test classes, methods) but also ingredients (case description)

@epdenouden
Copy link
Contributor

if @epdenouden ordered anything with salami, does the kitchen knows where to look for it, or iterate over all dishes ingredients?

problematic part is that we can filter not only by items (test classes, methods) but also ingredients (case description)

It will iterate through al dishes looking for salami as an ingredient. This works by simply reverting to the old data provider behaviour when a --filter is used. There are still some efficiency gains possible, but the main advantage of not loading the providers is gone

It might be possible, though! I am not (yet) familiar enough with the whole test filtering logic.

@epdenouden
Copy link
Contributor

Superseded by #3736

@keradus
Copy link
Contributor

keradus commented Sep 8, 2019

This works by simply reverting to the old data provider behaviour when a --filter is used. There are still some efficiency gains possible, but the main advantage of not loading the providers is gone

It may happen that a lot of great effort made to improve data providers will be simply ignored due to fact that someone used a filter to run only some or the tests ? What a pity, I almost always use filtering, @epdenouden

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement A new idea that should be implemented
Projects
None yet
Development

No branches or pull requests

4 participants