Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build_runner + dev_compiler tests have become flaky on github actions #49441

Open
jakemac53 opened this issue Jul 12, 2022 · 27 comments
Open

build_runner + dev_compiler tests have become flaky on github actions #49441

jakemac53 opened this issue Jul 12, 2022 · 27 comments
Labels
area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. P3 A lower priority bug or feature request web-dev-compiler

Comments

@jakemac53
Copy link
Contributor

See related issue dart-lang/build#3332

A number of our dev_compiler based integration tests have started either consistently failing or flakily failing. Unfortunately we don't get any chrome logs for these failures, and I haven't been able to figure out a way to get them. The tests also do not fail locally, only on github actions.

I presume this is some async timing related issue, but I can't confirm.

I bisected the issue based on released dev sdks (all I can test on github), the tests pass on 2.18.0-238.0.dev and fail on 2.18.0-250.0.dev.

@jakemac53
Copy link
Contributor Author

cc @Markzipan @nshahan

@jakemac53
Copy link
Contributor Author

jakemac53 commented Jul 12, 2022

Here is an example run, same exact tests and version solve, one passes on 2.18.0-238.0.dev and the other fails on 2.18.0-250.0.dev (you can expand the setup dart step on each to see the version).

@jakemac53
Copy link
Contributor Author

Not sure how this is related, but another anecdote is that the tests which are opted out of null safety consistently fail, while the opted in tests only fail flakily.

@Markzipan
Copy link
Contributor

@nshahan A couple of DDC changes were submitted on July 1st that may be the culprit. Most of the changes seem innocuous or unrelated to the timing issues Jake's seeing, so this change is the most suspicious: 624dd76#diff-0934de6a6f3b0f87927d259fb0473ea0e4756a222df7f8834eb96e6ffe822c66R85

Specifically the change to declaredNullability vs nullability, which seems to be semantically different according to the CFE. I can't think of an obvious test that could tickle this case and cause timing flakes though.

@Markzipan Markzipan added area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. web-dev-compiler labels Jul 12, 2022
@nshahan
Copy link
Contributor

nshahan commented Jul 18, 2022

While doing some debugging on the change that @Markzipan mentioned it looked like we were missing some of the "Undetermined Nullability" erasure that we were attempting to perform so I changed the test condition to use the FutureOrType.declaredNullability field instead of FutureOrType.nullability. My expectation was that this would cause more occurrences to be caught and avoid a compile time error when some pattern in the user code triggered this in the CFE representation (DDC throws when trying to compile a FutureOr with "Undetermined Nullability"). I suppose there could be a pattern that I missed that would result in more "Undetermined Nullability" getting through this erasure but that should result in a compile time failure, and it should be a consistent failure.

@jakemac53 Do you have any idea if the compile is failing? Have you tried running the tests suite locally many times to account for the flakiness?

@jakemac53
Copy link
Contributor Author

jakemac53 commented Jul 20, 2022

There are no compile time errors, I tried running several times locally but I will try also throttling the network to see if that helps repro the issue.

Update: throttling doesn't seem to reproduce it either

@nshahan
Copy link
Contributor

nshahan commented Jul 20, 2022

Is there a way I can debug the test running in chrome locally?

@jakemac53
Copy link
Contributor Author

jakemac53 commented Jul 20, 2022

Is there a way I can debug the test running in chrome locally?

Two ways, from the _test_null_safety directory in the build repo:

  1. dart run build_runner serve test and then load http://localhost:8080 in your browser and click the link to the opted_out_test.
  2. dart run build_runner test -- --pause-after-load --run-skipped -p chrome test/opted_out_test.dart, browser will open automatically, this also more directly mimics how the tests will actually load and run, probably more useful.

@nshahan
Copy link
Contributor

nshahan commented Jul 22, 2022

I've been investigating and debugging but nothing jumps out at me as an obvious error. Finally I tried building at
2.18.0-238.0.dev and 2.18.0-250.0.dev and diffing all the files under the .dart_tool/build directory. There are many diffs but none in any .js files. For that reason I don't think the failures are due to DDC.

Between 2.18.0-238.0.dev and 2.18.0-250.0.dev there are only three changes to DDC: 0f2e021 f3daca2 624dd76.

There are many more VM and analyzer changes (not sure if analyzer factors into the workflow of running theses tests).

Can we try running with any more dev builds in between as github actions to see if that can narrow down the culprit CLs?

Browsing this link and moving backwards in time makes it easy to see the various CLs that could be the cause
https://github.com/dart-lang/sdk/commits/2.18.0-250.0.dev

@jakemac53
Copy link
Contributor Author

I am not aware of any way to run with a custom SDK on github actions... we can only download the sdks available on google cloud storage.

@nshahan
Copy link
Contributor

nshahan commented Jul 25, 2022

I did a little digging and I think that all dev releases are uploaded to cloud storage, but not all are signed so they don't appear as available in the dev channel.

The good new is that there is now a release "flavor" named raw that should allow you to run with any dev version https://github.com/dart-lang/setup-dart.

Let's do some further bisecting of the range to reduce the size of the blamelist.

@jakemac53
Copy link
Contributor Author

Trying that here dart-lang/build#3331

@jakemac53
Copy link
Contributor Author

Me and Nick spent some more time today looking into this but have hit a dead end. We were able to get some more information though:

  • The first failing version is 2.18.0-247.0.dev
  • Were were able to observe an exception, "Cannot read properties of undefined (reading 'async')" with this stack:
    at Object.execCb (http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:1696:33)
    at Module.check (http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:878:55)
    at http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:630:33
    at each (http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:59:31)
    at breakCycle (http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:619:17)
    at http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:632:29
    at each (http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:59:31)
    at breakCycle (http://localhost:41[243](https://github.com/dart-lang/build/runs/7527732968?check_suite_focus=true#step:6:247)/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:619:17)
    at http://localhost:41243/eFQET3WQRsE%2BJ3KRouHLGF6iFBeTd7Rw/test/packages/build_web_compilers/src/dev_compiler/require.js:632:29
  • the stack trace isn't helpful though. The async variable appears to get initialized at the very top of the dart_sdk.js file.

@Markzipan
Copy link
Contributor

From what I can tell, the only SDK changes at that version are those DDC Future updates and some analyzer-related command and test framework changes: 6ca7a18

Is it possible that bumping test or webdev caused some odd interactions with DDC? Is it possible to run these tests with arbitrary files reverted/updated?

@nshahan
Copy link
Contributor

nshahan commented Jul 26, 2022

@Markzipan! You just pointed out something that is so confusing but does add some explanation to what is going on so thank you!!!

I have been looking at this view which makes me think that the DDC changes were part of 2.18.0-246.0.dev. They appear in the timeline before the commit labeled 2.18.0-246.0.dev so that seemed logical to me.

But... Your link to the contents of the merge commit for 2.18.0-247.0.dev shows the changes! If I look at the merge commit for 2.18.0-246.0.dev they are not there! That caused me to look at the change directly and inspect the tags it is present in and what do you know?!? The change is not present in 2.18.0-246.0.dev!

I guess the ordering of merge commits on the timeline doesn't represent what commits are part of the merge. That means
timeline view of commits in order is not useful to understand which commits are part of which release (my mistaken assumption). Now that we know the DDC commits are in-fact part of the exact change that is causing the problem I will double check again to see if I can find a problem.

@nshahan
Copy link
Contributor

nshahan commented Jul 29, 2022

I just double checked my diffing test and I still see no difference in the output compiled by DDC at the two versions. To document my process:

  1. Install 2.18.0-246.0.dev
  2. rm -rf build/_test_null_safety/.dart_tool
  3. $ cd build/_test_null_safety
  4. $ dart pub get
  5. $ dart run ../build_runner/bin/build_runner.dart test -- --pause-after-load --run-skipped -p chrome test/opted_out_test.dart
  6. $ cp -r .dart_tool ~/dart_tool-2.18.0-246.0.dev
  7. Install 2.18.0-247.0.dev
  8. rm -rf build/_test_null_safety/.dart_tool
  9. $ cd build/_test_null_safety
  10. $ dart pub get
  11. $ dart run ../build_runner/bin/build_runner.dart test -- --pause-after-load --run-skipped -p chrome test/opted_out_test.dart
  12. $ cp -r .dart_tool ~/dart_tool-2.18.0-247.0.dev
  13. $ diff -qr -x '*.dill' ~/dart_tool-2.18.0-246.0.dev ~/dart_tool-2.18.0-247.0.dev

For the diff, I'm excluding .dill files because they all contain diffs. I believe this is because there is a dart sdk version encoded into them. The remaining results are:

Files dart_tool-2.18.0-246.0.dev/build/fcd1995bc647fb959e82ea360c6c2c9a/asset_graph.json and dart_tool-2.18.0-247.0.dev/build/fcd1995bc647fb959e82ea360c6c2c9a/asset_graph.json differ
Files dart_tool-2.18.0-246.0.dev/build_resolvers/sdk.sum.deps and dart_tool-2.18.0-247.0.dev/build_resolvers/sdk.sum.deps differ
Files dart_tool-2.18.0-246.0.dev/package_config.json and dart_tool-2.18.0-247.0.dev/package_config.json differ
Only in dart_tool-2.18.0-246.0.dev/pub/bin/test: test.dart-2.18.0-246.0.dev.snapshot
Only in dart_tool-2.18.0-247.0.dev/pub/bin/test: test.dart-2.18.0-247.0.dev.snapshot

The diffs in sdk.sum.deps and package_config.json are only the SDK version numbers and timestamps. That just leaves the asset_graph.json and test.dart snapshots as potential differences.

@jakemac53
Copy link
Contributor Author

jakemac53 commented Jul 29, 2022

Fwiw I have narrowed this down to the commit with this sha 624dd76. A custom sdk built from the tag for 2.18.0-246.0.dev passes, and cherry picking in that sha it fails. Trying to see if I can figure out any more meaningful diffs.

Update: I also only see a diff in the .dill files but not the .js files.

@annagrin
Copy link
Contributor

annagrin commented Aug 1, 2022

Couple of things that might help with debugging:

image

@annagrin
Copy link
Contributor

annagrin commented Aug 1, 2022

The diffs in sdk.sum.deps and package_config.json are only the SDK version numbers and timestamps. That just leaves the asset_graph.json and test.dart snapshots as potential differences.

Just had another thought - given the trace you mentioned above seems to indicate that async library import is used but is not defined in one of the .js modules - maybe the asset.graph.json is worth looking into...

@natebosch
Copy link
Member

Here is the start of a diff between the dart_sdk.js files. The + are the changes we sometimes see where it breaks.

@@ -1,5 +1,6 @@
-define([], (function load__dart_sdk() {
+define(['dart_sdk'], (function load__dart_sdk(dart_sdk) {
   'use strict';
+  const async = dart_sdk.async;
   const _library = Object.create(null);
   const dart = Object.create(_library);
   dart.library = _library;
@@ -13,7 +14,7 @@
   var _js_primitives = Object.create(dart.library);
   var _metadata = Object.create(dart.library);
   var _native_typed_data = Object.create(dart.library);
-  var async = Object.create(dart.library);
+  var async$ = Object.create(dart.library);
   var collection = Object.create(dart.library);
   var convert = Object.create(dart.library);
   var developer = Object.create(dart.library);
@@ -293,28 +294,29 @@
     ObjectAndStackTraceTovoid: () => (T$.ObjectAndStackTraceTovoid = dart.constFn(dart.fnType(dart.void, [core.Object, core.StackTrace])))(),
     VoidTovoid: () => (T$.VoidTovoid = dart.constFn(dart.fnType(dart.void, [])))(),
     dynamicTovoid: () => (T$.dynamicTovoid = dart.constFn(dart.fnType(dart.void, [dart.dynamic])))(),
-    _FutureOfNull: () => (T$._FutureOfNull = dart.constFn(async._Future$(core.Null)))(),
+    _FutureOfNull: () => (T$._FutureOfNull = dart.constFn(async$._Future$(core.Null)))(),
     VoidTo_FutureOfNull: () => (T$.VoidTo_FutureOfNull = dart.constFn(dart.fnType(T$._FutureOfNull(), [])))(),
     FutureOfNull: () => (T$.FutureOfNull = dart.constFn(async.Future$(core.Null)))(),
     FutureNOfNull: () => (T$.FutureNOfNull = dart.constFn(dart.nullable(T$.FutureOfNull())))(),
     ObjectTovoid: () => (T$.ObjectTovoid = dart.constFn(dart.fnType(dart.void, [core.Object])))(),
-    dynamicToFuture: () => (T$.dynamicToFuture = dart.constFn(dart.fnType(async.Future, [dart.dynamic])))(),
-    _FutureOfString: () => (T$._FutureOfString = dart.constFn(async._Future$(core.String)))(),
-    _FutureOfbool: () => (T$._FutureOfbool = dart.constFn(async._Future$(core.bool)))(),
+    dynamicToFuture: () => (T$.dynamicToFuture = dart.constFn(dart.fnType(async$.Future, [dart.dynamic])))(),
+    _FutureOfString: () => (T$._FutureOfString = dart.constFn(async$._Future$(core.String)))(),
+    _FutureOfbool: () => (T$._FutureOfbool = dart.constFn(async$._Future$(core.bool)))(),
     VoidTobool: () => (T$.VoidTobool = dart.constFn(dart.fnType(core.bool, [])))(),
     boolToNull: () => (T$.boolToNull = dart.constFn(dart.fnType(core.Null, [core.bool])))(),
     voidToNull: () => (T$.voidToNull = dart.constFn(dart.fnType(core.Null, [dart.void])))(),
-    _FutureOfint: () => (T$._FutureOfint = dart.constFn(async._Future$(core.int)))(),
+    _FutureOfint: () => (T$._FutureOfint = dart.constFn(async$._Future$(core.int)))(),
     ObjectAndStackTraceToNull: () => (T$.ObjectAndStackTraceToNull = dart.constFn(dart.fnType(core.Null, [core.Object, core.StackTrace])))(),
-    FutureOfvoid: () => (T$.FutureOfvoid = dart.constFn(async.Future$(dart.void)))(),
+    FutureOfvoid: () => (T$.FutureOfvoid = dart.constFn(async$.Future$(dart.void)))(),
     VoidToFutureOfvoid: () => (T$.VoidToFutureOfvoid = dart.constFn(dart.fnType(T$.FutureOfvoid(), [])))(),
-    EventSinkTo_ConverterStreamEventSink: () => (T$.EventSinkTo_ConverterStreamEventSink = dart.constFn(dart.fnType(convert._ConverterStreamEventSink, [async.EventSink])))(),
+    EventSinkTo_ConverterStreamEventSink: () => (T$.EventSinkTo_ConverterStreamEventSink = dart.constFn(dart.fnType(convert._ConverterStreamEventSink, [async$.EventSink])))(),
     JSArrayOfUint8List: () => (T$.JSArrayOfUint8List = dart.constFn(_interceptors.JSArray$(typed_data.Uint8List)))(),
     ObjectNAndObjectNTovoid: () => (T$.ObjectNAndObjectNTovoid = dart.constFn(dart.fnType(dart.void, [T$.ObjectN(), T$.ObjectN()])))(),
     ObjectNToObjectN: () => (T$.ObjectNToObjectN = dart.constFn(dart.fnType(T$.ObjectN(), [T$.ObjectN()])))(),
     EmptyIteratorOfNeverL: () => (T$.EmptyIteratorOfNeverL = dart.constFn(_internal.EmptyIterator$(dart.legacy(dart.Never))))(),

@nshahan
Copy link
Contributor

nshahan commented Aug 2, 2022

We have now been able to reproduce this issue locally if you limit the number of workers using an environment variable first. $ export BUILD_MAX_WORKERS_PER_TASK=1

We believe the issue is related to the persistent worker mode and what it uses as the dependencies of a given compile. In the example diff @natebosch posted above, it looks like during this particular compile of the weak mode SDK module, the dart_sdk module is shown as a dependency. There should be no dependencies in the SDK module. During other runs we saw the same problem manifest in the sound mode SDK module. This leads us to believe that the change I made in DDC has slightly changed the timing of the builds and causes an interleaving of compiles that doesn't correctly set the right dependencies for the SDK modules.

@natebosch
Copy link
Member

We print browser output in cases where we time out waiting for the bootstrapping code to connect a channel back to the runner (though that hasn't been as useful as we'd like). We don't currently do anything with the browser output when a test fails or times out, which is how this is surfacing.

Should we look for some other heuristic for when browser output might be useful? Maybe any failure?

Should we consider a flag to emit browser output?

cc @jakemac53 @grouma

@grouma
Copy link
Member

grouma commented Aug 4, 2022

Should we look for some other heuristic for when browser output might be useful? Maybe any failure?

I believe Chrome always adds some logs to the STDERR. If I'm mistaken, it probably makes sense to log the STDERR whenever there is a test failure.

Should we consider a flag to emit browser output?

Does it make sense to capture the output and conditionally log it in https://pub.dev/packages/browser_launcher?

@jakemac53
Copy link
Contributor Author

Fwiw I did add some extra logging in a branch of package:test I was using to debug this issue, but it didn't output anything helpful.

@natebosch
Copy link
Member

Fwiw I did add some extra logging in a branch of package:test I was using to debug this issue, but it didn't output anything helpful.

How did you get the stack trace from #49441 (comment) ?

@jakemac53
Copy link
Contributor Author

That was by listening for runtime exceptions through the debug service dart-lang/test@2a8455e

@nshahan
Copy link
Contributor

nshahan commented Nov 18, 2022

Setting as P3 because workaround was landed to avoid the issue entirely. It still would be nice to fix to avoid any possible new issues in the future though.

@nshahan nshahan added the P3 A lower priority bug or feature request label Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-web Use area-web for Dart web related issues, including the DDC and dart2js compilers and JS interop. P3 A lower priority bug or feature request web-dev-compiler
Projects
None yet
Development

No branches or pull requests

6 participants