Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ngcc): update package.json deterministically #34870

Closed

Conversation

gkalpak
Copy link
Member

@gkalpak gkalpak commented Jan 20, 2020

Ngcc adds properties to the package.json files of the entry-points it processes to mark them as processed for a format and point to the created Ivy entry-points (in case of --create-ivy-entry-points). When running ngcc in parallel mode (which is the default for the standalone ngcc command), multiple formats can be processed simultaneously for the same entry-point and the order of completion is not deterministic.

Previously, ngcc would append new properties at the end of the target object in package.json as soon as the format processing was completed. As a result, the order of properties in the resulting package.json (when processing multiple formats for an entry-point in parallel) was not deterministic. For tools that use file hashes for caching purposes (such as Bazel), this lead to a high probability of cache misses.

This commit fixes the problem by ensuring that the position of properties added to package.json files is deterministic and independent of the order in which each format is processed.

Jira issue: FW-1801

Fixes #34635.

@gkalpak gkalpak added type: bug/fix action: merge The PR is ready for merge by the caretaker target: patch This PR is targeted for the next patch release comp: ngcc labels Jan 20, 2020
@ngbot ngbot bot added this to the needsTriage milestone Jan 20, 2020
@mary-poppins
Copy link

You can preview 9b1d9df at https://pr34870-9b1d9df.ngbuilds.io/.

@gkalpak gkalpak force-pushed the fix-ngcc-deterministic-packagejson branch from 9b1d9df to 9ccfaec Compare January 20, 2020 16:59
@gkalpak gkalpak marked this pull request as ready for review January 20, 2020 16:59
@gkalpak gkalpak requested a review from a team as a code owner January 20, 2020 16:59
@mary-poppins
Copy link

You can preview 9ccfaec at https://pr34870-9ccfaec.ngbuilds.io/.

Copy link
Member

@petebacondarwin petebacondarwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @gkalpak - it was not a trivial fix after all.

I would be much happier (unless you can convince me otherwise) if there was no switch statement.

@@ -157,4 +174,42 @@ export function applyChange(ctx: JsonObject, propPath: string[], value: JsonValu
}

ctx[lastProp] = value;
positionProperty(ctx, lastProp, positioning);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional programming: for the win!

Instead of using "enumeration" and a switch statement you could do this by creating instances of isNextProp function instead of positioning. I would envisage something like:

const unimportant = (propToInsert: string, currentProp: string) => boolean = false;
const alphabetic = (propToInsert: string, currentProp: string) => boolean = currentProp > propToInsert;
const createBefore = (beforeProp: string) => (_propToInsert: string, currentProp: string) => currentProp === beforeProp;

This would also avoid the awkward union of string and object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I 100% understand what you are suggesting 😁

The problem (if I understand correctly) is that for unimportant I want to leave existing properties at their old positions (to avoid unnecessary changes to files). This can't be coded as an isNextProp() function. Also, in the future there might be different positionings, such as {after: <prop>}, which again cannot be coded as an isNextProp() function.

We could come up with ways to work around these issues, but I don't think there will be much benefit in doing it (as it will rather complicate the code than simplify it).

In general, I think it is just a coincidence that two (out of three) positioning options (alphabetic and {before: ...}) can be implemented with similar logic (movePropBefore()).

Fun fact:
Original, I specified positioning values as unimportant, alphabetic and {after: <prop>}, but switched to {before: <prop>} (instead of after), because it allowed the implementations to share more code and I didn't really care as long as the new prop was next to the old prop 😁

There is always the case that I might be missing something, so feel free to suggest an implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the basic idea: 6891dde

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed a few places that would also need updating but you get the general gist :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean now. Well, the thing is that in parallel mode the recorded changes are collected on the worker side, serialized to JSON and send over to the cluster master which applies them to the actual file. Therefore, they need to be JSON-serializable (and functions are not (in a reasonable way 😉)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That is a shame. Leave it as it is then.

Although, I could envisage changing the messaging between worker and master so that the message is abstracted more to the update task rather than needing to know about the JSON structure. I.E. where the property is placed could be owned by the cluster master and the messages could be more domain oriented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I agree, but that's a discussion for another day 😃

@@ -88,21 +104,31 @@ runInEachFileSystem(() => {
it('should send an `update-package-json` message to the master process', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an equivalent check to

afterEach(() => expect(processSendSpy).not.toHaveBeenCalled());

which is in the master version of this code. But in this case we should check that the writeChangesSpy was not called, right?

Copy link
Member Author

@gkalpak gkalpak Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 👍
Done

// Helpers
function expectNotToHaveProp(obj: object, prop: string) {
expect(obj.hasOwnProperty(prop))
.toBe(false, `Expected object not to have property '${prop}'.`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth stringifying the object in the error message too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

expect(pkgKeys).toContain('|fesm2015_ivy_ngcc|fesm2015|');
expect(pkgKeys).toContain('|fesm5_ivy_ngcc|fesm5|');
expect(stringifyKeys(pkg.__processed_by_ivy_ngcc__ !))
.toMatch(/\|esm5\|(?:.+\|)?fesm2015\|(?:.+\|)?fesm5\|/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not possible to simply check the array of keys directly here? What pre-existing properties might exist on this?
This regex matching seems overkill, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because processing one format will also marked all other formats pointing to the same entry-point as processed. For example, the fesm5 property may point to the same file as the module property and thus both properties will be marked as processed once either of them gets processed. Same for fesm2015 and es2015. Also, typings gets processed the first time we process a property.

So, after having explicitly processed esm5, fesm2015 and fesm5, the __processed_by_ivy_ngcc__ object will also contain es2015, module and typings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do know exactly what will be in the array for this test right?
I guess it is a trade off between complexity of the test vs brittleness of the test.

  • In this case the test is harder to read and reason about.
  • In the other case the test is more prone to failing if something about the other properties that are added changes.

But I would argue that we need to ensure that "all" properties added to the __processed_by_ivy_ngcc__ are deterministic and so I don't think it would be brittle to check all fo them.

Perhaps I am missing something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we know which properties will be there and we can check with a string match.

In this case the test is harder to read and reason about.

Well...that is subjective 😛
It is a little unexpected why there are other props there, so we would have to have a comment explaining what is going on and some people claim that reading a simple RegExp would be more straight forward 😛

I don't think it makes a big difference anyway, so happy to switch to exact string match 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@gkalpak gkalpak force-pushed the fix-ngcc-deterministic-packagejson branch 2 times, most recently from 91ad167 to e04a8fd Compare January 21, 2020 13:27
@mary-poppins
Copy link

You can preview e04a8fd at https://pr34870-e04a8fd.ngbuilds.io/.

Ngcc adds properties to the `package.json` files of the entry-points it
processes to mark them as processed for a format and point to the
created Ivy entry-points (in case of `--create-ivy-entry-points`). When
running ngcc in parallel mode (which is the default for the standalone
ngcc command), multiple formats can be processed simultaneously for the
same entry-point and the order of completion is not deterministic.

Previously, ngcc would append new properties at the end of the target
object in `package.json` as soon as the format processing was completed.
As a result, the order of properties in the resulting `package.json`
(when processing multiple formats for an entry-point in parallel) was
not deterministic. For tools that use file hashes for caching purposes
(such as Bazel), this lead to a high probability of cache misses.

This commit fixes the problem by ensuring that the position of
properties added to `package.json` files is deterministic and
independent of the order in which each format is processed.

Jira issue: [FW-1801](https://angular-team.atlassian.net/browse/FW-1801)

Fixes angular#34635
@gkalpak gkalpak force-pushed the fix-ngcc-deterministic-packagejson branch from 226d71d to d779c13 Compare January 21, 2020 20:45
@mary-poppins
Copy link

You can preview d779c13 at https://pr34870-d779c13.ngbuilds.io/.

AndrewKushnir pushed a commit that referenced this pull request Jan 23, 2020
AndrewKushnir pushed a commit that referenced this pull request Jan 23, 2020
Ngcc adds properties to the `package.json` files of the entry-points it
processes to mark them as processed for a format and point to the
created Ivy entry-points (in case of `--create-ivy-entry-points`). When
running ngcc in parallel mode (which is the default for the standalone
ngcc command), multiple formats can be processed simultaneously for the
same entry-point and the order of completion is not deterministic.

Previously, ngcc would append new properties at the end of the target
object in `package.json` as soon as the format processing was completed.
As a result, the order of properties in the resulting `package.json`
(when processing multiple formats for an entry-point in parallel) was
not deterministic. For tools that use file hashes for caching purposes
(such as Bazel), this lead to a high probability of cache misses.

This commit fixes the problem by ensuring that the position of
properties added to `package.json` files is deterministic and
independent of the order in which each format is processed.

Jira issue: [FW-1801](https://angular-team.atlassian.net/browse/FW-1801)

Fixes #34635

PR Close #34870
AndrewKushnir pushed a commit that referenced this pull request Jan 23, 2020
Ngcc adds properties to the `package.json` files of the entry-points it
processes to mark them as processed for a format and point to the
created Ivy entry-points (in case of `--create-ivy-entry-points`). When
running ngcc in parallel mode (which is the default for the standalone
ngcc command), multiple formats can be processed simultaneously for the
same entry-point and the order of completion is not deterministic.

Previously, ngcc would append new properties at the end of the target
object in `package.json` as soon as the format processing was completed.
As a result, the order of properties in the resulting `package.json`
(when processing multiple formats for an entry-point in parallel) was
not deterministic. For tools that use file hashes for caching purposes
(such as Bazel), this lead to a high probability of cache misses.

This commit fixes the problem by ensuring that the position of
properties added to `package.json` files is deterministic and
independent of the order in which each format is processed.

Jira issue: [FW-1801](https://angular-team.atlassian.net/browse/FW-1801)

Fixes #34635

PR Close #34870
@gkalpak gkalpak deleted the fix-ngcc-deterministic-packagejson branch January 23, 2020 20:10
sonukapoor pushed a commit to sonukapoor/angular that referenced this pull request Feb 13, 2020
sonukapoor pushed a commit to sonukapoor/angular that referenced this pull request Feb 13, 2020
Ngcc adds properties to the `package.json` files of the entry-points it
processes to mark them as processed for a format and point to the
created Ivy entry-points (in case of `--create-ivy-entry-points`). When
running ngcc in parallel mode (which is the default for the standalone
ngcc command), multiple formats can be processed simultaneously for the
same entry-point and the order of completion is not deterministic.

Previously, ngcc would append new properties at the end of the target
object in `package.json` as soon as the format processing was completed.
As a result, the order of properties in the resulting `package.json`
(when processing multiple formats for an entry-point in parallel) was
not deterministic. For tools that use file hashes for caching purposes
(such as Bazel), this lead to a high probability of cache misses.

This commit fixes the problem by ensuring that the position of
properties added to `package.json` files is deterministic and
independent of the order in which each format is processed.

Jira issue: [FW-1801](https://angular-team.atlassian.net/browse/FW-1801)

Fixes angular#34635

PR Close angular#34870
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Feb 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker cla: yes target: patch This PR is targeted for the next patch release type: bug/fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inconsistent __processed_by_ivy_ngcc__ order breaks caching
4 participants