Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call-by-name syntax migration goal #20714

Merged
merged 38 commits into from
Apr 12, 2024

Conversation

sureshjoshi
Copy link
Member

@sureshjoshi sureshjoshi commented Mar 24, 2024

This PR creates a native Python built-in goal which performs the call-by-name migration for passed in files/targets (covers items 1 and 4 in #20572).

Note: This is not a completed feature, as per the "split off tickets" below.

Outstanding items in progress:

Out of scope of this PR, but migrate-call-by-name isn't a great name. This could be a good jump off point for a generic migrate goal with options/sub-goals (ala Swift's or Svelte's native migration tools).

See #20572

…ll comments and whitespace

- Could use the AST representation to replace lines, or jump out to tokenize to perform the replacement
- Easier to compose later, or pull out exactly what's needed - rather than trying to do that after
@sureshjoshi
Copy link
Member Author

@benjyw Two items here I'd like some direction on, if you get a chance:

Re-name MultiGet to concurrently (review what else is required here)

Haven't touched anything related to MultiGet, since I know it'll be churned with concurrently - so what's the plan here?

Is concurrently intended to be an alias for MultiGet until we eventually deprecate the Get based syntax someday? Or is/will it be a different way to collect/call?

My naive assumption is that it's still just a co-routine collector, same as MultiGet with a shiny new name.

Investigate why so many intrinsic migrations are missing (e.g. Digest code)

I'm not entirely sure where to even start here. There are a bunch of calls that just don't even register in the Rust rule graph call-by-syntax mapper. I think an example I gave was:
Get(Digest, RemovePrefix(result.output_digest, REPORT_DIR))

In the associated issue, Stu mentioned:

Benjy gave all of the native rules names: if they're not already exposed from the native engine, he might be able to help with exposing them.

Do you have any idea what is going on with this one?

@benjyw
Copy link
Sponsor Contributor

benjyw commented Mar 31, 2024

Re names - yes, all rules should have names now. Context: #19755

Is something not working vis-a-vis the intrinsics names?

See src/rust/engine/src/intrinsics.rs specifically for those names

@benjyw
Copy link
Sponsor Contributor

benjyw commented Mar 31, 2024

Re MultiGet: IIUC it should be renamed concurrently and take rule names, instead of Gets. Right?

@sureshjoshi
Copy link
Member Author

sureshjoshi commented Mar 31, 2024

Is something not working vis-a-vis the intrinsics names?

See src/rust/engine/src/intrinsics.rs specifically for those names

Yeah, so the generated rule graph isn't showing a handful of called-by-name functions. So, the example of Get(Digest, RemovePrefix(result.output_digest, REPORT_DIR)) wasn't being mapped to a function call.

But based on your comment, this set of intrinsics is where I should be looking? If so, perfect - having not dug through the rust code yet, I was at a bit of a loss.

    ...
    intrinsics.insert(
            Intrinsic::new(
                "directory_digest_to_digest_contents",
                types.digest_contents,
                types.directory_digest,
            ),
            Box::new(directory_digest_to_digest_contents),
        );
        intrinsics.insert(
            Intrinsic::new(
                "directory_digest_to_digest_entries",
                types.digest_entries,
                types.directory_digest,
            ),
            Box::new(directory_digest_to_digest_entries),
        );
    ...

@benjyw
Copy link
Sponsor Contributor

benjyw commented Mar 31, 2024

Yeah that is mapped to a rust function, not a python one.

@sureshjoshi
Copy link
Member Author

sureshjoshi commented Apr 3, 2024

@stuhood @benjyw @huonw @kaos

Added you all to the PR. I've got some lame tax paperwork to do for the couple days, but I wanted to start getting some eyeballs on this.

When I'm back, I'm going to work on some cleaning up and re-factoring of the code for a few things that bother me, and to see if I can find a cleaner way to satisfy the type checker other than the barrage of isinstance on AST items. Or at least consolidate them or something 🤷🏽

I've pulled off intrinsics and partials as separate tickets, as I think those could be rabbit hole investigations, and I didn't want to delay getting the 90% tool into a PR.

I'll be updating this shortly with an example of running the migrations on a Pants repo, so you can see what it looks like outside of integration tests.
Snapshot of the April 3rd YOLO branch (intentionally unformatted):
93157fd

It's worth noting there are LOTS of places that aren't migrated, as we're not doing a find/replace, but navigating through the AST. For example, code like digests.add(await Get(...)) isn't currently handled.

pex = await Get(
VenvPex,
PexRequest,
# Some comment that the AST parse wipes out, so we can't migrate this call safely
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embedded comments are wiped out by the AST. There are ways we can try to move this around at replacement time, but effort to value might not be there - especially as we ensure that the comment loses some context when we summarily move it.

I'd rather let the user know to either manually move/remove/replace the comment, or to extract the code with an embedded comment out. In the Pants repo, we have these cases specifically around Process calls with embedded Args

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that seems fine with me as long as the warning is rendered to remind people to look for the stripped comments in code review.

@stuhood
Copy link
Sponsor Member

stuhood commented Apr 4, 2024

Snapshot of the April 3rd YOLO branch (intentionally unformatted):
93157fd

That looks awesome: kudos!

Will review this weekend.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great: thanks so much @sureshjoshi!

Comment on lines +44 to +49
The migration plan is a JSON representation of the rule graph, which is generated by the
engine based on the active backends/rules in the project.

Each item in the migration plan is a rule that contains the old `Get` syntax, the associated
input/output types, and the new function to directly call. The migration plan can be dumped as
JSON using the `--json` flag, which can be useful for debugging. For example:
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The migration plan is a JSON representation of the rule graph, which is generated by the
engine based on the active backends/rules in the project.
Each item in the migration plan is a rule that contains the old `Get` syntax, the associated
input/output types, and the new function to directly call. The migration plan can be dumped as
JSON using the `--json` flag, which can be useful for debugging. For example:
A JSON migration plan can also be emitted, which contains the old `Get` syntax, the associated
input/output types, and the new function to directly call. The migration plan can be dumped as
JSON using the `--json` flag, which can be useful for debugging. For example:

def map_no_args_get_to_new_syntax(
self, get: ast.Call, deps: list[RuleGraphGetDep]
) -> tuple[ast.Call, list[ast.ImportFrom]]:
"""Get(<OutputType>) -> the_rule_to_call(**implicitly())"""
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: If the called rule doesn't take any arguments, you can skip the **implicitly(), and just call it as the_rule_to_call().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

Ah, okay, I didn't know if there were any ambient arguments that could still be passed without an explicit Get arg.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...oh. Hm. Good question. 😬


logger.debug(ast.dump(get, indent=2))
assert len(get.args) == 2, f"Expected 2 arg, got {len(get.args)}"
assert isinstance(output_type := get.args[0], ast.Name), f"Expected Name, got {get.args[0]}"
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat pattern!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

walrus FTW

Though, the lack of AST inference is brutal - I need to extract some utility AST stuff to support our use cases - too much dupe code to make the type checker happy (but, the value/need in making the type checker happy with AST code is really high)

def map_long_form_get_to_new_syntax(
self, get: ast.Call, deps: list[RuleGraphGetDep]
) -> tuple[ast.Call, list[ast.ImportFrom]]:
"""Get(<OutputType>, <InputType>, input) -> the_rule_to_call(**implicitly(input))"""
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing why there is a difference between map_long_form_get_to_new_syntax and map_short_form_get_to_new_syntax in terms of the output syntax: it's clear why the input is different, but the "if the called function only takes a single argument with a type that matches the argument" case applies to both of them I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I updated the short-form syntax before pushing this PR, but didn't give much thought to the long-form at the time.

From the examples I've seen in the code, the "short-form" syntax calls the XYZRequest constructor, so replacing that is pretty simple.

But the long-form calls seem to have a wider variety of call patterns, and one of the original calls I tried to replace similar to the short-form ended up failing the type checker due to some missing context (but works when called using implicitly(dict)

I'll take a look back at this, to see if it was a runtime or just type check problem.

Comment on lines +615 to +636
def _maybe_replaceable_call(self, statement: ast.stmt) -> ast.Call | None:
"""Looks for the following forms of Get that we want to replace:

- bar_get = Get(...)
- bar = await Get(...)
This function is pretty gross, but a lot of it is to satisfy the type checker and
to not need to try/catch everywhere.
"""
if (
isinstance(statement, ast.Assign)
and (
isinstance((call_node := statement.value), ast.Call)
or (
isinstance((await_node := statement.value), ast.Await)
and isinstance((call_node := await_node.value), ast.Call)
)
)
and isinstance(call_node.func, ast.Name)
and call_node.func.id == "Get"
):
return call_node
return None
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, any call to Get(..) should be replaceable to function_call(..): the only difference should be that before replacement you get out a Get[T], and after replacement you get out an Awaitable[T]... so it might break some type checking I suppose. But maybe it would leave you happier with this code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is more about satisfying the type checker, as if the call is awaitable, we access a different part of the AST vs if it's just straight callable. This will be pulled out to a utility to at least make the primary code more linearly readable.

pex = await Get(
VenvPex,
PexRequest,
# Some comment that the AST parse wipes out, so we can't migrate this call safely
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that seems fine with me as long as the warning is rendered to remind people to look for the stripped comments in code review.

@sureshjoshi sureshjoshi added category:internal CI, fixes for not-yet-released features, etc. and removed category:new feature labels Apr 12, 2024
- Playing some type gymnastics
- Removed assertions in lieu of proper throwable errors (in case asserts are optimized away)
@sureshjoshi sureshjoshi marked this pull request as ready for review April 12, 2024 20:48
@sureshjoshi sureshjoshi changed the title [WIP] Call-by-name syntax migration goal Call-by-name syntax migration goal Apr 12, 2024
@sureshjoshi sureshjoshi merged commit ddd47d7 into pantsbuild:main Apr 12, 2024
24 checks passed
@sureshjoshi sureshjoshi deleted the 20572-call-by-name-migration branch April 12, 2024 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants