Create migration tool for call-by-name syntax #20572
cc @sureshjoshi , who mentioned being interested in this at the recent team meeting. |
@sureshjoshi : One way to split this up might be for me to implement something that does steps 1 through 3 from the description, and then for you to do step 4. I will probably have 1-3 done this weekend, but I don't really have any context on 4. |
@stuhood Sure, whatever you think is the best way of going through this. The essence of what I brought up in the meeting was that I want faster Pants startup times. I won't be touching/looking at anything until my current project reaches its conference, and then I'm allocating a sizeable chunk of time (60 hours starting in mid-March, 80ish in April, 80ish in May) for a few Pants-centric items that I finally want to get out of my fork, and this. As for (4), off the top of my head, I'm not sure. I've done this kind of thing in TypeScript and in Swift (using SwiftSyntax), but not in Python. Without much thought, my idea was to use something like tree-sitter's AST, or maybe even ast-grep, but I haven't really looked into this yet. |
Medium term we'd like to get rid of the solver entirely, since it's complex code, so we want to no longer support call by signature. |
Gotcha, so essentially a forced migration for in-repo plugins |
We may want to bump the major version to 3.x when we get rid of the solver, as this will require substantial rewrites of external plugins. |
It shouldn't, assuming that they have executed the rewrite? Once the rewrite has been executed, the solver just (mostly) isn't used. More generally, thinking that this change might be necessary (due to the complexity of the solver) was in the back of my mind ever since 2.0.0, and was one of the main reasons why I didn't want to stabilize the plugin system. I think that with this type of change behind us, it would still be possible to stabilize the plugin system within the 2.x.y series. |
…ation. (#20574) As discussed in #20572, this change adds a built-in goal `migrate-call-by-name` which emits all `Get` calls which should be migrated to call-by-name syntax. A followup change should use the resulting information (`@rule` function pointers and `Get` signatures) to execute rewrites of the relevant files. Emits lines like: ``` <function infer_python_conftest_dependencies at 0x113633af0> Get(<class 'pants.engine.target.Targets'>, [<class 'pants.engine.addresses.Addresses'>]) -> <function resolve_targets at 0x112335d30> Get(<class 'pants.engine.internals.graph.Owners'>, [<class 'pants.engine.internals.graph.OwnersRequest'>]) -> <function find_owners at 0x112349160> Get(<class 'pants.backend.python.util_rules.ancestor_files.AncestorFiles'>, [<class 'pants.backend.python.util_rules.ancestor_files.AncestorFilesRequest'>]) -> <function find_ancestor_files at 0x1131e6790> ```
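For anyone who wants to consume this text output directly, here is a minimal sketch of parsing the emitted lines into records. The regular expressions and the `parse_get_entries` helper are assumptions written against the sample output above, not part of Pants, and they only handle the shapes shown there.

```python
import re

# Matches one `Get(<class 'Out'>, [<class 'In'>, ...]) -> <function name at 0x...>` entry.
# The pattern is an assumption based on the sample output, not a Pants-defined format.
GET_ENTRY = re.compile(
    r"Get\(<class '(?P<output>[\w.]+)'>,\s*\[(?P<inputs>[^\]]*)\]\)\s*"
    r"->\s*<function (?P<rule>\w+) at 0x[0-9a-f]+>"
)
CLASS_NAME = re.compile(r"<class '([\w.]+)'>")


def parse_get_entries(text: str) -> list[dict]:
    """Extract output type, input types, and target rule name for each Get entry."""
    entries = []
    for match in GET_ENTRY.finditer(text):
        entries.append(
            {
                "output": match.group("output"),
                "inputs": CLASS_NAME.findall(match.group("inputs")),
                "rule": match.group("rule"),
            }
        )
    return entries


sample = (
    "<function infer_python_conftest_dependencies at 0x113633af0>\n"
    "  Get(<class 'pants.engine.target.Targets'>, [<class 'pants.engine.addresses.Addresses'>])\n"
    "    -> <function resolve_targets at 0x112335d30>\n"
)
entries = parse_get_entries(sample)
print(entries)
```

Parsing `repr()` output like this is fragile, so a structured output format would be a sturdier input for a rewriter.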
It looks we already use It seems to be possible to preserve most formatting (importantly: comments) with |
That's pretty cool, I'd never used tokenize before - I was thinking Just doing some ad hoc hacking - seems to be pretty do-able so far, when I'm actually looking into this, I'll probably slice off a small plugin to do a prototype PR on |
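As a rough illustration of the `tokenize` idea, the sketch below locates `Get(` call sites at the token level while leaving comments visible as their own tokens. The `find_get_calls` helper and the sample source are hypothetical and are not taken from any Pants branch.

```python
import io
import tokenize


def find_get_calls(source: str) -> list[tuple[int, int]]:
    """Return (line, column) of every `Get` name token directly followed by `(`."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    positions = []
    for current, following in zip(tokens, tokens[1:]):
        if (
            current.type == tokenize.NAME
            and current.string == "Get"
            and following.type == tokenize.OP
            and following.string == "("
        ):
            positions.append(current.start)
    return positions


source = (
    "async def example():\n"
    "    # Get(Foo) in a comment is ignored\n"
    "    foo = await Get(Foo, FooRequest())  # trailing comment\n"
    "    return foo\n"
)
positions = find_get_calls(source)

# Comments survive as COMMENT tokens, so a token-level rewrite can keep them.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]
print(positions, comments)
```

Because the `Get(Foo)` inside the comment is a single COMMENT token, it is never mistaken for a call, which is one reason a token-based pass can be safer than plain text search.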
@stuhood Now that the completions PR is being reviewed, will be focusing on this next. Before I jump into the re-writing part, what is the canonical single The test cases of
@rule
def rule1(arg: int) -> int:
    return arg
@rule
async def rule2() -> int:
    return 2
async def rule3() -> int:
    one_explicit = await rule1(1)
    one_implicit = await rule1(**implicitly(int(1)))
    two = await rule2()
    return one_explicit + one_implicit + two
And from the original proposal:
# Equivalent to `Get(ReturnType)` (i.e. no arguments to `Get`). All arguments would be
# computed from Params which were already in scope:
await the_rule_to_call(**implicitly())
# Using a positional arg. Roughly equivalent to `Get(ReturnType, Arg1())`: the remaining arguments would be
# computed from Params which were already in scope:
await the_rule_to_call(Arg1(), **implicitly())
# Two positional args. Roughly equivalent to `Get(ReturnType, {Arg1(): Arg1, Arg2(): Arg2})`. Note that
# `@rule` graph solving is still necessary for this case, because the called `@rule` may have `Get`s
# which have additional dependencies.
await the_rule_to_call(Arg1(), Arg2())
# Equivalent to `Get(ReturnType, Arg1())`:
await the_rule_to_call(**implicitly(Arg1()))
# Equivalent to `Get(ReturnType, {Arg1(): Arg1, Arg2(): Arg2})`:
await the_rule_to_call(**implicitly({Arg1(): Arg1, Arg2(): Arg2}))
As a practical example from the codebase:
# rules.py
@rule
async def get_graphql_uvicorn_setup(
    request: GraphQLUvicornServerSetupRequest, graphql: GraphQLSubsystem
) -> UvicornServerSetup:
    browser = await Get(Browser, BrowserRequest, request.browser_request())
    return UvicornServerSetup(graphql_uvicorn_setup(browser, graphql=graphql))
# browser.py
@rule
async def get_browser(request: BrowserRequest, open_binary: OpenBinary) -> Browser:
    return Browser(open_binary, request.protocol, request.server)
# breakdown
<function get_graphql_uvicorn_setup at 0x11776b700>
  Get(<class 'pants_explorer.server.browser.Browser'>,
      [<class 'pants_explorer.server.browser.BrowserRequest'>])
    -> <function get_browser at 0x112b8c670>
Would that turn into one of the following?
browser = await get_browser(request.browser_request(), **implicitly())
browser = await get_browser(**implicitly(request.browser_request()))
Also, would we want to explicitly type the |
…base, looking for statements of interest
…ll comments and whitespace - Could use the AST representation to replace lines, or jump out to tokenize to perform the replacement
Alright, so having a bit too much fun with this. In my branch, I wrote a basic script to play around with how we can do this migration (outside of Pants, as iterating in Pants right now takes a long time). It looks for
Single-line, single-GET
# Before
go_mod_addr = await Get(OwningGoMod, OwningGoModRequest(transitive_targets.roots[0].address))
# After
go_mod_addr = await call_by_some_name(TODO TODO TODO)
Multi-line, single-GET
# Before
addresses_for_thrift = await Get(
    PythonModuleOwners,
    PythonModuleOwnersRequest(
        "thrift",
        resolve=resolve,
        locality=locality,
    ),
)
# After
addresses_for_thrift = await call_by_some_name(TODO TODO TODO)
@benjyw Do you recall where we landed on MultiGet? Are those to be named
The script doesn't add imports, but that should be easy to brute force - and I have a more clever idea that I want to play around with later on. |
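The single-line and multi-line cases above can be told apart from AST position data. This is a minimal sketch of that idea using only the standard `ast` module; `find_await_gets` and the trimmed sample source are illustrative assumptions, not the script from the branch.

```python
import ast


def find_await_gets(source: str) -> list[tuple[int, int]]:
    """Return (first_line, last_line) for every `await Get(...)` expression."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Await)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id == "Get"
        ):
            spans.append((node.lineno, node.end_lineno))
    return sorted(spans)


source = '''
async def example():
    one = await Get(OwningGoMod, OwningGoModRequest(address))
    two = await Get(
        PythonModuleOwners,
        PythonModuleOwnersRequest("thrift", resolve=resolve),
    )
'''
spans = find_await_gets(source)
for first, last in spans:
    kind = "single-line" if first == last else "multi-line"
    print(f"lines {first}-{last}: {kind}")
```

The line spans are enough to replace whole source lines in place, which leaves comments and formatting outside the call untouched.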
Yea, I think that we should go with
I don't think that the resulting formatting is too important, because I think that anyone who cares about formatting will be using an auto-formatter. My guess is that many calls will be about the same length, so if you wanted bonus points, you could match the existing "multi-line-ness" of the call you are replacing. But just choosing one output wrapping strategy and then using it should be totally fine due to auto-formatters. |
Thanks @stuhood, but to clarify, I'm not entirely sure what I should replace the existing Get call with in the examples above. Would it be the same as what's in the sample code above already? Or is there a new syntax? As in, there are several ways in the examples above - any preference as to which I use? |
Some notes to self while playing around:
# Where
@rule
async def get_browser(request: BrowserRequest, open_binary: OpenBinary) -> Browser:
# Transforming:
browser = await Get(Browser, BrowserRequest, request.browser_request())
# Into:
browser: Any = await get_browser(request.browser_request())
    -> Pyright cannot infer type (becomes Any)
    -> Runtime TypeError: get_browser() got multiple values for argument 'request'
browser: Browser = await get_browser(request.browser_request(), **implicitly())
    -> Pyright correctly infers type
    -> Runtime TypeError: get_browser() got multiple values for argument 'request'
browser: Browser = await get_browser(**implicitly(request.browser_request()))
    -> Success
browser: Browser = await get_browser(**implicitly({request.browser_request(): BrowserRequest}))
    -> Success |
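Since both `**implicitly(...)` forms succeeded in the notes above, a rewriter could target them mechanically. The sketch below shows one way to build the replacement expression with `ast`; `rewrite_get` is a hypothetical helper, the rule name is assumed to come from a separate lookup, and `owning_go_mod` is a made-up name used only for the example.

```python
import ast


def rewrite_get(call_source: str, rule_name: str) -> str:
    """Rewrite a `Get(...)` expression into `rule_name(**implicitly(...))`."""
    call = ast.parse(call_source, mode="eval").body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == "Get"
    ):
        raise ValueError(f"not a Get call: {call_source}")

    args = call.args
    if len(args) == 2:
        # Get(Output, Input(...)) -> rule(**implicitly(Input(...)))
        implicit_args = [args[1]]
    elif len(args) == 3:
        # Get(Output, InputType, value) -> rule(**implicitly({value: InputType}))
        implicit_args = [ast.Dict(keys=[args[2]], values=[args[1]])]
    else:
        raise ValueError(f"unsupported Get arity: {call_source}")

    implicitly_call = ast.Call(
        func=ast.Name(id="implicitly", ctx=ast.Load()),
        args=implicit_args,
        keywords=[],
    )
    new_call = ast.Call(
        func=ast.Name(id=rule_name, ctx=ast.Load()),
        args=[],
        # A keyword with arg=None is unparsed as `**value`.
        keywords=[ast.keyword(arg=None, value=implicitly_call)],
    )
    return ast.unparse(new_call)


three_arg = rewrite_get(
    "Get(Browser, BrowserRequest, request.browser_request())", "get_browser"
)
two_arg = rewrite_get("Get(OwningGoMod, OwningGoModRequest(address))", "owning_go_mod")
print(three_arg)
print(two_arg)
```

The three-argument case is mapped to the dict form because that keeps the declared input type explicit, matching the last successful variant in the notes.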
@stuhood Are Digests intended to remain Gets?
This CreateDigest isn't included:
leak_sandbox_path_digest = await Get(
    Digest,
    CreateDigest(
        [
            FileContent(
                cmd,
                leak_jdk_sandbox_paths.encode("utf-8"),
                is_executable=True,
            ),
        ]
    ),
)
Also interesting to note that a couple of the replacements aren't in that rule, but rather a coroutine called in the function:
scala_runtime = await _materialize_scala_runtime_jars(scala_version)
async def _materialize_scala_runtime_jars(scala_version: ScalaVersion) -> Snapshot:
    scala_artifacts = await Get(
        ScalaArtifactsForVersionResult, ScalaArtifactsForVersionRequest(scala_version)
    )
    tool_classpath = await Get(
        ToolClasspath,
        ToolClasspathRequest(
            artifact_requirements=ArtifactRequirements.from_coordinates(
                scala_artifacts.all_coordinates
            ),
        ),
    )
    return await Get(
        Snapshot,
        AddPrefix(tool_classpath.content.digest, f"jvm/scala-runtime/{scala_version}"),
    )
That, in itself, isn't a big deal - since my AST code picks it up, but note to self: figure out how to lookup the mapper function |
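To make the helper-coroutine case concrete, the sketch below reports, per function, whether it carries an `@rule` decorator and how many `Get` calls it contains. It is an illustration only: `gets_by_function` is a hypothetical helper, the sample source is simplified, and nested functions would be counted in both the inner and the outer function.

```python
import ast


def _is_rule_decorator(decorator: ast.expr) -> bool:
    """Accept both `@rule` and `@rule(...)` spellings."""
    if isinstance(decorator, ast.Call):
        decorator = decorator.func
    return isinstance(decorator, ast.Name) and decorator.id == "rule"


def gets_by_function(source: str) -> dict[str, tuple[bool, int]]:
    """Map each function name to (is_rule, number of Get calls inside it)."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        is_rule = any(_is_rule_decorator(d) for d in node.decorator_list)
        get_count = sum(
            1
            for child in ast.walk(node)
            if isinstance(child, ast.Call)
            and isinstance(child.func, ast.Name)
            and child.func.id == "Get"
        )
        result[node.name] = (is_rule, get_count)
    return result


source = '''
@rule
async def setup(request):
    return await _materialize(request.version)

async def _materialize(version):
    artifacts = await Get(Artifacts, ArtifactsRequest(version))
    return await Get(Snapshot, AddPrefix(artifacts.digest, "jvm"))
'''
summary = gets_by_function(source)
print(summary)
```

Functions that contain `Get` calls but are not rules themselves are the ones where the lookup has to go through whichever `@rule` awaits them.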
I ran a subset of the migration (39 files converted) - Found a circular import bug, so need to fix that before I can dig into errors. I ran into a lot of type and property errors, especially in the BSP/JDK area. In a few of the cases, I think the new syntax was underspecified, and I needed to use the explicit typing (hard to automate this, unless I have it run after each partial migration). Other errors, I don't even know where to start with - so I just reverted the code. It was like the new function just wasn't being called at all, which is strange. Anyways, I was more curious if updating those 39 files netted any performance gains, but not yet it seems. Modding some code and re-running led to a 28 second wait with the current codebase, and 27.5 seconds with the migrated syntax. Looking deeper into this, the current syntax I tested the migrations for is only a small fraction of what we need. |
… try to YOLO it on the repo
Another note to self. The ast.unparse function works differently for Python 3.12 and 3.11 🤦🏽 3.11 is "more correct" |
@stuhood @benjyw main...sureshjoshi:pants:20572-call-by-name-samples
I ran:
python3.11 call-by-name.py # from my other branch
pants --changed-since=HEAD fix fmt lint
By coincidence, we get examples of a few different Get cases, as well as some edge cases:
|
That looks great, and thanks for dealing with the rough edges! There is one simplification that it would be really good to implement. For the example:
resolved_dependencies = await resolve_parsed_dependencies(
    **implicitly(ResolvedParsedPythonDependenciesRequest(fs, parsed_dependencies, resolve))
)
... because the "implicit" argument already matches the type of one of the
resolved_dependencies = await resolve_parsed_dependencies(
    ResolvedParsedPythonDependenciesRequest(fs, parsed_dependencies, resolve),
    **implicitly(),
)
We'd like to encourage this for a few reasons, so it would be really awesome if you were able to do it as part of the rewrite:
|
Having said that, passing an argument as positional may not work in 100% of cases currently: for example if one of the other arguments to a So you might try doing it only for single argument |
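A minimal sketch of that simplification, restricted to the single-argument case suggested above: it moves a lone request object out of `implicitly(...)` into a positional argument and leaves dict forms alone. The `PositionalImplicitly` transformer and `simplify` helper are hypothetical, the pass does not check that the argument's type matches the called rule's parameter, and `ast.unparse` drops comments and original formatting, so a real rewrite would need more care.

```python
import ast


class PositionalImplicitly(ast.NodeTransformer):
    """Turn `f(**implicitly(X(...)))` into `f(X(...), **implicitly())`."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        # Only touch calls whose sole argument is a single `**...` keyword.
        if node.args or len(node.keywords) != 1:
            return node
        keyword = node.keywords[0]
        inner = keyword.value
        if (
            keyword.arg is None  # `**something`
            and isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Name)
            and inner.func.id == "implicitly"
            and len(inner.args) == 1
            and not inner.keywords
            and isinstance(inner.args[0], ast.Call)  # skip dict forms
        ):
            node.args = [inner.args[0]]
            inner.args = []
        return node


def simplify(source: str) -> str:
    tree = PositionalImplicitly().visit(ast.parse(source))
    return ast.unparse(tree)


source = '''
async def example():
    deps = await resolve(**implicitly(Request(fs, parsed)))
    other = await lookup(**implicitly({make_request(): RequestType}))
'''
simplified = simplify(source)
print(simplified)
```

Running an auto-formatter afterwards, as suggested earlier in the thread, would take care of re-wrapping the rewritten calls.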
@benjyw gave all of the native rules names: if they're not already exposed from the native engine, he might be able to help with exposing them. |
@stuhood Thanks for reviewing!
Gotcha, I didn't know this was a perf improvement, I just went with consistency of calling code assuming all else was equal. I'll take a look at carving out well-typed, single-parameter cases as positional arguments (within reason).
Ah interesting, I thought that this was something happening based on which backends were enabled ("so the rewritten file must actually have been loaded" from the tracking issue), and maybe I needed to enable something. I intentionally haven't dug too far into the Rust side of this, as I'm just trying to get the AST-rewriter handled first, before I dig into missing mappings. |
@stuhood Trying to write some integration tests to get the migrate goal up for a PR, but I'm running into trouble as the new backend I've generated isn't getting picked up for syntax migrations? Am I missing something here from the registration? Or is there a problem trying to register as an in-repo plugin? EDIT: Looks like putting in a contrived @goal_rule seems to kick this off? I'm not sure if that's a built-in rule specific requirement though? |
…main migration cases
BuiltinGoals are ... built in. AFAIK, there isn't a way to add them from a plugin/backend. But to be clear: the PR I already landed adds the goal, and is already integrated AFAICT. |
I think we're talking about different things. I'm writing tests for the migrate goal you added, with the syntax AST stuff I added - as a sanity check. I was trying to get the Specifically I added this for my new rules to get picked up: https://github.com/pantsbuild/pants/pull/20714/files#diff-e3b5d93ad8c721f83f88479f20ff9347b17932cf10659ddc278840791adbad8dR28-R33 |
This PR creates a native Python built-in goal which performs the call-by-name migration for passed in files/targets (covers items 1 and 4 in #20572). Note: This is not a completed feature, as several bugs have been split out into sub-tickets for investigation. The simplest use case is: `pants migrate-call-by-name --json ::`
I believe this ticket is complete and recently merged. The overall feature isn't complete until 2 outstanding tickets/investigations are completed, and we actually run it over the whole code-base - but I think 1 through 4 are done as See #20744 - Not entirely sure this is needed, haven't looked into it enough yet. |
🎊 |
Re MultiGet - I guess it should be called |
To complete #19730 by migrating Pants itself and any in-repo plugins to call-by-name syntax, we should create a built-in goal which would:
Load the rule graph
For each @rule in the module, look up its solution in the rule graph.
… Get in a @rule, with its corresponding call-by-name syntax (using whatever code-rewriting API is best maintained currently)
EDIT: #20574 implemented items 2 and 3.