Kick off Red-knot #10849

MichaReiser · 2024-04-09T16:27:18Z

The beginning of multifile analysis. We'll eventually merge this with ruff but are using a dedicated crate to flesh out the basic infrastructure first.

codspeed-hq · 2024-04-09T16:32:43Z

CodSpeed Performance Report

Merging #10849 will not alter performance

Comparing red-knot (dd4748b) with main (845ba7c)

Summary

✅ 30 untouched benchmarks

carljm · 2024-04-12T00:33:51Z

crates/red_knot/src/ast_ids.rs

+#[derive(Debug, Eq, PartialEq, Hash)]
+pub struct FileAstId<N: HasAstId> {
+    ast_id: AstId,
+    _marker: PhantomData<fn() -> N>,


Why do we need this marker?

Rust requires that every type parameter of a struct is used by at least one field, and AstId isn't generic over N.

We can work around this by using PhantomData. PhantomData has no runtime cost (it is a zero-sized type) and its only purpose is to capture the type N.

You can think of the pattern applied here as compile-time-only generics, similar to Java, where the generic arguments are erased at runtime but we want them at compile time to catch typing errors (e.g. we want to prevent you from using an AstId for an IfStmt to load a FunctionDef).

The thing I find a bit odd here is the fn() -> N; why is it not just PhantomData<N>? It seems like putting the typevar in return position like that might be intended to make TypedAstId contravariant in N rather than variant? But since Rust doesn't have struct subtyping, I'm not really sure what that would even mean; I thought Rust only really has variance for lifetimes.

The reason is (and I was surprised by it) that PhantomData<N> made the type non-copyable, non-Eq, etc.

I guess the alternative would have been to implement all these traits manually: https://users.rust-lang.org/t/how-to-copy-phantomdata-of-un-clone-able-types/82229

Oh weird! So basically it's just that Rust knows how to implement Eq for fn() -> N but not necessarily for N?
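A minimal sketch of the typed-id pattern discussed in this thread, for readers who have not seen it before. All names here (`TypedId`, the marker structs, `load_function`) are hypothetical stand-ins, not the crate's actual definitions. The sketch implements the traits by hand, the alternative mentioned above, because `#[derive]` places a bound such as `N: Clone` on every type parameter for each trait it derives.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for the crate's untyped id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct AstId(pub u32);

// Marker types that deliberately implement none of Clone, Eq, or Hash.
pub struct FunctionDef;
pub struct IfStmt;

/// An `AstId` tagged at compile time with the node type `N` it refers to.
pub struct TypedId<N> {
    ast_id: AstId,
    // Zero-sized field; `fn() -> N` mentions `N` without storing an `N`.
    _marker: PhantomData<fn() -> N>,
}

impl<N> TypedId<N> {
    pub fn new(ast_id: AstId) -> Self {
        Self { ast_id, _marker: PhantomData }
    }

    pub fn raw(self) -> AstId {
        self.ast_id
    }
}

// Manual impls carry no bound on `N`, so the id is copyable and comparable
// even though the marker types are neither.
impl<N> Clone for TypedId<N> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<N> Copy for TypedId<N> {}
impl<N> PartialEq for TypedId<N> {
    fn eq(&self, other: &Self) -> bool {
        self.ast_id == other.ast_id
    }
}
impl<N> Eq for TypedId<N> {}

/// Only accepts ids tagged as function definitions.
pub fn load_function(id: TypedId<FunctionDef>) -> AstId {
    id.raw()
}
```

Passing a `TypedId<IfStmt>` to `load_function` is rejected by the compiler, while the struct stays the same size as a bare `AstId`.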

carljm

The module stuff looks great! Sorry for leaving so many comments; many of them can probably just turn into TODOs for now.

carljm · 2024-04-18T21:23:11Z

crates/red_knot/src/module.rs

+        Self(smol_str::SmolStr::new(name))
+    }
+
+    pub fn from_relative_path(path: &Path) -> Option<Self> {


Perhaps somewhere in here we should validate that the extension is .py or .pyi?

This actually makes me think that perhaps from_relative_path should not exist at the ModuleName layer but at a layer where it returns several pieces of information: a ModuleName, a ModuleKind (e.g. ::Python or ::Stub), and maybe also an is_package boolean (e.g. true for foo/__init__.py, false for foo.py). We will need all of those at some point; as implemented currently, from_relative_path is a lossy operation, since it returns only one of those.

Perhaps those other two fields, and this method, belong on Module/ModuleData?

I think ModuleKind and whether it is a package belong on ModuleData. ModuleName is just the fully qualified name of a module. The only reason I see for adding them to ModuleName is if we want to support explicitly querying modules by their kind, but I don't think this is something we want. That's why I think from_relative_path is fine. All it should do is create the fully qualified name from a relative path (it shouldn't perform any IO).

Perhaps somewhere in here we should validate that the extension is .py or .pyi?

Yeah, early returning when it is not a py or pyi file makes sense.

Oh, I agree this data belongs on ModuleData, not on ModuleName, I wasn't suggesting to put it on ModuleName.

What I was suggesting was that from_relative_path should not be implemented like this as a ModuleName constructor, because then we throw away information from the path that we will need. I think from_relative_path should instead be implemented at a higher level where it returns all of the data that we can glean from the path, including the ModuleName.

I don't see us throwing away any information, considering that this method doesn't do any IO. The idea here is that we get a fully qualified name that we can then pass to the resolve function (which retrieves all the information).

Ah, yeah, I was looking at this method in isolation; I see the only place it's actually used is in resolve_path, where the very next step is to resolve that module name back to a path (and make sure it actually resolves to the right path). So I agree, we will get all that information from resolving, and that's where we should get it from.

I would still prefer for this to be a private method of ModuleResolver rather than a public constructor of ModuleName, because I think it's kind of important to ensure that it only be used in the context of resolve_path, as just described. We don't want some code in the future constructing ModuleName using ModuleName::from_relative_path and assuming that means it has a correct module name to path correspondence. But this is more a future thing for robustness, not an important prototype consideration.

Yeah, I don't mind making it private, and I agree: the only two methods that need to be public are relative and new (or absolute).
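As a rough illustration of the behavior the thread converges on (no IO, early return for anything that is not a .py or .pyi file, `__init__` naming the package itself), here is a sketch. The function name and the `Option<String>` return type are assumptions made for the example; the PR's actual method returns a `ModuleName`.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: turns a path relative to a search root into a dotted
/// module name. Returns `None` for files that are not `.py` or `.pyi` and for
/// paths that are not plain relative paths. Performs no IO.
pub fn module_name_from_relative_path(path: &Path) -> Option<String> {
    let extension = path.extension()?.to_str()?;
    if extension != "py" && extension != "pyi" {
        return None;
    }

    let stem = path.file_stem()?.to_str()?;
    let mut parts: Vec<&str> = Vec::new();

    if let Some(parent) = path.parent() {
        for component in parent.components() {
            match component {
                Component::Normal(part) => parts.push(part.to_str()?),
                // Reject `..`, a leading `.`, roots, and drive prefixes.
                _ => return None,
            }
        }
    }

    // `foo/__init__.py` names the package `foo`, not `foo.__init__`.
    if stem != "__init__" {
        parts.push(stem);
    }

    if parts.is_empty() {
        None
    } else {
        Some(parts.join("."))
    }
}
```

Because the result is only a candidate name, it still has to be resolved back to a path before anything relies on the name-to-path correspondence, which is the concern raised about keeping the constructor private.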

carljm · 2024-04-18T21:28:58Z

crates/red_knot/src/module.rs

+        Self(smol_str::SmolStr::new(name))
+    }
+
+    pub fn relative(_dots: u32, name: &str, _to: &Path) -> Self {


The algorithm here will be simpler if, instead of taking a full Path for the _to parameter, we just take a ModuleName and an is_package boolean (or some structure that encompasses both, which might just be Module).

We need is_package because in foo/bar/__init__.py, from . import baz means foo.bar.baz, but in foo/bar.py, from . import baz means foo.baz.

If we take a full Path here, then we effectively have to re-implement (or call) from_relative_path again as part of this method, but in the places we will likely call this from (resolving a relative import we find in the AST) we will already have all the data of the current module handy, so it will be wasteful to recalculate that from Path.
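The rule described in this comment can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the function name, the use of plain strings instead of `ModuleName`, and the decision to return `None` when the dots climb past the top-level package are all choices made for the example.

```rust
/// Hypothetical sketch: resolves the target of a relative import.
///
/// `dots` is the number of leading dots, `name` the dotted name after them,
/// `current` the dotted name of the importing module, and `is_package` is
/// true when the importing file is an `__init__.py`.
pub fn resolve_relative(dots: u32, name: &str, current: &str, is_package: bool) -> Option<String> {
    if dots == 0 {
        // Not a relative import; the name is already absolute.
        return if name.is_empty() { None } else { Some(name.to_string()) };
    }
    if current.is_empty() {
        return None;
    }

    let mut parts: Vec<&str> = current.split('.').collect();

    // One dot means "the containing package". A package is its own containing
    // package, while a plain module has to drop its last component first.
    let to_drop = if is_package { dots - 1 } else { dots };
    let to_drop = to_drop as usize;

    // Climbing to or past the root leaves no package to be relative to.
    if to_drop >= parts.len() {
        return None;
    }
    parts.truncate(parts.len() - to_drop);

    if !name.is_empty() {
        parts.extend(name.split('.'));
    }
    Some(parts.join("."))
}
```

With the module name and `is_package` already in hand, no path needs to be re-parsed, which is the saving the comment points out.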

That makes sense, it also avoids the need to check if the file has a .py extension haha

The only challenge with this is that we need to analyze files that aren't modules (and may not have a module name):

- Jupyter notebooks
- Files that don't have a py or pyi extension (Ruff supports configuring additional extensions that should be handled as Python files)

I don't know much about Jupyter notebooks, but I would assume they just can't have relative imports?

For files that don't have a py or pyi extension, I would assume we still treat them as modules as if they did? I'm not sure what the use case for this is.

carljm · 2024-04-19T18:57:41Z

crates/red_knot/src/module.rs

+        // src
+        //   parent
+        //     child
+        //       one.py


nit

Suggested change

-        //       one.py
+        //       __init__.py
+        //       one.py

JonathanPlasse · 2024-04-22T06:25:17Z

https://excalidraw.com/#json=-Thvh6hnezji3DT3SfFYs,Hjt_fOpRTgpgNKy9Hfb9-Q

The link does not seem to be public.

MichaReiser · 2024-04-22T06:34:39Z

@JonathanPlasse I just opened it in a private session without any problems. Do you get an error that the link is invalid? Also, the link is a bit outdated.

trag1c · 2024-04-22T06:47:05Z

I can open it just fine 👍

JonathanPlasse · 2024-04-22T13:35:50Z

It works for me too in private session. Sorry for the noise.

MichaReiser · 2024-04-23T09:53:42Z

crates/red_knot/src/types.rs

+/// Arena holding all known types
+#[derive(Default)]
+pub(crate) struct TypeEnvironment {
+    types_by_id: IndexVec<TypeId, Type>,


I think it will be interesting if we can come up with a more stable id for types than relying on inference order.

Yeah, it's a good point, I thought about this. I don't think TypeId (as in, the index into this IndexVec) can be stable; that's not really compatible with lazy type evaluation. But we may want to have another, more stable identifier (based on fully qualified name?) that we use most places instead of TypeId.

I could also abandon using the IndexVec arena in this case? But we will have a lot of types...
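To make the stability concern concrete, here is a small sketch of an arena whose ids are plain indices. It uses a `Vec` in place of the `IndexVec` from the diff, and the `Type` enum and method names are hypothetical simplifications.

```rust
use std::convert::TryFrom;

/// Hypothetical, heavily simplified type representation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Type {
    Unknown,
    Class(String),
}

/// An id that is nothing more than an index into the arena.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TypeId(pub u32);

/// Arena holding all known types, in the order they were added.
#[derive(Default)]
pub struct TypeArena {
    types_by_id: Vec<Type>,
}

impl TypeArena {
    pub fn add(&mut self, ty: Type) -> TypeId {
        let index = u32::try_from(self.types_by_id.len()).expect("too many types");
        self.types_by_id.push(ty);
        TypeId(index)
    }

    pub fn get(&self, id: TypeId) -> &Type {
        &self.types_by_id[id.0 as usize]
    }
}
```

The id of a given type depends on how many types were added before it, so two runs that evaluate types lazily in a different order assign different ids to the same class. A key derived from something like a fully qualified name would not have that property.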

github-actions · 2024-04-23T19:24:59Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

This is needed because of the AST changes from `OnceCell` to `OnceLock`. If we decide we don't need the AST to be Sync/Send, we can revert this along with those changes; I just want green CI for now. Also adjust a URL in doc comments so `cargo doc` doesn't complain about it.

Review comments from 762fa0b

Just a reorganization per feedback from Micha that I agreed with. Puts all `impl` right after the corresponding `struct`, and moves iterators further down in the file, since they are implementation detail.

## Summary

Indexing definitions of symbols is necessary for fast lazy type evaluation, so that when we see a reference to a name we can figure out the type of that symbol from its definition(s).

We only want to do this indexing of definitions once per module, and then it needs to remain available in the cache thereafter for use by lazy type evaluation.

Rust lifetimes won't let us use direct references to AST nodes owned by the cache. We could use unsafe code to strip the lifetimes from these references, with safety ensured by our cache invalidation: if we evict the AST from the cache, we must evict the symbol definitions also. But in order to serialize such a cache to disk, we would need (at least) an AST numbering scheme. This may still be something to look into in the future, for improved performance.

For now, use `NodeKey`: indirect references to an AST node consisting of a `NodeKind` and a `TextRange`, which we can find again reasonably quickly in the AST. These are easy to serialize, have no lifetime problems, and don't require unsafe code.

## Test Plan

Updated tests.
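A sketch of the idea behind such a key, using toy types. The `Node` tree, the two-variant `NodeKind`, and the `find` function are assumptions made for illustration; only the shape of the key (a kind plus a text range) comes from the summary above.

```rust
/// Hypothetical stand-ins for the crate's `NodeKind` and `TextRange`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum NodeKind {
    FunctionDef,
    ClassDef,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TextRange {
    pub start: u32,
    pub end: u32,
}

/// An indirect reference to a node: plain data, no lifetime, easy to serialize.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeKey {
    pub kind: NodeKind,
    pub range: TextRange,
}

/// A toy AST node that owns its children.
pub struct Node {
    pub kind: NodeKind,
    pub range: TextRange,
    pub name: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(kind: NodeKind, start: u32, end: u32, name: &str, children: Vec<Node>) -> Self {
        Self { kind, range: TextRange { start, end }, name: name.to_string(), children }
    }

    pub fn key(&self) -> NodeKey {
        NodeKey { kind: self.kind, range: self.range }
    }
}

fn contains(outer: TextRange, inner: TextRange) -> bool {
    outer.start <= inner.start && inner.end <= outer.end
}

/// Finds the node for `key`, descending only into children whose range
/// encloses the key's range.
pub fn find<'a>(root: &'a Node, key: NodeKey) -> Option<&'a Node> {
    if root.key() == key {
        return Some(root);
    }
    root.children
        .iter()
        .filter(|child| contains(child.range, key.range))
        .find_map(|child| find(child, key))
}
```

The lookup costs a walk down one branch of the tree instead of a pointer dereference, which is the trade-off the summary accepts in exchange for avoiding lifetimes and unsafe code.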

## Summary

Now that symbol definitions don't hold direct references to AST nodes, and thus don't have a lifetime bound, there's no longer any reason to separate the "core SymbolTable" from the definitions. All of the symbol table, including definitions, can be cached together, and it all needs to be invalidated together if the module AST changes. So let's simplify this and have fewer structs.

## Test Plan

Existing tests.

Fill out the representation of types to a few more cases (especially unions and intersections) before going too far with type evaluation.

I got this all working and solved the API lifetime issues without Arc, by means of a new set of `TypeRef` structs.

The remaining potential performance issue is that any time you hold on to any of the new `TypeRef` structs, you lock a shard of the `TypeStore::modules` dashmap to writes (because you are holding a reference into it). So it will be important to minimize the use and scope of these type-refs. I think we can do this to some degree by caching type judgments using just type IDs. I also think for CLI use, when we want to be highly parallel, we can be smart about ordering (check all module bodies first, then check function bodies when module-level types are all populated) to minimize write contention. Also, if needed, we can break up `ModuleTypeStore`, or use interior mutability and internal locking to have finer-grained locking within it.

I went with this version instead of rewriting to have the type arenas hold Arc to the types, because I am not totally convinced the Arc version will be better. With Arc, every "read" turns into a write to the atomic reference count, which introduces overhead (which is really useless overhead for us, since ultimately we rely on the arenas for garbage collection). And so we will introduce contention on the atomic reference count even for reads of highly-used types. So for both versions we will have to be careful with our use of references. I think the Arc-free version is lower overhead and sets us up better for future optimization of the locking strategy, once we have more working code to optimize against.

Even if I turn out to be wrong about the above and eventually we decide to use Arc, I'd rather go with this for now and move on to type evaluation, and make the Arc change later when we can evaluate the effects better.
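The point about Arc can be observed with the standard library alone. The helper below is a hypothetical illustration, not code from the PR: it assumes each reader takes its own handle via `Arc::clone`, which is the situation in which a read implies an update of the shared counter.

```rust
use std::sync::Arc;

/// Returns the strong count observed while `readers` extra handles to
/// `value` are alive. Every `Arc::clone` increments the shared atomic
/// counter, and every drop decrements it again.
pub fn strong_count_with_readers(value: &Arc<String>, readers: usize) -> usize {
    let handles: Vec<Arc<String>> = (0..readers).map(|_| Arc::clone(value)).collect();
    let count = Arc::strong_count(value);
    drop(handles);
    count
}
```

Dereferencing an existing `Arc` does not touch the counter; only cloning and dropping handles do. For a heavily shared value, those counter updates all hit the same memory location, which is the contention the comment is wary of.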

This PR demonstrates resolving an import from one module to a class type from another module!

carljm reviewed Apr 10, 2024

crates/red_knot/src/check.rs

crates/red_knot/src/check.rs

MichaReiser force-pushed the red-knot branch 2 times, most recently from 49dbe6b to 34fa6fd Compare April 10, 2024 14:01

carljm reviewed Apr 10, 2024

crates/red_knot/src/files.rs

carljm reviewed Apr 10, 2024

crates/red_knot/src/files.rs

MichaReiser commented Apr 11, 2024

crates/red_knot/src/ast_ids.rs

carljm reviewed Apr 12, 2024

crates/red_knot/src/ast_ids.rs

MichaReiser force-pushed the red-knot branch from 9f44914 to 6a41893 Compare April 17, 2024 16:49

MichaReiser commented Apr 18, 2024

crates/red_knot/src/symbols.rs

crates/red_knot/src/symbols.rs

MichaReiser force-pushed the red-knot branch from 7c6b06f to bf4bd83 Compare April 18, 2024 08:52

carljm force-pushed the red-knot branch from bc210ee to 67bd533 Compare April 18, 2024 18:32

carljm reviewed Apr 19, 2024

MichaReiser changed the title ~~Exploration of a salsa like compilation model (does not compile)~~ Red Knot Apr 19, 2024

MichaReiser commented Apr 19, 2024

crates/red_knot/src/module.rs

MichaReiser force-pushed the red-knot branch from 0236411 to d834f8a Compare April 19, 2024 09:49

MichaReiser commented Apr 19, 2024

crates/red_knot/src/symbols.rs

carljm reviewed Apr 19, 2024

carljm force-pushed the red-knot branch from fe7213e to 654adf8 Compare April 19, 2024 22:54

MichaReiser commented Apr 23, 2024

crates/red_knot/src/types.rs

MichaReiser commented Apr 23, 2024

crates/red_knot/src/types.rs

MichaReiser commented Apr 23, 2024

carljm force-pushed the red-knot branch from f685605 to a937b63 Compare April 24, 2024 02:01

MichaReiser force-pushed the red-knot branch from 608a182 to c20f911 Compare April 25, 2024 07:58

MichaReiser and others added 22 commits April 27, 2024 10:11

Salsa like Database

7b411a5

clean up some clippy in symbols.rs

acdec51

Fix more clippy lints, allow pedantic

d190c6f

a bit more on analyze_imports

2d0f81f

clean up symbol table API a bit

d3ccb4d

remove analyze_imports

e5e0832

initial type environment

c9f3929

Caching cleanups

ebcd5df

Remove unused dependencies

b19cb89

review comments on initial type env (#11113)

274519f

[red-knot] re-organize code in symbols.rs (#11116)

6240613

red-knot: Add tracing (#11130)

163b261

red-knot: Cache symbol tables (#11106)

bedee31

Red knot: Add file watching and cancellation (#11127)

14b089f

[red-knot] add more types (#11135)

9beb681

[red-knot] implement eval_symbol for from-import and class-def (#11157)

8554cf3

[red-knot] Remove support for resolving modules by package directory path (#11162)

4d7d3b2

Change ImportDefinition level to u32

d3620e4

MichaReiser force-pushed the red-knot branch from 381a3bc to d3620e4 Compare April 27, 2024 08:14

Enable pedantic lints and fix violations

dd4748b

MichaReiser added the internal An internal refactor or improvement label Apr 27, 2024

MichaReiser marked this pull request as ready for review April 27, 2024 08:27

MichaReiser changed the title ~~Red Knot~~ Kick off Red-knot Apr 27, 2024

MichaReiser enabled auto-merge (squash) April 27, 2024 08:29

MichaReiser merged commit 7cd065e into main Apr 27, 2024
18 checks passed

MichaReiser deleted the red-knot branch April 27, 2024 08:34
