[Python] Typeinfo object identity equality #2735

Merged

thautwarm
Contributor

This addresses fable-compiler/Fable.Python#42 using a different approach. It does not require any change to the compiler code.

This approach relies on an assumption (confirmation needed) that a .NET type's identity is determined by its full name and its generic type arguments. The only exception is anonymous records, which use unordered field info for equality.

This PR implements type comparisons that consume constant space and constant time.

@dbrattli
Collaborator

dbrattli commented Jan 15, 2022

Looks good to me! What do you think @alfonsogarciacaro? Could you have a look at this? Do we btw have this problem in JavaScript?

@dbrattli dbrattli changed the title from "Typeinfo object identity equality" to "[Python] Typeinfo object identity equality" on Jan 15, 2022
Collaborator

@dbrattli dbrattli left a comment

Looks good, thanks for fixing!

@dbrattli dbrattli merged commit bc8d787 into fable-compiler:beyond on Jan 18, 2022

@alfonsogarciacaro
Member

@dbrattli Sorry, I didn't realize you mentioned me in the thread 😅 I wish GH told you when a notification is because of a direct mention (maybe it does and I'm missing something?).

obj.ReferenceEquals is compiled to JS strict equality === and I use it sometimes for cheap equality. It doesn't always give you the same results as in .NET, but I think it's the best we can get. According to the documentation, ReferenceEquals is like Equals but cannot be overridden. .NET also gives you counter-intuitive results sometimes (see the remarks in the documentation).

In the case of types, it does seem that .NET actually produces the same instance, even when using MakeGenericType, which I found quite surprising. But it seems to use a special mechanism that's not a cache.

let t1 = typeof<int list>
let t2 = typedefof<obj list>.MakeGenericType([|typeof<int>|])
obj.ReferenceEquals(t1, t2) // true

I guess a cache as in this PR would have the same effect, although I'm always reluctant to include caches in library code unless it's explicit because they may grow out of control. Another solution could be to have a special implementation of ReferenceEquals for built-in types (in the case of Type it could just defer to Equals which should have the same effect).
In that particular example it evaluates to false both times, and with strings and value types, obj.ReferenceEquals can also be counter-intuitive.
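
The cache-free alternative mentioned above (a ReferenceEquals that defers to Equals for types) might look roughly like this on the Python side. It is only a sketch: `TypeInfo` and `reference_equals` are hypothetical names and are not taken from Fable's library.

```python
from typing import Tuple


class TypeInfo:
    """Hypothetical type descriptor with structural equality and no cache."""

    def __init__(self, fullname: str, generics: Tuple["TypeInfo", ...] = ()):
        self.fullname = fullname
        self.generics = generics

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, TypeInfo):
            return NotImplemented
        return self.fullname == other.fullname and self.generics == other.generics

    def __hash__(self) -> int:
        # Must agree with __eq__: equal descriptors produce equal hashes.
        return hash((self.fullname, self.generics))


def reference_equals(a: object, b: object) -> bool:
    """Hypothetical ReferenceEquals: identity, except that types use equality."""
    if isinstance(a, TypeInfo) and isinstance(b, TypeInfo):
        return a == b
    return a is b


# Two separately constructed descriptors for the same generic type.
t1 = TypeInfo("FSharpList`1", (TypeInfo("System.Int32"),))
t2 = TypeInfo("FSharpList`1", (TypeInfo("System.Int32"),))
```

With this variant `t1 is t2` is false but `reference_equals(t1, t2)` is true, at the cost of a structural comparison whose work grows with the nesting depth of the type.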

@alfonsogarciacaro
Member

@dbrattli Sorry, I've been thinking about this, and I'm still unsure it's a good idea to include a cache in the library, particularly if it's only to make ReferenceEquals consistent with .NET. As seen above, ReferenceEquals can also be counter-intuitive in .NET. In Fable JS we've tried to make equality consistent with F# in .NET, but ReferenceEquals falls in the category of "Behavior depends on the platform". Among other reasons, this is because we cannot distinguish reference vs value types in JS as in .NET, e.g. we cannot pass a value by ref (I think it's the same in Python).

@thautwarm Is there a reason you cannot use Equals instead of ReferenceEquals for type comparison?

@thautwarm
Contributor Author

I think your idea of avoiding caches makes sense. However, a function that creates typeinfo should be pure, so I think it is okay to cache such a function.

@thautwarm Is there a reason you cannot use Equals instead of ReferenceEquals for type comparison?

I can always use Equals instead of ReferenceEquals; the only concern here is performance. I use reflection for data parsing/validation, like what pydantic does. Pydantic has unsolved corner cases because Python is not really statically typed (pydantic/pydantic#2678). F# via Fable can provide a novel way (for Python users) to achieve safer and more correct type-driven data parsing/validation: we can define types and do serialization/deserialization as conveniently as with FSharp.Json. Such a capability is a selling point of Fable for me, and one day we might attract Python users with it.

Concretely, type-driven data parsing/validation via F# reflection needs to map type objects (statically resolved in F#, hence more reliable) to handler functions, but map lookup using structural hashing and structural equality is expensive. As can be seen in the following example code, serializing/deserializing nested data structures compares types frequently, while using object identity for hashing and equality makes such recursion really cheap.

let rec deserialize (registeredHandlers: Map<System.Type, Handler>) (t: System.Type) (stream): obj =
    if t is record/union then // composite types
        // also look up the map and call 'deserialize' recursively
    else
        match Map.tryFind t registeredHandlers with
        | None -> ..
        | Some handler -> handler.deserialize stream
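
On the Python side, the same dispatch pattern could look like the sketch below. Everything in it is hypothetical and simplified for illustration: `TypeInfo`, the `INT`/`STRING`/`POINT` descriptors and the `handlers` table are invented names, composite types are reduced to record-like field lists, and the input is an already parsed dictionary rather than a stream.

```python
from typing import Any, Callable, Dict, Tuple


class TypeInfo:
    """Hypothetical type descriptor; keeps the default identity-based hash."""

    def __init__(self, fullname: str,
                 fields: Tuple[Tuple[str, "TypeInfo"], ...] = ()):
        self.fullname = fullname
        self.fields = fields  # non-empty only for record-like composite types


Handler = Callable[[Any], Any]

INT = TypeInfo("System.Int32")
STRING = TypeInfo("System.String")
POINT = TypeInfo("Point", (("x", INT), ("label", STRING)))

# Keys are hashed and compared by object identity, so the cost of a lookup
# does not depend on how deeply nested the key type is.
handlers: Dict[TypeInfo, Handler] = {INT: int, STRING: str}


def deserialize(handlers: Dict[TypeInfo, Handler], t: TypeInfo, data: Any) -> Any:
    if t.fields:  # composite type: recurse into each field
        return {name: deserialize(handlers, field_type, data[name])
                for name, field_type in t.fields}
    handler = handlers.get(t)
    if handler is None:
        raise KeyError(f"no handler registered for {t.fullname}")
    return handler(data)


result = deserialize(handlers, POINT, {"x": "3", "label": 7})
```

Identity-keyed lookup like this only works if every use of a given type resolves to the same descriptor object, which is what the interning in this PR is meant to guarantee.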

@thautwarm
Contributor Author

I did a rudimentary benchmark on JSON parsing, which shows that my type-driven data parsing is slow due to F# reflection. I haven't yet run tests against this PR to show how much the performance improves, as it's difficult to use Fable.Cli locally...

@alfonsogarciacaro
Member

Thanks for your reply @thautwarm! Some notes:

  • About using caches in Fable library: the problem is that it's difficult to know when to free the cache, and until we do, we are retaining memory that users expect to be freed. Suppose we end up adding attribute info to the types: attributes can get arbitrarily large.

  • About equality in F#/Fable: we can say F# is opinionated about equality by adding structural equality to its native types (unions/records) and collections. Fable has implemented this structural equality to keep the F# semantics, but I'm unsure about trying to match the .NET behavior with ReferenceEquals because its semantics are less clear. I was surprised to learn that ReferenceEquals always returns true for the same type; I would always have used Equals to compare types. In maps and dictionaries, performance comes not from ReferenceEquals but from GetHashCode, which allows non-matching elements to be discarded quickly.

  • Thoth.Json also generates coders using reflection. I think it's quite fast (we haven't really done benchmarks, but nobody has complained it's too slow). It has a non-cached version and an explicitly cached one. The coders are stored in a map keyed by the type's full name (there's an optimization in Fable to compile typeof<'T>.FullName as a string).

  • Finally, to quickly test a local version of Fable :)

git clone https://github.com/fable-compiler/Fable
cd Fable
git checkout beyond # or the branch you're working on
dotnet fsi build.fsx library-py
dotnet fsi build.fsx quicktest-py

Then you can edit src/quicktest-py/quicktest.fsx and the quicktest.py file next to it will be updated (note the generated F# will be enclosed in the #fsharp delimiters so you can keep Python code outside and it won't be overwritten).

@dbrattli
Collaborator

Thanks for the good explanation @alfonsogarciacaro. I agree that caches (especially unbounded ones) are scary, particularly if users do not know about them. So we need to figure out a better way to solve this or revert.
