Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add intern! macro which can be used to amortize the cost of creating Python objects by storing them inside a GILOnceCell. #2269

Merged
merged 6 commits into from Apr 4, 2022

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Apr 3, 2022

Basically turns the proposed solution of #2266 into a discoverable/reusable macro.

I think this should be mentioned in the guide to actually be discoverable, but I am not sure where.

Closes #2223

@mejrs
Copy link
Member

mejrs commented Apr 3, 2022

Perhaps we should consider #2223

@adamreichold
Copy link
Member Author

Perhaps we should consider #2223 instead

After reading that issue, I am not what the proposed action would be. Isn't the _PyUnicode_FromId considered internal to CPython? Would we re-implement it and if so, wouldn't that essentially look like this macro?

Additionally, I do like that GILOnceCell can handle arbitrary Python objects and that the code to access them is transparent to the Rust compiler which should result in better machine code quality compared to calling into the FFI boundary.

src/once_cell.rs Outdated Show resolved Hide resolved
@mejrs
Copy link
Member

mejrs commented Apr 3, 2022

After reading that issue, I am not what the proposed action would be.

I worded that poorly. I meant more "this PR and that issue are closely related and should be considered together". I would much prefer this macro over messing with internal cpython api.

@alex what are your thoughts on this?

@mejrs
Copy link
Member

mejrs commented Apr 3, 2022

What do you think about adding a more generic version of that intern! for arbitrary PyObject instances

I'm concerned with how this interacts with generics:

pub fn bar<T: pyo3::ToPyObject>(py: Python<'_>, thing: T) -> &PyAny {
    intern!(py, &thing)
}

fn main() {
    Python::with_gil(|py| {
        let a = bar(py, "foo");
        let b = bar(py, 42);
        assert!(a.is(b));
    });
}

I think one way to fix this is to not allow the macro to capture variables...let me experiment with that.

@adamreichold
Copy link
Member Author

I think one way to fix this is to not allow the macro to capture variables...let me experiment with that.

You mean something like forcing the initializer to be evaluated in const context?

@alex
Copy link
Contributor

alex commented Apr 3, 2022

A few thoughts:

  • This looks much better than messing with the internals, the way I did (which was only ever a PoC for measurement, not a real suggestion)
  • Do you have a benchmark for how this performs?
  • How does this intersect with multiple interpreters?
  • What do you think about narrowing the API to only handle strings, to limit complexity? If it's strings only than the generics issue seems fine :-)

@adamreichold
Copy link
Member Author

Do you have a benchmark for how this performs?

Amended the commit with a simple benchmark and temporarily included your pyid! macro. It seems that the lazy initialization performed by GILOnceCell does have a cost, but it is still approximately factor of four compared to the direct method:

getattr_direct          time:   [79.420 ns 79.501 ns 79.591 ns]                           
getattr_intern          time:   [19.319 ns 19.339 ns 19.360 ns]                            
getattr_pyid            time:   [17.514 ns 17.529 ns 17.546 ns]                          

@alex
Copy link
Contributor

alex commented Apr 3, 2022

Nice. While I'm sad to give up 2ms this is a HUGE improvement over the status quo :-)

@birkenfeld
Copy link
Member

birkenfeld commented Apr 3, 2022

And has the advantage that it won't be removed soon like PyIdentifier.

@adamreichold
Copy link
Member Author

adamreichold commented Apr 3, 2022

What do you think about narrowing the API to only handle strings, to limit complexity? If it's strings only than the generics issue seems fine :-)

I was thinking about this and started to write a Rust-side re-implementation of what CPython 3.8 does for _PyUnicode_FromId and ended up with something that was still slower. I then re-read the CPython source and noticed that they intern the Python string backing the identifier, so I grabbed PyString::intern from #2268 and it turned out that for this particular benchmark (querying sys.version) this was indeed making the difference:

getattr_direct          time:   [143.34 ns 143.47 ns 143.60 ns]                           
getattr_intern          time:   [30.772 ns 30.796 ns 30.817 ns]                            
getattr_pyid            time:   [32.677 ns 32.701 ns 32.721 ns]

(The absolute numbers are different because I remembered to disable CPU frequency boosting.)

So I think we should - if only for this reason - expose a macro limited to creating Python strings for identifiers and make sure that we intern those strings.

(The bespoke re-implementation of _Py_Identifier in Rust did not improve things further compared to the now pushed version which still uses a macro around the existing GILOnceCell.)

(Due to using PyString::intern, this now includes the commits from #2269.)

…Python objects by storing them inside a GILOnceCell.
@davidhewitt
Copy link
Member

This is very cool! The only concern I have, is that I think for PEP 489 compatibility (reloadable modules / sub-interpreter compatibility) we need to be moving away from global statics to module state. I haven't thought about it super hard though; so I'm not sure if this is ok as-is or whether there's an easy tweak to intern as part of module state?

@mejrs
Copy link
Member

mejrs commented Apr 4, 2022

This is very cool! The only concern I have, is that I think for PEP 489 compatibility (reloadable modules / sub-interpreter compatibility) we need to be moving away from global statics to module state. I haven't thought about it super hard though; so I'm not sure if this is ok as-is or whether there's an easy tweak to intern as part of module state?

From the pep:

No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object’s storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.

It's a bit weasily but it sounds like putting immutable builtin types like integers and strings in statics is OK?

Also, does this mean the GilOncecell example is a bad idea?

@birkenfeld
Copy link
Member

It's a bit weasily but it sounds like putting immutable builtin types like integers and strings in statics is OK?

I think so too. Another good reason to restrict this to strings.

Also, does this mean the GilOncecell example is a bad idea?

Yes, we probably should offer a convenient way to attach such "static" globals to modules.

CHANGELOG.md Outdated Show resolved Hide resolved
@adamreichold
Copy link
Member Author

It's a bit weasily but it sounds like putting immutable builtin types like integers and strings in statics is OK?

I think so too. Another good reason to restrict this to strings.

I am not sure this applies to interned strings as well as those implicitly reference the interning table that as probably at least per-interpreter?

Also, does this mean the GilOncecell example is a bad idea?

Yes, we probably should offer a convenient way to attach such "static" globals to modules.

I suspect that the result might end up looking like the standard library's std::thread::LocalKey, i.e. the static is only used to index into a per-module data structure (like the module dictionary).

How does this intersect with multiple interpreters?

so I'm not sure if this is ok as-is or whether there's an easy tweak to intern as part of module state?

I admittedly do not yet know how a PyO3 that is properly compatible with both of these things will look like, but I suspect that GILOnceCell will not be able not keep its current form. So my pragmatic answer is that this PR could stay as it is because while it not does improve things w.r.t. global state, it also does not really make it worse and will need the same changes like every other user of GILOnceCell to accommodate multiple interpreters/reloadable modules.

@davidhewitt
Copy link
Member

I admittedly do not yet know how a PyO3 that is properly compatible with both of these things will look like

Agreed, once we've figured this out there will likely be a bunch of APIs which would need deprecation anyway.

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :) Just one little thing...

src/once_cell.rs Show resolved Hide resolved
@adamreichold
Copy link
Member Author

adamreichold commented Apr 4, 2022

Sorry for the late change, but while trying this out, I noticed that implicitly using std::convert::Into would break the hygiene tests and also that using $text: literal instead of $text: expr breaks using e.g. stringify!(foobar) for the text.

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to expr is a bit footgunny, I think.

src/once_cell.rs Show resolved Hide resolved
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This should be ready to merge now? 👍

@davidhewitt
Copy link
Member

Think so 👍

@alex
Copy link
Contributor

alex commented Apr 4, 2022 via email

…ing multiple times them with different text values yielding inconsistent results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[performance] Equivalent for CPython's _Py_Identifier
5 participants