Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe wrappers for PEP 587 initialization #1474

Closed
davidhewitt opened this issue Mar 6, 2021 · 6 comments
Closed

Safe wrappers for PEP 587 initialization #1474

davidhewitt opened this issue Mar 6, 2021 · 6 comments

Comments

@davidhewitt
Copy link
Member

This is a follow-on from the discussion in indygreg/PyOxidizer#324.

cc @indygreg @wkschwartz

The issue mentioned there is that the ffi bindings in initconfig.rs for PyPreConfig and PyConfig are cumbersome to initialize safely, so it would be helpful for us to provide safe wrappers for PyPreConfig_InitPythonConfig etc.

Quoting indygreg/PyOxidizer#324 (comment), a possible design is (credit @wkschwartz)

use std::mem::MaybeUninit;  
impl PyPreConfig {
  /// Create a new instance initialized with the [Python configuration].
  ///
  /// [Python configuration]: https://docs.python.org/3/c-api/init_config.html#init-python-config
  fn python() -> Self {
    let mut ppc = MaybeUninit::uninit();
    unsafe {
      PyPreConfig_InitPythonConfig(ppc.as_mut_ptr());
      ppc.assume_init()
    }
  }

  /// Create a new instance initialized with the [isolated configuration].
  ///
  /// [isolated configuration]: https://docs.python.org/3/c-api/init_config.html#isolated-configuration
  fn isolated() -> Self {
    let mut ppc = MaybeUninit::uninit();
    unsafe {
      PyPreConfig_InitIsolatedConfig(ppc.as_mut_ptr());
      ppc.assume_init()
    }
  }
}

I have a few notes on this design before we commit to this:

  1. After a user creates one of these FFI as per structs, they will often still need to make use of the unsafe helpers such as PyConfig_SetBytesString to further configure the fields contained. Should we also provide safe APIs for these?
  2. We recently split out the auto-initialize feature, asking users to call prepare_freethreaded_python() directly if they needed. That API doesn't offer a way to tie in these ffi structures. If we are implementing some safe wrappers for the PEP 587 APIs in PyO3, I think it would be nice to go the whole way so that users can have a safe alternative to prepare_freethreaded_python().
  3. Once done, users should call PyConfig_Clear to clean up the allocations contained. Sounds like a perfect candidate for an impl Drop.

Points 1 and 3 particular make me think that instead of adding the methods provided above, we could consider adding a module pyo3::init, which contains safe wrappers roughly along the lines of the following:

pub fn initialize_python_from_config(config: &PyConfig) {
    unsafe { ffi::Py_InitializeFromConfig(config); }
}

pub struct PyConfig(ffi::PyConfig);

impl PyConfig {
    // contains the ::isolated() and ::python() methods above, as well as other safe helpers to set all the fields
}

impl Drop for PyConfig {
    fn drop(&mut self) {
        unsafe { ffi::PyConfig_Clear(&mut self.0); }
    }
}
@wkschwartz
Copy link

I don’t fully grok the role of PyPreConfig, but using one in conjunction with your proposed fn initialize_python_from_config(&PyConfig) should also go through a safe API. Otherwise users who need PyPreConfig may be stuck with using PyConfig unsafely too.

Also a thing I don’t understand is how initialization interacts with subinterpreters. Should a safe initialization function return a sub interpreter object? A threadstate object? Or is it all necessarily global?

@indygreg
Copy link
Contributor

indygreg commented Mar 6, 2021

The pyembed crate's code for initializing the Python interpreter is... complex and unsafe is everywhere.

While I would welcome safe API wrappers for a lot of that, my guess is that the audience size for some of those APIs will be appropriately 1 :) I think safe PyConfig APIs for managing those object's lifecycle is justifiable, as that is a generically useful API for initializing a Python interpreter. The more questionable features are actually performing mutation operations on those data structures and support for multi-phase initialization. Although wrappers for APIs like PyConfig_SetBytesString don't seem that difficult to implement.

Supporting multi-phase initialization is its own beast. I'll need a way to use PyO3's APIs for initializing a Rust-implemented meta path importer between the calls to Py_InitializeFromConfig() and _Py_InitializeMain(). I believe Python doesn't say it is initialized until after _Py_InitializeMain() returns and PyO3's current checks around interpreter state when acquiring GILGuard fail in this scenario (if you attempt to acquire PyO3's GILGuard or Python before _Py_InitializeMain() completes it complains the interpreter isn't initialized).

From my perspective, I need some kind of way to obtain a usable Python between Py_InitializeFromConfig() and _Py_InitializeMain(). And I also need Rust code that is invoked indirectly as part of running _Py_InitializeMain() to be able to use a Python instance. (What happens in pyembed is I inject a Rust Python class into sys.meta_path and Python's interpreter startup routines call into that Python class before _Py_InitializeMain() completes.) Perhaps there could be an initialize_multi_phase() API that accepted a closure to run between the multi-phase initialization APIs?

https://github.com/indygreg/PyOxidizer/blob/590e39d747ae5ac98ac0521192f8af483a2a7885/pyembed/src/interpreter.rs#L170 is the specific (hopefully sufficiently documented) code explaining this in more detail.

@indygreg
Copy link
Contributor

indygreg commented Mar 6, 2021

Also a thing I don’t understand is how initialization interacts with subinterpreters. Should a safe initialization function return a sub interpreter object? A threadstate object? Or is it all necessarily global?

There's a concept of a main interpreter. The initialization APIs and the GIL are concerned with its state. Subinterpreters are things that can exist in addition to the main interpreter, after the main interpreter is initialized.

Also note that alpha builds of Python 3.10 have conditional support for per-subinterpreter locks (#define EXPERIMENTAL_ISOLATED_SUBINTERPRETERS 1), so each subinterpreter effectively has its own GIL. I haven't been following development of this feature that closely. But it certainly has implications for how PyO3 models the GIL.

@davidhewitt
Copy link
Member Author

davidhewitt commented Mar 6, 2021

While I would welcome safe API wrappers for a lot of that, my guess is that the audience size for some of those APIs will be appropriately 1 :)

That's true. You also make a good point that multi-phase initialization is complex and not something many users clearly need, so it's probably low priority for us to maintain in PyO3 vs. core optimizations and other features.

Thinking on the further, perhaps it's fine to leave this functionality out of PyO3 and allow pyembed / PyOxidizer deal with the raw ffi structs directly?

The original need you had for addition to PyO3 was the default impls for the ffi structs, however now that we note the right way to initialize them is to use ``MaybeUninitand the ffi functions, it's potentially manageable to do that inpyembed` for now... ?

@indygreg
Copy link
Contributor

indygreg commented Mar 6, 2021

I'm fine with PyO3 reducing scope here. pyembed already does a lot of direct, unsafe ffi and I am OK with that. I don't want to impose too much on other crates for niche features.

The things I'll need from PyO3 are defining some missing ffi symbols (seems pretty non-controversial) and maybe a (possibly unsafe) API to coerce PyO3's GIL checks into working during multi-phase initialization. Let me keep hacking on my port. Once I have a better grasp on what's going on in GIL land, I'll make some noise.

@davidhewitt
Copy link
Member Author

I think for now I'm going to close this as something we're not going to worry about in PyO3, but we can revisit this in the future if it becomes a popular demand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants