Enable writing to python stdio streams #3920

Open: wants to merge 14 commits into base: main
1 change: 1 addition & 0 deletions newsfragments/3920.added.md
@@ -0,0 +1 @@
Add `pyo3::stdio::stdout` and `pyo3::stdio::stderr` to enable writing directly to Python's `sys.stdout` and `sys.stderr`.
1 change: 1 addition & 0 deletions src/lib.rs
@@ -430,6 +430,7 @@ pub mod impl_;
mod instance;
pub mod marker;
pub mod marshal;
pub mod stdio;
#[macro_use]
pub mod sync;
pub mod panic;
131 changes: 131 additions & 0 deletions src/stdio.rs
@@ -0,0 +1,131 @@
//! Enables direct write access to I/O streams in Python's `sys` module.
//!
//! In some cases, output printed to Rust's `std::io::stdout` or `std::io::stderr` does not appear
//! in the Python front end, e.g. in Jupyter notebooks. This module provides a way to write
//! directly to Python's I/O streams from Rust in such cases.
//!
//! ```rust
//! use std::io::Write;
//!
//! let mut stdout = pyo3::stdio::stdout();
//!
//! // This may not appear in Jupyter notebooks...
//! println!("Hello, world!");
//!
//! // ...but this will.
//! writeln!(stdout, "Hello, world!").unwrap();
//! ```

use crate::ffi::{PySys_WriteStderr, PySys_WriteStdout};
use crate::prelude::*;
use std::io::{LineWriter, Write};
use std::marker::PhantomData;
use std::os::raw::{c_char, c_int};

trait PyStdioRawConfig {
    const STREAM: &'static str;
    // `c_char` rather than `i8`: `c_char` is unsigned on some targets (e.g. aarch64 Linux).
    const PRINTFCN: unsafe extern "C" fn(*const c_char, ...);
}

struct PyStdoutRaw {}
impl PyStdioRawConfig for PyStdoutRaw {
    const STREAM: &'static str = "stdout";
    const PRINTFCN: unsafe extern "C" fn(*const c_char, ...) = PySys_WriteStdout;
}

struct PyStderrRaw {}
impl PyStdioRawConfig for PyStderrRaw {
    const STREAM: &'static str = "stderr";
    const PRINTFCN: unsafe extern "C" fn(*const c_char, ...) = PySys_WriteStderr;
}

struct PyStdioRaw<T: PyStdioRawConfig> {
    pystream: Py<PyAny>,
    _phantom: PhantomData<T>,
}

impl<T: PyStdioRawConfig> PyStdioRaw<T> {
    fn new() -> Self {
        let pystream: Py<PyAny> = Python::with_gil(|py| {
            let module = PyModule::import_bound(py, "sys").unwrap();
            module.getattr(T::STREAM).unwrap().into()
        });

        Self {
            pystream,
            _phantom: PhantomData,
        }
    }
}

impl<T: PyStdioRawConfig> Write for PyStdioRaw<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // `PySys_WriteStdout`/`PySys_WriteStderr` truncate formatted output after
        // 1000 bytes, so pass at most that much and report a partial write.
        let len = buf.len().min(1000);
        Python::with_gil(|_py| unsafe {
            (T::PRINTFCN)(
                b"%.*s\0".as_ptr().cast(),
                len as c_int,
                buf.as_ptr() as *const c_char,
            );
        });
        Ok(len)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Python::with_gil(|py| -> std::io::Result<()> {
            self.pystream
                .call_method0(py, "flush")
                .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
            Ok(())
        })
    }
}


struct PyStdio<T: PyStdioRawConfig> {
    inner: LineWriter<PyStdioRaw<T>>,
Member

Do we really need the Rust-side line buffering here? From the Python documentation:

> When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.

I would infer that the line buffering happens inside the sys.stdout/err Python objects (if it is desired/enabled), to which PySys_WriteStdout/err eventually defer. This means we should not make a buffering decision here, and should instead defer to whatever buffering strategy these Python objects choose.

Finally, this also makes me wonder whether we should go through the formatting machinery of PySys_WriteStdout/err at all, instead of calling write on pystream the same way we call flush. The code at

https://github.com/python/cpython/blob/5dc8c84d397110f9edfa56793ad8887b1f176d79/Python/sysmodule.c#L3895

at least does not seem to do anything more special than what we could do directly if intern! is used.
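
To make the buffering question concrete, here is a minimal sketch, using only the standard library, of what a Rust-side `LineWriter` changes. `RecordingSink` and `observe` are hypothetical names standing in for the Python-backed stream; nothing here calls into PyO3 or Python.

```rust
use std::cell::RefCell;
use std::io::{self, LineWriter, Write};
use std::rc::Rc;

/// Hypothetical stand-in for the raw stream: records every chunk it is
/// handed so the effect of buffering can be inspected afterwards.
struct RecordingSink {
    chunks: Rc<RefCell<Vec<String>>>,
}

impl Write for RecordingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.chunks
            .borrow_mut()
            .push(String::from_utf8_lossy(buf).into_owned());
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes a partial line, then completes it. Returns how many chunks had
/// reached the sink after the partial write, and the total text received.
fn observe(line_buffered: bool) -> (usize, String) {
    let chunks = Rc::new(RefCell::new(Vec::new()));
    let sink = RecordingSink { chunks: Rc::clone(&chunks) };
    let mut writer: Box<dyn Write> = if line_buffered {
        Box::new(LineWriter::new(sink))
    } else {
        Box::new(sink)
    };

    write!(writer, "Hello, ").unwrap();
    let after_partial = chunks.borrow().len();

    writeln!(writer, "world!").unwrap();
    let total: String = chunks.borrow().concat();
    (after_partial, total)
}
```

Without the `LineWriter`, the partial text reaches the sink immediately; with it, nothing arrives until the newline. Whether the sink buffers again on its own side (as `sys.stdout` may) is a separate decision that this sketch does not model.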

Member

> at least does not seem to do anything more special than what we could do directly if intern! is used.

And we could avoid the cost of the runtime formatting machinery entirely, just passing the byte slice directly to write.

Author

I implemented it with the LineWriter wrapper because, when I tested it, I seemed to be getting output immediately from every call to write!(stream,...) from within Rust, rather than getting full lines, i.e. seemingly no buffering was happening. I'm happy to remove it though.

Member

Sorry that this is coming so piecemeal, but maybe we just want a public wrapper type that adapts any Py<PyAny> into a Write impl by calling its write and flush methods, e.g.

pub struct PyWrite(Py<PyAny>);

impl PyWrite {
  pub fn new(ob: Py<PyAny>) -> Self;
}

impl Write for PyWrite { .. }

pub fn stdout(py: Python<'_>) -> PyResult<PyWrite> {
   let module = PyModule::import_bound(py, "sys")?;
   let stdout = module.getattr("stdout")?.into();
   Ok(PyWrite::new(stdout))
}

pub fn stderr() -> PyWrite;

(I am also wondering whether we should have a variant storing Bound<'py, PyAny> to avoid repeatedly calling with_gil in the Write impl. It should be easy enough to convert between PyWrite and PyWriteBound.)

Member

> because when I tested it I seemed to be getting output immediately from every call to write!(stream,...) from within Rust

I think this depends on the conditions under which you tested it, but in any case, the decision whether to buffer should rest with the stream stored at sys.stdout/err, not with our wrapper of it.

Author

If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

impl PyWriter {
    pub fn buffered(self) -> LineWriter<PyWriter> {
        LineWriter::new(self)
    }
}

Or, alternatively, an unbuffered method that works the other way around. Fine if not though, since it's easy enough to implement such a feature myself as a user.

Member

> If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

We can certainly add a convenience method to add optional buffering. I would suggest calling it line_buffered though as it is a relatively specific buffering strategy.
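
One possible shape for such an opt-in method is sketched below. `PyWriterStandIn` is a hypothetical in-memory stand-in for the writer type discussed in this thread, not PyO3 API; only the method name `line_buffered` comes from the suggestion above. The method takes `self` by value because `LineWriter::new` needs an owned `Write` implementor.

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical stand-in for the proposed writer: collects bytes in memory
/// instead of forwarding them to a Python stream.
#[derive(Default)]
struct PyWriterStandIn {
    written: Vec<u8>,
}

impl Write for PyWriterStandIn {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.written.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl PyWriterStandIn {
    /// Opt-in line buffering; the unbuffered writer remains the default.
    fn line_buffered(self) -> LineWriter<Self> {
        LineWriter::new(self)
    }
}

/// Writes one line in two pieces through the line-buffered wrapper and
/// returns everything the underlying writer received.
fn write_greeting() -> Vec<u8> {
    let mut out = PyWriterStandIn::default().line_buffered();
    write!(out, "Hello, ").unwrap();
    writeln!(out, "world!").unwrap();
    out.flush().unwrap();
    out.get_ref().written.clone()
}
```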

Member

> If pyo3::stdio::stdout() were to create something implementing std::fmt::Write instead, then any use of writeln! requires different use declarations depending on context. It also looks as if std::io::Write is used throughout std::io, so doing something like reading from a file and printing to stdout would be awkward if I want to use std::io::copy.

I have to think about this, but I am not really convinced: sys.stdout is not the same as the Unix stdout on which Rust's std::io::stdout operates. It is defined as a text stream, not a byte stream. I think if you want to expose an impl of std::io::Write, you should use sys.stdout.buffer as the backing stream and not sys.stdout. But note that this will not always be available. (Silently corrupting the written data using from_utf8_lossy is not an option IMHO, as it would be too surprising if e.g. std::io::copy produced garbage ZIP archives for that reason.)
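
The corruption concern can be shown with the standard library alone. In this small sketch, `lossy_round_trip` and `strict_text` are hypothetical helper names: the first does what a text-only sink would have to do with arbitrary bytes, the second rejects non-UTF-8 input instead of rewriting it.

```rust
use std::io;

/// Pushes raw bytes through a lossy UTF-8 conversion and back. Invalid
/// sequences are replaced by U+FFFD, so the result can differ from the input.
fn lossy_round_trip(bytes: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(bytes).into_owned().into_bytes()
}

/// Strict alternative: report invalid UTF-8 as an error rather than
/// silently altering the data.
fn strict_text(bytes: &[u8]) -> io::Result<&str> {
    std::str::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

For example, the six bytes `50 4B 03 04 FF FE` (a ZIP-style signature followed by two non-UTF-8 bytes) do not survive `lossy_round_trip` unchanged, while plain ASCII text does.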

Author

Maybe a solution would be to implement new for PyWriterBound, and to test at the point of object creation whether the inner PyAny is an instance of either io.BufferedIOBase or io.TextIOBase. Subsequent write behaviour could then be configured accordingly.

Member

> Subsequent write behaviour could then be configured accordingly.

But what would that be? I don't think we can support io::Write using io.TextIOBase. (The other way around works, i.e. fmt::Write can write into io.BufferedIOBase by calling as_bytes, but it would need to do that explicitly to avoid errors on the Python side.)

So personally, I think we should

  • Provide generic wrappers for io::Write and fmt::Write, or since you came here with a specific use case in mind, at least for fmt::Write.
  • Provide constructor functions like io::stdout (accessing sys.stdout.buffer) and fmt::stdout (accessing sys.stdout) which only succeed if the correct Python writer is present.
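
The asymmetry between the two traits can be sketched in plain Rust. `Utf8Bytes` and `chunk_is_text` are hypothetical names used for illustration only: text can always be written into a byte sink via `as_bytes`, whereas an arbitrary chunk of bytes handed to `io::Write::write` need not be valid text on its own.

```rust
use std::fmt;
use std::io;

/// Hypothetical adapter: exposes a byte sink through `fmt::Write` by
/// encoding each string slice as UTF-8 bytes.
struct Utf8Bytes<W: io::Write>(W);

impl<W: io::Write> fmt::Write for Utf8Bytes<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.write_all(s.as_bytes()).map_err(|_| fmt::Error)
    }
}

/// Formats a line through the adapter into an in-memory byte buffer.
fn render_greeting() -> Vec<u8> {
    use fmt::Write as _;
    let mut out = Utf8Bytes(Vec::new());
    writeln!(out, "Hello, {}!", "world").unwrap();
    out.0
}

/// The reverse direction: a text sink can only accept a chunk that is
/// valid UTF-8 by itself, which a byte-oriented caller does not guarantee.
fn chunk_is_text(chunk: &[u8]) -> bool {
    std::str::from_utf8(chunk).is_ok()
}
```

A two-byte character such as "é" illustrates the problem: if a caller splits it across two `write` calls, neither chunk is valid text by itself, even though the complete stream is.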

}

impl<T: PyStdioRawConfig> PyStdio<T> {
    fn new() -> Self {
        Self {
            inner: LineWriter::new(PyStdioRaw::new()),
        }
    }
}

impl<T: PyStdioRawConfig> Write for PyStdio<T> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.inner.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.inner.flush()
    }
}

/// A handle to Python's `sys.stdout` stream.
pub struct PyStdout(PyStdio<PyStdoutRaw>);
/// A handle to Python's `sys.stderr` stream.
pub struct PyStderr(PyStdio<PyStderrRaw>);

/// Construct a new handle to Python's `sys.stdout` stream.
pub fn stdout() -> PyStdout {
    PyStdout(PyStdio::new())
}
/// Construct a new handle to Python's `sys.stderr` stream.
pub fn stderr() -> PyStderr {
    PyStderr(PyStdio::new())
}

impl Write for PyStdout {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.0.flush()
    }
}
impl Write for PyStderr {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> std::io::Result<()> {
        self.0.flush()
    }
}