Enable writing to python stdio streams #3920

goulart-paul · 2024-03-01T15:50:20Z

Enables creation of handles for printing directly to Python's sys.stdout and sys.stderr. This is a (partial) fix for #2247.

Usually, calling Rust's println! results in output appearing in the Python interpreter. However, in some cases this fails (particularly in some, but not all, Jupyter notebooks and on Google Colab) because the Rust std::io::stdout and std::io::stderr streams are not redirected to match Python's sys.stdout and sys.stderr.

This does not directly fix the problem with println!, but instead enables printing via
writeln!(pyo3::stdio::stdout(),...)

I have not written a unit test for this because it's unclear what such a test should do other than not crash. I don't see an obvious way of checking via a unit test whether text piped to a python stream actually appears.

davidhewitt · 2024-03-01T23:46:35Z

Thanks for opening this PR! I will do my best to review tomorrow.

davidhewitt · 2024-03-02T23:14:25Z

Thanks @adamreichold for picking up the review 👍

As well as the above, I think tests are definitely worth adding here. It would be possible to achieve that by temporarily assigning the Python sys.stdout / sys.stderr streams to io.StringIO objects (or similar), use these APIs to write to them, and validate the final contents.

adamreichold · 2024-03-03T10:25:08Z

src/stdio.rs

+
+
+struct PyStdio<T: PyStdioRawConfig> {
+    inner: LineWriter<PyStdioRaw<T>>,


Do we really need the Rust-side line buffering here? From the Python documentation:

> When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.

I would infer that the line buffering happens inside the sys.stdout/err Python objects (if it is desired/enabled), to which PySys_WriteStdout/err eventually defer. This means we should not make a buffering decision here, and should instead defer to whatever buffering strategy these Python objects choose.

Finally, this also makes me wonder whether we should go through the formatting machinery of PySys_WriteStdout/err at all, instead of calling write on pystream the same way we call flush. The code at

https://github.com/python/cpython/blob/5dc8c84d397110f9edfa56793ad8887b1f176d79/Python/sysmodule.c#L3895

at least does not seem to do anything more special than what we could do directly if intern! is used.

> at least does not seem to do anything more special than what we could do directly if intern! is used.

And we could avoid the cost of the runtime formatting machinery entirely, just passing the byte slice directly to write.

I implemented it with the LineWriter wrapper because when I tested it I seemed to be getting output immediately from every call to write!(stream,...) from within Rust, rather than getting full lines, i.e. seemingly no buffering was happening. I'm happy to remove it though.
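
The buffering difference described here can be reproduced without Python at all. The following is a minimal std-only sketch: `RecordingSink` and `chunks_seen` are hypothetical names that do not appear in the PR, and the sink merely stands in for a stream such as sys.stdout. It records every chunk that reaches the underlying writer, with and without a `LineWriter` in front of it.

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical sink (not part of the PR) that records each chunk passed to
/// `write`, standing in for an underlying stream such as sys.stdout.
#[derive(Default)]
struct RecordingSink {
    chunks: Vec<String>,
}

impl Write for RecordingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.chunks.push(String::from_utf8_lossy(buf).into_owned());
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes three fragments and returns the chunks the sink actually received.
fn chunks_seen(line_buffered: bool) -> io::Result<Vec<String>> {
    let fragments = ["hello, ", "world\n", "tail"];
    if line_buffered {
        let mut writer = LineWriter::new(RecordingSink::default());
        for fragment in fragments {
            writer.write_all(fragment.as_bytes())?;
        }
        // `into_inner` flushes the pending partial line before handing the sink back.
        let sink = writer.into_inner().map_err(|e| e.into_error())?;
        Ok(sink.chunks)
    } else {
        let mut sink = RecordingSink::default();
        for fragment in fragments {
            sink.write_all(fragment.as_bytes())?;
        }
        Ok(sink.chunks)
    }
}
```

Without the wrapper the sink sees each fragment as soon as it is written; with `LineWriter` the first two fragments arrive together once the newline completes the line, and the trailing partial line only arrives on flush.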

Sorry that this is coming so piecemeal, but maybe we just want a public wrapper type that will adapt any Py<PyAny> as a Write impl by calling its write and flush methods, e.g.

pub struct PyWrite(Py<PyAny>);

impl PyWrite {
    pub fn new(ob: Py<PyAny>) -> Self;
}

impl Write for PyWrite { .. }

pub fn stdout(py: Python<'_>) -> PyResult<PyWrite> {
    let module = PyModule::import_bound(py, "sys")?;
    let stdout = module.getattr("stdout")?.into();
    Ok(PyWrite::new(stdout))
}

pub fn stderr() -> PyWrite;

(I am also wondering whether we should have a variant storing Bound<'py, PyAny> to avoid repeatedly calling with_gil in the Write impl. It should be easy enough to convert between PyWrite and PyWriteBound.)

> because when I tested it I seemed to be getting output immediately from every call to write!(stream,...) from within Rust

I think this depends on the conditions under which you tested it, but in any case, the decision to buffer or not should rest with the stream stored at sys.stdout/err, not with our wrapper of it.

If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

impl PyWriter {
    // Takes self by value so that the LineWriter owns the writer.
    pub fn buffered(self) -> LineWriter<PyWriter> {
        LineWriter::new(self)
    }
}

Or, alternatively, an unbuffered method that works the other way around. Fine if not though, since it's easy enough to implement such a feature myself as a user.

> If you're happy for me to do so, I could add a buffered method to create a LineWriter wrapper as before, i.e.

We can certainly add a convenience method to add optional buffering. I would suggest calling it line_buffered though as it is a relatively specific buffering strategy.
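
As a rough illustration of what such a convenience method could look like, here is a std-only sketch. `MemWriter` is a hypothetical in-memory stand-in for the PR's writer type, and `line_buffered` follows the name suggested above; neither is taken from the actual implementation. The method consumes the writer so that the returned `LineWriter` owns it.

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical stand-in for the PR's writer type: collects bytes in memory
/// and counts how many times the underlying `write` is invoked.
struct MemWriter {
    data: Vec<u8>,
    writes: usize,
}

impl MemWriter {
    fn new() -> Self {
        MemWriter { data: Vec::new(), writes: 0 }
    }

    /// Opt-in line buffering; takes `self` by value so the wrapper owns the writer.
    fn line_buffered(self) -> LineWriter<Self> {
        LineWriter::new(self)
    }
}

impl Write for MemWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes one line in three pieces and reports (underlying write calls, bytes received).
fn demo_line_buffered() -> io::Result<(usize, Vec<u8>)> {
    let mut out = MemWriter::new().line_buffered();
    out.write_all(b"a")?;
    out.write_all(b"b")?;
    out.write_all(b"c\n")?;
    out.flush()?;
    let inner = out.get_ref();
    Ok((inner.writes, inner.data.clone()))
}
```

Keeping the default writer unbuffered and making the `LineWriter` wrapper opt-in leaves the buffering decision with the caller, which matches the direction of the review comments.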

If pyo3::stdio::stdout() were to create something implementing std::fmt::Write instead, then any use of writeln! requires different use declarations depending on context. It also looks as if std::io::Write is used throughout std::io, so doing something like reading from a file and printing to stdout would be awkward if I want to use std::io::copy.

I have to think about this, but I am not really convinced: sys.stdout is not the same as the Unix stdout on which Rust's std::io::stdout operates; it is defined as a text stream and not a byte stream. I think if you want to expose an impl of std::io::Write, you should use sys.stdout.buffer as the backing stream and not sys.stdout. But note that this will not always be available. (Silently corrupting the written data using from_utf8_lossy is not an option IMHO, as it is just too surprising if e.g. std::io::copy produces garbage ZIP archives for that reason.)
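
The corruption concern can be shown with std alone. In this sketch, `TextSink`, `StrictTextWriter` and `lossy_round_trip` are hypothetical names introduced for illustration, with `TextSink` standing in for a text-only stream like sys.stdout. It contrasts a lossy conversion, which silently changes the bytes, with an adapter that reports an error instead.

```rust
use std::io::{self, Write};

/// Hypothetical text-only sink, standing in for a text stream such as sys.stdout.
#[derive(Default)]
struct TextSink {
    text: String,
}

/// Hypothetical adapter that refuses non-UTF-8 input instead of replacing it.
/// Note: it validates each call separately, so a multi-byte character split
/// across two writes would also be rejected; a real adapter would need to
/// carry the incomplete tail over to the next call.
struct StrictTextWriter<'a>(&'a mut TextSink);

impl Write for StrictTextWriter<'_> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let s = std::str::from_utf8(buf)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        self.0.text.push_str(s);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// What the bytes look like after passing through a lossy text conversion.
fn lossy_round_trip(bytes: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(bytes).into_owned().into_bytes()
}
```

For pure ASCII or valid UTF-8 input the two approaches agree; they only diverge on binary data, which is exactly the case where silent replacement would go unnoticed.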

Maybe a solution would be to implement new for PyWriterBound, and to test at the point of object creation whether the inner PyAny is an instance of either io.BufferedIOBase or io.TextIOBase. Subsequent write behaviour could then be configured accordingly.

> Subsequent write behaviour could then be configured accordingly.

But what would that be? I don't think we can support io::Write using io.TextIOBase. (The other way around works, i.e. fmt::Write can write into io.BufferedIOBase by calling as_bytes, but it would need to do that explicitly to avoid errors on the Python side.)
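
The "other way around" mentioned in the parenthesis, fmt::Write feeding a byte-oriented writer via as_bytes, might look roughly like the following std-only sketch. `FmtToBytes` and `render_greeting` are hypothetical names, and a `Vec<u8>` stands in for a binary stream such as sys.stdout.buffer.

```rust
use std::fmt;
use std::io;

/// Hypothetical adapter: exposes `fmt::Write` on top of any byte-oriented writer.
struct FmtToBytes<W: io::Write> {
    inner: W,
}

impl<W: io::Write> fmt::Write for FmtToBytes<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // A &str is always valid UTF-8, so converting to bytes is lossless.
        // `fmt::Error` carries no payload, so the I/O error detail is dropped here.
        self.inner.write_all(s.as_bytes()).map_err(|_| fmt::Error)
    }
}

/// Formats a line into an in-memory byte buffer through the adapter.
fn render_greeting(name: &str) -> Result<Vec<u8>, fmt::Error> {
    // `writeln!` resolves `write_fmt` through whichever Write trait is in scope.
    use fmt::Write as _;
    let mut writer = FmtToBytes { inner: Vec::new() };
    writeln!(writer, "hello, {name}")?;
    Ok(writer.inner)
}
```

This direction is safe because text can always be encoded as bytes; the reverse needs a policy for bytes that are not valid text.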

So personally, I think we should:

- Provide generic wrappers for io::Write and fmt::Write, or, since you came here with a specific use case in mind, at least for fmt::Write.

- Provide constructor functions like io::stdout (accessing sys.stdout.buffer) and fmt::stdout (accessing sys.stdout) which only succeed if the correct Python writer is present.

goulart-paul · 2024-03-03T10:30:01Z

> Thanks @adamreichold for picking up the review 👍
>
> As well as the above, I think tests are definitely worth adding here. It would be possible to achieve that by temporarily assigning the Python sys.stdout / sys.stderr streams to io.StringIO objects (or similar), use these APIs to write to them, and validate the final contents.

Good idea. I had not thought to redirect the Python-side streams somewhere else for inspection. I will give this a try.

Edit: unit test added.

goulart-paul added 6 commits March 1, 2024 15:12

Enable writing to python stdio streams

d360b44

add newsfragment

b0217de

add newsfragment

19bb492

rustfmt

421fb03

add missing doc example imports

de7d4de

rustfmt

7e0d188

adamreichold reviewed Mar 2, 2024

goulart-paul added 2 commits March 3, 2024 10:09

rewrite w/o macros

7a8f468

remove old comments

0d5e21d

adamreichold reviewed Mar 3, 2024

adamreichold reviewed Mar 3, 2024

goulart-paul added 6 commits March 3, 2024 10:38

intern! flush

5ad567b

rustfmt

6fa04aa

rewrite using PyWriter(Py<PyAny>)

41d600a

add Bound variant

e90f220

added unit test

f328a20

actually added unit test this time

e2b8e53
