Greatly speed up doctests by compiling compatible doctests in one file #123974

GuillaumeGomez · 2024-04-15T17:34:38Z

I split the changes as much as possible to make it as easy as possible to review (even though it's a huge lot of code changes...).

The following tests are not included into the combined doctests:

compile_fail
#![no_std]
have invalid AST
test_harness
no capture
--show-output (because the output is nicer without the extra code surrounding it)

Everything else is merged into one file. If this one file fails to compile, we go back to the current strategy: compile each doctest separately.

Because of the edition codeblock attribute, I couldn't make them all into one file directly so I grouped them by edition, but it should be pretty anecdotic.

In case the users want a specific doctest to be opted-out from this doctest merge, I added the standalone codeblock attribute:

/// ```rust,standalone
/// // This doctest will be run in its own process!
/// ```

Now the interesting part, I ran it on a few crates and here are the results (with cargo test --doc to only include doctests):

crate	nb doctests	before this PR	with this PR
sysinfo	227	4.6s	1.11s
geos	157	3.95s	0.45s
core	4604	54.08s	23.20s
std	1147	13.88s	8.82s

r? @notriddle

GuillaumeGomez · 2024-04-16T18:19:36Z

Seems like where there are enough tests, libtest simply blocks and does nothing. I'll split tests by chunks of ~200 tests at once to start and then check how to improve output.

rustbot · 2024-04-17T14:04:06Z

Some changes occurred in run-make tests.

cc @jieyouxu

petrochenkov · 2024-04-17T14:45:47Z

There should probably be some opt out from this merging.

There are numerous reasons why two pieces of code may cause issues when combined together (they should compete for some common resource, in general, and that is not always detectable by a compiler).
This was previously attempted with run-pass tests from the compiler test suite, but it ended up being too brittle.

notriddle · 2024-04-17T15:01:55Z

How does this affect env_logger?

GuillaumeGomez · 2024-04-17T16:26:48Z

It fails in multiple tests:

running 28 tests
test src/fmt/humantime.rs - fmt::humantime::Formatter::timestamp (line 17) ... ok
test src/fmt/mod.rs - fmt (line 21) ... ok
test src/fmt/mod.rs - fmt (line 42) ... ok
test src/fmt/mod.rs - fmt::Formatter (line 121) ... ok
test src/lib.rs - (line 175) ... ok
test src/lib.rs - (line 234) ... ok
test src/logger.rs - logger::Builder::filter (line 382) ... ok
test src/logger.rs - logger::Builder::filter_level (line 360) ... ok
test src/logger.rs - logger::Builder::format (line 231) ... ok
test src/logger.rs - logger::Builder::filter_module (line 341) ... ok
test src/logger.rs - logger::Builder::write_style (line 436) ... ok
test src/logger.rs - logger::Logger::from_default_env (line 600) ... ok
test src/logger.rs - logger::Builder::target (line 415) ... ok
test src/logger.rs - logger::Logger::from_env (line 567) ... ok
test src/logger.rs - logger::Logger::from_env (line 576) ... ok
test src/lib.rs - (line 253) ... FAILED
test src/lib.rs - (line 214) ... FAILED
test src/lib.rs - (line 19) ... FAILED
test src/logger.rs - logger::Builder (line 22) ... FAILED
test src/logger.rs - logger::Builder::from_default_env (line 176) ... FAILED
test src/logger.rs - logger::Builder::from_env (line 96) ... FAILED
test src/logger.rs - logger::Builder::from_env (line 86) ... FAILED
test src/logger.rs - logger::Builder::parse_default_env (line 199) ... FAILED
test src/logger.rs - logger::Builder::new (line 56) ... FAILED
test src/logger.rs - logger::Builder::parse_env (line 123) ... FAILED
test src/logger.rs - logger::Builder::parse_env (line 138) ... FAILED
test src/logger.rs - logger::init_from_env (line 927) ... FAILED
test src/logger.rs - logger::try_init_from_env (line 890) ... FAILED

failures:

---- src/lib.rs - (line 253) stdout ----
thread 'src/lib.rs - (line 253)' panicked at src/logger.rs:499:14:
Builder::init should not be called after logger initialized: SetLoggerError(())
stack backtrace:
   0:     0x564cf5d0b200 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9cb9ba9de157d448
   1:     0x564cf5d3115b - core::fmt::write::h712ed17a4bea48d7
   2:     0x564cf5cf3ff9 - std::io::Write::write_fmt::ha9b4cf5497604558
   3:     0x564cf5d0afbe - std::sys_common::backtrace::print::h527405b0eb64d489
   4:     0x564cf5d0c09a - std::panicking::default_hook::{{closure}}::h7c8284f1aa6c5571
   5:     0x564cf5d0bdbf - std::panicking::default_hook::haf59e725ae5fe4cb
   6:     0x564cf5ca07fa - test::test_main::{{closure}}::h24de74550d40f595
   7:     0x564cf5d0c749 - std::panicking::rust_panic_with_hook::hc2212689298edd22
   8:     0x564cf5d0c4f6 - std::panicking::begin_panic_handler::{{closure}}::h2058e51712b5fdf6
   9:     0x564cf5d0b419 - std::sys_common::backtrace::__rust_end_short_backtrace::h1dc4a638a0e69753
  10:     0x564cf5d0c227 - rust_begin_unwind
  11:     0x564cf5b2dc16 - core::panicking::panic_fmt::h88caaa095c0a5b53
  12:     0x564cf5b2e586 - core::result::unwrap_failed::ha5ec7624a00a22e4
  13:     0x564cf5b391fe - core::result::Result<T,E>::expect::ha6de1769dcf31264
                               at /home/imperio/rust/rust/library/core/src/result.rs:1034:23
  14:     0x564cf5b3610d - env_logger::logger::Builder::init::h3eaa9af3496d3a3d
                               at /home/imperio/rust/env_logger/src/logger.rs:498:9
  15:     0x564cf5b31e74 - rust_out::__doctest_8::main::hc4539ff026f50669
  16:     0x564cf5b31ee3 - rust_out::__doctest_8::TEST::{{closure}}::hffdb7dfd4fa1d8d0
  17:     0x564cf5b30176 - core::ops::function::FnOnce::call_once::h74d127fff682a19d
  18:     0x564cf5ca5d92 - test::__rust_begin_short_backtrace::h377d4000c87befa2
  19:     0x564cf5cc18b2 - test::types::RunnableTest::run::h656e7414cfb3441c
  20:     0x564cf5ca6056 - test::run_test_in_process::hcb4c940c64146602
  21:     0x564cf5cae2bd - std::sys_common::backtrace::__rust_begin_short_backtrace::h6035574334cd54dd
  22:     0x564cf5cb89c5 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hebcf07f42d5ca27a
  23:     0x564cf5d1757b - std::sys::pal::unix::thread::Thread::new::thread_start::h21bb0d8f9a9dc8fb
  24:     0x7fcbf7494ac3 - start_thread
                               at ./nptl/pthread_create.c:442:8
  25:     0x7fcbf7526850 - __GI___clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  26:                0x0 - <unknown>

I implemented a fallback in case compilation fails. I could do the same for failing tests. Like that, we can support both current doctests with shared context and others who don't have have this shared context issue can enjoy having doctests running much faster. What do you think?

EDIT: Another solution would be to add a new codeblock attribute like "standalone" to say that a doctest should not be combined with others, but I'm not a big fan of this approach.

notriddle · 2024-04-17T17:24:58Z

I implemented a fallback in case compilation fails. I could do the same for failing tests.

That only helps with spurious failure. Spurious success is an even worse problem, if a test case accidentally relies on a global resource that another test set up. Any kind of merging has this problem, but it gets worse if the merging only happens sometimes.

This is also a problem with TEST_BATCH_SIZE, I think. If a test accidentally relies on a global resource set up by another test, then the tests might magically stop passing when 250 other tests get sandwiched between them.

GuillaumeGomez · 2024-04-17T17:57:11Z

The logic currently is as follows: we merge tests, if the compilation of the combined tests failed, we fallback to the current behaviour.

One thing I thought though: we could spawn a new process for each test. That would prevent them from using "in program" common resources.

This is also a problem with TEST_BATCH_SIZE, I think. If a test accidentally relies on a global resource set up by another test, then the tests might magically stop passing when 250 other tests get sandwiched between them.

Yes it's the current issue with env_logger, hence why I'm suggesting to either try running all tests in the batch independently or start a new process for each doctest.

notriddle · 2024-04-17T19:00:43Z

The troublesome possibility that I'm thinking of is:

You start by setting things up so that all your tests pass in a batch, because one test relies on a process-global resource that another one set up. This is what I called a "spurious passing test."
You change something so that compilation in a batch fails, and the system silently runs all your tests in separate processes instead.
The error you get (something like "unwrap() called on None") is very far away from the change that caused things to break. It's "spooky action at a distance," and we should do better than that.

GuillaumeGomez · 2024-04-17T19:54:59Z

What about changing the default way to run doctests starting the 2024 edition?

EDIT: With a new standalone attribute allowing a doctest to not be part of the combined doctests.

notriddle · 2024-04-17T20:21:54Z

That's better than implicit fallbacks.

GuillaumeGomez · 2024-04-17T20:40:16Z

Agreed. Should I add the new standalone doctest attribute in this PR as well?

…pdate associated test to check both combined and non-combined doctests

…he 2021 edition

To do so, AST error detection was improved in order to not filter out too many doctests.

GuillaumeGomez · 2024-05-19T13:06:53Z

Fixed merge conflict.

camelid · 2024-05-22T21:08:08Z

This seems like a good idea in general, but the example given with static mut concerns me. For the whole time they've been around, doctests have been completely isolated from each other, and now there is the potential for "spooky action at a distance" as @notriddle mentioned. Ideally we could automatically detect doctests that (indirectly) used statics and either run them separately or emit a warning that standalone should be added. Maybe it'd be too difficult to implement or risk producing false positives depending on how stdlib types are implemented, etc. Thoughts?

camelid · 2024-05-22T21:09:40Z

If we do consider the static-detection approach, it could potentially be implemented by creating a MIR query uses_global_mut_state or similar that would recursively check if the associated MIR body read or mutated a static mut or called a function that did the same.

GuillaumeGomez · 2024-05-23T08:35:42Z

I think it'd be a nice improvement but I don't think it should be a part of this first iteration as I'm pretty sure there are cases where using statics in this context is perfectly fine and we'd need to be go through a lot of code to check in which cases it's ok or not.

camelid · 2024-05-25T22:16:19Z

Fair point. I guess this change is reasonable because it matches the way normal unit tests work (all compiled into one binary). Should we consider running all compatible doctests in a single binary in pre-2024 editions, then run those tests individually too, and check if the result didn't match the results of the individual tests (i.e., combined failed but all individual passed, or combined passed but at least one individual failed)? The challenge of course is that it'd cause a further slowdown on pre-2024 tests. Just a thought though to help migration to the new edition behavior.

Also curious what some other people on the team think about this change in behavior.

GuillaumeGomez · 2024-05-25T22:20:42Z

Fair point. I guess this change is reasonable because it matches the way normal unit tests work (all compiled into one binary). Should we consider running all compatible doctests in a single binary in pre-2024 editions, then run those tests individually too, and check if the result didn't match the results of the individual tests (i.e., combined failed but all individual passed, or combined passed but at least one individual failed)? The challenge of course is that it'd cause a further slowdown on pre-2024 tests. Just a thought though to help migration to the new edition behavior.

I'm not sure to understand what you have in mind in this case. Is it to uncover bugs we might have missed maybe?

Also curious what some other people on the team think about this change in behavior.

That's why we'll have an FCP in any case. :)

camelid · 2024-05-25T22:27:33Z

I'm not sure to understand what you have in mind in this case. Is it to uncover bugs we might have missed maybe?

No, more to detect problems with user code. E.g. if doctests in someone's crate do something like depend on the same mutable global state, we can warn them that this will break in Rust 2024.

GuillaumeGomez · 2024-05-25T22:29:23Z

I see. I'm not sure it's a good idea though. Maybe adding an option to force this new behaviour on non-2024 edition runs could be a way for users to give it a try?

camelid · 2024-05-25T22:31:10Z

Yeah, we should probably focus on that for now. Could also consider a crater run.

GuillaumeGomez · 2024-05-25T22:33:07Z

I'm still hoping we can make it before the 2024 edition is done. Let's hope I'm not too optimistic here. ^^'

camelid

Left a few comments about the docs. In terms of the code itself, I believe we should refactor the doctest code before making this change. It's hard to follow what's happening since test collection, processing, and running are all mixed together—not because of this PR, rather due to the existing code. There are also other issues like functions with tons of arguments, which is improved somewhat here but should ideally be done in a separate PR that doesn't change behavior.

I'm currently working on such a refactor and will open a PR soon. Just to be clear, my goal is not to overhaul the test running system but just try to organize and decouple the code better.

camelid · 2024-05-29T03:41:04Z

src/doc/rustdoc/src/write-documentation/documentation-tests.md

@@ -376,6 +376,58 @@ that the code sample should be compiled using the respective edition of Rust.
 # fn foo() {}
 ```

+Starting the 2024 edition[^edition-note], compatible doctests will be merged as one before being


Suggested change

Starting the 2024 edition[^edition-note], compatible doctests will be merged as one before being

Starting in the 2024 edition[^edition-note], compatible doctests will be merged as one before being

Perhaps also add a sentence explaining that it's for performance reasons? Just so people don't wonder why we're making the change. Could also be worth clarifying that if it fails, the individual tests will be rerun to give an exact error.

camelid · 2024-05-29T03:41:24Z

src/doc/rustdoc/src/write-documentation/documentation-tests.md

@@ -376,6 +376,58 @@ that the code sample should be compiled using the respective edition of Rust.
 # fn foo() {}
 ```

+Starting the 2024 edition[^edition-note], compatible doctests will be merged as one before being
+run. It means that they will share the process, so any change global/static variables will now


Suggested change

run. It means that they will share the process, so any change global/static variables will now

run. It means that they will share the process, so any change to global/static variables will now

jieyouxu · 2024-05-31T08:32:59Z

tests/run-make/doctests-merge/rmake.rs

+        .arg("--edition")
+        .arg(edition)


Suggestion: add an edition() helper to Rustdoc just like the one on Rustc.

jieyouxu · 2024-05-31T08:33:20Z

tests/run-make/doctests-merge/rmake.rs

+        .arg("--extern")
+        .arg(format!("foo={}", dep.display()))


Suggestion: add an extern_() helper like the one on Rustc.

bors · 2024-06-01T13:02:15Z

☔ The latest upstream changes (presumably #124577) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned notriddle Apr 15, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Apr 15, 2024