set_var/remove_var are unsound in combination with C code that reads the environment #27970
Comments
cc @luqmana |
If we did take this route I'd want to convert the environment lock to a rwlock to ensure that we could at least have parallel dns queries. Also cc #27705. |
Yeah, either way the situation is tricky. We get tangled up in glibc internals. |
I found the wording on this page also particularly interesting:
The getaddrinfo function is indeed marked with this tag |
Even without low-level races, This problem would not go away if we provided thread-safe environment access at the libc layer. Therefore, I'm surprised the function isn't marked |
Triage: as far as I know, this hasn't changed at all. The docs do describe this possible footgun. |
This issue can also apply to any external crate that calls into something that calls getenv, for example time-rs/time#293 and chronotope/chrono#499. Maybe some way for these crates to observe the lock is called for? |
Seems to me like this is a soundness issue, and should be marked I-unsound? |
Assigning |
Just repeating what I've said on Zulip: To resolve the soundness bug in time & chrono, one option would be to provide a |
The env lock has to be exposed somehow, I agree. An alternative API would be a function that takes a closure that will be called with the lock held. This is less powerful (which might be a good thing: provide the least powerful API that is good enough). |
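A minimal sketch of what such a closure-taking API could look like. The names `ENV_LOCK`, `with_env_lock` and `read_var_locked` are hypothetical, and the lock here is a stand-in defined by the example itself, not the lock std uses internally:

```rust
use std::env;
use std::ffi::OsString;
use std::sync::Mutex;

// Hypothetical process-wide lock. This is NOT std's internal env lock;
// it only illustrates the shape of the proposed API.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Runs `f` while holding the (hypothetical) environment lock.
pub fn with_env_lock<R>(f: impl FnOnce() -> R) -> R {
    // The mutex guards no data, so a poisoned lock can simply be recovered.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    f()
}

/// Example caller: read a variable while the lock is held.
pub fn read_var_locked(key: &str) -> Option<OsString> {
    with_env_lock(|| env::var_os(key))
}

fn main() {
    let path = read_var_locked("PATH");
    println!("PATH is set: {}", path.is_some());
}
```

A crate such as time or chrono could wrap its call into libc in `with_env_lock`. This only protects code that cooperates by taking the same lock; C code that calls getenv on its own is not covered.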
However, talking of the env lock -- also see #64718 for how the current use of the env lock is wrong, so exposing it in that state might be a problem. |
Sure, that would work as well. End result is the same, if done properly. |
Honestly, I don’t believe In a pure-Rust program, it is already safe, but in C, Even without the thread safety problems, there are other reasons that What might be safe is a function that fails if multiple threads are running, and which only allows setting a limited number of env vars, such as the timezone. |
Just to confirm this is not a theoretical issue, here is a Rust program that only uses the standard library and crashes in Linux with glibc with a Segmentation Fault. It calls |
Rollup merge of rust-lang#116888 - tbu-:pr_unsafe_env, r=Amanieu Add discussion that concurrent access to the environment is unsafe The bug report rust-lang#27970 has existed for 8 years, the actual bug dates back to Rust pre-1.0. I documented it since it's in the interest of the user to be aware of it. The note can be removed once rust-lang#27970 is fixed.
What is the status of this? Specifically with regard to exposing the env lock to users of std: this was talked about in 2020, but nothing much seems to have happened. Was the idea dropped? If so, where is the rationale for that documented? |
glibc says calling setenv is unsound in the presence of threads. So in order to address this we need the application to essentially become single-threaded while setenv is being called. The environment lock falls short of this. We can't seriously mandate that everyone acquires a lock before making libc calls, not least because the lock is Rust-only. A more robust course of action would be to simply (and more accurately) mark set_var unsafe and teach how it can be safely used in a cross-platform manner. In this scenario the lock is basically redundant, so it could even be considered for removal. |
I believe this is the current plan:
|
Plan sounds good. The right thing for applications to do is not modify the environment. If they're modifying it to pass to child processes, the right way to do that is passing an explicit new environment, not modifying your own environment and relying on it getting copied. If they're using the environment to control the behavior of C library code they're calling, that's inherently unsafe and the library should be fixed to accept explicit caller-provided overrides of whatever it was getting from the environment. |
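As a sketch of the "explicit new environment" approach for child processes, using `std::process::Command::env`. The helper name `child_sees` and the variable `GREETING` are made up for illustration, and the example assumes a Unix-like system with `sh` on the PATH:

```rust
use std::io;
use std::process::Command;

/// Spawns `sh` with GREETING set for the child only and returns what the
/// child printed. The parent's environment is never modified.
pub fn child_sees(value: &str) -> io::Result<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg("printf %s \"$GREETING\"")
        // The variable is added to the child's environment block only.
        .env("GREETING", value)
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> io::Result<()> {
    println!("child saw: {}", child_sees("hello")?);
    Ok(())
}
```

`Command::envs`, `Command::env_remove` and `Command::env_clear` cover the remaining cases, so there is no need to call `set_var` in the parent just to influence a child.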
@richfelker This doesn't really work if you are implementing a shell though, unless you decide to emulate your internal environment completely (and pass that on to tasks you spawn). That is a niche use case, sure, and I'm curious what nushell and fish (which was recently rewritten in Rust) do here. |
Note that "emulating the internal environment completely" essentially only means keeping a hashmap around and accessing that instead of
fish doesn't call |
Of course this is the correct way to do a shell. Just because ancient Unix history did things the sloppy way, modifying the shell process's environment directly rather than keeping its own key/value map, doesn't mean a modern shell written in a modern safe language should copy the same bad ideas. It's not even easier to store exports in your own environment, since you already need your own map for non-exported variables, and you can just use the same one (with an export flag on each entry) for both. |
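A rough sketch of that single map with a per-entry export flag. The types `ShellVars` and `Var` and all method names are hypothetical and are not taken from fish or nushell:

```rust
use std::collections::HashMap;
use std::process::Command;

struct Var {
    value: String,
    exported: bool,
}

/// One map for both shell-local and exported variables.
#[derive(Default)]
pub struct ShellVars {
    vars: HashMap<String, Var>,
}

impl ShellVars {
    /// Sets a variable, keeping its export flag if it already exists.
    pub fn set(&mut self, name: &str, value: &str) {
        match self.vars.get_mut(name) {
            Some(var) => var.value = value.to_string(),
            None => {
                let var = Var { value: value.to_string(), exported: false };
                self.vars.insert(name.to_string(), var);
            }
        }
    }

    /// Marks a variable as exported, creating it empty if it is missing.
    pub fn export(&mut self, name: &str) {
        self.vars
            .entry(name.to_string())
            .or_insert_with(|| Var { value: String::new(), exported: false })
            .exported = true;
    }

    pub fn get(&self, name: &str) -> Option<&str> {
        self.vars.get(name).map(|var| var.value.as_str())
    }

    /// Only the exported entries, i.e. what a child process should see.
    pub fn exported(&self) -> impl Iterator<Item = (&str, &str)> + '_ {
        self.vars
            .iter()
            .filter(|(_, var)| var.exported)
            .map(|(name, var)| (name.as_str(), var.value.as_str()))
    }

    /// Builds a command whose environment is exactly the exported entries.
    pub fn command(&self, program: &str) -> Command {
        let mut cmd = Command::new(program);
        cmd.env_clear().envs(self.exported());
        cmd
    }
}

fn main() {
    let mut vars = ShellVars::default();
    vars.set("EDITOR", "vi");
    vars.set("scratch", "1");
    vars.export("EDITOR");
    for (name, value) in vars.exported() {
        println!("export {name}={value}");
    }
}
```

With this layout the shell never touches its own process environment after startup, so neither `set_var` nor `remove_var` is needed.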
It is really unfortunate that libc and posix have functions that depend on environment variables, such as |
There is never a reason to set the locale environment vars within a program. If you want to request a specific locale, you pass the name to the locale functions. The environment is only there as a default inherited at execution time. For time zones, indeed the standard library functionality is rather poor. There have been proposals to modernize it with first class zone objects, but not a lot of progress on making it happen. |
Just for completeness' sake: there is also https://github.com/sunfishcode/eyra#why, which solves the issue at the level of the libc |
Is this something that could be prototyped in a C library? |
Sure. It isn't trivial though. On Linux, you need to know how to read the timezone files, which is kind of complicated to do; I think that is what chrono does though. I'm not sure if there is a documented and supported way to do it on other major platforms. I think Mac uses the same tzinfo format, but I'm not sure if that is documented to remain stable. I don't know about Windows or mobile OSes. |
Make `std::env::{set_var, remove_var}` unsafe in edition 2024 Allow calling these functions without `unsafe` blocks in editions up until 2021, but don't trigger the `unused_unsafe` lint for `unsafe` blocks containing these functions. Fixes rust-lang#27970. Fixes rust-lang#90308.
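A small sketch of how calling code might look under this change. The helper `init_env` and the variable name `APP_MODE` are hypothetical; the `unsafe` block is what edition 2024 requires, and per the commit message above it should not trigger `unused_unsafe` in earlier editions on a compiler that includes this change:

```rust
use std::env;
use std::thread;

/// Sets an environment variable during start-up.
///
/// # Safety
/// Must be called while no other thread can read or write the environment,
/// for example as the first thing in `main`.
pub unsafe fn init_env(key: &str, value: &str) {
    // Explicit block: required in edition 2024, accepted in earlier editions.
    unsafe { env::set_var(key, value) }
}

fn main() {
    // SAFETY: no other threads have been spawned yet.
    unsafe { init_env("APP_MODE", "demo") };

    // Threads started afterwards only read the environment.
    let worker = thread::spawn(|| env::var("APP_MODE"));
    println!("{:?}", worker.join().expect("worker panicked"));
}
```

The safety argument lives with the caller: the write happens once, before any thread exists that could call into C code reading the environment.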
Like the documentation for our set_var (setenv) says, care must be taken with mutating environment variables in a multithreaded program. See this glibc bug #13271 that says getaddrinfo may call getenv.
It looks like we have an unsynchronized call to getaddrinfo and this may cause trouble with glibc/Linux.
Seeing glibc's attitude to setenv in multithreaded programs, set_var seems like a big hazard in general(?).
Discovered as an issue tangential to #27966
cc @alexcrichton