Decide policy on stub implementations (syscalls, sockopts, flags, etc) #3280

stevenengler · 2024-01-11T20:53:00Z

Each time we need to decide whether to add a stub implementation we've mentioned that we should have a policy somewhere for what to do. This issue can be where we decide what that policy should be.

There are many syscalls, socket options, syscall flags, etc that Shadow doesn't support but applications use. Typically we want to return an error (ex: EOPNOTSUPP or EINVAL) for these unsupported features. If we're lucky the application will just ignore the error, but many applications will then exit with an error message. If we want the application to run in Shadow the choices are to patch the application, add support in Shadow, or add a stub in Shadow which returns "success" without actually performing the correctly. If the feature is complex and/or no one has time to work on it, the simplest solution is often to add a stub in Shadow. But this means we're intentionally adding incorrect behaviour to Shadow, which can lead to confusion for users and can make Shadow simulations much harder to debug. In the worst case it might cause Shadow to generate incorrect simulation results.

We already have many stub implementations. For example:

shadow/src/main/host/descriptor/socket/inet/legacy_tcp.rs

Lines 1231 to 1247 in 47bde40

    
           (libc::SOL_SOCKET, libc::SO_REUSEADDR) => { 
        
               // TODO: implement this, tor and tgen use it 
        
               log::trace!("setsockopt SO_REUSEADDR not yet implemented"); 
        
           } 
        
           (libc::SOL_SOCKET, libc::SO_REUSEPORT) => { 
        
               // TODO: implement this, tgen uses it 
        
               log::trace!("setsockopt SO_REUSEPORT not yet implemented"); 
        
           } 
        
           (libc::SOL_SOCKET, libc::SO_KEEPALIVE) => { 
        
               // TODO: implement this, libevent uses it in 
        
               // evconnlistener_new_bind() 
        
               log::trace!("setsockopt SO_KEEPALIVE not yet implemented"); 
        
           } 
        
           (libc::SOL_SOCKET, libc::SO_BROADCAST) => { 
        
               // TODO: implement this, pkg.go.dev/net uses it 
        
               log::trace!("setsockopt SO_BROADCAST not yet implemented"); 
        
           }

shadow/src/main/host/syscall/handler/wait.rs

Lines 94 to 97 in 47bde40

    
           if options.contains(WaitFlags::__WNOTHREAD) { 
        
               // TODO: track parent thread and check it here. 
        
               warn_once_then_debug!("__WNOTHREAD unimplemented; ignoring."); 
        
           }

shadow/src/main/host/process.rs

Lines 1557 to 1562 in 47bde40

    
           /// Resource usage, as returned e.g. by the `getrusage` syscall. 
        
           pub fn rusage(&self) -> linux_api::resource::rusage { 
        
               warn_once_then_debug!( 
        
                   "resource usage (rusage) tracking unimplemented; Returning bogus zeroed values" 
        
               ); 
        
               // TODO: Actually track some of these.

(and more)

The question is: What do we do when adding a stub implementation is the simplest solution to unblocking an application or use-case?

Some possible answers:

Don't allow any new stubs, but keep existing stubs.
Allow stubs for tor-related applications only.
Allow all stubs as long as they also have a warning (typically warn_once_then_debug!()).
Add an experimental config "stub" option which allows users to opt-into a list of stubs. This list would default to the stubs we already have and use, but new stubs would not be enabled by default.

We might want some "escape hatches" from the above rules as well, such as:

Allow new default stubs for features that are required for all applications. (For example the rseq syscall, which is partly implemented and is needed for modern glibc versions.)

The text was updated successfully, but these errors were encountered:

cohosh · 2024-01-12T17:25:38Z

Allow all stubs as long as they also have a warning (typically warn_once_then_debug!()).

This seems reasonable to me. When I first started running shadow simulations with Snowflake, I went through the warnings to make a quick judgement call on whether the missing feature would impact the results. Though with how quickly Go has been changing, I'll admit I haven't gone over the missing features that don't cause errors to see if anything important is missing.

I don't necessarily think these quick judgement calls are as useful as doing some sort of validation experiments to give more confidence that the simulations are accurate enough for a given application to draw useful conclusions about it.

I just took a look at my shadow log for my most recent (minimal) Snowflake experiment and I see a lot of warnings:

$ grep "WARN" shadow.log | wc -l
68

Most of these are repeats and there are only 5 unique warnings for missing features.

Add an experimental config "stub" option which allows users to opt-into a list of stubs. This list would default to the stubs we already have and use, but new stubs would not be enabled by default.

This also sounds reasonable to me, and I see the value in app developers being made aware of new missing features and forced to make a conscious decision about whether or not the given feature is needed.

robgjansen · 2024-01-14T01:36:27Z

This is tricky, but I think I'm leaning toward some flavor of the opt-in options telling shadow to ignore missing features.

When Shadow was mainly used for running Tor simulations, it was reasonable to just silently ignore or warn-then-ignore in these types of stubs, because we did the debugging work to verify that Tor wasn't reliant in some important way on the unimplemented functionality. But now Shadow has become more suited to running general applications outside of the Tor world, and it is not really possible to check and verify that much larger set of applications. It comes down to guessing how important the underlying feature is to the functionality of an application, and the guesses are going to have wide error bars.

Technically, failing a syscall with EOPNOTSUPP when a feature is not supported is a valid thing for Shadow to do, even if an application was not written to handle that case (because, Linux would never dream of returning it). But if we do that, the nicest thing Shadow could do for users is to identify the problem and make it clear how to work around it.

log a warning message that clearly explains the feature Shadow thinks was being requested, that the feature is unimplemented, and an option that the user can use to tell Shadow to ignore and return success for that feature (e.g., "re-run the simulation with the option --ignore-keepalive if the application does not require this feature").
pre-add all of the missing stubs, i.e., enumerate as many missing features as possible and auto-generate all of the ignore options and warning messages so that users are not constantly having to submit PRs to add a stub for a feature their favorite app uses.

stevenengler added Type: Bug Error or flaw producing unexpected results Component: Documentation In-repository documentation, under docs/ and removed Type: Bug Error or flaw producing unexpected results labels Jan 11, 2024

stevenengler mentioned this issue May 15, 2024

Add experimental stub_syscalls config option #3332

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decide policy on stub implementations (syscalls, sockopts, flags, etc) #3280

Decide policy on stub implementations (syscalls, sockopts, flags, etc) #3280

stevenengler commented Jan 11, 2024

cohosh commented Jan 12, 2024

robgjansen commented Jan 14, 2024

Decide policy on stub implementations (syscalls, sockopts, flags, etc) #3280

Decide policy on stub implementations (syscalls, sockopts, flags, etc) #3280

Comments

stevenengler commented Jan 11, 2024

cohosh commented Jan 12, 2024

robgjansen commented Jan 14, 2024