Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider exposing FUSE-specific errors from libfuse{2,3} #693

Closed
str4d opened this issue Aug 21, 2022 · 14 comments
Closed

Consider exposing FUSE-specific errors from libfuse{2,3} #693

str4d opened this issue Aug 21, 2022 · 14 comments

Comments

@str4d
Copy link

str4d commented Aug 21, 2022

fuse_mount_compat25 / fuse_session_mount currently return -1 on error, but provide no other information about what went wrong. If the cause is a system error, the caller can use errno to guess what went wrong, but otherwise the caller just has to abort.

My motivating example: if the mount options auto_unmount,allow_{root,other} are set but the user does not have user_allow_other set in their fuse.conf, the fusermount / fusermount3 binaries will print an error message to stderr but not set errno. This means that fuse_mount_compat25 / fuse_session_mount (which call fusermount / fusermount3 under the hood) will return -1, but if the caller tries to figure out what error occurred using errno, they will get 0 indicating success. The fuser Rust crate was doing precisely this, and reporting the error as Error: Success which is a very confusing error message. It was addressed by explicitly mapping errno = 0 to an unknown error type (cberner/fuser#178, cberner/fuser#218).

Perhaps instead the return value itself could be altered, based on whether the error is due to a FUSE-specific config issue or not? fuse_mount_compat25 / fuse_session_mount could be documented as returning -1 on an error from a system call (in which case the caller checks errno for more information), and value < -1 for FUSE-specific errors. The main issue would be how downstream users of libfuse would behave on receiving an error that is not -1 (i.e. if they consider success to be ret >= 0, or ret != -1).

@str4d
Copy link
Author

str4d commented Aug 21, 2022

My motivating example leads me to want to try mounting twice: once with auto_unmount,allow_root, and again without. But I only want to try this if the error was indeed due to the lack of user_allow_other in the config. An alternative would be for libfuse to provide an API for callers to query the current config (as it knows where the config file is that it will be listening to), so they can prepare the mount options correctly ahead of time (or provide error messages to their user that fit their native UX). If that seems like a better approach, I can open a separate issue for that.

@Nikratio
Copy link
Contributor

This is a great idea, thanks for taking the time to suggest it! Do you think you could work on this (or do you know someone who could)?

libfuse is fully volunteer driven. This means new features and bugfixes are implemented only when someone has a personal interest in them and therefore also does the necessary work. So if someone (that probably means you) is interested in working on this, then I'd be happy to give guidance and review patches.

@str4d
Copy link
Author

str4d commented Aug 22, 2022

Sure, I'd be happy to give this a go (in my own spare time). Lack of knowledge of the internals is definitely my weakness here (I've only ever used FUSE at a relatively high level), so guidance would be greatly appreciated.

Here's my current understanding of the APIs I'm interacting with:

If we want to go in the direction of uniformly reporting FUSE-specific errors, then we'd likely want to do so from other APIs as well. So the first step is probably to just walk the codebase and figure out what all the possible error conditions are (unless there is somewhere that already documents them). Then we can figure out the kinds of FUSE-specific errors we need to handle, and whether simply assigning them error codes is sufficient, or whether we just return e.g. -2 and define a separate errno-style API for reporting more verbose error information.

@bsbernd
Copy link
Collaborator

bsbernd commented Aug 22, 2022

Hmm, I have a concern regarding changing the return value from -1 to values below that - existing users might not expect that and check for == -1 instead of < 0. I guess it would be saver to set errno. An alternative might be to set the fuse error into errno values > XYZ (1000?) and then to add a new function const char *fuse_strerror(), which handles these values as well.

@str4d
Copy link
Author

str4d commented Aug 22, 2022

Yep, that's the same main issue I raised in the OP. It's always fine to take the errno approach; I just don't know if it's safe to set errno ourselves, or of we should use a separate global of our own in that case.

@bsbernd
Copy link
Collaborator

bsbernd commented Aug 22, 2022

You can assign errno yourself, but it might get overwritten by other calls - this is one of the issues in fuse_session_mount(). One way to solve that is to

int saved_errno = 0;
in rc = some_call();
if (rc != 0) {
    saved_errno = errno;
    some_error_call();
    goto out_err;
}


out_err:
    errno = saved_errno;
    return rc;

Introducing your own TLS error value also works, but users that missed fuse_strerror() would get surprising results as you noticed. On setting errno they would get correct results for standard error code and "unknown error" (I actually don't know what strerror(1000) exactly returns) for fuse error codes.

@str4d
Copy link
Author

str4d commented Aug 22, 2022

users that missed fuse_strerror() would get surprising results as you noticed.

Yep. That's why my original suggestion was to use other return values, so it was a transparent upgrade that could be easily leveraged. It would be nice if we could just do that instead of relying on out-of-band signaling.

How tricky would it be to survey the userbase of the mounting APIs (which is likely smaller than the userbase of the filesystem APIs) to see how they currently interpret return values?

  • In the case of libfuse2 the APIs are undocumented, but the returned value is a file descriptor, so the caller must check it is non-negative or subsequent usages will fail. For a user checking == -1, this would expose a bug in their code.
  • For the libfuse3 session APIs it's possible callers aren't using != 0 to check for errors (the fuser Rust crate is, at least), but it's at least documented that only 0 means success; even positive non-zero values could mean something else.

On setting errno they would get correct results for standard error code and "unknown error" (I actually don't know what strerror(1000) exactly returns) for fuse error codes.

If we can do so safely, I think we should set errno if we're going to return an error, just to avoid the "Error: Success" UX. I don't know if there's a defined code for "other" or "unknown error" though, and am wary to just trample over that namespace, hence my preference for either returning alternate values or defining an out-of-band error method.

@bsbernd
Copy link
Collaborator

bsbernd commented Aug 22, 2022

* In the case of `libfuse2` the APIs are undocumented, but the returned value is a file descriptor, so the caller _must_ check it is non-negative or subsequent usages will fail. For a user checking `== -1`, this would expose a bug in their code.

* For the `libfuse3` session APIs it's possible callers aren't using `!= 0` to check for errors (the `fuser` Rust crate is, at least), but it's at least documented that only `0` means success; even positive non-zero values could mean something else.

I won't switch my branch just for that, libfuse2 is deprecated anyway. master branch has include/fuse_lowlevel.h and that clearly states -1. Changing it would break API and ABI - you would need to handle that - I don't think that will be easy here, unless you want to introduce a function like fuse_session_mount2(). If you look at libfabric, its API functions take the expected version - there it is easier to do changes like those you are proposing, but libfuse does not have this at the moment and passing the API version can be rather confusing for the caller as well,

/**
 * Mount a FUSE file system.
 *
 * @param mountpoint the mount point path
 * @param se session object
 *
 * @return 0 on success, -1 on failure.
 **/
int fuse_session_mount(struct fuse_session *se, const char *mountpoint);

@str4d
Copy link
Author

str4d commented Aug 22, 2022

master branch has include/fuse_lowlevel.h and that clearly states -1. Changing it would break API and ABI

It would break the documented API, yes, but it would not break ABI (the method would still return int). The issue is that the documented @return 0 on success, -1 on failure leaves the entire rest of the int space undefined, so the question is how it could ever be used (if there was no intention to ever use it, then the return value could and should have just been bool, which has been present since C99, long before libfuse3 was released).

My suggestion amounts to:

/**
 * Mount a FUSE file system.
 *
 * @param mountpoint the mount point path
 * @param se session object
 *
 * @return 0 on success, -1 on system failure (in which case errno should be
 * checked for details), other negative values for FUSE-specific errors.
 **/
int fuse_session_mount(struct fuse_session *se, const char *mountpoint);

essentially replacing (-1, errno=0) with e.g. -2.

And in fact we can reason that this is a potentially safe (depending on how system APIs handle -1 as a file descriptor) change for the same reason it would be for libfuse2: fuse_session_fd returns -1 until fuse_session_mount returns 0, so a user checking fuse_session_mount() == -1 for errors, in the event that e.g. -2 was returned, would then attempt to use fuse_session_fd() as a file descriptor, and either immediately handle this error instead (if they check that fuse_session_fd() >= 0), or try and use -1 as a file descriptor and hit the same kind of bug as for libfuse2.

@bsbernd
Copy link
Collaborator

bsbernd commented Aug 22, 2022

It would break any existing binary that expects 0 or -1 as value. It is easy enough to change something like that in a small project, but in libfuse that has many users (nobody knows how many) - might introduce issues for a number of users. Which is why I'm suggesting to go the errno way, which would not break anything. At least I don't easily see what might get broken.

@str4d
Copy link
Author

str4d commented Aug 22, 2022

Which is why I'm suggesting to go the errno way, which would not break anything. At least I don't easily see what might get broken.

errno has many more users than libfuse, which is an even stronger argument for not using it to encode FUSE-specific errors. I note that the errno man page states:

The error numbers that correspond to each symbolic name vary across UNIX systems, and even across different architectures on Linux. Therefore, numeric values are not included as part of the list of error names below. The perror(3) and strerror(3) functions can be used to convert these names to corresponding textual error messages.

i.e. there is no guarantee that different platforms will agree on what errno numbers map to which errno error codes. This clearly means that trying to shoehorn FUSE-specific error codes into errno is unsafe.

So if altering the return values is not an option that the wider FUSE community would accept, then the only viable option is door number 3: add a FUSE-specific TLS error variable, and a new method to query its current value.

@str4d
Copy link
Author

str4d commented Aug 23, 2022

Did some digging into the code, and learned some more about how the architecture is set up. For my motivating example:

  • fuse_session_mount calls fuse_kern_mount.
    • fuse_kern_mount calls fuse_mount_sys.
      • fuse_mount_sys returns -2 because auto_unmount can only be handled by fusermount3.
    • fuse_kern_mount calls fuse_mount_fusermount.
      • fuse_mount_fusermount creates a Unix socket, and then forks.
        • The parent launches fusermount3, passing the file descriptor for one half of the Unix socket as an integer in the environment variable _FUSE_COMMFD.
          • fusermount3 parses a Unix socket file descriptor as an integer from the environment variable _FUSE_COMMFD.
          • fusermount3 attempts to create the FUSE mountpoint.
            • If any errors occur (such as allow_root not being allowed via the config), the error messages are printed to stderr and fusermount3 exits with status 1.
          • fusermount3 transfers the resulting file descriptor over the provided Unix socket via an SCM_RIGHTS socket message.
        • The child:
          • Listens on the other half of the Unix socket for an SCM_RIGHTS socket message, transferring the file descriptor of the FUSE mountpoint from fusermount3 to itself.
          • Returns the file descriptor.
    • fuse_kern_mount returns the file descriptor.
  • fuse_session_mount saves the file descriptor as se->fd, to be fetched later via fuse_session_fd.

The core point this workflow raises is that in order to correctly report all errors (or at least the error that is motivating me to work on this issue), we need to be able to pass not only a file descriptor out of fusermount3, but also an error code in the case of an error. This will require changes to the protocol between libfuse3 and fusermount3, but I cannot find any documentation for this protocol.

Is it safe to treat this protocol as internal, and make arbitrary changes to it? At a glance, I cannot see any evidence that the protocol is versioned in any way. The code that invokes fusermount3 does prefer the same install directory, which would have a high likelihood of fusermount3 having the same version as libfuse3. But failing that it will allow a binary from anywhere in the PATH, which suggests that the protocol between libfuse3 and fusermount3 potentially needs to handle unknown combinations of versioning.

If versioning is needed, and I'm correct that there is no current versioning, then we'd need to use two methods to establish bidirectional versioning:

  • libfuse3 would call fusermount3 --version and parse the output (eww) to determine whether the fusermount3 it will be talking to supports versioning.
  • libfuse3 versions that are version-aware would set a new environment variable that holds their version. Then fusermount3 versions that are version-aware would inspect that variable and switch on it for a versioned protocol, falling back on the original behaviour if the environment variable is unset.

@Nikratio
Copy link
Contributor

You can assume that libfuse3 and fusermount3 are always in sync.

@Nikratio
Copy link
Contributor

I'm closing this issue for now. Please note that this isn't meant to imply that you haven't found a real bug or worthwhile potential improvement - you most likely have and I'm grateful that you took the time to report it.

Unfortunately, this project does not currently have any active, regular contributors. As the maintainer, I try to review pull requests and make regular releases, but unfortunately I have no capacity to do significant development beyond that.
Issue reports that do not come with a pull request or clearly have high impact on a large number of users are therefore likely to languish.

I understand that this is frustrating for users, but I hope you can also understand that any development work that I do on this project has to compete with spending time with my family, doing work that I get paid for, doing something recreational without a computer, or working on features/bugs that affect me personally. Most bugs and ideas - unfortunately including this one - loose out in this competition.

In other words, unless you plan to work on this yourself or can recruit someone who will, it's unlikely that anyone is going to do anything about it anytime soon.

Still, you may wonder why I am closing the issue rather than keeping it open.

In short, I want the issue tracker to show the most important issues that users should be aware of and where prospective contributors could make the biggest difference. I do not think there is much value in using it as an exhaustive database of every idea or glitch that someone has ever encountered - especially if no one is intending to address/implement it.

For this reason, I am closing most issues when it becomes clear that they're unlikely to see any activity in the near future - and this seems to be the case here.

I understand that you have invested time and effort in reporting this, and I am very sorry that currently there is no way to build upon this. I wish the situation was different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants