Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing-journald ignores EMSGSIZE from send() #1698

Closed
swsnr opened this issue Nov 3, 2021 · 20 comments · Fixed by #1744
Closed

tracing-journald ignores EMSGSIZE from send() #1698

swsnr opened this issue Nov 3, 2021 · 20 comments · Fixed by #1744

Comments

@swsnr
Copy link
Contributor

swsnr commented Nov 3, 2021

Bug Report

As far as I can see the current implementation of tracing-journal entirely ignores the error returned from .send():

        // What could we possibly do on error?
        #[cfg(unix)]
        let _ = self.socket.send(&buf);

However, according to "Basics" in the systemd protocol documentation socket datagrams are subject to size limits. If a datagram exceeds the message size socket.send will return an EMSGSIZE error. In this case journal clients should write the payload to a sealed memfd instead and send that memfd to journald.

tracing-journald doesn't, so it may silently loose messages. On my system the limit appears to be about 213Kb according to /proc/sys/net/core/wmem_max; I say "appears" because I'm not entirely sure that this /proc file is definitely relevant. In any case the limit seems to be system-specific so I don't think tracing can generally assume that "reasonably-sized" messages fit into a single datagram. And I don't think tracing should risk loosing messages.

I can make a pull request (mostly copying from the working impl in libsystemdrs), but memfds aren't in the standard library and require either a lot of unsafe libc code or a nix dependency, so I'm not sure what the proper course of action is here.

Cheers,
Basti

@hawkw
Copy link
Member

hawkw commented Nov 3, 2021

I can make a pull request (mostly copying from the working impl in libsystemdrs), but memfds aren't in the standard library and require either a lot of unsafe libc code or a nix dependency, so I'm not sure what the proper course of action is here.

Thanks for offering to open a PR to fix this! <3 I think adding a dependency on the nix crate for this would be fine, but I'd also ask @Ralith, the original tracing-journald author, for his opinion.

@Ralith
Copy link
Collaborator

Ralith commented Nov 3, 2021

nix is a surprisingly heavyweight, broadly-scoped dependency that is usually avoidable; I would strongly prefer that it not be added, even if that means some unsafe libc, which I think will be fairly limited in scope. IMO one of the big upsides of tracing-journald as it stands is that it is so lightweight.

We could also consider logging an error on overflow, but actually passing the data through would be nicer.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 3, 2021

Oh, I'm sorry, I wasn't aware that nix is considered heavyweight 🤔

Out of curiosity, and without contesting your words (I am not very familiar with nix after all), may I ask why is that? As far as I can see nix itself has only a few small and trivial dependencies on its own. It is the number of source files and the build time I presume?


In any case I'll try and make a pull request with plain libc, but it'll take a bit longer then I'm afraid, for I just noticed that libc doesn't wrap the memfd_create function. I'll need to call the underlying syscall directly which is very much unknown territory for me.

Also, can I add some integration tests for this crate along the way and have them run on CI? I've written some for libsystemd-rs, which I could mostly just copy; I'd just like to have some safety net when entering unsafe territory 😊

@hawkw
Copy link
Member

hawkw commented Nov 3, 2021

Also, can I add some integration tests for this crate along the way and have them run on CI?

please do, if it's not too much trouble! the one concern might be ensuring that appropriate dependencies are present on CI if you need to e.g. use journalctl to confirm that messages are actually present in the journal, but if you have the time to figure that out, any tests you want to add would be great!

@Ralith
Copy link
Collaborator

Ralith commented Nov 3, 2021

On review nix isn't as bad as I recalled (1.6s debug, 4.4s release on my machine, not counting the commonplace deps), but it's still orders of magnitude heavier than tracing-journald itself for dubious benefit, unless I'm vastly underestimating the complexity required here. It's also one more thing that has to be kept up to date lest duplicate versions proliferate through everyone's depgraph.

Invoking syscalls by hand isn't too difficult, but the barrier to adding new bindings to libc is very low as well. Thanks for working on this!

@hawkw
Copy link
Member

hawkw commented Nov 3, 2021

I think it would be preferable to add a new binding in the libc crate, and use that, if it's not too much more work compared to just making the syscall ourselves...

@Ralith
Copy link
Collaborator

Ralith commented Nov 3, 2021

Certainly less work in aggregate but a bit more waiting for them to kick a release out.

@hawkw
Copy link
Member

hawkw commented Nov 3, 2021

Yeah...well, we can always do it ourselves with the intention of removing it once libc adds a binding.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 4, 2021

Oh, this escalated quickly 😔. I can make an implementation without nix for sure, but upstreaming a libc change is an extra mile I was not really prepared for.

I'm not sure I'd like to do this, less so since there's already a merge request for this at libc that went nowhere over some Android issues that I'm entirely unfamiliar with. I'd prefer if I could focus my attention on this crate first 🙂

@Ralith
Copy link
Collaborator

Ralith commented Nov 4, 2021

A raw syscall is fine with me; we can always swap it out. I'll try to nudge libc forward in the mean time.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 4, 2021

Thanks 🙏

Just one final question from my side: Would you prefer a separate MR for the integration tests, or an all in one MR with tests and this change?

I don't mind either way, so whatever works best for you 🙂

@hawkw
Copy link
Member

hawkw commented Nov 4, 2021

Just one final question from my side: Would you prefer a separate MR for the integration tests, or an all in one MR with tests and this change?

I'd prefer separate PRs if that's practical. It would be okay for the EMSGSIZE change to be in a branch that's based on the branch adding integration tests (rather than on the master branch) if you add additional integration tests in that branch. But, I can live with it if you end up doing everything in one big PR, too.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 4, 2021

@hawkw Alright. I opened #1709 for the integration tests.

@Ralith
Copy link
Collaborator

Ralith commented Nov 6, 2021

The libc change at rust-lang/libc#2069 has now been merged.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 6, 2021

Cool.

@Ralith
Copy link
Collaborator

Ralith commented Nov 9, 2021

And released in 0.2.107 🎉

hawkw pushed a commit that referenced this issue Nov 9, 2021
Per discussion with @hawkw in #1698 I'm adding a few simple integration
tests for the journald subscriber, to have some safety net when
implementing the actual issue in #1698.

These tests send messages of various complexity to the journal, and then
use `journalctl`'s JSON output to get them back out, to check whether
the message arrives in the systemd journal as it was intended to.

## Motivation

Increase test coverage for the journald subscriber and codify a known
good state before approaching a fix for #1698.
@swsnr
Copy link
Contributor Author

swsnr commented Nov 9, 2021

Nice 🙂 Thanks for pushing this.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 11, 2021

Turns out though that the real pain is passing the file descriptor over the socket; send_vectored_with_ancillary is unstable API, and the raw C API is a nightmare to use. I'll try to fiddle around with it, but it may take me a while to get it working, and it'll a nasty little chunk of weird unsafe code.

@Ralith
Copy link
Collaborator

Ralith commented Nov 11, 2021

cmsgs are wacky for sure, but the special case of sending exactly one message of hardcoded type shouldn't be too bad. https://github.com/quinn-rs/quinn/blob/main/quinn-udp/src/cmsg.rs may be an interesting reference.

@swsnr
Copy link
Contributor Author

swsnr commented Nov 23, 2021

I've opened #1744; I'm sorry that it took a bit longer than expected, but I had to fiddle a bit with this cmsg stuff 😇

hawkw pushed a commit that referenced this issue Nov 30, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
hawkw pushed a commit that referenced this issue Dec 2, 2021
Per discussion with @hawkw in #1698 I'm adding a few simple integration
tests for the journald subscriber, to have some safety net when
implementing the actual issue in #1698.

These tests send messages of various complexity to the journal, and then
use `journalctl`'s JSON output to get them back out, to check whether
the message arrives in the systemd journal as it was intended to.

## Motivation

Increase test coverage for the journald subscriber and codify a known
good state before approaching a fix for #1698.
hawkw pushed a commit that referenced this issue Dec 2, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
hawkw pushed a commit that referenced this issue Dec 3, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
hawkw pushed a commit that referenced this issue Dec 19, 2021
Per discussion with @hawkw in #1698 I'm adding a few simple integration
tests for the journald subscriber, to have some safety net when
implementing the actual issue in #1698.

These tests send messages of various complexity to the journal, and then
use `journalctl`'s JSON output to get them back out, to check whether
the message arrives in the systemd journal as it was intended to.

## Motivation

Increase test coverage for the journald subscriber and codify a known
good state before approaching a fix for #1698.
hawkw pushed a commit that referenced this issue Dec 19, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
hawkw pushed a commit that referenced this issue Dec 19, 2021
Per discussion with @hawkw in #1698 I'm adding a few simple integration
tests for the journald subscriber, to have some safety net when
implementing the actual issue in #1698.

These tests send messages of various complexity to the journal, and then
use `journalctl`'s JSON output to get them back out, to check whether
the message arrives in the systemd journal as it was intended to.

## Motivation

Increase test coverage for the journald subscriber and codify a known
good state before approaching a fix for #1698.
hawkw pushed a commit that referenced this issue Dec 19, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
hawkw pushed a commit that referenced this issue Dec 20, 2021
Per discussion with @hawkw in #1698 I'm adding a few simple integration
tests for the journald subscriber, to have some safety net when
implementing the actual issue in #1698.

These tests send messages of various complexity to the journal, and then
use `journalctl`'s JSON output to get them back out, to check whether
the message arrives in the systemd journal as it was intended to.

## Motivation

Increase test coverage for the journald subscriber and codify a known
good state before approaching a fix for #1698.
hawkw pushed a commit that referenced this issue Dec 20, 2021
See #1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes #1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in #1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
kaffarell pushed a commit to kaffarell/tracing that referenced this issue May 22, 2024
See tokio-rs#1698: Properly write large payloads to journal.

I'd appreciate a very careful review; this cmsg stuff is nasty, and
while it's well documented in `cmsg(3)` I had to fiddle a bit because
the corresponding functions in libc aren't const and thus don't permit a
direct allocation of the buffer as most `cmsg` C code around does.

Closes tokio-rs#1698

## Motivation

Linux limits the maximum amount of data permitted for a single Unix
datagram; sending large payloads directly will fail.

## Solution

Follow systemd.io/JOURNAL_NATIVE_PROTOCOL/ and check for `EMSGSIZE` from
`send()`; in this case write the payload to a memfd, seal it, and pass
it on to journald via a corresponding SCM_RIGHTS control message.

Per discussion in tokio-rs#1698 this adds no dependency on `nix`, and instead
implements fd forwarding directly with some bits of unsafe `libc` code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants