Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate and remove common_log field from access logs and single_field encoder #4148

Closed
mholt opened this issue May 6, 2021 · 12 comments
Closed
Assignees
Labels
discussion 💬 The right solution needs to be found
Milestone

Comments

@mholt
Copy link
Member

mholt commented May 6, 2021

When Caddy 2 was being designed a couple years ago, there was still a lot of dependence on common-log-based workflows. Common Log Format (CLF) is bad, because it contains fields which are almost never used in modern HTTP, and it omits obvious fields like Host. This necessitates outdated workflows that separate each log emission into a separate file based on Host. And if you want to organize your logs by more than Host, or something other than Host, it gets even more complicated. And this does not scale well on the modern web, where most servers handle multiple domains -- some Caddy instances serve tens of thousands of hosts. CLF is too rigid to be useful in advanced deployments. It is also inefficient to generate CLF in Caddy's otherwise-zero-allocation logs, and impossible to filter them at emission-time because they're unstructured.

We make the case for structured logging in our official documentation.

Caddy 2 uses JSON logs. To support CLF in Caddy 2, we construct the CLF line and add it to the JSON message emitted for each HTTP request as a separate field. This has the unfortunate side-effect of bloating each and every access log message, and also makes them harder to read/scan. It contains only duplicate information that has to be extracted to be parsed as a subtext anyway.

A log encoder module, single_field, was created as a hack to be able to write purely CLF-formatted log files without any JSON. This was a short-term solution, not a well-engineered one.

Nowadays, modules like the transform encoder exist which can emit access logs in a custom format. They're not zero-allocation of course, but you can do what you want: including CLF if you really need it still.

The common log format is a thorn in the side of Caddy's logs, and I intend to:

I already have considerable weight behind this motion, but in true OSS fashion I wanted to make my intentions clear and open it to discussion before finalizing these changes shortly after the v2.4 release.

@mholt mholt added in progress 🏃‍♂️ Being actively worked on discussion 💬 The right solution needs to be found labels May 6, 2021
@mholt mholt added this to the 2.x milestone May 6, 2021
@mholt mholt self-assigned this May 6, 2021
@francislavoie
Copy link
Member

francislavoie commented May 6, 2021

I agree with this change, but it will be a breaking change for some people who rely on it to feed into tools like fail2ban and such, which don't support structured logs yet.

I think before we remove it from Caddy, we should have a clear and easy way to maintain this behaviour. Maybe this means adding a preset to the formatted encoder to do the same thing.

Also, commonLogFormat does have {http.auth.user.id} but the JSON output doesn't -- we should probably make sure the JSON logs have that set if available (i.e. basicauth username).

We'll also need to add a common_log time format to encoders.go ZapcoreEncoderConfig() because right now it's using {time.now.common_log} but placeholders are not available to be used generally in the formatted encoder.

Edit: Actually the format-encoder module as-is would actually break when given the defaults (i.e. [<template>] omitted in the Caddyfile), because it relies on {common_log} to exist in the logs. So that needs to be redone first. /cc @mohammed90

mohammed90 added a commit to caddyserver/transform-encoder that referenced this issue May 7, 2021
The field `{common_log}` is being deprecated along with all of its reference (see: caddyserver/caddy#4148). This commit is the first step in rebuilding the CLF template to avoid breaking users of this module for the sake of reconstructing CLF.
mholt pushed a commit that referenced this issue Jul 14, 2021
mohammed90 added a commit to caddyserver/transform-encoder that referenced this issue Jul 26, 2021
* internalize the CLF template

The field `{common_log}` is being deprecated along with all of its reference (see: caddyserver/caddy#4148). This commit is the first step in rebuilding the CLF template to avoid breaking users of this module for the sake of reconstructing CLF.

* use remote_addr, which includes the port, in place of the remote host

BREAKING: users of CLF will start seeing the port along with the remote host
@francislavoie francislavoie modified the milestones: 2.x, v2.5.0 Aug 23, 2021
@francislavoie
Copy link
Member

This will likely happen in v2.5.0 #4282

@francislavoie
Copy link
Member

@mholt FYI, related issue we should possibly address #4054

francislavoie added a commit that referenced this issue Nov 1, 2021
Closes #4396, related to #4148

This is a breaking change, but we're already planning on removing  the`common_log` field as well at the same time, so might as well all do it together.

In general, the remote IP is the useful part of the remote address. The port is rarely useful, because it only identifies ephemeral information about the connection from the client. But we were logging both in one field, so certain tooling that would want to only get the remote IP would need to split it up. Splitting is non-trivial, because of IPv4, IPv6, shenanigans. So it's best if we split it up-front before logging. If the log consumer actually cares about the remote port, it can re-assemble it.
francislavoie added a commit that referenced this issue Nov 25, 2021
Closes #4396, related to #4148

This is a breaking change, but we're already planning on removing  the`common_log` field as well at the same time, so might as well all do it together.

In general, the remote IP is the useful part of the remote address. The port is rarely useful, because it only identifies ephemeral information about the connection from the client. But we were logging both in one field, so certain tooling that would want to only get the remote IP would need to split it up. Splitting is non-trivial, because of IPv4, IPv6, shenanigans. So it's best if we split it up-front before logging. If the log consumer actually cares about the remote port, it can re-assemble it.
francislavoie added a commit that referenced this issue Nov 29, 2021
Closes #4396, related to #4148

This is a breaking change, but we're already planning on removing  the`common_log` field as well at the same time, so might as well all do it together.

In general, the remote IP is the useful part of the remote address. The port is rarely useful, because it only identifies ephemeral information about the connection from the client. But we were logging both in one field, so certain tooling that would want to only get the remote IP would need to split it up. Splitting is non-trivial, because of IPv4, IPv6, shenanigans. So it's best if we split it up-front before logging. If the log consumer actually cares about the remote port, it can re-assemble it.
@francislavoie
Copy link
Member

This is now done and merged, and will land in v2.5.0

@AD-Wright
Copy link

I agree with this change, but it will be a breaking change for some people who rely on it to feed into tools like fail2ban and such, which don't support structured logs yet.

I think before we remove it from Caddy, we should have a clear and easy way to maintain this behaviour. Maybe this means adding a preset to the formatted encoder to do the same thing.

Is the intended way to interface fail2ban with Caddy 2.5+ really to compile it yourself with the transform plugin added? Or is there a different method to accomplish what fail2ban does with "vanilla" Caddy? I feel like rate limiting, blocking based on 4xx / 5xx responses, etc. are important parts of running a web server, and it feels like implementing this got harder with 2.5+, but maybe I'm looking at it all wrong.

@francislavoie
Copy link
Member

Rather, our opinion is that fail2ban should modernize, and support structured logging. Common Log is legacy.

Also, it's trivial to build Caddy with plugins. We make it very easy to do so.

You just need to install Go (which just means to download and unzip it, and add it to your PATH), then download and run xcaddy with the plugins you want specified.

Or, you can download a build produced by our build server at https://caddyserver.com/download. Or if you're using Docker, use the builder image variant which comes with xcaddy ready to use.

For rate limiting, you could also use this plugin: https://github.com/mholt/caddy-ratelimit

@mholt
Copy link
Member Author

mholt commented Oct 12, 2022

We had to decide long ago that Caddy couldn't be in the business of integrating with every program, service, provider, and protocol out-of-the-box. Sorry if that is inconvenient, but we make it as easy as possible to customize your builds. Otherwise you'd end up with a 500MB binary that is a nightmare to configure, maintain, and ironically, build (dependency hell).

And actually, ideally all that functionality you describe would just be provided by caddy directly, without needing external dependencies and C code. caddy has plugins for what you say you need, and it also has plugins to integrate with the tools you want. So, just take your pick, I guess.

@muety
Copy link

muety commented Oct 12, 2022

Frankly, I don't think there is a Caddy plugin to write iptables rules depending on number of failed requests per time, like fail2ban does, unless you come up with some overly complex scripts for use with caddy-exec plugin.

I also didn't really like the decision to remove the (old-fashioned or not) still most commonly used log format "over night" with a minor version update.

My fail2ban and a couple of other tools have stopped working, too. Some I fixed, for some I just accepted it. I didn't repair fail2ban, yet, but I'd assume you could probably quite easily write your filter regex for structured logging. For JSON logging it might be a bit harder. If you find something, or come up with your own filter, I think a lot of people will appreciate it, @AD-Wright!

@francislavoie
Copy link
Member

I also didn't really like the decision to remove the (old-fashioned or not) still most commonly used log format "over night" with a minor version update.

First, it was not "overnight", we warned that we would do this many months before doing it. Also, Caddy doesn't strictly follow semver. We have no plans for a Caddy v3 currently. Think of v2 as the product name, and "2.5" as the major version.

Second, it was entirely redundant data, and can be reproduced from the structured logs. For users that didn't need it, it was a waste of space in their logs. For those users, it can be considered a "bug" to be logging those redundant bytes.

Again, there's a totally sufficient workaround that exists by using the https://github.com/caddyserver/transform-encoder plugin which even has a Common Log preset to make it as easy as possible. Just use that.

@mholt
Copy link
Member Author

mholt commented Oct 12, 2022

@muety

Frankly, I don't think there is a Caddy plugin to write iptables rules depending on number of failed requests per time, like fail2ban does, unless you come up with some overly complex scripts for use with caddy-exec plugin.

That sounds like a great candidate for a Caddy plugin! I'm not really familiar with iptables (I've used it once or twice) so someone with chops should write it.

To be clear, CLF support has always been a misfeature, only implemented to enable a smoother transition from v1 to v2. Ever since the launch of v2 we've been recommending the transition to modern tools and formats that support more efficient and useful structured logs.

@AD-Wright
Copy link

Common Log is legacy.

True, but that's also why many programs meant to interface with a web server use it (and not the modern, structured logging). It's also a little puzzling from the user side that it even existed in v1 to begin with then! (But that's how it goes sometimes)

Also, it's trivial to build Caddy with plugins. We make it very easy to do so.
You just need to install Go (which just means to download and unzip it, and add it to your PATH), then download and run xcaddy with the plugins you want specified.
Or, you can download a build produced by our build server at https://caddyserver.com/download. Or if you're using Docker, use the builder image variant which comes with xcaddy ready to use.

I had no idea about https://caddyserver.com/download! One of my concerns was in actually compiling with xcaddy on the VPS it's running on - Caddy runs great on little-to no hardware, but compiling not so much. Being able to download a pre-compiled binary is great.

Also, something I struggled with (and figured out just now) is the necessity of compiling a "plugin" into the main program. I have zero familiarity with Go, but to me something that has to be compiled into a program is a "patch", not a "plugin". "Plugin" evokes just copying a file somewhere, à la Minecraft. It seems like terminology is migrating towards package or module, which I think makes more sense.

However, in going down this route a bit further, I noticed that Caddy also has the add-package command - this seems like the easiest way to me, why did you not recommend it? Is it likely to be removed in the future? If not, I see this as a seamless way to add a package, truly at "plugin" levels. It looks like it will keep the plugin updated with Caddy, as well. Perhaps the transform-encoder should be updated to include the caddy add-package github.com/caddyserver/transform-encoder installation method? (and just replacing instances of format single_field common_log with format transform "{common_log}" in the Caddyfile)

Otherwise you'd end up with a 500MB binary

Fair point. This seems like a very small addition, but I get that in order to have a really small base package the "fat-trimmers" have to be a little aggressive.

Looking at some other threads and @muety 's experience, it seems like perhaps the deprecation of CLF preceded the availability of some tools that really made the workaround... work. If plugins are intended to be as easy as caddy add-package <> going forward, I am really happy with this overall.

Finally, congrats on 100 releases! This project has really accomplished a lot.

@mholt
Copy link
Member Author

mholt commented Oct 13, 2022

Thanks! The community has been awesome to get us to this point. And thanks for understanding :)

crazygolem added a commit to crazygolem/dotfiles that referenced this issue Jun 4, 2023
The Commons Log format was dropped in caddy 2.5, cf. [1].

[1]: caddyserver/caddy#4148
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion 💬 The right solution needs to be found
Projects
None yet
Development

No branches or pull requests

4 participants