Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFE] filter/split access logs by fields other than hostname #5978

Closed
bmarwell opened this issue Dec 13, 2023 · 6 comments · Fixed by #6082
Closed

[RFE] filter/split access logs by fields other than hostname #5978

bmarwell opened this issue Dec 13, 2023 · 6 comments · Fixed by #6082
Labels
feature ⚙️ New feature or request

Comments

@bmarwell
Copy link
Contributor

Hi!

Currently, I can create an access log like so:

    log {
        output file /opt/servers/server1/logs/caddy.access.log
    }

However, with many health checks coming in, I would redirect all known User-Agents for health checks to its own log file:

    log access_log {
        filter not @health-user-agents && not @health-url-paths
        output file /opt/servers/server1/logs/caddy.access.log
    }
    log health_check_log {
        filter @health-user-agents || @health-url-paths
        output file /opt/servers/server1/logs/caddy.access.health.log
    }

Of course this is pseudo syntax, but I think you get the idea.

For reference, in Apache httpd I can write something like this:

SetEnvIf Request_URI ".*\/health-check$" HealthPattern
# just add more here, they are or'd

CustomLog     /opt/servers/server1/logs/caddy.access.health.log    combined    env=HealthPattern
# and for access.log:  env!HealthPattern
@francislavoie francislavoie added the feature ⚙️ New feature or request label Dec 13, 2023
@francislavoie
Copy link
Member

francislavoie commented Dec 13, 2023

Interesting.

I can see a way we could do this. We'd need to adjust

func (slc ServerLogConfig) wrapLogger(logger *zap.Logger, host string) *zap.Logger {
to take the whole Request instead of just the host, then we could use GetVar() to get a special variable like access_logger_name or something like that which is an override for the logger name for the current context. The user could do something like:

@healthCheckLog `header_regexp('User-Agent': '^some-regexp$') || path('/healthz*')`
vars @healthCheckLog access_logger_name "health_check_log"

PRs welcome if you want to take a crack at it.

@bmarwell
Copy link
Contributor Author

Yup, exactly. For me, at least, it is not really high priority. And I would probably see whether other users +1'd this idea.

@armadi1809
Copy link
Contributor

@francislavoie Is this still up for grabs? I would like to take a crack at it.

@francislavoie
Copy link
Member

@armadi1809 yeah, still up for grabs.

Thing is though, I'm not 100% convinced that doing this is a good idea because it makes an already complex logging system even more complex. This functionality might be hard to document and explain.

Anyway this is sorta what I was envisioning (run caddy adapt -p on this config to see the adapted JSON, important for discussion here):

example.com {
	log health_check_log {
		output file /var/logs/caddy.access.health.log
	}
	log {
		output file /var/logs/caddy.access.log
	}

	@healthCheck `header_regexp('User-Agent': '^some-regexp$') || path('/healthz*')`
	handle @healthCheck {
		vars access_logger_name health_check_log
		respond "Healthy"
	}

	handle {
		respond "Hello World"
	}
}

This Caddyfile ☝️ produces sensible JSON config. The logger names only has log0 (the 2nd log without a name), and with the change in behaviour I mentioned earlier we could use that access_logger_name variable to change which logger gets written to for a given request.

The problem is that if you order the log blocks in the opposite order (the one with no name -> log0, and then health_check_log), then logger_names will have health_check_log set as the logger for all logs (because the last logger name for a given hostname wins). This is a foot-gun, because order of log in the config would matter. I don't like this.

We'd need to think of a way to tell the Caddyfile adapter to ignore a given logger (i.e. health_check_log) from being used for logger_names. This probably means adding a subdirective like no_hostname or something (name TBD) which excludes it so that logger_names always has log0. I can't think of a way for us to do this automatically in a way that doesn't break other usecases.

Also I took a look a the code, we'll need to move around the call to s.Logs.wrapLogger for the error routes, because that gets called before we execute the routes, it should be called after so that the request with the vars populated can be used, otherwise vars won't be filled yet.

@armadi1809
Copy link
Contributor

That makes sense. I guess I will start the implementation and try to keep you in the loop. We'll then decide if we end up rolling with it or not. Thank you!

@bmarwell
Copy link
Contributor Author

Thanks! Even if you did not implement it, no blocker for me. Just thought it might be useful for others as well.

Sorting out load balancer requests etc. Makes log files easier to browse locally. That's all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants