[RFE] filter/split access logs by fields other than hostname #5978

bmarwell · 2023-12-13T07:58:49Z

Hi!

Currently, I can create an access log like so:

    log {
        output file /opt/servers/server1/logs/caddy.access.log
    }

However, with many health checks coming in, I would redirect all known User-Agents for health checks to its own log file:

    log access_log {
        filter not @health-user-agents && not @health-url-paths
        output file /opt/servers/server1/logs/caddy.access.log
    }
    log health_check_log {
        filter @health-user-agents || @health-url-paths
        output file /opt/servers/server1/logs/caddy.access.health.log
    }

Of course this is pseudo syntax, but I think you get the idea.

For reference, in Apache httpd I can write something like this:

SetEnvIf Request_URI ".*\/health-check$" HealthPattern
# just add more here, they are or'd

CustomLog     /opt/servers/server1/logs/caddy.access.health.log    combined    env=HealthPattern
# and for access.log:  env!HealthPattern

The text was updated successfully, but these errors were encountered:

francislavoie · 2023-12-13T09:07:21Z

Interesting.

I can see a way we could do this. We'd need to adjust

caddy/modules/caddyhttp/logging.go

Line 61 in 7d919af

    
           func (slc ServerLogConfig) wrapLogger(logger *zap.Logger, host string) *zap.Logger {

to take the whole Request instead of just the host, then we could use GetVar() to get a special variable like access_logger_name or something like that which is an override for the logger name for the current context. The user could do something like:

@healthCheckLog `header_regexp('User-Agent': '^some-regexp$') || path('/healthz*')`
vars @healthCheckLog access_logger_name "health_check_log"

PRs welcome if you want to take a crack at it.

bmarwell · 2023-12-13T12:08:58Z

Yup, exactly. For me, at least, it is not really high priority. And I would probably see whether other users +1'd this idea.

armadi1809 · 2024-01-17T16:18:30Z

@francislavoie Is this still up for grabs? I would like to take a crack at it.

francislavoie · 2024-01-17T18:12:14Z

@armadi1809 yeah, still up for grabs.

Thing is though, I'm not 100% convinced that doing this is a good idea because it makes an already complex logging system even more complex. This functionality might be hard to document and explain.

Anyway this is sorta what I was envisioning (run caddy adapt -p on this config to see the adapted JSON, important for discussion here):

example.com {
	log health_check_log {
		output file /var/logs/caddy.access.health.log
	}
	log {
		output file /var/logs/caddy.access.log
	}

	@healthCheck `header_regexp('User-Agent': '^some-regexp$') || path('/healthz*')`
	handle @healthCheck {
		vars access_logger_name health_check_log
		respond "Healthy"
	}

	handle {
		respond "Hello World"
	}
}

This Caddyfile ☝️ produces sensible JSON config. The logger names only has log0 (the 2nd log without a name), and with the change in behaviour I mentioned earlier we could use that access_logger_name variable to change which logger gets written to for a given request.

The problem is that if you order the log blocks in the opposite order (the one with no name -> log0, and then health_check_log), then logger_names will have health_check_log set as the logger for all logs (because the last logger name for a given hostname wins). This is a foot-gun, because order of log in the config would matter. I don't like this.

We'd need to think of a way to tell the Caddyfile adapter to ignore a given logger (i.e. health_check_log) from being used for logger_names. This probably means adding a subdirective like no_hostname or something (name TBD) which excludes it so that logger_names always has log0. I can't think of a way for us to do this automatically in a way that doesn't break other usecases.

Also I took a look a the code, we'll need to move around the call to s.Logs.wrapLogger for the error routes, because that gets called before we execute the routes, it should be called after so that the request with the vars populated can be used, otherwise vars won't be filled yet.

armadi1809 · 2024-01-17T18:33:13Z

That makes sense. I guess I will start the implementation and try to keep you in the loop. We'll then decide if we end up rolling with it or not. Thank you!

bmarwell · 2024-01-19T08:30:51Z

Thanks! Even if you did not implement it, no blocker for me. Just thought it might be useful for others as well.

Sorting out load balancer requests etc. Makes log files easier to browse locally. That's all.

francislavoie added the feature ⚙️ New feature or request label Dec 13, 2023

armadi1809 mentioned this issue Feb 4, 2024

logging: Add support for additional logger filters other than hostname #6082

Merged

francislavoie closed this as completed in #6082 May 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFE] filter/split access logs by fields other than hostname #5978

[RFE] filter/split access logs by fields other than hostname #5978

bmarwell commented Dec 13, 2023

francislavoie commented Dec 13, 2023 •

edited

bmarwell commented Dec 13, 2023

armadi1809 commented Jan 17, 2024

francislavoie commented Jan 17, 2024

armadi1809 commented Jan 17, 2024

bmarwell commented Jan 19, 2024

[RFE] filter/split access logs by fields other than hostname #5978

[RFE] filter/split access logs by fields other than hostname #5978

Comments

bmarwell commented Dec 13, 2023

francislavoie commented Dec 13, 2023 • edited

bmarwell commented Dec 13, 2023

armadi1809 commented Jan 17, 2024

francislavoie commented Jan 17, 2024

armadi1809 commented Jan 17, 2024

bmarwell commented Jan 19, 2024

francislavoie commented Dec 13, 2023 •

edited