Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: Add design doc #4809

Merged
merged 90 commits into from
Feb 9, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
90 commits
Select commit Hold shift + click to select a range
993ebd8
log: Add design doc
pellared Jan 2, 2024
1f35f95
[chore] Fix lint-markdown target
pellared Jan 2, 2024
524032c
[chore] Remove dead hyperlinks
pellared Jan 2, 2024
f095e0f
[chore] Remove dead hyperlink
pellared Jan 2, 2024
87f03a4
Refine docs
pellared Jan 3, 2024
6d1481c
Fix LoggerProvider snippet
pellared Jan 3, 2024
49826ff
Add hyperlink to prototype
pellared Jan 3, 2024
efe740e
Update log/DESIGN.md
pellared Jan 5, 2024
b7a4302
Merge branch 'main' into logs-bridge-part-1
pellared Jan 8, 2024
4137781
Merge branch 'main' into logs-bridge-part-1
pellared Jan 9, 2024
3113402
Add a reference to the prototype in benchmakrs section
pellared Jan 9, 2024
90b3857
Clarify sync.Pool usage
pellared Jan 9, 2024
cbc3a94
Update log/DESIGN.md
pellared Jan 9, 2024
4588071
log: Add design doc
pellared Jan 2, 2024
ba7bc62
[chore] Fix lint-markdown target
pellared Jan 2, 2024
b9c0fc9
[chore] Remove dead hyperlinks
pellared Jan 2, 2024
8cebce5
[chore] Remove dead hyperlink
pellared Jan 2, 2024
c66a91b
Refine docs
pellared Jan 3, 2024
a37b716
Fix LoggerProvider snippet
pellared Jan 3, 2024
0e82c0f
Add hyperlink to prototype
pellared Jan 3, 2024
8b45561
Update log/DESIGN.md
pellared Jan 5, 2024
621be00
Add a reference to the prototype in benchmakrs section
pellared Jan 9, 2024
2f8c6e4
Clarify sync.Pool usage
pellared Jan 9, 2024
c20bcc2
Update log/DESIGN.md
pellared Jan 9, 2024
2f0c54d
Make design decision more explicit
pellared Jan 9, 2024
c503b48
Attributes are defined as a slice
pellared Jan 9, 2024
57b5bdd
Define SeverityNumber
pellared Jan 9, 2024
61f9683
Define LoggerOptions
pellared Jan 10, 2024
fa72f4c
Define noop package
pellared Jan 10, 2024
e126afa
Define all parameters
pellared Jan 10, 2024
e525210
Clarify Record.Body
pellared Jan 10, 2024
ec43527
Document Record.StructuredBody alternative
pellared Jan 10, 2024
917a9a2
Fix typos
pellared Jan 10, 2024
8a33bb4
Merge branch 'main' into logs-bridge-part-1
pellared Jan 10, 2024
2cd7cae
Fix typo
pellared Jan 10, 2024
20ecb0e
Merge branch 'logs-bridge-part-1' of https://github.com/pellared/open…
pellared Jan 10, 2024
51a9bb7
Apply suggestions from code review
pellared Jan 10, 2024
3f25db7
Clarify trace context correlation
pellared Jan 10, 2024
b7a96bc
Merge branch 'main' into logs-bridge-part-1
pellared Jan 11, 2024
8d0da66
Merge branch 'main' into logs-bridge-part-1
pellared Jan 16, 2024
363accf
Merge branch 'main' into logs-bridge-part-1
pellared Jan 16, 2024
90dfb5e
Reference Displaying Severity recommendation
pellared Jan 16, 2024
0cd0677
Add Severity[Level]1 consts
pellared Jan 17, 2024
955ccb8
Simplify severity const block
pellared Jan 17, 2024
d3c0af9
Merge branch 'main' into logs-bridge-part-1
pellared Jan 18, 2024
35ff8f7
Record attributes based on slog.Record
pellared Jan 18, 2024
68986f6
Rejected proposal: Severity type encapsulating number and text
pellared Jan 18, 2024
33bea2e
Merge branch 'main' into logs-bridge-part-1
pellared Jan 18, 2024
60cb10a
The caller must not subsequently mutate the record passed to Emit
pellared Jan 18, 2024
544b371
Refine implementation requirements
pellared Jan 18, 2024
655a44a
Fix grammar
pellared Jan 18, 2024
23dc70d
Merge branch 'main' into logs-bridge-part-1
pellared Jan 19, 2024
bd23513
Update DESIGN.md
pellared Jan 24, 2024
516bc0c
Update DESIGN.md
pellared Jan 24, 2024
a9aeba4
Update DESIGN.md
pellared Jan 24, 2024
c570d87
Merge branch 'main' into logs-bridge-part-1
pellared Jan 24, 2024
cdb4b30
Fix typo
pellared Jan 24, 2024
1b802bd
Update footnotes
pellared Jan 24, 2024
81f9953
Merge branch 'main' into logs-bridge-part-1
pellared Jan 24, 2024
11ac173
Merge branch 'main' into logs-bridge-part-1
pellared Jan 25, 2024
249ecb6
Pass by value
pellared Jan 26, 2024
499da4e
Record with pointer receivers only
pellared Jan 26, 2024
3b526cc
Fix typo
pellared Jan 26, 2024
16a5fbf
Value has As[Kind] methods
pellared Jan 26, 2024
6a47a3c
On fmt.Stringer
pellared Jan 26, 2024
0556320
Merge branch 'main' into logs-bridge-part-1
pellared Jan 29, 2024
ce4751b
Merge branch 'main' into logs-bridge-part-1
pellared Jan 30, 2024
3744193
Update DESIGN.md
pellared Jan 30, 2024
342ebd8
Update DESIGN.md
pellared Jan 30, 2024
f18ce5c
Update DESIGN.md
pellared Jan 30, 2024
67aa37d
Merge branch 'main' into logs-bridge-part-1
pellared Feb 1, 2024
baaa596
Remove KeyValue.Invalid
pellared Feb 1, 2024
9c513c6
Add comments to value snippet
pellared Feb 1, 2024
9ab4546
Update DESIGN.md
pellared Feb 1, 2024
adaab51
Merge branch 'main' into logs-bridge-part-1
pellared Feb 1, 2024
c6e46ff
Fixes
pellared Feb 2, 2024
3f0f1bf
Rewrite Record.Body as any
pellared Feb 2, 2024
bda50d1
Use any instead of defining Value
pellared Feb 2, 2024
82d506b
Update Record attributes as slice
pellared Feb 2, 2024
98bdf81
Fix typo
pellared Feb 2, 2024
ce1dc3b
application -> bridge implementation
pellared Feb 2, 2024
0a4a05e
Merge branch 'main' into logs-bridge-part-1
pellared Feb 6, 2024
ad00cc8
Remove addressed issues
pellared Feb 7, 2024
16c316e
Add Add XYZ method to Logger rejected proposal
pellared Feb 7, 2024
fe8362c
Remove Record.Clone as it is not necessary
pellared Feb 7, 2024
7571081
Rejected proposal: Rename KeyValue to Attr
pellared Feb 7, 2024
b08fead
Fix typo
pellared Feb 7, 2024
a459c58
Merge branch 'main' into logs-bridge-part-1
pellared Feb 7, 2024
f0b9123
Merge branch 'main' into logs-bridge-part-1
pellared Feb 7, 2024
f72a6ed
Merge branch 'main' into logs-bridge-part-1
pellared Feb 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -315,4 +315,4 @@ add-tags: | $(MULTIMOD)

.PHONY: lint-markdown
lint-markdown:
docker run -v "$(CURDIR):$(WORKDIR)" docker://avtodev/markdown-lint:v1 -c $(WORKDIR)/.markdownlint.yaml $(WORKDIR)/**/*.md
docker run -v "$(CURDIR):$(WORKDIR)" avtodev/markdown-lint:v1 -c $(WORKDIR)/.markdownlint.yaml $(WORKDIR)/**/*.md
301 changes: 301 additions & 0 deletions log/DESIGN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,301 @@
# Logs Bridge API

## Abstract

`go.opentelemetry.io/otel/log` provides
[Logs Bridge API](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/).

The initial version of the design and the prototype
was created in [#4725](https://github.com/open-telemetry/opentelemetry-go/pull/4725).

## Background

The key challenge is to create a well-performant API compliant with the specification.
pellared marked this conversation as resolved.
Show resolved Hide resolved
pellared marked this conversation as resolved.
Show resolved Hide resolved
pellared marked this conversation as resolved.
Show resolved Hide resolved
Performance is seen as one of the most important characteristics of logging libraries in Go.

## Design

This proposed design aims to:

- be specification compliant,
- be similar to Trace and Metrics API,
- take advantage of both OpenTelemetry and `slog` experience to achieve acceptable performance.

### Module structure

The Go module consits of the following packages:
pellared marked this conversation as resolved.
Show resolved Hide resolved

- `go.opentelemetry.io/otel/log`
- `go.opentelemetry.io/otel/log/embedded`
- `go.opentelemetry.io/otel/log/noop`
pellared marked this conversation as resolved.
Show resolved Hide resolved

The module name is compliant with
[Artifact Naming](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/bridge-api.md#artifact-naming)
and the package structure is the same as for Trace API and Metrics API.

### LoggerProvider

The [`LoggerProvider` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#loggerprovider)
is defined as an interface in `provider.go`.
pellared marked this conversation as resolved.
Show resolved Hide resolved

pellared marked this conversation as resolved.
Show resolved Hide resolved
### Logger

The [`Logger` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#logger)
is defined as an interface in `logger.go`.
pellared marked this conversation as resolved.
Show resolved Hide resolved

Canceling the context pass to `Emit` should not affect record processing.
Among other things, log messages may be necessary to debug a
cancellation-related problem.
The context is used to pass request-scoped values.
The API implementation should handle the trace context passed
in `ctx` to the `Emit` method.

### Record

The [`LogRecord` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#logger)
is defined as a struct in `record.go`.

## Usage examples

### Log Bridge implementation

The log bridges can use [`sync.Pool`](https://pkg.go.dev/sync#Pool)
for reducing the number of allocations when passing attributes.
pellared marked this conversation as resolved.
Show resolved Hide resolved

The bridge implementation should do its best to pass
the `ctx` containing the trace context from the caller
so it can later passed via `Emit`.
Re-constructing a `context.Context` with [`trace.ContextWithSpanContext`](https://pkg.go.dev/go.opentelemetry.io/otel/trace#ContextWithSpanContext)
and [`trace.NewSpanContext`](https://pkg.go.dev/go.opentelemetry.io/otel/trace#NewSpanContext)
would usually involve more memory allocations.
pellared marked this conversation as resolved.
Show resolved Hide resolved
MrAlias marked this conversation as resolved.
Show resolved Hide resolved

The logging libraries which have recording methods that accepts `context.Context`,
such us [`slog`](https://pkg.go.dev/log/slog),
[`logrus`](https://pkg.go.dev/github.com/sirupsen/logrus)
[`zerolog`](https://pkg.go.dev/github.com/rs/zerolog),
makes passing the trace context trivial.

However, some libraries do not accept a `context.Context` in their recording methods.
Structured logging libraries,
such as [`logr`](https://pkg.go.dev/github.com/go-logr/logr)
and [`zap`](https://pkg.go.dev/go.uber.org/zap),
offer passing `any` type as a log attribute/field.
Therefore, their bridge implementations can define a "special" log attributes/field
that will be used to capture the trace context.

[The prototype](https://github.com/open-telemetry/opentelemetry-go/pull/4725)
has a naive implementation of
[slog.Handler](https://pkg.go.dev/log/slog#Handler) in `log/internal/slog.go`
and [logr.LogSink](https://pkg.go.dev/github.com/go-logr/logr#LogSink) in `log/internal/logr.go`.

### Direct API usage

The users may also chose to use the API directly.

```go
package app

var logger = otel.Logger("my-service")

// In some function:
logger.Emit(ctx, Record{Severity: log.SeverityInfo, Body: "Application started."})
```

### API implementation

If the implementation processes the record asynchronously,
then it has to copy record attributes,
in order to avoid use after free bugs and race condition.

Excerpt of how SDK can implement the `Logger` interface.

```go
type Logger struct {
scope instrumentation.Scope
processor Processor
}

func (l *Logger) Emit(ctx context.Context, r log.Record) {
// Create log record model.
record, err := toModel(r)
if err != nil {
otel.Handle(err)
return
}
l.processor.Process(ctx, record) // Note: A batch processor copies the attributes.
}
```

A test implementation of the the `Logger` interface
used for benchmarking is in `internal/writer_logger.go`.

## Compatibility

The backwards compatibility is achieved using the `embedded` design pattern
that is already used in Trace API and Metrics API.

Additionally, the `Logger.Emit` functionality can be extended by
adding new exported fields to the `Record` struct.

## Benchmarking

The benchmarks take inspiration from [`slog`](https://pkg.go.dev/log/slog),
because for the Go team it was also critical to create API that would be fast
and interoperable with existing logging packages.[^1][^2]
pellared marked this conversation as resolved.
Show resolved Hide resolved

## Rejected Alternatives

### Reuse slog

The API must not be coupled to [`slog`](https://pkg.go.dev/log/slog),
nor any other logging library.

The API needs to evolve orthogonally to `slog`.

`slog` is not compliant with the [Logs Bridge API](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/).
and we cannot expect the Go team to make `slog` compliant with it.

The interoperabilty can be achieved using [a log bridge](https://opentelemetry.io/docs/specs/otel/glossary/#log-appender--bridge).

You can read more about OpenTelemetry Logs design on [opentelemetry.io](https://opentelemetry.io/docs/concepts/signals/logs/).

### Record as interface

`Record` is defined as a `struct` because of the following reasons.

Log record is a value object without any behavior.
It is used as data input for Logger methods.

The log record resembles the instrument config structs like [metric.Float64CounterConfig](https://pkg.go.dev/go.opentelemetry.io/otel/metric#Float64CounterConfig).

Using `struct` instead of `interface` should have better the performance as e.g.
indirect calls are less optimized,
usage of interfaces tend to increase heap allocations.[^2]

The `Record` design is inspired by [`slog.Record`](https://pkg.go.dev/log/slog#Record).

### Options as parameter to Logger.Emit

One of the initial ideas was to have:

```go
type Logger interface{
embedded.Logger
Emit(ctx context.Context, options ...RecordOption)
}
```

The main reason was that design would be similar
to the [Meter API](https://pkg.go.dev/go.opentelemetry.io/otel/metric#Meter)
for creating instruments.

However, passing `Record` directly, instead of using options,
is more performant as it reduces heap allocations.[^3]

Another advantage of passing `Record` is that API would not have functions like `NewRecord(options...)`,
which would be used by the SDK and not by the users.

At last, the definition would be similar to [`slog.Handler.Handle`](https://pkg.go.dev/log/slog#Handler)
that was designed to provide optimization opportunities.[^1]

### Passing record as pointer to Logger.Emit

So far the benchmarks do not show differences that would
favor passing the record via pointer (and vice versa).

Passing via value feels safer because of the following reasons.

It follows the design of [`slog.Handler`](https://pkg.go.dev/log/slog#Handler).

It should reduce the possibility of a heap allocation.

The user would not be able to pass `nil`.
Therefore, it reduces the possiblity to have a nil pointer dereference.

### Passing struct as parameter to LoggerProvider.Logger

Similarly to `Logger.Emit`, we could have something like:

```go
type Logger interface{
embedded.Logger
Logger(name context.Context, config LoggerConfig)
}
```

The drawback of this idea would be that this would be
a different design from Trace and Metrics API.

The performance of acquiring a logger is not as critical
as the performance of emitting a log record. While a single
HTTP/RPC handler could write hundreds of logs, it should not
create a new logger for each log entry.
The application should reuse loggers whenever possible.

### Logger.WithAttributes

We could add `WithAttributes` to the `Logger` interface.
Then `Record` could be a simple struct with only exported fields.
The idea was that the SDK would implement the performance improvements
instead of doing it in the API.
This would allow having different optimisation strategies.
pellared marked this conversation as resolved.
Show resolved Hide resolved

During the analysis[^4], it occurred that the main problem of this proposal
is that the variadic slice passed to an interface method is always heap allocated.

Moreover, the logger returned by `WithAttribute` was allocated on the heap.

At last, the proposal was not specification compliant.
pellared marked this conversation as resolved.
Show resolved Hide resolved

### Record attributes like in slog.Record

To reduce the number of allocations of the attributes,
the `Record` could be modeled similarly to [`slog.Record`](https://pkg.go.dev/log/slog#Record).
`Record` could have `WalkAttributes` and `AddAttributes` methods,
like [`slog.Record.Attrs`](https://pkg.go.dev/log/slog#Record.Attrs)
and [`slog.Record.AddAttrs`](https://pkg.go.dev/log/slog#Record.AddAttrs),
in order to achieve high-performance when accessing and setting attributes efficiently.
`Record` would have a `AttributesLen` method that returns
the number of attributes to allow slice preallocation
when converting records to a different representation.

However, during the analysis[^5] we decided that having
a simple slice in `Record` is more flexible.

It is possible to achieve better performance, by using [`sync.Pool`](https://pkg.go.dev/sync#Pool).
MrAlias marked this conversation as resolved.
Show resolved Hide resolved

Having a simple `Record` without any logic makes it possible
that the optimisations can be done in API implementation
and bridge implementations.
For instance, in order to reduce the heap allocations of attributes,
the bridge implementation can use a `sync.Pool`.
In such case, the API implementation (SDK) would need to copy the attributes
when the records are processed asynchrounsly,
pellared marked this conversation as resolved.
Show resolved Hide resolved
in order to avoid use after free bugs and race conditions.

For reference, here is the reason why `slog` does not use `sync.Pool`[^2]:

> We can use a sync pool for records though we decided not to.
You can but it's a bad idea for us. Why?
Because users have control of Records.
Handler writers can get their hands on a record
and we'd have to ask them to free it
or try to free it magically at some some point.
But either way, they could get themselves in trouble by freeing it twice
or holding on to one after they free it.
That's a use after free bug and that's why `zerolog` was problematic for us.
`zerolog` as as part of its speed exposes a pool allocated value to users
if you use `zerolog` the normal way, that you'll see in all the examples,
you will never encounter a problem.
But if you do something a little out of the ordinary you can get
use after free bugs and we just didn't want to put that in the standard library.

We took a different decision, because the key difference is that `slog`
pellared marked this conversation as resolved.
Show resolved Hide resolved
is a logging library and Logs Bridge API is only a logging abstraction.
We want to provide more flexibility and offer better speed.

[^1]: Jonathan Amsterdam, [The Go Blog: Structured Logging with slog](https://go.dev/blog/slog)
[^2]: Jonathan Amsterdam, [GopherCon Europe 2023: A Fast Structured Logging Package](https://www.youtube.com/watch?v=tC4Jt3i62ns)
[^3]: [Emit definition discussion with benchmarks](https://github.com/open-telemetry/opentelemetry-go/pull/4725#discussion_r1400869566)
[^4]: [Logger.WithAttributes analysis](https://github.com/pellared/opentelemetry-go/pull/3)
[^5]: [Record attributes as field and use sync.Pool for reducing allocations analysis](https://github.com/pellared/opentelemetry-go/pull/4)