This repository has been archived by the owner on Sep 8, 2018. It is now read-only.

forwarder, others: remove the record size limit #21

Open
bracki opened this issue Jan 18, 2017 · 18 comments

@bracki

bracki commented Jan 18, 2017

When I try to ingest a large log file, the forwarder fails with bufio.Scanner: token too long.

> cat 2016-11-04-06.tsv|head -n 100000|./oklog forward localhost
ts=2017-01-18T22:46:24.876892152Z level=debug raw_target=tcp://localhost:7651 resolved_target=tcp://localhost:7651
ts=2017-01-18T22:46:25.025694007Z level=info stdin=exhausted due_to="bufio.Scanner: token too long"
@josephglanville

This is likely caused by log lines larger than 64k, which is bufio's default max token size.
A call to s.Buffer() to set a larger buffer is probably required, though this should probably be configurable, since it directly increases memory usage.

@JensRantil

An alternative would be to store the line length in front of each line, so the reader knows up front how long it is.

Also, the over-length error should obviously not crash the process.

@bracki

bracki commented Jan 19, 2017

You are correct, there seems to be one huge line among all others.

@peterbourgon

Yep, you've stumbled over the record size limit. It's actually baked into various places throughout the system, not only in the forwarder. It's definitely fixable, I've just not taken the time yet. Thanks for filing the issue!

@peterbourgon peterbourgon changed the title Trouble ingesting larger log volume forwarder, others: remove the record size limit Jan 19, 2017
@newhook

newhook commented Mar 29, 2017

@peterbourgon have you looked into this yet? We're considering using this for our production systems but we have potentially very long log lines so a fixed record size limit would cause us issues.

@peterbourgon

Nope. But I'm happy to turn my attention to it next. Warning: no ETAs ;)

Is there a reasonable upper size limit for your use case, after which you'd be willing to accept e.g. truncation?

@newhook

newhook commented Mar 29, 2017

I just did a quick search over some of our logs over the past hour or so and the longest line we've emitted is 670620 bytes. I'm somewhat certain I could find longer entries if I searched over the past few days.

@peterbourgon

Goodness! Okay. JSON?

@newhook

newhook commented Mar 29, 2017

Yeah, the huge lines are mostly comprised of JSON.

@peterbourgon peterbourgon self-assigned this Mar 31, 2017
@peterbourgon peterbourgon added this to the v0.3.0 milestone Mar 31, 2017
@lucasdss

Hi, I ended up with the same problem here.
Our logs are usually JSON, and almost every app I tried triggered this error and crashed.

I might be able to help.
Would adding a parameter to set the max size be enough, or were you thinking of another approach? Maybe add a recover as well?

@peterbourgon

Max size is baked into several layers in several components, so it's not as easy as just adding a parameter, unfortunately.

@yellowmegaman

Encountered this one too. I tried to move all our log backups to oklog so we can search through them; the average log file is around 100MB, some with 350k+ lines. Got bufio.Scanner: token too long for almost all of them ;(
But the good news is that all real-time logs are ingesting OK ;)

Any ideas how to get around this issue, before max size parameters are implemented? Maybe some tricky stdout-splitting?

@yellowmegaman

OK, so I've tried fooling around with parallel --block 1k --pipe and such, but in the end it really is one long line. The sad thing is that the forwarder stops transferring file contents after the token too long error.
The logs are Scala/Java exceptions, if that helps somehow.

@timwebster9

We're seeing this now as well. Similarly we are looking at OK Log for general use in our dev/test environments. In this case it's the output from a Jenkins build/test job. Looks to be JSON output as well...

@peterbourgon

Thanks for the report. What's the rough max size of your lines?

@timwebster9

In this particular file the longest line looks to be 103046 bytes.

@samirtahir91

It looks like bufio.Reader should be used instead, according to the bufio documentation:

> Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

Would changing from Scanner to Reader work?

@peterbourgon

Well, sure, but...

blep ~/src/github.com/oklog/oklog (master) rg 'bufio.NewScanner'
cmd/oklog/stream.go
72:                     scanner := bufio.NewScanner(resp.Body)

cmd/oklog/query.go
157:            s := bufio.NewScanner(r)

cmd/oklog/forward.go
131:            s       = bufio.NewScanner(os.Stdin)

pkg/ingest/conn_test.go
106:            s := bufio.NewScanner(conn)

pkg/ingest/conn.go
73:     s := bufio.NewScanner(conn)
90:     s := bufio.NewScanner(conn)

pkg/stream/stream.go
130:    s := bufio.NewScanner(rc)

pkg/store/read_test.go
340:                    s := bufio.NewScanner(rc)
445:            s := bufio.NewScanner(r)

pkg/store/query_registry.go
129:    s := bufio.NewScanner(bytes.NewReader(segment))

pkg/store/api.go
457:            s     = bufio.NewScanner(src)

pkg/store/read.go
66:             scanner[i] = bufio.NewScanner(readers[i])
187:            scanner[i] = bufio.NewScanner(readers[i])
384:            s := bufio.NewScanner(src)
442:            rc.scanner[i] = bufio.NewScanner(rcs[i])


9 participants