
internal/jsonx.AppendString causing panic in v3.20.0 #607

Closed
nicoroy2561 opened this issue Nov 27, 2022 · 12 comments

@nicoroy2561

Description

As the title says, my production app just hit a panic on v3.20.0.
I'm downgrading to v3.19.2 for the time being, as that version worked fine previously.

Attaching panic details:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8c64b9]

goroutine 631981 [running]:
github.com/newrelic/go-agent/v3/internal/jsonx.AppendString(0xc000b5baf0?, {0x0, 0x3})
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/internal/jsonx/encode.go:25 +0x59
github.com/newrelic/go-agent/v3/newrelic.(*jsonFieldsWriter).stringField(0xc000b5baf0, {0xfb4278?, 0xfb4278?}, {0x0, 0x3})
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/json_object_writer.go:34 +0x45
github.com/newrelic/go-agent/v3/newrelic.agentAttributesJSON(0xc00cb8cb88, 0xc005fe3680, 0x1)
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/attributes_from_internal.go:465 +0x177
github.com/newrelic/go-agent/v3/newrelic.(*txnEvent).WriteJSON(0xc00775f380, 0xc005fe3680)
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/txn_events.go:55 +0x30c
github.com/newrelic/go-agent/v3/newrelic.(*analyticsEvents).CollectorJSON(0xc00e4026f0, {0xc004165800, 0x7a})
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/analytics_events.go:118 +0x256
github.com/newrelic/go-agent/v3/newrelic.(*txnEvents).Data(0xc010437f20?, {0xc004165800?, 0x0?}, {0x0?, 0x1877698?, 0x18a4f40?})
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/txn_events.go:174 +0x25
github.com/newrelic/go-agent/v3/newrelic.(*app).doHarvest(0xc001e2d900, 0xc000b13fd0?, {0xd52c97?, 0xc009744cd2?, 0x19c0f60?}, 0xc001bb8a00)
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/internal_app.go:74 +0x15c
created by github.com/newrelic/go-agent/v3/newrelic.(*app).process
        /home/user/go/pkg/mod/github.com/newrelic/go-agent/v3@v3.20.0/newrelic/internal_app.go:228 +0x3ae

@iamemilio
Contributor

Thanks for reporting, we will fix this ASAP

@iamemilio iamemilio self-assigned this Dec 1, 2022
@iamemilio
Contributor

@nicoroy2561 Would it be possible to provide me with a reproducer?

@nicoroy2561
Author

@nicoroy2561 Would it be possible to provide me with a reproducer?

Sorry, the stack trace I posted is all I got from the crash. It's not even long enough for me to tell where it originates.
It's also impossible to reproduce, because it only happened in production; I never encountered it while testing.

@iamemilio
Contributor

This is going to be a tricky one. The code the panic triggered in hasn't changed in 7 years, though, so I doubt the version matters. It must be an edge case that flew under the radar for quite some time.

@iamemilio
Contributor

I was not able to reproduce this, no matter what I tried. I did get some nice additional coverage out of it though 😄. #610. I am totally stumped right now. How often does this error occur in your environment? If it happens again, can you try to capture any additional information that could give us a clue as to why this might be happening?

@nicoroy2561
Author

Hi,
sorry for the late response.

Answering your questions first: I got this error only once, on my live server. If the code really hasn't changed across versions, I'm afraid it might just randomly happen again.

I took some time to properly look both at the code causing the panic and the stack trace, and I noticed something that seems quite weird to me:
github.com/newrelic/go-agent/v3/internal/jsonx.AppendString(0xc000b5baf0?, {0x0, 0x3})

The very first line of the stack trace is just... weird. Setting aside the '?', which is apparently normal, the next values are the string's underlying header: a length of 0x3 but a nil data pointer (0x0).
That makes absolutely no sense, since it shouldn't be possible for a string to have a length larger than its backing byte array.
And that's likely what causes the panic, because I'd expect len() to only read the length field and never check the actual pointer.

Since the problematic string originates from
func agentAttributesJSON(a *attributes, buf *bytes.Buffer, d destinationSet)
which takes the string from a.Agent, of type map[string]agentAttributeValue, it makes me think that something somewhere may be using unsafe or C code to set such an erroneous value in that map.

@nicoroy2561
Author

nicoroy2561 commented Dec 3, 2022

Perhaps a recover could be added to func (app *app) doHarvest(h *harvest, harvestStart time.Time, run *appRun), so that when it's spawned by func (app *app) process() it doesn't crash the whole application but instead skips harvesting data that one time.

I'd much rather get the stack trace and a message like "New Relic data harvest failed" than an app-wide panic, especially if the agent can then go on harvesting data later (skipping the problematic item?).

@iamemilio
Contributor

Yeah, a recover would be a good stopgap. That's a good point about the nil array; it might be a lead for us, but I'm not sure we use the unsafe library for anything. I'll have to double-check.

@nicoroy2561
Author

If it's of any help, I am still not experiencing this issue after downgrading to v3.19.2.
It might be that the code added after that version is totally safe and unrelated and this is just luck. But if I had to start somewhere, the newly changed code is where I'd start looking.

@iamemilio
Contributor

iamemilio commented Dec 12, 2022

We have not made any changes to that code path for quite a while. I am not sure whether your application is doing any unsafe pointer modification, but I have not been able to find a culprit in the Go agent yet. We will ship a panic recover that protects the harvest process from crashing fatally in this week's release, and we recommend you upgrade to it. I'll keep this issue in the back of my mind, but I have to time-box this investigation for now. If you get a log message reporting a panic in the new release, please @ me here and we will re-assess. Harvest panics should no longer crash your application, but will result in that harvest's data being dropped.

Leaving this open: mitigated, not resolved.

@nicoroy2561
Author

I just checked and couldn't find any unsafe usage in my application (or its dependencies). I'm sure there's got to be something somewhere.
Either that, or I have no idea how such a weird data structure could be produced.

I'll let you know if I get any other stack traces. Will update the lib as soon as I see the commit :)

Thank you for the help!

@iamemilio
Contributor

No problem, this is one of the most bizarre bugs I have seen in a long time haha. I hope it gets resolved and thanks for all the help! There will be a release this week with the patch, guaranteed.

@Ak-x Ak-x closed this as completed Jan 18, 2023