Skip to content
This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Nextjs fix #110

Merged
merged 66 commits into from
Sep 9, 2022
Merged

Nextjs fix #110

merged 66 commits into from
Sep 9, 2022

Conversation

lachrist
Copy link
Contributor

fixes #109

Copying files causes duplicate in `require.cache` which cause the main 
files to be executed twice if the observed application required the main 
file again -- eg: 
https://github.com/vercel/next.js/blob/7e5cb510c4155572d0b251db34845d34fd078480/packages/next/lib/get-project-dir.ts#L3

In addition, this new version of the abomination is less invasive and 
does not monkey-patch `module`.
Copy link
Contributor

@dustinbyrne dustinbyrne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I can confirm I'm able to launch a Next server via the agent launcher now.

Though, I'm still having trouble getting an HTTP response from the Next server. They monkey patch the Response.write method before returning the response which raises an exception because write is marked as a read only property. It looks like this is due to us setting it as non-writable here: https://github.com/applandinc/appmap-agent-js/blob/main/components/patch/default/index.mjs#L33-L39

It seems to me like the flags we are setting on the property should be inherited from the current descriptor instead of hard coded. What do you think?

Somewhat related to this, I'm not getting a route for _appmap/record. I didn't dig much into this.

I'm also seeing another issue where I can't attach a JavaScript debugger without crashing on launch. It doesn't look like it's related to this change so maybe it's a separate issue. It's something to do with unexpected (?) asynchronous operations: https://github.com/applandinc/appmap-agent-js/blob/main/components/hook-group/node/index.mjs#L57

Removing the assertions for id in group and similar allows the server to launch. Maybe these should just be warnings instead of panics? Generally I think it's better to potentially drop some data and log a warning instead of panicking. If this is unrelated, we can open a separate issue to track this.

@lachrist
Copy link
Contributor Author

Awesome! I can confirm I'm able to launch a Next server via the agent launcher now.

Though, I'm still having trouble getting an HTTP response from the Next server. They monkey patch the Response.write method before returning the response which raises an exception because write is marked as a read only property. It looks like this is due to us setting it as non-writable here: https://github.com/applandinc/appmap-agent-js/blob/main/components/patch/default/index.mjs#L33-L39

Nice pinpoint of the root cause!

It seems to me like the flags we are setting on the property should be inherited from the current descriptor instead of hard coded. What do you think?

Well by monkey patching we are already screwing up the transparency of the analysis. For instance, hasOwnProperty(response, "write") will return false in the original application and true in the instrumented application. Hence the http hook works for basic case but does not survive reflection. Depending on how they monkey patch, marking the property as writable might not suffice. It could convert an explicit error into silent non-sense. There could be a more transparent way to do this, for instance using Proxies. Of course the best way would be to rely on an existing hook/listener but I could not find it for spying on request/response body. Let me sleep on this.

Somewhat related to this, I'm not getting a route for _appmap/record. I didn't dig much into this.

I'm not sure what you mean. You have trouble with remote recording? The request is forwarded to the observed application? This could be related. To reproduce the failure, you simply try to perform remote recording on the sample application from above?

I'm also seeing another issue where I can't attach a JavaScript debugger without crashing on launch. It doesn't look like it's related to this change so maybe it's a separate issue. It's something to do with unexpected (?) asynchronous operations: https://github.com/applandinc/appmap-agent-js/blob/main/components/hook-group/node/index.mjs#L57

Removing the assertions for id in group and similar allows the server to launch. Maybe these should just be warnings instead of panics? Generally I think it's better to potentially drop some data and log a warning instead of panicking. If this is unrelated, we can open a separate issue to track this.

The async_hooks API is very state heavy. Because I'm not entirely sure of when the hooks are triggered I rely on a fair amount of assertions. Most of these assertions are implicitly used during the post-processing of the trace to re-order event causally rather than chronologically. So I have to be careful when modifying these assertions to avoid causing hard-to-find problems downstream. Maybe crashing is too strong. But I do need to be notified when an assertion breaks so that I can adapt the post-processing of the trace. I feel that simple warnings will be ignored. What can be done? Anyway, this means that there is a case that I did not foresee. But as you said, this is a separate issue.

@dustinbyrne
Copy link
Contributor

I'm not sure what you mean. You have trouble with remote recording? The request is forwarded to the observed application? This could be related. To reproduce the failure, you simply try to perform remote recording on the sample application from above?

Right, remote recording did not appear to be functioning. The request appeared as if it wasn't being intercepted, returning a 404 instead of the expected 200. To test, I was simply using curl to retrieve the recording status of the agent:

$ curl localhost:3000/_appmap/record

@dustinbyrne
Copy link
Contributor

The above also seemed to only happen on Mac which makes me wonder if it was just an issue with my environment. I'll need to circle back on that. The remote recording endpoint does appear working on Linux.

During some further testing I found that we're not able to hook the functions in pages compiled by next. They use webpack under the hood and the pages they emit end up looking something like:

(() => {
  var exports = {};
  exports.id = "pages/index";
  exports.ids = ["pages/index"];
  exports.modules = {
    "./pages/index.js": ((__unused_webpack_module, __webpack_exports__, __webpack_require__) => {
      eval("<WEBPACKED JS>");
    }
  }
})();

Actually hooking the page logic would require we also transform the source code string passed to eval. I took a stab at this myself but ran into some roadblocks. The AST generated by acorn includes a PropertyDefinition node type which isn't supported by escodegen, so an exception is raised during instrumentation . escodegen also seems largely unmaintained at this point so I attempted migrating to @babel/parser and @babel/generator to resolve the issue. Using babel, the source code would transform cleanly but ultimately result in a panic from within the node runtime.

That's sort of where things stand at the moment. I'm compiling a debug binary of node so that I can identify what's causing the panic.

@lachrist
Copy link
Contributor Author

lachrist commented Aug 1, 2022

Hi @dustin! Sorry I let this in the backburner for too long. I'm in holidays without a laptop. You seem to have stumble upon an issue linked to code transformation. In my experience fixing those are hard but I had the most success by reducing the code to be instrumented as much as possible and then reverse engineer the location of the bug. Sorry to not be able to help more. I will be back the 15th of August. I might be able to find a laptop if it is critical to solve the issue before that. Btw there still #108 that is waiting for a review. I don't think it will this issue but who knows?

@dividedmind
Copy link
Contributor

Hi @dustin! Sorry I let this in the backburner for too long. I'm in holidays without a laptop. You seem to have stumble upon an issue linked to code transformation. In my experience fixing those are hard but I had the most success by reducing the code to be instrumented as much as possible and then reverse engineer the location of the bug. Sorry to not be able to help more. I will be back the 15th of August. I might be able to find a laptop if it is critical to solve the issue before that. Btw there still #108 that is waiting for a review. I don't think it will this issue but who knows?

FWIW, I'm going to take a look at #108.

@lachrist
Copy link
Contributor Author

lachrist commented Aug 16, 2022

FWIW, I'm going to take a look at #108.

@dividedmind Thanks! I released it and answered your comments. Feel free to circle back on them.

@lachrist
Copy link
Contributor Author

lachrist commented Aug 16, 2022

@dustinbyrne

The above also seemed to only happen on Mac which makes me wonder if it was just an issue with my environment. I'll need to circle back on that. The remote recording endpoint does appear working on Linux.

I can confirm that the remote recording works fine in 11.2.0.

soft@HPFFCDE2 nextjs-blog % curl -verbose localhost:3001/_appmap/record
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3001 (#0)
> GET /_appmap/record HTTP/1.1
> Host: localhost:3001
> User-Agent: curl/7.64.1
> Accept: */*
> Referer: rbose
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 17
< Date: Tue, 16 Aug 2022 10:04:41 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< 
* Connection #0 to host localhost left intact
{"enabled":false}* Closing connection 0

Actually hooking the page logic would require we also transform the source code string passed to eval. I took a stab at this myself but ran into some roadblocks. The AST generated by acorn includes a PropertyDefinition node type which isn't supported by escodegen, so an exception is raised during instrumentation . escodegen also seems largely unmaintained at this point so I attempted migrating to @babel/parser and @babel/generator to resolve the issue. Using babel, the source code would transform cleanly but ultimately result in a panic from within the node runtime.

The dreaded eval function... I agree about escodegen. My concern about using babel is that it is rather heavy and it will coexists in the same process as the observed application. FYI I already use babel for post-processing the trace but it is inside another process. However now I see that https://github.com/babel/babel/tree/main/packages/babel-generator does not seem that heavy. I will investigate the switch.

That's sort of where things stand at the moment. I'm compiling a debug binary of node so that I can identify what's causing the panic.

Any progress on that front before I jump into it?

@lachrist lachrist marked this pull request as draft August 16, 2022 10:18
@lachrist
Copy link
Contributor Author

@dustinbyrne

During some further testing I found that we're not able to hook the functions in pages compiled by next. They use webpack under the hood and the pages they emit end up looking something like:

Actually, I'm not sure what you mean by: hooking the functions in pages compiled by next. Are these functions executed in the server or in the client? If it is executed on the client it is beyond our reach (for the moment). If it is executed on the server, I don't understand the concept of: function in pages. How important is it to record them? This confusion is probably due to my lack of knowledge of nextjs. So I will investigate this further but if you have a quick answer, I would love to hear it.

Defining monkey patches as constant properties was too restrictive. The 
idea was to be notified when the observed application would insert its 
own conflicting monkey patch. But the application could behave well and 
call our own monkey patch. This the case for instance for nextjs which 
has somewhere:

```js
function compression(e,o,c){
  var u=false;
  var m;
  var d=[];
  var x;
  var v=o.end;
  var f=o.on;
  var b=o.write;
  o.flush=function flush(){
    if(x){x.flush()}
  };
  o.write=function write(e,a){
    if(u){return false}
    if(!this._header){this._implicitHeader()}
    return x?x.write(toBuffer(e,a)):b.call(this,e,a)
  };
```
@lachrist
Copy link
Contributor Author

@dustinbyrne

Though, I'm still having trouble getting an HTTP response from the Next server. They monkey patch the Response.write method before returning the response which raises an exception because write is marked as a read only property. It looks like this is due to us setting it as non-writable here: https://github.com/applandinc/appmap-agent-js/blob/main/components/patch/default/index.mjs#L33-L39

Fixed by cb392b3

@dustinbyrne
Copy link
Contributor

That's sort of where things stand at the moment. I'm compiling a debug binary of node so that I can identify what's causing the panic.

Any progress on that front before I jump into it?

No, I ended up switching priorities instead of digging deeper into this.

During some further testing I found that we're not able to hook the functions in pages compiled by next. They use webpack under the hood and the pages they emit end up looking something like:

Actually, I'm not sure what you mean by: hooking the functions in pages compiled by next. Are these functions executed in the server or in the client? If it is executed on the client it is beyond our reach (for the moment). If it is executed on the server, I don't understand the concept of: function in pages. How important is it to record them? This confusion is probably due to my lack of knowledge of nextjs. So I will investigate this further but if you have a quick answer, I would love to hear it.

With next you can have both server-side and client-side code in the same page. You may have a method such as getServerSideProps which is run solely on the server, while getProps is intended to be run by the client. Depending on the configuration, both methods may be called on the server anyway due to server-side page rendering which allows the server to act as the client to serve a pre-rendered page.

@lachrist
Copy link
Contributor Author

lachrist commented Aug 16, 2022

@dustinbyrne Thanks for the clarification :) How important is it to record these functions? I remember when I started the agent we discussed with @kgilpin about instrumenting code provided to eval. And we agreed to not do it because (at the time) the idea behind the AppMap framework was to provide dynamic insights to static code. In other words, the things in the classMap should refer to a code location present in the file system. To be clear, I don't think that instrumenting code given to eval is challenging. But is it what we want to do? Won't we instrument too much code?

@dustinbyrne
Copy link
Contributor

This is a case where the server-side logic is compiled by Webpack and emitted out as a string passed to eval. The source map is inlined inside the source string so everything should map back to code present in the file system. I'm guessing web frameworks like Nuxt work in a similar fashion.

Also worth mentioning that it works differently for production builds. The source code is written directly to the file instead of being eval'd into the runtime.

@dustinbyrne
Copy link
Contributor

It's worth mentioning that I've only just discovered the production build. Maybe we already have some support for production-only builds?

`escodegen` is largely unmaintained at this point. `astring` is a good 
alternative. We also considered `@babel/generator` but it is more heavy 
weight. We only need support for estree compliant input, no source 
mapping and no minification.
There is no reason to keep these two hooks inside a single component. 
All the other hooks have their dedicated component. Also remove deadcode 
for transformSourceDefault.
`hook-esm` now provide functions that can directly be provided to the 
hook esm module api of node.
The configuration format exposed to the user was `config` and becomes 
`external-configuration`. And the compiled configuration was 
`configuration` and becomes `internal-configuration`.
Also, change if-else structure for readability.
Call expressions that are considered as eval call expressions will have 
their first argument instrumented. This enables to record functions 
provided as strings to eval calls. This functionality is disabled by 
default.
@dividedmind dividedmind self-requested a review September 8, 2022 07:46
Copy link
Contributor

@dividedmind dividedmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please consider splitting the ordering comments across the relevant files and expanding on them a bit more before you merge. Also note I haven't actually tried out the code, mine is only a reading review, so if you're not confident it works as expected, do some more testing, ask someone else (unless @dustinbyrne is already on it), and/or provide easy instructions for me to try it out. You might also consider cleaning up the commit history for a rebase merge, although it looks daunting so I won't blame you if you do a merge commit.

@lachrist
Copy link
Contributor Author

lachrist commented Sep 8, 2022

@dividedmind thanks for the valuable input! As soon as the CI passes I will merge :)

@lachrist lachrist merged commit 793f9a8 into main Sep 9, 2022
@lachrist lachrist deleted the nextjs-fix branch September 9, 2022 08:26
@appland-release
Copy link

🎉 This PR is included in version 11.3.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

next server fails to launch
5 participants