Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSii blocks the node process when the JS code is not being called by the host #4133

Open
TimothyJones opened this issue Jun 7, 2023 · 4 comments
Labels
feature-request A feature should be added or improved. p2

Comments

@TimothyJones
Copy link
Contributor

Describe the bug

In (at least) Java, the node process is blocked while it is waiting for new method calls.

Specifically, promises that are running in the JSii-ed node process don't run at all when the host process has no active calls to the JSii-ed process.

I have a small repro of this bug in this repository. Here's a breakdown of what's in that repo:

  • There is a JS class that kicks off a "tick" promise chain which logs to console every quarter second, and will print these logs even after longRunningSomething() has completed.
    *The console logs have a number in them that tell you which call of longRunningSomething() those logs were kicked off by.
  • When called from JS, this tick prints every quarter second (as expected) - this isn't in the repo, but is easy to demonstrate with:
       new Example().longRunningSomething();
       console.log("Long running something complete")
       await new Promise((resolve) => setTimeout(resolve, 5000)); // logs will keep printing while waiting for this to complete
    
  • When called from Java, the ticks only and print when the execution is with node.
  • When node returns to Java, the ticks stop.
  • The ticks restart again if further calls to JS are made.

Although this is an async example, the blocking of the node process happens regardless of whether or not the call in to it is async. I only made an async example so that I could make longRunningSomething() take a while so you can see that the ticks run while node is doing something that it is planning to return to java.

Expected Behavior

When the java code at that link is run, I expect the (1) ticks to continue after the call to longRunningSomething() while the Java thread is doing something else.

Current Behavior

When the java code at that link is run, the (1) ticks stop once the JS returns to Java.

The (1) ticks restart when

Reproduction Steps

With JDK 20.0.1 (this is just the version I know works, others probably do too):

git clone git@github.com:TimothyJones/jsii-async-issue.git
cd java-host/untitled
./gradlew run

Observe that the ticks stop printing when Java is waiting

Possible Solution

My guess is that the stdin/out protocol for communicating between JSii and the host uses blocking calls on the JS side. I suspect that non-blocking calls would solve this problem, and possibly several others.

Because JSii communicates across a stdin/out protocol with another process, I think there's quite a bit of flexibility in the way that async is treated - one side could see it as async, while the other sees it as sync (and vice versa).

Additional Information/Context

This is related to this issue in that I think this is the cause, rather than aync hairiness.

(that issue describes the exact situation that I have in my real code, though)

This is a blocking issue for me, I would be happy to look in to it (but definitely would need some guidance on where to look)

SDK version used

jsii@5.1.1

Environment details (OS name and version, etc.)

node 18.13.0, JVM 20.0.1, MacOS 13.3.1 (a)

@TimothyJones TimothyJones added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 7, 2023
@RomainMuller
Copy link
Contributor

Alright yes, the current implementation of the host in @jsii/runtime is fully synchronous (bad decisions from a long, long time ago; coming back to haunt).

To a large extent this is because the host needs to maintain a specific order of operations, as the jsii protocol currently is sequential (request -> response to that request). If order of operations is breached, then the response sent could be for a different request than the last one sent by the client, who'd then be unable to match that response to a given request.

I venture that we either need to build some state machines so that we can ensure that a given response is only sent when it's supposed to, so that the client can make sense of it all... or we need to re-design the interop model so that requests have IDs that are passed back in the response, so they could be sent out-of-order.

Following this, we'd also need to make sure that multiple requests/responses don't get simultaneously sent in such a way that the JSON IPC stream would get corrupted (although maybe we're safe with that due to how there can only be 1 JS-executing thread at any given time)

@TimothyJones
Copy link
Contributor Author

TimothyJones commented Jun 7, 2023

Right, yes. I think if you unblocked node, you could very quickly and naturally end up in a situation where the request responses would be out of order, because an unblocked async might want to call the host (this would definitely happen with my ContractCase project, which passes logs across the boundary so they can be idiomatically handled by the host's test framework).

or we need to re-design the interop model so that requests have IDs that are passed back in the response, so they could be sent out-of-order.

With the caveat that I only have the context from the runtime architecture diagrams (and related documentation), this feels like the more reasonable of the two to me.

we'd also need to make sure that multiple requests/responses don't get simultaneously sent in such a way that the JSON IPC stream would get corrupted (although maybe we're safe with that due to how there can only be 1 JS-executing thread at any given time)

I agree for calls from node -> host, but I think you might need to take care not to multi-thread calls from host -> node (but you probably have that problem regardless of how the protocol works). If it turns out to be a problem, there are a few possible solutions there, I think.

Sooo... this sounds like a substantial rework, and I might not be qualified to do it without more context of how JSii internals work. On the other hand, this issue kills my current approach for multi-platform dead -and JSii is otherwise such a perfect fit for my use-case. If a few hours work over a few PRs could help fix this, I'd like to help out. What do you think the best way to proceed is?

@TimothyJones
Copy link
Contributor Author

I can take a look at spiking something over the weekend - could you point me at the communication code? I see this interface which looks like it's the place where node -> host happens. Is that right? How does that relate to this module?

Also, presumably the other side of that communication boundary is implemented in each host language? How is the protocol tested? Would a good approach be to modify the node side, then follow breaking tests until it works?

Any code pointers (or general advice) would be super helpful. Also sorry for so many questions!

@TimothyJones
Copy link
Contributor Author

Actually, there might be a better way with less blast radius - If the node side of the communication boundary was controlled with a mutex, it could be possible to maintain the current contract between the hosts and the node side, but allow node to continue other unrelated promises.

This would stop the core blocking the node thread, but wouldn't allow multi-threaded access across the JSii boundary (which I think is the current state anyway).

mergify bot pushed a commit that referenced this issue Jun 13, 2023
Thanks for this awesome library!

While investigating #4133 , I noticed a few typos in the comments. This PR corrects them.

No behaviour changes are introduced.

Two of the three comments are in exposed TS doc comments in `@jsii/kernel`. Let me know if you'd prefer a different commit type for those.

---

By submitting this pull request, I confirm that my contribution is made under the terms of the [Apache 2.0 license].

[Apache 2.0 license]: https://www.apache.org/licenses/LICENSE-2.0
@mrgrain mrgrain added feature-request A feature should be added or improved. p2 and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p2
Projects
None yet
Development

No branches or pull requests

3 participants