Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

callee session killed when caller session lost in flight #2103

Open
markope opened this issue Apr 28, 2024 · 7 comments
Open

callee session killed when caller session lost in flight #2103

markope opened this issue Apr 28, 2024 · 7 comments

Comments

@markope
Copy link
Contributor

markope commented Apr 28, 2024

I have a crossbar session from client A (browser) and a session from client B (server) both with autobahn-js.

  1. client B has registered an async function myfunc which has about 1 sec runtime.

  2. I call myfunc on client B from client A

  3. I don't await the result in client A and immediately kill the session in client A (by navigating the browser window location = 'https...')

Then client B session is killed completely (reason: null) and an internal error is raised (on_internal_error)
The destruction of the session is unexpected and unwanted because this affects all other clients as well that want to use registered functions of B.

Instead client B should throw an error which should be caught and handled in the on_internal_error function that is configured in the connection configuration.

@oberstet
Copy link
Contributor

oberstet commented Apr 28, 2024

From your description, I'm not sure what exactly is happening, I'm not even convinced the issue is in CB - it might very well be within ABJS, eg:

CB correctly handling the messaging between Caller/Router/Callee in a situation when an in-flight invocation which is still in-progress Callee-side is signaled (correctly by CB) that the original Caller is now dead, and the (still in-progress) invocation is NOT needed anymore (by the router to forward to the now dead original caller).

Well, and ABJS then not correctly dealing with the error returned from CB.


  1. Providing CB and ABJS (caller and callee) side logs would help.
  2. Having complete WAMP transport-level message traces for both the Caller and the Callee session would be perfect
  3. Having the same problem demonstrated not using ABJS but ABPY for both the Caller and the Callee would provide indirect evidence (still not positive/constructively, but indicating the problem must be in CB - since the same problem was demonstrated by both ABJS and ABPY for client side, so CB must be at fault ..)

@oberstet
Copy link
Contributor

oberstet commented Apr 28, 2024

Rgd the CB side of things, what is also crucial for tracking down such situations: are you using a single CB node, or are you using Router-to-Router links and multiple CB nodes?

Essentially, the complete WAMP clients (caller and callee) and WAMP routing (node or rlink with nodes) topology.

The latter obviously makes things an order more complex on the WAMP routing side (assuming now that the problem is indeed CB, single or multi-node, and not WAMP client side in ABJS ...)

@markope
Copy link
Contributor Author

markope commented Apr 28, 2024

Yes, could also be ABJS related. I'm using crossbar single node attached to a master node. No router-to-router links.

I extracted some log messages from the node that I associate with the observation:

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None
2024-04-28 12:46:23 2024-04-28T10:46:23+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:41174 with abort=False: None
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyFrontendSession.onClose> proxy frontend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Router        223] <crossbar.router.router.Router.detach> router session detached from realm "realm1" (session=328729321225176, detached_session_ids=1, authid="5", authrole="acct_manager", authmethod="cryptosign-proxy", authprovider="static")
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyBackendSession.onClose> proxy backend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyFrontendSession.onClose> proxy frontend session closed (wasClean=True)
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Router        223] <crossbar.router.router.Router.detach> router session detached from realm "realm1" (session=5217118060887004, detached_session_ids=1, authid="system", authrole="system", authmethod="cryptosign-proxy", authprovider="static")
2024-04-28 12:46:27 2024-04-28T10:46:27+0000 [Proxy         212] <crossbar.worker.proxy.ProxyBackendSession.onClose> proxy backend session closed (wasClean=True)

(probably not very helpful)

will see if I can make a python example...

@oberstet
Copy link
Contributor

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy 212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None

thanks! but unfortunately, the logs do not reveal enough.

what is happening immediately before this line? what messages are received/sent by CB on each of the 2 involved sessions, in particular on session tcp4:127.0.0.1:36302?

you could bump up CB log level, you can bump it up to the point where it will log each and every single WAMP message.

the trick is to reproduce the problem with minimal messages/sessions attached .. otherwise the log volume will pretty much swamp you and finding the needle in the haystack and only just "read" the messages and understand what's going on becomes a challenge;)

@markope
Copy link
Contributor Author

markope commented Apr 28, 2024

That first line in the log really is the first line. i.e. any log output before is not related to this issue here, because I trigger the client A disconnect at that time. loglevel=debug is insane 😄 . I was not able to find anything there.

Properly Isolating the issue will take some time...

@oberstet
Copy link
Contributor

oberstet commented Apr 28, 2024

loglevel=debug is insane 😄 . I was not able to find anything there.

yes, it is;) if you upload the logs somewhere for me to download, I can have a look. first thing to identify within the logs is the line CB logs when it recognizes the Caller session (the one named "A" in your example) is gone .. and then look subsequently for any hints it does recognize that there still was an ongoing Call ... which did not yet return from the Callee ... and then what happens at this point?

further, what log lines can be observed once the Invocation finally comes back from the Callee, and CB recognizes that the session it wants to forward the result back is no longer there?

Properly Isolating the issue will take some time...

yeah, it can be tricky to figure out what exactly happens. it is an async distributed system with 3 participants. luckily, you don't have to deal with rlinks and router-to-router;)

@DZabavchik
Copy link
Contributor

@markope in order to isolate the issue, can you try without proxies (single "type": "router" worker only, no "type": "proxy" workers). But I doubt it is the proxy. With JS clients it is usually easy to debug looking at the WS frames in browser's developer tools/network tab.

It is peculiar that dropping connection are 10 seconds apart

2024-04-28 12:46:13 2024-04-28T10:46:13+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:36302 with abort=False: None
2024-04-28 12:46:23 2024-04-28T10:46:23+0000 [Proxy         212] dropping connection to peer tcp4:127.0.0.1:41174 with abort=False: None
2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants