"Shutting down myself" caused by error occured in remote node. #7113

Open
ingted opened this issue Mar 3, 2024 · 2 comments

Comments

ingted commented Mar 3, 2024

Version Information
Version of Akka.NET? 1.5.0
Which Akka.NET Modules? Akka Remote

Describe the bug

  1. Create two actors, one in node A (port 64609) and one in node B (port 64640).
  2. actor_in_a sends a message to actor_in_b; actor_in_b processes it and replies.
  3. However, the generated response message cannot be serialized by Hyperion, which causes the error "Failed to write message to the transport" in node B:
AssociationError [akka.tcp://cluster-system@10.28.199.143:64609] <- akka.tcp://cluster-system@10.28.199.143:64640: Error [Failed to write message to the transport] []
  4. Then A ran into a disassociation issue with a mysterious node 64643 (I didn't create it):
Association between local [tcp://cluster-system@10.28.199.143:64643] and remote [tcp://cluster-system@10.28.199.143:64609] was disassociated because the ProtocolStateActor failed: Unknown
  5. Then B disassociates:
Association with remote system akka.tcp://cluster-system@10.28.199.143:64640 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointException: Failed to write message to the transport
 ---> Hyperion.ValueSerializers.UnsupportedTypeException: No coercion operator is defined between types 'CefBrowser*' and 'System.Object'.
   at Hyperion.ValueSerializers.UnsupportedTypeSerializer.WriteManifest(Stream stream, SerializerSession session)
   at lambda_method305(Closure, Stream, Object, SerializerSession)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at Hyperion.Extensions.StreamEx.WriteObject(Stream stream, Object value, Type valueType, ValueSerializer valueSerializer, Boolean preserveObjectReferences, SerializerSession session)
   at lambda_method299(Closure, Stream, Object, SerializerSession)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at Hyperion.Extensions.StreamEx.WriteObject(Stream stream, Object value, Type valueType, ValueSerializer valueSerializer, Boolean preserveObjectReferences, SerializerSession session)
   at Hyperion.SerializerFactories.EnumerableSerializerFactory.<>c__DisplayClass10_0.<BuildSerializer>b__1(Stream stream, Object o, SerializerSession session)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at lambda_method76(Closure, Stream, Object, SerializerSession)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at lambda_method72(Closure, Stream, Object, SerializerSession)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at lambda_method74(Closure, Stream, Object, SerializerSession)
   at Hyperion.ValueSerializers.ObjectSerializer.WriteValue(Stream stream, Object value, SerializerSession session)
   at Hyperion.Serializer.Serialize(Object obj, Stream stream, SerializerSession session)
   at Hyperion.Serializer.Serialize(Object obj, Stream stream)
   at Akka.Serialization.HyperionSerializer.ToBinary(Object obj)
   at Akka.Remote.MessageSerializer.Serialize(ExtendedActorSystem system, Address address, Object message)
   at Akka.Remote.EndpointWriter.WriteSend(Send send)
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow)
   at Akka.Remote.EndpointWriter.WriteSend(Send send)
   at Akka.Remote.EndpointWriter.<Writing>b__27_0(Send s)
   at lambda_method64(Closure, Object, Action`1, Action`1, Action`1)
   at Akka.Actor.ReceiveActor.OnReceive(Object message)
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)]
  6. Then nodes A and B disassociate from each other:
Disassociated [akka.tcp://cluster-system@10.28.199.143:64640] -> akka.tcp://cluster-system@10.28.199.143:64609
Disassociated [akka.tcp://cluster-system@10.28.199.143:64609] <- akka.tcp://cluster-system@10.28.199.143:64640
  7. Finally, A and B shut down (the seed node is on port 9000):

For the node on port 64609 <= Shutting down myself

Message [AckIdleCheckTimer] from [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] to [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] was not delivered. [1] dead letters encountered. If this is not an expected behavior then [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#1537073490] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. Message content: Akka.Remote.EndpointWriter+AckIdleCheckTimer

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64609] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)]. Node roles [ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143]

"Couldn't establish a causal relationship between "remote" gossip and "local" gossip -
Remote[Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:9000, 1028805500), UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789)], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64640 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684): Unreachable [Unreachable] (1)])), version = VectorClock(0DA4CAFA080D3226573233D2547D1AC0->6, 3EBA3B1B1C91D00A7301186C5FF6E40C->1)] -
Local[Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684)], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64609 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789): Unreachable [Unreachable] (1)])), version = VectorClock(06163C12B3D0EBEA1063AC304EC6A2FE->1, 0DA4CAFA080D3226573233D2547D1AC0->6)] -
merged them into [Gossip(members = [Member(address = akka.tcp://cluster-system@10.28.199.143:9000, Uid=1028805500 status = Up, role=[dd,singletonRole,SeedNode,petabridge.cmd], upNumber=1, version=7.1.460), Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=2, version=1.0.0), Member(address = akka.tcp://cluster-system@10.28.199.143:64640, Uid=1558551789 status = Up, role=[ShardNode,ShardAnalyticServiceNode,petabridge.cmd,10.28.199.143], upNumber=3, version=12.8.202)], overview = GossipOverview(seen=[], reachability=Reachability([akka.tcp://cluster-system@10.28.199.143:64609 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64640, 1558551789): Unreachable [Unreachable] (1)][akka.tcp://cluster-system@10.28.199.143:64640 -> UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:64609, 942161684): Unreachable [Unreachable] (1)])), version = VectorClock(06163C12B3D0EBEA1063AC304EC6A2FE->1, 0DA4CAFA080D3226573233D2547D1AC0->6, 3EBA3B1B1C91D00A7301186C5FF6E40C->1)]"

Received gossip where this member has been downed, from [akka.tcp://cluster-system@10.28.199.143:9000]

Message [BackoffTimer] from [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] to [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] was not delivered. [8] dead letters encountered. If this is not an expected behavior then [akka://cluster-system/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%4010.28.199.143%3A64640-2/endpointWriter#816387495] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. Message content: Akka.Remote.EndpointWriter+BackoffTimer

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64609] - Node has been marked as DOWN. Shutting down myself

For the node on port 64640 <= Shutting down myself

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://cluster-system@10.28.199.143:64609, Uid=942161684 status = Up, role=[ShardNode,petabridge.cmd,ShardAnalyticServiceNode,10.28.199.143], upNumber=2, version=1.0.0)]. Node roles [ShardNode,petabridge.cmd,ShardAnalyticServiceNode,10.28.199.143]

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Receiving gossip from [UniqueAddress: (akka.tcp://cluster-system@10.28.199.143:9000, 1028805500)]

Received gossip where this member has been downed, from [akka.tcp://cluster-system@10.28.199.143:9000]

Cluster Node [akka.tcp://cluster-system@10.28.199.143:64640] - Node has been marked as DOWN. Shutting down myself
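
The dead-letter entries in the logs above name two settings that control that logging. As a minimal sketch, assuming the settings live in the node's HOCON configuration, they could be set as follows. The values shown are illustrative assumptions, not the configuration used in this report, and they only change how much dead-letter noise is logged; they do not affect the disassociation or the downing.

```hocon
akka {
  # Log at most this many dead letters (illustrative value).
  log-dead-letters = 10
  # Suppress dead-letter logging while the actor system is shutting down.
  log-dead-letters-during-shutdown = off
}
```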

To Reproduce
If needed, I will provide it in a small project.

Expected behavior
An error that occurs in node B should not shut down node A.

Actual behavior
Node A logs "Shutting down myself" and shuts down.

Environment
I am running on Windows with .NET 7.

ingted commented Mar 3, 2024

This time it is different from #2903.
Now the disassociation causes both nodes to shut themselves down...

ingted commented Mar 3, 2024

The error itself is expected, and we can certainly avoid triggering it... anyway... T_T|||
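
One way to avoid triggering the serialization failure is to reply with a plain data object instead of an object that wraps a native handle (the stack trace points at a 'CefBrowser*' member). The following is only a sketch under stated assumptions: `BrowserWrapper` and `BrowserSnapshot` are hypothetical names invented for illustration, not types from this project, CefSharp, or Akka.NET, and the real reply message may need different fields.

```csharp
using System;

// Build the reply from plain values instead of sending the wrapper itself.
var browser = new BrowserWrapper
{
    Url = "https://example.com",
    NativeHandle = new IntPtr(42) // placeholder value, never sent over the wire
};
var reply = BrowserSnapshot.From(browser);
Console.WriteLine(reply.Url);

// Hypothetical stand-in for a type that holds a native pointer,
// which a reflection-based serializer cannot write.
public sealed class BrowserWrapper
{
    public IntPtr NativeHandle { get; init; }
    public string Url { get; init; } = "";
}

// Hypothetical reply message: only serializable, plain data.
public sealed record BrowserSnapshot(string Url)
{
    // Copy just the fields the remote actor needs.
    public static BrowserSnapshot From(BrowserWrapper b) => new(b.Url);
}
```

With this shape, the actor on the sending side would tell `reply` back to the sender rather than the wrapper, so the serializer never walks into the native handle.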

Aaronontheweb added this to the 1.5.18 milestone Mar 5, 2024
Aaronontheweb modified the milestones: 1.5.18, 1.5.19 Mar 12, 2024
Aaronontheweb modified the milestones: 1.5.19, 1.5.20 Apr 15, 2024
Aaronontheweb modified the milestones: 1.5.20, 1.5.21 Apr 29, 2024