We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
这个问题最早在 #7776 提出, 有相应的PR #7815 但是在排查github构建时连续两次UT失败都是testServerHeartbeat方法(HeartbeatHandlerTest测试类)hang掉的时候,又看到了MultiMessageHandler的影子。 github构建日志里没有发现什么线索 (#8923 也有提及):
2021-09-25T10:17:40.0085194Z [INFO] ------------------------------------------------------- 2021-09-25T10:17:40.0086161Z [INFO] T E S T S 2021-09-25T10:17:40.0086846Z [INFO] ------------------------------------------------------- 2021-09-25T10:17:41.5136723Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest 2021-09-25T10:17:42.8678103Z Server bind successfully 2021-09-25T10:17:42.9604489Z connect count 1 2021-09-25T10:17:44.9882458Z Server bind successfully 2021-09-25T10:17:44.9965565Z ++++++++++++++ disconnect count 0 2021-09-25T10:17:44.9966140Z ++++++++++++++ connect count 1 2021-09-25T10:17:47.0152440Z Server bind successfully <== 从此处开始等待了87分钟,但是相关情况只在log4j日志有,github看不到 2021-09-25T11:44:45.0661784Z Terminate batch job (Y/N)? 2021-09-25T11:44:50.4730720Z ##[error]The action has timed out.
2021-09-25T14:00:34.4513058Z [INFO] ------------------------------------------------------- 2021-09-25T14:00:34.4513649Z [INFO] T E S T S 2021-09-25T14:00:34.4514082Z [INFO] ------------------------------------------------------- 2021-09-25T14:00:35.3461423Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest 2021-09-25T14:00:36.2617537Z Server bind successfully 2021-09-25T14:00:36.3014177Z connect count 1 2021-09-25T14:00:38.3370599Z Server bind successfully 2021-09-25T14:00:38.3521421Z ++++++++++++++ disconnect count 0 2021-09-25T14:00:38.3523615Z ++++++++++++++ connect count 1 2021-09-25T14:00:40.3670466Z Server bind successfully <== 从此处开始等待了47分钟,但是相关情况只在log4j日志有,github看不到 2021-09-25T14:47:51.2594748Z Terminate batch job (Y/N)?
只好本地跟踪这个测试方法,偶然发现handler链的header就是一个MultiMessageHandler对象,越看越熟悉的感觉出来了,然后MultiMessageHandler对象里的handler是AllChannelHandler,直接去看看它的caught方法:
public void caught(Channel channel, Throwable exception) throws RemotingException { ExecutorService executor = getExecutorService(); try { executor.execute(new ChannelEventRunnable(channel, handler, ChannelState.CAUGHT, exception)); } catch (Throwable t) { throw new ExecutionException("caught event", channel, getClass() + " error when process caught event .", t); } }
如果这里抛异常,那么MultiMessageHandler的for循环就会中断。
推测github构建hang掉的原因是:MultiMessageHandler的received开始循环,调用AllChannelHandler的received异常了,然后捕获继续调用AllChannelHandler的caught方法,但是又异常了,这次异常直接中断了MultiMessageHandler的循环,导致后面等着HeartbeatHandler接收的connected消息丢了,后果就是channel上的时间戳一直是初值null,CloseTimerTask任务一直空跑, disconnect.await一直傻等,最终job超时被kill掉。
但是github构建失败没有上传dubbo的错误日志,所以没办法考证了。
The text was updated successfully, but these errors were encountered:
No branches or pull requests
Environment
这个问题最早在 #7776 提出, 有相应的PR #7815
但是在排查github构建时连续两次UT失败都是testServerHeartbeat方法(HeartbeatHandlerTest测试类)hang掉的时候,又看到了MultiMessageHandler的影子。
github构建日志里没有发现什么线索 (#8923 也有提及):
只好本地跟踪这个测试方法,偶然发现handler链的header就是一个MultiMessageHandler对象,越看越熟悉的感觉出来了,然后MultiMessageHandler对象里的handler是AllChannelHandler,直接去看看它的caught方法:
如果这里抛异常,那么MultiMessageHandler的for循环就会中断。
推测github构建hang掉的原因是:MultiMessageHandler的received开始循环,调用AllChannelHandler的received异常了,然后捕获继续调用AllChannelHandler的caught方法,但是又异常了,这次异常直接中断了MultiMessageHandler的循环,导致后面等着HeartbeatHandler接收的connected消息丢了,后果就是channel上的时间戳一直是初值null,CloseTimerTask任务一直空跑,
disconnect.await一直傻等,最终job超时被kill掉。
但是github构建失败没有上传dubbo的错误日志,所以没办法考证了。
The text was updated successfully, but these errors were encountered: