Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultiMessageHandler没有捕获handler.caught抛出的throwable #8929

Closed
zrlw opened this issue Sep 27, 2021 · 0 comments
Closed

MultiMessageHandler没有捕获handler.caught抛出的throwable #8929

zrlw opened this issue Sep 27, 2021 · 0 comments

Comments

@zrlw
Copy link
Contributor

zrlw commented Sep 27, 2021

Environment

  • Dubbo version: 3.0 / master

这个问题最早在 #7776 提出, 有相应的PR #7815
但是在排查github构建时连续两次UT失败都是testServerHeartbeat方法(HeartbeatHandlerTest测试类)hang掉的时候,又看到了MultiMessageHandler的影子。
github构建日志里没有发现什么线索 (#8923 也有提及):

  1. 3.0的github构建日志
2021-09-25T10:17:40.0085194Z [INFO] -------------------------------------------------------
2021-09-25T10:17:40.0086161Z [INFO]  T E S T S
2021-09-25T10:17:40.0086846Z [INFO] -------------------------------------------------------
2021-09-25T10:17:41.5136723Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest
2021-09-25T10:17:42.8678103Z Server bind successfully
2021-09-25T10:17:42.9604489Z connect count 1
2021-09-25T10:17:44.9882458Z Server bind successfully
2021-09-25T10:17:44.9965565Z ++++++++++++++ disconnect count 0
2021-09-25T10:17:44.9966140Z ++++++++++++++ connect count 1
2021-09-25T10:17:47.0152440Z Server bind successfully  <== 从此处开始等待了87分钟,但是相关情况只在log4j日志有,github看不到
2021-09-25T11:44:45.0661784Z Terminate batch job (Y/N)?   
2021-09-25T11:44:50.4730720Z ##[error]The action has timed out.
  1. master的github构建日志
2021-09-25T14:00:34.4513058Z [INFO] -------------------------------------------------------
2021-09-25T14:00:34.4513649Z [INFO]  T E S T S
2021-09-25T14:00:34.4514082Z [INFO] -------------------------------------------------------
2021-09-25T14:00:35.3461423Z [INFO] Running org.apache.dubbo.remoting.exchange.support.header.HeartbeatHandlerTest
2021-09-25T14:00:36.2617537Z Server bind successfully
2021-09-25T14:00:36.3014177Z connect count 1
2021-09-25T14:00:38.3370599Z Server bind successfully
2021-09-25T14:00:38.3521421Z ++++++++++++++ disconnect count 0
2021-09-25T14:00:38.3523615Z ++++++++++++++ connect count 1
2021-09-25T14:00:40.3670466Z Server bind successfully <== 从此处开始等待了47分钟,但是相关情况只在log4j日志有,github看不到
2021-09-25T14:47:51.2594748Z Terminate batch job (Y/N)? 

只好本地跟踪这个测试方法,偶然发现handler链的header就是一个MultiMessageHandler对象,越看越熟悉的感觉出来了,然后MultiMessageHandler对象里的handler是AllChannelHandler,直接去看看它的caught方法:

    public void caught(Channel channel, Throwable exception) throws RemotingException {
        ExecutorService executor = getExecutorService();
        try {
            executor.execute(new ChannelEventRunnable(channel, handler, ChannelState.CAUGHT, exception));
        } catch (Throwable t) {
            throw new ExecutionException("caught event", channel, getClass() + " error when process caught event .", t);
        }
    }

如果这里抛异常,那么MultiMessageHandler的for循环就会中断。

推测github构建hang掉的原因是:MultiMessageHandler的received开始循环,调用AllChannelHandler的received异常了,然后捕获继续调用AllChannelHandler的caught方法,但是又异常了,这次异常直接中断了MultiMessageHandler的循环,导致后面等着HeartbeatHandler接收的connected消息丢了,后果就是channel上的时间戳一直是初值null,CloseTimerTask任务一直空跑,
disconnect.await一直傻等,最终job超时被kill掉。

但是github构建失败没有上传dubbo的错误日志,所以没办法考证了。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant