Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"commands out of sync" errors on MySQL 8.0.24 #1393

Open
owbone opened this issue Feb 10, 2023 · 8 comments
Open

"commands out of sync" errors on MySQL 8.0.24 #1393

owbone opened this issue Feb 10, 2023 · 8 comments
Milestone

Comments

@owbone
Copy link
Contributor

owbone commented Feb 10, 2023

Issue description

This is similar to #1392 but much harder to reproduce and causes real failures.

#1392 describes the situation where the "new" ER_CLIENT_INTERACTION_TIMEOUT error packet is received on an idle connection. However, if you're really unlucky then it's also possible for this packet to be received just as a query is being sent. This results in an ErrPktSync ("commands out of sync. You can't run this command now") error, as the sequence ID of the error packet doesn't correspond to the packet that was simultaneously written. We have seen this in at least two very busy services, and only when there are many connections hitting the wait_timeout.

What's more, this doesn't mark the connection as bad and leaves it in a "busy buffer" state, which likely causes at least one further query to fail.

I don't have any great suggestions for fixing this. Perhaps the driver could recognise that it's an error packet with a zero sequence ID, and return ErrBadConn in that case. At the very least, it should be able to expose the error rather than giving this "commands out of sync" error.

For any readers, we fixed this by just decreasing the application level connection lifetime to be much less than wait_timeout.

Example code

Use https://github.com/lukaszlach/docker-tc to apply a network delay to the MySQL container:

docker run -d \
    --name docker-tc \
    --network host \
    --cap-add NET_ADMIN \
    --restart always \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/docker-tc:/var/docker-tc \
    lukaszlach/docker-tc

Start the MySQL container in a network, with a 1 second network delay applied:

docker network create test-net
docker run \
    -e MYSQL_ROOT_PASSWORD=password \
    --label "com.docker-tc.enabled=1" \
    --label "com.docker-tc.delay=1000ms" \
    -p 3306:3306 \
    --net test-net \
    mysql:8.0.24

Then this will reproduce it fairly consistently:

package main

import (
	"database/sql"
	"fmt"
	"time"

	"github.com/go-sql-driver/mysql"
)

func main() {
	mysql.SetLogger(logger{})

	cfg := mysql.NewConfig()
	cfg.User = "root"
	cfg.Passwd = "password"
	cfg.Net = "tcp"
	cfg.Addr = "127.0.0.1"
	cfg.DBName = "mysql"

	db, err := sql.Open("mysql", cfg.FormatDSN())
	if err != nil {
		panic(err)
	}
	defer db.Close()

	db.SetMaxOpenConns(1)

	if _, err := db.Exec("SET GLOBAL wait_timeout = 2"); err != nil {
		panic(err)
	}

	time.Sleep(1500 * time.Millisecond)

	rows, err := db.Query("SHOW TABLES")
	if err != nil {
		fmt.Println("query failed:", err)
	} else {
		rows.Close()
	}
}

type logger struct{}

func (logger) Print(v ...interface{}) {
	fmt.Println(append([]interface{}{"mysql: "}, v...)...)
}

Error log

mysql:  busy buffer
panic: commands out of sync. You can't run this command now

Configuration

Driver version (or git SHA): 1.7.0

Go version: 1.19.5

Server version: MySQL 8.0.24

Server OS: E.g. Debian 8.1 (Jessie), Windows 10

@methane
Copy link
Member

methane commented Apr 11, 2023

We have seen this in at least two very busy services, and only when there are many connections hitting the wait_timeout.

See this topic.

This is an issue that should be fixed. But it doesn't happen on healthy configuration.

@owbone
Copy link
Contributor Author

owbone commented Apr 11, 2023

This can be caused by any error packet from the server, not just ER_CLIENT_INTERACTION_TIMEOUT. We've also seen it happen with ER_DISCONNECTING_REMAINING_CLIENTS, albeit very rarely and only when the server is in trouble. The latter (any possibly other errors) can't be fixed with any client-side configuration, and can still trigger #1394.

Setting SetConnMaxIdleTime() to lower than the wait_timeout does mitigate the problem though by removing the ER_CLIENT_INTERACTION_TIMEOUT errors. This issue is mainly about unobscuring the problem for anyone trying to debug the same thing in the future.

@methane
Copy link
Member

methane commented Apr 11, 2023

It is server side bug. You should update server.

@owbone
Copy link
Contributor Author

owbone commented Apr 11, 2023

Which part is the server side bug?

@methane
Copy link
Member

methane commented Apr 11, 2023

Server should use sequence number 0.

@owbone
Copy link
Contributor Author

owbone commented Apr 11, 2023

The server does use sequence number 0. The problem is when the client receives this after having incremented the sequence number.

@methane
Copy link
Member

methane commented Apr 11, 2023

OK. It would be fixed in future. But it has low priority.

@methane
Copy link
Member

methane commented Apr 11, 2023

At the very least, it should be able to expose the error rather than giving this "commands out of sync" error.

For the record, official MySQL client does not handle ER_CLIENT_INTERACTION_TIMEOUT case. It just return an error if EXTRA_DEBUG is defined. If EXTRA_DEBUG is not defined, client doesn't check the packet sequence number at all.

https://github.com/mysql/mysql-server/blob/1bfe02bdad6604d54913c62614bde57a055c8332/sql-common/net_serv.cc#L1506-L1516

@methane methane added this to the v1.8.0 milestone May 22, 2023
@methane methane modified the milestones: v1.8.0, v1.9.0 Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants