Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sctool takes more time (~3min) then before to retrieve manager-server version after upgrade #3849

Closed
mikliapko opened this issue May 14, 2024 · 1 comment · Fixed by #3855
Assignees
Labels
bug Something isn't working

Comments

@mikliapko
Copy link

Steps:

  1. Manager 3.1.2 is installed.
  2. Upgrade manager server (to 3.2.7-0.20240513.769fc4f1a-SNAPSHOT)
  3. Upgrade and start manager agents
  4. Start manager server systemctl start scylla-manager
  5. Sleep 30 seconds and get manager version via sctool version

Actual result:
Error Error: Get "http://127.0.0.1:12345/api/v1/version": dial tcp 127.0.0.1:12345: connect: connection refused while getting the version of client.

11:54:38  sdcm.mgmt.common.ScyllaManagerError: Encountered an error on sctool command: version: Encountered a bad command exit code!
11:54:38  Command: 'sudo sctool version'
11:54:38  Exit code: 1
11:54:38  Stdout:
11:54:38  Client version: 3.2.7-0.20240509.769fc4f1a
11:54:38  Stderr:
11:54:38  Error: Get "http://127.0.0.1:12345/api/v1/version": dial tcp 127.0.0.1:12345: connect: connection refused

Expected result:
In older versions it was enough to sleep 30 seconds before running sctool version to get the versions. So, expected the same behavior here as well.

Additional info:

May 14 09:18:05 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:05.350Z","M":"Scylla Manager Server","version":"3.2.7-0.20240513.769fc4f1a-SNAPSHOT","pid":23498,"_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:07 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:07.367Z","M":"Using config","c":{"HTTP":"127.0.0.1:12345","HTTPS":"","TLSVersion":"TLSv1.2","TLSCertFile":"","TLSKeyFile":"","TLSCAFile":"","Prometheus":"127.0.0.1:10091","Debug":"127.0.0.1:5112","ClientCacheTimeout":900000000000,"Logger":{"mode":"stderr","level":"info","sampling":null,"encoding":"json","Development":false},"Database":{"Hosts":["127.0.0.1"],"SSL":false,"User":"","Password":"","LocalDC":"","Keyspace":"scylla_manager","MigrateTimeout":30000000000,"MigrateMaxWaitSchemaAgreement":300000000000,"ReplicationFactor":1,"Timeout":1000000000,"TokenAware":true,"Port":9042,"InitAddr":""},"SSL":{"CertFile":"","Validate":true,"UserCertFile":"","UserKeyFile":""},"Healthcheck":{"NodeInfoTTL":300000000000,"MaxTimeout":1000000000,"Probes":0,"RelativeTimeout":0},"ConfigCache":{"UpdateFrequency":300000000000},"Backup":{"DiskSpaceFreeMinPercent":10,"LongPollingTimeoutSeconds":10,"AgeMax":86400000000000},"Restore":{"DiskSpaceFreeMinPercent":10,"LongPollingTimeoutSeconds":10},"Repair":{"PollInterval":50000000,"LongPollingTimeoutSeconds":10,"AgeMax":0,"GracefulStopTimeout":30000000000,"ForceRepairType":"auto","Murmur3PartitionerIgnoreMSBBits":12},"TimeoutConfig":{"Timeout":30000000000,"MaxTimeout":3600000000000,"ListTimeout":300000000000,"Backoff":{"WaitMin":1000000000,"WaitMax":30000000000,"MaxRetries":9,"Multiplier":2,"Jitter":0.2},"InteractiveBackoff":{"WaitMin":1000000000,"WaitMax":0,"MaxRetries":1,"Multiplier":0,"Jitter":0},"PoolDecayDuration":1800000000000}},"config_files":["/etc/scylla-manager/scylla-manager.yaml"],"_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:07 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:07.367Z","M":"Checking database connectivity...","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:07 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:07.400Z","M":"Migrating schema","keyspace":"scylla_manager","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:07 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:07.433Z","M":"Running migration","migration":"v3.2.0.cql"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.319Z","M":"Running migration","migration":"v3.2.2.cql"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.617Z","M":"Running migration","migration":"v3.2.6.cql"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.943Z","M":"Schema up to date","keyspace":"scylla_manager","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.959Z","M":"Initializing the clusters configuration cache","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.962Z","N":"cluster","M":"Creating new Scylla HTTP client","cluster_id":"81d123d3-2680-43c1-892e-1b7b441c59c5","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:09 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:09.966Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"1.12975014s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:11 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:11.097Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"2.380858913s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:13 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:13.479Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"4.259215654s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:17 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:17.739Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"6.445182799s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:24 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:24.186Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"15.790382382s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:18:39 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:18:39.978Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"30.060540412s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:19:10 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:19:10.040Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"35.843203465s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:19:45 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:19:45.884Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"25.661521099s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:20:11 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:20:11.546Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"24.530577691s","error":"dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
May 14 09:20:36 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"ERROR","T":"2024-05-14T09:20:36.078Z","N":"cluster","M":"Cannot find known hosts using coordinator host","host":"","error":"giving up after 10 attempts: dial tcp :10001: connect: connection refused","_trace_id":"0t81E-FJTIiFFRPVbhf3Ug","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:177\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:155\ngithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations.(*Client).StorageServiceHostIDGet\n\tgithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations/operations_client.go:11791\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*Client).Datacenters\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/client_scylla.go:161\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).discoverHosts\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:162\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182\nruntime.goexit\n\truntime/asm_amd64.s:1695\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:164\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182"}
May 14 09:20:36 manager-upgrade-master-f-monitor-node-c526625b-1 scylla-manager[23498]: {"L":"INFO","T":"2024-05-14T09:20:36.082Z","N":"cluster.client","M":"Measuring datacenter latencies","dcs":["us-east"],"_trace_id":"0t81E-FJTIiFFRPVbhf3Ug"}
@rayakurl rayakurl added the bug Something isn't working label May 15, 2024
@Michal-Leszczynski
Copy link
Collaborator

From SM logs:

May 13 09:51:39  {"L":"INFO","T":"2024-05-13T09:51:39.603Z","M":"Schema up to date","keyspace":"scylla_manager","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39  {"L":"INFO","T":"2024-05-13T09:51:39.627Z","M":"Initializing the clusters configuration cache","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39  {"L":"INFO","T":"2024-05-13T09:51:39.632Z","N":"cluster","M":"Creating new Scylla HTTP client","cluster_id":"c0e3ca23-81c5-4275-a1d6-9e8a94f305b2","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:39  {"L":"INFO","T":"2024-05-13T09:51:39.639Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"927.211008ms","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:40  {"L":"INFO","T":"2024-05-13T09:51:40.567Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"2.000883071s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:42  {"L":"INFO","T":"2024-05-13T09:51:42.569Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"4.495824842s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:47  {"L":"INFO","T":"2024-05-13T09:51:47.066Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"7.07442966s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:51:54  {"L":"INFO","T":"2024-05-13T09:51:54.142Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"13.149685425s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:52:07  {"L":"INFO","T":"2024-05-13T09:52:07.293Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"32.521097873s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:52:39  {"L":"INFO","T":"2024-05-13T09:52:39.816Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"35.929166916s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:53:15  {"L":"INFO","T":"2024-05-13T09:53:15.745Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"31.28088056s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:53:47  {"L":"INFO","T":"2024-05-13T09:53:47.027Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceHostIdGet","wait":"27.507559245s","error":"dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:54:14  {"L":"ERROR","T":"2024-05-13T09:54:14.537Z","N":"cluster","M":"Cannot find known hosts using coordinator host","host":"","error":"giving up after 10 attempts: dial tcp :10001: connect: connection refused","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw","errorStack":"github.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*retryableOperation).submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:177\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.retryableTransport.Submit\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/retry.go:155\ngithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations.(*Client).StorageServiceHostIDGet\n\tgithub.com/scylladb/scylla-manager/v3/swagger/gen/scylla/v1/client/operations/operations_client.go:11791\ngithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient.(*Client).Datacenters\n\tgithub.com/scylladb/scylla-manager/v3/pkg/scyllaclient/client_scylla.go:161\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).discoverHosts\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:188\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:162\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182\nruntime.goexit\n\truntime/asm_amd64.s:1695\n","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/go-log@v0.0.7/logger.go:84\ngithub.com/scylladb/scylla-manager/v3/pkg/service/cluster.(*Service).CreateClientNoCache\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/cluster/service.go:164\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateSingle\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:131\ngithub.com/scylladb/scylla-manager/v3/pkg/service/configcache.(*Service).updateAll.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/service/configcache/service.go:182"}
May 13 09:54:14  {"L":"INFO","T":"2024-05-13T09:54:14.541Z","N":"cluster.client","M":"Measuring datacenter latencies","dcs":["us-east"],"_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}
May 13 09:54:14  {"L":"INFO","T":"2024-05-13T09:54:14.560Z","M":"Clusters config cache initialized","_trace_id":"GY1OgBL-SwWHDGF5bnfDhw"}

As it can be seen above, SM tries to connect to the host represented by an empty string "". It comes from:

func (s *Service) CreateClientNoCache(ctx context.Context, clusterID uuid.UUID) (*scyllaclient.Client, error) {
	s.logger.Info(ctx, "Creating new Scylla HTTP client", "cluster_id", clusterID)

	c, err := s.GetClusterByID(ctx, clusterID)
	if err != nil {
		return nil, err
	}

	config := scyllaclient.DefaultConfigWithTimeout(s.timeoutConfig)
	if c.Port != 0 {
		config.Port = fmt.Sprint(c.Port)
	}
	config.AuthToken = c.AuthToken
	config.Hosts = append([]string{c.Host}, c.KnownHosts...) // <- bug

This can't happen with clusters added to SM 3.2, as it persists the --host flag value in DB. That's not the case for SM 3.1, where only obtained KnownHosts were saved and --host discarded. This results in SM trying to connect to an empty host first, which slows down the process of creating scylla client and creates lots of error logs. The client is eventually created when SM moves to connecting to KnowHosts, so this is not a correctness issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants