
Scylla test with all addresses ipv6 public fails on nodetool status #7447

Closed · soyacz opened this issue May 17, 2024 · 7 comments · Fixed by #7460
Labels: Bug (Something isn't working right)

soyacz (Contributor) commented May 17, 2024

Packages

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Tried a provision test with the configurations/network_config/all_addresses_ipv6_public.yaml config and it failed on "Waiting for nodes to join the cluster",
while nodetool status shows all nodes up:

< t:2024-05-17 12:43:13,872 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > Datacenter: eu-west
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > ===================
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > Status=Up/Down
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > |/ State=Normal/Leaving/Joining/Moving
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > -- Address                                 Load    Tokens Owns Host ID                              Rack
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:2415:944:1500:6022  1.61 MB 256    ?    a35d9f4b-220a-43d7-8317-bcf2a1e999b3 1a  
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:4a6e:f30a:cbf0:f73  1.85 MB 256    ?    dd8fd382-83c8-4e31-a9a7-aaaa3bb98d11 1a  
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:4eca:abac:68a4:7d4d 1.67 MB 256    ?    ae2d7ff7-85a6-4b90-9bb4-d39ef1a0d260 1a  
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:7a4a:8820:7a5a:24e  1.69 MB 256    ?    63332c60-933b-4f12-bc49-ad7294e7c208 1a  
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:ac27:1cec:17ce:814  1.66 MB 256    ?    11ffd883-4b93-49c0-9fd5-410142398859 1a  
< t:2024-05-17 12:43:13,873 f:base.py         l:230  c:RemoteLibSSH2CmdRunner p:DEBUG > UN 2a05:d018:12e3:f000:f763:aad5:adad:144a 1.67 MB 256    ?    6e298767-b691-4915-a693-9a499a15ca99 1a  
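For illustration only (a minimal sketch, not SCT's actual parser), this is roughly how an address can be pulled out of one of those "UN ..." status lines; note that nodetool prints the compressed IPv6 form, without leading zeros in each group:

    import ipaddress

    status_line = ("UN 2a05:d018:12e3:f000:2415:944:1500:6022  1.61 MB 256    ?    "
                   "a35d9f4b-220a-43d7-8317-bcf2a1e999b3 1a")
    state, address = status_line.split()[:2]       # first two columns: state flags and address
    print(state)                                   # UN
    print(address)                                 # 2a05:d018:12e3:f000:2415:944:1500:6022 (compressed)
    print(ipaddress.ip_address(address).exploded)  # 2a05:d018:12e3:f000:2415:0944:1500:6022 (zero-padded)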

Installation details

Cluster size: 6 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-10gb-3h-speed-up-db-node-e6323f19-6 (34.246.135.76 | 2a05:d018:12e3:f000:f763:aad5:adad:144a) (shards: 7)
  • longevity-10gb-3h-speed-up-db-node-e6323f19-5 (54.247.14.104 | 2a05:d018:12e3:f000:4a6e:f30a:cbf0:f73) (shards: 7)
  • longevity-10gb-3h-speed-up-db-node-e6323f19-4 (3.250.174.162 | 2a05:d018:12e3:f000:7a4a:8820:7a5a:24e) (shards: 7)
  • longevity-10gb-3h-speed-up-db-node-e6323f19-3 (52.17.71.170 | 2a05:d018:12e3:f000:4eca:abac:68a4:7d4d) (shards: 7)
  • longevity-10gb-3h-speed-up-db-node-e6323f19-2 (54.216.72.174 | 2a05:d018:12e3:f000:2415:944:1500:6022) (shards: 7)
  • longevity-10gb-3h-speed-up-db-node-e6323f19-1 (3.251.98.1 | 2a05:d018:12e3:f000:ac27:1cec:17ce:814) (shards: 7)

OS / Image: ami-03e1418be972f5f58 (aws: undefined_region)

Test: provision-test
Test id: e6323f19-83c6-4ebc-97cb-23471b8b64fe
Test name: scylla-staging/lukasz/provision-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor e6323f19-83c6-4ebc-97cb-23471b8b64fe
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs e6323f19-83c6-4ebc-97cb-23471b8b64fe

Logs:

Jenkins job URL
Argus

soyacz (Contributor, Author) commented May 17, 2024

Also seeing errors like:
< t:2024-05-17 12:41:55,817 f:cluster.py l:2688 c:sdcm.cluster p:ERROR > Get nodes statuses. Failed to find a node in cluster by IP: 2a05:d018:12e3:f000:ac27:1cec:17ce:0814
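A minimal sketch of why that lookup fails (illustrative only, not SCT code; which side holds which form is an assumption based on the discussion below): the address from the error above carries the leading zero (...:0814) while the other side holds the compressed form (...:814), so a plain string comparison returns False even though both name the same address. Python's standard ipaddress module treats them as equal:

    import ipaddress

    from_nodetool = "2a05:d018:12e3:f000:ac27:1cec:17ce:0814"  # zero-padded form, as in the error above
    from_metadata = "2a05:d018:12e3:f000:ac27:1cec:17ce:814"   # compressed form, e.g. as the cloud API may return it

    print(from_nodetool == from_metadata)                        # False - plain string comparison
    print(ipaddress.ip_address(from_nodetool) ==
          ipaddress.ip_address(from_metadata))                   # True - same IPv6 address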

fruch (Contributor) commented May 19, 2024

Seems like we are comparing again without leading zeros.

I thought I'd fixed it, but it seems we broke it again.

fruch (Contributor) commented May 19, 2024

> Seems like we are comparing again without leading zeros.
> I thought I'd fixed it, but it seems we broke it again.

this was the fix: #7047

and it seems like together with 2c2369d something broke, or AWS started returning the short version of IPv6

fruch added the Bug (Something isn't working right) label on May 20, 2024
fruch (Contributor) commented May 20, 2024

@juliayakovlev please take a look at this one, let's try to find why this isn't working as expected

juliayakovlev (Contributor) commented May 20, 2024

> Seems like we are comparing again without leading zeros.
> I thought I'd fixed it, but it seems we broke it again.
>
> this was the fix: #7047
>
> and it seems like together with 2c2369d something broke, or AWS started returning the short version of IPv6

@fruch
I think I need to add similar "magic", like you did in https://github.com/scylladb/scylla-cluster-tests/pull/7047/files, here:

ipv6_addresses = [ipv6_address['Ipv6Address'] for ipv6_address in interface.ipv6_addresses]
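A hedged sketch of what that could look like with the same normalization applied (the interface parameter stands for the boto3 NetworkInterface object the quoted line iterates over; this is a suggestion, not verified against the final fix):

    import ipaddress

    def exploded_ipv6_addresses(interface) -> list[str]:
        """Same data as the quoted comprehension, but zero-padded ('...:814' -> '...:0814'),
        so later comparisons with the exploded nodetool output match."""
        return [
            ipaddress.IPv6Address(item['Ipv6Address']).exploded
            for item in interface.ipv6_addresses
        ]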

fruch assigned juliayakovlev and unassigned juliayakovlev and soyacz on May 20, 2024
fruch (Contributor) commented May 20, 2024

> > Seems like we are comparing again without leading zeros.
> > I thought I'd fixed it, but it seems we broke it again.
> >
> > this was the fix: #7047
> > and it seems like together with 2c2369d something broke, or AWS started returning the short version of IPv6
>
> @fruch I think I need to add similar "magic", like you did in https://github.com/scylladb/scylla-cluster-tests/pull/7047/files, here:
>
> ipv6_addresses = [ipv6_address['Ipv6Address'] for ipv6_address in interface.ipv6_addresses]

OK, so let's do it (let's try doing it in every place we retrieve IPv6 addresses, also for GCE/Azure?)
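A rough sketch of what a single shared helper could look like (hypothetical name, not code from the repo), so the AWS/GCE/Azure code paths could all normalize whatever textual form the cloud API returns:

    import ipaddress

    def normalize_ipv6(address: str) -> str:
        """Return the exploded (fully zero-padded) form of an IPv6 address;
        IPv4 addresses are returned unchanged."""
        ip = ipaddress.ip_address(address)
        return ip.exploded if ip.version == 6 else address

    # e.g. normalize_ipv6("2a05:d018:12e3:f000:ac27:1cec:17ce:814")
    #      -> "2a05:d018:12e3:f000:ac27:1cec:17ce:0814"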

juliayakovlev added a commit to juliayakovlev/scylla-cluster-tests that referenced this issue May 21, 2024
The problem was found during an IPv6 test.
The IPv6 address was converted to the full format when parsing the nodetool status result;
that change was introduced by scylladb#7047.

But the IPv6 address was not converted when collecting info about the nodes' network interfaces (AWS).

When checking node statuses, we compare the 'nodetool status' output with the node address
kept in the network_interfaces object. It fails because the IPv6 address is missing the leading zeros.
As a result, the test fails with 'Failed to find a node in cluster by IP' errors.

Fixes: scylladb#7447
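For illustration (hypothetical names, not the actual SCT code), the lookup the commit message describes becomes robust once both sides are reduced to the same exploded form:

    import ipaddress

    def find_node_by_ip(nodes, ip_from_nodetool):
        """nodes: objects carrying an 'ip_addresses' list collected from their network interfaces."""
        wanted = ipaddress.ip_address(ip_from_nodetool).exploded
        for node in nodes:
            if wanted in {ipaddress.ip_address(a).exploded for a in node.ip_addresses}:
                return node
        return None  # previously surfaced as 'Failed to find a node in cluster by IP'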
juliayakovlev (Contributor) commented:
#7460

juliayakovlev added a commit to juliayakovlev/scylla-cluster-tests that referenced this issue May 21, 2024, with the same commit message as above plus one addition:

Also fix the same problem for peer and gossip output.

Fixes: scylladb#7447
fruch pushed a commit that referenced this issue May 21, 2024