[Bug]: [benchmark] Some load timeout failures during concurrent DML testing #33120

Open
1 task done
elstic opened this issue May 17, 2024 · 1 comment
Labels: kind/bug, test/benchmark, triage/accepted

elstic commented May 17, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240516-5b27a0cd
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task : fouram-disk-stab-1715882400, id : 3
case: test_concurrent_locust_diskann_compaction_cluster

After inserting 100,000 vectors into Milvus and running concurrent load, search, query, insert, delete, and flush requests for 5 hours, 179 load requests failed.

   'load': {'Requests': 53374,
            'Fails': 179,
            'RPS': 2.97,
            'fail_s': 0.0,
            'RT_max': 30219.15,
            'RT_avg': 1293.95,
            'TP50': 220.0,
            'TP99': 22000.0},
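
For scale, the failure ratio implied by these counters can be worked out directly. A minimal sketch, assuming `Fails` counts the failed requests out of `Requests` (the variable names are illustrative and not part of the benchmark tooling):

```python
# Counters copied from the 'load' entry of the report above.
load_stats = {"Requests": 53374, "Fails": 179}

# Share of load requests that failed, as a percentage.
fail_pct = 100.0 * load_stats["Fails"] / load_stats["Requests"]
print(f"load failure rate: {fail_pct:.2f}%")
```

That is roughly one failed load in every 300 requests.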

client error log:
[image: client error log screenshot]

server:

fouram-disk-sta82400-3-87-9477-etcd-0                             1/1     Running       0               5m25s   10.104.18.119   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-1                             1/1     Running       0               5m25s   10.104.34.50    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-2                             1/1     Running       0               5m24s   10.104.25.235   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt   1/1     Running       3 (4m29s ago)   5m25s   10.104.25.226   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5   1/1     Running       3 (4m29s ago)   5m25s   10.104.33.154   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x   1/1     Running       0               5m25s   10.104.25.224   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c   1/1     Running       3 (4m33s ago)   5m25s   10.104.32.142   4am-node39   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7      1/1     Running       3 (4m27s ago)   5m25s   10.104.25.225   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8   1/1     Running       3 (4m31s ago)   5m25s   10.104.25.223   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m   1/1     Running       3 (4m31s ago)   5m25s   10.104.19.95    4am-node28   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5   1/1     Running       3 (4m27s ago)   5m24s   10.104.25.227   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-0                            1/1     Running       0               5m25s   10.104.18.111   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-1                            1/1     Running       0               5m25s   10.104.34.52    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-2                            1/1     Running       0               5m24s   10.104.25.239   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-3                            1/1     Running       0               5m24s   10.104.33.160   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0                    1/1     Running       0               5m25s   10.104.25.233   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1                    1/1     Running       0               5m24s   10.104.34.53    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2                    1/1     Running       0               5m24s   10.104.18.124   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp           0/1     Completed     0               5m25s   10.104.5.186    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0                    1/1     Running       0               5m25s   10.104.4.20     4am-node11   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0                     1/1     Running       0               5m25s   10.104.5.185    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897           0/1     Completed     0               5m25s   10.104.5.184    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0                  1/1     Running       0               5m24s   10.104.5.187    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0                 1/1     Running       0               5m25s   10.104.34.47    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1                 1/1     Running       0               4m35s   10.104.23.61    4am-node27   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2                 1/1     Running       0               3m19s   10.104.19.111   4am-node28   <none>           <none> (base.py:257)
[2024-05-16 23:13:10,730 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|fouram-disk-sta82400-3-87-9477-milvus|fouram-disk-sta82400-3-87-9477-minio|fouram-disk-sta82400-3-87-9477-etcd|fouram-disk-sta82400-3-87-9477-pulsar|fouram-disk-sta82400-3-87-9477-zookeeper|fouram-disk-sta82400-3-87-9477-kafka|fouram-disk-sta82400-3-87-9477-log|fouram-disk-sta82400-3-87-9477-tikv'  (util_cmd.py:14)
[2024-05-16 23:13:21,029 -  INFO - fouram]: [CliClient] pod details of release(fouram-disk-sta82400-3-87-9477): 
 I0516 23:13:12.374287    3548 request.go:665] Waited for 1.19762423s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/discovery.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouram-disk-sta82400-3-87-9477-etcd-0                             1/1     Running            0               5h7m    10.104.18.119   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-1                             1/1     Running            0               5h7m    10.104.34.50    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-2                             1/1     Running            0               5h7m    10.104.25.235   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.226   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5   1/1     Running            3 (5h6m ago)    5h7m    10.104.33.154   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x   1/1     Running            0               5h7m    10.104.25.224   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c   1/1     Running            3 (5h6m ago)    5h7m    10.104.32.142   4am-node39   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7      1/1     Running            3 (5h6m ago)    5h7m    10.104.25.225   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.223   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m   1/1     Running            3 (5h6m ago)    5h7m    10.104.19.95    4am-node28   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.227   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-0                            1/1     Running            0               5h7m    10.104.18.111   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-1                            1/1     Running            0               5h7m    10.104.34.52    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-2                            1/1     Running            0               5h7m    10.104.25.239   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-3                            1/1     Running            0               5h7m    10.104.33.160   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0                    1/1     Running            0               5h7m    10.104.25.233   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1                    1/1     Running            0               5h7m    10.104.34.53    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2                    1/1     Running            0               5h7m    10.104.18.124   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp           0/1     Completed          0               5h7m    10.104.5.186    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0                    1/1     Running            0               5h7m    10.104.4.20     4am-node11   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0                     1/1     Running            0               5h7m    10.104.5.185    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897           0/1     Completed          0               5h7m    10.104.5.184    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0                  1/1     Running            0               5h7m    10.104.5.187    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0                 1/1     Running            0               5h7m    10.104.34.47    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1                 1/1     Running            0               5h6m    10.104.23.61    4am-node27   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2                 1/1     Running            0               5h4m    10.104.19.111   4am-node28   <none>           <none>
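
In both listings, seven Milvus pods report 3 restarts each, and the "ago" values sit close to the pod age, which suggests the restarts happened shortly after deployment rather than during the 5h run. A rough sketch for pulling restarted pods out of such a listing follows; the regular expression and the function name are assumptions made for illustration, and the sample rows are copied from the second listing:

```python
import re

# Two rows copied from the pod listing above; a real check would read
# the full `kubectl get pods -o wide` output instead.
SAMPLE = """\
fouram-disk-sta82400-3-87-9477-etcd-0                             1/1     Running            0               5h7m    10.104.18.119   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m   1/1     Running            3 (5h6m ago)    5h7m    10.104.19.95    4am-node28   <none>           <none>
"""

# NAME, READY, STATUS, RESTARTS, then an optional "(<age> ago)" suffix.
ROW = re.compile(r"^(\S+)\s+(\d+/\d+)\s+(\S+)\s+(\d+)(?:\s+\(([^)]*)\))?\s")


def restarted_pods(listing):
    """Return (pod name, restart count, last restart) for pods with restarts."""
    found = []
    for line in listing.splitlines():
        match = ROW.match(line)
        # Header and log lines do not match the row pattern and are skipped.
        if match and int(match.group(4)) > 0:
            found.append((match.group(1), int(match.group(4)), match.group(5)))
    return found


for name, count, last in restarted_pods(SAMPLE):
    print(name, count, last)
```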

Expected Behavior

No load failures.

Steps To Reproduce

1. create a collection
2. build a DiskANN index on the vector column
3. insert 100k vectors
4. flush the collection
5. build the index on the vector column with the same parameters
6. count the total number of rows
7. load the collection
8. run concurrent search, query, flush, insert, delete, and load requests
9. keep step 8 running for 5h
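
The shape of step 8 can be sketched with the standard library alone. This is not the benchmark's code: the case name suggests the real run is driven by locust against a live cluster, whereas the `StubClient` below is a hypothetical stand-in that makes every 5th `load` raise, so the tallying logic can be tried without Milvus. The `Requests` and `Fails` keys mirror the report format above.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

OPERATIONS = ["search", "query", "flush", "insert", "delete", "load"]


class StubClient:
    """Hypothetical stand-in for a Milvus client: every call succeeds
    except each 5th `load`, which raises to mimic a timeout."""

    def __init__(self):
        self._lock = threading.Lock()
        self._loads = 0

    def call(self, op):
        if op == "load":
            with self._lock:
                self._loads += 1
                n = self._loads
            if n % 5 == 0:
                raise TimeoutError("simulated load timeout")


def run_workload(client, rounds, workers=6):
    """Issue `rounds` calls of every operation concurrently and tally results."""
    requests, fails = Counter(), Counter()
    lock = threading.Lock()

    def task(op):
        try:
            client.call(op)
            ok = True
        except Exception:
            ok = False
        with lock:
            requests[op] += 1
            if not ok:
                fails[op] += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so every task finishes before returning.
        list(pool.map(task, OPERATIONS * rounds))
    return {op: {"Requests": requests[op], "Fails": fails[op]} for op in OPERATIONS}


stats = run_workload(StubClient(), rounds=10)
print(stats["load"])
```

A real reproduction would replace `StubClient.call` with actual client requests and run for a fixed duration instead of a fixed number of rounds.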

Milvus Log

No response

Anything else?

No response

elstic added the kind/bug, needs-triage, and test/benchmark labels May 17, 2024
elstic added this to the 2.4.2 milestone May 17, 2024
yanliang567 added the triage/accepted label and removed the needs-triage label May 17, 2024
@yanliang567

/assign @weiliu1031
/unassign

yanliang567 modified the milestones: 2.4.2, 2.4.3, 2.4.4 May 24, 2024
yanliang567 modified the milestones: 2.4.4, 2.4.5 Jun 5, 2024