[Bug]: [benchmark][standalone][LRU] search raises error `fail to search on QueryNode 12: worker(12) query failed: cannot create std::vector larger than max_size()` in multi-collections scene #32383

wangting0128 · 2024-04-17T13:42:49Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: debug-chyezh-lru-dev_enable_asan-cbfb3cb-20240417
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: lru-multi-client-qcnd8

server:
The Milvus standalone pod restarted (see the RESTARTS column):

NAME                                                              READY   STATUS                            RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lru-big-data-etcd-0                                               1/1     Running                           0                2d9h    10.104.17.116   4am-node23   <none>           <none>
lru-big-data-milvus-standalone-5b954b66cd-bwkxd                   1/1     Running                           1 (4m54s ago)    158m    10.104.21.225   4am-node24   <none>           <none>
lru-big-data-minio-5bd4bc5cd6-6wfkx                               1/1     Running                           0                2d9h    10.104.21.125   4am-node24   <none>           <none>
lru-big-data-pulsar-bookie-0                                      1/1     Running                           0                2d9h    10.104.31.44    4am-node34   <none>           <none>
lru-big-data-pulsar-bookie-1                                      1/1     Running                           0                2d9h    10.104.30.211   4am-node38   <none>           <none>
lru-big-data-pulsar-bookie-2                                      1/1     Running                           0                2d9h    10.104.17.117   4am-node23   <none>           <none>
lru-big-data-pulsar-broker-0                                      1/1     Running                           0                2d9h    10.104.6.186    4am-node13   <none>           <none>
lru-big-data-pulsar-proxy-0                                       1/1     Running                           0                2d9h    10.104.6.185    4am-node13   <none>           <none>
lru-big-data-pulsar-recovery-0                                    1/1     Running                           0                2d9h    10.104.1.199    4am-node10   <none>           <none>
lru-big-data-pulsar-zookeeper-0                                   1/1     Running                           0                2d9h    10.104.17.115   4am-node23   <none>           <none>
lru-big-data-pulsar-zookeeper-1                                   1/1     Running                           0                2d9h    10.104.29.76    4am-node35   <none>           <none>
lru-big-data-pulsar-zookeeper-2                                   1/1     Running                           0                2d9h    10.104.34.168   4am-node37   <none>           <none>
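
The restart can also be picked out of the listing above programmatically. The following is a minimal sketch, assuming the pod listing is available as plain text; `restarted_pods` and `SAMPLE` are hypothetical names used only for illustration, and the sample keeps just two rows and the first five columns of the listing.

```python
import re

# Two rows copied from the listing above, trimmed to the first five columns.
SAMPLE = """\
NAME                                              READY   STATUS    RESTARTS        AGE
lru-big-data-etcd-0                               1/1     Running   0               2d9h
lru-big-data-milvus-standalone-5b954b66cd-bwkxd   1/1     Running   1 (4m54s ago)   158m
"""


def restarted_pods(listing: str) -> dict[str, int]:
    """Map pod name -> restart count for every pod that restarted at least once."""
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    # Columns are separated by runs of 2+ spaces; a single space can occur
    # inside a cell, e.g. "1 (4m54s ago)".
    header = re.split(r"\s{2,}", lines[0].strip())
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    result = {}
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        count = int(cells[restarts_col].split()[0])
        if count > 0:
            result[cells[name_col]] = count
    return result


print(restarted_pods(SAMPLE))
```

Only the standalone pod reports a non-zero restart count, which matches the observation that the server restarted during the test.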

client 1 pod name: lru-multi-client-qcnd8-1954846186
client 1 log:

client 2 pod name: lru-multi-client-qcnd8-4216188395
client 2 log:

Expected Behavior

No response

Steps To Reproduce

1. Deploy a standalone Milvus with LRU enabled and insert 49 million rows of data.
2. Run 2 clients concurrently; each one executes the following steps:
   a. create a collection with three fields: int64 (pk), float_vector, int64
   b. build an HNSW index
   c. insert 1 million 768-dim vectors
   d. flush the collection
   e. build the index again with the same params
   f. load the collection
   g. send concurrent requests:
       - search <- raises error
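
For readers without access to the benchmark framework, steps a to g for one client could be approximated with the pymilvus ORM API roughly as follows. This is a sketch under stated assumptions, not the actual fouram test: the primary key name `id`, the helper names, random vectors in place of the `laion1b_nolang` dataset, and a fixed number of searches per worker in place of the 1h run are all illustrative choices. The index, search, batch-size and concurrency values are taken from the report data in this issue.

```python
import random
from concurrent.futures import ThreadPoolExecutor

DIM = 768
TOTAL_ROWS = 1_000_000
BATCH_SIZE = 10_000   # 'ni_per' in the report data
CONCURRENCY = 20      # 'concurrent_number' in the report data
INDEX_PARAMS = {"index_type": "HNSW", "metric_type": "L2",
                "params": {"M": 30, "efConstruction": 360}}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"ef": 64}}


def batch_starts(total: int, batch: int) -> range:
    """First primary key of each insert batch."""
    return range(0, total, batch)


def random_vector(dim: int = DIM) -> list[float]:
    """Assumption: random data stands in for the real dataset."""
    return [random.random() for _ in range(dim)]


def run_client(uri: str, name: str, searches_per_worker: int = 1000) -> int:
    """Run steps a-g for one client; return the number of failed searches."""
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, MilvusException, connections)

    connections.connect(uri=uri)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),  # hypothetical pk name
        FieldSchema("float32_vector", DataType.FLOAT_VECTOR, dim=DIM),
        FieldSchema("int64_1", DataType.INT64),
    ])
    coll = Collection(name, schema)                              # a
    coll.create_index("float32_vector", INDEX_PARAMS)            # b
    for start in batch_starts(TOTAL_ROWS, BATCH_SIZE):           # c
        ids = list(range(start, start + BATCH_SIZE))
        coll.insert([ids, [random_vector() for _ in ids], ids])
    coll.flush()                                                 # d
    coll.create_index("float32_vector", INDEX_PARAMS)            # e
    coll.load()                                                  # f

    def worker(_: int) -> int:                                   # g
        failures = 0
        for _ in range(searches_per_worker):
            try:
                coll.search([random_vector()], "float32_vector",
                            SEARCH_PARAMS, limit=1)
            except MilvusException as exc:
                failures += 1
                print("search failed:", exc)
        return failures

    with ThreadPoolExecutor(CONCURRENCY) as pool:
        return sum(pool.map(worker, range(CONCURRENCY)))
```

To mirror the two-client scenario, `run_client` would be started in two separate processes with different collection names against the same server, after the 49 million rows from step 1 are already in place.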

Milvus Log

No response

Anything else?

client 1 test result:

[2024-04-17 13:34:04,310 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-17 13:34:04,311 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-17 13:34:04,311 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-17 13:34:04,311 -  INFO - fouram]: grpc     search                                                                        115209     4(0.00%) |    601      91    7989    690 |   33.16        0.00 (stats.py:789)
[2024-04-17 13:34:04,311 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-17 13:34:04,311 -  INFO - fouram]:          Aggregated                                                                    115209     4(0.00%) |    601      91    7989    690 |   33.16        0.00 (stats.py:789)
[2024-04-17 13:34:04,311 -  INFO - fouram]:  (stats.py:790)
[2024-04-17 13:34:04,313 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': '',
            'deploy_mode': '',
            'config_name': '',
            'config': {},
            'host': 'lru-big-data-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '1m',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1']},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 1,
                                                                  'search_param': {'ef': 64},
                                                                  'timeout': 3000,
                                                                  'random_data': True}}]},
            'run_id': 2024041738951590,
            'datetime': '2024-04-17 11:38:15.145458',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 3009.9716},
                            'insert': {'total_time': 287.5895,
                                       'VPS': 3477.1784,
                                       'batch_time': 2.8759,
                                       'batch': 10000},
                            'flush': {'RT': 3.021},
                            'load': {'RT': 5.7206},
                            'Locust': {'Aggregated': {'Requests': 115209,
                                                      'Fails': 4,
                                                      'RPS': 33.16,
                                                      'fail_s': 0.0,
                                                      'RT_max': 7989.69,
                                                      'RT_avg': 601.54,
                                                      'TP50': 690.0,
                                                      'TP99': 1000.0},
                                       'search': {'Requests': 115209,
                                                  'Fails': 4,
                                                  'RPS': 33.16,
                                                  'fail_s': 0.0,
                                                  'RT_max': 7989.69,
                                                  'RT_avg': 601.54,
                                                  'TP50': 690.0,
                                                  'TP99': 1000.0}}}}}

client 2 test result:

[2024-04-17 13:25:59,147 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-17 13:25:59,148 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-17 13:25:59,148 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-17 13:25:59,148 -  INFO - fouram]: grpc     search                                                                        120505     7(0.01%) |    596      94    7914    690 |   33.47        0.00 (stats.py:789)
[2024-04-17 13:25:59,148 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-17 13:25:59,148 -  INFO - fouram]:          Aggregated                                                                    120505     7(0.01%) |    596      94    7914    690 |   33.47        0.00 (stats.py:789)
[2024-04-17 13:25:59,148 -  INFO - fouram]:  (stats.py:790)
[2024-04-17 13:25:59,149 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': '',
            'deploy_mode': '',
            'config_name': '',
            'config': {},
            'host': 'lru-big-data-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'column_name': 'float32_vector',
                                                    'dim': 768,
                                                    'dataset_name': 'laion1b_nolang',
                                                    'dataset_size': '1m',
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['int64_1']},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 30,
                                                                  'efConstruction': 360}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'top_k': 1,
                                                                  'nq': 1,
                                                                  'search_param': {'ef': 64},
                                                                  'timeout': 3000,
                                                                  'random_data': True}}]},
            'run_id': 2024041738973678,
            'datetime': '2024-04-17 11:38:17.074997',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 2533.6166},
                            'insert': {'total_time': 284.3294,
                                       'VPS': 3517.0475,
                                       'batch_time': 2.8433,
                                       'batch': 10000},
                            'flush': {'RT': 2.5197},
                            'load': {'RT': 4.0285},
                            'Locust': {'Aggregated': {'Requests': 120505,
                                                      'Fails': 7,
                                                      'RPS': 33.47,
                                                      'fail_s': 0.0,
                                                      'RT_max': 7914.9,
                                                      'RT_avg': 596.27,
                                                      'TP50': 690.0,
                                                      'TP99': 1000.0},
                                       'search': {'Requests': 120505,
                                                  'Fails': 7,
                                                  'RPS': 33.47,
                                                  'fail_s': 0.0,
                                                  'RT_max': 7914.9,
                                                  'RT_avg': 596.27,
                                                  'TP50': 690.0,
                                                  'TP99': 1000.0}}}}}
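
Locust rounds the failure ratio to two decimals, which hides how rare the error is (it prints 0.00% for client 1). A small sketch that recomputes the ratio from the request and failure counts above; the names `RUNS` and `failure_percent` are illustrative only.

```python
# (requests, fails) copied from the two Locust summaries above.
RUNS = {
    "client 1": (115209, 4),
    "client 2": (120505, 7),
}


def failure_percent(requests: int, fails: int) -> float:
    """Share of failed requests, in percent."""
    return 100.0 * fails / requests


for name, (requests, fails) in RUNS.items():
    print(f"{name}: {fails} of {requests} searches failed "
          f"({failure_percent(requests, fails):.4f}%)")
```

Both clients fail on well under 0.01% of searches, so the error is intermittent rather than systematic.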

yanliang567 · 2024-04-17T14:28:42Z

/unassign

xiaofan-luan · 2024-04-17T20:52:39Z

/assign @MrPresent-Han

stale · 2024-05-18T19:13:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 · 2024-05-20T01:18:04Z

keep it

wangting0128 added the kind/bug, needs-triage, and test/benchmark labels Apr 17, 2024

wangting0128 assigned chyezh and yanliang567 Apr 17, 2024

yanliang567 added the triage/accepted label and removed the needs-triage label Apr 17, 2024

yanliang567 added this to the 2.4.1 milestone Apr 17, 2024

sre-ci-robot unassigned yanliang567 Apr 17, 2024

sre-ci-robot assigned MrPresent-Han Apr 17, 2024

stale bot added the stale label May 18, 2024

stale bot removed the stale label May 20, 2024
