[Bug]: excessive memory usage in datanode for 10M-dimensional HNSW index #32032
Comments
This is the minimum requirement.
@douglarek I'd like to know more about how you insert data into Milvus: how many concurrent insert processes? How many entities per insert request? Did you manually call flush() during the insertion? Also, if possible, please share the collection schema.
Python SDK, 8 processes, 1000 vectors per request, no manual flush, and the collection schema is: By monitoring the metric By the way, as shown in the screenshot in my issue, why is the memory not balanced between the two datanodes?
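For reference, a minimal sketch of the ingest pattern described above (8 processes, 1000 vectors per request, no manual flush). The collection name, field layout, and vector dimension are assumptions for illustration; the reporter's actual schema is not shown here.

```python
# Sketch of the described insert pattern, assuming an int64 primary key and a
# float vector field. Names and dim are placeholders, not the real schema.
import random
from multiprocessing import Process

from pymilvus import connections, Collection

def ingest_worker(worker_id: int, batches: int = 100, batch_size: int = 1000, dim: int = 128):
    connections.connect(alias=f"w{worker_id}", host="localhost", port="19530")
    coll = Collection("example_collection", using=f"w{worker_id}")
    for b in range(batches):
        base = worker_id * batches * batch_size + b * batch_size
        ids = [base + i for i in range(batch_size)]
        vectors = [[random.random() for _ in range(dim)] for _ in range(batch_size)]
        # Column-ordered data matching the schema; no manual flush() afterwards,
        # segments are sealed and flushed by Milvus itself.
        coll.insert([ids, vectors])

if __name__ == "__main__":
    workers = [Process(target=ingest_worker, args=(i,)) for i in range(8)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```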
@douglarek Regarding the datanode workload imbalance: it looks like there is only one shard for this collection, so only one datanode can carry the workload, which is by design.
@yanliang567
In most cases, you don't have to call flush() manually; Milvus will do it automatically.
I'm not the original author; I was just commenting out of curiosity. You're asking this person, right? @douglarek
Try setting it to 32GB, and I guess usage will be around 20GB.
I've noticed an interesting phenomenon: no matter how many partitions are set for all topics under xxx_rootcoord-dml_xx, and no matter how many datanodes there are, the data always aggregates to the same datanode.
To utilize multiple datanodes or DML channels, the number of shards must be increased, at some cost to search performance.
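As an illustration, a hedged sketch of setting the shard count when creating a collection with pymilvus (collection name, field names, and dimension are placeholders; as far as I know the shard count is fixed at creation time, so this applies only to new collections):

```python
# Sketch: create a collection with more shards (DML channels) so inserts can be
# spread across multiple datanodes. Names, fields, and dim are placeholders.
from pymilvus import connections, CollectionSchema, FieldSchema, DataType, Collection

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields, description="example collection")

# More shards -> more DML channels -> more datanodes can share the insert load,
# but search results must be merged across shards, which costs some performance.
coll = Collection(name="example_collection", schema=schema, shards_num=4)
```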
The number of shards for the collection is 4. I am currently running with 4 datanodes, and all Kafka DML topics have 3 partitions each. What I'm observing now is that 4 of the Kafka DML topics have been written to, but only one datanode seems to be actively processing, while the other three are only updating checkpoints. I'm not sure if this is normal, but the datanode that is actively processing is experiencing a rapid increase in memory usage, now reaching 40GB (the limit is 48GB). The proxy data ingress speed is 12MB/s. Below are some logs:
Working datanode:
Other datanodes:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Usually the data should be split evenly across shards, because Milvus hashes the primary key of each entity and writes it into one of the shards based on that PK.
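To illustrate the idea, here is a minimal sketch of hash-based routing by primary key. This is not the hash function Milvus actually uses; it just shows why roughly uniform PKs lead to roughly even shard load.

```python
# Sketch of hash-based shard routing: each entity's PK is hashed and the entity
# is written to the corresponding shard / DML channel. Illustrative only.
from collections import Counter

def shard_for_pk(pk: int, num_shards: int) -> int:
    return hash(pk) % num_shards

counts = Counter(shard_for_pk(pk, 4) for pk in range(1_000_000))
print(counts)  # expected: roughly 250k entities per shard
```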
Is there an existing issue for this?
Environment
Current Behavior
The current datanode's memory usage is too high:
Expected Behavior
Steps To Reproduce
Milvus Log
Logs captured through export_milvus_log.sh: https://drive.google.com/file/d/18jaCfxDUxV_4gE7DDJ29T5ggAzo_UzPA/view?usp=sharing
Anything else?
Connection information for Kafka, etcd, and other important connections has been sanitized.