[Bug]: excessive memory usage in datanode for 10M-dimensional HNSW index #32032
Comments
This is the minimum requirement.
@douglarek I'd like to know more about how you insert data into Milvus: how many concurrent insert processes? How many entities per insert request? Did you manually call flush() during the insertion? Also, if possible, please share the collection schema.
Python SDK, 8 processes, 1000 vectors per request, no manual flush, and the collection schema is: By monitoring the metric By the way, as shown in the screenshot in my issue, why is the memory not balanced between the two datanodes?
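For reference, a minimal sketch of the ingest pattern described above (8 processes, 1000 vectors per request, no manual flush). The collection name, field layout, and vector dimension are assumptions for illustration; the reporter's actual schema is not shown here.

```python
# Sketch of the described insert pattern, assuming an int64 primary key and a
# float vector field. Names and dim are placeholders, not the real schema.
import random
from multiprocessing import Process

from pymilvus import connections, Collection

def ingest_worker(worker_id: int, batches: int = 100, batch_size: int = 1000, dim: int = 128):
    connections.connect(alias=f"w{worker_id}", host="localhost", port="19530")
    coll = Collection("example_collection", using=f"w{worker_id}")
    for b in range(batches):
        base = worker_id * batches * batch_size + b * batch_size
        ids = [base + i for i in range(batch_size)]
        vectors = [[random.random() for _ in range(dim)] for _ in range(batch_size)]
        # Column-ordered data matching the schema; no manual flush() afterwards,
        # segments are sealed and flushed by Milvus itself.
        coll.insert([ids, vectors])

if __name__ == "__main__":
    workers = [Process(target=ingest_worker, args=(i,)) for i in range(8)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```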
@douglarek Regarding the datanode workload imbalance: it looks like there is only one shard for this collection, so only one datanode can carry the workload, which is by design.
@yanliang567
In most cases, you don't have to call flush() manually; Milvus will do it automatically.
I'm not the original author; I was just commenting out of curiosity. You're asking this person, right? @douglarek
Try setting it to 32GB, and I guess usage will be around 20GB.
I've noticed an interesting phenomenon: no matter how many partitions are set for all topics under xxx_rootcoord-dml_xx, and no matter how many datanodes there are, the data always aggregates to the same datanode.
To utilize multiple datanodes or DML channels, the number of shards must be increased, at some cost to search performance.
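As an illustration, a hedged sketch of setting the shard count when creating a collection with pymilvus (collection name, field names, and dimension are placeholders; as far as I know the shard count is fixed at creation time, so this applies only to new collections):

```python
# Sketch: create a collection with more shards (DML channels) so inserts can be
# spread across multiple datanodes. Names, fields, and dim are placeholders.
from pymilvus import connections, CollectionSchema, FieldSchema, DataType, Collection

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields, description="example collection")

# More shards -> more DML channels -> more datanodes can share the insert load,
# but search results must be merged across shards, which costs some performance.
coll = Collection(name="example_collection", schema=schema, shards_num=4)
```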
The number of shards for the collection is 4. I am currently running with 4 datanodes, and all Kafka DML topics have 3 partitions each. What I'm observing now is that 4 of the Kafka DML topics have been written to, but only one datanode seems to be actively processing, while the other three are only updating checkpoints. I'm not sure if this is normal, but the datanode that is actively processing is experiencing a rapid increase in memory usage, now reaching 40GB (the limit is 48GB). The proxy data ingress speed is 12MB/s. Below are some logs:
Working datanode:
Other datanodes:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Usually the data should be split evenly across shards, because Milvus hashes the primary key of each entity and writes it into one of the shards based on that PK.
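To illustrate the idea, here is a minimal sketch of hash-based routing by primary key. This is not the hash function Milvus actually uses; it just shows why roughly uniform PKs lead to roughly even shard load.

```python
# Sketch of hash-based shard routing: each entity's PK is hashed and the entity
# is written to the corresponding shard / DML channel. Illustrative only.
from collections import Counter

def shard_for_pk(pk: int, num_shards: int) -> int:
    return hash(pk) % num_shards

counts = Counter(shard_for_pk(pk, 4) for pk in range(1_000_000))
print(counts)  # expected: roughly 250k entities per shard
```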
Is there an existing issue for this?
Environment
Current Behavior
The current datanode's memory usage is too high:
Expected Behavior
Steps To Reproduce
Milvus Log
Logs captured through export_milvus_log.sh: https://drive.google.com/file/d/18jaCfxDUxV_4gE7DDJ29T5ggAzo_UzPA/view?usp=sharing
Anything else?
Connection information for Kafka, etcd, and other important connections has been sanitized.