[Bug]: [major compaction] Major compaction hangs when partition key is enabled as clustering key #32329

Open

binbinlv opened this issue Apr 16, 2024 · 1 comment

binbinlv commented Apr 16, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: lru_dev branch latest
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): dev latest
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Major compaction hangs when the partition key is enabled as the clustering key.
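
The hang is observable from the client side: the blocking wait call never returns. Below is a minimal sketch of the symptom, reusing only the pymilvus calls from the repro script further down (the collection name is the one that script creates):

from pymilvus import Collection, connections

connections.connect()  # assumes a reachable Milvus instance with default settings
collection = Collection("major_compaction_collection_enable_scalar_partition_key_after_index")

collection.compact(is_major=True)
print(collection.get_compaction_state(is_major=True))
collection.wait_for_compaction_completed(is_major=True)  # never returns once the bug triggers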

Expected Behavior

Major compaction completes successfully.

Steps To Reproduce

  1. set "usePartitionKeyAsClusteringKey: true" in milvus.yaml
  2. run the following script (a timeout-wrapper sketch for the blocking wait follows it):
import os
import time
import random
import string
import numpy as np
from pymilvus import (
    connections,
    utility,
    FieldSchema, CollectionSchema, DataType,
    Collection,
)

fmt = "\n=== {:30} ===\n"
dim = 128

print(fmt.format("start connecting to Milvus"))
host = os.environ.get('MILVUS_HOST', '')
print(fmt.format(f"Milvus host: {host}"))
# pass the resolved host through; an empty value falls back to the client default
if host:
    connections.connect(host=host)
else:
    connections.connect()

default_fields = [
    FieldSchema(name="count", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="key", dtype=DataType.INT64, is_partition_key=True),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="var", dtype=DataType.VARCHAR, max_length=10000),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
default_schema = CollectionSchema(fields=default_fields, description="test clustering-key collection")
collection_name = "major_compaction_collection_enable_scalar_partition_key_after_index"

if utility.has_collection(collection_name):
    collection = Collection(name=collection_name)
    collection.drop()
    print("drop the original collection")
hello_milvus = Collection(name=collection_name, schema=default_schema)

print("Starting major compaction")
start = time.time()
hello_milvus.compact(is_major=True)
res = hello_milvus.get_compaction_state(is_major=True)
print(res)
print("Waiting for major compaction complete")
hello_milvus.wait_for_compaction_completed(is_major=True)
end = time.time()
print("Major compaction complete in %f s" %(end - start))
res = hello_milvus.get_compaction_state(is_major=True)
print(res)


nb = 1000

rng = np.random.default_rng(seed=19530)
random_data = rng.random(nb).tolist()

vec_data = [[random.random() for _ in range(dim)] for _ in range(nb)]
_len = 20
_str = string.ascii_letters + string.digits
_s = _str
print("_str size ", len(_str))

# repeat the charset until it is long enough to sample _len - 1 characters from
for i in range(_len // len(_str)):
    _s += _str
    print("append str ", i)
values = [''.join(random.sample(_s, _len - 1)) for _ in range(nb)]
index = 0
while index < 100:
    # insert data
    data = [
        [index * nb + i for i in range(nb)],
        [random.randint(0, 100) for i in range(nb)],
        random_data,
        values,
        vec_data,
    ]
    start = time.time()
    res = hello_milvus.insert(data)
    end = time.time() - start
    print("insert %d %d done in %f" % (index, nb, end))
    index += 1
    hello_milvus.flush()

print(f"Number of entities in Milvus: {hello_milvus.num_entities}")  # check the num_entites

# create index
print(fmt.format("Start Creating index AUTOINDEX"))
index = {
    "index_type": "AUTOINDEX",
    "metric_type": "L2",
    "params": {},
}

print("creating index")
hello_milvus.create_index("embeddings", index)
print("waiting for index completed")
utility.wait_for_index_building_complete(collection_name)
res = utility.index_building_progress(collection_name)
print(res)

print(fmt.format("Load"))
hello_milvus.load()

res = utility.get_query_segment_info(collection_name)

print("before major compaction")
print(res)

# major compact

print("Starting major compaction")
start = time.time()
hello_milvus.compact(is_major=True)
res = hello_milvus.get_compaction_state(is_major=True)
print(res)
print("Waiting for major compaction complete")
hello_milvus.wait_for_compaction_completed(is_major=True)
end = time.time()
print("Major compaction complete in %f s" %(end - start))
res = hello_milvus.get_compaction_state(is_major=True)
print(res)

res = utility.get_query_segment_info(collection_name)
print("after major compaction")
print(res)

nb = 1
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]

nq = 1

default_search_params = {"metric_type": "L2", "params": {}}
res1 = hello_milvus.search(vectors[:nq], "embeddings", default_search_params, 10, "count >= 0")

print(res1[0].ids)
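
Since wait_for_compaction_completed blocks indefinitely when compaction hangs, a timeout wrapper makes the repro fail fast instead of stalling the whole run. This is a sketch using only the standard library and the calls already used above; the 600 s timeout and the helper name are arbitrary choices, not part of the original script:

import threading

def wait_for_major_compaction(coll, timeout_s=600):
    # Run the blocking wait in a daemon thread so a hang surfaces as a
    # TimeoutError instead of blocking the script forever (sketch only).
    done = threading.Event()

    def _wait():
        coll.wait_for_compaction_completed(is_major=True)
        done.set()

    threading.Thread(target=_wait, daemon=True).start()
    if not done.wait(timeout=timeout_s):
        state = coll.get_compaction_state(is_major=True)
        raise TimeoutError(f"major compaction not finished after {timeout_s}s: {state}")

# usage: call wait_for_major_compaction(hello_milvus) in place of the blocking wait above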

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22H1E%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22major-compact-search-zugwg.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

@binbinlv binbinlv added kind/bug Issues or changes related to a bug triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Apr 16, 2024
@binbinlv binbinlv added this to the 2.4.1 milestone Apr 16, 2024

stale bot commented May 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale indicates no updates for 30 days label May 18, 2024