
[Bug]: failed to init bloom filter v2.3.12 #32228

Open
1 task done
lironghai opened this issue Apr 12, 2024 · 13 comments
Assignees
Labels
kind/bug (Issues or changes related to a bug), stale (indicates no updates for 30 days), triage/needs-information (Indicates an issue needs more information in order to work on it.)

Comments

@lironghai

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.12
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):  CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I encountered the same issue in version 2.3.12: Milvus crashed inexplicably and, after restarting, was unable to load the collection properly.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

[2024/04/12 14:41:38.832 +00:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_2_448504020244458261v7] [posTs=449027865499140097] [posTime=2024/04/12 07:02:50.709 +00:00]
[2024/04/12 14:41:38.832 +00:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=11] [channelName=by-dev-rootcoord-dml_2_448504020244458261v7] ["check interval"=5m0s]
[2024/04/12 14:41:38.833 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=0] [error="NoSuchKey(key=files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478)"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).Read\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:166\ngithub.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).MultiRead\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:222\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).loadStats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:433\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).initPKstats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:475\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).InitPKstats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:331\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).addSegment\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:242\ngithub.com/milvus-io/milvus/internal/datanode.getChannelWithEtcdTickler.func2\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:268\ngithub.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\ngithub.com/panjf2000/ants/v2.(*goWorker).run.func1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67"]
[2024/04/12 14:41:38.833 +00:00] [WARN] [datanode/channel_meta.go:435] ["failed to load bloom filter files"] [segmentID=448504020239382462] [error="failed to read files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478: attempt #0: NoSuchKey"]
[2024/04/12 14:41:38.833 +00:00] [ERROR] [datanode/channel_meta.go:244] ["failed to init bloom filter"] [segmentID=448504020239382462] [error="failed to read files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478: attempt #0: NoSuchKey"] [stack="github.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).addSegment\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:244\ngithub.com/milvus-io/milvus/internal/datanode.getChannelWithEtcdTickler.func2\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:268\ngithub.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\ngithub.com/panjf2000/ants/v2.(*goWorker).run.func1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67"]
[2024/04/12 14:41:38.833 +00:00] [WARN] [datanode/flow_graph_manager.go:136] ["fail to create new DataSyncService"] [channel=by-dev-rootcoord-dml_4_446901322446388689v4] [error="failed to read files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478: attempt #0: NoSuchKey"]
[2024/04/12 14:41:38.833 +00:00] [WARN] [datanode/event_manager.go:180] ["handle put event: new data sync service failed"] [vChanName=by-dev-rootcoord-dml_4_446901322446388689v4] [error="failed to read files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478: attempt #0: NoSuchKey"]
[2024/04/12 14:41:38.833 +00:00] [INFO] [datanode/data_sync_service.go:260] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_6_448504020280547923v3] [segmentID=448982651407422186] [numRows=6]
[2024/04/12 14:41:38.833 +00:00] [INFO] [datanode/channel_meta.go:220] ["adding segment"] [type=Flushed] [segmentID=448982651407422186] [collectionID=448504020280547923] [partitionID=448504020280547924] [channel=by-dev-rootcoord-dml_6_448504020280547923v3] [startPosition=] [endPosition=] [recoverTs=449027865499140097] [importing=false]
[2024/04/12 14:41:38.833 +00:00] [INFO] [datanode/channel_meta.go:386] ["begin to init pk bloom filter"] [segmentID=448982651407422186] [statsBinLogsLen=1]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datanode/event_manager.go:128] ["DataNode is handling watchInfo PUT event"] [key=by-dev/meta/channelwatch/11/by-dev-rootcoord-dml_2_448504020244458261v7] ["watch state"=ToRelease]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datacoord/channel_manager.go:750] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_3_446901322383071941v3] [progress=0]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datanode/data_node.go:283] ["try to release flowgraph"] [vChanName=by-dev-rootcoord-dml_2_448504020244458261v7]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_3_446901322383071941v3] [timerCount=62]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datacoord/channel_manager.go:675] ["datanode release channel successfully, will reassign"] [nodeID=11] [channel=by-dev-rootcoord-dml_3_446901322383071941v3]
[2024/04/12 14:41:38.834 +00:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToRelease] [nodeID=11] [channelName=by-dev-rootcoord-dml_3_446901322383071941v3] ["timeout interval"=5m0s] [runningTimerCount=62]
[2024/04/12 14:41:38.835 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=0] [error="NoSuchKey(key=files/stats_log/448504020280547923/448504020280547924/448982651407422186/100/448982651407422204)"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).Read\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:166\ngithub.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).MultiRead\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:222\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).loadStats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:433\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).initPKstats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:475\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).InitPKstats\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:331\ngithub.com/milvus-io/milvus/internal/datanode.(*ChannelMeta).addSegment\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/channel_meta.go:242\ngithub.com/milvus-io/milvus/internal/datanode.getChannelWithEtcdTickler.func2\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_sync_service.go:268\ngithub.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\ngithub.com/panjf2000/ants/v2.(*goWorker).run.func1\n\t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67"]
[2024/04/12 14:41:38.835 +00:00] [INFO] [datanode/event_manager.go:117] ["DataNode received a PUT event with an end State"] [state=WatchFailure]
[2024/04/12 14:41:38.835 +00:00] [INFO] [datanode/event_manager.go:216] ["handle put event success"] [key=channelwatch/11/by-dev-rootcoord-dml_4_446901322446388689v4] [state=WatchFailure] [vChanName=by-dev-rootcoord-dml_4_446901322446388689v4]
[2024/04/12 14:41:38.835 +00:00] [INFO] [datacoord/policy.go:459] ["AverageReassignPolicy working"] [avaNodesCount=0] [toAssignChannelNum=1] [avaNodesChannelSum=0]
[2024/04/12 14:41:38.835 +00:00] [WARN] [datacoord/policy.go:464] ["there is no available nodes when reassigning, return"]
[2024/04/12 14:41:38.835 +00:00] [WARN] [datanode/channel_meta.go:435] ["failed to load bloom filter files"] [segmentID=448982651407422186] [error="failed to read files/stats_log/448504020280547923/448504020280547924/448982651407422186/100/448982651407422204: attempt #0: NoSuchKey"]

Anything else?

No response

@lironghai lironghai added the kind/bug (Issues or changes related to a bug) and needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) labels on Apr 12, 2024
@lironghai
Author

After upgrading to 2.3.13, I still get the same error.

@congqixia
Contributor

It looks like the statslog is missing. How did you deploy your Milvus, by the way?
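One quick way to confirm the object really is gone from object storage is to stat the key from the error directly. Below is a minimal sketch using minio-go; the endpoint, credentials, and bucket name (a-bucket is the Milvus default) are placeholders for your actual setup:

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Placeholders: point these at your external MinIO host and credentials.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Key taken from the error in the log above; "a-bucket" is the Milvus default bucket name.
	key := "files/stats_log/446901322446388689/446901322446388690/448504020239382462/100/448504020239382478"
	info, err := client.StatObject(context.Background(), "a-bucket", key, minio.StatObjectOptions{})
	if err != nil {
		log.Printf("stat failed, object is likely missing: %v", err)
		return
	}
	log.Printf("object exists: size=%d bytes, last modified %s", info.Size, info.LastModified)
}
```

If the object really is absent, the next question is whether it was never written or was deleted afterwards (for example by garbage collection).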

@lironghai
Author

It looks like the statslog is missing. How did you deploy your Milvus, by the way?

Milvus Standalone GPU with Docker Compose

@congqixia
Contributor

Assuming Milvus used a MinIO deployed with docker-compose, how was the data folder mapped into your storage? Could you please provide the setup, with confidential info masked?

@xiaofan-luan
Contributor

Assuming Milvus used a MinIO deployed with docker-compose, how was the data folder mapped into your storage? Could you please provide the setup, with confidential info masked?

Let's add some logging to the GC scan, so it will be easy for us to track whether the file went missing because of GC. See the sketch below.
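Something along these lines, purely as an illustration of the kind of logging meant here (this is not the actual Milvus GC code, and the callbacks are hypothetical stand-ins for whatever the real scan uses to consult meta and delete objects): every key the scan visits and every removal decision gets its own log line, so a later NoSuchKey can be matched against a GC action.

```go
package gcsketch

import (
	"go.uber.org/zap"
)

// scanAndLog walks candidate object keys, logs each decision, and removes the
// unreferenced ones. isReferenced and remove are hypothetical stand-ins for the
// real GC's meta lookup and object-storage deletion.
func scanAndLog(logger *zap.Logger, candidates []string,
	isReferenced func(key string) bool, remove func(key string) error) {
	for _, key := range candidates {
		ref := isReferenced(key)
		logger.Info("gc scan visited object",
			zap.String("key", key),
			zap.Bool("referencedByMeta", ref))
		if ref {
			continue
		}
		if err := remove(key); err != nil {
			logger.Warn("gc failed to remove object",
				zap.String("key", key), zap.Error(err))
			continue
		}
		logger.Info("gc removed unreferenced object", zap.String("key", key))
	}
}
```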

@yanliang567
Contributor

@lironghai is this a completely new Milvus deployment? Could you please share the reproduction steps and Milvus logs? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
/assign @lironghai
/unassign

@yanliang567 yanliang567 added the triage/needs-information (Indicates an issue needs more information in order to work on it.) label and removed the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) label on Apr 13, 2024
@lironghai
Author

@lironghai is this a completely new Milvus deployment? Could you please share the reproduction steps and Milvus logs? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs. /assign @lironghai /unassign

MinIO is not deployed with Docker Compose; it is an external MinIO on a separate host. Docker Compose only deploys milvus and etcd, and the MinIO connection settings in docker-compose.yaml were modified to connect to it.

[Screenshot attached: 企业微信截图_17129817801487]

@yanliang567
Contributor

/assign @LoveEachDay @congqixia
Any more ideas?

@lironghai
Author

@lironghai is this a completely new Milvus deployment? Could you please share the reproduction steps and Milvus logs? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs. /assign @lironghai /unassign

MinIO is not deployed with Docker Compose; it is an external MinIO on a separate host. Docker Compose only deploys milvus and etcd, and the MinIO connection settings in docker-compose.yaml were modified to connect to it.

[Screenshot attached: 企业微信截图_17129817801487]

I tried deploying with a full Docker Compose setup, but how do I recover my data?

@congqixia
Contributor

@lironghai quick question: did any other Milvus cluster share the same MinIO service with this one?

@lironghai
Author

@lironghai quick question: did any other Milvus cluster share the same MinIO service with this one?

Yes, there may have been two Milvus services sharing the same MinIO before, but that MinIO does not have any data.

@congqixia
Contributor

@lironghai are those two Milvus instances using the same root path?
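For reference, the root path is the minio.rootPath setting in milvus.yaml (the default is files). If two instances write under the same bucket and root path, one instance's garbage collection can delete objects the other still references. A quick, illustrative way to see which root paths actually exist in the bucket is a non-recursive listing of its top-level prefixes; the endpoint, credentials, and bucket name below are placeholders for your own setup.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Placeholders: point these at your external MinIO host and credentials.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	// A non-recursive listing returns the top-level prefixes ("directories"),
	// i.e. the root paths used by whatever writes into this bucket.
	for obj := range client.ListObjects(context.Background(), "a-bucket",
		minio.ListObjectsOptions{Recursive: false}) {
		if obj.Err != nil {
			log.Fatal(obj.Err)
		}
		fmt.Println(obj.Key)
	}
}
```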


stale bot commented May 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

@stale stale bot added the stale (indicates no updates for 30 days) label on May 18, 2024