mc admin heal doesn't work #4858

Open
AlexZIX opened this issue Feb 26, 2024 · 7 comments

Comments

@AlexZIX

AlexZIX commented Feb 26, 2024

I've replaced one broken disk with a new one and it's filling with data. In previous versions of MinIO I could review the healing progress using mc admin heal, but now it reports that there is no active healing in my cluster:

root@minio-cold-1:~# mc admin heal minio-cold
No active healing is detected for new disks.

But at the same time I see in Grafana that healing is in progress:

image

So is this a bug, or is the new version not supposed to show the healing status in the console?

mc --version

root@minio-cold-1:~# mc --version
mc version RELEASE.2023-01-28T20-29-38Z (commit-id=2e95a70c98fb9c2629cd89817b8759bfa109a4d0)
Runtime: go1.19.4 linux/amd64

System information

Cluster: 4 nodes with 4 disks on each

root@minio-cold-1:~# uname -a
Linux minio-cold-1 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@harshavardhana
Member

So is this a bug, or is the new version not supposed to show the healing status in the console?

This is because healing.bin is missing for some reason, causing the healing state to be removed.

// cc @vadmeste this sounds like something we have now seen elsewhere, can you investigate?

@vadmeste
Member

@AlexZIX newer versions do not show stats in Prometheus anymore. Are you sure the disk healing did not finish? Can you check the disk usage (df -h) and compare it with the other disks in the same erasure set?

@AlexZIX
Author

AlexZIX commented Feb 26, 2024

@vadmeste Healing should still be in progress because the replaced disk holds only 10% of the data:

image

One more question: why is the healing process so slow? I replaced this disk a week ago, but it contains only 10% of the data. If healing continues at the same speed, the total recovery time will be 10 weeks, or more than 2 months. Is that normal?
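
The 10-week figure above is a linear extrapolation. As a rough sketch of that arithmetic, assuming the healing rate stays constant (which the thread does not confirm) and using a hypothetical helper name `estimate_heal_time`:

```python
def estimate_heal_time(fraction_done, elapsed_days):
    """Return (total_days, remaining_days) under a constant-rate assumption.

    fraction_done: share of the expected data already on the new disk (0..1].
    elapsed_days:  days since the disk was replaced.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_days = elapsed_days / fraction_done
    return total_days, total_days - elapsed_days


# Values from the comment above: about 10% healed after one week.
total, remaining = estimate_heal_time(fraction_done=0.10, elapsed_days=7)
print(f"total: {total / 7:.1f} weeks, remaining: {remaining / 7:.1f} weeks")
```

With these inputs the estimate is about 10 weeks in total, of which about 9 remain. A real heal may speed up or slow down, so this is only a ballpark.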

@vadmeste
Member

Can you share all MinIO logs from node minio-cold-4?

@AlexZIX
Author

AlexZIX commented Feb 26, 2024

Yes, if you explain where I can find them or how to export them.

@vadmeste
Member

@AlexZIX it depends on how you deployed MinIO. The logs are MinIO's standard output. If it is bare-metal, most likely journalctl -u minio will show some logs. By the way, are you using the ILM expiry feature in this cluster?
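
For a systemd-managed install, exporting those logs to a file could look like the sketch below. It assumes the unit is named `minio` (as in the command above) and that `journalctl` is on the PATH. The helper names `journal_command` and `export_journal` and the output path are illustrative, not part of MinIO.

```python
import subprocess


def journal_command(unit="minio", since=None):
    """Build a journalctl argument list for one systemd unit."""
    cmd = ["journalctl", "-u", unit, "--no-pager"]
    if since:
        # --since accepts timestamps such as "2024-02-19" (example value only).
        cmd += ["--since", since]
    return cmd


def export_journal(path, unit="minio", since=None):
    """Write the unit's journal to `path`; raises if journalctl fails."""
    with open(path, "wb") as out:
        subprocess.run(journal_command(unit, since), stdout=out, check=True)


# On the affected node, run for example:
#   export_journal("minio.log")
print(" ".join(journal_command()))
```

Limiting the range with `since` keeps the file small when only the period after the disk replacement matters.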

@AlexZIX
Author

AlexZIX commented Feb 26, 2024

@vadmeste Output from journalctl attached.
minio.log

If ILM means expiration of versioned files that were removed, then my answer is yes: we use buckets with versioning enabled and expiration settings for removed objects.

This is the df -h output, which may help too:

root@minio-cold-4:~# df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              1.6G  1.8M  1.6G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv   17G  5.6G   11G  35% /
tmpfs                              7.8G     0  7.8G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          1.8G  252M  1.4G  16% /boot
/dev/sda1                          952M  6.1M  946M   1% /boot/efi
hdd-pool-1                         3.2T  157G  3.1T   5% /hdd-pools/hdd-pool-1
hdd-pool-4                         3.2T  1.3T  1.9T  41% /hdd-pools/hdd-pool-4
hdd-pool-3                         3.2T  1.4T  1.9T  42% /hdd-pools/hdd-pool-3
hdd-pool-2                         3.2T  1.4T  1.9T  42% /hdd-pools/hdd-pool-2
tmpfs                              1.6G  4.0K  1.6G   1% /run/user/0
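
To put a number on the comparison suggested earlier in the thread, the sketch below converts the `Used` values from this output to bytes and divides the new disk's usage by the median of its peers. Two assumptions here are not confirmed in the thread: that hdd-pool-1 is the replaced disk (inferred from its low usage), and that the other three disks on this node are a fair stand-in for healthy disks in the same erasure set. The helper names are illustrative.

```python
from statistics import median

# `df -h` reports sizes in powers of 1024.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def parse_size(text):
    """Convert a `df -h` size such as '157G' or '1.3T' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # plain number, e.g. '0'


def fill_ratio(healing_used, peer_used):
    """Used bytes on the healing disk relative to the median peer disk."""
    return parse_size(healing_used) / median(parse_size(u) for u in peer_used)


# Used column for hdd-pool-1 versus hdd-pool-2, -3 and -4 above.
ratio = fill_ratio("157G", ["1.3T", "1.4T", "1.4T"])
print(f"{ratio:.0%}")  # roughly 11%
```

The result agrees with the roughly 10% reported above, though `df -h` rounds its values, so the ratio is approximate.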


3 participants