[BUG] mfsmaster hung and in unkillable D state #566

Open
charlesboyo opened this issue Mar 2, 2024 · 3 comments
@charlesboyo

Have you read through the available documentation, open GitHub issues and GitHub Q&A Discussions?

Yes

System information

Your moosefs version and its origin (moosefs.com, packaged by distro, built from source, ...).

3.0.115-1, packaged by distro

Operating system (distribution) and kernel version.

Ubuntu 20.04.3 LTS, kernel 5.4.0-88-generic

Hardware / network configuration, and underlying filesystems on master, chunkservers, and clients.

SuperMicro chassis, Intel Xeon CPU E5-2690 v2, 10G, Intel 82599ES 10-Gigabit NIC
The host runs the master (ext4), a chunkserver (xfs) and a self-mount on /mnt/mfs.
There is a zpool with a file-based vdev contained in /mnt/mfs.
Proxmox Backup Server (PBS) runs in an LXC container on the 'raid' zpool, and another container runs virtual machines.
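
Roughly, the layering is as follows (a sketch; the backing-file path and size are illustrative, only the pool name 'raid' comes from this report):

# disks (xfs) -> mfschunkserver -> mfsmaster -> mfsmount on /mnt/mfs
# -> file-based vdev -> zpool 'raid' -> LXC containers (PBS, VMs)
truncate -s 10T /mnt/mfs/raid.img    # hypothetical sparse backing file on the MooseFS mount
zpool create raid /mnt/mfs/raid.img  # file vdev sitting on top of a FUSE filesystem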

How much data is tracked by moosefs master (order of magnitude)?

  • All fs objects: 6 TiB
  • Total space: 71 TiB
  • Free space: 59 TiB
  • RAM used: 641 MB
  • last metadata save duration: ~0.6s

Describe the problem you observed.

mfsmaster stops responding, and so do the local and remote MFS mounts.
top and other system utilities show processes such as mfsmaster, kswapd, journal-offline and z_vdev_file hung in D state.
The 'raid' zpool, whose vdev is a file in /mnt/mfs, also hangs indefinitely.
Shortly after this it becomes impossible to log into the server over SSH or on the console; only a hard reboot works.
The virtual machines in one of the LXC containers continue to work just fine, though I am unable to log into that container or the host.
The PBS container is inaccessible because it depends on the /mnt/mfs mount, which is unresponsive.
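
The hung tasks can be listed with something like this (a generic sketch, not taken from this incident):

# show uninterruptible (D-state) tasks and the kernel symbol they are blocked in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'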

Can you reproduce it? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I don't quite know how to reproduce this. The setup had been stable for months until an unscheduled reboot last week.
The issue started then and happens without much warning. The only thing I noticed before a hang was a read error on one of the chunkserver disks. Also, the latest incident happened just as a backup job was starting on PBS.
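
The read error can be cross-checked against the drive itself, e.g. (a sketch, assuming smartmontools is installed and sda is the suspect disk):

smartctl -a /dev/sda | grep -iE 'error|pending|realloc'   # error counters and sector counts
smartctl -l selftest /dev/sda                             # results of earlier self-tests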

Include any warning/errors/backtraces from the system logs.

systemctl status moosefs-master

● moosefs-master.service - MooseFS Master server
Loaded: loaded (/lib/systemd/system/moosefs-master.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2024-03-01 20:17:00 WAT; 1 day 2h ago
Main PID: 2915 (mfsmaster)
Tasks: 2 (limit: 309393)
Memory: 702.7M
CGroup: /system.slice/moosefs-master.service
├─2915 /usr/sbin/mfsmaster start
└─2919 mfsmaster (data writer)

Mar 02 21:30:40 ipnx-stor-hq01 mfsmaster[2915]: structure check loop
Mar 02 21:34:45 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 21:40:14 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 21:42:27 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 21:47:54 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 21:54:30 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 21:57:06 ipnx-stor-hq01 mfsmaster[2915]: chunk 000000000008D9AC_00000001: there are no copies
Mar 02 22:00:00 ipnx-stor-hq01 mfsmaster[2915]: child finished
Mar 02 22:00:00 ipnx-stor-hq01 mfsmaster[2915]: store process has finished - store time: 0.640
Mar 02 22:00:39 ipnx-stor-hq01 mfsmaster[2915]: long loop detected (9.027600s)

ls -l /var/lib/mfs/changelog.0.mfs

-rw-r----- 1 mfs mfs 715286 Mar 2 22:01 /var/lib/mfs/changelog.0.mfs

tail /var/lib/mfs/changelog.0.mfs

977979510: 1709413319|WRITE(52582,0,1,0):888190
977979511: 1709413319|MOVE(117644,8856743aa15b1924aea2935a17344e1c1f7cd460ec1b3dddd68ce2d4f757f16d.tmp_9PcsHp,117644,8856743aa15b1924aea2935a17344e1c1f7cd460ec1b3dddd68ce2d4f757f16d):52582
977979512: 1709413319|LENGTH(52582,15376,0)
977979513: 1709413319|UNLOCK(888190)
977979514: 1709413319|CREATE(103952,52da0241e382d76c7c6e45166853f40f4395d195d4ab3bc05ff331ef838690bc.tmp_hhlRrm,1,384,18,34,34,0):52583
977979515: 1709413319|ACQUIRE(69,52583)
977979516: 1709413319|ATTR(52583,420,34,34,1709413319,1709413319,0,0)
977979517: 1709413319|WRITE(52583,0,1,0):888191
977979518: 1709413319|MOVE(103952,52da0241e382d76c7c6e45166853f40f4395d195d4ab3bc05ff331ef838690bc.tmp_hhlRrm,103952,52da0241e382d76c7c6e45166853f40f4395d195d4ab3bc05ff331ef838690bc):52583
977979519: 1709413320|FREEINODES():4,0,22789

dmesg

[89816.535969] sd 0:0:55:0: [sda] tag#970 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[89816.536002] sd 0:0:55:0: [sda] tag#970 Sense Key : Medium Error [current] [descriptor]
[89816.536008] sd 0:0:55:0: [sda] tag#970 Add. Sense: Unrecovered read error
[89816.536016] sd 0:0:55:0: [sda] tag#970 CDB: Read(16) 88 00 00 00 00 00 c6 ea db 48 00 00 02 00 00 00
[89816.536023] blk_update_request: critical medium error, dev sda, sector 3337280472 op 0x0:(READ) flags 0x80700 phys_seg 46 prio class 0
[89817.028172] sd 0:0:55:0: [sda] tag#916 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[89817.028188] sd 0:0:55:0: [sda] tag#916 Sense Key : Medium Error [current] [descriptor]
[89817.028191] sd 0:0:55:0: [sda] tag#916 Add. Sense: Unrecovered read error
[89817.028196] sd 0:0:55:0: [sda] tag#916 CDB: Read(16) 88 00 00 00 00 00 c6 ea db d8 00 00 00 08 00 00
[89817.028199] blk_update_request: critical medium error, dev sda, sector 3337280472 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

[93164.858068] INFO: task kswapd0:328 blocked for more than 120 seconds.
[93164.859476] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.860808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.862106] kswapd0 D 0 328 2 0x80004000
[93164.862110] Call Trace:
[93164.862125] __schedule+0x2e3/0x740
[93164.862131] schedule+0x42/0xb0
[93164.862135] io_schedule+0x16/0x40
[93164.862142] wait_on_page_bit+0x11c/0x200
[93164.862147] ? __delete_from_page_cache+0xa7/0x1d0
[93164.862154] ? file_fdatawait_range+0x30/0x30
[93164.862158] wait_on_page_writeback+0x43/0x90
[93164.862162] shrink_page_list+0x7a8/0xbb0
[93164.862165] shrink_inactive_list+0x201/0x3e0
[93164.862167] shrink_node_memcg+0x137/0x370
[93164.862170] mem_cgroup_shrink_node+0xb3/0x180
[93164.862177] mem_cgroup_soft_limit_reclaim+0x101/0x320
[93164.862179] balance_pgdat+0x297/0x590
[93164.862182] kswapd+0x1f8/0x3c0
[93164.862187] ? wait_woken+0x80/0x80
[93164.862193] kthread+0x104/0x140
[93164.862194] ? balance_pgdat+0x590/0x590
[93164.862196] ? kthread_park+0x90/0x90
[93164.862206] ret_from_fork+0x35/0x40

[93164.862209] INFO: task kswapd1:329 blocked for more than 120 seconds.
[93164.863298] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.864608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.865869] kswapd1 D 0 329 2 0x80004000
[93164.865871] Call Trace:
[93164.865876] __schedule+0x2e3/0x740
[93164.865879] schedule+0x42/0xb0
[93164.865882] io_schedule+0x16/0x40
[93164.865885] wait_on_page_bit+0x11c/0x200
[93164.865888] ? file_fdatawait_range+0x30/0x30
[93164.865905] wait_on_page_writeback+0x43/0x90
[93164.865908] shrink_page_list+0x7a8/0xbb0
[93164.865912] shrink_inactive_list+0x201/0x3e0
[93164.865916] shrink_node_memcg+0x137/0x370
[93164.865919] ? __switch_to_asm+0x40/0x70
[93164.865922] ? __switch_to_asm+0x40/0x70
[93164.865924] ? __switch_to_asm+0x34/0x70
[93164.865926] mem_cgroup_shrink_node+0xb3/0x180
[93164.865931] mem_cgroup_soft_limit_reclaim+0x101/0x320
[93164.865935] balance_pgdat+0x297/0x590
[93164.865939] kswapd+0x1f8/0x3c0
[93164.865942] ? wait_woken+0x80/0x80
[93164.865946] kthread+0x104/0x140
[93164.865948] ? balance_pgdat+0x590/0x590
[93164.865951] ? kthread_park+0x90/0x90
[93164.865953] ret_from_fork+0x35/0x40

[93164.866007] INFO: task z_vdev_file:2543 blocked for more than 120 seconds.
[93164.866869] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.867717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.868577] z_vdev_file D 0 2543 2 0x80004000
[93164.868581] Call Trace:
[93164.868584] __schedule+0x2e3/0x740
[93164.868587] ? check_preempt_curr+0x4e/0x90
[93164.868589] ? ttwu_do_wakeup+0x1e/0x150
[93164.868591] schedule+0x42/0xb0
[93164.868595] rwsem_down_write_slowpath+0x244/0x4d0
[93164.868598] ? select_idle_sibling+0x1f/0x70
[93164.868601] down_write+0x41/0x50
[93164.868606] fuse_file_write_iter+0x15f/0x3d0
[93164.868608] ? default_wake_function+0x12/0x20
[93164.868615] new_sync_write+0x125/0x1c0
[93164.868617] __vfs_write+0x29/0x40
[93164.868620] vfs_write+0xb9/0x1a0
[93164.868623] kernel_write+0x31/0x50
[93164.868639] spl_kernel_write+0xe/0x10 [spl]
[93164.868647] vn_rdwr+0x5b/0xc0 [spl]
[93164.868750] vdev_file_io_strategy+0x8d/0x140 [zfs]
[93164.868758] taskq_thread+0x245/0x430 [spl]
[93164.868760] ? wake_up_q+0x70/0x70
[93164.868763] kthread+0x104/0x140
[93164.868769] ? task_done+0x90/0x90 [spl]
[93164.868771] ? kthread_park+0x90/0x90
[93164.868773] ret_from_fork+0x35/0x40

[93164.868786] INFO: task mfsmaster:2915 blocked for more than 120 seconds.
[93164.869669] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.870592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.871510] mfsmaster D 0 2915 1 0x00004000
[93164.871512] Call Trace:
[93164.871516] __schedule+0x2e3/0x740
[93164.871518] schedule+0x42/0xb0
[93164.871520] io_schedule+0x16/0x40
[93164.871523] wait_on_page_bit+0x11c/0x200
[93164.871525] ? __delete_from_page_cache+0xa7/0x1d0
[93164.871528] ? file_fdatawait_range+0x30/0x30
[93164.871530] wait_on_page_writeback+0x43/0x90
[93164.871531] shrink_page_list+0x7a8/0xbb0
[93164.871534] shrink_inactive_list+0x201/0x3e0
[93164.871537] shrink_node_memcg+0x137/0x370
[93164.871540] mem_cgroup_shrink_node+0xb3/0x180
[93164.871545] mem_cgroup_soft_limit_reclaim+0x101/0x320
[93164.871550] do_try_to_free_pages+0x139/0x3a0
[93164.871554] try_to_free_pages+0xea/0x1d0
[93164.871562] __alloc_pages_slowpath+0x447/0xe50
[93164.871568] ? __switch_to_asm+0x34/0x70
[93164.871569] ? __switch_to_asm+0x40/0x70
[93164.871572] ? __switch_to_asm+0x34/0x70
[93164.871573] ? __switch_to_asm+0x40/0x70
[93164.871574] ? __switch_to_asm+0x40/0x70
[93164.871576] ? __switch_to_asm+0x34/0x70
[93164.871578] ? get_page_from_freelist+0x24a/0x3f0
[93164.871580] __alloc_pages_nodemask+0x2d0/0x320
[93164.871584] alloc_pages_current+0x87/0xe0
[93164.871586] __get_free_pages+0x11/0x40
[93164.871592] __pollwait+0x92/0xd0
[93164.871597] tcp_poll+0x1e7/0x370
[93164.871603] sock_poll+0x7c/0xc0
[93164.871605] do_sys_poll+0x2ad/0x520
[93164.871609] ? tcp_push+0xfd/0x100
[93164.871611] ? tcp_sendmsg_locked+0x464/0xde0
[93164.871613] ? poll_initwait+0x40/0x40
[93164.871616] ? poll_select_finish+0x210/0x210
[93164.871618] ? poll_select_finish+0x210/0x210
[93164.871621] ? poll_select_finish+0x210/0x210
[93164.871624] ? poll_select_finish+0x210/0x210
[93164.871628] ? poll_select_finish+0x210/0x210
[93164.871633] ? poll_select_finish+0x210/0x210
[93164.871639] ? poll_select_finish+0x210/0x210
[93164.871647] ? poll_select_finish+0x210/0x210
[93164.871651] ? poll_select_finish+0x210/0x210
[93164.871655] __x64_sys_poll+0xa5/0x150
[93164.871661] do_syscall_64+0x57/0x190
[93164.871664] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[93164.871668] RIP: 0033:0x7ff3460deac7
[93164.871676] Code: Bad RIP value.
[93164.871677] RSP: 002b:00007ffd5e380de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000007
[93164.871679] RAX: ffffffffffffffda RBX: 000612b3cdb72140 RCX: 00007ff3460deac7
[93164.871680] RDX: 000000000000000a RSI: 000000000000000b RDI: 00007ffd5e380e40
[93164.871680] RBP: 00007ffd5e380e40 R08: 0000000000000009 R09: 0000000000000001
[93164.871681] R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffd5e380e0c
[93164.871682] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

[93164.871729] INFO: task mfsmount:379826 blocked for more than 120 seconds.
[93164.872658] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.873585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.874533] mfsmount D 0 379826 1 0x00000000
[93164.874535] Call Trace:
[93164.874539] __schedule+0x2e3/0x740
[93164.874541] ? __mod_lruvec_slab_state+0x99/0xd0
[93164.874543] schedule+0x42/0xb0
[93164.874545] io_schedule+0x16/0x40
[93164.874548] wait_on_page_bit+0x11c/0x200
[93164.874550] ? file_fdatawait_range+0x30/0x30
[93164.874552] wait_on_page_writeback+0x43/0x90
[93164.874554] shrink_page_list+0x7a8/0xbb0
[93164.874556] shrink_inactive_list+0x201/0x3e0
[93164.874560] shrink_node_memcg+0x137/0x370
[93164.874565] mem_cgroup_shrink_node+0xb3/0x180
[93164.874569] mem_cgroup_soft_limit_reclaim+0x101/0x320
[93164.874573] do_try_to_free_pages+0x139/0x3a0
[93164.874576] try_to_free_pages+0xea/0x1d0
[93164.874581] __alloc_pages_slowpath+0x447/0xe50
[93164.874584] ? schedule_hrtimeout_range_clock+0xa0/0x110
[93164.874586] ? get_page_from_freelist+0x24a/0x3f0
[93164.874589] __alloc_pages_nodemask+0x2d0/0x320
[93164.874591] alloc_pages_current+0x87/0xe0
[93164.874594] skb_page_frag_refill+0xa3/0x110
[93164.874596] sk_page_frag_refill+0x21/0x80
[93164.874599] tcp_sendmsg_locked+0x2c9/0xde0
[93164.874601] tcp_sendmsg+0x2d/0x50
[93164.874606] inet_sendmsg+0x43/0x70
[93164.874608] sock_sendmsg+0x5e/0x70
[93164.874611] sock_write_iter+0x93/0xf0
[93164.874614] do_iter_readv_writev+0x14f/0x1d0
[93164.874615] do_iter_write+0x84/0x1a0
[93164.874617] vfs_writev+0xa6/0xf0
[93164.874619] ? vfs_read+0x12e/0x160
[93164.874621] do_writev+0x71/0x120
[93164.874622] __x64_sys_writev+0x1c/0x20
[93164.874625] do_syscall_64+0x57/0x190
[93164.874626] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[93164.874628] RIP: 0033:0x7fe36270e61d
[93164.874631] Code: Bad RIP value.
[93164.874632] RSP: 002b:00007fe3506d8eb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
[93164.874633] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe36270e61d
[93164.874634] RDX: 0000000000000002 RSI: 00007fe3506d9010 RDI: 000000000000005f
[93164.874635] RBP: 0000000000000002 R08: 0000000000000000 R09: 00000000000000f9
[93164.874635] R10: 0000000000000002 R11: 0000000000000293 R12: 00007fe348356da0
[93164.874637] R13: 00007fe344200680 R14: 0000000000000015 R15: 0000000000000000

[93164.874716] INFO: task journal-offline:4001444 blocked for more than 120 seconds.
[93164.875654] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.876606] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.877586] journal-offline D 0 4001444 145086 0x00004326
[93164.877588] Call Trace:
[93164.877595] __schedule+0x2e3/0x740
[93164.877667] ? zio_taskq_dispatch+0x7b/0xa0 [zfs]
[93164.877671] schedule+0x42/0xb0
[93164.877678] cv_wait_common+0x133/0x180 [spl]
[93164.877681] ? wait_woken+0x80/0x80
[93164.877686] __cv_wait+0x15/0x20 [spl]
[93164.877752] zil_commit_waiter+0x9d/0x120 [zfs]
[93164.877824] zil_commit_impl+0x64/0xa0 [zfs]
[93164.877892] zil_commit+0x40/0x60 [zfs]
[93164.877961] zpl_writepages+0xec/0x180 [zfs]
[93164.877978] do_writepages+0x43/0xd0
[93164.877980] __filemap_fdatawrite_range+0xd5/0x110
[93164.877983] filemap_write_and_wait_range+0x48/0xb0
[93164.878054] zpl_fsync+0x3a/0x90 [zfs]
[93164.878061] vfs_fsync_range+0x49/0x80
[93164.878065] ? __fget_light+0x57/0x70
[93164.878070] do_fsync+0x3d/0x70
[93164.878073] __x64_sys_fsync+0x14/0x20
[93164.878076] do_syscall_64+0x57/0x190
[93164.878078] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[93164.878080] RIP: 0033:0x7fadf5094a3a
[93164.878083] Code: Bad RIP value.
[93164.878084] RSP: 002b:00007fadf354dc40 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[93164.878085] RAX: ffffffffffffffda RBX: 000055f95e933980 RCX: 00007fadf5094a3a
[93164.878085] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000015
[93164.878086] RBP: 000055f95e93c660 R08: 0000000000000000 R09: 00007fadf354e6c0
[93164.878086] R10: 0000000000000398 R11: 0000000000000293 R12: 000055f95cd44c60
[93164.878087] R13: 0000000000000002 R14: 00007ffdb5172190 R15: 00007fadf2d4e000

[93164.878137] INFO: task z_vdev_file:4001445 blocked for more than 120 seconds.
[93164.878830] Tainted: P O 5.4.0-88-generic #99-Ubuntu
[93164.879535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[93164.880244] z_vdev_file D 0 4001445 2 0x80004000
[93164.880245] Call Trace:
[93164.880248] __schedule+0x2e3/0x740
[93164.880249] schedule+0x42/0xb0
[93164.880252] rwsem_down_write_slowpath+0x244/0x4d0
[93164.880256] down_write+0x41/0x50
[93164.880257] fuse_file_write_iter+0x15f/0x3d0
[93164.880260] new_sync_write+0x125/0x1c0
[93164.880262] __vfs_write+0x29/0x40
[93164.880263] vfs_write+0xb9/0x1a0
[93164.880265] kernel_write+0x31/0x50
[93164.880271] spl_kernel_write+0xe/0x10 [spl]
[93164.880275] vn_rdwr+0x5b/0xc0 [spl]
[93164.880320] vdev_file_io_strategy+0x8d/0x140 [zfs]
[93164.880330] taskq_thread+0x245/0x430 [spl]
[93164.880333] ? __switch_to_asm+0x40/0x70
[93164.880334] ? wake_up_q+0x70/0x70
[93164.880336] kthread+0x104/0x140
[93164.880340] ? task_done+0x90/0x90 [spl]
[93164.880342] ? kthread_park+0x90/0x90
[93164.880343] ret_from_fork+0x35/0x40

[93297.838799] WARNING: Pool 'raid' has encountered an uncorrectable I/O failure and has been suspended.
[95921.823747] WARNING: Pool 'raid' has encountered an uncorrectable I/O failure and has been suspended.

@xandrus (Member) commented Mar 2, 2024

Hi,
A very important question: do you have a MooseFS metalogger running in your cluster?

This looks like a hard disk problem rather than a MooseFS issue: error, dev sda

[89816.536023] blk_update_request: critical medium error, dev sda, sector 3337280472 op 0x0:(READ) flags 0x80700 phys_seg 46 prio class 0

If your operating system and MooseFS metadata are on /dev/sda, that means you have a serious problem there!
Is mfsmaster still reachable from the CGI, the CLI and the clients, or not?

If the MooseFS master is not available and you are unable to do any I/O operations, and you have a running metalogger, you should consider promoting the metalogger machine to master.
Check this document:
https://moosefs.com/Content/Downloads/moosefs-3-0-users-manual.pdf
Chapter 5.2 Master metadata restore from Metaloggers
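
For reference, the gist of that procedure is roughly the following (a sketch only; paths assume the default /var/lib/mfs data directory, consult chapter 5.2 for the authoritative steps):

# on the metalogger machine: stop the metalogger, rename its files to the
# names mfsmaster expects, then let mfsmaster replay the changelogs
mfsmetalogger stop
cd /var/lib/mfs
mv metadata_ml.mfs.back metadata.mfs.back
for f in changelog_ml.*.mfs; do mv "$f" "${f/_ml/}"; done
mfsmaster -a    # automatic metadata recovery, then start as the new master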

@charlesboyo (Author) commented Mar 2, 2024

Thanks @xandrus. Yes, I have a metalogger and I am considering promoting it to master. But the original master cannot be killed even though it is not accepting connections from anything. What is the risk involved if it suddenly unblocks?

ss -nap4 | grep LISTEN | grep mfsmaster

tcp LISTEN 101 100 0.0.0.0:9419 0.0.0.0:* users:(("mfsmaster",pid=2915,fd=8))
tcp LISTEN 101 100 0.0.0.0:9420 0.0.0.0:* users:(("mfsmaster",pid=2915,fd=11))
tcp LISTEN 101 100 0.0.0.0:9421 0.0.0.0:* users:(("mfsmaster",pid=2915,fd=12))

I tried restarting the moosefs-master service; that didn't work.

systemctl status moosefs-master

● moosefs-master.service - MooseFS Master server
Loaded: loaded (/lib/systemd/system/moosefs-master.service; enabled; vendor preset: enabled)
Active: deactivating (final-sigterm) (Result: exit-code) since Sat 2024-03-02 23:11:01 WAT; 1h 34min ago
Process: 4015743 ExecStop=/usr/sbin/mfsmaster stop (code=exited, status=1/FAILURE)
Main PID: 2915 (mfsmaster)
Tasks: 2 (limit: 309393)
Memory: 702.6M
CGroup: /system.slice/moosefs-master.service
└─2915 /usr/sbin/mfsmaster start

Mar 02 23:40:20 ipnx-stor-hq01 mfsmaster[4015743]: about 1760 seconds passed and lock still exists
Mar 02 23:40:30 ipnx-stor-hq01 mfsmaster[4015743]: about 1770 seconds passed and lock still exists
Mar 02 23:40:40 ipnx-stor-hq01 mfsmaster[4015743]: about 1780 seconds passed and lock still exists
Mar 02 23:40:50 ipnx-stor-hq01 mfsmaster[4015743]: about 1790 seconds passed and lock still exists
Mar 02 23:41:00 ipnx-stor-hq01 mfsmaster[4015743]: about 1800 seconds passed and lockfile is still locked - giving up
Mar 02 23:41:00 ipnx-stor-hq01 mfsmaster[4015743]: waiting for termination .........................................................................................................................................
Mar 02 23:41:00 ipnx-stor-hq01 systemd[1]: moosefs-master.service: Control process exited, code=exited, status=1/FAILURE
Mar 03 00:11:01 ipnx-stor-hq01 systemd[1]: moosefs-master.service: State 'stop-sigterm' timed out. Killing.
Mar 03 00:11:01 ipnx-stor-hq01 systemd[1]: moosefs-master.service: Killing process 2915 (mfsmaster) with signal SIGKILL.
Mar 03 00:41:01 ipnx-stor-hq01 systemd[1]: moosefs-master.service: Processes still around after SIGKILL. Ignoring.
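
A process in uninterruptible D state ignores even SIGKILL until the I/O it is blocked on completes. Where it is stuck can be inspected like this (a generic sketch, needs root):

cat /proc/2915/stack            # kernel stack of the stuck mfsmaster
echo w > /proc/sysrq-trigger    # dump all blocked tasks to dmesg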

sda is one of the HDDs of the chunkserver; it does not contain the OS or the MooseFS metadata.
The CGI, chunkservers and clients are all unable to connect to the master.

journalctl -f

Mar 03 00:44:47 ipnx-stor-hq01 mfsmount[26084]: can't connect to master ("127.0.0.1":"9421")
Mar 03 00:44:50 ipnx-stor-hq01 mfschunkserver[24427]: connecting ...
Mar 03 00:44:51 ipnx-stor-hq01 mfschunkserver[24427]: connection timed out
Mar 03 00:44:51 ipnx-stor-hq01 mfsmount[26084]: can't connect to master ("127.0.0.1":"9421")

@charlesboyo (Author) commented

I created a new mfsmaster by replicating /etc/mfs and /var/lib/mfs from the unresponsive node to another machine.
Then I updated the mfsmaster entry across all nodes and restarted the services, and they all moved to the new master.

The exception was the loopback mfsmount, since it was configured to use 127.0.0.1 on the broken node.
I patched that to redirect to the new master with a combination of iptables and the route_localnet sysctl, roughly as sketched below.
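
The redirect was along these lines (a sketch; 192.0.2.10 stands in for the new master's address, 9421 is the master's client port):

# allow locally generated packets addressed to 127.0.0.1 to be routed off-host
sysctl -w net.ipv4.conf.all.route_localnet=1
# rewrite loopback connections to port 9421 so they reach the new master
iptables -t nat -A OUTPUT -d 127.0.0.1 -p tcp --dport 9421 \
    -j DNAT --to-destination 192.0.2.10:9421
# rewrite the 127.0.0.1 source address on the way out
iptables -t nat -A POSTROUTING -d 192.0.2.10 -p tcp --dport 9421 -j MASQUERADE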

And then I could reach /mnt/mfs and the zpool vdev file again. Running zpool clear on that zpool ... and all the processes in D state cleared. The hung mfsmaster finally terminated and I am able to log in.
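
For reference, resuming the suspended pool once its vdev was reachable again came down to (pool name from this report):

zpool clear raid     # retry the failed I/O and resume the suspended pool
zpool status raid    # confirm the pool is back ONLINE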
