guestfs kernel choosing wrong root disk (under load/race condition) #69

Open
candlerb opened this issue Jul 12, 2021 · 3 comments

@candlerb

I am using guestfish to create, prepare, and upload multiple disk images. Each guestfs VM has a number of disk images attached and uploads to them in turn. Additionally, multiple guestfs VMs run concurrently to prepare further images, in order to make use of all the cores in my (quad-core) CPU. These concurrent instances create and upload completely separate files (except that they all clone a shared qcow2 base image).

What I find is that under such load, guestfs intermittently fails to boot. I have made a standalone set of scripts to reproduce this.

It appears to be some sort of race condition: if I set LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1 and allow the output to go to the screen, the problem occurs much less frequently. However, I have now been able to reproduce the problem with LIBGUESTFS_DEBUG=1, and the results are included below.

Reproducer

  • gf-par.py
  • gf-run.py
  • Make an empty msdos image called mybase.img. I used mformat -C -f 1440 -i mybase.img :: from mtools.

Run ./gf-par.py.

It spawns multiple concurrent copies of ./gf-run.py and, if the problem occurs, leaves its temporary files lying around, including the captured stdout/stderr from guestfish. Each instance creates a number of qcow2 clones of the mybase.img file and uploads some data.

Note: it doesn't reproduce every time. You may need to run it multiple times. I haven't got to the bottom of what makes it reproduce easily on some runs and not on others. In my real-world application, running just two guestfs instances concurrently is sometimes enough to trigger the problem.

Results

An example of when guestfs fails is here - full debug output.

In short, it appears to get confused about which disk is the root disk. It chooses sdau as the root disk:

[    0.000000] Command line: ... root=/dev/sdau

And yet sdau is one of the qcow2 image clones; judging by its size, the supermin disk appears to be sdat:

[    0.830155] scsi 2:0:44:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    0.832186] scsi 2:0:46:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    1.094846] scsi 2:0:44:0: Attached scsi generic sg44 type 0
[    1.099462] sd 2:0:46:0: Power-on or device reset occurred
[    1.100347] sd 2:0:46:0: [sdat] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
[    1.101121] sd 2:0:46:0: [sdat] Write Protect is off
[    1.101683] sd 2:0:46:0: [sdat] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.105412] sd 2:0:44:0: Power-on or device reset occurred
[    1.106313] sd 2:0:44:0: [sdau] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.108897] sd 2:0:46:0: Attached scsi generic sg46 type 0
[    1.113986] sd 2:0:46:0: [sdat] Attached SCSI disk
[    1.114502] sd 2:0:44:0: [sdau] Write Protect is off
[    1.118211] sd 2:0:44:0: [sdau] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.121151]  sdau: sdau1
[    1.122104] sd 2:0:44:0: [sdau] Attached SCSI disk
...
supermin: picked /sys/block/sdau/dev (66:224) as root device
supermin: creating /dev/root as block special 66:224
supermin: mounting new root on /root
[    1.135304] EXT4-fs (sdau): VFS: Can't find ext4 filesystem
mount: /root: Invalid argument
[    1.136398] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
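
The 66:224 in the supermin message is consistent with the name sdau under the usual Linux numbering for SCSI disks, which can be checked with a small sketch. It assumes the standard sd major list (8, 65-71, 128-135) with 16 minors reserved per disk. The function names are made up for illustration and are not part of supermin or libguestfs.

```python
import re

# Assumption: block majors reserved for SCSI disks in the Linux device list.
# Each major covers 16 disks, and each disk reserves 16 minors (disk + partitions).
SD_MAJORS = [8] + list(range(65, 72)) + list(range(128, 136))


def sd_name_to_index(name):
    """Map 'sda' -> 0, 'sdz' -> 25, 'sdaa' -> 26, ... (bijective base 26)."""
    if not re.fullmatch(r"sd[a-z]+", name):
        raise ValueError(f"not an sd device name: {name!r}")
    index = 0
    for ch in name[2:]:
        index = index * 26 + (ord(ch) - ord("a") + 1)
    return index - 1


def sd_dev_numbers(name):
    """Return the (major, minor) of a whole-disk sd device, first 256 disks only."""
    index = sd_name_to_index(name)
    if index >= 16 * len(SD_MAJORS):
        raise ValueError("disks beyond the first 256 are numbered differently")
    return SD_MAJORS[index // 16], (index % 16) * 16
```

`sd_dev_numbers("sdau")` gives `(66, 224)`, matching the log: supermin opened exactly the device named on the kernel command line, so the fault lies in which disk received that name.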
@candlerb
Author

This makes it clearer: the drive letters assigned by Linux don't match the sequence of disk path numbers.

$ grep "logical blocks" err | sort -k3 -t: -n
[    0.849334] sd 2:0:0:0: [sda] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.853811] sd 2:0:1:0: [sdb] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.863275] sd 2:0:2:0: [sdc] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.869419] sd 2:0:3:0: [sdd] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.866291] sd 2:0:4:0: [sde] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.874033] sd 2:0:5:0: [sdf] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.882780] sd 2:0:6:0: [sdg] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.879101] sd 2:0:7:0: [sdh] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.887317] sd 2:0:8:0: [sdi] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.911467] sd 2:0:9:0: [sdj] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.896418] sd 2:0:10:0: [sdk] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.901380] sd 2:0:11:0: [sdl] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.904302] sd 2:0:12:0: [sdm] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.908550] sd 2:0:13:0: [sdn] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.918204] sd 2:0:14:0: [sdo] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.937406] sd 2:0:15:0: [sdp] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.923588] sd 2:0:16:0: [sdq] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.933875] sd 2:0:17:0: [sds] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    0.930142] sd 2:0:18:0: [sdr] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    0.948014] sd 2:0:19:0: [sdt] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.955293] sd 2:0:20:0: [sdu] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.960181] sd 2:0:21:0: [sdv] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.963255] sd 2:0:22:0: [sdw] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.986134] sd 2:0:23:0: [sdx] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.972792] sd 2:0:24:0: [sdy] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.979601] sd 2:0:25:0: [sdaa] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    0.977594] sd 2:0:26:0: [sdz] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    0.993489] sd 2:0:27:0: [sdab] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    0.997950] sd 2:0:28:0: [sdac] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.004614] sd 2:0:29:0: [sdad] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.021331] sd 2:0:30:0: [sdae] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.015359] sd 2:0:31:0: [sdaf] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.027027] sd 2:0:32:0: [sdag] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.031446] sd 2:0:33:0: [sdah] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.039598] sd 2:0:34:0: [sdaj] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    1.036259] sd 2:0:35:0: [sdai] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    1.064954] sd 2:0:36:0: [sdak] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.050903] sd 2:0:37:0: [sdal] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.056567] sd 2:0:38:0: [sdam] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.060191] sd 2:0:39:0: [sdan] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.072004] sd 2:0:40:0: [sdao] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.080638] sd 2:0:41:0: [sdap] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.085718] sd 2:0:42:0: [sdaq] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.087874] sd 2:0:43:0: [sdar] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
[    1.106313] sd 2:0:44:0: [sdau] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    1.102969] sd 2:0:45:0: [sdas] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)   <<<
[    1.100347] sd 2:0:46:0: [sdat] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)   <<<

So the root disk is the "last" disk (2:0:46:0), and guestfs picks what it thinks will be the "last" drive name allocated (/dev/sdau), but unfortunately sdas/sdat/sdau have been shuffled. There are several other shuffled pairs too.
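
The comparison done by eye above can be automated. The following is a minimal sketch that scans the "logical blocks" lines and reports every disk whose name differs from the one its SCSI target number would predict. It assumes a single host with one disk per target, numbered from 0, as in this log. The helper names are hypothetical.

```python
import re

# Matches lines such as:
#   sd 2:0:44:0: [sdau] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)
LINE_RE = re.compile(r"sd \d+:\d+:(\d+):\d+: \[(sd[a-z]+)\] \d+ \d+-byte logical blocks")


def sd_index_to_name(index):
    """Map 0 -> 'sda', 25 -> 'sdz', 26 -> 'sdaa', ... (bijective base 26)."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return "sd" + letters


def find_shuffled(log_text):
    """Return (target, actual_name, expected_name) for each mismatched disk.

    Assumption: target N is expected to become the N-th sd device.
    """
    shuffled = []
    for match in LINE_RE.finditer(log_text):
        target = int(match.group(1))
        actual = match.group(2)
        expected = sd_index_to_name(target)
        if actual != expected:
            shuffled.append((target, actual, expected))
    return sorted(shuffled)
```

Calling `find_shuffled(open("err").read())` on the captured output should list the pairs marked `<<<` above, including target 46, which was expected to be sdau but came up as sdat.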

(I am hopeful that the shuffling won't affect my application, since I'm applying labels to each of the disk files. But obviously, if guestfs won't boot, it fails.)

If Linux can't be told to assign the drive letters in a deterministic order, then perhaps the solution is to use /dev/disk/by-path/... to select the root disk.
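
To illustrate the idea of selecting the root disk by its position on the bus instead of by its name, here is a minimal sketch. It swaps /dev/disk/by-path for a direct sysfs lookup, since by-path links are created by udev and may not exist in a minimal initrd. It assumes that /sys/block/NAME/device is a symlink whose last component is the SCSI address (host:channel:target:lun). This is only an illustration in Python and not how supermin is implemented; the function name is hypothetical.

```python
import os


def block_name_for_hctl(hctl, sysfs_block="/sys/block"):
    """Return the block device name (e.g. 'sdat') at SCSI address `hctl`.

    `hctl` is a string such as '2:0:46:0'. Returns None if no disk matches.
    """
    for name in sorted(os.listdir(sysfs_block)):
        link = os.path.join(sysfs_block, name, "device")
        if not os.path.islink(link):
            continue  # e.g. loop or ram devices have no backing device link
        # The link target ends in the SCSI address, e.g. '../../../2:0:46:0'.
        if os.path.basename(os.readlink(link)) == hctl:
            return name
    return None
```

With the log above, looking up `2:0:46:0` would be expected to return sdat regardless of the order in which the kernel handed out names.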

@candlerb
Author

Since the problem is with the initial startup of the kernel, I've found it reproduces without mounting/uploading/unmounting any data to the attached disks. You just have to add a load of disks.

This results in a simplified gf-run.py.
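
For readers without access to the linked script, the shape of such a minimal run can be sketched as follows. This is not the linked gf-run.py: the clone file names, the disk count of 46 (taken from the log above), and the helper names are assumptions. It uses only the `qemu-img create` options `-f`, `-b`, `-F` and the guestfish options `--format` and `-a`, followed by the `run` command.

```python
import subprocess


def clone_argv(base, clone, base_format="raw"):
    """Command to create a qcow2 clone backed by `base`."""
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", base, "-F", base_format, clone]


def guestfish_argv(disk_paths, disk_format="qcow2"):
    """Command that attaches every disk and then just boots the appliance."""
    argv = ["guestfish", f"--format={disk_format}"]
    for path in disk_paths:
        argv += ["-a", path]
    argv.append("run")  # launch only; no mount or upload is needed
    return argv


def run_once(base="mybase.img", count=46):
    """Create `count` clones of `base` and boot guestfs with all of them."""
    clones = [f"clone{i}.qcow2" for i in range(count)]  # hypothetical names
    for clone in clones:
        subprocess.run(clone_argv(base, clone), check=True)
    return subprocess.run(guestfish_argv(clones)).returncode
```

Running several `run_once()` calls in parallel, each in its own working directory, approximates the load described in this issue.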

@sickcodes

@libguestfs
