guestfs kernel choosing wrong root disk (under load/race condition) #69
This makes it clearer: the drive letters assigned by Linux don't match the sequence of disk path numbers.
So the root disk is the "last" disk (2:0:46:0), and guestfs picks what it thinks will be the "last" drive name allocated. (I am hopeful that the shuffling won't affect my application, since I'm applying labels to each of the disk files. But obviously, if guestfs won't boot, it fails.) If Linux can't be told to assign the drive letters in a deterministic order, then perhaps the solution is to use …
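For reference, the Linux sd driver names disks in bijective base 26 (sda..sdz, then sdaa, sdab, ...). The sketch below converts a zero-based index into such a name. It assumes the scheme used by the kernel's sd_format_disk_name(). The function names sd_disk_name and last_disk_name are illustrative and are not libguestfs APIs. Suppose SCSI targets 0..46 each hold one disk and names were handed out strictly in target order. Then target 46 would be sdau and target 45 would be sdat. That is consistent with guestfs expecting the appliance at sdau while it actually shows up at sdat.

```python
"""Sketch of the Linux sd disk-naming scheme (illustrative, not libguestfs code).

Assumption: mirrors the kernel's sd_format_disk_name(), i.e. bijective
base-26 names: sda..sdz, sdaa..sdzz, sdaaa, ...
"""


def sd_disk_name(index: int) -> str:
    """Return the sd name for a zero-based disk index (0 -> 'sda')."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    while True:
        letters.append(chr(ord("a") + index % 26))
        # Bijective base 26 has no "zero" digit, hence the extra -1.
        index = index // 26 - 1
        if index < 0:
            break
    return "sd" + "".join(reversed(letters))


def last_disk_name(num_disks: int) -> str:
    """Name the last disk would get if names followed probe order exactly."""
    return sd_disk_name(num_disks - 1)


if __name__ == "__main__":
    print(last_disk_name(47))  # sdau
    print(sd_disk_name(45))    # sdat
```

The point of the sketch is that "last disk" and "last name" coincide only when probing is strictly ordered. With asynchronous probing under load, two adjacent disks can swap names.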
Since the problem occurs during the initial startup of the kernel, I've found that it reproduces without mounting, uploading to, or unmounting any of the attached disks. You just have to attach a large number of disks. This results in a simplified gf-run.py.
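The simplified gf-run.py itself isn't included here. The following is a rough sketch of the same idea. It assumes qemu-img and guestfish are on PATH and that mybase.img is a raw image. It creates num_disks qcow2 overlays of the base image and boots the appliance once with all of them attached, without mounting anything. The helper names (clone_cmd, guestfish_cmd, run_once) and the overlay file names are hypothetical. This is not the script from this issue.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def clone_cmd(base: Path, clone: Path) -> List[str]:
    """qemu-img command for a qcow2 overlay backed by the raw base image."""
    # Use an absolute backing path: relative backing paths are resolved
    # against the overlay's own directory, not the current directory.
    return ["qemu-img", "create", "-f", "qcow2", "-F", "raw",
            "-b", str(base.resolve()), str(clone)]


def guestfish_cmd(disks: List[Path]) -> List[str]:
    """guestfish command that attaches every disk and only boots the appliance."""
    argv = ["guestfish", "--format=qcow2"]  # applies to the -a options after it
    for disk in disks:
        argv += ["-a", str(disk)]
    argv.append("run")  # launch only: no mount, upload or unmount
    return argv


def run_once(base: Path, workdir: Path, num_disks: int, debug: bool = True) -> int:
    """Create the overlays, boot guestfish once, and return its exit status."""
    workdir.mkdir(parents=True, exist_ok=True)
    disks = [workdir / f"disk{i:03d}.qcow2" for i in range(num_disks)]
    for disk in disks:
        subprocess.run(clone_cmd(base, disk), check=True, capture_output=True)
    env = dict(os.environ, LIBGUESTFS_DEBUG="1") if debug else None
    log = workdir / "guestfish.log"
    with log.open("w") as out:
        proc = subprocess.run(guestfish_cmd(disks), stdout=out,
                              stderr=subprocess.STDOUT, env=env)
    return proc.returncode
```

For example, run_once(Path("mybase.img"), Path("work"), 46) would attach 46 overlays. libguestfs then adds its own appliance disk after the user drives. A non-zero return value leaves work/guestfish.log behind for inspection.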
I am using guestfish to create, prepare and upload multiple disk images. Each guestfs VM has a number of disk images attached and uploads to them in turn. Additionally, multiple guestfs VMs run concurrently to prepare additional images, to make use of all the cores of my quad-core CPU. These concurrent instances create and upload completely separate files (except that they all clone a shared qcow2 base image).
What I find is that under such load, guestfs intermittently fails to boot. I have made a standalone set of scripts to reproduce this.
It appears to be some sort of race condition: if I set LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1 and allow the output to go to the screen, the problem occurs much less frequently. However, I have now been able to reproduce the problem with LIBGUESTFS_DEBUG=1, and the results are included below.
Reproducer
Create mybase.img. I used mformat -C -f 1440 -i mybase.img :: from mtools.
Run ./gf-par.py. It will spawn multiple concurrent copies of ./gf-run.py and, if the problem occurs, will leave its temporary files lying around, including the captured stdout/stderr from guestfish. Each instance creates a number of qcow2 clones of the mybase.img file and uploads some data.
Note: it doesn't reproduce every time, so you may need to run it several times. I haven't got to the bottom of what makes it reproduce easily on some runs and not on others. In my real-world application, running just two guestfs instances concurrently is sometimes enough to trigger the problem.
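The following is a minimal sketch of a parallel driver in the spirit of gf-par.py. It assumes each worker is a command that exits non-zero when guestfish fails to boot. DEFAULT_CMD, run_worker, run_parallel and failures are hypothetical names. This is not the actual gf-par.py. Threads are sufficient here because each thread only waits on a child process, and the real work happens in that child.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Hypothetical worker command; substitute the real gf-run.py invocation.
DEFAULT_CMD = (sys.executable, "./gf-run.py")

Result = Tuple[int, str, str]  # (exit status, stdout, stderr)


def run_worker(cmd: Sequence[str]) -> Result:
    """Run one worker to completion and capture its output."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def run_parallel(cmd: Sequence[str] = DEFAULT_CMD, copies: int = 8,
                 jobs: int = 4) -> List[Result]:
    """Run `copies` instances of `cmd`, at most `jobs` at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_worker, [cmd] * copies))


def failures(results: List[Result]) -> List[int]:
    """Indices of workers that exited with a non-zero status."""
    return [i for i, (rc, _, _) in enumerate(results) if rc != 0]
```

For example, failures(run_parallel(copies=8, jobs=4)) lists which runs need their captured stderr inspected. Raising jobs increases the load, which per the note above is what makes the race more likely.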
Results
An example of a failure, with full debug output, is here.
In short, it appears to get confused about which disk is the root disk. It chooses sdau as the root disk, and yet sdau is one of the qcow2 image clones; judging by its size, the supermin disk appears to be sdat.
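The size observation can be checked mechanically. In the LIBGUESTFS_DEBUG=1 output, the kernel's per-disk capacity messages pair each SCSI address (H:C:T:L) with the name it was assigned. This sketch assumes the usual sd driver message format, "sd H:C:T:L: [sdX] N 512-byte logical blocks: ...". If your log differs, adjust CAPACITY_RE. The helpers parse_capacities and odd_sized, and the Disk type, are hypothetical. Every clone is an overlay of the same 1440 KiB base, so the one disk whose size differs from the majority is presumably the appliance. Comparing its H:C:T:L with its name then shows whether naming followed target order.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Assumed format of the sd driver's capacity message, e.g.
# "sd 2:0:0:0: [sda] 2880 512-byte logical blocks: (1.47 MB/1.41 MiB)"
CAPACITY_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\] (\d+) (\d+)-byte logical blocks"
)


class Disk(NamedTuple):
    hctl: str  # host:channel:target:lun
    name: str  # kernel-assigned name, e.g. "sdat"
    size: int  # size in bytes


def parse_capacities(lines: Iterable[str]) -> List[Disk]:
    """Extract (address, name, size) for every capacity message found."""
    disks = []
    for line in lines:
        m = CAPACITY_RE.search(line)  # search: log lines may have timestamps
        if m:
            hctl, name, blocks, block_size = m.groups()
            disks.append(Disk(hctl, name, int(blocks) * int(block_size)))
    return disks


def odd_sized(disks: List[Disk]) -> List[Disk]:
    """Disks whose size differs from the most common size."""
    if not disks:
        return []
    common, _ = Counter(d.size for d in disks).most_common(1)[0]
    return [d for d in disks if d.size != common]
```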