
Crash on boot in Hyper-V #976

justledbetter opened this issue Feb 24, 2021 · 16 comments

@justledbetter

I'm seeing a crash on boot when trying to start up under Hyper-V. This is a Generation 2 VM with virtualization extensions enabled on the CPU, configured with two vCPUs.

Booting...
illumos Version joyent_20210114T163038Z 64-bit

panic[cpu0]/thread=fffffffffbc4a0c0: microfind: could not calibrate delay loop

Warning - stack not written to the dump buffer
fffffffffbc4a220 unix:microfind+b2 ()
fffffffffbc4a2a0 unix:startup_modules+20 ()
fffffffffbc4a2b0 unix:startup+55 ()
fffffffffbc4a2f0 genunix:main+36 ()
fffffffffbc4a300 unix:_locore_start+90 ()

skipping system dump - no dump device configured
rebooting...
<system hangs here>

The same crash reproduces on OmniOS CE r151036.

@jasonbking
Contributor

Would you be willing to try a test platform image (PI) with a fix? The issue is that Hyper-V Generation 2 VMs don't emulate the i8254 PIT.
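The `microfind: could not calibrate delay loop` panic is consistent with a reference timer that never ticks. As a rough illustration only (this is not the illumos `microfind` code), a delay loop is typically calibrated by timing a busy loop against a reference clock; if the clock device is absent and reads back a constant, the measured interval is always zero and calibration has to give up. The names `ref_clock_fn`, `calibrate_delay_loop`, `stuck_clock`, and `fake_clock` are hypothetical, and the 1000-tick threshold and iteration cap are arbitrary choices for the sketch.

```c
#include <stdint.h>

/*
 * Hypothetical reference-clock interface: returns a tick count that
 * should increase at a fixed rate. In a kernel this would read a timer
 * device such as the PIT; here it is a function pointer so the logic
 * can run in user space.
 */
typedef uint64_t (*ref_clock_fn)(void);

/* The loop being calibrated; volatile stops the compiler deleting it. */
static void spin(uint64_t iters)
{
    for (volatile uint64_t i = 0; i < iters; i++)
        ;
}

/*
 * Measure how many spin() iterations fit into one reference tick.
 * Returns 0 on success, -1 if the reference clock never advanced far
 * enough, which is what an un-emulated timer reading back a constant
 * would look like.
 */
static int calibrate_delay_loop(ref_clock_fn read_ref, double *iters_per_tick)
{
    const uint64_t min_ticks = 1000;   /* want a measurable interval */

    /* Double the loop length until it spans min_ticks, up to a cap. */
    for (uint64_t iters = 1024; iters <= (1ULL << 20); iters <<= 1) {
        uint64_t start = read_ref();
        spin(iters);
        uint64_t elapsed = read_ref() - start;
        if (elapsed >= min_ticks) {
            *iters_per_tick = (double)iters / (double)elapsed;
            return 0;
        }
    }
    return -1;   /* analogous to "could not calibrate delay loop" */
}

/* Simulated clocks for exercising the logic. */
static uint64_t stuck_clock(void)   /* device absent: constant value */
{
    return 42;
}

static uint64_t fake_clock(void)    /* advances 2000 ticks per read */
{
    static uint64_t now;
    return now += 2000;
}
```

With `stuck_clock`, every measurement sees zero elapsed ticks and the function returns -1. With `fake_clock`, the first attempt spans 2000 ticks and yields 1024 / 2000 = 0.512 iterations per tick.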

@justledbetter
Author

Very happy to try anything you suggest; I'd just need instructions, as I'm new to debugging illumos.

I had also tried booting in a Generation 1 VM, and I recall it hanging without producing any output. Is there a way to enable more verbose output to see where it hangs?

@jasonbking
Contributor

In the boot loader, under boot options, there should be an option to enable verbose booting. Let me build an image with the fix (it will take a bit), and I'll include a temporary link you can use to download it.

@justledbetter
Author

I'll be standing by!

@jasonbking
Contributor

What media format do you prefer: .tgz, .iso, or .usb?

@justledbetter
Author

.iso works best for me, thanks!

@jasonbking
Contributor

Try https://us-east.manta.joyent.com/jbk/public/tmp/platform-20210224T212002Z.iso. That should use the HPET instead of the PIT for timer calibration.
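To make the switch from PIT to HPET concrete, here is a hedged sketch of the arithmetic such a calibration involves (not the illumos implementation): sample the TSC and the HPET main counter at the start and end of a short window, convert the HPET ticks to nanoseconds using the tick period the HPET advertises in femtoseconds, and divide. `tsc_hz_from_hpet` is a hypothetical helper, and how the counters are actually read is left out.

```c
#include <stdint.h>

/*
 * Estimate the TSC frequency in Hz from how far the TSC and the HPET
 * main counter advanced over the same window. The HPET reports its tick
 * period in femtoseconds (10^-15 s) in its capabilities register; here
 * that value is simply passed in.
 *
 * Returns 0 if either counter failed to advance, so the caller can treat
 * the calibration as failed instead of using a bogus value.
 */
static uint64_t tsc_hz_from_hpet(uint64_t tsc_delta, uint64_t hpet_delta,
                                 uint64_t hpet_period_fs)
{
    if (tsc_delta == 0 || hpet_delta == 0 || hpet_period_fs == 0)
        return 0;

    /* Elapsed time in nanoseconds (1 ns = 10^6 fs). Assumes a short
       calibration window so the product does not overflow. */
    uint64_t elapsed_ns = hpet_delta * hpet_period_fs / 1000000u;
    if (elapsed_ns == 0)
        return 0;

    /* ticks per second = ticks * 10^9 / elapsed ns */
    return tsc_delta * 1000000000u / elapsed_ns;
}
```

For example, a 10 MHz HPET (period 100,000,000 fs) that advances 10,000 ticks (1 ms) while the TSC advances 3,000,000 gives 3,000,000,000 Hz, i.e. a 3 GHz TSC.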

@justledbetter
Author

This one crashes in a loop, repeating these messages:

panic[cpu0]/thread=fffffffffbc4a0c0: bad DTrace trap

panic: entering debugger (continue to reboot)

...and then it eventually hangs.

@jasonbking
Contributor

That is interesting. Can you go into the boot options menu in the boot loader and enable both verbose boot and kmdb? That should drop you to the kmdb prompt, where you can use $C to get a stack trace.

@justledbetter
Author

justledbetter commented Feb 24, 2021

OK, with the debugger enabled, I get the following output:

panic[cpu0]/thread=fffffffffbc4a0c0: Failed to calibrate TSC
fffffffffbc8a280 unix:tsc_calibrate+16f ()
fffffffffbc8a2a0 unix:startup_tsc+18 ()
... startup+4a ()
etc

(Sorry, have to copy paste with my eyes :) )

Note that I cannot type anything into the debugger after the crash.

@jasonbking
Contributor

I think I see what happened. I can do a quick incremental build (though it'll take a bit to re-upload).

@jasonbking
Contributor

I've uploaded a new ISO image (same path). Let me know how that one works -- if nothing else, you should get a new error :) though hopefully it actually works.

@justledbetter
Author

justledbetter commented Feb 25, 2021

It gets past that error now, but still hangs later. The output of your module reads:

TSC calibrated using hyperv; freq is 0 MHzSMBIOS v3.1 loaded (961 bytes)initialized model-specific module 'cpu_ms.GenuineIntel' on chip 0 core 0 strand 0

(lack of newlines as in the original)

Now the startup hangs at:

ramdisk0 at root
ramdisk0 is /ramdisk
WARNING: Last shutdown is later than time on time-of-day chip; check date.
root on /ramdisk:a fstype ufs
/cpus (cpunex0) online
pseudo-device dld0
dld0 is /pseudo/dld@0
<hang>

Could it be hanging because of the 0 MHz clock frequency? Perhaps a divide-by-zero trap that isn't presenting itself as a crash?
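The thread doesn't confirm the divide-by-zero theory, but a reported frequency of 0 should never be accepted regardless of the root cause: in C an integer division by zero is undefined behavior, and on x86 the DIV instruction raises a divide-error exception rather than returning a value. A minimal sketch, assuming a hypothetical helper `tsc_ticks_to_ns` (not an illumos function), of converting TSC ticks to nanoseconds with that check in place:

```c
#include <stdint.h>

/*
 * Convert a TSC tick count to nanoseconds, given a frequency obtained
 * from calibration or reported by the hypervisor. A frequency of 0 is
 * rejected up front instead of being used as a divisor.
 *
 * Returns 0 on success, -1 if the frequency is unusable.
 */
static int tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_hz, uint64_t *ns)
{
    if (tsc_hz == 0)
        return -1;

    /* Split into whole seconds plus a remainder so that large tick
       counts do not overflow when multiplied by 10^9. The remainder
       term stays in range for frequencies below roughly 18 GHz. */
    uint64_t secs = ticks / tsc_hz;
    uint64_t rem = ticks % tsc_hz;
    *ns = secs * 1000000000u + rem * 1000000000u / tsc_hz;
    return 0;
}
```

For example, 4,500 ticks at 3 GHz converts to 1,500 ns, while a frequency of 0 is refused with -1 instead of faulting.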

I'm trying to press F1+A to break into the debugger, but nothing happens. I'm not sure whether this is the same problem as before (unable to enter text) or something else. Is there a way to interrupt boot and set a breakpoint before this point in order to trace further?

Update: When booting in non-verbose mode, I get the following additional hint:

WARNING: Last shutdown is later than time on time-of-day chip; check date.
WARNING: Time of Day clock error: reason [Stalled]. -- Stopped tracking Time Of Day clock.

Also: while the system is hung, Hyper-V reports a constant 2% CPU usage (this VM has 2 vCPUs on a 12-core host, IIRC).

@jasonbking
Contributor

OK, I'm trying one more thing and have updated the link again. See if that works any better.

@justledbetter
Author

justledbetter commented Feb 25, 2021

It's back to crashing at tsc_calibrate+16f () with "Failed to calibrate TSC", as listed above.

@justledbetter
Author

Standing by to continue testing any time you need me to -- Thanks very much for all the help so far!
