Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asm::delay() only blocks half the time intended on an STM32H7 #430

Open
nkrackow opened this issue Apr 13, 2022 · 6 comments
Open

asm::delay() only blocks half the time intended on an STM32H7 #430

nkrackow opened this issue Apr 13, 2022 · 6 comments

Comments

@nkrackow
Copy link

The asm::delay() function delays a program only by half the time/CPU cycles intended on an STM32H7.
I have prepared a somewhat minimal example here. My CPU clock is at 400 MHz and I specify a delay of 4_000_000_000 cycles which should lead to a 10 second delay. The actual delay is 5 seconds for me.

I have also confirmed the same behavior in other programs on an STM32H7 with much shorter timescales and looking at IOs on a scope. The delay is always exactly half the time expected.

One guess is that this is due to the dual issue pipeline of the H7 eating up two instructions per cycle.

@nkrackow nkrackow changed the title asm::delay() is only blocks halve the time intended on an STM32H7 asm::delay() is only blocks half the time intended on an STM32H7 Apr 13, 2022
@nkrackow nkrackow changed the title asm::delay() is only blocks half the time intended on an STM32H7 asm::delay() only blocks half the time intended on an STM32H7 Apr 13, 2022
@nkrackow
Copy link
Author

Oh I forgot to mention that I tried this with 0.7.4 and the current master. In my example repo I patch in the current master.

@adamgreig
Copy link
Member

adamgreig commented Apr 14, 2022

Hm, that's annoying, we deliberately double the number of cycles delay() blocks for in #312 to accommodate the dual issue pipeline of the Cortex-M7 (#236); in fact this led to complaints about delaying too long on Cortex-M3/4 in #325.

I vaguely recall checking this on an stm32h7 and getting the right result, I'll see if I can dig it out and investigate a bit further. I guess we could read CPUID at runtime to try and work out what CPU we're on, but even then things like whether execution is from flash or RAM or XIP (Q)SPI flash, whether i-cache is on, whether a flash accelerator is configured, etc etc etc will all heavily influence the number of cycles. The current calculation is a two-instruction loop and it runs half the specified number of cycles of iterations, i.e. it will retire as many instructions as requested delay.

It might be that the best fix we can do is make the documentation even more vague on precisely how many cycles of delay you'll get...

@nkrackow
Copy link
Author

Sorry.. ;)
I mean please do double check my example or try with other code. I tired to exclude other variables where possible and had some discussion with @jordens but there might still be another explanation/condition causing this for me.

But since this actually broke my code which had to wait for an external device reset I think it's an important bug.

@jordens
Copy link
Contributor

jordens commented Apr 14, 2022

I suspect the only practical solution might indeed be to not touch the scaling factor anymore for behavioral stability/compatibility and make the docs say that this is an uncalibrated delay with both an an offset and a scaling factor that depend on cortex-m architecture and e.g. enabled cpu/code location/cache features.

@adamgreig
Copy link
Member

I double checked this in #325 (comment) and indeed we currently only delay half the required time on CM7; the fix in #312 doesn't fully account for the core being able to issue both instructions in the loop at the same time.

I guess ideally we'd experiment a bit more about when a CM7 core will or won't dual-issue the loop, but if it seems like it does always manage it, I'd be inclined to try doing feature detection and delaying for about the right number of cycles for the current core, rather than being wrong on everything (as currently) or 3x worse on one core than another (if we fixed it for cm7).

@jordens
Copy link
Contributor

jordens commented Feb 9, 2023

I'd definitely prefer guaranteeing an error on the long delay side rather than trying "about right". For the frequent applications of e.g. guaranteeing datasheet timing a relative error of 3 is acceptable (if annoying) while 0.9 would be unacceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants