Analyze low batch size timing #538

ryan-summers · 2022-05-16T09:49:34Z

Analyze the timing requirements when using the DMA sample acquisition architecture for ADC/DAC operations for low batch sizes (e.g. 1 or 2).

If possible, we may want to eliminate the Peripheral data -> RAM DMA operation, as this would eliminate processing overhead in the loop. Instead, for these low batch counts, the data can just be manually transacted with the peripherals directly.

ryan-summers · 2022-05-16T11:53:24Z

The above capture was taken by toggling the USART3 RX/TX lines while using a batch size of 1.

TX was asserted at the start of the DSP process() function call and de-asserted at the end.
RX was asserted immediately before getting the ADC/DAC data buffers and servicing the DBM DMA transfer. It was then de-asserted immediately inside the closure processing those buffers.
- The second RX pulse is caused by RX being asserted immediately before data is transferred to the ethernet livestream. It is then de-asserted immediately after the DBM/DMA transfer closure completes.

As can be seen, the whole DSP process takes approximately 1.9 µs, which corresponds to a maximum sampling rate of approximately 526 kHz. Of that, servicing the DBM DMA transfers for data requires about 420 ns.

If no DBM DMA transfer servicing were required, the existing livestream / DSP routines would take 1.48 µs, which corresponds to a maximum sampling rate of ~676 kHz. However, even without DBM DMA, some small amount of time would still be required to read/write the SPI peripheral data registers, so in reality the processing time would be slightly longer than 1.48 µs and the maximum rate slightly lower.
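The rate figures above follow from requiring one batch to be processed per batch period. A minimal sketch of that arithmetic, where the function name `max_sample_rate_khz` is hypothetical and not part of the firmware:

```rust
/// Upper bound on the sample rate (in kHz) when processing one batch of
/// `batch_size` samples takes `process_time_ns` nanoseconds.
///
/// Hypothetical helper for illustration only.
fn max_sample_rate_khz(process_time_ns: f64, batch_size: u32) -> f64 {
    // samples per nanosecond is GHz; multiply by 1e6 to express it in kHz.
    f64::from(batch_size) * 1.0e6 / process_time_ns
}
```

With a batch size of 1, `max_sample_rate_khz(1900.0, 1)` is roughly 526 and `max_sample_rate_khz(1480.0, 1)` is roughly 676, matching the numbers quoted above.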

Rough breakdown of time requirements within DSP processing for a batch size of 1:

pie title Process time breakout (Batch size = 1)
    "DSP Routines": 900
    "Get DMA Buffers": 440
    "Prepare livestream": 400
    "Update Telemetry": 120
    "Exit": 20
    "Entry": 120

jordens · 2022-05-16T14:15:00Z

Interesting. I seem to remember much less time for DSP. ~1000 insns is a lot. Might be worthwhile to check back against

stabilizer/src/main.rs

Lines 247 to 284 in 0fd442e

    
    #[task(binds = SPI1, resources = [spi, iir_state, iir_ch], priority = 2)]
    fn spi1(c: spi1::Context) {
        #[cfg(feature = "bkpt")]
        cortex_m::asm::bkpt();
        let (spi1, spi2, spi4, spi5) = c.resources.spi;
        let iir_ch = c.resources.iir_ch;
        let iir_state = c.resources.iir_state;
        let sr = spi1.sr.read();
        if sr.eot().bit_is_set() {
            spi1.ifcr.write(|w| w.eotc().set_bit());
        }
        if sr.rxp().bit_is_set() {
            let rxdr = &spi1.rxdr as *const _ as *const u16;
            let a = unsafe { ptr::read_volatile(rxdr) };
            let x0 = f32::from(a as i16);
            let y0 = iir_ch[0].update(&mut iir_state[0], x0);
            let d = y0 as i16 as u16 ^ 0x8000;
            let txdr = &spi2.txdr as *const _ as *mut u16;
            unsafe { ptr::write_volatile(txdr, d) };
        }
        let sr = spi5.sr.read();
        if sr.eot().bit_is_set() {
            spi5.ifcr.write(|w| w.eotc().set_bit());
        }
        if sr.rxp().bit_is_set() {
            let rxdr = &spi5.rxdr as *const _ as *const u16;
            let a = unsafe { ptr::read_volatile(rxdr) };
            let x0 = f32::from(a as i16);
            let y0 = iir_ch[1].update(&mut iir_state[1], x0);
            let d = y0 as i16 as u16 ^ 0x8000;
            let txdr = &spi4.txdr as *const _ as *mut u16;
            unsafe { ptr::write_volatile(txdr, d) };
        }
        #[cfg(feature = "bkpt")]
        cortex_m::asm::bkpt();
    }

(caveat: old hardware I think).
Ah. I think the big difference in DSP load is the signal generator.
Also, do generally use nightly and the cortex-m/inline-asm feature.
I've found DWT CYCCNT to be a nicer tool for these measurements than GPIO toggling. I think it could well be less overhead.
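As a rough sketch of the CYCCNT approach: the counter is a free-running 32-bit register, so the elapsed cycle count is the wrapping difference of two reads. The helper names below (`elapsed_cycles`, `measure_cycles`) are hypothetical, and the counter read is passed in as a closure so the logic can be checked on a host. On the target, that closure would be assumed to wrap the DWT cycle counter accessor from the `cortex-m` crate, after the cycle counter has been enabled.

```rust
/// Cycles elapsed between two reads of a free-running 32-bit counter.
/// Wrapping subtraction gives the right answer across a single overflow.
fn elapsed_cycles(start: u32, end: u32) -> u32 {
    end.wrapping_sub(start)
}

/// Runs `f` and returns its result with the number of counter ticks it took.
///
/// `read_counter` is an assumption for illustration: on a Cortex-M target it
/// would read DWT CYCCNT; here it can be any monotonically ticking source.
fn measure_cycles<C, F, R>(mut read_counter: C, f: F) -> (R, u32)
where
    C: FnMut() -> u32,
    F: FnOnce() -> R,
{
    let start = read_counter();
    let result = f();
    let end = read_counter();
    (result, elapsed_cycles(start, end))
}
```

The measurement itself costs only two register reads and a subtraction, which is the basis for expecting less overhead than toggling a pin.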

ryan-summers · 2022-05-16T14:29:27Z

My calculations show the DSP section taking approximately 360 insns. The rest of the overhead here comes from the various other things we've put into the DSP routine, such as telemetry, signal generation, DMA servicing, etc.

jordens · 2022-05-16T14:52:10Z

The 1.9 µs you measure corresponds to about 760 insns for "DSP Routines". That doesn't include DMA servicing and telemetry, right?

jordens · 2022-05-16T14:53:15Z

Ah. No. The 1.9 µs you call "DSP process" is not "DSP routines".
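The instruction counts in this exchange are consistent with a 400 MHz core clock and roughly one instruction per cycle. Both are assumptions inferred from the quoted figures, not stated in the thread. A small sketch of the conversion, with a hypothetical constant and function name:

```rust
/// Assumed core clock, inferred from "1.9 µs is about 760 insns".
const ASSUMED_CORE_CLOCK_HZ: u64 = 400_000_000;

/// Converts a duration in nanoseconds to CPU cycles at `clock_hz`.
/// Treating cycles as instructions assumes about one instruction per cycle.
fn ns_to_cycles(ns: u64, clock_hz: u64) -> u64 {
    ns * clock_hz / 1_000_000_000
}
```

Under these assumptions the whole 1.9 µs process is 760 cycles, while the 900 ns "DSP Routines" slice alone is 360 cycles, which is the distinction being drawn here.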

jordens · 2022-05-16T14:53:54Z

Isn't signal generation part of "DSP Routines" in your measurement?

ryan-summers · 2022-05-16T15:00:41Z

"DSP Routines" is inclusive of signal generation. It's the amount of time the closure on the ADCs/DACs runs:

// Start timer
(adc0, adc1, dac0, dac1).lock(|adc0, adc1, dac0, dac1| {
    // Stop & Reset timer, this is "Get DMA Buffers"
});
// Stop & Reset timer, this is called "DSP Routines"

telemetry.latest_adcs = [adcs[0][0], adcs[1][0]];
telemetry.latest_dacs = [dacs[0][0], dacs[1][0]];
// Stop timer, this is "Update telemetry"

I'll try to get a full diff just to show things. I want to rework it so these calculations just get reported via telemetry instead of manually probing debug pins as well.
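One possible shape for reporting these segments via telemetry is a small "stop and reset" lap timer fed by a cycle counter. Everything below is a hypothetical sketch: `SegmentTimer`, `ProcessTimings` and their fields are illustrative names, not existing firmware types, and the counter values are passed in explicitly so the logic is independent of the hardware.

```rust
/// Lap timer over a free-running 32-bit cycle counter.
struct SegmentTimer {
    last: u32,
}

impl SegmentTimer {
    /// Starts timing from the counter value `now`.
    fn start(now: u32) -> Self {
        Self { last: now }
    }

    /// "Stop & reset": returns the cycles since the previous mark and
    /// restarts the timer from `now`. Wrapping handles counter overflow.
    fn lap(&mut self, now: u32) -> u32 {
        let elapsed = now.wrapping_sub(self.last);
        self.last = now;
        elapsed
    }
}

/// Per-segment cycle counts that could be attached to a telemetry report.
#[derive(Debug, PartialEq)]
struct ProcessTimings {
    get_dma_buffers: u32,
    dsp_routines: u32,
    update_telemetry: u32,
}
```

Each `lap` call would sit where the "Stop & Reset timer" comments are in the snippet above, with `now` read from the cycle counter at that point.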

ryan-summers changed the title ~~Analyze Low batch size timing~~ Analyze low batch size timing May 16, 2022
