Apparently Vivado is failing to map wide FMAs to DSPs efficiently.
Lakeroad alone probably can't do this: once a solver query has to discover that some combination of bvmuls equals one wide bvmul, the solvers all seem to choke. There may be solver tricks for this (reasoning about multiplication is a known hard problem, and I would expect solvers like cvc5 to have done research on it). However, there's a more obvious way around it: use equality saturation (i.e., Churchroad) to block up the wide FMA via rewrites, then run Lakeroad synthesis on the smaller FMAs that result. Assuming the smaller FMAs are sized to fit on a single DSP, this should work well.
Subtasks:
Get an example of a realistic wide FMA.
See how Vivado fails to map it.
Ingest wide FMA into Churchroad.
Develop rewrites to block wide FMA into DSP-sized FMAs.
Call out to Lakeroad to map the DSP-sized FMAs.
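For reference, the blocking rewrites would rely on the usual schoolbook identity: split each operand into chunks, multiply the chunks pairwise, and sum the shifted partial products. Below is a minimal Python sketch of that identity for unsigned operands only. The function names are hypothetical, and the 26x17 block size is an assumption (roughly the unsigned range of a 27x18 signed DSP multiplier), so adjust it for the actual target. Signed operands would need extra sign handling that is not shown here.

```python
import random


def blocked_mul(a: int, b: int, width: int, block_a: int, block_b: int) -> int:
    """Multiply two unsigned `width`-bit values using only
    (block_a x block_b)-bit multiplies, shifts, and adds."""
    mask_a = (1 << block_a) - 1
    mask_b = (1 << block_b) - 1
    total = 0
    for i in range(0, width, block_a):          # bit offset of the chunk of a
        a_chunk = (a >> i) & mask_a
        for j in range(0, width, block_b):      # bit offset of the chunk of b
            b_chunk = (b >> j) & mask_b
            # Each small product is what one DSP-sized multiplier would compute.
            total += (a_chunk * b_chunk) << (i + j)
    return total


def blocked_fma(a: int, b: int, c: int, width: int, block_a: int, block_b: int) -> int:
    """Wide a*b + c built from the blocked multiply (unsigned only)."""
    return blocked_mul(a, b, width, block_a, block_b) + c


if __name__ == "__main__":
    # Block sizes are assumptions, not taken from any tool's output.
    x, y, z = (random.getrandbits(64) for _ in range(3))
    assert blocked_fma(x, y, z, 64, 26, 17) == x * y + z
```

In a real flow, each inner product plus its accumulation would be a candidate for one Lakeroad query; the sketch only checks that the decomposition is arithmetically equivalent.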
The specific cases that would be super helpful for processor design are:
+-(32bx32b)+-32b->32b
32bx32b->64b
32bx32b->64b
+-(64bx64b)+-64b->64b
64bx64b->64b
64bx64b->128b
I'm not sure how the number of output bits affects DSP inference. In ASIC, dropping the upper bits is a substantial savings (tens of percent). Could it be free on FPGA? Interesting either way.
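One way to reason about the truncated cases: in a blocked multiply, a partial product shifted left by at least the output width cannot affect the kept bits, so it can be dropped entirely. The Python sketch below counts the partial products that remain. The helper names are hypothetical, the 26x17 block size is an assumption about the DSP multiplier, and the counts are partial-product counts under that assumption, not a prediction of what Vivado actually infers.

```python
def needed_partial_products(width: int, out_bits: int, block_a: int, block_b: int):
    """Return the (i, j) bit offsets of the partial products that can
    influence the low `out_bits` bits of an unsigned width x width multiply."""
    return [
        (i, j)
        for i in range(0, width, block_a)
        for j in range(0, width, block_b)
        if i + j < out_bits  # products shifted past out_bits only touch dropped bits
    ]


def truncated_mul(a: int, b: int, width: int, out_bits: int,
                  block_a: int, block_b: int) -> int:
    """Low `out_bits` bits of a*b, using only the needed partial products."""
    mask_a = (1 << block_a) - 1
    mask_b = (1 << block_b) - 1
    total = 0
    for i, j in needed_partial_products(width, out_bits, block_a, block_b):
        total += (((a >> i) & mask_a) * ((b >> j) & mask_b)) << (i + j)
    return total & ((1 << out_bits) - 1)


if __name__ == "__main__":
    for width, out_bits in [(32, 32), (32, 64), (64, 64), (64, 128)]:
        n = len(needed_partial_products(width, out_bits, 26, 17))
        print(f"{width}bx{width}b->{out_bits}b: {n} partial products")
```

With these assumed block sizes, 64bx64b->64b keeps 8 of the 12 partial products and 32bx32b->32b keeps 3 of 4, so dropping the upper bits may save multipliers on FPGA too. Some of the surviving products could also be narrowed, which this sketch does not attempt.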
Any realistic number of pipeline stages is fine; in ASIC we typically see 3±1.
I have more advanced use cases I'd love support for, but this is a great place to start!
This was mentioned by @dpetrisko.