Good test case: Mapping wider FMAs #52

Open
5 tasks
gussmith23 opened this issue Apr 29, 2024 · 1 comment

gussmith23 commented Apr 29, 2024

This was mentioned by @dpetrisko.

Apparently Vivado is failing to map wide FMAs to DSPs efficiently.

Lakeroad alone probably can't do this: once a solver query needs to figure out that some combination of bvmuls is equal to one wide bvmul, the solvers all seem to choke. There may be solver tricks to do this (reasoning about multiplies is a known hard problem; I would think the developers of solvers like cvc5 have done research on this). However, there's an even more obvious way around this: use equality saturation (i.e. Churchroad) to block the wide FMA into smaller FMAs via rewrites, and then run Lakeroad synthesis on the smaller FMAs that result. Assuming the smaller FMAs are sized to fit on a single DSP, this should work great.
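
To make the blocking idea concrete, here is a minimal Python sketch of the arithmetic identity the rewrites would have to capture. It is not Churchroad or Lakeroad code. The names `split_chunks` and `blocked_fma` are made up for illustration, operands are assumed unsigned, and the 16-bit chunk width is an arbitrary assumption (real DSP multipliers are often asymmetric and signed).

```python
def split_chunks(value: int, width: int, chunk: int) -> list[int]:
    """Split an unsigned `width`-bit value into little-endian `chunk`-bit pieces."""
    mask = (1 << chunk) - 1
    return [(value >> shift) & mask for shift in range(0, width, chunk)]


def blocked_fma(a: int, b: int, c: int, width: int = 64, chunk: int = 16) -> int:
    """Compute a * b + c using only chunk-by-chunk multiplies.

    Each partial product a_i * b_j fits a (hypothetical) chunk x chunk
    multiplier; the shift restores its weight in the full product.
    """
    a_parts = split_chunks(a, width, chunk)
    b_parts = split_chunks(b, width, chunk)
    total = c
    for i, a_i in enumerate(a_parts):
        for j, b_j in enumerate(b_parts):
            total += (a_i * b_j) << (chunk * (i + j))
    return total
```

Each `a_i * b_j` term, together with the running accumulate, has the shape of one small FMA, which is the kind of piece that would then be handed to Lakeroad.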

Subtasks:

  • Get an example of a realistic wide FMA.
  • See how Vivado fails to map it.
  • Ingest wide FMA into Churchroad.
  • Develop rewrites to block wide FMA into DSP-sized FMAs.
  • Call out to Lakeroad to map the DSP-sized FMAs.
@dpetrisko

Thanks @gussmith23 !

The specific cases that would be super helpful for processor design are:

  1. +-(32bx32b)+-32b->32b

  2. 32bx32b->32b

  3. 32bx32b->64b

  4. +-(64bx64b)+-64b->64b

  5. 64bx64b->64b

  6. 64bx64b->128b
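
One way to pin these cases down as a golden model for testing is sketched below. This is only a hedged reading of the list: it assumes unsigned operands, and it reads `+-` as an optional negation of the product and of the addend. The function name `fma_ref` and its parameters are hypothetical.

```python
def fma_ref(a: int, b: int, c: int, in_width: int, out_width: int,
            negate_product: bool = False, negate_addend: bool = False) -> int:
    """Reference model: (+-(a * b) +- c) truncated to `out_width` bits.

    Assumes unsigned `in_width`-bit inputs; pass c=0 for a plain multiply.
    """
    in_mask = (1 << in_width) - 1
    product = (a & in_mask) * (b & in_mask)
    addend = c & in_mask
    if negate_product:
        product = -product
    if negate_addend:
        addend = -addend
    # Masking a negative Python int yields its two's-complement wraparound.
    return (product + addend) & ((1 << out_width) - 1)
```

For example, `fma_ref(a, b, c, 32, 32)` models a 32-bit FMA with a 32-bit result, and `fma_ref(a, b, 0, 64, 128)` models the full-width 64-bit multiply.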

Not sure how the number of output bits affects DSP inference! In ASIC, dropping the upper bits is a substantial savings (tens of percent). Could it be free on FPGA? Interesting either way.
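
As a rough way to reason about the output-width question, the sketch below counts how many chunk-by-chunk partial products can affect the low `out_width` bits of an unsigned multiply. The function name `truncated_mul` and the square 16-bit chunk width are assumptions for illustration. A count of multiplies is not a count of DSPs, so this does not say what Vivado or any other tool will actually infer.

```python
def truncated_mul(a: int, b: int, in_width: int, out_width: int,
                  chunk: int = 16) -> tuple[int, int]:
    """Return (low `out_width` bits of a * b, number of partial products used).

    A partial product a_i * b_j has weight 2**(chunk * (i + j)), so it cannot
    influence the result when that weight is at or above `out_width`.
    """
    mask = (1 << chunk) - 1
    n_chunks = -(-in_width // chunk)  # ceiling division
    total = 0
    used = 0
    for i in range(n_chunks):
        for j in range(n_chunks):
            shift = chunk * (i + j)
            if shift >= out_width:
                continue  # contributes only to bits that are dropped
            total += (((a >> (chunk * i)) & mask) * ((b >> (chunk * j)) & mask)) << shift
            used += 1
    return total & ((1 << out_width) - 1), used
```

Under these assumptions, a 64bx64b->64b multiply needs 10 of the 16 partial products that 64bx64b->128b needs, and 32bx32b->32b needs 3 of 4.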

Any realistic number of pipeline stages is fine. In ASIC we typically see 3 +- 1.

I have more advanced use cases I'd love support for, but this is a great place to start!
