What it is
The math coprocessor turns the FPGA's hardware DSP blocks into a peripheral that the CPU drives with a few store and load instructions. The 6502 has no multiply instruction, so a fixed-point product computed in software costs roughly 250 cycles. Here a signed 32×32-bit multiply is carried out by the FPGA's DSP hardware, and the result is available two clocks after the operands change.
What it does
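The unit continuously multiplies whatever is latched in operand registers A and B. It provides both the full 64-bit signed product and that product arithmetic-right-shifted by a selectable amount. Multiplying two values that each carry n fractional bits yields a product with 2n fractional bits, so shifting right by n restores the operands' format. Because the shift amount is held in the SHIFT register (0…63), the same unit serves any Q-format: SHIFT = 24 gives 8.24, and SHIFT = 12 gives 4.12 (with 16-bit 4.12 operands sign-extended to fill the 4-byte registers).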
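A host-side C model makes the arithmetic concrete. This is a minimal sketch, not the coprocessor's HDL. The function names are hypothetical, and it assumes that the 4-byte result register holds the low 32 bits of the shifted product.

```c
#include <stdint.h>

/* Sign-propagating (arithmetic) right shift. C leaves >> on negative values
   implementation-defined, so the negative case is computed through ~.
   shift must be in 0..63. */
static int64_t asr64(int64_t x, unsigned shift)
{
    return x < 0 ? ~(~x >> shift) : x >> shift;
}

/* Full signed product of two 32-bit operands; it always fits in 64 bits. */
static int64_t copro_product(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* (A*B) >> SHIFT, truncated to 32 bits. Assumption: the result register
   holds the low 32 bits of the shifted product. */
static uint32_t copro_result(int32_t a, int32_t b, unsigned shift)
{
    return (uint32_t)(uint64_t)asr64(copro_product(a, b), shift);
}

/* Encode a real number with frac_bits fractional bits
   (truncates toward zero, no range check). */
static int32_t to_fixed(double v, unsigned frac_bits)
{
    return (int32_t)(v * (double)(1ull << frac_bits));
}
```

With this model, `copro_result(to_fixed(2.0, 24), to_fixed(3.0, 24), 24)` gives `0x06000000`, and -1.5 × 2.0 gives `0xFD000000` (-3.0 in 8.24). For a 4.12 product, 1.5 × 2.0 with a shift of 12 gives `0x3000`. Because the shift is arithmetic, inexact negative results round toward negative infinity.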
Why it matters
| | Software (4.12) | Coprocessor (8.24) |
|---|---|---|
| Per multiply | ~250 cycles | ~12–20 cycles |
| Sign handling | manual in 6502 | in hardware |
| Normalization | 4× lsr/ror chain | free (hardware) |
| Precision | 12 fractional bits | 24 fractional bits |
| Mandelbrot 320×200 | ~5–8 min | ~10 s |
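The C function below illustrates the software column of the table above. It follows the common software approach to a signed 4.12 multiply: strip the signs, run a 16×16 shift-and-add multiply, shift the 8.24 product right by 12, and restore the sign. This is a sketch of the general technique with hypothetical names. It is not the project's actual 6502 routine, and it does not model cycle counts.

```c
#include <stdint.h>

/* Unsigned 16x16 -> 32 shift-and-add multiply: one conditional add per
   multiplier bit, the same structure a 6502 software routine loops over. */
static uint32_t umul16(uint16_t a, uint16_t b)
{
    uint32_t acc = 0;
    uint32_t addend = a;
    for (int i = 0; i < 16; i++) {
        if (b & 1u)
            acc += addend;
        b >>= 1;
        addend <<= 1;
    }
    return acc;
}

/* Signed 4.12 multiply done "by hand": manual sign handling, then
   normalization of the 8.24 product back to 4.12. Results outside the
   4.12 range are not saturated (the final narrowing is implementation-defined). */
static int16_t sw_mul_4_12(int16_t a, int16_t b)
{
    int negative = (a < 0) != (b < 0);
    uint16_t ua = (uint16_t)(a < 0 ? -(int32_t)a : a);
    uint16_t ub = (uint16_t)(b < 0 ? -(int32_t)b : b);
    uint32_t mag = umul16(ua, ub) >> 12;   /* normalization step */
    int32_t r = negative ? -(int32_t)mag : (int32_t)mag;
    return (int16_t)r;
}
```

Because this version works on magnitudes, it truncates inexact negative results toward zero. The hardware's arithmetic shift rounds them toward negative infinity instead. Both give identical results for exact products such as 1.5 × 2.0.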
Registers ($88B0)
| Address | Write | Read |
|---|---|---|
| $88B0–$88B3 | operand A (4 bytes, LE) | raw product, bytes 0–3 |
| $88B4–$88B7 | operand B (4 bytes, LE) | raw product, bytes 4–7 |
| $88B8–$88BB | — | result (A·B) >> SHIFT (4 bytes) |
| $88BC | SHIFT (0…63) | SHIFT |
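All values are little-endian, matching how the 6502 stores multi-byte words. There is no "start" strobe: the result follows the operands with a two-clock latency. A 6502 load cannot reach the result registers until several cycles after the final operand store, so no wait states are required. Because the unit multiplies continuously, the result is only meaningful after all operand bytes have been written.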
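The register map can also be expressed as a small C model. This is useful for checking 6502 logic on a host before running it on the FPGA. The model is a hypothetical sketch: `copro_regs`, `copro_write`, `copro_read` and `copro_write32` are illustrative names, and the model ignores the two-clock latency. Masking SHIFT to 6 bits is an assumption about what happens when a value above 63 is written.

```c
#include <stdint.h>

#define COPRO_BASE 0x88B0u  /* base address from the register table */

/* Hypothetical host-side model of the register block (not the HDL). */
typedef struct {
    uint8_t a[4];   /* operand A, little-endian */
    uint8_t b[4];   /* operand B, little-endian */
    uint8_t shift;  /* SHIFT, 0..63 */
} copro_regs;

/* Assemble 4 little-endian bytes into a signed 32-bit value without an
   implementation-defined unsigned-to-signed conversion. */
static int32_t le32_signed(const uint8_t p[4])
{
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return u <= 0x7FFFFFFFu ? (int32_t)u
                            : (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Arithmetic right shift that does not depend on >> of negative values. */
static int64_t asr64(int64_t x, unsigned shift)
{
    return x < 0 ? ~(~x >> shift) : x >> shift;
}

/* CPU write: bytes land in A, B or SHIFT; other addresses are ignored. */
static void copro_write(copro_regs *r, uint16_t addr, uint8_t v)
{
    unsigned off = (unsigned)addr - COPRO_BASE;
    if (off < 4)
        r->a[off] = v;
    else if (off < 8)
        r->b[off - 4] = v;
    else if (off == 12)
        r->shift = v & 63u;  /* assumption: only 6 bits are kept */
}

/* CPU read: $88B0..$88B7 raw product, $88B8..$88BB shifted result,
   $88BC SHIFT. Always returns the settled value (latency not modeled). */
static uint8_t copro_read(const copro_regs *r, uint16_t addr)
{
    unsigned off = (unsigned)addr - COPRO_BASE;
    int64_t p = (int64_t)le32_signed(r->a) * le32_signed(r->b);
    if (off < 8)
        return (uint8_t)((uint64_t)p >> (8 * off));
    if (off < 12)
        return (uint8_t)((uint64_t)asr64(p, r->shift) >> (8 * (off - 8)));
    if (off == 12)
        return r->shift;
    return 0;  /* outside the block in this model */
}

/* Store a 32-bit operand least significant byte first, as four sta would. */
static void copro_write32(copro_regs *r, uint16_t addr, uint32_t v)
{
    for (unsigned i = 0; i < 4; i++)
        copro_write(r, (uint16_t)(addr + i), (uint8_t)(v >> (8 * i)));
}
```

Suppose you write SHIFT = 24, then write the operands $02000000 and $03000000. Reading $88B8–$88BB then returns $00, $00, $00, $06, least significant byte first. The raw product $0006000000000000 shows up as $06 at $88B6.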
Using it from the 6502
```asm
MUL       = $88B0
MUL_A     = MUL+0       ; operand A (4 bytes)
MUL_B     = MUL+4       ; operand B (4 bytes)
MUL_RES   = MUL+8       ; result (4 bytes)
MUL_SHIFT = MUL+12      ; SHIFT register (1 byte)

        lda #24         ; once: select 8.24
        sta MUL_SHIFT
        ; write A, write B, read result
        ; ...
```
The self-test copro_selftest.s computes 2.0 × 3.0 in 8.24 and checks for $06000000 (6.0). It fills the screen green on success and red on failure.
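The result is read little-endian, so $06000000 appears at $88B8–$88BB as $00, $00, $00, $06. A byte-wise comparison must therefore look for $06 at $88BB. The sketch below expresses that pass/fail decision in C. The function names are hypothetical, and this is not the contents of copro_selftest.s.

```c
#include <stdint.h>

/* Split a 32-bit register value into the order the 6502 reads it
   from $88B8..$88BB (least significant byte first). */
static void le_bytes(uint32_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Expected self-test value: 2.0 * 3.0 in 8.24. Both operands are positive,
   so a plain unsigned shift is exact here. */
static uint32_t selftest_expected(void)
{
    uint64_t a = 2ull << 24;   /* 2.0 in 8.24 */
    uint64_t b = 3ull << 24;   /* 3.0 in 8.24 */
    return (uint32_t)((a * b) >> 24);
}

/* Hypothetical pass/fail decision: 1 = green screen, 0 = red screen. */
static int selftest_passed(const uint8_t got[4])
{
    uint8_t want[4];
    le_bytes(selftest_expected(), want);
    for (int i = 0; i < 4; i++)
        if (got[i] != want[i])
            return 0;
    return 1;
}
```

Passing the bytes in big-endian order ({$06, $00, $00, $00}) fails the check. That is exactly the kind of byte-order slip the self-test is meant to catch.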