advgpcgen
advgpcgen is a tool that generates Generalized Parallel Counters (GPCs) from scratch. GPCs serve as the core of multi-input adders in Xilinx FPGAs.
Multi-operand addition is used in almost all arithmetic operations, starting with multiplication and multiply-accumulate. In ASICs, building multiplier trees from full adders as the basic elements is a long-established method. However, full adders do not map well onto the LUTs and carry logic of FPGAs, so this approach is not always efficient there. For this reason, methods have been proposed that use extended adders as basic elements instead: adders with six inputs and three outputs, or adders whose inputs carry weights other than 1 (2, 4, 8, ...). Such extended adders are called Generalized Parallel Counters (GPCs), and an adder tree built from GPCs is called a compressor tree.
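To make the idea of weighted inputs concrete, the following Python sketch models a GPC as a plain function: column i holds some number of input bits of weight 2^i, and the output is their weighted sum. The names `gpc_sum` and `output_width` and the LSB-first column order are assumptions chosen for illustration; they are not part of advgpcgen.

```python
def gpc_sum(columns):
    """Weighted sum of input bits; columns[i] holds the bits of weight 2**i."""
    return sum(sum(bits) << i for i, bits in enumerate(columns))


def output_width(shape):
    """Number of output bits needed when column i has shape[i] input bits."""
    max_sum = sum(count << i for i, count in enumerate(shape))
    return max(1, max_sum.bit_length())


# A full adder: three bits of weight 1, two output bits.
print(gpc_sum([[1, 1, 1]]), output_width([3]))  # 3 2
# Six inputs of weight 1 need three output bits.
print(output_width([6]))                        # 3
# Shape [4, 3, 3, 1] (LSB first): 4*1 + 3*2 + 3*4 + 1*8 = 30, which fits in 5 bits.
print(output_width([4, 3, 3, 1]))               # 5
```

The last case has the same shape as the `gpc1334` example used later in this document, whose generated module has a 5-bit `dst` output.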
A GPC is represented as follows:
For example, a full adder is represented as
So far, three types of GPCs are known to be implementable in a single slice:
In this project, five new GPCs that are implementable in a single slice have been discovered:
The Verilog HDL implementations of the GPCs are located in the hdl branch.
They require LUT1~5, LUT6_2, and CARRY4 modules.
cargo build --release
solve <shape>
- Determines whether a GPC exists for the specified input shape and, if so, prints it as JSON.
$ cargo run --release --bin solve 1334 > gpc1334.json
Finished `release` profile [optimized] target(s) in 0.01s
Running `target/release/solve 1334`
$ cat gpc1334.json
{"shape":[4,3,3,1],"lut":[[[1,2,3],null,644245094496],[[1,2,4,5,6],null,8685059358021126272],[[4,5,6,8,9],7,1722882046844934120],[[4,5,6,8,9],10,6500312741898240]],"cin":0}
$
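The JSON printed by solve can be inspected with a few lines of Python. This is only a sketch: the field meanings are not documented in this section, and splitting the third element of each "lut" entry into two 32-bit halves is an assumption. It is motivated by the observation that the lower half of the first entry (0x60) matches the `.INIT(8'h60)` of `lut3_gene0` in the generated Verilog shown further down. `split_init` is a hypothetical helper name.

```python
import json

# Output of `solve 1334`, copied from the listing above.
text = (
    '{"shape":[4,3,3,1],"lut":[[[1,2,3],null,644245094496],'
    '[[1,2,4,5,6],null,8685059358021126272],'
    '[[4,5,6,8,9],7,1722882046844934120],'
    '[[4,5,6,8,9],10,6500312741898240]],"cin":0}'
)


def split_init(value):
    """Split a 64-bit integer into (upper 32 bits, lower 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


gpc = json.loads(text)
print("shape:", gpc["shape"], "cin:", gpc["cin"])
for inputs, middle, init in gpc["lut"]:
    upper, lower = split_init(init)
    # `middle` is printed as-is; its meaning is not interpreted here.
    print(f"inputs={inputs} middle={middle} upper=0x{upper:08x} lower=0x{lower:08x}")
```

For the first entry this prints `upper=0x00000096 lower=0x00000060`.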
enum <width>
- Enumerates all GPCs of the specified width.
- The default width is 4.
$ cargo run --release --bin enum 2
...
[4, 7] total over
[5, 7] total over
[6, 7] total over
[7, 7] total over
max_feasibles
{"shape":[7,0],"lut":[[[2,3,4,5,6],1,7608434000728254870],[[2,3,4,5,6],null,1692930048736133120]],"cin":0}
{"shape":[5,1],"lut":[[[1,2,3,4],null,116092966049280],[[1,2,3,5],null,26285199910912]],"cin":0}
{"shape":[3,2],"lut":[[[1,2],null,25769803784],[[3,4],null,25769803784]],"cin":0}
{"shape":[1,3],"lut":[[[1],null,2],[[2,3],null,25769803784]],"cin":0}
min_infeasibles
$
script/codegen.py <JSON>
- Generates a Verilog HDL module and testbench from JSON (generated by enum or solve).
- The testbench tries every input bit pattern.
- GPCs require LUT1~5, LUT6_2 and CARRY4 modules.
$ script/codegen.py gpc1334.json
module gpc1334_5(input [3:0] src0, input [2:0] src1, input [2:0] src2, input [0:0] src3, output [4:0] dst);
wire [3:0] gene;
wire [3:0] prop;
wire [3:0] out;
wire [3:0] carryout;
LUT3 #(
.INIT(8'h60)
) lut3_gene0(
.O(gene[0]),
.I0(src0[1]),
...
#1
{src3[0], src2[2], src2[1], src2[0], src1[2], src1[1], src1[0], src0[3], src0[2], src0[1], src0[0]} <= 11'h7fd;
#1
{src3[0], src2[2], src2[1], src2[0], src1[2], src1[1], src1[0], src0[3], src0[2], src0[1], src0[0]} <= 11'h7fe;
#1
{src3[0], src2[2], src2[1], src2[0], src1[2], src1[1], src1[0], src0[3], src0[2], src0[1], src0[0]} <= 11'h7ff;
#1
$finish();
end
endmodule
$
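The exhaustive sweep performed by the testbench can be mirrored by a small software reference model. The sketch below assumes the bit order of the concatenation shown above (src0[0] is the least significant bit of the pattern) and that the expected `dst` is the weighted sum of the input bits. `expected_dst` is a hypothetical helper; the generated testbench may check results differently.

```python
SHAPE = [4, 3, 3, 1]  # widths of src0..src3, lowest-weight column first


def expected_dst(pattern, shape):
    """Reference result for one input pattern; bit 0 of pattern is src0[0]."""
    total, pos = 0, 0
    for weight, width in enumerate(shape):
        # Extract this column's bits and count how many are set.
        column = (pattern >> pos) & ((1 << width) - 1)
        total += bin(column).count("1") << weight
        pos += width
    return total


n_bits = sum(SHAPE)
results = [expected_dst(p, SHAPE) for p in range(1 << n_bits)]
print(n_bits, len(results), hex(len(results) - 1), max(results))  # 11 2048 0x7ff 30
```

The final pattern, 0x7ff, is the same as the last stimulus `11'h7ff` in the testbench, and the largest expected value, 30, fits in the 5-bit `dst` output.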
- Mugi Noda
- GPLv3