CSE467: Advanced Logic Design

Carl Ebeling, Winter 1999


Homework 3

Distributed: Monday Feb 8 - Due Wednesday Feb 17 

In this assignment, you will learn how to use the "sis" program to synthesize multi-level logic. The first part of the assignment is in tutorial form. The second part of the assignment asks you to synthesize a couple of circuits. You may kibitz with others while doing this assignment, as long as you do it all yourself and turn in your own work.

sis (a descendant of misII) is a program from Berkeley for performing multi-level logic synthesis. The algorithms in sis underlie many of the tools used in industry today, including FPGA Express, the Synopsys synthesis tool that we are using.

To use sis, you will have to log in to one of the instructional alphas. First source the file /cse/courses/cse467/sis/sis.cshrc.alpha. This will add sis to your path. The .../cse467/sis/ directory contains a number of example files that you will use in this assignment, plus a number of scripts. Copy these files to your own directory.

Start up sis by typing sis on the command line. Now read in the first example equation by typing:

sis> read_eqn ex1.eqn
This reads in an equation from the file ex1.eqn. The print command prints the circuit currently being worked on as a DAG. Type:
sis> print
This will print out the equation you just read in. The print_stats command prints the statistics for the current circuit. Type:
sis> print_stats
This will give the number of inputs, outputs and nodes in the circuit. Since we are not using sis for sequential circuits, there will always be 0 latches. Also given is the number of literals, which is roughly the cost of the circuit when using standard CMOS library cells. The rule of thumb is that you can determine the number of Xilinx 4000 CLBs you'll need by dividing this number by 6.
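
The literal count rewards factoring. As a toy illustration of the idea (a hypothetical Python sketch, not part of sis), the sum-of-products form ab + ac + ad costs 6 literals, while the factored form a(b + c + d) costs only 4:

```python
# Each appearance of a variable (or its complement) counts as one
# literal.  Cubes are written as lists of literal names.

def literal_count(cubes):
    """Count literals in a sum-of-products given as a list of cubes."""
    return sum(len(cube) for cube in cubes)

# f = ab + ac + ad in sum-of-products form: 6 literals.
print(literal_count([["a", "b"], ["a", "c"], ["a", "d"]]))  # -> 6

# Factored as a(b + c + d): the single 'a' plus b, c, d gives 4.
print(literal_count([["a"]]) + literal_count([["b"], ["c"], ["d"]]))  # -> 4
```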

You should use these two commands after performing a transformation to see what the transformation has done to the circuit. The first transformation command we will try is decomp. This decomposes a function by factoring it. The decomp command decomposes each equation in the circuit separately. Type:

sis> decomp
If you print the result, you will see that the equation has been factored so that the DAG now has two nodes. It turns out that this circuit is no cheaper than the first one, but it's a start.

Now read in the equation file ex2.eqn. This will replace the circuit you were just using.

sis> read_eqn ex2.eqn
This file contains two equations. First use the decomp command - notice that this factors the two equations separately. That is, it does not find common factors. The two commands gkx and gcx are used to find common factors when there is more than one output. The first, gkx, finds common kernel factors. Now read in the file again, or use the collapse command, which converts each output equation to a single node:
sis> collapse
Then type:
sis> gkx -ab
(Documentation on the commands can be found at the end of this assignment.)
Notice that this finds the common factor (e + f). The second command is gcx, which finds common cube factors. Type:
sis> gcx -b
Notice that this finds the common factor abx.
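
The common cube that gcx looks for is the largest product term dividing every cube. As a toy illustration (hypothetical Python, not sis code), that is simply the intersection of the cubes' literal sets:

```python
# A cube divides another cube exactly when its literals are a subset
# of the other's, so the largest common cube is the intersection.

def common_cube(cubes):
    """Return the set of literals shared by every cube."""
    shared = set(cubes[0])
    for cube in cubes[1:]:
        shared &= set(cube)
    return shared

# The cubes abxc and abxd share the cube abx.
print(sorted(common_cube([["a", "b", "x", "c"], ["a", "b", "x", "d"]])))
# -> ['a', 'b', 'x']
```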

Now read in the next example from ex3.eqn. This also contains two functions. First use the decomp command. Notice that the two output equations are factored, but separately. Now use collapse, or read in the file again. Then type:

sis> decomp f1
which factors the first equation only. Next use the resub command by typing:
sis> resub -a
resub performs resubstitution; that is, it tries to factor equations using factors that are already in the DAG. In this case, (b + f) is a factor that can be used in f2. The resub command is used after factoring with gkx, gcx, or decomp to find cases where a common factor might be useful.
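
The substitution step amounts to algebraic division: rewrite a function as q*d + r, where d is an existing node. A minimal sketch of the idea (hypothetical Python, illustrating the concept rather than sis's actual implementation), dividing ab + af by (b + f):

```python
# Algebraic division of a sum-of-products f by a divisor d.  Cubes
# are represented as frozensets of literals; f and d are lists of cubes.

def algebraic_divide(f, d):
    """Return (quotient, remainder) with f = quotient * d + remainder."""
    f = [frozenset(c) for c in f]
    d = [frozenset(c) for c in d]
    quotient = None
    for dc in d:
        # Candidate quotient cubes contributed by this divisor cube.
        cand = {c - dc for c in f if dc <= c}
        quotient = cand if quotient is None else quotient & cand
    quotient = quotient or set()
    covered = {q | dc for q in quotient for dc in d}
    remainder = [c for c in f if c not in covered]
    return quotient, remainder

# f2 = ab + af divided by (b + f) gives quotient a and remainder 0,
# so f2 = a * (b + f).
q, r = algebraic_divide([["a", "b"], ["a", "f"]], [["b"], ["f"]])
print(sorted("".join(sorted(c)) for c in q), r)  # -> ['a'] []
```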

Now use the collapse command to get back to the original functions and then type:

sis> fx
which combines kernel and cube factoring. Notice how the result differs from the previous results.

Finally, the eliminate command allows you to partially collapse a network by eliminating nodes that don't help the cost much. For example,

sis> eliminate 2
eliminates nodes that do not reduce the cost of the network by more than 2. We often run eliminate 0 to get rid of nodes that don't help at all.  If you look at the end of this assignment, you will see some "scripts" that are popular for synthesizing circuits automatically.  These scripts show you how the commands are typically used.  However, you can often do better by applying the transformations by hand.


PROBLEM 1

Find an implementation of the circuit in ex3.eqn that has a literal cost of at most 24. Turn in the result.

Now read in the file count.pla:

sis> read_pla count.pla
This circuit has 8 inputs and 2 outputs. The first output is asserted if 3 of the 8 inputs are on, and the second output is asserted if 2 of the 8 inputs are on. You'll recognize this circuit - we already have an implementation that uses only 14 3-LUTs (if I remember right). Now you'll use the synthesis tools to see how good a result you can get. Before you start, take a look at the three scripts at the end of this assignment. These are the scripts most often used to synthesize circuits. You can execute a script by typing, for example:
sis> source script.algebraic
The last script is copied from the book and is used to generate a LUT mapping for a circuit that has already been optimized.
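
Before optimizing, it can help to have a behavioral reference for count.pla. Here is a hypothetical Python model (not part of the assignment files), assuming "3 of the 8 inputs are on" means exactly 3, and likewise exactly 2 - check the .pla file itself if in doubt:

```python
# Reference model for the assumed behavior of count.pla: out1 is
# asserted when exactly 3 of the 8 inputs are on, out2 when exactly 2.

def count_outputs(bits):
    """bits: a sequence of eight 0/1 values.  Returns (out1, out2)."""
    n = sum(bits)
    return int(n == 3), int(n == 2)

print(count_outputs([1, 1, 1, 0, 0, 0, 0, 0]))  # -> (1, 0)
print(count_outputs([1, 0, 1, 0, 0, 0, 0, 0]))  # -> (0, 1)
```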


PROBLEM 2

Read in the count.pla circuit and generate an implementation that uses the smallest number of 3-LUTs. Hand in the result. (35 3-LUTs is a reasonable target to shoot for - can you do better?) 

PROBLEM 3

Read in the count.pla circuit and generate a Xilinx 4000 implementation that uses the smallest number of CLBs. Remember that a CLB can implement one 5-input function or two 4-input functions, and that there is no restriction on sharing inputs as in the Xilinx 3000. Hand in the result. (12 CLBs is a good target to shoot for - can you do better?) 
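
A back-of-the-envelope CLB estimate follows from that capacity rule: each 5-input function needs its own CLB, while functions of 4 or fewer inputs can be paired. A hypothetical greedy sketch in Python (ignoring routing and placement constraints, which the real tools must also satisfy):

```python
# Rough CLB estimate for the 4000 series: one 5-input function per
# CLB, or two functions of <= 4 inputs packed together.

def estimate_clbs(fanins):
    """fanins: list of input counts, each assumed <= 5."""
    fives = sum(1 for n in fanins if n == 5)
    smalls = sum(1 for n in fanins if n <= 4)
    return fives + (smalls + 1) // 2  # pair up the small functions

# One 5-LUT plus three smaller functions: 1 + ceil(3/2) = 3 CLBs.
print(estimate_clbs([5, 4, 4, 3]))  # -> 3
```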

PROBLEM 4

Now use the Xilinx Foundation tools to implement this circuit. (You can copy the project for this from the .../cse467/Winter99/Examples folder.) How many CLBs are used? 

PROBLEM 5

What conclusions do you draw from this experiment with the count circuit? 

Synthesis Scripts

script.algebraic

This script uses algebraic division.
sweep                    # Eliminate inverters and constants
eliminate 5              # Eliminate nodes that don't help much
simplify -m nocomp -d    # Apply 2-level logic minimization to nodes
resub -a                 # Try to use nodes as factors of other nodes

gkx -abt 30              # Find really helpful common kernel factors
resub -a; sweep
gcx -bt 30               # Find really helpful common cube factors
resub -a; sweep

gkx -abt 10              # Find helpful kernel factors
resub -a; sweep
gcx -bt 10               # Find helpful cube factors
resub -a; sweep

gkx -ab                  # Find all kernel factors
resub -a; sweep
gcx -b                   # Find all cube factors
resub -a; sweep

eliminate 0              # Eliminate nodes that don't help
decomp -g *              # Finish by factoring the nodes

script.boolean

This script uses Boolean division - don't try this on really large circuits!
sweep; eliminate -1
simplify 
eliminate -1

sweep; eliminate 5
simplify 

resub -a

gkx -abt 30
resub -a; sweep
gcx -bt 30
resub -a; sweep

gkx -abt 10
resub -a; sweep
gcx -bt 10
resub -a; sweep

gkx -ab
resub -a; sweep
gcx -b
resub -a; sweep

eliminate 0
decomp -g *

eliminate -1; sweep

script.rugged

This is the most popular script.
sweep; eliminate -1
simplify -m nocomp
eliminate -1

sweep; eliminate 5
simplify -m nocomp
resub -a

fx
resub -a; sweep

eliminate -1; sweep
full_simplify -m nocomp

script.xilinx

Script for mapping to FPGAs, where 5 is the maximum size LUT available. (Copied from the text.)
xl_split -n 5       # Factor nodes so that none has more than 5 inputs
sweep
simplify
xl_partition -n 5   # Collapse nodes into fanouts, making sure none has > 5 inputs
sweep
simplify
xl_partition -n 5
sweep
xl_k_decomp -n 5    # Roth-Karp decomposition
sweep

Synthesis command documentation

decomp [-gqd] [node-list]
 
     Decompose all the nodes in the node-list.  If the node-list is not
     specified, all the nodes in the current network will be decomposed.
     Decomposition will factor nodes and make the divisor a new node within
     the network, re-expressing other nodes in terms of this newly introduced
     node.  It is one of the transforms used to break down large functions
     into smaller pieces, usually at the cost of introducing a few more
     literals.
 
     If the -q option (the default) is specified, the quick decomp algorithm
     is used, which successively extracts out an arbitrary kernel.  Because
     of the fast algorithm for generating an arbitrary kernel, decomp -q is
     very fast compared with decomp -g.  In most cases, the result is very
     close.  This command is recommended in the early phase of the
     optimization.
 
     If the -g option is specified, the good decomp algorithm is used which
     successively extracts out the best kernel until the function is factor
     free, and applies the same algorithm to all the kernels just extracted.
     This operation will give the best algebraic decomposition for the nodes.
     But, since it generates all the kernels at each step, it takes more CPU
     time.  In general, decomp -q should be used in the early stage of the
     optimization. Only at the end of the optimization, should decomp -g be
     used.
 
     If the -d option is specified, the disjoint decomposition is performed.
     Currently, the disjoint decomposition is limited to the following simple
     algorithm:  It partitions the cubes into sets of cubes having disjoint
     variable support, creates one node for each partition, and a node (the
     root of the decomposition) which is the OR of all the partitions.
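
The simple algorithm described for -d can be sketched directly (hypothetical Python, not sis code): merge cubes into groups whenever their variable supports overlap, transitively; each group becomes one node, and the root ORs the groups together.

```python
# Partition cubes into groups with pairwise-disjoint variable support.

def disjoint_partitions(cubes):
    groups = []  # each entry: (support_set, cubes_in_group)
    for cube in cubes:
        support = set(cube)
        merged = [cube]
        kept = []
        for sup, cs in groups:
            if sup & support:          # supports overlap: merge groups
                support |= sup
                merged = cs + merged
            else:
                kept.append((sup, cs))
        groups = kept + [(support, merged)]
    return [cs for _, cs in groups]

# ab and bc share the variable b, so they form one group; de has
# disjoint support and forms its own.
print(disjoint_partitions([["a", "b"], ["b", "c"], ["d", "e"]]))
# -> [[['a', 'b'], ['b', 'c']], [['d', 'e']]]
```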

gcx [-bcdf] [-t threshold]
 
     Extract common cubes from a network, re-express the network in terms of
     these cubes, and in the process cut down on the number of literals
     needed in the network.
 
     The -b option chooses the best cube at each step when examining possible
     cubes to be extracted; otherwise, the more efficient ping-pong algorithm
     is used to find a good (but not necessarily the best) cube at each step.
 
     The -c option uses the complement of each cube as well as the cube when
     dividing the new cube into the network.
 
     The -f option uses the number of literals in the factored form for the
     network as a cost function for determining the best cube to be
     extracted.
 
     The -t option sets a threshold such that only a cube with a value
     greater than the threshold will be extracted.  By default, the threshold
     is 0, so that all possible cube divisors are extracted.
 
     The -d option enables a debugging mode which traces the execution of
     gcx.
 
gkx [-1abcdfo] [-t threshold]
 
     Extract multiple-cube common divisors from the network.
 
     The -a option generates all kernels of all functions in the network
     when building the kernel-intersection table.  By default, only level-0
     kernels are used.
 
     The -b option chooses the best kernel intersection as the new factor at
     each step of the algorithm; this is done by enumerating and considering
     each possible kernel intersection, and choosing the best.  By default,
     the more efficient ping-pong algorithm is used to find a good (but not
     necessarily the best) kernel intersection.
 
     The -c option uses the new factor and its complement when attempting to
     introduce the new factor into the network.
 
     The -d option enables debugging information which traces the execution
     of the kernel extract algorithm.
 
     The -f option uses the number of literals in the factored form for the
     network as the cost function when determining the value of a kernel
     intersection; by default, the number of literals in the sum-of-products
     form for the network is used.
 
     The -o option allows for overlapping factors.
 
     The -t option sets a threshold such that divisors are extracted only
     while their value exceeds the threshold.  By default, the threshold is 0
     so that all possible multiple-cube divisors are extracted from the net-
     work.
 
     The -1 option performs only a single pass over the network.  By default,
     the kernel extract algorithm is iterated while there are still divisors
     whose value exceeds the given threshold.
 
collapse [n1] [n2]
 
     Collapse nodes in the network.  With no arguments, the entire network is
     collapsed into a single-level of functions (i.e., two-level form). Each
     output will be expressed in terms of the primary inputs.
 
     Given a single node, that function is collapsed until it is represented
     entirely in terms of primary inputs.
 
     Given two arguments, it is assumed that the second node is a fanin of
     the first node.  In this case, this dependency is removed (the first
     node is expressed without the second node as a fanin).
 
     Please note that this command negates any mapping that may have been
     done at an earlier time.
 
     Caution should be taken when collapsing a network to two levels because
     the two level representation may be too large.  The alternative is to
     use eliminate (selective collapse).  Refer to eliminate for the details.
 
resub [-a] [-b] [-d] [node-list]
 
     Resubstitute each node in the node-list into all the nodes in the net-
     work.  The resubstitution will try to use both the positive and negative
     phase of the node.  If node-list is not specified, the resubstitution
     will be done for every node in the network and this operation will keep
     looping until no more changes of the network can be made.  Note the
     difference between resub * and resub.  The former will apply the resub-
     stitution to each node only once.
 
     The -a (default) option uses algebraic division when substituting one
     node into another.  The division is performed on both the divisor and
     its complement.
 
     The -b option uses Boolean division when substituting one node into
     another.  NOTE: Boolean resubstitution has not yet been implemented.
 
     The -d option directs resub not to use the complement of a given node in
     algebraic resubstitutions.
 
simplify [-d][-i num[:num]] [-m method] [-f filter] [node-list]
 
     Simplify each node in the node-list using method with the don't-care set
     generated according to dctype.
 
     method specifies the algorithm used to minimize the nodes.  snocomp
     (default) invokes a single pass minimization procedure that does not
     compute the complete offset.  nocomp invokes the full minimization pro-
     cedure (ala ESPRESSO) but does not compute the complete offset.  dcsimp
     invokes a single-pass tautology-based minimizer.
 
     dctype specifies how the don't-care set is generated.  The default don't
     care set generated is a subset of the fanin don't care set.  -d option
     specifies that no don't care set is used.  -i m:n specifies that the
     fanin don't cares of nodes within m levels of transitive fanin and n
     levels of transitive fanout of these transitive fanin are to be gen-
     erated.
 
     filter specifies how the don't-care set is filtered.  exact (default)
     uses the exact filter.  disj_sup uses the disjoint support filter.
 
     Note that a node function is replaced with the simplified version if the
     new function has fewer literals in factored form.  In the case of a tie,
     the node function is replaced if the new function has fewer literals in
     sum-of-products form.
 
fx [-o] [-b limit] [-l] [-z]
 
     Greedy concurrent algorithm for finding the best double cube divisors
     and single cube divisors.  Finds all the double cube and single cube
     divisors of the nodes in the network.  It associates a cost function to
     each node, and extracts the node with the best cost function greedily.
 
     The -o option only looks for  0-level two-cube divisors.
 
     The -b option reads an upper bound for the number of divisors generated.
     The default value is 50000. This is because the number of divisors in
     some cases can grow very large.
 
     The -l option changes the level of each node in the network as allowed
     by the slack between the required time and arrival time at that node.
 
     The -z option uses zero-weight divisors (in addition to divisors with a
     larger weight).  This means that divisors that contribute zero gain to
     the overall decomposition are extracted.  This may result in an overall
     better decomposition, but take an exorbitant amount of time.
 
xl_split [-n value] [-d]
 
     Ensures that every node in the network has number of fanins less than or
     equal to value.  This is accomplished by using kernel extraction and
     AND-OR decomposition.  -d turns on debugging facilities.
 
 
xl_partition [-n support] [-M MAX_FANINS] [-v verbosity_level] [-tm]
 
     Tries to reduce the number of nodes by collapsing nodes into their
     fanouts. Also takes into account extra nets created. Collapses a node
     into its fanout only if the resulting fanout is feasible.  Associates a
     cost with each (node, fanout) pair which reflects the extra nets created
     if node is collapsed into the fanout. It then selects pairs with lowest
     costs and collapses them.  The starting network need not be feasible, in
     which case the resulting network also may not be. But if the starting
     network is, so is the resulting network.
 
     -n: support is the size of the TLU block (default = 5)
 
     -t: Our experience was that if one is just interested in minimization of
     number of nodes, then those nodes which can be collapsed into all their
     fanouts are the main reduction-contributors.  This option removes a node
     from the network if it can be collapsed into all its fanouts. Very fast.
 
     -m: move fanins around to increase collapsing possibilities. Do so for a
     node only if after collapsing, it has at most MAX_FANINS (default =
     15)(as specified by -M option).
 
     -v: this sets the verbosity level (amount of information printed as the
     algorithm proceeds) to verbosity_level.
 
xl_k_decomp [-n support] [-p node_name] [-v verbosity_level] [-f MAX_FANINS_K_DECOMP] [-de]
 
     Uses Roth-Karp disjoint decomposition to recursively decompose nodes of
     the network having fanin greater than support to obtain nodes each
     having fanin of at most support.  If -p node_name is specified, only
     the node with the name node_name is decomposed.  Otherwise, all the
     nodes that have fanin greater than support are decomposed.  If the -d
     option is specified, then if k_decomp fails to find a disjoint
     decomposition on a node, the node is not decomposed by cube-packing.
     Option -e allows an exhaustive search over all possible partitions to
     pick the best decomposition of a node.  The option -f
     MAX_FANINS_K_DECOMP sets the limit on the maximum number of fanins of
     a node for exhaustive decomposition.  If the number of fanins is
     higher, only the first input partition is considered.

ebeling@cs.washington.edu