In this assignment, you will learn how to use the "sis" program to synthesize multi-level logic. The first part of the assignment is in tutorial form. The second part asks you to synthesize a couple of circuits. You may kibitz with others while doing this assignment, as long as you do all the work yourself and turn in your own work.
sis (a descendant of misII) is a program from Berkeley for performing multi-level logic synthesis. The algorithms in sis underlie many of the tools used in industry today, including FPGA Express, the Synopsys synthesis tool that we are using.
To use sis, you will have to log in to one of the instructional alphas. First source the file /cse/courses/cse467/sis/sis.cshrc.alpha; this will add sis to your path. The .../cse467/sis/ directory contains a number of example files that you will use in this assignment, plus a number of scripts. Copy these files to your own directory.
Start up sis by typing sis on the command line. Now read in the first example equation by typing:
sis> read_eqn ex1.eqn
This reads in an equation from the file ex1.eqn. The print command prints the circuit currently being worked on as a DAG. Type:
sis> print
This will print out the equation you just read in. The print_stats command prints the statistics for the current circuit. Type:
sis> print_stats
This will give the number of inputs, outputs, and nodes in the circuit. Since we are not using sis for sequential circuits, there will always be 0 latches. Also given is the number of literals, which is roughly the cost of the circuit when using standard CMOS library cells. The rule of thumb is that you can estimate the number of Xilinx 4000 CLBs you'll need by dividing this number by 6.
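As a sanity check on what print_stats reports, here is a rough sketch of the literal count and the CLB rule of thumb above. The equation is a made-up example, not the contents of ex1.eqn.

```python
# Rough literal count for a sum-of-products network, plus the handout's
# divide-by-6 Xilinx 4000 CLB rule of thumb. The equation is hypothetical.

def count_literals(sop):
    """Count literal occurrences in a sum-of-products expression,
    given as a list of cubes, each cube a list of literals."""
    return sum(len(cube) for cube in sop)

# f = a b + a c + a d'  ->  6 literals
f = [["a", "b"], ["a", "c"], ["a", "d'"]]
lits = count_literals(f)
print(lits)        # 6
print(lits / 6)    # rough CLB estimate, per the rule of thumb above
```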
You should use these two commands after performing a transformation to see what the transformation has done to the circuit. The first transformation command we will try is decomp. This decomposes a function by factoring it. The decomp command decomposes each equation in the circuit separately. Type:
sis> decomp
If you print the result, you will see that the equation has been factored so that the DAG now has two nodes. It turns out that this circuit is no cheaper than the first one, but it's a start.
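To get a feel for what factoring buys you, here is a toy single-literal factoring step in the spirit of decomp. Real sis extracts kernels; this is a much simpler illustration on a hypothetical equation, not the actual contents of ex1.eqn.

```python
# Toy factoring: pull out a literal that appears in every cube.
# f = a b + a c + a d' (6 literals) becomes a (b + c + d') (4 literals).

def factor_common_literal(sop):
    """Return (literal, remainder) if some literal appears in every cube,
    else None. The remainder is the cofactor sum-of-products."""
    common = set(sop[0])
    for cube in sop[1:]:
        common &= set(cube)
    if not common:
        return None
    lit = sorted(common)[0]
    remainder = [[l for l in cube if l != lit] for cube in sop]
    return lit, remainder

f = [["a", "b"], ["a", "c"], ["a", "d'"]]
lit, rest = factor_common_literal(f)
print(lit, rest)   # a [['b'], ['c'], ["d'"]]  ->  f = a (b + c + d')
```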
Now read in the equation file ex2.eqn; this will replace the circuit you were just using.
sis> read_eqn ex2.eqn
This file contains two equations. First use the decomp command; notice that it factors the two equations separately. That is, it does not find common factors. Two commands, gkx and gcx, are used to find common factors when there is more than one output. The first, gkx, finds common kernel factors. First read in the file again, or use the collapse command, which converts each output equation to a single node:
sis> collapse
Then type:
sis> gkx -ab
(Documentation on the commands can be found at the end of this assignment.) Next, extract common cube factors:
sis> gcx -b
Notice that this finds the common factor abx.
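A common cube like abx is just a set of literals shared by cubes of different functions. Here is a brute-force sketch of that search; the two equations are hypothetical stand-ins, not the actual contents of ex2.eqn.

```python
# gcx looks for a single cube that appears inside cubes of several
# functions. Brute force: intersect every pair of cubes and keep the
# largest intersection. (sis is far cleverer; this only shows the idea.)

from itertools import product

def best_common_cube(f, g):
    """Largest cube (set of literals) contained in some cube of f
    and some cube of g."""
    best = set()
    for cf, cg in product(f, g):
        common = set(cf) & set(cg)
        if len(common) > len(best):
            best = common
    return best

f1 = [["a", "b", "x", "c"], ["d"]]
f2 = [["a", "b", "x", "e"]]
print(sorted(best_common_cube(f1, f2)))   # ['a', 'b', 'x']  -> the cube abx
```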
Now read in the next example from ex3.eqn. This also contains two functions. First use the decomp command. Notice that the two output equations are factored, but separately. Now use collapse, or read in the file again. Now type:
sis> decomp f1
which factors the first equation only. Next use the resub command by typing:
sis> resub -a
resub performs resubstitution, that is, it tries to factor equations using factors that are already in the DAG. In this case, (b + f) is a factor that can be used in f2. The resub command is used after factoring with gkx, gcx, or decomp to find cases where a common factor might be useful.
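Resubstitution boils down to algebraic division: rewrite f2 as quotient * divisor + remainder, where the divisor is a node already in the network. A toy weak-division sketch, on hypothetical functions rather than the actual ex3.eqn contents:

```python
# Weak algebraic division of a sum-of-products f by a divisor d.
# Example: f2 = a b + a f + c divided by (b + f) gives quotient a,
# remainder c, i.e. f2 = a (b + f) + c.

def divide(f, d):
    """Return (quotient, remainder) of algebraic division of f by d,
    both given as lists of cubes (lists of literals)."""
    quotients = []
    for dcube in d:
        dset = set(dcube)
        # cubes of f containing dcube, with dcube's literals removed
        q = {frozenset(set(c) - dset) for c in f if dset <= set(c)}
        quotients.append(q)
    common = set.intersection(*quotients) if quotients else set()
    quotient = [sorted(c) for c in common]
    # cubes accounted for by quotient * divisor
    used = [set(qc) | set(dc) for qc in quotient for dc in d]
    remainder = [c for c in f if set(c) not in used]
    return quotient, remainder

f2 = [["a", "b"], ["a", "f"], ["c"]]
divisor = [["b"], ["f"]]   # the node (b + f) already in the network
q, r = divide(f2, divisor)
print(q, r)   # [['a']] [['c']]  ->  f2 = a (b + f) + c
```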
Now use the collapse command to get back to the original functions and then type:
sis> fx
a command that combines kernel and cube factoring. Notice how the result differs from the previous results.
Finally, the eliminate command allows you to partially collapse a network by eliminating nodes that don't help the cost much. For example,
sis> eliminate 2
This eliminates nodes that do not reduce the cost of the network by more than 2. We often run eliminate 0 to get rid of nodes that don't help at all. At the end of this assignment you will find some "scripts" that are popular for synthesizing circuits automatically. These scripts show you how the commands are typically used. However, you can often do better by applying the transformations by hand.
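One common way to score a node for elimination (an assumption on my part, not necessarily sis's exact formula) is by the literal savings of keeping it: a node with l literals and n fanouts costs l + n literals if kept (the function once, plus one literal per use), versus n copies of the function if collapsed.

```python
# Hypothetical node-value scoring in the spirit of eliminate:
# value = cost if collapsed - cost if kept = n*l - (n + l).
# Nodes with value at or below the threshold get collapsed.

def node_value(literals, fanouts):
    """Literal savings from keeping a node with the given factored-form
    literal count and fanout count."""
    return fanouts * literals - (fanouts + literals)

print(node_value(4, 1))   # 4 - 5 = -1: single-fanout node, never worth keeping
print(node_value(4, 3))   # 12 - 7 = 5: shared node, survives "eliminate 2"
```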
Now read in the file count.pla:
sis> read_pla count.pla
This circuit has 8 inputs and 2 outputs. The first output is asserted if 3 of the 8 inputs are on, and the second output is asserted if 2 of the 8 inputs are on. You'll recognize this circuit - we already have an implementation that uses only 14 3-LUTs (if I remember right). Now you'll use the synthesis tools to see how good a result you can get. Before you start, take a look at the three scripts at the end of this assignment. These are the scripts most often used to synthesize circuits. You can execute a script by typing, for example:
sis> source script.algebraic
The last script is copied from the book and is used to generate a LUT mapping for a circuit that has already been optimized.
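If you ever need to regenerate a truth-table PLA like count.pla, a few lines of Python will do it. This assumes "3 of the 8 inputs are on" means exactly 3 (my reading, not something the handout states explicitly), and uses the basic .i/.o/.e PLA format that read_pla accepts.

```python
# Generate a count.pla-style truth table: first output high when exactly
# 3 of the n inputs are 1, second output high when exactly 2 are 1.
# (The "exactly" interpretation is an assumption.)

from itertools import product

def make_count_pla(n=8):
    lines = [f".i {n}", ".o 2"]
    for bits in product("01", repeat=n):
        ones = bits.count("1")
        out = ("1" if ones == 3 else "0") + ("1" if ones == 2 else "0")
        lines.append("".join(bits) + " " + out)
    lines.append(".e")
    return "\n".join(lines)

pla = make_count_pla()
print(pla.splitlines()[2])   # first minterm row: 00000000 00
```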
sweep                   # Eliminate inverters and constants
eliminate 5             # Eliminate nodes that don't help much
simplify -m nocomp -d   # Apply 2-level logic minimization to nodes
resub -a                # Try to use nodes as factors of other nodes
gkx -abt 30             # Find really helpful common kernel factors
resub -a; sweep
gcx -bt 30              # Find really helpful common cube factors
resub -a; sweep
gkx -abt 10             # Find helpful kernel factors
resub -a; sweep
gcx -bt 10              # Find helpful cube factors
resub -a; sweep
gkx -ab                 # Find all kernel factors
resub -a; sweep
gcx -b                  # Find all cube factors
resub -a; sweep
eliminate 0             # Eliminate nodes that don't help
decomp -g *             # Finish by factoring the nodes
sweep; eliminate -1
simplify
eliminate -1
sweep; eliminate 5
simplify
resub -a
gkx -abt 30
resub -a; sweep
gcx -bt 30
resub -a; sweep
gkx -abt 10
resub -a; sweep
gcx -bt 10
resub -a; sweep
gkx -ab
resub -a; sweep
gcx -b
resub -a; sweep
eliminate 0
decomp -g *
eliminate -1; sweep
sweep; eliminate -1
simplify -m nocomp
eliminate -1
sweep; eliminate 5
simplify -m nocomp
resub -a
fx
resub -a; sweep
eliminate -1; sweep
full_simplify -m nocomp
xl_split -n 5       # Factor nodes so that none has more than 5 inputs
sweep
simplify
xl_partition -n 5   # Collapse nodes into fanouts, making sure none has > 5 inputs
sweep
simplify
xl_partition -n 5
sweep
xl_k_decomp -n 5    # Roth_Karp decomposition
sweep
decomp [-gqd] [node-list]
Decompose all the nodes in the node-list. If the node-list is not specified, all the nodes in the current network will be decomposed. Decomposition will factor nodes and make the divisor a new node within the network, re-expressing other nodes in terms of this newly introduced node. It is one of the transforms used to break down large functions into smaller pieces, usually at the cost of introducing a few more literals. If the -q option (the default) is specified, the quick decomp algorithm is used, which successively extracts an arbitrary kernel. Because of the fast algorithm for generating an arbitrary kernel, decomp -q is very fast compared with decomp -g. In most cases, the result is very close. This command is recommended in the early phase of the optimization. If the -g option is specified, the good decomp algorithm is used, which successively extracts the best kernel until the function is factor free, and applies the same algorithm to all the kernels just extracted. This operation gives the best algebraic decomposition for the nodes, but since it generates all the kernels at each step, it takes more CPU time. In general, decomp -q should be used in the early stage of the optimization; only at the end of the optimization should decomp -g be used. If the -d option is specified, disjoint decomposition is performed. Currently, disjoint decomposition is limited to the following simple algorithm: it partitions the cubes into sets of cubes having disjoint variable support, creates one node for each partition, and a node (the root of the decomposition) which is the OR of all the partitions.

gcx [-bcdf] [-t threshold]
Extract common cubes from a network, re-express the network in terms of these cubes, and in the process cut down on the number of literals needed in the network. The -b option chooses the best cube at each step when examining possible cubes to be extracted; otherwise, the more efficient ping-pong algorithm is used to find a good (but not necessarily the best) cube at each step. The -c option uses the complement of each cube as well as the cube when dividing the new cube into the network. The -f option uses the number of literals in the factored form of the network as the cost function for determining the best cube to be extracted. The -t option sets a threshold such that only a cube with a value greater than the threshold will be extracted; by default, the threshold is 0, so that all possible cube divisors are extracted. The -d option enables a debugging mode which traces the execution of gcx.

gkx [-1abcdfo] [-t threshold]
Extract multiple-cube common divisors from the network. The -a option generates all kernels of all functions in the network when building the kernel-intersection table; by default, only level-0 kernels are used. The -b option chooses the best kernel intersection as the new factor at each step of the algorithm; this is done by enumerating and considering each possible kernel intersection, and choosing the best. By default, the more efficient ping-pong algorithm is used to find a good (but not necessarily the best) kernel intersection. The -c option uses the new factor and its complement when attempting to introduce the new factor into the network. The -d option enables debugging information which traces the execution of the kernel extract algorithm. The -f option uses the number of literals in the factored form of the network as the cost function when determining the value of a kernel intersection; by default, the number of literals in the sum-of-products form of the network is used. The -o option allows overlapping factors. The -t option sets a threshold such that divisors are extracted only while their value exceeds the threshold; by default, the threshold is 0, so that all possible multiple-cube divisors are extracted from the network. The -1 option performs only a single pass over the network; by default, the kernel extract algorithm is iterated while there are still divisors whose value exceeds the given threshold.

collapse [n1] [n2]
Collapse nodes in the network. With no arguments, the entire network is collapsed into a single level of functions (i.e., two-level form); each output will be expressed in terms of the primary inputs. Given a single node, that function is collapsed until it is represented entirely in terms of primary inputs. Given two arguments, it is assumed that the second node is a fanin of the first node; in this case, this dependency is removed (the first node is expressed without the second node as a fanin). Please note that this command negates any mapping that may have been done at an earlier time. Caution should be taken when collapsing a network to two levels, because the two-level representation may be too large. The alternative is to use eliminate (selective collapse); refer to eliminate for the details.

resub [-a] [-b] [-d] [node-list]
Resubstitute each node in the node-list into all the nodes in the network. The resubstitution will try to use both the positive and negative phase of the node. If node-list is not specified, the resubstitution will be done for every node in the network, and this operation will keep looping until no more changes to the network can be made. Note the difference between resub * and resub: the former will apply the resubstitution to each node only once. The -a (default) option uses algebraic division when substituting one node into another; the division is performed on both the divisor and its complement. The -b option uses Boolean division when substituting one node into another. NOTE: Boolean resubstitution has not yet been implemented. The -d option directs resub not to use the complement of a given node in algebraic resubstitutions.

simplify [-d] [-i num[:num]] [-m method] [-f filter] [node-list]
Simplify each node in the node-list using method, with the don't-care set generated according to dctype. method specifies the algorithm used to minimize the nodes: snocomp (the default) invokes a single-pass minimization procedure that does not compute the complete offset; nocomp invokes the full minimization procedure (a la ESPRESSO) but does not compute the complete offset; dcsimp invokes a single-pass tautology-based minimizer. dctype specifies how the don't-care set is generated. The default don't-care set generated is a subset of the fanin don't-care set. The -d option specifies that no don't-care set is used. -i m:n specifies that the fanin don't cares of nodes within m levels of transitive fanin, and n levels of transitive fanout of these transitive fanin, are to be generated. filter specifies how the don't-care set is filtered: exact (the default) uses the exact filter; disj_sup uses the disjoint support filter. Note that a node function is replaced with the simplified version if the new function has fewer literals in factored form; in the case of a tie, the node function is replaced if the new function has fewer literals in sum-of-products form.

fx [-o] [-b limit] [-l] [-z]
Greedy concurrent algorithm for finding the best double-cube divisors and single-cube divisors. Finds all the double-cube and single-cube divisors of the nodes in the network, associates a cost function with each node, and greedily extracts the node with the best cost function. The -o option only looks for 0-level two-cube divisors. The -b option reads an upper bound for the number of divisors generated; the default value is 50000. This is because the number of divisors in some cases can grow very large. The -l option changes the level of each node in the network as allowed by the slack between the required time and arrival time at that node. The -z option uses zero-weight divisors (in addition to divisors with a larger weight). This means that divisors that contribute zero gain to the overall decomposition are extracted. This may result in an overall better decomposition, but may take an exorbitant amount of time.

xl_split [-n value] [-d]
Ensures that every node in the network has a number of fanins less than or equal to value. This is accomplished by using kernel extraction and AND-OR decomposition. -d turns on debugging facilities.

xl_partition [-n support] [-M MAX_FANINS] [-v verbosity_level] [-tm]
Tries to reduce the number of nodes by collapsing nodes into their fanouts, taking into account the extra nets created. Collapses a node into its fanout only if the resulting fanout is feasible. Associates a cost with each (node, fanout) pair which reflects the extra nets created if node is collapsed into the fanout; it then selects the pairs with the lowest costs and collapses them. The starting network need not be feasible, in which case the resulting network also may not be; but if the starting network is feasible, so is the resulting network. -n: support is the size of the TLU block (default = 5). -t: our experience was that if one is just interested in minimizing the number of nodes, then the nodes which can be collapsed into all their fanouts are the main reduction-contributors; this option removes a node from the network if it can be collapsed into all its fanouts. Very fast. -m: move fanins around to increase collapsing possibilities; do so for a node only if, after collapsing, it has at most MAX_FANINS (default = 15, as specified by the -M option). -v: sets the verbosity level (the amount of information printed as the algorithm proceeds) to verbosity_level.
xl_k_decomp [-n support] [-p node_name] [-v verbosity_level] [-f MAX_FANINS_K_DECOMP] [-de]
Uses Karp_Roth disjoint decomposition to recursively decompose nodes of the network having fanin greater than support, to obtain nodes each having fanin of at most support. If -p node_name is specified, only the node with the name node_name is decomposed; otherwise, all the nodes that have fanin greater than support are decomposed. If the -d option is specified, then when k_decomp fails to find a disjoint decomposition of a node, the node is not decomposed by cube-packing. The -e option allows an exhaustive search over all possible partitions to pick the best decomposition of a node; the -f MAX_FANINS_K_DECOMP option then sets the limit on the maximum number of fanins of a node for exhaustive decomposition. If the number of fanins is higher, only the first input partition is considered.