------------------------------------------------------
Memo on SimpleScalar 2.0 for cse586
------------------------------------------------------

00. Table of Contents

0. Preliminaries
0.1 Directories and Files
I. Introduction
II. Using SimpleScalar (basic)
III. Using SimpleScalar (advanced)
IV. SimpleScalar for Assignment 2

0. Preliminaries

The SimpleScalar home page is:

    http://www.cs.wisc.edu/~mscalar/simplescalar.html

I've downloaded a technical report that gives an overview of the system and placed it in the course directory at

    /cse/courses/cse586/01au/ss2.0/doc/TR_1342.ps

I'll refer to this technical report frequently in this memo.

I assume a working knowledge of Unix: familiarity with shells, input and output redirection, and the command-line environment. The examples here are in bash; users of other shells should translate. If there's large demand for a Unix tutorial I'll put one up.

0.1 Directories and Files

All the paths I reference here are relative to the course directory. This directory is visible from the instructional servers: fiji, ceylon, sumatra and tahiti, all off cs.washington.edu.

/cse/courses/cse586/01au/ --- root of course directory
    bin/    --- binaries, e.g. sim-outorder and 586-setup-bash
    bin-ss/ --- SimpleScalar binaries, e.g. li.ss
    etc/    --- configuration files, e.g. hw2.cfg
    src/    --- input files for the benchmarks, organized by program.
    ss2.0/  --- the SimpleScalar 2.0 tree. Contains cross-compiler and all tools.

I. Introduction

SimpleScalar is a processor simulator. That is, it is a program that runs on one platform (e.g., x86) and executes binaries for another processor (e.g., MIPS). A program run on the simulator should execute the "same" as a program executed on the simulator's target platform. The trick is to define what you mean by "same". The usual trade-off when writing a simulator is accuracy vs. speed.
If your criterion of sameness considers only program output, you can write a simple simulator that runs fast but cannot produce detailed statistics on, for example, cache or pipeline performance. On the other hand, if your criterion of sameness extends to how instructions move through the pipeline, the simulator can produce detailed pipeline statistics, at the cost of speed.

Because simulators are put to such varied uses, no single point in this trade-off suits everyone. Accordingly, the SimpleScalar tool set provides several simulators, each at a different point in the trade-off. The fastest, sim-fast, can execute instructions at 4 MHz, but guarantees nothing more than serial execution of the instructions. The slowest, sim-outorder, runs at 150 kHz, but simulates most aspects of a chip, including an out-of-order execution pipeline and branch prediction. An intermediate simulator, sim-cache, accurately simulates cache behavior but is not cycle accurate.

The simulators in SimpleScalar implement a MIPS-like instruction set and chip design. The instruction set is detailed in the technical report. One interesting quirk is a 64-bit instruction encoding that has 16 spare bits intended to be used for poke extensions; the simulator is meant to be used for heavy experimentation.

II. Using SimpleScalar (basic)

I've put sim-outorder, the simulator to be used with assignment 2, in the bin/ directory of the course directory (/cse/courses/cse586/01au). Also in the bin/ directory is 586-setup-bash. Sourcing this script sets up your environment; among other things, it sets the variable $CDIR to the root of the course directory. If there's demand, I'll put up a csh version as well.

Full documentation on running the simulator is found in the technical report. Simulator parameters are set by command-line flags. Because specifying long lists of flags gets tedious, the simulator can also read flags from a configuration file, using the -config switch.

You must also specify what program to execute on the simulator. This is done by specifying it last on the command line.
As is standard for this sort of thing, any flags to be passed to the program are appended to the command line after the program name. For example,

    sim-outorder bin-ss/perl.ss -e 'print "hello\n";'

executes the file perl.ss with sim-outorder, with perl.ss getting the flags -e 'print "hello\n";'.

Remember that files executed by the simulator are binary files containing machine code for the simulator's instruction set. Thus, for example, you cannot execute an x86 binary using sim-outorder. You must use a binary cross-compiled to the SimpleScalar architecture instead. Binaries that will be used in this course are found in bin-ss/ in the course directory.

Note that statistics are written to stderr. Thus it's useful to do something like

    sim-outorder bin-ss/perl.ss -e 'print "hello\n";' 2> sim-results

to save the statistics into sim-results for later analysis.

III. Using SimpleScalar (advanced)

I've built the complete SimpleScalar 2.0 tool set at ss2.0/ in the course directory. There are two flavors of the simulator, one big-endian and the other little-endian. The simulator apparently doesn't work well when its endianness doesn't match the host platform; therefore I've built the little-endian version of the tools. This is signified by the word "little" or "sslittle" in the names of the tools or directories.

The SimpleScalar environment is based on the GNU binutils tool set, all compiled as cross-platform tools, with the host being x86 and the target sslittle. The cross-platform binutils and the compiler, assembler and loader are located in ss2.0/bin (with long, fully descriptive names), or in sslittle-na-sstrix/bin (with the usual short names). The more adventurous students can use the tools to compile C code to the SimpleScalar platform for simulation.

IV. SimpleScalar for Assignment 2

A configuration file that contains flags suitable for the first part of assignment 2 can be found at etc/hw2.cfg in the course directory. Read through hw2.cfg and make sure you understand it. It sets up the caches and the 2-level branch predictor as directed.
Note that even though the parameters for the 2-level predictor are specified, the configuration file selects the bimodal predictor as the one to use. You can change this by using the -bpred flag on the command line (command-line flags take precedence over configuration file settings).

You've been directed to use bin-ss/cc1.ss as the test program. This program takes input on stdin (the input is just a preprocessed C file). Assembly code is output on stdout. The spec95 inputs are located in src/cc1.

Here is an example (split across multiple lines for formatting purposes only):

    sim-outorder -config $CDIR/etc/hw2.cfg -bpred 2lev $CDIR/bin-ss/cc1.ss \
        < $CDIR/src/cc1/cexp.i > output.s 2> sim-output

$CDIR is a variable set by 586-setup-bash to be the root of the course directory tree. The -config flag reads in hw2.cfg, -bpred 2lev sets the branch prediction to be 2-level, and cc1.ss is specified as the binary to execute. stdin is redirected from cexp.i, stdout to output.s, and stderr to sim-output. All the useful statistics are located in sim-output.

Note that "-bpred bimod" is the switch to use to get 2-bit dynamic branch prediction. "bimod" is short for bimodal, which refers to the fact that 2-bit dynamic prediction moves between two states (modes): taken and not taken.

BTFN (backward taken, forward not taken) prediction, a.k.a. "static combined", has been added to SimpleScalar. Access it using "-bpred static_comb".
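
If you want to compare the three predictors mentioned in this memo, a small loop saves retyping the command. What follows is only a sketch, not part of the course tools: run_pred is a hypothetical helper, the per-predictor file names (output-<pred>.s, sim-output-<pred>) are my own convention, and the fallback value of CDIR assumes you have not sourced 586-setup-bash. By default it just prints each command so you can check it; set RUN=1 to actually invoke the simulator.

```shell
#!/bin/bash
# Sketch only: build one sim-outorder command per branch predictor.
# CDIR is normally set by sourcing 586-setup-bash; the fallback below is
# the course root given in section 0.1 (an assumption if you work elsewhere).
CDIR="${CDIR:-/cse/courses/cse586/01au}"

# run_pred PRED
# Hypothetical helper. Prints the command for predictor PRED, or runs it
# when RUN=1. The per-predictor output file names are illustrative only.
run_pred() {
    pred="$1"
    if [ "${RUN:-0}" = "1" ]; then
        # stdin: preprocessed C input; stdout: assembly; stderr: statistics
        sim-outorder -config "$CDIR/etc/hw2.cfg" -bpred "$pred" "$CDIR/bin-ss/cc1.ss" \
            < "$CDIR/src/cc1/cexp.i" > "output-$pred.s" 2> "sim-output-$pred"
    else
        echo "sim-outorder -config $CDIR/etc/hw2.cfg -bpred $pred $CDIR/bin-ss/cc1.ss < $CDIR/src/cc1/cexp.i > output-$pred.s 2> sim-output-$pred"
    fi
}

# The three -bpred values named in this memo.
for pred in bimod 2lev static_comb; do
    run_pred "$pred"
done
```

Because the -bpred flag on the command line takes precedence over hw2.cfg, the same configuration file can be reused for all three runs. Each run leaves its statistics in its own sim-output-<pred> file.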