Java Benchmark Results

Our optimizing compiler supports only a subset of Java, so it cannot be compared directly to other complete implementations of Java. Instead, we intend our results to suggest the relative effectiveness of the different optimization strategies we apply, for Java-like languages.

Java Implementation Caveats

Our Java implementation does not support the following features:

threads & synchronization
reflection, dynamic class loading, stack trace examination
many non-core primitives, most particularly graphics primitives
finalization

Also, we currently do not generate explicit null pointer dereference checks. We expect that a real Java implementation would handle null pointers by catching segmentation violation signals and reifying them as Java exceptions, rather than through explicit run-time checks. Finally, we eagerly initialize classes in a particular statically-defined order, rather than initializing them lazily upon first reference.
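To make the last two caveats concrete, here is a minimal sketch of the standard Java behavior they refer to. The class and method names (InitOrderDemo, A, B, run) are invented for illustration and do not come from our benchmarks. In a standard Java implementation, a class's static initializer runs on first use, so the log below records "start" before "B" before "A". An implementation that initializes classes eagerly in a fixed order could record a different sequence. The second method shows that a null dereference must surface as a catchable NullPointerException, however the implementation detects it.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    // Records the order in which static initializers run (illustrative only).
    static final List<String> LOG = new ArrayList<>();

    static class A {
        static { LOG.add("A"); }
        static int value() { return 1; }
    }

    static class B {
        static { LOG.add("B"); }
        static int value() { return 2; }
    }

    // Uses B before A; with lazy initialization, B's initializer runs first.
    public static List<String> run() {
        LOG.add("start");
        int sum = B.value() + A.value();
        if (sum != 3) throw new IllegalStateException("unexpected sum");
        return new ArrayList<>(LOG);
    }

    // A null dereference must be observable as a Java exception,
    // whether it is detected by an explicit check or by a trapped signal.
    public static boolean nullDereferenceIsCatchable() {
        String s = null;
        try {
            s.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(nullDereferenceIsCatchable());
    }
}
```

Programs whose static initializers have no order-dependent side effects behave the same under either initialization policy.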

Benchmark Descriptions

Our Java benchmarks were selected because they are large, realistic Java programs that satisfy the restrictions of our Java implementation. We have not tried compiling CaffeineMark, Linpack, or other similar benchmark suites because they are not large or object-oriented enough for our purposes.

Name	Size (lines)	Description
javac	25,400 + 13,700 std. lib.	Sun's compiler from Java source to Java bytecodes
espresso	13,800 + 13,700 std. lib.	Martin Odersky's drop-in replacement for javac
toba	3,900 + 12,900 std. lib.	Todd Proebsting et al.'s Java bytecode to C code translator
javadoc	28,471 + 13,700 std. lib.	Sun's documentation generator for Java source
javacup	9,200 + 12,200 std. lib.	Scott Hudson's parser generator for Java
pizza	27,500 + std. lib.	Pizza to Java bytecodes compiler
cassowary	3,400 + std. lib.	Constraint solver

Optimizer Configurations

Our experiments used the following pipeline. Java programs were compiled into bytecodes using javac -O (except for espresso, where only precompiled .class files were available). Our Javelin front-end (based on Sun's javap bytecode disassembler) then translated the Java bytecodes into the Vortex intermediate language, Vortex produced C code, and gcc -O2 compiled the C code into a stand-alone executable. The following table describes the different settings of the Vortex optimizer component of this pipeline:

Name	Description
unopt	No Vortex optimizations (but full gcc -O2 optimizations on the generated C code)
trad-opt	Vortex intraprocedural, non-message optimizations, including elimination of redundant and dead loads & stores and elimination of dead object creations
inl	trad-opt plus automatic inlining (beyond any inlining performed by javac -O)
i	inl plus intraprocedural static class analysis to optimize messages
i-cha	i plus class hierarchy analysis to automatically identify non-overridden methods
i-cha-scp	i-cha plus class-hierarchy-based static class prediction
i-prof	i plus dynamic-profile-guided class prediction
i-cha-prof	i-cha plus dynamic-profile-guided class prediction
i-cha-scp-prof	i-cha-scp plus dynamic-profile-guided class prediction
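As a rough illustration of what the i-cha configuration adds, the following is a minimal sketch of class hierarchy analysis over a toy hierarchy. It is not Vortex's implementation; the class ChaSketch, its methods, and the Shape/Circle/Square hierarchy are all hypothetical. The idea is that a message sent to a receiver of static type T can be bound statically when every class at or below T resolves the message to the same method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ChaSketch {
    // class name -> superclass name (null for a root class)
    private final Map<String, String> parent = new HashMap<>();
    // class name -> names of the methods that class itself declares
    private final Map<String, Set<String>> declared = new HashMap<>();

    // Assumes every superclass is registered with addClass as well.
    public void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Walks up from cls to the class whose implementation would be invoked.
    private String lookup(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.get(c).contains(method)) return c;
        }
        return null;
    }

    private boolean isSubclassOrSelf(String cls, String ancestor) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // All implementations reachable from a receiver whose static type is `type`.
    public Set<String> possibleTargets(String type, String method) {
        Set<String> targets = new TreeSet<>();
        for (String c : parent.keySet()) {
            if (isSubclassOrSelf(c, type)) {
                String impl = lookup(c, method);
                if (impl != null) targets.add(impl + "." + method);
            }
        }
        return targets;
    }

    // With exactly one possible target, the call needs no dynamic dispatch.
    public boolean canBindStatically(String type, String method) {
        return possibleTargets(type, method).size() == 1;
    }

    // Hypothetical hierarchy: Circle and Square override area but not name.
    public static ChaSketch demoHierarchy() {
        ChaSketch h = new ChaSketch();
        h.addClass("Shape", null, "area", "name");
        h.addClass("Circle", "Shape", "area");
        h.addClass("Square", "Shape", "area");
        return h;
    }

    public static void main(String[] args) {
        ChaSketch h = demoHierarchy();
        System.out.println(h.canBindStatically("Shape", "name"));
        System.out.println(h.canBindStatically("Shape", "area"));
    }
}
```

In this toy hierarchy, name is never overridden, so a call to it on a Shape receiver can be bound statically and becomes a candidate for inlining, while area has three possible targets and still needs dispatch. The sketch ignores abstract classes and interfaces, and it assumes the whole program is known, which matches our restriction to programs without dynamic class loading.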

We also compare our results against other implementations of Java, just to place our performance results in some context. But note that because we do not support full Java, precise comparisons to other implementations are not meaningful.

Name	Description
interp	Sun's JDK1.0.2 interpreter
jit	Sun's "just-in-time" dynamic compilation-based implementation of Java
toba	Todd Proebsting et al.'s translator from Java (bytecodes) to C (version 1.0.b5), roughly corresponding to Vortex's unopt configuration

Performance Data

[A Java applet implementing a bar chart should have appeared here.]

Raw, Detailed Data

Analysis

The performance gap between the interp, jit, and toba configurations and the Vortex unopt configuration should be interpreted only as evidence that the baseline Vortex implementation of Java is fairly efficient. A precise comparison is not meaningful because Vortex does not implement some potentially expensive language features such as threads, synchronization, dynamic class loading, and other reflective operations.

For these Java benchmarks, the most important optimization was class hierarchy analysis, followed by cross-module inlining. Class prediction (both static and profile-guided) was valuable for some of the Java programs, but ineffective for others. For some program/configuration pairs there are small performance inversions due to unfortunate interactions between the heuristics used to guide class prediction.
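For readers unfamiliar with class prediction, the following sketch shows the general shape of the transformation at the Java source level. It is a hand-written illustration under assumed names (ClassPredictionSketch, Shape, Circle, Square), not output from Vortex, which applies such transformations to its own intermediate language. A dynamically dispatched call is guarded by a test for the predicted receiver class, with the predicted method inlined on the fast path and the original dispatch kept as the fallback.

```java
public class ClassPredictionSketch {
    static abstract class Shape {
        abstract double area();
    }

    static final class Circle extends Shape {
        final double r;
        Circle(double r) { this.r = r; }
        double area() { return Math.PI * r * r; }
    }

    static final class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        double area() { return side * side; }
    }

    // Original form: every call to area() is dynamically dispatched.
    public static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    // Predicted form, assuming (hypothetically) that Circle is the common case.
    public static double totalAreaPredicted(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Circle.class) {
                Circle c = (Circle) s;
                sum += Math.PI * c.r * c.r;   // inlined body of Circle.area
            } else {
                sum += s.area();              // fallback: ordinary dispatch
            }
        }
        return sum;
    }

    public static Shape[] demoShapes() {
        return new Shape[] { new Circle(1.0), new Square(2.0), new Circle(0.5) };
    }

    public static void main(String[] args) {
        Shape[] shapes = demoShapes();
        System.out.println(totalArea(shapes));
        System.out.println(totalAreaPredicted(shapes));
    }
}
```

The guard pays off only when the prediction is usually right. When receivers are evenly spread across many classes, the extra test is pure overhead, which is one plausible reading of why class prediction helps some programs and not others.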

Last updated March 13, 1998.

Cecil/Vortex Project, chambers@cs.washington.edu