CSE503: Software Engineering
Lecture 22  (March 1, 1999)

David Notkin

v     LCLint [Evans et al.]

      [Material taken in part from a talk by S. Garland]

      Add some partial specification information to C code to

      Detect potential bugs

      Enforce coding style

v     Versatile and lightweight

      Incremental gain for incremental effort

      Fits in with other tools

      Detects potential bugs

      Specifications enable more accurate checks, messages

      Memory management a particular problem in the C language

      Enforces coding style

      Abstraction boundaries

      Use of mutable and immutable types

v     LCLint Does Not

      Encourage programmer to write

        Contorted code

        Inefficient code

      Report only actual errors

      Report all errors

      Insist on reporting a fixed set of potential errors

        Many options and control flags

v     Ex: Definition before Use

      Sample code
if (setVal(n, &buffer)) ...

      Must buffer be defined before calling setVal?

        Yes: bool setVal(int d, char *val);

        No:            bool setVal(int d, out char *val);

      Is buffer defined afterwards?

        Yes:     bool setVal(...); {modifies *val;}

        Maybe: bool setVal(...); {modifies nothing;}

        NO!:    bool setVal(...); {ensures trashed(val);}

v     More Accurate Checks

      Conventional lint tools report

        Too many spurious errors

        Too few actual errors

      Because

        Code does not reveal the programmer’s intent

        Fast checks require simplifying assumptions

        Specifications give good simplifying assumptions

v     Abstraction Boundaries

      Client code should rely only on specifications

      Types can be specified as abstract

        immutable type date;
date nextDay(date d); { }
mutable type
set;
void merge(set s, set t); {modifies s;}

v     LCLint detects

      Inappropriate access to representation

      Including use of ==

v     Checking Abstract Types

      Specification: set.lcl contains the single line
mutable type set;

      Client code
#include "set.h"
bool f(set s, set t) {
  if (s->size > 0) return (s == t);
  ...

      > lclint set client.c
client.c:4,7: Arrow access field of abstract type (set): s->size
client.c:5,13: Operands of == are abstract type (set):  s == t

v     Checking Side Effects

      Specification:
void set_insert (set s, int e)
     { modifies s;}
void set_union(set s, set t)
     { modifies s;}

      Code (in set.c) :
void set_union (set s, set t) {
    int i;
    for (i = 0; i < s->size; i++)
       set_insert(t, s->elements[i]);
    }

      Message:
set.c:35, 27: Called procedure set_insert may modify t:
set_insert(t, s->elements[i])

v     Checking Use of Memory

      Specifications

only char *gname;
. . .
void setName (temp char *pname) char *gname;

      Code
void setName (char *pname) {
   gname = pname;
}

      LCLint error messages
sample.c:2:3: Only storage gname not released before assignment: gname = pname
sample.c:2:3: Temp storage assigned to only: gname = pname

v     If C Were Better...

      Would LCLint still help?

      Yes, because specifications

        contain information not in code

        contain information that is hard to infer from code

        are usable with legacy code, existing compilers

        can be written faster than languages can be changed

        are important even with better languages

v     Experience with LCLint

      Reliable and efficient

      Runs at compiler speed

      Used on both new and legacy code

      1,000-200,000 line programs

      Over 500 users have sent e-mail to MIT

      Tested with varying amounts of specification

      Lots to almost none

      LCLint approximates missing specifications

      Results encouraging

v     Understanding Legacy Code

      Analyzed interpreter (quake) built at DEC SRC

      Discovered latent bugs (ordinary lint can do this)

      Discovered programming conventions

      Documented use of built-in types (int, char, bool)

      Identified (and repaired) (nearly) abstract types

      Documented action of procedures

      Use of global information, side-effects

      Enhanced documentation a common thread

      Easier to read and write because formulaic

      More trustworthy because checked

v     Fundamental benefit

      Partial specifications

      Low entry cost

      You get what you pay for (or maybe a bit more)

v     Purify

      Dynamic (commercial) tool for detecting memory leaks and access errors; there are others, such as Bounds Checker

        http://www.rational.com/products/purify/

      In some sense, a dynamic analog to the memory checking aspects of LCLint

v     Trapping every memory access would be overly expensive

      Purify inserts function calls before loads and stores to maintain a table that holds a 2-bit state for each byte in the heap and stack

      Requires working with malloc/free, too

v     Memory State Transitions

      Writing to memory with any bytes that are unwriteable prints a diagnostic

      Same with reading unreadable bytes

v     Other violations

      Array Bound Violations

        Allocate a “red-zone” around malloc blocks, recording them as unwriteable and unreadable

      Uninitialized variables

        Initialize them to allocated-but-uninitialized state

      Accessing freed memory

        Do not reallocate memory until it has aged

        Aging can be specified by the user in terms of number of calls to free

v     Overhead

      2 bits per state; about 25% memory overhead during development

      Run-time no worse than 5.5 times optimized C code (and usually no worse than compiling with debugging on)