CSE503: Software Engineering
Lecture 21 (February 26, 1999)

David Notkin

v Lackwit (and LCLint intro, if time)

Ø Examples of approaches that check partial specifications and that exploit type information

v Lackwit (O’Callahan & Jackson)

Ø Code-oriented tool that exploits type inference

Ø Answers queries about C programs

§ e.g., “locate all potential assignments to this field”

Ø Accounts for aliasing, calls through function pointers, type casts

Ø Efficient

§ e.g., answers queries about a Linux kernel (157KLOC) in under 10 minutes on a PC

v Placement

Ø Lexical tools are very general, but are often imprecise because they have no knowledge of the underlying programming language

Ø Syntactic tools have some knowledge of the language, are harder to implement, but can give more precise answers

Ø Semantic tools have deeper knowledge of the language, but generally don’t scale, don’t work on real languages and are hard to implement

Ø The goal of Lackwit, thus, is to use some semantic basis, in a scalable way, on a real language (C)

v It is a static tool

Ø Can work on incomplete programs

Ø Make assumptions about missing code, or supply stubs

v Sample queries

Ø Which integer variables contain file handles?

Ø Can pointer foo in function bar be passed to free()? If so, what paths in the call graph are involved?

Ø Field f of variable v has an incorrect value; where in the source might it have changed?

Ø Which functions modify the cur_veh field of map_manager_global?

v Lackwit analysis

Ø Approximate (may return false positives)

Ø Conservative (will not return false negatives) under some conditions

§ C’s type system has holes

§ Lackwit makes assumptions similar to those made by programmers (e.g., “no out-of-bounds memory accesses”)

§ Lackwit is unsound only for programs that don’t satisfy these assumptions

v Query commonalities

Ø There are a huge number of names for storage locations

§ local and global variables; procedure parameters; for records, etc., the sub-components

Ø Values flow from location to location, which can be associated with many different names

Ø Archetypal query: Which other names identify locations to which a value could flow to or from a location with this given name?

Ø Answers can be given textually or graphically

v An example query

Ø Query about the cur_veh field of map_manager_global

Ø Shaded ovals are functions extracting fields from the global

Ø Unshaded ovals pass pointers to the structure but don’t manipulate it

Ø Edges between ovals are calls

Ø Rectangles are globals

Ø Edges to rectangles are variable accesses

Ø Claim

§ This graph shows which functions would have to be checked when changing the invariants of the current vehicle object

§ Requires semantics, since many of the relationships are induced by aliasing over pointers

v Underlying technique

Ø Use type inference, allowing type information to be exploited to reduce information about values flowing to locations (and thus names)

Ø But what to do in programming languages without rich type systems?

Ø Trivial example

§ DollarAmt getSalary(EmployeeNum e)

Ø Relatively standard declaration

Ø Allows us to determine that there is no way for the value of e to flow to the result of the function

§ That is, e can (and surely would) affect what DollarAmt gets returned, but e itself (nor any computation that is based on e) will be able to flow to the return value

Ø Because they have different types

Ø Consider an alternative

§ int getSalary(int e)

Ø Another, perhaps more common, way to declare the same function

Ø This doesn’t allow the direct inference that e’s value doesn’t flow to the function return

§ Because they have the same type

v But maybe one could analyze the program to determine this anyway; Lackwit does so by using a type inference mechanism for precision

v Lackwit’s type system ignores the C type declarations

Ø Computes new types in a richer type system

Ø Incomplete type information

Ø Cover example from the paper

Ø void* return1st(void* x, void* y) {
return x; }

Ø (a ref^b, b) ®^f a ref^b

Ø The type variable a indicates that the type of the contents of the pointer x is unconstrained

Ø But it must be the same as the type of the contents of the return value

Ø Increases the set of queries that Lackwit can answer with precision

v Polymorphism

char* ptr1;
struct timeval* ptr2;
char** ptr3;
…
return1st(ptr1,ptr2);
return1st(ptr2,ptr3);

Ø Both calls match the previous function declaration

^ØThis is solved (basically) by giving return1st a richer type and instantiating it at every call site

^§(c ref^b, d) ®^d c ref^b

§ (e ref^a, f) ®^c e ref^a

v Type stuff

Ø Modified form of Hindley-Milner algorithm “W”

Ø Efforts made to handle

Ø Mutable types

Ø Recursive types

Ø Null pointers

Ø Uninitialized data

Ø Type casts

Ø Declaration order

v Morphin case study

Ø Robot control program of about 17KLOC

Ø Vehicle object contains two queue objects

Ø Client was investigating combining these two queues into one

Ø Queried each queue object to discover operations performed and their contexts

Ø The two graphs each contained 171 nodes

Ø But each graph had only five nodes highlighted as “accessor” nodes

Ø Example

§ These five matches helped identify code to be changed

§ grep would have returned false matches and missed matches when parameters were passed to functions

§ Context-sensitivity needed to distinguish the two queue objects because both are passed as arguments to the same queue functions

v Recap

Ø Helps find relationships among variables in a C program

Ø Exploits type inference to understand values flowing to locations and thus names

Ø Approximate, although safe under many (most?) conditions

Ø Reasonably efficient

Ø Although I didn’t show the numbers, they are now better than reported in the ICSE paper

CSE503: Software Engineering Lecture 21 (February 26, 1999)

David Notkin

CSE503: Software Engineering
Lecture 21 (February 26, 1999)