CSE 351, section 4: Procedure call

Michael Ratanapintha

Memory layout and usage of the stack

As Dr. Zahorjan just barely touched on yesterday, the memory address space of a single process consists of several regions. From lowest addresses to highest, these are:

Here is a vaguely pictorial view of the memory layout:

0x0000_0000 |--------------|
            | Instructions |
     |      |--------------|
     |      |   Literals   |
     |      |--------------|
     |      |   Globals    |
     |      |--------------|
     |      |              |
     |      |    Heap      |
     |      |              |
     |      |--------------|
     |      |      |       |
     |      |      |       |
     |      |      |       |
     |      |      V       |
     |      |              |
     |      |              |
     |      |              |
     |      |      ^       |
     |      |      |       |
     |      |      |       |
     |      |      |       |
     |      |--------------|
     |      |              |
     |      |    Stack     |
     V      |              |
            |              |
0xFFFF_FFFF |--------------|

The activation record (AR)

Dr. Zahorjan said yesterday that the stack contains local variables for each function call. I asked what was significant about local variables. We established that every procedure call must have separate space in memory for each local variable, because

I noted that this was fairly elementary stuff, what you learn in the first few weeks of CSE 142, but that you might not have thought of it explicitly before.

Building on this, we identified several data that must be kept for each call of a C function or Java method:

All these data form the activation record, or AR, for the procedure call. Every active procedure call must have its own AR somewhere in memory.

Implementing ARs with stack frames

In C and Java, each program (actually each thread) has a chain of active procedure calls, only one of which is actually executing code at the moment - the others are suspended waiting for a callee to return. The procedures must logically return in the opposite order from which they were called.

I said that this is naturally modeled by a stack, where the running procedure's AR is on the top, its caller's AR is next from the top, and so on. The "stack" in memory is intended to replicate this logical structure - each procedure's AR is represented by a stack frame, with the running procedure call's AR at the top of the stack. Though stack frames implement ARs, they are not the same - as we'll see in the factorial example, an x86 procedure's argument values are in its caller's stack frame, even though they belong to the AR of the callee.

We maintain two pointers into the stack: the stack pointer or SP, which points to the last memory word pushed on the stack (that is, just below the top of the stack), and the frame pointer or FP, which points to the first word pushed in the current call's stack frame. The FP and SP bound the running procedure's stack frame - by using addresses relative to the FP or SP, each procedure's code can address any data on the stack frame without having to encode absolute, constant addresses.

When we call a function, we "push" a new frame onto the stack by subtracting from SP, saving FP onto the stack, and then updating FP to point to the bottom word of the new frame (just above the old SP, which pointed to the top word of the caller frame). Returning from a function requires us to "pop" the callee's frame by adding to SP, and then restoring the FP to the saved value stored on the stack.

Caller- vs. callee-saved registers

Not all of a procedure call's AR is on the stack. As you saw with the Z86, processors have a small set of registers, very fast but very small memories. Many data must be stored in registers or fixed memory locations, including the program counter (PC), the condition codes, and the FP and SP. In addition, several processors, including the Z86, only allow ordinary computation to be done on registers; while the full x86 partially supports computation in memory, register-register operations are also common.

On most processors, all procedure calls share the same register set. Hence, the register values used by a procedure call are also part of the AR; when a procedure call is made, registers shared by the caller and callee must be saved onto the stack and restored back when the callee returns. The caller and callee divide this responsibility by adopting a split-register-set convention:

The calling convention

When one procedure calls another, both caller and callee must agree on several things - where the callee's arguments and return value are placed, the ways the caller and callee are allowed to use each other's stack frames, which registers are caller-saved and callee-saved, and so on. While the processor provides some support for these matters, in general they must be decided as a matter of programming convention. This calling convention must be agreed between the processor architects, the OS authors, the compiler authors, and the authors of previously compiled libraries. If there is disagreement, your program might fail to work because one designer assumed something was safe while another designer decided that such an action would be wrong or unsafe.

x86 procedure calling convention

The x86 provides several instructions that are specialized to procedure call and stack manipulation, but even so there are many calling conventions on the x86. The one below is the cdecl convention used by gcc on Linux.

Register usage conventions

On the x86, the stack pointer and frame pointer are stored in registers. The SP register is called %esp (extended stack pointer), while the FP register is called %ebp (extended base pointer). The "extended" comes from the fact that the x86 used to have a 16-bit address space, and was extended to have a 32-bit one.

Both %esp and %ebp are callee-saved - because the callee is always going to overwrite them, making them callee-saved reduces the code size by not having to generate code to restore them at every call. However, you don't need to explicitly save %esp; as shown below, the stack frame is laid out so that the top of the previous stack frame is always a fixed offset from %ebp, so you can compute the old %esp instead of storing it.

The 6 other x86 integer registers also have special names, rather than numbers. 3 of them are caller-saved:

The other 3 are callee-saved:

All of these are used for computation. %eax is also used to store the return value when a procedure returns; the other registers have special purposes only in some relatively rare cases (such as multiplication and division instructions).

Stack layout

On the x86, each call's argument values and return address are stored on the stack. A callee's arguments are stored at the top of its caller's stack frame, with the first argument at the top, the second argument just below, and so on.

Even though it too is supplied by the caller, the return address for a call is actually stored at the bottom of the callee's frame, just above the first argument to the callee. This odd split has mostly to do with the behavior of the special x86 instructions for procedure call. The second-from-bottom word of a callee's stack frame is generally the saved FP of its caller.

This word description gives rise to the following diagram of the stack frame layout, as used by the GCC C compiler on x86 Linux [1]:

direction   |---------------------|                   ^
of stack    | Space for arguments | <== %esp (SP)     |
growth      | to called routines  |                   |
     ^      |---------------------|                   |
     |      |  Local variables,   |                   |
     |      |  temporary values   |                   |
     |      |---------------------|                   | Current
     |      | Other callee-saved  |                   | (callee's)
     |      |      registers      |                   | stack
     |      |---------------------|                   | frame
     |      |   Saved %ebp (FP)   | <== %ebp (FP)     |
     |      |---------------------|                   |
     |      |   Return address    |                   |
     |      |---------------------|----------------------------
     |      |  Argument 1 passed  |                   |
     |      |      to callee      |                   |
     |      |---------------------|                   |
     |      |  Argument 2 passed  |                   |
     |      |      to callee      |                   |
     |      |---------------------|                   |
     |      |   More arguments    |                   |
     |      |        .            |                   |	Previous
     |      |        .            |                   | (caller's)
     |      |        .            |                   | frame
     |      |        .            |                   |
     |      |        .            |                   |
     |      |    Argument N       |                   |
     |      |---------------------|                   |
     |      |    Caller-saved     |                   |
     |      |     registers       |                   |
     |      |---------------------|                   |
     |      |   Rest of stack     |                   |
     |      |         |           |                   |
     |      |         |           |    	       	      |
     |      |         |           |                   |
     |      |         V           |                   |
     |      |---------------------|                   V 

Some things to note about this stack layout:

Summary of the calling sequence

In brief, this is how gcc on Linux for x86 implements procedure call:

  1. First, the caller makes the actual call:
    1. First, the caller pushes caller-saved register values onto the stack. These values are not used by the callee, so they can be pushed in any order.
    2. The caller pushes argument values onto the stack, rightmost first (in accordance with the convention that the leftmost argument is on top).
    3. Using the CALL instruction, the caller pushes the return address on the stack and then jumps to the callee.
  2. Then, before executing its "actual" code, the callee executes its prologue to set up its stack frame:
    1. The callee pushes the old frame pointer from %ebp onto the stack.
    2. The callee then updates the FP %ebp to point to where the old FP is located, establishing the bottom of the stack frame.
    3. The callee then allocates the rest of its stack frame by subtracting the frame size from the SP.
  3. The callee does its work, saving its return value if any to %eax.
  4. When done, the callee executes its epilogue to restore the previous SP and FP and return to the caller:
    1. Using LEAVE, the callee pops off its stack frame up to and including the saved FP, while also restoring the saved FP into %ebp .
    2. Then, using RET, the callee pops the return address off the stack and jumps back to it.
  5. Finally, the caller does some cleanup work before resuming its own code:
    1. If needed, the caller copies the return value %eax to memory or another register.
    2. The caller pops the arguments off the stack.
    3. When each caller-saved register's value is needed again, that register's saved value is loaded back from the stack.

We will see an example of most of these steps in the factorial example below.

Calling sequence example: Factorial

To study the calling sequence more closely, we'll use the following recursive factorial routine, which defines the factorial of all non-positive integers to be 1:

### Compiled from factorial_test.c , using
###   gcc -Wall -Werror -Wextra -S factorial_test.c
### This hand-annotated version of the resulting assembly code comes from
### factorial_test.annotated.s .
### 
### int factorial (int n);
### n ==> 8(%ebp) - (%ebp) is old %ebp, 4(%ebp) is return address, so
###                 8(%ebp) is first and only argument
### Return value ==> %eax (the register)
### Temp. value t1 ==> %eax
### Temp. value t2 ==> -12(%ebp) (location on stack)
### N.B. %eax is overloaded as both return value and temp. value
factorial:
	## [prologue]
	pushl	%ebp		# Push old frame pointer %ebp on stack
	movl	%esp, %ebp	# %ebp := %esp (currently pointing to old %ebp)
	subl	$40, %esp	# Allocate 40-byte stack frame (no push/pop for us)

	## if (n <= 0) {
	##   return (1);
	## }
	cmpl	$0, 8(%ebp)	# Compute n-0 (== n) - don't store result,
				# but set condition codes
	jg	fact_positive	# if (n > 0) goto fact_positive
	movl	$1, %eax	# result := 1
	jmp	fact_end	# goto fact_end

fact_positive:	
	## else {
	##   int recursive_result = factorial(n-1);
	##   return (n * recursive_result);
	## }
	movl	8(%ebp), %eax	# t1 (%eax) := n
	subl	$1, %eax	# t1 -= 1
	movl	%eax, (%esp)	# copy t1 to last word in our stack frame
	call	factorial	# push return addr on stack, goto recursive call
	movl	%eax, -12(%ebp) # t2 (-12(%ebp)) := recursive call's result
	movl	8(%ebp), %eax	# result := n
	imull	-12(%ebp), %eax	# result *= t2 (truncate upper word of product)

fact_end:	
	## [epilogue]
	leave			# pop stack frame (%esp := %ebp), pop and restore old %ebp
	ret			# pop return address, return there (to caller)
	.size	factorial, .-factorial

Call

Let's consider the sequence of instructions executed when factorial(n) makes a recursive call on factorial(n-1). The call itself is implemented in the two-instruction sequence in the fact_positive code block:

	movl	%eax, (%esp)	# copy t1 to last word in our stack frame
	call	factorial	# push return addr on stack, goto recursive call

Previously, we computed the argument to the recursive call, n-1, and stored it into register %eax. Now, we push this argument onto the stack right at the top of our stack frame. (We don't need to decrease %esp to push the value, because we have already allocated a stack frame big enough to store it.) Ordinarily, we'd also store the values of the caller-saved registers as well, %eax, %edx, and %ecx, but this is not necessary in this case - we don't use %edx or %ecx, and we never need to use the current value of %eax again (notice that the source code annotation doesn't use n-1 again).

With the recursive call argument loaded onto the stack, we use CALL to make the recursive call. CALL does three things:

  1. It decreases %esp by 1 word length (4 bytes), pushing space for an additional word on the stack.
  2. It writes the return address (the address immediately following the CALL) onto the top of the stack frame, which is the same word we just allocated.
  3. Finally, it jumps to the address specified in the instruction, which in this case is the beginning of the same function (factorial) we just left.

Prologue

Before it can do any work, the factorial procedure needs to set up its stack frame. As discussed above, the first step in doing this is to push the old frame pointer on the stack, so we can safely update the frame pointer to point to the new frame. This is done with the instruction

	pushl	%ebp		# Push old frame pointer %ebp on stack

Like CALL, the PUSHL instruction does multiple things:

  1. It pushes space for a new word on the stack, decreasing %esp by 1 word length.
  2. It writes the register specified in the instruction (here %ebp) into the newly pushed stack word.

After the old FP is saved, the SP now points to the old FP. We want to establish that point as the bottom of our stack frame, so we set FP equal to SP:

	movl	%esp, %ebp	# %ebp := %esp (currently pointing to old %ebp)

Unlike the Z86's four types of move operation, the x86 has only one move instruction that accepts various types of register, memory, and immediate operands.

Finally, we push space for the rest of the stack frame onto the stack, by decreasing the SP by the frame size:

	subl	$40, %esp	# Allocate 40-byte stack frame (no push/pop for us)

Alternatively, we can just use PUSHL to allocate stack space when we need it, and its inverse instruction POPL to free up unused stack space. This is not the approach taken by gcc, however.

Epilogue

Once factorial has finished its work and written its result into the %eax return value register, it is ready to pop off its stack frame and return to its caller. This is done with a pair of instructions. The first is

	leave			# pop stack frame (%esp := %ebp), pop and restore old %ebp

LEAVE is a complex instruction, perhaps more so than even CALL. It does three things:

  1. It pops off all but the bottom two words of the stack frame, by setting %esp equal to %ebp, so that only the saved %ebp and the return address remain.
  2. It reads the value at the memory address 0(%esp), which is the saved %ebp, and stores it into the register %ebp.
  3. It pops the saved %ebp off the stack, by increasing %esp by a word length.

Once LEAVE is done, the only word in the stack frame is the return address; %esp contains the address of the return address in memory.

The last instruction in factorial is

	ret			# pop return address, return there (to caller)

RET is, roughly, the opposite of CALL:

  1. It reads the return address from the top of the stack frame into a hidden register.
  2. It pops the return address off the stack by increasing %esp by a word.
  3. Finally, it jumps to the return address by setting the program counter (PC) equal to the RA as stored in the hidden register.

Post-call cleanup

Now that we have returned from the recursive call factorial(n-1), our call factorial(n) needs to restore the register set and stack frame before resuming execution. Because factorial(n) doesn't save any caller-saved registers before recursively calling itself, it doesn't need to restore those registers' values. It does need to save the recursive result somewhere, as it will overwrite the %eax register where the recursive result is currently stored. For this, it simply flushes %eax into the local variable section of its stack frame:

	movl	%eax, -12(%ebp) # t2 (-12(%ebp)) := recursive call's result

Aside: How do I get the assembly code for my program?

References:

  1. Based on: Michael L. Scott, Programming Language Pragmatics (3rd ed.). Morgan Kaufmann, 2009. Figure 8.11 (CD supplement, page 177).