33. Calls, Interrupts and Traps

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Calls

Another classic area where different architectural approaches lead in vastly different directions is the area of procedure, function and method calls.

Since the development of the Algol 60 programming language it has been common to think of calls as allocating objects called activation records on a run-time stack. (It is noteworthy that PL/I, Pascal, C and Simula 67 were all strongly influenced by Algol, and that C++ and Java combine large parts of Simula 67 semantics with C syntax.) The basic model for use of a stack was well documented in E. W. Dijkstra's paper on Recursive Programming (Numerische Mathematik, 2, 312-318, 1960), although Dijkstra did not use the term activation record. The activation record for a block of code contains:

  * The continuation point, the address at which execution of the block is to continue when control returns to it.

  * The dynamic link, a pointer to the activation record of the calling block.

  * The static link, a pointer to the activation record of the block within which this block was declared.

  * The parameters, local variables and temporaries of the block, along with space for any registers that must be saved across calls.

In general, if we assume this model, two registers must be dedicated to holding the static link and dynamic link. A call is made as follows (a C sketch of these steps is given after the list):

  1. Allocate an activation record for the called block.

  2. Load parameters into the new activation record.

  3. Set the new static link, using knowledge of the point where the called block was declared.

  4. Set the new dynamic link to point to the current activation record.

  5. Set the current continuation point or the new return address to point to the instruction to be executed on exit from the called block.

  6. Make the new block the current block by adjusting the static and dynamic links.

  7. Transfer control to the code of the new block.
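
To make this concrete, here is a rough sketch of the same steps in C. Everything here (the struct frame layout, the names sl, dl and cp, the fixed-size parameter and local arrays, the toy allocator) is invented for illustration; it shows the model, not any particular compiler's code.

	/* A hypothetical activation record; the layout and the names
	   (frame, sl, dl, cp and so on) are invented for this sketch. */
	struct frame {
	    struct frame *sl;          /* static link: enclosing block's frame */
	    struct frame *dl;          /* dynamic link: the caller's frame     */
	    void         *cp;          /* continuation point (return address)  */
	    int           params[4];   /* parameters ...                       */
	    int           locals[4];   /* ... local variables and temporaries  */
	};

	static struct frame  frame_pool[64];   /* a toy run-time stack          */
	static int           frame_count;
	static struct frame  outermost;        /* the main program's frame      */
	static struct frame *current = &outermost;

	static struct frame *allocate_frame(void)       /* no overflow check    */
	{
	    return &frame_pool[frame_count++];
	}

	/* Steps 1 through 6 written out in straight-line C; step 7, the
	   actual transfer of control, has no portable C equivalent. */
	struct frame *call_setup(struct frame *callee_context,
	                         void *return_address,
	                         const int *args, int nargs)
	{
	    struct frame *nr = allocate_frame();   /* 1. allocate               */
	    for (int i = 0; i < nargs; i++)
	        nr->params[i] = args[i];           /* 2. load parameters        */
	    nr->sl = callee_context;               /* 3. set static link        */
	    nr->dl = current;                      /* 4. set dynamic link       */
	    current->cp = return_address;          /* 5. set continuation point
	                                                 in the caller's frame  */
	    current = nr;                          /* 6. make it current        */
	    return nr;                             /* 7. ... then jump to the
	                                                 called block's code    */
	}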

In general, it takes several instructions to do all of this! Some of these instructions will be part of the code in the calling block, and some will be part of the prologue on the called block. In general, on return from a block to its caller, it takes several instructions to restore the execution context and deallocate the activation record that is no longer needed. Again, these instructions are typically divided between the epilogue on the called block and the body of the calling block. In sum, we can identify 3 sequences of instructions:

The calling sequence
The instruction sequence in the body of the calling block, including the first part of the sequence of instructions used to transfer control to the called block and the second part of the sequence of instructions used to return from the called block.

The receiving sequence
The instruction sequence that serves as a prologue on the called block; this finishes what the calling sequence started on entry to the called block.

The return sequence
The instruction sequence that serves as an epilogue on the called block; this begins the return to the calling block and then transfers control to somewhere in the calling sequence.

Usually, when talking collectively about these three types of instruction sequences, the term calling sequence is used broadly to refer to all three. There are obviously many ways to build calling sequences, but the design of a good calling sequence is sufficiently difficult that, on most machines, there is a standard calling sequence used by most programmers.

Machine architects have long seen calling sequences as a problem area. As a result, they have long attempted to produce general call instructions that package large parts of the calling sequence into one machine instruction. A typical medium complexity CALL instruction will push the return address on the stack and transfer control to a destination, but many machines have been designed with CALL instructions that attempted to do quite a bit more.
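
In terms of a toy machine model in C (the names pc, sp and mem are just illustrative state variables, not any particular machine's registers), such a medium-complexity CALL and its matching RETURN amount to little more than this:

	#include <stdint.h>

	/* Toy machine state, for illustration only. */
	static uint32_t pc;
	static uint32_t sp = 65536;
	static uint32_t mem[65536];

	/* A medium-complexity CALL: push the return address, then branch.
	   (pc is assumed to already point at the instruction after the CALL.) */
	static void call(uint32_t dst)
	{
	    mem[--sp] = pc;
	    pc = dst;
	}

	/* The matching RETURN: pop the saved address back into the PC. */
	static void ret(void)
	{
	    pc = mem[sp++];
	}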

The DEC VAX, for example, included a CALL instruction that took a mask indicating which registers were in use; it selectively pushed those registers before pushing the return address and a stack mark word holding the mask, so that the right registers could be restored on return. The corresponding RETURN instruction popped the stack mark word and then used it to restore the appropriate registers, including the saved program counter, and to restore the stack to its state immediately prior to the call. These instructions require large numbers of memory cycles and force a break from pipelined execution!
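
The flavor of such an instruction pair can be suggested in C. This is only a loose sketch of the register-mask idea, not the actual VAX CALLS/RET semantics or stack layout:

	#include <stdint.h>

	/* Toy machine state, as above. */
	static uint32_t pc;
	static uint32_t sp = 65536;
	static uint32_t reg[16];
	static uint32_t mem[65536];

	/* CALL with a register-save mask: push each register named in the mask,
	   then a stack mark word holding the mask, then the return address. */
	static void call_masked(uint16_t mask, uint32_t dst)
	{
	    for (int i = 0; i < 16; i++)
	        if (mask & (1u << i))
	            mem[--sp] = reg[i];     /* one memory cycle per saved register */
	    mem[--sp] = mask;               /* the stack mark word                 */
	    mem[--sp] = pc;                 /* the return address                  */
	    pc = dst;
	}

	/* The matching RETURN undoes all of the above using the saved mask. */
	static void ret_masked(void)
	{
	    pc = mem[sp++];
	    uint16_t mask = (uint16_t) mem[sp++];
	    for (int i = 15; i >= 0; i--)
	        if (mask & (1u << i))
	            reg[i] = mem[sp++];
	}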

The designers of Gnu C++ discovered that they could almost always produce faster object code if they never used the standard VAX CALL instruction except for calling library routines that were obliged to adhere to the standard. This discovery led many machine architects to abandon such complex call instructions and go back to far simpler models that were easy to pipeline. Typically, this has meant a return to CALL instructions with the semantics of the old IBM System 360 BAL (branch and link) instruction:

	CALL R DST
	   R = PC
	   PC = DST (the effective address)

The return address is simply left in a register, so that the receiving sequence can save it as needed. If the function is at a leaf of the call tree (it calls no other functions), the return address can remain in the register for the duration.

Given this basic calling mechanism, and the fully developed model of an activation record as outlined above, a typical calling sequence suite would be:

	-- assume the following named index registers:
	--    AR - pointer to the current activation record
	--    NR - points to new activation record (temporarily)
	--    RA - register holding return address (temporarily)

	-- if P points to an activation record
	--    P->CP holds the continuation point
	--    P->DL holds the saved dynamic link
	--    P->SL holds the saved static link

	-- we assume that ARS, the size of each activation
	--    record, is known statically; it may vary from
	--    place to place depending on the number of
	--    temporaries currently needed, but each time
	--    any particular instruction is executed, the
	--    size is fixed.

	--------------------
	-- calling sequence

	      ADD NR = AR + ARS			  -- (a)

	      -- copy parameters into fields of *NR  (b)
	      -- save registers into fields of *AR   (c)

	      MOVE NR->SL = DST_CONTEXT           -- (d)
	      CALL RA,DST

	      -- restore registers from *AR          (e)

	--------------------
	-- receiving sequence

	DST
	      MOVE AR->CP = RA			  -- (f)
	      MOVE NR->DL = AR			  -- (g)
	      MOVE AR = NR                        -- (h)

	--------------------
	-- return sequence
	      MOVE AR = AR->DL                    -- (i)
	      MOVE RA = AR->CP                    -- (j)
	      JUMP *RA

Notes on the above pseudocode:

(a) The above code assumes that the called function will need to use RAM. Small functions where the entire activation record for the function fits in registers need no activation record in RAM and this line can be eliminated.

(b) The above code assumes that parameters are passed in memory, as must be the case if there are many or if large objects are passed by value. If there are only a few small parameters, all may be passed in registers and the code at b will not be needed.

(c,e) The above code assumes that the caller has values in registers that must be saved. If the called routine (and anything it calls) does not modify certain registers, this code can be omitted.

(d) If the called routine makes no reference to global variables, or rather, to variables declared in statically enclosing blocks other than the outermost block, it needs no static link and this line can be omitted.

(f,j) If the called routine does not call other routines and does not need extra registers, these lines may be omitted and the return address may remain in the RA register through the call.

(g,h,i) If the called routine does not call others and does not need extra registers, these lines may be omitted. If this is done, all references to local variables or parameters in memory must be indexed off of NR instead of AR.

The above notes indicate, in a fair amount of detail, the kind of optimizations that good compilers such as the GNU C++ compiler make in order to minimize the cost of calls. This minimization is only possible if the compiler knows all of the characteristics of the called routine at the time the code for the call is compiled. Therefore, calls to other functions in the same compilation unit will typically be more efficient than calls to functions in other compilation units, and calls to previously compiled routines may be more efficient than calls to routines that have not yet been compiled, but only if the compiler has access to a sufficiently detailed description of the called routine at the time it compiles the call.
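
For example, consider the following C fragment. Exact behavior varies from compiler to compiler, but a small leaf function visible in the same compilation unit can typically be called with almost none of the machinery above, or inlined outright, while a call to an external function must follow the full standard calling sequence:

	/* Visible in this compilation unit: the compiler can see that this is
	   a leaf, that it uses few registers, and that it needs no activation
	   record in memory, so the call can stay in registers or be inlined. */
	static int square(int x)
	{
	    return x * x;
	}

	/* Declared elsewhere: the compiler must assume the worst and emit
	   the full standard calling sequence. */
	extern int external_function(int x);

	int use_both(int a)
	{
	    return square(a) + external_function(a);
	}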

Interrupts

Interrupts have been described as calls to parameterless functions that return no results and are initiated not by the calling code but by events detected by hardware outside the CPU such as the completion of an input/output transfer.

In fact, interrupts are not so simple except in the simplest of computers! The original DEC PDP-8 had one of the simplest interrupt mechanisms. On interrupt:

  1. The program counter was saved in location zero.
  2. The interrupt enable flipflop was reset.
  3. The program counter was set to 1.

It was up to the interrupt handler, starting at location 1, to save the other registers in the CPU, determine what caused the interrupt, and eventually return by reenabling interrupts and doing an indirect jump through location zero.

This simplicity is misleading! The PDP-8 CPU had an Interrupt Enable bit, IE. If IE was set to zero, the instruction execution cycle would not handle interrupts. The occurrence of an interrupt automatically reset IE, but on return, it was up to the software to enable interrupts; this meant that there was a one-instruction window between the instruction that enabled interrupts and the instruction that completed the return. An interrupt could not be allowed to occur within this window, so additional hardware had to be added to prevent interrupt processing between the enable interrupt instruction and the following instruction. Things got even more complex in the context of the PDP-8 memory management unit because interrupt processing always took place in bank 0 of memory, and this meant that the MMU state had to be saved and restored.
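
The whole mechanism, including the delayed effect of the interrupt-enable instruction, can be sketched as a C fetch-execute loop. This is an illustration of the idea, not a faithful PDP-8 model; in particular, the ion_pending flag and the two-instruction decoder stand in for the real hardware:

	#include <stdint.h>
	#include <stdbool.h>

	static uint16_t mem[4096];       /* 4K words, PDP-8 style               */
	static uint16_t pc;
	static bool     ie;              /* the interrupt enable flipflop       */
	static bool     ion_pending;     /* models the delayed effect of ION    */
	static bool     irq;             /* set when some device wants service  */

	static void execute(uint16_t instruction)
	{
	    if (instruction == 06001)        /* ION: enable interrupts, delayed  */
	        ion_pending = true;
	    else if (instruction == 05400)   /* JMP I 0: the return jump         */
	        pc = mem[0];
	    /* ... the rest of the instruction set is omitted here ...           */
	}

	void cpu_cycle(void)
	{
	    if (ie && irq) {             /* take the interrupt                    */
	        mem[0] = pc;             /* 1. save the program counter in word 0 */
	        ie = false;              /* 2. reset the interrupt enable flipflop*/
	        pc = 1;                  /* 3. the handler starts at word 1       */
	        return;
	    }
	    uint16_t instruction = mem[pc];
	    pc = (pc + 1) & 07777;            /* pc points past the instruction   */
	    bool enable_now = ion_pending;
	    ion_pending = false;
	    execute(instruction);
	    if (enable_now)
	        ie = true;                    /* ION takes effect only after the
	                                         following instruction completes  */
	}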

On the DEC PDP-11 and many stack architectures interrupts were handled quite differently! On interrupt, the previous program counter and processor status word were pushed onto the stack, and then the program counter (PC) and processor status word (PSW) were loaded from the interrupt vector. This again looks simpler than it is, particularly on systems with multiple processes, each with its own stack, and even more so when there is a memory management unit so that each process has its own address space.

The PDP-11 processor status word included the condition codes, an indication of the priority level at which the processor was running, and a field dedicated to controlling the optional memory management unit. The priority field was a 3-bit field. If the CPU was running at high priority, all but the non-maskable interrupts were disabled. At lower priority levels, more interrupts were enabled, until at the lowest level, all were enabled.

The interrupt vector contained 8 PC/PSW pairs, one corresponding to each of the 8 priority levels. The interrupt request line from a device could be attached to one of the 8 levels, so that when that device requested an interrupt, the PC and PSW would be loaded from the corresponding entry in the vector.

The PDP-11 had a special return-from-interrupt instruction that popped the top two words from the stack into the PC and PSW. So long as there was sufficient space on the stack of the currently running application, interrupt service routines could operate entirely in the context of that stack, popping everything they pushed prior to return and restoring the stack to its state prior to the interrupt as they returned.
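
The same sketch style works here; this is only an illustration of the mechanism described above, with an eight-entry vector of PC/PSW pairs and invented names:

	#include <stdint.h>

	static uint16_t pc, psw;
	static uint16_t sp = 32768;
	static uint16_t mem[32768];               /* word-addressed, for simplicity */

	struct vector_entry { uint16_t new_pc, new_psw; };
	static struct vector_entry vectors[8];    /* one PC/PSW pair per level      */

	/* Interrupt entry: push the old context, then load the new one. */
	void take_interrupt(int level)
	{
	    mem[--sp] = psw;
	    mem[--sp] = pc;
	    pc  = vectors[level].new_pc;
	    psw = vectors[level].new_psw;
	}

	/* Return from interrupt: pop the top two words into PC and then PSW. */
	void rti(void)
	{
	    pc  = mem[sp++];
	    psw = mem[sp++];
	}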

Modern Interrupt Models

The trend in more recent architectures is generally away from the complexity of CISC interrupt models and toward a simple model where the interrupt hardware has no knowledge of the structure of the stack, if there is a stack! Thus, the interrupt may simply save the program counter and processor status word (including the basic MMU state) in special registers, set the program counter to a constant that depends on the device that requested the interrupt, and clear the processor status word. From that point on, it is up to the interrupt service routine (application software) to save things in the appropriate places.
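
A sketch of this style of entry, in which the hardware touches only a pair of special save registers and generates no memory traffic at all; the register names, handler spacing and PSW encoding are invented for the example:

	#include <stdint.h>

	static uint32_t pc, psw;
	static uint32_t exc_pc, exc_psw;      /* special save registers            */

	/* The hardware side of interrupt entry: no memory references. */
	void take_interrupt(uint32_t handler_base, unsigned device)
	{
	    exc_pc  = pc;                     /* where the application was         */
	    exc_psw = psw;                    /* its mode, MMU state, and so on    */
	    pc  = handler_base + 64 * device; /* a constant per requesting device  */
	    psw = 0;                          /* privileged mode, interrupts off   */
	}

	/* Everything else -- saving exc_pc, exc_psw and any registers the handler
	   uses, in whatever place the operating system prefers -- is software.    */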

This does not mean that the state save and restore process is slow compared to earlier machines! When the interrupt hardware pushes state on a stack, and that stack is the wrong place to save it (as is frequently the case on machines with one stack and one address space per process), the software may have to pop the information off that stack and save it elsewhere. Furthermore, the expensive part of interrupt handling is frequently the job of saving and restoring other state information, for example, the contents of registers. Registers are relatively inexpensive, and since the 1970s, many machines have provided multiple register banks, reserving a field of the processor status word to select the currently active register bank. When this is done, the change of processor status word on entry to and exit from the interrupt service routine is all that is needed to save and restore the bulk of the registers needed by an application.
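
The register-bank idea can be sketched the same way: the register file is indexed first by a bank number taken from a field of the PSW, so loading a new PSW on interrupt entry implicitly switches the entire register set. The field position and bank count here are, again, invented:

	#include <stdint.h>

	#define NBANKS 4
	#define NREGS  16

	static uint32_t psw;
	static uint32_t banks[NBANKS][NREGS];   /* the whole register file         */

	/* Assume, for this sketch, that two bits of the PSW select the bank. */
	static unsigned current_bank(void)
	{
	    return (psw >> 4) & (NBANKS - 1);
	}

	/* Every ordinary register access goes through the current bank, so
	   loading a new PSW on interrupt entry switches the whole register set
	   without a single memory reference. */
	uint32_t read_reg(unsigned r)              { return banks[current_bank()][r]; }
	void     write_reg(unsigned r, uint32_t v) { banks[current_bank()][r] = v; }

	/* The kind of privileged, OS-only cross-bank access discussed at the end
	   of this section: reach into a bank other than the current one. */
	uint32_t read_other_bank(unsigned bank, unsigned r) { return banks[bank][r]; }
	void     write_other_bank(unsigned bank, unsigned r, uint32_t v) { banks[bank][r] = v; }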

Of course, this raises the question of how many register banks a machine should have. Two banks allow applications to share one while interrupt service routines use the other. So long as interrupt service routines cannot themselves be interrupted, this is straightforward. More banks add complexity to the system because register banks become another system resource that must be allocated. So long as the number of active processes is smaller than the number of register banks, the situation is not too difficult, but if there are, say, 16 register banks and 17 active processes, suddenly the presence of multiple register banks becomes a serious problem.

Another problem that arises when there are multiple register banks is that of system access to data in a different bank. Allocating a register bank to a process requires loading that process's registers into the bank, and while this is being done, the operating system will typically be using some other bank. This requires a special set of instructions, just for the operating system, for moving data to and from register banks other than the current bank. Generally, these instructions are very obscure, and like all special system instructions, they aren't supported by compilers and force the operating system designer to descend to the assembly language level, something nobody likes to do these days.