34. Interrupts on Pipelined Systems

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

The Problem

The complex interrupt model of the PDP-11 and many later CISC machines (the M 68000, the VAX, and the Intel 80x86) is straightforward to implement on a microcoded machine, but entirely outside the model of pipelined execution. Thus designers of pipelined implementations of such machines are forced to break out of the pipeline model in much the same way as they do when resorting to repeated shift-and-add implementations of multiplication or when implementing memory-to-memory block-move instructions.

When designers of pipelined machines have the opportunity to design a new interrupt model for their machines, they usually aim for a very simple model.

Consider the following model: For each interstage register, there is an interrupt register. On interrupt, the contents of all interstage registers are latched into the corresponding interstage registers, and then everything in the pipeline is invalidated and the program counter is set to a fixed value determined by which interrupt request line has been asserted.

                  The Pipeline     Interrupt
		 and interstage    registers
		   registers
                     ______ 
                 IF |______|         ____ 
                     |>___|  <----> |>___|
                 AC |______|         ____ 
                     |>___|  <----> |>___|
                 OF |______|         ____ 
                     |>___|  <----> |>___|
                 OP |______|         ____ 
                     |>___|  <----> |>___|
                 OS |______|

	                      ----> copy right on interrupt
	                     <----  copy left on return

The return from interrupt instruction simply latches all the interrupt registers into the corresponding interstage registers. Thus, on return from interrupt, any partially executed instructions that were interrupted will completed, with no delay to load up the pipeline. On the other hand, if someone asks "between which two instructions did the interrupt occur", there is no clear answer!

How do you save the state of the interrupted process? How do you allow one interrupt routine to be interrupted by a higher priority interrupt? One way to solve these problems is to add a a pair of new machine instructions for copying to and from the interrupt registers so that they can be saved in RAM and restored from RAM. Most modern CPU's contain a number of such special registers, for example, registers that allow access to hardware diagnostic functions, so it is not an undue burden to include these interrupt registers in the set of strange registers available for occasional use within supervisory code.

System programmers may not like the idea that the processor state of an interrupted process contains a diffuse set of values that have nothing to do with the visible state of that process, and authors of debuggers may be very dissapointed by the fact that they can't stop a process and look at its state as if it had stopped between two specific instructions. This is a real concern, although software that understands the pipeline could reconstruct a view of the state as if the machine was not pipelined.

An alternative way to handle interrupts in a pipelined machine is to invalidate all instructions that have yet to be committed (typically in the result-store stage of the pipeline), reporting the address of the instruction that was about to be completed as the program counter value to save, and then setting the program counter to the first instruction of the appropriate interrupt service routine. On a simple pipelined machine, this takes one clock cycle, and if the saved PC is stored in a special register, it takes no memory cycles, so it is very much compatable with simple pipelines.

On a machine supporting any form of instruction reordering, this approach fails, so we must adopt yet another approach! One option is to handle the interrupt in stages, first disabling the instruction fetch pipeline stage and then allowing all partially executed instructions to drain out of the pipe before saving any state information and setting the program counter to point to the first instruction in the interrupt handler. This slows the interrupt response, but probably not by enough to be of serious concern except in applications with very tight real-time constraints.

Multiple Register Sets

Many machines, starting in the early 1970's, had multiple sets of registers in the CPU. The register select field of the instruction provided the least significant bits of the register select signal to the register bank, while the most significant bits came from a field of the PSW. Models of the PDP-11 starting with the 11/45 had two banks of registers. The Z80 microprocessor had 2 partial banks, and the Modcomp IV computer had 16 register sets in the CPU. Such features became fairly common in designs from the 1980's. On such machines, interrupt service routines operate without saving or restoring registers so long as the number of nesting levels for interrupt calls was smaller than the number of distinct register sets.

This model can be applied to all of the interrupt mechanisms suggested here for pipelined machines. If it is applied to a pipelined machine with interrupt save registers on each interstage register, we must include a bank of interrupt save registers along with each bank of general registers.

The potential for very fast interrupt response is quite impressive with this approach because nothing at all must be saved and restored from memory for most interrupts! However, when the sum of running processes and levels of interrupt nesting exceeds the number of register sets, the context switching speed of the system will drop very suddenly. A system that very gracefully handled 16 processes may perform very poorly with 17 processes. This problem has been quite visible on some versions of the Sun SPARC family of RISC processors.

Traps

It is common to describe traps as interrupts caused directly by the instruction execution cycle within the CPU. For example, the CPU may force a trap when a user attempts to divide by zero, when an add instruciton overflows, when an unimplemented opcode is found, or when a reference is made to an illegal memory location.

Traps are indeed similar to interrupts, but there is one crucial difference. When a trap occurs, the instruction being executed is aborted. If at all possible, the instruction should have no side effects, so the CPU state saved as a result of the trap should be the state immediately prior to the instruction that caused the trap. This allows the trap service routine to inspect the cause of the trap and, in some cases to repair the problem.

For example, software may emulate instructions that are not available in this version of the hardware, for example, to do floating point in software because the optional floating point processor is not present on this machine. Or, for example, virtual memory software may respond to a reference to an illegal memory address by updating addressing data in the memory management unit, performing disk I/O, and restarting the instruction that caused the trap.

Some CPUs have instructions that cannot be executed without side effects. For example, the DEC PDP-11 autoincrement addressing modes may increment one register as it is used to fetch a source operand before starting to compute the destination address. If the destination address causes a trap, it is not trivial to undo the autoincrement that was already completed. This was only a problem for programs that attempted to recover from traps, and the only place this issue commonly arose was in the context of virtual memory.

Various PDP-11 implementations have solved this problem in different ways. The PDP-11/45, for example, had a special processor status register that was zeroed at the start of each instruction; this register was part of the memory management unit option, and was not included in the basic machine. As the CPU executed instructions, fields of this register recorded which registers had been incremented or decremented, and by how much. When a trap occurred, the trap service routine could then, in software, undo the side effects of the partially executed instruciton before doing anything else.

On a pipelined machine, traps are even more problematic. IBM's first pipelined architecture took the easy way out. If any pipeline stage raised a trap request, all instructions in the pipeline were discarded as control was transferred to the trap handler. The documentation for the machine identified, for each trap condition, whether the PC saved as a result of the trap was accurate or not. Many traps on this machine were "imprecise traps" where you knew, for example, that there had been a floating point error, but the most you could say was that it was in an instruction somewhere near the PC value that the trap hardware saved.

Some of the interrupt mechanisms discussed above will naturally lead to imprecise traps, while others lead to precise traps! It is an interesting exercise to work out which interrupt mechanisms can be easily made to deliver precise traps, and which cannot.

Generally, if the machine does not reorder instructions, and if there is exactly one pipeline stage that finalizes instrucitons (the result store stage in our example), all traps can be made precise by having the finalize stage initiate all trap processing. If a pipeline stage prior to the finalize stage detects a condition that must be trapped, it adds a trap request to the data passed onward in its outgoing interstage register. Only when this request reaches the finalize stage does the CPU cease executing instructions from the original source and begin execution of the trap service routine.

This mechanism can be somewhat improved by having the pipeline stage that requests the trap also stall, hanging in the stalled state until the finalize stage receives the trap request. If, in additon, instructions that could cause traps refuse to allow reordering to re-arrange them, it becomes possible to guarantee precise traps in all circumstances.

On the DEC Alpha, the decision was made to avoid this complexity because most traps do not need to be precise! On the Alpha, there is a special non-reorderable instruction called make-precise. If, as a programmer, you need a precise trap to tell you exactly which instruction was, for example, dividing by zero, you add make-precise instructions after each instruction that it might be. Code compiled with debugging turned on might do this, so that error messages would point to specific instructions, while production code would not. Additionally, the Alpha was designed in such a way that sufficent information was available to the handler for memory management unit traps that this software could, for example, resolve page faults without needing precise traps.