Chapter 13, Virtual Memory

Lecture notes for 22C:116 Introduction to System Software, by Douglas W. Jones, University of Iowa Department of Computer Science

Virtual Memory

Most computer users take the main memory resources of computers for granted, assuming that they are completely implemented by hardware and thus, aside from the problem of storage allocation, not an object dealt with by system software. This is indeed the case for small computer systems, but not for many large systems. The reason is that very large high speed memories are expensive, so there is great economic pressure on system designers to use a small high speed memory and a large low speed memory where users expect a single high speed memory. This memory hierarchy frequently extends to more than two levels, for example, there might be a few thousand bytes of very high speed cache memory between the CPU and main memory; the main or primary memory itself may consist of a few megabytes of medium performance memory, and this might be backed up by hundreds of megabytes of secondary or backing storage implemented on a disk drive.

The interface between a cache memory and main memory is almost always managed by hardware, but the interface between main memory and backing storage is usually largely in software. On early systems, this interface was under user program control; when a program needed data or code, that program was responsible for bringing it into main memory. The problems of overlay management which result from this have consumed a tremendous amount of programmer effort and have motivated the development of complex software tools such as the overlay linkers discussed in Chapter 7. None of these solutions has been entirely successful, and in the early 1960's, an alternative was developed: virtual memory.

In a virtual memory system, the burden of overlay management is moved from the programmer to the operating system and the hardware, working in cooperation. When a program requests the use of an object which is not currently in main memory, the system is automatically informed by the hardware so that it can bring the needed object into main memory. The user program sees no difference between an access to a word of data which was previously in main memory and one which had to be moved from secondary memory except a difference in access time. Since most programs exhibit good locality of reference, most memory accesses will be fast, and the average speed of execution of a program will be near that which the program would attain if all of the memory were fast memory.

In addition to the sizes of the primary and secondary memory devices, three key questions must be answered to obtain a complete description of any virtual memory system. The first of these is: What are the units of transfer between main memory and the backing storage device? This has two common answers, pages, or fixed size blocks of information, and segments, or variable sized blocks. Typically, the page size used in a virtual memory system is a small multiple of the sector size of the backing storage device. When segments are used, each segment is typically a logical unit such as the code for a procedure, or a particular data object.

The second question is: What causes a transfer from backing storage to main memory? Demand transfers are those which are initiated by an attempt of a program to access a page or segment which is not currently in main memory, while anticipatory transfers are those made by the system in anticipation of a user demand. Some systems also allow a user program to request a transfer in anticipation of a specific later need.

The final question is: What policy is used to select objects in main memory which are to be moved back to the backing store when the space they occupy is needed for other pages or segments? There are a number of replacement policies, ranging from random replacement to least-recently used replacement. The choice of an appropriate replacement policy is one of the key variables determining the performance of a virtual memory system, since the premature replacement of a page or segment which is still being used can significantly slow the system.

Paging versus Segmenting

The hardware being used largely determines whether a virtual memory system uses paging or segmenting, and indeed, whether a particular computer system can be used to implement virtual memory at all. If virtual memory is to be used, the hardware must provide some facilities for translating virtual addresses, as issued by a user program, to physical addresses, as used to access actual words or bytes in the main memory. This address translation hardware is responsible for detecting attempts by a program to access objects which are currently on secondary memory, and it isolates programs from any knowledge of the physical addresses actually being used to store objects in main memory. In this latter role, virtual memory hardware can be thought of as performing a run-time relocation function.

When segmented memory is used, each memory address issued by the user program must consist of the identity of a segment and the offset of the desired item within that segment, as shown in Figure 13.1.

segment number      address within segment
 ____________        ____________________
|____________|      |____________________|

Figure 13.1. A segmented virtual address.

For example, on the Intel 80286 (and to a lesser extent, the Intel 8086), each memory reference is identified by the hardware as being addressed to either the current code segment, the current stack segment, or one of two additional segments known as the extra segments. Each segment may be from 0 to 64k bytes long, and each is described by a base register and a limit register. All later processors in the Intel 80x86 family support this addressing model, but this addressing model is rarely used on more recent family members.

On the 80286, all instruction fetches are from the instruction segment, and all operand references are, by default, from the stack segment. There is a special form of every memory reference instruction to override this default and explicitly control what segment is used for the operand. There are additional instructions for loading these base and limit registers; in protectec mode, these instructions load the base and limit registers from a 'segment description segment', the processor base segment, which lists all of the segments a program may currently access. A virtual memory system on this machine would quite naturally transfer data between primary and secondary memory in units of one segment.

When paged memory is used, each memory address issued by the user program is typically interpreted as a single integer by that program, but the hardware breaks the address into two fields. The most significant field is the page number, while the least significant field is the word (or byte) number within that page, as is shown in Figure 13.2.

            address                 --- user view
 _______________________________
|___________|___________________|

page number   address within page   --- system view

Figure 13.2. A paged virtual address.

For example, a 16 bit virtual address may be broken into an 8 bit page number, selecting one of 256 addressable pages, and an 8 bit specification of the byte desired from that page. Unlike segments, pages are all the same size, and user data and program structures are usually placed in memory without regard to how those structures are broken up into pages.

Independently of whether paged or segmented memory is used, it is conventional to refer to pages and segments as components of the user's address space, as opposed to referring to them as units of storage in memory or on disk. Thus, a particular page or segment may reside in main memory at some times and on disk at other times. The place where the segment is stored does not change its identity, and the segment may be moved from one place to another many times during the execution of a program.

When the virtual address space is broken up into pages, the term page frames or just frames is conventionally used to refer to the areas in main memory which may hold pages, and the term may also be used to refer to locations in secondary memory (disk) that hold pages, particularly if the page size is not one sector, and if pages may move from one disk address to another during the life of a program.

When a user attempts to address a page or segment which is not currently in some frame in main memory, that event is called a page fault. The term segment fault can be defined similarly, but there is nothing corresponding to a 'segment frame' because there is no standard segment size.

Most modern systems combine paging and segmenting; thus, each segment consists of one or more pages. This is a particularly useful when the maximum size of a segment is such that bringing an entire segment into main memory would be difficult. A second advantage of this is that it eliminates the need to allocate variable sized blocks of main memory to hold entire segments. When paging and segmenting are combined, the virtual address issued by a user program consists of a segment number and an integer offset within that segment, and the system interprets the offset as consisting of two fields, the first giving the page number within that segment, and the second giving the desired word in that page, as shown in Figure 13.3.

segment number            address within segment        --- user view
 ____________        _______________________________
|____________|      |___________|___________________|

segment number      page number   address within page   --- system view

                         or

                      address                           --- user view
 ___________________________________________________
|_______________|_______________|___________________|

segment number     page number    address within page   --- system view

Figure 13.3. Paged, segmented virtual addresses.

The Intel 80386 and higher numbered processors (including all Pentiums) take this view, as do the Motorola 68030 and up, and most RISC processors, including the IBM/Apple Power PC, the HP PA-RISC and the SGI MIPS families of processors. In all these machines, addresses used by the user program are 32 or more bits, and are broken into 3 fields, as illustrated in the second example above. Some operating systems on these machines use segments to hold logical objects such as major parts of the program, while others allow users to ignore the hardware segment boundaries and treat the entire address as an integer.

Paged Virtual Address Translation Hardware

Operating systems that support virtual addressing rely on the services of a hardware component called the memory management unit or MMU. The memory management unit sits logically between the CPU and the main memory. For some computers, the memory management unit is an optional component, while for others, including most modern microprocessors, the memory management unit is seen by the user as being part of the CPU, but however it is packaged, the memory management unit takes, as input, the virtual addresses issues by the program and produces, as output, the physical addresses used by the memory. This is illustrated in Figure 13.4.

 ____________                                         ________
|            |                                       |        |
|   Central  |<=====================================>|        |
| processing |             ____________              | Memory |
|    Unit    |===========>|            |============>|        |
|____________|  virtual   |   Memory   |  physical   |________|
                address   | Management |  address
                          |    Unit    |
                          |____________|

Figure 13.4. The memory management unit, in context.

When a user program issues a virtual address in a paged system, the memory management unit hardware must signal a page fault if the desired page is not in main memory, and the memory management unit must find the appropriate physical address if the page is in main memory. There are two approaches to this, one centered on the use of a page table, and one centered on the use of a frame table. A page table is an array, indexed by the page number field of a virtual address, which has entries for each page indicating where that page is currently stored. A frame table is an array, indexed by frame number, indicating which page is currently stored in each frame. In fact, most operating systems that use paged virtual memory have both tables, but the memory management unit hardware only needs to know about one or the other to do the address translation job.

The use of a page table is illustrated in Figure 13.5.

      virtual address          -- from CPU
 ___________________________
|___________|_______________|
page number   address in page
      |             |___________________ 
      |                                 |
      |     __________________          |
      |    |_____|____________|         |
index |    |_____|__        __|         |
 into |    |_____|__  page  __|         |
table  --->|_____|__  table __|         |
           |_____|____________|         |
           |_____|____________|         |
            valid    address            |
              |         |               |
              |     ____________________________
              |    |____________|_______________|
To CPU <------     frame number   address in page
               
                          physical address  -- to memory

Figure 13.5. The use of a page table.

As shown, the page number field of the virtual address is used as an index into the page table. Each page table entry has at least one bit indicating whether that entry refers to a page in main memory or to one in secondary memory; if the page is in main memory, the hardware considers a reference to that page to be valid; if not, it is an invalid address and the operating system must be informed. The remainder of the page table entry for a valid page is the frame number where that page is stored; for invalid pages, the software may use this field for something else, perhaps the disk address of the page.

If the virtual address space of a user program is small, for example, 256 pages, the entire page table may be held in special registers within the central processor. If the virtual address space is large, the page table must be held in main memory, in which case, there is typically a page-table-base register in memory management unit. When the page table is stored in main memory, it would appear that each memory access made by a user program would require an extra access to fetch the required page table entry, but most paging systems avoid most of these extra accesses by using a special cache memory inside the memory management unit to store the most recently used page table entries.

On most modern systems the page table itself can grow too large to fit conveniently in main memory. Some systems have overcome this problem by using combined paging and segmenting, so that only the page tables for recently used segments need be in memory at one time. Another approach is possible! The Digital Equipment Corproation (now part of Compaq) VAX family of computers, built between the mid 1970's and the early 1990's, stored the page table in the virtual address space; as a result, only the recently used pages of the page table would typically appear in main memory, whilt the remainder of the table would be stored on disk. Of course, this means that the VAX memory management unit may have to report an error in translating the address of the page holding the address of the page requested by the central processor, which complicates the picture.

The use of a frame table is illustrated in Figure 13.6.

      virtual address         -- from CPU
 ___________________________
|___________|_______________|
page number   address in page
      |             |______________
      |                            |
 page number                       |
 ___________  ---                  |
|___________|  |                   |
|__       __|  |-----  search      |
|__ frame __|  |     |             |
|__ table __| ---    |             |
|___________|        |             |
|___________|        |             |
    miss       ______|_____________|_______
      |       |            |               |
      |        ----------------------------
      |          frame number   address in page
  <---
  to CPU             physical address  -- to memory

Figure 13.6. The use of a frame table.

If a frame table is used, the page number field of the virtual address must be compared with each entry in the table to determine which frame holds the desired page. If there is no match, the address is invalid and the operating system must be informed. If there is a match, the index of the table entry which matched is used as the frame number in constructing the physical address. The requirement that the frame table be searched with each memory reference may sound very expensive, but the frame table is usually stored in an associative memory, a special high speed memory that allows this search to be carried out in parallel, and as a result, it can be very fast!

When a valid physical address is found as a result of virtual address translation, the instruction execution cycle continues as it would for a simple machine with no virtual addressing; ideally, the memory management unit is so fast that the CPU operates at the same speed it would have had there been no virtual memory mechanism. If the virtual address is determined to be invalid, on the other hand, the execution of the current instruction must be terminated and the operating system must be informed. This is called a page fault.

The transfer of control from the user program to the system when a page fault is called a page fault trap. In general, traps are similar to interrupts, control is transferred from the program that was running to a trap service routine which is very similar to an interrupt service routine, and when the trap service routine is done, it may return to the program that was running. There are two primary differences between traps and interrupts. First, interrupts are caused asynchronous events such as the completion of an input output operation, while traps are a direct consequence of the execution of instructions on the processor. Second, if an interrupt is requested while the processor is in the middle of executing an instruction, the processor completes that instruction before it services the interrupt. On the other hand, if a trap is requested while the processor is in the middle of executing an instruction, the processor aborts that instruction and services the trap immediately.

Demand Paging

If a system is to support demand paging, the invalid address trap service routine must update the page and frame tables after moving the desired page into main memory. After doing so, it must return to the program which caused the page fault in such a way that that program is able to continue the execution of the instruction which caused the page fault. The above description requires that the hardware possess two key properties: First, the service routine must be able to find the address referenced by the user that caused the page fault, and second, it must be possible to continue the execution of the user program after a return from the service routine.

Ideally, the hardware should load the virtual address which caused a page fault into a special register or memory location so that the trap service routine has immediate access to it. Many virtual memory systems have been constructed with much less convenient hardware, for example, some provide only values of all of the user's registers at the time of the trap. On such systems, the trap service routine must simulate the execution of the instruction which caused the trap in order to determine which memory addresses were issued by that instruction, and then test each of them to see if it would cause a page fault trap.

There are a number of ways that hardware can be built so that the execution of the user program can be continued after the cause of the page fault has been corrected. The approach most convenient for the software would involve having the CPU store all of its internal state information when a trap occurs; this would allow each instruction to be interrupted at any point and then resumed. In fact, this would require very expensive hardware, so most systems require that the instruction which was executing at the time of the trap be restarted after the trap has been serviced. As long as the instructions have no side effects prior to the occurence of a page fault, this approach leads to both simple hardware and software.

On most machines which support virtual memory, the instruction set is carefully designed so that the validity of all addresses issued by an instruction can be tested before the instruction has any side effects. There are exceptions to this; for example, on the Digital Equipment Corporation PDP-11/45, a segmented machine introduced in the mid 1970's, many instructions could increment or decrement up to two registers prior to the time a segment fault is detected. On this machine, special auxiliary registers were added to record which registers in the CPU were incremented by the current instruction, and by how much, prior to the fault. As a result, the trap service routine could undo any side-effects of the instruction prior to restarting it.

A significant number of machines have been constructed which have virtual address translation hardware but which do not have instruction sets which allow demand paging. On these machines, an invalid address trap cannot be interpreted as an indication of a page fault because there is no way to restart the instruction which caused the trap. The virtual address translation hardware is still useful for two reasons: First, although demand paging has been ruled out, the operating system may still allow programs to explicitly request that pages of their address space be moved in and out of main memory. Second, the use of virtual address translation can allow each user program to run in its own virtual address space, thus making unintended communication between user programs impossible and greatly enhancing system security.

An Example Page Fault Service Routine

In order to illustrate the construction of a page fault trap service routine, some assumptions must be made about the hardware. These involve such things as how virtual addresses are translated, what needs to be done to restart the instruction which caused the trap, and how disk input/output is done when the contents of a frame must be copied to or from disk. Additional assumptions must be made about how the software uses the disk for backing storage. These issues are discussed in the following paragraphs.

The example presented here will assume that the hardware for virtual address translation is based on a page table, as shown in Figure 13.5, where the table is stored in main memory at the address contained in a special page table base register within the memory management unit. As with the example peripheral device interface registers used in previous chapters, the page table base register will be assumed to be accessible using the inp and outp functions; this is typical of machines where the memory management unit is an optional device that was designed independently of the central processor.

The example will assume that the virtual address translation hardware maintains a register which always holds the most recent virtual address which caused a page fault trap. As a result, the page fault service routine need not try to simulate the instruction which caused the trap in order to find this address.

This example will assume that all instructions are restartable; that is, that no instruction has any side effects until all virtual addresses issued by that instruction have been translated to physical addresses. As a result, the page fault service routine need not try to undo any side effects before returning to retry the instruction which caused the trap. It will be assumed that a simple return from trap will restart the instruction in question; that is, the program counter passed to the trap service routine points to the instruction which caused the trap, and has not yet been incremented.

Finally, it will be assumed that all disk input/output requests use virtual addresses to specify the buffers being used. This complicates the disk interface, either at the hardware or low-level software level, but it simplifies our job in this section. Our virtual memory software will never attempt a disk transfer to or from a virtual address that is not currently in main memory.

As mentioned above, some assumption must be made about how virtual addresses are associated with disk addresses. The assumption which will be used here is that the virtual address space is mapped to a single disk file; as a result, each page of the virtual address space can be stored on one sector of that disk file, and the DISKREAD and DISKWRITE routines given in Figures 12.3 can be used for moving pages to and from disk.

One final issue must be addressed before a complete example can be presented: Where is the page fault trap service routine stored if all of memory is virtual? Certainly, the page fault trap service routine cannot be allowed to be copied out to disk! One solution to this problem is to permanently store the first few pages of the virtual address space in the first few frames, so that none of the interrupt service routines will ever be moved to disk. An alternate solution to this problem is to introduce two modes of central processor operation, one mode in which virtual address translation is turned on, and one mode in which it is turned off so that programs may directly access main memory; in this case, user programs run with virtual address translation turned on, but system programs such as the page fault service routine, input/output drivers, and other service routines run with virtual address translation turned off. The example page fault service routine presented here will assume the first solution.

The data structures which the example page fault service routine requires are shown in Figure 13.7.

const bytesperpage = ... { number of bytes per page };
      maxpage = ... { highest page number in virtual address space };
      maxframe = ...  { highest frame number in main memory };
      PAGETABLEBASE = ... { input/output register in MMU }
      TRAPADDRESS = ... { input/output register in MMU }

type pagenumber = 0..maxpage;
     framenumber = 0..maxframe;

var  pagetable: array [pagenumber] of record
          frame: framenumber { what frame is this page in };
          valid: boolean { true if page in main memory };
             .
             .  { other fields may be needed }
             .
     end;

     frametable: array [framenumber] of record
          page: pagenumber { what page is in this frame };
          used: boolean { true if a page is in this frame };
           .
           .  { other fields may be needed }
           .
     end;

     pagefile: opendiskfile;

Figure 13.7. Data structures for page fault management.

Initially, the page table base register is set to point to the page table, the page table is initialized so that its first few locations refer to the frames occupied by the operating system, and the remainder are initialized to invalid, indicating that the desired pages are currently on disk. The frame table is initialized to indicate which frames hold pages of the operating system, and the remainder are marked as unused. The frame table is, in a sense, redundant, since it can be reconstructed from the page table, but it is convenient to keep both data structures.

The page fault trap service routine that operates with the data structures from Figure 13.7 will typically be structured as follows:

procedure pagefaultservice;
var
     trapaddress: integer;
     page: pagenumber;
     frame: framenumber;
     done: integer;
begin
     { first, get the address of the instruction that caused the trap }
     trapaddress = inp(TRAPADDRESS);
     page := trapaddress div bytesperpage;

     { second, find a free frame }
     frame := findframe;

     { third, adjust the page table and frame table }
     pagetable[ page ].frame := frame;
     pagetable[ page ].valid := true;
     frametable[ frame ].page := page;
     frametable[ frame ].used := true;

     { copy the page from disk }
     DISKREAD( pagefile, makeaddress(page * bytesperpage), 
               bytesperpage, page, done );
     waitdiskdone( done );

     { return to the user }
end { pagefaultservice };

Figure 13.8. A page fault service routine.

The findframe function in the code in Figure 13.8 searches the frame table for an idle frame; if such a frame is unavailable, it forces some page out to disk to create an idle frame. This code assumes that the makepointer function takes an integer representing a memory address and converts it to a pointer to that address, overriding the strict typechecking of Pascal.

Page Replacement Policies

The findframe function used in the page fault service routine shown in Figure 13.8 may occasionally find an idle frame, but it will usually have to force some other page out of main memory in order to make an idle frame. The particular policy used to decide which page to force out is called the page replacement policy. The selection of an appropriate page replacement policy can have a tremendous impact on the performance of a virtual memory system. For example, if the page which is forced out is one of the pages referenced by the instruction which caused the fault, there will be an immediate page fault when that instruction is restarted.

The optimal page replacement policy, developed by Laszlo Belady in the late 1960's, is to order the pages according to when they will be needed in the future, and replace that page which will be needed at the most distant future time. Unfortunately, Belady's optimal page replacement policy cannot be implemented without knowing the complete future history of the program which caused the fault. Since the only way to accurately predict the future is to wait until it is present, the only way to compute the future history of a program is to run it! As a result, the optimum sequence of page replacements can only be determined in retrospect. The fact that there is an optimal but impractical policy is useful because it allows more realistic policies to be judged in terms of how close to the optimum they come. We can run a program and collect the sequence of virtual addresses it references, compute how long the program would have taken to run using the optimal page replacement policy, and then compare this with actual run times under practical policies.

An eminently practical but somewhat stupid page replacement policy is to simply choose the page to be replaced at random. The random replacement policy makes no use of any knowledge of program performance, and can therefore be thought of as the worst policy that can be used without resorting to deliberately malicious policies. As such, the random policy also provides a useful benchmark for the judgement of other policies. Experimentally, random replacement has been shown to result in roughly three times as many page faults as Belady's optimal policy over a reasonable range of main memory sizes.

The first-in first-out page replacement policy is relatively easy to implement and has some intuitive appeal. In this policy, the page which has been in main memory the longest is selected for replacement. The intuitive appeal of this is that it seems fair, allowing all pages to stay in main memory for about the same amount of time, assuming page faults come at an approximately constant rate. Unfortunately, empirical tests of this policy have shown that it is no better than random replacement. Apparently, the fact that a page has been in memory a long time is not a good predictor of future use of that page.

The least recently used page replacement policy is much better, being perhaps the best policy which does not rely on actual prediction of future program behavior. This policy is based on the observation that most programs exhibit significant temporal locality. That is, if a page has been recently used, it is highly likely that it will be used again, and if it has been a long time since a page has been used, it is unlikely that it will be needed again. Least recently used replacement requires that the hardware record, for each reference to a page, the time of that reference (or something equivalent to the time). When it comes time to replace a page, a search of all page frames is conducted in order to find the frame holding the least recently used page. Although systems have been built which used this policy, the hardware required to record the time of last use is expensive, so most practical systems use some approximation. Empirical results suggest that least recently used replacement leads to no more than twice the number of page faults which would result from Belady's optimal policy over a wide range of practical memory sizes.

The clock replacement policy is a widely used example of an approximation of the least recently used policy. In this case, hardware is required to record references to each page, but only one bit is needed per page, the mark bit. When a page is used, the memory management unit sets the marked bit for that page in the page table; it is up to software to reset this bit. If the software periodically resets the bit, then those pages with the marked bit set were used more recently than those with the bit reset. As a result, it is easy to select one of the less recently used pages for replacement, but the particular page which is least recently used cannot be determined.

The name of the clock policy comes from the particular scheme used to reset the referenced bit and search for a replacible page. As shown in Figure 13.9, the clock policy considers the frames to be organized in a circle, and uses a 'clock hand' to point to the most recently replaced page.

          12      frames
     11       _ 1
             / |
   10       / /  (2)
           / /----------- clock hand
   9       o/     (3)
                       (moves clockwise)
   (8)            4

      7        (5)      Marked frames are
           6              parenthesized

Figure 13.9. An abstract view of clock replacement on 12 frames.

When a page fault occurs, the clock hand sweeps around the list of frames until it encounters a frame that is not marked. This is one of the less recently used pages, so it is the one selected for replacement. As the clock hand passes over frames during its search for an unmarked page, it erases the marks on the frames it skips. Given the initial state shown in Figure 13.9, after one page fault, the clock hand will point to frame number 4 and the marks on frames numbered 2 and 3 will be reset.

The rate at which the clock algorithm resets mark bits depends on the frequency of page faults. If there are few faults, the clock hand moves slowly and pages must remain unreferenced for a long time before they are considered to be candidates for replacement. If there are many faults, the clock hand moves quickly and pages need only remain unreferenced for a short time before they are considered for replacement. Empirically, the clock policy performs only slightly worse than the basic least recently used policy, so it is an excellent choice in real systems.

A natural implementation of the clock policy requires that the hardware keep one bit per frame, thus implying that the clock policy should be easiest to implement on systems where a frame table is used by hardware for virtual address translation. In fact, it is almost as easy to have the hardware associate a bit with each page in a page table, as is shown in Figure 13.10.

var  pagetable: array [pagenumber] of record
          frame: framenumber { what frame is this page in };
          valid: boolean { true if page in main memory };
          mark: boolean { set by hardware if page used };
            .
            .
            .
     end;

     clockhand: framenumber { the clock hand };

function findframe: framenumber;
begin
     { rotate clock hand to candidate frame }
     clockhand := (clockhand + 1) mod (maxframe + 1);
     while frametable[ clockhand ].used
       and pagetable[ frametable[ clockhand ].page ].mark
     do begin
          pagetable[ frametable[ clockhand ].page ].mark := false;
          clockhand := (clockhand + 1) mod (maxframe + 1);
     end { while };

     { hand now points to the frame which will be used }
     if frametable[ clockhand ].used then begin
          DISKWRITE(
               pagefile,
               makeaddress(
                    frametable[ clockhand ].page * bytesperpage
               ), 
               bytesperpage, page, done
          );
          pagetable[ frametable[ clockhand ].valid := false;
          frametable[ clockhand ].used := false;
     end;

     { return the frame }
     findframe := clockhand;
end { findframe };

Figure 13.10. Code for the clock page replacement policy.

Note that the declaration of pagetable in Figure 13.10 is a modification of that given in Figure 13.7, and that the findframe function shown here is called from Figure 13.8. This implementation of the clock policy is typical of modern virtual memory implementations.

The Working Set Concept

Empirical analysis of the behavior of programs in a virtual memory environment has shown that most programs have, at any time, what is called a working set of pages. Pages in the working set are frequently referenced, while pages outside the working set are not referenced at all. Intuitively, the working set holds the currently relevant procedures, the local data for those procedures, and any global data structures being manipulated by those procedures. Over time, the working set of a program may change, either gradually or abruptly. For example, the working set of a program which is slowly computing its way through an array will gradually change as the program advances from one page to the next of the array, but when the program finishes the computation there is likely to be an abrupt change in the working set as the program begins the next phase of its computation.

If the number of frames available to a program is smaller than the working set of that program, there will be a very high number of page faults and not much computation will be done. When this happens, the system is said to be thrashing. Where thrashing is a problem, it is quite common for immense performance improvements to result from the addition of very small amounts of main memory. If the number of frames available to a program is larger than its working set, the number of page faults depends primarily on the rate at which the working set changes and has little to do with the number of available frames. The number of page faults only falls to zero when the entire memory used by the program fits in main memory. In general, adding more memory to a system which is not thrashing will not have much impact on performance, and the price/performance ratio for a virtual memory system is typically minimized by buying just enough memory to avoid thrashing, but no more.

It should be noted that when a program is thrashing, some computation is still being done; the program is making progress, but memory behaves as if it is much slower than main memory. If the number of frames is below a certain threshold, however, no progress can be made at all! This absolute lower limit on the number of frames is determined by the number of different pages that a single instruction can reference. Traditional single operand machine instructions and modern RISC machine instructions require at least two page frames, one to hold the page from which the instruction is fetched, and one to hold the page to or from which the operand is transferred.

For complex instruction set machines, where instructions may span word boundaries and one instruction may address n distinct memory addresses, each of which may span a word boundary, single instructions may reference 2(n+1) pages. On the old DEC PDP-11 or the very similar Motorola 680x0 families, this means one instruction can touch as many as 10 pages (in both cases, the worst case is an indirect memory to memory move). Machines like the Intel Pentium may require even larger minimum numbers of frames for certain rarely used instructions.

The graph shown in Figure 13.11 summarizes the above observations about the performance of a virtual memory system as a function of the number of pages available for a typical program.

             |     infinite
             |      |
 number of   |      |
page faults  |       \
to complete  |        \_
  program    |          \__
             |             \_______
             |                     ---------_________
             +------+-------+------------------------+-
                    a       b
                   number of available frames

       a = maximum number of pages referenced by some instruction
       b = working set size
       c = total memory used by program

Figure 13.11. Performance of a virtual memory system.

The concept of the working set of a program is strongly related to the concept of locality of reference. Programs with a great degree of locality have well defined working sets, while those which exhibit poor locality have no well defined working sets. An ideal page replacement policy should identify the working set of a program and keep that in main memory.

One approach to actually using the concept of working set in designing a page replacement policy is to (somewhat arbitrarily) decide that any page referenced fewer than t time units before a page fault is in the working set, while any page referenced before that is a candidate for replacement. As with pure LRU, this would require that there be a record of the time of reference of each page, but it can be approximated using the same hardware used for clock replacement. For example, if a 'frame tracking' procedure is run every t/a this can manage an age field associated with each frame. The frame tracking procedure can do this by inspecting and clearing the used field on each frame. If the frame was used, the tracking procedure sets the age field to zero; if not, it increments the age field. When a page fault occurs, any frame with an age greater than a can be considered for replacement.

The working set replacement policy described above is more complex than the clock policy, and if only a fixed number of frames are available, it serves no useful purpose on a single user system. On multiple user systems, however, the simple clock policy favors programs which have large well defined working sets while it takes page frames away from programs with ill defined or small working sets. A policy which, in effect, allocates frames to programs in proportion to the sizes of their observed working sets can lead to much better response times, especially for optimal systems, which (as noted previously) are on the verge of thrashing.

Enhancing the Performance of Virtual Memory

The virtual memory page replacement policies outlined in the previous sections will sometimes copy pages back from main memory to disk even though the contents of those pages have not changed since they were last copied into main memory. This is clearly an unnecessary cost, and most virtual memory systems avoid it by using additional hardware to record, for each page, whether or not it has been modified since it was last copied into main memory. This takes only one additional bit per frame table entry or per page table entry (depending on which is used by the hardware). A page which has been modified is usually referred to as a dirty page, a page which has not been modified is usually referred to as a clean page, and the bit which records the difference is referred to as the clean/dirty bit.

The set of frames can be divided into five disjoint subsets, those which are ...

clean and recently used,
dirty and recently used,
clean and old,
dirty and old, and
unused. Clearly, when there is a page fault, the unused frames should be used before any pages are replaced, and the clean old frames should be used before waiting for the disk I/O to complete on a dirty page that is being cleaned. Some systems exploit this by a modification to the clock replacement policy where the sweep of the clock hand past a frame which has not been used since the last sweep schedules it for cleaning (copying to disk), and the system maintains a list of clean frames which can be used immediately when there is a page fault. When there is a page fault, the lists of clean pages and those scheduled for cleaning are inspected first to see if the desired page is still in memory; if it is, there is no need to go to disk; if not, a clean frame is used. The system always tries to maintain a supply of clean frames; if the supply is exhausted and there are no frames scheduled for cleaning, the clock hand is advanced to build up the supply. This policy can be described as using anticipatory page cleaning.
On systems where the backing storage is able to transfer multiple consecutive pages in a time only slightly longer than the time to transfer one page, anticipatory paging becomes a practical option. In the common demand paging policy, only the page which was actually referenced by the page fault is fetched. Anticipatory paging is based on the observation that a reference to a particular page is frequently a good sign that the next page in sequence will be referenced in the near future. This is especially true if the page size is relatively small. When anticipatory paging is done, the page which has been brought into memory in anticipation of future use may indeed not be needed, so it is a prime candidate for replacement. This is why this policy would be inappropriate if a new disk seek were required for each page fetched, but quite appropriate if pages were stored in consecutive sectors, so there is no latency penalty for anticipatory page transfers.
The above approaches to anticipating the behavior of programs are automatic and take no advantage of advice the programmer can provide about the future behavior of programs. Some virtual memory systems provide special system services by which the programmer can provide advice to the system about the future behavior of the program. For example, the system might provide a service which allows a program to initiate anticipatory fetches for pages which, in the programmers estimate, will be needed in the near future. Another possible advisory service might allow the program to tell the system that some page, although dirty, will not be used again, or that some recently used page is a good candidate for replacement. These services can be used to speed the transition of a program from one working set to another, but will typically be of little use when a program is in its steady state.
Some systems allow user programs to give advice by two services, one which indicates that the user program expects to exhibit poor locality, and the other indicating that good locality is expected. Although this kind of advice is quite vague, it can be quite useful on systems which try to make extensive use of locality. For example, the anticipatory policies mentioned here will improve performance for programs exhibiting good locality, but they will probably actually lead to performance degradation for programs with more nearly random behavior. In many cases, it is possible for the programmer to predict the points during the execution of a program where that program will change from one kind of behavior to the other, and advise the system of this so that the system can choose the more appropriate policies.
In any case, most users will not modify their programs to give the system advice about the expected behavior of their code, so these advisory services will not be commonly used. Nonetheless, on many systems, there are a few heavily used programs where slight improvements can have a significant impact on user satisfaction; for example, database managers or text editors. If a system provides advisory services such as are described here, such programs can frequently make good use of them.
Other Uses of Virtual Memory

Although the terms virtual memory and demand paging are considered to be almost synonymous by many people, virtual memory can be used to serve other purposes. As already noted, virtual address translation hardware can be viewed as performing a run-time relocation function. This is particularly useful on multi-user systems where additional security is gained by giving each user a different virtual address space.
The presence of virtual address translation hardware which uses data structures describing the address space of user programs provides an opportunity to further enhance security and provide for a better debugging environment by marking each page in the address space with access rights information. Thus, pages or segments containing code can be marked as executable, those containing constants can be marked as read-only, and those containing variables can be marked as read-write. Most modern virtual memory hardware enforces such restrictions, and most operating systems on such systems will mark pages appropriately; as a result, errors such as attempting to change the value of a constant or execute a variable are detected immediately and prevented by the hardware.
If each program running on a multiple user system is given its own virtual address space, as defined by a page table, it is natural to assume that the actual pages referenced by different programs will be different. In fact, this need not be the case! It is quite possible to have two different page table entries which refer to the same actual page; as a result, it is possible for one page to be shared by two or more different virtual address spaces. This provides for the possibility of tightly controlled sharing of information between programs on such a system.
Some virtual memory systems allow programs to request that a file be used as backing storage for some range of pages in their address space. This allows programs to treat a file which has been inserted in their virtual address space as a large array; any access to that array is equivalent to input/output on that file. This is sometimes called virtual file input/output. On the MULTICS operating system, where the address space consists of many segments, each of which consists of many pages, the only way to read or write a disk file was to insert that file in the virtual address space as a segment. To load and run a program on such a system, all that must be done is to map the object file into the virtual address space and then jump to the starting address!

The Limits of Virtual Memory

It is tempting to think that demand paged virtual memory eliminates the need to consider memory limitations in the design of applications programs, but this is not the case. Different algorithms and different top-level designs for large programs will result in different working set sizes, different page fault rates, and as a result, significant differences in performance. Since optimal systems have just enough memory to avoid thrashing, careless program design which leads to large working sets can lead to very bad performance.
As an example, consider the problem of writing a compiler. Before virtual memory was available, most compilers were organized as a series of passes, with intermediate files to hold the data produced by one pass before it is read by the next. For example, the lexical analyzer might produce a file of lexemes for consumption by the syntax analyzer. The availability of large address spaces (whether virtual or not) encourages an alternate design, in which the different passes are combined and run concurrently, with data flowing between them through procedure calls. For example, the syntax analyzer might call the lexical analyzer each time it needs a lexeme.
With the multipass approach mentioned above, the working sets of each pass can be quite small and an abrupt working set change occurs at the end of each pass. On the other hand, there is a large amount of explicit input/output used to communicate between passes. With the single pass approach, there is no input/output needed to communicate between passes, but the working set of the program is the combined working sets of all of the passes! It is not hard to estimate typical numbers for the simple example of lexical analysis: A typical lexeme description might occupy only a few words, so many lexemes can be moved per disk access to the intermediate file used in the multipass approach. A typical lexical analyzer will have a working set consisting of a few pages of memory (for the code) plus perhaps one page for the stack frame and a few pages for recently referenced symbols in the symbol table. Thus, it is possible that a single call to the lexical analyzer might involve as many as ten page faults, while processing the same lexeme in a multipass compiler would require only a fraction of one disk access.

Terminology

The reader should be familiar with the following general terms after reading this section.
```
pages                           segments
demand paging                   anticipatory paging
page replacement policy         virtual address translation
primary memory                  secondary memory
main memory                     backing storage
frames                          page faults
page tables                     frame tables
address translation traps       instruction restart
random page replacement         Belady's optimal replacement policy
first-in first-out replacement  least recently used replacement
clock replacement               working set replacement
locality of reference           thrashing
clean pages                     dirty pages
separate address spaces         shared pages
read-only pages                 virtual file input/output
```
Exercises
1. Why is there no such thing as a segment frame?
2. What problem would have to be faced by the segment fault trap service routine which is not faced by a page fault trap service routine?
3. Outline how you might use a policy such as the clock policy for replacing segments in main memory in response to a segment fault.
4. How would the code shown in Figure 13.8 change if the hardware used a frame table instead of a page table?
5. How would the code shown in Figure 13.10 change if the hardware used a frame table instead of a page table?
6. Modify the code and data structures shown in Figure 13.10 to use a dirty bit to avoid copying pages back to disk when they are clean. Assume that the hardware can be modified to use the dirty bit you define in the page table.
7. Consider the disk input/output code presented in Chapters 11 and 12. This code assumed hardware that transferred entire sectors between disk and buffers in main memory. How would you have to change this code to run in a virtual memory environment if
  a) the hardware used virtual addresses for buffers?
  b) the hardware used physical addresses for buffers?
8. Assuming a machine with one accumulator and a memory which is addressed in units of words, where the page size is moderately large, how many different page faults could be caused by an attempt to fetch and execute one instruction, if that instruction is one of the following:
  a) LOAD X -- a single word instruction, including the operand address?
  b) MOVE A,B -- a 3 word instruction (opcode and 2 operand address words)?
  c) ADDF A,B,C -- a 4 word instruction (opcode, 3 operand addresses), where each operand is two words long?
9. The working set policy requires that the parameter t be given a value.
  a) Outline an experiment you could perform to determine the best value for t on a particular system.
  b) Could the page fault service routine dynamically adjust t so that it is automatically optimized?
10. Consider the following program involving the procedures "a", "b", "c", "d" and "e".
```
   a { known to take about .03 seconds };
   b { known to take about 5 seconds };
   c { known to take about .01 seconds };
   d { known to take about .01 seconds };
   e { known to take about 10 seconds };
```
  The times quoted in the comments above are the run times of each procedure when there are no page faults. If you know that the average latency of the disk system is about .02 seconds, and you know that the average lifetime of an unreferenced page in main memory before it is replaced is about 3 seconds,
  a) estimate the run-time of the above code assuming that it takes one page fault to bring each procedure into memory.
  b) for which of the above procedures would it be profitable to initiate anticipatory paging or anticipatory replacement? Modify the above code by including calls to "anticipate(x)" and "replace(x)" where these would lead to a maximum performance improvement.
  c) where should calls be inserted in the above code to indicate that it has good locality, and where should the system be told it has bad locality? Rewrite the code with additional calls to "good" and "bad" for this purpose.
11. If 99% of the memory cycles on a system run at main memory speed, while 1% of the cycles involve page faults, and if main memory takes 1 microsecond per access but it takes 10 milliseconds for a disk access, what is the average memory speed? Assume that there is always a good supply of clean pages for page replacement.
Readings

There is a good discussion of page replacement policies in Section 5.2 of
The Logical Design of Operating Systems by Alan C. Shaw's (Prentice Hall, 1974)

A brief introduction to virtual memory is included in Section 2.3.3 of
Operating System Elements by Peter Calingaert's (Prentice Hall, 1982)

The classic survey paper on virtual memory is still one of the best technical surveys of the subject.
"Virtual Memory," by Peter Denning in Computing Surveys, volume 2, number 2 (Sept. 1970) pages 153-189.
This has been reprinted in Sections 6.2 and 6.3 of
Software Systems Principles by Peter Freeman (SRA, 1975)

It is interesting to look back on the first system which used demand paging. This was the ATLAS system designed at Manchester University in England. A good source is
"The ATLAS Supervisor" by Kilburn and Payne in the Proceedings of the 1961 Eastern Joint Computer Conference, AFIPS Conference Proceedings Volume 20, pages 279 to 294,
This was reprinted on pages 661-682 of
Programming Systems and Languages, by Rosen, (McGraw Hill, 1967).

It should also be noted that, although the bulk of the research on page replacement policies was done in the late 1960's and early 1970's, work continues. A good example is
"Converting a Swap-Based System to do Paging in an Architecture Lacking Page-Referenced Bits," by Babaoglu and Joy, in Proceedings of the 8th Symposium on Operating System Principles (ACM, 1979) pages 78 to 86.
The title of this paper describes much of what it covers, but the system described there also does a large amount of anticipatory paging, and the system described is Berkeley UNIX, BSD 4.2, an important system in the history of UNIX.
Another good example of recent work is
"WSClock - A Simple and Effective Algorithm for Virtual Memory Management," by Carr and Hennessy, in Proceedings of the 8th Symposium on Operating System Principles (ACM, 1979) pages 87-95.
This paper describes a simple combination of the working set page replacement policy with the clock policy.

Chapter 13, Virtual Memory

Virtual Memory

Paging versus Segmenting

Paged Virtual Address Translation Hardware

Demand Paging

An Example Page Fault Service Routine

Page Replacement Policies

The Working Set Concept

Enhancing the Performance of Virtual Memory

Other Uses of Virtual Memory

The Limits of Virtual Memory

Terminology

Exercises

Readings