In modern computing, we often assume that each process's address space is small enough to fit entirely in physical memory. However, this assumption is far from reality. As we move towards systems that support multiple large, concurrently running processes, it’s essential to introduce an additional layer in the memory hierarchy to handle this demand, which is the hard disk drive.

You might wonder why we want to support many large processes running at the same time. With a large address space, you don’t have to worry about if there is room enough in memory for your program’s data structures; rather, you just write the program naturally, allocating memory as needed. It is a powerful illusion that the OS provides, and makes your life vastly simpler.

Introducing Swap Space

The first thing we will need to do is to reserve some space on the disk for moving pages back and forth. In operating systems, we generally refer to such space as swap space, because we swap pages out of memory to it and swap pages into memory from it. Thus, we will simply assume that the OS can read from and write to the swap space, in page-sized units. To do so, the OS will need to remember the disk address of a given page.

In the following tiny example, you can see a little example of a 4-page physical memory and an 8-page swap space.

In the example, three processes (Proc 0, Proc 1, and Proc 2) are actively sharing physical memory; each of the three, however, only have some of their valid pages in memory, with the rest located in swap space on disk. A fourth process(Proc 3) has all of its pages swapped out to disk, and thus clearly isn’t currently running. One block of swap remains free. Even from this tiny example, hopefully you can see how using swap space allows the system to pretend that memory is larger than it actually is.

We should note that swap space is not the only place on the disk used for swapping. For example, when you run a program (like your own compiled main program), the code pages are first on the disk. When the program runs, these pages are loaded into memory.

The Present Bit

How can you tell where a page is? Is it in physical memory or in the swap space? Let's find out!

First, let's remember what happens during a memory reference. The running process creates virtual memory references for fetching instructions or accessing data. The hardware then translates these into physical addresses to get the needed data from memory.

The hardware first gets the VPN from the virtual address, checks the TLB for a match (a TLB hit), and if it finds one, it gets the physical address and fetches it from memory. This is usually the case because it's quick and doesn't need extra memory accesses.

If the VPN isn't in the TLB (a TLB miss), the hardware finds the page table in memory using the page table base register and looks up the page table entry (PTE) for this page using the VPN as an index. If the page is valid and present in physical memory, the hardware takes the PFN from the PTE, puts it in the TLB, and retries the instruction, this time getting a TLB hit. So far, everything is working well.

To allow pages to be swapped to disk, we add more features. When checking the PTE, the hardware might find the page isn't in memory. This is determined by the present bit in each page-table entry. If the bit is one, the page is in memory. If it's zero, the page is on disk. Accessing a page not in memory causes a page fault.

When a page fault happens, the OS steps in to handle it. A specific piece of code, called a page-fault handler, runs to take care of the page fault, as we will explain.

What happens when the memory is full?

In the process described above, we assumed there is enough free memory to bring a page from swap space. But sometimes, memory might be full or almost full. In that case, the OS may need to move some pages out to make room for new ones. Deciding which page to remove is called the page-replacement policy.

A lot of effort has gone into making a good page-replacement policy because removing the wrong page can greatly slow down a program. If the wrong choice is made, a program might run as slow as if it were using a disk instead of memory, which could be 10,000 or 100,000 times slower. Therefore, it's important to study this policy in detail, which we will do next.

Handling The Page Fault

If a page isn't present and has been swapped to disk, the OS needs to bring the page back into memory to handle the page fault. So, how does the OS know where to find the page? In many systems, the page table stores the disk address of that page instead of storing the page's PFN. When the OS gets a page fault for a page, it checks the PTE for the address and requests the disk to load the page into memory.

When the disk I/O is done, the OS updates the page table to show the page is now in memory, updating the PFN field in the PTE. It retries the instruction, possibly causing a TLB miss, which is fixed by updating the TLB. Finally, the instruction is retried, finds the translation in the TLB, and fetches the needed data or instruction from memory.

While the I/O is happening, the process is blocked, so the OS can run other processes that are ready. Since I/O is slow, doing I/O for one process while running another process helps a multiprogrammed system use its hardware efficiently.

With this understanding, we can outline the entire process of memory access. So, if someone asks, "What happens when a program gets data from memory?" you should have a clear idea of the different possibilities.

The next snippet shows what the hardware does during translation.

1 VPN = (VirtualAddress & VPN_MASK) >> SHIFT
2 (Success, TlbEntry) = TLB_Lookup(VPN)
3 if (Success == True) // TLB Hit
4     if (CanAccess(TlbEntry.ProtectBits) == True)
5         Offset = VirtualAddress & OFFSET_MASK
6         PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
7         Register = AccessMemory(PhysAddr)
8     else
9         RaiseException(PROTECTION_FAULT)
10 else // TLB Miss
11     PTEAddr = PTBR + (VPN * sizeof(PTE))
12     PTE = AccessMemory(PTEAddr)
13     if (PTE.Valid == False)
14        RaiseException(SEGMENTATION_FAULT)
15     else
16        if (CanAccess(PTE.ProtectBits) == False)
17             RaiseException(PROTECTION_FAULT)
18        else if (PTE.Present == True)
19             // assuming hardware-managed TLB
20             TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
21             RetryInstruction()
22        else if (PTE.Present == False)
23             RaiseException(PAGE_FAULT)

There are three key situations to understand when a TLB miss happens. First, if the page is both present and valid (Lines 18–21), the TLB miss handler can just get the PFN from the PTE, retry the instruction (leading to a TLB hit), and continue as usual. In the second case (Lines 22–23), the page fault handler must run because the page is valid but not in physical memory. Third, the access might be to an invalid page, possibly due to a bug in the program (Lines 13–14). Here, the hardware catches this invalid access, and the OS trap handler runs, likely ending the faulty process.

And the next shows what the OS does when there's a page fault.

1 PFN = FindFreePhysicalPage()
2 if (PFN == -1) // no free page found
3     PFN = EvictPage() // run replacement algorithm
4 DiskRead(PTE.DiskAddr, PFN) // sleep (waiting for I/O)
5 PTE.present = True // update page table with present
6 PTE.PFN = PFN // bit and translation (PFN)
7 RetryInstruction() // retry instruction

We see what the OS needs to do to handle the page fault. First, the OS must find a physical frame for the page that will be brought in. If there isn't one, we'll need to wait for the replacement algorithm to run and remove some pages from memory, freeing them for use.

Replacement Policies

How can the OS decide which page (or pages) to evict from memory?

Developing an effective page-replacement policy is crucial because evicting the wrong page can significantly impact a program’s performance. If an incorrect page is swapped out, the program could slow down dramatically, behaving almost as if it were relying on disk storage rather than memory. Since accessing disk can be 10,000 to 100,000 times slower than accessing memory, it's essential to carefully analyze how this policy works. We will now explore this topic in greater detail.

Our goal in choosing a replacement policy is to minimize the number of times we need to get a page from the disk, known as cache misses. Alternatively, we want to maximize the number of times a page is found in memory when accessed, known as cache hits.

To understand how cache misses affect the performance, let's calculate the average memory access time (AMAT)

$$AMAT = T_M + (P_{Miss} · T_D)$$

where TM represents the cost of accessing memory, TD the cost of accessing disk, and Pmiss the probability of not finding the data in the cache(a miss).

You always pay the cost of accessing data in memory, and if there's a miss, you also have to pay the extra cost of getting the data from the disk.

For example, Let's assume hit rate = 90% and miss rate is thus 10% (Pmiss =0.1).

To calculate AMAT, we need to know the cost of accessing memory and the cost of accessing the disk. Let's say accessing memory (TM) takes about 100 nanoseconds, and accessing the disk (TD) takes about 10 milliseconds. So, the AMAT = 100ns + 0.1 × 10ms which is 1.0001 milliseconds, or about 1 millisecond. If our hit rate was 99.9% (Pmiss = 0.001), the AMAT would be 10.1 microseconds, which is about 100 times faster. As the hit rate gets closer to 100%, the AMAT gets closer to 100 nanoseconds.

In this example, it becomes clear that disk access in modern systems is so expensive that even a slight miss rate can overwhelm the overall average memory access time (AMAT) of running programs. To keep performance efficient, it's important to reduce misses as much as possible; otherwise, the system will have to run at the slower speed of the disk. To address this, we can use a carefully planned policy, which we will now discuss.

The Optimal Page Replacement Policy

The optimal policy is a straightforward approach that replaces the page that will be accessed furthest in the future, leading to the fewest possible cache misses.

Consider it this way: if you need to remove a page, why not discard the one you’ll need the latest? By doing so, you’re essentially acknowledging that the other pages in the cache hold more immediate importance. The reasoning is straightforward: you’ll likely access the other pages before needing the one that’s needed the furthest in the future.

Unfortunately, as we saw before in the development of scheduling policies, the future is not generally known; you can’t build the optimal policy for a general-purpose operating system. The optimal policy will thus serve only as a comparison point, to know how close we are to “perfect”.

Let’s Take a simple example to understand the decisions the optimal policy makes. Assume a program accesses the following stream of virtual pages: 0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 1. Assuming a cache that fits three pages.

In the figure, we can ignore the first three misses since the cache is initially empty. Here’s what happens after that:

At the first miss (accessing page 3 when the cache holds pages 0, 1, and 2): The system needs to evict one page to make room for page 3. It will evict page 2 because page 2 is accessed furthest in the future compared to pages 0 and 1.
At the second miss (accessing page 2 when the cache holds pages 0, 1, and 3): Similarly, the system needs to evict one page. Both pages 0 and 3 are good candidates for eviction because neither of them will be accessed again in the future. Therefore, either can be chosen to be replaced by page 2.

This logic follows the Optimal Page Replacement algorithm, which evicts the page that will not be used for the longest period of time in the future.

First In First Out (FIFO) Replacement Policy

Some systems utilize FIFO (First-In, First-Out) as a page replacement policy. In this approach, pages are added to a queue as they enter the system. When a replacement is needed, the page at the front of the queue (i.e., the page that entered first) is removed. FIFO’s primary advantage lies in its simplicity of implementation.

Let’s see how FIFO does on our example reference here.

As you can see, when comparing FIFO to the optimal method, FIFO performs worse with a higher miss rate.

Random Replacement Policy

Another common replacement policy is Random, which selects a page at random to evict. Like FIFO, it is straightforward to implement, but it doesn’t really try to be too intelligent in picking which blocks to evict.

Here is what the Random Replacement Policy does in our example.

Of course, how Random does depends entirely upon how lucky (or unlucky) Random gets in its choices. In the example above, Random does a little better than FIFO, and a little worse than optimal.

Unfortunately, any policy as simple as FIFO or Random is likely to have a common problem: it might kick out an important page, one that is about to be referenced again. FIFO kicks out the page that was first brought in; if this happens to be a page with important code or data structures upon it, it gets thrown out anyhow, even though it will soon be paged back in. Thus, FIFO, Random, and similar policies are not likely to approach optimal; something smarter is needed.

Least Recently Used Replacement Policy

As we did with scheduling policy, to improve our guess at the future, we once again lean on the past and use history as our guide. For example, if a program has accessed a page in the near past, it is likely to access it again in the near future.

One type of historical information a page-replacement policy could use is frequency; if a page has been accessed many times, perhaps it should not be replaced. A more commonly-used property of a page is its recency of access; the more recently a page has been accessed, perhaps the more likely it will be accessed again as we discussed before in the principle of locality

A simple set of algorithms comes from this idea. The Least-Frequently-Used (LFU) method removes the page that has been used the least when a page needs to be replaced. Similarly, the Least-Recently-Used (LRU) method removes the page that hasn't been used for the longest time. These algorithms are easy to understand because their names describe exactly what they do.

To better understand LRU, let's look at how it performs on our example reference stream.

From the figure, you can see how LRU can use history to do better than stateless policies such as Random or FIFO.

Implementing Historical Algorithms ( LRU )

The Least Recently Used (LRU) algorithm tends to perform better than simpler policies like FIFO or Random, but how can we implement it effectively?

To track which pages are least used, the system needs to record some information every time a page is accessed. However, doing this tracking for every memory reference could slow down the system if not done carefully.

One way to make this process faster is to use some hardware support. For instance, when a page is accessed, the hardware could automatically update a “time” field in memory (this could be stored in the page table for each process, or in a separate array with one entry for each physical page). The time field would record when the page was last used. Later, when the operating system needs to replace a page, it can scan these time fields to find the page that hasn’t been used for the longest time.

As the number of pages in a system increases, searching through a large list of time stamps to find the exact least-recently-used page becomes very slow. For example, a modern machine with 4GB of memory divided into 4KB pages would have 1 million pages. Scanning through all of these to find the oldest page would take a lot of time, even with today’s fast CPUs.

This raises an important question: do we really need to find the exact oldest page to replace? Or can we get by with a good enough guess?

Approximating LRU

Approximating LRU (Least Recently Used) is a more practical approach for modern systems due to its reduced computational overhead. To achieve this, systems rely on hardware support, particularly a use bit (or reference bit), which is assigned to each page in the system. This bit is set to 1 whenever a page is accessed, either for reading or writing, and it is the operating system’s job to clear it when needed.

One common method for approximating LRU is the clock algorithm. In this algorithm, the pages are arranged in a circular list, and a clock hand points to a specific page. When the system needs to replace a page, it checks the use bit of the current page. If the bit is set to 1, meaning the page was recently used, the system clears the bit and moves the clock hand to the next page. This process continues until it finds a page with a use bit of 0, indicating that the page hasn’t been used recently and can be replaced.

An additional consideration in page replacement is whether a page is dirty, meaning it has been modified while in memory. If a dirty page is evicted, it must be written back to disk, which is a costly operation. To handle this, systems use a modified bit (or dirty bit) to indicate if a page has been written to. In modified versions of the clock algorithm, clean pages, which don’t require writing back to disk, are prioritized for eviction over dirty pages to minimize I/O operations.

Conclusion of Virtualization

In wrapping up our look at virtualization, it's clear that this idea is key to how modern operating systems manage and organize resources. Virtual memory, especially, helps systems overcome the limits of physical memory, allowing them to handle many tasks at once. By turning memory into large address spaces, segmenting, paging, and using swap space, the OS makes sure programs can run even when physical memory is full.

Moreover, the page replacement policies we talked about, like FIFO and LRU, show how the OS constantly balances performance and memory use. These smart algorithms highlight the cleverness of OS design, always adjusting to real-world limits like limited memory.

As we close this section on virtualization, we can appreciate how memory management and resource abstraction are crucial not only for the efficiency of systems but also for their scalability and stability.

What’s Next?

Having explored memory management and virtualization, another critical area to delve into is concurrency—how operating systems manage multiple processes running simultaneously. Concurrency introduces its own set of challenges, particularly related to synchronization and ensuring that processes don’t interfere with each other.

One of the key topics to study next is race conditions, where two or more processes try to access shared resources simultaneously, potentially leading to unpredictable results. Understanding how to handle race conditions, whether through locks, semaphores, or other synchronization mechanisms, is essential to ensure the correct execution of concurrent processes.

[8] Understanding Swap Space and Page Replacement Policies

Introducing Swap Space

The Present Bit

What happens when the memory is full?

Handling The Page Fault

Replacement Policies

The Optimal Page Replacement Policy

First In First Out (FIFO) Replacement Policy

Random Replacement Policy

Least Recently Used Replacement Policy

Implementing Historical Algorithms ( LRU )

Approximating LRU

Conclusion of Virtualization

What’s Next?

Comments

Operating Systems Three Easy Pieces

[1] Introduction To Virtualization

More from this blog

[7] Understanding Paging in Computer Systems

[6] Understanding Segmentation in Computer Systems

[5] Memory API

[4] Address Space & Address Translation

Command Palette

Introducing Swap Space

The Present Bit

What happens when the memory is full?

Handling The Page Fault

Replacement Policies

The Optimal Page Replacement Policy

First In First Out (FIFO) Replacement Policy

Random Replacement Policy

Least Recently Used Replacement Policy

Implementing Historical Algorithms ( LRU )

Approximating LRU

Conclusion of Virtualization

What’s Next?

Comments

Operating Systems Three Easy Pieces

[1] Introduction To Virtualization

More from this blog