We want to build a system that allows multiple processes to run together and achieves the following goals in accessing memory: isolation between processes, efficient use of physical memory, and the ability to share memory when needed.
Memory Configuration for Each Process
First, each process's memory has the following layout, consisting of three parts: code, heap, and stack.
The heap contains dynamic data and the stack contains local data.
The heap grows from lower addresses toward higher addresses; the stack grows in the opposite direction.
The heap becomes fragmented: since we can freely release any chunk of heap memory, holes appear inside it over time. The stack, in contrast, is one contiguous region; we can only free memory at its top (the smallest address).
When we create dynamic data in C, remember that the data itself lives on the heap while the pointer to it (a local variable) lives on the stack.
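A minimal C example of this split:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* `p` itself is a local variable, so it lives on the stack. */
    int *p = malloc(4 * sizeof(int));   /* the four ints live on the heap */
    if (p == NULL) return 1;

    p[0] = 42;                /* writes to heap memory */
    printf("%d\n", p[0]);

    free(p);                  /* releases the heap block; `p` (on the stack) */
    return 0;                 /* disappears when main returns */
}
```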
Why and How to Achieve Virtual Memory
How, then, should we arrange the memory of different processes in main memory?
First, we do not time-share memory. Time-sharing would mean RAM holds only one process's memory at any time: to switch processes, the old process's memory must be saved to the hard disk and then overwritten with the new process's memory. This workflow is extremely inefficient.
Instead, we use space sharing on the main memory.
There are some ways to achieve space sharing.
Static Allocation is one of them: when a process is loaded, we rewrite all the addresses in its instructions so that each process uses a different part of main memory. The first downside is that there is no memory protection between processes, which raises security issues. The second is that a process's physical address (its address in main memory) is fixed once it has been placed.
Dynamic Allocation is the way we prefer to manage each process’s memory.
Each process knows only its virtual addresses, which are different from physical addresses (addresses in main memory). A piece of hardware called the Memory Management Unit (MMU) translates each virtual address into a physical address. With this scheme, the OS controls where each process's memory is placed and can move it without the process knowing. Each process sees only its own virtual address space and nothing else, so any policy for separating processes' memory can easily be added to the OS.
Base+Bound is one of the policies to achieve dynamic allocation.
For each process, we assign a base address (physical) and a memory-size bound. Both are recorded in the Process Control Block (PCB).
Under this policy, the MMU first checks that the virtual address is smaller than the bound (otherwise it raises a fault), and then adds the base: physical address = virtual address + base.
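A minimal sketch of that check-and-add logic, with illustrative names:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Per-process relocation info, as stored in the PCB. */
typedef struct {
    uint32_t base;   /* physical start address of the process's memory */
    uint32_t bound;  /* size of the process's memory in bytes */
} BaseBound;

/* Translate a virtual address; an out-of-bound access raises a fault. */
uint32_t mmu_translate(const BaseBound *bb, uint32_t vaddr) {
    if (vaddr >= bb->bound) {            /* protection check */
        fprintf(stderr, "fault: address %u out of bounds\n", (unsigned)vaddr);
        exit(1);
    }
    return bb->base + vaddr;             /* relocation: add the base */
}
```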
The pros and cons of Base+Bound are:
Pros:
- Provides protection of memory among the processes
- Allows dynamic allocation, meaning we can change a process's physical address without it knowing
- Cheap to implement: only a few extra registers saved in the PCB and simple hardware logic in the MMU
Cons:
- The memory of a process must be allocated as one contiguous block. If a process needs a large block, we sometimes cannot find one in main memory even though enough total free space exists. This problem is called external fragmentation.
- Sometimes two processes work together on a problem and need to share memory. Base+Bound does not allow memory sharing.
A better policy: Segmentation
Every logical entity of a process's virtual memory (code, stack, and heap) corresponds to a segment in physical memory, and every segment uses the Base+Bound policy.
When translating a virtual address, the MMU extracts the highest bits as the segment identifier and the lowest bits as the offset inside that segment. For example, with 14-bit virtual addresses, the highest 2 bits can be the segment ID and the lowest 12 bits the offset.
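A minimal sketch of this split for the 14-bit example (2-bit segment ID, 12-bit offset); the names are illustrative:

```c
#include <stdint.h>

#define SEG_SHIFT   12        /* 12-bit offset */
#define OFFSET_MASK 0xFFF     /* lowest 12 bits */

typedef struct { uint32_t base, bound; } Segment;

/* seg_table holds one Base+Bound pair per segment (code/heap/stack). */
uint32_t seg_translate(const Segment seg_table[4], uint32_t vaddr) {
    uint32_t seg    = (vaddr >> SEG_SHIFT) & 0x3;  /* highest 2 bits */
    uint32_t offset =  vaddr & OFFSET_MASK;        /* lowest 12 bits */
    /* A real MMU would also fault when offset >= seg_table[seg].bound. */
    return seg_table[seg].base + offset;
}
```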
This algorithm lets different processes share specific segments, and each segment carries read/write protection bits to protect itself.
The disadvantage is that segments are still variable-sized contiguous blocks, so a whole heap segment can still suffer from external fragmentation. We need a finer granularity when dividing virtual memory.
Therefore, we normally use the Paging algorithm.
We divide the address space and physical memory into many fixed-size pages, typically 4 KB each. Pages in physical memory are often called frames, but this is just naming; they are the same size, and we can call both of them pages.
Every virtual address is split by the MMU into a virtual page number (VPN) in the high bits and an offset within the page in the low bits; only the VPN needs translating.
The MMU does the translation using a look-up table stored in main memory called the Page Table. Every entry in it, called a Page Table Entry (PTE), records one mapping from a virtual page number to a frame number. Every process has its own Page Table in main memory. A linear page table is simply an array of PTEs indexed by the VPN.
In conclusion, to use a virtual address to find data in main memory, we first consult the Page Table in main memory to obtain the physical address, and then use that physical address to locate the data.
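A minimal sketch of the translation, assuming 4 KB pages and a linear table that stores bare frame numbers (real PTEs also carry valid and protection bits, omitted here):

```c
#include <stdint.h>

#define PAGE_SHIFT  12                          /* 4 KB pages */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

/* page_table[vpn] holds the frame number for that virtual page. */
uint32_t page_translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* index into the table */
    uint32_t offset = vaddr & OFFSET_MASK;      /* unchanged by paging */
    uint32_t frame  = page_table[vpn];          /* one extra memory read */
    return (frame << PAGE_SHIFT) | offset;
}
```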
Paging also allows processes to share certain memory, such as kernel data.
The pros and cons of Paging
Pros
- fast to allocate and free
- no external fragmentation
Cons
- additional cost to look up the Page Table
- additional cost to store the Page Table: a linear page table must store a mapping for every virtual page, whether the page is valid (holds data we need) or not, because it uses the VPN as the index (see the example below)
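To see the storage cost concretely: with 32-bit virtual addresses and 4 KB pages there are 2^32 / 2^12 = 2^20 virtual pages, so a linear page table with 4-byte PTEs takes 4 MB per process, even if the process only ever touches a handful of pages.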
Faster and Smaller Page Table
To make translation faster, we cache some of the frequently accessed PTEs in a hardware component called the Translation Lookaside Buffer (TLB). The idea is the same as using a cache to store data.
When the TLB is full, we need a policy to decide which entry to evict.
- Least recently used (LRU)
- Random
In practice, random is not worse than LRU.
When the CPU switches between processes, we don't want a process to access PTEs belonging to other processes. The crudest solution is to flush the whole TLB for each new process.
A better idea is to tag each TLB entry with an Address Space ID (ASID). A process can only use TLB entries carrying its own ASID.
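A minimal software sketch of an ASID-tagged lookup; real TLBs compare all entries in parallel in hardware, and the names here are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 64

/* One TLB entry: a cached PTE tagged with its owner's ASID. */
typedef struct {
    uint32_t vpn, frame;
    uint8_t  asid;
    bool     valid;
} TlbEntry;

TlbEntry tlb[TLB_SIZE];

/* Hit only when both the VPN and the ASID match, so no flush is needed
 * on a context switch; other processes' entries are simply ignored. */
bool tlb_lookup(uint32_t vpn, uint8_t cur_asid, uint32_t *frame) {
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == cur_asid) {
            *frame = tlb[i].frame;
            return true;              /* TLB hit */
        }
    }
    return false;                     /* TLB miss: walk the page table */
}
```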
To make the Page Table smaller in memory, there are some algorithms that replace the linear page table.
Inverted Page Table: a hash table that stores only the valid PTEs. The VPN plus the ASID is hashed to locate the entry we need; when hashes collide, a linked list is used (chaining).
Multi-level page table: a tree-like, multi-level structure of page tables that also stores only the valid PTEs (a minimal sketch follows).
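A sketch of a two-level walk, assuming 32-bit virtual addresses split 10/10/12 (level-1 index, level-2 index, offset); the table layout is illustrative:

```c
#include <stdint.h>

/* A NULL level-1 slot means that whole 4 MB region is unmapped, so no
 * level-2 table needs to be stored for it: this is where the savings
 * over a linear table come from. */
uint32_t two_level_translate(uint32_t **l1_table, uint32_t vaddr) {
    uint32_t l1  = (vaddr >> 22) & 0x3FF;  /* top 10 bits */
    uint32_t l2  = (vaddr >> 12) & 0x3FF;  /* next 10 bits */
    uint32_t off =  vaddr & 0xFFF;         /* 12-bit offset */

    uint32_t *l2_table = l1_table[l1];     /* first memory reference */
    /* A real OS would raise a page fault on NULL / invalid entries. */
    uint32_t frame = l2_table[l2];         /* second memory reference */
    return (frame << 12) | off;            /* a third reference fetches data */
}
```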
The pros and cons of multi-level page tables:
Pros:
- Sparse mapping of memory saves a lot of space for storing the Page Table, much better than a linear table that stores every entry.
Cons:
- More memory references: an x-level page table needs (x+1) memory references to reach data in main memory (x to walk the page-table levels, plus one for the data itself).
To reduce the number of memory references, we can make pages larger: the larger the page, the more memory each TLB entry covers, so the fewer TLB misses (and thus page-table walks) we get.
Swapping
When physical memory runs out of space, the OS keeps some of the data on the hard disk.
The hard disk is storage with low speed, large capacity, and low price.
When a process first starts, all of its data is on the hard disk. Data is moved into main memory only when it is used.
Every PTE has a “valid bit”: 1 means the data is in physical memory, and 0 means it is on the hard disk. When we modify data, the modification happens first in physical memory and is written back to the hard disk later. While the copies in physical memory and on disk differ because of a write, the frame's “dirty bit” is set to 1.
In conclusion, when a process looks for data, it performs the following steps (sketched in code after the list):
- Look up the TLB. If the TLB has the PTE, use it; otherwise, walk the Page Table to find the PTE.
- With the PTE in hand, if the “valid bit” is 1, translate the virtual address into the physical address and use it to find the data in physical memory.
- If the “valid bit” is 0, the data must be fetched from the hard disk. This situation, where the data is not in main memory, is called a Page Fault and is quite time-consuming.
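A sketch of these steps in code; `tlb_lookup`, `pt_lookup`, `handle_page_fault`, and `read_physical` are hypothetical helpers standing in for the hardware and OS mechanisms described above:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

typedef struct { uint32_t frame; bool valid; } PTE;

/* Hypothetical helpers, not a real API. */
bool    tlb_lookup(uint32_t vpn, PTE *pte);   /* TLB probe */
PTE     pt_lookup(uint32_t vpn);              /* page-table walk */
void    handle_page_fault(uint32_t vpn);      /* load the page from disk */
uint8_t read_physical(uint32_t paddr);        /* physical memory read */

uint8_t access_byte(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    PTE pte;

    if (!tlb_lookup(vpn, &pte))   /* step 1: TLB hit? else walk the table */
        pte = pt_lookup(vpn);

    if (!pte.valid) {             /* step 3: page fault, fetch from disk */
        handle_page_fault(vpn);
        pte = pt_lookup(vpn);     /* retry the translation */
    }
    /* step 2: valid PTE, translate and access physical memory */
    uint32_t off = vaddr & ((1u << PAGE_SHIFT) - 1);
    return read_physical((pte.frame << PAGE_SHIFT) | off);
}
```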
There need to be some swapping policies to decide:
- Page selection: When to move a page from the hard disk to the main memory?
- Page replacement: Which page on the main memory should be evicted to make room for new pages coming in?
For page selection, there are two policies:
- Demand paging: Only move the page in when it is needed.
- Prepaging: Predict which pages will be used (e.g., via spatial locality) and bring them in ahead of time.
For page replacement, in practice we usually use an approximation of LRU called the Clock Algorithm.
The Clock Algorithm works like this: every page has a “reference bit” recording whether it has recently been used. A pointer sweeps circularly over the frames in main memory when a page fault happens. On encountering a page with reference bit 1, it sets the bit to 0 and continues; on encountering a page with reference bit 0, the OS evicts that page and stops searching.
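A minimal sketch of the eviction loop, with illustrative names:

```c
#include <stdbool.h>

#define NFRAMES 1024

bool ref_bit[NFRAMES];   /* set by the hardware on each access */
static int hand = 0;     /* the circular "clock hand" */

/* Pick a victim frame: recently used frames get a second chance. */
int clock_evict(void) {
    for (;;) {
        if (ref_bit[hand]) {
            ref_bit[hand] = false;        /* used recently: clear and skip */
            hand = (hand + 1) % NFRAMES;
        } else {
            int victim = hand;            /* not used recently: evict */
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
    }
}
```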
Additionally, we can evict multiple pages at once to save time. We should also pay attention to the frames' dirty bits, because evicting a dirty page is more time-consuming: it must first be written back to disk.
When multiple processes are using main memory, we have to decide on a policy for distributing space among them:
- Local replacement: each process has a fixed-size memory space and swaps frames only within its own space.
- Global replacement: the processes share the whole main memory and swap on the global pool of frames.
Some Techniques and Problems about Swapping
Memory-mapped Files are a technique used in file I/O. When we want to edit a file on the hard disk, we map it into main memory using demand paging, and then treat the file as ordinary memory instead of repeatedly issuing read()/write() syscalls. This way, multiple processes can also share the same file.
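A minimal POSIX example using mmap (the file name data.txt is a placeholder):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDWR);
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);
    if (st.st_size < 5) { close(fd); return 1; }

    /* Map the whole file; pages come in via demand paging, and
     * MAP_SHARED makes writes visible to other processes mapping it. */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }

    memcpy(p, "hello", 5);    /* edit the file as if it were memory */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```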
Thrashing is a problem that arises when there are not enough frames in main memory. All processes may be busy swapping data in from the hard disk, which causes CPU utilization to drop. The OS may then respond to the low utilization by starting more jobs, which further worsens the problem.