#### CS162 Operating Systems and Systems Programming Lecture 15

#### **Demand Paging (Finished)**

March 19<sup>th</sup>, 2019 Prof. John Kubiatowicz http://cs162.eecs.Berkeley.edu

# Management & Access to the Memory Hierarchy



```
3/19/2019
```

Kubiatowicz CS162 ©UCB Spring 2019

Lec 15.2

# Recall: Demand Paging is Caching, Must Ask...

- What is block size?
  - 1 page
- What is organization of this cache (i.e. direct-mapped, set-associative, fully-associative)?
  - Fully associative: arbitrary virtual  $\rightarrow$  physical mapping
- How do we find a page in the cache when we look for it?
  - First check TLB, then page-table traversal
- What is page replacement policy? (i.e. LRU, Random...)
  - This requires more explanation... (kinda LRU)
- What happens on a miss?
  - Go to lower level to fill miss (i.e. disk)
- What happens on a write? (write-through, write back)
  - Definitely write-back: need dirty bit!

#### Recall: What is in a Page Table Entry

- What is in a Page Table Entry (or PTE)?
  - Pointer to next-level page table or to actual page
  - Permission bits: valid, read-only, read-write, write-only
- Example: Intel x86 architecture PTE:
  - Address: same format as previous slide (10, 10, 12-bit offset)
  - Intermediate page tables called "Directories"

| Bits  | 31-12 | 11-9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|-------|-------|------|---|---|---|---|---|---|---|---|---|
| Field | Page Frame Number (Physical Page Number) | Free (OS) | 0 | L | D | A | PCD | PWT | U | W | P |

- P: Present (same as "valid" bit in other architectures)
- W: Writeable
- U: User accessible
- PWT: Page write transparent: external cache write-through
- PCD: Page cache disabled (page cannot be cached)
- A: Accessed: page has been accessed recently
- D: Dirty (PTE only): page has been modified recently

- L: L=1⇒4MB page (directory only).
  - Bottom 22 bits of virtual address serve as offset


#### **Demand Paging Mechanisms**

- PTE helps us implement demand paging
  - Valid  $\Rightarrow$  Page in memory, PTE points at physical page
  - Not Valid  $\Rightarrow$  Page not in memory; use info in PTE to find it on disk when necessary
- Suppose user references page with invalid PTE?
  - Memory Management Unit (MMU) traps to OS
    - » Resulting trap is a "Page Fault"



- What does OS do on a Page Fault?
  - » Choose an old page to replace
  - » If old page modified ("D=1"), write contents back to disk
  - » Invalidate its PTE and any cached TLB entry for it
  - » Load new page into memory from disk
  - » Update page table entry, invalidate TLB for new entry
  - » Continue thread from original faulting location
- TLB for new page will be loaded when thread is continued!
- While pulling pages off disk for one process, OS runs another process from ready queue
  - » Suspended process sits on wait queue
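
The fault-handling steps listed above can be sketched in a few lines. This is only an illustrative model, not how any real kernel is written: `PTE`, `handle_page_fault`, `choose_victim`, and the dictionaries standing in for the TLB, memory, and disk are all hypothetical names, and the sketch assumes memory is already full so a victim always exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PTE:
    disk_block: int               # where the page lives in the backing store
    valid: bool = False           # True => page is in memory
    dirty: bool = False           # "D" bit: modified since it was loaded
    frame: Optional[int] = None   # physical frame number when valid


def handle_page_fault(vpn, page_table, tlb, memory, disk, choose_victim):
    """Service a fault on virtual page `vpn`; returns the frame it now occupies."""
    victim_vpn = choose_victim(page_table)      # choose an old page to replace
    victim = page_table[victim_vpn]
    frame = victim.frame
    if victim.dirty:                            # D=1: write contents back to disk
        disk[victim.disk_block] = memory[frame]
    victim.valid, victim.dirty, victim.frame = False, False, None
    tlb.pop(victim_vpn, None)                   # drop any cached translation
    entry = page_table[vpn]
    memory[frame] = disk[entry.disk_block]      # load new page from disk
    entry.valid, entry.frame = True, frame      # update the page table entry
    tlb.pop(vpn, None)                          # TLB refills when the thread resumes
    return frame
```

The replacement policy is passed in as `choose_victim` because the rest of the lecture is about how to make that choice well.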


#### Loading an executable into memory



## Create Virtual Address Space of the Process



- Utilized pages in the VAS are backed by a page block on disk
  - Called the backing store or swap file
  - Typically in an optimized block store, but can think of it like a file

# Create Virtual Address Space of the Process



- All the utilized regions are backed on disk
  - swapped into and out of memory as needed
- For every process

#### Create Virtual Address Space of the Process



- The page table maps resident pages to the frames in memory they occupy
- The portion of the page table that the HW needs to access must be resident in memory



## Provide Backing Store for VAS



- Resident pages mapped to memory frames
- For all other pages, OS must record where to find them on disk

#### What Data Structure Maps Non-Resident Pages to Disk?

- FindBlock(PID, page#) → disk block
  - Some OSs utilize spare space in PTE for paged blocks
  - Like the PT, but purely software
- Where to store it?
  - In memory can be compact representation if swap storage is contiguous on disk
  - Could use hash table (like Inverted PT)
- Usually want backing store for resident pages too
- May map code segment directly to on-disk image
  - Saves a copy of code to swap file
- May share code segment with multiple instances of the program





#### On page Fault ... find & start load



## On page Fault ... schedule other P or T



## On page Fault ... update PTE





## Eventually reschedule faulting thread

#### Summary: Steps in Handling a Page Fault



# Administrivia

- Project 1 Peer evaluations are up!
  - It is very important that you fill these out!
  - It is as important as getting to know your TA.
  - The project grades are a zero-sum game; if you do not contribute to the project, your points might be distributed to those who do!
- Midterm 2: Thursday 4/4
  - OK, this is a few weeks away, after Spring Break
  - Will definitely include Scheduling material (lecture 10)
  - Up to and including some material from lecture 17
  - Will have a Midterm review in early part of that week.... Stay tuned

## Some questions we need to answer!

- During a page fault, where does the OS get a free frame?
  - Keeps a free list
  - Unix runs a "reaper" if memory gets too full
    - » Schedule dirty pages to be written back on disk
    - » Zero (clean) pages which haven't been accessed in a while
  - As a last resort, evict a dirty page first
- How can we organize these mechanisms?
  - Work on the replacement policy
- How many page frames/process?
  - Like thread scheduling, need to "schedule" memory resources:
    - » Utilization? fairness? priority?
  - Allocation of disk paging bandwidth

#### **Demand Paging Cost Model**

- Since Demand Paging is like caching, can compute average access time! ("Effective Access Time")
  - EAT = Hit Rate x Hit Time + Miss Rate x Miss Time
  - EAT = Hit Time + Miss Rate x Miss Penalty
- Example:
  - Memory access time = 200 nanoseconds
  - Average page-fault service time = 8 milliseconds
  - Suppose p = Probability of miss, 1-p = Probability of hit
  - Then, we can compute EAT as follows:
    - EAT = 200ns + p x 8 ms = 200ns + p x 8,000,000ns
- If one access out of 1,000 causes a page fault, then EAT = 8.2 μs:
  - This is a slowdown by a factor of about 40!
- What if want slowdown by less than 10%?
  - EAT < 200ns x 1.1 $\Rightarrow$ p < 2.5 x 10<sup>-6</sup>
  - This is about 1 page fault in 400,000!

#### What Factors Lead to Misses in Page Cache?

- Compulsory Misses:
  - Pages that have never been paged into memory before
  - How might we remove these misses?
    - » Prefetching: loading them into memory before needed
    - » Need to predict future somehow! More later
- Capacity Misses:
  - Not enough memory. Must somehow increase available memory size.
  - Can we do this?
    - » One option: Increase amount of DRAM (not quick fix!)
    - » Another option: If multiple processes in memory: adjust percentage of memory allocated to each one!
- Conflict Misses:
  - Technically, conflict misses don't exist in virtual memory, since it is a "fully-associative" cache
- Policy Misses:
  - Caused when pages were in memory, but kicked out prematurely because of the replacement policy
  - How to fix? Better replacement policy
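
The cost-model arithmetic can be checked with a short script. This is a minimal sketch using the slide's numbers (200 ns memory access, 8 ms page-fault service time); the function names `effective_access_time_ns` and `max_miss_rate` are made up for illustration.

```python
def effective_access_time_ns(p_miss, hit_time_ns=200.0, miss_penalty_ns=8_000_000.0):
    """EAT = Hit Time + Miss Rate x Miss Penalty, all in nanoseconds."""
    return hit_time_ns + p_miss * miss_penalty_ns


def max_miss_rate(slowdown, hit_time_ns=200.0, miss_penalty_ns=8_000_000.0):
    """Largest p for which EAT stays within (1 + slowdown) x hit time."""
    return slowdown * hit_time_ns / miss_penalty_ns


eat = effective_access_time_ns(1 / 1000)
print(eat / 1000, "microseconds")   # about 8.2 microseconds
print(eat / 200, "x slowdown")      # about 41x, which the slide rounds to 40
p = max_miss_rate(0.10)
print(p, "=> one fault per", round(1 / p), "accesses")  # roughly 1 in 400,000
```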

## Page Replacement Policies

- Why do we care about Replacement Policy?
  - Replacement is an issue with any cache
  - Particularly important with pages
    - » The cost of being wrong is high: must go to disk
    - » Must keep important pages in memory, not toss them out

- FIFO (First In, First Out)
  - Throw out oldest page. Be fair: let every page live in memory for same amount of time.
  - Bad: throws out heavily used pages instead of infrequently used
- MIN (Minimum):
  - Replace page that won't be used for the longest time
  - Great, but can't really know future...
  - Makes good comparison case, however
- RANDOM:
  - Pick random page for every replacement
  - Typical solution for TLB's. Simple hardware
  - Pretty unpredictable: makes it hard to make real-time guarantees

# Replacement Policies (Cont'd)


- LRU (Least Recently Used):
  - Replace page that hasn't been used for the longest time
  - Programs have locality, so if something not used for a while, unlikely to be used in the near future.
  - Seems like LRU should be a good approximation to MIN.
- How to implement LRU? Use a list!
  - On each use, remove page from list and place at head
  - LRU page is at tail
- Problems with this scheme for paging?
  - Need to know immediately when each page used so that can change position in list...
  - Many instructions for each hardware access
- In practice, people approximate LRU (more later)
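
The list-based LRU described above maps naturally onto Python's `collections.OrderedDict`. The sketch below is illustrative only (the class name `LRUList` and its methods are assumptions); it also shows why the scheme is costly for paging, since `access` would have to run on every single memory reference.

```python
from collections import OrderedDict


class LRUList:
    """Exact LRU over a fixed number of frames (hypothetical helper)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # First key is the tail (LRU page); last key is the head (most recent).
        self.pages = OrderedDict()

    def access(self, page):
        """Touch `page`; return the evicted page, or None if nothing was evicted."""
        if page in self.pages:
            self.pages.move_to_end(page)   # remove from list and place at head
            return None
        victim = None
        if len(self.pages) >= self.num_frames:
            victim, _ = self.pages.popitem(last=False)  # LRU page is at the tail
        self.pages[page] = True
        return victim


lru = LRUList(3)
for p in "ABCAB":
    lru.access(p)
print(lru.access("D"), list(lru.pages))
```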


#### Example: FIFO

- Suppose we have 3 page frames, 4 virtual pages, and following reference stream:
  - A B C A B D A D B C B
- Consider FIFO Page replacement:

| Ref:<br>Page: | A | B | C | A | B | D | A | D | B | C | B |
|---------------|---|---|---|---|---|---|---|---|---|---|---|
| 1             | A |   |   |   |   | D |   |   |   | C |   |
| 2             |   | B |   |   |   |   | A |   |   |   |   |
| 3             |   |   | C |   |   |   |   |   | B |   |   |

- FIFO: 7 faults
  - When referencing D, replacing A is bad choice, since need A again right away
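
The FIFO trace above can be reproduced with a small simulator. This is a sketch for checking the fault count by machine; the function name `fifo_faults` is an assumption, not something from the lecture.

```python
from collections import deque


def fifo_faults(refs, num_frames):
    """Count page faults for FIFO replacement over a reference stream."""
    queue = deque()     # oldest resident page sits at the left
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue    # hit: FIFO ignores how recently a page was used
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())   # throw out the oldest page
        queue.append(page)
        resident.add(page)
    return faults


print(fifo_faults("ABCABDADBCB", 3))   # 7, matching the table
```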


#### Example: MIN

- Suppose we have the same reference stream:
  - A B C A B D A D B C B
- Consider MIN Page replacement:
- MIN: 5 faults
  - Where will D be brought in? Look for the page whose next reference is farthest in the future
- What will LRU do?
  - Same decisions as MIN here, but won't always be true!

# When will LRU perform badly?

- Consider the following: A B C D A B C D A B C D
- LRU performs as follows (same as FIFO here):

| Ref:<br>Page: | A | B | C | D | A | B | C | D | A | B | C | D |
|---------------|---|---|---|---|---|---|---|---|---|---|---|---|
| 1             | A |   |   | D |   |   | C |   |   | B |   |   |
| 2             |   | B |   |   | A |   |   | D |   |   | C |   |
| 3             |   |   | C |   |   | B |   |   | A |   |   | D |

- Every reference is a page fault!

- MIN does much better: 6 faults instead of 12 on this stream
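
To compare LRU and MIN on both reference streams from these examples, a brute-force simulation is enough. The sketch below assumes 3 frames as in the slides; `lru_faults` and `min_faults` are illustrative names, and MIN is implemented the slow but obvious way by scanning the rest of the stream.

```python
def lru_faults(refs, num_frames):
    """Count faults for exact LRU; `frames` is ordered least recently used first."""
    frames = []
    faults = 0
    for page in refs:
        if page in frames:
            frames.remove(page)          # hit: will be re-appended as most recent
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)            # evict the least recently used page
        frames.append(page)
    return faults


def min_faults(refs, num_frames):
    """Count faults for MIN: evict the page whose next use is farthest away."""
    frames = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = refs[i + 1:]

            def next_use(p):
                # Pages never used again sort as "infinitely far" in the future.
                return future.index(p) if p in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults


for stream in ("ABCABDADBCB", "ABCDABCDABCD"):
    print(stream, "LRU:", lru_faults(stream, 3), "MIN:", min_faults(stream, 3))
```

On the first stream both policies take 5 faults; on the cyclic stream LRU faults on all 12 references while MIN takes 6.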






- One desirable property: When you add memory the miss rate drops
  - Does this always happen?
  - Seems like it should, right?
- No: Bélády's anomaly
  - Certain replacement algorithms (FIFO) don't have this obvious property!

# Adding Memory Doesn't Always Help Fault Rate

- Does adding memory reduce number of page faults?
  - Yes for LRU and MIN
  - Not necessarily for FIFO! (Called Bélády's anomaly)
- After adding memory:
  - With FIFO, contents can be completely different
  - In contrast, with LRU or MIN, contents of memory with X pages are a subset of contents with X+1 pages
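
Bélády's anomaly is easy to reproduce numerically. The reference stream below is the classic textbook one (an assumption here, since the slide's own figure did not survive extraction), and `count_faults` is a hypothetical helper in which a single flag switches between FIFO and LRU.

```python
def count_faults(refs, num_frames, refresh_on_hit):
    """FIFO when refresh_on_hit is False, LRU when it is True."""
    frames = []   # next eviction candidate first
    faults = 0
    for page in refs:
        if page in frames:
            if refresh_on_hit:           # LRU: a hit moves the page to the back
                frames.remove(page)
                frames.append(page)
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.pop(0)
        frames.append(page)
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for n in (3, 4):
    print(n, "frames: FIFO", count_faults(refs, n, False),
          "LRU", count_faults(refs, n, True))
```

With FIFO, going from 3 to 4 frames raises the fault count from 9 to 10; with LRU it drops from 10 to 8.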

# Implementing LRU

- Perfect:
  - Timestamp page on each reference
  - Keep list of pages ordered by time of reference
  - Too expensive to implement in reality for many reasons
- Clock Algorithm: Arrange physical pages in circle with single clock hand
  - Approximate LRU (approximation to approximation to MIN)
  - Replace an old page, not the oldest page
- Details:
  - Hardware "use" bit per physical page:
    - » Hardware sets use bit on each reference
    - » If use bit isn't set, means not referenced in a long time
    - » Some hardware sets use bit in the TLB; you have to copy this back to page table entry when TLB entry gets replaced
  - On page fault:
    - » Advance clock hand (not real time)
    - » Check use bit:
      - $1 \rightarrow$ used recently; clear and leave alone
      - $0 \rightarrow$ selected candidate for replacement
  - Will it always find a page, or loop forever?
    - » Even if all use bits set, will eventually loop around  $\Rightarrow$  FIFO
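
A minimal simulation of the clock algorithm, under the assumption that the "hardware" sets the use bit on every reference and that a newly loaded page starts with its use bit set. The class name `Clock` and its fields are illustrative, and this variant checks the frame under the hand before advancing.

```python
class Clock:
    """Clock (not-recently-used) replacement over a fixed set of frames."""

    def __init__(self, num_frames):
        self.pages = [None] * num_frames   # page held by each physical frame
        self.use = [False] * num_frames    # per-frame "use" bit
        self.hand = 0

    def reference(self, page):
        """Touch `page`; return True if this reference caused a page fault."""
        if page in self.pages:
            self.use[self.pages.index(page)] = True   # hardware sets use bit
            return False
        # Page fault: sweep until a frame with use == 0 is found.
        while self.use[self.hand]:
            self.use[self.hand] = False               # used recently: clear, leave alone
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page                  # selected candidate for replacement
        self.use[self.hand] = True
        self.hand = (self.hand + 1) % len(self.pages)
        return True


clock = Clock(3)
print([clock.reference(p) for p in "ABCABD"], clock.pages)
```

When every use bit is set, the hand clears them all and comes back to where it started, so the choice degenerates to FIFO and the loop always terminates.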


# Clock Algorithm: Not Recently Used

- Single Clock Hand over the set of all pages in memory:
  - Advances only on page fault!
  - Check for pages not used recently
  - Mark pages as not used recently
- What if hand moving slowly?
  - Good sign or bad sign?
    - » Not many page faults and/or find page quickly
- What if hand is moving quickly?
  - Lots of page faults and/or lots of reference bits set
- One way to view clock algorithm:
  - Crude partitioning of pages into two groups: young and old
  - Why not partition into more than 2 groups?

#### N<sup>th</sup> Chance version of Clock Algorithm

#### Clock Algorithms: Details

- Which bits of a PTE entry are useful to us?
  - Use: Set when page is referenced; cleared by clock algorithm
  - Modified: set when page is modified, cleared when page written to disk
  - Valid: ok for program to reference this page
  - Read-only: ok for program to read page, but not modify
    - » For example for catching modifications to code pages!
- Do we really need hardware-supported "modified" bit?
  - No. Can emulate it (BSD Unix) using read-only bit
    - » Initially, mark all pages as read-only, even data pages
    - » On write, trap to OS. OS sets software "modified" bit, and marks page as read-write.
    - » Whenever page comes back in from disk, mark read-only

# Clock Algorithms Details (continued)

- Do we really need a hardware-supported "use" bit?
  - No. Can emulate it similar to above:
    - » Mark all pages as invalid, even if in memory
    - » On read to invalid page, trap to OS
    - » OS sets use bit, and marks page read-only
  - Get modified bit in same way as previous:
    - » On write, trap to OS (either invalid or read-only)
    - » Set use and modified bits, mark page read-write
  - When clock hand passes by, reset use and modified bits and mark page as invalid again
- Remember, however, clock is just an approximation of LRU!
  - Can we do a better approximation, given that we have to take page faults on some reads and writes to collect use information?
  - Need to identify an old page, not oldest page!
  - Answer: second chance list

# Second-Chance List Algorithm (VAX/VMS)




- Split memory in two: Active list (RW), SC list (Invalid)
- Access pages in Active list at full speed
- Otherwise, Page Fault
  - Always move overflow page from end of Active list to front of Second-chance list (SC) and mark invalid
  - Desired Page On SC List: move to front of Active list, mark RW
  - Not on SC list: page in to front of Active list, mark RW; page out LRU victim at end of SC list
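
The Active/Second-chance split can be modeled with two queues. The sketch below is a simplified illustration (the class `SecondChanceVM`, the result strings, and the sizes are assumptions): a "soft fault" stands for a trap on a page found on the SC list, and a "hard fault" for one that must be read from disk.

```python
from collections import deque


class SecondChanceVM:
    """Two-list model: Active list (RW) plus Second-chance list (invalid)."""

    def __init__(self, active_size, sc_size):
        self.active_size = active_size
        self.sc_size = sc_size
        self.active = deque()   # front (left) = newest; accessed at full speed
        self.sc = deque()       # front (left) = newest; marked invalid
        self.disk_reads = 0

    def access(self, page):
        if page in self.active:
            return "hit"                         # no trap at all
        if page in self.sc:
            self.sc.remove(page)                 # desired page on SC list
            kind = "soft fault"
        else:
            self.disk_reads += 1                 # page in from disk
            kind = "hard fault"
        self.active.appendleft(page)             # front of Active list, mark RW
        if len(self.active) > self.active_size:
            # Overflow: end of Active list moves to front of SC list, marked invalid.
            self.sc.appendleft(self.active.pop())
        if len(self.sc) > self.sc_size:
            self.sc.pop()                        # page out LRU victim at end of SC list
        return kind


vm = SecondChanceVM(active_size=2, sc_size=1)
print([vm.access(p) for p in "ABCAADB"], vm.disk_reads)
```

Setting `sc_size` to 0 makes every overflow page leave memory immediately, which is plain FIFO on the Active list.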


#### Second-Chance List Algorithm (continued)

- How many pages for second chance list?
  - If 0  $\Rightarrow$  FIFO
  - If all  $\Rightarrow$  LRU, but page fault on every page reference
- Pick intermediate value. Result is:
  - Pro: Few disk accesses (page only goes to disk if unused for a long time)
  - Con: Increased overhead trapping to OS (software / hardware tradeoff)
- With page translation, we can adapt to any kind of access the program makes
  - Later, we will show how to use page translation / protection to share memory between threads on widely separated machines
- Question: why didn't VAX include "use" bit?
  - Strecker (architect) asked OS people, they said they didn't need it, so didn't implement it
  - He later got blamed, but VAX did OK anyway


# Demand Paging (more details)

- Does software-loaded TLB need use bit? Two Options:
  - Hardware sets use bit in TLB; when TLB entry is replaced, software copies use bit back to page table
  - Software manages TLB entries as FIFO list; everything not in TLB is Second-Chance list, managed as strict LRU
- Core Map
  - Page tables map virtual page  $\rightarrow$  physical page
  - Do we need a reverse mapping (i.e. physical page  $\rightarrow$  virtual page)?
    - » Yes. Clock algorithm runs through page frames. If sharing, then multiple virtual-pages per physical page
    - » Can't push page out to disk without invalidating all PTEs

## Allocation of Page Frames (Memory Pages)

- How do we allocate memory among different processes?
  - Does every process get the same fraction of memory? Different fractions?
  - Should we completely swap some processes out of memory?
- Each process needs minimum number of pages
  - Want to make sure that all processes that are loaded into memory can make forward progress
  - Example: IBM 370 needs 6 pages to handle SS MOVE instruction:
    - » instruction is 6 bytes, might span 2 pages
    - » 2 pages to handle *from*
    - » 2 pages to handle *to*
- Possible Replacement Scopes:
  - Global replacement: process selects replacement frame from set of all frames; one process can take a frame from another
  - Local replacement: each process selects from only its own set of allocated frames

## **Fixed/Priority Allocation**

- Equal allocation (Fixed Scheme):
  - Every process gets same amount of memory
  - Example: 100 frames, 5 processes  $\rightarrow$  each process gets 20 frames
- Proportional allocation (Fixed Scheme)
  - Allocate according to the size of process
  - Computation proceeds as follows:
    - $s_i$  = size of process  $p_i$  and  $S = \sum s_i$
    - m = total number of frames

$$a_i = (\text{allocation for } p_i) = \frac{s_i}{S} \times m$$

- Priority Allocation:
  - Proportional scheme using priorities rather than size
    - » Same type of computation as previous scheme
  - Possible behavior: If process  $p_i$  generates a page fault, select for replacement a frame from a process with lower priority number

- Perhaps we should use an adaptive scheme instead???
  - What if some application just needs more memory?
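
The two fixed schemes reduce to a couple of lines of arithmetic. The sketch below uses integer (floor) division, which is an assumption of this example: any frames left over after rounding would have to be handed out by some other rule. The process names and sizes are made up.

```python
def equal_allocation(num_frames, num_processes):
    """Every process gets the same number of frames."""
    return num_frames // num_processes


def proportional_allocation(sizes, num_frames):
    """a_i = (s_i / S) x m, rounded down; `sizes` maps process name to size s_i."""
    total = sum(sizes.values())   # S = sum of all s_i
    return {name: size * num_frames // total for name, size in sizes.items()}


print(equal_allocation(100, 5))                            # the slide's example: 20
print(proportional_allocation({"p1": 10, "p2": 127}, 62))  # small process gets few frames
```

The priority scheme uses the same computation with priorities substituted for sizes.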

# Page-Fault Frequency Allocation

- Can we reduce Capacity misses by dynamically changing the number of pages/application?



- Establish "acceptable" page-fault rate
  - If actual rate too low, process loses frame
  - If actual rate too high, process gains frame
- Question: What if we just don't have enough memory?
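
The page-fault frequency rule amounts to a simple feedback step applied per process. In this sketch the thresholds `low` and `high`, the one-frame step size, and the `min_frames` floor are all assumptions chosen for illustration.

```python
def adjust_allocation(frames, fault_rate, low, high, min_frames=1):
    """Return the new frame count for a process given its measured fault rate."""
    if fault_rate > high:
        return frames + 1      # rate too high: process gains a frame
    if fault_rate < low and frames > min_frames:
        return frames - 1      # rate too low: process loses a frame
    return frames              # within the "acceptable" band: leave alone


print(adjust_allocation(10, fault_rate=0.20, low=0.01, high=0.10))
print(adjust_allocation(10, fault_rate=0.001, low=0.01, high=0.10))
```

If every process is above the upper threshold at once, there are no frames left to hand out, which is the situation the closing question on this slide points at.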




## Locality In A Memory-Reference Pattern

- Program Memory Access Patterns have temporal and spatial locality
  - Group of Pages accessed along a given time slice called the "Working Set"
  - Working Set defines minimum number of pages needed for process to behave well
- Not enough memory for Working Set  $\Rightarrow$  Thrashing
  - Better to swap out process?
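
One common way to make "pages accessed along a given time slice" concrete is a sliding window over the reference stream. The sketch below assumes that definition (a window of the last `window` references); the function names are hypothetical.

```python
def working_set(refs, t, window):
    """Pages referenced in the last `window` references up to and including index t."""
    start = max(0, t - window + 1)
    return set(refs[start:t + 1])


def would_thrash(refs, t, window, num_frames):
    """True if the working set at time t does not fit in the allotted frames."""
    return len(working_set(refs, t, window)) > num_frames


refs = "ABCABDADBCB"
print(sorted(working_set(refs, t=5, window=4)))
print(would_thrash(refs, t=5, window=4, num_frames=3))
```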





- Implementation options for reverse page mapping:
  - For every page descriptor, keep linked list of page table entries that point to it
    - » Management nightmare: expensive
  - Linux 2.6: Object-based reverse mapping
    - » Link together memory region descriptors instead (much coarser granularity)


- Mapped memory (backed by a file)
- Anonymous memory (not backed by a file, heap/stack)
- Allocation priorities
  - Is blocking allowed, etc.

## Linux Virtual memory map



## Virtual Map (Details)

- Kernel memory not generally visible to user
  - Exception: special VDSO (virtual dynamically linked shared objects) facility that maps kernel code into user space to aid in system calls (and to provide certain actual system calls such as gettimeofday())
- Every physical page described by a "page" structure
  - Collected together in lower physical memory
  - Can be accessed in kernel virtual space
  - Linked together in various "LRU" lists
- For 32-bit virtual memory architectures:
  - When physical memory < 896MB
    - » All physical memory mapped at 0xC0000000
  - When physical memory >= 896MB
    - » Not all physical memory mapped in kernel space all the time
    - » Can be temporarily mapped with addresses > 0xCC000000

- For 64-bit virtual memory architectures:
  - All physical memory mapped above 0xFFFF800000000000

# Summary

- Replacement policies
  - FIFO: Place pages on queue, replace page at end
  - MIN: Replace page that will be used farthest in future
  - LRU: Replace page used farthest in past
- Clock Algorithm: Approximation to LRU
  - Arrange all pages in circular list
  - Sweep through them, marking as not "in use"
  - If page not "in use" for one pass, then can replace
- Nth-chance clock algorithm: Another approximate LRU
  - Give pages multiple passes of clock hand before replacing
- Second-Chance List algorithm: Yet another approximate LRU
  - Divide pages into two groups, one of which is truly LRU and managed on page faults.
- Working Set:
  - Set of pages touched by a process recently
- Thrashing: a process is busy swapping pages in and out
  - Process will thrash if working set doesn't fit in memory
  - Need to swap out a process