0019: CW4 Graphical Memory Maps

This web page shows color, animated versions of the memory maps in the 0019 CW4 handout, along with their accompanying text.

Getting Familiar with WeensyOS Memory Maps

Once you've cloned your repository from GitHub to your working environment, you can build the initial version of WeensyOS we've given you by issuing the shell command make run in your CW4 directory.

You should see something like the below, which shows four processes running in parallel, each running a version of the program in p-allocator:

The animated image above loops forever; in an actual run, the bars will move to the right and stay there. Don't worry if your image has different numbers of K's or otherwise has different details.

If your bars run painfully slowly, edit the p-allocator.c source file and reduce the ALLOC_SLOWDOWN constant.

Stop now to read and understand p-allocator.c.

Here’s how to interpret the memory map display:

WeensyOS displays the current state of physical and virtual memory. Each character represents 4 KB of memory: a single page. There are 2 MB of physical memory in total. (Ask yourself: how many pages is this?)
WeensyOS runs four processes, 1 through 4. Each process is compiled from the same source code (p-allocator.c), but linked to use a different region of memory.
Each process asks the kernel for more heap memory, one page at a time, until it runs out of room. As usual, each process's heap begins just above its code and global data, and ends just below its stack. The processes allocate heap memory at different rates: compared to Process 1, Process 2 allocates twice as quickly, Process 3 goes three times faster, and Process 4 goes four times faster. (A random number generator is used, so the exact rates may vary.) The marching rows of numbers (in the animated version of this map on the 0019 web site) show how quickly the heap spaces for processes 1, 2, 3, and 4 are allocated.
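To check your answer to the page-count question above, here is the arithmetic as a short C sketch. The macro names PAGESIZE and MEMSIZE_PHYSICAL are assumed for illustration; only the 4 KB and 2 MB values come from the text.

```c
#include <assert.h>
#include <stddef.h>

// Values from the description above; the macro names are assumptions.
#define PAGESIZE         4096u                 // one page: 4 KB
#define MEMSIZE_PHYSICAL (2u * 1024u * 1024u)  // physical memory: 2 MB

// Number of whole pages (one display character each) in `bytes` of memory.
size_t page_count(size_t bytes) {
    assert(bytes % PAGESIZE == 0);  // only whole pages make sense here
    return bytes / PAGESIZE;
}
```

Calling page_count(MEMSIZE_PHYSICAL) gives 512, so the physical memory map is made up of 512 characters.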

Here are two labeled memory maps showing what the characters mean and how memory is arranged.

The virtual memory display is similar:

The virtual memory display cycles successively among the four processes' address spaces. In the base version of the WeensyOS code we give you to start from, all four processes' address spaces are the same (your job will be to change that!).
Blank spaces in the virtual memory display correspond to unmapped addresses. If a process (or the kernel) tries to access such an address, the processor will generate a page fault hardware exception.
The character shown at address X in the virtual memory display identifies the owner of the corresponding "physical" page.
In the virtual memory display, a character is in reverse video if an application process is allowed to access the corresponding address. Initially, any process can modify all of physical memory, including the kernel. Memory is not properly isolated.

The Five Stages of WeensyOS

We describe below the five implementation stages you must complete in CW4: what you need to implement in each, and hints on how to do so.

Stage 1: Kernel Isolation

In the starting code we've given you, WeensyOS processes can stomp all over the kernel's memory if they want to. Better prevent that. Change kernel(), the kernel initialization function, so that kernel memory is inaccessible to applications, except for the memory holding the CGA console (the single page at (uintptr_t) console == 0xB8000). Exposing the RAM that holds the display contents directly to applications is a throwback to the days of DOS, whose applications typically generated console output in precisely this way. DOS couldn't run more than one application at once, so there wasn't any risk of multiple concurrent applications clobbering one another's writes to the same screen locations. We borrow this primitive console design to keep WeensyOS simple and compact.

When you are done, WeensyOS should look like the below. In the virtual map, kernel memory is no longer reverse-video, since the user can't access it. Note the lonely CGA console memory block.

Hints:

Use virtual_memory_map(). A description of this function is in kernel.h. You will benefit from reading all the function descriptions in kernel.h. You can supply NULL for the allocator argument for now.
If you really want to look at the code for virtual_memory_map(), it is in k-hardware.c, along with many other hardware-related functions.
The perm argument to virtual_memory_map() is a bitwise-or of zero or more PTE flags: PTE_P, PTE_W, and PTE_U. PTE_P marks Present pages (pages that are mapped). PTE_W marks Writable pages. PTE_U marks User-accessible pages--pages accessible by applications. You want kernel memory to be mapped with permissions PTE_P | PTE_W, which will prevent applications from reading or writing the memory, while allowing the kernel to both read and write.
Make sure that your sys_page_alloc() system call preserves kernel isolation: Applications shouldn’t be able to use sys_page_alloc() to screw up the kernel.
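The permission logic in the hints above can be tried out in a small stand-alone model before you touch the kernel. This is only a sketch: model_pagetable, model_map(), model_isolate_kernel(), and model_user_can_access() are hypothetical stand-ins (they are not WeensyOS functions), the value of PROC_START_ADDR is assumed, and the model treats every address below PROC_START_ADDR as kernel memory. The flag values are the standard x86 page table bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGESIZE        4096u
#define NPAGES          512          // 2 MB / 4 KB
#define PTE_P           1            // present
#define PTE_W           2            // writable
#define PTE_U           4            // user-accessible
#define CONSOLE_ADDR    0xB8000u     // from the text
#define PROC_START_ADDR 0x100000u    // assumed value

// Hypothetical stand-in for a page table: one permission word per page.
typedef struct { int perm[NPAGES]; } model_pagetable;

// Hypothetical stand-in for virtual_memory_map(): set `perm` on [va, va + sz).
void model_map(model_pagetable *pt, uintptr_t va, size_t sz, int perm) {
    assert(va % PAGESIZE == 0 && sz % PAGESIZE == 0);
    assert((va + sz) / PAGESIZE <= NPAGES);
    for (uintptr_t a = va; a < va + sz; a += PAGESIZE) {
        pt->perm[a / PAGESIZE] = perm;
    }
}

// Kernel area: kernel-only. Console page: the one user-accessible exception.
void model_isolate_kernel(model_pagetable *pt) {
    model_map(pt, 0, PROC_START_ADDR, PTE_P | PTE_W);
    model_map(pt, CONSOLE_ADDR, PAGESIZE, PTE_P | PTE_W | PTE_U);
}

// An application can touch a page only if it is both present and user-accessible.
int model_user_can_access(const model_pagetable *pt, uintptr_t va) {
    const int need = PTE_P | PTE_U;
    return (pt->perm[va / PAGESIZE] & need) == need;
}
```

If you start from a table in which every page has PTE_P | PTE_W | PTE_U and then call model_isolate_kernel(), only the console page below PROC_START_ADDR stays accessible to applications. That corresponds to the lonely reverse-video block in the map.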

Stage 2: Isolated Address Spaces for Processes

Implement process isolation by giving each process its own independent page table. Your OS memory map should look like this when you're done:

That is, each process only has permission to access its own pages. You can tell this because only its own pages are shown in reverse video.

What goes in per-process page tables:

The initial mappings for addresses less than PROC_START_ADDR should be copied from those in kernel_pagetable. You can use a loop with virtual_memory_lookup() and virtual_memory_map() to copy them. Alternatively, you can copy the mappings directly from the kernel's page table into the new page tables; this is faster, but make sure you copy the right data!
The initial mappings for the user area--addresses greater than or equal to PROC_START_ADDR--should be inaccessible to user processes (i.e., PTE_U should not be set for these PTEs). In our solution (shown above), these addresses are totally inaccessible (so they show as blank), but you can also change this so that the mappings are still there, but accessible only to the kernel, as in this diagram:

The reverse video shows that this OS also implements process isolation correctly.
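As a rough illustration of the two rules above (copy everything below PROC_START_ADDR, leave the user area unmapped), here is a stand-alone model. The types model_pte and model_pagetable and the function model_init_process_pagetable() are hypothetical, the page table is flattened into a plain array rather than the real four-level structure, and the value of PROC_START_ADDR is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGESIZE        4096u
#define NVIRT           768          // 0x300000 bytes of virtual memory / 4 KB
#define PROC_START_ADDR 0x100000u    // assumed value
#define PTE_P           1
#define PTE_W           2
#define PTE_U           4

// Hypothetical flattened page table: one entry per virtual page.
typedef struct { uintptr_t pa; int perm; } model_pte;
typedef struct { model_pte pte[NVIRT]; } model_pagetable;

// Fill `dst` for a new process: kernel-area mappings are copied from
// `kernel_pt`; every page at or above PROC_START_ADDR starts out unmapped.
void model_init_process_pagetable(model_pagetable *dst,
                                  const model_pagetable *kernel_pt) {
    assert(dst != NULL && kernel_pt != NULL);
    for (size_t i = 0; i < NVIRT; ++i) {
        uintptr_t va = (uintptr_t) i * PAGESIZE;
        if (va < PROC_START_ADDR) {
            dst->pte[i] = kernel_pt->pte[i];
        } else {
            dst->pte[i].pa = 0;
            dst->pte[i].perm = 0;   // not present: shows as blank
        }
    }
}
```

This follows the "totally inaccessible" variant. For the other variant you would keep the mapping for the user area but clear PTE_U instead of clearing all permissions.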

How to implement per-process page tables:

Change process_setup() to create per-process page tables.
We suggest you write a copy_pagetable(x86_64_pagetable *pagetable, int8_t owner) function that allocates and returns a new page table, initialized as a full copy of pagetable (including all mappings from pagetable). This function will be useful in Stage 5. In process_setup() you can modify the page table returned by copy_pagetable() according to the requirements above. Your function can use pageinfo[] to find free pages to use for page tables. Read about pageinfo[] at the top of kernel.c.
Remember that the x86-64 architecture uses four-level page tables.
The easiest way to copy page tables involves an allocator function suitable for passing to virtual_memory_map().
You’ll need at least to allocate a level-1 page table and initialize it to zero. You can also set up the whole four-level page table skeleton (for addresses 0...MEMSIZE_VIRTUAL - 1) yourself; then you don’t need an allocator function.
A physical page is free if pageinfo[PAGENUMBER].refcount == 0. Look at the other code in kernel.c for some hints on how to examine the pageinfo[] array.
All of process P's page table pages must have pageinfo[...].owner == P or WeensyOS's consistency-checking functions will fail. This will affect your allocator function. (Hint: Don't forget that global variables are allowed in your code!)
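The hint about global variables can be illustrated with a small model of an allocator function. Because the allocator takes no arguments, the owner to record is passed through a global. Everything here is a sketch under assumptions: model_pages[] only imitates the owner and refcount fields of pageinfo[] mentioned above, model_ram stands in for physical memory, and none of the model_ names exist in WeensyOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGESIZE 4096u
#define NPAGES   512             // 2 MB / 4 KB

// Imitation of the two pageinfo[] fields described above.
typedef struct { int8_t owner; int8_t refcount; } model_pageinfo;

model_pageinfo model_pages[NPAGES];
unsigned char model_ram[NPAGES * PAGESIZE];   // stand-in for physical memory

// Owner to record for pages handed out by the allocator (hypothetical global).
int8_t model_alloc_owner;

void model_set_alloc_owner(int8_t owner) {
    assert(owner > 0);           // the model assumes process IDs are positive
    model_alloc_owner = owner;
}

// Returns a zeroed free page, now owned by model_alloc_owner, or NULL.
void *model_pagetable_alloc(void) {
    for (size_t pn = 0; pn < NPAGES; ++pn) {
        if (model_pages[pn].refcount == 0) {
            model_pages[pn].owner = model_alloc_owner;
            model_pages[pn].refcount = 1;
            void *page = &model_ram[pn * PAGESIZE];
            memset(page, 0, PAGESIZE);   // page tables must start out empty
            return page;
        }
    }
    return NULL;                 // no free physical page
}
```

In this model you would call model_set_alloc_owner(P) just before building process P's page table, so that every page table page allocated during the copy ends up with owner P.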

If you create an incorrect page table, WeensyOS might crazily reboot. Don't panic! Add log_printf() statements. Another useful technique that may at first seem counterintuitive: add infinite loops to your kernel to track down exactly where a fault occurs. (If the OS hangs without crashing once you've added an infinite loop, then the crash you're debugging must occur at a point in the kernel's execution after your infinite loop's place in the code.)

Stage 3: Virtual Page Allocation

Thus far in CW4, WeensyOS processes have used physical page allocation: the page with physical address X is used to satisfy the sys_page_alloc(X) allocation request for virtual address X. This strategy is inflexible and limits utilization. Change the implementation of the INT_SYS_PAGE_ALLOC system call so that it can use any free physical page to satisfy a sys_page_alloc(X) request.

Your new INT_SYS_PAGE_ALLOC code must perform the following tasks:

Find a free physical page using the pageinfo[] array. Return -1 to the application if you can't find one. Use any algorithm you'd like to find a free physical page; in our model solution, we just return the first one we find.
Record the physical page's allocation in pageinfo[].
Map that physical page at the requested virtual address.
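The three tasks above can be sketched in a stand-alone model. This is not the kernel's code: model_addrspace is a hypothetical flattened address space that records a physical page number per virtual page, model_pages[] only imitates pageinfo[], and the value of PROC_START_ADDR is assumed. Rejecting misaligned or kernel-area addresses is this model's own choice, made to keep the kernel isolated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGESIZE        4096u
#define NPAGES_PHYS     512          // 2 MB of physical memory
#define NPAGES_VIRT     768          // 0x300000 bytes of virtual memory
#define PROC_START_ADDR 0x100000u    // assumed value

typedef struct { int8_t owner; int8_t refcount; } model_pageinfo;
typedef struct { int ppn[NPAGES_VIRT]; } model_addrspace;   // -1 = unmapped

model_pageinfo model_pages[NPAGES_PHYS];

void model_addrspace_init(model_addrspace *as) {
    assert(as != NULL);
    for (int i = 0; i < NPAGES_VIRT; ++i) {
        as->ppn[i] = -1;
    }
}

// Returns 0 on success, -1 on a bad address or when no page is free.
int model_sys_page_alloc(model_addrspace *as, int8_t pid, uintptr_t va) {
    assert(as != NULL);
    if (va % PAGESIZE != 0 || va < PROC_START_ADDR
        || va >= (uintptr_t) NPAGES_VIRT * PAGESIZE) {
        return -1;
    }
    // Task 1: find any free physical page (here, the first one).
    int pn = 0;
    while (pn < NPAGES_PHYS && model_pages[pn].refcount != 0) {
        ++pn;
    }
    if (pn == NPAGES_PHYS) {
        return -1;
    }
    // Task 2: record the allocation.
    model_pages[pn].owner = pid;
    model_pages[pn].refcount = 1;
    // Task 3: map the page at the requested virtual address.
    as->ppn[va / PAGESIZE] = pn;
    return 0;
}
```

Note that the physical page number chosen has nothing to do with the virtual address requested, which is the whole point of this stage.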

Don't modify the physical_page_alloc() helper function, which is also used by the program loader. You can write a new function if you need to.

Here's how our OS looks after this stage:

Stage 4: Overlapping Virtual Address Spaces

Now the processes are isolated, which is excellent. But they're still not taking full advantage of virtual memory. Isolated address spaces can use the same virtual addresses for different physical memory. There's no need to keep the four processes' address spaces disjoint.

In this stage, change each process's stack to start from address 0x300000 == MEMSIZE_VIRTUAL. Now the processes have enough heap room to use up all of physical memory! Here's how the memory map will look after you've done it successfully:

If there's no physical memory available, sys_page_alloc() should return an error to the caller (by returning -1). Our model solution additionally prints "Out of physical memory!" to the console when this happens; you don't need to.
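To make the new stack placement concrete, here is the address arithmetic as a small C sketch. It assumes a one-page stack (an assumption, not something stated above) and relies on the fact that x86-64 stacks grow downward, so a stack that starts at 0x300000 occupies the page just below that address. The function name stack_page_base() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGESIZE        4096u
#define MEMSIZE_VIRTUAL 0x300000u   // from the text

// Base address of a one-page stack whose initial stack pointer is `stack_top`.
uintptr_t stack_page_base(uintptr_t stack_top) {
    assert(stack_top % PAGESIZE == 0 && stack_top >= PAGESIZE);
    return stack_top - PAGESIZE;
}
```

With stack_top equal to MEMSIZE_VIRTUAL, the stack page begins at 0x2FF000, the last page of the virtual address space.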

Stage 5: Fork

The fork() system call is one of Unix's great ideas. It starts a new process as a "copy" of an existing one. The fork() system call appears to return twice, once to each process. To the child process, it returns 0. To the parent process, it returns the child's process ID.
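If you haven't used fork() before, the following short program fragment shows the "returns twice" behavior on an ordinary Unix system (not on WeensyOS). It uses only standard POSIX calls; the function name fork_demo() and the exit code 7 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks once. The child exits with code 7; the parent waits for it and
// returns the child's exit code, or -1 if anything goes wrong.
int fork_demo(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                  // fork() failed; no child was created
    }
    if (pid == 0) {
        _exit(7);                   // child: fork() returned 0
    }
    assert(pid > 0);                // parent: fork() returned the child's ID
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Both branches run, each in a different process: the child takes the pid == 0 branch and the parent takes the other.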

Run WeensyOS with make run or make run-console. At any time, press the "f" key. This will soft-reboot WeensyOS and cause it to run a single process from the p-fork application, rather than the gang of allocator processes. You should see something like this in the memory map:

That's because you haven't implemented fork() yet.

How to implement fork():

When a process calls fork(), look for a free process slot in the processes[] array. Don’t use slot 0. If no free slot exists, return -1 to the caller.
If a free slot is found, make a copy of current->p_pagetable, the forking process’s page table, using your function from earlier.
But you must also copy the process data in every application page shared by the two processes. The processes should not share any writable memory except the console (otherwise they wouldn't be isolated). So fork() must examine every virtual address in the old page table. Whenever the parent process has an application-writable page at virtual address V, then fork() must allocate a new physical page P; copy the data from the parent's page into P using memcpy(); and finally map page P at address V in the child process's page table. (There's a Linux man page for memcpy().)
The child process's registers are initialized as a copy of the parent process's registers, except for reg_rax, which must hold 0 in the child because fork() returns 0 to the child.
Use virtual_memory_lookup() to query the mapping between virtual and physical addresses in a page table.
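The steps above can be rehearsed in a scaled-down, stand-alone model. This is a sketch under stated assumptions, not kernel code: the sizes are much smaller than WeensyOS's, the page table is a flat array, MODEL_CONSOLE_VPN is a hypothetical index for the shared console page, and all model_ names are made up. Cleaning up after a failed fork is this model's own choice; the steps above don't specify it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGESIZE 4096u
#define NPHYS    64              // scaled-down model sizes, not WeensyOS's
#define NVIRT    64
#define NPROC    16
#define PTE_P    1
#define PTE_W    2
#define PTE_U    4
#define MODEL_CONSOLE_VPN 8      // hypothetical: the one shared writable page

typedef struct { int ppn; int perm; } model_pte;
typedef struct { int in_use; uint64_t reg_rax; model_pte pt[NVIRT]; } model_proc;
typedef struct { int8_t owner; int8_t refcount; } model_pageinfo;

model_proc model_procs[NPROC];
model_pageinfo model_pages[NPHYS];
unsigned char model_ram[NPHYS][PAGESIZE];

// Undo a partially completed fork: free the child's pages and its slot.
static void model_abort_fork(int child) {
    for (int pn = 0; pn < NPHYS; ++pn) {
        if (model_pages[pn].refcount > 0 && model_pages[pn].owner == child) {
            model_pages[pn].owner = 0;
            model_pages[pn].refcount = 0;
        }
    }
    model_procs[child].in_use = 0;
}

// Returns the child's slot number to the parent, or -1 on failure.
int model_fork(int parent) {
    assert(parent > 0 && parent < NPROC && model_procs[parent].in_use);
    int child = -1;
    for (int i = 1; i < NPROC; ++i) {        // slot 0 is never used
        if (!model_procs[i].in_use) {
            child = i;
            break;
        }
    }
    if (child < 0) {
        return -1;
    }
    model_proc *p = &model_procs[parent];
    model_proc *c = &model_procs[child];
    *c = *p;                 // copy registers and every mapping
    c->reg_rax = 0;          // the child sees fork() return 0

    const int need = PTE_P | PTE_W | PTE_U;
    for (int v = 0; v < NVIRT; ++v) {
        if ((p->pt[v].perm & need) != need || v == MODEL_CONSOLE_VPN) {
            continue;        // not application-writable, or the console: share
        }
        int pn = 0;          // find a free physical page for the child's copy
        while (pn < NPHYS && model_pages[pn].refcount != 0) {
            ++pn;
        }
        if (pn == NPHYS) {
            model_abort_fork(child);
            return -1;
        }
        model_pages[pn].owner = (int8_t) child;
        model_pages[pn].refcount = 1;
        memcpy(model_ram[pn], model_ram[p->pt[v].ppn], PAGESIZE);
        c->pt[v].ppn = pn;   // same virtual page, private physical page
    }
    return child;
}
```

After a successful model_fork(), a write to one of the parent's writable pages no longer shows up in the child. That is the property the final memory map images are checking.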

When you're done, you should see something like the below after pressing "f":

An image like the below, however, means you forgot to copy the data for some pages, so the processes are actually "sharing" stack and/or data pages when they should not: