
Advanced Operating Systems, Fall 2004

Lab 4: Preemptive Multitasking and More

Handed out Monday, November 15
Due Monday, November 29

Introduction

In this lab you will implement preemptive multitasking among multiple simultaneously active user-mode environments. In part 1, you will implement a Unix-like fork() function, which allows one user-mode environment to fork off other, "child" environments, which start off as virtual "clones" of the parent but can subsequently execute independently of the parent. In part 2 you will add support for inter-process communication (IPC), allowing different user-mode environments to communicate and synchronize with each other explicitly. You will also add support for hardware clock interrupts and preemption.

Getting Started

Download the Lab 4 code from lab4.tar.gz, and port your code into it using the gmake patch procedure as usual. Make sure you keep the lab4.tar.gz tarball around! There will definitely be updates to the lab.

Lab Requirements

You will need to do all of the regular exercises described in the lab. It will also be to your benefit to complete at least one challenge problem. (You do not need to do one challenge problem per part, just one for the whole lab.) There's no need to write up answers to the questions this time, but do write up a short (e.g., one or two paragraph) description of what you did to solve your chosen challenge problem. If you implement more than one challenge problem, you only need to describe one of them in the write-up, though of course you are welcome to do more. Place the write-up in a file called answers.txt (plain text) or answers.html (HTML format) in the top level of your lab4 directory before handing in your work.

Note! Part 3 of this lab is due with the rest of Lab 5, not Lab 4. But it will be to your advantage to get it done now anyway!

Part 1: Copy-on-Write Fork

As mentioned earlier, Unix provides the fork() system call as its primary process creation primitive. The fork() system call copies the address space of the calling process (the parent) to create a new process (the child).

Early versions of Unix implemented fork() by copying the parent's entire data segment into a new memory region allocated for the child. This is essentially the same approach taken by our dumbfork() function from Lab 3. This memory copying is generally by far the most expensive part of the fork() operation.

However, a call to fork() is often followed almost immediately by a call to exec() in the child process, which clears out the child process's address space and loads a new program in its place. (This is what the shell typically does, for example.) In this case, most of the laborious work of address space copying is useless, because the child process will probably actually touch very little of its memory before calling exec() to execute a new program.

For this reason, later versions of Unix took advantage of more advanced virtual memory hardware to allow the parent and child to share the memory mapped into their respective address spaces until one of the processes actually modifies it. This technique is known as copy-on-write. To do this, on fork() the kernel would merely copy the address space mappings from the parent to the child instead of the actual contents of the mapped pages, and at the same time mark the now-shared pages read-only. When one of the two processes tries to write to one of these shared pages, the process takes a page fault. At this point, the Unix kernel realizes that the page was really a "virtual" or "copy-on-write" copy, and so it makes a new, private copy of the page for the faulting process. In this way, the contents of individual pages aren't copied until they are actually written to. This optimization makes a fork() followed by an exec() in the child much cheaper: the child will probably only need to copy one page (the current page of its stack) before it calls exec().

In the next piece of this lab, you will implement "proper" Unix-like fork() functionality with copy-on-write, as a user space library routine. Implementing fork() and copy-on-write support in user space has the benefit that the kernel remains much simpler and thus more likely to be correct. It also lets individual user-mode programs define their own semantics for fork(). A program that wants a slightly different implementation (for example, the expensive always-copy version like dumbfork(), or one in which the parent and child actually share memory afterward) can easily provide its own.

User-level page fault handling

The first thing we must do in order to allow user-mode code to implement true copy-on-write fork() functionality is to enable user-mode programs to handle their own page faults instead of depending on the kernel to do so. Copy-on-write is only one of many possible uses for user-level page fault handling, however.

Most page faults encountered during normal program execution are fixable. For example, most Unix kernels initially map only a single page in a new process's stack region, and allocate and map additional stack pages later "on demand" as the process's stack consumption increases and causes page faults on stack pages that are not yet mapped. A typical Unix kernel must track, for each user-mode process, the various regions of memory that are mapped and what to do when faults happen in them. For example, a fault in the stack region will typically allocate and map in a new page. A fault in the program's BSS region will typically map in a new page and also make sure it is zeroed. In systems with demand-paged executables, a fault in the text region will read the corresponding page of the binary off of disk and then map it in.

This is a lot of information for the kernel to track and get right. Instead of taking the traditional Unix approach, in JOS we push this fault handling functionality into user space, where bugs are less damaging. This design has the added benefit of allowing programs great flexibility in defining their memory regions. Not only do we get the ability to implement copy-on-write fork(), but we will use user-level page fault handling later for mapping and accessing files on a disk-based file system.

Setting the Page Fault Handler

In order to handle its own page faults, a user environment will need to register a page fault upcall with the JOS kernel. The user environment registers its page fault upcall via the new sys_set_pgfault_upcall system call. We have added a new member to the Env structure, env_pgfault_upcall, to record this information.

Exercise 1. Implement the sys_set_pgfault_upcall system call in kern/syscall.c, and hook it up to syscall. When looking up the environment ID of the target environment, be sure to enable permission checking by passing 1 as envid2env's checkperm parameter, since this is also a "dangerous" system call.

Normal and Exception Stacks in User Environments

During normal execution, a user environment in JOS will run on the normal user stack: its ESP register starts out pointing at USTACKTOP, and the stack data it pushes resides on the page between USTACKTOP-PGSIZE and USTACKTOP-1 inclusive. When a page fault occurs in user mode, however, the kernel will restart the user environment running a designated user-level page fault handler on a different stack, namely the user exception stack. In essence, we will make the JOS kernel implement automatic "stack switching" functionality on behalf of the user environment, in much the same way that the x86 processor already implements stack switching functionality on behalf of JOS when transferring from user mode to kernel mode!

The JOS user exception stack is also one page in size, and its top is defined to be at virtual address UXSTACKTOP, so the valid bytes of the user exception stack are from UXSTACKTOP-PGSIZE through UXSTACKTOP-1 inclusive. While running on this exception stack, the user-level page fault handler can use JOS's regular system calls to map new pages or adjust mappings so as to fix whatever problem originally caused the page fault. Then the user-level page fault handler returns, via an assembly language stub, to the faulting code on the original stack. This return takes place entirely in user mode! The sequence of events will look like this:

Normal user code causes a page fault.
The processor saves its state and branches to the kernel's IDT entry for page faults, which calls page_fault_handler.
The kernel sets up an exception frame on the user exception stack and branches to the environment's page fault upcall. All registers have the same values as when the fault happened, except for %esp, which points to the exception frame; %eip, which points to the page fault upcall; and %eflags.
The page fault upcall handles the page fault, making sure to save any important register values.
After restoring the register values, the page fault upcall branches directly to the %eip that caused the fault.

Each user environment that wants to support user-level page fault handling will need to allocate memory for its own exception stack, using the sys_page_alloc() system call introduced in Lab 3.

Invoking the User Page Fault Handler

You will now need to change the page fault handling code in kern/trap.c to handle page faults from user mode as follows. We will call the state of the user environment at the time of the fault the trap-time state.

If there is no page fault handler registered, the JOS kernel destroys the user environment with a message as before. Otherwise, the kernel sets up a trap frame on the exception stack that looks like this:

                 	-0  <-- UXSTACKTOP
empty			-4
empty			-8
empty			-12
empty			-16
empty			-20
trap-time eip		-24
trap-time eflags   	-28
trap-time esp		-32
tf_err (error code)	-36
fault_va        	-40 <-- %esp when handler is run

The kernel then arranges for the user environment to resume execution with the page fault handler running on the exception stack with this stack frame. (You must figure out how to make this happen.) Each empty line in the frame above is simply a 32-bit word-size space on the exception stack that the kernel does not initialize, but which the fault handler in the user environment can use. The fault_va is the virtual address at which the page fault occurred.

If the user environment is already running on the user exception stack when an exception occurs, then the page fault handler itself has faulted. In this case, you should start the new stack frame just under the current tf->tf_esp rather than at UXSTACKTOP:

(...existing contents
of exception stack...)
                 	-0  <-- tf->tf_esp
empty			-4
empty			-8
empty			-12
empty			-16
empty			-20
trap-time eip		-24
trap-time eflags   	-28
trap-time esp		-32
tf_err (error code)	-36
fault_va        	-40 <-- %esp when handler is run

To test whether tf->tf_esp is already on the user exception stack, check whether it is in the range between UXSTACKTOP-PGSIZE and UXSTACKTOP-1, inclusive.
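The placement rule above can be sketched in C as follows. This is an illustration, not the JOS code: the PGSIZE and UXSTACKTOP values are placeholders standing in for the real definitions in the JOS headers, and the helper names on_exception_stack() and exception_frame_esp() are hypothetical. The sketch returns the value %esp should hold when the handler starts (the fault_va slot, 40 bytes below the top of the frame), or 0 if the frame would run off the bottom of the exception stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration; JOS defines the real ones. */
#define PGSIZE      4096u
#define UXSTACKTOP  0xEEC00000u
#define FRAME_WORDS 10u                 /* 5 empty words + 5 saved words */
#define FRAME_BYTES (FRAME_WORDS * 4u)  /* 40 bytes, -0 down to -40 */

/* Is this trap-time %esp already on the user exception stack? */
static bool on_exception_stack(uint32_t esp)
{
    return esp >= UXSTACKTOP - PGSIZE && esp <= UXSTACKTOP - 1;
}

/* Where %esp should point when the handler runs (the fault_va slot),
 * or 0 if the new frame would not fit on the exception stack. */
static uint32_t exception_frame_esp(uint32_t trap_time_esp)
{
    /* Recursive fault: build the frame just under the current %esp. */
    uint32_t top = on_exception_stack(trap_time_esp) ? trap_time_esp
                                                     : UXSTACKTOP;
    if (top - (UXSTACKTOP - PGSIZE) < FRAME_BYTES)
        return 0;                       /* exception stack overflow */
    return top - FRAME_BYTES;
}
```

A real kernel must also make sure the exception stack page is actually mapped and writable by the environment before writing the frame; this sketch checks only the bounds.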

Exercise 2. Implement the code in kern/trap.c required to dispatch page faults to the user-mode handler. Be sure to take appropriate precautions when writing into the exception stack. (What happens if the user environment runs out of space on the exception stack?)

User-mode Page Fault Entrypoint

Next, you need to implement the assembly routine that will take care of calling the C page fault handler and resuming execution at the original faulting instruction. This assembly routine is the handler that will be registered with the kernel using sys_set_pgfault_upcall().

Exercise 3. Implement the _pgfault_upcall routine in lib/pfentry.S. There is a commented outline there to help you along. You may find it useful to reread the description from the beginning of this section as well.

Three of the empty words in the stack frame created by the kernel above are for the assembly routine to use to save the caller-saved registers before moving on to C code. (We must save the caller-saved registers -- %eax, %ecx, and %edx -- because when we return from the fault with ret, all registers should have the same values as before the fault; but the C page fault handler is free to modify the caller-saved registers.) The other two are important for the recursive fault case; see the comment in the lab. You may want to draw a stack diagram for the recursive case and carefully examine how each word gets used at each point during the execution of the fault handler in order to figure out how to use these words correctly.

Finally, you need to implement the C user library side of the user-level page fault handling mechanism.

Exercise 4. Finish set_pgfault_handler() in lib/pgfault.c.

Testing

Change kern/init.c to run user/faultread. Build your kernel and run it. You should see:

[00000000] new env 00000400
[00000000] new env 00000401
[00000401] user fault va 00000000 ip 0080003a
TRAP frame ...
[00000401] free env 00000401

Change kern/init.c to run user/faultdie. Build your kernel and run it. You should see:

[00000000] new env 00000400
[00000000] new env 00000401
i faulted at va deadbeef, err 6
[00000401] exiting gracefully
[00000401] free env 00000401

Change kern/init.c to run user/faultalloc. Build your kernel and run it. You should see:

[00000000] new env 00000400
[00000000] new env 00000401
fault deadbeef
this string was faulted in at deadbeef
fault cafebffe
fault cafec000
this string was faulted in at cafebffe
[00000401] exiting gracefully
[00000401] free env 00000401

If you see only the first "this string" line, it means you are not handling recursive page faults properly.

Change kern/init.c to run user/faultallocbad. Build your kernel and run it. You should see:

[00000000] new env 00000400
[00000000] new env 00000401
[00000401] PFM_KILL va deadbeef ip f010263d
TRAP frame ...
[00000401] free env 00000401

(Your ip may differ from ours but should begin f01.)

Make sure you understand why user/faultalloc and user/faultallocbad behave differently.

Challenge! Extend your kernel so that not only page faults, but all types of processor exceptions that code running in user space can generate, can be redirected to a user-mode exception handler. Write user-mode test programs to test user-mode handling of various exceptions such as divide-by-zero, general protection fault, and illegal opcode.

Implementing Copy-on-Write Fork

We now have the facilities to implement full copy-on-write fork functionality, entirely in user space.

We have provided a skeleton for your fork() function in lib/fork.c. Like dumbfork(), fork() creates a new environment, then scans through the parent environment's entire address space and sets up corresponding page mappings in the child. The key difference is that, while dumbfork() copied entire pages, fork() will initially only copy page mappings. Notice that the duppage() helper function in dumbfork calls sys_page_alloc() to allocate a new page of physical memory for each page in the parent, and then calls memcpy() to copy the contents of the parent's page into the child's new page. These calls to memcpy() represent the bulk of the time dumbfork() takes to run, and so fork() attempts to "optimize away" most of this page copying by copying pages lazily only when they are actually modified.

The basic control flow for fork() is as follows:

The parent installs pgfault() as the C-level page fault handler, using the set_pgfault_handler() function you implemented above.
The parent calls sys_exofork() to allocate a child environment.
For each writable or copy-on-write page in its address space below UTOP, the parent maps the page copy-on-write into the address space of the child and then remaps the page copy-on-write in its own address space.
The exception stack is not remapped this way, however. Instead you need to allocate a fresh page in the child for the exception stack. Since the page fault handler will be doing the actual copying and the page fault handler runs on the exception stack, the exception stack cannot be made copy-on-write: who would copy it?
The parent sets the page fault upcall for the child to look like its own.
The child is now ready to run, so the parent marks it runnable.
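The permission decision for each page in the loop above might look like the following sketch. PTE_P, PTE_W and PTE_U are the x86 page-table bits; the value of PTE_COW (0x800, one of the bits the hardware leaves for software use) and the helper name fork_perm() are assumptions for illustration. Treating read-only pages as plain shared mappings is also an assumption: the steps above only say what happens to writable and copy-on-write pages.

```c
#include <assert.h>
#include <stdint.h>

/* x86 PTE bits; PTE_COW is an assumed software-available bit. */
#define PTE_P   0x001u
#define PTE_W   0x002u
#define PTE_U   0x004u
#define PTE_COW 0x800u

/* Given the parent's PTE for a page below UTOP, return the permission
 * bits both the child and (afterwards) the parent should map it with. */
static uint32_t fork_perm(uint32_t pte)
{
    uint32_t perm = PTE_P | PTE_U;
    if (pte & (PTE_W | PTE_COW))
        perm |= PTE_COW;   /* writable or already COW: share read-only */
    return perm;           /* read-only pages are simply shared */
}
```

The same permission is used twice, in the order given above: once when mapping the page into the child, and then again when remapping it in the parent. Note that PTE_W is never set in the result, so any write by either environment faults.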

After the fork, both processes will take page faults when the code they run attempts to write to a page that hasn't been copied yet. Here's the control flow for the user page fault handler:

The kernel propagates the page fault to _pgfault_upcall, which calls fork()'s pgfault() handler.
pgfault() checks that the fault is a write (check err & FEC_WR) and that the PTE for the page is marked PTE_COW (copy-on-write). If not, it panics.
pgfault() allocates a new page mapped at a temporary location (namely, PFTEMP) and copies the contents of the faulting page into it. Then the fault handler maps the new page at the appropriate address with read/write permissions, in place of the old read-only mapping.
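The check-copy-remap sequence can be modeled in ordinary C. In this toy, which is not JOS code, a struct mapping stands in for one page-table entry, malloc() stands in for sys_page_alloc() at PFTEMP, and a pointer assignment stands in for remapping the page. The error-code bit FEC_WR (0x2) and the PTE bit values follow x86 conventions, but their use here is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE  4096
#define PTE_W   0x002u
#define PTE_COW 0x800u   /* assumed software bit, as in the fork sketch */
#define FEC_WR  0x2u     /* page-fault error code bit: fault was a write */

/* Toy stand-in for one virtual page's mapping. */
struct mapping {
    unsigned char *frame;   /* "physical page" currently mapped */
    uint32_t perm;
};

/* Copy-on-write fault path: check, copy to a temporary page, remap. */
static void cow_fault(struct mapping *m, uint32_t err)
{
    if (!(err & FEC_WR) || !(m->perm & PTE_COW)) {
        fprintf(stderr, "pgfault: not a write to a COW page\n");
        abort();                            /* JOS would panic() here */
    }
    unsigned char *tmp = malloc(PGSIZE);    /* the page at "PFTEMP" */
    if (tmp == NULL)
        abort();
    memcpy(tmp, m->frame, PGSIZE);          /* copy the old contents */
    m->frame = tmp;                         /* map the private copy */
    m->perm = (m->perm & ~PTE_COW) | PTE_W; /* now read/write */
}
```

After the copy, the other environment still points at the original frame, so writes to the new private page do not affect it.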

Exercise 5. Implement fork and pgfault in lib/fork.c.

Test your code with the forktree program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages:

	0401: I am ''
	0402: I am '0'
	0801: I am '00'
	0802: I am '000'
	0403: I am '1'
	0c01: I am '11'
	0c02: I am '10'
	1001: I am '100'
	0803: I am '110'
	0404: I am '01'
	1401: I am '011'
	0c03: I am '010'
	0405: I am '001'
	0406: I am '111'
	0407: I am '101'

Challenge! Implement a shared-memory fork called sfork. This version should have the parent and child sharing all their memory pages (writes in one environment appear in the other) except for pages in the stack area, which should be treated in the usual copy-on-write manner. Modify user/forktree.c to use sfork() instead of regular fork(). Also, once you have finished implementing IPC in part 2, use your sfork() to run user/pingpongs. You will have to find a new way to provide the functionality of the global env pointer.

Challenge! The current copy-on-write fork copies more pages than necessary. In particular, if both the parent and the child write to a page, then that page will be copied twice: once in the parent, and once in the child. The second of these copies is clearly unnecessary. Write a version of fork that can avoid this second copy in some circumstances. Make sure you correctly handle the case where more than two environments share a copy-on-write page.

Challenge! Your implementation of fork makes a huge number of system calls. On the x86, switching into the kernel has non-trivial cost. Augment the system call interface so that it is possible to send a batch of system calls at once. Then change fork to use this interface.

How much faster is your new fork?

You can answer this (roughly) by using analytical arguments to estimate how much of an improvement batching system calls will make to the performance of your fork: How expensive is an int 0x30 instruction? How many times do you execute int 0x30 in your fork? Is accessing the TSS stack switch also expensive? And so on...

Alternatively, you can boot your kernel on real hardware and really benchmark your code. See the RDTSC (read time-stamp counter) instruction, defined in the IA32 manual, which counts the number of clock cycles that have elapsed since the last processor reset. Bochs doesn't emulate this instruction faithfully.

This ends part 1. As usual, you can grade your submission with gmake grade and hand it in by emailing me the results of gmake tarball.

Part 2: Preemptive Multitasking and Inter-Process communication (IPC)

In this part of Lab 4 we will enhance the JOS kernel's support for multiple environments by allowing the kernel to preempt uncooperative environments and by allowing environments to pass messages to each other explicitly.

Clock Interrupts and Preemption

Modify kern/init.c to run the user/spin test program. This test program forks off a child process, which simply spins forever in a tight loop once it receives control of the CPU. Neither the parent process nor the kernel ever regains the CPU. This is obviously not an ideal situation in terms of protecting the system from bugs or malicious code in user-mode environments, because any user-mode environment can bring the whole system to a halt simply by getting into an infinite loop and never giving back the CPU. In order to allow the kernel to preempt a running environment, forcefully retaking control of the CPU from it, we must extend the JOS kernel to support external hardware interrupts. In particular, we'll program the hardware to generate clock interrupts periodically, which will force control back to the kernel where we can switch control to a different user environment.

Interrupt discipline

External interrupts (i.e., device interrupts) are referred to as IRQs. There are 16 possible IRQs, numbered 0 through 15. The mapping from IRQ number to IDT entry is not fixed. The pic_init function in kern/picirq.c maps IRQs 0-15 to interrupts IRQ_OFFSET through IRQ_OFFSET+15.

In kern/picirq.h, IRQ_OFFSET is defined to be decimal 32. Thus the IDT entries 32-47 correspond to the IRQs 0-15. The clock interrupt is IRQ 0, so IDT[32] contains the address of the clock's interrupt handler routine in the kernel. The IRQ_OFFSET of 32 was chosen so that the device interrupts do not overlap with the processor exceptions, which could obviously cause confusion. (In fact, in the early days of PCs running MS-DOS, the IRQ_OFFSET effectively was zero, which indeed caused massive confusion between handling hardware interrupts and handling processor exceptions!)
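The arithmetic of this remapping is simple enough to state as code. A minimal sketch, assuming only IRQ_OFFSET = 32 and 16 IRQ lines as described above; the function names and the IRQ_TIMER constant are hypothetical.

```c
#include <assert.h>

#define IRQ_OFFSET 32   /* value given in kern/picirq.h */
#define NUM_IRQS   16
#define IRQ_TIMER  0    /* hypothetical name: the clock is IRQ 0 */

/* IDT vector for IRQ line 0-15, or -1 if out of range. */
static int irq_to_vector(int irq)
{
    if (irq < 0 || irq >= NUM_IRQS)
        return -1;
    return IRQ_OFFSET + irq;
}

/* IRQ line for a trap number, or -1 if it is not a device interrupt. */
static int vector_to_irq(int trapno)
{
    if (trapno < IRQ_OFFSET || trapno >= IRQ_OFFSET + NUM_IRQS)
        return -1;
    return trapno - IRQ_OFFSET;
}
```

Because x86 reserves vectors 0 through 31 for processor exceptions, an exception number such as 14 (page fault) never maps back to an IRQ under this scheme.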

In JOS, we make a key simplification compared to Unix. External device interrupts are always disabled when in the kernel and always enabled when in user space. External interrupts are controlled by the FL_IF flag bit of the %eflags register (see inc/mmu.h). When this bit is set, external interrupts are enabled. While the bit can be modified in several ways, because of our simplification, we will handle it solely through the process of saving and restoring the %eflags register as we enter and leave user mode.

You will have to ensure that the FL_IF flag is set in user processes when they run so that when an interrupt arrives, it gets passed through to the processor and handled by your interrupt code. Otherwise, interrupts are masked, that is, ignored until interrupts are re-enabled. Interrupts are masked by default after processor reset, and so far we have simply never gotten around to enabling them.
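Setting FL_IF amounts to one bit operation on the saved %eflags in the environment's trapframe. The sketch below uses a cut-down trapframe struct (hypothetical, not JOS's struct Trapframe) and the x86 value of the IF bit, 0x200 (bit 9), which inc/mmu.h is assumed to define as FL_IF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FL_IF 0x00000200u   /* EFLAGS bit 9: interrupt enable */

/* Hypothetical stand-in holding only the field this sketch needs. */
struct trapframe_min {
    uint32_t tf_eflags;
};

/* What env_alloc() would do so the environment runs with IF set. */
static void enable_user_interrupts(struct trapframe_min *tf)
{
    tf->tf_eflags |= FL_IF;
}

static bool interrupts_enabled(uint32_t eflags)
{
    return (eflags & FL_IF) != 0;
}
```

When the kernel returns to the environment with iret, the processor reloads %eflags from the saved trapframe, so the environment runs with interrupts enabled. Entering the kernel through an interrupt gate (as opposed to a trap gate) clears IF again, which is why the gate types passed to SETGATE matter.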

Exercise 6. Modify kern/trapentry.S and kern/trap.c to initialize the appropriate entries in the IDT and provide handlers for IRQs 0 through 15. Make sure that all entry points into the kernel turn off interrupts. (Check the calls to SETGATE. You might want to re-read section 9.2 of the 80386 Reference Manual, or section 5.8 of the IA-32 Intel Architecture Software Developer's Manual, Volume 3.) Then modify the code in env_alloc() to ensure that user environments are always run with interrupts enabled.

The processor never pushes an error code or checks the Descriptor Privilege Level (DPL) of the IDT entry when invoking a hardware interrupt handler.

After doing this exercise, if you run your kernel with any test program that runs for a non-trivial length of time (e.g., dumbfork), you should see a kernel panic shortly into the program's execution, followed by some strange output from JOS. This is because our code has set up the clock hardware to generate clock interrupts, and interrupts are now enabled in the processor, but JOS isn't yet handling them.

Handling Clock Interrupts

Now, you'll write the code to handle clock interrupts. (The calls to pic_init and kclock_init in init.c, which we have written for you, set up the clock and the interrupt controller to generate interrupts, but JOS doesn't handle them yet.)

Exercise 7. Modify the kernel's trap() function so that it calls sched_yield() to find and run a different environment whenever a clock interrupt takes place.
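The change to trap() is essentially one more case in its dispatch. The sketch below uses counters in place of the real sched_yield() and page fault handler (all names here are hypothetical) so that it can run outside the kernel; in JOS, sched_yield() does not return.

```c
#include <assert.h>

#define IRQ_OFFSET 32
#define IRQ_TIMER  0    /* hypothetical name for the clock's IRQ line */
#define T_PGFLT    14   /* x86 page-fault exception vector */

/* Stubs standing in for kernel routines, counting how often they run. */
static int yields, pgfaults;
static void sched_yield_stub(void) { yields++; }
static void page_fault_handler_stub(void) { pgfaults++; }

/* Sketch of the dispatch in trap(): a clock tick switches environments. */
static void trap_dispatch_sketch(int trapno)
{
    if (trapno == IRQ_OFFSET + IRQ_TIMER) {
        sched_yield_stub();        /* pick and run another environment */
        return;
    }
    if (trapno == T_PGFLT) {
        page_fault_handler_stub();
        return;
    }
    /* ... system calls and other traps would be handled here ... */
}
```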

You should now be able to get the user/spin test to work: the parent process should fork off the child, sys_yield() to it a couple of times but in each case regain control of the CPU after one time slice, and finally kill the child process and terminate gracefully.

Some questions for you:

How many instructions of user code are executed between consecutive interrupts?
How many instructions of kernel code are executed to handle the interrupt? Hint: use the vb command mentioned earlier.

Inter-Process communication (IPC)

(Technically in JOS this is "inter-environment communication" or "IEC", but everyone else calls it IPC, so we'll use the standard term.)

We've been focusing on the isolation aspects of the operating system, the ways it provides the illusion that each program has a machine all to itself. Another important service of an operating system is to allow programs to communicate with each other when they want to. It can be quite powerful to let programs interact with other programs. The Unix pipe model is the canonical example.

There are many models for interprocess communication. Even today there are still debates about which models are better for various reasons. We won't get into that debate. Instead, we'll implement a simple IPC mechanism and then try it out.

IPC in JOS

You will implement a few additional JOS kernel system calls that collectively provide a simple interprocess communication mechanism. You will implement two system calls, sys_ipc_recv and sys_ipc_try_send. Then you will implement two library wrappers ipc_recv and ipc_send.

The "messages" that user environments can send to each other using JOS's IPC mechanism consist of two components: a single 32-bit value, and optionally a single page mapping. Allowing environments to pass page mappings in messages provides an efficient way to transfer more data than will fit into a single 32-bit integer, and also allows environments to set up shared memory arrangements easily.

Sending and Receiving Messages

To receive a message, an environment calls sys_ipc_recv. This system call deschedules the current environment and does not run it again until a message has been received. When an environment is waiting to receive a message, any other environment can send it a message: not just a particular environment, and not just environments that have a parent/child arrangement with the receiving environment. In other words, the permission checking used in Lab 3's system calls will not apply to IPC, because the IPC system calls are carefully designed so as to be "safe": an environment cannot cause another environment to malfunction simply by sending it messages (unless the target environment is also buggy).

To try to send a value, an environment calls sys_ipc_try_send with both the receiver's environment id and the value to be sent. If the named environment is actually receiving (it has called sys_ipc_recv and not gotten a value yet), then the send delivers the message and returns 0. Otherwise the send returns -E_IPC_NOT_RECV to indicate that the target environment is not currently expecting to receive a value.
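The send/receive handshake can be modeled as a small state machine. The struct and function names below are hypothetical and only loosely follow the fields a real struct Env would need; E_IPC_NOT_RECV is given a placeholder value. The invariant the sketch shows is that a send succeeds only while the target is blocked in receive, and each receive accepts exactly one message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define E_IPC_NOT_RECV 1   /* placeholder; JOS defines its own codes */

/* Hypothetical subset of per-environment IPC state. */
struct env_ipc_sketch {
    bool     recving;    /* blocked in receive, waiting for a message */
    bool     runnable;   /* whether the scheduler may run this env */
    uint32_t value;      /* the 32-bit value most recently received */
};

/* Like sys_ipc_recv: mark ourselves as waiting and stop running. */
static void ipc_recv_sketch(struct env_ipc_sketch *self)
{
    self->recving = true;
    self->runnable = false;
}

/* Like sys_ipc_try_send: deliver only if the target is receiving. */
static int ipc_try_send_sketch(struct env_ipc_sketch *dst, uint32_t value)
{
    if (!dst->recving)
        return -E_IPC_NOT_RECV;
    dst->recving = false;    /* one message per receive */
    dst->value = value;
    dst->runnable = true;    /* wake the receiver */
    return 0;
}
```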

A library function ipc_recv in user space will take care of calling sys_ipc_recv and then looking up the information about the received values in the current environment's struct Env. Similarly, a library function ipc_send will take care of repeatedly calling sys_ipc_try_send until the send succeeds.

Transferring Pages

When an environment calls sys_ipc_recv with a dstva parameter below UTOP, the environment is stating that it is willing to receive a page mapping. If the sender sends a page, then that page should be mapped at dstva in the receiver's address space. If the receiver already had a page mapped at dstva, then that previous page is unmapped.

When an environment calls sys_ipc_try_send with a srcva parameter below UTOP, it means the sender wants to send the page currently mapped at srcva to the receiver, with permissions perm. After a successful IPC, the sender keeps its original mapping for the page at srcva in its address space, but the receiver also obtains a mapping for this same physical page at the dstva originally specified by the receiver, in the receiver's address space. As a result this page becomes shared between the sender and receiver.

If either the sender or the receiver does not indicate that a page should be transferred, then no page is transferred. After any IPC the kernel sets the new field env_ipc_perm in the receiver's Env structure to the permissions of the page received, or zero if no page was received.
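The rule for whether a page moves reduces to a single condition on the two addresses. In this sketch the UTOP value is a placeholder and ipc_page_perm() is a hypothetical helper; it returns what the kernel would record in the receiver's env_ipc_perm.

```c
#include <assert.h>
#include <stdint.h>

#define UTOP 0xEEC00000u   /* placeholder; JOS defines the real value */

/* A page is transferred only if BOTH sides asked for one (va < UTOP).
 * Returns the value to store in env_ipc_perm. */
static uint32_t ipc_page_perm(uint32_t srcva, uint32_t dstva, uint32_t perm)
{
    if (srcva < UTOP && dstva < UTOP)
        return perm;   /* kernel maps the sender's page at dstva */
    return 0;          /* no page transferred */
}
```

A real implementation must also validate the addresses and permissions before mapping anything; this sketch captures only the rule that both sides must agree.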

Implementing IPC

Exercise 8. Implement sys_ipc_recv and sys_ipc_try_send in kern/syscall.c. When you call envid2env in these routines, you should set the checkperm flag to 0, meaning that any environment is allowed to send IPC messages to any other environment, and the kernel does no special permission checking other than verifying that the target envid is valid. Then implement the user versions, ipc_recv and ipc_send, in lib/ipc.c.
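One plausible shape for the ipc_send retry loop is shown below. The try-send and yield operations are passed in as function pointers so the loop can run outside JOS; the function names, the fake receiver, and the choice to yield between attempts (rather than spin) are assumptions, not requirements stated in this handout.

```c
#include <assert.h>

#define E_IPC_NOT_RECV 1   /* placeholder error number */

/* Hypothetical stand-ins for sys_ipc_try_send() and sys_yield(). */
typedef int (*try_send_fn)(unsigned envid, unsigned value);
typedef void (*yield_fn)(void);

/* Retry until the receiver is waiting; returns 0 on success or the
 * first unexpected error (JOS's ipc_send might panic instead). */
static int ipc_send_sketch(try_send_fn try_send, yield_fn yield,
                           unsigned envid, unsigned value)
{
    for (;;) {
        int r = try_send(envid, value);
        if (r == 0)
            return 0;
        if (r != -E_IPC_NOT_RECV)
            return r;
        yield();            /* let the receiver run before retrying */
    }
}

/* Toy receiver for demonstration: not ready for the first two tries. */
static int attempts, yields;
static int fake_try_send(unsigned envid, unsigned value)
{
    (void)envid;
    (void)value;
    return ++attempts < 3 ? -E_IPC_NOT_RECV : 0;
}
static void fake_yield(void) { yields++; }
```

Yielding between attempts gives the receiver a chance to run and call sys_ipc_recv, instead of burning the rest of the sender's time slice.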

Use the user/pingpong and user/primes programs to test your IPC mechanism. You might find it interesting to read user/primes.c to see all the forking and IPC going on behind the scenes.

Challenge! The ipc_send function is not very fair. Run three copies of user/fairness and you will see this problem. The first two copies are both trying to send to the third copy, but only one of them will ever succeed. Make the IPC fair, so that each copy has approximately equal chance of succeeding.

Challenge! Why does ipc_send have to loop? Change the system call interface so it doesn't have to. Make sure you can handle multiple environments trying to send to one environment at the same time.

Challenge! The prime sieve is only one neat use of message passing between a large number of concurrent programs. Read C. A. R. Hoare, "Communicating Sequential Processes," Communications of the ACM 21(8) (August 1978), 666-677, and implement the matrix multiplication example.

Challenge! Probably the most impressive example of the power of message passing is Doug McIlroy's power series calculator, described in M. Douglas McIlroy, "Squinting at Power Series," Software--Practice and Experience, 20(7) (July 1990), 661-683. Implement his power series calculator and compute the power series for sin (1+x²).

Challenge! Make JOS's IPC mechanism more efficient by applying some of the techniques from Liedtke's paper, "Improving IPC by Kernel Design", or any other tricks you may think of. Feel free to modify the kernel's system call API for this purpose, as long as your code is backwards compatible with what our grading scripts expect.

Challenge! Generalize the JOS IPC interface so it is more like L4's, supporting more complex message formats.

This ends part 2. As usual, you can grade your submission with gmake grade. If you are trying to figure out why a particular test case is failing, run sh grade.sh -v, which will show you the output of the kernel builds and Bochs runs for each test, until a test fails. When a test fails, the script will stop, and then you can inspect bochs.out to see what the kernel actually printed.

Part 3: Kernel Binaries

Note! Part 3 of this lab is due with the rest of Lab 5, not Lab 4. But it will be to your advantage to get it done now anyway!

In this portion of the lab, you'll implement spawn, a library OS function that creates a new environment, loads a program image from the kernel, and then starts the child environment running this program. The parent process then continues running independently of the child. The spawn function acts effectively like a UNIX fork followed by an immediate exec.

We're implementing spawn rather than a UNIX-style exec because spawn is easier to implement from user space in "exokernel fashion", without special help from the kernel. Think about what you would have to do in order to implement exec in user space, and be sure you understand why it is harder.

In later labs, you'll load program images from a tiny file system. But we don't have a file system yet, so we've introduced a special system call, sys_kernbin_page_alloc, that you'll use at user level to load the user program images that are linked into the kernel. sys_kernbin_page_alloc allocates a page of memory in a destination environment, then copies data from a named binary image (such as "dumbfork" or "faultallocbad") into that page.

Exercise 9. Read the header comment on the sys_kernbin_page_alloc function in kern/syscall.c and make sure you understand it. Change your syscall() function to dispatch the relevant system call to sys_kernbin_page_alloc.

Mini-Challenge: Delete the body of sys_kernbin_page_alloc before you read it, then implement it from the documentation. Check out kern/kernbin.h.

The other new system call you'll use is sys_set_trapframe, which lets an environment set the struct Trapframe of one of its children (or its own) to an arbitrary value. Our spawn will call sys_set_trapframe so that the child starts executing the loaded program, rather than resuming at the instruction immediately following the parent's sys_exofork call.

Exercise 10. Read the header comment on the sys_set_trapframe function in kern/syscall.c and make sure you understand it. Change your syscall() function to dispatch the relevant system call to sys_set_trapframe.

Mini-Challenge: Delete the body of sys_set_trapframe before you read it, then implement it from the documentation.

The skeleton for the spawn function is in lib/spawn.c. We will put off the implementation of argument passing until the next exercise. Fill in spawn so that it operates roughly as follows:

Read in the ELF header of the named binary using sys_kernbin_page_alloc.
Create a new environment.
Allocate a stack at USTACKTOP - PGSIZE using the provided init_stack function.
Load the program text, data, and bss at the appropriate addresses specified in the ELF executable, by filling in the load_segment helper function. Don't forget to clear to zero any portions of these program segments that are not loaded from the executable file, and to set the program's entry point.
Initialize the child's register state using the new sys_set_trapframe system call.
Start the environment running!
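Steps 1 and 4 above depend on the ELF header and its program header table. The following is a minimal sketch of walking that table, written against the Linux/glibc <elf.h> definitions (Elf32_Ehdr, Elf32_Phdr, PT_LOAD) rather than JOS's own ELF definitions, whose names differ. count_loadable_segments is a hypothetical helper; in spawn, each PT_LOAD entry is where you would call load_segment. It assumes the headers lie within the bytes you have already read, as when only the first page of the binary has been fetched.

```c
#include <assert.h>
#include <elf.h>      /* glibc/Linux ELF definitions, used here for illustration */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the PT_LOAD segments in a 32-bit ELF image held in memory.
 * Returns -1 if the image is not a 32-bit ELF file or a header is out of range. */
int count_loadable_segments(const uint8_t *image, size_t size)
{
    Elf32_Ehdr eh;
    assert(image != NULL);
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;

    int n = 0;
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        size_t off = (size_t)eh.e_phoff + (size_t)i * eh.e_phentsize;
        if (eh.e_phentsize < sizeof ph || off + sizeof ph > size)
            return -1;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;               /* only PT_LOAD segments are mapped */
        if (ph.p_filesz > ph.p_memsz)
            return -1;              /* malformed: more file data than memory */
        /* spawn would call load_segment(p_vaddr, p_memsz, p_filesz, p_offset) here */
        n++;
    }
    return n;
}
```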

Exercise 11. Finish spawn and load_segment.
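The trickiest arithmetic in load_segment is deciding, page by page, how many bytes come from the executable and how many must be zeroed (the bss tail, where p_memsz exceeds p_filesz). The sketch below computes only that plan; it assumes a segment that may start mid-page, and plan_segment, struct page_plan, PG_DOWN and PG_UP are hypothetical names, not part of the JOS skeleton. It does not handle address ranges that wrap past 4 GB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE   4096u
#define PG_DOWN(a) ((a) & ~(PGSIZE - 1))          /* local helper: round down to a page */
#define PG_UP(a)   PG_DOWN((a) + PGSIZE - 1)      /* local helper: round up to a page */

struct page_plan {
    uint32_t pg_va;       /* page-aligned virtual address in the child */
    uint32_t file_bytes;  /* segment bytes in this page copied from the executable */
    uint32_t zero_bytes;  /* segment bytes in this page that must be zero-filled */
};

/* Number of bytes shared by [a0, a1) and [b0, b1). */
static uint32_t overlap(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1)
{
    uint32_t lo = a0 > b0 ? a0 : b0;
    uint32_t hi = a1 < b1 ? a1 : b1;
    return hi > lo ? hi - lo : 0;
}

/* Fill plan[] with one entry per page touched by [va, va + memsz).
 * Returns the number of pages, or -1 if more than max pages are needed. */
int plan_segment(uint32_t va, uint32_t memsz, uint32_t filesz,
                 struct page_plan *plan, size_t max)
{
    assert(filesz <= memsz);
    uint32_t end = va + memsz;
    size_t n = 0;
    for (uint32_t pg = PG_DOWN(va); pg < PG_UP(end); pg += PGSIZE) {
        if (n == max)
            return -1;
        uint32_t in_file = overlap(pg, pg + PGSIZE, va, va + filesz);
        uint32_t in_mem  = overlap(pg, pg + PGSIZE, va, end);
        plan[n++] = (struct page_plan){ pg, in_file, in_mem - in_file };
    }
    return (int)n;
}
```

For example, a segment at 0x800020 with filesz 0x1500 and memsz 0x3000 touches four pages: 0xfe0 file bytes in the first, 0x520 file bytes plus 0xae0 zero bytes in the second, a fully zeroed third page, and 0x20 zero bytes in the fourth. The file offset of a page's data is p_offset plus (max(pg_va, va) - va).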

Test your code by changing kern/init.c to run the user_spawnhello program. This should print out something like this, as the "parent environment" spawns off the hello program:

[00000000] new env 00000401
i am parent environment 00000401
[00000401] new env 00000402
[00000401] exiting gracefully
[00000401] free env 00000401
hello, world
i am environment 00000402
[00000402] exiting gracefully
[00000402] free env 00000402

Challenge! Implement Unix-style exec.

Last but not least, you'll update init_stack in lib/spawn.c to pass any command-line arguments into the spawned process, via argc and argv. There's only one way to pass data to a process: stick that data in a page, map that page in the process's memory space, and pass the process a pointer to the data. In JOS, we save space by storing the arguments on the child process's initial stack page!

There are two components of this work: what the parent does and what the child does.

On the parent side, spawn must set up the new environment's initial stack page so that the arguments are available to the child's umain() function. The parent formats the memory according to the following diagram.

USTACKTOP: 
         +--------------+
         |   block of   | Block of strings.  In the example
         |    memory    | "simple", "-f", "foo", "-c", and
         | holding NULL | "junk" would be stored here. 
         |  terminated  |
         | argv strings |
         +--------------+
         |  &argv[n]    |  Next comes the argv array, an array of
         |     .        |  pointers to the strings. Each &argv[*]
         |     .        |  points into the "block of strings" above.
         |     .        |
         |  &argv[1]    |
         |  &argv[0]    |<-.
         +--------------+   |
         |   argv ptr   |__/  In the body of umain, argc and argv
%esp ->  |   argc       |     refer to these two values.
         +--------------+

If these values are on the stack when umain is called, then umain will be able to access its arguments via the int argc and char* argv[] parameters.

Warning: the diagram shows the memory at USTACKTOP since this is where it will be mapped in the child's address space. However, be careful! When the parent formats the arguments, it must do so at a temporary address, since it can't (well, shouldn't) map over its own stack. Similarly, take care when setting the pointers argv ptr, &argv[0] .. &argv[n]. These pointers need to account for the fact that the data will be remapped into the child at USTACKTOP.
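Below is a hosted sketch of the parent's side of this layout, under stated assumptions: a byte array stands in for the temporary mapping of the stack page, USTACKTOP is given an assumed JOS-like value (use the one in your tree), and values are stored as 32-bit words because JOS runs on 32-bit x86. The key point is that every pointer written into the page is the child's address (an offset from USTACKTOP - PGSIZE), never the parent's temporary address. build_arg_page, put_u32 and CHILD_PAGE_BASE are hypothetical names, and the sketch also NULL-terminates the argv array, following the C main convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE          4096u
#define USTACKTOP       0xeebfe000u            /* assumed JOS-like value, for illustration */
#define CHILD_PAGE_BASE (USTACKTOP - PGSIZE)   /* child address of byte 0 of the stack page */

/* Store a 32-bit value (a child pointer or argc) at byte offset off in the page. */
static void put_u32(uint8_t *page, uint32_t off, uint32_t val)
{
    memcpy(page + off, &val, sizeof val);
}

/* Lay out argc/argv in page (the parent's temporary copy) as the child will see
 * it once mapped just below USTACKTOP. Returns 0 and sets *child_esp, or -1 if
 * the arguments do not fit in one page. */
int build_arg_page(uint8_t *page, const char **argv, uint32_t *child_esp)
{
    assert(page != NULL && argv != NULL && child_esp != NULL);
    uint32_t argc = 0, string_size = 0;
    for (; argv[argc] != NULL; argc++)
        string_size += (uint32_t)strlen(argv[argc]) + 1;

    /* strings + up to 3 alignment bytes + argv[0..argc] + argv ptr + argc */
    if ((uint64_t)string_size + 3 + 4u * ((uint64_t)argc + 3) > PGSIZE)
        return -1;

    uint32_t str_off  = PGSIZE - string_size;              /* block of strings at the top */
    uint32_t argv_off = (str_off & ~3u) - 4u * (argc + 1); /* aligned argv array below it */
    uint32_t esp_off  = argv_off - 8u;                     /* argc, then argv ptr above it */

    for (uint32_t i = 0; i < argc; i++) {
        uint32_t len = (uint32_t)strlen(argv[i]) + 1;
        memcpy(page + str_off, argv[i], len);
        /* Record the string's child address, not its address in the parent. */
        put_u32(page, argv_off + 4u * i, CHILD_PAGE_BASE + str_off);
        str_off += len;
    }
    put_u32(page, argv_off + 4u * argc, 0);                  /* argv[argc] = NULL */
    put_u32(page, esp_off + 4u, CHILD_PAGE_BASE + argv_off); /* argv ptr */
    put_u32(page, esp_off, argc);                            /* argc, where %esp points */
    *child_esp = CHILD_PAGE_BASE + esp_off;
    return 0;
}
```

With the arguments 'init' 'one' 'two', the 13 bytes of strings occupy the top of the page, the four-entry argv array sits just below them at a 4-byte boundary, and the child's initial %esp comes out at CHILD_PAGE_BASE + 0xfd8. After filling the page, the parent would map it into the child just below USTACKTOP and start the child with that %esp.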

Exercise 12. Finish init_stack.

Test your code by changing kern/init.c to run the user_spawninit program. This should print out something like this, as the "parent environment" spawns off the init program:

[00000000] new env 00000401
i am parent environment 00000401
[00000401] new env 00000402
[00000401] exiting gracefully
[00000401] free env 00000401
init: running
init: data seems okay
init: bss seems okay
init: args: 'init' 'one' 'two'
init: exiting
[00000402] exiting gracefully
[00000402] free env 00000402

Challenge! Implement Unix-like environment variables.

On the child side, examine the child's entry path, the code under the start label. It is written so that libmain() and umain() both take the arguments (int argc, char *argv[]); libmain() simply passes its arguments along to umain(). You'll also notice that the entry path handles the case where a new process is created by the kernel, in which case no arguments are passed. The child-side code has been done for you; you do not need to make any changes.

Technical Detail: Actually, only argc and the argv ptr must be placed on the new env's stack. The argv ptr must point to the &argv[0] .. &argv[n] array, each entry of which points to a string. As a consequence, the &argv[0] .. &argv[n] array and the "block of strings" can be located anywhere in the new env's address space, not necessarily on the stack. In practice, we find it convenient to store all of these values on the stack, as presented in this exercise.

This completes the lab.
