CS111 Fall Scribe Notes October 6, 2005 Week 1

Bootstrapping
What happens when the computer turns on? It starts by running very simple code, which loads and runs progressively more complex code. We call this process bootstrapping. Bootstrapping is how a computer gets from power-on to a running operating system (the "complex code"). But why does it have to start with simple code? Because of the way personal computers were designed, the process has to begin with a program that fits in very little memory, which forces that code to be small and simple; this is discussed further below. First, though, let's look at everything that is involved when the computer turns on.

There are several stages involved:

Hardware ===> Firmware ===> Software

(Processor) (ROM integrated with hardware) (Disk/Removable media, e.g. CD)

We can understand this better with an example.

Print 111! OS
Let's say we want to create an operating system that just prints "111!" to the screen and does nothing more. Here's what we want to happen:

Turn on computer
Print "111!" to the screen (without quotes)
Wait forever

But how can we get here? Let's look at how a PC's memory is laid out for us by the designers.

PC Physical Memory - Most PCs today have 32-bit memory addresses, which allow us to address up to 4 gigabytes of memory. Below is how memory is arranged.

**Fun Fact** - Old-style PCs could only address 1 MB of memory

So now that we know what goes where in memory, we can work through the problem in more depth.

VGA and Memory Mapped I/O
The address 0xB8000 in the code below is the memory location that maps to the console's top left corner. This means that we can place characters on the console by treating the console as if it were memory. That is, the VGA's memory is mapped to a specific part of the PC's physical memory. This is an example of memory-mapped input/output. So here's the code we can use to create the Print 111! OS.

        uint16_t *console = (uint16_t *)0xB8000;
        console[0] = '1';   // movw $49, 0xB8000
        console[1] = '1';
        console[2] = '1';
        console[3] = '!';
        while (1)
                /* do nothing */;

BIOS - Basic Input Output System
When the computer is turned on, its BIOS instructions are executed. That is, when you turn on ANY PC, the processor jumps to address 0xFFFFFFF0, which belongs to the BIOS. By the way, the BIOS is usually located in read-only memory (ROM), so it is also known as firmware.

RAM - Random Access Memory
Most of the PC's physical memory consists of RAM. Random Access Memory is the part of memory that programs use to store information that needs to be accessed quickly. However, whatever is stored in RAM only exists as long as the computer is turned on.

Okay, now that we've seen the Print 111! OS code and the parts of memory, where should we put the code?

* We can't store it in RAM, because the contents of RAM are lost every time the computer is turned off.

* We could put the code in the BIOS at 0xFFFFFFF0, but then printing "111!" is all the computer will ever do. We won't be able to upgrade or change the Print 111! OS, because we would have to change the contents of ROM.

1. When the computer turns on, it runs the BIOS code at 0xFFFFFFF0

So what do we do? Well, how about we have the BIOS (the hardest-to-change code) load in code from another persistent source, such as the disk, and then execute that code? This gets us some flexibility, since it is easier to change the contents of the disk than the contents of ROM. The BIOS is very, very simple, however. It can't necessarily parse a file system! (What a bad idea that would be, programming every existing file system into ROM! You could never fix bugs or design a new file system.) So it loads some more simple code in from disk.

2. Standard BIOS behavior:

find a startup disk
read first sector into memory
jump to where that sector was loaded (0x7C00)

Disks, such as floppies and hard disks, are divided into sectors, where each sector is 512 bytes long. We read and write sectors as units: read_sector(sector #) gives 512 bytes of data. Because the BIOS loads only that single 512-byte sector, the first code it loads has to be small, and this is why the bootstrapping process starts with simple code.

To read from and write to disks we use programmed I/O. In contrast to memory-mapped I/O, the processor talks to the device with explicit I/O instructions rather than ordinary memory accesses. Reading consists of these steps:

wait for disk
write command to read sector
read sector
repeat

This reads the data from the disk one sector at a time.

Therefore, we decide to force the Print 111! OS to be at a certain location on disk: the first sector. Then we rely on the BIOS code to send a read command to the disk and read the first sector into memory at 0x7C00. Next, the BIOS jumps to 0x7C00; these 512 bytes are called the Boot Loader.
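The BIOS step just described can be imitated in ordinary C. The sketch below is only a simulation under stated assumptions, not real BIOS code: the disk and physical memory are plain arrays, and the names disk, memory, read_sector and bios_load_boot_sector are made up for illustration. A real BIOS would jump to the loaded code; here we only return the address it would jump to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512       /* bytes per sector, as in the notes */
#define DISK_SECTORS 8         /* size of the pretend disk (arbitrary) */
#define BOOT_ADDR    0x7C00    /* where the first sector is placed */
#define MEM_SIZE     0x10000   /* 64 KB of pretend physical memory */

uint8_t disk[DISK_SECTORS * SECTOR_SIZE];   /* simulated disk */
uint8_t memory[MEM_SIZE];                   /* simulated RAM */

/* Copy one whole sector from the pretend disk into dst. */
void read_sector(unsigned sector, uint8_t *dst)
{
    assert(sector < DISK_SECTORS);
    memcpy(dst, &disk[sector * SECTOR_SIZE], SECTOR_SIZE);
}

/* Imitates step 2: read sector 0 into memory at 0x7C00 and report
   the address the BIOS would jump to next. */
uint32_t bios_load_boot_sector(void)
{
    read_sector(0, &memory[BOOT_ADDR]);
    return BOOT_ADDR;
}
```

Note that exactly one sector is copied: anything the boot loader needs beyond its own 512 bytes, it has to load itself.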

We can't fit much code into the boot loader. (Our Print 111 OS will fit, but Linux certainly won't!) So the boot loader follows the same process again: it next loads the OS itself, and then jumps into the true OS. This multi-stage process is called bootstrapping, because it's like "pulling yourself up by your bootstraps": the computer starts with very little functionality (the "bootstraps" -- here, the BIOS), and progressively leverages that into more and more complex code ("pulling yourself up").

3. The Boot Loader (first sector of disk) loads the rest of the Operating System from disk and then jumps into it.
4. The OS then loads applications from disk and runs them as processes.

The four numbered steps above make up the bootstrapping process.

Process
What is a process? A process can be described as a program in execution, as a set of machine registers plus all associated resources and state, or as a virtual machine (in other words, a virtual computer that the program appears to have to itself).

An operating system's kernel has a structure per process that holds process state. That is, for every process running there is a structure with important process information. We call this information the Process Descriptor:

Process Descriptor

Accounting - How long has it run? How many resources does it hold?

Process ID
Status - Is this process running? Is it blocked?
Priority
User IDs - These influence access control decisions: what is this process allowed to do?
Process Links - Parent/child relationships

Memory

Program Counter & Other registers

Files

Disk Files
Network Connections

Other Resources
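To make the list above more concrete, here is a minimal sketch of what such a structure might look like in C. Every name in it (struct proc_desc, its fields, proc_init_child, MAX_OPEN_FILES) is hypothetical and chosen only to mirror the list; a real kernel's process descriptor is far larger and laid out differently.

```c
#include <stdint.h>
#include <sys/types.h>

#define MAX_OPEN_FILES 16            /* arbitrary limit for this sketch */

enum proc_status { PROC_RUNNABLE, PROC_RUNNING, PROC_BLOCKED, PROC_ZOMBIE };

struct proc_desc {
    /* Accounting */
    unsigned long cpu_ticks;         /* how long has it run? */
    pid_t pid;                       /* process ID */
    enum proc_status status;         /* running? blocked? */
    int priority;
    uid_t uid;                       /* used for access control */
    struct proc_desc *parent;        /* process links */
    /* Memory */
    void *address_space;             /* placeholder for memory info */
    /* Program counter & other registers */
    uintptr_t program_counter;
    uintptr_t registers[8];
    /* Files */
    int open_files[MAX_OPEN_FILES];  /* disk files, network connections */
    /* Other resources */
    int exit_status;                 /* kept until the parent asks for it */
};

/* Hypothetical helper: set up a child's descriptor as a copy of its
   parent's, then change only what must differ. */
void proc_init_child(struct proc_desc *child, struct proc_desc *parent,
                     pid_t pid)
{
    *child = *parent;                /* same registers, files, user ID */
    child->pid = pid;                /* but a new process ID */
    child->parent = parent;
    child->cpu_ticks = 0;
    child->status = PROC_RUNNABLE;
}
```

The helper copies the whole descriptor and then overrides a few fields, which is roughly the "copy of the parent" idea behind creating a process.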

This is all fine and dandy, but how do we actually create a process? We use fork(). The fork() system call creates a new child process of the process that called it. The child will have the same registers and program counter as the parent. So basically, the new child process is a copy of the parent process. However, the child gets its own process ID. With two system calls, getpid() and getppid(), we can find out which process is which.

getpid() - returns the current process's process ID
getppid() - returns the process ID of the current process's parent

If fork() returns 0, then the process is the child process. In the parent, fork() returns the child's process ID.

But what if we don't want a copy of a process? How do we start a completely new program? We use exec() after forking. exec() throws out the copied program and loads the new program in its place; it also wipes the registers and replaces the process's memory with the binary from disk.
In Memory:

After fork()

Program 5

Copy of Program 5

After exec( )

Program 5

New Program 6
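The fork-then-exec pattern can be sketched as follows. This is a minimal sketch, not the only way to do it: the function name spawn_and_wait is made up, execvp() is used as one concrete member of the exec() family, and the parent uses waitpid() (covered below) to collect the result. The value 127 for a failed exec is a common convention, assumed here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] in a new process.
   Returns the child's exit status, or -1 on error. */
int spawn_and_wait(char *const argv[])
{
    pid_t p = fork();
    if (p < 0)
        return -1;                /* fork failed */
    if (p == 0) {
        /* Child: throw out the copied program, load the new one. */
        execvp(argv[0], argv);
        _exit(127);               /* reached only if exec failed */
    }
    /* Parent: still running the original program. */
    int status;
    if (waitpid(p, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with the argument vector { "sh", "-c", "exit 7", NULL } should return 7, assuming a shell named sh is on the PATH.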

        exit(status);
                 Frees the process's memory
                 Releases resources
                 Process descriptor remains!! The parent MUST then be notified of the exit status

        waitpid(pid, &status, flags);
                 e.g. waitpid(6, &status, 0);
                 Waits for process pid to exit
                 ==> pid must be our child
                 Stores the child's exit status in status
                 Frees the child's process descriptor
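Here is a small sketch of exit and waitpid() working together. The function name child_exit_status is made up for illustration, and the child calls _exit() rather than exit() only so that it does not flush any stdio buffers it inherited from the parent. The second waitpid() call is there to show that the child's descriptor really is gone once the parent has collected the status.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent
   collects, or -1 if anything unexpected happens. */
int child_exit_status(int code)
{
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0)
        _exit(code);       /* child: its descriptor stays until reaped */

    int status;
    if (waitpid(p, &status, 0) != p)   /* parent: block until child exits */
        return -1;
    int result = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    /* The descriptor has been freed, so waiting again must fail. */
    int again;
    if (waitpid(p, &again, 0) != -1 || errno != ECHILD)
        return -1;
    return result;
}
```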

Zombie Processes - processes that have exited but whose parent has not yet been notified, so their process descriptors still remain.

What happens to files and resources in processes?

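#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
       pid_t p;
       printf("Pre-fork\n");
       p = fork();
       if (p > 0)          /* parent: fork() returned the child's pid */
              printf("P, C is %d \n", (int) p);
       else if (p == 0)    /* child: fork() returned 0 */
              printf("C %d, P is %d \n", (int) getpid(), (int) getppid());
       return 0;
}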

If the parent runs first after the fork (with 5 and 6 as example process IDs), the output is:

        Pre-fork
        P, C is 6
        C 6, P is 5

If the child runs first after the fork, the output is:

        Pre-fork
        C 6, P is 5
        P, C is 6

After fork(), open files are shared between child and parent.

When a command such as ls | sort is entered, the pipe is an indicator that all the output from ls needs to be the input to sort. A breakdown of the pipe() command is as follows:

        int fds[2]; pipe(fds);
                 Opens 2 file descriptors where writes on fds[1] can be read on fds[0]
                 Internal to the running OS (no disk representation)
                 A process can even use a pipe to talk to itself
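The "talk to itself" case makes a compact demonstration of pipe(). In this sketch the function name pipe_roundtrip is made up, and it assumes the message is short enough to fit in the pipe's internal buffer; a long message would make write() block, because nobody is reading yet.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg through a pipe and read it back in the same process.
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t) len) {  /* write end */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                    /* done writing */

    ssize_t n = read(fds[0], buf, buflen);           /* read end */
    close(fds[0]);
    return n;
}
```

Nothing is written to disk at any point: the bytes live only inside the running OS between the write() and the read().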

When the shell executes this command, it will pipe, fork, and exec to create and run these two programs. Initially, they will share the same file descriptors. This means that some file descriptors need to be duplicated and closed in order to get the pipe to work. A breakdown of the dup2() command is as follows:

        dup2(oldfd,newfd);
                 Closes newfd if it is open, then makes newfd point to oldfd's file

        pipe;                          (creates fds 3 = read end, 4 = write end)
        fork;                          (once for ls, once for sort)
        ls:   dup2(4, 1); close 3;
        sort: dup2(3, 0); close 4;

This means that the standard output (1) of ls becomes a duplicate of the pipe's write end (4), and the standard input (0) of sort becomes a duplicate of the pipe's read end (3).
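Putting pipe, fork, dup2 and exec together, a shell might run ls | sort roughly as sketched below. This is a minimal sketch under assumptions, not how any particular shell is written: the function name run_ls_pipe_sort is made up, execlp() stands in for exec(), and the code uses whatever descriptors pipe() returns rather than assuming they are 3 and 4.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "ls | sort". Returns 0 if both programs exit successfully. */
int run_ls_pipe_sort(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t ls_pid = fork();
    if (ls_pid < 0)
        return -1;
    if (ls_pid == 0) {
        dup2(fds[1], 1);          /* stdout now refers to the write end */
        close(fds[0]);
        close(fds[1]);
        execlp("ls", "ls", (char *) NULL);
        _exit(127);               /* reached only if exec failed */
    }

    pid_t sort_pid = fork();
    if (sort_pid < 0)
        return -1;
    if (sort_pid == 0) {
        dup2(fds[0], 0);          /* stdin now refers to the read end */
        close(fds[0]);
        close(fds[1]);
        execlp("sort", "sort", (char *) NULL);
        _exit(127);
    }

    /* The shell closes its own copies; otherwise sort would never
       see end-of-file on the pipe. */
    close(fds[0]);
    close(fds[1]);

    int s1, s2;
    if (waitpid(ls_pid, &s1, 0) < 0 || waitpid(sort_pid, &s2, 0) < 0)
        return -1;
    return (WIFEXITED(s1) && WEXITSTATUS(s1) == 0 &&
            WIFEXITED(s2) && WEXITSTATUS(s2) == 0) ? 0 : -1;
}
```

Each child closes both original pipe descriptors after dup2(), because the duplicate on 0 or 1 is the only copy it needs.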
