Lecture 7: Synchronization



We begin this lecture by providing a few definitions:

Critical Section - The textbook defines a critical section as a segment of code that cannot be executed while some other process is in a corresponding segment of code. For example, a critical section might be code segments in two different programs that write a shared variable. Alternatively, as presented in lecture, a critical section is a piece of code that must be executed atomically with respect to other corresponding pieces of code. If a critical section is not atomic, we have unexpected program behavior and a bug.

Atomic - An atomic operation is one whose code executes as a single, indivisible unit. With respect to other critical sections, it behaves as though there were no concurrency and no interrupts.


Let us take the example of a simple ATM (Automated Teller Machine) network. In our example, every account supports withdrawals and deposits. Obviously, every account also has a balance. We can define withdraw and deposit functions as follows:

withdraw(d)                                   deposit(d)
{                                             {
    if(balance >= d)                              balance += d;
        balance -= d;                         }
}

We can note at this point that balance is a variable in shared memory. Now, what would happen if we tried to execute the following three lines of code?

balance = 10;
withdraw(5);
deposit(5);


Intuitively, we expect balance to end up back at 10. However, let us take a closer look at the machine-level operation of withdraw() and deposit() (note: the numbers 1-6 next to the lines of code will be important to our discussion shortly):

withdraw(d)
{
    movl balance , %eax                  1
    subl d , %eax                        2
    movl %eax , balance                  3
}

deposit(d)
{
    movl balance , %eax                  4
    addl d , %eax                        5
    movl %eax , balance                  6
}


We note at this point that we are assuming a uniprocessor machine. In this design, registers are saved and restored on every context switch, so each thread effectively has its own copy of %eax and the register is not shared. However, as previously stated, balance is a shared global variable. Without critical sections, the machine code above can be interleaved in the pattern 1-4-5-6-2-3, producing a final balance of 5! It can also be interleaved in the pattern 1-4-2-3-5-6, producing a final balance of 15! How unfortunate (for would-be thieves) that ATM programmers knew what they were doing. :[
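
To see this race for ourselves, here is a minimal sketch (our own, not from the lecture) that runs the two operations concurrently with POSIX threads; the iteration counts and function names are chosen purely for illustration, and on a modern multiprocessor the lost updates show up readily:

/* race_demo.c -- compile with something like: gcc race_demo.c -lpthread */
#include <stdio.h>
#include <pthread.h>

long balance = 10;                 /* shared global, just like in the lecture */

void withdraw(long d) { if (balance >= d) balance -= d; }
void deposit(long d)  { balance += d; }

/* each thread repeats its operation many times to make the race likely */
void *withdrawer(void *arg) { for (int i = 0; i < 1000000; i++) withdraw(5); return NULL; }
void *depositor(void *arg)  { for (int i = 0; i < 1000000; i++) deposit(5);  return NULL; }

int main(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, NULL, withdrawer, NULL);
	pthread_create(&t2, NULL, depositor,  NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* intuitively this should print 10, but without a critical section it often does not */
	printf("final balance = %ld\n", balance);
	return 0;
}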


Now that we have clearly shown the need for critical sections, let us take a look into how we can determine and build critical sections. When building a critical section, we must take a few things into consideration:

1) Types of concurrency possible.
2) Critical section size.
3) Types of concurrency allowed.

Concurrency in the kernel can arise from interrupts and from SMP (multiple processors). In a multi-threaded application, we can experience concurrency due to signals, timers, other interrupts, and SMP.
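
As a small illustration of the signal case (a sketch of our own, not from the lecture), even a single-threaded program races with itself once a signal handler touches the same data as the main code; here the SIGALRM handler can fire between the load and the store of counter++:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

volatile long counter = 0;   /* shared between the main code and the signal handler */
volatile int  done = 0;

void on_alarm(int sig)
{
	/* runs "in the middle of" whatever main() was doing, even with only one thread */
	counter += 1000;
	done = 1;
}

int main(void)
{
	signal(SIGALRM, on_alarm);
	alarm(1);                  /* deliver SIGALRM in roughly one second */
	while (!done)
		counter++;         /* load, add, store: the handler can fire between these steps */
	printf("counter = %ld\n", counter);
	return 0;
}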

To build a critical section, the simplest idea is to turn off interrupts! At the supervisor level, there are two simple (x86) instructions to aid us in doing so. They are:

"cli" ---> turns off interrupts
"sti" ---> turns on interrupts

cli and sti manipulate a single bit, the interrupt-enable flag. On a uniprocessor, once cli is executed no interrupt can occur, so no other code in the system can preempt us until sti is executed. As such, cli and sti are considered very "aggressive" instructions; they are privileged and are surely not available at the user level.
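
As a sketch of how this might look in kernel code (hypothetical, not from the lecture; cli and sti are privileged x86 instructions, written here with GCC inline assembly), a uniprocessor kernel could protect the balance update like so:

/* hypothetical kernel-mode sketch: this only works at supervisor level */
static inline void disable_interrupts(void) { asm volatile("cli"); }
static inline void enable_interrupts(void)  { asm volatile("sti"); }

long balance;

void kernel_withdraw(long d)
{
	disable_interrupts();      /* on a uniprocessor, nothing else can run now */
	if (balance >= d)
		balance -= d;
	enable_interrupts();       /* critical section over; interrupts may fire again */
}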

At the user level, we need a more "fine-grained" way of building critical sections. For this purpose, we propose the use of synchronization objects: resources which can be allocated and freed, and which can be used to achieve synchronization.

An example of such an object is a mutual exclusion lock, or MUTEX for short. With this design, at most one thread can hold the lock at a time. There are two operations on a MUTEX lock, namely acquire(L) and release(L), where L is the lock. If L = 0, the lock is unlocked; if L = 1, it is locked.

We can implement acquire() as follows:

acquire(L)
{
	cli;

	while( L != 0 )
	{
		sti;	// briefly re-enable interrupts: gives others a chance to run
		cli;
	}

	L = 1;
	sti;
}
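
The lecture does not spell out release() at this point, but under the same uniprocessor assumptions it is simply the reverse: clear the lock variable so that spinning callers of acquire() can proceed (a sketch):

release(L)
{
	L = 0;	/* a single store; a waiter spinning in acquire() will now see L == 0 */
}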


We make the following observation about the acquire() function: we are disabling and re-enabling interrupts twice. Why would we choose such a setup? After all, why not do the simpler implementation as shown below?

acquire(L)
{
	while( L != 0 )
		;	/* spin */

	L = 1;
}


Note that this simpler implementation is broken in two ways. First, the test of L and the assignment L = 1 are not atomic, so two threads can both observe L == 0 and both believe they hold the lock. Second, if we tried to fix that by simply wrapping the whole thing in cli/sti, then on a uniprocessor the spinning thread could loop forever: with interrupts off, the thread holding the lock never gets a chance to run and release it, destroying the point of the MUTEX. Therefore, as shown in the original (correct) implementation, we momentarily re-enable interrupts inside the loop, during which time the operating system can decide whether some other code (such as the lock holder) should run. We then disable interrupts again before re-testing L, and only once L is 0 do we set it to 1 and re-enable interrupts. With such a setup, we place a call to acquire() just before the code we want to protect; the critical section begins when acquire() returns.
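
For example (a sketch of ours, assuming a single lock L protects balance), the ATM code becomes:

withdraw(d)
{
	acquire(L);	/* the critical section begins when acquire() returns */
	if( balance >= d )
		balance -= d;
	release(L);	/* the critical section ends */
}

deposit(d)
{
	acquire(L);
	balance += d;
	release(L);
}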

Synchronization Issues

When dealing with synchronization and the building of "good" critical sections, we have to deal with a few issues. First, locking mechanisms such as cli and sti tend to be "expensive", and busy waiting can become commonplace when they are used. Also, lock contention can occur, meaning that many processes are waiting on a single lock. A naive programmer may think to place ALL of his code in one big critical section, so as to avoid any unexpected behavior; sadly, this makes contention even worse and is considered bad programming practice. We must also take into consideration many possible locking errors: it is easy for a programmer to "miss" an access which needs to be synchronized, and breaking code into many small critical sections makes it more likely that such an access is missed. Clearly, designing critical sections can require a large amount of thought and consideration. The sketch below contrasts the two extremes.
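
Here is that trade-off as a sketch of our own (the account structure and names are ours): one coarse lock for the whole bank versus one lock per account. The fine-grained version lets operations on different accounts proceed in parallel, but there are more locks to get right:

/* coarse-grained: every account shares one lock -- simple, but high contention */
lock bank_lock;

deposit_coarse(account *a, long d)
{
	acquire(bank_lock);
	a->balance += d;
	release(bank_lock);
}

/* fine-grained: one lock per account -- less contention, but easier to miss an access */
deposit_fine(account *a, long d)
{
	acquire(a->lock);
	a->balance += d;
	release(a->lock);
}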

One of the most dreaded situations in programming is deadlock. No process involved in a deadlock can make progress, as each holds a lock or resource which is required by another. The textbook provides a [terrible] deadlock example of two pirates, each greedily holding half of a treasure map. Because both pirates are unwilling to give up their half of the map, neither is able to reach the treasure.
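
A more concrete deadlock (a sketch of ours, reusing the per-account locks from above): a transfer() that locks both accounts deadlocks if two threads transfer in opposite directions at the same time, each grabbing its first lock and then waiting forever for the other's:

transfer(account *from, account *to, long d)
{
	acquire(from->lock);	/* thread 1 runs transfer(A, B, ...) and locks A   */
	acquire(to->lock);	/* thread 2 runs transfer(B, A, ...) and locks B;  */
				/* each now waits forever for the other's lock     */
	from->balance -= d;
	to->balance   += d;
	release(to->lock);
	release(from->lock);
}

A common way to avoid this particular deadlock is to always acquire the locks in one fixed global order (for example, by account number).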

In general, to build proper synchronization, we always need hardware support. For very small critical sections, hardware provides convenient instructions which are more useful than cli and sti. These instructions are:

test_and_set(addr , value)
{
	old = *addr;
	*addr = value;
	return old;
}

and

compare_and_swap(addr, old , new)
{
	if(*addr == old)
	{
		*addr = new;
		return 1;
	}

	else
		return 0;
}


It is important to note that test_and_set() and compare_and_swap() are both atomic instructions. Given these new instructions, we can more efficiently re-write acquire() as:

acquire(L)
{
	while( compare_and_swap(&L, 0, 1) == 0 )
		;	/* spin until we succeed in changing L from 0 to 1 */
}
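
test_and_set() works just as well for this purpose; an equivalent sketch of acquire() using it would be:

acquire(L)
{
	while( test_and_set(&L, 1) == 1 )
		;	/* spin: the lock was already held by someone else */
}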



At this point, it is appropriate to introduce a "new" object known as a semaphore. A semaphore consists of an integer and two operations. These two operations are Proberen (to test) and Verhogen (to increment). Though it may be a bit confusing to see the convention reversed, this time the semaphore S is locked (unavailable) when S = 0, and UNlocked (available) when S = 1 (or, for a counting semaphore, whenever S > 0). With that in mind, we can implement the semaphore analogues of acquire() and release() as:

P(S)    // P is short for Proberen
{
	while ( S == 0 )
		wait

	S--
}

and

V(S)
{
	S++
}


We can clearly see parallels between P() and acquire(), and similarly, between V() and release().
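
For instance (a sketch of ours), a semaphore initialized to 1 behaves exactly like our MUTEX and can protect the ATM balance the same way acquire() and release() did:

int S = 1;	/* binary semaphore, initially unlocked */

withdraw(d)
{
	P(S);	/* plays the role of acquire() */
	if( balance >= d )
		balance -= d;
	V(S);	/* plays the role of release() */
}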


We end this lecture with an example using P() and V() in practice. In our example, we have two threads, a producer and a consumer, sharing a set of N buffers. The producer takes an empty buffer and fills it; the consumer takes a full buffer and empties it. If no buffers are empty, the producer waits. Likewise, the consumer waits until at least one buffer is full. Let "empty" = the number of empty buffers (initially N), and "full" = the number of full buffers (initially 0). We can implement the producer and consumer processes as follows:

producer()
{
	while(1)
	{
		P(empty)
		~fill buffer~
		V(full)
	}
}

and

consumer()
{
	while(1)
	{
		P(full)
		~empty buffer~
		V(empty)
	}
}

In this example, P() is implemented as follows:

P(s)
{
	while(1)
	{
		int try_s = s;
		/* only attempt the decrement when a resource is available */
		if( try_s > 0 && compare_and_swap(&s, try_s, try_s - 1) == 1 )
			return;
	}
}
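
For completeness (not shown in the notes), the matching V() can be written in the same style as an atomic increment using compare_and_swap():

V(s)
{
	while(1)
	{
		int try_s = s;
		if( compare_and_swap(&s, try_s, try_s + 1) == 1 )
			return;
	}
}
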
------END------