Project 2 Hints & Help:

- Re-explain concept of thread pools (I don't think I covered this in sec B)
- How to get data into thread pools? 
	* explain concept of bounded queues.. then introduce mailbox (take & put)
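
A minimal sketch of the mailbox idea, to show in section. This is not the
project's required API: the class name Mailbox, the helper run_pool, and the
use of None as a "no more work" sentinel are all assumptions made for
illustration. put() blocks while the queue is full and take() blocks while it
is empty, which is how work gets handed to the pool safely.

```python
import threading
from collections import deque


class Mailbox:
    """Bounded queue: put() blocks when full, take() blocks when empty."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.lock = threading.Lock()
        # Two condition variables sharing one lock.
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def put(self, item):
        with self.not_full:
            while len(self.items) >= self.capacity:
                self.not_full.wait()
            self.items.append(item)
            self.not_empty.notify()

    def take(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()
            return item


def run_pool(jobs, num_workers=3, capacity=2):
    """Feed jobs to a fixed pool of workers through one mailbox."""
    box = Mailbox(capacity)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = box.take()
            if job is None:          # hypothetical sentinel: no more work
                return
            with results_lock:
                results.append(job * job)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for job in jobs:
        box.put(job)                 # blocks while the mailbox is full
    for _ in workers:
        box.put(None)                # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(results)
```

The workers are created once and reused for every job; the small capacity
means the producer is slowed down whenever the pool falls behind.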


Ask: Any other questions about the project?


SCHEDULING AND PRIORITIES REVIEW


Many scheduler types:
- FIFO (non-preemptive)
- shortest-job-first 
	(how could we possibly know who's shortest?)
	what if the task lies?
- round robin (circular FIFO with fixed quantum)
- priorities
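
To compare the first three schedulers on the board, here is a rough sketch.
It assumes all jobs arrive at time 0 and that burst lengths are known and
honest (exactly the assumption questioned above for shortest-job-first).
The function names and the sample bursts are made up for illustration.

```python
from collections import deque


def fifo_waits(bursts):
    """Waiting time of each job when run to completion in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)
        clock += burst
    return waits


def sjf_waits(bursts):
    """Waiting time of each job when the shortest burst always runs first."""
    order = sorted(range(len(bursts)), key=lambda i: bursts[i])
    waits, clock = [0] * len(bursts), 0
    for i in order:
        waits[i] = clock
        clock += bursts[i]
    return waits


def round_robin_finish(bursts, quantum):
    """Completion time of each job under round robin with a fixed quantum."""
    remaining = list(bursts)
    queue = deque(range(len(bursts)))
    finish, clock = [0] * len(bursts), 0
    while queue:
        i = queue.popleft()
        ran = min(quantum, remaining[i])
        clock += ran
        remaining[i] -= ran
        if remaining[i] > 0:
            queue.append(i)          # not done: back of the line
        else:
            finish[i] = clock
    return finish


bursts = [24, 3, 3]                  # hypothetical job lengths
print(fifo_waits(bursts))            # [0, 24, 27]
print(sjf_waits(bursts))             # [6, 0, 3]
print(round_robin_finish(bursts, 4)) # [30, 7, 10]
```

With these numbers the average wait drops from 17 (FIFO) to 3 (SJF), and
round robin lets the two short jobs finish early without knowing their
lengths in advance.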


- review concept of task priority
- aging to prevent starvation
- what other problems exist with priorities?
	- what happens when a low priority task 
	gets ahold of a resource that a high priority 
	task needs next?
	- explain concept of priority inversion
	- how to fix it: temporary promotion
	(effective priority vs. set priority)
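
One possible way to show aging on the board: each task keeps its set
priority, but the scheduler picks by effective priority, which creeps up
while a task waits and snaps back when it runs. This is only a sketch; it
assumes every task is always ready, higher numbers win, and the name
schedule_with_aging and the boost of 1 per tick are invented here.

```python
def schedule_with_aging(set_priority, ticks, boost=1):
    """Return which task runs on each tick.

    set_priority: dict of task name -> set priority (higher runs first).
    Waiting tasks gain `boost` effective priority per tick; the task that
    ran drops back to its set priority. Ties go to the higher set priority.
    """
    effective = dict(set_priority)
    order = []
    for _ in range(ticks):
        running = max(
            effective,
            key=lambda name: (effective[name], set_priority[name]),
        )
        order.append(running)
        for name in effective:
            if name == running:
                effective[name] = set_priority[name]
            else:
                effective[name] += boost
    return order


tasks = {"high": 10, "low": 1}       # hypothetical tasks
print(schedule_with_aging(tasks, 12))
print(schedule_with_aging(tasks, 12, boost=0))
```

With boost=0 (no aging) the low task never runs in this window; with aging
it gets the CPU on the 11th tick.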


REAL TIME SCHEDULING 

real-time systems are systems in which we must have
GUARANTEED deterministic bounds on waiting times

- "soft" real time includes things like jobs to play music;
	- must decode the next buffer-full of audio 
	while the current buffer is still playing
	- track will "skip" if we don't finish in time
	- but that's not the end of the world
- "hard" real time:
	- control the fuel inputs of an aircraft
	- while also monitoring sensors like altitude,
	speed, current engine RPMs, etc.
	- if we mess up, plane falls out of the sky!

	- a pacemaker is definitely a hard real time sys!


a misconception about real time:
	doesn't mean things happen instantaneously
	only means they happen on guaranteed bounded intervals.

these are often used in embedded devices
how do they work?
	- all tasks on the system are known ahead of time
	- programmer is responsible for setting the
	time slice associated with each task 

(i.e., precise, variable quantum in a round-robin system)

often every task in a real time system will have its 
own distinct priority level.

When a higher-priority task is ready to go, it always
interrupts a lower-priority task.

With distinct priorities, this means that we have a very 
deterministic system
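
A small sketch of why distinct priorities give a deterministic schedule:
at every tick there is exactly one highest-priority ready task, so the
timeline is fully determined by the task table. The task names, release
times, and run times below are hypothetical.

```python
def run_preemptive(tasks, horizon):
    """Simulate fixed-priority preemptive scheduling, one tick at a time.

    tasks: list of (name, priority, release_tick, run_ticks), with distinct
    priorities; a higher number preempts a lower one.
    Returns the name on the CPU at each tick (None when idle).
    """
    remaining = {name: run for name, _, _, run in tasks}
    timeline = []
    for tick in range(horizon):
        ready = [
            (prio, name)
            for name, prio, release, _ in tasks
            if release <= tick and remaining[name] > 0
        ]
        if not ready:
            timeline.append(None)
            continue
        _, name = max(ready)         # highest priority always wins
        remaining[name] -= 1
        timeline.append(name)
    return timeline


tasks = [("logger", 1, 0, 4), ("control", 3, 2, 2)]
print(run_preemptive(tasks, 7))
```

Here "control" becomes ready at tick 2 and immediately interrupts "logger",
which resumes only after "control" is done.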


WHAT HAPPENS WHEN IT MESSES UP: THE MARS ROVER 
(A CASE STUDY IN OPERATING SYSTEM DESIGN TRICKERY)

You all may recall the Mars Pathfinder mission (launched in 1996,
landed in July 1997); its "Sojourner" rover was the first to roll
around on Mars, and the mission sent pictures back to Earth
(on the web, no less!)

At some point during the mission, the onboard computer started
resetting itself over and over. After some debugging back on the
ground, they reproduced the problem, uploaded a fix to the
spacecraft (on Mars!), and it worked. (Talk about pressure..)

So what went wrong?

The system has several tasks and a shared data bus which
it uses to read data from the sensors on the rover


The CPU must collect data from many sensors which 
are all attached to the bus. The CPU must then distribute
that data to the various tasks which monitor the sensors


And this all needs to happen 8 times a second (8 Hz)

Architecture:

-------         --------
| CPU |---------|  RAM |
-------         --------
  |                       
  |                      
 B|-- sensor unit
  |                       
 U|-- sensor unit
  |                       
 S|-- sensor unit
  |
  |-- ...


So every 1/8 of a second, a task starts on the CPU
that puts the machine into data collection mode;
it tells all the sensors to send their data to the CPU

(They do so)

At some time later, it distributes the data to other tasks
through FIFO buffers.


|<------------- .125 seconds --------------------------------------------->|
| sched | (read in bus data) | bc_dist | process_sensor | process_sensor...|

this all must finish by the end of the 1/8 of a second,
so that the first task can begin again
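
The deadline is just arithmetic: the pieces of one cycle have to add up to
no more than 125 ms. A quick sketch of that check follows; the per-task
durations are invented numbers for illustration, not measurements from the
actual system.

```python
FRAME_MS = 125                       # 1/8 of a second, from the 8 Hz cycle


def frame_slack(durations_ms, frame_ms=FRAME_MS):
    """Return (fits, slack_ms) for one cycle's worth of task durations."""
    total = sum(durations_ms.values())
    return total <= frame_ms, frame_ms - total


# Hypothetical worst-case durations in milliseconds.
normal = {"bc_sched": 5, "bus_read": 40, "bc_dist": 20, "process_sensor": 50}
overloaded = dict(normal, process_sensor=70)

print(frame_slack(normal))           # (True, 10)
print(frame_slack(overloaded))       # (False, -10)
```

Negative slack means the cycle overruns and the next bc_sched would start
before the previous cycle's work is finished.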


Task 1: bc_sched (turn bus on)
Task 2: bc_dist (put data into FIFO buffers)
Tasks 3...n: individual sensor processors read from FIFO buffers,
   and process sensor input.


The problem: 
The sensors were producing more data than anticipated, so the
system was more heavily loaded than in testing. One of the very
low priority process_sensor tasks held the lock on its FIFO
buffer, and got preempted by medium priority tasks before it
could release the lock.

But bc_dist() simply called write() on the FIFO. What happens?
It BLOCKS until the process_sensor task releases the lock
	(bam! priority inversion!)

The low priority task had to wait until all the medium priority
tasks had run, so high priority bc_dist had to wait for them too
before it could come back to life!

The sched task freaks out if dist has not completed by the start
of the next cycle; it was programmed to assume there was an error
and reboot the machine -- but the inversion kept recurring, so the
reboots just kept happening in a continuous loop.

The fix? A global variable controlling whether a task holding
a lock gets promoted (priority inheritance) had been left turned
off. They turned it on, so the low priority process_sensor task
temporarily runs at bc_dist's priority while bc_dist waits on it,
and bc_dist finishes in time.
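
The effect of promotion can be seen in a tiny tick-by-tick simulation. This
is a toy model, not the flight software: three made-up tasks ("low",
"medium", "high"), one shared lock held for a task's whole run, and an
`inherit` flag standing in for the global variable. Think of "high" as
bc_dist and "low" as the slow process_sensor task.

```python
def simulate(inherit, horizon=20):
    """Return the tick at which each task finishes."""
    # (name, set priority, release tick, ticks of work, uses shared lock?)
    tasks = [
        ("low", 1, 0, 3, True),
        ("medium", 2, 2, 5, False),
        ("high", 3, 1, 2, True),
    ]
    set_prio = {name: prio for name, prio, _, _, _ in tasks}
    uses_lock = {name: lock for name, _, _, _, lock in tasks}
    remaining = {name: work for name, _, _, work, _ in tasks}
    finish = {}
    owner = None                     # task currently holding the lock
    for tick in range(horizon):
        effective = dict(set_prio)   # effective priority, recomputed per tick
        runnable = []
        for name, prio, release, _, _ in tasks:
            if release > tick or remaining[name] == 0:
                continue
            if uses_lock[name] and owner not in (None, name):
                # Blocked on the lock; optionally promote the holder.
                if inherit:
                    effective[owner] = max(effective[owner], prio)
            else:
                runnable.append(name)
        if not runnable:
            continue
        name = max(runnable, key=lambda n: effective[n])
        if uses_lock[name]:
            owner = name             # acquire (or keep) the lock
        remaining[name] -= 1
        if remaining[name] == 0:
            finish[name] = tick + 1
            if owner == name:
                owner = None         # release the lock on completion
    return finish


print(simulate(inherit=False))       # high finishes last, at tick 10
print(simulate(inherit=True))        # high finishes at tick 5
```

Without promotion, "medium" runs ahead of "low", so "high" is stuck behind
both. With promotion, "low" borrows the high priority just long enough to
release the lock, and "medium" is the one that waits.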


Moral of the story: When you're programming hard
real time systems, pay attention to priorities and locks
on buffers! :)