Project 2 Hints & Help:

  - Re-explain concept of thread pools (I don't think I covered this in sec B)
  - How to get data into thread pools?
      * explain concept of bounded queues.. then introduce mailbox (take & put)

  Ask: Any other questions about the project?


SCHEDULING AND PRIORITIES REVIEW

Many scheduler types:

  - FIFO (non-preemptive)
  - shortest-job-first (how could we possibly know who's shortest?)
      what if the task lies?
  - round robin (circular FIFO with fixed quantum)
  - priorities
      - review concept of task priority
      - aging to prevent starvation
      - what other problems exist with priorities?
          - what happens when a low priority task gets ahold of a resource
            that a high priority task needs next?
          - explain concept of priority inversion
          - how to fix it: temporary promotion
            (effective priority vs. set priority)


REAL TIME SCHEDULING

real-time systems are systems in which we must have GUARANTEED
deterministic bounds on waiting times

  - "soft" real time includes things like jobs to play music:
      - must decode the next buffer-full of data while the current
        buffer is still playing
      - track will "skip" if we don't finish in time
      - but that's not the end of the world

  - "hard" real time:
      - control the fuel inputs of an aircraft
      - while also monitoring sensors like altitude, speed, current
        engine RPMs, etc.
      - if we mess up, plane falls out of the sky!
      - a pacemaker is definitely a hard real time system!

a misconception about real time: it doesn't mean things happen
instantaneously, only that they happen within guaranteed bounded intervals.

these are often used in embedded devices

how do they work?

  - all tasks on the system are known ahead of time
  - programmer is responsible for setting the time slice associated with
    each task (i.e., precise, variable quantum in a round-robin system)

often every task in a real time system will have its own distinct priority
level. When a higher-priority task is ready to go, it always interrupts a
lower-priority task.
With distinct priorities, this means that we have a very deterministic system.


WHAT HAPPENS WHEN IT MESSES UP: THE MARS ROVER
(A CASE STUDY IN OPERATING SYSTEM DESIGN TRICKERY)

You all may recall the Mars "Pathfinder" mission (launched in 1996, landed
in 1997); it put the first rover on Mars that rolled around and sent pictures
back to the earth (on the web, no less!)

At some point during the mission, it simply stopped working.
After some debugging back on the ground, they reproduced the problem,
uploaded some new code to the rover (on Mars!) and rebooted the rover,
and it worked. (Talk about pressure..)

So what went wrong?

The system has several tasks and a shared data bus which it uses to read
data from the sensors on the rover.

The CPU must collect data from many sensors which are all attached to the bus.
The CPU must then distribute that data to the various tasks which monitor
the sensors.

And this all needs to happen 8 times a second (8 Hz).

Architecture:

   -------           --------
   | CPU |-----------| RAM  |
   -------           --------
      |
      | B|-- sensor unit
      | U|-- sensor unit
      | S|-- sensor unit
      |  |-- ...

So every 1/8 of a second, a task starts on the CPU that turns the machine
into data collection mode; it tells all the sensors to send their data to
the CPU (they do so).

At some time later, it distributes the data to other tasks through
FIFO buffers.

|<----------------------------- .125 seconds ----------------------------->|
| sched | (read in bus data) | bc_dist | process_sensor | process_sensor...|

this all must finish by the end of the 1/8 of a second, so that the first
task can begin again

  Task 1:      bc_sched (turn bus on)
  Task 2:      bc_dist (put data into FIFO buffers)
  Tasks 3...n: individual sensor processors read from FIFO buffers,
               and process sensor input.

The problem:

The sensors were putting more data than anticipated into the FIFO buffers.
One of the very low priority process_sensor tasks couldn't empty its
whole buffer in the time it got. So eventually, the buffer filled up.

But bc_dist() simply called write() on the FIFO. What happens?
It BLOCKS until the process_sensor task comes back and drains the buffer
(bam! priority inversion!)

bc_dist had to wait until all the other process_sensor tasks had run before
the low-priority one could make room and bc_dist could come back to life!

The sched task freaks out if dist was not complete by the start of the next
cycle; it was programmed to assume there was an error and reboot the
machine -- but this just kept happening in a continuous loop.

The cause? A global variable controlling whether priority inversions should
be promoted was turned off, which caused bc_dist to effectively run at the
lower process_sensor level, so it wouldn't finish in time.

The fix? Turn that variable on, so the low-priority task is temporarily
promoted whenever a higher-priority task is stuck waiting on it.

Moral of the story: When you're programming hard real time systems,
pay attention to priorities and locks on buffers! :)
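
A toy illustration of priority inversion and temporary promotion:

To make the idea concrete, here is a minimal sketch of a tick-based
scheduler simulation in Python. It is NOT the rover's code; the three tasks
("low", "medium", "high"), their release times, and their work amounts are
made up for illustration. "low" and "high" share one lock; "medium" only
needs the CPU. The simulate() function and its inherit flag are hypothetical
names, with inherit playing the role of the "promote on inversion" switch.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    priority: int      # set priority; larger number = more important
    release: int       # tick at which the task becomes ready
    work: int          # ticks of CPU time still needed
    needs_lock: bool   # True if the task holds the shared lock while it works


def simulate(inherit):
    """Run the toy schedule; return {task name: tick at which it finished}."""
    # Hypothetical workload, chosen only to trigger the inversion.
    tasks = [
        Task("low", 1, 0, 3, True),
        Task("medium", 2, 1, 5, False),
        Task("high", 3, 2, 2, True),
    ]
    holder = None   # task currently holding the shared lock, if any
    finish = {}
    tick = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks if t.release <= tick and t.work > 0]
        # A task that needs the lock is blocked while someone else holds it.
        blocked = [t for t in ready
                   if t.needs_lock and holder is not None and holder is not t]
        runnable = [t for t in ready if t not in blocked]

        def effective(t):
            # Temporary promotion: the lock holder borrows the priority
            # of the most important task that is waiting on it.
            if inherit and t is holder and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        if runnable:
            current = max(runnable, key=effective)
            if current.needs_lock:
                holder = current          # acquire (or keep) the lock
            current.work -= 1
            if current.work == 0:
                finish[current.name] = tick + 1
                if holder is current:
                    holder = None         # release the lock when done
        tick += 1
    return finish


if __name__ == "__main__":
    print("no promotion:  ", simulate(inherit=False))
    print("with promotion:", simulate(inherit=True))
```

With promotion off, "medium" keeps preempting "low" while "low" holds the
lock, so "high" is stuck behind both and finishes last (tick 10). With
promotion on, "low" runs at "high"'s priority until it releases the lock,
and "high" finishes at tick 6. In a hard real time system, that difference
is the gap between meeting and missing the deadline.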