================================================================
CSE 344 -- Fall 2011
Lecture 19:   Transactions

================================================================
REVIEW:

  -- Transaction motivation: what can go wrong without transactions?
  -- Transaction
  -- ACID

================================================================
A - Serializability and chedules

Given a set of transactions, a schedule is a sequence of interleaved
actions from all transactions

Example:

Given the following transactions
T1:        T2:
Read(A)    Read(A)
Write(A)   Wtite(A)
Read(B)    Read(B)
Write(B)   Write(B)

A serial schedule is:

T1:        T2:
R(A)
W(A)
R(B)
W(B)
            R(A)
            W(A)
            R(B)
            W(B)

A schedule is serializable if it is equivalent to a serial schedule

Informally, equivalent, means that all *conflicting* operations, those
that read and write to the same element, are ordered in the same way:

T1:         T2:
R(A)
W(A)
            R(A)
            W(A)
R(B)
W(B)
            R(B)
            W(B)

Here is an example of a non-serializable schedule:

T1:
R(A)
W(A)
            R(A)
            W(A)
            R(B)
            W(B)
R(B)
W(B)


================================================================
B - Serializable schedules using locks

Basic idea:
- Each element in the database has a unique lock
- Each transaction must first acquire the lock
before reading/writing that element
- If the lock is taken by another transaction, then wait
- The transaction must release the lock(s)


sqlite = one lock for the entire database
SQL Server, DB2 = one lock per record (and index entry, etc)

Let's study SQLite first:

SQLITE = is very simple
For more information: http://www.sqlite.org/atomiccommit.html

Step 1: every transaction aquires a READ LOCK (aka "SHARED" lock)
        all these transactions may read happily

Step 2: when one transaction wants to write
        aquire a RESERVED LOCK
        this lock may coexists with many READ LOCKs
        writer transaction may now write (others don't see the updates)
        reader transactions continue to read, new readers accepted


Step 3: when writer transaction wants to commit
        try to aquire a EXCLUSIVE LOCK: no READ LOCKs may coexist
        1. aquire a PENDING LOCK: coexists with READ LOCKs
           no new READ LOCKS are accepted during a PENDING LOCK
           wait for all read locks to be released
        2. aquire an EXCLUSIVE LOCK: 
        all updates are written permanently to the database and log


create table r(a int, b int);
insert into r values (1,10);
insert into r values (2,20);
insert into r values (3,30);

-- Run the script below in three different windows

T1:
   begin transaction;
   select * from r;
   -- T1 has a READ LOCK

T2:
   begin transaction;
   select * from r;
   -- T1 has a READ LOCK

T1:
   update r set b=11 where a=1;
   -- T1 has a RESERVED LOCK

T2:
   update r set b=21 where a=2;
   -- T2 asked for a RESERVED LOCK:  DENIED

T3:
   begin transaction;
   select * from r;
   commit;
   -- everything works fine, could obtain READ LOCK

T1:
   commit;
   SQL error: database is locked
   -- T1 asked for PENDING LOCK -- GRANTED
   -- T1 asked for EXCLUSIVE LOCK -- DENIED

T3':
   begin transaction;
   select * from r;
   -- T3 asked for READ LOCK -- DENIED (because of T1)

T2:
   commit;
   -- releases the last READ LOCK

T1:
   commit;
   -- T1 asked for EXCLUSIVE LOCK -- GRANTED

T3':

   select * from r;
   -- T3 asked for READ LOCK -- GRANTED
   commit;

*** In class: is this schedule serializable ?  If so, then what is the
    serialization order of the four transactions T1, T2, T3, T3' ?


================================================================
DEADLOCK

When Transaction 1 waits for a lock held by Transaction 2
and  Transaction 2 waits for a lock held by Transaction 1

Worse: TX1 waits for TX2; TX2 waits for TX3; ... TXn waits for TX1.

Unavoidable, expensive to detect.  On detection, abort a transaction.

SQLITE avoids deadlocks by never blocking: instead it returns an error
code and application must re-issue request.

================================================================
Now let's study what other commercial DBMSs that use locking
typically do:

For each database element, two types of locks: shared and exclusive.
Must hold a shared lock to read and an exclusive lock to write.

Let's give this a try:

T1:            T2:

S(A) -- T1 acquires a shared lock on A
R(A)
X(A)
W(A)
U(A) -- T1 releases its locks
               S(A)
               R(A)
               X(A)
               W(A)
               U(A)
               S(B)
               R(B)
               X(B)
               W(B)
               U(B)
S(B)
R(B)
X(B)
W(B)
U(B)

Did we ensure a serializable schedule? NO

How do we fix this? Use two-phase locking (2PL)

2PL = In every transaction, all lock requests must
preceed all unlock requests

T1:            T2:

S(A)
R(A)
X(A)
W(A)
               S(A) -- Denied. Blocks and waits
S(B)
R(B)
X(B)
W(B)
U(A)
               S(A) -- Granted
               R(A)
               X(A)
               W(A)
U(B)
               S(B)
               R(B)
               X(B)
               W(B)
               U(A)
               U(B)

Success! We got a serializable schedule.

But what about aborts?

S(A)
R(A)
X(A)
W(A)
               S(A) -- Denied. Blocks and waits
S(B)
R(B)
X(B)
W(B)
U(A)
               S(A) -- Granted
               R(A)
               X(A)
               W(A)
U(B)
               S(B)
               R(B)
               X(B)
               W(B)
               U(A)
               U(B)

ABORT          COMMIT

This is bad! We just commited T2 but T2 read
the updates by T1, which aborted.

How do we fix this? Using *strict* 2PL

Strict 2PL = All locks held by a transaction are
released when the transaction is completed

Strict 2PL ensures that a schedule is both serializable and
recoverable in case of aborts.

S(A) 
R(A)
X(A)
W(A)
               S(A) -- Denied
S(B)
R(B)
X(B)
W(B)
Abort U(A) U(B)
               S(A) -- Granted
               R(A)
               X(A)
               W(A)
               S(B)
               R(B)
               X(B)
               W(B)
               Commit U(A) U(B)

In practice, locking can get even more sophisticated. There
is more to learn.

Bottom line: Locking can work well if we do it right.  However, it is
very expensive. It slows things down considerably because it limits
concurrency.

Sometimes, applications are willing to give-up some of the
nice properties of serializability to get higher performance.

================================================================
Isolation levels in SQL

Run the Java demo together with the following:


1. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

        yes dirty reads
        A "dirty" value is the value written by a transaction 
        that did not yet commit.

        Implementation using locks:
        - “Long duration” WRITE locks
          – A.k.a Strict Two Phase Locking (you knew that !)
        - Do not use READ locks: Read-only transactions are never delayed

2. SET TRANSACTION ISOLATION LEVEL READ COMMITTED

        no dirty reads HOWEVER may read the same value twice and get
        different answers (how can this be possible ?)

        Implementation using locks:
        - “Long duration” WRITE locks
        - “Short duration” READ locks: Only acquire lock while reading (not 2PL)

3. SET TRANSACTION ISOLATION LEVEL READ REPEATABLE READS

        no dirty reads and repeated reads return the same value

        Implementation using locks:
        - “Long duration” READ and WRITE locks
        – Full Strict Two Phase Locking
        Aren's we done? NO

4. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

        also handles "phantoms"

What is the phantom problem?

"PHANTOM" = a tuple inserted in a relation during the execution of a
transaction

T1:                          T2:
select count(*)              . . . .
from R where price>20        . . . .
. . . .                      . . . .
. . . .                      insert into R(name,price) values(‘Gizmo’, 50)
. . . .                      . . . .
. . . .
select count(*)
from R where price>20

The problem is in the way we model transactions:
– Fixed set of elements
  - This model fails to capture insertions, because
    these create new elements
- No easy solutions:
  – Need “predicate locking” but how to implement it?
    – Sol1: Lock on the entire relation R (or chunks)
    – Sol2: If there is an index on ‘price’, lock the index nodes


Beware!!!! In commercial DBMSs:
- Default level is often NOT serializable
- Default level differs between DBMSs
- Some engines support subset of levels!
- Serializable may not be exactly ACID
- Also, some DBMSs do NOT use locking and different isolation
  levels can lead to different problems.
Bottom line: Read the documentation for the DBMS that you work with!