================================================================ CSE 344 -- Fall 2011 Lecture 19: Transactions ================================================================ REVIEW: -- Transaction motivation: what can go wrong without transactions? -- Transaction -- ACID ================================================================ A - Serializability and chedules Given a set of transactions, a schedule is a sequence of interleaved actions from all transactions Example: Given the following transactions T1: T2: Read(A) Read(A) Write(A) Wtite(A) Read(B) Read(B) Write(B) Write(B) A serial schedule is: T1: T2: R(A) W(A) R(B) W(B) R(A) W(A) R(B) W(B) A schedule is serializable if it is equivalent to a serial schedule Informally, equivalent, means that all *conflicting* operations, those that read and write to the same element, are ordered in the same way: T1: T2: R(A) W(A) R(A) W(A) R(B) W(B) R(B) W(B) Here is an example of a non-serializable schedule: T1: R(A) W(A) R(A) W(A) R(B) W(B) R(B) W(B) ================================================================ B - Serializable schedules using locks Basic idea: - Each element in the database has a unique lock - Each transaction must first acquire the lock before reading/writing that element - If the lock is taken by another transaction, then wait - The transaction must release the lock(s) sqlite = one lock for the entire database SQL Server, DB2 = one lock per record (and index entry, etc) Let's study SQLite first: SQLITE = is very simple For more information: http://www.sqlite.org/atomiccommit.html Step 1: every transaction aquires a READ LOCK (aka "SHARED" lock) all these transactions may read happily Step 2: when one transaction wants to write aquire a RESERVED LOCK this lock may coexists with many READ LOCKs writer transaction may now write (others don't see the updates) reader transactions continue to read, new readers accepted Step 3: when writer transaction wants to commit try to aquire a EXCLUSIVE LOCK: no READ LOCKs may coexist 1. aquire a PENDING LOCK: coexists with READ LOCKs no new READ LOCKS are accepted during a PENDING LOCK wait for all read locks to be released 2. aquire an EXCLUSIVE LOCK: all updates are written permanently to the database and log create table r(a int, b int); insert into r values (1,10); insert into r values (2,20); insert into r values (3,30); -- Run the script below in three different windows T1: begin transaction; select * from r; -- T1 has a READ LOCK T2: begin transaction; select * from r; -- T1 has a READ LOCK T1: update r set b=11 where a=1; -- T1 has a RESERVED LOCK T2: update r set b=21 where a=2; -- T2 asked for a RESERVED LOCK: DENIED T3: begin transaction; select * from r; commit; -- everything works fine, could obtain READ LOCK T1: commit; SQL error: database is locked -- T1 asked for PENDING LOCK -- GRANTED -- T1 asked for EXCLUSIVE LOCK -- DENIED T3': begin transaction; select * from r; -- T3 asked for READ LOCK -- DENIED (because of T1) T2: commit; -- releases the last READ LOCK T1: commit; -- T1 asked for EXCLUSIVE LOCK -- GRANTED T3': select * from r; -- T3 asked for READ LOCK -- GRANTED commit; *** In class: is this schedule serializable ? If so, then what is the serialization order of the four transactions T1, T2, T3, T3' ? ================================================================ DEADLOCK When Transaction 1 waits for a lock held by Transaction 2 and Transaction 2 waits for a lock held by Transaction 1 Worse: TX1 waits for TX2; TX2 waits for TX3; ... TXn waits for TX1. Unavoidable, expensive to detect. On detection, abort a transaction. SQLITE avoids deadlocks by never blocking: instead it returns an error code and application must re-issue request. ================================================================ Now let's study what other commercial DBMSs that use locking typically do: For each database element, two types of locks: shared and exclusive. Must hold a shared lock to read and an exclusive lock to write. Let's give this a try: T1: T2: S(A) -- T1 acquires a shared lock on A R(A) X(A) W(A) U(A) -- T1 releases its locks S(A) R(A) X(A) W(A) U(A) S(B) R(B) X(B) W(B) U(B) S(B) R(B) X(B) W(B) U(B) Did we ensure a serializable schedule? NO How do we fix this? Use two-phase locking (2PL) 2PL = In every transaction, all lock requests must preceed all unlock requests T1: T2: S(A) R(A) X(A) W(A) S(A) -- Denied. Blocks and waits S(B) R(B) X(B) W(B) U(A) S(A) -- Granted R(A) X(A) W(A) U(B) S(B) R(B) X(B) W(B) U(A) U(B) Success! We got a serializable schedule. But what about aborts? S(A) R(A) X(A) W(A) S(A) -- Denied. Blocks and waits S(B) R(B) X(B) W(B) U(A) S(A) -- Granted R(A) X(A) W(A) U(B) S(B) R(B) X(B) W(B) U(A) U(B) ABORT COMMIT This is bad! We just commited T2 but T2 read the updates by T1, which aborted. How do we fix this? Using *strict* 2PL Strict 2PL = All locks held by a transaction are released when the transaction is completed Strict 2PL ensures that a schedule is both serializable and recoverable in case of aborts. S(A) R(A) X(A) W(A) S(A) -- Denied S(B) R(B) X(B) W(B) Abort U(A) U(B) S(A) -- Granted R(A) X(A) W(A) S(B) R(B) X(B) W(B) Commit U(A) U(B) In practice, locking can get even more sophisticated. There is more to learn. Bottom line: Locking can work well if we do it right. However, it is very expensive. It slows things down considerably because it limits concurrency. Sometimes, applications are willing to give-up some of the nice properties of serializability to get higher performance. ================================================================ Isolation levels in SQL Run the Java demo together with the following: 1. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED yes dirty reads A "dirty" value is the value written by a transaction that did not yet commit. Implementation using locks: - “Long duration” WRITE locks – A.k.a Strict Two Phase Locking (you knew that !) - Do not use READ locks: Read-only transactions are never delayed 2. SET TRANSACTION ISOLATION LEVEL READ COMMITTED no dirty reads HOWEVER may read the same value twice and get different answers (how can this be possible ?) Implementation using locks: - “Long duration” WRITE locks - “Short duration” READ locks: Only acquire lock while reading (not 2PL) 3. SET TRANSACTION ISOLATION LEVEL READ REPEATABLE READS no dirty reads and repeated reads return the same value Implementation using locks: - “Long duration” READ and WRITE locks – Full Strict Two Phase Locking Aren's we done? NO 4. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE also handles "phantoms" What is the phantom problem? "PHANTOM" = a tuple inserted in a relation during the execution of a transaction T1: T2: select count(*) . . . . from R where price>20 . . . . . . . . . . . . . . . . insert into R(name,price) values(‘Gizmo’, 50) . . . . . . . . . . . . select count(*) from R where price>20 The problem is in the way we model transactions: – Fixed set of elements - This model fails to capture insertions, because these create new elements - No easy solutions: – Need “predicate locking” but how to implement it? – Sol1: Lock on the entire relation R (or chunks) – Sol2: If there is an index on ‘price’, lock the index nodes Beware!!!! In commercial DBMSs: - Default level is often NOT serializable - Default level differs between DBMSs - Some engines support subset of levels! - Serializable may not be exactly ACID - Also, some DBMSs do NOT use locking and different isolation levels can lead to different problems. Bottom line: Read the documentation for the DBMS that you work with!