================================================================
CSE 344 -- Fall 2011
Lecture 20:   Transactions and Security

================================================================
1.  Review

Transactions
Locking
Levels of isolation and demo.


================================================================
2. Alternatives to locking.

Instead of locking, some DBMSs use multiversion concurrency control.

Concurrency control by timestamps:
- When a transaction T_i starts, assign it a timestamp TS_i
- Monitor the timestamps of transactions that read and write
to the database.
- Ensure that all conflicting operations (read/write to the same
element) happen in timestamp order.
- If a read or write would violate the order, abort the violating
transaction.

Multiversion concurrency control:
- This is a variant of concurrency control by timestamps.
- If a transaction wants to read an element but is too late
(i.e., the read would violate the timestamp order), allow
that transaction to read an older version of the element.
This approach reduces the number of aborts. 

When a system uses multiversion concurrency control, some anomalies go
away. For example, no transaction ever needs to read dirty data
because old, committed versions of the data are available and are used
instead. 

It also enables another level of isolation called "Snapshot Isolation".

We will study this in greater detail in 444!

Two things to keep in mind: 

1) Expect that some DBMSs may have different
behaviors with transactions. Always read the documentation of the DBMS
that you need to work with!!!

2) Remember that "serializable" is typically NOT the default level of isolation.
If you need serializability, you need to request it. But you also need
to ask yourself if you really need it.


================================================================
3. Commercial Systems

- DB2: Strict 2PL
- SQL Server:
  – Strict 2PL for standard 4 levels of isolation
  – Multiversion concurrency control for snapshot isolation
- PostgreSQL:
  – Multiversion concurrency control
- Oracle
  – Multiversion concurrency control


================================================================
4. Protecting data beyond transactions

4.1 Transactions and DBMS process failures

Transactions protect from DBMS crashes. If
the DBMS process crashes then:
-- Any transaction that was in progress will be aborted
   when the system restarts and recovers from the failure.
-- For any transaction that committed, its changes are guaranteed
   to survive DBMS crashes.

To provide these guarantees, the DBMS simply writes extra information
to disk in the form of a log (we will learn all the details in 444).

So what happens if the disk drive fails?


4.2 Backups and disk failures

To protect from disk failures, we must perform backups.

BACKUP DATABASE MyDatabase
    TO DISK = 'Z:\some\location\on\disk\MyDatabase.bak' 
    WITH FORMAT;
GO

In commercial DBMSs, the DBMS can still be used (mostly)
during the backup.

With backups in the form of files on disk, recovering
from a failure takes a long time. What if a user cannot
tolerate such a long delay?


4.3 Replication and disk failures

An alternative that is commonly used is to set up a replica
of the database. The two databases are then called
primary and secondary.

The primary performs all operations.
The primary sends information about changes to the database to the
secondary.
The secondary applies these changes.
If the primary fails, the secondary is ready to take over right away.

Challenge:
- Synchronous replication: 
  Wait for all replicas to receive changes before returning.
  Great protection but slows-down normal operations.

- Asynchronous replication: 
  Return as soon as primary performs an operation
  Propagate the list of changes to apply to the secondary in the background.
  Much faster but it's possible to lose some updates when a failure occurs.

For best protection, the replica should be located in a separate
data-center. In that case, synchronous replication is extremely
expensive. See slides at min 43.
http://yahoo.hosted.panopto.com/Panopto/Pages/Viewer/Default.aspx?id=b9e052c4-06c8-42da-a79b-1860d8b9698b

This is an important trade-off today when building very large-scale
data management systems. In those systems, for extra performance, we
can let applications read and write directly to the secondary.  We can
even create multiple replicas.  This ability to use many replicas
together with asychronous replication maximizes performance, but
reduces consistency. It only achieves what is called "eventual
consistency". Systems called "noSQL" typically make this trade-off.
The drawback is that it is difficult for application developers to
build applications on top of such a model. Newer systems thus often
provide the choice to replicate synchronously or asynchronously.


================================================================
5. Security in SQL

Privileges:  SELECT, INSERT, DELETE, UPDATE, REFERENCES ...
Apply to a table or a view
Granting privileges:

GRANT SELECT On Person TO Joe
GRANT SELECT(name, balance) On Person TO Jim
GRANT SELECT, INSERT On Orders TO Fred
GRANT DELETE On Person TO Sue WITH GRANT OPTION;

Revoking prvileges:

REVOKE INSERT ON Orders FROM Fred
REVOKE DELETE on Person FROM Sue CASCADE

Views and Security

How can we restrict read access to only a subset of the rows ?
E.g. A doctors is allowed to see the recors of all patients they
treat, but not more.

Patient(pid, name, diseases)
Treats(pid, did)
Doctor(did, name, speciality)

1. Grant access to Dr. Seus:

CREATE VIEW Drseus
   select x.pid, x.name, x.disease
   from Patient x, Treats y, Doctor z
   where x.pid = y.pid and y.did = z.did
     and z.name = 'Dr. Seus'

GRANT SELECT ON Drseus TO seus

*** In class: what is the problem with thsi approach ?

2. Grant  *each doctor* access to their patients

*** not supported in SQL !  How do we cope with that ?