================================================================ CSE 344 -- Fall 2011 Lecture 20: Transactions and Security ================================================================ 1. Review Transactions Locking Levels of isolation and demo. ================================================================ 2. Alternatives to locking. Instead of locking, some DBMSs use multiversion concurrency control. Concurrency control by timestamps: - When a transaction T_i starts, assign it a timestamp TS_i - Monitor the timestamps of transactions that read and write to the database. - Ensure that all conflicting operations (read/write to the same element) happen in timestamp order. - If a read or write would violate the order, abort the violating transaction. Multiversion concurrency control: - This is a variant of concurrency control by timestamps. - If a transaction wants to read an element but is too late (i.e., the read would violate the timestamp order), allow that transaction to read an older version of the element. This approach reduces the number of aborts. When a system uses multiversion concurrency control, some anomalies go away. For example, no transaction ever needs to read dirty data because old, committed versions of the data are available and are used instead. It also enables another level of isolation called "Snapshot Isolation". We will study this in greater detail in 444! Two things to keep in mind: 1) Expect that some DBMSs may have different behaviors with transactions. Always read the documentation of the DBMS that you need to work with!!! 2) Remember that "serializable" is typically NOT the default level of isolation. If you need serializability, you need to request it. But you also need to ask yourself if you really need it. ================================================================ 3. Commercial Systems - DB2: Strict 2PL - SQL Server: – Strict 2PL for standard 4 levels of isolation – Multiversion concurrency control for snapshot isolation - PostgreSQL: – Multiversion concurrency control - Oracle – Multiversion concurrency control ================================================================ 4. Protecting data beyond transactions 4.1 Transactions and DBMS process failures Transactions protect from DBMS crashes. If the DBMS process crashes then: -- Any transaction that was in progress will be aborted when the system restarts and recovers from the failure. -- For any transaction that committed, its changes are guaranteed to survive DBMS crashes. To provide these guarantees, the DBMS simply writes extra information to disk in the form of a log (we will learn all the details in 444). So what happens if the disk drive fails? 4.2 Backups and disk failures To protect from disk failures, we must perform backups. BACKUP DATABASE MyDatabase TO DISK = 'Z:\some\location\on\disk\MyDatabase.bak' WITH FORMAT; GO In commercial DBMSs, the DBMS can still be used (mostly) during the backup. With backups in the form of files on disk, recovering from a failure takes a long time. What if a user cannot tolerate such a long delay? 4.3 Replication and disk failures An alternative that is commonly used is to set up a replica of the database. The two databases are then called primary and secondary. The primary performs all operations. The primary sends information about changes to the database to the secondary. The secondary applies these changes. If the primary fails, the secondary is ready to take over right away. Challenge: - Synchronous replication: Wait for all replicas to receive changes before returning. Great protection but slows-down normal operations. - Asynchronous replication: Return as soon as primary performs an operation Propagate the list of changes to apply to the secondary in the background. Much faster but it's possible to lose some updates when a failure occurs. For best protection, the replica should be located in a separate data-center. In that case, synchronous replication is extremely expensive. See slides at min 43. http://yahoo.hosted.panopto.com/Panopto/Pages/Viewer/Default.aspx?id=b9e052c4-06c8-42da-a79b-1860d8b9698b This is an important trade-off today when building very large-scale data management systems. In those systems, for extra performance, we can let applications read and write directly to the secondary. We can even create multiple replicas. This ability to use many replicas together with asychronous replication maximizes performance, but reduces consistency. It only achieves what is called "eventual consistency". Systems called "noSQL" typically make this trade-off. The drawback is that it is difficult for application developers to build applications on top of such a model. Newer systems thus often provide the choice to replicate synchronously or asynchronously. ================================================================ 5. Security in SQL Privileges: SELECT, INSERT, DELETE, UPDATE, REFERENCES ... Apply to a table or a view Granting privileges: GRANT SELECT On Person TO Joe GRANT SELECT(name, balance) On Person TO Jim GRANT SELECT, INSERT On Orders TO Fred GRANT DELETE On Person TO Sue WITH GRANT OPTION; Revoking prvileges: REVOKE INSERT ON Orders FROM Fred REVOKE DELETE on Person FROM Sue CASCADE Views and Security How can we restrict read access to only a subset of the rows ? E.g. A doctors is allowed to see the recors of all patients they treat, but not more. Patient(pid, name, diseases) Treats(pid, did) Doctor(did, name, speciality) 1. Grant access to Dr. Seus: CREATE VIEW Drseus select x.pid, x.name, x.disease from Patient x, Treats y, Doctor z where x.pid = y.pid and y.did = z.did and z.name = 'Dr. Seus' GRANT SELECT ON Drseus TO seus *** In class: what is the problem with thsi approach ? 2. Grant *each doctor* access to their patients *** not supported in SQL ! How do we cope with that ?