T. Andrew Yang: CSCI5333

T. Andrew Yang

Email: yang@uhcl.edu

Web page: http://sce.uhcl.edu/yang/

Tel.: (281) 283-3835

CSCI 5333 Data Base Management System

Chapter 19: Transaction Processing

Transaction processing systems are systems with large databases and hundreds of concurrent users that are executing database transactions.

Requirements: high availability and fast response time for hundreds of concurrent users.

A transaction represents a logical unit of database processing that must be completed in its entirety to ensure correctness.

The concurrency control problem occurs when multiple transactions submitted by various users interfere with one another in a way that produces incorrect results

Recoverability: The ability for a DBMS to recover, from a transaction failure, to a database state which was consistent before the transaction was applied.

19.1 Introduction to Transaction Processing

Multiprogramming allows the computer to execute multiple processes at the same time.

Interleaved concurrent execution of processes on a single-processor computer.

If the computer system has multiple hardware processors (CPUs), parallel processing of multiple processes is possible.

Figure 19.1: Interleaved versus parallel processing of concurrent transactions.

A transaction is a logical unit of database processing that includes one or more database access operations—these can include insertion, deletion, modification, or retrieval operations.

A simplified database model: A database can be represented as a collection of named data items. The size of a data item is called its granularity.

Using this simplified database model, the basic database access operations that a transaction can include are as follows:
- read_item(X): Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.

write_item(X): Writes the value of program variable X into the database item named X.

Executing a read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.

Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).

The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that the transaction writes. See Fig. 19.2 for examples.

19.1.3 Why Concurrency Control Is Needed

Sample concurrent transactions: T1 and T2 (See page 632)

The Lost Update Problem

This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the update operation of one of the transactions lost .

Example: See Figure 19.3(a), p.634.

Question: Which operation is lost?

Exercise: Reschedule the operations to fix the problem.

The Temporary Update (or Dirty Read) Problem

This problem occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

Example: Figure 19.3(b).

The Incorrect Summary Problem

If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

Example: Fig. 19.3(c), p.635.

The Unrepeatable Read Problem

When a transaction T reads an item twice and the item is changed by another transaction T' between the two reads; hence, T receives different values for its two reads of the same item.

Exercise: Demonstrate the above problem by writing down two interleaved transactions. See Fig. 19.3 for examples.

The All-or-None principle: Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) the transaction has no effect whatsoever on the database or on any other transactions.

Why Recovery Is Needed?

There are different types of transaction failures (p.636). Whenever a failure occurs, the system must keep sufficient information to recover from the failure.

19.2 Transaction and System Concepts

The recovery manager keeps track of the following operations:
1. BEGIN_TRANSACTION
2. READ or WRITE
3. END_TRANSACTION
4. COMMIT_TRANSACTION: This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
5. ROLLBACK (or ABORT): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.
Transition states of a transaction : Fig. 19.4 (p.638)

Exercise: List five possible "lives" of a transaction, by clearly identifying the states and the respective events that trigger the transition.

A system log is a disk file that keeps track of all transaction operations that affect the values of database items.

The information in the system log may be needed to permit recovery from transaction failures.

Types of log records :

[start_transaction, T]

[write_item, T,X,old_value,new_value]

[read_item, T,X]

[commit,T]

[abort,T]

A system log is mainly used to undo the effect of WRITE operations of a transaction.

It may also be used to redo some WRITE operations. How?

A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database have been recorded in the log.

Beyond the commit point, the transaction is said to be committed, and its effect is assumed to be permanently recorded in the database. The transaction then writes a commit record [commit,T] into the log.

If a system failure occurs, we search back in the log for all transactions T that have written a [start_transaction,T] record into the log but have not written their [commit,T] record yet; these transactions may have to be rolled back to undo their effect on the database during the recovery process. Transactions that have written their commit record in the log must also have recorded all their WRITE operations in the log, so their effect on the database can be redone from the log records.

Force-writing the log file before committing a transaction:

Before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet (for example, in the main memory buffer) must now be written to the disk.

19.3 Desirable Properties of Transactions

The ACID properties:

Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.

Consistency preservation : A transaction is consistency preserving if its complete execution take(s) the database from one consistent state to another.

Isolation: A transaction should appear as though it is being executed in isolation from other transactions. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently.

Durability or permanency: The changes applied to the database by a committed transaction must persist in the database. These changes must not be lost because of any failure.

The levels of isolation of a transaction:

A transaction is said to have level 0 (zero) isolation if it does not overwrite the dirty reads of higher-level transactions.

A level 1 (one) isolation transaction has no lost updates.

Level 2 isolation has no lost updates and no dirty reads.

Level 3 isolation (also called true isolation) has, in addition to degree 2 properties, repeatable reads.

Reponsible parties:

Atomicity	The transaction recovery subsystem of a DBMS
Consistency preservation	The database application programmers, or the DBMS module that enforces integrity constraints
Isolation	The concurrency control subsystem of the DBMS
Durability	The recovery subsystem of the DBMS

19.4 Schedules and Recoverability

When transactions are executing concurrently in an interleaved fashion, then the order of execution of operations from the various transactions is known as a schedule (or history).

A schedule S of n transactions T₁, T₂, ..., T_n is an ordering of the operations of the transactions subject to the constraint that, for each transaction T_i that participates in S, the operations of T_i in S must appear in the same order in which they occur in T_i .

Sample schedules: From Figure 19.3(a) and (b).

Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:

They belong to different transactions.
They access the same item X
At least one of the operations is a write_item(X).

Exercise: Based on the two sample schedules above, S_a and S_b , list at least four operations that conflict and four operations that do not conflict.

A schedule S of n transactions T₁, T₂, ..., T_n is said to be a complete schedule if the following conditions hold:
1. The operations in S are exactly those operations in T₁, T₂, ..., T_n including a commit or abort operation as the last operation for each transaction in the schedule.
2. For any pair of operations from the same transaction T_i, their order of appearance in S is the same as their order of appearance in T_i.
3. For any two conflicting operations, one of the two must occur before the other in the schedule.

A total order must be specified in the schedule for any pair of conflicting operations (condition 3) and for any pair of operations from the same transaction (condition 2).

Recoverable schedule :

A transaction T reads from transaction T' in a schedule S if some item X is first written by T' and later read by T.

A schedule S is recoverable if

no transaction T in S commits until all transactions T' that have written an item that T reads have committed. Why?
T' should not have been aborted before T reads item X. Why?
There should be no transactions that write X after T' writes it and before T reads it (unless those transactions, if any, have aborted before T reads X). Why?

W_T'(X), ..., W_T"(Y), ..., R_T(Y), ..., R_T(X), ...

Question: Where in the above partial schedule can you insert the operations C_T, C_T', and C_T"without making the schedule unrecoverable?

Question: Explain why the sample schedule S_a is a recoverable schedule.

Question: Explain why the sample schedule S_b is a recoverable schedule.

Question: The schedule S_a' below suffers the lost update problem. Explain why the it is still a recoverable schedule.

Exercise: Examine each of the following sample schedules (c, d, e) and determine whether it is recoverable. Justify your answers.

In a recoverable schedule, no committed transaction ever needs to be rolled back.

However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed. This is illustrated in schedule S_e, where transaction T₂ has to be rolled back because it read item X from T₁, and T₁ then aborted.

Cascading rollback can be quite time-consuming.

A schedule is said to be cascadeless, or avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions.

Exercise: Modify schedules S_d and S_e to make them cascadeless.

A strict schedule are composed of transactions that can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).

Exercise: Compare the cascadeless and the strict schedules. Explain their differences.

Exercise: Explain why schedule f below is a cascadeless but not a strict schedule.

A strict schedule is both cascadeless and recoverable. A cascadeless schedule is a recoverable schedule.

19.5 Serializability of Schedules

In a serial schedule, the operations of each transaction are executed consecutively, without any interleaved operations from other transactions. See Fig. 19.5(a) and (b).

Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule; otherwise, the schedule is called nonserial.

The problem with serial schedules is that they limit concurrency or interleaving of operations.

A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.

Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct.

Two schedules are called result equivalent if they produce the same final state of the database.

Question: What could be a potential problem with this approach when determining the equivalency between two schedules?

Result equivalence alone cannot be used to define equivalence of schedules.

For two schedules to be equivalent, the operations applied to each data item affected by the schedules should be applied to that item in both schedules in the same order.

Two definitions of equivalence of schedules are generally used: conflict equivalence and view equivalence.

Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules.

A schedule S is conflict serializable if it is conflict equivalent to some serial schedule S´.

Exercise: Using the above definition, explain why schedule D of Figure 19.5(c) is equivalent to the serial schedule A of Figure 19.5(a).

Exercise: Explain why schedule C of Figure 19.5(c) is not serializable.

Testing conflict serializability of a schedule S:

If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S is serializable.

In the precedence graph, an edge from T_i to T_j means that transaction T_i must come before transaction T_j in any serial schedule that is equivalent to S, because two conflicting operations appear in the schedule in that order.

See Figure 19.7, p.649, for the precedence graph for the sample schedules A, B, C, and D.

More sample schedules (E, F) and their respective precedence graphs: Figure 19.8, pp.650-651.

A serializable schedule gives the benefits of concurrent execution without giving up any correctness.

In practice, it is quite difficult to test for the serializability of a schedule. Why?

The approach taken in most commercial DBMSs is to design protocols (sets of rules) that — if followed by every individual transaction or if enforced by a DBMS concurrency control subsystem — will ensure serializability of all schedules in which the transactions participate.

In Chapter 20, a number of different concurrency control protocols that guarantee serializability are discussed.

The most common technique, called two-phase locking, is based on locking data items to prevent concurrent transactions from interfering with one another, and enforcing an additional condition that guarantees serializability. This is used in the majority of commercial DBMSs.

Other techniques: timestamp ordering , multiversion protocols, and optimistic (also called certification or validation) protocols.

The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are hence said to see the same view in both schedules.

Two schedules S and S ´ are said to be view equivalent if the following three conditions hold:

See p.653.

Condition 3 ensures that the final write operation on each data item is the same in both schedules, so the database state should be the same at the end of both schedules.

A schedule S is said to be view serializable if it is view equivalent to a serial schedule.

Ø Go to the Index

Main Page

Biography

Teaching

Office hours
Current courses

Research

Services

Other Links

Last updated: 4/02