This is the chapter relating to concurrency control - Part 2.
Locking Solutions for Isolation in ACID Transactions
Why two phases?
Acquiring (growing) phase and releasing (shrinking) phase. What does this give us? The claim from the last lecture is that this gives us serializability: any schedule that can be produced by such a mechanism is serializable.
What about the problems with these different strategies (Solution 4 & Solution 5)? How should we acquire and release locks? How should we structure the two phases?
C2PL => cascading aborts, S2PL => deadlocks
Strict 2PL can however deadlock => We will see how to handle deadlocks automatically.
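To make the two-phase structure concrete, here is a minimal sketch of a Strict-2PL-style lock manager (my own illustration, not the lecture's code; names such as `LockManager`, `acquire_shared` and `release_all` are made up). Locks are only acquired while the transaction runs, and every lock is released in a single step at commit or abort, which is what makes the variant strict; blocking and deadlock handling are left to the caller.

```python
from collections import defaultdict

class LockManager:
    """Toy lock table: shared (S) and exclusive (X) locks per object."""

    def __init__(self):
        self.shared = defaultdict(set)   # object -> txn ids holding an S lock
        self.exclusive = {}              # object -> txn id holding the X lock

    def acquire_shared(self, txn, obj):
        """Grant an S lock unless some other transaction holds X on obj."""
        holder = self.exclusive.get(obj)
        if holder is not None and holder != txn:
            return False                 # caller must wait (or handle deadlock)
        self.shared[obj].add(txn)
        return True

    def acquire_exclusive(self, txn, obj):
        """Grant an X lock only if no other transaction holds any lock on obj."""
        holder = self.exclusive.get(obj)
        other_readers = self.shared[obj] - {txn}
        if (holder is not None and holder != txn) or other_readers:
            return False
        self.exclusive[obj] = txn
        return True

    def release_all(self, txn):
        """Strict 2PL: every lock is released together at commit/abort."""
        for holders in self.shared.values():
            holders.discard(txn)
        for obj in [o for o, t in self.exclusive.items() if t == txn]:
            del self.exclusive[obj]

lm = LockManager()
assert lm.acquire_shared("T1", "A")          # T1 reads A
assert not lm.acquire_exclusive("T2", "A")   # T2 must wait for T1
lm.release_all("T1")                         # T1 commits, all locks go at once
assert lm.acquire_exclusive("T2", "A")       # now T2 may write A
```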
What is a schedule?
A schedule is a sequence of READ/WRITE actions performed by different transactions.
Interleaving: we allow the transactions to interleave their actions, and each interleaving is one possible execution. This is what gives us concurrency, but within a schedule all actions are linearized, so there are no concurrent actions in a schedule. !IMPORTANT
Serial schedule: Schedule that does not interleave the actions of different transactions.
Equivalent schedules: See Concurrency Control 1.
A schedule only records READs and WRITEs, so we cannot assume anything about what the transactions do to the state of the system. We therefore require that all values read by the transactions are the same in the two schedules; this encodes the assumption that we abstract away the logic of the actual transactions.
Serializable schedule: See Concurrency Control 1.
Serializability is the core property that lets us reason about consistency, and that is why we are interested in serializable schedules. Consistency is application specific, so it is not the task of the database management system to provide it; but if the schedules are not serializable, it becomes very hard for the user of a DBMS to reason about consistency.
Note: If each transaction preserves consistency, every serializable schedule preserves consistency.
With a serializable schedule, the user can pretend that there is no concurrency in the database system: if a transaction preserves consistency when executed as the only transaction in the system, then any serializable schedule containing it also preserves consistency.
Summary => We want serializability because it lets us preserve consistency.
These are the three most typical anomalies.
Dirty Reads (WR Conflicts) => Read uncommitted Data
The first transaction writes a value of A, then the second transaction reads A, and much later the first transaction aborts for some reason. Transaction 2 now has a problem: that value should never have been visible, because once a transaction aborts, none of its actions should have any effect on the database state.
Unrepeatable Reads (RW Conflicts)
Here transaction 1 reads a value, transaction 2 modifies that value, and transaction 1 reads it again and potentially sees a different value, because it has been modified. This is problematic: a transaction should be able to rely on the fact that, as if it were the only transaction running in the system, whatever it reads will not be modified by anyone else. That is the isolation we want to enforce.
Overwriting Uncommitted Data (WW Conflicts)
Blind writes: values are written without reading them first. The last written value of B is the one from transaction 1, and the last written value of A is the one from transaction 2. There is simply no serial execution that produces this outcome.
Now comes a refinement of the notion of serializability, and it is a rather different kind of notion. While serializability is a semantic notion (it talks about how the database state changes across the execution of the schedule), this one is purely syntactic: it only looks at which actions are performed and which actions can safely be swapped.
We call a schedule conflict serializable if it is conflict equivalent to some serial schedule. The important part is that conflict equivalence is not the same thing as equivalence; this is where the difference between the two definitions lies. => Introduce the definition of conflict equivalence.
Two schedules are conflict equivalent if they involve the same actions of the same transactions.
Tip: plain equivalence does not care about the order of the actions (only about the values read and the final state), while conflict equivalence does.
Moreover, every pair of conflicting actions must be ordered in the same way. What are conflicting actions? They are the kinds of actions that cause the trouble => the conflicts mentioned before. Two actions conflict if they access the same object and at least one of the two is a WRITE.
So for every pair of conflicting actions in the two schedules, they must occur in the same order in both schedules: if a READ of A occurs before a WRITE of A in one schedule, then it must do so in the other schedule as well.
Official Version
Two schedules are conflict equivalent if:
- they involve the same actions of the same transactions, and
- every pair of conflicting actions is ordered the same way.

Schedule S is _conflict serializable_ if S is conflict equivalent to some serial schedule.
We build a precedence graph and check whether it contains a cycle. If there is a cycle, as in the example here, the schedule is not conflict serializable; if there is no cycle, the schedule is conflict serializable. There is an edge between two transactions if there is a pair of conflicting operations between them; the earlier operation of the pair comes from the source of the edge and the later one from the target.
Analysis of the example
Conflicts:
- `T1: R(A)` conflicts with `T2: W(A)`. Because the READ occurs in the schedule before the WRITE, there is an edge from `T1` to `T2`.
- `T1: W(A)` followed by `T2: R(A)` is the dirty-read case, and `T1: W(A)` also conflicts with `T2: W(A)`. These share the same edge as conflict 1, since in all of them `T1`'s operation comes first.
- `T2: R(B)` conflicts with `T1: W(B)`, and `T2: W(B)` conflicts with `T1: R(B)`. Hence there is a back edge from `T2` to `T1`.

One node per transaction; edge from `Ti` to `Tj` if an operation in `Tj` conflicts with an earlier operation in `Ti`. (A code sketch of this construction follows below.)
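Here is a small Python sketch of the construction (my own illustration; the schedule `s` is one interleaving consistent with the conflicts listed above, not necessarily the exact schedule from the slides). It builds the precedence graph of a schedule and checks it for cycles.

```python
def conflicts(a, b):
    """Two operations conflict if they come from different transactions,
    touch the same object, and at least one of them is a write."""
    (ti, ai, oi), (tj, aj, oj) = a, b
    return ti != tj and oi == oj and "W" in (ai, aj)

def precedence_graph(schedule):
    """Edge (Ti, Tj) if an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later):
                edges.add((earlier[0], later[0]))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(dfs(succ) for succ in graph[node])
        visiting.discard(node)
        done.add(node)
        return found
    return any(dfs(node) for node in graph)

# One interleaving that is consistent with the conflicts analysed above.
s = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
     ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "B")]
print(precedence_graph(s))              # {('T1', 'T2'), ('T2', 'T1')} (in some order)
print(has_cycle(precedence_graph(s)))   # True -> not conflict serializable
```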
Hence, S2PL only gives us conflict serializable schedules.
Are the following schedules conflict serializable?
| T1 | R(A) | W(B) | C    |   |
|----|------|------|------|---|
| T2 | R(B) | R(A) | R(C) | C |
| T3 | R(B) | W(C) | C    |   |
Yes. This is conflict serializable.
| T1 | R(A) | W(B) | C    |   |
|----|------|------|------|---|
| T2 | R(B) | R(A) | R(C) | C |
| T3 | R(B) | W(C) | C    |   |
No, as there is a cycle in the Precedence Graph.
The above are my own solutions, I am not sure whether they are correct right now.
We need to pay attention to the direction of the edges, and there is no need to consider the commit action.
A schedule S is serializable if there exists a serial order SO such that executing S has the same effect as executing SO, i.e., S and SO are equivalent schedules.
Under this definition, certain serializable executions are not conflict serializable!
| T1 | R(A) |      | W(A) |      |
|----|------|------|------|------|
| T2 |      | W(A) |      |      |
| T3 |      |      |      | W(A) |
Is this schedule serializable? How do we judge whether the schedule is serializable?
Yes, the equivalent schedule is below.
| T1 | R(A) | W(A) |      |      |
|----|------|------|------|------|
| T2 |      |      | W(A) |      |
| T3 |      |      |      | W(A) |
However, is this schedule conflict serializable?
No, as there is a cycle between `T1` and `T2`.
Hence, it is serializable but not conflict serializable.
There is an alternative notion that sits between conflict serializability and serializability: view serializability.
A schedule is view serializable if it is view equivalent to a serial schedule. Here is a new term: view equivalence.
Schedules S1 and S2 are view equivalent if (a small checking sketch follows after the list):
- If `Ti` reads the initial value of `A` in S1, then `Ti` also reads the initial value of `A` in S2. => Initial reads match.
- If `Ti` reads the value of `A` written by `Tj` in S1, then `Ti` also reads the value of `A` written by `Tj` in S2. => Any later read sees the same writer.
- If `Ti` writes the final value of `A` in S1, then `Ti` also writes the final value of `A` in S2. => Final writes match.
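Below is a rough sketch of how the three view-equivalence conditions can be checked mechanically (my own illustration; helper names such as `reads_from` and `final_writes` are made up). A schedule is a list of (transaction, action, object) triples; an initial read is encoded as reading from `None`.

```python
def reads_from(schedule):
    """Pair each read with the transaction whose write it sees.
    A writer of None encodes an initial read (condition 1); a concrete
    writer Tj encodes a read of Tj's value (condition 2)."""
    last_writer = {}                    # object -> last transaction that wrote it
    seen = set()
    for txn, action, obj in schedule:
        if action == "R":
            seen.add((txn, obj, last_writer.get(obj)))
        else:
            last_writer[obj] = txn
    return seen

def final_writes(schedule):
    """(object, transaction) pairs for the last write of each object (condition 3)."""
    last_writer = {}
    for txn, action, obj in schedule:
        if action == "W":
            last_writer[obj] = txn
    return set(last_writer.items())

def view_equivalent(s1, s2):
    # Assumes both schedules contain the same actions of the same transactions.
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)

# The earlier example R1(A), W2(A), W1(A), W3(A) against the serial order T1, T2, T3.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A"), ("T3", "W", "A")]
print(view_equivalent(s, serial))       # True -> s is view serializable
```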
Reference about View Serializable
When we use S2PL, we can get into deadlocks. What is a deadlock?
Deadlock: a cycle of transactions waiting for locks to be released by each other; each holds locks that the other transactions need in order to proceed.
We have two ways of dealing with deadlocks.
One is deadlock prevention, which ensures that no cycle can ever arise: when taking locks, we decide what to do with transactions and abort those that could cause a cycle. (Acts before a deadlock arises.)
The second is deadlock detection: here we do not prevent anything; we only continuously monitor who is waiting for whom and look for cycles. Only when a cycle is actually there do we start aborting transactions. (Acts when a deadlock arises.)
How to prevent deadlocks? The basic idea is to assign certain priorities or weights to our transactions, and one good strategy is to use timestamps.
The idea is that the oldest transaction should get the highest priority: a transaction that has already been waiting longer to execute should get priority over more recent transactions.
Hence, we:
- Assign priorities based on timestamps.
- Lower timestamps get higher priority, i.e., older transactions get prioritized.
Assume `Ti` wants a lock that `Tj` holds. Two alternative policies are possible.
Wait-Die: If `Ti` has higher priority, `Ti` waits for `Tj`; otherwise `Ti` aborts. => Even though `Ti` has higher priority, it still has to wait, because the lock is already held by `Tj` and we want `Tj` to finish its execution; we also do not want to abort `Ti`, since it has the higher priority, so the decision is to let `Ti` wait. If `Ti` does not have higher priority, we abort `Ti`.
=> For example, suppose A has priority 1 and B has priority 2, so A is older and has higher priority. If A asks for a lock held by B, A waits, because A has the higher priority (it has lived longer). If at the same moment B also asks for a lock held by A, B aborts; otherwise a deadlock would arise.
Wound-Wait: If `Ti` has higher priority, `Tj` aborts; otherwise `Ti` waits.
=> In the same example, the higher-priority transaction always gets the lock when it asks for one: a lower-priority holder is aborted directly, while a lower-priority requester has to wait. (A tiny sketch of both policies follows below.)
If a transaction re-starts, make sure it has its original timestamp.
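A tiny sketch (my own; the function names are made up) of how the two policies decide, using timestamps as priorities, where a smaller timestamp means an older transaction and therefore a higher priority:

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older (higher-priority) requester waits; a younger one dies."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# T_old (timestamp 1) and T_young (timestamp 2) each want a lock the other holds.
print(wait_die(1, 2), "|", wait_die(2, 1))      # wait | abort requester
print(wound_wait(1, 2), "|", wound_wait(2, 1))  # abort holder | wait
```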
When this waits-for graph has a cycle, we become active and start killing transactions.
Edge from `Ti` to `Tj` if `Ti` is waiting for `Tj` to release a lock => different from the precedence graph.

| T1 | S(A) | R(A) | S(B) |
|----|------|------|------|
| T2 | X(B) | W(B) | X(C) |
| T3 | S(C) | R(C) | X(A) |
| T4 | X(B) |      |      |
In the example above, when `T1` wants to `R(B)`, it first requests a shared lock on B, but it has to wait for T2 to release its exclusive lock; hence the edge from `T1` to `T2`. The other edges arise in the same way.
The `X(A)` on the right is shown in red because this is the operation that closes the cycle: at this point the system detects that there is a cycle and decides which transaction to abort. Once we have a cycle, we need to abort one of the involved transactions. => Aborting any of T1, T2, or T3 works.
| T1 | S(A) | X(D) | X(C) | C |
|----|------|------|------|---|
| T2 | X(A) | X(B) |      |   |
| T3 | S(B) | S(C) |      |   |
Note: We only show locking operations for brevity! C alone denotes commit, all locks released
There is no deadlock.
Notice: We need to delete the arrow from T2 to T1 as soon as the locks are released by T1.
| T1 | S(C) | S(A) | X(D) |
|----|------|------|------|
| T2 | S(B) | X(C) |      |
| T3 | S(D) | X(B) |      |
There is a deadlock.
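To tie the detection examples together, here is a compact sketch (my own illustration; the class name `WaitsForGraph` and its methods are assumptions) of a waits-for graph in which edges are added when a transaction blocks, removed when the holder releases its locks (as in the Notice above), and checked for cycles:

```python
from collections import defaultdict

class WaitsForGraph:
    """Edges are added when a transaction blocks and removed when the holder
    releases its locks; a cycle in the graph means a deadlock."""

    def __init__(self):
        self.waits_for = defaultdict(set)   # txn -> set of txns it waits for

    def block(self, waiter, holder):
        self.waits_for[waiter].add(holder)

    def release(self, holder):
        """Holder committed/aborted: nobody waits for it any more."""
        for waiters in self.waits_for.values():
            waiters.discard(holder)
        self.waits_for.pop(holder, None)

    def has_deadlock(self):
        visiting, done = set(), set()
        def dfs(node):
            if node in visiting:
                return True
            if node in done:
                return False
            visiting.add(node)
            found = any(dfs(nxt) for nxt in self.waits_for[node])
            visiting.discard(node)
            done.add(node)
            return found
        return any(dfs(node) for node in list(self.waits_for))

g = WaitsForGraph()
g.block("T1", "T2")          # T1 waits for a lock held by T2
g.block("T2", "T3")
print(g.has_deadlock())      # False
g.block("T3", "T1")          # this request closes the cycle T1 -> T2 -> T3 -> T1
print(g.has_deadlock())      # True -> abort one of T1, T2, T3
```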
The contents above covered lock-based approaches to concurrency control. Let's now talk about lockless approaches.
Locking is a pessimistic approach (prepare for the worst before it happens) in which conflicts are prevented up front. Its disadvantages include the overhead of lock management and the deadlocks we just saw.
What we will look at now is a more optimistic approach: just let things happen, and if bad things do happen, try to resolve them, all without using locks.
However, even if we give up locking, we still want to enforce some form of serializability without requiring the entire schedule to be completely serial, i.e., without destroying concurrency.
We will look at how to repair violations when they happen by aborting transactions. Locking is used to prevent violations; aborting is the only way to fix violations after the fact.
How can we design a protocol based on aborts instead of locks?
The basic idea: there are no locks, so this is completely different from before, but every transaction is structured into three phases.
- READ phase => we read objects from the database. We can also write, but not to the database: all changes are made to private copies of the objects we read from the database.
- VALIDATE phase => we will discuss in detail what the validate phase needs to do to ensure serializability in the end. => Check for conflicts.
- WRITE phase => we write out all the local changes we accumulated, but we do not read anymore in this phase. The database is only modified in the WRITE phase. => Make the local copies of the changes public.
We test conditions that are sufficient to ensure that no conflict occurred.
Each transaction is assigned a numeric id; we simply use a timestamp. Transaction IDs are assigned at the end of the READ phase, just before validation begins: when the transaction has finished reading, we record that timestamp and use it as the id of the transaction.
We then need to consider the set of objects read by a given transaction and the set of objects modified by it. We call these sets `ReadSet(Ti)` and `WriteSet(Ti)`.
Official => `ReadSet(Ti)`: set of objects read by transaction `Ti`; `WriteSet(Ti)`: set of objects modified by `Ti`.
The validation phase runs through the tests; there are three test conditions. If the first test fails, we still have a chance to succeed with the second test, and so on. (A small validation sketch in code follows after the three tests.)
Test 1: For all i and j such that `Ti` < `Tj`, check that `Ti` completes before `Tj` begins. `Ti` and `Tj` here denote the transaction ids, i.e., the timestamps taken at the end of the READ phase. If `Ti` has additionally completed everything it was supposed to do before `Tj` even started, we pass the first test and there is no problem between these two transactions: they do not overlap in time at all. => The transactions are not executed concurrently.
Test 2: For all i and j such that `Ti` < `Tj`, check that:
- `Ti` completes before `Tj` begins its WRITE phase, and
- `WriteSet(Ti)` $\cap$ `ReadSet(Tj)` is empty. => `Tj` does not read dirty data that has been modified by `Ti`.
Test 3: For all i and j such that `Ti` < `Tj`, check that:
- `Ti` completes its READ phase before `Tj` does,
- `WriteSet(Ti)` $\cap$ `ReadSet(Tj)` is empty => whatever `Tj` reads is not touched by `Ti`, and
- `WriteSet(Ti)` $\cap$ `WriteSet(Tj)` is empty => `Ti` will not overwrite `Tj`'s writes.
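Here is a small sketch of the three validation tests in code (my own modelling, not the lecture's implementation; the `Txn` record and its field names are assumptions). `ti` is the older transaction, i.e., `ti.tid < tj.tid`, with ids assigned at the end of the READ phase.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    tid: int                       # assigned at the end of the READ phase
    start: float                   # beginning of the READ phase
    read_end: float                # end of the READ phase (validation starts here)
    write_start: float             # beginning of the WRITE phase
    finish: float                  # end of the WRITE phase
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validate_pair(ti: Txn, tj: Txn) -> bool:
    """Check the three tests for a pair with ti.tid < tj.tid; True = no conflict."""
    assert ti.tid < tj.tid
    # Test 1: Ti finished everything before Tj even started (no overlap at all).
    if ti.finish < tj.start:
        return True
    # Test 2: Ti finishes before Tj starts its WRITE phase,
    # and Tj read nothing that Ti wrote.
    if ti.finish < tj.write_start and not (ti.write_set & tj.read_set):
        return True
    # Test 3: Ti ends its READ phase before Tj does, and Ti's writes touch
    # neither Tj's reads nor Tj's writes.
    if (ti.read_end < tj.read_end
            and not (ti.write_set & tj.read_set)
            and not (ti.write_set & tj.write_set)):
        return True
    return False                   # otherwise Tj has to abort

# T2 overlaps T1 in time, but T1 finishes before T2's WRITE phase and
# T2 read nothing that T1 wrote, so test 2 lets it commit.
t1 = Txn(tid=1, start=0, read_end=5, write_start=6, finish=8,
         read_set={"A"}, write_set={"A", "C"})
t2 = Txn(tid=2, start=3, read_end=9, write_start=10, finish=12,
         read_set={"B"}, write_set={"B"})
print(validate_pair(t1, t2))       # True
```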
Predict whether T3 will be allowed to commit, given the transaction below.
This is the Optimistic Concurrency Control.
So a question comes up: why did we develop these complex two-phase locking algorithms if we can do without locks?
There is some overhead that comes with this way of achieving serializability as well.
A `ReadSet` and a `WriteSet` per transaction => these sets must be created and destroyed as needed. Still, optimistic techniques are widely used in software transactional memory (STM) and main-memory databases.