Recovery > Crash

This is the special case of the recovery when the system crashes.

Recovery: Crash

RAM

What is the Transaction Table (XT)?

What is the Dirty Page Table (DPT)? => The first modification of this page that has not been written to disk (recLSN).

What is the flushedLSN? => It only refers to the log and recall that the log is also split between the memory and the disk. => Denotes which part lives in memory, and which part lives in the disk.

DISK

What is the master record? => It is just a log sequence number and we need to store it a bit more securely to facilitate a speedy recovery. => Master record refers to the beginning of the last checkpoint. Because this will be the point from which we need to start the recovery.

Crash Recovery: Big Picture

What happens after the crash or what should happen?

image-20230107155730658
We can crash at anytime including during the recovery phase!

The recovery looks like and it is subdivided three phases, the ARU phases.

A => Analysis, R => Redo, U => Undo, and they are executed in the same order.

The arrows’ direction is the direction in which they go through the log in their own phase.

It does not have to be the case that one is before the other same with the analysis phase. The length of the arrow is not indicative, in particular, the UNDO phase could stop earlier than the point where the REDO phase starts .


Analysis Phase

What happens at the moment of the crash is that we’ll lose our dirty page table and our transaction table. That’s some important pieces of information that we need. Also, this is the purpose of the Analysis phase to reconstruct these tables as closely as possible to the moment of the crash. Once these are reconstructed, we can find out which transactions have actually committed and which have failed and should be aborted. => Task is to reconstruct these tables. The reconstruction will start from the last checkpoint as the checkpoint contains the snapshot of the tables that is valid at the moment of the begin_checkpoint which we know by the master record. Therefore, we go through the log in that direction and update our tables according to what we find there.

REDO Phase

Then, the second phase comes after we reconstructed those tables and we redo all the actions. We are not looking whether a transaction succeeded or failed. Irrespective of that, we will be performing the actions that our log contains.

To find out where we need to start REDOing actions, we look in the dirty page table and find the smallest recLSN which is the smallest timestamp that made some page dirty. We redo all actions until the current point or until the end of our log.

UNDO Phase
Undo the transactions that have not been successfully committed yet. For that, we will proceed as in the normal operation in the normal abort of the transaction as last time. [Recovery Normal](https://liuying-1.github.io/blog/2023/recovery-normal/)

Recovery: The Analysis Phase

Recovery: The REDO Phase

We repeat History to reconstruct state at crash:

Recovery: The UNDO Phase

ToUndo = { lsn | lsn a lastLSN of a "loser" Xact}

Abort for all losers.

Loser: Those are the ones that are not in the active in the transaction table after REDOing phase.

Repeat:

Until ToUndo is empty.

According to the home assignment, the committed transactions before crash are winners.

image-20230107155730658
image-20230107203841645

Additional Crash Issues

What happens if system crashes during ANALYSIS? During REDO?

If exactly happens during ANALYSIS when we are in the process of reconstructing the table from the previous crash, we will reconstruct the tables again from the same information what we have.

REDO

We are performing anything that is recorded in the log yet. We will redo all the actions again from scratch. But there is a little bit difference from the above. If there is something already propagated to disk, there is no need to redo it. But in principle, we assume everything should be REDOne.

How do you limit the amount of the work in REDO?

How do you limit the amount of the work in UNDO?

A Short Summary of Before

Strong modularity is a kind of fault tolerant.

image-20230107224641524