Recovery > Normal

These notes cover recovery during normal operation.

*(figure: image-20230104213954735)*

Recovery

Do-it-yourself-recap: The many faces of atomicity

Objectives Today

Recovery: Basic Concepts

For this part, please review [Experi Recovery](https://liuying-1.github.io/blog/2023/recover/).

Recovery: ARIES, normal operation

Recovery Equations

Let’s talk about the algorithm: we need to support both REDO and UNDO. What do we log, and how do we log it?

WAL & the Log

*(figure: image-20230106162212065)*

What do we need to store in the log to do our accounting? The accounting here is a record of what is happening in the system.

First of all, we need IDs for the records that we store in the log; we call them log sequence numbers, or just LSNs. The log is sequential, and LSNs are always increasing. We will use this fact to compare sequence numbers and figure out what happened before what.

Then, when we look at the data pages stored throughout the system, both in memory and on disk, we extend each data page beyond the actual data with a pageLSN: the LSN of the most recent update to that page.

Moreover, the system keeps track of another special log sequence number: the flushedLSN. The log itself should live in stable storage, at least the largest part of it. But when we extend it, we first go through RAM. So the log is split into a part that lives on disk and a part (the log tail) that lives in RAM, and the flushedLSN is a pointer to where the split between the two happens.

*(figure: image-20230106164519107)*

When a crash happens, we lose the log tail, since it lives in RAM. This is not perfect: that part will be lost, and it was our only information about the actions happening in that period of time. => 1. We want to flush the log frequently, keeping the time that log entries live only in RAM as short as possible. 2. We are not aiming for perfection here; we will lose some information. However, there are entries we cannot afford to lose.

What does write-ahead logging mean? It means that whenever we write a page to disk, we must make sure that the pageLSN, the LSN of the most recent record modifying the page, is at or below what has been flushed (pageLSN ≤ flushedLSN). “Flushed” means that the log record corresponding to this modification is already in the on-disk (“blue”) part of the log.
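To make the WAL rule concrete, here is a minimal sketch (the `Log`, `Page`, and `flush_page` names are illustrative, not from any real system): before a data page goes to disk, the log must be flushed at least up to that page’s pageLSN.

```python
# Minimal sketch of the WAL rule; all names here are illustrative.

class Log:
    def __init__(self):
        self.records = []      # whole log: stable part + tail in RAM
        self.flushedLSN = -1   # highest LSN already in stable storage

    def append(self, record):
        lsn = len(self.records)   # LSNs are assigned in increasing order
        self.records.append(record)
        return lsn

    def flush(self, upto_lsn):
        # force the log tail up to (and including) upto_lsn to disk
        self.flushedLSN = max(self.flushedLSN, upto_lsn)

class Page:
    def __init__(self):
        self.pageLSN = -1      # LSN of the most recent update to this page

def flush_page(page, log):
    # WAL: never write a page whose latest update is not yet logged on disk
    if page.pageLSN > log.flushedLSN:
        log.flush(page.pageLSN)
    # ... the actual disk write of the page would now be safe ...
```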

Log Records

LogRecord fields

| prevLSN | XID | type | pageID | length | offset | old data | new data |
| :-----: | :-: | :--: | :----: | :----: | :----: | :------: | :------: |

The Transaction Table and the Dirty Page Table are data structures maintained in memory. They are the local state of the algorithm responsible for recovery, but they are completely lost on a crash, and we will use the log to reconstruct their state at the moment of the crash.
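As a sketch, the log-record fields from the table above and the two in-memory tables might look like this (the exact layouts are an assumption for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int
    prevLSN: Optional[int]         # previous record of the same transaction
    xid: int                       # transaction ID
    type: str                      # "update", "commit", "abort", "end", ...
    pageID: Optional[int] = None   # the remaining fields exist only
    length: Optional[int] = None   # for update records
    offset: Optional[int] = None
    old_data: Optional[bytes] = None
    new_data: Optional[bytes] = None

# Transaction table (XT): XID -> (status, lastLSN of that transaction)
transaction_table = {}   # e.g. {7: ("running", 42)}

# Dirty page table (DPT): pageID -> recLSN (first LSN that dirtied the page)
dirty_page_table = {}    # e.g. {3: 40}
```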

Normal Execution of a Transaction

It must be OK to crash at any time -> repeat history!

What do we do during normal execution of a transaction?

Firstly, what is a transaction? From our point of view, a transaction is a series of READs and WRITEs followed by a Commit or Abort at the end. So far we have assumed that writes to disk have no duration, as opposed to commits or aborts; in practice, we need to handle the fact that writes also take time.

The Big Picture: What’s Stored Where

*(figure: image-20230106175408043)*

We have the log, which conceptually sits in stable storage. It contains all of these fields, but not every entry has all of them: for example, pageID, length, offset, old data, and new data appear only in update entries. The other fields are always there.

Then we have the database itself, where each page has been extended with a pageLSN pointing to the last modification of that page. We also have the master record.

flushedLSN indicates how the log is split between the part still in RAM and the part that lives in stable storage. The log tail also lives in RAM, as do the two data structures: the transaction table and the dirty page table.

Right now, we have to deal with the case where the RAM part is completely gone.

Checkpointing

Before I tell you how normal operation handles crashes, here are two extra log records. They can be seen as an optimization: what we want to avoid is recovery after a crash taking a long time.

That happens because we lose the transaction table and the dirty page table. If you need to start from the beginning of time when reconstructing them, that is a problem. => So we want the point from which we must start reconstructing the transaction table and the dirty page table to be as late as possible. That is why we take checkpoints of those tables and store them in the log as well.

We initiate a checkpoint, and it consists of the transaction table and the dirty page table. Initiating a checkpoint means starting to write these things to the log; once we are done, we create the end_checkpoint record, which contains the state of the two tables (XT & DPT).

Then, when we recover, we can start from this state instead of from empty tables. This is called a fuzzy checkpoint, because writing these tables takes time, and that is exactly where the fuzziness comes from.

The other transactions continue to run. What is stored in the end_checkpoint record is the snapshot as of the moment of the corresponding begin_checkpoint record.

This is the reliable information: starting from the begin_checkpoint point, we can rely on what is stored at the slightly later end_checkpoint point, and we will recreate the state of our tables starting from that earlier point.

Since this is an optimization that tries to bring the starting point as close to the present as possible, it does not matter much that some operations happen between the two records.
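The fuzzy-checkpoint idea can be sketched as follows, assuming the simple in-memory `Log` and dict-based tables used for illustration (a real system writes the snapshot incrementally while transactions keep running):

```python
class Log:
    """Illustrative append-only log; append returns the new record's LSN."""
    def __init__(self):
        self.records = []

    def append(self, rec):
        self.records.append(rec)
        return len(self.records) - 1

def take_checkpoint(log, transaction_table, dirty_page_table):
    log.append({"type": "begin_checkpoint"})
    # Snapshot the tables as of begin_checkpoint; other transactions
    # keep running and appending log records in the meantime.
    xt_snapshot = dict(transaction_table)
    dpt_snapshot = dict(dirty_page_table)
    # end_checkpoint carries the snapshot into the log
    return log.append({"type": "end_checkpoint",
                       "XT": xt_snapshot, "DPT": dpt_snapshot})
```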


Official Version


Transaction Commit

What should happen when a transaction commits?

  1. We write a commit record to the log, and we want to write it as quickly as possible.
  2. Moreover, we flush all log records up to and including this commit record to disk. This guarantees that flushedLSN ≥ lastLSN of the transaction that has committed, making sure the information about the commit made it to stable storage before the user relies on it.
  3. These are sequential, synchronous writes to disk. The log is sequential everywhere.
  4. We try to fit multiple records into a single log page, so writing one log page writes many log records at once.

After we flush the log, Commit() can return. The user then gets the notification “Your transaction has been committed” and can rely on it being durable.

Here I did not say anything about the actual data being written to disk; that happens in a background process.

Then we update the local state (XT and DPT) of the algorithm. After that, we write the end record to the log.
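Putting the commit steps together, here is a hedged sketch of the ordering: commit record, flush, acknowledge, update tables, end record. The `Log` machinery is illustrative; the ordering is the point.

```python
class Log:
    """Illustrative log: append returns an LSN, flush advances flushedLSN."""
    def __init__(self):
        self.records = []
        self.flushedLSN = -1

    def append(self, rec):
        self.records.append(rec)
        return len(self.records) - 1

    def flush(self, upto_lsn):
        self.flushedLSN = max(self.flushedLSN, upto_lsn)

def commit(xid, log, transaction_table):
    commit_lsn = log.append({"type": "commit", "xid": xid})
    log.flush(commit_lsn)        # force the log up to the commit record
    # Commit() may return here: flushedLSN >= lastLSN of this transaction,
    # so the commit is durable even though data pages are written later.
    del transaction_table[xid]   # update local state (XT)
    log.append({"type": "end", "xid": xid})  # end record may stay in the tail
```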


Official Version

The transaction is considered committed as soon as the commit record makes it to stable storage as the last flushed entry.


Simple Transaction Abort

What we want to do is UNDO the effects of the transaction so far. We essentially play the log backwards: UNDO the latest action by the transaction, then the one before it, and so on, until the very start.

So, where do we get the lastLSN of the transaction? We look it up in the transaction table.

We look it up to find the latest update. If it was a standard update, we UNDO it by taking the old-data entry from the log record and writing it to memory; it gets flushed to disk at some later time.

Once we’ve done that, we follow our linked list backwards, looking at the prevLSN of the last log entry that we’ve UNDOne.

Before we start this backward traversal, we write an Abort log record. This is important if a crash happens during the UNDO phase: it reminds us that we wanted to abort this transaction.
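A minimal sketch of this abort procedure, omitting the compensation log records that full ARIES would write during undo (the `Log` and table layouts are illustrative; `pages` stands in for the buffer pool):

```python
class Log:
    """Illustrative append-only log; append returns the new record's LSN."""
    def __init__(self):
        self.records = []

    def append(self, rec):
        self.records.append(rec)
        return len(self.records) - 1

def abort(xid, log, transaction_table, pages):
    last_lsn = transaction_table[xid]          # lastLSN from the table
    log.append({"type": "abort", "xid": xid})  # survives a crash mid-undo
    lsn = last_lsn
    while lsn is not None:                     # walk the prevLSN chain back
        rec = log.records[lsn]
        if rec["type"] == "update":
            # undo by writing the before-image back to the page in memory
            pages[rec["pageID"]] = rec["old_data"]
        lsn = rec["prevLSN"]
    log.append({"type": "end", "xid": xid})
    del transaction_table[xid]
```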

Abort, cont.

First of all, we will be writing pages, and this is a running transaction. To perform UNDO, we must hold locks on the data. But conveniently, we already hold them, because we are using S2PL: we only release locks once the transaction commits. => This would not work under plain (non-strict) 2PL, which releases locks gradually before the transaction ends.

Example

*(figure: image-20230106194011808)*

Solution: the “RAM” area below records what we have in the data structures as we go.

*(figure: image-20230107004543282)*

A Longer Example

**_`flushedLSN` has no effect on the tables._**

Summary

Today was all about the normal execution of a transaction, including transaction commit and transaction abort. In the next lecture, we will talk about Crash Recovery.