At its core, every database has a simple job: ingest your data, process it, and return it to users in whatever form they need, whenever they ask for it. To be practical, any given database must also allow as many queries as possible to be processed simultaneously. SQL emerged as a declarative language for exactly this kind of database management.
Being declarative means an SQL statement does not tell the database how to perform a query; it only sets the parameters within which the database operates while the statement executes. In other words, a programmer interacting with the database cedes control over the specifics to the database itself. This separates the application logic from the data storage, which is what makes the database capable of managing multiple simultaneous requests. Most importantly, the database has to handle situations where more than one transaction reads and writes the same piece of data.
A conventional coder may ask: why not use read/write locks to handle this? It is a reasonable question, since locking was the most widely adopted solution for databases for years. However, there is a problem with this approach: readers tend to block writers, and vice versa. Those are the semantics of a standard read/write lock, and in a transactional database they can have a serious adverse impact on performance.
Consider a system with two different kinds of transactions. One is a long-running read transaction that builds a report or does some analytics; assume only one of these runs at a time. The other is a short-running user action with a handful of reads and writes, such as placing an order. In a typical web application environment, there can be thousands of these short transactions.
The primary problem with a standard read/write lock here is that the data read by the long-running transaction can overlap with the data written by the short-running transactions, so the short writes get blocked until the report completes. On today's internet, making end-users wait even a few seconds behind a long-running transaction is unacceptable: studies show that bounce rates climb sharply for web pages with a response delay of more than 3 seconds.
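The blocking problem above can be sketched in a few lines of Python. This is a minimal illustration, not any real engine's code: a single exclusive lock stands in for the read/write lock's conflict path, a slow "report" reader holds it, and a quick "order" writer has to wait its turn. The function names and timings are illustrative assumptions.

```python
import threading
import time

lock = threading.Lock()   # stand-in for a read/write lock's conflict path
events = []               # records the order in which work actually happens

def long_report():
    with lock:                    # "read lock" held for the whole report
        events.append("report started")
        time.sleep(0.2)           # simulate a slow analytics scan
        events.append("report finished")

def quick_update():
    time.sleep(0.05)              # arrives while the report is running
    with lock:                    # blocked until the reader releases
        events.append("update applied")

reader = threading.Thread(target=long_report)
writer = threading.Thread(target=quick_update)
reader.start()
writer.start()
reader.join()
writer.join()

print(events)
# The quick update could not run until the long report completed:
# ['report started', 'report finished', 'update applied']
```

Scale the one waiting writer up to thousands of short order transactions and the cost of this lock becomes clear.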
The role of Multi-Version Concurrency Control (MVCC)
Considering the above challenge, there is a certain 'wow' factor in what MVCC, or Multi-Version Concurrency Control, solves. The name itself hints at the approach. Rather than maintaining a single storage slot for each record and guarding it with read/write locks as done conventionally, an MVCC database lets a given record keep multiple versions simultaneously.
This means an update now installs a fresh row version. Writers contend only for the right to add a new version on top of a given row, while readers can read from any version visible to them. Long-running transactions no longer block while short transaction queries are processed; mutual interference is possible only when two or more transactions vie for the right to update the same row version. This is why MVCC is an ideal way to implement database snapshots.
As an effect of this, when a transaction starts, it freezes a snapshot: the unique collection of record versions visible to it. Updates to those records are logically sequenced after the transaction's reads. To support MVCC semantics, the database needs a record storage system that keeps multiple versions of the same record, plus a mechanism that maps record version numbers to transaction IDs so it can make the visibility calculations.
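The storage system and visibility calculation just described can be sketched as a tiny in-memory store. This is a simplified model, not a real engine: it assumes transaction IDs are monotonically increasing integers and that a version is visible to a snapshot if its commit ID is at or below the snapshot's ID. The class and method names are illustrative.

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_txid, value), append-only
        self.next_txid = 1

    def begin(self):
        """Start a transaction: freeze a snapshot at the current txid."""
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def write(self, key, value):
        """Install a fresh version instead of overwriting in place."""
        txid = self.begin()
        self.versions.setdefault(key, []).append((txid, value))
        return txid

    def read(self, key, snapshot_txid):
        """Return the newest version visible to the given snapshot."""
        visible = [v for (t, v) in self.versions.get(key, [])
                   if t <= snapshot_txid]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("row:100", "original")           # committed as txid 1
snapshot = store.begin()                     # reader freezes its view (txid 2)
store.write("row:100", "updated")            # a later writer commits txid 3

print(store.read("row:100", snapshot))       # original -- old version still visible
print(store.read("row:100", store.begin()))  # updated  -- a new snapshot sees it
```

The reader's snapshot keeps returning the old version even after the writer commits, which is exactly why neither side has to block the other.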
MVCC and SQL
An MVCC system can be used to implement SQL's concurrency semantics. SQL offers different isolation levels, which act as a knob allowing programmers to dial in the level of consistency they need. Under MVCC, this means that when a transaction hits a record updated by another concurrent transaction, what happens next depends on its isolation level. At the lowest level of consistency, the transaction may be allowed to read the update before it is committed, known as a dirty read.
The next consistency level prevents dirty reads and only reads the versions calculated from the transaction's own visibility; this is called 'consistent reading.' An update that hits a record version with an uncommitted update pending has two options:
- It can simply fail straight away, or
- It can wait for the other transaction to complete and then decide based on the committed state.

The low isolation levels fail faster because, logically, they see concurrent updates as they run. The higher consistency levels may need to block, or retry as the application requires, until the final state of the record is known.
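The two conflict policies can be sketched as follows. The names `fail_fast` and `wait_for_commit`, and the record layout, are illustrative assumptions rather than any engine's terminology; a real database would block on a lock and re-check the other transaction's status instead of resolving it inline.

```python
class SerializationError(Exception):
    """Raised when the transaction must abort and the caller should retry."""

def update_record(version, new_value, policy):
    """version: dict with a committed 'value' and a 'pending' uncommitted value or None."""
    if version["pending"] is not None:
        if policy == "fail_fast":
            # Fail straight away; the application retries the transaction.
            raise SerializationError("row changed by a concurrent transaction")
        if policy == "wait_for_commit":
            # A real engine would block until the other transaction commits
            # or aborts; here we simulate its commit resolving first.
            version["value"] = version["pending"]
            version["pending"] = None
    version["pending"] = new_value
    return version

row = {"value": "qty=5", "pending": "qty=4"}   # another tx's update in flight
try:
    update_record(dict(row), "qty=3", "fail_fast")
except SerializationError:
    print("fail_fast: transaction aborted, caller retries")

resolved = update_record(dict(row), "qty=3", "wait_for_commit")
print("wait_for_commit:", resolved)   # new update now based on committed qty=4
```

The first policy trades wasted work for low latency; the second trades waiting time for a decision based on the final committed state.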
As an example, consider a table with three live rows, each with a specific ID value. Given this, we can examine the following use case.
Transaction 1 starts before transaction 2 and begins to read each entry in the table. Meanwhile, transaction 2 starts up and looks for updatable records. Transaction 2 outraces transaction 1 to record 100 and slips in early to update it. Since transaction 2 is still running, that update is pending. In a lock-based system, transaction 1 would have to block, or validate that no record changed while it was reading. In the MVCC model, by contrast, when transaction 1 finally gets around to reading record 100, it simply reads the most recent version visible to its snapshot, which is the last committed one. No reader blocks a writer, and nobody has to roll back or block.
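The walkthrough above can be sketched directly. The row IDs come from the example; the transaction IDs, values, and version-chain layout are assumptions made for illustration: each row maps to a list of `(commit_txid, value)` pairs, and transaction 1's snapshot is frozen at txid 10.

```python
# Three live rows, each with one committed version.
table = {
    100: [(1, "alpha")],
    200: [(2, "beta")],
    300: [(3, "gamma")],
}

def read(table, row_id, snapshot_txid):
    """Return the newest version committed at or before the snapshot."""
    visible = [v for (t, v) in table[row_id] if t <= snapshot_txid]
    return visible[-1]

tx1_snapshot = 10                     # transaction 1 begins and freezes its view
table[100].append((11, "alpha-v2"))   # transaction 2 slips in and updates row 100

# Transaction 1 scans all three rows without blocking or rolling back:
scan = [read(table, row_id, tx1_snapshot) for row_id in (100, 200, 300)]
print(scan)   # ['alpha', 'beta', 'gamma'] -- tx 1 never sees the later update
```

Transaction 2's new version sits in the chain with txid 11, invisible to transaction 1's snapshot at 10, so both transactions proceed without touching each other.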
This is how MVCC ensures multi-version concurrency in distributed SQL databases. It has proved to be the right approach for DBAs who need database concurrency without sacrificing performance.