Some Notes on MariaDB System-Versioned Tables

by Federico Razzoli

As mentioned in a previous post, I gave a talk at Percona Live Europe 2018 about system-versioned tables. This is a new MariaDB 10.3 feature, which consists of preserving old versions of a table’s rows. Each version has two timestamps that indicate the start of its validity (INSERT, UPDATE) and its end (DELETE, UPDATE). As a result, the user is able to query a table as it appeared at a point in the past, or to see how data evolved over a certain time range. An alternative name for this feature is temporal tables, and I will use it in the rest of this text.

In this post, I want to talk a bit about temporal table best practices. Some of the information that I will provide is not present in the documentation; while it is based on my experience and tests, there could be errors. My suggestions for good practices are also based on my experience and opinions, and I don’t consider them universal truths. If you have different opinions, I hope that you will share them in the comments or in a separate blog post.

Create temporal columns

It is possible – but optional – to create the columns that contain the timestamps of row versions. Since there is no special term for them, I call them temporal columns. MariaDB allows us to give them any name we like, so I like to use the names valid_from and valid_to, which seem to be some sort of de facto standard in data warehousing. Whichever names you decide to use, I advise you to use them for all your temporal columns and for nothing else, so that their meaning will be clear. Temporal columns are generated columns, meaning that their values are generated by MariaDB and cannot be modified by the user. They are also invisible columns, which means that they can only be read by mentioning them explicitly. In other words, the following query will not return those columns:

SELECT * FROM temporal_table;

Also, that query will only show current versions of the rows. In this way, if we make a table temporal, existing applications and queries will continue to work as before.
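For reference, here is a minimal sketch of such a table definition. Apart from the syntax keywords, the names are just examples:

CREATE TABLE temporal_table (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    -- generated temporal columns; as described above, they will not
    -- appear in the output of SELECT *
    valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
    valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) ENGINE = InnoDB WITH SYSTEM VERSIONING;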

But we can still read old versions and obtain their timestamps with a query like this:

SELECT *, valid_from, valid_to
    FROM temporal_table FOR SYSTEM_TIME ALL
    WHERE valid_from < NOW() - INTERVAL 1 MONTH;

If we don’t create these columns, we will not be able to read the timestamps of current and old row versions. We will still be able to read data from a point in time or from a time range by using some special syntax. However, I believe that using the well-established WHERE syntax is easier and more expressive than using syntactic sugar.
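For completeness, the special syntax looks roughly like this, using the table from the examples above:

-- the table as it appeared at a given point in time
SELECT * FROM temporal_table
    FOR SYSTEM_TIME AS OF TIMESTAMP '2018-11-09 12:00:00';

-- all row versions that were valid at some point in a time range
SELECT * FROM temporal_table
    FOR SYSTEM_TIME BETWEEN TIMESTAMP '2018-10-01 00:00:00'
                        AND TIMESTAMP '2018-11-01 00:00:00';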

Primary keys

For performance reasons, InnoDB tables should always have a primary key, and normally it shouldn’t be updated. Temporal tables provide another reason to follow this golden rule – even on storage engines that are not organised by primary key, like MyISAM.

The reason is easy to demonstrate with an example:

SELECT id, valid_from, valid_to FROM t FOR SYSTEM_TIME ALL WHERE id IN (500, 501);
+-----+----------------------------+----------------------------+
| id  | valid_from                 | valid_to                   |
+-----+----------------------------+----------------------------+
| 500 | 2018-12-09 12:22:45.000001 | 2018-12-09 12:23:03.000001 |
| 501 | 2018-12-09 12:23:03.000001 | 2038-01-19 03:14:07.999999 |
+-----+----------------------------+----------------------------+

What do these results mean? Maybe row 500 has been deleted and row 501 has been added. Or maybe row 500 has been modified, and its id became 501. The timestamps suggest that the latter hypothesis is more likely, but there is no way to know that for sure.

That is why, in my opinion, we need to be able to assume that UPDATEs never touch primary key values.
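To illustrate the ambiguity, both of the following sequences can leave exactly the history shown above (t is the hypothetical table from the example):

-- hypothesis 1: one row deleted, another row inserted
DELETE FROM t WHERE id = 500;
INSERT INTO t (id) VALUES (501);

-- hypothesis 2: the primary key updated in place
UPDATE t SET id = 501 WHERE id = 500;

If we can assume that the primary key is never updated, only the first interpretation remains possible.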

Indexes

Currently, the documentation says nothing about how temporal columns are indexed. However, my conclusion is that the valid_to column is appended to UNIQUE indexes and the primary key. My opinion is based on the results of some EXPLAIN commands, like the following:

EXPLAIN SELECT email, valid_to FROM customer ORDER BY email \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
         type: index
possible_keys: NULL
          key: unq_email
      key_len: 59
          ref: NULL
         rows: 4
        Extra: Using where; Using index

This means that the query only reads from a UNIQUE index, and not from table data – therefore, the index contains the email column. It is also able to use the index for sorting, which confirms that email is its first column (as expected). In this way, a UNIQUE index doesn’t prevent the same value from appearing multiple times in the history; but since valid_to is part of the index, the duplicates can only be valid at different points in time.

It can be a good idea to include valid_to or valid_from in some regular indexes, to optimize queries that use such columns for filtering results.
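For example, here is a sketch of such an index, assuming a hypothetical country column on the customer table used above:

ALTER TABLE customer
    ADD INDEX idx_country_valid_to (country, valid_to);

A query filtering on both columns (for example WHERE country = 'UK' AND valid_to < NOW()) could then use the index to avoid a full scan.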

Transaction-safe temporal tables

Temporal columns contain timestamps that indicate when a row was INSERTed, UPDATEd, or DELETEd. So, when autocommit is not enabled, the temporal columns reflect the time each statement was executed, not the COMMIT time. For most use cases, this behaviour is desirable or at least acceptable. But there are cases when we want to only see committed data, to avoid seeing intermediate states that were never visible to applications.

To do so, we can create a transaction-precise temporal table. This only works with InnoDB – not with RocksDB or TokuDB, even though they support transactions. A transaction-precise temporal table doesn’t contain timestamps; instead, it contains the IDs of the transactions that created and deleted each row version. If you know PostgreSQL, you are probably familiar with the xmin and xmax columns – it’s basically the same idea, except that in PostgreSQL autovacuum will eventually make old row versions disappear. Because of the similarity, for transaction-precise temporal tables, I like to call the temporal columns xmin and xmax.
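A minimal sketch of such a table, using the column names mentioned above (the other names are hypothetical). Declaring the temporal columns as BIGINT UNSIGNED instead of TIMESTAMP(6) is what makes versioning transaction-precise:

CREATE TABLE account (
    id INT UNSIGNED PRIMARY KEY,
    balance DECIMAL(10, 2) NOT NULL,
    -- transaction IDs instead of timestamps
    xmin BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
    xmax BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (xmin, xmax)
) ENGINE = InnoDB WITH SYSTEM VERSIONING;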

From this short description, the astute reader may already see a couple of problems with this approach:

  • Temporal tables are based on transaction IDs or on timestamps, not both. There is no way to run a transaction-precise query to extract data that were present one hour ago. But think about it: even if it were possible, it would be problematic at best, because transactions are meant to be concurrent.
  • Transaction IDs are written in the binary log, but that information is typically only accessible to DBAs. An analyst (someone who’s typically interested in temporal tables) has no access to transaction IDs.

A partial workaround would be to query tables with columns like created_at and modified_at. We can run queries like this:

SELECT created_at, xmin
    FROM some_table
    WHERE created_at >= '2018-05-05 16:00:00'
    ORDER BY created_at
    LIMIT 1;

This will return the timestamp of the first row created since ‘2018-05-05 16:00:00’, as well as the id of the transaction which inserted it.

While this approach could give us the information we need with reasonable extra work, it’s possible that we don’t have such columns, or that rows are not inserted often enough into the tables that have them.

In this case, we can occasionally write the current timestamp and the current transaction ID to a dedicated table. This should allow us to associate a transaction with the point in time we are interested in. We cannot log every transaction ID for performance reasons, so we can use two different approaches (a sketch of the first one follows the list):

  • Write the transaction ID and the timestamp periodically, for example every minute. This will not create performance problems. On the other hand, we are arbitrarily deciding the granularity of our “log”, which may or may not be acceptable.
  • Write this information when certain events happen – for example, when a product is purchased, or when a user changes their password. This gives us a very precise way to see the data as they appeared during critical events, but does not allow us to investigate other types of events with the same precision.
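Here is a rough sketch of the periodic approach. All names are hypothetical, except INNODB_TRX, the Information Schema table that lists the InnoDB transactions currently running:

-- hypothetical log table associating timestamps with transaction IDs
CREATE TABLE trx_log (
    ts TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
    trx_id BIGINT UNSIGNED NOT NULL
) ENGINE = InnoDB;

-- to be run periodically, for example from an event scheduler event;
-- it records the ID of the transaction it runs in
START TRANSACTION;
INSERT INTO trx_log (trx_id)
    SELECT trx_id
        FROM information_schema.INNODB_TRX
        WHERE trx_mysql_thread_id = CONNECTION_ID();
COMMIT;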

Partitioning

If we look at older implementations of temporal tables in the world of proprietary databases (Db2, SQL Server, Oracle), they generally store historical data in a separate physical table or partition, sometimes called a history table. In MariaDB this doesn’t happen automatically or by default, leaving the choice to the user. However, in the general case it seems to me a good idea to create one or more partitions to store historical rows. The main reason is that a query rarely has to read both historical and current data, and reading only one partition is an interesting optimization.
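For example, with the SYSTEM_TIME partitioning type, historical rows can be confined to a dedicated partition (the partition names here are arbitrary):

ALTER TABLE temporal_table
    PARTITION BY SYSTEM_TIME (
        PARTITION p_history HISTORY,
        PARTITION p_current CURRENT
    );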

Excluding columns from versioning

MariaDB allows us to exclude some columns from versioning. This means that if we update the values of those columns, the current row version is updated in place rather than a new one being created. This is probably useful if a column is frequently updated and we don’t care about these changes. However, if one statement updates several columns, and only a subset of them is excluded from versioning, a new row version is still created. All in all, the partial exclusion of some columns could be more confusing than useful in several cases.
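The syntax is the WITHOUT SYSTEM VERSIONING column attribute. A quick sketch, with hypothetical names:

CREATE TABLE user_account (
    id INT UNSIGNED PRIMARY KEY,
    email VARCHAR(100) NOT NULL,
    -- updates that only touch last_login do not create a new row version
    last_login TIMESTAMP NULL WITHOUT SYSTEM VERSIONING
) WITH SYSTEM VERSIONING;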

Replication

10.3 is a stable version, but it is still recent. Some of us only adopt a new major version after some years, and we can even have good reasons to stick with an old version. And of course, many of us use MySQL, and MariaDB is not a drop-in replacement for it.

But we can still enjoy temporal tables by adding a MariaDB 10.3 slave. I attached such a slave to older MariaDB versions, and to MySQL 5.6. In all tests, the feature behaved as expected.

Initially, I was worried about replication lags. I assumed that, if replication lags, the slave applies the changes with a delay, and the timestamps in the tables are delayed accordingly. I am glad to say that I was wrong: the timestamps in temporal tables seem to match the ones in the binary log, so replication lags don’t affect their correctness.

This is true both with row-based replication and with statement-based replication.

A small caveat about temporal tables is that the version timestamps are only precise to the second; the fractional part should be ignored. You may have noticed this in the example earlier in this post.

Backups

For backups you will need to use mariabackup instead of xtrabackup.

mysqldump can also be used, not necessarily from a MariaDB distribution. However, it treats temporal tables as regular tables and does not back up historical data. This is a consequence of a design choice: rows with timestamps in the past cannot be inserted, which makes temporal tables much more reliable. Also, temporal tables are likely to be (or become) quite big, so a dump is probably not the best way to back them up.


Federico Razzoli

Relational database lover and open source supporter. I wrote “Mastering MariaDB” in 2014, and I was a MariaDB Community Ambassador. I am an independent consultant based in London.
