dbrewind - am I onto something?¶
Note
This article will evolve together with the dbrewind project.
Recently I’ve been working on a “perfect solution” for database cleanup between tests. In most backend projects, database setup either needs to be completely redone between tests, or there is a shaky hand-written database cleanup procedure. Alternatively, a large subset of tests is limited to working within a single transaction, and if the database does not support nested transactions (e.g. Postgres), the database connector is monkey-patched to translate commits to savepoints.
The goal of the project is to have a fast (<20ms) database cleanup procedure capable of undoing any CRUD changes created by any number of transactions, up to the point of initialisation of the rewind engine - and maintaining its performance characteristics no matter how large the fixture data set is.
Design¶
For each table in the public schema we create 2 diff tables in the dbrewind schema - one of them contains insertions (positive diff), and the other one deletions (negative diff). The tables are exact clones of the original table, except they contain no data and have different constraints and indexes. Each of the public tables must have some concept of record “uniqueness” - in practice either a primary key or a UNIQUE constraint involving exclusively non-nullable columns [1]. We detect the “best” set of columns to identify a row by querying all UNIQUE and PK constraints and choosing the one with the smallest set of columns (constraints may involve multiple columns). If there are 2 winners, we currently take the first one in alphabetical order. We use these columns to define primary keys in the respective diff tables.
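The constraint detection can be sketched as a catalog query along these lines. This is my own simplified illustration, not the project's actual code; the table name public.my_table is a placeholder, and the additional filter restricting UNIQUE constraints to all-non-nullable columns is omitted for brevity:

```sql
-- Pick the smallest PK/UNIQUE constraint of a table, breaking ties
-- alphabetically; its columns become the diff tables' primary key.
SELECT c.conname,
       array_agg(a.attname ORDER BY a.attnum) AS key_columns
FROM pg_constraint c
CROSS JOIN LATERAL unnest(c.conkey) AS k(attnum)
JOIN pg_attribute a
  ON a.attrelid = c.conrelid AND a.attnum = k.attnum
WHERE c.conrelid = 'public.my_table'::regclass
  AND c.contype IN ('p', 'u')   -- primary key or unique constraint
GROUP BY c.conname
ORDER BY count(*), c.conname
LIMIT 1;
```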
We install a generic trigger on all public tables, which fires for every row modification on INSERT, UPDATE, or DELETE (TRUNCATE is not supported). The handler updates the related diff tables with the contents of modified rows. The logic for INSERTs and DELETEs is identical but mirrored - on INSERT we first try to remove an entry from the records_deleted_&lt;tablename&gt; diff table, and if it doesn’t exist there, we insert it into the records_inserted_&lt;tablename&gt; table. DELETE does the same, but with the diff tables swapped. UPDATE is treated as a DELETE followed by an INSERT.
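The mirrored logic could look roughly like the following hand-written sketch for a hypothetical table public.t with primary key id - dbrewind generates the real equivalent, which may differ in detail:

```sql
CREATE FUNCTION dbrewind.t_diff() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- DELETE half (runs for DELETE and UPDATE): a row with this key in
  -- records_inserted_t was created after install(), so just cancel it;
  -- otherwise remember the old row so it can be restored.
  IF TG_OP IN ('UPDATE', 'DELETE') THEN
    DELETE FROM dbrewind.records_inserted_t WHERE id = OLD.id;
    IF NOT FOUND THEN
      INSERT INTO dbrewind.records_deleted_t SELECT OLD.*;
    END IF;
  END IF;
  -- INSERT half (runs for INSERT and UPDATE): only cancel a pending
  -- deletion when the whole row matches (a text cast keeps this sketch
  -- simple) - a key-only match would let an UPDATE that keeps its key
  -- cancel itself out and record nothing.
  IF TG_OP IN ('UPDATE', 'INSERT') THEN
    DELETE FROM dbrewind.records_deleted_t d
      WHERE d.id = NEW.id AND d::text = NEW::text;
    IF NOT FOUND THEN
      INSERT INTO dbrewind.records_inserted_t SELECT NEW.*;
    END IF;
  END IF;
  RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
$$;

CREATE TRIGGER t_diff AFTER INSERT OR UPDATE OR DELETE ON public.t
  FOR EACH ROW EXECUTE FUNCTION dbrewind.t_diff();
```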
Having a trigger fire for each row incurs an extra cost, which is spread evenly between all operations.
In order to rewind changes, for each table we first delete the rows listed in records_inserted_&lt;tablename&gt; and then re-insert the rows from records_deleted_&lt;tablename&gt;. When done restoring a table, we truncate its diff tables.
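For the same hypothetical table public.t keyed on id, the per-table rewind boils down to three statements (a sketch, not dbrewind's generated code):

```sql
-- 1. Drop rows created since install().
DELETE FROM public.t
  WHERE id IN (SELECT id FROM dbrewind.records_inserted_t);
-- 2. Restore rows deleted since install().
INSERT INTO public.t SELECT * FROM dbrewind.records_deleted_t;
-- 3. Reset the diff tables for the next test.
TRUNCATE dbrewind.records_inserted_t, dbrewind.records_deleted_t;
```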
Since the rewind procedure handles tables separately, we set session_replication_role to 'replica', which disables the FK constraints we may temporarily violate in the process of rewinding. This also disables triggers, which is helpful because it prevents changes made by the rewind procedure itself from being logged to the diff tables.
Changing session_replication_role is a great solution, since not all constraints are deferrable; however, care has to be taken, as FK constraints are not re-verified when session_replication_role is set back to 'origin'.
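The whole rewind can therefore be wrapped like this (a sketch; with SET LOCAL the role also reverts automatically at commit, so the explicit reset is purely illustrative):

```sql
BEGIN;
-- Disable FK checks and triggers (including dbrewind's own diff triggers).
SET LOCAL session_replication_role = 'replica';
-- ... per-table DELETE / INSERT / TRUNCATE statements run here ...
-- Constraints are NOT re-checked when the role reverts, so the rewind
-- itself must leave the data consistent.
SET LOCAL session_replication_role = 'origin';
COMMIT;
```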
Usage¶
How would one use it? After your database is set up by a potentially slow fixture, you execute the dbrewind.sql file and call dbrewind.install(). This creates two diff tables for every one of your tables and saves some metadata - all in the dbrewind schema. Then, after each test you can call dbrewind.rewind() to go back to the point of calling dbrewind.install().
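From psql, the workflow sketched above looks like this (the file and function names are taken from the description; the exact invocation may evolve with the project):

```sql
-- One-time, after the (slow) fixture has populated the database:
\i dbrewind.sql           -- psql meta-command loading the functions
SELECT dbrewind.install();

-- After each test:
SELECT dbrewind.rewind();
```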
Limitations¶
DDL and TRUNCATE are not supported. Currently, adding tables after dbrewind.install() causes errors when rewinding. In the future, they may be ignored, or even automatically incorporated.
Currently, each table must have defined “uniqueness” - either a primary key on one or more columns or a unique constraint on one or more non-nullable columns. It is possible to remove this limitation by automatically adding the internal oid column to tables with no “uniqueness”. oid is hidden from query results unless explicitly asked for. oid is 32-bit, so a wraparound is possible, but it’s good enough for testing.
Failed approaches¶
Initially I wanted to log each table’s changes to a log table, which would mirror its original table’s schema but also include additional change ID and change type columns. I then wanted to be able to take database snapshots at any point - taking a snapshot would simply mean registering which tables were at which change ID at a given point. Change type would be an enum with INSERT and DELETE variants, and an UPDATE would be represented by a pair of INSERT and DELETE changes.
This approach proved infeasible for 2 reasons - first, I couldn’t walk the changes table back, undoing one change after another, since this would require iterating over generic table rows in PL/pgSQL code. Generic table rows (the RECORD type) are untyped and cannot be passed to queries as a whole - apart from the special NEW and OLD records in trigger handlers.
I then tried to make the rewind procedure undo all deletes (perform re-insertions) and then undo all inserts. It worked to some degree, since the rows didn’t have to be returned to PL/pgSQL - 2 CTEs were enough. Unfortunately, since setting session_replication_role to 'replica' does not disable UNIQUE and PK constraints, it was possible to encounter constraint violations when rewinding - one UPDATE statement was enough. I think that even an INSERT of a row, followed by a DELETE, followed by another INSERT of the same row would result in constraint violations during the rewind procedure.
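For a hypothetical table t with primary key id, that failing sequence would be:

```sql
-- Each statement appends to the log:
INSERT INTO t VALUES (1, 'a');  -- logged as INSERT
DELETE FROM t WHERE id = 1;     -- logged as DELETE
INSERT INTO t VALUES (1, 'a');  -- logged as INSERT; the row is live again
-- A rewind that replays logged DELETEs first would re-insert (1, 'a')
-- while the identical live row still exists: duplicate primary key,
-- i.e. a unique_violation error.
```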
It was becoming obvious that I needed to compose 2 diff tables, one with insertions and one with deletions, and make sure they are not mutually redundant (e.g. insertions and deletions containing an identical row). Turning a log into 2 diff tables with these properties was too complex, so after several hours of experimentation I ended up completely scrapping the log table idea and instead went with maintaining diff tables directly in the trigger handler.
Talk is cheap. Show me the code¶
Work in progress is available at https://gitlab.com/karolinepauls/dbrewind. The PoC stage has been
reached. There is some basic testing, but I’m going to want to add some generative tests.
dbrewind
is written completely in PL/pgSQL, with tests in Python. All functionality currently
fits in a bit over 200 lines of PL/pgSQL code.
I also have to decide on the licence - just slapping MIT on it feels too lazy for a non-trivial project.